Technical SEO Analysis Simplified
A streamlined SEO workflow that analyzes a site's homepage and generates AI-powered recommendations
Workflow Information
ID: seo_analysis_simple
Namespace: default
Version: 1.0.0
Created: 2025-07-07
Updated: 2025-07-07
Tasks: 3
Inputs
| Name | Type | Required | Default |
|---|---|---|---|
| target_url | string | Required | None |
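
The YAML source below validates `target_url` against the pattern `^https?://.*`, so a value must carry an explicit scheme. A minimal sketch of the same check, assuming plain Python `re` semantics rather than the engine's actual validator:

```python
import re

# Pattern copied from the workflow's input validation (see YAML source).
URL_PATTERN = re.compile(r"^https?://.*")

def is_valid_target_url(value: str) -> bool:
    """True when the value starts with http:// or https://."""
    return URL_PATTERN.match(value) is not None

assert is_valid_target_url("https://example.com")
assert not is_valid_target_url("example.com")  # rejected: no scheme
```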
Outputs
| Name | Type | Description |
|---|---|---|
| full_report | string | Complete SEO analysis report with AI recommendations |
| raw_analysis | string | Raw technical analysis data |
| ai_recommendations | string | AI-generated recommendations |
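
All three outputs are JSON-serialized strings: each task prints its result with `json.dumps`, as the scripts in the YAML source show. A short sketch of reading the compiled report on the consumer side; `get_workflow_output` is a hypothetical helper standing in for however the runtime exposes results:

```python
import json

full_report = get_workflow_output("full_report")  # hypothetical retrieval helper
report = json.loads(full_report)

print(report["health_score"]["overall"])   # AI-assigned score, 1-10
print(report["issues_summary"]["total"])   # count of detected issues
for issue in report["issues_summary"]["critical"]:
    print("critical:", issue)
```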
Tasks
| Task | Type | Description |
|---|---|---|
| analyze_website | script | No description |
| generate_recommendations | ai_agent | No description |
| compile_report | script | No description |
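
The `depends_on` fields in the YAML source chain these tasks linearly: `generate_recommendations` waits on `analyze_website`, and `compile_report` waits on both. A minimal sketch of deriving that run order with the standard library's `graphlib` (an assumption about how a scheduler might resolve it, not the engine's actual code):

```python
from graphlib import TopologicalSorter

# Edges mirror the depends_on declarations in the YAML source.
dependencies = {
    "analyze_website": set(),
    "generate_recommendations": {"analyze_website"},
    "compile_report": {"analyze_website", "generate_recommendations"},
}

run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
# ['analyze_website', 'generate_recommendations', 'compile_report']
```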
YAML Source
```yaml
id: seo_analysis_simple
name: Technical SEO Analysis Simplified
tasks:
- id: analyze_website
  type: script
script: "import json\nimport requests\nfrom bs4 import BeautifulSoup\nimport time\n\
import os\nfrom urllib.parse import urlparse, urljoin\n\ntarget_url = os.environ.get('target_url',\
\ '')\n\nresults = {\n \"url\": target_url,\n \"homepage\": {},\n \"\
robots_txt\": {},\n \"sitemap\": {},\n \"overall_issues\": []\n}\n\ntry:\n\
\ # Parse URL\n parsed = urlparse(target_url)\n domain = parsed.netloc\n\
\ base_url = f\"{parsed.scheme}://{domain}\"\n \n # 1. Analyze homepage\n\
\ print(f\"Analyzing homepage: {target_url}\")\n start_time = time.time()\n\
\ response = requests.get(target_url, timeout=15, headers={\n 'User-Agent':\
\ 'Mozilla/5.0 (compatible; SEO-Analyzer/1.0)'\n })\n load_time = time.time()\
\ - start_time\n \n soup = BeautifulSoup(response.text, 'html.parser')\n\
\ \n # Extract all SEO elements\n title = soup.find('title')\n meta_desc\
\ = soup.find('meta', attrs={'name': 'description'})\n meta_keywords = soup.find('meta',\
\ attrs={'name': 'keywords'})\n canonical = soup.find('link', attrs={'rel':\
\ 'canonical'})\n \n # Headers\n h1_tags = soup.find_all('h1')\n h2_tags\
\ = soup.find_all('h2')\n h3_tags = soup.find_all('h3')\n \n # Images\n\
\ images = soup.find_all('img')\n images_without_alt = [img for img in images\
\ if not img.get('alt', '').strip()]\n \n # Links analysis\n links =\
\ soup.find_all('a', href=True)\n internal_links = set()\n external_links\
\ = set()\n \n for link in links:\n href = link.get('href', '')\n\
\ if href.startswith('http'):\n if domain in href:\n \
\ internal_links.add(href)\n else:\n external_links.add(href)\n\
\ elif href.startswith('/'):\n internal_links.add(urljoin(base_url,\
\ href))\n \n # Schema markup\n schema_scripts = soup.find_all('script',\
\ type='application/ld+json')\n schema_data = []\n for script in schema_scripts:\n\
\ try:\n schema_data.append(json.loads(script.string))\n \
\ except:\n pass\n \n # Mobile & technical\n viewport\
\ = soup.find('meta', attrs={'name': 'viewport'})\n charset = soup.find('meta',\
\ attrs={'charset': True}) or soup.find('meta', attrs={'http-equiv': 'Content-Type'})\n\
\ \n # Open Graph\n og_tags = {}\n for tag in soup.find_all('meta',\
\ property=True):\n if tag.get('property', '').startswith('og:'):\n \
\ og_tags[tag['property']] = tag.get('content', '')\n \n # Twitter\
\ Card\n twitter_tags = {}\n for tag in soup.find_all('meta', attrs={'name':\
\ True}):\n if tag.get('name', '').startswith('twitter:'):\n \
\ twitter_tags[tag['name']] = tag.get('content', '')\n \n # Page issues\n\
\ issues = []\n \n # Title checks\n if not title or not title.text.strip():\n\
\ issues.append(\"Missing title tag\")\n else:\n title_length\
\ = len(title.text.strip())\n if title_length > 60:\n issues.append(f\"\
Title tag too long ({title_length} chars, recommended: 50-60)\")\n elif\
\ title_length < 30:\n issues.append(f\"Title tag too short ({title_length}\
\ chars, recommended: 30-60)\")\n \n # Meta description checks\n if not\
\ meta_desc:\n issues.append(\"Missing meta description\")\n else:\n\
\ desc_length = len(meta_desc.get('content', ''))\n if desc_length\
\ > 160:\n issues.append(f\"Meta description too long ({desc_length}\
\ chars, recommended: 120-160)\")\n elif desc_length < 70:\n \
\ issues.append(f\"Meta description too short ({desc_length} chars, recommended:\
\ 70-160)\")\n \n # Header checks\n if len(h1_tags) == 0:\n issues.append(\"\
No H1 tag found\")\n elif len(h1_tags) > 1:\n issues.append(f\"Multiple\
\ H1 tags found ({len(h1_tags)})\")\n \n # Image checks\n if images_without_alt:\n\
\ issues.append(f\"{len(images_without_alt)} of {len(images)} images missing\
\ alt text\")\n \n # Technical checks\n if not viewport:\n issues.append(\"\
No mobile viewport meta tag\")\n \n if not charset:\n issues.append(\"\
No character encoding specified\")\n \n if load_time > 3:\n issues.append(f\"\
Slow page load time ({load_time:.2f}s)\")\n \n # Schema checks\n if not\
\ schema_scripts:\n issues.append(\"No structured data (Schema.org) found\"\
)\n \n # Social media checks\n if not og_tags:\n issues.append(\"\
No Open Graph tags found\")\n elif not all(k in og_tags for k in ['og:title',\
\ 'og:description', 'og:image']):\n issues.append(\"Incomplete Open Graph\
\ tags\")\n \n if not twitter_tags:\n issues.append(\"No Twitter\
\ Card tags found\")\n \n results[\"homepage\"] = {\n \"status_code\"\
: response.status_code,\n \"load_time\": round(load_time, 2),\n \
\ \"title\": title.text.strip() if title else None,\n \"title_length\"\
: len(title.text.strip()) if title else 0,\n \"meta_description\": meta_desc.get('content')\
\ if meta_desc else None,\n \"meta_description_length\": len(meta_desc.get('content',\
\ '')) if meta_desc else 0,\n \"canonical_url\": canonical.get('href')\
\ if canonical else None,\n \"h1_count\": len(h1_tags),\n \"h2_count\"\
: len(h2_tags),\n \"h3_count\": len(h3_tags),\n \"images_total\"\
: len(images),\n \"images_without_alt\": len(images_without_alt),\n \
\ \"internal_links_count\": len(internal_links),\n \"external_links_count\"\
: len(external_links),\n \"has_schema_markup\": len(schema_scripts) > 0,\n\
\ \"schema_types\": [s.get('@type') for s in schema_data if '@type' in\
\ s],\n \"has_viewport\": viewport is not None,\n \"has_charset\"\
: charset is not None,\n \"has_og_tags\": len(og_tags) > 0,\n \"\
has_twitter_cards\": len(twitter_tags) > 0,\n \"issues\": issues\n }\n\
\ \n results[\"overall_issues\"].extend(issues)\n \n # 2. Check robots.txt\n\
\ print(\"Checking robots.txt...\")\n robots_url = f\"{base_url}/robots.txt\"\
\n try:\n robots_response = requests.get(robots_url, timeout=5)\n \
\ if robots_response.status_code == 200:\n robots_content = robots_response.text\n\
\ \n # Parse robots.txt\n robots_issues = []\n\
\ has_sitemap = False\n user_agents = {}\n current_agent\
\ = None\n \n for line in robots_content.split('\\n'):\n\
\ line = line.strip()\n if line.lower().startswith('sitemap:'):\n\
\ has_sitemap = True\n elif line.startswith('User-agent:'):\n\
\ current_agent = line.split(':', 1)[1].strip()\n \
\ user_agents[current_agent] = {'allow': [], 'disallow': []}\n \
\ elif line.startswith('Disallow:') and current_agent:\n \
\ path = line.split(':', 1)[1].strip()\n if path:\n\
\ user_agents[current_agent]['disallow'].append(path)\n\
\ \n if not has_sitemap:\n robots_issues.append(\"\
No sitemap reference in robots.txt\")\n \n if '*' in user_agents\
\ and '/' in user_agents.get('*', {}).get('disallow', []):\n robots_issues.append(\"\
Blocking all search engines (Disallow: /)\")\n \n results[\"\
robots_txt\"] = {\n \"exists\": True,\n \"has_sitemap_reference\"\
: has_sitemap,\n \"user_agents_count\": len(user_agents),\n \
\ \"issues\": robots_issues\n }\n results[\"\
overall_issues\"].extend(robots_issues)\n else:\n results[\"\
robots_txt\"] = {\n \"exists\": False,\n \"issues\"\
: [\"No robots.txt file found\"]\n }\n results[\"overall_issues\"\
].append(\"No robots.txt file found\")\n except:\n results[\"robots_txt\"\
] = {\"exists\": False, \"error\": \"Failed to fetch robots.txt\"}\n \n \
\ # 3. Check sitemap\n print(\"Checking sitemap...\")\n sitemap_url = f\"\
{base_url}/sitemap.xml\"\n try:\n sitemap_response = requests.get(sitemap_url,\
\ timeout=5)\n if sitemap_response.status_code == 200:\n results[\"\
sitemap\"] = {\n \"exists\": True,\n \"url\": sitemap_url\n\
\ }\n else:\n results[\"sitemap\"] = {\n \
\ \"exists\": False,\n \"issues\": [\"No sitemap.xml found\
\ at standard location\"]\n }\n results[\"overall_issues\"\
].append(\"No sitemap.xml found\")\n except:\n results[\"sitemap\"]\
\ = {\"exists\": False, \"error\": \"Failed to fetch sitemap\"}\n \nexcept\
\ Exception as e:\n results[\"error\"] = str(e)\n results[\"overall_issues\"\
].append(f\"Analysis failed: {str(e)}\")\n\nprint(f\"__OUTPUTS__ {json.dumps(results)}\"\
)\n"
  requirements:
  - requests==2.31.0
  - beautifulsoup4==4.12.2
  - lxml==4.9.3
- id: generate_recommendations
  type: ai_agent
  prompt: |
    You are an expert SEO consultant. Analyze this comprehensive technical SEO data and provide actionable recommendations.

    Website: ${target_url}

    Analysis Results:
    ${analyze_website}

    Based on this data, provide a thorough SEO analysis. Consider:
    - Title and meta optimization
    - Content structure (headers)
    - Technical SEO (speed, mobile, charset)
    - Schema markup implementation
    - Social media optimization (OG, Twitter)
    - Image optimization
    - robots.txt and sitemap presence
    - Internal/external link balance

    Return a JSON object with:
    {
      "executive_summary": "2-3 sentence overview",
      "overall_health_score": 1-10,
      "critical_issues": ["list of must-fix issues"],
      "high_priority_recommendations": ["important improvements"],
      "quick_wins": ["easy fixes with impact"],
      "long_term_improvements": ["strategic changes"],
      "strengths": ["what the site does well"]
    }
  agent_type: analyst
  depends_on:
  - analyze_website
  model_client_id: seo_analyzer
- id: compile_report
  type: script
script: "import json\nfrom datetime import datetime\nimport os\n\n# Get analysis\
\ data\nanalysis = json.loads(os.environ.get('analyze_website', '{}'))\n\n# Parse\
\ AI recommendations\nai_rec = os.environ.get('generate_recommendations', '{}')\n\
try:\n if ai_rec and isinstance(ai_rec, str):\n # Extract JSON from\
\ response\n start = ai_rec.find('{')\n end = ai_rec.rfind('}')\
\ + 1\n if start >= 0 and end > start:\n recommendations = json.loads(ai_rec[start:end])\n\
\ else:\n recommendations = {}\n else:\n recommendations\
\ = {}\nexcept:\n recommendations = {\n \"executive_summary\": \"SEO\
\ analysis completed. Review findings below.\",\n \"overall_health_score\"\
: 5,\n \"critical_issues\": analysis.get('overall_issues', [])[:3],\n \
\ \"high_priority_recommendations\": [\"Address critical issues first\"\
],\n \"quick_wins\": [\"Add missing meta tags\", \"Optimize images\"],\n\
\ \"long_term_improvements\": [\"Implement structured data\"],\n \
\ \"strengths\": []\n }\n\n# Build comprehensive report\nreport = {\n \"\
metadata\": {\n \"generated_at\": datetime.now().isoformat(),\n \
\ \"target_url\": os.environ.get('target_url', ''),\n \"analysis_version\"\
: \"1.0.0\"\n },\n \"executive_summary\": recommendations.get(\"executive_summary\"\
, \"\"),\n \"health_score\": {\n \"overall\": recommendations.get(\"\
overall_health_score\", 0),\n \"technical\": 10 if analysis.get('homepage',\
\ {}).get('has_viewport') and analysis.get('homepage', {}).get('has_charset')\
\ else 5,\n \"content\": 10 if analysis.get('homepage', {}).get('h1_count')\
\ == 1 else 5,\n \"performance\": 10 if analysis.get('homepage', {}).get('load_time',\
\ 999) < 3 else 5\n },\n \"issues_summary\": {\n \"total\": len(analysis.get('overall_issues',\
\ [])),\n \"critical\": recommendations.get(\"critical_issues\", []),\n\
\ \"by_category\": {\n \"technical\": [i for i in analysis.get('overall_issues',\
\ []) if any(k in i.lower() for k in ['viewport', 'charset', 'load'])],\n \
\ \"content\": [i for i in analysis.get('overall_issues', []) if any(k\
\ in i.lower() for k in ['h1', 'title', 'description'])],\n \"images\"\
: [i for i in analysis.get('overall_issues', []) if 'image' in i.lower() or 'alt'\
\ in i.lower()],\n \"structured_data\": [i for i in analysis.get('overall_issues',\
\ []) if any(k in i.lower() for k in ['schema', 'og', 'twitter'])]\n }\n\
\ },\n \"recommendations\": {\n \"immediate_action\": recommendations.get(\"\
quick_wins\", []),\n \"high_priority\": recommendations.get(\"high_priority_recommendations\"\
, []),\n \"long_term\": recommendations.get(\"long_term_improvements\"\
, [])\n },\n \"strengths\": recommendations.get(\"strengths\", []),\n \
\ \"detailed_analysis\": analysis\n}\n\nprint(f\"__OUTPUTS__ {json.dumps(report)}\"\
)\n"
  depends_on:
  - analyze_website
  - generate_recommendations
inputs:
- name: target_url
  type: string
  required: true
  validation:
    pattern: ^https?://.*
  description: The target website URL to analyze
outputs:
  full_report:
    source: compile_report
    description: Complete SEO analysis report with AI recommendations
  raw_analysis:
    source: analyze_website
    description: Raw technical analysis data
  ai_recommendations:
    source: generate_recommendations
    description: AI-generated recommendations
version: 1.0.0
description: Streamlined SEO analysis workflow focused on homepage analysis with AI recommendations
model_clients:
  seo_analyzer:
    model: gpt-4o-mini
    api_key: ${env.OPENAI_API_KEY}
    provider: openai
    temperature: 0.3
```
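
Both script tasks hand results back by printing a single line prefixed with `__OUTPUTS__`, and downstream tasks read upstream results from environment variables named after the producing task IDs. A local sketch of that handoff under those assumptions; `run_task` and the script filenames are hypothetical, not part of the workflow engine:

```python
import json
import os
import subprocess
import sys

OUTPUT_PREFIX = "__OUTPUTS__ "

def run_task(script_path: str, env_overrides: dict) -> dict:
    """Run a task script and return the JSON it printed after __OUTPUTS__."""
    env = {**os.environ, **env_overrides}
    proc = subprocess.run([sys.executable, script_path],
                          capture_output=True, text=True, env=env, check=True)
    for line in proc.stdout.splitlines():
        if line.startswith(OUTPUT_PREFIX):
            return json.loads(line[len(OUTPUT_PREFIX):])
    raise RuntimeError(f"{script_path} printed no __OUTPUTS__ line")

# Chain the two script tasks the way the workflow wires them.
analysis = run_task("analyze_website.py", {"target_url": "https://example.com"})
report = run_task("compile_report.py", {
    "target_url": "https://example.com",
    "analyze_website": json.dumps(analysis),
    "generate_recommendations": "{}",  # AI step omitted in this local sketch
})
print(report["issues_summary"]["total"])
```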
Executions
| Execution ID | Status | Started | Duration |
|---|---|---|---|
| 1828b2c9... | COMPLETED | 2025-07-07 08:03:54 | N/A |