Technical SEO Analysis Simplified

Streamlined SEO analysis workflow focused on homepage analysis with AI recommendations

Workflow Information

ID: seo_analysis_simple

Namespace: default

Version: 1.0.0

Created: 2025-07-07

Updated: 2025-07-07

Tasks: 3

Inputs
Name        Type    Required  Default  Description
target_url  string  yes       none     The target website URL to analyze (must match ^https?://.*)
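
The YAML source below attaches this validation pattern to the input. As a minimal sketch outside the workflow engine (the function name here is illustrative, not part of the engine), the same check in plain Python:

import re

# Same pattern as the target_url input validation in the YAML source below.
TARGET_URL_PATTERN = re.compile(r"^https?://.*")

def is_valid_target_url(value: str) -> bool:
    """Return True if the value would pass the workflow's input validation."""
    return bool(TARGET_URL_PATTERN.match(value))

assert is_valid_target_url("https://example.com")
assert not is_valid_target_url("example.com")  # no scheme, rejected
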
Outputs
Name                Type    Source                    Description
full_report         string  compile_report            Complete SEO analysis report with AI recommendations
raw_analysis        string  analyze_website           Raw technical analysis data
ai_recommendations  string  generate_recommendations  AI-generated recommendations
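
All three outputs arrive as strings; full_report and raw_analysis are JSON emitted by the scripts' __OUTPUTS__ lines. A minimal sketch of consuming full_report downstream, assuming the runner hands it back as a JSON string (the sample values are illustrative):

import json

# full_report_str would come from however your runner exposes workflow outputs.
full_report_str = '{"health_score": {"overall": 7}, "issues_summary": {"total": 4}}'

report = json.loads(full_report_str)
print("Overall health:", report["health_score"]["overall"])
print("Issues found:", report["issues_summary"]["total"])
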
Tasks
analyze_website
script

Fetches the homepage, robots.txt, and sitemap, then extracts technical SEO signals (title, meta tags, headings, images, links, schema, social tags) and flags issues.

generate_recommendations
ai_agent

Sends the raw analysis to an AI analyst agent, which returns prioritized recommendations as a JSON object.

compile_report
script

Merges the raw analysis with the AI recommendations into a single structured report, with a fallback when the AI response cannot be parsed. The data contract between tasks is sketched below.
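
The tasks share data through a simple convention visible in the YAML source: a script publishes its result by printing one line prefixed with __OUTPUTS__, and downstream tasks read upstream results from environment variables named after the producing task's ID. A minimal sketch of both halves of that contract, assuming the engine performs the wiring in between:

import json
import os

# Producer side (as in analyze_website): emit the result as one JSON line.
results = {"url": "https://example.com", "overall_issues": []}
print(f"__OUTPUTS__ {json.dumps(results)}")

# Consumer side (as in compile_report): the engine is assumed to place the
# producer's JSON in an env var named after the upstream task ID.
analysis = json.loads(os.environ.get("analyze_website", "{}"))
issues = analysis.get("overall_issues", [])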

YAML Source
id: seo_analysis_simple
name: Technical SEO Analysis Simplified
tasks:
- id: analyze_website
  type: script
  script: "import json\nimport requests\nfrom bs4 import BeautifulSoup\nimport time\n\
    import os\nfrom urllib.parse import urlparse, urljoin\n\ntarget_url = os.environ.get('target_url',\
    \ '')\n\nresults = {\n    \"url\": target_url,\n    \"homepage\": {},\n    \"\
    robots_txt\": {},\n    \"sitemap\": {},\n    \"overall_issues\": []\n}\n\ntry:\n\
    \    # Parse URL\n    parsed = urlparse(target_url)\n    domain = parsed.netloc\n\
    \    base_url = f\"{parsed.scheme}://{domain}\"\n    \n    # 1. Analyze homepage\n\
    \    print(f\"Analyzing homepage: {target_url}\")\n    start_time = time.time()\n\
    \    response = requests.get(target_url, timeout=15, headers={\n        'User-Agent':\
    \ 'Mozilla/5.0 (compatible; SEO-Analyzer/1.0)'\n    })\n    load_time = time.time()\
    \ - start_time\n    \n    soup = BeautifulSoup(response.text, 'html.parser')\n\
    \    \n    # Extract all SEO elements\n    title = soup.find('title')\n    meta_desc\
    \ = soup.find('meta', attrs={'name': 'description'})\n    meta_keywords = soup.find('meta',\
    \ attrs={'name': 'keywords'})\n    canonical = soup.find('link', attrs={'rel':\
    \ 'canonical'})\n    \n    # Headers\n    h1_tags = soup.find_all('h1')\n    h2_tags\
    \ = soup.find_all('h2')\n    h3_tags = soup.find_all('h3')\n    \n    # Images\n\
    \    images = soup.find_all('img')\n    images_without_alt = [img for img in images\
    \ if not img.get('alt', '').strip()]\n    \n    # Links analysis\n    links =\
    \ soup.find_all('a', href=True)\n    internal_links = set()\n    external_links\
    \ = set()\n    \n    for link in links:\n        href = link.get('href', '')\n\
    \        if href.startswith('http'):\n            if domain in href:\n       \
    \         internal_links.add(href)\n            else:\n                external_links.add(href)\n\
    \        elif href.startswith('/'):\n            internal_links.add(urljoin(base_url,\
    \ href))\n    \n    # Schema markup\n    schema_scripts = soup.find_all('script',\
    \ type='application/ld+json')\n    schema_data = []\n    for script in schema_scripts:\n\
    \        try:\n            schema_data.append(json.loads(script.string))\n   \
    \     except:\n            pass\n    \n    # Mobile & technical\n    viewport\
    \ = soup.find('meta', attrs={'name': 'viewport'})\n    charset = soup.find('meta',\
    \ attrs={'charset': True}) or soup.find('meta', attrs={'http-equiv': 'Content-Type'})\n\
    \    \n    # Open Graph\n    og_tags = {}\n    for tag in soup.find_all('meta',\
    \ property=True):\n        if tag.get('property', '').startswith('og:'):\n   \
    \         og_tags[tag['property']] = tag.get('content', '')\n    \n    # Twitter\
    \ Card\n    twitter_tags = {}\n    for tag in soup.find_all('meta', attrs={'name':\
    \ True}):\n        if tag.get('name', '').startswith('twitter:'):\n          \
    \  twitter_tags[tag['name']] = tag.get('content', '')\n    \n    # Page issues\n\
    \    issues = []\n    \n    # Title checks\n    if not title or not title.text.strip():\n\
    \        issues.append(\"Missing title tag\")\n    else:\n        title_length\
    \ = len(title.text.strip())\n        if title_length > 60:\n            issues.append(f\"\
    Title tag too long ({title_length} chars, recommended: 50-60)\")\n        elif\
    \ title_length < 30:\n            issues.append(f\"Title tag too short ({title_length}\
    \ chars, recommended: 30-60)\")\n    \n    # Meta description checks\n    if not\
    \ meta_desc:\n        issues.append(\"Missing meta description\")\n    else:\n\
    \        desc_length = len(meta_desc.get('content', ''))\n        if desc_length\
    \ > 160:\n            issues.append(f\"Meta description too long ({desc_length}\
    \ chars, recommended: 120-160)\")\n        elif desc_length < 70:\n          \
    \  issues.append(f\"Meta description too short ({desc_length} chars, recommended:\
    \ 70-160)\")\n    \n    # Header checks\n    if len(h1_tags) == 0:\n        issues.append(\"\
    No H1 tag found\")\n    elif len(h1_tags) > 1:\n        issues.append(f\"Multiple\
    \ H1 tags found ({len(h1_tags)})\")\n    \n    # Image checks\n    if images_without_alt:\n\
    \        issues.append(f\"{len(images_without_alt)} of {len(images)} images missing\
    \ alt text\")\n    \n    # Technical checks\n    if not viewport:\n        issues.append(\"\
    No mobile viewport meta tag\")\n    \n    if not charset:\n        issues.append(\"\
    No character encoding specified\")\n    \n    if load_time > 3:\n        issues.append(f\"\
    Slow page load time ({load_time:.2f}s)\")\n    \n    # Schema checks\n    if not\
    \ schema_scripts:\n        issues.append(\"No structured data (Schema.org) found\"\
    )\n    \n    # Social media checks\n    if not og_tags:\n        issues.append(\"\
    No Open Graph tags found\")\n    elif not all(k in og_tags for k in ['og:title',\
    \ 'og:description', 'og:image']):\n        issues.append(\"Incomplete Open Graph\
    \ tags\")\n    \n    if not twitter_tags:\n        issues.append(\"No Twitter\
    \ Card tags found\")\n    \n    results[\"homepage\"] = {\n        \"status_code\"\
    : response.status_code,\n        \"load_time\": round(load_time, 2),\n       \
    \ \"title\": title.text.strip() if title else None,\n        \"title_length\"\
    : len(title.text.strip()) if title else 0,\n        \"meta_description\": meta_desc.get('content')\
    \ if meta_desc else None,\n        \"meta_description_length\": len(meta_desc.get('content',\
    \ '')) if meta_desc else 0,\n        \"canonical_url\": canonical.get('href')\
    \ if canonical else None,\n        \"h1_count\": len(h1_tags),\n        \"h2_count\"\
    : len(h2_tags),\n        \"h3_count\": len(h3_tags),\n        \"images_total\"\
    : len(images),\n        \"images_without_alt\": len(images_without_alt),\n   \
    \     \"internal_links_count\": len(internal_links),\n        \"external_links_count\"\
    : len(external_links),\n        \"has_schema_markup\": len(schema_scripts) > 0,\n\
    \        \"schema_types\": [s.get('@type') for s in schema_data if '@type' in\
    \ s],\n        \"has_viewport\": viewport is not None,\n        \"has_charset\"\
    : charset is not None,\n        \"has_og_tags\": len(og_tags) > 0,\n        \"\
    has_twitter_cards\": len(twitter_tags) > 0,\n        \"issues\": issues\n    }\n\
    \    \n    results[\"overall_issues\"].extend(issues)\n    \n    # 2. Check robots.txt\n\
    \    print(\"Checking robots.txt...\")\n    robots_url = f\"{base_url}/robots.txt\"\
    \n    try:\n        robots_response = requests.get(robots_url, timeout=5)\n  \
    \      if robots_response.status_code == 200:\n            robots_content = robots_response.text\n\
    \            \n            # Parse robots.txt\n            robots_issues = []\n\
    \            has_sitemap = False\n            user_agents = {}\n            current_agent\
    \ = None\n            \n            for line in robots_content.split('\\n'):\n\
    \                line = line.strip()\n                if line.lower().startswith('sitemap:'):\n\
    \                    has_sitemap = True\n                elif line.startswith('User-agent:'):\n\
    \                    current_agent = line.split(':', 1)[1].strip()\n         \
    \           user_agents[current_agent] = {'allow': [], 'disallow': []}\n     \
    \           elif line.startswith('Disallow:') and current_agent:\n           \
    \         path = line.split(':', 1)[1].strip()\n                    if path:\n\
    \                        user_agents[current_agent]['disallow'].append(path)\n\
    \            \n            if not has_sitemap:\n                robots_issues.append(\"\
    No sitemap reference in robots.txt\")\n            \n            if '*' in user_agents\
    \ and '/' in user_agents.get('*', {}).get('disallow', []):\n                robots_issues.append(\"\
    Blocking all search engines (Disallow: /)\")\n            \n            results[\"\
    robots_txt\"] = {\n                \"exists\": True,\n                \"has_sitemap_reference\"\
    : has_sitemap,\n                \"user_agents_count\": len(user_agents),\n   \
    \             \"issues\": robots_issues\n            }\n            results[\"\
    overall_issues\"].extend(robots_issues)\n        else:\n            results[\"\
    robots_txt\"] = {\n                \"exists\": False,\n                \"issues\"\
    : [\"No robots.txt file found\"]\n            }\n            results[\"overall_issues\"\
    ].append(\"No robots.txt file found\")\n    except:\n        results[\"robots_txt\"\
    ] = {\"exists\": False, \"error\": \"Failed to fetch robots.txt\"}\n    \n   \
    \ # 3. Check sitemap\n    print(\"Checking sitemap...\")\n    sitemap_url = f\"\
    {base_url}/sitemap.xml\"\n    try:\n        sitemap_response = requests.get(sitemap_url,\
    \ timeout=5)\n        if sitemap_response.status_code == 200:\n            results[\"\
    sitemap\"] = {\n                \"exists\": True,\n                \"url\": sitemap_url\n\
    \            }\n        else:\n            results[\"sitemap\"] = {\n        \
    \        \"exists\": False,\n                \"issues\": [\"No sitemap.xml found\
    \ at standard location\"]\n            }\n            results[\"overall_issues\"\
    ].append(\"No sitemap.xml found\")\n    except:\n        results[\"sitemap\"]\
    \ = {\"exists\": False, \"error\": \"Failed to fetch sitemap\"}\n    \nexcept\
    \ Exception as e:\n    results[\"error\"] = str(e)\n    results[\"overall_issues\"\
    ].append(f\"Analysis failed: {str(e)}\")\n\nprint(f\"__OUTPUTS__ {json.dumps(results)}\"\
    )\n"
  requirements:
  - requests==2.31.0
  - beautifulsoup4==4.12.2
  - lxml==4.9.3
- id: generate_recommendations
  type: ai_agent
  prompt: "You are an expert SEO consultant. Analyze this comprehensive technical\
    \ SEO data and provide actionable recommendations.\n\nWebsite: ${target_url}\n\
    \nAnalysis Results:\n${analyze_website}\n\nBased on this data, provide a thorough\
    \ SEO analysis. Consider:\n- Title and meta optimization\n- Content structure\
    \ (headers)\n- Technical SEO (speed, mobile, charset)\n- Schema markup implementation\n\
    - Social media optimization (OG, Twitter)\n- Image optimization\n- robots.txt\
    \ and sitemap presence\n- Internal/external link balance\n\nReturn a JSON object\
    \ with:\n{\n  \"executive_summary\": \"2-3 sentence overview\",\n  \"overall_health_score\"\
    : 1-10,\n  \"critical_issues\": [\"list of must-fix issues\"],\n  \"high_priority_recommendations\"\
    : [\"important improvements\"],\n  \"quick_wins\": [\"easy fixes with impact\"\
    ],\n  \"long_term_improvements\": [\"strategic changes\"],\n  \"strengths\": [\"\
    what the site does well\"]\n}\n"
  agent_type: analyst
  depends_on:
  - analyze_website
  model_client_id: seo_analyzer
- id: compile_report
  type: script
  script: "import json\nfrom datetime import datetime\nimport os\n\n# Get analysis\
    \ data\nanalysis = json.loads(os.environ.get('analyze_website', '{}'))\n\n# Parse\
    \ AI recommendations\nai_rec = os.environ.get('generate_recommendations', '{}')\n\
    try:\n    if ai_rec and isinstance(ai_rec, str):\n        # Extract JSON from\
    \ response\n        start = ai_rec.find('{')\n        end = ai_rec.rfind('}')\
    \ + 1\n        if start >= 0 and end > start:\n            recommendations = json.loads(ai_rec[start:end])\n\
    \        else:\n            recommendations = {}\n    else:\n        recommendations\
    \ = {}\nexcept:\n    recommendations = {\n        \"executive_summary\": \"SEO\
    \ analysis completed. Review findings below.\",\n        \"overall_health_score\"\
    : 5,\n        \"critical_issues\": analysis.get('overall_issues', [])[:3],\n \
    \       \"high_priority_recommendations\": [\"Address critical issues first\"\
    ],\n        \"quick_wins\": [\"Add missing meta tags\", \"Optimize images\"],\n\
    \        \"long_term_improvements\": [\"Implement structured data\"],\n      \
    \  \"strengths\": []\n    }\n\n# Build comprehensive report\nreport = {\n    \"\
    metadata\": {\n        \"generated_at\": datetime.now().isoformat(),\n       \
    \ \"target_url\": os.environ.get('target_url', ''),\n        \"analysis_version\"\
    : \"1.0.0\"\n    },\n    \"executive_summary\": recommendations.get(\"executive_summary\"\
    , \"\"),\n    \"health_score\": {\n        \"overall\": recommendations.get(\"\
    overall_health_score\", 0),\n        \"technical\": 10 if analysis.get('homepage',\
    \ {}).get('has_viewport') and analysis.get('homepage', {}).get('has_charset')\
    \ else 5,\n        \"content\": 10 if analysis.get('homepage', {}).get('h1_count')\
    \ == 1 else 5,\n        \"performance\": 10 if analysis.get('homepage', {}).get('load_time',\
    \ 999) < 3 else 5\n    },\n    \"issues_summary\": {\n        \"total\": len(analysis.get('overall_issues',\
    \ [])),\n        \"critical\": recommendations.get(\"critical_issues\", []),\n\
    \        \"by_category\": {\n            \"technical\": [i for i in analysis.get('overall_issues',\
    \ []) if any(k in i.lower() for k in ['viewport', 'charset', 'load'])],\n    \
    \        \"content\": [i for i in analysis.get('overall_issues', []) if any(k\
    \ in i.lower() for k in ['h1', 'title', 'description'])],\n            \"images\"\
    : [i for i in analysis.get('overall_issues', []) if 'image' in i.lower() or 'alt'\
    \ in i.lower()],\n            \"structured_data\": [i for i in analysis.get('overall_issues',\
    \ []) if any(k in i.lower() for k in ['schema', 'og', 'twitter'])]\n        }\n\
    \    },\n    \"recommendations\": {\n        \"immediate_action\": recommendations.get(\"\
    quick_wins\", []),\n        \"high_priority\": recommendations.get(\"high_priority_recommendations\"\
    , []),\n        \"long_term\": recommendations.get(\"long_term_improvements\"\
    , [])\n    },\n    \"strengths\": recommendations.get(\"strengths\", []),\n  \
    \  \"detailed_analysis\": analysis\n}\n\nprint(f\"__OUTPUTS__ {json.dumps(report)}\"\
    )\n"
  depends_on:
  - analyze_website
  - generate_recommendations
inputs:
- name: target_url
  type: string
  required: true
  validation:
    pattern: ^https?://.*
  description: The target website URL to analyze
outputs:
  full_report:
    source: compile_report
    description: Complete SEO analysis report with AI recommendations
  raw_analysis:
    source: analyze_website
    description: Raw technical analysis data
  ai_recommendations:
    source: generate_recommendations
    description: AI-generated recommendations
version: 1.0.0
description: Streamlined SEO analysis workflow focused on homepage analysis with AI
  recommendations
model_clients:
  seo_analyzer:
    model: gpt-4o-mini
    api_key: ${env.OPENAI_API_KEY}
    provider: openai
    temperature: 0.3
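
The seo_analyzer client reads its key from the OPENAI_API_KEY environment variable. The engine's model-client internals aren't shown here, but a plausible equivalent call using the official openai Python package, purely as an illustration, looks like this:

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Mirrors the seo_analyzer config: gpt-4o-mini at temperature 0.3.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0.3,
    messages=[{"role": "user", "content": "<rendered generate_recommendations prompt>"}],
)
print(response.choices[0].message.content)
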
Executions
Execution ID  Status     Started              Duration
1828b2c9...   COMPLETED  2025-07-07 08:03:54  N/A