Documentation Search with AI Assistant

A workflow that fetches the engine's YAML reference documentation, runs a semantic vector search over it, and returns an AI-generated answer to the user's question.

Workflow Information

ID: documentation_search_workflow

Namespace: default

Version: N/A

Created: 2025-07-30

Updated: 2025-08-13

Tasks: 3

Inputs
Name           Type    Required  Default  Description
search_query   string  yes       none     User's search query for documentation

Outputs
Name            Type    Source                        Description
ai_response     string  ai_assistant.ai_response      AI-generated answer based on documentation search
search_results  object  vector_search.search_results  Relevant documentation sections found
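
The tables above define the workflow's I/O contract: one required string input and two outputs drawn from the last two tasks. A minimal sketch of those shapes, with field names taken from the definitions in the YAML source below; every concrete value here is illustrative only.

# Illustrative only: the input a caller supplies and the shape of a completed
# run's outputs, based on the Inputs/Outputs tables and the YAML source below.
workflow_inputs = {
    "search_query": "How do I declare task dependencies?",  # required string, no default
}

workflow_outputs = {
    # ai_response is sourced from ai_assistant.ai_response
    "ai_response": "Task dependencies are declared with depends_on: ...",
    # search_results is sourced from vector_search.search_results: ranked documentation chunks
    "search_results": [
        {
            "path": "yaml_reference.tasks.chunk_0",  # hypothetical chunk path
            "headers": "Header 2: Tasks",
            "relevance_score": 0.41,
            "content": "...",
        }
    ],
}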
Tasks
fetch_documentation (script): Fetch Documentation API
vector_search (script): ChromaDB Vector Search Documentation
ai_assistant (ai_agent): AI Documentation Assistant
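
Both script tasks in the YAML source below return their results by printing a single line of the form __OUTPUTS__ <json>; downstream tasks then reference those values through placeholders such as ${fetch_documentation} and ${vector_search.search_results}. The engine's actual capture mechanism is not documented on this page, so the sketch below only illustrates how such a marker line could be pulled out of captured stdout; the parse_outputs helper and the sample stdout are hypothetical.

import json

def parse_outputs(stdout_text):
    """Hypothetical helper: return the JSON payload of the last __OUTPUTS__ line, if any."""
    outputs = None
    for line in stdout_text.splitlines():
        if line.startswith("__OUTPUTS__ "):
            # Keep the last marker line so a later error payload overrides earlier ones
            outputs = json.loads(line[len("__OUTPUTS__ "):])
    return outputs

# Example with made-up stdout from a fetch_documentation run
stdout = (
    "Trying to fetch documentation from: http://localhost:5000/api/docs/yaml-reference\n"
    '__OUTPUTS__ {"status": "success", "successful_url": "http://localhost:5000/api/docs/yaml-reference"}'
)
print(parse_outputs(stdout))  # {'status': 'success', 'successful_url': '...'}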

YAML Source
id: documentation_search_workflow
name: Documentation Search with AI Assistant
tasks:
- id: fetch_documentation
  name: Fetch Documentation API
  type: script
  script: "import requests\nimport json\n\n# Call documentation API endpoint - try\
    \ multiple URLs for Docker environments\ntry:\n    # Try multiple API endpoints\
    \ for different Docker networking scenarios\n    api_endpoints = [\n        \"\
    http://host.docker.internal:5000/api/docs/yaml-reference\",  # Docker Desktop\n\
    \        \"http://172.17.0.1:5000/api/docs/yaml-reference\",           # Docker\
    \ bridge network\n        \"http://localhost:5000/api/docs/yaml-reference\", \
    \           # Local development\n        \"http://0.0.0.0:5000/api/docs/yaml-reference\"\
    ,             # Bind all interfaces\n        \"http://127.0.0.1:5000/api/docs/yaml-reference\"\
    \            # Loopback\n        \"https://workflow-dev.assistents.ai/api/docs/yaml-reference\"\
    ,  # Production environment\n    ]\n    \n    response = None\n    successful_url\
    \ = None\n    \n    for api_url in api_endpoints:\n        try:\n            print(f\"\
    Trying to fetch documentation from: {api_url}\")\n            response = requests.get(api_url,\
    \ timeout=10)\n            if response.status_code == 200:\n                successful_url\
    \ = api_url\n                print(f\"\u2705 Successfully connected to: {api_url}\"\
    )\n                break\n            else:\n                print(f\"\u274C Failed\
    \ with status {response.status_code}: {api_url}\")\n        except requests.exceptions.RequestException\
    \ as e:\n            print(f\"\u274C Connection failed: {api_url} - {str(e)}\"\
    )\n            continue\n    \n    if response is None or response.status_code\
    \ != 200:\n        raise Exception(\"Could not connect to documentation API on\
    \ any of the attempted URLs\")\n    \n    # Process successful response\n    doc_data\
    \ = response.json()\n    print(f\"Successfully fetched documentation from {successful_url}:\
    \ {len(str(doc_data))} characters\")\n    \n    # Store the documentation data\
    \ for next task\n    outputs = {}\n    outputs['documentation'] = doc_data\n \
    \   outputs['status'] = 'success'\n    outputs['successful_url'] = successful_url\n\
    \    \n    # Print outputs in the required format\n    import json\n    print(f\"\
    __OUTPUTS__ {json.dumps(outputs)}\")\n        \nexcept Exception as e:\n    print(f\"\
    Error fetching documentation: {str(e)}\")\n    outputs = {}\n    outputs['status']\
    \ = 'error'\n    outputs['error'] = str(e)\n    \n    # Print outputs in the required\
    \ format\n    import json\n    print(f\"__OUTPUTS__ {json.dumps(outputs)}\")\n"
  timeout_seconds: 120
- id: vector_search
  name: ChromaDB Vector Search Documentation
  type: script
  script: "import json\nimport re\nfrom langchain_chroma import Chroma\nfrom langchain_community.embeddings\
    \ import JinaEmbeddings\nfrom langchain_text_splitters import MarkdownHeaderTextSplitter,\
    \ RecursiveCharacterTextSplitter\n\ndef extract_text_content(data, path=None):\n\
    \    \"\"\"Recursively extract text content from nested data\"\"\"\n    content_list\
    \ = []\n    current_path = path or \"root\"\n    \n    if isinstance(data, dict):\n\
    \        for key, value in data.items():\n            new_path = f\"{current_path}.{key}\"\
    \ if current_path != \"root\" else key\n            if isinstance(value, str)\
    \ and len(value.strip()) > 50:\n                content_list.append({\n      \
    \              'path': new_path,\n                    'content': value.strip(),\n\
    \                    'type': 'text'\n                })\n            else:\n \
    \               content_list.extend(extract_text_content(value, new_path))\n \
    \   elif isinstance(data, list):\n        for i, item in enumerate(data):\n  \
    \          new_path = f\"{current_path}[{i}]\" if current_path != \"root\" else\
    \ f\"[{i}]\"\n            content_list.extend(extract_text_content(item, new_path))\n\
    \    elif isinstance(data, str) and len(data.strip()) > 50:\n        content_list.append({\n\
    \            'path': current_path,\n            'content': data.strip(),\n   \
    \         'type': 'text'\n        })\n        \n    return content_list\n\ndef\
    \ split_text_by_structure(text, source_path):\n    \"\"\"Split text by headings\
    \ and subheadings using LangChain splitters\"\"\"\n    chunks = []\n    \n   \
    \ # Check if text contains markdown headers\n    if re.search(r'^#{1,6}\\s+',\
    \ text, re.MULTILINE):\n        # Use MarkdownHeaderTextSplitter for markdown\
    \ content\n        headers_to_split_on = [\n            (\"#\", \"Header 1\"),\n\
    \            (\"##\", \"Header 2\"), \n            (\"###\", \"Header 3\"),\n\
    \            (\"####\", \"Header 4\"),\n            (\"#####\", \"Header 5\"),\n\
    \            (\"######\", \"Header 6\"),\n        ]\n        \n        markdown_splitter\
    \ = MarkdownHeaderTextSplitter(\n            headers_to_split_on=headers_to_split_on,\n\
    \            strip_headers=False\n        )\n        \n        try:\n        \
    \    md_header_splits = markdown_splitter.split_text(text)\n            \n   \
    \         # Further split large chunks using RecursiveCharacterTextSplitter\n\
    \            text_splitter = RecursiveCharacterTextSplitter(\n               \
    \ chunk_size=800,  # Optimal size for embeddings\n                chunk_overlap=100,\
    \  # Overlap to maintain context\n                separators=[\"\\n\\n\", \"\\\
    n\", \". \", \" \", \"\"]\n            )\n            \n            final_splits\
    \ = text_splitter.split_documents(md_header_splits)\n            \n          \
    \  for i, split in enumerate(final_splits):\n                # Preserve header\
    \ metadata\n                header_info = []\n                for key, value in\
    \ split.metadata.items():\n                    if key.startswith('Header'):\n\
    \                        header_info.append(f\"{key}: {value}\")\n           \
    \     \n                header_context = \" > \".join(header_info) if header_info\
    \ else \"\"\n                \n                chunks.append({\n             \
    \       'content': split.page_content,\n                    'path': f\"{source_path}.chunk_{i}\"\
    ,\n                    'headers': header_context,\n                    'chunk_index':\
    \ i,\n                    'source': source_path\n                })\n        \
    \        \n        except Exception as e:\n            print(f\"Markdown splitting\
    \ failed for {source_path}, using fallback: {e}\")\n            # Fallback to\
    \ simple splitting\n            fallback_splitter = RecursiveCharacterTextSplitter(\n\
    \                chunk_size=800,\n                chunk_overlap=100\n        \
    \    )\n            simple_splits = fallback_splitter.split_text(text)\n     \
    \       \n            for i, split in enumerate(simple_splits):\n            \
    \    chunks.append({\n                    'content': split,\n                \
    \    'path': f\"{source_path}.chunk_{i}\",\n                    'headers': \"\"\
    ,\n                    'chunk_index': i,\n                    'source': source_path\n\
    \                })\n    else:\n        # Use RecursiveCharacterTextSplitter for\
    \ plain text\n        text_splitter = RecursiveCharacterTextSplitter(\n      \
    \      chunk_size=800,\n            chunk_overlap=100,\n            separators=[\"\
    \\n\\n\", \"\\n\", \". \", \" \", \"\"]\n        )\n        \n        splits =\
    \ text_splitter.split_text(text)\n        \n        for i, split in enumerate(splits):\n\
    \            chunks.append({\n                'content': split,\n            \
    \    'path': f\"{source_path}.chunk_{i}\",\n                'headers': \"\",\n\
    \                'chunk_index': i,\n                'source': source_path\n  \
    \          })\n    \n    return chunks\n\ntry:\n    # Get documentation from previous\
    \ task\n    fetch_result = ${fetch_documentation}\n    documentation = fetch_result.get('documentation',\
    \ {})\n    search_query = \"${search_query}\"\n    \n    print(f\"Processing vector\
    \ search for query: '{search_query}'\")\n    \n    # Extract all text content\
    \ from documentation\n    all_content = extract_text_content(documentation)\n\
    \    print(f\"Extracted {len(all_content)} content sections from documentation\"\
    )\n    \n    if not all_content:\n        print(\"No content found in documentation\"\
    )\n        outputs = {\n            'search_results': [],\n            'search_query':\
    \ search_query,\n            'total_results': 0,\n            'status': 'success',\n\
    \            'message': 'No content found in documentation'\n        }\n     \
    \   print(f\"__OUTPUTS__ {json.dumps(outputs)}\")\n        exit()\n    \n    #\
    \ Split text content by structure (headings/subheadings)\n    all_chunks = []\n\
    \    for content_item in all_content:\n        chunks = split_text_by_structure(content_item['content'],\
    \ content_item['path'])\n        all_chunks.extend(chunks)\n    \n    print(f\"\
    Split into {len(all_chunks)} semantic chunks\")\n    \n    # Initialize Jina embeddings\n\
    \    print(\"Initializing embeddings...\")\n    embeddings = JinaEmbeddings(\n\
    \        jina_api_key=\"jina_45105ba73bf2426084abf11fb7710efaL7HzX9Yxl26RSmfkUMt9tM2M8XDY\"\
    ,\n        model_name=\"jina-embeddings-v3\"\n    )\n    \n    # Create in-memory\
    \ ChromaDB vector store\n    vector_store = Chroma(embedding_function=embeddings)\n\
    \    \n    # Prepare documents and metadata for embedding\n    documents = [chunk['content']\
    \ for chunk in all_chunks]\n    metadatas = [{\n        'path': chunk['path'],\n\
    \        'headers': chunk['headers'],\n        'source': chunk['source'],\n  \
    \      'chunk_index': chunk['chunk_index'],\n        'length': len(chunk['content'])\n\
    \    } for chunk in all_chunks]\n    \n    print(f\"Embedding {len(documents)}\
    \ text chunks...\")\n    \n    # Add documents to vector store (this creates embeddings)\n\
    \    vector_store.add_texts(texts=documents, metadatas=metadatas)\n    \n    print(\"\
    Performing semantic similarity search...\")\n    \n    # Perform vector similarity\
    \ search\n    results = vector_store.similarity_search_with_score(\n        query=search_query,\n\
    \        k=min(12, len(documents))  # Get more results for better coverage\n \
    \   )\n    \n    print(f\"Found {len(results)} semantic matches\")\n    \n   \
    \ # Process and rank results\n    search_results = []\n    \n    for doc, distance_score\
    \ in results:\n        # Convert distance to similarity score (higher = more similar)\n\
    \        similarity_score = 1.0 / (1.0 + distance_score)\n        \n        #\
    \ Include results with reasonable similarity\n        if similarity_score > 0.25:\
    \  # Lower threshold for better recall\n            result = {\n             \
    \   'path': doc.metadata['path'],\n                'content': doc.page_content[:1000],\
    \  # Limit content length\n                'relevance_score': similarity_score,\n\
    \                'distance_score': distance_score,\n                'headers':\
    \ doc.metadata.get('headers', ''),\n                'source': doc.metadata.get('source',\
    \ ''),\n                'content_length': len(doc.page_content)\n            }\n\
    \            search_results.append(result)\n    \n    # Sort by similarity score\
    \ (descending)\n    search_results.sort(key=lambda x: x['relevance_score'], reverse=True)\n\
    \    \n    # Take top 8 results\n    top_results = search_results[:8]\n    \n\
    \    print(f\"Returning {len(top_results)} most relevant results:\")\n    for\
    \ i, result in enumerate(top_results[:3]):\n        headers_info = f\" [{result['headers']}]\"\
    \ if result['headers'] else \"\"\n        print(f\"  {i+1}. {result['source'][:40]}...{headers_info}\
    \ (score: {result['relevance_score']:.3f})\")\n    \n    outputs = {\n       \
    \ 'search_results': top_results,\n        'search_query': search_query,\n    \
    \    'total_results': len(search_results),\n        'total_chunks': len(all_chunks),\n\
    \        'status': 'success'\n    }\n    \n    print(f\"__OUTPUTS__ {json.dumps(outputs)}\"\
    )\n    \nexcept Exception as e:\n    print(f\"Error in vector search: {str(e)}\"\
    )\n    import traceback\n    print(f\"Traceback: {traceback.format_exc()}\")\n\
    \    \n    outputs = {\n        'status': 'error',\n        'error': str(e),\n\
    \        'search_results': [],\n        'search_query': search_query,\n      \
    \  'total_results': 0\n    }\n    \n    print(f\"__OUTPUTS__ {json.dumps(outputs)}\"\
    )\n"
  depends_on:
  - fetch_documentation
  requirements:
  - langchain-chroma
  - langchain-community
  - langchain-text-splitters
  timeout_seconds: 180
- id: ai_assistant
  name: AI Documentation Assistant
  type: ai_agent
  config:
    user_message: |
      Please answer the following question about the workflow engine documentation:

      **User Question:** ${search_query}

      **Relevant Documentation Sections:**
      ${vector_search.search_results}

      Based on the documentation search results above, please provide a comprehensive answer to the user's question.
      Include specific examples and reference the relevant documentation sections.
    system_message: |
      You are a helpful AI assistant that answers questions about workflow engine documentation.

      Your role is to:
      1. Analyze the provided documentation search results
      2. Answer the user's question based on the relevant documentation content
      3. Provide clear, accurate, and helpful responses
      4. Include specific examples when possible
      5. Reference the documentation sections used in your answer

      If the search results don't contain enough information to answer the question,
      say so clearly and suggest what additional information might be needed.
    model_client_id: openrouter_kimi
  depends_on:
  - vector_search
  timeout_seconds: 60
inputs:
- name: search_query
  type: string
  required: true
  description: User's search query for documentation
outputs:
  ai_response:
    type: string
    source: ai_assistant.ai_response
    description: AI-generated answer based on documentation search
  search_results:
    type: object
    source: vector_search.search_results
    description: Relevant documentation sections found
description: Workflow that searches documentation, performs vector search, and provides
  AI-powered answers
model_clients:
- id: openrouter_kimi
  config:
    model: moonshotai/kimi-k2
    api_key: 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
    base_url: https://openrouter.ai/api/v1
  provider: openrouter
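
The vector_search task converts each ChromaDB distance into a similarity score with similarity = 1 / (1 + distance), keeps only results scoring above 0.25, sorts them, and returns at most eight. A small self-contained sketch of that ranking step; the (path, distance) pairs below are made up and stand in for real similarity_search_with_score results.

# Stand-alone sketch of the ranking logic used in vector_search; the distances
# are invented, real ones come from vector_store.similarity_search_with_score().
raw_results = [
    ("yaml_reference.tasks.chunk_0", 0.8),
    ("yaml_reference.inputs.chunk_2", 1.4),
    ("yaml_reference.overview.chunk_5", 4.2),
]

ranked = []
for path, distance in raw_results:
    similarity = 1.0 / (1.0 + distance)   # higher = more similar
    if similarity > 0.25:                 # same recall threshold as the task
        ranked.append({"path": path, "relevance_score": round(similarity, 3)})

ranked.sort(key=lambda r: r["relevance_score"], reverse=True)
top_results = ranked[:8]                  # the task returns at most 8 results
print(top_results)
# [{'path': 'yaml_reference.tasks.chunk_0', 'relevance_score': 0.556},
#  {'path': 'yaml_reference.inputs.chunk_2', 'relevance_score': 0.417}]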
Executions
Execution ID  Status     Started              Duration
5bb4fd71...   COMPLETED  2025-08-14 11:23:05  N/A
82e5215d...   COMPLETED  2025-08-13 08:23:07  N/A
ce96f7fa...   COMPLETED  2025-08-13 08:21:21  N/A
c6b2439e...   COMPLETED  2025-08-13 08:19:58  N/A
226238da...   COMPLETED  2025-08-13 06:57:52  N/A
c6f50413...   COMPLETED  2025-08-13 06:55:47  N/A
b43cdc36...   COMPLETED  2025-08-13 06:54:39  N/A
49613d8c...   COMPLETED  2025-08-11 08:06:23  N/A
6ff85e34...   COMPLETED  2025-08-08 11:26:21  N/A
32022384...   COMPLETED  2025-08-08 11:23:42  N/A
500ef903...   COMPLETED  2025-08-08 11:12:49  N/A
97431cba...   COMPLETED  2025-08-08 10:57:49  N/A
493c6edd...   COMPLETED  2025-08-08 10:28:42  N/A
17648457...   COMPLETED  2025-08-07 12:33:18  N/A
f2303d91...   COMPLETED  2025-08-07 12:24:17  N/A
ced44353...   COMPLETED  2025-08-07 11:50:11  N/A
21564bfc...   COMPLETED  2025-08-07 11:37:09  N/A
dce7edc8...   COMPLETED  2025-08-07 06:09:12  N/A
c6d99bee...   COMPLETED  2025-08-06 07:39:29  N/A
9243c9f6...   COMPLETED  2025-08-06 06:59:27  N/A