Documentation Search with AI Assistant
Workflow that searches documentation, performs vector search, and provides AI-powered answers
Workflow Information
ID: documentation_search_workflow
Namespace: default
Version: N/A
Created: 2025-07-30
Updated: 2025-08-13
Tasks: 3
Inputs
| Name | Type | Required | Default |
|---|---|---|---|
| search_query | string | Required | None |
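The workflow takes a single required input. An invocation payload would therefore look something like this (the query text and payload shape are illustrative, inferred from the inputs table):

```json
{
  "search_query": "How do I define a script task in the workflow YAML?"
}
```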
Outputs
| Name | Type | Description |
|---|---|---|
| ai_response | string | AI-generated answer based on documentation search |
| search_results | object | Relevant documentation sections found |
Tasks
- fetch_documentation (script): no description
- vector_search (script): no description
- ai_assistant (ai_agent): no description
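The script tasks hand results to the engine through a simple stdout contract: each script prints one line beginning with `__OUTPUTS__` followed by a JSON payload. A minimal sketch of how such a line can be parsed (this parser is illustrative; the engine's actual implementation is not shown on this page):

```python
import json

def parse_task_outputs(stdout: str) -> dict:
    """Return the JSON payload of the last __OUTPUTS__ line in a task's stdout.

    Illustrative sketch of the convention used by this workflow's script tasks,
    not the engine's actual parser.
    """
    marker = "__OUTPUTS__ "
    outputs = {}
    for line in stdout.splitlines():
        if line.startswith(marker):
            # Later markers win, matching the "last printed outputs" convention.
            outputs = json.loads(line[len(marker):])
    return outputs

stdout = 'Trying endpoint...\n__OUTPUTS__ {"status": "success", "documentation": {}}'
print(parse_task_outputs(stdout)["status"])  # prints: success
```

Because only the marked line is parsed, the scripts are free to print progress messages without corrupting their outputs.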
YAML Source
id: documentation_search_workflow
name: Documentation Search with AI Assistant
tasks:
- id: fetch_documentation
name: Fetch Documentation API
type: script
script: |
  import requests
  import json

  # Call documentation API endpoint - try multiple URLs for Docker environments
  try:
      # Try multiple API endpoints for different Docker networking scenarios
      api_endpoints = [
          "http://host.docker.internal:5000/api/docs/yaml-reference",   # Docker Desktop
          "http://172.17.0.1:5000/api/docs/yaml-reference",             # Docker bridge network
          "http://localhost:5000/api/docs/yaml-reference",              # Local development
          "http://0.0.0.0:5000/api/docs/yaml-reference",                # Bind-all address
          "http://127.0.0.1:5000/api/docs/yaml-reference",              # Loopback
          "https://workflow-dev.assistents.ai/api/docs/yaml-reference", # Production environment
      ]

      response = None
      successful_url = None

      for api_url in api_endpoints:
          try:
              print(f"Trying to fetch documentation from: {api_url}")
              response = requests.get(api_url, timeout=10)
              if response.status_code == 200:
                  successful_url = api_url
                  print(f"✅ Successfully connected to: {api_url}")
                  break
              else:
                  print(f"❌ Failed with status {response.status_code}: {api_url}")
          except requests.exceptions.RequestException as e:
              print(f"❌ Connection failed: {api_url} - {str(e)}")
              continue

      if response is None or response.status_code != 200:
          raise Exception("Could not connect to documentation API on any of the attempted URLs")

      # Process the successful response
      doc_data = response.json()
      print(f"Successfully fetched documentation from {successful_url}: {len(str(doc_data))} characters")

      # Store the documentation data for the next task
      outputs = {
          'documentation': doc_data,
          'status': 'success',
          'successful_url': successful_url,
      }

      # Print outputs in the required format
      print(f"__OUTPUTS__ {json.dumps(outputs)}")

  except Exception as e:
      print(f"Error fetching documentation: {str(e)}")
      outputs = {'status': 'error', 'error': str(e)}

      # Print outputs in the required format
      print(f"__OUTPUTS__ {json.dumps(outputs)}")
timeout_seconds: 120
- id: vector_search
name: ChromaDB Vector Search Documentation
type: script
script: |
  import json
  import re
  from langchain_chroma import Chroma
  from langchain_community.embeddings import JinaEmbeddings
  from langchain_text_splitters import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter

  def extract_text_content(data, path=None):
      """Recursively extract text content from nested data"""
      content_list = []
      current_path = path or "root"

      if isinstance(data, dict):
          for key, value in data.items():
              new_path = f"{current_path}.{key}" if current_path != "root" else key
              if isinstance(value, str) and len(value.strip()) > 50:
                  content_list.append({
                      'path': new_path,
                      'content': value.strip(),
                      'type': 'text'
                  })
              else:
                  content_list.extend(extract_text_content(value, new_path))
      elif isinstance(data, list):
          for i, item in enumerate(data):
              new_path = f"{current_path}[{i}]" if current_path != "root" else f"[{i}]"
              content_list.extend(extract_text_content(item, new_path))
      elif isinstance(data, str) and len(data.strip()) > 50:
          content_list.append({
              'path': current_path,
              'content': data.strip(),
              'type': 'text'
          })

      return content_list

  def split_text_by_structure(text, source_path):
      """Split text by headings and subheadings using LangChain splitters"""
      chunks = []

      # Check if text contains markdown headers
      if re.search(r'^#{1,6}\s+', text, re.MULTILINE):
          # Use MarkdownHeaderTextSplitter for markdown content
          headers_to_split_on = [
              ("#", "Header 1"),
              ("##", "Header 2"),
              ("###", "Header 3"),
              ("####", "Header 4"),
              ("#####", "Header 5"),
              ("######", "Header 6"),
          ]

          markdown_splitter = MarkdownHeaderTextSplitter(
              headers_to_split_on=headers_to_split_on,
              strip_headers=False
          )

          try:
              md_header_splits = markdown_splitter.split_text(text)

              # Further split large chunks using RecursiveCharacterTextSplitter
              text_splitter = RecursiveCharacterTextSplitter(
                  chunk_size=800,     # Optimal size for embeddings
                  chunk_overlap=100,  # Overlap to maintain context
                  separators=["\n\n", "\n", ". ", " ", ""]
              )

              final_splits = text_splitter.split_documents(md_header_splits)

              for i, split in enumerate(final_splits):
                  # Preserve header metadata
                  header_info = []
                  for key, value in split.metadata.items():
                      if key.startswith('Header'):
                          header_info.append(f"{key}: {value}")

                  header_context = " > ".join(header_info) if header_info else ""

                  chunks.append({
                      'content': split.page_content,
                      'path': f"{source_path}.chunk_{i}",
                      'headers': header_context,
                      'chunk_index': i,
                      'source': source_path
                  })

          except Exception as e:
              print(f"Markdown splitting failed for {source_path}, using fallback: {e}")
              # Fallback to simple splitting
              fallback_splitter = RecursiveCharacterTextSplitter(
                  chunk_size=800,
                  chunk_overlap=100
              )
              simple_splits = fallback_splitter.split_text(text)

              for i, split in enumerate(simple_splits):
                  chunks.append({
                      'content': split,
                      'path': f"{source_path}.chunk_{i}",
                      'headers': "",
                      'chunk_index': i,
                      'source': source_path
                  })
      else:
          # Use RecursiveCharacterTextSplitter for plain text
          text_splitter = RecursiveCharacterTextSplitter(
              chunk_size=800,
              chunk_overlap=100,
              separators=["\n\n", "\n", ". ", " ", ""]
          )

          splits = text_splitter.split_text(text)

          for i, split in enumerate(splits):
              chunks.append({
                  'content': split,
                  'path': f"{source_path}.chunk_{i}",
                  'headers': "",
                  'chunk_index': i,
                  'source': source_path
              })

      return chunks

  # Bind the query before the try block so the error handler can reference it
  search_query = "${search_query}"

  try:
      # Get documentation from previous task
      fetch_result = ${fetch_documentation}
      documentation = fetch_result.get('documentation', {})

      print(f"Processing vector search for query: '{search_query}'")

      # Extract all text content from documentation
      all_content = extract_text_content(documentation)
      print(f"Extracted {len(all_content)} content sections from documentation")

      if not all_content:
          print("No content found in documentation")
          outputs = {
              'search_results': [],
              'search_query': search_query,
              'total_results': 0,
              'status': 'success',
              'message': 'No content found in documentation'
          }
          print(f"__OUTPUTS__ {json.dumps(outputs)}")
          exit()

      # Split text content by structure (headings/subheadings)
      all_chunks = []
      for content_item in all_content:
          all_chunks.extend(split_text_by_structure(content_item['content'], content_item['path']))

      print(f"Split into {len(all_chunks)} semantic chunks")

      # Initialize Jina embeddings
      print("Initializing embeddings...")
      embeddings = JinaEmbeddings(
          jina_api_key="<JINA_API_KEY>",  # supply via environment or secrets store
          model_name="jina-embeddings-v3"
      )

      # Create in-memory ChromaDB vector store
      vector_store = Chroma(embedding_function=embeddings)

      # Prepare documents and metadata for embedding
      documents = [chunk['content'] for chunk in all_chunks]
      metadatas = [{
          'path': chunk['path'],
          'headers': chunk['headers'],
          'source': chunk['source'],
          'chunk_index': chunk['chunk_index'],
          'length': len(chunk['content'])
      } for chunk in all_chunks]

      print(f"Embedding {len(documents)} text chunks...")

      # Add documents to vector store (this creates embeddings)
      vector_store.add_texts(texts=documents, metadatas=metadatas)

      print("Performing semantic similarity search...")

      # Perform vector similarity search
      results = vector_store.similarity_search_with_score(
          query=search_query,
          k=min(12, len(documents))  # Get more results for better coverage
      )

      print(f"Found {len(results)} semantic matches")

      # Process and rank results
      search_results = []

      for doc, distance_score in results:
          # Convert distance to similarity score (higher = more similar)
          similarity_score = 1.0 / (1.0 + distance_score)

          # Include results with reasonable similarity
          if similarity_score > 0.25:  # Lower threshold for better recall
              search_results.append({
                  'path': doc.metadata['path'],
                  'content': doc.page_content[:1000],  # Limit content length
                  'relevance_score': similarity_score,
                  'distance_score': distance_score,
                  'headers': doc.metadata.get('headers', ''),
                  'source': doc.metadata.get('source', ''),
                  'content_length': len(doc.page_content)
              })

      # Sort by similarity score (descending)
      search_results.sort(key=lambda x: x['relevance_score'], reverse=True)

      # Take top 8 results
      top_results = search_results[:8]

      print(f"Returning {len(top_results)} most relevant results:")
      for i, result in enumerate(top_results[:3]):
          headers_info = f" [{result['headers']}]" if result['headers'] else ""
          print(f"  {i+1}. {result['source'][:40]}...{headers_info} (score: {result['relevance_score']:.3f})")

      outputs = {
          'search_results': top_results,
          'search_query': search_query,
          'total_results': len(search_results),
          'total_chunks': len(all_chunks),
          'status': 'success'
      }

      print(f"__OUTPUTS__ {json.dumps(outputs)}")

  except Exception as e:
      print(f"Error in vector search: {str(e)}")
      import traceback
      print(f"Traceback: {traceback.format_exc()}")

      outputs = {
          'status': 'error',
          'error': str(e),
          'search_results': [],
          'search_query': search_query,
          'total_results': 0
      }

      print(f"__OUTPUTS__ {json.dumps(outputs)}")
depends_on:
- fetch_documentation
requirements:
- langchain-chroma
- langchain-community
- langchain-text-splitters
timeout_seconds: 180
- id: ai_assistant
name: AI Documentation Assistant
type: ai_agent
config:
user_message: |
  Please answer the following question about the workflow engine documentation:

  **User Question:** ${search_query}

  **Relevant Documentation Sections:**
  ${vector_search.search_results}

  Based on the documentation search results above, please provide a comprehensive answer to the user's question.
  Include specific examples and reference the relevant documentation sections.
system_message: |
  You are a helpful AI assistant that answers questions about workflow engine documentation.

  Your role is to:
  1. Analyze the provided documentation search results
  2. Answer the user's question based on the relevant documentation content
  3. Provide clear, accurate, and helpful responses
  4. Include specific examples when possible
  5. Reference the documentation sections used in your answer

  If the search results don't contain enough information to answer the question,
  say so clearly and suggest what additional information might be needed.
model_client_id: openrouter_kimi
depends_on:
- vector_search
timeout_seconds: 60
inputs:
- name: search_query
type: string
required: true
description: User's search query for documentation
outputs:
ai_response:
type: string
source: ai_assistant.ai_response
description: AI-generated answer based on documentation search
search_results:
type: object
source: vector_search.search_results
description: Relevant documentation sections found
description: Workflow that searches documentation, performs vector search, and provides
AI-powered answers
model_clients:
- id: openrouter_kimi
config:
model: moonshotai/kimi-k2
api_key: <OPENROUTER_API_KEY> # supply via environment or secrets store
base_url: https://openrouter.ai/api/v1
provider: openrouter
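The ranking step in `vector_search` converts ChromaDB's raw distance into a similarity score with `1 / (1 + distance)` and keeps results whose score exceeds 0.25; because the mapping is strictly decreasing, that threshold is equivalent to keeping distances below 3.0. A standalone sketch of the conversion:

```python
def distance_to_similarity(distance: float) -> float:
    """Map a non-negative distance onto (0, 1]; lower distance scores higher.

    Same formula as the vector_search task's ranking step.
    """
    return 1.0 / (1.0 + distance)

# An exact match (distance 0) scores 1.0; the 0.25 cutoff corresponds
# to a distance of exactly 3.0, so only distances below 3.0 survive.
print(distance_to_similarity(0.0))  # 1.0
print(distance_to_similarity(3.0))  # 0.25
```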
| Execution ID | Status | Started | Duration | Actions |
|---|---|---|---|---|
| 5bb4fd71... | COMPLETED | 2025-08-14 11:23:05 | N/A | View |
| 82e5215d... | COMPLETED | 2025-08-13 08:23:07 | N/A | View |
| ce96f7fa... | COMPLETED | 2025-08-13 08:21:21 | N/A | View |
| c6b2439e... | COMPLETED | 2025-08-13 08:19:58 | N/A | View |
| 226238da... | COMPLETED | 2025-08-13 06:57:52 | N/A | View |
| c6f50413... | COMPLETED | 2025-08-13 06:55:47 | N/A | View |
| b43cdc36... | COMPLETED | 2025-08-13 06:54:39 | N/A | View |
| 49613d8c... | COMPLETED | 2025-08-11 08:06:23 | N/A | View |
| 6ff85e34... | COMPLETED | 2025-08-08 11:26:21 | N/A | View |
| 32022384... | COMPLETED | 2025-08-08 11:23:42 | N/A | View |
| 500ef903... | COMPLETED | 2025-08-08 11:12:49 | N/A | View |
| 97431cba... | COMPLETED | 2025-08-08 10:57:49 | N/A | View |
| 493c6edd... | COMPLETED | 2025-08-08 10:28:42 | N/A | View |
| 17648457... | COMPLETED | 2025-08-07 12:33:18 | N/A | View |
| f2303d91... | COMPLETED | 2025-08-07 12:24:17 | N/A | View |
| ced44353... | COMPLETED | 2025-08-07 11:50:11 | N/A | View |
| 21564bfc... | COMPLETED | 2025-08-07 11:37:09 | N/A | View |
| dce7edc8... | COMPLETED | 2025-08-07 06:09:12 | N/A | View |
| c6d99bee... | COMPLETED | 2025-08-06 07:39:29 | N/A | View |
| 9243c9f6... | COMPLETED | 2025-08-06 06:59:27 | N/A | View |