Google Drive Single Excel File Analysis
Single Excel file analysis workflow with Google Drive integration: target sheets are extracted in a loop, staged as a JSON results file, and then processed with Gemini through dynamically built Pydantic models to produce a consolidated analysis workbook.
Workflow Information
ID: excel_single_file_analysis_clean
Namespace: excel_analysis
Version: 2.0
Created: 2025-07-31
Updated: 2025-07-31
Tasks: 3
Inputs
| Name | Type | Required | Default |
|---|---|---|---|
| nango_connection_id | string | Required | 4274993f-c614-4efa-a01e-8d07422f4b09 |
| nango_key | string | Required | 8df3e2de-2307-48d3-94bd-ddd3fd6a62ec |
| file_id | string | Required | 1Glz-mTMDCuGzUGoTLWOrSyIMI1sJDYlA |
| analysis_depth | string | Optional | comprehensive |
| target_sheets | string | Optional | Volumes,Power & Fuel,Per tonne,P&L,Operating metrics,Cash Flow,Balance Sheet,Financial Ratios,Valuation |
| results_file_name | string | Optional | Excel_Analysis_Results |
| analysis_years | string | Optional | FY2024 |
| gemini_api_key | string | Required | AIzaSyB0_e6aU4gF-qRapMm3UYBSITpbd0ehsYk |
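At runtime the two Nango inputs are exchanged for a short-lived Google Drive access token before any file access happens. Below is a minimal sketch of that exchange; the auth host and `provider_config_key` are taken verbatim from the task scripts in the YAML source further down, while `get_drive_access_token` is an illustrative name rather than the helper defined there.

```python
import requests

def get_drive_access_token(connection_id: str, nango_key: str) -> str:
    """Exchange a Nango connection for a Google Drive access token.

    Mirrors the authenticate_google_drive() helper that each task script
    defines; raises if the response carries no credentials.
    """
    url = (
        f"https://auth-dev.assistents.ai/connection/{connection_id}"
        "?provider_config_key=google-drive-hq3h"
    )
    headers = {"Authorization": f"Bearer {nango_key}"}
    response = requests.get(url, headers=headers, timeout=15)
    response.raise_for_status()
    token = response.json().get("credentials", {}).get("access_token")
    if not token:
        raise RuntimeError("No access token received from Nango")
    return token
```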
Outputs
| Name | Type | Description |
|---|---|---|
| excel_created | boolean | Whether the Excel file was successfully created |
| final_excel_url | string | Direct link to the final Excel analysis file |
| sheets_available | integer | Number of sheets available for processing |
| sheets_processed | integer | Number of sheets successfully processed |
| processing_status | string | Overall processing status |
| total_extractions | integer | Total number of data extractions performed |
| final_excel_file_id | string | Google Drive file ID of the final Excel analysis |
| json_download_status | string | Status of the JSON file download |
Tasks
- analyze_excel_file (script): Complete Excel analysis workflow in a single readable script.
- download_json_results (script): Download and parse the JSON results file.
- create_dynamic_models_and_process (script): Create dynamic Pydantic models from JSON and process with Gemini for each analysis year (see the sketch after this list).
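The dynamic-model step is the least obvious of the three, so here is a condensed sketch of the technique task 3 uses: every label extracted from a sheet becomes a string field on a runtime-built Pydantic model, which the task then hands to Gemini as a structured-output schema. The field-name cleaning below is simplified from the full script, and `build_sheet_model` is an illustrative name, not the function in the YAML.

```python
from pydantic import Field, create_model

def build_sheet_model(sheet_name: str, labels: list[str]):
    """Turn extracted row labels into a Pydantic model for structured extraction.

    Simplified from create_dynamic_pydantic_model() in the YAML source; the
    name normalization there handles a few more punctuation cases.
    """
    fields = {}
    for label in labels:
        # Normalize e.g. "Net debt / EBITDA (x)" -> "net_debt_ebitda_x"
        name = label.lower().replace("%", "percent")
        name = "".join(c if c.isalnum() else "_" for c in name)
        while "__" in name:
            name = name.replace("__", "_")
        name = name.strip("_")
        if name and name[0].isdigit():
            name = f"field_{name}"
        if name:
            fields[name] = (str, Field(description=label))
    class_name = "".join(c for c in sheet_name.title() if c.isalnum()) + "Model"
    return create_model(class_name, **fields)

# model = build_sheet_model("Cash Flow", ["Net profit", "Capex", "Free cash flow"])
# Task 3 passes the resulting class to Gemini via
# GenerationConfig(response_mime_type="application/json", response_schema=model).
```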
YAML Source
id: excel_single_file_analysis_clean
name: Clean Excel File Analysis with Drive Upload
tasks:
- id: analyze_excel_file
name: Analyze Excel File and Upload Results
type: script
script: "import json\nimport requests\nimport pandas as pd\nimport io\nfrom datetime\
\ import datetime\nfrom typing import Dict, List, Any\n\n# =============================================================================\n\
# CONFIGURATION\n# =============================================================================\n\
def get_configuration():\n \"\"\"Get and validate input configuration.\"\"\"\
\n config = {\n 'nango_connection_id': \"${nango_connection_id}\",\n\
\ 'nango_key': \"${nango_key}\",\n 'file_id': \"${file_id}\".strip(),\n\
\ 'analysis_depth': \"${analysis_depth}\",\n 'target_sheets': [sheet.strip()\
\ for sheet in \"${target_sheets}\".split(',') if sheet.strip()],\n 'results_file_name':\
\ \"${results_file_name}\"\n }\n \n print(\"\U0001F4CB EXCEL FILE ANALYSIS\
\ CONFIGURATION\")\n print(\"=\" * 50)\n print(f\"File ID: {config['file_id']}\"\
)\n print(f\"Analysis Depth: {config['analysis_depth']}\")\n print(f\"Target\
\ Sheets: {len(config['target_sheets'])}\")\n for i, sheet in enumerate(config['target_sheets'],\
\ 1):\n print(f\" {i}. {sheet}\")\n print(f\"Results File Name: {config['results_file_name']}\"\
)\n print()\n \n return config\n\n# =============================================================================\n\
# GOOGLE DRIVE AUTHENTICATION\n# =============================================================================\n\
def authenticate_google_drive(nango_connection_id: str, nango_key: str) -> str:\n\
\ \"\"\"Authenticate with Google Drive and return access token.\"\"\"\n \
\ print(\"\U0001F510 Authenticating with Google Drive...\")\n \n auth_url\
\ = f\"https://auth-dev.assistents.ai/connection/{nango_connection_id}?provider_config_key=google-drive-hq3h\"\
\n headers = {\n 'Authorization': f'Bearer {nango_key}',\n 'Content-Type':\
\ 'application/json'\n }\n \n try:\n response = requests.get(auth_url,\
\ headers=headers, timeout=15)\n response.raise_for_status()\n \n\
\ auth_data = response.json()\n access_token = auth_data.get('credentials',\
\ {}).get('access_token')\n \n if not access_token:\n \
\ raise Exception(\"No access token received from authentication\")\n \
\ \n print(\"\u2705 Authentication successful\")\n return access_token\n\
\ \n except Exception as e:\n raise Exception(f\"Authentication\
\ failed: {str(e)}\")\n\n# =============================================================================\n\
# FILE DOWNLOAD AND INFO\n# =============================================================================\n\
def get_file_info(file_id: str, access_token: str) -> Dict[str, Any]:\n \"\"\
\"Get file information from Google Drive.\"\"\"\n print(\"\U0001F4C4 Getting\
\ file information...\")\n \n headers = {'Authorization': f'Bearer {access_token}'}\n\
\ url = f\"https://www.googleapis.com/drive/v3/files/{file_id}\"\n params\
\ = {'fields': 'id,name,mimeType,size,createdTime,modifiedTime'}\n \n try:\n\
\ response = requests.get(url, headers=headers, params=params, timeout=15)\n\
\ response.raise_for_status()\n \n file_info = response.json()\n\
\ print(f\"\u2705 File: {file_info['name']}\")\n print(f\" Type:\
\ {file_info['mimeType']}\")\n print(f\" Size: {file_info.get('size',\
\ 'N/A')} bytes\")\n \n return file_info\n \n except Exception\
\ as e:\n raise Exception(f\"Failed to get file info: {str(e)}\")\n\ndef\
\ download_excel_file(file_id: str, file_info: Dict[str, Any], access_token: str)\
\ -> bytes:\n \"\"\"Download Excel file content.\"\"\"\n print(\"\u2B07\uFE0F\
\ Downloading Excel file...\")\n \n headers = {'Authorization': f'Bearer\
\ {access_token}'}\n mime_type = file_info.get('mimeType')\n \n # Handle\
\ Google Sheets vs Excel files\n if mime_type == 'application/vnd.google-apps.spreadsheet':\n\
\ url = f\"https://www.googleapis.com/drive/v3/files/{file_id}/export\"\
\n params = {'mimeType': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'}\n\
\ else:\n url = f\"https://www.googleapis.com/drive/v3/files/{file_id}\"\
\n params = {'alt': 'media'}\n \n try:\n response = requests.get(url,\
\ headers=headers, params=params, timeout=60)\n response.raise_for_status()\n\
\ \n print(\"\u2705 File downloaded successfully\")\n return\
\ response.content\n \n except Exception as e:\n raise Exception(f\"\
Failed to download file: {str(e)}\")\n\n# =============================================================================\n\
# EXCEL ANALYSIS\n# =============================================================================\n\
def analyze_excel_structure(file_content: bytes, target_sheets: List[str]) ->\
\ Dict[str, Any]:\n \"\"\"Analyze Excel file structure and identify target\
\ sheets.\"\"\"\n print(\"\U0001F50D Analyzing Excel file structure...\")\n\
\ \n try:\n excel_file = pd.ExcelFile(io.BytesIO(file_content))\n\
\ available_sheets = excel_file.sheet_names\n \n print(f\"\
\u2705 Found {len(available_sheets)} sheets:\")\n for i, sheet in enumerate(available_sheets,\
\ 1):\n print(f\" {i}. {sheet}\")\n \n # Match target\
\ sheets with available sheets (case-insensitive)\n matched_sheets = []\n\
\ for available_sheet in available_sheets:\n for target_sheet\
\ in target_sheets:\n if available_sheet.lower().strip() == target_sheet.lower().strip():\n\
\ matched_sheets.append({\n 'sheet_name':\
\ available_sheet,\n 'target_name': target_sheet\n \
\ })\n break\n \n print(f\"\\\
n\U0001F3AF Target sheets found: {len(matched_sheets)}\")\n for sheet in\
\ matched_sheets:\n print(f\" \u2713 {sheet['sheet_name']} (matches\
\ '{sheet['target_name']}')\")\n \n return {\n 'excel_file':\
\ excel_file,\n 'available_sheets': available_sheets,\n \
\ 'matched_sheets': matched_sheets\n }\n \n except Exception\
\ as e:\n raise Exception(f\"Failed to analyze Excel structure: {str(e)}\"\
)\n\ndef extract_sheet_data(excel_file, sheet_name: str, analysis_depth: str)\
\ -> Dict[str, Any]:\n \"\"\"Extract data from a specific sheet based on analysis\
\ depth.\"\"\"\n print(f\"\U0001F4CA Analyzing sheet: {sheet_name}\")\n \
\ \n try:\n df = pd.read_excel(excel_file, sheet_name=sheet_name, header=None)\n\
\ \n if df.empty:\n print(f\" \u26A0\uFE0F Sheet '{sheet_name}'\
\ is empty\")\n return {\n 'sheet_name': sheet_name,\n\
\ 'status': 'empty',\n 'values_extracted': [],\n\
\ 'total_values': 0,\n 'dimensions': {'rows': 0,\
\ 'columns': 0}\n }\n \n print(f\" \U0001F4CF Dimensions:\
\ {df.shape[0]} rows \xD7 {df.shape[1]} columns\")\n \n # Extract\
\ data based on analysis depth\n extracted_values = []\n \n \
\ if analysis_depth == \"basic\":\n # Basic: Only first column,\
\ first 10 rows\n if len(df.columns) > 0:\n col_values\
\ = df.iloc[:10, 0].dropna().astype(str).str.strip()\n extracted_values.extend([v\
\ for v in col_values if v and not v.startswith('Unnamed:')])\n \
\ \n elif analysis_depth == \"comprehensive\":\n # Comprehensive:\
\ First 2 columns, all rows\n for col_idx in range(min(2, len(df.columns))):\n\
\ col_values = df.iloc[:, col_idx].dropna().astype(str).str.strip()\n\
\ extracted_values.extend([v for v in col_values if v and not v.startswith('Unnamed:')])\n\
\ \n else: # detailed\n # Detailed: First 3\
\ columns, all rows, plus data type info\n for col_idx in range(min(3,\
\ len(df.columns))):\n col_values = df.iloc[:, col_idx].dropna().astype(str).str.strip()\n\
\ extracted_values.extend([v for v in col_values if v and not v.startswith('Unnamed:')])\n\
\ \n # Remove duplicates while preserving order\n unique_values\
\ = []\n seen = set()\n for value in extracted_values:\n \
\ if value not in seen:\n seen.add(value)\n \
\ unique_values.append(value)\n \n print(f\" \u2705 Extracted {len(unique_values)}\
\ unique values\")\n \n result = {\n 'sheet_name': sheet_name,\n\
\ 'status': 'success',\n 'values_extracted': unique_values,\n\
\ 'total_values': len(unique_values),\n 'dimensions': {'rows':\
\ df.shape[0], 'columns': df.shape[1]},\n 'analysis_depth': analysis_depth\n\
\ }\n \n if analysis_depth == \"detailed\":\n \
\ # Add data type analysis for detailed mode\n data_types = {}\n \
\ for col_idx in range(min(3, len(df.columns))):\n col_data\
\ = df.iloc[:, col_idx].dropna()\n if len(col_data) > 0:\n \
\ data_types[f\"column_{col_idx}\"] = {\n \
\ \"data_type\": str(col_data.dtype),\n \"sample_values\"\
: col_data.head(3).tolist()\n }\n result['data_type_analysis']\
\ = data_types\n \n return result\n \n except Exception\
\ as e:\n print(f\" \u274C Error analyzing sheet '{sheet_name}': {str(e)}\"\
)\n return {\n 'sheet_name': sheet_name,\n 'status':\
\ 'error',\n 'error': str(e),\n 'total_values': 0\n \
\ }\n\ndef process_all_sheets(excel_analysis: Dict[str, Any], analysis_depth:\
\ str) -> Dict[str, Any]:\n \"\"\"Process all matched sheets and compile results.\"\
\"\"\n print(\"\\n\U0001F4DA Processing all target sheets...\")\n print(\"\
=\" * 40)\n \n excel_file = excel_analysis['excel_file']\n matched_sheets\
\ = excel_analysis['matched_sheets']\n \n if not matched_sheets:\n \
\ print(\"\u26A0\uFE0F No target sheets found to process\")\n return\
\ {\n 'processed_sheets': [],\n 'total_values_extracted':\
\ 0,\n 'processing_errors': [],\n 'success_count': 0\n \
\ }\n \n processed_sheets = []\n total_values = 0\n processing_errors\
\ = []\n success_count = 0\n \n for i, sheet_info in enumerate(matched_sheets,\
\ 1):\n print(f\"\\n\U0001F504 Processing sheet {i}/{len(matched_sheets)}:\
\ {sheet_info['sheet_name']}\")\n \n sheet_result = extract_sheet_data(excel_file,\
\ sheet_info['sheet_name'], analysis_depth)\n processed_sheets.append(sheet_result)\n\
\ \n if sheet_result['status'] == 'success':\n total_values\
\ += sheet_result['total_values']\n success_count += 1\n elif\
\ sheet_result['status'] == 'error':\n processing_errors.append(sheet_result)\n\
\ \n print(f\"\\n\u2705 Processing complete!\")\n print(f\" Sheets\
\ processed: {len(processed_sheets)}\")\n print(f\" Successful: {success_count}\"\
)\n print(f\" Errors: {len(processing_errors)}\")\n print(f\" Total\
\ values extracted: {total_values}\")\n \n return {\n 'processed_sheets':\
\ processed_sheets,\n 'total_values_extracted': total_values,\n \
\ 'processing_errors': processing_errors,\n 'success_count': success_count\n\
\ }\n\n# =============================================================================\n\
# RESULTS UPLOAD\n# =============================================================================\n\
def upload_results_to_drive(results: Dict[str, Any], file_id: str, file_info:\
\ Dict[str, Any], \n config: Dict[str, Any], access_token:\
\ str) -> Dict[str, Any]:\n \"\"\"Upload comprehensive results to Google Drive\
\ as JSON file.\"\"\"\n print(\"\\n\U0001F4E4 Uploading results to Google Drive...\"\
)\n print(\"=\" * 40)\n \n # Create comprehensive results structure\n\
\ comprehensive_results = {\n \"metadata\": {\n \"workflow_id\"\
: \"excel_single_file_analysis_clean\",\n \"execution_timestamp\":\
\ datetime.now().isoformat(),\n \"analysis_depth\": config['analysis_depth'],\n\
\ \"source_file_id\": file_id,\n \"target_sheets_requested\"\
: config['target_sheets']\n },\n \"execution_summary\": {\n \
\ \"file_id\": file_id,\n \"file_name\": file_info.get('name',\
\ 'Unknown'),\n \"file_size\": file_info.get('size', 'N/A'),\n \
\ \"sheets_processed\": len(results['processed_sheets']),\n \
\ \"successful_sheets\": results['success_count'],\n \"failed_sheets\"\
: len(results['processing_errors']),\n \"total_values_extracted\":\
\ results['total_values_extracted'],\n \"processing_timestamp\": datetime.now().isoformat()\n\
\ },\n \"detailed_results\": {\n \"sheet_analysis\":\
\ results['processed_sheets'],\n \"processing_errors\": results['processing_errors']\n\
\ },\n \"source_file_metadata\": file_info\n }\n \n # Convert\
\ to JSON\n json_content = json.dumps(comprehensive_results, indent=2, ensure_ascii=False)\n\
\ print(f\"\U0001F4C4 Generated results JSON ({len(json_content)} characters)\"\
)\n \n # Create filename with timestamp\n timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')\n\
\ filename = f\"{config['results_file_name']}_{timestamp}.json\"\n \n \
\ try:\n # Upload to Google Drive using multipart upload\n boundary\
\ = '-------314159265358979323846'\n delimiter = f'\\r\\n--{boundary}\\\
r\\n'\n close_delim = f'\\r\\n--{boundary}--'\n \n metadata\
\ = {\n 'name': filename,\n 'parents': [] # Upload to root\
\ folder\n }\n \n body = delimiter + 'Content-Type: application/json\\\
r\\n\\r\\n'\n body += json.dumps(metadata)\n body += delimiter +\
\ 'Content-Type: application/json\\r\\n\\r\\n'\n body += json_content\n\
\ body += close_delim\n \n headers = {\n 'Authorization':\
\ f'Bearer {access_token}',\n 'Content-Type': f'multipart/related;\
\ boundary=\"{boundary}\"'\n }\n \n upload_url = 'https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart'\n\
\ response = requests.post(upload_url, headers=headers, data=body, timeout=60)\n\
\ response.raise_for_status()\n \n upload_result = response.json()\n\
\ uploaded_file_id = upload_result['id']\n file_url = f\"https://drive.google.com/file/d/{uploaded_file_id}/view\"\
\n \n print(\"\u2705 Results uploaded successfully!\")\n \
\ print(f\" File ID: {uploaded_file_id}\")\n print(f\" Filename: {filename}\"\
)\n print(f\" URL: {file_url}\")\n \n return {\n \
\ 'upload_success': True,\n 'uploaded_file_id': uploaded_file_id,\n\
\ 'filename': filename,\n 'file_url': file_url,\n \
\ 'file_size_bytes': len(json_content)\n }\n \n except Exception\
\ as e:\n print(f\"\u274C Failed to upload results: {str(e)}\")\n \
\ return {\n 'upload_success': False,\n 'error': str(e)\n\
\ }\n\n# =============================================================================\n\
# MAIN EXECUTION\n# =============================================================================\n\
def main():\n \"\"\"Main execution function.\"\"\"\n try:\n # Get\
\ configuration\n config = get_configuration()\n \n # Validate\
\ file ID\n if not config['file_id'] or len(config['file_id']) < 10:\n\
\ raise Exception(\"Invalid file ID provided\")\n \n \
\ # Authenticate with Google Drive\n access_token = authenticate_google_drive(config['nango_connection_id'],\
\ config['nango_key'])\n \n # Get file information\n file_info\
\ = get_file_info(config['file_id'], access_token)\n \n # Download\
\ Excel file\n file_content = download_excel_file(config['file_id'], file_info,\
\ access_token)\n \n # Analyze Excel structure\n excel_analysis\
\ = analyze_excel_structure(file_content, config['target_sheets'])\n \n\
\ # Process all sheets\n processing_results = process_all_sheets(excel_analysis,\
\ config['analysis_depth'])\n \n # Upload results to Google Drive\n\
\ upload_results = upload_results_to_drive(\n processing_results,\
\ config['file_id'], file_info, config, access_token\n )\n \n \
\ # Compile final output\n final_results = {\n 'status':\
\ 'success',\n 'file_id': config['file_id'],\n 'file_name':\
\ file_info.get('name', 'Unknown'),\n 'analysis_depth': config['analysis_depth'],\n\
\ 'sheets_processed': len(processing_results['processed_sheets']),\n\
\ 'successful_sheets': processing_results['success_count'],\n \
\ 'total_values_extracted': processing_results['total_values_extracted'],\n\
\ 'upload_success': upload_results['upload_success'],\n \
\ 'uploaded_file_id': upload_results.get('uploaded_file_id'),\n 'results_file_url':\
\ upload_results.get('file_url'),\n 'results_filename': upload_results.get('filename'),\n\
\ 'processing_timestamp': datetime.now().isoformat()\n }\n \
\ \n if not upload_results['upload_success']:\n final_results['upload_error']\
\ = upload_results.get('error')\n \n print(\"\\n\" + \"=\" * 70)\n\
\ print(\"\U0001F389 EXCEL ANALYSIS COMPLETE!\")\n print(\"=\" *\
\ 70)\n print(f\"\u2705 File: {final_results['file_name']}\")\n \
\ print(f\"\U0001F4CA Sheets processed: {final_results['sheets_processed']}\"\
)\n print(f\"\U0001F4C8 Values extracted: {final_results['total_values_extracted']}\"\
)\n \n if final_results['upload_success']:\n print(f\"\
\U0001F4E4 Results uploaded: {final_results['results_filename']}\")\n \
\ print(f\"\U0001F517 Access at: {final_results['results_file_url']}\")\n \
\ else:\n print(f\"\u26A0\uFE0F Upload failed: {final_results.get('upload_error',\
\ 'Unknown error')}\")\n \n return final_results\n \n \
\ except Exception as e:\n error_result = {\n 'status': 'error',\n\
\ 'error': str(e),\n 'file_id': config.get('file_id', 'unknown')\
\ if 'config' in locals() else 'unknown',\n 'sheets_processed': 0,\n\
\ 'total_values_extracted': 0,\n 'upload_success': False,\n\
\ 'processing_timestamp': datetime.now().isoformat()\n }\n \
\ \n print(f\"\\n\u274C ANALYSIS FAILED: {str(e)}\")\n return\
\ error_result\n\n# Execute main function and output results\nresults = main()\n\
print(f\"\\n__OUTPUTS__ {json.dumps(results)}\")\n"
description: Complete Excel analysis workflow in a single readable script
requirements:
- pandas
- openpyxl
- requests
timeout_seconds: 300
- id: download_json_results
name: Download JSON Results from Drive
type: script
script: "import json\nimport requests\nfrom typing import Dict, List, Any\n\ndef\
\ authenticate_google_drive(nango_connection_id: str, nango_key: str) -> str:\n\
\ \"\"\"Authenticate with Google Drive and return access token.\"\"\"\n \
\ print(\"\U0001F510 Authenticating with Google Drive...\")\n \n auth_url\
\ = f\"https://auth-dev.assistents.ai/connection/{nango_connection_id}?provider_config_key=google-drive-hq3h\"\
\n headers = {\n 'Authorization': f'Bearer {nango_key}',\n 'Content-Type':\
\ 'application/json'\n }\n \n try:\n response = requests.get(auth_url,\
\ headers=headers, timeout=15)\n response.raise_for_status()\n \n\
\ auth_data = response.json()\n access_token = auth_data.get('credentials',\
\ {}).get('access_token')\n \n if not access_token:\n \
\ raise Exception(\"No access token received from authentication\")\n \
\ \n print(\"\u2705 Authentication successful\")\n return access_token\n\
\ \n except Exception as e:\n raise Exception(f\"Authentication\
\ failed: {str(e)}\")\n\ndef download_json_file(file_id: str, access_token: str)\
\ -> Dict[str, Any]:\n \"\"\"Download JSON file from Google Drive.\"\"\"\n\
\ print(\"\U0001F4E5 Downloading JSON results file...\")\n \n headers\
\ = {'Authorization': f'Bearer {access_token}'}\n url = f\"https://www.googleapis.com/drive/v3/files/{file_id}\"\
\n params = {'alt': 'media'}\n \n try:\n response = requests.get(url,\
\ headers=headers, params=params, timeout=60)\n response.raise_for_status()\n\
\ \n json_data = response.json()\n print(\"\u2705 JSON file\
\ downloaded and parsed successfully\")\n return json_data\n \n\
\ except Exception as e:\n raise Exception(f\"Failed to download JSON\
\ file: {str(e)}\")\n\ndef main():\n try:\n # Get configuration\n \
\ nango_connection_id = \"${nango_connection_id}\"\n nango_key = \"\
${nango_key}\"\n results_file_id = \"${analyze_excel_file.uploaded_file_id}\"\
\n \n print(\"\U0001F4CB DOWNLOADING JSON RESULTS\")\n print(\"\
=\" * 50)\n print(f\"File ID: {results_file_id}\")\n \n #\
\ Authenticate and download\n access_token = authenticate_google_drive(nango_connection_id,\
\ nango_key)\n json_data = download_json_file(results_file_id, access_token)\n\
\ \n # Extract key information\n execution_summary = json_data.get('execution_summary',\
\ {})\n detailed_results = json_data.get('detailed_results', {})\n \
\ sheet_analysis = detailed_results.get('sheet_analysis', [])\n \n \
\ print(f\"\U0001F4CA Found {len(sheet_analysis)} sheets to process\")\n\
\ \n result = {\n 'status': 'success',\n 'json_data':\
\ json_data,\n 'sheet_count': len(sheet_analysis),\n 'sheets_available':\
\ [sheet['sheet_name'] for sheet in sheet_analysis if sheet['status'] == 'success'],\n\
\ 'total_values': sum(sheet['total_values'] for sheet in sheet_analysis\
\ if sheet['status'] == 'success'),\n 'source_file_name': execution_summary.get('file_name',\
\ 'Unknown')\n }\n \n print(f\"\u2705 Ready to process {result['sheet_count']}\
\ sheets\")\n print(f\"\U0001F4DD Total values extracted: {result['total_values']}\"\
)\n \n return result\n \n except Exception as e:\n \
\ return {\n 'status': 'error',\n 'error': str(e),\n\
\ 'sheet_count': 0,\n 'total_values': 0\n }\n\n#\
\ Execute and output results\nresults = main()\nprint(f\"\\n__OUTPUTS__ {json.dumps(results)}\"\
)\n"
depends_on:
- analyze_excel_file
description: Download and parse the JSON results file
requirements:
- requests
timeout_seconds: 120
- id: create_dynamic_models_and_process
name: Create Dynamic Pydantic Models and Process with Gemini
type: script
script: "import json\nimport time\nimport io\nfrom datetime import datetime\nfrom\
\ typing import Dict, List, Any, Optional\nfrom pydantic import BaseModel, Field,\
\ create_model\nimport google.generativeai as genai\nimport requests\nimport pandas\
\ as pd\nfrom openpyxl import Workbook\nfrom openpyxl.styles import PatternFill,\
\ Font, Alignment, Border, Side\nfrom openpyxl.utils import get_column_letter\n\
\ndef create_dynamic_pydantic_model(sheet_name: str, values_extracted: List[str]):\n\
\ \"\"\"Create a dynamic Pydantic model based on sheet name and extracted values.\"\
\"\"\n print(f\"\U0001F4DD Creating model for sheet: {sheet_name}\")\n \n\
\ # Clean sheet name for class name\n class_name = sheet_name.replace('\
\ ', '').replace('&', 'And').replace('-', '').replace('/', '')\n class_name\
\ = f\"{class_name}Model\"\n \n # Create field definitions\n field_definitions\
\ = {}\n for value in values_extracted:\n # Clean field name\n \
\ field_name = value.lower().replace(' ', '_').replace('&', 'and').replace('-',\
\ '_')\n field_name = field_name.replace('/', '_').replace('(', '').replace(')',\
\ '')\n field_name = field_name.replace('%', 'percent').replace('.', '').replace(',',\
\ '')\n field_name = ''.join(c for c in field_name if c.isalnum() or c\
\ == '_')\n \n # Ensure field name starts with letter\n if\
\ field_name and field_name[0].isdigit():\n field_name = f\"field_{field_name}\"\
\n \n if field_name: # Only add if we have a valid field name\n\
\ field_definitions[field_name] = (str, Field(description=value))\n\
\ \n # Create the dynamic model\n model = create_model(class_name, **field_definitions)\n\
\ \n print(f\"\u2705 Created model '{class_name}' with {len(field_definitions)}\
\ fields\")\n return model, class_name\n\ndef setup_gemini(api_key: str):\n\
\ \"\"\"Setup Gemini API client.\"\"\"\n genai.configure(api_key=api_key)\n\
\ return genai.GenerativeModel('gemini-1.5-flash')\n\ndef process_sheet_with_gemini(model,\
\ sheet_data: Dict, dynamic_model, model_name: str, analysis_year: str) -> Optional[Dict]:\n\
\ \"\"\"Process sheet data with Gemini using dynamic model.\"\"\"\n try:\n\
\ sheet_name = sheet_data['sheet_name']\n values_extracted = sheet_data['values_extracted']\n\
\ \n print(f\"\U0001F916 Processing {sheet_name} for {analysis_year}\
\ with Gemini...\")\n \n # Create prompt for financial data extraction\n\
\ prompt = f\"\"\"\n You are a financial data extraction specialist.\
\ I need you to extract financial data for {analysis_year} from the following\
\ sheet: {sheet_name}.\n\n Available data points from the sheet:\n \
\ {json.dumps(values_extracted, indent=2)}\n\n Please extract the financial\
\ values for {analysis_year} and structure them according to the provided schema.\n\
\ \n IMPORTANT INSTRUCTIONS:\n 1. Focus specifically on {analysis_year}\
\ data\n 2. All monetary figures should be in the original units found\
\ in the data\n 3. If a value is not found or not applicable, use an empty\
\ string \"\"\n 4. For ratios and percentages, include the unit (e.g.,\
\ \"2.5x\", \"15%\")\n 5. Be accurate and thorough in extraction\n \
\ 6. Convert all values to strings for consistency\n\n Extract the data\
\ for sheet: {sheet_name}\n \"\"\"\n\n # Generate content with Gemini\n\
\ response = model.generate_content(\n prompt,\n \
\ generation_config=genai.types.GenerationConfig(\n response_mime_type=\"\
application/json\",\n response_schema=dynamic_model\n \
\ )\n )\n \n result = json.loads(response.text)\n \
\ print(f\"\u2705 Successfully processed {sheet_name} for {analysis_year}\"\
)\n return result\n \n except Exception as e:\n print(f\"\
\u274C Error processing {sheet_name} for {analysis_year}: {str(e)}\")\n \
\ return None\n\ndef create_comprehensive_excel(processed_data: Dict[str, Dict[str,\
\ Dict]], source_file_name: str) -> bytes:\n \"\"\"Create comprehensive Excel\
\ workbook with processed data.\"\"\"\n print(\"\U0001F4CA Creating comprehensive\
\ Excel workbook...\")\n \n wb = Workbook()\n wb.remove(wb.active) #\
\ Remove default sheet\n \n # Define styles\n header_fill = PatternFill(start_color=\"\
366092\", end_color=\"366092\", fill_type=\"solid\")\n header_font = Font(bold=True,\
\ color=\"FFFFFF\", size=11)\n year_fill = PatternFill(start_color=\"DCE6F1\"\
, end_color=\"DCE6F1\", fill_type=\"solid\")\n year_font = Font(bold=True,\
\ size=10)\n data_font = Font(size=10)\n center_align = Alignment(horizontal='center',\
\ vertical='center')\n left_align = Alignment(horizontal='left', vertical='center')\n\
\ right_align = Alignment(horizontal='right', vertical='center')\n \n \
\ thin_border = Border(\n left=Side(style='thin'), right=Side(style='thin'),\n\
\ top=Side(style='thin'), bottom=Side(style='thin')\n )\n \n #\
\ Process each sheet\n for sheet_name, year_data in processed_data.items():\n\
\ if not year_data:\n continue\n \n ws = wb.create_sheet(title=sheet_name[:31])\n\
\ \n # Get all unique metrics across all years\n all_metrics\
\ = set()\n for year, data in year_data.items():\n if data:\n\
\ all_metrics.update(data.keys())\n \n all_metrics\
\ = sorted(list(all_metrics))\n analysis_years = sorted(year_data.keys())\n\
\ \n # Create header row\n headers = ['Metric'] + analysis_years\n\
\ ws.append(headers)\n \n # Style header row\n for\
\ col_idx, header in enumerate(headers, 1):\n cell = ws.cell(row=1,\
\ column=col_idx)\n cell.fill = header_fill\n cell.font\
\ = header_font\n cell.alignment = center_align\n cell.border\
\ = thin_border\n \n # Add data rows\n for metric in all_metrics:\n\
\ row_data = [metric.replace('_', ' ').title()]\n \n \
\ for year in analysis_years:\n value = year_data.get(year,\
\ {}).get(metric, \"\")\n row_data.append(value)\n \n\
\ ws.append(row_data)\n \n # Style the row\n\
\ current_row = ws.max_row\n for col_idx in range(1, len(headers)\
\ + 1):\n cell = ws.cell(row=current_row, column=col_idx)\n \
\ cell.border = thin_border\n \n if\
\ col_idx == 1: # Metric name column\n cell.font = year_font\n\
\ cell.alignment = left_align\n else: # Data\
\ columns\n cell.font = data_font\n cell.alignment\
\ = right_align\n \n # Adjust column widths\n ws.column_dimensions['A'].width\
\ = 40\n for col_idx in range(2, len(headers) + 1):\n ws.column_dimensions[get_column_letter(col_idx)].width\
\ = 15\n \n # Freeze panes\n ws.freeze_panes = 'B2'\n \
\ \n print(f\"\u2705 Created sheet: {sheet_name} with {len(all_metrics)}\
\ metrics\")\n \n # Save to bytes\n excel_buffer = io.BytesIO()\n \
\ wb.save(excel_buffer)\n excel_buffer.seek(0)\n \n print(f\"\U0001F4C1\
\ Excel workbook created with {len(processed_data)} sheets\")\n return excel_buffer.getvalue()\n\
\ndef upload_excel_to_drive(excel_content: bytes, filename: str, access_token:\
\ str) -> Dict[str, Any]:\n \"\"\"Upload Excel file to Google Drive.\"\"\"\n\
\ print(\"\U0001F4E4 Uploading Excel file to Google Drive...\")\n \n \
\ try:\n boundary = '-------314159265358979323846'\n delimiter =\
\ f'\\r\\n--{boundary}\\r\\n'\n close_delim = f'\\r\\n--{boundary}--'\n\
\ \n metadata = {\n 'name': filename,\n 'parents':\
\ [] # Upload to root folder\n }\n \n body = delimiter +\
\ 'Content-Type: application/json\\r\\n\\r\\n'\n body += json.dumps(metadata)\n\
\ body += delimiter + 'Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet\\\
r\\n\\r\\n'\n body += excel_content.decode('latin-1')\n body +=\
\ close_delim\n \n headers = {\n 'Authorization': f'Bearer\
\ {access_token}',\n 'Content-Type': f'multipart/related; boundary=\"\
{boundary}\"'\n }\n \n upload_url = 'https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart'\n\
\ response = requests.post(upload_url, headers=headers, data=body.encode('latin-1'),\
\ timeout=120)\n response.raise_for_status()\n \n upload_result\
\ = response.json()\n file_url = f\"https://drive.google.com/file/d/{upload_result['id']}/view\"\
\n \n print(\"\u2705 Excel file uploaded successfully!\")\n \
\ return {\n 'success': True,\n 'file_id': upload_result['id'],\n\
\ 'file_url': file_url\n }\n \n except Exception as\
\ e:\n print(f\"\u274C Failed to upload Excel file: {str(e)}\")\n \
\ return {\n 'success': False,\n 'error': str(e)\n \
\ }\n\ndef authenticate_google_drive(nango_connection_id: str, nango_key:\
\ str) -> str:\n \"\"\"Authenticate with Google Drive.\"\"\"\n auth_url\
\ = f\"https://auth-dev.assistents.ai/connection/{nango_connection_id}?provider_config_key=google-drive-hq3h\"\
\n headers = {\n 'Authorization': f'Bearer {nango_key}',\n 'Content-Type':\
\ 'application/json'\n }\n \n response = requests.get(auth_url, headers=headers,\
\ timeout=15)\n response.raise_for_status()\n \n auth_data = response.json()\n\
\ return auth_data['credentials']['access_token']\n\ndef main():\n try:\n\
\ # Get configuration\n gemini_api_key = \"${gemini_api_key}\"\n\
\ analysis_years = [year.strip() for year in \"${analysis_years}\".split(',')]\n\
\ json_data = ${download_json_results.json_data}\n source_file_name\
\ = \"${download_json_results.source_file_name}\"\n nango_connection_id\
\ = \"${nango_connection_id}\"\n nango_key = \"${nango_key}\"\n \
\ \n print(\"\U0001F680 PROCESSING WITH DYNAMIC MODELS AND GEMINI\")\n\
\ print(\"=\" * 60)\n print(f\"Analysis Years: {analysis_years}\"\
)\n print(f\"Source File: {source_file_name}\")\n \n # Setup\
\ Gemini\n model = setup_gemini(gemini_api_key)\n \n # Get\
\ sheet analysis data\n sheet_analysis = json_data['detailed_results']['sheet_analysis']\n\
\ successful_sheets = [sheet for sheet in sheet_analysis if sheet['status']\
\ == 'success']\n \n print(f\"\U0001F4CA Processing {len(successful_sheets)}\
\ successful sheets\")\n \n # Process each sheet for each analysis\
\ year\n processed_data = {}\n \n for sheet_data in successful_sheets:\n\
\ sheet_name = sheet_data['sheet_name']\n values_extracted\
\ = sheet_data['values_extracted']\n \n if not values_extracted:\n\
\ print(f\"\u26A0\uFE0F Skipping {sheet_name} - no values extracted\"\
)\n continue\n \n print(f\"\\n\U0001F4CB\
\ Processing sheet: {sheet_name}\")\n \n # Create dynamic\
\ Pydantic model\n dynamic_model, model_name = create_dynamic_pydantic_model(sheet_name,\
\ values_extracted)\n \n # Process for each analysis year\n\
\ year_results = {}\n for analysis_year in analysis_years:\n\
\ result = process_sheet_with_gemini(\n model,\
\ sheet_data, dynamic_model, model_name, analysis_year\n )\n \
\ if result:\n year_results[analysis_year] = result\n\
\ \n # Rate limiting\n time.sleep(2)\n\
\ \n if year_results:\n processed_data[sheet_name]\
\ = year_results\n print(f\"\u2705 Completed processing {sheet_name}\
\ for {len(year_results)} years\")\n \n # Delay between\
\ sheets\n time.sleep(1)\n \n if not processed_data:\n\
\ raise Exception(\"No data was successfully processed\")\n \
\ \n # Create Excel workbook\n excel_content = create_comprehensive_excel(processed_data,\
\ source_file_name)\n \n # Upload to Google Drive\n access_token\
\ = authenticate_google_drive(nango_connection_id, nango_key)\n timestamp\
\ = datetime.now().strftime('%Y%m%d_%H%M%S')\n filename = f\"Financial_Analysis_{source_file_name}_{timestamp}.xlsx\"\
\n \n upload_result = upload_excel_to_drive(excel_content, filename,\
\ access_token)\n \n # Compile final results\n final_results\
\ = {\n 'status': 'success',\n 'sheets_processed': len(processed_data),\n\
\ 'analysis_years': analysis_years,\n 'source_file': source_file_name,\n\
\ 'excel_created': upload_result['success'],\n 'excel_file_id':\
\ upload_result.get('file_id'),\n 'excel_file_url': upload_result.get('file_url'),\n\
\ 'excel_filename': filename,\n 'processing_timestamp':\
\ datetime.now().isoformat(),\n 'total_extractions': sum(len(year_data)\
\ for year_data in processed_data.values())\n }\n \n print(\"\
\\n\" + \"=\" * 70)\n print(\"\U0001F389 FINANCIAL ANALYSIS COMPLETE!\"\
)\n print(\"=\" * 70)\n print(f\"\u2705 Sheets processed: {final_results['sheets_processed']}\"\
)\n print(f\"\U0001F4C5 Analysis years: {len(analysis_years)}\")\n \
\ print(f\"\U0001F522 Total extractions: {final_results['total_extractions']}\"\
)\n \n if final_results['excel_created']:\n print(f\"\
\U0001F4CA Excel file created: {filename}\")\n print(f\"\U0001F517\
\ Access at: {final_results['excel_file_url']}\")\n else:\n \
\ print(f\"\u26A0\uFE0F Excel upload failed: {upload_result.get('error', 'Unknown\
\ error')}\")\n \n return final_results\n \n except Exception\
\ as e:\n return {\n 'status': 'error',\n 'error':\
\ str(e),\n 'sheets_processed': 0,\n 'excel_created': False,\n\
\ 'processing_timestamp': datetime.now().isoformat()\n }\n\n\
# Execute main function and output results\nresults = main()\nprint(f\"\\n__OUTPUTS__\
\ {json.dumps(results)}\")\n"
depends_on:
- download_json_results
description: Create dynamic Pydantic models from JSON and process with Gemini for
each analysis year
requirements:
- pydantic
- google-generativeai
- requests
- openpyxl
- pandas
timeout_seconds: 600
inputs:
- name: nango_connection_id
type: string
default: 4274993f-c614-4efa-a01e-8d07422f4b09
required: true
description: Nango connection ID for Google Drive access
- name: nango_key
type: string
default: 8df3e2de-2307-48d3-94bd-ddd3fd6a62ec
required: true
description: Nango authentication key
- name: file_id
type: string
default: 1Glz-mTMDCuGzUGoTLWOrSyIMI1sJDYlA
required: true
description: Google Drive file ID of the Excel file to analyze
- name: analysis_depth
type: string
default: comprehensive
required: false
description: 'Analysis depth: basic, comprehensive, or detailed'
- name: target_sheets
type: string
default: Volumes,Power & Fuel,Per tonne,P&L,Operating metrics,Cash Flow,Balance
Sheet,Financial Ratios,Valuation
required: false
description: Comma-separated list of sheet names to analyze
- name: results_file_name
type: string
default: Excel_Analysis_Results
required: false
description: Name for the results file to upload to Google Drive
- name: analysis_years
type: string
default: FY2024
required: false
description: Comma-separated list of analysis years to process
- name: gemini_api_key
type: string
default: AIzaSyB0_e6aU4gF-qRapMm3UYBSITpbd0ehsYk
required: true
description: Gemini API key for AI processing
outputs:
excel_created:
type: boolean
source: create_dynamic_models_and_process.excel_created
description: Whether Excel file was successfully created
final_excel_url:
type: string
source: create_dynamic_models_and_process.excel_file_url
description: Direct link to final Excel analysis file
sheets_available:
type: integer
source: download_json_results.sheet_count
description: Number of sheets available for processing
sheets_processed:
type: integer
source: create_dynamic_models_and_process.sheets_processed
description: Number of sheets successfully processed
processing_status:
type: string
source: create_dynamic_models_and_process.status
description: Overall processing status
total_extractions:
type: integer
source: create_dynamic_models_and_process.total_extractions
description: Total number of data extractions performed
final_excel_file_id:
type: string
source: create_dynamic_models_and_process.excel_file_id
description: Google Drive file ID of final Excel analysis
json_download_status:
type: string
source: download_json_results.status
description: Status of JSON file download
version: '2.0'
namespace: excel_analysis
description: Simplified workflow to analyze a single Excel file and upload results
to Google Drive
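Note the output contract the three scripts share: each ends by printing a single `__OUTPUTS__` line whose JSON payload becomes the task's result, and later tasks read those keys through `${task_id.key}` interpolation (for example `${analyze_excel_file.uploaded_file_id}` in download_json_results). A minimal illustration of the convention, with hypothetical values:

```python
import json

# Final line of a task script: the runner treats everything after the
# __OUTPUTS__ marker as the task's JSON result.
results = {"status": "success", "uploaded_file_id": "example-file-id"}  # hypothetical
print(f"__OUTPUTS__ {json.dumps(results)}")

# A downstream task then receives it via template interpolation, e.g.:
# results_file_id = "${analyze_excel_file.uploaded_file_id}"
```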
Execution History

| Execution ID | Status | Started | Duration |
|---|---|---|---|
| 7a8e8bb9... | COMPLETED | 2025-07-31 18:27:43 | N/A |
| 25eeb35a... | COMPLETED | 2025-07-31 18:24:53 | N/A |
| 8128a06a... | COMPLETED | 2025-07-31 18:07:49 | N/A |
| 29ad898b... | COMPLETED | 2025-07-31 17:01:50 | N/A |
| a99889a4... | COMPLETED | 2025-07-31 11:06:42 | N/A |
| cba6488d... | COMPLETED | 2025-07-31 11:02:07 | N/A |
| 8a67cfa3... | COMPLETED | 2025-07-31 10:53:23 | N/A |
| f07e88a6... | COMPLETED | 2025-07-31 10:35:46 | N/A |
| 7d0cb621... | COMPLETED | 2025-07-31 10:09:01 | N/A |
| 997f4416... | COMPLETED | 2025-07-31 09:33:18 | N/A |
| bc7419e9... | COMPLETED | 2025-07-31 09:27:11 | N/A |
| f46a2948... | COMPLETED | 2025-07-31 09:12:08 | N/A |
| ccb4fe69... | COMPLETED | 2025-07-31 09:09:30 | N/A |
| 6a4aed8b... | COMPLETED | 2025-07-31 07:51:28 | N/A |
| e46f3152... | COMPLETED | 2025-07-31 07:47:29 | N/A |
| ebe186d3... | COMPLETED | 2025-07-31 07:45:12 | N/A |
| a818f809... | COMPLETED | 2025-07-31 07:43:17 | N/A |
| c615b214... | COMPLETED | 2025-07-31 07:40:16 | N/A |
| f7a94b6a... | COMPLETED | 2025-07-31 07:38:43 | N/A |
| c4920a09... | COMPLETED | 2025-07-31 07:36:09 | N/A |