Google Drive Single Excel File Analysis

Single Excel file analysis workflow with Google Drive integration, three sequential script tasks, and loop-based sheet processing

Workflow Information

ID: excel_single_file_analysis_workflow

Namespace: excel_analysis

Version: 2.0

Created: 2025-07-31

Updated: 2025-07-31

Tasks: 3

Inputs
Name Type Required Default
nango_connection_id string Required 4274993f-c614-4efa-a01e-8d07422f4b09
nango_key string Required 8df3e2de-2307-48d3-94bd-ddd3fd6a62ec
file_id string Required 1Glz-mTMDCuGzUGoTLWOrSyIMI1sJDYlA
analysis_depth string Optional comprehensive
target_sheets string Optional Volumes,Power & Fuel,Per tonne,P&L,Operating metrics,Cash Flow,Balance Sheet,Financial Ratios,Valuation
results_file_name string Optional Excel_Analysis_Results
analysis_years string Optional FY2024
gemini_api_key string Required AIzaSyB0_e6aU4gF-qRapMm3UYBSITpbd0ehsYk
Outputs
Name Type Source
excel_created boolean Whether Excel file was successfully created
final_excel_url string Direct link to final Excel analysis file
sheets_available integer Number of sheets available for processing
sheets_processed integer Number of sheets successfully processed
processing_status string Overall processing status
total_extractions integer Total number of data extractions performed
final_excel_file_id string Google Drive file ID of final Excel analysis
json_download_status string Status of JSON file download
Tasks
analyze_excel_file
script

Complete Excel analysis workflow in a single readable script
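
At its core the script opens the workbook with pandas, matches the requested sheet names case-insensitively, and extracts label values from the leading columns (1, 2, or 3 columns for basic, comprehensive, or detailed depth). A condensed sketch of the matching and extraction step; the full script in the YAML source below adds Nango authentication, the Drive download, the JSON upload, and additionally caps basic mode at the first 10 rows:

import io
import pandas as pd

def match_and_extract(file_content: bytes, target_sheets: list, depth: str = "comprehensive"):
    """Match requested sheets case-insensitively, then pull values from leading columns."""
    excel_file = pd.ExcelFile(io.BytesIO(file_content))
    wanted = {name.lower().strip() for name in target_sheets}
    n_cols = {"basic": 1, "comprehensive": 2, "detailed": 3}[depth]
    results = {}
    for sheet in excel_file.sheet_names:
        if sheet.lower().strip() not in wanted:
            continue
        df = pd.read_excel(excel_file, sheet_name=sheet, header=None)
        values = []
        for col in range(min(n_cols, len(df.columns))):
            col_values = df.iloc[:, col].dropna().astype(str).str.strip()
            values.extend(v for v in col_values if v and not v.startswith("Unnamed:"))
        results[sheet] = list(dict.fromkeys(values))  # dedupe, preserving order
    return results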

download_json_results
script

Download and parse the JSON results file
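
For regular (non-Google-native) files, Drive serves raw content from the files endpoint with alt=media; Google-native Sheets instead go through the export endpoint, as task one shows. The request shape used by this script, condensed:

import requests

def download_json_file(file_id: str, access_token: str) -> dict:
    """Fetch a JSON file's raw content from Google Drive and parse it."""
    url = f"https://www.googleapis.com/drive/v3/files/{file_id}"
    response = requests.get(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        params={"alt": "media"},  # alt=media returns file content, not metadata
        timeout=60,
    )
    response.raise_for_status()
    return response.json()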

create_dynamic_models_and_process
script

Create dynamic Pydantic models from JSON and process with Gemini for each analysis year
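
Field names are derived from the labels extracted in task one: each label is lowercased, sanitized into a valid Python identifier, and registered as a string field via pydantic's create_model. A condensed sketch of that step (the full script chains str.replace calls; this variant uses regexes):

import re
from pydantic import Field, create_model

def build_model(sheet_name: str, labels: list):
    """Turn extracted row labels into string fields on a dynamic pydantic model."""
    fields = {}
    for label in labels:
        name = re.sub(r"[^0-9a-zA-Z_]", "_", label.lower().replace("%", "percent"))
        name = re.sub(r"_+", "_", name).strip("_")
        if name and name[0].isdigit():
            name = f"field_{name}"  # field names must not start with a digit
        if name:
            fields[name] = (str, Field(description=label))
    class_name = re.sub(r"[^0-9a-zA-Z]", "", sheet_name) + "Model"
    return create_model(class_name, **fields)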

YAML Source
id: excel_single_file_analysis_clean
name: Clean Excel File Analysis with Drive Upload
tasks:
- id: analyze_excel_file
  name: Analyze Excel File and Upload Results
  type: script
  script: "import json\nimport requests\nimport pandas as pd\nimport io\nfrom datetime\
    \ import datetime\nfrom typing import Dict, List, Any\n\n# =============================================================================\n\
    # CONFIGURATION\n# =============================================================================\n\
    def get_configuration():\n    \"\"\"Get and validate input configuration.\"\"\"\
    \n    config = {\n        'nango_connection_id': \"${nango_connection_id}\",\n\
    \        'nango_key': \"${nango_key}\",\n        'file_id': \"${file_id}\".strip(),\n\
    \        'analysis_depth': \"${analysis_depth}\",\n        'target_sheets': [sheet.strip()\
    \ for sheet in \"${target_sheets}\".split(',') if sheet.strip()],\n        'results_file_name':\
    \ \"${results_file_name}\"\n    }\n    \n    print(\"\U0001F4CB EXCEL FILE ANALYSIS\
    \ CONFIGURATION\")\n    print(\"=\" * 50)\n    print(f\"File ID: {config['file_id']}\"\
    )\n    print(f\"Analysis Depth: {config['analysis_depth']}\")\n    print(f\"Target\
    \ Sheets: {len(config['target_sheets'])}\")\n    for i, sheet in enumerate(config['target_sheets'],\
    \ 1):\n        print(f\"  {i}. {sheet}\")\n    print(f\"Results File Name: {config['results_file_name']}\"\
    )\n    print()\n    \n    return config\n\n# =============================================================================\n\
    # GOOGLE DRIVE AUTHENTICATION\n# =============================================================================\n\
    def authenticate_google_drive(nango_connection_id: str, nango_key: str) -> str:\n\
    \    \"\"\"Authenticate with Google Drive and return access token.\"\"\"\n   \
    \ print(\"\U0001F510 Authenticating with Google Drive...\")\n    \n    auth_url\
    \ = f\"https://auth-dev.assistents.ai/connection/{nango_connection_id}?provider_config_key=google-drive-hq3h\"\
    \n    headers = {\n        'Authorization': f'Bearer {nango_key}',\n        'Content-Type':\
    \ 'application/json'\n    }\n    \n    try:\n        response = requests.get(auth_url,\
    \ headers=headers, timeout=15)\n        response.raise_for_status()\n        \n\
    \        auth_data = response.json()\n        access_token = auth_data.get('credentials',\
    \ {}).get('access_token')\n        \n        if not access_token:\n          \
    \  raise Exception(\"No access token received from authentication\")\n       \
    \     \n        print(\"\u2705 Authentication successful\")\n        return access_token\n\
    \        \n    except Exception as e:\n        raise Exception(f\"Authentication\
    \ failed: {str(e)}\")\n\n# =============================================================================\n\
    # FILE DOWNLOAD AND INFO\n# =============================================================================\n\
    def get_file_info(file_id: str, access_token: str) -> Dict[str, Any]:\n    \"\"\
    \"Get file information from Google Drive.\"\"\"\n    print(\"\U0001F4C4 Getting\
    \ file information...\")\n    \n    headers = {'Authorization': f'Bearer {access_token}'}\n\
    \    url = f\"https://www.googleapis.com/drive/v3/files/{file_id}\"\n    params\
    \ = {'fields': 'id,name,mimeType,size,createdTime,modifiedTime'}\n    \n    try:\n\
    \        response = requests.get(url, headers=headers, params=params, timeout=15)\n\
    \        response.raise_for_status()\n        \n        file_info = response.json()\n\
    \        print(f\"\u2705 File: {file_info['name']}\")\n        print(f\"   Type:\
    \ {file_info['mimeType']}\")\n        print(f\"   Size: {file_info.get('size',\
    \ 'N/A')} bytes\")\n        \n        return file_info\n        \n    except Exception\
    \ as e:\n        raise Exception(f\"Failed to get file info: {str(e)}\")\n\ndef\
    \ download_excel_file(file_id: str, file_info: Dict[str, Any], access_token: str)\
    \ -> bytes:\n    \"\"\"Download Excel file content.\"\"\"\n    print(\"\u2B07\uFE0F\
    \ Downloading Excel file...\")\n    \n    headers = {'Authorization': f'Bearer\
    \ {access_token}'}\n    mime_type = file_info.get('mimeType')\n    \n    # Handle\
    \ Google Sheets vs Excel files\n    if mime_type == 'application/vnd.google-apps.spreadsheet':\n\
    \        url = f\"https://www.googleapis.com/drive/v3/files/{file_id}/export\"\
    \n        params = {'mimeType': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'}\n\
    \    else:\n        url = f\"https://www.googleapis.com/drive/v3/files/{file_id}\"\
    \n        params = {'alt': 'media'}\n    \n    try:\n        response = requests.get(url,\
    \ headers=headers, params=params, timeout=60)\n        response.raise_for_status()\n\
    \        \n        print(\"\u2705 File downloaded successfully\")\n        return\
    \ response.content\n        \n    except Exception as e:\n        raise Exception(f\"\
    Failed to download file: {str(e)}\")\n\n# =============================================================================\n\
    # EXCEL ANALYSIS\n# =============================================================================\n\
    def analyze_excel_structure(file_content: bytes, target_sheets: List[str]) ->\
    \ Dict[str, Any]:\n    \"\"\"Analyze Excel file structure and identify target\
    \ sheets.\"\"\"\n    print(\"\U0001F50D Analyzing Excel file structure...\")\n\
    \    \n    try:\n        excel_file = pd.ExcelFile(io.BytesIO(file_content))\n\
    \        available_sheets = excel_file.sheet_names\n        \n        print(f\"\
    \u2705 Found {len(available_sheets)} sheets:\")\n        for i, sheet in enumerate(available_sheets,\
    \ 1):\n            print(f\"  {i}. {sheet}\")\n        \n        # Match target\
    \ sheets with available sheets (case-insensitive)\n        matched_sheets = []\n\
    \        for available_sheet in available_sheets:\n            for target_sheet\
    \ in target_sheets:\n                if available_sheet.lower().strip() == target_sheet.lower().strip():\n\
    \                    matched_sheets.append({\n                        'sheet_name':\
    \ available_sheet,\n                        'target_name': target_sheet\n    \
    \                })\n                    break\n        \n        print(f\"\\\
    n\U0001F3AF Target sheets found: {len(matched_sheets)}\")\n        for sheet in\
    \ matched_sheets:\n            print(f\"  \u2713 {sheet['sheet_name']} (matches\
    \ '{sheet['target_name']}')\")\n        \n        return {\n            'excel_file':\
    \ excel_file,\n            'available_sheets': available_sheets,\n           \
    \ 'matched_sheets': matched_sheets\n        }\n        \n    except Exception\
    \ as e:\n        raise Exception(f\"Failed to analyze Excel structure: {str(e)}\"\
    )\n\ndef extract_sheet_data(excel_file, sheet_name: str, analysis_depth: str)\
    \ -> Dict[str, Any]:\n    \"\"\"Extract data from a specific sheet based on analysis\
    \ depth.\"\"\"\n    print(f\"\U0001F4CA Analyzing sheet: {sheet_name}\")\n   \
    \ \n    try:\n        df = pd.read_excel(excel_file, sheet_name=sheet_name, header=None)\n\
    \        \n        if df.empty:\n            print(f\"  \u26A0\uFE0F Sheet '{sheet_name}'\
    \ is empty\")\n            return {\n                'sheet_name': sheet_name,\n\
    \                'status': 'empty',\n                'values_extracted': [],\n\
    \                'total_values': 0,\n                'dimensions': {'rows': 0,\
    \ 'columns': 0}\n            }\n        \n        print(f\"  \U0001F4CF Dimensions:\
    \ {df.shape[0]} rows \xD7 {df.shape[1]} columns\")\n        \n        # Extract\
    \ data based on analysis depth\n        extracted_values = []\n        \n    \
    \    if analysis_depth == \"basic\":\n            # Basic: Only first column,\
    \ first 10 rows\n            if len(df.columns) > 0:\n                col_values\
    \ = df.iloc[:10, 0].dropna().astype(str).str.strip()\n                extracted_values.extend([v\
    \ for v in col_values if v and not v.startswith('Unnamed:')])\n              \
    \  \n        elif analysis_depth == \"comprehensive\":\n            # Comprehensive:\
    \ First 2 columns, all rows\n            for col_idx in range(min(2, len(df.columns))):\n\
    \                col_values = df.iloc[:, col_idx].dropna().astype(str).str.strip()\n\
    \                extracted_values.extend([v for v in col_values if v and not v.startswith('Unnamed:')])\n\
    \                \n        else:  # detailed\n            # Detailed: First 3\
    \ columns, all rows, plus data type info\n            for col_idx in range(min(3,\
    \ len(df.columns))):\n                col_values = df.iloc[:, col_idx].dropna().astype(str).str.strip()\n\
    \                extracted_values.extend([v for v in col_values if v and not v.startswith('Unnamed:')])\n\
    \        \n        # Remove duplicates while preserving order\n        unique_values\
    \ = []\n        seen = set()\n        for value in extracted_values:\n       \
    \     if value not in seen:\n                seen.add(value)\n               \
    \ unique_values.append(value)\n        \n        print(f\"  \u2705 Extracted {len(unique_values)}\
    \ unique values\")\n        \n        result = {\n            'sheet_name': sheet_name,\n\
    \            'status': 'success',\n            'values_extracted': unique_values,\n\
    \            'total_values': len(unique_values),\n            'dimensions': {'rows':\
    \ df.shape[0], 'columns': df.shape[1]},\n            'analysis_depth': analysis_depth\n\
    \        }\n        \n        if analysis_depth == \"detailed\":\n           \
    \ # Add data type analysis for detailed mode\n            data_types = {}\n  \
    \          for col_idx in range(min(3, len(df.columns))):\n                col_data\
    \ = df.iloc[:, col_idx].dropna()\n                if len(col_data) > 0:\n    \
    \                data_types[f\"column_{col_idx}\"] = {\n                     \
    \   \"data_type\": str(col_data.dtype),\n                        \"sample_values\"\
    : col_data.head(3).tolist()\n                    }\n            result['data_type_analysis']\
    \ = data_types\n        \n        return result\n        \n    except Exception\
    \ as e:\n        print(f\"  \u274C Error analyzing sheet '{sheet_name}': {str(e)}\"\
    )\n        return {\n            'sheet_name': sheet_name,\n            'status':\
    \ 'error',\n            'error': str(e),\n            'total_values': 0\n    \
    \    }\n\ndef process_all_sheets(excel_analysis: Dict[str, Any], analysis_depth:\
    \ str) -> Dict[str, Any]:\n    \"\"\"Process all matched sheets and compile results.\"\
    \"\"\n    print(\"\\n\U0001F4DA Processing all target sheets...\")\n    print(\"\
    =\" * 40)\n    \n    excel_file = excel_analysis['excel_file']\n    matched_sheets\
    \ = excel_analysis['matched_sheets']\n    \n    if not matched_sheets:\n     \
    \   print(\"\u26A0\uFE0F No target sheets found to process\")\n        return\
    \ {\n            'processed_sheets': [],\n            'total_values_extracted':\
    \ 0,\n            'processing_errors': [],\n            'success_count': 0\n \
    \       }\n    \n    processed_sheets = []\n    total_values = 0\n    processing_errors\
    \ = []\n    success_count = 0\n    \n    for i, sheet_info in enumerate(matched_sheets,\
    \ 1):\n        print(f\"\\n\U0001F504 Processing sheet {i}/{len(matched_sheets)}:\
    \ {sheet_info['sheet_name']}\")\n        \n        sheet_result = extract_sheet_data(excel_file,\
    \ sheet_info['sheet_name'], analysis_depth)\n        processed_sheets.append(sheet_result)\n\
    \        \n        if sheet_result['status'] == 'success':\n            total_values\
    \ += sheet_result['total_values']\n            success_count += 1\n        elif\
    \ sheet_result['status'] == 'error':\n            processing_errors.append(sheet_result)\n\
    \    \n    print(f\"\\n\u2705 Processing complete!\")\n    print(f\"   Sheets\
    \ processed: {len(processed_sheets)}\")\n    print(f\"   Successful: {success_count}\"\
    )\n    print(f\"   Errors: {len(processing_errors)}\")\n    print(f\"   Total\
    \ values extracted: {total_values}\")\n    \n    return {\n        'processed_sheets':\
    \ processed_sheets,\n        'total_values_extracted': total_values,\n       \
    \ 'processing_errors': processing_errors,\n        'success_count': success_count\n\
    \    }\n\n# =============================================================================\n\
    # RESULTS UPLOAD\n# =============================================================================\n\
    def upload_results_to_drive(results: Dict[str, Any], file_id: str, file_info:\
    \ Dict[str, Any], \n                           config: Dict[str, Any], access_token:\
    \ str) -> Dict[str, Any]:\n    \"\"\"Upload comprehensive results to Google Drive\
    \ as JSON file.\"\"\"\n    print(\"\\n\U0001F4E4 Uploading results to Google Drive...\"\
    )\n    print(\"=\" * 40)\n    \n    # Create comprehensive results structure\n\
    \    comprehensive_results = {\n        \"metadata\": {\n            \"workflow_id\"\
    : \"excel_single_file_analysis_clean\",\n            \"execution_timestamp\":\
    \ datetime.now().isoformat(),\n            \"analysis_depth\": config['analysis_depth'],\n\
    \            \"source_file_id\": file_id,\n            \"target_sheets_requested\"\
    : config['target_sheets']\n        },\n        \"execution_summary\": {\n    \
    \        \"file_id\": file_id,\n            \"file_name\": file_info.get('name',\
    \ 'Unknown'),\n            \"file_size\": file_info.get('size', 'N/A'),\n    \
    \        \"sheets_processed\": len(results['processed_sheets']),\n           \
    \ \"successful_sheets\": results['success_count'],\n            \"failed_sheets\"\
    : len(results['processing_errors']),\n            \"total_values_extracted\":\
    \ results['total_values_extracted'],\n            \"processing_timestamp\": datetime.now().isoformat()\n\
    \        },\n        \"detailed_results\": {\n            \"sheet_analysis\":\
    \ results['processed_sheets'],\n            \"processing_errors\": results['processing_errors']\n\
    \        },\n        \"source_file_metadata\": file_info\n    }\n    \n    # Convert\
    \ to JSON\n    json_content = json.dumps(comprehensive_results, indent=2, ensure_ascii=False)\n\
    \    print(f\"\U0001F4C4 Generated results JSON ({len(json_content)} characters)\"\
    )\n    \n    # Create filename with timestamp\n    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')\n\
    \    filename = f\"{config['results_file_name']}_{timestamp}.json\"\n    \n  \
    \  try:\n        # Upload to Google Drive using multipart upload\n        boundary\
    \ = '-------314159265358979323846'\n        delimiter = f'\\r\\n--{boundary}\\\
    r\\n'\n        close_delim = f'\\r\\n--{boundary}--'\n        \n        metadata\
    \ = {\n            'name': filename,\n            'parents': []  # Upload to root\
    \ folder\n        }\n        \n        body = delimiter + 'Content-Type: application/json\\\
    r\\n\\r\\n'\n        body += json.dumps(metadata)\n        body += delimiter +\
    \ 'Content-Type: application/json\\r\\n\\r\\n'\n        body += json_content\n\
    \        body += close_delim\n        \n        headers = {\n            'Authorization':\
    \ f'Bearer {access_token}',\n            'Content-Type': f'multipart/related;\
    \ boundary=\"{boundary}\"'\n        }\n        \n        upload_url = 'https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart'\n\
    \        response = requests.post(upload_url, headers=headers, data=body, timeout=60)\n\
    \        response.raise_for_status()\n        \n        upload_result = response.json()\n\
    \        uploaded_file_id = upload_result['id']\n        file_url = f\"https://drive.google.com/file/d/{uploaded_file_id}/view\"\
    \n        \n        print(\"\u2705 Results uploaded successfully!\")\n       \
    \ print(f\"   File ID: {uploaded_file_id}\")\n        print(f\"   Filename: {filename}\"\
    )\n        print(f\"   URL: {file_url}\")\n        \n        return {\n      \
    \      'upload_success': True,\n            'uploaded_file_id': uploaded_file_id,\n\
    \            'filename': filename,\n            'file_url': file_url,\n      \
    \      'file_size_bytes': len(json_content)\n        }\n        \n    except Exception\
    \ as e:\n        print(f\"\u274C Failed to upload results: {str(e)}\")\n     \
    \   return {\n            'upload_success': False,\n            'error': str(e)\n\
    \        }\n\n# =============================================================================\n\
    # MAIN EXECUTION\n# =============================================================================\n\
    def main():\n    \"\"\"Main execution function.\"\"\"\n    try:\n        # Get\
    \ configuration\n        config = get_configuration()\n        \n        # Validate\
    \ file ID\n        if not config['file_id'] or len(config['file_id']) < 10:\n\
    \            raise Exception(\"Invalid file ID provided\")\n        \n       \
    \ # Authenticate with Google Drive\n        access_token = authenticate_google_drive(config['nango_connection_id'],\
    \ config['nango_key'])\n        \n        # Get file information\n        file_info\
    \ = get_file_info(config['file_id'], access_token)\n        \n        # Download\
    \ Excel file\n        file_content = download_excel_file(config['file_id'], file_info,\
    \ access_token)\n        \n        # Analyze Excel structure\n        excel_analysis\
    \ = analyze_excel_structure(file_content, config['target_sheets'])\n        \n\
    \        # Process all sheets\n        processing_results = process_all_sheets(excel_analysis,\
    \ config['analysis_depth'])\n        \n        # Upload results to Google Drive\n\
    \        upload_results = upload_results_to_drive(\n            processing_results,\
    \ config['file_id'], file_info, config, access_token\n        )\n        \n  \
    \      # Compile final output\n        final_results = {\n            'status':\
    \ 'success',\n            'file_id': config['file_id'],\n            'file_name':\
    \ file_info.get('name', 'Unknown'),\n            'analysis_depth': config['analysis_depth'],\n\
    \            'sheets_processed': len(processing_results['processed_sheets']),\n\
    \            'successful_sheets': processing_results['success_count'],\n     \
    \       'total_values_extracted': processing_results['total_values_extracted'],\n\
    \            'upload_success': upload_results['upload_success'],\n           \
    \ 'uploaded_file_id': upload_results.get('uploaded_file_id'),\n            'results_file_url':\
    \ upload_results.get('file_url'),\n            'results_filename': upload_results.get('filename'),\n\
    \            'processing_timestamp': datetime.now().isoformat()\n        }\n \
    \       \n        if not upload_results['upload_success']:\n            final_results['upload_error']\
    \ = upload_results.get('error')\n        \n        print(\"\\n\" + \"=\" * 70)\n\
    \        print(\"\U0001F389 EXCEL ANALYSIS COMPLETE!\")\n        print(\"=\" *\
    \ 70)\n        print(f\"\u2705 File: {final_results['file_name']}\")\n       \
    \ print(f\"\U0001F4CA Sheets processed: {final_results['sheets_processed']}\"\
    )\n        print(f\"\U0001F4C8 Values extracted: {final_results['total_values_extracted']}\"\
    )\n        \n        if final_results['upload_success']:\n            print(f\"\
    \U0001F4E4 Results uploaded: {final_results['results_filename']}\")\n        \
    \    print(f\"\U0001F517 Access at: {final_results['results_file_url']}\")\n \
    \       else:\n            print(f\"\u26A0\uFE0F Upload failed: {final_results.get('upload_error',\
    \ 'Unknown error')}\")\n        \n        return final_results\n        \n   \
    \ except Exception as e:\n        error_result = {\n            'status': 'error',\n\
    \            'error': str(e),\n            'file_id': config.get('file_id', 'unknown')\
    \ if 'config' in locals() else 'unknown',\n            'sheets_processed': 0,\n\
    \            'total_values_extracted': 0,\n            'upload_success': False,\n\
    \            'processing_timestamp': datetime.now().isoformat()\n        }\n \
    \       \n        print(f\"\\n\u274C ANALYSIS FAILED: {str(e)}\")\n        return\
    \ error_result\n\n# Execute main function and output results\nresults = main()\n\
    print(f\"\\n__OUTPUTS__ {json.dumps(results)}\")\n"
  description: Complete Excel analysis workflow in a single readable script
  requirements:
  - pandas
  - openpyxl
  - requests
  timeout_seconds: 300
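# Each task script ends by printing a line of the form "__OUTPUTS__ {json}"; the
# keys of that JSON (e.g. uploaded_file_id above) become available to later tasks
# through ${task_id.key} substitution, as download_json_results does below with
# ${analyze_excel_file.uploaded_file_id}.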
- id: download_json_results
  name: Download JSON Results from Drive
  type: script
  script: "import json\nimport requests\nfrom typing import Dict, List, Any\n\ndef\
    \ authenticate_google_drive(nango_connection_id: str, nango_key: str) -> str:\n\
    \    \"\"\"Authenticate with Google Drive and return access token.\"\"\"\n   \
    \ print(\"\U0001F510 Authenticating with Google Drive...\")\n    \n    auth_url\
    \ = f\"https://auth-dev.assistents.ai/connection/{nango_connection_id}?provider_config_key=google-drive-hq3h\"\
    \n    headers = {\n        'Authorization': f'Bearer {nango_key}',\n        'Content-Type':\
    \ 'application/json'\n    }\n    \n    try:\n        response = requests.get(auth_url,\
    \ headers=headers, timeout=15)\n        response.raise_for_status()\n        \n\
    \        auth_data = response.json()\n        access_token = auth_data.get('credentials',\
    \ {}).get('access_token')\n        \n        if not access_token:\n          \
    \  raise Exception(\"No access token received from authentication\")\n       \
    \     \n        print(\"\u2705 Authentication successful\")\n        return access_token\n\
    \        \n    except Exception as e:\n        raise Exception(f\"Authentication\
    \ failed: {str(e)}\")\n\ndef download_json_file(file_id: str, access_token: str)\
    \ -> Dict[str, Any]:\n    \"\"\"Download JSON file from Google Drive.\"\"\"\n\
    \    print(\"\U0001F4E5 Downloading JSON results file...\")\n    \n    headers\
    \ = {'Authorization': f'Bearer {access_token}'}\n    url = f\"https://www.googleapis.com/drive/v3/files/{file_id}\"\
    \n    params = {'alt': 'media'}\n    \n    try:\n        response = requests.get(url,\
    \ headers=headers, params=params, timeout=60)\n        response.raise_for_status()\n\
    \        \n        json_data = response.json()\n        print(\"\u2705 JSON file\
    \ downloaded and parsed successfully\")\n        return json_data\n        \n\
    \    except Exception as e:\n        raise Exception(f\"Failed to download JSON\
    \ file: {str(e)}\")\n\ndef main():\n    try:\n        # Get configuration\n  \
    \      nango_connection_id = \"${nango_connection_id}\"\n        nango_key = \"\
    ${nango_key}\"\n        results_file_id = \"${analyze_excel_file.uploaded_file_id}\"\
    \n        \n        print(\"\U0001F4CB DOWNLOADING JSON RESULTS\")\n        print(\"\
    =\" * 50)\n        print(f\"File ID: {results_file_id}\")\n        \n        #\
    \ Authenticate and download\n        access_token = authenticate_google_drive(nango_connection_id,\
    \ nango_key)\n        json_data = download_json_file(results_file_id, access_token)\n\
    \        \n        # Extract key information\n        execution_summary = json_data.get('execution_summary',\
    \ {})\n        detailed_results = json_data.get('detailed_results', {})\n    \
    \    sheet_analysis = detailed_results.get('sheet_analysis', [])\n        \n \
    \       print(f\"\U0001F4CA Found {len(sheet_analysis)} sheets to process\")\n\
    \        \n        result = {\n            'status': 'success',\n            'json_data':\
    \ json_data,\n            'sheet_count': len(sheet_analysis),\n            'sheets_available':\
    \ [sheet['sheet_name'] for sheet in sheet_analysis if sheet['status'] == 'success'],\n\
    \            'total_values': sum(sheet['total_values'] for sheet in sheet_analysis\
    \ if sheet['status'] == 'success'),\n            'source_file_name': execution_summary.get('file_name',\
    \ 'Unknown')\n        }\n        \n        print(f\"\u2705 Ready to process {result['sheet_count']}\
    \ sheets\")\n        print(f\"\U0001F4DD Total values extracted: {result['total_values']}\"\
    )\n        \n        return result\n        \n    except Exception as e:\n   \
    \     return {\n            'status': 'error',\n            'error': str(e),\n\
    \            'sheet_count': 0,\n            'total_values': 0\n        }\n\n#\
    \ Execute and output results\nresults = main()\nprint(f\"\\n__OUTPUTS__ {json.dumps(results)}\"\
    )\n"
  depends_on:
  - analyze_excel_file
  description: Download and parse the JSON results file
  requirements:
  - requests
  timeout_seconds: 120
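# The json_data value emitted above is substituted verbatim into the next task's
# source (json_data = ${download_json_results.json_data}), so it must remain a
# literal that still parses as a Python expression once inlined.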
- id: create_dynamic_models_and_process
  name: Create Dynamic Pydantic Models and Process with Gemini
  type: script
  script: "import json\nimport time\nimport io\nfrom datetime import datetime\nfrom\
    \ typing import Dict, List, Any, Optional\nfrom pydantic import BaseModel, Field,\
    \ create_model\nimport google.generativeai as genai\nimport requests\nimport pandas\
    \ as pd\nfrom openpyxl import Workbook\nfrom openpyxl.styles import PatternFill,\
    \ Font, Alignment, Border, Side\nfrom openpyxl.utils import get_column_letter\n\
    \ndef create_dynamic_pydantic_model(sheet_name: str, values_extracted: List[str]):\n\
    \    \"\"\"Create a dynamic Pydantic model based on sheet name and extracted values.\"\
    \"\"\n    print(f\"\U0001F4DD Creating model for sheet: {sheet_name}\")\n    \n\
    \    # Clean sheet name for class name\n    class_name = sheet_name.replace('\
    \ ', '').replace('&', 'And').replace('-', '').replace('/', '')\n    class_name\
    \ = f\"{class_name}Model\"\n    \n    # Create field definitions\n    field_definitions\
    \ = {}\n    for value in values_extracted:\n        # Clean field name\n     \
    \   field_name = value.lower().replace(' ', '_').replace('&', 'and').replace('-',\
    \ '_')\n        field_name = field_name.replace('/', '_').replace('(', '').replace(')',\
    \ '')\n        field_name = field_name.replace('%', 'percent').replace('.', '').replace(',',\
    \ '')\n        field_name = ''.join(c for c in field_name if c.isalnum() or c\
    \ == '_')\n        \n        # Ensure field name starts with letter\n        if\
    \ field_name and field_name[0].isdigit():\n            field_name = f\"field_{field_name}\"\
    \n        \n        if field_name:  # Only add if we have a valid field name\n\
    \            field_definitions[field_name] = (str, Field(description=value))\n\
    \    \n    # Create the dynamic model\n    model = create_model(class_name, **field_definitions)\n\
    \    \n    print(f\"\u2705 Created model '{class_name}' with {len(field_definitions)}\
    \ fields\")\n    return model, class_name\n\ndef setup_gemini(api_key: str):\n\
    \    \"\"\"Setup Gemini API client.\"\"\"\n    genai.configure(api_key=api_key)\n\
    \    return genai.GenerativeModel('gemini-1.5-flash')\n\ndef process_sheet_with_gemini(model,\
    \ sheet_data: Dict, dynamic_model, model_name: str, analysis_year: str) -> Optional[Dict]:\n\
    \    \"\"\"Process sheet data with Gemini using dynamic model.\"\"\"\n    try:\n\
    \        sheet_name = sheet_data['sheet_name']\n        values_extracted = sheet_data['values_extracted']\n\
    \        \n        print(f\"\U0001F916 Processing {sheet_name} for {analysis_year}\
    \ with Gemini...\")\n        \n        # Create prompt for financial data extraction\n\
    \        prompt = f\"\"\"\n        You are a financial data extraction specialist.\
    \ I need you to extract financial data for {analysis_year} from the following\
    \ sheet: {sheet_name}.\n\n        Available data points from the sheet:\n    \
    \    {json.dumps(values_extracted, indent=2)}\n\n        Please extract the financial\
    \ values for {analysis_year} and structure them according to the provided schema.\n\
    \        \n        IMPORTANT INSTRUCTIONS:\n        1. Focus specifically on {analysis_year}\
    \ data\n        2. All monetary figures should be in the original units found\
    \ in the data\n        3. If a value is not found or not applicable, use an empty\
    \ string \"\"\n        4. For ratios and percentages, include the unit (e.g.,\
    \ \"2.5x\", \"15%\")\n        5. Be accurate and thorough in extraction\n    \
    \    6. Convert all values to strings for consistency\n\n        Extract the data\
    \ for sheet: {sheet_name}\n        \"\"\"\n\n        # Generate content with Gemini\n\
    \        response = model.generate_content(\n            prompt,\n           \
    \ generation_config=genai.types.GenerationConfig(\n                response_mime_type=\"\
    application/json\",\n                response_schema=dynamic_model\n         \
    \   )\n        )\n        \n        result = json.loads(response.text)\n     \
    \   print(f\"\u2705 Successfully processed {sheet_name} for {analysis_year}\"\
    )\n        return result\n        \n    except Exception as e:\n        print(f\"\
    \u274C Error processing {sheet_name} for {analysis_year}: {str(e)}\")\n      \
    \  return None\n\ndef create_comprehensive_excel(processed_data: Dict[str, Dict[str,\
    \ Dict]], source_file_name: str) -> bytes:\n    \"\"\"Create comprehensive Excel\
    \ workbook with processed data.\"\"\"\n    print(\"\U0001F4CA Creating comprehensive\
    \ Excel workbook...\")\n    \n    wb = Workbook()\n    wb.remove(wb.active)  #\
    \ Remove default sheet\n    \n    # Define styles\n    header_fill = PatternFill(start_color=\"\
    366092\", end_color=\"366092\", fill_type=\"solid\")\n    header_font = Font(bold=True,\
    \ color=\"FFFFFF\", size=11)\n    year_fill = PatternFill(start_color=\"DCE6F1\"\
    , end_color=\"DCE6F1\", fill_type=\"solid\")\n    year_font = Font(bold=True,\
    \ size=10)\n    data_font = Font(size=10)\n    center_align = Alignment(horizontal='center',\
    \ vertical='center')\n    left_align = Alignment(horizontal='left', vertical='center')\n\
    \    right_align = Alignment(horizontal='right', vertical='center')\n    \n  \
    \  thin_border = Border(\n        left=Side(style='thin'), right=Side(style='thin'),\n\
    \        top=Side(style='thin'), bottom=Side(style='thin')\n    )\n    \n    #\
    \ Process each sheet\n    for sheet_name, year_data in processed_data.items():\n\
    \        if not year_data:\n            continue\n            \n        ws = wb.create_sheet(title=sheet_name[:31])\n\
    \        \n        # Get all unique metrics across all years\n        all_metrics\
    \ = set()\n        for year, data in year_data.items():\n            if data:\n\
    \                all_metrics.update(data.keys())\n        \n        all_metrics\
    \ = sorted(list(all_metrics))\n        analysis_years = sorted(year_data.keys())\n\
    \        \n        # Create header row\n        headers = ['Metric'] + analysis_years\n\
    \        ws.append(headers)\n        \n        # Style header row\n        for\
    \ col_idx, header in enumerate(headers, 1):\n            cell = ws.cell(row=1,\
    \ column=col_idx)\n            cell.fill = header_fill\n            cell.font\
    \ = header_font\n            cell.alignment = center_align\n            cell.border\
    \ = thin_border\n        \n        # Add data rows\n        for metric in all_metrics:\n\
    \            row_data = [metric.replace('_', ' ').title()]\n            \n   \
    \         for year in analysis_years:\n                value = year_data.get(year,\
    \ {}).get(metric, \"\")\n                row_data.append(value)\n            \n\
    \            ws.append(row_data)\n            \n            # Style the row\n\
    \            current_row = ws.max_row\n            for col_idx in range(1, len(headers)\
    \ + 1):\n                cell = ws.cell(row=current_row, column=col_idx)\n   \
    \             cell.border = thin_border\n                \n                if\
    \ col_idx == 1:  # Metric name column\n                    cell.font = year_font\n\
    \                    cell.alignment = left_align\n                else:  # Data\
    \ columns\n                    cell.font = data_font\n                    cell.alignment\
    \ = right_align\n        \n        # Adjust column widths\n        ws.column_dimensions['A'].width\
    \ = 40\n        for col_idx in range(2, len(headers) + 1):\n            ws.column_dimensions[get_column_letter(col_idx)].width\
    \ = 15\n        \n        # Freeze panes\n        ws.freeze_panes = 'B2'\n   \
    \     \n        print(f\"\u2705 Created sheet: {sheet_name} with {len(all_metrics)}\
    \ metrics\")\n    \n    # Save to bytes\n    excel_buffer = io.BytesIO()\n   \
    \ wb.save(excel_buffer)\n    excel_buffer.seek(0)\n    \n    print(f\"\U0001F4C1\
    \ Excel workbook created with {len(processed_data)} sheets\")\n    return excel_buffer.getvalue()\n\
    \ndef upload_excel_to_drive(excel_content: bytes, filename: str, access_token:\
    \ str) -> Dict[str, Any]:\n    \"\"\"Upload Excel file to Google Drive.\"\"\"\n\
    \    print(\"\U0001F4E4 Uploading Excel file to Google Drive...\")\n    \n   \
    \ try:\n        boundary = '-------314159265358979323846'\n        delimiter =\
    \ f'\\r\\n--{boundary}\\r\\n'\n        close_delim = f'\\r\\n--{boundary}--'\n\
    \        \n        metadata = {\n            'name': filename,\n            'parents':\
    \ []  # Upload to root folder\n        }\n        \n        body = delimiter +\
    \ 'Content-Type: application/json\\r\\n\\r\\n'\n        body += json.dumps(metadata)\n\
    \        body += delimiter + 'Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet\\\
    r\\n\\r\\n'\n        body += excel_content.decode('latin-1')\n        body +=\
    \ close_delim\n        \n        headers = {\n            'Authorization': f'Bearer\
    \ {access_token}',\n            'Content-Type': f'multipart/related; boundary=\"\
    {boundary}\"'\n        }\n        \n        upload_url = 'https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart'\n\
    \        response = requests.post(upload_url, headers=headers, data=body.encode('latin-1'),\
    \ timeout=120)\n        response.raise_for_status()\n        \n        upload_result\
    \ = response.json()\n        file_url = f\"https://drive.google.com/file/d/{upload_result['id']}/view\"\
    \n        \n        print(\"\u2705 Excel file uploaded successfully!\")\n    \
    \    return {\n            'success': True,\n            'file_id': upload_result['id'],\n\
    \            'file_url': file_url\n        }\n        \n    except Exception as\
    \ e:\n        print(f\"\u274C Failed to upload Excel file: {str(e)}\")\n     \
    \   return {\n            'success': False,\n            'error': str(e)\n   \
    \     }\n\ndef authenticate_google_drive(nango_connection_id: str, nango_key:\
    \ str) -> str:\n    \"\"\"Authenticate with Google Drive.\"\"\"\n    auth_url\
    \ = f\"https://auth-dev.assistents.ai/connection/{nango_connection_id}?provider_config_key=google-drive-hq3h\"\
    \n    headers = {\n        'Authorization': f'Bearer {nango_key}',\n        'Content-Type':\
    \ 'application/json'\n    }\n    \n    response = requests.get(auth_url, headers=headers,\
    \ timeout=15)\n    response.raise_for_status()\n    \n    auth_data = response.json()\n\
    \    return auth_data['credentials']['access_token']\n\ndef main():\n    try:\n\
    \        # Get configuration\n        gemini_api_key = \"${gemini_api_key}\"\n\
    \        analysis_years = [year.strip() for year in \"${analysis_years}\".split(',')]\n\
    \        json_data = ${download_json_results.json_data}\n        source_file_name\
    \ = \"${download_json_results.source_file_name}\"\n        nango_connection_id\
    \ = \"${nango_connection_id}\"\n        nango_key = \"${nango_key}\"\n       \
    \ \n        print(\"\U0001F680 PROCESSING WITH DYNAMIC MODELS AND GEMINI\")\n\
    \        print(\"=\" * 60)\n        print(f\"Analysis Years: {analysis_years}\"\
    )\n        print(f\"Source File: {source_file_name}\")\n        \n        # Setup\
    \ Gemini\n        model = setup_gemini(gemini_api_key)\n        \n        # Get\
    \ sheet analysis data\n        sheet_analysis = json_data['detailed_results']['sheet_analysis']\n\
    \        successful_sheets = [sheet for sheet in sheet_analysis if sheet['status']\
    \ == 'success']\n        \n        print(f\"\U0001F4CA Processing {len(successful_sheets)}\
    \ successful sheets\")\n        \n        # Process each sheet for each analysis\
    \ year\n        processed_data = {}\n        \n        for sheet_data in successful_sheets:\n\
    \            sheet_name = sheet_data['sheet_name']\n            values_extracted\
    \ = sheet_data['values_extracted']\n            \n            if not values_extracted:\n\
    \                print(f\"\u26A0\uFE0F Skipping {sheet_name} - no values extracted\"\
    )\n                continue\n            \n            print(f\"\\n\U0001F4CB\
    \ Processing sheet: {sheet_name}\")\n            \n            # Create dynamic\
    \ Pydantic model\n            dynamic_model, model_name = create_dynamic_pydantic_model(sheet_name,\
    \ values_extracted)\n            \n            # Process for each analysis year\n\
    \            year_results = {}\n            for analysis_year in analysis_years:\n\
    \                result = process_sheet_with_gemini(\n                    model,\
    \ sheet_data, dynamic_model, model_name, analysis_year\n                )\n  \
    \              if result:\n                    year_results[analysis_year] = result\n\
    \                \n                # Rate limiting\n                time.sleep(2)\n\
    \            \n            if year_results:\n                processed_data[sheet_name]\
    \ = year_results\n                print(f\"\u2705 Completed processing {sheet_name}\
    \ for {len(year_results)} years\")\n            \n            # Delay between\
    \ sheets\n            time.sleep(1)\n        \n        if not processed_data:\n\
    \            raise Exception(\"No data was successfully processed\")\n       \
    \ \n        # Create Excel workbook\n        excel_content = create_comprehensive_excel(processed_data,\
    \ source_file_name)\n        \n        # Upload to Google Drive\n        access_token\
    \ = authenticate_google_drive(nango_connection_id, nango_key)\n        timestamp\
    \ = datetime.now().strftime('%Y%m%d_%H%M%S')\n        filename = f\"Financial_Analysis_{source_file_name}_{timestamp}.xlsx\"\
    \n        \n        upload_result = upload_excel_to_drive(excel_content, filename,\
    \ access_token)\n        \n        # Compile final results\n        final_results\
    \ = {\n            'status': 'success',\n            'sheets_processed': len(processed_data),\n\
    \            'analysis_years': analysis_years,\n            'source_file': source_file_name,\n\
    \            'excel_created': upload_result['success'],\n            'excel_file_id':\
    \ upload_result.get('file_id'),\n            'excel_file_url': upload_result.get('file_url'),\n\
    \            'excel_filename': filename,\n            'processing_timestamp':\
    \ datetime.now().isoformat(),\n            'total_extractions': sum(len(year_data)\
    \ for year_data in processed_data.values())\n        }\n        \n        print(\"\
    \\n\" + \"=\" * 70)\n        print(\"\U0001F389 FINANCIAL ANALYSIS COMPLETE!\"\
    )\n        print(\"=\" * 70)\n        print(f\"\u2705 Sheets processed: {final_results['sheets_processed']}\"\
    )\n        print(f\"\U0001F4C5 Analysis years: {len(analysis_years)}\")\n    \
    \    print(f\"\U0001F522 Total extractions: {final_results['total_extractions']}\"\
    )\n        \n        if final_results['excel_created']:\n            print(f\"\
    \U0001F4CA Excel file created: {filename}\")\n            print(f\"\U0001F517\
    \ Access at: {final_results['excel_file_url']}\")\n        else:\n           \
    \ print(f\"\u26A0\uFE0F Excel upload failed: {upload_result.get('error', 'Unknown\
    \ error')}\")\n        \n        return final_results\n        \n    except Exception\
    \ as e:\n        return {\n            'status': 'error',\n            'error':\
    \ str(e),\n            'sheets_processed': 0,\n            'excel_created': False,\n\
    \            'processing_timestamp': datetime.now().isoformat()\n        }\n\n\
    # Execute main function and output results\nresults = main()\nprint(f\"\\n__OUTPUTS__\
    \ {json.dumps(results)}\")\n"
  depends_on:
  - download_json_results
  description: Create dynamic Pydantic models from JSON and process with Gemini for
    each analysis year
  requirements:
  - pydantic
  - google-generativeai
  - requests
  - openpyxl
  - pandas
  timeout_seconds: 600
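# Note: process_sheet_with_gemini above relies on Gemini JSON mode, passing
# response_mime_type="application/json" together with a response_schema built
# from the dynamic pydantic model; whether a pydantic class is accepted directly
# as a schema depends on the google-generativeai version resolved at install time.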
inputs:
- name: nango_connection_id
  type: string
  default: 4274993f-c614-4efa-a01e-8d07422f4b09
  required: true
  description: Nango connection ID for Google Drive access
- name: nango_key
  type: string
  default: 8df3e2de-2307-48d3-94bd-ddd3fd6a62ec
  required: true
  description: Nango authentication key
- name: file_id
  type: string
  default: 1Glz-mTMDCuGzUGoTLWOrSyIMI1sJDYlA
  required: true
  description: Google Drive file ID of the Excel file to analyze
- name: analysis_depth
  type: string
  default: comprehensive
  required: false
  description: 'Analysis depth: basic, comprehensive, or detailed'
- name: target_sheets
  type: string
  default: Volumes,Power & Fuel,Per tonne,P&L,Operating metrics,Cash Flow,Balance
    Sheet,Financial Ratios,Valuation
  required: false
  description: Comma-separated list of sheet names to analyze
- name: results_file_name
  type: string
  default: Excel_Analysis_Results
  required: false
  description: Name for the results file to upload to Google Drive
- name: analysis_years
  type: string
  default: FY2024
  required: false
  description: Comma-separated list of analysis years to process
- name: gemini_api_key
  type: string
  default: AIzaSyB0_e6aU4gF-qRapMm3UYBSITpbd0ehsYk
  required: true
  description: Gemini API key for AI processing
outputs:
  excel_created:
    type: boolean
    source: create_dynamic_models_and_process.excel_created
    description: Whether Excel file was successfully created
  final_excel_url:
    type: string
    source: create_dynamic_models_and_process.excel_file_url
    description: Direct link to final Excel analysis file
  sheets_available:
    type: integer
    source: download_json_results.sheet_count
    description: Number of sheets available for processing
  sheets_processed:
    type: integer
    source: create_dynamic_models_and_process.sheets_processed
    description: Number of sheets successfully processed
  processing_status:
    type: string
    source: create_dynamic_models_and_process.status
    description: Overall processing status
  total_extractions:
    type: integer
    source: create_dynamic_models_and_process.total_extractions
    description: Total number of data extractions performed
  final_excel_file_id:
    type: string
    source: create_dynamic_models_and_process.excel_file_id
    description: Google Drive file ID of final Excel analysis
  json_download_status:
    type: string
    source: download_json_results.status
    description: Status of JSON file download
version: '2.0'
namespace: excel_analysis
description: Simplified workflow to analyze a single Excel file and upload results
  to Google Drive
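Every task script above ends with print(f"\n__OUTPUTS__ {json.dumps(results)}"), which is the contract between a script and the workflow engine. The engine's own parsing is not shown on this page; a hypothetical sketch of how such a runner could recover a task's outputs from captured stdout:

import json

def parse_task_outputs(stdout: str) -> dict:
    """Recover the JSON payload following the last __OUTPUTS__ marker in a task's stdout."""
    marker = "__OUTPUTS__"
    for line in reversed(stdout.splitlines()):
        if line.startswith(marker):
            return json.loads(line[len(marker):].strip())
    raise ValueError("no __OUTPUTS__ line found in task stdout")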
Executions
Execution ID Status Started
7a8e8bb9... COMPLETED 2025-07-31 18:27:43
25eeb35a... COMPLETED 2025-07-31 18:24:53
8128a06a... COMPLETED 2025-07-31 18:07:49
29ad898b... COMPLETED 2025-07-31 17:01:50
a99889a4... COMPLETED 2025-07-31 11:06:42
cba6488d... COMPLETED 2025-07-31 11:02:07
8a67cfa3... COMPLETED 2025-07-31 10:53:23
f07e88a6... COMPLETED 2025-07-31 10:35:46
7d0cb621... COMPLETED 2025-07-31 10:09:01
997f4416... COMPLETED 2025-07-31 09:33:18
bc7419e9... COMPLETED 2025-07-31 09:27:11
f46a2948... COMPLETED 2025-07-31 09:12:08
ccb4fe69... COMPLETED 2025-07-31 09:09:30
6a4aed8b... COMPLETED 2025-07-31 07:51:28
e46f3152... COMPLETED 2025-07-31 07:47:29
ebe186d3... COMPLETED 2025-07-31 07:45:12
a818f809... COMPLETED 2025-07-31 07:43:17
c615b214... COMPLETED 2025-07-31 07:40:16
f7a94b6a... COMPLETED 2025-07-31 07:38:43
c4920a09... COMPLETED 2025-07-31 07:36:09