
🤖 LLM Orchestration Process

Intelligent Claude/Ollama routing for cost-aware optimization

Pipeline Orchestration

┌────────────────────────────────┐
│  💬 User Query + Context      │
│  (from RAG/Symbolic/Forecast) │
└──────────────┬─────────────────┘
               │
               ▼
    ┌──────────────────────┐
    │  🔍 Complexity Score │
    │  • Token count       │
    │  • Query type        │
    │  • Domain expertise  │
    └──────────┬───────────┘
               │
        ┌──────▼──────┐
        │  Cost-aware │
        │   routing   │
        └──┬───────┬──┘
           │       │
    Simple │       │ Complex
      │    │       │    │
      ▼    ▼       ▼    ▼
┌─────────────┐  ┌──────────────┐
│ 🏠 Ollama   │  │ ☁️ Claude API │
│ • Free      │  │ • High qual  │
│ • Fast      │  │ • Reasoning  │
│ • llama3.1  │  │ • Sonnet 4.5 │
└─────┬───────┘  └──────┬───────┘
      │                 │
      └────────┬────────┘
               │
               ▼
    ┌──────────────────────┐
    │  ✅ Response Check   │
    │  • Quality score     │
    │  • Fallback logic    │
    └──────────┬───────────┘
               │
               ▼
    ┌──────────────────────┐
    │  📦 Final Answer     │──► User
    └──────────────────────┘
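
The last two stages of the diagram (response check, then fallback) can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation: `orchestrate`, `quality_score`, `Answer`, and the `min_quality` threshold are hypothetical names, the word-count quality heuristic is a placeholder, and the two backends are passed in as plain callables so that no real Ollama or Claude client call is assumed.

```python
from dataclasses import dataclass
from typing import Callable

# A backend is any function that takes a prompt and returns text.
# Real Ollama or Claude clients would be wrapped to fit this shape (assumption).
Backend = Callable[[str], str]


@dataclass
class Answer:
    text: str
    backend: str     # "ollama" or "claude"
    fell_back: bool  # True when the local draft was rejected


def quality_score(response: str) -> float:
    """Placeholder check: empty or very short answers score low."""
    words = response.split()
    if not words:
        return 0.0
    return min(1.0, len(words) / 20)


def orchestrate(
    query: str,
    is_complex: bool,
    local: Backend,
    cloud: Backend,
    min_quality: float = 0.5,  # hypothetical acceptance threshold
) -> Answer:
    """Route a query, check the response, and fall back if needed."""
    if is_complex:
        # Complex branch: go straight to the cloud model.
        return Answer(cloud(query), "claude", False)

    # Simple branch: try the local model first.
    draft = local(query)
    if quality_score(draft) >= min_quality:
        return Answer(draft, "ollama", False)

    # Response check failed: retry on the cloud model.
    return Answer(cloud(query), "claude", True)
```

Injecting the backends keeps the routing logic testable with stub functions and independent of any particular client library.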

Routing Logic

🏠 Ollama (Local)

  • Simple queries
  • Factual retrieval
  • Document summarization
  • Cost: $0.00
  • Latency: ~200ms

☁️ Claude API

  • Complex reasoning
  • Multi-step analysis
  • Creative synthesis
  • Cost: ~$0.02/query
  • Latency: ~800ms
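
One way to turn the three complexity signals (token count, query type, domain expertise) into a routing decision is a weighted score compared against a threshold. The sketch below is illustrative only: the weights, the `token_budget` and `threshold` values, the query-type labels, and the names `Query`, `complexity_score`, and `route` are all assumptions, and tokens are approximated by whitespace-separated words.

```python
from dataclasses import dataclass

# Hypothetical labels for the query types listed under the Claude API column.
COMPLEX_TYPES = {"reasoning", "multi_step_analysis", "creative_synthesis"}


@dataclass(frozen=True)
class Query:
    text: str
    query_type: str  # e.g. "factual_retrieval", "summarization", "reasoning"
    needs_domain_expertise: bool = False


def complexity_score(q: Query, token_budget: int = 2000) -> float:
    """Combine the three signals into a score between 0 and 1."""
    tokens = len(q.text.split())  # rough stand-in for a real tokenizer
    length_part = min(1.0, tokens / token_budget)
    type_part = 1.0 if q.query_type in COMPLEX_TYPES else 0.0
    domain_part = 1.0 if q.needs_domain_expertise else 0.0
    # Assumed weights: query type dominates, length matters least.
    return 0.2 * length_part + 0.5 * type_part + 0.3 * domain_part


def route(q: Query, threshold: float = 0.5) -> str:
    """Return "claude" for complex queries and "ollama" otherwise."""
    return "claude" if complexity_score(q) >= threshold else "ollama"


if __name__ == "__main__":
    print(route(Query("What is the capital of France?", "factual_retrieval")))
    print(route(Query("Compare these two forecasts step by step.", "reasoning")))
```

With these weights a complex query type alone is enough to reach the threshold, while length or domain expertise alone is not; in practice the weights would need tuning against real traffic.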

💡 Smart Routing Benefits

  • Cost Optimization: 80% of queries are handled by the free local Ollama model
  • Performance: Local-first routing lowers the average response time
  • Quality Guarantee: Claude fallback for complex tasks
  • Privacy: Sensitive queries stay local
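
As a rough check on the cost and performance claims, the figures quoted above (80% of queries local, $0.00 and ~200ms for Ollama, ~$0.02 and ~800ms for Claude) can be combined into a blended average. The helper name `blended` is hypothetical, and the estimate ignores fallback cases in which a rejected local attempt is followed by a Claude call.

```python
def blended(local_share: float, local_value: float, cloud_value: float) -> float:
    """Weighted average of a per-query metric across the two backends."""
    return local_share * local_value + (1 - local_share) * cloud_value


LOCAL_SHARE = 0.8  # share of queries handled by Ollama, from the list above

avg_cost = blended(LOCAL_SHARE, 0.00, 0.02)     # dollars per query
avg_latency = blended(LOCAL_SHARE, 200, 800)    # milliseconds

print(f"Average cost per query: ${avg_cost:.4f}")
print(f"Average latency: {avg_latency:.0f} ms")
```

Under these assumptions the blended cost is about $0.004 per query, against $0.02 if every query went to Claude, and the blended latency is about 320 ms.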

📊 Architecture Visualisations

Hardware Architecture

Figure: Hardware Architecture Diagram

Phase Migration: Virtual → Production

Figure: Phase Migration, Virtual to Production