📊 Monitoring Stack

Production observability with Prometheus + Grafana + Alertmanager

Monitoring Architecture

┌───────────────────────────────────────────┐
│   COSMIC Services (Instrumented)         │
│   ────────────────────────────────        │
│   • Orchestrator API (8100)              │
│   • Ollama Bridge (8200)                 │
│   • /metrics endpoints                   │
└──────────────┬────────────────────────────┘
               │
               │ Scrape every 15s
               │ (pull-based collection)
               ▼
    ┌──────────────────────┐
    │  📊 Prometheus       │
    │  Time-series DB      │
    │  Port 9090           │
    │                      │
    │  • Metrics storage   │
    │  • Query engine      │
    │  • Alert rules       │
    └──────────┬───────────┘
               │
        ┌──────┴────────┐
        │               │
        ▼               ▼
┌──────────────┐  ┌──────────────┐
│  📈 Grafana  │  │  🔔 Alert-   │
│  Dashboard   │  │  manager     │
│  Port 3100   │  │  Port 9093   │
│              │  │              │
│  Visualize   │  │  Notify      │
│  metrics     │  │  on issues   │
└──────────────┘  └──────┬───────┘
                         │
                         ▼
                  Email/Slack/PagerDuty
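
The instrumented services in the diagram expose a `/metrics` endpoint that Prometheus pulls from. As a rough illustration of what such an endpoint returns, the sketch below renders a counter and a gauge in the Prometheus text exposition format using only the Python standard library. The metric names (`cosmic_requests_total`, `cosmic_bridge_up`) and the `Metric` helper are hypothetical and not taken from the COSMIC codebase; a real service would more likely rely on a client library.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """One metric family in the text exposition format (hypothetical helper)."""
    name: str
    help_text: str
    kind: str  # "counter" or "gauge"
    samples: dict = field(default_factory=dict)  # sorted label tuple -> value

    def set(self, value: float, **labels: str) -> None:
        # Sort labels so the same label set always maps to the same key.
        self.samples[tuple(sorted(labels.items()))] = value

    def render(self) -> str:
        lines = [f"# HELP {self.name} {self.help_text}",
                 f"# TYPE {self.name} {self.kind}"]
        for labels, value in self.samples.items():
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            selector = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{self.name}{selector} {value}")
        return "\n".join(lines) + "\n"


def render_metrics(metrics: list) -> str:
    """Concatenate all metric families into one response body."""
    return "".join(m.render() for m in metrics)


# Assumption: illustrative metric names and values only.
requests_total = Metric("cosmic_requests_total", "Total API requests.", "counter")
requests_total.set(118, service="orchestrator", status="success")
bridge_up = Metric("cosmic_bridge_up", "1 if the Ollama bridge is reachable.", "gauge")
bridge_up.set(1)

body = render_metrics([requests_total, bridge_up])
print(body)
```

The string in `body` is what an HTTP handler would return for `GET /metrics` with a plain-text content type.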

┌─────────────────────────────────┐
│  Key Metrics Tracked            │
│  ──────────────────             │
│  • API latency (p50/p95/p99)    │
│  • Error rate percentage        │
│  • Embedding generation time    │
│  • DuckDB query performance     │
│  • Ollama bridge uptime         │
│  • Cost tracking (Claude usage) │
└─────────────────────────────────┘
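
Percentile latencies such as p50/p95/p99 are typically estimated from histogram buckets rather than stored per request. The sketch below estimates a quantile from cumulative bucket counts with linear interpolation inside the matching bucket, which is the same general idea as PromQL's `histogram_quantile`. The bucket boundaries and counts are made-up illustration values, not COSMIC measurements, and `quantile_from_buckets` is a hypothetical helper.

```python
import math


def quantile_from_buckets(q: float, buckets: list) -> float:
    """Estimate quantile q from (upper_bound_ms, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Cannot interpolate into +Inf: report the last finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return upper
            # Linear interpolation between the bucket's lower and upper bound.
            return prev_bound + (upper - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = upper, count
    return prev_bound


# Assumption: hypothetical latency histogram, bounds in milliseconds.
latency_buckets = [(50, 30), (100, 80), (250, 95), (500, 99), (math.inf, 100)]

for q in (0.50, 0.95, 0.99):
    print(f"p{int(q * 100)}: {quantile_from_buckets(q, latency_buckets):.0f} ms")
```

Because the estimate is interpolated, its accuracy depends on how finely the buckets are spaced around the latencies of interest.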

4 Pillars of Observability

Metrics Collection

Prometheus scrapes endpoints every 15s

API response times
Ollama bridge health
DuckDB query latency
Embedding generation metrics
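
Pull-based collection means Prometheus issues an HTTP GET against each `/metrics` endpoint on every scrape and parses the plain-text response. A minimal sketch of that parsing step is shown below. It handles only simple `name{labels} value` lines (no timestamps, escaping, or multi-line families), and the sample payload and metric names are invented for illustration.

```python
import re

# Simplified patterns: metric name, optional {label="value",...}, numeric value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> list:
    """Return (name, labels, value) tuples from exposition-format text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Assumption: hypothetical scrape payload, not real COSMIC output.
payload = """\
# HELP cosmic_api_requests_total Total API requests.
# TYPE cosmic_api_requests_total counter
cosmic_api_requests_total{service="orchestrator"} 118
cosmic_ollama_bridge_up 1
"""

for name, labels, value in parse_metrics(payload):
    print(name, labels, value)
```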

Data Storage

Time-series database with configurable retention

15-day retention policy
Automatic downsampling
Disk-based persistence
Query-optimized indexes
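
For a sense of scale, retention and scrape interval together determine how many samples each series accumulates. The sketch below applies the rule of thumb from the Prometheus storage documentation (disk ≈ retention seconds × ingested samples per second × bytes per sample). The 2 bytes per sample and the 5,000 active series are assumptions, not measured values from this deployment.

```python
RETENTION_DAYS = 15        # from the retention policy above
SCRAPE_INTERVAL_S = 15     # from the scrape interval above
BYTES_PER_SAMPLE = 2.0     # assumption: rough compressed size per sample
ACTIVE_SERIES = 5_000      # assumption: hypothetical number of active series

SECONDS_PER_DAY = 86_400


def samples_per_series(retention_days: int, scrape_interval_s: int) -> int:
    """Number of samples one series accumulates over the retention window."""
    return retention_days * SECONDS_PER_DAY // scrape_interval_s


def estimated_disk_bytes(series: int, retention_days: int,
                         scrape_interval_s: int, bytes_per_sample: float) -> float:
    """Rule-of-thumb disk estimate: retention * samples/s * bytes/sample."""
    samples_per_second = series / scrape_interval_s
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


per_series = samples_per_series(RETENTION_DAYS, SCRAPE_INTERVAL_S)
disk = estimated_disk_bytes(ACTIVE_SERIES, RETENTION_DAYS,
                            SCRAPE_INTERVAL_S, BYTES_PER_SAMPLE)
print(f"{per_series} samples per series, ~{disk / 1e9:.2f} GB on disk")
```

Halving the scrape interval doubles both figures, which is why the interval and the retention window are usually tuned together.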

Visualization

Real-time Grafana dashboards

System health overview
Query performance tracking
Error rate monitoring
Cost analysis (Claude vs Ollama)

Alerting

Alertmanager for proactive notifications

API latency > 500ms threshold
Error rate > 1% alert
Ollama downtime detection
Email/Slack integration
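
The thresholds above translate into simple comparisons evaluated on each rule cycle. In a real deployment they would live in Prometheus alerting rules and be routed by Alertmanager; the Python sketch below only mirrors the decision logic. The `Snapshot` type, the function name, and the alert names are hypothetical, and the "degraded" values are invented.

```python
from dataclasses import dataclass

LATENCY_THRESHOLD_MS = 500.0   # from the list above
ERROR_RATE_THRESHOLD = 0.01    # 1%, from the list above


@dataclass(frozen=True)
class Snapshot:
    """Point-in-time view of the signals the alerts depend on."""
    latency_ms: float
    requests: int
    errors: int
    ollama_up: bool


def firing_alerts(s: Snapshot) -> list:
    """Return the names of the alerts whose condition currently holds."""
    alerts = []
    if s.latency_ms > LATENCY_THRESHOLD_MS:
        alerts.append("HighApiLatency")
    # Guard against division by zero when no requests were served.
    error_rate = s.errors / s.requests if s.requests else 0.0
    if error_rate > ERROR_RATE_THRESHOLD:
        alerts.append("HighErrorRate")
    if not s.ollama_up:
        alerts.append("OllamaDown")
    return alerts


healthy = Snapshot(latency_ms=78.0, requests=118, errors=0, ollama_up=True)
degraded = Snapshot(latency_ms=640.0, requests=200, errors=5, ollama_up=False)

print(firing_alerts(healthy))
print(firing_alerts(degraded))
```

Production rules usually also require the condition to hold for some duration before firing, which this sketch leaves out.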

Current Production Metrics

API Uptime

100%

no downtime

Median Latency

78ms

p50 response time

Error Rate

0%

118/118 requests succeeded

Bridge Health

100%

Ollama stable

Scrape Interval

15s

Prometheus

Retention

15d

time-series data

Available Dashboards

📊 System Overview

Overall system health at a glance

Total requests/s
P95 latency trends
Error rate timeline
Service health status

🔍 Query Performance

Detailed query performance analysis

Neural vs Symbolic vs Hybrid breakdown
DuckDB query times
Embedding generation latency
Context size distribution

💰 Cost Tracking

Claude vs Ollama cost monitoring

API tokens consumed
Estimated monthly cost
Ollama local savings
Cost per query type
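
Cost per query type can be derived from token counters multiplied by a per-token price. The sketch below shows the arithmetic only: the prices, token counts, and query-type labels are placeholder assumptions and do not reflect actual Claude pricing or COSMIC traffic. Ollama runs locally, so the savings figure is modeled as the API cost that the same tokens would have incurred.

```python
# Assumption: illustrative USD prices per million tokens, not a real price list.
PRICE_PER_MTOK = {"input": 2.00, "output": 10.00}


def query_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """API cost in USD for the given token counts."""
    return (input_tokens * PRICE_PER_MTOK["input"]
            + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000


def monthly_estimate_usd(daily_usage: dict, days: int = 30) -> dict:
    """daily_usage maps query type -> (input_tokens, output_tokens) per day."""
    return {kind: query_cost_usd(i, o) * days
            for kind, (i, o) in daily_usage.items()}


# Assumption: hypothetical daily token counts sent to the Claude API.
claude_usage = {
    "neural": (400_000, 100_000),
    "hybrid": (200_000, 50_000),
}
# Assumption: hypothetical daily tokens served locally by Ollama instead.
ollama_usage = {"symbolic": (300_000, 60_000)}

estimate = monthly_estimate_usd(claude_usage)
savings = monthly_estimate_usd(ollama_usage)

for kind, usd in estimate.items():
    print(f"{kind}: ~${usd:.2f}/month")
print(f"estimated Ollama savings: ~${sum(savings.values()):.2f}/month")
```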

🔌 Infrastructure

Docker infrastructure metrics

Container CPU/memory usage
Network I/O rates
Disk space utilization
Ollama bridge uptime

💡 Key Benefits

Visibility: Real-time metrics across the entire stack
Proactivity: Alerts fire before users notice a problem
Debugging: 15 days of history for root cause analysis
Optimization: Identify performance bottlenecks
Cost Control: Precise tracking of Claude vs Ollama costs

Service Access

Prometheus

http://localhost:9090

Raw metrics + query explorer

Grafana

http://localhost:3100

Visual dashboards (admin/admin)

Alertmanager

http://localhost:9093

Alert configuration
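
A quick way to confirm the three services are reachable is to probe their health endpoints. The sketch below assumes the default health paths (`/-/healthy` for Prometheus and Alertmanager, `/api/health` for Grafana) and the localhost ports listed above. The helper names are hypothetical, and the `fetch` parameter exists only so the logic can be exercised without a running stack.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Assumption: default health paths on the ports listed above.
HEALTH_URLS = {
    "prometheus": "http://localhost:9090/-/healthy",
    "grafana": "http://localhost:3100/api/health",
    "alertmanager": "http://localhost:9093/-/healthy",
}


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code, or 0 on any connection or HTTP error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except (URLError, OSError):
        return 0


def check_services(fetch=http_status) -> dict:
    """Map each service name to True if its health endpoint answered 200."""
    return {name: fetch(url) == 200 for name, url in HEALTH_URLS.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'up' if ok else 'DOWN'}")
```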