Production observability with Prometheus + Grafana + Alertmanager
┌───────────────────────────────────────────┐
│      COSMIC Services (Instrumented)       │
│      ──────────────────────────────       │
│  • Orchestrator API (8100)                │
│  • Ollama Bridge (8200)                   │
│  • /metrics endpoints                     │
└──────────────┬────────────────────────────┘
               │
               │ Scrape every 15s
               │ (pull-based collection)
               ▼
    ┌──────────────────────┐
    │      Prometheus      │
    │    Time-series DB    │
    │      Port 9090       │
    │                      │
    │  • Metrics storage   │
    │  • Query engine      │
    │  • Alert rules       │
    └──────────┬───────────┘
               │
        ┌──────┴────────┐
        │               │
        ▼               ▼
┌──────────────┐ ┌──────────────┐
│   Grafana    │ │    Alert-    │
│  Dashboard   │ │   manager    │
│  Port 3100   │ │  Port 9093   │
│              │ │              │
│  Visualize   │ │    Notify    │
│   metrics    │ │  on issues   │
└──────────────┘ └──────┬───────┘
                        │
                        ▼
              Email/Slack/PagerDuty
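
To make the pull-based flow above concrete, here is a minimal sketch of what a /metrics endpoint could return when Prometheus scrapes it. It uses only the Python standard library and renders counters in the Prometheus text exposition format. The metric name cosmic_http_requests_total, the label names, and the helper functions inc and render_metrics are assumptions chosen for illustration; a real service would more likely rely on an official client library.

```python
from typing import Dict, Tuple

# Hypothetical in-process counter store (an assumption, not COSMIC's code):
# (metric name, sorted label pairs) -> current value.
LabelSet = Tuple[Tuple[str, str], ...]
counters: Dict[Tuple[str, LabelSet], float] = {}


def inc(name: str, amount: float = 1.0, **labels: str) -> None:
    """Increment the counter identified by its name and label set."""
    key = (name, tuple(sorted(labels.items())))
    counters[key] = counters.get(key, 0.0) + amount


def render_metrics() -> str:
    """Render all counters in the Prometheus text exposition format.

    Note: this sketch does not escape special characters in label values.
    """
    lines = []
    seen = set()
    for (name, labels), value in sorted(counters.items()):
        if name not in seen:
            # One TYPE line per metric family, before its first sample.
            lines.append(f"# TYPE {name} counter")
            seen.add(name)
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    inc("cosmic_http_requests_total", service="orchestrator", status="200")
    inc("cosmic_http_requests_total", service="orchestrator", status="500")
    print(render_metrics(), end="")
```

Each service only exposes its current values; Prometheus decides when to collect them (every 15s here), which is why the services need no knowledge of the monitoring stack.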
┌─────────────────────────────────┐
│       Key Metrics Tracked       │
│       ───────────────────       │
│  • API latency (p50/p95/p99)    │
│  • Error rate percentage        │
│  • Embedding generation time    │
│  • DuckDB query performance     │
│  • Ollama bridge uptime         │
│  • Cost tracking (Claude usage) │
└─────────────────────────────────┘
Prometheus scrapes endpoints every 15s
Time-series database with configurable retention
Real-time Grafana dashboards
Alertmanager for proactive notifications
Global health overview
Detailed query performance analysis
Claude vs Ollama cost monitoring
Docker infrastructure metrics
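
The p50/p95/p99 latency figures listed among the key metrics are typically estimated from histogram buckets rather than from raw samples. The sketch below is modeled on the linear interpolation that PromQL's histogram_quantile applies to classic histograms with cumulative buckets. The bucket bounds and counts are made-up illustration data, not measurements from COSMIC, and the Python function is a simplified stand-in, not Prometheus's implementation.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with a +Inf bucket,
    mirroring the `le` buckets of a Prometheus histogram.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            span = count - prev_count
            if math.isinf(upper) or span == 0:
                # Overflow or empty bucket: fall back to the last known bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (upper - prev_bound) * (rank - prev_count) / span
        prev_bound, prev_count = upper, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical API latency buckets in seconds (cumulative counts).
    latency = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
    for q in (0.5, 0.95, 0.99):
        print(f"p{int(q * 100)}: {histogram_quantile(q, latency):.3f}s")
```

Because the estimate interpolates within a bucket, its accuracy depends on how closely the bucket boundaries bracket the latencies of interest; boundaries placed near the expected p95 and p99 values give tighter estimates.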