agtOS provides three observability interfaces: structured JSON logs, Prometheus-compatible metrics, and a health check system with per-service status.

Logging

agtOS uses Pino for structured JSON logging.

Log levels

Set the log level with the LOG_LEVEL environment variable or through the Settings API:

| Level | When to use |
| --- | --- |
| trace | Extremely verbose, per-audio-frame diagnostics |
| debug | Development: request/response details, provider calls |
| info | Production default: startup, requests, errors |
| warn | Degraded behavior: fallback providers, missing optional services |
| error | Failures: provider errors, unhandled exceptions |
| fatal | Critical: server cannot start |

# Set at startup
LOG_LEVEL=debug npx agtos start

# Change at runtime (no restart needed)
curl -X PUT http://localhost:4102/api/settings \
  -H "Content-Type: application/json" \
  -d '{"logLevel": "debug"}'
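
The runtime change can also be scripted. The following is a minimal Python sketch of the same PUT request as the curl call above, using only the standard library. The helper names `build_log_level_request` and `set_log_level` are hypothetical, and the sketch assumes the Settings API accepts exactly the payload shown in the curl example. If AGTOS_API_KEY is set, the request may also need credentials, which this sketch does not cover.

```python
import json
import urllib.request

# The six levels listed in the log level table.
LOG_LEVELS = ("trace", "debug", "info", "warn", "error", "fatal")


def build_log_level_request(level, base_url="http://localhost:4102"):
    """Build (but do not send) the PUT request that changes the log level."""
    if level not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {level!r}")
    body = json.dumps({"logLevel": level}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def set_log_level(level, base_url="http://localhost:4102"):
    """Send the request and return the HTTP status code (hypothetical helper)."""
    request = build_log_level_request(level, base_url)
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.status
```

Calling `set_log_level("debug")` against a running server sends the request. Validating the level locally first avoids a round trip for a typo such as "verbose".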

Output format

| Environment | Format | Best for |
| --- | --- | --- |
| NODE_ENV=production | JSON (structured) | Log aggregation (ELK, Loki, CloudWatch) |
| Development (default) | JSON | Machine parsing; pipe through pino-pretty for readability |

# Pretty-print logs in development
npx agtos start 2>&1 | npx pino-pretty
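
Because every log line is a JSON object, the stream can also be filtered with a few lines of code. The sketch below assumes agtOS keeps Pino's default record shape, in which `level` is a number (trace 10, debug 20, info 30, warn 40, error 50, fatal 60); that assumption is not confirmed here, so the filter also accepts level names. `filter_logs` is a hypothetical helper, not part of agtOS.

```python
import json

# Pino's default numeric values for its standard levels.
PINO_LEVELS = {"trace": 10, "debug": 20, "info": 30, "warn": 40, "error": 50, "fatal": 60}


def filter_logs(lines, min_level="warn"):
    """Yield parsed log records whose level is at or above min_level."""
    threshold = PINO_LEVELS[min_level]
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(record, dict):
            continue
        level = record.get("level")
        # Accept numeric levels (Pino default) as well as level names.
        if isinstance(level, str):
            level = PINO_LEVELS.get(level, 0)
        if isinstance(level, (int, float)) and level >= threshold:
            yield record
```

To use it on live output, pass `sys.stdin` as `lines` in a script that receives the piped output of `npx agtos start 2>&1`.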

Log dashboard

The web dashboard includes a Logs page with filtering by level, component, and correlation ID. Access it at http://localhost:4102 (Logs tab in the sidebar).

Metrics

agtOS exposes a Prometheus-compatible metrics endpoint for integration with Prometheus, Grafana, and other monitoring tools.

Endpoint

GET http://localhost:4102/metrics
Returns metrics in the standard text exposition format (text/plain; version=0.0.4).
The /metrics endpoint does not require authentication, even when AGTOS_API_KEY is set. This allows Prometheus to scrape without token configuration.
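
For ad hoc inspection without a Prometheus server, the text exposition format is simple enough to parse directly. The sketch below handles the common `name{labels} value [timestamp]` line shape and skips `# HELP` and `# TYPE` comments. It is a simplified reader rather than a full implementation of the format (escape sequences inside label values are left as-is), and `parse_metrics` and `fetch_metrics` are hypothetical helper names.

```python
import re
import urllib.request

# One sample line: metric_name{label="value",...} value [timestamp]
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified reader understands
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        # float() also accepts the special values "+Inf", "-Inf" and "NaN".
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fetch_metrics(base_url="http://localhost:4102"):
    """Download and parse /metrics; no credentials are sent."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

The concrete metric names are whatever the endpoint returns; none are assumed here.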

Available metrics

| Metric | Type | Description |
| --- | --- | --- |
| HTTP request count | Counter | Total requests by method, path, and status code |
| HTTP request latency | Histogram | Request duration in seconds (buckets: 0.1s, 0.5s, 1s, 5s) |
| Active voice sessions | Gauge | Currently connected WebSocket clients |
| STT latency | Histogram | Speech-to-text processing time (p50, p95, p99) |
| TTS latency | Histogram | Text-to-speech synthesis time (p50, p95, p99) |
| Audio chunks in/out | Counter | Total audio frames received and sent |
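
A Prometheus histogram exposes cumulative bucket counts, so percentiles such as p50, p95, and p99 are estimated from those buckets, normally at query time with PromQL's `histogram_quantile`. The sketch below illustrates the idea with linear interpolation inside the matching bucket. It uses the request latency bucket bounds from the table; the counts in the usage note are hypothetical, and `estimate_quantile` is an illustrative helper, not agtOS code.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # No upper edge to interpolate toward: report the last finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With hypothetical cumulative counts `[(0.1, 50), (0.5, 90), (1.0, 98), (5.0, 100), (math.inf, 100)]`, the p95 estimate falls in the 0.5s to 1s bucket. With only four finite buckets the estimate is coarse, which is worth keeping in mind when reading latency percentiles.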

Prometheus configuration

# prometheus.yml
scrape_configs:
  - job_name: 'agtos'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:4102']
    metrics_path: '/metrics'

Health Checks

The health system provides per-service status with response times.

Endpoints

| Endpoint | Description |
| --- | --- |
| GET /health | Aggregated health of all services |
| GET /health/metrics | Metrics summary (request rates, latency percentiles) |
| GET /health/:serviceName | Individual service health (e.g., /health/redis) |
| GET /api/health | Same as /health (under API prefix, requires auth if enabled) |

Service checkers

The health manager runs periodic checks on:

| Service | What it checks |
| --- | --- |
| redis | Connection, ping latency |
| stt-sherpa-onnx | STT provider availability and model status |
| tts-sherpa-onnx | TTS provider availability and model status |
| ollama | Server reachability at OLLAMA_HOST |
| claude | API key validity (cached) |
| mcp-server | MCP server listening on port |

Response format

{
  "status": "healthy",
  "services": {
    "redis": { "status": "healthy", "responseTime": 2 },
    "ollama": { "status": "healthy", "responseTime": 12 },
    "stt-sherpa-onnx": { "status": "healthy", "responseTime": 5 }
  },
  "timestamp": 1711612800000
}
The endpoint returns HTTP 200 when all services are healthy and 503 when any service is degraded.
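
A monitoring script can combine the status code with the per-service entries. The sketch below is a minimal Python client for the response shape shown above. It assumes a 503 response carries the same JSON body as a 200 response, which is not confirmed here; `fetch_health` and `summarize_health` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request


def fetch_health(base_url="http://localhost:4102"):
    """Return (http_status, parsed_body) for GET /health."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # urllib raises on 503; the body is still readable (assumed to be JSON).
        return err.code, json.load(err)


def summarize_health(payload):
    """Return (all_healthy, unhealthy_names, slowest_name) for a /health body."""
    services = payload.get("services", {})
    unhealthy = sorted(
        name for name, info in services.items() if info.get("status") != "healthy"
    )
    slowest = max(
        services,
        key=lambda name: services[name].get("responseTime", 0),
        default=None,
    )
    all_healthy = payload.get("status") == "healthy" and not unhealthy
    return all_healthy, unhealthy, slowest
```

For the example response above, `summarize_health` reports all services healthy and `ollama` as the slowest check.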

Dashboard

The web dashboard's Health page shows service health as color-coded status cards with latency metrics and refreshes automatically every 5 seconds. Access it at http://localhost:4102.

CLI diagnostics

# Quick health check
curl http://localhost:4102/health

# Comprehensive diagnostics (12 checks with remediation hints)
npx agtos doctor
See CLI Reference for details on the doctor command.