Scaling

Guidelines for scaling PolitikTok for larger deployments.

Horizontal Scaling

Application Layer

PolitikTok is stateless at the application layer (sessions are stored in PostgreSQL), so you can run multiple instances behind a load balancer:

                    ┌──────────────┐
                    │ Load Balancer│
                    └──────┬───────┘
               ┌───────────┼───────────┐
               │           │           │
         ┌─────┴──┐  ┌─────┴──┐  ┌─────┴──┐
         │ App 1  │  │ App 2  │  │ App 3  │
         └────────┘  └────────┘  └────────┘
               │           │           │
         ┌─────┴───────────┴───────────┴──┐
         │          PostgreSQL             │
         └────────────────────────────────┘
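The reason any instance can serve any request is that session lookups go to the shared store rather than to instance memory. A minimal sketch of that idea follows; the `SessionStore` alias and `AppInstance` type are hypothetical, and an in-memory map stands in for the PostgreSQL session table so the example stays self-contained.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for the shared PostgreSQL session table (hypothetical; a real
/// deployment would query the database instead of an in-memory map).
type SessionStore = Arc<Mutex<HashMap<String, String>>>;

/// One application instance. It holds a handle to the shared store and
/// keeps no session state of its own.
struct AppInstance {
    sessions: SessionStore,
}

impl AppInstance {
    fn new(sessions: SessionStore) -> Self {
        Self { sessions }
    }

    /// Record a session in the shared store.
    fn login(&self, session_id: &str, user: &str) {
        self.sessions
            .lock()
            .unwrap()
            .insert(session_id.to_string(), user.to_string());
    }

    /// Resolve a session by reading the shared store.
    fn whoami(&self, session_id: &str) -> Option<String> {
        self.sessions.lock().unwrap().get(session_id).cloned()
    }
}
```

If one instance calls `login("s1", "alice")`, a second instance built from a clone of the same store handle resolves `whoami("s1")` to `alice`, which is why the load balancer does not need sticky sessions.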

Database Layer

  • Use PostgreSQL connection pooling (PgBouncer) for high connection counts
  • Consider read replicas for analytics-heavy workloads
  • Qdrant supports distributed mode for large vector datasets
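Connection counts grow with the number of app instances, because each instance typically opens its own pool. The arithmetic can be sketched as below; the function name and the `reserved` safety margin are assumptions for illustration, not PolitikTok settings.

```rust
/// Largest per-instance pool size that keeps the total number of server
/// connections within PostgreSQL's `max_connections`.
///
/// `reserved` is an assumed safety margin for admin tools, migrations,
/// and replication connections.
fn pool_size_per_instance(max_connections: u32, reserved: u32, instances: u32) -> u32 {
    if instances == 0 {
        return 0;
    }
    // saturating_sub avoids underflow when the margin exceeds the limit.
    max_connections.saturating_sub(reserved) / instances
}
```

With PostgreSQL's default `max_connections` of 100, a margin of 10, and three app instances, `pool_size_per_instance(100, 10, 3)` returns 30. When the result gets too small to be useful, a pooler such as PgBouncer lets many client connections share fewer server connections.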

Vertical Scaling

LLM Performance

LLM inference is typically the bottleneck. Options:

Approach                    Benefit
GPU acceleration            10-50x faster inference
Smaller models              Faster responses, less memory
Multiple Ollama instances   Parallel request handling
Remote LLM API              Offload compute entirely
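One way to use multiple Ollama instances is to rotate requests across their endpoints. The following is a minimal round-robin sketch, not PolitikTok's actual dispatch logic; `OllamaPool` is a hypothetical type, and it only picks an endpoint without sending any request.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin selector over Ollama base URLs.
struct OllamaPool {
    endpoints: Vec<String>,
    next: AtomicUsize,
}

impl OllamaPool {
    fn new(endpoints: Vec<String>) -> Self {
        assert!(!endpoints.is_empty(), "at least one endpoint is required");
        Self {
            endpoints,
            next: AtomicUsize::new(0),
        }
    }

    /// Return the next endpoint in rotation. Safe to call from many
    /// threads because the counter is atomic.
    fn pick(&self) -> &str {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.endpoints.len();
        &self.endpoints[i]
    }
}
```

For example, a pool built from `http://localhost:11434` (Ollama's default port) and a second instance assumed to listen on `http://localhost:11435` alternates between the two on successive `pick()` calls. Round-robin ignores how busy each instance is; a least-loaded strategy may suit uneven request lengths better.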

Memory Optimization

Component            Memory Usage      Notes
PolitikTok app       ~100-300 MB       Scales with connections
PostgreSQL           ~256 MB - 2 GB    Depends on data size
Qdrant               ~500 MB - 4 GB    Depends on vector count
Ollama (7B model)    ~4-8 GB           Depends on model size
Ollama (13B model)   ~8-16 GB          GPU recommended
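To size a single host, the ranges above can be summed. The sketch below does that for a deployment running the app, PostgreSQL, Qdrant, and one 7B model; the type and function names are hypothetical, and it assumes 1 GB = 1024 MB.

```rust
/// Approximate memory range for one component, in MB.
#[derive(Clone, Copy)]
struct MemRange {
    min_mb: u32,
    max_mb: u32,
}

/// Sum the lower and upper bounds of all components.
fn total(components: &[MemRange]) -> MemRange {
    components.iter().fold(
        MemRange { min_mb: 0, max_mb: 0 },
        |acc, c| MemRange {
            min_mb: acc.min_mb + c.min_mb,
            max_mb: acc.max_mb + c.max_mb,
        },
    )
}

/// Ranges from the table for a single host running a 7B model.
fn single_host_7b() -> [MemRange; 4] {
    [
        MemRange { min_mb: 100, max_mb: 300 },   // PolitikTok app
        MemRange { min_mb: 256, max_mb: 2048 },  // PostgreSQL
        MemRange { min_mb: 500, max_mb: 4096 },  // Qdrant
        MemRange { min_mb: 4096, max_mb: 8192 }, // Ollama (7B model)
    ]
}
```

`total(&single_host_7b())` gives a range of 4952 MB to 14636 MB, so roughly 5 to 14.5 GB before operating system overhead.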

Monitoring

For production deployments, set up monitoring:

  • Application metrics: Integrate with Prometheus via axum middleware
  • Database monitoring: PostgreSQL pg_stat_* views
  • Infrastructure: Node Exporter + Grafana dashboards
  • Logs: Centralize with Loki, ELK, or similar
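As an illustration of what the application metrics endpoint would expose, here is a dependency-free sketch of a request counter rendered in the Prometheus text exposition format. It deliberately uses neither axum nor a Prometheus client crate, and the metric name `politiktok_http_requests_total` is an assumption; in practice a middleware would call `inc()` per request and a `/metrics` handler would return `render()`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical request counter shared across request handlers.
struct RequestCounter {
    total: AtomicU64,
}

impl RequestCounter {
    fn new() -> Self {
        Self {
            total: AtomicU64::new(0),
        }
    }

    /// Count one handled request.
    fn inc(&self) {
        self.total.fetch_add(1, Ordering::Relaxed);
    }

    /// Render the counter in the Prometheus text exposition format.
    fn render(&self) -> String {
        format!(
            "# HELP politiktok_http_requests_total Total HTTP requests handled.\n\
             # TYPE politiktok_http_requests_total counter\n\
             politiktok_http_requests_total {}\n",
            self.total.load(Ordering::Relaxed)
        )
    }
}
```

A maintained client library is likely the better choice for real deployments, since it also handles labels, histograms, and escaping.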

Caching

Caching is not currently implemented. Planned caching layers:

  • LLM response caching for repeated queries
  • Database query result caching
  • Static asset CDN
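Since none of these layers exist yet, the following is only a sketch of what LLM response caching for repeated queries could look like: an in-memory map keyed by the exact prompt, with a time-to-live. `ResponseCache` and its methods are hypothetical names, not part of PolitikTok.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory cache of LLM responses keyed by exact prompt.
struct ResponseCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl ResponseCache {
    fn new(ttl: Duration) -> Self {
        Self {
            ttl,
            entries: HashMap::new(),
        }
    }

    /// Store a response together with the time it was cached.
    fn put(&mut self, prompt: &str, response: &str) {
        self.entries
            .insert(prompt.to_string(), (Instant::now(), response.to_string()));
    }

    /// Return the cached response, or None if it is missing or older
    /// than the TTL. Expired entries are removed on access.
    fn get(&mut self, prompt: &str) -> Option<String> {
        let expired = match self.entries.get(prompt) {
            Some((stored_at, _)) => stored_at.elapsed() >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(prompt);
            return None;
        }
        self.entries.get(prompt).map(|(_, response)| response.clone())
    }
}
```

An exact-match key only helps when the same prompt recurs verbatim, and a per-process map is not shared across app instances; a shared store would be needed to keep the application layer stateless.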