Multi-Server Deployment Guide¶

PLAN-129 Phase 2.3: Distributed State Infrastructure for Multi-Server Deployments

This guide explains how to deploy Cesivi Server in a multi-server configuration with load balancing, using Redis for distributed state and SQL Server (or other backends) for shared storage.

Overview¶

Cesivi Server supports multi-server deployments with:

Distributed Session State: Sessions persist across server restarts and are accessible from any server instance
Distributed Cache: User permissions and authentication tokens are cached, with pub/sub invalidation
Distributed Locks: Coordination for critical operations
Pub/Sub: Cache invalidation notifications propagate to all servers
Queue: Background job processing across servers
Load Balancing: No sticky sessions required - any server can handle any request

Key Benefits:

High Availability: Server failures don't lose user sessions
Horizontal Scalability: Add more servers as load increases
Zero Downtime Deployments: Rolling updates without session loss
Fault Tolerance: Redis replication and SQL Server clustering support

Architecture¶

┌──────────────┐
│ Load Balancer│ (Nginx/HAProxy/Azure LB)
│ (Round-Robin)│
└──────┬───────┘
       │
   ┌───┴────┬────────┬────────┐
   │        │        │        │
┌──▼──┐  ┌──▼──┐  ┌──▼──┐  ┌──▼──┐
│SPM 1│  │SPM 2│  │SPM 3│  │ ... │  Cesivi Server instances
└──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘
   │        │        │        │
   └────────┴────────┴────────┘
            │
       ┌────▼────┐
       │  Redis  │ ← Distributed State (session/cache/locks/pub-sub/queue)
       │ Primary │
       └────┬────┘
            │ (optional replication)
       ┌────▼────┐
       │  Redis  │
       │ Replica │
       └─────────┘
            │
   ┌────────┴────────┐
   │                 │
┌──▼────────┐   ┌────▼──────┐
│SQL Server │   │ File Share│ ← Shared Storage (persistent data)
│(Primary)  │   │ (NFS/S3)  │
│(Replica)  │   │           │
└───────────┘   └───────────┘

Components:

Load Balancer: Distributes requests across SPM instances (round-robin, least-connections, etc.)
Cesivi Server Instances: Stateless application servers (scale horizontally)
Redis: Stores session state and cache, and provides pub/sub, distributed locks, and queues
Shared Storage: SQL Server, PostgreSQL, or file share for persistent SharePoint data

Prerequisites¶

Infrastructure Requirements¶

Per Server Instance:

CPU: 2+ cores
RAM: 4GB+ (8GB recommended)
Disk: 10GB+ (depends on storage backend)

Redis:

CPU: 1-2 cores
RAM: 2GB+ (size based on session/cache volume)
Disk: 1GB+ for persistence (AOF/RDB)

SQL Server (if used):

CPU: 2+ cores
RAM: 4GB+
Disk: 50GB+ (depends on data volume)

Software Requirements¶

Docker & Docker Compose (for containerized deployments) OR
.NET 10.0 SDK/Runtime (for manual deployments)
Redis 7.0+
SQL Server 2019+ or PostgreSQL 13+ (optional)
Load balancer (Nginx, HAProxy, Azure Load Balancer, etc.)

Quick Start with Docker Compose¶

The easiest way to test multi-server deployment is using the provided docker-compose.multiserver.yml:

# 1. Start all services (3 SPM instances + Redis + SQL Server + Nginx)
docker-compose -f docker-compose.multiserver.yml up -d

# 2. Check service health
docker-compose -f docker-compose.multiserver.yml ps

# 3. Access Cesivi Server through load balancer
curl http://localhost:8080/health

# 4. View logs
docker-compose -f docker-compose.multiserver.yml logs -f spm-server-1

# 5. Stop all services
docker-compose -f docker-compose.multiserver.yml down

What's Included:

3x Cesivi Server instances (ports 5001, 5002, 5003)
1x Redis (port 6379) - Distributed state backend
1x SQL Server Express (port 1433) - Shared storage (optional)
1x Nginx (port 8080) - Load balancer

Testing Session Persistence:

# Create a session on any server
curl -c cookies.txt http://localhost:8080/api/session/create

# Subsequent requests will be load-balanced but session persists
curl -b cookies.txt http://localhost:8080/api/session/get

# Stop server 1, session still works on servers 2 and 3
docker stop spm-server-1
curl -b cookies.txt http://localhost:8080/api/session/get
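
The manual check above can be repeated in a loop so that the load balancer has a chance to route requests to every remaining instance. A minimal sketch, assuming the /api/session/get endpoint from the commands above returns a 2xx status while the session is valid; the function name and request count are illustrative.

```shell
# Hypothetical helper: replay a saved session cookie N times through the
# load balancer and count non-success responses.
# Usage: check_session_persistence BASE_URL COOKIE_FILE [REQUESTS]
check_session_persistence() {
    base_url=$1
    cookie_file=$2
    requests=${3:-20}
    failures=0
    i=0
    while [ "$i" -lt "$requests" ]; do
        # -s: quiet, -o /dev/null: discard body, -w: print only the HTTP status
        status=$(curl -s -o /dev/null -w '%{http_code}' -b "$cookie_file" \
            "$base_url/api/session/get")
        case $status in
            2??) ;;                          # session accepted by this instance
            *) failures=$((failures + 1)) ;; # rejected, or instance unreachable
        esac
        i=$((i + 1))
    done
    echo "requests=$requests failures=$failures"
    [ "$failures" -eq 0 ]
}
```

For example, after `docker stop spm-server-1`, running `check_session_persistence http://localhost:8080 cookies.txt 30` should report zero failures if sessions are shared correctly.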

Configuration¶

Environment Variables¶

Distributed State Configuration:

# Redis provider (multi-server)
Cesivi__DistributedState__Provider=Redis
Cesivi__DistributedState__ConnectionString=redis-host:6379,password=YourPassword,ssl=true

# InMemory provider (single-server)
Cesivi__DistributedState__Provider=InMemory

Storage Configuration:

# InMemory (development/testing)
Cesivi__Storage__Provider=InMemory

# SQL Server (production)
Cesivi__Storage__Provider=SqlServer
Cesivi__Storage__ConnectionString=Server=sql-host,1433;Database=Cesivi;User=sa;Password=YourPassword;TrustServerCertificate=True

# PostgreSQL (production)
Cesivi__Storage__Provider=PostgreSQL
Cesivi__Storage__ConnectionString=Host=pg-host;Port=5432;Database=Cesivi;Username=postgres;Password=YourPassword

Farm Configuration:

# Enable farm mode (multi-server)
Cesivi__Farm__Enabled=true
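# ServerId must be unique per server (kept on its own line because some env-file formats treat inline comments as part of the value)
Cesivi__Farm__ServerId=spm-server-1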
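
The double underscores in these variable names are how the ASP.NET Core environment-variable configuration provider represents nesting: each `__` becomes a `:` section separator, so the variables line up with the JSON sections shown in the next section. A small sketch of that mapping (the helper name is hypothetical):

```shell
# Hypothetical helper: show which configuration key an environment
# variable name maps to (each "__" becomes the ":" section separator).
env_to_config_key() {
    printf '%s\n' "$1" | sed 's/__/:/g'
}

env_to_config_key Cesivi__Farm__ServerId
# prints: Cesivi:Farm:ServerId
```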

appsettings.Production.json¶

{
  "Cesivi": {
    "DistributedState": {
      "Provider": "Redis",
      "ConnectionString": "redis-cluster.example.com:6379,password=SecurePassword123!,ssl=true,abortConnect=false"
    },
    "Storage": {
      "Provider": "SqlServer",
      "ConnectionString": "Server=sql-cluster.example.com,1433;Database=Cesivi;User Id=spm_app;Password=AppPassword123!;TrustServerCertificate=True;Encrypt=True"
    },
    "Farm": {
      "Enabled": true,
      "ServerId": "${HOSTNAME}"
    }
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Cesivi": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  }
}
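
The standard ASP.NET Core JSON configuration provider does not expand shell-style placeholders such as ${HOSTNAME}, so unless Cesivi Server performs that substitution itself (not confirmed here), a per-instance value can be supplied through the environment instead. A minimal sketch, assuming environment variables override the JSON file, which is the default ASP.NET Core ordering:

```shell
# Give each instance a unique farm ID derived from its hostname.
# uname -n is the POSIX way to read the node name; inside a container
# this is normally the container hostname.
export Cesivi__Farm__Enabled=true
export Cesivi__Farm__ServerId="${HOSTNAME:-$(uname -n)}"

echo "Farm ServerId: $Cesivi__Farm__ServerId"
```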

Distributed DataProtection + OIDC server-side state (PLAN-1772)¶

Running more than one instance behind a load balancer requires a shared ASP.NET Core DataProtection key ring — otherwise each instance mints its own keys and auth cookies / OIDC correlation cookies minted on instance A fail to validate on instance B.

Cesivi.Server: Cesivi:DataProtection:Provider = PostgreSql | SqlServer | SharedDirectory (reuses the configured storage connection string, or an explicit UNC/SMB/NFS KeysPath). Fail-fast startup gates reject an unshareable FileSystem provider when Cluster:Mode=Cluster, and reject a DataProtection backend that doesn't match the configured StorageProvider (e.g. SqlServer DataProtection with a non-SQL storage backend).
Cesivi.WebUI: WebUI:DataProtectionKeysPath — a shared path all instances point at — plus WebUI:DistributedState:Provider=Redis so the OIDC state blob (the ServerSideStateDataFormat) is stored server-side in shared Redis instead of per-process memory. This is what lets a login started on instance A complete on instance B (and avoids the Keycloak HTTP 431 you get from an oversized state query parameter). The default InMemory provider is correct only for a single WebUI instance.

Full reference (complete config templates, nginx upstream template with no session affinity, health probes, acceptance checklist): _docs_dev/multi-instance-deploy.md.
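
As a sketch, the settings named above could be supplied as environment variables using the same double-underscore convention as the rest of this guide. The key names come from the text above; the keys path and the choice of PostgreSql are placeholder assumptions, and the Redis connection setting for WebUI is omitted because its key name is not stated here.

```shell
# Cesivi.Server: store the DataProtection key ring in the shared database.
# The provider has to match the configured storage backend.
export Cesivi__DataProtection__Provider=PostgreSql

# Cesivi.WebUI: every instance points at the same key directory (placeholder path)
# and keeps OIDC state in shared Redis instead of per-process memory.
export WebUI__DataProtectionKeysPath=/mnt/shared/dataprotection-keys
export WebUI__DistributedState__Provider=Redis
```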

Deployment Scenarios¶

Scenario 1: Docker Swarm / Kubernetes¶

Docker Swarm Stack:

# spm-stack.yml
version: '3.8'

services:
  spm-server:
    image: cesivi/server:latest  # Docker repository names must be lowercase
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
    environment:
      - Cesivi__DistributedState__Provider=Redis
      - Cesivi__DistributedState__ConnectionString=${REDIS_CONN_STRING}
      - Cesivi__Storage__Provider=SqlServer
      - Cesivi__Storage__ConnectionString=${SQL_CONN_STRING}
    networks:
      - spm-network
    ports:
      - target: 5000
        published: 5000
        protocol: tcp
        mode: host

networks:
  spm-network:
    driver: overlay

Deploy:

docker stack deploy -c spm-stack.yml spm

Kubernetes Deployment:

# spm-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spm-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spm-server
  template:
    metadata:
      labels:
        app: spm-server
    spec:
      containers:
      - name: spm-server
        image: cesivi/server:latest  # Docker repository names must be lowercase
        ports:
        - containerPort: 5000
        env:
        - name: Cesivi__DistributedState__Provider
          value: "Redis"
        - name: Cesivi__DistributedState__ConnectionString
          valueFrom:
            secretKeyRef:
              name: spm-secrets
              key: redis-connection
        - name: Cesivi__Storage__Provider
          value: "SqlServer"
        - name: Cesivi__Storage__ConnectionString
          valueFrom:
            secretKeyRef:
              name: spm-secrets
              key: sql-connection
        livenessProbe:
          httpGet:
            path: /health/live
            port: 5000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 5000
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: spm-server
spec:
  type: LoadBalancer
  selector:
    app: spm-server
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5000

Deploy:

kubectl apply -f spm-deployment.yaml
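
The Deployment above reads both connection strings from a Secret named spm-secrets, which has to exist before the pods can start. A minimal sketch of creating it and waiting for the rollout, using the key names from the manifest; the function names and the 120-second timeout are illustrative, and the connection strings are whatever your environment uses.

```shell
# Hypothetical helper: create the Secret referenced by spm-deployment.yaml.
# Usage: create_spm_secrets REDIS_CONNECTION_STRING SQL_CONNECTION_STRING
create_spm_secrets() {
    kubectl create secret generic spm-secrets \
        --from-literal=redis-connection="$1" \
        --from-literal=sql-connection="$2"
}

# Hypothetical helper: apply the manifest, then block until all replicas are ready.
deploy_spm() {
    kubectl apply -f spm-deployment.yaml &&
        kubectl rollout status deployment/spm-server --timeout=120s
}
```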

Scenario 2: Azure App Service¶

Create Azure Resources:
Azure App Service (multiple instances with auto-scaling)
Azure Cache for Redis (Standard/Premium tier)
Azure SQL Database (Standard/Premium tier)
Configure App Service:

# Set environment variables
az webapp config appsettings set -n spm-app -g spm-rg --settings \
  Cesivi__DistributedState__Provider=Redis \
  Cesivi__DistributedState__ConnectionString="spm-cache.redis.cache.windows.net:6380,password=...,ssl=true" \
  Cesivi__Storage__Provider=SqlServer \
  Cesivi__Storage__ConnectionString="Server=tcp:spm-sql.database.windows.net,1433;Database=Cesivi;..."

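# Enable auto-scaling (autoscale settings attach to the App Service plan, not the web app;
# "spm-plan" is a placeholder for your plan name)
az monitor autoscale create -n spm-autoscale -g spm-rg \
  --resource spm-plan \
  --resource-type Microsoft.Web/serverfarms \
  --min-count 2 \
  --max-count 10 \
  --count 3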

Scenario 3: AWS ECS Fargate¶

{
  "family": "spm-server",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "spm-server",
      "image": "cesivi/server:latest",
      "portMappings": [
        {
          "containerPort": 5000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "Cesivi__DistributedState__Provider",
          "value": "Redis"
        },
        {
          "name": "Cesivi__DistributedState__ConnectionString",
          "value": "spm-cache.abc123.0001.use1.cache.amazonaws.com:6379"
        },
        {
          "name": "Cesivi__Storage__Provider",
          "value": "SqlServer"
        }
      ],
      "secrets": [
        {
          "name": "Cesivi__Storage__ConnectionString",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:spm/sql-connection"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:5000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    }
  ]
}

Monitoring & Health Checks¶

Health Endpoints¶

Cesivi Server provides several health check endpoints:

# Basic health check (returns 200 OK if healthy)
GET /health

# Detailed health check (includes dependencies)
GET /health/ready

# Liveness probe (for Kubernetes)
GET /health/live

# Metrics endpoint (Prometheus format)
GET /metrics

Example Response:

{
  "status": "Healthy",
  "totalDuration": "00:00:00.0234567",
  "entries": {
    "DistributedState": {
      "status": "Healthy",
      "description": "Redis connection healthy",
      "data": {
        "provider": "Redis",
        "connected": true
      }
    },
    "Storage": {
      "status": "Healthy",
      "description": "SQL Server connection healthy",
      "data": {
        "provider": "SqlServer",
        "connected": true
      }
    }
  }
}
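
During rolling updates it helps to wait until an instance reports ready before moving on to the next one. A minimal polling sketch, assuming /health/ready returns HTTP 200 only when the dependencies shown above are healthy; the function name, retry count, and delay are illustrative.

```shell
# Hypothetical helper: poll the readiness endpoint until it returns 200.
# Usage: wait_for_ready BASE_URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_ready() {
    base_url=$1
    attempts=${2:-30}
    delay=${3:-2}
    n=1
    while [ "$n" -le "$attempts" ]; do
        status=$(curl -s -o /dev/null -w '%{http_code}' "$base_url/health/ready")
        if [ "$status" = "200" ]; then
            echo "ready after $n attempt(s)"
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    echo "not ready after $attempts attempt(s)" >&2
    return 1
}
```

For the Docker Compose setup, `wait_for_ready http://localhost:5001` would check the first instance directly rather than through the load balancer.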

Logging¶

Recommended Logging Configuration:

{
  "Serilog": {
    "MinimumLevel": {
      "Default": "Information",
      "Override": {
        "Microsoft": "Warning",
        "Cesivi.Server.Services.DistributedSessionStateStore": "Debug",
        "Cesivi.Server.Authentication.DistributedTokenCache": "Debug",
        "Cesivi.Server.Authorization.DistributedUserContextCache": "Debug"
      }
    },
    "WriteTo": [
      {
        "Name": "Console",
        "Args": {
          "outputTemplate": "[{Timestamp:HH:mm:ss} {Level:u3}] {SourceContext} {Message:lj}{NewLine}{Exception}"
        }
      },
      {
        "Name": "File",
        "Args": {
          "path": "/var/log/spm/log-.txt",
          "rollingInterval": "Day",
          "retainedFileCountLimit": 7
        }
      }
    ]
  }
}

Metrics & Monitoring¶

Key Metrics to Monitor:

Request Rate: Requests/second per server
Response Time: P50, P95, P99 latencies
Error Rate: 4xx/5xx responses
Session Count: Active sessions (from Redis)
Cache Hit Rate: User context cache efficiency
Redis Memory Usage: Track for capacity planning
SQL Connection Pool: Active connections
Server CPU/RAM: Resource utilization

Prometheus Metrics:

# Request rate
rate(http_requests_total[5m])

# Error rate
rate(http_requests_total{status=~"5.."}[5m])

# Response time (p95)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Active sessions
redis_sessions_active

# Cache hit rate
rate(usercontext_cache_hits_total[5m]) / rate(usercontext_cache_requests_total[5m])
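
The hit-rate expression is a ratio of two counter rates. As a worked example of the arithmetic with made-up numbers, 450 hits out of 500 lookups in a window is a 90% hit rate. The helper below is a hypothetical sketch that also guards against a window with no lookups.

```shell
# Hypothetical helper: hit rate (percent) from hit and request counts
# observed over the same time window.
cache_hit_rate() {
    awk -v hits="$1" -v total="$2" 'BEGIN {
        if (total == 0) { print "n/a"; exit }   # no lookups in the window
        printf "%.1f\n", 100 * hits / total
    }'
}

cache_hit_rate 450 500
# prints: 90.0
```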

Troubleshooting¶

Issue: Sessions Not Persisting Across Servers¶

Symptoms:

User logged out when request goes to different server
Session data lost randomly

Diagnosis:

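# Check dependency health (including Redis) as seen from the SPM server
docker exec spm-server-1 curl http://localhost:5000/health/ready

# List session keys in Redis (--scan iterates incrementally; KEYS would block a busy server)
docker exec spm-redis redis-cli -a YourPassword --scan --pattern "session:*"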

# Check distributed state provider configuration
docker exec spm-server-1 env | grep DistributedState

Solutions:

Verify Cesivi__DistributedState__Provider=Redis is set
Check Redis connection string is correct
Ensure Redis is accessible from all SPM instances
Check firewall rules (port 6379)
Verify Redis password if authentication is enabled
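
Reachability and credentials can be checked together with a Redis PING. A sketch under assumptions: redis-cli is installed where this runs, the helper name is hypothetical, and the password is passed through the REDISCLI_AUTH environment variable that redis-cli reads, which keeps it off the command line.

```shell
# Hypothetical helper: succeed only if Redis answers PING with PONG.
# Run it on (or inside) each server instance.
# Usage: REDISCLI_AUTH=YourPassword check_redis HOST [PORT]
check_redis() {
    host=$1
    port=${2:-6379}
    # REDIS_CLI can point at an alternative binary; defaults to redis-cli.
    reply=$(${REDIS_CLI:-redis-cli} -h "$host" -p "$port" PING 2>/dev/null)
    if [ "$reply" = "PONG" ]; then
        echo "redis $host:$port reachable"
    else
        echo "redis $host:$port NOT reachable (reply: ${reply:-none})" >&2
        return 1
    fi
}
```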

Issue: Cache Invalidation Not Propagating¶

Symptoms:

User permissions cached incorrectly after changes
Stale data visible on some servers but not others

Diagnosis:

# Enable debug logging for pub/sub
Cesivi__Logging__LogLevel__Cesivi.Server.Authorization.DistributedUserContextCache=Debug

# Check Redis pub/sub channels
docker exec spm-redis redis-cli -a YourPassword PUBSUB CHANNELS

Solutions:

Verify pub/sub is working: PUBLISH usercontext:invalidated "test"
Check subscription logs in SPM server logs
Ensure Redis supports pub/sub (should be enabled by default)
Check network latency between servers and Redis
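
One quick signal is the subscriber count on the invalidation channel: if every instance subscribes to it directly, the count should be at least the number of running servers. A sketch assuming the channel name usercontext:invalidated from the PUBLISH example above and one plain (non-pattern) subscription per instance; neither the subscription model nor the helper name is confirmed by this guide.

```shell
# Hypothetical helper: print how many clients are subscribed to a channel.
# When its output is not a terminal, redis-cli prints the PUBSUB NUMSUB reply
# as two lines: the channel name, then the subscriber count.
# Usage: REDISCLI_AUTH=YourPassword subscriber_count [CHANNEL]
subscriber_count() {
    channel=${1:-usercontext:invalidated}
    ${REDIS_CLI:-redis-cli} PUBSUB NUMSUB "$channel" | tail -n 1
}

# Example: warn if fewer than 3 instances are listening.
# [ "$(subscriber_count)" -ge 3 ] || echo "some instances are not subscribed" >&2
```

PUBLISH itself also replies with the number of clients that received the message, so a reply of 0 to the test message means nobody is subscribed to that channel.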

Issue: High Redis Memory Usage¶

Symptoms:

Redis memory usage growing continuously
Out of memory errors

Diagnosis:

# Check Redis memory stats
docker exec spm-redis redis-cli -a YourPassword INFO memory

# Check key count and expiration
docker exec spm-redis redis-cli -a YourPassword DBSIZE
docker exec spm-redis redis-cli -a YourPassword TTL "session:abc123"

Solutions:

Verify TTL is set correctly on keys
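Enable Redis eviction policy: maxmemory-policy allkeys-lru (under memory pressure this can evict any key, including live sessions, locks, and queued jobs; volatile-lru restricts eviction to keys that have a TTL)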
Increase Redis memory limit
Reduce session/cache TTL if appropriate
Monitor for memory leaks in application code
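
Keys that never expire are a common cause of steady growth. The sketch below lists keys matching a pattern whose TTL is -1, which is what Redis returns for a key that exists but has no expiry. It assumes the session:* key pattern used in the diagnosis commands, and it uses --scan rather than KEYS so that the server is not blocked; the helper name is hypothetical.

```shell
# Hypothetical helper: list keys matching PATTERN that have no expiry.
# Usage: REDISCLI_AUTH=YourPassword keys_without_ttl [PATTERN]
keys_without_ttl() {
    pattern=${1:-session:*}
    ${REDIS_CLI:-redis-cli} --scan --pattern "$pattern" |
    while IFS= read -r key; do
        # TTL returns -1 for a key that exists but has no expiry set.
        ttl=$(${REDIS_CLI:-redis-cli} TTL "$key" < /dev/null)
        [ "$ttl" = "-1" ] && printf '%s\n' "$key"
    done
    return 0
}
```

Empty output means every matching key has an expiry; any key that is printed will stay in memory until it is deleted or evicted.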

Issue: SQL Server Connection Pool Exhausted¶

Symptoms:

Timeout errors when accessing storage
"Timeout expired. The timeout period elapsed..." errors

Diagnosis:

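-- Check SQL Server connections (run in sqlcmd or SSMS; program_name matches
-- only if the connection string sets Application Name to include "Cesivi")
SELECT * FROM sys.dm_exec_sessions WHERE program_name LIKE '%Cesivi%';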

# Check connection pool size in SPM logs
grep "Connection pool" /var/log/spm/*.log

Solutions:

Increase connection pool size: Max Pool Size=200 in connection string
Verify connections are being properly disposed
Check for long-running queries causing connection blocking
Enable connection pooling: Pooling=true (default)

Performance Tuning¶

Redis Optimization¶

Connection Pooling:

# Cesivi Server automatically pools Redis connections
# Configure connection multiplexer settings:
Cesivi__DistributedState__ConnectionString=redis-host:6379,abortConnect=false,connectTimeout=5000,syncTimeout=5000

Redis Configuration:

# redis.conf optimizations

# Maximum memory (e.g., 2GB)
maxmemory 2gb

# Eviction policy (remove least recently used keys when out of memory)
maxmemory-policy allkeys-lru

# Persistence (RDB + AOF for durability)
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec

# Networking
tcp-backlog 511
timeout 0
tcp-keepalive 300

# Performance
maxclients 10000

SQL Server Optimization¶

Connection String:

Server=sql-host;Database=Cesivi;User=spm_app;Password=...;
Min Pool Size=10;Max Pool Size=200;Pooling=true;
Connection Timeout=30;Command Timeout=300;
TrustServerCertificate=True;Encrypt=True;MultipleActiveResultSets=True

Indexing:

-- Create indexes for frequently queried columns
CREATE NONCLUSTERED INDEX IX_ListItems_ListId ON ListItems(ListId);
CREATE NONCLUSTERED INDEX IX_ListItems_Created ON ListItems(Created);
CREATE NONCLUSTERED INDEX IX_Files_Url ON Files(Url);
CREATE NONCLUSTERED INDEX IX_Users_LoginName ON Users(LoginName);

-- Update statistics regularly
UPDATE STATISTICS ListItems WITH FULLSCAN;
UPDATE STATISTICS Files WITH FULLSCAN;

Load Balancer Tuning¶

Nginx:

# nginx.conf optimizations

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 10000;
    use epoll;
}

http {
    # Connection pooling to backends
    upstream spm_backend {
        least_conn;  # Use least-connections instead of round-robin for better distribution
        server spm-server-1:5000 max_fails=3 fail_timeout=30s;
        server spm-server-2:5000 max_fails=3 fail_timeout=30s;
        server spm-server-3:5000 max_fails=3 fail_timeout=30s;
        keepalive 64;  # Connection pool to backends
    }

    # Caching (optional, for static assets)
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=spm_cache:10m max_size=1g;

    # Timeouts
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
    keepalive_timeout 65s;

    # Buffering
    proxy_buffering on;
    proxy_buffer_size 4k;
    proxy_buffers 8 4k;
    proxy_busy_buffers_size 8k;

    # Route traffic to the upstream (the listen port is an assumption; match your setup)
    server {
        listen 80;

        location / {
            proxy_pass http://spm_backend;
            proxy_http_version 1.1;          # upstream keepalive requires HTTP/1.1
            proxy_set_header Connection "";  # clear "Connection: close" so connections are reused
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
}

Security Considerations¶

Network Security¶

TLS/SSL Termination: Terminate HTTPS at load balancer
Private Networks: Run Redis and SQL Server on private network (no public IP)
Firewall Rules: Restrict access to management ports
VPN/Bastion: Use VPN or bastion host for administrative access

Redis Security¶

# redis.conf security settings

# Require password
requirepass YourStrongPasswordHere123!

# Disable dangerous commands
rename-command FLUSHDB ""
rename-command FLUSHALL ""
rename-command CONFIG ""
rename-command SHUTDOWN ""

# Bind to private IP only
bind 10.0.1.100

# Enable TLS (Redis 6+); also set "port 0" to disable the plaintext port once all clients use TLS
tls-port 6380
tls-cert-file /path/to/redis.crt
tls-key-file /path/to/redis.key
tls-ca-cert-file /path/to/ca.crt

SQL Server Security¶

Least Privilege: Use dedicated SQL user with minimal permissions
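Encryption: Enable TLS encryption (Encrypt=True), and avoid TrustServerCertificate=True in production because it skips server certificate validation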
Firewall: Restrict SQL Server port 1433 to SPM servers only
Backup Encryption: Encrypt SQL Server backups
Audit Logging: Enable SQL Server audit logging

Application Security¶

Secrets Management: Use Azure Key Vault, AWS Secrets Manager, etc.
Environment Variables: Never commit credentials to git
Token Rotation: Implement token rotation for bearer tokens
Rate Limiting: Implement rate limiting at load balancer
CORS: Configure CORS policies appropriately

Summary¶

Multi-server deployment of Cesivi Server enables:

✅ High Availability: No single point of failure at the application tier (Redis and the load balancer need their own redundancy)
✅ Horizontal Scalability: Scale out as needed
✅ Session Persistence: Sessions survive server restarts
✅ Cache Coordination: Pub/sub invalidation across servers
✅ Production-Ready: Enterprise-grade architecture

Next Steps:

Review Configuration section for your environment
Choose a Deployment Scenario
Set up Monitoring
Test failover scenarios
Implement Security Considerations

For questions or issues, see TROUBLESHOOTING or API_REFERENCE.