
Multi-Server Deployment Guide

PLAN-129 Phase 2.3: Distributed State Infrastructure for Multi-Server Deployments

This guide explains how to deploy Cesivi Server in a multi-server configuration with load balancing, using Redis for distributed state and SQL Server (or other backends) for shared storage.


Table of Contents

  1. Overview
  2. Architecture
  3. Prerequisites
  4. Quick Start with Docker Compose
  5. Configuration
  6. Deployment Scenarios
  7. Monitoring & Health Checks
  8. Troubleshooting
  9. Performance Tuning
  10. Security Considerations

Overview

Cesivi Server supports multi-server deployments with:

  • Distributed Session State: Sessions persist across server restarts and are accessible from any server instance
  • Distributed Cache: User permissions and authentication tokens are cached, with pub/sub invalidation
  • Distributed Locks: Coordination for critical operations
  • Pub/Sub: Cache invalidation notifications propagate to all servers
  • Queue: Background job processing across servers
  • Load Balancing: No sticky sessions required - any server can handle any request
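The Distributed Cache and Pub/Sub bullets work together. Each server keeps its own copy of a user's permissions. When one server changes them, it broadcasts an invalidation so every other server drops its stale copy.

The sketch below is minimal and rests on a few assumptions:

  • `Bus` is an in-process stand-in for Redis PUBLISH/SUBSCRIBE.
  • The channel name `usercontext:invalidated` is taken from the Troubleshooting section.
  • The class and method names are hypothetical, not Cesivi's actual API.

```python
from collections import defaultdict
from typing import Callable

CHANNEL = "usercontext:invalidated"


class Bus:
    """In-process stand-in for Redis PUBLISH/SUBSCRIBE (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        # Like Redis PUBLISH, return how many subscribers received the message.
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


class ServerCache:
    """Per-server cache of user permissions, kept consistent via the bus (hypothetical)."""

    def __init__(self, bus: Bus, storage: dict[str, set[str]]) -> None:
        self._bus = bus
        self._storage = storage  # stand-in for shared storage (e.g. SQL Server)
        self._local: dict[str, set[str]] = {}
        bus.subscribe(CHANNEL, self._on_invalidate)

    def _on_invalidate(self, user: str) -> None:
        self._local.pop(user, None)  # drop the stale copy; next read reloads it

    def permissions(self, user: str) -> set[str]:
        if user not in self._local:  # cache miss: load from shared storage
            self._local[user] = set(self._storage.get(user, set()))
        return self._local[user]

    def update(self, user: str, perms: set[str]) -> None:
        self._storage[user] = set(perms)  # write to shared storage first...
        self._bus.publish(CHANNEL, user)  # ...then tell every server to invalidate


bus = Bus()
storage = {"alice": {"read"}}
server1, server2 = ServerCache(bus, storage), ServerCache(bus, storage)
server2.permissions("alice")                # server 2 caches {"read"}
server1.update("alice", {"read", "write"})  # change made through server 1
print(sorted(server2.permissions("alice")))  # ['read', 'write']
```

Without the publish step, server 2 would keep returning its cached `{"read"}` until the entry expired. That is the "stale data visible on some servers" symptom described under Troubleshooting.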

Key Benefits:

  • High Availability: Server failures don't lose user sessions
  • Horizontal Scalability: Add more servers as load increases
  • Zero Downtime Deployments: Rolling updates without session loss
  • Fault Tolerance: Redis replication and SQL Server clustering support


Architecture

┌──────────────┐
│ Load Balancer│ (Nginx/HAProxy/Azure LB)
│ (Round-Robin)│
└──────┬───────┘
       │
   ┌───┴────┬────────┬────────┐
   │        │        │        │
┌──▼──┐  ┌──▼──┐  ┌──▼──┐  ┌──▼──┐
│SPM 1│  │SPM 2│  │SPM 3│  │ ... │  Cesivi Server instances
└──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘
   │        │        │        │
   └────────┴────┬───┴────────┘
                 │
       ┌─────────┴─────────┐
       │                   │
  ┌────▼────┐      ┌───────▼───────┐
  │  Redis  │      │ SQL Server    │
  │ Primary │      │ (primary +    │
  └────┬────┘      │  replica),    │
       │           │ PostgreSQL,   │
       │           │ or file share │
       │           │ (NFS/S3)      │
       │           └───────────────┘
       │             ↑ Shared Storage (persistent data)
       │ (optional replication)
  ┌────▼────┐
  │  Redis  │
  │ Replica │
  └─────────┘
    ↑ Distributed State (Redis: session/cache/locks/pub-sub/queue)

Components:

  1. Load Balancer: Distributes requests across SPM instances (round-robin, least-connections, etc.)
  2. Cesivi Server Instances: Stateless application servers (scale horizontally)
  3. Redis: Stores session state, cache, provides pub/sub and distributed locks
  4. Shared Storage: SQL Server, PostgreSQL, or file share for persistent SharePoint data
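The Redis component above also provides distributed locks. A common single-instance Redis pattern works in two steps:

  • Acquire: `SET lock:<resource> <token> NX PX <ttl>` succeeds only if the key is absent.
  • Release: the key is deleted only if it still holds the caller's token. Real Redis does this atomically in a small Lua script.

The sketch below models that pattern with an in-memory fake. It assumes Cesivi uses something similar, which this guide does not confirm. `FakeRedis`, `acquire`, `release` and the `lock:` key prefix are hypothetical.

```python
import time
import uuid
from typing import Callable, Optional


class FakeRedis:
    """In-memory stand-in for the two Redis operations a lock needs (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: dict[str, tuple[str, float]] = {}  # key -> (value, expires_at)
        self._clock = clock

    def set_nx_px(self, key: str, value: str, ttl_ms: int) -> bool:
        """Like SET key value NX PX ttl_ms: succeed only if the key is absent or expired."""
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False
        self._data[key] = (value, now + ttl_ms / 1000)
        return True

    def delete_if_equals(self, key: str, value: str) -> bool:
        """Like the usual check-and-DEL Lua script used to release a lock."""
        current = self._data.get(key)
        if current is not None and current[0] == value and current[1] > self._clock():
            del self._data[key]
            return True
        return False


def acquire(store: FakeRedis, resource: str, ttl_ms: int = 10_000) -> Optional[str]:
    token = str(uuid.uuid4())  # unique per holder, so only the owner can release
    return token if store.set_nx_px(f"lock:{resource}", token, ttl_ms) else None


def release(store: FakeRedis, resource: str, token: str) -> bool:
    return store.delete_if_equals(f"lock:{resource}", token)


now = [0.0]
store = FakeRedis(clock=lambda: now[0])
t1 = acquire(store, "site-provisioning", ttl_ms=5_000)  # server 1 gets the lock
t2 = acquire(store, "site-provisioning")                # server 2 is refused
now[0] += 6                                             # server 1 stalls past the TTL
t3 = acquire(store, "site-provisioning")                # expired, so server 3 acquires
print(t1 is not None, t2 is None, release(store, "site-provisioning", t1))  # True True False
```

The TTL keeps a crashed holder from blocking everyone forever. The token check stops a slow former holder from deleting a lock that now belongs to another server.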

Prerequisites

Infrastructure Requirements

Per Server Instance:

  • CPU: 2+ cores
  • RAM: 4GB+ (8GB recommended)
  • Disk: 10GB+ (depends on storage backend)

Redis:

  • CPU: 1-2 cores
  • RAM: 2GB+ (size based on session/cache volume)
  • Disk: 1GB+ for persistence (AOF/RDB)

SQL Server (if used):

  • CPU: 2+ cores
  • RAM: 4GB+
  • Disk: 50GB+ (depends on data volume)

Software Requirements

  • Docker & Docker Compose (for containerized deployments) OR
  • .NET 10.0 SDK/Runtime (for manual deployments)
  • Redis 7.0+
  • SQL Server 2019+ or PostgreSQL 13+ (optional)
  • Load balancer (Nginx, HAProxy, Azure Load Balancer, etc.)

Quick Start with Docker Compose

The easiest way to test a multi-server deployment is to use the provided docker-compose.multiserver.yml:

# 1. Start all services (3 SPM instances + Redis + SQL Server + Nginx)
docker-compose -f docker-compose.multiserver.yml up -d

# 2. Check service health
docker-compose -f docker-compose.multiserver.yml ps

# 3. Access Cesivi Server through load balancer
curl http://localhost:8080/health

# 4. View logs
docker-compose -f docker-compose.multiserver.yml logs -f spm-server-1

# 5. Stop all services
docker-compose -f docker-compose.multiserver.yml down

What's Included:

  • 3x Cesivi Server instances (ports 5001, 5002, 5003)
  • 1x Redis (port 6379) - Distributed state backend
  • 1x SQL Server Express (port 1433) - Shared storage (optional)
  • 1x Nginx (port 8080) - Load balancer

Testing Session Persistence:

# Create a session on any server
curl -c cookies.txt http://localhost:8080/api/session/create

# Subsequent requests will be load-balanced but session persists
curl -b cookies.txt http://localhost:8080/api/session/get

# Stop server 1, session still works on servers 2 and 3
docker stop spm-server-1
curl -b cookies.txt http://localhost:8080/api/session/get
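The session test above works because no server keeps session data in its own memory. Only the session ID travels in the cookie, and the data lives in the shared store. The sketch below illustrates this under a few assumptions:

  • `SharedStore` is an in-memory stand-in for Redis.
  • The `session:` key prefix matches the one used under Troubleshooting.
  • Sliding expiration (each access extends the TTL) is assumed, not confirmed for Cesivi.
  • All class names are hypothetical.

```python
import secrets
from typing import Callable, Optional


class SharedStore:
    """Stand-in for Redis: values with absolute expiry times (hypothetical)."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._items: dict[str, tuple[dict, float]] = {}

    def set(self, key: str, value: dict, ttl_s: float) -> None:
        self._items[key] = (value, self._clock() + ttl_s)

    def get(self, key: str) -> Optional[dict]:
        item = self._items.get(key)
        if item is None or item[1] <= self._clock():
            self._items.pop(key, None)  # missing or expired
            return None
        return item[0]


class ServerInstance:
    """A stateless app server: all session data lives in the shared store."""

    def __init__(self, name: str, store: SharedStore, idle_timeout_s: float = 1200) -> None:
        self.name = name
        self._store = store
        self._idle = idle_timeout_s

    def create_session(self, user: str) -> str:
        session_id = secrets.token_urlsafe(16)  # the value the client keeps in its cookie
        self._store.set(f"session:{session_id}", {"user": user}, self._idle)
        return session_id

    def get_session(self, session_id: str) -> Optional[dict]:
        data = self._store.get(f"session:{session_id}")
        if data is not None:
            # Assumed sliding expiration: every access pushes the expiry forward.
            self._store.set(f"session:{session_id}", data, self._idle)
        return data


now = [0.0]
store = SharedStore(clock=lambda: now[0])
server1 = ServerInstance("spm-server-1", store)
server2 = ServerInstance("spm-server-2", store)
sid = server1.create_session("alice")
del server1                      # like "docker stop spm-server-1"
print(server2.get_session(sid))  # {'user': 'alice'}
```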

Configuration

Environment Variables

Distributed State Configuration:

# Redis provider (multi-server)
Cesivi__DistributedState__Provider=Redis
Cesivi__DistributedState__ConnectionString=redis-host:6379,password=YourPassword,ssl=true

# InMemory provider (single-server)
Cesivi__DistributedState__Provider=InMemory
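The Redis connection strings in this guide follow the StackExchange.Redis configuration-string format:

  • Comma-separated entries.
  • Bare `host:port` items are endpoints.
  • `key=value` items are options.

The options the guide uses (`abortConnect`, `connectTimeout`, `syncTimeout`) are specific to that format. That Cesivi parses the string through StackExchange.Redis is an inference, not something the guide states.

The sketch below splits a string into endpoints and options and flags settings that matter for multi-server deployments. The helper names are hypothetical. Values that themselves contain commas (such as some passwords) are not handled.

```python
def parse_redis_config(config: str) -> tuple[list[str], dict[str, str]]:
    """Split a StackExchange.Redis-style configuration string (simplified)."""
    endpoints: list[str] = []
    options: dict[str, str] = {}
    for part in config.split(","):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            key, value = part.split("=", 1)
            options[key.strip()] = value.strip()
        else:
            endpoints.append(part)  # bare host:port entries are endpoints
    return endpoints, options


def multi_server_warnings(options: dict[str, str]) -> list[str]:
    """Hypothetical checks for settings worth reviewing before going multi-server."""
    warnings = []
    if "password" not in options:
        warnings.append("no password: Redis is reachable without authentication")
    if options.get("ssl", "false").lower() != "true":
        warnings.append("ssl is off: session and cache data cross the network unencrypted")
    # StackExchange.Redis defaults abortConnect to true (false for Azure endpoints).
    if options.get("abortConnect", "true").lower() != "false":
        warnings.append("abortConnect is not false: the initial connect fails instead of retrying")
    return warnings


endpoints, options = parse_redis_config(
    "redis-cluster.example.com:6379,password=SecurePassword123!,ssl=true,abortConnect=false"
)
print(endpoints)                       # ['redis-cluster.example.com:6379']
print(multi_server_warnings(options))  # []
```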

Storage Configuration:

# InMemory (development/testing)
Cesivi__Storage__Provider=InMemory

# SQL Server (production)
Cesivi__Storage__Provider=SqlServer
Cesivi__Storage__ConnectionString=Server=sql-host,1433;Database=Cesivi;User Id=spm_app;Password=YourPassword;TrustServerCertificate=True

# PostgreSQL (production)
Cesivi__Storage__Provider=PostgreSQL
Cesivi__Storage__ConnectionString=Host=pg-host;Port=5432;Database=Cesivi;Username=postgres;Password=YourPassword

Farm Configuration:

# Enable farm mode (multi-server)
Cesivi__Farm__Enabled=true
Cesivi__Farm__ServerId=spm-server-1  # Unique per server
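The double underscores in these variable names are how ASP.NET Core's environment-variable configuration provider encodes the `:` section separator. For example, `Cesivi__Farm__ServerId` becomes the key `Cesivi:Farm:ServerId`, the same setting as `"Cesivi": { "Farm": { "ServerId": ... } }` in appsettings.json.

The sketch below mimics that mapping so you can check what a set of variables resolves to. It is an illustration only: .NET configuration keys are case-insensitive, while this sketch is case-sensitive. The function names are hypothetical.

```python
def config_key(name: str) -> str:
    """The flat configuration key, e.g. Cesivi:Farm:ServerId."""
    return name.replace("__", ":")


def env_to_config(env: dict[str, str], prefix: str = "Cesivi__") -> dict:
    """Build the nested settings tree that variables under `prefix` produce."""
    config: dict = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # unrelated variables such as PATH
        *sections, leaf = name.split("__")  # "__" separates sections
        node = config
        for section in sections:
            node = node.setdefault(section, {})
        node[leaf] = value
    return config


env = {
    "Cesivi__DistributedState__Provider": "Redis",
    "Cesivi__Farm__Enabled": "true",
    "Cesivi__Farm__ServerId": "spm-server-1",
    "PATH": "/usr/bin",
}
print(config_key("Cesivi__Farm__ServerId"))  # Cesivi:Farm:ServerId
print(env_to_config(env)["Cesivi"]["Farm"])  # {'Enabled': 'true', 'ServerId': 'spm-server-1'}
```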

appsettings.Production.json

{
  "Cesivi": {
    "DistributedState": {
      "Provider": "Redis",
      "ConnectionString": "redis-cluster.example.com:6379,password=SecurePassword123!,ssl=true,abortConnect=false"
    },
    "Storage": {
      "Provider": "SqlServer",
      "ConnectionString": "Server=sql-cluster.example.com,1433;Database=Cesivi;User Id=spm_app;Password=AppPassword123!;TrustServerCertificate=True;Encrypt=True"
    },
    "Farm": {
      "Enabled": true,
      "ServerId": "${HOSTNAME}"
    }
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Cesivi": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  }
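}

Note: ASP.NET Core's JSON configuration provider does not expand ${...} placeholders. "${HOSTNAME}" is therefore read as a literal string, and every instance would report the same ServerId. Unless your deployment tooling substitutes the value before startup, set Cesivi__Farm__ServerId per instance as an environment variable. In the default ASP.NET Core configuration order, environment variables override appsettings.*.json.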

Distributed DataProtection + OIDC server-side state (PLAN-1772)

Running more than one instance behind a load balancer requires a shared ASP.NET Core DataProtection key ring — otherwise each instance mints its own keys and auth cookies / OIDC correlation cookies minted on instance A fail to validate on instance B.

  • Cesivi.Server: Cesivi:DataProtection:Provider = PostgreSql | SqlServer | SharedDirectory (reuses the configured storage connection string, or an explicit UNC/SMB/NFS KeysPath). Fail-fast startup gates reject an unshareable FileSystem provider when Cluster:Mode=Cluster, and reject a DataProtection backend that doesn't match the configured StorageProvider (e.g. SqlServer DataProtection with a non-SQL storage backend).
  • Cesivi.WebUI: WebUI:DataProtectionKeysPath — a shared path all instances point at — plus WebUI:DistributedState:Provider=Redis so the OIDC state blob (the ServerSideStateDataFormat) is stored server-side in shared Redis instead of per-process memory. This is what lets a login started on instance A complete on instance B (and avoids the Keycloak HTTP 431 you get from an oversized state query parameter). The default InMemory provider is correct only for a single WebUI instance.
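The server-side state idea from the WebUI bullet works like this:

  • The large OIDC properties blob is stored in shared Redis.
  • Only a short random key travels in the `state` query parameter.
  • Whichever instance receives the callback looks the key up.

The sketch below is a minimal illustration of that design, not the actual ServerSideStateDataFormat implementation. `SharedKeyValue`, `ServerSideState` and the `oidc-state:` prefix are hypothetical. Treating state as single-use (deleted on read) is a choice made by this sketch. A real Redis-backed store would also set a TTL, which is not modeled here.

```python
import json
import secrets
from typing import Optional


class SharedKeyValue:
    """Stand-in for the shared Redis instance every WebUI instance uses (hypothetical)."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}


class ServerSideState:
    """Keep the OIDC state blob server-side; only a short key goes into the URL."""

    def __init__(self, store: SharedKeyValue) -> None:
        self._store = store

    def protect(self, properties: dict) -> str:
        key = secrets.token_urlsafe(32)  # 32 random bytes -> 43 URL-safe characters
        self._store.data[f"oidc-state:{key}"] = json.dumps(properties)
        return key

    def unprotect(self, key: str) -> Optional[dict]:
        # pop(): this sketch treats state as single-use.
        raw = self._store.data.pop(f"oidc-state:{key}", None)
        return None if raw is None else json.loads(raw)


shared = SharedKeyValue()
instance_a, instance_b = ServerSideState(shared), ServerSideState(shared)
big_state = {"redirect_uri": "/sites/team", "nonce": "n" * 4000}  # large properties bag
state_param = instance_a.protect(big_state)            # login starts on instance A
print(len(state_param))                                # 43
print(instance_b.unprotect(state_param) == big_state)  # True: callback lands on B
```

With per-process memory instead of `shared`, instance B would find nothing under the key. That is why the default InMemory provider is correct only for a single WebUI instance.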

Full reference (complete config templates, nginx upstream template with no session affinity, health probes, acceptance checklist): _docs_dev/multi-instance-deploy.md.


Deployment Scenarios

Scenario 1: Docker Swarm / Kubernetes

Docker Swarm Stack:

# spm-stack.yml
version: '3.8'

services:
  spm-server:
    image: cesivi/server:latest
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
    environment:
      - Cesivi__DistributedState__Provider=Redis
      - Cesivi__DistributedState__ConnectionString=${REDIS_CONN_STRING}
      - Cesivi__Storage__Provider=SqlServer
      - Cesivi__Storage__ConnectionString=${SQL_CONN_STRING}
    networks:
      - spm-network
    ports:
      - target: 5000
        published: 5000
        protocol: tcp
        mode: host

networks:
  spm-network:
    driver: overlay

Deploy:

docker stack deploy -c spm-stack.yml spm

Kubernetes Deployment:

# spm-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spm-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spm-server
  template:
    metadata:
      labels:
        app: spm-server
    spec:
      containers:
      - name: spm-server
        image: cesivi/server:latest
        ports:
        - containerPort: 5000
        env:
        - name: Cesivi__DistributedState__Provider
          value: "Redis"
        - name: Cesivi__DistributedState__ConnectionString
          valueFrom:
            secretKeyRef:
              name: spm-secrets
              key: redis-connection
        - name: Cesivi__Storage__Provider
          value: "SqlServer"
        - name: Cesivi__Storage__ConnectionString
          valueFrom:
            secretKeyRef:
              name: spm-secrets
              key: sql-connection
        livenessProbe:
          httpGet:
            path: /health/live
            port: 5000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 5000
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: spm-server
spec:
  type: LoadBalancer
  selector:
    app: spm-server
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5000

Deploy:

kubectl apply -f spm-deployment.yaml

Scenario 2: Azure App Service

  1. Create Azure Resources:
     • Azure App Service (multiple instances with auto-scaling)
     • Azure Cache for Redis (Standard/Premium tier)
     • Azure SQL Database (Standard/Premium tier)

  2. Configure App Service:

# Set environment variables
az webapp config appsettings set -n spm-app -g spm-rg --settings \
  Cesivi__DistributedState__Provider=Redis \
  Cesivi__DistributedState__ConnectionString="spm-cache.redis.cache.windows.net:6380,password=...,ssl=true" \
  Cesivi__Storage__Provider=SqlServer \
  Cesivi__Storage__ConnectionString="Server=tcp:spm-sql.database.windows.net,1433;Database=Cesivi;..."

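# Enable auto-scaling (autoscale settings target the App Service plan; spm-plan is a placeholder name)
# Scale-out/scale-in rules are added separately with az monitor autoscale rule create
az monitor autoscale create -n spm-autoscale -g spm-rg \
  --resource spm-plan \
  --resource-type Microsoft.Web/serverfarms \
  --min-count 2 \
  --max-count 10 \
  --count 3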

Scenario 3: AWS ECS Fargate

{
  "family": "spm-server",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "spm-server",
      "image": "cesivi/server:latest",
      "portMappings": [
        {
          "containerPort": 5000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "Cesivi__DistributedState__Provider",
          "value": "Redis"
        },
        {
          "name": "Cesivi__DistributedState__ConnectionString",
          "value": "spm-cache.abc123.0001.use1.cache.amazonaws.com:6379"
        },
        {
          "name": "Cesivi__Storage__Provider",
          "value": "SqlServer"
        }
      ],
      "secrets": [
        {
          "name": "Cesivi__Storage__ConnectionString",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:spm/sql-connection"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:5000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    }
  ]
}

Monitoring & Health Checks

Health Endpoints

Cesivi Server provides several health check endpoints:

# Basic health check (returns 200 OK if healthy)
GET /health

# Detailed health check (includes dependencies)
GET /health/ready

# Liveness probe (for Kubernetes)
GET /health/live

# Metrics endpoint (Prometheus format)
GET /metrics

Example Response:

{
  "status": "Healthy",
  "totalDuration": "00:00:00.0234567",
  "entries": {
    "DistributedState": {
      "status": "Healthy",
      "description": "Redis connection healthy",
      "data": {
        "provider": "Redis",
        "connected": true
      }
    },
    "Storage": {
      "status": "Healthy",
      "description": "SQL Server connection healthy",
      "data": {
        "provider": "SqlServer",
        "connected": true
      }
    }
  }
}
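For scripted checks (for example in a deployment pipeline), a small helper can fetch /health/ready and list the entries that are not Healthy. The sketch below assumes:

  • The JSON shape of the example response above.
  • Unhealthy responses also carry a JSON body. ASP.NET Core health checks return HTTP 503 for Unhealthy by default, and `urlopen` raises `HTTPError` for that status, so the sketch reads the body from the error.

The function names are hypothetical, and the degraded report in the usage example is made-up sample data.

```python
import json
import urllib.error
import urllib.request


def fetch_report(url: str) -> dict:
    """GET a health endpoint and parse its JSON body, including non-200 responses."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        return json.load(err)  # HTTPError is file-like; assumes the 503 body is JSON


def unhealthy_entries(report: dict) -> list[str]:
    """Return 'name: status (description)' for every entry that is not Healthy."""
    problems = []
    for name, entry in report.get("entries", {}).items():
        if entry.get("status") != "Healthy":
            problems.append(f"{name}: {entry.get('status')} ({entry.get('description', '')})")
    return problems


# Made-up degraded report in the same shape as the example response.
sample = json.loads("""
{"status": "Unhealthy",
 "entries": {
   "DistributedState": {"status": "Unhealthy", "description": "Redis connection failed"},
   "Storage": {"status": "Healthy", "description": "SQL Server connection healthy"}}}
""")
for line in unhealthy_entries(sample):
    print(line)  # DistributedState: Unhealthy (Redis connection failed)

# Against a running deployment:
# print(unhealthy_entries(fetch_report("http://localhost:8080/health/ready")))
```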

Logging

Recommended Logging Configuration:

{
  "Serilog": {
    "MinimumLevel": {
      "Default": "Information",
      "Override": {
        "Microsoft": "Warning",
        "Cesivi.Server.Services.DistributedSessionStateStore": "Debug",
        "Cesivi.Server.Authentication.DistributedTokenCache": "Debug",
        "Cesivi.Server.Authorization.DistributedUserContextCache": "Debug"
      }
    },
    "WriteTo": [
      {
        "Name": "Console",
        "Args": {
          "outputTemplate": "[{Timestamp:HH:mm:ss} {Level:u3}] {SourceContext} {Message:lj}{NewLine}{Exception}"
        }
      },
      {
        "Name": "File",
        "Args": {
          "path": "/var/log/spm/log-.txt",
          "rollingInterval": "Day",
          "retainedFileCountLimit": 7
        }
      }
    ]
  }
}

Metrics & Monitoring

Key Metrics to Monitor:

  • Request Rate: Requests/second per server
  • Response Time: P50, P95, P99 latencies
  • Error Rate: 4xx/5xx responses
  • Session Count: Active sessions (from Redis)
  • Cache Hit Rate: User context cache efficiency
  • Redis Memory Usage: Track for capacity planning
  • SQL Connection Pool: Active connections
  • Server CPU/RAM: Resource utilization

Prometheus Metrics:

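# Request rate (per server)
sum by (instance) (rate(http_requests_total[5m]))

# Error rate (fraction of requests returning 5xx)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

# Response time (p95 across all servers; sum buckets by le before computing the quantile)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))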

# Active sessions
redis_sessions_active

# Cache hit rate
rate(usercontext_cache_hits_total[5m]) / rate(usercontext_cache_requests_total[5m])
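In a multi-server deployment, percentiles cannot be averaged across servers. A fleet-wide p95 comes from adding up each server's histogram buckets first, which is what `sum by (le)` does in PromQL, and only then interpolating.

The sketch below is a simplified re-implementation of Prometheus's `histogram_quantile` linear interpolation:

  • It assumes positive bucket bounds and cumulative counts that include a `+Inf` bucket.
  • It omits Prometheus's other edge-case handling.
  • The bucket values are made up for illustration.

```python
import math


def merge_buckets(*servers: dict[float, float]) -> dict[float, float]:
    """Sum cumulative counts per upper bound (le), like `sum by (le) (...)`."""
    merged: dict[float, float] = {}
    for buckets in servers:
        for le, count in buckets.items():
            merged[le] = merged.get(le, 0.0) + count
    return merged


def histogram_quantile(q: float, buckets: dict[float, float]) -> float:
    """Simplified Prometheus-style quantile estimate from cumulative buckets."""
    total = buckets[math.inf]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for le in sorted(buckets):
        count = buckets[le]
        if count >= rank:
            if math.isinf(le):
                return prev_bound  # falls in the +Inf bucket: report last finite bound
            if count == prev_count:
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (le - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = le, count
    return prev_bound


server_a = {0.1: 90, 0.5: 100, 1.0: 100, math.inf: 100}  # mostly fast requests
server_b = {0.1: 0, 0.5: 0, 1.0: 10, math.inf: 10}       # a few slow requests
p95_a, p95_b = histogram_quantile(0.95, server_a), histogram_quantile(0.95, server_b)
print(round((p95_a + p95_b) / 2, 4))                                   # 0.6375 (misleading)
print(round(histogram_quantile(0.95, merge_buckets(server_a, server_b)), 3))  # 0.725
```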

Troubleshooting

Issue: Sessions Not Persisting Across Servers

Symptoms:

  • User logged out when request goes to different server
  • Session data lost randomly

Diagnosis:

# Check the readiness endpoint, which reports the DistributedState (Redis) dependency
docker exec spm-server-1 curl http://localhost:5000/health/ready

# Check Redis directly (--scan iterates with SCAN instead of the blocking KEYS command)
docker exec spm-redis redis-cli -a YourPassword --scan --pattern "session:*"

# Check distributed state provider configuration
docker exec spm-server-1 env | grep DistributedState

Solutions:

  1. Verify Cesivi__DistributedState__Provider=Redis is set
  2. Check Redis connection string is correct
  3. Ensure Redis is accessible from all SPM instances
  4. Check firewall rules (port 6379)
  5. Verify Redis password if authentication is enabled

Issue: Cache Invalidation Not Propagating

Symptoms:

  • User permissions cached incorrectly after changes
  • Stale data visible on some servers but not others

Diagnosis:

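# Enable debug logging for pub/sub (the Logging section sits at the configuration root, not under Cesivi)
Logging__LogLevel__Cesivi.Server.Authorization.DistributedUserContextCache=Debug
# With the Serilog configuration shown above, use the Serilog override instead
Serilog__MinimumLevel__Override__Cesivi.Server.Authorization.DistributedUserContextCache=Debug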

# Check Redis pub/sub channels
docker exec spm-redis redis-cli -a YourPassword PUBSUB CHANNELS

Solutions:

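  1. Verify pub/sub is working: run SUBSCRIBE usercontext:invalidated in one redis-cli session and PUBLISH usercontext:invalidated "test" in another. PUBLISH returns the number of subscribers that received the message, which should match the number of running SPM instances if each instance subscribes once.
  2. Check subscription logs in SPM server logs
  3. Pub/sub is built into Redis; if messages don't arrive, check that ACLs grant the application user access to the channel (Redis 7 denies channel access to newly created ACL users by default)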
  4. Check network latency between servers and Redis

Issue: High Redis Memory Usage

Symptoms:

  • Redis memory usage growing continuously
  • Out of memory errors

Diagnosis:

# Check Redis memory stats
docker exec spm-redis redis-cli -a YourPassword INFO memory

# Check key count and expiration
docker exec spm-redis redis-cli -a YourPassword DBSIZE
docker exec spm-redis redis-cli -a YourPassword TTL "session:abc123"

Solutions:

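  1. Verify TTL is set correctly on keys (TTL returns -1 for a key with no expiry and -2 for a missing key)
  2. Set an eviction policy: maxmemory-policy allkeys-lru evicts any key, including keys with no TTL such as queue entries; volatile-lru evicts only keys that have a TTL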
  3. Increase Redis memory limit
  4. Reduce session/cache TTL if appropriate
  5. Monitor for memory leaks in application code

Issue: SQL Server Connection Pool Exhausted

Symptoms:

  • Timeout errors when accessing storage
  • "Timeout expired. The timeout period elapsed..." errors

Diagnosis:

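-- Check SQL Server connections (program_name only contains Cesivi if the connection string sets an Application Name containing it)
SELECT * FROM sys.dm_exec_sessions WHERE program_name LIKE '%Cesivi%'

# Check connection pool size in SPM logs
grep "Connection pool" /var/log/spm/*.log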

Solutions:

  1. Increase connection pool size: Max Pool Size=200 in connection string
  2. Verify connections are being properly disposed
  3. Check for long-running queries causing connection blocking
  4. Make sure pooling has not been disabled: Pooling=true is the default
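Why undisposed connections exhaust the pool:

  • A bounded pool hands out at most Max Pool Size connections.
  • Once they are all checked out and never returned, every new request waits for a free slot until the timeout, then fails.

The toy pool below models only that behavior. It is not SqlClient's implementation, and `ConnectionPool` and `PoolTimeout` are hypothetical. In .NET, wrapping each SqlConnection in a `using` (or `await using`) block plays the role of the `with` statement here.

```python
import threading
from contextlib import contextmanager


class PoolTimeout(Exception):
    pass


class ConnectionPool:
    """Toy bounded pool: models Max Pool Size and the wait-then-timeout behavior."""

    def __init__(self, max_size: int, timeout_s: float) -> None:
        self._slots = threading.BoundedSemaphore(max_size)
        self._timeout = timeout_s

    def open(self) -> object:
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolTimeout(f"no free connection within {self._timeout} s")
        return object()  # placeholder for a real connection

    def close(self, conn: object) -> None:
        self._slots.release()

    @contextmanager
    def connection(self):
        conn = self.open()
        try:
            yield conn
        finally:
            self.close(conn)  # returned to the pool even if the work raises


pool = ConnectionPool(max_size=2, timeout_s=0.1)
for _ in range(10):  # properly disposed: the same two slots are reused
    with pool.connection():
        pass

leaked = [pool.open(), pool.open()]  # opened but never closed
try:
    pool.open()
except PoolTimeout as exc:
    print("exhausted:", exc)  # exhausted: no free connection within 0.1 s
```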

Performance Tuning

Redis Optimization

Connection Settings:

# Cesivi Server reuses (multiplexes) Redis connections rather than opening one per request
# Configure connection multiplexer settings (timeouts in milliseconds):
Cesivi__DistributedState__ConnectionString=redis-host:6379,abortConnect=false,connectTimeout=5000,syncTimeout=5000

Redis Configuration:

# redis.conf optimizations

# Maximum memory (e.g., 2GB)
maxmemory 2gb

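# Eviction policy: allkeys-lru evicts any least-recently-used key when memory is full;
# volatile-lru restricts eviction to keys that have a TTL (keys without an expiry are kept)
maxmemory-policy allkeys-lru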

# Persistence (RDB + AOF for durability)
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec

# Networking
tcp-backlog 511
timeout 0
tcp-keepalive 300

# Performance
maxclients 10000
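The choice between `allkeys-lru` and `volatile-lru` matters here because Redis holds more than expiring sessions. The Overview lists a Redis-backed background-job queue. If its keys carry no TTL (an assumption; this guide does not say), `allkeys-lru` can evict them, while `volatile-lru` never touches keys without an expiry.

The sketch below compares which key each policy would pick:

  • It uses exact LRU. Redis approximates LRU by sampling (`maxmemory-samples`), so real eviction order can differ.
  • The key names are hypothetical.
  • When a volatile-* policy finds no eligible key, Redis behaves like `noeviction` and rejects writes. The sketch returns None in that case.

```python
from typing import Optional


def choose_victim(keys: dict[str, tuple[float, bool]], policy: str) -> Optional[str]:
    """Pick the key an exact-LRU policy would evict.

    keys maps name -> (last_access_time, has_ttl).
    """
    if policy == "allkeys-lru":
        candidates = keys
    elif policy == "volatile-lru":
        candidates = {k: v for k, v in keys.items() if v[1]}  # only keys with a TTL
    else:
        raise ValueError(f"policy not modeled in this sketch: {policy}")
    if not candidates:
        return None  # nothing eligible: Redis would refuse writes instead
    return min(candidates, key=lambda k: candidates[k][0])  # least recently used


keys = {
    "queue:jobs": (1.0, False),        # hypothetical job queue, no TTL
    "usercontext:alice": (3.0, True),  # cached permissions with a TTL
    "session:abc123": (5.0, True),     # active session with a TTL
}
print(choose_victim(keys, "allkeys-lru"))   # queue:jobs
print(choose_victim(keys, "volatile-lru"))  # usercontext:alice
```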

SQL Server Optimization

Connection String (shown wrapped for readability; supply it as a single line):

Server=sql-host;Database=Cesivi;User=spm_app;Password=...;
Min Pool Size=10;Max Pool Size=200;Pooling=true;
Connection Timeout=30;Command Timeout=300;
TrustServerCertificate=True;Encrypt=True;MultipleActiveResultSets=True

Indexing:

-- Create indexes for frequently queried columns
CREATE NONCLUSTERED INDEX IX_ListItems_ListId ON ListItems(ListId);
CREATE NONCLUSTERED INDEX IX_ListItems_Created ON ListItems(Created);
CREATE NONCLUSTERED INDEX IX_Files_Url ON Files(Url);
CREATE NONCLUSTERED INDEX IX_Users_LoginName ON Users(LoginName);

-- Update statistics regularly
UPDATE STATISTICS ListItems WITH FULLSCAN;
UPDATE STATISTICS Files WITH FULLSCAN;

Load Balancer Tuning

Nginx:

# nginx.conf optimizations

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 10000;
    use epoll;
}

http {
    # Connection pooling to backends
    upstream spm_backend {
        least_conn;  # Use least-connections instead of round-robin for better distribution
        server spm-server-1:5000 max_fails=3 fail_timeout=30s;
        server spm-server-2:5000 max_fails=3 fail_timeout=30s;
        server spm-server-3:5000 max_fails=3 fail_timeout=30s;
        keepalive 64;  # Connection pool to backends
    }

    # Caching (optional, for static assets)
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=spm_cache:10m max_size=1g;

    # Timeouts
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
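    keepalive_timeout 65s;

    # Upstream keepalive connections require HTTP/1.1 and a cleared Connection header
    proxy_http_version 1.1;
    proxy_set_header Connection "";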

    # Buffering
    proxy_buffering on;
    proxy_buffer_size 4k;
    proxy_buffers 8 4k;
    proxy_busy_buffers_size 8k;
}
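The `least_conn` and `max_fails`/`fail_timeout` settings above combine as follows:

  • Each request goes to the available server with the fewest active connections.
  • A server that fails `max_fails` times within `fail_timeout` is skipped for the next `fail_timeout` seconds.

The sketch below models only that selection logic. The class names are hypothetical. Ties are broken by list order here, whereas nginx tries tied servers in weighted round-robin order.

```python
from dataclasses import dataclass, field


@dataclass
class Upstream:
    name: str
    active: int = 0
    failures: list[float] = field(default_factory=list)  # times of recent failed attempts
    down_until: float = 0.0


class LeastConnBalancer:
    """Sketch of least-connections selection with passive max_fails/fail_timeout checks."""

    def __init__(self, names: list[str], max_fails: int = 3, fail_timeout: float = 30.0) -> None:
        self.servers = [Upstream(n) for n in names]
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout

    def pick(self, now: float) -> Upstream:
        available = [s for s in self.servers if s.down_until <= now]
        if not available:
            raise RuntimeError("no live upstreams")
        chosen = min(available, key=lambda s: s.active)  # first of the least-loaded
        chosen.active += 1
        return chosen

    def finish(self, server: Upstream, now: float, ok: bool) -> None:
        server.active -= 1
        if ok:
            return
        # Keep only failures inside the fail_timeout window, then add this one.
        server.failures = [t for t in server.failures if now - t < self.fail_timeout] + [now]
        if len(server.failures) >= self.max_fails:
            server.down_until = now + self.fail_timeout  # skip it for fail_timeout seconds
            server.failures.clear()


lb = LeastConnBalancer(["spm-server-1:5000", "spm-server-2:5000", "spm-server-3:5000"])
busy = [lb.pick(now=0.0), lb.pick(now=0.0)]  # servers 1 and 2 each hold a long request
for t in (1.0, 2.0, 3.0):
    server = lb.pick(now=t)                  # server 3 is the only idle upstream...
    lb.finish(server, now=t, ok=False)       # ...and its requests fail
print(lb.servers[2].down_until)              # 33.0
print(lb.pick(now=4.0).name)                 # spm-server-1:5000 (server 3 is skipped)
```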

Security Considerations

Network Security

  1. TLS/SSL Termination: Terminate HTTPS at load balancer
  2. Private Networks: Run Redis and SQL Server on private network (no public IP)
  3. Firewall Rules: Restrict access to management ports
  4. VPN/Bastion: Use VPN or bastion host for administrative access

Redis Security

# redis.conf security settings

# Require password
requirepass YourStrongPasswordHere123!

# Disable dangerous commands
rename-command FLUSHDB ""
rename-command FLUSHALL ""
rename-command CONFIG ""
rename-command SHUTDOWN ""

# Bind to private IP only
bind 10.0.1.100

# Enable TLS (Redis 6+)
tls-port 6380
tls-cert-file /path/to/redis.crt
tls-key-file /path/to/redis.key
tls-ca-cert-file /path/to/ca.crt

SQL Server Security

  1. Least Privilege: Use dedicated SQL user with minimal permissions
  2. Encryption: Enable TLS encryption (Encrypt=True)
  3. Firewall: Restrict SQL Server port 1433 to SPM servers only
  4. Backup Encryption: Encrypt SQL Server backups
  5. Audit Logging: Enable SQL Server audit logging

Application Security

  1. Secrets Management: Use Azure Key Vault, AWS Secrets Manager, etc.
  2. Environment Variables: Never commit credentials to git
  3. Token Rotation: Implement token rotation for bearer tokens
  4. Rate Limiting: Implement rate limiting at load balancer
  5. CORS: Configure CORS policies appropriately

Summary

Multi-server deployment of Cesivi Server enables:

  • High Availability: No single point of failure in the application tier (Redis and SQL Server need their own replication and failover)
  • Horizontal Scalability: Scale out as needed
  • Session Persistence: Sessions survive server restarts
  • Cache Coordination: Pub/sub invalidation across servers
  • Production-Ready: Enterprise-grade architecture

Next Steps:

  1. Review Configuration section for your environment
  2. Choose a Deployment Scenario
  3. Set up Monitoring
  4. Test failover scenarios
  5. Implement Security Considerations

For questions or issues, see TROUBLESHOOTING or API_REFERENCE.