
Three-Server Validation Runbook

Roles

  • Server A: Registry and image/build source
  • Server B: Public control plane, app, auth, teams, scoreboard, admin
  • Server C: VPN and routing plane for per-principal / per-team access

Preflight

  • Run python cli.py env validate role-a --app-env production --json
  • Run python cli.py env validate role-b --app-env production --json
  • Run python cli.py env validate role-c --app-env production --json
  • Run python cli.py db bootstrap --app-env production --json
  • Run python cli.py health all --app-env production --json
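The preflight commands are order-dependent (the database bootstrap and health check only make sense once each role's environment validates), so a wrapper that stops at the first failure is one way to script them. This is a minimal sketch that assumes each `cli.py` command exits non-zero on failure and prints a single JSON document on stdout; neither behavior is documented here. The runner is injectable so the sequencing can be exercised without a live deployment.

```python
import json
import subprocess
from typing import Callable, Sequence

# Preflight commands copied from the runbook, in execution order.
PREFLIGHT: list[list[str]] = [
    ["python", "cli.py", "env", "validate", "role-a", "--app-env", "production", "--json"],
    ["python", "cli.py", "env", "validate", "role-b", "--app-env", "production", "--json"],
    ["python", "cli.py", "env", "validate", "role-c", "--app-env", "production", "--json"],
    ["python", "cli.py", "db", "bootstrap", "--app-env", "production", "--json"],
    ["python", "cli.py", "health", "all", "--app-env", "production", "--json"],
]

# A runner takes argv and returns (exit_code, stdout).
Runner = Callable[[Sequence[str]], tuple[int, str]]


def subprocess_runner(argv: Sequence[str]) -> tuple[int, str]:
    """Run a real command and capture its stdout as text."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stdout


def run_preflight(runner: Runner = subprocess_runner) -> list[dict]:
    """Run each preflight step in order and stop at the first failure.

    Assumption: cli.py exits non-zero on failure and prints one JSON
    document on stdout. A non-zero exit, or missing/unparseable JSON,
    halts the sequence because later steps depend on earlier ones.
    """
    results: list[dict] = []
    for argv in PREFLIGHT:
        code, out = runner(argv)
        try:
            payload = json.loads(out) if out.strip() else None
        except json.JSONDecodeError:
            payload = None
        results.append({"argv": " ".join(argv), "exit_code": code, "payload": payload})
        if code != 0 or payload is None:
            break
    return results
```

Calling `run_preflight()` on the target host returns one record per executed step; if the list is shorter than five entries, the last record is the step that failed.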

Functional Staging

Server A

  • Verify registry host, credentials, and local image availability with python cli.py registry check --json
  • Build and push challenge images with python cli.py challenge build <slug> then python cli.py challenge push <slug>
  • Confirm that spawning a challenge whose image is missing from the registry fails cleanly before enabling fresh spawns
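Building before pushing, and only enabling spawns for images that were actually pushed, can be sketched as a per-slug loop. This is a hypothetical helper, not part of `cli.py`: it assumes `challenge build` and `challenge push` exit non-zero on failure, and it keeps going after a failure so one broken challenge does not hide the status of the others.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes argv and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def subprocess_runner(argv: Sequence[str]) -> int:
    return subprocess.run(list(argv)).returncode


def build_and_push(slugs: list[str], runner: Runner = subprocess_runner) -> dict[str, str]:
    """Build then push each challenge image; record a status per slug."""
    status: dict[str, str] = {}
    for slug in slugs:
        if runner(["python", "cli.py", "challenge", "build", slug]) != 0:
            status[slug] = "build_failed"
            continue  # never push an image that did not build
        pushed = runner(["python", "cli.py", "challenge", "push", slug]) == 0
        status[slug] = "pushed" if pushed else "push_failed"
    return status


def spawnable(status: dict[str, str]) -> list[str]:
    """Only slugs whose image reached the registry are safe to enable."""
    return sorted(slug for slug, state in status.items() if state == "pushed")
```

Any slug left out of `spawnable(...)` is a candidate for the missing-image check before fresh spawns are enabled.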

Server B

  • Bootstrap the schema and admin user
  • Validate registration, login, reset, solo principal creation, team create/join, challenge access, scoreboard, admin pages, and supported appearance presets
  • Validate local or remote instance lifecycle with:
      • python cli.py instance spawn <slug> --principal-id <id>
      • python cli.py instance reset <slug> --principal-id <id>
      • python cli.py instance destroy <slug> --principal-id <id>

Server C

  • Validate WireGuard access and peer visibility with:
      • python cli.py vpn status --json
      • python cli.py vpn test --json
      • python cli.py vpn reconcile --json
  • Confirm strict routing: each team or solo principal can reach only its own assigned subnet
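The strict-routing rule can also be checked offline against the subnet plan, independently of live traffic tests. The sketch below uses Python's standard `ipaddress` module and assumes two inputs whose source is deployment-specific: each principal's assigned subnet, and the destination prefixes each principal is permitted to reach (from firewall rules, client-side WireGuard `AllowedIPs`, or similar). The function and input names are hypothetical.

```python
import ipaddress
from itertools import combinations


def check_isolation(
    assignments: dict[str, str], reachable: dict[str, list[str]]
) -> list[str]:
    """Return human-readable violations of per-principal subnet isolation.

    assignments: principal -> its assigned subnet (CIDR).
    reachable:   principal -> destination prefixes it is permitted to reach
                 (assumed input; how it is collected depends on the deployment).
    """
    violations: list[str] = []
    nets = {p: ipaddress.ip_network(cidr) for p, cidr in assignments.items()}

    # 1. Assigned subnets must be disjoint, or isolation is impossible.
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            violations.append(f"{a} {net_a} overlaps {b} {net_b}")

    # 2. Every permitted prefix must lie inside the principal's own subnet.
    for principal, prefixes in reachable.items():
        own = nets.get(principal)
        if own is None:
            violations.append(f"{principal} has routes but no assigned subnet")
            continue
        for cidr in prefixes:
            net = ipaddress.ip_network(cidr)
            if net.version != own.version or not net.subnet_of(own):
                violations.append(f"{principal} may reach {net} outside {own}")
    return violations
```

An empty result means the plan is isolated on paper; live checks are still needed to confirm that the routes actually installed on Server C match it.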

Load and Soak

  • Target 50-100 teams and 200-400 concurrent VPN clients
  • Blend traffic along two independent dimensions, each summing to 100%:
      • Principal type:
          • 25% solo principals
          • 75% team principals
      • Activity:
          • 60% control-plane browsing
          • 30% active challenge traffic
          • 10% spawn/reset/reconnect churn
  • Observe:
      • login latency
      • spawn latency
      • active WireGuard peers
      • container start time
      • Redis and database latency
      • CPU, memory, and network saturation on Servers B and C
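Turning the blend percentages into whole client counts needs a rounding rule that still adds up to the target. A minimal sketch, assuming the two dimensions are crossed (each principal group is split by activity) and using the largest-remainder method; the dictionary keys are illustrative names, not identifiers from the platform.

```python
# Percentages from the runbook's traffic blend; the two dimensions are independent.
PRINCIPAL_MIX = {"solo": 25, "team": 75}
ACTIVITY_MIX = {"browsing": 60, "challenge": 30, "churn": 10}


def apportion(total: int, weights: dict[str, int]) -> dict[str, int]:
    """Split `total` clients across buckets with integer percentage weights.

    Largest-remainder method: round every bucket down, then give the
    clients lost to rounding to the buckets with the largest remainders,
    so the counts always sum to `total`.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    counts: dict[str, int] = {}
    remainders: dict[str, int] = {}
    for bucket, pct in weights.items():
        counts[bucket], remainders[bucket] = divmod(total * pct, 100)
    leftover = total - sum(counts.values())
    for bucket in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[bucket] += 1
    return counts


def plan(total_clients: int) -> dict[tuple[str, str], int]:
    """Cross the two dimensions: split each principal group by activity."""
    cells: dict[tuple[str, str], int] = {}
    for principal, n in apportion(total_clients, PRINCIPAL_MIX).items():
        for activity, m in apportion(n, ACTIVITY_MIX).items():
            cells[(principal, activity)] = m
    return cells
```

For 300 clients the principal split is 75 solo and 225 team. Because each group is rounded separately, the overall activity totals from `plan` can drift by a client or so from an exact 60/30/10 split, which is negligible at this scale.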

Failure Drills

  • Stop registry access on Server A and verify that fresh spawns fail with a clear error and succeed again once access is restored
  • Restart Server B and verify that existing sessions survive and that no duplicate instances are created
  • Restart Server C and verify peer reconciliation and route restoration
  • Inject one stale WireGuard peer, one expired lease, and one incorrect registry credential, and confirm that each is cleaned up and visible in the audit records
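Two of the drill outcomes lend themselves to simple automated checks. The sketch below parses the output of WireGuard's `wg show <interface> latest-handshakes` (one peer per line: public key, a tab, then a Unix timestamp, with 0 meaning no handshake yet) to flag stale peers, and counts instance records to detect duplicates after a Server B restart. The 180-second threshold and the `(slug, principal_id)` record shape are assumptions, not values taken from this platform.

```python
from collections import Counter

# Assumption: WireGuard re-handshakes about every two minutes on a tunnel
# carrying traffic, so a handshake older than 180 s is treated as stale.
STALE_AFTER_SECONDS = 180


def stale_peers(
    latest_handshakes: str, now: int, stale_after: int = STALE_AFTER_SECONDS
) -> list[str]:
    """Return public keys of peers whose last handshake is too old.

    Input is `wg show <iface> latest-handshakes` output. A timestamp of 0
    (never handshaked) also counts as stale, which will include freshly
    added peers that have not connected yet.
    """
    stale: list[str] = []
    for line in latest_handshakes.splitlines():
        if not line.strip():
            continue
        pubkey, ts_text = line.split("\t")
        ts = int(ts_text)
        if ts == 0 or now - ts > stale_after:
            stale.append(pubkey)
    return stale


def duplicate_instances(instances: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (slug, principal_id) pairs with more than one active instance.

    `instances` is a hypothetical list of active-instance records gathered
    after the restart, one tuple per running instance.
    """
    counts = Counter(instances)
    return sorted(key for key, n in counts.items() if n > 1)
```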

Acceptance Criteria

  • Solo users receive one personal principal and private challenge access
  • Teams never exceed 4 active members
  • Lease cleanup occurs at the configured TTL boundary
  • No cross-team subnet reachability exists
  • Supported appearance presets render correctly on public, player, and admin pages
  • CLI commands remain deterministic, automation-friendly, and JSON-capable
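Two of the acceptance criteria can be expressed as checks over exported data. This is a sketch under assumed inputs: membership rows as `(team_id, user_id, is_active)` tuples and a map of still-existing leases to their creation times. The `grace` allowance for the cleanup loop's polling interval is a hypothetical parameter, not a documented setting.

```python
from collections import Counter
from datetime import datetime, timedelta

MAX_ACTIVE_MEMBERS = 4  # from the acceptance criteria


def oversized_teams(memberships: list[tuple[str, str, bool]]) -> list[str]:
    """Return team IDs with more active members than the cap allows."""
    active = Counter(team for team, _user, is_active in memberships if is_active)
    return sorted(team for team, n in active.items() if n > MAX_ACTIVE_MEMBERS)


def overdue_leases(
    live_leases: dict[str, datetime],
    ttl: timedelta,
    now: datetime,
    grace: timedelta = timedelta(seconds=30),
) -> list[str]:
    """Return IDs of leases that still exist past creation + TTL + grace.

    live_leases maps lease ID -> creation time for leases not yet cleaned
    up. Use timezone-aware datetimes consistently in real data.
    """
    return sorted(
        lease_id
        for lease_id, created in live_leases.items()
        if now > created + ttl + grace
    )
```

Both functions return empty lists when the corresponding criterion holds, which keeps their output easy to assert on in automation.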