Troubleshooting

Common issues and their fixes, grouped by component.

CTFHive app (`ctfapp`)

`ModuleNotFoundError: No module named 'provisioner'` when starting the control plane

Symptom: The CTFHive control plane crashes at import time with:

ModuleNotFoundError: No module named 'provisioner'

Cause: The provisioner package lives at the repository root, but when gunicorn wsgi:app or flask run is invoked from inside CTF_Saas_CTRL_Pane/, the repo root is not on sys.path.

Fix: The ctrlapp/__init__.py app factory bootstraps sys.path automatically at import time, so the normal entrypoints work without manual intervention:

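import os
import sys

# __file__ is CTF_Saas_CTRL_Pane/ctrlapp/__init__.py; the three dirname() calls
# climb ctrlapp/ -> CTF_Saas_CTRL_Pane/ -> repository root.
_REPO_ROOT = os.path.dirname(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
)
if _REPO_ROOT not in sys.path:
    sys.path.insert(0, _REPO_ROOT)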

The membership check makes this idempotent: repeated imports never add duplicate sys.path entries. If you still see the error, make sure you are launching from the correct working directory:

# From the repo root:
gunicorn "CTF_Saas_CTRL_Pane.ctrlapp:create_app()"

# OR from inside CTF_Saas_CTRL_Pane/:
cd CTF_Saas_CTRL_Pane
gunicorn "ctrlapp:create_app()"

For pytest, the conftest.py in CTF_Saas_CTRL_Pane/ adds the repo root to sys.path. Run the control-plane tests from the CTF_Saas_CTRL_Pane/ directory:

cd CTF_Saas_CTRL_Pane
uv run pytest

Running pytest from the repo root without a conftest.py that handles the path can reproduce the error.
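If the error persists, a quick check from the same working directory and interpreter shows whether Python can locate the package at all. This is a minimal diagnostic sketch that uses only the standard library. The helper name `where_is` is hypothetical and is not part of CTFHive.

```python
import importlib.util
import os
import sys


def where_is(module_name: str) -> dict:
    """Report whether *module_name* is importable and where it would load from."""
    spec = importlib.util.find_spec(module_name)
    return {
        "module": module_name,
        "found": spec is not None,
        # For a regular package this is the path to its __init__.py;
        # it may be None for namespace packages.
        "origin": spec.origin if spec is not None else None,
        "cwd": os.getcwd(),
        # The first sys.path entries decide which copy of a package wins.
        "sys_path_head": sys.path[:5],
    }


if __name__ == "__main__":
    print(where_is("provisioner"))
```

Run it from the directory where gunicorn or pytest fails. If the output shows `found: False` and `sys_path_head` does not include the repository root, the cause is the one described above.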

Redis is down: app behaviour

Symptom: Warning messages in logs; rate limiting may not work correctly; flag lookup is slower.

Cause: REDIS_URL points to an unreachable Redis instance.

Behaviour: The app degrades gracefully:

Caching falls back to SimpleCache. This cache is in-memory and per-process, so cached values are not shared between workers.
Rate limiting falls back to in-memory counters.
Flag lookup falls back to the database (TeamFlag rows), then on-the-fly re-derivation.
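The lookup order in the last bullet works as a chain of fallbacks. The sketch below is illustrative only:

- The three callables (`from_cache`, `from_db`, `derive`) are hypothetical stand-ins, not CTFHive's function names.
- It catches the built-in `ConnectionError` for simplicity. A real Redis client raises its own exception types; redis-py, for example, raises `redis.exceptions.ConnectionError`.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("flag_lookup")

# (principal_id, challenge_id) -> flag or None
Lookup = Callable[[str, str], Optional[str]]


def lookup_flag(principal_id: str, challenge_id: str,
                from_cache: Lookup, from_db: Lookup,
                derive: Callable[[str, str], str]) -> str:
    """Try the cache, then the TeamFlag table, then re-derive the flag."""
    try:
        flag = from_cache(principal_id, challenge_id)
        if flag is not None:
            return flag
    except ConnectionError as exc:
        # Cache unreachable: log and keep going instead of failing the request.
        log.warning("cache unavailable, falling back to DB: %s", exc)
    flag = from_db(principal_id, challenge_id)
    if flag is not None:
        return flag
    # Last resort: recompute the flag deterministically.
    return derive(principal_id, challenge_id)
```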

Fix for development: Start Redis locally:

docker run -d -p 6379:6379 redis:7-alpine

Fix for production: Ensure REDIS_URL is correct and Redis is reachable before starting the app. Do not run CTFHive in production without Redis. With the in-memory rate-limit fallback, each worker keeps its own counters, so the effective limit becomes workers × configured limit (see below).

Rate limiting is not working under multiple Gunicorn workers

Symptom: Clients can submit significantly more requests than the configured limits suggest. For example, with LOGIN_RATE_LIMIT=10 per minute and 4 workers, a client can make up to 40 login attempts per minute.

Cause: RATELIMIT_STORAGE_URI=memory:// (the default) stores counters in each process's memory. Every Gunicorn worker keeps its own counters, so the effective limit is workers × configured_limit.
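The arithmetic in the symptom can be reproduced with a toy model. Each worker keeps its own counter, and requests are spread across workers. This is a simplified fixed-window simulation, not Flask-Limiter's actual algorithm. `simulate` is a hypothetical helper.

```python
from collections import Counter


def simulate(workers: int, limit_per_minute: int, requests: int) -> int:
    """Return how many of *requests* (all within one minute) are accepted.

    Requests are assigned round-robin, and every worker enforces the limit
    against its own in-memory counter, as with memory:// storage.
    """
    counters: Counter = Counter()
    accepted = 0
    for i in range(requests):
        worker = i % workers  # round-robin dispatch
        if counters[worker] < limit_per_minute:
            counters[worker] += 1
            accepted += 1
    return accepted


# A single shared counter (Redis storage) behaves like one worker.
print(simulate(workers=1, limit_per_minute=10, requests=100))  # 10
print(simulate(workers=4, limit_per_minute=10, requests=100))  # 40
```

Real Gunicorn dispatch is not strictly round-robin, so workers × limit is an upper bound rather than a guaranteed figure.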

Fix: Set a Redis backend for rate limiting:

RATELIMIT_STORAGE_URI=redis://localhost:6379/1

Using a different Redis DB number from the one in REDIS_URL keeps rate-limit keys apart from cache keys. This avoids key collisions but is not a technical requirement.
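A pre-flight check can catch the in-memory default before deployment. The sketch assumes both settings are plain environment variables, as shown above. `redis_db` and `check_rate_limit_storage` are hypothetical helpers, not part of CTFHive.

```python
from urllib.parse import urlparse


def redis_db(url: str) -> int:
    """Return the DB number from a redis:// URL (path '/1' -> 1, no path -> 0)."""
    path = urlparse(url).path.lstrip("/")
    return int(path) if path else 0


def check_rate_limit_storage(env: dict) -> list:
    """Return human-readable warnings about the rate-limit storage setup."""
    warnings = []
    storage = env.get("RATELIMIT_STORAGE_URI", "memory://")  # documented default
    if storage.startswith("memory://"):
        warnings.append("rate limits are per worker (memory:// storage)")
    elif storage.startswith("redis://") and "REDIS_URL" in env:
        if redis_db(storage) == redis_db(env["REDIS_URL"]):
            warnings.append("rate limiter shares a Redis DB with REDIS_URL")
    return warnings


print(check_rate_limit_storage({"RATELIMIT_STORAGE_URI": "redis://localhost:6379/1",
                                "REDIS_URL": "redis://localhost:6379/0"}))  # []
```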

Labs/containers are not spawning

Symptom: Clicking "Start Lab" returns an error or silently does nothing. Container instances do not appear in the admin panel.

Cause (most common): LAB_ENABLED is false (the default).

Fix:

LAB_ENABLED=true
LAB_DOCKER_HOST=unix:///var/run/docker.sock

Ensure the process running the CTFHive app has permission to access the Docker socket. If running in Docker Compose, mount the socket:

volumes:
  - /var/run/docker.sock:/var/run/docker.sock

Security

Mounting the Docker socket gives the app root-equivalent access to the host. In production, use the forge-dockerd-proxy mTLS proxy (see Architecture docs) instead of exposing the socket directly.
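To confirm the permission requirement from the app's own environment, you can extract the socket path from LAB_DOCKER_HOST and test it with `os.access`. This is a minimal sketch that assumes a `unix://` value like the one in the example above. `docker_socket_status` is a hypothetical helper. It checks filesystem permissions only, not whether the Docker daemon responds.

```python
import os
from urllib.parse import urlparse


def docker_socket_status(docker_host: str) -> str:
    """Classify access to the Unix socket named by a LAB_DOCKER_HOST value."""
    parsed = urlparse(docker_host)
    if parsed.scheme != "unix":
        return f"not a unix socket URL (scheme={parsed.scheme!r})"
    path = parsed.path  # unix:///var/run/docker.sock -> /var/run/docker.sock
    if not os.path.exists(path):
        return f"missing: {path}"
    # Talking to the daemon needs both read and write access to the socket.
    if not os.access(path, os.R_OK | os.W_OK):
        return f"no permission: {path}"
    return f"ok: {path}"


if __name__ == "__main__":
    print(docker_socket_status(os.environ.get("LAB_DOCKER_HOST", "unix:///var/run/docker.sock")))
```

Run it as the same user (or inside the same container) as the CTFHive process. Otherwise the result says nothing about the app's own access.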

Other causes:

DISPATCH_INTERNAL_URL is unreachable → set DISPATCH_USE_REMOTE=false to use the local Docker fallback.
LAB_REQUIRE_PINNED_IMAGES=true and the challenge image is not digest-pinned (image@sha256:...) → pin the image or set LAB_REQUIRE_PINNED_IMAGES=false during development.
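You can check the digest-pinning condition locally before enabling LAB_REQUIRE_PINNED_IMAGES. This sketch assumes "pinned" means the reference ends in `@sha256:` followed by 64 lowercase hex characters, which is the standard Docker digest form. CTFHive's own check may be stricter or looser, and `is_digest_pinned` is a hypothetical name.

```python
import re

# name[:tag]@sha256:<64 lowercase hex chars>; a tag alone (e.g. :latest) is not a pin.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if *image_ref* is pinned to an immutable sha256 digest."""
    return bool(_DIGEST_RE.search(image_ref.strip()))


for ref in ("redis:7-alpine", "registry.example.com/chal/web@sha256:" + "a" * 64):
    print(ref, is_digest_pinned(ref))
```

For an image that has already been pulled, `docker image inspect --format '{{index .RepoDigests 0}}' <image>` prints a repository digest you can pin to.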

Email features are silently disabled

Symptom: Email verification links are never sent; password reset emails do not arrive. No error is raised in the app.

Cause: MAILTRAP_API_KEY is empty (the default). The email service checks for the API key before attempting delivery and skips sending silently.

Fix: Set the Mailtrap API key:

MAILTRAP_API_KEY=your-mailtrap-api-key

Alternatively, disable email-dependent features during development:

EMAIL_VERIFICATION_ENABLED=false
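Because the skip is silent, a startup check can make the misconfiguration visible in the logs. This is a hypothetical sketch, and `email_config_warnings` is not part of CTFHive. It assumes EMAIL_VERIFICATION_ENABLED counts as enabled unless it is set to a false-like value. Adjust that to match how CTFHive actually parses the setting.

```python
import logging

log = logging.getLogger("email_check")

_FALSEY = {"0", "false", "no", "off"}


def email_config_warnings(env: dict) -> list:
    """Warn when email-dependent features are on but no Mailtrap key is set."""
    warnings = []
    key = env.get("MAILTRAP_API_KEY", "").strip()
    # Assumption: unset means enabled.
    verification = env.get("EMAIL_VERIFICATION_ENABLED", "true").strip().lower()
    if not key:
        warnings.append("MAILTRAP_API_KEY is empty: no emails will be sent")
        if verification not in _FALSEY:
            warnings.append("email verification is enabled but cannot deliver links")
    for w in warnings:
        log.warning(w)
    return warnings
```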

Changing `ADMIN_KEY` breaks all flags and the audit chain

Symptom: After rotating ADMIN_KEY, all flag submissions fail. The application audit chain's verify_chain() returns (False, <id>) because no row signed with the old key can be re-verified.

Cause: ADMIN_KEY is the HMAC key for both flag derivation (HMAC-SHA3-256) and the application audit chain (HMAC-SHA256). Changing it means:

Every previously-derived flag is no longer reproducible.
Every historical audit chain signature cannot be re-verified.
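Rotation invalidates every flag because a flag is a pure function of the key and its inputs. The same inputs under a new key produce a different flag. The sketch below uses HMAC-SHA3-256, as stated above. The message layout (`team_secret` and `challenge_id` joined with `:`) and the truncation to 32 hex characters are assumptions for illustration, not CTFHive's actual format.

```python
import hashlib
import hmac


def derive_flag(admin_key: bytes, team_secret: str, challenge_id: str,
                flag_prefix: str = "CTF") -> str:
    """Hypothetical per-team flag: prefix{first 32 hex chars of HMAC-SHA3-256}."""
    msg = f"{team_secret}:{challenge_id}".encode()
    digest = hmac.new(admin_key, msg, hashlib.sha3_256).hexdigest()
    return f"{flag_prefix}{{{digest[:32]}}}"


old = derive_flag(b"old-admin-key", "team-secret-1", "web-01")
new = derive_flag(b"new-admin-key", "team-secret-1", "web-01")
print(old == derive_flag(b"old-admin-key", "team-secret-1", "web-01"))  # True
print(old == new)  # False
```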

Fix:

For flags: flush the Redis flag cache and regenerate all TeamFlag rows before reopening the event:

# 1. Flush the Redis flag cache (adjust the DB number as needed).
#    --scan iterates incrementally instead of blocking Redis like KEYS;
#    -r (GNU xargs) skips DEL when nothing matches.
redis-cli -n 0 --scan --pattern "team_flags:*" | xargs -r redis-cli -n 0 DEL

# 2. Delete all TeamFlag rows.
#    flask shell has no -c option, so feed the statements on stdin.
flask shell <<'EOF'
from ctfapp.extensions import db
from ctfapp.models.submission import TeamFlag
db.session.query(TeamFlag).delete()
db.session.commit()
EOF

# 3. Re-generate flags for all active principals.
#    The blank line after the loop body closes the for block in the interactive console.
flask shell <<'EOF'
from ctfapp.extensions import db
from ctfapp.models.principal import Principal
from ctfapp.services.flag_engine import generate_flags_for_principal
for p in Principal.query.filter_by(active=True).all():
    generate_flags_for_principal(p)

db.session.commit()
EOF

For the audit chain: the historical chain is permanently broken after key rotation. Archive the old log for the record, and accept that verify_chain() will not validate entries signed with the old key. Future entries will form a new valid chain starting from the current _last_sig.
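Old entries cannot be re-verified because each signature covers the previous signature plus the entry body, keyed with ADMIN_KEY. The sketch below is a simplified HMAC-SHA256 chain that makes these assumptions:

- The field names (`body`, `sig`) are invented for illustration.
- The chain starts from an empty genesis value.
- It returns the position of the first failing entry. This is not CTFHive's implementation.

```python
import hashlib
import hmac
import json


def sign(key: bytes, prev_sig: str, body: dict) -> str:
    """HMAC-SHA256 over the previous signature plus a canonical JSON body."""
    payload = prev_sig.encode() + json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append(chain: list, key: bytes, body: dict) -> None:
    prev = chain[-1]["sig"] if chain else ""  # assumed empty genesis value
    chain.append({"body": body, "sig": sign(key, prev, body)})


def verify_chain(chain: list, key: bytes):
    """Return (True, None), or (False, index of the first entry that fails)."""
    prev = ""
    for i, entry in enumerate(chain):
        expected = sign(key, prev, entry["body"])
        if not hmac.compare_digest(expected, entry["sig"]):
            return False, i
        prev = entry["sig"]
    return True, None


chain = []
append(chain, b"old-key", {"event": "login"})
append(chain, b"old-key", {"event": "submit"})
print(verify_chain(chain, b"old-key"))  # (True, None)
print(verify_chain(chain, b"new-key"))  # (False, 0)
```

Entries appended after a rotation link to the last stored signature and verify under the new key. A full verification from the start still stops at the first old entry, which is why the old log should be archived.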

CTFHive control plane

Control-plane `docker build` produces an empty image

Symptom: docker build -f CTF_Saas_CTRL_Pane/Dockerfile . succeeds but the image contains no application code.

Cause: CTF_Saas_CTRL_Pane/Dockerfile and CTF_Saas_CTRL_Pane/Dockerfile.dev are empty files (0 bytes). They were created as stubs and have not yet been written.

Workaround: Run the control plane directly via Gunicorn:

cd CTF_Saas_CTRL_Pane
uv run gunicorn "ctrlapp:create_app()" \
    --bind 0.0.0.0:5001 \
    --workers 2 \
    --timeout 60

Or via flask run for development:

cd CTF_Saas_CTRL_Pane
FLASK_APP="ctrlapp:create_app()" uv run flask run --port 5001

Status

Dockerfiles for the control plane are not yet implemented. Track progress or contribute at the repository.

`ModuleNotFoundError: No module named 'provisioner'` (control plane context)

See the CTFHive app section above. The fix is identical.

Provisioner

Provisioner dry-run vs real provisioning

Symptom: Running a provision does nothing — no Linode server is created, no DNS record appears.

Cause: LINODE_API_TOKEN is empty (the default). The ProvisionService selects FakeExecutor when the token is absent, which logs steps but does not make real API calls.
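The selection rule fits in a few lines. This page names `FakeExecutor`, but `LinodeExecutor` and `select_executor` below are hypothetical placeholders for whatever ProvisionService actually uses. The real executors also do far more than record steps.

```python
import os
from dataclasses import dataclass, field


@dataclass
class FakeExecutor:
    """Records steps instead of calling the Linode API (dry run)."""
    steps: list = field(default_factory=list)

    def run(self, step: str) -> None:
        self.steps.append(step)
        print(f"[dry-run] {step}")


@dataclass
class LinodeExecutor:
    """Hypothetical live executor; real API calls are out of scope here."""
    token: str

    def run(self, step: str) -> None:
        raise NotImplementedError("live provisioning is not part of this sketch")


def select_executor(env=None):
    """Pick the dry-run executor unless a Linode token is configured."""
    env = os.environ if env is None else env
    token = env.get("LINODE_API_TOKEN", "")  # empty by default -> dry run
    return LinodeExecutor(token) if token else FakeExecutor()
```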

Fix: Set a real Linode API token to enable live provisioning:

LINODE_API_TOKEN=your-linode-api-token
LINODE_REGION=us-east
LINODE_PLAN=g6-standard-2
LINODE_IMAGE=linode/debian12

Always test with FakeExecutor (empty token, or pass --dry-run when the CLI supports it) before provisioning real infrastructure.

Audit chain verification fails for provisioner log

Symptom: AuditLog.verify_chain() returns (False, N) for an existing log file.

Causes and fixes:

Cause	Fix
`PROVISION_AUDIT_SECRET` changed	The chain is permanently broken for entries signed with the old secret. Archive the old file and start a new chain.
Log file was manually edited	Any modification to a JSONL line changes the signed body, breaking the chain from that line onward. Do not edit log files.
Log file truncated or corrupted	Partial writes during a crash can corrupt the last line. `verify_chain()` returns the 0-based index of the first bad line. Lines before that index are still valid.

To identify the first bad line:

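import os

from provisioner.audit import AuditLog

log = AuditLog(
    path="runs/tenant-abc/audit.jsonl",
    # Must be the same secret that signed the log.
    secret=os.environ["PROVISION_AUDIT_SECRET"],
)
ok, bad_idx = log.verify_chain()
if not ok:
    # bad_idx is the 0-based index of the first line that fails verification.
    print(f"Chain broken at line index {bad_idx}")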
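Entries before the returned index still verify. One option is therefore to keep that prefix and move the remainder aside for inspection. This is a minimal standard-library sketch. `split_at_bad_line` and the `.valid`/`.tail` suffixes are hypothetical. The function writes new files and never touches the original log, in line with the advice above not to edit log files.

```python
from pathlib import Path
from typing import Tuple


def split_at_bad_line(path: str, bad_idx: int) -> Tuple[Path, Path]:
    """Copy lines before *bad_idx* to <name>.valid and the rest to <name>.tail.

    Works on raw bytes, so a half-written UTF-8 sequence from a crash cannot
    raise a decode error. The original log file is never modified.
    """
    src = Path(path)
    lines = src.read_bytes().splitlines(keepends=True)
    valid = src.with_name(src.name + ".valid")
    tail = src.with_name(src.name + ".tail")
    valid.write_bytes(b"".join(lines[:bad_idx]))
    tail.write_bytes(b"".join(lines[bad_idx:]))
    return valid, tail
```

Call it with the index returned by verify_chain(), for example `split_at_bad_line("runs/tenant-abc/audit.jsonl", bad_idx)`.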

General

App boots but all pages return 500

Check: Run with APP_ENV=development and DEBUG=true to see the traceback. Common causes:

Missing database tables: run migrations with flask db upgrade. For a throwaway development database, you can instead run db.create_all() inside flask shell (after from ctfapp.extensions import db).
Unreachable database: check DATABASE_URL and confirm that PostgreSQL is running.
Bad ENCRYPTION_KEY: if the key changed after data was written, decryption will raise exceptions. See the key-rotation section under environment variables.
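A TCP connection attempt to the host and port in DATABASE_URL helps tell "PostgreSQL is down" apart from "the URL is wrong". This standard-library sketch only proves that something is listening; it does not authenticate or run a query. `db_endpoint` and `tcp_reachable` are hypothetical helpers. The default port is assumed to be 5432, PostgreSQL's standard port.

```python
import socket
from urllib.parse import urlparse


def db_endpoint(database_url: str, default_port: int = 5432) -> tuple:
    """Extract (host, port) from e.g. postgresql://user:pw@db:5432/ctfhive."""
    parsed = urlparse(database_url)
    return parsed.hostname or "localhost", parsed.port or default_port


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection can be opened within *timeout* seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `tcp_reachable(*db_endpoint(url))` returns True but the app still fails, look at credentials or the database name rather than the network.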

Flag submissions always return "wrong"

Checklist:

ADMIN_KEY has not changed since the challenge was first seeded.
The principal's team_secret has not been re-generated (only happens on Principal row re-creation).
Challenge.flag_prefix matches what was used to derive the stored flag.
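Redis is available. If it is not, the database and re-derivation fallbacks should still accept correct flags, but check the logs for exceptions.
The submitted string has no stray leading or trailing characters. The engine calls .strip(). In Python this already removes ordinary and non-breaking spaces. Zero-width characters such as U+200B, which copy-paste or autofill can introduce, are not whitespace and survive it.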
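This note assumes the engine uses Python's built-in `str.strip()`. With no arguments, that method removes any character for which `str.isspace()` is true. This includes U+00A0 (non-breaking space) but not U+200B (zero-width space) or U+FEFF (byte-order mark). The snippet below shows the difference. `normalize_submission` and `describe` are hypothetical diagnostics, not what CTFHive's engine does. Use them to inspect a failing string rather than as a drop-in fix.

```python
import unicodedata

# Format characters that str.isspace() does not treat as whitespace.
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"


def normalize_submission(raw: str) -> str:
    """Strip whitespace and zero-width characters from both ends until stable."""
    prev = None
    while prev != raw:
        prev, raw = raw, raw.strip().strip(ZERO_WIDTH)
    return raw


def describe(raw: str) -> list:
    """Name every character outside printable ASCII (0x21-0x7E), to spot stowaways."""
    return [unicodedata.name(ch, hex(ord(ch))) for ch in raw if not ("!" <= ch <= "~")]


print("\u00a0CTF{abc}\u00a0".strip() == "CTF{abc}")                # True: NBSP is whitespace
print("\u200bCTF{abc}".strip() == "CTF{abc}")                       # False: ZWSP is not
print(normalize_submission("\u200b CTF{abc}\ufeff") == "CTF{abc}")  # True
print(describe("\u200bCTF{abc}"))                                   # ['ZERO WIDTH SPACE']
```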

CSRF errors on form submission

Cause: WTF_CSRF_TIME_LIMIT (default 3600 seconds) exceeded, or the page was cached and the CSRF token is stale.

Fix: Hard-refresh the page. For production, ensure session cookies are correctly scoped and SECRET_KEY has not changed mid-session.