Troubleshooting

Common issues and their fixes, grouped by component.


CTFHive app (ctfapp)

ModuleNotFoundError: No module named 'provisioner' when starting the control plane

Symptom: The CTFHive control plane crashes at import time with:

ModuleNotFoundError: No module named 'provisioner'

Cause: The provisioner package lives at the repository root, but when gunicorn wsgi:app or flask run is invoked from inside CTF_Saas_CTRL_Pane/, the repo root is not on sys.path.

Fix: The ctrlapp/__init__.py app factory bootstraps sys.path automatically at import time, so the normal entrypoints work without manual intervention:

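import os
import sys

# Three levels up from ctrlapp/__init__.py:
# ctrlapp/ -> CTF_Saas_CTRL_Pane/ -> repository root
_REPO_ROOT = os.path.dirname(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
)
if _REPO_ROOT not in sys.path:
    sys.path.insert(0, _REPO_ROOT)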

The membership check makes the insert idempotent, so repeated imports do not add duplicate entries. If you still see the error, ensure you are launching from the correct working directory:

# From the repo root:
gunicorn "CTF_Saas_CTRL_Pane.ctrlapp:create_app()"

# OR from inside CTF_Saas_CTRL_Pane/:
cd CTF_Saas_CTRL_Pane
gunicorn "ctrlapp:create_app()"

For pytest, the conftest.py in CTF_Saas_CTRL_Pane/ adds the repo root to sys.path. Run the control-plane tests from the CTF_Saas_CTRL_Pane/ directory:

cd CTF_Saas_CTRL_Pane
uv run pytest

Running pytest from the repo root without a conftest.py that handles the path can reproduce the error.
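As a rough illustration of why three dirname calls land on the repository root, the sketch below recomputes the root for a given module file and adds it to sys.path only once. The helper names repo_root_from and ensure_on_path are hypothetical and are not part of ctrlapp, and the checkout location is made up for the example.

```python
import os
import sys


def repo_root_from(module_file: str, levels: int = 3) -> str:
    """Walk `levels` directories up from a module file.

    For <root>/CTF_Saas_CTRL_Pane/ctrlapp/__init__.py the three steps are
    ctrlapp/ -> CTF_Saas_CTRL_Pane/ -> <root>.
    """
    path = os.path.abspath(module_file)
    for _ in range(levels):
        path = os.path.dirname(path)
    return path


def ensure_on_path(root: str) -> bool:
    """Insert `root` at the front of sys.path; return True only if it was added."""
    if root in sys.path:
        return False
    sys.path.insert(0, root)
    return True


# Hypothetical checkout location, used only to show the path arithmetic.
example = os.path.join(
    os.sep, "srv", "repo", "CTF_Saas_CTRL_Pane", "ctrlapp", "__init__.py"
)
root = repo_root_from(example)
print(root)
print(ensure_on_path(root))  # added on the first call
print(ensure_on_path(root))  # skipped afterwards, so no duplicate entry
```

If the printed root is not the directory that contains provisioner/ in your checkout, the import will fail regardless of the bootstrap.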


Redis is down — app behaviour

Symptom: Warning messages in logs; rate limiting may not work correctly; flag lookup is slower.

Cause: REDIS_URL points to an unreachable Redis instance.

Behaviour: The app degrades gracefully:

  • Caching falls back to SimpleCache (in-memory and per-process, so cached entries and their expiry are not shared between workers).
  • Rate limiting falls back to in-memory counters.
  • Flag lookup falls back to the database (TeamFlag rows), then on-the-fly re-derivation.
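The lookup order in the last bullet can be pictured with the following minimal sketch. It is not the app's implementation: lookup_flag, the callables passed to it, and the key format are all hypothetical, and the built-in ConnectionError stands in for whatever exception the real Redis client raises.

```python
from typing import Callable, Optional, Tuple


def lookup_flag(
    key: str,
    cache_get: Callable[[str], Optional[str]],
    db_get: Callable[[str], Optional[str]],
    derive: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (source, flag), trying the cache, then the database, then derivation."""
    for source, getter in (("cache", cache_get), ("database", db_get)):
        try:
            value = getter(key)
        except ConnectionError:
            # An unreachable backend is treated like a cache miss.
            value = None
        if value is not None:
            return source, value
    return "derived", derive(key)


def broken_cache(key: str) -> Optional[str]:
    """Simulates Redis being down."""
    raise ConnectionError("redis unreachable")


# Hypothetical stored rows keyed by "<team>:<challenge>".
rows = {"team-1:chal-7": "flag{from-db}"}

print(lookup_flag("team-1:chal-7", broken_cache, rows.get, lambda k: "flag{derived}"))
print(lookup_flag("team-2:chal-7", broken_cache, rows.get, lambda k: "flag{derived}"))
```

Each step down the chain is slower than the one before it, which is why a Redis outage shows up as latency rather than as failed submissions.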

Fix for development: Start Redis locally:

docker run -d -p 6379:6379 redis:7-alpine

Fix for production: Ensure REDIS_URL is correct and Redis is reachable before starting the app. Do not run CTFHive in production without Redis — the in-memory rate-limit fallback allows workers × limit requests (see below).
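One way to check reachability before starting the app is a small standard-library probe like the sketch below. The function names are hypothetical, and the probe only handles plain redis:// TCP URLs (no TLS, no Unix sockets); it relies on Redis accepting an inline PING command.

```python
import socket
from urllib.parse import urlparse


def parse_redis_url(url: str):
    """Split redis://host:port/db into (host, port, db), applying Redis defaults."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 6379
    db = int(parsed.path.lstrip("/") or 0)
    return host, port, db


def redis_reachable(url: str, timeout: float = 1.0) -> bool:
    """True if a TCP connection succeeds and the server answers an inline PING."""
    host, port, _ = parse_redis_url(url)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
    # "+PONG" is the normal reply; "-NOAUTH ..." still proves the server is up.
    return reply.startswith((b"+PONG", b"-NOAUTH"))
```

Calling redis_reachable(os.environ["REDIS_URL"]) from a deployment script and aborting on False would catch a wrong URL before the app silently falls back to in-memory storage.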


Rate limiting is not working under multiple Gunicorn workers

Symptom: Clients can submit significantly more requests than the configured limits suggest. For example, LOGIN_RATE_LIMIT=10 per minute allows up to 40 attempts per minute with 4 workers (4 × 10).

Cause: RATELIMIT_STORAGE_URI=memory:// (the default) stores counters in-process memory. Each Gunicorn worker has its own counter, so the effective limit is workers × configured_limit.

Fix: Set a Redis backend for rate limiting:

RATELIMIT_STORAGE_URI=redis://localhost:6379/1

Pointing RATELIMIT_STORAGE_URI at a different Redis DB number than REDIS_URL keeps rate-limit keys apart from cache keys. This is a precaution against key collisions, not a technical requirement.
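The workers × configured_limit effect can be reproduced with a toy simulation. The WindowCounter class below is a hypothetical stand-in for one limiter store, not the library the app uses; the round-robin dispatch is also a simplification of how Gunicorn hands out requests.

```python
from itertools import cycle


class WindowCounter:
    """Counts requests in a single window; stands in for one limiter store."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.hits = 0

    def allow(self) -> bool:
        if self.hits >= self.limit:
            return False
        self.hits += 1
        return True


def allowed_requests(attempts: int, stores: list) -> int:
    """Send `attempts` requests round-robin across stores; count accepted ones."""
    picker = cycle(stores)
    return sum(next(picker).allow() for _ in range(attempts))


LIMIT, WORKERS, ATTEMPTS = 10, 4, 100

per_worker = [WindowCounter(LIMIT) for _ in range(WORKERS)]  # memory:// per process
shared = [WindowCounter(LIMIT)]                              # one shared backend

print(allowed_requests(ATTEMPTS, per_worker))  # 40: each worker admits 10
print(allowed_requests(ATTEMPTS, shared))      # 10: one counter for everyone
```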


Labs/containers are not spawning

Symptom: Clicking "Start Lab" returns an error or silently does nothing. Container instances do not appear in the admin panel.

Cause (most common): LAB_ENABLED is false (the default).

Fix:

LAB_ENABLED=true
LAB_DOCKER_HOST=unix:///var/run/docker.sock

Ensure the process running the CTFHive app has permission to access the Docker socket. If running in Docker Compose, mount the socket:

volumes:
  - /var/run/docker.sock:/var/run/docker.sock

Security

Mounting the Docker socket gives the app root-equivalent access to the host. In production, use the forge-dockerd-proxy mTLS proxy (see Architecture docs) instead of exposing the socket directly.

Other causes:

  • DISPATCH_INTERNAL_URL is unreachable → set DISPATCH_USE_REMOTE=false to use the local Docker fallback.
  • LAB_REQUIRE_PINNED_IMAGES=true and the challenge image is not digest-pinned (image@sha256:...) → pin the image or set LAB_REQUIRE_PINNED_IMAGES=false during development.
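To check ahead of time whether a challenge image reference is digest-pinned, a simple pattern test is usually enough. This is a sketch under the assumption that "pinned" means the reference ends in @sha256: followed by 64 hex characters; the helper name and the registry names are hypothetical, and the app's own validation may be stricter.

```python
import re

# A sha256 digest is 64 lowercase hexadecimal characters.
_DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """True if the image reference ends with an @sha256:<64 hex> digest."""
    return _DIGEST_SUFFIX.search(image_ref) is not None


examples = [
    "registry.example.com/ctf/web:1.2",                   # tag only
    "registry.example.com/ctf/web@sha256:" + "ab" * 32,   # digest-pinned
    "registry.example.com/ctf/web@sha256:abc123",         # digest too short
]
for ref in examples:
    print(is_digest_pinned(ref), ref)
```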

Email features are silently disabled

Symptom: Email verification links are never sent; password reset emails do not arrive. No error is raised in the app.

Cause: MAILTRAP_API_KEY is empty (the default). The email service checks for the API key before attempting delivery and skips sending silently.

Fix: Set the Mailtrap API key:

MAILTRAP_API_KEY=your-mailtrap-api-key

Alternatively, disable email-dependent features during development:

EMAIL_VERIFICATION_ENABLED=false


Changing ADMIN_KEY breaks all flags and the audit chain

Symptom: After rotating ADMIN_KEY, all flag submissions fail. The audit chain verify_chain() returns (False, <id>) for every row.

Cause: ADMIN_KEY is the HMAC key for both flag derivation (HMAC-SHA3-256) and the application audit chain (HMAC-SHA256). Changing it means:

  1. Every previously-derived flag is no longer reproducible.
  2. Every historical audit chain signature cannot be re-verified.
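Both consequences follow directly from how HMAC works: the same message under a different key yields an unrelated digest. The sketch below demonstrates this with the two hash functions named above. The message layout, the flag format, and the function names are assumptions made for illustration and do not reflect the actual flag engine or audit code.

```python
import hashlib
import hmac


def derive_flag(admin_key: bytes, team_secret: bytes, challenge_id: str) -> str:
    """Hypothetical derivation: HMAC-SHA3-256 over team secret and challenge id."""
    message = team_secret + b":" + challenge_id.encode()
    digest = hmac.new(admin_key, message, hashlib.sha3_256).hexdigest()
    return f"flag{{{digest[:32]}}}"


def sign_audit_row(admin_key: bytes, previous_sig: str, body: str) -> str:
    """Hypothetical chain link: HMAC-SHA256 over the previous signature and row body."""
    message = (previous_sig + body).encode()
    return hmac.new(admin_key, message, hashlib.sha256).hexdigest()


old_key, new_key = b"old-admin-key", b"new-admin-key"

# A flag stored before rotation can be reproduced with the old key only.
stored_flag = derive_flag(old_key, b"team-secret", "chal-7")
print(stored_flag == derive_flag(old_key, b"team-secret", "chal-7"))  # True
print(stored_flag == derive_flag(new_key, b"team-secret", "chal-7"))  # False

# An audit signature written before rotation no longer matches either.
row = '{"event": "login"}'
stored_sig = sign_audit_row(old_key, "", row)
print(hmac.compare_digest(stored_sig, sign_audit_row(new_key, "", row)))  # False
```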

Fix:

For flags — flush Redis and regenerate all TeamFlag rows before reopening the event:

# 1. Flush the Redis flag cache (adjust DB number as needed).
#    --scan iterates incrementally instead of blocking the server like KEYS.
redis-cli -n 0 --scan --pattern "team_flags:*" | xargs redis-cli -n 0 DEL

# 2. Delete all TeamFlag rows
flask shell -c "from ctfapp.extensions import db; from ctfapp.models.submission import TeamFlag; db.session.query(TeamFlag).delete(); db.session.commit()"

# 3. Re-generate flags for all active principals
flask shell -c "
from ctfapp.extensions import db
from ctfapp.models.principal import Principal
from ctfapp.services.flag_engine import generate_flags_for_principal
for p in Principal.query.filter_by(active=True).all():
    generate_flags_for_principal(p)
db.session.commit()
"

For the audit chain — the historical chain is permanently broken after key rotation. Archive the old log for the record and accept that verify_chain() will not validate entries signed with the old key. Future entries will form a new valid chain from the current _last_sig.


CTFHive control plane

Control-plane docker build produces an empty image

Symptom: docker build -f CTF_Saas_CTRL_Pane/Dockerfile . succeeds but the image contains no application code.

Cause: CTF_Saas_CTRL_Pane/Dockerfile and CTF_Saas_CTRL_Pane/Dockerfile.dev are empty files (0 bytes). They were created as stubs and have not yet been written.

Workaround: Run the control plane directly via Gunicorn:

cd CTF_Saas_CTRL_Pane
uv run gunicorn "ctrlapp:create_app()" \
    --bind 0.0.0.0:5001 \
    --workers 2 \
    --timeout 60

Or via flask run for development:

cd CTF_Saas_CTRL_Pane
FLASK_APP="ctrlapp:create_app()" uv run flask run --port 5001

Status

Dockerfiles for the control plane are not yet implemented. Track progress or contribute at the repository.


ModuleNotFoundError: No module named 'provisioner' (control plane context)

See the CTFHive app section above. The fix is identical.


Provisioner

Provisioner dry-run vs real provisioning

Symptom: Running a provisioning job appears to do nothing: no Linode server is created and no DNS record appears.

Cause: LINODE_API_TOKEN is empty (the default). The ProvisionService selects FakeExecutor when the token is absent, which logs steps but does not make real API calls.

Fix: Set a real Linode API token to enable live provisioning:

LINODE_API_TOKEN=your-linode-api-token
LINODE_REGION=us-east
LINODE_PLAN=g6-standard-2
LINODE_IMAGE=linode/debian12

Always test with FakeExecutor (empty token, or pass --dry-run when the CLI supports it) before provisioning real infrastructure.
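The token-based selection described above can be pictured roughly as follows. The classes here are simplified stand-ins written for this example (the real FakeExecutor and ProvisionService live in the provisioner package and may differ), and select_executor, LiveExecutor, and the whitespace handling are assumptions.

```python
import os
from typing import List, Mapping


class FakeExecutor:
    """Stand-in: records each step without calling any external API."""

    def __init__(self) -> None:
        self.steps: List[str] = []

    def run(self, step: str) -> None:
        self.steps.append(f"[dry-run] {step}")


class LiveExecutor:
    """Placeholder for an executor that would call the Linode API."""

    def __init__(self, token: str) -> None:
        self.token = token

    def run(self, step: str) -> None:
        raise NotImplementedError("real API calls are out of scope for this sketch")


def select_executor(env: Mapping[str, str] = os.environ):
    """Pick the dry-run executor unless a non-empty token is configured."""
    # Assumption: a whitespace-only token counts as empty.
    token = env.get("LINODE_API_TOKEN", "").strip()
    return LiveExecutor(token) if token else FakeExecutor()


executor = select_executor({"LINODE_API_TOKEN": ""})
executor.run("create_server")
executor.run("create_dns_record")
print(type(executor).__name__, executor.steps)
```

Logging which executor was selected at startup makes an accidental dry run obvious instead of looking like a silent failure.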


Audit chain verification fails for provisioner log

Symptom: AuditLog.verify_chain() returns (False, N) for an existing log file.

Causes and fixes:

| Cause | Fix |
| --- | --- |
| PROVISION_AUDIT_SECRET changed | The chain is permanently broken for entries signed with the old secret. Archive the old file and start a new chain. |
| Log file was manually edited | Any modification to a JSONL line changes the signed body, breaking the chain from that line onward. Do not edit log files. |
| Log file truncated or corrupted | Partial writes during a crash can corrupt the last line. verify_chain() returns the 0-based index of the first bad line. Lines before that index are still valid. |

To identify the first bad line:

from provisioner.audit import AuditLog
import os

log = AuditLog(
    path="runs/tenant-abc/audit.jsonl",
    secret=os.environ["PROVISION_AUDIT_SECRET"],
)
ok, bad_idx = log.verify_chain()
if not ok:
    print(f"Chain broken at line index {bad_idx}")
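To see why a single edited line invalidates everything after it, consider this self-contained toy chain. It is not the provisioner's file format: the record layout, append_entry, and this standalone verify_chain (including its (True, None) success value) are assumptions chosen to mirror the behaviour described above.

```python
import hashlib
import hmac
import json
from typing import List, Optional, Tuple


def _sign(secret: bytes, prev_sig: str, body: str) -> str:
    """HMAC-SHA256 over the previous signature concatenated with this body."""
    return hmac.new(secret, (prev_sig + body).encode(), hashlib.sha256).hexdigest()


def append_entry(lines: List[str], secret: bytes, event: dict) -> None:
    """Append one JSONL record whose signature covers the previous signature."""
    prev_sig = json.loads(lines[-1])["sig"] if lines else ""
    body = json.dumps(event, sort_keys=True)
    lines.append(json.dumps({"body": body, "sig": _sign(secret, prev_sig, body)}))


def verify_chain(lines: List[str], secret: bytes) -> Tuple[bool, Optional[int]]:
    """Return (True, None), or (False, i) for the first line that fails."""
    prev_sig = ""
    for index, line in enumerate(lines):
        try:
            record = json.loads(line)
            body, sig = str(record["body"]), str(record["sig"])
        except (ValueError, KeyError, TypeError):
            return False, index  # truncated or corrupted line
        expected = _sign(secret, prev_sig, body)
        if not hmac.compare_digest(sig.encode(), expected.encode()):
            return False, index
        prev_sig = sig
    return True, None


secret = b"demo-secret"
log: List[str] = []
for step in ("create_server", "create_dns", "deploy"):
    append_entry(log, secret, {"step": step})

print(verify_chain(log, secret))             # (True, None)
print(verify_chain(log, b"rotated-secret"))  # (False, 0): every line fails

log[1] = log[1].replace("create_dns", "delete_dns")  # manual edit
print(verify_chain(log, secret))             # (False, 1): first bad line
```

Because each signature feeds into the next, re-signing only the edited line would not help; every later line would also need a new signature, which requires the secret.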


General

App boots but all pages return 500

Check: Run with APP_ENV=development and DEBUG=true to see the traceback. Common causes:

  1. Missing database tables — Run migrations: flask db upgrade or flask shell -c "from ctfapp.extensions import db; db.create_all()".
  2. Unreachable database — Check DATABASE_URL and that PostgreSQL is running.
  3. Bad ENCRYPTION_KEY — If the key changed after data was written, decryption will raise exceptions. See the key-rotation section under environment variables.

Flag submissions always return "wrong"

Checklist:

  1. ADMIN_KEY has not changed since the challenge was first seeded.
  2. The principal's team_secret has not been re-generated (this only happens when the Principal row is re-created).
  3. Challenge.flag_prefix matches what was used to derive the stored flag.
  4. Redis is available; if not, the fallback path should still work but check for exceptions in the logs.
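  5. The submitted string has no stray invisible characters. The engine calls .strip(), which on a Python str removes ordinary whitespace and non-breaking spaces at the ends, but not zero-width characters (for example U+200B) or anything embedded mid-string, which copy-paste and browser autofill can occasionally introduce.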
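For the last item, it can help to list exactly which invisible characters a submitted string carries. The helper below is a hypothetical debugging aid, not part of the flag engine. Note that Python's str.strip() does remove U+00A0 at either end of a str, but it leaves zero-width characters in place.

```python
import unicodedata


def invisible_characters(text: str):
    """List (position, code point, name) for characters that render as nothing."""
    found = []
    for position, char in enumerate(text):
        # Zs = space separators (includes U+00A0 NO-BREAK SPACE),
        # Cf = format characters (includes U+200B and U+FEFF),
        # Cc = control characters such as tab or newline.
        if char != " " and unicodedata.category(char) in {"Zs", "Cf", "Cc"}:
            name = unicodedata.name(char, "UNNAMED")
            found.append((position, f"U+{ord(char):04X}", name))
    return found


submitted = "flag{abc}\u200b"
print(invisible_characters(submitted))
print(submitted.strip() == "flag{abc}")  # False: strip() keeps U+200B
print("\u00a0flag{abc}\u00a0".strip() == "flag{abc}")  # True: U+00A0 is stripped
```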

CSRF errors on form submission

Cause: The CSRF token is older than WTF_CSRF_TIME_LIMIT (default 3600 seconds), or the page was served from cache and carries a stale token.

Fix: Hard-refresh the page. For production, ensure session cookies are correctly scoped and SECRET_KEY has not changed mid-session.