Troubleshooting
Engine
Etcd connection issues
Managed etcd: If etcd fails to start, check that port 2379 is not already in use and that the data directory (/var/lib/banyan/etcd/ by default) is writable.
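The two managed-etcd checks can be scripted. The following is a minimal sketch, assuming `ss` from iproute2 is installed; the helper names `port_listening` and `dir_writable` are illustrative and not part of Banyan.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Banyan) for the two managed-etcd checks.

# Reads `ss -ltn` output on stdin; succeeds if a socket listens on port $1.
port_listening() {
    awk -v port="$1" '
        NR > 1 {
            n = split($4, parts, ":")   # column 4 is the local address:port
            if (parts[n] == port) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Succeeds if directory $1 exists and is writable by the current user.
dir_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Usage (default port and data directory; run as the user that runs the engine):
#   ss -ltn | port_listening 2379 && echo "port 2379 is already in use"
#   dir_writable /var/lib/banyan/etcd/ || echo "data directory is not writable"
```

If the port is reported as in use, `sudo ss -ltnp` also shows which process holds it.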
“failed to connect to etcd” — For external etcd, make sure your etcd server is running and reachable at the configured address:
    sudo apt-get install etcd-server   # Debian/Ubuntu
    sudo systemctl start etcd

If you configured TLS or mTLS, verify that the certificate paths in /etc/banyan/banyan.yaml are correct and the files are readable.
To reconfigure the etcd connection, re-run banyan-engine init. See Etcd for setup details.
Engine starts but agents cannot connect
Agents connect to the Engine’s gRPC port (default: 50051). Check:
- The Engine is running and the gRPC server started successfully (look for “Engine gRPC server listening on :50051” in the output).
- The agent’s config has the correct engine host and port. Check /etc/banyan/banyan.yaml on the agent:

      agent:
        engine_host: <engine-ip>
        engine_port: "50051"
        wg_public_key: "<base64-key>"

- Port 50051 is open in your firewall between agents and the engine.
- The agent’s public key is whitelisted on the engine. Check that a .pub file containing the agent’s public key exists in /etc/banyan/whitelisted-keys/ on the engine machine.
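To test the host and port that the agent actually uses, the values can be read from the agent config and probed. This is a rough sketch: `yaml_value` is a hypothetical line-based helper (not a YAML parser and not part of Banyan), and using `nc` as the TCP probe is an assumption about what is installed on the agent.

```shell
#!/bin/sh
# Hypothetical helper (not part of Banyan): print the value of the first
# simple "key: value" line for key $1 in file $2, with quotes stripped.
# Line-based sketch only; it does not understand YAML nesting.
yaml_value() {
    sed -n "s/^[[:space:]]*$1:[[:space:]]*//p" "$2" | head -n 1 | tr -d "\"'"
}

# Usage on an agent (nc is an assumption; any TCP probe works):
#   host=$(yaml_value engine_host /etc/banyan/banyan.yaml)
#   port=$(yaml_value engine_port /etc/banyan/banyan.yaml)
#   nc -z -w 3 "$host" "$port" && echo "engine reachable" || echo "cannot reach $host:$port"
```

A failed probe points to the firewall or to a wrong host or port; a successful probe with a failing agent points to key whitelisting instead.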
“Unauthenticated” errors
If agents or CLI clients receive “Unauthenticated” errors:
- Verify the component’s public key is whitelisted on the engine. Check /etc/banyan/whitelisted-keys/ for a .pub file containing the key.
- If the engine was re-initialized, the whitelisted keys directory is recreated empty. Re-copy all agent and CLI public keys.
- If no config exists yet, run sudo banyan-agent init (or sudo banyan-cli init) to generate a keypair, then whitelist the public key on the engine.
- To find a component’s public key:

      grep wg_public_key /etc/banyan/banyan.yaml
See Authentication for details on key management.
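The whitelist check can be automated on the engine machine. The sketch below assumes only what this section states, namely that whitelisted keys live in .pub files inside /etc/banyan/whitelisted-keys/; the function name `is_whitelisted` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Banyan): succeed if any .pub file in
# directory $2 contains the public key $1 as a fixed string.
is_whitelisted() {
    grep -qF -- "$1" "$2"/*.pub 2>/dev/null
}

# Usage: read the key on the component, then run the check on the engine.
#   grep wg_public_key /etc/banyan/banyan.yaml
#   is_whitelisted "<base64-key>" /etc/banyan/whitelisted-keys && echo "whitelisted"
```

The fixed-string match (`-F`) matters because base64 keys can contain characters such as `+` that have a special meaning in regular expressions.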
WireGuard overlay issues
If containers on different agents cannot communicate:
- Check that wireguard-tools is installed on all agents: wg --version
- Verify WireGuard kernel support: ip link add wg-test type wireguard && ip link delete wg-test. If this fails, the kernel module is missing (requires Linux 5.6+ or wireguard-dkms).
- Ensure port 51820/UDP is open between agents.
- If WireGuard is unavailable, Banyan falls back to VXLAN automatically. You can also force VXLAN by setting overlay_type: "vxlan" in the engine config.
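When the kernel support test fails, the kernel version shows whether wireguard-dkms is needed. A small sketch of the version comparison follows; the function name `kernel_at_least_5_6` is hypothetical, and it expects a release string in the form printed by `uname -r`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Banyan): succeed if the kernel release
# string $1 (as printed by `uname -r`) is version 5.6 or newer.
kernel_at_least_5_6() {
    major=${1%%.*}            # "5.15.0-91-generic" -> "5"
    rest=${1#*.}              # -> "15.0-91-generic"
    minor=${rest%%[!0-9]*}    # -> "15"
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 6 ]; }
}

# Usage:
#   if kernel_at_least_5_6 "$(uname -r)"; then
#       echo "kernel is 5.6 or newer"
#   else
#       echo "older kernel: wireguard-dkms is required"
#   fi
```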
Control tunnel issues
If agents or CLI cannot connect through the WireGuard control tunnel:
- Check that the wg-control interface exists: ip link show wg-control
- Verify the tunnel peer: wg show wg-control
- Ensure port 51821/UDP is open from agents/CLI to the engine.
- Test connectivity: ping 10.200.0.1 from the agent/CLI.
- If the control tunnel fails, Banyan falls back to direct TCP with public key metadata authentication. Check the agent/engine logs for “Control tunnel setup failed” messages.
- The CLI creates its tunnel during banyan-cli init (requires root). The tunnel is a kernel interface and doesn’t survive reboots. After a restart, run sudo banyan-cli login to re-establish it without re-running init. Subsequent CLI commands don’t need root.
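The control tunnel checks can be run in one pass with a small wrapper that prints one status line per check. This is only a sketch: `run_check` is a hypothetical helper, and the `ping` options shown assume the Linux iputils version.

```shell
#!/bin/sh
# Hypothetical helper (not part of Banyan): run a command quietly and print
# one status line for it. $1 is a label; the rest is the command to run.
run_check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok    $label"
    else
        echo "FAIL  $label"
    fi
}

# Usage on an agent or CLI machine (wg show needs root):
#   run_check "wg-control interface exists" ip link show wg-control
#   run_check "wg can read wg-control"      sudo wg show wg-control
#   run_check "engine answers on tunnel"    ping -c 3 -W 2 10.200.0.1
```

The wrapper hides command output, so rerun any failing command directly to see its error message.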
Agent
“nerdctl not found”
Install nerdctl on the agent node:

    curl -L https://github.com/containerd/nerdctl/releases/download/v2.0.3/nerdctl-2.0.3-linux-amd64.tar.gz \
      | sudo tar -xz -C /usr/local/bin nerdctl

“containerd not running”
Start containerd:

    sudo systemctl start containerd

If containerd is not installed:

    sudo apt-get install containerd

Agent shows “ready” but tasks fail
Check if the Agent can pull images. SSH into the agent and test:

    sudo nerdctl pull nginx:alpine

If this fails, the agent may not have internet access or the image registry may be unreachable.
Deployment
Deployment stays in “deploying” status
The Engine is waiting for Agents to complete their tasks. Check:
- Are agents connected? Run banyan-cli agent.
- Check agent logs for errors in the terminal where agent start is running.
- Verify agents can pull the images specified in your manifest.
Deployment fails immediately
Check the error message in banyan-cli deployment. Common causes:
- Image not found: The image name in banyan.yaml is wrong or the registry is unreachable from agents.
- Port conflict: Another container is already using the same host port.
“deployment timed out”
The up command waits up to 2 minutes by default. If your images are large, they may take longer to pull. Use --no-wait and check status manually:
    banyan-cli up -f banyan.yaml --no-wait
    # Check later:
    banyan-cli deployment

Redeployment doesn’t replace old containers
When you run banyan-cli up again, Banyan should automatically replace old containers using a blue-green strategy. If old containers aren’t being replaced:
- Check that the application name in banyan.yaml matches the running deployment. The name must be identical for Banyan to recognize it as a redeployment.
- If the old deployment is in stopping or deploying state, the Engine waits for it to finish before scheduling the new one. Check banyan-cli deployment and wait a few seconds.
- If a previous redeployment failed, the old containers stay running. Fix the issue and run banyan-cli up again; it will retry the replacement.
Old containers still running after redeployment
During blue-green redeployment, old containers run alongside new ones until the new deployment is confirmed healthy. This overlap is expected and usually lasts a few seconds. If old containers persist:
- The new deployment may have failed. Check banyan-cli deployment for the deployment status and error message.
- If the new deployment failed, old containers are intentionally kept running to avoid downtime. Fix the issue and redeploy.
See Redeployment for details on how blue-green and per-service deploys work.
Per-service deploy fails with dependency error
When deploying specific services (e.g., banyan-cli up -f banyan.yaml web), Banyan validates that all depends_on dependencies are satisfied. If you see an error like:
    Error: service "web" depends on "api" which is not running and not being deployed

Either deploy the dependency too (banyan-cli up -f banyan.yaml web api) or make sure the dependency is already running in the existing deployment.
Containers are running but the application doesn’t work
Banyan deploys containers but does not manage application-level networking between services across nodes. Containers on the same agent can communicate via localhost. Containers on different agents need external networking or a load balancer.
General
Permission errors
The engine and agent require sudo for all commands — they manage network interfaces, iptables rules, and containers:
    sudo banyan-engine init
    sudo systemctl enable --now banyan-engine

    sudo banyan-agent init
    sudo systemctl enable --now banyan-agent

The CLI needs sudo for init and login (both create WireGuard kernel interfaces). All other CLI commands (up, down, engine, agent, deployment, container, events, logs, dashboard) run as your normal user. After a machine restart, run sudo banyan-cli login to re-establish the tunnel.
Checking logs
When running as a systemd service, use journalctl:
    sudo journalctl -u banyan-engine -f   # engine logs
    sudo journalctl -u banyan-agent -f    # agent logs

When running in the foreground (sudo banyan-engine start), logs print to stdout.
Etcd logs:
- Managed etcd: Logs are printed to stdout alongside the engine output.
- External etcd: Check the logs of your externally managed etcd service.
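Long logs can be narrowed to the error messages quoted in this guide. The filter below is a sketch that assumes those messages appear verbatim in the log lines, which may not hold for every message; the function name `banyan_known_errors` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Banyan): keep only log lines that contain
# one of the error messages quoted in this guide. Reads log text on stdin.
banyan_known_errors() {
    grep -F \
        -e "failed to connect to etcd" \
        -e "Unauthenticated" \
        -e "Control tunnel setup failed"
}

# Usage:
#   sudo journalctl -u banyan-engine --since "1 hour ago" --no-pager | banyan_known_errors
#   sudo journalctl -u banyan-agent  --since "1 hour ago" --no-pager | banyan_known_errors
```

An empty result only means none of these three messages were logged in that window, not that the service is healthy.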
Stopping containers manually
If you need to remove containers directly on an agent:
    sudo nerdctl rm -f <container-name>

To list all Banyan-managed containers:

    sudo nerdctl ps | grep <app-name>
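To remove several containers of one application, the listing can be turned into removal commands. This sketch assumes that container names contain the application name, as the grep above implies, and that `nerdctl ps` accepts a Go-template `--format` as Docker does; `removal_commands` is a hypothetical helper that prints the commands as a dry run instead of executing them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Banyan): read container names on stdin and
# print one removal command per name that contains the fixed string $1.
# Printing instead of executing gives a dry run that can be reviewed first.
removal_commands() {
    grep -F -- "$1" | while IFS= read -r name; do
        printf 'sudo nerdctl rm -f %s\n' "$name"
    done
}

# Usage (-a includes stopped containers):
#   sudo nerdctl ps -a --format '{{.Names}}' | removal_commands <app-name>
# After reviewing the printed commands, pipe them to sh to execute them:
#   sudo nerdctl ps -a --format '{{.Names}}' | removal_commands <app-name> | sh
```

Review the dry-run output carefully, because a short application name can also match unrelated containers.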