Troubleshooting
Engine
Store backend issues
The Engine uses BadgerDB by default — an embedded store that requires no external process. If you’ve configured Redis or etcd as your backend instead:
“failed to connect to redis”: Ensure your Redis server is running and reachable at the configured address:
    sudo apt-get install redis-server # Debian/Ubuntu
    sudo systemctl start redis-server
“failed to connect to etcd”: Ensure your etcd server is running and reachable:
    sudo apt-get install etcd-server # Debian/Ubuntu
    sudo systemctl start etcd
When using Redis or etcd, you are responsible for running and managing these services. You choose the backend during banyan-engine init. See Store Backend for details. To switch back to BadgerDB (no external dependencies), re-run banyan-engine init and choose badger.
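To confirm that an external backend is actually reachable from the Engine host, a quick check along these lines may help. This is a minimal sketch, not part of Banyan: check_redis and check_etcd are hypothetical helper names, the ports are the upstream defaults (6379 for Redis, 2379 for etcd), and the sketch assumes redis-cli and a v3 etcdctl are installed on the host.

```shell
# Hypothetical helpers (not part of Banyan) for checking an external store backend.

# check_redis HOST [PORT]: sends PING with redis-cli and expects PONG.
check_redis() {
  local host="$1" port="${2:-6379}"   # 6379 is the Redis default port
  if [ "$(redis-cli -h "$host" -p "$port" ping 2>/dev/null)" = "PONG" ]; then
    echo "redis ok: $host:$port"
  else
    echo "redis not reachable: $host:$port"
    return 1
  fi
}

# check_etcd HOST [PORT]: asks etcdctl (v3 API) whether the endpoint is healthy.
check_etcd() {
  local host="$1" port="${2:-2379}"   # 2379 is the etcd default client port
  if etcdctl --endpoints="$host:$port" endpoint health >/dev/null 2>&1; then
    echo "etcd ok: $host:$port"
  else
    echo "etcd not reachable: $host:$port"
    return 1
  fi
}

# Example (use the address you configured during banyan-engine init):
#   check_redis 127.0.0.1
#   check_etcd 127.0.0.1
```

A Redis server that requires a password answers PING with an error, so a "not reachable" result here can also mean that authentication is required.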
Engine starts but agents cannot connect
Agents connect to the Engine’s gRPC port (default: 50051). Check:
- The Engine is running and the gRPC server started successfully (look for “Engine gRPC server listening on :50051” in the output).
- The agent’s config has the correct engine host and port. Check /etc/banyan/banyan.yaml on the worker:
      agent:
        engine_host: <engine-ip>
        engine_port: "50051"
- Port 50051 is open in your firewall between workers and the engine.
- The agent and engine have the same cluster password.
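The network-level checks above can be tested from the worker with a plain TCP connection attempt. The following is a minimal sketch, assuming bash (it relies on bash's /dev/tcp redirection) and the coreutils timeout command; check_engine_port is a hypothetical helper name, not a Banyan command.

```shell
# check_engine_port HOST [PORT] [TIMEOUT_SECONDS]
# Hypothetical helper (not part of Banyan): tries to open a TCP connection
# to the Engine's gRPC port and reports whether the connection succeeded.
check_engine_port() {
  local host="$1" port="${2:-50051}" wait_s="${3:-3}"
  # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connect to that address.
  if timeout "$wait_s" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}

# Example (run on the worker; replace <engine-ip> with the Engine's address):
#   check_engine_port <engine-ip> 50051
```

A "reachable" result only shows that the port accepts TCP connections. It does not verify the cluster password or that the listener is the Engine's gRPC server.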
“VPC initialization: failed to write Flannel config”
This warning about etcdctl not being found is safe to ignore. It does not affect deployment functionality.
Agent
“nerdctl not found”
Install nerdctl on the worker node:
    curl -L https://github.com/containerd/nerdctl/releases/download/v2.0.3/nerdctl-2.0.3-linux-amd64.tar.gz \
      | sudo tar -xz -C /usr/local/bin nerdctl
“containerd not running”
Start containerd:
    sudo systemctl start containerd
If containerd is not installed:
    sudo apt-get install containerd
Agent shows “ready” but tasks fail
Check if the Agent can pull images. SSH into the worker and test:
    sudo nerdctl pull nginx:alpine
If this fails, the worker may not have internet access or the image registry may be unreachable.
Deployment
Deployment stays in “deploying” status
The Engine is waiting for Agents to complete their tasks. Check:
- Are agents connected? Run banyan-cli status.
- Check agent logs for errors in the terminal where agent start is running.
- Verify agents can pull the images specified in your manifest.
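While you work through these checks, it can be convenient to poll the status instead of re-running the command by hand. The sketch below is one way to do that under a stated assumption: that the output of banyan-cli status contains the word "deploying" while a deployment is in progress. That output format is not confirmed here, and wait_while_deploying and STATUS_CMD are hypothetical names.

```shell
# Command used to read the status; override STATUS_CMD to adapt or test the sketch.
STATUS_CMD="${STATUS_CMD:-banyan-cli status}"

# wait_while_deploying [INTERVAL_SECONDS] [MAX_CHECKS]
# Assumption: the status output contains "deploying" while work is in progress.
wait_while_deploying() {
  local interval="${1:-10}" tries="${2:-60}" out i=0
  while [ "$i" -lt "$tries" ]; do
    # Stop early if the status command itself fails (for example, not installed).
    if ! out=$($STATUS_CMD 2>&1); then
      printf '%s\n' "$out" >&2
      return 2
    fi
    printf '%s\n' "$out"
    case "$out" in
      *deploying*) sleep "$interval" ;;   # still in progress, check again later
      *) return 0 ;;                      # no longer reported as deploying
    esac
    i=$((i + 1))
  done
  echo "still deploying after $tries checks" >&2
  return 1
}

# Example: check every 10 seconds, give up after 60 checks.
#   wait_while_deploying 10 60
```

A return value of 0 only means the word "deploying" no longer appears. Read the printed status to see whether the deployment succeeded or failed.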
Deployment fails immediately
Check the error message in banyan-cli status. Common causes:
- Image not found: The image name in banyan.yaml is wrong or the registry is unreachable from workers.
- Port conflict: Another container is already using the same host port.
“deployment timed out”
The deploy command waits up to 2 minutes by default. If your images are large, they may take longer to pull. Use --no-wait and check status manually:
    banyan-cli deploy -f banyan.yaml --no-wait
    # Check later:
    banyan-cli status
Containers are running but the application doesn’t work
Banyan deploys containers but does not manage application-level networking between services across nodes. Containers on the same worker can communicate via localhost. Containers on different workers need external networking or a load balancer.
General
Permission errors
Engine and Agent commands need root access because they manage system services (data store, containerd):
    sudo banyan-engine start
    sudo banyan-agent start --node-name <name>
The banyan-cli deploy and banyan-cli status commands do not require root (but banyan-cli init does, to write /etc/banyan/banyan.yaml).
Checking logs
Engine and Agent run in the foreground and print logs to stdout. Check the terminal where they are running.
Store backend logs:
- BadgerDB: Logs are suppressed by default (embedded, no separate log file).
- Redis/etcd: Check the logs of your externally managed Redis or etcd service.
Stopping containers manually
If you need to remove containers directly on a worker:
    sudo nerdctl rm -f <container-name>
To list all Banyan-managed containers:
sudo nerdctl ps | grep <app-name>
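Note that nerdctl ps lists only running containers. To include stopped ones, or to remove every container belonging to one app in a single step, something like the following sketch may be useful. It assumes, as the grep above does, that container names contain the app name; list_app_containers, remove_app_containers and the SUDO variable are hypothetical and not part of Banyan.

```shell
# Prefix for privileged commands; set SUDO="" if you are already root.
SUDO="${SUDO-sudo}"

# list_app_containers APP_NAME
# Prints the names of all containers (running or stopped) whose name contains APP_NAME.
list_app_containers() {
  # -a includes stopped containers; --format prints one container name per line.
  $SUDO nerdctl ps -a --format '{{.Names}}' | grep -F -- "$1"
}

# remove_app_containers APP_NAME
# Force-removes every container returned by list_app_containers.
remove_app_containers() {
  list_app_containers "$1" | while IFS= read -r name; do
    $SUDO nerdctl rm -f "$name"
  done
}

# Example: review the list first, then remove.
#   list_app_containers <app-name>
#   remove_app_containers <app-name>
```

Because the match is a plain substring, review the output of list_app_containers before removing anything, so that containers from a similarly named app are not deleted.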