# Auto-Scaling
Scale services up when demand increases and back down when it drops — defined in your manifest, handled by the engine.
## Manual scaling
Adjust replicas on a running deployment without redeploying:
```shell
banyan-cli scale my-app api=5
api: 2 → 5 replicas (scaling up)
```

New containers are distributed across available agents using the same resource-aware scheduling as initial deployment. Scale-down removes containers gracefully — proxy removal, DNS cleanup, drain period, then stop.
Scale multiple services at once:
```shell
banyan-cli scale my-app api=5 web=3
```

No blue-green. No new deployment ID. Containers are added or removed individually from the running deployment.
## Automatic scaling
Add `deploy.autoscale` to your manifest and the engine adjusts replicas based on CPU usage:
```yaml
name: my-app
services:
  api:
    image: myapp/api:latest
    deploy:
      replicas: 2              # initial count
      autoscale:
        min: 2                 # never go below 2
        max: 10                # never go above 10
        target_cpu: 70         # target average CPU %
        cooldown: 30s          # min time between scale events
        stop_grace_period: 5s  # drain time before stopping
    ports:
      - "8080:8080"
```

Deploy this manifest with `banyan-cli up -f banyan.yaml`. Banyan starts 2 replicas and manages the count from there.
## What happens under the hood
1. **Metrics collection** — Each agent runs `nerdctl stats` every 10 seconds and reports per-container CPU and memory to the engine.
2. **Autoscale evaluation** — Every 30 seconds, the engine checks each service with `deploy.autoscale` configured:
   - If average CPU across all replicas > `target_cpu` → add 1 replica (up to `max`)
   - If average CPU < `target_cpu / 2` → remove 1 replica (down to `min`)
3. **Cooldown** — After any scale event, no further scaling happens for the `cooldown` period. This prevents rapid flapping when CPU fluctuates near the threshold.
4. **Graceful drain** — When removing a container, the engine follows a sequence: remove from the load balancer, remove DNS records, wait `stop_grace_period`, then stop the container.
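The evaluation and cooldown rules above can be condensed into a short sketch. This is illustrative Python, not the engine's actual implementation; the function and field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class AutoscaleConfig:
    min: int            # floor on replica count
    max: int            # ceiling on replica count
    target_cpu: float   # target average CPU %
    cooldown_s: float   # seconds that must pass between scale events

def evaluate(cfg: AutoscaleConfig, replicas: int, avg_cpu: float,
             seconds_since_last_event: float) -> int:
    """Return the desired replica count for one 30-second evaluation tick."""
    # Cooldown: skip scaling until `cooldown` has elapsed since the last event.
    if seconds_since_last_event < cfg.cooldown_s:
        return replicas
    # Scale up one replica at a time when average CPU exceeds the target.
    if avg_cpu > cfg.target_cpu:
        return min(replicas + 1, cfg.max)
    # Hysteresis: scale down only when CPU falls below half the target.
    if avg_cpu < cfg.target_cpu / 2:
        return max(replicas - 1, cfg.min)
    return replicas
```

For the manifest above (`target_cpu: 70`, `cooldown: 30s`), sustained average CPU of 85% would move the service from 2 to 3 replicas on the next tick, while 50% would leave the count unchanged because it sits inside the hysteresis band.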
## Scale timing
From a load spike to a new container being ready:
| Step | Typical time |
|---|---|
| Agent detects CPU increase | ~10s (metrics reporting interval) |
| Engine evaluates autoscale | ~30s (evaluation interval) |
| New container starts | ~5-15s (image pull + startup) |
| Total | ~45-55s |
Scale-down is more conservative. The hysteresis threshold (`target_cpu / 2`) means CPU must drop significantly before replicas are removed — with `target_cpu: 70`, replicas are only removed once average CPU falls below 35%. Combined with the cooldown, this prevents premature scale-down after a brief traffic spike.
## Combining autoscale with other features
### Placement constraints
Autoscale respects `deploy.placement.node`. New replicas are only scheduled on agents matching the glob pattern:
```yaml
services:
  api:
    deploy:
      placement:
        node: compute-*   # only on compute nodes
      autoscale:
        min: 1
        max: 5
        target_cpu: 70
```

### Resource limits
Pair autoscale with `deploy.resources` so the scheduler places new replicas on agents with enough capacity:
```yaml
services:
  api:
    deploy:
      replicas: 2
      resources:
        limits:
          memory: 512m
          cpus: "1"
      autoscale:
        min: 2
        max: 8
        target_cpu: 70
```

### Stateful services
Auto-scaling works best with stateless services. For databases and other stateful containers, use manual scaling instead:
```shell
# Scale Redis to 3 replicas (manual, intentional)
banyan-cli scale my-app redis=3
```

## Workload rebalancing
Separate from auto-scaling, the engine monitors agent-level resource usage and rebalances containers when agents become unevenly loaded.
When it triggers: An agent’s CPU or memory exceeds 95%.
What it does: Migrates one stateless container from the overloaded agent to the least loaded agent.
What it won’t do:
- Move containers with volumes (stateful)
- Move containers with placement constraints (pinned)
- Move a container to an agent that would exceed 70% load after migration
- Move a container that was migrated less than 10 minutes ago
- Move more than one container per agent per cycle
Rebalancing is automatic and requires no configuration. It handles situations like an agent restarting empty while others carry its former workload.
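The exclusion rules above amount to a simple eligibility check per candidate container. A minimal sketch, assuming illustrative names (this is not the engine's API):

```python
from dataclasses import dataclass

@dataclass
class Container:
    stateless: bool          # False if the container has volumes attached
    pinned: bool             # True if a placement constraint applies
    last_migrated_at: float  # unix timestamp of last migration; 0.0 if never
    load: float              # fraction of agent load this container accounts for

def can_migrate(c: Container, target_load: float, now: float) -> bool:
    """Apply the rebalancer's exclusion rules to one candidate container."""
    if not c.stateless:                 # has volumes → stateful, never moved
        return False
    if c.pinned:                        # placement-constrained containers stay put
        return False
    if now - c.last_migrated_at < 600:  # migrated within the last 10 minutes
        return False
    if target_load + c.load > 0.70:     # target agent would exceed 70% load
        return False
    return True
```

The "one container per agent per cycle" rule would then be enforced by the caller, which stops after the first eligible container is migrated off the overloaded agent.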
## Monitoring auto-scale events
Watch the engine logs for scaling activity:
```
engine: Autoscale: adjusting replicas deployment=my-app service=api from=2 to=3 avg_cpu=85.0% target_cpu=70
```

Check current replica counts:
```shell
banyan-cli deployment my-app
```

Auto-scale events also appear in the terminal dashboard (`banyan-cli dashboard`) and the cluster events (`banyan-cli events`).