Roadmap
Milestone 1 — Core Orchestration (MVP)
Status: Done
Deploy containers across multiple servers using a familiar YAML manifest.
- Parse banyan.yaml manifest (Docker Compose-compatible syntax)
- Engine control plane with etcd-based state
- Agent workers with containerd/nerdctl container management
- Round-robin scheduling across workers
- CLI for engine, agent, and deploy workflows
- VPC networking layer (IPAM, DNS, CNI)
- E2E test infrastructure
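The round-robin scheduling listed above can be sketched in a few lines. This is an illustrative sketch only, not Banyan's actual scheduler; the function name `round_robin_assign` and the task and worker names are hypothetical.

```python
from itertools import cycle


def round_robin_assign(tasks, workers):
    """Assign each task to the next worker in turn, wrapping around."""
    if not workers:
        raise ValueError("no workers available")
    worker_cycle = cycle(workers)
    # Tasks are visited in order, so workers are handed out in order too.
    return {task: next(worker_cycle) for task in tasks}


# Hypothetical task and worker names, for illustration only.
assignments = round_robin_assign(
    ["web-1", "web-2", "db-1", "cache-1"],
    ["worker-a", "worker-b", "worker-c"],
)
```

With three workers and four tasks, the fourth task wraps back to the first worker. Round-robin ignores node load, which is the limitation Milestone 5 addresses.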
Milestone 2 — Service Observability
Status: Done
Per-container health status, logs, and visibility from the CLI.
- Agent monitors container health after deployment (running, exited, restarting)
- Agent reports per-container status back to Engine via gRPC
- banyan-cli status shows per-service and per-container status (not just aggregate)
- CLI command to stream container logs from agents (via engine gRPC proxy)
- Detect and surface failed containers (e.g. exited immediately after start)
- banyan-cli down command to stop and remove all containers for a deployment
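As a rough illustration of per-service and per-container status, the sketch below groups agent reports by service and flags containers in the exited state. The report shape (`service`, `container`, `state` keys) and the function name `summarize` are assumptions made for this example; only the state names come from the list above.

```python
from collections import defaultdict


def summarize(reports):
    """Group per-container states by service and list exited containers."""
    by_service = defaultdict(list)
    for report in reports:
        by_service[report["service"]].append(report)

    summary = {}
    for service, containers in by_service.items():
        summary[service] = {
            "running": sum(1 for c in containers if c["state"] == "running"),
            "total": len(containers),
            # Containers that exited are surfaced by name, not just counted.
            "failed": [c["container"] for c in containers if c["state"] == "exited"],
        }
    return summary


# Hypothetical reports, as an agent might send them to the engine.
status = summarize([
    {"service": "web", "container": "web-1", "state": "running"},
    {"service": "web", "container": "web-2", "state": "exited"},
    {"service": "db", "container": "db-1", "state": "restarting"},
])
```

Keeping the failed container names in the summary is what lets a status command show more than an aggregate count.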
Milestone 3 — Basic Security
Status: Done
Secure gRPC communication between CLI, Engine, and Agents.
- All inter-component communication uses gRPC with password authentication
- Agent → Engine: password in gRPC metadata on every call
- CLI → Engine: password in gRPC metadata on every call
- Engine → Agent: session token authentication for log streaming
- Config file at /etc/banyan/banyan.yaml with sections: security, engine, agent, cli
- init commands for engine, agent, and CLI prompt for credentials and connection info
- Three separate binaries: banyan-engine, banyan-agent, banyan-cli
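Checking a password carried in call metadata might look like the sketch below. It assumes metadata arrives as a sequence of key/value pairs; the key name `x-banyan-password` is hypothetical, and the real engine may validate credentials differently. The comparison uses `hmac.compare_digest` so that timing does not reveal how much of the password matched.

```python
import hmac

# Hypothetical metadata key; the actual key used by Banyan is not documented here.
PASSWORD_KEY = "x-banyan-password"


def is_authorized(metadata, expected_password):
    """Return True if the metadata pairs carry the expected password."""
    supplied = dict(metadata).get(PASSWORD_KEY, "")
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(supplied.encode(), expected_password.encode())


ok = is_authorized([(PASSWORD_KEY, "s3cret")], "s3cret")
wrong = is_authorized([(PASSWORD_KEY, "guess")], "s3cret")
missing = is_authorized([], "s3cret")
```

Because the password travels on every call, this check would run once per request, typically in a server-side interceptor.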
Milestone 4 — Metrics Collection
Collect and store resource metrics from every node and container.
- Agent-side metric collection: CPU, memory, disk usage
- Container-level metrics: per-container CPU, memory, restart count
- Request throughput metrics per service
- Metric storage in etcd (or lightweight time-series store)
- Metric retrieval API for other components to consume
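One possible shape for the storage and retrieval pieces is a bounded in-memory series per node and metric. This is a sketch under assumptions: the class name `MetricStore`, the `(node, name)` keying, and the fixed sample cap are all hypothetical, and the milestone itself leaves the backing store open (etcd or a lightweight time-series store).

```python
from collections import defaultdict, deque


class MetricStore:
    """Keep the most recent samples for each (node, metric name) pair."""

    def __init__(self, max_samples=3):
        # A deque with maxlen drops the oldest sample once the cap is reached.
        self._series = defaultdict(lambda: deque(maxlen=max_samples))

    def record(self, node, name, timestamp, value):
        self._series[(node, name)].append((timestamp, value))

    def query(self, node, name, since=0):
        """Return samples with timestamp >= since, oldest first."""
        series = self._series.get((node, name), ())
        return [(t, v) for t, v in series if t >= since]


store = MetricStore(max_samples=3)
for ts, cpu in [(1, 0.20), (2, 0.35), (3, 0.50), (4, 0.65)]:
    store.record("worker-a", "cpu", ts, cpu)

recent = store.query("worker-a", "cpu", since=3)
```

Capping each series keeps memory use predictable, at the cost of losing history. Longer retention would need a real time-series backend.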
Milestone 5 — Health-Based Scheduling and Resource Requests
Smarter task distribution based on node resources instead of simple round-robin.
- Agent reports node resource usage (CPU, memory, disk) to Engine via etcd
- Engine selects the node with the most available resources when scheduling new tasks
- Resource requests in banyan.yaml: services can declare CPU and memory requirements (e.g. cpus: 2, memory: 4g)
- Engine validates that target node has sufficient resources before assigning a task
- Engine rejects deployments that exceed total cluster capacity
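The selection and validation steps above could be sketched as follows. The node and request field names (`free_cpus`, `free_memory_gb`, `cpus`, `memory_gb`) are hypothetical, and ranking by free CPU first and free memory second is just one assumed reading of "most available resources".

```python
def pick_node(nodes, request):
    """Return the name of the best node that can fit the request, or None."""
    candidates = [
        n for n in nodes
        if n["free_cpus"] >= request["cpus"]
        and n["free_memory_gb"] >= request["memory_gb"]
    ]
    if not candidates:
        return None
    # Assumed ranking: most free CPU, ties broken by most free memory.
    best = max(candidates, key=lambda n: (n["free_cpus"], n["free_memory_gb"]))
    return best["name"]


def fits_cluster(nodes, requests):
    """Coarse check that total requests do not exceed total free capacity."""
    total_cpus = sum(n["free_cpus"] for n in nodes)
    total_memory = sum(n["free_memory_gb"] for n in nodes)
    return (
        sum(r["cpus"] for r in requests) <= total_cpus
        and sum(r["memory_gb"] for r in requests) <= total_memory
    )


# Hypothetical cluster state.
nodes = [
    {"name": "worker-a", "free_cpus": 1, "free_memory_gb": 8},
    {"name": "worker-b", "free_cpus": 4, "free_memory_gb": 16},
    {"name": "worker-c", "free_cpus": 8, "free_memory_gb": 2},
]
chosen = pick_node(nodes, {"cpus": 2, "memory_gb": 4})
```

Here `worker-a` lacks CPU and `worker-c` lacks memory, so `worker-b` is chosen. The cluster-wide check is only a fast rejection: a deployment can pass it and still fail per-node placement.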
Milestone 6 — Multi-Engine High Availability
Multiple active engine nodes share workload for high availability and horizontal scaling.
- Active-active engines: Any engine can handle CLI requests and schedule tasks
- etcd coordination: Task claiming via Compare-And-Swap to prevent duplication
- Distributed registry: Index-based lookup so agents pull images from the correct engine
- Optimistic locking: Concurrent deployment updates are serialized
- Session state in etcd: Agents can reconnect to any engine
- Client load balancing: CLI connects to any available engine
See Multi-Engine HA Design for detailed architecture.
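To show why Compare-And-Swap prevents duplicate task claims, here is a sketch using an in-memory stand-in for the key-value store. `FakeKV` is not the etcd client API, and the key layout `/tasks/<id>/owner` is hypothetical. The point is only that a claim succeeds when the key is still unowned, so exactly one engine wins.

```python
import threading


class FakeKV:
    """In-memory stand-in for a store that offers atomic compare-and-swap."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def compare_and_swap(self, key, expected, new):
        """Set key to new only if its current value equals expected."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def claim_task(kv, task_id, engine_id):
    """Try to claim a task; succeeds only if no engine owns it yet."""
    return kv.compare_and_swap(f"/tasks/{task_id}/owner", None, engine_id)


kv = FakeKV()
first = claim_task(kv, "task-42", "engine-1")
second = claim_task(kv, "task-42", "engine-2")
owner = kv.get("/tasks/task-42/owner")
```

The second engine's claim fails because the owner key is no longer empty, so it can move on to another task instead of running a duplicate.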
Milestone 7 — Auto-Scaling and Redeployment
Scale services based on metrics and support zero-downtime updates.
- Auto-scaling: Define scaling rules in the manifest (min/max replicas, target thresholds)
- Auto-scaling: Engine evaluates metrics against rules and adjusts replica count
- Auto-scaling: Graceful scale-down (drain before stopping)
- Redeployment: Rolling update when service image or config changes
- Redeployment: Health check between rollout steps
- Redeployment: Automatic rollback on failure
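A scaling rule with min/max replicas and a target threshold could be evaluated as in the sketch below. It uses a proportional rule (the same shape as the Kubernetes Horizontal Pod Autoscaler formula) purely as an assumption. The roadmap does not specify which rule Banyan will use, and the function name and parameters are hypothetical.

```python
import math


def desired_replicas(current, observed, target, min_replicas, max_replicas):
    """Scale replicas in proportion to observed/target, clamped to bounds."""
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * observed / target)
    # Clamp so the rule never scales below min or above max.
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas at 90% CPU against a 60% target, bounded to 2..10 replicas.
scale_up = desired_replicas(4, 90, 60, 2, 10)
# Load well under target: the lower bound holds.
scale_down = desired_replicas(4, 15, 60, 2, 10)
# Load far over target: the upper bound holds.
capped = desired_replicas(4, 300, 60, 2, 10)
```

The clamping step is what makes the min/max replica settings in the manifest binding regardless of how extreme the metric gets.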
Milestone 8 — Monitoring Dashboard and CLI
Give operators visibility into the cluster through a web UI and CLI commands.
- CLI: Live cluster status with per-node resource usage
- CLI: Per-service metrics (replicas, throughput, error rate)
- CLI: Container log streaming
- Dashboard: Web UI for cluster overview
- Dashboard: Deployment history and status
- Dashboard: Real-time metrics and graphs
Milestone 9 — Advanced Security
Stronger authentication model for production environments.
- Private key authentication for agent-to-engine connections
- Private key authentication for CLI-to-engine and CLI-to-agent
- Key generation and distribution tooling
- Certificate rotation support
Milestone 10 — Advanced Metrics and Dashboard Enhancements
Deeper observability and richer operational tooling.
- Custom application metrics (user-defined)
- Alerting rules and notifications
- Historical trends and capacity planning views
- Multi-cluster dashboard support
- Metric export to external systems (Prometheus, Grafana)
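For the Prometheus export item, a sketch of rendering samples in the Prometheus text exposition format is shown below. The metric name `banyan_container_memory_bytes` and the label names are hypothetical. The function covers only a `# TYPE` line and labeled samples, not the full format (no `# HELP` lines or timestamps).

```python
def escape_label(value):
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(name, samples, metric_type="gauge"):
    """Render (labels, value) samples as Prometheus text exposition lines."""
    lines = [f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_text = ",".join(
            f'{key}="{escape_label(val)}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and labels, for illustration only.
text = to_prometheus(
    "banyan_container_memory_bytes",
    [({"service": "web", "node": "worker-a"}, 1048576)],
)
```

Serving text like this from an HTTP endpoint would let a Prometheus server scrape it, and Grafana could then chart the data from Prometheus.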