an external controller.
# ~/.hermes/config.yaml
kanban:
  dispatch_in_gateway: false
  dispatch_interval_seconds: 0
Setting dispatch_interval_seconds to 0 disables the internal ticker. The board remains fully functional for manual CLI operations, but automated promotion is now exclusively controlled by your external scheduler.
Step 2: Build a Slot-Aware Dispatch Controller
The CLI's --max flag caps only the number of tasks spawned per invocation; it does not cap the total number of running tasks. To enforce a hard concurrency ceiling, you must query the current running count, calculate remaining capacity, and pass that value to the dispatch command. The following Python controller implements this logic with error handling and structured logging.
#!/usr/bin/env python3
"""
kanban_flow_controller.py
Calculates available concurrency slots and dispatches tasks safely.
"""
import logging
import os
import subprocess
import sys

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [KANBAN-CTRL] %(levelname)s: %(message)s",
)

TARGET_CONCURRENCY = int(os.getenv("KANBAN_MAX_PARALLEL", "2"))
BOARD_NAME = os.getenv("KANBAN_BOARD_ID", "")
HERMES_BIN = os.getenv("HERMES_PATH", "hermes")

# Scope every command to the same board so the count and the dispatch agree.
BOARD_ARGS = ["--board", BOARD_NAME] if BOARD_NAME else []


def run_hermes_cmd(args: list[str]) -> str:
    """Run `hermes kanban <args>` and return stdout; exit non-zero on failure."""
    cmd = [HERMES_BIN, "kanban"] + BOARD_ARGS + args
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()
    except FileNotFoundError:
        # Typical under cron when the binary is not on the minimal PATH.
        logging.error("Hermes binary not found: %s", HERMES_BIN)
        sys.exit(1)
    except subprocess.CalledProcessError as exc:
        logging.error("Hermes CLI failed: %s", exc.stderr)
        sys.exit(1)


def get_active_task_count() -> int:
    """Count running tasks, assuming the CLI prints one task per line."""
    output = run_hermes_cmd(["list", "--status", "running"])
    if not output or "(no matching tasks)" in output:
        return 0
    return len(output.splitlines())


def dispatch_remaining_slots(available: int) -> None:
    """Dispatch at most `available` tasks; do nothing when the ceiling is reached."""
    if available <= 0:
        logging.info("Concurrency limit reached. Skipping dispatch.")
        return
    logging.info("Dispatching up to %d tasks.", available)
    run_hermes_cmd(["dispatch", "--max", str(available)])


def main() -> None:
    active = get_active_task_count()
    remaining = TARGET_CONCURRENCY - active
    logging.info("Active: %d | Target: %d | Available: %d", active, TARGET_CONCURRENCY, remaining)
    dispatch_remaining_slots(remaining)


if __name__ == "__main__":
    main()
Architecture Rationale:
- Python over Bash: Provides native subprocess handling, structured logging, and cleaner environment variable parsing. Reduces shell quoting pitfalls in production cron environments.
- State Query First: list --status running reads the SQLite WAL journal directly, ensuring the count reflects in-flight tasks before any new claims occur.
- Dynamic --max Injection: The controller computes remaining and passes it to dispatch --max. This keeps the total number of running tasks at or below TARGET_CONCURRENCY regardless of queue depth, provided only one dispatcher runs at a time.
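The count in the "State Query First" step depends on turning the CLI's text output into a number. A minimal sketch of that parsing step follows. It assumes the list command prints one task per line with no header row and uses the "(no matching tasks)" sentinel that appears in the controller; `count_running` is a hypothetical helper name and the sample rows are made up.

```python
NO_MATCH_SENTINEL = "(no matching tasks)"  # sentinel string used by the controller


def count_running(cli_output: str) -> int:
    """Count task rows in `list --status running` output.

    Assumption: one task per non-blank line and no header row.
    """
    if NO_MATCH_SENTINEL in cli_output:
        return 0
    return sum(1 for line in cli_output.splitlines() if line.strip())


if __name__ == "__main__":
    # Made-up rows; the real column layout may differ.
    sample = "t_101  running  ingest-logs\n\nt_102  running  summarize\n"
    print(count_running(sample))
```

If the real output includes a header or footer line, the count would be inflated and the controller would under-dispatch, so it is worth checking the format once by hand.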
Step 3: Model Sequential Dependencies
Not all workloads benefit from parallelism. Data pipelines, infrastructure migrations, and shared-state operations require strict ordering. Hermes Kanban supports parent-child relationships that gate child dispatch until the parent reaches a terminal state.
#!/usr/bin/env bash
# setup_sequential_pipeline.sh
set -euo pipefail

echo "Creating parent ingestion task..."
PARENT_REF=$(hermes kanban add \
    --title "Raw log ingestion pipeline" \
    --profile "data-engineer" \
    --column backlog)
echo "Parent created: ${PARENT_REF}"

echo "Registering dependent analysis tasks..."
hermes kanban add \
    --title "Statistical anomaly detection" \
    --profile "ml-analyst" \
    --column backlog \
    --parent "${PARENT_REF}"

hermes kanban add \
    --title "Compliance report generation" \
    --profile "compliance-auditor" \
    --column backlog \
    --parent "${PARENT_REF}"

echo "Pipeline registered. Children will remain gated until parent completes."
Why This Works: The dispatcher evaluates dependency graphs before claiming cards. Children in ready state remain invisible to the claim algorithm until the parent transitions to done. This eliminates manual concurrency tuning for sequential workflows and prevents intermediate artifact corruption.
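The gating rule can be pictured with a small model. This is an illustrative sketch of the behavior described above, not Hermes internals: the `Card` class, the status names, and `claimable` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Card:
    card_id: str
    status: str                      # "ready", "running", or "done"
    parent_id: Optional[str] = None  # None means the card has no dependency


def claimable(cards: dict[str, Card]) -> list[str]:
    """Return ids of ready cards whose parent (if any) is done."""
    result = []
    for card in cards.values():
        if card.status != "ready":
            continue
        parent = cards.get(card.parent_id) if card.parent_id else None
        if card.parent_id is None or (parent is not None and parent.status == "done"):
            result.append(card.card_id)
    return result


if __name__ == "__main__":
    board = {
        "ingest": Card("ingest", "ready"),
        "anomaly": Card("anomaly", "ready", parent_id="ingest"),
        "report": Card("report", "ready", parent_id="ingest"),
    }
    print(claimable(board))          # only the parent is claimable
    board["ingest"].status = "done"
    print(claimable(board))          # both children are now claimable
```

In this model a child whose parent is missing from the board is never claimable, which is the conservative choice for pipelines that share artifacts.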
Step 4: Schedule with Cron and File Locking
External dispatch requires a reliable trigger. Linux cron provides deterministic execution, but minute-level granularity can feel sluggish for fast-failing tasks. A sub-minute loop wrapped in flock solves this without spawning overlapping processes.
#!/usr/bin/env bash
# kanban_subminute_scheduler.sh
set -euo pipefail

LOCK_PATH="/tmp/kanban_dispatch.lock"
EXEC_PATH="/opt/agents/scripts/kanban_flow_controller.py"
TICK_INTERVAL="${TICK_INTERVAL:-15}"
MAX_TICKS="${MAX_TICKS:-4}"

# Hold an exclusive lock on fd 9 for the lifetime of this script.
exec 9>"${LOCK_PATH}"
flock -n 9 || { echo "Scheduler already running. Exiting."; exit 0; }

for (( i=1; i<=MAX_TICKS; i++ )); do
    # Under `set -e` a failed tick would abort the loop, so log it and continue.
    python3 "${EXEC_PATH}" || echo "Tick ${i} failed; continuing." >&2
    if (( i < MAX_TICKS )); then
        sleep "${TICK_INTERVAL}"
    fi
done
The flock -n call takes an exclusive lock without blocking. If a previous scheduler run still holds the lock, the new invocation exits immediately. This prevents SQLite contention and ensures only one dispatcher cycle runs at any given moment.
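The same exclusive, non-blocking lock can be taken from Python with the standard `fcntl` module, which may be convenient if the scheduler loop is later folded into the controller. This is a sketch for Linux with a local filesystem; `try_lock` and the demo lock path are hypothetical names.

```python
import fcntl
import os
import tempfile


def try_lock(path: str):
    """Open `path` and take an exclusive, non-blocking flock.

    Returns the open file object on success (keep it open to hold the lock),
    or None if another holder already has the lock.
    """
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    # Hypothetical demo path, separate from the scheduler's real lock file.
    lock_path = os.path.join(tempfile.gettempdir(), "kanban_dispatch_demo.lock")
    first = try_lock(lock_path)
    second = try_lock(lock_path)   # separate open(), so it contends with `first`
    print(first is not None, second is None)
    if first is not None:
        first.close()              # closing the file releases the lock
```

The lock belongs to the open file, so it is released automatically if the process crashes. That is why a stale lock file left in /tmp does not block later runs.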
Pitfall Guide
1. Dual Dispatcher Race Conditions
Explanation: Running hermes kanban daemon alongside gateway-embedded dispatch creates concurrent SQLite readers/writers. Both processes attempt to claim the same ready cards, resulting in duplicate executions or corrupted state transitions.
Fix: Explicitly set dispatch_in_gateway: false and verify with pgrep -af "hermes" that only one scheduler process exists. Use flock to serialize external dispatch attempts.
2. Misinterpreting the --max Flag
Explanation: hermes kanban dispatch --max 3 limits new spawns during that tick, not the total number of running tasks. If 5 tasks are already running, executing this command can push concurrency to 8.
Fix: Always calculate available_slots = target_limit - active_count before invoking dispatch. Never pass a static number to --max in production.
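The difference is easy to see with the numbers from the explanation. The sketch below models one dispatch tick as plain arithmetic; the function names are hypothetical and nothing here calls the Hermes CLI.

```python
def running_after_dispatch(active: int, max_flag: int, queued: int) -> int:
    """Running count after one tick that may spawn up to `max_flag` queued tasks."""
    spawned = min(max(max_flag, 0), queued)
    return active + spawned


def slot_aware_max(target: int, active: int) -> int:
    """Value to pass to --max so the total stays at or below `target`."""
    return max(0, target - active)


if __name__ == "__main__":
    active, queued, target = 5, 10, 6
    # Static --max 3: the tick adds 3 on top of the 5 already running.
    print(running_after_dispatch(active, 3, queued))                               # → 8
    # Slot-aware: only 1 slot is free under a target of 6.
    print(running_after_dispatch(active, slot_aware_max(target, active), queued))  # → 6
```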
3. Cron Environment Path Blind Spots
Explanation: Cron executes with a minimal $PATH. If hermes or python3 is installed in a user-specific directory (e.g., ~/.local/bin), the scheduler will fail silently or throw command not found.
Fix: Export absolute paths in the cron environment or wrapper script. Use which hermes and which python3 to resolve binaries, then hardcode them or set PATH=/usr/local/bin:/usr/bin:/home/user/.local/bin at the top of the script.
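A pre-flight check can surface this failure before cron does. The sketch below resolves binaries against an explicit directory list with the standard library's `shutil.which` instead of relying on the inherited $PATH; `resolve_binary` is a hypothetical helper and the directory list mirrors the example PATH above.

```python
import os
import shutil
import sys


def resolve_binary(name: str, search_dirs: list[str]) -> str:
    """Return the absolute path of `name` found in `search_dirs`, or raise."""
    found = shutil.which(name, path=os.pathsep.join(search_dirs))
    if found is None:
        raise FileNotFoundError(f"{name} not found in {search_dirs}")
    return os.path.abspath(found)


if __name__ == "__main__":
    # Directories from the example PATH; ~/.local/bin is expanded for the current user.
    dirs = ["/usr/local/bin", "/usr/bin", os.path.expanduser("~/.local/bin")]
    for binary in ("hermes", "python3"):
        try:
            print(binary, "->", resolve_binary(binary, dirs))
        except FileNotFoundError as exc:
            print(exc, file=sys.stderr)
```

The resolved paths can then be exported as HERMES_PATH or hardcoded in the wrapper script.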
4. Ignoring VRAM Fragmentation vs. Compute Saturation
Explanation: GPU utilization metrics (e.g., nvidia-smi 90% usage) do not guarantee stability. LLM inference suffers from KV-cache fragmentation. Multiple concurrent requests can fragment VRAM, causing OOM kills even when compute appears available.
Fix: Monitor vLLM or Ollama memory allocation logs alongside utilization. Set concurrency limits based on worst-case context window requirements, not average token throughput. On vLLM, use the --max-model-len and --gpu-memory-utilization flags to reserve headroom.
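Worst-case sizing can be estimated with back-of-the-envelope arithmetic. In a standard transformer, the KV cache stores one key and one value vector per layer per KV head for each token. The model shape and memory budget below are hypothetical placeholders; substitute the values for your own model and GPU.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    """KV-cache bytes per token: a key and a value vector per layer per KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_concurrent_sequences(kv_budget_bytes: int, per_token_bytes: int, max_context_tokens: int) -> int:
    """Sequences that fit if every one grows to the full context window."""
    return kv_budget_bytes // (per_token_bytes * max_context_tokens)


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, fp16 cache (2 bytes).
    per_token = kv_bytes_per_token(32, 8, 128, 2)
    # Hypothetical budget: 6 GiB left for the KV cache after weights are loaded.
    print(per_token, max_concurrent_sequences(6 * GIB, per_token, 8192))  # → 131072 6
```

This is an upper bound that ignores allocator overhead and fragmentation, so a safe KANBAN_MAX_PARALLEL is likely somewhat lower than the result.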
5. Over-Reliance on Internal Agent Scheduling
Explanation: Using the agent's own LLM to schedule dispatch commands (e.g., prompting the model to run hermes kanban dispatch) introduces circular dependencies. When the model is busy, scheduling stalls. When the model crashes, the queue deadlocks.
Fix: Keep scheduling entirely outside the inference loop. Use OS-level cron, systemd timers, or external workflow engines. The LLM should only consume tasks, never manage them.
6. Static Concurrency Caps on Dynamic Workloads
Explanation: Hardcoding TARGET_CONCURRENCY=2 works for uniform tasks but fails when mixing lightweight classification jobs with heavy reasoning pipelines. The GPU sits idle while waiting for long tasks to finish.
Fix: Implement tiered concurrency pools. Route tasks by profile to separate dispatch controllers with different limits. Use hermes kanban list --profile "lightweight" vs --profile "reasoning" to calculate independent slot availability.
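One way to picture tiered pools is a per-profile version of the slot calculation. The pool names reuse the two profiles mentioned above, while the limits, the running counts, and `slots_per_pool` are hypothetical.

```python
# Hypothetical pool limits keyed by profile name.
POOL_LIMITS = {"lightweight": 4, "reasoning": 1}


def slots_per_pool(limits: dict[str, int], running: dict[str, int]) -> dict[str, int]:
    """Free slots per pool; a profile with no running tasks counts as zero."""
    return {name: max(0, cap - running.get(name, 0)) for name, cap in limits.items()}


if __name__ == "__main__":
    running_now = {"lightweight": 1, "reasoning": 1}   # made-up counts
    print(slots_per_pool(POOL_LIMITS, running_now))    # → {'lightweight': 3, 'reasoning': 0}
```

Each pool's result would feed its own dispatch call, so a long reasoning task occupying its single slot does not block lightweight jobs.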
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Mixed interactive + batch workloads | Slot-aware cron controller + tiered concurrency pools | Prevents background jobs from starving interactive queries | Low (CPU overhead for scheduler) |
| Strict data pipelines with shared artifacts | Parent-child dependency gating | Eliminates race conditions on intermediate storage | None (native Kanban feature) |
| Multi-GPU cluster with heterogeneous workloads | Profile-based dispatch routing + independent controllers | Matches task complexity to GPU capability | Medium (requires board partitioning) |
| Low-memory edge devices (CPU/Integrated GPU) | Dependency sequencing + TARGET_CONCURRENCY=1 | Prevents context thrashing and swap thrashing | None |
| High-throughput cloud API fallback | Gateway-embedded dispatch + provider rate limits | Leverages elastic scaling and upstream backpressure | High (API costs scale with concurrency) |
Configuration Template
# ~/.hermes/config.yaml
kanban:
  dispatch_in_gateway: false
  dispatch_interval_seconds: 0
  board_path: "~/.hermes/kanban.db"
# Environment variables for scheduler
KANBAN_MAX_PARALLEL=2
KANBAN_BOARD_ID=production-pipeline
HERMES_PATH=/usr/local/bin/hermes
# crontab -e
# Sub-minute dispatch scheduler with file locking
* * * * * /opt/agents/scripts/kanban_subminute_scheduler.sh >> /var/log/hermes/dispatch.log 2>&1
-- SQLite optimization for concurrent Kanban access
PRAGMA journal_mode=WAL;
PRAGMA busy_timeout=5000;
PRAGMA cache_size=-64000;
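For reference, these pragmas differ in scope. journal_mode=WAL is stored in the database file, while busy_timeout and cache_size apply only to the connection that sets them. The sketch below applies all three from Python's standard `sqlite3` module against a throwaway file. `open_board` is a hypothetical helper, and the sketch does not show whether Hermes sets these pragmas on its own connections.

```python
import os
import sqlite3
import tempfile


def open_board(db_path: str) -> sqlite3.Connection:
    """Open a SQLite database and apply the three pragmas to this connection."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")    # persisted in the database file
    conn.execute("PRAGMA busy_timeout=5000")   # per connection: wait up to 5 s on a lock
    conn.execute("PRAGMA cache_size=-64000")   # per connection: negative value means KiB
    return conn


if __name__ == "__main__":
    # Throwaway file for the demo, not the real board database.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = open_board(demo_path)
    print(conn.execute("PRAGMA journal_mode").fetchone()[0])
    print(conn.execute("PRAGMA busy_timeout").fetchone()[0])
    conn.close()
```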
Quick Start Guide
- Isolate the dispatcher: Edit ~/.hermes/config.yaml and set dispatch_in_gateway: false. Restart any running gateway processes.
- Deploy the controller: Save kanban_flow_controller.py to /opt/agents/scripts/, make it executable, and set KANBAN_MAX_PARALLEL=2 in your environment.
- Schedule execution: Add the flock-wrapped scheduler to your crontab. Verify with crontab -l and monitor /var/log/hermes/dispatch.log for the first three ticks.
- Validate concurrency: Run hermes kanban list --status running while the queue is processing. The count should never exceed your TARGET_CONCURRENCY value.
- Tune for hardware: Adjust KANBAN_MAX_PARALLEL based on VRAM allocation logs. Increase only when latency remains stable under sustained load.