# FEATURE.md — Orchestration Service

## Purpose

Define features, data models, behaviors, and operational rules for an Orchestration Service that assigns and coordinates agents to complete tasks inside isolated running containers.

---

## Service Overview

- Name — Orchestration Service for agent-based task execution.
- Goal — Accept task definitions, schedule and dispatch agents, manage execution environments, track progress and complexity, and escalate on failures.
- Primary actors — Task, Agent, Running Container, Tool, Scheduler, Operator.

---

## Task Model

### Core Fields

- id — Unique identifier.
- name — Human-readable title.
- prompt — Primary instruction text prepended to the task before execution.
- container — Reference to the required running container.
- resolve_time_estimate — Estimated completion time in minutes.
- task_tags — Array of TaskTag values describing domain(s).
- complexity_level — One of: Low, Middle, Hard, ExtremHard.
- escalation — Optional object with reason and prompt.
- metadata — Free-form key/value map.

### Task Flow Control

A task is always a first-class entity. There is no parent/subtask concept. Tasks chain to each other via `next_task`. Tasks are organised into groups using a `TaskGroup` entity that defines sequential or parallel execution.

- schedule — Schedule definition.
  - type: immediate, time-based, datetime-event.
  - datetime-event is used for the schedule task feature (starts on a specific date/time).
- next_task — Optional ID of the next task in the chain (within or across groups).
- previous_task — Optional ID of the previous task in the chain.

Specialised control-flow behaviours — iteration, conditional branching, and jump-with-prompt — are expressed as distinct task types (see **Specialised Task Types** below). Each specialised task type is a separate entity and a separate database row with a `task_type` discriminator.

---

## Task Groups

### Purpose

A `TaskGroup` is a first-class chain participant, just like a task.
It owns a set of children, waits for all of them to complete, and then advances to its `next` item — which can itself be either a task or another group.

Children of a group can be any mix of:

- `AgentTask` (standard or any specialised type: `ForeachTask`, `GotoTask`, `ConditionTask`)
- Nested `TaskGroup` instances (groups can be nested arbitrarily deep)

### TaskGroup Fields

- id — Unique identifier.
- name — Human-readable label.
- type — `Sequential` or `Parallel`.
  - `Sequential` — Children run one at a time in the declared order. When one child completes, the next child is activated.
  - `Parallel` — All children are dispatched simultaneously. The group completes when every child has finished.
- children — Ordered list of `GroupChildRef` entries. Each entry identifies a child by its ID and kind (`Task` or `Group`).
- next — Optional `GroupChildRef` identifying the item to activate after this group completes. It can point to a task or another group.
- previous — Optional `GroupChildRef` identifying the item that precedes this group in the chain.

### GroupChildRef

A `GroupChildRef` is a value object that points to either a task or a group:

- child_id — Guid of the child.
- kind — `Task` or `Group`.

### Orchestrator Behavior

- A group begins execution when it is activated (either by the root scheduler or by the `next` pointer of a predecessor).
- **Sequential**: the engine activates the first child; when that child completes, it activates the second child, and so on. After the last child completes, the group itself is marked complete and activates its `next`.
- **Parallel**: the engine activates all children simultaneously. After all children complete, the group is marked complete and activates its `next`.
- If a child is itself a `TaskGroup`, the same rules apply recursively — the nested group must complete before it counts as done for its parent.
- A child belongs to at most one group.
- Cycles in the chain (via `next` or through children) must be rejected at validation time (`DependencyCycleException`).

---

## Specialised Task Types

Each specialised control-flow behaviour is a distinct domain entity. Each type is stored as a separate database row with a `task_type` discriminator column. All specialised types inherit the core task fields and `next_task` / `previous_task` chaining.

### ForeachTask

Iterates over a list of items and spawns one task instance per item.

- foreach_items — Ordered list of string values to iterate over.
- foreach_template_task_id — ID of the task template to clone per item.
- Generated task instances are linked via `next_task` / `previous_task` in creation order and can be placed into a group.

### GotoTask

On completion, merges additional context into a target task and activates it — regardless of where the target sits in the chain.

- goto_target_task_id — ID of the task to activate.
- goto_prompt — Additional prompt text merged (appended) into the target task's `Prompt` before execution.

### ConditionTask (If-Task)

Evaluates a boolean expression against the task result and routes to one of two branches.

- condition — A boolean expression evaluated against the task result.
- true_next_task_id — ID of the task to activate when the condition evaluates to true.
- false_next_task_id — ID of the task to activate when the condition evaluates to false.

---

## Schedule

### Types

- time-based — Cron or ISO schedule expression.
- datetime-event — Start at a fixed date/time instant or range (e.g. `2026-04-01T09:00:00Z`). This enables schedule tasks.
- immediate — Start as soon as created.

The system uses normal event publishing and handling for task coordination; it does not use dedicated signal-based schedules.

### Goto Tasks

- Use a `GotoTask` (see **Specialised Task Types**) to jump to an arbitrary target task.
- The target task receives the additional `goto_prompt` text merged (appended) into its `Prompt` and activates immediately when the `GotoTask` completes.

### Retry Policy

- attempts — integer.
- backoff — fixed or exponential.
- max_retries — integer.

---

## Complexity Level

### Definition

Defines how complex a task is and when it must be split.

| Level      | Assignment rule                         | Characteristics                    |
|------------|-----------------------------------------|------------------------------------|
| Low        | resolve_time_estimate ≤ 30              | Simple logic; single topic.        |
| Middle     | 30 < resolve_time_estimate ≤ 60         | Moderate complexity; single topic. |
| Hard       | resolve_time_estimate > 60              | Multi-topic; requires subtasks.    |
| ExtremHard | resolve_time_estimate unknown/unbounded | Multi-topic; unknown duration.     |

### Automatic Rules

- resolve_time_estimate > 60 → Hard.
- Missing or unbounded estimate → ExtremHard.
- Parent prompt must be prepended to all subtasks.

---

## TaskTag

### Purpose

Enum identifying task working areas and agent capabilities.

### Example Values

- DATA_INGESTION
- NLP
- IMAGE_PROCESSING
- DEPLOYMENT
- SECURITY
- TESTING
- DOCUMENTATION

---

## Agent

### Core Fields

- id — Unique identifier.
- skills — List of assigned skills.
- task_tags — List of TaskTag values the agent can handle.
- choose_rules — Matching rules for accepting tasks.
- capabilities — wait_for_subagents, call_subagent, return_status, escalation.

### Matching Rules Examples

- Accept if the task has a single TaskTag equal to one of the agent's TaskTags.
- Accept if the agent supports a superset of the task's tags.
- Accept only when the agent has all TaskTags required by the task.

### Execution Behavior

- Agent executes inside its container and returns success or failure.
- On failure, escalation.reason and escalation.prompt are required.
- Agents may spawn sub-agents and wait for them.

---

## Running Container

### Definition

Isolated runtime environment required for a task.

### Attributes

- image — Container image reference.
- tools — List of required tools and binaries.
- resources — CPU, memory, disk limits.
- network_policy — Allowed network rules.
- volumes — Mounts and persistence rules.

### Constraints

- Agents cannot access resources outside their container.
- Containers must declare all tools at startup.

---

## Tool

### Definition

Runnable binary or library available inside a container.

### Attributes

- name — Tool name.
- version — Required version.
- purpose — Description of tool’s role.
- usage_guidelines — Allowed and forbidden operations.
- install_instructions — Installation steps.

---

## Failure and Escalation Flow

### Agent Failure

- Agent returns failure with escalation.reason and escalation.prompt.
- Orchestrator evaluates retry policy.
- If retries exhausted → escalate to parent agent or human operator.

### Logging

- All failures and escalations must be logged with timestamps, container id, agent id, and error context.

---

## Frontend CLI

### Capabilities

- Manage Agents, Tasks, Tools, Containers, Complexity Levels, TaskTags, Schedules.
- Create, update, delete, list resources.
- Start tasks manually and view logs.
- Trigger event-driven execution.

### Example Commands

- orchestrate agent create --name --task-tags --skills
- orchestrate task create --name --prompt-file --container --schedule
- orchestrate container define --image --tools --resources
- orchestrate task start --id
- orchestrate task status --id --follow

---

## Example Task Definition (YAML)

```yaml
id: task-200
name: "Transform raw data"
prompt: "Clean and normalize raw input data."
container: "data-worker:v1.0"
resolve_time_estimate: 20
task_tags:
  - DATA_INGESTION
complexity_level: Low
schedule:
  type: immediate
group_id: "group-preprocessing"
tools:
  - python:3.11
  - pip:latest
escalation:
  reason: "Validation failed"
  prompt: "Please inspect input data and retry."