FEATURE.md — Orchestration Service
Purpose
Define features, data models, behaviors, and operational rules for an Orchestration Service that assigns and coordinates agents to complete tasks inside isolated running containers.
Service Overview
- Name — Orchestration Service for agent-based task execution.
- Goal — Accept task definitions, schedule and dispatch agents, manage execution environments, track progress and complexity, and escalate on failures.
- Primary actors — Task, Agent, Running Container, Tool, Scheduler, Operator.
Task Model
Core Fields
- id — Unique identifier.
- name — Human-readable title.
- prompt — Primary instruction text prepended to the task before execution.
- container — Reference to the required running container.
- resolve_time_estimate — Estimated completion time in minutes.
- task_tags — Array of TaskTag values describing domain(s).
- complexity_level — One of: Low, Middle, Hard, ExtremHard.
- escalation — Optional object with reason and prompt.
- metadata — Free-form key/value map.
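The core fields can be sketched as Python dataclasses. This is a minimal illustration, not a confirmed schema. The type names `Task`, `Escalation`, and `ComplexityLevel` are hypothetical. It also assumes that an unknown estimate is represented as `None`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ComplexityLevel(Enum):
    LOW = "Low"
    MIDDLE = "Middle"
    HARD = "Hard"
    EXTREM_HARD = "ExtremHard"  # value spelled exactly as in the spec


@dataclass
class Escalation:
    reason: str
    prompt: str


@dataclass
class Task:
    id: str
    name: str
    prompt: str
    container: str                        # reference to the required running container
    resolve_time_estimate: Optional[int]  # minutes; None when unknown (assumption)
    complexity_level: ComplexityLevel
    task_tags: list[str] = field(default_factory=list)
    escalation: Optional[Escalation] = None
    metadata: dict[str, str] = field(default_factory=dict)


task = Task(
    id="task-200",
    name="Transform raw data",
    prompt="Clean and normalize raw input data.",
    container="data-worker:v1.0",
    resolve_time_estimate=20,
    complexity_level=ComplexityLevel.LOW,
    task_tags=["DATA_INGESTION"],
)
```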
Task Flow Control
A task is always a first-class entity. There is no parent/subtask concept. Tasks chain to each other via next_task. Tasks are organised into groups using a TaskGroup entity that defines sequential or parallel execution.
- schedule — Schedule definition.
  - type: immediate, time-based, or datetime-event.
  - datetime-event is used by the schedule-task feature (the task starts on a specific date/time).
- next_task — Optional ID of the next task in the chain (within or across groups).
- previous_task — Optional ID of the previous task in the chain.
Specialised control-flow behaviours — iteration, conditional branching, and jump-with-prompt — are expressed as distinct task types (see Specialised Task Types below). Each specialised task type is a separate entity and a separate database row with a task_type discriminator.
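Tasks link only through `next_task` and `previous_task`, so walking a chain is a doubly linked list traversal. The sketch below makes two assumptions. First, tasks sit in a dict keyed by ID, which is a hypothetical in-memory store. Second, a back-link that disagrees with the forward link counts as an error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChainedTask:
    id: str
    next_task: Optional[str] = None
    previous_task: Optional[str] = None


def walk_chain(tasks: dict[str, ChainedTask], head_id: str) -> list[str]:
    """Return task IDs in chain order, starting at the chain head."""
    order: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = head_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        task = tasks[current]
        # previous_task must name the task we just came from (None for the head).
        expected_prev = order[-1] if order else None
        if task.previous_task != expected_prev:
            raise ValueError(
                f"{current}.previous_task is {task.previous_task!r}, "
                f"expected {expected_prev!r}"
            )
        order.append(current)
        current = task.next_task
    return order


tasks = {
    "a": ChainedTask("a", next_task="b"),
    "b": ChainedTask("b", next_task="c", previous_task="a"),
    "c": ChainedTask("c", previous_task="b"),
}
print(walk_chain(tasks, "a"))  # ['a', 'b', 'c']
```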
Task Groups
Purpose
A TaskGroup is a first-class chain participant, just like a task. It owns a set of children, waits for all of them to complete, and then advances to its next item — which can itself be either a task or another group.
Children of a group can be any mix of:
- `AgentTask` (standard or any specialised type: `ForeachTask`, `GotoTask`, `ConditionTask`)
- Nested `TaskGroup` instances (groups can be nested arbitrarily deep)
TaskGroup Fields
- id — Unique identifier.
- name — Human-readable label.
- type — `Sequential` or `Parallel`.
  - `Sequential` — Children run one at a time in the declared order. When one child completes, the next child is activated.
  - `Parallel` — All children are dispatched simultaneously. The group completes when every child has finished.
- children — Ordered list of `GroupChildRef` entries. Each entry identifies a child by its ID and kind (`Task` or `Group`).
- next — Optional `GroupChildRef` identifying the item to activate after this group completes. It can point to a task or another group.
- previous — Optional `GroupChildRef` identifying the item that precedes this group in the chain.
GroupChildRef
A GroupChildRef is a value object that points to either a task or a group:
- child_id — Guid of the child.
- kind — `Task` or `Group`.
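The sequential and parallel rules can be approximated with `asyncio` under two assumptions: each task is an async callable, and a group is complete once its coroutine returns. The names `ChildKind`, `GroupType`, `run_child`, and `run_group` are hypothetical. The real engine is event-driven rather than a single coroutine tree, and `next`/`previous` are left out to keep the focus on children.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Awaitable, Callable


class ChildKind(Enum):
    TASK = "Task"
    GROUP = "Group"


@dataclass(frozen=True)
class GroupChildRef:
    child_id: str
    kind: ChildKind


class GroupType(Enum):
    SEQUENTIAL = "Sequential"
    PARALLEL = "Parallel"


@dataclass
class TaskGroup:
    id: str
    type: GroupType
    children: list[GroupChildRef] = field(default_factory=list)


TaskFn = Callable[[], Awaitable[None]]


async def run_child(ref: GroupChildRef, tasks: dict[str, TaskFn],
                    groups: dict[str, TaskGroup]) -> None:
    if ref.kind is ChildKind.GROUP:
        # A nested group counts as done only once all of its own children finish.
        await run_group(groups[ref.child_id], tasks, groups)
    else:
        await tasks[ref.child_id]()


async def run_group(group: TaskGroup, tasks: dict[str, TaskFn],
                    groups: dict[str, TaskGroup]) -> None:
    if group.type is GroupType.SEQUENTIAL:
        for ref in group.children:  # one child at a time, in declared order
            await run_child(ref, tasks, groups)
    else:
        # All children start together; gather returns when every child has finished.
        await asyncio.gather(*(run_child(ref, tasks, groups) for ref in group.children))


completed: list[str] = []


def make_task(name: str, delay: float) -> TaskFn:
    async def run() -> None:
        await asyncio.sleep(delay)
        completed.append(name)
    return run


tasks = {"a": make_task("a", 0.05), "b": make_task("b", 0.0), "c": make_task("c", 0.0)}
groups = {
    "inner": TaskGroup("inner", GroupType.PARALLEL,
                       [GroupChildRef("a", ChildKind.TASK), GroupChildRef("b", ChildKind.TASK)]),
    "outer": TaskGroup("outer", GroupType.SEQUENTIAL,
                       [GroupChildRef("inner", ChildKind.GROUP), GroupChildRef("c", ChildKind.TASK)]),
}
asyncio.run(run_group(groups["outer"], tasks, groups))
print(completed)  # ['b', 'a', 'c']
```

Inside the parallel group, `b` finishes before the slower `a`. In the sequential outer group, `c` starts only after the nested group has finished both of its children.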
Orchestrator Behavior
- A group begins execution when it is activated, either by the root scheduler or by the `next` pointer of a predecessor.
- Sequential: the engine activates the first child; when that child completes, it activates the second child, and so on. After the last child completes, the group itself is marked complete and activates its `next`.
- Parallel: the engine activates all children simultaneously. After all children complete, the group is marked complete and activates its `next`.
- If a child is itself a `TaskGroup`, the same rules apply recursively: the nested group must complete before it counts as done for its parent.
- A child belongs to at most one group.
- Cycles in the chain (via `next` or through children) must be rejected at validation time (`DependencyCycleException`).
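One way to validate is to treat both `next` pointers and group-to-child links as edges of a single directed graph, then reject any cycle found by a depth-first search. The sketch below assumes that IDs are unique across tasks and groups, and that the graph arrives as a plain adjacency dict (a hypothetical representation). Only the exception name comes from the spec. The recursion is fine for modest graph sizes.

```python
class DependencyCycleException(Exception):
    """Raised when the chain or the group nesting contains a cycle."""


def validate_acyclic(edges: dict[str, list[str]]) -> None:
    """edges maps a node ID to its successors: its `next` target plus, for groups, its children."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    color: dict[str, int] = {}

    def visit(node: str, path: list[str]) -> None:
        color[node] = GREY
        path.append(node)
        for succ in edges.get(node, []):
            state = color.get(succ, WHITE)
            if state == GREY:  # back edge: succ is already on the current path
                cycle = path[path.index(succ):] + [succ]
                raise DependencyCycleException(" -> ".join(cycle))
            if state == WHITE:
                visit(succ, path)
        path.pop()
        color[node] = BLACK

    for node in list(edges):
        if color.get(node, WHITE) == WHITE:
            visit(node, [])


# g1 has children t1..t2 and next t3; t3.next is g2, whose child is t4: no cycle.
validate_acyclic({"g1": ["t1", "t2", "t3"], "t3": ["g2"], "g2": ["t4"]})
try:
    validate_acyclic({"g1": ["t1"], "t1": ["g1"]})  # t1.next points back to its own group
except DependencyCycleException as exc:
    print(exc)  # g1 -> t1 -> g1
```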
Specialised Task Types
Each specialised control-flow behaviour is a distinct domain entity. Each type is stored as a separate database row with a task_type discriminator column. All specialised types inherit the core task fields and next_task / previous_task chaining.
ForeachTask
Iterates over a list of items and spawns one task instance per item.
- foreach_items — Ordered list of string values to iterate over.
- foreach_template_task_id — ID of the task template to clone per item.
- Generated task instances are linked via `next_task`/`previous_task` in creation order and can be placed into a group.
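Expanding a `ForeachTask` means cloning the template once per item and wiring the clones into a chain. The spec does not say how an item reaches its task instance, so two conventions here are hypothetical. Clone IDs take the form `<foreach-id>-<index>`, and the item value is appended to the cloned prompt.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class SimpleTask:
    id: str
    prompt: str
    next_task: Optional[str] = None
    previous_task: Optional[str] = None


def expand_foreach(foreach_id: str, template: SimpleTask, items: list[str]) -> list[SimpleTask]:
    """Clone the template per item and link the clones in creation order."""
    instances: list[SimpleTask] = []
    for index, item in enumerate(items):
        clone = replace(
            template,
            id=f"{foreach_id}-{index}",                 # hypothetical ID scheme
            prompt=f"{template.prompt}\nItem: {item}",  # hypothetical item injection
            next_task=None,
            previous_task=instances[-1].id if instances else None,
        )
        if instances:
            instances[-1].next_task = clone.id  # forward link from the previous clone
        instances.append(clone)
    return instances


template = SimpleTask("tpl", "Summarize the file.")
clones = expand_foreach("fe-1", template, ["a.csv", "b.csv"])
print([(c.id, c.previous_task, c.next_task) for c in clones])
# [('fe-1-0', None, 'fe-1-1'), ('fe-1-1', 'fe-1-0', None)]
```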
GotoTask
On completion, merges additional context into a target task and activates it — regardless of where the target sits in the chain.
- goto_target_task_id — ID of the task to activate.
- goto_prompt — Additional prompt text merged (appended) into the target task's `Prompt` before execution.
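The merge-and-activate step can be sketched as follows. The sketch assumes that "merged (appended)" means concatenation with a blank-line separator. It also models activation as a caller-supplied `activate` callback, which is a hypothetical hook.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PromptTask:
    id: str
    prompt: str


def complete_goto(goto_target_task_id: str, goto_prompt: str,
                  tasks: dict[str, PromptTask],
                  activate: Callable[[str], None]) -> None:
    """Append goto_prompt to the target's prompt, then activate the target."""
    target = tasks[goto_target_task_id]
    if goto_prompt:
        target.prompt = f"{target.prompt}\n\n{goto_prompt}"
    # Activation does not depend on where the target sits in the chain.
    activate(target.id)


tasks = {"t1": PromptTask("t1", "Run the test suite.")}
activated: list[str] = []
complete_goto("t1", "Focus on the failing integration tests.", tasks, activated.append)
print(activated)  # ['t1']
```

This sketch mutates the target in place, so repeated jumps to the same task accumulate prompt text. Merging into a per-activation copy would avoid that.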
ConditionTask (If-Task)
Evaluates a boolean expression against the task result and routes to one of two branches.
- condition — A boolean expression evaluated against the task result.
- true_next_task_id — ID of the task to activate when the condition evaluates to true.
- false_next_task_id — ID of the task to activate when the condition evaluates to false.
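The spec leaves the expression language open. The sketch below assumes the condition is a Python predicate over the task result, represented here as a dict. A real deployment would more likely parse a restricted expression language than execute arbitrary code.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ConditionTask:
    id: str
    condition: Callable[[dict[str, Any]], bool]  # assumption: predicate, not a parsed expression
    true_next_task_id: str
    false_next_task_id: str


def route(task: ConditionTask, result: dict[str, Any]) -> str:
    """Return the ID of the task to activate next."""
    return task.true_next_task_id if task.condition(result) else task.false_next_task_id


check = ConditionTask("if-1", lambda r: r.get("failed_rows", 0) == 0, "publish", "repair")
print(route(check, {"failed_rows": 0}))  # publish
print(route(check, {"failed_rows": 3}))  # repair
```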
Schedule
Types
- time-based — Cron or ISO schedule expression.
- datetime-event — Start at a fixed date/time instant or range (e.g. `2026-04-01T09:00:00Z`). This enables schedule tasks.
- immediate — Start as soon as created.
The system uses normal event publishing and handling for task coordination; it does not use dedicated signal-based schedules.
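Resolving a task's start time might look like the sketch below. It assumes schedules are dicts with a `type` key and that a `datetime-event` carries a single ISO 8601 instant under `at`; the key name `at` is hypothetical. Ranges are not handled. Cron evaluation for `time-based` schedules is deliberately left out, since it would normally come from a dedicated parser rather than hand-written code.

```python
from datetime import datetime, timezone


def resolve_start(schedule: dict, now: datetime) -> datetime:
    """Return the instant at which a task with this schedule should first start."""
    kind = schedule["type"]
    if kind == "immediate":
        return now
    if kind == "datetime-event":
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
        # so normalise it to an explicit UTC offset first.
        return datetime.fromisoformat(schedule["at"].replace("Z", "+00:00"))
    if kind == "time-based":
        raise NotImplementedError("cron/ISO expressions need a dedicated parser")
    raise ValueError(f"unknown schedule type: {kind}")


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
print(resolve_start({"type": "datetime-event", "at": "2026-04-01T09:00:00Z"}, now))
# 2026-04-01 09:00:00+00:00
```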
Goto Tasks
- Use a `GotoTask` (see Specialised Task Types) to jump to an arbitrary target task.
- The target task receives the additional `goto_prompt` text merged (appended) into its `Prompt` and activates immediately when the `GotoTask` completes.
Retry Policy
- attempts — integer.
- backoff — fixed or exponential.
- max_retries — integer.
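The policy does not define delay values. The sketch below computes the delay schedule under two assumptions: a hypothetical `base_delay` in seconds, and exponential backoff that doubles after each retry. It leaves `attempts` aside because its relationship to `max_retries` is not specified.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class RetryPolicy:
    max_retries: int
    backoff: Literal["fixed", "exponential"]
    base_delay: float = 1.0  # seconds; hypothetical, the spec does not define delays

    def delay(self, retry_number: int) -> float:
        """Delay before the given retry, counting retries from 1."""
        if self.backoff == "fixed":
            return self.base_delay
        return self.base_delay * 2 ** (retry_number - 1)  # doubles with each retry

    def delays(self) -> list[float]:
        """Every delay the policy allows, in order."""
        return [self.delay(n) for n in range(1, self.max_retries + 1)]


print(RetryPolicy(4, "exponential", base_delay=2.0).delays())  # [2.0, 4.0, 8.0, 16.0]
print(RetryPolicy(3, "fixed", base_delay=5.0).delays())        # [5.0, 5.0, 5.0]
```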
Complexity Level
Definition
Defines how complex a task is and when it must be split.
| Level | Assignment rule (resolve_time_estimate, minutes) | Characteristics |
|---|---|---|
| Low | resolve_time_estimate ≤ 30 | Simple logic; single topic. |
| Middle | 30 < resolve_time_estimate ≤ 60 | Moderate complexity; single topic. |
| Hard | resolve_time_estimate > 60 | Multi-topic; must be split into smaller tasks. |
| ExtremHard | resolve_time_estimate unknown/unbounded | Multi-topic; unknown duration. |
Automatic Rules
- resolve_time_estimate > 60 → Hard.
- Missing or unbounded estimate → ExtremHard.
- When a task is split, its prompt must be prepended to every task created from it.
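The assignment rules reduce to a small classification function. The sketch assumes an unknown estimate is represented as `None` and an unbounded one as `math.inf`. Thresholds are in minutes, as in the table.

```python
import math
from typing import Optional


def complexity_for(resolve_time_estimate: Optional[float]) -> str:
    """Map an estimate in minutes to a complexity level."""
    if resolve_time_estimate is None or math.isinf(resolve_time_estimate):
        return "ExtremHard"  # missing or unbounded estimate
    if resolve_time_estimate <= 30:
        return "Low"
    if resolve_time_estimate <= 60:
        return "Middle"
    return "Hard"  # > 60 minutes


print([complexity_for(m) for m in (20, 45, 90, None)])
# ['Low', 'Middle', 'Hard', 'ExtremHard']
```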
TaskTag
Purpose
Enum identifying task working areas and agent capabilities.
Example Values
- DATA_INGESTION
- NLP
- IMAGE_PROCESSING
- DEPLOYMENT
- SECURITY
- TESTING
- DOCUMENTATION
Agent
Core Fields
- id — Unique identifier.
- skills — List of assigned skills.
- task_tags — List of TaskTag values the agent can handle.
- choose_rules — Matching rules for accepting tasks.
- capabilities — wait_for_subagents, call_subagent, return_status, escalation.
Matching Rules Examples
- Accept if the task has a single TaskTag equal to one of the agent's TaskTags.
- Accept if agent supports a superset of task tags.
- Accept only when agent has all TaskTags required by the task.
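The example rules can be written as set predicates over tag names. The function names below are illustrative, since the spec does not name its rules. If "superset" is read as non-strict, the second and third rules reduce to the same subset test, so one predicate serves both.

```python
def single_tag_match(task_tags: set[str], agent_tags: set[str]) -> bool:
    # The task carries exactly one tag, and the agent has that tag.
    return len(task_tags) == 1 and task_tags <= agent_tags


def covers_all_tags(task_tags: set[str], agent_tags: set[str]) -> bool:
    # The agent's tags include every tag the task requires (non-strict superset).
    return task_tags <= agent_tags


print(single_tag_match({"NLP"}, {"NLP", "TESTING"}))           # True
print(covers_all_tags({"NLP", "DEPLOYMENT"}, {"NLP"}))         # False
```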
Execution Behavior
- Agent executes inside its container and returns success or failure.
- On failure, escalation.reason and escalation.prompt are required.
- Agents may spawn sub-agents and wait for them.
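The rule that failures must carry escalation data can be enforced wherever results are constructed. The sketch below uses a frozen dataclass. For brevity it flattens `escalation.reason` and `escalation.prompt` into two fields, which is an assumption; the type name `AgentResult` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentResult:
    success: bool
    escalation_reason: Optional[str] = None
    escalation_prompt: Optional[str] = None

    def __post_init__(self) -> None:
        # A failure must explain itself; a success needs no escalation data.
        if not self.success and not (self.escalation_reason and self.escalation_prompt):
            raise ValueError("failed results require escalation.reason and escalation.prompt")


ok = AgentResult(success=True)
failed = AgentResult(False, "Validation failed", "Please inspect input data and retry.")
```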
Running Container
Definition
Isolated runtime environment required for a task.
Attributes
- image — Container image reference.
- tools — List of required tools and binaries.
- resources — CPU, memory, disk limits.
- network_policy — Allowed network rules.
- volumes — Mounts and persistence rules.
Constraints
- Agents cannot access resources outside their container.
- Containers must declare all tools at startup.
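One way to enforce "declare all tools at startup" is to check, before dispatch, that every tool a task needs appears in the container's declared list. The sketch assumes tools are identified by `name:version` strings such as `python:3.11`. The names `ContainerSpec` and `missing_tools` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerSpec:
    image: str
    tools: list[str] = field(default_factory=list)  # declared at startup


def missing_tools(container: ContainerSpec, required: list[str]) -> list[str]:
    """Return required tools the container did not declare, in request order."""
    declared = set(container.tools)
    return [tool for tool in required if tool not in declared]


spec = ContainerSpec("data-worker:v1.0", ["python:3.11", "pip:latest"])
print(missing_tools(spec, ["python:3.11", "jq:1.7"]))  # ['jq:1.7']
```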
Tool
Definition
Runnable binary or library available inside a container.
Attributes
- name — Tool name.
- version — Required version.
- purpose — Description of the tool’s role.
- usage_guidelines — Allowed and forbidden operations.
- install_instructions — Installation steps.
Failure and Escalation Flow
Agent Failure
- Agent returns failure with escalation.reason and escalation.prompt.
- Orchestrator evaluates retry policy.
- If retries exhausted → escalate to parent agent or human operator.
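The three steps can be sketched as a loop: run once, retry up to `max_retries` times, and escalate the last failure if none of the attempts succeed. The `escalate` callback stands in for handing off to a parent agent or operator and is hypothetical. Backoff sleeps are omitted from this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    success: bool
    reason: Optional[str] = None
    prompt: Optional[str] = None


def run_with_escalation(run: Callable[[], Outcome], max_retries: int,
                        escalate: Callable[[Outcome], None]) -> Outcome:
    """Run once, retry up to max_retries times, then escalate a final failure."""
    outcome = run()
    retries = 0
    while not outcome.success and retries < max_retries:
        retries += 1
        outcome = run()  # backoff delay omitted in this sketch
    if not outcome.success:
        escalate(outcome)  # parent agent or human operator
    return outcome


attempts = iter([
    Outcome(False, "Timeout", "Retry with a smaller batch."),
    Outcome(False, "Validation failed", "Please inspect input data and retry."),
])
escalated: list[Outcome] = []
final = run_with_escalation(lambda: next(attempts), max_retries=1, escalate=escalated.append)
print(final.reason)  # Validation failed
```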
Logging
- All failures and escalations must be logged with timestamps, container id, agent id, and error context.
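A sketch using Python's standard `logging` module with a JSON payload guarantees the required fields are always present. The field names mirror the requirement. The logger name, the `event` field, and the JSON layout are assumptions.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("orchestrator")  # hypothetical logger name


def failure_record(container_id: str, agent_id: str, error: str,
                   event: str = "failure") -> str:
    """Build a JSON log line carrying every field the logging rule requires."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,  # "failure" or "escalation"
        "container_id": container_id,
        "agent_id": agent_id,
        "error_context": error,
    })


def log_failure(container_id: str, agent_id: str, error: str,
                event: str = "failure") -> None:
    logger.error(failure_record(container_id, agent_id, error, event))
```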
Frontend CLI
Capabilities
- Manage Agents, Tasks, Tools, Containers, Complexity Levels, TaskTags, Schedules.
- Create, update, delete, list resources.
- Start tasks manually and view logs.
- Trigger event-driven execution.
Example Commands
- `orchestrate agent create --name --task-tags --skills`
- `orchestrate task create --name --prompt-file --container --schedule`
- `orchestrate container define --image --tools --resources`
- `orchestrate task start --id`
- `orchestrate task status --id --follow`
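The command shape maps naturally onto nested subcommands. The sketch below uses Python's `argparse` and covers only `task start` and `task status`. The spec defines the `orchestrate` CLI only through the commands listed, so this illustrates the structure and is not its actual implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="orchestrate")
    resources = parser.add_subparsers(dest="resource", required=True)

    task = resources.add_parser("task", help="manage tasks")
    actions = task.add_subparsers(dest="action", required=True)

    start = actions.add_parser("start", help="start a task manually")
    start.add_argument("--id", required=True)

    status = actions.add_parser("status", help="show task status")
    status.add_argument("--id", required=True)
    status.add_argument("--follow", action="store_true", help="keep streaming updates")
    return parser


args = build_parser().parse_args(["task", "status", "--id", "task-200", "--follow"])
print(args.resource, args.action, args.id, args.follow)  # task status task-200 True
```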
Example Task Definition (YAML)
```yaml
id: task-200
name: "Transform raw data"
prompt: "Clean and normalize raw input data."
container: "data-worker:v1.0"
resolve_time_estimate: 20
task_tags:
  - DATA_INGESTION
complexity_level: Low
schedule:
  type: immediate
group_id: "group-preprocessing"
tools:
  - python:3.11
  - pip:latest
escalation:
  reason: "Validation failed"
  prompt: "Please inspect input data and retry."
```