docs
docs/features.md
# FEATURE.md — Orchestration Service

## Purpose
Define features, data models, behaviors, and operational rules for an Orchestration Service that assigns and coordinates agents to complete tasks inside isolated running containers.

---

## Service Overview
- Name — Orchestration Service for agent-based task execution.
- Goal — Accept task definitions, schedule and dispatch agents, manage execution environments, track progress and complexity, and escalate on failures.
- Primary actors — Task, Agent, Running Container, Tool, Scheduler, Operator.

---

## Task Model

### Core Fields
- id — Unique identifier.
- name — Human-readable title.
- prompt — Primary instruction text prepended to the task before execution.
- container — Reference to the required running container.
- resolve_time_estimate — Estimated completion time in minutes.
- task_tags — Array of TaskTag values describing domain(s).
- complexity_level — One of: Low, Middle, Hard, ExtremHard.
- escalation — Optional object with reason and prompt.
- metadata — Free-form key/value map.
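
The core fields above can be sketched as a small data model. This is only an illustration: the class names, the choice of `Optional[int]` for an unknown estimate, and the string-typed tags are assumptions, not part of the specification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ComplexityLevel(Enum):
    LOW = "Low"
    MIDDLE = "Middle"
    HARD = "Hard"
    EXTREM_HARD = "ExtremHard"  # value spelled as in this document


@dataclass
class Escalation:
    reason: str
    prompt: str


@dataclass
class Task:
    id: str
    name: str
    prompt: str
    container: str
    complexity_level: ComplexityLevel
    # Minutes; None stands for "unknown" (an assumption of this sketch).
    resolve_time_estimate: Optional[int] = None
    task_tags: list[str] = field(default_factory=list)
    escalation: Optional[Escalation] = None
    metadata: dict[str, str] = field(default_factory=dict)


# Example instance using the values from the YAML example in this document.
task = Task(
    id="task-200",
    name="Transform raw data",
    prompt="Clean and normalize raw input data.",
    container="data-worker:v1.0",
    complexity_level=ComplexityLevel.LOW,
    resolve_time_estimate=20,
    task_tags=["DATA_INGESTION"],
)
```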

### Task Flow Control
A task is always a first-class entity. There is no parent/subtask concept. Tasks chain to each other via `next_task`. Tasks are organised into groups using a `TaskGroup` entity that defines sequential or parallel execution.
- schedule — Schedule definition.
  - type: immediate, time-based, datetime-event.
  - datetime-event supports scheduled tasks, which start at a specific date/time.
- next_task — Optional ID of the next task in the chain (within or across groups).
- previous_task — Optional ID of the previous task in the chain.

Specialised control-flow behaviours (iteration, conditional branching, and jump-with-prompt) are expressed as distinct task types (see **Specialised Task Types** below). Each specialised task type is a separate entity and a separate database row with a `task_type` discriminator.

---

## Task Groups

### Purpose
A `TaskGroup` is a first-class chain participant, just like a task. It owns a set of children, waits for all of them to complete, and then advances to its `next` item, which can itself be either a task or another group.

Children of a group can be any mix of:
- `AgentTask` (standard or any specialised type: `ForeachTask`, `GotoTask`, `ConditionTask`)
- Nested `TaskGroup` instances (groups can be nested arbitrarily deep)

### TaskGroup Fields
- id — Unique identifier.
- name — Human-readable label.
- type — `Sequential` or `Parallel`.
  - `Sequential` — Children run one at a time in the declared order. When one child completes, the next child is activated.
  - `Parallel` — All children are dispatched simultaneously. The group completes when every child has finished.
- children — Ordered list of `GroupChildRef` entries. Each entry identifies a child by its ID and kind (`Task` or `Group`).
- next — Optional `GroupChildRef` identifying the item to activate after this group completes. It can point to a task or another group.
- previous — Optional `GroupChildRef` identifying the item that precedes this group in the chain.

### GroupChildRef
A `GroupChildRef` is a value object that points to either a task or a group:
- child_id — Guid of the child.
- kind — `Task` or `Group`.
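
As a rough illustration of the group model, the sketch below represents `GroupChildRef` as an immutable value object and nests a parallel group inside a sequential one. The Python class and enum names are hypothetical, and `uuid.UUID` stands in for the Guid mentioned above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
from uuid import UUID, uuid4


class ChildKind(Enum):
    TASK = "Task"
    GROUP = "Group"


class GroupType(Enum):
    SEQUENTIAL = "Sequential"
    PARALLEL = "Parallel"


@dataclass(frozen=True)  # frozen: a value object compared by its fields
class GroupChildRef:
    child_id: UUID
    kind: ChildKind


@dataclass
class TaskGroup:
    id: UUID
    name: str
    type: GroupType
    children: list[GroupChildRef] = field(default_factory=list)
    next: Optional[GroupChildRef] = None
    previous: Optional[GroupChildRef] = None


# A parallel group of two tasks, nested as the second child of a sequential group.
inner = TaskGroup(
    id=uuid4(),
    name="fan-out",
    type=GroupType.PARALLEL,
    children=[GroupChildRef(uuid4(), ChildKind.TASK), GroupChildRef(uuid4(), ChildKind.TASK)],
)
outer = TaskGroup(
    id=uuid4(),
    name="pipeline",
    type=GroupType.SEQUENTIAL,
    children=[GroupChildRef(uuid4(), ChildKind.TASK), GroupChildRef(inner.id, ChildKind.GROUP)],
)
```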

### Orchestrator Behavior
- A group begins execution when it is activated (either by the root scheduler or by the `next` pointer of a predecessor).
- **Sequential**: the engine activates the first child; when that child completes, it activates the second child, and so on. After the last child completes, the group itself is marked complete and activates its `next`.
- **Parallel**: the engine activates all children simultaneously. After all children complete, the group is marked complete and activates its `next`.
- If a child is itself a `TaskGroup`, the same rules apply recursively: the nested group must complete before it counts as done for its parent.
- A child belongs to at most one group.
- Cycles in the chain (via `next` or through children) must be rejected at validation time (`DependencyCycleException`).
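
The cycle rule above could be checked with a depth-first search over the combined `next` and children edges. The following is a minimal sketch, assuming the graph has already been flattened into a plain mapping of string IDs; only the exception name comes from this document, and the function name and edge representation are hypothetical.

```python
class DependencyCycleException(Exception):
    """Raised when the chain or group graph contains a cycle."""


def validate_no_cycles(edges: dict[str, list[str]]) -> None:
    """Reject cycles. `edges` maps an item ID to the IDs it points at
    (its `next` target plus, for groups, its children)."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: dict[str, int] = {}

    def visit(node: str) -> None:
        state[node] = IN_PROGRESS
        for target in edges.get(node, []):
            seen = state.get(target, UNSEEN)
            if seen == IN_PROGRESS:
                # We reached a node that is still on the current DFS path.
                raise DependencyCycleException(f"cycle through {target!r}")
            if seen == UNSEEN:
                visit(target)
        state[node] = DONE

    for node in edges:
        if state.get(node, UNSEEN) == UNSEEN:
            visit(node)


# A group with two children where the first child chains to the second: valid.
validate_no_cycles({"group-a": ["task-1", "task-2"], "task-1": ["task-2"]})

# Two tasks pointing at each other: rejected.
try:
    validate_no_cycles({"task-1": ["task-2"], "task-2": ["task-1"]})
except DependencyCycleException as exc:
    print(f"rejected: {exc}")
```

The recursive form keeps the sketch short; very long chains would call for an iterative traversal to stay within Python's recursion limit.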

---

## Specialised Task Types

Each specialised control-flow behaviour is a distinct domain entity. Each type is stored as a separate database row with a `task_type` discriminator column. All specialised types inherit the core task fields and `next_task` / `previous_task` chaining.

### ForeachTask
Iterates over a list of items and spawns one task instance per item.
- foreach_items — Ordered list of string values to iterate over.
- foreach_template_task_id — ID of the task template to clone per item.
- Generated task instances are linked via `next_task` / `previous_task` in creation order and can be placed into a group.
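
The expansion step might look like the sketch below. The document does not say how an item reaches the cloned task or how clone IDs are formed, so appending the item to the prompt and suffixing the template ID with an index are assumptions made purely for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Task:
    id: str
    prompt: str
    next_task: Optional[str] = None
    previous_task: Optional[str] = None


def expand_foreach(template: Task, foreach_items: list[str]) -> list[Task]:
    """Clone `template` once per item and chain the clones in creation order."""
    clones = [
        replace(
            template,
            id=f"{template.id}-{index}",                  # hypothetical ID scheme
            prompt=f"{template.prompt}\n\nItem: {item}",  # hypothetical item injection
            next_task=None,
            previous_task=None,
        )
        for index, item in enumerate(foreach_items)
    ]
    # Link neighbours in both directions.
    for earlier, later in zip(clones, clones[1:]):
        earlier.next_task = later.id
        later.previous_task = earlier.id
    return clones


clones = expand_foreach(Task(id="resize", prompt="Resize the image."), ["a.png", "b.png", "c.png"])
```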

### GotoTask
On completion, merges additional context into a target task and activates it, regardless of where the target sits in the chain.
- goto_target_task_id — ID of the task to activate.
- goto_prompt — Additional prompt text merged (appended) into the target task's `Prompt` before execution.
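
A minimal sketch of the merge step, assuming tasks are held in an in-memory dictionary keyed by ID and that the appended text is separated from the existing prompt by a blank line. Both choices are illustrative; the document only states that the text is appended.

```python
from dataclasses import dataclass


@dataclass
class Task:
    id: str
    prompt: str


def apply_goto(tasks: dict[str, Task], goto_target_task_id: str, goto_prompt: str) -> Task:
    """Append `goto_prompt` to the target's prompt and return the task to activate."""
    target = tasks[goto_target_task_id]
    target.prompt = f"{target.prompt}\n\n{goto_prompt}"
    return target


tasks = {"build": Task(id="build", prompt="Build the project.")}
activated = apply_goto(tasks, "build", "Retry with verbose logging enabled.")
```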

### ConditionTask (If-Task)
Evaluates a boolean expression against the task result and routes to one of two branches.
- condition — A boolean expression evaluated against the task result.
- true_next_task_id — ID of the task to activate when the condition evaluates to true.
- false_next_task_id — ID of the task to activate when the condition evaluates to false.
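
The routing can be sketched as follows. The expression language for `condition` is not defined in this document, so a plain Python callable stands in for it here, and the shape of the result (a dictionary with a `failed` count) is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ConditionTask:
    id: str
    condition: Callable[[Any], bool]  # stand-in for the boolean expression
    true_next_task_id: str
    false_next_task_id: str


def route(task: ConditionTask, result: Any) -> str:
    """Return the ID of the branch to activate for this task result."""
    return task.true_next_task_id if task.condition(result) else task.false_next_task_id


check = ConditionTask(
    id="check-tests",
    condition=lambda result: result["failed"] == 0,
    true_next_task_id="deploy",
    false_next_task_id="fix-tests",
)
```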

---

## Schedule

### Types
- time-based — Cron or ISO schedule expression.
- datetime-event — Start at a fixed date/time instant or range (e.g. `2026-04-01T09:00:00Z`). This enables scheduled tasks.
- immediate — Start as soon as created.

The system uses normal event publishing and handling for task coordination; it does not use dedicated signal-based schedules.

### Goto Tasks
- Use a `GotoTask` (see **Specialised Task Types**) to jump to an arbitrary target task.
- The target task receives the additional `goto_prompt` text merged (appended) into its `Prompt` and activates immediately when the `GotoTask` completes.

### Retry Policy
- attempts — integer.
- backoff — fixed or exponential.
- max_retries — integer.
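
To make the two `backoff` modes concrete, the sketch below computes the wait before each retry. The base delay of five seconds and the doubling factor are assumptions; the document names only the modes, not their parameters.

```python
def retry_delay(backoff: str, retry_number: int, base_seconds: float = 5.0) -> float:
    """Seconds to wait before retry `retry_number` (1-based)."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    if backoff == "fixed":
        # Same delay before every retry.
        return base_seconds
    if backoff == "exponential":
        # Delay doubles with each retry: base, 2*base, 4*base, ...
        return base_seconds * 2 ** (retry_number - 1)
    raise ValueError(f"unknown backoff: {backoff!r}")


fixed = [retry_delay("fixed", n) for n in (1, 2, 3)]
exponential = [retry_delay("exponential", n) for n in (1, 2, 3)]
```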

---

## Complexity Level

### Definition
Defines how complex a task is and when it must be split.

| Level | Assignment rule (minutes) | Characteristics |
|------------|-----------------------------------------|------------------------------------|
| Low | resolve_time_estimate ≤ 30 | Simple logic; single topic. |
| Middle | 30 < resolve_time_estimate ≤ 60 | Moderate complexity; single topic. |
| Hard | resolve_time_estimate > 60 | Multi-topic; requires subtasks. |
| ExtremHard | resolve_time_estimate unknown/unbounded | Multi-topic; unknown duration. |

### Automatic Rules
- resolve_time_estimate > 60 → Hard.
- Missing or unbounded estimate → ExtremHard.
- Parent prompt must be prepended to all subtasks.
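
The assignment rules in the table translate directly into a small classifier. In this sketch a missing estimate is represented by `None` and an unbounded one by a non-finite float; both representations are assumptions.

```python
import math
from typing import Optional


def complexity_level(resolve_time_estimate: Optional[float]) -> str:
    """Map an estimate in minutes to a complexity level name."""
    if resolve_time_estimate is None or not math.isfinite(resolve_time_estimate):
        return "ExtremHard"  # missing or unbounded estimate
    if resolve_time_estimate <= 30:
        return "Low"
    if resolve_time_estimate <= 60:
        return "Middle"
    return "Hard"
```

The boundaries follow the table: 30 minutes is still Low, 60 minutes is still Middle.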

---

## TaskTag

### Purpose
Enum identifying task working areas and agent capabilities.

### Example Values
- DATA_INGESTION
- NLP
- IMAGE_PROCESSING
- DEPLOYMENT
- SECURITY
- TESTING
- DOCUMENTATION

---

## Agent

### Core Fields
- id — Unique identifier.
- skills — List of assigned skills.
- task_tags — List of TaskTag values the agent can handle.
- choose_rules — Matching rules for accepting tasks.
- capabilities — wait_for_subagents, call_subagent, return_status, escalation.

### Matching Rules Examples
- Accept if the task has a single TaskTag equal to one of the agent's TaskTags.
- Accept if the agent supports a superset of the task's tags.
- Accept only when the agent has all TaskTags required by the task.
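
Read as set operations, the example rules could be sketched like this. The function names are hypothetical, and under this reading the second and third rules reduce to the same subset test.

```python
def accepts_single_tag(agent_tags: set[str], task_tags: set[str]) -> bool:
    """First rule: the task has exactly one tag and the agent has it."""
    return len(task_tags) == 1 and task_tags <= agent_tags


def accepts_all_tags(agent_tags: set[str], task_tags: set[str]) -> bool:
    """Second and third rules: every task tag is among the agent's tags."""
    return task_tags <= agent_tags


agent = {"NLP", "TESTING"}
```

Note that `accepts_all_tags` accepts a task with no tags at all; whether untagged tasks should match is left open by the document.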

### Execution Behavior
- Agent executes inside its container and returns success or failure.
- On failure, escalation.reason and escalation.prompt are required.
- Agents may spawn sub-agents and wait for them.

---

## Running Container

### Definition
Isolated runtime environment required for a task.

### Attributes
- image — Container image reference.
- tools — List of required tools and binaries.
- resources — CPU, memory, disk limits.
- network_policy — Allowed network rules.
- volumes — Mounts and persistence rules.

### Constraints
- Agents cannot access resources outside their container.
- Containers must declare all tools at startup.

---

## Tool

### Definition
Runnable binary or library available inside a container.

### Attributes
- name — Tool name.
- version — Required version.
- purpose — Description of tool’s role.
- usage_guidelines — Allowed and forbidden operations.
- install_instructions — Installation steps.

---

## Failure and Escalation Flow

### Agent Failure
- Agent returns failure with escalation.reason and escalation.prompt.
- Orchestrator evaluates retry policy.
- If retries exhausted → escalate to parent agent or human operator.
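
The failure flow above amounts to a small decision function. This sketch assumes the orchestrator tracks how many retries have already run, and it prefers a parent agent over the human operator when one exists; the document lists both escalation targets without ranking them, so that preference is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    reason: str  # escalation.reason
    prompt: str  # escalation.prompt


def handle_failure(failure: Failure, retries_done: int, max_retries: int,
                   has_parent_agent: bool) -> str:
    """Return 'retry', 'escalate_to_parent_agent', or 'escalate_to_operator'."""
    if not failure.reason or not failure.prompt:
        raise ValueError("a failure must carry escalation.reason and escalation.prompt")
    if retries_done < max_retries:
        return "retry"
    # Retries exhausted: escalate.
    return "escalate_to_parent_agent" if has_parent_agent else "escalate_to_operator"


failure = Failure(reason="Validation failed", prompt="Please inspect input data and retry.")
```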

### Logging
- All failures and escalations must be logged with timestamps, container id, agent id, and error context.
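
One possible shape for such a log entry is a JSON record carrying the required fields. The key names, the logger name, and the use of UTC ISO 8601 timestamps are assumptions of this sketch, not requirements stated in the document.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("orchestrator")  # hypothetical logger name


def failure_record(container_id: str, agent_id: str, reason: str, error_context: dict) -> str:
    """Serialise one failure or escalation event as a JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "container_id": container_id,
        "agent_id": agent_id,
        "reason": reason,
        "error_context": error_context,
    }
    return json.dumps(record)


logger.error(failure_record("ctr-1", "agent-7", "Validation failed", {"exit_code": 2}))
```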

---

## Frontend CLI

### Capabilities
- Manage Agents, Tasks, Tools, Containers, Complexity Levels, TaskTags, Schedules.
- Create, update, delete, list resources.
- Start tasks manually and view logs.
- Trigger event-driven execution.

### Example Commands
- `orchestrate agent create --name <name> --task-tags <tags> --skills <skills>`
- `orchestrate task create --name <name> --prompt-file <file> --container <image> --schedule <spec>`
- `orchestrate container define --image <image> --tools <list> --resources <spec>`
- `orchestrate task start --id <task-id>`
- `orchestrate task status --id <task-id> --follow`

---

## Example Task Definition (YAML)

```yaml
id: task-200
name: "Transform raw data"
prompt: "Clean and normalize raw input data."
container: "data-worker:v1.0"
resolve_time_estimate: 20
task_tags:
  - DATA_INGESTION
complexity_level: Low
schedule:
  type: immediate

group_id: "group-preprocessing"

tools:
  - python:3.11
  - pip:latest

escalation:
  reason: "Validation failed"
  prompt: "Please inspect input data and retry."
```