docs

2026-04-16 20:50:38 +02:00
commit 4b8aed50d5
6 changed files with 1591 additions and 0 deletions
--- a/docs/features.md
+++ b/docs/features.md
@@ -0,0 +1,256 @@
+# FEATURE.md — Orchestration Service
+
+## Purpose
+Define features, data models, behaviors, and operational rules for an Orchestration Service that assigns and coordinates agents to complete tasks inside isolated running containers.
+
+---
+
+## Service Overview
+- Name — Orchestration Service for agent-based task execution.
+- Goal — Accept task definitions, schedule and dispatch agents, manage execution environments, track progress and complexity, and escalate on failures.
+- Primary actors — Task, Agent, Running Container, Tool, Scheduler, Operator.
+
+---
+
+## Task Model
+
+### Core Fields
+- id — Unique identifier.
+- name — Human-readable title.
+- prompt — Primary instruction text prepended to the task before execution.
+- container — Reference to the required running container.
+- resolve_time_estimate — Estimated completion time in minutes.
+- task_tags — Array of TaskTag values describing domain(s).
+- complexity_level — One of: Low, Middle, Hard, ExtremHard.
+- escalation — Optional object with reason and prompt.
+- metadata — Free-form key/value map.
+
+### Task Flow Control
+A task is always a first-class entity. There is no parent/subtask concept. Tasks chain to each other via `next_task`. Tasks are organised into groups using a `TaskGroup` entity that defines sequential or parallel execution.
+- schedule — Schedule definition.
+  - type: immediate, time-based, datetime-event.
+  - datetime-event backs the scheduled-task feature: the task starts at a specific date and time.
+- next_task — Optional ID of the next task in the chain (within or across groups).
+- previous_task — Optional ID of the previous task in the chain.
+
+Specialised control-flow behaviours — iteration, conditional branching, and jump-with-prompt — are expressed as distinct task types (see **Specialised Task Types** below). Each specialised task type is a separate entity and a separate database row with a `task_type` discriminator.
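+
+The core and flow-control fields could be modelled roughly as follows. This is a minimal Python sketch, not the service's actual schema: the `Schedule.expression` field, the default values, and the `link` helper are assumptions introduced for illustration.
+
+```python
+from dataclasses import dataclass, field
+from enum import Enum
+from typing import Optional
+
+
+class ComplexityLevel(Enum):
+    LOW = "Low"
+    MIDDLE = "Middle"
+    HARD = "Hard"
+    EXTREM_HARD = "ExtremHard"  # value spelled as in this spec
+
+
+@dataclass
+class Escalation:
+    reason: str
+    prompt: str
+
+
+@dataclass
+class Schedule:
+    type: str = "immediate"  # immediate | time-based | datetime-event
+    expression: Optional[str] = None  # hypothetical: cron string or timestamp
+
+
+@dataclass
+class Task:
+    id: str
+    name: str
+    prompt: str
+    container: str
+    resolve_time_estimate: Optional[int] = None  # minutes
+    task_tags: list[str] = field(default_factory=list)
+    complexity_level: Optional[ComplexityLevel] = None
+    escalation: Optional[Escalation] = None
+    metadata: dict[str, str] = field(default_factory=dict)
+    schedule: Schedule = field(default_factory=Schedule)
+    next_task: Optional[str] = None
+    previous_task: Optional[str] = None
+    task_type: str = "AgentTask"  # discriminator for specialised types
+
+
+def link(first: Task, second: Task) -> None:
+    """Chain two tasks by setting both pointers (hypothetical helper)."""
+    first.next_task = second.id
+    second.previous_task = first.id
+```
+
+Keeping `next_task` and `previous_task` as plain IDs rather than object references mirrors the one-row-per-task storage described above.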
+
+---
+
+## Task Groups
+
+### Purpose
+A `TaskGroup` is a first-class chain participant, just like a task. It owns a set of children, waits for all of them to complete, and then advances to its `next` item — which can itself be either a task or another group.
+
+Children of a group can be any mix of:
+- `AgentTask` (standard or any specialised type: `ForeachTask`, `GotoTask`, `ConditionTask`)
+- Nested `TaskGroup` instances (groups can be nested arbitrarily deep)
+
+### TaskGroup Fields
+- id — Unique identifier.
+- name — Human-readable label.
+- type — `Sequential` or `Parallel`.
+  - `Sequential` — Children run one at a time in the declared order. When one child completes, the next child is activated.
+  - `Parallel` — All children are dispatched simultaneously. The group completes when every child has finished.
+- children — Ordered list of `GroupChildRef` entries. Each entry identifies a child by its ID and kind (`Task` or `Group`).
+- next — Optional `GroupChildRef` identifying the item to activate after this group completes. It can point to a task or another group.
+- previous — Optional `GroupChildRef` identifying the item that precedes this group in the chain.
+
+### GroupChildRef
+A `GroupChildRef` is a value object that points to either a task or a group:
+- child_id — Guid of the child.
+- kind — `Task` or `Group`.
+
+### Orchestrator Behavior
+- A group begins execution when it is activated (either by the root scheduler or by the `next` pointer of a predecessor).
+- **Sequential**: the engine activates the first child; when that child completes, it activates the second child, and so on. After the last child completes, the group itself is marked complete and activates its `next`.
+- **Parallel**: the engine activates all children simultaneously. After all children complete, the group is marked complete and activates its `next`.
+- If a child is itself a `TaskGroup`, the same rules apply recursively — the nested group must complete before it counts as done for its parent.
+- A child belongs to at most one group.
+- Cycles in the chain (via `next` or through children) must be rejected at validation time (`DependencyCycleException`).
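+
+The last two rules can be checked before anything is dispatched. The sketch below is one possible validation pass in Python, under simplifying assumptions: it only follows group-to-group edges (children of kind `Group` and `next` pointers to groups), it ignores the `next_task` chain of individual tasks, and the `validate` function name is hypothetical.
+
+```python
+from dataclasses import dataclass, field
+from typing import Optional
+
+
+class DependencyCycleException(Exception):
+    """Raised when a group is reachable from itself."""
+
+
+@dataclass(frozen=True)
+class GroupChildRef:
+    child_id: str
+    kind: str  # "Task" or "Group"
+
+
+@dataclass
+class TaskGroup:
+    id: str
+    type: str  # "Sequential" or "Parallel"
+    children: list[GroupChildRef] = field(default_factory=list)
+    next: Optional[GroupChildRef] = None
+
+
+def validate(groups: dict[str, TaskGroup]) -> None:
+    # Rule: a child belongs to at most one group.
+    owner: dict[GroupChildRef, str] = {}
+    for group in groups.values():
+        for ref in group.children:
+            if ref in owner:
+                raise ValueError(f"{ref.child_id} is in {owner[ref]} and {group.id}")
+            owner[ref] = group.id
+
+    # Rule: no cycles. Depth-first search with three states:
+    # absent = unvisited, False = in progress, True = finished.
+    state: dict[str, bool] = {}
+
+    def visit(group_id: str) -> None:
+        if state.get(group_id) is False:
+            raise DependencyCycleException(group_id)
+        if group_id in state:
+            return
+        state[group_id] = False
+        group = groups[group_id]
+        successors = list(group.children)
+        if group.next is not None:
+            successors.append(group.next)
+        for ref in successors:
+            if ref.kind == "Group":
+                visit(ref.child_id)
+        state[group_id] = True
+
+    for group_id in groups:
+        visit(group_id)
+```
+
+A full validator would also walk task-level `next_task` links; the traversal pattern stays the same.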
+
+---
+
+## Specialised Task Types
+
+Each specialised control-flow behaviour is a distinct domain entity. Each type is stored as a separate database row with a `task_type` discriminator column. All specialised types inherit the core task fields and `next_task` / `previous_task` chaining.
+
+### ForeachTask
+Iterates over a list of items and spawns one task instance per item.
+- foreach_items — Ordered list of string values to iterate over.
+- foreach_template_task_id — ID of the task template to clone per item.
+- Generated task instances are linked via `next_task` / `previous_task` in creation order and can be placed into a group.
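+
+A rough Python sketch of the expansion step follows. The ID scheme (`<foreach id>-<index>`) and the way each item is appended to the cloned prompt are assumptions; this document does not specify either.
+
+```python
+import copy
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class Task:
+    id: str
+    prompt: str
+    next_task: Optional[str] = None
+    previous_task: Optional[str] = None
+
+
+def expand_foreach(foreach_id: str, items: list[str], template: Task) -> list[Task]:
+    """Clone the template once per item and chain the clones in creation order."""
+    instances: list[Task] = []
+    for index, item in enumerate(items):
+        clone = copy.deepcopy(template)
+        clone.id = f"{foreach_id}-{index}"  # hypothetical ID scheme
+        clone.prompt = f"{template.prompt}\n\nItem: {item}"  # assumed item injection
+        clone.next_task = None
+        clone.previous_task = instances[-1].id if instances else None
+        if instances:
+            instances[-1].next_task = clone.id
+        instances.append(clone)
+    return instances
+```
+
+The returned list can then be wrapped in a `Parallel` group if the items do not depend on each other.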
+
+### GotoTask
+On completion, merges additional context into a target task and activates it — regardless of where the target sits in the chain.
+- goto_target_task_id — ID of the task to activate.
+- goto_prompt — Additional prompt text merged (appended) into the target task's `Prompt` before execution.
+
+### ConditionTask (If-Task)
+Evaluates a boolean expression against the task result and routes to one of two branches.
+- condition — A boolean expression evaluated against the task result.
+- true_next_task_id — ID of the task to activate when the condition evaluates to true.
+- false_next_task_id — ID of the task to activate when the condition evaluates to false.
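+
+The goto and condition behaviours reduce to two small routing steps. The Python sketch below assumes the condition is supplied as a callable over a string result and that prompts are joined with a newline; the expression language and the result format are not defined in this document.
+
+```python
+from dataclasses import dataclass
+from typing import Callable
+
+
+@dataclass
+class Task:
+    id: str
+    prompt: str
+
+
+def apply_goto(target: Task, goto_prompt: str) -> Task:
+    """Append goto_prompt to the target's prompt and return the task to activate."""
+    target.prompt = f"{target.prompt}\n{goto_prompt}"
+    return target
+
+
+def route_condition(
+    condition: Callable[[str], bool],
+    result: str,
+    true_next_task_id: str,
+    false_next_task_id: str,
+) -> str:
+    """Return the ID of the branch selected by the condition."""
+    return true_next_task_id if condition(result) else false_next_task_id
+```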
+
+---
+
+## Schedule
+
+### Types
+- time-based — Cron expression or ISO 8601 schedule expression.
+- datetime-event — Start at a fixed date/time instant or range (e.g. `2026-04-01T09:00:00Z`). This enables scheduled tasks.
+- immediate — Start as soon as created.
+
+The system uses normal event publishing and handling for task coordination; it does not use dedicated signal-based schedules.
+
+### Goto Tasks
+- Use a `GotoTask` (see **Specialised Task Types**) to jump to an arbitrary target task.
+- The target task receives the additional `goto_prompt` text merged (appended) into its `Prompt` and activates immediately when the `GotoTask` completes.
+
+### Retry Policy
+- attempts — integer.
+- backoff — fixed or exponential.
+- max_retries — integer.
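+
+One way to turn these fields into a delay calculation is sketched below in Python. The `base_delay_seconds` field and the doubling factor are assumptions, since the policy above names no delay value, and `retries_done` is a hypothetical counter of retries already performed.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class RetryPolicy:
+    backoff: str  # "fixed" or "exponential"
+    max_retries: int
+    base_delay_seconds: float = 5.0  # hypothetical; not part of the spec
+
+
+def next_delay(policy: RetryPolicy, retries_done: int) -> Optional[float]:
+    """Delay before the next retry, or None when retries are exhausted."""
+    if retries_done >= policy.max_retries:
+        return None
+    if policy.backoff == "fixed":
+        return policy.base_delay_seconds
+    if policy.backoff == "exponential":
+        return policy.base_delay_seconds * 2 ** retries_done
+    raise ValueError(f"unknown backoff: {policy.backoff}")
+```
+
+A `None` result is the point at which the orchestrator would hand the failure over to the escalation flow.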
+
+---
+
+## Complexity Level
+
+### Definition
+Defines how complex a task is and when it must be split.
+
+| Level       | Assignment rule (minutes)               | Characteristics |
+|-------------|-------------------------------------------|----------------|
+| Low         | resolve_time_estimate ≤ 30               | Simple logic; single topic. |
+| Middle      | 30 < resolve_time_estimate ≤ 60          | Moderate complexity; single topic. |
+| Hard        | resolve_time_estimate > 60               | Multi-topic; requires subtasks. |
+| ExtremHard  | resolve_time_estimate unknown/unbounded  | Multi-topic; unknown duration. |
+
+### Automatic Rules
+- resolve_time_estimate > 60 → Hard.
+- Missing or unbounded estimate → ExtremHard.
+- Parent prompt must be prepended to all subtasks.
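+
+The thresholds in the table map directly onto a small classifier. In this Python sketch the function name `classify` is hypothetical, and treating `None`, infinity, and NaN as "unknown/unbounded" is an assumption about how a missing estimate is represented.
+
+```python
+import math
+from typing import Optional
+
+
+def classify(resolve_time_estimate: Optional[float]) -> str:
+    """Map an estimate in minutes to a complexity level name."""
+    if (
+        resolve_time_estimate is None
+        or math.isinf(resolve_time_estimate)
+        or math.isnan(resolve_time_estimate)
+    ):
+        return "ExtremHard"
+    if resolve_time_estimate <= 30:
+        return "Low"
+    if resolve_time_estimate <= 60:
+        return "Middle"
+    return "Hard"
+```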
+
+---
+
+## TaskTag
+
+### Purpose
+Enum identifying task working areas and agent capabilities.
+
+### Example Values
+- DATA_INGESTION
+- NLP
+- IMAGE_PROCESSING
+- DEPLOYMENT
+- SECURITY
+- TESTING
+- DOCUMENTATION
+
+---
+
+## Agent
+
+### Core Fields
+- id — Unique identifier.
+- skills — List of assigned skills.
+- task_tags — List of TaskTag values the agent can handle.
+- choose_rules — Matching rules for accepting tasks.
+- capabilities — wait_for_subagents, call_subagent, return_status, escalation.
+
+### Matching Rules Examples
+- Accept if task has a single TaskTag equal to one of agent's TaskTags.
+- Accept if agent supports a superset of task tags.
+- Accept only when agent has all TaskTags required by the task.
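+
+Expressed over sets, these rules become short predicates. The Python sketch below uses hypothetical function names; under this reading the second and third rules reduce to the same subset check, so they share one function.
+
+```python
+def matches_single_tag(agent_tags: set[str], task_tags: set[str]) -> bool:
+    """Task has exactly one tag and the agent handles it."""
+    return len(task_tags) == 1 and task_tags <= agent_tags
+
+
+def matches_all_tags(agent_tags: set[str], task_tags: set[str]) -> bool:
+    """Agent tags are a superset of the task tags (agent has every required tag)."""
+    return task_tags <= agent_tags
+```
+
+Note that `matches_all_tags` accepts a task with no tags at all; whether untagged tasks should match any agent is left open here.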
+
+### Execution Behavior
+- Agent executes inside its container and returns success or failure.
+- On failure, escalation.reason and escalation.prompt are required.
+- Agents may spawn sub-agents and wait for them.
+
+---
+
+## Running Container
+
+### Definition
+Isolated runtime environment required for a task.
+
+### Attributes
+- image — Container image reference.
+- tools — List of required tools and binaries.
+- resources — CPU, memory, disk limits.
+- network_policy — Allowed network rules.
+- volumes — Mounts and persistence rules.
+
+### Constraints
+- Agents cannot access resources outside their container.
+- Containers must declare all tools at startup.
+
+---
+
+## Tool
+
+### Definition
+Runnable binary or library available inside a container.
+
+### Attributes
+- name — Tool name.
+- version — Required version.
+- purpose — Description of tool’s role.
+- usage_guidelines — Allowed and forbidden operations.
+- install_instructions — Installation steps.
+
+---
+
+## Failure and Escalation Flow
+
+### Agent Failure
+- Agent returns failure with escalation.reason and escalation.prompt.
+- Orchestrator evaluates retry policy.
+- If retries are exhausted → escalate to the parent agent or a human operator.
+
+### Logging
+- All failures and escalations must be logged with timestamps, container id, agent id, and error context.
+
+---
+
+## Frontend CLI
+
+### Capabilities
+- Manage Agents, Tasks, Tools, Containers, Complexity Levels, TaskTags, Schedules.
+- Create, update, delete, list resources.
+- Start tasks manually and view logs.
+- Trigger event-driven execution.
+
+### Example Commands
+- `orchestrate agent create --name <name> --task-tags <tags> --skills <skills>`
+- `orchestrate task create --name <name> --prompt-file <file> --container <image> --schedule <spec>`
+- `orchestrate container define --image <image> --tools <list> --resources <spec>`
+- `orchestrate task start --id <task-id>`
+- `orchestrate task status --id <task-id> --follow`
+
+---
+
+## Example Task Definition (YAML)
+
+```yaml
+id: task-200
+name: "Transform raw data"
+prompt: "Clean and normalize raw input data."
+container: "data-worker:v1.0"
+resolve_time_estimate: 20
+task_tags:
+  - DATA_INGESTION
+complexity_level: Low
+schedule:
+  type: immediate
+
+group_id: "group-preprocessing"
+
+tools:
+  - python:3.11
+  - pip:latest
+
+escalation:
+  reason: "Validation failed"
+  prompt: "Please inspect input data and retry."
+```