---
title: Agent Workflow Diagram
linkTitle: UAT Agent Workflow Diagram
weight: 5
description: Visual workflow of a typical Agent S (Autonomous UAT Agent) run (gui_agent_cli.py) across Ministral, Holo, and VNC
---
# Agent Workflow Diagram (Autonomous UAT Agent)

This page provides a visual sketch of the typical workflow (example: `gui_agent_cli.py`).
## Workflow (fallback without Mermaid)

If Mermaid rendering is not available or fails in your build, this section shows the same workflow as plain text.
```text
Operator/Prompt
  -> gui_agent_cli.py
     -> (1) Planning request   -> Ministral vLLM (thinking)
     <-     Next action intent
     -> (2) Screenshot capture -> VNC Desktop / Firefox
     <-     PNG screenshot
     -> (3) Grounding request  -> Holo vLLM (vision)
     <-     Coordinates + element metadata
     -> (4) Execute action     -> VNC Desktop / Firefox
     -> Artifacts saved        -> results/ (logs, screenshots, JSON)
```
| Step | From | To | What | Output |
|---|---|---|---|---|
| 0 | Operator | gui_agent_cli.py | Provide goal / prompt | Goal text |
| 1 | gui_agent_cli.py | Ministral vLLM | Plan next step (text) | Next action intent |
| 2 | gui_agent_cli.py | VNC Desktop | Capture screenshot | PNG screenshot |
| 3 | gui_agent_cli.py | Holo vLLM | Ground UI element(s) | Coordinates + element metadata |
| 4 | gui_agent_cli.py | VNC Desktop | Execute click/type/scroll | UI state change |
| 5 | gui_agent_cli.py | results/ | Persist evidence | Logs + screenshots + JSON |
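The loop in the table above can be sketched in Python. All names here (`AgentRun`, `run_step`, the stubbed `plan`/`capture`/`ground`/`execute` callables) are hypothetical illustrations, not the actual `gui_agent_cli.py` API:

```python
# Illustrative sketch of one Agent S step (steps 1-4 from the table).
# The collaborator callables stand in for the real Ministral/Holo/VNC calls.
from dataclasses import dataclass, field


@dataclass
class StepRecord:
    intent: str       # from the thinking model (Ministral)
    coords: tuple     # from the grounding model (Holo)
    screenshot: bytes  # PNG bytes captured from the VNC desktop


@dataclass
class AgentRun:
    goal: str
    history: list = field(default_factory=list)

    def run_step(self, plan, capture, ground, execute):
        intent = plan(self.goal, self.history)  # 1. Ministral: plan next step
        png = capture()                         # 2. VNC: capture screenshot
        coords = ground(png, intent)            # 3. Holo: ground UI element
        execute(intent, coords)                 # 4. VNC: execute the action
        self.history.append(StepRecord(intent, coords, png))  # 5. evidence
        return intent


# One iteration with stubbed collaborators:
run = AgentRun(goal="Open example.org in Firefox")
intent = run.run_step(
    plan=lambda goal, hist: "click address bar",
    capture=lambda: b"\x89PNG...",
    ground=lambda png, intent: (412, 63),
    execute=lambda intent, coords: None,
)
print(intent)            # click address bar
print(len(run.history))  # 1
```

The real script repeats this loop until the planner signals completion; the stubs make the control flow visible without any network dependencies.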
## High-level data flow
```mermaid
flowchart LR
    %% Left-to-right overview of one typical agent loop
    user[Operator / Prompt] --> cli[Agent S script<br/>gui_agent_cli.py]
    subgraph OTC["OTC (Open Telekom Cloud)"]
        subgraph MIN_HOST[ecs_ministral_L4]
            MIN[(Ministral 3 8B<br/>Thinking / Planning)]
        end
        subgraph HOLO_HOST[ecs_holo_A40]
            HOLO[(Holo 1.5-7B<br/>Vision / Grounding)]
        end
        subgraph TARGET[GUI test target]
            VNC[VNC / Desktop]
            FF[Firefox]
            VNC --> FF
        end
    end
    cli -->|1. plan step<br/>vLLM_THINKING_ENDPOINT| MIN
    MIN -->|next action<br/>click / type / wait| cli
    cli -->|2. capture screenshot| VNC
    VNC -->|"screenshot (PNG)"| cli
    cli -->|3. grounding request<br/>vLLM_VISION_ENDPOINT| HOLO
    HOLO -->|coordinates + UI element info| cli
    cli -->|4. execute action<br/>mouse / keyboard| VNC
    cli -->|logs + screenshots| artifacts[(Artifacts<br/>logs, screenshots, JSON comms)]
```
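The diagram's edge labels name two endpoint variables, `vLLM_THINKING_ENDPOINT` and `vLLM_VISION_ENDPOINT`. A minimal configuration sketch reading them from the environment might look like this; the default URLs below are made-up placeholders, not the actual deployment addresses:

```python
# Hypothetical endpoint configuration matching the diagram's edge labels.
# Only the variable names come from the diagram; the fallback URLs are
# illustrative placeholders for the two vLLM servers.
import os

# Ministral (text-only planning) on the ecs_ministral_L4 host:
THINKING_ENDPOINT = os.environ.get(
    "vLLM_THINKING_ENDPOINT", "http://ecs-ministral-l4:8000/v1")

# Holo (vision grounding) on the ecs_holo_A40 host:
VISION_ENDPOINT = os.environ.get(
    "vLLM_VISION_ENDPOINT", "http://ecs-holo-a40:8000/v1")

print(THINKING_ENDPOINT)
print(VISION_ENDPOINT)
```

Keeping the two endpoints as separate variables mirrors the deliberate split between the planning and grounding models.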
## Sequence (one loop)
```mermaid
sequenceDiagram
    autonumber
    actor U as Operator
    participant CLI as gui_agent_cli.py
    participant MIN as Ministral vLLM (ecs_ministral_L4)
    participant VNC as VNC Desktop (Firefox)
    participant HOLO as Holo vLLM (ecs_holo_A40)
    U->>CLI: Provide goal / prompt
    loop Step loop (until done)
        CLI->>MIN: Plan next step (text-only reasoning)
        MIN-->>CLI: Next action (intent)
        CLI->>VNC: Capture screenshot
        VNC-->>CLI: Screenshot (image)
        CLI->>HOLO: Ground UI element(s) in screenshot
        HOLO-->>CLI: Coordinates + element metadata
        CLI->>VNC: Execute click/type/scroll
    end
    CLI-->>U: Result summary + saved artifacts
```
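The final "saved artifacts" message corresponds to step 5 in the table: persisting evidence under `results/`. A sketch of such a persistence helper is shown below; the directory layout and file names (`step_NNN/`, `screenshot.png`, `comms.json`) are assumptions for illustration, not the layout `gui_agent_cli.py` actually produces:

```python
# Sketch of step-5 evidence persistence into a results/ directory.
# Layout and file names are assumed, not taken from gui_agent_cli.py.
import json
import tempfile
from pathlib import Path


def persist_step(results_dir: Path, step_no: int,
                 png: bytes, comms: dict) -> Path:
    """Save the screenshot and JSON comms log for one step; return its dir."""
    step_dir = results_dir / f"step_{step_no:03d}"
    step_dir.mkdir(parents=True, exist_ok=True)
    (step_dir / "screenshot.png").write_bytes(png)
    (step_dir / "comms.json").write_text(json.dumps(comms, indent=2))
    return step_dir


# Example: persist one step's evidence into a throwaway results/ tree.
results = Path(tempfile.mkdtemp()) / "results"
out = persist_step(results, 1, b"\x89PNG...",
                   {"intent": "click", "coords": [412, 63]})
print(sorted(p.name for p in out.iterdir()))  # ['comms.json', 'screenshot.png']
```

Grouping each step's screenshot together with the JSON of the model exchange keeps the evidence self-describing when it is later bundled.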
## Notes
- The thinking and grounding models are deliberately separate: splitting planning (text) from grounding (vision) improves coordinate reliability and makes failures easier to debug.
- The agent loop typically produces artifacts (logs + screenshots) that are later copied into D66 evidence bundles.