---
title: Agent Workflow Diagram
linkTitle: UAT Agent Workflow Diagram
weight: 5
description: Visual workflow of a typical Agent S (Autonomous UAT Agent) run (gui_agent_cli.py) across Ministral, Holo, and VNC
---

# Agent Workflow Diagram (Autonomous UAT Agent)

This page provides a visual sketch of the typical workflow (example: gui_agent_cli.py).

## Workflow (fallback without Mermaid)

If Mermaid rendering is not available or fails in your build, this section shows the same workflow as plain text.

```text
Operator/Prompt
  -> gui_agent_cli.py
     -> (1) Planning request  -> Ministral vLLM (thinking)
     <-     Next action intent
     -> (2) Screenshot capture -> VNC Desktop / Firefox
     <-     PNG screenshot
     -> (3) Grounding request  -> Holo vLLM (vision)
     <-     Coordinates + element metadata
     -> (4) Execute action     -> VNC Desktop / Firefox
     -> Artifacts saved        -> results/ (logs, screenshots, JSON)
```

| Step | From | To | What | Output |
|------|------|----|------|--------|
| 0 | Operator | gui_agent_cli.py | Provide goal / prompt | Goal text |
| 1 | gui_agent_cli.py | Ministral vLLM | Plan next step (text) | Next action intent |
| 2 | gui_agent_cli.py | VNC Desktop | Capture screenshot | PNG screenshot |
| 3 | gui_agent_cli.py | Holo vLLM | Ground UI element(s) | Coordinates + element metadata |
| 4 | gui_agent_cli.py | VNC Desktop | Execute click/type/scroll | UI state change |
| 5 | gui_agent_cli.py | results/ | Persist evidence | Logs + screenshots + JSON |
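
Steps 2 and 4 both talk to the VNC desktop. As a rough illustration, the sketch below captures a screenshot and clicks a grounded coordinate over VNC using vncdotool; the library choice, host, port, password, and file names are assumptions of this sketch, not necessarily what gui_agent_cli.py actually does.

```python
# Hedged sketch of steps 2 and 4: screenshot capture and action execution
# over VNC. vncdotool is an assumed transport; host, port, password, and
# file names are placeholders, not what gui_agent_cli.py necessarily uses.
from vncdotool import api

# "host::5901" addresses the VNC server by raw port rather than display number.
client = api.connect("vnc-host::5901", password="secret")

# (2) Capture the current desktop state as a PNG for the grounding model.
client.captureScreen("results/step_003.png")

# (4) Execute the grounded action, e.g. click the coordinates Holo returned.
client.mouseMove(412, 318)
client.mousePress(1)  # button 1 = left click

api.shutdown()  # tear down the vncdotool event loop
```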

## High-level data flow

```mermaid
flowchart LR
  %% Left-to-right overview of one typical agent loop

  user[Operator / Prompt] --> cli[Agent S script<br/>gui_agent_cli.py]

  subgraph OTC["OTC (Open Telekom Cloud)"]
    subgraph MIN_HOST[ecs_ministral_L4]
      MIN[(Ministral 3 8B<br/>Thinking / Planning)]
    end

    subgraph HOLO_HOST[ecs_holo_A40]
      HOLO[(Holo 1.5-7B<br/>Vision / Grounding)]
    end

    subgraph TARGET[GUI test target]
      VNC[VNC / Desktop]
      FF[Firefox]
      VNC --> FF
    end
  end

  cli -->|1. plan step<br/>vLLM_THINKING_ENDPOINT| MIN
  MIN -->|next action<br/>click / type / wait| cli

  cli -->|2. capture screenshot| VNC
  VNC -->|"screenshot (PNG)"| cli

  cli -->|3. grounding request<br/>vLLM_VISION_ENDPOINT| HOLO
  HOLO -->|coordinates + UI element info| cli

  cli -->|4. execute action<br/>mouse / keyboard| VNC

  cli -->|logs + screenshots| artifacts[(Artifacts<br/>logs, screenshots, JSON comms)]
```
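
In code, step 1 of the flow above is a plain chat-completion call against the planning server. A minimal sketch, assuming the OpenAI-compatible API that vLLM exposes; the host, model name, and prompt are placeholders, and the real base URL would come from vLLM_THINKING_ENDPOINT:

```python
# Hedged sketch of step 1: ask the planning model for the next action.
# Host, port, and model name are assumptions; vLLM serves an
# OpenAI-compatible API, so the official openai client can talk to it.
from openai import OpenAI

planner = OpenAI(base_url="http://<ecs_ministral_L4-ip>:8000/v1", api_key="EMPTY")

resp = planner.chat.completions.create(
    model="ministral",  # whatever --served-model-name vLLM was started with
    messages=[
        {"role": "system", "content": "You are a GUI test agent. Propose exactly one next action."},
        {"role": "user", "content": "Goal: log in to the portal. Last observation: start page is loaded."},
    ],
)
print(resp.choices[0].message.content)  # e.g. "click the 'Login' button"
```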

## Sequence (one loop)

```mermaid
sequenceDiagram
  autonumber
  actor U as Operator
  participant CLI as gui_agent_cli.py
  participant MIN as Ministral vLLM (ecs_ministral_L4)
  participant VNC as VNC Desktop (Firefox)
  participant HOLO as Holo vLLM (ecs_holo_A40)

  U->>CLI: Provide goal / prompt

  loop Step loop (until done)
    CLI->>MIN: Plan next step (text-only reasoning)
    MIN-->>CLI: Next action (intent)

    CLI->>VNC: Capture screenshot
    VNC-->>CLI: Screenshot (image)

    CLI->>HOLO: Ground UI element(s) in screenshot
    HOLO-->>CLI: Coordinates + element metadata

    CLI->>VNC: Execute click/type/scroll
  end

  CLI-->>U: Result summary + saved artifacts
```
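
Step 3 above sends the latest screenshot to the vision model. A minimal sketch, assuming the OpenAI-compatible multimodal message format that vLLM serves for vision models; host, model name, file name, and prompt are placeholders, with the real base URL coming from vLLM_VISION_ENDPOINT:

```python
# Hedged sketch of step 3: ground a UI element in the latest screenshot.
# The image_url/base64 message shape is the OpenAI-compatible vision format
# that vLLM accepts; host, model name, and prompt are assumptions.
import base64
from openai import OpenAI

grounder = OpenAI(base_url="http://<ecs_holo_A40-ip>:8000/v1", api_key="EMPTY")

with open("results/step_003.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = grounder.chat.completions.create(
    model="holo",  # whatever --served-model-name vLLM was started with
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Locate the 'Login' button and return x,y pixel coordinates."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)  # e.g. "(412, 318)"
```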

## Notes

- The thinking and grounding models are separated on purpose: this improves coordinate reliability and makes failures easier to debug.
- The agent loop typically produces artifacts (logs + screenshots) which are later copied into D66 evidence bundles; a minimal persistence sketch follows below.
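
As a rough illustration of that persistence step, the sketch below writes one JSON record per step under results/, next to the step's screenshot; the actual layout and field names used by gui_agent_cli.py may differ.

```python
# Hedged sketch of artifact persistence: one JSON record per step under
# results/, next to the step's screenshot. Layout and fields are assumptions.
import json
import time
from pathlib import Path

run_dir = Path("results") / time.strftime("run_%Y%m%d_%H%M%S")
run_dir.mkdir(parents=True, exist_ok=True)

step_record = {
    "step": 3,
    "intent": "click the 'Login' button",   # from Ministral (planning)
    "coordinates": [412, 318],              # from Holo (grounding)
    "screenshot": "step_003.png",           # captured from the VNC desktop
}

# One JSON record per step keeps model I/O auditable for evidence bundles.
with (run_dir / "step_003.json").open("w") as f:
    json.dump(step_record, f, indent=2)
```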