From 097724558acd86b03d22665b7970d300628eda45 Mon Sep 17 00:00:00 2001
From: Tom Sakretz
Date: Fri, 30 Jan 2026 15:23:28 +0100
Subject: [PATCH] Minor updates to existing pages

---
 .../en/docs/Autonomous UAT Agent/_index.md |  9 ---
 .../agent-workflow-diagram.md              | 68 +++++++++++++------
 .../golden-run-telekom-header-nav/index.md | 28 +-------
 .../running-auata-scripts.md               |  4 +-
 4 files changed, 51 insertions(+), 58 deletions(-)

diff --git a/content/en/docs/Autonomous UAT Agent/_index.md b/content/en/docs/Autonomous UAT Agent/_index.md
index 3e3442b..0800a62 100644
--- a/content/en/docs/Autonomous UAT Agent/_index.md
+++ b/content/en/docs/Autonomous UAT Agent/_index.md
@@ -10,12 +10,3 @@ description: >
 This section contains the core documentation for D66, focusing on how the Autonomous UAT Agent works and how to run it.
-## Pages
-
-- [Overview](./overview.md)
-- [Quickstart](./quickstart.md)
-- [Running Autonomous UAT Agent Scripts](./running-auata-scripts.md)
-- [Workflow Diagram](./agent-workflow-diagram.md)
-- [Model Stack](./model-stack.md)
-- [Outputs & Artifacts](./outputs-and-artifacts.md)
-- [Troubleshooting](./troubleshooting.md)
diff --git a/content/en/docs/Autonomous UAT Agent/agent-workflow-diagram.md b/content/en/docs/Autonomous UAT Agent/agent-workflow-diagram.md
index 5215ab5..ff85f0d 100644
--- a/content/en/docs/Autonomous UAT Agent/agent-workflow-diagram.md
+++ b/content/en/docs/Autonomous UAT Agent/agent-workflow-diagram.md
@@ -10,42 +10,68 @@ description: >
 This page provides a **visual sketch** of the typical workflow (example: `gui_agent_cli.py`).
+## Workflow (fallback without Mermaid)
+
+If Mermaid rendering is not available or fails in your build, this section shows the same workflow as plain text.
+
+```text
+Operator/Prompt
+ -> gui_agent_cli.py
+ -> (1) Planning request -> Ministral vLLM (thinking)
+ <- Next action intent
+ -> (2) Screenshot capture -> VNC Desktop / Firefox
+ <- PNG screenshot
+ -> (3) Grounding request -> Holo vLLM (vision)
+ <- Coordinates + element metadata
+ -> (4) Execute action -> VNC Desktop / Firefox
+ -> Artifacts saved -> results/ (logs, screenshots, JSON)
+```
+
+| Step | From | To | What | Output |
+|---:|---|---|---|---|
+| 0 | Operator | gui_agent_cli.py | Provide goal / prompt | Goal text |
+| 1 | gui_agent_cli.py | Ministral vLLM | Plan next step (text) | Next action intent |
+| 2 | gui_agent_cli.py | VNC Desktop | Capture screenshot | PNG screenshot |
+| 3 | gui_agent_cli.py | Holo vLLM | Ground UI element(s) | Coordinates + element metadata |
+| 4 | gui_agent_cli.py | VNC Desktop | Execute click/type/scroll | UI state change |
+| 5 | gui_agent_cli.py | results/ | Persist evidence | Logs + screenshots + JSON |
+
 ## High-level data flow
 ```mermaid
 flowchart LR
 %% Left-to-right overview of one typical agent loop
- user[Operator / Prompt] --> cli[Agent S script (gui_agent_cli.py)]
+ user[Operator / Prompt] --> cli[Agent S script<br/>gui_agent_cli.py]
- subgraph otc[OTC (Open Telekom Cloud)]
- subgraph ecsMin["ecs_ministral_L4 (164.30.28.242:8001) - Ministral vLLM"]
- ministral[(Ministral 3 8B - Thinking / Planning)]
+ subgraph OTC[OTC (Open Telekom Cloud)]
+ subgraph MIN_HOST[ecs_ministral_L4]
+ MIN[(Ministral 3 8B<br/>Thinking / Planning)]
 end
- subgraph ecsHolo["ecs_holo_A40 (164.30.22.166:8000) - Holo vLLM"]
- holo[(Holo 1.5-7B - Vision / Grounding)]
+ subgraph HOLO_HOST[ecs_holo_A40]
+ HOLO[(Holo 1.5-7B<br/>Vision / Grounding)]
 end
- subgraph ecsGui["GUI test target (VNC + Firefox)"]
- vnc[VNC / Desktop]
- browser[Firefox]
+ subgraph TARGET[GUI test target]
+ VNC[VNC / Desktop]
+ FF[Firefox]
+ VNC --> FF
 end
 end
- cli -->|1. plan step (vLLM_THINKING_ENDPOINT)| ministral
- ministral -->|next action (click/type/wait)| cli
+ cli -->|1. plan step<br/>vLLM_THINKING_ENDPOINT| MIN
+ MIN -->|next action<br/>click / type / wait| cli
- cli -->|2. capture screenshot| vnc
- vnc -->|screenshot (PNG)| cli
+ cli -->|2. capture screenshot| VNC
+ VNC -->|screenshot (PNG)| cli
- cli -->|3. grounding request (vLLM_VISION_ENDPOINT)| holo
- holo -->|coordinates + UI element info| cli
+ cli -->|3. grounding request<br/>vLLM_VISION_ENDPOINT| HOLO
+ HOLO -->|coordinates + UI element info| cli
- cli -->|4. execute action (mouse/keyboard)| vnc
- vnc --> browser
+ cli -->|4. execute action<br/>mouse / keyboard| VNC
- cli -->|logs + screenshots (results/ folder)| artifacts[(Artifacts: logs, screenshots, JSON comms)]
+ cli -->|logs + screenshots| artifacts[(Artifacts<br/>logs, screenshots, JSON comms)]
 ```
 ## Sequence (one loop)
@@ -55,9 +81,9 @@ sequenceDiagram
 autonumber
 actor U as Operator
 participant CLI as gui_agent_cli.py
- participant MIN as Ministral vLLM\n(ecs_ministral_L4)
- participant VNC as VNC Desktop\n(Firefox)
- participant HOLO as Holo vLLM\n(ecs_holo_A40)
+ participant MIN as Ministral vLLM (ecs_ministral_L4)
+ participant VNC as VNC Desktop (Firefox)
+ participant HOLO as Holo vLLM (ecs_holo_A40)
 U->>CLI: Provide goal / prompt
diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/index.md b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/index.md
index 4d72914..fd2b196 100644
--- a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/index.md
+++ b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/index.md
@@ -35,32 +35,11 @@ Return findings in the 'issues' field as a list of objects:
 - problem: What doesn't work
 - recommendation: How to fix it
 If no problems found, return an empty array: []" \
- --max-steps 30
+ --max-steps 15
 ```
 ## Artifacts
-### In-repo evidence (this page bundle)
-
-Place the evidence files here:
-
-- Screenshots: `screenshots/`
-- Text log: `logs/run.log`
-- Optional JSON communication log(s): `logs/calibration_log_*.json`
-
-If you have ~15 screenshots, name them in a stable order, e.g.:
-
-- `screenshots/uat_agent_step_001.png` … `screenshots/uat_agent_step_015.png`
-
-### Runtime output location (where they come from)
-
-The CLI defaults to:
-
-- `./results/gui_agent_cli//screenshots/`
-- `./results/gui_agent_cli//logs/run.log`
-
-Copy the files you want to publish into this page bundle so they render in the docs.
-
 ## Screenshot gallery
 ### Thumbnail grid (recommended for many screenshots)
@@ -135,8 +114,3 @@ Click any thumbnail to open the full image.
 {{< figure src="screenshots/uat_agent_step_013.png" caption="Step 013" >}}
-
-## Notes
-
-- If repo size becomes an issue, publish only a curated subset (e.g. 6–8 key frames) and link to the full run folder externally.
-- If you want a thumbnail grid instead of full-width figures, say so and BMad Master will add a compact gallery layout.
diff --git a/content/en/docs/Autonomous UAT Agent/running-auata-scripts.md b/content/en/docs/Autonomous UAT Agent/running-auata-scripts.md
index f483835..3408f3f 100644
--- a/content/en/docs/Autonomous UAT Agent/running-auata-scripts.md
+++ b/content/en/docs/Autonomous UAT Agent/running-auata-scripts.md
@@ -12,6 +12,8 @@ The **Autonomous UAT Agent** is the overall UX/UI testing use case built on top
 All commands below assume you are running from the **Agent-S repository root** (Linux/ECS), `~/Projects/Agent_S3/Agent-S`. To do that, connect to the server via SSH. You will need a key pair for authentication and an open inbound port in the firewall. For information on how to obtain the key pair and request firewall access, contact [tom.sakretz@telekom.de](mailto:tom.sakretz@telekom.de).
+## Template for running a script from command line terminal
+
 ### 1) Connect from Windows
 ```powershell
@@ -36,7 +38,7 @@ firefox &
 ### 3) One-command recommended run (ECS)
-If you only run one thing to produce clean, repeatable evidence (screenshots with click markers), run the following command CLI:
+If you only want to produce clean, repeatable evidence (screenshots with click markers), run the following command CLI:
 ```bash
 python staging_scripts/gui_agent_cli.py --prompt "Go to telekom.de and click the cart icon" --max-steps 10