diff --git a/content/en/docs/Autonomous UAT Agent/model-stack.md b/content/en/docs/Autonomous UAT Agent/model-stack.md index 0282683..d0589a6 100644 --- a/content/en/docs/Autonomous UAT Agent/model-stack.md +++ b/content/en/docs/Autonomous UAT Agent/model-stack.md @@ -6,13 +6,13 @@ description: > Thinking vs grounding model split for D66 (current state and target state) --- -# Model Stack (D66) +# Model Stack For a visual overview of how the models interact with the VNC-based GUI automation loop, see: [Workflow Diagram](./agent-workflow-diagram.md) ## Requirement -D66 must use **open-source models from European companies**. +The Autonomous UAT Agent must use **open-source models from European companies**. This has been a project requirement from the very beginning. ## Target setup @@ -45,7 +45,6 @@ The Agent S framework runs an iterative loop: it uses a reasoning model to decid - Deployment: vLLM OpenAI-compatible endpoint (chat completions) - Endpoint env var: `vLLM_THINKING_ENDPOINT` - Current server (deployment reference): `http://164.30.28.242:8001/v1` - - Recommendation: set `vLLM_THINKING_ENDPOINT` explicitly (do not rely on script defaults). **Operational note:** vLLM is configured to **auto-start on server boot** (OTC ECS restart) via `systemd`. 
@@ -80,9 +79,4 @@ The Agent S framework runs an iterative loop: it uses a reasoning model to decid - `grounding_width`: `3840` - `grounding_height`: `2160` -Notes: - -- Prompting and output-format hardening (reliability work): - - `docs/story-026-001-context.md` (Holo output reliability) - - `docs/story-025-001-context.md` (double grounding / calibration) diff --git a/content/en/docs/Autonomous UAT Agent/results/_index.md b/content/en/docs/Autonomous UAT Agent/results/_index.md new file mode 100644 index 0000000..8de5742 --- /dev/null +++ b/content/en/docs/Autonomous UAT Agent/results/_index.md @@ -0,0 +1,17 @@ +--- +title: "Results & Findings" +linkTitle: "Results" +weight: 20 +description: > + Results, findings, and evidence artifacts for D66 +--- + +# Results & Findings (D66) + +This section contains the outputs that support D66 claims: findings summaries and pointers to logs, screenshots, and run artifacts. + +## Pages + +- [PoC Validation](./poc-validation.md) +- [Golden Run (Telekom Header Navigation)](./golden-run-telekom-header-nav/) +- [Logs & Artifacts](./logs-and-artifacts.md) diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/index.md b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/index.md new file mode 100644 index 0000000..4d72914 --- /dev/null +++ b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/index.md @@ -0,0 +1,142 @@ +--- +title: "Golden Run: Telekom Header Navigation" +linkTitle: "Golden Run (Telekom)" +weight: 3 +description: > + Evidence pack (screenshots + logs) for the golden run on www.telekom.de header navigation +--- + +# Golden Run: Telekom Header Navigation + +This page is the evidence pack for the **Autonomous UAT Agent** golden run on **www.telekom.de**. 
+ +## Run intent + +- Goal: Test interactive elements in the header navigation for functional weaknesses +- Output: Click-marked screenshots + per-run log (and optionally model communication JSON) + +## How the run was executed (ECS) + +Command (as used in the runbook): + +```bash +python staging_scripts/gui_agent_cli.py \ + --prompt "Role: You are a UI/UX testing agent specializing in functional correctness. +Goal: Test all interactive elements in the header navigation on www.telekom.de for functional weaknesses. +Tasks: +1. Navigate to the website +2. Identify and test interactive elements (buttons, links, forms, menus) +3. Check for broken flows, defective links, non-functioning elements +4. Document issues found +Report Format: +Return findings in the 'issues' field as a list of objects: +- element: Name/description of the element +- location: Where on the page +- problem: What doesn't work +- recommendation: How to fix it +If no problems found, return an empty array: []" \ + --max-steps 30 +``` + +## Artifacts + +### In-repo evidence (this page bundle) + +Place the evidence files here: + +- Screenshots: `screenshots/` +- Text log: `logs/run.log` +- Optional JSON communication log(s): `logs/calibration_log_*.json` + +If you have ~15 screenshots, name them in a stable order, e.g.: + +- `screenshots/uat_agent_step_001.png` … `screenshots/uat_agent_step_015.png` + +### Runtime output location (where they come from) + +The CLI defaults to: + +- `./results/gui_agent_cli//screenshots/` +- `./results/gui_agent_cli//logs/run.log` + +Copy the files you want to publish into this page bundle so they render in the docs. + +## Screenshot gallery + +### Thumbnail grid (recommended for many screenshots) + +Click any thumbnail to open the full image. + +
+ [Thumbnail grid: UAT agent steps 001 to 008 and 010 to 013, each captioned "Step NNN"]
+
+ Full-size images (stacked) + + {{< figure src="screenshots/uat_agent_step_001.png" caption="Step 001" >}} + {{< figure src="screenshots/uat_agent_step_002.png" caption="Step 002" >}} + {{< figure src="screenshots/uat_agent_step_003.png" caption="Step 003" >}} + {{< figure src="screenshots/uat_agent_step_004.png" caption="Step 004" >}} + {{< figure src="screenshots/uat_agent_step_005.png" caption="Step 005" >}} + {{< figure src="screenshots/uat_agent_step_006.png" caption="Step 006" >}} + {{< figure src="screenshots/uat_agent_step_007.png" caption="Step 007" >}} + {{< figure src="screenshots/uat_agent_step_008.png" caption="Step 008" >}} + {{< figure src="screenshots/uat_agent_step_010.png" caption="Step 010" >}} + {{< figure src="screenshots/uat_agent_step_011.png" caption="Step 011" >}} + {{< figure src="screenshots/uat_agent_step_012.png" caption="Step 012" >}} + {{< figure src="screenshots/uat_agent_step_013.png" caption="Step 013" >}} + +
+ +## Notes + +- If repo size becomes an issue, publish only a curated subset (e.g. 6–8 key frames) and link to the full run folder externally. +- If you want a thumbnail grid instead of full-width figures, say so and BMad Master will add a compact gallery layout. diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/logs/.gitkeep b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/logs/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/.gitkeep b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/.gitkeep new file mode 100644 index 0000000..e69de29 diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_001.png b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_001.png new file mode 100644 index 0000000..d5a6b41 Binary files /dev/null and b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_001.png differ diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_002.png b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_002.png new file mode 100644 index 0000000..8548a2f Binary files /dev/null and b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_002.png differ diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_003.png b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_003.png new file mode 100644 index 0000000..e1940e5 Binary files /dev/null and b/content/en/docs/Autonomous UAT 
Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_003.png differ diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_004.png b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_004.png new file mode 100644 index 0000000..780e63a Binary files /dev/null and b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_004.png differ diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_005.png b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_005.png new file mode 100644 index 0000000..8049842 Binary files /dev/null and b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_005.png differ diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_006.png b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_006.png new file mode 100644 index 0000000..03e696e Binary files /dev/null and b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_006.png differ diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_007.png b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_007.png new file mode 100644 index 0000000..86188b1 Binary files /dev/null and b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_007.png differ diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_008.png b/content/en/docs/Autonomous UAT 
Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_008.png new file mode 100644 index 0000000..3364330 Binary files /dev/null and b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_008.png differ diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_010.png b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_010.png new file mode 100644 index 0000000..7d6b64c Binary files /dev/null and b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_010.png differ diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_011.png b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_011.png new file mode 100644 index 0000000..9911841 Binary files /dev/null and b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_011.png differ diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_012.png b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_012.png new file mode 100644 index 0000000..c92c6c0 Binary files /dev/null and b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_012.png differ diff --git a/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_013.png b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_013.png new file mode 100644 index 0000000..4a1eb08 Binary files /dev/null and b/content/en/docs/Autonomous UAT Agent/results/golden-run-telekom-header-nav/screenshots/uat_agent_step_013.png differ diff --git 
a/content/en/docs/Autonomous UAT Agent/results/logs-and-artifacts.md b/content/en/docs/Autonomous UAT Agent/results/logs-and-artifacts.md new file mode 100644 index 0000000..cee348f --- /dev/null +++ b/content/en/docs/Autonomous UAT Agent/results/logs-and-artifacts.md @@ -0,0 +1,36 @@ +--- +title: "Logs & Artifacts" +linkTitle: "Logs & Artifacts" +weight: 2 +description: > + Where to find logs, screenshots, and reports relevant to D66 +--- + +# Logs & Artifacts + +## Repo locations + +- Local calibration and run logs: `logs/` +- Script outputs (varies by run): + - `Backend/IPCEI-UX-Agent-S3/staging_scripts/uxqa.db` + - `Backend/IPCEI-UX-Agent-S3/staging_scripts/Screenshots/` + - `Backend/IPCEI-UX-Agent-S3/staging_scripts/agent_output/` + +- Golden run evidence pack (recommended publishing location in docs): + - `docs/D66/results/golden-run-telekom-header-nav/` + +## What to capture for D66 + +- A representative run per capability: + - functional correctness checks + - visual quality audits + - task-based UX smoke tests +- For each run, capture: + - target URL + - timestamp + - key screenshots/overlays + - issue summaries (structured) + +## Notes + +If needed, we can add a consistent run naming convention and a small “how to export a D66 evidence pack” procedure. 
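The "export a D66 evidence pack" procedure mentioned in the note above could look roughly like the following sketch. It is not part of the repository: the helper name `export_evidence_pack` is hypothetical, and it assumes the run folder layout listed in these docs (`screenshots/`, `logs/run.log`, and optional `logs/calibration_log_*.json`).

```python
import shutil
from pathlib import Path
from typing import List


def export_evidence_pack(run_dir: Path, bundle_dir: Path) -> List[Path]:
    """Copy screenshots and logs from one run folder into a docs page bundle.

    Hypothetical helper: the folder layout (screenshots/, logs/run.log,
    logs/calibration_log_*.json) follows the paths listed in these docs.
    """
    copied: List[Path] = []

    # Screenshots keep their original file names so step numbers stay stable.
    shots_dst = bundle_dir / "screenshots"
    shots_dst.mkdir(parents=True, exist_ok=True)
    for src in sorted((run_dir / "screenshots").glob("*.png")):
        copied.append(Path(shutil.copy2(src, shots_dst / src.name)))

    # The text log is expected; JSON communication logs exist only if
    # logging was enabled for the run, so missing files are skipped.
    logs_src = run_dir / "logs"
    logs_dst = bundle_dir / "logs"
    logs_dst.mkdir(parents=True, exist_ok=True)
    log_files = [logs_src / "run.log"] + sorted(logs_src.glob("calibration_log_*.json"))
    for src in log_files:
        if src.is_file():
            copied.append(Path(shutil.copy2(src, logs_dst / src.name)))

    return copied
```

Call it with the run folder as the first argument and the page bundle folder as the second; the returned list shows which files were copied, which can be pasted into a review note.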
diff --git a/content/en/docs/Autonomous UAT Agent/results/poc-validation.md b/content/en/docs/Autonomous UAT Agent/results/poc-validation.md new file mode 100644 index 0000000..f2a345a --- /dev/null +++ b/content/en/docs/Autonomous UAT Agent/results/poc-validation.md @@ -0,0 +1,29 @@ +--- +title: "PoC Validation" +linkTitle: "PoC Validation" +weight: 1 +description: > + What was validated and where to find the evidence +--- + +# PoC Validation Evidence + +## What was validated + +- Autonomous GUI interaction via the Autonomous UAT Agent (Agent S3-based scripts) +- Generation of UX findings and recommendations +- Production of reproducible artifacts (screenshots, logs) + +## Where to find evidence in this repo + +- Run logs and calibration logs: `logs/` +- Story evidence and investigation notes: + - `docs/story-025-001-context.md` + - `docs/story-026-001-context.md` + - `docs/story-023-003-coordinate-space-detection.md` + +## How to reproduce a run + +1. Choose a script in `Backend/IPCEI-UX-Agent-S3/staging_scripts/` +2. Set target URL (if supported) via `AS2_TARGET_URL` +3. Run and capture artifacts (see `docs/D66/documentation/outputs-and-artifacts.md`) diff --git a/content/en/docs/Autonomous UAT Agent/running-auata-scripts.md b/content/en/docs/Autonomous UAT Agent/running-auata-scripts.md index 7a797d2..f483835 100644 --- a/content/en/docs/Autonomous UAT Agent/running-auata-scripts.md +++ b/content/en/docs/Autonomous UAT Agent/running-auata-scripts.md @@ -8,114 +8,9 @@ description: > # Running Autonomous UAT Agent Scripts -All commands below assume you are running from the **Agent-S repository root** (Linux/ECS), i.e. the folder that contains `staging_scripts/`. - The **Autonomous UAT Agent** is the overall UX/UI testing use case built on top of the Agent S codebase and scripts in this repo. -If you are inside the monorepo workspace, first `cd ~/Projects/Agent_S3/Agent-S` on the Ubuntu ECS and then run the same commands. 
- -## One-command recommended run (ECS) - -If you only run one thing to produce clean, repeatable evidence (screenshots with click markers), run the calibration CLI: - -```bash -DISPLAY=:1 python staging_scripts/gui_agent_cli.py --prompt "Go to telekom.de and click the cart icon" --max-steps 10 -``` - -This writes screenshots to `./results/gui_agent_cli//screenshots/`. - -## ECS runner notes - -- **Working directory matters:** the default output path is relative to the current working directory (it should be the Agent-S repo root on ECS). -- **GUI required:** `pyautogui` needs an X server (`DISPLAY=:1` is assumed by most scripts). -- **Persistence:** if you want results after the task ends, ensure `./results/` is on a mounted volume or copied out as an artifact. - -## Prerequisites (runtime) - -- Linux GUI session (VNC/Xvfb) because these scripts drive a real browser via `pyautogui`. -- A working `DISPLAY` (most of the scripts assume `:1`). -- Network access to the model endpoints (thinking + vision/grounding). 
- -Common environment variables used by the vLLM-backed scripts: - -- `vLLM_THINKING_ENDPOINT` (default in code if unset) -- `vLLM_VISION_ENDPOINT` (default in code if unset) -- `vLLM_API_KEY` (default: `dummy-key`) - -## Key scripts (repo locations) - -Core scripts referenced for D66 demonstrations: - -- UI check (Agent S3): `staging_scripts/1_UI_check_AS3.py` -- Functional correctness check: `staging_scripts/1_UI_functional_correctness_check.py` -- Visual quality audit: `staging_scripts/2_UX_visual_quality_audit.py` -- Task-based UX flow (newsletter): `staging_scripts/3_UX_taskflow_newsletter_signup.py` - -Calibration / CLI entry point (used for click coordinate scaling validation): - -- GUI Agent CLI (Holo click calibration): `staging_scripts/gui_agent_cli.py` - -Legacy / historical: - -- `staging_scripts/old scripts/agent_s3_1_old.py` -- `staging_scripts/old scripts/agent_s3_ui_test.py` - -## Common configuration knobs - -Many scripts support these environment variables: - -- `AS2_TARGET_URL`: website URL to test -- `AS2_MAX_STEPS`: max steps (varies by script) -- `ASK_EVERY_STEPS`: interactive prompt cadence - -Execution environment: - -- Linux GUI environment typically expects `DISPLAY=:1` - -## Recommended: run gui_agent_cli.py (calibration / click precision) - -This is the “clean” CLI entry point for repeatable calibration runs. 
- -Minimal run (prompt mode): - -```bash -python staging_scripts/gui_agent_cli.py \ - --prompt "Go to telekom.de and click the cart icon" \ - --max-steps 30 -``` - -Optional scaling factors for debugging (defaults to `1.0` / `1.0`): - -```bash -python staging_scripts/gui_agent_cli.py \ - --prompt "Go to telekom.de and click the cart icon" \ - --x-scale 2.0 \ - --y-scale 2.0 \ - --max-steps 30 -``` - -Outputs: - -- Default run folder: `./results/gui_agent_cli//` -- Screenshots: `./results/gui_agent_cli//screenshots/` -- Text log (stdout/stderr): `./results/gui_agent_cli//logs/run.log` - -If `--enable-logging` is set, the script also writes a structured JSON communication log (Story 026-002) into the same run `logs/` folder by default. - -Enable model communication logging (recommended when debugging mis-clicks): - -```bash -python staging_scripts/gui_agent_cli.py \ - --prompt "Click the Telekom icon" \ - --max-steps 10 \ - --output-dir ./results/gui_agent_cli/debug_run_telekom_icon \ - --enable-logging \ - --log-output-dir ./results/gui_agent_cli/debug_run_telekom_icon/logs -``` - -## Golden run (terminal on ECS) - -This is the “golden run” command sequence currently used for D66 evidence generation. +All commands below assume you are running from the **Agent-S repository root** (Linux/ECS), `~/Projects/Agent_S3/Agent-S`. To do that, connect to the server via SSH. You will need a key pair for authentication and an open inbound port in the firewall. For information on how to obtain the key pair and request firewall access, contact [tom.sakretz@telekom.de](mailto:tom.sakretz@telekom.de). 
### 1) Connect from Windows @@ -127,7 +22,6 @@ ssh -i "C:\Path to KeyPair\KeyPair-ECS.pem" ubuntu@80.158.3.120 ```bash # Activate venv -# Recommended: use the Agent S3 venv source ~/Projects/Agent_S3/Agent-S/venv/bin/activate # Go to Agent-S repo root @@ -140,7 +34,45 @@ export DISPLAY=":1" firefox & ``` -### 3) Run the golden prompt +### 3) One-command recommended run (ECS) + +If you only run one thing to produce clean, repeatable evidence (screenshots with click markers), run the following CLI command: + +```bash +python staging_scripts/gui_agent_cli.py --prompt "Go to telekom.de and click the cart icon" --max-steps 10 +``` + +This will produce: + +- Screenshots: `./results/gui_agent_cli//screenshots/` +- Text log: `./results/gui_agent_cli//logs/run.log` +- Optional JSON comm log (if enabled): `./results/gui_agent_cli//logs/calibration_log_*.json` + + +## Prerequisites (runtime) + +- Linux GUI session (VNC/Xvfb) because these scripts drive a real browser via `pyautogui`. +- A working `DISPLAY` (default for all scripts is `:1`). +- Network access to the model endpoints (thinking + vision/grounding). + + +## Key scripts (repo locations) + +The GUI Agent CLI script is the most flexible entry point and is therefore the only one described in detail in this documentation. The path below assumes you are in the project root `~/Projects/Agent_S3/Agent-S`. + +- GUI Agent CLI: `staging_scripts/gui_agent_cli.py` + +Historically, we used purpose-built scripts for individual tasks. We now recommend using `gui_agent_cli.py` as the primary entry point, because the same scenarios can usually be expressed via a well-scoped prompt while keeping the workflow more flexible and easier to maintain. The scripts below are kept for reference and may not reflect the current, preferred workflow. 
+ +- UI check (Agent S3): `staging_scripts/1_UI_check_AS3.py` +- Functional correctness check: `staging_scripts/1_UI_functional_correctness_check.py` +- Visual quality audit: `staging_scripts/2_UX_visual_quality_audit.py` +- Task-based UX flow (newsletter): `staging_scripts/3_UX_taskflow_newsletter_signup.py` + + +## Golden run (terminal on ECS) + +This is the “golden run” command sequence currently used for D66 evidence generation. The golden run is a complete workflow that works as a template for reproducible outcomes. ```bash python staging_scripts/gui_agent_cli.py \ @@ -167,63 +99,14 @@ Golden run artifacts: - Text log: `./results/gui_agent_cli//logs/run.log` - Optional JSON comm log (if enabled): `./results/gui_agent_cli//logs/calibration_log_*.json` +An example golden run with screenshots and log outputs can be seen in [Results](./results/). + ## Alternative: run the agent via a web interface (Frontend) Work in progress. We are currently updating the web-based view and its ECS runner integration. This section will be filled with the correct, up-to-date instructions once the frontend flow supports the current Autonomous UAT Agent + `gui_agent_cli.py` workflow. -## Run the D66 evaluation scripts (staging_scripts) - -These scripts are used for D66-style evaluation runs and tend to write their artifacts into `staging_scripts/` (DB, screenshots, JSON). - -### UI check (Agent S3) - -Typical pattern (URL via env var + optional run control args): - -```bash -export AS2_TARGET_URL="https://www.leipzig.de" -export AS2_MAX_STEPS="20" - -python staging_scripts/1_UI_check_AS3.py --auto-yes --ask-every 1000 -``` - -Notes: - -- Supports `--job-id ` (used by runners) and uses `JOB_ID` as a fallback. -- Writes JSON to `./agent_output/raw_json//` and screenshots/overlays to `staging_scripts/Screenshots/...`. 
- -### Functional correctness check - -```bash -export AS2_TARGET_URL="https://www.leipzig.de" -export AS2_MAX_STEPS="0" # 0 = no limit (script-specific) - -python staging_scripts/1_UI_functional_correctness_check.py --auto-yes --ask-every 1000 -``` - -### Visual quality audit - -This script currently uses a hardcoded `WEBSITE_URL` near the top of the file. Update it and then run: - -```bash -python staging_scripts/2_UX_visual_quality_audit.py --auto-yes --ask-every 10 -``` - -### Task-based UX flow (newsletter) - -This script is currently a staging/WIP script; verify it runs in your environment before relying on it for evidence. - -## Outputs to expect - -Most scripts record one or more of: - -- `uxqa.db` (run log DB) -- screenshots/overlays under `staging_scripts/Screenshots/...` -- JSON step outputs under `agent_output/` (paths vary by script) -- calibration CLI outputs under `./results/gui_agent_cli//` - -See [Outputs & Artifacts](./outputs-and-artifacts.md). ## Notes on model usage