Update to running UAT scripts and results

This commit is contained in:
Tom Sakretz 2026-01-29 15:22:46 +01:00
parent 22e6c208ac
commit 242a4b8a79
20 changed files with 268 additions and 167 deletions


@ -6,13 +6,13 @@ description: >
Thinking vs grounding model split for D66 (current state and target state)
---
# Model Stack (D66)
# Model Stack
For a visual overview of how the models interact with the VNC-based GUI automation loop, see: [Workflow Diagram](./agent-workflow-diagram.md)
## Requirement
D66 must use **open-source models from European companies**.
The Autonomous UAT Agent must use **open-source models from European companies**. This has been a project requirement from the very beginning of the project.
## Target setup
@ -45,7 +45,6 @@ The Agent S framework runs an iterative loop: it uses a reasoning model to decid
- Deployment: vLLM OpenAI-compatible endpoint (chat completions)
- Endpoint env var: `vLLM_THINKING_ENDPOINT`
- Current server (deployment reference): `http://164.30.28.242:8001/v1`
- Recommendation: set `vLLM_THINKING_ENDPOINT` explicitly (do not rely on script defaults).
**Operational note:** vLLM is configured to **auto-start on server boot** (OTC ECS restart) via `systemd`.
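The auto-start configuration could take the shape of a systemd unit like the one below. This is a sketch only: the unit name, user, model id, and paths are assumptions, not the deployed configuration.

```ini
# /etc/systemd/system/vllm-thinking.service (illustrative sketch)
[Unit]
Description=vLLM OpenAI-compatible server (thinking model)
After=network-online.target

[Service]
User=ubuntu
# <model-id> is a placeholder; substitute the deployed model
ExecStart=/usr/bin/env vllm serve <model-id> --port 8001
Restart=on-failure

[Install]
WantedBy=multi-user.target
```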
@ -80,9 +79,4 @@ The Agent S framework runs an iterative loop: it uses a reasoning model to decid
- `grounding_width`: `3840`
- `grounding_height`: `2160`
Notes:
- Prompting and output-format hardening (reliability work):
- `docs/story-026-001-context.md` (Holo output reliability)
- `docs/story-025-001-context.md` (double grounding / calibration)


@ -0,0 +1,17 @@
---
title: "Results & Findings"
linkTitle: "Results"
weight: 20
description: >
Results, findings, and evidence artifacts for D66
---
# Results & Findings (D66)
This section contains the outputs that support D66 claims: findings summaries and pointers to logs, screenshots, and run artifacts.
## Pages
- [PoC Validation](./poc-validation.md)
- [Golden Run (Telekom Header Navigation)](./golden-run-telekom-header-nav/)
- [Logs & Artifacts](./logs-and-artifacts.md)


@ -0,0 +1,142 @@
---
title: "Golden Run: Telekom Header Navigation"
linkTitle: "Golden Run (Telekom)"
weight: 3
description: >
Evidence pack (screenshots + logs) for the golden run on www.telekom.de header navigation
---
# Golden Run: Telekom Header Navigation
This page is the evidence pack for the **Autonomous UAT Agent** golden run on **www.telekom.de**.
## Run intent
- Goal: Test interactive elements in the header navigation for functional weaknesses
- Output: Click-marked screenshots + per-run log (and optionally model communication JSON)
## How the run was executed (ECS)
Command (as used in the runbook):
```bash
python staging_scripts/gui_agent_cli.py \
--prompt "Role: You are a UI/UX testing agent specializing in functional correctness.
Goal: Test all interactive elements in the header navigation on www.telekom.de for functional weaknesses.
Tasks:
1. Navigate to the website
2. Identify and test interactive elements (buttons, links, forms, menus)
3. Check for broken flows, defective links, non-functioning elements
4. Document issues found
Report Format:
Return findings in the 'issues' field as a list of objects:
- element: Name/description of the element
- location: Where on the page
- problem: What doesn't work
- recommendation: How to fix it
If no problems found, return an empty array: []" \
--max-steps 30
```
## Artifacts
### In-repo evidence (this page bundle)
Place the evidence files here:
- Screenshots: `screenshots/`
- Text log: `logs/run.log`
- Optional JSON communication log(s): `logs/calibration_log_*.json`
If you have ~15 screenshots, name them in a stable order, e.g.:
- `screenshots/uat_agent_step_001.png` … `screenshots/uat_agent_step_015.png`
### Runtime output location (where they come from)
The CLI defaults to:
- `./results/gui_agent_cli/<timestamp>/screenshots/`
- `./results/gui_agent_cli/<timestamp>/logs/run.log`
Copy the files you want to publish into this page bundle so they render in the docs.
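The copy step can be sketched as a small helper. The function name is illustrative, and the sketch assumes the source files sort lexically in step order:

```shell
# Hedged sketch: copy run screenshots into the page bundle with stable,
# zero-padded names. Assumes the source files sort in step order.
publish_screenshots() {
  src="$1"; dest="$2"; i=1
  mkdir -p "$dest"
  for f in "$src"/*.png; do
    cp "$f" "$dest/uat_agent_step_$(printf '%03d' "$i").png"
    i=$((i + 1))
  done
}
# Example (replace <timestamp> with the actual run folder):
# publish_screenshots ./results/gui_agent_cli/<timestamp>/screenshots ./screenshots
```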
## Screenshot gallery
### Thumbnail grid (recommended for many screenshots)
Click any thumbnail to open the full image.
<div style="display:grid; grid-template-columns: repeat(auto-fit, minmax(240px, 1fr)); gap: 12px; align-items:start;">
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_001.png"><img src="screenshots/uat_agent_step_001.png" alt="UAT agent step 001" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 001</figcaption>
</figure>
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_002.png"><img src="screenshots/uat_agent_step_002.png" alt="UAT agent step 002" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 002</figcaption>
</figure>
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_003.png"><img src="screenshots/uat_agent_step_003.png" alt="UAT agent step 003" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 003</figcaption>
</figure>
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_004.png"><img src="screenshots/uat_agent_step_004.png" alt="UAT agent step 004" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 004</figcaption>
</figure>
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_005.png"><img src="screenshots/uat_agent_step_005.png" alt="UAT agent step 005" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 005</figcaption>
</figure>
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_006.png"><img src="screenshots/uat_agent_step_006.png" alt="UAT agent step 006" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 006</figcaption>
</figure>
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_007.png"><img src="screenshots/uat_agent_step_007.png" alt="UAT agent step 007" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 007</figcaption>
</figure>
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_008.png"><img src="screenshots/uat_agent_step_008.png" alt="UAT agent step 008" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 008</figcaption>
</figure>
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_010.png"><img src="screenshots/uat_agent_step_010.png" alt="UAT agent step 010" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 010</figcaption>
</figure>
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_011.png"><img src="screenshots/uat_agent_step_011.png" alt="UAT agent step 011" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 011</figcaption>
</figure>
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_012.png"><img src="screenshots/uat_agent_step_012.png" alt="UAT agent step 012" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 012</figcaption>
</figure>
<figure style="margin:0;">
<a href="screenshots/uat_agent_step_013.png"><img src="screenshots/uat_agent_step_013.png" alt="UAT agent step 013" style="width:100%; height:auto; border:1px solid #ddd; border-radius:6px;" /></a>
<figcaption style="text-align:center; font-size:0.9em;">Step 013</figcaption>
</figure>
</div>
<details>
<summary>Full-size images (stacked)</summary>
{{< figure src="screenshots/uat_agent_step_001.png" caption="Step 001" >}}
{{< figure src="screenshots/uat_agent_step_002.png" caption="Step 002" >}}
{{< figure src="screenshots/uat_agent_step_003.png" caption="Step 003" >}}
{{< figure src="screenshots/uat_agent_step_004.png" caption="Step 004" >}}
{{< figure src="screenshots/uat_agent_step_005.png" caption="Step 005" >}}
{{< figure src="screenshots/uat_agent_step_006.png" caption="Step 006" >}}
{{< figure src="screenshots/uat_agent_step_007.png" caption="Step 007" >}}
{{< figure src="screenshots/uat_agent_step_008.png" caption="Step 008" >}}
{{< figure src="screenshots/uat_agent_step_010.png" caption="Step 010" >}}
{{< figure src="screenshots/uat_agent_step_011.png" caption="Step 011" >}}
{{< figure src="screenshots/uat_agent_step_012.png" caption="Step 012" >}}
{{< figure src="screenshots/uat_agent_step_013.png" caption="Step 013" >}}
</details>
## Notes
- If repo size becomes an issue, publish only a curated subset (e.g. 6–8 key frames) and link to the full run folder externally.

12 screenshot PNGs added as binary files (not shown; 122 KiB – 968 KiB each).

@ -0,0 +1,36 @@
---
title: "Logs & Artifacts"
linkTitle: "Logs & Artifacts"
weight: 2
description: >
Where to find logs, screenshots, and reports relevant to D66
---
# Logs & Artifacts
## Repo locations
- Local calibration and run logs: `logs/`
- Script outputs (varies by run):
- `Backend/IPCEI-UX-Agent-S3/staging_scripts/uxqa.db`
- `Backend/IPCEI-UX-Agent-S3/staging_scripts/Screenshots/`
- `Backend/IPCEI-UX-Agent-S3/staging_scripts/agent_output/`
- Golden run evidence pack (recommended publishing location in docs):
- `docs/D66/results/golden-run-telekom-header-nav/`
## What to capture for D66
- A representative run per capability:
- functional correctness checks
- visual quality audits
- task-based UX smoke tests
- For each run, capture:
- target URL
- timestamp
- key screenshots/overlays
- issue summaries (structured)
## Notes
If needed, we can add a consistent run naming convention and a small “how to export a D66 evidence pack” procedure.
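One possible shape for such an export step, assuming the default output layout of `gui_agent_cli.py` (the function name and metadata file are illustrative, not an existing procedure):

```shell
# Hedged sketch of an evidence-pack export, assuming the default
# results layout of gui_agent_cli.py.
export_evidence_pack() {
  run_dir="$1"    # e.g. ./results/gui_agent_cli/<timestamp>
  pack_dir="$2"   # e.g. docs/D66/results/<run-name>
  mkdir -p "$pack_dir/screenshots" "$pack_dir/logs"
  cp "$run_dir"/screenshots/*.png "$pack_dir/screenshots/" 2>/dev/null
  cp "$run_dir"/logs/run.log "$pack_dir/logs/" 2>/dev/null
  # Record when the pack was exported, as minimal run metadata:
  date -u +"%Y-%m-%dT%H:%M:%SZ" > "$pack_dir/exported_at.txt"
}
```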


@ -0,0 +1,29 @@
---
title: "PoC Validation"
linkTitle: "PoC Validation"
weight: 1
description: >
What was validated and where to find the evidence
---
# PoC Validation Evidence
## What was validated
- Autonomous GUI interaction via the Autonomous UAT Agent (Agent S3-based scripts)
- Generation of UX findings and recommendations
- Production of reproducible artifacts (screenshots, logs)
## Where to find evidence in this repo
- Run logs and calibration logs: `logs/`
- Story evidence and investigation notes:
- `docs/story-025-001-context.md`
- `docs/story-026-001-context.md`
- `docs/story-023-003-coordinate-space-detection.md`
## How to reproduce a run
1. Choose a script in `Backend/IPCEI-UX-Agent-S3/staging_scripts/`
2. Set target URL (if supported) via `AS2_TARGET_URL`
3. Run and capture artifacts (see `docs/D66/documentation/outputs-and-artifacts.md`)
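The three steps above can be sketched as follows; the script choice and URL are only examples, and the final command is echoed rather than executed so the sketch stays side-effect free:

```shell
# Hedged sketch of the reproduction steps: pick a script (step 1),
# set the target URL (step 2), then run it (step 3).
script="staging_scripts/1_UI_functional_correctness_check.py"
export AS2_TARGET_URL="https://www.leipzig.de"
echo "python $script --auto-yes --ask-every 1000"
```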


@ -8,114 +8,9 @@ description: >
# Running Autonomous UAT Agent Scripts
All commands below assume you are running from the **Agent-S repository root** (Linux/ECS), i.e. the folder that contains `staging_scripts/`.
The **Autonomous UAT Agent** is the overall UX/UI testing use case built on top of the Agent S codebase and scripts in this repo.
If you are inside the monorepo workspace, first `cd ~/Projects/Agent_S3/Agent-S` on the Ubuntu ECS and then run the same commands.
## One-command recommended run (ECS)
If you only run one thing to produce clean, repeatable evidence (screenshots with click markers), run the calibration CLI:
```bash
DISPLAY=:1 python staging_scripts/gui_agent_cli.py --prompt "Go to telekom.de and click the cart icon" --max-steps 10
```
This writes screenshots to `./results/gui_agent_cli/<timestamp>/screenshots/`.
## ECS runner notes
- **Working directory matters:** the default output path is relative to the current working directory (it should be the Agent-S repo root on ECS).
- **GUI required:** `pyautogui` needs an X server (`DISPLAY=:1` is assumed by most scripts).
- **Persistence:** if you want results after the task ends, ensure `./results/` is on a mounted volume or copied out as an artifact.
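One way to satisfy the persistence note is to archive the results folder before the session ends. The helper below is a sketch, not an existing script:

```shell
# Hedged sketch: pack the results folder into a timestamped tarball
# that can be copied out (e.g. via scp) before the ECS task ends.
archive_results() {
  src="${1:-./results}"
  out="results_$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
  tar -czf "$out" "$src" && echo "$out"
}
# Example: archive_results ./results
```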
## Prerequisites (runtime)
- Linux GUI session (VNC/Xvfb) because these scripts drive a real browser via `pyautogui`.
- A working `DISPLAY` (most of the scripts assume `:1`).
- Network access to the model endpoints (thinking + vision/grounding).
Common environment variables used by the vLLM-backed scripts:
- `vLLM_THINKING_ENDPOINT` (default in code if unset)
- `vLLM_VISION_ENDPOINT` (default in code if unset)
- `vLLM_API_KEY` (default: `dummy-key`)
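A hedged sketch of setting these explicitly before a run: the thinking endpoint matches the deployment reference in the model-stack page, but the vision endpoint here is a placeholder, not a known address.

```shell
# Export the endpoints explicitly instead of relying on in-code defaults.
export vLLM_THINKING_ENDPOINT="${vLLM_THINKING_ENDPOINT:-http://164.30.28.242:8001/v1}"
# Placeholder only -- substitute your deployment's vision endpoint:
export vLLM_VISION_ENDPOINT="${vLLM_VISION_ENDPOINT:-http://<vision-host>:<port>/v1}"
export vLLM_API_KEY="${vLLM_API_KEY:-dummy-key}"
```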
## Key scripts (repo locations)
Core scripts referenced for D66 demonstrations:
- UI check (Agent S3): `staging_scripts/1_UI_check_AS3.py`
- Functional correctness check: `staging_scripts/1_UI_functional_correctness_check.py`
- Visual quality audit: `staging_scripts/2_UX_visual_quality_audit.py`
- Task-based UX flow (newsletter): `staging_scripts/3_UX_taskflow_newsletter_signup.py`
Calibration / CLI entry point (used for click coordinate scaling validation):
- GUI Agent CLI (Holo click calibration): `staging_scripts/gui_agent_cli.py`
Legacy / historical:
- `staging_scripts/old scripts/agent_s3_1_old.py`
- `staging_scripts/old scripts/agent_s3_ui_test.py`
## Common configuration knobs
Many scripts support these environment variables:
- `AS2_TARGET_URL`: website URL to test
- `AS2_MAX_STEPS`: max steps (varies by script)
- `ASK_EVERY_STEPS`: interactive prompt cadence
Execution environment:
- Linux GUI environment typically expects `DISPLAY=:1`
## Recommended: run gui_agent_cli.py (calibration / click precision)
This is the “clean” CLI entry point for repeatable calibration runs.
Minimal run (prompt mode):
```bash
python staging_scripts/gui_agent_cli.py \
--prompt "Go to telekom.de and click the cart icon" \
--max-steps 30
```
Optional scaling factors for debugging (defaults to `1.0` / `1.0`):
```bash
python staging_scripts/gui_agent_cli.py \
--prompt "Go to telekom.de and click the cart icon" \
--x-scale 2.0 \
--y-scale 2.0 \
--max-steps 30
```
Outputs:
- Default run folder: `./results/gui_agent_cli/<timestamp>/`
- Screenshots: `./results/gui_agent_cli/<timestamp>/screenshots/`
- Text log (stdout/stderr): `./results/gui_agent_cli/<timestamp>/logs/run.log`
If `--enable-logging` is set, the script also writes a structured JSON communication log (Story 026-002) into the same run `logs/` folder by default.
Enable model communication logging (recommended when debugging mis-clicks):
```bash
python staging_scripts/gui_agent_cli.py \
--prompt "Click the Telekom icon" \
--max-steps 10 \
--output-dir ./results/gui_agent_cli/debug_run_telekom_icon \
--enable-logging \
--log-output-dir ./results/gui_agent_cli/debug_run_telekom_icon/logs
```
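When several runs have accumulated under `./results/gui_agent_cli/`, a small helper can locate the newest run folder (assuming the timestamp-named folders sort chronologically; the function name is illustrative):

```shell
# Hedged sketch: return the most recent run folder under a base path.
latest_run_dir() {
  ls -1d "$1"/*/ 2>/dev/null | sort | tail -n 1
}
# Example: latest_run_dir ./results/gui_agent_cli
```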
## Golden run (terminal on ECS)
This is the “golden run” command sequence currently used for D66 evidence generation.
All commands below assume you are running from the **Agent-S repository root** (Linux/ECS), `~/Projects/Agent_S3/Agent-S`. To do that, connect to the server via SSH. You will need a key pair for authentication and an open inbound port in the firewall. For information on how to obtain the key pair and request firewall access, contact [tom.sakretz@telekom.de](mailto:tom.sakretz@telekom.de).
### 1) Connect from Windows
@ -127,7 +22,6 @@ ssh -i "C:\Path to KeyPair\KeyPair-ECS.pem" ubuntu@80.158.3.120
```bash
# Activate venv
# Recommended: use the Agent S3 venv
source ~/Projects/Agent_S3/Agent-S/venv/bin/activate
# Go to Agent-S repo root
@ -140,7 +34,45 @@ export DISPLAY=":1"
firefox &
```
### 3) Run the golden prompt
### 3) One-command recommended run (ECS)
If you only run one thing to produce clean, repeatable evidence (screenshots with click markers), run the calibration CLI:
```bash
python staging_scripts/gui_agent_cli.py --prompt "Go to telekom.de and click the cart icon" --max-steps 10
```
This will produce:
- Screenshots: `./results/gui_agent_cli/<timestamp>/screenshots/`
- Text log: `./results/gui_agent_cli/<timestamp>/logs/run.log`
- Optional JSON comm log (if `--enable-logging` is set): `./results/gui_agent_cli/<timestamp>/logs/calibration_log_*.json`
## Prerequisites (runtime)
- Linux GUI session (VNC/Xvfb) because these scripts drive a real browser via `pyautogui`.
- A working `DISPLAY` (default for all scripts is `:1`).
- Network access to the model endpoints (thinking + vision/grounding).
## Key scripts (repo locations)
The GUI Agent CLI script is the most flexible entry point and is therefore the only one described in detail in this documentation. Paths assume you are in the project root `~/Projects/Agent_S3/Agent-S`.
- GUI Agent CLI: `staging_scripts/gui_agent_cli.py`
Historically, we used purpose-built scripts for individual tasks. We now recommend using `gui_agent_cli.py` as the primary entry point, because the same scenarios can usually be expressed via a well-scoped prompt while keeping the workflow more flexible and easier to maintain. The scripts below are kept for reference and may not reflect the current, preferred workflow.
- UI check (Agent S3): `staging_scripts/1_UI_check_AS3.py`
- Functional correctness check: `staging_scripts/1_UI_functional_correctness_check.py`
- Visual quality audit: `staging_scripts/2_UX_visual_quality_audit.py`
- Task-based UX flow (newsletter): `staging_scripts/3_UX_taskflow_newsletter_signup.py`
## Golden run (terminal on ECS)
This is the “golden run” command sequence currently used for D66 evidence generation. The golden run is a complete workflow that serves as a template for reproducible results.
```bash
python staging_scripts/gui_agent_cli.py \
@ -167,63 +99,14 @@ Golden run artifacts:
- Text log: `./results/gui_agent_cli/<timestamp>/logs/run.log`
- Optional JSON comm log (if enabled): `./results/gui_agent_cli/<timestamp>/logs/calibration_log_*.json`
An example golden run with screenshots and log outputs can be seen in [Results](./results/).
## Alternative: run the agent via a web interface (Frontend)
Work in progress.
We are currently updating the web-based view and its ECS runner integration. This section will be filled with the correct, up-to-date instructions once the frontend flow supports the current Autonomous UAT Agent + `gui_agent_cli.py` workflow.
## Run the D66 evaluation scripts (staging_scripts)
These scripts are used for D66-style evaluation runs and tend to write their artifacts into `staging_scripts/` (DB, screenshots, JSON).
### UI check (Agent S3)
Typical pattern (URL via env var + optional run control args):
```bash
export AS2_TARGET_URL="https://www.leipzig.de"
export AS2_MAX_STEPS="20"
python staging_scripts/1_UI_check_AS3.py --auto-yes --ask-every 1000
```
Notes:
- Supports `--job-id <id>` (used by runners) and uses `JOB_ID` as a fallback.
- Writes JSON to `./agent_output/raw_json/<job_id>/` and screenshots/overlays to `staging_scripts/Screenshots/...`.
### Functional correctness check
```bash
export AS2_TARGET_URL="https://www.leipzig.de"
export AS2_MAX_STEPS="0" # 0 = no limit (script-specific)
python staging_scripts/1_UI_functional_correctness_check.py --auto-yes --ask-every 1000
```
### Visual quality audit
This script currently uses a hardcoded `WEBSITE_URL` near the top of the file. Update it and then run:
```bash
python staging_scripts/2_UX_visual_quality_audit.py --auto-yes --ask-every 10
```
### Task-based UX flow (newsletter)
This script is currently a staging/WIP script; verify it runs in your environment before relying on it for evidence.
## Outputs to expect
Most scripts record one or more of:
- `uxqa.db` (run log DB)
- screenshots/overlays under `staging_scripts/Screenshots/...`
- JSON step outputs under `agent_output/` (paths vary by script)
- calibration CLI outputs under `./results/gui_agent_cli/<timestamp>/`
See [Outputs & Artifacts](./outputs-and-artifacts.md).
## Notes on model usage