ASTRA QA

Test automation that doesn't break the moment your UI does.

ASTRA QA replaces single-prompt test automation with a staged, reviewable agentic pipeline — so testing volume scales without scaling QA headcount.

The Pain

Test automation has two ceilings.
Both are about to hit your roadmap.

Traditional test automation can't keep up with feature velocity, and it has no answer for the AI features your team is now shipping into production. The same QA team faces two distinct ceilings — and the headcount fix doesn't work for either.

Axis 01

The traditional ceiling

Locator scripts break the moment a button moves. Test maintenance eats the engineering capacity you were going to spend on new tests. QA headcount has to scale linearly with feature velocity — until it can't.

Axis 02

The AI-era ceiling

You're shipping LLM-backed features into production. They don't behave deterministically. RAG pipelines drift. Generative outputs aren't repeatable. Traditional test automation has no answer here, and ungrounded LLM-as-judge approaches can hallucinate their own ground truth.

Both ceilings converge on the same answer: agentic test automation that's grounded, bounded, and self-healing.

Why Traditional Automation Fails

Locator scripts.
LLM-as-judge.
Both break before they help.

Most test automation today sits in one of two camps. The Selenium / Playwright / record-and-replay camp is fast to write and fragile to maintain. The LLM-as-judge camp is faster to scale and unreliable to defend. Neither holds up at enterprise scale.

  1. Locators that break with the UI

    Hard-coded XPath / CSS selectors break the moment a developer renames a class, restructures a component, or moves a button. The test suite becomes the bottleneck it was meant to remove.

  2. Record-and-replay drift

    Recorded interactions encode the exact pixel coordinates, the exact DOM tree, the exact timing. None of these survive a real-world UI refresh. Tests need rewriting, not re-recording.

  3. LLM-as-judge with no grounding

    Pointing an LLM at a screen and asking “did it work?” feels magical for a week. Then the LLM hallucinates a pass that wasn't, or fails a test on a rendering quirk. Without grounding in a source of truth, the judge is making it up.

  4. Generative-output testing has no answer

    RAG pipelines drift, model outputs aren't repeatable, agent traces diverge across runs. Traditional pass/fail assertions don't apply. The category itself has been waiting for a different testing approach.

None of this means the tools are bad — they are the right tools for the simpler jobs they were built for. They are not the right foundation for testing complex agentic systems at enterprise scale.

The Agentic Difference

Brittle scripts break.
Agentic tests adapt.

ASTRA QA's agents don't read locators — they read the UI. When a button moves, the agent finds it. When a layout changes, the agent adapts. When a generative output drifts, the agent reasons about why. Grounded in source documents. Bounded by reviewer gates. Self-healing in execution.

Brittle test automation

Locator-coupled
[Figure: locator, button, moved button. Label: LOCATOR MISSED]

Locator: //button[@id='submit']
UI changes → locator misses → test fails. No adaptation. No grounding. No evidence beyond pass/fail.

ASTRA QA — agentic

Vision-assisted
[Figure: agent, button. Label: TEST ADAPTED]

Reads the UI semantically. UI changes → vision-assisted reasoning finds the element → test adapts. Grounded in source docs. Bounded by reviewer gates. Evidence trail attached.

Every test artefact traces to a source document; every execution carries its evidence; every stage waits for a human gate.
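
The contrast above can be pictured with a toy example. This is a minimal sketch, not ASTRA QA's implementation: the `Element` type, `find_by_id`, and `find_by_intent` are hypothetical names, and a simple match on role and visible label stands in for vision-assisted reasoning. It only shows why a lookup tied to one attribute fails after a refactor while a lookup tied to what the user sees can survive it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """A simplified UI element; a stand-in for a real DOM node."""
    tag: str
    element_id: str
    label: str


def find_by_id(page: list[Element], element_id: str) -> Optional[Element]:
    """Locator-coupled lookup: depends on one exact attribute value."""
    return next((e for e in page if e.element_id == element_id), None)


def find_by_intent(page: list[Element], tag: str, label: str) -> Optional[Element]:
    """Intent-based lookup: matches the role and the visible label."""
    wanted = label.strip().lower()
    return next(
        (e for e in page if e.tag == tag and e.label.strip().lower() == wanted),
        None,
    )


# The same page before and after a refactor that renames the button's id.
page_v1 = [Element("input", "email", "Email"), Element("button", "submit", "Submit")]
page_v2 = [Element("input", "email", "Email"), Element("button", "btn-primary", "Submit")]

print(find_by_id(page_v1, "submit") is not None)                # True
print(find_by_id(page_v2, "submit") is not None)                # False: locator missed
print(find_by_intent(page_v2, "button", "Submit") is not None)  # True: test adapts
```

A label match is itself fragile when the copy changes, which is one reason the text pairs adaptation with grounding and reviewer gates instead of relying on adaptation alone.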

The 3-Stage Pipeline

One platform.
Three reviewable stages.
No autonomous execution.

ASTRA QA's pipeline runs in three explicit stages. Each stage produces an output, and each output is reviewable. The platform never moves to the next stage without a reviewer's approval. Bounded autonomy is the design principle.

[Figure: Stage 01 Understand → reviewer gate → Stage 02 Design → reviewer gate → Stage 03 Execute. Bounded autonomy: no stage executes without human approval.]

  1. Understand

    The agent reads your source documents — requirements, specs, user stories, design files — and surfaces the test intent. It asks clarifying questions where the source is ambiguous. Output: a reviewable test charter, grounded in the source.

    ▸ Reviewer gate: Reviewer approves the charter before Design begins.

  2. Design

    The agent designs test cases against the approved charter — scenarios, edge cases, data setup, expected outcomes. Each test case traces back to a specific source-document section. No invented test logic. Output: a reviewable test plan.

    ▸ Reviewer gate: Reviewer approves the test plan before Execute begins.

  3. Execute

    The agent runs the test plan against the live system. Vision-assisted reasoning adapts when the UI deviates from the design's assumptions. Self-healing execution — brittle locators are eliminated by construction. Output: pass/fail outcomes plus an evidence trail.

    ▸ Reviewer gate: Reviewer triages execution outcomes — exceptions, ambiguities, unexpected passes — before sign-off.
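
The three stages and their gates can be sketched as a loop that refuses to advance without an approval. This is an illustrative sketch under stated assumptions, not ASTRA QA's code: `run_pipeline`, `StageResult`, `GateRejected`, and the `review` callback are hypothetical names, and the stage functions are placeholders that only wrap their input in a string.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class GateRejected(Exception):
    """Raised when a reviewer does not approve a stage output."""


@dataclass
class StageResult:
    stage: str
    output: str
    approved_by: str


def run_pipeline(
    stages: list[tuple[str, Callable[[str], str]]],
    source: str,
    review: Callable[[str, str], Optional[str]],
) -> list[StageResult]:
    """Run stages in order; each output must be approved before the next stage starts."""
    results = []
    current = source
    for name, run_stage in stages:
        output = run_stage(current)
        reviewer = review(name, output)  # in a real system this waits on a human decision
        if reviewer is None:
            raise GateRejected(f"{name} output was not approved")
        results.append(StageResult(name, output, reviewer))
        current = output  # only approved output feeds the next stage
    return results


# Placeholder stages (assumptions for illustration only).
STAGES = [
    ("Understand", lambda source: f"charter({source})"),
    ("Design", lambda charter: f"plan({charter})"),
    ("Execute", lambda plan: f"outcomes({plan})"),
]


def approve_all(stage: str, output: str) -> Optional[str]:
    return "reviewer@example.com"


def reject_design(stage: str, output: str) -> Optional[str]:
    return None if stage == "Design" else "reviewer@example.com"


results = run_pipeline(STAGES, "requirements.md", approve_all)
print([r.stage for r in results])  # ['Understand', 'Design', 'Execute']

try:
    run_pipeline(STAGES, "requirements.md", reject_design)
except GateRejected as exc:
    print(exc)  # Design output was not approved
```

Because the loop raises as soon as a gate is refused, a rejected test plan never reaches Execute, which is the behaviour the reviewer gates describe.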

Four capabilities, all enforced at every stage of the pipeline.

Grounded generation

Every test artefact traces back to a source document. No hallucinated test cases.

Bounded autonomy

Reviewer gates at every stage. The platform never executes without human approval.

Self-healing execution

Vision-assisted reasoning adapts to UI changes. Brittle locator scripts are eliminated.

Evidence-first

Screenshots, logs, validation outcomes — the audit trail is part of the deliverable.
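
Grounded generation, as described above, implies a traceability check: a test case with no resolvable source reference should be flagged before review. The sketch below is one hedged way to picture that check. `PlannedCase`, `ungrounded`, and the `"<document>#<section>"` reference format are assumptions made for illustration, not ASTRA QA's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedCase:
    title: str
    source_ref: str  # hypothetical format: "<document>#<section>"


def ungrounded(cases: list[PlannedCase], known_sections: set[str]) -> list[PlannedCase]:
    """Return the test cases whose source_ref does not point at a known section."""
    return [c for c in cases if c.source_ref not in known_sections]


# Hypothetical source sections and test plan.
known = {"checkout-spec.md#3.2", "checkout-spec.md#3.4"}
plan = [
    PlannedCase("Reject expired card", "checkout-spec.md#3.2"),
    PlannedCase("Retry on gateway timeout", "checkout-spec.md#3.4"),
    PlannedCase("Apply loyalty discount", ""),  # no source reference
]

flagged = ungrounded(plan, known)
print([c.title for c in flagged])  # ['Apply loyalty discount']
```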

Target State

Testing volume scales.
QA headcount doesn't have to.

When the test pipeline is agentic, grounded, and reviewable, the QA team's role changes shape. Less drafting and clicking. More judgment and exception handling. The same team handles more features — and handles them with more confidence than brittle automation ever delivered.

Evidence trail · accumulating

[Figure: test execution, pass with evidence. Evidence items: screenshot, log, validation, source doc, audit hash.]

Every execution leaves a complete audit trail. Compliance reviewers, engineering reviewers, and the QA team itself all read from the same evidence record.
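
One way to picture an evidence record that several reviewers can trust is to hash it, so that any later change is detectable. The following is a minimal sketch under assumptions: the field names, file paths, and the `audit_hash` helper are hypothetical, and SHA-256 over canonical JSON is a common general-purpose choice, not a documented detail of ASTRA QA.

```python
import hashlib
import json


def audit_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the evidence record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical evidence record for a single test execution.
record = {
    "test": "Reject expired card",
    "outcome": "pass",
    "screenshot": "evidence/run-001/step-3.png",
    "log": "evidence/run-001/execution.log",
    "validation": "error banner shown",
    "source_doc": "checkout-spec.md#3.2",
}
digest = audit_hash(record)

# Changing any field changes the hash, so edits after the fact are visible.
tampered = dict(record, outcome="fail")

print(len(digest))                     # 64
print(audit_hash(record) == digest)    # True
print(audit_hash(tampered) == digest)  # False
```

Sorting keys before hashing keeps the digest stable regardless of the order in which fields were written.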

01

From days to hours

QA effort moves from drafting and clicking toward judgment and exception handling. The pipeline takes the mechanical work; the team takes the calls that matter.

02

Evidence as default

Screenshots, logs, validation outcomes, and source-document traces are attached to every execution by default.

03

Testing for the AI era

Generative-output validation, agentic-system behaviour testing, RAG-pipeline drift detection — the categories traditional automation has no answer for. ASTRA QA's grounding and bounded autonomy make these testable.

04

A platform, not a project

One agentic-test pipeline serves every product team, every release, every regulated audit. Reusable across features, environments, and review boundaries.

R&D Backbone

Built on agentic-QA research.
From the InnoHK lab ASTRA spun out of.

ASTRA QA is the productisation of AIFT's CityU Lab Agentic QA solution (architectural codename StratumQA). The research direction (grounded test synthesis, bounded execution, vision-assisted reasoning) was live in AIFT engagements for years before ASTRA delivered it as a named product.

The category evolution

Era 01

Selenium era

Brittle locators. Fast to write, fragile to maintain.

Era 02

LLM-augmented

Faster to scale. Hallucinated test logic. Unreliable to defend.

Era 03 · NOW

Agentic

Grounded. Bounded. Self-healing by construction. ASTRA QA.

  1. Research lineage, not adopted theory

    ASTRA QA isn't an integration of a public framework. The grounded-test-synthesis and bounded-execution approaches were developed at AIFT and carried into ASTRA's product surface. Architecture decisions reflect research depth, not vendor packaging.

  2. 60+ engineer R&D bench

    When a QA engagement needs research-grade depth — novel domain models, custom evaluation harnesses, multi-modal evidence pipelines — we have the bench to build it. Decisions are made by senior engineers with direct access to AIFT's research team, not associates working from a playbook.

  3. InnoHK FinTech lab heritage

    AIFT is the only FinTech research laboratory recognised by InnoHK, the Hong Kong SAR Government's flagship innovation programme. It was co-founded by City University of Hong Kong, Columbia University, and Tsinghua University. ASTRA QA inherits the same regulatory-grade engineering posture.

Start

Bring the agentic pipeline in

If your QA is the bottleneck — or your AI features have no test framework — let's talk.

Tell us what your product surface looks like, where traditional automation has been breaking, and what you're trying to test that LLM-as-judge can't reliably handle. We'll sketch the pipeline shape and the engagement model for your team.