From days to hours
QA effort moves from drafting and clicking toward judgment and exception handling. The pipeline takes the mechanical work; the team takes the calls that matter.
ASTRA QA replaces single-prompt test automation with a staged, reviewable agentic pipeline — so testing volume scales without scaling QA headcount.
Traditional test automation can't keep up with feature velocity, and it has no answer for the AI features your team is now shipping into production. The same QA team faces two distinct ceilings — and the headcount fix doesn't work for either.
Locator scripts break the moment a button moves. Test maintenance eats the engineering capacity you were going to spend on new tests. QA headcount has to scale linearly with feature velocity — until it can't.
You're shipping LLM-backed features into production. They don't behave deterministically. RAG pipelines drift. Generative outputs aren't repeatable. Traditional test automation has no answer here — and LLM-as-judge approaches hallucinate their own ground truth.
Both ceilings converge on the same answer: agentic test automation that's grounded, bounded, and self-healing.
Most test automation today sits in one of two camps. The Selenium / Playwright / record-and-replay camp is fast to write and fragile to maintain. The LLM-as-judge camp is faster to scale and unreliable to defend. Neither holds up at enterprise scale.
Hard-coded XPath / CSS selectors break the moment a developer renames a class, restructures a component, or moves a button. The test suite becomes the bottleneck it was meant to remove.
Recorded interactions encode the exact pixel coordinates, the exact DOM tree, the exact timing. None of these survive a real-world UI refresh. Tests need rewriting, not re-recording.
Pointing an LLM at a screen and asking “did it work?” feels magical for a week. Then the LLM hallucinates a pass that wasn't, or fails a test on a rendering quirk. Without grounding in a source of truth, the judge is making it up.
RAG pipelines drift, model outputs aren't repeatable, and agent traces diverge across runs. Traditional pass/fail assertions assume a single deterministic expected output, so they don't apply. The category needs a different testing approach.
None of this means the tools are bad — they are the right tools for the simpler jobs they were built for. They are not the right foundation for testing complex agentic systems at enterprise scale.
ASTRA QA's agents don't read locators — they read the UI. When a button moves, the agent finds it. When a layout changes, the agent adapts. When a generative output drifts, the agent reasons about why. Grounded in source documents. Bounded by reviewer gates. Self-healing in execution.
Locator: //button[@id='submit']
UI changes → locator misses → test fails. No adaptation. No grounding. No evidence beyond pass/fail.
The ASTRA QA agent reads the UI semantically. UI changes → vision-assisted reasoning finds the element → test adapts. Grounded in source docs. Bounded by reviewer gates. Evidence trail attached.
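The contrast between the two lookups can be sketched in a few lines. This is a toy illustration under stated assumptions, not ASTRA QA's implementation: the page is modelled as a list of dictionaries, and a plain role-and-label match stands in for vision-assisted reasoning, whose mechanics the text above does not specify. The function names `find_by_locator` and `find_by_intent` and the element fields are hypothetical.

```python
from typing import Optional

# Toy page model: each element is a dict. All field names are hypothetical.
Page = list[dict]


def find_by_locator(page: Page, element_id: str) -> Optional[dict]:
    """Brittle lookup: succeeds only if the exact id is still present."""
    for element in page:
        if element.get("id") == element_id:
            return element
    return None


def find_by_intent(page: Page, role: str, label: str) -> Optional[dict]:
    """Semantic stand-in: match on what the element is and what it says."""
    wanted = label.strip().lower()
    for element in page:
        text = element.get("text", "").strip().lower()
        if element.get("role") == role and text == wanted:
            return element
    return None


# Before a refactor, the submit button carries id "submit".
page_v1 = [{"id": "submit", "role": "button", "text": "Submit"}]
# After the refactor, the id was renamed and the button moved in the tree.
page_v2 = [
    {"id": "header", "role": "banner", "text": "Checkout"},
    {"id": "btn-primary", "role": "button", "text": "Submit"},
]

for name, page in (("v1", page_v1), ("v2", page_v2)):
    by_locator = find_by_locator(page, "submit")
    by_intent = find_by_intent(page, role="button", label="Submit")
    print(name, "locator found:", by_locator is not None,
          "intent found:", by_intent is not None)
```

On the refactored page the id-based lookup returns nothing while the intent-based lookup still finds the button. A real agent would need far richer signals than an exact label match, which is why this is only a sketch of the idea.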
Every test artefact traces to a source document; every execution carries its evidence; every stage waits for a human gate.
ASTRA QA's pipeline runs in three explicit stages. Each stage produces an output. Each output is reviewable. The platform never moves to the next stage without a human gate. Bounded autonomy is the design principle.
The agent reads your source documents — requirements, specs, user stories, design files — and surfaces the test intent. It asks clarifying questions where the source is ambiguous. Output: a reviewable test charter, grounded in the source.
▸ Reviewer gate: Reviewer approves the charter before Design begins.
The agent designs test cases against the approved charter — scenarios, edge cases, data setup, expected outcomes. Each test case traces back to a specific source-document section. No invented test logic. Output: a reviewable test plan.
▸ Reviewer gate: Reviewer approves the test plan before Execute begins.
The agent runs the test plan against the live system. Vision-assisted reasoning adapts when the UI deviates from the design's assumptions. Execution is self-healing: the agent does not depend on hard-coded locators, so there are no brittle locators to break. Output: pass/fail outcomes plus an evidence trail.
▸ Reviewer gate: Reviewer triages execution outcomes — exceptions, ambiguities, unexpected passes — before sign-off.
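The stage-and-gate flow described above can be sketched as a loop that refuses to advance without approval. This is a minimal sketch under assumptions, not the product's code: the reviewer is modelled as a callback, the stage bodies are placeholder lambdas, and the name "analyse" for the first stage is hypothetical, since the text names only Design and Execute.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StageResult:
    stage: str
    output: str
    approved: bool


# A stage turns the previous approved output into a new reviewable output.
Stage = tuple[str, Callable[[str], str]]
# A gate is the human decision, modelled here as a callback returning a bool.
Gate = Callable[[str, str], bool]


def run_pipeline(source: str, stages: list[Stage], gate: Gate) -> list[StageResult]:
    """Run stages in order, stopping at the first output the reviewer rejects."""
    results: list[StageResult] = []
    current = source
    for name, produce in stages:
        output = produce(current)
        approved = gate(name, output)
        results.append(StageResult(name, output, approved))
        if not approved:
            break  # bounded autonomy: no later stage runs without approval
        current = output
    return results


# Hypothetical stage bodies; real stages would call the agent.
stages: list[Stage] = [
    ("analyse", lambda src: f"charter for [{src}]"),
    ("design", lambda charter: f"test plan from {charter}"),
    ("execute", lambda plan: f"outcomes and evidence for {plan}"),
]


def reviewer_rejects_plan(stage: str, output: str) -> bool:
    """Example reviewer: approves everything except the test plan."""
    return stage != "design"


approved_run = run_pipeline("login spec", stages, lambda stage, output: True)
halted_run = run_pipeline("login spec", stages, reviewer_rejects_plan)

print([r.stage for r in approved_run])
print([(r.stage, r.approved) for r in halted_run])
```

When the reviewer rejects the test plan, the run stops after the design stage and the execute stage never starts. The gate is kept as a plain function argument so that a ticketing or approval tool could be swapped in without touching the loop.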
Four capabilities, all enforced at every stage of the pipeline.
Every test artefact traces back to a source document. No hallucinated test cases.
Reviewer gates at every stage. The platform never executes without human approval.
Vision-assisted reasoning adapts to UI changes. Brittle locator scripts are eliminated.
Screenshots, logs, validation outcomes — the audit trail is part of the deliverable.
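Two of these capabilities, source traceability and the evidence trail, lend themselves to a small data-model sketch. The record shapes below are assumptions made for illustration: `PlannedCase`, `EvidenceRecord`, their fields, and the sample values are all hypothetical and are not ASTRA QA's schema.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedCase:
    case_id: str
    description: str
    source_ref: str = ""  # document and section; empty means ungrounded


@dataclass
class EvidenceRecord:
    case_id: str
    source_ref: str
    outcome: str  # "pass" or "fail"
    screenshots: list[str] = field(default_factory=list)
    logs: list[str] = field(default_factory=list)


def ungrounded(cases: list[PlannedCase]) -> list[str]:
    """Return ids of test cases that cite no source-document section."""
    return [c.case_id for c in cases if not c.source_ref.strip()]


def record_execution(case: PlannedCase, passed: bool,
                     screenshots: list[str], logs: list[str]) -> EvidenceRecord:
    """Bundle the outcome with its evidence; refuse ungrounded cases."""
    if not case.source_ref.strip():
        raise ValueError(f"{case.case_id} has no source reference")
    outcome = "pass" if passed else "fail"
    return EvidenceRecord(case.case_id, case.source_ref, outcome,
                          list(screenshots), list(logs))


# Placeholder data, invented for this example only.
cases = [
    PlannedCase("TC-1", "Valid login reaches dashboard", "login-spec §2.1"),
    PlannedCase("TC-2", "Password reset sends email", ""),
]

print(ungrounded(cases))
record = record_execution(cases[0], True, ["tc1-step3.png"], ["POST /login 200"])
print(record.outcome, record.source_ref)
```

The check flags TC-2 because it cites no source section, and `record_execution` raises rather than recording evidence for such a case. Carrying `source_ref` on the evidence record itself means a reviewer can follow a result back to the requirement without consulting the test plan.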
When the test pipeline is agentic, grounded, and reviewable, the QA team's role changes shape. Less drafting and clicking. More judgment and exception handling. The same team handles more features — and handles them with more confidence than brittle automation ever delivered.
Audit trail as deliverable
Every execution leaves a complete audit trail: screenshots, logs, validation outcomes, source-document traces. Compliance reviewers, engineering reviewers, and the QA team itself all read from the same evidence record.
Generative-output validation, agentic-system behaviour testing, RAG-pipeline drift detection — the categories traditional automation has no answer for. ASTRA QA's grounding and bounded autonomy make these testable.
One agentic-test pipeline serves every product team, every release, every regulated audit. Reusable across features, environments, and review boundaries.
ASTRA QA is the productisation of AIFT's CityU Lab Agentic QA solution (architectural codename StratumQA). The research direction (grounded test synthesis, bounded execution, vision-assisted reasoning) had been live in AIFT engagements for years before ASTRA delivered it as a named product.
The category evolution
Locator scripts and record-and-replay: brittle locators. Fast to write, fragile to maintain.
LLM-as-judge: faster to scale. Hallucinated test logic. Unreliable to defend.
Agentic pipeline (ASTRA QA): grounded. Bounded. Self-healing by construction.
ASTRA QA isn't an integration of a public framework. The grounded-test-synthesis and bounded-execution approaches were developed at AIFT and carried into ASTRA's product surface. Architecture decisions reflect research depth, not vendor packaging.
When a QA engagement needs research-grade depth — novel domain models, custom evaluation harnesses, multi-modal evidence pipelines — we have the bench to build it. Decisions are made by senior engineers with direct access to AIFT's research team, not associates working from a playbook.
AIFT is the only FinTech research laboratory recognised by InnoHK, the Hong Kong SAR Government's flagship innovation programme. Co-founded by City University of Hong Kong, Columbia University, and Tsinghua University. ASTRA QA inherits the same regulatory-grade engineering posture.
Bring the agentic pipeline in
Tell us what your product surface looks like, where traditional automation has been breaking, and what you're trying to test that LLM-as-judge can't reliably handle. We'll sketch the pipeline shape and the engagement model for your team.