ASTRA

Forge

Knowledge that compounds.
Agents that reason.

ASTRA Forge turns enterprise knowledge — messy, distributed, unstructured — into AI-ready ground truth that agents can actually reason over. Not retrieved passages; reasoning paths.

The Pain

Enterprise knowledge is fragile.
And your AI projects know it.

Most production AI doesn't fail at the model. It fails at the ground truth feeding it. Knowledge lives in SharePoint, Confluence, PDFs, email threads, ticket histories, and people's heads. When those people leave or change roles, institutional reasoning leaves with them.

01

Knowledge in human heads

Critical reasoning lives in the team's heads, not in retrievable systems. The day a senior person leaves, your AI loses access to the most valuable training context you had.

02

Per-project pipelines

Every AI initiative rebuilds the same data plumbing from scratch. Six months later you have three pilots, three pipelines, and no unified ground truth behind any of them.

03

RAG-only retrieval

Vector search finds passages that look relevant. For a chatbot answering “what's our refund policy”, that's enough. For an agent investigating a multi-step regulatory question, it's not.

Why RAG Alone Fails

RAG retrieves passages.
Deep reasoning needs more.

Retrieval-Augmented Generation is the right baseline for most question-answering. It's not the right foundation for agentic reasoning in high-stakes domains.

  1. 01

    Chunk boundaries break context

    Documents get chunked for embedding. The relationships between facts in the same document — or across documents — are lost the moment you slice them up for vector retrieval.

  2. 02

    Vector similarity isn't reasoning

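    Two passages can be semantically similar and logically unrelated, and two logically linked passages can look nothing alike. Retrieving “supplier X is in region Y” and “region Y has sanctions risk” does not by itself yield “supplier X is at sanctions risk”: vector search has no notion that the two facts chain, so the conclusion depends on both chunks surfacing together and the LLM making the link unprompted. RAG returns chunks, not inferences.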

  3. 03

    Multi-hop questions cascade errors

    Each hop in an agentic reasoning loop is another roll of the dice. By the third retrieval, the agent's working context is fragmented. Hallucinations compound. Confidence in any one answer drops.

  4. 04

    Schema-free, constraint-blind

    RAG doesn't know that “drug X” and “contraindication Y” are typed entities with a “hasContraindication” relation. It sees them as nearby text. For clinical, financial, or regulatory reasoning, that distinction is the difference between a working agent and a liability.

None of this is RAG's fault. RAG is the right foundation for the right job. It's not the right foundation for the agentic, multi-hop, schema-aware reasoning enterprises now need from production AI.
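The compounding described in point 03 can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every hop is independently correct with the same probability. The function name and the 0.9 figure are hypothetical, not measured results.

```python
def chain_success(per_hop_accuracy: float, hops: int) -> float:
    """Probability that every hop in a chain is correct, assuming independent hops."""
    if not 0.0 <= per_hop_accuracy <= 1.0:
        raise ValueError("per_hop_accuracy must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return per_hop_accuracy ** hops


# Hypothetical per-hop accuracy of 0.9; watch the chain-level figure fall.
for hops in (1, 2, 3, 5):
    print(f"{hops} hop(s): {chain_success(0.9, hops):.3f}")
```

Under that assumption, a retrieval step that is right nine times in ten leaves a three-hop chain right only about 73% of the time. Real hops are rarely independent, so treat this as intuition rather than a forecast.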

The KAG Difference

RAG retrieves passages.
KAG retrieves reasoning paths.

Knowledge-Augmented Generation integrates a structured knowledge graph with a logical reasoning engine alongside the LLM. Retrieval returns reasoning paths anchored in a typed schema — not isolated text chunks.

KAG was introduced by Ant Group's Knowledge Graph Team and Zhejiang University (peer-reviewed, arXiv:2409.13731), with the open-source OpenSPG implementation. ASTRA Forge is a KAG-class platform built on the same grounded-generation research lineage.

RAG

Retrieves passages
[Diagram: query → vectors; scattered, no relations]

Vector similarity over chunked text. No schema, no constraints. Multi-hop reasoning fragmented across turns; relationships between facts are not modelled.

KAG

Retrieves reasoning paths
[Diagram: query → answer along a reasoning path; typed relations]

Logical-form-guided traversal over a typed knowledge graph. Schema-aware, constraint-respecting. Persistent reasoning substrate across agent turns.
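To make the contrast concrete, here is a minimal sketch of inference over typed facts, using the supplier and sanctions example from earlier on this page. It is not the ASTRA Forge or OpenSPG API. The `Triple` type, the relation names (`locatedIn`, `hasRisk`, `exposedTo`), and the single hard-coded rule are assumptions made for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical facts, as they might sit in a typed knowledge graph.
FACTS = [
    Triple("supplier_x", "locatedIn", "region_y"),
    Triple("region_y", "hasRisk", "sanctions"),
]


def infer_exposure(facts):
    """Apply one rule: locatedIn(s, r) and hasRisk(r, k) implies exposedTo(s, k).

    Returns (conclusion, supporting_facts) pairs so each inference
    carries the path that justifies it.
    """
    derived = []
    for located in facts:
        if located.relation != "locatedIn":
            continue
        for risk in facts:
            if risk.relation == "hasRisk" and risk.subject == located.obj:
                conclusion = Triple(located.subject, "exposedTo", risk.obj)
                derived.append((conclusion, [located, risk]))
    return derived


for conclusion, path in infer_exposure(FACTS):
    print(conclusion, "because", path)
```

The point of the sketch is the return shape: the conclusion arrives together with the two facts that support it, which is what a reasoning path means in practice.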

Peer-reviewed result

KAG outperforms RAG on multi-hop / professional-domain QA, with a reported relative F1 improvement of 19.6–33.5%.

Ant Group / OpenSPG, peer-reviewed 2024

Microsoft's GraphRAG is the closest adjacent work — graph-community-aware retrieval. KAG goes further: a dedicated reasoning engine that performs inference before passing results to the LLM.

For deep analysis and investigation — financial counterparty risk, clinical decision support, supply-chain disruption, regulatory reasoning — RAG alone is not enough. KAG matters when agents must reason across typed relationships and domain constraints.

The Forge Pipeline

One platform.
Five stages to AI-ready knowledge.

ASTRA Forge takes enterprise knowledge from raw to AI-ready in five governed stages. Each stage is observable, audited, and built to enterprise scale. Security and compliance are designed in — not bolted on.

[Pipeline diagram: raw input → Ingest → Curate → Structure → Ground → Govern → governed, KAG-class ground truth]
  1. 01

    Ingest

    Connects to SharePoint, Confluence, network shares, ticketing, CRMs, knowledge bases. Handles PDFs, Word, slides, structured data, audio transcripts.

    Connector list illustrative; specifics confirmed per engagement. Identity, access control, and audit are present from the ingestion edge — not bolted on at the end of the pipeline.

  2. 02

    Curate

    Deduplicates, version-resolves, removes stale content. Classifies sensitive data. Routes confidential material through approved-access paths only.

    PII detection, sensitivity labels, and routing rules are configured per the enterprise's security policy — not generic defaults.

  3. 03

    Structure

    Extracts entities, relationships, ontologies. Builds knowledge graphs where domain shape matters. Decomposes long documents into retrievable units with preserved context.

    This is the stage where ASTRA Forge becomes KAG-class. Structured entity extraction, typed relations, and schema design — not just embedding into a vector store. The knowledge graph is the substrate for the reasoning paths the agent will later traverse.

  4. 04

    Ground

    Produces high-quality, retrieval-ready knowledge — vector indexes, knowledge graphs, decision rule sets, structured SOPs. Every artefact traces back to its source.

    Hybrid retrieval is the state of the art: vector indexes for semantic match, knowledge graph for relationship traversal, structured rule sets for constraint-respecting inference. Forge produces all three from the same governed pipeline.

  5. 05

    Govern

    Enforces access policy, retention, residency. Every retrieval is logged. Every artefact carries its provenance. Compliance-grade audit trails by default.

    Governance is not a final stage — it is enforced at every prior stage and made queryable here. Audit logs are immutable and human-readable. Residency rules respect jurisdiction; access controls respect identity.
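As a rough illustration of the Govern stage, the sketch below shows a retrieval call that checks access, logs every attempt, and returns an artefact that carries its provenance. The class names, field names, and the `sharepoint://` source string are hypothetical. A plain in-memory list stands in for the immutable audit store a real deployment would need.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Artefact:
    artefact_id: str
    text: str
    source: str               # provenance: where the content came from
    allowed_roles: frozenset  # roles permitted to read this artefact


@dataclass
class GovernedStore:
    artefacts: dict
    audit_log: list = field(default_factory=list)

    def retrieve(self, user: str, role: str, artefact_id: str) -> Optional[Artefact]:
        artefact = self.artefacts.get(artefact_id)
        allowed = artefact is not None and role in artefact.allowed_roles
        # Log every attempt, granted or denied, before returning anything.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "artefact_id": artefact_id,
            "source": artefact.source if artefact else None,
            "granted": allowed,
        })
        return artefact if allowed else None


# Hypothetical artefact and users, for illustration only.
store = GovernedStore({
    "sop-17": Artefact("sop-17", "Escalate within 24 hours.",
                       "sharepoint://policies/sop-17.docx",
                       frozenset({"compliance"})),
})
granted = store.retrieve("alice", "compliance", "sop-17")
denied = store.retrieve("bob", "sales", "sop-17")
print(granted.source, denied, len(store.audit_log))
```

Denied requests are logged as well as granted ones, since an audit trail that only records successes cannot answer who tried to read what.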

Four capabilities, all enterprise-scale, all in production from day one.

Enterprise-scale ingestion

Built to process knowledge at organisation scale — not a desktop RAG tool.

Security & compliance built in

Identity, access control, audit, residency, encryption — not bolted on.

Structured, not just embedded

Beyond vector retrieval — entities, relationships, and decision rules. This is the KAG-class capability surfaced explicitly.
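One way to picture "structured, not just embedded" is a schema check on typed relations, using the drug and contraindication example from earlier on this page. The schema layout, entity names, and `validate_triple` helper below are assumptions for illustration, not Forge's actual data model.

```python
# Hypothetical schema: relation name -> (required subject type, required object type)
SCHEMA = {"hasContraindication": ("Drug", "Contraindication")}

# Hypothetical entity typing, as an extraction stage might produce it.
ENTITY_TYPES = {
    "drug_x": "Drug",
    "contraindication_y": "Contraindication",
    "supplier_x": "Supplier",
}


def validate_triple(subject, relation, obj):
    """Return a list of schema violations; an empty list means the triple is well typed."""
    if relation not in SCHEMA:
        return [f"unknown relation: {relation}"]
    expected_subject, expected_object = SCHEMA[relation]
    errors = []
    for role, entity, expected in (
        ("subject", subject, expected_subject),
        ("object", obj, expected_object),
    ):
        actual = ENTITY_TYPES.get(entity)
        if actual != expected:
            errors.append(f"{role} {entity} is {actual}, expected {expected}")
    return errors


print(validate_triple("drug_x", "hasContraindication", "contraindication_y"))
print(validate_triple("supplier_x", "hasContraindication", "contraindication_y"))
```

A vector store would accept both statements as text. A typed schema rejects the second one before it can reach an agent.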

Reusable

One investment in AI-readiness; many agents and applications consume it.

Target State

One ground truth.
Many agents.
No single point of failure.

When the knowledge layer is durable, the AI layer compounds. Every new agent, every new investigation, every new application reuses the same governed ground truth — not a per-project rebuild. The institution gets smarter; the agents get better; the dependency on any one person's retention gets weaker.

Multi-hop reasoning path

[Diagram: query → answer]

Each step is a typed relation. The agent doesn't guess the connection — the graph provides it. That's the difference between an answer the agent retrieved and an answer the agent can defend.
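A multi-hop path of this kind can be sketched as a breadth-first search over typed edges, where every hop keeps its relation name and source document. The entities, relation names, and document references below are hypothetical, and a production graph engine would do considerably more than this.

```python
from collections import deque

# Hypothetical typed edges: (subject, relation, object, source document)
EDGES = [
    ("acme_ltd", "suppliedBy", "supplier_x", "contract-2023-114.pdf"),
    ("supplier_x", "locatedIn", "region_y", "vendor-register.xlsx"),
    ("region_y", "subjectTo", "sanctions_regime_z", "compliance-bulletin-07.pdf"),
]


def find_path(edges, start, goal):
    """Breadth-first search that returns the list of edges from start to goal, or None."""
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge[0], []).append(edge)

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for edge in adjacency.get(node, []):
            next_node = edge[2]
            if next_node not in visited:
                visited.add(next_node)
                queue.append((next_node, path + [edge]))
    return None


path = find_path(EDGES, "acme_ltd", "sanctions_regime_z")
for subject, relation, obj, source in path:
    print(f"{subject} -[{relation}]-> {obj}  (source: {source})")
```

Each printed hop names its relation and the document behind it, so the answer can be traced back step by step.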

01

Durable knowledge that compounds

Every agent built on Forge contributes back to the ground truth. The substrate gets richer with use, not noisier. Six months in, you don't have a dataset — you have an institutional reasoning layer.

02

Agents that reason, not just retrieve

Multi-hop investigations. Constraint-respecting decisions. Schema-aware reasoning paths. The kind of work agentic AI is supposed to do — supported by the substrate it's supposed to do it on.

03

End of the human-retention bottleneck

When senior team members move on, the institutional reasoning stays. Their tacit knowledge becomes documented relationships in the graph. Onboarding shortens. Departure no longer drops the organisation's IQ.

04

One investment, many applications

Stop building per-project data pipelines. Build one ground truth that every agent uses.

R&D Backbone

Built on grounded-generation research.
From the InnoHK lab ASTRA spun out of.

ASTRA Forge is built on grounded-generation research from AIFT, the Laboratory for AI-Powered Financial Technologies and ASTRA's parent R&D lab. AIFT is the only FinTech research laboratory recognised by InnoHK, the Hong Kong SAR Government's flagship innovation initiative. It was co-founded by City University of Hong Kong, Columbia University, and Tsinghua University, and it brings an R&D bench of 60+ engineers. This is the same research lineage that KAG-class systems were born from.

  1. 01

    Research-lineage continuity

    KAG isn't a framework we adopted from a paper. It is the same grounded-generation research direction AIFT has been working in for years. The deep-dive you've just read is engineering practice with research behind it — not vendor marketing.

  2. 02

    60+ engineer R&D bench

    When the implementation needs research-grade muscle, we have it. Architecture decisions in your Forge engagement are made by senior engineers with direct access to the AIFT bench — not associates working from a playbook.

  3. 03

    The InnoHK FinTech lab

    AIFT is the only FinTech research laboratory recognised by InnoHK, the Hong Kong SAR Government's flagship innovation programme. It was co-founded by City University of Hong Kong, Columbia University, and Tsinghua University, and it operates from Hong Kong Science Park.

Start

Bring the ground truth in

If your AI's bottleneck is the knowledge feeding it, let's talk.

Tell us what your agents need to reason over, where the knowledge lives today, and what the production target is. We'll sketch the Forge engagement shape and the architecture for your domain.