Forge
AI-READINESS · KAG-CLASS PLATFORM

Knowledge that compounds.
Agents that reason.

ASTRA Forge turns enterprise knowledge — messy, distributed, unstructured — into AI-ready ground truth that agents can actually reason over. Not retrieved passages; reasoning paths.

The Pain

Enterprise knowledge is fragile.
And your AI projects know it.

Most production AI doesn't fail at the model. It fails at the ground truth feeding it. Knowledge lives in SharePoint, Confluence, PDFs, email threads, ticket histories, and in people's heads. The moment people leave — or move on — institutional reasoning leaves with them.

Knowledge in human heads

Critical reasoning lives in the team's heads, not in retrievable systems. The day a senior person leaves, your AI loses access to the most valuable training context you had.

Per-project pipelines

Every AI initiative rebuilds the same data plumbing from scratch. Six months later you have three pilots, three pipelines, and no unified ground truth.

RAG-only retrieval

Vector search finds passages that look relevant. For a chatbot answering “what's our refund policy”, that's enough. For an agent investigating a multi-step regulatory question, it's not.

Why RAG Alone Fails
TECHNICAL DEEP-DIVE · RETRIEVAL LIMITS

RAG retrieves passages.
Deep reasoning needs more.

Retrieval-Augmented Generation is the right baseline for most question-answering. It's not the right foundation for agentic reasoning in high-stakes domains.

01
Chunk boundaries break context
Documents get chunked for embedding. The relationships between facts in the same document — or across documents — are lost the moment you slice them up for vector retrieval.
02
Vector similarity isn't reasoning
Two passages can be semantically similar and logically unrelated, and two logically connected passages need not look alike. An agent that retrieves “supplier X is in region Y” and “region Y has sanctions risk” gets no help concluding “supplier X is at sanctions risk”: RAG returns chunks, not inferences.
03
Multi-hop questions cascade errors
Each hop in an agentic reasoning loop is another roll of the dice. By the third retrieval, the agent's working context is fragmented. Hallucinations compound. Confidence in any one answer drops.
04
Schema-free, constraint-blind
RAG doesn't know that “drug X” and “contraindication Y” are typed entities with a “hasContraindication” relation. It sees them as nearby text. For clinical, financial, or regulatory reasoning, that distinction is the difference between a working agent and a liability.
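To make the typed-entity point concrete, here is a minimal sketch of a schema check in Python. The entity names come from the example above; the `ENTITY_TYPES` and `ALLOWED_RELATIONS` tables, the `Triple` class, and `is_valid` are hypothetical names invented for illustration, not part of ASTRA Forge or any KAG library.

```python
from dataclasses import dataclass

# Hypothetical schema: each entity has a type, and each relation
# declares which (subject type, object type) pair it accepts.
ENTITY_TYPES = {"drug X": "Drug", "contraindication Y": "Contraindication"}
ALLOWED_RELATIONS = {"hasContraindication": ("Drug", "Contraindication")}


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


def is_valid(triple: Triple) -> bool:
    """Return True if the relation is known and both endpoint types match it."""
    signature = ALLOWED_RELATIONS.get(triple.relation)
    if signature is None:
        return False  # relation is not defined in the schema
    subject_type = ENTITY_TYPES.get(triple.subject)
    object_type = ENTITY_TYPES.get(triple.obj)
    return (subject_type, object_type) == signature


good = Triple("drug X", "hasContraindication", "contraindication Y")
reversed_triple = Triple("contraindication Y", "hasContraindication", "drug X")
print(is_valid(good))             # True
print(is_valid(reversed_triple))  # False
```

A plain vector index has no equivalent of the second check: the reversed statement would embed to almost the same place as the correct one.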

None of this is RAG's fault. RAG is the right foundation for the right job. It's not the right foundation for the agentic, multi-hop, schema-aware reasoning enterprises now need from production AI.

The KAG Difference
KNOWLEDGE-AUGMENTED GENERATION · STRUCTURED RETRIEVAL + LOGICAL REASONING

RAG retrieves passages.
KAG retrieves reasoning paths.

Knowledge-Augmented Generation integrates a structured knowledge graph with a logical reasoning engine alongside the LLM. Retrieval returns reasoning paths anchored in a typed schema — not isolated text chunks.
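As a rough illustration of what "returning a reasoning path" can mean, the sketch below applies a single hand-written rule to typed facts and keeps the supporting facts alongside each conclusion. It reuses the supplier and sanctions example from earlier on this page. The relation names (`locatedIn`, `hasRisk`, `exposedTo`) and the function `infer_supplier_risks` are assumptions made for the example; a real KAG reasoning engine is considerably more general.

```python
# Typed facts as (subject, relation, object) triples. All names are hypothetical.
FACTS = {
    ("supplier X", "locatedIn", "region Y"),
    ("region Y", "hasRisk", "sanctions"),
}


def infer_supplier_risks(facts):
    """Apply one rule: locatedIn(s, r) and hasRisk(r, k) implies exposedTo(s, k).

    Returns a dict mapping each derived triple to the two facts that support it,
    so the conclusion always arrives with its reasoning path.
    """
    derived = {}
    for s, rel1, region in facts:
        if rel1 != "locatedIn":
            continue
        for r, rel2, risk in facts:
            if rel2 == "hasRisk" and r == region:
                derived[(s, "exposedTo", risk)] = [(s, rel1, region), (r, rel2, risk)]
    return derived


for conclusion, path in infer_supplier_risks(FACTS).items():
    print(conclusion, "because", path)
```

The point of the design is that the LLM receives the derived fact and its two supporting facts together, rather than two unrelated chunks it has to connect on its own.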

KAG was introduced by Ant Group's Knowledge Graph Team and Zhejiang University (peer-reviewed, arXiv:2409.13731), with the open-source OpenSPG implementation. ASTRA Forge is a KAG-class platform built on the same grounded-generation research lineage.

RAG

Retrieves passages

Vector similarity over chunked text. No schema, no constraints. Multi-hop reasoning fragmented across turns; relationships between facts are not modelled.

KAG

Retrieves reasoning paths

Logical-form-guided traversal over a typed knowledge graph. Schema-aware, constraint-respecting. Persistent reasoning substrate across agent turns.

Peer-reviewed result

KAG outperforms RAG on multi-hop and professional-domain QA, with a reported F1 improvement of 19.6–33.4%.

Ant Group / OpenSPG, peer-reviewed 2024

Microsoft's GraphRAG is the closest adjacent work — graph-community-aware retrieval. KAG goes further: a dedicated reasoning engine that performs inference before passing results to the LLM.

For deep analysis and investigation — financial counterparty risk, clinical decision support, supply-chain disruption, regulatory reasoning — RAG alone is not enough. KAG matters when agents must reason across typed relationships and domain constraints.

The Forge Pipeline
INGEST · CURATE · STRUCTURE · GROUND · GOVERN

One platform.
Five stages to AI-ready knowledge.

ASTRA Forge takes enterprise knowledge from raw to AI-ready in five governed stages. Each stage is observable, audited, and built to enterprise scale. Security and compliance are designed in — not bolted on.

01
Ingest
Connects to SharePoint, Confluence, network shares, ticketing, CRMs, knowledge bases. Handles PDFs, Word, slides, structured data, audio transcripts.
Connector list illustrative; specifics confirmed per engagement. Identity, access control, and audit are present from the ingestion edge — not bolted on at the end of the pipeline.
02
Curate
Deduplicates, version-resolves, removes stale content. Classifies sensitive data. Routes confidential material through approved-access paths only.
PII detection, sensitivity labels, and routing rules are configured per the enterprise's security policy — not generic defaults.
03
Structure
Extracts entities, relationships, ontologies. Builds knowledge graphs where domain shape matters. Decomposes long documents into retrievable units with preserved context.
This is the stage where ASTRA Forge becomes KAG-class. Structured entity extraction, typed relations, and schema design — not just embedding into a vector store. The knowledge graph is the substrate for the reasoning paths the agent will later traverse.
04
Ground
Produces high-quality, retrieval-ready knowledge — vector indexes, knowledge graphs, decision rule sets, structured SOPs. Every artefact traces back to its source.
Hybrid retrieval is the state of the art: vector indexes for semantic match, knowledge graph for relationship traversal, structured rule sets for constraint-respecting inference. Forge produces all three from the same governed pipeline.
05
Govern
Enforces access policy, retention, residency. Every retrieval is logged. Every artefact carries its provenance. Compliance-grade audit trails by default.
Governance is not a final stage — it is enforced at every prior stage and made queryable here. Audit logs are immutable and human-readable. Residency rules respect jurisdiction; access controls respect identity.
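The hybrid retrieval described in the Ground stage, combined with the access rules from Govern, can be sketched in a few lines. This is a toy under stated assumptions, not Forge's implementation: the documents, vectors, sensitivity labels, source paths, the `0.5` graph boost, and the function `hybrid_retrieve` are all hypothetical.

```python
import math

# Toy artefacts. Each carries a sensitivity label and a source for provenance.
DOCS = {
    "d1": {"vec": [1.0, 0.0], "entities": {"supplier X"}, "label": "internal",
           "source": "contracts/supplier-x.pdf"},
    "d2": {"vec": [0.0, 1.0], "entities": {"region Y"}, "label": "internal",
           "source": "risk/region-y.docx"},
    "d3": {"vec": [1.0, 0.0], "entities": {"supplier Z"}, "label": "confidential",
           "source": "legal/memo.pdf"},
}

# Toy knowledge graph: entity -> directly related entities.
GRAPH = {"supplier X": {"region Y"}, "region Y": set()}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, query_entity, allowed_labels, top_k=2):
    """Rank artefacts by vector similarity plus a graph boost, after a rule filter."""
    # Graph step: the query entity and its direct neighbours.
    related = {query_entity} | GRAPH.get(query_entity, set())
    results = []
    for doc_id, doc in DOCS.items():
        # Rule step: skip artefacts the caller is not cleared to see.
        if doc["label"] not in allowed_labels:
            continue
        # Vector step: semantic match against the query embedding.
        score = cosine(query_vec, doc["vec"])
        if doc["entities"] & related:
            score += 0.5  # arbitrary illustrative boost for graph-connected artefacts
        results.append((round(score, 3), doc_id, doc["source"]))
    return sorted(results, reverse=True)[:top_k]


print(hybrid_retrieve([1.0, 0.0], "supplier X", {"internal"}))
```

Note that `d2` is returned even though its vector is orthogonal to the query: the graph link from supplier X to region Y surfaces it, while `d3` is excluded by the access rule despite a perfect vector match. Every result carries its source path.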

Four capabilities, all enterprise-scale, all in production from day one.

Enterprise-scale ingestion

Built to process knowledge at organisation scale — not a desktop RAG tool.

Security & compliance built in

Identity, access control, audit, residency, encryption — not bolted on.

Structured, not just embedded

Beyond vector retrieval — entities, relationships, and decision rules. This is the KAG-class capability surfaced explicitly.

Reusable

One investment in AI-readiness; many agents and applications consume it.

Target State
WHAT ENTERPRISES GET · DURABLE KNOWLEDGE THAT COMPOUNDS

One ground truth.
Many agents.
No single point of failure.

When the knowledge layer is durable, the AI layer compounds. Every new agent, every new investigation, every new application reuses the same governed ground truth — not a per-project rebuild. The institution gets smarter; the agents get better; the dependency on any one person's retention gets weaker.

Multi-hop reasoning path

Agentic thinking-trace · KAG substrate

Each step is a typed relation. The agent doesn't guess the connection — the graph provides it. That's the difference between an answer the agent retrieved and an answer the agent can defend.
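A minimal sketch of such a trace, assuming a small hand-built graph: breadth-first search over typed edges returns every step of the path, so the answer can be printed with the relations that justify it. The edges and the function `find_path` are hypothetical and only illustrate the idea.

```python
from collections import deque

# Typed edges as (subject, relation, object). All names are hypothetical.
EDGES = [
    ("supplier X", "locatedIn", "region Y"),
    ("region Y", "hasRisk", "sanctions"),
    ("supplier X", "supplies", "product P"),
]


def find_path(edges, start, goal):
    """Breadth-first search; returns the list of typed steps from start to goal, or None."""
    adjacency = {}
    for s, rel, o in edges:
        adjacency.setdefault(s, []).append((rel, o))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


for s, rel, o in find_path(EDGES, "supplier X", "sanctions"):
    print(f"{s} --{rel}--> {o}")
```

When no path exists, `find_path` returns `None`, which gives the agent an explicit "no supported connection" signal instead of a plausible-sounding guess.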

Durable knowledge that compounds

Every agent built on Forge contributes back to the ground truth. The substrate gets richer with use, not noisier. Six months in, you don't have a dataset — you have an institutional reasoning layer.

Agents that reason, not just retrieve

Multi-hop investigations. Constraint-respecting decisions. Schema-aware reasoning paths. The kind of work agentic AI is supposed to do — supported by the substrate it's supposed to do it on.

End of the human-retention bottleneck

When senior team members move on, the institutional reasoning stays. Their tacit knowledge becomes documented relationships in the graph. Onboarding shortens. Departure no longer drops the organisation's IQ.

One investment, many applications

Stop building per-project data pipelines. Build one ground truth that every agent uses.

R&D Backbone
AIFT · GROUNDED-GENERATION RESEARCH LINEAGE

Built on grounded-generation research.
From the InnoHK lab ASTRA spun out of.

ASTRA Forge is built on grounded-generation research from AIFT, the Laboratory for AI-Powered Financial Technologies and ASTRA's parent R&D lab. AIFT is the only FinTech research laboratory recognised by InnoHK, the Hong Kong SAR Government's flagship innovation initiative. It was co-founded by City University of Hong Kong, Columbia University, and Tsinghua University, and has an R&D bench of 60+ engineers. It is the same research lineage that KAG-class systems were born from.

01
Research-lineage continuity
KAG isn't a framework we adopted from a paper. It is the same grounded-generation research direction AIFT has been working in for years. The deep-dive you've just read is engineering practice with research behind it — not vendor marketing.
02
60+ engineer R&D bench
When the implementation needs research-grade muscle, we have it. Architecture decisions in your Forge engagement are made by senior engineers with direct access to the AIFT bench — not associates working from a playbook.
03
The InnoHK FinTech lab
AIFT is the only FinTech research laboratory recognised by InnoHK — Hong Kong SAR Government's flagship innovation programme. Co-founded by City University of Hong Kong, Columbia University, and Tsinghua University. Operates from Hong Kong Science Park.

Start

Bring the ground truth in

If your AI's bottleneck is the knowledge feeding it, let's talk.

Tell us what your agents need to reason over, where the knowledge lives today, and what the production target is. We'll sketch the Forge engagement shape and the architecture for your domain.

Forge team

hello@astrahk.com

