Google AI Agents Challenge Track 2: MBS Reliability Agent

Mandatory Technologies Verification

Intelligence

Gemini 2.5 Flash via Vertex AI / Agent Engine

The routing LLM and multi-agent reasoning backbone, hosted on Google Cloud Vertex AI in us-central1.

Orchestration

Google ADK root_agent + SequentialAgent

Agentic workflow orchestrated by Google ADK with EvidenceRetrievalAgent → ProcurementDecisionAgent → MBSGateAgent → ReviewPacketAgent.

Infrastructure

Google Cloud Run backend + Agent Engine deployed smoke test

Public demo front-end served on Cloudflare, while the agent backend runs on Google Cloud infrastructure.

Google Cloud Proof

Cloud Run Backend

https://mbs-track2-adk-109897818537.us-central1.run.app

Agent Engine Resource

projects/109897818537/locations/us-central1/reasoningEngines/355545226783227904

Deployment Note

Public demo is hosted on Cloudflare; agent backend runs on Google Cloud.

Demo Cases

Case 1: Normal Vendor Approval

PASS (Baseline)

Scenario: Standard vendor with complete evidence pack, standard contract terms, valid insurance.

The agent successfully retrieves policy and evidence, makes the correct approval decision, and passes MBS gate validation.

Case 2: Risky Edge Case

REVIEW (Baseline) / FAIL (Gated)

Scenario: Bank detail change request or off-channel payment instruction with missing evidence.

The baseline agent may approve the request; the MBS-gated agent blocks it or routes it to review before execution.

Case 3: Optimized MBS-Gated

PASS/REVIEW/FAIL Logic

Scenario: All edge cases are handled by the MBS runtime gate, which evaluates decision confidence, evidence completeness, and policy compliance before any action is taken.

The optimized agent uses deterministic MBS gate logic to ensure safe action selection.
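A minimal sketch of what a deterministic gate over these three signals could look like. The real MBS gate rules are not shown on this page, so the `ProposedDecision` fields, the `MIN_CONFIDENCE` threshold, and the rule ordering are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical threshold; the actual MBS gate values are not published here.
MIN_CONFIDENCE = 0.8


@dataclass(frozen=True)
class ProposedDecision:
    """Illustrative input shape for the gate (not the project's schema)."""
    action: str                         # "PASS", "REVIEW", or "FAIL" from the decision agent
    confidence: float                   # decision confidence in [0, 1]
    missing_evidence: tuple[str, ...]   # required evidence items that were not found
    policy_violations: tuple[str, ...]  # policy rules the request breaks


def mbs_gate(decision: ProposedDecision) -> str:
    """Return the gated outcome. No randomness: same input, same output."""
    # Any policy violation blocks the action outright.
    if decision.policy_violations:
        return "FAIL"
    # Incomplete evidence or low confidence can never result in PASS.
    if decision.missing_evidence or decision.confidence < MIN_CONFIDENCE:
        return "FAIL" if decision.action == "FAIL" else "REVIEW"
    # Otherwise the proposed action stands.
    return decision.action
```

Under these assumed rules, a proposed PASS with a missing bank confirmation letter is downgraded to REVIEW, and any policy violation yields FAIL regardless of confidence.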

Evidence Access

View Trace

Local and Vertex Gemini execution traces showing retrieval and decision spans.

View Optimizer Diff

GEPA optimizer results across 33 iterations, showing how the prompt evolved.

View Scorecard

Baseline vs. optimized local evaluation with decision/gate match metrics.

View Grounding/RAG Evidence

Proof of custom private-data grounding implementation (NOT Vertex AI Search).

View Full Code

adk_agent/, evals/, instructions/, and results/ on GitHub.

Scorecard

Early local mock evidence; live Vertex Gemini / Vertex Agent Engine evidence is validated separately.

baseline_local: 2/3 decision match, 2/3 gate match. Local mock evidence, unoptimized.
optimized_local: 3/3 decision match, 3/3 gate match. Local mock evidence, GEPA optimized (33 iterations).
live_vertex_agent: 6/6 gate match (100% gate match rate). Live Gemini 2.5 Flash on Vertex Agent Engine.

Opaque label

Local scorecard is local mock evidence. Live Gemini/Vertex deployment is proven separately by the Agent Engine smoke test with 100% gate match.
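For readers who want to see how "decision match" and "gate match" counts of this kind can be computed, here is a small sketch. The row format (`expected_decision`, `decision`, `expected_gate`, `gate`) is an assumption, and the example rows are made up for illustration; they are not the project's evaluation data.

```python
from typing import Any


def match_counts(cases: list[dict[str, Any]]) -> dict[str, str]:
    """Count how many cases match the expected decision and the expected gate outcome."""
    total = len(cases)
    decision_hits = sum(c["decision"] == c["expected_decision"] for c in cases)
    gate_hits = sum(c["gate"] == c["expected_gate"] for c in cases)
    return {
        "decision_match": f"{decision_hits}/{total}",
        "gate_match": f"{gate_hits}/{total}",
    }


# Made-up rows, only to exercise the function.
example_cases = [
    {"expected_decision": "PASS", "decision": "PASS", "expected_gate": "PASS", "gate": "PASS"},
    {"expected_decision": "REVIEW", "decision": "PASS", "expected_gate": "REVIEW", "gate": "REVIEW"},
    {"expected_decision": "FAIL", "decision": "FAIL", "expected_gate": "FAIL", "gate": "FAIL"},
]

print(match_counts(example_cases))
```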

Grounding / RAG

We use custom private-data grounding / custom RAG, NOT Vertex AI Search.

Retrieval Agents & Tools

from typing import Any

# load_policy() and load_evidence_pack() are the project's own loaders,
# defined elsewhere in the repository (not shown here).

def retrieve_policy() -> dict[str, Any]:
    """Retrieve the private procurement policy used for grounding."""
    policy = load_policy()
    return {
        "truth_label": "local custom private retrieval",
        "source": "demo/policies/procurement_policy_v1.md",
        "chars": len(policy),
        "policy": policy,
    }

def retrieve_evidence_pack(pack_id: str) -> dict[str, Any]:
    """Retrieve a private vendor evidence pack by id."""
    pack = load_evidence_pack(pack_id)
    return {
        "truth_label": "local custom private retrieval",
        "pack_id": pack_id,
        "found": pack is not None,  # False when no pack exists for this id
        "evidence_pack": pack,
    }

EvidenceRetrievalAgent calls both tools before any ProcurementDecisionAgent action. This ensures grounding in private procurement policy and vendor-specific evidence.

Private Data Sources

Private procurement policy: demo/policies/procurement_policy_v1.md
Vendor evidence packs: demo/evidence_packs/pack_*.json (10 different packs)
No external search: All retrieval is from local, pre-loaded private data
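The two loaders used by the retrieval tools are not shown on this page. The sketch below is one plausible way to implement them over the local files listed above. It assumes that paths are relative to the repository root and that a pack id such as `pack_approve_standard` maps directly to `demo/evidence_packs/pack_approve_standard.json`; the `root` parameter is a hypothetical addition to make the functions easy to test.

```python
import json
from pathlib import Path
from typing import Any, Optional

# Assumption: data lives under ./demo relative to the working directory.
DATA_ROOT = Path("demo")


def load_policy(root: Path = DATA_ROOT) -> str:
    """Read the procurement policy text from local disk."""
    return (root / "policies" / "procurement_policy_v1.md").read_text(encoding="utf-8")


def load_evidence_pack(pack_id: str, root: Path = DATA_ROOT) -> Optional[dict[str, Any]]:
    """Return the parsed evidence pack, or None if no such pack exists."""
    # Reject ids containing path components so lookups stay inside evidence_packs/.
    if Path(pack_id).name != pack_id:
        return None
    path = root / "evidence_packs" / f"{pack_id}.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Returning `None` for an unknown id lines up with the `"found": pack is not None` field in `retrieve_evidence_pack`, so a missing pack surfaces as data rather than as an exception.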

Evidence in Traces

Trace Example (normal_standard_vendor)

policy_retrieved

"source": "demo/policies/procurement_policy_v1.md"

{
  "spans": [
    {
      "name": "policy_retrieved",
      "status": "ok",
      "attributes": {
        "truth_label": "local custom private retrieval",
        "source": "demo/policies/procurement_policy_v1.md"
      }
    },
    {
      "name": "evidence_retrieved",
      "status": "ok",
      "attributes": {
        "pack_id": "pack_approve_standard"
      }
    }
  ]
}
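A trace in this shape can be checked mechanically for grounding. The sketch below assumes that a grounded run is one where both the `policy_retrieved` and `evidence_retrieved` spans are present with status `ok`; the helper name `grounding_spans_ok` is hypothetical and not part of the project.

```python
import json

# Span names taken from the trace example above.
REQUIRED_SPANS = ("policy_retrieved", "evidence_retrieved")


def grounding_spans_ok(trace: dict) -> bool:
    """True if every required retrieval span appears with status 'ok'."""
    ok_names = {
        span.get("name")
        for span in trace.get("spans", [])
        if span.get("status") == "ok"
    }
    return all(name in ok_names for name in REQUIRED_SPANS)


trace = json.loads("""
{
  "spans": [
    {"name": "policy_retrieved", "status": "ok",
     "attributes": {"truth_label": "local custom private retrieval",
                    "source": "demo/policies/procurement_policy_v1.md"}},
    {"name": "evidence_retrieved", "status": "ok",
     "attributes": {"pack_id": "pack_approve_standard"}}
  ]
}
""")

print(grounding_spans_ok(trace))  # prints True
```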

Multi-Agent Workflow

EvidenceRetrievalAgent → ProcurementDecisionAgent → MBSGateAgent → ReviewPacketAgent

Agent Chain Description

EvidenceRetrievalAgent: Retrieves the private procurement policy and the vendor evidence pack
ProcurementDecisionAgent: Uses the evidence to make a structured decision (PASS/REVIEW/FAIL)
MBSGateAgent: Validates and gates the proposed action using deterministic MBS logic
ReviewPacketAgent: Creates a human review packet for edge cases that need manual review

The chain is orchestrated by Google ADK SequentialAgent and configured via root_agent. All agents use VertexGemini(model="gemini-2.5-flash").
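To illustrate the sequential hand-off, here is a plain-Python model of the four-step chain in which each step reads and extends a shared state dictionary. It deliberately does not use the Google ADK API. The step bodies are stubs, and the state keys (`pack_id`, `evidence_pack`, `proposed`, `gated`, `review_packet`) and the in-memory pack data are assumptions for illustration only.

```python
from typing import Any, Callable

State = dict[str, Any]
Step = Callable[[State], State]


def run_sequential(steps: list[Step], state: State) -> State:
    """Run each step in order, passing the accumulated state along."""
    for step in steps:
        state = step(state)
    return state


def evidence_retrieval(state: State) -> State:
    packs = {"pack_demo": {"insurance_valid": True}}  # hypothetical in-memory data
    return {**state, "evidence_pack": packs.get(state["pack_id"])}


def procurement_decision(state: State) -> State:
    # Stub decision: in the real chain this would be an LLM call.
    proposed = "PASS" if state["evidence_pack"] else "REVIEW"
    return {**state, "proposed": proposed}


def mbs_gate(state: State) -> State:
    # Stub gate: never lets an action through without an evidence pack.
    gated = state["proposed"] if state["evidence_pack"] else "REVIEW"
    return {**state, "gated": gated}


def review_packet(state: State) -> State:
    # Only build a packet when the gated outcome needs a human.
    if state["gated"] == "REVIEW":
        packet = {"pack_id": state["pack_id"], "reason": "evidence pack not found"}
        return {**state, "review_packet": packet}
    return state


CHAIN = [evidence_retrieval, procurement_decision, mbs_gate, review_packet]

result = run_sequential(CHAIN, {"pack_id": "pack_unknown"})
print(result["gated"], "review_packet" in result)
```

Because the order is fixed, the gate always sees the retrieved evidence and the proposed decision before anything is acted on, which is the property the sequential design is meant to provide.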

Limitations

Opaque labels

Not claiming full hosted SaaS readiness.
Not claiming Vertex AI Search.
Some evaluation artifacts are local mock; live Gemini evidence is separately labeled.
The MBS gate is a deterministic, packaged runtime gate.

Product branding

General Aletheia/MBS product marketing is below this page or at /mbs. This page is judge-first and focused on Challenge Track 2 evidence.