SCRYBE structured ingestion engine

For agents, RAG, and long-context systems

URLs in. Structured context out.

SCRYBE fetches pages, resolves dynamic rendering, extracts semantic structure, falls back to OCR when needed, and returns markdown or JSON that feels deliberate instead of scraped.

Input: web pages, dynamic apps, PDFs, OCR-heavy docs
Output: semantic markdown, structured JSON, job-based workflows
Runtime: Python-first API, browser fallback, Node hidden behind OCR
Example URL: https://brahmai.in

Example response shape:

GET /v1/parse/json?url=https://brahmai.in

{
  "format": "json",
  "document": {
    "metadata": {
      "title": "...",
      "description": "...",
      "content_type": "webpage"
    },
    "markdown": "Title: ...\nMeta Description: ...",
    "chunks": null,
    "summary": null
  }
}
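
A minimal client-side sketch of working with the response shape above. The host `https://scrybe.example` and the helper names `build_parse_url` and `read_document` are hypothetical; only the `/v1/parse/json` path, the `url` parameter, and the field names come from the example. Percent-encoding the target URL is an assumption made for safety, since the example shows it unencoded.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL: the page does not state the service host.
BASE_URL = "https://scrybe.example"


def build_parse_url(target_url: str, base_url: str = BASE_URL) -> str:
    """Build the GET /v1/parse/json request URL with the target percent-encoded."""
    return f"{base_url}/v1/parse/json?{urlencode({'url': target_url})}"


def read_document(payload: str) -> dict:
    """Pull the main fields out of a response body shaped like the example."""
    doc = json.loads(payload)["document"]
    return {
        "title": doc["metadata"]["title"],
        "content_type": doc["metadata"]["content_type"],
        "markdown": doc["markdown"],
        # chunks and summary are null in the example, so treat them as optional
        "chunks": doc.get("chunks") or [],
        "summary": doc.get("summary"),
    }
```

Treating `chunks` and `summary` as optional keeps a caller working whether or not enrichment was requested.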

01

Semantic extraction

SCRYBE no longer just flattens HTML. It finds the main content root, preserves headings, lists, links, blockquotes, tables, and code, and keeps the result readable for downstream reasoning.

02

Smart runtime ladder

Static fetch first. Browser render when the page is mostly shell. OCR or LiteParse only when native extraction becomes untrustworthy. Faster happy path, deeper fallback.
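
The ladder can be pictured as a cheapest-first loop that escalates only when a rung fails or its output cannot be trusted. This is an illustrative sketch, not SCRYBE's implementation: `run_ladder`, the rung callables, and the `is_trustworthy` predicate are all hypothetical names.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each rung is (name, extractor). The extractors are hypothetical stand-ins
# for static fetch, browser render, and OCR; each returns text or None.
Rung = Tuple[str, Callable[[str], Optional[str]]]


def run_ladder(
    url: str,
    rungs: Sequence[Rung],
    is_trustworthy: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try rungs in order; return (rung_name, text) for the first trusted result."""
    last_error: Optional[Exception] = None
    for name, extract in rungs:
        try:
            text = extract(url)
        except Exception as exc:  # a failed rung simply means "escalate"
            last_error = exc
            continue
        if text is not None and is_trustworthy(text):
            return name, text
    raise RuntimeError(f"no rung produced trustworthy output for {url}") from last_error
```

Because later rungs are never called once an earlier one succeeds, the common case pays only for the static fetch.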

03

LLM-ready delivery

Every document can carry title, source, description, content type, raw text, images, timings, chunks, and optional enrichment so your agents receive context with provenance.
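
One way to picture such a document on the consumer side is a small dataclass. The class name `ParsedDocument`, the field types, and the defaults are assumptions for illustration. The field list mirrors the paragraph above, not a published schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParsedDocument:
    # Field names follow the list above; types and defaults are assumed.
    title: str
    source: str
    description: str = ""
    content_type: str = "webpage"
    raw_text: str = ""
    images: List[str] = field(default_factory=list)
    timings: Dict[str, float] = field(default_factory=dict)
    chunks: Optional[List[str]] = None
    enrichment: Optional[Dict[str, str]] = None

    def provenance(self) -> Dict[str, str]:
        """Return the minimum an agent needs to cite where the context came from."""
        return {
            "title": self.title,
            "source": self.source,
            "content_type": self.content_type,
        }
```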

How it works

A calmer pipeline for chaotic pages.

The point is not “scraping.” The point is reconstructing a document that feels like it was authored for a machine collaborator.

Fetch

Start with fast HTTP retrieval and only escalate when the DOM looks like a shell.
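
A rough heuristic for "the DOM looks like a shell" is to measure how much visible text the static HTML carries. The sketch below uses only the standard library. The function name `looks_like_shell` and the 200-character threshold are assumptions, and a production check would likely weigh more signals.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text that sits outside script, style, noscript, and template."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def looks_like_shell(html: str, min_chars: int = 200) -> bool:
    """True when the page has too little visible text to be the real content."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.parts)) < min_chars
```

A page that is mostly a script bundle plus an empty mount node scores near zero and would be handed to the browser renderer.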

Understand

Choose the real content root and walk the structure intentionally instead of dumping nodes.
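
As a simplified illustration of picking a content root, the sketch below prefers a `<main>` or `<article>` landmark when it holds most of the page's non-chrome text, and falls back to `<body>` otherwise. The names `choose_content_root` and `min_share`, the 60% threshold, and the choice of landmarks are assumptions. SCRYBE's actual selection logic is not described on this page.

```python
from html.parser import HTMLParser


class _RootScorer(HTMLParser):
    CANDIDATES = ("main", "article")
    # Text inside these elements is treated as page chrome and ignored.
    SKIP = {"script", "style", "nav", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.open = []  # tracked tags that are currently open
        self.text_len = {"body": 0, "main": 0, "article": 0}

    def handle_starttag(self, tag, attrs):
        if tag in self.CANDIDATES or tag in self.SKIP:
            self.open.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open:
            # Pop back to the matching tag, tolerating sloppy nesting.
            while self.open and self.open.pop() != tag:
                pass

    def handle_data(self, data):
        n = len(data.strip())
        if n == 0 or any(t in self.SKIP for t in self.open):
            return
        self.text_len["body"] += n
        for candidate in self.CANDIDATES:
            if candidate in self.open:
                self.text_len[candidate] += n


def choose_content_root(html: str, min_share: float = 0.6) -> str:
    """Return 'main', 'article', or 'body' as the element to walk."""
    scorer = _RootScorer()
    scorer.feed(html)
    scorer.close()
    total = scorer.text_len["body"]
    for candidate in _RootScorer.CANDIDATES:
        if total and scorer.text_len[candidate] / total >= min_share:
            return candidate
    return "body"
```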

Normalize

Emit stable markdown and JSON with metadata, links, tables, code blocks, and provenance.

Extend

Add OCR, chunking, summaries, captions, and async jobs only when the workload deserves it.
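
Chunking is the easiest of these extensions to sketch. The example below splits already-normalized markdown at headings and caps each chunk's size. The function name `chunk_markdown`, the `max_chars` default, and the output keys are hypothetical, and the page does not say how SCRYBE itself chunks.

```python
import re
from typing import Dict, List

_HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_markdown(markdown: str, max_chars: int = 1200) -> List[Dict[str, str]]:
    """Split markdown into chunks, each tagged with its nearest heading."""
    chunks: List[Dict[str, str]] = []
    heading = ""
    buf: List[str] = []

    def flush() -> None:
        body = "\n".join(buf).strip()
        if body:
            chunks.append({"heading": heading, "text": body})
        buf.clear()

    for line in markdown.splitlines():
        match = _HEADING.match(line)
        if match:
            flush()  # close the previous section under its own heading
            heading = match.group(1).strip()
            continue
        # Start a new chunk when adding this line would exceed the cap.
        if buf and sum(len(x) + 1 for x in buf) + len(line) > max_chars:
            flush()
        buf.append(line)
    flush()
    return chunks
```

Carrying the heading on every chunk lets a retriever show where a passage sat in the original document.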

Endpoints

Built like a service, not a demo.

An experimental intelligence infrastructure product from BRAHMAI.