For agents, RAG, and long-context systems

URLs in. Structured context out.

SCRYBE fetches pages, resolves dynamic rendering, extracts semantic structure, falls back to OCR when needed, and returns markdown or JSON that feels deliberate instead of scraped.


Input: web pages, dynamic apps, PDFs, OCR-heavy docs
Output: semantic markdown, structured JSON, job-based workflows
Runtime: Python-first API, browser fallback, Node used only inside the OCR path

Live Shape operational

Example URL

https://brahmai.in

Example response shape

GET /v1/parse/json?url=https://brahmai.in

{
  "format": "json",
  "document": {
    "metadata": {
      "title": "...",
      "description": "...",
      "content_type": "webpage"
    },
    "markdown": "Title: ...\nMeta Description: ...",
    "chunks": null,
    "summary": null
  }
}
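
A minimal client-side sketch of the request and response shape above, in Python. The base URL is a placeholder (the page does not say where the API is hosted), and the helper names build_parse_url and read_document are made up for illustration; only the /v1/parse/... paths, the url query parameter, and the response fields come from the example.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL: the page does not state where the API is hosted.
BASE_URL = "https://scrybe.example"


def build_parse_url(target: str, fmt: str = "json") -> str:
    """Build a GET URL for /v1/parse/json or /v1/parse/markdown."""
    if fmt not in ("json", "markdown"):
        raise ValueError(f"unsupported format: {fmt}")
    # urlencode percent-escapes the target so it survives as one query value.
    return f"{BASE_URL}/v1/parse/{fmt}?{urlencode({'url': target})}"


def read_document(body: str) -> dict:
    """Pull the commonly used fields out of a response shaped like the example."""
    document = json.loads(body)["document"]
    metadata = document.get("metadata", {})
    return {
        "title": metadata.get("title"),
        "content_type": metadata.get("content_type"),
        "markdown": document.get("markdown", ""),
        # chunks and summary are null in the example, so treat them as optional.
        "chunks": document.get("chunks") or [],
        "summary": document.get("summary"),
    }
```

The URL from build_parse_url can be passed to any HTTP client, for example urllib.request.urlopen, and the decoded body handed to read_document.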

JSON, Markdown, OCR fallback

Semantic extraction

SCRYBE no longer just flattens HTML. It finds the main content root, preserves headings, lists, links, blockquotes, tables, and code, and keeps the result readable for downstream reasoning.

Smart runtime ladder

Static fetch first. Browser render when the page is mostly shell. OCR or LiteParse only when native extraction becomes untrustworthy. Faster happy path, deeper fallback.
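
The ladder can be pictured as an ordered list of extractors, each tried only when the previous one fails a quality check. The sketch below is an illustration under assumptions, not SCRYBE's actual implementation: run_ladder, looks_trustworthy, and the 200-character threshold are all hypothetical, and the real trust signal is not described on this page.

```python
from typing import Callable, Optional

# Each step takes a URL and returns extracted text, or None when it gives up.
Extractor = Callable[[str], Optional[str]]


def looks_trustworthy(text: Optional[str], min_chars: int = 200) -> bool:
    """Hypothetical quality gate: enough non-whitespace text to be a real page."""
    return text is not None and len("".join(text.split())) >= min_chars


def run_ladder(
    url: str,
    steps: list[tuple[str, Extractor]],
    min_chars: int = 200,
) -> tuple[str, str]:
    """Try each step in order and return (step_name, text) from the first that passes."""
    for name, extract in steps:
        text = extract(url)
        if looks_trustworthy(text, min_chars):
            return name, text
    raise RuntimeError(f"no extraction step produced usable text for {url}")
```

Ordering the steps from cheapest to most expensive (static fetch, then browser render, then OCR) is what keeps the happy path fast: later steps never run when an earlier one already returned usable text.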

LLM-ready delivery

Every document can carry title, source, description, content type, raw text, images, timings, chunks, and optional enrichment so your agents receive context with provenance.
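
One way a consumer might model such a document on its own side is a small dataclass. This is a sketch under assumptions: the class name ParsedDocument, the field names, and the to_prompt_context helper are hypothetical and simply mirror the fields listed above; the actual wire format may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedDocument:
    # Field names are illustrative; they follow the list in the text above.
    title: str
    source: str
    description: str = ""
    content_type: str = "webpage"
    raw_text: str = ""
    images: list[str] = field(default_factory=list)
    timings: dict[str, float] = field(default_factory=dict)
    chunks: Optional[list[str]] = None
    enrichment: Optional[dict] = None

    def provenance_header(self) -> str:
        """Short header so the consumer always sees where the text came from."""
        return (
            f"Source: {self.source}\n"
            f"Title: {self.title}\n"
            f"Content-Type: {self.content_type}"
        )

    def to_prompt_context(self) -> str:
        """Prefer chunks when present, otherwise fall back to the raw text."""
        body = "\n\n".join(self.chunks) if self.chunks else self.raw_text
        return f"{self.provenance_header()}\n\n{body}"
```

Keeping the provenance header attached to the body means an agent can cite the source without a second lookup.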

How it works

A calmer pipeline for chaotic pages.

The point is not “scraping.” The point is reconstructing a document that feels like it was authored for a machine collaborator.

Fetch

Start with fast HTTP retrieval and only escalate when the DOM looks like a shell.
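
As a rough illustration of what "looks like a shell" could mean, the sketch below flags a page when it loads scripts but exposes very little visible text. The heuristic, the looks_like_shell name, and the 200-character threshold are assumptions made for this example; the page does not describe the real detection rule.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text that is not inside <script>, <style>, or <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.script_count = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text a reader would actually see.
        if self._skip_depth == 0:
            self.parts.append(data)


def looks_like_shell(html: str, min_visible_chars: int = 200) -> bool:
    """Assumed heuristic: scripts present, but almost no visible text."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    visible = " ".join("".join(parser.parts).split())
    return parser.script_count > 0 and len(visible) < min_visible_chars
```

A static fetch that returns True here would be a candidate for browser rendering; a page with real article text would stay on the fast path.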

Understand

Choose the real content root and walk the structure intentionally instead of dumping nodes.

Normalize

Emit stable markdown and JSON with metadata, links, tables, code blocks, and provenance.

Extend

Add OCR, chunking, summaries, captions, and async jobs only when the workload deserves it.

Endpoints

Built like a service, not a demo.

An experimental intelligence infrastructure product from BRAHMAI.

GET /v1/parse/json
GET /v1/parse/markdown
POST /v1/jobs
GET /v1/capabilities