01
Semantic extraction
SCRYBE no longer just flattens HTML. It finds the main content root, preserves headings, lists, links, blockquotes, tables, and code, and keeps the result readable for downstream reasoning.
For agents, RAG, and long-context systems
SCRYBE fetches pages, resolves dynamic rendering, extracts semantic structure, falls back to OCR when needed, and returns markdown or JSON that feels deliberate instead of scraped.
GET /v1/parse/json?url=https://brahmai.in
{
  "format": "json",
  "document": {
    "metadata": {
      "title": "...",
      "description": "...",
      "content_type": "webpage"
    },
    "markdown": "Title: ...\nMeta Description: ...",
    "chunks": null,
    "summary": null
  }
}
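A minimal Python sketch of calling this endpoint and reading the response might look like the following. The host `https://scrybe.example` is a placeholder, since the sample above shows only the path, and the helper names `build_parse_url` and `read_document` are hypothetical. The field names come from the sample response; anything beyond them is an assumption.

```python
import json
from urllib.parse import urlencode

# Hypothetical host: the sample request shows only the path /v1/parse/json.
BASE_URL = "https://scrybe.example"


def build_parse_url(page_url: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for the JSON parse endpoint, percent-encoding the target URL."""
    return f"{base_url}/v1/parse/json?{urlencode({'url': page_url})}"


def read_document(body: str) -> dict:
    """Pull commonly used fields out of a response shaped like the sample above."""
    document = json.loads(body)["document"]
    metadata = document.get("metadata", {})
    return {
        "title": metadata.get("title"),
        "content_type": metadata.get("content_type"),
        "markdown": document.get("markdown", ""),
        # chunks and summary are null in the sample; normalize chunks to a list.
        "chunks": document.get("chunks") or [],
        "summary": document.get("summary"),
    }
```

To fetch a live page, the URL from `build_parse_url` could be passed to `urllib.request.urlopen` and the decoded body handed to `read_document`. Percent-encoding the target URL is a defensive choice here, not something the sample request requires.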
02
Static fetch first. Browser render when the page is mostly shell. OCR or LiteParse only when native extraction becomes untrustworthy. Faster happy path, deeper fallback.
03
Every document can carry title, source, description, content type, raw text, images, timings, chunks, and optional enrichment so your agents receive context with provenance.
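One way to picture the document envelope described above is a small dataclass. This is an illustrative sketch only: the class name `ParsedDocument`, the field names, and the defaults are guesses based on the listed properties, not SCRYBE's published schema.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class ParsedDocument:
    """Illustrative envelope; field names are assumptions, not the real schema."""

    title: str
    source: str  # URL the content was fetched from
    description: Optional[str] = None
    content_type: str = "webpage"
    raw_text: str = ""
    images: list[str] = field(default_factory=list)
    timings_ms: dict[str, float] = field(default_factory=dict)
    chunks: Optional[list[str]] = None  # filled only when chunking is requested
    summary: Optional[str] = None  # optional enrichment

    def provenance(self) -> dict:
        """Return the minimal facts an agent needs to cite where the text came from."""
        return {
            "source": self.source,
            "title": self.title,
            "content_type": self.content_type,
        }
```

Keeping provenance as a separate, small view lets an agent attach the source to every chunk it quotes without carrying the full document around. `asdict(doc)` gives a plain dictionary that serializes to JSON directly.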
How it works
The point is not “scraping.” The point is reconstructing a document that feels like it was authored for a machine collaborator.
Start with fast HTTP retrieval and only escalate when the DOM looks like a shell.
Choose the real content root and walk the structure intentionally instead of dumping nodes.
Emit stable markdown and JSON with metadata, links, tables, code blocks, and provenance.
Add OCR, chunking, summaries, captions, and async jobs only when the workload deserves it.
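The escalation step above can be sketched with a simple heuristic. The following is an assumption about what "looks like a shell" could mean, not SCRYBE's actual logic: a page counts as a shell when it loads scripts but exposes very little visible text. The 200-character threshold, the function names, and the strategy labels are all hypothetical.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self.visible_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside script/style is not visible content, so it is ignored.
        if self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def looks_like_shell(html: str, min_visible_chars: int = 200) -> bool:
    """Heuristic (assumed): scripts are present but almost no text is visible."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.visible_chars < min_visible_chars


def choose_strategy(html: str, native_text_trusted: bool = True) -> str:
    """Pick the cheapest step that should work: static, then browser, then OCR."""
    if looks_like_shell(html):
        return "browser_render"
    if not native_text_trusted:
        return "ocr"
    return "static"
```

A caller would run `choose_strategy` on the statically fetched HTML and pay for a browser render or OCR pass only when it returns something other than `"static"`, which keeps the common case fast.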
Endpoints
An experimental intelligence infrastructure product from BRAHMAI.