Document Engine

Four ways to read and edit .docx files - from the browser, a backend, a terminal, or an AI agent.

Editing a .docx file from code is unreasonably hard.

A Word document isn't a file you can read and write like JSON. It's a zip archive of interconnected XML files built on OOXML - a specification with thousands of pages (6,755 to be precise). Tables reference styles that reference numbering definitions that reference theme colors. Move a paragraph and you might break a tracked change three sections away.

Most tooling either strips this complexity (convert to Markdown, edit, convert back) or tries to manipulate the XML directly. Both approaches lose formatting, break comments, or produce files that Word opens with repair warnings.

SuperDoc Document Engine makes programmatic .docx editing reliable.

What Document Engine is

Document Engine is a deterministic operation layer for .docx files. You search for content, replace text, add comments, apply tracked changes, format paragraphs - through a stable API that preserves the full document structure.

Every operation runs against the same underlying document model that powers the SuperDoc editor. The same model that handles footnote layout, table merges, and tracked change attribution in the browser also handles them headlessly.

The result is the same regardless of which surface you use.

Four surfaces, same operations

Document Engine gives you four ways to work with documents, depending on where your code runs:

Document API - in-browser editing

If you're building a web app with a visible SuperDoc editor, you already have access. Every operation lives on editor.doc.*:

const match = editor.doc.query.match({
  select: { type: 'text', pattern: 'ACME Corp' },
  require: 'first',
});

const ref = match.items?.[0]?.handle?.ref;
if (ref) {
  editor.doc.mutations.apply({
    expectedRevision: match.evaluatedRevision,
    atomic: true,
    steps: [
      {
        id: 'replace-acme',
        op: 'text.rewrite',
        where: { by: 'ref', ref },
        args: { replacement: { text: 'NewCo Inc.' } },
      },
    ],
  });
}

No network round-trips. The document is already loaded in the browser.
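
The expectedRevision and atomic fields in the call above read like an optimistic-concurrency guard: the batch is rejected if the document changed after the query, and either every step lands or none does. The sketch below is a toy model of that idea over a plain string. ToyDocument, Step, and applySteps are hypothetical names for illustration, not SuperDoc's API or implementation.

```typescript
// Toy model only: ToyDocument, Step, and applySteps are hypothetical names,
// not part of the SuperDoc API.
interface ToyDocument {
  revision: number;
  text: string;
}

interface Step {
  id: string;
  find: string;
  replacement: string;
}

type ApplyResult =
  | { ok: true; revision: number }
  | { ok: false; reason: string };

function applySteps(
  doc: ToyDocument,
  expectedRevision: number,
  steps: Step[],
): ApplyResult {
  // Reject the whole batch if the document changed since it was queried.
  if (doc.revision !== expectedRevision) {
    return { ok: false, reason: 'revision-mismatch' };
  }

  // Work on a copy so a failing step leaves the original untouched.
  let draft = doc.text;
  for (const step of steps) {
    if (!draft.includes(step.find)) {
      return { ok: false, reason: `step-failed:${step.id}` };
    }
    // A replacer function avoids special handling of "$" in the replacement.
    draft = draft.replace(step.find, () => step.replacement);
  }

  // Commit every step at once and advance the revision.
  doc.text = draft;
  doc.revision += 1;
  return { ok: true, revision: doc.revision };
}
```

In this model a stale revision or a single failing step leaves the document exactly as it was, which is the property that makes a batch safe to retry after re-querying.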

SDKs - backend automation

The Node.js and Python SDKs manage a headless editor process and expose typed methods for every operation.

import { SuperDocClient } from '@superdoc-dev/sdk';

const client = new SuperDocClient({ defaultChangeMode: 'tracked' });
const doc = await client.open({ doc: './contract.docx' });

const match = await doc.query.match({
  select: { type: 'text', pattern: 'ACME Corp' },
  require: 'first',
});

const ref = match.items?.[0]?.handle?.ref;
if (ref) {
  await doc.mutations.apply({
    expectedRevision: match.evaluatedRevision,
    atomic: true,
    steps: [
      {
        id: 'replace-acme',
        op: 'text.rewrite',
        where: { by: 'ref', ref },
        args: { replacement: { text: 'NewCo Inc.' } },
      },
    ],
  });
}

await doc.save();
await doc.close();

npm install @superdoc-dev/sdk   # Node.js
pip install superdoc-sdk         # Python

The CLI is bundled - no separate install. Works in backend services, automation pipelines, serverless environments, anywhere you can run Node or Python.

CLI - scripts, CI pipelines, and AI agents

The CLI exposes every Document API operation as a shell command. Use it from a terminal, a shell script, or let an AI agent drive it.

superdoc open contract.docx --user-name "Review Bot"

superdoc find --type text --pattern "ACME Corp"

superdoc replace \
  --target-json '...' \
  --text "NewCo Inc." \
  --change-mode tracked

superdoc save
superdoc close

Use --change-mode tracked on any mutating command to apply edits as tracked changes. Pipe JSON in and out for scripted workflows.

The CLI is also an agent interface. Run superdoc host --stdio to start a persistent JSON-RPC server over stdio. AI agents spawn the process, send structured commands, and get JSON responses back. No MCP server required.

This pattern - subprocess over stdio with structured I/O - is how many agent frameworks execute tools internally. It's lighter than MCP (no schema bloat, no initialization dance), composable with standard shell tooling, and debuggable by both humans and machines.

The SuperDoc CLI was designed for this from the start. Every command outputs structured JSON, and the host --stdio mode keeps a document session open across multiple operations without re-loading. Both the Node.js and Python SDKs use this exact mechanism under the hood.

npm install -g @superdoc-dev/cli
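
To make the stdio pattern concrete, here is a minimal sketch of the client side of a JSON-RPC 2.0 exchange: encoding requests and reassembling responses from output chunks. It assumes newline-delimited framing, which this page does not confirm for superdoc host --stdio. The method names passed in are up to the caller, and encodeRequest and ResponseReader are hypothetical helpers, not part of the CLI or SDKs.

```typescript
// Hypothetical sketch: newline-delimited framing is an assumption about the
// transport, and these helpers are not part of the SuperDoc CLI or SDKs.
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: unknown;
}

interface JsonRpcResponse {
  jsonrpc: '2.0';
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

let nextRequestId = 1;

// Serialize one request as a single line that can be written to the child's stdin.
function encodeRequest(method: string, params?: unknown): string {
  const request: JsonRpcRequest = {
    jsonrpc: '2.0',
    id: nextRequestId++,
    method,
    params, // omitted from the JSON when undefined
  };
  return JSON.stringify(request) + '\n';
}

// Buffers stdout chunks and returns only the responses that are complete.
class ResponseReader {
  private buffer = '';

  push(chunk: string): JsonRpcResponse[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    // The last element is either '' or an unfinished line; keep it for later.
    this.buffer = lines.pop() ?? '';
    return lines
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line) as JsonRpcResponse);
  }
}
```

An agent would write each encoded request to the process's stdin, feed stdout chunks to push, and match responses to requests by id.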

MCP Server - AI agents

The MCP server gives AI agents direct access to .docx files through the Model Context Protocol. Set it up once - your agent spawns the server automatically on each conversation.

# Claude Code
claude mcp add superdoc -- npx @superdoc-dev/mcp

# Cursor - add to ~/.cursor/mcp.json
# Windsurf - add to ~/.codeium/windsurf/mcp_config.json

The agent follows the same workflow: open, read, edit, save, close. 180+ tools cover content reading, search, editing, formatting, tables, lists, sections, comments, tracked changes, and batched mutations.

superdoc_open → superdoc_get_content / superdoc_search → edit tools → superdoc_save → superdoc_close

Your documents never leave your machine. The server runs locally, reads from disk, writes back to disk.

Why this matters for AI

LLMs are good at deciding what should change in a document. They can summarize contracts, suggest edits, flag inconsistencies.

But actually modifying a .docx file is a different problem. If you ask a model to generate OOXML, you're asking it to understand the full spec - relationship graphs, formatting rules, Word-specific expectations. Sometimes the output works. Sometimes it produces XML that follows the spec but behaves incorrectly in Word. And because the model generates structure each time, the same edit may be performed differently on every run.

Document Engine turns this from a generation problem into a tool-use problem:

Before:  LLM → generate OOXML → hope Word opens it correctly
After:   LLM → choose document operation → Document Engine performs edit

The model decides what should change. Document Engine handles how it happens. The operation is deterministic - same input, same result, every time.
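
A rough sketch of the "after" flow: the model emits a named operation with arguments, and a dispatcher validates it and runs a fixed handler. Everything here is hypothetical (ToolCall, toolRegistry, dispatch, and the replace_text tool operate on a plain string rather than a .docx), so treat it as an illustration of the shape, not the SDK's tool definitions.

```typescript
// Hypothetical names throughout: ToolCall, toolRegistry, and dispatch are
// illustrative and not part of the SuperDoc SDK.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

type ToolHandler = (text: string, args: Record<string, unknown>) => string;

const toolRegistry: Record<string, ToolHandler> = {
  // Replaces every occurrence of "find" with "replacement".
  replace_text: (text, args) => {
    const find = args['find'];
    const replacement = args['replacement'];
    if (typeof find !== 'string' || find === '' || typeof replacement !== 'string') {
      throw new Error('replace_text needs a non-empty "find" and a string "replacement"');
    }
    return text.split(find).join(replacement);
  },
};

// The model only chooses the call; the handler decides how the edit happens.
function dispatch(text: string, call: ToolCall): string {
  const handler = toolRegistry[call.name];
  if (!handler) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  return handler(text, call.arguments);
}
```

Because the handler is ordinary code, the same call on the same input produces the same output, and anything outside the registry is rejected instead of improvised.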

This works across all four surfaces. The SDK ships ready-made tool definitions for OpenAI, Anthropic, and Vercel AI that you can pass directly to your model. The MCP server provides the same operations to any MCP-compatible agent.

These default tools cover the most common workflows. But the Document API is the full surface - you can compose your own tools on top of it, tailored to your specific use case. A contract review agent doesn't need the same tools as a translation agent. Build exactly what your workflow requires.
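
As one example of composing a custom tool, the query-then-mutate pattern from the browser example can be wrapped in a single helper. The DocLike and MatchResult types below are inferred from the calls shown on this page and are assumptions, not the published typings. replaceFirst is a hypothetical name, and the sketch follows the synchronous editor.doc shape (with the SDK, the same calls would be awaited).

```typescript
// Structural types inferred from the example calls on this page; they are
// assumptions, not SuperDoc's published typings.
interface MatchResult {
  evaluatedRevision: unknown;
  items?: Array<{ handle?: { ref?: unknown } }>;
}

interface DocLike {
  query: { match(input: object): MatchResult };
  mutations: { apply(input: object): unknown };
}

// Hypothetical helper: replace the first match of `pattern` with `text`.
// Returns false when nothing matched, so the caller can report that.
function replaceFirst(doc: DocLike, pattern: string, text: string): boolean {
  const match = doc.query.match({
    select: { type: 'text', pattern },
    require: 'first',
  });

  const ref = match.items?.[0]?.handle?.ref;
  if (!ref) {
    return false;
  }

  doc.mutations.apply({
    expectedRevision: match.evaluatedRevision,
    atomic: true,
    steps: [
      {
        id: 'replace-first',
        op: 'text.rewrite',
        where: { by: 'ref', ref },
        args: { replacement: { text } },
      },
    ],
  });
  return true;
}
```

In a web app this would be called as replaceFirst(editor.doc, 'ACME Corp', 'NewCo Inc.'), and a contract review agent could expose it as one narrowly scoped tool.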

What you can do today

  • Find and replace across documents while preserving formatting
  • Add comments and tracked changes programmatically
  • Format text - bold, italic, styles, alignment, spacing
  • Create structure - paragraphs, headings, lists
  • Batch operations - group multiple edits into a single atomic mutation
  • Diff documents - compare two .docx files and produce a tracked-changes result
  • Build AI agents that edit real Word documents reliably

Get started

Pick the surface that fits your use case: