Design
This page gives the condensed rationale. The canonical, longer design
document is spec.md in the repo root; the roadmap is in feature-plan.md.
Why wordlive exists
There is no good Python library for driving a live Microsoft Word session. The options today are:
| Library | Target | Mechanism |
|---|---|---|
| python-docx | .docx file on disk | OOXML I/O |
| docx-plus | .docx file on disk (docx extender) | OOXML I/O |
| wordlive | Running winword.exe | COM (pywin32) |
File-side libraries can't help when the user has the document open: Word
holds the lock, and any change you make on disk is invisible until the user
closes and reopens the file. COM is the only path. And raw pywin32 is brutally
LLM-hostile: magic integer constants, untyped late-bound dispatch, modal
dialog footguns, STA threading rules.
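As an illustration of the magic-constant problem: Word's Range.InsertBreak takes a bare WdBreakType integer (7 for a page break, 8 for a column break). The sketch below shows the kind of named layer a wrapper can put in front of such constants, assuming the WdBreakType values from Word's VBA reference. BreakKind and break_constant are hypothetical names, not wordlive's API.

```python
from enum import IntEnum


class BreakKind(IntEnum):
    """Hypothetical named layer over Word's WdBreakType integers."""

    SECTION_NEXT_PAGE = 2   # wdSectionBreakNextPage
    SECTION_CONTINUOUS = 3  # wdSectionBreakContinuous
    PAGE = 7                # wdPageBreak
    COLUMN = 8              # wdColumnBreak


def break_constant(kind: str) -> int:
    """Map an LLM-friendly name such as "page" or "section-next-page"
    to the raw integer Range.InsertBreak expects (hypothetical helper)."""
    key = kind.strip().upper().replace("-", "_")
    try:
        return int(BreakKind[key])
    except KeyError:
        valid = ", ".join(k.name.lower() for k in BreakKind)
        raise ValueError(f"unknown break kind {kind!r}; expected one of: {valid}") from None
```

A caller (human or LLM) passes "page" instead of remembering 7. A typo fails loudly and lists the valid names, rather than reaching Word as an arbitrary integer.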
xlwings exists for Excel. wordlive is the equivalent for Word, with one
extra goal: be first-class for LLM tool use, not retrofitted.
Design principles
The four principles, in priority order:
- Politeness first. Default behaviour preserves the user's Selection, view, and scroll, so they can keep editing alongside your script. Operations that must move the cursor say so explicitly (doc.go_to(...), scope.allow_cursor_move()).
- Semantic anchors over Selection. Operations target named handles (bookmarks, content controls, headings), not the live cursor. Anchors are stable across edits and visible to an LLM as JSON strings; the cursor is neither.
- Atomic undo. Every doc.edit() block opens a Word UndoRecord, so one Ctrl-Z reverts the whole intent. A 10-op exec script is one undo step, not ten.
- Structured I/O. Reads return dataclasses / dicts; the CLI emits one JSON object per invocation; exit codes are deterministic. No string scraping anywhere in the pipeline. See the Errors page for the exit-code contract.
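Politeness and atomic undo can live in a single scope object. The following is a minimal sketch of one way to build such a scope, not wordlive's actual EditScope. It assumes only standard Word object-model members (Application.UndoRecord.StartCustomRecord / EndCustomRecord, Selection.Start / End, Selection.SetRange); against real Word, app would come from win32com.client.Dispatch("Word.Application"). polite_edit and the Fake* classes are hypothetical stand-ins so the sketch runs without Word. It restores raw character offsets, so it does not shift the saved selection when text is inserted before it, and it ignores view and scroll.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def polite_edit(app: Any, name: str) -> Iterator[None]:
    """Hypothetical sketch: one undo record per block, cursor restored after.

    `app` is duck-typed on Word's Application object: it needs
    UndoRecord.StartCustomRecord/EndCustomRecord and a Selection with
    Start, End and SetRange(start, end).
    """
    saved = (app.Selection.Start, app.Selection.End)  # snapshot the user's cursor
    app.UndoRecord.StartCustomRecord(name)             # everything below is one Ctrl-Z
    try:
        yield
    finally:
        app.UndoRecord.EndCustomRecord()               # close the record even on error
        app.Selection.SetRange(*saved)                 # put the cursor back


# In-memory stand-ins so the sketch runs without Word.
class FakeUndoRecord:
    def __init__(self) -> None:
        self.log = []

    def StartCustomRecord(self, name: str) -> None:
        self.log.append(("start", name))

    def EndCustomRecord(self) -> None:
        self.log.append(("end",))


class FakeSelection:
    def __init__(self, start: int, end: int) -> None:
        self.Start, self.End = start, end

    def SetRange(self, start: int, end: int) -> None:
        self.Start, self.End = start, end


class FakeWord:
    def __init__(self) -> None:
        self.UndoRecord = FakeUndoRecord()
        self.Selection = FakeSelection(10, 15)  # the user's cursor sits at 10..15


app = FakeWord()
with polite_edit(app, "wordlive: retitle"):
    app.Selection.SetRange(0, 0)  # an edit that had to move the cursor
# Afterwards the selection is back at 10..15 and the log holds one start/end pair.
```

The try/finally is the important design choice: if an operation inside the block raises, the undo record still closes and the user's cursor still comes back.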
Underlying all four: an escape hatch. Every wrapper exposes .com. When
wordlive doesn't cover something, drop to raw COM rather than giving up.
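The escape hatch can be as simple as a property on each wrapper. The sketch below uses a hypothetical BookmarkWrapper, not wordlive's class. Bookmark.Range.Text and Range.Font.Bold are real Word object-model members, but a SimpleNamespace stands in for the COM object here so the sketch runs without Word.

```python
from types import SimpleNamespace


class BookmarkWrapper:
    """Hypothetical typed wrapper around a Word Bookmark COM object."""

    def __init__(self, com_obj) -> None:
        self._com = com_obj

    @property
    def text(self) -> str:
        # Covered by the wrapper: a typed, documented read.
        return self._com.Range.Text

    @property
    def com(self):
        # Escape hatch: the raw late-bound COM object, for everything else.
        return self._com


# Stand-in for a real Bookmark COM object so the sketch runs without Word.
raw = SimpleNamespace(Range=SimpleNamespace(Text="Q3 summary", Font=SimpleNamespace(Bold=False)))
bm = BookmarkWrapper(raw)

summary = bm.text               # the wrapped path
bm.com.Range.Font.Bold = True   # bold isn't wrapped here, so drop to raw COM
```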
What's out of scope
- Cross-platform support. COM is Windows-only. We don't pretend otherwise.
- Cloud co-authoring. Microsoft Graph / WOPI is a different stack and a different problem.
- Full Word object-model coverage. Anything we don't cover is one .com access away.
- Replacing python-docx. Different surface, different problem.
- Embedding the Word window as a child HWND. Separate problem, out of scope.
Architecture at a glance
```
                   your code / LLM
                          │
                          ▼
┌───────────────────────────────────────────────────┐
│                wordlive public API                │
│     attach / connect → Word                       │
│                         │                         │
│                         ▼                         │
│                     Document                      │
│                         │                         │
│         ┌───────────────┼───────────────┐         │
│         ▼               ▼               ▼         │
│     bookmarks   content_controls    headings      │
│         │               │               │         │
│         ▼               ▼               ▼         │
│     Bookmark     ContentControl      Heading      │
│         │               │               │         │
│         └───────────────┼───────────────┘         │
│                         ▼                         │
│              Anchor (text, set_text,              │
│            insert_before/after, delete)           │
└───────────────────────────────────────────────────┘
                          │
                          ▼
     EditScope (UndoRecord + SelectionSnapshot)
                          │
                          ▼
   pywin32 → Word.Application (COM, STA-threaded)
```
The library is intentionally flat: ~10 modules, no plugin system, no hierarchy beyond Word → Document → Anchor.
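Every branch of the diagram ends in the same Anchor surface: a text read plus four edits. The sketch below expresses that surface as a typing.Protocol and gives it an in-memory implementation over a plain string, so it runs without Word. The method names follow the diagram. The signatures, the Buffer and StringAnchor classes, and the offset bookkeeping are assumptions for illustration, not wordlive's actual Anchor.

```python
from typing import Protocol


class AnchorLike(Protocol):
    """Text read plus four edits, as in the diagram (signatures assumed)."""

    @property
    def text(self) -> str: ...
    def set_text(self, value: str) -> None: ...
    def insert_before(self, value: str) -> None: ...
    def insert_after(self, value: str) -> None: ...
    def delete(self) -> None: ...


class Buffer:
    """Stand-in for a document body: a single mutable string."""

    def __init__(self, body: str) -> None:
        self.body = body


class StringAnchor:
    """Hypothetical anchor over the [start, end) span of a Buffer."""

    def __init__(self, buf: Buffer, start: int, end: int) -> None:
        self.buf, self.start, self.end = buf, start, end

    @property
    def text(self) -> str:
        return self.buf.body[self.start:self.end]

    def _splice(self, start: int, end: int, value: str) -> None:
        b = self.buf.body
        self.buf.body = b[:start] + value + b[end:]

    def set_text(self, value: str) -> None:
        self._splice(self.start, self.end, value)
        self.end = self.start + len(value)   # anchor now covers the new text

    def insert_before(self, value: str) -> None:
        self._splice(self.start, self.start, value)
        self.start += len(value)             # shift so we still cover the same text
        self.end += len(value)

    def insert_after(self, value: str) -> None:
        self._splice(self.end, self.end, value)

    def delete(self) -> None:
        self._splice(self.start, self.end, "")
        self.end = self.start                # collapse to an empty span


buf = Buffer("Title: Draft. Body.")
a: AnchorLike = StringAnchor(buf, 7, 12)     # covers "Draft"
a.set_text("Final")
a.insert_before("[")
a.insert_after("]")
```

In the sketch, the anchor keeps covering "Final" after text is inserted next to it. That property makes a named handle easier to reason about than a raw cursor offset.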
What comes next
The roadmap lives in feature-plan.md. The current release covers:

- the politeness / anchors / EditScope core;
- the LLM-first CLI;
- fuzzy find/replace;
- document-scoped styles and paragraph formatting;
- tables, with cells addressed as table:N:R:C anchors;
- the collaboration surface: review comments, scoped track-changes, and arbitrary range:START-END anchors;
- document structure: bullet/numbered lists and section headers/footers (header:S:WHICH / footer:S:WHICH anchors);
- full paragraph addressing: every paragraph is a para:N anchor (doc.paragraphs, outline --all), insert works on any anchor via --before/--after, and an explicit, opt-in cursor surface is available (cursor read / cursor write);
- image insertion (anchor.insert_image(...) / wordlive insert-image), accepting a file path, raw bytes, or base64, and requiring an explicit wrap;
- table creation / deletion (Document.add_table / Anchor.insert_table / Table.delete);
- page / column / section breaks (Anchor.insert_break and format_paragraph(page_break_before=…));
- page / section rendering to PNG for vision models (Document.snapshot / Anchor.snapshot, via the optional snapshot extra).

wordlive also ships two LLM-facing agent skills, a CLI guide and a Python guide (import wordlive as wl), which wordlive install-skill drops into .agents/skills/. It also ships an MCP server (wordlive-mcp, registered with wordlive install-mcp or the one-click .mcpb bundle) that exposes the same surface as a handful of dispatch tools (see MCP).

Next on the visual-content track: extracting embedded images back out for
vision models, then Excel-backed charts. After that: event sinks
(WindowSelectionChange, DocumentBeforeSave), an async wrapper around the sync
core, and the deeper
style cuts (character styles, theme-aware fonts).
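The anchor strings in the release list share a simple kind:fields grammar. The following is a minimal parser sketch covering only the forms this page spells out (para:N, table:N:R:C, range:START-END, header:S:WHICH, footer:S:WHICH). Bookmark and content-control handles are left out because their string form isn't shown here. The page doesn't say whether indices are 0- or 1-based or which WHICH values are accepted, so the sketch only checks that numeric fields are non-negative integers and keeps WHICH as a word token. ParsedAnchor and parse_anchor are hypothetical, not wordlive's parser.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedAnchor:
    kind: str                      # "para", "table", "range", "header", "footer"
    fields: tuple[int | str, ...]  # numeric fields as int, WHICH kept as str


_PATTERNS = {
    "para": re.compile(r"para:(\d+)"),
    "table": re.compile(r"table:(\d+):(\d+):(\d+)"),
    "range": re.compile(r"range:(\d+)-(\d+)"),
    "header": re.compile(r"header:(\d+):(\w+)"),
    "footer": re.compile(r"footer:(\d+):(\w+)"),
}


def parse_anchor(s: str) -> ParsedAnchor:
    """Parse one anchor string, raising ValueError on anything malformed."""
    kind = s.split(":", 1)[0]
    pattern = _PATTERNS.get(kind)
    m = pattern.fullmatch(s) if pattern else None
    if m is None:
        raise ValueError(f"not a recognised anchor string: {s!r}")
    fields = tuple(int(g) if g.isdigit() else g for g in m.groups())
    if kind == "range" and fields[0] > fields[1]:
        raise ValueError(f"range start is after range end: {s!r}")
    return ParsedAnchor(kind, fields)
```

Using fullmatch rather than match means trailing junk such as "para:3x" is rejected instead of being silently read as para:3. A strict parser helps when the strings come from an LLM.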
Full design document
For the unabridged version, including the original motivation, a more
detailed error taxonomy, the rejected alternatives, and a list of open
questions, see spec.md in the repo root.