Test Suite Gap Audit¶
Status (2026-05-20): The per-item audit below was snapshotted at end-of-Phase-5 (v0.1) and has not been re-run against the v0.2 suite. Treat its stats as historical: the current suite is 717 tests across 34 files (pytest, mypy
--strict, ruffcheckall green). The v0.1 and v0.2 cycles did not formally close the IMPORTANT gaps catalogued here — later test growth narrowed several, but they remain the priority backlog for a v0.3 re-audit. The original "realistic target before Phase 4" milestone was not met; that planning note is left below as a record.
Snapshot date: 2026-05-19 (end of Phase 5)
Scope: Phases 1–5 — core/, styles/, controls/, fields/, protection/
Suite size at snapshot: 285 tests across 17 files; mypy --strict, ruff check, ruff format --check all green on both docx_plus/ and tests/
This is an honest accounting of where the existing test suite has real holes,
not a comprehensive coverage report. Items are listed by severity. Each entry
cites the file/line that exercises (or fails to exercise) the path and the
SPEC / IMPLEMENTATION.md section that motivates it. Phase 6 (examples/)
is intentionally excluded — it is not yet built, so missing tests there are
expected and not a "gap" in the sense used here.
Entries close (move to a ## Resolved section at the bottom) as gaps are
filled. Update this file when you fix one.
BLOCKER¶
(none — see ## Resolved for B1.)
IMPORTANT¶
Correctness gaps that could bite under realistic workloads. Listed in recommended fix order.
I1. Save→reopen round-trips cover only 2 of 5 modify operations¶
- Where:
tests/test_styles_modify.py:884(test_save_reopen_preserves_created_style) andtests/test_styles_modify.py:911(test_save_reopen_preserves_ensure_style_heading) are the only tests that go to disk and reopen withpython-docx. The other three operations (modify_style,apply_style,delete_style) are tested in-memory only. - Plausible bug: A modification that works against the in-memory lxml
tree but produces XML that
python-docx's serializer or Word's parser rejects would not surface. Schema-order violations onmodify_style(see I2) are the canonical example. - SPEC / IMPL ref: SPEC §10 Layer 2 ("round-trip tests"),
IMPLEMENTATION.md §8ranks round-trips as the highest-value test class. - Fix: Add three round-trip tests:
test_save_reopen_preserves_modified_style— create a style with multiple properties, save, reopen,modify_styleto change two, save, reopen, assert resolved formatting reflects both the original untouched properties and the modified onestest_save_reopen_preserves_applied_style— create + apply, save, reopen, resolve the paragraph, assert it picks up the styletest_save_reopen_delete_force_leaves_dangling_refs— apply a style to a paragraph,delete_style(force=True), save, reopen, assert thew:pStylebody reference still names the deleted style (closes I5 simultaneously)
I2. Schema-order assertions exist for create_style but not modify_style¶
- Where:
tests/test_styles_modify.py:277(test_style_children_ordered_correctly),:301(test_ppr_children_ordered_correctly), and:320(test_rpr_children_ordered_correctly) all verify schema order aftercreate_style. No equivalent test runs aftermodify_style. - Plausible bug:
modify_stylewith kwargs that target the same composite element (w:ind,w:spacing,w:rFonts) uses merge semantics — re-reading the element, mutating attributes, and writing it back. If the merge path appends instead of_ordered_insert-ing, child order can drift. The result remains structurally well-formed XML, so in-memory tests pass, but Word silently "repairs" the file on open. - SPEC / IMPL ref:
IMPLEMENTATION.md §4flags silent repair as a high-risk failure mode. - Fix: Add
test_modify_style_preserves_pPr_child_orderandtest_modify_style_preserves_rPr_child_order. Construct a style with child elements deliberately near both ends of the canonical order, runmodify_styleto touch an attribute on a child mid-list, and assert the child order matches_PPR_CHILD_ORDER/_RPR_CHILD_ORDER.
I3. Cycle detection only exercised with 2-node cycles¶
- Where:
tests/test_styles_inspect.pyhas one cycle test (A→B→A) and one depth-limit test (13-node chain triggering_MAX_STYLE_CHAIN_DEPTHatinspect.py:33). - Plausible bug: A defect in the visited-set logic in
_collect_style_chain(inspect.py:376-399) could pass the 2-node case but mishandle longer cycles or self-references. Three nodes is the smallest cycle that distinguishes "checks predecessor" from "checks set membership"; a self-loop (A → basedOn A) is the smallest case that tests whethercurrent_id in visitedruns before the chain is appended to. - SPEC / IMPL ref: SPEC §4 ("Cycle detection required"),
IMPLEMENTATION.md §5("Cycle detection lives in one place"). - Fix: Add
test_three_node_cycle_raises_with_full_path(A→B→C→A, asserting the error message includes all three IDs in the cycle path) andtest_self_referential_basedon_raises(A → basedOn A).
I4. Toggle parity not exercised end-to-end with w:val="0"¶
- Where:
tests/test_cascade_toggles.py:172(test_value_zero_treated_as_explicit_false) confirms the toggle helper treats"0"the same as"false"at the unit level, but no test runs the full cascade walker (_Accumulator.toggleatinspect.py:223-238) against a style chain where one layer emitsw:val="0"and another emits unqualified<w:b/>. - Plausible bug: Word can emit either
"0"or"false"depending on version and authoring tool. A regression that comparesval_attr == "false"instead ofval_attr in ("0", "false")would let one of those XOR through the chain when it should reset. - SPEC / IMPL ref: SPEC §4 ("Implementation note: track each toggle
property's parity..."),
IMPLEMENTATION.md §5toggle test list (case 3: "Style A bold, style B basedOn Aw:b w:val='false'→ effective false"). - Fix: Parametrize the existing toggle-chain tests with both
"0"and"false"forms of the explicit-reset case, or add a dedicatedtest_toggle_val_zero_resets_parity_through_basedon_chain.
I5. delete_style(force=True) only smoke-checked¶
- Where:
tests/test_styles_modify.py:497(test_delete_style_force_overrides_reference_check) asserts the style is gone afterforce=True, but does not check the body refs. - Plausible bug: A defensive "rewrite to Normal" branch could be added (e.g. while refactoring) and the test would still pass — but the documented contract (SPEC §5: "leaves dangling references — Word will fall back to Normal") would silently change.
- SPEC ref: SPEC §5 (
delete_style). - Fix: Convert the existing assertion to also XPath the body for
w:pStyle[@w:val='ToDelete']and assert at least one such reference remains. Folds into I1'stest_save_reopen_delete_force_leaves_dangling_refs.
I6. Built-in style materialization is one bulk smoke test¶
- Where:
tests/test_styles_modify.py:609(test_ensure_style_all_known_builtins_succeed) iterates every entry in_BUILTIN_STYLES(modify.py:1154, 107 entries across seven tiers) and assertsensure_stylereturns a proxy. Heading1 has a dedicated round-trip at:584and:911, but no other built-in has property-level round-trip coverage — they're only checked for "doesn't raise." - Plausible bug: A missing or wrong property in one built-in's
propertiesdict (e.g. Heading3 missingspacing_before,TableofFigureskeyed under the wrong case as happened on 2026-05-19) materializes successfully — theproperties=kwargs are optional andensure_stylefalls through tocreate_styleon a table miss — but the resolved paragraph using that style lacks the expected formatting. The bulk test never inspects the resolved output. - SPEC ref: SPEC §5 ("Test that materialization of each produces a style Word accepts.")
- Fix: Parametrize a single test over a list of
(style_id, expected_properties)tuples; for each,ensure_style, apply to a paragraph,resolve_effective_formatting, and assert eachexpected_propertiesfield matches. With 107 entries in the table now this is high-value: a regression like theTableOfFigures→TableofFiguresstyleId case bug would be caught immediately. Start with the 14 SPEC §5 "at minimum" set, then expand to all sample-sourced tiers (E/F/G).
I7. remap_styles four-step resolution not tested as a sequence¶
- Where:
tests/test_styles_modify.py:724-840covers each of the four steps individually (exact match:724, explicit mapping:732, matcher:749, create-from-builtins:782), but no test triggers fall-through across all four on a single target. - Plausible bug: A step-ordering defect that skips a step entirely (e.g. fall-through condition inverted) would not surface — each step's test sets up the world so that its step is the one that fires.
- IMPL ref:
IMPLEMENTATION.md §12(Phase 3.5 entry) describes the intended four-step chain. - Fix: Add
test_remap_styles_falls_through_all_four_steps— construct a doc with one target that needs each step in turn (across multiple targets in one call), assert each lands at the expected step.
I8. Unknown theme-name graceful-degradation path untested¶
- Where:
styles/theme.pyThemeColors.base()returnsNonefor names not in_THEME_NAME_TO_SCHEME_KEY.inspect.py:605-620(_resolve_color) then setsacc.partial = Trueand returns the unresolved name.tests/test_theme_edge_cases.pycovers strip-rel and corrupt-blob scenarios but not the "themeColor is a garbage name" path through the cascade walker. - Plausible bug: A future refactor that raises instead of returning
Nonewould turn the inspector into a function that crashes on diverse real-world inputs —IMPLEMENTATION.md §5explicitly warns against this. - SPEC ref: SPEC §4 ("If the theme part is missing or malformed,
set
partial=True..."). - Fix: Add a test that builds a style with
themeColor="garbage", resolves a paragraph using it, assertspartial=Trueandcolor_rgb == "garbage".
NICE-TO-HAVE¶
Defensive completeness — worth adding but not blocking Phase 4.
N1. Shared assertion library — two helpers still missing¶
- Where:
docx_plus/_testing/ooxml_asserts.pynow exportsassert_ids_unique,assert_style_defined,count_controls,assert_protected,assert_field_dirty. SPEC §10 also listsassert_style_not_definedandassert_no_orphan_relationships, neither of which exists yet. - Status: Reduced from "two-of-seven" to "five-of-seven" across
Phases 4 and 5.
assert_style_not_definedhas obvious callers in current code (the delete-style tests use ad-hoc XPath instead);assert_no_orphan_relationshipsis still blocked on a real caller needing it. - Fix: Add
assert_style_not_defined. Defer the relationship helper until v0.2 binding work needs it.
N2. ~~Conditional table formatting has no skip-marked placeholder~~ — resolved (v0.2 expansion)¶
See ## Resolved below — tests/test_styles_table_conditional.py
exercises the live implementation rather than carrying an xfail.
Correction note¶
A draft of this audit flagged apply_lum_mod / apply_lum_off as
untested. They are actually exercised at tests/test_styles_theme.py:103,
108, 112, 116 (4 tests covering identity, halving, clamp-at-one,
floor-at-zero). The entry was removed before publication.
Recommended priority order for resolution¶
Filed in the order that maximizes value-per-effort and unblocks Phase 4:
- B1 (coverage threshold) — single PR, single config change, biggest structural improvement
- I1 + I5 (save→reopen for modify/apply/delete, folded together)
- I2 (schema-order on
modify_stylemerge path) - I3 (3-node and self-referential cycles)
- I6 (per-style cascade verification across built-ins)
- I4 + I7 + I8 (parametric / fall-through / theme-unknown edge cases)
- N1 (
assert_style_not_defined) - N2 (xfail placeholder for conditional table formatting)
Items 1–4 were the realistic target before Phase 4 began; 5–8 to land alongside Phase 4 work. (Historical: this milestone was not met — see the status note at the top of the file. Re-prioritise during the v0.3 audit.)
Resolved¶
2026-05-19 — v0.2 in-place expansion¶
- N2 — Conditional table formatting now lands real cascade coverage.
The cascade resolver applies
<w:tblStylePr>branches (firstRow,lastRow,firstCol,lastCol,band1Horz,band1Vert, the four corners,wholeTable) in ECMA-376 17.7.6.5 precedence order. 13 tests intests/test_styles_table_conditional.pyverify auto-derivation from_Cellposition, manualTableContextoverride, precedence (firstRowoverband1Horz,nwCelloverfirstRow),wholeTablealways-applies, and that paragraphs / runs inside cells inherit the conditional formatting. The xfail placeholder the original entry recommended was never needed — the feature shipped instead.
2026-05-19 — Phase 6¶
- N3 — Headless LibreOffice smoke tests shipped. The
requires_libreofficemarker is now used bytests/test_examples_libreoffice.py(one test per example script: builds, converts to PDF via headlesssoffice, asserts conversion succeeds and the PDF has the expected page count). Skipped automatically whensofficeis not on PATH; the Ubuntu/Python 3.13 CI job installs LibreOffice to exercise them. - Coverage scope note. The 90% gate runs over
core/,styles/, andcontrols/.docx_plus/_testing/*(test infrastructure) anddocx_plus/examples/*(covered by Layer 3 smoke tests, not unit tests) are listed inpyproject.toml's[tool.coverage.run] omitby design. - B1 — Coverage gate flipped on.
fail_under = 90added to[tool.coverage.report]inpyproject.toml;ci.ymlnow runspytest ... --cov-fail-under=90. Coverage at 91.76% aggregate (styles 90.7%, core 91%, controls 93%). Bringing this from a stale 77% onstyles/inspect.pyrequired two new test files:tests/test_cascade_numbering.py(7 tests covering layer 4 of the cascade) andtests/test_cascade_run_target.py(5 tests covering Run and_Celltarget paths). Thenumbered.docxfixture intests/fixtures/build_fixtures.py:build_numberedmaterialises a customabstractNum/numpair inside python-docx's already-shipped numbering part.
2026-05-19 — Phase 5¶
- Partial: N1 —
count_controls(Phase 4),assert_protected(Phase 5),assert_field_dirty(Phase 5) added todocx_plus/_testing/ooxml_asserts.py. Five-of-seven SPEC §10 helpers now present; entry above is rewritten with the remaining two.
Phase 4–5 coverage notes¶
These don't belong in the BLOCKER/IMPORTANT/NICE-TO-HAVE buckets above — they're scoped to features that landed in Phases 4 and 5 — but they record what is and isn't tested for those modules.
controls/builder.py— 24 tests intests/test_controls_builder.pycover every control type's structure assertion + a save→reopen round-trip, multi-builder ID-registry sharing, theMissingNamespaceErrorpath, and sdtPr child-order invariants per control type.controls/read.py— 26 tests intests/test_controls_read.pycover the four typed errors, everyset_control_valuetype path, the dropdown-by-value vs by-display matching, the combobox freeform-passthrough, and theexisting_formfixture (a doc built withoutFormBuilder— schema-tolerance check).fields/— 24 tests intests/test_fields.py. Every field variant's instruction text, thexml:space="preserve"invariant, two-fields-in-one-paragraph composition, three save→reopen round-trips (PAGE, DATE, mark-dirty+PAGE), and fourmark_fields_dirtycases (insert / idempotent / update-false / schema-position / fallback-append).protection/— 18 tests intests/test_protection.py. EveryProtectionModeliteral (parametrised), the schema-position invariant againstw:defaultTabStop, idempotency + mode-replace, unprotect+reprotect, theis_protectedpredicate, save→reopen round-trip, plus negative tests on theassert_protectedhelper.- Round-trip floor for new capabilities: every modifying function
in
controls/,fields/,protection/has at least one save→reopen test. The Phase 3 gap onmodify_style/apply_style/delete_style(I1, I5 above) is the outlier, not the rule.
Move entries here with their resolution date and the test file/line that now covers them.