docx_plus.comments.read

Inverse of add_comment: walks comments.xml and pairs each <w:comment> with the body-side range it anchors. Each result carries the comment body text, the anchored document text, the paragraph index where the comment is attached, and parsed metadata (author, initials, timestamp). Orphaned comments (no matching body range) appear with anchored_text="" and paragraph_index=-1.

Read every anchored comment from a document.

Inverse of docx_plus.comments.add_comment: walks the comments part and pairs each <w:comment> with the body-side range it anchors, extracting the comment text and the document text the comment is attached to.

This module imports only from docx_plus.core (SPEC §9.1).

AnchoredComment dataclass

AnchoredComment(
    comment_id: int,
    author: str,
    initials: str | None,
    timestamp: datetime | None,
    text: str,
    anchored_text: str,
    paragraph_index: int,
)

A comment paired with the document text it anchors to.

Attributes:

comment_id (int): The w:id value of the comment.

author (str): The w:author attribute (may be empty).

initials (str | None): The w:initials attribute, or None if absent.

timestamp (datetime | None): The w:date attribute parsed as a timezone-aware UTC datetime, or None if the attribute is absent or unparseable.

text (str): The comment body text. Multiple text runs are concatenated.

anchored_text (str): The document text between the comment's commentRangeStart and commentRangeEnd markers. Empty if no matching body range exists (orphaned comment), or if the markers are inverted (rangeEnd appears before rangeStart in document order); the reader reports this malformed state as empty rather than guessing.

paragraph_index (int): Zero-based index (within doc.paragraphs) of the paragraph that contains the commentRangeStart marker. -1 for orphaned comments.
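
The timestamp rule above (timezone-aware UTC, or None when w:date is absent or unparseable) can be illustrated with a minimal sketch. The function name parse_comment_date is hypothetical and is not the module's private _parse_date helper, whose source is not shown here. Treating a value without an offset as UTC is an assumption of this sketch.

```python
from __future__ import annotations

from datetime import datetime, timezone


def parse_comment_date(raw: str | None) -> datetime | None:
    """Hypothetical helper: parse a w:date value into an aware UTC datetime."""
    if not raw:
        return None
    text = raw.strip()
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        # Unparseable values are reported as None rather than raising.
        return None
    if parsed.tzinfo is None:
        # Assumption of this sketch: a value without an offset is UTC.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Returning None instead of raising keeps a single malformed date from hiding the rest of a comment's data.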

read_comments

read_comments(doc: Document) -> list[AnchoredComment]

Return every comment in doc paired with the text it anchors to.

A comment with no matching body range still appears in the result with anchored_text="" and paragraph_index=-1 — this is the "orphaned" state that python-docx's add_comment produces.

Parameters:

doc (Document, required): The python-docx docx.document.Document to scan.

Returns:

list[AnchoredComment]: One AnchoredComment per comment, in comments.xml order. Returns [] if the document has no comments part at all.
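
A caller might want to separate orphaned comments from anchored ones using the paragraph_index sentinel. The sketch below is illustrative only: the AnchoredComment class is a local stand-in that copies the documented fields so the example runs without docx_plus installed, and split_orphans is a hypothetical helper, not part of the library.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class AnchoredComment:
    """Stand-in mirroring the documented fields; not the library's class."""

    comment_id: int
    author: str
    initials: str | None
    timestamp: datetime | None
    text: str
    anchored_text: str
    paragraph_index: int


def split_orphans(
    comments: list[AnchoredComment],
) -> tuple[list[AnchoredComment], list[AnchoredComment]]:
    """Partition comments into (anchored, orphaned), preserving input order."""
    # paragraph_index == -1 is the documented orphan marker. An empty
    # anchored_text alone is not enough, because inverted range markers
    # also yield an empty string.
    anchored = [c for c in comments if c.paragraph_index != -1]
    orphaned = [c for c in comments if c.paragraph_index == -1]
    return anchored, orphaned


sample = [
    AnchoredComment(0, "Ana", "A", None, "Tighten this.", "this phrase", 1),
    AnchoredComment(1, "", None, None, "General note.", "", -1),
]
anchored, orphaned = split_orphans(sample)
```

With the real library, the input list would come from read_comments(doc) instead of hand-built records.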

Source code in docx_plus/comments/read.py
def read_comments(doc: Document) -> list[AnchoredComment]:
    """Return every comment in ``doc`` paired with the text it anchors to.

    A comment with no matching body range still appears in the result
    with ``anchored_text=""`` and ``paragraph_index=-1`` — this is the
    "orphaned" state that python-docx's ``add_comment`` produces.

    Args:
        doc: The python-docx :class:`~docx.document.Document` to scan.

    Returns:
        One :class:`AnchoredComment` per comment, in
        ``comments.xml`` order. Returns ``[]`` if the document has no
        comments part at all.
    """
    # A missing comments relationship means the document has no comments part.
    try:
        comments_part = cast("XmlPart", doc.part.part_related_by(RT.COMMENTS))
    except KeyError:
        return []

    comments_root = comments_part.element
    body = doc.element.body
    paragraph_elements = list(xpath(body, ".//w:p"))

    result: list[AnchoredComment] = []
    for comment_el in xpath(comments_root, "./w:comment"):
        # Skip comments whose w:id is missing or is not an integer.
        cid_raw = comment_el.get(qn("w:id"))
        if cid_raw is None:
            continue
        try:
            cid = int(cid_raw)
        except ValueError:
            continue

        author = comment_el.get(qn("w:author")) or ""
        initials = comment_el.get(qn("w:initials"))
        timestamp = _parse_date(comment_el.get(qn("w:date")))

        text = _comment_body_text(comment_el)

        anchored_text, paragraph_index = _anchor_lookup(body, paragraph_elements, str(cid))

        result.append(
            AnchoredComment(
                comment_id=cid,
                author=author,
                initials=initials,
                timestamp=timestamp,
                text=text,
                anchored_text=anchored_text,
                paragraph_index=paragraph_index,
            )
        )
    return result
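
The listing delegates range matching to _anchor_lookup, whose source is not shown here. As a rough illustration of the documented behavior (text between commentRangeStart and commentRangeEnd, empty for orphaned or inverted ranges), here is a minimal sketch built on the standard library's ElementTree rather than the library's own XML helpers. The function anchor_lookup and its signature are hypothetical, and the sketch assumes the start marker sits inside a paragraph.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag: str) -> str:
    """Return the Clark-notation name for a WordprocessingML tag."""
    return f"{{{W}}}{tag}"


def anchor_lookup(body: ET.Element, comment_id: str) -> tuple[str, int]:
    """Hypothetical sketch: return (anchored_text, paragraph_index)."""
    ordered = list(body.iter())  # every element, in document order
    start = end = None
    for pos, el in enumerate(ordered):
        if el.get(w("id")) != comment_id:
            continue
        if el.tag == w("commentRangeStart") and start is None:
            start = pos
        elif el.tag == w("commentRangeEnd") and end is None:
            end = pos
    if start is None:
        return "", -1  # orphaned: no body-side range for this comment
    # The paragraph holding the start marker is the last w:p opened before it.
    paragraph_index = sum(1 for el in ordered[:start] if el.tag == w("p")) - 1
    if end is None or end < start:
        return "", paragraph_index  # missing or inverted end marker
    text = "".join(
        el.text or "" for el in ordered[start + 1 : end] if el.tag == w("t")
    )
    return text, paragraph_index


SAMPLE = f"""<w:body xmlns:w="{W}">
<w:p><w:r><w:t>Intro. </w:t></w:r></w:p>
<w:p><w:r><w:t>Keep </w:t></w:r><w:commentRangeStart w:id="7"/>
<w:r><w:t>this </w:t></w:r><w:r><w:t>phrase</w:t></w:r>
<w:commentRangeEnd w:id="7"/><w:r><w:t> intact.</w:t></w:r></w:p>
<w:p><w:commentRangeEnd w:id="3"/><w:r><w:t>x</w:t></w:r>
<w:commentRangeStart w:id="3"/></w:p>
</w:body>"""
body = ET.fromstring(SAMPLE)
```

The sketch counts every w:p in the body, including any nested in tables. Whether the real helper indexes paragraphs the same way is not confirmed by the listing above.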