docx_parser_converter.docx_parsers.document.paragraph_parser module

class docx_parser_converter.docx_parsers.document.paragraph_parser.ParagraphParser

Bases: object

A parser for extracting paragraph elements from the DOCX document structure.

This class extracts the properties, runs, style, numbering, and tab stops of a paragraph element and converts them into a structured Paragraph object for further processing or conversion to other formats such as HTML.

extract_paragraph_properties(pPr: Element | None) -> ParagraphStyleProperties

Extracts the paragraph properties from the given paragraph properties element.

Parameters:

pPr (Optional[etree.Element]) – The paragraph properties element.

Returns:

The extracted paragraph style properties.

Return type:

ParagraphStyleProperties
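
As a rough illustration of what a paragraph properties element can contain, the sketch below reads a few common children of w:pPr with the standard library. It is not the library's implementation: the helper read_paragraph_properties, the dictionary keys, and the sample XML are assumptions made for this example, and the real method returns a ParagraphStyleProperties object whose fields are not listed here.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical sample; spacing and indent values are in twentieths of a point.
PPR_XML = f"""
<w:pPr xmlns:w="{W_NS}">
    <w:jc w:val="center"/>
    <w:spacing w:before="240" w:after="120"/>
    <w:ind w:left="720"/>
</w:pPr>
"""


def w_attr(elem, name):
    """Return a w:-namespaced attribute, or None if elem or the attribute is missing."""
    if elem is None:
        return None
    return elem.get(f"{{{W_NS}}}{name}")


def read_paragraph_properties(pPr):
    """Hypothetical helper: collect a few raw property values into a dict."""
    if pPr is None:
        return {}
    spacing = pPr.find("w:spacing", NS)
    return {
        "justification": w_attr(pPr.find("w:jc", NS), "val"),
        "spacing_before": w_attr(spacing, "before"),
        "spacing_after": w_attr(spacing, "after"),
        "indent_left": w_attr(pPr.find("w:ind", NS), "left"),
    }


props = read_paragraph_properties(ET.fromstring(PPR_XML))
print(props)
```

The values are kept as raw strings here; converting them to points or other units is left to the caller in this sketch.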

extract_runs(p: Element) -> List[Run]

Extracts the run elements from the paragraph element.

Parameters:

p (etree.Element) – The paragraph element.

Returns:

The list of extracted runs.

Return type:

List[Run]

Example

The following is an example of a run element inside a paragraph element in a document.xml file:

<w:r>
    <w:t>Example text</w:t>
</w:r>
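
To make the structure concrete, here is a minimal standard-library sketch that walks the runs of a paragraph and collects their text. The helper run_texts and the sample XML are hypothetical; the actual method returns Run objects rather than plain strings.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical paragraph with two runs.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W_NS}">
    <w:r>
        <w:t>Example text</w:t>
    </w:r>
    <w:r>
        <w:t xml:space="preserve"> and more</w:t>
    </w:r>
</w:p>
"""


def run_texts(p):
    """Return the text of each direct <w:r> child of a <w:p> element."""
    texts = []
    for r in p.findall("w:r", NS):
        # A run may hold several <w:t> nodes; join them in document order.
        texts.append("".join(t.text or "" for t in r.findall("w:t", NS)))
    return texts


p = ET.fromstring(PARAGRAPH_XML)
print(run_texts(p))  # ['Example text', ' and more']
```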

extract_style_id(pPr: Element | None) -> str | None

Extracts the style ID from the paragraph properties element.

Parameters:

pPr (Optional[etree.Element]) – The paragraph properties element.

Returns:

The style ID, or None if not found.

Return type:

Optional[str]

Example

The following is an example of a paragraph style element in a document.xml file:

<w:pStyle w:val="Heading1"/>
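
The lookup described above can be sketched with the standard library as follows. The function style_id is a hypothetical stand-in written for this example, not the library's code; it mirrors the documented behaviour of returning None when no style is present.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def style_id(pPr):
    """Hypothetical helper: return the w:val of <w:pStyle>, or None."""
    if pPr is None:
        return None
    p_style = pPr.find("w:pStyle", NS)
    if p_style is None:
        return None
    # Attributes are namespaced too, so the key is "{namespace}val".
    return p_style.get(f"{{{W_NS}}}val")


styled = ET.fromstring(f'<w:pPr xmlns:w="{W_NS}"><w:pStyle w:val="Heading1"/></w:pPr>')
unstyled = ET.fromstring(f'<w:pPr xmlns:w="{W_NS}"/>')

print(style_id(styled))    # Heading1
print(style_id(unstyled))  # None
print(style_id(None))      # None
```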

extract_tabs(pPr: Element | None) -> List[TabStop] | None

Extracts the tab stops from the paragraph properties element.

Parameters:

pPr (Optional[etree.Element]) – The paragraph properties element.

Returns:

The list of tab stops, or None if not found.

Return type:

Optional[List[TabStop]]

Example

The following is an example of a tabs element in a document.xml file:

<w:tabs>
    <w:tab w:val="left" w:pos="720"/>
</w:tabs>
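
A minimal sketch of reading such a tabs element is shown below. The helper tab_stops and its tuple output are assumptions for illustration; the real method returns TabStop objects. The w:pos attribute is expressed in twentieths of a point, so the sketch divides by 20 to obtain points.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

PPR_XML = f"""
<w:pPr xmlns:w="{W_NS}">
    <w:tabs>
        <w:tab w:val="left" w:pos="720"/>
    </w:tabs>
</w:pPr>
"""


def tab_stops(pPr):
    """Hypothetical helper: return [(alignment, position_in_points), ...] or None."""
    if pPr is None:
        return None
    tabs = pPr.find("w:tabs", NS)
    if tabs is None:
        return None
    stops = []
    for tab in tabs.findall("w:tab", NS):
        val = tab.get(f"{{{W_NS}}}val")
        # Assumes w:pos is present and numeric, as in the sample above.
        pos_twips = int(tab.get(f"{{{W_NS}}}pos"))
        stops.append((val, pos_twips / 20))
    return stops


print(tab_stops(ET.fromstring(PPR_XML)))  # [('left', 36.0)]
```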

parse(p: Element) -> Paragraph

Parses a paragraph element from the DOCX document.

Parameters:

p (etree.Element) – The paragraph element to parse.

Returns:

The parsed paragraph object.

Return type:

Paragraph

Example

The following is an example of a paragraph element in a document.xml file:

<w:p>
    <w:pPr>
        <w:pStyle w:val="Heading1"/>
        <w:numPr>
            <w:ilvl w:val="0"/>
            <w:numId w:val="1"/>
        </w:numPr>
    </w:pPr>
    <w:r>
        <w:t>Example text</w:t>
    </w:r>
</w:p>
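
To show how the pieces of such a paragraph fit together, the following standard-library sketch pulls the style ID, numbering, and run text out of the element above. ParagraphSketch, val_of, and parse_paragraph are hypothetical names invented for this example; the library's Paragraph model and its fields may differ.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

PARAGRAPH_XML = f"""
<w:p xmlns:w="{W_NS}">
    <w:pPr>
        <w:pStyle w:val="Heading1"/>
        <w:numPr>
            <w:ilvl w:val="0"/>
            <w:numId w:val="1"/>
        </w:numPr>
    </w:pPr>
    <w:r>
        <w:t>Example text</w:t>
    </w:r>
</w:p>
"""


@dataclass
class ParagraphSketch:
    """Hypothetical stand-in for a parsed paragraph."""
    style_id: Optional[str] = None
    ilvl: Optional[int] = None
    num_id: Optional[int] = None
    texts: List[str] = field(default_factory=list)


def val_of(parent, path):
    """Return the w:val attribute of the element at path, or None."""
    if parent is None:
        return None
    child = parent.find(path, NS)
    return None if child is None else child.get(f"{{{W_NS}}}val")


def parse_paragraph(p):
    pPr = p.find("w:pPr", NS)
    ilvl = val_of(pPr, "w:numPr/w:ilvl")
    num_id = val_of(pPr, "w:numPr/w:numId")
    return ParagraphSketch(
        style_id=val_of(pPr, "w:pStyle"),
        ilvl=int(ilvl) if ilvl is not None else None,
        num_id=int(num_id) if num_id is not None else None,
        texts=["".join(t.text or "" for t in r.findall("w:t", NS))
               for r in p.findall("w:r", NS)],
    )


result = parse_paragraph(ET.fromstring(PARAGRAPH_XML))
print(result)
```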

parse_tabs(tabs_elem: Element) -> List[TabStop]

Parses the tab stops from the tabs element.

Parameters:

tabs_elem (etree.Element) – The tabs element.

Returns:

The list of parsed tab stops.

Return type:

List[TabStop]