docx_parser_converter.docx_parsers.document.paragraph_parser module
- class docx_parser_converter.docx_parsers.document.paragraph_parser.ParagraphParser
Bases: object
A parser for extracting paragraph elements from the DOCX document structure.
This class handles the extraction of paragraph properties, runs, styles, numbering, and tabs within a paragraph element, converting them into a structured Paragraph object for further processing or conversion to other formats like HTML.
- extract_paragraph_properties(pPr: Element | None) → ParagraphStyleProperties
Extracts the paragraph style properties from the given paragraph properties (<w:pPr>) element.
- Parameters:
pPr (Optional[etree.Element]) – The paragraph properties element.
- Returns:
The extracted paragraph style properties.
- Return type:
ParagraphStyleProperties
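As a rough illustration of what reading a <w:pPr> element involves, the sketch below pulls two common properties, justification (<w:jc>) and paragraph spacing (<w:spacing>), using the standard-library xml.etree.ElementTree. ParaPropsSketch and extract_props_sketch are hypothetical names; the real ParagraphStyleProperties model presumably covers many more properties, and this is not the library's implementation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # Clark-notation prefix: "{namespace-uri}"


@dataclass
class ParaPropsSketch:
    """Hypothetical, much-reduced stand-in for ParagraphStyleProperties."""
    justification: Optional[str] = None
    spacing_before: Optional[int] = None  # twentieths of a point (twips)
    spacing_after: Optional[int] = None   # twentieths of a point (twips)


def extract_props_sketch(pPr: Optional[ET.Element]) -> ParaPropsSketch:
    props = ParaPropsSketch()
    if pPr is None:
        return props  # no <w:pPr>: leave every property unset
    jc = pPr.find(f"{W}jc")
    if jc is not None:
        props.justification = jc.get(f"{W}val")
    spacing = pPr.find(f"{W}spacing")
    if spacing is not None:
        before = spacing.get(f"{W}before")
        after = spacing.get(f"{W}after")
        props.spacing_before = int(before) if before is not None else None
        props.spacing_after = int(after) if after is not None else None
    return props


pPr = ET.fromstring(
    f'<w:pPr xmlns:w="{W_NS}"><w:jc w:val="center"/>'
    '<w:spacing w:before="240" w:after="120"/></w:pPr>'
)
print(extract_props_sketch(pPr))
```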
- extract_runs(p: Element) → List[Run]
Extracts the run elements from the paragraph element.
- Parameters:
p (etree.Element) – The paragraph element.
- Returns:
The list of extracted runs.
- Return type:
List[Run]
Example
The following is an example of a run element within a paragraph element in a document.xml file:
<w:r> <w:t>Example text</w:t> </w:r>
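A minimal sketch of collecting run text, assuming the standard-library xml.etree.ElementTree and the WordprocessingML main namespace. extract_run_texts is a hypothetical helper that returns plain strings rather than the library's Run objects, and it only visits direct <w:r> children of the paragraph, so runs nested inside elements such as <w:hyperlink> are skipped.

```python
import xml.etree.ElementTree as ET
from typing import List

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # Clark-notation prefix: "{namespace-uri}"


def extract_run_texts(p: ET.Element) -> List[str]:
    """Return the joined <w:t> text of each direct <w:r> child of p."""
    texts = []
    for r in p.findall(f"{W}r"):
        # A run may hold several <w:t> elements, so concatenate them.
        texts.append("".join(t.text or "" for t in r.findall(f"{W}t")))
    return texts


p = ET.fromstring(
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t>Example text</w:t></w:r>'
    '<w:r><w:t xml:space="preserve"> more</w:t></w:r></w:p>'
)
print(extract_run_texts(p))  # ['Example text', ' more']
```

A run can contain several <w:t> elements (for example on either side of a <w:tab/> or <w:br/>), which is why the sketch joins them per run.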
- extract_style_id(pPr: Element | None) → str | None
Extracts the style ID from the paragraph properties element.
- Parameters:
pPr (Optional[etree.Element]) – The paragraph properties element.
- Returns:
The style ID, or None if not found.
- Return type:
Optional[str]
Example
The following is an example of a paragraph style element in a document.xml file:
<w:pStyle w:val="Heading1"/>
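The lookup amounts to finding <w:pStyle> inside <w:pPr> and reading its w:val attribute. A sketch under the same assumptions (standard-library ElementTree; extract_style_id_sketch is a hypothetical name, not the library's method):

```python
import xml.etree.ElementTree as ET
from typing import Optional

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # Clark-notation prefix: "{namespace-uri}"


def extract_style_id_sketch(pPr: Optional[ET.Element]) -> Optional[str]:
    """Return the w:val of <w:pStyle>, or None if pPr or pStyle is missing."""
    if pPr is None:
        return None
    p_style = pPr.find(f"{W}pStyle")
    if p_style is None:
        return None
    return p_style.get(f"{W}val")


pPr = ET.fromstring(f'<w:pPr xmlns:w="{W_NS}"><w:pStyle w:val="Heading1"/></w:pPr>')
print(extract_style_id_sketch(pPr))  # Heading1
```

The value is a style ID that is resolved against the style definitions in styles.xml; it is not necessarily the display name Word shows for the style.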
- extract_tabs(pPr: Element | None) → List[TabStop] | None
Extracts the tab stops from the paragraph properties element.
- Parameters:
pPr (Optional[etree.Element]) – The paragraph properties element.
- Returns:
The list of tab stops, or None if not found.
- Return type:
Optional[List[TabStop]]
Example
The following is an example of a tabs element in a document.xml file:
<w:tabs> <w:tab w:val="left" w:pos="720"/> </w:tabs>
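A sketch of reading tab stops, again with the standard-library ElementTree. TabStopSketch and extract_tabs_sketch are hypothetical stand-ins for the library's TabStop type and method. Positions are in twentieths of a point (twips), so w:pos="720" corresponds to half an inch.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # Clark-notation prefix: "{namespace-uri}"


@dataclass
class TabStopSketch:
    """Hypothetical stand-in for the library's TabStop type."""
    val: str  # tab type, e.g. "left", "center", "right", "decimal"
    pos: int  # position in twips (1440 twips = 1 inch)


def extract_tabs_sketch(pPr: Optional[ET.Element]) -> Optional[List[TabStopSketch]]:
    if pPr is None:
        return None
    tabs = pPr.find(f"{W}tabs")
    if tabs is None:
        return None  # mirrors the documented "None if not found"
    # Assumes each <w:tab> carries both w:val and w:pos, as in the example.
    return [
        TabStopSketch(val=tab.get(f"{W}val"), pos=int(tab.get(f"{W}pos")))
        for tab in tabs.findall(f"{W}tab")
    ]


pPr = ET.fromstring(
    f'<w:pPr xmlns:w="{W_NS}"><w:tabs><w:tab w:val="left" w:pos="720"/></w:tabs></w:pPr>'
)
print(extract_tabs_sketch(pPr))  # [TabStopSketch(val='left', pos=720)]
```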
- parse(p: Element) → Paragraph
Parses a paragraph element from the DOCX document.
- Parameters:
p (etree.Element) – The paragraph element to parse.
- Returns:
The parsed paragraph object.
- Return type:
Paragraph
Example
The following is an example of a paragraph element in a document.xml file:
<w:p>
  <w:pPr>
    <w:pStyle w:val="Heading1"/>
    <w:numPr>
      <w:ilvl w:val="0"/>
      <w:numId w:val="1"/>
    </w:numPr>
  </w:pPr>
  <w:r>
    <w:t>Example text</w:t>
  </w:r>
</w:p>
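To see how the pieces fit together, here is a minimal sketch that pulls the style ID, the numbering reference (w:ilvl and w:numId inside w:numPr) and the run text out of the example paragraph into a plain dict. It uses the standard-library xml.etree.ElementTree; parse_paragraph_sketch, the _val helper and the dict layout are hypothetical and are not the Paragraph object that ParagraphParser.parse returns.

```python
import xml.etree.ElementTree as ET
from typing import Optional

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # Clark-notation prefix: "{namespace-uri}"


def _val(parent: Optional[ET.Element], tag: str) -> Optional[str]:
    """Return w:val of parent's <w:{tag}> child, or None if either is missing."""
    if parent is None:
        return None
    child = parent.find(f"{W}{tag}")
    return child.get(f"{W}val") if child is not None else None


def parse_paragraph_sketch(p: ET.Element) -> dict:
    pPr = p.find(f"{W}pPr")  # None when the paragraph has no properties
    num_pr = pPr.find(f"{W}numPr") if pPr is not None else None
    numbering = None
    if num_pr is not None:
        ilvl, num_id = _val(num_pr, "ilvl"), _val(num_pr, "numId")
        numbering = {
            "ilvl": int(ilvl) if ilvl is not None else None,
            "numId": int(num_id) if num_id is not None else None,
        }
    runs = [
        "".join(t.text or "" for t in r.findall(f"{W}t"))
        for r in p.findall(f"{W}r")
    ]
    return {"style_id": _val(pPr, "pStyle"), "numbering": numbering, "runs": runs}


p = ET.fromstring(
    f'<w:p xmlns:w="{W_NS}"><w:pPr><w:pStyle w:val="Heading1"/>'
    '<w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr>'
    '<w:r><w:t>Example text</w:t></w:r></w:p>'
)
print(parse_paragraph_sketch(p))
```

Here w:numId points at a numbering instance defined in numbering.xml and w:ilvl selects the list level within it; resolving those into actual list markers is outside the scope of this sketch.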