docx_parser_converter.docx_parsers.document.document_parser module
- class docx_parser_converter.docx_parsers.document.document_parser.DocumentParser(source: bytes | str | None = None)
Bases: object
Parses the main document.xml part of a DOCX file.
This class handles the extraction and parsing of the document.xml file within a DOCX file, converting it into a structured DocumentSchema.
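For orientation, a DOCX file is a ZIP archive in which the main document part is normally stored as word/document.xml. The sketch below shows one way that part could be read from bytes or a path using only the standard library. It is not DocumentParser's actual implementation, and the helper names read_document_xml and make_minimal_docx are hypothetical.

```python
import io
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_minimal_docx() -> bytes:
    """Build an in-memory ZIP holding only word/document.xml.

    Hypothetical helper for this demo: a real DOCX also contains
    [Content_Types].xml and relationship parts, so Word would not open this.
    """
    document_xml = (
        f'<w:document xmlns:w="{W_NS}">'
        "<w:body><w:p/></w:body>"
        "</w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def read_document_xml(source) -> bytes:
    """Return the raw bytes of word/document.xml.

    `source` may be the DOCX content as bytes or a filesystem path (str).
    """
    file_obj = io.BytesIO(source) if isinstance(source, bytes) else source
    with zipfile.ZipFile(file_obj) as archive:
        return archive.read("word/document.xml")


document_xml = read_document_xml(make_minimal_docx())
print(document_xml.decode("utf-8"))
```

The bytes-or-path handling mirrors the `source: bytes | str | None` parameter in the signature above, but how the class treats each case internally is an assumption here.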
- extract_elements() → List[Paragraph | Table]
Extracts elements (paragraphs and tables) from the document XML.
Example
The following is an example of the body element in a document.xml file:
<w:body>
    <w:p>
        <!-- Paragraph properties and content here -->
    </w:p>
    <w:tbl>
        <!-- Table properties and content here -->
    </w:tbl>
</w:body>
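To illustrate the idea of walking the body in document order, here is a minimal sketch that classifies the direct children of a w:body element as paragraphs or tables with xml.etree.ElementTree. It is an assumption about the general approach, not the library's code: the real method returns Paragraph and Table models, while this hypothetical classify_body_children helper only returns labels.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace bound to the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

SAMPLE_BODY = f"""
<w:body xmlns:w="{W_NS}">
  <w:p><w:r><w:t>Hello</w:t></w:r></w:p>
  <w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>
  <w:p/>
</w:body>
""".strip()


def classify_body_children(body_xml: str) -> list:
    """Label each direct child of <w:body> in document order.

    Only top-level elements are visited, so the paragraph nested inside
    the table cell is not reported separately.
    """
    body = ET.fromstring(body_xml)
    kinds = []
    for child in body:
        if child.tag == f"{{{W_NS}}}p":
            kinds.append("paragraph")
        elif child.tag == f"{{{W_NS}}}tbl":
            kinds.append("table")
        # Other children (for example w:sectPr) are skipped in this sketch.
    return kinds


kinds = classify_body_children(SAMPLE_BODY)
print(kinds)
```

Iterating over direct children rather than searching the whole tree keeps paragraphs and tables in their original order and avoids double-counting paragraphs that live inside table cells.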
- extract_margins() → DocMargins | None
Extracts margins from the document XML.
- Returns:
The extracted margins or None if not found.
- Return type:
Optional[DocMargins]
Example
The following is an example of the section properties with margins in a document.xml file:
<w:sectPr>
    <w:pgMar w:left="1134" w:right="1134" w:gutter="0" w:header="0" w:top="1134" w:footer="0" w:bottom="1134"/>
</w:sectPr>
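The w:pgMar attribute values are expressed in twentieths of a point (twips), so 1134 corresponds to 56.7 pt, or roughly 2 cm. The sketch below reads those attributes with xml.etree.ElementTree and converts them to points, returning None when no w:pgMar element is present. The read_margins_pt helper and its dict result are hypothetical; whether DocMargins stores points or raw twips is not stated in this reference.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace bound to the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

SAMPLE_SECT_PR = (
    f'<w:sectPr xmlns:w="{W_NS}">'
    '<w:pgMar w:left="1134" w:right="1134" w:gutter="0" w:header="0" '
    'w:top="1134" w:footer="0" w:bottom="1134"/>'
    "</w:sectPr>"
)

MARGIN_NAMES = ("top", "bottom", "left", "right", "header", "footer", "gutter")


def read_margins_pt(sect_pr_xml: str):
    """Return page margins in points, or None if <w:pgMar> is missing."""
    sect_pr = ET.fromstring(sect_pr_xml)
    pg_mar = sect_pr.find(f"{{{W_NS}}}pgMar")
    if pg_mar is None:
        return None
    margins = {}
    for name in MARGIN_NAMES:
        raw = pg_mar.get(f"{{{W_NS}}}{name}")  # attributes are namespaced too
        if raw is not None:
            margins[name] = int(raw) / 20  # twips -> points
    return margins


margins = read_margins_pt(SAMPLE_SECT_PR)
print(margins)
```

Returning None for a missing element matches the documented "None if not found" behaviour of extract_margins(), which lets callers fall back to default margins.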
- get_document_schema() → DocumentSchema
Gets the parsed document schema.
- Returns:
The document schema.
- Return type:
DocumentSchema
- parse() → DocumentSchema
Parses the document XML into a DocumentSchema.
- Returns:
The parsed document schema.
- Return type:
DocumentSchema