docx_parser_converter.docx_to_html.docx_processor module
- class docx_parser_converter.docx_to_html.docx_processor.DocxProcessor
Bases: object
A processor for parsing and merging DOCX document components such as styles, numbering, and document content.
- static get_default_numbering_schema() -> NumberingSchema
Returns a default numbering schema.
- Returns:
The default numbering schema.
- Return type:
NumberingSchema
- static get_default_styles_schema() -> StylesSchema
Returns a default styles schema.
- Returns:
The default styles schema.
- Return type:
StylesSchema
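This page does not say when the two default schemas are used. The following is a minimal sketch, assuming they serve as fallbacks when the styles or numbering part of a document cannot be parsed; that behaviour is an assumption, and `parse_or_default` with its two arguments is a hypothetical helper, not part of the library.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def parse_or_default(parse: Callable[[], T], default: Callable[[], T]) -> T:
    """Return parse(); if it raises, return default() instead.

    Hypothetical helper for illustration, not part of docx_parser_converter.
    """
    try:
        return parse()
    except Exception:
        # Assumed behaviour: a missing or broken part falls back to a default schema.
        return default()
```

With the library installed, `default` could be `DocxProcessor.get_default_styles_schema` or `DocxProcessor.get_default_numbering_schema`, and `parse` any callable that builds the corresponding schema from the file.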
- static process_docx(docx_file: bytes) -> tuple[DocumentSchema, StylesSchema, NumberingSchema]
Processes the DOCX file and extracts the document schema, styles schema, and numbering schema.
- Parameters:
docx_file (bytes) – The binary content of the DOCX file.
- Returns:
The parsed document schema, styles schema, and numbering schema.
- Return type:
tuple[DocumentSchema, StylesSchema, NumberingSchema]
- Raises:
Exception – If the document.xml parsing fails.
Example
The following is an example of how to use the process_docx method:
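from docx_parser_converter.docx_to_html.docx_processor import DocxProcessor

# read_binary_from_file_path returns the file's content as bytes; its import path is not shown on this page.
docx_path = "path/to/your/docx_file.docx"
docx_file = read_binary_from_file_path(docx_path)
document_schema, styles_schema, numbering_schema = DocxProcessor.process_docx(docx_file)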
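Because process_docx takes raw bytes, it can help to check which XML parts a file actually contains before parsing it. The sketch below uses only the standard library and relies on the fact that a DOCX file is a ZIP archive; the part names are the conventional ones written by Word, and `find_docx_parts` and `DOCX_PARTS` are hypothetical names that are not part of the library.

```python
import io
import zipfile

# Conventional part names inside a DOCX (OOXML) package.
DOCX_PARTS = ("word/document.xml", "word/styles.xml", "word/numbering.xml")


def find_docx_parts(docx_file: bytes) -> dict[str, bool]:
    """Report which of the three XML parts exist in the given DOCX bytes."""
    # A DOCX file is a ZIP archive, so it can be opened from memory.
    with zipfile.ZipFile(io.BytesIO(docx_file)) as archive:
        names = set(archive.namelist())
    return {part: part in names for part in DOCX_PARTS}
```

For a file on disk, the bytes can also be obtained with `pathlib.Path(docx_path).read_bytes()`. A result in which `word/document.xml` maps to `False` suggests that process_docx would raise on that input.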