XAVIER HERCE shared an app with you

Objective: Create an application that automatically generates an XML file from the content of an uploaded PDF document. The XML file should conform to a specific structure used for proforma invoices. The app should be capable of extracting relevant information from the PDF, mapping it to the correct XML elements, and ensuring the data is correctly formatted.

Key Features:

PDF Parsing: The app must be able to read and parse various PDF documents, specifically focusing on those structured as invoices or proforma invoices. Implement OCR (Optical Character Recognition) to handle PDFs that are scanned images rather than digitally generated text.

Data Extraction: Identify and extract key fields from the PDF, such as:
- Document type (DOCUMENT_TYPE)
- Document number (NO_DOCUMENTO)
- Supplier (PROVEEDOR)
- Tax Identification Number (CIF)
- Taxable base (BASE_IMPONIBLE)
- Discounts (DESCUENTO)
- VAT (IVA)
- Total amount (TOTAL)
- Document date (FECHA_DE_DOCUMENTO)
- Archiving date (FECHA_DE_ARCHIVADO)
- Responsible person (RESPONSABLE)
- Company information (EMPRESA1, EMPRESA)
- Status (STATUS)
- Order number (NUM_PEDIDO)
- Line items (DETALLE)
Support multiple pages and large volumes of line items, extracting all relevant details accurately.

Data Mapping: Map the extracted data to the corresponding XML elements as per the defined XML structure. Handle variations in PDF layouts by implementing flexible mapping rules or using machine learning models to identify fields based on context.

XML Generation: Generate a well-formed XML file that adheres to the predefined schema. Ensure that:
- The XML structure includes all required elements and attributes.
- Line items are correctly nested under the DETALLE element.
- Optional fields are included or omitted based on the presence of data.
Validate the generated XML against the XML schema to ensure compliance.

User Interface: Provide a simple and intuitive interface where users can upload PDFs and download the corresponding XML files.
Allow users to review and edit extracted data before XML generation, if necessary. Display errors or warnings if any issues are encountered during PDF parsing or XML generation.

Example XML structure for the output:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="DOCUMENT">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="DOCUMENT_TYPE" type="xs:string"/>
        <xs:element name="NO_DOCUMENTO" type="xs:string"/>
        <xs:element name="PROVEEDOR" type="xs:string"/>
        <xs:element name="CIF" type="xs:string"/>
        <xs:element name="BASE_IMPONIBLE" type="xs:decimal"/>
        <xs:element name="DESCUENTO" type="xs:decimal"/>
        <xs:element name="IVA" type="xs:decimal"/>
        <xs:element name="TOTAL" type="xs:decimal"/>
        <xs:element name="FECHA_DE_DOCUMENTO" type="xs:dateTime"/>
        <xs:element name="FECHA_DE_ARCHIVADO" type="xs:dateTime"/>
        <xs:element name="RESPONSABLE" type="xs:string"/>
        <xs:element name="EMPRESA1" type="xs:string"/>
        <xs:element name="DELEGAC" type="xs:string"/>
        <xs:element name="NIDENTIFICADOR_FACTURA" type="xs:string" minOccurs="0"/>
        <xs:element name="NIDENTIFICADOR_ALBARAN" type="xs:string"/>
        <xs:element name="NIDENTIFICADOR_PEDIDO" type="xs:string" minOccurs="0"/>
        <xs:element name="EMPRESA" type="xs:string"/>
        <xs:element name="STATUS" type="xs:string"/>
        <xs:element name="NUM_PEDIDO" type="xs:string"/>
        <xs:element name="NUM_ALBARAN" type="xs:string" minOccurs="0"/>
        <xs:element name="NUM_FACTURA" type="xs:string" minOccurs="0"/>
        <xs:element name="DETALLE">
          <xs:complexType>
            <xs:sequence>
              <!-- FILA can occur an unlimited number of times -->
              <xs:element name="FILA" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="CAMPO_DIFERENC1" type="xs:decimal"/>
                    <xs:element name="CAMPO_PRNUMFAC" type="xs:string" minOccurs="0"/>
                    <xs:element name="CAMPO_ARTICULO" type="xs:string"/>
                    <xs:element name="CAMPO_CANTENT1" type="xs:decimal"/>
                    <xs:element name="CAMPO_DESCOM_1" type="xs:string" minOccurs="0"/>
                    <xs:element name="CAMPO_DESCOM_2" type="xs:string" minOccurs="0"/>
                    <xs:element name="CAMPO_DESCOM_3" type="xs:string" minOccurs="0"/>
                    <xs:element name="CAMPO_IMPORTE_UNITARIO" type="xs:decimal"/>
                    <xs:element name="CAMPO_IMPORTE_TOTAL" type="xs:decimal"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
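The XML Generation requirement can be illustrated with a minimal Python sketch that builds a DOCUMENT instance in the element order the schema above declares, omitting the minOccurs="0" elements when no data is present. The function and constant names (build_document, HEADER_FIELDS, ROW_FIELDS) are hypothetical and not part of the app; the sketch assumes the extracted values arrive as plain dictionaries and uses only the standard library. It does not perform XSD validation, which would need a schema-aware library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from decimal import Decimal

# (element name, optional?) in the order the schema's xs:sequence declares them.
HEADER_FIELDS = [
    ("DOCUMENT_TYPE", False), ("NO_DOCUMENTO", False), ("PROVEEDOR", False),
    ("CIF", False), ("BASE_IMPONIBLE", False), ("DESCUENTO", False),
    ("IVA", False), ("TOTAL", False), ("FECHA_DE_DOCUMENTO", False),
    ("FECHA_DE_ARCHIVADO", False), ("RESPONSABLE", False), ("EMPRESA1", False),
    ("DELEGAC", False), ("NIDENTIFICADOR_FACTURA", True),
    ("NIDENTIFICADOR_ALBARAN", False), ("NIDENTIFICADOR_PEDIDO", True),
    ("EMPRESA", False), ("STATUS", False), ("NUM_PEDIDO", False),
    ("NUM_ALBARAN", True), ("NUM_FACTURA", True),
]
ROW_FIELDS = [
    ("CAMPO_DIFERENC1", False), ("CAMPO_PRNUMFAC", True),
    ("CAMPO_ARTICULO", False), ("CAMPO_CANTENT1", False),
    ("CAMPO_DESCOM_1", True), ("CAMPO_DESCOM_2", True),
    ("CAMPO_DESCOM_3", True), ("CAMPO_IMPORTE_UNITARIO", False),
    ("CAMPO_IMPORTE_TOTAL", False),
]


def _to_text(value):
    """Render a value in a lexical form accepted by the schema's types."""
    if isinstance(value, Decimal):
        return format(value, "f")      # plain digits, no exponent notation
    if isinstance(value, datetime):
        return value.isoformat()       # e.g. 2024-05-17T00:00:00
    return str(value)


def _append_fields(parent, fields, data):
    """Add child elements in schema order; skip empty optional ones."""
    for name, optional in fields:
        value = data.get(name)
        if value is None or value == "":
            if optional:
                continue
            raise ValueError(f"missing required field: {name}")
        ET.SubElement(parent, name).text = _to_text(value)


def build_document(header, rows):
    """Return the DOCUMENT element serialized as a string."""
    if not rows:
        # FILA has no minOccurs="0", so at least one row is required.
        raise ValueError("DETALLE needs at least one FILA")
    root = ET.Element("DOCUMENT")
    _append_fields(root, HEADER_FIELDS, header)
    detalle = ET.SubElement(root, "DETALLE")
    for row in rows:
        _append_fields(ET.SubElement(detalle, "FILA"), ROW_FIELDS, row)
    return ET.tostring(root, encoding="unicode")
```

Raising on a missing required field, rather than writing an empty element, gives the review-and-edit step a concrete error to display before any file is produced. The returned string carries no XML declaration; the caller would add one when writing the file.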
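The Data Extraction and Data Mapping steps might look like the following sketch once text has been obtained from the PDF (by a PDF parser or OCR, which is outside this example). Everything here is an assumption for illustration: the label wording in SAMPLE_TEXT, the regular expressions in PATTERNS, the dd/mm/yyyy date format, and the convention that a comma is the decimal separator. Real invoices will need rules tuned to their layouts.

```python
import re
from datetime import datetime
from decimal import Decimal


def parse_amount(text):
    """Convert '1.234,56 €' or '1234.56' to a Decimal.

    Assumption: when a comma is present it is the decimal separator
    and any dots are thousands separators.
    """
    cleaned = re.sub(r"[^\d.,-]", "", text)
    if "," in cleaned:
        cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def parse_date(text):
    """Convert 'dd/mm/yyyy' to an xs:dateTime string at midnight."""
    return datetime.strptime(text.strip(), "%d/%m/%Y").isoformat()


# Hypothetical label patterns: XML element name -> (regex, converter).
PATTERNS = {
    "NO_DOCUMENTO": (r"^PROFORMA\s+N\S*\s+(\S+)", str),
    "FECHA_DE_DOCUMENTO": (r"^Fecha:\s*(\d{2}/\d{2}/\d{4})", parse_date),
    "CIF": (r"^CIF:\s*(\w+)", str),
    "BASE_IMPONIBLE": (r"^Base imponible:\s*([\d.,]+)", parse_amount),
    "IVA": (r"^IVA:\s*([\d.,]+)", parse_amount),
    "TOTAL": (r"^Total:\s*([\d.,]+)", parse_amount),
}

# Made-up sample text, standing in for parser or OCR output.
SAMPLE_TEXT = """\
PROFORMA Nº: PF-2024-001
Fecha: 17/05/2024
CIF: B12345678
Base imponible: 1.234,56 €
IVA: 259,26 €
Total: 1.493,82 €
"""


def extract_fields(text):
    """Return the fields found in the text, keyed by XML element name.

    Fields whose pattern does not match are left out, so a review
    step can flag them instead of receiving a guessed value.
    """
    fields = {}
    for name, (pattern, convert) in PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
        if match:
            fields[name] = convert(match.group(1))
    return fields
```

Calling extract_fields(SAMPLE_TEXT) yields a dictionary keyed by element name, with amounts as Decimal values and the date already in xs:dateTime form, so the result can be shown to the user for review before XML generation.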
