Text analysis and filling method, device, equipment and medium
By using an AI-driven large language model-based semantic parsing and auto-filling method, the problem of low data entry efficiency and poor accuracy in enterprise contract management has been solved. It has achieved efficient conversion from unstructured documents to structured forms and risk identification, thereby improving the accuracy and consistency of data entry.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING QDING INTERCONNECTION TECHNOLOGY CO LTD
- Filing Date
- 2026-02-11
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, enterprise contract management suffers from problems such as low data entry efficiency, errors in manual operation, inability to understand context, poor template adaptability, and failure to achieve end-to-end automation, making it difficult to guarantee data accuracy and consistency.
A semantic parsing method based on an AI large language model is adopted. By identifying the target field set, semantic parsing is performed to generate structured data, which is automatically filled into the target form. Combined with a preset rule base, risk identification and cross-validation are performed to realize the conversion from unstructured documents to structured forms.
It improved the efficiency and accuracy of data entry, avoided human error, realized an end-to-end automated data entry process, and enhanced data consistency and risk monitoring capabilities.
Smart Images

Figure CN122240804A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, and in particular to a text parsing and filling method, apparatus, device and medium. Background Technology
[0002] Enterprise contract management is a core aspect of business activities. After multiple rounds of communication with clients or partners and finalization, the core terms of a contract must be entered into the enterprise contract management system for subsequent performance tracking, financial accounting, and risk monitoring. With the development of artificial intelligence technology, how to achieve contract information entry based on AI has become a key research focus.
[0003] The related technologies require manual data entry, which is inefficient and prone to errors, reducing data accuracy. Another related technology is based on Optical Character Recognition (OCR) combined with keyword templates for information extraction. This technology is highly dependent on fixed contract templates, and the keyword-based extraction method cannot understand the context, making it difficult to accurately locate the correct information in complex sentences and paragraphs. Moreover, it cannot call system interfaces to complete automatic filling. Summary of the Invention
[0004] The embodiments of this application aim to at least partially solve one of the technical problems in the related art. Therefore, the purpose of the embodiments of this application is to provide a text parsing and filling method, apparatus, device, and medium that realizes automatic text filling.
[0005] This application provides a text parsing and filling method, including: processing a file to be parsed to obtain text content; identifying fields to be filled in a target form to obtain a target field set; performing semantic parsing based on the text content, the target field set, and the file to be parsed to obtain target content that matches at least one target field in the target field set; and filling the target content into the target form.
[0006] For example, based on text content, a target field set, and a file to be parsed, semantic parsing is performed to obtain target content that matches at least one target field in the target field set, including: obtaining prompt data; using the prompt data as a parsing reference, performing semantic parsing on the text content, the target field set, and the file to be parsed to obtain the target content.
[0007] For example, using the prompt data as a parsing reference, semantic parsing is performed on the text content, the target field set, and the file to be parsed to obtain the target content, including: generating a parsing request based on the prompt data, the text content, and the target field set; executing the parsing request to perform semantic parsing on the file to be parsed and / or the text content to obtain the target content.
[0008] For example, filling the target content into the target form includes: performing structured processing on the target content based on the target field to generate structured data; parsing the structured data to obtain the parameters to be filled; and filling the target form based on the parameters to be filled. For example, filling target content into a target form includes: determining the confidence level of each target content when any one of the target fields corresponds to multiple target contents; when the confidence level of at least one target content is lower than a preset confidence level, providing a prompt for the multiple target contents in the file to be parsed and / or text content; and in response to receiving a selection operation for some target contents among the multiple target contents, filling the selected target contents into the target form. For example, the method further includes: when performing semantic parsing processing based on text content, target field set, and file to be parsed, comparing the file to be parsed and / or text content with a preset rule base, and marking the risk information present in the file to be parsed and / or text content.
[0009] For example, the file to be parsed includes a main file and associated files, and the text content includes the main text content corresponding to the main file and the associated file content corresponding to the associated file; based on the text content, the target field set, and the file to be parsed, semantic parsing is performed to obtain target content that matches at least one target field in the target field set, including: performing cross-validation on the main file and associated files, and / or performing cross-validation on the main text content and associated text content to obtain the target content. Another embodiment of this application provides a text parsing and filling device, which includes: a processing module for processing a file to be parsed to obtain text content; an identification module for identifying fields to be filled in a target form to obtain a target field set; a parsing module for performing semantic parsing based on the text content, the target field set, and the file to be parsed to obtain target content that matches at least one target field in the target field set; and a filling module for filling the target content into the target form.
[0010] Another embodiment of this application provides an electronic device having a computer program stored thereon, which, when executed by a processor, implements the steps of the method of any of the above embodiments.
[0011] Another embodiment of this application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method of any of the above embodiments.
[0012] In the above embodiments, the text parsing and filling method includes: processing the file to be parsed to obtain text content; identifying the fields to be filled in the target form to obtain a target field set; performing semantic parsing based on the text content, the target field set, and the file to be parsed to obtain target content that matches at least one target field in the target field set; and filling the target content into the target form. Through semantic parsing and automatic filling, the conversion from unstructured documents to structured forms is achieved, improving data processing efficiency and avoiding problems such as incorrect or missing entries due to human fatigue or negligence, significantly improving the accuracy and consistency of data entry. Attached Figure Description
[0013] Figure 1 Flowchart of the text parsing and filling method provided for the implementation of this application; Figure 2 A schematic diagram of the target form provided for an embodiment of this application; Figure 3 Flowchart of a text parsing and filling method provided for another embodiment of this application; Figure 4 Block diagram of a text parsing and filling device provided for another embodiment of this application; Figure 5 A block diagram of an electronic device provided for another embodiment of this application. Detailed Implementation
[0014] The embodiments of this application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain this application, and should not be construed as limiting this application.
[0015] Enterprise contract management is a core part of business activities. After a contract has been communicated with customers or partners for multiple rounds and finally finalized, its core terms and conditions must be entered into the enterprise contract management system for subsequent performance tracking, financial accounting and risk monitoring. This entry process usually involves an electronic form with dozens or even hundreds of fields.
[0016] The relevant technologies require manual data entry, which has the following bottlenecks: (1) Low data entry efficiency and high labor costs: After the contract is signed, legal or business personnel need to manually extract key information (such as the contracting parties, amount, validity period, etc.) from the contract documents (such as Word, PDF) and fill them into the contract management system field by field. This process is repetitive and consumes a lot of time. (2) Manual operation is prone to errors and data accuracy cannot be guaranteed: During the manual copying and filling process, it is easy to make mistakes or omissions in information filling due to negligence, such as filling in the wrong contract amount, date or company name. These errors may cause serious problems in the subsequent financial and performance links. (3) Information barrier and work disconnect: There is a "gap" of manual operation between the final contract documents and the structured data in the system. This non-automated process leads to the bottleneck of work efficiency and makes the digital management of contracts fail to achieve end-to-end closed loop.
[0017] Another related technology is based on Optical Character Recognition (OCR) combined with keyword templates for information extraction. First, the scanned contract image is converted into editable text using OCR technology. Then, the information is located and extracted using preset rules or keyword templates (e.g., searching for words such as "Party A", "contract amount", "expiration date"). It has the following drawbacks: (1) Lack of true semantic understanding: The keyword-based extraction method is very "fragile" and cannot understand the context. For example, a contract may contain multiple monetary figures, but only one instance of "total contract amount"; the full company name of "Party A" may not be on the same line as the word "Party A". Traditional technology has difficulty accurately locating the correct information in complex sentences and paragraphs. (2) Poor template adaptability: This method is highly dependent on fixed contract templates. Once the contract template provided by the customer is different from the preset template, the extraction rules will fail on a large scale, resulting in unusable or a large amount of incorrect information being extracted. (3) Failure to achieve end-to-end automation: Even if some text is extracted, it is usually impossible to intelligently and accurately map it to the specific fields in the system form, and it is even more impossible to directly call the system interface to complete the filling. In the end, a lot of manual copying, pasting and verification work is often required, which does not fundamentally solve the problem.
[0018] In view of this, the embodiments of this application provide a text parsing and filling method, which intelligently understands the semantics of the contract based on an AI big model, and automatically completes the entire process from unstructured text to structured data filling according to the field requirements of the target page.
[0019] Figure 1 A flowchart illustrating the text parsing and filling method provided for the implementation of this application.
[0020] like Figure 1As shown, the text parsing and filling method 100 provided in this application includes, for example, steps S110-S140.
[0021] Step S110: Process the file to be parsed to obtain the text content. For example, a user can upload a file to be parsed in the contract management system. The file to be parsed may include a contract file. The format of the file to be parsed may include a text file (such as a Word document) and an image file (obtained by scanning a text file, such as a PDF document). The text content includes plain text files and does not contain images, tables, or various document styles.
[0022] Step S120: Identify the fields to be filled in the target form to obtain the target field set. For example, the target form (such as Figure 2 (As shown) is obtained by the server from the front-end interface and back-end business logic of the contract management system. The front-end interface can obtain the target fields to be filled in, and the back-end business logic can obtain the relevant descriptions of the target fields. The target fields and their relevant descriptions constitute the target field set. The target field set includes multiple target fields. The fields to be filled in the target form can be identified before the user uploads the file to be parsed, or the fields to be filled in the target form can be identified at the same time as the file to be parsed is uploaded.
[0023] Step S130: Based on the text content, the target field set, and the file to be parsed, perform semantic parsing to obtain target content that matches at least one target field in the target field set. For example, semantic parsing is performed by a large language model (AI). The server sends the text content, target field set, and file to be parsed to the large language model. The large language model searches the file to be parsed and the text content based on the target field set to obtain the target content that matches the target fields in the target field set. The large language model can search the file to be parsed based on the target field set (e.g., the file to be parsed is in plain text format, without images or tables), or it can search the text content based on the target field set (the file to be parsed is not in plain text format), or it can search the file to be parsed and the text content based on the target field set. The target content includes structured data (e.g., JSON format). The large language model sends the target content to the server.
[0024] Step S140: Fill the target content into the target form.
[0025] For example, when the server receives the target content sent by the large language model, it calls the interface of the contract management system to populate the target content into the backend of the target form. The frontend page refreshes and displays the automatically filled target interface based on the target content from the backend, and shows the user the target form with all fields automatically filled.
[0026] In the above embodiments, semantic parsing and automatic filling are used to convert unstructured documents into structured forms, thereby improving data processing efficiency and avoiding problems such as incorrect or missing entries caused by human fatigue or negligence, thus greatly improving the accuracy and consistency of data entry. In some embodiments of this application, semantic parsing is performed based on text content, a target field set, and a file to be parsed to obtain target content that matches at least one target field in the target field set. This includes: obtaining prompt data; using the prompt data as a parsing reference, performing semantic parsing on the text content, the target field set, and the file to be parsed to obtain the target content.
[0027] For example, the prompt data includes prompt words, which can be written manually or generated by a large language model. The text content, target field set, file to be parsed, and prompt words are sent together to the large language model for semantic parsing to obtain the target content.
[0028] For example, a prompt might read: "You are a senior legal expert. Please extract and populate the fields defined in the following JSON format based on the contract text below. If a field is not found in the text, please leave its value blank." In some embodiments of this application, using prompt data as a parsing reference, semantic parsing is performed on the text content, target field set, and file to be parsed to obtain target content, including: generating a parsing request based on prompt data, text content, and target field set; executing the parsing request to perform semantic parsing on the file to be parsed and / or text content to obtain target content.
[0029] For example, the server sends the text content, target field set, file to be parsed, and prompt words to the large language model for parsing. The large language model executes the parsing request and performs semantic parsing on the text content and / or file to be parsed based on the prompt words and target field set to obtain the target content. The large language model can perform semantic parsing on the file to be parsed (e.g., the file to be parsed is in plain text format, without images or tables, etc.) based on the target field set, or it can perform semantic parsing on the text content (the file to be parsed is not in plain text format) based on the target field set, or it can perform semantic parsing on both the file to be parsed and the text content based on the target field set.
[0030] For example, after receiving a request from the server, the AI model first understands the business meaning of the field list, then reads the entire contract and uses its powerful contextual understanding and semantic reasoning capabilities to locate the necessary information for each field in the unstructured text. For instance, it can understand "This contract is entered into by Company A (hereinafter referred to as "Party A") and Company B (hereinafter referred to as "Party B")," thereby accurately extracting the complete names of Party A and Party B.
[0031] In some embodiments of this application, filling the target content into the target form includes: performing structured processing on the target content based on the target field to generate structured data; parsing the structured data to obtain parameters to be filled; and filling the target form based on the parameters to be filled. Specifically, the large language model extracts the information and generates a structured data (usually in JSON format) according to the format of the input field list, and returns it to the server. After receiving the JSON data returned by the large language model, the server parses the data and calls the internal API (Application Programming Interface) of the contract management system, passing in the extracted information as parameters (parameters to be filled), thereby completing the filling of the backend data.
[0032] In the above embodiments, a semantic structuring engine with target fields as anchors performs deep semantic understanding and field-level mapping on unstructured target content, automatically generating high-fidelity, low-loss structured data; through two-way field-parameter binding, one-click error-free filling of target forms is achieved, improving filling efficiency.
[0033] In some embodiments of this application, filling target content into a target form includes: determining the confidence level of each target content when any one of the target fields corresponds to multiple target contents; when the confidence level of at least one target content is lower than a preset confidence level, providing a prompt for the multiple target contents in the file to be parsed and / or text content; and in response to receiving a selection operation for a portion of the target contents, filling the selected portion of the target contents into the target form. For example, when the large language model performs semantic parsing on text content and / or files to be parsed based on prompt data and target field set, each target field has a corresponding confidence level for the target content. The content with higher confidence level will be used as the target content. Each target field may correspond to multiple target contents. If there is a case where the confidence level of multiple target contents is lower than the preset confidence level, the corresponding files to be parsed and / or text content in the multiple target contents will be highlighted. At this time, the user selects from the multiple target contents and fills the selected part of the target content into the target form.
[0034] For example, when the AI model has low confidence in extracting a certain piece of information from a contract (e.g., multiple possible contract amounts appear in the text), the system does not directly fill in the form. Instead, it highlights the information on the interface and provides multiple options and the original source, which the user can click to select, thus achieving "human-machine collaboration" to complete the verification.
[0035] In the above embodiments, in the scenario of multiple values conflict in a single field, the candidate target content is quantitatively sorted by confidence evaluation model. When the confidence level is detected to be lower than the preset threshold, a visual prompt mechanism is automatically triggered, and the multi-value candidate area is highlighted in the original text to realize human-computer collaborative verification. The system responds to the user's click operation, instantly locks the target content with high confidence or manual confirmation, completes accurate and traceable form backfilling, and improves the filling efficiency.
[0036] In some embodiments of this application, the method further includes: when performing semantic parsing processing based on text content, target field set, and file to be parsed, comparing the file to be parsed and / or text content with a preset rule base, and marking the risk information present in the file to be parsed and / or text content.
[0037] Specifically, while extracting structured information, the capabilities of AI can be expanded to allow it to compare and review contract terms based on the company's pre-set legal risk control rule base (pre-set rule base), automatically identify and mark missing key terms, risky "trap" terms, or terms inconsistent with standard templates (risk information).
[0038] In the above embodiments, by performing multi-dimensional semantic comparison between the text / file to be parsed and a preset rule base, intelligent identification and accurate labeling of potential risk information in unstructured text can be achieved.
[0039] In some embodiments of this application, the file to be parsed includes a main file and associated files, and the text content includes the main text content corresponding to the main file and the associated file content corresponding to the associated file; based on the text content, the target field set, and the file to be parsed, semantic parsing is performed to obtain target content that matches at least one target field in the target field set, including: performing cross-validation on the main file and associated files, and / or performing cross-validation on the main text content and associated text content to obtain the target content. Exemplarily, after a user uploads a main contract (main document), they are allowed to upload associated contracts or attachments (associated documents). The large language model can perform cross-information comparison on multiple documents (main document and associated documents). For example, it can check whether the names and numbers of the attachments cited in the main contract are consistent with the actually uploaded attachments to ensure data consistency. It can also perform cross-information comparison on the text content (main text content and associated text content) to obtain the target content.
[0040] Figure 3 The flowchart of the text parsing and filling method provided for another embodiment of this application is as Figure 3 shown, and the text parsing and filling method includes S301 - S314.
[0041] S301, the user uploads the final version of the contract file in the contract system.
[0042] Exemplarily, the user uploads the finalized contract file (such as in.docx,.pdf format) on the designated page of the contract management system.
[0043] S302, the server performs file preprocessing.
[0044] Exemplarily, after receiving the file, the server starts the preprocessing module to extract the pure text content from the file. For scanned PDF files, this step will first call the OCR service, and file preprocessing will convert PDF / Word to pure text, for example. S303, the system obtains the field list of the current form from the front - end page or the back - end.
[0045] Exemplarily, either at the same time as or before the user uploads the file, the system determines the target form to be filled this time. The application server obtains the list (Schema) of all fields required for this form from the front - end interface definition or the back - end business logic. This list not only contains the field names (such as party_a_name), but may also contain the Chinese labels (such as "name of Party A") and descriptions of the fields to facilitate better understanding by the AI.
[0046] S304, send the contract pure text and the field list to the AI model together.
[0047] Exemplarily, the application server takes the pure text content of the contract and the target field list as two key inputs and sends them to the AI large language model together through a carefully designed prompt.
[0048] S305, understand the business meaning of the field list.
[0049] S306, perform semantic retrieval and information extraction in the contract text. [[ID=,37]]
[0050] S307, output structured data in a field list format.
[0051] S308, the server application service module receives structured data.
[0052] S309 calls the system's internal API to fill data into the corresponding business interface.
[0053] S310, the backend populates the database or cache with data.
[0054] S311, the front-end page displays the automatically filled form.
[0055] For example, once the data is populated, the front-end page refreshes and displays a form to the user with all fields automatically filled in. The AI model can also highlight the source of each filled-in piece of information in the original text, allowing users to quickly verify it.
[0056] S312, the user conducts final manual review and confirmation, and proceeds to S313 after the review is approved.
[0057] For example, users act as the final gatekeepers, quickly reviewing the pre-submitted information.
[0058] S313, the user clicks submit.
[0059] For example, after confirming that everything is correct, click the "Submit" button to complete the entire contract information entry process.
[0060] S314, Contract information is officially stored in the database.
[0061] The text parsing and filling method proposed in this application, which is achieved by the collaborative work of user terminal, application server and AI big model, realizes: (1) Intelligent extraction driven by dynamic fields: The extraction behavior of AI is not based on a fixed template, but is driven by the dynamic field list (Schema) of the target page. This makes the method highly adaptable and can seamlessly connect to any different contract forms in the system without developing separate parsing rules for each form. (2) End-to-end automated filling process: It opens up the entire link from "unstructured documents" to "structured system data". It is not only "extraction" but also "filling". By integrating the AI's Function Calling or Tool Using capabilities, it directly interacts with the system API, realizing true automation and minimizing manual operation. (3) Precise parsing based on deep semantics: Utilizing the deep understanding of legal and business texts by the AI big model, this invention can handle complex sentence structures, polysemous words, referential relationships, etc., with an accuracy far exceeding that of traditional OCR + keyword technology, and can handle real contract text parsing tasks with different formats.
[0062] Figure 4 A block diagram of a text parsing and filling device provided for another embodiment of this application.
[0063] This specification provides a text parsing and filling device 400. Please refer to [link / reference]. Figure 4 The text parsing and filling device 400 includes: a processing module 410, a recognition module 420, a parsing module 430, and a filling module 440.
[0064] For example, the processing module 410 is used to process the file to be parsed to obtain text content.
[0065] For example, the identification module 420 is used to identify the fields to be filled in the target form and obtain the target field set.
[0066] For example, the parsing module 430 is used to perform semantic parsing processing based on the text content, the target field set, and the file to be parsed to obtain target content that matches at least one target field in the target field set.
[0067] For example, the filling module 440 is used to fill in the target content into the target form. For example, the parsing module 430 is also used to obtain prompt data; using the prompt data as a parsing reference, semantic parsing processing is performed on the text content, the target field set, and the file to be parsed to obtain the target content. For example, the parsing module 430 is also used to generate a parsing request based on the prompt data, text content and target field set; execute the parsing request to perform semantic parsing on the file to be parsed and / or text content to obtain the target content.
[0068] For example, the form filling module 440 is also used to perform structured processing on the target content based on the target field to generate structured data; parse the structured data to obtain the parameters to be filled; and fill the target form based on the parameters to be filled. For example, the filling module 440 is further configured to: determine the confidence level of each target content among the multiple target contents when any one of the target fields in at least one target field corresponds to multiple target contents; when the confidence level of at least one target content among the multiple target contents is lower than a preset confidence level, provide a prompt for the multiple target contents in the file to be parsed and / or text content; and in response to receiving a selection operation for some target contents among the multiple target contents, fill in the selected target contents into the target form. For example, the text parsing and filling device 400 further includes: a comparison module, used to compare the file to be parsed and / or the text content with a preset rule base when performing semantic parsing processing based on text content, target field set and file to be parsed, and to mark the risk information present in the file to be parsed and / or the text content. For example, the file to be parsed includes a main file and associated files, and the text content includes the main text content corresponding to the main file and the associated file content corresponding to the associated file; the parsing module 430 is also used to perform cross-validation processing on the main file and associated files, and / or to perform cross-validation processing on the main text content and associated text content to obtain the target content.
[0069] Figure 5 A block diagram of an electronic device provided for another embodiment of this application.
[0070] Another embodiment of this application provides an electronic device having a computer program stored thereon, which, when executed by a processor, implements the steps of the method of any of the above embodiments.
[0071] like Figure 5 As shown, for ease of understanding, embodiments of this application illustrate a specific electronic device 400.
[0072] Electronic device 500 is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and / or claimed herein.
[0073] like Figure 5 As shown, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the electronic device 500. The computing unit 501, ROM 502, and RAM 503 are interconnected via a bus 504. An input / output (I / O) interface 505 is also connected to the bus 504.
[0074] Multiple components in electronic device 500 are connected to input / output (I / O) interface 505. These components include: input unit 506, such as a keyboard or mouse; output unit 507, such as various types of displays or speakers; storage unit 508, such as a hard disk or optical disk; and communication unit 509, such as a network interface card (NIC), modem, or wireless transceiver. Communication unit 509 allows electronic device 500 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.
[0075] The computing unit 501 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods described above. For example, in some embodiments, any one or more of the various methods described above can be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program can be loaded and / or installed on the electronic device 500 via ROM 502 and / or communication unit 509. When the computer program is loaded into RAM 503 and executed by the computing unit 501, one or more steps of any one or more of the various methods described above can be performed. Alternatively, in other embodiments, the computing unit 501 can be configured to perform any one or more of the various methods described above by any other suitable means (e.g., by means of firmware).
[0076] This application provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the steps of the method in any of the above embodiments.
[0077] It should be noted that the logic and / or steps represented in the flowchart or otherwise described herein, for example, can be considered as a sequenced list of executable instructions for implementing logical functions, and can be specifically implemented in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a processor-included system, or other system that can fetch and execute instructions from, an instruction execution system, apparatus, or device). For the purposes of this application, "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: electrical connections (electronic devices) having one or more wires, portable computer disk drives (magnetic devices), random access memory (RAM), read-only memory (ROM), erasable and editable read-only memory (EPROM or flash memory), fiber optic devices, and portable optical disc read-only memory (CDROM). Furthermore, computer-readable media can even be paper or other suitable media on which programs can be printed, because programs can be obtained electronically, for example, by optically scanning the paper or other media, followed by editing, interpreting, or otherwise processing as necessary, and then stored in computer memory.
[0078] It should be understood that various parts of this application can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
[0079] In the description of this application, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this application, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0080] In the description of this application, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", etc., indicating the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings, are only for the convenience of describing this application and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of this application.
[0081] Furthermore, the terms "first," "second," etc., used in the embodiments of this application are for descriptive purposes only and should not be construed as indicating or implying relative importance, or implicitly specifying the number of technical features indicated in this embodiment. Therefore, features defined with terms such as "first" and "second" in the embodiments of this application can explicitly or implicitly indicate that the embodiment includes at least one of those features. In the description of this application, the word "multiple" means at least two or more, such as two, three, four, etc., unless otherwise explicitly and specifically defined in the embodiments.
[0082] In this application, unless otherwise explicitly specified or limited in the embodiments, the terms "installation," "connection," "joining," and "fixing" appearing in the embodiments should be interpreted broadly. For example, a connection can be a fixed connection, a detachable connection, or an integral part; it can also be a mechanical connection, an electrical connection, etc. Of course, it can also be a direct connection, or an indirect connection through an intermediate medium, or it can be the internal communication between two components, or the interaction between two components. Those skilled in the art can understand the specific meaning of the above terms in this application based on the specific implementation.
[0083] In this application, unless otherwise expressly specified and limited, "above" or "below" the second feature can mean that the first feature is in direct contact with the second feature, or that the first feature is in indirect contact with the second feature through an intermediate medium. Furthermore, "above," "on top of," and "over" the second feature can mean that the first feature is directly above or diagonally above the second feature, or simply that the first feature is at a higher horizontal level than the second feature. "Below," "below," and "under" the second feature can mean that the first feature is directly below or diagonally below the second feature, or simply that the first feature is at a lower horizontal level than the second feature.
[0084] Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of this application.
Claims
1. A text parsing and data entry method, characterized in that, The method includes: The file to be parsed is processed to obtain the text content; Identify the fields to be filled in the target form to obtain the target field set; Based on the text content, the target field set, and the file to be parsed, semantic parsing is performed to obtain target content that matches at least one target field in the target field set; Fill the target content into the target form.
2. The method according to claim 1, characterized in that, The step of performing semantic parsing based on the text content, the target field set, and the file to be parsed to obtain target content that matches at least one target field in the target field set includes: Get the prompt data; Using the prompt data as a parsing reference, semantic parsing processing is performed on the text content, the target field set, and the file to be parsed to obtain the target content.
3. The method according to claim 2, characterized in that, The step of using the prompt data as a parsing reference to perform semantic parsing processing on the text content, the target field set, and the file to be parsed to obtain the target content includes: Based on the prompt data, the text content, and the target field set, a parsing request is generated; The parsing request is executed to perform semantic parsing on the file to be parsed and / or the text content to obtain the target content.
4. The method according to claim 1, characterized in that, The step of filling the target content into the target form includes: Based on the target field, the target content is processed in a structured manner to generate structured data; The structured data is parsed to obtain the parameters to be filled. The target form is populated based on the parameters to be filled.
5. The method according to claim 1, characterized in that, The step of filling the target content into the target form includes: When any one of the at least one target field corresponds to multiple target contents, determine the confidence level of each target content among the multiple target contents; When at least one of the target contents has a confidence level lower than a preset confidence level, a prompt is made for the target contents in the file to be parsed and / or the text content. In response to receiving a selection operation for a portion of the target content among the multiple target contents, the selected portion of the target content is filled into the target form.
6. The method according to any one of claims 1-5, characterized in that, The method further includes: When performing semantic parsing based on the text content, the target field set, and the file to be parsed, the file to be parsed and / or the text content is compared with a preset rule base, and risk information present in the file to be parsed and / or the text content is marked.
7. The method according to any one of claims 1-5, characterized in that, The file to be parsed includes a main file and associated files, and the text content includes the main text content corresponding to the main file and the associated file content corresponding to the associated files; the semantic parsing process based on the text content, the target field set, and the file to be parsed to obtain target content that matches at least one target field in the target field set includes: The target content is obtained by performing cross-validation on the main file and the associated file, and / or by performing cross-validation on the main text content and the associated text content.
8. A text parsing and filling device, characterized in that, The device includes: The processing module is used to process the file to be parsed to obtain the text content; The identification module is used to identify the fields to be filled in the target form and obtain the target field set. The parsing module is used to perform semantic parsing based on the text content, the target field set, and the file to be parsed to obtain target content that matches at least one target field in the target field set; The data entry module is used to fill in the target content into the target form.
9. An electronic device having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the method described in any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the method described in any one of claims 1-7.