A text content reference resolution method, device, equipment and storage medium thereof
By constructing entity relation vectors through knowledge graph analysis and representation learning models, and combining entity extraction and synonym classification, the problem of limited information in the resolution of referential meaning in existing technologies is solved, and more accurate resolution of text content referential meaning is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA PING AN PROPERTY INSURANCE CO LTD
- Filing Date
- 2024-01-08
- Publication Date
- 2026-06-26
AI Technical Summary
Existing methods for resolving semantic references only process the input text, resulting in limited information and failing to effectively assist financial processing systems in resolving textual semantic references.
By parsing the target knowledge graph to obtain entity data and relation representation data, and using a representation learning model to construct entity relation vectors, combined with an entity extraction model and synonym classification, the referential resolution of the text to be resolved is achieved.
It improves the accuracy and comprehensiveness of text content referencing resolution, and can better assist financial processing systems in understanding the meaning of text.
Figure CN117932082B_ABST (abstract drawing)
Abstract
Description
Technical Field
[0001] This application relates to the field of financial technology and is applied to the scenario of text content referential resolution, and in particular to a method, apparatus, device and storage medium for text content referential resolution.
Background Technology
[0002] With the rapid development of the internet, all industries are seeking breakthroughs by leveraging it. In recent years, the financial industry has also been expanding its online business. Due to the large volume of business and data involved in the financial industry, a better understanding of the meaning of text is crucial for processing financial business documents.
[0003] Reference resolution is an important task in natural language processing. Its goal is to resolve the referential relationships of noun phrases in a text, that is, to determine all places in the text where the same entity (such as a person, place, or object) appears, which can help us better understand the meaning of the text.
[0004] Current reference resolution methods typically focus on the input text itself, aiming to extract information from it. For example, some researchers use end-to-end neural networks to compute vector representations of the phrases in the text, together with head attention mechanisms, to achieve reference resolution. Others use question-answering models: they first extract candidate mentions from the text, then treat the sentence containing each mention as a question and the entire text as context, concatenate the two, and finally extract all co-references of that mention with the question-answering model. However, these methods only process the input text, so the information they obtain is very limited, and they cannot effectively assist financial processing systems in resolving references in text content.
Summary of the Invention
[0005] The purpose of this application is to provide a method, apparatus, device and storage medium for text content referencing resolution, so as to solve the problem that the existing referencing resolution methods only process the input text, and the information obtained is very limited, which cannot better assist the financial processing system in text content referencing resolution.
[0006] To address the aforementioned technical problems, this application provides a method for resolving text content referencing, employing the following technical solution:
[0007] A method for resolving text content referencing includes the following steps:
[0008] Obtain the target knowledge graph;
[0009] The target knowledge graph is parsed to obtain all entity data and the relational representation data between all entity data.
[0010] All entity data and relation representation data between all entity data are input into a preset representation learning model. Based on the output of the representation learning model, an entity relation vector containing semantic information is obtained. The entity relation vector containing semantic information is fitted by head entity data, relation representation data and tail entity data.
[0011] Obtain the text to be resolved;
[0012] The text to be resolved is input into a preset entity extraction model, and all entity data contained in the text to be resolved are extracted according to the entity extraction model.
[0013] Based on a preset thesaurus, all entity data contained in the text to be resolved are classified and organized using synonyms to obtain the classified and organized entity data to be trained.
[0014] Using the entity relation vector containing semantic information as a supervision signal, the entity data to be trained is input into the representation learning model to obtain the entity relation vector containing semantic information corresponding to the entity data to be trained;
[0015] Based on the entity relation vector containing semantic information corresponding to the entity data to be trained, the entity data contained in the text to be resolved is resolved by referencing.
[0016] Furthermore, the step of parsing the target knowledge graph to obtain all entity data and the relational representation data between all entity data specifically includes:
[0017] The target knowledge graph is initially analyzed to obtain all graph nodes and connections between them.
[0018] Based on the graph node identifiers and a preset first reference table, the entity data corresponding to each node is determined. The first reference table contains the mapping relationship between the graph node identifiers and the entity data.
[0019] Based on the connection identifiers and a preset second reference table, the relation representation data corresponding to the connections between all graph nodes are determined. The second reference table contains the mapping relationship between the connection identifiers and the relation representation data.
[0020] Furthermore, before performing the step of inputting all entity data and the relation representation data between all entity data into a preset representation learning model, and obtaining an entity relation vector containing semantic information based on the output of the representation learning model, the method further includes:
[0021] The target knowledge graph is deployed as a learning reference graph into the representation learning model;
[0022] The target knowledge graph is initialized to obtain the entity position vectors and relation representation vectors contained in the target knowledge graph. The entity position vectors are obtained based on the position information of all entity data in the target knowledge graph. The relation representation vectors are composed of modulus information and angle information, which are calculated based on the position information of two entity data that have a connection relationship in the target knowledge graph.
[0023] Furthermore, the step of inputting all entity data and the relation representation data between all entity data into a preset representation learning model, and obtaining an entity relation vector containing semantic information based on the output of the representation learning model, specifically includes:
[0024] By comparison, the entity location vectors corresponding to all the entity data are identified;
[0025] Randomly select one entity from all the entity data as the head entity data;
[0026] Randomly select one entity from all the entity data as the tail entity data;
[0027] Based on the entity position vectors corresponding to the head entity data and tail entity data respectively, the modulus information and angle information between the head entity data and tail entity data are calculated.
[0028] Based on the target knowledge graph, determine whether there is direct relationship representation data between the head entity data and the tail entity data;
[0029] If there is direct relation representation data between the head entity data and the tail entity data, then the entity relation vector containing semantic information is constructed based on the entity position vector of the head entity data, the entity position vector of the tail entity data, and the relation representation data.
[0030] If there is no direct relationship representation data between the head entity data and the tail entity data, the indirect relationship representation data contained in the optimal path between the head entity data and the tail entity data is obtained according to the preset filtering strategy. The optimal path is the path, identified by the filtering strategy, that contains the minimum number of indirect relationship representation data.
[0031] The entity relationship vector containing semantic information is constructed based on the entity position vector of the head entity data, the entity position vector of the tail entity data, and the indirect relationship representation data under the best path contained between the head entity data and the tail entity data.
[0032] Furthermore, the entity extraction model is a BERT entity extraction model based on the CRF algorithm. The step of inputting the text to be resolved into the preset entity extraction model and extracting all entity data contained in the text to be resolved according to the entity extraction model specifically includes:
[0033] The text to be resolved is segmented by the word segmentation component in the entity extraction model to obtain the word segmentation result;
[0034] The word segmentation results are analyzed by the part-of-speech analysis component in the entity extraction model to obtain the noun field in the word segmentation results.
[0035] The noun field is output according to the output component in the entity extraction model, thereby completing the extraction of all entity data contained in the text to be resolved.
[0036] Furthermore, the step of performing referential resolution on the entity data contained in the text to be resolved based on the entity relation vector containing semantic information corresponding to the entity data to be trained specifically includes:
[0037] Select one entity data to be trained from the entity data to be trained as the comparison entity data;
[0038] Obtain the entity relation vector containing semantic information corresponding to the compared entity data, and use it as the first vector;
[0039] Sequentially obtain the entity relation vectors containing semantic information corresponding to the other entity data to be trained in the entity data to be trained, excluding the comparison entity data, and use them as the second vector;
[0040] The similarity between the first vector and the second vector is calculated according to a preset similarity algorithm.
[0041] If the similarity meets a preset similarity threshold, the comparison entity data is determined to refer to the entity data to be trained corresponding to the second vector, and also to the synonyms of that entity data.
[0042] Furthermore, the step of calculating the similarity between the first vector and the second vector according to a preset similarity algorithm specifically includes:
[0043] Obtain the first vector corresponding to the comparison entity data, which is fitted by the head entity data, relation representation data and tail entity data;
[0044] Obtain the vector data corresponding to the second vector, which is fitted from the head entity data, relation representation data, and tail entity data;
[0045] The cosine similarity algorithm is used to calculate the similarity between the head entity data, relation representation data, and tail entity data in the first vector and the second vector.
[0046] The similarity between the head entity data, relation representation data, and tail entity data in the first vector and the second vector is used as the similarity between the first vector and the second vector.
[0047] To address the aforementioned technical problems, this application also provides a text content referencing resolution device, which employs the following technical solution:
[0048] A text content referencing resolution device, comprising:
[0049] The knowledge graph acquisition module is used to acquire the target knowledge graph;
[0050] The knowledge graph parsing module is used to parse the target knowledge graph to obtain all entity data and the relational representation data between all entity data.
[0051] The entity relation vector acquisition module is used to input all entity data and relation representation data between all entity data into a preset representation learning model, and obtain an entity relation vector containing semantic information based on the output of the representation learning model. The entity relation vector containing semantic information is fitted by head entity data, relation representation data and tail entity data.
[0052] The text acquisition module is used to obtain the text to be resolved;
[0053] The entity data extraction module is used to input the text to be resolved into a preset entity extraction model, and extract all entity data contained in the text to be resolved according to the entity extraction model.
[0054] The synonym classification and organization module is used to classify and organize all entity data contained in the text to be resolved into synonyms according to a preset thesaurus, and obtain the classified and organized entity data to be trained.
[0055] The supervised learning module is used to input the entity data to be trained into the representation learning model using the entity relation vector containing semantic information as a supervision signal, so as to obtain the entity relation vector containing semantic information corresponding to the entity data to be trained.
[0056] The referential resolution processing module is used to perform referential resolution on the entity data contained in the text to be resolved based on the entity relation vector containing semantic information corresponding to the entity data to be trained.
[0057] To address the aforementioned technical problems, this application also provides a computer device that employs the following technical solution:
[0058] A computer device includes a memory and a processor, wherein the memory stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the steps of the text content referencing resolution method described above.
[0059] To address the aforementioned technical problems, this application also provides a computer-readable storage medium, employing the technical solution described below:
[0060] A computer-readable storage medium storing computer-readable instructions, which, when executed by a processor, implement the steps of the text content referencing resolution method described above.
[0061] Compared with the prior art, the embodiments of this application have the following main advantages:
[0062] The text content referencing resolution method described in this application involves parsing the target knowledge graph to obtain all entity data and relational representation data between all entity data; inputting all entity data and relational representation data between all entity data into a preset representation learning model to obtain entity relation vectors containing semantic information; obtaining the text to be resolved; inputting the text to be resolved into a preset entity extraction model to extract all entity data contained in the text; classifying and organizing all entity data contained in the text to be resolved into synonyms to obtain entity data to be trained; inputting the entity data to be trained into the representation learning model, and using the entity relation vectors containing semantic information as supervision signals to obtain entity relation vectors containing semantic information corresponding to the entity data to be trained; and resolving the referencing of entity data contained in the text to be resolved based on the entity relation vectors containing semantic information corresponding to the entity data to be trained. By using the entity relation vectors of the target knowledge graph, obtained by the representation learning model, as supervision signals, the entity relation vectors between all entity data in the text to be resolved are determined, thus better assisting the financial processing system in performing text content referencing resolution.
Attached Figure Description
[0063] To more clearly illustrate the solutions in this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0064] Figure 1 is an exemplary system architecture diagram to which this application can be applied;
[0065] Figure 2 is a flowchart of one embodiment of the text content referencing resolution method of this application;
[0066] Figure 3 is a flowchart of a specific embodiment of step 202 shown in Figure 2;
[0067] Figure 4 is a flowchart of a specific embodiment of step 203 shown in Figure 2;
[0068] Figure 5 is a flowchart of a specific embodiment of step 205 shown in Figure 2;
[0069] Figure 6 is a flowchart of a specific embodiment of step 208 shown in Figure 2;
[0070] Figure 7 is a flowchart of a specific embodiment of step 604 shown in Figure 6;
[0071] Figure 8 is a schematic structural diagram of one embodiment of the text content referencing resolution device of this application;
[0072] Figure 9 is a schematic structural diagram of one embodiment of the computer device according to this application.
Detailed Implementation
[0073] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains; the terminology used herein in the specification of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having," and any variations thereof, in the specification, claims, and foregoing drawings of this application, are intended to cover non-exclusive inclusion. The terms "first," "second," etc., in the specification, claims, or foregoing drawings of this application are used to distinguish different objects, not to describe a particular order.
[0074] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0075] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
[0076] As shown in Figure 1, system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium for providing communication links between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
[0077] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social media platform software, etc.
[0078] Terminal devices 101, 102, and 103 can be various electronic devices with displays and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptops, and desktop computers, etc.
[0079] Server 105 can be a server that provides various services, such as a backend server that supports the pages displayed on terminal devices 101, 102, and 103.
[0080] It should be noted that the text content referencing resolution method provided in this application embodiment is generally executed by a server / terminal device, and correspondingly, the text content referencing resolution device is generally set in the server / terminal device.
[0081] It should be understood that the numbers of terminal devices, networks, and servers shown in Figure 1 are merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0082] With continued reference to Figure 2, a flowchart of an embodiment of the text content referencing resolution method according to this application is shown. The text content referencing resolution method includes the following steps:
[0083] Step 201: Obtain the target knowledge graph, wherein the target knowledge graph is a knowledge graph constructed based on a preset text knowledge base.
[0084] Step 202: Parse the target knowledge graph to obtain all entity data and the relationship representation data between all entity data.
[0085] In this embodiment, the target knowledge graph includes a financial business knowledge graph, such as an insurance business knowledge graph containing policyholders, insured entities, underwriters, and the relationships between them. Generally, the target knowledge graph contains all entities and the relationship representation data between all entities, wherein the relationship representation data in the target knowledge graph is composed of pointer lines and description fields.
[0086] With continued reference to Figure 3, which is a flowchart of a specific embodiment of step 202 shown in Figure 2, the step includes:
[0087] Step 301: Perform preliminary analysis on the target knowledge graph to obtain all graph nodes and connections between all graph nodes contained in the target knowledge graph;
[0088] Step 302: Determine the entity data corresponding to each node according to the graph node identifier and the preset first reference table, wherein the first reference table contains the mapping relationship between the graph node identifier and the entity data;
[0089] Step 303: Based on the connection identifier symbols and the preset second reference table, determine the relationship representation data corresponding to the connections between all graph nodes. The second reference table contains the mapping relationship between the connection identifier symbols and the relationship representation data.
[0090] By parsing, we obtain all entity data and relational representation data between all entities contained in the target knowledge graph, which facilitates subsequent training of representation learning models based on all entity data and relational representation data between all entities contained in the target knowledge graph.
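Steps 301 to 303 can be illustrated with a minimal sketch. The source does not specify how the graph or the two reference tables are stored, so the data layout here (node identifiers, `(edge id, source, target)` tuples, and the dictionaries `node_table` and `edge_table`) is a hypothetical assumption, as are the example entities and relations.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    head: str
    relation: str
    tail: str

def parse_graph(nodes, edges, node_table, edge_table):
    """Resolve node and connection identifiers into entity and relation data."""
    # Step 302: map each graph node identifier to its entity data
    # through the first reference table.
    entities = {node_id: node_table[node_id] for node_id in nodes}
    # Step 303: map each connection identifier to its relation representation
    # data through the second reference table.
    triples = [
        Triple(entities[src], edge_table[edge_id], entities[dst])
        for edge_id, src, dst in edges
    ]
    return list(entities.values()), triples

# Assumed output of step 301: node ids and (edge id, source, target) tuples.
nodes = ["n1", "n2", "n3"]
edges = [("e1", "n1", "n2"), ("e2", "n3", "n1")]
node_table = {"n1": "policyholder", "n2": "insured", "n3": "underwriter"}
edge_table = {"e1": "insures", "e2": "underwrites policy of"}

entities, triples = parse_graph(nodes, edges, node_table, edge_table)
```

Keeping the identifier-to-data mappings in separate tables, as the source describes, means the graph structure can be traversed without loading entity or relation text until it is needed.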
[0091] Step 203: Input all entity data and the relation representation data between all entity data into a preset representation learning model, and obtain an entity relation vector containing semantic information based on the output of the representation learning model. The entity relation vector containing semantic information is fitted by head entity data, relation representation data and tail entity data.
[0092] In this embodiment, before executing the step of inputting all entity data and the relation representation data between all entity data into a preset representation learning model, and obtaining entity relation vectors containing semantic information based on the output of the representation learning model, the method further includes: deploying the target knowledge graph as a learning reference graph into the representation learning model; initializing the target knowledge graph to obtain entity position vectors and relation representation vectors contained in the target knowledge graph, wherein the entity position vectors are obtained based on the position information of all entity data in the target knowledge graph, and the relation representation vectors are composed of modulus information and angle information, and the modulus information and angle information are calculated based on the position information of two entity data with a connection relationship in the target knowledge graph.
[0093] In this embodiment, the step of initializing the target knowledge graph to obtain the entity location vectors and relationship representation vectors contained in the target knowledge graph specifically includes: arbitrarily selecting a point in the target knowledge graph as the three-dimensional coordinate origin; obtaining the three-dimensional coordinate information of all graph nodes in the target knowledge graph; determining the entity location information of all entity data based on the three-dimensional coordinate information of all graph nodes; obtaining the entity location vectors corresponding to each entity data based on the entity location information; and calculating the modulus information and angle information between two entity data that have a connection relationship based on the entity location vectors corresponding to each entity data.
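The modulus and angle calculation described above can be sketched as follows. The source does not give formulas, so this sketch assumes that the modulus is the Euclidean distance between the two entity positions and that the angle is the angle between the two position vectors measured at the chosen origin; the function name and the coordinates are hypothetical.

```python
import math

def modulus_and_angle(head_pos, tail_pos):
    """Return (modulus, angle in radians) for two 3-D entity position vectors."""
    # Modulus: assumed to be the Euclidean length of the displacement head -> tail.
    diff = [t - h for h, t in zip(head_pos, tail_pos)]
    modulus = math.sqrt(sum(d * d for d in diff))
    # Angle: assumed to be the angle between the two position vectors at the origin.
    dot = sum(h * t for h, t in zip(head_pos, tail_pos))
    norms = math.hypot(*head_pos) * math.hypot(*tail_pos)
    if norms == 0.0:
        return modulus, 0.0  # angle is undefined at the origin; 0.0 is a placeholder
    cos_angle = max(-1.0, min(1.0, dot / norms))  # guard against rounding drift
    return modulus, math.acos(cos_angle)

# Hypothetical 3-D coordinates of two connected graph nodes.
positions = {"policyholder": (1.0, 0.0, 0.0), "insured": (0.0, 1.0, 0.0)}
modulus, angle = modulus_and_angle(positions["policyholder"], positions["insured"])
```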
[0094] With continued reference to Figure 4, which is a flowchart of a specific embodiment of step 203 shown in Figure 2, the step includes:
[0095] Step 401: By comparison, identify the entity position vectors corresponding to all the entity data respectively;
[0096] Step 402: Randomly select one entity data from all the entity data as the head entity data;
[0097] Step 403: Randomly select one entity data from all the entity data as the tail entity data;
[0098] Step 404: Based on the entity position vectors corresponding to the head entity data and tail entity data respectively, calculate the modulus information and angle information between the head entity data and tail entity data;
[0099] Step 405: Based on the target knowledge graph, determine whether there is direct relationship representation data between the head entity data and the tail entity data;
[0100] Step 406: If there is direct relation representation data between the head entity data and the tail entity data, then construct the entity relation vector containing semantic information based on the entity position vector of the head entity data, the entity position vector of the tail entity data, and the relation representation data.
[0101] Step 407: If there is no direct relationship representation data between the head entity data and the tail entity data, the indirect relationship representation data contained in the optimal path between the head entity data and the tail entity data is obtained according to the preset filtering strategy. The optimal path is the path, identified by the filtering strategy, that contains the minimum number of indirect relationship representation data.
[0102] Step 408: Construct the entity relationship vector containing semantic information based on the entity position vector of the head entity data, the entity position vector of the tail entity data, and the indirect relationship representation data under the best path contained between the head entity data and the tail entity data.
[0103] By arbitrarily selecting head entity data and tail entity data from all entity data, and constructing entity relationship vectors containing semantic information based on the head entity data and tail entity data, the entity relationship vectors between all entity data are determined. This facilitates the subsequent use of the entity relationship vectors as a supervision signal to determine the entity relationship vectors between all entity data in the text to be resolved, thus better assisting the financial processing system in performing referential resolution.
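The choice between a direct relation (step 406) and the optimal indirect path (step 407) can be sketched as a fewest-hop search. The source leaves the filtering strategy unspecified; breadth-first search is used here as one possible strategy that returns a path with the minimum number of relations. The function name `relation_path`, the assumption that relations are followed only in their stated direction, and the example triples are all hypothetical.

```python
from collections import deque

def relation_path(triples, head, tail):
    """Return the relations on a fewest-hop path from head to tail, or None."""
    # Adjacency list built from (head, relation, tail) triples; edges are
    # followed in their stated direction only (an assumption).
    adjacency = {}
    for h, r, t in triples:
        adjacency.setdefault(h, []).append((r, t))
    queue = deque([(head, [])])
    visited = {head}
    while queue:
        node, relations = queue.popleft()
        if node == tail and relations:
            # One relation means a direct relation (step 406);
            # more than one means an indirect path (step 407).
            return relations
        for r, t in adjacency.get(node, []):
            if t not in visited:
                visited.add(t)
                queue.append((t, relations + [r]))
    return None

triples = [
    ("policyholder", "signs", "policy"),
    ("policy", "covers", "insured"),
    ("underwriter", "issues", "policy"),
]
direct = relation_path(triples, "policyholder", "policy")
indirect = relation_path(triples, "policyholder", "insured")
```

Because breadth-first search visits nodes in order of hop count, the first path that reaches the tail entity contains the fewest relations.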
[0104] Step 204: Obtain the text to be resolved.
[0105] In this embodiment, the text to be resolved is an insurance business text, for example, the text of a newly signed insurance contract.
[0106] Step 205: Input the text to be resolved into a preset entity extraction model, and extract all entity data contained in the text to be resolved according to the entity extraction model.
[0107] In this embodiment, the entity extraction model is a BERT entity extraction model based on the CRF algorithm, where CRF stands for conditional random field. Applying the CRF algorithm to the BERT entity extraction model makes it possible to randomly filter the text to be resolved, quickly identify word segments of different parts of speech in that text, and improve the entity extraction speed.
[0108] With continued reference to Figure 5, which is a flowchart of a specific embodiment of step 205 shown in Figure 2, the step includes:
[0109] Step 501: Perform word segmentation on the text to be resolved based on the word segmentation component in the entity extraction model to obtain the word segmentation result;
[0110] Step 502: Perform part-of-speech analysis on the word segmentation result using the part-of-speech analysis component in the entity extraction model to obtain the noun field in the word segmentation result;
[0111] Step 503: Output the noun field according to the output component in the entity extraction model to complete the extraction of all entity data contained in the text to be resolved.
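The three-stage flow of steps 501 to 503 can be illustrated with a toy stand-in. This is not a BERT or CRF model: it assumes a hypothetical hand-written part-of-speech lexicon (`POS_LEXICON`) and a simple regular-expression tokenizer, purely to show how segmentation, part-of-speech analysis, and noun output fit together.

```python
import re

# Hypothetical part-of-speech lexicon; a real system would use the trained model.
POS_LEXICON = {
    "policyholder": "NOUN", "insurer": "NOUN", "premium": "NOUN",
    "pays": "VERB", "the": "DET", "to": "ADP",
}

def extract_entities(text):
    """Toy version of steps 501-503: segment, tag, and keep the nouns."""
    # Step 501 (stand-in): split the text into lowercase word tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Step 502 (stand-in): tag each token; unknown words get the tag "X".
    tagged = [(tok, POS_LEXICON.get(tok, "X")) for tok in tokens]
    # Step 503: output the noun tokens as the extracted entity data.
    return [tok for tok, tag in tagged if tag == "NOUN"]

extracted = extract_entities("The policyholder pays the premium to the insurer.")
```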
[0112] Step 206: According to the preset thesaurus, classify and organize all entity data contained in the text to be resolved into synonyms to obtain the classified entity data to be trained.
[0113] In this embodiment, the terms included in the preset thesaurus are entity data in the target knowledge graph, and entity data with the same or similar semantics corresponding to the entity data in the target knowledge graph. All entity data in the text to be resolved may be entity data not in the target knowledge graph. Therefore, the purpose of using the thesaurus is to uniformly classify and organize the entity data not in the target knowledge graph in the text to be resolved into entity data in the target knowledge graph, so as to facilitate the subsequent use of the representation learning model to perform entity relation vector representation on all entity data in the text to be resolved.
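A minimal sketch of this synonym classification follows. It assumes a hypothetical thesaurus that maps each knowledge-graph entity to a set of synonyms; the entries, the function names, and the choice to skip entities that the thesaurus does not cover are illustrative assumptions, since the source does not describe the thesaurus format or how uncovered entities are handled.

```python
# Hypothetical thesaurus: canonical knowledge-graph entity -> known synonyms.
THESAURUS = {
    "policyholder": {"applicant", "policy owner"},
    "insurer": {"underwriter", "insurance company"},
}

def build_synonym_index(thesaurus):
    """Invert the thesaurus so every term points at its canonical entity."""
    index = {}
    for canonical, synonyms in thesaurus.items():
        index[canonical] = canonical
        for synonym in synonyms:
            index[synonym] = canonical
    return index

def normalize_entities(entities, index):
    """Group extracted entities under their canonical knowledge-graph entity."""
    grouped = {}
    for entity in entities:
        canonical = index.get(entity.lower())
        if canonical is None:
            continue  # not in the thesaurus; handling is unspecified in the source
        grouped.setdefault(canonical, []).append(entity)
    return grouped

index = build_synonym_index(THESAURUS)
grouped = normalize_entities(
    ["applicant", "underwriter", "policyholder", "broker"], index
)
```

After this step every retained entity is keyed by an entity that exists in the target knowledge graph, which is what allows the representation learning model to assign it a vector.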
[0114] Step 207: Using the entity relation vector containing semantic information as a supervision signal, input the entity data to be trained into the representation learning model to obtain the entity relation vector containing semantic information corresponding to the entity data to be trained.
[0115] By using the entity relation vector containing semantic information as a supervision signal, the entity data to be trained is input into the representation learning model to obtain the entity relation vector containing semantic information corresponding to the entity data to be trained. Using the entity relation vector containing semantic information trained according to the target knowledge graph as a supervision signal facilitates learning supervision when representing the entity relation vector containing semantic information corresponding to the entity data to be trained, thus ensuring the accuracy of the entity relation vector representation.
[0116] Step 208: Based on the entity relation vector containing semantic information corresponding to the entity data to be trained, perform referential resolution on the entity data contained in the text to be resolved.
[0117] With continued reference to Figure 6, Figure 6 is a flowchart of a specific embodiment of step 208 shown in Figure 2, which includes:
[0118] Step 601: Select one entity data to be trained from the entity data to be trained as the comparison entity data;
[0119] Step 602: Obtain the entity relation vector containing semantic information corresponding to the comparison entity data, as the first vector;
[0120] Step 603: Sequentially obtain the entity relation vectors containing semantic information corresponding to the other entity data to be trained in the entity data to be trained, excluding the comparison entity data, and use them as the second vector;
[0121] Step 604: Calculate the similarity between the first vector and the second vector according to a preset similarity algorithm;
[0122] With continued reference to Figure 7, Figure 7 is a flowchart of a specific embodiment of step 604 shown in Figure 6, which includes:
[0123] Step 701: Obtain the first vector corresponding to the comparison entity data, which is fitted by the head entity data, relation representation data and tail entity data;
[0124] Step 702: Obtain the vector data corresponding to the second vector, which is fitted by the head entity data, relation representation data and tail entity data;
[0125] Step 703: Using the cosine similarity algorithm, calculate the similarity between the head entity data, relation representation data, and tail entity data in the first vector and the second vector;
[0126] Step 704: The similarity between the head entity data, relation representation data, and tail entity data in the first vector and the second vector is used as the similarity between the first vector and the second vector.
[0127] Step 605: If the similarity meets a preset similarity threshold, the comparison entity data is determined to refer to the entity data to be trained corresponding to the second vector, and also to the synonyms of that entity data.
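Steps 601 to 605, together with steps 701 to 704, can be sketched as a pairwise comparison over the entity relation vectors. The sketch makes several assumptions that the text does not fix: each vector is stored as a dictionary with `head`, `relation` and `tail` parts, the three cosine similarities are combined by averaging, "meets the threshold" is read as greater than or equal to, and the threshold value 0.9 is arbitrary. All function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def triple_similarity(first, second):
    """Steps 701-704: compare the head, relation and tail parts of two vectors.

    Averaging the three scores is an assumption; the text does not say how they combine.
    """
    parts = ("head", "relation", "tail")
    return sum(cosine_similarity(first[p], second[p]) for p in parts) / len(parts)


def resolve_references(vectors, threshold=0.9):
    """Steps 601-605: return entity pairs whose similarity meets the threshold.

    Each unordered pair is compared once, because the similarity is symmetric.
    """
    names = list(vectors)
    matches = []
    for i, comparison in enumerate(names):        # step 601: pick the comparison entity
        first = vectors[comparison]               # step 602: first vector
        for other in names[i + 1:]:               # step 603: remaining entities in turn
            score = triple_similarity(first, vectors[other])  # step 604
            if score >= threshold:                # step 605
                matches.append((comparison, other))
    return matches
```

Each returned pair marks two pieces of entity data that the sketch treats as referring to the same thing; a production system would tune the threshold on labelled data.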
[0128] This application parses the target knowledge graph to obtain all entity data and the relation representation data between all entity data; inputs all entity data and the relation representation data between them into a preset representation learning model to obtain entity relation vectors containing semantic information; obtains the text to be resolved; inputs the text to be resolved into a preset entity extraction model to extract all entity data contained in it; classifies and organizes all entity data contained in the text to be resolved by synonym to obtain the entity data to be trained; inputs the entity data to be trained into the representation learning model, using the entity relation vectors containing semantic information as supervision signals, to obtain the entity relation vectors containing semantic information corresponding to the entity data to be trained; and performs referential resolution on the entity data contained in the text to be resolved based on those vectors. By using the entity relation vectors that the representation learning model obtains from the target knowledge graph as supervision signals, the entity relation vectors between all entity data in the text to be resolved are determined, which better assists the financial processing system in performing referential resolution of text content.
[0129] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
[0130] Foundational artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing technologies, operating / interactive systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.
[0131] In this embodiment, the target knowledge graph is parsed to obtain all entity data and the relation representation data between all entity data; all entity data and the relation representation data between them are input into a preset representation learning model to obtain entity relation vectors containing semantic information; the text to be resolved is obtained; the text to be resolved is input into a preset entity extraction model to extract all entity data contained in it; all entity data contained in the text to be resolved are classified and organized by synonym to obtain the entity data to be trained; the entity data to be trained is input into the representation learning model, with the entity relation vectors containing semantic information used as supervision signals, to obtain the entity relation vectors containing semantic information corresponding to the entity data to be trained; and referential resolution is performed on the entity data contained in the text to be resolved based on those vectors. By using the entity relation vectors that the representation learning model obtains from the target knowledge graph as supervision signals, the entity relation vectors between all entity data in the text to be resolved are determined, which better assists the financial processing system in performing referential resolution of text content.
[0132] With further reference to Figure 8, as an implementation of the method shown in Figure 2 above, this application provides an embodiment of a text content referencing resolution device. This device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be applied to various electronic devices.
[0133] As shown in Figure 8, the text content referencing resolution device 800 described in this embodiment includes: a knowledge graph acquisition module 801, a knowledge graph parsing module 802, an entity relation vector acquisition module 803, a text acquisition module 804, an entity data extraction module 805, a synonym classification and organization module 806, a supervised learning module 807, and a referential resolution processing module 808. Wherein:
[0134] The knowledge graph acquisition module 801 is used to acquire a target knowledge graph, wherein the target knowledge graph is a knowledge graph constructed based on a preset text knowledge base;
[0135] The knowledge graph parsing module 802 is used to parse the target knowledge graph to obtain all entity data and the relationship representation data between all entity data.
[0136] The entity relation vector acquisition module 803 is used to input all entity data and relation representation data between all entity data into a preset representation learning model, and obtain an entity relation vector containing semantic information based on the output of the representation learning model. The entity relation vector containing semantic information is fitted by head entity data, relation representation data and tail entity data.
[0137] The text acquisition module 804 is used to acquire the text to be resolved;
[0138] The entity data extraction module 805 is used to input the text to be resolved into a preset entity extraction model, and extract all entity data contained in the text to be resolved according to the entity extraction model.
[0139] The synonym classification and organization module 806 is used to classify and organize all entity data contained in the text to be resolved into synonyms according to a preset thesaurus, and obtain the classified and organized entity data to be trained.
[0140] The supervised learning module 807 is used to input the entity data to be trained into the representation learning model using the entity relation vector containing semantic information as a supervision signal, so as to obtain the entity relation vector containing semantic information corresponding to the entity data to be trained.
[0141] The referential resolution processing module 808 is used to perform referential resolution on the entity data contained in the text to be resolved based on the entity relation vector containing semantic information corresponding to the entity data to be trained.
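The way modules 801 to 808 hand data to one another can be sketched as a small container of callables. This is a structural sketch only: the class name `ReferenceResolutionDevice`, the field names, and the argument and return types are hypothetical, and each module is represented by a placeholder function supplied by the caller rather than by a real model.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ReferenceResolutionDevice:
    """Wires modules 801-808 together; every field is a caller-supplied placeholder."""

    acquire_graph: Callable[[], Any]                 # module 801
    parse_graph: Callable[[Any], tuple]              # module 802 -> (entities, relations)
    learn_vectors: Callable[[Any, Any], Any]         # module 803
    acquire_text: Callable[[], str]                  # module 804
    extract_entities: Callable[[str], list]          # module 805
    classify_synonyms: Callable[[list], list]        # module 806
    supervised_vectors: Callable[[list, Any], dict]  # module 807
    resolve: Callable[[dict], Any]                   # module 808

    def run(self):
        """Execute the modules in the order the method embodiment describes."""
        graph = self.acquire_graph()
        entities, relations = self.parse_graph(graph)
        graph_vectors = self.learn_vectors(entities, relations)
        text = self.acquire_text()
        extracted = self.extract_entities(text)
        to_train = self.classify_synonyms(extracted)
        # The graph vectors are passed along to act as the supervision signal.
        text_vectors = self.supervised_vectors(to_train, graph_vectors)
        return self.resolve(text_vectors)
```

Keeping each module behind a plain callable makes it straightforward to swap a stub for a trained model when testing one stage at a time.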
[0142] This application parses the target knowledge graph to obtain all entity data and the relation representation data between all entity data; inputs all entity data and the relation representation data between them into a preset representation learning model to obtain entity relation vectors containing semantic information; obtains the text to be resolved; inputs the text to be resolved into a preset entity extraction model to extract all entity data contained in it; classifies and organizes all entity data contained in the text to be resolved by synonym to obtain the entity data to be trained; inputs the entity data to be trained into the representation learning model, using the entity relation vectors containing semantic information as supervision signals, to obtain the entity relation vectors containing semantic information corresponding to the entity data to be trained; and performs referential resolution on the entity data contained in the text to be resolved based on those vectors. By using the entity relation vectors that the representation learning model obtains from the target knowledge graph as supervision signals, the entity relation vectors between all entity data in the text to be resolved are determined, which better assists the financial processing system in performing referential resolution of text content.
[0143] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing related hardware through computer-readable instructions. These computer-readable instructions can be stored in a computer-readable storage medium. When the program is executed, it can include the processes of the embodiments of the methods described above. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, optical disk, or read-only memory (ROM), or random access memory (RAM).
[0144] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.
[0145] To address the aforementioned technical problems, embodiments of this application also provide a computer device. Please refer to Figure 9, which is a basic structural block diagram of the computer device in this embodiment.
[0146] The computer device 9 includes a memory 9a, a processor 9b, and a network interface 9c that are interconnected via a system bus. It should be noted that only the computer device 9 with components 9a-9c is shown in the figure; however, it should be understood that it is not required to implement all the shown components, and more or fewer components can be implemented alternatively. Those skilled in the art will understand that the computer device described here is a device capable of automatically performing numerical calculations and / or information processing according to pre-set or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.
[0147] The computer device can be a desktop computer, laptop, handheld computer, or cloud server, etc. The computer device can interact with the user via a keyboard, mouse, remote control, touchpad, or voice control.
[0148] The memory 9a includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, disk, optical disk, etc. In some embodiments, the memory 9a may be an internal storage unit of the computer device 9, such as the hard disk or memory of the computer device 9. In other embodiments, the memory 9a may also be an external storage device of the computer device 9, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the computer device 9. Of course, the memory 9a may include both the internal storage unit and its external storage device of the computer device 9. In this embodiment, the memory 9a is typically used to store the operating system and various application software installed on the computer device 9, such as computer-readable instructions for a text content referencing resolution method. In addition, the memory 9a can also be used to temporarily store various types of data that have been output or will be output.
[0149] In some embodiments, the processor 9b may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 9b is typically used to control the overall operation of the computer device 9. In this embodiment, the processor 9b is used to execute computer-readable instructions stored in the memory 9a or to process data, for example, to execute the computer-readable instructions of the text content referencing resolution method.
[0150] The network interface 9c may include a wireless network interface or a wired network interface, which is typically used to establish communication connections between the computer device 9 and other electronic devices.
[0151] The computer device proposed in this embodiment belongs to the field of financial technology and is applied in the scenario of text content referential resolution. This application parses the target knowledge graph to obtain all entity data and the relation representation data between all entity data; inputs all entity data and the relation representation data between them into a preset representation learning model to obtain entity relation vectors containing semantic information; obtains the text to be resolved; inputs the text to be resolved into a preset entity extraction model to extract all entity data contained in it; classifies and organizes all entity data contained in the text to be resolved by synonym to obtain the entity data to be trained; inputs the entity data to be trained into the representation learning model, using the entity relation vectors containing semantic information as supervision signals, to obtain the entity relation vectors containing semantic information corresponding to the entity data to be trained; and performs referential resolution on the entity data contained in the text to be resolved based on those vectors. By using the entity relation vectors that the representation learning model obtains from the target knowledge graph as supervision signals, the entity relation vectors between all entity data in the text to be resolved are determined, which better assists the financial processing system in performing referential resolution of text content.
[0152] This application also provides another embodiment, namely, providing a computer-readable storage medium storing computer-readable instructions that can be executed by a processor to cause the processor to perform the steps of the text content referencing resolution method described above.
[0153] The computer-readable storage medium proposed in this embodiment belongs to the field of financial technology and is applied in the scenario of text content referential resolution. This application parses the target knowledge graph to obtain all entity data and the relation representation data between all entity data; inputs all entity data and the relation representation data between them into a preset representation learning model to obtain entity relation vectors containing semantic information; obtains the text to be resolved; inputs the text to be resolved into a preset entity extraction model to extract all entity data contained in it; classifies and organizes all entity data contained in the text to be resolved by synonym to obtain the entity data to be trained; inputs the entity data to be trained into the representation learning model, using the entity relation vectors containing semantic information as supervision signals, to obtain the entity relation vectors containing semantic information corresponding to the entity data to be trained; and performs referential resolution on the entity data contained in the text to be resolved based on those vectors. By using the entity relation vectors that the representation learning model obtains from the target knowledge graph as supervision signals, the entity relation vectors between all entity data in the text to be resolved are determined, which better assists the financial processing system in performing referential resolution of text content.
[0154] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk), and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of this application.
[0155] Obviously, the embodiments described above are only some embodiments of this application, not all embodiments. The accompanying drawings show preferred embodiments of this application, but do not limit the patent scope of this application. This application can be implemented in many different forms; these embodiments are provided so that the disclosure of this application will be understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent substitutions for some of the technical features. Any equivalent structures made using the content of this application's specification and drawings, directly or indirectly applied to other related technical fields, are similarly within the scope of patent protection of this application.
Claims
1. A text content referencing resolution method, characterized in that it comprises the following steps: obtaining a target knowledge graph; parsing the target knowledge graph to obtain all entity data and the relation representation data between all entity data; inputting all entity data and the relation representation data between all entity data into a preset representation learning model, and obtaining, based on the output of the representation learning model, an entity relation vector containing semantic information, wherein the entity relation vector containing semantic information is fitted from head entity data, relation representation data and tail entity data; obtaining the text to be resolved; inputting the text to be resolved into a preset entity extraction model, and extracting all entity data contained in the text to be resolved according to the entity extraction model; classifying and organizing, according to a preset thesaurus, all entity data contained in the text to be resolved by synonym to obtain the classified and organized entity data to be trained; inputting the entity data to be trained into the representation learning model, using the entity relation vector containing semantic information as a supervision signal, to obtain the entity relation vector containing semantic information corresponding to the entity data to be trained; and performing referential resolution on the entity data contained in the text to be resolved based on the entity relation vector containing semantic information corresponding to the entity data to be trained.
2. The text content referencing resolution method according to claim 1, characterized in that the step of parsing the target knowledge graph to obtain all entity data and the relation representation data between all entity data specifically includes: performing a preliminary analysis of the target knowledge graph to obtain all graph nodes and the connections between them; determining the entity data corresponding to each graph node based on the graph node identifiers and a preset first reference table, wherein the first reference table contains the mapping relationship between the graph node identifiers and the entity data; and determining the relation representation data corresponding to the connections between all graph nodes based on the connection identifiers and a preset second reference table, wherein the second reference table contains the mapping relationship between the connection identifiers and the relation representation data.
3. The text content referencing resolution method according to claim 1, characterized in that, before performing the step of inputting all entity data and the relation representation data between all entity data into a preset representation learning model and obtaining an entity relation vector containing semantic information based on the output of the representation learning model, the method further includes: deploying the target knowledge graph into the representation learning model as a learning reference graph; and initializing the target knowledge graph to obtain the entity location vectors and relationship representation vectors contained in the target knowledge graph, wherein the entity location vectors are obtained based on the location information of all entity data in the target knowledge graph, and the relationship representation vectors are composed of magnitude information and angle information, which are calculated based on the location information of two entity data having a connection relationship in the target knowledge graph.
4. The text content referencing resolution method according to claim 1 or 3, characterized in that the step of inputting all entity data and the relation representation data between all entity data into a preset representation learning model, and obtaining an entity relation vector containing semantic information based on the output of the representation learning model, specifically includes: identifying, by comparison, the entity location vectors corresponding to all the entity data; selecting one entity from all the entity data as the head entity data; randomly selecting one entity from all the entity data as the tail entity data; calculating the magnitude information and angle information between the head entity data and the tail entity data based on the entity location vectors corresponding to the head entity data and the tail entity data respectively; determining, based on the target knowledge graph, whether there is direct relation representation data between the head entity data and the tail entity data; if there is direct relation representation data between the head entity data and the tail entity data, constructing the entity relation vector containing semantic information based on the entity location vector of the head entity data, the entity location vector of the tail entity data, and the relation representation data; if there is no direct relation representation data between the head entity data and the tail entity data, obtaining, according to a preset filtering strategy, the indirect relation representation data contained in the optimal path between the head entity data and the tail entity data, wherein the optimal path is the path identified by the filtering strategy as containing the minimum number of indirect relation representation data; and constructing the entity relation vector containing semantic information based on the entity location vector of the head entity data, the entity location vector of the tail entity data, and the indirect relation representation data contained in the optimal path between the head entity data and the tail entity data.
5. The text content referencing resolution method according to claim 1, characterized in that the entity extraction model is a BERT entity extraction model based on the CRF algorithm, and the step of inputting the text to be resolved into the preset entity extraction model and extracting all entity data contained in the text to be resolved according to the entity extraction model specifically includes: segmenting the text to be resolved according to the word segmentation component in the entity extraction model to obtain a word segmentation result; analyzing the word segmentation result with the part-of-speech analysis component in the entity extraction model to obtain the noun fields in the word segmentation result; and outputting the noun fields according to the output component in the entity extraction model, thereby completing the extraction of all entity data contained in the text to be resolved.
6. The text content referencing resolution method according to claim 1, characterized in that the step of performing referential resolution on the entity data contained in the text to be resolved based on the entity relation vector containing semantic information corresponding to the entity data to be trained specifically includes: selecting one entity data to be trained from the entity data to be trained as the comparison entity data; obtaining the entity relation vector containing semantic information corresponding to the comparison entity data as the first vector; sequentially obtaining the entity relation vectors containing semantic information corresponding to the other entity data to be trained, excluding the comparison entity data, as the second vector; calculating the similarity between the first vector and the second vector according to a preset similarity algorithm; and, if the similarity meets a preset similarity threshold, determining that the comparison entity data refers to the entity data to be trained corresponding to the second vector, and also to the synonyms of that entity data.
7. The text content referencing resolution method according to claim 6, characterized in that the step of calculating the similarity between the first vector and the second vector according to a preset similarity algorithm specifically includes: obtaining the first vector corresponding to the comparison entity data, which is fitted from the head entity data, relation representation data and tail entity data; obtaining the vector data corresponding to the second vector, which is fitted from the head entity data, relation representation data and tail entity data; using the cosine similarity algorithm to calculate the similarity between the head entity data, relation representation data and tail entity data in the first vector and those in the second vector; and using that similarity as the similarity between the first vector and the second vector.
8. A text content referencing resolution device, characterized in that it comprises: a knowledge graph acquisition module, used to acquire a target knowledge graph; a knowledge graph parsing module, used to parse the target knowledge graph to obtain all entity data and the relation representation data between all entity data; an entity relation vector acquisition module, used to input all entity data and the relation representation data between all entity data into a preset representation learning model and to obtain, based on the output of the representation learning model, an entity relation vector containing semantic information, wherein the entity relation vector containing semantic information is fitted from head entity data, relation representation data and tail entity data; a text acquisition module, used to acquire the text to be resolved; an entity data extraction module, used to input the text to be resolved into a preset entity extraction model and to extract all entity data contained in the text to be resolved according to the entity extraction model; a synonym classification and organization module, used to classify and organize all entity data contained in the text to be resolved by synonym according to a preset thesaurus, to obtain the classified and organized entity data to be trained; a supervised learning module, used to input the entity data to be trained into the representation learning model, using the entity relation vector containing semantic information as a supervision signal, to obtain the entity relation vector containing semantic information corresponding to the entity data to be trained; and a referential resolution processing module, used to perform referential resolution on the entity data contained in the text to be resolved based on the entity relation vector containing semantic information corresponding to the entity data to be trained.
9. A computer device, characterized in that it comprises a memory and a processor, wherein the memory stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the steps of the text content referencing resolution method as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the text content referencing resolution method as described in any one of claims 1 to 7.