Knowledge graph construction method, information query method and related device
By expanding and constructing triplet data, the problem of constructing knowledge graphs for semi-structured data is solved, enabling rapid and simplified graph construction and efficient querying.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- MASHANG CONSUMER FINANCE CO LTD
- Filing Date
- 2022-08-02
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to quickly construct knowledge graphs for semi-structured data, and the query process is complex, making it difficult to effectively utilize the information within semi-structured data.
The method determines the number of cells in each row of the semi-structured data, expands the cells of each target row so that all rows have the same number of cells, extracts the text information, constructs triple data, and finally builds a knowledge graph from the triple data. An information query method based on that knowledge graph is also provided.
It simplifies the knowledge graph construction process, enables rapid knowledge graph construction, reduces the workload of domain experts/builders, ensures that information is not lost, and improves the accuracy and efficiency of queries.
Figure CN116383393B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of computer data processing technology, and in particular to a method for constructing a knowledge graph, an information query method, and related devices.
Background Technology
[0002] In real life, much data is presented in a semi-structured format. For example, in consumer finance product data, the detailed data for each financial product is generally stored as semi-structured data in document format.
[0003] In semi-structured data, the steps for querying data differ depending on the channel. To complicate matters further, product-related information may vary with both the channel and the specific service being processed. For example, in the financial sector, the steps for "change bank card" differ depending on the channel (app, mini-program, official account) or the specific service being processed (application, consumption, withdrawal, repayment, etc.). In semi-structured data, this manifests as various complex multi-layered cell formats.
[0004] Therefore, how to quickly extract knowledge from hundreds or thousands of semi-structured documents to assist in the ontology design of the graph is an urgent problem to be solved.
Summary of the Invention
[0005] The main technical problem solved by this invention is to simplify the knowledge graph construction process and quickly construct knowledge graphs.
[0006] To solve the above technical problems, one technical solution adopted by the present invention is: providing a method for constructing a knowledge graph, the method comprising: acquiring semi-structured data, and obtaining a first value corresponding to the number of cells in each row of the semi-structured data; determining the first value with the largest value as the number of target cells; expanding the number of cells in the target row based on the number of target cells, and extracting the text information of all cells after expansion, wherein the row with the first value less than the number of target cells is the target row; obtaining triple data corresponding to the cells in the row based on the text information in the first cell of the row, the text information in the other cells of the row excluding the first cell, and the correspondence between the text information in the first cell of the row and the text information in the other cells of the row, wherein the first cell is the last cell in the row; and constructing the knowledge graph based on the triple data of each row.
[0007] To solve the above-mentioned technical problems, another technical solution adopted by the present invention is to provide an information query method, comprising: obtaining information to be queried; extracting keywords from the information to be queried; and querying the query results corresponding to the information to be queried from a knowledge graph based on the keywords, wherein the knowledge graph is constructed by the construction method described above.
[0008] To solve the above-mentioned technical problems, another technical solution adopted by the present invention is: providing a knowledge graph construction device, comprising: an acquisition module, used to acquire semi-structured data and obtain a first value for each row according to the number of cells in each row of the semi-structured data; a cell determination module, used to determine the first value with the largest value as the number of target cells, wherein the number of target cells is the largest among the number of cells; an expansion module, used to expand the number of cells in the target row based on the number of target cells, and extract the text information of all cells after expansion, wherein the row with the first value less than the number of target cells is the target row; a triplet data determination module, used to obtain triplet data corresponding to the cells in the row based on the text information in the first cell of the row, the text information in other cells of the row excluding the first cell, and the correspondence between the text information in the first cell of the row and the text information in the other cells of the row, wherein the first cell is the last cell in the row; and a knowledge graph construction module, used to construct the knowledge graph based on the triplet data of each row.
[0009] To solve the above-mentioned technical problems, another technical solution adopted by the present invention is to provide an information query device, the device comprising: an information acquisition module for acquiring information to be queried; a keyword extraction module for extracting keywords from the information to be queried; and a query module for querying query results corresponding to the information to be queried from a knowledge graph based on the keywords, wherein the knowledge graph is constructed by any of the above-described construction methods.
[0010] To solve the above-mentioned technical problems, another technical solution adopted by the present invention is: to provide a smart terminal, including a processor and a memory coupled to each other, wherein the memory is used to store program instructions for implementing the method described in any one of the above-mentioned methods; and the processor is used to execute the program instructions stored in the memory.
[0011] To solve the above-mentioned technical problems, another technical solution adopted by the present invention is to provide a storage medium storing a program file, wherein the program file can be executed to implement the method described in any of the above-mentioned methods.
[0012] The beneficial effects of this invention are as follows: Unlike existing technologies, the knowledge graph construction method proposed in this invention first determines the number of cells in each row of the semi-structured data; determines the number of target cells based on the number of cells in each row; expands the number of cells in each row based on the number of target cells, and extracts the text information of all expanded cells; obtains triplet data corresponding to each row of cells based on the text information in the first cell, the text information in other cells besides the first cell, and the correspondence between the text information in the first cell and the text information in the other cells, where the first cell is the last cell in each row; and constructs the knowledge graph based on the triplet data. By expanding the data so that all rows have the same number of cells, then extracting the text information from each cell to construct triplet data, and finally constructing the knowledge graph from the triplet data, this method simplifies the process of building a knowledge graph from semi-structured data.
Attached Figure Description
[0013] Figure 1 is a flowchart of an embodiment of the knowledge graph construction method of the present invention;
[0014] Figures 2 and 3 are schematic diagrams of an embodiment of step S13 in Figure 1;
[0015] Figure 4 is a schematic diagram of an embodiment of step S14 in Figure 1;
[0016] Figure 5 is a schematic diagram of an embodiment of step S15 in Figure 1;
[0017] Figure 6 is a schematic diagram of an embodiment of a knowledge graph;
[0018] Figure 7 is a flowchart of an embodiment of the information query method of the present invention;
[0019] Figure 8 is a schematic diagram of query results based on the knowledge graph of Figure 6;
[0020] Figure 9 is a schematic diagram of the structure of an embodiment of the knowledge graph construction device of the present invention;
[0021] Figure 10 is a schematic diagram of the structure of an embodiment of the information query device of the present invention;
[0022] Figure 11 is a schematic diagram of the structure of an embodiment of the smart terminal of the present invention;
[0023] Figure 12 is a schematic diagram of the structure of the storage medium of the present invention.
Detailed Implementation
[0024] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0025] The present invention will now be described in detail with reference to the accompanying drawings and embodiments.
[0026] Referring to Figure 1, which is a flowchart of a first embodiment of the knowledge graph construction method of this application, the method includes:
[0027] Step S11: Obtain semi-structured data and obtain the first value according to the number of cells in each row of the semi-structured data.
[0028] It's important to note that while semi-structured data possesses a certain degree of structure, its structure is not entirely uniform. For example, when designing an information system, data storage is crucial, and system information is typically stored in a designated relational database. During storage, data is categorized by business function, corresponding tables are designed, and the relevant information is saved to the appropriate tables. For instance, in a business system, to store basic employee information such as employee ID, name, gender, and date of birth, we would create a corresponding `staff` table. However, not all information in the system can be simply represented by a single table field. Consider, for example, storing employee resumes. Unlike basic employee information, resumes vary significantly from employee to employee. Some resumes are simple, including only education details; others are complex, containing information such as work history, marital status, immigration records, residency status, and technical skills. This type of information is what we call semi-structured data.
[0029] This application proposes a method for constructing a knowledge graph based on semi-structured data. First, semi-structured data is obtained. For example, in the field of consumer finance, product-related semi-structured data is mostly presented in tabular form, where one cell corresponds to multiple pieces of knowledge. Therefore, it is necessary to determine the number of cells in each row of the semi-structured data.
[0030] Specifically, if the storage format of the semi-structured data is not a preset format, then the storage format of the semi-structured data is converted to the preset format. In one embodiment, the semi-structured data is generally stored in the form of a Word document. First, the format of the Word document is determined. If the Word document contains a doc format, then the doc format is converted to a docx format document. That is, the preset format is docx format. It should be noted that the save type for Word 2003 and earlier versions is doc; the save type for Word 2007 and later versions is docx. doc documents occupy more space, while docx documents occupy less space, which can save a lot of space. doc documents are slower to access, while docx documents are faster to access. Therefore, converting doc format to docx format can reduce the document's space usage and improve access speed. In one embodiment, a parsing function can be built using a third-party Python library to parse the semi-structured data to determine the number of cells in each row of the semi-structured data.
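As an illustration of the parsing step, the following is a minimal sketch that counts the cells in each table row of a docx file. It is not the parsing function of the embodiment: instead of a third-party library it reads the WordprocessingML markup directly with the Python standard library, and it assumes that every table row is a `w:tr` element whose direct `w:tc` children are its cells. The names `count_cells_per_row` and `make_demo_docx` are hypothetical, and the demo document is built in memory only so that the sketch can be run without a real file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_cells_per_row(docx_bytes: bytes) -> list[int]:
    """Return the 'first value' of every table row: its number of cells."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # Each w:tr element is a table row; its direct w:tc children are cells.
    return [
        len(row.findall(f"{{{W_NS}}}tc")) for row in root.iter(f"{{{W_NS}}}tr")
    ]


def make_demo_docx(rows: list[list[str]]) -> bytes:
    """Hypothetical helper: build a minimal in-memory .docx holding one table."""

    def cell(text: str) -> str:
        return f"<w:tc><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:tc>"

    table = "".join(
        "<w:tr>" + "".join(cell(text) for text in row) + "</w:tr>" for row in rows
    )
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        f"<w:tbl>{table}</w:tbl></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


demo = make_demo_docx([
    ["Product Code", "5103"],
    ["Application Channels", "APP", "AnYiHua"],
])
first_values = count_cells_per_row(demo)
print(first_values)
```

A real Word document may wrap cells in additional elements or nest tables inside cells; this sketch does not handle those cases.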
[0031] For example, parsing the semi-structured data yields the preliminary result shown in Figure 2: the first, second, and third rows (product code) each have 2 cells; the fourth row (promotion name) has 2 cells; the fifth row (launch time) has 2 cells; the sixth row (product attributes) has 2 cells; and the seventh, eighth, and ninth rows (application channel) each have 3 cells. It can be understood that the first value of each row is the number of cells in that row.
[0032] Step S12: Determine the number of target cells based on the largest first value.
[0033] Specifically, the number of target cells is the largest first value among all rows. Based on the number of cells in each row of the semi-structured data, rows 7, 8, and 9 (application channel) have the largest number of cells (i.e., the largest first value), which is 3. Therefore, the number of target cells is 3, while the rows for product code, promotion name, launch time, and product attributes each have 2 cells.
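Step S12 amounts to taking a maximum over the per-row cell counts. A minimal sketch follows, using the counts described for Figure 2; the function name `target_cell_count` is hypothetical. The sketch also lists which rows fall short of the target, since those are the rows that the method goes on to expand.

```python
def target_cell_count(first_values: list[int]) -> int:
    """Step S12: the largest per-row cell count becomes the target count."""
    if not first_values:
        raise ValueError("the table has no rows")
    return max(first_values)


# Cell counts of the nine rows described for Figure 2.
figure_2_counts = [2, 2, 2, 2, 2, 2, 3, 3, 3]
target = target_cell_count(figure_2_counts)

# Rows (1-based) whose count is below the target are the rows to expand.
target_rows = [
    index for index, count in enumerate(figure_2_counts, start=1) if count < target
]
print(target, target_rows)
```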
[0034] Step S13: Expand the number of cells in the target row based on the number of target cells, and extract the text information of all expanded cells. A row whose first value is less than the number of target cells is a target row.
[0035] In one embodiment, if the first value (i.e., the number of cells) of the current row is less than the number of target cells, the current row is determined to be a target row and is expanded so that its number of cells equals the number of target cells. As shown in Figure 2, the number of target cells is 3 and the remaining rows have 2 cells each, so those rows need to be expanded to 3 cells.
[0036] In one embodiment, the text information in the first cell of the corresponding row is used as the text information of the expanded cells of that row. Specifically, as shown in Figure 3, for the first product code row, one cell is added using the text information "5103(4107.4108.4103)" from the first cell (i.e., the last cell). For the second product code row, one cell is added using the text information "Note: If it is an ICBC joint loan generated after March 20, 2019, then in ICRM-Customer Details-Signed Contracts, sub-contract 1591 can be found under the contract number" from the first cell (i.e., the last cell).
[0037] For the promotional name, expand one cell with "Anyihua"; for the launch time, expand one cell with the text information "around the end of 2016" from the first cell (i.e., the last cell); for the product attributes, expand one cell with "cash installment product under revolving credit". After expansion, each row has the same number of cells, which is 3.
[0038] It can be understood that, in the example shown in Figure 3, the text information of the first cell in these rows is "5103(4107.4108.4103)", "Note: If it is an ICBC joint loan generated after March 20, 2019, then you can find sub-contract 1591 under ICRM-Customer Details-Signed Contracts", "AnYiHua", "Around the end of 2016", and "Cash installment product under revolving credit", respectively.
[0039] In one embodiment, after the number of cells is expanded, the text information of all expanded cells is extracted. As shown in Figure 3, the extracted text information is as follows:
[0040] ['Product Code', '5103(4107.4108.4103)', '5103(4107.4108.4103)']
[0041] ['Product Code', 'Note: If it is an ICBC joint loan generated after March 20, 2019, then you can find sub-contract 1591 under ICRM-Customer Details-Signed Contracts', 'Note: If it is an ICBC joint loan generated after March 20, 2019, then you can find sub-contract 1591 under ICRM-Customer Details-Signed Contracts']
[0042] ['Promotional Name', 'An Yi Hua', 'An Yi Hua']
[0043] ['Launch date', 'Around the end of 2016', 'Around the end of 2016']
[0044] ['Product Attributes', 'Cash Installment Products under Revolving Credit', 'Cash Installment Products under Revolving Credit']
[0045] ['Application Channels', 'WeChat Official Accounts', 'AnYiHua, Instant Consumption']
[0046] ['Application Channels', 'APP', 'AnYiHua, MaShang Finance, AnYiHua Express']
[0047] ['Application Channels', 'WeChat Mini Program', 'AnYiHua (launched on March 8, 2018, does not support consumption function)'].
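The expansion in step S13 can be sketched as padding each short row with copies of its last cell's text until every row reaches the target length. This is a minimal sketch that assumes rows are plain lists of strings and that no row is empty; `expand_rows` is a hypothetical name.

```python
def expand_rows(rows: list[list[str]]) -> list[list[str]]:
    """Pad every short row with copies of its last cell's text (step S13).

    Assumes `rows` is non-empty and that every row has at least one cell.
    """
    target = max(len(row) for row in rows)
    return [row + [row[-1]] * (target - len(row)) for row in rows]


rows = [
    ["Product Code", "5103(4107.4108.4103)"],
    ["Promotional Name", "AnYiHua"],
    ["Application Channels", "APP", "AnYiHua, MaShang Finance, AnYiHua Express"],
]
expanded = expand_rows(rows)
for row in expanded:
    print(row)
```

Rows that already have the target number of cells are returned unchanged, so only the target rows gain cells.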
[0048] Because the number of cells was increased in step S13, duplicate information appeared in the extracted text. Therefore, it is necessary to deduplicate the extracted text. The deduplicated text is as follows:
[0049] ['Product Code', '5103(4107.4108.4103)']
[0050] ['Product Code', 'Note: If it is an ICBC joint loan generated after March 20, 2019, then you can find sub-contract 1591 under ICRM-Customer Details-Signed Contracts']
[0051] ['Promotional Name', 'An Yi Hua']
[0052] ['Launch date', 'Around the end of 2016']
[0053] ['Product Attributes', 'Cash Installment Products under Revolving Credit']
[0054] ['Application Channels', 'WeChat Official Accounts', 'AnYiHua, Instant Consumption']
[0055] ['Application Channels', 'APP', 'AnYiHua, MaShang Finance, AnYiHua Express']
[0056] ['Application Channels', 'WeChat Mini Program', 'AnYiHua (launched on March 8, 2018, does not support consumption function)'].
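The deduplication can be sketched as removing repeated strings within a row while preserving the order in which they first appear. The text does not say how deduplication is implemented; the approach below (an insertion-ordered dictionary) and the name `deduplicate_row` are assumptions made for illustration.

```python
def deduplicate_row(row: list[str]) -> list[str]:
    """Drop repeated strings in a row while keeping first-seen order."""
    # dict preserves insertion order, so the keys form the deduplicated row.
    return list(dict.fromkeys(row))


expanded_row = ["Launch date", "Around the end of 2016", "Around the end of 2016"]
print(deduplicate_row(expanded_row))
```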
[0057] Step S14: Based on the text information in the first cell of the row, the text information in the other cells of the row excluding the first cell, and the correspondence between the text information in the first cell of the row and the text information in the other cells of the row, obtain the triplet data corresponding to the cell of the row, where the first cell is the last cell in each row.
[0058] Specifically, for each deduplicated row of text information that contains more than 2 strings, all strings except the last one are merged, with a defined concatenation operator joining them.
[0059] In the text information above, the following rows each contain 3 strings, which is greater than 2:
[0060] ['Application Channels', 'WeChat Official Accounts', 'AnYiHua, Instant Consumption']
[0061] ['Application Channels', 'APP', 'AnYiHua, MaShang Finance, AnYiHua Express']
[0062] ['Application Channels', 'WeChat Mini Program', 'AnYiHua (launched on March 8, 2018, does not support consumption function)'].
[0063] Assuming the defined concatenation operator is "—", using the concatenation operator to join strings will produce the following result:
[0064] ['Application Channels - WeChat Official Accounts', 'AnYiHua, Instant Consumption']
[0065] ['Application Channels - APP', 'AnYiHua, MaShang Finance, AnYiHua Express']
[0066] ['Application Channel - WeChat Mini Program', 'Anyihua (launched on March 8, 2018, does not support consumption function)'].
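The concatenation rule can be sketched as follows: when a row holds more than two strings, every string except the last is joined into a single composite string, and the last string is kept as it is. The connector used here, a hyphen with surrounding spaces, is chosen only to match the printed results above, and `join_leading_cells` is a hypothetical name.

```python
CONNECTOR = " - "  # assumed connector, chosen to match the printed results


def join_leading_cells(row: list[str], connector: str = CONNECTOR) -> list[str]:
    """Merge all strings except the last into one when a row has more than two."""
    if len(row) <= 2:
        return list(row)
    return [connector.join(row[:-1]), row[-1]]


row = ["Application Channels", "APP", "AnYiHua, MaShang Finance, AnYiHua Express"]
print(join_leading_cells(row))
```

After this step every row holds exactly two strings, which is the shape that the later key-value construction relies on.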
[0067] In one embodiment, if the text information in the leading (leftmost) cell of each of a set number of consecutive rows is the same, the cells of those consecutive rows are merged into a single row, and the text information in the cells of the merged row is not repeated. Based on the extracted text information, the leading cell of both the first and second rows contains "Product Code". Merging the first and second rows into a single row therefore yields the following initial merged result:
[0068] ['Product Code', 'Product Code', '5103(4107.4108.4103)', 'Note: If it is an ICBC joint loan generated after March 20, 2019, then you can find sub-contract 1591 under ICRM-Customer Details-Signed Contracts']
[0069] Because the text information in the cells of the merged row must not be duplicated, the repeated 'Product Code' is removed, giving the final merged result:
[0070] ['Product Code', '5103(4107.4108.4103)', 'Note: If it is an ICBC joint loan generated after March 20, 2019, then you can find sub-contract 1591 under ICRM-Customer Details-Signed Contracts']
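The row merging can be sketched with `itertools.groupby`, which groups consecutive items that share a key. The sketch below merges every run of consecutive rows with the same leftmost text, whereas the text speaks of a set number of consecutive rows; treating any run length as mergeable is an assumption, and `merge_same_leading_cell` is a hypothetical name.

```python
from itertools import groupby


def merge_same_leading_cell(rows: list[list[str]]) -> list[list[str]]:
    """Merge consecutive rows whose leftmost cell text is identical."""
    merged = []
    for leading, group in groupby(rows, key=lambda row: row[0]):
        combined = [leading]
        for row in group:
            for text in row[1:]:
                # Skip text already present so the merged row has no repeats.
                if text not in combined:
                    combined.append(text)
        merged.append(combined)
    return merged


rows = [
    ["Product Code", "5103(4107.4108.4103)"],
    ["Product Code", "Note: ICBC joint loan remark"],
    ["Promotional Name", "AnYiHua"],
]
for row in merge_same_leading_cell(rows):
    print(row)
```

The note text is shortened in the demo data for readability.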
[0071] The result obtained after the concatenation and merging described above is shown in Figure 4. Based on the text information in the first cell, the text information in all other cells except the first cell, and the correspondence between the text information in the first cell and the text information in the other cells, the triplet data corresponding to each row of cells is obtained. Specifically, the first cell is the last cell in each row; that is, the triplet data for each row is obtained from the correspondence between the text information in the last cell and the text information in the remaining cells.
[0072] Specifically, as shown in Figure 4, for the first row the generated triplet data is: "Product Code", corresponding to "5103(4107.4108.4103)" and "Note: If it is an ICBC joint loan generated after March 20, 2019, then in ICRM-Customer Details-Signed Contracts, sub-contract 1591 can be found under the contract number". It should be noted that the correspondence represents the attribute value.
[0073] For the second row, the generated triplet data is: "Promotion Name", corresponding to "AnYiHua". For the other rows, triplet data is generated in the same way as for the first and second rows, as shown in Figure 5.
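One way to picture the triplet data is as (head, relation, tail) tuples in which the relation is the attribute-value correspondence. The sketch below is an interpretation rather than the embodiment itself: it assumes that the leftmost string of a processed row is the head, that every remaining string becomes the tail of its own triple, and that the relation label is the literal string "attribute value". `Triple` and `row_to_triples` are hypothetical names.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def row_to_triples(row: list[str], relation: str = "attribute value") -> list[Triple]:
    """Turn one processed row into one triple per remaining string (assumed layout)."""
    head, *tails = row
    return [Triple(head, relation, tail) for tail in tails]


merged_row = ["Product Code", "5103(4107.4108.4103)", "Note: ICBC joint loan remark"]
for triple in row_to_triples(merged_row):
    print(triple)
```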
[0074] Step S15: Construct the knowledge graph based on the triple data of each row.
[0075] After obtaining the triplet data, further clustering can be performed on the triplet data. Specifically, triplet data with the same key in multiple key-value pairs are clustered into one class. Clustering algorithms include, but are not limited to, k-means clustering algorithm and Gaussian mixture model (GMM), and are not specifically limited to any particular algorithm.
[0076] Specifically, each triplet corresponds to a key-value pair, and triplets with the same key in the key-value pair are clustered into one category. The value in the key-value pair is the text information in the first cell of the corresponding row, and the key in the key-value pair is the text information in other cells of the corresponding row besides the first cell. The keys and at least one value corresponding to each key in the same category are respectively treated as entities, and the key-value pair relationships are used as edges to construct the knowledge graph.
[0077] Specifically, the value in the key-value pair is the text information in the first cell, and the key in the key-value pair is the text information in other cells besides the first cell; the key in the same category and at least one value corresponding to the key are respectively regarded as entities, and the correspondence between the key and the value of the key-value pair is regarded as an edge to construct a knowledge graph.
[0078] Taking Figure 6 as an example, the knowledge graph generated from rows 5, 6, and 7 of Figure 5 is shown in Figure 6. Specifically, for row 5, "Application Channel - WeChat Official Account" is the key, and "AnYiHua WeChat Official Account" and "MaShangHuiBuy WeChat Official Account" are the values. Since "Application Channel - WeChat Official Account" includes both "AnYiHua WeChat Official Account" and "MaShangHuiBuy WeChat Official Account", the corresponding relationship is "inclusion", and this relationship is an edge. Connecting "AnYiHua WeChat Official Account" with "Application Channel - WeChat Official Account", and connecting "MaShangHuiBuy WeChat Official Account" with "Application Channel - WeChat Official Account", constructs the knowledge graph corresponding to row 5. For row 6, "Application Channel - APP" is the key, and "AnYiHua APP", "MaShangHui APP", and "AnYiHua Express APP" are the values. Since "Application Channel - APP" includes "AnYiHua APP", "MaShangHui APP", and "AnYiHua Express APP", the corresponding relationship is "inclusion", and this relationship is an edge. Connecting "AnYiHua APP", "MaShangHui APP", and "AnYiHua Express APP" each with "Application Channel - APP" constructs the knowledge graph corresponding to row 6. Row 7 follows the same method and is not elaborated further.
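The graph construction for rows 5 and 6 can be sketched with plain Python containers. In this sketch, exact-match grouping by key stands in for the clustering step (it is not k-means or a Gaussian mixture model), the graph is represented as a set of nodes plus a list of (key, relation, value) edges, and `build_graph` is a hypothetical name; none of these choices are prescribed by the text.

```python
from collections import defaultdict


def build_graph(pairs: list[tuple[str, str]], relation: str = "inclusion"):
    """Group key-value pairs by key, then emit entity nodes and labeled edges."""
    grouped = defaultdict(list)
    for key, value in pairs:
        if value not in grouped[key]:
            grouped[key].append(value)

    nodes: set[str] = set()
    edges: list[tuple[str, str, str]] = []
    for key, values in grouped.items():
        nodes.add(key)  # the key is one entity
        for value in values:
            nodes.add(value)  # each value is another entity
            edges.append((key, relation, value))
    return nodes, edges


pairs = [
    ("Application Channel - WeChat Official Account", "AnYiHua WeChat Official Account"),
    ("Application Channel - WeChat Official Account", "MaShangHuiBuy WeChat Official Account"),
    ("Application Channel - APP", "AnYiHua APP"),
    ("Application Channel - APP", "MaShangHui APP"),
    ("Application Channel - APP", "AnYiHua Express APP"),
]
nodes, edges = build_graph(pairs)
for edge in edges:
    print(edge)
```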
[0079] In one embodiment, the knowledge graph construction method of this application is applied to the construction of a knowledge graph for financial consumption.
[0080] Existing knowledge graph construction methods mostly target structured and unstructured data and lack solutions specifically for semi-structured data. The solution proposed here can process semi-structured data, significantly reducing the summarizing and generalizing work required of domain experts and builders during knowledge graph construction and thereby lowering the difficulty of graph construction. To address the multi-layered cell problem that arises when extracting semi-structured data, this application proposes a key-value pair construction scheme that keeps the process simple and fast without information loss.
[0081] Referring to Figure 7, which is a flowchart of an embodiment of the information query method of the present invention, the method specifically includes:
[0082] Step S21: Obtain the information to be queried.
[0083] Step S22: Extract keywords from the information to be queried;
[0084] Step S23: Based on the keywords, query the query results corresponding to the information to be queried from the knowledge graph, wherein the knowledge graph is constructed by the construction method described in any one of claims 1 to 6.
[0085] Specifically, in one embodiment, assuming a user wants to inquire about the credit limit of AnYiHua, the information to be queried is "What is the credit limit of AnYiHua?". Keywords are then extracted from this information. Specifically, intent recognition and entity recognition are performed on the information to be queried. It should be noted that entity recognition identifies the entities within the query information, while intent recognition identifies the information the user wants to query. Taking "What is the credit limit of AnYiHua?" as an example, the entity is "AnYiHua," and the intent is "credit limit." The entity and intent are identified as keywords, and based on these keywords "AnYiHua" and "credit limit," the query results corresponding to the information to be queried are retrieved from the knowledge graph.
[0086] In another embodiment, assuming a user wants to query the application steps of AnYiHua in Alipay, the information to be queried is "application steps of AnYiHua in Alipay". At this time, the keywords in the information to be queried are extracted to obtain the entity "AnYiHua-Alipay" and the intent "application steps". The entity and intent are determined as keywords, and the query results corresponding to the information to be queried are retrieved from the knowledge graph based on the keywords "AnYiHua-Alipay" and "application steps".
[0087] Referring to Figure 8, for example, if the information to be queried is "AnYiHua application channels" and the keywords are "AnYiHua" and "application channels", the query results based on these keywords are "AnYiHua APP", "AnYiHua WeChat Official Account", "AnYiHua Express APP", and "AnYiHua WeChat Mini Program".
[0088] The knowledge graph in this embodiment is obtained through the construction method shown in Figure 1 above, which is not elaborated further here. The knowledge graph-based query method of this application is simple and fast and returns accurate information.
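The lookup in step S23 can be sketched as a filter over the graph's edges. This sketch assumes that the entity and intent keywords have already been extracted (intent recognition and entity recognition are outside its scope), that the graph is a list of (key, relation, value) edges, and that a case-insensitive substring match is an acceptable stand-in for whatever matching a real system would use. `query_graph` is a hypothetical name.

```python
def query_graph(
    edges: list[tuple[str, str, str]], entity: str, intent: str
) -> list[str]:
    """Return values whose key mentions the intent and whose text mentions the entity."""
    entity_lower, intent_lower = entity.lower(), intent.lower()
    return [
        value
        for key, _relation, value in edges
        if intent_lower in key.lower() and entity_lower in value.lower()
    ]


edges = [
    ("Application Channel - WeChat Official Account", "inclusion", "AnYiHua WeChat Official Account"),
    ("Application Channel - WeChat Official Account", "inclusion", "MaShangHuiBuy WeChat Official Account"),
    ("Application Channel - APP", "inclusion", "AnYiHua APP"),
    ("Application Channel - APP", "inclusion", "MaShangHui APP"),
    ("Application Channel - APP", "inclusion", "AnYiHua Express APP"),
    ("Application Channel - WeChat Mini Program", "inclusion", "AnYiHua WeChat Mini Program"),
]
results = query_graph(edges, entity="AnYiHua", intent="application channel")
print(results)
```

With these edges the sketch returns the same four channels listed for Figure 8, in edge order rather than in the order printed in the text.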
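[0089] Referring to Figure 9, which is a schematic diagram of the structure of an embodiment of the knowledge graph construction device of the present invention, the device includes an acquisition module 31, a cell determination module 32, an expansion module 33, a triplet data determination module 34, and a knowledge graph construction module 35.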
[0090] The acquisition module 31 is used to acquire semi-structured data and obtain the first value of each row according to the number of cells in each row of the semi-structured data. In one embodiment, the acquisition module 31 is further used to convert the storage format of the semi-structured data into a preset format if the storage format of the semi-structured data is not a preset format.
[0091] The cell determination module 32 is used to determine the first value with the largest value as the number of target cells, wherein the number of target cells is the largest value among the number of cells.
[0092] The expansion module 33 is used to expand the number of cells in a target row based on the number of target cells, and to extract the text information of all cells after expansion. A row whose first value is less than the number of target cells is a target row. The expansion module 33 is also used to determine the current row as a target row if the number of cells in the current row is less than the number of target cells, and to expand the target row so that its number of cells equals the number of target cells. The expansion module 33 is also used to determine the text information in the first cell of the corresponding row as the text information in the expanded cells of that row. The expansion module 33 is also used to merge the cells of a set number of consecutive rows into a single row if the text information in the leading (leftmost) cell of each of those rows is the same, ensuring that the text information in the cells of the merged row is not repeated.
[0093] The triplet data determination module 34 is used to obtain the triplet data corresponding to the cell in the row based on the text information in the first cell of the row, the text information in the other cells of the row other than the first cell, and the correspondence between the text information in the first cell of the row and the text information in the other cells of the row. The first cell is the last cell in the row.
[0094] The knowledge graph construction module 35 is used to construct the knowledge graph based on the triple data of each row. Each triple data corresponds to a key-value pair. The knowledge graph construction module 35 is used to cluster triple data with the same key in the key-value pair into one category; wherein, the value in the key-value pair is the text information in the first cell of the corresponding row, and the key in the key-value pair is the text information in other cells of the corresponding row other than the first cell; the key in the same category and at least one value corresponding to the key are respectively treated as entities, and the correspondence between the key and value of the key-value pair is treated as edges to construct the knowledge graph.
[0095] This solution can process semi-structured data, significantly reducing the workload of domain experts / builders in summarizing and generalizing during the knowledge graph construction process, thereby lowering the difficulty of graph construction. Addressing the multi-layered cell problem in the unstructured data extraction process, this application proposes entity concatenation to ensure a simple, fast process without information loss.
[0096] Please refer to Figure 10, which is a schematic diagram of the structure of an embodiment of the information query device of the present invention. The device specifically includes an information acquisition module 41, a keyword extraction module 42, and a query module 43. The information acquisition module 41 is used to acquire information to be queried; the keyword extraction module 42 is used to extract keywords from the information to be queried; and the query module 43 is used to query the knowledge graph, based on the keywords, for the query results corresponding to the information to be queried. The knowledge graph is constructed using the construction method described in any one of claims 1 to 6.
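The three modules can be sketched as a small lookup routine. The patent text does not specify how keywords are extracted, so this sketch substitutes a simple case-insensitive substring match against the segments of the graph keys; the `/` separator, the function names, and the sample graph are hypothetical.

```python
from typing import Dict, List


def extract_keywords(question: str, vocabulary: List[str]) -> List[str]:
    """Return the vocabulary terms that occur in the question (case-insensitive)."""
    text = question.lower()
    return [term for term in vocabulary if term.lower() in text]


def query(graph: Dict[str, List[str]], question: str) -> List[str]:
    """Return the values of the key whose segments best match the question."""
    # Assumption: keys are segments joined by "/", e.g. "service/channel".
    vocabulary = sorted({part for key in graph for part in key.split("/")})
    keywords = set(extract_keywords(question, vocabulary))
    best_key, best_score = None, 0
    for key in graph:
        score = sum(1 for part in key.split("/") if part in keywords)
        if score > best_score:
            best_key, best_score = key, score
    return graph[best_key] if best_key is not None else []


# Hypothetical sample graph: key -> values reachable over its edges.
graph = {
    "Change bank card/App": ["Open the Me tab"],
    "Change bank card/Mini-program": ["Open Settings"],
}
print(query(graph, "How do I change bank card in the app?"))
```

Substring matching is deliberately naive (for example, "app" would also match "application"); a production system would more likely use a tokenizer or an entity recognizer for the keyword step.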
[0097] Please refer to Figure 11, which is a schematic diagram of the structure of an embodiment of the smart terminal of the present invention. The smart terminal includes a memory 52 and a processor 51 connected to each other.
[0098] The memory 52 is used to store program instructions for implementing any of the above methods.
[0099] The processor 51 is used to execute the program instructions stored in the memory 52.
[0100] The processor 51 can also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capabilities. The processor 51 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor can be a microprocessor or any conventional processor.
[0101] The memory 52 can be a memory module, a TF card, or the like, and can store all information in the smart terminal, including raw input data, computer programs, intermediate running results, and final running results. It stores and retrieves information at the location specified by the controller. Memory gives the smart terminal the ability to retain information and is required for its normal operation. Memory in a smart terminal can be classified by purpose into main memory (internal memory) and auxiliary memory (external storage). External storage is usually magnetic media or optical discs, which can store information for a long time. Internal memory refers to the storage components on the motherboard, such as RAM, which hold the data and programs currently being executed; it is used only for temporary storage, and its contents are lost when the power is turned off.
[0102] In the several embodiments provided in this application, it should be understood that the disclosed methods and apparatus can be implemented in other ways. The apparatus implementations described above are merely illustrative. For instance, the division into modules or units is only a logical functional division, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
[0103] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment, depending on actual needs.
[0104] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0105] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, system server, or network device, etc.) or processor to execute all or part of the steps of the methods of the various embodiments of this application.
[0106] Please refer to Figure 12, which is a schematic diagram of the structure of the storage medium of the present invention. The storage medium of this application stores a program file 61 capable of implementing all the above methods. The program file 61 can be stored in the storage medium in the form of a software product, including several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods of each embodiment of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, portable hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk, or terminal devices such as computers, servers, mobile phones, and tablets.
[0107] The above are merely embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.
Claims
1. A method for constructing a knowledge graph, characterized in that the construction method includes: obtaining semi-structured data, and obtaining a first value for each row according to the number of cells in each row of the semi-structured data; determining the largest first value as the number of target cells; expanding the number of cells in a target row based on the number of target cells, and extracting the text information of all cells after expansion, wherein a row whose first value is less than the number of target cells is the target row, the number of cells in the expanded target row is the same as the number of target cells, the text information in the first cell of the row is determined as the text information in the expanded cells of the row, and the first cell is the last cell of each row; obtaining the triplet data corresponding to the cells of the row based on the text information in the first cell of the row, the text information in the other cells of the row excluding the first cell, and the correspondence between the text information in the first cell of the row and the text information in the other cells of the row; and constructing the knowledge graph based on the triplet data of each row.
2. The method for constructing a knowledge graph according to claim 1, characterized in that, before the step of expanding the number of cells in the target row based on the number of target cells, the construction method further includes: if the number of cells in the current row is less than the number of target cells, determining the current row as the target row and expanding the number of cells in the target row.
3. The method for constructing a knowledge graph according to claim 1, characterized in that the construction method further includes: if the text information in the first cell of each of a plurality of consecutive rows is the same, merging the cells of the consecutive rows into the same row, wherein the text information in the cells of the merged row is not repeated.
4. The method for constructing a knowledge graph according to claim 1, characterized in that, before the step of obtaining semi-structured data and obtaining the first value of each row according to the number of cells in each row of the semi-structured data, the method includes: if the storage format of the semi-structured data is not a preset format, converting the storage format of the semi-structured data to the preset format.
5. The method for constructing a knowledge graph according to claim 1, characterized in that the step of constructing the knowledge graph based on the triplet data of each row includes: each piece of triplet data corresponds to a key-value pair, and triplet data with the same key in the key-value pair are clustered into one category, wherein the value in the key-value pair is the text information in the first cell of the corresponding row, and the key in the key-value pair is the text information in the cells of the corresponding row other than the first cell; and constructing the knowledge graph by taking the key of each category and the at least one value corresponding to the key as entities, and the correspondence between the key and the value of the key-value pair as edges.
6. An information query method, characterized in that it includes: acquiring the information to be queried; extracting keywords from the information to be queried; and querying, based on the keywords, the query results corresponding to the information to be queried from a knowledge graph, wherein the knowledge graph is constructed using the construction method described in any one of claims 1 to 5.
7. A knowledge graph construction apparatus, characterized in that the apparatus includes: an acquisition module, used to acquire semi-structured data and obtain a first value for each row according to the number of cells in each row of the semi-structured data; a cell determination module, used to determine the largest first value as the number of target cells; an expansion module, used to expand the number of cells in a target row based on the number of target cells and extract the text information of all cells after expansion, wherein a row whose first value is less than the number of target cells is the target row, the number of cells in the expanded target row is the same as the number of target cells, the text information in the first cell of the row is determined as the text information in the expanded cells of the row, and the first cell is the last cell of each row; a triplet data determination module, used to obtain the triplet data corresponding to the cells of the row based on the text information in the first cell of the row, the text information in the other cells of the row other than the first cell, and the correspondence between the text information in the first cell of the row and the text information in the other cells of the row; and a knowledge graph construction module, used to construct the knowledge graph based on the triplet data of each row.
8. A smart terminal, characterized in that the smart terminal includes a processor and a memory connected to each other, wherein the memory is used to store program instructions for implementing the method for constructing the knowledge graph as described in any one of claims 1 to 5 or the information query method as described in claim 6, and the processor is used to execute the program instructions stored in the memory.
9. A storage medium, characterized in that it stores a program file that can be executed to implement the knowledge graph construction method as described in any one of claims 1 to 5 or the information query method as described in claim 6.
Citation Information
Patent Citations
Information query method and device
CN110502645A
Table information extraction method and device, electronic equipment and storage medium
CN113901214A