A data processing system for acquiring tags

By acquiring target user information, expanding the question text, and using entity recognition and knowledge graphs to generate SQL strings, the problem of low label accuracy in existing technologies is solved, and higher accuracy label generation is achieved.

CN116561388B · Active · Publication Date: 2026-06-12 · ZHEJIANG MEIRI HUDONG NETWORK TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: ZHEJIANG MEIRI HUDONG NETWORK TECH CO LTD
Filing Date: 2023-05-09
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

In existing technologies, when searching the database and generating tags based on search queries, the search is based solely on entities, failing to fully consider the semantics and relationships of the entities, resulting in low tag accuracy.

Method used

By acquiring target user information, expanding the question text, analyzing entity relationships using a pre-defined entity recognition model and knowledge graph, generating SQL strings, extracting a list of specified user IDs from the task dataset, and generating tags using a natural language model.

Benefits of technology

It improves the accuracy of tag acquisition by generating more accurate tags through the analysis of entities and relationships.



Abstract

The application provides a data processing system for obtaining a label, comprising an initial user information list, an initial task data set, a processor, and a memory storing a computer program. When the computer program is executed by the processor, the following steps are implemented: obtaining first user information; obtaining target question text according to the first user information and initial question text; obtaining a target SQL string according to the target question text; obtaining a specified user ID list according to the target SQL string; and obtaining a specified label corresponding to the specified user ID list according to the target SQL string. The initial question text is thus expanded to obtain the target question text, the first entity in the target question text is processed to obtain a third entity, an SQL string is generated according to the relationship between the third entity and the first entity, and the SQL string is analyzed to generate the label intelligently, which improves the precision of the obtained label.

Description

Technical Field

[0001] This invention relates to the field of database processing, and in particular to a data processing system for acquiring tags.

Background Technology

[0002] With the rapid development of computer technology, a single task may involve tens of thousands of task data lists, and the target user cannot readily find the data they need among them. Tags are therefore generated intelligently for the data in the task data lists based on the search query entered by the target user, so that the data the target user needs can be obtained directly through the tags. Most existing methods for obtaining tags input the search query into an information extraction model to obtain the entities in the search query, and then search the database based on those entities to generate tags.

[0003] However, the above method also has the following technical problems:

[0004] In the process of searching the database and generating tags based on the search query, the search is only performed on the entities in the search query, without analyzing the semantics of the entities or the relationships between them. Therefore, it is difficult to retrieve all the data corresponding to the search query based solely on the entities, resulting in low accuracy of the obtained tags.

Summary of the Invention

[0005] To address the aforementioned technical problems, the technical solution adopted by this invention is as follows:

[0006] A data processing system for acquiring tags includes: an initial user information list, an initial task dataset, a processor, and a memory storing a computer program. The initial task dataset includes several initial task data lists, each containing several field names. When the computer program is executed by the processor, the following steps are implemented:

[0007] S100. Based on the target user ID and the initial user information list, obtain the first user information corresponding to the target user ID, wherein the first user information is the initial user information corresponding to the initial user ID that is consistent with the target user ID in the initial user information list.

[0008] S200. Obtain the target question text based on the first user information and the initial question text.

[0009] S300. Based on the target question text, obtain the target SQL string. Step S300 includes the following steps to obtain the target SQL string:

[0010] S301. Input the target question text into the preset entity recognition model to obtain the first entity list $B = \{B_1, \ldots, B_i, \ldots, B_m\}$ corresponding to the target question text and the entity relation list $C = \{C_1, \ldots, C_i, \ldots, C_m\}$ corresponding to $B$, where $C_i = \{C_{i1}, \ldots, C_{ij}, \ldots, C_{in}\}$; $B_i$ is the $i$-th first entity corresponding to the target question text, $i = 1, \ldots, m$, and $m$ is the number of first entities corresponding to the target question text; $C_{ij}$ is the entity relationship between $B_i$ and $D_j$, where $D_j$ is the $j$-th first entity in $B$ other than $B_i$, $j = 1, \ldots, n$.

[0011] S303. Based on the preset knowledge graph list, obtain the second entity list $E = \{E_1, \ldots, E_i, \ldots, E_m\}$ corresponding to $B$, where $E_i$ is the second entity corresponding to $B_i$.

[0012] S305. Based on the initial task dataset, obtain the third entity list $F = \{F_1, \ldots, F_i, \ldots, F_m\}$ corresponding to $E$, where $F_i$ is the third entity corresponding to $E_i$.

[0013] S307. Input C and F into the preset natural language model to obtain the target SQL string.

[0014] S400. Based on the target SQL string, obtain the list of specified user IDs from the initial task dataset.

[0015] S500: Based on the target SQL string, retrieve the specified tags corresponding to the specified user ID list.

[0016] The present invention has at least the following beneficial effects:

[0017] This invention provides a data processing system for acquiring tags, comprising: an initial user information list, an initial task dataset, a processor, and a memory storing a computer program. When the computer program is executed by the processor, it performs the following steps: acquiring first user information; acquiring target question text based on the first user information and the initial question text; acquiring a target SQL string based on the target question text; acquiring a list of specified user IDs based on the target SQL string; and acquiring specified tags corresponding to the list of specified user IDs based on the target SQL string. It can be seen that this invention expands the initial question text to acquire the target question text, processes the first entity in the target question text to acquire the third entity, generates an SQL string based on the relationship between the third entity and the first entity, analyzes the SQL string, and intelligently generates tags, which helps improve the accuracy of tag acquisition.

Attached Figure Description

[0018] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a flowchart of the execution of the computer program in a data processing system for acquiring tags, as provided in an embodiment of the present invention.

Detailed Implementation

[0020] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0021] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or server that includes a series of steps or modules is not necessarily limited to those explicitly listed, but may include other steps or modules not explicitly listed or inherent to such processes, methods, products, or devices.

[0022] Embodiments of the present invention provide a data processing system for acquiring tags, comprising: an initial user information list, an initial task dataset, a processor, and a memory storing a computer program. The initial task dataset includes several initial task data lists, each containing several field names. When the computer program is executed by the processor, it implements the following steps, as shown in Figure 1:

[0023] S100. Based on the target user ID and the initial user information list, obtain the first user information corresponding to the target user ID. The first user information is the initial user information corresponding to the initial user ID that is consistent with the target user ID in the initial user information list. Those skilled in the art know that any method in the prior art for obtaining the initial user ID that is consistent with the target user ID from the initial user information list is within the protection scope of this invention, and will not be described in detail here.

[0024] Specifically, the initial user information in the initial user information list is user information that has been pre-set by those skilled in the art according to actual needs.

[0025] Furthermore, user information includes: user ID, user name, user department, and user title.

[0026] Specifically, the initial task data list is a list of tasks to be processed specified by the user, for example, an employee query task.
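Step S100 amounts to a lookup by ID. The following is a minimal sketch under stated assumptions: the `UserInfo` class, the function name `get_first_user_info`, and the sample records are hypothetical and do not come from the patent; only the four fields follow paragraph [0025].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserInfo:
    # Fields follow paragraph [0025]; the class name is hypothetical.
    user_id: str
    user_name: str
    user_department: str
    user_title: str


def get_first_user_info(target_user_id: str,
                        initial_user_info_list: list[UserInfo]) -> Optional[UserInfo]:
    """S100: return the entry whose initial user ID matches the target user ID."""
    for info in initial_user_info_list:
        if info.user_id == target_user_id:
            return info
    return None  # assumption: no match yields None; the patent does not cover this case


# Hypothetical sample data for illustration only.
initial_user_info_list = [
    UserInfo("u001", "Alice", "Sales", "Manager"),
    UserInfo("u002", "Bob", "Engineering", "Analyst"),
]
first_user_info = get_first_user_info("u002", initial_user_info_list)
```

A linear scan is enough for a sketch; a dictionary keyed by user ID would be the more natural choice for a large list.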

[0027] S200. Obtain the target question text based on the first user information and the initial question text.

[0028] Specifically, the initial question text is the question text input by the target user. It may also be question text obtained from the target user's voice input; as those skilled in the art know, any prior-art method of converting speech into text is within the protection scope of this invention and is not elaborated here.

[0029] Specifically, step S200 includes the following steps:

[0030] S201. Based on the first user information, obtain the first text list A = {A1, A2}, where A1 is the first text and A2 is the key text associated with A1. The first text is any one of the user information other than the user ID and user name in the first user information, and the key text associated with the first text is any one of the user information other than the user ID, user name, and the user information corresponding to the first text in the first user information.

[0031] S203. Input A1, A2 and the initial question text into the preset semantic fusion model to obtain the target question text. As those skilled in the art know, any semantic fusion model in the prior art, for example an N-Gram language model, is within the protection scope of this invention and is not described in detail here.

[0032] The above describes a process where the first text is obtained based on the first user information, the target question text is obtained by combining the first text with the initial question text, and the initial question text is expanded by combining the first user information corresponding to the target user. The obtained target question text is clearer and more accurate. Analyzing and processing the target question text helps to improve the accuracy of tag acquisition.
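Steps S201 and S203 can be sketched as follows. This is an illustration only: the function names and sample values are hypothetical, and the `fuse` function uses plain string concatenation as a stand-in for the preset semantic fusion model, which the patent leaves to the prior art.

```python
def build_first_text_list(first_user_info: dict[str, str]) -> tuple[str, str]:
    """S201: pick A1 and A2 from the fields other than user ID and user name."""
    candidates = [value for key, value in first_user_info.items()
                  if key not in ("user_id", "user_name")]
    # Any field may serve as A1 and any other remaining field as A2;
    # taking the first two in order is an arbitrary choice for this sketch.
    a1, a2 = candidates[0], candidates[1]
    return a1, a2


def fuse(a1: str, a2: str, initial_question_text: str) -> str:
    """S203 stand-in: concatenation, NOT an actual semantic fusion model."""
    return f"[{a1}; {a2}] {initial_question_text}"


# Hypothetical sample data for illustration only.
first_user_info = {
    "user_id": "u002",
    "user_name": "Bob",
    "user_department": "Engineering",
    "user_title": "Analyst",
}
a1, a2 = build_first_text_list(first_user_info)
target_question_text = fuse(a1, a2, "Which employees joined after 2020?")
```

With the four fields listed in paragraph [0025], A1 and A2 are necessarily the department and the title in one order or the other.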

[0033] S300. Obtain the target SQL string based on the target question text.

[0034] Specifically, step S300 includes the following steps:

[0035] S301. Input the target question text into the preset entity recognition model to obtain the first entity list $B = \{B_1, \ldots, B_i, \ldots, B_m\}$ corresponding to the target question text and the entity relation list $C = \{C_1, \ldots, C_i, \ldots, C_m\}$ corresponding to $B$, where $C_i = \{C_{i1}, \ldots, C_{ij}, \ldots, C_{in}\}$; $B_i$ is the $i$-th first entity corresponding to the target question text, $i = 1, \ldots, m$, and $m$ is the number of first entities corresponding to the target question text; $C_{ij}$ is the entity relationship between $B_i$ and $D_j$, where $D_j$ is the $j$-th first entity in $B$ other than $B_i$, $j = 1, \ldots, n$. As those skilled in the art know, any entity recognition model in the prior art that can obtain entities and entity relationships falls within the protection scope of this invention, and will not be elaborated further here.

[0036] Specifically, n=m-1.

[0037] Specifically, each entity relationship belongs to a relation type, such as an equivalence relation or a parallel relation.

[0038] Furthermore, equivalence relations include: greater than, equal to, and less than.

[0039] Furthermore, parallel relations include: and, or.
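The shape of the lists B and C from step S301 can be sketched as a small data structure. The `Relation` enum, the `NONE` placeholder, the function name, and the sample entities are assumptions made for illustration; the patent does not say how a pair with no recognized relationship is represented.

```python
from enum import Enum


class Relation(Enum):
    # Equivalence relations ([0038]) and parallel relations ([0039]).
    GREATER_THAN = "greater than"
    EQUAL_TO = "equal to"
    LESS_THAN = "less than"
    AND = "and"
    OR = "or"
    NONE = "none"  # assumption: placeholder for pairs with no recognized relation


def build_relation_list(entities: list[str],
                        pairwise: dict[tuple[str, str], Relation]) -> list[list[Relation]]:
    """Build C so that C[i][j] relates B_i to D_j, the j-th entity other than B_i."""
    relation_list = []
    for i, b_i in enumerate(entities):
        others = [d for k, d in enumerate(entities) if k != i]  # D_1 .. D_n, n = m - 1
        relation_list.append([pairwise.get((b_i, d_j), Relation.NONE) for d_j in others])
    return relation_list


# Hypothetical output of an entity recognition model.
B = ["age", "30", "department"]
pairwise = {
    ("age", "30"): Relation.GREATER_THAN,
    ("age", "department"): Relation.AND,
}
C = build_relation_list(B, pairwise)
```

Each row of C has m - 1 entries, which matches n = m - 1 in paragraph [0036].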

[0040] S303. Based on the preset knowledge graph list, obtain the second entity list $E = \{E_1, \ldots, E_i, \ldots, E_m\}$ corresponding to $B$, where $E_i$ is the second entity corresponding to $B_i$. The preset knowledge graph list includes several preset knowledge graphs, which are knowledge graphs pre-set by those skilled in the art according to actual needs.

[0041] Specifically, step S303 includes the following steps:

[0042] S3031. Obtain the first intermediate entity list $U = \{U_1, \ldots, U_y, \ldots, U_q\}$ corresponding to the preset knowledge graph list, where $U_y$ is the $y$-th first intermediate entity corresponding to the preset knowledge graph list, $y = 1, \ldots, q$, and $q$ is the number of first intermediate entities corresponding to the preset knowledge graph list. A first intermediate entity is an entity in a preset knowledge graph in the preset knowledge graph list.

[0043] S3033. Input $B_i$ into the preset word vector extraction model to obtain the first entity word vector list $Z_i = \{Z_{i1}, \ldots, Z_{ig}, \ldots, Z_{ih}\}$ corresponding to $B_i$, where $Z_{ig}$ is the $g$-th first entity word vector corresponding to $B_i$, $g = 1, \ldots, h$, and $h$ is the number of first entity word vectors corresponding to $B_i$.

[0044] S3035. Input $U_y$ into the preset word vector extraction model to obtain the second entity word vector list $V_y = \{V_{y1}, \ldots, V_{yg}, \ldots, V_{yh}\}$ corresponding to $U_y$, where $V_{yg}$ is the $g$-th second entity word vector corresponding to $U_y$.

[0045] S3037. According to $Z_{ig}$ and $V_{yg}$, obtain the entity similarity list $XS_i = \{XS_{i1}, \ldots, XS_{iy}, \ldots, XS_{iq}\}$ corresponding to $B_i$, where $XS_{iy}$ is the entity similarity between $B_i$ and $U_y$, and $XS_{iy}$ satisfies the following condition:

[0046] (formula not reproduced in this text)

[0047] S3039. Determine the $U_y$ corresponding to the largest $XS_{iy}$ in $XS_i$ as $E_i$.
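Steps S3037 and S3039 can be illustrated with a short sketch. The condition that defines the entity similarity is not reproduced in this text, so the code below assumes, purely for illustration, that the similarity is the mean cosine similarity over the h paired word vectors; the patent's actual formula may differ. All function names and sample vectors are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def entity_similarity(z_i: list[list[float]], v_y: list[list[float]]) -> float:
    """ASSUMPTION: XS_iy is the mean over g of cos(Z_ig, V_yg)."""
    return sum(cosine(z, v) for z, v in zip(z_i, v_y)) / len(z_i)


def pick_second_entity(z_i: list[list[float]],
                       candidates: dict[str, list[list[float]]]) -> str:
    """S3037-S3039: candidates maps each U_y to V_y; return the U_y with the largest XS_iy."""
    xs_i = {u_y: entity_similarity(z_i, v_y) for u_y, v_y in candidates.items()}
    return max(xs_i, key=xs_i.get)


# Hypothetical word vectors (h = 2, dimension 2) for one first entity B_i.
z_i = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "salary": [[1.0, 0.1], [0.1, 1.0]],
    "birthday": [[0.0, 1.0], [1.0, 0.0]],
}
e_i = pick_second_entity(z_i, candidates)
```

Whatever the exact similarity measure, step S3039 is an argmax over the q candidates, which is what `pick_second_entity` shows.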

[0048] S305. Based on the initial task dataset, obtain the third entity list $F = \{F_1, \ldots, F_i, \ldots, F_m\}$ corresponding to $E$, where $F_i$ is the third entity corresponding to $E_i$.

[0049] Specifically, step S305 includes the following steps:

[0050] S3051. Obtain the second intermediate entity list $DE = \{DE_1, \ldots, DE_k, \ldots, DE_t\}$ corresponding to the initial task dataset, where $DE_k$ is the $k$-th second intermediate entity corresponding to the initial task dataset, $k = 1, \ldots, t$, and $t$ is the number of second intermediate entities corresponding to the initial task dataset. A second intermediate entity is a field name in an initial task data list of the initial task dataset.

[0051] S3053. According to $E_i$ and $DE_k$, obtain $F_i$. As those skilled in the art know, the method of obtaining $F_i$ according to $E_i$ and $DE_k$ is the same as steps S3033-S3039 and will not be repeated here.
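Step S305 reuses the matching of steps S3033-S3039 with field names as the candidates. The sketch below shows that reuse end to end with a toy embedding: `toy_vector` (character-bigram counts) is a hypothetical stand-in for the preset word vector extraction model, one vector per entity is assumed (h = 1), cosine similarity is an assumed similarity measure, and the field names are invented for the example.

```python
import math
from collections import Counter


def toy_vector(text: str) -> Counter:
    """Hypothetical stand-in for the word vector extraction model: bigram counts."""
    return Counter(text[k:k + 2] for k in range(len(text) - 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u)  # a Counter returns 0 for missing keys
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def map_to_field(e_i: str, field_names: list[str]) -> str:
    """S3053: return the field name DE_k most similar to the second entity E_i."""
    return max(field_names, key=lambda de_k: cosine(toy_vector(e_i), toy_vector(de_k)))


# Hypothetical field names from an initial task data list.
DE = ["emp_name", "emp_salary", "dept_id"]
E = ["salary", "name"]
F = [map_to_field(e_i, DE) for e_i in E]
```

Mapping onto actual field names is what ties the recognized entities to columns that a generated SQL string can reference.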

[0052] S307. Input C and F into a preset natural language model to obtain the target SQL string. In some embodiments, the SQL string can be replaced by other DSLs, which can also achieve the technical solution of the present invention. The preset natural language model is a natural language model used to obtain the SQL string. Those skilled in the art know that any natural language model in the prior art that can obtain the SQL string is within the protection scope of the present invention, and will not be described in detail here.

[0053] In the above process, the first entity is obtained from the target question text; the first intermediate entities are obtained from the preset knowledge graphs; the semantics of the first entity and the first intermediate entities are analyzed through their word vectors to obtain the second entity; the third entity is obtained in the same way from the second intermediate entities in the initial task dataset; an SQL string is generated based on the relationship between the third entity and the first entity; and the SQL string is analyzed to generate tags intelligently, which helps improve the accuracy of tag acquisition.

[0054] S400. Obtain a list of specified user IDs from the initial task dataset based on the target SQL string. As those skilled in the art know, any method in the prior art for obtaining a list from a dataset based on an SQL string is within the protection scope of this invention, and will not be described further here.
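Step S400 can be sketched with Python's standard `sqlite3` module. The table layout, column names, sample rows, and the target SQL string below are all hypothetical; the patent does not specify a database engine or schema.

```python
import sqlite3


def get_specified_user_ids(connection: sqlite3.Connection, target_sql: str) -> list[str]:
    """S400: run the target SQL string and collect the first column as user IDs."""
    return [row[0] for row in connection.execute(target_sql)]


# Hypothetical in-memory stand-in for one initial task data list.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE employee (user_id TEXT, emp_age INTEGER, dept TEXT)")
connection.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("u001", 28, "Sales"), ("u002", 35, "Engineering"), ("u003", 41, "Engineering")],
)

# Hypothetical target SQL string as it might come out of step S307.
target_sql = (
    "SELECT user_id FROM employee "
    "WHERE emp_age > 30 AND dept = 'Engineering' ORDER BY user_id"
)
specified_user_ids = get_specified_user_ids(connection, target_sql)
```

Because the SQL string is model-generated, a real system would presumably validate it, or run it with read-only access, before execution; the patent does not discuss this.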

[0055] S500: Based on the target SQL string, retrieve the specified tags corresponding to the specified user ID list.

[0056] Specifically, step S500 includes the following steps:

[0057] S501. Obtain the preset label mapping list $G = \{G_1, \ldots, G_r, \ldots, G_s\}$, where $G_r = (G_{r1}, G_{r2})$; $G_{r1}$ is the preset SQL string in the $r$-th record of the preset label mapping list, $G_{r2}$ is the preset label corresponding to $G_{r1}$, $r = 1, \ldots, s$, and $s$ is the number of records in the preset label mapping list. As those skilled in the art know, the preset SQL strings and the preset labels corresponding to the preset SQL strings in the preset label mapping list are set by those skilled in the art according to actual needs.

[0058] S503. Obtain the first character list $H = \{H_1, \ldots, H_e, \ldots, H_f\}$ corresponding to the target SQL string, where $H_e$ is the $e$-th first character of the target SQL string, $e = 1, \ldots, f$, and $f$ is the number of first characters in the target SQL string. A first character is a character in the target SQL string.

[0059] S505. Obtain the second character list $L_r = \{L_{r1}, \ldots, L_{rx}, \ldots, L_{rp}\}$ corresponding to $G_{r1}$, where $L_{rx}$ is the $x$-th second character corresponding to $G_{r1}$, $x = 1, \ldots, p$, and $p$ is the number of second characters corresponding to $G_{r1}$. A second character is a character in the preset SQL string.

[0060] S507. According to $H_e$ and $L_{rx}$, obtain the string similarity $W_r$ between the target SQL string and $G_{r1}$.

[0061] Specifically, step S507 includes the following steps:

[0062] S5071. Obtain the number $count$ of positions at which $H_e$ equals $L_{rx}$, where $e = x$.

[0063] S5073. When $f \ge p$, $W_r = count / f$.

[0064] S5075. When $f < p$, $W_r = count / p$.

[0065] S509. When $W_r = 1$, determine $G_{r2}$ as the specified label.
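Steps S501 to S509 can be sketched directly from the definitions above. The function names, the sample mapping list, and the handling of two empty strings are assumptions for illustration; the positional comparison and the two-case denominator follow steps S5071 to S5075.

```python
def string_similarity(target_sql: str, preset_sql: str) -> float:
    """S507: position-wise character match count divided by the longer length."""
    f, p = len(target_sql), len(preset_sql)
    if f == 0 and p == 0:
        return 0.0  # assumption: the patent does not define this case
    # S5071: count positions e = x where H_e equals L_rx.
    count = sum(1 for h_e, l_rx in zip(target_sql, preset_sql) if h_e == l_rx)
    # S5073 / S5075: divide by f when f >= p, otherwise by p.
    return count / f if f >= p else count / p


def specified_labels(target_sql: str, label_mapping: list[tuple[str, str]]) -> list[str]:
    """S509: keep the preset labels G_r2 whose preset SQL string G_r1 has W_r = 1."""
    return [label for preset_sql, label in label_mapping
            if string_similarity(target_sql, preset_sql) == 1.0]


# Hypothetical preset label mapping list G of (preset SQL string, preset label) records.
G = [
    ("SELECT user_id FROM employee WHERE emp_age > 30", "over-30 employees"),
    ("SELECT user_id FROM employee WHERE emp_age > 40", "over-40 employees"),
]
target_sql = "SELECT user_id FROM employee WHERE emp_age > 30"
labels = specified_labels(target_sql, G)
```

For example, `string_similarity("SELECT a", "SELECT b")` is 7 / 8 = 0.875, and `string_similarity("abc", "abcd")` is 3 / 4 = 0.75, so neither pair would yield a label under step S509.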

[0066] The above-mentioned method of obtaining the string similarity between the target SQL string and the preset SQL string, comparing the string similarity, and determining the specified tags corresponding to the specified user ID list helps to improve the accuracy of tag acquisition.
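The string similarity of steps S5071 to S5075 can be written in a single expression, which also makes the effect of the threshold in step S509 explicit. The restriction of the count to positions that exist in both strings is a reading of "where e = x", not a statement taken from the patent; the block assumes the `amsmath` package and at least one non-empty string.

```latex
\[
\mathrm{count} = \bigl|\{\, e : 1 \le e \le \min(f, p),\ H_e = L_{re} \,\}\bigr|,
\qquad
W_r =
\begin{cases}
\mathrm{count} / f, & f \ge p,\\
\mathrm{count} / p, & f < p,
\end{cases}
\;=\; \frac{\mathrm{count}}{\max(f, p)}.
\]
\[
\mathrm{count} \le \min(f, p) \le \max(f, p)
\;\Longrightarrow\;
\bigl( W_r = 1 \iff f = p \ \text{and}\ H_e = L_{re} \ \text{for all } e \bigr).
\]
```

Under this reading, the condition in step S509 is met only when the target SQL string and the preset SQL string are identical character by character.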

[0067] This invention provides a data processing system for acquiring tags, comprising: an initial user information list, an initial task dataset, a processor, and a memory storing a computer program. When the computer program is executed by the processor, it performs the following steps: acquiring first user information; acquiring target question text based on the first user information and the initial question text; acquiring a target SQL string based on the target question text; acquiring a list of specified user IDs based on the target SQL string; and acquiring specified tags corresponding to the list of specified user IDs based on the target SQL string. It can be seen that this invention expands the initial question text to acquire the target question text, processes the first entity in the target question text to acquire the third entity, generates an SQL string based on the relationship between the third entity and the first entity, analyzes the SQL string, and intelligently generates tags, which helps improve the accuracy of tag acquisition.

[0068] While specific embodiments of the invention have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustrative purposes only and are not intended to limit the scope of the invention. Those skilled in the art should also understand that various modifications can be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims

1. A data processing system for acquiring tags, characterized in that the system includes: an initial user information list, an initial task dataset, a processor, and a memory storing a computer program; the initial task dataset includes several initial task data lists, each containing several field names; when the computer program is executed by the processor, the following steps are implemented:
S100. Based on the target user ID and the initial user information list, obtain the first user information corresponding to the target user ID, wherein the first user information is the initial user information corresponding to the initial user ID that is consistent with the target user ID in the initial user information list;
S200. Obtain the target question text based on the first user information and the initial question text;
S300. Based on the target question text, obtain the target SQL string, wherein step S300 includes the following steps:
S301. Input the target question text into the preset entity recognition model to obtain the first entity list $B = \{B_1, \ldots, B_i, \ldots, B_m\}$ corresponding to the target question text and the entity relation list $C = \{C_1, \ldots, C_i, \ldots, C_m\}$ corresponding to $B$, where $C_i = \{C_{i1}, \ldots, C_{ij}, \ldots, C_{in}\}$; $B_i$ is the $i$-th first entity corresponding to the target question text, $i = 1, \ldots, m$, and $m$ is the number of first entities corresponding to the target question text; $C_{ij}$ is the entity relationship between $B_i$ and $D_j$, where $D_j$ is the $j$-th first entity in $B$ other than $B_i$, $j = 1, \ldots, n$;
S303. Based on the preset knowledge graph list, obtain the second entity list $E = \{E_1, \ldots, E_i, \ldots, E_m\}$ corresponding to $B$, where $E_i$ is the second entity corresponding to $B_i$; step S303 includes the following steps:
S3031. Obtain the first intermediate entity list $U = \{U_1, \ldots, U_y, \ldots, U_q\}$ corresponding to the preset knowledge graph list, where $U_y$ is the $y$-th first intermediate entity corresponding to the preset knowledge graph list, $y = 1, \ldots, q$, $q$ is the number of first intermediate entities corresponding to the preset knowledge graph list, and a first intermediate entity is an entity in a preset knowledge graph in the preset knowledge graph list;
S3033. Input $B_i$ into the preset word vector extraction model to obtain the first entity word vector list $Z_i = \{Z_{i1}, \ldots, Z_{ig}, \ldots, Z_{ih}\}$ corresponding to $B_i$, where $Z_{ig}$ is the $g$-th first entity word vector corresponding to $B_i$, $g = 1, \ldots, h$, and $h$ is the number of first entity word vectors corresponding to $B_i$;
S3035. Input $U_y$ into the preset word vector extraction model to obtain the second entity word vector list $V_y = \{V_{y1}, \ldots, V_{yg}, \ldots, V_{yh}\}$ corresponding to $U_y$, where $V_{yg}$ is the $g$-th second entity word vector corresponding to $U_y$;
S3037. According to $Z_{ig}$ and $V_{yg}$, obtain the entity similarity list $XS_i = \{XS_{i1}, \ldots, XS_{iy}, \ldots, XS_{iq}\}$ corresponding to $B_i$, where $XS_{iy}$ is the entity similarity between $B_i$ and $U_y$, and $XS_{iy}$ satisfies the following condition: (formula not reproduced in this text);
S3039. Determine the $U_y$ corresponding to the largest $XS_{iy}$ in $XS_i$ as $E_i$;
S305. Based on the initial task dataset, obtain the third entity list $F = \{F_1, \ldots, F_i, \ldots, F_m\}$ corresponding to $E$, where $F_i$ is the third entity corresponding to $E_i$; step S305 includes the following steps:
S3051. Obtain the second intermediate entity list $DE = \{DE_1, \ldots, DE_k, \ldots, DE_t\}$ corresponding to the initial task dataset, where $DE_k$ is the $k$-th second intermediate entity corresponding to the initial task dataset, $k = 1, \ldots, t$, $t$ is the number of second intermediate entities corresponding to the initial task dataset, and a second intermediate entity is a field name in an initial task data list in the initial task dataset;
S3053. According to $E_i$ and $DE_k$, obtain $F_i$;
S307. Input $C$ and $F$ into the preset natural language model to obtain the target SQL string;
S400. Based on the target SQL string, obtain the list of specified user IDs from the initial task dataset;
S500. Based on the target SQL string, retrieve the specified tags corresponding to the specified user ID list.

2. The data processing system for acquiring tags according to claim 1, characterized in that step S200 includes the following steps:
S201. Based on the first user information, obtain the first text list A = {A1, A2}, where A1 is the first text and A2 is the key text associated with A1; the first text is any one item of user information other than the user ID and user name in the first user information, and the key text associated with the first text is any one item of user information other than the user ID, the user name, and the user information corresponding to the first text in the first user information;
S203. Input A1, A2 and the initial question text into the preset semantic fusion model to obtain the target question text.

3. The data processing system for acquiring tags according to claim 1, characterized in that step S500 includes the following steps:
S501. Obtain the preset label mapping list $G = \{G_1, \ldots, G_r, \ldots, G_s\}$, where $G_r = (G_{r1}, G_{r2})$; $G_{r1}$ is the preset SQL string in the $r$-th record of the preset label mapping list, $G_{r2}$ is the preset label corresponding to $G_{r1}$, $r = 1, \ldots, s$, and $s$ is the number of records in the preset label mapping list;
S503. Obtain the first character list $H = \{H_1, \ldots, H_e, \ldots, H_f\}$ corresponding to the target SQL string, where $H_e$ is the $e$-th first character corresponding to the target SQL string, $e = 1, \ldots, f$, $f$ is the number of first characters corresponding to the target SQL string, and a first character is a character in the target SQL string;
S505. Obtain the second character list $L_r = \{L_{r1}, \ldots, L_{rx}, \ldots, L_{rp}\}$ corresponding to $G_{r1}$, where $L_{rx}$ is the $x$-th second character corresponding to $G_{r1}$, $x = 1, \ldots, p$, $p$ is the number of second characters corresponding to $G_{r1}$, and a second character is a character in the preset SQL string;
S507. According to $H_e$ and $L_{rx}$, obtain the string similarity $W_r$ between the target SQL string and $G_{r1}$;
S509. When $W_r = 1$, determine $G_{r2}$ as the specified label.

4. The data processing system for acquiring tags according to claim 3, characterized in that step S507 includes the following steps:
S5071. Obtain the number $count$ of positions at which $H_e$ equals $L_{rx}$, where $e = x$;
S5073. When $f \ge p$, $W_r = count / f$;
S5075. When $f < p$, $W_r = count / p$.

5. The data processing system for acquiring tags according to claim 1, characterized in that the user information includes: user ID, user name, user department, and user title.

6. The data processing system for acquiring tags according to claim 1, characterized in that, n = m - 1.