A method for mapping noun phrases to description logic concepts based on extension

Noun phrases are mapped to concepts of the description logic language EL++ using their extensions and a parsing order derived from the syntax tree. This addresses noun phrase parsing without a training dataset and enables efficient understanding and mapping of complex noun phrases, with improved mapping accuracy and efficiency.

CN115186671B · Active · Publication Date: 2026-06-23 · NANJING UNIV

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: NANJING UNIV
Filing Date: 2022-05-16
Publication Date: 2026-06-23

Application Information

Patent Timeline: 16 May 2022, Application; 23 Jun 2026, Publication (CN115186671B)

IPC: G06F40/30; G06F40/211; G06F40/253; G06F40/268; G06F40/289; G06F40/242; G06N5/022; G06N5/02

AI Tagging

Application Domain: Semantic analysis; Knowledge representation

Technology Topics: Part of speech; Noun phrase


Similar Technology Patents

A method and system for quantitatively evaluating the influence of citations on emotional semantics deconstruction
CN122346544A · Part of speech; Semantics
Event causality detection method fusing lexical and dependency features
CN116796727B · Enhanced Semantic Representation; Mathematical models; Semantic analysis; Conditional random field; Part of speech
A method and device for identifying a table of contents of an article in the financial field
CN116151224B · Part of speech; Directory
A paraphrase sentence generation method based on template sentence enhancement
CN116628133B · Part of speech; Paraphrase
A multi-table data intelligent screening system and method based on cell data association
CN122450979A · Datasheet; Part of speech


AI Technical Summary

Technical Problem

Existing semantic parsing methods require large amounts of training data, and no training datasets exist for noun phrases, so their performance on noun phrases is unsatisfactory. Existing methods also cannot effectively map complex noun phrases with nested relationships to knowledge graphs.

Method used

The extensions of noun phrases are used to map them to concepts of the description logic language EL++. A mapping table of text fragments to knowledge base resources is generated, and the syntax-tree parsing order is combined with a multi-dimensional scoring function to generate high-quality description logic concepts.

Benefits of technology

Without supervision, the method efficiently understands and maps complex noun phrases, improving mapping accuracy and efficiency. In particular, for noun phrases with nested relationships, the quality and efficiency of the generated description logic concepts are significantly improved.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN115186671B_ABST

Abstract

The method for mapping a noun phrase to a description logic concept based on extension works in three stages. First, it enumerates all text fragments of the noun phrase and generates a mapping table from the text fragments to resources in a knowledge base. Next, it generates a parsing order from the word segmentation, part-of-speech tags, and syntax tree of the noun phrase. Finally, following the parsing order and starting from an EL++ concept, it repeatedly refines that concept with basic concepts generated from the indexed resources until all words have been parsed, yielding the description logic concept to which the noun phrase is mapped. By analyzing the syntax tree, the method can automatically process complex noun phrases containing implicit relations and generate high-quality description logic concepts.


Description

Technical Field

[0001] This invention belongs to the field of computer technology and relates to natural language processing and knowledge graph technology. Specifically, it is a method for mapping noun phrases to description logic concepts based on extension.

Background Technology

[0002] Enabling computers to understand natural language has long been a central goal for researchers in natural language processing (NLP). Semantic parsing, which transforms natural language text into a meaning representation that computers can process, is one of the most challenging problems in NLP. Because natural language is complex and ambiguous, the task has attracted considerable attention since its inception. The rise of knowledge graphs has made bridging natural language and knowledge graphs even more important.

[0003] In natural language processing, a noun phrase is a phrase that functions grammatically as a noun. Noun phrases are pervasive in corpora, which makes understanding them important, and a good noun phrase parser can also serve as a component of other NLP tasks. However, current semantic parsing work, including KBQA work based on semantic parsing, typically takes sentences or texts as the unit of study and pays little specific attention to noun phrases. Relational information within noun phrases is often implicit. For example, "American songwriters" can mean "songwriters who were born in the United States" or "songwriters who are citizens of the United States." Humans grasp this easily, but a computer cannot read the birthplace or nationality relation directly from the phrase text. To save the manual effort of labeling training data, some studies use extensions as training data for weakly supervised learning. The extension is the counterpart of the intension: it consists of the things to which the phrase applies. For a question-answering task, the extension is the set of answer entities to the question. In some works, extensions supplement the training data, and statistical indicators computed from extension information serve as training features that help determine implicit relationships during training. However, all of these supervised or weakly supervised semantic parsing methods need a training dataset of some size to train the generative model, and there is currently no authoritative, publicly available supervised training dataset specifically for noun phrases. How to achieve phrase understanding with more lightweight methods is therefore worth exploring.

[0004] On the other hand, some related work has emerged in the task of mapping noun phrases to knowledge graphs using extensions. These works use Wikipedia categories as their research object. Since the set of entities described is easily obtained, these works use statistical indicators to provide features of entities that conform to the descriptions of Wikipedia categories. For example, Cat2Ax utilizes the hierarchical structure of Wikipedia categories to extract matching patterns, and selects the highest-scoring axiom based on a comprehensive evaluation of statistical indicators and lexical scores, thereby generating new triples to complete the knowledge base. Pasca et al. treat complex noun phrases as a combination of a head type and modifiers. They first determine the head of the phrase, then divide the other parts into several modifiers, and use statistical indicators to select the interpretation of the other modifiers given the interpretation of the head. In general, these existing methods treat noun phrases as combinations of modifiers, interpreting them separately and then simply concatenating them, failing to handle complex noun phrases with nested relationships.

[0005] Because noun phrases can be complex, a semantically expressive representation is needed to describe them. Description logic mainly describes the concepts and attributes of an ontology. It provides a convenient representation for knowledge graph construction and is widely used in ontology reasoning. Among description logics, EL++ has polynomial-time reasoning complexity, so it remains reasonably expressive while being relatively lightweight. EL++ concepts can be defined recursively as follows:

[0006]

$$C ::= \top \mid A \mid \{o\} \mid C_1 \sqcap C_2 \mid \exists r.C$$

[0007] where ⊤ is the top concept; A is an atomic concept, i.e., a concept name such as Film; r is an atomic role, i.e., a role name such as basedOn; o is an individual name such as Alice Munro; and C1 and C2 are general concepts. That is, in EL++ a concept C is built from atomic concepts A and atomic roles r using conjunction (C1 ⊓ C2) and existential restriction (∃r.C) as constructors. For ease of reference, concepts of the description logic EL++ are called description logic concepts below.
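To make the grammar concrete, the following Python sketch represents EL++ concepts as a small abstract syntax tree and prints them in the usual notation. The class names (`Top`, `Atomic`, `Nominal`, `And`, `Exists`) are illustrative choices and are not part of the described method. The example concept Film ⊓ ∃basedOn.Work is built only from names that appear elsewhere in this document.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Top:
    """The top concept ⊤."""

    def __str__(self) -> str:
        return "⊤"


@dataclass(frozen=True)
class Atomic:
    """Atomic concept A, i.e. a concept name such as Film."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Nominal:
    """Individual concept {o}, e.g. {Alice_Munro}."""
    individual: str

    def __str__(self) -> str:
        return "{" + self.individual + "}"


@dataclass(frozen=True)
class And:
    """Conjunction C1 ⊓ C2."""
    left: Concept
    right: Concept

    def __str__(self) -> str:
        return f"({self.left} ⊓ {self.right})"


@dataclass(frozen=True)
class Exists:
    """Existential restriction ∃r.C, e.g. ∃basedOn.Work."""
    role: str
    filler: Concept

    def __str__(self) -> str:
        return f"∃{self.role}.{self.filler}"


# Union of the five constructors in the grammar above.
Concept = Union[Top, Atomic, Nominal, And, Exists]
```

For example, `str(And(Atomic("Film"), Exists("basedOn", Atomic("Work"))))` yields `(Film ⊓ ∃basedOn.Work)`. Frozen dataclasses make concepts hashable, so they can be placed in sets or used as dictionary keys when candidates are deduplicated.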

[0008] In conclusion, an effective and efficient method that uses extensions to map phrases to logical forms over a specific knowledge graph is of great significance.

Summary of the Invention

[0009] This invention addresses the problem that existing semantic parsing methods require a large amount of training data. Because no training dataset exists for noun phrases, these methods perform poorly when parsing noun phrases at prediction time. Furthermore, existing methods that use extensions to map noun phrases to knowledge graphs cannot handle complex noun phrases with nested relationships. The purpose of this invention is to provide a method that understands noun phrases quickly and comprehensively through their extensions, specifically a method for mapping noun phrases to EL++ description logic concepts.

[0010] The technical solution of this invention is a method for mapping noun phrases to description logic concepts based on extension. The method uses the extension of a noun phrase to map it to a concept expressed in the description logic language EL++, thereby generating an understanding of the noun phrase with respect to a given knowledge base. It includes the following steps:

[0011] Step 1: Perform word segmentation and lemmatization on the noun phrase. For the segmented word sequence, enumerate all text fragments T, i.e., all N-gram fragments of the noun phrase, together with their corresponding lemmatized text fragments T_lemma. Index each T and T_lemma to resources in the knowledge base, generating a mapping table from text fragments to knowledge base resources;

[0012] Step 2: Tag the parts of speech of the noun phrases and generate a syntax tree. Recursively traverse the entire tree from the top of the tree, and use the traversal order of the leaf nodes, i.e. each word, as the parsing order.

[0013] Step 3: Following the parsing order and starting from an initial EL++ concept, refine that concept step by step with the basic concepts generated from the indexed resources. Parse each parsable word in turn until all words have been parsed, yielding the description logic concept to which the noun phrase is mapped:

[0014] Step 3.1: For the currently parsable word, list all candidate text fragments containing the parsable word;

[0015] Step 3.2: Based on the mapping table obtained in Step 1, index the candidate text fragments to the corresponding resources, and generate candidate refinement operations based on the corresponding resources;

[0016] Step 3.3: Perform consistency screening on the newly generated candidate refinement operations and filter out refinement operations that are inconsistent with the syntax;

[0017] Step 3.4: Use the refinement operation obtained in 3.3 to generate a refined description logic concept for the current parsable word. Score the obtained description logic concepts and select the top k with the highest scores to keep. Then check whether the parsing has been completed, that is, whether the currently parsed parsable word is the last one in the parsing order. If not, proceed to step 3.1 to parse the next parsable word; if yes, proceed to step 3.5.

[0018] The scoring function for description logic concepts is:

[0019] $$S_{score}(NP,C) = w_{sup} \cdot S_{sup}(NP,C) + w_{match} \cdot S_{match}(NP,C) + w_{sim} \cdot S_{sim}(NP,C)$$

[0020] where S_sup is the support score, S_match is the match score, S_sim is the simplicity score, and w_sup, w_match, and w_sim are the corresponding weights.

[0021] The support score S_sup of a description logic concept is defined as the smoothed mean of the supports of the refinement operations used to generate it. For a given noun phrase NP, let NP^I be the set of entities described by the phrase, i.e., its extension. For a concept C, let C^I be the set of entities described by C, and for a basic concept B, let B^I be the set of entities described by B. A refinement operation uses a basic concept B to modify a part A of a concept C, and its support set Set_sup is calculated as follows:

[0022]

[0023] where the first case applies when the part A modified by B describes the extension NP^I itself, and the second applies when the part A modified by B describes a set of entities related to the extension.

[0024] S_sup is calculated by the following formula. Here d ranges over the refinement operations of concept C, and each d contributes the support of its support set Set_sup:

[0025]

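[0026] S_match is defined as the proportion of words in the noun phrase NP that can be matched by concept C, calculated using the following formula:

[0027] $$S_{match}(NP,C) = \frac{\#\{\text{words of } NP \text{ matched by } C\}}{\#\{\text{words of } NP\}}$$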

[0028] S_sim is defined from the number of refinement operations in the concept. The count is negated so that concepts built with fewer refinements score higher:

[0029] $$S_{sim}(C) = -\left|\{\, d \mid d \in C \,\}\right|$$

[0030] Step 3.5: Among the description logic concepts obtained for all words in the parsing order, retain the one with the highest score as the output C_best. C_best is the description logic concept to which the noun phrase is mapped, and the knowledge base uses it for semantic understanding of the noun phrase.

[0031] Compared with the prior art, the beneficial effects of this invention are as follows:

[0032] (1) In existing semantic parsing work, few methods use syntax trees. Some jointly train syntax trees with semantic parsing results, and others constrain the model's decoding process with syntactic information. These are mainly supervised or semi-supervised methods that depend on training datasets, and their parsing results are unsatisfactory. In the absence of training datasets for noun phrases, this invention uses the extension of noun phrases to process complex noun phrases with implicit relations automatically and in a lightweight way, yielding an unsupervised, lightweight algorithm.

[0033] (2) Existing extension-based methods do not analyze the relations within complex noun phrases. The core reason lies in their purpose: existing unsupervised methods that use extension mainly aim to extract triples to supplement the knowledge base, so they do not consider complex phrases. The purpose of this invention is to use the resources of a given knowledge base to understand noun phrases, especially complex noun phrases with nested relationships. The invention uses the grammatical information of noun phrases and constrains grammatical consistency and parsing order. This improves the quality and efficiency of the generated description logic concepts and enables the method to process noun phrases with nested relationships.

[0034] (3) This invention combines extension-based statistical indicators and index-matching indicators in a multi-dimensional scoring method to select high-quality description logic concepts, improving the accuracy of mapping noun phrases to logic concepts. The dataset consisted of 600 randomly selected, manually annotated high-quality Wikipedia categories, split into validation and test sets at a 5:1 ratio. On this dataset, the generated EL++ description logic concepts achieved a complete match rate of 0.53 and a partial match rate of 0.71.

Attached Figure Description

[0035] Figure 1 is a flowchart of the method of the present invention.

Detailed Implementation

[0036] This invention provides a method for mapping noun phrases, via their extensions, to concepts of the description logic EL++. By mapping a noun phrase to an EL++ concept, the method produces an understanding of the phrase with respect to a given knowledge base (DBpedia in this embodiment), enabling computers to better understand noun phrases. The method includes the following steps.

[0037] Step 1: Using natural language processing tools, segment and lemmatize the noun phrase. For the segmented word sequence, enumerate all text fragments T. A text fragment is a contiguous run of words in the noun phrase, so the fragments are exactly its N-grams. The lemmatized text fragment T_lemma serves as a key when the index dictionary is built. It is essentially a text alias and may be the base form of a word; for example, "French" cannot be indexed to the entity "dbr:France", but "France" can. Each text fragment T and its corresponding T_lemma are used to index resources in the given knowledge base, generating a mapping table from text fragments to knowledge base resources. Resources include entities, literals, attributes, and types.
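The enumeration and indexing in Step 1 can be sketched in Python as follows. This is a minimal illustration that rests on three assumptions. First, the index dictionary is a plain `dict` from text to a set of resource identifiers, standing in for the offline DBpedia index; the `toy_index` contents are hypothetical. Second, fragments are keyed by their word span `(i, j)`. Third, a fragment's resources are the union of those found for T and T_lemma.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # half-open word span [i, j)


def enumerate_fragments(words: List[str]) -> List[Span]:
    """Every contiguous word span, i.e. every N-gram of the phrase."""
    n = len(words)
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)]


def build_mapping_table(
    words: List[str],
    lemmas: List[str],
    index: Dict[str, Set[str]],
) -> Dict[Span, Set[str]]:
    """Map each fragment span to the resources indexed by T or by T_lemma."""
    table: Dict[Span, Set[str]] = {}
    for i, j in enumerate_fragments(words):
        t = " ".join(words[i:j])         # text fragment T
        t_lemma = " ".join(lemmas[i:j])  # lemmatized fragment T_lemma
        resources = index.get(t, set()) | index.get(t_lemma, set())
        if resources:                    # keep only fragments that hit the index
            table[(i, j)] = resources
    return table


# Hypothetical toy index; the real dictionary is built offline from DBpedia.
toy_index = {
    "films": {"dbo:Film", "dbr:Film"},
    "film": {"dbo:Film"},
    "based on": {"dbo:basedOn"},
    "work": {"dbo:Work"},
    "Alice Munro": {"dbr:Alice_Munro"},
}
words = ["films", "based", "on", "works", "by", "Alice", "Munro"]
lemmas = ["film", "base", "on", "work", "by", "Alice", "Munro"]
table = build_mapping_table(words, lemmas, toy_index)
```

A seven-word phrase has 7 × 8 / 2 = 28 fragments. With the toy index, only four of them map to resources. Keying by span rather than by text keeps repeated words apart and makes it easy to find later which fragments cover a given word.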

[0038] Step 2: Perform part-of-speech tagging based on the word segmentation of the noun phrases and generate a syntax tree. Recursively traverse the entire tree starting from the top, using the traversal order of the leaf nodes, i.e., each word, as the parsing order, as follows.

[0039] Step 2.1: Use natural language processing tools to generate the syntax tree of the noun phrases;

[0040] Step 2.2: Recursively traverse the entire tree starting from the top of the tree, taking the traversal order of the leaf nodes as the parsing order. In the syntax tree, the leaf nodes are all words, and the generated parsing word order is a certain arrangement order of the words in the phrase. The currently parsable word is the word that should be parsed at the current time. These words are parsed in the parsing order.

[0041] Furthermore, when generating the parsing order, the head of a noun phrase is defined as the last word of the first noun group. A noun group is a sequence of nouns in which the earlier nouns modify the later ones, and all noun groups of the noun phrase are obtained through part-of-speech analysis. For a noun phrase node in the syntax tree, the child node containing the head of the current noun phrase is parsed first, as the new noun phrase node. Next, the child nodes to the left of the head are parsed from right to left, and finally the child nodes to the right of the head are parsed from left to right. That is, parsing proceeds outward from the head, nearest first. For nodes starting with a verb or adverb, the verb or adverb is parsed first. The remaining part is then parsed in the original order, i.e., left to right or right to left as inherited from the parent node. For nodes starting with an adjective, the adjective necessarily modifies the head. The parts other than the adjective are therefore parsed first in the original order of the phrase, and the adjective is parsed last.

[0042] Step 3: For each parsable word, refinement operations are generated from the resources indexed by all of its text fragments, and the corresponding description logic concepts are obtained through these refinement operations. Following the parsing order and starting from the EL++ concept ..., the basic concepts generated from the indexed resources are used to refine the concept step by step. Each parsable word is parsed in turn until all words have been parsed, yielding the description logic concepts to which the noun phrase is mapped, as detailed below.

[0043] Step 3.1: For the currently parsable word, list all candidate text segments containing the word, that is, all text segments containing the parsable word.

[0044] Step 3.2: Based on the resource mapping table obtained in Step 1, index the text fragments to the corresponding resources, and generate all candidate refinement operations based on the corresponding resources.

[0045] Basic concepts in the logic language EL++ take five forms: the individual concept {o}, the atomic concept A, the basic concept form corresponding to a role, and forms containing hidden role names. Resources include entities, literals, attributes, and types. The forms generated depend on the resource type:

- Indexed entities and literals yield forms including {o} and ....
- Indexed types yield forms including A and ....
- Indexed attributes yield the corresponding form.

A refinement operation modifies a part A of a concept C using a basic concept B. Given B and C, all possible refinement operations are generated by enumerating the part A of C that is refined. Specifically:

- For an indexed entity or literal o, the basic concept {o} and all basic concepts containing hidden roles are generated, together with all refinement operations with non-zero support.
- For an indexed type A, the basic concept A and all basic concepts containing hidden roles are generated, together with all refinement operations with non-zero support.
- For an indexed attribute p with corresponding role r, the corresponding basic concepts are generated, together with all refinement operations with non-zero support.

Here, the support of a refinement operation d for the noun phrase NP with extension NP^I is the support of its support set Set_sup(NP, d).
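Enumerating the refined part A can be illustrated with a small Python sketch. The sketch rests on two assumptions that the text does not spell out. First, "modifying part A of C with B" is taken to mean replacing A by A ⊓ B, with ⊤ ⊓ B simplified to B. Second, every sub-concept of C, including the filler of an existential restriction, is treated as a candidate part A. Concepts are encoded as tuples for brevity, and the filtering that keeps only operations with non-zero support is omitted.

```python
from typing import Iterator, List, Tuple

# Illustrative tuple encoding: ("top",), ("atom", A), ("nom", o),
# ("and", C1, C2), ("exists", r, C)
Concept = tuple
Path = Tuple[int, ...]

TOP: Concept = ("top",)


def positions(c: Concept, path: Path = ()) -> Iterator[Path]:
    """Yield the path to every sub-concept of c, in pre-order."""
    yield path
    if c[0] == "and":
        yield from positions(c[1], path + (1,))
        yield from positions(c[2], path + (2,))
    elif c[0] == "exists":
        yield from positions(c[2], path + (2,))


def conjoin(a: Concept, b: Concept) -> Concept:
    """A ⊓ B, simplifying ⊤ ⊓ B to B."""
    return b if a == TOP else ("and", a, b)


def refine_at(c: Concept, path: Path, b: Concept) -> Concept:
    """Replace the sub-concept A found at `path` by A ⊓ B."""
    if not path:
        return conjoin(c, b)
    parts = list(c)
    parts[path[0]] = refine_at(c[path[0]], path[1:], b)
    return tuple(parts)


def refinements(c: Concept, b: Concept) -> List[Concept]:
    """All concepts obtained by modifying one part A of c with basic concept b."""
    return [refine_at(c, p, b) for p in positions(c)]
```

For c = Film ⊓ ∃basedOn.⊤ and b = Work, the four candidate parts are the whole concept, Film, ∃basedOn.⊤, and the filler ⊤. Refining the filler gives Film ⊓ ∃basedOn.Work. In a fuller implementation, each candidate would then be kept only if its support set is non-empty.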

[0046] Step 3.3: Perform consistency screening on the newly generated candidate refinement operations, and filter out refinement operations that are inconsistent with the syntax:

[0047] If the current word to be parsed is the phrase head, refinement operations of the form ... are preferred, where B_atomic is an atomic concept. Conversely, if the current word to be parsed is not the phrase head, refinement operations not of this form are preferred.

[0048] Step 3.4: Use the refinement operation obtained in 3.3 to generate a refined description logic concept for the current parsable word. Score the obtained description logic concept and select the top k with the highest scores to keep. Check whether the parsing has been completed, that is, the parsable word that has been parsed is the last one in the parsing order. If not, proceed to step 3.1 to parse the next word; if yes, proceed to step 3.5.

[0049] The scoring function for description logic concepts is:

[0050] $$S_{score}(NP,C) = w_{sup} \cdot S_{sup}(NP,C) + w_{match} \cdot S_{match}(NP,C) + w_{sim} \cdot S_{sim}(NP,C)$$

[0051] where S_sup is the support score, S_match is the match score, S_sim is the simplicity score, and w_sup, w_match, and w_sim are the corresponding weights.
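A minimal Python sketch of the scoring function follows. S_sup is passed in as a number, because its support-set formula is not reproduced here. S_match is computed as the fraction of phrase words matched by the concept, and S_sim as the negated count of refinement operations. The default weights are those of the embodiment (w_sup = 0.3, w_match = 0.5, w_sim = 0.2). The function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Weights:
    sup: float = 0.3    # w_sup in the embodiment
    match: float = 0.5  # w_match
    sim: float = 0.2    # w_sim


def s_match(num_words: int, matched: Set[int]) -> float:
    """Proportion of the phrase's words (by position) matched by the concept."""
    return len(matched) / num_words


def s_sim(num_refinements: int) -> float:
    """Simplicity: the negated number of refinement operations in the concept."""
    return -float(num_refinements)


def s_score(sup: float, match: float, sim: float, w: Weights = Weights()) -> float:
    """S_score = w_sup * S_sup + w_match * S_match + w_sim * S_sim."""
    return w.sup * sup + w.match * match + w.sim * sim
```

Consider a seven-word phrase with six matched words, S_sup = 0.83, and four refinement operations. For these inputs, `s_score(0.83, s_match(7, {0, 1, 2, 3, 4, 5}), s_sim(4))` evaluates to about -0.1224. Because S_sim is unbounded below, w_sim has to stay small relative to the other weights, or every additional refinement would dominate the score.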

[0052] The support score S_sup of a description logic concept is defined as the mean of the smoothed supports of the refinement operations used to generate it. For a given noun phrase NP, NP^I is the set of entities described by the phrase, i.e., its extension. For a concept C, C^I is the set of entities described by C, and for a basic concept B, B^I is the set of entities described by B. If a refinement operation uses a basic concept B to modify a part A of concept C, its support set is calculated as follows:

[0053]

[0054] where the first case applies when the part A modified by B describes the extension NP^I itself, as in ..., and the second applies when A describes a set of entities related to the extension, as in .... In the latter case, the refined part Work does not describe the extension NP^I itself. Instead, it describes the set of entities to which members of the extension are related by "basedOn".

[0055] S_sup is calculated by the following formula. Here d ranges over the refinement operations that generate C, each contributing the support of its support set Set_sup, and ε is an empirical parameter, typically set to 1.

[0056]

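[0057] S_match is defined as the proportion of words in the phrase that can be matched by the concept, calculated using the following formula:

[0058] $$S_{match}(NP,C) = \frac{\#\{\text{words of } NP \text{ matched by } C\}}{\#\{\text{words of } NP\}}$$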

[0059] S_sim is defined from the number of refinement operations in the concept. The count is negated so that concepts built with fewer refinements score higher:

[0060] $$S_{sim}(C) = -\left|\{\, d \mid d \in C \,\}\right|$$

[0061] Step 3.5: Among the description logic concepts obtained for all words in the parsing order, retain the one with the highest score S_score as the output C_best. C_best is the description logic concept to which the noun phrase is mapped, and the knowledge base uses it for semantic understanding of the noun phrase.
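Steps 3.1 to 3.5 can be read as a beam search over concepts. The Python sketch below shows that control flow under several explicit assumptions:

- `expand(concept, word)` is a hypothetical callback standing in for Steps 3.1 to 3.3 (candidate fragments, indexed resources, refinement operations, and consistency screening).
- `score` stands in for S_score.
- A word that yields no candidates leaves the beam unchanged.
- The best concept seen at any point is returned as C_best.

```python
import heapq
from typing import Callable, Iterable, List, Sequence, TypeVar

C = TypeVar("C")


def map_phrase(
    order: Sequence[str],
    initial: C,
    expand: Callable[[C, str], Iterable[C]],
    score: Callable[[C], float],
    k: int = 5,
) -> C:
    """Refine concepts word by word, keep the top k (Step 3.4), return C_best (Step 3.5)."""
    beam: List[C] = [initial]
    best = initial
    for word in order:
        # Steps 3.1-3.3 (delegated to `expand`): refine every concept in the beam.
        candidates = [refined for c in beam for refined in expand(c, word)]
        if not candidates:
            continue  # assumption: nothing to refine for this word
        # Step 3.4: keep the k highest-scoring concepts.
        beam = heapq.nlargest(k, candidates, key=score)
        # Step 3.5: remember the best concept seen so far.
        best = max([best, *beam], key=score)
    return best
```

Tracking `best` across iterations follows the embodiment's Step 3.5, which selects the best of all concepts obtained so far rather than only those in the final beam. The default `k=5` matches the embodiment's setting.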

[0062] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments. In this embodiment, the weights are set to w_sup = 0.3, w_match = 0.5, and w_sim = 0.2, k is set to 5, and DBpedia version 2016-10 is used as the knowledge base.

[0063] Example

[0064] The input noun phrase is "Films based on works by Alice Munro", and the set of entities it describes, {dbr:Away_from_Her, dbr:Edge_of_Madness, ...}, is used as its extension. The correct output is: ... The present invention is described in further detail below using this example, so that those skilled in the art can implement it from the description.

[0065] With reference to Figure 1, the present invention specifically includes the following steps:

[0066] Step 1: Enumerate all text fragments of noun phrases and generate a mapping table from text fragments to resources in the knowledge base, as follows:

[0067] Using natural language processing tools, the noun phrase is segmented and lemmatized, yielding the word sequence "[films, based, on, works, by, Alice, Munro]" and the lemma sequence "[film, base, on, work, by, Alice, Munro]". For the segmented word sequence, all text fragments and their corresponding lemmatized fragments are enumerated. Each text fragment T and its lemmatized counterpart T_lemma are used to look up the corresponding knowledge base resources, including entities, literals, attributes, and types. The index dictionary is built offline from DBpedia anchor texts, label attribute values, and redirects, and is stored as <natural language text, resource> key-value pairs for fast lookup during indexing.

[0068] Some of the indexed text fragments and their corresponding attributes are shown in Table 1.

[0069] Table 1

[0070]

[0071]

[0072] Step 2: Based on the word segmentation, part-of-speech tagging, and syntax tree of the noun phrases, generate the parsing order as follows:

[0073] Step 2.1: Use natural language processing tools to generate the syntax tree of the noun phrase. The syntax tree for “Films based on works by Alice Munro” is “(TOP(NP(NP(_Films))(VP(_based)(PP(_on)(NP(NP(_works))(PP(_by)(NP(_Alice)(_Munro))))))))”

[0074] Step 2.2: Recursively traverse the entire tree starting from the top, using the traversal order of the leaf nodes (i.e., each word) as the parsing order.

[0075] The head of a noun phrase is defined as the last word of the first noun group. For noun phrase nodes, the child node containing the current noun phrase head is parsed first as the new noun phrase node. Then, the child nodes to the left of the head are parsed from right to left, and finally, the child nodes to the right of the head are parsed from left to right. That is, the parsing order is from near to far, starting from the head. For nodes starting with a verb or adverb, the verb or adverb is parsed first, and then the remaining parts are parsed in the original order. For nodes starting with an adjective, since the adjective is necessarily a modifier of the head, the other parts are parsed first in the original phrase order, and the adjective is parsed last.

[0076] For "Films based on works by Alice Munro", the head "films" is processed first, followed by the part to its right, from left to right. Since the first word of that node, "based", is a verb, "based" is processed first, followed by "on" in the original order. A new noun phrase, "works by Alice Munro", then appears. Its head "works" is processed first, then "by", and finally the new noun phrase "Alice Munro", whose head "Munro" precedes "Alice". The parsing order is therefore "films" "based" "on" "works" "by" "Munro" "Alice".
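The parsing-order rules of Step 2.2 can be sketched in Python and checked against this example. The tree encoding (`Leaf`, `Node`) is hypothetical. The part-of-speech tags are assumed to be Penn Treebank tags (NNS, VBN, IN, NNP), because the bracketed tree above omits them. Direction inheritance for verb- and adjective-initial nodes follows the description in [0041].

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    word: str
    pos: str  # part-of-speech tag (Penn Treebank tags assumed)


@dataclass
class Node:
    label: str  # phrase label such as NP, VP, PP
    children: List["Tree"]


Tree = Union[Leaf, Node]
NOUN_TAGS = ("NN", "NNS", "NNP", "NNPS")


def leaves(t: Tree) -> List[Leaf]:
    if isinstance(t, Leaf):
        return [t]
    return [leaf for child in t.children for leaf in leaves(child)]


def head(t: Tree) -> Optional[Leaf]:
    """Head = last word of the first noun group (first run of consecutive nouns)."""
    group: List[Leaf] = []
    for leaf in leaves(t):
        if leaf.pos in NOUN_TAGS:
            group.append(leaf)
        elif group:
            break
    return group[-1] if group else None


def parse_order(t: Tree, rtl: bool = False) -> List[str]:
    """Words under t in parsing order; rtl is the direction inherited from the parent."""
    if isinstance(t, Leaf):
        return [t.word]
    kids = t.children
    h = head(t) if t.label == "NP" else None
    if h is not None:
        # NP rule: head child first, left siblings right-to-left, right siblings left-to-right.
        i = next(n for n, k in enumerate(kids) if any(x is h for x in leaves(k)))
        seq = parse_order(kids[i])
        for k in reversed(kids[:i]):
            seq += parse_order(k, rtl=True)
        for k in kids[i + 1:]:
            seq += parse_order(k)
        return seq

    def visit(ks: List[Tree]) -> List[str]:
        # Parse siblings in the direction inherited from the parent.
        ordered = list(reversed(ks)) if rtl else ks
        return [w for k in ordered for w in parse_order(k, rtl)]

    first_pos = leaves(kids[0])[0].pos
    if first_pos.startswith(("VB", "RB")):  # verb/adverb first, then the rest
        return parse_order(kids[0], rtl) + visit(kids[1:])
    if first_pos.startswith("JJ"):  # adjective parsed last
        return visit(kids[1:]) + parse_order(kids[0], rtl)
    return visit(kids)


tree = Node("TOP", [
    Node("NP", [
        Node("NP", [Leaf("Films", "NNS")]),
        Node("VP", [
            Leaf("based", "VBN"),
            Node("PP", [
                Leaf("on", "IN"),
                Node("NP", [
                    Node("NP", [Leaf("works", "NNS")]),
                    Node("PP", [
                        Leaf("by", "IN"),
                        Node("NP", [Leaf("Alice", "NNP"), Leaf("Munro", "NNP")]),
                    ]),
                ]),
            ]),
        ]),
    ]),
])
print(parse_order(tree))  # ['Films', 'based', 'on', 'works', 'by', 'Munro', 'Alice']
```

The head is compared by object identity (`is`) rather than by value, so a word that occurs twice in a phrase still locates the correct child.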

[0077] Step 3: Following the parsing order and starting from an initial EL++ concept, refine that concept step by step with the basic concepts generated from the indexed resources until all words are parsed, yielding the description logic concepts to which the noun phrase is mapped, as follows:

[0078] Step 3.1: For the currently parsable word, generate all candidate text fragments containing it. For example, for the parsable word "films", the candidate fragments include "films", "films based", "films based on", and so on.
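Listing the candidate fragments of Step 3.1 amounts to taking every contiguous span that covers the word's position. A minimal Python sketch (the function name is hypothetical):

```python
from typing import List


def candidate_fragments(words: List[str], idx: int) -> List[str]:
    """All contiguous fragments words[i:j] that contain position idx."""
    n = len(words)
    return [
        " ".join(words[i:j])
        for i in range(idx + 1)         # start at or before idx
        for j in range(idx + 1, n + 1)  # end after idx
    ]


words = ["films", "based", "on", "works", "by", "Alice", "Munro"]
print(candidate_fragments(words, 0)[:3])  # ['films', 'films based', 'films based on']
```

A word at position idx in an n-word phrase has (idx + 1)(n - idx) such fragments. For "works" (idx = 3, n = 7), that is 16 fragments.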

[0079] Step 3.2: Based on the resource index obtained in Step 1, look up the resources corresponding to each text fragment and generate all candidate refinement operations from them. For the text fragment "films", the indexed resources include the type "dbo:Film", the entity "dbr:Film", the attribute "dbo:openingFilm", and so on. These yield basic concepts such as Film and {Film}, which are used to refine the known concepts and generate refinement operations such as ..., and so on.

[0080] Step 3.3: Perform consistency screening on the newly generated candidate refinement operations, filtering out those inconsistent with the syntax. For the current head "films", operations such as ... are filtered out, while ... are kept.

[0081] Step 3.4: Sort the refined concepts and select the top k with the highest scores. For each concept, check if it has been fully analyzed. If not, proceed to step 3.1; if so, proceed to step 3.5.

[0082] When the only concept is Film, parsing is not yet complete, so the method returns directly to step 3.1 to search for further refinement operations. After several iterations, the candidate refined concepts include ..., and so on. Their scores are then calculated. For example, for ..., the match score is 6/7 = 0.857, the support score is 0.83, and the simplicity score is -4, giving a total score of -0.1225. The two highest-scoring concepts, ... and ..., are retained. Parsing is now complete; proceed to step 3.5.
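The reported total can be checked by substituting the embodiment's weights (w_sup = 0.3, w_match = 0.5, w_sim = 0.2) into S_score:

$$
\begin{aligned}
S_{score} &= 0.3 \times 0.83 + 0.5 \times 0.857 + 0.2 \times (-4) \\
          &= 0.249 + 0.4285 - 0.8 = -0.1225
\end{aligned}
$$

With the unrounded match score 6/7, the total is instead 0.249 + 0.4286 - 0.8 ≈ -0.1224. The reported -0.1225 therefore reflects rounding S_match to 0.857.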

[0083] Step 3.5: Of all the concepts obtained so far, the one with the highest score is retained as the output C_best. In this embodiment, the highest-scoring output is ...

[0084] Compared with the work of Cat2Ax and Pasca et al. (i.e. HM decompose*), the present invention has superior accuracy and can provide more complete and higher quality description logic concept mapping results, as shown in Table 2.

[0085] Table 2

[0086]
Method | Partial match rate | Perfect match rate
This invention | 0.71 | 0.53
Cat2Ax | 0.42 | 0.21
HM decompose* | 0.36 | 0.29

Claims

1. A method for mapping noun phrases to description logic concepts based on extension, characterized in that noun phrases are mapped to concepts expressed in the description logic language EL++, thereby producing an understanding of the noun phrases with respect to a given knowledge base, the method comprising the following steps:

Step 1: Perform word segmentation and lemmatization on the noun phrase. For the segmented word sequence, enumerate all text fragments T, i.e., all n-gram fragments of the noun phrase, together with their corresponding lemmatized text fragments T_lemma. Index these text fragments to resources in the knowledge base and generate a mapping table between text fragments and knowledge base resources.

Step 2: Perform part-of-speech tagging on the segmented noun phrase and generate a syntax tree. Recursively traverse the whole tree starting from the root, and take the order in which the leaf nodes, i.e., the individual words, are visited as the parsing order.

Step 3: Following the parsing order, start from the EL++ basic concepts generated from the indexed resources and refine them step by step. Each parsable word is parsed in turn until all words have been parsed, yielding the description logic concept mapped to the noun phrase:

Step 3.1: For the current parsable word, list all candidate text fragments that contain it.

Step 3.2: Using the mapping table obtained in Step 1, index the candidate text fragments to their corresponding resources and generate candidate refinement operations from those resources, as follows. The basic concepts in this definition take five forms within EL++ description logic concepts, built from individuals, atomic concepts, and roles, including hidden role names. For indexed entities and literals, the corresponding individual-based forms are generated; for indexed types, the corresponding atomic-concept forms are generated; for indexed attributes, the corresponding role-based form is generated. A refinement operation ρ(C, B) is defined as modifying a part A of a concept C with a basic concept B. For a known basic concept B and a known concept C, all possible refinement operations are generated by enumerating the parts of C that B can modify. Accordingly, for an indexed entity or literal, the corresponding basic concept forms, together with all basic concept forms that contain hidden role names, are generated, along with all refinement operations with non-zero support; for an indexed type, the corresponding atomic concept, together with all basic concept forms that contain hidden role names, is generated, along with all refinement operations with non-zero support; for an indexed attribute p, the corresponding role is used to generate the corresponding basic concept form, along with all refinement operations with non-zero support.

Step 3.3: Perform consistency screening on the newly generated candidate refinement operations and filter out those that are inconsistent with the syntax.

Step 3.4: Apply the refinement operations retained in Step 3.3 to generate refined description logic concepts for the current parsable word. Score the resulting description logic concepts and keep the k concepts with the highest scores. Then check whether parsing is complete, i.e., whether the current parsable word is the last one in the parsing order. If not, return to Step 3.1 to parse the next parsable word; if so, proceed to Step 3.5.

The scoring function for a description logic concept C combines a support score, a match-rate score, and a simplicity score, each multiplied by its corresponding weight.

The support score is defined as the smoothed mean of the support levels of the refinement operations applied while generating the concept. For a given noun phrase NP and refinement operation ρ(C, B), let E(NP) denote the set of entities described by NP, i.e., the extension of the phrase; E(C) the set of entities described by concept C; and E(B) the set of entities described by basic concept B. The refinement operation ρ(C, B) modifies part A of concept C with basic concept B, and its support set is computed from two quantities: the extension described by part A after modification by B, and the set of entities that relates this extension to the enclosing extension. The support score is then the mean, over every refinement operation applied to C, of the smoothed support level of that operation's support set, where the smoothing parameter is an empirical constant set to 1.

The match-rate score is defined as the proportion of words in the noun phrase NP that can be matched by the concept C.

The simplicity score is defined in terms of the number of refinement operations contained in the concept.

Step 3.5: Among the description logic concepts obtained after all words have been parsed in the parsing order, retain the one with the highest score as the output C_best, i.e., the description logic concept mapped to the noun phrase, which the knowledge base uses for semantic understanding of the noun phrase.
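The control flow of Steps 1 and 3 can be sketched in Python as a beam search over the parsing order. This is a minimal illustration under stated assumptions, not the claimed implementation: concepts are plain strings, the knowledge-base `lookup` and the `refine` callback (standing in for the generation and consistency screening of Steps 3.2 and 3.3) are hypothetical, the search starts from the top concept ⊤, words with no indexed fragment leave the beam unchanged, and `weighted_score` assumes the three sub-scores combine as a weighted sum.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) word positions, end exclusive


def enumerate_fragments(n_words: int) -> List[Span]:
    """Step 1: every contiguous n-gram of a phrase with n_words words."""
    return [(i, j) for i in range(n_words) for j in range(i + 1, n_words + 1)]


def build_mapping_table(words: List[str], lemmas: List[str],
                        lookup: Callable[[str], Set[str]]) -> Dict[Span, Set[str]]:
    """Step 1: index each fragment (surface and lemmatized form) to KB resources.
    `lookup` is a hypothetical stand-in for the knowledge-base index."""
    table: Dict[Span, Set[str]] = {}
    for i, j in enumerate_fragments(len(words)):
        resources = lookup(" ".join(words[i:j])) | lookup(" ".join(lemmas[i:j]))
        if resources:
            table[(i, j)] = resources
    return table


def candidate_fragments(table: Dict[Span, Set[str]], pos: int) -> List[Span]:
    """Step 3.1: indexed fragments that contain the word at position `pos`."""
    return [span for span in table if span[0] <= pos < span[1]]


def match_rate(matched_positions: Set[int], n_words: int) -> float:
    """Share of the phrase's words that a concept matches."""
    return len(matched_positions) / n_words


def weighted_score(support: float, match: float, simplicity: float,
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """ASSUMPTION: the three sub-scores are combined as a weighted sum."""
    w_sup, w_match, w_simp = weights
    return w_sup * support + w_match * match + w_simp * simplicity


def map_noun_phrase(order: Iterable[int], table: Dict[Span, Set[str]],
                    refine: Callable[[str, str, int], List[str]],
                    score: Callable[[str], float], k: int = 3, top: str = "⊤") -> str:
    """Steps 3.1 to 3.5 as a beam search over the parsing order.
    `refine(concept, resource, pos)` is a hypothetical callback returning the refined
    concepts that survive generation and consistency screening (Steps 3.2 and 3.3)."""
    beam = [top]
    for pos in order:
        refined = [new
                   for concept in beam
                   for span in candidate_fragments(table, pos)
                   for res in sorted(table[span])
                   for new in refine(concept, res, pos)]
        if refined:  # ASSUMPTION: a word with no usable fragment leaves the beam as is
            beam = sorted(refined, key=score, reverse=True)[:k]  # Step 3.4: keep top k
    return max(beam, key=score)  # Step 3.5: highest-scoring concept
```

With a toy index in which "American" maps to USA and the lemma "football player" maps to FootballPlayer, the parsing order [2, 1, 0], and a `refine` callback that simply conjoins unseen resources, `map_noun_phrase` returns the toy concept "FootballPlayer ⊓ USA".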

2. The method for mapping noun phrases to description logic concepts based on extension according to claim 1, characterized in that, when generating the parsing order, all noun phrases contained in the noun phrase are obtained through part-of-speech analysis, and the head of the noun phrase is defined as the last word of the first noun phrase. For a noun phrase node in the syntax tree, the child node containing the head of the current noun phrase is parsed first, as a new noun phrase node; then the child nodes to the left of the head are parsed from right to left, and finally the child nodes to the right of the head are parsed from left to right, so that parsing proceeds from the head outward to the farthest word. For a node that begins with a verb or adverb, the verb or adverb is parsed first, and the remaining part is then parsed in the same direction as its parent node (left to right or right to left). For a node that begins with an adjective, the parts other than the adjective are parsed in the parent node's direction, and the adjective is parsed last.
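The ordering rules of claim 2 can be sketched as a recursive traversal. The `Node` class, the Penn Treebank-style labels (NP, PP, VB*, RB*, JJ*), and the reading of "the first noun phrase" as the first NP in the subtree that contains no nested NP are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                                    # phrase label ("NP", "PP", ...) or POS tag on leaves
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                    # set only on leaf nodes

    def leaves(self) -> List["Node"]:
        if self.word is not None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def first_base_np(node: Node) -> Optional[Node]:
    """ASSUMPTION: "the first noun phrase" is the first NP, left to right,
    that contains no nested NP."""
    if node.word is not None:
        return None
    for child in node.children:
        found = first_base_np(child)
        if found is not None:
            return found
    return node if node.label == "NP" else None


def head_leaf(np: Node) -> Node:
    """Head = last word of the first noun phrase (falls back to `np` itself)."""
    return (first_base_np(np) or np).leaves()[-1]


def directed(nodes: List[Node], left_to_right: bool) -> List[Node]:
    return list(nodes) if left_to_right else list(reversed(nodes))


def np_order(np: Node) -> List[str]:
    """NP node: head child first, then left siblings right to left, then right siblings left to right."""
    if np.word is not None:
        return [np.word]
    head = head_leaf(np)
    kids = np.children
    h = next(i for i, c in enumerate(kids) if any(leaf is head for leaf in c.leaves()))
    out = np_order(kids[h])                       # the head child is treated as a new NP node
    for child in reversed(kids[:h]):
        out += parse_order(child, left_to_right=False)
    for child in kids[h + 1:]:
        out += parse_order(child, left_to_right=True)
    return out


def parse_order(node: Node, left_to_right: bool) -> List[str]:
    """Non-NP nodes inherit the reading direction of their parent."""
    if node.word is not None:
        return [node.word]
    if node.label == "NP":
        return np_order(node)
    first, rest = node.children[0], node.children[1:]
    if first.label.startswith(("VB", "RB")):      # verb or adverb first, then the rest
        seq = [first] + directed(rest, left_to_right)
    elif first.label.startswith("JJ"):            # the rest first, adjective last
        seq = directed(rest, left_to_right) + [first]
    else:
        seq = directed(node.children, left_to_right)
    return [w for child in seq for w in parse_order(child, left_to_right)]
```

For the tree NP(JJ American, NN football, NNS players), `np_order` returns players, football, American, parsing outward from the head.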

3. The method for mapping noun phrases to description logic concepts based on extension according to claim 1, characterized in that, in Step 3.3, if the current word to be parsed is the head of the phrase, refinement operations whose basic concept takes the form of an atomic concept are selected; conversely, if the current word to be parsed is not the phrase head, refinement operations not of this form are selected.
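The screening rule of claim 3 can be sketched as a filter over candidate refinement operations. The `Refinement` record and its `is_atomic` flag are hypothetical; they assume the head versus non-head distinction reduces to whether the basic concept B in ρ(C, B) is an atomic concept.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Refinement:
    basic_concept: str   # illustrative string form of the basic concept B in rho(C, B)
    is_atomic: bool      # ASSUMPTION: True when B is an atomic concept A


def screen(candidates: List[Refinement], word_is_head: bool) -> List[Refinement]:
    """The head word keeps only atomic-concept refinements;
    any other word keeps only refinements not of that form."""
    return [r for r in candidates if r.is_atomic == word_is_head]
```

For example, when parsing the head "players", a candidate such as FootballPlayer survives while a role-based candidate is dropped; for a modifier word the outcome is reversed.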

Citation Information

Patent Citations

Domain model extraction method and device and readable storage medium
CN113158654A
Reinforcement learning approach to decode sentence ambiguity
US11281855B1
