Entity reference item identification method based on topic model and semantic analysis

An entity reference item identification technique based on a topic model, applied to semantic analysis, character and pattern recognition, natural language data processing, etc. It addresses the low effectiveness of existing entity boundary detection and classification methods and improves that effectiveness.

Pending Publication Date: 2020-04-17
INST OF ELECTRONICS & INFORMATION ENG OF UESTC IN GUANGDONG
Cites: 3; Cited by: 8

AI Technical Summary

Problems solved by technology

[0003] The inventors found that existing methods have defects: e...


Examples

Embodiment Construction

[0038] Certain terms are used in the description and claims to refer to particular components. Those skilled in the art will understand that hardware manufacturers may use different terms for the same component. The specification and claims distinguish components by their function, not by their name. Throughout the specification and claims, "comprising" is an open-ended term and should be interpreted as "including but not limited to". "Approximately" means within an acceptable error range, such that those skilled in the art can solve the technical problem within that range and substantially achieve the technical effect.

[0039] In the description of the present invention, it should be understood that the orientations or positional relationships indicated by the terms "upper", "lower", "front", "rear", "left", "right", "horizontal", etc. are based on the draw...

Abstract

The invention discloses an entity reference item recognition method based on a topic model and semantic analysis. The method comprises the following steps: step 1, performing sentence segmentation, word segmentation, part-of-speech tagging, and dependency analysis on an input corpus; step 2, based on the syntactic analysis, obtaining noun phrases with complete boundaries as a candidate set of entity reference items, and then using an LDA topic model together with the TF-IDF statistical algorithm to filter non-entity reference items out of the candidate set; and step 3, measuring the semantic similarity between the entity reference items and seed entities, selecting the seed categories with high similarity as entity categories, and then classifying the entity reference items of each entity category into the corresponding reference item categories using rules built from shallow syntactic knowledge. The method can improve the effectiveness of entity boundary detection and classification.
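The filtering in step 2 and the category selection in step 3 can be illustrated with a minimal sketch. It is not the patented implementation: the LDA part of the filter is omitted, cosine similarity stands in for the semantic similarity measure (the abstract does not name one), and the function names, the threshold, and the vector representations of items and seed categories are all hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(docs):
    """Return one dict per tokenized document mapping term -> TF-IDF weight."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def filter_candidates(candidates, weights, threshold):
    """Keep candidates whose TF-IDF weight reaches the (assumed) threshold."""
    return [c for c in candidates if weights.get(c, 0.0) >= threshold]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_category(item_vec, seed_vecs):
    """Pick the seed category whose vector is most similar to the item."""
    return max(seed_vecs, key=lambda cat: cosine(item_vec, seed_vecs[cat]))
```

Here each document is assumed to be a list of candidate noun phrases produced by steps 1 and 2, and `seed_vecs` is assumed to map each seed category name to a representative vector. A candidate that occurs in every document receives a weight of zero and is dropped by any positive threshold, which is the intended effect of the filter.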

Description

Technical field

[0001] The invention belongs to the technical field of language data processing, and in particular relates to a method for identifying entity reference items based on topic models and semantic analysis.

Background art

[0002] Information extraction is a key step in understanding and processing natural language data; its goal is to identify and classify the important information conveyed in the data. Because entities are the basic units that carry information, entity recognition is the basic task of information extraction and provides data support for other tasks, including entity disambiguation, relation extraction, and event extraction. As an underlying information extraction technology, entity recognition plays an important role in artificial intelligence applications, including knowledge graphs, question answering systems, machine translation, and natural language understanding. The early entity recognition technology mainly identifie...

Application Information

IPC(8): G06F40/295, G06F40/30, G06F40/211, G06F16/35, G06K9/62
CPC: G06F16/35, G06F18/22
Inventor: 韩伟红, 徐菁, 陈雷霆, 母国才, 尹怀东
Owner: INST OF ELECTRONICS & INFORMATION ENG OF UESTC IN GUANGDONG