Entity attribute information extraction method and device based on syntactic dependency

This technology concerns entity attributes and attribute information and is applied in special data processing applications, instruments, electrical digital data processing, and related fields. It addresses the problem of attribute misalignment in information extraction methods, with the stated effects of reducing workload and improving efficiency and accuracy.

Active Publication Date: 2018-04-24
湖南星汉数智科技有限公司

AI Technical Summary

Problems solved by technology

[0006] Purpose of the invention: to solve the technical problem of attribute misalignment in existing information extraction methods based on natural language processing, the invention provides a method and device for extracting entity attribute information based on syntactic dependency. It combines natural language processing with graph theory: the syntactic dependency tree in the natural language processing results is used to create an undirected weighted graph; a shortest path algorithm from graph theory searches for the shortest association path between an entity and its associated information; and the semantic similarity between the words on that path and the attribute keywords is calculated, so that the attributes of entities and associated information are aligned automatically.
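The graph step in [0006] can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the patented implementation: the dependency edges below are hypothetical (hand-written for the Embodiment 1 sentence, not produced by a parser), every edge is given a uniform weight of 1.0 because the weighting rule is not stated in this excerpt, and Dijkstra's algorithm stands in for the unspecified shortest path algorithm. The names `build_graph` and `shortest_path` are illustrative only.

```python
import heapq
from collections import defaultdict

def build_graph(dependency_edges):
    """Turn (head, dependent, weight) arcs into an undirected weighted graph."""
    graph = defaultdict(dict)
    for head, dependent, weight in dependency_edges:
        graph[head][dependent] = weight
        graph[dependent][head] = weight  # undirected: store both directions
    return graph

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the list of words on the path, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical dependency arcs with assumed uniform weights.
EDGES = [
    ("born", "Deng Chao", 1.0),
    ("born", "Nanchang", 1.0),
    ("born", "1979", 1.0),
    ("born", "admitted", 1.0),
    ("admitted", "Central Academy of Drama", 1.0),
    ("admitted", "1998", 1.0),
]
GRAPH = build_graph(EDGES)
print(shortest_path(GRAPH, "Deng Chao", "Nanchang"))
```

The words on the returned path (here the entity, the connecting verb, and the candidate value) are what the method would then compare against attribute keywords.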



Examples


Embodiment 1

[0055] Referring to Figures 1-2, and taking the text "Deng Chao, born in Nanchang, Jiangxi Province in 1979, and admitted to the Performance Department of the Central Academy of Drama in 1998." as an example, the method of extracting entity attribute information based on the syntactic dependency path is explained in detail:

[0056] Step 1: According to the keyword request entered by the user, the text to be extracted is obtained from the Internet using existing crawler software, and the text is preprocessed to obtain the text entity to be extracted;

[0057] Step 1.1: Denote the text to be extracted, "Deng Chao, born in Nanchang, Jiangxi Province in 1979, and was admitted to the Performance Department of the Central Academy of Drama in 1998.", as I. Use the open-source HanLP tool to segment the text I, and denote the resulting word set as W;

[0058] Step 1.2: Use the HanLP open source tool to perform part-of-speech tagging and named e...

Embodiment 2

[0081] Now take the text "Yuan Hong, graduated from the Shanghai Theater Academy, and is Hu Ge's classmate and friend." as an example to describe in detail the method of extracting entity-related information based on the syntactic dependency path:

[0082] Step 1: Preprocess the text to be extracted to obtain the text entity to be extracted;

[0083] Step 1.1: Denote the text to be extracted, "Yuan Hong, graduated from the Shanghai Theater Academy, and is a classmate of Hu Ge.", as I. Use the open-source Stanford NLP tool to process the text I, and denote the word set obtained after segmentation as W. The word set is shown in Figure 3, where NN denotes a common noun, PU denotes punctuation, VV denotes a verb, NR denotes a proper noun, VC denotes the copula ("to be"), and DEG denotes an auxiliary particle;
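As a rough illustration of how the tagged word set W might be represented and queried, the following Python sketch stores each word with its part-of-speech tag and selects words by tag. The tagged tokens are hypothetical and hand-written for this example; they are not actual output from the Stanford tool, and the helper name `words_with_tag` is an assumption made for illustration.

```python
# Tag legend as described in step 1.1.
TAG_MEANINGS = {
    "NN": "common noun",
    "PU": "punctuation",
    "VV": "verb",
    "NR": "proper noun",
    "VC": "copula",
    "DEG": "auxiliary particle",
}

# Hypothetical tagged word set W for the Embodiment 2 sentence.
TAGGED_WORDS = [
    ("Yuan Hong", "NR"),
    (",", "PU"),
    ("graduated", "VV"),
    ("Shanghai Theater Academy", "NR"),
    (",", "PU"),
    ("is", "VC"),
    ("Hu Ge", "NR"),
    ("'s", "DEG"),
    ("classmate", "NN"),
    (".", "PU"),
]

def words_with_tag(tagged_words, wanted_tags):
    """Return, in sentence order, the words whose tag is in wanted_tags."""
    return [word for word, tag in tagged_words if tag in wanted_tags]

# Proper nouns are natural candidates for entities and attribute values.
print(words_with_tag(TAGGED_WORDS, {"NR"}))
```

Filtering by tag in this way is one simple reading of "obtaining candidates according to the part-of-speech relation"; the exact selection rule is not given in this excerpt.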

[0084] Step 1.2: Use the Stanford open source NLP tool to perform part-of-speech tagging and named entity recognition on the word set. The obtained word ...

Embodiment 3

[0109] Referring to Figure 5, the present invention also discloses a device for extracting entity-related information based on the syntactic dependency path, including:

[0110] The preprocessing module is used to obtain the text to be extracted from the Internet by means of the existing crawler software according to the keyword request input by the user, and preprocess the text to be extracted to obtain the text entity to be extracted;

[0111] The path calculation module is used to establish an undirected weighted graph between words according to the syntactic dependency and part-of-speech relations of the text to be extracted, and to obtain the candidate attribute information of the text entity to be extracted according to the part-of-speech relation; it then searches the undirected weighted graph for the shortest path between the text entity to be extracted and the words of the candidate attribute information, and the words on the shortest path form a set of associated infor...

Abstract

The invention discloses an entity attribute information extraction method and device based on syntactic dependency. In the method, a to-be-extracted text is first preprocessed to obtain a to-be-extracted text entity. Then, according to the syntactic dependency and part-of-speech relations of the to-be-extracted text, an undirected weighted graph between words is established, and candidate attribute information of the to-be-extracted text entity is obtained according to the part-of-speech relation. The shortest path between the to-be-extracted text entity and the words of the candidate attribute information is searched for, and the words on the shortest path form an association information word set. Finally, the semantic similarity between each attribute in the attribute set and the association information word set is calculated to obtain the entity attribute, and the entity, the entity attribute, and the attribute information are integrated as the final extraction result. Natural language processing technology and a graph theory model are combined, the ambiguity of text information is resolved, and text extraction accuracy is improved; the semantic similarity of keywords is used to summarize the attributes of the extracted information automatically, which improves extraction efficiency.
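The final alignment step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the abstract does not say how semantic similarity is computed, so cosine similarity over word vectors is assumed here; the three-dimensional vectors are toy values invented for the example; and scoring an attribute by its best-matching path word (the maximum) is also an assumption. The names `cosine` and `align_attribute` are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def align_attribute(path_words, attributes, vectors):
    """Pick the attribute most similar to any word on the shortest path."""
    best_attribute, best_score = None, -1.0
    for attribute in attributes:
        score = max(
            (cosine(vectors[attribute], vectors[word])
             for word in path_words if word in vectors),
            default=0.0,
        )
        if score > best_score:
            best_attribute, best_score = attribute, score
    return best_attribute, best_score

# Toy vectors, invented for illustration only.
VECTORS = {
    "birthplace": [1.0, 0.0, 0.0],
    "school": [0.0, 1.0, 0.0],
    "born": [0.9, 0.1, 0.0],
    "admitted": [0.1, 0.9, 0.0],
}
ATTRIBUTES = ["birthplace", "school"]

attribute, score = align_attribute(["born"], ATTRIBUTES, VECTORS)
print(attribute)
```

With these toy vectors, a path containing "born" aligns to the hypothetical attribute "birthplace", and a path containing "admitted" aligns to "school". The aligned attribute would then be combined with the entity and the attribute information into one extraction result.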

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a method and device for extracting entity attribute information based on syntactic dependency.

Background

[0002] With the rapid development of Internet applications, the number of web pages and texts on the Internet is increasing exponentially. How to extract effective and practical information from these massive web pages and texts has become a hot research and development topic in industry and academia. At present, information extraction based on structured text has made great progress and is widely used. However, because of the complex and changeable presentation forms of unstructured free text, the diversity and ambiguity of text semantics, and the large amount of invalid and interfering material (such as pictures) mixed into the text, the difficulty of extracting information from free text is further increased....

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/279
Inventor: 郭建京, 彭建辉
Owner: 湖南星汉数智科技有限公司