
Control method and device of Chinese entity relationship extraction based on word co-occurrence

A control method and device for Chinese entity relationship extraction, classified under instruments, computing, and electrical digital data processing. It addresses the unsatisfactory recall and accuracy of existing methods, which are poorly suited to Chinese relationship extraction, and improves extraction accuracy.

Status: Inactive. Publication Date: 2012-07-18
EAST CHINA NORMAL UNIV
3 Cites, 46 Cited by

AI Technical Summary

Problems solved by technology

[0003] At present, entity relationship extraction methods fall into four main categories. (1) Template matching: natural language processing knowledge is used to build and store a set of patterns. During extraction, each preprocessed sentence is matched against the pattern set; if a match succeeds, the sentence is considered to express the relation of the corresponding pattern. However, obtaining enough high-quality templates is the bottleneck of this method. (2) Dictionary-driven methods, which are limited to verb-centered relations. (3) Ontology-based relation extraction, which requires experts to build a large-scale knowledge base and therefore involves substantial manual effort. (4) Machine learning methods, which train on labeled data and cast extraction as a classification problem, usually by constructing feature vectors and applying various learning algorithms; however, these methods are relatively slow and not well suited to Chinese relation extraction.
In addition, most current relationship extraction methods are designed for English. Chinese has a flexible structure and rich implicit meaning, so when these methods are applied to Chinese entity relationship extraction, the resulting recall and accuracy are not ideal.



Examples


Embodiment Construction

[0058] Referring to Figure 6 and Figure 7, the invention discloses a Chinese entity relationship extraction method based on word co-occurrence and pattern matching. The method computes the correlation degree between words by counting how often they co-occur in a news corpus, and uses the group of related words shared by two words as their feature vectors to calculate the similarity between those two words. Then, combining this with pattern matching, it takes into account word similarity, word position, part of speech, and whether a word is a verb to calculate the matching similarity between each seed pattern sentence and the test sentence, and selects the relation of the seed pattern sentence with the highest similarity as the relation between the entities in the test sentence. In practical applications, in a search engine environment, corpus documents can be crawled and analyzed in response to user queries to obtain documen...
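The co-occurrence and word-similarity steps described in [0058] can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the text does not give a formula for the correlation degree, so positive pointwise mutual information (PPMI) over sentence-level co-occurrence stands in for it, and cosine similarity over each word's correlation vector stands in for the similarity computed from shared related words. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count sentence-level word frequencies and unordered word-pair co-occurrences."""
    word_freq, pair_freq = Counter(), Counter()
    for tokens in sentences:
        vocab = sorted(set(tokens))                # count each word once per sentence
        word_freq.update(vocab)
        pair_freq.update(combinations(vocab, 2))   # pairs are emitted in sorted order
    return word_freq, pair_freq, len(sentences)


def correlation(w1, w2, word_freq, pair_freq, n):
    """Assumed correlation degree: positive pointwise mutual information (PPMI)."""
    joint = pair_freq[tuple(sorted((w1, w2)))]
    if joint == 0:
        return 0.0
    return max(math.log(joint * n / (word_freq[w1] * word_freq[w2])), 0.0)


def feature_vector(word, word_freq, pair_freq, n):
    """A word's feature vector: its correlation degree with every other vocabulary word."""
    return {other: correlation(word, other, word_freq, pair_freq, n)
            for other in word_freq if other != word}


def word_similarity(w1, w2, word_freq, pair_freq, n):
    """Cosine similarity of two feature vectors, summed over the related words they share."""
    v1 = feature_vector(w1, word_freq, pair_freq, n)
    v2 = feature_vector(w2, word_freq, pair_freq, n)
    dot = sum(v1[k] * v2[k] for k in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical toy corpus of pre-segmented sentences standing in for news text.
corpus = [
    ["苹果", "发布", "手机"],
    ["华为", "发布", "手机"],
    ["苹果", "推出", "电脑"],
    ["华为", "推出", "电脑"],
    ["天气", "晴朗"],
]
counts = cooccurrence_counts(corpus)
print(word_similarity("苹果", "华为", *counts))  # close to 1.0: same related words
print(word_similarity("苹果", "天气", *counts))  # prints 0.0: no shared related words
```

Note that 苹果 and 华为 never co-occur directly, yet they come out as highly similar because they share the same related words; this is the effect the shared-feature-vector idea is meant to capture.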


Abstract

The invention provides a control method for Chinese entity relationship extraction based on word co-occurrence, comprising the steps of: a. calculating the word correlation degree from statistics of word co-occurrence frequency; b. calculating word similarity from the word correlation degree; and c. determining the entity relationship from the word similarity. The invention also provides a corresponding control device. The method uses a news corpus, which can be built directly by extracting news texts and titles with mature web page parsing techniques, without substantial manual effort. Using shallow language rules, such as word segmentation and part-of-speech tagging in natural language processing, together with simple statistical techniques, the method obtains the information it needs: word frequencies for calculating the word correlation degree, and the word positions, parts of speech, and verb status considered when calculating the matching similarity. In this way, the method combines the semantic information of words with traditional pattern matching.
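Step c, determining the relation by matching a test sentence against seed pattern sentences, might look like the sketch below. The text names the cues (word similarity, word position, part of speech, and whether a word is a verb) but not how they are combined, so the additive score, the `WEIGHTS` values, the `Token` class, and the `toy_sim` stand-in for word similarity are all hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    pos: str       # part-of-speech tag; "v" marks a verb in this sketch
    position: int  # index of the token in its sentence


# Hypothetical weights for the four cues; no values are given in the source.
WEIGHTS = {"sim": 1.0, "pos": 0.2, "position": 0.2, "verb": 0.3}


def token_score(a, b, word_sim, weights):
    """Score one test token against one seed token by adding up the four cues."""
    score = weights["sim"] * word_sim(a.word, b.word)
    score += weights["pos"] * float(a.pos == b.pos)
    score += weights["position"] / (1 + abs(a.position - b.position))
    if a.pos == "v" and b.pos == "v":  # assumed bonus: verbs usually carry the relation
        score += weights["verb"]
    return score


def matching_similarity(test, seed, word_sim, weights=WEIGHTS):
    """Average, over test tokens, of each token's best score against the seed sentence."""
    if not test or not seed:
        return 0.0
    best = [max(token_score(t, s, word_sim, weights) for s in seed) for t in test]
    return sum(best) / len(best)


def extract_relation(test, seeds, word_sim, weights=WEIGHTS):
    """Return the relation label of the best-matching seed pattern sentence."""
    label, _ = max(seeds, key=lambda item: matching_similarity(test, item[1], word_sim, weights))
    return label


# Hypothetical stand-in for the co-occurrence word similarity.
SYNONYMS = {frozenset(("创办", "创立")): 0.9}


def toy_sim(w1, w2):
    return 1.0 if w1 == w2 else SYNONYMS.get(frozenset((w1, w2)), 0.0)


# Context words around the entity pair; the entities themselves are left out.
seeds = [
    ("founder_of", [Token("创立", "v", 1), Token("公司", "n", 2)]),
    ("located_in", [Token("位于", "v", 1), Token("城市", "n", 2)]),
]
test = [Token("创办", "v", 1), Token("公司", "n", 2)]
print(extract_relation(test, seeds, toy_sim))  # prints founder_of
```

Here the test verb 创办 aligns with the near-synonymous seed verb 创立, giving `founder_of` an average score of 1.5 against 0.55 for `located_in`. Averaging over the test tokens keeps every seed's score on the same scale, so taking the maximum is a fair comparison; a fuller system would pass a co-occurrence-based similarity function in place of `toy_sim`.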

Description

Technical field

[0001] The invention relates to the technical field of entity relationship extraction, in particular to a Chinese entity relationship extraction method based on word co-occurrence and pattern matching.

Background

[0002] The rapid development of the Internet has brought massive information resources that are profoundly changing how people work, study, live, and entertain themselves. It also has drawbacks: faced with large-scale information, users find it difficult to locate what they actually need. Most current search engines perform simple keyword matching and return many pages of little relevance, so users spend a great deal of time looking for useful information. Users want search engines with human-like associative ability: beyond understanding a single concept, the engine should also find other information related to that concept, such as the relationship between...


Application Information

IPC(8): G06F17/30
Inventors: 杨静 (Yang Jing), 王晶 (Wang Jing), 周凌琛 (Zhou Lingchen), 刘金盼 (Liu Jinpan), 陈超 (Chen Chao), 贺樑 (He Liang)
Owner EAST CHINA NORMAL UNIV