
Method for extracting relations among named entities in Internet massive data and system thereof

Named-entity and massive-data technology, applied in network data retrieval, network data indexing, electronic digital data processing, and related fields; addresses the problem that frequent words do not fully represent category topics.

Active Publication Date: 2015-09-23
SOUTH CHINA UNIV OF TECH
4 Cites, 49 Cited by

AI Technical Summary

Problems solved by technology

[0009] The primary purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a method for extracting relationships between named entities in massive Internet data. The method covers multivariate relationships, improves the recall of extracted entity-relationship pairs, mitigates the problem that frequent words cannot fully represent category topics, and improves the accuracy of relationship extraction.
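
The concern that frequent words may not represent a category topic can be illustrated with a small sketch. It computes the Shannon entropy of a word's distribution over categories and discounts raw frequency by that entropy. The function `keyword_score`, the normalization, and the toy corpus are assumptions made for illustration; this excerpt does not give the invention's improved weighted-entropy formula.

```python
import math


def category_counts(word, corpus):
    """Count occurrences of `word` in each category of `corpus`."""
    return {cat: sum(doc.count(word) for doc in docs) for cat, docs in corpus.items()}


def category_entropy(word, corpus):
    """Shannon entropy (in bits) of the word's distribution over categories."""
    counts = category_counts(word, corpus)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


def keyword_score(word, corpus):
    """Hypothetical score: raw frequency discounted by normalized entropy."""
    total = sum(category_counts(word, corpus).values())
    max_entropy = math.log2(len(corpus)) if len(corpus) > 1 else 1.0
    return total * (1.0 - category_entropy(word, corpus) / max_entropy)


# Toy corpus (hypothetical): category -> list of tokenized short texts.
corpus = {
    "music": [["singer", "released", "new", "song"], ["song", "won", "new", "award"]],
    "sports": [["team", "won", "new", "match"], ["player", "joined", "new", "team"]],
}

for word in ("new", "song", "team"):
    print(word, category_entropy(word, corpus), keyword_score(word, corpus))
```

In this toy corpus "new" is the most frequent word, yet it is spread evenly over both categories, so its entropy is maximal and its score drops to zero, while "song" and "team" each stay concentrated in one category.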



Examples


Embodiment

[0090] As shown in Figure 1, a method for extracting relationships between named entities in massive Internet data includes the following steps:

[0091] Network information crawling and corpus construction: crawl named entities and their text explanations from encyclopedia websites, together with a large amount of structured entity data from the Internet. This "seed" data serves as training data to guide the annotation of a large number of further entity relationships;

[0092] Text preprocessing: segment the crawled text into words, tag parts of speech, and remove stop words;

[0093] Keyword extraction: extract keywords that represent the features of short documents, using a frequent-word extraction method and an improved weighted-entropy method to obtain the keywords that carry the meaning of each short text;

[0094] Obtain the "entity-relationship patterns" that represent entity relationships; for example, in the short text "#大张伟#_P sang the song "#临儿刚#_S" at the Spring Fes...
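
As a rough illustration of step [0094], the sketch below turns a tagged short text into a slot-based pattern. It assumes only the `#entity#_TAG` markup visible in the example above; the tag letters are treated as opaque labels, and the function name `to_pattern` and the English sample sentence are hypothetical, not taken from the source.

```python
import re

# Matches the markup seen in the example: "#<entity text>#_<single capital letter>".
ENTITY = re.compile(r"#([^#]+)#_([A-Z])")


def to_pattern(tagged_text):
    """Return (entities, pattern); each tagged mention becomes a <TAG> slot."""
    entities = [(m.group(2), m.group(1)) for m in ENTITY.finditer(tagged_text)]
    pattern = ENTITY.sub(lambda m: f"<{m.group(2)}>", tagged_text)
    return entities, pattern


# Hypothetical sample sentence using the same markup.
entities, pattern = to_pattern("#Alice#_P sang the song #Blue Sky#_S at the gala")
print(entities)
print(pattern)
```

Replacing the mentions with slots keeps the surrounding words as the candidate relation context, which is one plausible reading of an "entity-relationship pattern"; the full definition is cut off in this excerpt.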


Abstract

The invention discloses a method for extracting relations among named entities in massive Internet data. The method comprises the following steps: crawling network information and constructing a corpus; preprocessing texts; extracting keywords representing short document features; acquiring "entity-relation patterns" representing entity relations; annotating the relations and using the patterns to find new "entity-relation pairs" in a large quantity of unstructured texts; and evaluating the entity-relation pairs. The invention also discloses a system implementing the method. The system comprises a network information crawling module, an information preprocessing module, a feature word extraction module, and an entity relation extraction and evaluation module. The method and the system make a relation lookup system easier to extend and offer high running efficiency, among other advantages.
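
The last two steps of the abstract, finding new pairs with the entity-relation modes (patterns) and evaluating them, might be sketched as follows. Everything here is an assumption for illustration: the `<TAG>` slot syntax, the helper names `compile_pattern` and `find_pairs`, the sample sentences, and the use of a simple support count as the evaluation. The excerpt does not describe the actual evaluation procedure, and a real system would likely locate entity boundaries with a recognizer rather than a lazy regular expression.

```python
import re
from collections import Counter

SLOT = re.compile(r"<([A-Z])>")
WORDS = r"\w+(?: \w+)*?"  # one or more words, matched lazily


def compile_pattern(pattern):
    """Turn 'X <P> Y <S> Z' into a regex with one named group per slot.

    Assumes each tag occurs at most once per pattern.
    """
    parts, pos = [], 0
    for m in SLOT.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        parts.append(f"(?P<{m.group(1)}>{WORDS})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def find_pairs(patterns, sentences):
    """Count how many (pattern, sentence) matches support each entity pair."""
    support = Counter()
    for pattern in patterns:
        regex = compile_pattern(pattern)
        for sentence in sentences:
            m = regex.search(sentence)
            if m:
                support[tuple(sorted(m.groupdict().items()))] += 1
    return support


patterns = ["<P> sang the song <S> at", "<S> was performed by <P> at"]
sentences = [
    "Bob sang the song Red River at the festival",
    "Red River was performed by Bob at the city hall",
    "Carol sang the song Quiet Night at home",
]

support = find_pairs(patterns, sentences)
confirmed = {pair for pair, n in support.items() if n >= 2}
print(support)
print(confirmed)
```

Requiring support from at least two matches is one simple way to filter out pairs produced by a single noisy pattern; the threshold is arbitrary.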

Description

Technical field

[0001] The invention relates to technology for extracting relationships between named entities in massive Internet data, and in particular to a method and system for doing so. The entropy-based relation extraction method and system incorporate Hadoop distributed technology in order to cope with the huge volume of Internet data while remaining flexibly scalable.

Background art

[0002] With the rapid development of Internet technology, the data accumulated on the Internet is growing exponentially. Since the beginning of the 21st century, rapid advances in network hardware and cheaper storage media have made the amount of data stored on the Internet unprecedentedly large, and almost everyone in the world is continuously contributing data to it. In this context, great changes are quietly taking place in the fields of technology, business, man...


Application Information

IPC(8): G06F17/30
CPC: G06F16/951
Inventors: 蔡毅, 李靖楠, 闵华清
Owner: SOUTH CHINA UNIV OF TECH