Recognition method and system of named entities in microblog messages

A named entity recognition technique in the fields of special-purpose data processing applications and electronic digital data processing. It addresses the problems of microblog text (arbitrary language forms, irregular grammar and scattered sentence forms), with the effects of avoiding error accumulation and reducing labor cost.

Active Publication Date: 2013-08-28
INST OF COMPUTING TECH CHINESE ACAD OF SCI
Cites 4; cited by 49

AI Technical Summary

Problems solved by technology

Current research on named entity recognition usually targets standardized texts, such as scientific and technological documents and news reports. Microblog (Weibo) message texts, however, have their own characteristics: arbitrary language forms, irregular grammar and scattered sentence forms. As a result, existing named entity recognition methods cannot accurately identify the named entities that appear in microblog messages.
Moreover, existing named entity recognition methods (also known as supervised named entity recognition methods) need to manually label a certain amount…



Examples


Embodiment Construction

[0067] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below through specific embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0068] Figure 1 shows a weakly supervised named entity recognition method for microblog messages according to an embodiment of the present invention. The method includes: step 1) designate a small number of named entities as seeds and use them to automatically label a certain amount of microblog data in the original microblog message collection (or original microblog message database) to be processed, forming the training data set for the named entity recognizer; step 2) train a named entity recognizer on the training data set; step 3) use the trained named entity recognizer to identify named entiti...
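The three steps in [0068] can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes whitespace-tokenized messages and single-token entities, and the recognizer is a hypothetical stand-in (it learns which left-context words precede labeled entities) for whatever statistical model the patent actually trains. The names `auto_label`, `train_recognizer`, `recognize` and the parameter `min_count` are illustrative only.

```python
from collections import Counter


def auto_label(messages, seeds):
    """Step 1: label tokens that match a seed entity as ENT, all others as O.

    Assumes whitespace-tokenized messages and single-token entities.
    Only messages containing at least one seed are kept as training data.
    """
    training_set = []
    for msg in messages:
        tokens = msg.split()
        tags = ["ENT" if tok in seeds else "O" for tok in tokens]
        if "ENT" in tags:
            training_set.append(list(zip(tokens, tags)))
    return training_set


def train_recognizer(training_set, min_count=2):
    """Step 2 (hypothetical stand-in model): learn trigger words, i.e. words
    that appear immediately before labeled entities at least min_count times."""
    contexts = Counter()
    for sentence in training_set:
        for i, (_, tag) in enumerate(sentence):
            if tag == "ENT" and i > 0:
                contexts[sentence[i - 1][0]] += 1
    return {word for word, count in contexts.items() if count >= min_count}


def recognize(message, triggers):
    """Step 3: tag every token that directly follows a learned trigger word."""
    tokens = message.split()
    return [tokens[i] for i in range(1, len(tokens)) if tokens[i - 1] in triggers]
```

For example, with seeds {"Beijing", "Shanghai"} and the messages "landed in Beijing today", "traffic in Beijing again", "rain in Shanghai" and "hello world", step 1 keeps three messages, step 2 learns the trigger word "in", and step 3 then tags "Chengdu" in "flying in Chengdu now", an entity that was never a seed.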



Abstract

The invention provides a method for recognizing named entities in microblog messages. A few named entities are specified as seeds; a certain number of microblog messages from the original microblog message set to be processed are automatically labeled to form a training data set; the training data set is then used to train a named entity recognizer, and the trained recognizer is used to recognize the named entities in the microblog messages. With this method, only a few existing seed entities need to be specified for a high-quality training set to be labeled automatically, which significantly reduces labor costs for microblog messages, a type of text that is updated rapidly. The method also works iteratively, generating high-quality labeled data step by step: in each iteration, the top N newly recognized named entities that best reflect how named entities appear in real microblog data are added to the seed bank, so that the finally generated labeled data covers the entire microblog message set well.
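The iterative seed-bank expansion described in the abstract can be sketched as follows. This is an assumption-laden illustration: the abstract does not say how to measure which new entities "best reflect" real microblog data, so ranking candidates by how often they are recognized is a hypothetical proxy, and `expand_seeds`, `context_recognizer`, `top_n` and `max_iters` are illustrative names. The toy recognizer re-learns its trigger words from the current seed bank on every pass, standing in for retraining on newly auto-labeled data.

```python
from collections import Counter


def context_recognizer(messages, bank):
    """Toy recognizer (hypothetical): learn the words that precede known
    entities, then count every token that follows one of those words."""
    triggers = set()
    for msg in messages:
        toks = msg.split()
        triggers.update(toks[i - 1] for i in range(1, len(toks)) if toks[i] in bank)
    found = Counter()
    for msg in messages:
        toks = msg.split()
        found.update(toks[i] for i in range(1, len(toks)) if toks[i - 1] in triggers)
    return found


def expand_seeds(messages, seeds, recognize_all=context_recognizer, top_n=1, max_iters=5):
    """Grow the seed bank by adding at most top_n new entities per iteration.

    Candidates are ranked by recognition frequency (an assumed proxy for the
    patent's unspecified selection criterion); ties break alphabetically.
    """
    bank = set(seeds)
    for _ in range(max_iters):
        candidates = recognize_all(messages, bank)
        new = [(ent, cnt) for ent, cnt in candidates.items() if ent not in bank]
        if not new:
            break  # no new entities recognized: the bank has converged
        new.sort(key=lambda pair: (-pair[1], pair[0]))
        bank.update(ent for ent, _ in new[:top_n])
    return bank
```

On the messages "in Beijing", "in Shanghai", "in Shanghai", "near Shanghai" and "near Wuhan" with the seed {"Beijing"}, the first pass adds "Shanghai" via the trigger "in"; the second pass learns "near" from the new seed and adds "Wuhan", after which nothing new is found. Admitting only the top N candidates per pass, rather than every candidate, is presumably what keeps recognition errors from accumulating quickly in the seed bank.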

Description

Technical field

[0001] The invention relates to network data processing and analysis, and in particular to a method for automatically identifying named entities in microblog message texts.

Background art

[0002] Microblogging is a newly emerging form of information publishing and dissemination on the Internet. Because publishing on it is convenient, short and fast, Weibo has quickly attracted the attention of Internet users. At present, China has hundreds of millions of microblog users. On large-scale microblog platforms such as Sina, Tencent, Sohu and Netease, users generate a large number of microblog messages every day, close to 100 million. On the Weibo platform, every Internet user is a "self-media": users can spread what they see and hear by posting Weibo messages, and can express their views, needs and interests. These messages are aggregated into a massive message collection, and such a massive ...


Application Information

IPC(8): G06F17/30; G06F17/27
Inventors: 程学旗 (Cheng Xueqi), 伍大勇 (Wu Dayong), 李静远 (Li Jingyuan), 王元卓 (Wang Yuanzhuo), 刘倩 (Liu Qian)
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI