Automatic massive-text labeling method based on exception handling

An exception handling and automatic labeling technology, applied in the fields of electrical digital data processing, special data processing applications, and natural language data processing. It addresses three problems: named entity parsers cover far fewer named entity types than Freebase, they make errors and omissions and so cannot find all person, organization, and location entities, and they label some entities incorrectly. The method improves recall and accuracy and prevents wrong labeling.

Active Publication Date: 2015-01-21
BEIHANG UNIV
2 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, the distantly supervised (remote supervised) entity relationship extraction method has many limitations. First, depending on how it was trained, a named entity parser can find only a limited set of named entity types. For example, Stanford's named entity parser identifies only three named entity types: person, organization, and location. That is less than one percent of the number of named entity types in Freebase, so the recall rate cannot meet users' needs.
Second, the named entity parser cannot correctly mark all person, organization, and location entities. It makes errors and omissions, and therefore cannot effectively meet the needs of text labeling.

Examples

Embodiment Construction

[0033] The technical content of the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0034] As shown in Figure 1, the present invention provides a method for automatically labeling massive text based on exception handling, comprising the following steps. First, named entities and related named entity pairs are extracted from the knowledge base and stored separately. Next, using the stored named entities, the named entities in each sentence of the massive text are found by string matching; using the stored named entity pairs, all sentences in which a named entity pair co-occurs are found, and rough annotation is performed. Finally, wrongly labeled named entities are removed by a filtering algorithm to obtain the final text labeling result. This process is described in detail below.
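The extraction, string matching, and rough annotation steps can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the toy triples, the function names, and the whole-word matching rule are all assumptions made for the example, and a real knowledge base such as Freebase would need an efficient multi-pattern matcher rather than one regular expression per entity.

```python
import re

# Hypothetical toy knowledge base of (subject, relation, object) triples.
TRIPLES = [
    ("Barack Obama", "born_in", "Honolulu"),
    ("Beihang University", "located_in", "Beijing"),
]


def extract_entities_and_pairs(triples):
    """Collect named entities and related entity pairs, stored separately."""
    entities = set()
    pairs = {}
    for subj, rel, obj in triples:
        entities.update((subj, obj))
        pairs[(subj, obj)] = rel
    return entities, pairs


def find_entities(sentence, entities):
    """Return the stored entities that occur in the sentence as whole words."""
    found = []
    for entity in entities:
        # Assumed matching rule: exact string, not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(entity) + r"(?!\w)"
        if re.search(pattern, sentence):
            found.append(entity)
    return sorted(found)


def rough_annotate(sentences, entities, pairs):
    """Label each sentence in which both entities of a stored pair co-occur."""
    annotations = []
    for sentence in sentences:
        present = set(find_entities(sentence, entities))
        for (a, b), rel in pairs.items():
            if a in present and b in present:
                annotations.append((sentence, a, b, rel))
    return annotations


entities, pairs = extract_entities_and_pairs(TRIPLES)
sentences = [
    "Barack Obama was born in Honolulu.",
    "Beijing is a large city.",
]
rough = rough_annotate(sentences, entities, pairs)
```

Only the first sentence is labeled here, because the second contains just one member of a stored pair. The output of this stage is deliberately coarse; wrongly labeled entities are expected and are handled by the later filtering step.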

[0035] S1, extract the named entity and ...

Abstract

The invention discloses an automatic massive-text labeling method based on exception handling. The method comprises the following steps: S1, extracting named entities and related named entity pairs from a knowledge base and storing them separately; S2, finding the named entities in every sentence of a massive amount of text through string matching according to the stored named entities, finding the sentences in which named entity pairs co-occur according to the stored named entity pairs, and performing rough labeling; S3, examining the named entities in the roughly labeled sentences, deleting abnormal word pairs where they exist, using a filter algorithm to filter out all abnormal named entities where they exist, and obtaining the final text labeling results. The method effectively solves the mislabeling problem in the labeling process and improves text labeling accuracy.
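Step S3 of the abstract can be pictured with a small sketch. The source text available here does not spell out what makes a word pair or an entity "abnormal", so the two rules below are purely illustrative assumptions: a pair is treated as abnormal when one entity string contains the other, and an entity is treated as abnormal when it is very short or coincides with an everyday word. The names `COMMON_WORDS`, `is_abnormal_pair`, `is_abnormal_entity`, and `filter_annotations` are hypothetical.

```python
# Hypothetical list of knowledge-base names that are also everyday words.
COMMON_WORDS = {"it", "may", "will", "here"}


def is_abnormal_entity(entity, common_words=COMMON_WORDS, min_len=2):
    """Assumed rule: very short names and everyday words are likely mislabels."""
    return len(entity) < min_len or entity.lower() in common_words


def is_abnormal_pair(a, b):
    """Assumed rule: a pair is abnormal if one entity string contains the other."""
    return a in b or b in a


def filter_annotations(annotations):
    """Drop annotations with an abnormal pair or an abnormal entity."""
    kept = []
    for sentence, a, b, rel in annotations:
        if is_abnormal_pair(a, b):
            continue
        if is_abnormal_entity(a) or is_abnormal_entity(b):
            continue
        kept.append((sentence, a, b, rel))
    return kept


# Hypothetical coarse labels as (sentence, entity_a, entity_b, relation).
coarse = [
    ("Barack Obama was born in Honolulu.", "Barack Obama", "Honolulu", "born_in"),
    ("It was filmed in New York.", "It", "New York", "filmed_in"),
    ("New York City is in New York.", "New York City", "New York", "located_in"),
]
final = filter_annotations(coarse)
```

With these assumed rules only the first annotation survives: the second is dropped because "It" matches an everyday word, and the third because one entity name is contained in the other. Any real filter would substitute the criteria defined in the full patent description.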

Description

Technical field

[0001] The invention relates to a text labeling method, in particular to an automatic labeling method for massive text based on exception handling, and belongs to the technical field of natural language processing.

Background art

[0002] With the rapid development of science and technology, information of all kinds emerges endlessly, far beyond what humans are able to read. How to use massive data effectively and find the needed information has drawn more and more attention. Information extraction is a technique that helps people use massive data. Its main purpose is to extract specific events, facts, and other information from unstructured natural language text, convert it into structured or semi-structured information, and store it in a database for querying and further analysis. It provides an important foundation for question answering systems, text mining, and other application systems. Entity ...

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/3344, G06F16/335, G06F16/367, G06F40/284
Inventor: 刘瑞, 左源, 王德庆
Owner: BEIHANG UNIV