An automatic labeling method for massive texts based on exception handling

An exception-handling and automatic labeling technology, applied in electrical digital data processing, special data processing applications, natural language data processing, etc. It addresses three problems: named entity parsers recognize far fewer entity types than Freebase contains, their labels contain errors and omissions, and their recall does not meet users' needs. The effect is improved recall and accuracy and the prevention of wrong labeling.

Active Publication Date: 2017-12-08
BEIHANG UNIV

AI Technical Summary

Problems solved by technology

[0004] However, the distantly supervised entity relation extraction method has many limitations. First, a named entity parser can find only the limited set of named entity types it was trained on. For example, Stanford's named entity parser can identify only three named entity types (person, organization, and location), which is less than one percent of the number of named entity types in Freebase, so its recall cannot meet users' needs well.
Second, the named entity parser cannot correctly label every person, organization, and location entity. Errors and omissions occur, so it cannot effectively meet the needs of text labeling.



Examples


Embodiment Construction

[0034] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0035] As shown in Figure 1, the present invention provides a method for automatic labeling of massive texts based on exception handling, comprising the following steps. First, named entities and named entity pairs that have a relationship are extracted from the knowledge base and stored separately. Next, using the stored named entities, string matching finds the named entities in each sentence of the massive text; using the stored named entity pairs, the sentences in which a named entity pair co-occurs are found and given a rough annotation. Finally, wrongly labeled named entities are removed by a filtering algorithm, yielding the final text labeling result. The following is a detailed description of this process.
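The matching and co-occurrence steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names (`build_entity_pattern`, `rough_label`), the in-memory sets standing in for the knowledge base, and the record layout are all hypothetical, and a regular expression with word boundaries stands in for whatever string-matching method the inventors actually use.

```python
import re
from itertools import combinations


def build_entity_pattern(entities):
    """Compile one regex that matches any stored entity name (hypothetical helper)."""
    # Longest names first so that "New York City" wins over "New York".
    ordered = sorted(entities, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b")


def rough_label(sentences, entities, related_pairs):
    """Return sentences in which at least one stored entity pair co-occurs.

    `entities` and `related_pairs` stand in for what was extracted from the
    knowledge base; pairs are treated as unordered (an assumption).
    """
    if not entities:
        return []
    pattern = build_entity_pattern(entities)
    pair_set = {frozenset(pair) for pair in related_pairs}
    labeled = []
    for sentence in sentences:
        # Exact string matching of stored entity names inside the sentence.
        found = sorted(set(pattern.findall(sentence)))
        # Keep only the pairs that the knowledge base says are related.
        pairs = [(a, b) for a, b in combinations(found, 2)
                 if frozenset((a, b)) in pair_set]
        if pairs:
            labeled.append({"sentence": sentence, "entities": found, "pairs": pairs})
    return labeled


if __name__ == "__main__":
    demo = rough_label(
        ["Barack Obama was born in Honolulu, Hawaii.", "Hawaii is an island state."],
        {"Barack Obama", "Hawaii", "Honolulu"},
        [("Barack Obama", "Honolulu")],
    )
    for record in demo:
        print(record)
```

Compiling all names into a single pattern keeps the sketch short; for a knowledge base the size of Freebase, a dedicated multi-pattern matcher would likely be needed instead.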

[0036] S1, extract the named entity and the named entity pair wit...


Abstract

The invention discloses an automatic labeling method for massive texts based on exception handling. The method comprises the following steps: S1, extracting named entities and related named entity pairs from a knowledge base and storing them separately; S2, finding the named entities in every sentence of the massive text through string matching against the stored named entities, finding the sentences in which a stored named entity pair co-occurs, and performing rough labeling; S3, examining the named entities in the roughly labeled sentences, deleting abnormal word pairs where they exist, using a filtering algorithm to filter out all abnormal named entities where they exist, and thereby obtaining the final text labeling result. The method effectively addresses mislabeling during the labeling process and improves text labeling accuracy.
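Step S3 can be illustrated with a short Python sketch. The text available here does not say how an entity or word pair is judged abnormal, so the criterion below (very short names, or names that collide with common words) is purely an assumption for illustration, as are the names `filter_abnormal`, `looks_abnormal`, and `COMMON_WORDS` and the record layout with `sentence`, `entities`, and `pairs` keys.

```python
# Hypothetical stoplist: knowledge-base names that collide with everyday words.
COMMON_WORDS = {"it", "the", "may", "will"}


def looks_abnormal(entity):
    """Assumed criterion only: too short, or identical to a common word."""
    return len(entity) < 3 or entity.lower() in COMMON_WORDS


def filter_abnormal(labeled, is_abnormal=looks_abnormal):
    """Drop abnormal entities and any pair containing one.

    `labeled` is a list of dicts with "sentence", "entities", and "pairs"
    keys (a hypothetical format for roughly labeled sentences).
    """
    cleaned = []
    for record in labeled:
        entities = [e for e in record["entities"] if not is_abnormal(e)]
        pairs = [(a, b) for a, b in record["pairs"]
                 if not (is_abnormal(a) or is_abnormal(b))]
        # A sentence with no surviving pair carries no relation label.
        if pairs:
            cleaned.append({"sentence": record["sentence"],
                            "entities": entities,
                            "pairs": pairs})
    return cleaned
```

Passing the predicate as a parameter keeps the filtering loop separate from the abnormality test, so a different criterion can be substituted without touching the loop.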

Description

Technical field

[0001] The invention relates to an automatic labeling method for massive texts, in particular to an automatic labeling method for massive texts based on exception handling, and belongs to the field of natural language processing.

Background

[0002] With the rapid development of science and technology, information of every kind is produced in an endless stream, far beyond what people can read. How to use massive data effectively and find the required information has attracted more and more attention. Information extraction is a technology that helps people use massive data. Its main purpose is to extract specific events, facts, and other information from unstructured natural language text and convert them into structured or semi-structured information, which is then stored in a database for query and further analysis, providing an important basis for application systems such...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/3344, G06F16/335, G06F16/367, G06F40/284
Inventor: 刘瑞, 左源, 王德庆
Owner: BEIHANG UNIV