Bidirectional recursive neural network-based enterprise abbreviation extraction method

An extraction method based on a recurrent neural network, applied in the field of natural language processing. It addresses the problems that feature templates are time-consuming and labor-intensive to build, that they generalize poorly, and that globally optimal prediction results are difficult to obtain.

Inactive Publication Date: 2016-09-28
成都数联铭品科技有限公司

AI Technical Summary

Problems solved by technology

Using conditional random fields first requires designing and constructing feature templates based on the characteristics of the entities to be recognized. Feature templates include first-order words or multi-order phrases within a context window of a specified size, word prefixes and suffixes, part-of-speech tags, and other state features. Constructing feature templates is very time-consuming and labor-intensive, and manually designed templates are often based on the characteristics of only some samples, so they generalize poorly; the recognition results are highly depende
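To make the cost of hand-built templates concrete, the following is a minimal sketch of the kind of feature function a conditional random field tagger typically needs. The function name `token_features`, the feature keys, the window size and the English sample sentence are all illustrative assumptions, not taken from the patent.

```python
# Hypothetical hand-written feature template for a CRF-style tagger.
# Feature names, window size and affix length are illustrative assumptions.

def token_features(tokens, pos_tags, i, window=1, affix_len=2):
    """Build the feature dict for the token at position i."""
    word = tokens[i]
    feats = {
        "word": word,                    # first-order word feature
        "prefix": word[:affix_len],      # word prefix
        "suffix": word[-affix_len:],     # word suffix
        "pos": pos_tags[i],              # part-of-speech tag
    }
    # Context words inside the window; pad beyond the sentence edges.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        in_range = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j] if in_range else "<PAD>"
    # A second-order (bigram) feature joining the previous and current word.
    prev_word = tokens[i - 1] if i > 0 else "<PAD>"
    feats["bigram[-1,0]"] = prev_word + "|" + word
    return feats


tokens = ["Acme", "Software", "Ltd", "announced", "results"]
pos_tags = ["NNP", "NNP", "NNP", "VBD", "NNS"]
features = token_features(tokens, pos_tags, 1)
```

Every key in the returned dictionary is a design decision made by hand, which is the effort the neural approach aims to avoid.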




Embodiment Construction

[0032] The present invention will be further described in detail below in conjunction with test examples and specific embodiments. However, it should not be understood that the scope of the above subject matter of the present invention is limited to the following embodiments, and all technologies realized based on the content of the present invention belong to the scope of the present invention.

[0033] The present invention provides a method for extracting enterprise abbreviations based on a bidirectional recursive neural network. The text to be processed is serialized through word segmentation, and a certain amount (such as 5,000 pieces) of the text to be processed is selected for manual labeling. Each enterprise name is labeled in segments: the beginning part, the keyword part, the industry part and the organizational form part. Data other than the enterprise name is labeled as irrelevant parts, and the labeled training samples are input into the bidirectional ...
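The segment labeling described in this paragraph could be represented as follows. This is only a sketch: the tag names (`BEGIN`, `KEYWORD`, `INDUSTRY`, `ORG_FORM`, `OTHER`), the helper functions, and the way the sample sentence is segmented and labeled are assumptions made for illustration.

```python
# Hypothetical tag names for the five annotation classes described above.
LABELS = ["BEGIN", "KEYWORD", "INDUSTRY", "ORG_FORM", "OTHER"]
LABEL_TO_ID = {name: idx for idx, name in enumerate(LABELS)}

# One manually annotated sample as (segmented word, label) pairs.
# The segmentation and the labels are illustrative assumptions.
sample = [
    ("成都", "BEGIN"),
    ("数联铭品", "KEYWORD"),
    ("科技", "INDUSTRY"),
    ("有限公司", "ORG_FORM"),
    ("发布", "OTHER"),
    ("公告", "OTHER"),
]


def build_vocab(samples):
    """Map every word seen in the samples to an integer id (0 is unknown)."""
    vocab = {"<UNK>": 0}
    for sentence in samples:
        for word, _ in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Turn one annotated sentence into parallel word-id and label-id lists."""
    word_ids = [vocab.get(word, vocab["<UNK>"]) for word, _ in sentence]
    label_ids = [LABEL_TO_ID[label] for _, label in sentence]
    return word_ids, label_ids


vocab = build_vocab([sample])
word_ids, label_ids = encode(sample, vocab)
```

The two integer lists have the same length, one label per segmented word, which is the form a sequence-labeling network expects as training input.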



Abstract

The invention relates to the field of natural language processing, in particular to a bidirectional recursive neural network-based enterprise abbreviation extraction method. The method comprises the steps of: serializing the texts to be processed through word segmentation; selecting a certain number of the texts for manual annotation, annotating the enterprise names in them segment by segment as the beginning part, keyword part, industry part and organizational form part, and annotating data other than the enterprise names as irrelevant parts; inputting the annotated training samples into a bidirectional recursive neural network to train it; extracting the word sequences that belong to enterprise names from the network's predictions, and further extracting the fields that belong to the keyword part of each name as the enterprise abbreviation; and establishing a corresponding enterprise abbreviation database. The method thereby provides technical support for analyzing information in informal texts.
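The prediction and extraction steps in the abstract can be sketched in plain Python. The sketch assumes that the "bidirectional recursive neural network" is a bidirectional recurrent network with simple tanh units; the layer sizes, the random untrained weights, the tag names and every function name are hypothetical, and training is omitted.

```python
import math
import random

# Hypothetical tag names for the five annotation classes.
LABELS = ["BEGIN", "KEYWORD", "INDUSTRY", "ORG_FORM", "OTHER"]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_pass(inputs, w_in, w_rec):
    """Run a simple tanh RNN over the inputs and return every hidden state."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    return states


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def birnn_predict(embeddings, params):
    """Label distribution per word from forward and backward hidden states."""
    fwd = rnn_pass(embeddings, params["w_in_f"], params["w_rec_f"])
    # The backward pass reads the sequence reversed, then is re-aligned.
    bwd = rnn_pass(embeddings[::-1], params["w_in_b"], params["w_rec_b"])[::-1]
    return [softmax(matvec(params["w_out"], hf + hb)) for hf, hb in zip(fwd, bwd)]


def extract_abbreviations(words, labels):
    """Collect the keyword-part words of each contiguous enterprise name."""
    results, keyword, in_name = [], [], False
    for word, label in zip(words, labels):
        if label == "OTHER":
            if in_name and keyword:
                results.append("".join(keyword))
            keyword, in_name = [], False
        else:
            in_name = True
            if label == "KEYWORD":
                keyword.append(word)
    if in_name and keyword:
        results.append("".join(keyword))
    return results


rng = random.Random(0)
emb_dim, hidden_dim = 4, 3
params = {
    "w_in_f": rand_matrix(hidden_dim, emb_dim, rng),
    "w_rec_f": rand_matrix(hidden_dim, hidden_dim, rng),
    "w_in_b": rand_matrix(hidden_dim, emb_dim, rng),
    "w_rec_b": rand_matrix(hidden_dim, hidden_dim, rng),
    "w_out": rand_matrix(len(LABELS), 2 * hidden_dim, rng),
}
words = ["成都", "数联铭品", "科技", "有限公司", "发布", "公告"]
embeddings = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in words]
probs = birnn_predict(embeddings, params)  # untrained, so values are arbitrary

# A trained network would supply arg-max labels; gold labels stand in here.
labels = ["BEGIN", "KEYWORD", "INDUSTRY", "ORG_FORM", "OTHER", "OTHER"]
abbreviations = extract_abbreviations(words, labels)
```

Because each word's output combines a forward and a backward hidden state, the label for a word can depend on context on both sides, which is the reason for using a bidirectional network. The extracted keyword strings are what would be stored in the abbreviation database.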

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a method for extracting enterprise abbreviations based on a bidirectional recursive neural network.

Background technique

[0002] With the rapid development of the Internet, a large amount of public web data has been generated, which has spurred various emerging industries based on big data technology, such as Internet medical care, Internet education, and corporate or personal credit investigation. The rise of these Internet industries is inseparable from the analysis of large amounts of information and data, and the value of information analysis lies in sharpness and accuracy. Sharp analysis requires discovering new information promptly and quickly, but most of the data obtained directly from web pages is unstructured. To use this data, data cleaning has become the step on which companies spend the most time and energy. In data cleaning, the extractio...


Application Information

IPC(8): G06F17/30, G06N3/02
CPC: G06F16/3325, G06N3/02
Inventor: 刘世林, 何宏靖
Owner: 成都数联铭品科技有限公司