Information extraction system based on bidirectional RNN

A technology involving information extraction and an information extraction module, applied in the field of bidirectional-RNN-based information extraction systems. It addresses three problems: recognition results depend heavily on feature templates, prediction results rarely reach a global optimum, and versatility is poor.

Status: Inactive; publication date: 2016-09-28
成都数联铭品科技有限公司 (Chengdu Shulian Mingpin Technology Co., Ltd.)
Cites: 0; cited by: 22

AI Technical Summary

Problems solved by technology

Feature templates include state features such as single words (first-order) or multi-word phrases (higher-order) within a context window of specified size, word prefixes and suffixes, and part-of-speech tags. Constructing feature templates is very time-consuming and labor-intensive, yet the recognition results depend heavily on them. Manually designed templates usually reflect the characteristics of only some samples, so they generalize poorly. Moreover, usually only local context information can be used, and each feature template is applied independently of the others: the prediction cannot draw on longer historical state information, nor can it use longer-range future information as feedback to correct possible earlier errors. As a result, the prediction process is complicated and the prediction results rarely reach a global optimum.
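To make the locality limitation concrete, here is a minimal Python sketch of a hand-written window feature template of the kind described above: the current word, its neighbours within a fixed window, a prefix and suffix, and one bigram (part-of-speech features are omitted). The function name, feature keys, window size, and the `<BOS>`/`<EOS>` padding markers are illustrative assumptions, not templates taken from this patent.

```python
def window_features(words, i, window=2):
    """Hand-written feature dict for position i (requires window >= 1):
    the word itself, its 2-character prefix and suffix, every neighbour
    within +/- window, and a bigram of the previous and current word."""
    feats = {
        "w[0]": words[i],
        "prefix2": words[i][:2],
        "suffix2": words[i][-2:],
    }
    for k in range(1, window + 1):
        # Positions outside the sentence are padded with boundary markers.
        feats[f"w[-{k}]"] = words[i - k] if i - k >= 0 else "<BOS>"
        feats[f"w[+{k}]"] = words[i + k] if i + k < len(words) else "<EOS>"
    feats["w[-1]|w[0]"] = feats["w[-1]"] + "|" + words[i]
    return feats


words = ["Beijing", "A", "B", "Electronics", "Co., Ltd.", "Proposed", "United"]
features = window_features(words, 1)
```

Replacing the last word ("United") with any other word leaves `features` unchanged, because it lies outside the ±2 window around position 1. This is the sense in which such templates cannot exploit longer-range history or future context.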

Examples

Embodiment 1

[0061] The following news text was obtained from the Internet: "Beijing AB Electronics Co., Ltd., a wholly-owned subsidiary of Beijing AB Holdings Group Co., Ltd., intends to invest in establishing ABEF Big Data Financial Services Co., Ltd. together with CDEF Technology Co., Ltd. and 2 natural persons, mainly to provide commercial big data solutions for financial services to banks and other financial institutions." This text is input into the word segmentation module loaded with stanford-word-segmenter, which segments it into the following sequence of 53 words: "Beijing / A / B / Holdings / Group / Shares / Co., Ltd. / of / Wholly-owned / Subsidiary / Beijing / A / B / Electronics / Co., Ltd. / Proposed / United / C / D / E / F / Technology / Co., Ltd. / and / 2 / Name / Natural Person / Investment / Establishment / A / B / E / F / big data / gold / service / limited company / , / for / based on / bank / of / financial / institution / provide / financial / service / of / commercial / big data / solution / ." The above-mentioned character sequence is sequentially i...
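The rest of this embodiment is truncated above. Per the Abstract, the segmented words pass through a dictionary mapping table before reaching the bidirectional RNN, and the information extraction module then pulls entity names out of the predicted classification sequence. The Python sketch below assumes that the mapping table assigns each distinct word an integer index (index 0 reserved for unseen words) and that the classification sequence uses a B/I/O scheme. The function names, the `<UNK>` marker, and the labels for the sample words are hypothetical, not taken from the patent.

```python
def build_dictionary(sequences, unknown="<UNK>"):
    """Map each distinct word to an integer id in order of first appearance.
    Id 0 is reserved for words not seen when the table was built."""
    table = {unknown: 0}
    for seq in sequences:
        for word in seq:
            if word not in table:
                table[word] = len(table)
    return table


def to_ids(words, table, unknown="<UNK>"):
    """Look up each word, falling back to the unknown-word id."""
    return [table.get(w, table[unknown]) for w in words]


def extract_entities(words, labels, sep=" "):
    """Join runs of B (begin) followed by I (inside) into entity names; O = outside.
    An I label with no preceding B is ignored."""
    entities, current = [], []
    for word, label in zip(words, labels):
        if label == "B":
            if current:
                entities.append(sep.join(current))
            current = [word]
        elif label == "I" and current:
            current.append(word)
        else:
            if current:
                entities.append(sep.join(current))
            current = []
    if current:
        entities.append(sep.join(current))
    return entities


# Words 11-17 of the segmented example; the labels are hypothetical.
words = ["Beijing", "A", "B", "Electronics", "Co., Ltd.", "Proposed", "United"]
labels = ["B", "I", "I", "I", "I", "O", "O"]

table = build_dictionary([words])
ids = to_ids(words, table)                  # [1, 2, 3, 4, 5, 6, 7]
entities = extract_entities(words, labels)  # ["Beijing A B Electronics Co., Ltd."]
```

For untranslated Chinese tokens, passing `sep=""` would rejoin the entity without spaces.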

Abstract

The invention relates to the field of natural language processing, and in particular to an information extraction system based on a bidirectional RNN. The system comprises a word segmentation module, a dictionary mapping table, a bidirectional RNN module and an information extraction module, connected in that order. The bidirectional RNN module predicts the classification sequence corresponding to an input text sequence, and the information extraction module extracts the corresponding entity names from that classification sequence. When the bidirectional RNN module predicts entity names, the vectors converted from the text sequence to be processed are fed into it at the corresponding time steps, first in forward order and then in reverse order, so the classification result at each time step depends on both historical and future information. This makes the prediction more accurate and reasonable, and gives the system good application prospects in data processing, particularly in entity name extraction and recognition.
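The Abstract states that each time step's prediction depends on both historical and future information. A minimal pure-Python sketch of that idea follows, assuming a plain Elman (tanh) recurrent cell in each direction, concatenation of the forward and backward hidden states, and a softmax classifier at every step. The cell type, layer sizes, random weights (stand-ins for trained parameters), and all function names are assumptions, not details given in the text.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_pass(inputs, w_x, w_h, b):
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), with h_0 = 0.
    Returns one hidden state per input, in the order the inputs were read."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        pre = [p + q + r for p, q, r in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(z) for z in pre]
        states.append(h)
    return states


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def birnn_predict(inputs, fwd, bwd, w_out, b_out):
    """Per-time-step class probabilities from a bidirectional RNN.
    fwd and bwd are (W_x, W_h, b) tuples for the two directions."""
    forward = rnn_pass(inputs, *fwd)               # reads x_1 ... x_T
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]  # reads x_T ... x_1, realigned to t
    probs = []
    for h_f, h_b in zip(forward, backward):
        # Concatenate past (h_f) and future (h_b) context before classifying.
        logits = [a + c for a, c in zip(matvec(w_out, h_f + h_b), b_out)]
        probs.append(softmax(logits))
    return probs


# Example: 5 words as 4-dim vectors, 3 hidden units per direction, 3 classes.
EMB, HID, CLASSES = 4, 3, 3
rng = random.Random(0)


def init_direction(rng):
    """Random (W_x, W_h, b) for one direction."""
    return (random_matrix(HID, EMB, rng), random_matrix(HID, HID, rng), [0.0] * HID)


fwd, bwd = init_direction(rng), init_direction(rng)
w_out, b_out = random_matrix(CLASSES, 2 * HID, rng), [0.0] * CLASSES
sentence = [[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(5)]
probs = birnn_predict(sentence, fwd, bwd, w_out, b_out)
labels = [max(range(CLASSES), key=lambda c: p[c]) for p in probs]
```

Because the backward pass reads the sequence from the last word to the first, changing only the last word's vector changes the probabilities at the first position, while the forward hidden state at that position stays the same. A forward-only RNN could not let later words influence earlier predictions in this way.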

Description

Technical field

[0001] The invention relates to the field of natural language processing, and in particular to an information extraction system based on a bidirectional RNN.

Background art

[0002] With the rapid development of the Internet, a large amount of public web data has been generated, which has in turn spurred various emerging industries based on big data technology, such as Internet healthcare, Internet education, and corporate or personal credit investigation. The rise and prosperity of these Internet industries is inseparable from the analysis of large amounts of data; however, most of the data obtained directly from web pages is unstructured. To make use of this data, data cleaning has become the most time-consuming and labor-intensive part of the work for major companies. In data cleaning, extracting specific information, especially named entities, is a common requirement. For example, when doing business credit investigati...

Application Information

Patent type & authority: Applications (China)
IPC (8): G06Q10/04; G06N3/04
CPC: G06Q10/04; G06N3/045
Inventors: 刘世林 (Liu Shilin), 何宏靖 (He Hongjing)
Owner: 成都数联铭品科技有限公司 (Chengdu Shulian Mingpin Technology Co., Ltd.)