Text language association extraction method and system based on recurrent neural network

A recurrent neural network and text technology, applied to entity recognition and entity relationship extraction, addresses the high cost, poor scalability, and dictionary dependence of earlier methods (standard entity names are often missing from existing dictionaries), yielding a data-driven approach that saves time and improves performance.

Inactive Publication Date: 2020-07-03
PEKING UNIV +3
6 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

The feature engineering these methods require is costly and hard to scale to large data sets. It fails to exploit massive data and is not data-driven.
Moreover, hand-crafted features cannot capture many of the hidden high-order interactions among contextual features.
In addition, the entity normalization modules in these methods all rely on existing dictionaries and make the unreasonable assumption that "standard entity representations already exist in dictionaries".
However, existing dictionaries have limited coverage, and many corpora have no dictionary for their domain.
With today's highly developed information technology in particular, new entities often appear in news media text, such as reports on newly established institutions, newly issued bonds, and new events. These new entities do not exist in existing dictionaries and knowledge bases, so dictionary-dependent methods cannot normalize their names.


Examples

Embodiment Construction

[0027] The present invention will be described in detail below through specific embodiments and accompanying drawings.

[0028] Figure 1 is a schematic diagram of the constituent modules of the recurrent-neural-network-based system for extracting text entities and entity association relationships in the embodiment of the present invention. Figure 2 is a schematic diagram of the data flow and network structure of the same system. With reference to Figures 1 and 2, the function and implementation of each module shown in Figure 1 are described as follows:

[0029] (1) The contextual feature encoder based on the temporal recurrent neural network (bidirectional long short-term memory network), which consists of a forward long short-term memory network (LSTM) and a backward long short-term memory network, is r...
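The forward/backward structure named in [0029] can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the class `LSTMCell`, the helpers `run_lstm` and `bilstm_encode`, the layer sizes, and the random weights are all assumptions introduced here, and a real system would use trained weights from a deep learning library.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """Toy LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        z = x + h  # concatenate the input with the previous hidden state
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        g = [math.tanh(v) for v in self._affine("c", z)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

def run_lstm(cell, inputs):
    """Run one LSTM over a sequence and return the hidden state at each step."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in inputs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs

def bilstm_encode(inputs, fwd_cell, bwd_cell):
    """Concatenate left-to-right and right-to-left hidden states per token."""
    fwd = run_lstm(fwd_cell, inputs)
    bwd = run_lstm(bwd_cell, inputs[::-1])[::-1]  # re-align to token order
    return [f + b for f, b in zip(fwd, bwd)]

rng = random.Random(0)
# Five tokens, each with a hypothetical 4-dimensional embedding.
embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
fwd_cell = LSTMCell(4, 3, rng)
bwd_cell = LSTMCell(4, 3, rng)
encoded = bilstm_encode(embeddings, fwd_cell, bwd_cell)
```

Each token ends up with a vector whose first half summarizes the text to its left and whose second half summarizes the text to its right, which is what lets the encoder capture context on both sides of a candidate entity.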

Abstract

The invention discloses a text language association extraction method and system based on a recurrent neural network. In the method, complex context features are extracted automatically by a recurrent neural network (bidirectional long short-term memory network), which encodes the semantic information of the context. A rule-based entity expression extractor finds definition patterns in the document, identifies definitions related to non-standard expressions, and extracts the defined standard and non-standard expressions that belong to the same entity concept. The extracted features of each entity expression pair are encoded, embedding information about entity normalization into a low-dimensional entity expression vector. The entity expression vector and the context feature encoding vector are concatenated and passed through a dimension conversion to obtain the final encoding. A decoder based on a conditional random field then combines the features learned by the encoder with the transition probabilities between states to decode a globally optimal state sequence as the final output sequence. The invention can effectively improve entity recognition performance.
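Decoding a globally optimal state sequence from per-token scores and state transition scores is commonly done with the Viterbi algorithm. The sketch below assumes that approach; the abstract does not name the decoding algorithm, and the function `viterbi_decode`, the label set, and all scores are hypothetical values chosen for illustration.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][j]   : score of label j at position t (from the encoder)
    transitions[i][j] : score of moving from label i to label j
    """
    n = len(labels)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            # Best previous label for ending at label j in position t.
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[j] for j in path]

labels = ["O", "B", "I"]
# Hypothetical encoder scores for three tokens (columns: O, B, I).
emissions = [
    [1.0, 0.8, 0.0],
    [0.2, 0.0, 1.0],
    [1.0, 0.0, 0.1],
]
# Hypothetical transition scores; O -> I is heavily penalised because an
# inside tag cannot follow an outside tag.
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 0.5],    # from B
    [0.0, 0.0, 0.5],    # from I
]
tags = viterbi_decode(emissions, transitions, labels)
print(tags)  # ['B', 'I', 'O']
```

Choosing the best label for each token independently would give O, I, O here, which is not a valid tag sequence. Scoring whole sequences with the transition term picks B, I, O instead, which is the benefit of decoding globally.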

Description

Technical field

[0001] The invention belongs to the field of artificial intelligence and relates to extracting information from massive unstructured data using natural language processing technology. Specifically, it concerns identifying entities and extracting entity associations from text, which is a key technology for information extraction.

Background technique

[0002] Text entity extraction identifies meaningful entities in text, such as person names, place names, and organization names. It is a key technology for extracting information from massive unstructured data and the cornerstone of many complex natural language processing applications, such as intelligent question answering, knowledge graphs, automatic summarization, and machine translation.

[0003] Due to the rich expression forms of natural language, the same entity may have many different expressions, such as the full name, abbreviation and alias of the entity. The phenomenon of "multiple words wi...
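The full-name and abbreviation pairing described above is what a rule-based entity expression extractor looks for in a document's own definitions. The source does not list its rules, so the following is only a sketch under stated assumptions: the English phrases in `DEFINITION`, the `NAME` pattern, the function `extract_expression_pairs`, and the sample sentence are all hypothetical.

```python
import re

# Hypothetical pattern for a capitalised multi-word name, allowing a few
# lowercase connector words inside it.
NAME = r"[A-Z][\w.&-]*(?:\s+(?:of|and|for|the|[A-Z][\w.&-]*))*"

# Hypothetical definition pattern: <standard name> (hereinafter ... "<alias>")
DEFINITION = re.compile(
    r"(?P<standard>" + NAME + r")\s*"
    r"\((?:hereinafter referred to as|hereinafter|abbreviated as)\s+"
    r'"(?P<alias>[^"]+)"\)'
)

def extract_expression_pairs(text):
    """Return (standard expression, non-standard expression) pairs defined in text."""
    return [(m.group("standard"), m.group("alias"))
            for m in DEFINITION.finditer(text)]

text = (
    'According to the filing, Industrial and Commercial Bank of China '
    '(hereinafter referred to as "ICBC") issued new bonds. '
    'ICBC will list them in July.'
)
pairs = extract_expression_pairs(text)

# Later mentions of the alias can be normalised without an external dictionary.
alias_to_standard = {alias: standard for standard, alias in pairs}
normalized = alias_to_standard.get("ICBC", "ICBC")
```

Because the mapping comes from the document itself, a newly created entity can still be normalized even when no existing dictionary or knowledge base contains it.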

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/295, G06F16/31, G06F16/36, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/048, G06N3/045
Inventors: 韩英, 陈薇, 王腾蛟, 李强, 刘迪, 黄晓光
Owner: PEKING UNIV