Chinese entity extraction method and device

An extraction method and device, applied in the computer field, that addresses problems such as insufficient consideration of sentence semantic information and improves the accuracy of entity recognition.

Active Publication Date: 2020-06-16
NORTH CHINA UNIVERSITY OF TECHNOLOGY
Cites: 6; Cited by: 3

AI Technical Summary

Problems solved by technology

Existing techniques feed long sentences directly into the BiLSTM-CRF (Bi-directional Long Short-Term Memory-Conditional Random Field) model, which does not take sufficient account of the sentences' semantic information.

Examples

Embodiment Construction

[0049] The specific embodiments of the present invention will be further described below in conjunction with the accompanying drawings. The following examples are only used to illustrate the technical solution of the present invention more clearly, but not to limit the protection scope of the present invention.

[0050] Figure 1 shows a schematic flow chart of the Chinese entity extraction method provided in this embodiment, which includes:

[0051] S11. Segment the target source sentence based on punctuation marks to obtain clauses.

[0052] In the embodiment of the present invention, the target source sentence is the sentence from which entities are to be extracted.
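
Step S11 can be illustrated with a minimal sketch. The set of punctuation marks, the function name split_into_clauses, and the choice to keep each delimiter attached to its clause are assumptions made for illustration; the source does not specify them.

```python
from typing import List

# Assumed clause delimiters (full-width Chinese and ASCII marks).
# The source does not list the exact punctuation used for segmentation.
CLAUSE_PUNCTUATION = "，。；！？,;!?"


def split_into_clauses(sentence: str) -> List[str]:
    """Split a sentence into clauses at punctuation marks.

    Each delimiter stays attached to the clause it ends. A run of
    consecutive delimiters is attached to the preceding clause, and
    delimiters that appear before any text are dropped.
    """
    clauses: List[str] = []
    current: List[str] = []
    for ch in sentence:
        if ch in CLAUSE_PUNCTUATION:
            if current:
                clauses.append("".join(current) + ch)
                current = []
            elif clauses:
                clauses[-1] += ch
        else:
            current.append(ch)
    if current:  # trailing text without a closing delimiter
        clauses.append("".join(current))
    return clauses


if __name__ == "__main__":
    print(split_into_clauses("某公司生产的牛奶，检出三聚氰胺超标。"))
```

Keeping the delimiter with its clause is only one option; a pipeline that treats punctuation as noise could drop it instead.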

[0053] In the embodiment of the present invention, entities may appear at different positions in the target source sentence, and identifying an entity at each position requires different information; that is, context information influences entity recognition differently depending on position. In order to identify entities accurately, it i...

Abstract

The embodiment of the invention discloses a Chinese entity extraction method and device. The method comprises: segmenting a target source sentence into clauses; vectorizing the words in each clause to obtain word vectors; determining, according to the word vectors and a hierarchical bidirectional long short-term memory (BiLSTM) network, a probability matrix, obtained by a long short-term memory (LSTM) network, of each label corresponding to each word; inputting the probability matrix into a CRF model to obtain, for each word, the label with the maximum probability among its candidate labels; and extracting the entity composed of the words corresponding to the maximum-probability labels. Segmenting the target source sentence into clauses facilitates the subsequent learning of intra-clause semantic representations at the word level and inter-clause semantic representations at the clause level. The CRF model determines the maximum-probability label for each word, and the Chinese entity composed of the words carrying those labels is extracted, which improves the accuracy of Chinese entity recognition.
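
The last two steps of the abstract (CRF decoding and entity extraction) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the B/I/O label scheme, the transition scores, the example scores, and the function names viterbi_decode and extract_entities are all hypothetical. Viterbi search is the standard way to find the highest-scoring label sequence in a linear-chain CRF.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label sequence.

    emissions[t][j]   : score of label j for token t (the per-word label scores)
    transitions[i][j] : score of moving from label i to label j (CRF parameter)
    """
    if not emissions:
        return []
    n = len(labels)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n):
            # Best previous label for ending in label j at this token.
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [labels[i] for i in path]


def extract_entities(tokens: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Join tokens tagged B (begin) and I (inside) into entity strings."""
    entities: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                entities.append("".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                entities.append("".join(current))
            current = []
    if current:
        entities.append("".join(current))
    return entities


if __name__ == "__main__":
    labels = ["B", "I", "O"]           # assumed label scheme
    tokens = ["牛", "奶", "超", "标"]
    emissions = [                      # made-up scores, one row per token
        [2.0, 0.0, 1.0],
        [0.5, 1.5, 1.0],
        [0.0, 0.2, 2.0],
        [0.0, 1.1, 1.0],
    ]
    transitions = [                    # made-up scores; O -> I is penalised
        [-1.0, 1.0, 0.0],
        [0.0, 0.5, 0.0],
        [0.0, -10.0, 0.5],
    ]
    tags = viterbi_decode(emissions, transitions, labels)
    print(tags)                             # ['B', 'I', 'O', 'O']
    print(extract_entities(tokens, tags))   # ['牛奶']
```

In this example the last token scores highest for I when taken alone, but the sequence-level decoding chooses O because an I directly after an O is heavily penalised. This shows why decoding over the whole sequence can differ from picking the best label per word.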

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a Chinese entity extraction method and device.

Background

[0002] With the advancement of science and technology and the digitization of information, tremendous changes and innovations have taken place in all walks of life.

[0003] In recent years, entity recognition in specific fields has received continuous research attention. For example, in the field of food safety, NER (Named Entity Recognition) automatically recognizes food-related entities and generates structured data to help build a knowledge graph for the food domain. Domain-specific cases are usually written up by human recorders, who sometimes use Chinese abbreviations, so the same entity ends up with multiple expressions. Entities that mix Chinese characters, letters, numbers and punctuation marks are also harder to identify. ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/295, G06F40/30, G06N3/04
CPC: G06N3/044
Inventor: 董哲, 邵若琦, 康宇佳, 李月恒
Owner: NORTH CHINA UNIVERSITY OF TECHNOLOGY