Named-entity recognition model training method and named-entity recognition method and device

A named entity recognition technology, applied in the field of training named entity recognition models, that addresses the low recognition accuracy and poor generalization ability of existing models, improving recognition accuracy and providing good generalization ability.

Status: Inactive
Publication date: 2015-05-13
Assignee: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
Cites: 3 · Cited by: 93

AI Technical Summary

Problems solved by technology

Although this method can achieve good results on small-scale data, it relies on the Markov assumption (whether the current word is part of a named entity depends only on a fixed number of preceding words, usually 2). This limits the model's generalization ability, and its recognition accuracy on large-scale data is low.
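The fixed-window limitation can be seen in a few lines of Python. This is an illustration only, not code from the patent: `markov_context` is a hypothetical helper, and the window of 2 is the "usually 2" from the sentence above. However long the sentence is, the tagger's decision for a word is conditioned only on the two words immediately before it, whereas an RNN's hidden state can summarize the whole prefix.

```python
from typing import List, Tuple


def markov_context(words: List[str], i: int, order: int = 2) -> Tuple[str, ...]:
    """Return the fixed-size left context a Markov tagger conditions on.

    Positions before the start of the sentence are padded with "<s>",
    so every word sees exactly `order` preceding tokens.
    """
    padded = ["<s>"] * order + words
    # words[i] sits at padded[i + order]; take the `order` tokens before it.
    return tuple(padded[i:i + order])


sentence = ["I", "watched", "the", "new", "online", "drama", "yesterday"]
# The decision for "drama" (index 5) never sees "I", "watched" or "the".
print(markov_context(sentence, 5))
```

Evidence from earlier in the sentence is simply invisible to such a model, which is the generalization problem the summary describes.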



Examples


Embodiment 1

[0022] Figure 2 is a flow chart of the method for training the RNN named entity recognition model according to Embodiment 1 of the present invention. The RNN named entity recognition model is used to recognize named entities in text.

[0023] Referring to Figure 2, in step S110, a plurality of labeled sample data are obtained. Each sample includes a text string and a plurality of word-segment annotations; each annotation includes a word segment separated from the text string and that segment's named entity attribute flag in the text string.

[0024] Specifically, according to the concept of the present invention, the named entity attribute flag of a word segment in the text string includes information about whether that word segment belongs to a named entity.
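One labeled sample can be sketched as a small Python data structure. The class and field names are hypothetical, the flag is reduced to the binary entity/non-entity information of [0024], and the consistency check assumes that the segments, concatenated in order, reproduce the text string once whitespace is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentAnnotation:
    segment: str     # word segment split out of the text string
    is_entity: bool  # named entity attribute flag: part of a named entity?


@dataclass
class LabeledSample:
    text: str                             # the original text string
    annotations: List[SegmentAnnotation]  # one annotation per segment, in order

    def check(self) -> None:
        """Raise ValueError unless the segments, in order, rebuild the text.

        Whitespace is ignored so the check also works for space-separated
        examples such as the one below.
        """
        rebuilt = "".join(a.segment for a in self.annotations)
        if rebuilt != "".join(self.text.split()):
            raise ValueError(f"segments {rebuilt!r} do not match {self.text!r}")


sample = LabeledSample(
    "flights to Beijing",
    [SegmentAnnotation("flights", False),
     SegmentAnnotation("to", False),
     SegmentAnnotation("Beijing", True)],  # a place name
)
sample.check()
```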

[0025] In addition, the named entity attribute flag of the word segmentation in the text string may further include a position label of the word segmentati...
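Paragraph [0025] is cut off before the label set is given. Purely as an assumption, the sketch below uses the common B/I/O convention (B = first segment of an entity, I = a later segment of the same entity, O = outside any entity), which folds the "belongs to an entity" flag and the position into one tag per segment. `position_labels` is a hypothetical helper, not part of the patent.

```python
from typing import List, Tuple


def position_labels(n_segments: int, entity_spans: List[Tuple[int, int]]) -> List[str]:
    """Assign B/I/O labels to word segments.

    entity_spans holds half-open (start, end) segment-index ranges,
    one per named entity.
    """
    labels = ["O"] * n_segments
    for start, end in entity_spans:
        if not 0 <= start < end <= n_segments:
            raise ValueError(f"invalid span {(start, end)}")
        labels[start] = "B"              # first segment of the entity
        for k in range(start + 1, end):
            labels[k] = "I"              # remaining segments of the entity
    return labels


segments = ["flights", "to", "New", "York", "tomorrow"]
print(list(zip(segments, position_labels(len(segments), [(2, 4)]))))
```

A multi-segment entity such as "New York" then carries both pieces of information: each segment is inside an entity, and "New" is where it starts.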

Embodiment 2

[0050] Figure 4 is a flow chart of the named entity recognition method according to Embodiment 2 of the present invention. The method may be performed, for example, on a search engine server.

[0051] Referring to Figure 4, in step S210, a text string is acquired.

[0052] The text string may be a search term sent from a client. For example, a user enters "I never thought why it was so hot?" on a search engine page in the browser, and the browser application sends the search term to the search engine server.

[0053] In step S220, word segmentation is performed on the text string to obtain a plurality of word segments.

[0054] For example, the search engine server may apply an existing word segmentation technique to the acquired text string to obtain multiple word segments.
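The source does not say which segmentation technique the server uses. As one classic example of an existing technique, here is dictionary-based forward maximum matching in Python. The dictionary and `max_len` are illustrative assumptions, and production segmenters are usually statistical rather than purely dictionary-driven.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching segmentation.

    At each position, take the longest dictionary word (up to max_len
    characters) starting there; fall back to a single character.
    """
    segments: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                segments.append(piece)
                i += size
                break
    return segments


print(forward_max_match("我爱北京天安门", {"我", "爱", "北京", "天安门"}))
# → ['我', '爱', '北京', '天安门']
```

Greedy matching is fast but can mis-split ambiguous strings, which is one reason the segmenter's output is only the input to the recognition model rather than the final answer.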

[0055] In step S230, the RNN named entity recognition model trained according to the method described in the first embodim...
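Step S230 is truncated in this excerpt. Assuming the model outputs one B/I/O tag per word segment (an assumption, not stated in the source), the recognized entities can be read off by joining consecutive B/I segments, as in this hypothetical `extract_entities` helper. The separator is a parameter because Chinese segments are joined without spaces.

```python
from typing import List


def extract_entities(segments: List[str], labels: List[str], sep: str = "") -> List[str]:
    """Group consecutive B/I-labelled segments into named entities."""
    if len(segments) != len(labels):
        raise ValueError("one label per segment is required")
    entities: List[str] = []
    current: List[str] = []
    for seg, lab in zip(segments, labels):
        if lab == "B" or (lab == "I" and not current):
            # Start a new entity; a stray "I" is treated as a beginning.
            if current:
                entities.append(sep.join(current))
            current = [seg]
        elif lab == "I":
            current.append(seg)          # continue the open entity
        else:                            # "O" closes any open entity
            if current:
                entities.append(sep.join(current))
            current = []
    if current:
        entities.append(sep.join(current))
    return entities


print(extract_entities(["flights", "to", "New", "York", "tomorrow"],
                       ["O", "O", "B", "I", "O"], sep=" "))
```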

Embodiment 3

[0061] Figure 5 is a logical block diagram of the device for training the RNN named entity recognition model according to Embodiment 3 of the present invention.

[0062] Referring to Figure 5, the RNN named entity recognition model is used to recognize named entities in text, and the device for training it includes a sample data acquisition module 310 and a parameter learning module 320.

[0063] The sample data acquisition module 310 is used to acquire a plurality of labeled sample data. Each sample includes a text string and a plurality of word-segment annotations; each annotation includes a word segment separated from the text string and that segment's named entity attribute flag in the text string.

[0064] Optionally, the named entity attribute flag of a word segment in the text string includes information about whether the word segment belongs to a named entity. ...
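The two modules of the training device can be pictured as two cooperating objects. This Python sketch is illustrative only: the class names, the in-memory sample source, and the pluggable `train_fn` are assumptions, and in practice `train_fn` would wrap an actual RNN training routine.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (word segments, per-segment entity flags: 1 = part of a named entity)
Sample = Tuple[List[str], List[int]]


class SampleDataAcquisitionModule:
    """Module 310: acquires labeled samples (here from an in-memory iterable)."""

    def __init__(self, source: Iterable[Sample]) -> None:
        self.source = source

    def acquire(self) -> List[Sample]:
        samples = list(self.source)
        for segments, flags in samples:
            if len(segments) != len(flags):
                raise ValueError("each word segment needs exactly one flag")
        return samples


class ParameterLearningModule:
    """Module 320: trains on the samples and stores the learned parameters."""

    def __init__(self, train_fn: Callable[[List[Sample]], Dict[str, object]]) -> None:
        self.train_fn = train_fn
        self.params: Dict[str, object] = {}

    def learn(self, samples: List[Sample]) -> Dict[str, object]:
        self.params = self.train_fn(samples)
        return self.params


def train_device(acquisition: SampleDataAcquisitionModule,
                 learning: ParameterLearningModule) -> Dict[str, object]:
    """Feed the output of module 310 into module 320."""
    return learning.learn(acquisition.acquire())


# Usage with a stand-in training function that only counts tokens.
acq = SampleDataAcquisitionModule([(["flights", "to", "Beijing"], [0, 0, 1])])
learner = ParameterLearningModule(lambda s: {"n_tokens": sum(len(seg) for seg, _ in s)})
params = train_device(acq, learner)
```

Keeping the training routine injectable separates data acquisition from parameter learning, mirroring the split between modules 310 and 320.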



Abstract

An embodiment of the invention provides a named-entity recognition model training method, and a named-entity recognition method and device. The method for training a recurrent neural network (RNN) named-entity recognition model includes: acquiring multiple labeled sample data, where each sample includes a text string and multiple word-segment annotations, and each annotation includes a segmented term separated from the text string and its named-entity attribute tag in the text string; mapping the segmented terms in the labeled sample data to term vectors; and, taking the sample data as training samples, training the RNN named-entity recognition model to learn its parameters. With this training method and the named-entity recognition method and device, the trained model generalizes better, named entities in natural-language text can be recognized rapidly, and named-entity recognition accuracy is improved.
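The abstract names the steps (map segmented terms to term vectors, then train an RNN to learn its parameters) but not the network equations. The sketch below assumes a plain single-layer Elman RNN with a softmax over per-segment tags, trained by backpropagation through time with plain SGD; random fixed term vectors stand in for whatever word embeddings the method actually uses. All names (`ElmanTagger`, `build_embeddings`, tag 0 = non-entity, 1 = entity) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def build_embeddings(vocab: Sequence[str], dim: int, rng: random.Random) -> Dict[str, Vector]:
    """Map each segmented term to a term vector (random here, not pre-trained)."""
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in vocab}


class ElmanTagger:
    """h_t = tanh(Wx x_t + Wh h_{t-1} + b),  p_t = softmax(Wy h_t + c)."""

    def __init__(self, d_in: int, d_hid: int, n_tags: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(r: int, c: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]

        self.Wx, self.Wh, self.Wy = init(d_hid, d_in), init(d_hid, d_hid), init(n_tags, d_hid)
        self.b, self.c = [0.0] * d_hid, [0.0] * n_tags

    def forward(self, xs: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
        hs, ps = [[0.0] * len(self.b)], []  # hs[0] is the zero initial state
        for x in xs:
            pre = [u + w + bias for u, w, bias in
                   zip(matvec(self.Wx, x), matvec(self.Wh, hs[-1]), self.b)]
            h = [math.tanh(v) for v in pre]
            o = [u + bias for u, bias in zip(matvec(self.Wy, h), self.c)]
            m = max(o)
            e = [math.exp(v - m) for v in o]  # shifted for numerical stability
            s = sum(e)
            hs.append(h)
            ps.append([v / s for v in e])
        return hs, ps

    def train_step(self, xs: List[Vector], ys: List[int], lr: float = 0.1) -> float:
        """One SGD step using backpropagation through time; returns the sentence loss."""
        hs, ps = self.forward(xs)
        H, D, K = len(self.b), len(xs[0]), len(self.c)
        gWx, gWh, gWy = zeros(H, D), zeros(H, H), zeros(K, H)
        gb, gc = [0.0] * H, [0.0] * K
        loss, dh_next = 0.0, [0.0] * H
        for t in reversed(range(len(xs))):
            loss -= math.log(ps[t][ys[t]])
            # Gradient of softmax + cross-entropy w.r.t. the output logits.
            dy = [p - (1.0 if k == ys[t] else 0.0) for k, p in enumerate(ps[t])]
            h, h_prev = hs[t + 1], hs[t]
            for k in range(K):
                gc[k] += dy[k]
                for j in range(H):
                    gWy[k][j] += dy[k] * h[j]
            # dL/dh_t: from this step's output plus what flowed back from step t+1.
            dh = [sum(self.Wy[k][j] * dy[k] for k in range(K)) + dh_next[j] for j in range(H)]
            da = [(1.0 - h[j] ** 2) * dh[j] for j in range(H)]  # back through tanh
            for i in range(H):
                gb[i] += da[i]
                for j in range(D):
                    gWx[i][j] += da[i] * xs[t][j]
                for j in range(H):
                    gWh[i][j] += da[i] * h_prev[j]
            dh_next = [sum(self.Wh[i][j] * da[i] for i in range(H)) for j in range(H)]
        # Biases are wrapped as one-row matrices so a single loop updates everything.
        for w, g in ((self.Wx, gWx), (self.Wh, gWh), (self.Wy, gWy), ([self.b], [gb]), ([self.c], [gc])):
            for i in range(len(w)):
                for j in range(len(w[i])):
                    w[i][j] -= lr * g[i][j]
        return loss

    def predict(self, xs: List[Vector]) -> List[int]:
        _, ps = self.forward(xs)
        return [max(range(len(p)), key=p.__getitem__) for p in ps]


data = [(["flights", "to", "Beijing"], [0, 0, 1]),
        (["Shanghai", "weather", "today"], [1, 0, 0])]
emb = build_embeddings(sorted({w for s, _ in data for w in s}), dim=8, rng=random.Random(1))
model = ElmanTagger(d_in=8, d_hid=8, n_tags=2)
losses = [sum(model.train_step([emb[w] for w in s], y) for s, y in data) for _ in range(300)]
print(losses[0], losses[-1], model.predict([emb[w] for w in data[0][0]]))
```

Because the hidden state is carried across the whole sentence, the tag for each segment can depend on every earlier segment rather than on a fixed window of two.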

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a method for training a named entity recognition model, a named entity recognition method, and a device.

Background

[0002] Recognizing named entities (such as person names, place names, organization names, and internet terms with specific meanings) is an important part of natural language understanding and is central to many applications, such as search systems and machine translation systems. For example, if a search engine can recognize that a user's search term is the name of an online film or television drama, it can use a named entity database to return more accurate search results to the user.

[0003...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/27; G06F17/30
Inventor: 张军 (Zhang Jun)
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD