Data identification method and device

A data recognition and database technology in the field of data recognition. It addresses the poor performance of general-domain models on domain-specific text, achieving good entity recognition while reducing time and equipment consumption.

Active Publication Date: 2022-07-12
浙江香侬慧语科技有限责任公司 (Zhejiang Xiangnong Huiyu Technology Co., Ltd.)
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] To solve the above technical problems, embodiments of the present invention provide a data recognition method and device, so as to at least solve the problem of poor performance when a model trained in the general domain is applied to text from a specific subject field.



Examples


Embodiment 1

[0030] In a first aspect, an embodiment of the present invention provides a data identification method. Figure 1 is a schematic flowchart of the data identification method provided in Embodiment 1 of the present invention. As shown in Figure 1, the method includes:

[0031] Step S102: input the acquired input sample into the entity recognition model to obtain a first probability distribution consisting of each word vector in the input sample, the entity of each word vector, and the probability of that entity;
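The following minimal Python sketch shows one plausible shape for the first probability distribution of step S102. It assumes the model emits one score (logit) per entity label for every token and that a softmax converts those scores into probabilities; the label set `LABELS` and the function names are illustrative assumptions, not taken from the patent.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical label set; the patent does not enumerate its entity categories.
LABELS = ["O", "region", "country"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's label scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_distribution(
    tokens: List[str], logits: List[List[float]]
) -> List[Tuple[str, Dict[str, float]]]:
    """Pair each token with a {entity label: probability} map built from its scores."""
    if len(tokens) != len(logits):
        raise ValueError("need exactly one score vector per token")
    return [(tok, dict(zip(LABELS, softmax(row)))) for tok, row in zip(tokens, logits)]
```

Each entry therefore carries the three things the step names: the token, every candidate entity, and that entity's probability.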

[0032] Optionally, before the acquired input sample is input into the entity recognition model in step S102, the data recognition method provided in this embodiment further includes: constructing a cache database from a pre-stored training set, where the cache database contains all training data and all entities, and all entities are obtaine...
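A cache database of the kind described in [0032] might be sketched as below. This is a deliberate simplification: it keys entries on the raw token string and stores the relative frequency of each entity label seen in the training set, whereas the patent speaks of word vectors, so a faithful version would more likely key on embeddings and use nearest-neighbour search. `build_cache` and the `(token, label)` sentence format are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One training sentence: a list of (token, entity_label) pairs (assumed format).
Sentence = List[Tuple[str, str]]


def build_cache(training_set: List[Sentence]) -> Dict[str, Dict[str, float]]:
    """Map each training token to the relative frequency of each entity label it carried.

    Each (token, entity, probability) entry plays the role of a "unit pair";
    keying on the raw token string instead of a word vector is an assumption.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in training_set:
        for token, label in sentence:
            counts[token][label] += 1
    cache: Dict[str, Dict[str, float]] = {}
    for token, label_counts in counts.items():
        total = sum(label_counts.values())
        cache[token] = {label: n / total for label, n in label_counts.items()}
    return cache
```

A token labelled inconsistently across the training set (for example, once as "country" and once as "region") ends up with split probabilities rather than a single entity.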

Embodiment 2

[0056] In a second aspect, an embodiment of the present invention provides a data identification device. Figure 3 is a schematic diagram of the data identification device provided in Embodiment 2 of the present invention. As shown in Figure 3, the device includes:

[0057] The recognition module 32 is configured to input the acquired input sample into the entity recognition model to obtain a first probability distribution consisting of each word vector in the input sample, the entity of each word vector, and the probability of that entity; the search module 34 is configured to search the pre-created cache database using the input sample and obtain at least one unit pair matching a word vector in the input sample; the merging module 36 is configured to merge the entity and entity probability in the at least one unit pair with the first probability distribution, A se...
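The search and merging modules of [0057] could work roughly as follows. The sketch assumes a cache that maps each token string to an `{entity: probability}` dictionary, and the merge rule (linear interpolation with a hypothetical weight `lam`) is an assumption, since the excerpt is truncated before it states how the two distributions are combined. Function names are illustrative.

```python
from typing import Dict, List, Optional

Dist = Dict[str, float]


def search_cache(tokens: List[str], cache: Dict[str, Dist]) -> List[Optional[Dist]]:
    """Search module: return the cached entity distribution per token, or None on a miss."""
    return [cache.get(token) for token in tokens]


def merge(model_dists: List[Dist], cache_hits: List[Optional[Dist]], lam: float = 0.3) -> List[Dist]:
    """Merging module: interpolate model and cache probabilities into a second distribution.

    Linear interpolation with weight `lam` is an assumed combination rule.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    merged: List[Dist] = []
    for model_dist, hit in zip(model_dists, cache_hits):
        if hit is None:
            # No matching unit pair: keep the model's distribution unchanged.
            merged.append(dict(model_dist))
            continue
        labels = set(model_dist) | set(hit)
        merged.append({
            label: (1 - lam) * model_dist.get(label, 0.0) + lam * hit.get(label, 0.0)
            for label in labels
        })
    return merged
```

Because both inputs sum to 1, the interpolated result also sums to 1, and tokens absent from the database simply fall back to the model's own prediction.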



Abstract

The embodiment of the invention discloses a data identification method and device. The method comprises: inputting an obtained input sample into an entity identification model to obtain a first probability distribution consisting of each word vector in the input sample, the entity of each word vector, and the probability of that entity; searching a pre-created cache database with the input sample to obtain at least one unit pair matching a word vector in the input sample; combining the entity and entity probability in the at least one unit pair with the first probability distribution to obtain a second probability distribution; and labeling the word vectors in the input sample according to the second probability distribution. With this scheme, the model can be trained once in the general domain while entity lists from different subject fields are continuously collected into the database, so a single model service achieves good entity recognition on interdisciplinary text and avoids the time and equipment cost of running multiple model services.
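The abstract's claim that one general-domain model can serve several subject fields rests on updating the database rather than the model. The sketch below illustrates that idea under stated assumptions: entities are single tokens, a new subject-field entity list is written straight into a token-keyed cache, and labels come from the argmax of a linearly interpolated second distribution (the weight `lam` is hypothetical). All function names are illustrative.

```python
from typing import Dict, List

Dist = Dict[str, float]


def add_domain_entities(cache: Dict[str, Dist], entities: List[str], label: str) -> None:
    """Register a subject-field entity list in the cache; the model itself is untouched.

    Overwrites any earlier entry for the same token (a simplifying assumption).
    """
    for entity in entities:
        cache[entity] = {label: 1.0}


def label_tokens(tokens: List[str], model_dists: List[Dist], cache: Dict[str, Dist], lam: float = 0.5) -> List[str]:
    """Merge model and cache probabilities per token, then label each token with the argmax."""
    labels: List[str] = []
    for token, model_dist in zip(tokens, model_dists):
        hit = cache.get(token)
        if hit is None:
            second = model_dist  # cache miss: the second distribution is the model's own
        else:
            keys = set(model_dist) | set(hit)
            second = {k: (1 - lam) * model_dist.get(k, 0.0) + lam * hit.get(k, 0.0) for k in keys}
        labels.append(max(second, key=second.get))
    return labels
```

For example, with an empty cache a general-domain model might tag "benzene" as `O`; after `add_domain_entities(cache, ["benzene"], "chemical")` the same model output can yield `chemical` for that token, with no retraining.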

Description

Technical field

[0001] The invention relates to the application of computer technology, and in particular to a method and device for data identification.

Background

[0002] Entity recognition models trained on general-domain news data perform poorly on text from other disciplines, such as chemistry, biology, physics, and computer science and technology. An entity recognition model automatically identifies, in a given sentence, the entities of manually specified types and marks each with the corresponding entity label. For example, suppose there are two entity categories, "region" and "country". When the sentence "Beijing is the capital of China" is input, the model should predict that the two locations "Beijing" and "China" are entities, labeling "Beijing" as "region" and "China" as "country".

[0003] However, due to the difficulty of labeling tasks, the requ...
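The output described in [0002] can be made concrete by grouping per-token labels into entity spans. This sketch assumes the model assigns one label per token, with `O` marking non-entities; that tagging scheme is an assumption, and with plain per-token labels two adjacent entities of the same type would be merged into one span (a BIO scheme avoids this). `extract_entities` is a hypothetical helper.

```python
from typing import List, Tuple


def extract_entities(tokens: List[str], labels: List[str], outside: str = "O") -> List[Tuple[str, str]]:
    """Group consecutive tokens sharing a non-outside label into (entity text, label) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_label = outside
    for token, label in zip(tokens, labels):
        if label == current_label and label != outside:
            current.append(token)  # extend the entity currently being built
            continue
        if current:
            entities.append((" ".join(current), current_label))  # close the previous entity
        current = [token] if label != outside else []
        current_label = label
    if current:
        entities.append((" ".join(current), current_label))
    return entities
```

Applied to "Beijing is the capital of China" with labels `region, O, O, O, O, country`, it returns `[("Beijing", "region"), ("China", "country")]`, matching the example in the text.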


Application Information

IPC (8): G06F16/33; G06F16/36; G06F40/295
CPC: G06F16/3344; G06F16/3346; G06F16/367; G06F40/295
Inventors: 李纪为 (Li Jiwei), 王树河 (Wang Shuhe), 孙晓飞 (Sun Xiaofei)
Owner: 浙江香侬慧语科技有限责任公司 (Zhejiang Xiangnong Huiyu Technology Co., Ltd.)