Nested entity data identification method and device, and electronic equipment

Applied in the field of data recognition, this technology addresses two problems that reduce the efficiency and accuracy of nested entity data recognition: entities are difficult to divide by granularity, and labeling costs a great deal of time and labor. Its effects are saving time and labor costs, optimizing the labeling process and workload, and improving recognition efficiency and accuracy.

Active Publication Date: 2021-01-22
BEIJING PERFECT WORLD SOFTWARE TECH DEV CO LTD
Cites: 5; Cited by: 10

AI Technical Summary

Problems solved by technology

[0004] However, with this type of entity data labeling method it is difficult to divide entities by granularity, and multi-level BIO labeling requires a lot of time and labor costs. I...
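To illustrate why a single layer of BIO tags struggles with nested entities, the sketch below lists the BIO tags each token would need for three nested spans. This is a hypothetical example, not taken from the patent: spans are written as (label, start, end) with an exclusive end, tokens are English words rather than characters, and the helper name `bio_tags_per_token` is made up for illustration.

```python
from collections import defaultdict


def bio_tags_per_token(n_tokens, spans):
    """Collect every BIO tag each token would need.

    spans: list of (label, start, end) with an exclusive end index.
    Returns one sorted tag list per token; ["O"] if no span covers it.
    """
    needed = defaultdict(set)
    for label, start, end in spans:
        needed[start].add(f"B-{label}")
        for i in range(start + 1, end):
            needed[i].add(f"I-{label}")
    return [sorted(needed[i]) or ["O"] for i in range(n_tokens)]


tokens = ["Beijing", "University", "Hospital", "opened"]
# Hypothetical nested annotation: a place inside two nested organizations.
spans = [("LOC", 0, 1), ("ORG", 0, 2), ("ORG", 0, 3)]

tags = bio_tags_per_token(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(token, tag)

# Any token needing more than one tag cannot be encoded in a single BIO layer.
conflicts = [tokens[i] for i, t in enumerate(tags) if len(t) > 1]
print("conflicts:", conflicts)
```

Here "Beijing" needs both `B-LOC` and `B-ORG`, and the two ORG spans collapse into one run of `I-ORG`, so a single BIO layer cannot tell where each organization ends. Covering such cases with BIO requires several tag layers, which is the multi-level labeling cost described above.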




Embodiment Construction

[0028] Hereinafter, the present application is described in detail with reference to the drawings and embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with each other.

[0029] With the current BIO entity data labeling method, it is difficult to divide entities by granularity, multi-level BIO labeling requires a lot of time and labor costs, and it is difficult to produce a large amount of valid labeled data in a short time, all of which affect the efficiency and accuracy of nested entity data recognition. To address these technical problems, this embodiment provides a method for identifying nested entity data. As shown in Figure 1, the method includes:

[0030] 101. Permute and combine seed entity vocabularies of different entity categories to generate a short text data set.
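Step 101 can be sketched as follows, under assumptions the text does not state: the seed vocabularies are a dict mapping each category to a list of words, a short text is built by choosing an ordered sequence of distinct categories and one seed word from each, and the words are joined with a separator. The function name, the `length` and `sep` parameters and the sample seeds are all hypothetical. Each generated text comes with the (category, start, end) character offsets, end exclusive, of every seed word it contains, which is the kind of start/end annotation the method relies on.

```python
from itertools import permutations, product


def generate_short_texts(seeds, length=2, sep=" "):
    """Permute categories and combine one seed word per category.

    seeds: dict mapping category -> list of seed words (hypothetical format).
    Yields (text, spans), where spans are (category, start, end) character
    offsets with an exclusive end.
    """
    for cats in permutations(sorted(seeds), length):
        for words in product(*(seeds[c] for c in cats)):
            text, spans = "", []
            for i, (cat, word) in enumerate(zip(cats, words)):
                if i:
                    text += sep
                start = len(text)
                text += word
                spans.append((cat, start, len(text)))
            yield text, spans


seeds = {"LOC": ["Beijing", "Paris"], "ORG": ["University"]}
for text, spans in generate_short_texts(seeds):
    print(text, spans)
```

With these seeds the sketch yields four texts, e.g. `Beijing University [('LOC', 0, 7), ('ORG', 8, 18)]`. In this sketch the offsets are computed during generation, so the texts need no manual annotation; how many categories are combined per text, and whether an extra label spanning the whole text is added to create nesting, are choices the excerpt does not specify.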

[0031] First determine the entity category. This embodiment ca...



Abstract

The invention discloses a nested entity data identification method and device, and electronic equipment, and relates to the technical field of data identification. The method comprises the steps of: permuting and combining seed entity vocabularies of different entity categories to generate a short text data set; defining, for a short text in the short text data set, at least one entity category label together with the start and end indexes, within the short text, of the sub-text corresponding to each entity category label; training a deep learning recognition model using the defined short text data set as a training set; and recognizing nested entity data using the recognition model once its training meets the required standard. Because the method defines entity labeling information for a sentence by start and end indexes plus entity category labels, labeling content with multiple nested entities becomes simpler to implement, the labeling process and workload of nested entity recognition are optimized, and time and labor costs are saved. As a result, the efficiency and accuracy of nested entity data identification can be improved.
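The abstract does not say which deep learning model is used or how its output is decoded. One common way to make (label, start, end) annotations trainable, assumed here purely for illustration, is a span grid: for each category, cell [start][end] is 1 when that span is an entity. The sketch below builds such targets and decodes a score grid back into spans; the function names, the inclusive end index and the 0.5 threshold are hypothetical. Because every span gets its own cell, nested spans that share tokens stay separate.

```python
def spans_to_grid(n_tokens, spans, categories):
    """Build one n x n 0/1 target grid per category.

    spans: (category, start, end) with an inclusive end index.
    grid[cat][start][end] == 1 marks an entity of that category.
    """
    grid = {c: [[0] * n_tokens for _ in range(n_tokens)] for c in categories}
    for cat, start, end in spans:
        grid[cat][start][end] = 1
    return grid


def decode_grid(scores, threshold=0.5):
    """Return every (category, start, end) whose score exceeds the threshold.

    Only cells with start <= end are read, since a span cannot end before
    it starts.
    """
    found = []
    for cat, rows in scores.items():
        for start, row in enumerate(rows):
            for end in range(start, len(row)):
                if row[end] > threshold:
                    found.append((cat, start, end))
    return sorted(found)


tokens = ["Beijing", "University", "Hospital"]
spans = [("LOC", 0, 0), ("ORG", 0, 1), ("ORG", 0, 2)]
targets = spans_to_grid(len(tokens), spans, ["LOC", "ORG"])

# A trained model would output scores of the same shape; here the targets
# themselves stand in for perfect predictions.
recovered = decode_grid(targets)
for cat, start, end in recovered:
    print(cat, " ".join(tokens[start:end + 1]))
```

All three nested entities ("Beijing", "Beijing University", "Beijing University Hospital") survive the round trip, which a single BIO layer cannot guarantee. Whether the patented model uses this grid or another start/end pointer scheme is not stated in the excerpt.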

Description

Technical Field

[0001] The present application relates to the technical field of data identification, and in particular to a method, a device and electronic equipment for identifying nested entity data.

Background

[0002] Named Entity Recognition (NER) is an important research direction in natural language processing. It refers to recognizing entities with specific meanings in text, mainly including names of people, places and institutions, and proper nouns. With the development of deep learning technology and the needs of real production applications, the requirements on named entity recognition keep increasing. When entities are used to support search, fine-grained and nested entity information is required to ensure the accuracy and coverage of the search. At present, the main technology used for named entity recognition is deep learning.

[0003] When using deep learning technology for entity recognition, a large amount of labele...


Application Information

IPC(8): G06F40/242, G06F40/295, G06F16/31, G06F16/33, G06F16/35, G06N3/04, G06N3/08
CPC: G06F40/295, G06F40/242, G06F16/316, G06F16/3344, G06F16/35, G06N3/049, G06N3/08, G06N3/045
Inventors: 于淼, 刘炎, 覃建策, 陈邦忠
Owner BEIJING PERFECT WORLD SOFTWARE TECH DEV CO LTD