Text sampling method and device for improving annotation efficiency

A text sampling technology for improving labeling efficiency. It addresses the high cost, low information content, and poor versatility of existing approaches, and aims to improve labeling efficiency, the information content of the sampled texts, and final model performance.

Pending Publication Date: 2020-12-11
BEIJING XUEZHITU NETWORK TECH

AI Technical Summary

Problems solved by technology

[0006] 1. The data finally extracted for labeling is relatively similar, so the texts carry little information between them, which reduces the labeling effect;
[0007] 2. A concept set must be determined in advance, which limits versatility and cannot effectively handle scenarios without concept keywords or regular patterns;
[0008] 3. Attribute tags and emotions must be divided, which limits versatility and cannot effectively handle texts with atypical relationships between attribute tags and emotions. Handling these relationships is also more complicated, and the cost is significantly higher than the labeling cost typical in this field.

Embodiment Construction

[0052] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0053] As used herein, "comprising", "including", "having", "containing" and so on are all open terms, meaning including but not limited to.

[0054] The term "plurality" herein includes "two" and "more than two".

[0055] Please refer to Figure 1. Figure 1 is a flow chart of the device model identification method of t...

Abstract

The invention discloses a text sampling method and device for improving annotation efficiency. The sampling method comprises the following steps: S1, acquiring an original text set; S2, extracting a vector for each original text through a vectorization model; S3, judging the degree of similarity between the original texts according to the vector of each original text, and removing highly similar original texts from the original text set to obtain a final text set; and S4, performing random sampling among the final texts of the final text set to obtain the final annotation texts. In this way, the information content of the sampled texts is effectively improved, text annotation efficiency is improved, and final model performance is improved.
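Steps S1 to S4 can be illustrated with a minimal Python sketch. The abstract does not name the vectorization model, the similarity measure, the threshold, or the removal strategy, so everything specific here is an assumption made for illustration: a bag-of-words count vector stands in for the vectorization model, cosine similarity is the similarity measure, highly similar texts are removed with a greedy pass, and the names `vectorize`, `cosine_similarity`, `sample_for_annotation`, and the `threshold` value of 0.8 are hypothetical.

```python
import math
import random
from collections import Counter


def vectorize(text):
    """Stand-in vectorization model (assumption): bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sample_for_annotation(original_texts, k, threshold=0.8, seed=None):
    """Return up to k texts for annotation after removing near-duplicates."""
    # S2: extract a vector for each original text.
    vectors = [vectorize(t) for t in original_texts]

    # S3: greedy filter (assumed strategy): keep a text only if its similarity
    # to every text kept so far is below the threshold.
    kept = []
    for i, vec in enumerate(vectors):
        if all(cosine_similarity(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    final_text_set = [original_texts[i] for i in kept]

    # S4: random sampling from the final text set.
    rng = random.Random(seed)
    return rng.sample(final_text_set, min(k, len(final_text_set)))


# S1: acquire an original text set (toy data for illustration only).
texts = [
    "the movie was great",
    "the movie was great !",
    "I need a gift for a birthday",
    "the weather is cold today",
]
print(sample_for_annotation(texts, k=2, seed=0))
```

The greedy pass compares each text with every text already kept, so it is quadratic in the size of the text set. For a large corpus, an approximate nearest-neighbour index or a clustering step would be a natural substitute, and a learned sentence-embedding model could replace the bag-of-words stand-in.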

Description

Technical field

[0001] The present invention relates to a text sampling method and device, in particular to a text sampling method and device for improving labeling efficiency.

Background art

[0002] Natural language processing (NLP) technology can efficiently and systematically analyze, understand, and extract information from text data, so that computers can understand and generate natural language. This enables effective interaction between humans and computers using natural language (e.g. applications such as automatic message reply and voice assistants).

[0003] In order to realize the industrial application of natural language processing, it is necessary to manually label a part of the text data in some scenarios for model training. For example, for Weibo data, mark "XX is so handsome" as a female user's Weibo, and "Girlfriend's birthday, what gift would you like to send" as a male user's Weibo, and then use natur...

Application Information

IPC(8): G06F40/194, G06F40/117, G06K9/62
CPC: G06F40/194, G06F40/117, G06F18/22
Inventors: 卫海天, 丁若谷
Owner: BEIJING XUEZHITU NETWORK TECH