Negative sample sampling method and device, text processing method and device, equipment and medium

A negative sample sampling method and a text processing method, together with the corresponding devices, equipment, and media, which address the problem that negative samples drawn from general fields cannot be applied to professional fields.

Pending Publication Date: 2020-12-08
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0004] Embodiments of the present invention provide a negative sample sampling method, a text processing method, a device, equipment, and a medium, which can solve the problem that negative samples sampled in general fields cannot be applied to professional fields.

Detailed Description of the Embodiments

[0035] The features and exemplary embodiments of various aspects of the present invention are described in detail below. To make the purpose, technical solutions, and advantages of the present invention clearer, the invention is further described below in conjunction with the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is provided only to give a better understanding of the present invention by showing examples of it.

[0036] It should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another entity or operation...


Abstract

The invention discloses a negative sample sampling method and device, a text processing method and device, equipment, and a medium. The negative sample sampling method comprises the steps of: obtaining a text corpus; performing word segmentation on the text corpus to obtain a word segmentation result; and obtaining a plurality of text fragments from the text corpus to serve as negative samples of the text corpus. The text fragments comprise fragments formed by a single character and/or fragments formed by a plurality of characters, and each of the plurality of text fragments is different from every word in the word segmentation result. According to the embodiment of the invention, the sampled negative samples are suited to the field to which the text corpus belongs.
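The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy forward-maximum-matching segmenter, the example lexicon, the example sentence, the function names `segment` and `sample_negative_fragments`, the `max_len` cap on fragment length, and the use of uniform random sampling are all assumptions made for illustration. Only the overall flow (segment the corpus, then keep fragments of one or more characters that differ from every word in the segmentation result) comes from the abstract.

```python
import random


def segment(text, lexicon, max_word_len=4):
    """Toy forward-maximum-matching segmenter (stand-in for a real tokenizer)."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def sample_negative_fragments(text, words, max_len=3, k=5, seed=0):
    """Sample up to k corpus fragments that differ from every segmented word."""
    word_set = set(words)
    # Enumerate every substring of 1..max_len characters, then drop real words.
    candidates = sorted({
        text[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(text) - n + 1)
    } - word_set)
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return rng.sample(candidates, min(k, len(candidates)))


# Hypothetical domain sentence and lexicon, chosen only for this example.
corpus = "患者服用阿司匹林后头痛缓解"
lexicon = {"患者", "服用", "阿司匹林", "头痛", "缓解"}

words = segment(corpus, lexicon)
negatives = sample_negative_fragments(corpus, words)
print(words)
print(negatives)
```

Because the candidates are drawn from the corpus itself, the negative samples come from the same field as the text being segmented, which is the property the abstract emphasizes. A real system would presumably replace the toy segmenter with an existing general-domain tokenizer.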

Description

Technical field

[0001] The invention belongs to the field of computers, and in particular relates to a negative sample sampling method, a text processing method, a device, equipment, and a medium.

Background

[0002] In specific business scenarios of natural language processing, text from professional fields often needs to be segmented into words. Segmenting such text requires rebuilding the complex neural-network word segmentation model that was used to train the general-field tokenizer.

[0003] When the word segmentation model is rebuilt, most of the negative samples come from general news sources such as People's Daily. However, a word segmenter trained on negative samples from these general fields often performs poorly in some professional fields; that is, negative samples sampled from the general news field cannot be applied to some professio...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/284, G06F40/247
Inventor: 叶宇潇, 邱立坤, 付彬, 邓拯宇, 李杨
Owner: ALIBABA GRP HLDG LTD