A text enhancement method, text classification method and related device

A text data technology, applied in the fields of artificial intelligence, deep learning and machine learning, that addresses problems such as the low efficiency of sample collection.

Active Publication Date: 2022-04-01
BEIJING TOPSEC NETWORK SECURITY TECH +2
8 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] The purpose of the embodiments of the present application is to provide a text enhancement method, a text classification method and related devices that address the low efficiency of collecting samples of specific categories.

Examples

Embodiment approach

[0043] Step S121 above can be implemented in several ways. In the first implementation, the text corpus is obtained and segmented using a grammar- and rule-based word segmentation method to obtain multiple words. The basic idea is to perform syntactic and semantic analysis during word segmentation, using syntactic and semantic information for part-of-speech tagging in order to resolve segmentation ambiguity. In the second implementation, the text corpus is obtained and segmented using a mechanical (i.e. dictionary-based) word segmentation method to obtain multiple words. Mechanical word segmentation matches character strings in the document against the words in a dictionary: if a string is found in the dictionary, the match succeeds and the string is segmented as a word; otherwise it is not segmented. Mechanical word segmentation methods include, for example, forward maximum matching...
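
The dictionary-based matching described above can be illustrated with a minimal sketch of forward maximum matching. This is not the patent's implementation: the function name `forward_max_match`, the toy dictionary, and the fallback of emitting a single character when nothing matches are all assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation against a word dictionary.

    Hypothetical helper: at each position, take the longest dictionary
    word that starts there; if none matches, emit one character
    (an assumed fallback, not stated in the source).
    """
    if max_len is None:
        # Longest dictionary entry bounds the window we need to try.
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary, purely illustrative.
toy_dict = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", toy_dict))
```

On this toy input the greedy match picks 研究生 first and leaves 命 unmatched, which is the kind of segmentation ambiguity that the grammar- and rule-based approach is meant to resolve.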

Abstract

The application provides a text enhancement method, a text classification method and related devices. The method includes: obtaining sentence content from a text corpus and performing word segmentation on it to obtain segmented words; finding, in a concept tree, similar words whose similarity to the segmented words exceeds a threshold, and using those similar words to randomly replace words in the sentence content to obtain multiple sentences; training a generative adversarial network on the multiple sentences to obtain a generative adversarial network model; using the model to generate extended sentence samples; and combining the extended sentence samples with the sentence content in the text corpus to obtain an enhanced text data set. Because the trained generative adversarial network model learns the variation patterns among similar words in the concept tree whose similarity exceeds the threshold, the extended sentence samples it generates are better suited to producing samples of a specific category.
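
The similar-word replacement step in the abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the patented method: the `SIMILAR` table is a hypothetical stand-in for concept-tree similarity lookups, the names `similar_words` and `augment` and the `replace_prob` parameter are invented here, and the generative adversarial network training step is omitted entirely.

```python
import random

# Hypothetical similarity scores standing in for a concept-tree lookup.
SIMILAR = {
    "ban": [("prohibit", 0.9), ("forbid", 0.85), ("limit", 0.4)],
    "spam": [("junk", 0.82)],
}


def similar_words(word, table, threshold):
    """Return candidates whose similarity to `word` exceeds the threshold."""
    return [w for w, score in table.get(word, []) if score > threshold]


def augment(tokens, table, threshold=0.8, n_variants=3,
            replace_prob=0.5, seed=0):
    """Build sentence variants by randomly swapping in similar words.

    `replace_prob` (an assumption) is the chance that a replaceable
    token is actually swapped in a given variant.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    variants = []
    for _ in range(n_variants):
        new_tokens = []
        for tok in tokens:
            candidates = similar_words(tok, table, threshold)
            if candidates and rng.random() < replace_prob:
                new_tokens.append(rng.choice(candidates))
            else:
                new_tokens.append(tok)
        variants.append(new_tokens)
    return variants


for variant in augment(["we", "ban", "spam"], SIMILAR):
    print(" ".join(variant))
```

In the described pipeline, sentences produced this way would serve as training data for the generative adversarial network rather than as the final augmented set.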

Description

Technical field

[0001] The present application relates to the technical fields of machine learning, artificial intelligence and deep learning, and specifically to a text enhancement method, a text classification method and related devices.

Background

[0002] At present, neural network models are mostly trained on sentences from a text corpus, and a large number of samples is often needed for the model to perform well. However, when only a small number of text samples of a specific category (for example, prohibited words) is available for training, the accuracy of the trained neural network model is low. To improve recognition accuracy for samples of a specific category, the common practice is to rely on people to collect as many text sentence samples as possible, or to write more text sentence samples by hand. However, this approach is not only inefficient but also not e...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/289, G06F40/216, G06F40/30, G06F16/35, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/289, G06F40/30, G06F40/216, G06F16/35, G06N3/08, G06N3/045, G06F18/22
Inventor: 陈龙, 王炜, 江军
Owner: BEIJING TOPSEC NETWORK SECURITY TECH