Text enhancement method, text classification method and related devices

A text enhancement (data augmentation) technology applied in the fields of artificial intelligence, deep learning, and machine learning, which addresses the low efficiency of collecting samples of specific categories.

Active Publication Date: 2021-06-04
BEIJING TOPSEC NETWORK SECURITY TECH +2

AI Technical Summary

Problems solved by technology

[0003] The embodiments of the present application aim to provide a text enhancement method, a text classification method and related devices that address the low efficiency of collecting samples of specific categories.

Examples

Embodiment approach

[0043] Step S121 can be implemented in several ways. In the first implementation, the text corpus is obtained and segmented with a grammar- and rule-based word segmentation method to obtain multiple words. The basic idea is to perform syntactic and semantic analysis during segmentation, using syntactic and semantic information for part-of-speech tagging in order to resolve segmentation ambiguity. In the second implementation, the text corpus is obtained and segmented with a mechanical (i.e. dictionary-based) word segmentation method to obtain multiple words. Mechanical word segmentation matches character strings in the document against the words in a dictionary: if a string is found in the dictionary, the match succeeds and the string is segmented as a word; otherwise it is not segmented. Mechanical word segmentation methods include, for example: forward maximum matching...
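As an illustration of the dictionary-based (mechanical) segmentation mentioned above, the following is a minimal sketch of forward maximum matching. The function name `forward_max_match`, the toy dictionary, and the fallback of emitting a single character when no dictionary word matches are assumptions made for this example; the source does not specify these details.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation: at each position, take the
    longest substring that appears in `dictionary`."""
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Assumption: an unmatched single character is emitted as-is
            # so that the scan always advances.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary for demonstration only.
toy_dictionary = {"研究", "研究生", "生命", "的", "起源"}
print(forward_max_match("研究生命的起源", toy_dictionary))
# ['研究生', '命', '的', '起源']
```

The output shows the kind of segmentation ambiguity that purely mechanical matching can produce: the greedy match picks "研究生" (graduate student) where "研究 / 生命" (research / life) is the intended reading, which is the case the grammar- and rule-based approach aims to resolve.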


Abstract

The invention provides a text enhancement method, a text classification method and a related device. The method comprises the following steps: acquiring sentence content from a text corpus and performing word segmentation on it to obtain segmented words; screening out, from a concept tree, similar words whose similarity to the segmented words exceeds a threshold, and randomly replacing words in the sentence content with these similar words to obtain a plurality of sentences; training a generative adversarial network on the plurality of sentences to obtain a generative adversarial network model; generating expanded sentence samples with the generative adversarial network model; and combining the expanded sentence samples with the sentence content in the text corpus to obtain an enhanced text data set. Because the expanded sentence samples are generated by the trained generative adversarial network model, and that model has learned the variation patterns introduced by similar words whose similarity in the concept tree exceeds the threshold, samples of a specific category can be generated more effectively.
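The similar-word replacement step can be sketched as follows. This is only an illustrative sketch under stated assumptions: the toy concept tree `PARENT`, the Wu-Palmer-style similarity score, the `replace_prob` parameter, and all function names are hypothetical, since the abstract does not say how similarity in the concept tree is computed or how replacement positions are chosen. The generative adversarial network training step is not shown.

```python
import random

# Hypothetical toy concept tree, stored as child -> parent.
PARENT = {
    "fruit": "food",
    "vegetable": "food",
    "apple": "fruit",
    "pear": "fruit",
    "carrot": "vegetable",
}


def path_to_root(word):
    """Return [word, parent, ..., root]."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def similarity(a, b):
    """Assumed Wu-Palmer-style score: 2*depth(lca) / (depth(a) + depth(b)),
    where the root has depth 1. Returns 0.0 if there is no shared ancestor."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    shared = set(path_b)
    # path_a runs from the word up to the root, so the first shared
    # node is the deepest common ancestor.
    lca = next((node for node in path_a if node in shared), None)
    if lca is None:
        return 0.0
    return 2 * len(path_to_root(lca)) / (len(path_a) + len(path_b))


def similar_words(word, threshold):
    """All other tree nodes whose similarity to `word` exceeds `threshold`."""
    vocabulary = set(PARENT) | set(PARENT.values())
    return sorted(w for w in vocabulary
                  if w != word and similarity(word, w) > threshold)


def augment(tokens, threshold=0.6, n_variants=3, replace_prob=0.5, seed=0):
    """Produce `n_variants` token lists by randomly swapping in similar words."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        new_tokens = []
        for token in tokens:
            candidates = similar_words(token, threshold)
            if candidates and rng.random() < replace_prob:
                new_tokens.append(rng.choice(candidates))
            else:
                new_tokens.append(token)
        variants.append(new_tokens)
    return variants


print(similar_words("apple", 0.6))  # ['fruit', 'pear']
for variant in augment(["i", "ate", "an", "apple"]):
    print(" ".join(variant))
```

In the method described by the abstract, sentences produced this way would then serve as training data for the generative adversarial network rather than as the final augmented set.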

Description

Technical field

[0001] The present application relates to the technical fields of machine learning, artificial intelligence and deep learning, and specifically to a text enhancement method, a text classification method and related devices.

Background technique

[0002] At present, neural network models are mostly trained on sentences from a text corpus, and a large number of samples is often needed for the trained model to achieve good results. However, when only a small number of text samples of a specific category (for example, prohibited words) is available for training, the accuracy of the trained neural network model is low. To increase recognition accuracy for samples of a specific category, the common practice is to rely on humans to collect as many text sentence samples as possible, or to compose more text sentence samples manually. However, this approach is not only inefficient but also not e...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/216, G06F40/30, G06F16/35, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/289, G06F40/30, G06F40/216, G06F16/35, G06N3/08, G06N3/045, G06F18/22
Inventor: 陈龙王炜江军
Owner: BEIJING TOPSEC NETWORK SECURITY TECH