
Method for realizing text categorical dataset expansion based on generative adversarial network

Technical field: text classification and network implementation, applied to big data analysis. The technology addresses problems such as the unfair treatment of categories in classification, with the effects of improving classification accuracy and facilitating scientific research.

Status: Inactive · Publication date: 2018-11-27
WUHAN UNIV
Cites: 8 · Cited by: 8

AI Technical Summary

Problems solved by technology

As a result, a classification scheme that appears to treat all categories equally is, in practical applications, unfair to some of them to a certain degree.




Detailed Description of the Embodiments

[0032] To make the object, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to the examples. The specific embodiments described here only explain the present invention and do not limit it.

[0033] As shown in Figure 1, the present invention provides a method for expanding text classification datasets based on generative adversarial networks.
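The core of the method is adversarial training: a generator learns to produce samples that a discriminator cannot tell apart from real ones. The text available here does not specify the network architectures, so the following is only a toy sketch under stated assumptions: a linear generator, a logistic discriminator on linear and squared features, and L2 weight decay on the discriminator (an added assumption) to damp the oscillations that plain GAN updates tend to show. The rows of `real` stand in for flattened word-vector matrices of one rare category; `train_gan` and all its parameters are hypothetical.

```python
import numpy as np

def sigmoid(s):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * s))

def train_gan(real, noise_dim=8, steps=2000, batch=64, lr=0.05, decay=0.1, seed=0):
    """Train a toy GAN on the rows of `real` (n_samples x dim); return a sampler.

    Generator (hypothetical):     x = z @ W + b, with z ~ N(0, I)
    Discriminator (hypothetical): D(x) = sigmoid(x @ a + (x**2) @ q + c)
    """
    rng = np.random.default_rng(seed)
    dim = real.shape[1]
    W = rng.normal(scale=0.1, size=(noise_dim, dim))
    b = np.zeros(dim)
    a = np.zeros(dim)
    q = np.zeros(dim)
    c = 0.0

    for _ in range(steps):
        x_r = real[rng.integers(0, len(real), batch)]
        z = rng.normal(size=(batch, noise_dim))
        x_f = z @ W + b

        # Discriminator step: minimise -log D(real) - log(1 - D(fake)),
        # plus L2 weight decay on a and q.
        g_r = -(1.0 - sigmoid(x_r @ a + x_r**2 @ q + c)) / batch
        g_f = sigmoid(x_f @ a + x_f**2 @ q + c) / batch
        a -= lr * (x_r.T @ g_r + x_f.T @ g_f + decay * a)
        q -= lr * ((x_r**2).T @ g_r + (x_f**2).T @ g_f + decay * q)
        c -= lr * (g_r.sum() + g_f.sum())

        # Generator step (non-saturating loss): minimise -log D(fake).
        s_f = x_f @ a + x_f**2 @ q + c
        g = -(1.0 - sigmoid(s_f)) / batch        # dLoss/ds for each fake sample
        dx = g[:, None] * (a + 2.0 * q * x_f)    # chain rule through D's logit
        W -= lr * (z.T @ dx)
        b -= lr * dx.sum(axis=0)

    def sample(n):
        # Draw n synthetic rows from the trained generator.
        return rng.normal(size=(n, noise_dim)) @ W + b
    return sample

rng = np.random.default_rng(1)
real = rng.normal(loc=1.0, scale=0.5, size=(200, 6))  # stand-in for rare-category data
sample = train_gan(real)
fake = sample(50)
print(fake.shape)  # (50, 6)
```

A practical implementation would replace both models with deeper networks trained in a framework such as PyTorch; the toy version only illustrates the alternating discriminator and generator updates.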

[0034] The method of the present invention is applicable to any text classification task; for ease of explanation, expanding a dataset for Weibo (microblog) user interest classification is used as the example.

[0035] Based on statistics on, and screening of, the news categories of large portal websites, we assume that the interest categories of Weibo users coincide with these news categories, and use them to determine the interest categories of Weibo users.
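Once the interest categories are fixed, the method's first step is determining which categories need expansion. The criterion is not given in the text available here, so this minimal sketch assumes a simple, hypothetical rule: a category is under-represented when it has fewer than `ratio` times as many samples as the largest category, and its expansion target is the gap to that largest category. The function name, the threshold, and the category labels are all illustrative.

```python
from collections import Counter

def expansion_targets(labels, ratio=0.5):
    """Map each under-represented category to the number of extra samples
    needed to match the largest category.

    A category counts as under-represented when it has fewer than
    `ratio` times as many samples as the largest one (hypothetical rule).
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {cat: largest - n
            for cat, n in sorted(counts.items())
            if n < ratio * largest}

# Hypothetical label distribution for a Weibo interest dataset.
labels = ["sports"] * 40 + ["finance"] * 35 + ["military"] * 6 + ["education"] * 12
print(expansion_targets(labels))  # {'education': 28, 'military': 34}
```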

...



Abstract

The invention discloses a method for expanding text classification datasets based on a generative adversarial network. The method comprises the following steps: determining the original data categories that require expansion; preprocessing the corresponding data and representing it as word-vector matrices using word2vec and TF-IDF; generating expanded matrix vectors with the generative adversarial network; and combining the matrix vectors of the original data with the expanded matrix vectors to expand the dataset. By expanding rare-category data, the method helps researchers enlarge datasets that are difficult to obtain, improves the classification accuracy for rare-category data, and facilitates scientific research.
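The abstract does not say exactly how word2vec and TF-IDF are combined. One plausible reading, sketched here as an assumption, is that each row of a document's matrix is a word's word2vec vector scaled by that word's TF-IDF weight, zero-padded to a fixed number of rows so all matrices share one shape. Random vectors stand in for trained word2vec embeddings, the IDF uses the common smoothed form ln((1 + N) / (1 + df)) + 1, and the final combination step is a plain concatenation of original and generated matrices. All function names are hypothetical.

```python
import math
import numpy as np

def tfidf_weights(docs):
    """TF-IDF weight of every token in every tokenised document (smoothed IDF)."""
    n_docs = len(docs)
    df = {}
    for doc in docs:
        for tok in set(doc):
            df[tok] = df.get(tok, 0) + 1
    weights = []
    for doc in docs:
        tf = {}
        for tok in doc:
            tf[tok] = tf.get(tok, 0) + 1
        weights.append({tok: (cnt / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
                        for tok, cnt in tf.items()})
    return weights

def doc_matrix(doc, weights, embed, max_len):
    """Stack TF-IDF-scaled word vectors row by row, zero-padded/truncated to max_len."""
    dim = len(next(iter(embed.values())))
    mat = np.zeros((max_len, dim))
    for i, tok in enumerate(doc[:max_len]):
        if tok in embed:  # out-of-vocabulary tokens stay as zero rows
            mat[i] = weights[tok] * embed[tok]
    return mat

def expand_dataset(orig_mats, orig_labels, gen_mats, label):
    """Append generated matrices for one category to the original dataset."""
    X = np.concatenate([orig_mats, gen_mats], axis=0)
    y = list(orig_labels) + [label] * len(gen_mats)
    return X, y

# Toy tokenised posts; random vectors stand in for trained word2vec embeddings.
docs = [["match", "goal", "team"], ["goal", "team", "coach", "team"]]
rng = np.random.default_rng(0)
embed = {tok: rng.normal(size=8) for tok in sorted({t for d in docs for t in d})}
weights = tfidf_weights(docs)
mats = np.stack([doc_matrix(d, w, embed, max_len=5) for d, w in zip(docs, weights)])
print(mats.shape)  # (2, 5, 8)

gen = rng.normal(size=(3, 5, 8))  # placeholder for GAN output reshaped to matrices
X, y = expand_dataset(mats, ["sports", "sports"], gen, "sports")
print(X.shape, len(y))  # (5, 5, 8) 5
```

For Chinese Weibo posts, the token lists would first come from a word segmenter, and the embeddings from a word2vec model trained on the corpus; both are outside this sketch.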

Description

Technical field
[0001] The invention relates to big data analysis technology, and in particular to a method for expanding text classification datasets based on generative adversarial networks.
Background
[0002] With the rapid development of the Internet, the amount of information on the network is growing exponentially, and unstructured text data accounts for the largest share of it. Mining the information that users are interested in from this mass of text has therefore become increasingly important. Text classification is the basis of network text data mining: the quality of the classification result directly affects the effectiveness of text data mining. Building a highly accurate text classification algorithm is therefore a focus of network information mining research.
[0003] Scholars and experts in China and abroad have invested considerable time and effort in in-depth research on the text classification problem. Generally,...


Application Information

IPC(8): G06F17/30
Inventors: 崔晓晖, 田斐菡, 杨威, 关景, 曹佳敏, 唐艺豪, 李启琛
Owner: WUHAN UNIV