Cross-field text classification method based on self-adaptive noise reduction encoder

A text classification and cross-domain technology, applied to the cross-domain classification of network text data. It addresses problems such as an enlarged feature space, the difficulty of selecting meaningful features, and sensitivity to the noise coefficient.

Active Publication Date: 2018-11-20
HEFEI UNIV OF TECH
Cites: 6; Cited by: 5

AI Technical Summary

Problems solved by technology

Cross-domain classification tasks need to process the features of several different domains at the same time, which further enlarges the feature space and exacerbates the high-dimensional, sparse nature of text data. This makes it difficult to select meaningful features and poses a challenge for learning a common feature space for cross-domain classification.
[0007] Second, although the edge denoising encoder can learn a relatively robust feature space in cross-domain classification tasks, its learning results are sensitive to the noise coefficient.



Examples


Embodiment Construction

[0052] Referring to Figure 1, the cross-domain text classification method based on the adaptive noise reduction encoder in this embodiment is carried out in the following steps:

[0053] Step 1: Count the feature words and their frequencies of occurrence in the source and target domains.

[0054] Obtain the target domain data set DT and the source domain data set DS with label information respectively,

[0055]

[0056] t_i is the i-th sample in the target domain data set DT, n_t is the number of samples in DT, w^a_i denotes the a-th feature word in the i-th sample t_i of DT, a = 1, 2, ..., nw_t, and nw_t is the number of feature words of the samples in DT.

[0057] s_j is the j-th sample in the source domain data set DS, n_s is the number of samples in DS, w^b_j denotes the b-th feature word in the j-th sample s_j of DS, b = 1, 2, ..., nw ...
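Step 1 can be illustrated with a minimal Python sketch. It assumes each sample is already tokenized into a list of feature words and that "frequency" means the number of samples containing a word; the text above does not fix either point. The function names, the `min_count` threshold, and the rule of keeping only words that are frequent in both domains are hypothetical and are not taken from the patent.

```python
from collections import Counter


def document_frequency(samples):
    """Map each feature word to the number of samples that contain it."""
    freq = Counter()
    for words in samples:
        freq.update(set(words))  # count a word at most once per sample
    return freq


def frequent_in_both(source, target, min_count=2):
    """Hypothetical filter: keep words seen in at least min_count samples of each domain."""
    df_s = document_frequency(source)
    df_t = document_frequency(target)
    shared = df_s.keys() & df_t.keys()
    return sorted(w for w in shared
                  if df_s[w] >= min_count and df_t[w] >= min_count)


# Toy data sets standing in for DS (source) and DT (target).
D_S = [["good", "battery", "phone"], ["good", "screen"], ["bad", "battery"]]
D_T = [["good", "plot", "book"], ["bad", "plot"], ["good", "battery"]]

print(document_frequency(D_S)["good"])  # 2
print(frequent_in_both(D_S, D_T))       # ['good']
```

Counting per sample rather than per occurrence keeps one long document from dominating the statistics; a raw occurrence count would be an equally plausible reading of the step.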


Abstract

The invention discloses a cross-domain text classification method based on an adaptive noise reduction encoder. The method comprises: using a feature selection method suited to cross-domain tasks to filter out low-frequency and meaningless feature words from the samples in the source domain and target domain data sets; adaptively calculating a better noise interference coefficient according to the distribution difference between the samples of the source domain and target domain data sets; using this coefficient to perturb the feature space; using a stacked edge noise reduction encoder to construct a new feature space; and constructing a classifier. The method can better mine the latent feature relationships between domains and reduce the difference between domains, thereby improving classification accuracy.
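The feature-space construction step can be sketched as follows, under a loudly stated assumption: that the "stacked edge noise reduction encoder" corresponds to the marginalized stacked denoising autoencoder known from the literature, whose layers have a closed-form solution. The abstract does not say how the noise interference coefficient is computed adaptively, so the sketch takes it as a plain parameter `p`; the function names, the regularization constant, and the layer count are illustrative choices.

```python
import numpy as np


def mda_layer(X, p):
    """One marginalized denoising layer.

    X: array of shape (d, n), features by samples.
    p: corruption probability (the noise coefficient), 0 <= p < 1.
    """
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])   # append a constant bias row
    S = Xb @ Xb.T                          # scatter matrix, shape (d+1, d+1)
    q = np.full(d + 1, 1.0 - p)            # survival probability per feature
    q[-1] = 1.0                            # the bias row is never corrupted
    Q = S * np.outer(q, q)                 # expected corrupted scatter, off-diagonal
    np.fill_diagonal(Q, q * np.diag(S))    # diagonal entries use q_i, not q_i ** 2
    P = S[:d, :] * q                       # expected clean-by-corrupted product
    reg = 1e-5 * np.eye(d + 1)             # small ridge term for stability (assumed)
    W = np.linalg.solve((Q + reg).T, P.T).T  # W = P (Q + reg)^{-1}
    return np.tanh(W @ Xb)


def stacked_mda(X, p, layers=2):
    """Stack layers and concatenate the input with every layer's output."""
    feats, h = [X], X
    for _ in range(layers):
        h = mda_layer(h, p)
        feats.append(h)
    return np.vstack(feats)


rng = np.random.default_rng(0)
X = rng.random((4, 50))                    # toy data: 4 features, 50 samples
Z = stacked_mda(X, p=0.5, layers=2)
print(Z.shape)                             # (12, 50)
```

Because every layer depends on `p`, a poorly chosen value changes the whole learned representation, which is the sensitivity that the adaptive coefficient is meant to address. A classifier would then be trained on the source-domain columns of `Z` and applied to the target-domain columns.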

Description

Technical field

[0001] The invention relates to a cross-domain text classification method based on an adaptive noise reduction encoder for classifying network text data, and more specifically to the cross-domain classification of network text data from different fields with different data distributions.

Background technique

[0002] In recent years, with the rapid rise of online social platforms such as blogs, WeChat, and Weibo, a large amount of text information has been generated on the Internet. This massive data often contains huge potential commercial value. The information can be used to improve or upgrade products in a targeted manner, so as to meet consumer needs and increase market competitiveness, so that the products will be more favored by consumers. In view of this, research in related fields such as text classification has extremely important value and significance.

[0003] However, because the data in the network is affected by multiple factors such as users and ti...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F40/289
Inventors: 张玉红, 杨帅, 李玉玲, 李培培
Owner: HEFEI UNIV OF TECH