A Cross-Domain Text Classification Method Based on Adaptive Noise Reduction Encoder

A cross-domain text classification technology, applied to the cross-domain classification of network text data. It addresses the problems of an enlarged feature space, sensitivity to the noise coefficient, and the increased high dimensionality and sparsity of text data.

Active Publication Date: 2021-09-14
HEFEI UNIV OF TECH
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Cross-domain classification tasks need to process features from multiple different domains at the same time. This further enlarges the feature space and exacerbates the high-dimensional, sparse nature of text data, which makes it difficult to select meaningful features and poses challenges for learning a common feature space for cross-domain classification.
[0007] Second, although the edge denoising encoder can learn a relatively robust feature space in cross-domain classification tasks, its learning results are sensitive to the noise coefficient.

Examples

Embodiment Construction

[0052] Referring to Figure 1, the cross-domain text classification method based on the adaptive noise reduction encoder in this embodiment proceeds in the following steps:

[0053] Step 1: Count the feature words and their frequencies of occurrence in the source and target domains.

[0054] Obtain the target domain data set DT and the source domain data set DS with label information respectively,

[0055]

[0056] t_i is the i-th sample in the target domain data set DT, n_t is the number of samples in the target domain data set DT, w_a^i denotes the a-th feature word in the i-th sample t_i of the target domain data set DT, a = 1, 2, ..., nw_t, and nw_t is the number of feature words of the samples in the target domain data set DT.

[0057] s_j is the j-th sample in the source domain data set DS, n_s is the number of samples in the source domain data set DS, w_b^j denotes the b-th feature word in the j-th sample s_j of the source domain data set DS, b = 1, 2, ..., nw ...
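
The counting in Step 1 can be illustrated with a minimal Python sketch. It assumes each sample is already tokenized into a list of feature words; the function name `count_feature_words` and the toy data are hypothetical and do not come from the patent text.

```python
from collections import Counter


def count_feature_words(samples):
    """Return a Counter mapping each feature word to its total occurrences.

    `samples` is a list of samples; each sample is a list of feature words.
    """
    counts = Counter()
    for words in samples:
        counts.update(words)
    return counts


# Toy data (hypothetical): each inner list holds one sample's feature words.
source_samples = [["great", "battery", "life"], ["poor", "battery"]]
target_samples = [["great", "plot"], ["poor", "acting", "plot"]]

source_counts = count_feature_words(source_samples)
target_counts = count_feature_words(target_samples)

# Words that occur in both domains, listed in alphabetical order.
shared_words = sorted(set(source_counts) & set(target_counts))
print(source_counts["battery"], target_counts["plot"], shared_words)
```

Keeping one counter per domain makes it straightforward to compare how often a word appears in the source domain versus the target domain in later steps.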

Abstract

The invention discloses a cross-domain text classification method based on an adaptive noise reduction encoder. A feature selection method suited to cross-domain tasks is used to filter out low-frequency and meaningless feature words appearing in the samples of the source domain data set and the target domain data set. The optimal noise interference coefficient is then calculated adaptively according to the distribution difference between the samples in the source domain set and the target domain set, and this coefficient is used to perturb the feature space. Finally, a new feature space is built using the stacked edge denoising encoder approach, and a classifier is constructed. The invention can better mine the latent feature relationships between domains and reduce domain differences, thereby improving classification accuracy.
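
Two of the operations named in the abstract, filtering low-frequency feature words and perturbing the feature space with a noise coefficient, can be sketched in Python as follows. This is only an illustration under stated assumptions: the threshold `min_count` is a hypothetical filtering criterion, the perturbation is modeled as the masking noise commonly used with denoising encoders, and `noise_prob` is passed in as a plain parameter because the excerpt does not reproduce how the patent computes the optimal coefficient from the distribution difference.

```python
import random


def filter_low_frequency(samples, min_count):
    """Drop words whose total count across all samples is below min_count.

    min_count is a hypothetical threshold, not a value from the patent.
    """
    counts = {}
    for words in samples:
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    return [[w for w in words if counts[w] >= min_count] for words in samples]


def mask_features(vector, noise_prob, rng):
    """Masking noise: zero each feature independently with probability noise_prob."""
    return [0.0 if rng.random() < noise_prob else x for x in vector]


# Toy usage: pooled source and target samples, then one perturbed vector.
pooled = [["good", "battery", "zzz"], ["good", "plot"], ["battery", "plot"]]
filtered = filter_low_frequency(pooled, min_count=2)
print(filtered)

rng = random.Random(0)  # fixed seed so the corruption is repeatable
print(mask_features([1.0, 2.0, 3.0, 4.0], noise_prob=0.5, rng=rng))
```

With `noise_prob` at 0 the vector is returned unchanged and at 1 every feature is zeroed, so the coefficient directly controls how strongly the feature space is perturbed.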

Description

Technical field

[0001] The invention relates to a cross-domain text classification method based on an adaptive noise reduction encoder for classifying network text data; more specifically, it performs cross-domain classification of network text data from different fields with different data distributions.

Background art

[0002] In recent years, with the rapid rise of online social platforms such as blogs, WeChat, and Weibo, a large amount of text information has been generated on the Internet. This massive data often contains huge potential commercial value. The information can be used to improve or upgrade products in a targeted manner, so as to meet consumer needs and increase market competitiveness, and such products will be more favored by consumers. In view of this, research in related fields such as text classification has extremely important value and significance.

[0003] However, because the data in the network is affected by multiple factors such as users and ti...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/289
CPC: G06F40/289
Inventor: 张玉红, 杨帅, 李玉玲, 李培培
Owner: HEFEI UNIV OF TECH