Transfer learning method based on latent semantic analysis

A transfer learning technique based on latent semantic analysis, classified under special data processing applications, instruments, and electrical digital data processing. It addresses the problem that existing transfer learning methods do not consider the semantic correlation between the source domain and the target domain, which degrades transfer learning, and its stated effect is improved efficiency in terms of time complexity and space complexity.

Active Publication Date: 2013-06-26
HARBIN ENG UNIV
Cites: 2. Cited by: 11.

AI Technical Summary

Problems solved by technology

[0003] Existing studies on transfer learning methods mostly analyze data only at the vocabulary level and do not consider the semantic correlation between the source domain and the target domain. Some "noise" ...



Examples


Embodiment Construction

[0041] The present invention is described in more detail below with reference to the accompanying drawings:

[0042] With reference to Figure 1, the present invention comprises the following steps:

[0043] (1) Preprocess the training data by removing stop words, stemming, and so on, then calculate the vocabulary weights of the source domain and the target domain separately to obtain a vocabulary-text matrix.
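The preprocessing in step (1) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the stop-word list, the suffix-stripping `crude_stem` helper, and the function names are all assumptions made for the example, and the matrix here holds raw counts before any weighting is applied.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}

def crude_stem(word):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def term_document_counts(docs):
    """Return (vocabulary, matrix), where matrix[i][j] is the raw count
    of vocabulary term i in document j."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts)) if counts else []
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

docs = ["The cats are chasing mice", "A cat chased the mouse", "Dogs bark"]
vocab, matrix = term_document_counts(docs)
```

Rows index vocabulary terms and columns index texts, matching the vocabulary-text orientation described above. In the method itself, the source domain and target domain would each be processed this way before weighting.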

[0044] The vocabulary weight, denoted W(i,j), consists of two parts: the text contribution weight, denoted LW(i,j), and the class label contribution weight, denoted GET(i). The two weights are multiplied together to obtain the final vocabulary weight.

[0045] The text contribution weight emphasizes the importance of a given word in a given text. To reduce the impact of high-frequency words on latent semantic analysis, the text contribution weight can be defined using the logarithm of the word frequency:

[0046] LW(i,...
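The weighting scheme can be sketched as below. The exact formula for LW(i,j) is truncated in this text, so the sketch assumes the common log-frequency form log(1 + tf); that choice is an assumption, not the patent's definition. The formula for GET(i) is not given here at all, so the class label contribution weights are simply passed in by the caller. The function names are hypothetical.

```python
import math

def local_weight(tf):
    """Text contribution weight LW(i,j).
    Assumption: log(1 + tf), a common log-frequency weighting; the source's
    own formula is truncated and may differ."""
    return math.log(1.0 + tf)

def vocabulary_weights(counts, class_weights):
    """Compute W(i,j) = LW(i,j) * GET(i).

    counts[i][j]     raw frequency of term i in text j
    class_weights[i] class label contribution weight GET(i), supplied by the
                     caller because its formula is not given in this text
    """
    if len(counts) != len(class_weights):
        raise ValueError("need exactly one class weight per vocabulary term")
    return [
        [local_weight(tf) * g for tf in row]
        for row, g in zip(counts, class_weights)
    ]

# Two terms, two texts; the class weights here are made-up illustrative values.
W = vocabulary_weights([[0, 1], [3, 0]], [1.0, 0.5])
```

Taking the logarithm compresses large counts, so a word that appears 100 times is weighted only a few times more heavily than one that appears once, which is the damping effect on high-frequency words described above.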


Abstract

The invention provides a transfer learning method based on latent semantic analysis. The method includes the following steps: removing stop words from the training data and stemming it; calculating the vocabulary weights in the source domain and in the target domain separately to obtain a vocabulary-text matrix M; performing singular value decomposition on M; mapping the vocabulary and texts in M to a lower-dimensional latent semantic space; removing the effect of synonymy noise from the source domain; adjusting the structure of M; finding source-domain vocabulary that is highly associated with target-domain text to serve as transfer terms; adjusting the structure of M again; analyzing the target-domain vocabulary in the adjusted M to obtain a new feature representation of the target-domain data; training a final classifier on the training dataset; and classifying a testing dataset S.
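The singular value decomposition and latent-space mapping steps can be illustrated with the sketch below. It covers only those two steps; the noise removal, transfer-term selection, and matrix adjustment steps are not specified in enough detail here to sketch. To stay dependency-free it approximates the leading singular triplets by power iteration with deflation, which is a stand-in for a library SVD routine and not necessarily what the invention uses. The function names and the parameters `k` and `iters` are assumptions.

```python
import math

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def top_singular_triplets(M, k, iters=500):
    """Approximate the k largest singular triplets (sigma, u, v) of M by
    power iteration on M^T M, deflating M after each triplet is found."""
    A = [[float(x) for x in row] for row in M]
    triplets = []
    for _ in range(k):
        At = transpose(A)
        # Fixed start vector; assumed not orthogonal to the leading direction.
        v = [1.0 / (j + 1) for j in range(len(A[0]))]
        for _ in range(iters):
            w = matvec(At, matvec(A, v))
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        Av = matvec(A, v)
        sigma = math.sqrt(sum(x * x for x in Av))
        if sigma < 1e-12:  # remaining matrix is numerically zero
            break
        u = [x / sigma for x in Av]
        triplets.append((sigma, u, v))
        # Deflate: subtract the rank-one component just found.
        A = [[A[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return triplets

def latent_coordinates(triplets):
    """Map terms (rows) and texts (columns) into the k-dimensional latent
    space, scaling each axis by its singular value."""
    n_terms = len(triplets[0][1])
    n_docs = len(triplets[0][2])
    terms = [[s * u[i] for s, u, _ in triplets] for i in range(n_terms)]
    docs = [[s * v[j] for s, _, v in triplets] for j in range(n_docs)]
    return terms, docs

# Toy vocabulary-text matrix: texts 0 and 1 share the same two terms.
M = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
triplets = top_singular_triplets(M, k=2)
term_coords, doc_coords = latent_coordinates(triplets)
```

In this toy matrix the first two texts use identical terms, so they land on the same point in the latent space while the third text lies along a separate axis. That grouping of related texts is the property the later steps of the method rely on.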

Description

Technical field

[0001] The present invention relates to a machine learning method.

Background

[0002] With the development of the Internet, more and more information is stored on the network as text, which has become a source of information for people. Faced with a huge text library, people urgently need an efficient technical means to organize and classify the data in it. Machine learning studies how computers can simulate human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. However, machine learning rests on a very important assumption: the training data and the test data must follow the same distribution. This causes considerable difficulty in practical applications. When a new field emerges, the data in its sample space is often scarce and its features sparse. In this case, using traditional machine learning to classify the data will prod...


Application Information

IPC(8): G06F17/27
Inventor: 初妍, 陈曼, 夏琳琳, 沈洁, 张健沛, 杨静, 王勇, 高迪, 王兴梅, 李丽洁
Owner: HARBIN ENG UNIV