Method of transfer learning from long text to short text

A transfer learning technique for short text, applied in the field of transfer learning from long text to short text. It can address problems such as hard-to-obtain data, negative impact on the target task, and subjectivity errors, and it aims to improve classification accuracy, enrich the data, and reduce that negative impact.

Active Publication Date: 2013-09-25
HARBIN ENG UNIV

AI Technical Summary

Problems solved by technology

[0003] Some existing studies address transfer learning from long text to short text, but they often require source domain data related to the short text in the target domain. Acquiring that data and measuring domain correlation both depend on human judgment, so subjectivity introduces errors that negatively affect the target task. Other studies require the prior probability distribution of the data to be known before transfer learning from long text to short text, and this distribution is difficult to obtain in practice.




Embodiment Construction

[0035] Because the target domain data consists of short texts and only a small number of tags can be provided, the first task of the present invention is to expand the tag set of the target text; the result is called the seed feature set.

[0036] Step 1: Using the tags extracted from the short text in the target domain, obtain the source domain data through a search engine and extract the seed feature set of the source domain. This step comprises the following sub-steps:

[0037] Step 1.1: The present invention does not require source domain data to be prepared in advance. Instead, it makes full use of information available online: the tags extracted from the target domain are entered as keywords into a search engine, and the text of the first few pages of web page results is extracted as a source domain dataset semantically related to the target domain.
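Step 1.1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not name a search engine or an API, so `search_fn`, `max_pages`, and `fake_search` are hypothetical names, and the search itself is supplied by the caller rather than implemented here.

```python
from typing import Callable, Dict, List


def collect_source_corpus(
    tags: List[str],
    search_fn: Callable[[str, int], List[str]],
    max_pages: int = 3,
) -> Dict[str, List[str]]:
    """Collect page texts for each target-domain tag.

    search_fn(tag, page) is a hypothetical caller-supplied function that
    returns the plain text of the results on one result page.
    """
    corpus: Dict[str, List[str]] = {}
    for tag in tags:
        texts: List[str] = []
        for page in range(1, max_pages + 1):
            for text in search_fn(tag, page):
                cleaned = " ".join(text.split())  # collapse whitespace
                if cleaned and cleaned not in texts:  # skip empties and duplicates
                    texts.append(cleaned)
        corpus[tag] = texts
    return corpus


def fake_search(tag: str, page: int) -> List[str]:
    # Stand-in for a real search engine query (assumption, for illustration only).
    return [f"{tag} article  {page}", f"{tag} article  {page}"]


corpus = collect_source_corpus(["weibo", "advertising"], fake_search, max_pages=2)
```

Passing the search function in as a parameter keeps the sketch independent of any particular search engine, which matches the text's "a search engine" without committing to one.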

[0038] Step 1.2: Construct the word-text matrix $M = [a_{ij}]_{m \times n}$, where the value of $a_{ij}$ is the logarithm of the number of occ...
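A word-text matrix of this shape might be built as in the sketch below. The definition of $a_{ij}$ is truncated in the source, so the weighting used here, `log(1 + count)` of word i in text j, is an assumption chosen so that absent words map to 0; the function name `word_text_matrix` and the sample texts are likewise illustrative.

```python
import math
from collections import Counter
from typing import List, Tuple


def word_text_matrix(texts: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return (vocabulary, M) where M has one row per word and one column per text.

    Assumed weighting: M[i][j] = log(1 + occurrences of word i in text j).
    """
    vocab = sorted({word for text in texts for word in text})
    counts = [Counter(text) for text in texts]
    matrix = [[math.log(1 + c[word]) for c in counts] for word in vocab]
    return vocab, matrix


# Two tokenized toy texts (m = 3 words, n = 2 texts).
texts = [["short", "text", "text"], ["long", "text"]]
vocab, M = word_text_matrix(texts)
```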


Abstract

The invention relates to a method of transfer learning from long text to short text. The method comprises: step 1, obtaining data of a source domain according to tags extracted from a short text of a target domain, and extracting a seed feature set of the source domain; step 2, creating an undirected graph of social media according to the tag set of the short text of the target domain and the seed feature set of the source domain, and extracting from the undirected graph the subgraphs containing all the nodes of the tag set and the seed feature set of the target domain; step 3, obtaining a new feature representation of the data of the source domain on the basis of a Laplacian Eigenmap algorithm; and step 4, classifying the data of the source domain according to the new feature representation of the data of the source domain.
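To illustrate the kind of computation step 3 refers to, the sketch below computes a one-dimensional Laplacian Eigenmap embedding of a small undirected graph. It is not the patent's implementation: the toy graph, the function name `laplacian_eigenmap_1d`, the use of power iteration, and the sign-based grouping at the end are all assumptions made for a dependency-free example.

```python
import math
from typing import List


def laplacian_eigenmap_1d(W: List[List[float]], iters: int = 500) -> List[float]:
    """1-D Laplacian Eigenmap of a connected graph with symmetric weights W.

    Solves L y = lambda D y for the smallest non-trivial eigenvalue by power
    iteration on B = I + D^{-1/2} W D^{-1/2}, with the trivial eigenvector
    projected out at every step.
    """
    n = len(W)
    deg = [sum(row) for row in W]  # every node needs degree > 0
    s = [math.sqrt(d) for d in deg]
    B = [[(1.0 if i == j else 0.0) + W[i][j] / (s[i] * s[j]) for j in range(n)]
         for i in range(n)]
    total = math.sqrt(sum(deg))
    u0 = [si / total for si in s]  # trivial eigenvector D^{1/2} 1, unit length

    def project_and_normalize(x: List[float]) -> List[float]:
        dot = sum(a * b for a, b in zip(x, u0))
        x = [a - dot * b for a, b in zip(x, u0)]
        norm = math.sqrt(sum(a * a for a in x))
        return [a / norm for a in x]

    x = project_and_normalize([float(i) for i in range(n)])
    for _ in range(iters):
        x = project_and_normalize(
            [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        )
    return [x[i] / s[i] for i in range(n)]  # map back: y = D^{-1/2} v


# Toy graph (assumption): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
W = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    W[a][b] = W[b][a] = 1.0

embedding = laplacian_eigenmap_1d(W)
groups = [int(v > 0) for v in embedding]  # crude stand-in for the classifier in step 4
```

In this toy case the sign of the embedding coordinate separates the two triangles, which shows how the new representation places strongly connected nodes close together. A practical version would keep several eigenvectors and use a linear algebra library rather than power iteration.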

Description

Technical field

[0001] The invention relates to a method of transfer learning from long text to short text.

Background art

[0002] With the rapid development of science and technology, information on the Internet is becoming increasingly diverse, and short texts such as Weibo posts, QQ messages, and online advertisements play an increasingly important role in network applications. Short text data has few keywords, lacks context information, and yields high-dimensional, sparse text representations, so it is difficult to express text features fully and accurately. When the target domain consists of short text data with only a small amount of labeled data, computing statistics over and classifying short text data becomes a major problem. Long texts, by contrast, are longer, so their context carries more keywords related to the subject of the text; long texts also appeared earlier on the network, and their classification tec...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 初妍, 陈曼, 夏琳琳, 沈洁, 王勇, 杨悦, 张健沛, 杨静, 赵芳丹
Owner: HARBIN ENG UNIV