A Chinese Text Feature Extraction Method Fused with Text Mood

A text feature extraction technology, applied in the field of Chinese text feature extraction, that addresses the problems of high cost, sparseness, and high dimensionality of text representation.

Inactive Publication Date: 2020-10-16
YUNNAN UNIV

AI Technical Summary

Problems solved by technology

[0002] The number of texts from the Internet, e-commerce and other fields increases sharply every day. It will cost a lot of money to process and understand these massive text data manually.
To mine useful knowledge patterns from massive texts quickly and efficiently, it is better to process and understand texts with artificial intelligence techniques. The key to intelligent analysis of massive texts is to represent the semantic features of texts effectively. The most commonly used text representation method is the Bag of Words (BOW) model. Although the bag-of-words model is simple and practical, the resulting text representation is often high-dimensional and sparse.
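
The dimensionality and sparsity problem of the bag-of-words model can be illustrated with a minimal sketch. The toy corpus below is an assumption made for illustration (three short, already segmented sentences); it is not taken from the patent. Each document becomes a count vector with one dimension per distinct word in the corpus, so most entries are zero even at this tiny scale.

```python
# Toy corpus (assumed for illustration): each document is already segmented into words.
docs = [
    ["这", "本", "书", "很", "好", "吗"],
    ["这", "部", "电影", "很", "好", "啊"],
    ["物流", "太", "慢", "了", "吧"],
]

# Vocabulary: one vector dimension per distinct word in the whole corpus.
vocab = sorted({w for doc in docs for w in doc})
index = {w: i for i, w in enumerate(vocab)}


def bow_vector(doc):
    """Return the count-based bag-of-words vector of one document."""
    vec = [0] * len(vocab)
    for w in doc:
        vec[index[w]] += 1
    return vec


def sparsity(vec):
    """Return the fraction of zero entries in a vector."""
    return sum(1 for x in vec if x == 0) / len(vec)


vectors = [bow_vector(d) for d in docs]

for vec in vectors:
    # Print the dimensionality and the share of zero entries per document.
    print(len(vec), round(sparsity(vec), 2))
```

The vector length grows with the vocabulary of the whole corpus, while each document only fills in the few dimensions of the words it contains. Word order is also lost, which is one reason dense, low-dimensional embeddings are preferred for representing semantics.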

Examples

Embodiment Construction

[0044] Specific embodiments of the present invention will be described below in conjunction with the accompanying drawings, so that those skilled in the art can better understand the present invention.

[0045] As shown in Figure 1, a Chinese text feature extraction method fused with text mood comprises: (1) generation of the text word set and the mood word set: the word set and the mood word set of each text are generated from the text collection; (2) word embedding model construction: text feature vectors and modal particle feature vectors are obtained by training Skip-gram and CBOW models; (3) text word representation model construction: the contextual semantic features of the words in each text are generated by a Bi-LSTM layer and combined with the initialized word vectors to form the local feature vectors of the text, after which the global features of the text are obtained through 2-dimensional convolution and 1-dimensional pooling; (4) text representation model construction
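
The data side of steps (1) and (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the modal particle list `MOOD_WORDS`, the decision to keep particles out of the ordinary word list, the window size, and the helper names `split_text`, `skipgram_pairs` and `cbow_examples` are all hypothetical. The sketch only builds training examples; it does not train an embedding model.

```python
# Hypothetical list of Chinese modal particles; the patent's actual list is not given here.
MOOD_WORDS = {"吗", "呢", "吧", "啊", "呀", "嘛"}


def split_text(tokens):
    """Step (1), assumed form: split a segmented text into words and mood words."""
    words = [t for t in tokens if t not in MOOD_WORDS]
    moods = [t for t in tokens if t in MOOD_WORDS]
    return words, moods


def skipgram_pairs(tokens, window=2):
    """Skip-gram training pairs: predict each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW training examples: predict the center word from its context words."""
    examples = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        context = tokens[lo:i] + tokens[i + 1:i + 1 + window]
        examples.append((context, target))
    return examples


tokens = ["这", "本", "书", "很", "好", "吗"]
words, moods = split_text(tokens)
pairs = skipgram_pairs(words, window=1)
cbow = cbow_examples(words, window=1)

print(words, moods)
print(pairs[:3])
print(cbow[:2])
```

Skip-gram and CBOW consume the same windows in opposite directions: Skip-gram predicts the context from the center word, and CBOW predicts the center word from the context.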

[0046] T...

Abstract

The invention discloses a Chinese text feature extraction method fused with text mood. The method obtains, in a lengthened text, a text feature representation that fuses mood features, syntactic features and semantic features. First, a text word set and a mood word set are constructed, and each is transformed into word embedding form to obtain the corresponding vector models. Second, text features are screened along the time-step dimension and the feature dimension of the text word embedding representation, and the mood features are fused into the time-step dimension of the selected text features, yielding a text feature representation that accurately represents the semantics. The method makes full use of the contribution of modal particles to text semantics by fusing the mood features, syntactic features and semantic features into the text feature representation. The resulting representation is low-dimensional and continuous, so it represents text semantics better and more effectively supports natural language processing tasks such as text analysis, language translation and relation extraction.
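
One possible reading of "fused into the time-step dimension" is that the mood vectors are appended as extra time steps to the selected text feature matrix. The sketch below illustrates that reading only. The screening rule (keep the k time steps with the largest L2 norm), the function names `select_time_steps` and `fuse_mood`, and the numeric values are assumptions made for illustration; the source does not specify them.

```python
import math


def select_time_steps(features, k):
    """Keep the k time steps with the largest L2 norm, in their original order.

    Hypothetical screening rule; the source does not state how features are screened.
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in features]
    ranked = sorted(range(len(features)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:k])
    return [features[i] for i in keep]


def fuse_mood(text_features, mood_features):
    """Append mood vectors as extra time steps; feature dimensions must match."""
    dim = len(text_features[0])
    if any(len(row) != dim for row in mood_features):
        raise ValueError("mood vectors must have the same feature dimension as text vectors")
    return text_features + mood_features


# Rows are time steps (words), columns are feature dimensions (made-up values).
text_features = [
    [0.1, 0.2, 0.0],
    [0.9, 0.4, 0.3],
    [0.0, 0.1, 0.0],
    [0.5, 0.5, 0.5],
]
mood_features = [[0.2, 0.7, 0.1]]  # one modal particle vector

selected = select_time_steps(text_features, k=2)
fused = fuse_mood(selected, mood_features)

print(len(fused), len(fused[0]))
```

Under this reading the fused matrix keeps the feature dimension unchanged and grows only along the time-step axis, so downstream layers that operate over time steps see the mood vectors alongside the selected word features.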

Description

Technical field

[0001] The invention belongs to the field of natural language processing and relates to a Chinese text feature extraction method fused with text mood. Based on massive Chinese texts, Chinese mood features are integrated into the text features to better represent the semantics of Chinese text.

Background technique

[0002] The number of texts from the Internet, e-commerce and other fields increases sharply every day. Processing and understanding these massive text data manually is costly, and the losses outweigh the gains. To mine useful knowledge patterns from massive texts quickly and efficiently, it is better to process and understand texts with artificial intelligence techniques. The key to intelligent analysis of massive texts is to represent the semantic features of texts effectively. The most commonly used text representation method is the Bag of Words (BOW) model. Although the bag of wor...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/30
CPC: G06F40/30
Inventor: 郭延哺, 金宸, 姬晨, 邓春云, 李维华, 王顺芳
Owner YUNNAN UNIV