Looking for breakthrough ideas for innovation challenges? Try Patsnap Eureka!

Text sentiment analysis method based on five stroke type code character level language model

A language model and Wubi font technology, applied in natural language data processing, electrical digital data processing, special data processing applications, etc., can solve problems such as unsatisfactory experimental results, achieve excellent Chinese emotion classification effects, and save labor costs. Effect

Inactive Publication Date: 2018-09-28
CHENGDU REMARK TECH CO LTD +1
View PDF4 Cites 4 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

However, unlike English, in the case of no word segmentation, inputting each Chinese character as a character into a character-level language model will cause many problems, and the experimental results are not ideal compared with English corpus

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more

Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • Text sentiment analysis method based on five stroke type code character level language model
  • Text sentiment analysis method based on five stroke type code character level language model
  • Text sentiment analysis method based on five stroke type code character level language model

Examples

Experimental program
Comparison scheme
Effect test

Embodiment 1

[0064] A text sentiment analysis method based on the character-level language model of Wubi font code, such as figure 1 As shown, it mainly includes the following steps:

[0065] Step E1: Preprocessing the sample text, converting the Chinese characters in the sample text into Wubi font codes;

[0066] Step E2: Build an index dictionary for the characters in the preprocessed sample text in step E1, and map them into random dense vectors of different fixed lengths; the sample text is finally converted into a sequence of character vectors for training character-level language models ;

[0067] Step E3: use the mLSTM model to train a character-level language model; use the character vector at the previous moment in the character vector sequence generated in step E2 to predict the character vector at the next moment in the sequence;

[0068] Step E4: mark the emotional category of the sample text, and use the character-level language model in step E3 to extract the features of th...

Embodiment 2

[0072] This embodiment is further optimized on the basis of Embodiment 1, an intermediate state variable m is introduced into the mLSTM model t , the m t The calculation formula is as follows:

[0073] m t =(W mx x t )⊙(W mh x t-1 ) (1)

[0074] In the mLSTM model The calculation formula is as follows:

[0075]

[0076] i t =σ(W ix x t +W im m t ) (3)

[0077] o t =σ(W ox x t +W om m t ) (4)

[0078] f t =σ(W fx x t +W fm m t ) (5)

[0079]

[0080] h t =o t ⊙tanh(C t ) (7)

[0081] The mLSTM model can handle more complex state transitions between consecutive characters, forming a flexible input-dependent processing mechanism.

[0082] The present invention converts Chinese characters in Chinese text into Wubi glyph codes, and the character-level language model retains more original information; the present invention uses the mLSTM model to train the character-level language model, which can handle more complex state transitions between con...

Embodiment 3

[0085] The present embodiment is further optimized on the basis of embodiment 2. In the step E4, the emotion is divided into two kinds of emotion categories, positive emotion and negative emotion, and the sample text is selected and marked with the emotion category; the sample text is input into the characters in the step E3 Level language model, classification algorithm uses logistic regression, the formula of described logistic regression is as follows:

[0086] y=Logit(tanh(C T )) (8)

[0087] Among them, assuming that there are T moments, the memory neuron C of the last moment T T is a summary of the sample text, and tan(C T ) as features for sentiment classification to train a binary classifier.

[0088] The present invention converts Chinese characters in Chinese text into Wubi glyph codes, and the character-level language model retains more original information; the present invention uses the mLSTM model to train the character-level language model, which can handle m...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

PUM

No PUM Login to View More

Abstract

The invention discloses a text sentiment analysis method based on a five stroke type code character level language model. In the method, Chinese characters in a Chinese text are converted into five stroke type codes, and a character level language model keeps more original information. An mLSTM (Multiplicative Long Short-Term Memory) model is used for training the character level language model, more complex state transition among continuous characters can be processed, and a flexible input dependency processing mechanism is formed. The five stroke type code is used for training the characterlevel language model, a sentiment neuron extracted through the character level language model is taken as the characteristic of a Chinese sentiment classifier, and an excellent Chinese sentiment classification effect is realized. Meanwhile, the method can finish model training without a great quantity of manual annotation, a great quantity of labor cost can be saved, and the method can be suitablefor sentiment classification of different fields.

Description

technical field [0001] The invention belongs to the technical field of natural language processing, and in particular relates to a text emotion analysis method based on a character-level language model of a Wubi font code. Background technique [0002] Automatic analysis of text sentiment data has always been an important research and application technology field of artificial intelligence and natural language processing technology. Its main function is to automatically analyze and learn the emotional expression contained in text data through natural language processing technology and emotional computing technology. , classify sentences into positive, negative, and neutral categories according to their emotional expressions. With the continuous development of information technology and the continuous deepening of the application scope of information systems, automatic sentiment analysis of text data is an important technical measure to improve the production efficiency and c...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

Application Information

Patent Timeline
no application Login to View More
IPC IPC(8): G06F17/30G06F17/27
CPCG06F40/242
Inventor 蒋欣辰
Owner CHENGDU REMARK TECH CO LTD
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products