
A character interest extraction method based on a long text

A long-text-based method for extracting a person's interests, applied to personalized recommendation for social media users. It addresses the inability to classify topics of interest and the inability to obtain a topic vector space, and improves the accuracy of the extraction results.

Status: Inactive. Publication date: 2019-05-28
四川易诚智讯科技有限公司 +1

AI Technical Summary

Problems solved by technology

However, the prior approach cannot map results to a predefined classification of interest topics, and cannot obtain a low-dimensional topic vector space.



Examples


Embodiment Construction

[0042] First, the prior art involved in the present invention is briefly described:

[0043] 1. Word2Vec word vector model

[0044] The Word2Vec word vector model is a neural network probabilistic language model. It comes in two variants: the CBOW model and the Skip-gram model. As shown in Figure 1, the CBOW model is on the left and the Skip-gram model is on the right. Both models have three layers: the input layer, the projection layer and the output layer, labeled input, projection and output in Figure 1. The former predicts the probability of the current word given its context words, and the latter predicts the probability of the context words given the current word. The following mainly introduces the CBOW model.
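The difference between the two variants can be illustrated by the training examples each one draws from the same sentence. The following is a minimal sketch using only the Python standard library; the function names, the window size and the sample sentence are illustrative assumptions and do not come from the patent.

```python
from typing import List, Tuple


def context_window(tokens: List[str], i: int, window: int) -> List[str]:
    """Return the words within `window` positions of index i, excluding i itself."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the context words are the input, the current word is the target."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: the current word is the input, each context word is a target."""
    return [(tokens[i], ctx)
            for i in range(len(tokens))
            for ctx in context_window(tokens, i, window)]


# Hypothetical sample sentence, already tokenized.
tokens = ["users", "share", "travel", "photos", "online"]
cbow = cbow_examples(tokens, window=1)
skip = skipgram_examples(tokens, window=1)

print(cbow[1])   # context words -> current word
print(skip[:3])  # current word -> one context word per pair
```

CBOW produces one example per position with the whole context as input, while Skip-gram produces one example per (current word, context word) pair, so it yields more examples from the same text.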

[0045] The input layer of the CBOW model inputs the word vectors of a...


Abstract

The invention discloses a character interest extraction method based on a long text. The method comprises the following steps: S1, preprocessing the text; S2, extracting keywords from the text processed in step S1 using TextRank and RAKE respectively; S3, applying a trained LDA model to the text processed in step S1 to obtain the topic distribution of the text; S4, converting each keyword extracted in step S2 into a word vector using a trained Word2Vec model, and adding the word vectors corresponding to the keywords to the text feature vector; S5, adding the topic distribution obtained in step S3 to the text feature vector obtained in step S4; S6, extracting the character interest classification result for the text using a trained binary classification support vector machine. The method can effectively improve the accuracy of the final extraction result.
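The feature assembly in steps S2, S4 and S5 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract does not say how the TextRank and RAKE keywords are merged or how the keyword vectors are combined, so the sketch assumes an order-preserving union and a simple average, and it appends the topic distribution by concatenation. The lookup table, keywords and topic probabilities are hypothetical toy values standing in for the trained Word2Vec and LDA models.

```python
from typing import Dict, List, Sequence


def merge_keywords(textrank_kw: Sequence[str], rake_kw: Sequence[str]) -> List[str]:
    """Union of both keyword lists, keeping first-seen order (assumed merge rule)."""
    seen = set()
    merged = []
    for kw in list(textrank_kw) + list(rake_kw):
        if kw not in seen:
            seen.add(kw)
            merged.append(kw)
    return merged


def keyword_vector(keywords: Sequence[str],
                   word_vectors: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of keywords present in the lookup table (assumed pooling)."""
    total = [0.0] * dim
    hits = 0
    for kw in keywords:
        vec = word_vectors.get(kw)
        if vec is None:
            continue  # keyword missing from the vocabulary is skipped
        total = [t + v for t, v in zip(total, vec)]
        hits += 1
    return [t / hits for t in total] if hits else total


def text_feature_vector(keywords: Sequence[str],
                        word_vectors: Dict[str, List[float]],
                        dim: int,
                        topic_distribution: Sequence[float]) -> List[float]:
    """S4 + S5: keyword vector followed by the topic distribution."""
    return keyword_vector(keywords, word_vectors, dim) + list(topic_distribution)


# Hypothetical stand-ins for the outputs of TextRank, RAKE, Word2Vec and LDA.
word_vectors = {"travel": [1.0, 0.0], "photos": [0.0, 1.0]}
keywords = merge_keywords(["travel", "photos"], ["photos", "hiking"])
topics = [0.7, 0.2, 0.1]
features = text_feature_vector(keywords, word_vectors, dim=2, topic_distribution=topics)
print(keywords)
print(features)
```

The resulting vector has a fixed length (word vector dimension plus number of topics), which is the form a classifier such as the support vector machine in step S6 expects as input.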

Description

Technical field

[0001] The invention belongs to the field of data mining, and in particular relates to a personalized recommendation technology for social media users.

Background

[0002] With the rapid development of computer and Internet application technology, humanity has leaped from an era in which information was hard to access to an era of information overload in just a few decades. In this era of information explosion, people can access news and other information through a wide variety of applications, but the explosive growth of information makes it harder for consumers to find the information they are interested in. Consumers end up overwhelmed by a large volume of information and unable to find what really interests them. For producers, the problem of how to make their products stand out from the many competing products is even more serious.

[0003] In recent years, personalized recommendation has become more and more popu...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/9535, G06F16/35
Inventor: 占梦来, 张磊, 张军, 罗爽, 邹佩良
Owner: 四川易诚智讯科技有限公司