LDA (latent dirichlet allocation) and VSM (vector space model) based similar Chinese herb literature recommendation method

A recommendation method for Chinese herbal medicine literature, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problem that text-mining methods cannot find similarity at the hidden semantic level of the text, and achieves fast and efficient similarity recommendation.

Active Publication Date: 2014-05-28
ZHEJIANG UNIV
Cites: 3; Cited by: 35

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to overcome the defect of the above-mentioned existing methods, which can only perceive the superficial meaning of the text and cannot mine similarity at the hidden semantic level, by providing a method for recommending similar Chinese herbal medicine documents based on LDA and VSM.



Examples


Embodiment Construction

[0016] The LDA- and VSM-based method for recommending similar Chinese herbal medicine documents according to the present invention comprises the following steps:

[0017] 1. For each document in the established Chinese herbal medicine literature database, IKAnalyzer is used, on the basis of a dedicated Chinese herbal medicine dictionary, to segment the document; stop words, adjectives, prepositions, and other uninformative terms are filtered out, and verbs and nouns are kept. After word segmentation is completed, the word vector space of the entire Chinese herbal medicine literature database is constructed, and each word in it is numbered in sequence to obtain a mapping dictionary.
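A minimal Python sketch of step 1 follows. IKAnalyzer is a Java library and is not called here; the sketch assumes each document has already been segmented into (word, part-of-speech) pairs. The tag codes, the `STOP_WORDS` set, the helper names, and the toy corpus are hypothetical, and numbering words in order of first appearance is only one possible scheme.

```python
# Sketch of step 1. Assumption: documents are already segmented (IKAnalyzer, a Java
# library, is not called here) into (word, pos) pairs with hypothetical tags:
# "n" noun, "v" verb, "a" adjective, "p" preposition, "u" particle.

STOP_WORDS = {"的", "和", "研究"}  # hypothetical stop-word list
KEEP_POS = {"n", "v"}             # step 1 keeps only nouns and verbs


def filter_terms(tagged_doc):
    """Drop stop words and any term that is not tagged as a noun or verb."""
    return [w for w, pos in tagged_doc if pos in KEEP_POS and w not in STOP_WORDS]


def build_mapping_dictionary(tagged_corpus):
    """Assign each distinct retained word a sequential number (order of first appearance)."""
    mapping = {}
    for tagged_doc in tagged_corpus:
        for word in filter_terms(tagged_doc):
            if word not in mapping:
                mapping[word] = len(mapping)
    return mapping


corpus = [
    [("人参", "n"), ("的", "u"), ("补气", "v"), ("功效", "n")],
    [("黄芪", "n"), ("补气", "v"), ("显著", "a"), ("在", "p"), ("临床", "n")],
]
mapping = build_mapping_dictionary(corpus)
print(mapping)  # {'人参': 0, '补气': 1, '功效': 2, '黄芪': 3, '临床': 4}
```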

[0018] 2. Each document is vectorized based on the mapping dictionary to form a parameterized word vector, and the word vectors of all documents are then combined into a "document-word" matrix.
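Step 2 can be sketched as below. The source does not define "parameterized word vector"; this sketch assumes it means a term-frequency vector indexed by the mapping dictionary (the abstract refers to term-frequency-based vectors). The function names and toy data are hypothetical.

```python
from collections import Counter


def vectorize(doc_terms, mapping):
    """Turn a filtered term list into a dense term-frequency vector indexed by the mapping dictionary."""
    vec = [0] * len(mapping)
    for word, count in Counter(doc_terms).items():
        if word in mapping:  # ignore terms outside the mapping dictionary
            vec[mapping[word]] = count
    return vec


def document_word_matrix(corpus_terms, mapping):
    """Stack the vectors of all documents: one row per document, one column per word."""
    return [vectorize(doc, mapping) for doc in corpus_terms]


mapping = {"人参": 0, "补气": 1, "功效": 2, "黄芪": 3, "临床": 4}
corpus_terms = [["人参", "补气", "功效", "人参"], ["黄芪", "补气", "临床"]]
matrix = document_word_matrix(corpus_terms, mapping)
print(matrix)  # [[2, 1, 1, 0, 0], [0, 1, 0, 1, 1]]
```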

[0019] 3. For the "document-word" matrix, set the hyperparameters α and β and train the LDA topic model, a...
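The abstract states that training uses LDA with Gibbs sampling. The sketch below is a textbook collapsed Gibbs sampler in plain Python, not the patent's implementation; the function name `lda_gibbs`, the iteration count, the random seed, and the toy matrix are assumptions. It returns θ, each document's estimated distribution over topics, computed as (n_dk + α) / (n_d + Kα).

```python
import random


def lda_gibbs(matrix, num_topics, alpha, beta, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a document-word count matrix.

    Returns theta, where theta[d][k] estimates P(topic k | document d).
    """
    rng = random.Random(seed)
    num_docs, vocab_size = len(matrix), len(matrix[0])
    # Expand each row of counts into a token list of word ids.
    docs = [[w for w, c in enumerate(row) for _ in range(c)] for row in matrix]

    n_dk = [[0] * num_topics for _ in range(num_docs)]    # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens assigned to each topic

    # Random initial topic assignment for every token.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Posterior-mean estimate of each document's topic distribution.
    return [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]


matrix = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
theta = lda_gibbs(matrix, num_topics=2, alpha=0.5, beta=0.1, iterations=100)
for row in theta:
    print([round(p, 3) for p in row])
```

Because α > 0, every entry of θ is strictly positive, which keeps a later KL-divergence comparison finite.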



Abstract

The invention discloses an LDA (latent Dirichlet allocation) and VSM (vector space model) based method for recommending similar Chinese herb literature. The method includes: using IKAnalyzer, on the basis of a terminology dictionary for Chinese herbs, to perform word segmentation on the titles and abstracts of the documents; constructing a vector space and reducing its dimensionality; constructing a semantic dictionary and numbering all lexical items in it in sequence; vectorizing each document on the basis of the semantic dictionary to construct its term vector; training with LDA and the Gibbs sampling algorithm to obtain each document's probability distribution over topics; computing the similarity between every two documents by means of KL divergence; computing the cosine similarity of the documents' term vectors on the basis of term frequency; and jointly weighting the two kinds of similarity before sorting by similarity and making recommendations. With this method, documents in the Chinese herb literature that are similar in both content and topic can be recommended to users, and the recommendation results are closer to user requirements.
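The final stage described in the abstract (KL-based topic similarity, term-frequency cosine similarity, joint weighting, then sorting) can be sketched as follows. The abstract does not say how KL divergence, which is asymmetric and unbounded, is converted into a similarity, nor what weights are used; the symmetrised KL mapped through 1 / (1 + D), the default weight of 0.5, and all function names are assumptions.

```python
import math


def kl_divergence(p, q):
    """D(p || q). Assumes q > 0 wherever p > 0, which holds for LDA theta when alpha > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def topic_similarity(p, q):
    # Assumption: symmetrise KL and map the distance into (0, 1]; the source does not specify this.
    d = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return 1.0 / (1.0 + d)


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query, theta, tf, weight=0.5, top_n=3):
    """Rank other documents by weight * topic similarity + (1 - weight) * TF cosine similarity."""
    scored = []
    for j in range(len(theta)):
        if j == query:
            continue
        score = (weight * topic_similarity(theta[query], theta[j])
                 + (1 - weight) * cosine_similarity(tf[query], tf[j]))
        scored.append((score, j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:top_n]]


theta = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]              # toy topic distributions
tf = [[2, 1, 1, 0, 0], [1, 1, 0, 1, 0], [0, 0, 0, 1, 1]]  # toy term-frequency vectors
print(recommend(0, theta, tf, top_n=2))  # [1, 2]
```

Setting `weight` closer to 1 favours documents with similar topic mixtures even when they share few literal terms; closer to 0 falls back to plain VSM matching.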

Description

Technical field

[0001] The invention relates to the technical field of computer-based similar-document recommendation, and in particular to a method for recommending similar Chinese herbal medicine documents based on LDA (Latent Dirichlet Allocation) and VSM (Vector Space Model).

Background art

[0002] When users search for documents and view detailed information, they are often not satisfied with the information provided by a single document and also want to view other documents with similar content. It is therefore necessary to recommend to the user documents whose content is similar to that of the current document.

[0003] Most traditional document similarity recommendation methods compute similarity from the literal text content. For example, TF-IDF-based similarity calculation is a very common method, but this type of algorithm has some defects, such as only being able to perceive the superficial meaning of the text, and unab...
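To illustrate the limitation of literal matching described in [0003], the sketch below computes a plain TF-IDF cosine similarity (raw count × log(N / df), one common variant among several). The toy documents are hypothetical: 补气 and 益气 are near-synonyms for tonifying qi, yet because the first two documents share no literal term their similarity is exactly zero.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Plain TF-IDF over pre-tokenised documents: raw term count * log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vectors


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [["人参", "补气"], ["黄芪", "益气"], ["人参", "栽培"]]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # 0.0: related meaning, no shared terms
```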


Application Information

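Patent Type & Authority: Applications (China)
IPC (8): G06F17/30; G06F17/27
CPC: G06F16/3322; G06F16/334
Inventors: 张引 (Zhang Yin), 魏宝刚 (Wei Baogang), 庄越挺 (Zhuang Yueting), 凌超 (Ling Chao), 申晨 (Shen Chen), 张月娇 (Zhang Yuejiao)
Owner: ZHEJIANG UNIV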