Text recommendation method and device based on contents and user behaviors

A text recommendation method, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses three problems: search results that do not meet users' individual needs, unresolved information overload, and identical results being returned for different search needs.

Status: Inactive
Publication Date: 2016-12-21
LANGCHAO ELECTRONIC INFORMATION IND CO LTD

AI Technical Summary

Problems solved by technology

However, current search engines can only perform matching searches based on the characters entered by users. When using the same keyword to search for information, the results obtained are the same, and different results cannot be obtained according to diverse search needs.
On the other hand, informatio


Examples


Embodiment 1

[0052] As shown in Figure 1, the method provided by Embodiment 1 of the present invention includes the following steps:

[0053] Step S110, obtain the document collection to be analyzed, and perform Chinese word segmentation on the documents in the collection to obtain multiple terms.

[0054] Step S111, calculate the information gain of the terms in the document collection, sort the terms by information gain, and select multiple terms as the reference vector.

[0055] Step S112, according to the reference vector, convert the text in the document collection into a multi-dimensional space vector model.

[0056] Step S113, perform TF-IDF calculation on the space vector model to obtain a text vector matrix.

[0057] Step S114, calculate the similarity between different text vector matrices to form a document relationship matrix.

[0058] Step S115, analyzing the user behavior data, combining with the document relationship matrix, forming a recom...
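
Steps S111 to S114 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the documents are already segmented into terms, that each document carries a category label (the text does not say which classes the information gain is computed against), that TF-IDF uses the plain `tf * log(N / df)` form, and that similarity is cosine similarity. All function names and the sample data are hypothetical.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(docs, labels, term):
    """Label entropy minus the entropy remaining after splitting on term presence."""
    present = [lab for doc, lab in zip(docs, labels) if term in doc]
    absent = [lab for doc, lab in zip(docs, labels) if term not in doc]
    remainder = sum(
        len(part) / len(docs) * entropy(Counter(part).values())
        for part in (present, absent) if part
    )
    return entropy(Counter(labels).values()) - remainder

def select_reference(docs, labels, k):
    """S111: rank every term by information gain and keep the top k."""
    vocab = sorted({term for doc in docs for term in doc})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]

def tfidf_matrix(docs, reference):
    """S112-S113: one TF-IDF row per document, one column per reference term."""
    n = len(docs)
    df = {t: sum(1 for doc in docs if t in doc) for t in reference}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([
            (tf[t] / len(doc)) * math.log(n / df[t]) if df[t] else 0.0
            for t in reference
        ])
    return rows

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relationship_matrix(vectors):
    """S114: pairwise similarity between all document vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

# Hypothetical pre-segmented documents with assumed category labels.
docs = [
    ["cloud", "server", "storage"],
    ["cloud", "server", "network"],
    ["football", "match", "goal"],
    ["football", "goal", "league"],
]
labels = ["it", "it", "sports", "sports"]

reference = select_reference(docs, labels, k=4)
relations = relationship_matrix(tfidf_matrix(docs, reference))
```

Keeping only the highest-gain terms is what reduces the dimensionality of the vector space: each document becomes a vector with `k` entries rather than one entry per vocabulary term.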

Embodiment 2

[0088] The flow of the method for text recommendation based on content and user behavior provided in Embodiment 2 of the present invention is shown in Figure 2 and includes:

[0089] Step S210, obtain the initial document set, which comes from an RDBMS or from text.

[0090] Step S211, use a tokenizer to perform Chinese word segmentation on the initial document set.

[0091] Step S212, use a noun filter to filter nouns and obtain a noun set.

[0092] Step S213, compute document frequency statistics and store them in redis, then proceed to steps S214 and S217.

[0093] Step S214, build an inverted index and store the index result in redis, then proceed to step S221.

[0094] Step S215, compute word frequency statistics and store them in redis, then proceed to steps S216 and S217.

[0095] Step S216, build a forward index of the documents, then proceed to step S219.

[0096] Step S217, perform the IG calculation.

[0097] In step S218, the feature wo...
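
The statistics and indexes of steps S213 to S216 can be sketched in Python as follows. The source stores these structures in redis; the sketch uses in-memory dictionaries as a stand-in so that it runs without a server. The function name `build_indexes`, the input shape, and the layout of the forward index are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def build_indexes(noun_docs):
    """Build the four structures from a mapping of document id -> list of nouns.

    The input is assumed to be the output of segmentation and noun filtering
    (S211-S212).
    """
    doc_freq = Counter()          # S213: number of documents containing each term
    inverted = defaultdict(set)   # S214: term -> ids of the documents containing it
    term_freq = {}                # S215: document id -> per-term occurrence counts
    for doc_id, nouns in noun_docs.items():
        counts = Counter(nouns)
        term_freq[doc_id] = counts
        for term in counts:
            doc_freq[term] += 1
            inverted[term].add(doc_id)
    # S216: forward index, document id -> sorted list of its distinct terms
    forward = {doc_id: sorted(counts) for doc_id, counts in term_freq.items()}
    return doc_freq, dict(inverted), term_freq, forward

# Hypothetical noun sets for two documents.
noun_docs = {
    "d1": ["server", "cloud", "server"],
    "d2": ["cloud", "storage"],
}
doc_freq, inverted, term_freq, forward = build_indexes(noun_docs)
```

The document frequencies and word frequencies computed here are the inputs that the later IG and TF-IDF calculations need, which matches the flow in which both S213 and S215 lead into S217.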

Embodiment 3

[0108] An embodiment of the present invention further provides an apparatus for text recommendation based on content and user behavior which, as shown in Figure 6, includes:

[0109] A word segmentation module, used to obtain the document set to be analyzed, and perform Chinese word segmentation on the documents in the document set to obtain a plurality of terms;

[0110] The IG calculation module is used to calculate the information gain of the terms in the document set, sort the terms by information gain, and select multiple terms as reference vectors;

[0111] A dimensionality reduction module for converting the text in the document collection into a multi-dimensional space vector model according to the reference vector;

[0112] The TF-IDF calculation module is used to perform TF-IDF calculation on the space vector model to obtain a text vector matrix;

[0113] The similarity calculation module is used to calculate the similarity between different t...
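
The abstract lists a recommendation module as the final component, which combines user behavior data with the document relationship matrix. The available text does not say how the two are combined, so the following Python sketch shows only one plausible rule: score each unread document by its summed similarity to the documents a user has already read. The function name `recommend`, the use of read history as the behavior signal, and the sample matrix are all assumptions.

```python
def recommend(similarity, read, top_n=3):
    """Rank unread documents by their total similarity to the user's read set.

    similarity: square matrix (list of lists) indexed by document position.
    read: set of document positions the user has already read (assumed signal).
    """
    scores = {}
    for candidate in range(len(similarity)):
        if candidate in read:
            continue  # never recommend something already read
        scores[candidate] = sum(similarity[candidate][seen] for seen in read)
    ranked = sorted(scores, key=lambda doc: scores[doc], reverse=True)
    # Drop documents with no similarity to anything the user has read.
    return [doc for doc in ranked if scores[doc] > 0][:top_n]

# Hypothetical document relationship matrix for four documents.
similarity = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
recommended = recommend(similarity, read={0})
```

Because the scores come from the content-based relationship matrix and the candidate set comes from user behavior, two users issuing the same query can receive different lists, which is the personalization the background section asks for.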


Abstract

The invention provides a text recommendation method and device based on content and user behavior. The method comprises the steps of obtaining a document set to be analyzed and performing Chinese word segmentation on the documents in the set to obtain multiple terms; calculating the information gain of the terms in the document set, sorting the terms by information gain, and selecting multiple terms as reference vectors; converting the texts in the document set into multi-dimensional space vector models according to the reference vectors; performing TF-IDF (Term Frequency-Inverse Document Frequency) calculation on the space vector models to obtain text vector matrices; calculating the similarities among different text vector matrices to form a document relationship matrix; and analyzing user behavior data and combining it with the document relationship matrix to form a recommendation list for a user. The device comprises a word segmentation module, an IG calculation module, a dimension reduction module, a TF-IDF calculation module, a similarity calculation module and a recommendation module. The method and the device can improve the effectiveness of text content recommendation for users.

Description

Technical field

[0001] The invention relates to the technical field of data mining, in particular to a text recommendation method and device based on content and user behavior.

Background technique

[0002] The emergence and popularization of the Internet has brought a large amount of information to users, which has met the needs of users for information in the information age. Sometimes you can't get the part of information that is really useful to you, and the efficiency of using information is reduced. This is the so-called information overload problem.

[0003] At present, one of the solutions to the problem of information overload is information retrieval systems represented by search engines, which play an extremely important role in helping users obtain network information. However, current search engines can only perform matching searches based on the characters input by users. When using the same keyword to search for information, the results obtained are the same,...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/3344, G06F16/3347, G06F16/335, G06F16/338, G06F16/9535, G06F40/216, G06F40/242, G06F40/284
Inventors: 张达, 亓开元, 苏志远
Owner: LANGCHAO ELECTRONIC INFORMATION IND CO LTD