ITQ algorithm-based Indonesian similar news recommendation method

A news recommendation method applied in the computer field. It addresses the problems of existing approaches, namely that they take little additional information into account, incur high overhead, and make poor use of news information. It achieves good recommendation quality with low-dimensional representations and reduces the amount of computation and the memory overhead.

Active Publication Date: 2019-07-09
UNIV OF ELECTRONIC SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

The disadvantages of this method are as follows: term frequency-inverse document frequency (TF-IDF) first vectorizes the news, that is, converts every news item into a numerical vector of the same dimension, and similar-news recommendation is then performed on the basis of these news vectors.
The dimension of this vector is very large. Even if some vocabulary filtering methods are used ...



Examples


Example Embodiment

[0049] Example

[0050] As shown in Figure 1, the present invention discloses an ITQ algorithm-based method for recommending similar Indonesian news, which is implemented as follows:

[0051] (S1) Crawl Indonesian news data, extract the title and text of each Indonesian news item, and save them in the fields corresponding to that news item;

[0052] (S2) Train a Word2Vec model on the Indonesian news data to obtain a mapping dictionary from news to vectors, which comprises the following steps:

[0053] (a1) From the crawled Indonesian news data, take the 100,000 most frequently used words and use the Word2Vec model to compute their word embeddings;

[0054] (a2) Convert each news item into a vector representation according to the word embeddings, thereby obtaining a mapping dictionary from news to vectors;

[0055] Step (S2) also comprises preprocessing of the Indonesian news, which comprises the following steps:

[0056] (b1)...
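
Step (a2) can be illustrated with a minimal sketch. The text above does not say how word embeddings are combined into a news vector, so averaging the embeddings of in-vocabulary words is an assumption here, as are the function name `news_to_vector`, the toy 3-dimensional embeddings (standing in for a trained Word2Vec model), and the sample tokens.

```python
import numpy as np

def news_to_vector(tokens, embeddings, dim):
    """Average the embeddings of the tokens found in the vocabulary.

    Averaging is an assumed combination rule, not one stated in the source.
    Tokens outside the vocabulary are skipped; an all-unknown item maps to zeros.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
toy_embeddings = {
    "banjir": np.array([1.0, 0.0, 0.0]),
    "jakarta": np.array([0.0, 1.0, 0.0]),
    "hujan": np.array([1.0, 1.0, 0.0]),
}

# Hypothetical tokenized news items keyed by news id.
news = {
    "news_1": ["banjir", "jakarta"],
    "news_2": ["hujan", "tidakdikenal"],  # second token is out of vocabulary
}

# The "mapping dictionary from news to vectors" of step (a2).
news_vectors = {
    news_id: news_to_vector(tokens, toy_embeddings, dim=3)
    for news_id, tokens in news.items()
}
```

With a real model, `toy_embeddings` would be replaced by the lookup table learned in step (a1).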


Abstract

The invention provides an ITQ algorithm-based Indonesian similar news recommendation method, which comprises the following steps: first, extracting the title and text of each Indonesian news item and storing them in the fields corresponding to that news item; training a Word2Vec model on the Indonesian news data to obtain a news-to-vector mapping dictionary; obtaining the binary code of each feature vector under the optimal rotation matrix through the ITQ algorithm; computing an n-bit binary signature for the currently browsed Indonesian news item and for each Indonesian news item in the candidate data set; calculating the Hamming distance between the currently browsed news item and each Indonesian news item in the candidate data set; and sorting by Hamming distance and selecting the m Indonesian news items in the candidate data set with the smallest distance as the recommended news. The method addresses the technical problem of balancing recommendation quality against the amount of computation in content-based news recommendation. The method is highly flexible and can be applied to various language environments.
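
The binary coding and ranking steps in the abstract can be sketched as follows. This follows the commonly published form of ITQ (iterative quantization): a PCA projection, then alternating between binarizing the rotated data and solving an orthogonal Procrustes problem for the rotation. It is an illustrative sketch, not the patent's exact procedure. The function names, iteration count, random data, and the choice of 8 bits are all assumptions, and `n_bits` must not exceed the feature dimension or the number of samples.

```python
import numpy as np

def itq_fit(X, n_bits, n_iter=50, seed=0):
    """Learn a PCA projection W and an orthogonal rotation R for ITQ."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA: keep the top n_bits principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_bits].T                      # shape (d, n_bits)
    V = Xc @ W                             # shape (n, n_bits)
    # Start from a random orthogonal matrix.
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    for _ in range(n_iter):
        B = np.where(V @ R >= 0, 1.0, -1.0)        # fix R, update codes
        U_b, _, U_a_t = np.linalg.svd(B.T @ V)     # fix codes, update R
        R = U_a_t.T @ U_b.T                        # Procrustes solution
    return mean, W, R

def itq_encode(X, mean, W, R):
    """Map feature vectors to n-bit signatures stored as 0/1 arrays."""
    return ((X - mean) @ W @ R >= 0).astype(np.uint8)

def recommend(query_code, candidate_codes, m):
    """Return indices and Hamming distances of the m nearest candidates."""
    distances = np.count_nonzero(candidate_codes != query_code, axis=1)
    order = np.argsort(distances, kind="stable")[:m]
    return order, distances[order]

# Hypothetical data: 200 candidate news vectors of dimension 16.
rng = np.random.default_rng(42)
candidates = rng.standard_normal((200, 16))
mean, W, R = itq_fit(candidates, n_bits=8)
codes = itq_encode(candidates, mean, W, R)

# Treat candidate 7 as the currently browsed news item.
query_code = codes[7]
top_idx, top_dist = recommend(query_code, codes, m=5)
```

Because signatures are short bit strings, the comparison reduces to counting differing bits, which is where the savings in computation and memory described above come from. In practice the browsed item itself would be excluded from the returned list.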

Description

Technical field

[0001] The invention belongs to the field of computers, and in particular relates to a method for recommending similar Indonesian news based on an ITQ algorithm.

Background art

[0002] When a user searches for webpage news, the system will efficiently and accurately retrieve from the database news whose content is identical or similar to the webpage news that the user is currently browsing. There are roughly two existing technologies for realizing this function. The first is based on the search and sorting functions built into the database; the general working principle of such a search engine is as follows. The word segmentation results are stored in the database. Each word corresponds to a news serial number field, which indicates which news items contain this word. When the user retrieves news, the system performs word segmentation on the words entered by the user and searches for each word in the database. The correspondin...
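
The word-to-news lookup described in the first existing technology is essentially an inverted index. A minimal sketch, assuming news items are already segmented into words; the function names, sample data, and the ranking rule (number of matching query words) are assumptions, since the source text is cut off before it describes how results are ordered.

```python
from collections import defaultdict

def build_inverted_index(segmented_news):
    """Map each word to the set of news serial numbers that contain it."""
    index = defaultdict(set)
    for news_id, words in segmented_news.items():
        for word in words:
            index[word].add(news_id)
    return index

def search(index, query_words):
    """Rank news by how many query words they contain (assumed rule)."""
    hits = defaultdict(int)
    for word in query_words:
        for news_id in index.get(word, ()):
            hits[news_id] += 1
    # Most matches first; ties broken by serial number.
    return sorted(hits, key=lambda news_id: (-hits[news_id], news_id))

# Hypothetical segmented news keyed by serial number.
segmented_news = {
    1: ["banjir", "jakarta"],
    2: ["banjir", "hujan"],
    3: ["ekonomi"],
}
index = build_inverted_index(segmented_news)
results = search(index, ["banjir", "hujan"])
```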


Application Information

IPC(8): G06F16/9535, G06F16/951
Inventor: 杨国武, 杨晓强, 张庆颖, 陈祥, 熊菊霞, 黄勇, 王逸尘, 刘海洋
Owner: UNIV OF ELECTRONIC SCI & TECH OF CHINA