
Similar news distinguishing method and system and electronic equipment

A news discrimination method, applied in the information technology field, that addresses the trade-off between calculation time and accuracy in existing similarity schemes. It reduces work and processing time while making the discrimination result more accurate.

Active Publication Date: 2019-11-01
广州吉信网络科技开发有限公司

AI Technical Summary

Problems solved by technology

However, the Simhash similarity scheme reduces calculation time by compressing the information in each article, so the accuracy of its similarity results is not high.



Examples


Embodiment 1

[0061] As shown in Figure 1, a similar news discrimination method includes the following steps:

[0062] Capturing step: 110. Input the stock (existing) news data, which is 1 million pieces of news data randomly captured from web pages. Perform word segmentation on the stock news data, count the words and their word frequencies after segmentation, and build a word frequency database. Here, word frequency refers to the number of articles in the stock news data in which a word appears. Proceed to step 120.

[0063] Calculation step: 120. Segment each piece of news data into words and extract keywords, then calculate the weight of each keyword according to the word frequency database. Proceed to step 130.

[0064] 130. Calculate the simhash value of each piece of news data from the keywords and keyword weights obtained in step 120, and store the values in the database. Proceed to step 140.

[0065] 140. Establish a data structure ...
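
Steps 110 through 130 can be illustrated with a minimal Python sketch. It is not the patented implementation: the source does not give the weighting formula or the hash function, so the smoothed IDF-style weight, the use of MD5 as a stable 64-bit hash, the `top_k` cutoff, and all function names here are assumptions. Word segmentation is also assumed to have happened already, so each article is a list of words. Step 140 is not covered because its description is truncated.

```python
import hashlib
import math
from collections import Counter


def build_doc_frequency(docs):
    """Step 110: for each word, count how many articles contain it."""
    df = Counter()
    for words in docs:
        df.update(set(words))  # set() so a word counts once per article
    return df


def keyword_weights(words, df, n_docs, top_k=20):
    """Step 120: weight words by term count times a smoothed IDF.

    The formula is an assumption; the source only says the weight is
    calculated according to the word frequency database.
    """
    tf = Counter(words)
    scored = {
        w: c * math.log((n_docs + 1) / (df.get(w, 0) + 1))
        for w, c in tf.items()
    }
    top = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return dict(top)


def simhash(weights, bits=64):
    """Step 130: fold weighted keyword hashes into one fingerprint."""
    acc = [0.0] * bits
    for word, weight in weights.items():
        # MD5 is used only as a stable hash here (assumption).
        digest = hashlib.md5(word.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            acc[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if acc[i] > 0)


# Hypothetical, already-segmented articles.
corpus = [
    "central bank raises interest rates again".split(),
    "central bank cuts interest rates sharply".split(),
    "local team wins the championship game".split(),
]
df = build_doc_frequency(corpus)
weights0 = keyword_weights(corpus[0], df, len(corpus))
fingerprints = [
    simhash(keyword_weights(doc, df, len(corpus))) for doc in corpus
]
```

In this sketch, words that appear in fewer articles receive larger weights, so they influence more strongly which bits of the fingerprint are set.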

Embodiment 2

[0097] As shown in Figure 2, a system for discriminating similar news that implements the discrimination method of Embodiment 1 includes an input module 210, a processing module 220 and an output module 230, wherein:

[0098] The input module 210 is used to input stock news data and/or new news data.

[0099] The processing module 220 includes a capture module 221, a calculation module 222 and a storage module 223, wherein:

[0100] The capture module 221 is used to perform word segmentation on the stock news data received from the input module, and to segment each piece of news data and extract keywords.

[0101] The storage module 222 is used to build the word frequency database after the segmented words and their word frequencies are counted, where word frequency refers to the number of articles in the stock news data in which a word appears; to store the data structure established for the simhash values; and to store or update the resulting list of similar news ids;...

Embodiment 3

[0108] As shown in Figure 3, a processing module 220 for discriminating similar news according to the discrimination method of Embodiment 1 includes a capture module 221, a calculation module 222 and a storage module 223, wherein:

[0109] The capture module 221 is used to perform word segmentation on the input stock news data, and to segment each piece of news data and extract keywords.

[0110] The storage module 222 is used to build the word frequency database after the segmented words and their word frequencies are counted, where word frequency refers to the number of articles in the stock news data in which a word appears; to store the data structure established for the simhash values; and to store or update the resulting list of similar news ids.

[0111] The calculation module 223 is used to calculate the weight of each keyword according to the storage module and the capture module; calculate the simhash va...



Abstract

The invention relates to a similar news distinguishing method and system and electronic equipment, capable of balancing distinguishing accuracy with real-time performance. The similar news distinguishing method comprises the following steps: a capturing step: carrying out word segmentation on stock news data, and establishing a word frequency library after carrying out statistics on the segmented words and their word frequencies; a calculation step: performing word segmentation on each piece of news data, extracting keywords, and calculating weights of the keywords according to the word frequency library; calculating a simhash value of each piece of news data according to the keywords and the weights of the keywords in the calculation step, and storing the simhash values into a database; and establishing a data structure for the simhash values, and/or performing similarity processing according to the simhash value of each piece of news data, merging the similar news data into the data structure, and constructing or updating to obtain a similar news id list.
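
The similarity processing and the similar news id list can be sketched as follows. This is only an illustration under stated assumptions. The source does not describe the data structure or the similarity criterion in the text available here, so the sketch uses the Hamming distance between fingerprints, a conventional choice for simhash, with a hypothetical threshold `max_distance` and a simple linear scan over group representatives. The function names and sample fingerprints are also hypothetical.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def group_similar(fingerprints, max_distance=3):
    """Group news ids whose fingerprints are close to a representative.

    fingerprints: dict mapping news id -> simhash value (int).
    Returns a list of similar news id lists. The threshold and the
    linear scan are assumptions, not the method from the source.
    """
    groups = []  # each entry: (representative fingerprint, [news ids])
    for news_id, fp in fingerprints.items():
        for rep, ids in groups:
            if hamming_distance(rep, fp) <= max_distance:
                ids.append(news_id)
                break
        else:
            # No close representative found: start a new group.
            groups.append((fp, [news_id]))
    return [ids for _, ids in groups]


# Hypothetical 8-bit fingerprints for readability.
sample = {"a": 0b10110000, "b": 0b10110001, "c": 0b01001111}
similar_ids = group_similar(sample)
```

With these sample values, "a" and "b" differ in one bit and fall into the same group, while "c" differs from "a" in all eight bits and forms its own group. A linear scan compares each new item against every group, so a production system would need an indexed structure to stay real-time at the scale the source describes.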

Description

Technical field

[0001] The invention belongs to the field of information technology, and in particular relates to a similar news discrimination method, system and electronic equipment.

Background technique

[0002] In a public opinion monitoring system, customers can monitor online news and public opinion that meets their conditions by setting keywords. However, because a piece of news is usually reposted two or more times, a dozen or even dozens of the news and public opinion items filtered by keywords may be duplicates. To save user time and improve user experience, there is an urgent need for a method or device that combines similar news into one item and gives the number of similar items, so that users can quickly obtain the most widely spread public opinion information that they care about.

[0003] However, existing discrimination schemes in the prior art mainly include two types. One is the TF-IDF method, where TF refers to w...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/951, G06F17/27, G06K9/62
CPC: G06F16/3344, G06F16/951, G06F18/22, Y02D10/00
Inventor: 曾颖清
Owner: 广州吉信网络科技开发有限公司