Method, system, and electronic device for identifying similar news
A method for identifying similar news, applied in the information field, that addresses the problems of high computation time and low accuracy. It reduces workload and processing time and makes the identification result more accurate.
Examples
Embodiment 1
[0061] Referring to Figure 1, a method for identifying similar news includes the following steps:
[0062] Capture step 110: Input the stock (existing) news data, which consists of 1 million news items randomly crawled from web pages. Segment the stock news data into words, count the words and their word frequencies, and build a word frequency database. Here, word frequency means the number of articles in the stock news data in which a word appears. Proceed to step 120.
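The word frequency database of step 110 can be illustrated with a minimal Python sketch. It assumes a hypothetical whitespace `tokenize` helper in place of a real word segmenter and a three-item toy corpus in place of the 1 million crawled items; neither comes from the patent text.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for a real word segmenter: lowercase and
    # split on whitespace. The patent does not name a segmenter.
    return text.lower().split()


def build_doc_frequency(articles):
    """Count, for each word, the number of articles that contain it."""
    doc_freq = Counter()
    for article in articles:
        # set() ensures a word repeated inside one article counts once,
        # matching "the number of articles in which a word appears".
        doc_freq.update(set(tokenize(article)))
    return doc_freq


# Toy corpus standing in for the stock news data (illustrative only).
corpus = [
    "central bank raises rates",
    "bank shares fall as rates rise",
    "local team wins final",
]
doc_freq = build_doc_frequency(corpus)
```

Counting articles rather than raw occurrences means a word repeated many times within a single item does not inflate its frequency.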
[0063] Calculation step 120: Segment each news item into words and extract its keywords, then calculate the weight of each keyword from the word frequency database. Proceed to step 130.
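The text above does not give the weight formula. As one plausible reading, the sketch below assumes a TF-IDF style weight (term count in the item multiplied by a smoothed inverse document frequency) and keeps the `top_k` highest-weighted words as keywords. The function name, the smoothing, `top_k`, and the sample numbers are all assumptions made for illustration.

```python
import math
from collections import Counter


def keyword_weights(tokens, doc_freq, total_docs, top_k=5):
    """Weight each word of one news item and keep the top_k as keywords.

    Assumed formula (not stated in the source): count * smoothed IDF.
    """
    term_counts = Counter(tokens)
    weights = {}
    for word, count in term_counts.items():
        # Words seen in few articles get a larger IDF; +1 smoothing
        # avoids division by zero for words missing from the database.
        idf = math.log((total_docs + 1) / (doc_freq.get(word, 0) + 1)) + 1.0
        weights[word] = count * idf
    # Sort by descending weight, then alphabetically for stable ties.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


# Illustrative document frequencies for a 1000-article database.
sample_doc_freq = {"bank": 900, "rates": 400, "merger": 3}
sample_tokens = ["bank", "merger", "rates", "bank"]
weights = keyword_weights(sample_tokens, sample_doc_freq, total_docs=1000)
```

Under this assumed formula, a word that appears in few articles ("merger") outweighs a word that appears in most of them ("bank"), even when the latter occurs more often in the item.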
[0064] Step 130: Calculate the simhash value of each news item from the keywords and keyword weights obtained in step 120, and store the values in the database. Proceed to step 140.
[0065] Step 140: Establish a data structure ...
Embodiment 2
[0097] Referring to Figure 2, a system for identifying similar news that implements the method of Embodiment 1 includes an input module 210, a processing module 220, and an output module 230, wherein:
[0098] The input module 210 is used to input stock news data and/or new news data.
[0099] The processing module 220 includes a capture module 221, a calculation module 222, and a storage module 223, wherein:
[0100] The capture module 221 segments the stock news data received from the input module into words, and segments each news item into words and extracts its keywords.
[0101] The storage module 223 builds the word frequency database after the segmented words and their word frequencies have been counted, where word frequency means the number of articles in the stock news data in which a word appears; stores the data structure built from the simhash values; and stores or updates the resulting list of similar news IDs; ...
Embodiment 3
[0108] Referring to Figure 3, a processing module 220 for identifying similar news according to the method of Embodiment 1 includes a capture module 221, a calculation module 222, and a storage module 223, wherein:
[0109] The capture module 221 segments the input stock news data into words, and segments each news item into words and extracts its keywords.
[0110] The storage module 223 builds the word frequency database after the segmented words and their word frequencies have been counted, where word frequency means the number of articles in the stock news data in which a word appears; stores the data structure built from the simhash values; and stores or updates the resulting list of similar news IDs.
[0111] The calculation module 222 calculates the weight of each keyword using the storage module and the capture module; calculates the simhash va...