Method and system for screening duplicate message of Internet

A technology for screening repeated information, applied in the field of computer search. It addresses excessive computing-resource overhead and large volumes of repetitive, confusing information, and has the effect of optimizing information storage, improving search efficiency, and saving hardware resource overhead.

Status: Inactive. Publication date: 2017-11-03
Applicants: 重庆电信系统集成有限公司 (Chongqing Telecom System Integration Co., Ltd.) and one other
Cites: 3; Cited by: 5

AI Technical Summary

Problems solved by technology

When the entire network is searched, saving information by full-text acquisition generates a large amount of repetitive and confusing information and consumes considerable computing resources.

Method used

The method obtains relevant Internet text according to preset keywords, selects an information source sample and a comparison sample from it, decomposes both samples, calculates their text similarity, and classifies and stores the corresponding texts according to that similarity.


Examples


Embodiment 1

[0036] Figure 1 shows a schematic flowchart of a method for screening duplicate Internet information provided by Embodiment 1 of the present invention. The method includes:

[0037] Step S1, obtaining relevant text information on the Internet according to preset keywords;

[0038] Step S2, selecting an information source sample and a comparison sample from the text information;

[0039] Step S3, respectively decomposing the information source sample and the comparison sample;

[0040] Step S4, calculating text similarity according to the decomposed information source sample and the comparison sample;

[0041] Step S5, classifying and storing the corresponding texts according to the text similarity.
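Steps S3 to S5 can be illustrated with a minimal Python sketch. The excerpt does not say how samples are decomposed, which similarity measure is used, or how texts are classified, so all three are assumptions here: decomposition is word tokenization, similarity is cosine similarity of term-frequency vectors, and classification is a fixed threshold (0.8, chosen arbitrarily). The names `decompose`, `text_similarity`, and `classify` are hypothetical, not taken from the patent.

```python
import math
import re
from collections import Counter

def decompose(text: str) -> Counter:
    """Step S3 (assumed): split text into lowercase word tokens with counts."""
    # \w+ keeps an unspaced run of Chinese characters as one token;
    # Chinese input would need a proper word segmenter here.
    return Counter(re.findall(r"\w+", text.lower()))

def text_similarity(a: Counter, b: Counter) -> float:
    """Step S4 (assumed): cosine similarity of two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(source: str, comparisons: list, threshold: float = 0.8) -> dict:
    """Step S5 (assumed): sort comparison samples into duplicate / distinct groups."""
    source_vec = decompose(source)
    groups = {"duplicate": [], "distinct": []}
    for text in comparisons:
        score = text_similarity(source_vec, decompose(text))
        groups["duplicate" if score >= threshold else "distinct"].append(text)
    return groups

result = classify(
    "Cloud outage hits Chongqing users",
    ["cloud outage hits chongqing users today", "Cloud pricing update announced"],
)
print(result)
```

Here the first comparison text shares five of its six tokens with the source (cosine about 0.91) and is grouped as a duplicate, while the second shares only one token (about 0.22) and is kept as distinct.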

[0042] The specific technical solution of Embodiment 1 of the present invention is as follows:

[0043] In step S1, relevant text information on the Internet is obtained according to preset keywords.

[0044] Preferably, the text information containing the keyword is o...
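Step S1 amounts to a keyword rule applied to candidate texts. The truncated paragraph does not state the matching rule, so this sketch assumes case-insensitive substring matching; `select_by_keywords` and its `require_all` flag (switching between "any keyword" and "all keywords") are hypothetical.

```python
def select_by_keywords(texts, keywords, require_all=False):
    """Step S1 (assumed rule): keep texts that mention the preset keywords.

    Matching is a case-insensitive substring test. With require_all=False a
    text needs at least one keyword; with require_all=True it needs all of them.
    """
    lowered = [k.lower() for k in keywords]
    selected = []
    for text in texts:
        body = text.lower()
        hits = [k for k in lowered if k in body]
        if hits and (not require_all or len(hits) == len(lowered)):
            selected.append(text)
    return selected

docs = ["Telecom outage in Chongqing", "Weather report", "Chongqing telecom tariffs"]
print(select_by_keywords(docs, ["telecom"]))
print(select_by_keywords(docs, ["telecom", "outage"], require_all=True))
```

Substring matching works for Chinese keywords without word segmentation, but it also matches inside longer words (for example, `telecom` matches `telecommunications`).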

Embodiment 2

[0078] Corresponding to the method of the present invention, Figure 2 shows a schematic structural diagram of an Internet duplicate-information screening system provided by an embodiment of the present invention. The system includes an information acquisition module 101, a sample selection module 102, a sample decomposition module 103, a similarity calculation module 104, and a classification processing module 105.
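The five modules form a linear pipeline. One way to wire them, assuming each module can be modeled as a plain function injected into a hypothetical `DuplicateScreeningSystem` class, is sketched below; the stand-in implementations in the demo (including Jaccard similarity and the 0.8 threshold) are illustrative only and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DuplicateScreeningSystem:
    """Hypothetical wiring of modules 101-105 as injected callables."""
    acquire: Callable[[List[str]], List[str]]             # 101: keywords -> texts
    select: Callable[[List[str]], Tuple[str, List[str]]]  # 102: texts -> (source, comparisons)
    decompose: Callable[[str], object]                    # 103: text -> decomposed form
    similarity: Callable[[object, object], float]         # 104: two decomposed forms -> score
    classify: Callable[[float], str]                      # 105: score -> storage class label

    def run(self, keywords: List[str]) -> Dict[str, List[str]]:
        texts = self.acquire(keywords)
        source, comparisons = self.select(texts)
        source_parts = self.decompose(source)
        store: Dict[str, List[str]] = {}
        for text in comparisons:
            score = self.similarity(source_parts, self.decompose(text))
            # Store each comparison text under the class label chosen by module 105.
            store.setdefault(self.classify(score), []).append(text)
        return store

# Demo with illustrative stand-ins (not the patent's implementations).
corpus = ["telecom outage today", "telecom outage today!", "telecom tariffs"]
system = DuplicateScreeningSystem(
    acquire=lambda kws: [t for t in corpus if any(k in t for k in kws)],
    select=lambda texts: (texts[0], texts[1:]),
    decompose=lambda t: set(t.replace("!", "").split()),
    similarity=lambda a, b: len(a & b) / len(a | b),  # Jaccard similarity
    classify=lambda s: "duplicate" if s >= 0.8 else "distinct",
)
result = system.run(["telecom"])
print(result)
```

Injecting each module as a callable keeps the pipeline order fixed while allowing any single stage, such as the similarity measure, to be swapped without touching the others.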

[0079] The information acquisition module 101 is configured to acquire relevant text information from the Internet according to preset keywords. The information acquisition module 101 is preferably a web crawler module, which can automatically fetch information from the Internet according to certain rules. In this embodiment, the rules can be set to fetch information containing the preset keywords, so that the web crawler module fetches text information containing those keywords from the Internet.
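A crawler restricted by a keyword rule, as described for module 101, might look like the following breadth-first sketch. The names `crawl` and `_PageParser`, the `max_pages` limit, and the injected `fetch` callable are assumptions; `fetch` is passed in so the rule can be exercised offline, and a real crawler would also need error handling and rate limiting, which are omitted.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin

class _PageParser(HTMLParser):
    """Collects all text nodes (script/style not filtered) and <a href> targets."""
    def __init__(self):
        super().__init__()
        self.links: List[str] = []
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.chunks.append(data)

def crawl(seed: str, fetch: Callable[[str], str], keywords: List[str],
          max_pages: int = 50) -> Dict[str, str]:
    """Breadth-first crawl from seed; keep pages whose text contains a keyword."""
    seen = {seed}
    queue = deque([seed])
    kept: Dict[str, str] = {}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        parser = _PageParser()
        parser.feed(fetch(url))
        parser.close()
        text = " ".join(" ".join(parser.chunks).split())  # normalize whitespace
        # Crawl rule: store the page only if it mentions a preset keyword.
        if any(k.lower() in text.lower() for k in keywords):
            kept[url] = text
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return kept

# Offline demo: a dict stands in for the network.
pages = {
    "http://example.com/": '<a href="/a">A</a><a href="/b">B</a><p>Home</p>',
    "http://example.com/a": '<p>Telecom outage report</p><a href="/">home</a>',
    "http://example.com/b": "<p>Weather</p>",
}
kept = crawl("http://example.com/", lambda u: pages.get(u, ""), ["telecom"])
print(kept)
```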

[0080] The sample ...



Abstract

The invention discloses a method and system for screening duplicate Internet information. The method comprises the following steps: obtaining relevant text information from the Internet according to a preset keyword; selecting an information source sample and a comparison sample from the text information; decomposing the information source sample and the comparison sample separately; calculating a text similarity from the decomposed information source sample and comparison sample; and classifying and storing the corresponding text according to the text similarity. With this method, large quantities of duplicate information are screened and classified, which improves search efficiency, optimizes how information is stored, and saves hardware resource expenditure.

Description

Technical field

[0001] The invention relates to the field of computer search, in particular to a method and system for screening repeated information on the Internet.

Background

[0002] Among the mass of Internet texts, articles, and news, the same information often appears on many websites and servers. When a computer search system acquires information, it usually saves it by full-text acquisition. When the entire network is searched, this way of saving information generates a large amount of repetitive and confusing information and consumes considerable computing resources.

Contents of the invention

[0003] Aiming at these defects in the prior art, the present invention provides a method and system for screening repetitive information on the Internet. When a computer automatically obtains Internet information, it screens and classifies large amounts of repetitive information, so that the computer can improve searc...

Claims


Application Information

Patent type and authority: Application (China)
IPC (IPC8): G06F17/30
CPC: G06F16/35, G06F16/95
Inventors: 郑午 (Zheng Wu), 刘德彬 (Liu Debin), 严开 (Yan Kai)
Owner: 重庆电信系统集成有限公司 (Chongqing Telecom System Integration Co., Ltd.)