Method and system for screening duplicate Internet information
A duplicate-information screening technology applied in the field of computer search. It addresses the problems of large volumes of repetitive information, confusing information, and the computing resource overhead they occupy, and achieves the effects of optimizing information storage, improving search efficiency, and saving hardware resource overhead.
Examples
Embodiment 1
[0036] Figure 1 shows a schematic flowchart of a method for screening duplicate Internet information provided by Embodiment 1 of the present invention. The method includes:
[0037] Step S1, obtaining relevant text information on the Internet according to preset keywords;
[0038] Step S2, selecting an information source sample and a comparison sample from the text information;
[0039] Step S3, respectively decomposing the information source sample and the comparison sample;
[0040] Step S4, calculating text similarity according to the decomposed information source sample and the comparison sample;
[0041] Step S5, classifying and storing the corresponding texts according to the text similarity.
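Steps S2 to S5 can be illustrated with a minimal sketch. The source does not state here how samples are decomposed or how similarity is computed, so the sketch assumes word-level tokenization for decomposition and Jaccard similarity as a stand-in measure. The function names and the 0.8 threshold are hypothetical, and the texts are assumed to have already been obtained in step S1.

```python
import re


def decompose(text):
    """Step S3 (assumed form): split a text into a set of lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def similarity(source_tokens, comparison_tokens):
    """Step S4 (assumed measure): Jaccard similarity of two token sets."""
    if not source_tokens and not comparison_tokens:
        return 1.0  # two empty texts are treated as identical
    union = source_tokens | comparison_tokens
    return len(source_tokens & comparison_tokens) / len(union)


def screen_duplicates(texts, threshold=0.8):
    """Steps S2 and S5: compare each text with the kept source samples,
    then classify it as unique or duplicate and store it accordingly.

    The threshold is a hypothetical value chosen for illustration.
    """
    unique, duplicates = [], []
    for text in texts:
        tokens = decompose(text)
        if any(similarity(tokens, decompose(kept)) >= threshold for kept in unique):
            duplicates.append(text)
        else:
            unique.append(text)
    return unique, duplicates


texts = [
    "Heavy rain floods the city center",
    "heavy rain floods the city center!",
    "Stock markets close higher today",
]
unique, duplicates = screen_duplicates(texts)
```

In this example the second text differs from the first only in case and punctuation, so it is stored as a duplicate, while the first and third texts are stored as unique.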
[0042] The specific technical scheme of Embodiment 1 of the present invention is as follows:
[0043] In step S1, relevant text information on the Internet is obtained according to preset keywords.
[0044] Preferably, the text information containing the keyword is o...
Embodiment 2
[0078] Corresponding to the embodiments of the present invention, Figure 2 shows a schematic structural diagram of an Internet duplicate information screening system provided by an embodiment of the present invention. The system includes: an information acquisition module 101, a sample selection module 102, a sample decomposition module 103, a similarity calculation module 104, and a classification processing module 105.
[0079] The information acquisition module 101 is configured to acquire relevant text information on the Internet according to preset keywords. The information acquisition module 101 is preferably a web crawler module, which can automatically fetch information from the Internet according to certain rules. In the embodiment of the present invention, the rules can be set to fetch information containing the preset keywords, so that the web crawler module fetches text information containing those keywords from the Internet.
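The keyword rule of the information acquisition module 101 can be sketched as a simple filter. This is an illustration under stated assumptions, not the crawler described in the patent: the function name `acquire_texts` is hypothetical, matching is a case-insensitive substring test, and page retrieval is delegated to a caller-supplied `fetch` function so that the sketch runs without network access.

```python
def acquire_texts(urls, keywords, fetch):
    """Return the fetched texts that contain at least one preset keyword.

    `fetch` is a caller-supplied function mapping a URL to page text
    (hypothetical; a real crawler would issue HTTP requests here).
    """
    lowered = [keyword.lower() for keyword in keywords]
    matches = []
    for url in urls:
        text = fetch(url)
        # Keep the page only if any preset keyword occurs in its text.
        if any(keyword in text.lower() for keyword in lowered):
            matches.append(text)
    return matches


# An in-memory stand-in for the Internet, used only for illustration.
pages = {
    "http://example.com/a": "Flood warning issued for the river valley",
    "http://example.com/b": "Local team wins the championship",
}
found = acquire_texts(pages, ["flood"], pages.get)
```

Here only the first page contains the preset keyword, so it is the only text passed on to the later modules.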
[0080] The sample ...