Website information merging and deduplication method

A website information merging and deduplication technology, applied in the Internet field. It addresses the time and labor users waste on repeated information, which keeps them from fully enjoying the convenience of the Internet, and achieves timeliness and convenience.

Active Publication Date: 2017-02-01
青岛崇胜网络科技有限公司
Cites: 8; Cited by: 0

AI Technical Summary

Problems solved by technology

[0002] With the development of Internet technology, network platforms have become the main way for people to obtain information, and there are more and more websites of the same type. As a result, the same information is published on different websites. For example, the business information released by one company is published on multiple business websites of the same type. When users browse these websites looking for information, they browse repeatedly and obtain a large amount of duplicate information, which wastes time and labor and prevents them from enjoying the convenience of the Internet to the greatest extent.

Examples

Embodiment 1

[0030] A method for merging and deduplicating website information, the method includes the following steps:

[0031] (1) Obtain the data information of the multiple target websites to be analyzed, compare the data information horizontally across the websites, and merge and deduplicate the information;

[0032] A. According to the structure of each target website, set the website template of the target website to be analyzed, and set the URL of the target website. Designing the website template includes analyzing the structure of each target website to be compared and, according to that structure, setting what needs to be crawled: the URL of the data home page, the URLs of the corresponding data pages under the data home page, and the page tags to be captured, which are extracted through regular expression matching and DOM parsing of HTML tag elements. The required website content can then be obtained through the website template.

[0033] B. Set up an independent thread for the website te...
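The website template described in step A can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `SiteTemplate` fields, the function names, and the choice to identify target elements by tag and class are all assumptions made for the example, and it works on HTML strings that are assumed to have been downloaded already. It uses a regular expression to find data-page URLs on the data home page and Python's standard `html.parser` for tag-level parsing of a data page.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class SiteTemplate:
    """Hypothetical per-site template: where to crawl and what to capture."""
    home_url: str      # URL of the data home page
    link_pattern: str  # regex with one group capturing data-page links
    tag: str           # HTML tag whose text should be captured
    css_class: str     # class attribute value identifying target elements


class TagTextExtractor(HTMLParser):
    """Collects the text of every element matching a tag and class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0   # greater than 0 while inside a matching element
        self.items = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1
                self.items.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.items[-1] += data


def find_data_pages(template, home_html):
    """Regex-match data-page links on the home page; return absolute URLs."""
    links = re.findall(template.link_pattern, home_html)
    unique = dict.fromkeys(links)  # drop repeated links, keep first-seen order
    return [urljoin(template.home_url, link) for link in unique]


def extract_items(template, page_html):
    """Parse a data page and return the text of the templated elements."""
    parser = TagTextExtractor(template.tag, template.css_class)
    parser.feed(page_html)
    return [item.strip() for item in parser.items]


template = SiteTemplate(
    home_url="http://example.com/list",
    link_pattern=r'href="(/item/\d+)"',
    tag="div",
    css_class="info",
)
home_html = ('<a href="/item/1">A</a><a href="/item/2">B</a>'
             '<a href="/about">x</a><a href="/item/1">A</a>')
page_html = ('<div class="info title"><b>Acme</b> sells pumps</div>'
             '<div class="ad">Buy</div>')
pages = find_data_pages(template, home_html)
items = extract_items(template, page_html)
```

With one such template per target website, the same two functions can be reused for every site; only the template values change.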

Abstract

The invention relates to a method for merging and deduplicating website information. The method mainly includes the following steps: 1, acquiring the data information of multiple target websites to be analyzed, comparing the data information horizontally across the websites, and merging and deduplicating the information; 2, acquiring the internal data information of each target website, comparing the data vertically within each website, and merging and deduplicating the data; 3, displaying the merged and deduplicated information on a new web page. The method has the advantages that large amounts of duplicate information on similar websites can be removed, the deduplicated information is displayed in one place, and the timeliness and convenience of the Internet can be brought into full play.
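The three steps in the abstract can be illustrated with a small sketch. The source does not state how two records are judged to be duplicates, so the comparison here (lowercasing and collapsing punctuation and whitespace before an exact match) is purely an assumption, as are the function names and the sample data. The sketch also applies the within-site (vertical) pass before the cross-site (horizontal) pass for simplicity; the order is a choice of this example.

```python
import re


def normalize(text):
    """Assumed duplicate key: lowercase, punctuation and spacing collapsed."""
    return re.sub(r"\W+", " ", text.lower()).strip()


def dedupe_within(items):
    """Vertical comparison: drop duplicates inside a single website."""
    seen = {}
    for item in items:
        seen.setdefault(normalize(item), item)  # keep the first occurrence
    return list(seen.values())


def merge_across(sites):
    """Horizontal comparison: merge duplicates across websites.

    `sites` maps a site name to its list of records. Each merged entry
    keeps the first-seen text and the list of sites that published it.
    """
    merged = {}
    for site, items in sites.items():
        for item in dedupe_within(items):
            key = normalize(item)
            entry = merged.setdefault(key, {"text": item, "sources": []})
            entry["sources"].append(site)
    return list(merged.values())


def render(entries):
    """Display step: one line per merged record, with its source sites."""
    return "\n".join(
        f"{e['text']} (from: {', '.join(e['sources'])})" for e in entries
    )


sites = {
    "site-a": ["Acme Pumps, Ltd.", "acme pumps ltd", "Bolt Co."],
    "site-b": ["ACME  Pumps Ltd", "Gear Inc."],
}
merged = merge_across(sites)
page = render(merged)
```

Exact matching on a normalized key only catches near-identical text; records that describe the same item in different words would need a similarity measure, which the source excerpt does not specify.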

Description

Technical field

[0001] The invention belongs to the technical field of the Internet, and in particular relates to a method for merging and deduplicating website information.

Background

[0002] With the development of Internet technology, network platforms have become the main way for people to obtain information, and there are more and more websites of the same type. As a result, the same information is published on different websites. For example, the business information released by one company is published on multiple business websites of the same type. When users browse these websites looking for information, they browse repeatedly and obtain a large amount of duplicate information, which wastes time and labor and prevents them from enjoying the convenience of the Internet to the greatest extent.

[0003] The key reason for the formation of this problem is that the websites of the same kind operate independently an...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/958
Inventor: 初殿松
Owner: 青岛崇胜网络科技有限公司