Webpage information collection system and method

A webpage information collection technology, applied in the field of webpage information collection systems that support industry structure analysis. It addresses the problems that webpage information cannot be collected, that directly related information cannot be obtained, and that related information is difficult to distinguish.

Pending Publication Date: 2022-01-04
盐城至新达科技有限公司

AI Technical Summary

Problems solved by technology

However, directly related information often cannot be obtained, for example because of variations in expression, fluctuations in wording, or lexical notation errors in the content of the web page information.
[0005] When web page information does not include rules related to the required background knowledge, it is sometimes difficult to distinguish relevant information.
As a result, irrelevant web page information is sometimes collected, and relevant web page information is sometimes missed.



Examples


Embodiment Construction

[0012] Modes for implementing the present invention are described below with reference to the drawings. Figure 1 is a structural schematic diagram of a specific embodiment of the web page information collection system according to the present invention. The web page information collection system of this embodiment makes its judgment using source code that represents attributes (subject, creation time, etc.). Referring to Figure 1, the web page information collection system includes a keyword generation module 10, an information collection module 20, a rule storage module 30, a decision module 40, and a database 50.
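As an illustration of what "source code representing attributes" could look like in practice, the following is a minimal sketch that assumes the attributes are carried in HTML `<title>` and `<meta name=... content=...>` tags. The patent text does not say which tags or fields are used, so the class name `MetaAttributeParser`, the function `extract_attributes`, and the `subject` and `date` field names are all hypothetical.

```python
from html.parser import HTMLParser


class MetaAttributeParser(HTMLParser):
    """Collects the <title> text and <meta name/content> pairs from page source.

    Assumption: page attributes (subject, creation time, ...) live in these
    tags. The patent does not specify the actual markup.
    """

    def __init__(self):
        super().__init__()
        self.attributes = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name")
            content = attrs.get("content")
            # Skip meta tags that lack a usable name or content value.
            if name and content is not None:
                self.attributes[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            previous = self.attributes.get("title", "")
            self.attributes["title"] = previous + data.strip()


def extract_attributes(page_source: str) -> dict:
    """Return a dict of attribute name -> value found in the page source."""
    parser = MetaAttributeParser()
    parser.feed(page_source)
    parser.close()
    return parser.attributes
```

In this sketch, the information collection module 20 would call `extract_attributes` on each fetched page and pass the resulting dictionary on for judgment.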

[0013] The keyword generation module 10 generates keywords according to the document specified by the user. A keyword may consist of a set of multiple words. The keyword generation module uses a word-vector algorithm to perform clustering statistics, and collects character strings representing the attributes contained in the source code information w...



Abstract

The invention provides a webpage information collection system comprising: a keyword generation module (10) capable of generating keywords according to a document specified by a user; an information collection module (20) for collecting the source code attached to the webpage information; a rule storage module (30) preset with rules of correspondence between keywords and the source code of webpage information; a database (50) for storing the keywords, the collected webpage information, and the source code; and a judging module (40) for judging, based on the keyword and the collected source code information, whether the keyword and the webpage information source code conform to the corresponding rule and, if they do, storing the keyword, the webpage information, and the source code in the database in association with one another. The system and method allow the required webpage information to be collected comprehensively and accurately. The invention further provides a webpage information collection method.
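The judge-then-store flow described above can be sketched as follows. This is only an illustration under stated assumptions: the rule format, the example rule `subject_mentions_keyword`, the table name `collected`, and the use of an in-memory SQLite database are all hypothetical, since the abstract does not say how rules are encoded or how the database is organised.

```python
import json
import sqlite3
from typing import Callable, Dict, List

# Assumption: a rule is a predicate over (keyword, page attributes).
Rule = Callable[[str, Dict[str, str]], bool]


def subject_mentions_keyword(keyword: str, attributes: Dict[str, str]) -> bool:
    """Hypothetical rule: the page's 'subject' attribute contains the keyword."""
    return keyword.lower() in attributes.get("subject", "").lower()


def open_database() -> sqlite3.Connection:
    """Stand-in for database 50: one table linking keyword, page, and attributes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE collected "
        "(keyword TEXT, url TEXT, content TEXT, attributes TEXT)"
    )
    return conn


def judge_and_store(
    conn: sqlite3.Connection,
    rules: List[Rule],
    keyword: str,
    url: str,
    content: str,
    attributes: Dict[str, str],
) -> bool:
    """Stand-in for judging module 40.

    Stores the record only if every preset rule accepts the pair of keyword
    and page attributes. An empty rule list accepts nothing.
    """
    if not rules or not all(rule(keyword, attributes) for rule in rules):
        return False
    conn.execute(
        "INSERT INTO collected VALUES (?, ?, ?, ?)",
        (keyword, url, content, json.dumps(attributes, sort_keys=True)),
    )
    conn.commit()
    return True
```

Requiring every rule to pass is a design choice made for this sketch; a real rule store might instead select the rule that corresponds to each keyword.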

Description

Technical field

[0001] The present invention relates to a webpage information collection system and a webpage information collection method, in particular to a webpage information collection system and method that support industry structure analysis.

Background

[0002] In recent years, with the popularization of the Internet, an increasing number of companies and individuals have been publishing information on websites. Information published on websites (hereinafter referred to as webpage information) is increasingly collected for use in formulating marketing and corporate strategies. This requires the collected webpage information to be properly classified and arranged. Because manual sorting is costly, methods that classify and sort website information according to the target files to be sorted have appeared in the prior art.

[0003] In the existing method, information is classified / arranged by using web page inf...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/9532, G06F16/9538, G06F16/35
CPC: G06F16/9532, G06F16/9538, G06F16/35
Inventor: 胡日勒
Owner: 盐城至新达科技有限公司