Webpage information identification system

A webpage information identification system, applied in the computer field, that addresses the narrow scope of application of existing webpage data acquisition tools.

Pending Publication Date: 2021-05-11
张冶青

AI Technical Summary

Problems solved by technology

[0005] The main purpose of the present invention is to provide a webpage information identification system that solves the prior-art problem that webpage data acquisition tools have a narrow scope of application.

Detailed Description of the Embodiments

[0035] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0036] It should be noted that, in this document, the terms "comprises", "includes", or any other variant are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.

[0037] Herein, use of suffixes such as 'module', 'part' or 'unit' for denoting elements is only for facilitating description of the present invention and has no specific meaning by itself. Therefore, "module" and "compone...

Abstract

The invention belongs to the technical field of computers and provides a webpage information identification system comprising a path identification module, a vectorization processing module, a clustering processing module, a verification module, a webpage information identification module, and a webpage information output module. The system interfaces with a target webpage, identifies the tag paths in the target webpage based on its content, and then vectorizes and clusters those tag paths so that the webpage content is organized automatically. Finally, the webpage information output module marks the optimal list nodes from the optimal list node set on the target webpage. Because the optimal list nodes in that set are hierarchical, the marked webpage content is divided by level, which makes it easier to extract the content of list items.
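The tag (label) path identification, vectorization, and clustering steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say how paths are vectorized or clustered, so everything below is an assumption made for illustration: the bag-of-tags vector, the cosine similarity measure, the greedy clustering with a `threshold` of 0.9, and the names `TagPathCollector`, `cluster_paths`, and `find_list_candidates` are all hypothetical and are not taken from the patent.

```python
from collections import Counter
from html.parser import HTMLParser
from math import sqrt

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TagPathCollector(HTMLParser):
    """Record the tag path (e.g. 'html/body/ul/li') of every text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.paths = []   # list of (tag_path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.paths.append(("/".join(self.stack), text))


def vectorize(path):
    """Assumed vectorization: count how often each tag occurs in the path."""
    return Counter(path.split("/"))


def cosine(a, b):
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_paths(items, threshold=0.9):
    """Assumed clustering: greedily join the first cluster that is similar enough."""
    clusters = []
    for path, text in items:
        vec = vectorize(path)
        for cluster in clusters:
            if cosine(vec, cluster["vector"]) >= threshold:
                cluster["texts"].append(text)
                break
        else:
            clusters.append({"path": path, "vector": vec, "texts": [text]})
    return clusters


def find_list_candidates(html, min_items=3):
    """Return clusters with enough members to look like a repeated list."""
    collector = TagPathCollector()
    collector.feed(html)
    return [c for c in cluster_paths(collector.paths) if len(c["texts"]) >= min_items]
```

In this sketch, text nodes that share the same or a very similar tag path end up in one cluster, and a cluster with several members is treated as a candidate list. The verification and hierarchical "optimal list node" selection mentioned in the abstract are not modeled here.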

Description

Technical field

[0001] The invention relates to the field of computer technology, and in particular to a webpage information identification system.

Background

[0002] With the rapid development of network technology, the World Wide Web has become the information carrier with the largest transmission volume and the highest transmission efficiency. How to effectively obtain the required data from the World Wide Web and make use of this massive body of information has become a hot research topic in network and communication technology.

[0003] The web crawler is a commonly used tool for acquiring webpage data. It is a program or script that automatically fetches World Wide Web information according to certain rules: it reads the content of a webpage, finds the other link addresses in that page, follows those addresses to the next webpage, and the loop con...
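The crawl loop described in the background (read a page, find its links, follow them to the next page, repeat) can be sketched as a breadth-first traversal. This is only an illustration of the general crawler principle, not part of the claimed system. The `fetch` callback, the `max_pages` limit, and the names `LinkExtractor` and `crawl` are assumptions introduced here, and pages are supplied by the caller so that the sketch makes no network requests.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Visit pages breadth-first, following links found on each page.

    `fetch` is a caller-supplied function mapping a URL to HTML text,
    or to None when the page cannot be retrieved (hypothetical interface).
    """
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])
    visited = []                # URLs successfully read, in visit order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing a dictionary's `get` method as `fetch` is enough to try the loop on a handful of in-memory pages. A real crawler would also resolve relative URLs and respect rate limits and robots rules, which this sketch omits.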

Application Information

IPC(8): G06F16/951, G06F16/9535, G06F16/955
CPC: G06F16/951, G06F16/9535, G06F16/9566
Inventor: 张冶青
Owner: 张冶青