
Webpage information extraction method and system and electronic equipment

A webpage information extraction method, applied in the field of data analysis, addresses the problems of high system resource occupation, high labor cost, and unstable extraction results. Its effects are reduced resource occupation, reduced maintenance cost, and improved extraction efficiency and accuracy.

Inactive Publication Date: 2020-11-20
成都数联铭品科技有限公司
7 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to overcome the above-mentioned deficiencies in the prior art and to provide a method and system for extracting webpage information. It addresses two technical defects of existing webpage information extraction: the excessive labor cost caused by customized extraction rules, and the unstable extraction results and excessive system resource usage caused by third-party open source libraries.



Embodiment Construction

[0032] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. The components of the embodiments of the invention generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without making creative efforts belong to the protection scope of the present invention.

[0033] See Figure 1. Figure 1 shows a schemati...


Abstract

The invention discloses a webpage information extraction method and system, and electronic equipment. In the method, a target webpage is first run through a pre-processing step. The processed target webpage and its corresponding field extraction rules are then obtained and used to generate a field extraction rule base, and the corresponding field information is extracted from the target webpage based on that rule base. The invention further discloses a webpage information extraction system. The method and system address the low efficiency of manually customizing field extraction rules in the prior art, as well as the low accuracy and poor stability of extracting webpage information with existing open source toolkits. The accuracy and stability of field information extraction are improved while labor cost and resource cost are reduced.
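
The flow described in the abstract (pre-process the page, build a field extraction rule base, then extract fields using that rule base) can be illustrated with a minimal Python sketch. The excerpt does not specify how rules are represented or how they are obtained, so everything here is an assumption for illustration: rules are hand-written regular expressions keyed by host name, and the names `preprocess`, `add_rules`, and `extract_fields` are hypothetical, not taken from the patent.

```python
import re
from typing import Dict
from urllib.parse import urlparse

# Assumed rule base layout: host -> field name -> regex with one capture group.
RuleBase = Dict[str, Dict[str, str]]


def preprocess(html: str) -> str:
    """Drop scripts, styles and comments, then collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", html)
    html = re.sub(r"(?s)<!--.*?-->", "", html)
    return re.sub(r"\s+", " ", html).strip()


def add_rules(rule_base: RuleBase, url: str, rules: Dict[str, str]) -> None:
    """Register field extraction rules under the host of the given URL."""
    host = urlparse(url).netloc
    rule_base.setdefault(host, {}).update(rules)


def extract_fields(rule_base: RuleBase, url: str, html: str) -> Dict[str, str]:
    """Apply the rules stored for the URL's host to the pre-processed page."""
    page = preprocess(html)
    rules = rule_base.get(urlparse(url).netloc, {})
    result: Dict[str, str] = {}
    for field, pattern in rules.items():
        match = re.search(pattern, page, flags=re.IGNORECASE)
        if match:
            result[field] = match.group(1).strip()
    return result


# Hypothetical site and rules, for illustration only.
rule_base: RuleBase = {}
add_rules(rule_base, "https://news.example.com/a/1", {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "author": r'<span class="author">(.*?)</span>',
})

sample_html = """
<html><head><script>var x = "<h1>fake</h1>";</script></head>
<body><h1> Hello
 World </h1><span class="author">Jane Doe</span></body></html>
"""

fields = extract_fields(rule_base, "https://news.example.com/a/2", sample_html)
print(fields)  # {'title': 'Hello World', 'author': 'Jane Doe'}
```

Pre-processing before matching keeps the `<h1>` inside the script block from being mistaken for the title. Because rules are looked up by host, a second page from the same site reuses the stored rules without new customization.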

Description

Technical field

[0001] The invention relates to the field of data analysis, in particular to a webpage information extraction method and system, and electronic equipment.

Background technique

[0002] In the era of big data, web crawlers are a useful tool for collecting data from the Internet. A crawler needs to crawl hundreds or thousands of related site pages to obtain topic-related information such as title, time, source, content, and author. Given the diversity of web development technologies and style designs, the traditional solution is to customize the extraction code and extraction rules for each site. The advantage of this solution is that the accuracy of field extraction is very high. The obvious disadvantage is that, because web page styles are so diverse, customizing rules for every site is labor-intensive. At the same time, this solution relies heavily on the stability of the web page structure, and the adjustment of the web page structure will lead to the extraction ...


Application Information

IPC(8): G06F16/951
CPC: G06F16/951
Inventor 何莹瑜, 丁明会, 许杰, 吴桐
Owner 成都数联铭品科技有限公司