Directional web data extraction method
A directional webpage data extraction technology, applied in the fields of network technology and search engines. It addresses the inability of existing approaches to crawl webpage data directionally and their limited fields of application, and achieves simple operation, a wide range of application, and savings in storage resources.
Examples
Embodiment 1
[0055] To provide users with a vegetable price information service, the server providing the service needs to capture vegetable price data from a professional price information website. The web files on the price information website are in HTML (Hyper Text Markup Language) format; the URL (Uniform Resource Locator) of the page containing the vegetable price information is "http://www.feinno.com/commodity-price/016", and the vegetable prices on this page are presented in the table structure shown in Table 1:
[0056] Table 1
[0057]
[0058] In Table 1, the "Vegetable Price" cell is the header, and the other cells are data cells. Suppose a "green pepper" price quotation service is to be provided to users. The server therefore needs to capture the green pepper price data from the webpage and ignore the other data on the page. As shown in Figure 1, the specific method is as follows:
[0059]...
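The steps of this embodiment are elided above, so the following is only a minimal sketch of the general idea: capture one targeted data cell and ignore the rest. It assumes Table 1 is marked up as an ordinary HTML table with one header cell and one name/price row per vegetable; `SAMPLE_HTML`, `DataCellCollector`, and `price_of` are hypothetical names introduced for illustration and do not come from the patent.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the page. The real Table 1 markup is not shown
# in the source, so this layout is an assumption.
SAMPLE_HTML = """
<table>
  <tr><th colspan="2">Vegetable Price</th></tr>
  <tr><td>tomato</td><td>3.50 yuan / 500 grams</td></tr>
  <tr><td>green pepper</td><td>2.50 yuan / 500 grams</td></tr>
  <tr><td>carrot</td><td>1.50 yuan / 500 grams</td></tr>
</table>
"""


class DataCellCollector(HTMLParser):
    """Collects the text of <td> cells row by row; <th> cells are skipped."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell texts
        self._row = None     # cells of the row currently being read
        self._in_td = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True
            if self._row is not None:
                self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr":
            # Rows without data cells (the header row) are dropped.
            if self._row:
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None

    def handle_data(self, data):
        if self._in_td and self._row:
            self._row[-1] += data


def price_of(html, name):
    """Returns the price cell of the row whose first cell equals `name`."""
    parser = DataCellCollector()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        if len(row) >= 2 and row[0] == name:
            return row[1]
    return None


print(price_of(SAMPLE_HTML, "green pepper"))
```

Only the requested cell is returned; the tomato and carrot rows are parsed but never stored by the caller, which is the "directional" aspect the embodiment describes.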
Embodiment 2
[0086] To provide users with a vegetable price information service, the server providing the service needs to capture price data from the webpage "http://www.feinno.com/commodity-price/016" of the price information website described in Embodiment 1; the vegetable prices on this page are shown in Table 1. Here, users are to be provided with the vegetable names listed in this table and the corresponding prices. Therefore, the vegetable names "tomato", "green pepper", and "carrot" and the corresponding prices "3.50 yuan / 500 grams", "2.50 yuan / 500 grams", and "1.50 yuan / 500 grams" are the webpage data to be captured. Directional capture is performed using the method of the present invention; the flow chart is shown in Figure 1, and the specific method is as follows:
[0087] i) According to the data structure characteristics of the web file to be captured, the data matching model built b...
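Step i) is truncated in this text, so the form of the data matching model is not known. As one possible reading, the sketch below assumes the model is a regular expression derived from the table structure (a row made of exactly two data cells). The markup in `SAMPLE_HTML` and the names `ROW_PATTERN` and `capture_pairs` are hypothetical.

```python
import re

# Hypothetical stand-in for the page; the real Table 1 markup is not shown
# in the source.
SAMPLE_HTML = """
<table>
  <tr><th colspan="2">Vegetable Price</th></tr>
  <tr><td>tomato</td><td>3.50 yuan / 500 grams</td></tr>
  <tr><td>green pepper</td><td>2.50 yuan / 500 grams</td></tr>
  <tr><td>carrot</td><td>1.50 yuan / 500 grams</td></tr>
</table>
"""

# Assumed matching model: a table row holding exactly two data cells,
# the first being the vegetable name and the second its price.
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>(?P<name>[^<]+)</td>\s*<td>(?P<price>[^<]+)</td>\s*</tr>",
    re.IGNORECASE,
)


def capture_pairs(html):
    """Returns (name, price) pairs for every row that fits the model."""
    return [
        (m.group("name").strip(), m.group("price").strip())
        for m in ROW_PATTERN.finditer(html)
    ]


for name, price in capture_pairs(SAMPLE_HTML):
    print(name, price)
```

The header row is skipped automatically because it contains a header cell rather than two data cells, so it never fits the pattern.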
Embodiment 3
[0121] On the price information website, the content of the webpage at the URL "http://www.feinno.com/commodity-price/016" has changed: another food price information table with the same table structure as Table 1 has been added, as shown in Table 4:
[0122] Table 4
[0123]
[0124] The service still provides users with the vegetable price information table (as shown in Table 1), so the vegetable names listed in Table 1 and the corresponding prices need to be directionally captured from the above webpage: the names "tomato", "green pepper", and "carrot", and the three corresponding prices "3.50 yuan / 500 grams", "2.50 yuan / 500 grams", and "1.50 yuan / 500 grams".
[0125] From the table structure characteristics of Table 1 and the rules of HTML source code, the source code for the vegetable name data and vegetable price data in the web file can be determined.
[0142] ... table tag, header tag, data cell tag ... But the characteristics of only ...
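The rest of this embodiment is truncated, so how the method separates Table 1 from Table 4 is not stated here. The sketch below only illustrates the underlying problem and one possible way around it: a structure-only pattern matches the rows of both tables, while additionally anchoring on the header text keeps the capture directional. The page markup, the Table 4 rows, and the names `PAGE_HTML` and `capture_from_table` are all invented for illustration.

```python
import re

# Hypothetical page holding two tables with identical structure. The real
# content of Table 4 is not given in the source; its row here is invented.
PAGE_HTML = """
<table>
  <tr><th colspan="2">Food Price</th></tr>
  <tr><td>rice</td><td>2.00 yuan / 500 grams</td></tr>
</table>
<table>
  <tr><th colspan="2">Vegetable Price</th></tr>
  <tr><td>tomato</td><td>3.50 yuan / 500 grams</td></tr>
  <tr><td>green pepper</td><td>2.50 yuan / 500 grams</td></tr>
  <tr><td>carrot</td><td>1.50 yuan / 500 grams</td></tr>
</table>
"""

TABLE_PATTERN = re.compile(r"<table>(.*?)</table>", re.DOTALL | re.IGNORECASE)
HEADER_PATTERN = re.compile(r"<th\b[^>]*>\s*([^<]+?)\s*</th>", re.IGNORECASE)
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>([^<]+)</td>\s*<td>([^<]+)</td>\s*</tr>", re.IGNORECASE
)


def capture_from_table(html, header_text):
    """Returns (name, price) rows of the first table whose header matches."""
    for table in TABLE_PATTERN.finditer(html):
        body = table.group(1)
        header = HEADER_PATTERN.search(body)
        if header and header.group(1) == header_text:
            return [(n.strip(), p.strip()) for n, p in ROW_PATTERN.findall(body)]
    return []


# Structure alone cannot tell the tables apart: it matches rows from both.
print(len(ROW_PATTERN.findall(PAGE_HTML)))
# Adding the header text as a feature restricts the capture to Table 1.
print(capture_from_table(PAGE_HTML, "Vegetable Price"))
```

Matching on header text is only one candidate feature; any content characteristic that differs between the two tables could play the same role.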