An entity-based bottom-up web data extraction method
A bottom-up data extraction technology, applied in electrical digital data processing, special data processing applications, instruments, etc., that addresses the problems of reduced recall, neglected connections, and unsuitability for complex pages.
Embodiment
[0042] Step 1. Select the web data page: use the popular air-ticket booking website "Taobao Air Ticket" (http://ipiao.taobao.com/2010/home.htm?TBG=66409.71436.28&ad_id=&am_id=&cm_id=1400381961b2c34cffa7&pm_id=) as the data source. Select Shenyang as the origin of the flight, Shenzhen as the destination, and 2011/5/11 as the date, then click Search to obtain the ticket result page (see Figure 4 of the drawings). The HTML source code of this page is taken as the input.
[0043] Step 2. Divide the text: after preprocessing the result page D, divide the text of D to obtain the text sequence S_list.
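The text-division step can be sketched with Python's standard `html.parser` module. This is a minimal sketch under stated assumptions, not the patented procedure: it assumes that dividing the text of D means collecting its non-empty text nodes in document order while skipping `<script>` and `<style>` content, since the source does not specify the preprocessing of D. The class `TextSegmenter` and the function `divide_text` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List


class TextSegmenter(HTMLParser):
    """Collect non-empty text nodes in document order (hypothetical helper)."""

    # Assumption: script and style content is not page text and is skipped.
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.segments: List[str] = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that lies outside skipped elements.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.segments.append(text)


def divide_text(html: str) -> List[str]:
    """Return the text sequence S_list for the HTML source of page D."""
    parser = TextSegmenter()
    parser.feed(html)
    parser.close()
    return parser.segments
```

For example, `divide_text("<div>Shenyang</div><p> CZ3456 </p>")` returns `["Shenyang", "CZ3456"]`. S_list is represented here as a plain Python list of strings, which is the form the attribute-labeling step can consume segment by segment.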
[0044] Step 3. Label entity attributes: the extraction rules for the air-ticket booking topic are defined as follows:
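[0045] Each entity attribute has a first-level rule set R1 and a second-level rule set R2:
Flight (F)
- First-level rule set R1: \C{4,8}([\w\d]{6})?
- Second-level rule set R2: \C{2}Aviation\w{2}\d{4}
Time (T)
- First-level rule set R1: \d{1,2}[:dot]\d{1,2}
- Second-level rule set R2: ([01][0-9])|(2[0-4])[:point]([0-5][0-9])|(60) ...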
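One plausible reading of the two-level rules above is a cascade: a text segment from S_list is a candidate for an attribute if it matches the looser first-level rule R1, and is labeled with higher confidence if it also matches the stricter second-level rule R2. The sketch below illustrates that reading for the time attribute (T) only; it is an assumption, not the method as claimed. It also assumes that the source's `[:dot]`/`[:point]` separator can be approximated by a colon, and it uses an approximation of the truncated R2 rule with grouping parentheses added so the alternations apply to the hour and minute fields separately. `TIME_RULES`, `label_segment`, and `label_sequence` are hypothetical names.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical rule table: attribute code -> (first-level rule R1, second-level rule R2).
# Assumption: "[:dot]" / "[:point]" in the source is approximated by ":".
# R2 approximates the truncated source rule, with grouping added.
TIME_RULES: Dict[str, Tuple[str, str]] = {
    "T": (r"\d{1,2}:\d{1,2}", r"([01][0-9]|2[0-4]):([0-5][0-9]|60)"),
}


def label_segment(
    segment: str, rules: Dict[str, Tuple[str, str]]
) -> Optional[Tuple[str, int]]:
    """Return (attribute, level) for the first attribute whose rules match.

    Level 2 means the stricter rule R2 matched the whole segment;
    level 1 means only the looser rule R1 matched. None means no match.
    """
    text = segment.strip()
    for attribute, (r1, r2) in rules.items():
        if re.fullmatch(r2, text):
            return attribute, 2
        if re.fullmatch(r1, text):
            return attribute, 1
    return None


def label_sequence(
    s_list: List[str], rules: Dict[str, Tuple[str, str]]
) -> List[Tuple[str, Optional[Tuple[str, int]]]]:
    """Pair every segment of S_list with its label (or None)."""
    return [(segment, label_segment(segment, rules)) for segment in s_list]
```

With these rules, `"08:30"` is labeled `("T", 2)`, `"8:30"` only satisfies R1 and is labeled `("T", 1)`, and `"Shenyang"` receives no label. Checking R2 before R1 lets a segment receive the most specific level it qualifies for.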