Web page data acquisition method using contextual environment rules
A web page data acquisition method, applied in the fields of electronic digital data processing, special data processing applications, instruments, etc. It addresses problems such as reading difficulties for users, and achieves high content extraction quality, high writing efficiency, and simple definition.
Embodiment Construction
[0010] The technical solutions of the present invention will be further described in detail below in conjunction with specific embodiments.
[0011] A web page data acquisition method using contextual environment rules comprises content extraction rules and a rule matching algorithm. The content extraction rules are defined mainly by the user according to the extraction rule syntax. The rules are organized in an inheritance tree similar to that of object-oriented languages: specific, specialized extraction rules inherit from general rules. The extraction rule syntax follows a condition-action pattern. The condition part includes DOM node attributes and context attributes. The DOM node attributes include the tag name, node class name, node ID, node font name, node width, node height, and also values computed from the contents of the DOM node, such as the number of images it contains, the number of strings, text length, link de...
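The condition-action rules and their tree inheritance can be illustrated with a minimal Python sketch. It is not the patent's rule syntax or matching algorithm, which are not given in this excerpt. The `Rule` class, the flat dictionary representation of a DOM node, the attribute names (`tag`, `class_name`, `text_length`, `image_count`), the thresholds, and the "most specific matching rule wins" policy are all assumptions made for illustration. Context attributes are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical flat view of a DOM node: attribute name -> value.
Node = Dict[str, object]


@dataclass
class Rule:
    """A condition-action rule that inherits from an optional parent rule."""
    name: str
    conditions: Dict[str, Callable[[object], bool]]
    action: Optional[str] = None
    parent: Optional["Rule"] = field(default=None, repr=False, compare=False)
    children: List["Rule"] = field(default_factory=list, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Register with the parent so the rules form a tree.
        if self.parent is not None:
            self.parent.children.append(self)

    def effective_conditions(self) -> Dict[str, Callable[[object], bool]]:
        # Inherited conditions apply unless this rule overrides the attribute.
        inherited = self.parent.effective_conditions() if self.parent else {}
        return {**inherited, **self.conditions}

    def effective_action(self) -> Optional[str]:
        # A rule without its own action falls back to its parent's action.
        if self.action is not None:
            return self.action
        return self.parent.effective_action() if self.parent else None

    def matches(self, node: Node) -> bool:
        return all(
            attr in node and test(node[attr])
            for attr, test in self.effective_conditions().items()
        )


def most_specific_match(rule: Rule, node: Node) -> Optional[Rule]:
    """Return the deepest rule in the tree that matches the node, or None."""
    if not rule.matches(node):
        return None
    for child in rule.children:
        found = most_specific_match(child, node)
        if found is not None:
            return found
    return rule


# General rule: any block-level container is skipped by default.
general = Rule("block", {"tag": lambda t: t in ("div", "td")}, action="skip")
# Specialized rule: long, text-heavy blocks are extracted.
article = Rule(
    "article_body",
    {"text_length": lambda n: n >= 200, "image_count": lambda n: n <= 3},
    action="extract",
    parent=general,
)
# Specialized rule with no action of its own: inherits "skip".
sidebar = Rule("sidebar", {"class_name": lambda c: c == "sidebar"}, parent=general)

content_node = {"tag": "div", "class_name": "content", "text_length": 850, "image_count": 1}
sidebar_node = {"tag": "div", "class_name": "sidebar", "text_length": 40, "image_count": 0}
span_node = {"tag": "span", "class_name": "content", "text_length": 850, "image_count": 0}

for node in (content_node, sidebar_node, span_node):
    rule = most_specific_match(general, node)
    print(rule.name if rule else None, rule.effective_action() if rule else None)
```

In this sketch a specialized rule only needs to state what differs from its parent: `article_body` adds two computed-value conditions on top of the inherited tag condition, and `sidebar` reuses the parent's action unchanged.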