Subject area identifying method based on weight of text structure
A technology relating to subject areas and text structure, applied in the field of Web information extraction. It addresses problems such as degraded application effect, slow extraction speed, and usage restrictions, with the stated benefits of saving time and effort, running fast, and being simple.
Embodiment Construction
[0096] The technical solution of the present invention will be described in detail below in conjunction with the drawings and embodiments.
[0097] As shown in Figure 1, in this embodiment the web page is first acquired and then denoised, yielding the web page to be identified. Web page acquisition is the original data source and is responsible for providing the Web pages to be identified. In a concrete implementation, a simple breadth-first crawler can be used for web page acquisition: a web page is first fetched from the Internet using a seed URL, the links in it are parsed, and newly found links are stored in a queue; links are then repeatedly taken from the queue, and the process stops when the user's goal is reached or the queue is empty. Web page denoising standardizes the acquired web pages, which can improve recognition accuracy. In a specific implementation, the HTML document of the web page to be identified can be standardized acco...
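The breadth-first acquisition step described in [0097] can be illustrated with a minimal sketch. It is not the patent's implementation: the names `LinkParser`, `crawl_breadth_first`, `fetch`, and `max_pages` are assumptions introduced here, and the "user goal" stopping condition is approximated by a page limit. The `fetch` function is supplied by the caller so that the queue logic can be shown without any network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_breadth_first(seed_url, fetch, max_pages=100):
    """Return {url: html} for pages reached breadth-first from seed_url.

    fetch is a caller-supplied function mapping a URL to its HTML text,
    or None when the page cannot be retrieved. max_pages stands in for
    the "user goal" stopping condition (an assumption of this sketch).
    """
    queue = deque([seed_url])   # FIFO queue gives breadth-first order
    seen = {seed_url}           # avoids queueing the same link twice
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

In practice `fetch` might wrap an HTTP client call; passing a dictionary lookup such as `site.get` over a small in-memory set of pages is enough to exercise the traversal. The loop ends when the page limit is reached or the queue is empty, mirroring the two stopping conditions named in the text.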