Text formatting and cleaning method based on web page tag position
A technique for web page tags and text, applied in text database indexing, unstructured text data retrieval, special data processing applications, and similar fields, with the effect of improving accuracy.
Embodiment Construction
[0033] The preferred embodiments of the present invention are described in detail below in conjunction with the accompanying drawings, so that those skilled in the art can more easily understand the advantages and features of the present invention, and so that its protection scope can be defined more clearly.
[0034] Referring to Figure 1, the embodiment of the present invention includes:
[0035] A text formatting and cleaning method based on the position of web page tags, comprising the following steps:
[0036] S1: traverse all the tags of the entire webpage, and record each tag's name, location, text content, and document number in the original table; the specific steps include:
[0037] S101: read the network address (URL, uniform resource locator) of the original webpage, which can be either the link address of the website or the path of a downloaded HTML file, and convert the content of the webpage into a tree-structured document object;
[0038] S102: T...
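Steps S1 and S101 can be illustrated with a minimal Python sketch. It is not the patented implementation: the text available here does not define how a "tag location" is encoded, so the sketch assumes a path of child indices from the root (for example `/0/0/1`). The names `TagRecorder`, `build_original_table`, and the row keys `doc_id`, `tag`, `position`, and `text` are hypothetical. Instead of materialising a full document object, it tracks the stack of open tags while parsing with the standard-library `html.parser`, which yields the same parent-child positions for well-formed input. Fetching the page from a URL or file is left out; the function takes the HTML as a string.

```python
from html.parser import HTMLParser

# Void elements never receive a closing tag, so they are recorded
# but never pushed onto the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagRecorder(HTMLParser):
    """Collects one row per tag: document number, tag name, position, text."""

    def __init__(self, doc_id):
        super().__init__()
        self.doc_id = doc_id
        self.rows = []             # stands in for the "original table"
        self._stack = []           # indices into self.rows for open tags
        self._child_counts = [0]   # children seen so far at each open depth

    def handle_starttag(self, tag, attrs):
        # Position = parent's path plus this tag's index among its siblings
        # (an assumed encoding, see the note above).
        index = self._child_counts[-1]
        self._child_counts[-1] += 1
        parent = self.rows[self._stack[-1]]["position"] if self._stack else ""
        self.rows.append({"doc_id": self.doc_id, "tag": tag,
                          "position": f"{parent}/{index}", "text": ""})
        if tag not in VOID_TAGS:
            self._stack.append(len(self.rows) - 1)
            self._child_counts.append(0)

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open tag.
        if self._stack and self.rows[self._stack[-1]]["tag"] == tag:
            self._stack.pop()
            self._child_counts.pop()

    def handle_data(self, data):
        # Attach text to the innermost open tag that directly contains it.
        text = data.strip()
        if text and self._stack:
            row = self.rows[self._stack[-1]]
            row["text"] = (row["text"] + " " + text).strip()


def build_original_table(html, doc_id):
    """Traverse every tag in `html` and return the recorded rows."""
    recorder = TagRecorder(doc_id)
    recorder.feed(html)
    recorder.close()
    return recorder.rows


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
    for row in build_original_table(page, doc_id=1):
        print(row)
```

Each row keeps only the text that sits directly inside its tag, so the `p` row above holds "Hello" and the nested `b` row holds "world". Whether the method attributes nested text this way is not stated in the visible portion of the description.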