Method for acquiring news web page text information
A text information extraction method, applied in the fields of instruments, computing, and electrical digital data processing. It addresses the complexity and high cost of generating and maintaining wrappers, and it reduces manual intervention while improving efficiency and accuracy.
Example Embodiment
[0048] In the following, the method of the present invention will be further explained in conjunction with the embodiments and the drawings.
[0049] Take, for example, the extraction of body text from 1,000 news pages, arranged in chronological order, from the sports channel of Sina News. As shown in Figure 1, a method for extracting the body information of a news web page includes the following steps:
[0050] (1) Use a third-party web page cleaning tool (for example, the tidy tool) to perform standardized preprocessing on the 1,000 web pages so that they conform to the Html language standard; then, following the Html tags, parse the Html data of all the news web pages to obtain the Html tree;
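As an illustration of step (1), the following is a minimal sketch of building an Html tree from a single page. It is not the patent's implementation. It assumes Python and uses only the standard-library `html.parser` module in place of the tidy preprocessing, relying on that parser's tolerance of malformed markup. The names `Node`, `TreeBuilder`, `parse_html`, and `VOID_TAGS` are hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they must not become the
# "current" open element while the tree is being built.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of the Html tree (hypothetical structure for this sketch)."""

    def __init__(self, tag, parent=None):
        self.tag = tag          # lower-cased tag name, e.g. "div"
        self.parent = parent    # enclosing Node, or None for the root
        self.children = []      # child elements in document order
        self.text = []          # non-blank text chunks directly inside this element


class TreeBuilder(HTMLParser):
    """Builds a Node tree from the parser's start/end/data events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the newly opened element

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: add a leaf and do not descend.
        self.current.children.append(Node(tag, self.current))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray closers.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.current.text.append(stripped)


def parse_html(html):
    """Parse one page's Html string and return the root of its tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

Calling `parse_html(page_source)` on each of the collected pages would yield one tree per page. A production pipeline would more likely run a dedicated cleaning tool first, as the step describes, since this sketch only recovers from simple cases such as unmatched closing tags.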
[0051] When parsing the Html data of all the news web pages and constructing the Html tree, the following method is used:
[0052] Because in the present invention, the Html tag with The effect is the same, so the present invention is based on with The situation is completely simil...