Method for acquiring news web page text information
A text information extraction method, applied in the fields of instruments, computing, and electric digital data processing, that addresses the problems of inaccurate or incomplete extracted text and low efficiency, and achieves the effect of reducing manual intervention and improving efficiency.
Embodiment Construction
[0050] The method of the present invention will be further illustrated below in conjunction with the embodiments and accompanying drawings.
[0051] Take as an example the extraction of text information from 1000 news webpages, arranged in chronological order, that were captured from the sports channel of Sina News. As shown in Figure 1, a method for extracting news webpage text information includes the following steps:
[0052] (1) Use a third-party web page cleanup tool (for example, the tidy tool) to perform standardized preprocessing on the 1000 web pages so that they conform to the Html language standard; then parse the Html data of all news webpages according to the Html tags to obtain the Html tree;
[0053] When parsing the Html data of all news webpages and constructing the Html tree, the following methods are used:
[0054] Since in the present invention, the Html tag and The effect is the same, so the present invention uses and The sit...
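As a rough illustration of step (1), the sketch below builds a simple element tree from a page's markup. It is an assumption-laden stand-in, not the patented procedure: Python's standard `html.parser` module is used in place of the tidy tool (it tolerates unclosed tags but does not rewrite the page to conform to the standard), and the names `Node`, `TreeBuilder`, `build_html_tree`, and `outline` are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are not kept open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of the tree (hypothetical structure, not from the patent)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []  # non-empty text fragments directly inside this element


class TreeBuilder(HTMLParser):
    """Builds a Node tree while tolerating unclosed or stray tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag name.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent  # implicitly closes anything in between
        # A stray end tag with no matching open element is ignored.

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.current.text.append(stripped)


def build_html_tree(html):
    """Parse one page's markup and return the root of its tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per element."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    sample = "<html><body><div><p>Hello<br>world</p><p>Second</div></body></html>"
    print("\n".join(outline(build_html_tree(sample))))
```

In a real pipeline, each captured page would first be passed through the cleanup tool and the cleaned output fed to `build_html_tree`; the tolerant end-tag handling here only keeps the sketch from failing on markup such as the unclosed second `<p>` in the sample.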