Webpage content processing method and apparatus
A webpage content processing method and apparatus, applied in the field of data processing. It addresses problems such as low versatility, reduced data availability, and difficulty in sorting and optimization, with the aim of optimizing the processing technique, expanding descriptive information, and supporting personalization.
Examples
Embodiment 1
[0029] Figure 1 is a flowchart of a method for processing webpage content provided by Embodiment 1 of the present invention. The method of this embodiment can be executed by a webpage content processing apparatus, which can be implemented in hardware and/or software and can generally be integrated into a server. The method of this embodiment specifically includes:
[0030] 110. Read the HTML-structured text data corresponding to the webpage to be processed.
[0031] In the technical solution of this embodiment, the text content of the webpage to be processed is ultimately converted into title text pairs, so that text content must be read first. Because a webpage consists of HTML-structured hypertext, this embodiment defines the text content of the webpage to be processed as HTML-structured text data.
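As an illustration of step 110, the following is a minimal sketch of reading HTML-structured text data with Python's standard `html.parser` module. The names `TextCollector`, `read_html_text`, and `VOID_TAGS` are hypothetical, and keeping each text fragment together with its enclosing tag is an assumption made here so that later steps could tell title-like content from body text; the embodiment itself does not prescribe a parser or a data layout.

```python
from html.parser import HTMLParser

# Assumption: tags that never have a closing tag are not tracked as open.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class TextCollector(HTMLParser):
    """Collects (enclosing tag, text) fragments in document order."""

    def __init__(self):
        super().__init__()
        self._stack = []      # names of currently open tags
        self.fragments = []   # (enclosing tag or None, stripped text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self._stack:
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            enclosing = self._stack[-1] if self._stack else None
            self.fragments.append((enclosing, text))


def read_html_text(html):
    """Return the text fragments of an HTML string with their enclosing tags."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.fragments


sample = "<html><body><h2>Intro</h2><p>Hello <b>world</b></p></body></html>"
fragments = read_html_text(sample)
```

For the sample string, `fragments` holds one entry per text run, each tagged with the innermost element that contains it.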
[0032] Wherein, those skilled in the art may...
Embodiment 2
[0044] Figure 2a is a flowchart of a method for processing webpage content provided by Embodiment 2 of the present invention. This embodiment is optimized on the basis of the above embodiments. In this embodiment, converting each paragraph in the paragraph list into a title text pair, according to the content with title attributes in each paragraph, is specifically refined as: extracting a paragraph from the paragraph list as a target paragraph; identifying the content with title attributes in the target paragraph as a title; using the content of the target paragraph other than the title as the paragraph text; and forming the title text pair from the title and the paragraph text, with the target paragraph taken as an independent whole.
[0045] Correspondingly, the method in this embodiment specifically includes:
[0046] 210. Read the HTML-structured text data corresponding to the webpage to be processed.
[0047] 220. Using a paragraph as a unit, perform structural divisi...
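The conversion described in [0044] can be sketched as follows. This is an illustrative sketch only: the representation of a paragraph as a list of `(tag, text)` fragments, the names `TitleTextPair`, `to_title_text_pair`, and `convert_paragraph_list`, and the set `TITLE_TAGS` are all assumptions. In particular, which markup counts as "content with title attributes" is not specified in the text above, so heading and bold tags are used here as a stand-in.

```python
from dataclasses import dataclass

# Assumption: tags treated as carrying "title attributes".
TITLE_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "b", "strong"}


@dataclass
class TitleTextPair:
    title: str   # content identified as the title (may be empty)
    text: str    # remaining content of the target paragraph


def to_title_text_pair(paragraph):
    """Convert one target paragraph, given as (tag, text) fragments, into a pair."""
    title_parts = [text for tag, text in paragraph if tag in TITLE_TAGS]
    body_parts = [text for tag, text in paragraph if tag not in TITLE_TAGS]
    return TitleTextPair(" ".join(title_parts), " ".join(body_parts))


def convert_paragraph_list(paragraphs):
    """Treat each paragraph as an independent whole and convert it to one pair."""
    return [to_title_text_pair(p) for p in paragraphs]


paragraphs = [
    [("h2", "Overview"), ("p", "This page describes the product.")],
    [("p", "A paragraph without any title content.")],
]
pairs = convert_paragraph_list(paragraphs)
```

Each paragraph yields exactly one pair, so a paragraph with no title content still produces a pair whose title is empty.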
Embodiment 3
[0062] Figure 3a is a flowchart of a method for processing webpage content provided by Embodiment 3 of the present invention. This embodiment is optimized on the basis of the above embodiments. In this embodiment, after each paragraph in the paragraph list is converted into a title text pair according to the content with title attributes in each paragraph, the method preferably further includes: if two adjacent title text pairs do not include paragraph text, and the title in the earlier title text pair includes only numbers, merging the two adjacent title text pairs to generate a new title text pair;
[0063] In addition, after converting each paragraph in the paragraph list into a title text pair according to the content with title attributes in each paragraph, the method preferably also includes: if, in two adjacent title text pairs, the previous paragr...
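The first merge rule, stated in [0062], can be sketched as below. Several points are assumptions rather than details from the text: the sketch checks only the earlier pair for missing paragraph text, which is one reading of the condition; it treats a title as "only numbers" when `str.isdigit()` holds; and it builds the merged title by joining the two titles with a space. The names `TitleTextPair` and `merge_numeric_titles` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TitleTextPair:
    title: str
    text: str


def merge_numeric_titles(pairs):
    """Merge a text-less, number-only title pair into the pair that follows it."""
    merged = []
    for pair in pairs:
        prev = merged[-1] if merged else None
        # Assumed condition: earlier pair has no text and a digits-only title.
        if prev is not None and not prev.text and prev.title.strip().isdigit():
            new_title = f"{prev.title.strip()} {pair.title}".strip()
            merged[-1] = TitleTextPair(new_title, pair.text)
        else:
            merged.append(pair)
    return merged


pairs = [
    TitleTextPair("1", ""),
    TitleTextPair("Introduction", "Body of the first section."),
    TitleTextPair("Usage", "Body of the second section."),
]
result = merge_numeric_titles(pairs)
```

Here the stray numbering pair `"1"` is folded into the pair that follows it, while pairs that already carry paragraph text are left unchanged.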