Universal forum text extraction method
A text extraction method for general-purpose forum pages. It addresses the inability of existing approaches to extract useful information from forums efficiently and in a way that works across different sites, so that the extracted text can be put to good use.
Embodiment Construction
[0037] To make the technical scheme of the present invention clearer, the specific implementation of the invention is described in detail below with reference to the flow chart of the general forum text extraction method shown in Figure 1.
[0038] The general forum text extraction method of the present invention comprises the following steps:
[0039] a. Crawl data: Crawl all the information of the website, that is, fetch the complete HTML code of each page, detect the encoding of the page, and convert it uniformly to UTF-8 for subsequent processing;
[0040] b. Clean data: Working from the UTF-8 encoded data, use BeautifulSoup to parse the HTML tags and obtain the DOM tree of the web page, as shown in Figure 2. Extract the title information and the content of the div tags that contain publication time information, filter out useless information, classify the extracted information, and generate a list;
[0041] c. Format information: ...
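Steps a and b can be illustrated with a minimal sketch. It is not the patented implementation: the source does not say how the encoding is detected or how publication times are recognized, so the sketch assumes the charset is declared in a meta tag (with UTF-8 and GB18030 as fallbacks) and that publication times look like 2015-03-02 or 2015/3/2. To stay runnable without third-party packages it uses Python's standard html.parser module as a stand-in for BeautifulSoup. The names to_utf8, ForumPageParser, and clean_page are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: the page declares its charset in a <meta> tag near the top.
META_CHARSET = re.compile(rb'charset=["\']?([\w-]+)', re.IGNORECASE)
# Assumption: publication times contain a date such as 2015-03-02 or 2015/3/2.
DATE_PATTERN = re.compile(r"\d{4}[-/]\d{1,2}[-/]\d{1,2}")


def to_utf8(raw: bytes) -> bytes:
    """Step a: detect the page encoding and re-encode the page as UTF-8."""
    candidates = []
    declared = META_CHARSET.search(raw[:2048])
    if declared:
        candidates.append(declared.group(1).decode("ascii"))
    candidates += ["utf-8", "gb18030"]  # fallbacks chosen for illustration
    for encoding in candidates:
        try:
            return raw.decode(encoding).encode("utf-8")
        except (LookupError, UnicodeDecodeError):
            continue  # unknown or wrong encoding, try the next candidate
    # Last resort: decode as UTF-8 and replace undecodable bytes.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


class ForumPageParser(HTMLParser):
    """Step b: collect the title and the text of divs that contain a date."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.dated_blocks = []
        self._in_title = False
        self._div_stack = []  # one text buffer per currently open <div>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "div":
            self._div_stack.append([])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "div" and self._div_stack:
            # Join the div's own text and collapse runs of whitespace.
            text = " ".join("".join(self._div_stack.pop()).split())
            if DATE_PATTERN.search(text):
                self.dated_blocks.append(text)  # keep only dated divs

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._div_stack:
            # Text is credited to the innermost open div only.
            self._div_stack[-1].append(data)


def clean_page(raw: bytes) -> dict:
    """Run steps a and b on the raw bytes of one page."""
    parser = ForumPageParser()
    parser.feed(to_utf8(raw).decode("utf-8"))
    parser.close()
    return {"title": parser.title.strip(), "posts": parser.dated_blocks}
```

Calling clean_page on the raw bytes of a fetched page returns the title and a list of the div texts that contain a date; divs without a date, such as navigation bars, are dropped. With BeautifulSoup, as named in step b, the same selection could be written with soup.title and soup.find_all("div"), followed by the same date filter.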