Method and device for extracting forum post content
A forum and content technology in the field of forum post content extraction. It addresses the problem of extracting the content of forum posts and achieves automatic extraction.
Detailed Description of the Embodiments
[0014] The present invention will be described in detail below with reference to the accompanying drawings and in combination with embodiments.
[0015] Figure 1 is a flowchart of a method for extracting forum post content according to an embodiment of the present invention. The method includes:
[0016] Step S10, generating an HTML tag tree from the HTML source code of the forum post page;
[0017] Step S20, merging the tag subtrees of the HTML tag tree whose text rate is greater than a first threshold to obtain the largest candidate subtree. Based on the results of repeated experiments, the first threshold is preferably set to 0.8;
[0018] Step S30, filtering out from the largest candidate subtree all node clusters with similar structures; these clusters are the posts on each floor (each numbered reply in the thread);
[0019] Step S40, selecting from these node clusters those whose text rate is greater than a second threshold. Based on the results of repeated experiments, the second threshold is preferably set t...
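Step S10 can be illustrated with a minimal Python sketch built on the standard-library `html.parser` module. The `Node` class, the `build_tag_tree` function and the choice to drop `script`/`style` text are illustrative assumptions, not details taken from the patent. The tree keeps each element's tag and attributes and stores non-blank text fragments as string children.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional, Union

# Void elements never have an end tag, so they are attached but not opened.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Assumption: text inside these elements is not post content and is dropped.
SKIP_TEXT_TAGS = {"script", "style"}


@dataclass
class Node:
    tag: str
    attrs: dict[str, Optional[str]] = field(default_factory=dict)
    children: list[Union[Node, str]] = field(default_factory=list)


class TagTreeBuilder(HTMLParser):
    """Hypothetical builder: non-blank text fragments become str children."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: attach the node without opening it.
        self.stack[-1].children.append(Node(tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack[-1].tag not in SKIP_TEXT_TAGS:
            self.stack[-1].children.append(text)


def build_tag_tree(source: str) -> Node:
    """Parse HTML source into a tag tree rooted at a synthetic '#root' node."""
    builder = TagTreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root
```

The explicit stack tolerates the unclosed or stray tags common in forum markup: an end tag closes back to its nearest open match, and an unmatched end tag is simply ignored.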
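The excerpt does not define "text rate" or say how qualifying subtrees are "merged", so the following sketch of Step S20 rests on two labeled assumptions. First, the text rate of a subtree is taken to be its visible-text length divided by the length of the subtree serialized as `<tag>...</tag>` (attributes ignored). Second, merging is read as taking the lowest common ancestor of every subtree whose rate exceeds the first threshold (0.8 by default, the preferred value stated in Step S20). The `Node` class and all function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Node:
    tag: str
    children: list[Union[Node, str]] = field(default_factory=list)


def text_len(node: Node) -> int:
    """Number of text characters anywhere in the subtree."""
    return sum(len(c) if isinstance(c, str) else text_len(c) for c in node.children)


def markup_len(node: Node) -> int:
    """Length of the subtree serialized as <tag>...</tag>, attributes ignored."""
    inner = sum(len(c) if isinstance(c, str) else markup_len(c) for c in node.children)
    return inner + 2 * len(node.tag) + 5  # len("<tag>") + len("</tag>")


def text_rate(node: Node) -> float:
    # Assumed definition: share of the serialized subtree that is visible text.
    return text_len(node) / markup_len(node)


def largest_candidate_subtree(root: Node, threshold: float = 0.8) -> Optional[Node]:
    """Merge all subtrees whose text rate exceeds `threshold` into their
    lowest common ancestor (one possible reading of "merging")."""
    paths: list[list[Node]] = []

    def walk(node: Node, path: list[Node]) -> None:
        path = path + [node]
        if text_rate(node) > threshold:
            paths.append(path)
        for child in node.children:
            if isinstance(child, Node):
                walk(child, path)

    walk(root, [])
    if not paths:
        return None
    # The lowest common ancestor is the deepest node shared by every root-to-node path.
    lca = None
    for nodes_at_depth in zip(*paths):
        if all(n is nodes_at_depth[0] for n in nodes_at_depth):
            lca = nodes_at_depth[0]
        else:
            break
    return lca
```

Under this definition, long runs of post text wrapped in a few tags clear the threshold while link-heavy navigation menus do not, which pulls the common ancestor toward the container holding the posts.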
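For Step S30 the excerpt does not say how structural similarity is measured. One simple possibility, assumed here for illustration, is to compare tag-only skeletons of sibling subtrees cut off at a fixed depth and to treat every group of at least two siblings with identical skeletons as a node cluster. The `depth` and `min_size` parameters, the `Node` class and the function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Node:
    tag: str
    children: list[Union[Node, str]] = field(default_factory=list)


def skeleton(node: Node, depth: int = 2) -> str:
    """Tag-only signature of a subtree, cut off at `depth` levels; text is ignored."""
    if depth == 0:
        return node.tag
    inner = ",".join(skeleton(c, depth - 1) for c in node.children if isinstance(c, Node))
    return f"{node.tag}({inner})"


def similar_clusters(root: Node, depth: int = 2, min_size: int = 2) -> list[list[Node]]:
    """Group sibling nodes with identical skeletons; every group of at least
    `min_size` siblings is reported as one node cluster."""
    clusters: list[list[Node]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        groups: dict[str, list[Node]] = defaultdict(list)
        for child in node.children:
            if isinstance(child, Node):
                groups[skeleton(child, depth)].append(child)
                stack.append(child)
        clusters.extend(g for g in groups.values() if len(g) >= min_size)
    return clusters
```

An exact skeleton match is brittle: a post that contains an extra quote block within the compared depth lands in a different group, so a tolerance-based comparison such as tree edit distance may be preferable in practice.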
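Step S40 can be sketched in the same way. The value of the second threshold is cut off in the source, so this sketch takes it as a required argument rather than guessing a default. It again assumes, without confirmation from the excerpt, that text rate means visible-text characters over serialized-markup characters, pooled over all members of a cluster. The `Node` class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Node:
    tag: str
    children: list[Union[Node, str]] = field(default_factory=list)


def text_len(node: Node) -> int:
    """Number of text characters anywhere in the subtree."""
    return sum(len(c) if isinstance(c, str) else text_len(c) for c in node.children)


def markup_len(node: Node) -> int:
    """Length of the subtree serialized as <tag>...</tag>, attributes ignored."""
    inner = sum(len(c) if isinstance(c, str) else markup_len(c) for c in node.children)
    return inner + 2 * len(node.tag) + 5


def cluster_text_rate(cluster: list[Node]) -> float:
    """Pooled text rate of a cluster: all text characters over all markup characters."""
    return sum(text_len(n) for n in cluster) / sum(markup_len(n) for n in cluster)


def select_post_clusters(clusters: list[list[Node]],
                         second_threshold: float) -> list[list[Node]]:
    """Keep only the clusters whose pooled text rate exceeds `second_threshold`."""
    return [c for c in clusters if cluster_text_rate(c) > second_threshold]
```

Pooling over the whole cluster, rather than testing each member separately, keeps a floor with one very short reply from being dropped when its siblings are text-heavy; this is a design choice of the sketch, not something the excerpt specifies.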