Method and device for extracting forum post content
A technique in the field of forum post content extraction. It addresses the problem of extracting post content from forum pages and achieves automatic extraction.
Embodiment Construction
[0014] The present invention will be described in detail below with reference to the accompanying drawings and in combination with embodiments.
[0015] Figure 1 is a flowchart of a method for extracting forum post content according to an embodiment of the present invention. The method includes:
[0016] Step S10, generating an HTML tag tree from the source code of the forum post page;
[0017] Step S20, merging the tag subtrees in the HTML tag tree whose text rate is greater than a first threshold to obtain a maximum candidate subtree. Based on the results of multiple experiments, the first threshold is preferably set to 0.8;
[0018] Step S30, selecting from the maximum candidate subtree all node clusters with similar structures; these clusters correspond to the posts on each floor of the thread;
[0019] Step S40, selecting, from these node clusters, the node clusters whose text rate is greater than a second threshold. According to the results of multiple experiments, preferably, setting the second threshold t...
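The steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the excerpt does not define how the text rate is computed, how subtrees are merged in step S20, or how structural similarity is judged in step S30. The sketch therefore assumes a text rate equal to text characters divided by text characters plus element count, treats sibling nodes as similar when they share a tag name and child-tag sequence, and leaves the merging rule out. The names `Node`, `TagTreeBuilder`, `build_tree`, `text_rate`, and `similar_clusters` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of the tag tree (hypothetical helper type)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""  # text found directly inside this element


class TagTreeBuilder(HTMLParser):
    """Step S10: build a tag tree from the page source."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_tree(source):
    builder = TagTreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def walk(node):
    """Yield a node and all of its descendants."""
    yield node
    for child in node.children:
        yield from walk(child)


def text_rate(node):
    """ASSUMED metric: text chars / (text chars + element count) in the subtree."""
    nodes = list(walk(node))
    text_len = sum(len(n.text) for n in nodes)
    return text_len / (text_len + len(nodes))


def similar_clusters(parent):
    """ASSUMED similarity: siblings with the same tag and child-tag sequence."""
    groups = defaultdict(list)
    for child in parent.children:
        signature = (child.tag, tuple(c.tag for c in child.children))
        groups[signature].append(child)
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample page: a navigation bar and a thread with two posts.
SAMPLE_HTML = """
<html><body>
<div id="nav"><a>Home</a><a>Forum</a></div>
<div class="thread">
  <div class="post"><span>alice</span><p>First post with a reasonably long body of text.</p></div>
  <div class="post"><span>bob</span><p>Second post, replying with another long body of text.</p></div>
</div>
</body></html>
"""

if __name__ == "__main__":
    body = build_tree(SAMPLE_HTML).children[0].children[0]
    for cluster in similar_clusters(body.children[1]):
        # Keep only clusters whose members exceed the 0.8 threshold from step S20.
        print([round(text_rate(post), 2) for post in cluster if text_rate(post) > 0.8])
```

Under these assumptions, the navigation bar in the sample falls below the 0.8 threshold while both post nodes exceed it, which is the kind of separation the thresholds are meant to provide. A different text rate definition would shift those numbers, so the 0.8 value should only be read against the definition used in the full specification.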
