Internet forum page clustering method and device based on URL structure
A clustering method in the field of web page technology, applied to clustering web forum pages based on URL structure, which addresses the lack of an existing solution for classifying such pages
Examples
Embodiment 1
[0074] To solve the problems described in the background section, the present invention provides a method and device for clustering network forum pages based on URL structure. The invention constructs structure vectors from the URLs and calculates the dissimilarity between the structure vectors, so that the webpages can be classified by cluster analysis. This classifies web forum pages effectively and improves the accuracy and efficiency of webpage classification.
[0075] In order to achieve the above object, the present invention adopts the following technical scheme:
[0076] A network forum page clustering method based on URL structure, the method includes the following steps:
[0077] (1) Preliminary grouping of all webpages according to the domain names to which the webpages belong, sampling each group of webpages after the preliminary grouping to form a sample, and insertin...
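Step (1) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `group_by_domain` and `build_sample`, the `sample_size` and `seed` parameters, the use of uniform random sampling, and the example.com URLs are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_domain(urls):
    """Preliminary grouping: map each domain name to the URLs that belong to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).netloc.lower()].append(url)
    return dict(groups)


def build_sample(group, marked, sample_size, seed=0):
    """Sample one group and insert the marked pages to be screened.

    Each entry is (url, is_marked) so marked pages can be told apart later.
    Uniform random sampling is an assumption; the source does not say how
    the sample is drawn.
    """
    rng = random.Random(seed)
    picked = rng.sample(group, min(sample_size, len(group)))
    sample = [(url, False) for url in picked if url not in marked]
    sample.extend((url, True) for url in marked)
    return sample


if __name__ == "__main__":
    urls = [
        "http://bbs.example.com/thread-1-1.html",
        "http://bbs.example.com/forum-2-1.html",
        "http://other.example.org/viewtopic.php?t=5",
    ]
    groups = group_by_domain(urls)
    marked = ["http://bbs.example.com/thread-9-1.html"]
    print(build_sample(groups["bbs.example.com"], marked, sample_size=1))
```

Keeping the marked flag next to each URL is one way to let a later screening step check which cluster the known pages ended up in.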
Embodiment 2
[0150] The second object of the present invention is to provide a storage device for a network forum page clustering method based on the URL structure.
[0151] In order to achieve the above object, the present invention adopts the following technical scheme:
[0152] A storage device storing a plurality of instructions, the instructions being adapted to be loaded by a processor to execute the following steps:
[0153] (1) Preliminarily group all webpages by the domain name to which each webpage belongs, sample each group to form a sample, and insert marked webpages to be screened into the sample to form the sample webpages;
[0154] (2) Segment the URLs of the sample webpages, excluding the domain name, according to symbols; number the category and content of each URL segment; and construct structural blocks;
[0155] (3) Arrange the structural blocks of the same website in order to form the structural vector of the website; calculate the dissimilarity o...
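Steps (2) and the first half of (3) can be sketched as follows. This is a hedged illustration only: the source does not list the symbols used for segmentation, the category codes, or the content numbering scheme, so the delimiter rule (any non-alphanumeric character), the three `CATEGORY_*` codes, and the choice to number only alphabetic segments are all assumptions, as are the function names.

```python
import re
from urllib.parse import urlsplit

# Hypothetical category codes; the source does not list the actual categories.
CATEGORY_ALPHA, CATEGORY_DIGIT, CATEGORY_MIXED = 1, 2, 3


def segment_url(url):
    """Split everything after the domain name on non-alphanumeric symbols."""
    parts = urlsplit(url)
    tail = parts.path + ("?" + parts.query if parts.query else "")
    return [seg for seg in re.split(r"[^0-9A-Za-z]+", tail) if seg]


def structure_vector(url, content_ids):
    """Turn a URL into an ordered list of (category, content) structural blocks.

    content_ids is a shared dict that numbers each distinct alphabetic segment.
    Digit and mixed segments get content number 0 (an assumption: their literal
    value, such as a thread id, is treated as irrelevant to the page type).
    """
    blocks = []
    for seg in segment_url(url):
        if seg.isdigit():
            blocks.append((CATEGORY_DIGIT, 0))
        elif seg.isalpha():
            content = content_ids.setdefault(seg.lower(), len(content_ids) + 1)
            blocks.append((CATEGORY_ALPHA, content))
        else:
            blocks.append((CATEGORY_MIXED, 0))
    return blocks


if __name__ == "__main__":
    ids = {}
    for u in ("http://bbs.example.com/thread-123-1.html",
              "http://bbs.example.com/forum-7-1.html"):
        print(u, structure_vector(u, ids))
```

Under these assumptions, two thread pages that differ only in their numeric ids produce identical structure vectors, while a thread page and a forum index page differ in the first block.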
Embodiment 3
[0158] The third object of the present invention is to provide a terminal device for a network forum page clustering method based on the URL structure.
[0159] In order to achieve the above object, the present invention adopts the following technical scheme:
[0160] A terminal device comprising:
[0161] a processor adapted to implement the instructions; and
[0162] a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the following steps:
[0163] (1) Preliminarily group all webpages by the domain name to which each webpage belongs, sample each group to form a sample, and insert marked webpages to be screened into the sample to form the sample webpages;
[0164] (2) Segment the URLs of the sample webpages, excluding the domain name, according to symbols; number the category and content of each URL segment; and construct structural blocks;
[0165] (3) Arrange the structural bloc...
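The text of step (3) is cut off here, so the exact dissimilarity measure and clustering procedure are not available. The sketch below only illustrates the general idea stated in [0074], that structure vectors are compared by a dissimilarity and then grouped by cluster analysis. The position-wise mismatch ratio, the single-linkage grouping, and the `threshold` parameter are assumptions swapped in for illustration, not the method of the invention.

```python
def dissimilarity(a, b):
    """Fraction of positions at which two structure vectors differ (0.0 to 1.0).

    Positions missing from the shorter vector count as mismatches.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - same / longest


def cluster(vectors, threshold=0.2):
    """Single-linkage grouping: vectors whose dissimilarity is at or below
    the threshold end up in the same cluster. Returns lists of indices."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if dissimilarity(vectors[i], vectors[j]) <= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical structure vectors of (category, content) blocks.
    thread_a = [(1, 1), (2, 0), (2, 0), (1, 2)]
    thread_b = [(1, 1), (2, 0), (2, 0), (1, 2)]
    forum = [(1, 3), (2, 0), (2, 0), (1, 2)]
    print(cluster([thread_a, thread_b, forum], threshold=0.1))
```

The threshold controls how coarse the grouping is: in this sketch a low value keeps thread pages and forum index pages apart, while a higher value merges them.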