A URL cleaning system and method based on integrated (ensemble) learning
An integrated-learning URL cleaning technology applied in the field of network information processing. It addresses problems such as relevant titles being easily missed and heavy manual effort, and it improves accuracy, cleaning efficiency, and computing efficiency.
Detailed Description of the Embodiments
[0038] To make the technical problems to be solved, the technical solutions, and the beneficial effects of the present invention clearer, the invention is described in further detail below with reference to specific embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.
[0039] A URL cleaning system based on integrated learning includes:
[0040] A data crawling module, which crawls the URLs of websites and their corresponding website titles;
[0041] A data labeling module, which judges whether each website title is consistent with the specified crawling theme; if so, it marks the title as a class A title, and otherwise as a class B title;
[0042] A primary prediction model 1, which segments the labeled class A and class B titles into words, calculates weight values for the word segmentation results, and then uses ...
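The labeling, word segmentation, and weighting steps described in [0041] and [0042] can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the specification: the sample records, the keyword rule used to decide theme consistency, the regular-expression tokenizer standing in for a real word segmenter, and the choice of smoothed TF-IDF as the weight value (the text above does not name the weighting scheme).

```python
import math
import re
from collections import Counter

# Hypothetical crawl output: (URL, website title) pairs.
records = [
    ("https://example.com/a", "Football league results"),
    ("https://example.com/b", "Cheap flights and hotel deals"),
    ("https://example.com/c", "Football transfer news"),
]

# Hypothetical crawling theme, expressed as keywords (an assumption).
theme_keywords = ["football", "soccer"]


def label_title(title, keywords):
    """Return 'A' if the title matches the theme, otherwise 'B'."""
    lowered = title.lower()
    return "A" if any(k.lower() in lowered for k in keywords) else "B"


def segment(title):
    """Simple tokenizer standing in for a real word segmenter."""
    return re.findall(r"\w+", title.lower())


def tfidf_weights(tokenized_titles):
    """Compute smoothed TF-IDF weights per title (assumed weighting scheme)."""
    n = len(tokenized_titles)
    doc_freq = Counter()
    for tokens in tokenized_titles:
        doc_freq.update(set(tokens))  # count each word once per title

    weights = []
    for tokens in tokenized_titles:
        term_freq = Counter(tokens)
        total = len(tokens) or 1  # guard against empty titles
        weights.append({
            word: (count / total)
            * (math.log((1 + n) / (1 + doc_freq[word])) + 1)
            for word, count in term_freq.items()
        })
    return weights


labels = [label_title(title, theme_keywords) for _, title in records]
tokenized = [segment(title) for _, title in records]
weights = tfidf_weights(tokenized)

for (url, title), label, w in zip(records, labels, weights):
    print(url, label, w)
```

In this sketch a word that appears in fewer titles receives a larger weight than one that appears in many, so the resulting per-title weight dictionaries could serve as input features for a downstream classifier. For Chinese titles, a dedicated word segmenter would replace the regular-expression tokenizer.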