Method and device for detecting junk data
A junk data detection technology, applied in the Internet field, that addresses the problems of slow junk data detection, failure to meet practical needs, and difficulty in handling large volumes of user-created content, so as to reduce workload, increase detection speed, and meet practical needs.
Examples
Embodiment 1
[0024] The embodiment of the present invention provides a method for detecting junk data. Referring to Figure 1, the method includes:
[0025] 101: Obtain user-created content.
[0026] 102: Detect whether the user-created content contains data that meets a preset junk data condition. The junk data conditions include junk data regular expressions, junk data repetition conditions, junk data databases, or image links.
[0027] 103: Determine that the user-created content is junk data if it contains data that meets the preset junk data condition.
[0028] Here, user-created content refers to data published by users in network applications such as online communities, blogs, and microblogs.
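Steps 101 to 103 can be sketched in Python as follows. This is only an illustrative sketch: the source names the four kinds of junk data condition but gives no concrete values, so every pattern, threshold, and database entry below is a hypothetical placeholder, and treating any image link as a hit is an assumption about how that condition is applied.

```python
import re

# All values below are hypothetical placeholders; the source does not
# specify concrete patterns, thresholds, or database entries.
JUNK_PATTERNS = [
    re.compile(r"free\s+money", re.IGNORECASE),
    re.compile(r"click\s+here", re.IGNORECASE),
]
JUNK_DATABASE = {"buy cheap followers now"}
MAX_REPEAT_RUN = 5  # assumed limit on copies following a repeated character
IMAGE_LINK = re.compile(r"https?://\S+\.(?:png|jpe?g|gif)", re.IGNORECASE)


def matches_junk_regex(content: str) -> bool:
    """Condition 1: the content matches a preset junk data regular expression."""
    return any(p.search(content) for p in JUNK_PATTERNS)


def has_excessive_repetition(content: str) -> bool:
    """Condition 2: one character followed by MAX_REPEAT_RUN or more copies of itself."""
    return re.search(r"(.)\1{%d,}" % MAX_REPEAT_RUN, content) is not None


def in_junk_database(content: str) -> bool:
    """Condition 3: the trimmed, lowercased content is a known junk entry."""
    return content.strip().lower() in JUNK_DATABASE


def contains_image_link(content: str) -> bool:
    """Condition 4 (assumed interpretation): the content contains an image link."""
    return IMAGE_LINK.search(content) is not None


def is_junk(content: str) -> bool:
    """Steps 102 and 103: content is junk if any preset condition is met."""
    return (
        matches_junk_regex(content)
        or has_excessive_repetition(content)
        or in_junk_database(content)
        or contains_image_link(content)
    )
```

Checking the conditions in one short-circuiting expression means the cheaper tests can be placed first, which fits the stated goal of faster detection; the order used here is arbitrary.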
[0029] Preferably, detecting whether the user-created content contains data that meets the preset junk data condition includes:
[0030] Comparing the user-created content with the preset junk data regular expressions;
[0031] Determine whether the user-created content contains data t...
Embodiment 2
[0050] The embodiment of the present invention provides a method for detecting junk data. Referring to Figure 2, the method includes:
[0051] 201: Obtain user-created content.
[0052] Here, user-created content refers to data published by users in network applications such as online communities, blogs, and microblogs.
[0053] 202: Standardize the content created by the user.
[0054] Specifically, standardizing the user-created content includes typesetting it in a prescribed format, converting traditional Chinese characters in it into simplified characters, and so on. Standardization reduces data redundancy in the user-created content and improves data consistency.
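A minimal Python sketch of step 202 is given below. The source does not define the prescribed format, so the sketch assumes it means Unicode NFKC normalization plus collapsing runs of whitespace. The three-entry traditional-to-simplified table `T2S` is a hypothetical stand-in; a real system would need a complete conversion table.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny traditional-to-simplified table.
T2S = str.maketrans({"體": "体", "國": "国", "發": "发"})


def standardize(content: str) -> str:
    """Step 202 (assumed interpretation): bring content into one canonical form."""
    # Fold full-width letters, digits, and spaces into their ordinary forms.
    text = unicodedata.normalize("NFKC", content)
    # Convert traditional characters that appear in the illustrative table.
    text = text.translate(T2S)
    # Collapse whitespace runs (including newlines) and trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```

With this in place, variants that differ only in character width, spacing, or script form reduce to the same string, so a single junk data condition can cover all of them.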
[0055] 203: Detect whether the standardized user-created content contains data that meets the preset junk data condition. If so, go to step 204; otherwise, go to step 205.
[0056] Here, junk data refers to low-quality meaningless data published ...
Embodiment 3
[0120] Referring to Figure 3, the embodiment of the present invention provides a device for detecting junk data. The device includes:
[0121] An acquisition module 301, configured to acquire user-generated content;