Method and device for detecting junk data

A junk data detection technology, applied in the Internet field, that addresses slow junk data detection, failure to meet actual needs, and difficulty in handling large volumes of user-created content. It reduces editors' workload, increases detection speed, and meets actual needs.

Active Publication Date: 2018-09-04
TENCENT TECH (SHENZHEN) CO LTD
Cites: 4 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] The existing technology relies on manual judgment by editors, so junk data is detected slowly, large volumes of user-created content are difficult to handle, and actual needs cannot be met.

Examples

Embodiment 1

[0024] This embodiment of the present invention provides a method for detecting junk data. Referring to Figure 1, the method includes:

[0025] 101: Obtain user-created content.

[0026] 102: Detect whether the user-created content contains data that meets a preset junk data condition, where the junk data conditions include a junk data regular expression, a junk data repetition condition, a junk database, or an image link.

[0027] 103: If the user-created content contains data that meets the preset junk data condition, determine that the user-created content is junk data.

[0028] Here, user-created content refers to data that users publish in network applications such as online communities, blogs, and microblogs.
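
Steps 101 to 103 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the patterns, the phrase list standing in for the junk database, the repetition threshold, and all function names are hypothetical. The repetition condition is assumed here to mean one character repeated many times in a row, which the text does not specify.

```python
import re

# Hypothetical junk data regular expressions (examples only).
JUNK_PATTERNS = [
    re.compile(r"buy\s+now", re.IGNORECASE),
    re.compile(r"\d{11}"),  # e.g. an 11-digit phone number
]
# Hypothetical stand-in for the junk database.
JUNK_PHRASES = {"free money", "click here"}
# Hypothetical image-link pattern.
IMAGE_LINK = re.compile(r"https?://\S+\.(?:png|jpe?g|gif)", re.IGNORECASE)
# Assumed threshold: more than this many identical characters in a row.
MAX_REPEAT = 5


def matches_junk_regex(text):
    return any(p.search(text) for p in JUNK_PATTERNS)


def has_excessive_repetition(text, limit=MAX_REPEAT):
    # One character followed by at least `limit` copies of itself.
    return re.search(r"(.)\1{%d,}" % limit, text) is not None


def in_junk_database(text):
    lowered = text.lower()
    return any(phrase in lowered for phrase in JUNK_PHRASES)


def contains_image_link(text):
    return IMAGE_LINK.search(text) is not None


def is_junk(text):
    """Steps 102 and 103: content is junk if any preset condition matches."""
    checks = (
        matches_junk_regex,
        has_excessive_repetition,
        in_junk_database,
        contains_image_link,
    )
    return any(check(text) for check in checks)
```

Each condition is a separate predicate so that a new condition can be added without touching the others.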

[0029] Preferably, detecting whether the user-created content contains data that meets the preset junk data condition includes:

[0030] Comparing the user-created content with a preset junk data regular expression;

[0031] Determine whether the user-created content contains data t...

Embodiment 2

[0050] This embodiment of the present invention provides a method for detecting junk data. Referring to Figure 2, the method includes:

[0051] 201: Obtain user-created content.

[0052] Here, user-created content refers to data that users publish in network applications such as online communities, blogs, and microblogs.

[0053] 202: Normalize the user-created content.

[0054] Specifically, normalizing user-created content includes typesetting it in a prescribed format, converting traditional Chinese characters in it into simplified characters, and so on. Normalization reduces data redundancy in the user-created content and improves data consistency.
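
The normalization in step 202 might look like the Python sketch below. Everything here is an assumption for illustration: the three-character traditional-to-simplified table is a tiny sample (a real system would need a complete mapping), and Unicode NFKC folding plus whitespace collapsing is only one possible reading of "typesetting in a prescribed format".

```python
import re
import unicodedata

# Tiny illustrative traditional-to-simplified table (hypothetical, incomplete).
T2S = str.maketrans({"體": "体", "國": "国", "學": "学"})


def normalize(text):
    # Fold compatibility forms, e.g. full-width Latin letters to ASCII.
    text = unicodedata.normalize("NFKC", text)
    # Convert the traditional characters covered by the sample table.
    text = text.translate(T2S)
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```

Running detection on the normalized text means one junk condition can match variants that differ only in script or spacing.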

[0055] 203: Detect whether the normalized user-created content contains data that meets the preset junk data condition, and if so, go to step 204; otherwise, go to step 205.

[0056] Among them, junk data refers to low-quality meaningless data published ...

Embodiment 3

[0120] Referring to Figure 3, this embodiment of the present invention provides a device for detecting junk data. The device includes:

[0121] An acquisition module 301, configured to acquire user-created content;

Abstract

The invention discloses a junk data detection method and device, belonging to the technical field of communication. The method comprises: obtaining user-created content; detecting whether the user-created content contains data that meets preset junk data conditions; and, if so, determining that the user-created content is junk data. The device comprises an obtaining module, a detection module, and a determination module. Because the method and device decide whether user-created content is junk data by checking it against preset junk data conditions, detection runs automatically, which reduces editors' workload, speeds up junk data detection, allows large volumes of user-created content to be processed, and meets actual requirements.
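
One way to picture the three-module device is the Python sketch below. The class names, method names, and the sample condition are hypothetical; the source names only an obtaining module, a detection module, and a determination module, and does not describe their interfaces.

```python
from typing import Callable, Iterable, List, Tuple


class ObtainingModule:
    """Obtains user-created content from some source (here, an in-memory list)."""

    def __init__(self, source: Iterable[str]):
        self.source = source

    def obtain(self) -> List[str]:
        return list(self.source)


class DetectionModule:
    """Checks content against a list of preset junk data conditions."""

    def __init__(self, conditions: List[Callable[[str], bool]]):
        self.conditions = conditions

    def meets_condition(self, content: str) -> bool:
        return any(cond(content) for cond in self.conditions)


class DeterminationModule:
    """Marks content as junk when the detection module reports a match."""

    def determine(self, matched: bool) -> bool:
        return matched


def run_device(obtaining, detection, determination) -> List[Tuple[str, bool]]:
    # Pass each item through detection, then determination.
    return [
        (item, determination.determine(detection.meets_condition(item)))
        for item in obtaining.obtain()
    ]


# Usage with one hypothetical condition: the word "win" appears in the text.
results = run_device(
    ObtainingModule(["hello", "win win win"]),
    DetectionModule([lambda text: "win" in text]),
    DeterminationModule(),
)
```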

Description

Technical field

[0001] The invention relates to the technical field of the Internet, in particular to a method and device for detecting junk data.

Background technique

[0002] With the development of Internet technology, the Internet has gradually become an important source for people to obtain information, especially in the Web 2.0 era, users participate in the creation of a large amount of content, and the amount of network information is growing exponentially. However, a lot of User Generated Content (UGC) is junk data, which seriously affects the quality of network information. In order to improve the quality of network information, it is necessary to monitor user-created content, detect whether user-created content is spam data, and control spam data accordingly.

[0003] At present, the method of detecting spam data is as follows: obtain user-created content, and editors check whether the user-created content conforms to the preset document rules, and if not, determin...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/95
Inventor: 何小晨, 杨娜, 许春林, 廖宇奇
Owner: TENCENT TECH (SHENZHEN) CO LTD