Data mining method and system for high-quality user-generated content

A technology for mining high-quality users and data, applied in fields such as website content management, network data retrieval, and electronic digital data processing. It addresses the poor real-time performance, excessive time consumption, and poor correlation of high-quality UGC, with the effects of saving machine resources and network bandwidth, reducing human-computer interaction, and making the selection method objective.

Active Publication Date: 2017-11-17
TENCENT TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

Therefore, the high-quality accounts set in the prior art are not selected based on the objective UGC content that the accounts publish, but on other factors such as the number of "fans" or on subjective settings. As a result, the high-quality UGC selected from the UGC published by these so-called high-quality accounts correlates poorly with the content that target users care about, and also correlates poorly with the category in question. For example, in the Weibo system, a popular post published by a high-quality "entertainment" account does not necessarily belong to the "entertainment" category and may belong to another category.
[0009] Second, the existing high-quality UGC data mining process mainly ranks and selects UGC published by high-quality accounts of each category according to its numbers of reposts and comments, rather than according to the content of the UGC. As a result, the high-quality UGC finally selected has low relevance to the content that target users care about, and low relevance to the category.
[0010] Third, because the existing process relies mainly on the numbers of reposts and comments of UGC published by high-quality accounts in each category, the UGC with high repost and comment counts is often older UGC, while the latest UGC usually has very few reposts and comments. The latest UGC therefore has a very low probability of being selected as high-quality UGC, so the selected high-quality UGC has poor real-time performance and cannot meet the needs of categories with high real-time requirements, such as news.
[0011] In short, the high-quality UGC of each category selected by the existing high-quality UGC data mining technology correlates poorly with the content that target users care about, correlates poorly with the category, and has poor real-time performance. This makes it inconvenient for target users to quickly browse the UGC they care about; they must spend too much time and energy finding it.



Examples


Embodiment Construction

[0029] The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

[0030] Figure 2 is a flow chart of the data mining method for high-quality UGC in the present invention. Referring to Figure 2, the method mainly includes:

[0031] Step 201, mining high-quality accounts of each category. That is: analyze the historical UGC published by each account of the UGC website system, compute the quality score of each UGC and its correlation with each category, and filter out the high-quality accounts of each category based on the quality scores and correlations. Because the amount of calculation in step 201 is huge, this step is usually performed offline.
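The offline filtering in step 201 can be illustrated with a minimal sketch. The excerpt does not say how per-UGC scores are combined into an account-level decision, so the averaging rule, the `ScoredUGC` structure, the function name, and the two cut-off values below are all assumptions made for illustration, not the patented method.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScoredUGC:
    """One historical UGC item with precomputed scores (hypothetical structure)."""
    account: str
    quality: float      # quality score of this UGC
    relevance: dict     # category name -> correlation of this UGC with that category


def select_quality_accounts(history, min_quality=0.6, min_relevance=0.5):
    """Return {category: sorted accounts} under an assumed averaging rule.

    An account qualifies for a category when the mean quality of its
    historical UGC and its mean correlation with that category both reach
    the (hypothetical) cut-offs.
    """
    by_account = defaultdict(list)
    for item in history:
        by_account[item.account].append(item)

    selected = defaultdict(list)
    for account, items in by_account.items():
        if mean(i.quality for i in items) < min_quality:
            continue
        categories = {c for i in items for c in i.relevance}
        for category in categories:
            # A UGC with no score for a category counts as correlation 0.0.
            avg_rel = mean(i.relevance.get(category, 0.0) for i in items)
            if avg_rel >= min_relevance:
                selected[category].append(account)
    return {c: sorted(accounts) for c, accounts in selected.items()}
```

Because this pass only reads historical data, it could run as a periodic batch job, which matches the remark that the step is usually performed offline.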

[0032] Step 202, mining high-quality UGC. That is: after the UGC website system receives UGC newly published by a high-quality account, it calculates the quality score of the UGC and the correlation between the UGC and th...


Abstract

This application discloses a data mining method and system for high-quality user-generated content (UGC), including: calculating the quality score of the historical UGC published by each account and its correlation with each category, and filtering out the high-quality accounts of each category according to the quality scores and correlations; after receiving UGC newly published by a high-quality account, calculating the quality score of the UGC and the correlation between the UGC and the category of the high-quality account that published it; and determining whether the quality score of the UGC is greater than a preset quality score threshold and whether the correlation between the UGC and the category of the high-quality account that published it is greater than the preset correlation threshold of that category, and if so, treating the UGC as high-quality UGC in that category. The invention can improve the correlation between high-quality UGC and both the content that target users care about and the category, and can improve the real-time performance of high-quality UGC.
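The final decision in the abstract is a two-condition threshold test, sketched below. The function name, parameter names, and the example numbers are hypothetical; only the two strict "greater than" comparisons and the per-category correlation threshold come from the abstract. How the scores themselves are computed is not covered here.

```python
def is_high_quality_ugc(quality, correlation, category,
                        quality_threshold, correlation_thresholds):
    """Apply the two threshold checks described in the abstract.

    quality                -- quality score of the newly published UGC
    correlation            -- correlation between the UGC and `category`
    category               -- category of the high-quality account that published it
    quality_threshold      -- preset quality score threshold
    correlation_thresholds -- mapping of category -> preset correlation threshold
    """
    return (quality > quality_threshold
            and correlation > correlation_thresholds[category])


# Illustrative values only; real thresholds would be tuned per deployment.
thresholds = {"news": 0.5, "entertainment": 0.7}
accepted = is_high_quality_ugc(0.8, 0.6, "news", 0.6, thresholds)
rejected = is_high_quality_ugc(0.8, 0.6, "entertainment", 0.6, thresholds)
```

Since the test depends only on the new item's own scores and not on accumulated reposts or comments, a freshly published UGC can qualify immediately, which is the real-time benefit the abstract claims.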

Description

Technical field

[0001] This application relates to the technical field of computer and Internet data processing, and in particular to a data mining method and system for high-quality User Generated Content (UGC).

Background

[0002] With the development of Internet technology, the Internet has gradually become an important source of information for people. In particular, since the Internet entered the Web 2.0 era, users are not only viewers of website content but also its creators. The content created by users, such as logs and photos that users publish, is called UGC. In the Web 2.0 era, the emergence of large amounts of UGC has made the volume of network information grow at a geometric rate, producing content that is abundant, broad, and specialized, which has played a very important role in the accumulation and dissemination of human knowledge.

[0003] A website system that can publish UGC is ge...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/958, G06F16/24578
Inventor: 阳云, 李维刚
Owner: TENCENT TECH (BEIJING) CO LTD