User portrait method and system based on the word-pair Dirichlet process

A user portrait and word-pair technology, applied in special data processing applications, instruments, and unstructured text data retrieval, that addresses incomplete label information, the inability to reflect changes in user preferences, and degraded method performance on short text.

Active Publication Date: 2019-05-21
宋来伟
6 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

Profiling the user data of microblog data production platforms with the above method has at least the following problems: (1) Little explicit information is available, there are many anonymous users, and the label information is not comprehensive. Although label information reflects user preferences, it does so incompletely and cannot reflect changes in those preferences, so the accuracy of the resulting user portraits is poor. (2) It is difficult to extract implicit information from fragmented explicit information. Content information reflects user interests and their changes, but the text of each piece of content information is limited to 140 characters, so it is difficult to establish suitable clustering dimensions for clustering and classification through conventional semantic analysis, and the content cannot be used for user portraits.
Both types of distribution are obtained by relying on word co-occurrence information. When each document is short, the word co-occurrence information is insufficient, which degrades the performance of this type of method.

Examples

Embodiment 1

[0047] As shown in Figure 2, this embodiment provides a user profiling method based on the word-pair Dirichlet process, which profiles users by extracting user data from Sina Weibo. The method may include the following steps:

[0048] S101. Extract short documents from the user data.

[0049] In a specific implementation, the information panel of a Sina Weibo user, as shown in Figure 3, provides the user's account information, including basic information, work information, education information, and label information assigned by the user or by others through social activity on the network. This information is part of the user data. In this embodiment, the user data also includes content information such as the microblogs and public messages that the user posts or updates daily; each microblog or public message is a short document. A data table containing all short documents is established, and the fields of the data table include at least a short document id correspond...
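As a rough illustration of step S101, the sketch below builds a small table of short documents and counts word pairs across the whole table rather than per document. Only the short document id comes from the text above; the `user_id` and `text` fields, the function names, the sample posts, and the whitespace tokenizer are assumptions made for the example (real Weibo text would need a Chinese word segmenter).

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ShortDocument:
    doc_id: int   # short document id (the one field named in the source)
    user_id: str  # hypothetical field, added for illustration
    text: str     # hypothetical field, added for illustration


def build_table(user_id, posts):
    """Give every post its own row with a sequential short document id."""
    return [ShortDocument(i, user_id, post) for i, post in enumerate(posts)]


def word_pairs(doc):
    """Unordered pairs of distinct words that co-occur in one short document."""
    words = sorted(set(doc.text.lower().split()))  # assumed whitespace tokenizer
    return list(combinations(words, 2))


table = build_table("user_a", ["good food tonight", "food market open"])

# Co-occurrence is accumulated over the whole table, not per document.
pair_counts = Counter(pair for doc in table for pair in word_pairs(doc))
```

Pooling the pairs in one counter is the design point: a single 140-character post yields only a handful of pairs, while the pooled counter grows with every post in the table.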

Embodiment 2

[0060] This embodiment provides a user portrait method based on the word-pair Dirichlet process, which profiles users by extracting user data from Sina Weibo. It differs from Embodiment 1 in that the short documents extracted from the user data are segmented along the time axis, each segment is treated as a short document set from which keywords are extracted, and the user portrait is built from changes in the probability distribution of the keywords. For example, if the value of the keyword "food" decreases, it can be inferred that the user is on a diet.
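The time-axis segmentation described in this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions: segments are calendar months, and plain relative word frequency stands in for the keyword probabilities that the word-pair Dirichlet process would produce. The function names and sample posts are hypothetical.

```python
from collections import Counter
from datetime import date


def segment_by_month(posts):
    """Group (date, text) posts into segments keyed by (year, month)."""
    segments = {}
    for day, text in posts:
        segments.setdefault((day.year, day.month), []).append(text)
    return segments


def keyword_distribution(texts):
    """Relative word frequency in one segment (stand-in for model output)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def keyword_trend(posts, keyword):
    """Probability of `keyword` in each segment, in chronological order."""
    segments = segment_by_month(posts)
    return [(key, keyword_distribution(segments[key]).get(keyword, 0.0))
            for key in sorted(segments)]


posts = [
    (date(2019, 1, 5), "food food travel"),
    (date(2019, 1, 20), "food music"),
    (date(2019, 2, 3), "gym run food"),
    (date(2019, 2, 18), "gym run sleep"),
]
trend = keyword_trend(posts, "food")
```

In this toy data the share of "food" falls from the first month to the second, which is the kind of change the embodiment reads as a shift in user preference.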

Embodiment 3

[0062] Corresponding to all the method embodiments of the present application, this embodiment provides a user portrait system based on the word-pair Dirichlet process. All or part of the data that the system uses to generate the user portrait comes from the keywords obtained through the method of the present application, or from any intermediate data obtained while implementing the method.

Abstract

The invention discloses a user portrait method and system based on the word-pair Dirichlet process, and relates to the technical field of data mining. The method comprises: extracting short documents from user data, obtaining keywords of the short documents through the word-pair Dirichlet process, and using the keywords to establish a user portrait. The fragmented content information in the user data generated by a microblog data production platform can be fully mined, and the accuracy of user portraits built from that user data can be effectively improved. The word-pair Dirichlet process provided by the invention does not obtain the document-topic distribution directly. Instead, it breaks through the boundaries between documents and counts word co-occurrence information over the whole document set, which solves the problem that word co-occurrence information is seriously insufficient when a single document is a short text. The topic-word distribution can be obtained from the word co-occurrence information of the whole document set, and Bayes' formula can then be used to obtain the document-topic distribution of each document.
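The final step mentioned in the abstract, recovering a document-topic distribution with Bayes' formula, might look like the sketch below. The exact formula used by the patent is not shown in this excerpt, so the code assumes the formulation common to word-pair (biterm-style) topic models: P(z | pair) is proportional to P(z) P(w_i | z) P(w_j | z), and P(z | d) averages P(z | pair) over the pairs in the document. The topic prior `theta`, the topic-word table `phi`, the fixed two topics, and all numeric values are made up for illustration; in a Dirichlet process model the number of topics would be inferred rather than fixed.

```python
from collections import Counter


def topic_given_pair(pair, theta, phi):
    """Bayes' rule: P(z | pair) is proportional to P(z) * P(w_i | z) * P(w_j | z)."""
    w_i, w_j = pair
    joint = [theta[z] * phi[z].get(w_i, 0.0) * phi[z].get(w_j, 0.0)
             for z in range(len(theta))]
    norm = sum(joint)
    if norm == 0:
        # Pair unseen under every topic: fall back to a uniform distribution.
        return [1.0 / len(theta)] * len(theta)
    return [value / norm for value in joint]


def document_topics(pairs, theta, phi):
    """P(z | d) = sum over pairs b of P(z | b) * P(b | d), with P(b | d) empirical."""
    counts = Counter(pairs)
    total = sum(counts.values())
    dist = [0.0] * len(theta)
    for pair, n in counts.items():
        for z, p in enumerate(topic_given_pair(pair, theta, phi)):
            dist[z] += p * n / total
    return dist


# Assumed, already-estimated corpus-level quantities (illustrative values).
theta = [0.5, 0.5]
phi = [
    {"food": 0.6, "market": 0.3, "gym": 0.1},
    {"food": 0.1, "market": 0.1, "gym": 0.8},
]
doc_pairs = [("food", "market"), ("food", "gym")]
p_topic = document_topics(doc_pairs, theta, phi)
```

Because `theta` and `phi` are estimated from the whole document set, a short document with only two word pairs still receives a usable topic distribution.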

Description

Technical field

[0001] The present invention relates to the technical field of data mining, and in particular to a method and system for building user portraits from short documents in user data.

Background

[0002] A user portrait (User Profile or Persona), also known as a user role, is an instrumental modeling method that outlines target users through user data and connects user needs with the actual direction of product design. The digital model generated by the user portrait method is also called a user portrait. Weibo is an Internet social tool with a large number of users and is also a data production platform: its users generate a large amount of user data every day. User portraits digitize the user data of data production platforms such as Weibo, and can be used to grasp the core demands of user groups, analyze the emotional preferences of user groups, improve the performance of personalized information recommendation and assist...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/335, G06F16/34
CPC: Y02D10/00
Inventor: 王小军, 席耀一, 唐永旺, 王波, 郭克坤, 徐东, 毛二松, 陈诚, 李福昌
Owner: 宋来伟