Behavioral data clustering method and device

By generating behavioral vectors and transforming long-tail data into locally dense target vectors, the problem of poor clustering effect of sparse behavioral data is solved, and effective clustering under sparse data is achieved.

CN116821714B · Active · Publication Date: 2026-06-30 · CHINA MOBILE FINANCIAL TECHNOLOGY CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA MOBILE FINANCIAL TECHNOLOGY CO LTD
Filing Date
2022-03-18
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

When processing sparse behavioral data, existing technologies, such as K-Means and other unsupervised machine learning algorithms, cannot effectively transform high-impact factor, low-density data, resulting in poor clustering performance.

Method used

By generating behavioral vectors and using a preset long-tail function to transform long-tail data, a locally dense target vector is formed, which is then used for clustering.

Benefits of technology

In the case of sparse behavioral data, it increases the impact period and influence of the data, and improves the clustering effect.

✦ Generated by Eureka AI based on patent content.


Abstract

This application provides a method and apparatus for clustering behavioral data. The method includes: generating a behavioral vector for a user based on recorded behavioral data for any user at each time point within a preset time window; performing long-tail data transformation on the behavioral vectors of each user according to a preset long-tail function to obtain target vectors; and clustering the target vectors to obtain clustering results for the behavioral data of each user; wherein, if behavioral data exists at a time point, the recorded information corresponding to that time point is first recorded information; otherwise, the recorded information is second recorded information. The behavioral data clustering method provided in this application can effectively cluster behavioral data even when the behavioral data is sparse.

Description

Technical Field

[0001] This application relates to the field of data mining technology, specifically to a method and apparatus for clustering behavioral data.

Background Technology

[0002] To smooth out individual differences and make the analysis of user behavior data more robust, for example when predicting group characteristics or screening similar user groups, user behavior data typically needs to be clustered. In related technologies, this clustering is performed with unsupervised machine learning algorithms such as K-Means. However, these technologies require dense behavioral data. When the collected behavioral data is sparse, the one-hot encoding they use produces a high-dimensional sparse matrix, and high-impact, low-density data is not incorporated into the distance calculations. Sparse user behavior is therefore not transformed effectively, and clustering performance is poor.

Summary of the Invention

[0003] This application provides a method and apparatus for clustering behavioral data, which can effectively cluster behavioral data even when the behavioral data is sparse.

[0004] In a first aspect, embodiments of this application provide a behavioral data clustering method, including:

[0005] Generate the user's behavior vector based on the recorded behavior data of any user at each time point in the preset time window;

[0006] Based on a preset long-tail function, perform long-tail data transformation on the behavior vectors of each user to obtain each target vector;

[0007] Cluster the target vectors to obtain the clustering results of the behavioral data of each user.

[0008] Wherein, if the behavioral data exists at the time node, the record information corresponding to the time node is the first record information; otherwise, the record information is the second record information.

[0009] In one embodiment, generating the user's behavior vector based on the recorded behavior data of any user at each time point within a preset time window includes:

[0010] Obtain the recorded information of the user's behavioral data at each of the aforementioned time points;

[0011] The recorded information is sorted according to the time sequence between the time nodes to form the behavior vector.

[0012] In one embodiment, the time interval between adjacent time nodes is the same.

[0013] In one embodiment, before performing long-tail data transformation on the behavior vectors of each user according to a preset long-tail function to obtain each target vector, the method further includes:

[0014] Based on the data type of the behavioral data, a preset long-tail function corresponding to the data type is obtained from each preset conversion function.

[0015] In one embodiment, the preset long-tail function is:

[0016]

[0017] Where i represents the influence factor of the first record information in the behavior vector on the t-th record information following it, I_init represents the initial influence factor of the first record information, and D represents the attenuation coefficient.

[0018] In one embodiment, the step of performing long-tail data transformation on the behavior vectors of each user according to a preset long-tail function to obtain each target vector includes:

[0019] According to the preset long-tail function, obtain each of the influence factors corresponding to any of the record information in the behavior vector;

[0020] The influencing factors corresponding to the recorded information are superimposed to obtain the target data corresponding to the recorded information;

[0021] The target vector is formed based on each of the target data.

[0022] In one embodiment, clustering each of the target vectors to obtain the clustering results of the behavioral data of each user includes:

[0023] The similarity between the target vectors is determined based on the distance between them.

[0024] Based on the similarity between the target vectors, the target vectors are clustered to obtain the clustering results of the user's behavior data.

[0025] In a second aspect, embodiments of this application provide a behavioral data clustering apparatus, comprising:

[0026] The behavior vector generation module is used to generate the user's behavior vector based on the recorded information of the user's behavior data at each time node in a preset time window;

[0027] The long-tail data conversion module is used to perform long-tail data conversion on the behavior vectors of each user according to a preset long-tail function to obtain each target vector;

[0028] The behavior data clustering module is used to cluster the target vectors and obtain the clustering results of the behavior data of each user.

[0029] If the behavioral data exists at the specified time point, the recorded information is the first recorded information; otherwise, the recorded information is the second recorded information.

[0030] In a third aspect, embodiments of this application provide an electronic device, including a processor and a memory storing a computer program, wherein the processor executes the program to implement the steps of the behavioral data clustering method described in the first aspect.

[0031] In a fourth aspect, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the steps of the behavioral data clustering method described in the first aspect.

[0032] The behavioral data clustering method and apparatus provided in this application form a user's behavioral vector by recording behavioral data for any user at each time node within a preset time window. The behavioral vector is then subjected to a long-tail transformation to obtain the target vector for each user, enabling the clustering of behavioral data for each user. This long-tail transformation converts the behavioral vectors corresponding to the behavioral data into locally dense behavioral vectors. This improves the impact period and influence of sparse behavioral data, thereby enabling effective clustering of sparse behavioral data and enhancing the clustering effect.

Attached Figure Description

[0033] To more clearly illustrate the technical solutions in this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0034] Figure 1 is a flowchart illustrating the behavioral data clustering method provided in an embodiment of this application;

[0035] Figure 2 is a schematic diagram of the structure of the behavioral data clustering device provided in the embodiments of this application;

[0036] Figure 3 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application.

Detailed Implementation

[0037] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0038] The embodiments of this application will now be described in detail with reference to the accompanying drawings.

[0039] Referring to Figure 1, which is a flowchart of a behavioral data clustering method provided in this embodiment, the method is applied to servers or terminal devices to cluster the behavioral data of various users, thereby effectively predicting the behavioral characteristics of a user group. As shown in Figure 1, the behavioral data clustering method provided in this embodiment includes:

[0040] Step 101: Generate the user's behavior vector based on the recorded information of the user's behavior data at each time node in the preset time window;

[0041] Step 102: According to the preset long-tail function, perform long-tail data transformation on the behavior vectors of each user to obtain each target vector;

[0042] Step 103: Cluster the target vectors to obtain the clustering results of the user's behavior data;

[0043] If the behavioral data exists at the specified time point, the recorded information is the first recorded information; otherwise, the recorded information is the second recorded information.

[0044] By forming user behavior vectors from the recorded behavioral data at each time point within a preset time window, and then performing a long-tail transformation on these behavioral vectors, target vectors for each user are obtained for clustering the user's behavioral data. This long-tail transformation converts the behavioral vectors corresponding to the behavioral data into locally dense behavioral vectors, thereby increasing the impact period and influence of sparse behavioral data. This enables effective clustering of sparse behavioral data and improves the clustering effect.

[0045] In one embodiment, the preset time window can be determined according to actual conditions, such as one day, one week, or one month. After the preset time window is determined, the behavioral data of any user at each time point in the preset time window can be collected. For example, behavioral data of user A at each time point within a week can be collected.

[0046] In one embodiment, the time nodes can be set according to the actual situation of a preset time window, and the time interval between adjacent time nodes is the same. For example, if the selected preset time window is one week and the analysis granularity is days, then each time node includes the first day, the second day, ... and the seventh day of the preset time window.
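The equally spaced time nodes described above can be sketched as follows. This is a minimal illustration only: the function name `time_nodes`, the start date, and the use of calendar dates as nodes are assumptions made for the example and do not come from the patent text.

```python
from __future__ import annotations

from datetime import date, timedelta


def time_nodes(window_start: date, num_nodes: int, step: timedelta) -> list[date]:
    """Return equally spaced time nodes covering a preset time window.

    Hypothetical helper: the patent only requires that adjacent nodes
    are separated by the same interval.
    """
    return [window_start + k * step for k in range(num_nodes)]


# A one-week window analysed at daily granularity gives seven nodes.
nodes = time_nodes(date(2022, 3, 14), 7, timedelta(days=1))
print(len(nodes))  # 7
```

Changing `step` to hours or weeks gives a different analysis granularity over the same kind of window.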

[0047] In one embodiment, the record information records whether user behavior data exists at the i-th time node within the preset time window. If user behavior data exists at the i-th time node, the record information corresponding to that node is the first record information, recorded as 1; otherwise, it is the second record information, recorded as 0. For example, suppose the behavior data concerns a marketing activity and the preset time window is one week. If a record of the user participating in the marketing activity is collected on the first day of the window, the record information for the first day is the first record information, recorded as 1; otherwise, it is the second record information, recorded as 0.

[0048] After collecting user behavior data at various time points, behavior vectors can be formed based on the recorded information at each time point. The length of the behavior vector is determined by the number of time points. For example, if there are 7 time points, then the length of any behavior vector will be 7.

[0049] In one embodiment, generating the user's behavior vector based on the recorded behavior data of any user at each time point within a preset time window includes:

[0050] Obtain the recorded information of the user's behavioral data at each of the aforementioned time points;

[0051] The recorded information is sorted according to the time sequence between the time nodes to form the behavior vector.

[0052] In one embodiment, after obtaining the recorded information of behavioral data at each time point, the recorded information from the first time point to the last time point is sorted sequentially according to the time order of each time point to form the user's behavior vector.

[0053] For example, if the preset time window is one week, the behavioral data is a marketing campaign, and the recorded information for each time node from the first day to the last day is 1, 1, 0, 0, 0, 1, 0, this indicates that the user participated in the marketing campaign on the first, second, and sixth days of the week. In this case, the user's behavioral vector generated according to the chronological order of each time node is [1, 1, 0, 0, 0, 1, 0].
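A minimal sketch of this encoding is given below. The helper name `behavior_vector` and the representation of activity as a set of 1-based day numbers are assumptions made for illustration.

```python
from __future__ import annotations


def behavior_vector(active_nodes: set[int], num_nodes: int) -> list[int]:
    """Encode activity as 1 (first record information) or 0 (second record
    information), ordered chronologically over `num_nodes` time nodes.

    `active_nodes` holds 1-based node numbers at which behavior data exists.
    """
    return [1 if k in active_nodes else 0 for k in range(1, num_nodes + 1)]


# The user took part in the campaign on days 1, 2 and 6 of a 7-day window.
vec = behavior_vector({1, 2, 6}, 7)
print(vec)  # [1, 1, 0, 0, 0, 1, 0]
```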

[0054] In one embodiment, after the behavior vector is obtained, long-tail data transformation can be performed on it using a preset long-tail function. The preset long-tail function determines the influence factor of the first record information at any given time node on the record information collected after that time node. It can be a power function or a linear function constructed with time as the independent variable and the influence factor as the dependent variable. The long-tail data transformation converts the record information at each time node in the behavior vector into target data, namely the sum of the influence factors corresponding to that record information. The attractiveness of the scenario corresponding to the behavior data usually declines over time; in a marketing campaign, for example, attractiveness gradually decreases, so the influence factor of a time node decreases over time. The preset long-tail function can therefore be a power function or a linear function with a decaying trend, so that the target vector obtained after long-tail data transformation is more consistent with the actual scenario and the subsequent clustering effect is further improved.

[0055] To ensure that the target vectors obtained after long-tail data transformation better reflect different realities and thus improve the clustering effect of subsequent behavioral data, in one embodiment, before performing long-tail data transformation on the behavioral vectors of each user according to a preset long-tail function to obtain each target vector, the following steps are also included:

[0056] Based on the data type of the behavioral data, a preset long-tail function corresponding to the data type is obtained from each preset conversion function.

[0057] In one embodiment, the database pre-stores preset conversion functions corresponding one-to-one with each data type. The correspondence between each data type and each preset conversion function can be pre-recorded in the database's information mapping table. The data type is determined based on the actual scenario; for example, if the actual scenario is a marketing campaign, then the data type is "marketing campaign data." Different preset conversion functions are pre-defined for different data types, ensuring that each preset conversion function conforms to the real-world logic of the corresponding scenario. For example, if the data type is "marketing campaign data," then in reality, the attractiveness of marketing campaigns generally exhibits a declining trend. Therefore, the preset long-tail function corresponding to this data type is either a power function exhibiting a declining trend or a linear function exhibiting a declining trend. The specific preset conversion functions corresponding to different data types can be pre-defined according to the actual situation, which will not be elaborated upon here.

[0058] In one embodiment, after obtaining the user's behavior vector, based on the data type of the behavior data corresponding to the behavior vector, the system searches a table containing mapping relationships between various data types and preset conversion functions to find the preset long-tail function corresponding to that data type for long-tail data conversion. This makes the target vector obtained after long-tail data conversion more closely resemble the actual situation, thereby improving the clustering effect of subsequent behavior data.
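The lookup described above can be sketched with an in-memory mapping in place of a database table. Everything named here is hypothetical: the table `PRESET_FUNCTIONS`, the data type "app login data", and the concrete linear and exponential decay forms are assumptions chosen for illustration, since the text does not fix them.

```python
from __future__ import annotations

import math
from typing import Callable, Dict

DecayFn = Callable[[int], float]


def linear_decay(i_init: float, d: float) -> DecayFn:
    """Assumed linear form: influence drops by `d` per step, never below 0."""
    return lambda t: max(0.0, i_init - d * t)


def exponential_decay(i_init: float, d: float) -> DecayFn:
    """Assumed exponential form: influence shrinks by exp(-d) per step."""
    return lambda t: i_init * math.exp(-d * t)


# Stand-in for the information mapping table (data type -> preset function).
PRESET_FUNCTIONS: Dict[str, DecayFn] = {
    "marketing campaign data": linear_decay(3, 1),
    "app login data": exponential_decay(3, 0.5),  # hypothetical data type
}


def lookup_long_tail_function(data_type: str) -> DecayFn:
    """Return the preset long-tail function registered for `data_type`."""
    try:
        return PRESET_FUNCTIONS[data_type]
    except KeyError:
        raise ValueError(f"no preset function for data type {data_type!r}") from None


f = lookup_long_tail_function("marketing campaign data")
print([f(t) for t in range(5)])  # [3.0, 2.0, 1.0, 0.0, 0.0]
```

Keeping the mapping as data rather than branching logic means a new scenario only needs a new table entry.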

[0059] For example, a preset long-tail function could be:

[0060]

[0061] Where i represents the influence factor of the first record information in the behavior vector on the t-th record information following it, I_init represents the initial influence factor of the first record information, and D represents the attenuation coefficient.
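The formula itself is not reproduced in this text. As an assumption only, a linear decay that agrees with the worked example later in the description (where D = 1 and an initial influence factor of 3 give i = 3 - t, and the influence is 0 at later nodes) could be written as:

```latex
i(t) = \max\bigl(0,\; I_{init} - D \cdot t\bigr), \qquad t = 0, 1, 2, \dots
```

The patent may define a different expression, for instance a power function; with D = 1 a form such as I_init - t^D would match the same example, so the worked values alone do not determine the formula.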

[0062] Different preset long-tail functions can be used for different data types, such as logarithmic decay or exponential decay, to ensure that they conform as closely as possible to the design logic of the actual scenario corresponding to the data type.

[0063] In one embodiment, the step of performing long-tail data transformation on the behavior vectors of each user according to a preset long-tail function to obtain each target vector includes:

[0064] According to the preset long-tail function, obtain each of the influence factors corresponding to any of the record information in the behavior vector;

[0065] When the recorded information is the first recorded information, the influence factors corresponding to the first recorded information are superimposed with the initial influence factor to obtain the target data corresponding to the first recorded information;

[0066] When the recorded information is the second recorded information, the influencing factors corresponding to the second recorded information are superimposed to obtain the target data corresponding to the second recorded information;

[0067] The target vector is generated based on the target data corresponding to each of the recorded information.

[0068] In one embodiment, after the preset long-tail function is determined, the time nodes holding first record information are first identified from the record information in the behavior vector. For example, in the behavior vector [1, 1, 0, 0, 0, 1, 0], where 1 indicates the presence of behavioral data and 0 its absence, the first record information is located at the first, second, and sixth time nodes. Each first record information is then input into the preset long-tail function to obtain its influence factor on subsequent record information.

[0069] Take the behavior vector [1, 1, 0, 0, 0, 1, 0] and the preset long-tail function given above as an example. If the attenuation coefficient D is 1 and the initial influence factor is 3, the preset long-tail function becomes i = 3 - t. Inputting the first record information of the first time node into this function gives the following influence factors: on the 0th record information after it (i.e., the record information of the first time node itself), i = 3 - 0 = 3; on the 1st record information after it (i.e., the record information of the second time node in the behavior vector), i = 3 - 1 = 2; on the 2nd record information after it (i.e., the record information of the third time node), i = 3 - 2 = 1; and on the 3rd record information after it (i.e., the record information of the fourth time node), i = 3 - 3 = 0. Likewise, based on the above preset long-tail function, the influence factors of the first record information of the first time node on the record information of the fifth to seventh time nodes are all 0.

[0070] At this point, the partial influence factors corresponding to the seven records in the behavior vector can be obtained as follows:

[0071] The influence factors corresponding to the recorded information at the first time point are {3};

[0072] The influence factors corresponding to the recorded information at the second time point are {2};

[0073] The influence factors corresponding to the recorded information at the third time point are {1};

[0074] The influence factors corresponding to the recorded information at the fourth time point are {0};

[0075] The influence factors corresponding to the recorded information at the fifth time point are {0};

[0076] The influence factors corresponding to the recorded information at the sixth time point are {0};

[0077] The influence factors corresponding to the recorded information at the seventh time point are {0}.
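The contribution of a single first record information can be reproduced with a short sketch. It assumes the clamped linear form i = max(0, I_init - D * t), which matches the values listed above but is not confirmed as the patent's formula; the function name `single_event_influence` is hypothetical.

```python
from __future__ import annotations


def single_event_influence(
    event_index: int, length: int, i_init: int = 3, d: int = 1
) -> list[int]:
    """Influence factors that one event contributes to every time node.

    `event_index` is 0-based. Nodes before the event receive no influence;
    the node at offset t after the event receives max(0, i_init - d * t)
    (an assumed linear decay).
    """
    out = [0] * length
    for k in range(event_index, length):
        t = k - event_index  # offset from the event's own node
        out[k] = max(0, i_init - d * t)
    return out


# Event at the first time node of a 7-node vector.
print(single_event_influence(0, 7))  # [3, 2, 1, 0, 0, 0, 0]
```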

[0078] Since the recorded information at the second time point is also first recorded information, its influence factors can be determined through the same preset long-tail function: on the 0th recorded information after it (i.e., itself), i = 3 - 0 = 3; on the 1st recorded information after it (i.e., the recorded information at the third time point in the behavior vector), i = 3 - 1 = 2; on the 2nd recorded information after it (i.e., the recorded information at the fourth time point), i = 3 - 2 = 1; and on the 3rd recorded information after it (i.e., the recorded information at the fifth time point), i = 3 - 3 = 0. Likewise, the influence factors of the first recorded information at the second time point on the recorded information at the sixth and seventh time points are 0.

[0079] Then, combining the partial influence factors corresponding to each record obtained from the first record information at the first time node, we can obtain the partial influence factors corresponding to the seven records in the behavior vector as follows:

[0080] The influence factors corresponding to the recorded information at the first time point are {3};

[0081] The influence factors corresponding to the recorded information at the second time point are {2, 3};

[0082] The influence factors corresponding to the recorded information at the third time point are {1, 2};

[0083] The influence factors corresponding to the recorded information at the fourth time point are {0, 1};

[0084] The influence factors corresponding to the recorded information at the fifth time point are {0, 0};

[0085] The influence factors corresponding to the recorded information at the sixth time point are {0, 0};

[0086] The influence factors corresponding to the recorded information at the seventh time point are {0, 0}.

[0087] Similarly, since the recorded information at the sixth time point is also first recorded information, its influence factors can be obtained through the same preset long-tail function: on the 0th recorded information after it (i.e., itself), i = 3 - 0 = 3; and on the 1st recorded information after it (i.e., the recorded information at the seventh time point in the behavior vector), i = 3 - 1 = 2.

[0088] In summary, the influence factors corresponding to the seven records in the behavior vector are as follows:

[0089] The influence factors corresponding to the recorded information at the first time point are {3};

[0090] The influence factors corresponding to the recorded information at the second time point are {2, 3};

[0091] The influence factors corresponding to the recorded information at the third time point are {1, 2};

[0092] The influence factors corresponding to the recorded information at the fourth time point are {0, 1};

[0093] The influence factors corresponding to the recorded information at the fifth time point are {0, 0};

[0094] The influence factors corresponding to the recorded information at the sixth time point are {0, 0, 3};

[0095] The influence factors corresponding to the recorded information at the seventh time point are {0, 0, 2}.

[0096] After all the influence factors corresponding to each record are obtained, they are summed to give the target data corresponding to the seven records in the behavior vector: 3, 5, 3, 1, 0, 3, 2. The target data are then combined, in the chronological order of their time nodes, to form the target vector [3, 5, 3, 1, 0, 3, 2]. The original vector had three non-zero entries out of seven, while the target vector has six, so the long-tail data transformation described above greatly increases the density of the vector, which facilitates subsequent clustering and improves the clustering effect.
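The whole transformation, superimposing the influence of every first record information, can be sketched as follows. The decay function `linear_3_minus_t` encodes the example's i = 3 - t with a floor of 0; treating that floor as part of the function is an assumption, and both function names are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def linear_3_minus_t(t: int) -> int:
    """Example decay from the walkthrough: I_init = 3, D = 1, floored at 0."""
    return max(0, 3 - t)


def long_tail_transform(vector: list[int], decay: Callable[[int], int]) -> list[int]:
    """Sum, at every time node, the influence of each earlier-or-equal event."""
    target = [0] * len(vector)
    for start, flag in enumerate(vector):
        if flag != 1:
            continue  # second record information contributes nothing
        for k in range(start, len(vector)):
            target[k] += decay(k - start)
    return target


target = long_tail_transform([1, 1, 0, 0, 0, 1, 0], linear_3_minus_t)
print(target)  # [3, 5, 3, 1, 0, 3, 2]
```

The nested loop is quadratic in the vector length; for the short windows discussed here that is negligible, and the decay could be truncated once it reaches 0 if longer windows were needed.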

[0097] In one embodiment, clustering is performed on each of the target vectors to obtain the clustering results of the behavioral data of each user, including:

[0098] The similarity between the target vectors is determined based on the distance between them.

[0099] Based on the similarity between the target vectors, the target vectors are clustered to obtain the clustering results of the user's behavior data.

[0100] For example, suppose the behavior vectors of the users are … . Without transformation, the Euclidean distances between the three vectors are pairwise equal, making it impossible to accurately cluster the behavioral data of each user. After the long-tail data transformation described above, the target vectors of the users are … . Calculating the distances between the target vectors then gives a distance of … between … and …, and a distance of … between … and … . Compared with clustering without long-tail data transformation, the recognition efficiency is significantly improved. Once the distances between the target vectors are obtained, the behavioral data of users whose target vectors are less than a preset value apart can be grouped into the same category, which completes the clustering of behavioral data. The clustered behavioral data can then be better used for group feature identification and for predicting the behavioral characteristics of other users.
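The effect on distances, and the grouping by a preset distance value, can be illustrated with a small sketch. The three user vectors, the threshold of 4.0, and the decay i = max(0, 3 - t) are illustrative assumptions, not the values from the patent's own example. Merging pairs transitively through a union-find structure is also a design choice made here; the text only states that two vectors closer than the preset value fall into the same category.

```python
from __future__ import annotations

import math
from itertools import combinations


def long_tail_transform(vector: list[int], i_init: int = 3, d: int = 1) -> list[int]:
    """Assumed clamped linear decay, summed over all events in the vector."""
    target = [0] * len(vector)
    for start, flag in enumerate(vector):
        if flag == 1:
            for k in range(start, len(vector)):
                target[k] += max(0, i_init - d * (k - start))
    return target


def threshold_clusters(vectors: dict[str, list[int]], max_dist: float) -> list[set[str]]:
    """Group users whose vectors lie closer than `max_dist` (transitively)."""
    parent = {name: name for name in vectors}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(vectors, 2):
        if math.dist(vectors[a], vectors[b]) < max_dist:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in vectors:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical users, each active on a single day (day 1, day 2, day 6).
raw = {
    "A": [1, 0, 0, 0, 0, 0, 0],
    "B": [0, 1, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 0, 0, 1, 0],
}
raw_dists = {p: math.dist(raw[p[0]], raw[p[1]]) for p in combinations(raw, 2)}

targets = {name: long_tail_transform(v) for name, v in raw.items()}
target_dists = {p: math.dist(targets[p[0]], targets[p[1]]) for p in combinations(targets, 2)}

clusters = threshold_clusters(targets, max_dist=4.0)
```

On the raw vectors every pair is sqrt(2) apart, so no threshold can separate them. After the transformation, A and B (active on adjacent days) are sqrt(12) apart while each is sqrt(27) from C, so a threshold of 4.0 groups A with B and leaves C on its own.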

[0101] The behavioral data clustering device provided by this application is described below. The device described below and the behavioral data clustering method described above may be cross-referenced.

[0102] In one embodiment, as shown in Figure 2, a behavioral data clustering device is provided, comprising:

[0103] The behavior vector generation module 210 is used to generate the user's behavior vector based on the recorded information of the user's behavior data at each time node in a preset time window.

[0104] The long-tail data conversion module 220 is used to perform long-tail data conversion on the behavior vectors of each user according to a preset long-tail function to obtain each target vector;

[0105] The behavior data clustering module 230 is used to cluster each of the target vectors to obtain the clustering results of the behavior data of each user.

[0106] If the behavioral data exists at the specified time point, the recorded information is the first recorded information; otherwise, the recorded information is the second recorded information.

[0107] By forming user behavior vectors from the recorded behavioral data at each time point within a preset time window, and then performing a long-tail transformation on these behavioral vectors, target vectors for each user are obtained for clustering the user's behavioral data. This long-tail transformation converts the behavioral vectors corresponding to the behavioral data into locally dense behavioral vectors, thereby increasing the impact period and influence of sparse behavioral data. This enables effective clustering of sparse behavioral data and improves the clustering effect.

[0108] In one embodiment, the behavior vector generation module 210 is specifically used for:

[0109] Obtain the recorded information of the user's behavioral data at each of the aforementioned time points;

[0110] The recorded information is sorted according to the time sequence between the time nodes to form the behavior vector.

[0111] In one embodiment, the time interval between adjacent time nodes in each of the time nodes is the same.

[0112] In one embodiment, the long-tail data conversion module 220 is further configured to:

[0113] Based on the data type of the behavioral data, a preset long-tail function corresponding to the data type is obtained from each preset conversion function.

[0114] In one embodiment, the preset long-tail function is:

[0115]

[0116] Where i represents the influence factor of the first record information in the behavior vector on the t-th record information following it, I_init represents the initial influence factor of the first record information, and D represents the attenuation coefficient.

[0117] In one embodiment, the long-tail data conversion module 220 is specifically used for:

[0118] According to the preset long-tail function, obtain each of the influence factors corresponding to any of the record information in the behavior vector;

[0119] The influencing factors corresponding to the recorded information are superimposed to obtain the target data corresponding to the recorded information;

[0120] The target vector is formed based on each of the target data.

[0121] In one embodiment, the behavioral data clustering module 230 is specifically used for:

[0122] The similarity between the target vectors is determined based on the distance between them.

[0123] Based on the similarity between the target vectors, the target vectors are clustered to obtain the clustering results of the user's behavior data.
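A distance-based similarity and a clustering pass built on it might look like the sketch below. The source does not name a specific distance or clustering algorithm in this passage, so the Euclidean distance, the mapping 1 / (1 + distance), and the single-linkage grouping with a similarity threshold are all choices made here for brevity; the function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    # Assumed mapping: a smaller distance gives a similarity closer to 1
    return 1.0 / (1.0 + euclidean(a, b))


def cluster_by_similarity(vectors, min_similarity):
    """Single-linkage grouping: vectors joined by a chain of pairs whose
    similarity is at least min_similarity receive the same label."""
    labels = [-1] * len(vectors)
    next_label = 0
    for start in range(len(vectors)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            current = stack.pop()
            for other in range(len(vectors)):
                if (labels[other] == -1 and
                        similarity(vectors[current], vectors[other]) >= min_similarity):
                    labels[other] = next_label
                    stack.append(other)
        next_label += 1
    return labels


target_vectors = [
    [0.0, 1.0, 0.5, 0.25],
    [0.0, 1.0, 0.5, 0.3],
    [1.0, 0.5, 0.25, 0.125],
    [1.0, 0.5, 0.25, 0.1],
]
labels = cluster_by_similarity(target_vectors, min_similarity=0.5)
print(labels)  # [0, 0, 1, 1]
```

Any distance-based algorithm, such as K-Means, could be substituted for the grouping step; the threshold approach is used here only because it is deterministic and short.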

[0124] Figure 3 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 3, the electronic device may include a processor 810, a communication interface 820, a memory 830, and a communication bus 840, wherein the processor 810, the communication interface 820, and the memory 830 communicate with each other via the communication bus 840. The processor 810 can call a computer program in the memory 830 to execute the steps of the behavioral data clustering method, for example:

[0125] generating the behavior vector of a user based on the record information of the behavioral data of any user at each time node within a preset time window;

[0126] performing long-tail data transformation on the behavior vectors of each user according to a preset long-tail function to obtain the target vectors; and

[0127] clustering the target vectors to obtain the clustering results of the behavioral data of each user;

[0128] wherein, if the behavioral data exists at a time node, the record information corresponding to that time node is the first record information; otherwise, the record information is the second record information.

[0129] Furthermore, the logical instructions in the aforementioned memory 830 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0130] In another aspect, embodiments of this application also provide a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can perform the steps of the behavioral data clustering method provided in the above embodiments, for example:

[0131] generating the behavior vector of a user based on the record information of the behavioral data of any user at each time node within a preset time window;

[0132] performing long-tail data transformation on the behavior vectors of each user according to a preset long-tail function to obtain the target vectors; and

[0133] clustering the target vectors to obtain the clustering results of the behavioral data of each user;

[0134] wherein, if the behavioral data exists at a time node, the record information corresponding to that time node is the first record information; otherwise, the record information is the second record information.

[0135] In another aspect, embodiments of this application also provide a processor-readable storage medium storing a computer program for causing a processor to perform the steps of the methods provided in the above embodiments, for example:

[0136] generating the behavior vector of a user based on the record information of the behavioral data of any user at each time node within a preset time window;

[0137] performing long-tail data transformation on the behavior vectors of each user according to a preset long-tail function to obtain the target vectors; and

[0138] clustering the target vectors to obtain the clustering results of the behavioral data of each user;

[0139] wherein, if the behavioral data exists at a time node, the record information corresponding to that time node is the first record information; otherwise, the record information is the second record information.

[0140] The processor-readable storage medium can be any available medium or data storage device that the processor can access, including but not limited to magnetic memory (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO)), optical memory (e.g., CD, DVD, BD, HVD), and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state drive (SSD)).

[0141] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0142] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0143] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A behavioral data clustering method, characterized in that it comprises: generating the behavior vector of a user based on the record information of the behavioral data of any user at each time node within a preset time window; performing long-tail data transformation on the behavior vectors of each user according to a preset long-tail function to obtain the target vectors; and clustering the target vectors to obtain the clustering results of the behavioral data of each user; wherein, if the behavioral data exists at a time node, the record information corresponding to that time node is the first record information, and otherwise the record information is the second record information; the preset long-tail function is: where $i$ denotes the first record information in the behavior vector, $I$ denotes the influence factor on the $t$-th record information following the first record information, $I_{init}$ denotes the initial influence factor of the first record information, and $D$ denotes the attenuation coefficient.

2. The behavioral data clustering method according to claim 1, characterized in that the step of generating the behavior vector of a user based on the record information of the behavioral data of any user at each time node within a preset time window comprises: obtaining the record information of the user's behavioral data at each of the time nodes; and sorting the record information in the chronological order of the time nodes to form the behavior vector.

3. The behavioral data clustering method according to claim 1 or 2, characterized in that the time intervals between adjacent time nodes are equal.

4. The behavioral data clustering method according to claim 1, characterized in that, before performing long-tail data transformation on the behavior vectors of each user according to a preset long-tail function to obtain the target vectors, the method further comprises: selecting, based on the data type of the behavioral data, the preset long-tail function corresponding to that data type from among the preset conversion functions.

5. The behavioral data clustering method according to claim 1, characterized in that the step of performing long-tail data transformation on the behavior vectors of each user according to a preset long-tail function to obtain the target vectors comprises: obtaining, according to the preset long-tail function, each influence factor corresponding to each piece of record information in the behavior vector; superimposing the influence factors corresponding to each piece of record information to obtain the target data corresponding to that record information; and forming the target vector from the target data.

6. The behavioral data clustering method according to claim 1, characterized in that the step of clustering the target vectors to obtain the clustering results of the behavioral data of each user comprises: determining the similarity between the target vectors based on the distance between them; and clustering the target vectors based on the similarity between them to obtain the clustering results of the behavioral data of each user.

7. A behavioral data clustering device, characterized in that it comprises: a behavior vector generation module, configured to generate the behavior vector of a user based on the record information of the user's behavioral data at each time node within a preset time window; a long-tail data conversion module, configured to perform long-tail data transformation on the behavior vectors of each user according to a preset long-tail function to obtain the target vectors; and a behavioral data clustering module, configured to cluster the target vectors to obtain the clustering results of the behavioral data of each user; wherein, if the behavioral data exists at a time node, the record information is the first record information, and otherwise the record information is the second record information; the preset long-tail function is: where $i$ denotes the first record information in the behavior vector, $I$ denotes the influence factor on the $t$-th record information following the first record information, $I_{init}$ denotes the initial influence factor of the first record information, and $D$ denotes the attenuation coefficient.

8. An electronic device comprising a processor and a memory storing a computer program, characterized in that, when the processor executes the computer program, it implements the steps of the behavioral data clustering method according to any one of claims 1 to 6.

9. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the behavioral data clustering method according to any one of claims 1 to 6.