Information pushing method, apparatus, device, and medium
By performing DTW and DBSCAN clustering on user behavior sequences, group behavior patterns can be identified and information can be pushed, solving the problem of difficulty in mining group user preferences in existing technologies, and improving the accuracy of information push and the optimization effect of applications.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INDUSTRIAL AND COMMERCIAL BANK OF CHINA
- Filing Date
- 2023-03-22
- Publication Date
- 2026-06-26
AI Technical Summary
Existing information push methods are unable to effectively mine the behavioral preferences of group users, resulting in insufficient accuracy of information push.
By acquiring user behavior sequences, the DTW algorithm is used to construct a matrix, which is then combined with the DBSCAN algorithm for clustering. This identifies group behavior patterns, selects the central sequence for information push, and redirects noisy sequences to offline pages.
It improved the accuracy and reach of information push, optimized the application design, and achieved precise operation.
Description
Technical Field
[0001] This disclosure relates to the field of big data technology and can be used in the financial field or other fields. More specifically, it relates to an information push method, device, equipment, medium, and program product.
Background Technology
[0002] Pushing information to pages that users are interested in can effectively increase the exposure of the pushed information (such as advertisements, coupons, discounts, etc.) to users, thereby helping to improve the conversion rate of information push. Therefore, mining users' browsing behavior preferences when using applications is very beneficial in information push decisions. Existing methods for mining user browsing behavior interests can be divided into two schools of thought: one is to build a network model and analyze the user-page interest weights through random walks; however, this method can only mine whether an individual user is interested in the content of a single page, and it is difficult to mine the behavioral preferences of a group of users. The other is to use neural network training to predict the behavior of individual users to a certain extent; however, this method can only mine the behavioral preferences of a single user, and it is difficult to mine the behavioral preferences of a group.
Summary of the Invention
[0003] In view of the above problems, this disclosure provides an information push method, apparatus, device, medium and program product that can mine group behavioral preferences to help improve the accuracy of information push to group users.
[0004] A first aspect of this disclosure provides an information push method. The method includes: acquiring N user behavior sequences, wherein each user behavior sequence is formed based on a sequence of pages browsed by a user during a single use of an application, where N is an integer greater than or equal to 2; clustering the N user behavior sequences to form at least one cluster; selecting a center sequence from each cluster; and pushing a message to the page sequence corresponding to the center sequence.
[0005] According to embodiments of this disclosure, the clustering of the N user behavior sequences includes: constructing a DTW matrix based on every two user behavior sequences in the N user behavior sequences using the Dynamic Time Warping (DTW) algorithm, wherein the number of rows I and columns J of the DTW matrix are the number of pages in every two user behavior sequences, and each element D[i, j] in the DTW matrix is the minimum number of jumps from the i-th page in one user behavior sequence to the j-th page in another user behavior sequence; obtaining the distance between every two user behavior sequences based on the DTW matrix corresponding to every two user behavior sequences; and clustering the N user behavior sequences based on the distance between every two user behavior sequences.
[0006] According to an embodiment of this disclosure, clustering the N user behavior sequences includes: using the DBSCAN algorithm, a density-based clustering method with noise, to cluster the N user behavior sequences.
[0007] According to embodiments of this disclosure, clustering the N user behavior sequences using the DBSCAN algorithm includes: first, determining all core sequences among the N user behavior sequences, wherein a user behavior sequence is determined to be a core sequence when the number of other user behavior sequences within its ε-neighborhood is greater than or equal to MinPts, where ε is a predetermined distance radius, MinPts is an integer, and 1≤MinPts<N-1; next, starting from any untraversed first core sequence among all the core sequences, traversing a cluster cyclically as follows: determining, from the N user behavior sequences, all sequences that are directly density-reachable from the first core sequence; and, if these directly density-reachable sequences include untraversed core sequences, updating the first core sequence with the untraversed core sequences.
[0008] According to embodiments of this disclosure, the center sequence is selected from each cluster in any of the following ways: using the first core sequence initially selected when dividing each cluster as the center sequence; using all core sequences in each cluster as center sequences; or using the first m core sequences in each cluster, sorted from largest to smallest by the number of directly density-reachable sequences, as center sequences, where m is an integer greater than or equal to 1.
[0009] According to an embodiment of this disclosure, the method further includes: if there is a noise sequence among the N user behavior sequences that does not belong to any cluster, taking offline the page jump path corresponding to the noise sequence.
[0010] According to an embodiment of this disclosure, when a first non-core sequence among the non-core sequences of the N user behavior sequences is not within the ε-neighborhood of any core sequence, the first non-core sequence is determined to be a noise sequence.
[0011] A second aspect of this disclosure provides an information push device. The device includes an acquisition module, a clustering module, a selection module, and a push module. The acquisition module acquires N user behavior sequences, where each user behavior sequence is formed based on a sequence of pages browsed by a user during a single use of an application, and N is an integer greater than or equal to 2. The clustering module clusters the N user behavior sequences to obtain at least one cluster. The selection module selects a center sequence from each cluster. The push module pushes a message to the page sequence corresponding to the center sequence.
[0012] According to embodiments of this disclosure, the apparatus further includes an optimization module. The optimization module is configured to take offline the page jump path corresponding to a noise sequence if there is a noise sequence among the N user behavior sequences that does not belong to any cluster.
[0013] A third aspect of this disclosure provides an electronic device. The electronic device includes one or more processors and a memory. The memory is used to store one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors perform the methods described above.
[0014] A fourth aspect of this disclosure also provides a computer-readable storage medium having executable instructions stored thereon, which, when executed by a processor, cause the processor to perform the methods described above.
[0015] A fifth aspect of this disclosure also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described method.
[0016] The above one or more embodiments have the following advantages or beneficial effects: clustering facilitates the mining of group behavior patterns, and information can be pushed to the central sequence page of each cluster, which can improve the reach of the pushed messages to the group and achieve the effect of precise operation.
Description of the Drawings
[0017] The foregoing contents, as well as other objects, features, and advantages of this disclosure, will become clearer from the following description of embodiments with reference to the accompanying drawings, in which:
[0018] Figure 1 schematically shows an application scenario of the information push method, apparatus, device, medium, and program product according to embodiments of the present disclosure;
[0019] Figure 2 schematically shows a flowchart of an information push method according to an embodiment of the present disclosure;
[0020] Figure 3 schematically shows a flowchart of clustering N user behavior sequences in an information push method according to an embodiment of the present disclosure;
[0021] Figure 4 shows a schematic diagram of the ε-neighborhood in the DBSCAN algorithm;
[0022] Figure 5 schematically shows a flowchart of clustering using the DBSCAN algorithm in an information push method according to another embodiment of the present disclosure;
[0023] Figure 6 schematically shows a block diagram of an information push device according to an embodiment of the present disclosure; and
[0024] Figure 7 schematically shows a block diagram of an electronic device suitable for implementing an information push method according to embodiments of the present disclosure.
Detailed Implementation
[0025] The embodiments of the present disclosure will now be described with reference to the accompanying drawings. However, it should be understood that these descriptions are exemplary only and are not intended to limit the scope of the disclosure. In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the embodiments of the present disclosure for ease of explanation. However, it will be apparent that one or more embodiments may be practiced without these specific details. Furthermore, descriptions of well-known structures and techniques are omitted in the following description to avoid unnecessarily obscuring the concepts of the present disclosure.
[0026] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. The terms “comprising,” “including,” etc., as used herein indicate the presence of the stated features, steps, operations, and / or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.
[0027] All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein are to be interpreted in a manner consistent with the context of this specification, and not in an idealized or overly rigid way.
[0028] When expressions such as "at least one of A, B, and C" are used, they should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" should include, but is not limited to, systems having only A, only B, only C, A and B, A and C, B and C, and / or systems having A, B, and C, etc.). The terms "first," "second," etc., used herein are for distinction only and have no limiting meaning, and the number of any elements in the accompanying drawings is for illustrative purposes only and not for limitation.
[0029] This disclosure provides an information push method, apparatus, device, medium, and program product. By performing cluster analysis on the behavior sequences obtained when users browse an application, the preferred behavior patterns of groups are obtained. Operational activities are then deployed on the pages corresponding to the central behavior sequence of each cluster, achieving precise operation. For behavior patterns far from the cluster centers, i.e., noise sequences, the page jump paths of these sequences can be scheduled to be taken offline, streamlining the application. Here, a behavior pattern refers to the behavior sequence formed by a user's browsing records, for example in mobile banking, and the behavior sequence obtained by studying the behavior sequences of a group is called the group's preferred behavior pattern.
[0030] It should be noted that the information push methods, devices, equipment, media and program products provided in the embodiments of this disclosure can be used in the financial field, or in any field other than the financial field. This disclosure does not limit the application field.
[0031] Figure 1 schematically shows an application scenario of the information push method, apparatus, device, medium, and program product according to embodiments of the present disclosure.
[0032] As shown in Figure 1, application scenario 100 according to this embodiment may include terminal devices 101, 102, and 103, network 104, and server 105. Network 104 is a medium used to provide a communication link between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables, etc.
[0033] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as mobile banking, shopping applications, web browser applications, search applications, etc. (for example only).
[0034] Terminal devices 101, 102, and 103 can be various electronic devices with displays and web browsing capabilities, including but not limited to smartphones, tablets, laptops, and desktop computers.
[0035] Server 105 can be a server that provides various services, such as a backend management server that supports websites browsed by users using terminal devices 101, 102, and 103 (for example only). The backend management server can analyze and process data such as received user requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal devices.
[0036] It should be noted that the information push method provided in this disclosure embodiment can generally be executed by server 105. Accordingly, the information push device, equipment, medium, and program product provided in this disclosure embodiment can generally be located in server 105. The information push method provided in this disclosure embodiment can also be executed by a server or server cluster that is different from server 105 and capable of communicating with terminal devices 101, 102, 103, and / or server 105. Accordingly, the information push device, equipment, medium, and program product provided in this disclosure embodiment can also be located in a server or server cluster that is different from server 105 and capable of communicating with terminal devices 101, 102, 103, and / or server 105.
[0037] It should be understood that Figure 1 shows merely an example of a system architecture to which the embodiments of this disclosure can be applied, in order to help those skilled in the art understand the technical content of this disclosure; it does not mean that the embodiments of this disclosure cannot be used in other devices, systems, environments, or scenarios.
[0038] Based on the scenario described in Figure 1, the information push method according to embodiments of this disclosure will be described in detail below with reference to Figures 2-5. It should be noted that the sequence numbers of the operations in the following methods are only for descriptive purposes and should not be considered as indicating the execution order of the operations. Unless explicitly stated otherwise, the method does not need to be executed in the exact order shown.
[0039] Figure 2 schematically shows a flowchart of an information push method according to an embodiment of the present disclosure.
[0040] As shown in Figure 2, the information push method may include operations S210 to S240.
[0041] In operation S210, N user behavior sequences are obtained, where each user behavior sequence is formed based on a sequence of pages browsed by a user during a single use of the application, and N is an integer greater than or equal to 2.
[0042] All access records within a certain period can be extracted from the application logs on server 105. Each user's access to the application from opening to exiting the application constitutes one usage session. The sequence of pages viewed by each user during each usage session constitutes a user behavior sequence.
[0043] For example, the full browsing history of all users of the application within a certain period is extracted, the sequence number of each browsing session is set as the primary key, and the user behavior sequences are obtained. One example of a user behavior sequence is shown in expression (1):
[0044] $$\mathrm{tra} = \{(\mathrm{Page}_1, \mathrm{Page}_2, \mathrm{Page}_3, \ldots, \mathrm{Page}_n)\} \tag{1}$$
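As a rough illustration of operation S210, the following Python sketch groups access-log records into one page sequence per usage session. The record layout `(session_id, timestamp, page)`, the function name `build_behavior_sequences`, and the page names are assumptions made for this example; the disclosure does not specify a log format.

```python
from collections import defaultdict

def build_behavior_sequences(records):
    """Group (session_id, timestamp, page) records into per-session page sequences.

    The record layout is a hypothetical one chosen for illustration.
    """
    by_session = defaultdict(list)
    for session_id, timestamp, page in records:
        by_session[session_id].append((timestamp, page))
    # Order each session's visits by time and keep only the page names.
    return {
        session_id: tuple(page for _, page in sorted(visits))
        for session_id, visits in by_session.items()
    }

# Toy log: two usage sessions, records arriving out of order.
records = [
    ("s1", 3, "Transfer"),
    ("s1", 1, "Home"),
    ("s2", 1, "Home"),
    ("s1", 2, "Accounts"),
    ("s2", 2, "Coupons"),
]
sequences = build_behavior_sequences(records)
```

Here the session identifier plays the role of the primary key mentioned above, so each value in `sequences` is one user behavior sequence of the form shown in expression (1).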
[0045] In operation S220, the N user behavior sequences are clustered to form at least one cluster. The clustering algorithm used can be any clustering algorithm in the field, such as the K-MEANS clustering algorithm, or density-based spatial clustering of applications with noise (DBSCAN).
[0046] In operation S230, a central sequence is selected from each cluster. One or more sequences located at the center of a cluster can be selected as the central sequence, or one or more representative sequences from each cluster can be selected as the central sequence according to a certain strategy.
[0047] For example, when using the K-MEANS algorithm for clustering, the center sequence in each cluster can be a sequence of user behaviors that serve as the cluster center.
[0048] For example, when using the DBSCAN algorithm for clustering, since the N user sample sequences can be divided into core sequences and non-core sequences, the center sequence in each cluster can be selected from the core sequences of each cluster, which will be explained in detail below.
[0049] Next, in operation S240, messages are pushed on the pages of the page sequence corresponding to the central sequence.
[0050] Clustering facilitates the discovery of group behavior patterns, which in turn allows for the delivery of pop-up ads or advertisements to the central sequence pages of each cluster. This can improve the reach of push messages to the groups corresponding to each cluster, achieving the effect of precise operation.
[0051] Furthermore, if clustering reveals that a sequence among the N user behavior sequences does not belong to any cluster, that sequence can be marked as a noise sequence. The clustering result indicates that the page browsing pattern corresponding to this noise sequence is rarely used. In this case, the page jumps between the pages of the noise sequence can be taken offline, which simplifies page navigation in the application and optimizes the application design. Of course, whether noise sequences can be found depends on the choice of clustering algorithm. For example, if the K-MEANS clustering method is used, every sequence is assigned to some cluster after partitioning, so noise sequences cannot be uncovered. Conversely, if the DBSCAN algorithm is used, noise sequences can be uncovered, which facilitates simplifying the application.
[0052] Figure 3 schematically shows a flowchart of clustering N user behavior sequences in an information push method according to an embodiment of the present disclosure.
[0053] As shown in Figure 3, according to this embodiment, the clustering process in operation S220 may include operations S301 to S303.
[0054] First, in operation S301, based on every two user behavior sequences in N user behavior sequences, a DTW matrix is constructed according to the Dynamic Time Warping (DTW) algorithm.
[0055] Specifically, the number of rows I and the number of columns J in each DTW matrix are the number of pages in each pair of user behavior sequences, respectively. Each element D[i,j] in the DTW matrix is the minimum number of jumps from the i-th page in one user behavior sequence to the j-th page in another user behavior sequence.
[0056] For example, for user behavior sequence 1 and user behavior sequence 2, assuming that these two sequences have passed through I pages and J pages respectively, then user behavior sequence 1 and user behavior sequence 2 can be represented by the following equation (2):
[0057] $$\mathrm{tra1} = \{(\mathrm{Page}_{11}, \mathrm{Page}_{12}, \mathrm{Page}_{13}, \ldots, \mathrm{Page}_{1I})\}$$
[0058] $$\mathrm{tra2} = \{(\mathrm{Page}_{21}, \mathrm{Page}_{22}, \mathrm{Page}_{23}, \ldots, \mathrm{Page}_{2J})\} \tag{2}$$
[0059] For tra1 and tra2, an I*J DTW matrix can be constructed, where D[i,j] represents the minimum number of jumps from the i-th node in tra1 (e.g., $\mathrm{Page}_{1i}$) to the j-th node in tra2 (e.g., $\mathrm{Page}_{2j}$). The minimum number of jumps between $\mathrm{Page}_{1i}$ and $\mathrm{Page}_{2j}$ can be the minimum of the numbers of jumps between the two pages that have appeared in the N user behavior sequences.
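One possible reading of this construction is sketched below in Python. It rests on several assumptions that the disclosure does not state: the number of jumps between two pages within a sequence is taken as the difference of their positions, direction is ignored so that the resulting matrix is symmetric in the two pages, identical pages cost 0, and page pairs never observed together receive a caller-chosen penalty `unseen_cost`. The names `min_jump_table` and `jump_matrix` are hypothetical.

```python
from itertools import combinations

def min_jump_table(sequences):
    """Smallest observed number of jumps between each unordered pair of pages."""
    table = {}
    for seq in sequences:
        # combinations() yields position pairs with x < y, so y - x > 0.
        for (x, a), (y, b) in combinations(enumerate(seq), 2):
            if a == b:
                continue
            key = frozenset((a, b))
            table[key] = min(table.get(key, y - x), y - x)
    return table

def jump_matrix(seq1, seq2, table, unseen_cost):
    """I-by-J matrix whose entry [i][j] is the jump count between seq1[i] and seq2[j]."""
    return [
        [0 if a == b else table.get(frozenset((a, b)), unseen_cost) for b in seq2]
        for a in seq1
    ]

sequences = [
    ("Home", "Accounts", "Transfer"),
    ("Home", "Coupons"),
    ("Home", "Transfer"),
]
table = min_jump_table(sequences)
# Pages never seen in the same sequence get the assumed penalty of 5.
matrix = jump_matrix(sequences[0], sequences[1], table, unseen_cost=5)
# matrix == [[0, 1], [1, 5], [1, 5]]
```

In this toy data "Home" and "Transfer" are two jumps apart in the first sequence but one jump apart in the third, so the table keeps the minimum, 1.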
[0060] Next, in operation S302, the distance between each pair of user behavior sequences is obtained based on the DTW matrix corresponding to each pair of user behavior sequences.
[0061] In some embodiments, dynamic programming can be used to search for the shortest distance between the page sequence corresponding to coordinate [1, 1] and the page sequence corresponding to coordinate [I, J] in the DTW matrix, which is the DTW distance, and this distance can be used as the distance between every two user behavior sequences.
[0062] Taking the I*J DTW matrix constructed by tra1 and tra2 above as an example, we will explain the process of obtaining the DTW distance between tra1 and tra2.
[0063] In the DTW matrix constructed from tra1 and tra2, the page sequence corresponding to coordinate [1, 1] is {$\mathrm{Page}_{11}$, $\mathrm{Page}_{21}$}, and the page sequence corresponding to coordinate [I, J] is {$\mathrm{Page}_{1I}$, $\mathrm{Page}_{2J}$}. Starting from coordinate [1, 1] in the DTW matrix, each step jumps to an adjacent coordinate position, and a search trajectory is obtained once coordinate [I, J] is reached. The sum of the elements at the coordinate positions traversed along a trajectory is the distance of that trajectory. Multiple search trajectories can exist between coordinate [1, 1] and coordinate [I, J] in the DTW matrix, and the minimum among their distances is the DTW distance between tra1 and tra2.
[0064] The DTW distance is based on a dynamic programming strategy: it nonlinearly warps the time axes of two time series to align them and then calculates the distance between them. It can effectively measure the similarity of time series, and it is particularly useful when the two time series have different lengths.
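The dynamic-programming search described above can be sketched as follows. This is a minimal sketch that assumes the usual DTW step pattern, in which a trajectory may only move one cell right, down, or diagonally; the disclosure only says "adjacent coordinate position", so that restriction is an assumption. The input `cost` is an I-by-J matrix such as the one described in operation S301, and `dtw_distance` is a hypothetical name.

```python
def dtw_distance(cost):
    """Minimum trajectory sum from cost[0][0] to cost[-1][-1].

    Assumes moves are limited to right, down, or diagonal steps.
    """
    rows, cols = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] holds the cheapest trajectory sum that ends at cell (i, j).
    acc = [[inf] * cols for _ in range(rows)]
    acc[0][0] = cost[0][0]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            best_previous = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost[i][j] + best_previous
    return acc[-1][-1]

# A 3-by-2 cost matrix for a 3-page sequence and a 2-page sequence.
cost = [[0, 1], [1, 5], [1, 5]]
distance = dtw_distance(cost)  # 6, via cells (0,0) -> (1,0) -> (2,1)
```

Because every cell is filled exactly once, the search takes on the order of I*J steps instead of enumerating all trajectories.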
[0065] Of course, in other embodiments, the distance between any two user behavior sequences can also be measured using a norm of the DTW matrix (such as the 1-norm, 0-norm, or F-norm). This way of calculating the distance is simpler and computationally cheaper than calculating the DTW distance.
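For comparison, two of the norms mentioned above can be computed directly from the matrix. The sketch below uses the standard definitions (matrix 1-norm as the maximum absolute column sum, F-norm as the square root of the sum of squared entries); the function names are hypothetical, and how the disclosure intends the norm to be applied beyond this is not specified.

```python
import math

def matrix_one_norm(matrix):
    """Matrix 1-norm: the largest absolute column sum."""
    return max(sum(abs(row[j]) for row in matrix) for j in range(len(matrix[0])))

def frobenius_norm(matrix):
    """F-norm: square root of the sum of squared entries."""
    return math.sqrt(sum(value * value for row in matrix for value in row))

matrix = [[0, 1], [1, 5], [1, 5]]
one_norm = matrix_one_norm(matrix)   # column sums are 2 and 11
f_norm = frobenius_norm(matrix)      # sqrt(53)
```

One design consequence worth noting: a norm counts every entry of the matrix, so two identical sequences generally still receive a nonzero value because of the off-diagonal entries, whereas their DTW distance along the diagonal is 0.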
[0066] Then, in operation S303, the N user behavior sequences are clustered based on the distance between every two user behavior sequences. As mentioned above, the clustering algorithm used can be any clustering algorithm in this field.
[0067] If the K-MEANS algorithm is used, the number of clusters must be known in advance. For example, the number of clusters can be determined before clustering based on the characteristics of the application's users (e.g., age, income level, city of residence, etc.). After K-MEANS clustering, each user behavior sequence is assigned to some cluster, which makes it impossible to discover noise sequences.
[0068] The DBSCAN algorithm can be used to extract noise sequences. The clustering process using the DBSCAN algorithm will be described in detail below.
[0069] Figure 4 shows a schematic diagram of the ε-neighborhood in the DBSCAN algorithm. Figure 5 schematically shows a flowchart of clustering using the DBSCAN algorithm in an information push method according to another embodiment of this disclosure. The process of clustering using the DBSCAN algorithm is explained below with reference to Figure 4 and Figure 5.
[0070] Specifically, in this embodiment of the disclosure, when the DBSCAN algorithm processes the N user behavior sequences, the sequences can be divided into core sequences, boundary sequences, and noise sequences. The definitions of these three types of sequences are as follows.
[0071] Core sequence: Among the N user behavior sequences, if the ε-neighborhood of user behavior sequence p contains at least MinPts other user behavior sequences (e.g., containing user behavior sequence p), i.e., $|N_\varepsilon(p)| \ge \mathrm{MinPts}$, then p is called a core sequence. The ε-neighborhood is defined as $N_\varepsilon(p) = \{q \in D \mid \mathrm{dist}(p, q) \le \varepsilon\}$, where D is the data set consisting of the N user behavior sequences; dist(p, q) is the distance between user behavior sequence p and user behavior sequence q, which can be, for example, the distance calculated in operation S302 above; ε is a predetermined distance radius; and MinPts is an integer with 1≤MinPts<N-1.
[0072] With reference to Figure 4, assuming MinPts = 3, the numbers of other user behavior sequences in the ε-neighborhoods of user behavior sequence ① and user behavior sequence ④ are 3 and 4, respectively. Therefore, according to the definition of a core sequence, user behavior sequence ① and user behavior sequence ④ are both core sequences.
[0073] Boundary sequence: Among the N user behavior sequences, if a non-core sequence b lies within the ε-neighborhood of any core sequence p, then b is called a boundary sequence. In Figure 4, user behavior sequences ②, ③, ⑤, ⑥, and ⑦ are all boundary sequences.
[0074] Noise sequence: If a non-core sequence r is not within the ε-neighborhood of any core sequence p, then r is called a noise sequence. For example, in Figure 4, user behavior sequences ⑧ and ⑨ are noise sequences.
[0075] As can be seen, the DBSCAN algorithm can divide N user behavior sequences into core sequences and non-core sequences. The non-core sequences can include boundary sequences and / or noise sequences.
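The three definitions above can be illustrated with a short Python sketch that labels each sequence given a precomputed, symmetric distance matrix. The function name `classify_sequences` and the toy data are assumptions for illustration; the neighbor count excludes the sequence itself, following the "other user behavior sequences" wording.

```python
def classify_sequences(dist, eps, min_pts):
    """Label each index as 'core', 'boundary', or 'noise'.

    dist is assumed to be a symmetric N-by-N distance matrix.
    """
    n = len(dist)
    # Other sequences inside the eps-neighborhood of each sequence.
    neighbors = [
        [q for q in range(n) if q != p and dist[p][q] <= eps] for p in range(n)
    ]
    core = {p for p in range(n) if len(neighbors[p]) >= min_pts}
    labels = {}
    for p in range(n):
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in neighbors[p]):
            labels[p] = "boundary"  # inside the neighborhood of some core sequence
        else:
            labels[p] = "noise"
    return labels

# Toy stand-in for sequence distances: absolute differences of 1-D points.
points = [0, 1, 2, 10]
dist = [[abs(a - b) for b in points] for a in points]
labels = classify_sequences(dist, eps=1, min_pts=2)
# labels == {0: "boundary", 1: "core", 2: "boundary", 3: "noise"}
```

In practice `dist` would hold the pairwise distances between user behavior sequences obtained in operation S302 rather than distances between numbers.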
[0076] The following concepts can be used in the DBSCAN algorithm clustering process:
[0077] Direct density reachability: If q is within the ε-neighborhood of p, and p is a core sequence, then q is said to be directly density-reachable from p.
[0078] Density reachability: If q is within the ε neighborhood of p, and both p and q are core sequences, then the neighborhood points of q are said to be density reachable from p.
[0079] Density-connected: If q and p are both non-core sequences, and p and q are in the same cluster, then q and p are said to be density-connected.
[0080] Next, based on the above introduction, the process of clustering N user behavior sequences using the DBSCAN algorithm will be described in detail with reference to Figure 5. The process may include operations S501 to S509.
[0081] First, in operation S501, all core sequences among N user behavior sequences are determined. When the number of other user behavior sequences in the ε neighborhood of a user behavior sequence is greater than or equal to MinPts, the user behavior sequence is determined to be a core sequence.
[0082] In practice, based on the initial assessment of the number of user groups to be classified (e.g., set to 5-10 clusters), and combined with the distance distribution between sequences, the values of parameters ε and MinPts can be set to control the final number of clusters within the expected range.
[0083] Next, by looping through operations S502 to S508, the core sequences are traversed and clusters are divided until all core sequences have been traversed.
[0084] Specifically, in operation S502, a core sequence that has not yet been traversed is selected as the first core sequence.
[0085] Next, in operation S503, all sequences that are directly density-reachable from the first core sequence are determined from the N user behavior sequences.
[0086] Then, in operation S504, it is determined whether these directly density-reachable sequences include any untraversed core sequence. That is, it is determined whether the first core sequence has a directly density-reachable sequence that is itself an untraversed core sequence.
[0087] If yes, then in operation S505, the first core sequence is updated with the untraversed core sequence, and the process returns to operation S503. If no, then in operation S506, the core sequences traversed in the loop from operation S503 to operation S505, together with the directly density-reachable sequences of each of these core sequences, are classified into a single cluster.
[0088] In this way, each time the loop of operations S503 to S505 terminates, a cluster is obtained in operation S506. For example, in Figure 4, user behavior sequences ①, ②, ③, ④, ⑤, ⑥, and ⑦ can be grouped into a single cluster.
[0089] Next, in operation S507, it is determined whether there are any user behavior sequences that have not yet been assigned to any cluster. If not, the process proceeds to operation S509 to confirm the end of clustering. If yes, the determination continues in operation S508.
[0090] In operation S508, determine whether there are still core sequences among the user behavior sequences that have not yet been assigned to any cluster. If so, it means that there are still untraversed core sequences. At this point, return to operation S501, take any untraversed core sequence as the first core sequence, and then perform clustering on the user behavior sequences that have not yet been assigned to any cluster through operations S502 to S508, until operation S507 determines that there are no user behavior sequences that have not yet been assigned to any cluster, or operation S508 determines that there are no untraversed core sequences.
[0091] When the loop of operations S502 to S508 meets the termination condition, that is, there is no core sequence that has not been traversed, or there is no user behavior sequence that has not been assigned to any cluster, then in operation S509, the clustering is determined to end.
[0092] In the above clustering, the divided clusters can be obtained based on the output of operation S506.
[0093] In some embodiments, the core sequence search in operation S501 can be performed simultaneously with the clustering process in operations S502 to S508. In this case, the DBSCAN algorithm can randomly select a sequence from the N user behavior sequences and traverse a cluster starting from that sequence, and the algorithm terminates when all core sequences in the N user behavior sequences have been traversed. Specifically, first, the number of other user behavior sequences in the ε-neighborhood of a user behavior sequence is found; if the number is greater than or equal to MinPts, the user behavior sequence is determined to be a core sequence. Then, the directly density-reachable sequences of the core sequence are determined, and untraversed core sequences are sought among them. This process is repeated until no untraversed core sequence is found, at which point the traversal terminates. Each core sequence traversed in this process, together with its directly density-reachable sequences, is then classified into the same cluster. Next, a core sequence is sought among the user behavior sequences that have not yet been clustered, and the above process is repeated. If no core sequence exists among the user behavior sequences that have not yet been assigned to a cluster, the cluster division ends.
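The traversal described in operations S501 to S509 can be approximated by the following Python sketch, which works on a precomputed, symmetric distance matrix. It is not a line-by-line transcription of Figure 5: the pending list stands in for repeatedly "updating the first core sequence", each sequence is assigned to the first cluster that reaches it (an assumption, since the disclosure does not say how a boundary sequence near two clusters is handled), and `dbscan_clusters` is a hypothetical name.

```python
def dbscan_clusters(dist, eps, min_pts):
    """Return (clusters, noise); each cluster lists its starting core sequence first."""
    n = len(dist)
    neighbors = [
        [q for q in range(n) if q != p and dist[p][q] <= eps] for p in range(n)
    ]
    # Roughly S501: find every core sequence up front.
    core = {p for p in range(n) if len(neighbors[p]) >= min_pts}
    assigned = set()
    clusters = []
    for first_core in sorted(core):          # roughly S502
        if first_core in assigned:
            continue
        cluster = [first_core]
        assigned.add(first_core)
        pending = [first_core]
        while pending:
            p = pending.pop()
            for q in neighbors[p]:           # roughly S503: directly density-reachable from p
                if q in assigned:
                    continue
                assigned.add(q)
                cluster.append(q)
                if q in core:                # roughly S504/S505: continue from this core
                    pending.append(q)
        clusters.append(cluster)             # roughly S506
    # Anything never reached from a core sequence is treated as noise.
    noise = [p for p in range(n) if p not in assigned]
    return clusters, noise

points = [0, 1, 2, 10, 11, 12, 50]
dist = [[abs(a - b) for b in points] for a in points]
clusters, noise = dbscan_clusters(dist, eps=1, min_pts=2)
# clusters == [[1, 0, 2], [4, 3, 5]], noise == [6]
```

For production use, a library implementation that accepts a precomputed distance matrix would normally be preferable to this hand-written loop.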
[0094] After clustering N user behavior sequences using the DBSCAN algorithm, in operation S230, the center sequence can be selected from each cluster in any of the following ways: using the first core sequence initially selected when dividing each cluster as the center sequence; or using all the core sequences in each cluster as center sequences; or using the top m core sequences in each cluster, sorted from largest to smallest by the number of density-directed sequences, as center sequences, where m is an integer greater than or equal to 1. Thus, one or more representative sequences can be selected as center sequences from the core sequences in each cluster.
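The second and third selection options above can be illustrated with a short Python sketch. It assumes that clustering has already produced one label per sequence (with -1 for unclustered sequences) and that the "number of density-directed sequences" of a core sequence is the count of other sequences within its ε-neighborhood; the function name center_sequences and its parameters are hypothetical. Passing m=None returns every core sequence of each cluster, while an integer m keeps only the top m.

```python
from typing import Dict, List, Optional, Sequence


def center_sequences(
    dist: Sequence[Sequence[float]],
    labels: Sequence[int],
    eps: float,
    min_pts: int,
    m: Optional[int] = None,
) -> Dict[int, List[int]]:
    """Map each cluster label to the indices of its chosen center sequences."""
    n = len(dist)
    # Number of other sequences within the eps-neighborhood of each sequence.
    counts = [
        sum(1 for j in range(n) if j != i and dist[i][j] <= eps) for i in range(n)
    ]
    cores_by_cluster: Dict[int, List[int]] = {}
    for i, label in enumerate(labels):
        if label >= 0 and counts[i] >= min_pts:
            cores_by_cluster.setdefault(label, []).append(i)
    ranked: Dict[int, List[int]] = {}
    for label, cores in cores_by_cluster.items():
        # Largest neighborhood first; ties are broken by index for determinism.
        ordered = sorted(cores, key=lambda i: (-counts[i], i))
        ranked[label] = ordered if m is None else ordered[:m]
    return ranked
```

The first option (using the core sequence from which each cluster was initially grown) would instead require the clustering step to record its seed, which this sketch does not model.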
[0095] After clustering, one or more central sequences are selected from each cluster, representing the group's preference behavior patterns. Activity pop-ups or advertisements can be placed on the pages of the central sequences to achieve the effect of precise operation.
[0096] For each non-core sample, if it lies within the ε-neighborhood of a core sample (i.e., it can be reached from a core sample but is not itself a core sample), it is a boundary sample; otherwise, it is a noise sequence. For a noise sequence, the page navigation along that sequence can be scheduled to be taken offline.
[0097] As can be seen, the embodiments of this disclosure can analyze the browsing patterns of a group of users to uncover preference patterns and / or noise patterns in group behavior. Based on this, precise operations can be carried out on the uncovered group behavior patterns, including placing advertisements along the page sequences of the preference behavior patterns at the centers of clusters; and / or scheduling the page navigation paths of noise behavior patterns to be taken offline, thus simplifying the application.
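The three roles described above (core, boundary, noise) can be made concrete with a minimal Python sketch. The function name classify, the role strings, and the inclusive neighborhood test "distance <= eps" are assumptions introduced for illustration only.

```python
from typing import List, Sequence


def classify(dist: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[str]:
    """Label every sequence as 'core', 'boundary', or 'noise'."""
    n = len(dist)
    neigh = [
        [j for j in range(n) if j != i and dist[i][j] <= eps] for i in range(n)
    ]
    core = {i for i in range(n) if len(neigh[i]) >= min_pts}
    roles = []
    for i in range(n):
        if i in core:
            roles.append("core")
        elif any(j in core for j in neigh[i]):
            # Non-core, but inside the eps-neighborhood of some core sequence.
            roles.append("boundary")
        else:
            roles.append("noise")
    return roles
```

Sequences labeled 'noise' by such a routine would be the candidates whose page navigation is reviewed for being taken offline.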
[0098] Based on the information push methods of the above embodiments, this disclosure also provides an information push device. The device is described in detail below with reference to Figure 6.
[0099] Figure 6 schematically shows a block diagram of an information push device 600 according to an embodiment of the present disclosure.
[0100] As shown in Figure 6, according to one embodiment of this disclosure, the information push device 600 may include an acquisition module 610, a clustering module 620, a selection module 630, and a push module 640. According to another embodiment of this disclosure, the device 600 may further include an optimization module 650. The device 600 can perform the methods described with reference to Figures 2-5.
[0101] The acquisition module 610 is used to acquire N user behavior sequences, where each user behavior sequence is formed based on a sequence of pages browsed by a user during a single use of the application, and N is an integer greater than or equal to 2. In one embodiment, the acquisition module 610 can perform the operation S210 described above.
[0102] The clustering module 620 is used to cluster N user behavior sequences to obtain at least one cluster. In one embodiment, the clustering module 620 can perform the operation S220 described above.
[0103] In one embodiment, the clustering module 620 can be used to: construct a DTW matrix based on every two user behavior sequences in N user behavior sequences according to the Dynamic Time Warping (DTW) algorithm, wherein the number of rows I and columns J of the DTW matrix are the number of pages in every two user behavior sequences, and each element D[i,j] in the DTW matrix is the minimum number of jumps from the i-th page in one user behavior sequence to the j-th page in another user behavior sequence; obtain the distance between every two user behavior sequences based on the DTW matrix corresponding to every two user behavior sequences; and cluster the N user behavior sequences based on the distance between every two user behavior sequences.
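A minimal Python sketch of a DTW-based distance between two page sequences follows, offered as an illustration rather than as the disclosed implementation. Two points are assumptions: the local cost page_cost (0 for the same page, 1 otherwise) is a hypothetical stand-in for the "minimum number of jumps" between pages, which would require the application's page graph; and the distance is taken as the last element of the standard accumulated-cost recurrence. The names dtw_distance and distance_matrix are likewise hypothetical.

```python
from typing import Callable, List, Sequence


def page_cost(a: str, b: str) -> float:
    """Hypothetical local cost: 0 for the same page, 1 for different pages."""
    return 0.0 if a == b else 1.0


def dtw_distance(
    s: Sequence[str],
    t: Sequence[str],
    cost: Callable[[str, str], float] = page_cost,
) -> float:
    """Return the accumulated DTW cost of aligning page sequences s and t."""
    rows, cols = len(s), len(t)
    inf = float("inf")
    # acc[i][j] is the cheapest cumulative cost of aligning s[:i] with t[:j].
    acc = [[inf] * (cols + 1) for _ in range(rows + 1)]
    acc[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = cost(s[i - 1], t[j - 1])
            acc[i][j] = step + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[rows][cols]


def distance_matrix(seqs: Sequence[Sequence[str]]) -> List[List[float]]:
    """Symmetric matrix of DTW distances between every two sequences."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            dist[a][b] = dist[b][a] = dtw_distance(seqs[a], seqs[b])
    return dist
```

Because DTW allows one element to align with several elements of the other sequence, a repeated page visit does not by itself increase the distance under this cost; a matrix built this way can then be handed to a distance-based clustering step.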
[0104] In another embodiment, the clustering module 620 can be used to cluster the N user behavior sequences using the density-based spatial clustering of applications with noise (DBSCAN) algorithm.
[0105] Selection module 630 is used to select the center sequence from each cluster. In one embodiment, selection module 630 can perform the operation S230 described above.
[0106] The push module 640 is used to push messages in the page sequence corresponding to the central sequence. In one embodiment, the push module 640 can perform the operation S240 described above.
[0107] The optimization module 650 is used to take offline the page jump path corresponding to a noise sequence if, among the N user behavior sequences, there is a noise sequence that does not belong to any cluster.
[0108] In one embodiment, the optimization module 650 is further configured to, when the N user behavior sequences are clustered according to the DBSCAN algorithm, determine a first non-core sequence to be a noise sequence when that first non-core sequence, among the non-core sequences of the N user behavior sequences, is not within the ε-neighborhood of any core sequence.
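How the five modules might cooperate can be sketched as a small Python pipeline. This is only an illustrative arrangement: the class name InfoPushDevice, the callable signatures, and the convention that label -1 marks a noise sequence are assumptions, and each module is represented by a plain callable that a real implementation would replace.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

PageSeq = Sequence[str]


@dataclass
class InfoPushDevice:
    """Hypothetical composition of modules 610 to 650 as plain callables."""

    acquire: Callable[[], List[PageSeq]]                     # acquisition module 610
    cluster: Callable[[List[PageSeq]], List[int]]            # clustering module 620
    select: Callable[[List[PageSeq], List[int]], List[int]]  # selection module 630
    push: Callable[[PageSeq], None]                          # push module 640
    optimize: Optional[Callable[[PageSeq], None]] = None     # optimization module 650

    def run(self) -> None:
        seqs = self.acquire()
        labels = self.cluster(seqs)
        # Push messages on the pages of each selected center sequence.
        for idx in self.select(seqs, labels):
            self.push(seqs[idx])
        # Optionally hand every noise sequence to the optimization module.
        if self.optimize is not None:
            for seq, label in zip(seqs, labels):
                if label == -1:
                    self.optimize(seq)
```

Keeping the optimization module optional mirrors the text above, in which module 650 appears only in a further embodiment.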
[0109] According to embodiments of this disclosure, any two or more of the acquisition module 610, clustering module 620, selection module 630, push module 640, and optimization module 650 can be combined into one module, or any one of these modules can be split into multiple modules. Alternatively, at least some of the functions of one or more of these modules can be combined with at least some of the functions of other modules and implemented in one module. According to embodiments of this disclosure, at least one of the acquisition module 610, clustering module 620, selection module 630, push module 640, and optimization module 650 can be at least partially implemented as hardware circuitry, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system-on-a-chip, a system-on-a-substrate, a system-on-package, an application-specific integrated circuit (ASIC), or implemented in hardware or firmware by any other reasonable means of integrating or packaging the circuitry, or implemented in software, hardware, or firmware, or in any suitable combination of any of these three implementation methods. Alternatively, at least one of the acquisition module 610, clustering module 620, selection module 630, push module 640, and optimization module 650 may be implemented at least partially as a computer program module, which can perform corresponding functions when the computer program module is run.
[0110] Figure 7 schematically illustrates a block diagram of an electronic device suitable for implementing an information push method according to embodiments of the present disclosure.
[0111] As shown in Figure 7, an electronic device 700 according to an embodiment of the present disclosure includes a processor 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage portion 708 into a random access memory (RAM) 703. The processor 701 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and / or an associated chipset and / or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), etc. The processor 701 may also include onboard memory for caching purposes. The processor 701 may include a single processing unit or multiple processing units for performing different actions of the method flow according to an embodiment of the present disclosure.
[0112] RAM 703 stores various programs and data required for the operation of electronic device 700. Processor 701, ROM 702, and RAM 703 are interconnected via bus 704. Processor 701 performs various operations of the method flow according to embodiments of the present disclosure by executing programs in ROM 702 and / or RAM 703. It should be noted that the programs may also be stored in one or more memories other than ROM 702 and RAM 703. Processor 701 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in said one or more memories.
[0113] According to embodiments of this disclosure, the electronic device 700 may further include an input / output (I / O) interface 705, which is also connected to a bus 704. The electronic device 700 may also include one or more of the following components connected to the I / O interface 705: an input section 706 including a keyboard, mouse, etc.; an output section 707 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 708 including a hard disk, etc.; and a communication section 709 including a network interface card such as a LAN card, modem, etc. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I / O interface 705 as needed. A removable medium 711, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 710 as needed so that computer programs read from it can be installed into the storage section 708 as needed.
[0114] This disclosure also provides a computer-readable storage medium, which may be included in the device / apparatus / system described in the above embodiments; or it may exist independently and not assembled into the device / apparatus / system. The computer-readable storage medium carries one or more programs that, when executed, implement the method according to the embodiments of this disclosure.
[0115] According to embodiments of this disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, such as, but not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. For example, according to embodiments of this disclosure, the computer-readable storage medium may include ROM 702 and / or RAM 703 and / or one or more memories other than ROM 702 and RAM 703 described above.
[0116] Embodiments of this disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowchart. When the computer program product is run on a computer system, the program code is used to cause the computer system to implement the methods provided in the embodiments of this disclosure.
[0117] When the computer program is executed by the processor 701, it performs the functions defined in the system / apparatus of the embodiments of this disclosure. According to embodiments of this disclosure, the systems, apparatuses, modules, units, etc., described above can be implemented by computer program modules.
[0118] In one embodiment, the computer program may rely on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of signals over a network medium, and may be downloaded and installed via the communication section 709, and / or installed from a removable medium 711. The program code contained in the computer program can be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination thereof.
[0119] In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 709, and / or installed from the removable medium 711. When the computer program is executed by the processor 701, it performs the functions defined in the system of the embodiments of this disclosure. According to embodiments of this disclosure, the systems, devices, apparatuses, modules, units, etc., described above can be implemented by computer program modules.
[0120] According to embodiments of this disclosure, program code for executing the computer programs provided in embodiments of this disclosure can be written in any combination of one or more programming languages. Specifically, these computer programs can be implemented using high-level procedural and / or object-oriented programming languages, and / or assembly / machine languages. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages. The program code can execute entirely on the user's computing device, partially on the user's device, partially on a remote computing device, or entirely on a remote computing device or server. In cases involving remote computing devices, the remote computing device can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).
[0121] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0122] Those skilled in the art will understand that the features described in the various embodiments and / or claims of this disclosure can be combined and / or integrated in various ways, even if such combinations or integrations are not explicitly described in this disclosure. In particular, the features described in the various embodiments and / or claims of this disclosure can be combined and / or integrated in various ways without departing from the spirit and teachings of this disclosure. All such combinations and / or integrations fall within the scope of this disclosure.
[0123] The embodiments of this disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of this disclosure. Although various embodiments have been described above, this does not mean that the measures in the various embodiments cannot be used advantageously in combination. The scope of this disclosure is defined by the appended claims and their equivalents. Various substitutions and modifications can be made by those skilled in the art without departing from the scope of this disclosure, and all such substitutions and modifications should fall within the scope of this disclosure.
Claims
1. An information push method, comprising: obtaining N user behavior sequences, wherein each user behavior sequence is formed based on a sequence of pages browsed by a user during a single use of the application, and N is an integer greater than or equal to 2; clustering the N user behavior sequences to obtain at least one cluster; selecting a center sequence from each cluster; and pushing messages in the page sequence corresponding to the center sequence; wherein clustering the N user behavior sequences comprises: constructing, for every two user behavior sequences among the N user behavior sequences, a DTW matrix according to the Dynamic Time Warping (DTW) algorithm, wherein the number of rows I and the number of columns J of the DTW matrix are respectively the numbers of pages in the two user behavior sequences, and each element D[i,j] of the DTW matrix is the minimum number of jumps from the i-th page in one of the two user behavior sequences to the j-th page in the other; obtaining the distance between every two user behavior sequences based on the DTW matrix corresponding to those two user behavior sequences; and clustering the N user behavior sequences based on the distance between every two user behavior sequences.
2. The method according to claim 1, wherein clustering the N user behavior sequences comprises: clustering the N user behavior sequences using the density-based spatial clustering of applications with noise (DBSCAN) algorithm.
3. The method according to claim 2, wherein clustering the N user behavior sequences using the DBSCAN algorithm comprises: identifying all core sequences among the N user behavior sequences, wherein a user behavior sequence is determined to be a core sequence when the number of other user behavior sequences within its ε-neighborhood is greater than or equal to MinPts, ε being a predetermined distance radius and MinPts being an integer with 1≤MinPts<N-1; and generating a cluster, each time starting from any untraversed first core sequence among all the core sequences, by iterating through the following loop: determining, from the N user behavior sequences, all density-directed sequences that have a density-directed relationship with the first core sequence; and if all the density-accessible sequences include core sequences that have not been traversed, updating the first core sequence with the untraversed core sequences.
4. The method according to claim 3, wherein the center sequence is selected from each cluster in any of the following ways: using the first core sequence initially selected when dividing each cluster as the center sequence; using each core sequence in each cluster as a center sequence; or using the top m core sequences in each cluster, sorted from largest to smallest by the number of density-directed sequences, as center sequences, where m is an integer greater than or equal to 1.
5. The method according to claim 3, further comprising: if there is a noise sequence among the N user behavior sequences that does not belong to any cluster, taking offline the page jump path corresponding to the noise sequence.
6. The method according to claim 5, wherein, when there is a first non-core sequence, among the non-core sequences of the N user behavior sequences, that is not within the ε-neighborhood of any core sequence, the first non-core sequence is determined to be the noise sequence.
7. An information push device, comprising: an acquisition module, used to acquire N user behavior sequences, wherein each user behavior sequence is formed based on a sequence of pages browsed by a user during a single use of the application, and N is an integer greater than or equal to 2; a clustering module, used to cluster the N user behavior sequences to obtain at least one cluster; a selection module, used to select a center sequence from each cluster; and a push module, used to push messages in the page sequence corresponding to the center sequence; wherein the clustering module is specifically used to: construct, for every two user behavior sequences among the N user behavior sequences, a DTW matrix according to the Dynamic Time Warping (DTW) algorithm, wherein the number of rows I and the number of columns J of the DTW matrix are respectively the numbers of pages in the two user behavior sequences, and each element D[i,j] of the DTW matrix is the minimum number of jumps from the i-th page in one of the two user behavior sequences to the j-th page in the other; obtain the distance between every two user behavior sequences based on the DTW matrix corresponding to those two user behavior sequences; and cluster the N user behavior sequences based on the distance between every two user behavior sequences.
8. An electronic device, comprising: one or more processors; and a memory used to store one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors perform the method according to any one of claims 1 to 6.
9. A computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the method of any one of claims 1 to 6.
10. A computer program product comprising computer program instructions that, when executed by a processor, implement the method of any one of claims 1 to 6.