A software application recommendation method based on user behavior information and speech information
By combining conceptual models and metagraphs of user behavior and speech information, and utilizing random walks and Skip-gram models, the problems of implicit feedback and long-tail distribution in mobile application recommendations are solved, resulting in more accurate recommendations and improved recommendation performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING UNIV OF TECH
- Filing Date
- 2022-11-25
- Publication Date
- 2026-06-26
AI Technical Summary
Existing mobile application recommendation methods fail to effectively utilize users' implicit feedback and long-tail distribution, resulting in poor recommendation performance and a lack of consideration for users' social information.
By integrating user behavior and speech information, a conceptual model is built and a metagraph is designed. Node embedding representations are learned using random walks and heterogeneous Skip-gram models, and recommendations are made by calculating the vector similarity between users and mobile applications.
It enables more detailed profiling of users and mobile applications, improves the accuracy of recommendations, effectively addresses the cold start problem, and enhances recommendation performance.
Figure CN115757755B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the technical field of recommendation methods, specifically mobile application recommendation, and in particular relates to a software application recommendation method based on user behavior information and speech information. It is applicable to mobile application recommendation processes that jointly consider a user's mobile application usage, speech activity, and social information. The invention uses a personalized method to recommend mobile applications accurately to users based on their behavior information and speech information.
Background Technology
[0002] Web-based software application distribution platforms are becoming increasingly popular among internet users, and the wide variety of available applications has greatly facilitated people's lives. However, application recommendation engines tend to recommend mobile applications based on popularity. Because this overlooks the differing interests of individual users, users often struggle to find the applications they actually want. Application recommendation systems can be used to understand user preferences and predict user interests.
[0003] Compared to recommendation systems in other fields, mobile app recommendations have two unique characteristics: implicit feedback and a long-tailed distribution. Unlike explicit feedback such as ratings and reviews, implicit feedback is simply the interaction between users and items. Therefore, identifying user preferences for items from implicit feedback is much more difficult for researchers. Furthermore, the challenges of mobile app recommendations also stem from the long-tailed distribution of items and the sparsity of the dataset. Compared to Netflix's dataset, the app dataset is top-heavy, with the top 1% of apps accounting for 58% of usage, while on Netflix, the top 1% of movies account for 22% of all ratings.
[0004] After decades of development, the field of app recommendation has seen extensive research on how to more efficiently analyze app characteristics and user preferences and make recommendations based on this. Today, although many app recommendation methods integrate information from different dimensions, this information is relatively simple and lacks consideration of users' social information and the use of multi-dimensional data.
[0005] To address these two challenges and improve the performance of mobile app recommendation, researchers have incorporated various kinds of additional information into app recommendation. Scholars both domestically and internationally have considered app ratings, app size and permissions, app category information, contextual information such as the user's online time and location, competitive relationships between apps, app version information, and user download and browsing behavior.
[0006] Graph embedding refers to learning vector representations of the nodes in a graph using specific methods, such as LINE, DeepWalk, and Node2vec. LINE can embed large information networks into a low-dimensional vector space. DeepWalk uses random walk paths over the nodes of a network to simulate the text generation process, so that nodes can be vectorized with natural language models. Node2vec improves on DeepWalk by refining the random walk strategy through optimized node transition probabilities. DeepWalk and Node2vec both transform the graph structure into node paths through random walks on the network, and then apply the Skip-gram model to embed the nodes. The Skip-gram model is a neural network model for natural language processing. The embedding step treats each node path as a sentence and each node as a word, and then uses the Skip-gram model to calculate the embedding vector of each node. These embedding learning techniques capture the contextual relationships between nodes, making it easier to calculate the similarity between nodes. Many researchers have used embedding techniques for recommending mobile applications.
Summary of the Invention
[0007] The technical problem to be solved by this invention is to provide a software application recommendation method based on user behavior information and speech information, in order to address the problems of implicit feedback and long-tail distribution, which existing solutions do not adequately consider, and to improve on the recommendation performance of existing solutions.
[0008] This invention provides a software application recommendation method based on user behavior and speech information. The method establishes a conceptual model that fuses behavior information and speech information, fully integrating heterogeneous user behavior and speech, and constructs a knowledge graph representing the relationships between users and applications to better characterize both. A metagraph is then designed based on the conceptual model, connecting the concepts within the model. Next, random walks guided by the metagraph are performed on the nodes of the knowledge graph, producing sequences that better represent the nodes. Multiple walks starting from a node yield multiple meaningful sequences, from which the node's neighborhood is obtained. Embedding the walk results yields a vector for each node that effectively represents the node's information. Finally, similar nodes are recommended by calculating the similarity between these vectors. This method is suitable for application recommendation processes that simultaneously consider user behavior and speech information.
[0009] This invention includes the following steps:
[0010] Step 1, preprocessing: preprocess the user's behavior information and speech information; the processed results are used in Step 2.
[0011] For user behavior information, this invention directly extracts the user's application usage time and the user's forum level within the application's forum. For user speech information, this invention uses natural language processing (NLP) to extract the user's opinions from the user's reviews and posts. A user opinion is defined as the user's sentiment toward a specific aspect of the application and is represented as an "aspect-sentiment" pair. The input text is processed using standard language processing procedures, and NLP-based methods are used to extract the opinions the user expresses.
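As an illustrative sketch only, the following Python fragment shows how an "aspect-sentiment" pair could be formed once a sentence has been tokenized, POS-tagged, and given a sentence-level sentiment, following the rule stated in the detailed implementation (nouns tagged NN combined with the sentence sentiment). The function name `aspect_sentiment_pairs`, the input shapes, and the sample sentence are assumptions made for illustration; no NLP toolkit is invoked here, and its output is represented by hand-written data.

```python
def aspect_sentiment_pairs(tagged_tokens, sentence_sentiment):
    """Pair every noun (POS tag "NN") with the sentiment of its sentence.

    tagged_tokens: list of (word, pos_tag) tuples for one sentence
                   (hypothetical shape, standing in for tagger output).
    sentence_sentiment: sentiment label of the whole sentence.
    """
    return [(word, sentence_sentiment)
            for word, tag in tagged_tokens
            if tag == "NN"]


# Hand-written example: "界面 很 流畅" ("the interface is very smooth").
tagged = [("界面", "NN"), ("很", "AD"), ("流畅", "VA")]
pairs = aspect_sentiment_pairs(tagged, "Positive")
print(pairs)
```

Each sentence may yield zero, one, or several pairs; a sentence without any noun contributes no opinion under this rule.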
[0012] Step 2: Establish a conceptual model. Combine user behavior and speech information to create a conceptual model, and then proceed to Step 3 based on this model.
[0013] To integrate user application usage with user reviews and posts, this invention uses a conceptual model to fuse information of different forms and contents. The model considers five core concepts (mobile application, user, application forum, review, and post) and seven core relationships between them: user uses application, user uses forum, user publishes review, user publishes post, application owns forum, application owns review, and forum owns post. Conceptual modeling associates these heterogeneous, multi-dimensional pieces of information and helps organize the relationships among the data. In Step 2, the conceptual model is thus established by fusing user speech information and behavior information.
[0014] Step 3, metagraph design: based on the concepts of mobile application, user, application forum, review, and post in the conceptual model and the relationships between them, design metagraphs that meaningfully connect these concepts and relationships. The designed metagraphs are used in Step 4.
[0015] Based on the concepts of mobile application, user, application forum, review, and post in the conceptual model, and the relationships between them, this invention designs metagraphs that capture frequently occurring, meaningful subgraphs expressing the connections between different users and applications. Six metagraphs are designed. Metagraphs M1 and M2 consider user behavior information: M1 characterizes the duration of a user's application use, and M2 characterizes the application forum level that reflects the user's browsing and posting. Metagraphs M3 and M4 consider the effect of different users expressing similar opinions on the same application, while metagraphs M5 and M6 consider the effect of different users posting similar replies to the same content. These four metagraphs (M3 to M6) all consider user speech information, characterizing speech through user posts and replies respectively. In this way, user behavior and speech are considered together, and the resulting metagraphs effectively characterize user features.
[0016] Step 4, random walk: perform random walks on the conceptual model, guided by the designed metagraphs. The resulting node sequences serve as the input to Step 5.
[0017] Through the conceptual model, user behavior information and speech information, as well as the relationships between users and mobile applications, are linked together to form a knowledge graph. Random walks guided by the metagraphs are then performed on this knowledge graph. The metagraphs restrict the types of nodes that a walk may visit and apply judgment conditions to the visited nodes. The resulting node sequences better represent the meaningful neighborhood of each node and characterize node information more comprehensively.
[0018] Step 5 involves embedding and scoring the walk results. The node sequence obtained from the random walk is embedded to obtain a vectorized representation of each node, and a score is assigned based on these vectors. The scoring results will serve as input for step 6.
[0019] This invention employs a heterogeneous Skip-gram model to learn the embedding representations of nodes in a heterogeneous knowledge graph G(V,E). This model can effectively learn node embedding representations in heterogeneous networks. Given the node embedding vectors, a user's preference for a mobile application is scored by vector similarity, measured as the cosine of the angle between the two vectors; this compares the directions of the vectors and is normalized by their magnitudes. Calculating this similarity between a user vector and a mobile application vector gives the similarity between that user and that application. For each user, scores score1, score2, score3, score4, score5, and score6 are calculated for each candidate mobile application based on metagraphs M1, M2, M3, M4, M5, and M6, respectively. The user's overall score for each mobile application is then calculated, the candidate mobile applications are sorted by overall score, and the top k mobile applications with the highest scores are selected as the recommendation results.
[0020] Step 6: Output mobile app recommendations. The system will output a list of recommended mobile apps for different users.
[0021] The beneficial effects of this invention are:
[0022] First, this invention integrates user behavior information and speech information, taking into account both the explicit and the implicit characteristics of users and mobile applications in mobile application recommendation. Compared with other mobile application recommendation methods, it characterizes users and mobile applications in greater detail and represents both more clearly.
[0023] Second, this invention takes users' social information into account and incorporates multi-dimensional information, so it better addresses the cold start problem encountered in mobile application recommendation.
Attached Figure Description
[0024] Figure 1 is the overall flowchart of the present invention.
[0025] Figure 2 is a flowchart of the preprocessing step of the present invention.
[0026] Figure 3 is a schematic diagram of the conceptual model design of the present invention.
[0027] Figure 4 is a schematic diagram of the metagraph design of the present invention.
[0028] Figure 5 is a diagram of the random walk process of the present invention.
[0029] Figure 6 is a flowchart of the embedding and scoring of the walk results of the present invention.
Detailed Implementation
[0030] A brief overview of the invention is given below to provide a basic understanding of certain aspects of it. It should be understood that this overview is not an exhaustive summary of the invention. It is not intended to identify key or essential parts of the invention, nor is it intended to limit the scope of the invention. Its purpose is merely to present certain concepts in a simplified form as a prelude to the more detailed description that follows.
[0031] As shown in Figure 1, this mobile application recommendation method includes the following steps: preprocessing, conceptual model building, metagraph design, random walk, embedding and scoring of walk results, and output of mobile application recommendations. The specific details are described below:
[0032] Step 1, preprocessing: preprocess the user's behavior information and speech information; the processed results are used in Step 2.
[0033] Step 2, establish a conceptual model: combine user behavior information and speech information to create a conceptual model, and then proceed to Step 3 based on this model.
[0034] Step 3, metagraph design: based on the concepts of mobile application, user, application forum, review, and post in the conceptual model and the relationships between them, design metagraphs that meaningfully connect these concepts and relationships. The designed metagraphs are used in Step 4.
[0035] Step 4, random walk: perform random walks on the conceptual model, guided by the designed metagraphs. The resulting node sequences serve as the input to Step 5.
[0036] Step 5 involves embedding and scoring the walk results. The node sequence obtained from the random walk is embedded to obtain a vectorized representation of each node, and a score is assigned based on these vectors. The scoring results will serve as input for step 6.
[0037] Step 6: Output mobile application recommendations.
[0038] In Step 1 above, natural language processing is used to extract the user opinions contained in the text that users publish. The overall flow is shown in Figure 2. After removing HTML structure information from the raw text obtained by the crawler, this invention applies standard language processing procedures using the natural language processing toolkit CoreNLP. CoreNLP lets users derive linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. This invention uses the tokenize, pos, and sentiment annotators to segment the Chinese text and obtain part-of-speech (POS) tags; nouns tagged NN are combined with the sentence-level sentiment to form "aspect-sentiment" pairs, which are taken as user opinions. The data are then filtered to avoid excessive sparsity. Mobile applications that a user has used for less than 120 minutes, and mobile application forums in which the user's forum level is below level 2, are removed for that user, because usage time or frequency this low is not regarded as genuine use. After this removal, users left with fewer than 10 mobile applications and mobile application forums in total are removed, to avoid cold start and excessive data sparsity. For reviews and posts, considering their informational and social value, this invention removes those that contain no opinions, as well as those that have no replies or likes.
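The behavior-data filtering described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the three thresholds come from the text, while the function name `filter_behavior` and the dictionary-based input format are assumptions chosen for the example.

```python
from collections import defaultdict

MIN_USAGE_MINUTES = 120   # apps used for less time are dropped (from the text)
MIN_FORUM_LEVEL = 2       # forums with a lower user level are dropped (from the text)
MIN_ITEMS_PER_USER = 10   # users with fewer apps + forums are dropped (from the text)


def filter_behavior(app_usage, forum_levels):
    """Filter sparse behavior records.

    app_usage:    {(user, app): minutes_used}      (hypothetical input format)
    forum_levels: {(user, forum): forum_level}     (hypothetical input format)
    Returns the filtered (app_usage, forum_levels) dictionaries.
    """
    # Pass 1: drop individual user-app and user-forum records below threshold.
    apps = {k: m for k, m in app_usage.items() if m >= MIN_USAGE_MINUTES}
    forums = {k: lv for k, lv in forum_levels.items() if lv >= MIN_FORUM_LEVEL}

    # Pass 2: count remaining apps and forums per user, then drop sparse users.
    counts = defaultdict(int)
    for user, _item in list(apps) + list(forums):
        counts[user] += 1
    active = {u for u, n in counts.items() if n >= MIN_ITEMS_PER_USER}

    apps = {k: m for k, m in apps.items() if k[0] in active}
    forums = {k: lv for k, lv in forums.items() if k[0] in active}
    return apps, forums
```

The two passes mirror the order given in the text: record-level removal first, then user-level removal based on what remains.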
[0039] The conceptual model established in Step 2 above is shown in Figure 3. After the preprocessing step, the conceptual model is established. To combine mobile application usage with the social information that users publish, this invention uses a conceptual model to integrate information of different forms and contents. The model considers five core concepts (mobile application, user, mobile application forum, review, and post) and seven core relationships between them: user uses mobile application, user uses mobile application forum, user publishes review, user publishes post, mobile application owns forum, mobile application owns review, and forum owns post. Through the conceptual model, these different kinds of information can be linked, which helps organize the relationships among the data. The conceptual model includes commonly used multidimensional information, namely users, mobile applications, mobile application descriptions, and user reviews. To account for users' social attributes, it also adds user social information, which mainly consists of information from the mobile application forum, including post content, forum level, review replies, and post replies. Specifically, for the task of analyzing users' needs and preferences regarding mobile applications, the concepts of user and mobile application are indispensable. Reviews, which are users' interactions with mobile applications, both reflect a user's preferences and influence the needs and preferences of other users; therefore, the concept of review needs to be defined. Furthermore, this invention incorporates users' social information to strengthen the analysis of user needs and preferences. In this regard, mobile application forums, as communities where users can express their opinions, become a concept in the conceptual model. Posts, which are users' interactions with mobile application forums, likewise reflect user preferences and influence the needs and preferences of other users; therefore, the concept of post also needs to be defined. Based on these five core concepts, this invention derives other related concepts and the relationships between the core concepts.
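To make the five concepts and seven relationships concrete, the sketch below stores them as a small typed graph. It is only one possible encoding: the class name `KnowledgeGraph`, the relation labels `uses`, `publishes`, and `owns`, and the undirected adjacency storage are assumptions for illustration and are not specified in the text.

```python
from collections import defaultdict

# The five core concepts of the conceptual model.
NODE_TYPES = {"user", "app", "forum", "review", "post"}

# The seven core relationships, as (source type, relation, target type).
RELATIONS = {
    ("user", "uses", "app"),
    ("user", "uses", "forum"),
    ("user", "publishes", "review"),
    ("user", "publishes", "post"),
    ("app", "owns", "forum"),
    ("app", "owns", "review"),
    ("forum", "owns", "post"),
}


class KnowledgeGraph:
    """Heterogeneous graph whose edges must match one of the seven relations."""

    def __init__(self):
        self.node_type = {}                 # node id -> concept type
        self.neighbors = defaultdict(set)   # node id -> adjacent node ids

    def add_node(self, node, ntype):
        if ntype not in NODE_TYPES:
            raise ValueError("unknown node type: %r" % ntype)
        self.node_type[node] = ntype

    def add_edge(self, src, relation, dst):
        key = (self.node_type[src], relation, self.node_type[dst])
        if key not in RELATIONS:
            raise ValueError("relation not in the conceptual model: %r" % (key,))
        # Stored in both directions so that walks can traverse either way.
        self.neighbors[src].add(dst)
        self.neighbors[dst].add(src)

    def neighbors_of_type(self, node, ntype):
        return sorted(n for n in self.neighbors[node]
                      if self.node_type[n] == ntype)
```

Rejecting edges that fall outside the seven relationships keeps the graph consistent with the conceptual model as data from different sources are merged.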
[0040] Based on the conceptual model, this invention designs metagraphs that capture frequently occurring, meaningful subgraphs representing the connections between different users. Figure 4 illustrates the six metagraphs designed in Step 3 above. Metagraph M1 considers the time users spend using a mobile application: it represents two users who have both used the same mobile application for an extended period, and captures the similarity of needs and preferences indicated by usage duration. Metagraph M2 considers users' levels in a mobile application forum: it represents two users with similar levels in the same mobile application forum, and captures the similarity of needs and preferences indicated by the amount of browsing and interaction in the forum. Metagraphs M3 and M4 consider the opinions of different users on the same mobile application. M3 represents two users who have published reviews with similar opinions on the same mobile application, and M4 represents two users who have published posts with similar opinions in the same mobile application forum. Here, expressing an opinion covers both publishing a review or post that states the viewpoint and expressing agreement with another user's viewpoint, which is treated as expressing that viewpoint. These two metagraphs capture the similarity of needs and preferences implicit in users' reviews and posts. Metagraphs M5 and M6 consider different users' expressions of agreement with the same published content. M5 represents two users agreeing with the same review, and M6 represents two users agreeing with the same post. These two metagraphs capture the similarity of needs and preferences implicit in the opinions users express through their interactions.
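As a hedged illustration of what an instance of metagraph M1 looks like in data, the sketch below enumerates user-application-user triples in which both users used the same application for a long time. The text does not state what counts as an "extended period"; reusing the 120-minute preprocessing threshold as `min_minutes` is an assumption, as are the function name `m1_instances` and the input format.

```python
from collections import defaultdict
from itertools import combinations


def m1_instances(app_usage, min_minutes=120):
    """List (user, app, user) triples matching the M1 pattern.

    app_usage: {(user, app): minutes_used}  (hypothetical input format)
    min_minutes: assumed cut-off for "extended" use.
    """
    users_by_app = defaultdict(set)
    for (user, app), minutes in app_usage.items():
        if minutes >= min_minutes:
            users_by_app[app].add(user)

    instances = []
    for app in sorted(users_by_app):
        # Every unordered pair of long-term users of the same app is one instance.
        for u, v in combinations(sorted(users_by_app[app]), 2):
            instances.append((u, app, v))
    return instances


usage = {("u1", "a1"): 300, ("u2", "a1"): 150, ("u3", "a1"): 30, ("u1", "a2"): 200}
print(m1_instances(usage))  # [('u1', 'a1', 'u2')]
```

The other metagraphs could be enumerated analogously by routing through forums, reviews, or posts, with the similarity conditions on levels or opinions applied as filters.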
[0041] The metagraph-guided random walk of Step 4 above is shown in Figure 5. Here, f(v) denotes any node that conforms to the node type specified by the metagraph and is reachable from the current node, and τ(v) denotes the condition that node v must satisfy when a judgment is required. In this algorithm, each node in the node set V serves as a starting node, and the process is repeated n times. Lines 4 to 12 of the algorithm describe a single walk. A single walk starts from the current node and moves at random to one of its neighboring nodes whose type matches the node type specified in the metagraph. If the current node has no neighboring node of the specified type, the current walk terminates and the next walk begins. A node is appended to the path unless a judgment is required and the node fails it. After appending, the current node is updated to the new node. In this process, user behavior information, speech information, and mobile application information are associated through the conceptual model to form a graph network. On this graph network, metagraph-guided random walks simulate the semantic neighborhood of a node, which expresses its mobile application preferences. Through these walks, the graph network that integrates user and mobile application information is transformed into a series of meaningful node sequences.
[0042] The embedding and scoring of the walk results in Step 5 above is shown in Figure 6. The node sequences obtained from the walks are embedded using a heterogeneous Skip-gram model to obtain their embedding vectors. These vectors summarize the information captured by the metagraphs and facilitate subsequent calculations. The similarity between embedding vectors is then calculated as the cosine of the angle between them to find similar nodes, and a score is assigned based on the similarity between the user and the mobile application.
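The walk procedure described in paragraph [0041] can be sketched roughly as follows. Several simplifications are assumptions of this sketch rather than statements from the text: the metagraph is reduced to a cyclic sequence of node types (`type_pattern`, for example user, application, user, application, and so on), a start node whose type does not match the first type is skipped, and the judgment τ(v) is applied as a filter on candidate nodes through the optional `judge` callback. All function and parameter names are hypothetical.

```python
import random


def guided_walk(neighbors, node_type, start, type_pattern, walk_length,
                judge=None, rng=random):
    """One random walk from `start` that follows the cyclic type pattern."""
    if node_type[start] != type_pattern[0]:
        return []
    path, current = [start], start
    step = 0
    while len(path) < walk_length:
        step += 1
        wanted = type_pattern[step % len(type_pattern)]
        # Candidates: neighbors of the required type that pass the judgment.
        candidates = [n for n in neighbors.get(current, ())
                      if node_type[n] == wanted
                      and (judge is None or judge(n))]
        if not candidates:
            break  # no admissible neighbor: this walk terminates
        current = rng.choice(candidates)
        path.append(current)
    return path


def generate_walks(neighbors, node_type, type_pattern, walks_per_node,
                   walk_length, judge=None, seed=0):
    """Start `walks_per_node` walks from every node; keep walks with >= 2 nodes."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(node_type):
        for _ in range(walks_per_node):
            walk = guided_walk(neighbors, node_type, start, type_pattern,
                               walk_length, judge, rng)
            if len(walk) > 1:
                walks.append(walk)
    return walks


# Tiny hand-made graph: two users and two applications.
neighbors = {"u1": ["a1"], "u2": ["a1", "a2"], "a1": ["u1", "u2"], "a2": ["u2"]}
node_type = {"u1": "user", "u2": "user", "a1": "app", "a2": "app"}
walks = generate_walks(neighbors, node_type, ("user", "app"),
                       walks_per_node=3, walk_length=5)
```

Each returned walk is a list of node identifiers that can be treated as a "sentence" in the subsequent embedding step.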
[0043] This invention employs a heterogeneous Skip-gram model to learn the embedding representations of nodes in a heterogeneous knowledge graph G(V,E). Within a window of size w, the probability of occurrence of the context nodes of node v is maximized, and the node embedding function $\Phi$ is learned through Skip-gram:
[0044] $$\max_{\Phi} \Pr\left(C(v) \mid \Phi(v)\right) \tag{1}$$
[0045] where $C(v)$ is the set of context nodes of node v, which can be obtained by random walks guided by the metagraph. $\Pr(C(v) \mid \Phi(v))$ is the product of all $\Pr(c \mid \Phi(v))$, where c ranges over all nodes in $C(v)$. $\Pr(c \mid \Phi(v))$ depends on the type of node c and is modeled using softmax. Negative sampling is used to accelerate training. In this design, the window size w is 5. This model can effectively learn the embedding representations of nodes in heterogeneous networks.
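To illustrate the window and the type-dependent softmax mentioned above, the following sketch extracts (center, context) pairs from one walk and evaluates a softmax that is normalized only over nodes of the same type as the context node. The same-type normalization is one common reading of "depends on the type of the node" and is an assumption here, as are the function names and the toy vectors; training itself (including negative sampling) is not shown.

```python
import math


def context_pairs(walk, window=5):
    """All (center, context) pairs within `window` positions in one walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


def type_softmax(center_vec, context, context_vecs, node_type):
    """Pr(context | center), normalized over nodes sharing the context's type."""
    same_type = [n for n in context_vecs
                 if node_type[n] == node_type[context]]
    scores = {n: math.exp(sum(a * b for a, b in zip(context_vecs[n], center_vec)))
              for n in same_type}
    return scores[context] / sum(scores.values())


print(context_pairs(["u1", "a1", "u2"], window=1))
# [('u1', 'a1'), ('a1', 'u1'), ('a1', 'u2'), ('u2', 'a1')]
```

With the window size of 5 stated in the text, each node in a walk is paired with up to ten surrounding nodes.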
[0046] For the node embedding vectors, the user's mobile application preferences are scored by calculating vector similarity. For a user u who is to receive recommendations, the score of user u for each application a is obtained according to formula (2).
[0047] $$\mathrm{score}(u, a) = \frac{\mathbf{x}_u \cdot \mathbf{x}_a}{\lVert \mathbf{x}_u \rVert \, \lVert \mathbf{x}_a \rVert} \tag{2}$$
[0048] where $\mathbf{x}_u$ is the vector of user node u and $\mathbf{x}_a$ is the vector of application node a.
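A minimal sketch of the cosine-based score between a user vector and an application vector is given below. The function name `cosine_score` and the convention of returning 0.0 for a zero-length vector are assumptions of this example.

```python
import math


def cosine_score(user_vec, app_vec):
    """Cosine of the angle between a user vector and an application vector."""
    dot = sum(a * b for a, b in zip(user_vec, app_vec))
    norm = (math.sqrt(sum(a * a for a in user_vec))
            * math.sqrt(sum(b * b for b in app_vec)))
    # Assumed convention: an all-zero vector has no direction, so score 0.0.
    return dot / norm if norm else 0.0


print(cosine_score([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_score([1.0, 0.0], [0.0, 3.0]))  # 0.0
```

Because the score is normalized by both vector lengths, scaling either vector leaves the result unchanged; only the angle between the vectors matters.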
[0049] For each user, we calculate scores (score1, score2, score3, score4, score5, score6) for each mobile application based on metagraphs M1, M2, M3, M4, M5, and M6 respectively. Then, the user's overall score for the mobile application is:
[0050]
[0051] We then sort all the candidate mobile applications by their overall scores and select the top k mobile applications with the highest scores as the application recommendations.
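The ranking step can be sketched as below. The formula that combines the six per-metagraph scores is not legible in this text, so the weighted sum used here (with equal weights by default) is purely an assumption for illustration, as are the function name `recommend_top_k` and the input format; ties are broken alphabetically only to make the output deterministic.

```python
def recommend_top_k(scores_by_metagraph, k, weights=None):
    """Combine per-metagraph scores and return the k best applications.

    scores_by_metagraph: list of dicts {app: score}, one per metagraph
                         (hypothetical input format).
    weights: optional list of per-metagraph weights (assumed aggregation).
    """
    if weights is None:
        weights = [1.0] * len(scores_by_metagraph)

    overall = {}
    for weight, scores in zip(weights, scores_by_metagraph):
        for app, score in scores.items():
            overall[app] = overall.get(app, 0.0) + weight * score

    # Highest overall score first; application id breaks ties.
    ranked = sorted(overall.items(), key=lambda item: (-item[1], item[0]))
    return [app for app, _score in ranked[:k]]


scores = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.3}, {"c": 0.5}]
print(recommend_top_k(scores, k=2))  # ['a', 'c']
```

In a full pipeline there would be six score dictionaries per user, one for each of metagraphs M1 to M6.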
[0052] Finally, the mobile application recommendation results are output. Scoring the mobile applications and sorting the scores in descending order yields a personalized list of recommended mobile applications for the user; this list is the final output.
Claims
1. A software application recommendation method based on user behavior information and speech information, characterized in that it includes the following steps: Step 1, preprocessing step: preprocess the user's behavior information and speech information, and use the processed results in Step 2; for user behavior information, the duration of the user's use of each application and the user's forum level in the application forum are extracted directly; for user reviews and posts, natural language processing is used to extract user opinions; a user opinion is defined as the user's sentiment toward a certain aspect of the application and is represented as an "aspect-sentiment" pair; the input user reviews are processed using standard language processing procedures and natural-language-processing-based methods to extract the opinions the user expresses. Step 2, establish a conceptual model: combine user behavior information and speech information through conceptual modeling, and proceed to Step 3 based on the conceptual model; to combine user application usage with user reviews and posts, a conceptual model is used to integrate information of different forms and contents, considering five core concepts (mobile application, user, application forum, review, and post) and seven core relationships between them: user uses application, user uses forum, user publishes review, user publishes post, application owns forum, application owns review, and forum owns post; conceptual modeling links these heterogeneous, multi-dimensional pieces of information; in Step 2, the conceptual model is established by integrating user speech information and behavior information. 
Step 3, metagraph design step: based on the concepts of mobile application, user, application forum, review, and post in the conceptual model and the relationships between them, design metagraphs that connect these concepts and relationships in a meaningful way; the designed metagraphs are used in Step 4; based on these concepts and relationships, metagraphs are designed to capture frequently occurring, meaningful subgraphs that express the connections between different users and applications; six metagraphs are designed: metagraphs M1 and M2 consider user behavior information, characterizing the duration of the user's application use and the application forum level that reflects the user's browsing and posting; metagraphs M3 and M4 consider the effect of different users expressing similar opinions on the same application; metagraphs M5 and M6 consider the effect of different users posting similar replies to the same post; these four metagraphs (M3 to M6) all consider user speech information, characterizing speech through user posts and replies respectively; through this step, metagraphs that characterize user features are designed by considering both user behavior and speech. Step 4, random walk step: use the designed metagraphs to guide random walks on the conceptual model; the resulting node sequences are used as the input to Step 5; through the conceptual model, user behavior information and speech information, as well as the relationships between users and mobile applications, are linked together to form a knowledge graph; on the knowledge graph, random walks are guided by the metagraphs, the types of nodes visited are restricted, and the visited nodes are subjected to judgment conditions; the resulting node sequences express the meaningful neighborhood of each node and characterize the node's information. 
Step 5, walk result embedding and scoring step: embed the node sequences obtained by the random walks to obtain a vectorized representation of each node, and score based on the node vectors; the scoring results are used as the input to Step 6; a heterogeneous Skip-gram model is employed to learn the embedding representations of nodes in a heterogeneous knowledge graph G(V,E); for the node embedding vectors, user mobile application preferences are scored by calculating vector similarity; the cosine similarity between vectors, which compares their directions and is normalized by their magnitudes, is calculated to determine the similarity between vectors; the similarity between users and mobile applications is obtained by calculating this vector similarity; for each user, scores score1, score2, score3, score4, score5, and score6 are calculated based on metagraphs M1, M2, M3, M4, M5, and M6, respectively; the user's overall score for each mobile application is then calculated, the candidate mobile applications are sorted by overall score, and the top k mobile applications with the highest scores are selected as the application recommendations. Step 6, output mobile application recommendations: output a list of recommended mobile applications for each user.
2. The software application recommendation method based on user behavior information and speech information according to claim 1, characterized in that a heterogeneous Skip-gram model is employed to learn the embedding representations of nodes in a heterogeneous knowledge graph G(V,E); within a window of size w, the probability of occurrence of the context nodes of node v is maximized, and the node embedding function $\Phi$ is learned through Skip-gram: $\max_{\Phi} \Pr(C(v) \mid \Phi(v))$; where $C(v)$ is the set of context nodes of node v, obtained by random walks guided by the metagraph; $\Pr(C(v) \mid \Phi(v))$ is the product of all $\Pr(c \mid \Phi(v))$, where c ranges over all nodes in $C(v)$; $\Pr(c \mid \Phi(v))$ depends on the type of node c and is modeled using softmax; negative sampling is used to speed up training; the window size w is 5, which effectively learns the embedding representations of nodes in heterogeneous networks.
3. The software application recommendation method based on user behavior information and speech information according to claim 1, characterized in that, for the node embedding vectors, the user's mobile application preferences are scored by calculating vector similarity; for a user u who is to receive recommendations, the score of user u for each application a is obtained as $\mathrm{score}(u, a) = \frac{\mathbf{x}_u \cdot \mathbf{x}_a}{\lVert \mathbf{x}_u \rVert \, \lVert \mathbf{x}_a \rVert}$; where $\mathbf{x}_u$ is the vector of user node u and $\mathbf{x}_a$ is the vector of application node a; for each user, scores score1, score2, score3, score4, score5, and score6 for each mobile application are calculated based on metagraphs M1, M2, M3, M4, M5, and M6 respectively; then, the user's overall score for the mobile application is calculated as follows: ; then, the candidate mobile applications are sorted by overall score, and the top k mobile applications with the highest scores are selected as the application recommendation results; finally, the recommended mobile applications are output: the scores of the mobile applications are sorted in descending order to obtain a personalized list of mobile applications recommended to the user, and this list is the final output recommendation result.