Community question and answer recommendation method and device, equipment and storage medium

By computing embedding values for question-and-answer content and for users and combining them with a factorization machine model, the method addresses the cold-start and data-sparsity problems, enabling more accurate community question-and-answer recommendations, improving the user experience, and providing incentives for content producers.

CN116226343B · Active · Publication Date: 2026-06-26 · PING AN TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PING AN TECH (SHENZHEN) CO LTD
Filing Date
2023-02-16
Publication Date
2026-06-26

  • Figure CN116226343B_ABST

Abstract

The application discloses a community question-and-answer recommendation method comprising: obtaining a question-and-answer content recommendation candidate set, which includes basic feature data of the question-and-answer content; obtaining user information of a user, which includes the user's personal information data and historical behavior data; obtaining an embedding value for each question-and-answer content in the candidate set from the basic feature data, the personal information data, and the historical behavior data; calculating an embedding value for the user from the embedding values of the question-and-answer content and the historical behavior data; calculating the similarity between the embedding value of each question-and-answer content and the embedding value of the user; generating an initial question-and-answer content recommendation set according to the similarity; and inputting the basic feature data, the personal information data, and the historical behavior data into a factorization machine model to rank the question-and-answer content in the initial recommendation set by relevance, thereby obtaining the question-and-answer content recommendation set.

Description

Technical Field

[0001] This invention relates to the field of deep learning, and more specifically to a community question-and-answer recommendation method, apparatus, device, and storage medium.

Background Technology

[0002] Recommendation systems are an important application of modern machine learning algorithms and play a key role in scenarios such as advertising and community recommendation. In community recommendation, recommendation algorithms can improve user engagement and content conversion rates. For users, a recommendation system can more accurately surface content they are likely to be interested in. For community content creators, a good recommendation system brings more interested users to their content, further incentivizing creation. In UGC (user-generated content) community applications such as TikTok and Toutiao, the recommendation system serves as a bridge between content creators and consumers and is therefore extremely important.

[0003] Traditional recommendation systems have the following problems: (1) For cold-start users and items without any behavioral records, it is difficult to predict interests because relevant behavioral data is lacking. (2) Data sparsity: when the data is sufficiently sparse, the overlap between the items viewed by any two users is small, and it is difficult to distinguish between synonyms. (3) The models lack generalization ability, which makes it difficult for them to generalize to the complex data of a production environment.

Summary of the Invention

[0004] The main objective of this invention is to provide a community question-and-answer recommendation method, apparatus, device, and storage medium, aiming to solve the technical problems of data sparsity and generalization.

[0005] The present invention discloses the following technical solutions:

[0006] A community question-and-answer recommendation method includes:

[0007] Obtain a candidate set of question-and-answer content recommendations, wherein the candidate set of question-and-answer content recommendations includes basic feature data of question-and-answer content;

[0008] Obtain user information of the user, wherein the user information includes the user's personal information data and historical behavior data;

[0009] Based on the basic feature data, the personal information data, and the historical behavior data, the embedding value of each question and answer content in the question and answer content recommendation candidate set is obtained;

[0010] The user's embedding value is calculated based on the embedding values of the question-and-answer content and the historical behavior data;

[0011] Calculate the similarity between the embedding value of the question and answer content and the embedding value of the user;

[0012] An initial set of question-and-answer content recommendations is generated based on similarity.

[0013] The basic feature data, the personal information data, and the historical behavior data are input into the factorization machine model to sort the question and answer content contained in the initial question and answer content recommendation set by relevance, thereby obtaining the question and answer content recommendation set.

[0014] Furthermore, the step of obtaining the candidate set of question-answer content recommendations includes:

[0015] Question and answer content that meets preset recommendation criteria is filtered from the database to generate a candidate set of recommended question and answer content, wherein the database contains all question and answer content within the community.

[0016] Furthermore, prior to the step of obtaining the candidate set of question-answer content recommendations, the following steps are included:

[0017] Determine if a user is an active user;

[0018] If not, then popular Q&A content will be recommended to the user.

[0019] Further, the step of obtaining the embedding value of each question and answer content in the question and answer content recommendation candidate set based on the basic feature data, the personal information data, and the historical behavior data includes:

[0020] The basic feature data, the personal information data, and the historical behavior data are encoded to obtain the embedding value of each feature in each question and answer content;

[0021] Classify multiple features in each of the aforementioned question-and-answer contents;

[0022] The embedding values of features within the same category are summed to obtain the embedding value for each category;

[0023] The embedding value of the question-and-answer content is generated based on the embedding values of the categories.

[0024] Further, the step of calculating the user's embedding value based on the embedding value of the question-and-answer content and the historical behavior data includes:

[0025] Based on the historical behavioral data, behavioral funnel data is obtained;

[0026] Calculate the weights of different historical behavioral data based on the behavioral funnel data;

[0027] Based on the weights, the embedding values of the question-and-answer content are weighted and summed to obtain the user's embedding value.

[0028] Further, the step of inputting the basic feature data, the personal information data, and the historical behavior data into the factorization machine model to rank the question-and-answer content contained in the initial question-and-answer content recommendation set according to their relevance, to obtain the question-and-answer content recommendation set, includes:

[0029] The basic feature data, the personal information data, and the historical behavior data are used to generate one or more feature factor groups for the user;

[0030] The feature factor groups are input into the factorization machine model to obtain the decomposition matrix;

[0031] Based on the decomposition matrix, the cross term coefficients of each of the feature factor groups are calculated;

[0032] Based on the cross term coefficients, the relevance value of each of the feature factor groups is calculated;

[0033] Based on the magnitude of the relevance values, the question and answer content corresponding to the feature factor group is sorted to obtain a question and answer content recommendation set.

[0034] The present invention also provides a community question and answer recommendation device, comprising:

[0035] The question-and-answer content recommendation candidate set acquisition module is used to acquire the question-and-answer content recommendation candidate set, wherein the question-and-answer content recommendation candidate set includes basic feature data of question-and-answer content;

[0036] The user information acquisition module is used to acquire user information, wherein the user information includes the user's personal information data and historical behavior data.

[0037] The first calculation module is used to obtain the embedding value of each question and answer content in the question and answer content recommendation candidate set based on the basic feature data, the personal information data, and the historical behavior data.

[0038] The second calculation module is used to calculate the user's embedding value based on the embedding values of the question and answer content and the historical behavior data;

[0039] The similarity calculation module is used to calculate the similarity between the embedding value of the question and answer content and the embedding value of the user;

[0040] The initial question-and-answer content recommendation set generation module is used to generate an initial question-and-answer content recommendation set based on similarity.

[0041] The question-and-answer content recommendation set generation module is used to input the basic feature data, the personal information data, and the historical behavior data into the factorization machine model, and sort the question-and-answer content contained in the initial question-and-answer content recommendation set according to their relevance to obtain the question-and-answer content recommendation set.

[0042] Furthermore, the question-and-answer content recommendation candidate set acquisition module includes:

[0043] The filtering unit is used to filter out question and answer content that meets the preset recommendation criteria from the database and generate a candidate set of recommended question and answer content, wherein the database contains all question and answer content in the community.

[0044] This application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of any of the methods described above.

[0045] This application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of any of the methods described above.

[0046] Beneficial effects:

[0047] This invention uses embedding values to mathematically represent each user and each question-and-answer content in the recommendation candidate set, and recalls items the user is likely to like to obtain an initial question-and-answer content recommendation set. This allows question-and-answer content with little or no historical activity to be recommended, increasing the exposure of less popular content, enriching the user's browsing experience, and attracting more interested users to content creators with low exposure, further incentivizing their creation. A factorization machine model then ranks the recalled question-and-answer content in the initial recommendation set according to the user's preferences, enabling more accurate prediction of those preferences.

Attached Figure Description

[0048] Figure 1 is a flowchart of a community question-and-answer recommendation method according to an embodiment of the present invention;

[0049] Figure 2 is a schematic diagram of the process for obtaining the embedding value of each question-and-answer content in the question-and-answer content recommendation candidate set according to an embodiment of the present invention;

[0050] Figure 3 is a schematic diagram of the process for obtaining a question-and-answer content recommendation set according to an embodiment of the present invention;

[0051] Figure 4 is a schematic block diagram of a community question-and-answer recommendation device according to an embodiment of the present invention;

[0052] Figure 5 is a schematic block diagram of the structure of a computer device according to an embodiment of the present invention.

[0053] The achievement of the objectives, the functional features, and the advantages of the present invention are further explained with reference to the embodiments and the accompanying drawings.

Detailed Implementation

[0054] It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

[0055] Referring to Figure 1, the present invention provides an embodiment of a community question-and-answer recommendation method, comprising:

[0056] S1: Obtain a candidate set of question-and-answer content recommendations, wherein the candidate set of question-and-answer content recommendations includes basic feature data of question-and-answer content;

[0057] S2: Obtain the user's user information, wherein the user information includes the user's personal information data and historical behavior data;

[0058] S3: Based on the basic feature data, the personal information data, and the historical behavior data, obtain the embedding value of each question and answer content in the question and answer content recommendation candidate set;

[0059] S4: Calculate the user's embedding value based on the embedding values of the question and answer content and the historical behavior data;

[0060] S5: Calculate the similarity between the embedding value of the question and answer content and the embedding value of the user;

[0061] S6: Generate an initial set of question-and-answer content recommendations based on similarity;

[0062] S7: Input the basic feature data, the personal information data, and the historical behavior data into the factorization machine model, and sort the question and answer content contained in the initial question and answer content recommendation set according to their relevance to obtain the question and answer content recommendation set.

[0063] In the above embodiments, the present invention uses embedding values to mathematically represent each user and each question-and-answer content in the question-and-answer content recommendation candidate set, and recalls items that the user is likely to like to obtain an initial question-and-answer content recommendation set. This allows question-and-answer content with little or no historical activity to be recommended, thereby increasing the exposure of low-profile question-and-answer content, enriching the user's browsing experience, and attracting more interested users to producers whose content has low exposure, further incentivizing their creation. A factorization machine model then ranks the recalled question-and-answer content in the initial recommendation set according to the user's preferences, enabling more accurate prediction of those preferences.

[0064] As described in step S1 above, a candidate set of question and answer content recommendations is obtained, wherein the candidate set of question and answer content recommendations includes basic feature data of question and answer content;

[0065] First, obtain a candidate set of recommended question and answer content. This candidate set can contain all the question and answer content in the community, or it can be generated by filtering all the question and answer content in the community.

[0066] Each question-and-answer content in the candidate set has basic feature data, which mainly includes tags, publication time, and word count. More specifically, tags can be categories such as sports and fitness, lifestyle, and workplace. Publication time is distinguished by year and month: for example, if a question and answer was published in September 2022, then "September 2022" is used as its time feature. Question-and-answer content can also be divided into several categories by word count; for example, content of 1 to 300 words is recorded as a short article. The basic feature data is then one-hot encoded, that is, represented as a binary vector. Continuous numerical features must first be discretized (bucketed) before one-hot encoding.
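The encoding described in [0066] can be sketched as follows. This is a minimal illustration, not the patented implementation: the tag categories and the "1 to 300 words is a short article" rule come from the text, while the medium/long bucket edges, the month vocabulary, and all function names are hypothetical.

```python
from datetime import date

# Tag categories and the "1-300 words = short article" rule come from the text;
# the medium/long bucket edges below are hypothetical.
TAGS = ["sports_fitness", "lifestyle", "workplace"]
LENGTH_BUCKETS = [(1, 300, "short"), (301, 1500, "medium"), (1501, None, "long")]


def length_bucket(word_count):
    """Discretize a continuous word count into a named bucket."""
    for low, high, name in LENGTH_BUCKETS:
        if word_count >= low and (high is None or word_count <= high):
            return name
    raise ValueError(f"word count out of range: {word_count}")


def one_hot(value, vocabulary):
    """Binary list with a 1 at the position of `value` (all zeros if unseen)."""
    return [1 if v == value else 0 for v in vocabulary]


def encode_basic_features(tag, published, word_count, month_vocab):
    """Concatenate one-hot blocks for tag, publication year-month, and length bucket."""
    year_month = f"{published.year}-{published.month:02d}"
    bucket_names = [name for _, _, name in LENGTH_BUCKETS]
    return (one_hot(tag, TAGS)
            + one_hot(year_month, month_vocab)
            + one_hot(length_bucket(word_count), bucket_names))


# A Q&A tagged "lifestyle", published September 2022, 240 words long.
month_vocab = ["2022-08", "2022-09", "2022-10"]
vec = encode_basic_features("lifestyle", date(2022, 9, 15), 240, month_vocab)
print(vec)  # [0, 1, 0, 0, 1, 0, 1, 0, 0]
```

Each feature contributes exactly one active position, so the concatenated vector has as many 1s as there are encoded features.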

[0067] As described in step S2 above, user information is obtained, wherein the user information includes the user's personal information data and historical behavior data;

[0068] Personal information data mainly refers to users' personal information, such as their education level, age, gender, and nationality. Historical behavior data mainly refers to data on users' historical behavior towards Q&A content within a certain period, such as the number and duration of clicks, views, favorites, and comments. Both personal information data and historical behavior data are then processed using a one-hot encoding method.

[0069] As described in steps S3 and S4 above, the embedding value of each question and answer content in the question and answer content recommendation candidate set is obtained based on the basic feature data, the personal information data, and the historical behavior data; the embedding value of the user is calculated based on the embedding value of the question and answer content and the historical behavior data.

[0070] An embedding value is obtained for each question-and-answer content. An embedding typically represents an object, such as a word, a product, or a movie, as a low-dimensional vector, with the property that objects with similar meanings have nearby vectors. For example, the embeddings of "Avengers" and "Iron Man" would be close, whereas the embeddings of "Avengers" and "Gone with the Wind" would be farther apart. The embedding value calculated in this embodiment combines multiple features of the question-and-answer content and therefore carries a large amount of information, so it can represent the overall characteristics of that content. From these embedding values, the relevance between a given question-and-answer content and a user can be computed numerically. Then, using a user-based collaborative filtering recall method, the user's embedding value is calculated from the embedding values of the question-and-answer content and the historical behavior data.

[0071] As described in steps S5 and S6 above, the similarity between the embedding value of the question-and-answer content and the embedding value of the user is calculated; an initial question-and-answer content recommendation set is generated based on the similarity.

[0072] The higher the similarity between the embedding value of the question and answer content and the user's embedding value, the stronger the relevance and the higher the user's liking for this question and answer content. Therefore, question and answer content with similarity exceeding a preset threshold can be combined into a recommendation set to generate the initial question and answer content recommendation set.
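Steps S5 and S6 can be sketched as below. The patent does not name the similarity measure or the threshold value; this sketch assumes cosine similarity and a hypothetical threshold of 0.8, and the function and item names are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def initial_recall(user_emb, item_embs, threshold=0.8):
    """Return (item_id, similarity) pairs above `threshold`, most similar first.

    The threshold value is a hypothetical placeholder for the preset threshold.
    """
    scored = [(item_id, cosine(user_emb, emb)) for item_id, emb in item_embs.items()]
    kept = [(item_id, s) for item_id, s in scored if s > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


user_emb = [1.0, 0.0, 1.0]
item_embs = {
    "qa_1": [1.0, 0.1, 0.9],  # close to the user
    "qa_2": [0.0, 1.0, 0.0],  # orthogonal to the user
    "qa_3": [1.0, 0.0, 0.0],  # cosine about 0.71, below the threshold
}
ranked_ids = [item_id for item_id, _ in initial_recall(user_emb, item_embs)]
print(ranked_ids)  # ['qa_1']
```

Sorting the kept items is not required by step S6, since the final order comes from the factorization machine in step S7; it is done here only to make the output easy to inspect.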

[0073] As described in step S7 above, the basic feature data, the personal information data, and the historical behavior data are input into the factorization machine model to sort the question and answer content contained in the initial question and answer content recommendation set by relevance, thereby obtaining the question and answer content recommendation set.

[0074] After the initial question-and-answer content recommendation set is obtained, the question-and-answer content it contains must be sorted, using the predicted relevance between each question-and-answer content and the user as the sorting criterion. The sorting is performed with a factorization machine model, yielding the question-and-answer content recommendation set.

[0075] In one embodiment, step S1 of obtaining the candidate set of question-and-answer content recommendations includes:

[0076] S101: Select question and answer content that meets the preset recommendation criteria from the database and generate a candidate set of question and answer content recommendations, wherein the database contains all question and answer content in the community.

[0077] In the above embodiment, the community contains a large amount of Q&A content. Directly predicting user preferences for all Q&A content would be computationally intensive and prone to errors. Therefore, it's necessary to pre-screen obviously unsuitable Q&A content. Q&A content is filtered based on preset recommendation criteria. These criteria could be based on whether the Q&A content was published too long ago (e.g., setting a time interval, such as filtering content published more than five years ago); or whether it contains prohibited content. The filtered Q&A content is then compiled into a candidate set for Q&A content recommendations.
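A sketch of the pre-screening in step S101, assuming the two example criteria above (a five-year publication window and a prohibited-content check). The `QAItem` record, its `prohibited` flag, and the function names are hypothetical; a real system would apply whatever preset criteria it defines.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class QAItem:
    item_id: str
    published: date
    prohibited: bool  # hypothetical flag produced by a separate content review


def within_years(published, today, years):
    """True if `published` is no earlier than the same calendar day `years` years ago."""
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:  # today is 29 Feb and the cutoff year is not a leap year
        cutoff = today.replace(year=today.year - years, day=28)
    return published >= cutoff


def build_candidate_set(items, today, max_age_years=5):
    """Keep items that are recent enough and not flagged as prohibited."""
    return [it for it in items
            if within_years(it.published, today, max_age_years) and not it.prohibited]


items = [
    QAItem("a", date(2022, 9, 1), prohibited=False),
    QAItem("b", date(2015, 1, 1), prohibited=False),  # older than five years
    QAItem("c", date(2023, 1, 1), prohibited=True),   # prohibited content
]
kept_ids = [it.item_id for it in build_candidate_set(items, date(2023, 2, 16))]
print(kept_ids)  # ['a']
```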

[0078] In one embodiment, before step S1 of obtaining the question-answer content recommendation candidate set, the following steps are included:

[0079] S111: Determine if the user is an active user;

[0080] S112: If not, then recommend popular Q&A content to the user;

[0081] In the above embodiment, user activity is determined based on metrics such as last online time and number of exposures. If a user is not active, their historical behavior regarding community Q&A content is unknown, making it impossible to predict their preferences. Therefore, popular Q&A content is directly recommended to them. Popular Q&A content is generated by intersecting filtered popular data with multiple user search logs. This effectively addresses the problem of inaccurate recommendations caused by insufficient accumulated behavioral data features from inactive users.
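The cold-start branch (S111, S112) might look like the following sketch. The 30-day idle limit, the 20-exposure minimum, and the reading of "intersecting filtered popular data with multiple user search logs" as "most-viewed items that also appear in at least two users' search logs" are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: the patent names the metrics (last online time,
# number of exposures) but not their values.
MAX_IDLE = timedelta(days=30)
MIN_EXPOSURES = 20


def is_active(last_online, exposures, now):
    """A user counts as active if seen recently and exposed often enough."""
    return now - last_online <= MAX_IDLE and exposures >= MIN_EXPOSURES


def popular_items(view_counts, search_logs, top_n=100, min_users=2):
    """Intersect the top-N most-viewed items with items that appear in the
    search logs of at least `min_users` distinct users; most-viewed first."""
    top = set(sorted(view_counts, key=view_counts.get, reverse=True)[:top_n])
    users_per_item = {}
    for user_id, item_ids in search_logs.items():
        for item_id in set(item_ids):
            users_per_item.setdefault(item_id, set()).add(user_id)
    searched = {i for i, users in users_per_item.items() if len(users) >= min_users}
    return sorted(top & searched, key=view_counts.get, reverse=True)


now = datetime(2023, 2, 16)
view_counts = {"qa_1": 900, "qa_2": 500, "qa_3": 50}
search_logs = {"u1": ["qa_1", "qa_3"], "u2": ["qa_1", "qa_2"], "u3": ["qa_3"]}
print(is_active(datetime(2022, 10, 1), 50, now))  # False, so fall back to popular content
popular = popular_items(view_counts, search_logs, top_n=2)
print(popular)  # ['qa_1']
```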

[0082] Referring to Figure 2, in one embodiment, step S3, which obtains the embedding value of each question-and-answer content in the question-and-answer content recommendation candidate set from the basic feature data, the personal information data, and the historical behavior data, includes:

[0083] S301: Encode the basic feature data, the personal information data, and the historical behavior data to obtain the embedding value of each feature in each question and answer content;

[0084] S302: Classify the multiple features in each of the question-and-answer contents;

[0085] S303: Add the embedding values of features in the same category to obtain the embedding value of each category;

[0086] S304: Generate the embedding value of the question and answer content based on the embedding value of each category.

[0087] In the above embodiments, the present invention obtains the embedding value of the question and answer content through basic feature data, the personal information data, and the historical behavior data.

[0088] As described in step S301 above, the basic feature data, the personal information data, and the historical behavior data are encoded to obtain the embedding value of each feature in each question and answer content;

[0089] First, the text in the question-and-answer content is vectorized, typically by mapping tokens to vocabulary indices. The vectorized text is then encoded into semantic vectors, that is, the text is represented by embedding vectors. Specifically, ERNIE 2.0 Base can be used to encode the input text. ERNIE 2.0 is Baidu's upgraded successor to ERNIE 1.0, a language model that learns grammatical and syntactic information through entity-level and phrase-level masking and achieved leading results on many Chinese natural language processing tasks. The ERNIE 2.0 pre-trained semantic understanding model captures lexical, syntactic, and semantic information from its training data, which greatly strengthens its general-purpose semantic representations.

[0090] In addition, each question-and-answer content contains multiple features; after encoding, an embedding value is obtained for each feature, so each question-and-answer content corresponds to multiple feature embedding values.

[0091] As described in step S302 above, multiple features in each of the question-and-answer contents are classified.

[0092] An embedding value is a vector, and its dimensionality (number of components) determines how much information it can hold. A longer embedding holds more information but requires more storage space and more computation. Because a question-and-answer content carries a great deal of information and must be distinguishable from other question-and-answer content, a long embedding may be needed. To reduce time and space consumption while preserving information, this invention groups the features of the question-and-answer content into categories, and features within the same category have embeddings of the same dimensionality.

[0093] As described in steps S303 and S304 above, the embedding values of features in the same category are added together to obtain the embedding value of each category, and the embedding value of the question-and-answer content is generated from the embedding values of the categories.

[0094] Features within the same category have embeddings of the same dimensionality. The embedding values of features within the same category are summed to obtain the embedding value of each category, and the category embedding values are then concatenated to obtain the embedding value of the question-and-answer content.
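Steps S303 and S304 reduce to a per-category sum followed by concatenation. In this sketch the feature names, categories, and dimensions are hypothetical; in practice the per-feature embeddings would come from the encoding in step S301.

```python
def qa_embedding(feature_embs, feature_category, category_dims):
    """Sum feature embeddings within each category, then concatenate the
    per-category sums in the order given by `category_dims`."""
    parts = []
    for category, dim in category_dims.items():
        total = [0.0] * dim  # a category with no features contributes zeros
        for name, cat in feature_category.items():
            if cat != category:
                continue
            emb = feature_embs[name]
            if len(emb) != dim:
                raise ValueError(f"{name}: expected {dim} dims, got {len(emb)}")
            total = [t + e for t, e in zip(total, emb)]
        parts.extend(total)
    return parts


feature_embs = {
    "tag:lifestyle": [0.1, 0.2],
    "tag:workplace": [0.3, 0.0],
    "length:short": [1.0, 0.0, 0.5],
}
feature_category = {"tag:lifestyle": "tag", "tag:workplace": "tag", "length:short": "length"}
category_dims = {"tag": 2, "length": 3}
emb = qa_embedding(feature_embs, feature_category, category_dims)
print(len(emb))  # 5
```

Summing inside a category keeps that category's dimensionality fixed no matter how many features it contains, so the length of the final embedding always equals the sum of the category dimensions.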

[0095] In one embodiment, step S4, which calculates the user's embedding value based on the embedding value of the question-and-answer content and the historical behavior data, includes:

[0096] S401: Obtain behavioral funnel data based on the historical behavioral data;

[0097] S402: Calculate the weights of different historical behavioral data based on the behavioral funnel data;

[0098] S403: Based on the weights, compute a weighted sum of the embedding values of the question-and-answer content to obtain the user's embedding value.

[0099] In the above embodiments, after obtaining the embedding value of the question and answer content, the user's embedding value can be obtained by using the embedding value of the question and answer content and historical behavior data.

[0100] As described in step S401 above, behavioral funnel data is obtained based on the historical behavioral data;

[0101] Funnel analysis is a process-oriented data analysis method that reflects user behavior and user status at each stage of a process from start to finish. For example, on a product service platform, the typical user path from app activation to spending consists of five stages: app activation, account registration, entering a live stream, interaction, and gift spending, and the funnel shows the conversion rate at each stage. Comparing the data at each stage of the funnel makes problems easy to identify and explain, which supports a more accurate analysis of user preferences. In this invention, user behavior toward Q&A content in the community is generally divided into four stages: browsing, interaction, liking, and saving. The historical behavior data that is obtained mainly records the user's behavior at the browsing, interaction, liking, and saving stages over a past period.
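Step S401 can be illustrated with a small funnel computation over a behavior log. This sketch assumes each log record is a (user, item, stage) tuple using the four stages named above; the record format and the function name are hypothetical.

```python
FUNNEL_STAGES = ["browse", "interact", "like", "save"]  # shallow -> deep


def funnel(events):
    """Return per-stage counts of distinct (user, item) pairs and the
    conversion rate from each stage to the next.

    `events` is an iterable of (user_id, item_id, stage) tuples.
    """
    reached = {stage: set() for stage in FUNNEL_STAGES}
    for user_id, item_id, stage in events:
        reached[stage].add((user_id, item_id))
    counts = {stage: len(pairs) for stage, pairs in reached.items()}
    rates = {}
    for prev, curr in zip(FUNNEL_STAGES, FUNNEL_STAGES[1:]):
        rates[curr] = counts[curr] / counts[prev] if counts[prev] else 0.0
    return counts, rates


events = [
    ("u1", "a", "browse"), ("u1", "b", "browse"),
    ("u2", "a", "browse"), ("u2", "c", "browse"),
    ("u1", "a", "interact"), ("u2", "a", "interact"),
    ("u1", "a", "like"), ("u1", "a", "save"),
]
counts, rates = funnel(events)
print(counts)  # {'browse': 4, 'interact': 2, 'like': 1, 'save': 1}
print(rates)   # {'interact': 0.5, 'like': 0.5, 'save': 1.0}
```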

[0102] As described in steps S402 and S403 above, the weights of different historical behavior data are calculated based on the behavioral funnel data, and the embedding values of the question-and-answer content are weighted and summed according to these weights to obtain the user's embedding value.

[0103] Different behaviors reflect different degrees of preference for a question-and-answer content. For example, merely browsing a question and answer indicates a low level of preference, whereas saving it indicates the highest. Therefore, when calculating a user's embedding value, the weights of different behaviors must be taken into account to predict user preferences more accurately. Browsing, interacting, liking, and saving are assigned different weights according to their importance, and the user's embedding value is obtained by multiplying the embedding value of each question-and-answer content by the corresponding weight and summing the results.
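Steps S402 and S403 can be sketched as follows. The patent states that the weights are derived from the funnel data but does not give the rule, so this sketch assumes a hypothetical one (weight = browse count / stage count, so rarer, deeper actions weigh more) and weights each item by the deepest stage the user reached on it.

```python
STAGES = ["browse", "interact", "like", "save"]  # shallow -> deep


def stage_weights(counts):
    """Hypothetical rule: weight = browse count / stage count, so rarer,
    deeper stages get larger weights (browsing itself gets 1.0)."""
    base = counts["browse"]
    return {s: (base / counts[s] if counts[s] else 0.0) for s in STAGES}


def user_embedding(user_events, item_embs, weights, dim):
    """Weighted sum of item embeddings; each item is weighted by the
    deepest funnel stage this user reached on it."""
    depth = {}
    for item_id, stage in user_events:
        depth[item_id] = max(depth.get(item_id, 0), STAGES.index(stage))
    result = [0.0] * dim
    for item_id, d in depth.items():
        w = weights[STAGES[d]]
        result = [r + w * e for r, e in zip(result, item_embs[item_id])]
    return result


weights = stage_weights({"browse": 8, "interact": 4, "like": 2, "save": 1})
print(weights)  # {'browse': 1.0, 'interact': 2.0, 'like': 4.0, 'save': 8.0}
item_embs = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
user_events = [("a", "browse"), ("a", "save"), ("b", "browse")]
user_emb = user_embedding(user_events, item_embs, weights, dim=2)
print(user_emb)  # [8.0, 1.0]
```

The sum is left unnormalized, as the text describes a weighted sum. If the similarity in step S5 is cosine similarity, dividing by the total weight would not change any similarity value, because cosine similarity is unaffected by positive rescaling.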

[0104] Referring to Figure 3, in one embodiment, step S7, which involves inputting the basic feature data, the personal information data, and the historical behavior data into a factorization machine model to rank the question-and-answer content contained in the initial question-and-answer content recommendation set by relevance, includes:

[0105] S701: Generate one or more feature factor groups corresponding to the user by combining the basic feature data, the personal information data, and the historical behavior data;

[0106] S702: Input the feature factor group into the factorization machine model to obtain the decomposition matrix;

[0107] S703: Calculate the cross term coefficients of each of the feature factor groups based on the decomposition matrix;

[0108] S704: Calculate the relevance value of each of the feature factor groups based on the cross term coefficients;

[0109] S705: Sort the question and answer content corresponding to the feature factor group according to the magnitude of the relevance value to obtain the question and answer content recommendation set.

[0110] In the above embodiments, the basic feature data of a question-and-answer content, the user's historical behavior data for that content, and the user's personal information data are bound together to generate a feature factor group. That is, each question-and-answer content that the user has viewed, interacted with, liked, or saved serves as one grouping unit, which yields multiple feature factor groups.

[0111] The factorization machine model is obtained by training the factorization machine algorithm on a training set in advance.

[0112] A factorization machine (FM) is a machine learning algorithm based on matrix factorization. It models the interactions between features and thereby handles feature combinations even when the data is sparse.

[0113] The formula for the factorization machine model is as follows:

[0114] $$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j$$

[0115] where $\mathbf{v}_i \in \mathbb{R}^k$ is the factor vector of the i-th variable, with k factors; k is the hyperparameter that defines the factorization dimension; n is the feature dimension; $w_0$ is the global bias; $w_i$ is the coefficient of the i-th variable; $\langle \mathbf{v}_i, \mathbf{v}_j \rangle$ is the coefficient of the interaction between the i-th and j-th variables; $x_i$ is the i-th feature; $x_i x_j$ is the cross term of two different features, such as a combination of a user feature and a content feature factor; and $\hat{y}$ is the target value to be computed.
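
The formula above can be evaluated and used for ranking (steps S703 to S705) as in the following sketch. It assumes the feature factor groups have already been turned into sparse feature vectors; the feature layout and the toy parameters stand in for a trained model and are hypothetical. Here `V` plays the role of the decomposition matrix, and each dot product of two of its rows is a cross term coefficient. `fm_score` uses the standard reformulation of the pairwise term, and `fm_score_naive` evaluates the double sum directly as a check.

```python
def fm_score(x, w0, w, V):
    """FM prediction for a sparse feature vector x = {index: value}.

    The pairwise term uses the identity
        sum_{i<j} <v_i, v_j> x_i x_j
          = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2],
    which costs O(k * nnz) instead of O(k * nnz^2).
    """
    k = len(V[0])
    linear = sum(w[i] * xi for i, xi in x.items())
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_score_naive(x, w0, w, V):
    """Direct evaluation of the FM formula, for checking fm_score."""
    entries = list(x.items())
    total = w0 + sum(w[i] * xi for i, xi in entries)
    for a in range(len(entries)):
        for b in range(a + 1, len(entries)):
            (i, xi), (j, xj) = entries[a], entries[b]
            cross = sum(vi * vj for vi, vj in zip(V[i], V[j]))  # <v_i, v_j>
            total += cross * xi * xj
    return total


def rank_groups(groups, w0, w, V):
    """Sort (item_id, features) groups by descending FM relevance."""
    scored = [(item_id, fm_score(x, w0, w, V)) for item_id, x in groups]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy parameters (n = 4 features, k = 2 factors); in practice they would
# come from the factorization machine model trained in advance.
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.0]
V = [[0.5, 0.1], [0.2, 0.4], [-0.3, 0.2], [0.1, -0.5]]
# Hypothetical layout: index 0 is a user feature, indices 2 and 3 are
# features of two different Q&A items.
groups = [("qa_2", {0: 1.0, 3: 1.0}), ("qa_1", {0: 1.0, 2: 1.0})]
ranking = rank_groups(groups, w0, w, V)
print([item_id for item_id, _ in ranking])  # ['qa_1', 'qa_2']
```

With these parameters, qa_1 scores 0.47 and qa_2 scores 0.3, so qa_1 is ranked first.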

[0116] This application also provides a community question-and-answer recommendation device, including:

[0117] The question-and-answer content recommendation candidate set acquisition module 10 is used to acquire the question-and-answer content recommendation candidate set, wherein the candidate set includes basic feature data of the question-and-answer content;

[0118] The user information acquisition module 20 is used to acquire user information, wherein the user information includes the user's personal information data and historical behavior data;

[0119] The first calculation module 30 is used to obtain the embedding value of each piece of question-and-answer content in the candidate set based on the basic feature data, the personal information data, and the historical behavior data;

[0120] The second calculation module 40 is used to calculate the user's embedding value based on the embedding values of the question-and-answer content and the historical behavior data;

[0121] The similarity calculation module 50 is used to calculate the similarity between the embedding value of the question-and-answer content and the embedding value of the user;

[0122] The initial question-and-answer content recommendation set generation module 60 is used to generate an initial question-and-answer content recommendation set based on the similarity;

[0123] The question-and-answer content recommendation set generation module 70 is used to input the basic feature data, the personal information data, and the historical behavior data into the factorization machine model, and to sort the question-and-answer content contained in the initial recommendation set by relevance to obtain the question-and-answer content recommendation set.
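The description does not name the similarity measure used by module 50 or the size of the initial set produced by module 60. The Python sketch below assumes cosine similarity and a fixed cut-off `top_n`; both choices, and the function names, are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def initial_recommendation_set(user_embedding, qa_embeddings, top_n):
    """Keep the top_n content ids most similar to the user embedding."""
    ranked = sorted(
        qa_embeddings,
        key=lambda qa_id: cosine_similarity(user_embedding, qa_embeddings[qa_id]),
        reverse=True,
    )
    return ranked[:top_n]


qa_embeddings = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(initial_recommendation_set([1.0, 0.0], qa_embeddings, top_n=2))  # ['a', 'c']
```

The initial set is deliberately coarse; module 70 then reorders it with the factorization machine.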

[0124] In one embodiment, the question-and-answer content recommendation candidate set acquisition module includes:

[0125] The filtering unit is used to select, from the database, question-and-answer content that meets preset recommendation criteria and to generate the question-and-answer content recommendation candidate set from it, wherein the database contains all question-and-answer content in the community.
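The preset recommendation criteria are not specified. As a minimal Python sketch, assume hypothetical criteria: the content is approved, has at least one answer, and was published within the last 30 days. The `QAContent` record, its fields, and the thresholds are illustrative only.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class QAContent:
    qa_id: str
    status: str        # hypothetical moderation status, e.g. "approved"
    answer_count: int
    published: date


def build_candidate_set(database, today, max_age_days=30, min_answers=1):
    """Keep content meeting hypothetical 'preset recommendation criteria':
    approved, answered at least min_answers times, and recent enough."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        qa for qa in database
        if qa.status == "approved"
        and qa.answer_count >= min_answers
        and qa.published >= cutoff
    ]


db = [
    QAContent("qa_1", "approved", 3, date(2023, 2, 10)),
    QAContent("qa_2", "pending", 5, date(2023, 2, 12)),   # not yet approved
    QAContent("qa_3", "approved", 0, date(2023, 2, 14)),  # no answers yet
    QAContent("qa_4", "approved", 2, date(2022, 11, 1)),  # too old
]
print([qa.qa_id for qa in build_candidate_set(db, today=date(2023, 2, 16))])  # ['qa_1']
```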

[0126] In one embodiment, the device further includes an inactive-user recommendation module, which determines whether a user is an active user and, if not, recommends popular Q&A content to the user. The module includes:

[0127] The judgment unit is used to determine whether a user is an active user;

[0128] The recommendation unit is used to recommend popular Q&A content to the user if the user is not an active user.
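Neither the activity test nor the popularity measure is defined in the description. The Python sketch below assumes a hypothetical rule (at least 3 actions in the last 30 days counts as active) and a hypothetical popularity score (views + 5 × likes + 10 × collects); both are placeholders for whatever the platform actually uses.

```python
from datetime import date, timedelta


def is_active_user(action_dates, today, window_days=30, min_actions=3):
    """Hypothetical rule: active if at least min_actions actions in the window."""
    cutoff = today - timedelta(days=window_days)
    return sum(1 for d in action_dates if cutoff <= d <= today) >= min_actions


def popular_content(stats, top_n=10):
    """Rank by a hypothetical popularity score: views + 5*likes + 10*collects."""
    def score(qa_id):
        s = stats[qa_id]
        return s["views"] + 5 * s["likes"] + 10 * s["collects"]
    return sorted(stats, key=score, reverse=True)[:top_n]


stats = {
    "qa_1": {"views": 100, "likes": 2, "collects": 0},   # score 110
    "qa_2": {"views": 40, "likes": 10, "collects": 5},   # score 140
    "qa_3": {"views": 10, "likes": 0, "collects": 0},    # score 10
}
today = date(2023, 2, 16)
history = [date(2022, 12, 1), date(2023, 2, 1)]  # only one action in the last 30 days
if not is_active_user(history, today):
    print(popular_content(stats, top_n=2))  # ['qa_2', 'qa_1']
```

Falling back to popular content lets a cold-start user receive recommendations before enough behavior data exists to compute a meaningful user embedding.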

[0129] In one embodiment, the first calculation module includes:

[0130] The encoding unit is used to encode the basic feature data, the personal information data, and the historical behavior data to obtain the embedding value of each feature in each piece of question-and-answer content;

[0131] The classification unit is used to classify the multiple features of each piece of question-and-answer content into categories;

[0132] The addition calculation unit is used to add the embedding values of the features in the same category to obtain the embedding value of each category;

[0133] The question-and-answer content embedding value generation unit is used to generate the embedding value of the question-and-answer content from the embedding values of the categories.
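A minimal Python sketch of the classify, sum and concatenate steps, assuming the per-feature embeddings have already been produced by the encoding unit. The category names and widths in `category_dims` and the function name `content_embedding` are hypothetical. The width check reflects the requirement in claim 1 that features in the same category have embeddings of the same dimension, and the final concatenation follows the generation method stated there. A category with no features contributes zeros (an assumption).

```python
def content_embedding(feature_embeddings, category_dims):
    """Sum feature embeddings per category, then concatenate the categories.

    feature_embeddings: list of (category, vector) pairs for one QA item.
    category_dims: mapping category -> embedding width; its insertion order
    fixes the concatenation layout.
    """
    sums = {cat: [0.0] * dim for cat, dim in category_dims.items()}
    for category, vec in feature_embeddings:
        if len(vec) != category_dims[category]:
            raise ValueError(
                f"{category}: expected width {category_dims[category]}, got {len(vec)}"
            )
        sums[category] = [a + b for a, b in zip(sums[category], vec)]
    result = []
    for category in category_dims:
        result.extend(sums[category])
    return result


dims = {"topic": 2, "text": 3, "author": 2}  # hypothetical categories
features = [
    ("topic", [1.0, 0.0]),
    ("topic", [0.0, 1.0]),
    ("text", [0.5, 0.5, 0.0]),
]
print(content_embedding(features, dims))  # [1.0, 1.0, 0.5, 0.5, 0.0, 0.0, 0.0]
```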

[0134] In one embodiment, the second calculation module includes:

[0135] The behavior funnel data acquisition unit is used to obtain behavior funnel data based on the historical behavior data;

[0136] The weight calculation unit is used to calculate the weights of different types of historical behavior based on the behavior funnel data;

[0137] The weighted summation calculation unit is used to perform a weighted summation of the embedding values of the question-and-answer content based on the weights to obtain the user's embedding value.
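The description does not state how funnel data is turned into weights. The Python sketch below assumes one simple scheme: each behavior is weighted by how much rarer it is than the top of the funnel (weight = count of the first stage / count of that stage), so deeper actions such as collecting count more than viewing. The stage names, the weighting rule and the function names are hypothetical; only the final multiply-and-sum step follows the description in claim 1.

```python
from collections import Counter

FUNNEL_STAGES = ["view", "like", "collect"]  # hypothetical order, shallow to deep


def behavior_weights(history, stages=FUNNEL_STAGES):
    """Assumed scheme: weight = count(first stage) / count(stage)."""
    counts = Counter(behavior for _, behavior in history)
    top = counts[stages[0]]
    return {s: top / counts[s] for s in stages if counts[s] > 0}


def user_embedding(history, qa_embeddings, weights):
    """Multiply each interacted item's embedding by its behavior weight and sum."""
    dim = len(next(iter(qa_embeddings.values())))
    user = [0.0] * dim
    for qa_id, behavior in history:
        wgt = weights.get(behavior, 0.0)
        user = [u + wgt * e for u, e in zip(user, qa_embeddings[qa_id])]
    return user


history = [("qa1", "view"), ("qa2", "view"), ("qa3", "view"), ("qa4", "view"),
           ("qa1", "like"), ("qa2", "like"), ("qa1", "collect")]
qa_embeddings = {"qa1": [1.0, 0.0], "qa2": [0.0, 1.0], "qa3": [1.0, 1.0], "qa4": [0.0, 0.0]}
weights = behavior_weights(history)
print(weights)                                         # {'view': 1.0, 'like': 2.0, 'collect': 4.0}
print(user_embedding(history, qa_embeddings, weights))  # [8.0, 4.0]
```

The sum is left unnormalized, as in claim 1; dividing by the total weight would be a natural variant if user embeddings of very active and less active users need to be comparable in scale.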

[0138] In one embodiment, the question-and-answer content recommendation set generation module includes:

[0139] The feature factor group generation unit is used to combine the basic feature data, the personal information data, and the historical behavior data to generate one or more feature factor groups corresponding to the user;

[0140] The decomposition matrix generation unit is used to input the feature factor groups into the factorization machine model to obtain the decomposition matrix;

[0141] The cross-term coefficient calculation unit is used to calculate the cross-term coefficients of each feature factor group based on the decomposition matrix;

[0142] The relevance value calculation unit is used to calculate the relevance value of each feature factor group based on the cross-term coefficients;

[0143] The sorting unit is used to sort the question-and-answer content corresponding to the feature factor groups by relevance value to obtain the question-and-answer content recommendation set.

[0144] Referring to Figure 5, this application also provides a computer device, which may be a server. The computer device includes a processor, a memory, a network interface, and a database connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database; the internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage medium. The database stores user information data, question-and-answer content data, and the like. The network interface is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements the community question-and-answer recommendation method described in any of the above embodiments.

[0145] This application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, can implement the community question-and-answer recommendation method described in any of the above embodiments.

[0146] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, databases, or other media in this application and its embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0147] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes that element.

[0148] The above description is merely a preferred embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structural or procedural transformations made based on the content of the present invention's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of the present invention.

Claims

1. A community question-and-answer recommendation method, characterized in that the method comprises: obtaining a question-and-answer content recommendation candidate set, wherein the candidate set includes basic feature data of question-and-answer content; obtaining user information, wherein the user information includes the user's personal information data and historical behavior data; obtaining, based on the basic feature data, the personal information data, and the historical behavior data, the embedding value of each piece of question-and-answer content in the candidate set; calculating the user's embedding value based on the embedding values of the question-and-answer content and the historical behavior data; calculating the similarity between the embedding value of the question-and-answer content and the embedding value of the user; generating an initial question-and-answer content recommendation set based on the similarity; and inputting the basic feature data, the personal information data, and the historical behavior data into a factorization machine model to sort the question-and-answer content contained in the initial recommendation set by relevance, thereby obtaining the question-and-answer content recommendation set; wherein the step of obtaining the embedding value of each piece of question-and-answer content comprises: encoding the basic feature data, the personal information data, and the historical behavior data to obtain the embedding value of each feature in each piece of question-and-answer content, wherein the encoding first vectorizes the text in the question-and-answer content by word-list mapping and then trains the vectorized text into a corresponding semantic vector; classifying the multiple features in each piece of question-and-answer content, wherein the embedding values of features in the same category have the same dimension; summing the embedding values of the features within the same category to obtain the embedding value of each category; and generating the embedding value of the question-and-answer content by concatenating the embedding values of the categories to obtain the final embedding value; and wherein the step of calculating the user's embedding value comprises: obtaining behavior funnel data based on the historical behavior data; calculating the weights of different types of historical behavior based on the behavior funnel data; and performing a weighted summation of the embedding values of the question-and-answer content based on the weights to obtain the user's embedding value, specifically by multiplying the embedding value of each piece of question-and-answer content by its corresponding historical behavior weight and then summing all the products.

2. The community question-and-answer recommendation method according to claim 1, characterized in that the step of obtaining the question-and-answer content recommendation candidate set comprises: selecting, from a database, question-and-answer content that meets preset recommendation criteria to generate the question-and-answer content recommendation candidate set, wherein the database contains all question-and-answer content in the community.

3. The community question-and-answer recommendation method according to claim 1, characterized in that, before the step of obtaining the question-and-answer content recommendation candidate set, the method further comprises: determining whether the user is an active user; and if not, recommending popular Q&A content to the user.

4. The community question-and-answer recommendation method according to claim 1, characterized in that the step of inputting the basic feature data, the personal information data, and the historical behavior data into the factorization machine model and sorting the question-and-answer content contained in the initial recommendation set by relevance to obtain the question-and-answer content recommendation set comprises: generating one or more feature factor groups for the user from the basic feature data, the personal information data, and the historical behavior data; inputting the feature factor groups into the factorization machine model to obtain the decomposition matrix; calculating the cross-term coefficients of each feature factor group based on the decomposition matrix; calculating the relevance value of each feature factor group based on the cross-term coefficients; and sorting the question-and-answer content corresponding to the feature factor groups by relevance value to obtain the question-and-answer content recommendation set.

5. A community question-and-answer recommendation device for implementing the method according to any one of claims 1 to 4, characterized in that the device comprises: a question-and-answer content recommendation candidate set acquisition module, used to acquire the question-and-answer content recommendation candidate set, wherein the candidate set includes basic feature data of question-and-answer content; a user information acquisition module, used to acquire user information, wherein the user information includes the user's personal information data and historical behavior data; a first calculation module, used to obtain the embedding value of each piece of question-and-answer content in the candidate set based on the basic feature data, the personal information data, and the historical behavior data; a second calculation module, used to calculate the user's embedding value based on the embedding values of the question-and-answer content and the historical behavior data; a similarity calculation module, used to calculate the similarity between the embedding value of the question-and-answer content and the embedding value of the user; an initial question-and-answer content recommendation set generation module, used to generate an initial question-and-answer content recommendation set based on the similarity; and a question-and-answer content recommendation set generation module, used to input the basic feature data, the personal information data, and the historical behavior data into the factorization machine model and to sort the question-and-answer content contained in the initial recommendation set by relevance to obtain the question-and-answer content recommendation set.

6. The community question-and-answer recommendation device according to claim 5, characterized in that the question-and-answer content recommendation candidate set acquisition module comprises: a filtering unit, used to select, from a database, question-and-answer content that meets preset recommendation criteria and to generate the question-and-answer content recommendation candidate set, wherein the database contains all question-and-answer content in the community.

7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that, when the processor executes the computer program, the steps of the method according to any one of claims 1 to 4 are implemented.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 4 are implemented.