Method for constructing graph structure, and recommendation method and related apparatus

By generating semantic representations and latent factors through a large language model, and automatically constructing graph structures, the problems of poor graph structure quality and high dependence on manual intervention are solved, resulting in more accurate graph structures and more efficient recommendation performance.

WO2026124240A1 · PCT designated stage · Publication Date: 2026-06-18 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2025-11-28
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

In existing technologies, graph structures are of poor quality, making it difficult to achieve efficient expansion in large-scale scenarios. Furthermore, relying on manual construction is costly and affects the prediction accuracy of recommendation systems.

Method used

By acquiring textual descriptions of multiple targets, semantic representations are generated using a large language model, latent factors are extracted, and graph structures are automatically constructed. Semantic similarity between targets is measured based on latent factors, reducing reliance on manual intervention and improving the accuracy of graph structures.

Benefits of technology

It enables the construction of accurate graph structures from a global perspective, reduces computational overhead, and improves the performance of downstream tasks, especially enhancing recommendation performance in recommendation systems.

✦ Generated by Eureka AI based on patent content.

Figure CN2025138344_18062026_PF_FP_ABST

Abstract

Provided in the present application are a method for constructing a graph structure, and a recommendation method and a related apparatus. The method for constructing a graph structure comprises: acquiring text descriptions of a plurality of targets, wherein the plurality of targets comprise a plurality of users or a plurality of objects; on the basis of the text descriptions of the plurality of targets, respectively generating semantic representations of the plurality of targets; on the basis of the semantic representations of the plurality of targets, respectively extracting latent factors of the plurality of targets, wherein the latent factors of the plurality of targets are used for reflecting the semantic similarity between the plurality of targets; and constructing a graph structure, wherein the graph structure comprises nodes of the plurality of targets, association relationships between the nodes of the plurality of targets are determined on the basis of the latent factors of the plurality of targets, and the graph structure is used for a downstream task. The solution in the embodiments of the present application is conducive to improving the perception of a graph structure with regard to global semantic information, thereby ensuring the accuracy of the graph structure, and is further conducive to improving the execution effect of a downstream task.

Description

Method for constructing graph structure, recommendation method, and related apparatus

[0001] This application claims priority to Chinese Patent Application No. 202411817524.X, filed on December 10, 2024, entitled "Method for Constructing Graph Structure, Recommendation Method and Related Apparatus", the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of artificial intelligence, and more specifically, to a method for constructing a graph structure, a recommendation method, and a related apparatus.

Background

[0003] Graph-structured data can intuitively represent complex relationships between entities and helps uncover hidden connections. It is widely used in various fields, such as social networks and recommendation systems.

[0004] Taking recommender systems as an example, the task of a recommender system is to comprehensively consider factors such as the user, the objects, and the current context to recommend objects that the user may be interested in. In actual modeling, objects are often ranked by the probability that the user clicks on or converts on them, and the recommendation results are displayed according to this ranking. Click-through rate (CTR) or conversion rate prediction is a core task in recommender systems, aiming to predict the probability that a user clicks on or converts on a recommended object (such as music or an advertisement). The quality of graph-structured data is the foundation of the entire graph learning process and an important factor affecting prediction accuracy. In related solutions, graph structures are usually constructed based on preset rules or manually, which results in poor graph quality and makes efficient scaling to large-scale scenarios difficult.

Summary of the Invention

[0005] This application provides a method for constructing a graph structure, a recommendation method, and a related apparatus, which help ensure the accuracy of the graph structure and thereby improve the performance of downstream tasks.

[0006] In a first aspect, a method for constructing a graph structure is provided, comprising: obtaining textual descriptions of multiple targets, the multiple targets including multiple users or multiple objects; generating semantic representations of the multiple targets based on their textual descriptions; extracting latent factors of the multiple targets based on their semantic representations, the latent factors being used to reflect the semantic similarity between the multiple targets; and constructing a graph structure, the graph structure including nodes of the multiple targets, the associations between the nodes of the multiple targets being determined based on the latent factors of the multiple targets, and the graph structure being used for a downstream task.

[0007] According to the solution in the embodiments of this application, latent factors of multiple targets are extracted based on their semantic representations, and a graph structure is constructed based on these latent factors. Because the latent factors can measure global semantic similarity, the graph structure can be constructed from a global perspective. This improves the graph structure's perception of global semantic information, ensuring its accuracy and thus improving the execution performance of downstream tasks. Furthermore, the solution enables automatic graph construction, reducing reliance on manual labor, lowering labor costs, and increasing construction efficiency.

[0008] For example, a user's text description may include the user's attribute information and / or the user's behavioral data, etc.

[0009] For example, the textual description of an object may include the object's attribute information, etc.

[0010] Latent factors can reflect the similarity of targets in a certain aspect. Targets with the same or similar latent factors have higher semantic relevance.

[0011] In conjunction with the first aspect, in some implementations of the first aspect, the multiple targets include a first target, and generating the semantic representations of the multiple targets based on their textual descriptions includes: using a large language model (LLM) to generate the semantic representation of the first target based on the textual description of the first target, wherein the input of the LLM includes a prompt that instructs the LLM to analyze the first target based on its textual description.

[0012] The first target is any one of the multiple targets.

[0013] Taking the case in which the first target is a user as an example, the user's prompt is input into the LLM. The prompt can instruct the LLM to infer the user's preference information based on the user's textual description.

[0014] Taking the case in which the first target is an object as an example, the object's prompt is input into the LLM. The prompt can instruct the LLM to extract facts about the object based on the object's textual description, or to generate richer knowledge related to the object.

[0015] In the solution of the embodiments of this application, the target can be analyzed using the LLM, which enriches the target's semantic representation and helps obtain more accurate semantic information about the target, leading to a more accurate graph structure. Furthermore, each LLM call can involve only one target, which avoids pairwise comparisons: the total time complexity is O(N), compared with O(N²) for pairwise comparisons. This reduces time complexity and significantly decreases the number of LLM calls, thereby reducing computational overhead and ensuring processing efficiency.
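The call-count argument in [0015] can be made concrete with a small sketch. It assumes hypothetical prompt templates (the application does not disclose its prompt wording) and only builds one prompt per target; it does not call any real LLM. The second function counts the calls a pairwise scheme would need.

```python
from typing import List, Tuple

# Hypothetical prompt templates for illustration only; not the application's wording.
USER_PROMPT = "Based on the description below, infer this user's preferences.\n{desc}"
OBJECT_PROMPT = "Based on the description below, list facts and related knowledge about this object.\n{desc}"


def per_target_prompts(targets: List[Tuple[str, str]]) -> List[str]:
    """Build one prompt per (kind, description) pair: N targets -> N LLM calls, i.e. O(N)."""
    prompts = []
    for kind, desc in targets:
        template = USER_PROMPT if kind == "user" else OBJECT_PROMPT
        prompts.append(template.format(desc=desc))
    return prompts


def pairwise_call_count(n: int) -> int:
    """Calls needed if every unordered pair of targets were sent to the LLM: N(N-1)/2, i.e. O(N^2)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    targets = [("user", "Listens to jazz in the evening")] + [("object", f"Album {i}") for i in range(3)]
    print(len(per_target_prompts(targets)), pairwise_call_count(len(targets)))  # 4 6
```

For 10,000 targets, the per-target scheme issues 10,000 calls, whereas pairwise comparison would need 49,995,000.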

[0016] In conjunction with the first aspect, in some implementations of the first aspect, the semantic representation of the first target includes hidden layer representations of one or more hidden layers of the LLM.

[0017] Optionally, the semantic representation of the first target may include the hidden layer representation of the last hidden layer of the LLM.

[0018] The hidden layer representation of the last hidden layer usually contains rich semantic information and better captures contextual information, which helps obtain a more accurate semantic representation and thus a more accurate graph structure.
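Paragraphs [0016] to [0018] leave open how per-token hidden states become a single vector. One common choice, assumed here rather than taken from the application, is masked mean pooling over the last hidden layer. Libraries such as Hugging Face Transformers return per-layer hidden states when a model is called with output_hidden_states=True; this sketch uses plain nested lists indexed as [layer][token][dim] so that it stays self-contained.

```python
from typing import List


def last_layer_mean_pool(hidden_states: List[List[List[float]]], mask: List[int]) -> List[float]:
    """Average the last layer's token vectors over positions where mask == 1.

    hidden_states is indexed [layer][token][dim]; mask marks real (non-padding) tokens.
    """
    last_layer = hidden_states[-1]
    kept = [token for token, m in zip(last_layer, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(token[d] for token in kept) / len(kept) for d in range(dim)]


if __name__ == "__main__":
    states = [
        [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],       # earlier layer (ignored)
        [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],   # last layer; third token is padding
    ]
    print(last_layer_mean_pool(states, [1, 1, 0]))  # [2.0, 3.0]
```

For decoder-only LLMs, taking the state of the last non-padding token is another frequent option; either choice fits the wording of [0017].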

[0019] In conjunction with the first aspect, in some implementations of the first aspect, the multiple targets include a second target and a third target, the node of the second target is associated with the node of the third target, and the latent factors of the second target and the latent factors of the third target are at least partially the same.

[0020] In other words, targets that share the same latent factors are associated with each other.

[0021] In conjunction with the first aspect, in some implementations of the first aspect, the graph structure includes nodes of multiple latent factors, the multiple latent factors include the latent factors of the multiple targets, and there are edges between the node of each target and the nodes of that target's latent factors.

[0022] For example, in the process of constructing the graph structure, an edge is built between the node of each target and the node of each of that target's latent factors.

[0023] In the solution of the embodiments of this application, latent factors are used as explicit nodes, and associations between targets are established by constructing edges between the targets and their corresponding latent factors. Targets that share a latent factor are thus connected through that factor's node without comparing targets pairwise, which helps improve construction efficiency.
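The following sketch, under the assumption that each target's latent factors are already available as string identifiers, builds the target-to-factor edges of [0021] to [0023] and then reads off target-to-target associations as two-hop paths through shared factor nodes. Weighting a pair by the number of shared factors is an illustrative choice, not something the application specifies.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_factor_graph(latent_factors: Dict[str, Iterable[str]]):
    """Return (edges, factor_to_targets) for a bipartite target/latent-factor graph."""
    edges: Set[Tuple[str, str]] = set()
    factor_to_targets: Dict[str, Set[str]] = defaultdict(set)
    for target, factors in latent_factors.items():
        for factor in factors:
            edges.add((target, factor))  # edge from the target node to the factor node
            factor_to_targets[factor].add(target)
    return edges, factor_to_targets


def associated_targets(factor_to_targets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Pairs of targets linked through at least one shared factor, weighted by the number shared."""
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for members in factor_to_targets.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)


if __name__ == "__main__":
    factors = {"u1": ["c1:3", "c2:7"], "u2": ["c1:3", "c2:5"], "u3": ["c1:9", "c2:5"]}
    edges, index = build_factor_graph(factors)
    print(len(edges), associated_targets(index))
    # 6 {('u1', 'u2'): 1, ('u2', 'u3'): 1}
```

Here u1 and u3 share no factor and therefore get no association, matching the "at least partially the same" condition of [0019].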

[0024] In conjunction with the first aspect, in some implementations of the first aspect, the multiple targets include a fourth target. Extracting the latent factors of the multiple targets based on their semantic representations includes: performing vector quantization on the semantic representation of the fourth target based on multiple codebooks to obtain multi-level quantization results of the fourth target. The latent factors of the fourth target include the multi-level quantization results of the fourth target and / or identifiers of the multi-level quantization results.

[0025] That is, the multi-level quantization results and / or their identifiers constitute multiple latent factors of the target.

[0026] In the solution of the embodiments of this application, multiple latent factors of a target can be extracted. Different latent factors can reflect the similarity of targets in different aspects, so the overall semantic similarity between targets can be measured more accurately, which helps obtain a more accurate graph structure.

[0027] In conjunction with the first aspect, in some implementations of the first aspect, performing vector quantization on the semantic representation of the fourth target based on multiple codebooks includes: performing residual quantization on the semantic representation of the fourth target based on multiple codebooks.
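Residual quantization, as named in [0027], can be sketched as follows. The codebooks here are small hypothetical examples; in practice they would be learned (for example with k-means or an RQ-VAE style model, neither of which the application specifies here). At each level the codeword nearest to the current residual is chosen and subtracted, so later levels encode progressively finer detail. The codeword indices, prefixed by their level, are one possible form of the latent-factor identifiers of [0024] and [0025]; the prefix format is an assumption.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(z: Sequence[float], codebooks: List[List[Vector]]) -> Tuple[List[int], List[Vector]]:
    """Multi-level residual quantization of z.

    Returns the index of the chosen codeword at each level and the codewords themselves.
    """
    residual = list(z)
    ids: List[int] = []
    chosen: List[Vector] = []
    for codebook in codebooks:
        # Nearest codeword (squared Euclidean distance) to what earlier levels left unexplained.
        idx = min(range(len(codebook)), key=lambda k: _sq_dist(residual, codebook[k]))
        ids.append(idx)
        chosen.append(list(codebook[idx]))
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return ids, chosen


def factor_ids(ids: List[int]) -> List[str]:
    """Level-prefixed identifiers, e.g. 'c1:1', so equal indices at different levels stay distinct."""
    return [f"c{level}:{i}" for level, i in enumerate(ids, start=1)]


if __name__ == "__main__":
    books = [
        [[0.0, 0.0], [10.0, 10.0]],               # coarse level
        [[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]],   # fine level
    ]
    ids, words = residual_quantize([11.0, 9.0], books)
    print(ids, factor_ids(ids), words)
    # [1, 1] ['c1:1', 'c2:1'] [[10.0, 10.0], [1.0, -1.0]]
```

Summing the chosen codewords gives the quantized approximation of the input, here exactly [11.0, 9.0].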

[0028] In conjunction with the first aspect, in some implementations of the first aspect, the downstream tasks include recommendation tasks.

[0029] In the solution of the embodiments of this application, the constructed graph structure can be applied to recommendation tasks, which helps improve their performance, for example, the quality of the recommendation results.

[0030] In conjunction with the first aspect, in some implementations of the first aspect, the recommendation task is performed by a recommendation model, which is used to predict scores of candidate recommendation objects. The score of a candidate recommendation object reflects the matching degree between the candidate recommendation object and a first user. The input information of the recommendation model includes enhanced features, and the enhanced features include the graph structure and / or features of a fifth target obtained based on the graph structure. The graph structure includes a node of the fifth target, and the fifth target is the first user or a candidate recommendation object.
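As one hypothetical way to obtain "features of a fifth target obtained based on the graph structure", the sketch below concatenates the target's own embedding with the mean of the embeddings of its latent-factor nodes, a single mean-aggregation step in the style of a graph neural network layer. The embedding tables and their shared dimension are assumptions for illustration; Figure 9 of the application describes its own construction, which may differ.

```python
from typing import Dict, List


def graph_enhanced_feature(
    own_emb: List[float],
    factors: List[str],
    factor_emb: Dict[str, List[float]],
) -> List[float]:
    """Concatenate own embedding with the mean of its latent-factor node embeddings.

    Assumes every factor embedding has the same dimension as own_emb.
    A target with no factors gets a zero vector for the aggregated half.
    """
    dim = len(own_emb)
    if factors:
        agg = [sum(factor_emb[f][d] for f in factors) / len(factors) for d in range(dim)]
    else:
        agg = [0.0] * dim
    return list(own_emb) + agg


if __name__ == "__main__":
    table = {"c1:3": [1.0, 0.0], "c2:7": [0.0, 1.0]}
    print(graph_enhanced_feature([2.0, 2.0], ["c1:3", "c2:7"], table))
    # [2.0, 2.0, 0.5, 0.5]
```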

[0031] The score can be a score in the ranking phase or a score in the recall phase.

[0032] In conjunction with the first aspect, in some implementations of the first aspect, the graph structure is used to determine multiple recall objects in the recall phase of the recommendation task, and in the graph structure, the nodes of the multiple recall objects are associated with the node of the first user.
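For the recall phase described in [0032], one possible sketch, in the spirit of Figure 11's use of codeword IDs as an inverted index, maps each latent-factor identifier to the objects that carry it and ranks candidates by how many factors they share with the first user. It assumes users and objects are quantized into a shared set of factor identifiers; the ranking rule and tie-breaking are illustrative choices.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


def build_inverted_index(object_factors: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each latent-factor ID (e.g. a codeword ID) to the objects that have it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for obj, factors in object_factors.items():
        for factor in factors:
            index[factor].add(obj)
    return dict(index)


def recall(user_factors: Iterable[str], index: Dict[str, Set[str]], k: int) -> List[str]:
    """Return up to k objects sharing the most factors with the user (ties broken by ID)."""
    hits: Counter = Counter()
    for factor in user_factors:
        for obj in index.get(factor, ()):
            hits[obj] += 1
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [obj for obj, _ in ranked[:k]]


if __name__ == "__main__":
    idx = build_inverted_index({
        "a": ["c1:3", "c2:7"], "b": ["c1:3", "c2:5"],
        "c": ["c1:9", "c2:7"], "d": ["c1:0"],
    })
    print(recall(["c1:3", "c2:7"], idx, k=2))  # ['a', 'b']
```

Object "d" shares no factor with the user and is never touched, which is what keeps index-based recall cheap compared with scoring every object.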

[0033] In a second aspect, a recommendation method is provided, comprising: receiving a recommendation request; acquiring input information related to the recommendation request, the input information including information of a first user, information of candidate recommendation objects, and enhanced features related to a graph structure, wherein the enhanced features include at least one of the following: the graph structure, features of the first user obtained based on the graph structure, or features of the candidate recommendation objects obtained based on the graph structure; the graph structure includes nodes of multiple targets, the associations between the nodes of the multiple targets are determined based on latent factors of the multiple targets, the latent factors of the multiple targets are extracted based on semantic representations of the multiple targets, the semantic representations of the multiple targets are generated based on textual descriptions of the multiple targets, and the multiple targets include multiple users or multiple objects; and inputting the input information into a recommendation model to obtain scores of the candidate recommendation objects, the scores being used to reflect the matching degree between the first user and the candidate recommendation objects.

[0034] According to the scheme of the embodiments of this application, latent factors of multiple targets are extracted based on the semantic representation of multiple targets, and a graph structure is constructed based on the latent factors of multiple targets. The latent factors of multiple targets can be used to measure global semantic similarity. This allows the scheme of this application to construct a graph structure from a global perspective, which is beneficial to improving the graph structure's perception of global semantic information, thereby ensuring the accuracy of the graph structure. Applying this graph structure to a recommendation system is beneficial to improving the recommendation effect.

[0035] In conjunction with the second aspect, in some implementations of the second aspect, there is a relationship between the nodes of the candidate recommendation object and the nodes of the first user in the graph structure.

[0036] In conjunction with the second aspect, in some implementations of the second aspect, the multiple targets include a first target, and the semantic representation of the first target is generated using a large language model (LLM). The input of the LLM includes a prompt, which instructs the LLM to analyze the first target based on the textual description of the first target.

[0037] In conjunction with the second aspect, in some implementations of the second aspect, the semantic representation of the first target includes hidden layer representations of one or more hidden layers of the LLM.

[0038] In conjunction with the second aspect, in some implementations of the second aspect, the multiple targets include a second target and a third target, the node of the second target is associated with the node of the third target, and the latent factors of the second target and the latent factors of the third target are at least partially the same.

[0039] In conjunction with the second aspect, in some implementations of the second aspect, the graph structure includes nodes of multiple latent factors, the multiple latent factors include the latent factors of the multiple targets, and there are edges between the node of each target and the nodes of that target's latent factors.

[0040] In conjunction with the second aspect, in some implementations of the second aspect, the multiple targets include a fourth target, and the latent factors of the fourth target include multi-level quantization results of the fourth target and / or identifiers of the multi-level quantization results. The multi-level quantization results of the fourth target are obtained by performing vector quantization on the semantic representation of the fourth target based on multiple codebooks.

[0041] In conjunction with the second aspect, in some implementations of the second aspect, the multi-level quantization results of the fourth target are obtained by performing residual quantization on the semantic representation of the fourth target based on multiple codebooks.

[0042] In a third aspect, an apparatus for constructing a graph structure is provided, the apparatus comprising modules / units for performing the method in the first aspect or any implementation thereof.

[0043] In a fourth aspect, a recommendation apparatus is provided, comprising modules / units for performing the method in the second aspect or any implementation thereof.

[0044] It should be understood that the extensions, limitations, interpretations and descriptions of the relevant content in the first aspect above also apply to the same content in the second, third and fourth aspects.

[0045] In a fifth aspect, a computing device is provided, including a processor and a memory, and optionally an input / output interface. The processor controls the input / output interface to send and receive information, the memory stores a computer program, and the processor retrieves the computer program from the memory and runs it to perform the method in the first aspect, the second aspect, or any possible implementation thereof.

[0046] Optionally, the processor can be a general-purpose processor, which can be implemented in hardware or software. When implemented in hardware, the processor can be a logic circuit, integrated circuit, etc.; when implemented in software, the processor can be a general-purpose processor that reads software code stored in memory. This memory can be integrated into the processor or located outside the processor and exist independently.

[0047] Optionally, the aforementioned computing device may be a terminal device / server, or a chip within a terminal device / server.

[0048] In a sixth aspect, a chip is provided. The chip obtains and executes instructions to implement the method in the first aspect or the second aspect or any implementation thereof.

[0049] Optionally, as one implementation, the chip includes a processor and a data interface, through which the processor reads instructions stored in the memory and executes the methods in the first aspect or the second aspect and any of the implementations described above.

[0050] Optionally, as one implementation, the chip may further include a memory storing instructions, and the processor is used to execute the instructions stored in the memory. When the instructions are executed, the processor is used to perform the method in the first aspect or the second aspect and any of the implementations.

[0051] In a seventh aspect, a computer program product containing instructions is provided, which, when executed by a computing device, cause the computing device to perform the methods described in the first or second aspect and any of their implementations.

[0052] In an eighth aspect, a computer-readable storage medium is provided, including computer program instructions that, when executed by a computing device, perform the method in the first aspect or the second aspect or any implementation thereof.

[0053] As examples, such computer-readable storage media include, but are not limited to, one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically erasable PROM (EEPROM), and hard disk drive.

[0054] Optionally, in one implementation, the aforementioned storage medium may specifically be a non-volatile storage medium.

Brief Description of Drawings

[0055] Figure 1 is a diagram of a system architecture according to an embodiment of this application.

[0056] Figure 2 is a schematic diagram of the system architecture of a recommendation system according to an embodiment of this application.

[0057] Figure 3 is a schematic diagram of two schemes for constructing graph structures.

[0058] Figure 4 is a schematic diagram of a device for constructing a graph structure according to an embodiment of this application.

[0059] Figure 5 is a schematic flowchart of a method for constructing a graph structure according to an embodiment of this application.

[0060] Figure 6 is a schematic diagram illustrating an example of a latent factor extraction process according to an embodiment of this application.

[0061] Figure 7 is a schematic diagram of an example of a graph structure according to an embodiment of this application.

[0062] Figure 8 is a schematic flowchart of a recommendation method according to an embodiment of this application.

[0063] Figure 9 is a schematic diagram of the features of the first user obtained based on the graph structure according to an embodiment of this application.

[0064] Figure 10 is a schematic diagram of the architecture of a recommendation model according to an embodiment of this application.

[0065] Figure 11 is a schematic diagram of codeword IDs used as an inverted index according to an embodiment of this application.

[0066] Figure 12 is a schematic diagram of an application scenario according to an embodiment of this application.

[0067] Figure 13 is a schematic diagram of two examples of generating enhanced knowledge through LLM according to embodiments of this application.

[0068] Figure 14 is a schematic diagram of another example of a graph structure according to an embodiment of this application.

[0069] Figure 15 is a schematic diagram of another example of a graph structure according to an embodiment of this application.

[0070] Figure 16 is a schematic diagram of the results of a visualization analysis of the latent factors according to an embodiment of this application.

[0071] Figure 17 is a schematic diagram of another system architecture according to an embodiment of this application.

[0072] Figure 18 is a schematic block diagram of an apparatus according to an embodiment of this application.

[0073] Figure 19 is a schematic diagram of the architecture of a computing device according to an embodiment of this application.

Detailed Description of Embodiments

[0074] The technical solutions in this application will now be described with reference to the accompanying drawings.

[0075] The terminology used in the following embodiments is for the purpose of describing specific embodiments only and is not intended to limit this application. As used in the specification and appended claims of this application, the singular forms “a,” “an,” and “the” are intended to also include forms such as “one or more,” unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of this application, “at least one” and “one or more” refer to one, two, or more than two. “First,” “second,” and various numerical designations are merely used to distinguish items for ease of description and are not intended to limit the scope of the embodiments of this application. “And / or” describes an association between associated objects and indicates that three relationships may exist. For example, “A and / or B” can represent: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural. The character “ / ” generally indicates an “or” relationship between the associated objects before and after it. The sequence numbers of the processes below do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not constitute any limitation on the implementation of the embodiments of this application. For example, in the embodiments of this application, labels such as "301", "401", and "501" are merely identifiers used for ease of description and do not limit the order in which steps are executed.

[0076] References to "one embodiment" or "some embodiments" in this specification mean that one or more embodiments of this application include a specific feature, structure, or characteristic described in connection with that embodiment. In this application, the words "exemplary" and "for example" are used to indicate that something serves as an illustration, example, or description. Any embodiment or design described as "exemplary" or "for example" in this application should not be construed as being more preferred or advantageous than other embodiments or designs; rather, these words are intended to present the relevant concepts in a concrete manner. The terms "comprising," "including," "having," and their variations all mean "including but not limited to," unless otherwise specifically emphasized. In the embodiments of this application, expressions such as "when," "in the case of," and "if" mean that the device performs corresponding processing under certain objective conditions; they do not limit timing, do not require the device to perform a determination action during implementation, and do not imply any other limitation.

[0077] In this application, "used for indicating" covers both direct and indirect indication. When an indication message is described as indicating A, this covers both the case where the message directly indicates A and the case where it indirectly indicates A, and does not necessarily mean that the message carries A.

[0078] The recommendation method provided in this application can be applied to various types of information retrieval scenarios and used for prediction tasks in information retrieval systems (such as recommendation platforms, search engines, and advertising systems).

[0079] Optionally, the solutions in the embodiments of this application can be applied to recommendation systems. In the embodiments of this application, a recommendation system refers to a system that analyzes users' historical data and, based on the analysis results, makes predictions for new recommendation requests to obtain recommendation results.

[0080] The task of a recommender system is to recommend items that a user is most interested in by comprehensively considering factors such as the user, the item, and the current context. In actual modeling, items are often ranked by the probability that the user clicks on or converts on them, and the recommendation results are displayed according to this ranking. Click-through rate (CTR) or conversion rate estimation is a core task in recommender systems, aiming to predict the probability that a user clicks on or converts on a recommended item (such as music or an advertisement).

[0081] For example, the recommendation method of the embodiments of this application can be applied to recommendation platforms, such as for product recommendation, music recommendation, or information flow recommendation.

[0082] For example, product recommendations can suggest products that users are most likely to be interested in, thereby increasing click-through rates and conversion rates.

[0083] For example, music recommendations can suggest music that users are most likely to be interested in, thereby increasing users' listening time and improving user experience.

[0084] For example, the recommendation method of the embodiments of this application can be applied to search engines, such as for product search or browser search.

[0085] For example, the recommendation method of the embodiments of this application can be applied to an advertising system, for example, for ad delivery.

[0086] For example, ad recommendations can suggest ads that users are most interested in, thereby increasing click-through rates and generating revenue for the platform.

[0087] The following is a brief introduction to two commonly used application scenarios.

[0088] Application Scenario 1: App store recommendation

[0089] The recommendation method of the embodiments of this application can be applied to the recommendation system of an app store to recommend products (such as applications) in the app store.

[0090] App stores can display a subset of applications. Recommendation systems are used to determine which applications are displayed and their corresponding placement. When a user enters an app store, a recommendation request is triggered. Since the placement of applications is limited, when the recommendation system receives a recommendation request, it can sort all applications to be displayed according to their expected revenue and then select one or more of the most valuable applications to display in their respective positions. For example, in a cost-per-click (CPC) system, advertisers only pay when an application is clicked by a user. In a CPC system, the value of an application is typically determined by its expected revenue. Each application's expected revenue is related to its estimated click-through rate (CTR). In this case, CTR can be understood as the probability that each application (app) will be clicked. To obtain the ranking based on expected revenue, the estimated CTR is needed.

[0091] Specifically, the estimated CTR of all applications to be displayed is determined, the expected revenue of each application is calculated based on the estimated CTR of each application and they are sorted, and the applications to be displayed and their corresponding display positions are determined based on the sorting results.
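The ranking step in [0091] can be illustrated with a small worked sketch. It assumes the common CPC convention that an application's expected revenue per impression is its estimated CTR multiplied by the advertiser's bid per click; the application only states that expected revenue is related to the estimated CTR, so the exact formula is an assumption, and the bids and CTR values are made up.

```python
from typing import Dict, List, Tuple


def assign_slots(apps: List[Tuple[str, float, float]], num_slots: int) -> Dict[int, str]:
    """apps holds (app_id, estimated_ctr, bid_per_click).

    Expected revenue per impression under CPC is taken as ctr * bid; the highest
    earners fill display slots 1..num_slots in order.
    """
    scored = [(app_id, ctr * bid) for app_id, ctr, bid in apps]
    scored.sort(key=lambda item: item[1], reverse=True)
    return {slot: app_id for slot, (app_id, _) in enumerate(scored[:num_slots], start=1)}


if __name__ == "__main__":
    candidates = [("A", 0.05, 2.0), ("B", 0.02, 10.0), ("C", 0.10, 0.5)]
    print(assign_slots(candidates, num_slots=2))  # {1: 'B', 2: 'A'}
```

Application B has the lowest estimated CTR yet takes the first slot because its expected revenue (0.02 × 10.0 = 0.2) exceeds A's (0.1) and C's (0.05), which is why ranking uses expected revenue rather than CTR alone.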

[0092] The recall and / or ranking tasks can be performed based on the recommendation method in the embodiments of this application. For example, all applications to be displayed can be determined, and the ranking results of the applications can likewise be determined, based on the recommendation method in the embodiments of this application.

[0093] Application Scenario 2: Browser search recommendation

[0094] The solution in the embodiments of this application can be applied to a browser's search engine for browser search recommendation.

[0095] In one possible scenario, when a user performs a search, the search engine obtains the search terms entered by the user and can then rank all search results to be displayed based on factors such as the user's search history. The ranking determines which search results are displayed and where each one is placed.

[0096] The recall and/or ranking tasks can be performed using the recommendation methods in the embodiments of this application. For example, the set of search results to be displayed can be determined using these methods, and so can the ranking of those search results.

[0097] In another possible scenario, the search terms in a user's search typically come from two sources: search terms entered by the user and search terms recommended by the system. User-entered search terms reflect user behavior that the system cannot influence. System-recommended search terms are those triggered when a user enters a search context. When the system receives a recommendation request, it can calculate a score for every search term to be displayed and rank them; for example, a search term's score can represent the probability that the term is clicked. Based on the ranking result, the system determines which search terms are displayed and where each one is placed.

[0098] The recall and/or ranking tasks can be performed using the recommendation methods in the embodiments of this application. For example, the set of search terms to be displayed can be determined using these methods, and so can the ranking of those search terms.

[0099] To facilitate understanding of the solutions in the embodiments of this application, the terms that may be involved in the embodiments of this application will be explained below.

[0100] (1) Neural Networks:

[0101] A neural network can be composed of neural units. A neural unit can be an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and its output can be:

$$h_{W,b}(x) = f\left(W^{T}x\right) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$

[0102] where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit.

[0103] f is the activation function of the neural unit. It introduces nonlinearity into the neural network by converting the input signal of the neural unit into an output signal, which can serve as the input to the next layer. For example, the activation function can be ReLU, tanh, or sigmoid.
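The neural-unit formula above can be illustrated with a short sketch that computes $f\left(\sum_s W_s x_s + b\right)$ for a choice of activation. The function names and input values are illustrative only.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x, w, b, f=sigmoid):
    """Output of a single neural unit: f(sum_s W_s * x_s + b)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length n")
    z = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    return f(z)


x = [1.0, 2.0, -1.0]
w = [0.5, -0.25, 1.0]
b = 0.5
# z = 0.5 - 0.5 - 1.0 + 0.5 = -0.5, so ReLU outputs 0.0
print(neural_unit(x, w, b, f=relu))
print(neural_unit(x, w, b, f=math.tanh))
```

Swapping `f` changes only the nonlinearity; the weighted sum plus bias is the same for every activation.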

[0104] A neural network is a network formed by connecting multiple individual neural units, meaning that the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract features from the local receptive field, which can be a region composed of several neural units.

[0105] (2) Deep Neural Networks:

[0106] A deep neural network (DNN), also known as a multilayer neural network, can be understood as a neural network with multiple hidden layers. By position, the layers of a DNN fall into three categories: the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. Adjacent layers are fully connected, meaning that every neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.

[0107] Although DNNs seem complex, the operation of each layer is not. Simply put, each layer computes the following expression:

$$\vec{y} = \alpha\left(W\vec{x} + \vec{b}\right)$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called the coefficients), and $\alpha()$ is the activation function. Each layer simply applies this operation to its input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, it also has a large number of coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in a DNN as follows, taking the coefficient $W$ as an example. In a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as $W_{24}^{3}$. The superscript 3 is the index of the layer in which the coefficient $W$ resides, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.

[0108] In summary, the coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W_{jk}^{L}$.

[0109] Note that the input layer has no W parameter. In a deep neural network, more hidden layers enable the network to better model complex real-world situations. In theory, a model with more parameters has higher complexity and greater "capacity," meaning it can perform more complex learning tasks. Training a deep neural network is essentially the process of learning the weight matrices, with the ultimate goal of obtaining the weight matrices of all layers of the trained network (formed by the vectors W of the many layers).
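A forward pass through such a network can be sketched as below, following the $W_{jk}^{L}$ indexing convention: row j of each weight matrix holds the coefficients into neuron j of the current layer, and column k indexes neuron k of the previous layer. The input layer carries no W, so a three-layer network has two weight matrices. The layer sizes and numbers are hypothetical.

```python
import math


def layer_forward(x, W, b, activation):
    """One fully connected layer: y = activation(W x + b).

    W[j][k] is the coefficient from neuron k of the previous layer to
    neuron j of this layer, matching the W^L_{jk} convention.
    """
    return [activation(sum(W_jk * x_k for W_jk, x_k in zip(W_j, x)) + b_j)
            for W_j, b_j in zip(W, b)]


def dnn_forward(x, layers, activation=math.tanh):
    """Apply the layers in order; the input layer itself has no W."""
    for W, b in layers:
        x = layer_forward(x, W, b, activation)
    return x


def identity(z):
    return z


layers = [
    # Layer 2: 3 inputs -> 2 neurons (W is 2 x 3)
    ([[1.0, 0.0, -1.0],
      [0.5, 0.5, 0.5]], [0.0, 1.0]),
    # Layer 3: 2 inputs -> 1 neuron (W is 1 x 2)
    ([[2.0, -1.0]], [0.5]),
]
print(dnn_forward([1.0, 2.0, 3.0], layers, activation=identity))  # [-7.5]
```

With the identity activation the network collapses to a single affine map, which is why nonlinear activations such as tanh are used in practice.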

[0110] (3) Loss function:

[0111] When training a deep neural network, we want its output to be as close as possible to the value we actually want to predict. We therefore compare the network's current prediction with the target value and update the weight vectors of each layer according to the difference between them (the parameters are usually initialized before the first update). For example, if the prediction is too high, the weight vectors are adjusted so that the network predicts a lower value. This adjustment continues until the deep neural network predicts the target value or a value very close to it. Therefore, we need to define in advance how to measure the difference between the predicted and target values; this is the role of the loss function or objective function, which are important equations for measuring that difference. Taking the loss function as an example, a higher output value (loss) indicates a greater difference, so training the deep neural network becomes a process of minimizing this loss.
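As a concrete example of a loss function, consider a click-through-rate model whose targets are 1 (click) or 0 (no click). A common choice for this setting, assumed here rather than specified by the source, is binary cross-entropy. The sketch shows that predictions closer to the targets yield a lower loss.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1]
good = [0.9, 0.1, 0.8]  # predictions close to the targets
bad = [0.2, 0.7, 0.3]   # predictions far from the targets
print(binary_cross_entropy(labels, good))  # smaller loss
print(binary_cross_entropy(labels, bad))   # larger loss
```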

[0112] (4) Backpropagation algorithm:

[0113] Backpropagation (BP) is an algorithm used during training to correct the parameters of the initial model so that the model's error loss becomes smaller. Specifically, the input signal is propagated forward to the output, producing an error loss; the error loss information is then propagated backward to update the parameters of the initial model, so that the error loss converges. Backpropagation is thus a backward pass driven by the error loss, aimed at obtaining optimal model parameters, such as the weight matrices.
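The forward and backward passes can be illustrated on the smallest possible model: one sigmoid neuron trained with binary cross-entropy. This is a minimal sketch, not a general backpropagation implementation; for this pairing the chain rule gives the gradient of the loss with respect to the pre-activation as simply `p - y`. The toy dataset (label equals the first feature) and the hyperparameters are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_neuron(data, lr=1.0, epochs=500):
    """Train one sigmoid neuron with backpropagation on binary cross-entropy.

    data: list of (features, label) pairs with label in {0, 1}.
    Returns (weights, bias, loss_history).
    """
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0  # parameters initialized before the first update
    history = []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * n, 0.0, 0.0
        for x, y in data:
            # Forward pass: input signal -> prediction -> error loss.
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            # Backward pass: for sigmoid + cross-entropy, dLoss/dz = p - y;
            # the chain rule then gives dLoss/dw_i = (p - y) * x_i.
            dz = p - y
            for i in range(n):
                grad_w[i] += dz * x[i]
            grad_b += dz
        m = len(data)
        # Gradient-descent update using the mean gradient over the dataset.
        w = [wi - lr * gw / m for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / m
        history.append(loss / m)
    return w, b, history


data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1), ([0.0, 0.0], 0)]
w, b, history = train_logistic_neuron(data)
# The first loss is ln 2 because every initial prediction is 0.5.
print(round(history[0], 4), round(history[-1], 4))
```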

[0114] (5) Large language model:

[0115] Large language models (LLMs) are neural network models with a large number of parameters, trained on vast corpora, that can understand and generate natural language text. LLMs learn the syntax, semantics, and contextual information of a language through training on large amounts of text data; during training, the model continuously optimizes its parameters to improve its ability to understand and generate text. Because of their strong natural language understanding, LLMs are widely applied to natural language understanding and generation problems in fields such as natural language processing, machine translation, and dialogue systems.

[0116] (6) Click-through rate prediction (CTR prediction):

[0117] CTR prediction estimates the probability that a user clicks each product based on information such as the product, the user, and the context. The accuracy of this prediction affects revenue.

[0118] (7) Quantization:

[0119] Quantization divides a continuous high-dimensional space into different encoding regions and, through nearest-neighbor lookup, maps each vector to be quantized to the discrete index of the region it falls in.

[0120] (8) Vector quantization (VQ):

[0121] VQ is a data compression technique that maps vectors of continuous values to a finite set of discrete values. This finite set is called the codebook, and its elements are called codewords; that is, the codebook is the set of quantization vectors used for vector quantization.

[0122] For example, the basic process of vector quantization may include the following stages.

[0123] Training phase:

[0124] Select or generate an initial codebook from the original dataset. The original dataset consists of original data, which are high-dimensional vectors; the original data are also called original vectors.

[0125] Clustering algorithms are used to iteratively optimize the codebook so that the codewords in the codebook can represent the vector distribution in the original dataset. Each codeword represents a set of similar vectors in the original data.

[0126] Encoding phase:

[0127] For each vector to be quantized (i.e. the original vector), find the codeword that is closest to it, and use that codeword index to represent the vector to be quantized.

[0128] In this way, the original high-dimensional vector is replaced with a low-dimensional index.

[0129] Decoding phase:

[0130] Based on the stored codeword index, the corresponding codeword is retrieved from the codebook as an approximate representation of the original vector.
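The three VQ phases can be sketched in plain Python as follows. The codebook is trained with k-means (Lloyd's algorithm), one common choice of clustering algorithm; the source does not fix a specific one. Initialization from the first k vectors keeps the sketch deterministic. Function names and toy data are hypothetical.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_index(vector, codebook):
    """Index of the codeword closest to `vector` (nearest-neighbor lookup)."""
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))


def train_codebook(vectors, k, iterations=10):
    """Training phase: refine an initial codebook with k-means.

    The initial codebook is simply the first k vectors.
    """
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest_index(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster becomes empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def encode(vectors, codebook):
    """Encoding phase: replace each vector with the index of its nearest codeword."""
    return [nearest_index(v, codebook) for v in vectors]


def decode(indices, codebook):
    """Decoding phase: look up each index to get an approximate vector."""
    return [codebook[i] for i in indices]


data = [[0.0, 0.1], [10.0, 9.9], [0.2, 0.0], [9.8, 10.1], [0.1, 0.2], [10.2, 10.0]]
codebook = train_codebook(data, k=2)
codes = encode(data, codebook)
print(codes)                       # [0, 1, 0, 1, 0, 1]
print(decode(codes, codebook)[0])  # approximately [0.1, 0.1]
```

Each original two-dimensional vector is now stored as a single integer, at the cost of the approximation error between the vector and its codeword.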

[0131] (9) Residual quantization (RQ):

[0132] RQ is an improved vector quantization method that improves overall quantization accuracy by recursively quantizing the difference (i.e., residual) between the original vector and its quantized approximation. This method is particularly suitable for applications requiring high-precision representation, such as large-scale similarity search, image processing, and recommender systems.

[0133] For example, the basic process of residual quantization may include the following stages.

[0134] Initial quantization:

[0135] First, a codebook is generated using traditional vector quantization methods (such as clustering), and the original vectors are quantized for the first time.

[0136] Choose the codeword that is closest to the original vector as the initial quantization result.

[0137] Calculate the residuals:

[0138] The difference between the original vector and the initial quantization result obtained in the first step is calculated; this difference is the residual vector.

[0139] Recursive quantization:

[0140] For the residual vector obtained in the previous step, the vector quantization process is applied again to generate a new codebook and find the codeword that is closest to the residual vector.

[0141] This new codeword represents a correction term to the original vector, used to further approximate the original vector.

[0142] Iterative process:

[0143] The above steps can be repeated multiple times, each time quantizing the residual left by the previous step more precisely, until the required accuracy is reached or another stopping condition is met.

[0144] Decoding phase:

[0145] During decoding, the codewords selected at all quantization levels are summed to obtain the final approximate vector.
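The residual quantization stages above can be sketched in plain Python. As an assumption for illustration, each level's codebook is learned with k-means on the residuals left by earlier levels, initialized from the first k residuals. Encoding picks one codeword index per level, and decoding sums the chosen codewords. All names and data are hypothetical.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_index(vector, codebook):
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))


def kmeans(vectors, k, iterations=10):
    """Plain k-means used to build the codebook at each level (first-k initialization)."""
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest_index(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def train_residual_quantizer(vectors, k, levels):
    """Learn one codebook per level, each fitted to the residuals of the previous levels."""
    residuals = [list(v) for v in vectors]
    codebooks = []
    for _ in range(levels):
        codebook = kmeans(residuals, k)
        codebooks.append(codebook)
        # Subtract the chosen codeword to obtain the residual for the next level.
        residuals = [[r - c for r, c in zip(res, codebook[nearest_index(res, codebook)])]
                     for res in residuals]
    return codebooks


def rq_encode(vector, codebooks):
    """Return one codeword index per level."""
    residual, codes = list(vector), []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rq_decode(codes, codebooks):
    """Sum the selected codewords of all levels to approximate the original vector."""
    approx = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        approx = [a + c for a, c in zip(approx, codebook[idx])]
    return approx


data = [[0.0, 0.0], [1.0, 0.1], [5.0, 5.0], [6.0, 5.2], [0.2, 1.0], [5.1, 6.1]]
codebooks = train_residual_quantizer(data, k=2, levels=3)
for levels in (1, 2, 3):
    used = codebooks[:levels]
    error = sum(squared_distance(v, rq_decode(rq_encode(v, used), used)) for v in data)
    print(levels, round(error, 4))  # total error does not increase as levels are added
```

Because each level's k-means centroids are means of the residuals assigned to them, the total squared error after a level is never larger than before it, which is the sense in which each new codeword is a correction term.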

[0146] The system architecture of the embodiments of this application will be described below.

[0147] Figure 1 illustrates a system architecture 200 applicable to an embodiment of this application. The system architecture 200 may include a local device 220, a local device 230, an execution device 210, and a data storage system 250, wherein the local devices 220 and 230 are connected to the execution device 210 via a communication network.

[0148] For example, the execution device 210 can be a device or server with data processing capabilities, such as a cloud server, network server, application server, or management server. The execution device 210 may be implemented by one or more servers, optionally in conjunction with other computing devices such as data storage devices, routers, and load balancers. The execution device 210 may be deployed at a single physical site or distributed across multiple physical sites. The execution device 210 may use data in the data storage system 250 or call program code in the data storage system 250 to implement the recommendation method of the embodiments of this application. The execution device 210 may also be referred to as a cloud device, in which case it is deployed in the cloud.

[0149] In one possible implementation, the method executed by the aforementioned execution device 210 can be a training method executed in the cloud.

[0150] Users can interact with execution device 210 by operating their respective user devices (e.g., local device 220 and local device 230). Each local device can represent any computing device, such as a personal computer, computer workstation, smartphone, laptop, tablet, augmented reality (AR) / virtual reality (VR) device, ultra-mobile personal computer (UMPC), netbook, personal digital assistant (PDA), smart camera, smart car, other types of cellular phone, media consumption device, wearable device, set-top box, or game console, etc.

[0151] Each user's local device can interact with the execution device 210 through a communication network of any communication mechanism / standard. The communication network can be a wide area network, a local area network, a point-to-point connection, or any combination thereof.

[0152] In one implementation, local devices 220 and 230 can obtain relevant parameters of the recommendation model from execution device 210, and use the recommendation model to obtain recommendation results on local devices 220 and 230.

[0153] In another implementation, the recommendation model can be directly deployed on the execution device 210. The execution device 210 obtains recommendation requests and related information, such as user information, candidate recommendation object information and / or context information, from local devices 220 and 230, and obtains recommendation results through the recommendation model.

[0154] For example, the data storage system 250 may be deployed in local device 220 or local device 230 for storing training data of the local device.

[0155] For example, the data storage system 250 can be deployed independently on a storage device separate from the local device 220 or the local device 230. The storage device can interact with the local devices to obtain their user behavior logs and store them.

[0156] Figure 2 shows a schematic diagram of the system architecture of a recommendation system according to an embodiment of this application. The recommendation system will be described below using a click-through rate (CTR) prediction scenario as an example. In this case, the model in Figure 2 can be a recommendation model, such as a CTR prediction model. The CTR prediction model is used to predict the user's click-through rate on products to guide the system in making recommendations to the user.

[0157] As shown in Figure 2, the operation of this system can be divided into two parts: online and offline. The system can include an online prediction module and an offline training module.

[0158] In the online component, when a user enters the system, they can initiate a request (such as a recommendation request). The online prediction module can then provide a corresponding click-through rate (CTR) prediction based on relevant information, such as the user, product, and contextual information. For example, as shown in Figure 2, user, product, and contextual information can be used to generate features, which can then serve as input to the model. The online prediction module can be implemented through prediction instances. The request is sent to a prediction instance, which hosts a CTR prediction model. The CTR prediction model outputs the predicted CTR, and the recommendation system can then output recommendations based on this prediction, such as generating a list of recommended products.

[0159] For example, when a user opens the app store on a smart device (e.g., a mobile phone), the app store's recommendation system is triggered and initiates a recommendation request. The recommendation system can predict the probability that the user clicks to download each candidate application based on the user's historical behavior logs (such as download history) and features of the app store itself, such as environmental features like time and location. The recommendation system can then display candidate applications in descending order of predicted download probability, thereby increasing the likelihood that the displayed applications are downloaded.

[0160] Recommendations can be fed back to users for interaction. Users browse the product list and may perform actions, such as clicking. User behavior can be recorded and stored in a log. After data processing, the log can be used as training data for an offline training model. The offline training module can train the click-through rate (CTR) prediction model using training algorithms, or in other words, update the CTR prediction model. Features generated based on user, product, and contextual information can also be used as input information in the training data. The updated CTR prediction model is then used in the online prediction module, for example, deployed in a prediction instance, and goes live, completing the entire closed loop.

[0161] A prediction instance can be understood as a computing resource or environment used to run a machine learning model. For example, a prediction instance can be a virtual machine (VM), a container (for example, a Docker container), or a serverless function.

[0162] In the solution of the embodiments of this application, the text descriptions of users and/or products can be processed to construct a graph structure. Enhanced features derived from the graph structure can be input into the model as additional features to assist the model's prediction in the online stage, and/or to optimize the training algorithm in the offline stage, that is, to train the model. This allows the system to better exploit the latent relationships in the graph structure, improving the model's capability and thus the accuracy of its results.

[0163] Graph-structured data is widely used in fields such as social networks and recommender systems. In recommender systems, for example, recommendation methods based on graph neural networks (GNNs) have become cutting-edge algorithms because they can extract complex topological information from graph-structured data. The quality of the graph-structured data is a crucial factor affecting prediction accuracy.

[0164] In some related solutions, graph structures are constructed based on preset rules or through crowdsourcing. For example, constructing graph structures through crowdsourcing can be based on relational annotations in knowledge graphs.

[0165] Figure 3 illustrates two schemes for constructing graph structures. Figure 3(a) shows a collaborative graph constructed based on preset rules, which can be called a conventional collaborative graph. For example, if a user clicks an item, an edge is added between the user node and the item node. Figure 3(b) shows a content collaborative graph, which, compared with Figure 3(a), additionally includes phrase nodes.
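The preset rule behind the conventional collaborative graph in Figure 3(a) can be sketched as follows. Every click becomes a user-item edge, including accidental ones, which is how rule-based graphs pick up noise. The click log and function name are hypothetical.

```python
from collections import defaultdict


def build_collaborative_graph(click_log):
    """Preset rule: add a user-item edge whenever the user clicked the item.

    click_log: iterable of (user_id, item_id) pairs.
    Returns an adjacency dict mapping each node to the set of its neighbors.
    """
    adjacency = defaultdict(set)
    for user_id, item_id in click_log:
        user_node, item_node = ("user", user_id), ("item", item_id)
        adjacency[user_node].add(item_node)
        adjacency[item_node].add(user_node)
    return adjacency


clicks = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u2", "i3")]
graph = build_collaborative_graph(clicks)
print(sorted(graph[("item", "i2")]))  # [('user', 'u1'), ('user', 'u2')]
```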

[0166] The above schemes rely too heavily on rules, so the resulting graph structure is noisy. As shown in Figure 3, some edges are caused by noisy clicks, which degrades the quality of the graph.

[0167] Some related solutions rely on manual methods to construct the graph structure, that is, manually determining which nodes need to be connected by edges. This approach has high manual costs and is difficult to scale efficiently in large-scale scenarios.

[0168] Large language models (LLMs), with their extensive open-domain knowledge and excellent language understanding and reasoning capabilities, have become a potential solution for constructing graph structures. Specifically, leveraging the external knowledge and reasoning capabilities of LLMs, user node pairs or item node pairs, along with their respective information, are input pairwise into the LLM. The LLM then determines the similarity between the nodes and decides whether to connect them based on the similarity level. This approach automates graph structure construction to a certain extent, significantly reducing manual costs.

[0169] However, because of the LLM's context length limit, it is difficult to provide complete graph information in the prompt. Comparing nodes pairwise leaves out information about their relationships with other nodes, so the scheme cannot capture the global information of the graph and can only optimize the local graph structure, which compromises the quality of the graph structure. Furthermore, because this scheme compares nodes pairwise, its time complexity is O(N^2), where N is the number of nodes; the resulting high frequency of LLM inference prevents it from adapting to large-scale scenarios.
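To make the scaling argument concrete, the following arithmetic counts the LLM calls needed when every unordered node pair is compared once, N(N-1)/2. For comparison, it also counts calls under the assumption of one LLM call per node, as when each target's text description is processed once. Both functions are illustrative.

```python
def pairwise_llm_calls(num_nodes):
    """LLM calls needed to compare every unordered pair of nodes once: N(N-1)/2."""
    return num_nodes * (num_nodes - 1) // 2


def per_node_llm_calls(num_nodes):
    """LLM calls needed if each node's text is processed once (linear in N)."""
    return num_nodes


for n in (1_000, 100_000, 1_000_000):
    print(n, pairwise_llm_calls(n), per_node_llm_calls(n))
```

At one million nodes the pairwise scheme needs roughly 5 × 10^11 calls, against one million for a single pass over the nodes.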

[0170] In view of this, the embodiments of this application provide a recommendation method that is beneficial to improving the graph structure's perception of global semantic information, thereby improving the quality of the graph structure and helping the recommendation system achieve better recommendation results. At the same time, the solution of the embodiments of this application helps to ensure overall processing efficiency and avoids the computational burden caused by too many LLM calls.

[0171] Figure 4 is a schematic diagram of a device 400 for constructing a graph structure according to an embodiment of this application. To better understand the method in this embodiment, the functions of each module in Figure 4 are briefly described below. The device 400 can be deployed on a cloud server, on a terminal device, or on both a cloud server and a terminal device.

[0172] As shown in Figure 4, the device 400 may include a semantic vector generation module 410, a latent factor extraction module 420, and a graph construction module 430.

[0173] The semantic vector generation module 410 can be used to generate a semantic representation of the target based on the textual description of the target.

[0174] For example, the input to the semantic vector generation module 410 may include the text data of the target, and the output may include the semantic representation of the target.

[0175] The target can be a user or an object.

[0176] Optionally, the semantic vector generation module 410 can be used to generate semantic representations of users and/or semantic representations of objects.

[0177] For example, the input to the semantic vector generation module 410 may include a user's text description, and the output may include the user's semantic representation.

[0178] For example, the input to the semantic vector generation module 410 may include an object's text description, and the output may include the object's semantic representation.

[0179] Alternatively, the semantic representations of users and of objects can be generated by two separate semantic vector generation modules 410; the embodiments of this application do not limit this.

[0180] The latent factor extraction module 420 can be used to extract latent factors from semantic representations.

[0181] For example, the input to the latent factor extraction module 420 may include the semantic representation of the target, and the output may include one or more latent factors of the target.

[0182] Optionally, the latent factor extraction module 420 can be used to extract latent factors of a user from the user's semantic representation, and / or to extract latent factors of an object from the object's semantic representation.

[0183] For example, the input of the latent factor extraction module 420 may include the user's semantic representation, and the output may include the latent factors of the user's semantic representation, i.e., the user's latent factors.

[0184] For example, the input to the latent factor extraction module 420 may include the semantic representation of an object, and the output may include the latent factors of the semantic representation of the object, i.e., the latent factors of the object.

[0185] Alternatively, the latent factors of users and of objects can be extracted by two separate latent factor extraction modules 420; the embodiments of this application do not limit this.

[0186] The graph construction module 430 can be used to construct a graph structure based on the latent factors of the targets. The association relationships between the nodes of the targets in the graph structure are determined based on their latent factors.

[0187] For example, the graph construction module 430 can establish associations between the nodes of the targets based on the similarity between their latent factors.

[0188] For example, nodes with the same latent factor are associated with each other.

[0189] As an example, latent factors can themselves be represented as nodes in the graph structure, that is, latent factor nodes. An edge is established between the node of each target and the node of each of its latent factors. Accordingly, the nodes of targets that share a latent factor are associated with each other through the node of that latent factor.
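The latent-factor-node construction in [0189] can be sketched as a bipartite adjacency structure. Targets sharing a factor become two hops apart through that factor's node. As an assumption for illustration, a latent factor is identified by a `(level, codeword_index)` pair, such as a residual quantizer might produce; the source does not fix this representation, and all identifiers are hypothetical.

```python
from collections import defaultdict


def build_latent_factor_graph(target_factors):
    """Connect each target node to the nodes of its latent factors.

    target_factors: dict mapping target_id -> iterable of latent factor ids.
    Returns an adjacency dict over ("target", id) and ("factor", id) nodes.
    """
    adjacency = defaultdict(set)
    for target_id, factors in target_factors.items():
        target_node = ("target", target_id)
        neighbors = adjacency[target_node]  # also registers targets without factors
        for factor_id in factors:
            factor_node = ("factor", factor_id)
            neighbors.add(factor_node)
            adjacency[factor_node].add(target_node)
    return adjacency


def related_targets(adjacency, target_id):
    """Targets associated through a shared latent-factor node (two hops away)."""
    start = ("target", target_id)
    related = set()
    for factor_node in adjacency[start]:
        related.update(node[1] for node in adjacency[factor_node] if node != start)
    return related


factors = {
    "item_a": [(0, 3), (1, 7)],
    "item_b": [(0, 3), (1, 2)],
    "item_c": [(0, 5), (1, 7)],
    "item_d": [(0, 1), (1, 4)],
}
graph = build_latent_factor_graph(factors)
print(sorted(related_targets(graph, "item_a")))  # ['item_b', 'item_c']
print(sorted(related_targets(graph, "item_d")))  # []
```

Routing associations through factor nodes keeps the edge count linear in the number of targets, whereas connecting every pair of targets that share a factor directly can grow quadratically within a popular factor.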

[0190] The constructed graph structure can be used in downstream task models to improve the performance of downstream tasks.

[0191] For a detailed description of each module in device 400, please refer to method 500 shown in Figure 5.

[0192] Figure 5 illustrates a method for constructing a graph structure according to an embodiment of this application. The method 500 shown in Figure 5 can be executed by a cloud server, a terminal device, or a combination of both.

[0193] For example, the method 500 shown in Figure 5 can be performed by the apparatus shown in Figure 4.

[0194] As shown in Figure 5, method 500 may include the following steps.

[0195] 510. Obtain text descriptions of multiple targets.

[0196] 520. Generate semantic representations of the multiple targets based on their textual descriptions.

[0197] 530. Based on the semantic representations of the multiple targets, extract the latent factors of each target. These latent factors are used to reflect the semantic similarity between the multiple targets.

[0198] 540. Construct a graph structure that includes nodes of the multiple targets. The association relationships between the nodes of the multiple targets are determined based on their latent factors. This graph structure can be used for downstream tasks.
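Steps 510 to 540 can be strung together in a toy end-to-end sketch. Every component is a hypothetical stand-in. `toy_embed` is a bag-of-words count over a fixed vocabulary, used in place of an LLM-generated semantic representation. The latent factor is the index of the nearest codeword in a hand-written codebook. Following [0188], targets that share a latent factor are connected directly by an edge.

```python
from collections import defaultdict
from itertools import combinations

VOCAB = ["rock", "jazz", "guitar", "piano", "news", "sports"]  # hypothetical vocabulary


def toy_embed(text):
    """Stand-in for step 520: a bag-of-words vector instead of an LLM embedding."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def latent_factor(vector, codebook):
    """Stand-in for step 530: index of the nearest codeword."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(vector, codebook[i]))
    return min(range(len(codebook)), key=dist)


def construct_graph(descriptions, codebook):
    """Steps 520-540: text -> representation -> latent factor -> edges."""
    # Steps 520 and 530: one representation and one latent factor per target.
    factors = {tid: latent_factor(toy_embed(text), codebook)
               for tid, text in descriptions.items()}
    # Step 540: connect targets that share the same latent factor.
    groups = defaultdict(list)
    for tid, factor in factors.items():
        groups[factor].append(tid)
    edges = set()
    for members in groups.values():
        edges.update(combinations(sorted(members), 2))
    return factors, edges


# Step 510: text descriptions of the targets (hypothetical apps).
descriptions = {
    "app_1": "rock guitar lessons",
    "app_2": "jazz piano and guitar",
    "app_3": "daily news and sports scores",
    "app_4": "sports news alerts",
}
codebook = [
    [0.5, 0.5, 0.5, 0.5, 0.0, 0.0],  # hypothetical "music" codeword
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],  # hypothetical "news and sports" codeword
]
factors, edges = construct_graph(descriptions, codebook)
print(factors)
print(sorted(edges))
```

In a realistic setting, the embedding would come from the LLM and the codebook from training (for example, by the quantization procedures described earlier); only the overall data flow is meant to carry over.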

[0199] That the graph structure includes nodes of the multiple targets can be understood as the multiple targets serving as nodes in the graph structure; in other words, the multiple targets correspond to multiple nodes in the graph structure.

[0200] Optionally, the multiple targets may include multiple users or multiple objects.

[0201] For example, if the multiple targets include multiple users, the users can serve as user nodes in the graph structure.

[0202] For example, the multiple targets may include multiple objects, and these multiple objects can be used as multiple object nodes in a graph structure.

[0203] An object may also be referred to as an item, a project, or a commodity.

[0204] The graph structure may also be referred to as an interaction graph or a collaboration graph.

[0205] A target's textual description refers to textual data or content information related to the target. For example, a target's textual description may include the target's attribute information.

[0206] As one possible implementation, the multiple targets can be multiple users. Text descriptions of multiple users can be obtained in step 510.

[0207] A user's text description refers to text data related to the user. For example, a user's text description may include the user's attribute information and / or the user's behavioral data.

[0208] For example, user attribute information may include one or more of the following: the user's gender, age, occupation, income, hobbies, or education level.

[0209] User behavior data refers to the records of user interactions with the system.

[0210] For example, the system can be a system used to perform downstream tasks.

[0211] The following uses a recommendation task as an example of the downstream task to illustrate user behavior data.

[0212] This system can be a recommendation system. The recommendation task can be a prediction task within the recommendation system, such as click-through rate prediction. User behavior data refers to the records of user interactions with the recommendation system. For example, actions can include clicks, purchases, favorites, ratings, downloads, and any other interaction between the user and recommended items (such as products, articles, videos, etc.).

[0213] For example, user behavior data may include one or more of the following: behavior type, timestamp, operation object, or context information.

[0214] The behavior type refers to the specific action performed by the user, such as browsing, liking, or purchasing. The timestamp records the time when the user performed the action. The operation object refers to the specific item or content ID that the user's action targeted. Contextual information can include the user's current situation, such as device type, geographical location, and weather conditions.

[0215] For example, user behavior data can be provided in the form of user behavior data sequences.
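One way to turn user attributes and a behavior data sequence into a single text description for step 510 is sketched below. The record fields mirror the four kinds of information listed in [0213] and [0214]. The rendering format, the `max_records` cap, and the use of ISO-8601 timestamp strings (which sort chronologically as plain strings) are assumptions for illustration; the source does not specify how the description is assembled.

```python
from dataclasses import dataclass


@dataclass
class BehaviorRecord:
    behavior_type: str  # e.g. "click", "purchase"
    timestamp: str      # ISO-8601 string, so lexical order is chronological
    target_item: str    # operation object: the item or content the action targeted
    context: str        # e.g. device type or location


def describe_user(attributes, behaviors, max_records=20):
    """Render user attributes and the most recent behaviors as one text description."""
    attr_text = ", ".join(f"{key}: {value}" for key, value in attributes.items())
    recent = sorted(behaviors, key=lambda r: r.timestamp)[-max_records:]
    behavior_text = "; ".join(
        f"{r.timestamp} {r.behavior_type} {r.target_item} ({r.context})" for r in recent
    )
    return f"User attributes: {attr_text}. Behavior sequence: {behavior_text}."


user = {"gender": "female", "age": 28, "occupation": "engineer"}
log = [
    BehaviorRecord("download", "2025-01-03T09:15", "podcast app", "phone"),
    BehaviorRecord("click", "2025-01-02T20:40", "music player", "tablet"),
]
print(describe_user(user, log))
```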

[0216] It should be understood that the above are merely examples of behavior data and do not limit its specific content. Furthermore, the system need not be the system that performs the downstream task; that is, behavior data obtained from user interactions with other systems can also be used in downstream tasks.

[0217] As another possible implementation, the multiple targets can be multiple objects. Textual descriptions of the multiple objects can be obtained in step 510.

[0218] The object refers to the item or content that the user's action targets. This user action can be an action that has already occurred or an action that has not yet occurred. Taking a downstream task as a recommendation task as an example, the object refers to the recommendation object in the recommendation system, that is, the recommended item.

[0219] The recommended object refers to the object recommended by the recommendation system to the user. For example, in a music recommendation scenario, the recommended object can be music, while in an advertising scenario, the recommended object can be advertisements. It should be understood that the specific content of the recommended object can be different in different recommendation scenarios, and the embodiments of this application do not limit the specific content of the recommended object.

[0220] An object's textual description refers to the textual data associated with the object. For example, an object's textual description may include the object's attribute information, etc.

[0221] For example, an object's attribute information may include one or more of the following: the object's name or the object's type, etc.

[0222] The process of generating semantic representations is explained below.

[0223] A semantic representation may also be referred to as a semantic vector, a semantic embedding, or a semantic embedding vector.

[0224] For example, step 520 can be performed by the semantic vector generation module 410 in device 400.

[0225] Optionally, the plurality of targets may include target #1 (an example of a first target). Step 520 may include: generating a semantic representation of target #1 based on the textual description of target #1 using an LLM.

[0226] The number #1 is for descriptive convenience only; target #1 can be any one of the multiple targets.

[0227] For example, in step 520, semantic representations of the multiple targets can be generated by the LLM based on the textual descriptions of the multiple targets.

[0228] For ease of description, the following explanation uses the generation of the semantic representation of one target (target #1) as an example. The semantic representations of the other targets can be generated in the same way.

[0229] The text description of target #1 can be input into the LLM via a prompt. That is, the input of the LLM includes a prompt for target #1, and the prompt carries the text description of target #1.

[0230] Further, optionally, the prompt for target #1 can be used to instruct the LLM to analyze target #1 based on the text description of target #1 in order to obtain an analysis result.

[0231] For example, when target #1 is a user, the prompt for the user is input into the LLM. This prompt can be used to instruct the LLM to infer the user's preference information based on the user's text description.

[0232] For example, when target #1 is an object, the prompt for the object is input into the LLM. This prompt can be used to instruct the LLM to extract facts about the object based on its text description, or to generate richer knowledge related to the object.

[0233] As an example, the semantic representation of target #1 can be determined based on the hidden layer representations of one or more hidden layers of the LLM.

[0234] That is, the text description of target #1 is input into the LLM via a prompt for processing, and the semantic representation of target #1 is determined based on the hidden layer representations of one or more hidden layers generated by the LLM.

[0235] The hidden layer representation of a hidden layer in an LLM can be understood as the state of that hidden layer or the vector of its output.

[0236] For example, the semantic representation of target #1 may include the hidden layer representations of one or more hidden layers of the LLM.

[0237] Optionally, the semantic representation of target #1 may include the hidden layer representation of the last hidden layer of the LLM.

[0238] Alternatively, the semantic representation of target #1 can be obtained by processing the hidden layer representations of one or more hidden layers of the LLM.

[0239] For example, when target #1 is a user, the user's prompt is input into the LLM, and the prompt can be used to instruct the LLM to infer the user's preference information based on the user's text description. The hidden layer representation of the last hidden layer of the LLM during generation is used as the semantic representation of the user.
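The text does not specify how a token-level hidden layer representation is reduced to one vector per target. The sketch below assumes two common choices: taking the final non-padding token, or averaging over the real tokens, optionally averaged across several hidden layers. The function name `semantic_representation`, the array layout `(num_layers, seq_len, hidden_dim)` and right padding are assumptions, and a toy array stands in for real LLM states. (With Hugging Face `transformers`, for instance, passing `output_hidden_states=True` to a model returns per-layer hidden states that could be stacked into this layout.)

```python
import numpy as np


def semantic_representation(hidden_states: np.ndarray,
                            attention_mask: np.ndarray,
                            layers=(-1,),
                            pooling: str = "last") -> np.ndarray:
    """Reduce per-token hidden states of one prompt to a single vector.

    hidden_states:  (num_layers, seq_len, hidden_dim) hidden layer representations.
    attention_mask: (seq_len,), 1 for real tokens and 0 for padding (right padding assumed).
    layers:         hidden layers to use; (-1,) selects the last hidden layer.
    pooling:        "last" takes the final real token, "mean" averages real tokens.
    """
    selected = hidden_states[list(layers)]            # (len(layers), seq_len, hidden_dim)
    mask = attention_mask.astype(bool)
    if pooling == "last":
        last_idx = np.nonzero(mask)[0][-1]
        pooled = selected[:, last_idx, :]             # (len(layers), hidden_dim)
    elif pooling == "mean":
        pooled = selected[:, mask, :].mean(axis=1)    # (len(layers), hidden_dim)
    else:
        raise ValueError(f"unknown pooling: {pooling}")
    return pooled.mean(axis=0)                        # combine the chosen layers


# Toy stand-in for LLM hidden states: 2 layers, 4 tokens (last one is padding), dim 3.
hidden = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
mask = np.array([1, 1, 1, 0])
print(semantic_representation(hidden, mask))                    # [18. 19. 20.]
print(semantic_representation(hidden, mask, pooling="mean"))    # [15. 16. 17.]
print(semantic_representation(hidden, mask, layers=(0, 1)))     # [12. 13. 14.]
```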

[0240] Figure 13 illustrates two examples of the generation process. Figure 13(a) shows an example of the reasoning process for user preference information. The prompt in Figure 13(a) can be a user preference reasoning prompt.

[0241] The user's prompt template, the prompt content built from the template, and the response obtained by inputting the prompt into the LLM are shown in Figure 13(a).

[0242] For example, when target #1 is an object, the prompt for the object is input into the LLM. This prompt can be used to instruct the LLM to extract facts about the object based on its textual description, or to generate richer knowledge related to the object. The hidden layer representation of the last hidden layer of the LLM during generation is used as the semantic representation of the object. Figure 13(b) shows an example of the object knowledge extraction process. The prompt in Figure 13(b) can be an item fact extraction prompt.

[0243] The object's prompt template, the prompt content built from the template, and the response obtained by inputting the prompt into the LLM are shown in Figure 13(b).
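The actual templates appear only in Figure 13, so the following is a hypothetical stand-in. It shows how a per-target prompt might be filled from a text description. The template wording, `TEMPLATES` and `build_prompt` are illustrative assumptions and are not the templates of Figure 13.

```python
# Hypothetical templates; the templates actually used in Figure 13 are not reproduced here.
USER_PREFERENCE_TEMPLATE = (
    "The following is a description of a user.\n"
    "User description: {description}\n"
    "Based on this description, infer the user's preferences."
)

ITEM_FACT_TEMPLATE = (
    "The following is a description of an item.\n"
    "Item description: {description}\n"
    "Extract the factual information about this item and add related knowledge."
)

TEMPLATES = {"user": USER_PREFERENCE_TEMPLATE, "object": ITEM_FACT_TEMPLATE}


def build_prompt(target_type: str, description: str) -> str:
    """Fill the template for a user or an object with that target's text description."""
    if target_type not in TEMPLATES:
        raise ValueError(f"unsupported target type: {target_type}")
    return TEMPLATES[target_type].format(description=description.strip())


print(build_prompt("user", "Female, 28, often listens to jazz in the evening"))
```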

[0244] As another example, the semantic representation of target #1 can be obtained by text encoding the output of the LLM through a text encoder.

[0245] That is, the text description of target #1 is input into the LLM via the prompt for target #1 for processing, the output of the LLM is encoded by a text encoder, and the encoded result is used as the semantic representation of target #1.

[0246] For example, when target #1 is a user, the user's prompt is input into the LLM, and the prompt can be used to instruct the LLM to infer the user's preference information based on the user's text description. The user preference information output by the LLM is text-encoded, and the encoding result is used as the semantic representation of the user.

[0247] For example, when target #1 is an object, the object's prompt is input into the LLM, and the prompt can be used to instruct the LLM to extract facts about the object based on its text description, or to generate richer knowledge related to the object. The LLM output is then text-encoded, and the encoding result is used as the semantic representation of the object.
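Below is a minimal sketch of the two-stage path. It assumes the LLM and the text encoder are supplied as plain callables. So that it runs without any model, a hashed bag-of-words encoder is swapped in as the text encoder and a stub replaces the LLM. Both are deliberately simple placeholders with hypothetical names (`hashed_bow_encoder`, `semantic_rep_via_text_encoder`, `fake_llm`); a real system would use trained models.

```python
import hashlib
from typing import Callable

import numpy as np


def hashed_bow_encoder(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in text encoder: hashed bag of words, L2-normalized (not a trained model)."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def semantic_rep_via_text_encoder(prompt: str,
                                  llm_generate: Callable[[str], str],
                                  encode: Callable[[str], np.ndarray]) -> np.ndarray:
    """LLM output (e.g. inferred preferences or extracted facts) -> text encoder -> vector."""
    analysis = llm_generate(prompt)
    return encode(analysis)


def fake_llm(prompt: str) -> str:
    """Hypothetical LLM stub so the sketch runs without a model."""
    return "prefers jazz and classical music in the evening"


v = semantic_rep_via_text_encoder("User description: listens to jazz at night",
                                  fake_llm, hashed_bow_encoder)
print(v.shape, round(float(np.linalg.norm(v)), 6))   # (64,) 1.0
```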

[0248] It should be understood that the above are merely some examples of methods for generating semantic representations and do not constitute a limitation on the solutions of the embodiments of this application. For example, in other implementations, the semantic representation of a target can also be generated using a language model other than an LLM, or using a text encoder.

[0249] The process of extracting latent factors is explained below.

[0250] Latent factors can also be replaced with semantic factors or semantic elements, etc.

[0251] Because the semantic representations of the multiple targets are obtained separately, the overall correlation among these semantic representations still needs to be extracted.

[0252] After the semantic representations of the multiple targets are obtained, the latent factors of these targets are extracted to uncover their global semantic relevance, i.e., to explore the potential connections between them. Semantic relevance may also be referred to as semantic similarity.

[0253] Latent factors typically contain specific semantics and can reflect the similarity of targets in a certain aspect. Targets with the same or similar latent factors have a higher semantic relevance. The latent factors of multiple targets can be used to reflect the semantic similarity of those targets.

[0254] For example, step 530 can be performed by the latent factor extraction module 420 in device 400.

[0255] Latent factors can be extracted in a variety of ways.

[0256] As one possible implementation, the latent factors of a target can be obtained through vector quantization. In this case, the latent factors may also be referred to as semantic indexes, quantization indexes, etc.

[0257] That is, through vector quantization, different targets are assigned to different codewords of the codebook, so that global semantic information is mined.

[0258] Optionally, the plurality of targets may include target #2 (an example of a fourth target). Step 530 may include: performing vector quantization on the semantic representation of target #2 to obtain the latent factors of target #2.

[0259] The number #2 is for descriptive convenience only; target #2 can be any one of the multiple targets.

[0260] For example, in step 530, the semantic representations of the multiple targets are subjected to vector quantization processing to obtain the latent factors of the multiple targets.

[0261] Each target can have one or more latent factors.

[0262] For ease of description, the following explanation focuses on the extraction of the latent factors of one target (target #2). The latent factors of the other targets can be extracted in the same way.

[0263] Vector quantization can be achieved in a variety of ways.

[0264] For example, vector quantization can be implemented using a clustering algorithm: the semantic representations of the multiple targets are clustered, and the latent factor of each target is determined based on the clustering result. For instance, the cluster center to which the semantic representation of a target belongs and/or the identifier (ID) of that cluster center can be used as the latent factor of the target. In this case, each target can have only one latent factor, such as the identifier of its cluster center.
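The clustering variant can be sketched with plain k-means in NumPy. This assumes Euclidean distance and random initialization from the data; both are assumptions, because the text does not name a clustering algorithm. Each target's latent factor is the ID of its cluster center, and `kmeans_latent_factors` is a hypothetical name.

```python
import numpy as np


def kmeans_latent_factors(reps: np.ndarray, k: int, iters: int = 50, seed: int = 0):
    """Cluster semantic representations; a target's latent factor is its cluster ID.

    reps: (num_targets, dim) semantic representations. Returns (labels, centers).
    """
    rng = np.random.default_rng(seed)
    centers = reps[rng.choice(len(reps), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared Euclidean distance from every representation to every center.
        dists = ((reps[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):           # keep the old center if a cluster empties
                centers[c] = reps[labels == c].mean(axis=0)
    # Final assignment against the final centers.
    labels = ((reps[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    return labels, centers


reps = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                 [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centers = kmeans_latent_factors(reps, k=2)
print(labels)  # two groups; which ID each group gets depends on initialization
```

The cluster ID numbers themselves are arbitrary; only which targets share an ID carries meaning.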

[0265] Alternatively, vector quantization can be implemented based on multiple codebooks. For example, the semantic representation of target #2 can be vector-quantized based on multiple codebooks to obtain a multi-level quantization result of target #2. The latent factors of target #2 include the multi-level quantization result of target #2 and/or the identifiers of the multi-level quantization result. In this case, target #2 can have multiple latent factors, for example, the identifiers of its multi-level quantization result.

[0266] The multiple codebooks can correspond to the levels of the multi-level quantization result. The multi-level quantization result of target #2 consists of multiple codewords, namely, in each codebook, the codeword closest to the input of the corresponding layer. The identifier of a quantization result can be the index of the codeword.

[0267] Optionally, vector quantization can be achieved through residual quantization. That is, residual quantization is performed on the semantic representations of the multiple targets based on the multiple codebooks to obtain the latent factors of the multiple targets.

[0268] Taking target #2 as an example, residual quantization is performed on the semantic representation of target #2 based on the multiple codebooks to obtain the latent factors of target #2. The latent factors of target #2 may include multiple codewords corresponding to target #2 or the identifiers (IDs) of these codewords. The IDs of these codewords can be the indexes of these codewords. In this case, the latent factors of target #2 can also be called the codeword identifiers (IDs) of target #2, or codebook IDs.

[0269] For a given target, there can be multiple codewords, which can come from different codebooks.

[0270] Figure 6 illustrates an example of the extraction process of a latent factor according to an embodiment of this application.

[0271] The following example, with reference to Figure 6, illustrates the extraction of latent factors of a target using residual quantization. The process of extracting latent factors of a target may include the following steps.

[0272] 1) The semantic representations of the multiple targets are reduced in dimensionality using an encoder to obtain the dimensionality reduction results of the multiple targets.

[0273] For example, the encoder can be a DNN encoder, such as the multilayer perceptron (MLP) encoder shown in Figure 6.

[0274] As shown in Figure 6, the semantic representation of the target (i.e., the embedding in Figure 6) is dimensionality reduced by an MLP encoder to obtain the dimensionality reduction result.

[0275] 2) Perform residual quantization on the dimensionality reduction results of the multiple targets to obtain the codeword IDs of the multiple targets.

[0276] In step 2), a multi-layer codebook can be generated based on the dimensionality reduction results of the multiple targets, and the corresponding codeword IDs can be obtained. The dimensionality reduction results of the multiple targets serve as training samples.

[0277] For example, step 2) can be implemented by the following steps.

[0278] 2-1) The quantization process of the first layer:

[0279] The input to the quantization process in the first layer is the dimensionality reduction result of these multiple targets.

[0280] A codebook is generated based on the dimensionality reduction results of the multiple targets, and the quantization results of the first layer of the multiple targets and the residual vector of the first layer of the multiple targets are obtained based on the codebook.

[0281] For example, a codebook can be generated from the dimensionality reduction results of the multiple targets using a vector quantization method such as clustering. This codebook can be regarded as the first-layer codebook.

[0282] For each target, the codeword in the first-layer codebook that is closest to the dimensionality reduction result of that target can be used as the initial quantization result of that target, i.e., the quantization result of the first layer. The difference between the dimensionality reduction result of the target and the initial quantization result of the target is calculated, and this difference can be used as the residual vector of the first layer of the target.

[0283] 2-2) Quantization process of subsequent layers:

[0284] For each subsequent layer, the residual vector calculated from the previous layer can be used as input.

[0285] For each subsequent layer, a codebook is generated based on the residual vector of the previous layer of the multiple targets, and the quantization result of the current layer of the multiple targets and the residual vector of the current layer of the multiple targets are obtained based on the codebook.

[0286] For each subsequent layer and each target, the codeword in that layer's codebook that is closest to the residual vector of the previous layer can be used as the quantization result of that layer. The difference between the residual vector of the previous layer and the quantization result of that layer is calculated and used as the residual vector of that layer.

[0287] The calculation and quantization of the residual vector are performed iteratively until a termination condition is met, such as reaching a preset number of quantization layers, i.e., the number of codebooks.

[0288] The index of the closest codeword corresponding to each target in the above process can be used as the codeword ID of each target.

[0289] For a given target, summing up all the quantization results of that target yields a reconstructed vector, which is the quantization representation.

[0290] For example, the quantization process based on the t-th layer codebook, i.e., the quantization process of the t-th layer, can satisfy the following formula:

$$m_t = \arg\min_{k} \left\lVert \mathbf{r}_t - \mathbf{e}^t_k \right\rVert$$

[0291] Where t = 1, 2, ..., T; T is the number of codebook layers, i.e., the number of codebooks, and T is an integer greater than 1. $\mathbf{r}_1 = \mathbf{x}$, where $\mathbf{x}$ is the input of the first layer. $\mathbf{e}^t_k$ represents the codeword in the k-th row of the t-th codebook, i.e., the k-th codeword, and k is a positive integer. $m_t$ represents the index of the codeword in the t-th level codebook that is closest to $\mathbf{r}_t$.

[0292] When t is less than T, the residual vector $\mathbf{r}_{t+1}$ of the t-th layer can satisfy the following formula:

$$\mathbf{r}_{t+1} = \mathbf{r}_t - \mathbf{e}^t_{m_t}$$

[0293] Accordingly, the codeword ID obtained through the above formulas can be represented as $(m_1, m_2, \ldots, m_T)$. The vector reconstructed based on this codeword ID, i.e., the quantization representation, can be expressed as:

$$\hat{\mathbf{x}} = \sum_{t=1}^{T} \mathbf{e}^t_{m_t}$$

[0294] For example, as shown in Figure 6, T = 3, and the input to the quantization process of the first layer is the dimensionality reduction result output by the MLP encoder.

[0295] In the first-layer codebook (i.e., codebook 1 in Figure 6), the index of the codeword closest to the dimensionality reduction result is determined to be 7. The codeword with index 7 in codebook 1 is the quantization result of the first layer. The difference between the dimensionality reduction result and the codeword with index 7 in codebook 1 is calculated and used as the residual vector of the first layer, i.e., residual vector 1 in Figure 6.

[0296] In the second-layer codebook (i.e., codebook 2 in Figure 6), the index of the codeword closest to residual vector 1 is determined as 1. The codeword with index 1 in codebook 2 is the quantization result of the second layer. The difference between residual vector 1 and the codeword with index 1 in codebook 2 is calculated and used as the residual vector of the second layer, i.e., residual vector 2 in Figure 6.

[0297] In the third-layer codebook (i.e., codebook 3 in Figure 6), the index of the codeword closest to residual vector 2 is determined to be 4. The codeword with index 4 in codebook 3 is the quantization result of the third layer. The difference between residual vector 2 and the codeword with index 4 in codebook 3 is calculated and used as the residual vector of the third layer, i.e., residual vector 3 in Figure 6.

[0298] The index of the codeword closest to the input of each layer in the above quantization process can be used as a latent factor. As shown in Figure 6, the latent factors of this target are (7, 1, 4). Summing the quantization results of this target yields its quantization representation.
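Below is a NumPy sketch of the residual quantization walk-through above, with the encoder and decoder left out. It assumes squared Euclidean distance. Each level's codebook is fitted with k-means on the residuals of the previous level, which is one way to generate a codebook "using vector quantization methods such as clustering". Training codebooks with a loss, as described in [0302], is not implemented here. All function names are hypothetical. Starting from r_1 = x, each level picks m_t as the index of the codeword nearest r_t and subtracts that codeword to get the next residual. The chosen codewords are summed to form the quantization representation.

```python
import numpy as np


def nearest(vectors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the closest codeword (squared Euclidean distance) for each vector."""
    return ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)


def kmeans(data: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Plain k-means; requires k <= number of rows in data."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = nearest(data, centers)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = data[labels == c].mean(axis=0)
    return centers


def build_codebooks(x: np.ndarray, num_levels: int, k: int) -> list:
    """Fit codebook t by clustering the residuals left after levels 1..t-1."""
    codebooks, residual = [], x.astype(float)
    for t in range(num_levels):
        codebook = kmeans(residual, k, seed=t)
        residual = residual - codebook[nearest(residual, codebook)]  # r_{t+1} = r_t - e^t_{m_t}
        codebooks.append(codebook)
    return codebooks


def residual_quantize(x: np.ndarray, codebooks: list):
    """Return codeword IDs (m_1, ..., m_T) per target and the quantization representation."""
    residual = x.astype(float)                 # r_1 = x
    ids, quantized = [], np.zeros(x.shape)
    for codebook in codebooks:
        m = nearest(residual, codebook)        # m_t = argmin_k ||r_t - e^t_k||
        ids.append(m)
        quantized += codebook[m]               # accumulate sum_t e^t_{m_t}
        residual = residual - codebook[m]
    return np.stack(ids, axis=1), quantized


# Hand-built two-level codebooks make the result easy to check.
codebooks = [np.array([[0.0, 0.0], [10.0, 0.0]]),
             np.array([[0.0, 0.0], [0.0, 1.0]])]
ids, q = residual_quantize(np.array([[10.1, 0.9]]), codebooks)
print(ids.tolist(), q.tolist())   # [[1, 1]] [[10.0, 1.0]]
```

Each level's codebook is fitted on the exact residuals that the earlier levels leave behind. As a result, adding a level never increases the total squared quantization error in this sketch, on the data used to fit the codebooks.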

[0299] 3) The quantization representations of the multiple targets are decoded by the decoder to obtain the reconstructed semantic representations of the multiple targets.

[0300] For example, the decoder can be a DNN decoder, such as the MLP decoder shown in Figure 6.

[0301] For example, as shown in Figure 6, the quantized representation is decoded by an MLP decoder to obtain the reconstructed semantic representation of the target (i.e., the embedding in Figure 6).
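Steps 1) and 3) can be sketched as untrained MLPs in NumPy, which makes the shapes concrete. The layer widths (768, 256, 32), the ReLU activations, the initialization and the names `init_mlp` and `mlp_forward` are assumptions for illustration. To keep the block short, the dimensionality reduction result goes straight to the decoder; the method itself would pass the quantization representation instead.

```python
import numpy as np


def init_mlp(sizes, seed=0):
    """Random weights for an MLP with layer widths `sizes` (untrained, for shape checks)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x):
    """Affine layers with ReLU between them; the output layer is linear."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


embed_dim, hidden_dim, latent_dim = 768, 256, 32      # hypothetical sizes
encoder = init_mlp([embed_dim, hidden_dim, latent_dim], seed=0)
decoder = init_mlp([latent_dim, hidden_dim, embed_dim], seed=1)

v = np.random.default_rng(2).normal(size=(4, embed_dim))  # semantic representations of 4 targets
z = mlp_forward(encoder, v)       # step 1): dimensionality reduction result, input to codebook 1
# Step 2) would replace z with its quantization representation (sum of the chosen codewords);
# here z is passed straight through so the sketch stays short.
v_hat = mlp_forward(decoder, z)   # step 3): reconstructed semantic representation
print(z.shape, v_hat.shape)       # (4, 32) (4, 768)
```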

[0302] For example, the loss used when training the codebooks can be determined based on a reconstruction loss and/or a consistency loss.

[0303] For example, the loss $L_{rq}$ of the codebook training process can satisfy the following formula:

[0304] $L_{rq} = L_{rec} + L_{com}$;

[0305] Where $L_{rec}$ represents the reconstruction loss and $L_{com}$ represents the consistency loss.

[0306] For example, the reconstruction loss can satisfy the following formula:

$$L_{rec} = \left\lVert \mathbf{v} - \hat{\mathbf{v}} \right\rVert^2$$

[0307] Where $\mathbf{v}$ represents the semantic representation of the target, and $\hat{\mathbf{v}}$ represents the reconstructed semantic representation of the target.

[0308] For example, the consistency loss can satisfy the following formula:

$$L_{com} = \sum_{t=1}^{T} \left( \left\lVert \mathrm{sg}[\mathbf{r}_t] - \mathbf{e}^t_{m_t} \right\rVert^2 + \beta \left\lVert \mathbf{r}_t - \mathrm{sg}[\mathbf{e}^t_{m_t}] \right\rVert^2 \right)$$

[0309] Where sg[·] represents the stop-gradient operation, and β is used to control the ratio of reconstruction loss to consistency loss.
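The following NumPy sketch computes the forward value of the codebook training loss. It assumes a squared L2 reconstruction loss and a consistency loss of the common RQ-VAE form, the sum over t of ||sg[r_t] - e^t_{m_t}||^2 + β·||r_t - sg[e^t_{m_t}]||^2, averaged over a batch. The batch averaging, the default β = 0.25 and the name `rq_loss` are assumptions. NumPy has no gradients, so sg[·] is simply the identity here.

```python
import numpy as np


def rq_loss(v, v_hat, residuals, chosen_codewords, beta=0.25):
    """Forward value of L_rq = L_rec + L_com, averaged over a batch of targets.

    v, v_hat:            (batch, dim) semantic representations and their reconstructions.
    residuals[t]:        (batch, d) input r_t of quantization level t.
    chosen_codewords[t]: (batch, d) the selected codewords e^t_{m_t} for level t.
    beta:                hypothetical default weight.
    sg[.] only blocks gradients, so it is the identity for the loss value itself;
    with autodiff it would be tensor.detach() in PyTorch or jax.lax.stop_gradient in JAX.
    """
    l_rec = np.sum((v - v_hat) ** 2, axis=-1).mean()
    l_com = 0.0
    for r_t, e_t in zip(residuals, chosen_codewords):
        sq = np.sum((r_t - e_t) ** 2, axis=-1).mean()
        l_com += sq + beta * sq      # ||sg[r_t] - e||^2 + beta * ||r_t - sg[e]||^2
    return l_rec + l_com, l_rec, l_com


total, l_rec, l_com = rq_loss(v=np.array([[1.0, 0.0]]), v_hat=np.array([[0.0, 0.0]]),
                              residuals=[np.array([[1.0, 1.0]])],
                              chosen_codewords=[np.array([[1.0, 0.0]])])
print(total, l_rec, l_com)   # 2.25 1.0 1.25
```

The two consistency terms have equal values here, but in an autodiff framework their gradients differ. The first term moves the codewords toward the residuals, and the second pulls the encoder's output toward the chosen codewords.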

[0310] The latent factors of the multiple targets can be represented as the set of codeword IDs of the multiple targets.

[0311] For example, the latent factors of multiple users can be represented as the set $\{\mathbf{c}^u_1, \mathbf{c}^u_2, \ldots, \mathbf{c}^u_{|u|}\}$, where |u| represents the number of users and is an integer greater than 1, and $\mathbf{c}^u_j$ represents the codeword ID of the j-th user.

[0312] For example, the latent factors of multiple objects can be represented as the set $\{\mathbf{c}^i_1, \mathbf{c}^i_2, \ldots, \mathbf{c}^i_{|i|}\}$, where |i| represents the number of objects and is an integer greater than 1, and $\mathbf{c}^i_j$ represents the codeword ID of the j-th object.

[0313] The number of codewords in each codebook can be the same or different. This application does not limit this.

[0314] The number of codebooks, the number of codewords in each codebook, the structure of the encoder, and the structure of the decoder shown in Figure 6 are merely examples and do not constitute a limitation on the solutions of the embodiments of this application.

[0315] It should be understood that the above are merely examples; vector quantization can also be implemented in other ways, for example, through a vector quantized variational auto-encoder (VQ-VAE). This application does not limit how vector quantization is implemented.

[0316] Furthermore, the latent factors of the multiple targets can also be obtained through methods other than vector quantization, for example, through locality-sensitive hashing (LSH). In this case, the latent factors of a target can be the hash values of its semantic representation.
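Here is a sketch of one LSH option, random-hyperplane hashing (SimHash style). This is a swapped-in choice, because the text does not name an LSH family. The sign pattern of each vector against random hyperplanes is split into bands, and the integer value of each band serves as one latent factor. As a result, targets whose representations point in similar directions tend to share bands. `simhash_factors`, the 16 bits and the 4 bits per band are assumptions.

```python
import numpy as np


def simhash_factors(reps, num_bits=16, bits_per_band=4, seed=0):
    """Random-hyperplane LSH codes split into bands; each band value is one latent factor.

    reps: (num_targets, dim). Returns (num_targets, num_bits // bits_per_band) ints.
    """
    if num_bits % bits_per_band:
        raise ValueError("num_bits must be a multiple of bits_per_band")
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(reps.shape[1], num_bits))      # one random hyperplane per bit
    bits = (reps @ planes > 0).astype(np.int64)              # which side of each hyperplane
    bands = bits.reshape(len(reps), -1, bits_per_band)       # (n, num_bands, bits_per_band)
    weights = 1 << np.arange(bits_per_band)                  # binary digits -> integer
    return (bands * weights).sum(axis=-1)


reps = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]])
factors = simhash_factors(reps)
print(factors.shape, (factors[0] == factors[1]).all())   # (3, 4) True
```

Scaling a vector by a positive factor leaves its hash unchanged, whereas negating it flips every bit.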

[0317] The construction process of the graph structure is explained below.

[0318] In step 540, a graph structure can be constructed based on the latent factors of the multiple targets.

[0319] For example, step 540 can be performed by the graph construction module 430 in device 400.

[0320] For example, the graph structure can include various types of nodes. For instance, the graph structure can include user nodes and object nodes.

[0321] For example, the multiple targets can be multiple users, and the nodes of the multiple targets are the nodes of the multiple users in the graph structure, that is, multiple user nodes.

[0322] For example, the multiple targets can be multiple objects, and the nodes of the multiple targets are the nodes of the multiple objects in the graph structure, that is, multiple object nodes.

[0323] Specifically, in step 540, the association relationships between the nodes of the multiple targets can be determined based on the similarity between the latent factors of the multiple targets.

[0324] Optionally, nodes of two targets with the same latent factors are associated.

[0325] If at least some of the latent factors of two targets are the same, the two targets are regarded as targets with the same latent factors.

[0326] For example, the multiple targets may include target #3 (an example of a second target) and target #4 (an example of a third target). The node of target #3 is associated with the node of target #4, and at least some of the latent factors of target #3 are the same as those of target #4.
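Below is a minimal sketch of the pairwise rule above. It assumes that latent factors are codeword IDs and that a factor is identified by its (codebook level, index) pair, so index 1 in codebook 2 is a different factor from index 1 in codebook 1. `associated_pairs`, the target names and the IDs are made up for illustration.

```python
from itertools import combinations


def associated_pairs(codeword_ids, min_shared=1):
    """Pairs of targets whose latent factors overlap in at least `min_shared` places.

    codeword_ids: {target: (m_1, ..., m_T)}; a factor is keyed by (level, index).
    """
    factors = {t: {(level, idx) for level, idx in enumerate(ids)}
               for t, ids in codeword_ids.items()}
    return [(a, b) for a, b in combinations(sorted(factors), 2)
            if len(factors[a] & factors[b]) >= min_shared]


ids = {"u1": (0, 2, 2), "u2": (0, 1, 3), "u3": (4, 1, 0)}
print(associated_pairs(ids))   # [('u1', 'u2'), ('u2', 'u3')]
```

This direct comparison examines every pair of targets, so its cost grows quadratically with the number of targets.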

[0327] The numbers #3 and #4 are for descriptive convenience only; they can be any two of the multiple targets.

[0328] Since targets and their nodes correspond one to one, an association between the nodes of two targets can also be understood as an association between the two targets. The embodiments of this application do not distinguish between the two.

[0329] Further, optionally, the graph structure may include nodes of multiple latent factors, i.e., multiple latent factor nodes. The multiple latent factors include the latent factors of the multiple targets.

[0330] For example, as mentioned above, the latent factors of the multiple targets can be obtained by vector quantization of the semantic representations of the multiple targets. In this case, the multiple latent factors can be all codewords or codeword indices in all codebooks; alternatively, the multiple latent factors can be the codeword IDs of the multiple targets.

[0331] Edges connect the nodes of the multiple targets to the nodes of their latent factors.

[0332] In other words, latent factors can be treated as explicit nodes in the graph structure, i.e., latent factor nodes, and an edge is established between the node of each target and the node of each of its latent factors. Taking the case in which the latent factors of a target are its codeword ID as an example, the codeword IDs of all targets can be treated as explicit nodes in the graph structure. For a given codeword ID, each target mapped to that codeword ID, that is, each target that has that codeword ID, is connected to it by an edge.

[0333] In this way, a connection is constructed between the target and the latent factor, and the nodes of targets with the same latent factor are associated with each other through the nodes of that latent factor.
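The latent-factor-node variant can be sketched in the same way. Each target gets one edge per latent factor, and targets that share a factor become two-hop neighbors through that factor node. The node encoding `("q", level, index)` and the function names are hypothetical.

```python
from collections import defaultdict


def factor_edges(codeword_ids):
    """One edge from each target node to each of its latent factor nodes ("q", level, index)."""
    return [(target, ("q", level, idx))
            for target, ids in codeword_ids.items()
            for level, idx in enumerate(ids)]


def two_hop_neighbors(edges, target):
    """Targets reachable as target -> shared latent factor node -> other target."""
    targets_of = defaultdict(set)   # factor node -> targets attached to it
    factors_of = defaultdict(set)   # target -> its factor nodes
    for t, f in edges:
        targets_of[f].add(t)
        factors_of[t].add(f)
    return {other for f in factors_of[target] for other in targets_of[f]} - {target}


ids = {"u1": (0, 2, 2), "u2": (0, 1, 3), "u3": (4, 1, 0)}
edges = factor_edges(ids)
print(len(edges), sorted(two_hop_neighbors(edges, "u2")))   # 9 ['u1', 'u3']
```

Each target contributes only as many edges as it has latent factors, so no pairwise comparison between targets is needed.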

[0334] In the solution of the embodiments of this application, latent factors are used as explicit nodes, and associations between targets are established by constructing edges between the targets and their corresponding latent factors. Because each target only needs to be connected to its own latent factors rather than compared with every other target, this helps improve construction efficiency.

[0335] As mentioned above, a graph structure can include various types of nodes. For any type of node, the relationships between nodes can be determined using the scheme described in the embodiments of this application.

[0336] Optionally, the plurality of targets includes a plurality of first-type targets and a plurality of second-type targets.

[0337] The first type and the second type are different types. The "first" in "target of the first type" and the "second" in "target of the second type" are only used to distinguish between the two different types of targets and do not have any other limiting function. For example, the target of the first type can be a user, while the target of the second type can be an object.

[0338] In step 520, semantic representations of the multiple first-type targets are generated based on the text descriptions of the multiple first-type targets, and semantic representations of the multiple second-type targets are generated based on the text descriptions of the multiple second-type targets.

[0339] In step 530, latent factors of the multiple first-type targets are extracted based on the semantic representations of the multiple first-type targets, and latent factors of the multiple second-type targets are extracted based on the semantic representations of the multiple second-type targets.

[0340] The association relationships between nodes of the plurality of first-type targets are determined based on latent factors relating to the plurality of first-type targets. The association relationships between nodes of the plurality of second-type targets are determined based on latent factors relating to the plurality of second-type targets.

[0341] The generation methods for the semantic representations of the multiple first-type targets and the multiple second-type targets can refer to step 520, and the generation methods for the latent factors of the multiple first-type targets and the multiple second-type targets can refer to step 530. To avoid repetition, they will not be described again here.

[0342] The following example illustrates the process of constructing a graph structure, using the first type of target as the user and the second type of target as the object.

[0343] Based on steps 510 to 530, latent factors for multiple users and latent factors for multiple objects are obtained. In step 540, a graph structure is constructed based on the latent factors of the multiple users and the multiple objects. In this graph structure, the relationships between nodes of the multiple users are determined based on the latent factors of the multiple users, and the relationships between nodes of the multiple objects can be determined based on the latent factors of the multiple objects.

[0344] This graph structure can include nodes representing multiple user latent factors and nodes representing multiple object latent factors. The multiple user latent factors include the latent factors of those multiple users. The multiple object latent factors include the latent factors of those multiple objects.

[0345] For example, the latent factors of these multiple users are obtained by vector quantization of the semantic representations of these multiple users. The codebook used in this process can be referred to as the user's codebook.

[0346] For example, the multiple user latent factors can be all codewords or codeword indices in the user codebooks. Assume there are three user codebooks, as shown in Figure 7: codebook #u1, codebook #u2, and codebook #u3, each of which can include 5 codewords. The indices of all codewords in these three codebooks can serve as explicit nodes in the graph structure, i.e., user latent factor nodes.

[0347] For example, the multiple user latent factors can be the latent factors of the multiple users. Suppose that there are three codebooks for the users, as shown in Figure 7: codebook #u1, codebook #u2, and codebook #u3. The indices of the codewords in these three codebooks that have a mapping relationship with the multiple users can be used as explicit nodes in the graph structure, that is, nodes of user latent factors.

[0348] For example, the latent factors of these multiple objects are obtained by vector quantization of the semantic representations of these multiple objects. The codebook used in this process can be called the object's codebook.

[0349] For example, the multiple object latent factors can be all codewords or codeword indices in the object codebooks. Assume there are three object codebooks, as shown in Figure 7: codebook #i1, codebook #i2, and codebook #i3, each of which can include 5 codewords. The indices of all codewords in these three codebooks can serve as explicit nodes in the graph structure, i.e., object latent factor nodes.

[0350] For example, the multiple object latent factors can be the latent factors of the multiple objects. Suppose there are three object codebooks, as shown in Figure 7: codebook #i1, codebook #i2, and codebook #i3. The indices of the codewords in these three codebooks that have a mapping relationship with the multiple objects can be used as explicit nodes in the graph structure, that is, object latent factor nodes.

[0351] It should be understood that the number of user codebooks, the number of object codebooks, the number of codewords in each codebook, the number of users, and the number of objects in Figure 7 are merely examples and do not constitute a limitation on the scheme of the embodiments of this application.

[0352] This graph structure can include four types of nodes: user nodes, object nodes, user latent factor nodes, and object latent factor nodes. Correspondingly, the graph structure includes three types of edges: user-object edges $e_{u-i}$, user-user latent factor edges $e_{u-q}$, and object-object latent factor edges $e_{i-q}$. User-object edges are edges between user nodes and object nodes. User-user latent factor edges are edges between user nodes and user latent factor nodes. Object-object latent factor edges are edges between object nodes and object latent factor nodes.

[0353] In a graph structure, users, objects, and latent factors are all represented as nodes. For ease of description, the edge between a user node and an object node can be simply referred to as the edge between a user and an object; the edge between a user node and a node of a user latent factor can be simply referred to as the edge between a user and a user latent factor; and the edge between an object node and a node of an object latent factor can be simply referred to as the edge between an object and an object latent factor.

[0354] User-user latent factor edges are determined based on the latent factors of the multiple users. Object-object latent factor edges are determined based on the latent factors of the multiple objects.

[0355] Specifically, in the graph structure, each user is connected to the latent factors of that user, and each object is connected to the latent factors of that object.

[0356] For example, as shown in Figure 7, each user is connected to the user's codeword ID, and each object is connected to the object's codeword ID.

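[0357] Taking user 1 as an example, the latent factors of user 1, i.e., its codeword ID, are (0, 2, 2). The node of user 1 is connected to the nodes of the corresponding user latent factors, that is, codeword 0 in codebook #u1, codeword 2 in codebook #u2, and codeword 2 in codebook #u3.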

[0358] User-object edges can be obtained based on whether a user has performed an action on the object. For example, a user's action on a recommended object could include clicking, downloading, purchasing, or browsing. Based on user behavior data, it can be determined whether a user has performed an action on the object; if so, an edge can be established between the user and the object.
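The four node types and three edge types can be sketched as edge sets. The sketch assumes separate namespaces for user and object codebooks, so that, for example, index 0 of codebook #u1 and index 0 of codebook #i1 remain distinct nodes. The node encodings `("uq", level, index)` and `("iq", level, index)`, the key names `"u-i"`, `"u-q"` and `"i-q"`, and `build_graph` are hypothetical. User-object edges are taken from (user, object) interaction pairs in the behavior data, and duplicate pairs are collapsed into one edge.

```python
from collections import defaultdict


def build_graph(user_ids, object_ids, interactions):
    """Return edge sets keyed by type: "u-i", "u-q" and "i-q".

    user_ids / object_ids: {node name: codeword ID tuple (m_1, ..., m_T)}
    interactions: (user, object) pairs from behavior data, e.g. clicks or purchases.
    """
    edges = defaultdict(set)
    for user, ids in user_ids.items():
        for level, idx in enumerate(ids):
            edges["u-q"].add((user, ("uq", level, idx)))   # user -> user latent factor node
    for obj, ids in object_ids.items():
        for level, idx in enumerate(ids):
            edges["i-q"].add((obj, ("iq", level, idx)))    # object -> object latent factor node
    for user, obj in interactions:
        edges["u-i"].add((user, obj))                      # repeated actions collapse to one edge
    return edges


users = {"user1": (0, 2, 2), "user2": (0, 1, 3)}
objects = {"item1": (1, 1, 0)}
behavior = [("user1", "item1"), ("user1", "item1"), ("user2", "item1")]
graph = build_graph(users, objects, behavior)
print({k: len(v) for k, v in sorted(graph.items())})   # {'i-q': 3, 'u-i': 2, 'u-q': 6}
```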

[0359] The above uses only two types of targets as examples. In other implementations, more types of targets can be included. The latent factors of each type of target can be extracted with reference to steps 510 to 530, and in step 540, the association relationships between the nodes of each type of target are determined based on the latent factors of that type of target.

[0360] The constructed graph structure makes it possible to mine both multi-hop neighbor information and semantic information, thereby optimizing the graph structure.

[0361] As mentioned above, graph structures can be used for downstream task models. For example, a downstream task can be a recommendation task. Alternatively, a downstream task can be other tasks that depend on a graph structure, which is not limited in this embodiment.

[0362] As one possible implementation, graph structures can be used in the inference phase of downstream task models.

[0363] For example, during the inference process of the downstream task model, the graph-structure-related enhancement features are used as part of the input information of the downstream task model. Alternatively, the graph-structure-related enhancement features are used as an additional input to the downstream task model.

[0364] The following description takes a recommendation task as an example of the downstream task.

[0365] Optionally, the recommendation task can be performed using a recommendation model. The model can be used to predict a score for a candidate recommendation object, which indicates the degree of match between the candidate recommendation object and the first user.

[0366] The input to the recommendation model may include augmented features related to the graph structure. These augmented features may include the graph structure and/or features of target #5 (an example of the fifth target) obtained based on the graph structure. The graph structure includes the node of target #5, which is either the first user or a candidate recommendation object.

[0367] The score of the candidate recommendation can be either the score during the ranking stage or the score during the recall stage.

[0368] Optionally, this graph structure can be used to identify multiple recall objects during the recall phase of a recommendation task. In this graph structure, there is an association between the nodes of these multiple recall objects and the node of the first user.

[0369] For a detailed description of the use of graph structures in the inference phase of downstream task models, please refer to Method 800 below.

[0370] As one possible implementation, graph structures can be used in the training phase of downstream task models.

[0371] For example, during the training of the downstream task model, augmented features related to the graph structure are used as training data for the downstream task model. For instance, during the training of the downstream task model, augmented features related to the graph structure are used as part of the input information for the downstream task model.

[0372] The solution of the embodiments of this application can be executed repeatedly to construct a new graph structure, in other words, to update the constructed graph structure.

[0373] After the solution of the embodiment of this application is executed for the first time, a constructed graph structure can be obtained. If a new text description of a target is subsequently obtained, the solution of the embodiment of this application can be executed based on the new text description of the target to update the graph structure.

[0374] For ease of description, the nodes of the targets in the current graph structure are referred to as the nodes of the original targets, and the target to which the new text description belongs is referred to as target a.

[0375] Generate a semantic representation of target a based on the textual description of target a, extract latent factors of target a based on the semantic representation of target a, and then update the graph structure based on the latent factors of target a.

[0376] The method for extracting the latent factors of target a can be the same as the method for extracting the latent factors of multiple targets in method 500.

[0377] For example, the latent factors of the multiple targets can be obtained by performing residual quantization on the semantic representations of the multiple targets. In this case, residual quantization can be performed on the semantic representation of target a to obtain the latent factors of target a. The codebook used in this residual quantization can be the T-layer codebook obtained based on the semantic representations of the multiple targets.

[0378] In other words, during the residual quantization process of the semantic representation of target a, the latent factors of target a can be extracted based on the existing codebook without generating a new codebook.

[0379] That is, the T-layer codebook can be reused in subsequent updates of the graph structure: after the text description of a new target is obtained, the latent factors of that target can be extracted directly based on the existing codebook, without generating a new codebook each time.
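Reusing the existing T-layer codebook amounts to running only the encoding half of residual quantization. The sketch below is a minimal illustration under stated assumptions: each layer is a NumPy array of shape (K, D), the nearest codeword is chosen by squared Euclidean distance, and the function name `residual_quantize` is hypothetical. The codebooks are read but never modified, matching [0378].

```python
import numpy as np


def residual_quantize(x, codebooks):
    """Return one codeword ID per layer for vector x using fixed codebooks.

    codebooks: list of T arrays, each of shape (K, D); they are not modified.
    """
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for cb in codebooks:
        # Nearest codeword to the current residual (squared Euclidean distance).
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        # Whatever this layer could not explain is passed to the next layer.
        residual = residual - cb[idx]
    return tuple(codes)


# Two hypothetical 2-D layers with two codewords each.
codebooks = [np.array([[0.0, 0.0], [10.0, 0.0]]), np.array([[0.0, 0.0], [0.0, 1.0]])]
codes_a = residual_quantize([10.2, 0.9], codebooks)
```

The returned tuple plays the role of target a's codeword IDs, which can then be connected into the graph as described in [0380] and [0381].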

[0380] For example, target a can be different from the original targets; that is, target a is a new target, such as a new user. In this case, a node for target a can be added to the graph structure, and edges can be added between target a and its latent factors to obtain the updated graph structure.

[0381] Alternatively, target a can be one of the original targets. In this case, the edges between target a and the latent factors in the graph structure can be updated based on the newly generated latent factors of target a to obtain the updated graph structure.
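Both update cases in [0380] and [0381] can be handled by one "insert or replace" operation on the target-to-latent-factor edges. A minimal sketch, assuming latent factors are represented as hypothetical (level, codeword ID) pairs and that only the target/latent-factor edges are stored here (user-object edges from behavior data are left untouched); the class `LatentFactorGraph` is illustrative, not part of the source.

```python
from collections import defaultdict


class LatentFactorGraph:
    """Target <-> latent-factor edges of the graph structure."""

    def __init__(self):
        self.factors_of = {}                # target -> set of latent factors
        self.targets_of = defaultdict(set)  # latent factor -> set of targets

    def upsert(self, target, factors):
        """Add a new target, or replace an existing target's latent-factor edges."""
        # Remove the old edges; for a new target there is nothing to remove.
        for f in self.factors_of.get(target, set()):
            self.targets_of[f].discard(target)
        # Connect the target to the latent factors extracted from its new description.
        self.factors_of[target] = set(factors)
        for f in self.factors_of[target]:
            self.targets_of[f].add(target)


g = LatentFactorGraph()
g.upsert("u1", [(0, 0), (1, 2), (2, 2)])  # original target
g.upsert("a", [(0, 0), (1, 5), (2, 1)])   # target a is new: node and edges added
g.upsert("a", [(0, 3), (1, 5), (2, 1)])   # target a is existing: edges updated
```

After the last call, target a is no longer attached to factor (0, 0), so it stops being a neighbor of u1 through that factor.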

[0382] This allows the graph structure to be updated based on the new data, which helps ensure the accuracy of the graph structure and thus helps ensure the effectiveness of subsequent tasks.

[0383] According to the scheme of this application embodiment, latent factors of multiple targets are extracted based on their semantic representations, and a graph structure is constructed based on these latent factors. These latent factors can be used to measure global semantic similarity, allowing the scheme to construct the graph structure from a global perspective. This improves the graph structure's perception of global semantic information, ensuring its accuracy and ultimately improving the performance of downstream tasks. The scheme of this application embodiment enables automatic graph structure construction, reducing reliance on manual labor, thus reducing labor costs and improving construction efficiency.

[0384] In the solution of the embodiments of this application, the external knowledge and reasoning capabilities of the LLM can be used to obtain the semantic representation of a target. This improves the accuracy of the semantic representation, so that deep semantics can be better captured in subsequent processing and a more accurate graph structure can be obtained. For example, the LLM can analyze the target to enhance its semantic representation, yielding more accurate semantic information about the target and thus a more accurate graph structure. Furthermore, each LLM call can involve only one target, which avoids pairwise comparisons: the total time complexity is $O(N)$, compared to $O(N^2)$ for pairwise comparisons. This reduces time complexity and significantly decreases the number of LLM calls, thereby reducing computational overhead and ensuring processing efficiency.
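The gap in LLM calls can be made concrete. Per-target generation needs one call per target, while pairwise comparison needs one call per unordered pair of targets; the value N = 10,000 in the second line is purely illustrative.

$$
\text{per-target calls} = N, \qquad \text{pairwise calls} = \binom{N}{2} = \frac{N(N-1)}{2} = O(N^2)
$$

$$
N = 10{,}000:\quad 10{,}000 \ \text{calls} \quad \text{vs.} \quad \frac{10{,}000 \times 9{,}999}{2} = 49{,}995{,}000 \ \text{calls}
$$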

[0385] Furthermore, the hidden layer representation of the last hidden layer in the LLM generation process can be used to determine the semantic representation of the target. The hidden layer representation of the last hidden layer usually contains rich semantic information and can better capture contextual information, which is conducive to obtaining a more accurate semantic representation, and thus a more accurate graph structure.

[0386] Furthermore, the semantic representations of these multiple targets may be diverse and may contain noise. In the embodiments of this application, extracting the latent factors of these multiple targets helps to reduce the impact of noise.

[0387] The following uses a recommender system as an example to illustrate application scenarios of the graph structure obtained through method 500.

[0388] The method 500 shown in Figure 5 can be applied to recommender systems, for example, by using enhanced features related to the graph structure as an additional input to the recommender system. Exemplarily, method 500 can be used in the offline training module and/or the online prediction module of Figure 3 to generate input features (i.e., input information) for the recommender system. The solution of the embodiments of this application can be flexibly adapted to any recommendation model and LLM.

[0389] Figure 8 shows a schematic flowchart of a recommendation method according to an embodiment of this application. Exemplarily, the method 800 shown in Figure 8 can be applied to a cloud server. For example, the recommendation model of this application embodiment can be deployed on a cloud server. Exemplarily, the solution of this application embodiment can be applied to a terminal device. For example, the recommendation model of this application embodiment can be built into a terminal device, and the terminal device executes the method of this application embodiment. Alternatively, the recommendation model of this application embodiment can be deployed in an application (APP) of the terminal device, that is, the recommendation method of this application embodiment is executed by calling the APP.

[0390] The graph structure in method 800 is obtained through method 500. For a detailed description, please refer to method 500. To avoid repetition, some descriptions of method 500 are omitted appropriately.

[0391] As shown in Figure 8, method 800 may include the following steps.

[0392] 810. Receive a recommendation request.

[0393] 820. Obtain input information related to the recommendation request, including information about the first user and information about the candidate recommendation objects.

[0394] 830. Input the input information into the recommendation model to obtain the score of the candidate recommendation object. The score reflects the degree of match between the candidate recommendation object and the first user.

[0395] For example, when a user enters the recommendation system, a recommendation request is triggered. The user who triggers this request is the first user, and the recommended objects to be shown to that user can be considered as candidate recommended objects.

[0396] For example, the information of the first user may include the ID of the first user.

[0397] The information of the first user can also include personalized attributes, which may also be referred to as profile information, for example, the first user's gender, age, occupation, income, hobbies, or education level.

[0398] The first user's information may also include the first user's behavioral data, such as the objects the first user operated on. The objects the first user operated on are the objects on which the first user has previously performed actions, or in other words, historical items.

[0399] The information of the first user may also include the semantic representation of the first user, which can be extracted from the first user's text description. For instance, a semantic representation extracted from the first user's text description using an LLM can be called a user LLM embedding. This LLM can be the same model as the LLM in method 500, or it can be a different model.

[0400] The information of the first user can be directly input into the recommendation model, or it can be processed (e.g., feature extraction) before being input into the recommendation model. This application embodiment does not limit this.

[0401] For example, the information of the candidate recommendation object may include the ID of the candidate recommendation object.

[0402] The information of candidate recommendation objects may also include some attribute information of the candidate recommendation objects, such as the name of the candidate recommendation object or the type of the candidate recommendation object.

[0403] The information of candidate recommendation objects may also include semantic representations of the candidate recommendation objects, which can be extracted from their text descriptions. For instance, semantic representations extracted from the text descriptions of candidate recommendation objects using an LLM can be called item LLM embeddings. This LLM can be the same model as the LLM in method 500, or it can be a different model. The LLM used to extract the semantic representation of the user and the LLM used to extract the semantic representations of the candidate recommendation objects can be the same model or different models.

[0404] The information of the aforementioned candidate recommendations can be directly input into the recommendation model, or it can be processed (e.g., feature extraction) before being input into the recommendation model. This application does not limit this.

[0405] The input information may also include other input data required by the recommendation model, such as context information, etc., which are not limited in this embodiment of the application.

[0406] The information of the first user and the information of the candidate recommendations can be regarded as the original input (i.e., original features) of the recommendation model.

[0407] Method 800 can be applied to the recall and/or ranking phases of the recommendation process. For example, the scores of candidate recommendation objects can be used in the recall and/or ranking phases. The score of a candidate recommendation object can also be understood as the first user's preference score for the candidate recommendation object, or the relevance score between the first user and the candidate recommendation object.

[0408] For example, the higher the score of the candidate recommendation object, the higher the match between the first user and the candidate recommendation object.

[0409] As one possible implementation, the score of the candidate recommendation object can be the score from the ranking stage.

[0410] For example, the score of a candidate recommendation object can be determined by the probability that the first user will perform an action on the candidate recommendation object. For instance, the higher this probability, the higher the score of the candidate recommendation object, and the better the candidate recommendation object matches the recommendation request.

[0411] The recommendation model can be a click-through rate (CTR) prediction model, used to predict the probability that the first user will perform an action on a candidate recommendation. This probability can be used for ranking tasks, such as sorting candidate recommendations within a set.

[0412] The probability that the first user will perform an action on the candidate recommendation object can also be replaced by the probability that the candidate recommendation object is recommended to the first user.

[0413] The candidate recommendation set can include multiple candidate recommendation objects. This candidate recommendation set can be obtained through the recall phase, in which case the multiple candidate recommendation objects are the multiple recall objects. The probability that the first user will perform an action on these multiple candidate recommendation objects can be determined with reference to step 830. After predicting the probability that the first user will perform an action on all candidate recommendation objects in the candidate recommendation set through the recommendation model, the candidate recommendation objects in the candidate recommendation set can be sorted based on this to obtain the recommendation result. For example, the candidate recommendation object with the highest probability can be displayed to the first user. Or, the multiple candidate recommendation objects with the highest probability can be displayed to the first user in descending order of probability.
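The sorting step in [0413] is a descending sort over predicted probabilities. In the minimal sketch below, `predict` stands in for the recommendation model and is a hypothetical callable; the function name `rank_candidates` is likewise illustrative.

```python
def rank_candidates(user, candidates, predict, top_k=None):
    """Score every candidate with `predict` and return IDs by descending probability.

    predict: hypothetical callable (user, candidate) -> probability that the user
    will perform an action on the candidate.
    """
    scored = [(c, predict(user, c)) for c in candidates]
    # Highest probability first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    ranked = [c for c, _ in scored]
    return ranked if top_k is None else ranked[:top_k]


probs = {"A": 0.2, "B": 0.9, "C": 0.5}
full_list = rank_candidates("u1", ["A", "B", "C"], lambda u, c: probs[c])
best = rank_candidates("u1", ["A", "B", "C"], lambda u, c: probs[c], top_k=1)
```

`top_k=1` corresponds to showing only the most probable candidate; leaving it unset returns the whole list in descending order.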

[0414] Figure 12 shows the "Recommended" page of an app store. This page can have multiple lists, such as featured apps and featured games. Taking featured games as an example, the candidate recommendation objects can be candidate games. Based on the information of the first user and the information of the candidate games, the app store's recommendation system predicts the probability that the user will download (install) each candidate game, sorts the candidate games in descending order of this probability, and places the game most likely to be downloaded at the top.

[0415] For example, in the featured games list, the recommendation result could be: App 5 in the first position, App 6 in the second, App 7 in the third, and App 8 in the fourth. After seeing the recommendation result, users can interact with it based on their interests. Once a user performs an action, the corresponding behavioral data is stored in a log for subsequent updates of target interest features and/or the recommendation model.

[0416] As another possible implementation, the score of the candidate recommendation object can be the score from the recall phase. The score from the recall phase can be used for recall tasks, for example, to determine multiple recall objects.

[0417] The candidate object set can include multiple candidate recommendation objects, and the scores of these multiple candidate recommendation objects can be determined with reference to step 830. After predicting the scores of the candidate recommendation objects in the candidate object set through the recommendation model, suitable candidate recommendation objects can be selected from the candidate object set as recall objects. The recalled objects can be used for subsequent ranking tasks.

[0418] The graph structure obtained by method 500 can be applied to method 800. This graph structure includes nodes representing multiple targets. The relationships between these nodes are determined based on the latent factors of the multiple targets. These latent factors are extracted based on the semantic representations of the multiple targets. The semantic representations of the multiple targets are generated from their textual descriptions. The multiple targets include multiple users or multiple objects.

[0419] The graph structure includes the node of the first user and/or the node of the candidate recommendation object.

[0420] The following describes two ways of using the graph structure (Method 1 and Method 2).

[0421] Method 1:

[0422] Optionally, the input information may also include augmented features related to the graph structure. The augmented features may include at least one of the following: the graph structure, features of the first user obtained based on the graph structure, or features of candidate recommendation objects obtained based on the graph structure.

[0423] Augmented features can be regarded as additional input.

[0424] Features obtained based on the graph structure may also be referred to as graph representations.

[0425] As one possible implementation, features obtained based on graph structures can include node embedding vectors.

[0426] For example, the features of the first user obtained based on the graph structure can include the embedding vector of the first user. Similarly, the features of candidate recommendation objects obtained based on the graph structure can be the embedding vectors of the candidate recommendation objects.

[0427] For example, the node embedding vectors can be obtained by processing the graph structure using a GNN. For instance, a GNN can be a graph attention network (GAT). The graph structure is input into the GAT for processing to obtain the embedding vectors of each node in the graph structure.

[0428] Figure 9 illustrates a schematic diagram of the features of the first user obtained based on a graph structure. In Figure 9, K represents the key, V represents the value, and Q represents the query.

[0429] For example, as shown in Figure 9, a user's neighbor, or neighbor user, can be a user connected to the same codeword ID as that user.

[0430] For example, as shown in Figure 9, an item's neighbor, or neighbor item, can be an item that is connected to the same codeword ID as the item.

[0431] For example, as shown in Figure 9, the items connected to a user are the user's history items (hist items). By aggregating the embedding vectors of the first user (i.e., the target user in the figure), the first user's codeword ID, the first user's neighboring users, the history items, the history items' neighboring items, and the history items' codeword IDs, a new embedding vector is obtained, as shown in Figure 9, which is used to update the first user's embedding vector.
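Figure 9 labels the aggregation with a query (Q), keys (K), and values (V). One plausible reading is single-head scaled dot-product attention in which the target user supplies the query and its graph neighbors supply keys and values. The NumPy sketch below follows that reading; the scaled dot-product form, including the target's own embedding among the attended nodes, and the projection matrices `w_q`, `w_k`, `w_v` are all assumptions, not details confirmed by the source.

```python
import numpy as np


def attention_aggregate(target, neighbors, w_q, w_k, w_v):
    """Update a target embedding by attending over itself and its graph neighbors.

    target: (D,) embedding of the target user.
    neighbors: (M, D) embeddings of e.g. its codeword-ID nodes, neighbor users,
        history items, their neighbor items, and their codeword-ID nodes.
    w_q, w_k, w_v: (D, D) projection matrices (hypothetical learned weights).
    """
    target = np.asarray(target, dtype=float)
    nodes = np.vstack([target, np.asarray(neighbors, dtype=float)])  # (M + 1, D)
    q = target @ w_q                      # query from the target user
    k = nodes @ w_k                       # keys from all attended nodes
    v = nodes @ w_v                       # values from all attended nodes
    scores = k @ q / np.sqrt(q.shape[0])  # scaled dot-product scores, shape (M + 1,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over the attended nodes
    return weights @ v                    # (D,) new embedding for the target


eye = np.eye(2)
new_emb = attention_aggregate([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0]], eye, eye, eye)
```

Because the softmax weights sum to one, the result is a convex combination of the projected node embeddings.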

[0432] As another possible implementation, features obtained based on graph structures can include features obtained through neighbor aggregation via latent factor recall.

[0433] For example, the features of the first user obtained based on the graph structure can be features obtained by aggregating the features of the first user's neighbors. The neighbors of the first user are determined based on the latent factors of the first user, such as the codeword ID of the first user. Similarly, the features of candidate recommendation objects obtained based on the graph structure can be features obtained by aggregating the features of the candidate recommendation objects' neighbors. The neighbors of the candidate recommendation objects are determined based on the latent factors of the candidate recommendation objects, such as the codeword ID of the candidate recommendation objects.
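The latent-factor-recall variant in [0432] and [0433] can be sketched as mean pooling over the targets that share a codeword ID with the target. The mean as the aggregation function, the (level, codeword ID) matching rule, and the zero vector for targets without neighbors are assumptions for illustration; `neighbor_mean_features` is a hypothetical name.

```python
from collections import defaultdict

import numpy as np


def neighbor_mean_features(codes, feats):
    """For each target, average the features of targets sharing any codeword ID.

    codes: dict target -> tuple of codeword IDs (one per codebook level).
    feats: dict target -> feature vector.
    """
    # Recall neighbors through the latent factors: (level, codeword ID) -> targets.
    index = defaultdict(set)
    for t, c in codes.items():
        for level, cid in enumerate(c):
            index[(level, cid)].add(t)
    out = {}
    for t, c in codes.items():
        nbrs = set().union(*(index[(level, cid)] for level, cid in enumerate(c))) - {t}
        if nbrs:
            out[t] = np.mean([np.asarray(feats[n], dtype=float) for n in nbrs], axis=0)
        else:
            out[t] = np.zeros_like(np.asarray(feats[t], dtype=float))
    return out


codes = {"u1": (0, 2, 2), "u2": (0, 5, 1), "u3": (3, 4, 4)}
feats = {"u1": [1.0, 1.0], "u2": [3.0, 5.0], "u3": [9.0, 9.0]}
agg = neighbor_mean_features(codes, feats)
```

Here u1 and u2 share the first-level codeword 0, so each receives the other's features, while u3 shares no codeword and falls back to zeros.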

[0434] As another possible implementation, features obtained based on graph structures can include latent factors.

[0435] For example, the features of the first user obtained based on the graph structure may include latent factors of the first user, such as the first user's codeword ID. Similarly, the features of candidate recommendation objects obtained based on the graph structure may include latent factors of the candidate recommendation objects, such as the candidate recommendation object's codeword ID.

[0436] For example, the input information may include: information about the first user, information about the candidate recommendation objects, features of the first user obtained based on the graph structure, and features of the candidate recommendation objects obtained based on the graph structure. The score of a candidate recommendation object can satisfy the following formula:

$$\text{score} = \phi\left(F_u, F_i, F_u^{G}, F_i^{G}\right)$$

[0437] Here, score represents the score of the candidate recommendation object; $\phi(\cdot)$ represents the recommendation model; $F_u$ represents the original features of the first user, i.e., the information of the first user; $F_i$ represents the original features of the candidate recommendation object, i.e., the information of the candidate recommendation object; $F_u^{G}$ represents the features of the first user obtained based on the graph structure; and $F_i^{G}$ represents the features of the candidate recommendation object obtained based on the graph structure.

[0438] Figure 10 illustrates a schematic diagram of the architecture of a recommendation model according to an embodiment of this application. As shown in Figure 10, the recommendation model can adopt a two-tower architecture, where user-side and item-side information are processed through two independent neural networks, namely the user tower and the item tower. The output of the user tower is the user embedding. The output of the item tower is the item embedding. Then, the matching score, i.e., the matching degree, is calculated between the user embedding and the item embedding.

[0439] Input information can include input from the user tower and input from the item tower.

[0440] For example, as shown in Figure 10, the input to the user tower may include the user ID, user profile, history items, user quantization index, and user LLM embedding.

[0441] Specifically, the user ID, user profile, history items, and user quantization index are input into the embedding layer. The embeddings of the history items are averaged by the aggregation module (agg. module in the figure), and the user LLM embedding is input into a linear layer. The resulting features are then fused, for example through concatenation, and the fused result is passed through two linear layers to obtain the user embedding.

[0442] For example, as shown in Figure 10, the input to the item tower can include item ID, item feature, item quantization index, and item LLM embedding. Item features can be obtained by feature extraction from item attribute information, etc.

[0443] Specifically, the item ID, item feature, and item quantization index are input into the embedding layer, and the item LLM embedding is input into a linear layer. The resulting features are then fused, for example through concatenation, and the fused result is passed through two linear layers to obtain the item embedding.

[0444] Calculate the matching score between the user embedding and the item embedding. For example, the inner product of the user embedding and the item embedding can be used as the matching score between them.
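The data flow of the two-tower model in Figure 10 can be traced with a small NumPy sketch using random, untrained weights. The `Tower` class, the field layout, the vocabulary and embedding sizes, the ReLU between the two linear layers, and the use of one shared embedding table for all levels of the quantization index are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)


class Tower:
    """Embedding lookups plus a linear projection of an LLM embedding,
    concatenated and passed through two linear layers (ReLU assumed in between)."""

    def __init__(self, vocab_sizes, llm_dim, emb_dim=8, hidden=16, out_dim=8):
        self.tables = [rng.normal(0, 0.1, (v, emb_dim)) for v in vocab_sizes]
        self.w_llm = rng.normal(0, 0.1, (llm_dim, emb_dim))
        in_dim = emb_dim * (len(vocab_sizes) + 1)
        self.w1 = rng.normal(0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0, 0.1, (hidden, out_dim))

    def __call__(self, id_fields, llm_emb):
        # One list of integer IDs per field; multi-valued fields (history items,
        # per-level codeword IDs) are averaged, like the agg. module for history items.
        parts = [table[np.asarray(ids)].mean(axis=0)
                 for table, ids in zip(self.tables, id_fields)]
        parts.append(np.asarray(llm_emb) @ self.w_llm)  # linear layer for the LLM embedding
        h = np.concatenate(parts)                      # feature fusion by concatenation
        h = np.maximum(h @ self.w1, 0.0)
        return h @ self.w2


# User fields: user ID, profile, history items, quantization index.
user_tower = Tower(vocab_sizes=[100, 10, 500, 256], llm_dim=32)
# Item fields: item ID, item feature, quantization index.
item_tower = Tower(vocab_sizes=[500, 20, 256], llm_dim=32)

u = user_tower([[3], [1], [10, 42, 7], [0, 2, 2]], rng.normal(size=32))
v = item_tower([[42], [5], [7, 1, 4]], rng.normal(size=32))
score = float(u @ v)  # inner product as the matching score
```

Because the two towers only meet at the inner product, item embeddings can be computed ahead of time, which is the usual reason this architecture suits the recall stage.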

[0445] The user ID, user profile, history items, and user LLM embedding can be regarded as the information of the first user, and the user quantization index can serve as a feature of the first user obtained based on the graph structure. The item ID, item feature, and item LLM embedding can be regarded as the information of the candidate recommendation object, and the item quantization index can serve as a feature of the candidate recommendation object obtained based on the graph structure.

[0446] It should be understood that the structure, input data, and output data of the recommendation model in Figure 10 are merely examples and do not constitute a limitation on the solutions of the embodiments of this application.

[0447] Method 2:

[0448] In Method 2, a graph structure can be applied to identify multiple recall objects during the recall phase. Specifically, in this graph structure, there is an association between the nodes of the multiple recall objects and the node of the first user.

[0449] The multiple recall objects include the candidate recommendation object. In other words, the candidate recommendation object can be determined based on a graph structure. In this graph structure, there is a relationship between the nodes of the candidate recommendation object and the nodes of the first user.

[0450] For example, the plurality of recall objects may include at least one of the following: the history object of the first user, the neighbor object of the history object of the first user, the history object of the neighbor user of the first user, or the neighbor object of the history object of the neighbor user of the first user.

[0451] The first user's history objects are the objects connected to that first user in the graph structure. In the graph structure, an object's neighbor objects include objects connected to the same latent factors as that object, and a user's neighbor users include users connected to the same latent factors as that user. Specific examples of history objects, neighbor objects, and neighbor users can be found in Figure 9.
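The four recall sources in [0450] combine as a set union. A minimal sketch, assuming the graph has already been reduced to three hypothetical lookup tables (a user's history objects, an object's neighbor objects, and a user's neighbor users), for example via an inverted index over codeword IDs; `recall_objects` is an illustrative name.

```python
def recall_objects(user, user_items, item_nbrs, user_nbrs):
    """Collect recall objects for `user` from the four sources in [0450].

    user_items: dict user -> history objects (objects connected to the user).
    item_nbrs: dict object -> objects sharing a latent factor with it.
    user_nbrs: dict user -> users sharing a latent factor with it.
    """
    hist = set(user_items.get(user, ()))
    recalled = set(hist)                             # 1. the user's history objects
    for i in hist:
        recalled |= set(item_nbrs.get(i, ()))        # 2. their neighbor objects
    for nu in user_nbrs.get(user, ()):
        nu_hist = set(user_items.get(nu, ()))
        recalled |= nu_hist                          # 3. history objects of neighbor users
        for i in nu_hist:
            recalled |= set(item_nbrs.get(i, ()))    # 4. their neighbor objects
    return recalled


user_items = {"u1": ["i1"], "u2": ["i3"]}
item_nbrs = {"i1": {"i2"}, "i3": {"i4"}}
user_nbrs = {"u1": {"u2"}}
recalled = recall_objects("u1", user_items, item_nbrs, user_nbrs)
```

A production system would typically cap each source and deduplicate against items already shown, but those policies are outside what the text specifies.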

[0452] For example, an inverted index can be constructed using the latent factors of objects in a graph structure. This inverted index can then be used for recall.

[0453] In this inverted index, the key is a latent factor, and the value is the set of objects that have that latent factor.

[0454] Given the latent factors of an object, the objects that share those latent factors, that is, the neighbor objects of that object, can be found in the inverted index.

[0455] The following example uses latent factors as codeword IDs for illustration.

[0456] In this case, an inverted index can be built based on the codeword IDs of objects in the graph structure. The key is a codeword ID, and the value is the set of objects that have that codeword ID.

[0457] Figure 11 shows a schematic diagram of a codeword ID used as an inverted index.

[0458] For example, as shown in Figure 11, the codeword ID of object #1 is (7,1,4). Based on codeword ID 7 of the first-level codebook, objects #2 and #3, which are mapped to this codeword ID, can be found. Based on codeword ID 1 of the second-level codebook, object #4 can be found. Based on codeword ID 4 of the third-level codebook, object #5 can be found. Therefore, the neighbor objects of object #1 are objects #2, #3, #4, and #5.
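The Figure 11 lookup can be reproduced with a dictionary-based inverted index. The sketch assumes the index key is a (codebook level, codeword ID) pair, so that ID 1 at the second level is not confused with ID 1 at another level; only object #1's codeword ID comes from the text, and the codeword IDs of objects #2 to #5 are hypothetical values chosen to reproduce the example.

```python
from collections import defaultdict


def build_inverted_index(codes):
    """Map each (level, codeword ID) key to the set of objects carrying it."""
    index = defaultdict(set)
    for obj, code in codes.items():
        for level, cid in enumerate(code):
            index[(level, cid)].add(obj)
    return index


def neighbors(obj, codes, index):
    """Objects sharing at least one codeword ID (at the same level) with obj."""
    found = set()
    for level, cid in enumerate(codes[obj]):
        found |= index[(level, cid)]
    found.discard(obj)
    return found


# Object #1 is from Figure 11; the other codeword IDs are hypothetical.
codes = {
    "#1": (7, 1, 4),
    "#2": (7, 3, 0),
    "#3": (7, 5, 2),
    "#4": (2, 1, 6),
    "#5": (0, 6, 4),
}
index = build_inverted_index(codes)
print(sorted(neighbors("#1", codes, index)))  # ['#2', '#3', '#4', '#5']
```

The same structure, built over user codeword IDs, gives the user-side index described in [0459] and [0460].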

[0459] For example, an inverted index can be constructed using latent factors of users in a graph structure. This inverted index can then be used for recall.

[0460] In this inverted index, the key is a latent factor, and the value is the set of users that have that latent factor. For a detailed description, refer to the description of the objects' latent factors; to avoid repetition, it is not repeated here.

[0461] Optionally, the multiple targets include a first target, and the semantic representation of the first target is generated using a large language model (LLM). The input of the LLM includes a prompt that instructs the LLM to analyze the first target based on the text description of the first target.

[0462] Optionally, the semantic representation of the first target includes a hidden layer representation of the LLM.

[0463] Optionally, the multiple targets include a second target and a third target whose nodes are associated with each other, and the latent factors of the second target and the latent factors of the third target are at least partially the same.

[0464] Optionally, the graph structure includes nodes of multiple latent factors, namely the latent factors of the multiple targets, and there are edges between the nodes of the multiple targets and the nodes of their latent factors.

[0465] Optionally, the multiple targets include a fourth target, and the latent factors of the fourth target include the multi-level quantization results of the fourth target and/or the identifiers of the multi-level quantization results. The multi-level quantization results of the fourth target are obtained by performing vector quantization on the semantic representation of the fourth target based on multiple codebooks.

[0466] Optionally, the multi-level quantization results of the fourth target are obtained by performing residual quantization on the semantic representation of the fourth target based on multiple codebooks.

[0467] For a detailed description of the graph structure, please refer to Method 500. To avoid repetition, it will not be repeated here.

[0468] The following is a specific example illustrating the graph structure construction process of this application embodiment. This construction process can be considered as a specific implementation of the method shown in Figure 5. To avoid repetition, some descriptions are omitted when describing the construction process. This solution can be applied to recommendation systems.

[0469] The construction process includes the following steps.

[0470] (i) Semantic representation generation;

[0471] The semantic representation generation module 410 can generate LLM-enhanced semantic representations for users. Specifically, the semantic representation generation module 410 can generate semantic representations for each of the multiple users through LLM, thus obtaining the semantic representation for each user.

[0472] For example, a user's text description is input into the LLM together with a user prompt. The prompt can instruct the LLM to infer the user's preference information from the text description, as shown in Figure 13(a). The hidden representation of the last hidden layer of the LLM during generation is extracted as the user's semantic representation.

[0473] For example, a user's semantic representation can satisfy the following formula:

$$\boldsymbol{v}^{u}_{j} = \mathrm{Encoder}\left(p^{u}_{j}\right) \in \mathbb{R}^{D_v}$$

[0474] where $\boldsymbol{v}^{u}_{j}$ denotes the semantic representation of the j-th user, $p^{u}_{j}$ denotes the prompt of the j-th user, Encoder(·) denotes the truncated hidden representation of the LLM, and $D_v$ denotes the dimension of the semantic representation.
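The paragraphs above say only that the last-hidden-layer representation is extracted; they do not say how the per-token states are collapsed into one $D_v$-dimensional vector. The sketch below assumes either mean pooling over tokens or taking the final token's state (both are common choices, neither is confirmed by the source), and takes the token states as plain lists so that it runs without any model. With a Hugging Face Transformers model, for example, such states could come from `outputs.hidden_states[-1]` when the model is called with `output_hidden_states=True`; the function name `pool_last_hidden_layer` is hypothetical.

```python
from typing import List, Sequence


def pool_last_hidden_layer(
    token_states: Sequence[Sequence[float]], mode: str = "mean"
) -> List[float]:
    """Collapse last-hidden-layer token states into one D_v-dimensional vector.

    token_states: one vector per token from the LLM's last hidden layer
                  (a hypothetical input layout chosen for this sketch).
    mode: "mean" averages over tokens; "last" keeps the final token's state.
    """
    if not token_states:
        raise ValueError("at least one token state is required")
    dim = len(token_states[0])
    if any(len(state) != dim for state in token_states):
        raise ValueError("all token states must have the same dimension D_v")
    if mode == "last":
        return list(token_states[-1])
    if mode == "mean":
        count = len(token_states)
        return [sum(state[d] for state in token_states) / count for d in range(dim)]
    raise ValueError(f"unknown pooling mode: {mode!r}")


# Toy example: three tokens with D_v = 2.
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(pool_last_hidden_layer(states))          # [3.0, 4.0]
print(pool_last_hidden_layer(states, "last"))  # [5.0, 6.0]
```

Mean pooling uses every token of the prompt and the generated analysis, whereas final-token pooling relies on the model having summarized the sequence in its last position; which works better is an empirical question.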

[0475] The semantic representation generation module 410 can generate LLM-enhanced semantic representations for objects. Specifically, the semantic representation generation module 410 can generate semantic representations for each of the multiple objects using LLM, thus obtaining the semantic representation of each object.

[0476] For example, the textual description of an object is input into the LLM together with an object prompt. The prompt can instruct the LLM to extract facts about the object from the textual description, or to generate richer knowledge related to the object, as shown, for example, in Figure 13(b). The hidden representation of the last hidden layer of the LLM during generation is extracted as the semantic representation of the object.

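[0477] For example, the semantic representation of an object can satisfy the following formula:

$$\boldsymbol{v}^{i}_{j} = \mathrm{Encoder}\left(p^{i}_{j}\right) \in \mathbb{R}^{D_v}$$

[0478] where $\boldsymbol{v}^{i}_{j}$ denotes the semantic representation of the j-th object and $p^{i}_{j}$ denotes the prompt of the j-th object.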
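The actual prompt wording of Figure 13 is not reproduced in this text. The sketch below only illustrates the structure described in [0472] and [0476]: a target's text description wrapped in a user-side prompt (infer preferences) or an object-side prompt (extract facts, add knowledge). The template strings and the `build_prompt` helper are hypothetical.

```python
# Hypothetical prompt wording; the prompts of Figure 13 are not reproduced here.
USER_PROMPT_TEMPLATE = (
    "The following describes a user's profile and interaction history:\n"
    "{description}\n"
    "Analyze this user and infer their preferences."
)

OBJECT_PROMPT_TEMPLATE = (
    "The following describes an item:\n"
    "{description}\n"
    "Summarize the key facts about this item and add related knowledge that helps describe it."
)

TEMPLATES = {"user": USER_PROMPT_TEMPLATE, "object": OBJECT_PROMPT_TEMPLATE}


def build_prompt(kind: str, description: str) -> str:
    """Wrap a target's text description in the prompt for its kind ("user" or "object")."""
    try:
        template = TEMPLATES[kind]
    except KeyError:
        raise ValueError(f"unknown target kind: {kind!r}") from None
    # str.format substitutes the description as a value, so braces inside it are safe.
    return template.format(description=description.strip())


print(build_prompt("object", "A 1995 crime drama about a bank-robbery crew."))
```

Keeping one template per target type means every user (or every object) is encoded under the same instruction, so differences between their semantic representations come from the descriptions rather than from the prompt.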

[0479] It should be understood that the implementation method in step (i) is only an example, and semantic representations can also be generated in other ways. For a detailed description, please refer to step 520 in method 500.

[0480] (ii) Extraction of latent factors;

[0481] The latent factor extraction module 420 can extract latent factors from the semantic representations.

[0482] For example, the latent factor extraction module 420 can extract latent factors by performing residual quantization on the semantic representation of each of the multiple users, thus obtaining the latent factors of each user.

[0483] For example, the dimensionality of a user's semantic representation is reduced using an MLP decoder, and residual quantization is applied to the reduced representation to obtain the latent factors of that user, i.e., codeword IDs. Finally, a set of latent factors for the multiple users is obtained.

[0484] For example, the latent factor extraction module 420 can extract latent factors by performing residual quantization on the semantic representation of each of the multiple objects, thus obtaining the latent factors of each object.

[0485] For example, the dimensionality of an object's semantic representation is reduced using an MLP decoder, and residual quantization is applied to the reduced representation to obtain the latent factors of the object, i.e., codeword IDs. Finally, a set of latent factors for the multiple objects is obtained.
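The reduce-then-quantize pipeline for a user or an object can be sketched as follows. This is a minimal illustration under stated assumptions: a single linear projection stands in for the MLP, the codebooks are fixed toy values rather than learned ones (codebook learning is covered by step 530 of method 500 and is not reproduced), and the helper names are hypothetical. Each level picks the codeword nearest to the current residual and subtracts it, which is the residual quantization procedure the paragraphs describe.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def project(x: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Linear stand-in for the MLP reducer: one weight row per output dimension."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_codeword(residual: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to the residual (ties go to the lower index)."""
    return min(range(len(codebook)), key=lambda k: squared_distance(residual, codebook[k]))


def residual_quantize(
    z: Sequence[float], codebooks: Sequence[Sequence[Sequence[float]]]
) -> Tuple[List[int], Vector]:
    """Return the per-level codeword IDs and the reconstruction (sum of chosen codewords)."""
    residual = list(z)
    ids: List[int] = []
    reconstruction = [0.0] * len(z)
    for codebook in codebooks:
        k = nearest_codeword(residual, codebook)
        codeword = codebook[k]
        ids.append(k)
        residual = [r - c for r, c in zip(residual, codeword)]  # pass leftover to next level
        reconstruction = [s + c for s, c in zip(reconstruction, codeword)]
    return ids, reconstruction


# Toy example: reduce a 3-d representation to 2-d, then quantize with two 3-entry codebooks.
weight = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
codebooks = [
    [[0.0, 0.0], [1.0, 2.0], [4.0, 4.0]],  # level-1 codebook
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]],  # level-2 codebook
]
z = project([1.0, 2.0, 3.0], weight)
ids, recon = residual_quantize(z, codebooks)
print(ids, recon)  # [1, 1] [1.0, 3.0]
```

The returned `ids` list plays the role of the codeword IDs used as latent factors; two targets whose ID lists share a prefix were assigned the same coarse codewords and are therefore semantically close.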

[0486] The specific process of generating the codebook and latent factors can be found in step 530 of method 500, and will not be repeated here.

[0487] It should be understood that the implementation method in step (ii) is only an example, and latent factors can also be extracted in other ways. For a detailed description, please refer to step 530 in method 500.

[0488] (iii) Collaborative graph construction;

[0489] The graph building module 430 can build the graph structure.

[0490] For example, codeword IDs are used as explicit nodes in the graph structure, and an edge is formed between each codeword ID node and each user mapped to that codeword ID.

[0491] For example, codeword IDs are used as explicit nodes in the graph structure, and an edge is formed between each codeword ID node and each object mapped to that codeword ID.

[0492] Thus, the constructed graph structure includes four types of nodes and three types of edges. The nodes in the graph structure can be represented as $V = \{U, I, Q^{u}, Q^{i}\}$, where V represents the set of nodes in the graph structure, U represents the set of nodes of the multiple users, I represents the set of nodes of the multiple objects, $Q^{u}$ represents the set of nodes of the latent factors of the multiple users (the user latent factor nodes), and $Q^{i}$ represents the set of nodes of the latent factors of the multiple objects (the object latent factor nodes). The edges in the graph structure include user-object edges $e_{u\text{-}i}$, user-to-user-latent-factor edges $e_{u\text{-}q}$, and object-to-object-latent-factor edges $e_{i\text{-}q}$.
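The graph with node sets $U$, $I$, $Q^{u}$, $Q^{i}$ and edge types $e_{u\text{-}i}$, $e_{u\text{-}q}$, $e_{i\text{-}q}$ can be sketched as an undirected adjacency map. Two assumptions are made here and are not stated in the source: each (level, codeword ID) pair becomes one latent-factor node, and user-object edges come from observed interactions. The function `build_graph` and the node-type labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Node = (node type, key). Types: "user", "object", "user_lf", "object_lf".
Node = Tuple[str, object]


def build_graph(
    user_codes: Dict[str, List[int]],
    object_codes: Dict[str, List[int]],
    interactions: Iterable[Tuple[str, str]],
) -> Dict[Node, Set[Node]]:
    """Build an undirected adjacency map with user, object, and latent-factor nodes.

    Each (level, codeword ID) pair becomes one latent-factor node (an assumption).
    """
    adjacency: Dict[Node, Set[Node]] = defaultdict(set)

    def link(a: Node, b: Node) -> None:
        adjacency[a].add(b)
        adjacency[b].add(a)

    for user, codes in user_codes.items():
        for level, code in enumerate(codes):
            link(("user", user), ("user_lf", (level, code)))     # e_{u-q}
    for obj, codes in object_codes.items():
        for level, code in enumerate(codes):
            link(("object", obj), ("object_lf", (level, code)))  # e_{i-q}
    for user, obj in interactions:
        link(("user", user), ("object", obj))                    # e_{u-i}
    return dict(adjacency)


graph = build_graph(
    user_codes={"u1": [3, 5], "u2": [3, 7]},
    object_codes={"i1": [0, 2], "i2": [0, 4]},
    interactions=[("u1", "i1")],
)
print(sorted(graph[("user_lf", (0, 3))]))  # [('user', 'u1'), ('user', 'u2')]
```

Because u1 and u2 share the level-0 codeword 3, they become two hops apart through a shared latent-factor node even though they have no common interaction, which is how latent factors inject global semantic similarity into the graph.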

[0493] It should be understood that the implementation method in step (iii) is only an example. The graph structure can also be constructed in other ways. For a detailed description, please refer to step 540 in method 500.

[0494] The graph structure constructed through the above steps can be used for recommendation tasks. This graph structure is independent of downstream recommendation tasks and models, thus it is compatible with various types of recommendation tasks and models. For an explanation of the application of the graph structure, please refer to the description of Method 500 above; it will not be repeated here.

[0495] It should be noted that the above process is only illustrated using graph structures for recommendation tasks and does not limit the application scenarios of this application. When applied to other scenarios, the recommendation system and its input information in the above process can be adaptively replaced with other systems or models and their input information.

[0496] Furthermore, the above process is only illustrated by extracting latent factors from multiple users and multiple objects separately, and does not constitute a limitation on the solutions of this application embodiment. In other possible implementations, latent factors may be extracted only from multiple users, and a graph structure may be constructed based on the latent factors of these multiple users. For example, as shown in Figure 14, only nodes with latent factors exist on the user side, and correspondingly, the association relationships between these multiple users are determined based on the latent factors of these multiple users. Alternatively, latent factors may be extracted only from multiple objects, and a graph structure may be constructed based on the latent factors of these multiple objects. For example, as shown in Figure 15, only nodes with latent factors exist on the object side, and correspondingly, the association relationships between these multiple objects are determined based on the latent factors of these multiple objects.

[0497] Furthermore, the graph structure obtained in this embodiment can also be applied to the training of a recommendation model. Just as augmented features serve as additional inputs when the graph structure is applied during inference, they can be used as additional inputs during training. For example, a recommendation model can be trained based on at least one training sample and its corresponding sample label. Each training sample includes user information, recommendation object information, and augmented features related to the graph structure (such as the graph structure itself, user features obtained based on the graph structure, or recommendation object features obtained based on the graph structure). The sample label indicates whether the user in the training sample performed an action on the recommendation object.

[0498] In the embodiments of this application, the enhanced features related to the graph structure can be used to optimize the training algorithm, that is, to train the recommendation model, so that the recommendation system can better utilize the graph structure, thereby improving the ability of the recommendation model and improving the accuracy of the recommendation results.
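The training setup of [0497] (user information, object information, and graph-related augmented features as inputs; a binary label for whether the user acted) can be sketched with a logistic-regression model trained by SGD on binary cross-entropy. The logistic regression is a stand-in chosen for brevity, not the recommendation model of this application, and `TrainingSample`, `train`, and `predict` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TrainingSample:
    user_features: List[float]
    object_features: List[float]
    augmented_features: List[float]  # e.g. graph-derived user/object features (hypothetical)
    label: int  # 1 if the user acted on the recommendation object, else 0

    def inputs(self) -> List[float]:
        return self.user_features + self.object_features + self.augmented_features


def sigmoid(x: float) -> float:
    # Numerically stable for large positive and negative logits.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict(sample: TrainingSample, weights: Sequence[float], bias: float) -> float:
    return sigmoid(sum(w * x for w, x in zip(weights, sample.inputs())) + bias)


def train(
    samples: Sequence[TrainingSample], epochs: int = 200, lr: float = 0.5
) -> Tuple[List[float], float]:
    """Plain SGD on binary cross-entropy for a logistic-regression stand-in model."""
    weights = [0.0] * len(samples[0].inputs())
    bias = 0.0
    for _ in range(epochs):
        for sample in samples:
            error = predict(sample, weights, bias) - sample.label  # dBCE/dlogit
            weights = [w - lr * error * x for w, x in zip(weights, sample.inputs())]
            bias -= lr * error
    return weights, bias


# Toy data: identical user/object features; only the augmented feature separates the labels.
samples = [
    TrainingSample([1.0], [0.5], [1.0], 1),
    TrainingSample([1.0], [0.5], [0.0], 0),
]
weights, bias = train(samples)
print(round(predict(samples[0], weights, bias), 3), round(predict(samples[1], weights, bias), 3))
```

In the toy data the user and object features alone cannot distinguish the two samples, so the model can only fit the labels through the augmented feature, which illustrates why supplying graph-derived features during training lets the model learn to use them.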

[0499] Offline experiments were conducted using the schemes of this application and other recommendation models, and the overall performance comparison results are shown in Table 1.

[0500] The experiment was based on three public datasets: MovieLens (such as MovieLens-1M in Table 1 or ML-1M in Table 2), Amazon (such as Amazon-books in Table 1 or Amz-Books in Table 2), and BookCrossing (such as BX in Table 2).

[0501] The evaluation metrics selected were normalized discounted cumulative gain at 10 (NDCG@10), hit ratio at 10 (HR@10), mean reciprocal rank (MRR), and grouped area under the curve (GAUC).

[0502] The higher the values of the above four metrics, the better the quality of the model's predicted ranking.
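The four metrics can be sketched as follows. The source does not specify the exact evaluation protocol, so this sketch assumes one held-out relevant item per ranked list for HR@10, NDCG@10, and MRR, and for GAUC assumes per-user AUC weighted by each user's number of samples, skipping users whose samples are all positive or all negative (a common convention). All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Sequence, Tuple


def hit_ratio_at_k(ranked: Sequence[Hashable], target: Hashable, k: int = 10) -> float:
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[Hashable], target: Hashable, k: int = 10) -> float:
    # With a single relevant item the ideal DCG is 1, so NDCG equals DCG.
    for position, item in enumerate(ranked[:k]):
        if item == target:
            return 1.0 / math.log2(position + 2)
    return 0.0


def mean_reciprocal_rank(cases: Iterable[Tuple[Sequence[Hashable], Hashable]]) -> float:
    reciprocal_ranks = []
    for ranked, target in cases:
        rr = 0.0
        for position, item in enumerate(ranked):
            if item == target:
                rr = 1.0 / (position + 1)
                break
        reciprocal_ranks.append(rr)
    if not reciprocal_ranks:
        raise ValueError("at least one case is required")
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


def auc(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        return None  # AUC is undefined when only one class is present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def gauc(records: Iterable[Tuple[Hashable, float, int]]) -> float:
    """Per-user AUC over (user, score, label) records, weighted by sample count."""
    per_user: Dict[Hashable, Tuple[List[float], List[int]]] = defaultdict(lambda: ([], []))
    for user, score, label in records:
        per_user[user][0].append(score)
        per_user[user][1].append(label)
    weighted_sum, total_weight = 0.0, 0
    for scores, labels in per_user.values():
        user_auc = auc(scores, labels)
        if user_auc is None:
            continue  # skip users with only positives or only negatives
        weighted_sum += len(scores) * user_auc
        total_weight += len(scores)
    return weighted_sum / total_weight if total_weight else float("nan")


ranked = ["a", "b", "c"]
print(hit_ratio_at_k(ranked, "b"), ndcg_at_k(ranked, "b"), mean_reciprocal_rank([(ranked, "b")]))
```

GAUC differs from a global AUC in that it only compares items within the same user's samples, which matches how a recommender ranks candidates for one user at a time.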

[0503] Two representative categories of baseline methods were selected: LLM augmentation methods and graph augmentation methods. The LLM augmentation methods include the Open-World Recommendation with Knowledge Augmentation from Large Language Models (KAR) framework and UIST. The graph augmentation methods include Light Graph Convolutional Network (LightGCN), Co-clustering Graph Neural Network (CCGNN), and Topological Item-to-Item (TopoI2I). The selected base recommendation models include YouTube Deep Neural Network (YouTubeDNN), Multi-Interest Network with Dynamic Routing (MIND), Gated Recurrent Unit for Recommendation (GRU4Rec), and Self-Attention for Sequential Recommendation (SASRec).

[0504] AutoGraph represents the scheme adopted in the embodiments of this application. “Rel.Impr.” represents the relative improvement rate of this scheme relative to the best metric among each baseline.
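Assuming the standard definition of relative improvement (the source gives only a verbal description), for a given metric $m$ and the set of baselines $\mathcal{B}$:

$$
\text{Rel.Impr.} = \frac{m_{\text{AutoGraph}} - \max_{b \in \mathcal{B}} m_{b}}{\max_{b \in \mathcal{B}} m_{b}} \times 100\%
$$

With purely illustrative numbers (not taken from Table 1), if AutoGraph reaches 0.80 and the best baseline reaches 0.78, then Rel.Impr. $= (0.80 - 0.78) / 0.78 \approx 2.56\%$.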

[0505] Table 1

[0506] As shown in Table 1, the proposed solution achieves significant improvements on all metrics over both the LLM augmentation methods and the graph augmentation methods. Graph construction efficiency is compared in Table 2.

[0507] Table 2

[0508] Avg.GAUC represents the average GAUC value.

[0509] As shown in Table 2, compared with the baseline schemes that use an LLM, the scheme of this application effectively reduces the number of LLM inferences while improving performance, in both the initial graph construction stage and the incremental node insertion stage. Moreover, the performance improvement over methods that do not use an LLM is significant.

[0510] Figure 16 illustrates a schematic diagram of the results of a visualization analysis of latent factors provided in an embodiment of this application. Specifically, Figure 16 shows the distribution of movie tags for different latent factor sets. Taking (11, **) as an example, (11, **) represents the frequency distribution of tags for movies whose first codeword ID is 11. As shown in Figure 16, for example, the tags for movies whose first codeword ID is 11 are mainly distributed in the "gunfight" category, and further, the frequency of "gangster" is even higher in the "gunfight" category. As can be seen from Figure 16, the latent factors extracted in this embodiment of the application can reflect various specific group characteristics, and can also reflect the distribution of various fine-grained features.

[0511] It should be understood that in this embodiment of the application, the recommendation task is used only as an example of the downstream task. The solution of this embodiment can also be used for other tasks in which a graph structure helps the downstream task achieve better results.

[0512] Figure 17 illustrates a system architecture 100 provided in an embodiment of this application. In Figure 17, a data acquisition device 160 is used to acquire training data. For example, for the recommendation method of this embodiment of the application, the training data may include semantic representations of users and / or semantic representations of objects.

[0513] After collecting the training data, the data acquisition device 160 stores the training data in the database 130, and the training device 120 trains the target model / rule 101 based on the training data maintained in the database 130.

[0514] The following describes how the training device 120 obtains the target model / rule 101 based on the training data. The training device 120 processes the input raw data and compares the output value with the target value until the difference between the output value of the training device 120 and the target value is less than a certain threshold, thereby completing the training of the target model / rule 101.

[0515] The target model / rule 101 described above can be used to implement the recommendation method of this embodiment of the application. Specifically, the target model / rule 101 in this embodiment can be a neural network model. It should be noted that in practical applications, the training data maintained in the database 130 may not all come from the data acquisition device 160; it may also be received from other devices. Furthermore, it should be noted that the training device 120 may not necessarily train the target model / rule 101 entirely based on the training data maintained in the database 130; it may also obtain training data from the cloud or other sources for model training. The above description should not be construed as limiting the embodiments of this application.

[0516] The target model / rule 101 trained by training device 120 can be applied to different systems or devices, such as execution device 110 shown in Figure 17. Execution device 110 can be a terminal, such as a mobile terminal, tablet computer, laptop computer, augmented reality (AR) / virtual reality (VR) device, in-vehicle terminal, etc., or it can be a server or cloud device. In Figure 17, execution device 110 is configured with input / output (I / O) interface 112 for data interaction with external devices. Users can input data to I / O interface 112 through client device 140. In this embodiment, the input data can include data to be processed input by the client device.

[0517] During the preprocessing of input data by the execution device 110, for example, by preprocessing through the preprocessing module 113 and / or the preprocessing module 114, or during the calculation and other related processing by the calculation module 111 of the execution device 110, the execution device 110 may call data, code, etc. in the data storage system 150 for corresponding processing, or store the data, instructions, etc. obtained from the corresponding processing into the data storage system 150.

[0518] Finally, I / O interface 112 returns the processing result, such as the data processing result obtained above, to client device 140, thereby providing it to the user.

[0519] It is worth noting that the training device 120 can generate corresponding target models / rules 101 based on different training data for different objectives or tasks. The corresponding target models / rules 101 can be used to achieve the above objectives or complete the above tasks, thereby providing the user with the required results.

[0520] In the scenario shown in Figure 17, the user can manually provide input data through the interface provided by the I / O interface 112. Alternatively, the client device 140 can automatically send input data to the I / O interface 112; if the client device 140 requires user authorization to send input data automatically, the user can set the corresponding permissions in the client device 140. The user can view the output results of the execution device 110 on the client device 140, presented in forms such as display, sound, or animation. The client device 140 can also act as a data acquisition terminal, collecting the input data fed into the I / O interface 112 and the output results returned by it as new sample data and storing them in the database 130. Alternatively, data can be collected directly from the I / O interface 112 without going through the client device 140, with the input data fed into the I / O interface 112 and its output results stored in the database 130 as new sample data.

[0521] It is worth noting that Figure 17 is only a schematic diagram of a system architecture provided by an embodiment of this application. The positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation. For example, in Figure 17, the data storage system 150 is an external memory relative to the execution device 110. In other cases, the data storage system 150 can also be placed in the execution device 110.

[0522] The apparatus of the present application embodiment will now be described with reference to Figures 18 and 19. It should be understood that the apparatus described below is capable of performing the methods of the foregoing embodiments of the present application. To avoid unnecessary repetition, repeated descriptions will be appropriately omitted when introducing the apparatus of the present application embodiment.

[0523] Figure 18 shows a schematic block diagram of an apparatus provided in an embodiment of this application. The apparatus 1800 shown in Figure 18 can be used to perform the methods of the embodiments of this application, such as the methods shown in Figure 5 or Figure 8.

[0524] As shown in Figure 18, the device 1800 may include an acquisition module 1810 and a processing module 1820.

[0525] As one possible implementation, the device 1800 can be used to perform the method shown in Figure 5.

[0526] The acquisition module 1810 is used to acquire text descriptions of multiple targets, including multiple users or multiple objects.

[0527] Processing module 1820 is used for:

[0528] Generate semantic representations of multiple targets based on their textual descriptions;

[0529] Based on the semantic representation of multiple targets, latent factors of multiple targets are extracted respectively. The latent factors of multiple targets are used to reflect the semantic similarity between multiple targets.

[0530] Construct a graph structure, which includes nodes of the multiple targets. The association relationships between these nodes are determined based on the latent factors of the multiple targets, and the graph structure is used for a downstream task.

[0531] Optionally, the multiple targets include a first target, and the processing module 1820 is specifically used to: generate a semantic representation of the first target using a large language model (LLM), wherein the input of the LLM includes prompt words, which are used to instruct the LLM to analyze the first target based on the text description of the first target.

[0532] Optionally, the semantic representation of the first target includes the hidden layer representations of one or more hidden layers of the LLM.

[0533] Optionally, the multiple targets include a second target and a third target, the node of the second target and the node of the third target are associated, and the latent factors of the second target and the latent factors of the third target are at least partially the same.

[0534] Optionally, the graph structure includes nodes of multiple latent factors, the multiple latent factors include the latent factors of the multiple targets, and there are edges between the nodes of the multiple targets and the nodes of their latent factors.

[0535] Optionally, the multiple targets include a fourth target, and the processing module 1820 is specifically used to: perform vector quantization processing on the semantic representation of the fourth target based on multiple codebooks to obtain the multi-level quantization result of the fourth target, wherein the latent factors of the fourth target include the multi-level quantization result of the fourth target and / or the identifier of the multi-level quantization result.

[0536] Optionally, the processing module 1820 is specifically used to: perform residual quantization processing on the semantic representation of the fourth target based on multiple codebooks.

[0537] Optionally, the downstream task includes a recommendation task.

[0538] Optionally, the recommendation task is performed by a recommendation model, which is used to predict the scores of candidate recommendation objects. The scores of candidate recommendation objects are used to reflect the matching degree between the candidate recommendation objects and the first user. The input information of the recommendation model includes enhanced features, which include graph structures, and / or features of a fifth target obtained based on the graph structure. The graph structure includes nodes of the fifth target, which is the first user or a candidate recommendation object.

[0539] Optionally, a graph structure is used to identify multiple recall objects during the recall phase of a recommendation task. In the graph structure, there is an association between the nodes of the multiple recall objects and the node of the first user.

[0540] For a detailed description, please refer to Method 500 above; it will not be repeated here.
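The recall use of the graph in [0539] (objects whose nodes are associated with the first user's node) can be sketched as a breadth-first traversal from the user node that collects object nodes within a hop budget. The hop budget, the optional exclusion of already-interacted objects, the deterministic neighbor ordering, and the `recall_objects` name are assumptions made for this sketch; the source does not specify how association is measured during recall.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set, Tuple

Node = Tuple[str, Hashable]  # (node type, key), e.g. ("user", "u1") or ("object", "i7")


def recall_objects(
    adjacency: Dict[Node, Set[Node]],
    user_node: Node,
    max_hops: int = 3,
    exclude: Optional[Set[Hashable]] = None,
    limit: Optional[int] = None,
) -> List[Hashable]:
    """Return object keys reachable from user_node within max_hops, nearest first."""
    exclude = exclude or set()
    seen = {user_node}
    frontier = deque([(user_node, 0)])
    recalled: List[Hashable] = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbor in sorted(adjacency.get(node, ()), key=repr):  # deterministic order
            if neighbor in seen:
                continue
            seen.add(neighbor)
            node_type, key = neighbor
            if node_type == "object" and key not in exclude:
                recalled.append(key)
                if limit is not None and len(recalled) >= limit:
                    return recalled
            frontier.append((neighbor, depth + 1))
    return recalled


# u1 reaches i2 through a shared user latent-factor node and user u2.
adjacency = {
    ("user", "u1"): {("object", "i1"), ("user_lf", (0, 3))},
    ("object", "i1"): {("user", "u1")},
    ("user_lf", (0, 3)): {("user", "u1"), ("user", "u2")},
    ("user", "u2"): {("user_lf", (0, 3)), ("object", "i2")},
    ("object", "i2"): {("user", "u2")},
}
print(recall_objects(adjacency, ("user", "u1"), exclude={"i1"}))  # ['i2']
```

Breadth-first order returns closer objects first, so truncating with `limit` keeps the most strongly associated candidates.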

[0541] As another possible implementation, the device 1800 can be used to perform the method shown in Figure 8. In this case, the acquisition module 1810 can be replaced by a receiving module.

[0542] Specifically, the receiving module 1810 is used to receive a recommendation request.

[0543] Processing module 1820 is used for:

[0544] Obtain input information related to the recommendation request. The input information includes information about the first user, information about the candidate recommendation objects, and enhanced features related to the graph structure. The enhanced features include at least one of the following: graph structure, features of the first user obtained based on the graph structure, or features of the candidate recommendation objects obtained based on the graph structure.

[0545] The graph structure includes nodes of multiple targets. The relationships between the nodes of multiple targets are determined based on the latent factors of multiple targets. The latent factors of multiple targets are extracted based on the semantic representations of multiple targets. The semantic representations of multiple targets are generated based on the textual descriptions of multiple targets. Multiple targets include multiple users or multiple objects.

[0546] Feed the input information into the recommendation model to obtain the scores of the candidate recommendation objects. The scores reflect the matching degree between the first user and the candidate recommendation objects.
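One way to obtain "features based on the graph structure" and feed them into a scorer is sketched below. It assumes a single propagation step (each node's embedding plus the mean of its neighbors' embeddings, similar in spirit to one round of graph convolution) and a dot-product scorer with a sigmoid; both are stand-ins chosen for illustration, not the recommendation model of this application. The embeddings and the names `neighbor_mean`, `enhanced_feature`, and `score` are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Set

Embeddings = Dict[Hashable, List[float]]
Adjacency = Dict[Hashable, Set[Hashable]]


def neighbor_mean(node: Hashable, adjacency: Adjacency, embeddings: Embeddings, dim: int) -> List[float]:
    """Average the embeddings of a node's neighbors (zeros if it has none)."""
    neighbors = [n for n in adjacency.get(node, ()) if n in embeddings]
    if not neighbors:
        return [0.0] * dim
    return [sum(embeddings[n][d] for n in neighbors) / len(neighbors) for d in range(dim)]


def enhanced_feature(node: Hashable, adjacency: Adjacency, embeddings: Embeddings) -> List[float]:
    """Own embedding plus the graph-derived neighbor mean (one propagation step)."""
    own = embeddings[node]
    aggregated = neighbor_mean(node, adjacency, embeddings, len(own))
    return [a + b for a, b in zip(own, aggregated)]


def score(user: Hashable, candidate: Hashable, adjacency: Adjacency, embeddings: Embeddings) -> float:
    """Matching score in (0, 1) between a user and a candidate recommendation object."""
    u = enhanced_feature(user, adjacency, embeddings)
    v = enhanced_feature(candidate, adjacency, embeddings)
    logit = sum(a * b for a, b in zip(u, v))
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# "q" is a latent-factor node linked to both u1 and i1; i2 has no graph neighbors.
embeddings = {"u1": [1.0, 0.0], "q": [0.0, 1.0], "i1": [0.0, 1.0], "i2": [0.0, -1.0]}
adjacency = {"u1": {"q"}, "i1": {"q"}, "q": {"u1", "i1"}}
print(round(score("u1", "i1", adjacency, embeddings), 3), round(score("u1", "i2", adjacency, embeddings), 3))
```

In the toy data, u1's own embedding is orthogonal to i1's, so without the graph their dot product would be zero; the shared latent-factor node pulls both enhanced features toward the same direction and raises the score of i1 above that of i2.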

[0547] Optionally, in the graph structure, there is a relationship between the nodes of the candidate recommendation object and the node of the first user.

[0548] Optionally, the multiple targets include a first target, and the semantic representation of the first target is generated using a large language model (LLM). The input of the LLM includes a prompt that instructs the LLM to analyze the first target based on the textual description of the first target.

[0549] Optionally, the semantic representation of the first target includes the hidden layer representations of one or more hidden layers of the LLM.

[0550] Optionally, the multiple targets include a second target and a third target, the node of the second target and the node of the third target are associated, and the latent factors of the second target and the latent factors of the third target are at least partially the same.

[0551] Optionally, the graph structure includes nodes of multiple latent factors, the multiple latent factors include the latent factors of the multiple targets, and there are edges between the nodes of the multiple targets and the nodes of their latent factors.

[0552] Optionally, the multiple targets include a fourth target, and the latent factors of the fourth target include the multi-level quantization results of the fourth target and / or the identifiers of the multi-level quantization results. The multi-level quantization results of the fourth target are obtained by vector quantization processing of the semantic representation of the fourth target based on multiple codebooks.

[0553] Optionally, the multi-level quantization result of the fourth target is obtained by performing residual quantization on the semantic representation of the fourth target based on the multiple codebooks.

[0554] For a detailed description, please refer to Method 800 above; it will not be repeated here.

[0555] Each module in device 1800 can be implemented in software or in hardware. For example, the implementation of processing module 1820 will be described below. Similarly, the implementation methods of other modules can be the same as those of processing module 1820.

[0556] As an example of a software functional unit, processing module 1820 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, or a container. There may be one or more such computing instances. For example, processing module 1820 may include code running on multiple hosts / virtual machines / containers. It should be noted that the multiple hosts / virtual machines / containers used to run the code may be distributed within the same region or in different regions. Further, the multiple hosts / virtual machines / containers used to run the code may be distributed within the same availability zone (AZ) or in different AZs, each AZ including one or more geographically proximate data centers. Typically, a region may include multiple AZs.

[0557] Similarly, multiple hosts / virtual machines / containers used to run this code can be distributed within the same Virtual Private Cloud (VPC) or across multiple VPCs. Typically, a VPC is set up within a region. Communication between two VPCs within the same region, as well as between VPCs in different regions, requires a communication gateway to be set up within each VPC to enable interconnection between VPCs.

[0558] As an example of a hardware functional unit, the processing module 1820 may include at least one computing device, such as a server. Alternatively, the processing module 1820 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be implemented using a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

[0559] The processing module 1820 may include multiple computing devices, which can be distributed within the same region or in different regions. These computing devices can be distributed within the same Availability Zone (AZ) or within the same Virtual Private Cloud (VPC), or across multiple VPCs. Furthermore, the multiple computing devices can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

[0560] It should be noted that the division of units in the above device is only a logical functional division. In actual implementation, they can be fully or partially integrated into a single physical entity, or they can be physically separated. In other embodiments, the processing module 1820 can be used to execute any step in the method described above, and other modules can be used to implement any step described above. The steps that each module is responsible for implementing can be specified as needed. By having each module implement different steps described above, all functions of the device 1800 can be achieved.

[0561] Figure 19 is a schematic block diagram of the apparatus provided in an embodiment of this application. The apparatus 1900 may include a processor 1910, a transceiver 1920, and a memory 1930. The processor 1910, transceiver 1920, and memory 1930 are connected via internal interconnection paths. The memory 1930 is used to store instructions, and the processor 1910 is used to execute the instructions stored in the memory 1930 to receive / send data via the transceiver 1920. Optionally, the memory 1930 may be coupled to the processor 1910 via an interface or integrated with the processor 1910.

[0562] It should be noted that the transceiver 1920 mentioned above may include, but is not limited to, transceiver devices such as input / output interfaces, to enable communication between device 1900 and other devices or communication networks.

[0563] The memory 1930 can be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).

[0564] In one implementation, the processor 1910 can be a circuit with instruction read and execute capabilities, such as a central processing unit (CPU), microprocessor, or digital signal processor (DSP). In another implementation, the processor 1910 can implement certain functions through the logical relationships of hardware circuits. These logical relationships can be fixed or reconfigurable. For example, the processor 1910 can be a hardware circuit implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), such as a field-programmable gate array (FPGA). In a reconfigurable hardware circuit, the process of the processor loading a configuration document and configuring the hardware circuit can be understood as the processor loading instructions to implement some or all of the functions of the aforementioned units.

[0565] This application also provides an electronic device, which may include the above-described device 1800 or device 1900.

[0566] This application also provides a computing device cluster, which includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone. The computing device cluster includes at least one device 1900.

[0567] In some possible implementations, the memory 1930 of one or more computing devices in the computing device cluster may also store partial instructions for executing the aforementioned method. In other words, a combination of one or more computing devices can jointly execute the instructions for executing the aforementioned method.

[0568] It should be noted that the memory 1930 in different computing devices within the computing device cluster can store different instructions, each used to execute a portion of the steps described above. That is, the instructions stored in the memory 1930 of different computing devices can implement the functions of one or more modules in device 1900.

[0569] In some possible implementations, one or more computing devices in a computing device cluster can be connected via a network. This network can be a wide area network (WAN) or a local area network (LAN), etc.

[0570] This application also provides a computer program product, which includes computer program code that, when run on a computer, causes the computer to perform the methods described in the above embodiments.

[0571] This application also provides a computer-readable medium storing program code that, when run on a computer, causes the computer to perform the methods described in the above embodiments.

[0572] This application also provides a chip, including circuitry, for performing the methods described in the above embodiments.

[0573] In implementation, each step of the above method can be completed by integrated logic circuits in the processor's hardware or by instructions in software. The method disclosed in the embodiments of this application can be directly implemented by a hardware processor, or by a combination of hardware and software modules within the processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. The storage medium is located in the memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method. To avoid repetition, detailed descriptions are omitted here.

[0574] It should also be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.

[0575] Those skilled in the art will understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

[0576] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.

[0577] The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

[0578] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0579] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, ROM, RAM, magnetic disks, or optical disks.

[0580] The foregoing descriptions are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any variation or substitution readily conceived by those skilled in the art within the technical scope disclosed in this application shall fall within the scope of protection of this application.

Claims

1. A method for constructing a graph structure, characterized in that the method comprises: obtaining text descriptions of a plurality of targets, wherein the plurality of targets comprise a plurality of users or a plurality of objects; generating semantic representations of the plurality of targets based on the text descriptions of the plurality of targets, respectively; extracting latent factors of the plurality of targets based on the semantic representations of the plurality of targets, respectively, wherein the latent factors of the plurality of targets are used to reflect semantic similarity between the plurality of targets; and constructing a graph structure, wherein the graph structure comprises nodes of the plurality of targets, association relationships between the nodes of the plurality of targets are determined based on the latent factors of the plurality of targets, and the graph structure is used for a downstream task.

2. The method according to claim 1, characterized in that the plurality of targets comprise a first target, and the generating semantic representations of the plurality of targets based on the text descriptions of the plurality of targets comprises: generating a semantic representation of the first target based on the text description of the first target by using a large language model (LLM), wherein an input of the LLM comprises a prompt, and the prompt is used to instruct the LLM to analyze the first target based on the text description of the first target.

3. The method according to claim 2, characterized in that the semantic representation of the first target comprises hidden layer representations of one or more hidden layers of the LLM.

4. The method according to any one of claims 1 to 3, characterized in that the plurality of targets comprise a second target and a third target, there is an association relationship between the node of the second target and the node of the third target, and the latent factors of the second target and the latent factors of the third target are at least partially the same.

5. The method according to claim 4, characterized in that the graph structure comprises nodes of a plurality of latent factors, the plurality of latent factors comprise the latent factors of the plurality of targets, and there are edges between the nodes of the plurality of targets and the nodes of the latent factors of the plurality of targets.

6. The method according to any one of claims 1 to 5, characterized in that the plurality of targets comprise a fourth target, and the extracting latent factors of the plurality of targets based on the semantic representations of the plurality of targets, respectively, comprises: performing vector quantization on the semantic representation of the fourth target based on a plurality of codebooks to obtain a multi-level quantization result of the fourth target, wherein the latent factors of the fourth target comprise the multi-level quantization result of the fourth target and/or an identifier of the multi-level quantization result.

7. The method according to claim 6, characterized in that the performing vector quantization on the semantic representation of the fourth target based on a plurality of codebooks comprises: performing residual quantization on the semantic representation of the fourth target based on the plurality of codebooks.

8. The method according to any one of claims 1 to 7, characterized in that the downstream task comprises a recommendation task.

9. The method according to claim 8, characterized in that the recommendation task is performed by a recommendation model, the recommendation model is used to predict a score of a candidate recommendation object, and the score of the candidate recommendation object reflects a matching degree between the candidate recommendation object and a first user; input information of the recommendation model comprises enhanced features, and the enhanced features comprise the graph structure and/or features of a fifth target obtained based on the graph structure, wherein the graph structure comprises a node of the fifth target, and the fifth target is the first user or the candidate recommendation object.

10. The method according to claim 8 or 9, characterized in that the graph structure is used to determine a plurality of recall objects in a recall phase of the recommendation task, wherein, in the graph structure, there is an association relationship between the nodes of the plurality of recall objects and the node of the first user.

11. A recommendation method, characterized in that the method comprises: receiving a recommendation request; obtaining input information related to the recommendation request, wherein the input information comprises information of a first user, information of a candidate recommendation object, and enhanced features related to a graph structure, and the enhanced features comprise at least one of the following: the graph structure, features of the first user obtained based on the graph structure, or features of the candidate recommendation object obtained based on the graph structure; the graph structure comprises nodes of a plurality of targets, association relationships between the nodes of the plurality of targets are determined based on latent factors of the plurality of targets, the latent factors of the plurality of targets are extracted based on semantic representations of the plurality of targets, the semantic representations of the plurality of targets are generated based on text descriptions of the plurality of targets, and the plurality of targets comprise a plurality of users or a plurality of objects; and inputting the input information into a recommendation model to obtain a score of the candidate recommendation object, wherein the score reflects a matching degree between the first user and the candidate recommendation object.

12. The recommendation method according to claim 11, characterized in that, in the graph structure, there is an association relationship between the node of the candidate recommendation object and the node of the first user.

13. The recommendation method according to claim 11 or 12, characterized in that the plurality of targets comprise a first target, a semantic representation of the first target is generated by using a large language model (LLM), an input of the LLM comprises a prompt, and the prompt instructs the LLM to analyze the first target based on the text description of the first target.

14. The recommendation method according to claim 13, characterized in that the semantic representation of the first target comprises hidden layer representations of one or more hidden layers of the LLM.

15. The recommendation method according to any one of claims 11 to 14, characterized in that the plurality of targets comprise a second target and a third target, there is an association relationship between the node of the second target and the node of the third target, and the latent factors of the second target and the latent factors of the third target are at least partially the same.

16. The recommendation method according to claim 15, characterized in that the graph structure comprises nodes of a plurality of latent factors, the plurality of latent factors comprise the latent factors of the plurality of targets, and there are edges between the nodes of the plurality of targets and the nodes of the latent factors of the plurality of targets.

17. The recommendation method according to any one of claims 11 to 16, characterized in that the plurality of targets comprise a fourth target, the latent factors of the fourth target comprise a multi-level quantization result of the fourth target and/or an identifier of the multi-level quantization result, and the multi-level quantization result of the fourth target is obtained by performing vector quantization on the semantic representation of the fourth target based on a plurality of codebooks.

18. The recommendation method according to claim 17, characterized in that the multi-level quantization result of the fourth target is obtained by performing residual quantization on the semantic representation of the fourth target based on the plurality of codebooks.

19. An apparatus for constructing a graph structure, characterized in that the apparatus comprises: an acquisition module, configured to obtain text descriptions of a plurality of targets, wherein the plurality of targets comprise a plurality of users or a plurality of objects; and a processing module, configured to: generate semantic representations of the plurality of targets based on the text descriptions of the plurality of targets, respectively; extract latent factors of the plurality of targets based on the semantic representations of the plurality of targets, respectively, wherein the latent factors of the plurality of targets are used to reflect semantic similarity between the plurality of targets; and construct a graph structure, wherein the graph structure comprises nodes of the plurality of targets, association relationships between the nodes of the plurality of targets are determined based on the latent factors of the plurality of targets, and the graph structure is used for a downstream task.

20. The apparatus according to claim 19, characterized in that the plurality of targets comprise a first target, and the processing module is specifically configured to: generate a semantic representation of the first target based on the text description of the first target by using a large language model (LLM), wherein an input of the LLM comprises a prompt, and the prompt is used to instruct the LLM to analyze the first target based on the text description of the first target.

21. The apparatus according to claim 19 or 20, characterized in that the plurality of targets comprise a second target and a third target, there is an association relationship between the node of the second target and the node of the third target, and the latent factors of the second target and the latent factors of the third target are at least partially the same.

22. The apparatus according to claim 21, characterized in that the graph structure comprises nodes of a plurality of latent factors, the plurality of latent factors comprise the latent factors of the plurality of targets, and there are edges between the nodes of the plurality of targets and the nodes of the latent factors of the plurality of targets.

23. The apparatus according to any one of claims 19 to 22, characterized in that the plurality of targets comprise a fourth target, and the processing module is specifically configured to: perform vector quantization on the semantic representation of the fourth target based on a plurality of codebooks to obtain a multi-level quantization result of the fourth target, wherein the latent factors of the fourth target comprise the multi-level quantization result of the fourth target and/or an identifier of the multi-level quantization result.

24. The apparatus according to any one of claims 19 to 23, characterized in that the downstream task comprises a recommendation task.

25. A recommendation apparatus, characterized in that the apparatus comprises: a receiving module, configured to receive a recommendation request; and a processing module, configured to: obtain input information related to the recommendation request, wherein the input information comprises information of a first user, information of a candidate recommendation object, and enhanced features related to a graph structure, and the enhanced features comprise at least one of the following: the graph structure, features of the first user obtained based on the graph structure, or features of the candidate recommendation object obtained based on the graph structure; the graph structure comprises nodes of a plurality of targets, association relationships between the nodes of the plurality of targets are determined based on latent factors of the plurality of targets, the latent factors of the plurality of targets are extracted based on semantic representations of the plurality of targets, the semantic representations of the plurality of targets are generated based on text descriptions of the plurality of targets, and the plurality of targets comprise a plurality of users or a plurality of objects; and input the input information into a recommendation model to obtain a score of the candidate recommendation object, wherein the score reflects a matching degree between the first user and the candidate recommendation object.

26. The recommendation apparatus according to claim 25, characterized in that, in the graph structure, there is an association relationship between the node of the candidate recommendation object and the node of the first user.

27. A computing device, characterized in that the computing device comprises a processor and a memory, wherein the processor is configured to execute instructions stored in the memory, so that the computing device performs the method according to any one of claims 1 to 10 or the method according to any one of claims 11 to 18.

28. A computer program product comprising instructions, characterized in that, when the instructions are executed by a computing device, the computing device performs the method according to any one of claims 1 to 10 or the method according to any one of claims 11 to 18.

29. A computer-readable storage medium, characterized in that it comprises computer program instructions, and when the computer program instructions are executed by a computing device, the computing device performs the method according to any one of claims 1 to 10 or the method according to any one of claims 11 to 18.
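The four steps of claims 1 and 19 (text description, semantic representation, latent factors, graph) can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the keyword-count encoder `toy_encode` stands in for an LLM, the "dominant component" rule in `toy_factors` stands in for latent factor extraction, and all names (`VOCAB`, `build_graph`, the item descriptions) are hypothetical. The only idea taken from the claims is that two targets become associated when their latent factors overlap.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Vector = List[float]

# Hypothetical vocabulary for the toy encoder (a stand-in for an LLM).
VOCAB = ["space", "sci-fi", "cooking", "recipe"]


def toy_encode(text: str) -> Vector:
    """Toy semantic representation: keyword counts over VOCAB."""
    lowered = text.lower()
    return [float(lowered.count(word)) for word in VOCAB]


def toy_factors(rep: Vector) -> List[int]:
    """Toy latent factor: index of the dominant component (first one on ties)."""
    return [max(range(len(rep)), key=rep.__getitem__)]


def build_graph(
    descriptions: Dict[str, str],
    encode: Callable[[str], Vector],
    extract_factors: Callable[[Vector], Sequence[Hashable]],
) -> Set[Tuple[str, str]]:
    """Return undirected edges between targets that share a latent factor."""
    # Text description -> semantic representation.
    reps = {t: encode(text) for t, text in descriptions.items()}
    # Semantic representation -> latent factors.
    factors = {t: set(extract_factors(rep)) for t, rep in reps.items()}
    # Group targets by latent factor, then connect every pair in a group.
    members: Dict[Hashable, Set[str]] = defaultdict(set)
    for target, target_factors in factors.items():
        for factor in target_factors:
            members[factor].add(target)
    edges: Set[Tuple[str, str]] = set()
    for group in members.values():
        edges.update(combinations(sorted(group), 2))
    return edges


descriptions = {
    "item_a": "A sci-fi space opera",
    "item_b": "Space exploration documentary",
    "item_c": "Quick recipe for home cooking",
}
print(build_graph(descriptions, toy_encode, toy_factors))  # {('item_a', 'item_b')}
```

Passing the encoder and the factor extractor as parameters keeps the graph-building step independent of how representations and factors are produced, so an LLM-based encoder or a quantization-based extractor could be swapped in. Connecting every pair inside a factor group grows quadratically with group size; a real system would likely need a cap or sampling, which the claims do not address.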
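Claims 2, 3, 13, 14, and 20 generate the semantic representation by prompting an LLM and using hidden layer representations. The sketch below shows only the two surrounding steps under stated assumptions: wrapping a text description in a prompt (the wording of `PROMPT_TEMPLATE` is hypothetical, since the application does not disclose its prompt) and pooling hidden states from selected layers. The pooling strategy (mean over tokens, then concatenation across layers) is an assumption, not something the claims specify. In practice the activations would come from a model forward pass, for example a Hugging Face Transformers model called with `output_hidden_states=True`; here they are toy nested lists.

```python
from typing import List, Sequence

# Hypothetical prompt wording; the application does not disclose its prompt.
PROMPT_TEMPLATE = (
    "Analyze the following {kind} based on its description and summarize "
    "its key characteristics.\nDescription: {text}\nAnalysis:"
)


def build_prompt(kind: str, text: str) -> str:
    """Wrap a target's text description in an analysis instruction."""
    return PROMPT_TEMPLATE.format(kind=kind, text=text)


def pool_hidden_states(
    hidden_states: Sequence[Sequence[Sequence[float]]],
    layers: Sequence[int],
) -> List[float]:
    """Mean-pool over tokens within each selected layer, then concatenate.

    hidden_states[l][t][d] is the activation of layer l, token t, dimension d.
    Negative layer indices count from the last layer, as in Python lists.
    """
    rep: List[float] = []
    for layer in layers:
        tokens = hidden_states[layer]
        dim = len(tokens[0])
        rep.extend(sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim))
    return rep


# Toy activations: 3 layers x 2 tokens x 2 dimensions.
hidden = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 3.0], [3.0, 5.0]],
    [[2.0, 2.0], [4.0, 6.0]],
]
print(build_prompt("user", "Watches sci-fi films on weekends"))
print(pool_hidden_states(hidden, layers=[-1]))    # [3.0, 4.0]
print(pool_hidden_states(hidden, layers=[1, 2]))  # [2.0, 4.0, 3.0, 4.0]
```

Selecting one layer or several corresponds to the "one or more hidden layers" wording of claim 3; which layers work best is an empirical choice that the claims leave open.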
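Claims 4 and 5 (and their counterparts 15, 16, 21, and 22) describe a graph in which latent factors are themselves nodes, connected by edges to the targets that carry them, so that two targets whose latent factors are at least partially the same are linked through a shared factor node. A minimal adjacency-list sketch follows. The typed-tuple node encoding, the hierarchical factor codes such as `"c3/7"`, and the function names are illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Tuple[str, Hashable]  # ("target", target_id) or ("factor", factor_code)


def build_target_factor_graph(
    latent_factors: Dict[str, Iterable[Hashable]],
) -> Dict[Node, Set[Node]]:
    """Undirected adjacency list with target nodes and latent-factor nodes."""
    adj: Dict[Node, Set[Node]] = {}
    for target, factors in latent_factors.items():
        t_node: Node = ("target", target)
        adj.setdefault(t_node, set())  # keep targets with no factors
        for factor in factors:
            f_node: Node = ("factor", factor)
            adj[t_node].add(f_node)
            adj.setdefault(f_node, set()).add(t_node)
    return adj


def associated_targets(adj: Dict[Node, Set[Node]], target: str) -> Set[str]:
    """Targets linked to `target` through at least one shared factor node."""
    start: Node = ("target", target)
    found: Set[str] = set()
    for f_node in adj.get(start, set()):
        for t_node in adj[f_node]:
            if t_node != start:
                found.add(t_node[1])
    return found


# Hypothetical hierarchical factor codes: a shared level-1 code ("c3") means
# the two targets' latent factors are partially the same.
factors = {
    "item_a": ["c3", "c3/7"],
    "item_b": ["c3", "c3/1"],
    "item_c": ["c5", "c5/2"],
}
graph = build_target_factor_graph(factors)
print(associated_targets(graph, "item_a"))  # {'item_b'}
print(associated_targets(graph, "item_c"))  # set()
```

Routing associations through factor nodes avoids materializing every target-to-target edge: a factor shared by n targets costs n edges rather than n(n-1)/2, which is one plausible reason for the factor-node design, though the application's own rationale is not stated in this section.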
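Claims 6, 7, 17, 18, and 23 obtain a multi-level quantization result by vector-quantizing the semantic representation with several codebooks, in particular by residual quantization: each level picks the codeword nearest to what the previous levels left unexplained. The sketch below implements plain residual quantization with squared-L2 nearest-codeword search on toy 2-dimensional codebooks. The codebook values are made up (a real system would learn them), and the prefix-tuple identifiers returned by `prefix_ids` are one hypothetical way to realize "an identifier of the multi-level quantization result".

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(residual: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to the residual."""
    return min(range(len(codebook)), key=lambda i: sq_dist(residual, codebook[i]))


def residual_quantize(
    vec: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Quantize level by level; each level encodes what earlier levels missed."""
    residual = list(vec)
    reconstruction = [0.0] * len(vec)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest(residual, codebook)
        codes.append(idx)
        codeword = codebook[idx]
        reconstruction = [r + c for r, c in zip(reconstruction, codeword)]
        residual = [r - c for r, c in zip(residual, codeword)]
    return codes, reconstruction


def prefix_ids(codes: Sequence[int]) -> List[Tuple[int, ...]]:
    """Hypothetical identifiers, one per level, e.g. (1,), (1, 1)."""
    return [tuple(codes[: level + 1]) for level in range(len(codes))]


# Toy 2-level codebooks for 2-dimensional semantic representations.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]],
]
codes, recon = residual_quantize([1.4, 1.1], CODEBOOKS)
print(codes, prefix_ids(codes), recon)  # [1, 1] [(1,), (1, 1)] [1.5, 1.0]
```

With prefix identifiers, two targets that share the level-1 code but differ at level 2 have latent factors that are "at least partially the same" in the sense of claim 4, while targets sharing the full code are closer still. This coarse-to-fine property is a general feature of residual quantization, not a detail confirmed by the claims.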
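Claims 9, 11, and 25 feed the recommendation model with enhanced features, which may include features of the first user or the candidate recommendation object obtained from the graph structure. The sketch below shows one simple, assumed way to derive such a feature: concatenating a node's own embedding with the mean embedding of its graph neighbors. A dot product between the two enhanced vectors stands in for the recommendation model. The embeddings, graph, and function names are hypothetical, and the claims do not specify the aggregation or the model.

```python
from typing import Dict, List, Sequence, Set

Graph = Dict[str, Set[str]]
Embeddings = Dict[str, Sequence[float]]


def neighbor_mean(node: str, graph: Graph, emb: Embeddings) -> List[float]:
    """Mean embedding of the node's graph neighbors (zeros if it has none)."""
    dim = len(emb[node])
    nbrs = sorted(graph.get(node, set()))
    if not nbrs:
        return [0.0] * dim
    return [sum(emb[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]


def enhanced_feature(node: str, graph: Graph, emb: Embeddings) -> List[float]:
    """Own embedding concatenated with a graph-derived neighborhood feature."""
    return list(emb[node]) + neighbor_mean(node, graph, emb)


def score(user: str, item: str, graph: Graph, emb: Embeddings) -> float:
    """Stand-in recommendation model: dot product of enhanced features."""
    u = enhanced_feature(user, graph, emb)
    v = enhanced_feature(item, graph, emb)
    return sum(a * b for a, b in zip(u, v))


emb = {"u1": [1.0, 0.0], "i1": [1.0, 0.0], "i2": [0.0, 1.0], "i3": [1.0, 0.0]}
graph = {"u1": {"i3"}, "i1": {"i3"}, "i2": set(), "i3": {"u1", "i1"}}
print(score("u1", "i1", graph, emb))  # 2.0
print(score("u1", "i2", graph, emb))  # 0.0
```

Because user and object nodes can live in the same graph, the same helper produces enhanced features for either side, matching the "fifth target is the first user or the candidate recommendation object" wording of claim 9. A production model would replace the dot product with a trained scorer.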
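Claim 10 uses the graph in the recall phase: recall objects are those whose nodes are associated with the first user's node. The claim does not say how far an association may reach. The sketch below assumes a breadth-first search with a hop budget, so that objects reached through shared latent-factor nodes (two hops) count as associated, and ranks results by hop distance. The `max_hops` and `top_k` parameters, the `"f:"` prefix for factor nodes, and the example graph are all hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def graph_recall(
    graph: Dict[str, Set[str]],
    user: str,
    is_object: Callable[[str], bool],
    max_hops: int = 2,
    top_k: int = 10,
) -> List[str]:
    """Return up to top_k objects within max_hops of the user, nearest first."""
    hops = {user: 0}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for nbr in sorted(graph.get(node, set())):
            if nbr not in hops:
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    recalled = [n for n in hops if n != user and is_object(n)]
    recalled.sort(key=lambda n: (hops[n], n))  # nearer first, then by id
    return recalled[:top_k]


def is_obj(node: str) -> bool:
    """In this toy graph, object ids start with "i"."""
    return node.startswith("i")


# Toy graph: "f:" nodes are latent-factor nodes, "i" nodes are objects.
G = {
    "u1": {"f:c3"},
    "f:c3": {"u1", "i1", "i2"},
    "i1": {"f:c3", "f:c5"},
    "i2": {"f:c3"},
    "f:c5": {"i1", "i9"},
    "i9": {"f:c5"},
}
print(graph_recall(G, "u1", is_obj))              # ['i1', 'i2']
print(graph_recall(G, "u1", is_obj, max_hops=4))  # ['i1', 'i2', 'i9']
```

A small hop budget keeps recall tied to directly shared latent factors; raising it widens coverage at the cost of weaker associations. The recalled set would then be scored by the recommendation model of claim 9.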