A material source finding method and system

By constructing an industry chain knowledge graph and a user behavior sequence model, and training material vectors on both, the method makes use of the user behavior and industry chain relationships that existing material sourcing leaves unused, resulting in more accurate material recommendations and lower costs.

CN116186273B · Active · Publication Date: 2026-06-19 · GUANGZHOU SHIYUAN ELECTRONICS CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GUANGZHOU SHIYUAN ELECTRONICS CO LTD
Filing Date
2021-11-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing material sourcing methods fail to effectively utilize users' historical behavior information and upstream and downstream relationships in the industry chain, resulting in inaccurate material search and recall, and an inability to discover new products or substitutes.

Method used

We construct a knowledge graph of the industry chain and a graph representation model of user behavior sequence. We train material vectors through Translating Embeddings and graph vector models, integrate user interests and industry chain relationships, and use a dual-tower model to calculate the similarity between materials and keywords.

Benefits of technology

It improves the accuracy of material sourcing, enabling the discovery of relevant, new, or alternative materials that users are interested in, reducing procurement costs and increasing the diversity of the result set.

Generated by Eureka AI based on patent content.


Abstract

This application provides a material sourcing method and system. The method includes: crawling external data; constructing a supply chain knowledge graph based on the external data and internal enterprise data to obtain upstream and downstream relationships; acquiring user tracking data and constructing user behavior sequences based on the user tracking data; inputting the supply chain knowledge graph and user behavior sequences into a supply chain graph and user behavior fusion model for training; constructing and training a dual-tower model of materials and keywords based on historical retrieval data, converting all materials into material vectors; the dual-tower model includes a supply chain graph and user behavior fusion model and a keyword model; inputting keywords into the keyword model to obtain keyword vectors; calculating the similarity between the keyword vectors and all material vectors; and selecting at least one material with the highest similarity as the material sourcing result. This application enables rapid discovery of substitutes and new products in the material database, improving product price transparency and reducing procurement costs and related expenses.

Description

Technical Field

[0001] This application relates to the field of planning and procurement technology, and in particular to a material sourcing method and system that integrates supply chain knowledge graphs and user behavior sequence graphs.

Background Technology

[0002] Material sourcing has always been a crucial part of strategic procurement. Accurately finding alternative materials with similar specifications and functions but higher cost-effectiveness in the market can reduce costs or mitigate risks for enterprises from the source. Alternatively, it can help find suitable new products in the market, thereby keeping the company's products up-to-date and giving them unique novelty and competitiveness. In strategic procurement, there are many material lines, and the material categories under different material lines are numerous.

[0003] Extensive research has revealed that current material sourcing methods in the industry largely fall into three categories: sourcing based on shared industry supply sources, online sourcing using cloud services, and searching by constructing material vectors from user behavior sequences. The first two methods focus on collecting and organizing materials at the material level and neglect historical user behavior information; materials retrieved through user searches can only be matched using search keywords. The third method vectorizes users and materials, integrating search keywords, user behavior information, and material attribute information for searching, but it fails to consider the upstream and downstream relationships of materials within the industry, making it unable to discover new products or alternatives.

Summary of the Invention

[0004] In view of this, this application constructs a material vectorization representation model from an algorithmic perspective, based on industry chain mapping and graph representation learning algorithms, to help business operators quickly discover substitutes and new products in a material library of millions of items (hereinafter referred to as material sourcing), improve product price transparency, and reduce procurement costs and related expenses.

[0005] To achieve the above objectives, this application proposes a material sourcing method, comprising:

[0006] External data is crawled, and a supply chain knowledge graph is constructed based on the external data and internal enterprise data to obtain upstream and downstream relationships.

[0007] Obtain user tracking data and construct a user behavior sequence based on the user tracking data;

[0008] The industry chain knowledge graph and user behavior sequence are input into the industry chain graph and user behavior fusion model for training; the industry chain graph and user behavior fusion model includes a Translating Embeddings (TransE) model and a graph vector model;

[0009] Based on historical retrieval data, a dual-tower model of materials and keywords is constructed and trained, and all materials are transformed into material vectors; the dual-tower model includes a fusion model of industry chain map and user behavior, as well as a keyword model;

[0010] Input the keywords into the keyword model to obtain the keyword vector. Calculate the similarity between the keyword vector and all the material vectors, and select at least one material with the highest similarity as the material sourcing result.
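The final retrieval step in [0010] can be illustrated with a minimal sketch. The application does not name the similarity measure, so cosine similarity is assumed here; the material IDs, vectors, and the helper name `top_k_materials` are hypothetical and only stand in for the outputs of the trained models.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_materials(keyword_vec, material_vecs, k=1):
    """Return the k material IDs whose vectors are most similar to the keyword vector."""
    scored = [(cosine(keyword_vec, vec), mid) for mid, vec in material_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [mid for _, mid in scored[:k]]

# Hypothetical precomputed material vectors (assumed output of the material tower).
material_vecs = {
    "M1": [0.9, 0.1, 0.0],
    "M2": [0.1, 0.9, 0.0],
    "M3": [0.7, 0.3, 0.1],
}
keyword_vec = [1.0, 0.2, 0.0]  # hypothetical output of the keyword model

results = top_k_materials(keyword_vec, material_vecs, k=2)
print(results)
```

Because all material vectors can be computed ahead of time, only the keyword vector needs to be produced at query time; for a library of millions of items an approximate nearest-neighbor index would likely replace the full sort used in this sketch.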

[0011] Furthermore, the step of inputting the industry chain knowledge graph and user behavior sequence into the industry chain graph and user behavior fusion model for training includes:

[0012] The TransE model was used to train the node vectors representing upstream and downstream relationships in the aforementioned industry chain knowledge graph.

[0013] A material vector based on the user behavior sequence is trained using a graph vector model;

[0014] The node vector and the material vector are fused and compressed to obtain a fused vector;

[0015] After merging the fused vector with the original vectors of the TransE model and the graph vector model, the loss is calculated in the downstream task of each model, and the model is trained by backpropagation.

[0016] Furthermore, the crawling of external data, and the construction of a supply chain knowledge graph based on the external data and internal enterprise data to obtain upstream and downstream relationships, includes:

[0017] Extract internal entities from the existing structured data within the enterprise;

[0018] By utilizing entity recognition, relation extraction, and entity fusion, entities and relations required for the industry chain graph are extracted from crawled data. Combined with the internal entities of the enterprise, an industry chain knowledge graph is generated. Each relation in the knowledge graph is represented by a triple, i.e., head node-relationship-tail node.

[0019] Based on the relationship between products and categories, the corresponding industry chain node of the category is mapped, and the industry chain attributes of the current material are obtained based on the upstream and downstream relationships of the category.

[0020] Furthermore, the step of acquiring user tracking data and constructing a user behavior sequence based on the user tracking data includes:

[0021] Construct a graph structure from all users' click sequences and associate the interests of different users through materials;

[0022] A sequence is sampled by randomly walking through the paths in the graph structure.

[0023] By repeatedly traversing the path multiple times, multiple sequences are obtained, which serve as user behavior sequences.

[0024] Furthermore, the step of training the node vectors representing upstream and downstream relationships in the industry chain knowledge graph using the TransE model includes:

[0025] The nodes in the industrial chain knowledge graph are vectorized using a vector layer to obtain the node vectors of upstream and downstream relationships in the industrial chain knowledge graph;

[0026] The input head node ID and relation ID are processed through a vector layer to obtain the head node vector and relation vector, respectively. Then, using the triangle rule of vector addition, the sum vector of the head node vector and relation vector is obtained. Finally, the distance between this sum vector and the tail node vector is evaluated as the loss value, and backpropagation is used to train the model.

[0027] Furthermore, the graph vector model is a skip-gram model.

[0028] Furthermore, the step of training the material vector based on the user behavior sequence using a graph vector model includes:

[0029] Material pairs are constructed from the user behavior sequence using a sliding window approach, with each sliding window determining a central material.

[0030] For each material, its corresponding material vector is obtained through the vector parameter matrix. The neighboring materials are predicted using the center material, and the distance function is used as the prediction score. The final loss function is the positive sample score minus the negative sample score.

[0031] Further, the step of fusing and compressing the node vector and material vector to obtain the fused vector includes:

[0032] The node vector and the material vector are cross-processed to obtain a cross vector;

[0033] After passing through multiple fully connected layers, the cross vectors are compressed to obtain the required dimensions, and then returned to the graph vector model and TransE model for downstream tasks.

[0034] Furthermore, the inputs to the dual-tower model are material ID and keyword, and the output is the probability of whether the keyword corresponds to the clicked material. The loss function is a binary classification cross-entropy loss function.
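As a rough illustration of the dual-tower output and loss in [0034], the sketch below scores one (keyword, material) pair and computes the binary cross-entropy. Combining the two tower outputs with an inner product followed by a sigmoid is an assumption, as are the example vectors; the application only states that the output is a click probability trained with a binary classification cross-entropy loss.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def click_probability(material_vec, keyword_vec):
    """Assumed scoring rule: sigmoid of the inner product of the two tower outputs."""
    logit = sum(m * k for m, k in zip(material_vec, keyword_vec))
    return sigmoid(logit)

def binary_cross_entropy(p, label, eps=1e-12):
    """Cross-entropy for one pair; label is 1 if the material was clicked, else 0."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# Hypothetical tower outputs for one material and one keyword.
material_vec = [0.5, -0.2, 0.8]
keyword_vec = [0.4, 0.1, 0.9]

p = click_probability(material_vec, keyword_vec)
loss_clicked = binary_cross_entropy(p, 1)      # loss if the pair was clicked
loss_not_clicked = binary_cross_entropy(p, 0)  # loss if it was not
print(p, loss_clicked, loss_not_clicked)
```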

[0035] For the purposes described above, this application also proposes a material sourcing system, comprising:

[0036] The knowledge graph module is used to crawl external data and construct a supply chain knowledge graph based on the external data and internal enterprise data to obtain upstream and downstream relationships.

[0037] The behavior sequence module is used to acquire user tracking data and construct user behavior sequences based on the user tracking data.

[0038] The fusion model module is used to input the industry chain knowledge graph and user behavior sequence into the industry chain graph and user behavior fusion model for training; the industry chain graph and user behavior fusion model includes the TransE model and the graph vector model.

[0039] The dual-tower model module is used to construct and train a dual-tower model of materials and keywords based on historical retrieval data, and to convert all materials into material vectors; the dual-tower model includes the industry chain map and user behavior fusion model and the keyword model;

[0040] The sourcing calculation module is used to input keywords into the keyword model to obtain keyword vectors, perform similarity calculations between the keyword vectors and all the material vectors, and select at least one material with the highest similarity as the material sourcing result.

[0041] For the purposes described above, this application also proposes a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the above-described material sourcing method.

[0042] For the purposes described above, this application also proposes an electronic device, including a memory and a processor, wherein the memory stores computer-readable instructions, which, when executed by the processor, cause the processor to perform the steps of the above-described material sourcing method.

[0043] In summary, the advantages of this application and the user experience it brings are as follows:

[0044] (1) This application fully considers user behavior information. When a user conducts a search, it can discover the user's interests based on the behavior and return a more accurate and reliable result set.

[0045] (2) This application obtains upstream and downstream information from the industry chain map, resulting in a richer information structure and higher interpretability. When recommending new products and replacements, the upstream and downstream relationships in the industry chain can better identify related materials of the same type and specifications.

[0046] (3) This application explicitly learns information about users, materials, and upstream and downstream relationships of materials, making the final fusion vector expression more accurate. In addition to returning materials that users are interested in, it can also retrieve related materials in the industry chain, new materials, or replacement materials based on related materials. The retrieval is carried out from two aspects: user interests and the industry of the materials.

[0047] (4) This application is an end-to-end training model and also a paradigm of fusion model. That is, the user behavior sequence model and knowledge graph representation model can be replaced with other models for experimental training, and the best one can be selected. Moreover, the learned fusion vectors can also be used in other businesses as pre-training initialization weights.

Attached Figure Description

[0048] In the accompanying drawings, unless otherwise specified, the same reference numerals throughout the various drawings denote the same or similar parts or elements. These drawings are not necessarily drawn to scale. It should be understood that these drawings depict only some embodiments disclosed in this application and should not be construed as limiting the scope of this application.

[0049] Figure 1 shows a flowchart of the fusion of the industry chain map and user behavior according to an embodiment of this application.

[0050] Figure 2 shows a schematic diagram of the industry chain according to an embodiment of this application.

[0051] Figure 3 shows a schematic diagram of a triplet storage format according to an embodiment of this application.

[0052] Figure 4 shows the extraction of upstream and downstream relationships of entities based on a supply chain map according to an embodiment of this application.

[0053] Figure 5 shows a schematic diagram of the vectorized representation of the TransE model according to an embodiment of this application.

[0054] Figure 6 shows a schematic diagram of the vectorization process according to an embodiment of this application.

[0055] Figure 7 shows a schematic diagram illustrating the principle of the TransE model according to an embodiment of this application.

[0056] Figure 8 shows a schematic diagram of material vectorization representation according to an embodiment of this application.

[0057] Figure 9 shows a schematic diagram of a Skip-gram model according to an embodiment of this application.

[0058] Figure 10 shows a fusion model of the industry chain map and user behavior according to an embodiment of this application.

[0059] Figure 11 shows a schematic diagram of a dual-tower model of materials and keywords according to an embodiment of this application.

[0060] Figure 12 shows a flowchart of the retrieval process according to an embodiment of this application.

[0061] Figure 13 shows a schematic diagram of a material sourcing system according to an embodiment of this application.

[0062] Figure 14 shows a schematic diagram of the structure of an electronic device according to an embodiment of this application.

[0063] Figure 15 shows a schematic diagram of a storage medium provided in one embodiment of this application.

Detailed Implementation

[0064] The present application will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the invention. Furthermore, it should be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.

[0065] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.

[0066] For material sourcing, firstly, it's essential to build a comprehensive and dynamically updated material database to ensure both completeness and real-time updates. This database should include attribute and relationship information, encompassing semantic relationships relevant to real-world applications. Secondly, when searching the database, it's crucial to consider user behavior history, focusing on materials of interest to the user. Furthermore, the search results should be diverse, ensuring that when a user searches for a specific material, relevant new or alternative materials are returned.

[0067] Therefore, in order to achieve the above two functions, this application proposes a method and system that integrates chain-knowledge-graph (ckg) and user behavior sequence graph representation (graph-embedding) (ckg-graph-embedding).

[0068] As shown in Figure 1, the flow for fusing the industry chain map and user behavior according to an embodiment of this application includes the following steps:

[0069] External data is crawled, and a supply chain knowledge graph is constructed based on the external data and internal enterprise data to obtain upstream and downstream relationships.

[0070] Obtain user tracking data and construct a user behavior sequence based on the user tracking data;

[0071] The node embeddings of upstream and downstream relationships in the aforementioned industry chain knowledge graph are trained using a translation embeddings model.

[0072] Item embeddings based on the user behavior sequence are trained using a graph embedding model;

[0073] The node vector and material vector are fused and compressed to obtain a fusion embedding.

[0074] After merging the fused vector with the original vectors of the TransE model and the graph vector model, the loss is calculated in the downstream task of each model, and the model is trained by backpropagation.

[0075] In short, this application first utilizes web crawling technology to collect external industry information and combines it with internal enterprise data to construct a supply chain map, ensuring that the material library contains comprehensive materials and can be linked according to business scenarios. Then, in terms of material vectorization representation, user behavior information, upstream and downstream supply chain information, and material attributes are incorporated. This allows the vectorized retrieval to fully understand the user's search intent, returning not only the target material but also related upstream and downstream materials, new products, and replacement materials, increasing the diversity of the result set.

[0076] The following sections will introduce three aspects: vectorized representation of the supply chain map, vectorized representation of materials based on user behavior, and a fusion model of the supply chain map and user behavior. Additionally, in the following text, the terms "vector" and "embedding" have the same meaning and can be used interchangeably.

[0077] 1. Vectorized representation of the industry chain map

[0078] In terms of constructing the supply chain graph, for the scenario of material sourcing, this application designed four entity types: enterprise entity, product entity, industry entity, and category entity. It also designed basic relationships such as enterprise-to-enterprise, enterprise-to-product, industry-to-enterprise, and category-to-product. Based on this, using internal enterprise data as a foundation and simultaneously crawling external data, a knowledge graph is constructed from the bottom up, and the attributes and relationships of the knowledge graph are continuously improved. The basic structure of the graph is shown in Figure 2.

[0079] In the industry chain graph shown in Figure 2, entities such as industry, enterprise, product, and first- and second-level categories are first extracted from the existing structured data within the enterprise. Then, using technologies such as entity recognition, relation extraction, and entity fusion, the entities and relations required for the industry chain graph are extracted from the crawled data to supplement the graph data. In the storage of the industry chain graph, this application represents each relation as a triple, i.e., (head node-relationship-tail node), and stores it in a relational database. For example, in the triple (enterprise-production-product), the head and tail entity nodes (enterprise, product) are represented by their respective unique UUIDs (Universally Unique Identifiers), and the relation is represented by the corresponding relation type ID (production). The data table is stored as shown in Figure 3.
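The triple storage described above can be sketched with an in-memory SQLite table. The table name `triples` and its three columns are assumptions made for illustration; the actual table layout is the one shown in Figure 3, which is not reproduced here.

```python
import sqlite3
import uuid

# In-memory database with an assumed three-column triple table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples ("
    "head_id TEXT NOT NULL, relation_id TEXT NOT NULL, tail_id TEXT NOT NULL)"
)

# Hypothetical entities, each identified by its own UUID.
enterprise_id = str(uuid.uuid4())
product_id = str(uuid.uuid4())

# Store the triple (enterprise-production-product).
conn.execute(
    "INSERT INTO triples (head_id, relation_id, tail_id) VALUES (?, ?, ?)",
    (enterprise_id, "production", product_id),
)

# Look up everything this enterprise produces.
rows = conn.execute(
    "SELECT tail_id FROM triples WHERE head_id = ? AND relation_id = ?",
    (enterprise_id, "production"),
).fetchall()
print(rows)
```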

[0080] In material sourcing scenarios, materials can correspond to product entities. Based on the relationship between products and categories, they can be mapped to the corresponding industry chain node. Then, based on the upstream and downstream relationships of the categories, the industry chain attributes of the current material can be obtained. For example, for the material RC0805FR-073KL, the primary category is resistor. Based on the industry chain relationship, the upstream raw materials for resistors include tantalum-nickel and chromium alloys, while its downstream products can be PCB boards, semiconductor chips, etc., as shown in Figure 4.
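The mapping from a material to its industry chain attributes can be sketched as two lookups. The dictionaries below are hypothetical stand-ins for queries against the industry chain graph, populated only with the resistor example given above; the function name `chain_attributes` is likewise an assumption.

```python
# Hypothetical lookup tables; in practice these would be graph queries.
product_to_category = {"RC0805FR-073KL": "resistor"}
upstream_of = {"resistor": ["tantalum-nickel", "chromium alloys"]}
downstream_of = {"resistor": ["PCB boards", "semiconductor chips"]}

def chain_attributes(material):
    """Map a material to its category, then read the category's chain neighbors."""
    category = product_to_category[material]
    return {
        "category": category,
        "upstream": upstream_of.get(category, []),
        "downstream": downstream_of.get(category, []),
    }

attributes = chain_attributes("RC0805FR-073KL")
print(attributes)
```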

[0081] After obtaining the upstream and downstream relationships of materials, the next step is to integrate these relationships into the vectorized representation of the materials. This requires the use of knowledge graph vector embedding technology. This application selects the TransE model, the basic idea of which is shown in Figure 5.

[0082] The TransE model primarily learns the vectorized representations of nodes in a knowledge graph. Its core idea is to first vectorize each node in the graph and then learn the triplet relationships. For example, the triple (enterprise-production-product) can be vectorized as (h, r, t), where h = head (head node vector), r = relation (relation vector), and t = tail (tail node vector). In the vector space, if h and t are connected by the relation r, then h + r should be approximately equal to t. Using this idea, during training the TransE model takes h and r as input and aims to make h + r as close as possible to t, thus learning the triplet relationship information. The principles of the TransE model are explained in detail below.

[0083] The first step is how to vectorize the nodes. Vectorization means representing a node as a one-dimensional vector, giving it semantic information so that similarity can be calculated. In the industry, embedding layers are generally used to vectorize nodes: a parameter matrix $W \in \mathbb{R}^{n \times m}$ holds the vector parameters of all nodes and is updated based on downstream tasks. Here, n represents the number of nodes, and m represents the vector dimension. Thus, all nodes (n nodes) can be represented by vectors of the same dimension (m dimensions). The embedding process is shown in Figure 6.

[0084] During training, for each node, the corresponding vector representation is obtained by indexing it into the Embedding parameter matrix based on the unique UUID. This vector representation is then input into the downstream task to obtain the loss value. Backpropagation is then used to update the Embedding parameter matrix, thereby converging the matrix and obtaining the final vector parameters of each node.
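The lookup described in [0083] and [0084] amounts to indexing a row of the parameter matrix. The following is a minimal sketch with n = 4 hypothetical nodes and m = 3 dimensions; the names `index_of` and `embed` and the random initialization range are assumptions.

```python
import random
import uuid

random.seed(0)

# n = 4 hypothetical nodes, each identified by a UUID; m = 3 dimensions.
node_ids = [str(uuid.uuid4()) for _ in range(4)]
dim = 3

# Map each UUID to a row index of the parameter matrix W (n rows, m columns).
index_of = {node_id: i for i, node_id in enumerate(node_ids)}
W = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in node_ids]

def embed(node_id):
    """Return the row of W that holds the vector for this node."""
    return W[index_of[node_id]]

vector = embed(node_ids[2])
print(vector)
```

During training, the gradient of the downstream loss would flow back only into the rows that were looked up, which is how the matrix gradually comes to encode the graph.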

[0085] After obtaining the vector representations of each node, the model can be trained based on the TransE concept, as shown in Figure 7. The head node ID and relation ID are first input and passed through the embedding layer to obtain the head node vector and relation vector respectively. Then, using the triangle rule of vector addition, the sum vector h + r is obtained. Finally, the distance between h + r and the tail node vector t is evaluated as the loss value, and the model is trained by backpropagation.
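The forward computation described in [0085] can be sketched as follows, assuming the L2 distance mentioned later in the text. The two-dimensional vectors are made up for illustration and are not trained values.

```python
import math

def transe_distance(h, r, t):
    """L2 distance between the sum vector h + r and the tail vector t."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Hypothetical vectors: t_true lies at h + r, t_other does not.
h = [0.2, 0.5]
r = [0.3, -0.1]
t_true = [0.5, 0.4]
t_other = [-0.4, 0.9]

d_true = transe_distance(h, r, t_true)
d_other = transe_distance(h, r, t_other)
print(d_true, d_other)
```

A well-trained model would give a small distance for tails that really stand in relation r to the head and a large distance for unrelated tails.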

[0086] During training, a triple (h, r, t) is used as a training sample for the model. Triples that actually have a relationship in the industry chain graph are used as positive samples. Then, the head node or tail node of the triple is randomly replaced to create negative samples. The difference between the scores of the positive and negative samples is used as the loss function, as shown in the following formula:

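[0087] $L = \sum_{(h,r,t) \in S} \sum_{(h',r,t') \in S'_{(h,r,t)}} \left[ \gamma + d(h + r, t) - d(h' + r, t') \right]_+$

[0088] $S'_{(h,r,t)} = \{(h', r, t) \mid h' \in E\} \cup \{(h, r, t') \mid t' \in E\}$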

[0089]

[0090] Where S represents the set of positive triplet samples, S' represents the set of negative samples, and E represents the set of nodes in the graph. d(.) represents the distance between the vectors h+r and t, measured using the L2 norm. γ is a hyperparameter constant. The plus sign at the end of the loss formula indicates taking the positive part (negative values become 0, positive values are unchanged). During model learning, a smaller loss is better, which means that a smaller positive-sample distance d(h+r, t) and a larger negative-sample distance d(h′+r, t′) are better. This allows the model to learn the real triplet relationships in the industry chain graph. For example, the triplet (enterprise-production-product) is used as a positive sample, and a negative sample is created by randomly replacing the head node, giving (industry-production-product), or the tail node, giving (enterprise-production-first-level category). Training the model with the above formula makes it increasingly focus on the positive relationship (enterprise-production-product) while ignoring non-existent relationships such as (industry-production-product) or (enterprise-production-first-level category). This ensures that the parameter matrix learned by the embedding layer contains accurate industry chain graph information. Since both the head node and the tail node are nodes in the industry chain graph, they are uniformly represented using node embedding.
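The negative sampling and margin loss described in [0086] to [0090] can be sketched for a single positive and negative pair. The entity vectors, the margin value of 1.0, and the helper names are assumptions; a real implementation would also check that a corrupted triple is not itself a true triple in the graph, which this sketch omits.

```python
import math
import random

def l2_distance(h, r, t):
    """d(h + r, t) measured with the L2 norm."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def corrupt(triple, entities, rng):
    """Build a negative sample by replacing either the head or the tail."""
    h, r, t = triple
    if rng.random() < 0.5:
        h = rng.choice([e for e in entities if e != h])
    else:
        t = rng.choice([e for e in entities if e != t])
    return (h, r, t)

def margin_loss(pos, neg, emb, rel, gamma=1.0):
    """Positive part of gamma + d(positive) - d(negative)."""
    d_pos = l2_distance(emb[pos[0]], rel[pos[1]], emb[pos[2]])
    d_neg = l2_distance(emb[neg[0]], rel[neg[1]], emb[neg[2]])
    return max(0.0, gamma + d_pos - d_neg)

# Hypothetical, hand-picked vectors (not trained values).
emb = {
    "enterprise": [0.0, 0.0],
    "product": [1.0, 0.0],
    "industry": [0.0, 3.0],
    "category": [4.0, 4.0],
}
rel = {"production": [1.0, 0.0]}
entities = list(emb)

pos = ("enterprise", "production", "product")
neg = ("industry", "production", "product")  # head replaced
loss = margin_loss(pos, neg, emb, rel)
sampled = corrupt(pos, entities, random.Random(0))
print(loss, sampled)
```

With these vectors the positive triple already satisfies the margin, so its loss is zero and it would contribute no gradient; swapping the roles of the two triples gives a loss of 4.0.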

[0091] 2. Material vectorization representation based on user behavior

[0092] In constructing the material vector, a classic graph representation model is used. To incorporate user behavior information into the material embedding, this application constructs a graph representation of the user behavior sequence, then uses a random walk sampling method to sample material sequences, and finally inputs them into a skip-gram model to train and obtain the desired material embedding representation. The structure is shown in Figure 8.

[0093] In Figure 8, (a) represents the actions of different users clicking on materials. For example, user1 clicks on materials B, D, and A to view them, explicitly showing that user1 has some interest in materials B, D, and A. Then, the click sequences of all users are constructed into a graph structure as shown in (b). At this point, the interests of different users are associated through materials. Next, a random walk approach is used, where a node is randomly chosen and a sequence is sampled by traversing the graph. For example, node B can lead to D, and D to A, resulting in a sequence like BDA. This process is repeated multiple times to obtain multiple sequences, which are then input into the skip-gram model to train the material vectorization representation model. The resulting material vectors contain rich user behavior information.

[0094] In Figure 8, steps (a) to (b) are rule-based, constructing a graph structure from user behavior sequences. Steps (b) to (c) use a random walk approach, randomly selecting a node as the starting node of the current sequence, and then sampling the sequence on the graph by setting the walk step size. Finally, step (d) uses a skip-gram model to train on the sequence data; the model structure is shown in Figure 9.
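Steps (a) to (c) can be sketched as follows. The sketch assumes a directed, unweighted graph with one edge per consecutive click pair and uniform neighbor selection; only the first click sequence (B, D, A) comes from the text, and the other two are hypothetical.

```python
import random
from collections import defaultdict

def build_graph(click_sequences):
    """Directed graph with an edge a -> b for each consecutive click pair."""
    graph = defaultdict(list)
    for seq in click_sequences:
        for a, b in zip(seq, seq[1:]):
            if b not in graph[a]:
                graph[a].append(b)
    return graph

def random_walk(graph, start, walk_length, rng):
    """Sample one sequence; stop early at a node with no outgoing edge."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

# user1's sequence is from the text; the other two are made up.
click_sequences = [["B", "D", "A"], ["B", "E"], ["D", "E", "F"]]
graph = build_graph(click_sequences)

rng = random.Random(42)
start_nodes = sorted(graph)  # nodes that have at least one outgoing edge
walks = [random_walk(graph, rng.choice(start_nodes), 3, rng) for _ in range(5)]
print(walks)
```

Because materials B, D, and E are shared between users, a walk can pass from one user's clicks into another's, which is how the interests of different users become associated.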

[0095] The skip-gram model is an unsupervised model that primarily trains semantic vectorized representations of sequence data. For example, given a user sequence CDEB, in this scenario, this application can assume that when a user clicks on item E, they will also be influenced by the items before and after it; that is, different items in a sequence will have mutual influence relationships. How to learn these relationships is the core of skip-gram. In the following text, "item" will be used to represent a material.

[0096] The specific process mainly involves constructing item pairs from the sequence using a sliding window approach. Each sliding window determines a central item, and the central item is then used to predict neighboring items, allowing the model to explicitly learn the relationships of mutual influence within the sequence. For the same sequence CDEB, this application sets the window size to 2. With C as the central item, corresponding to v(t) in Figure 9, the item pairs (C, D) and (C, E) can be constructed, where D is the t+1 neighbor of C, i.e., v(t+1) in Figure 9, and E is the t+2 neighbor of C, i.e., v(t+2) in Figure 9. Since C is the starting node, it has no neighbors at t-1 and t-2. When D is the central item, the item pairs (D, C), (D, E), and (D, B) can be constructed, where C is D's t-1 neighbor, and E and B are D's t+1 and t+2 neighbors, respectively. Thus, with a window size of 2, each central item forms pairs with the items up to two steps before and after it. Continuing to slide the window yields the model's input dataset. At the same time, to improve the model's generalization ability and accelerate model training, negative sampling is used: each central item is randomly paired with other items to form negative samples.
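The pair construction for the sequence CDEB with a window size of 2 can be reproduced with a short sketch. The function name `window_pairs` is hypothetical.

```python
def window_pairs(sequence, window=2):
    """Pair each central item with the items up to `window` steps before and after it."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

pairs = window_pairs("CDEB", window=2)
print(pairs)
# C yields (C, D), (C, E); D yields (D, C), (D, E), (D, B), matching the text.
```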

[0097] The training process is similar to that of the supply chain map vectorization: the input is an item pair, and an embedding parameter matrix W is again used. For each item, its corresponding item embedding vector can be obtained through matrix W. The center item is used to predict neighboring items, and the distance function is used as the prediction score. The final loss function is the positive sample score minus the negative sample score, enabling the model to understand the relationship between positive sample item pairs. For example, for the sequence CDEB, with D as the center material, there are positive samples (D, C), (D, E), and (D, B). Then, negative samples such as (D, A) and (D, F) are randomly constructed. During model training, each positive sample is matched with 5 negative samples, and the loss formula is as follows:

[0098]

[0099] Where dist represents the distance formula between vector h+r and vector t, v(t) represents a positive sample, and u(t) represents a negative sample. As the loss decreases, the model learns the item-pair relationships of the positive samples better. Once training has converged, the updated embedding parameter matrix better represents the vector of each item (item embedding).
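
To make the description of the loss concrete, the sketch below computes "positive sample score minus negative sample score" for one center item. The loss formula itself is not reproduced in this text, so the form used here (a plain sum of positive distances minus a sum of negative distances), the Euclidean distance, the function name `pair_loss`, and the 2-dimensional embeddings are all assumptions for illustration.

```python
import math

def pair_loss(embeddings, positives, negatives):
    """Sum of positive-pair distances minus sum of negative-pair distances."""
    pos = sum(math.dist(embeddings[c], embeddings[n]) for c, n in positives)
    neg = sum(math.dist(embeddings[c], embeddings[n]) for c, n in negatives)
    return pos - neg

# Hypothetical 2-dimensional rows of the embedding parameter matrix W.
embeddings = {"D": [0.0, 0.0], "C": [0.0, 1.0], "A": [3.0, 4.0]}
loss = pair_loss(embeddings, positives=[("D", "C")], negatives=[("D", "A")])
print(loss)  # 1.0 - 5.0 = -4.0
```

Minimizing this quantity pulls positive pairs together and pushes negative pairs apart. Practical implementations often add a margin or a bounded scoring function so that the negative term cannot grow without limit; whether this application does so is not stated in the text.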

[0100] 3. Industry chain map and user behavior fusion model

[0101] After the node embedding of each node in the industry chain graph and the item embedding have been obtained, the following approach is designed to fuse the two by integrating TransE with the graph representation model. The ckg-graph-embedding fusion representation model shown in Figure 10 explicitly learns industry chain graph information during the construction of material vectors.

[0102] The left side of Figure 10 represents a material embedding model based on user behavior sequences, while the right side can be a vectorized representation model based on the TransE knowledge graph. The basic idea of fusion is to input material IDs into both models, obtaining the material item embedding vector and the TransE node embedding vector respectively. A cross-processing step is then performed on these two vectors to fuse them. The cross-processing can use a simple Hadamard product or an inner product, depending on the business performance. After the cross-processing, compression is performed: the cross vector is passed through multiple fully connected layers to reduce it to the required dimensions, and the result is returned to the two models for their downstream tasks. This model fusion paradigm effectively integrates the material embedding model with the information from the industry chain graph model, resulting in a richer fused embedding representation that includes information about user click behavior and interests, as well as information about upstream and downstream relationships in the industry chain graph.
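
The cross-and-compress step can be sketched in a few lines. This is only an illustration under stated assumptions: the Hadamard product is chosen as the cross operation, the fully connected layers use a ReLU activation (the text does not name one), and the weights, dimensions, and function names `hadamard`, `dense`, and `fuse` are hypothetical.

```python
def hadamard(a, b):
    """Element-wise product of two vectors of equal length."""
    return [x * y for x, y in zip(a, b)]

def dense(x, weights, bias):
    """One fully connected layer followed by ReLU (activation is an assumption)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def fuse(item_vec, node_vec, layers):
    """Cross the two embeddings, then compress through the given layers."""
    crossed = hadamard(item_vec, node_vec)
    for weights, bias in layers:
        crossed = dense(crossed, weights, bias)
    return crossed

item_vec = [1.0, 2.0, 3.0, 4.0]    # hypothetical item embedding (behavior model)
node_vec = [0.5, 1.0, -1.0, 2.0]   # hypothetical node embedding (TransE model)
layers = [
    ([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0]], [0.0, 0.0]),  # 4 -> 2
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),                      # 2 -> 2
]
fused = fuse(item_vec, node_vec, layers)
print(fused)  # [10.5, 0.0]
```

In a trained model the layer weights would be learned by backpropagation from the downstream losses of both models rather than fixed by hand as they are here.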

[0103] Meanwhile, regarding sample construction, since the construction of positive and negative samples is particularly important for the model, negative samples are constructed from the following aspects based on business experience:

[0104] 1. Randomly sampled materials within the same category as the positive sample materials;

[0105] 2. Among the materials that users have seen but haven't clicked, select the top K materials by the number of times they have been seen;

[0106] 3. Promotional materials that users clicked by accident;

[0107] 4. User-exposed materials that have never been clicked.

[0108] The four negative sampling sources above take into account several aspects, such as popularity, explicit negative feedback from users, and randomness. The positive and negative samples constructed in this way can effectively distinguish whether users are interested in the current material.
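
The four negative sample sources listed above can be gathered as in the sketch below. The data layout is an assumption made for illustration: `impressions` is a flat list of exposed material IDs (one entry per exposure), `clicks` is the set of clicked materials, and the accidentally clicked promotional materials are assumed to be supplied by an upstream rule, since the text does not say how they are identified. The function name `build_negatives` and all material IDs are hypothetical.

```python
import random
from collections import Counter

def build_negatives(positive, category_of, impressions, clicks,
                    accidental_promos, top_k, rng):
    """Collect candidate negatives for one positive material, by source."""
    # 1. A random material from the same category as the positive material.
    same_category = [m for m, c in category_of.items()
                     if c == category_of[positive] and m != positive]
    random_same = [rng.choice(same_category)] if same_category else []
    # 2. Top-K materials by exposure count among those seen but not clicked.
    seen_counts = Counter(m for m in impressions if m not in clicks)
    popular_unclicked = [m for m, _ in seen_counts.most_common(top_k)]
    # 3. Promotional materials clicked by accident (identified upstream).
    promos = list(accidental_promos)
    # 4. Every exposed material that was never clicked.
    never_clicked = sorted(set(impressions) - set(clicks))
    return {"random_same_category": random_same,
            "popular_unclicked": popular_unclicked,
            "accidental_promos": promos,
            "never_clicked": never_clicked}

category_of = {"M1": "capacitor", "M2": "capacitor", "M3": "resistor",
               "M4": "capacitor", "P1": "promo"}
impressions = ["M2", "M2", "M2", "M3", "M3", "M4", "M1", "P1"]
result = build_negatives("M1", category_of, impressions, clicks={"M1", "P1"},
                         accidental_promos=["P1"], top_k=2, rng=random.Random(0))
print(result)
```

Keeping the sources separate in the returned dictionary makes it easy to control how many negatives are drawn from each source when assembling training batches.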

[0109] In addition to replacing the head node, when the material is the head node, the relation vector can also be replaced to form a negative sample, so that the model focuses on the upstream and downstream relationship information.
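
The two ways of corrupting a triple (replacing the head node or replacing the relation) might look like the following sketch. The entity names, the relation names `upstream_of` and `downstream_of`, the function name `corrupt_triple`, and the filtering of corrupted triples that already exist in the graph are assumptions for the example and are not taken from this application.

```python
import random

def corrupt_triple(triple, entities, relations, known, rng, corrupt_relation=False):
    """Return a negative triple by replacing the relation or the head node."""
    head, relation, tail = triple
    if corrupt_relation:
        candidates = [(head, r, tail) for r in relations]
    else:
        candidates = [(e, relation, tail) for e in entities if e != tail]
    # Drop the original triple and any other triple known to be true.
    candidates = [c for c in candidates if c not in known]
    return rng.choice(candidates)

entities = ["chip", "pcb", "display", "tv"]      # hypothetical graph nodes
relations = ["upstream_of", "downstream_of"]     # hypothetical relation types
known = {("chip", "upstream_of", "pcb"), ("pcb", "upstream_of", "tv")}
rng = random.Random(0)

triple = ("chip", "upstream_of", "pcb")
neg_head = corrupt_triple(triple, entities, relations, known, rng)
neg_rel = corrupt_triple(triple, entities, relations, known, rng, corrupt_relation=True)
print(neg_head)  # head replaced by "display" or "tv"
print(neg_rel)   # ('chip', 'downstream_of', 'pcb')
```

Corrupting the relation keeps both materials fixed, so the only way for the model to separate the positive from the negative is through the relation embedding itself.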

[0110] Based on the aforementioned industry chain graph and user behavior fusion model, a material representation vector integrating user information and industry chain information can be obtained, i.e., the fusion embedding in Figure 10, which is used in the retrieval system. Based on historical retrieval data, a dual-tower model of item and query is constructed, allowing both item and keyword embeddings to be trained and fine-tuned in the same semantic space. This ensures that the ckg-graph-embedding includes query semantic information. The model is shown in Figure 11, where MLP is an abbreviation for Multi-layer Perceptron.

[0111] This dual-tower model takes a material ID and a keyword as input and outputs the probability that a click occurs; in other words, it predicts whether the corresponding item is clicked after a keyword is entered. The loss function is the binary cross-entropy loss. After training, the left tower can convert all materials into material embeddings offline and store them in a cache database. The right-side query layer can be deployed online to convert real-time queries into embedding representations. Finally, the query embedding is used to perform similarity retrieval in the material database, selecting the top N most similar materials as the search results and returning them to the user. The overall process is shown in Figure 12.
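
The training objective of the dual-tower model can be illustrated with a small numeric sketch. It assumes that the two tower outputs are combined by an inner product followed by a sigmoid to yield the click probability; the text specifies only the inputs, the probability output, and the binary cross-entropy loss, so this interaction and the function names are assumptions.

```python
import math

def click_probability(material_vec, query_vec):
    """Sigmoid of the inner product of the two tower outputs (an assumption)."""
    score = sum(m * q for m, q in zip(material_vec, query_vec))
    return 1.0 / (1.0 + math.exp(-score))

def binary_cross_entropy(p, label, eps=1e-12):
    """Binary cross-entropy for one (prediction, label) pair."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

material_vec = [0.0, 0.0]  # hypothetical output of the material tower
query_vec = [1.0, 1.0]     # hypothetical output of the keyword tower
p = click_probability(material_vec, query_vec)
print(p)                           # 0.5, since the inner product is zero
print(binary_cross_entropy(p, 1))  # ln 2 for a clicked sample
```

Because the interaction between the towers is a simple function of the two vectors, material embeddings can be precomputed offline and only the keyword tower needs to run at query time.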

[0112] The main steps shown in Figure 12 are as follows:

[0113] 1. The user enters a query keyword;

[0114] 2. The query layer model converts the keywords into a query embedding vector;

[0115] 3. Calculate the similarity between the query embedding and all material embeddings, and select the N materials with the highest similarity as the result set to return.

[0116] 4. Return the material result set to the user.
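
The retrieval steps above can be sketched as a top-N similarity search over cached material embeddings. Cosine similarity is used here as one common choice; the text says only "similarity", so the metric, the function names `cosine` and `top_n`, and the cached vectors are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_n(query_vec, material_cache, n):
    """Return the IDs of the n materials most similar to the query embedding."""
    ranked = sorted(material_cache.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [material_id for material_id, _ in ranked[:n]]

# Hypothetical material embeddings computed offline and stored in a cache.
material_cache = {"M1": [1.0, 0.0], "M2": [0.8, 0.6], "M3": [0.0, 1.0]}
query_vec = [1.0, 0.1]  # hypothetical output of the online query layer

result_set = top_n(query_vec, material_cache, n=2)
print(result_set)  # ['M1', 'M2']
```

This sketch scans every cached vector; for a large material database, an approximate nearest-neighbor index would typically replace the exhaustive sort.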

[0117] In summary, the search results now include not only materials related to the user's historical behavior, but also materials related to the upstream and downstream of the industry chain, making the search results more accurate.

[0118] According to another aspect of this application, an embodiment provides a material sourcing system for executing the material sourcing method described in the above embodiments. As shown in Figure 13, the system includes:

[0119] The knowledge graph module 501 is used to crawl external data and construct an industry chain knowledge graph based on the external data and internal enterprise data to obtain upstream and downstream relationships.

[0120] The behavior sequence module 502 is used to acquire user tracking data and construct a user behavior sequence based on the user tracking data.

[0121] The fusion model module 503 is used to input the industry chain knowledge graph and user behavior sequence into the industry chain graph and user behavior fusion model for training; the industry chain graph and user behavior fusion model includes the TransE model and the graph vector model;

[0122] The dual-tower model module 504 is used to construct and train a dual-tower model of materials and keywords based on historical retrieval data, and to convert all materials into material vectors; the dual-tower model includes the industry chain map and user behavior fusion model and the keyword model;

[0123] The source tracing calculation module 505 is used to input keywords into the keyword model to obtain keyword vectors, perform similarity calculations between the keyword vectors and all the material vectors, and select at least one material with the highest similarity as the material source tracing result.

[0124] The material sourcing system provided in the above embodiments of this application and the material sourcing method provided in the embodiments of this application are based on the same inventive concept and have the same beneficial effects as the methods adopted, run or implemented by the application stored therein.

[0125] This application also provides an electronic device corresponding to the material sourcing method provided in the foregoing embodiments, for executing the material sourcing method. This application does not limit the scope of the embodiments.

[0126] Please refer to Figure 14, which shows a schematic diagram of an electronic device provided by some embodiments of this application. As shown in Figure 14, the electronic device 2 includes: a processor 200, a memory 201, a bus 202, and a communication interface 203. The processor 200, the communication interface 203, and the memory 201 are connected via the bus 202. The memory 201 stores a computer program that can run on the processor 200. When the processor 200 runs the computer program, it executes the material sourcing method provided in any of the foregoing embodiments of this application.

[0127] The memory 201 may include high-speed random access memory (RAM) or non-volatile memory, such as at least one disk storage device. Communication between this system network element and at least one other network element is achieved through at least one communication interface 203 (which can be wired or wireless), such as the Internet, wide area network, local area network, or metropolitan area network.

[0128] Bus 202 can be an ISA bus, PCI bus, or EISA bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. The memory 201 is used to store programs. After receiving an execution instruction, the processor 200 executes the program. The material sourcing method disclosed in any of the foregoing embodiments of this application can be applied to the processor 200, or implemented by the processor 200.

[0129] The processor 200 may be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of the processor 200 or by instructions in software form. The processor 200 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied in the execution of a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. The storage medium is located in memory 201. The processor 200 reads the information in memory 201 and, in conjunction with its hardware, completes the steps of the above method.

[0130] The electronic device provided in this application embodiment and the material sourcing method provided in this application embodiment are based on the same inventive concept and have the same beneficial effects as the methods they adopt, operate or implement.

[0131] This application also provides a computer-readable storage medium corresponding to the material sourcing method provided in the foregoing embodiments. Referring to Figure 15, the computer-readable storage medium shown is an optical disc 30, on which a computer program (i.e., a program product) is stored. When the computer program is run by a processor, it executes the material sourcing method provided in any of the foregoing embodiments.

[0132] It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other optical and magnetic storage media, which will not be elaborated here.

[0133] The computer-readable storage medium provided in the above embodiments of this application and the material sourcing method provided in the embodiments of this application are based on the same inventive concept and have the same beneficial effects as the methods adopted, run or implemented by the application programs stored therein.

[0134] It should be noted that:

[0135] The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems can also be used in conjunction with the teachings herein. The required structure for constructing such systems is apparent from the above description. Furthermore, this application is not directed to any particular programming language. It should be understood that the content of this application described herein can be implemented using various programming languages, and the above description of specific languages is for the purpose of disclosing the best mode of implementation of this application.

[0136] Numerous specific details are set forth in the specification provided herein. However, it will be understood that embodiments of this application may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification.

[0137] Similarly, it should be understood that, in order to simplify this application and aid in understanding one or more of the various inventive aspects, in the above description of exemplary embodiments of this application, various features of this application are sometimes grouped together into a single embodiment, figure, or description thereof. However, this method of disclosure should not be construed as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as reflected in the following claims, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into that detailed description, wherein each claim itself is a separate embodiment of this application.

[0138] Those skilled in the art will understand that modules in the device of the embodiments can be adaptively changed and placed in one or more devices different from that embodiment. Modules, units, or components in the embodiments can be combined into a single module, unit, or component, and further, they can be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and / or processes or units are mutually exclusive, any combination can be used to combine all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature that serves the same, equivalent, or similar purpose.

[0139] Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features but not others included in other embodiments, combinations of features from different embodiments are intended to be within the scope of this application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.

[0140] The various component embodiments of this application can be implemented in hardware, or as software modules running on one or more processors, or a combination thereof. Those skilled in the art will understand that microprocessors or digital signal processors (DSPs) can be used in practice to implement some or all of the functions of some or all of the components in the virtual machine creation system according to the embodiments of this application. This application can also be implemented as a device or system program (e.g., a computer program and computer program product) for performing part or all of the methods described herein. Such an implementation of this application can be stored on a computer-readable medium, or can be in the form of one or more signals. Such signals can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

[0141] It should be noted that the above embodiments are illustrative of this application and not restrictive, and that those skilled in the art can devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses should not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in the claims. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. This application can be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In the unit claims enumerating several systems, several of these systems may be embodied by the same item of hardware. The use of the words first, second, and third, etc., does not indicate any order. These words can be interpreted as names.

[0142] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can easily conceive of various variations or substitutions within the technical scope disclosed in this application, and these should all be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A material source finding method, characterized in that it comprises: crawling external data, and constructing an industry chain knowledge graph based on the external data and internal enterprise data to obtain upstream and downstream relationships; obtaining user tracking data and constructing a user behavior sequence based on the user tracking data; inputting the industry chain knowledge graph and the user behavior sequence into an industry chain graph and user behavior fusion model for training, wherein the industry chain graph and user behavior fusion model includes a TransE model and a graph vector model; constructing and training a dual-tower model of materials and keywords based on historical retrieval data, and converting all materials into material vectors, wherein the dual-tower model includes the industry chain graph and user behavior fusion model and a keyword model for converting keywords into keyword vectors; inputting keywords into the keyword model to obtain a keyword vector; and calculating the similarity between the keyword vector and all the material vectors, and selecting at least one material with the highest similarity as the material sourcing result.

2. The method according to claim 1, characterized in that the step of inputting the industry chain knowledge graph and user behavior sequence into the industry chain graph and user behavior fusion model for training includes: training, using the TransE model, the node vectors representing upstream and downstream relationships in the industry chain knowledge graph; training vectors based on the user behavior sequence using the graph vector model; fusing and compressing the node vector and the user behavior sequence vector to obtain a fused vector; and, after merging the fused vector with the original vectors of the TransE model and the graph vector model, calculating the loss in the downstream task of each model and training the model by backpropagation.

3. The method according to claim 1, characterized in that the step of crawling external data and constructing an industry chain knowledge graph based on the external data and internal enterprise data to obtain upstream and downstream relationships includes: extracting internal entities from the existing structured data within the enterprise; extracting, by means of entity recognition, relation extraction, and entity fusion, the entities and relations required for the industry chain graph from the crawled data, and combining them with the internal entities of the enterprise to generate the industry chain knowledge graph, wherein each relation in the knowledge graph is represented by a triple, i.e., head node-relationship-tail node; and mapping, based on the relationship between products and categories, the industry chain node corresponding to the category, and obtaining the industry chain attributes of the current material based on the upstream and downstream relationships of the category.

4. The method according to claim 1, characterized in that the step of acquiring user tracking data and constructing a user behavior sequence based on the user tracking data includes: constructing a graph structure from all users' click sequences and associating the interests of different users through materials; sampling a sequence by randomly walking along the paths in the graph structure; and repeating the random walk multiple times to obtain multiple sequences, which serve as user behavior sequences.

5. The method according to claim 2, characterized in that the step of training the node vectors representing upstream and downstream relationships in the industry chain knowledge graph using the TransE model includes: vectorizing the nodes in the industry chain knowledge graph using a vector layer to obtain the node vectors of upstream and downstream relationships in the industry chain knowledge graph; processing the input head node ID and relation ID through a vector layer to obtain the head node vector and relation vector, respectively; obtaining, using the triangular representation of vectors, the sum vector of the head node vector and relation vector; and evaluating the distance between the sum of the head node vector and relation vector and the tail node vector as the loss value, and training the model by backpropagation.

6. The method according to claim 2, characterized in that the graph vector model is a skip-gram model used to train the semantic vectorized representation of sequence data; during model training, each positive sample is matched with 5 negative samples, and the loss formula is as follows: where dist represents the distance between vector h+r and vector t; v(t) represents the center positive sample corresponding to the center of the sliding window; n is the size of the sliding window; v(t+n) represents the n positive samples adjacent to the center positive sample v(t); u(t) represents the negative sample; dist(v(t), v(t+n)) represents the distance between v(t) and the corresponding positive samples from v(t+1) to v(t+n); and dist(v(t), u(t)) represents the distance between the positive sample v(t) and the negative sample u(t).

7. The method according to claim 6, characterized in that the step of training a material vector based on the user behavior sequence using a graph vector model includes: constructing material pairs from the user behavior sequence using a sliding window approach, with each sliding window determining a central material; obtaining, for each material, its corresponding material vector through the vector parameter matrix; and predicting the neighboring materials using the center material, with the distance function used as the prediction score, wherein the final loss function is the positive sample score minus the negative sample score.

8. The method according to claim 2 or 7, characterized in that the step of fusing and compressing the node vector and material vector to obtain the fused vector includes: cross-processing the node vector and the material vector to obtain a cross vector; and compressing the cross vector through multiple fully connected layers to obtain the required dimensions, and then returning it to the graph vector model and TransE model for downstream tasks.

9. The method according to claim 1, characterized in that, during training, the dual-tower model takes a material ID and a keyword as inputs and outputs the probability of whether the keyword corresponds to the clicked material; after training, the dual-tower model takes material IDs and keywords as inputs and outputs the top N materials with the highest similarity; and the loss function is the binary cross-entropy loss function.

10. A material sourcing system, characterized in that it comprises: a knowledge graph module, used to crawl external data and construct an industry chain knowledge graph based on the external data and internal enterprise data to obtain upstream and downstream relationships; a behavior sequence module, used to acquire user tracking data and construct user behavior sequences based on the user tracking data; a fusion model module, used to input the industry chain knowledge graph and user behavior sequence into an industry chain graph and user behavior fusion model for training, wherein the industry chain graph and user behavior fusion model includes a TransE model and a graph vector model; a dual-tower model module, used to construct and train a dual-tower model of materials and keywords based on historical retrieval data, and to convert all materials into material vectors, wherein the dual-tower model includes the industry chain graph and user behavior fusion model and a keyword model for converting keywords into keyword vectors; and a source tracing calculation module, used to input keywords into the keyword model to obtain keyword vectors, perform similarity calculations between the keyword vectors and all the material vectors, and select at least one material with the highest similarity as the material source tracing result.

11. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1-9.

12. An electronic device comprising a memory and a processor, characterized in that the memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the method according to any one of claims 1-9.

Citation Information

Patent Citations

  • Information recommendation method and device, electronic equipment and storage medium

    CN112395506A

  • Supplier recommendation method based on knowledge graph

    CN113127754A