Multi-category recognition method and device, electronic equipment and storage medium

By constructing a target multi-category recognition model, the problem of inaccurate category recognition is solved, achieving more accurate category recognition, improving the user shopping experience, and adapting to changes in the candidate category set, supporting the flexibility of product management.

CN117131155B · Active · Publication Date: 2026-06-16 · XIAOMI TECH (WUHAN) CO LTD +2

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: XIAOMI TECH (WUHAN) CO LTD
Filing Date: 2023-08-17
Publication Date: 2026-06-16

Application Information

IPC: G06F16/334; G06F16/35; G06N3/0464; G06N3/08

CPC: G06F16/3346; G06F16/35; G06N3/0464; G06N3/08

AI Tagging

Application Domain

Special data processing applications; Neural learning methods

Abstract

The present disclosure relates to a multi-category identification method and device, electronic equipment and storage medium, and relates to the technical field of natural language processing. The method comprises the following steps: obtaining a real-time request text input by a user; inputting the real-time request text into a target multi-category identification model to obtain a category prediction probability corresponding to each candidate category output by the target multi-category identification model, wherein the target multi-category identification model is provided with a candidate category set, and the candidate category set comprises a plurality of candidate categories; sorting all category prediction probabilities in descending order to obtain a category prediction probability sequence generated after sorting; obtaining N candidate categories corresponding to the first N category prediction probabilities in the category prediction probability sequence, and taking the N candidate categories as target categories corresponding to the real-time request text. The present application can strengthen the understanding of user intent through the target multi-category identification model, provide more accurate category identification results, and further improve the user shopping experience and satisfaction.

Description

Technical Field

[0001] This disclosure relates to the field of natural language processing technology, and in particular to a multi-category recognition method, apparatus, electronic device, and storage medium.

Background Technology

[0002] In today's e-commerce search, the variety and quantity of products are vast. Simple keyword matching can no longer satisfy users' comprehensive and diverse query needs or handle the mapping relationships between products. Category recognition technology, as one of the key technologies in e-commerce search, can accurately identify the user's true intent in a search scenario. This technology can not only narrow the recall scope of the search system's recall module, but also provide category features to the ranking module, so that products users are more interested in are displayed at the top. This makes it easier for users to find the products they need, thereby improving the shopping experience and satisfaction. Therefore, accurately identifying categories from user search text is an urgent problem to be solved.

Summary of the Invention

[0003] This disclosure provides a multi-category identification method, apparatus, electronic device, and storage medium to at least solve the problem of inaccurate category identification of user search text. The technical solution of this disclosure is as follows:

[0004] According to a first aspect of the present disclosure, a multi-category recognition method is provided, comprising: acquiring real-time request text input by a user; inputting the real-time request text into a target multi-category recognition model, acquiring the category prediction probability corresponding to each candidate category output by the target multi-category recognition model, wherein the target multi-category recognition model has a preset candidate category set, the candidate category set containing multiple candidate categories; sorting all category prediction probabilities in descending order, acquiring a sorted category prediction probability sequence; acquiring the N candidate categories corresponding to the top N category prediction probabilities in the category prediction probability sequence, and using the N candidate categories as the target category corresponding to the real-time request text.

[0005] In some embodiments, a method for training a target multi-category recognition model includes: acquiring a sample dataset, wherein each sample data in the sample dataset contains a request text and one or more category tags associated with the request text; acquiring a preset candidate category set, the candidate category set containing multiple candidate categories; training an initial multi-category recognition model based on the sample dataset and the candidate category set, and acquiring the category prediction probability of each candidate category corresponding to each request text output by the initial multi-category recognition model; iteratively training the initial multi-category recognition model based on the category prediction probability of each candidate category corresponding to each request text, combined with one or more category tags associated with the request text, to acquire the target multi-category recognition model generated after training.

[0006] In some embodiments, the initial multi-category recognition model consists of a text encoder, a category encoder, a semantic aggregation layer, and an output layer. Training the initial multi-category recognition model based on the sample dataset and the candidate category set, and obtaining the category prediction probability of each candidate category corresponding to each request text output by the initial multi-category recognition model, includes: inputting the request text from the sample dataset into the text encoder to obtain the text feature vector output by the text encoder; inputting the candidate category set into the category encoder to obtain the category vector output by the category encoder; aggregating the category vector and the text feature vector based on the semantic aggregation layer to obtain the aggregated feature vector; and outputting the category prediction probability for each candidate category corresponding to each request text after processing by the output layer based on the aggregated feature vector.

[0007] In some embodiments, the text encoder comprises a position encoder, a pre-trained sentence encoder, and a stacked encoder. Inputting the request text from the sample dataset into the text encoder to obtain the text feature vector output by the text encoder includes: inputting the request text from the sample dataset into the position encoder to obtain the position embedding vector output by the position encoder; inputting the request text from the sample dataset into the pre-trained sentence encoder to obtain the semantic feature vector output by the pre-trained sentence encoder, wherein the position embedding vector and the semantic feature vector have the same dimension; adding the position embedding vector and the semantic feature vector to obtain the semantic-position fusion vector; and inputting the semantic-position fusion vector into the stacked encoder to obtain the text feature vector output by the stacked encoder after feature extraction.

[0008] In some embodiments, the category vector and text feature vector are aggregated based on the semantic aggregation layer to obtain the aggregated feature vector, including: obtaining the weight of each token vector in the text feature vector based on the attention mechanism; and weighting the category vector based on the weight of each token vector to obtain the aggregated feature vector.

[0009] In some embodiments, based on the aggregated feature vector, the category prediction probability of each candidate category corresponding to each request text is output after processing by the output layer, including: performing linear transformation and function activation on the aggregated feature vector to obtain the category prediction probability of each candidate category corresponding to each request text.
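
The aggregation and output steps in the two paragraphs above can be sketched as follows. This is only one possible reading, not the patented implementation: it assumes scaled dot-product attention with the category vector as the query, an element-wise product as the way the category vector is "weighted", a single linear layer, and a sigmoid activation. None of these choices is stated in the disclosure, and the names `aggregate`, `predict`, `w`, and `b` are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate(category_vec, token_vecs):
    """Assumed aggregation: attention weights over tokens, then gate the category vector."""
    dim = len(category_vec)
    # One attention weight per token vector (assumption: scaled dot-product scores).
    weights = softmax([dot(category_vec, t) / math.sqrt(dim) for t in token_vecs])
    # Attention-pooled text representation.
    pooled = [sum(w * t[k] for w, t in zip(weights, token_vecs)) for k in range(dim)]
    # Assumption: the category vector is weighted element-wise by the pooled text vector.
    return [c * p for c, p in zip(category_vec, pooled)], weights


def predict(category_vecs, token_vecs, w, b):
    """Linear transformation plus activation (assumed sigmoid) per candidate category."""
    probabilities = {}
    for name, category_vec in category_vecs.items():
        aggregated, _ = aggregate(category_vec, token_vecs)
        probabilities[name] = sigmoid(dot(w, aggregated) + b)
    return probabilities


# Toy, hand-made vectors purely for illustration.
tokens = [[1.0, 0.0], [0.0, 1.0]]
categories = {"wristband": [2.0, 0.0], "wearable accessories": [0.0, 2.0]}
print(predict(categories, tokens, w=[1.0, 1.0], b=0.0))
```

A sigmoid is used here because each request text may carry several category tags, so the per-category probabilities need not sum to one; a different activation would fit the same description.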

[0010] In some embodiments, iteratively training an initial multi-category recognition model to obtain a target multi-category recognition model generated after training includes: iteratively training the initial multi-category recognition model until the loss function of the initial multi-category recognition model converges, ending the training, and obtaining the target multi-category recognition model generated after training; or, iteratively training the initial multi-category recognition model until the number of training iterations of the initial multi-category recognition model reaches a preset number, ending the training, and obtaining the target multi-category recognition model generated after training.
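
The two stopping conditions above (loss convergence, or a preset number of iterations) can be illustrated with a minimal sketch. The function name `train_until_stop`, the `tolerance` parameter, and the convergence test (absolute change in loss between consecutive iterations) are assumptions for illustration; the disclosure does not say how convergence is measured. The toy step below minimizes a one-dimensional quadratic and stands in for a real training iteration.

```python
def train_until_stop(train_step, max_iterations, tolerance=1e-6):
    """Call train_step() until the loss stops changing or the iteration cap is reached.

    train_step is any callable that runs one training iteration and returns the loss.
    Returns (iterations_run, last_loss, reason).
    """
    previous_loss = None
    loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        loss = train_step()
        # Assumed convergence test: loss changed by less than `tolerance`.
        if previous_loss is not None and abs(previous_loss - loss) < tolerance:
            return iteration, loss, "converged"
        previous_loss = loss
    return max_iterations, loss, "max_iterations"


def make_toy_step(learning_rate=0.1):
    """Toy stand-in for a training iteration: gradient descent on (x - 3)^2."""
    state = {"x": 0.0}

    def step():
        gradient = 2.0 * (state["x"] - 3.0)
        state["x"] -= learning_rate * gradient
        return (state["x"] - 3.0) ** 2

    return step


iterations, final_loss, reason = train_until_stop(make_toy_step(), max_iterations=1000)
print(iterations, final_loss, reason)
```

Whichever condition triggers first ends training, so the iteration cap acts as a safeguard when the loss keeps oscillating.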

[0011] In some embodiments, obtaining a sample dataset includes: obtaining the request text entered by each sample user based on the search and browsing logs of sample users, wherein the request text consists of one or more languages; obtaining the user browsing time corresponding to each product browsed by each sample user after entering the request text; for any request text, obtaining the products whose user browsing time exceeds a preset time threshold after entering the request text and the corresponding category of the product, and using the category corresponding to the product as the associated category of the request text; for any request text, generating an initial sample data based on the request text and one or more associated categories corresponding to the request text; processing each initial sample data to obtain processed sample data, and generating a sample dataset based on multiple sample data.

[0012] In some embodiments, each initial sample data is processed to obtain processed sample data, including: formatting each initial sample data to obtain multiple formatted sample data generated after data formatting; performing data augmentation on each formatted sample data to obtain multiple data augmented sample data generated after data augmentation; and performing data cleaning on each data augmented sample data to obtain multiple sample data generated after data cleaning.

[0013] According to a second aspect of the present disclosure, a multi-category recognition device is provided, comprising: a text acquisition module for acquiring real-time request text input by a user; a model output module for inputting the real-time request text into a target multi-category recognition model and acquiring the category prediction probability corresponding to each candidate category output by the target multi-category recognition model, wherein the target multi-category recognition model has a preset candidate category set, and the candidate category set contains multiple candidate categories; a probability sorting module for sorting all category prediction probabilities in descending order and acquiring a sorted category prediction probability sequence; and a category determination module for acquiring the N candidate categories corresponding to the first N category prediction probabilities in the category prediction probability sequence and using the N candidate categories as the target category corresponding to the real-time request text.

[0014] In some embodiments, the multi-category recognition device further includes a model training module, which is configured to: acquire a sample dataset, wherein each sample data in the sample dataset contains a request text and one or more category tags associated with the request text; acquire a preset candidate category set, which contains multiple candidate categories; train an initial multi-category recognition model based on the sample dataset and the candidate category set, and acquire the category prediction probability of each candidate category corresponding to each request text output by the initial multi-category recognition model; and iteratively train the initial multi-category recognition model based on the category prediction probability of each candidate category corresponding to each request text, combined with one or more category tags associated with the request text, to acquire a target multi-category recognition model generated after training.

[0015] In some embodiments, the initial multi-category recognition model consists of a text encoder, a category encoder, a semantic aggregation layer, and an output layer. The model training module is further configured to: input the request text of the sample dataset into the text encoder to obtain the text feature vector output by the text encoder; input the candidate category set into the category encoder to obtain the category vector output by the category encoder; aggregate the category vector and the text feature vector based on the semantic aggregation layer to obtain the aggregated feature vector; and output the category prediction probability of each candidate category corresponding to each request text after processing by the output layer based on the aggregated feature vector.

[0016] In some embodiments, the text encoder comprises a position encoder, a pre-trained sentence encoder, and a stacked encoder. The model training module is further configured to: input the request text from the sample dataset into the position encoder to obtain the position embedding vector output by the position encoder; input the request text from the sample dataset into the pre-trained sentence encoder to obtain the semantic feature vector output by the pre-trained sentence encoder, wherein the position embedding vector and the semantic feature vector have the same dimension; add the position embedding vector and the semantic feature vector to obtain the semantic-position fusion vector; and input the semantic-position fusion vector into the stacked encoder to obtain the text feature vector output by the stacked encoder after feature extraction.

[0017] In some embodiments, the model training module is further configured to: obtain the weight of each token vector in the text feature vector based on the attention mechanism; and weight the category vector based on the weight of each token vector to obtain the aggregated feature vector.

[0018] In some embodiments, the model training module is further configured to: perform linear transformation and function activation on the aggregated feature vector to obtain the category prediction probability of each candidate category corresponding to each request text.

[0019] In some embodiments, the model training module is further configured to: iteratively train the initial multi-category recognition model until the loss function of the initial multi-category recognition model converges, end the training, and obtain the target multi-category recognition model generated after training; or, iteratively train the initial multi-category recognition model until the number of training iterations of the initial multi-category recognition model reaches a preset number, end the training, and obtain the target multi-category recognition model generated after training.

[0020] In some embodiments, the model training module is further configured to: obtain the request text input by each sample user based on the search and browsing logs of the sample users, wherein the request text consists of one or more languages; obtain the user browsing time corresponding to each product browsed by each sample user after inputting the request text; for any request text, obtain the products whose user browsing time exceeds a preset time threshold after inputting the request text and the corresponding categories of the products, and use the category corresponding to the products as the associated category of the request text; for any request text, generate an initial sample data based on the request text and one or more associated categories corresponding to the request text; process each initial sample data to obtain processed sample data, and generate a sample dataset based on multiple sample data.

[0021] In some embodiments, the model training module is further configured to: format each initial sample data to obtain multiple formatted sample data after data formatting; perform data augmentation on each formatted sample data to obtain multiple data-augmented sample data after data augmentation; and perform data cleaning on each data-augmented sample data to obtain multiple sample data after data cleaning.

[0022] According to a third aspect of the present disclosure, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to implement a multi-category identification method as described in the first aspect of the present application.

[0023] According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to implement a multi-category identification method as described in the first aspect of the present application.

[0024] According to a fifth aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements a multi-category identification method as described in the first aspect of the present application.

[0025] The technical solutions provided by the embodiments of this disclosure have at least the following beneficial effects:

[0026] This application enhances the understanding of user intent through a target multi-category recognition model, providing more accurate category recognition results, thereby further improving the user shopping experience and satisfaction. Furthermore, the output of the target multi-category recognition model depends on the size of the candidate category set and can change as the candidate category set changes, reserving space for scenarios such as product delisting and relisting, category changes, and replacements.

[0027] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and are not intended to limit this disclosure.

Brief Description of the Drawings

[0028] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure, and are not intended to unduly limit this disclosure.

[0029] Figure 1 is a schematic diagram of an exemplary implementation of the multi-category recognition method of this application.

[0030] Figure 2 is a schematic diagram of a training method for the target multi-category recognition model of this application.

[0031] Figure 3 is a training framework diagram of the target multi-category recognition model of this application.

[0032] Figure 4 is a schematic diagram of a method for obtaining the category prediction probability of each candidate category corresponding to each request text.

[0033] Figure 5 is a schematic diagram of the multi-category recognition device of this application.

[0034] Figure 6 is a block diagram of an electronic device according to an exemplary embodiment.

Detailed Implementation

[0035] To enable those skilled in the art to better understand the technical solutions of this disclosure, the technical solutions in the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings.

[0036] It should be noted that the terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this disclosure described herein can be implemented in orders other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this disclosure as detailed in the appended claims.

[0037] Figure 1 is a schematic diagram of an exemplary implementation of the multi-category recognition method of this application. As shown in Figure 1, the multi-category recognition method includes the following steps:

[0038] S101, Obtain the real-time request text input by the user.

[0039] In this context, the real-time request text is the phrase the user intends to search for. For example, in an online store product search scenario, the real-time request text is the query term the user enters in the search box, such as "smart bracelet" or "a book that can make sounds."

[0040] S102, input the real-time request text into the target multi-category recognition model, and obtain the category prediction probability corresponding to each candidate category output by the target multi-category recognition model. The target multi-category recognition model has a preset candidate category set, which contains multiple candidate categories.

[0041] In the target multi-category recognition model, a candidate category set is preset. The candidate category set includes all categories corresponding to the products, such as "bracelet", "mobile phone", "digital product", "watch" etc.

[0042] The real-time request text is input into the target multi-category recognition model, and the category prediction probability corresponding to each candidate category output by the target multi-category recognition model is obtained. For example, if the candidate category set includes 100 candidate categories, after inputting the real-time request text into the target multi-category recognition model, the target multi-category recognition model will output the category prediction probability corresponding to each of these 100 candidate categories. The category prediction probability is used to represent the degree of relevance between the real-time request text and the candidate category.

[0043] S103, sort all category prediction probabilities in descending order, and obtain the sorted category prediction probability sequence.

[0044] For example, if the candidate category set includes 100 candidate categories, the target multi-category recognition model outputs, for the real-time request text, the category prediction probability corresponding to each of the 100 candidate categories. The 100 category prediction probabilities are then sorted in descending order to obtain the sorted category prediction probability sequence.

[0045] S104. Obtain the N candidate categories corresponding to the top N category prediction probabilities in the category prediction probability sequence, and use the N candidate categories as the target category corresponding to the real-time request text.

[0046] If N is set to 5, then the top 5 predicted probabilities of the ...

[0047] Furthermore, in this application, a category prediction probability threshold can also be set. For example, the category prediction probability threshold can be set to 0.5. If there is a category prediction probability less than the category prediction probability threshold among the first N category prediction probabilities, then the candidate category corresponding to the category prediction probability is filtered out, and the final remaining candidate category among the N candidate categories is taken as the target category corresponding to the real-time request text.
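
Steps S103 and S104, together with the optional probability threshold, can be sketched as follows. The function name `select_target_categories` and the probability values are illustrative assumptions; the model call itself is not shown, and the input is taken to be a mapping from candidate category to category prediction probability.

```python
def select_target_categories(probabilities, n, threshold=None):
    """Return up to n categories with the highest prediction probabilities.

    probabilities: dict mapping candidate category -> category prediction probability.
    threshold: optional cut-off; top-N entries below it are filtered out.
    """
    # S103: sort all category prediction probabilities in descending order.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    # S104: keep the categories behind the first N probabilities.
    top_n = ranked[:n]
    # Optional filter: drop any of the top N that fall below the threshold.
    if threshold is not None:
        top_n = [(category, p) for category, p in top_n if p >= threshold]
    return [category for category, _ in top_n]


# Made-up probabilities for a query such as "smart bracelet".
example = {"bracelet": 0.91, "watch": 0.62, "digital product": 0.40, "mobile phone": 0.05}
print(select_target_categories(example, n=3))
# ['bracelet', 'watch', 'digital product']
print(select_target_categories(example, n=3, threshold=0.5))
# ['bracelet', 'watch']
```

With a threshold, fewer than N categories may be returned, which matches the filtering behavior described above.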

[0048] This application proposes a multi-category recognition method. It involves acquiring real-time request text input by a user; inputting the real-time request text into a target multi-category recognition model; obtaining the category prediction probability corresponding to each candidate category output by the target multi-category recognition model; wherein the target multi-category recognition model has a preset candidate category set containing multiple candidate categories; sorting all category prediction probabilities in descending order to obtain a sorted category prediction probability sequence; obtaining the N candidate categories corresponding to the top N category prediction probabilities in the category prediction probability sequence, and using these N candidate categories as the target category corresponding to the real-time request text. This application enhances the understanding of user intent through a target multi-category recognition model, providing more accurate category recognition results, thereby further improving the user shopping experience and satisfaction. Furthermore, the output of the target multi-category recognition model depends on the size of the candidate category set and can change with the candidate category set, reserving space for scenarios such as product delisting / removal, category changes, and replacements.

[0049] Figure 2 is a schematic diagram of a training method for the target multi-category recognition model of this application. As shown in Figure 2, the training method for the target multi-category recognition model includes the following steps:

[0050] S201, Obtain a sample dataset, wherein each sample data in the sample dataset contains a request text and one or more category tags associated with the request text.

[0051] In this application, the request text entered by each sample user is obtained based on their search and browsing logs. The request text consists of one or more languages. Furthermore, this application mines user click and browsing behavior from the search and browsing logs to construct the relationship between clicked and browsed products and the request text, thereby obtaining the dataset for model training and avoiding the time-consuming problem of manual annotation.

[0052] For example, the request text may consist of only one language, or of multiple languages from multiple regions, such as auriculares (Spanish), montre (French), monopattino (Italian), handyhülle (German), cihaz (Turkish), etc.

[0053] After each sample user inputs a request text, the system obtains the user's browsing time for each product. For any given request text, it identifies the products whose browsing time exceeds a preset threshold and their corresponding categories. The category corresponding to these products is then used as the associated category for the request text. For example, if user 1 inputs "band," clicks on the product "mi band 5," and stays on it for more than the preset threshold, the category "wristband" is assigned to "band" as one of the categories. If user 2 inputs "band," clicks on the product "miband charger," and stays on it for more than the preset threshold, the category "wearable accessories" is assigned to "band" as one of the categories. In other words, the associated categories for the request text "band" include "wristband" and "wearable accessories."

[0054] For any request text, an initial sample data is generated based on the request text and one or more associated categories corresponding to the request text. The initial sample data can be represented as <request text, category 1, category 2, ...>.
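
The label-mining step described above can be sketched as follows, assuming each log record has already been reduced to a request text, the category of the browsed product, and a browsing time in seconds. The field names (`query`, `category`, `dwell_seconds`), the function name, and the numeric values are hypothetical; the disclosure does not specify a log schema or a threshold value.

```python
from collections import defaultdict


def build_initial_samples(log_records, dwell_threshold_seconds):
    """Build <request text, category 1, category 2, ...> tuples from browsing logs."""
    associated = defaultdict(list)
    for record in log_records:
        # Keep only products browsed for longer than the preset time threshold.
        if record["dwell_seconds"] > dwell_threshold_seconds:
            categories = associated[record["query"]]
            if record["category"] not in categories:
                categories.append(record["category"])
    return [(query, *categories) for query, categories in associated.items()]


# Illustrative records mirroring the "band" example; dwell times are invented.
logs = [
    {"query": "band", "product": "mi band 5", "category": "wristband", "dwell_seconds": 45},
    {"query": "band", "product": "miband charger", "category": "wearable accessories", "dwell_seconds": 30},
    {"query": "band", "product": "phone case", "category": "phone accessories", "dwell_seconds": 2},
]
print(build_initial_samples(logs, dwell_threshold_seconds=10))
# [('band', 'wristband', 'wearable accessories')]
```

The short visit to the phone case falls below the threshold, so its category is not attached to "band".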

[0055] After obtaining a large amount of initial sample data, each initial sample data needs to be processed to obtain processed sample data, and a sample dataset is generated based on multiple sample data.

[0056] Specifically, each initial sample data is processed to obtain processed sample data, including data formatting, data augmentation, and data cleaning, which will be described in turn below.

[0057] First, when processing each initial sample data, it is necessary to format each initial sample data, for example, by using methods such as converting uppercase to lowercase, filtering punctuation marks (keeping punctuation marks that are strongly related to the product, such as +), and replacing escape characters, to obtain multiple formatted sample data after data formatting.
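
A minimal sketch of the formatting step, assuming that "replacing escape characters" refers to HTML entities such as `&amp;` and that `+` is the only punctuation mark worth keeping. Both are assumptions; the function name `format_request_text` and the `keep` parameter are hypothetical.

```python
import html
import re


def format_request_text(text, keep="+"):
    """Lowercase, unescape, and strip punctuation except the characters in `keep`."""
    # Replace escape sequences (assumed here to be HTML entities).
    text = html.unescape(text)
    # Convert uppercase to lowercase.
    text = text.lower()
    # Filter punctuation; \w is Unicode-aware, so letters such as "ü" survive.
    pattern = r"[^\w\s" + re.escape(keep) + r"]"
    text = re.sub(pattern, " ", text)
    # Collapse the whitespace left behind by removed characters.
    return re.sub(r"\s+", " ", text).strip()


print(format_request_text("Redmi Note 12 Pro+ &amp; Case!!"))
# redmi note 12 pro+ case
```

Keeping `+` matters for product names where it distinguishes models, which is why it is excluded from punctuation filtering.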

[0058] Next, data augmentation needs to be performed on each formatted sample data to obtain multiple augmented sample data to enrich the training dataset, increase the diversity of request texts, and make the model more robust to unseen request texts.

[0059] Optionally, when augmenting each formatted sample data, the product title and keywords can be used as request text to provide more diverse and richer request text, thereby increasing the model's ability to recognize different product categories.

[0060] Optionally, when performing data augmentation on each formatted sample data, artificial noise variants can be used to augment the pattern of the request text: first, the request text is segmented, and then noise is added according to the following strategy:

[0061] 1. Randomly delete unimportant words from the request text, such as product model, modifiers (smart, handheld), etc.

[0062] 2. Randomly swap the order of adjacent words in the request text, for example: "band strap" becomes "strap band".

[0063] 3. Randomly repeat some words in the request text, for example: "band 5" becomes "band band 5" after repetition.
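
The three noise strategies above can be sketched as follows, assuming the request text is segmented on whitespace and that the set of "unimportant" words is supplied by the caller. The function names and the `unimportant` set are hypothetical; how the disclosure decides which words are unimportant is not stated.

```python
import random


def delete_unimportant(tokens, unimportant, rng):
    """Strategy 1: randomly delete one word that appears in the `unimportant` set."""
    candidates = [i for i, token in enumerate(tokens) if token in unimportant]
    if not candidates or len(tokens) < 2:
        return list(tokens)
    drop = rng.choice(candidates)
    return [token for i, token in enumerate(tokens) if i != drop]


def swap_adjacent(tokens, rng):
    """Strategy 2: randomly swap one pair of adjacent words."""
    out = list(tokens)
    if len(out) < 2:
        return out
    i = rng.randrange(len(out) - 1)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def repeat_word(tokens, rng):
    """Strategy 3: randomly repeat one word in place."""
    out = list(tokens)
    if not out:
        return out
    i = rng.randrange(len(out))
    return out[: i + 1] + out[i:]


rng = random.Random(0)  # seeded only so that runs are repeatable
print(" ".join(delete_unimportant("smart band".split(), {"smart", "handheld"}, rng)))
print(" ".join(swap_adjacent("band strap".split(), rng)))
print(" ".join(repeat_word("band 5".split(), rng)))
```

Each function returns a new token list, so several noisy variants can be generated from the same formatted request text and paired with its original category tags.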

[0064] Optionally, when augmenting each formatted sample data, translation tools can be used to translate the request text into other languages and then back into the local language. This can introduce new semantics and expressions; for example, "reloj" in Spanish can be translated into "horloge" in French and then back into "mirar" in Spanish. This augmentation strategy can help the model learn category associations between different languages and improve the model's generalization ability in multilingual environments.
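
Back-translation augmentation can be sketched as follows. The disclosure does not name a translation tool, so the translator is passed in as a callable with the hypothetical signature `translate(text, source_lang, target_lang)`; the lookup table below is a stub that only reproduces the "reloj" example and is not a real translation service. The category "watch" attached to the sample is likewise illustrative.

```python
def back_translate(text, source_lang, pivot_lang, translate):
    """Translate into a pivot language and back into the source language."""
    pivot_text = translate(text, source_lang, pivot_lang)
    return translate(pivot_text, pivot_lang, source_lang)


def augment_with_back_translation(sample, source_lang, pivot_lang, translate):
    """Return the original sample plus a back-translated variant with the same categories."""
    query, *categories = sample
    variant = back_translate(query, source_lang, pivot_lang, translate)
    if variant == query:
        return [sample]  # nothing new was introduced
    return [sample, (variant, *categories)]


# Stub translator: a fixed lookup standing in for a real translation tool.
STUB_TABLE = {
    ("reloj", "es", "fr"): "horloge",
    ("horloge", "fr", "es"): "mirar",
}


def stub_translate(text, source_lang, target_lang):
    return STUB_TABLE.get((text, source_lang, target_lang), text)


print(augment_with_back_translation(("reloj", "watch"), "es", "fr", stub_translate))
# [('reloj', 'watch'), ('mirar', 'watch')]
```

The variant inherits the category tags of the original request text, which is how a new surface form becomes associated with an existing category.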

[0065] Finally, data cleaning is performed on each data-augmented sample data to obtain multiple sample data generated after data cleaning.

[0066] Optionally, in this application, the associated request texts of clicks on the same product can be aggregated to obtain a sample dataset belonging to the same major category, thereby improving the generalization ability of the target multi-category recognition model.

[0067] S202, obtain the preset candidate category set, which contains multiple candidate categories.

[0068] In the target multi-category recognition model, a candidate category set is preset. The candidate category set includes all categories corresponding to the products, such as "bracelet", "mobile phone", "digital product", "watch" etc.

[0069] S203, the initial multi-category recognition model is trained based on the sample dataset and the candidate category set, and the category prediction probability of each candidate category corresponding to each request text output by the initial multi-category recognition model is obtained.

[0070] Figure 3 is a training framework diagram of the target multi-category recognition model shown in this application. As shown in Figure 3, the initial multi-category recognition model consists of a text encoder, a category encoder, a semantic aggregation layer, and an output layer.

[0071] Figure 4 is a schematic diagram, shown in this application, of obtaining the category prediction probability of each candidate category corresponding to each request text. As shown in Figure 4, obtaining the category prediction probability of each candidate category corresponding to each request text includes the following steps:

[0072] S2031, Input the request text of the sample dataset into the text encoder and obtain the text feature vector output by the text encoder.

[0073] The text encoder consists of a position encoder, a pre-trained sentence encoder (Universal Sentence Encoder, USE), and a stacked encoder.

[0074] In this application, the request text in the sample dataset is input into the position encoder to obtain the position embedding vector output by the position encoder. The position embedding vector represents the relative position information between words in the request text.

[0075] The request text from the sample dataset is input into a pre-trained sentence encoder to obtain the semantic feature vector output by the pre-trained sentence encoder. The position embedding vector has the same dimension as the semantic feature vector. If the semantic feature vector output by the pre-trained sentence encoder has a different dimension than the position embedding vector, a feed-forward network (FFN) can be added after the pre-trained sentence encoder to convert the semantic feature vector to the same dimension as the position embedding vector. The position embedding vector and the semantic feature vector are then added to obtain the semantic-position fusion vector.
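The dimension check and addition described above can be sketched as follows. This is an illustrative sketch only: a single linear layer stands in for the feed-forward network, vectors are plain Python lists, and the names `linear` and `fuse` are hypothetical.

```python
def linear(vec, weight, bias):
    """Single linear layer; `weight` has one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def fuse(position_vec, semantic_vec, weight=None, bias=None):
    """Add the position embedding vector and the semantic feature vector.

    If the dimensions differ, the semantic vector is first projected to
    the dimension of the position embedding vector.
    """
    if len(semantic_vec) != len(position_vec):
        if weight is None or bias is None:
            raise ValueError("projection parameters are required when dimensions differ")
        semantic_vec = linear(semantic_vec, weight, bias)
    if len(semantic_vec) != len(position_vec):
        raise ValueError("projection did not produce the position dimension")
    return [p + s for p, s in zip(position_vec, semantic_vec)]
```

For example, `fuse([1.0, 2.0], [0.5, 0.5])` adds two vectors of equal dimension directly, while a three-dimensional semantic vector would need a 2x3 `weight` and a two-element `bias` before the addition.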

[0076] The semantic-position fusion vector is input into the stacked encoder to obtain the text feature vector output after feature extraction by the stacked encoder. The stacked encoder is composed of multiple layers of self-attention mechanisms, and each layer has its own parameters.

[0077] S2032, input the candidate category set into the category encoder and obtain the category vector output by the category encoder.

[0078] Given the relatively short text length of candidate categories, this application uses a recurrent neural network as the category encoder in the model to encode the category text and obtain the category vector of the candidate category set. The recurrent neural network can use network structures such as Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU).

[0079] S2033, based on the semantic aggregation layer, aggregates the category vector and the text feature vector to obtain the aggregated feature vector generated after aggregation.

[0080] In this application, an attention mechanism is used to highlight tokens in the request text that are strongly related to the category semantics, while suppressing irrelevant tokens. This means that tokens in the request text that are beneficial to category identification are preserved to the greatest extent. Then, the weight of each token vector in the text feature vector is obtained based on the attention mechanism. The weight of each token vector is combined with the category vector to obtain an aggregated feature vector, so that the semantic information of the category vector is integrated into the text feature vector, allowing the model to learn that requests with similar semantics are more likely to be identified as the same category.
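One plausible reading of this aggregation step is dot-product attention with the category vector acting as the query over the token vectors. The sketch below assumes that reading; the text does not specify the scoring function or how the weights and the category vector are combined, so `aggregate` and its helpers are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate(token_vecs, category_vec):
    """Return (token weights, aggregated feature vector) for one category.

    Tokens that align with the category vector get larger weights, so
    they dominate the weighted sum; unrelated tokens are suppressed.
    """
    weights = softmax([dot(t, category_vec) for t in token_vecs])
    dim = len(category_vec)
    aggregated = [sum(w * t[d] for w, t in zip(weights, token_vecs))
                  for d in range(dim)]
    return weights, aggregated
```

Running `aggregate` once per candidate category yields one aggregated feature vector per category, which matches an output layer that scores every candidate category for each request text.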

[0081] S2034, based on aggregated feature vectors, outputs the category prediction probability of each candidate category corresponding to each request text after processing by the output layer.

[0082] The aggregated feature vector is subjected to linear transformation and activation by the sigmoid activation function to obtain the category prediction probability of each candidate category for each request text.
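The output layer can be sketched as a linear transformation followed by a sigmoid. This is a minimal sketch under assumptions: one shared `weight` vector and `bias` are applied to the aggregated feature vector of each category, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def category_probability(aggregated_vec, weight, bias):
    """Linear transformation followed by sigmoid activation."""
    return sigmoid(sum(w * x for w, x in zip(weight, aggregated_vec)) + bias)

def predict(aggregated_by_category, weight, bias):
    """Map {category: aggregated vector} to {category: probability}."""
    return {category: category_probability(vec, weight, bias)
            for category, vec in aggregated_by_category.items()}
```

Because sigmoid scores each category independently, the probabilities need not sum to 1, which suits request texts that carry more than one category tag.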

[0083] S204. Based on the category prediction probability of each candidate category corresponding to each request text, and combined with one or more category tags associated with the request text, the initial multi-category recognition model is iteratively trained to obtain the target multi-category recognition model generated after training.

[0084] As one feasible approach, based on the category prediction probability of each candidate category corresponding to each request text, and combined with one or more category tags associated with the request text, the initial multi-category recognition model is iteratively trained until the loss function of the initial multi-category recognition model converges, at which point training ends, and the target multi-category recognition model generated after training is obtained. The loss function can be cross-entropy loss.

[0085] As another possible approach, based on the category prediction probability of each candidate category corresponding to each request text, and combined with one or more category tags associated with the request text, the initial multi-category recognition model is iteratively trained until the training times of the initial multi-category recognition model reach a preset number, then the training ends, and the target multi-category recognition model generated after training is obtained.
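The loss and the two stopping rules can be sketched as follows. The text only says the loss can be cross-entropy; the per-category binary form used here is an assumption that fits sigmoid outputs with multiple tags, and the tolerance `tol`, the `patience` window, and the function names are hypothetical.

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over all candidate categories.

    probs:  {category: predicted probability}
    labels: set of category tags associated with the request text
    """
    total = 0.0
    for category, p in probs.items():
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        y = 1.0 if category in labels else 0.0
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)

def should_stop(loss_history, max_steps, tol=1e-4, patience=3):
    """Stop when the loss has flattened or the preset step count is reached."""
    if len(loss_history) >= max_steps:
        return True   # preset number of training iterations reached
    if len(loss_history) <= patience:
        return False  # too few observations to judge convergence
    recent = loss_history[-(patience + 1):]
    return all(abs(a - b) < tol for a, b in zip(recent, recent[1:]))
```

A training loop would append the loss of each iteration to `loss_history` and exit once `should_stop` returns true.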

[0086] Furthermore, during the training process, model files from multiple stages are saved. Based on a test set different from the training set, the best-performing model is selected from these model files. The best-performing model is the one with the highest recognition accuracy, and it is used as the target multi-category recognition model.
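Selecting the best saved model can be sketched as below. The text does not define how recognition accuracy is computed; this sketch assumes top-1 accuracy (the highest-probability category must be one of the tagged categories) and represents each saved model as a hypothetical callable from request text to a probability dictionary.

```python
def top1_accuracy(predict, test_set):
    """Share of test requests whose top category is among the tagged ones.

    predict:  callable, request text -> {category: probability}
    test_set: list of (request text, set of category tags)
    """
    hits = 0
    for text, labels in test_set:
        probs = predict(text)
        best = max(probs, key=probs.get)
        hits += best in labels
    return hits / len(test_set)

def select_best(checkpoints, test_set):
    """Return the name of the saved model with the highest accuracy."""
    return max(checkpoints,
               key=lambda name: top1_accuracy(checkpoints[name], test_set))
```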

[0087] In this embodiment, a multilingual request text sample dataset is constructed, and a pre-trained sentence encoder is used in the model. This fully utilizes the semantic similarity of similar products in multiple regions to achieve data complementarity. It also provides an improvement method for the few-shot problem caused by the severe lack of data in the early stage of a newly established e-commerce website. That is, it uses data with similar semantics in other regions to improve the problem of insufficient corpus. At the same time, text feature vectors are extracted by the text encoder, category vectors are extracted by the category encoder, and then the category vectors are aggregated into the text feature vectors based on the semantic aggregation layer. This allows the final output layer to output the category prediction probability of each candidate category corresponding to each request text. The initial multi-category recognition model is iteratively trained to obtain a more accurate target multi-category recognition model for category recognition.

[0088] Figure 5 is a schematic diagram of a multi-category identification device shown in this application. As shown in Figure 5, the multi-category recognition device 500 includes a text acquisition module 501, a model output module 502, a probability sorting module 503, and a category determination module 504, wherein:

[0089] The text acquisition module 501 is used to acquire the real-time request text input by the user.

[0090] The model output module 502 is used to input the real-time request text into the target multi-category recognition model and obtain the category prediction probability corresponding to each candidate category output by the target multi-category recognition model. The target multi-category recognition model has a preset set of candidate categories, which contains multiple candidate categories.

[0091] The probability sorting module 503 is used to sort all category prediction probabilities in descending order and obtain the sorted category prediction probability sequence.

[0092] The category determination module 504 is used to obtain the N candidate categories corresponding to the top N category prediction probabilities in the category prediction probability sequence, and use the N candidate categories as the target category corresponding to the real-time request text.
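The sorting and selection performed by modules 503 and 504 can be sketched in a few lines. The function name and the dictionary representation of the model output are assumptions made for illustration.

```python
def top_n_categories(probs, n):
    """Sort category probabilities in descending order and keep the top N.

    probs: {candidate category: category prediction probability}
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [category for category, _ in ranked[:n]]
```

For example, with probabilities of 0.9 for "watch", 0.7 for "bracelet", and 0.2 for "mobile phone", calling the function with `n=2` returns "watch" followed by "bracelet". If the candidate category set has fewer than N entries, all categories are returned.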

[0093] This device enhances the understanding of user intent through a target multi-category recognition model, providing more accurate category recognition results, thereby further improving the user shopping experience and satisfaction. Furthermore, the output of the target multi-category recognition model depends on the size of the candidate category set and can change as the candidate category set changes, reserving space for scenarios such as product delisting and relisting, category changes, and replacements.

[0094] In some embodiments, the multi-category recognition device 500 further includes a model training module 505, which is configured to: acquire a sample dataset, wherein each sample data in the sample dataset contains a request text and one or more category tags associated with the request text; acquire a preset candidate category set, which contains multiple candidate categories; train an initial multi-category recognition model based on the sample dataset and the candidate category set, and acquire the category prediction probability of each candidate category corresponding to each request text output by the initial multi-category recognition model; and iteratively train the initial multi-category recognition model based on the category prediction probability of each candidate category corresponding to each request text, combined with one or more category tags associated with the request text, to acquire a target multi-category recognition model generated after training.

[0095] In some embodiments, the initial multi-category recognition model consists of a text encoder, a category encoder, a semantic aggregation layer, and an output layer. The model training module 505 is further configured to: input the request text of the sample dataset into the text encoder to obtain the text feature vector output by the text encoder; input the candidate category set into the category encoder to obtain the category vector output by the category encoder; aggregate the category vector and the text feature vector based on the semantic aggregation layer to obtain the aggregated feature vector; and output the category prediction probability of each candidate category corresponding to each request text after processing by the output layer based on the aggregated feature vector.

[0096] In some embodiments, the text encoder comprises a position encoder, a pre-trained sentence encoder, and a stacked encoder. The model training module 505 is further configured to: input the request text from the sample dataset into the position encoder to obtain the position embedding vector output by the position encoder; input the request text from the sample dataset into the pre-trained sentence encoder to obtain the semantic feature vector output by the pre-trained sentence encoder, wherein the position embedding vector and the semantic feature vector have the same dimension; add the position embedding vector and the semantic feature vector to obtain the semantic position fusion vector obtained by the addition; and input the semantic position fusion vector into the stacked encoder to obtain the text feature vector output by the stacked encoder after feature extraction.

[0097] In some embodiments, the model training module 505 is further configured to: obtain the weight of each token vector in the text feature vector based on the attention mechanism; and combine the weight of each token vector with the category vector to obtain the aggregated feature vector.

[0098] In some embodiments, the model training module 505 is further configured to: perform linear transformation and function activation on the aggregated feature vector to obtain the category prediction probability of each candidate category corresponding to each request text.

[0099] In some embodiments, the model training module 505 is further configured to: iteratively train the initial multi-category recognition model until the loss function of the initial multi-category recognition model converges, end the training, and obtain the target multi-category recognition model generated after training; or, iteratively train the initial multi-category recognition model until the number of training iterations of the initial multi-category recognition model reaches a preset number, end the training, and obtain the target multi-category recognition model generated after training.

[0100] In some embodiments, the model training module 505 is further configured to: obtain the request text input by each sample user based on the search and browsing logs of the sample users, wherein the request text consists of one or more languages; obtain the user browsing time corresponding to each product browsed by each sample user after inputting the request text; for any request text, obtain the products whose user browsing time exceeds a preset time threshold after inputting the request text and the corresponding categories of the products, and use the category corresponding to the products as the associated category of the request text; for any request text, generate an initial sample data based on the request text and one or more associated categories corresponding to the request text; process each initial sample data to obtain processed sample data, and generate a sample dataset based on multiple sample data.
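The construction of initial sample data from logs can be sketched as follows. The log row layout `(request_text, product_category, browse_seconds)`, the output field names, and the function name are assumptions; the text does not specify a log format.

```python
from collections import defaultdict

def build_initial_samples(log_rows, min_seconds):
    """Build one initial sample per request text from browsing logs.

    log_rows:    iterable of (request_text, product_category, browse_seconds)
    min_seconds: preset browsing-time threshold
    """
    categories = defaultdict(set)
    for text, category, seconds in log_rows:
        if seconds > min_seconds:  # keep only products browsed long enough
            categories[text].add(category)
    return [{"request_text": text, "categories": sorted(cats)}
            for text, cats in categories.items()]
```

Request texts with no product above the threshold produce no sample, so short accidental clicks do not become category tags.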

[0101] In some embodiments, the model training module 505 is further configured to: format each initial sample data to obtain multiple formatted sample data generated after data formatting; perform data augmentation on each formatted sample data to obtain multiple data augmented sample data generated after data augmentation; and perform data cleaning on each data augmented sample data to obtain multiple sample data generated after data cleaning.

[0102] Figure 6 is a block diagram illustrating an electronic device 600 according to an exemplary embodiment.

[0103] As shown in Figure 6, the electronic device 600 includes:

[0104] A memory 601, a processor 602, and a bus 603 that connects the different components (including the memory 601 and the processor 602). The memory 601 stores a computer program, and when the processor 602 executes the program, it implements the multi-category identification method according to an embodiment of the present disclosure.

[0105] Bus 603 represents one or more of several bus architectures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of the various bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

[0106] Electronic device 600 typically includes a variety of electronic device readable media. These media can be any available media that can be accessed by electronic device 600, including volatile and non-volatile media, removable and non-removable media.

[0107] Memory 601 may also include computer system readable media in the form of volatile memory, such as random access memory (RAM) 604 and/or cache memory 605. Electronic device 600 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 606 can be used to read and write non-removable, non-volatile magnetic media (not shown in Figure 6; usually referred to as a "hard drive"). Although not shown in Figure 6, a disk drive for reading and writing to a removable non-volatile disk (e.g., a "floppy disk") and an optical disk drive for reading and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 603 via one or more data media interfaces. Memory 601 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of this disclosure.

[0108] A program / utility 608 having a set (at least one) of program modules 607 may be stored, for example, in memory 601. Such program modules 607 include—but are not limited to—an operating system, one or more application programs, other program modules, and program data. Each or some combination of these examples may include an implementation of a network environment. Program modules 607 typically perform the functions and / or methods described in the embodiments of this disclosure.

[0109] Electronic device 600 can also communicate with one or more external devices 609 (e.g., keyboard, pointing device, display 610, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device that enables the electronic device 600 to communicate with one or more other computing devices (e.g., network card, modem, etc.). This communication can be performed through input/output (I/O) interface 611. Furthermore, electronic device 600 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and/or public networks, such as the Internet) through network adapter 612. As shown in Figure 6, network adapter 612 communicates with other modules of electronic device 600 via bus 603. It should be understood that, although not shown in Figure 6, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

[0110] The processor 602 executes various functional applications and data processing by running programs stored in the memory 601.

[0111] It should be noted that the implementation process and technical principles of the electronic device in this embodiment are explained in the foregoing description of a multi-category identification method according to an embodiment of this disclosure, and will not be repeated here.

[0112] To implement the above embodiments, this application also proposes a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to implement the multi-category identification method shown in the above embodiments. Optionally, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

[0113] To implement the above embodiments, this application also proposes a computer program product, including a computer program that, when executed by a processor, implements a multi-category identification method as shown in the above embodiments.

[0114] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the following claims.

[0115] It should be understood that this disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of this disclosure is limited only by the appended claims.

Claims

1. A multi-category identification method, characterized in that it comprises: obtaining a real-time request text input by a user; inputting the real-time request text into a target multi-category recognition model, and obtaining the category prediction probability corresponding to each candidate category output by the target multi-category recognition model, wherein the target multi-category recognition model has a preset candidate category set, which contains multiple candidate categories; sorting all the category prediction probabilities in descending order to obtain the sorted category prediction probability sequence; obtaining the N candidate categories corresponding to the top N category prediction probabilities in the category prediction probability sequence, and using the N candidate categories as the target categories corresponding to the real-time request text; wherein obtaining the sample dataset used to train the target multi-category recognition model comprises: based on the search and browsing logs of sample users, obtaining the request text entered by each sample user, wherein the request text consists of one or more languages; obtaining the user browsing time corresponding to each product browsed by each sample user after inputting the request text; for any request text, obtaining the products whose user browsing time exceeds a preset time threshold after the request text is input, as well as the categories corresponding to the products, and using the categories corresponding to the products as the associated categories of the request text; for any request text, generating an initial sample data item based on the request text and the one or more associated categories corresponding to the request text; and processing each initial sample data item to obtain processed sample data, and generating the sample dataset based on multiple sample data.

2. The method according to claim 1, characterized in that the training method for the target multi-category recognition model includes: obtaining a sample dataset, wherein each sample data in the sample dataset contains a request text and one or more category tags associated with the request text; obtaining a preset candidate category set, wherein the candidate category set contains multiple candidate categories; training the initial multi-category recognition model based on the sample dataset and the candidate category set, and obtaining the category prediction probability of each candidate category corresponding to each request text output by the initial multi-category recognition model; and, based on the category prediction probability of each candidate category corresponding to each request text, and combined with the one or more category tags associated with the request text, iteratively training the initial multi-category recognition model to obtain the target multi-category recognition model generated after training.

3. The method according to claim 2, characterized in that, The initial multi-category recognition model consists of a text encoder, a category encoder, a semantic aggregation layer, and an output layer. The step of training the initial multi-category recognition model based on the sample dataset and the candidate category set, and obtaining the category prediction probability for each candidate category corresponding to each request text output by the initial multi-category recognition model, includes: Input the request text of the sample dataset into the text encoder to obtain the text feature vector output by the text encoder; Input the candidate category set into the category encoder to obtain the category vector output by the category encoder; The category vector and the text feature vector are aggregated based on the semantic aggregation layer to obtain the aggregated feature vector. Based on the aggregated feature vector, the output layer processes the data and outputs the predicted category probability for each candidate category corresponding to each request text.

4. The method according to claim 3, characterized in that the text encoder consists of a position encoder, a pre-trained sentence encoder, and a stacked encoder, and the step of inputting the request text from the sample dataset into the text encoder and obtaining the text feature vector output by the text encoder includes: inputting the request text from the sample dataset into the position encoder to obtain the position embedding vector output by the position encoder; inputting the request text from the sample dataset into the pre-trained sentence encoder to obtain the semantic feature vector output by the pre-trained sentence encoder, wherein the position embedding vector has the same dimension as the semantic feature vector; adding the position embedding vector to the semantic feature vector to obtain the semantic-position fusion vector; and inputting the semantic-position fusion vector into the stacked encoder to obtain the text feature vector output after feature extraction by the stacked encoder.

5. The method according to claim 3, characterized in that, The step of aggregating the category vector and the text feature vector based on the semantic aggregation layer to obtain the aggregated feature vector includes: The weight of each token vector in the text feature vector is obtained based on the attention mechanism; The aggregated feature vector is obtained by weighting each token vector with the category vector.

6. The method according to claim 3, characterized in that, The step of outputting the category prediction probability for each candidate category corresponding to each request text after processing by the output layer based on the aggregated feature vector includes: The aggregated feature vector is subjected to linear transformation and function activation to obtain the category prediction probability of each candidate category corresponding to each request text.

7. The method according to any one of claims 2-6, characterized in that, The step of iteratively training the initial multi-category recognition model to obtain the target multi-category recognition model generated after training includes: The initial multi-category recognition model is iteratively trained until its loss function converges, at which point training ends, and the target multi-category recognition model generated after training is obtained; or, The initial multi-category recognition model is iteratively trained until the initial multi-category recognition model has been trained a preset number of times, then the training ends and the target multi-category recognition model generated after training is obtained.

8. The method according to claim 1, characterized in that, The process of processing each initial sample data to obtain processed sample data includes: Each initial sample data item is formatted to obtain multiple formatted sample data items after data formatting; Data augmentation is performed on each of the formatted sample data to obtain multiple data-augmented sample data. Each data augmentation sample is cleaned to obtain multiple data samples generated after data cleaning.

9. A multi-category identification device, characterized in that it comprises: a text acquisition module, used to acquire the real-time request text input by the user; a model output module, used to input the real-time request text into the target multi-category recognition model and obtain the category prediction probability corresponding to each candidate category output by the target multi-category recognition model, wherein the target multi-category recognition model has a preset candidate category set, which contains multiple candidate categories; a probability sorting module, used to sort all the category prediction probabilities in descending order and obtain the sorted category prediction probability sequence; a category determination module, used to obtain the N candidate categories corresponding to the top N category prediction probabilities in the category prediction probability sequence, and use the N candidate categories as the target categories corresponding to the real-time request text; and a model training module, used for: based on the search and browsing logs of sample users, obtaining the request text entered by each sample user, wherein the request text consists of one or more languages; obtaining the user browsing time corresponding to each product browsed by each sample user after inputting the request text; for any request text, obtaining the products whose user browsing time exceeds a preset time threshold after the request text is input, as well as the categories corresponding to the products, and using the categories corresponding to the products as the associated categories of the request text; for any request text, generating an initial sample data item based on the request text and the one or more associated categories corresponding to the request text; and processing each initial sample data item to obtain processed sample data, and generating a sample dataset based on multiple sample data.

10. The apparatus according to claim 9, characterized in that the device further includes a model training module, the model training module being used for: obtaining a sample dataset, wherein each sample data in the sample dataset contains a request text and one or more category tags associated with the request text; obtaining a preset candidate category set, wherein the candidate category set contains multiple candidate categories; training the initial multi-category recognition model based on the sample dataset and the candidate category set, and obtaining the category prediction probability of each candidate category corresponding to each request text output by the initial multi-category recognition model; and, based on the category prediction probability of each candidate category corresponding to each request text, and combined with the one or more category tags associated with the request text, iteratively training the initial multi-category recognition model to obtain the target multi-category recognition model generated after training.

11. The apparatus according to claim 10, characterized in that, The initial multi-category recognition model consists of a text encoder, a category encoder, a semantic aggregation layer, and an output layer. The model training module is further used for: Input the request text of the sample dataset into the text encoder to obtain the text feature vector output by the text encoder; Input the candidate category set into the category encoder to obtain the category vector output by the category encoder; The category vector and the text feature vector are aggregated based on the semantic aggregation layer to obtain the aggregated feature vector. Based on the aggregated feature vector, the output layer processes the data and outputs the predicted category probability for each candidate category corresponding to each request text.

12. The apparatus according to claim 11, characterized in that the text encoder consists of a position encoder, a pre-trained sentence encoder, and a stacked encoder, and the model training module is also used for: inputting the request text from the sample dataset into the position encoder to obtain the position embedding vector output by the position encoder; inputting the request text from the sample dataset into the pre-trained sentence encoder to obtain the semantic feature vector output by the pre-trained sentence encoder, wherein the position embedding vector has the same dimension as the semantic feature vector; adding the position embedding vector to the semantic feature vector to obtain the semantic-position fusion vector; and inputting the semantic-position fusion vector into the stacked encoder to obtain the text feature vector output after feature extraction by the stacked encoder.

13. The apparatus according to claim 12, characterized in that the model training module is further used for: obtaining the weight of each token vector in the text feature vector based on an attention mechanism; and weighting each token vector with the category vector to obtain the aggregated feature vector.
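One possible reading of this aggregation is sketched below: each token vector is scored against the category vector by a dot product, the scores are normalized with a softmax to give attention weights, and the aggregated feature vector is the weighted sum of the token vectors. The scoring function and the softmax are assumptions; the claim states only that an attention mechanism is used.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate(token_vectors, category_vector):
    """Return (attention weights, weighted sum of token vectors)."""
    weights = softmax([dot(t, category_vector) for t in token_vectors])
    dim = len(token_vectors[0])
    pooled = [sum(w * t[i] for w, t in zip(weights, token_vectors))
              for i in range(dim)]
    return weights, pooled

# Made-up token vectors (text feature vector) and one category vector.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
category = [2.0, 0.0]
weights, pooled = aggregate(tokens, category)
```

Tokens that align with the category vector receive larger weights, so the pooled vector emphasizes the parts of the request text most relevant to that category.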

14. The apparatus according to claim 12, characterized in that the model training module is further used for: applying a linear transformation and an activation function to the aggregated feature vector to obtain the category prediction probability of each candidate category for each request text.
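A minimal sketch of such an output layer follows, assuming a sigmoid activation so that each candidate category receives an independent probability. The claim does not name the activation function, and the weight matrix, biases, and input vector are made-up values.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def output_layer(aggregated, weight_rows, biases):
    """Linear transformation (one weight row per candidate category), then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, aggregated)) + b)
            for row, b in zip(weight_rows, biases)]

aggregated = [1.0, -1.0]                    # hypothetical aggregated feature vector
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]    # three candidate categories
b = [0.0, 0.0, 0.0]
probs = output_layer(aggregated, W, b)      # one probability per candidate category
```

With a sigmoid, the probabilities need not sum to one, which suits requests that belong to several categories at once.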

15. The apparatus according to any one of claims 10-14, characterized in that the model training module is further used for: iteratively training the initial multi-category recognition model until the loss function of the initial multi-category recognition model converges, then ending the training and obtaining the target multi-category recognition model generated after training; or iteratively training the initial multi-category recognition model until the initial multi-category recognition model has been trained a preset number of times, then ending the training and obtaining the target multi-category recognition model generated after training.
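The two stopping conditions can be sketched as a single loop. The claim presents them as alternatives; this sketch checks both and reports which one ended training. The convergence test (change in loss below a tolerance), the function names, and the toy objective are all assumptions for illustration.

```python
def train_until_stop(step_fn, max_iterations=1000, tolerance=1e-6):
    """Run step_fn until the loss stops changing or the iteration budget is spent.

    step_fn performs one training iteration and returns the current loss.
    Returns (iterations_run, final_loss, reason).
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    previous = None
    for iteration in range(1, max_iterations + 1):
        loss = step_fn()
        if previous is not None and abs(previous - loss) < tolerance:
            return iteration, loss, "converged"
        previous = loss
    return max_iterations, loss, "max_iterations"

def make_toy_step(learning_rate=0.1):
    """Toy stand-in for a training step: gradient descent on (w - 3)^2."""
    state = {"w": 0.0}
    def step():
        gradient = 2.0 * (state["w"] - 3.0)
        state["w"] -= learning_rate * gradient
        return (state["w"] - 3.0) ** 2
    return step

iterations, final_loss, reason = train_until_stop(make_toy_step())
short_run = train_until_stop(make_toy_step(), max_iterations=5)
```

With the default budget the toy run stops on convergence, while the five-iteration run stops on the preset count.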

16. The apparatus according to claim 9, characterized in that the model training module is further used for: formatting each initial sample data item to obtain a plurality of formatted sample data items; performing data augmentation on each formatted sample data item to obtain a plurality of augmented sample data items; and cleaning each augmented sample data item to obtain a plurality of sample data items generated after data cleaning.
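The format, augment, clean sequence can be sketched as three small functions applied in order. The claim does not say what each stage does; the whitespace and case normalization, the last-word deletion used as augmentation, and the removal of empty and duplicate items are hypothetical choices, as is the raw data.

```python
def format_sample(raw):
    """Formatting (assumed): normalize whitespace and case, deduplicate tags."""
    text, tags = raw
    return {"text": " ".join(text.split()).lower(), "tags": sorted(set(tags))}

def augment(sample):
    """Augmentation (assumed): keep the original and add a copy without the last word."""
    words = sample["text"].split()
    variants = [sample]
    if len(words) > 1:
        variants.append({"text": " ".join(words[:-1]), "tags": sample["tags"]})
    return variants

def clean(samples):
    """Cleaning (assumed): drop items with empty text or tags, and exact duplicates."""
    seen = set()
    kept = []
    for s in samples:
        key = (s["text"], tuple(s["tags"]))
        if s["text"] and s["tags"] and key not in seen:
            seen.add(key)
            kept.append(s)
    return kept

# Hypothetical initial sample data: (request text, category tags).
raw_samples = [
    ("  Fast  Charger for phone ", ["charger", "phone"]),
    ("fast charger for phone", ["phone", "charger"]),
    ("", ["phone"]),
]
formatted = [format_sample(r) for r in raw_samples]
augmented = [v for s in formatted for v in augment(s)]
cleaned = clean(augmented)
```

Cleaning after augmentation, as the claim orders the steps, removes duplicates that augmentation itself may introduce.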

17. An electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

18. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-8.

19. A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1-8.