Model training method, object recognition method, related device and storage medium

By using a policy-based model training method to filter and cluster object recognition data, and combining an object pre-detection strategy with a target object recognition model, the problems of low accuracy and efficiency in object recognition tasks are solved, and efficient and accurate object recognition is achieved.

CN116630731B · Active · Publication Date: 2026-06-26 · TENCENT TECHNOLOGY (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TENCENT TECHNOLOGY (SHENZHEN) CO LTD
Filing Date
2022-02-09
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

The accuracy and efficiency of object recognition tasks in existing technologies are low, mainly due to missed detections and misjudgments caused by manual recognition methods.

Method used

A policy-based model training method is adopted. Attribute description data of the initial objects and the object pre-detection strategies are acquired, sample objects are selected, and their data is clustered and used for model training to obtain the target object recognition models. At recognition time, the object pre-detection strategy performs preliminary recognition and the target object recognition model performs secondary recognition, which improves the accuracy and efficiency of object recognition.

Benefits of technology

It improves the accuracy and efficiency of object recognition, avoids problems such as resource waste and low training efficiency, and ensures the accuracy of sample data and the effectiveness of model training.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a model training method, an object identification method, related equipment and a storage medium. The model training method mainly comprises the following steps: performing strategy hit detection on N object pre-detection strategies by using attribute description data of each initial object according to each keyword indicated by the N object pre-detection strategies; selecting, from a plurality of initial objects, initial objects corresponding to attribute description data that hits at least one object pre-detection strategy as sample objects; performing object type-based data clustering processing on the attribute description data of each sample object based on an object type corresponding to an object pre-detection strategy hit by the attribute description data of each sample object, to obtain a plurality of data sets; and performing model training on a benchmark object identification model by using each data set, to obtain a target object identification model under each object type. The application can improve the performance of the target object identification model and improve the accuracy of object identification.

Description

Technical Field

[0001] This application relates to the field of Internet technology, specifically to the field of computer technology, and in particular to a model training method, an object recognition method, related devices, and a storage medium.

Background Technology

[0002] With the continuous development of computer technology, object recognition has become a prominent task. Object recognition refers to identifying whether an object is a target of interest; such objects can be products, advertisements, web pages, audio, or video. Currently, this task is typically accomplished through manual identification. However, manual identification often results in missed detections and false positives, leading to both low accuracy and low efficiency.

Summary of the Invention

[0003] This application provides a model training method, an object recognition method, related devices, and a storage medium, which can improve the performance of the target object recognition model and the accuracy of object recognition.

[0004] In one aspect, embodiments of this application provide a policy-based model training method, the method comprising:

[0005] Obtain attribute description data of multiple initial objects for training the benchmark object recognition model, and N object pre-detection strategies, where N is a positive integer; wherein, an object pre-detection strategy is used to indicate: one or more keywords that need to be associated with the attribute description data of an object under a certain object type;

[0006] Based on the keywords indicated by the N object pre-detection strategies, the attribute description data of each initial object is used to perform strategy hit detection on the N object pre-detection strategies.

[0007] From the plurality of initial objects, the initial objects corresponding to the attribute description data that hit at least one object pre-detection strategy are selected as the sample objects of the benchmark object recognition model.

[0008] Based on the object type of the object pre-detection strategy that the attribute description data of each sample object hits, the attribute description data of each sample object is subjected to data clustering based on object type to obtain multiple datasets, with each dataset corresponding to one object type.

[0009] The baseline object recognition model is trained using each dataset to obtain target object recognition models for multiple object types of interest. A target object recognition model is used to predict the probability that any object belongs to the corresponding object type of interest based on the attribute description data of any input object.
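
The data preparation steps in [0005] to [0008] can be sketched as follows. This is a minimal illustration, not the patented implementation: the `PreDetectionStrategy` class, the function names, the case-insensitive substring matching, and the sample data are all assumptions made for the example. It also assumes that an object hitting several strategies is placed in every corresponding dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PreDetectionStrategy:
    """Hypothetical container: one object type plus the keywords that indicate it."""
    object_type: str
    keywords: tuple


def hit_strategies(attribute_texts, strategies):
    """Strategy hit detection: return every strategy with a keyword in the texts."""
    joined = " ".join(attribute_texts).lower()
    return [s for s in strategies if any(k.lower() in joined for k in s.keywords)]


def build_datasets(initial_objects, strategies):
    """Select sample objects and group them by the object type of each hit strategy."""
    datasets = defaultdict(list)
    for object_id, attribute_texts in initial_objects.items():
        # Objects that hit no strategy are not sample objects and are skipped.
        for strategy in hit_strategies(attribute_texts, strategies):
            datasets[strategy.object_type].append(object_id)
    return dict(datasets)


strategies = [
    PreDetectionStrategy("e-cigarette", ("vape", "e-cigarette")),
    PreDetectionStrategy("gunpowder", ("gunpowder",)),
]
initial_objects = {
    "obj1": ["Fruit vape pen", "Rechargeable, 2 ml pod"],
    "obj2": ["Cotton T-shirt", "Short sleeves"],
    "obj3": ["Black gunpowder tea", "Loose leaf green tea"],
}
datasets = build_datasets(initial_objects, strategies)
print(datasets)
```

In this toy data, "obj3" is a tea that merely contains the keyword, so it lands in the "gunpowder" dataset as a candidate; separating such cases from true hits is the job of the model trained on that dataset in step [0009].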

[0010] In another aspect, embodiments of this application provide a policy-based model training apparatus, the apparatus comprising:

[0011] The acquisition unit is used to acquire attribute description data of multiple initial objects for training the benchmark object recognition model, and N object pre-detection strategies, where N is a positive integer; wherein, an object pre-detection strategy is used to indicate one or more keywords that need to be associated with the attribute description data of an object under a certain object type;

[0012] The processing unit is used to perform policy hit detection on the N object pre-detection strategies based on the keywords indicated by the N object pre-detection strategies and the attribute description data of each initial object.

[0013] The processing unit is further configured to select, from the plurality of initial objects, the initial objects corresponding to the attribute description data that hit at least one object pre-detection strategy, as sample objects of the benchmark object recognition model;

[0014] The processing unit is also used to perform data clustering processing on the attribute description data of each sample object based on the object type of the object pre-detection strategy hit by the attribute description data of each sample object, to obtain multiple datasets, with each dataset corresponding to one object type.

[0015] The training unit is used to train the baseline object recognition model using each dataset to obtain target object recognition models for multiple object types of interest; a target object recognition model is used to predict the probability that any object belongs to the corresponding object type of interest based on the attribute description data of any input object.

[0016] In another aspect, embodiments of this application provide a computer device, the computer device including an input interface and an output interface, the computer device further including:

[0017] A processor, adapted to implement one or more instructions; and computer storage media;

[0018] The computer storage medium stores one or more instructions, which are adapted to be loaded by the processor to execute the policy-based model training method mentioned above.

[0019] In another aspect, embodiments of this application provide a computer storage medium storing one or more instructions adapted to be loaded by a processor to execute the aforementioned policy-based model training method.

[0020] In another aspect, embodiments of this application provide a computer program product, which includes a computer program; when the computer program is executed by a processor, it implements the policy-based model training method mentioned above.

[0021] The embodiments of this application introduce a baseline object recognition model and train it to obtain a target object recognition model, so that the object recognition task is carried out by the target object recognition model; this improves the efficiency and accuracy of object recognition. During model training, N object pre-detection strategies are set, each indicating one or more keywords that should be associated with the attribute description data of objects under a certain object type. After the attribute description data of the multiple initial objects used to train the baseline object recognition model is obtained, the attribute description data associated with each object type can be selected from it based on the keywords indicated by the N object pre-detection strategies and used as sample data for the baseline object recognition model. This ensures the accuracy of the sample data and improves the subsequent training effect, and it avoids the resource waste and low training efficiency that would result from the baseline object recognition model learning attribute description data unrelated to any object type. Furthermore, the attribute description data of each sample object is clustered into multiple datasets based on the object type of the object pre-detection strategy that it hits, and each dataset is used to train the baseline object recognition model separately, so that the model optimizes its parameters in a consistent and focused way by learning only the attribute description data of the dataset for a single object type. This further improves the training effect, giving each trained target object recognition model a strong recognition ability for objects under its corresponding object type, and thus further improves the accuracy of object recognition.

[0022] In another aspect, embodiments of this application provide a policy- and model-based object recognition method, the method comprising:

[0023] Obtain the target attribute description data of the target object to be identified and N object pre-detection strategies, where N is a positive integer; an object pre-detection strategy is used to indicate: one or more keywords that need to be associated with the attribute description data of an object under a certain object type;

[0024] Based on the keywords indicated by the N object pre-detection strategies, the target attribute description data is used to perform strategy hit detection on the N object pre-detection strategies;

[0025] If the target attribute description data matches at least one object pre-detection strategy, then a target object recognition model for type prediction of the target object is determined.

[0026] The determined target object recognition model is invoked to predict the type of the target object based on the target attribute description data, thereby obtaining the type prediction result of the target object, and whether the target object is an object of interest is determined based on that type prediction result.

[0027] In another aspect, embodiments of this application provide a policy- and model-based object recognition device, the device comprising:

[0028] The acquisition unit is used to acquire the target attribute description data of the target object to be identified and N object pre-detection strategies, where N is a positive integer; an object pre-detection strategy is used to indicate: one or more keywords that need to be associated with the attribute description data of an object under a certain type of object of interest;

[0029] The identification unit is used to perform policy hit detection on the N object pre-detection strategies based on the keywords indicated by the N object pre-detection strategies and the target attribute description data.

[0030] The identification unit is further configured to determine a target object identification model for type prediction of the target object if the target attribute description data matches at least one object pre-detection strategy.

[0031] The identification unit is further configured to call the determined target object identification model to perform type prediction on the target object based on the target attribute description data, obtain the type prediction result of the target object, and determine whether the target object is an object of interest based on the type prediction result of the target object.

[0032] In another aspect, embodiments of this application provide a computer device, the computer device including an input interface and an output interface, the computer device further including:

[0033] A processor, adapted to implement one or more instructions; and computer storage media;

[0034] The computer storage medium stores one or more instructions, which are adapted to be loaded by the processor to execute the aforementioned policy- and model-based object recognition method.

[0035] In another aspect, embodiments of this application provide a computer storage medium storing one or more instructions adapted to be loaded by a processor to execute the aforementioned policy- and model-based object recognition method.

[0036] In another aspect, embodiments of this application provide a computer program product, which includes a computer program; when the computer program is executed by a processor, it implements the aforementioned policy and model-based object recognition method.

[0037] The embodiments of this application set up N object pre-detection strategies, each indicating one or more keywords that need to be associated with the attribute description data of an object under a certain type of object of interest. After the target attribute description data of the target object to be identified is obtained, strategy hit detection can be performed on the N object pre-detection strategies using the target attribute description data, according to the keywords the strategies indicate, to preliminarily identify whether the target object is an object of interest. If the target attribute description data hits at least one object pre-detection strategy, the target object may be an object of interest. In that case, a target object recognition model for secondary identification of the target object can be determined and called to predict the type of the target object based on the target attribute description data, and whether the target object is an object of interest is determined from the type prediction result. By combining strategies and models to identify target objects, the accuracy of object identification can be effectively improved; moreover, the entire identification process requires no human intervention, which effectively improves the efficiency of object identification.

Brief Description of the Drawings

[0038] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0039] Figure 1a is a schematic diagram of the structure of a benchmark object recognition model provided in an embodiment of this application;

[0040] Figure 1b is a schematic diagram of the architecture of an object recognition scheme provided in an embodiment of this application;

[0041] Figure 1c is a schematic diagram of the architecture of another object recognition scheme provided in an embodiment of this application;

[0042] Figure 1d is a schematic diagram illustrating the operating principle of a pre-filtering strategy system provided in an embodiment of this application;

[0043] Figure 2 is a flowchart illustrating a policy-based model training method provided in an embodiment of this application;

[0044] Figure 3a is a schematic diagram of the structure of a benchmark object recognition model provided in another embodiment of this application;

[0045] Figure 3b is a schematic diagram of the structure of another benchmark object recognition model provided in another embodiment of this application;

[0046] Figure 4 is a schematic diagram of a process for training a benchmark object recognition model using any dataset, as provided in an embodiment of this application;

[0047] Figure 5a is a flowchart illustrating a type labeling process based on soft deduplication provided in an embodiment of this application;

[0048] Figure 5b is a schematic diagram of a process for selecting P unlabeled sample objects provided in an embodiment of this application;

[0049] Figure 5c is a schematic diagram of a process for constructing unlabeled data pairs provided in an embodiment of this application;

[0050] Figure 5d is a schematic diagram of a model training process provided in an embodiment of this application;

[0051] Figure 5e is a schematic diagram of a learning rate decay curve provided in an embodiment of this application;

[0052] Figure 6 is a flowchart illustrating a policy- and model-based object recognition method provided in an embodiment of this application;

[0053] Figure 7 is a schematic diagram of an adaptive model optimization process provided in an embodiment of this application;

[0054] Figure 8 is a schematic diagram of the structure of a policy-based model training device provided in an embodiment of this application;

[0055] Figure 9 is a schematic diagram of the structure of a policy- and model-based object recognition device provided in an embodiment of this application;

[0056] Figure 10 is a schematic diagram of the structure of a computer device provided in an embodiment of this application.

Detailed Implementation

[0057] The technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings.

[0058] In the embodiments of this application, the objects mentioned below may include, but are not limited to, commodities, advertisements, web pages, audio, or video. The commodities mentioned here can be understood as items sold on e-commerce platforms; e-commerce is short for electronic commerce, which refers to service activities related to commodity transactions on the Internet. An object may have attributes under x attribute dimensions (x is a positive integer). In the embodiments of this application, the data used to describe an object's attributes is called attribute description data. Specifically, an object's attribute description data may include attribute description text under the x attribute dimensions. For example, when the object is a commodity with attributes under four attribute dimensions (commodity name, commodity description, commodity store, and commodity category), the commodity's attribute description data may include the following four attribute description texts: commodity name, commodity description, commodity store name, and commodity category (i.e., commodity type). The commodity description here may include at least one of the following: commodity details and commodity image OCR (Optical Character Recognition) text, that is, text obtained by converting the text in the commodity image into computer text using character recognition methods. As another example, when the object is an advertisement with attributes under four attribute dimensions (advertisement content, advertisement type, advertising company, and advertisement goal), the attribute description data of the advertisement may include the following four attribute description texts: advertisement content text, advertisement type, advertising company name, and advertisement goal (i.e., the object, activity, etc. promoted by the advertisement).
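
As an illustration of this data layout, attribute description data can be represented as a mapping from attribute dimension to attribute description text. The class name, field names, and sample values below are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeDescriptionData:
    """Hypothetical record: one object's description texts, keyed by attribute dimension."""
    object_id: str
    texts: dict = field(default_factory=dict)  # attribute dimension -> description text

    def dimensions(self):
        """Return the attribute dimensions present, in insertion order."""
        return list(self.texts)

    def concatenated(self, separator=" "):
        """Join all attribute description texts into one string."""
        return separator.join(self.texts.values())


commodity = AttributeDescriptionData(
    object_id="sku-001",
    texts={
        "name": "Fruit vape pen",
        "description": "Rechargeable pen with a 2 ml pod",
        "store_name": "Example Store",
        "category": "Electronics",
    },
)
print(commodity.dimensions())
print(commodity.concatenated())
```

Here x = 4, matching the four-dimension commodity example in the paragraph above.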

[0059] Whatever the object is, it can be categorized into multiple types. The type that requires focused attention based on actual needs can be called the "object of interest." For example, when the object is a commodity, it can be divided into prohibited goods and compliant goods. Prohibited goods are items that are prohibited from being manufactured, purchased, used, possessed, stored, transported, imported, or exported according to relevant regulations; they can also be understood as goods that are not allowed to be sold on e-commerce platforms. Conversely, compliant goods are goods that are allowed to be sold on e-commerce platforms. If the focus is on prohibited goods, then the object of interest is prohibited goods; if the focus is on compliant goods, then the object of interest is compliant goods. Similarly, when the object is an advertisement, it can be divided into sensitive advertisements (or prohibited advertisements) and compliant advertisements. Sensitive advertisements are advertisements that are not allowed to be published, while compliant advertisements are advertisements that are allowed to be published. If the focus is on sensitive advertisements, then the object of interest is sensitive advertisements; if the focus is on compliant advertisements, then the object of interest is compliant advertisements. As another example, when the object is a video, it can be divided into three categories: film and television videos, short videos (videos with a duration less than a duration threshold), and game videos. If the focus is on game videos, then the object of interest is game videos; if the focus is on short videos, then the object of interest is short videos.

[0060] Furthermore, the object of interest can be subdivided into various object types. The object types obtained by subdividing the object of interest can be called object of interest types. For example, if the object of interest is prohibited goods, prohibited goods can be subdivided into three types: e-cigarettes, wild animals, and gunpowder. The object of interest types could then include these three categories: e-cigarettes, wild animals, and gunpowder. Similarly, if the object of interest is sensitive advertisements, they can be subdivided into two types: advertisements promoting prohibited activities and advertisements promoting harmful information. The object of interest types could then include these two categories. Likewise, if the object of interest is short videos, they can be subdivided into three types: emotional short videos, fitness short videos, and news short videos. The object of interest types could then include these three categories: emotional short videos, fitness short videos, and news short videos.

[0061] To efficiently and accurately identify whether an object is of interest, this application proposes a policy- and model-based object recognition scheme based on Artificial Intelligence (AI) technology. AI technology refers to the theories, methods, techniques, and application systems that utilize digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology within computer science; it primarily aims to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence, enabling these machines to possess multiple functions such as perception, reasoning, and decision-making. Correspondingly, AI technology is a comprehensive discipline, mainly encompassing several major areas including Computer Vision (CV), speech processing, natural language processing, and Machine Learning (ML) / deep learning.

[0062] Machine learning is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory, among others. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of AI and the fundamental approach to enabling computer devices to possess intelligence. Deep learning, in turn, is a technique that uses deep neural network systems for machine learning. Machine learning / deep learning typically includes techniques such as artificial neural networks and supervised, unsupervised, and semi-supervised learning. Supervised learning refers to machine learning tasks that infer functions (such as model parameters) from labeled training datasets. Unsupervised learning refers to machine learning tasks that infer functions by solving various problems in pattern recognition (such as object recognition) based on sample data whose categories are unknown (unlabeled). Semi-supervised learning refers to machine learning tasks that use a large amount of unlabeled data together with some labeled data to perform pattern recognition and infer functions.

[0063] The policy- and model-based object recognition scheme proposed in this application mainly involves the machine learning / deep learning technologies within the AI technologies mentioned above. In specific implementations, the object recognition scheme can be executed by a computer device, which can be a terminal or a server; alternatively, the object recognition scheme can be executed jointly by a terminal and a server. The terminal mentioned here can include, but is not limited to: smartphones, computers (such as tablets, laptops, desktop computers, etc.), smart wearable devices (such as smartwatches, smart glasses), smart voice interaction devices, smart home appliances (such as smart TVs), vehicle terminals, or aircraft, etc. The server mentioned here can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, etc. The terminal and server can be located within or outside a blockchain network, without limitation. In addition, terminals and servers can upload any internally stored data to the blockchain network for storage, in order to prevent that data from being tampered with and to improve data security.

[0064] Specifically, this object recognition scheme first constructs a baseline object recognition model to predict the type of any object using a model implementation framework. Then, by utilizing semi-supervised or supervised learning techniques in deep learning, the baseline object recognition model is trained in batches using datasets corresponding to various object types, resulting in a target object recognition model for each object type. A target object recognition model can be used to predict the probability that any given object belongs to a corresponding object type based on its attribute description data. It should be understood that the model structure of any target object recognition model is the same as that of the baseline object recognition model; the difference lies in the model parameters. In other words, the target object recognition model can be understood as a baseline object recognition model with optimized model parameters.

[0065] Furthermore, the embodiments of this application do not limit the model construction method involved in the model implementation framework. For example, in one embodiment, the model implementation framework can use a text classification model based on a convolutional neural network (Text CNN) as the aforementioned benchmark object recognition model; in this case, the attribute description texts in any attribute description data can be concatenated, and the concatenated long text can be input into the text classification model for recognition. As another example, in another embodiment, the model implementation framework can customize the model structure of the benchmark object recognition model according to the attribute characteristics of the object. If the object has attributes under multiple attribute dimensions, that is, the attribute description data includes attribute description texts under multiple attribute dimensions, then the customized benchmark object recognition model may include a feature extraction network corresponding to each attribute dimension, a feature joint layer (attention layer), and a feedforward network, as shown in Figure 1a. Each feature extraction network independently extracts features from the attribute description text under its corresponding attribute dimension, obtaining the text features of each attribute description text. The feature joint layer performs joint feature processing on the text features output by each feature extraction network according to an attention mechanism, and inputs the resulting joint features into the feedforward network. The feedforward network then performs object type prediction based on the input joint features. By customizing the benchmark object recognition model, the attribute description text under each attribute dimension can be analyzed and utilized through the various feature extraction networks within the model, thereby improving the accuracy of object recognition. In other words, a customized multi-dimensional, multi-angle benchmark object recognition model can extract more specific features of multiple object attributes, and it improves the recognition effect by integrating the features of those object attributes and performing joint recognition.
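
The data flow of the customized model (per-dimension feature extraction, attention-based feature joining, then a feedforward prediction) can be sketched in plain Python. Everything concrete here is an assumption for illustration: a single toy hashed bag-of-words function stands in for the separate per-dimension feature extraction networks, the attention scores are dot products with a fixed query vector, the feedforward network is one linear layer with a sigmoid, and none of the weights are trained.

```python
import math


def extract_features(text, dim):
    """Toy stand-in for a feature extraction network: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_features(features, query):
    """Attention-style joint layer: weight each dimension's features and sum them."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    joint = [sum(w * feat[i] for w, feat in zip(weights, features))
             for i in range(len(query))]
    return joint, weights


def feedforward(joint, ff_weights, ff_bias):
    """One linear layer plus sigmoid: probability of the object type."""
    z = sum(w * x for w, x in zip(ff_weights, joint)) + ff_bias
    return 1.0 / (1.0 + math.exp(-z))


def predict(attribute_texts, query, ff_weights, ff_bias):
    """Run all three stages for one object's attribute description texts."""
    features = [extract_features(t, len(query)) for t in attribute_texts]
    joint, attention = joint_features(features, query)
    return feedforward(joint, ff_weights, ff_bias), attention


texts = ["Fruit vape pen", "Rechargeable pen with a 2 ml pod",
         "Example Store", "Electronics"]
probability, attention = predict(
    texts, query=[0.1, 0.2, 0.3, 0.4],
    ff_weights=[0.5, -0.5, 0.25, -0.25], ff_bias=0.0,
)
print(probability, attention)
```

The attention weights sum to 1 across the attribute dimensions, so the joint feature vector is a weighted mix of the per-dimension features. A real model would learn the extractor, query, and feedforward parameters during training.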

[0066] Furthermore, this application does not limit the model training method involved in the model implementation framework. For example, in one embodiment, the model implementation framework can use a traversal labeling method to annotate the type of every object involved in each dataset, then use each dataset and the type labels of the corresponding objects to train the benchmark object recognition model with supervised learning techniques, and finally perform model performance verification and online deployment on the trained target object recognition models. As another example, in another embodiment, two considerations apply: labeling all objects in a dataset easily leads to low labeling efficiency, and a dataset may contain the attribute description data of many highly similar objects, so that repeatedly labeling them would only contribute redundant information to the model. Based on this, the model implementation framework can instead label only the types of dissimilar objects involved in each dataset, then use each dataset and the type labels of the labeled objects to train the benchmark object recognition model with semi-supervised learning techniques, and finally perform model performance verification and online deployment on the trained target object recognition models. This approach can effectively improve labeling efficiency and addresses the problem that the high cost of obtaining type labels prevents the widespread adoption of deep learning models in object recognition scenarios.
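
One way to pick "dissimilar objects" for labeling is a greedy near-duplicate filter. The sketch below is an assumption, not the method of this application: this passage does not name a similarity measure, so word-level Jaccard similarity and the 0.6 threshold are chosen purely for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_for_labeling(texts, threshold=0.6):
    """Greedily keep the indices of texts that are dissimilar to every text kept so far."""
    kept, kept_tokens = [], []
    for index, text in enumerate(texts):
        tokens = set(text.lower().split())
        if all(jaccard(tokens, other) < threshold for other in kept_tokens):
            kept.append(index)
            kept_tokens.append(tokens)
    return kept


texts = [
    "fruit vape pen rechargeable",
    "fruit vape pen rechargeable new",   # near-duplicate of the first text
    "black gunpowder tea loose leaf",
]
to_label = select_for_labeling(texts)
unlabeled = [i for i in range(len(texts)) if i not in to_label]
print(to_label, unlabeled)
```

The objects in `to_label` would receive type labels, while those in `unlabeled` remain available as unlabeled data for semi-supervised training.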

[0067] After deploying the target object recognition models using the methods described above, when it is necessary to identify whether an object is an object of interest, the recognition process provided by this object recognition solution mainly includes preliminary identification and filtering by object pre-detection strategies and secondary recognition by the target object recognition model. The object pre-detection strategy refers to the strategy used to initially identify whether any object belongs to a certain object of interest type. It can be used to indicate the keywords that need to be associated with the attribute description data of objects under the object of interest type. In other words, when identifying whether an object is an object of interest, one or more object pre-detection strategies can be used to perform strategy hit detection based on the object's attribute description data to initially identify whether the object is an object of interest. If the object's attribute description data does not hit any object pre-detection strategy, then the object can be considered not an object of interest. If the attribute description data of an object matches at least one object pre-detection strategy, the object can be considered a potential object of interest. In this case, the target object recognition model corresponding to the object of interest type of the matched object pre-detection strategy can be further invoked to predict the object type of the object based on the attribute description data. Based on the predicted type, it can be further determined whether the object is an object of interest, and if the object is an object of interest, it can be determined which specific object of interest the object belongs to.
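The two-stage flow described above (preliminary identification by pre-detection strategies, then secondary recognition by the model of the hit type) can be sketched as follows. This is a minimal illustration under stated assumptions: the strategy dictionary, the stub model, and the 0.5 decision threshold are hypothetical and are not taken from this application.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical pre-detection strategies: one keyword set per object-of-interest type.
STRATEGIES: Dict[str, Set[str]] = {
    "wildlife": {"crocodile", "ivory"},
}


def stub_wildlife_model(text: str) -> float:
    """Stand-in for a trained target object recognition model (not a real model)."""
    return 0.1 if "brand" in text else 0.9


# One target object recognition model per object-of-interest type (assumed layout).
MODELS: Dict[str, Callable[[str], float]] = {"wildlife": stub_wildlife_model}


def recognize(text: str, threshold: float = 0.5) -> Optional[str]:
    """Return the object-of-interest type predicted for `text`, or None."""
    lowered = text.lower()
    # Stage 1: preliminary identification by strategy hit detection.
    hit_types = [t for t, kws in STRATEGIES.items() if any(k in lowered for k in kws)]
    if not hit_types:
        return None  # no strategy hit: not an object of interest
    # Stage 2: secondary recognition by the model of each hit type.
    best_prob, best_type = max((MODELS[t](lowered), t) for t in hit_types)
    return best_type if best_prob >= threshold else None
```

With these toy definitions, a text that hits no strategy is rejected in stage 1, while a text that hits a strategy is accepted only if the corresponding model also scores it at or above the threshold.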

[0068] In one implementation, the aforementioned object pre-detection strategy can be a word set composed of keywords, or a word set composed of keywords and exclusion words. In this implementation, the step of using the object's attribute description data to perform strategy hit detection on one or more object pre-detection strategies can essentially be understood as a simplified keyword matching process: when the attribute description data contains keywords from a certain object pre-detection strategy, that object pre-detection strategy can be considered to have been hit. In this case, combined with the aforementioned supervised learning model implementation framework, the architecture of the object recognition scheme proposed in this application can be, by way of example, as shown in Figure 1b. In another embodiment, the aforementioned object pre-detection strategy can be configured through a pre-filtering strategy system (or rule engine) and may include one or more rules, each rule including one or more keywords and the logical relationship between the keywords. In this embodiment, the step of using the object's attribute description data to perform strategy hit detection for one or more object pre-detection strategies can essentially be understood as rule matching processing. Each rule can indicate the target keyword to be hit through its included keywords and corresponding logical relationships; then, when the attribute description data contains the target keyword indicated by one or more rules in an object pre-detection strategy, that object pre-detection strategy can be considered to have been hit. In this case, combined with the semi-supervised learning model implementation framework mentioned above, the architecture of the object recognition scheme proposed in this application can be, by way of example, as shown in Figure 1c.

[0069] As shown in Figure 1c, the pre-filtering strategy system may include modules such as a configurable strategy submodule, a strategy matching submodule, and a strategy detection submodule. ① The configurable strategy submodule is used by maintenance personnel to add object types of interest and the corresponding pre-detection strategy for each object type. It serves as the manual input path for identifying various object types of interest. ② The strategy matching submodule is responsible for matching the attribute description data of each input object one by one against the pre-defined object pre-detection strategies (i.e., strategy hit detection). The attribute description data of successfully matched objects and the object pre-detection strategies they matched are temporarily cached in memory; successfully matched objects are preliminarily designated as objects of interest, and the object type corresponding to the matched object pre-detection strategy is preliminarily designated as the object type of interest for that object. For example, the input attribute description data includes product name, product details text, product image OCR text, product store name, and product category. As shown in Figure 1d, the strategy matching submodule can perform multi-attribute matching and multi-strategy matching on the input attribute description data. Multi-attribute matching refers to matching each attribute description text in the attribute description data, while multi-strategy matching refers to matching the attribute description data against multiple pre-defined object pre-detection strategies. ③ The strategy detection submodule is responsible for deduplicating, organizing, and summarizing all data matched by the strategy matching submodule (such as the attribute description data of successfully matched objects), and finally outputting the deduplicated data in a well-organized format to a specified database or Excel (a spreadsheet file).
Furthermore, after the attribute description data of any object is input into the strategy matching submodule for preliminary determination of whether the object is an object of interest, the strategy detection submodule is also responsible for outputting the preliminary identification result of the object. This preliminary identification result includes: information indicating whether the object is an object of interest (such as prohibited goods), and the preliminarily determined object of interest type (such as prohibited goods type) to which the object belongs.
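The deduplication and summarization performed by the strategy detection submodule can be illustrated roughly as follows. The record layout (pairs of object identifier and matched object type) and the shape of the preliminary identification result are assumptions made for this sketch; the application does not prescribe a data format.

```python
from typing import Dict, List, Tuple


def summarize_matches(matches: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Deduplicate (object_id, object_type) match records and group types per object.

    The record layout is a hypothetical choice for this sketch.
    """
    summary: Dict[str, List[str]] = {}
    for object_id, object_type in matches:
        types = summary.setdefault(object_id, [])
        if object_type not in types:  # drop duplicate hits, keep first-seen order
            types.append(object_type)
    return summary


def preliminary_result(object_id: str, summary: Dict[str, List[str]]) -> dict:
    """Preliminary identification result: interest flag plus preliminary types."""
    types = summary.get(object_id, [])
    return {"is_object_of_interest": bool(types), "types": types}
```

An object that appears in no match record yields a result whose flag is false and whose type list is empty, which corresponds to an object that hit no pre-detection strategy.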

[0070] Furthermore, for any target object recognition model, after the model has been online for a period of time, feedback data (or feedback results) can be collected for that target object recognition model, thereby enabling adaptive optimization of the target object recognition model based on the feedback data. For example, in the closed-loop architecture shown in Figure 1c, credibility testing can be performed on the collected feedback data to select the reliable feedback data. Based on this reliable feedback data, the model implementation framework is reused to further optimize the parameters of the object recognition model, and the improved model performance is verified. This approach enables the object recognition model to optimize itself adaptively and in a targeted manner based on feedback data, further improving model performance and ultimately enhancing object recognition accuracy.

[0071] Based on the above description of the object recognition scheme, this application proposes a policy-based model training method to further illustrate the model training process mentioned in that description. This policy-based model training method can be executed by the aforementioned computer device (such as a terminal or a server), or jointly by a terminal and a server; for ease of explanation, the following description takes execution by a computer device as an example. Referring to Figure 2, the policy-based model training method may include the following steps S201-S205:

[0072] S201, Obtain attribute description data of multiple initial objects used to train the benchmark object recognition model, and N object pre-detection strategies, where N is a positive integer.

[0073] In this embodiment, an object pre-detection strategy is used to indicate one or more keywords that need to be associated with the attribute description data of an object under a type of interest. In one implementation, each object pre-detection strategy can be pre-constructed using one or more keywords; in this implementation, an object pre-detection strategy can be understood as a word set, and the word set includes one or more keywords that the attribute description data of an object under a type of interest needs to match. That is, the keywords indicated by each object pre-detection strategy in this implementation and the keywords it includes are the same.

[0074] In another implementation, each object pre-detection strategy can be pre-configured through a configurable strategy submodule in a pre-filtering strategy system; the object types and corresponding object pre-detection strategies mentioned in this application embodiment can be set by operations and maintenance personnel according to specific tasks. The strategy configuration methods provided by the pre-filtering strategy system include:

[0075] a) It supports configuring corresponding rules for the attribute description text under each attribute dimension of an object, thereby forming a set of all rules corresponding to that object as an object pre-detection strategy. For example, when the object is a product, it supports configuring corresponding rules for each attribute description text such as product name, product copy (product details copy + product image OCR text), product store name, and product category, achieving diversified rule configuration. Of course, it should be understood that this pre-filtering system can also support configuring corresponding rules for the entire attribute description data; the rules in the object pre-detection strategy configured in this way apply to all attribute description text in the attribute description data.

[0076] b) The rules that can be configured are shown in Table 1 below:

[0077] Table 1

[0078]

[0079] In practical use, maintenance personnel can configure corresponding object pre-detection strategies on the pre-filtering strategy system for each object identification task of the object type of interest, thereby obtaining N object pre-detection strategies. An object pre-detection strategy can be understood as a rule set, which includes one or more rules corresponding to a type of object of interest; and a rule includes one or more keywords and the logical relationships between these keywords (e.g., the logical relationships shown in Table 1). Therefore, it can be seen that the keywords indicated by any object pre-detection strategy in this implementation include the keywords in each rule of that object pre-detection strategy.

[0080] Furthermore, any object pre-detection strategy in this implementation also indicates the logical relationship between the corresponding keywords. The keywords indicated by each object pre-detection strategy and the logical relationship between them can be used to determine the target keywords that the attribute description data of objects under the corresponding object type needs to hit. For example, assume that the object pre-detection strategy corresponding to object type A indicates the keywords "best" and "most excellent". If the logical relationship between these two keywords is OR, then the target keywords that the attribute description data of objects under object type A needs to hit include "best" and "most excellent", and hitting either one is sufficient. If the logical relationship between these two keywords is AND with no more than one word between the two keywords, then the target keywords that the attribute description data of objects under object type A needs to hit include "best", and "most excellent" located after "best" and separated from "best" by at most one word, and so on. Therefore, setting object pre-detection strategies through configured rules brings more flexibility and richness in terms of keyword combination, attribute description text matching, and data detection, thereby effectively improving the accuracy and recall of the initial identification. The accuracy and recall mentioned here are indicators for evaluating classification performance, and the higher the indicator value (or score), the better.
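The OR relationship and the AND relationship with a word-distance limit described above can be sketched as follows. Because the contents of Table 1 are not reproduced here, the `Rule` encoding, the `relation` values, and the `max_gap` parameter are hypothetical; the sketch also assumes single-word keywords and whitespace tokenization, which would need to be replaced by word segmentation for Chinese text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    keywords: List[str]            # single-word keywords (simplifying assumption)
    relation: str                  # "OR" or "AND" (hypothetical encoding)
    max_gap: Optional[int] = None  # for "AND": max words allowed between consecutive keywords


def _ordered_within_gap(tokens: List[str], keywords: List[str], max_gap: int) -> bool:
    """True if the keywords occur in order, each at most `max_gap` words after the previous."""
    current = [i for i, t in enumerate(tokens) if t == keywords[0]]
    for kw in keywords[1:]:
        current = [
            j for j, t in enumerate(tokens)
            if t == kw and any(0 < j - i <= max_gap + 1 for i in current)
        ]
    return bool(current)


def rule_hits(rule: Rule, text: str) -> bool:
    """Evaluate one rule against one piece of text."""
    tokens = text.lower().split()
    if rule.relation == "OR":
        return any(k in tokens for k in rule.keywords)
    if rule.relation == "AND":
        if rule.max_gap is None:
            return all(k in tokens for k in rule.keywords)
        return _ordered_within_gap(tokens, rule.keywords, rule.max_gap)
    raise ValueError(f"unknown relation: {rule.relation}")
```

For instance, a rule `Rule(["best", "finest"], "AND", max_gap=1)` is hit by "best and finest tea" but not by "best tea and finest cups", where two words separate the keywords.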

[0081] S202, based on the keywords indicated by the N object pre-detection strategies, use the attribute description data of each initial object to perform strategy hit detection on the N object pre-detection strategies.

[0082] In one implementation, if the object pre-detection strategy is a word set consisting of one or more keywords, then the specific implementation of step S202 may be: for the attribute description data of any initial object, traverse N object pre-detection strategies; detect whether the attribute description data of any initial object includes the keyword indicated by the current object pre-detection strategy being traversed; if it includes, it is determined that the attribute description data of any initial object has hit the current object pre-detection strategy; if it does not include, it is determined that the attribute description data of any initial object has not hit the current object pre-detection strategy.
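For the word-set case, the traversal in step S202 can be sketched as follows. The sketch assumes that a strategy is hit when any one of its keywords appears as a substring of any attribute description text; the function and variable names are illustrative only.

```python
from typing import Dict, List, Set


def hit_strategies(attribute_texts: List[str], strategies: Dict[str, Set[str]]) -> List[str]:
    """Traverse all strategies and return the names of those that are hit.

    Assumption: a strategy is hit when any of its keywords occurs in any attribute text.
    """
    lowered = [text.lower() for text in attribute_texts]
    hits = []
    for name, keywords in strategies.items():  # traverse the N strategies
        if any(kw.lower() in text for kw in keywords for text in lowered):
            hits.append(name)
    return hits
```

An initial object whose returned list is empty hit no strategy and would not be selected as a sample object in step S203.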

[0083] In another implementation, if the object pre-detection strategy is a set of rules consisting of one or more rules, that is, any object pre-detection strategy is used not only to indicate one or more keywords that the attribute description data of an object under a certain object type should be associated with, but also to indicate the logical relationship between the corresponding keywords; then, the specific implementation of step S202 can be as follows: for the attribute description data of any initial object, traverse N object pre-detection strategies to determine the current object pre-detection strategy being traversed; based on the keywords and logical relationships in the current object pre-detection strategy, determine the target keywords that the attribute description data of any initial object should hit, and search for the target keywords in the attribute description data of any initial object; if the target keywords are found, it is determined that the attribute description data of any initial object hits the current object pre-detection strategy; if the target keywords are not found, continue to traverse N object pre-detection strategies until all N object pre-detection strategies have been traversed.

[0084] Furthermore, if the rules in the current object pre-detection strategy differentiate between different attribute description texts—that is, if the current object pre-detection strategy is composed of rules configured separately for each attribute description text—then when the computer device determines the target keywords that the attribute description data of any initial object needs to hit based on the keywords and logical relationships in the current object pre-detection strategy, it can determine the target keywords that each attribute description text in the attribute description data of any initial object needs to hit on a single attribute description text basis. Correspondingly, when searching for target keywords in the attribute description data of any initial object, the computer device can also search for the target keywords that need to be hit in each attribute description text in the attribute description data of any initial object.

[0085] For example, the current object pre-detection strategy includes: rule a configured for the attribute description text a of an object of a certain object type of interest, and rule b configured for the attribute description text b of that object. Then, the computer device can determine the target keyword a that the attribute description text a of any initial object needs to hit based on the keywords in rule a and the logical relationships between them, and determine the target keyword b that the attribute description text b of that initial object needs to hit based on the keywords in rule b and the logical relationships between them. It then searches for target keyword a in attribute description text a and for target keyword b in attribute description text b, in order to determine whether the attribute description data of that initial object has hit the current object pre-detection strategy.
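Per-attribute rule matching of this kind can be sketched as follows. Each rule is modeled as a predicate over a single attribute description text, which is a hypothetical encoding; consistent with the earlier statement that a strategy is hit when the target keyword of one or more of its rules is found, the sketch treats the strategy as hit when at least one rule matches the text it was configured for.

```python
from typing import Callable, Dict

# A rule here is any predicate over one attribute description text (hypothetical encoding).
AttributeRule = Callable[[str], bool]


def strategy_hit(attribute_data: Dict[str, str], rules: Dict[str, AttributeRule]) -> bool:
    """Check each rule only against the attribute description text it was configured for.

    The strategy counts as hit when at least one rule matches its own text.
    """
    for attribute, rule in rules.items():
        text = attribute_data.get(attribute, "")
        if text and rule(text.lower()):
            return True
    return False
```

Because each rule sees only its own attribute, a keyword that appears in a different attribute description text does not trigger the rule.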

[0086] S203, from multiple initial objects, select the initial objects corresponding to the attribute description data that hit at least one object pre-detection strategy, and use them as sample objects for the benchmark object recognition model.

[0087] S204. Based on the object type of the object pre-detection strategy that the attribute description data of each sample object hits, perform data clustering based on object type on the attribute description data of each sample object to obtain multiple datasets.

[0088] Each dataset corresponds to a type of object of interest; and each dataset includes attribute description data of each sample object that has matched the object pre-detection strategy corresponding to the object type of interest. For example, suppose 10 sample objects were selected through step S203, and the distribution of the object pre-detection strategies matched by the attribute description data of each sample object is shown in Table 2 below:

[0089] Table 2

[0090]
Sample object | Attribute description data | Object pre-detection strategy hit | Corresponding object type
Sample object A | Attribute description data A | Object pre-detection strategy 1 | Type 1 of objects of interest
Sample object B | Attribute description data B | Object pre-detection strategy 3 | Type 3 of objects of interest
Sample object C | Attribute description data C | Object pre-detection strategy 2 | Type 2 of objects of interest
Sample object D | Attribute description data D | Object pre-detection strategy 1 | Type 1 of objects of interest
Sample object E | Attribute description data E | Object pre-detection strategy 2 | Type 2 of objects of interest
Sample object F | Attribute description data F | Object pre-detection strategy 2 | Type 2 of objects of interest
Sample object G | Attribute description data G | Object pre-detection strategy 1 | Type 1 of objects of interest
Sample object H | Attribute description data H | Object pre-detection strategy 3 | Type 3 of objects of interest
Sample object I | Attribute description data I | Object pre-detection strategy 3 | Type 3 of objects of interest
Sample object J | Attribute description data J | Object pre-detection strategy 1 | Type 1 of objects of interest

[0091] Therefore, based on the object type of the object pre-detection strategy corresponding to the attribute description data of each of the above 10 sample objects, data clustering based on object type is performed on the attribute description data of the above 10 sample objects, resulting in the three datasets shown in Table 3 below:

[0092] Table 3

[0093]
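The type-based clustering of step S204 amounts to grouping sample objects by the object type of the strategy they hit. A minimal sketch using the ten rows of Table 2 follows; the tuple encoding and the function name are illustrative assumptions, and in Table 2 strategy i corresponds to object type i.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (sample object, number of the hit strategy / object type) transcribed from Table 2.
TABLE_2: List[Tuple[str, int]] = [
    ("A", 1), ("B", 3), ("C", 2), ("D", 1), ("E", 2),
    ("F", 2), ("G", 1), ("H", 3), ("I", 3), ("J", 1),
]


def cluster_by_type(rows: List[Tuple[str, int]]) -> Dict[int, List[str]]:
    """Group sample objects into one dataset per object type of interest."""
    datasets: Dict[int, List[str]] = defaultdict(list)
    for sample, object_type in rows:
        datasets[object_type].append(sample)
    return dict(datasets)
```

Applied to `TABLE_2`, this yields three datasets: sample objects A, D, G, and J for type 1; C, E, and F for type 2; and B, H, and I for type 3.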

[0094] As can be seen from the description of steps S201-S204 above, the embodiments of this application can complete the initial screening of attribute description data corresponding to each type of object of interest by setting corresponding object pre-detection strategies for each sub-task of the identification of each type of object of interest in advance. This can narrow the detection range of attribute description data used to train the benchmark object recognition model, so as to facilitate the subsequent learning of the benchmark object recognition model.

[0095] S205, each dataset is used to train the baseline object recognition model to obtain target object recognition models for multiple object types of interest; a target object recognition model is used to predict the probability that any object belongs to the corresponding object type of interest based on the attribute description data of any input object.

[0096] In this embodiment, the benchmark object recognition model may include: a feature extraction network for each attribute dimension, a feature joint layer, and a feedforward network. Accordingly, the training principle of the benchmark object recognition model is roughly as follows: after independent feature extraction by each feature extraction network, the feature joint layer and feedforward network perform joint type prediction and gradient backpropagation training based on the output results of each feature extraction network. Furthermore, the main structure of each feature extraction network can be implemented based on a CNN model, or it can be implemented based on models such as LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), or BERT (Bidirectional Encoder Representations from Transformers), without limitation. The following example illustrates the network structure of each feature extraction network by using a CNN model as its basic structure.

[0097] For example, each feature extraction network can consist of a CNN model, which extracts features by performing convolutional processing. Furthermore, considering that the features obtained by the CNN model through convolutional processing may have high dimensionality, dimensionality reduction can be performed on these features to facilitate subsequent processing. In this case, each feature extraction network can consist of a CNN model and a pooling layer; the pooling layer achieves dimensionality reduction by downsampling the features output by the CNN model. Taking product attribute description data as the input of the benchmark object recognition model, a schematic diagram of the benchmark object recognition model in this case can be, by way of example, as shown in Figure 3a. As another example, to extract stronger text structure features from the attribute description text, embodiments of this application can employ a low-level text feature representation method based on character embedding and word embedding. In this case, each feature extraction network may include two CNN models and one pooling layer, where the inputs of the two CNN models are the character vectors and word vectors of the same attribute description text, respectively. Each feature extraction network can then capture all Chinese character and word information at both the character and word feature scales, avoiding the information loss caused by non-standard wording in the object's description text (such as product copy). This multi-scale, multi-dimensional design fits the task characteristics, thereby achieving better recognition results.

[0098] Taking product attribute description data as the input of the benchmark object recognition model, the structural diagram of the benchmark object recognition model in this case can be as shown in Figure 3b. It should be noted that the character vectors and word vectors mentioned here are both embedding vectors. An embedding is a dense vector representation of a word or text. In specific implementations, the character vectors and word vectors of the attribute description text input into any feature extraction network can be obtained by using a character-word vector model to vectorize the attribute description text. The character-word vector model can be trained on Internet data using a character-word vector training method. The character-word vectors obtained in this way have better adaptability to the task.
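The idea of a feature extraction network with a character branch, a word branch, and a pooling layer can be illustrated with a small pure-Python sketch. It is not the network of Figure 3b: the ReLU activation, the max-over-time pooling, and the concatenation of the two branches are assumptions, and the kernels would be learned parameters in a real model rather than fixed values.

```python
from typing import List

Vector = List[float]


def conv1d(seq: List[Vector], kernel: List[Vector]) -> List[float]:
    """Valid 1-D convolution of one filter over a sequence of embedding vectors."""
    width = len(kernel)
    out = []
    for start in range(len(seq) - width + 1):
        total = 0.0
        for offset in range(width):
            total += sum(x * w for x, w in zip(seq[start + offset], kernel[offset]))
        out.append(max(total, 0.0))  # ReLU activation (an assumption)
    return out


def max_pool(feature_map: List[float]) -> float:
    """Max-over-time pooling: downsample a feature map to a single value."""
    return max(feature_map) if feature_map else 0.0


def extract_text_feature(
    char_vecs: List[Vector],
    word_vecs: List[Vector],
    char_kernels: List[List[Vector]],
    word_kernels: List[List[Vector]],
) -> Vector:
    """One pooled value per filter: character branch followed by word branch."""
    char_part = [max_pool(conv1d(char_vecs, k)) for k in char_kernels]
    word_part = [max_pool(conv1d(word_vecs, k)) for k in word_kernels]
    return char_part + word_part
```

The output length equals the total number of filters regardless of text length, which is the dimensionality reduction role that the pooling layer plays in the description above.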

[0099] Before implementing step S205, for any dataset, at least two type labels must be set in the benchmark object recognition model according to the type of interest corresponding to that dataset, and these at least two type labels must include at least one type label indicating the type of interest. Then, in step S205, the benchmark object recognition model with the type labels set is trained using that dataset to obtain the target object recognition model for the corresponding type of interest; specifically, this step may include, but is not limited to, the following implementation methods:

[0100] In one implementation, type labels for each sample object in any dataset can be obtained. Then, using the attribute description data and corresponding type labels of each sample object in the dataset, supervised model training is performed on the benchmark object recognition model to obtain the target object recognition model for the object type of interest corresponding to the dataset. Specifically, the benchmark object recognition model can be invoked to predict the type of each sample object based on the attribute description data of each sample object in the dataset, obtaining the type prediction result for each sample object. The type prediction result for a sample object may include the predicted probability that the sample object belongs to the object type indicated by each type label in the benchmark object recognition model. Then, the type label corresponding to the highest predicted probability in the type prediction result for each sample object can be used as the type prediction label for that sample object. Based on the difference between the type prediction label and the corresponding type label of each sample object, the loss value of the benchmark object recognition model on the dataset is calculated. The model parameters of the benchmark object recognition model are then optimized in the direction of reducing this loss value to obtain the corresponding target object recognition model.
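The selection of the type prediction label and the computation of a loss value can be sketched as follows. The application does not name a specific loss function; cross-entropy over the predicted probabilities is used here as a common, assumed choice, and the function names are illustrative.

```python
import math
from typing import List


def predict_label(probs: List[float], labels: List[str]) -> str:
    """Pick the type label with the highest predicted probability."""
    best_index = max(range(len(probs)), key=probs.__getitem__)
    return labels[best_index]


def cross_entropy(batch_probs: List[List[float]], true_indices: List[int]) -> float:
    """Mean cross-entropy between predicted probabilities and true type labels.

    Cross-entropy is an assumed choice of loss for this sketch.
    """
    eps = 1e-12  # avoids log(0)
    total = -sum(math.log(p[i] + eps) for p, i in zip(batch_probs, true_indices))
    return total / len(true_indices)
```

A prediction that assigns high probability to the true label produces a smaller loss than one that assigns low probability to it, so optimizing the parameters in the direction of reducing the loss pushes predicted labels toward the annotated type labels.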

[0101] The process by which the benchmark object recognition model predicts the type of any sample object based on the attribute description data of any sample object in any dataset, and obtains the type prediction label of any sample object, may include: first, calling each feature extraction network in the benchmark object recognition model to independently extract features from the attribute description text of any sample object in the attribute description data under the corresponding attribute dimension, and obtaining the text features of each attribute description text of any sample object; calling the feature joint layer to perform feature joint processing on the text features of each attribute description text of any sample object according to the attention mechanism, and obtaining the joint features of any sample object; and calling the feedforward network to predict the type of any sample object based on the joint features of any sample object, and obtaining the type prediction result of any sample object.
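The feature joint processing according to an attention mechanism can be illustrated as a softmax-weighted sum of the per-attribute text features. The specific attention form is not given in the text; the dot-product scoring against a single query vector below is an assumption, and in a trained model the query would be a learned parameter.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_features(text_features: List[List[float]], query: List[float]) -> List[float]:
    """Attention-weighted sum of per-attribute text features (assumed attention form)."""
    # Score each attribute's feature vector against the query vector.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in text_features]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * feat[d] for w, feat in zip(weights, text_features)) for d in range(dim)]
```

The resulting joint feature has the same dimensionality as each text feature and would then be passed to the feedforward network for type prediction.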

[0102] In another implementation, Q labeled data pairs and P unlabeled data pairs can be constructed based on the attribute description data in any dataset, where Q and P are both positive integers. A labeled data pair includes: a type label of a labeled sample object and its corresponding attribute description data; an unlabeled data pair includes: attribute description data of an unlabeled sample object and augmented data obtained by augmenting the attribute description data. Then, using the Q labeled data pairs and P unlabeled data pairs, semi-supervised model training is performed on the baseline object recognition model to obtain the target object recognition model for the object type of interest corresponding to the dataset. Alternatively, the baseline object recognition model can first be trained in a supervised manner using the Q labeled data pairs to obtain an initial object recognition model; then, the initial object recognition model can be trained in a semi-supervised manner using the Q labeled data pairs and P unlabeled data pairs to obtain the target object recognition model for the object type of interest corresponding to the dataset. The specific training method for the semi-supervised model training mentioned here is described later in connection with Figure 4 and is not repeated here.

[0103] This application's embodiments introduce a baseline object recognition model and train it to obtain a target object recognition model, thereby achieving the object recognition task through the target object recognition model. This improves the efficiency and accuracy of object recognition. Furthermore, during model training, N object pre-detection strategies are set. Each object pre-detection strategy indicates one or more keywords that should be associated with the attribute description data of objects under a certain object type. After obtaining the attribute description data of multiple initial objects used to train the baseline object recognition model, attribute description data associated with each object type can be selected from the attribute description data of the multiple initial objects based on the keywords indicated by the N object pre-detection strategies. This serves as sample data for the baseline object recognition model, ensuring the accuracy of the sample data and improving the subsequent model training effect. It also avoids the problem of resource waste and low training efficiency caused by the baseline object recognition model learning attribute description data unrelated to the object type. Furthermore, by clustering the attribute description data of each sample object into multiple datasets based on the object type of the pre-detection strategy corresponding to the attribute description data of each sample object, and using each dataset to specifically train the benchmark object recognition model, the benchmark object recognition model can consistently and focusedly optimize its model parameters by learning the attribute description data of the dataset corresponding to a single object type. 
This further improves the model training effect, enabling the trained single target object recognition model to have a strong recognition ability for objects under the corresponding object type, thus further improving the accuracy of object recognition.

[0104] Based on the description of the method embodiment shown in Figure 2 above, the flowchart shown in Figure 4 is used below to further describe one implementation of step S205 in Figure 2. It should be understood that, in a specific implementation, before the process shown in Figure 4 is executed, at least two type labels need to be set in the benchmark object recognition model according to the type of the object of interest corresponding to any dataset, and the at least two type labels must include at least one type label indicating the type of the object of interest. Specifically, step S205 may include the following steps S2051-S2054:

[0105] S2051. Based on the attribute description data in any dataset, construct Q labeled data pairs and P unlabeled data pairs, where Q and P are both positive integers.

[0106] A labeled data pair includes: a type label for a labeled sample object and corresponding attribute description data; an unlabeled data pair includes: attribute description data for an unlabeled sample object and augmented data obtained by augmenting the attribute description data.

[0107] In one implementation, the computer device can randomly select attribute description data of Q sample objects and attribute description data of P sample objects from any dataset. Next, it can obtain type labels for the Q sample objects, treat these Q sample objects as Q labeled sample objects, and construct Q labeled data pairs using the type labels and corresponding attribute description data of the Q labeled sample objects. Then, it treats all P sample objects as unlabeled sample objects, performs data perturbation augmentation processing on the attribute description data of each unlabeled sample object to obtain augmented data for each unlabeled sample object, and constructs P unlabeled data pairs using the attribute description data and corresponding augmented data of each unlabeled sample object.
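The construction of the Q labeled data pairs and P unlabeled data pairs can be sketched as follows. Several details are assumptions made for illustration: the Q and P sample objects are drawn without overlap, type labels come from a caller-supplied `get_label` function standing in for manual annotation, and the data perturbation is a toy augmentation that drops one random word.

```python
import random
from typing import Callable, Dict, List, Tuple


def perturb(text: str, rng: random.Random) -> str:
    """Toy data perturbation: drop one randomly chosen word (hypothetical augmentation)."""
    words = text.split()
    if len(words) > 1:
        del words[rng.randrange(len(words))]
    return " ".join(words)


def build_pairs(
    dataset: Dict[str, str],
    q: int,
    p: int,
    get_label: Callable[[str], str],
    seed: int = 0,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (labeled pairs, unlabeled pairs) built from one dataset.

    Labeled pair: (type label, attribute description data).
    Unlabeled pair: (attribute description data, augmented data).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(dataset), q + p)  # Q + P distinct sample objects (assumption)
    labeled_ids, unlabeled_ids = chosen[:q], chosen[q:]
    labeled = [(get_label(i), dataset[i]) for i in labeled_ids]
    unlabeled = [(dataset[i], perturb(dataset[i], rng)) for i in unlabeled_ids]
    return labeled, unlabeled
```

The fixed seed makes the random selection reproducible, which is convenient when the same split needs to be regenerated for a later training run.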

[0108] In another implementation, consider that any dataset is obtained by preliminarily filtering the attribute description data of multiple initial objects with the N object pre-detection strategies. Some initial objects may not be objects of interest, yet their attribute description data happens to include certain keywords from the object pre-detection strategies; in this case, the attribute description data of these initial objects will be wrongly assigned to the dataset. For example, if the object-of-interest type corresponding to a dataset is wildlife, and the attribute description data of an initial object includes the attribute description text "crocodile brand leather shoes," then because this text includes the keyword "crocodile" involved in object pre-detection strategy 1, the attribute description data of this initial object will be assigned to that dataset. Therefore, any dataset may contain attribute description data that is unrelated to the corresponding object-of-interest type. To handle such unrelated attribute description data, a small number of sample objects (for example, 1000) can be given type annotations, and the annotations can be stored in the corresponding database for subsequent model training. Furthermore, during type labeling some sample objects may have highly similar attribute description data; labeling such sample objects repeatedly provides little new information for model training, so it is a redundant operation that reduces labeling efficiency. Based on this, this application proposes a soft-deduplication-based labeling strategy, shown in Figure 5a, for constructing the Q labeled data pairs and P unlabeled data pairs, so that type labeling is performed only on non-repeating sample objects, thereby improving labeling efficiency. Here, "soft deduplication" can be understood as deduplication based on similarity. 
Accordingly, the specific implementation of step S2051 may include the following steps s11-s17:

[0109] s11, The computer device can select attribute description data of multiple sample objects from any dataset to construct a target training set.

[0110] Specifically, the target training set can be constructed using the attribute description data of all sample objects in any dataset. Alternatively, the target training set can be constructed by randomly selecting the attribute description data of multiple sample objects from any dataset. Another option is to divide any dataset into three subsets according to a preset ratio, using the first subset as the target training set (i.e., using the attribute description data of each sample object in the first subset). In this case, the second subset can be used as the validation set for subsequent model validation, and the third subset as the test set for subsequent model testing. The preset ratio can be set according to actual needs; for example, a 2:2:6 ratio can divide any dataset into 10 equal parts, with the first two parts forming the first subset, the third and fourth parts forming the second subset, and the last six parts forming the third subset.
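
The ratio-based split described above can be sketched as follows. This is a minimal illustration, assuming the dataset is an ordered list and that the subsets are taken as consecutive slices; the function name `split_by_ratio` is hypothetical and not part of the described method.

```python
def split_by_ratio(samples, ratio=(2, 2, 6)):
    """Split samples, in their given order, into three consecutive subsets."""
    total = sum(ratio)
    n = len(samples)
    # Boundaries are computed with integer arithmetic so the three parts always cover the list.
    first_end = n * ratio[0] // total
    second_end = n * (ratio[0] + ratio[1]) // total
    return samples[:first_end], samples[first_end:second_end], samples[second_end:]


dataset = [f"sample_{i}" for i in range(10)]
train_set, validation_set, test_set = split_by_ratio(dataset)
print(len(train_set), len(validation_set), len(test_set))  # 2 2 6
```

With the 2:2:6 ratio from the example, the first subset serves as the target training set, the second as the validation set, and the third as the test set.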

[0111] s12, The computer device can perform soft deduplication on multiple sample objects in the target training set based on the attribute description data of each sample object in the target training set, and obtain Q sample objects.

[0112] First, the computer device can determine the object features of each sample object based on the attribute description data of each sample object in the target training set. Specifically, since any attribute description data includes multiple attribute description texts, these texts may contain attribute description texts that cannot uniquely describe the object's attributes, in addition to those that can uniquely describe the corresponding object's attributes. For example, for a product, the product name and product description in its attribute description data can usually uniquely describe the product's relevant object attributes; therefore, the product name and product description can be used as attribute description texts that uniquely describe the product. However, product categories and store names usually correspond to multiple products, and these are attribute description texts that cannot uniquely describe the product. Since determining the object features of the corresponding object based on unique attribute description texts can not only improve the accuracy of object features but also effectively reduce processing resources, the computer device can use the attribute description text in the attribute description data of any sample object that uniquely describes the object's attributes as the target attribute description text for that sample object, for any sample object in the target training set. Next, the target attribute description text of each sample object in the target training set is segmented to obtain the corresponding text words for each sample object; and the word frequency matrix of each sample object is constructed using the corresponding text words. 
Then, the computer device can perform dimensionality reduction hashing (calculating the hash value of the lower dimension) on the word frequency matrix of each sample object to obtain the dimensionality reduction hash value of each sample object; and the dimensionality reduction hash value of each sample object is determined as the object feature of each sample object.

[0113] The computer device can use the Minhash function to perform hash operations on the word frequency matrix of each sample object to obtain the dimensionality-reduced hash value of each sample object. The Minhash function can be understood as a hash function used to calculate low-dimensional hash values; it is a type of Locality Sensitive Hash (LSH) and can be used to quickly estimate the similarity between two features. LSH, mentioned here, is an indexing method for processing high-dimensional vectors. It should be understood that in other embodiments, the computer device can also directly perform word segmentation on the attribute description data of each sample object in the target training set to obtain the corresponding text words for each sample object, thereby performing a series of subsequent calculations to obtain the object features of each sample object. It should also be noted that the above is only an illustrative explanation of how to obtain object features and is not exhaustive; for example, in other embodiments, the computer device can also call a feature extraction model to extract features from the attribute description data of each sample object in the target training set to obtain the object features of each sample object.
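
As a rough illustration of how a MinHash signature can serve as a low-dimensional object feature, the sketch below hashes the set of segmented words with several seeded hash functions and keeps the minimum per function. It is a simplification under stated assumptions: whitespace splitting stands in for a real word segmenter, word frequencies are reduced to word presence, and the names `tokenize`, `minhash_signature`, and `estimated_jaccard` are hypothetical.

```python
import hashlib


def tokenize(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def _seeded_hash(token, seed):
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(tokens, num_perm=64):
    """For each seeded hash function, keep the minimum hash over the token set."""
    token_set = set(tokens)
    if not token_set:
        raise ValueError("cannot compute a MinHash signature of an empty token set")
    return [min(_seeded_hash(t, seed) for t in token_set) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity of the token sets."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


sig_a = minhash_signature(tokenize("crocodile brand leather shoes"))
sig_b = minhash_signature(tokenize("leather shoes crocodile brand"))
print(estimated_jaccard(sig_a, sig_b))  # 1.0, since both texts contain the same words
```

The signature length (`num_perm`) trades accuracy of the similarity estimate against storage and comparison cost.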

[0114] In addition, the computer device can construct a Locality Sensitive Hash Pool (LSH Pool), which includes one or more feature buckets. A feature bucket can be understood as memory or a database used to store object features. After constructing the LSH Pool, the computer device can stream the object features of the sample objects into the feature buckets of the LSH Pool in sequence. Here, "streaming" means that the object features of one sample object are fed in at a time.

[0115] Then, the computer device determines the current object feature of the current sample object to be entered into the Locality Sensitive Hash Pool (LSH Pool), and performs a hash mapping on the current object feature using an LSH function. Based on the hash mapping result, the computer device allocates a target feature bucket in the LSH Pool to the current object feature. Next, the computer device can calculate the feature similarity between the current object feature and each historical object feature already in the target feature bucket, and, based on these feature similarities, detect similar sample objects of the current sample object among the sample objects corresponding to the historical object features. Specifically, for any historical object feature in the target feature bucket, the computer device can determine whether the feature similarity between the current object feature and that historical object feature is greater than a preset threshold; if so, the sample object corresponding to that historical object feature is determined to be a similar sample object of the current sample object; otherwise, it is determined not to be a similar sample object of the current sample object. If a similar sample object is detected, the current object feature is entered into the target feature bucket; if no similar sample object is detected, the current object feature is entered into the target feature bucket, and the current sample object is also added to the set of objects to be labeled.

[0116] It should be noted that when a similar sample object is detected, the computer device can directly determine that the current sample object does not need type labeling. Alternatively, consider the following possibility: the current sample object and a sample object a corresponding to a historical object feature a are not actually similar, but because their attribute description data both contain similar attribute description text, the feature similarity between the current object feature and the historical object feature a exceeds the preset threshold, so that sample object a is mistakenly determined to be a similar sample object of the current sample object. In this case, the current sample object should still be type-labeled. Based on this, to ensure the accuracy of type labeling, after detecting a similar sample object, the computer device can further calculate the object similarity between the current sample object and the similar sample object from their attribute description data, using any distance calculation formula such as edit distance, cosine distance, or Euclidean distance, and then determine from this object similarity whether the two are really similar. If the object similarity is less than the similarity threshold, the current sample object and the similar sample object are considered dissimilar; in this case, the current sample object can be added to the set of objects to be labeled for subsequent type labeling. If the object similarity is greater than or equal to the similarity threshold, the current sample object and the similar sample object are indeed similar, and it can be determined that the current sample object does not need type labeling. 
In other words, in this implementation, for the current sample object, if a similar sample object is detected and the object similarity between the similar sample object and the current sample object is greater than or equal to the similarity threshold, only the current object feature of the current sample object is stored in the LSH Pool. If no similar sample object is detected, or the object similarity between the similar sample object and the current sample object is less than the similarity threshold, the current object feature of the current sample object is stored in the LSH Pool and the current sample object is also added to the set of objects to be labeled.
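
The secondary check can be illustrated with edit distance, one of the distance measures named above. The sketch below is only one possible reading: it assumes the object similarity is defined as one minus the edit distance normalized by the longer text, and the threshold value 0.8 and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def object_similarity(text_a, text_b):
    # Assumption: similarity = 1 - normalized edit distance.
    longest = max(len(text_a), len(text_b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(text_a, text_b) / longest


def needs_labeling(current_text, similar_text, similarity_threshold=0.8):
    """True when the LSH candidate turns out to be dissimilar, so the current object is labeled."""
    return object_similarity(current_text, similar_text) < similarity_threshold


print(edit_distance("kitten", "sitting"))  # 3
```

A candidate that passes the coarse LSH comparison but fails this finer check is treated as a false match, so the current sample object still goes into the set of objects to be labeled.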

[0117] The above process is repeated until the object features of all sample objects in the target training set have been entered into the Locality Sensitive Hash Pool (LSH Pool), at which point the final set of objects to be labeled is obtained. The sample objects in this set are mutually less similar and therefore have greater labeling value. Accordingly, the sample objects in the set of objects to be labeled can be determined as the Q sample objects obtained by soft deduplication of the multiple sample objects, so that type labeling can be performed on each of them.
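
The streaming soft-deduplication loop of steps s11-s12 can be sketched as below. This is a minimal sketch under several assumptions that go beyond the text: MinHash signatures over whitespace-split words serve as object features, the LSH function uses the common banding scheme (so a feature may land in several buckets rather than a single target bucket), each text contains at least one word, and the threshold 0.5 and all names are hypothetical.

```python
import hashlib
from collections import defaultdict


def minhash(tokens, num_perm=32):
    def seeded_hash(token, seed):
        digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return [min(seeded_hash(t, s) for t in set(tokens)) for s in range(num_perm)]


def feature_similarity(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def soft_dedup(texts, threshold=0.5, bands=8):
    """Stream texts through an LSH pool; return indices of the objects to be labeled."""
    buckets = defaultdict(list)  # bucket key -> historical object features
    to_label = []
    for index, text in enumerate(texts):
        signature = minhash(text.lower().split())
        rows = len(signature) // bands
        keys = [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]
        # Compare only with historical features that share a bucket with the current one.
        has_similar = any(
            feature_similarity(signature, old) > threshold
            for key in keys
            for old in buckets[key]
        )
        # The current feature always enters the pool, whether or not a match was found.
        for key in keys:
            buckets[key].append(signature)
        if not has_similar:
            to_label.append(index)
    return to_label


texts = [
    "crocodile brand leather shoes",
    "leather shoes crocodile brand",
    "stainless steel electric kettle",
]
print(soft_dedup(texts))  # [0, 2]: the second text repeats the words of the first
```

Because comparisons are restricted to features in the same bucket, each new object is checked against a small candidate set instead of every earlier object.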

[0118] s13, The computer device can obtain the type label of Q sample objects and treat the Q sample objects as Q labeled sample objects.

[0119] s14, The computer device can construct Q labeled data using the type labels and corresponding attribute description data of the Q labeled sample objects. Optionally, the Q labeled data can also be cached in a database.

[0120] s15, The computer device can select P unlabeled sample objects from the remaining sample objects, i.e., the sample objects among the multiple sample objects other than the Q sample objects.

[0121] In one implementation, the computer device can determine the remaining sample objects (excluding Q sample objects) from a plurality of sample objects, and directly treat each of the remaining sample objects as an unlabeled sample object; in this case, the value of P is equal to the number of sample objects in the remaining sample objects. Alternatively, the computer device can randomly select P sample objects from the remaining sample objects as P unlabeled sample objects; in this case, the value of P can be less than the number of sample objects in the remaining sample objects.

[0122] In another implementation, consider that the class distribution of the remaining sample objects (the multiple sample objects excluding the Q sample objects) may be imbalanced, and that training a model on attribute description data with an imbalanced class distribution can easily lead to poor generalization. Therefore, to improve the subsequent model training effect and enhance the generalization of the trained model, this application proposes a strategy for class balancing (or class equalization) of sample objects based on pseudo-labels to select the P unlabeled sample objects. In this implementation, the specific implementation of step s15 may be as shown in Figure 5b:

[0123] First, supervised training is performed on the baseline object recognition model using the Q labeled data to obtain an initial object recognition model. Specifically, the baseline object recognition model can be invoked to predict the type of each labeled sample object based on the attribute description data of that labeled sample object in the Q labeled data, obtaining an initial type prediction result for each labeled sample object. The initial type prediction result for any labeled sample object can include the predicted probability that the labeled sample object belongs to the object type indicated by each type label in the baseline object recognition model. Then, the type label corresponding to the maximum predicted probability in the initial type prediction result of each labeled sample object can be used as the type prediction label of that labeled sample object. Based on the difference between the type prediction label and the corresponding type label of each labeled sample object, the target loss value produced by the baseline object recognition model on the Q labeled data is calculated, and this target loss value is used for gradient backpropagation to optimize the model parameters of the baseline object recognition model. This process is iterated until the target loss value no longer decreases significantly or the number of iterations reaches a threshold, at which point training stops, and the baseline object recognition model at this point is used as the initial object recognition model (denoted as M0). The target loss value, denoted H_θ(y), can be calculated using the following Formula 1.1:

[0124]

[0125] In Formula 1.1 above, i represents the data index, p_θ(x_i) represents the type prediction label of the i-th labeled sample object, and y_i represents the type label (i.e., the real label) of the i-th labeled sample object, which can be in one-hot format (for example, 0 or 1). m represents the number of data in the current batch. In this embodiment, since all Q labeled data are used in each training pass, the value of m can be equal to Q. In other embodiments, the Q labeled data can be divided into multiple batches, with one batch used for model training at a time; in this case, the value of m is less than Q.
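
As an illustration of a loss computed from one-hot labels y_i, predicted probabilities, and a batch size m, the sketch below uses batch-averaged cross-entropy. This is an assumption suggested by the symbols defined above, not a confirmed form of Formula 1.1; the function name and the small constant `eps` are hypothetical.

```python
import math


def cross_entropy(predicted_probs, one_hot_labels, eps=1e-12):
    """Batch-averaged cross-entropy between predicted probabilities and one-hot labels."""
    m = len(predicted_probs)
    total = 0.0
    for probs, label in zip(predicted_probs, one_hot_labels):
        # Only the entry of the true class contributes, because the label is one-hot.
        total -= sum(y_k * math.log(max(p_k, eps)) for p_k, y_k in zip(probs, label))
    return total / m


# Two labeled sample objects, two type labels (type of interest, other).
loss = cross_entropy([[0.8, 0.2], [0.3, 0.7]], [[1, 0], [0, 1]])
print(round(loss, 4))
```

A smaller value means the predicted probabilities place more mass on the real labels, which is the direction in which gradient backpropagation moves the model parameters.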

[0126] Additionally, the remaining sample objects (excluding Q sample objects) from multiple sample objects can be identified, and each of these remaining sample objects can be considered a candidate sample object. Then, the initial object recognition model can be invoked to predict the type of each candidate sample object based on its attribute description data, and the pseudo-label of each candidate sample object can be determined based on the predicted type results. The type prediction result for any candidate sample object includes the predicted probability that the candidate sample object belongs to the object type indicated by each type label in the initial object recognition model. The computer device can obtain the pseudo-label of any candidate sample object as follows: determine the type label corresponding to the highest predicted probability in the type prediction result of any candidate sample object; if the determined type label is a type label used to indicate the type of interest corresponding to any dataset, then a white label is used as the pseudo-label of the type of any candidate sample object; otherwise, a black label is used as the pseudo-label of the type of any candidate sample object. The white label indicates that any candidate sample object belongs to the type of interest corresponding to any dataset, and the black label indicates that any candidate sample object does not belong to the type of interest corresponding to any dataset.

[0127] For example: Suppose the object of interest for any dataset is an e-cigarette, and the initial object recognition model includes: type label 1 indicating e-cigarette type, and type label 2 indicating other types. If the type prediction result for any candidate sample object is as follows: the prediction probability for type label 1 is 0.3, and the prediction probability for type label 2 is 0.7; then, since the maximum prediction probability in this type prediction result is 0.7, and the type label corresponding to 0.7 is not type label 1 indicating e-cigarette type, the black label can be used as the pseudo-label for the type of any candidate sample object. If the type prediction result for any candidate sample object is as follows: the prediction probability for type label 1 is 0.8, and the prediction probability for type label 2 is 0.2; then, since the maximum prediction probability in this type prediction result is 0.8, and the type label corresponding to 0.8 is type label 1 indicating e-cigarette type, the white label can be used as the pseudo-label for the type of any candidate sample object.
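
The pseudo-labeling rule in the example above can be written as a short function. This is a minimal sketch; the dictionary representation of a type prediction result and the label names are assumptions made for illustration.

```python
def pseudo_label(type_prediction, interest_labels):
    """Return 'white' if the most probable type label indicates the type of interest, else 'black'."""
    best_label = max(type_prediction, key=type_prediction.get)
    return "white" if best_label in interest_labels else "black"


interest = {"type_label_1"}  # type label 1 indicates the e-cigarette type
print(pseudo_label({"type_label_1": 0.3, "type_label_2": 0.7}, interest))  # black
print(pseudo_label({"type_label_1": 0.8, "type_label_2": 0.2}, interest))  # white
```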

[0128] After obtaining the type pseudo-labels for each candidate sample object, type equalization can be performed on each candidate sample object based on its type pseudo-label. Then, based on the type equalization result, P candidate sample objects are selected from all candidate sample objects as P unlabeled sample objects, ensuring that the number of black and white samples among the P unlabeled sample objects is as balanced as possible. Here, a white sample refers to a sample object with a white label, and a black sample refers to a sample object with a black label. Specifically, since the type pseudo-label of any candidate sample object is either black or white, when the computer device performs type equalization on each candidate sample object based on its type pseudo-label, it can count the number of black labels and the number of white labels among the type pseudo-labels of all candidate sample objects. From the candidate sample objects corresponding to the fewer labels, a first number of candidate sample objects are selected. Then, according to a downsampling strategy (such as random selection), a second number of candidate sample objects are selected from the candidate sample objects corresponding to the more numerous labels. For example, if the number of black labels is greater than the number of white labels, a first number of candidate sample objects can be selected from the candidate sample objects corresponding to white labels, and a second number of candidate sample objects can be selected from the candidate sample objects corresponding to black labels according to a downsampling strategy (such as random selection). The second number is greater than or equal to the first number, and the ratio between the second number and the first number must be less than a preset ratio (such as 2). Then, both the first number of candidate sample objects and the second number of candidate sample objects are treated as unlabeled sample objects. 
In this case, the value of P is equal to the sum of the first number and the second number.
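
The type equalization step can be sketched as follows. The sketch assumes that every candidate of the less frequent pseudo-label is kept (the first number), that the more frequent side is downsampled by random selection, and that the second number is the largest count keeping the ratio strictly below the preset ratio; the function name and the seed parameter are hypothetical.

```python
import math
import random


def balance_by_pseudo_label(pseudo_labels, max_ratio=2.0, seed=0):
    """pseudo_labels maps a sample id to 'white' or 'black'; returns the selected sample ids."""
    white = [s for s, label in pseudo_labels.items() if label == "white"]
    black = [s for s, label in pseudo_labels.items() if label == "black"]
    minority, majority = sorted((white, black), key=len)
    first_number = len(minority)  # assumption: keep every minority candidate
    # Largest count that keeps second_number / first_number strictly below max_ratio.
    upper = math.ceil(max_ratio * first_number) - 1
    second_number = min(len(majority), max(first_number, upper))
    downsampled = random.Random(seed).sample(majority, second_number)
    return minority + downsampled


labels = {i: "white" for i in range(3)}
labels.update({i: "black" for i in range(3, 13)})
selected = balance_by_pseudo_label(labels)
print(len(selected))  # 8: three white samples plus five black samples
```

Here P is 8, and the black-to-white ratio of 5:3 stays below the preset ratio of 2.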

[0129] s16, The computer device performs data perturbation augmentation on the attribute description data of each unlabeled sample object to obtain augmented data for each unlabeled sample object.

[0130] In one implementation, augmented data can be obtained through back-translation, a processing method in which a text in language A is translated into language B and the resulting translation is then translated back into language A. Based on this, when any attribute description data includes multiple attribute description texts, each represented in a first language, the computer device, when executing step s16, can select at least one attribute description text from the attribute description data of the p-th unlabeled sample object, where p∈[1, P]. Then, each selected attribute description text is translated into text in a second language, obtaining a translation result for each selected attribute description text. Next, the translation result of each selected attribute description text is translated back into text in the first language, obtaining a back-translation result for each selected attribute description text. Finally, the augmented data of the p-th unlabeled sample object is constructed from the back-translation results of the selected attribute description texts together with the unselected attribute description texts in the attribute description data of the p-th unlabeled sample object.

[0131] For example, as shown in Figure 5c, suppose the attribute description data of the p-th unlabeled sample object includes a product name, product details text, and product image OCR text, all in Chinese. The computer device can first select all attribute description texts (the product name, the product details text, and the product image OCR text) from the attribute description data. Then, it can call a translation interface to translate the three texts from Chinese into English, obtaining the English text corresponding to the product name, to the product details text, and to the product image OCR text. Next, it can translate these three English texts back into Chinese, obtaining the back-translation results corresponding to the product name, the product details text, and the product image OCR text. These three back-translation results constitute the augmented data of the p-th unlabeled sample object.
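
The back-translation flow can be sketched with a pluggable translation callable. The `translate(text, source, target)` signature is a hypothetical stand-in for whatever translation interface is actually called, and the toy lookup table (with romanized placeholders in place of Chinese text) exists only to make the example runnable.

```python
def back_translate(attribute_texts, selected_fields, translate,
                   first_language="zh", second_language="en"):
    """Translate selected texts to the second language and back; keep the other texts unchanged."""
    augmented = dict(attribute_texts)
    for field in selected_fields:
        forward = translate(attribute_texts[field], first_language, second_language)
        augmented[field] = translate(forward, second_language, first_language)
    return augmented


# Toy translator: a lookup table standing in for a real translation interface.
toy_table = {
    ("dianziyan", "en"): "e-cigarette",
    ("e-cigarette", "zh"): "dianzi xiangyan",
}


def toy_translate(text, source, target):
    return toy_table.get((text, target), text)


original = {"product_name": "dianziyan", "product_details": "quanxin"}
print(back_translate(original, ["product_name"], toy_translate))
```

The round trip typically returns a paraphrase rather than the original wording, which is what makes the result usable as a perturbed copy of the same sample object.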

[0132] In another implementation, the computer device can use synonym replacement to obtain augmented data. Synonym replacement refers to randomly selecting n non-stop words in the text and replacing each selected non-stop word with its corresponding synonym. Non-stop words refer to words other than stop words, and stop words may include, but are not limited to, English characters, numbers, mathematical characters, punctuation marks, and frequently used single Chinese characters. Based on this, when executing step s16, the computer device can select at least one attribute description text from the attribute description data of the p-th unlabeled sample object. It can replace one or more non-stop words in each selected attribute description text with their corresponding synonyms to obtain at least one replaced attribute description text. Using at least one replaced attribute description text and the unselected attribute description text, augmented data for the p-th unlabeled sample object is constructed.

[0133] In another implementation, the computer device can also use random insertion to obtain augmented data. Random insertion refers to the process of randomly selecting a non-stop word in the text, randomly choosing its synonym, and inserting it at any position in the text. In this implementation, the specific implementation of step s16 is similar to that of the synonym replacement implementation, and will not be repeated here. In addition, for any text, the random insertion action can be performed once or repeated multiple times, without limitation.

[0134] In another implementation, the computer device may also use a random swapping method to obtain augmented data. Random swapping refers to the process of randomly selecting two words in the text and exchanging their positions. In this implementation, the specific implementation of step s16 is similar to that corresponding to synonym replacement, and will not be repeated here. Furthermore, for any given text, the random swapping action can be performed once or repeatedly, without limitation.

[0135] It should be noted that the above are just a few examples of augmentation methods, and not an exhaustive list. For example, computer devices can also use random deletion to obtain augmented data. Random deletion refers to the process of randomly deleting one or more words from the text.
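
The four word-level perturbations described above (synonym replacement, random insertion, random swapping, and random deletion) can be sketched as follows. The sketch assumes the text is already segmented into a word list, and the synonym dictionary, stop-word set, and deletion probability are hypothetical inputs supplied by the caller.

```python
import random


def synonym_replace(words, synonyms, stop_words, n, rng):
    """Replace up to n randomly chosen non-stop words with one of their synonyms."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w not in stop_words and w in synonyms]
    for i in rng.sample(candidates, min(n, len(candidates))):
        out[i] = rng.choice(synonyms[out[i]])
    return out


def random_insert(words, synonyms, stop_words, rng):
    """Insert a synonym of a random non-stop word at a random position."""
    out = list(words)
    candidates = [w for w in out if w not in stop_words and w in synonyms]
    if candidates:
        synonym = rng.choice(synonyms[rng.choice(candidates)])
        out.insert(rng.randrange(len(out) + 1), synonym)
    return out


def random_swap(words, rng):
    """Exchange the positions of two randomly chosen words."""
    out = list(words)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(words, p, rng):
    """Delete each word with probability p, always keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)
words = ["cheap", "e-cigarette", "for", "sale"]
synonyms = {"cheap": ["inexpensive"]}
print(synonym_replace(words, synonyms, {"for"}, 1, rng))
# ['inexpensive', 'e-cigarette', 'for', 'sale']
```

Each function returns a new list, so the original attribute description text stays available for pairing with its augmented version.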

[0136] s17, The computer device uses the attribute description data and corresponding augmented data of each unlabeled sample object to construct P unlabeled data pairs.

[0137] S2052, the baseline object recognition model is invoked to predict the type of the corresponding labeled sample object based on the attribute description data in each labeled data, and the target type prediction result of each labeled sample object is obtained.

[0138] As mentioned above, any attribute description data includes attribute description text under multiple attribute dimensions, and the benchmark object recognition model includes a feature extraction network, a feature joint layer, and a feedforward network corresponding to each attribute dimension. Therefore, the specific implementation of step S2052 can be as follows: For any labeled sample object, each feature extraction network in the benchmark object recognition model is invoked to independently extract features from the attribute description text under the corresponding attribute dimension in the attribute description data of the corresponding labeled data, obtaining the text features of each attribute description text. Then, the feature joint layer is invoked to perform feature joint processing on the text features of each attribute description text according to the attention mechanism, obtaining joint features. Next, the feedforward network is invoked to perform type prediction on any labeled sample object based on the joint features, obtaining the target type prediction result for any labeled sample object. The target type prediction result for any labeled sample object includes: the predicted probability that any labeled sample object belongs to the object type indicated by each type label in the benchmark object recognition model.
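
The data flow of step S2052 (per-dimension text features, attention-based joining, feedforward prediction) can be illustrated with a toy numeric sketch. Everything concrete here is an assumption: the text features are given as plain vectors instead of being produced by feature extraction networks, the attention is a simple dot product against a query vector, and the feedforward network is a single linear layer followed by softmax.

```python
import math


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_join(text_features, query):
    """Weight each attribute dimension's text feature by its attention score and sum them."""
    scores = [sum(f_k * q_k for f_k, q_k in zip(feature, query)) for feature in text_features]
    weights = softmax(scores)
    dim = len(query)
    joint = [sum(w * feature[k] for w, feature in zip(weights, text_features)) for k in range(dim)]
    return joint, weights


def feedforward(joint, weight_matrix, bias):
    """One linear layer plus softmax: returns a predicted probability per type label."""
    logits = [sum(w * x for w, x in zip(row, joint)) + b for row, b in zip(weight_matrix, bias)]
    return softmax(logits)


# Toy text features for three attribute dimensions (e.g. name, details, OCR text).
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
joint, weights = attention_join(features, query=[1.0, 0.0])
probabilities = feedforward(joint, weight_matrix=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(weights, probabilities)
```

The attention weights let the model emphasize the attribute dimensions that are most informative for a given sample object before the type prediction is made.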

[0139] S2053, according to the type-consistency prediction objective, the baseline object recognition model is invoked to perform type prediction on the corresponding unlabeled sample object based on the attribute description data and the corresponding augmented data in each unlabeled data pair, obtaining two type prediction results for each unlabeled sample object.

[0140] The prediction objective of type consistency refers to the fact that the type prediction results obtained based on the attribute description data in the same unlabeled data pair should have a consistent probability distribution with the type prediction results obtained based on the corresponding augmented data. The prediction objective of type consistency is equivalent to setting a target for the model's generalization ability and using a large number of unlabeled data pairs to guide the model towards this target. In this embodiment, the MSE function (a function that calculates the mean of the sum of squares of the errors between corresponding points between two data points) can be set as the consistency prediction loss function corresponding to the unlabeled data pair, so that the training process optimizes the model parameters with the goal of reducing the value of this consistency loss function.

[0141] It should be noted that the way the baseline object recognition model is invoked to perform type prediction based on the attribute description data in any unlabeled data pair, and the way it performs type prediction based on the augmented data in that unlabeled data pair, are similar to the specific implementation of the aforementioned step S2052, and will not be repeated here.

[0142] S2054, based on the target type prediction result and corresponding type label for each labeled sample object, and the difference between the two type prediction results for each unlabeled sample object, optimize the model parameters of the baseline object recognition model to obtain the target object recognition model for the target object type corresponding to any dataset. In a specific implementation, step S2054 may include the following steps s21-s24:

[0143] s21, The computer device can determine the labeled loss value of the benchmark object recognition model based on the target type prediction result and the corresponding type label for each labeled sample object.

[0144] As mentioned above, the target type prediction result for any labeled sample object includes the predicted probability that any labeled sample object belongs to the object type indicated by each type label in the baseline object recognition model. Based on this, in one embodiment, the computer device can determine the labeled loss value of the baseline object recognition model according to the difference between the type label corresponding to the maximum predicted probability in each of the Q target type prediction results and the corresponding type label of the labeled sample object. The specific calculation method can be found in Formula 1.1 above, and will not be repeated here.

[0145] In another implementation, considering that during joint training with labeled and unlabeled data, the model may quickly overfit to the labeled data due to the limited amount of labeled data, this application proposes a signal mitigation strategy to prevent rapid overfitting of the model to labeled data during training. The basic principle of signal mitigation is to exclude labeled data with overly confident predictions of labeled objects when calculating the labeled loss value during training; that is, to exclude labeled data with excessively high confidence (overly high prediction probability). The error of this portion of labeled data cannot be backpropagated, thus preventing the model from further overfitting to this labeled data. Specifically, at time t during training, a first threshold ηt is set, where 1 / K ≤ ηt ≤ 1, where K is the number of classes. When the maximum confidence (i.e., the maximum prediction probability) pθ(y*|x) calculated based on a certain labeled data is greater than the first threshold ηt, that labeled data is removed from the process of calculating the labeled loss value, and the labeled loss value is calculated only based on the remaining labeled data in the current batch.

[0146] Based on this, the specific implementation of step s21 can be as follows: traverse the target type prediction results of each labeled sample object obtained by prediction; if the maximum prediction probability in the current target type prediction result is greater than the first threshold, then determine the current target type prediction result as a buffer signal; after all Q target type prediction results have been traversed, use the determined buffer signals to perform signal buffering processing on the Q target type prediction results to remove each buffer signal from the Q target type prediction results; then, determine the labeled loss value of the benchmark object recognition model based on the difference between the type label corresponding to the maximum prediction probability in the remaining target type prediction results and the type label of the corresponding labeled sample object.
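
The signal mitigation step in s21 can be sketched as follows. This is a minimal illustration, not the application's own implementation: Formula 1.1 is not reproduced here, so an average cross-entropy is assumed as the labeled loss, and the function name `labeled_loss_with_mitigation` and its parameters are hypothetical.

```python
import math


def labeled_loss_with_mitigation(predictions, labels, eta_t):
    """Average cross-entropy over labeled samples that are not buffer signals.

    predictions: one probability distribution (list of floats) per labeled sample
    labels:      integer class index (type label) per labeled sample
    eta_t:       first threshold at the current step, with 1/K <= eta_t <= 1
    """
    kept = [
        (probs, label)
        for probs, label in zip(predictions, labels)
        # A prediction whose maximum probability exceeds eta_t is a buffer
        # signal and is removed from the loss calculation.
        if max(probs) <= eta_t
    ]
    if not kept:
        return 0.0  # every sample was removed, so there is no error to backpropagate
    eps = 1e-12  # guards against log(0)
    # Cross-entropy is an assumed stand-in for Formula 1.1.
    return sum(-math.log(probs[label] + eps) for probs, label in kept) / len(kept)
```

With a threshold of 0.8, a sample predicted at probability 0.95 is dropped from the batch and only the less confident samples contribute to the labeled loss value.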

[0147] s22, The computer device determines the unlabeled loss value of the baseline object recognition model based on the difference between the two type prediction results for each unlabeled sample object.

[0148] The execution order of steps s21 and s22 is not limited in this embodiment. Step s21 can be executed first, followed by step s22; step s22 can be executed first, followed by step s21; or steps s21 and s22 can be executed simultaneously.

[0149] In one implementation, the computer device may employ a consistency loss function to calculate the loss value based on the two type prediction results for each unlabeled sample object, thereby obtaining a consistency prediction loss value, denoted Uθ(u′, u). The formula for the consistency prediction loss function can be found in Formula 1.2 below:

[0150]

[0151] In Formula 1.2 above, i represents the data index; pθ(u_i) represents the type prediction result corresponding to the attribute description data in the i-th unlabeled data pair; pθ(u′_i) represents the type prediction result corresponding to the augmented data in the i-th unlabeled data pair; and n represents the number of unlabeled sample objects participating in the calculation of the consistency loss value.
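
A consistency loss over the n unlabeled data pairs is commonly computed as the mean KL divergence between the two type prediction results. The sketch below assumes that form, which may differ from Formula 1.2; the function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete probability distributions of equal length."""
    return sum(
        p_k * math.log((p_k + eps) / (q_k + eps))
        for p_k, q_k in zip(p, q)
        if p_k > 0  # terms with p_k == 0 contribute nothing
    )


def consistency_loss(original_preds, augmented_preds):
    """Mean divergence between p_theta(u_i) and p_theta(u'_i) over n pairs."""
    n = len(original_preds)
    return sum(
        kl_divergence(p, q) for p, q in zip(original_preds, augmented_preds)
    ) / n
```

The loss is zero when the model predicts the same distribution for the attribute description data and its augmented version, and grows as the two predictions diverge.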

[0152] After obtaining the consistency prediction loss value, the computer device can directly use it as the unlabeled loss value of the baseline object recognition model. Alternatively, the computer device can also use an information entropy loss function to calculate a loss value based on the two type prediction results for each unlabeled sample object, obtain the information entropy loss value, and integrate the consistency prediction loss value and the information entropy loss value to obtain the unlabeled loss value of the baseline object recognition model. Here, one type prediction result corresponds to one probability distribution, and the information entropy loss function can be, for example, the KL divergence function. KL divergence (relative entropy) measures the difference between two probability distributions; it equals the cross-entropy of the two distributions minus the information entropy (Shannon entropy) of the first distribution.

[0153] In another implementation, when labeled data is scarce, the model's understanding of the samples is insufficient, which may result in a relatively flat predicted distribution for unlabeled data. Consequently, when calculating the overall model loss value, the main contribution comes from the labeled data, which contradicts the idea of using unlabeled data for model training. Therefore, to improve model training performance, this application proposes a signal sharpening strategy to reduce the flatness of the predicted distribution of unlabeled data, so that a richer predicted distribution is used to calculate the unlabeled loss value, which is more beneficial for model training.

[0154] Based on this, the specific implementation of step s22 can be as follows: the two type prediction results of each unlabeled sample object are used as the two type signals of that unlabeled sample object; then, according to the signal sharpening strategy, signal sharpening processing is performed on the two type signals of each unlabeled sample object to obtain the signal sharpening result; and the unlabeled loss value of the benchmark object recognition model is determined based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object. As mentioned above, a type signal of any unlabeled sample object includes the predicted probability that the unlabeled sample object belongs to the object type indicated by each type label in the benchmark object recognition model. Based on this, the signal sharpening strategy can include one or more of the following: ① masking processing based on prediction probability (or masking processing based on confidence); ② minimizing the information entropy of the type signal obtained based on augmented data. Masking processing based on prediction probability means that unlabeled sample objects with poor prediction performance (i.e., unlabeled sample objects whose maximum prediction probability is less than the second threshold) are excluded from the calculation of the unlabeled loss value. Minimizing the information entropy of the type signal obtained based on augmented data means that, when calculating the unlabeled loss value, the information entropy of the type signal obtained based on the augmented data is added as a term, so that training drives the predictions on the augmented data toward lower information entropy.

[0155] When the signal sharpening strategy includes masking processing based on prediction probability, the computer device can perform signal sharpening processing on the two type signals of each unlabeled sample object as follows to obtain the signal sharpening result: traverse the P unlabeled sample objects; if the maximum prediction probability of at least one of the two type signals of the currently traversed unlabeled sample object is less than the second threshold, mask the current unlabeled sample object and its two type signals; after all P unlabeled sample objects have been traversed, add each masked unlabeled sample object to the signal sharpening result.

[0156] Accordingly, in this implementation, the method for determining the unlabeled loss value of the benchmark object recognition model based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object may include: taking all unlabeled sample objects among the P unlabeled sample objects that are not in the signal sharpening result as valid unlabeled sample objects; calculating the type consistency loss value corresponding to each valid unlabeled sample object based on the difference between the two type signals of that valid unlabeled sample object; and determining the unlabeled loss value of the benchmark object recognition model based on the type consistency loss value corresponding to each valid unlabeled sample object.
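
The masking variant of signal sharpening can be sketched as follows. The pairwise loss (`squared_difference`) and the averaging over valid samples are assumptions made for illustration, since the text does not fix how the per-sample type consistency loss values are aggregated; all function names are hypothetical.

```python
def squared_difference(signal_a, signal_b):
    """Stand-in pairwise loss (an assumption; any consistency loss could be used)."""
    return sum((a - b) ** 2 for a, b in zip(signal_a, signal_b))


def split_by_confidence(signal_pairs, second_threshold):
    """Return (valid, masked) index lists for the P unlabeled sample objects.

    signal_pairs: list of (original_signal, augmented_signal), each a list of
                  predicted probabilities.
    """
    valid, masked = [], []
    for index, (original, augmented) in enumerate(signal_pairs):
        # Mask the sample if at least one of its two type signals is not
        # confident enough.
        if max(original) < second_threshold or max(augmented) < second_threshold:
            masked.append(index)
        else:
            valid.append(index)
    return valid, masked


def masked_consistency_loss(signal_pairs, second_threshold,
                            pair_loss=squared_difference):
    """Unlabeled loss computed over the valid (unmasked) samples only."""
    valid, _ = split_by_confidence(signal_pairs, second_threshold)
    if not valid:
        return 0.0
    # Averaging is an assumption; a plain sum would also match the description.
    return sum(pair_loss(*signal_pairs[i]) for i in valid) / len(valid)
```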

[0157] When the signal sharpening strategy includes minimizing the information entropy of the type signal obtained based on augmented data, the computer device can perform signal sharpening processing on the two type signals of each unlabeled sample object as follows to obtain the signal sharpening result: from the two type signals of any unlabeled sample object, determine the type signal predicted based on the augmented data of that unlabeled sample object; calculate the information entropy of the augmented data of that unlabeled sample object according to the type labels in the determined type signal and the corresponding prediction probabilities; and add the calculated information entropy to the signal sharpening result.

[0158] Accordingly, in this implementation, the method for determining the unlabeled loss value of the benchmark object recognition model based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object may include: calculating the type consistency loss value corresponding to each unlabeled sample object based on the difference between the two type signals of that unlabeled sample object; and summing the information entropy in the signal sharpening result and the type consistency loss value corresponding to each unlabeled sample object to obtain the unlabeled loss value of the benchmark object recognition model.
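
The entropy-minimization variant can be sketched as follows: the Shannon entropy of each augmented-data type signal is added to the summed type consistency loss. The pairwise loss is passed in as a parameter because its exact form is not fixed here; the function names are hypothetical.

```python
import math


def entropy(probs, eps=1e-12):
    """Shannon entropy (natural log) of one type signal."""
    return -sum(p * math.log(p + eps) for p in probs if p > 0)


def unlabeled_loss_with_entropy(signal_pairs, pair_loss):
    """Sum of per-sample consistency losses plus augmented-signal entropies.

    signal_pairs: list of (original_signal, augmented_signal)
    pair_loss:    callable(original_signal, augmented_signal) -> float
    """
    consistency = sum(pair_loss(orig, aug) for orig, aug in signal_pairs)
    # Only the signal predicted from the augmented data contributes entropy.
    entropy_term = sum(entropy(aug) for _, aug in signal_pairs)
    return consistency + entropy_term
```

A flat augmented prediction such as [0.5, 0.5] adds about 0.693 to the loss, while a sharp prediction such as [1.0, 0.0] adds almost nothing, so minimizing the total pushes predictions on augmented data toward sharper distributions.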

[0159] It should be noted that in practical applications, the two strategies mentioned above—masking based on prediction probability and minimizing the information entropy of the type signal obtained from augmented data—can be used in combination. That is, the signal sharpening strategy can simultaneously include: masking based on prediction probability and minimizing the information entropy of the type signal obtained from augmented data. In this case, when the computer device performs signal sharpening processing on the two type signals of each unlabeled sample object according to the signal sharpening strategy to obtain the signal sharpening result, it can first traverse P unlabeled sample objects. If, in the two type signals of the currently traversed unlabeled sample object, at least one type signal has a maximum prediction probability less than a second threshold, then masking processing is performed on the current unlabeled sample object and its corresponding two type signals. After all P unlabeled sample objects have been traversed, each unlabeled sample object that has undergone masking processing is added to the signal sharpening result. Furthermore, the information entropy of the augmented data of each unlabeled object that has not been masked is calculated; the calculated information entropy is added to the signal sharpening result; the method for calculating the information entropy of the augmented data of any unlabeled object that has not been masked can be found in the method for calculating the information entropy of the augmented data of any unlabeled sample object mentioned above, and will not be repeated here.

[0160] Accordingly, in this implementation, the method for determining the unlabeled loss value of the benchmark object recognition model based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object may include: taking all unlabeled sample objects among the P unlabeled sample objects that are not in the signal sharpening result as valid unlabeled sample objects; calculating the type consistency loss value corresponding to each valid unlabeled sample object based on the difference between the two type signals of that valid unlabeled sample object; and summing the type consistency loss value corresponding to each valid unlabeled sample object and the information entropy in the signal sharpening result to obtain the unlabeled loss value of the benchmark object recognition model.

[0161] s23, The computer device can perform joint loss value calculation on the labeled loss value and the unlabeled loss value to obtain the model loss value of the benchmark object recognition model.

[0162] In a specific implementation, the computer device can determine a first weight for the labeled loss value and a second weight for the unlabeled loss value. Then, based on the first and second weights, the labeled and unlabeled loss values can be weighted and summed to obtain the model loss value. In this embodiment, the relationship between the first and second weights is not limited; for example, the first weight can be greater than the second weight. Let the first weight be denoted by λ, the second weight be set to 1, and let the labeled loss value be denoted by H, the unlabeled loss value by U, and the model loss value by L(y). Then, the formula for calculating the model loss value can be found in Formula 1.3 below:

[0163] L(y) = U + λH (Formula 1.3)

[0164] s24, The computer device can optimize the model parameters of the benchmark object recognition model based on the model loss value.

[0165] In one implementation, the computer device can calculate the gradient of the benchmark object recognition model based on the model loss value to obtain the backpropagation gradient of the benchmark object recognition model, and determine the historical learning rate of the benchmark object recognition model, which refers to the learning rate most recently used by the benchmark object recognition model before the current optimization of the benchmark object recognition model; then, the model parameters of the benchmark object recognition model can be optimized based on the backpropagation gradient and the historical learning rate.
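
Steps s23 and s24 can be sketched together as a weighted sum followed by a parameter update. The plain gradient-descent update is an assumption (the text does not name an optimizer), and the gradients are taken as given rather than computed; both function names are hypothetical.

```python
def joint_loss(labeled_loss, unlabeled_loss, first_weight):
    """Formula 1.3: L = U + lambda * H, with the second weight fixed to 1."""
    return unlabeled_loss + first_weight * labeled_loss


def gradient_step(params, grads, learning_rate):
    """One plain gradient-descent update (assumed optimizer).

    params, grads: lists of floats of equal length, where grads holds the
    backpropagation gradient of the model loss with respect to each parameter.
    """
    return [p - learning_rate * g for p, g in zip(params, grads)]
```

For example, with H = 0.5, U = 0.2 and λ = 2, the model loss value is 0.2 + 2 × 0.5 = 1.2.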

[0166] In another implementation, considering that a limited amount of labeled data may cause the model to fall into a local optimum early, which is detrimental to model training, this application proposes a model parameter update method based on a learning rate decay strategy to improve model training performance, as shown in Figure 5d. Specifically, the computer device can calculate the gradient of the benchmark object recognition model based on the model loss value to obtain the backpropagation gradient of the benchmark object recognition model, determine the historical learning rate of the benchmark object recognition model, and perform regular decay processing on the historical learning rate to obtain the target learning rate; then, based on the backpropagation gradient and the target learning rate, it optimizes the model parameters of the benchmark object recognition model. The computer device can use any of the following decay methods to regularly decay the historical learning rate: CDRLR (Cosine Decay Restarts Learning Rate), cyclic learning rate decay, or polynomial learning rate decay. As shown in Figure 5e, the cosine cyclic decay learning rate is periodic, so the learning rate changes regularly. This regular change helps the model escape local extrema and find better extrema.
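
A cosine decay schedule with restarts can be sketched as below. The parameterization (`first_period`, `t_mul`, `min_lr`) follows a common convention and is an assumption; the text does not specify the period length or how it changes between restarts.

```python
import math


def cosine_decay_restarts(step, base_lr, first_period, t_mul=2.0, min_lr=0.0):
    """Learning rate at `step` under cosine decay with periodic restarts.

    Within each period the rate falls from base_lr toward min_lr along a
    half cosine; at the start of the next period it jumps back to base_lr.
    Each period is t_mul times as long as the previous one.
    """
    period = first_period
    start = 0
    # Locate the period that contains `step`.
    while step >= start + period:
        start += period
        period = max(1, int(period * t_mul))
    progress = (step - start) / period  # in [0, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With `base_lr=0.1` and `first_period=10`, the rate starts at 0.1, reaches 0.05 at step 5, and returns to 0.1 at step 10 when the second, longer period begins.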

[0167] It should be noted that after optimizing the baseline object recognition model through the above steps, the model's performance can be tested using a pre-defined test set. If the test results exceed the set performance threshold, the test is considered successful. In this case, the optimized baseline object recognition model can be automatically used as the target object recognition model for any dataset's corresponding object type, and the target object recognition model can be deployed online. If the test fails, the operations and maintenance personnel can determine, based on the actual situation, whether to increase the training samples for further optimization of the optimized baseline object recognition model or adjust the performance threshold until the test is successful.

[0168] In addition, in the method embodiment illustrated in Figure 4 above, steps S2052-S2054 are explained using a baseline object recognition model as an example. In practical applications, if the baseline object recognition model has been trained in a supervised manner to obtain an initial object recognition model when determining P unlabeled objects in step S2051, the computer device can also execute steps S2052-S2054 based on the initial object recognition model. That is, in this case, when the computer device executes steps S2052 and S2053, it calls the initial object recognition model to perform type prediction. Furthermore, when executing step S2054, it optimizes the model parameters of the initial object recognition model based on the target type prediction result and corresponding type label for each labeled sample object, as well as the difference between the two type prediction results for each unlabeled sample object, to obtain a target object recognition model for the type of interest corresponding to any dataset.

[0169] Based on the above description, the embodiments of this application can have the following beneficial effects: ① By performing soft deduplication on multiple sample objects, highly similar sample objects can be accurately removed, so that type labeling is performed only on the sample objects remaining after soft deduplication. This reduces redundant work in the labeling process and effectively improves labeling efficiency. ② By designing a semi-supervised learning framework, the model's dependence on labeled data can be significantly reduced while ensuring the model training effect. This allows for the training of a target object recognition model with superior performance even with a small amount of labeled data. ③ Through various processing methods such as signal sharpening, signal mitigation, and learning rate decay, the model's dependence on labeled data can be significantly reduced while further improving the model training effect.

[0170] In another embodiment, this application also proposes a policy- and model-based object recognition method; in this embodiment, the implementation of the policy- and model-based object recognition method by a computer device is still used as an example for illustration. Referring to Figure 6, the policy- and model-based object recognition method may include the following steps S601-S605:

[0171] S601, obtain the target attribute description data of the target object to be identified and N object pre-detection strategies, where N is a positive integer; an object pre-detection strategy is used to indicate: one or more keywords that need to be associated with the attribute description data of an object under a certain object type.

[0172] S602, based on the keywords indicated by the N object pre-detection strategies, use the target attribute description data to perform strategy hit detection on the N object pre-detection strategies.

[0173] Since the principle of using target attribute description data to perform policy hit detection on N object pre-detection strategies based on each keyword indicated by N object pre-detection strategies is the same as the principle of using attribute description data of each initial object to perform policy hit detection on N object pre-detection strategies based on each keyword indicated by N object pre-detection strategies, the specific implementation of step S602 can be found in the relevant description of step S202 in the aforementioned application embodiments, and will not be repeated here.

[0174] S603, if the target attribute description data matches at least one object pre-detection strategy, then determine the target object recognition model used for type prediction of the target object.

[0175] In practical implementation, if the target attribute description data matches at least one object pre-detection strategy, then the target object recognition model corresponding to the matched object pre-detection strategy under the target object type can be determined as the target object recognition model for type prediction of the target object. It should be understood that the target object recognition model under any target object type can be obtained using the method embodiment shown in Figure 2 above.

[0176] S604, the determined target object recognition model is invoked to predict the type of the target object based on the target attribute description data, and the type prediction result of the target object is obtained.

[0177] In a specific implementation, the target attribute description data may include target attribute description text under multiple attribute dimensions, and the determined target object recognition model includes a feature extraction network, a feature joint layer, and a feedforward network corresponding to each attribute dimension. Accordingly, the specific implementation of step S604 can be as follows: Each feature extraction network in the determined target object recognition model is invoked to independently extract features from the target attribute description text under the corresponding attribute dimension in the target attribute description data, obtaining text features for each target attribute description text; the feature joint layer is invoked to perform feature joint processing on the text features of each target attribute description text according to an attention mechanism, obtaining joint features; the feedforward network is invoked to perform type prediction on the target object based on the joint features, obtaining the type prediction result of the target object. The type prediction result of the target object includes the predicted probability that the target object belongs to the object type indicated by each type label in the determined target object recognition model.

[0178] Taking the target object as the target product as an example, when executing step S604, the computer device can call each feature extraction network in the determined target object recognition model to independently extract features from the product name, product text (such as product details text + product image OCR text), product store name, and product category in the target attribute description data of the target product. Then, the feature joint layer is called to perform feature joint processing on the text features output by each feature extraction network according to the attention mechanism. Finally, the feedforward network is called to predict the type of the target product based on the joint features output by the feature joint layer, and the type prediction result of the target product is obtained.

[0179] S605, determine whether the target object is an object of interest based on the type prediction result of the target object.

[0180] In the specific implementation, the maximum predicted probability is determined from the type prediction results of the target object. If the type label corresponding to the determined maximum predicted probability is a type label used to indicate the type of the object of interest, then the target object is determined to be an object of interest, and the type of the object of interest to which the target object belongs is the type of the object of interest indicated by the type label corresponding to the maximum predicted probability. If the type label corresponding to the determined maximum predicted probability is a type label used to indicate the type of other objects, then the target object is determined not to be an object of interest.
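
The decision in step S605 can be sketched as follows. Representing the type prediction result as a mapping from type label to probability is an assumption for illustration, and the label names in the usage note are hypothetical.

```python
def classify_target(type_probs, interest_labels):
    """Decide whether the target object is an object of interest.

    type_probs:      dict mapping each type label to its predicted probability
    interest_labels: set of labels that indicate object-of-interest types
    Returns (is_object_of_interest, type_of_interest_or_None).
    """
    # Pick the type label with the maximum predicted probability.
    best_label = max(type_probs, key=type_probs.get)
    if best_label in interest_labels:
        return True, best_label
    # The best label indicates the "other objects" type.
    return False, None
```

For instance, `classify_target({"type_a": 0.7, "other": 0.3}, {"type_a"})` returns `(True, "type_a")`, whereas swapping the two probabilities returns `(False, None)`.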

[0181] Optionally, in order for the model to be optimized over the long term, this application also proposes a model improvement mechanism based on automatic learning and optimization of feedback data, so that the target object recognition model has the ability to perform custom optimization and can automatically improve itself over the long term.

[0182] Based on this, the computer device can also obtain multiple feedback results for the determined target object recognition model. One feedback result indicates that the type prediction result obtained by the determined target object recognition model based on the attribute description data of an object is inaccurate. Then, the computer device can perform a credibility test on each of the multiple feedback results to filter out the credible feedback results. Specifically, for any feedback result, the computer device can call the determined target object recognition model to perform type prediction on the corresponding object based on the attribute description data corresponding to that feedback result, and obtain the type prediction result of the corresponding object. If the maximum prediction probability in the type prediction result of the corresponding object is less than a third threshold, the corresponding object is determined to be a misjudged object, and that feedback result can be marked as a credible feedback result. If the maximum prediction probability in the type prediction result of the corresponding object is greater than or equal to the third threshold, the attribute description data corresponding to that feedback result is sent to the operation and maintenance personnel for verification to determine whether the feedback result is credible. If it is determined to be credible, the feedback result is marked as a credible feedback result; otherwise, it is marked as an unreliable feedback result.
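
The credibility test can be sketched as a simple triage. The `predict` callable stands in for the determined target object recognition model and, like the function name, is hypothetical; manual verification by operation and maintenance personnel is represented only by a review queue.

```python
def triage_feedback(feedback_items, predict, third_threshold):
    """Split feedback results into credible ones and ones needing manual review.

    feedback_items:  iterable of attribute description data, one per feedback result
    predict:         callable(item) -> list of predicted probabilities
    third_threshold: confidence below which the model is treated as having misjudged
    """
    credible, needs_review = [], []
    for item in feedback_items:
        probs = predict(item)
        if max(probs) < third_threshold:
            # Low confidence: the object is treated as misjudged, so the
            # feedback is accepted as credible without manual checking.
            credible.append(item)
        else:
            # High confidence: the model and the feedback disagree, so the
            # item goes to operation and maintenance personnel.
            needs_review.append(item)
    return credible, needs_review
```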

[0183] After selecting reliable feedback results, the computer device can select attribute description data for one or more objects from the attribute description data corresponding to the reliable feedback results. Specifically, the computer device can divide the attribute description data corresponding to the reliable feedback results into a test set, a validation set, and a training set according to a ratio (e.g., a 2:2:6 ratio), thereby selecting the attribute description data for each object in the test set. Alternatively, the computer device can directly and randomly select attribute description data for one or more objects from the attribute description data corresponding to the reliable feedback results. Then, the computer device can determine the type label for each selected object and add the type label and corresponding attribute description data of each selected object to the labeled dataset of the determined target object recognition model. Based on the added labeled dataset, the determined target object recognition model can be adaptively optimized. The labeled dataset of the determined target object recognition model is the set of Q labeled data involved in the aforementioned optimization process to obtain the determined target object recognition model. As shown in Figure 7, the computer device can adaptively optimize the determined target object recognition model based on the added labeled dataset as follows: using the added labeled dataset and the P unlabeled data pairs involved in the process of optimizing the determined target object recognition model, the determined target object recognition model is trained again in a semi-supervised manner to obtain the optimized target object recognition model.
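
The ratio-based division of the feedback data can be sketched as follows. Shuffling before splitting and the fixed seed are assumptions added for reproducibility; the function name is hypothetical.

```python
import random


def split_by_ratio(items, ratios=(2, 2, 6), seed=0):
    """Split items into (test, validation, training) sets by the given ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    total = sum(ratios)
    n_test = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    training = shuffled[n_test + n_val:]  # the remainder goes to training
    return test, validation, training
```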

[0184] Furthermore, after obtaining the optimized target object recognition model, the model performance can be tested. If the performance of the optimized target object recognition model is better than or equal to that of the unoptimized target object recognition model, the optimized target object recognition model will be deployed. If the performance of the optimized target object recognition model is lower than (worse than) that of the unoptimized target object recognition model, the optimized target object recognition model will not be deployed, and a message will be sent to the operations and maintenance personnel to investigate the cause.

[0185] This application embodiment sets up N object pre-detection strategies. Each object pre-detection strategy indicates one or more keywords that need to be associated with the attribute description data of an object under a certain type of object of interest. After obtaining the target attribute description data of the target object to be identified, the strategy hit detection can be performed on the target attribute description data according to the keywords indicated by the N object pre-detection strategies to initially identify whether the target object is an object of interest. If the target attribute description data hits at least one object pre-detection strategy, it can be determined that the target object may be an object of interest. At this time, a target object recognition model for secondary identification of the target object can be determined, and the target object recognition model is called to predict the type of the target object based on the target attribute description data. Based on the predicted type prediction result, it is determined whether the target object is an object of interest. By combining strategies and models to identify target objects, the accuracy of object identification can be effectively improved; moreover, the entire identification process does not require human intervention, which can effectively improve the efficiency of object identification. Therefore, the embodiments of this application, through the organic combination of modules such as pre-filtering strategy and multi-dimensional, multi-scale target object recognition model, form a two-level recognition framework that satisfies the requirements of high-precision and high-recall detection of objects of interest.

[0186] Based on the description of the policy-based model training method embodiments above, this application also discloses a policy-based model training apparatus, which can be a computer program (including program code) running on a computer device. This policy-based model training apparatus can execute the method flow shown in Figure 2 or Figure 4. Referring to Figure 8, the policy-based model training apparatus can run the following units:

[0187] The acquisition unit 801 is used to acquire attribute description data of multiple initial objects for training the benchmark object recognition model, and N object pre-detection strategies, where N is a positive integer; wherein, an object pre-detection strategy is used to indicate one or more keywords that need to be associated with the attribute description data of an object under a certain object type.

[0188] The processing unit 802 is used to perform policy hit detection on the N object pre-detection strategies based on the keywords indicated by the N object pre-detection strategies and the attribute description data of each initial object.

[0189] The processing unit 802 is further configured to select, from the plurality of initial objects, the initial objects corresponding to the attribute description data that hit at least one object pre-detection strategy, as sample objects of the benchmark object recognition model;

[0190] The processing unit 802 is also used to perform data clustering processing on the attribute description data of each sample object based on the object type of the object pre-detection strategy hit by the attribute description data of each sample object, to obtain multiple datasets, with each dataset corresponding to one object type.

[0191] Training unit 803 is used to train the baseline object recognition model using each dataset to obtain target object recognition models for multiple object types of interest; a target object recognition model is used to predict the probability that any object belongs to the corresponding object type of interest based on the attribute description data of any input object.
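
The clustering performed by the processing unit 802 can be sketched as grouping by object type. Placing a sample object that hits strategies of several object types into each corresponding dataset is an assumption; the function name and input layout are hypothetical.

```python
from collections import defaultdict


def cluster_by_object_type(sample_hits):
    """Group attribute description data into one dataset per object type.

    sample_hits: list of (attribute_data, hit_object_types), where
                 hit_object_types lists the object types of the pre-detection
                 strategies that the sample's attribute data hit.
    """
    datasets = defaultdict(list)
    for attribute_data, hit_object_types in sample_hits:
        # Deduplicate so a sample appears at most once per dataset.
        for object_type in set(hit_object_types):
            datasets[object_type].append(attribute_data)
    return dict(datasets)
```

Each resulting dataset would then be used to train the benchmark object recognition model for its own object type.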

[0192] In one implementation, any object pre-detection strategy is further used to indicate the logical relationship between the corresponding keywords; correspondingly, when the processing unit 802 performs strategy hit detection on the N object pre-detection strategies based on the keywords indicated by the N object pre-detection strategies and using the attribute description data of each initial object, it can specifically be used to:

[0193] For the attribute description data of any initial object, traverse the N object pre-detection strategies to determine the current object pre-detection strategy.

[0194] Based on the keywords and logical relationships in the current object pre-detection strategy, determine the target keywords that the attribute description data of any initial object needs to hit, and search for the target keywords in the attribute description data of any initial object;

[0195] If the target keyword is found, it is determined that the attribute description data of any initial object matches the current object pre-detection strategy; if the target keyword is not found, the process continues to traverse the N object pre-detection strategies.
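
The strategy hit detection steps above can be sketched as follows. Modeling the logical relationship as a single "AND" or "OR" over the keywords, and matching keywords by substring search, are simplifying assumptions; the dictionary layout and function names are hypothetical.

```python
def strategy_is_hit(text, strategy):
    """Check one pre-detection strategy against attribute description text.

    strategy: dict with "keywords" (list of str) and "logic" ("AND" or "OR").
    """
    found = [keyword in text for keyword in strategy["keywords"]]
    if strategy["logic"] == "AND":
        return all(found)  # every keyword must appear
    return any(found)      # at least one keyword must appear


def hit_object_types(text, strategies):
    """Traverse all N strategies and return the object types of those hit."""
    return [s["object_type"] for s in strategies if strategy_is_hit(text, s)]
```

An initial object is selected as a sample object when `hit_object_types` returns a non-empty list for its attribute description data.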

[0196] In another implementation, when the training unit 803 is used to train the benchmark object recognition model using each dataset to obtain target object recognition models for multiple object types of interest, it can be specifically used for:

[0197] Based on the attribute description data in any dataset, construct Q labeled data pairs and P unlabeled data pairs, where Q and P are both positive integers. A labeled data pair includes: a type label of a labeled sample object and the corresponding attribute description data; an unlabeled data pair includes: attribute description data of an unlabeled sample object and augmented data obtained by augmenting the attribute description data.

[0198] The benchmark object recognition model is invoked to perform type prediction on the corresponding labeled sample object based on the attribute description data in each labeled data pair, to obtain the target type prediction result of each labeled sample object.

[0199] With type consistency as the prediction objective, the benchmark object recognition model is invoked to perform type prediction on the corresponding unlabeled sample object based on the attribute description data and the corresponding augmented data in each unlabeled data pair, to obtain two type prediction results for each unlabeled sample object.

[0200] Based on the target type prediction result and corresponding type label for each labeled sample object, and the difference between the two type prediction results for each unlabeled sample object, the model parameters of the benchmark object recognition model are optimized to obtain the target object recognition model for the target object type corresponding to any dataset.

[0201] In another implementation, when the training unit 803 is used to construct Q labeled data pairs and P unlabeled data pairs based on the attribute description data in any dataset, it can be specifically used for:

[0202] Attribute description data of multiple sample objects are selected from any dataset to construct a target training set. Based on the attribute description data of each sample object in the target training set, soft deduplication is performed on the multiple sample objects in the target training set to obtain Q sample objects.

[0203] Obtain the type labels of the Q sample objects, and treat the Q sample objects as Q labeled sample objects; and construct Q labeled data pairs using the type labels of the Q labeled sample objects and the corresponding attribute description data;

[0204] From the remaining sample objects excluding the Q sample objects, select P unlabeled sample objects; and perform data perturbation augmentation processing on the attribute description data of each unlabeled sample object to obtain augmented data for each unlabeled sample object;

[0205] Using the attribute description data and corresponding augmented data of each unlabeled sample object, P unlabeled data pairs are constructed.

[0206] In another implementation, when the training unit 803 performs soft deduplication on multiple sample objects in the target training set based on the attribute description data of each sample object in the target training set to obtain Q sample objects, it can specifically be used to:

[0207] Based on the attribute description data of each sample object in the target training set, determine the object features of each sample object;

[0208] Construct a locality-sensitive hash pool, which includes one or more feature buckets; and control the object features of each sample object to enter the feature buckets of the locality-sensitive hash pool in a streaming manner.

[0209] Determine the current object feature of the current sample object that is about to enter the locality-sensitive hash pool, and perform hash mapping on the current object feature using a locality-sensitive hash function. Based on the hash mapping result, allocate a target feature bucket in the locality-sensitive hash pool for the current object feature.

[0210] Based on the feature similarity between the current object features and the existing historical object features in the target feature bucket, detect similar sample objects to the current sample object from the sample objects corresponding to the historical object features;

[0211] If a similar sample object is detected, the current object feature is controlled to enter the target feature bucket; if no similar sample object is detected, the current object feature is controlled to enter the target feature bucket, and the current sample object is added to the set of objects to be labeled.

[0212] After the object features of all sample objects have entered the locality-sensitive hash pool, the sample objects in the set of objects to be labeled are determined as the Q sample objects obtained by soft deduplication of the multiple sample objects.

[0213] In another implementation, if the similar sample object is detected, the training unit 803 can also be used to:

[0214] Calculate the object similarity between the current sample object and the similar sample object based on the attribute description data of the current sample object and the attribute description data of the similar sample objects;

[0215] If the object similarity is less than the similarity threshold, the current sample object is added to the set of objects to be labeled.
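
By way of non-limiting illustration only, the streaming soft deduplication of paragraphs [0207] to [0215] may be sketched as follows. The sketch assumes 64-bit integer object features, uses the leading bits of the feature as the hash mapping, Hamming distance as the feature similarity, and token-level Jaccard similarity as the object similarity; all of these choices, the parameter values, and the function names are hypothetical and are not prescribed by this application.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer features."""
    return bin(a ^ b).count("1")


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity of two texts."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0


def soft_deduplicate(samples, band_bits=16, total_bits=64,
                     max_hamming=3, sim_threshold=0.8):
    """samples: iterable of (object_id, feature, text), consumed as a stream.

    Returns the ids of the sample objects to be labeled.
    """
    buckets = defaultdict(list)  # bucket key -> [(object_id, feature, text)]
    to_label = []
    for obj_id, feature, text in samples:
        # Hash mapping: the leading bits select the target feature bucket.
        key = feature >> (total_bits - band_bits)
        similar = [h for h in buckets[key]
                   if hamming(feature, h[1]) <= max_hamming]
        if not similar:
            to_label.append(obj_id)
        elif all(jaccard(text, h[2]) < sim_threshold for h in similar):
            # Second check on the attribute description data itself:
            # the features collided but the objects are not really alike.
            to_label.append(obj_id)
        # The feature enters the bucket whether or not a match was found.
        buckets[key].append((obj_id, feature, text))
    return to_label
```

Because each feature is compared only with the historical features in its own bucket, the cost per incoming sample stays small even for large training sets.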

[0216] In another implementation, any attribute description data includes multiple attribute description texts; correspondingly, when the training unit 803 determines the object features of each sample object based on the attribute description data of each sample object in the target training set, it may specifically be used to:

[0217] For any sample object in the target training set, the attribute description text in the attribute description data of any sample object that uniquely describes the object attribute of any sample object is taken as the target attribute description text of any sample object.

[0218] The target attribute description text of each sample object in the target training set is segmented into words to obtain the text words corresponding to each sample object; and the word frequency matrix of each sample object is constructed using the text words corresponding to each sample object.

[0219] Dimensionality reduction hashing is performed on the word frequency matrix of each sample object to obtain the dimension reduction hash value of each sample object; and the dimension reduction hash value of each sample object is determined as the object feature of each sample object.
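
By way of non-limiting illustration only, the feature construction of paragraphs [0217] to [0219] may be sketched as follows. This application does not name a specific dimensionality reduction hash; the sketch substitutes a SimHash-style fingerprint as one possible instance, and uses whitespace splitting in place of a real word segmenter. Both substitutions, and the function name `simhash`, are assumptions.

```python
import hashlib
from collections import Counter


def simhash(text: str) -> int:
    """Reduce a word-frequency vector to a 64-bit fingerprint."""
    bits = 64
    # Word segmentation (whitespace split here) and word frequencies.
    counts = Counter(text.lower().split())
    weights = [0] * bits
    for word, freq in counts.items():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            # Each word votes on every bit, weighted by its frequency.
            weights[i] += freq if (h >> i) & 1 else -freq
    return sum(1 << i for i in range(bits) if weights[i] > 0)
```

Texts that share most of their words tend to produce fingerprints that differ in few bits, which is the property a locality-sensitive hash pool relies on.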

[0220] In another implementation, when the training unit 803 selects P unlabeled sample objects from the remaining sample objects excluding the Q sample objects from the plurality of sample objects, it may specifically be used to:

[0221] The benchmark object recognition model is trained in a supervised manner using the Q labeled data pairs to obtain an initial object recognition model.

[0222] Determine the remaining sample objects among the plurality of sample objects, excluding the Q sample objects, and take each of the remaining sample objects as a candidate sample object;

[0223] The initial object recognition model is invoked to perform type prediction on each candidate sample object based on the attribute description data of that candidate sample object, and the type pseudo-label of each candidate sample object is determined according to its type prediction result.

[0224] Based on the pseudo-labels of the types of each candidate sample object, type equalization is performed on each candidate sample object, and P candidate sample objects are selected from all candidate sample objects as P unlabeled sample objects based on the type equalization result.
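
By way of non-limiting illustration only, the type equalization of paragraph [0224] may be sketched as a round-robin selection over the pseudo-label types. The function name `select_balanced`, the tuple layout of the input, and the preference for higher-confidence pseudo-labels within a type are assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import zip_longest


def select_balanced(pseudo_labels, p):
    """Select up to p candidates, drawing evenly from each pseudo-label type.

    pseudo_labels: list of (object_id, pseudo_label, confidence).
    """
    by_type = defaultdict(list)
    for obj_id, label, confidence in pseudo_labels:
        by_type[label].append((confidence, obj_id))
    # One queue per type, most confident pseudo-labels first.
    queues = [
        [obj_id for _, obj_id in sorted(items, key=lambda t: -t[0])]
        for _, items in sorted(by_type.items())
    ]
    selected = []
    # Take one candidate per type in each round until p are selected.
    for round_ in zip_longest(*queues):
        for obj_id in round_:
            if obj_id is not None and len(selected) < p:
                selected.append(obj_id)
    return selected
```

Drawing in rounds keeps a frequent pseudo-label type from dominating the P unlabeled sample objects.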

[0225] In another implementation, the attribute description text in any attribute description data is text represented using a first language; correspondingly, when the training unit 803 performs data perturbation augmentation processing on the attribute description data of each unlabeled sample object to obtain augmented data for each unlabeled sample object, it can specifically be used to:

[0226] From the attribute description data of the p-th unlabeled sample object, select at least one attribute description text; where p∈[1,P];

[0227] Each selected attribute description text is translated into text expressed in a second language to obtain the translation result corresponding to each selected attribute description text;

[0228] The translation result corresponding to each selected attribute description text is back-translated into text represented using the first language to obtain the back-translation result of each selected attribute description text;

[0229] The augmented data of the p-th unlabeled sample object is constructed by using the back-translation results of each selected attribute description text and the unselected attribute description text in the attribute description data of the p-th unlabeled sample object.
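
By way of non-limiting illustration only, the back-translation augmentation of paragraphs [0226] to [0229] may be sketched as follows. The two translation callables `to_second` and `to_first` are placeholders for whatever translation service is used; they, the dictionary layout of the attribute description data, and the function name are hypothetical.

```python
def back_translate_augment(attribute_texts, selected_keys, to_second, to_first):
    """Build augmented data for one unlabeled sample object.

    attribute_texts: dict mapping attribute name -> text in the first language.
    selected_keys:   attribute names chosen for back-translation.
    to_second:       callable translating first language -> second language.
    to_first:        callable translating second language -> first language.
    """
    augmented = dict(attribute_texts)  # unselected texts are kept unchanged
    for key in selected_keys:
        translated = to_second(attribute_texts[key])
        augmented[key] = to_first(translated)
    return augmented
```

The original data and the returned augmented data together form one unlabeled data pair.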

[0230] In another implementation, when the training unit 803 optimizes the model parameters of the benchmark object recognition model based on the target type prediction result and corresponding type label for each labeled sample object, and the difference between the two type prediction results for each unlabeled sample object, it can specifically be used to:

[0231] Based on the target type prediction result and the corresponding type label for each labeled sample object, the labeled loss value of the benchmark object recognition model is determined.

[0232] The unlabeled loss value of the benchmark object recognition model is determined based on the difference between the two type prediction results for each unlabeled sample object.

[0233] A joint loss value is calculated from the labeled loss value and the unlabeled loss value to obtain the model loss value of the benchmark object recognition model, and the model parameters of the benchmark object recognition model are optimized based on the model loss value.
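
By way of non-limiting illustration only, the joint loss of paragraphs [0231] to [0233] may be sketched as follows. This application does not fix the loss functions; the sketch assumes cross-entropy for the labeled loss, KL divergence between the two type prediction results for the unlabeled loss, and a weighted sum with a hypothetical parameter `weight` for the joint loss.

```python
import math


def cross_entropy(probs, label_index):
    """Negative log-probability assigned to the true type label."""
    return -math.log(max(probs[label_index], 1e-12))


def kl_divergence(p, q):
    """KL(p || q) between two predicted type distributions."""
    return sum(
        pi * math.log(pi / max(qi, 1e-12))
        for pi, qi in zip(p, q) if pi > 0
    )


def joint_loss(labeled, unlabeled, weight=1.0):
    """labeled:   list of (predicted probabilities, true label index).
    unlabeled: list of (prediction on original data, prediction on augmented data).
    """
    labeled_loss = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    unlabeled_loss = sum(kl_divergence(p, q) for p, q in unlabeled) / len(unlabeled)
    return labeled_loss + weight * unlabeled_loss
```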

[0234] In another implementation, when the training unit 803 is used to optimize the model parameters of the benchmark object recognition model based on the model loss value, it may specifically be used for:

[0235] The gradient of the benchmark object recognition model is backpropagated based on the model loss value to obtain the backpropagated gradient of the benchmark object recognition model.

[0236] The historical learning rate of the benchmark object recognition model is determined, and the historical learning rate is regularly decayed to obtain the target learning rate;

[0237] The model parameters of the benchmark object recognition model are optimized based on the backpropagated gradient and the target learning rate.
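
By way of non-limiting illustration only, the parameter update of paragraphs [0235] to [0237] may be sketched as follows. Reading "regularly decayed" as a multiplicative decay applied at fixed step intervals is an assumption, as are the plain gradient-descent update, the parameter values, and the function names.

```python
def target_learning_rate(historical_lr, step, decay_every=100, decay_rate=0.9):
    """Decay the historical learning rate once every `decay_every` steps."""
    if step > 0 and step % decay_every == 0:
        return historical_lr * decay_rate
    return historical_lr


def update_parameters(params, grads, learning_rate):
    """One gradient-descent step using the backpropagated gradients."""
    return [w - learning_rate * g for w, g in zip(params, grads)]
```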

[0238] In another embodiment, the benchmark object recognition model includes at least two type labels, and the target type prediction result of any labeled sample object includes: the predicted probability that any labeled sample object belongs to the object type indicated by each type label in the benchmark object recognition model; correspondingly, when the training unit 803 determines the labeled loss value of the benchmark object recognition model based on the target type prediction result and the corresponding type label of each labeled sample object, it can be specifically used for:

[0239] Traverse the Q target type prediction results obtained by prediction; if the maximum predicted probability in the currently traversed target type prediction result is greater than a first threshold, the current target type prediction result is determined as a mitigation signal.

[0240] After all Q target type prediction results have been traversed, signal mitigation processing is performed on the Q target type prediction results using the determined mitigation signals, so as to remove the mitigation signals from the Q target type prediction results.

[0241] The labeled loss value of the benchmark object recognition model is determined based on the difference between the type label corresponding to the maximum predicted probability in each remaining target type prediction result and the type label of the corresponding labeled sample object.
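
By way of non-limiting illustration only, the signal mitigation of paragraphs [0239] to [0241] may be sketched as follows: labeled predictions that the model already makes with high confidence are removed before the labeled loss is computed. Measuring the remaining difference with mean cross-entropy, returning zero when every prediction is removed, and the function name are assumptions made for this sketch.

```python
import math


def labeled_loss_with_mitigation(predictions, labels, first_threshold):
    """predictions: list of probability lists, one per labeled sample object.
    labels:      list of true label indices.
    """
    # Predictions whose maximum probability exceeds the threshold are
    # treated as mitigation signals and dropped from the loss.
    kept = [(p, y) for p, y in zip(predictions, labels)
            if max(p) <= first_threshold]
    if not kept:
        return 0.0
    return sum(-math.log(max(p[y], 1e-12)) for p, y in kept) / len(kept)
```

Dropping the confident predictions keeps a small labeled set from being overfitted while the unlabeled data is still being learned.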

[0242] In another implementation, the training unit 803, when determining the unlabeled loss value of the benchmark object recognition model based on the difference between the two type prediction results for each unlabeled sample object, may specifically be used for:

[0243] The two type prediction results of each unlabeled sample object are respectively used as the two type signals of each unlabeled sample object;

[0244] According to the signal sharpening strategy, signal sharpening processing is performed on the two type signals of each unlabeled sample object to obtain the signal sharpening result;

[0245] Based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object, the unlabeled loss value of the benchmark object recognition model is determined.

[0246] In another embodiment, the benchmark object recognition model includes at least two type labels, and a type signal of any unlabeled sample object includes: the predicted probability that the unlabeled sample object belongs to the object type indicated by each type label in the benchmark object recognition model; the signal sharpening strategy includes: masking processing based on the predicted probability;

[0247] Accordingly, when the training unit 803 performs signal sharpening processing on the two type signals of each unlabeled sample object according to the signal sharpening strategy to obtain the signal sharpening result, it can be specifically used for:

[0248] Traverse P unlabeled sample objects. If, in the two type signals of the currently traversed unlabeled sample object, the maximum prediction probability of at least one type signal is less than the second threshold, then perform masking processing on the current unlabeled sample object and the corresponding two type signals.

[0249] After all P unlabeled sample objects have been traversed, each unlabeled sample object that has been masked is added to the signal sharpening result.

[0250] In another implementation, the training unit 803, when determining the unlabeled loss value of the benchmark object recognition model based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object, may specifically be used to:

[0251] All unlabeled sample objects among the P unlabeled sample objects that are not located in the signal sharpening result are considered as valid unlabeled sample objects;

[0252] Based on the difference between the two type signals of each valid unlabeled sample object, calculate the type consistency loss value corresponding to each valid unlabeled sample object;

[0253] Based on the type consistency loss value corresponding to each valid unlabeled sample object, the unlabeled loss value of the benchmark object recognition model is determined.
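
By way of non-limiting illustration only, the probability-based masking of paragraphs [0248] to [0253] may be sketched as follows. Measuring the type consistency loss with KL divergence and averaging it over the valid unlabeled sample objects are assumptions, as are the function names.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) between two type signals."""
    return sum(
        pi * math.log(pi / max(qi, 1e-12))
        for pi, qi in zip(p, q) if pi > 0
    )


def masked_consistency_loss(pairs, second_threshold):
    """pairs: list of (type signal on original data, type signal on augmented data)."""
    # A pair is masked when at least one of its two type signals has a
    # maximum probability below the threshold; the rest are valid.
    valid = [(p, q) for p, q in pairs
             if max(p) >= second_threshold and max(q) >= second_threshold]
    if not valid:
        return 0.0
    return sum(kl_divergence(p, q) for p, q in valid) / len(valid)
```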

[0254] In another embodiment, the benchmark object recognition model includes at least two type labels, and a type signal of any unlabeled sample object includes: the predicted probability that the unlabeled sample object belongs to the object type indicated by each type label in the benchmark object recognition model; the signal sharpening strategy includes: minimizing the information entropy of the type signal obtained based on augmented data;

[0255] Accordingly, when the training unit 803 performs signal sharpening processing on the two type signals of each unlabeled sample object according to the signal sharpening strategy to obtain the signal sharpening result, it can be specifically used for:

[0256] For any unlabeled sample object, determine the type signal predicted based on the augmented data of the unlabeled sample object from the two type signals of the unlabeled sample object;

[0257] Based on the type labels and corresponding prediction probabilities in the determined type signals, calculate the information entropy of the augmented data of any unlabeled sample object.

[0258] The calculated information entropy of the augmented data of any unlabeled sample object is added to the signal sharpening result.

[0259] In another implementation, the training unit 803, when determining the unlabeled loss value of the benchmark object recognition model based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object, may specifically be used to:

[0260] Based on the difference between the two type signals of each unlabeled sample object, calculate the type consistency loss value corresponding to each unlabeled sample object;

[0261] The information entropy in the signal sharpening result and the type consistency loss value corresponding to each unlabeled sample object are summed to obtain the unlabeled loss value of the benchmark object recognition model.
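
By way of non-limiting illustration only, the entropy-minimization variant of paragraphs [0256] to [0261] may be sketched as follows. Measuring the type consistency loss with KL divergence is an assumption; the unweighted summation follows paragraph [0261]. The function names are hypothetical.

```python
import math


def entropy(p):
    """Information entropy of one type signal."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) between two type signals."""
    return sum(
        pi * math.log(pi / max(qi, 1e-12))
        for pi, qi in zip(p, q) if pi > 0
    )


def unlabeled_loss_with_entropy(pairs):
    """pairs: list of (type signal on original data, type signal on augmented data)."""
    consistency = sum(kl_divergence(p, q) for p, q in pairs)
    # The entropy term is computed on the augmented-data signal only.
    sharpening = sum(entropy(q) for _, q in pairs)
    return consistency + sharpening
```

Minimizing the entropy term pushes the prediction on augmented data toward a single type, which is the sharpening effect.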

[0262] In another implementation, any attribute description data includes attribute description text under multiple attribute dimensions, and the benchmark object recognition model includes a feature extraction network corresponding to each attribute dimension, a feature joint layer, and a feedforward network; correspondingly, when the training unit 803 is used to call the benchmark object recognition model to perform type prediction on the corresponding labeled sample object based on the attribute description data in each labeled data pair, and to obtain the target type prediction result for each labeled sample object, it can be specifically used for:

[0263] For any labeled sample object, each feature extraction network in the benchmark object recognition model is invoked to independently extract features from the attribute description text under the corresponding attribute dimension in the attribute description data of the corresponding labeled data pair, so as to obtain the text features of each attribute description text.

[0264] The feature joint layer is invoked to perform feature joint processing on the text features of the attribute description text according to the attention mechanism, so as to obtain joint features;

[0265] The feedforward network is invoked to predict the type of any labeled sample object based on the joint features, thereby obtaining the target type prediction result of any labeled sample object.
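
By way of non-limiting illustration only, the forward pass of paragraphs [0263] to [0265] may be sketched as follows. The per-dimension feature extraction networks are represented by placeholder callables, the attention mechanism by dot-product scores against a query vector, and the feedforward network by a single linear layer with softmax; all of these, and every name in the sketch, are simplifying assumptions rather than the structure used in this application.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_join(features, query):
    """Attention-weighted sum of the per-dimension text features."""
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(len(query))]


def feedforward(joint, weight_rows, bias):
    """Single linear layer followed by softmax over the type labels."""
    logits = [sum(w * x for w, x in zip(row, joint)) + b
              for row, b in zip(weight_rows, bias)]
    return softmax(logits)


def predict_type(attribute_texts, extractors, query, weight_rows, bias):
    """attribute_texts: dict mapping attribute dimension -> text.
    extractors:      dict mapping attribute dimension -> feature extractor.
    """
    # Each attribute dimension is encoded independently by its own extractor.
    features = [extractors[dim](text) for dim, text in attribute_texts.items()]
    joint = attention_join(features, query)
    return feedforward(joint, weight_rows, bias)
```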

[0266] According to another embodiment of this application, the units in the policy-based model training device shown in Figure 8 can be individually or entirely merged into one or more other units, or some of the units can be further divided into multiple functionally smaller units, which achieves the same operation without affecting the technical effects of the embodiments of this application. The above units are divided based on logical functions. In practical applications, the function of one unit can also be implemented by multiple units, or the functions of multiple units can be implemented by one unit. In other embodiments of this application, the policy-based model training device may also include other units; in practical applications, these functions can also be implemented with the assistance of other units and collaboratively by multiple units.

[0267] According to another embodiment of this application, the policy-based model training device shown in Figure 8 can be constructed, and the policy-based model training method of the embodiments of this application can be implemented, by running a computer program (including program code) capable of executing each step of the corresponding method shown in Figure 2 or Figure 4 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), random access memory (RAM), and read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded onto the aforementioned computing device via that medium, and run therein.

[0268] The embodiments of this application introduce a benchmark object recognition model and train it to obtain a target object recognition model, so that the object recognition task is performed by the target object recognition model, which improves the efficiency and accuracy of object recognition. Furthermore, N object pre-detection strategies are set for model training, each indicating one or more keywords that need to be associated with the attribute description data of objects under a certain object type. After the attribute description data of the multiple initial objects used to train the benchmark object recognition model is obtained, the attribute description data associated with each object type can be selected from it based on the keywords indicated by the N object pre-detection strategies and used as sample data for the benchmark object recognition model. This ensures the accuracy of the sample data and improves the subsequent model training effect; it also avoids the resource waste and low training efficiency that would result from the benchmark object recognition model learning attribute description data unrelated to any object type. In addition, the attribute description data of each sample object is clustered into multiple datasets based on the object type of the object pre-detection strategy that it hits, and each dataset is used to train the benchmark object recognition model in a targeted manner, so that the model parameters are optimized consistently on the attribute description data of a single object type. This further improves the model training effect, enabling each trained target object recognition model to have a strong recognition ability for objects under its corresponding object type, thus further improving the accuracy of object recognition.

[0269] Based on the description of the above policy- and model-based object recognition method embodiments, this application also discloses a policy- and model-based object recognition device, which can be a computer program (including program code) running on a computer device. The policy- and model-based object recognition device can execute the method flow shown in Figure 6. Referring to Figure 9, the policy- and model-based object recognition device can operate the following units:

[0270] The acquisition unit 901 is used to acquire the target attribute description data of the target object to be identified and N object pre-detection strategies, where N is a positive integer; an object pre-detection strategy is used to indicate: one or more keywords that need to be associated with the attribute description data of an object under a certain type of object of interest;

[0271] The identification unit 902 is used to perform strategy hit detection on the N object pre-detection strategies based on the keywords indicated by the N object pre-detection strategies and using the target attribute description data.

[0272] The identification unit 902 is further configured to determine a target object identification model for type prediction of the target object if the target attribute description data hits at least one object pre-detection strategy.

[0273] The identification unit 902 is further configured to call the determined target object identification model to perform type prediction on the target object based on the target attribute description data, obtain the type prediction result of the target object, and determine whether the target object is an object of interest based on the type prediction result of the target object.
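
By way of non-limiting illustration only, the two-stage recognition of paragraphs [0270] to [0273] may be sketched as follows. The dictionary encoding of the strategies as OR-ed groups of AND-ed keywords, the model callables that return a probability, the decision threshold, and the function name `recognize` are all hypothetical.

```python
def recognize(attribute_texts, strategies, models, decision_threshold=0.5):
    """Return the object types of interest the target object is judged to belong to.

    attribute_texts: list of attribute description texts of the target object.
    strategies:      dict mapping object type -> list of AND-groups of keywords.
    models:          dict mapping object type -> callable returning a probability.
    """
    text = " ".join(attribute_texts).lower()
    # Stage 1: preliminary identification by strategy hit detection.
    hit_types = [
        obj_type for obj_type, groups in strategies.items()
        if any(all(k.lower() in text for k in group) for group in groups)
    ]
    # Stage 2: secondary identification by the model of each hit type.
    results = {t: models[t](attribute_texts) for t in hit_types}
    return {t: p for t, p in results.items() if p >= decision_threshold}
```

A target object that hits no strategy never reaches a model, so the more expensive model inference runs only on plausible candidates.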

[0274] In one embodiment, the identification unit 902 can also be used for:

[0275] Obtain multiple feedback results for the determined target object recognition model, where each feedback result indicates that a type prediction result obtained by the determined target object recognition model based on the attribute description data of an object is inaccurate.

[0276] The credibility of each feedback result among the multiple feedback results is tested in order to filter out the credible feedback results from the multiple feedback results;

[0277] From the attribute description data corresponding to the credible feedback results, select attribute description data of one or more objects, and determine the type label of each selected object;

[0278] The type label and corresponding attribute description data of each selected object are added to the labeled dataset of the determined target object recognition model; and based on the updated labeled dataset, the determined target object recognition model is adaptively optimized.

[0279] According to another embodiment of this application, the units in the policy- and model-based object recognition device shown in Figure 9 can be individually or entirely merged into one or more other units, or some of the units can be further divided into multiple functionally smaller units, which achieves the same operation without affecting the technical effects of the embodiments of this application. The above units are divided based on logical functions. In practical applications, the function of one unit can be implemented by multiple units, or the functions of multiple units can be implemented by one unit. In other embodiments of this application, the policy- and model-based object recognition device may also include other units; in practical applications, these functions can also be implemented with the assistance of other units and collaboratively by multiple units.

[0280] According to another embodiment of this application, the policy- and model-based object recognition device shown in Figure 9 can be constructed, and the policy- and model-based object recognition method of the embodiments of this application can be implemented, by running a computer program (including program code) capable of executing each step of the corresponding method shown in Figure 6 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), random access memory (RAM), and read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded onto the aforementioned computing device via that medium, and executed therein.

[0281] The embodiments of this application set N object pre-detection strategies, each indicating one or more keywords that need to be associated with the attribute description data of an object under a certain object type of interest. After the target attribute description data of the target object to be identified is obtained, strategy hit detection can be performed on the N object pre-detection strategies using the target attribute description data, according to the keywords indicated by the strategies, to preliminarily identify whether the target object is an object of interest. If the target attribute description data hits at least one object pre-detection strategy, it can be determined that the target object may be an object of interest. In that case, a target object recognition model for secondary identification of the target object is determined and called to predict the type of the target object based on the target attribute description data, and whether the target object is an object of interest is determined from the resulting type prediction result. By combining strategies and models to identify target objects, the accuracy of object identification can be effectively improved; moreover, the entire identification process does not require human intervention, which can effectively improve the efficiency of object identification.

[0282] Based on the description of the above method and apparatus embodiments, this application also provides a computer device. Referring to Figure 10, the computer device includes at least a processor 1001, an input interface 1002, an output interface 1003, and a computer storage medium 1004. The processor 1001, input interface 1002, output interface 1003, and computer storage medium 1004 within the computer device can be connected via a bus or other means. The computer storage medium 1004 can be stored in the memory of the computer device. The computer storage medium 1004 is used to store computer programs, which include program instructions. The processor 1001 is used to execute the program instructions stored in the computer storage medium 1004.

[0283] The processor 1001 (or central processing unit (CPU)) is the computing and control core of the computer device. It is suitable for implementing one or more instructions, and specifically for loading and executing one or more instructions to implement the corresponding method flows or functions. In one embodiment, the processor 1001 described in the embodiments of this application can be used to execute the method flow shown in Figure 2 or Figure 4; in another embodiment, the processor 1001 can be used to execute the method flow shown in Figure 6.

[0284] This application also provides a computer storage medium (memory), which is a memory device in a computer device used to store programs and data. It is understood that the computer storage medium here can include both the built-in storage medium of the computer device and extended storage media supported by the computer device. The computer storage medium provides storage space that stores the operating system of the computer device. Furthermore, this storage space also stores one or more instructions suitable for loading and execution by a processor. These instructions can be one or more computer programs (including program code). It should be noted that the computer storage medium here can be high-speed RAM or non-volatile memory, such as at least one disk storage device; optionally, it can also be at least one computer storage medium located remotely from the aforementioned processor.

[0285] In one embodiment, a processor may load and execute one or more instructions stored in the computer storage medium to implement the corresponding steps of the method in the embodiments shown in Figure 2 or Figure 4; in a specific implementation, the one or more instructions in the computer storage medium can be loaded by the processor to execute the following steps:

[0286] Obtain attribute description data of multiple initial objects for training the benchmark object recognition model, and N object pre-detection strategies, where N is a positive integer; wherein, an object pre-detection strategy is used to indicate: one or more keywords that need to be associated with the attribute description data of an object under a certain object type;

[0287] Based on the keywords indicated by the N object pre-detection strategies, the attribute description data of each initial object is used to perform strategy hit detection on the N object pre-detection strategies.

[0288] From the plurality of initial objects, the initial objects corresponding to the attribute description data that hit at least one object pre-detection strategy are selected as the sample objects of the benchmark object recognition model.

[0289] Based on the object type of the object pre-detection strategy that the attribute description data of each sample object hits, the attribute description data of each sample object is subjected to data clustering based on object type to obtain multiple datasets, with each dataset corresponding to one object type.

[0290] The benchmark object recognition model is trained using each dataset to obtain target object recognition models for multiple object types of interest. Each target object recognition model is used to predict, based on the attribute description data of any input object, the probability that the object belongs to the corresponding object type of interest.
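
The selection and clustering steps in [0288] and [0289] can be sketched as follows. This is a minimal illustration, not the patented implementation: the `hits` mapping (object id to the object types of the strategies it hit) and the function name `cluster_by_object_type` are assumptions introduced here.

```python
from collections import defaultdict

def cluster_by_object_type(hits):
    """Group sample objects by the object types of the strategies they hit.

    `hits` maps an object id to the list of object types whose strategies
    were hit by that object's attribute description data (hypothetical format).
    Objects with no hit are not sample objects and are dropped.
    """
    datasets = defaultdict(list)
    for object_id, object_types in hits.items():
        # De-duplicate types while keeping order, so an object that hits two
        # strategies of the same type is added to that dataset only once.
        for object_type in dict.fromkeys(object_types):
            datasets[object_type].append(object_id)
    return dict(datasets)

hits = {"o1": ["type_a"], "o2": [], "o3": ["type_a", "type_b", "type_a"]}
datasets = cluster_by_object_type(hits)
```

Each resulting dataset would then be used to train one model for its object type.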

[0291] In one implementation, any object pre-detection strategy is further used to indicate the logical relationship between the corresponding keywords; correspondingly, when performing strategy hit detection on the N object pre-detection strategies based on the attribute description data of each initial object according to the keywords indicated by the N object pre-detection strategies, the one or more instructions can be loaded and executed by the processor:

[0292] For the attribute description data of any initial object, traverse the N object pre-detection strategies and take the currently traversed strategy as the current object pre-detection strategy.

[0293] Based on the keywords and logical relationships in the current object pre-detection strategy, determine the target keywords that the attribute description data of any initial object needs to hit, and search for the target keywords in the attribute description data of any initial object;

[0294] If the target keywords are found, it is determined that the attribute description data of the initial object hits the current object pre-detection strategy; if the target keywords are not found, traversal of the N object pre-detection strategies continues.
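
A minimal sketch of the strategy hit detection in [0292] to [0294], assuming each strategy is encoded as a dictionary with hypothetical fields `name`, `logic` ("AND" or "OR") and `keywords`. The source does not specify how logical relationships are encoded, so this two-operator form is only an illustration.

```python
def hits_strategy(text, strategy):
    """Check one strategy: 'AND' needs every keyword, 'OR' needs at least one."""
    lowered = text.lower()
    found = [kw.lower() in lowered for kw in strategy["keywords"]]
    if strategy["logic"] == "AND":
        return all(found)
    return any(found)

def detect_hits(text, strategies):
    """Traverse all strategies and return the names of those the text hits."""
    return [s["name"] for s in strategies if hits_strategy(text, s)]

# Hypothetical strategies, used only for illustration.
strategies = [
    {"name": "s1", "logic": "AND", "keywords": ["wireless", "charger"]},
    {"name": "s2", "logic": "OR", "keywords": ["cable", "adapter"]},
]
```

An initial object whose text yields an empty list hits no strategy and would not be selected as a sample object.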

[0295] In another implementation, when training the benchmark object recognition model using each dataset to obtain target object recognition models for multiple object types of interest, the one or more instructions can be loaded and executed by the processor:

[0296] Based on the attribute description data in any dataset, construct Q labeled data pairs and P unlabeled data pairs, where Q and P are both positive integers. A labeled data pair includes: a type label of a labeled sample object and the corresponding attribute description data; an unlabeled data pair includes: attribute description data of an unlabeled sample object and augmented data obtained by augmenting the attribute description data.

[0297] The benchmark object recognition model is invoked to perform type prediction on the corresponding labeled sample object based on the attribute description data in each labeled data pair, obtaining the target type prediction result of each labeled sample object.

[0298] According to the prediction objective of type consistency, the benchmark object recognition model is invoked to perform type prediction on the corresponding unlabeled sample object based on the attribute description data and the corresponding augmented data in each unlabeled data pair, obtaining two type prediction results for each unlabeled sample object.

[0299] Based on the target type prediction result and corresponding type label for each labeled sample object, and the difference between the two type prediction results for each unlabeled sample object, the model parameters of the benchmark object recognition model are optimized to obtain the target object recognition model for the target object type corresponding to any dataset.

[0300] In another implementation, when constructing Q labeled data pairs and P unlabeled data pairs based on attribute description data in any dataset, the one or more instructions can be loaded and executed by the processor:

[0301] Attribute description data of multiple sample objects are selected from any dataset to construct a target training set. Based on the attribute description data of each sample object in the target training set, soft deduplication is performed on the multiple sample objects in the target training set to obtain Q sample objects.

[0302] Obtain the type labels of the Q sample objects, and treat the Q sample objects as Q labeled sample objects; and construct Q labeled data pairs using the type labels of the Q labeled sample objects and the corresponding attribute description data;

[0303] From the remaining sample objects excluding the Q sample objects, select P unlabeled sample objects; and perform data perturbation augmentation processing on the attribute description data of each unlabeled sample object to obtain augmented data for each unlabeled sample object;

[0304] Using the attribute description data and corresponding augmented data of each unlabeled sample object, P unlabeled data pairs are constructed.

[0305] In another implementation, when soft deduplication is performed on multiple sample objects in the target training set based on the attribute description data of each sample object in the target training set to obtain Q sample objects, the one or more instructions can be loaded and executed by the processor:

[0306] Based on the attribute description data of each sample object in the target training set, determine the object features of each sample object;

[0307] Construct a locality-sensitive hash (LSH) pool that includes one or more feature buckets, and feed the object features of the sample objects into the feature buckets of the LSH pool in a streaming manner.

[0308] Determine the current object features of the current sample object to be entered into the LSH pool, perform hash mapping on the current object features using a locality-sensitive hash function, and allocate a target feature bucket in the LSH pool for the current object features based on the hash mapping result.

[0309] Based on the feature similarity between the current object features and the existing historical object features in the target feature bucket, detect similar sample objects to the current sample object from the sample objects corresponding to the historical object features;

[0310] If a similar sample object is detected, the current object features are placed into the target feature bucket; if no similar sample object is detected, the current object features are placed into the target feature bucket and the current sample object is added to the set of objects to be labeled.

[0311] After the object features of each sample object have entered the locality-sensitive hash pool, the sample objects in the set of objects to be labeled are determined as the Q sample objects obtained by soft deduplication of the multiple sample objects.
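
The streaming soft deduplication in [0307] to [0311] can be sketched as below. The sketch makes several assumptions that the source does not state: object features are integer fingerprints of `total_bits` bits, the bucket key is taken from the top `band_bits` bits (a simplified single-band LSH), and feature similarity is measured by Hamming distance against `max_distance`. The additional object-level check of [0312] to [0314] is not shown.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")

def soft_dedup(features, band_bits=4, total_bits=16, max_distance=2):
    """Stream (sample_id, feature) pairs through hash buckets.

    A sample is added to the to-label list only when no similar feature is
    already in its bucket; its feature enters the bucket in either case.
    """
    buckets = defaultdict(list)
    to_label = []
    shift = total_bits - band_bits
    for sample_id, feature in features:
        bucket = buckets[feature >> shift]  # bucket chosen by the top bits
        similar = any(hamming(feature, old) <= max_distance for old in bucket)
        if not similar:
            to_label.append(sample_id)
        bucket.append(feature)
    return to_label

features = [
    ("a", 0b1010000000000001),
    ("b", 0b1010000000000011),  # one bit away from "a": near-duplicate
    ("c", 0b1010111100001111),  # same bucket as "a" but far from it
    ("d", 0b0001000000000001),  # different bucket
]
kept = soft_dedup(features)
```

Because each feature is compared only with features in its own bucket, the cost per sample stays small even for a large training set, which is the usual motivation for LSH-based deduplication.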

[0312] In another implementation, if the similar sample object is detected, the one or more instructions can be loaded and executed by the processor:

[0313] Calculate the object similarity between the current sample object and the similar sample object based on the attribute description data of the current sample object and the attribute description data of the similar sample objects;

[0314] If the object similarity is less than the similarity threshold, the current sample object is added to the set of objects to be labeled.

[0315] In another implementation, any attribute description data includes multiple attribute description texts; correspondingly, when determining the object features of each sample object based on the attribute description data of each sample object in the target training set, the one or more instructions can be loaded and executed by the processor:

[0316] For any sample object in the target training set, the attribute description text in its attribute description data that uniquely describes an object attribute of that sample object is taken as the target attribute description text of that sample object.

[0317] The target attribute description text of each sample object in the target training set is segmented into words to obtain the text words corresponding to each sample object; and the word frequency matrix of each sample object is constructed using the text words corresponding to each sample object.

[0318] Dimensionality reduction hashing is performed on the word frequency matrix of each sample object to obtain the dimension reduction hash value of each sample object; and the dimension reduction hash value of each sample object is determined as the object feature of each sample object.
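
One way to picture [0317] and [0318] is a SimHash-style fingerprint. This is a swapped-in technique chosen for illustration, since the source does not name the dimensionality-reduction hash it uses. Whitespace tokenization stands in for word segmentation, and `token_hash` and `simhash` are hypothetical helper names.

```python
import hashlib
from collections import Counter

def token_hash(token, bits=64):
    """Stable hash of a token, truncated to `bits` bits."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")

def simhash(text, bits=64):
    """Reduce the word-frequency vector of `text` to a single integer."""
    counts = Counter(text.lower().split())  # word frequencies
    weights = [0] * bits
    for token, freq in counts.items():
        h = token_hash(token, bits)
        for i in range(bits):
            # Each token votes +freq or -freq on every bit position.
            weights[i] += freq if (h >> i) & 1 else -freq
    value = 0
    for i, w in enumerate(weights):
        if w > 0:
            value |= 1 << i
    return value
```

Texts that share most of their words tend to produce fingerprints that differ in few bits, which is what makes such a value usable as an object feature for bucket-based similarity checks.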

[0319] In another implementation, when selecting P unlabeled sample objects from the remaining sample objects excluding the Q sample objects from the plurality of sample objects, the one or more instructions can be loaded and executed by the processor:

[0320] The benchmark object recognition model is trained in a supervised manner using the Q labeled data pairs to obtain the initial object recognition model.

[0321] Determine the remaining sample objects among the plurality of sample objects, excluding the Q sample objects, and take each of the remaining sample objects as a candidate sample object;

[0322] The initial object recognition model is invoked to perform type prediction on each candidate sample object based on its attribute description data, and the type pseudo-label of each candidate sample object is determined according to the corresponding type prediction result.

[0323] Based on the type pseudo-label of each candidate sample object, type equalization is performed on the candidate sample objects, and P candidate sample objects are selected from all candidate sample objects as the P unlabeled sample objects based on the type equalization result.
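
A minimal sketch of type equalization over pseudo-labels, assuming a simple round-robin selection across types. The source does not say how equalization is performed, so the function `balanced_select` and its strategy are illustrative assumptions.

```python
from collections import defaultdict
from itertools import zip_longest

def balanced_select(pseudo_labels, p):
    """Pick up to `p` candidates, taking one per pseudo-label type per round."""
    by_type = defaultdict(list)
    for object_id, label in pseudo_labels:
        by_type[label].append(object_id)
    selected = []
    # zip_longest pads exhausted types with None, which is skipped below.
    for round_ in zip_longest(*by_type.values()):
        for object_id in round_:
            if object_id is not None and len(selected) < p:
                selected.append(object_id)
    return selected

pseudo = [("c1", "pos"), ("c2", "pos"), ("c3", "pos"), ("c4", "neg"), ("c5", "pos")]
chosen = balanced_select(pseudo, p=4)
```

With this selection, a rare type is represented before a frequent type contributes its second candidate, which limits the skew of the unlabeled set.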

[0324] In another implementation, the attribute description text in any attribute description data is text represented using a first language; correspondingly, when performing data perturbation augmentation processing on the attribute description data of each unlabeled sample object to obtain augmented data for each unlabeled sample object, the one or more instructions can be loaded and executed by the processor:

[0325] From the attribute description data of the p-th unlabeled sample object, select at least one attribute description text; where p∈[1,P];

[0326] Each selected attribute description text is translated into text expressed in a second language to obtain the translation result corresponding to each selected attribute description text;

[0327] The translation result corresponding to each selected attribute description text is back-translated into text represented using the first language to obtain the back-translation result of each selected attribute description text;

[0328] The augmented data of the p-th unlabeled sample object is constructed by using the back-translation results of each selected attribute description text and the unselected attribute description text in the attribute description data of the p-th unlabeled sample object.
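
The back-translation augmentation of [0325] to [0328] can be sketched as follows. The `translate(text, src, dst)` callable is hypothetical and stands in for whatever machine translation system is used; `toy_translate` is a tiny lookup table included only so that the example runs.

```python
def back_translate(text, translate, first="en", second="fr"):
    """Translate to the second language and back to the first."""
    return translate(translate(text, first, second), second, first)

def augment(attribute_texts, selected_keys, translate):
    """Back-translate the selected attribute texts; copy the others unchanged."""
    return {
        key: back_translate(text, translate) if key in selected_keys else text
        for key, text in attribute_texts.items()
    }

def toy_translate(text, src, dst):
    """Toy word-level translator (illustration only, not a real MT system)."""
    table = {("en", "fr"): {"big": "grand"}, ("fr", "en"): {"grand": "large"}}
    mapping = table[(src, dst)]
    return " ".join(mapping.get(word, word) for word in text.split())

original = {"title": "big red bag", "brand": "big co"}
augmented = augment(original, {"title"}, toy_translate)
```

The augmented record keeps the unselected attribute texts verbatim, so the original and augmented data differ only in surface wording of the selected texts.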

[0329] In another implementation, when optimizing the model parameters of the benchmark object recognition model based on the target type prediction result and corresponding type label for each labeled sample object, and the difference between the two type prediction results for each unlabeled sample object, the one or more instructions can be loaded and executed by the processor:

[0330] Based on the target type prediction result and the corresponding type label for each labeled sample object, the labeled loss value of the benchmark object recognition model is determined.

[0331] The unlabeled loss value of the benchmark object recognition model is determined based on the difference between the two type prediction results for each unlabeled sample object.

[0332] A joint loss value is calculated from the labeled loss value and the unlabeled loss value to obtain the model loss value of the benchmark object recognition model, and the model parameters of the benchmark object recognition model are optimized based on the model loss value.

[0333] In another implementation, when optimizing the model parameters of the benchmark object recognition model based on the model loss value, the one or more instructions can be loaded and executed by the processor:

[0334] The gradient of the benchmark object recognition model is backpropagated based on the model loss value to obtain the backpropagated gradient of the benchmark object recognition model.

[0335] The historical learning rate of the benchmark object recognition model is determined, and the historical learning rate is regularly decayed to obtain the target learning rate;

[0336] The model parameters of the benchmark object recognition model are optimized based on the backpropagated gradient and the target learning rate.
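
A minimal sketch of [0335] and [0336], assuming a step-wise exponential decay schedule and a plain gradient descent update. The source specifies neither the decay rule nor the optimizer, so `decay_rate`, `decay_steps` and the update rule are assumptions for illustration.

```python
def decayed_lr(base_lr, step, decay_rate=0.9, decay_steps=100):
    """Multiply the learning rate by `decay_rate` every `decay_steps` steps."""
    return base_lr * decay_rate ** (step // decay_steps)

def sgd_step(params, grads, lr):
    """Move each parameter against its gradient, scaled by the learning rate."""
    return [p - lr * g for p, g in zip(params, grads)]

lr = decayed_lr(0.1, step=250)          # two decay periods have elapsed
params = sgd_step([1.0, 2.0], [0.5, -1.0], lr=0.1)
```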

[0337] In another implementation, the benchmark object recognition model includes at least two type labels, and the target type prediction result of any labeled sample object includes: the predicted probability that any labeled sample object belongs to the object type indicated by each type label in the benchmark object recognition model; correspondingly, when determining the labeled loss value of the benchmark object recognition model based on the target type prediction result of each labeled sample object and the corresponding type label, the one or more instructions can be loaded and executed by the processor:

[0338] Traverse the Q target type prediction results; if the maximum predicted probability in the currently traversed target type prediction result is greater than the first threshold, the current target type prediction result is determined to be a mitigation signal.

[0339] After all Q target type prediction results have been traversed, signal mitigation processing is performed on the Q target type prediction results using the determined mitigation signals, so as to remove the mitigation signals from the Q target type prediction results.

[0340] The labeled loss value of the benchmark object recognition model is determined based on the difference between the type label corresponding to the highest predicted probability in the remaining target type prediction results and the type label of the corresponding labeled sample object.
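
The effect of removing mitigation signals before computing the labeled loss can be sketched as below. Cross-entropy is used here as a stand-in for "the difference" between prediction and type label, which the source does not define further; `labeled_loss` and the default threshold value are assumptions.

```python
import math

def labeled_loss(predictions, labels, first_threshold=0.9):
    """Average cross-entropy over predictions that are not mitigation signals.

    A prediction whose maximum probability exceeds `first_threshold` is
    treated as a mitigation signal and excluded from the loss.
    """
    kept = [(probs, y) for probs, y in zip(predictions, labels)
            if max(probs) <= first_threshold]
    if not kept:
        return 0.0
    # Clamp the probability to avoid log(0).
    return sum(-math.log(max(probs[y], 1e-12)) for probs, y in kept) / len(kept)

predictions = [[0.95, 0.05], [0.6, 0.4], [0.5, 0.5]]
labels = [0, 0, 1]
loss = labeled_loss(predictions, labels)
```

In this example the first prediction is already very confident and is dropped, so only the two less certain samples contribute to the parameter update.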

[0341] In another implementation, when determining the unlabeled loss value of the benchmark object recognition model based on the difference between the two type prediction results for each unlabeled sample object, the one or more instructions can be loaded and executed by the processor:

[0342] The two type prediction results of each unlabeled sample object are respectively used as the two type signals of each unlabeled sample object;

[0343] According to the signal sharpening strategy, signal sharpening processing is performed on the two type signals of each unlabeled sample object to obtain the signal sharpening result;

[0344] Based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object, the unlabeled loss value of the benchmark object recognition model is determined.

[0345] In another embodiment, the benchmark object recognition model includes at least two type labels, and a type signal of any unlabeled sample object includes: the predicted probability that the unlabeled sample object belongs to the object type indicated by each type label in the benchmark object recognition model; the signal sharpening strategy includes: masking processing based on the predicted probability;

[0346] Accordingly, when performing signal sharpening processing on the two type signals of each unlabeled sample object according to the signal sharpening strategy to obtain the signal sharpening result, the one or more instructions can be loaded and executed by the processor:

[0347] Traverse P unlabeled sample objects. If, in the two type signals of the currently traversed unlabeled sample object, the maximum prediction probability of at least one type signal is less than the second threshold, then perform masking processing on the current unlabeled sample object and the corresponding two type signals.

[0348] After all P unlabeled sample objects have been traversed, each unlabeled sample object that has been masked is added to the signal sharpening result.

[0349] In another implementation, when determining the unlabeled loss value of the benchmark object recognition model based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object, the one or more instructions can be loaded and executed by the processor:

[0350] All unlabeled sample objects among the P unlabeled sample objects that are not located in the signal sharpening result are considered as valid unlabeled sample objects;

[0351] Based on the difference between the two type signals of each valid unlabeled sample object, calculate the type consistency loss value corresponding to each valid unlabeled sample object;

[0352] Based on the type consistency loss value corresponding to each valid unlabeled sample object, the unlabeled loss value of the benchmark object recognition model is determined.
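
A minimal sketch of the confidence-masking variant described in [0345] to [0352]. KL divergence is assumed as the measure of difference between the two type signals, and the per-sample losses are averaged; the source fixes neither choice, and the function names are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def masked_consistency_loss(pairs, second_threshold=0.6):
    """Average consistency loss over unmasked (original, augmented) signal pairs.

    A pair is masked when the maximum probability of at least one of its two
    type signals is below `second_threshold`.
    """
    valid = [(orig, aug) for orig, aug in pairs
             if max(orig) >= second_threshold and max(aug) >= second_threshold]
    if not valid:
        return 0.0
    return sum(kl_divergence(orig, aug) for orig, aug in valid) / len(valid)

pairs = [
    ([0.9, 0.1], [0.9, 0.1]),    # confident and consistent
    ([0.55, 0.45], [0.8, 0.2]),  # masked: original signal is below threshold
]
loss = masked_consistency_loss(pairs)
```

Masking keeps uncertain predictions from dominating the consistency term early in training, when the model's outputs on unlabeled data are least reliable.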

[0353] In another embodiment, the benchmark object recognition model includes at least two type labels, and a type signal of any unlabeled sample object includes: the predicted probability that the unlabeled sample object belongs to the object type indicated by each type label in the benchmark object recognition model; the signal sharpening strategy includes: minimizing the information entropy of the type signal obtained based on augmented data;

[0354] Accordingly, when performing signal sharpening processing on the two type signals of each unlabeled sample object according to the signal sharpening strategy to obtain the signal sharpening result, the one or more instructions can be loaded and executed by the processor:

[0355] For any unlabeled sample object, determine the type signal predicted based on the augmented data of the unlabeled sample object from the two type signals of the unlabeled sample object;

[0356] Based on the type labels and corresponding prediction probabilities in the determined type signals, calculate the information entropy of the augmented data of any unlabeled sample object.

[0357] The calculated information entropy of the augmented data of each unlabeled sample object is added to the signal sharpening result.

[0358] In another implementation, when determining the unlabeled loss value of the benchmark object recognition model based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object, the one or more instructions can be loaded and executed by the processor:

[0359] Based on the difference between the two type signals of each unlabeled sample object, calculate the type consistency loss value corresponding to each unlabeled sample object;

[0360] The information entropy in the signal sharpening result and the type consistency loss value corresponding to each unlabeled sample object are summed to obtain the unlabeled loss value of the benchmark object recognition model.
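
The entropy-minimization variant of [0353] to [0360] can be sketched as a sum of two terms per unlabeled sample: the entropy of the type signal predicted from the augmented data, and a consistency term between the two type signals. KL divergence as the consistency measure is an assumption, as are the function names.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p + eps) for p in probs)

def unlabeled_loss(pairs):
    """Sum of consistency loss and augmented-signal entropy over all pairs."""
    return sum(kl_divergence(orig, aug) + entropy(aug) for orig, aug in pairs)

uncertain = unlabeled_loss([([0.5, 0.5], [0.5, 0.5])])
confident = unlabeled_loss([([1.0, 0.0], [1.0, 0.0])])
```

Both example pairs are perfectly consistent, so the difference between the two loss values comes entirely from the entropy term, which penalizes the flat prediction and pushes the model toward sharper outputs.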

[0361] In another embodiment, a processor may load and execute one or more instructions stored in the computer storage medium to implement the corresponding steps of the method in the embodiment illustrated in Figure 6; in a specific implementation, the one or more instructions in the computer storage medium can be loaded by the processor to execute the following steps:

[0362] Obtain the target attribute description data of the target object to be identified and N object pre-detection strategies, where N is a positive integer; an object pre-detection strategy is used to indicate: one or more keywords that need to be associated with the attribute description data of an object under a certain object type;

[0363] Based on the keywords indicated by the N object pre-detection strategies, the target attribute description data is used to perform strategy hit detection on the N object pre-detection strategies;

[0364] If the target attribute description data matches at least one object pre-detection strategy, then a target object recognition model for type prediction of the target object is determined.

[0365] The determined target object recognition model is invoked to predict the type of the target object based on the target attribute description data, obtaining the type prediction result of the target object; whether the target object is an object of interest is then determined based on that type prediction result.
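
The two-stage recognition of [0362] to [0365] can be sketched as follows: keyword strategies act as a preliminary filter, and only the models for the hit object types are invoked. The strategy format, the `models` mapping from object type to a probability-returning callable, and the decision threshold are all assumptions made for this illustration.

```python
def recognize(text, strategies, models, threshold=0.5):
    """Return {object_type: is_object_of_interest} for every strategy hit."""
    lowered = text.lower()
    results = {}
    for strategy in strategies:
        if any(kw.lower() in lowered for kw in strategy["keywords"]):
            object_type = strategy["object_type"]
            probability = models[object_type](text)  # secondary recognition
            results[object_type] = probability >= threshold
    return results

# Hypothetical strategies and stub models that return fixed probabilities.
strategies = [
    {"object_type": "type_a", "keywords": ["alpha", "beta"]},
    {"object_type": "type_b", "keywords": ["gamma"]},
]
models = {"type_a": lambda text: 0.8, "type_b": lambda text: 0.2}
```

An empty result means no strategy was hit, so no model is invoked for that object.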

[0366] In one implementation, the one or more instructions may be loaded and executed by a processor:

[0367] Obtain multiple feedback results for the determined target object recognition model, where each feedback result indicates that the type prediction result obtained by the determined target object recognition model based on the attribute description data of an object is inaccurate.

[0368] The credibility of each of the multiple feedback results is checked in order to select the credible feedback results from the multiple feedback results;

[0369] From the attribute description data corresponding to the credible feedback results, select attribute description data of one or more objects, and determine the type label of each selected object;

[0370] The type label and corresponding attribute description data of each selected object are added to the labeled dataset of the determined target object recognition model, and the determined target object recognition model is adaptively optimized based on the updated labeled dataset.

[0371] The embodiments of this application can effectively improve the model training effect, enabling the trained single target object recognition model to have a strong recognition ability for objects of the corresponding target object type, thus further improving the accuracy of object recognition. In addition, by combining strategies and models to perform type recognition of target objects, the accuracy of object recognition can be effectively improved; moreover, the entire recognition process does not require human intervention, which can effectively improve the efficiency of object recognition.

[0372] It should be noted that, according to one aspect of this application, a computer program product or computer program is also provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various optional implementations of the method embodiments illustrated in Figure 2, Figure 4, or Figure 6.

[0373] Furthermore, it should be understood that the above-disclosed embodiments are merely preferred embodiments of this application and should not be construed as limiting the scope of this application. Therefore, any equivalent variations made in accordance with the claims of this application are still within the scope of this application.

Claims

1. A policy-based model training method, characterized by comprising: Acquiring attribute description data of multiple initial objects for training a benchmark object recognition model, and N object pre-detection strategies, where N is an integer greater than 1. Each object pre-detection strategy indicates one or more keywords that need to be associated with the attribute description data of an object under a specific object type. Each object type is a subtype obtained by subdividing the same object type. The attribute description data of any object includes attribute description text under multiple attribute dimensions. The benchmark object recognition model includes a feature extraction network corresponding to each attribute dimension, a feature joint layer, and a feedforward network. Each feature extraction network independently extracts features from the attribute description text under the corresponding attribute dimension to output text features. The feature joint layer performs joint feature processing on the text features output by each feature extraction network according to an attention mechanism to output joint features. The feedforward network performs object type prediction processing based on the joint features. Based on the keywords indicated by the N object pre-detection strategies, the attribute description data of each initial object is used to perform strategy hit detection on the N object pre-detection strategies. From the plurality of initial objects, the initial objects corresponding to the attribute description data that hit at least one object pre-detection strategy are selected as the sample objects of the benchmark object recognition model. 
Based on the object type of the object pre-detection strategy that the attribute description data of each sample object hits, the attribute description data of each sample object is subjected to data clustering based on object type to obtain multiple datasets, with each dataset corresponding to one object type. The benchmark object recognition model is trained using each dataset to obtain target object recognition models for multiple object types of interest. Each target object recognition model is used to predict, based on the attribute description data of any input object, the probability that the object belongs to the corresponding object type of interest.

2. The method as described in claim 1, characterized in that any object pre-detection strategy is also used to indicate the logical relationship between the corresponding keywords; the step of performing strategy hit detection on the N object pre-detection strategies based on the keywords indicated by the N object pre-detection strategies, using the attribute description data of each initial object, includes: For the attribute description data of any initial object, traverse the N object pre-detection strategies and take the currently traversed strategy as the current object pre-detection strategy. Based on the keywords and logical relationships in the current object pre-detection strategy, determine the target keywords that the attribute description data of the initial object needs to hit, and search for the target keywords in the attribute description data of the initial object; If the target keywords are found, it is determined that the attribute description data of the initial object hits the current object pre-detection strategy; if the target keywords are not found, traversal of the N object pre-detection strategies continues.

3. The method as described in claim 1, characterized in that training the benchmark object recognition model using each dataset to obtain target object recognition models for multiple object types of interest includes: Based on the attribute description data in any dataset, construct Q labeled data pairs and P unlabeled data pairs, where Q and P are both positive integers. A labeled data pair includes: a type label of a labeled sample object and the corresponding attribute description data; an unlabeled data pair includes: attribute description data of an unlabeled sample object and augmented data obtained by augmenting the attribute description data. The benchmark object recognition model is invoked to perform type prediction on the corresponding labeled sample object based on the attribute description data in each labeled data pair, obtaining the target type prediction result of each labeled sample object. According to the prediction objective of type consistency, the benchmark object recognition model is invoked to perform type prediction on the corresponding unlabeled sample object based on the attribute description data and the corresponding augmented data in each unlabeled data pair, obtaining two type prediction results for each unlabeled sample object. Based on the target type prediction result and corresponding type label for each labeled sample object, and the difference between the two type prediction results for each unlabeled sample object, the model parameters of the benchmark object recognition model are optimized to obtain the target object recognition model for the target object type corresponding to any dataset.

4. The method as described in claim 3, characterized in that the construction of Q labeled data pairs and P unlabeled data pairs based on attribute description data from any dataset includes: Attribute description data of multiple sample objects are selected from any dataset to construct a target training set. Based on the attribute description data of each sample object in the target training set, soft deduplication is performed on the multiple sample objects in the target training set to obtain Q sample objects. Obtain the type labels of the Q sample objects, and treat the Q sample objects as Q labeled sample objects; and construct Q labeled data pairs using the type labels of the Q labeled sample objects and the corresponding attribute description data; From the remaining sample objects excluding the Q sample objects, select P unlabeled sample objects; and perform data perturbation augmentation processing on the attribute description data of each unlabeled sample object to obtain augmented data for each unlabeled sample object; Using the attribute description data and corresponding augmented data of each unlabeled sample object, P unlabeled data pairs are constructed.

5. The method as described in claim 4, characterized in that performing soft deduplication on the multiple sample objects in the target training set based on the attribute description data of each sample object in the target training set to obtain Q sample objects includes: Based on the attribute description data of each sample object in the target training set, determine the object features of each sample object; Construct a locality-sensitive hash (LSH) pool that includes one or more feature buckets, and feed the object features of the sample objects into the feature buckets of the LSH pool in a streaming manner. Determine the current object features of the current sample object to be entered into the LSH pool, perform hash mapping on the current object features using a locality-sensitive hash function, and allocate a target feature bucket in the LSH pool for the current object features based on the hash mapping result. Based on the feature similarity between the current object features and the existing historical object features in the target feature bucket, detect sample objects similar to the current sample object among the sample objects corresponding to the historical object features; If a similar sample object is detected, the current object features are placed into the target feature bucket; if no similar sample object is detected, the current object features are placed into the target feature bucket and the current sample object is added to the set of objects to be labeled. After the object features of each sample object have entered the LSH pool, the sample objects in the set of objects to be labeled are determined as the Q sample objects obtained by soft deduplication of the multiple sample objects.

6. The method as described in claim 5, characterized in that, If the similar sample object is detected, the method further includes: Calculate the object similarity between the current sample object and the similar sample object based on the attribute description data of the current sample object and the attribute description data of the similar sample objects; If the object similarity is less than the similarity threshold, the current sample object is added to the set of objects to be labeled.

7. The method as described in claim 5, characterized in that, The attribute description data includes multiple attribute description texts; determining the object features of each sample object based on the attribute description data of each sample object in the target training set includes: For any sample object in the target training set, the attribute description text in the attribute description data of any sample object that uniquely describes the object attribute of any sample object is taken as the target attribute description text of any sample object. The target attribute description text of each sample object in the target training set is segmented into words to obtain the text words corresponding to each sample object; and the word frequency matrix of each sample object is constructed using the text words corresponding to each sample object. Dimensionality reduction hashing is performed on the word frequency matrix of each sample object to obtain the dimension reduction hash value of each sample object; and the dimension reduction hash value of each sample object is determined as the object feature of each sample object.

8. The method as described in claim 4, characterized in that selecting P unlabeled sample objects from the remaining sample objects, other than the Q sample objects, among the plurality of sample objects includes: training the benchmark object recognition model in a supervised manner using the Q pieces of labeled data to obtain an initial object recognition model; determining the remaining sample objects, other than the Q sample objects, among the plurality of sample objects, and taking each remaining sample object as a candidate sample object; invoking the initial object recognition model to predict the type of each candidate sample object based on the attribute description data of that candidate sample object, and determining the type pseudo-label of each candidate sample object according to its type prediction result; and performing type equalization on the candidate sample objects based on their type pseudo-labels, and selecting, based on the type equalization result, P candidate sample objects from all candidate sample objects as the P unlabeled sample objects.
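
The type-equalization selection in claim 8 could, for example, be realized by drawing candidates round-robin across pseudo-label groups. This is a sketch under that assumption: the claim does not prescribe how equalization is performed, and `pseudo_label` and `select_balanced` are hypothetical names.

```python
from collections import defaultdict
from itertools import zip_longest


def pseudo_label(type_probabilities):
    """Pseudo-label = the type with the highest predicted probability."""
    return max(type_probabilities, key=type_probabilities.get)


def select_balanced(pseudo_labels, p):
    """Pick up to `p` candidate ids, cycling over types so none dominates.

    pseudo_labels: dict mapping candidate id -> pseudo-label.
    """
    by_type = defaultdict(list)
    for candidate_id, label in pseudo_labels.items():
        by_type[label].append(candidate_id)
    selected = []
    # Each round takes at most one candidate per type.
    for round_ids in zip_longest(*by_type.values()):
        for candidate_id in round_ids:
            if candidate_id is not None and len(selected) < p:
                selected.append(candidate_id)
    return selected


labels = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "C"}
chosen = select_balanced(labels, p=4)
```

With three candidates of type A and one each of types B and C, the first round takes one of each type before any second type-A candidate is admitted.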

9. The method according to any one of claims 4-8, characterized in that each attribute description text in any attribute description data is text expressed in a first language, and the augmentation processing that perturbs the attribute description data of each unlabeled sample object to obtain the augmented data of each unlabeled sample object includes: selecting at least one attribute description text from the attribute description data of the p-th unlabeled sample object, where p∈[1,P]; translating each selected attribute description text into text expressed in a second language to obtain the translation result corresponding to each selected attribute description text; back-translating the translation result corresponding to each selected attribute description text into text expressed in the first language to obtain the back-translation result of each selected attribute description text; and constructing the augmented data of the p-th unlabeled sample object from the back-translation results of the selected attribute description texts and the unselected attribute description texts in the attribute description data of the p-th unlabeled sample object.
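
As a rough sketch of the back-translation augmentation in claim 9, the function below takes the translator as an injected callable, since the claim names no translation service. `back_translate_augment`, `toy_translate`, the language codes, and the lookup table are all hypothetical and exist only to make the example runnable.

```python
def back_translate_augment(attribute_texts, selected_keys, translate,
                           first_lang="en", second_lang="fr"):
    """Back-translate the selected texts; keep unselected texts unchanged."""
    augmented = dict(attribute_texts)
    for key in selected_keys:
        intermediate = translate(attribute_texts[key], first_lang, second_lang)
        augmented[key] = translate(intermediate, second_lang, first_lang)
    return augmented


def toy_translate(text, source_lang, target_lang):
    """Toy stand-in for a real translation system (lookup table only)."""
    table = {
        ("en", "fr", "cheap phone"): "telephone pas cher",
        ("fr", "en", "telephone pas cher"): "inexpensive phone",
    }
    return table.get((source_lang, target_lang, text), text)


original = {"title": "cheap phone", "brand": "Acme"}
augmented = back_translate_augment(original, ["title"], toy_translate)
```

The round trip yields a paraphrase of the selected text ("inexpensive phone") while the unselected "brand" text passes through untouched, which is the perturbation the claim describes.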

10. The method according to any one of claims 3-8, characterized in that optimizing the model parameters of the benchmark object recognition model based on the target type prediction result and the corresponding type label of each labeled sample object, and on the difference between the two type prediction results of each unlabeled sample object, includes: determining the labeled loss value of the benchmark object recognition model based on the target type prediction result and the corresponding type label of each labeled sample object; determining the unlabeled loss value of the benchmark object recognition model based on the difference between the two type prediction results of each unlabeled sample object; and calculating a joint loss value from the labeled loss value and the unlabeled loss value to obtain the model loss value of the benchmark object recognition model, and optimizing the model parameters of the benchmark object recognition model based on the model loss value.
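
The joint loss of claim 10 might be computed along the following lines. The claim fixes neither loss function, so cross-entropy for the labeled term, KL divergence for the "difference between the two type prediction results", and the `unlabeled_weight` factor are assumptions, as are all function names.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true type label."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two probability lists; assumes q has no zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def joint_loss(labeled, unlabeled, unlabeled_weight=1.0):
    """labeled: [(probs, label)], unlabeled: [(probs_original, probs_augmented)]."""
    labeled_loss = mean(cross_entropy(p, y) for p, y in labeled)
    unlabeled_loss = mean(kl_divergence(p, q) for p, q in unlabeled)
    return labeled_loss + unlabeled_weight * unlabeled_loss


loss = joint_loss(
    labeled=[([0.5, 0.5], 0)],
    unlabeled=[([0.5, 0.5], [0.5, 0.5])],
)
```

Here the two unlabeled predictions agree, so the consistency term vanishes and the joint loss reduces to the labeled cross-entropy, ln 2.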

11. The method as described in claim 10, characterized in that optimizing the model parameters of the benchmark object recognition model based on the model loss value includes: performing gradient backpropagation on the benchmark object recognition model based on the model loss value to obtain the backpropagated gradient of the benchmark object recognition model; determining the historical learning rate of the benchmark object recognition model, and decaying the historical learning rate on a regular schedule to obtain a target learning rate; and optimizing the model parameters of the benchmark object recognition model based on the backpropagated gradient and the target learning rate.
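
A minimal sketch of the update in claim 11, assuming step decay (multiply the learning rate by a constant every fixed number of steps) and a plain gradient-descent update. The claim specifies neither the decay schedule nor the optimizer, so `decay_every`, `decay_factor`, and both function names are hypothetical.

```python
def decayed_learning_rate(historical_lr, step, decay_every=100, decay_factor=0.5):
    """Return the target learning rate for this step under assumed step decay."""
    if step > 0 and step % decay_every == 0:
        return historical_lr * decay_factor
    return historical_lr


def gradient_step(params, grads, learning_rate):
    """Plain gradient descent: move each parameter against its gradient."""
    return [w - learning_rate * g for w, g in zip(params, grads)]


target_lr = decayed_learning_rate(historical_lr=0.1, step=100)
new_params = gradient_step([1.0, 2.0], [0.5, -1.0], target_lr)
```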

12. The method as described in claim 10, characterized in that the benchmark object recognition model includes at least two type labels, and the target type prediction result of any labeled sample object includes the predicted probability that the labeled sample object belongs to the object type indicated by each type label in the benchmark object recognition model; and determining the labeled loss value of the benchmark object recognition model based on the target type prediction result and the corresponding type label of each labeled sample object includes: traversing the Q predicted target type prediction results, and, if the maximum predicted probability in the currently traversed target type prediction result is greater than a first threshold, determining the current target type prediction result as a mitigation signal; after all Q predicted target type prediction results have been traversed, using the determined mitigation signals to perform signal mitigation processing on the Q target type prediction results, so as to remove the mitigation signals from the Q target type prediction results; and determining the labeled loss value of the benchmark object recognition model based on the difference between the type label corresponding to the maximum predicted probability in each remaining target type prediction result and the type label of the corresponding labeled sample object.
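
The filtering in claim 12 can be pictured as dropping labeled predictions the model is already very confident about before computing the labeled loss. In the sketch below, cross-entropy over the remaining predictions is an assumption (the claim only speaks of a "difference" between labels), and `remove_confident`, `labeled_loss`, and the default threshold are hypothetical.

```python
import math


def remove_confident(predictions, first_threshold):
    """Drop predictions whose maximum probability exceeds the threshold."""
    return [
        (probs, label)
        for probs, label in predictions
        if max(probs) <= first_threshold
    ]


def labeled_loss(predictions, first_threshold=0.9):
    """Mean cross-entropy over the predictions that survive the filter."""
    remaining = remove_confident(predictions, first_threshold)
    if not remaining:
        return 0.0
    return sum(-math.log(probs[label]) for probs, label in remaining) / len(remaining)


predictions = [
    ([0.95, 0.05], 0),  # above the threshold: removed from the loss
    ([0.60, 0.40], 1),  # kept
]
loss = labeled_loss(predictions)
```

Only the second prediction contributes, so the loss equals -ln 0.4; the intent is to keep the model from overfitting the small labeled set on examples it already handles.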

13. The method as described in claim 10, characterized in that determining the unlabeled loss value of the benchmark object recognition model based on the difference between the two type prediction results of each unlabeled sample object includes: taking the two type prediction results of each unlabeled sample object as the two type signals of that unlabeled sample object; performing signal sharpening processing on the two type signals of each unlabeled sample object according to a signal sharpening strategy to obtain a signal sharpening result; and determining the unlabeled loss value of the benchmark object recognition model based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object.

14. The method as described in claim 13, characterized in that the benchmark object recognition model includes at least two type labels, and a type signal of any unlabeled sample object includes the predicted probability that the unlabeled sample object belongs to the object type indicated by each type label in the benchmark object recognition model; the signal sharpening strategy includes masking processing based on the predicted probability; and performing signal sharpening processing on the two type signals of each unlabeled sample object according to the signal sharpening strategy to obtain the signal sharpening result includes: traversing the P unlabeled sample objects, and, if the maximum predicted probability of at least one of the two type signals of the currently traversed unlabeled sample object is less than a second threshold, performing masking processing on the current unlabeled sample object and its two type signals; and, after all P unlabeled sample objects have been traversed, adding each masked unlabeled sample object to the signal sharpening result.

15. The method as described in claim 14, characterized in that determining the unlabeled loss value of the benchmark object recognition model based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object includes: taking every unlabeled sample object among the P unlabeled sample objects that is not in the signal sharpening result as a valid unlabeled sample object; calculating, based on the difference between the two type signals of each valid unlabeled sample object, the type consistency loss value corresponding to that valid unlabeled sample object; and determining the unlabeled loss value of the benchmark object recognition model based on the type consistency loss value corresponding to each valid unlabeled sample object.
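
Claims 14 and 15 together describe confidence-based masking followed by a consistency loss over the unmasked objects. The sketch below assumes KL divergence as the consistency measure and a mean as the aggregation, neither of which the claims specify; `split_by_confidence`, `masked_unlabeled_loss`, and the default threshold are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two probability lists; assumes q has no zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def split_by_confidence(signals, second_threshold):
    """Return (masked indices, valid indices) for pairs of type signals."""
    masked, valid = [], []
    for index, (original, augmented) in enumerate(signals):
        # Mask when at least one of the two signals is not confident enough.
        if min(max(original), max(augmented)) < second_threshold:
            masked.append(index)
        else:
            valid.append(index)
    return masked, valid


def masked_unlabeled_loss(signals, second_threshold=0.7):
    """Mean consistency loss over the valid (unmasked) sample objects."""
    _, valid = split_by_confidence(signals, second_threshold)
    if not valid:
        return 0.0
    return sum(kl_divergence(*signals[i]) for i in valid) / len(valid)


signals = [
    ([0.90, 0.10], [0.80, 0.20]),  # both confident: valid
    ([0.55, 0.45], [0.90, 0.10]),  # first signal too uncertain: masked
]
masked, valid = split_by_confidence(signals, 0.7)
loss = masked_unlabeled_loss(signals)
```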

16. The method as described in claim 13, characterized in that the benchmark object recognition model includes at least two type labels, and a type signal of any unlabeled sample object includes the predicted probability that the unlabeled sample object belongs to the object type indicated by each type label in the benchmark object recognition model; the signal sharpening strategy includes minimizing the information entropy of the type signal obtained based on the augmented data; and performing signal sharpening processing on the two type signals of each unlabeled sample object according to the signal sharpening strategy to obtain the signal sharpening result includes: for any unlabeled sample object, determining, from the two type signals of the unlabeled sample object, the type signal predicted based on the augmented data of the unlabeled sample object; calculating the information entropy of the augmented data of the unlabeled sample object based on the type labels and the corresponding predicted probabilities in the determined type signal; and adding the calculated information entropy of the augmented data of the unlabeled sample object to the signal sharpening result.

17. The method as described in claim 16, characterized in that determining the unlabeled loss value of the benchmark object recognition model based on the signal sharpening result and the difference between the two type signals of at least one unlabeled sample object includes: calculating, based on the difference between the two type signals of each unlabeled sample object, the type consistency loss value corresponding to that unlabeled sample object; and summing the information entropies in the signal sharpening result and the type consistency loss values corresponding to the unlabeled sample objects to obtain the unlabeled loss value of the benchmark object recognition model.
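
The entropy-based variant in claims 16 and 17 adds the entropy of the augmented-data prediction to the consistency loss, so that minimizing the total also pushes those predictions toward low entropy. In this sketch KL divergence again stands in for the unspecified "difference", and `entropy` and `entropy_unlabeled_loss` are hypothetical names.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a predicted type distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(p, q):
    """KL(p || q) for two probability lists; assumes q has no zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def entropy_unlabeled_loss(signals):
    """Sum of consistency losses and augmented-signal entropies.

    signals: [(signal_from_original_data, signal_from_augmented_data)].
    """
    total = 0.0
    for original, augmented in signals:
        total += kl_divergence(original, augmented)  # type consistency loss
        total += entropy(augmented)                  # entropy of augmented signal
    return total


loss = entropy_unlabeled_loss([([0.5, 0.5], [0.5, 0.5])])
```

For two identical uniform signals the consistency term is zero and the loss is just the entropy of the augmented signal, ln 2.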

18. The method according to any one of claims 3-8, characterized in that invoking the benchmark object recognition model to predict the type of the corresponding labeled sample object based on the attribute description data in each piece of labeled data, to obtain the target type prediction result of each labeled sample object, includes: for any labeled sample object, invoking each feature extraction network in the benchmark object recognition model to independently extract features from the attribute description text under the corresponding attribute dimension in the attribute description data of the corresponding labeled data, so as to obtain the text features of each attribute description text; invoking the feature joint layer to perform joint feature processing on the text features of the attribute description texts according to an attention mechanism, so as to obtain joint features; and invoking the feedforward network to predict the type of the labeled sample object based on the joint features, so as to obtain the target type prediction result of the labeled sample object.
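
The three-stage forward pass of claim 18 (per-dimension feature extraction, attention-based feature joining, feedforward prediction) is sketched below in a heavily simplified form. Everything here is an assumption made for illustration: one character-hashing function stands in for all the separate feature extraction networks, a fixed `query` vector replaces learned attention parameters, and the feedforward network is a single linear layer with softmax.

```python
import math


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def extract_features(text, dim):
    """Toy extractor: normalized character-hash histogram of length `dim`."""
    vector = [0.0] * dim
    for char in text:
        vector[ord(char) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def attention_join(features, query):
    """Weight each attribute's features by softmax(dot(feature, query))."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    return [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(len(query))
    ]


def feed_forward(joint, weight_rows, biases):
    """Single linear layer followed by softmax over object types."""
    logits = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(weight_rows, biases)
    ]
    return softmax(logits)


def predict_types(attribute_texts, query, weight_rows, biases):
    features = [extract_features(t, len(query)) for t in attribute_texts.values()]
    return feed_forward(attention_join(features, query), weight_rows, biases)


probs = predict_types(
    {"title": "cheap phone", "description": "brand new"},
    query=[0.1, 0.2, 0.3, 0.4],
    weight_rows=[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    biases=[0.0, 0.0],
)
```

The output is one probability per type label; in a trained model the extractors, the attention parameters, and the feedforward weights would all be learned rather than fixed as they are here.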

19. A policy- and model-based object recognition method, characterized by comprising: obtaining the target attribute description data of a target object to be recognized and N object pre-detection strategies, where N is an integer greater than 1, wherein an object pre-detection strategy is used to indicate one or more keywords that need to be associated with the attribute description data of an object under a certain object-of-interest type, and different object-of-interest types are object types obtained by subdividing an object of interest; performing strategy hit detection on the N object pre-detection strategies using the target attribute description data, based on the keywords indicated by the N object pre-detection strategies; if the target attribute description data hits at least one object pre-detection strategy, determining a target object recognition model for type prediction of the target object, wherein the target object recognition model is generated using the method described in any one of claims 1-18; invoking the determined target object recognition model to predict the type of the target object based on the target attribute description data, so as to obtain the type prediction result of the target object; and determining, based on the type prediction result of the target object, whether the target object is an object of interest.
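
The two-stage flow of claim 19 (keyword strategy hit first, model prediction second) might look like the sketch below. Several points are assumptions: a strategy is treated as hit when any one of its keywords appears in the text, each object type has one model exposed as a callable returning a probability, and the decision uses a fixed threshold. The type names, keywords, and stub models are invented for the example.

```python
def hit_object_types(attribute_texts, strategies):
    """Return the object types whose strategy keywords appear in the texts."""
    text = " ".join(attribute_texts.values()).lower()
    return [
        object_type
        for object_type, keywords in strategies.items()
        if any(keyword.lower() in text for keyword in keywords)
    ]


def recognize(attribute_texts, strategies, models, threshold=0.5):
    """Pre-detect with keywords, then confirm with the matching models."""
    hits = hit_object_types(attribute_texts, strategies)
    if not hits:
        return False, {}   # no strategy hit: the models are never invoked
    scores = {t: models[t](attribute_texts) for t in hits}
    return any(s >= threshold for s in scores.values()), scores


strategies = {"type_a": ["replica", "1:1"], "type_b": ["casino"]}
models = {
    "type_a": lambda texts: 0.92,  # stub standing in for a trained model
    "type_b": lambda texts: 0.10,
}
item = {"title": "Replica watch", "description": "high quality"}
is_of_interest, scores = recognize(item, strategies, models)
```

Objects that hit no strategy skip model inference entirely, which is where the efficiency gain of the pre-detection stage comes from.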

20. The method as described in claim 19, characterized in that the method further includes: obtaining multiple feedback results for the determined target object recognition model, where one feedback result indicates that the type prediction result obtained by the determined target object recognition model based on the attribute description data of an object is inaccurate; checking the credibility of each of the multiple feedback results so as to filter out the credible feedback results from the multiple feedback results; selecting the attribute description data of one or more objects from the attribute description data corresponding to the credible feedback results, and determining the type label of each selected object; and adding the type label and the corresponding attribute description data of each selected object to the labeled dataset of the determined target object recognition model, and adaptively optimizing the determined target object recognition model based on the expanded labeled dataset.

21. A policy-based model training device, characterized by comprising: an acquisition unit, used to acquire attribute description data of multiple initial objects for training a benchmark object recognition model, and N object pre-detection strategies, where N is an integer greater than 1, wherein each object pre-detection strategy indicates one or more keywords that need to be associated with the attribute description data of an object under a specific object type; each object type is an object type obtained by subdividing the same object type; the attribute description data of any object includes attribute description texts under multiple attribute dimensions; the benchmark object recognition model includes a feature extraction network corresponding to each attribute dimension, a feature joint layer, and a feedforward network; each feature extraction network is used to independently extract features from the attribute description text under the corresponding attribute dimension to output text features; the feature joint layer is used to perform joint feature processing on the text features output by the feature extraction networks according to an attention mechanism to output joint features; and the feedforward network is used to perform object type prediction processing based on the joint features; a processing unit, used to perform strategy hit detection on the N object pre-detection strategies using the attribute description data of each initial object, based on the keywords indicated by the N object pre-detection strategies; the processing unit being further used to select, from the multiple initial objects, the initial objects whose attribute description data hits at least one object pre-detection strategy as sample objects of the benchmark object recognition model; the processing unit being further used to perform data clustering processing on the attribute description data of each sample object based on the object type corresponding to the object pre-detection strategy hit by the attribute description data of that sample object, to obtain multiple datasets, each dataset corresponding to one object type; and a training unit, used to train the benchmark object recognition model using each dataset to obtain target object recognition models for multiple object-of-interest types, wherein a target object recognition model is used to predict, based on the attribute description data of any input object, the probability that the object belongs to the corresponding object-of-interest type.

22. A policy- and model-based object recognition device, characterized by comprising: an acquisition unit, used to acquire the target attribute description data of a target object to be recognized and N object pre-detection strategies, where N is an integer greater than 1, wherein an object pre-detection strategy is used to indicate one or more keywords that need to be associated with the attribute description data of an object under a certain object type; and an identification unit, used to perform strategy hit detection on the N object pre-detection strategies using the target attribute description data, based on the keywords indicated by the N object pre-detection strategies; the identification unit being further used to determine a target object recognition model for type prediction of the target object if the target attribute description data hits at least one object pre-detection strategy, wherein the target object recognition model is generated using the method described in any one of claims 1-18; and the identification unit being further used to invoke the determined target object recognition model to predict the type of the target object based on the target attribute description data, obtain the type prediction result of the target object, and determine, based on the type prediction result of the target object, whether the target object is an object of interest.

23. A computer device, comprising an input interface and an output interface, characterized by further comprising: a processor, adapted to implement one or more instructions; and a computer storage medium, wherein the computer storage medium stores one or more instructions, and the one or more instructions are adapted to be loaded by the processor to execute the policy-based model training method as described in any one of claims 1-18; or the one or more instructions are adapted to be loaded by the processor to execute the policy- and model-based object recognition method as described in any one of claims 19-20.

24. A computer storage medium, characterized in that the computer storage medium stores one or more instructions, and the one or more instructions are adapted to be loaded by a processor to execute the policy-based model training method as described in any one of claims 1-18; or the one or more instructions are adapted to be loaded by the processor to execute the policy- and model-based object recognition method as described in any one of claims 19-20.

25. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the policy-based model training method as described in any one of claims 1-18; or, when the computer program is executed by a processor, it implements the policy- and model-based object recognition method as described in any one of claims 19-20.