Data object processing method, electronic device, and storage medium
By determining the feature representation and mapping representation of data objects, the problem of low accuracy in HS coding classification is solved, achieving higher classification accuracy and making it suitable for determining coding information in HS coding classification scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG CAINIAO SUPPLY CHAIN MANAGEMENT CO LTD
- Filing Date
- 2022-03-28
- Publication Date
- 2026-06-26
AI Technical Summary
The existing HS coding classification method has the problem of low classification accuracy, which leads to quality problems in related services such as commodity management, taxation, and inspection standards.
By determining the first feature representation of the data object, the second feature representation and the third feature representation of the candidate category information, and determining the first mapping representation and the second mapping representation based on the relationship information of the object attributes, the matching information between the data object and the candidate category information is improved, thereby realizing the determination of the target candidate category information.
This improves the accuracy of HS coding classification by mitigating the problem of different products of the same category being assigned the same feature representation.

Figure CN116881373B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of communication technology, and in particular to a method for processing data objects, an electronic device, and a storage medium.
Background Technology
[0002] With the development of communication technology, the classification of data objects such as customs clearance commodities is becoming increasingly important. The World Customs Organization has developed the HS (Harmonized System) code, which uses digital codes to represent and identify goods in cross-border trade. HS code classification is the process of finding the HS code for a commodity to be classified based on its information. As a universal category identifier for customs clearance commodities, the HS code is the fundamental basis for customs to conduct commodity classification management, review tax standards, and inspect commodity quality indicators. Inconsistencies between the declared HS code and the actual commodity category can cause a series of quality problems in commodity management models, tax collection, the application of inspection standards, billing, statistics, and other related services.
[0003] Current HS coding classification methods typically first use mathematical models to determine the feature representations of data objects, and then determine the corresponding coding information based on these feature representations. However, current HS coding classification methods suffer from low classification accuracy.
Summary of the Invention
[0004] This application provides a data object processing method that can improve classification accuracy.
[0005] Correspondingly, embodiments of this application also provide a data object processing device, an electronic device, and a storage medium to implement and apply the above-described method.
[0006] To address the aforementioned problems, this application discloses a method for processing data objects, the method comprising:
[0007] Determine the first feature representation corresponding to the data object, and the second and third feature representations corresponding to the candidate category information; wherein the second feature representation corresponds to the attribute content of the object attribute; and the third feature representation corresponds to the object attribute.
[0008] Based on the relational information corresponding to the object attribute, determine a first mapping representation of the first feature representation and a second mapping representation of the second feature representation under the condition of the relational information, and determine the matching information between the data object and the candidate category information based on the first mapping representation, the second mapping representation, and the third feature representation;
[0009] Based on the matching information, the target candidate category information corresponding to the data object is determined.
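The three claimed steps can be pictured with a minimal pure-Python sketch. It assumes a TransR-style matching function, the distance ||Ie·Mp + Pe - Ve·Mp|| borrowed from knowledge graph embedding rather than stated in the claim, one relation matrix Mp per object attribute, and that a candidate's matching information is the sum of its per-attribute distances. All function names and the candidate codes in the usage are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # shape k x d: entity space (k) -> relation space (d)


def project(vec: Sequence[float], mp: Matrix) -> Vector:
    """Row vector times matrix: maps a k-dim feature into the d-dim relation space."""
    return [sum(vec[i] * mp[i][j] for i in range(len(vec))) for j in range(len(mp[0]))]


def attribute_distance(ie: Sequence[float], pe: Sequence[float],
                       ve: Sequence[float], mp: Matrix) -> float:
    """Assumed matching term ||Ie*Mp + Pe - Ve*Mp||; smaller means a better match."""
    iep = project(ie, mp)  # first mapping representation (data object)
    vep = project(ve, mp)  # second mapping representation (attribute content)
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(iep, pe, vep)))


def pick_target(ie: Sequence[float],
                candidates: Dict[str, List[Tuple[str, Vector, Vector]]],
                relations: Dict[str, Matrix]) -> Tuple[str, Dict[str, float]]:
    """candidates: code -> [(attribute name, Pe, Ve), ...]; relations: attribute name -> Mp."""
    scores: Dict[str, float] = {}
    for code, attrs in candidates.items():
        scores[code] = sum(attribute_distance(ie, pe, ve, relations[name])
                           for name, pe, ve in attrs)
    best = min(scores, key=scores.get)  # smallest total distance = target candidate
    return best, scores
```

With Ie = [1, 0], an identity Mp for the attribute "category", and two candidates sharing Pe = [0, 1] but with Ve = [1, 1] and Ve = [0, 0], the first candidate scores 0 and the second sqrt(2), so the first is chosen. In a real system Ie, Pe, Ve, and Mp would be learned rather than written by hand.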
[0010] To address the aforementioned problems, this application discloses a method for processing data objects, the method comprising:
[0011] Determine triplet object samples; the triplet object samples include: a first object sample, a second object sample, and a third object sample; wherein, the second object sample corresponds to the same category information as the first object sample; the third object sample corresponds to different category information than the first object sample;
[0012] Based on the feature representations and attribute weights corresponding to the triplet object samples, determine the first matching information between the first object sample and the second object sample, and the second matching information between the first object sample and the third object sample; the feature representation corresponding to a triplet object sample corresponds to the attribute weights, the feature representations of the object attributes, and the feature representations of the attribute contents;
[0013] The attribute weights are updated based on the mapping relationship between the loss information, the first matching information, and the second matching information.
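One way to read [0011] to [0013] is as triplet-loss training of per-attribute weights. The sketch below assumes that each object sample is a dict of per-attribute feature vectors, that matching information is a weighted sum of per-attribute cosine similarities, and that the loss is the hinge max(0, margin - m_pos + m_neg). These choices, the function names, and the hyperparameters are illustrative assumptions, not formulas given in the application.

```python
import math
from typing import Dict, List

Sample = Dict[str, List[float]]  # object attribute name -> feature vector


def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def matching(s1: Sample, s2: Sample, weights: Dict[str, float]) -> float:
    """Assumed matching information: attribute-weighted sum of cosine similarities."""
    return sum(w * cosine(s1[k], s2[k]) for k, w in weights.items())


def triplet_step(anchor: Sample, positive: Sample, negative: Sample,
                 weights: Dict[str, float], margin: float = 0.5,
                 lr: float = 0.1) -> float:
    """One gradient step on the hinge triplet loss; updates `weights` in place."""
    m_pos = matching(anchor, positive, weights)  # first matching information
    m_neg = matching(anchor, negative, weights)  # second matching information
    loss = max(0.0, margin - m_pos + m_neg)
    if loss > 0.0:
        for k in weights:
            # d(loss)/d(w_k) = -(cos_pos_k - cos_neg_k) while the hinge is active
            grad = -(cosine(anchor[k], positive[k]) - cosine(anchor[k], negative[k]))
            weights[k] -= lr * grad
    return loss
```

If the positive sample agrees with the anchor on "category" but not on "material", and the negative sample does the opposite, one step raises the "category" weight and lowers the "material" weight. The model thus learns which attributes separate same-code samples from different-code ones.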
[0014] To address the aforementioned problems, this application discloses a method for processing data objects, the method comprising:
[0015] Based on the first data analyzer, the first category information corresponding to the data object is determined; the first category information corresponds to a portion of the category information; the first data analyzer is used to characterize the mapping relationship between the object information and the first category information;
[0016] Obtain candidate category information that matches the first category information from the category information corresponding to the object sample;
[0017] Based on the matching information between the data object and the candidate category information, the target candidate category information corresponding to the data object is determined.
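The coarse-to-fine flow of [0015] to [0017] can be sketched as follows. The sketch assumes that the first category information is a code prefix such as a 2-digit chapter, and that the candidates are the object-sample codes sharing that prefix. The first data analyzer and the matching score are passed in as plain functions; the toy versions below are hypothetical stand-ins for trained models, and the codes are dummy values.

```python
from typing import Callable, Iterable, Optional


def two_stage_classify(obj_text: str,
                       first_stage: Callable[[str], str],
                       sample_codes: Iterable[str],
                       match_score: Callable[[str, str], float]) -> Optional[str]:
    # Stage 1: coarse category (assumed here to be a code prefix, e.g. a chapter).
    prefix = first_stage(obj_text)
    # Stage 2: keep only object-sample codes consistent with the coarse category.
    candidates = sorted({code for code in sample_codes if code.startswith(prefix)})
    if not candidates:
        return None
    # Stage 3: the candidate with the highest matching score is the target.
    return max(candidates, key=lambda code: match_score(obj_text, code))


# Hypothetical stand-ins for the trained first data analyzer and matcher.
def toy_first_stage(text: str) -> str:
    return "34" if "cleanser" in text else "61"


TOY_SCORES = {"3400000001": 0.2, "3400000002": 0.9, "6100000001": 0.99}


def toy_match_score(text: str, code: str) -> float:
    return TOY_SCORES.get(code, 0.0)
```

For example, `two_stage_classify("facial cleanser 100ml", toy_first_stage, TOY_SCORES, toy_match_score)` returns "3400000002". The code "6100000001" has a higher toy score but is pruned by the first stage, which is the point of restricting the candidate set before matching.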
[0018] To address the aforementioned problems, this application discloses a data object processing apparatus, the apparatus comprising:
[0019] The feature determination module is used to determine the first feature representation corresponding to the data object, and the second and third feature representations corresponding to the candidate category information; wherein, the second feature representation corresponds to the attribute content of the object attribute; and the third feature representation corresponds to the object attribute.
[0020] The mapping and matching module is used to determine, based on the relational information corresponding to the object attribute, a first mapping representation of the first feature representation and a second mapping representation of the second feature representation under the condition of the relational information, and to determine the matching information between the data object and the candidate category information based on the first mapping representation, the second mapping representation, and the third feature representation;
[0021] The target category determination module is used to determine the target candidate category information corresponding to the data object based on the matching information.
[0022] To address the aforementioned problems, this application discloses a data object processing apparatus, the apparatus comprising:
[0023] A sample determination module is used to determine triplet object samples; the triplet object samples include: a first object sample, a second object sample, and a third object sample; wherein, the second object sample corresponds to the same category information as the first object sample; and the third object sample corresponds to different category information than the first object sample.
[0024] The matching module is used to determine, based on the feature representations and attribute weights corresponding to the triplet object samples, the first matching information between the first object sample and the second object sample, and the second matching information between the first object sample and the third object sample; the feature representation corresponding to a triplet object sample corresponds to the attribute weights, the feature representations of the object attributes, and the feature representations of the attribute contents;
[0025] The update module is used to update the attribute weights based on the mapping relationship between the loss information, the first matching information, and the second matching information.
[0026] To address the aforementioned problems, this application discloses a data object processing apparatus, the apparatus comprising:
[0027] The first category determination module is used to determine the first category information corresponding to the data object based on the first data analyzer; the first category information corresponds to a portion of the category information; the first data analyzer is used to characterize the mapping relationship between the object information and the first category information;
[0028] The candidate category determination module is used to obtain candidate category information that matches the first category information from the category information corresponding to the object sample;
[0029] The target category determination module is used to determine the target candidate category information corresponding to the data object based on the matching information between the data object and the candidate category information.
[0030] To address the aforementioned problems, this application discloses an electronic device, including: a processor; and a memory storing executable code thereon, wherein when the executable code is executed, the processor performs the method as described in any of the above embodiments.
[0031] To address the aforementioned issues, embodiments of this application disclose one or more machine-readable media storing executable code thereon, which, when executed, causes a processor to perform the method as described in any of the above embodiments.
[0032] Compared with the prior art, the embodiments of this application have the following advantages:
[0033] In the technical solution of the embodiments of this application, determining the first mapping representation of the first feature representation and the second mapping representation of the second feature representation under the condition of the relational information maps both feature representations into the relation space represented by the relational information corresponding to the object attribute. This can, to a certain extent, avoid the problem of different products of the same category corresponding to the same feature representation. On this basis, the embodiments of this application determine the target candidate category information corresponding to the data object according to the first mapping representation and the second mapping representation, which can improve classification accuracy.
Attached Figure Description
[0034] Figure 1 is a flowchart of the steps of a data object processing method according to an embodiment of this application;
[0035] Figure 2 is an example of encoded information according to an embodiment of this application;
[0036] Figure 3 is an example of a knowledge graph according to an embodiment of this application;
[0037] Figure 4 is a flowchart of the steps of a data object processing method according to an embodiment of this application;
[0038] Figure 5 is a flowchart of the steps of a data object processing method according to an embodiment of this application;
[0039] Figure 6 is a flowchart of the steps of a data object processing method according to an embodiment of this application;
[0040] Figure 7 is a schematic diagram of the processing procedure in the training stage according to an embodiment of this application;
[0041] Figure 8 is a schematic diagram of the processing procedure in the classification stage according to an embodiment of this application;
[0042] Figure 9 is a schematic structural diagram of a data object processing apparatus according to an embodiment of this application;
[0043] Figure 10 is a schematic structural diagram of a data object processing apparatus according to an embodiment of this application;
[0044] Figure 11 is a schematic structural diagram of a data object processing apparatus according to an embodiment of this application;
[0045] Figure 12 is a schematic structural diagram of an exemplary device according to an embodiment of this application.
Detailed Implementation
[0046] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, the application will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0047] In this embodiment, a data object can be a composite information representation that software can understand. A data object can be an entity, a thing, an occurrence or event, a role, an organizational unit, a location, a structure, etc. For example, data objects can include goods, commodities, and other objects declared or cleared through customs. This embodiment is used to determine the category information corresponding to a data object; this category information can include the encoding information of the data object in the HS coding classification scenario. It is understood that this embodiment does not limit the specific category information.
[0048] Current HS coding classification methods typically first use mathematical models to determine the feature representations of data objects, and then determine the corresponding coding information based on these feature representations. However, current HS coding classification methods suffer from low classification accuracy.
[0049] Currently, mathematical models possess feature extraction capabilities, which can be used to represent the mapping relationship between object information and feature representations. Object information can be shallow, typically presented in text form. Feature representations can be deep, typically presented in vector form. Taking a commodity as an example, object information can include object attributes such as category, material, and content.
[0050] In scenarios where different products belong to the same category, traditional mathematical models typically provide the same feature representations. For example, if product A belongs to the category 'facial cleanser' and product B also belongs to the category 'facial cleanser', traditional mathematical models will provide the same feature representations for both products A and B. This results in insufficient accuracy of the feature representations, leading to low product classification accuracy.
[0051] To address the technical problem of low classification accuracy, embodiments of this application provide a data object processing scheme, which specifically includes: determining a first feature representation corresponding to the data object, and a second feature representation and a third feature representation corresponding to the candidate category information; wherein, the second feature representation may correspond to the attribute content of the object attribute; the third feature representation may correspond to the object attribute; based on the relationship information corresponding to the object attribute, determining a first mapping representation and a second mapping representation corresponding to the first feature representation and the second feature representation respectively under the condition of the relationship information, and determining the matching information between the data object and the candidate category information based on the first mapping representation, the second mapping representation, and the third feature representation; and determining the target candidate category information corresponding to the data object based on the matching information.
[0052] This application embodiment determines the first mapping representation and the second mapping representation corresponding to the first feature representation and the second feature representation under the condition of relational information, respectively. The first feature representation and the second feature representation can be mapped to the relational space represented by the relational information corresponding to the object attributes. This can, to some extent, avoid the problem of different products of the same category corresponding to the same feature representation. Based on this, this application embodiment determines the target candidate category information corresponding to the data object according to the first mapping representation and the second mapping representation, which can improve classification accuracy.
[0053] Method Example 1
[0054] Referring to Figure 1, which shows a flowchart of the steps of a data object processing method according to an embodiment of this application, the method may specifically include the following steps:
[0055] Step 101: Determine the training data; the training data may include: object samples, object attributes, first attribute content, and second attribute content; wherein, the first attribute content represents the attribute content corresponding to the sample with the same category information as the object sample under the condition of the object attribute; the second attribute content represents the attribute content corresponding to the sample with different category information than the object sample under the condition of the object attribute.
[0056] Step 102: Based on the first relation information corresponding to the object attribute, determine the training mapping representations of the feature representations of the object sample, the first attribute content, and the second attribute content under the condition of the first relation information.
[0057] Step 103: Update the first relationship information and the feature representation corresponding to the training data according to the mapping relationship between the loss information, the training mapping representation, and the feature representation corresponding to the object attributes.
[0058] The embodiments of this application can perform training on the training data to obtain first relation information and feature representations that meet the requirements.
[0059] Specifically, the embodiments of this application can train on the training data with a data analyzer. During training, the first relation information and the feature representations corresponding to the training data are updated until they meet the requirements.
[0060] The embodiments of this application can train a mathematical model on training samples to obtain a data analyzer. A mathematical model is a scientific or engineering model constructed with mathematical logic and mathematical language: a mathematical structure that summarizes or approximates the characteristics or quantitative dependencies of a system. This structure is a relational structure expressed in mathematical symbols. A mathematical model can be one or a set of algebraic, differential, difference, integral, or statistical equations, or combinations thereof, which quantitatively or qualitatively describe the interrelationships or causal relationships between the variables of the system. Besides models described by equations, there are also models described with other mathematical tools, such as algebra, geometry, topology, and mathematical logic; such models describe the behavior and characteristics of the system rather than its actual structure. Machine learning and deep learning methods can be used to train mathematical models. Machine learning methods can include linear regression, decision trees, random forests, etc., while deep learning methods can include CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), etc.
[0061] The mathematical model corresponding to the data analyzer in the embodiments of this application may include a mathematical model with feature extraction capability and / or a mathematical model with interpretability. For example, a mathematical model with feature extraction capability may include BERT (Bidirectional Encoder Representations from Transformers), ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately), Transformer, RNN, CNN, etc. As another example, a mathematical model with interpretability may include a knowledge graph embedding model, such as TransR (Translating on Relation Space) and TransH (Translating on Hyperplanes).
[0062] The training process for a data analyzer can include forward propagation and backward propagation.
[0063] The forward propagation process calculates the final output information sequentially from the input layer to the output layer based on the input information. This output information can be used to determine the loss information. The input information in this embodiment may include: object samples, object attributes, feature representations corresponding to the first and second attribute contents, and first relationship information.
[0064] Backpropagation updates the input information sequentially from the output layer to the input layer based on the loss information. During backpropagation, the gradient information of the input information can be determined and used to update the input information. For example, backpropagation can calculate and store the gradient information of the input information sequentially from the output layer to the input layer, following the chain rule in calculus.
[0065] In step 101, an object sample set can be constructed, which may include multiple labeled object samples.
[0066] Taking data objects such as commodities as an example, the object samples can correspond to category information, which can be coding information. HS codes can include 22 major categories and 98 chapters. Internationally accepted HS codes can include the first two digits, the third and fourth digits, and the fifth and sixth digits. The seventh digit and subsequent digits can be determined by the country. For example, Chinese customs uses a ten-digit HS coding system, which can include 6 digits of coding information corresponding to the international standard and 4 digits of coding information corresponding to the national standard.
[0067] The encoding information in the embodiments of this application may include 6-digit codes or codes of more than 6 digits. It is understood that the embodiments of this application do not limit the number of digits of the encoding information.
[0068] Figure 2 shows an example of encoded information according to an embodiment of this application. From front to back, the encoded information may include 6-digit encoded information corresponding to the international standard and 4-digit encoded information corresponding to the national standard. The 6-digit encoded information corresponding to the international standard may include 2 digits for the chapter, 2 digits for the tariff heading (tax item), and 2 digits for the subheading (sub-item).
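The digit layout described for Figure 2 can be expressed as a small parser. This is a sketch that only slices digits by position: it strips dots and spaces, accepts 6 or more digits because national extensions vary by country, and does not check the code against any official tariff table. The names `HSCodeParts`, `split_hs_code`, and `level_prefixes` are hypothetical, and "1234567890" is a dummy value.

```python
from typing import NamedTuple, Tuple


class HSCodeParts(NamedTuple):
    chapter: str     # digits 1-2
    heading: str     # digits 3-4 (tariff heading / tax item)
    subheading: str  # digits 5-6 (subheading / sub-item)
    national: str    # digits 7+ (national extension; empty for a 6-digit code)


def split_hs_code(code: str) -> HSCodeParts:
    digits = code.replace(".", "").replace(" ", "")
    if not digits.isdigit() or len(digits) < 6:
        raise ValueError(f"not a valid HS code: {code!r}")
    return HSCodeParts(digits[0:2], digits[2:4], digits[4:6], digits[6:])


def level_prefixes(code: str) -> Tuple[str, str, str]:
    """Cumulative prefixes identifying the chapter, heading, and subheading levels."""
    p = split_hs_code(code)
    return (p.chapter, p.chapter + p.heading, p.chapter + p.heading + p.subheading)
```

For example, `split_hs_code("1234.56.7890")` yields chapter "12", heading "34", subheading "56", and national part "7890". The cumulative prefixes are what a sampler needs to find samples that share a chapter, a heading, or a subheading.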
[0069] The encoding information corresponding to the object sample in the embodiments of this application may include 6-digit encoding information; in this case, the embodiments can be used to determine the 6-digit encoding information of the data object under the international standard. Of course, the encoding information corresponding to the object sample may also include 10-digit encoding information; in this case, the embodiments can be used to determine the 10-digit encoding information of the data object under both the international standard and the Chinese standard. Furthermore, under the standards of countries other than China, the number of digits of the encoding information may differ from 10.
[0070] In practical applications, a corresponding 10-digit code can be labeled for customs clearance goods, and / or the 10-digit code can be extracted from the customs filing data; and the customs clearance goods corresponding to the above 10-digit code can be saved to the object sample set.
[0071] The object samples in this application can correspond to category information, as well as object attributes and attribute content. In other words, there is a correspondence between object samples, category information, object attributes, and attribute content.
[0072] In practical applications, NER (Named Entity Recognition) can be performed on the object attributes and attribute content corresponding to the object sample to obtain the object attributes and attribute content in entity form. NER can be used to automatically identify named entities from the original corpus.
[0073] Furthermore, embodiments of this application can utilize a knowledge graph to store the mapping relationships between object samples, object attributes, and attribute content. A knowledge graph is a structured semantic knowledge base used to describe concepts and their interrelationships in the physical world. A knowledge graph can be a semantic network that reveals the relationships between entities. A knowledge graph can also store the mapping relationships between object samples, category information, object attributes, and attribute content.
[0074] An entity is an objectively existing and distinguishable thing, including concrete people, things, objects, abstract concepts, or relationships. An entity can be a concrete object, such as facial cleanser or a piece of clothing. An entity can have many characteristics; a single characteristic is called an object attribute. Each object attribute can have one or more attribute contents.
[0075] In practical applications, knowledge graphs can use triples such as (entity, object attribute, attribute content) to represent facts, and graph databases can be chosen as the storage medium.
[0076] Figure 3 is an example of a knowledge graph according to an embodiment of this application, in which a product can correspond to multiple object attributes such as category, brand, material, content, and style, and each object attribute can correspond to attribute content.
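The (entity, object attribute, attribute content) triples of [0075] and the product-attribute layout of Figure 3 can be mimicked with a small in-memory index. This is only a stand-in for the graph database the text mentions; the class, its methods, and the sample facts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple


class TripleStore:
    """Minimal in-memory index of (entity, object attribute, attribute content) triples."""

    def __init__(self) -> None:
        self._index: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, entity: str, attribute: str, content: str) -> None:
        # One object attribute may hold several attribute contents.
        self._index[entity][attribute].add(content)

    def attributes(self, entity: str) -> List[str]:
        # .get() avoids creating empty entries for unknown entities.
        return sorted(self._index.get(entity, {}))

    def contents(self, entity: str, attribute: str) -> Set[str]:
        return set(self._index.get(entity, {}).get(attribute, set()))

    def triples(self) -> Iterator[Tuple[str, str, str]]:
        for entity, attrs in self._index.items():
            for attribute, values in attrs.items():
                for content in sorted(values):
                    yield (entity, attribute, content)
```

For example, after `add("product_1", "category", "facial cleanser")` and `add("product_1", "brand", "brand_x")`, `attributes("product_1")` returns `["brand", "category"]`. A production system would instead issue equivalent queries against a graph database.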
[0077] The training data in this embodiment can be represented as (I, P, V+, V-), where I represents an object sample, P an object attribute, V+ the first attribute content, and V- the second attribute content. The relationship between P and V is the relationship between an object attribute and its attribute content. For example, if the object attribute is category, the attribute content could be facial cleanser or clothing; if the object attribute is material, the attribute content could be pure cotton or leather; and if the object attribute is content (composition), the attribute content could be silk >= 70% or cotton >= 90%.
[0078] In practical applications, the object sample I1 corresponding to V+ can have the same encoded information as I. Taking the encoded information shown in Figure 2 as an example, assume that the category information corresponding to the encoded information of object sample I includes three levels of sub-category information: chapter, tax item (tariff heading), and sub-item (subheading). Then, for object sample I, several object samples I1 can be randomly sampled under the same "chapter", "tax item", and "sub-item" as I. The attribute content corresponding to I1 under condition P can be recorded as V+.
[0079] In practical applications, the object sample I2 corresponding to V- can have encoded information different from that of I. Taking the encoded information shown in Figure 2 as an example, with the same three levels (chapter, tax item, and sub-item), several object samples I2 can be randomly sampled for object sample I under the same "chapter" and "tax item" but a different "sub-item"; and / or under the same "chapter" but a different "tax item"; and / or under a different "chapter". The attribute content corresponding to I2 under condition P can be recorded as V-. The numbers of object samples I1 and I2 can be kept equal so that each V+ is paired with a V-; V+ and V- correspond to the same object attribute P.
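The sampling rules of [0078] and [0079] can be sketched as a function that builds (I, P, V+, V-) tuples for one anchor sample. This sketch makes several simplifying assumptions. Codes are plain digit strings. A shared first 6 digits means the same chapter, tax item, and sub-item. The three negative levels (different sub-item, tax item, or chapter) are pooled into one "different 6-digit prefix" group rather than sampled separately. The function name and the sample data are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def build_training_tuples(samples: Dict[str, dict], anchor_id: str, attribute: str,
                          k: int = 2, seed: int = 0) -> List[Tuple[str, str, str, str]]:
    """Return up to k (I, P, V+, V-) tuples for one anchor sample and one attribute P."""
    rng = random.Random(seed)
    anchor_code = samples[anchor_id]["code"]
    # Only samples that actually carry attribute P can supply V+ or V-.
    others = [sid for sid, s in samples.items()
              if sid != anchor_id and attribute in s["attrs"]]
    # Same chapter, tax item, and sub-item (first 6 digits): source of V+.
    positives = [sid for sid in others if samples[sid]["code"][:6] == anchor_code[:6]]
    # Different sub-item, tax item, or chapter: source of V- (levels pooled here).
    negatives = [sid for sid in others if samples[sid]["code"][:6] != anchor_code[:6]]
    n = min(k, len(positives), len(negatives))  # keep V+ and V- counts equal
    pos = rng.sample(positives, n)
    neg = rng.sample(negatives, n)
    return [(anchor_id, attribute,
             samples[p]["attrs"][attribute], samples[q]["attrs"][attribute])
            for p, q in zip(pos, neg)]
```

With an anchor whose material is "cotton", one same-subheading sample made of cotton, and one different-subheading sample made of leather, the function returns a single tuple (I, "material", "cotton", "leather"). A sample lacking the attribute is ignored. Sampling the three negative levels in fixed proportions would be a natural refinement.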
[0080] Examples of (I, P, V+, V-) can include: (I, category, facial cleanser, milk), (I, material, pure cotton, modal), etc.
[0081] In step 102, the first relation information corresponding to the object attribute can be used to map the feature representations corresponding to the object sample, the first attribute content, and the second attribute content to the relation space corresponding to the object attribute.
[0082] In practical implementation, the feature representation Ie corresponding to the object sample can be determined from the descriptive information of the object sample (such as product title information). For example, the descriptive information can be input into a vector model, which outputs the feature representation Ie. The vector model can include a sentence vector model and / or a word vector model. The input of the sentence vector model can be descriptive information that has not been segmented into words; the sentence vector model can be a language model. The input of the word vector model can be descriptive information that has been segmented into words. The sentence vector model or the word vector model can be a pre-trained model. Pre-training first trains the model on a large general corpus to learn general language knowledge, after which targeted transfer training is performed for the task; in this embodiment, the task can be a data object classification task. This transfer training adjusts the parameters of the pre-trained model and thereby updates the feature representation Ie corresponding to the object sample.
[0083] The feature representation Pe corresponding to the object attribute, the first relation information Mp, the feature representation Ve1 corresponding to the first attribute content, and the feature representation Ve2 corresponding to the second attribute content can be determined through initialization and then updated during training.
[0084] In this embodiment, an entity can be a combination of multiple attributes, and different relations can focus on different attributes of the entity; different relations can therefore have different semantic spaces and relation spaces. Mapping the object sample and the entities corresponding to attribute content into the relation space corresponding to the object attribute can, to some extent, avoid the problem of different products of the same category corresponding to the same feature representation. With this mapping, an object sample and the attribute-content entities that actually hold the relation are brought close together in that relation space, while an object sample and attribute-content entities that do not hold the relation are kept far apart.
[0085] This application embodiment can map the feature representation Ie corresponding to the object sample into the relation space corresponding to the first relation information Mp: the training mapping representation Iep corresponding to Ie is obtained by multiplying Ie and Mp. Likewise, the training mapping representation Vep1 corresponding to the feature representation Ve1 of the first attribute content is obtained by multiplying Ve1 and Mp, and the training mapping representation Vep2 corresponding to the feature representation Ve2 of the second attribute content is obtained by multiplying Ve2 and Mp. These mapping operations can be performed by a data analyzer; this application embodiment does not limit the specific entity that executes them.
[0086] In step 103, the first matching information A between the object sample and the first attribute content, and the second matching information A between the object sample and the second attribute content can be determined based on the training mapping representation and the feature representation corresponding to the object attribute.
[0087] Because the first attribute content comes from a sample with the same category information as the object sample, the first matching information A represents intra-class matching information and can be denoted dis_pos. Because the second attribute content comes from a sample with category information different from that of the object sample, the second matching information A represents inter-class matching information and can be denoted dis_neg.
[0088] In this embodiment, the entities corresponding to object samples and attribute content are mapped to the relation space corresponding to object attributes, so as to obtain projected entities, such as the first projected entity corresponding to the object sample, the second projected entity corresponding to the first attribute content, and the third projected entity corresponding to the second attribute content.
[0089] This application embodiment can also construct a correspondence A between projected entities. Under correspondence A, the second projected entity is regarded as a fusion of the first projected entity and the feature representation Pe corresponding to the object attribute, and so is the third projected entity. Therefore, the first matching information A can be the matching information between the training mapping representation Iep and the first difference representation, i.e. the difference between the training mapping representation Vep1 and the feature representation Pe; and the second matching information A can be the matching information between Iep and the second difference representation, i.e. the difference between the training mapping representation Vep2 and Pe.
[0090] This application embodiment can utilize a measurement method to determine the first matching information A and the second matching information A. The measurement method may include: Euclidean distance, or cosine of the included angle, or information entropy, etc. It is understood that this application embodiment does not limit the specific measurement method.
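As a minimal sketch of the mapping in [0085] and the matching information in [0089] and [0090], the following assumes that Ie, Ve1, and Ve2 are row vectors of length k, Mp is a k × d matrix, and Pe is a vector of length d; the helper names and toy numbers are hypothetical. The projection has the same form as the relation-specific projection used in TransR, and Euclidean and cosine distances stand in for two of the measurement methods listed above.

```python
import math


def project(vec, M):
    """Row vector (length k) times a k x d matrix: maps an embedding into the relation space."""
    return [sum(vec[i] * M[i][j] for i in range(len(vec))) for j in range(len(M[0]))]


def euclidean(a, b):
    return math.dist(a, b)


def cosine_distance(a, b):
    """1 - cos(angle); returns 1.0 if either vector is zero (a convention chosen here)."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)


def matching_info(Iep, Vep, Pe, metric=euclidean):
    """Correspondence A: Vep ~ Iep + Pe, so compare Iep with the difference Vep - Pe."""
    return metric(Iep, [v - p for v, p in zip(Vep, Pe)])


# Hypothetical toy values: k = 3, d = 2.
Mp = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Ie, Ve1, Ve2 = [1.0, 2.0, 3.0], [2.0, 3.0, 3.0], [1.0, 0.0, 0.0]
Pe = [1.0, 1.0]

Iep, Vep1, Vep2 = project(Ie, Mp), project(Ve1, Mp), project(Ve2, Mp)
dis_pos = matching_info(Iep, Vep1, Pe)  # intra-class matching information
dis_neg = matching_info(Iep, Vep2, Pe)  # inter-class matching information
```

Here Vep1 - Pe coincides with Iep, so dis_pos is 0, while dis_neg is the square root of 52.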
[0091] In practical applications, a first loss function can be used to characterize the mapping relationship between the first loss information on the one hand and the training mapping representations and the feature representation corresponding to the object attribute on the other. The first loss function can characterize the first difference information between the first inter-class dimensional information and the first intra-class dimensional information.
[0092] The first inter-class dimensional information can be obtained from the second matching information A, and the first intra-class dimensional information can be obtained from the first matching information A. For example, the first difference information between the first inter-class dimensional information and the first intra-class dimensional information can be fused with a first parameter, and the fusion result can be compared with a preset value; the larger of the two can be used as the first loss information. The first parameter can be used to adjust the first difference information between the first inter-class and first intra-class dimensional information; for example, the first parameter can be a positive number such as 1.
[0093] In this embodiment, the first matching information A represents intra-class matching information and the second matching information A represents inter-class matching information. When both represent the distance between two vectors, the ideal value of the first matching information A can be 0 and the ideal value of the second matching information A can be 1. Based on these ideal values, this embodiment can set the preset value to 0. Of course, this embodiment does not limit the specific preset value.
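One reading of the first loss function in [0091] to [0093] is the standard margin (hinge) form below. It assumes that d is the chosen distance, γ is the first parameter, and "fusion" means subtracting the inter-class/intra-class difference from γ; the worked numbers are illustrative only.

```latex
% d(.,.) is the chosen distance; \gamma > 0 is the first parameter; 0 is the preset value
\mathrm{dis\_pos} = d\bigl(I_{ep},\, V_{ep1} - P_e\bigr), \qquad
\mathrm{dis\_neg} = d\bigl(I_{ep},\, V_{ep2} - P_e\bigr)

L_1 = \max\Bigl(0,\; \gamma - \bigl(\mathrm{dis\_neg} - \mathrm{dis\_pos}\bigr)\Bigr)
    = \max\bigl(0,\; \mathrm{dis\_pos} - \mathrm{dis\_neg} + \gamma\bigr)

% Worked example with \gamma = 1:
%   dis_pos = 0.2, dis_neg = 0.5  =>  L_1 = \max(0, 0.2 - 0.5 + 1) = 0.7
%   dis_pos = 0.2, dis_neg = 1.5  =>  L_1 = \max(0, -0.3) = 0
```

Under this reading the loss reaches the preset value 0 once the inter-class distance exceeds the intra-class distance by at least γ.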
[0094] Since the first matching information A and the second matching information A depend on the first relation information and the feature representations corresponding to the training data, the first loss information in this embodiment can be obtained from the first relation information, the feature representations corresponding to the training data, and the mapping relationship represented by the first loss function. Furthermore, with the preset value as the optimization target for the first loss information, this embodiment can update the first relation information and the feature representations corresponding to the training data. Optimization methods may include gradient descent, Newton's method, quasi-Newton methods, the conjugate gradient method, etc. It is understood that this embodiment does not limit the specific optimization method.
[0095] In practical applications, partial derivatives can be calculated for the parameters of the first loss function (such as the first relation information and the feature representations corresponding to the training data). These partial derivatives can be written out as a vector, and the vector corresponding to the partial derivatives can be called the gradient information of the parameters. The update amount of the parameters can be obtained based on the gradient information and the step size information.
[0096] When gradient descent is used, variants such as batch gradient descent, stochastic gradient descent, or mini-batch gradient descent can be employed. In practical implementation, each iteration can be based on a single set of training data corresponding to one object sample, on multiple sets of training data corresponding to one object sample, or on multiple sets of training data corresponding to multiple object samples. Iteration can terminate when the first loss information corresponding to the first loss function meets a convergence condition, for example when the loss value of the first loss information equals the preset value or when the number of iterations exceeds a threshold. At that point, the first target parameter is obtained, which can be used for classifying data objects.
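The gradient-descent loop of [0094] to [0096] can be sketched in plain Python. Assumptions: squared Euclidean distance is used as the matching measure (so the gradients have simple closed forms), the first loss is the margin form max(0, dis_pos - dis_neg + margin), one set of training data is used per step, and all parameter values and names are hypothetical. Iteration stops when the loss reaches the preset value 0 or when `max_iters` is exceeded.

```python
def residual(Ie, Ve, Pe, M):
    """r = Ie*M + Pe - Ve*M; the matching information is |r| (correspondence A)."""
    k, d = len(M), len(M[0])
    return [sum((Ie[i] - Ve[i]) * M[i][j] for i in range(k)) + Pe[j] for j in range(d)]


def grads(Ie, Ve, Pe, M, sign):
    """Gradients of sign * |r|^2 with respect to Ie, Ve, Pe and M."""
    r = residual(Ie, Ve, Pe, M)
    k, d = len(M), len(M[0])
    g_ie = [sign * 2 * sum(r[j] * M[i][j] for j in range(d)) for i in range(k)]
    g_ve = [-g for g in g_ie]
    g_pe = [sign * 2 * r[j] for j in range(d)]
    g_m = [[sign * 2 * r[j] * (Ie[i] - Ve[i]) for j in range(d)] for i in range(k)]
    return g_ie, g_ve, g_pe, g_m


def first_loss(p, margin=1.0):
    """max(0, dis_pos - dis_neg + margin) with squared Euclidean distances."""
    dis_pos = sum(x * x for x in residual(p["Ie"], p["Ve1"], p["Pe"], p["Mp"]))
    dis_neg = sum(x * x for x in residual(p["Ie"], p["Ve2"], p["Pe"], p["Mp"]))
    return max(0.0, dis_pos - dis_neg + margin)


def train(p, lr=0.05, margin=1.0, preset=0.0, max_iters=500):
    """Gradient descent on one set of training data; returns the iterations used."""
    for it in range(max_iters):
        if first_loss(p, margin) <= preset:  # convergence condition
            return it
        gp = grads(p["Ie"], p["Ve1"], p["Pe"], p["Mp"], +1.0)  # from +dis_pos
        gn = grads(p["Ie"], p["Ve2"], p["Pe"], p["Mp"], -1.0)  # from -dis_neg
        p["Ie"] = [x - lr * (a + b) for x, a, b in zip(p["Ie"], gp[0], gn[0])]
        p["Ve1"] = [x - lr * a for x, a in zip(p["Ve1"], gp[1])]
        p["Ve2"] = [x - lr * b for x, b in zip(p["Ve2"], gn[1])]
        p["Pe"] = [x - lr * (a + b) for x, a, b in zip(p["Pe"], gp[2], gn[2])]
        p["Mp"] = [[x - lr * (a + b) for x, a, b in zip(row, ra, rb)]
                   for row, ra, rb in zip(p["Mp"], gp[3], gn[3])]
    return max_iters


# Hypothetical initial values (k = 3, d = 2); Ve2 starts equal to Ve1,
# so dis_pos == dis_neg and the initial loss is exactly the margin.
params = {
    "Ie": [0.2, -0.1, 0.3],
    "Ve1": [0.1, 0.2, -0.2],
    "Ve2": [0.1, 0.2, -0.2],
    "Pe": [0.1, 0.0],
    "Mp": [[0.5, -0.3], [0.2, 0.4], [-0.1, 0.6]],
}
before = first_loss(params)
iterations = train(params)
after = first_loss(params)
```

Both gradient sets are computed from the same parameter values before any update is applied, which keeps each step a plain full-gradient step on the hinge loss.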
[0097] In summary, the data object processing method of this application embodiment is trained on the training data to obtain a first target parameter that meets the requirements. The first target parameter may include relation information and feature representations that meet the requirements. These feature representations may include the feature representations corresponding to the object samples, the object attributes, the first attribute content, and the second attribute content, etc.
[0098] In practical applications, the required relational information and feature representations can be saved for use in the classification process of data objects.
[0099] Method Example 2
[0100] Referring to Figure 4, which shows a flowchart of the steps of a data object processing method according to an embodiment of this application, the method may specifically include the following steps:
[0101] Step 401: Determine the triplet object sample; the triplet object sample may include: a first object sample, a second object sample, and a third object sample; wherein, the second object sample and the first object sample may correspond to the same category information; the third object sample and the first object sample may correspond to different category information.
[0102] Step 402: Based on the feature representation and attribute weights corresponding to the triplet object samples, determine the first matching information B between the first object sample and the second object sample, and the second matching information B between the first object sample and the third object sample; there can be a correspondence B between the feature representation and attribute weights corresponding to the triplet object samples, the feature representation corresponding to the object attributes, and the feature representation corresponding to the attribute content.
[0103] Step 403: Update the attribute weights according to the mapping relationship between the second loss information, the first matching information B, and the second matching information B.
[0104] The embodiments of this application can be used to determine the attribute weights corresponding to object attributes based on training. Attribute weights can characterize the contribution of object attributes to the classification of data objects. Attribute weights can be used to give different levels of attention to different object attributes during the classification process of data objects, thereby improving classification accuracy.
[0105] In step 401, the triplet object sample can be represented as (Iemb, I_pos, I_neg). Here, Iemb can represent the first object sample, I_pos can represent the second object sample, and I_neg can represent the third object sample.
[0106] In practical applications, the second object sample can correspond to the same encoded information as the first object sample. Taking the encoded information shown in Figure 2 as an example, assume that the category information corresponding to the encoded information of the first object sample Iemb includes three levels of sub-category information: chapter, tax item, and sub-item. Then, for the first object sample Iemb, several second object samples I_pos can be randomly sampled under category information with the same "chapter", "tax item", and "sub-item".
[0107] In practical applications, the third object sample and the first object sample can correspond to different encoded information. Taking the encoded information shown in Figure 2 as an example, assume that the category information corresponding to the encoded information of the first object sample Iemb includes three levels of sub-category information: chapter, tax item, and sub-item. Then, for the first object sample Iemb, several third object samples I_neg can be randomly sampled under category information with the same "chapter" and "tax item" but a different "sub-item"; and / or, several third object samples I_neg can be randomly sampled under category information with the same "chapter" but a different "tax item"; and / or, several third object samples I_neg can be randomly sampled under category information with a different "chapter". The numbers of second object samples I_pos and third object samples I_neg can be matched.
[0108] The triplet object samples can be represented as (Iemb, I_pos_1, I_neg_1), (Iemb, I_pos_2, I_neg_2), ..., (Iemb, I_pos_N, I_neg_N), where N is the number of triplet object samples and is a positive integer.
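Building the N triplets of [0108] with matched numbers of positives and negatives could look like the sketch below. The sample records and the function name are hypothetical, and for brevity negatives are drawn from any different code rather than level by level as in [0107].

```python
import random

# Hypothetical samples: identifiers and full codes.
SAMPLES = [
    {"id": "Iemb", "code": "110101"},
    {"id": "P1", "code": "110101"},
    {"id": "P2", "code": "110101"},
    {"id": "N1", "code": "110102"},
    {"id": "N2", "code": "110201"},
    {"id": "N3", "code": "120101"},
]


def build_triplets(anchor, samples, n, seed=0):
    """Return up to n triplets (Iemb, I_pos_i, I_neg_i) with matched positive/negative counts."""
    rng = random.Random(seed)
    pos = [s["id"] for s in samples if s["id"] != anchor["id"] and s["code"] == anchor["code"]]
    neg = [s["id"] for s in samples if s["code"] != anchor["code"]]
    n = min(n, len(pos), len(neg))  # N is capped by the scarcer side
    return [(anchor["id"], p, q) for p, q in zip(rng.sample(pos, n), rng.sample(neg, n))]


triplets = build_triplets(SAMPLES[0], SAMPLES, n=5)
```

Only two positives exist here, so N is reduced to 2 and two distinct negatives are drawn to match.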
[0109] In this embodiment, named entity recognition (NER) can be performed on the first object sample Iemb to obtain its object attributes and attribute content. Similarly, the object attributes and attribute content corresponding to the second object sample I_pos and the third object sample I_neg can be obtained.
[0110] In step 402, a vector model can be used to determine the feature representations corresponding to the object attributes and the feature representations corresponding to the attribute content. The vector model can include sentence vector models and / or word vector models. The object weights corresponding to the object attributes can be determined through initialization.
[0111] There can be a correspondence B between the feature representation of a triplet object sample on the one hand and the attribute weights, the feature representation of the object attribute, and the feature representation of the attribute content on the other. Correspondence B expresses that the fusion of the feature representation of the triplet object sample with the feature representation of the object attribute should be as close as possible to the feature representation of the attribute content. Based on correspondence B, the feature representation of the triplet object sample can be obtained by multiplying the attribute weight by the second difference information, i.e. the difference between the feature representation of the attribute content and the feature representation of the object attribute. It should be noted that a triplet object sample can correspond to multiple object attributes, each with its own second difference information; the weighted products for the multiple object attributes can be fused to obtain the feature representation of the triplet object sample.
[0112] In practical applications, the feature representations corresponding to the triplet object samples can be determined based on correspondence B, the object weights, the feature representations corresponding to the object attributes, and the feature representations corresponding to the attribute content. That is, the feature representations corresponding to the first object sample Iemb, the second object sample I_pos, and the third object sample I_neg can be obtained respectively.
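Under correspondence B, and assuming that the weighted products for the individual attributes are fused by summation, the feature representation of a sample can be written as follows; the numbers in the worked example are illustrative.

```latex
% Correspondence B: I + P_p \approx V_p for each attribute p, weighted by w_p (fusion by summation assumed)
\mathbf{I} = \sum_{p} w_p \bigl(\mathbf{V}_p - \mathbf{P}_p\bigr)

% Worked example with two attributes (illustrative numbers):
%   w_category = 0.5,  V - P = (2, 1);   w_material = 0.25,  V - P = (1, 2)
\mathbf{I} = 0.5\,(2, 1) + 0.25\,(1, 2) = (1, 0.5) + (0.25, 0.5) = (1.25, 1)
```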
[0113] The first object sample and the second object sample correspond to the same category information, so the first matching information B can represent intra-class matching information, and the first matching information B can be denoted as dis1. The first object sample and the third object sample correspond to different category information, so the second matching information B can represent inter-class matching information, and the second matching information B can be denoted as dis2.
[0114] The embodiments of this application can utilize a metric method to determine the first matching information B and the second matching information B. The metric method may include: Euclidean distance, cosine of the included angle, or information entropy, etc.
[0115] In step 403, the mapping relationship between the second loss information and the first matching information B and the second matching information B can be characterized by the second loss function.
[0116] The second inter-class dimensional information can be obtained from the second matching information B, and the second intra-class dimensional information can be obtained from the first matching information B. For example, the third difference information between the second inter-class and second intra-class dimensional information can be fused with a second parameter, and the fusion result can be compared with a preset value; the larger of the two can be used as the second loss information. The second parameter can be used to adjust the third difference information between the second inter-class and second intra-class dimensional information; for example, the second parameter can be a positive number such as 1.
[0117] In this embodiment, the first matching information B represents intra-class matching information and the second matching information B represents inter-class matching information. When both represent the distance between two vectors, the ideal value of the first matching information B can be 0 and the ideal value of the second matching information B can be 1. Based on these ideal values, this embodiment can set the preset value to 0. Of course, this embodiment does not limit the specific preset value.
[0118] Since the first matching information B and the second matching information B depend on the object weights, the feature representations corresponding to the object attributes, and the feature representations corresponding to the attribute content, the second loss information in this embodiment can be obtained from these parameters. Furthermore, with the preset value as the optimization target for the second loss information, this embodiment can update the object weights, the feature representations corresponding to the object attributes, and the feature representations corresponding to the attribute content. Optimization methods can include gradient descent, Newton's method, quasi-Newton methods, the conjugate gradient method, etc. It is understood that this embodiment does not limit the specific optimization method.
[0119] In practical applications, partial derivatives can be calculated for the parameters of the second loss function (such as the object weights and the feature representations corresponding to the object attributes and the attribute content). These partial derivatives can be written out as a vector, called the gradient information of the parameters. The update amount of the parameters can be obtained from the gradient information and the step size information.
[0120] When gradient descent is used, variants such as batch gradient descent, stochastic gradient descent, or mini-batch gradient descent can be employed. In practice, each iteration can be based on one set of triplet object samples corresponding to a first object sample, on multiple sets of triplet object samples corresponding to a first object sample, or on multiple sets of triplet object samples corresponding to multiple first object samples. Iteration can end when the second loss information corresponding to the second loss function meets a convergence condition, for example when the loss value of the second loss information equals the preset value or when the number of iterations exceeds a threshold. At that point, a second target parameter is obtained, which can be used for classifying data objects.
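A sketch of the weight update in [0115] to [0120], under these assumptions: the attribute-content and attribute embeddings are held fixed (only the weights are trained), dis1 and dis2 are squared Euclidean distances, and the second loss has the margin form max(0, dis1 - dis2 + margin). Each sample is represented by its per-attribute differences V - P; all names and numbers are hypothetical.

```python
def represent(diffs, w):
    """Correspondence B: sample vector = sum over attributes of w[p] * (V_p - P_p)."""
    dim = len(next(iter(diffs.values())))
    out = [0.0] * dim
    for attr, d in diffs.items():
        out = [o + w[attr] * x for o, x in zip(out, d)]
    return out


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def second_loss(triplet, w, margin=1.0):
    a, p, n = (represent(d, w) for d in triplet)
    return max(0.0, sq_dist(a, p) - sq_dist(a, n) + margin)  # dis1 - dis2 + margin


def weight_grad(triplet, w):
    """Partial derivatives of (dis1 - dis2) with respect to each attribute weight."""
    da, dp, dn = triplet
    a, p, n = represent(da, w), represent(dp, w), represent(dn, w)
    grad = {}
    for attr in w:
        g1 = 2 * sum((x - y) * (u - v) for x, y, u, v in zip(a, p, da[attr], dp[attr]))
        g2 = 2 * sum((x - y) * (u - v) for x, y, u, v in zip(a, n, da[attr], dn[attr]))
        grad[attr] = g1 - g2
    return grad


def train_weights(triplets, w, lr=0.05, margin=1.0, max_iters=200):
    for _ in range(max_iters):
        active = [t for t in triplets if second_loss(t, w, margin) > 0.0]
        if not active:  # second loss reached the preset value 0
            break
        total = {attr: 0.0 for attr in w}
        for t in active:
            for attr, g in weight_grad(t, w).items():
                total[attr] += g
        w = {attr: w[attr] - lr * total[attr] for attr in w}
    return w


# Per-attribute differences V - P for (Iemb, I_pos, I_neg); hypothetical values.
triplet = (
    {"category": [1.0, 0.0], "material": [0.0, 1.0]},
    {"category": [1.0, 0.0], "material": [0.0, -1.0]},  # same category, other material
    {"category": [-1.0, 0.0], "material": [0.0, 1.0]},  # other category, same material
)
w = train_weights([triplet], {"category": 0.5, "material": 0.5})
```

In this toy triplet the positive shares the anchor's category content but not its material, and the negative the reverse, so one update raises the category weight to about 0.7 and lowers the material weight to about 0.3, after which the second loss is 0.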
[0121] In summary, the data object processing method of this application embodiment is trained on triplet object samples to obtain a second target parameter that meets the requirements. The second target parameter may include, for example, object weights that meet the requirements, which can be saved for use in the data object classification process.
[0122] In practical applications, the object weights corresponding to the object attributes can first be determined by training on one first object sample, i.e. the object weights of the object attributes under the condition of that first object sample. The object weights obtained under multiple first object samples can then be fused (for example, averaged) to obtain the final object weights. For example, the object weight corresponding to product category may be 0.5, the object weight corresponding to material may be 0.2, and so on.
[0123] It should be noted that some object attributes, due to their long character length or other reasons, may not be included in the training of this application's embodiments. "By milliliters" and "by weight" are examples of this type of object attribute. For this type of object attribute, the object weight corresponding to this type of object attribute can be determined based on the fusion result (such as the average result) of the object weights participating in the training.
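The fusion in [0122] and [0123] could be sketched as below, assuming averaging as the fusion method in both steps; the function name and the per-sample weights are hypothetical, chosen so that the fused values reproduce the 0.5 and 0.2 of the example above.

```python
from statistics import mean


def fuse_attribute_weights(per_sample_weights, all_attributes):
    """Average each attribute's weight over the first object samples it was trained on;
    attributes that never took part in training get the mean of the fused weights."""
    collected = {}
    for weights in per_sample_weights:
        for attr, w in weights.items():
            collected.setdefault(attr, []).append(w)
    fused = {attr: mean(ws) for attr, ws in collected.items()}
    fallback = mean(list(fused.values()))
    return {attr: fused.get(attr, fallback) for attr in all_attributes}


# Hypothetical weights learned under two different first object samples.
final = fuse_attribute_weights(
    [{"category": 0.6, "material": 0.25}, {"category": 0.4, "material": 0.15}],
    ["category", "material", "by milliliters"],
)
```

The untrained attribute "by milliliters" receives the mean of the trained weights, about 0.35 here.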
[0124] Method Example 3
[0125] This application embodiment describes the training process of a first data analyzer. The first data analyzer is used to determine the first category information corresponding to the data object, and to characterize the mapping relationship between the object information and the first category information. The object information can be text information, such as the descriptive information corresponding to the data object (e.g., product title information), or the attribute content corresponding to the data object (e.g., category, material, content, etc.).
[0126] Category information can consist of multiple characters, in which case the first category information can include the first M or last M of those characters, where M is a positive integer. Alternatively, category information can include sub-category information at multiple levels, in which case the first category information can include the information at a subset of those levels. Taking the encoded information shown in Figure 2 as an example, the category information corresponding to the encoded information includes three levels of sub-category information: chapter, tax item, and sub-item. The first data analyzer can then be used to determine the encoded information corresponding to the chapter, or the encoded information corresponding to the chapter and tax item.
[0127] In practical applications, the first data analyzer may include a feature extraction unit with feature extraction capabilities and a classification unit with classification capabilities. The feature extraction unit can be a pre-trained model. The classification unit may include activation functions, etc.
[0128] During the training process of the first data analyzer, the feature extraction unit can determine the feature representation corresponding to the object sample, and the classification unit can classify the object sample. The third loss information between the corresponding classification result and the actual result (which can be determined based on the positive or negative nature of the object sample) can be used to update the parameters of the feature extraction unit and the classification unit until the third loss information meets the preset conditions.
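As a minimal sketch of the classification unit and the third loss information in [0127] and [0128], the following assumes a linear softmax head with cross-entropy loss, trained by stochastic gradient descent on feature vectors that a feature extraction unit (not shown) has already produced. The embodiment also updates the feature extraction unit; that part is omitted, and the labels and data are hypothetical.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class ClassificationUnit:
    """Hypothetical linear softmax head over fixed feature vectors."""

    def __init__(self, labels, dim):
        self.labels = list(labels)
        self.W = [[0.0] * dim for _ in self.labels]
        self.b = [0.0] * len(self.labels)

    def probs(self, x):
        logits = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(self.W, self.b)]
        return softmax(logits)

    def third_loss(self, data):
        """Mean cross-entropy between predictions and actual labels (assumed loss)."""
        return sum(-math.log(self.probs(x)[self.labels.index(y)]) for x, y in data) / len(data)

    def sgd_epoch(self, data, lr=0.5):
        for x, y in data:
            p = self.probs(x)
            target = self.labels.index(y)
            for c in range(len(self.labels)):
                g = p[c] - (1.0 if c == target else 0.0)  # d(loss)/d(logit_c)
                self.W[c] = [w - lr * g * v for w, v in zip(self.W[c], x)]
                self.b[c] -= lr * g

    def predict(self, x):
        p = self.probs(x)
        return self.labels[p.index(max(p))]


# Hypothetical feature vectors labelled with chapter-level first category information.
data = [([1.0, 0.0], "C1"), ([0.9, 0.1], "C1"), ([0.0, 1.0], "C2"), ([0.1, 0.9], "C2")]
head = ClassificationUnit(["C1", "C2"], dim=2)
before = head.third_loss(data)
for _ in range(50):
    head.sgd_epoch(data)
after = head.third_loss(data)
```

With zero-initialized parameters the initial loss is ln 2; training drives it down and separates the two toy classes.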
[0129] During the use of the first data analyzer, the feature extraction unit can determine the feature representation corresponding to the data object, and the classification unit can classify the data object. In this case, the first category information corresponding to the data object can be obtained.
[0130] Method Example 4
[0131] Referring to Figure 5, which shows a flowchart of the steps of a data object processing method according to an embodiment of this application, the method may specifically include the following steps:
[0132] Step 501: Determine the first feature representation corresponding to the data object, and the second and third feature representations corresponding to the candidate category information; wherein, the second feature representation corresponds to the attribute content of the object attribute; the third feature representation corresponds to the object attribute.
[0133] Step 502: Based on the relational information corresponding to the object attributes, determine the first mapping representation and the second mapping representation corresponding to the first feature representation and the second feature representation under the condition of the relational information, and determine the matching information between the data object and the candidate category information based on the first mapping representation, the second mapping representation and the third feature representation;
[0134] Step 503: Determine the target candidate category information corresponding to the data object based on the matching information.
[0135] The embodiments of this application can be used to classify data objects to obtain target candidate category information corresponding to the data objects. For example, the target candidate category information can be encoded information.
[0136] In step 501, the first feature representation Ie1 corresponding to the data object can be determined based on the descriptive information (such as product title information) corresponding to the data object. For example, the descriptive information can be input into a vector model, which outputs the feature representation Ie1. The vector model may include a sentence vector model and / or a word vector model. This application embodiment can update the parameters of the vector model based on the method illustrated in Figure 1, to improve the accuracy of the feature representations output by the vector model. It should be noted that when a sentence vector model and a word vector model are used together, their outputs can be concatenated to obtain the first feature representation Ie1 corresponding to the data object.
[0137] Candidate category information can correspond to all object samples in the object sample set. For example, the category information corresponding to all object samples in the object sample set can be used as candidate category information.
[0138] Candidate category information can correspond to a portion of the object samples in the object sample set. For example, the process of determining candidate category information may include: determining the first category information corresponding to the data object based on a first data analyzer; the first category information may correspond to a portion of the category information; the first data analyzer can be used to characterize the mapping relationship between the object information and the first category information; and obtaining candidate category information that matches the first category information from the category information corresponding to the object sample.
[0139] When category information consists of multiple characters and the first category information includes the first M of them, category information prefixed by the first category information can be obtained from the category information corresponding to the object samples and used as candidate category information. Likewise, when the first category information includes the last M characters, category information suffixed by the first category information can be obtained from the category information corresponding to the object samples and used as candidate category information.
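Selecting candidate category information by prefix or suffix, as in [0138] and [0139], might look like the sketch below. The codes are made-up 6-character strings and the function name is hypothetical; if the encoded information follows the HS layout, the chapter corresponds to the first 2 digits and the tax item (heading) to the first 4, but Figure 2 may use a different layout.

```python
def candidate_category_info(first_info, sample_codes, prefix=True):
    """Keep category information whose first (prefix=True) or last M characters
    equal the first category information, de-duplicated and sorted."""
    match = str.startswith if prefix else str.endswith
    return sorted({code for code in sample_codes if match(code, first_info)})


# Hypothetical 6-character category information of the object samples.
codes = ["110101", "110102", "110201", "120101", "110101"]

by_chapter = candidate_category_info("11", codes)     # → ['110101', '110102', '110201']
by_tax_item = candidate_category_info("1101", codes)  # → ['110101', '110102']
by_suffix = candidate_category_info("01", codes, prefix=False)  # → ['110101', '110201', '120101']
```

A longer first category information (chapter plus tax item) narrows the candidate set that the later matching step has to score.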
[0140] In practical applications, the total number of category information values for the full multi-character strings is usually very large, typically on the order of tens of thousands or hundreds of thousands. Using the first data analyzer to classify directly over all of these multi-character strings would reduce its classification accuracy. In this embodiment, the first data analyzer classifies over only part of the characters (the first category information), which can improve its classification accuracy; the full multi-character category information can then be determined from the candidates that match the first category information output by the first data analyzer.
[0141] The second feature representation corresponding to the candidate category information can be represented as P1, and the third feature representation corresponding to the candidate category information can be represented as V1. In the method embodiment illustrated in Figure 1, after the first target parameters that meet the requirements are obtained, the corresponding relation information and feature representations can be saved. Therefore, the second feature representation P1 and the third feature representation V1 corresponding to the candidate category information can be obtained from the saved feature representations.
[0142] In step 502, the relationship information Mp corresponding to the object attribute can be obtained from the saved relationship information.
[0143] In this embodiment of the application, the relational information corresponding to the object attributes can be obtained by training based on training data; the training data may include: object samples, object attributes, first attribute content and second attribute content; wherein, the first attribute content can characterize the attribute content corresponding to the sample with the same category information as the object sample under the condition of the object attribute; the second attribute content can characterize the attribute content corresponding to the sample with different category information than the object sample under the condition of the object attribute.
[0144] In this embodiment of the application, training on the training data may include: based on the first relation information corresponding to the object attribute, determining the training mapping representations corresponding to the object sample, the first attribute content, and the second attribute content under the condition of the first relation information; and updating the first relation information and the feature representations corresponding to the training data according to the mapping relationship between the loss information, the training mapping representations, and the feature representation corresponding to the object attribute.
[0145] This application embodiment can map the first feature representation Ie1 and the second feature representation P1 to the relation space represented by the relation information corresponding to the object attributes; in this way, the problem of different products of the same category corresponding to the same feature representation can be avoided to a certain extent. The first mapping representation Iep1 corresponding to the first feature representation Ie1 can be the product of the first feature representation Ie1 and the relation matrix Mp corresponding to the relation information. The second mapping representation Vep1 corresponding to the second feature representation P1 can be the product of the second feature representation P1 and the relation matrix Mp corresponding to the relation information.
[0146] In this embodiment, a correspondence A can be constructed between the two projected entities represented by the first mapping representation Iep1 and the second mapping representation Vep1. Correspondence A can be regarded as fusing the projected entity of the second mapping representation Vep1 with the feature representation Pe corresponding to the object attribute. Accordingly, the matching information dis can be the matching information between the feature representation Pe and the difference representation of the two mapping representations Iep1 and Vep1.
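The mapping and matching described in [0145] and [0146] can be sketched as follows. This is a minimal, hypothetical sketch: vectors are plain Python lists, the sign convention (Vep1 minus Iep1 compared with Pe, following the TransR relation h + r ≈ t) is assumed, and the cosine-based distance rescaled to [0, 1] is one possible metric chosen so that 0 means a close match and 1 a poor match, as in step 503.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(vec: Vector, mat: Matrix) -> Vector:
    # Mapping representation = feature representation (row vector) x relation matrix Mp.
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def match_dis(ie1: Vector, v1_feat: Vector, pe: Vector, mp: Matrix) -> float:
    """Matching information dis in [0, 1]; 0 = strong match, 1 = weak match."""
    iep1 = project(ie1, mp)      # first mapping representation Iep1
    vep1 = project(v1_feat, mp)  # second mapping representation Vep1
    diff = [v - i for v, i in zip(vep1, iep1)]  # difference representation
    dot = sum(a * b for a, b in zip(diff, pe))
    norm = math.sqrt(sum(a * a for a in diff)) * math.sqrt(sum(b * b for b in pe))
    if norm == 0.0:
        return 1.0  # degenerate vectors: treat as no match (assumption)
    cos = max(-1.0, min(1.0, dot / norm))  # clamp floating-point rounding noise
    return (1.0 - cos) / 2.0  # rescale cosine from [-1, 1] to a distance in [0, 1]
```

A plain L2 distance between the difference representation and Pe would also fit the description; the cosine form is used here only because it bounds dis to [0, 1].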
[0147] In one implementation, determining the matching information between the data object and the candidate category information based on the first mapping representation, the second mapping representation, and the third feature representation may specifically include: determining, based on these three representations, the attribute matching information of the data object under each object attribute; and determining the matching information between the data object and the candidate category information based on the attribute weights of the object attributes and the attribute matching information.
[0148] Attribute matching information represents the matching information under a particular object attribute P. In practical applications, a data object and a piece of candidate category information have attribute matching information under each of the object attributes P1, P2, P3, ..., Pn. Embodiments of this application can fuse the attribute matching information of the multiple object attributes according to the attribute weights to obtain the matching information between the data object and the candidate category information. The fusion method may be, for example, a weighted average.
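A weighted-average fusion of per-attribute matching information, one of the fusion methods mentioned above, might look like the following minimal sketch. The attribute names and weight values are hypothetical.

```python
from typing import Dict


def fuse_matching(attr_dis: Dict[str, float], attr_weights: Dict[str, float]) -> float:
    """Weighted average of per-attribute matching information (0 = best match)."""
    total = sum(attr_weights[a] for a in attr_dis)
    if total <= 0:
        raise ValueError("attribute weights must sum to a positive value")
    return sum(attr_weights[a] * d for a, d in attr_dis.items()) / total
```

For instance, with hypothetical attributes "material" (dis 0.25, weight 3) and "use" (dis 0.75, weight 1), the fused matching information is 0.375, pulled towards the more heavily weighted attribute.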
[0149] An attribute weight represents the contribution of an object attribute to the classification of the data object. Attribute weights allow different object attributes to receive different levels of attention during classification, thereby improving classification accuracy.
[0150] In this embodiment of the application, the attribute weights can be obtained by training on triplet object samples. A triplet object sample includes a first object sample, a second object sample, and a third object sample, where the second object sample has the same category information as the first object sample, and the third object sample has different category information from the first object sample. The feature representation of a triplet object sample has a correspondence with the attribute weights, the feature representations of the object attributes, and the feature representations of the attribute content.
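Triplet training of the attribute weights can be sketched as below. This is a hypothetical sketch, not the disclosed training procedure: it assumes the per-attribute distances between the first and second samples (positive pair) and between the first and third samples (negative pair) are already computed, uses a hinge loss with an assumed margin, and keeps the weights non-negative and summing to 1 after each step.

```python
from typing import List, Tuple


def weighted_dis(per_attr: List[float], weights: List[float]) -> float:
    # Matching information as a weighted sum of per-attribute distances.
    return sum(w * d for w, d in zip(weights, per_attr))


def update_weights(weights: List[float], pos_attr_dis: List[float],
                   neg_attr_dis: List[float], margin: float = 0.2,
                   lr: float = 0.1) -> Tuple[float, List[float]]:
    """One hinge-loss step; returns (loss, new weights)."""
    d_pos = weighted_dis(pos_attr_dis, weights)  # first matching information
    d_neg = weighted_dis(neg_attr_dis, weights)  # second matching information
    loss = max(0.0, margin + d_pos - d_neg)
    if loss > 0.0:
        # While the hinge is active, d(loss)/d(w_k) = pos_k - neg_k.
        weights = [w - lr * (p - n)
                   for w, p, n in zip(weights, pos_attr_dis, neg_attr_dis)]
        weights = [max(w, 0.0) for w in weights]  # keep weights non-negative
        total = sum(weights)
        if total > 0.0:
            weights = [w / total for w in weights]  # renormalise to sum to 1
    return loss, weights
```

In this sketch an attribute that separates same-category samples from different-category samples (small positive distance, large negative distance) gains weight, which matches the intent of [0149].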
[0151] In step 503, this embodiment of the application can use a distance metric to determine the matching information dis. A dis value of 0 indicates a high degree of matching between the data object and the candidate category information, and a value of 1 indicates a low degree of matching. This embodiment can also compute a matching score as 1 minus dis, sort the candidate category information by matching score in descending order, and, based on the sorting result, select the top Q pieces of candidate category information as the target candidate category information, where Q is a positive integer.
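The scoring and top-Q selection of step 503 can be sketched as follows, assuming (hypothetically) that the dis values of the candidates are held in a dictionary keyed by category code.

```python
from typing import Dict, List, Tuple


def top_q(candidate_dis: Dict[str, float], q: int) -> List[Tuple[str, float]]:
    """Score each candidate as 1 - dis and return the Q highest-scoring ones."""
    if q < 1:
        raise ValueError("Q must be a positive integer")
    scored = [(code, 1.0 - dis) for code, dis in candidate_dis.items()]
    # Python's sort is stable, so equal scores keep their input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:q]
```

For example, `top_q({"A": 0.1, "B": 0.4, "C": 0.7}, 2)` keeps candidates "A" and "B", with scores of about 0.9 and 0.6.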
[0152] In summary, the data object processing method of this application embodiment determines the first mapping representation and the second mapping representation corresponding to the first feature representation and the second feature representation under the condition of relational information, respectively. This can map the first feature representation and the second feature representation to the relational space represented by the relational information corresponding to the object attributes. In this way, the problem of different products of the same category corresponding to the same feature representation can be avoided to a certain extent. Based on this, the target candidate category information corresponding to the data object is determined according to the first mapping representation and the second mapping representation, which can improve the classification accuracy.
[0153] Method Embodiment 5
[0154] Referring to Figure 6, which shows a flowchart of the steps of a data object processing method according to an embodiment of this application, the method may specifically include the following steps:
[0155] Step 601: Determine the first category information corresponding to the data object based on the first data analyzer; the first category information may correspond to a part of the category information; the first data analyzer is used to characterize the mapping relationship between the object information and the first category information;
[0156] Step 602: Obtain candidate category information that matches the first category information from the category information corresponding to the object sample;
[0157] Step 603: Determine the target candidate category information corresponding to the data object based on the matching information between the data object and the candidate category information.
[0158] In practical applications, the total number of categories represented by full multi-character codes is usually very large, typically on the order of tens of thousands to hundreds of thousands. Using the first data analyzer to classify over all full multi-character codes would reduce its classification accuracy.
[0159] Embodiments of this application divide the classification of a data object into a first classification stage and a second classification stage. In the first classification stage, the first data analyzer classifies only a portion of the multi-character code, which improves its classification accuracy. In the second classification stage, the complete multi-character code is classified based on the first category information output by the first data analyzer; specifically, the target candidate category information corresponding to the data object is determined based on the matching information between the data object and the candidate category information.
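The candidate-generation part of the two-stage scheme (steps 601 and 602) can be sketched as a prefix filter. In this hypothetical sketch the first data analyzer is any callable that returns a partial code for a description, M is the number of leading characters it predicts, and the codes used in the usage note are made up.

```python
from typing import Callable, Iterable, List


def candidate_codes(description: str,
                    first_analyzer: Callable[[str], str],
                    all_codes: Iterable[str],
                    m: int) -> List[str]:
    """Stage 1: predict the first M characters, then keep full codes with that prefix."""
    prefix = first_analyzer(description)[:m]  # first category information (partial code)
    return [code for code in all_codes if code.startswith(prefix)]
```

With a stub analyzer such as `lambda text: "1234"`, M = 4, and the made-up codes `["1234000010", "1234990000", "5678000000"]`, only the first two codes are passed to the second stage, where matching information (for example the dis score) decides among them.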
[0160] The matching information between a data object and candidate category information may be determined from the matching information between the first feature representation of the data object and the fourth feature representation of the object samples corresponding to the candidate category information.
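One possible reading of [0160] is a nearest-sample comparison. The sketch below is hypothetical: it assumes cosine similarity as the matching measure and takes, for each candidate category, the best similarity between the data object's first feature representation and the fourth feature representations of that category's object samples.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # zero vector: no directional similarity
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def category_similarity(obj_feat: Vector,
                        samples_by_category: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Best cosine similarity per candidate category; categories without samples are skipped."""
    return {cat: max(cosine(obj_feat, s) for s in samples)
            for cat, samples in samples_by_category.items() if samples}
```

Taking the maximum rather than the mean lets one closely matching sample decide the category, which may suit categories whose samples vary widely; averaging is an equally plausible choice.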
[0161] Alternatively, the matching information may be determined as follows: based on the relational information corresponding to the object attributes, determine the first mapping representation and the second mapping representation corresponding respectively to the first feature representation and the second feature representation under the condition of the relational information; then determine the matching information between the data object and the candidate category information based on the first mapping representation, the second mapping representation, and the third feature representation.
[0162] Further, determining the matching information between a data object and candidate category information may include: determining, based on the first mapping representation, the second mapping representation, and the third feature representation, the attribute matching information of the data object under each object attribute; and determining the matching information between the data object and the candidate category information based on the attribute weights of the object attributes and the attribute matching information.
[0163] In summary, the data object processing method of the embodiments of this application divides the classification of a data object into a first classification stage and a second classification stage. In the first classification stage, the first data analyzer classifies a portion of the multi-character code, which can improve its classification accuracy. In the second classification stage, the complete multi-character code can be classified based on the first category information output by the first data analyzer.
[0164] Method Embodiment 6
[0165] The data object processing method in this application embodiment may include a training phase and a classification phase.
[0166] Referring to Figure 7, which illustrates the processing procedure of the training phase in one embodiment of this application: a first data analyzer that outputs the first M digits of the category information can be trained on filing data, which may be data stored by customs. This embodiment can train the first data analyzer on the object samples in the filing data; for details, refer to the description of Method Embodiment 3.
[0167] This embodiment can determine training data (I, P, V+, V-) from the filing data and train a vector model and a second data analyzer on that training data. The vector model has feature extraction capability and determines the feature representation of an input (e.g., I). The second data analyzer can be used to: determine, based on the relational information corresponding to the object attributes, the first mapping representation and the second mapping representation corresponding respectively to the first feature representation and the second feature representation under the condition of the relational information; determine the matching information between the data object and the candidate category information based on the first mapping representation, the second mapping representation, and the third feature representation; and determine the target candidate category information corresponding to the data object based on the matching information. The second data analyzer can include structures such as deep neural networks and TransR; this embodiment does not limit its specific structure.
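Building (I, P, V+, V-) quadruples from filing records might look like the following hypothetical sketch. The record layout (a description, a category code, and an attribute-to-content dictionary), the rule that V- must differ from the record's own content, and the seeded random choice are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]  # {"desc": str, "code": str, "attrs": Dict[str, str]}


def build_quadruples(records: List[Record],
                     rng: Optional[random.Random] = None) -> List[Tuple[str, str, str, str]]:
    """For each record and attribute P: V+ from a same-category record, V- from another category."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    quads = []
    for rec in records:
        for attr, own_value in rec["attrs"].items():
            pos = [r["attrs"][attr] for r in records
                   if r is not rec and r["code"] == rec["code"] and attr in r["attrs"]]
            neg = [r["attrs"][attr] for r in records
                   if r["code"] != rec["code"] and attr in r["attrs"]
                   and r["attrs"][attr] != own_value]
            if pos and neg:  # skip attributes lacking a positive or a negative example
                quads.append((rec["desc"], attr, rng.choice(pos), rng.choice(neg)))
    return quads
```

Given two hypothetical "cotton" records in category C1 and one "polyester" record in category C2, each C1 record yields a quadruple ("...", "material", "cotton", "polyester"), while the lone C2 record yields none because it has no same-category partner.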
[0168] Referring to Figure 8, which illustrates the processing steps of the classification stage according to an embodiment of this application: the descriptive information of a data object can be input into the first data analyzer, which outputs the first M digits of the category information of the data object. Candidate category information prefixed with these first M digits can then be obtained. P1 and V1 corresponding to the candidate category information are determined from a knowledge graph, and the third feature representation of P1 and the second feature representation of V1 are determined from the content saved in Method Embodiment 1.
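The knowledge-graph lookup of P1 and V1 for each candidate can be sketched as follows, assuming (hypothetically) that the knowledge graph is available as (category code, object attribute, attribute content) triples; the codes and attribute names in the usage note are made up.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (object attribute P, attribute content V)


def build_kg_index(triples: Iterable[Tuple[str, str, str]]) -> DefaultDict[str, List[Pair]]:
    """Index (code, attribute, content) triples by category code."""
    index: DefaultDict[str, List[Pair]] = defaultdict(list)
    for code, attr, content in triples:
        index[code].append((attr, content))
    return index


def attribute_pairs(candidates: Iterable[str],
                    index: DefaultDict[str, List[Pair]]) -> Dict[str, List[Pair]]:
    # .get() does not insert missing keys, so unknown codes map to an empty list.
    return {code: list(index.get(code, [])) for code in candidates}
```

For example, with triples `("C1", "material", "cotton")`, `("C1", "use", "apparel")` and `("C2", "material", "steel")`, looking up candidates `["C1", "C3"]` returns both (P, V) pairs for C1 and an empty list for C3.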
[0169] In this embodiment, the descriptive information of the data object can be input into the vector model, which outputs the first feature representation of the data object. The first, second, and third feature representations can then be input into the second data analyzer, which stores the relational information and determines the matching score of each piece of candidate category information according to the method embodiment shown in Figure 5. This embodiment can select several pieces of target candidate category information in descending order of matching score and output them together with their corresponding P1 and V1.
[0170] It should be noted that, for the sake of simplicity, the method embodiments are all described as a series of actions. However, those skilled in the art should understand that the embodiments of this application are not limited to the described order of actions, because according to the embodiments of this application, some steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of this application.
[0171] Based on the above embodiments, this application also provides a data object processing apparatus. Referring to Figure 9, the apparatus may include the following modules:
[0172] The feature determination module 901 is used to determine a first feature representation corresponding to a data object, and a second feature representation and a third feature representation corresponding to candidate category information; wherein, the second feature representation corresponds to the attribute content of the object attribute; and the third feature representation corresponds to the object attribute.
[0173] The mapping and matching module 902 is used to determine, based on the relational information corresponding to the object attributes, the first mapping representation and the second mapping representation corresponding respectively to the first feature representation and the second feature representation under the condition of the relational information, and to determine the matching information between the data object and the candidate category information based on the first mapping representation, the second mapping representation, and the third feature representation;
[0174] The target category determination module 903 is used to determine the target candidate category information corresponding to the data object based on the matching information.
[0175] To address the aforementioned problems, this application discloses a data object processing apparatus. Referring to Figure 10, the apparatus may include:
[0176] The sample determination module 1001 is used to determine triplet object samples; the triplet object samples include: a first object sample, a second object sample, and a third object sample; wherein, the second object sample corresponds to the same category information as the first object sample; and the third object sample corresponds to different category information than the first object sample.
[0177] The matching module 1002 is used to determine, based on the feature representations of the triplet object samples and the attribute weights, first matching information between the first object sample and the second object sample, and second matching information between the first object sample and the third object sample; the feature representation of a triplet object sample has a correspondence with the attribute weights, the feature representations of the object attributes, and the feature representations of the attribute content.
[0178] The update module 1003 is used to update the attribute weights according to loss information that has a mapping relationship with the first matching information and the second matching information.
[0179] To address the aforementioned problems, this application discloses a data object processing apparatus. Referring to Figure 11, the apparatus may include:
[0180] The first category determination module 1101 is used to determine the first category information corresponding to the data object according to the first data analyzer; the first category information corresponds to a portion of the category information; the first data analyzer is used to characterize the mapping relationship between the object information and the first category information;
[0181] The candidate category determination module 1102 is used to obtain candidate category information that matches the first category information from the category information corresponding to the object sample;
[0182] The target category determination module 1103 is used to determine the target candidate category information corresponding to the data object based on the matching information between the data object and the candidate category information.
[0183] This application also provides a non-volatile readable storage medium storing one or more modules (programs). When these modules are applied to a device, they enable the device to execute the instructions for the method steps in this application.
[0184] This application provides one or more machine-readable media storing instructions that, when executed by one or more processors, cause an electronic device to perform one or more of the methods described in the above embodiments. In this application, the electronic device includes devices such as servers and terminal devices.
[0185] Embodiments of this disclosure can be implemented, using any suitable hardware, firmware, software, or any combination thereof, as an apparatus configured as desired; the apparatus may include electronic devices such as servers (or server clusters) and terminals. Figure 12 schematically shows an exemplary apparatus 1300 that can be used to implement the various embodiments described in this application.
[0186] In one embodiment, Figure 12 shows an exemplary device 1300 that includes one or more processors 1302, a control module (chipset) 1304 coupled to at least one of the processors 1302, a memory 1306 coupled to the control module 1304, a non-volatile memory (NVM) / storage device 1308 coupled to the control module 1304, one or more input / output devices 1310 coupled to the control module 1304, and a network interface 1312 coupled to the control module 1304.
[0187] Processor 1302 may include one or more single-core or multi-core processors, and processor 1302 may include any combination of general-purpose processors or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, device 1300 can serve as a server, terminal, or other device as described in the embodiments of this application.
[0188] In some embodiments, apparatus 1300 may include one or more computer-readable media (e.g., memory 1306 or NVM / storage device 1308) storing instructions 1314, and one or more processors 1302 that are coupled to the one or more computer-readable media and configured to execute instructions 1314 to implement modules that perform the actions described in this disclosure.
[0189] In one embodiment, the control module 1304 may include any suitable interface controller to provide any suitable interface to at least one of the processors 1302 and / or any suitable device or component communicating with the control module 1304.
[0190] The control module 1304 may include a memory controller module to provide an interface to the memory 1306. The memory controller module may be a hardware module, a software module, and / or a firmware module.
[0191] Memory 1306 may be used, for example, to load and store data and / or instructions 1314 for device 1300. In one embodiment, memory 1306 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, memory 1306 may include double data rate fourth-generation synchronous dynamic random access memory (DDR4 SDRAM).
[0192] In one embodiment, the control module 1304 may include one or more input / output controllers to provide interfaces to the NVM / storage device 1308 and (one or more) input / output devices 1310.
[0193] For example, NVM / storage device 1308 may be used to store data and / or instructions 1314. NVM / storage device 1308 may include any suitable non-volatile memory (e.g., flash memory) and / or any suitable non-volatile storage devices (e.g., one or more hard disk drives (HDDs), one or more compact disc (CD) drives, and / or one or more digital versatile disc (DVD) drives).
[0194] NVM / storage device 1308 may include storage resources that are physically part of the device on which device 1300 is mounted, or storage resources that are accessible by that device without necessarily being part of it. For example, NVM / storage device 1308 may be accessed over a network through one or more input / output devices 1310.
[0195] One or more input / output devices 1310 may provide an interface for device 1300 to communicate with any other suitable device. Input / output devices 1310 may include communication components, audio components, sensor components, etc. Network interface 1312 may provide an interface for device 1300 to communicate via one or more networks. Device 1300 may wirelessly communicate with one or more components of a wireless network according to any of one or more wireless network standards and / or protocols, for example by accessing a wireless network based on a communication standard such as WiFi, 2G, 3G, 4G, 5G, or a combination thereof.
[0196] In one embodiment, at least one of the processors 1302 may be logically packaged with one or more controllers (e.g., memory controller modules) of the control module 1304. In one embodiment, at least one of the processors 1302 may be logically packaged with one or more controllers of the control module 1304 to form a system-in-package (SiP). In one embodiment, at least one of the processors 1302 may be integrated with the logic of one or more controllers of the control module 1304 on the same die. In one embodiment, at least one of the processors 1302 may be integrated with the logic of one or more controllers of the control module 1304 on the same die to form a system-on-a-chip (SoC).
[0197] In various embodiments, device 1300 may be, but is not limited to, a server, desktop computing device, or mobile computing device (e.g., laptop computing device, handheld computing device, tablet computer, netbook, etc.). In various embodiments, device 1300 may have more or fewer components and / or different architectures. For example, in some embodiments, device 1300 includes one or more cameras, a keyboard, a liquid crystal display (LCD) screen (including a touchscreen display), a non-volatile memory port, multiple antennas, a graphics chip, an application-specific integrated circuit (ASIC), and a speaker.
[0198] In device 1300, a main control chip can serve as the processor or the control module; sensor data, position information, and the like can be stored in the memory or the NVM / storage device; a sensor group can serve as an input / output device; and the communication interface can include the network interface.
[0199] This application also provides an electronic device, including: a processor; and a memory storing executable code thereon, which, when executed, causes the processor to perform one or more methods as described in this application.
[0200] This application also provides one or more machine-readable media having executable code stored thereon, which, when executed, causes a processor to perform one or more of the methods described in this application.
[0201] As the device embodiment is basically similar to the method embodiment, the description is relatively simple, and relevant parts can be found in the description of the method embodiment.
[0202] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.
[0203] This application is described with reference to flowcharts and / or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of this application. It should be understood that each process and / or block of the flowcharts and / or block diagrams, and combinations of processes and / or blocks in the flowcharts and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more processes of the flowcharts and / or one or more blocks of the block diagrams.
[0204] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowcharts and / or one or more blocks of the block diagrams.
[0205] These computer program instructions can also be loaded onto a computer or other programmable data processing terminal device, causing a series of operational steps to be performed on the computer or other programmable terminal device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more processes of the flowcharts and / or one or more blocks of the block diagrams.
[0206] Although preferred embodiments of the present application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of the embodiments of the present application.
[0207] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes said element.
[0208] The foregoing has provided a detailed description of a data object processing method, a data object processing apparatus, an electronic device, and a storage medium provided in this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.
Claims
1. A method for processing data objects, characterized in that the method includes: determining a first feature representation corresponding to a data object, and a second feature representation and a third feature representation corresponding to candidate category information, wherein the second feature representation corresponds to the attribute content of an object attribute, and the third feature representation corresponds to the object attribute; determining, using a first data analyzer, first category information corresponding to the data object, wherein the first category information corresponds to a portion of the category information, and the first data analyzer is used to characterize the mapping relationship between object information and the first category information; obtaining candidate category information matching the first category information from the category information corresponding to object samples; determining, based on the relational information corresponding to the object attributes, a first mapping representation and a second mapping representation corresponding respectively to the first feature representation and the second feature representation under the condition of the relational information, and determining the matching information between the data object and the candidate category information based on the first mapping representation, the second mapping representation, and the third feature representation; and determining, based on the matching information, the target candidate category information corresponding to the data object.
2. The method according to claim 1, characterized in that the relational information corresponding to the object attributes is obtained through training on training data; the training data includes: object samples, object attributes, first attribute content, and second attribute content; wherein the first attribute content represents the attribute content, under the object attribute, of samples with the same category information as the object sample; and the second attribute content represents the attribute content, under the object attribute, of samples with different category information from the object sample.
3. The method according to claim 2, characterized in that the training process using the training data includes: determining, based on the first relation information corresponding to the object attributes, the training mapping representations of the object sample, the first attribute content, and the second attribute content under the condition of the first relation information; and updating the first relation information and the feature representations corresponding to the training data according to loss information, the loss information having a mapping relationship with the training mapping representations and the feature representations corresponding to the object attributes.
4. The method according to any one of claims 1 to 3, characterized in that the process for determining the candidate category information includes: determining, based on the first data analyzer, the first category information corresponding to the data object, wherein the first category information corresponds to a portion of the category information, and the first data analyzer is used to characterize the mapping relationship between the object information and the first category information; and obtaining candidate category information that matches the first category information from the category information corresponding to the object samples.
5. The method according to any one of claims 1 to 3, characterized in that the step of determining the matching information between the data object and the candidate category information based on the first mapping representation, the second mapping representation, and the third feature representation includes: determining, based on the first mapping representation, the second mapping representation, and the third feature representation, the attribute matching information of the data object under each object attribute; and determining, based on the attribute weights of the object attributes and the attribute matching information, the matching information between the data object and the candidate category information.
6. The method according to claim 5, characterized in that the attribute weights are obtained by training on triplet object samples, a triplet object sample including a first object sample, a second object sample, and a third object sample, wherein the second object sample corresponds to the same category information as the first object sample, the third object sample corresponds to category information different from that of the first object sample, and the feature representation corresponding to the triplet object sample corresponds to the attribute weights, the feature representations corresponding to the object attributes, and the feature representations corresponding to the attribute content.
7. A method for processing data objects, characterized in that the method includes: determining triplet object samples, the triplet object samples including a first object sample, a second object sample, and a third object sample, wherein the second object sample corresponds to the same category information as the first object sample and the third object sample corresponds to category information different from that of the first object sample; determining, based on the feature representations and attribute weights corresponding to the triplet object samples, first matching information between the first object sample and the second object sample and second matching information between the first object sample and the third object sample, wherein the feature representations corresponding to the triplet object samples correspond to the attribute weights, the feature representations corresponding to the object attributes, and the feature representations corresponding to the attribute content; and updating the attribute weights according to the mapping relationship between the loss information and the first and second matching information; wherein a first data analyzer is used to classify part of the multi-character data corresponding to the data object, and the entire multi-character data is classified based on the first category information output by the first data analyzer.
8. A method for processing data objects, characterized in that the method includes: determining, based on a first data analyzer, first category information corresponding to a data object, wherein the first category information corresponds to a portion of the category information and the first data analyzer is used to characterize the mapping relationship between the object information and the first category information; obtaining, from the category information corresponding to the object samples, candidate category information that matches the first category information; and determining, based on the matching information between the data object and the candidate category information, the target candidate category information corresponding to the data object; wherein the process of determining the matching information between the data object and the candidate category information includes: determining, based on the relationship information corresponding to the object attributes, the first mapping representation and the second mapping representation corresponding respectively to the first feature representation and the second feature representation under the condition of the relationship information; determining, based on the first mapping representation, the second mapping representation, and the third feature representation, the attribute matching information corresponding to the data object under the condition of the object attributes; and determining the matching information between the data object and the candidate category information based on the attribute weights of the object attributes and the attribute matching information; wherein the third feature representation corresponds to the object attributes.
9. An electronic device, characterized by comprising: a processor; and a memory having executable code stored thereon which, when executed, causes the processor to perform the method according to any one of claims 1 to 8.
10. One or more machine-readable media having executable code stored thereon, which, when executed, causes a processor to perform the method as claimed in any one of claims 1-8.
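Claim 3 leaves the form of the mapping and of the loss open. One common way to realize "mapping representations under the condition of relation information" is a TransR-style projection, in which each relation owns a matrix that maps entity vectors into a relation space. The sketch below assumes that model, a squared-L2 translation distance, and a margin ranking loss in which the first attribute content is the true value and the second attribute content is a corrupted one. `transr_step` and its parameters are hypothetical names, not the patent's.

```python
import numpy as np


def transr_step(M, r, h, t_pos, t_neg, margin=1.0, lr=0.01):
    """One SGD step of an assumed TransR-style margin loss.

    M:     (d_r, d) projection matrix, standing in for the first relation information
    r:     (d_r,) feature representation of the object attribute (kept fixed here)
    h:     (d,) feature representation of the object sample
    t_pos: (d,) first attribute content (assumed to be the true value)
    t_neg: (d,) second attribute content (assumed to be a corrupted value)
    Arrays are updated in place; the loss before the update is returned.
    """
    # Training mapping representations: project both sides into relation space.
    e_pos = M @ (h - t_pos) + r  # equals M h + r - M t_pos
    e_neg = M @ (h - t_neg) + r
    d_pos = float(e_pos @ e_pos)  # squared L2 translation distance
    d_neg = float(e_neg @ e_neg)
    loss = max(0.0, margin + d_pos - d_neg)
    if loss > 0.0:
        # Gradients of margin + d_pos - d_neg, all taken before any update.
        g_M = 2 * np.outer(e_pos, h - t_pos) - 2 * np.outer(e_neg, h - t_neg)
        g_h = 2 * M.T @ (e_pos - e_neg)
        g_tp = -2 * M.T @ e_pos
        g_tn = 2 * M.T @ e_neg
        # Update the relation information and the training-data representations.
        M -= lr * g_M
        h -= lr * g_h
        t_pos -= lr * g_tp
        t_neg -= lr * g_tn
    return loss
```

Because no update is applied once the hinge is inactive, the loss stops changing as soon as the corrupted attribute content is at least `margin` farther away than the true one.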
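Claim 7 states that the first data analyzer classifies only part of the multi-character code. HS codes are hierarchical (a two-digit chapter, a four-digit heading, a six-digit subheading, and national extensions beyond six digits), so one plausible reading of claim 4 is that the first category information is a leading prefix of the code. The sketch assumes exactly that: a hypothetical keyword rule stands in for the trained first data analyzer, and the candidates are the known codes that share the predicted prefix. The codes are illustrative only.

```python
from typing import Callable, Iterable


def toy_first_analyzer(description: str) -> str:
    """Hypothetical keyword rule standing in for a trained first data analyzer.

    Returns a partial code (the first category information), or "" when unsure.
    """
    text = description.lower()
    if "t-shirt" in text:
        return "6109"
    if "phone" in text:
        return "8517"
    return ""  # empty prefix: no narrowing, every known code stays a candidate


def candidate_categories(
    description: str,
    first_analyzer: Callable[[str], str],
    known_codes: Iterable[str],
) -> list[str]:
    """Keep the known codes whose leading characters match the partial prediction."""
    prefix = first_analyzer(description)
    return sorted({code for code in known_codes if code.startswith(prefix)})


# Illustrative codes only; in practice these come from the labelled object samples.
KNOWN = ["6109100000", "6109909000", "8517130000", "9503000000"]
```

Narrowing by prefix keeps the fine-grained matching step from having to score every code in the nomenclature.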
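Claim 5 combines per-attribute matching with attribute weights but fixes neither formula. Assuming a translation-style score, in which the first mapping representation plus the attribute vector should land near the second mapping representation, and a plain weighted sum over attributes, the computation might look like this. Both the score and the aggregation are assumptions, and the function names are hypothetical.

```python
import numpy as np


def attribute_matching(m_obj, m_cand, r_attr):
    """Attribute matching information, one score per object attribute.

    m_obj:  (K, d_r) first mapping representations (data object)
    m_cand: (K, d_r) second mapping representations (candidate category)
    r_attr: (K, d_r) third feature representations (object attributes)
    Assumed form: negative L2 distance of the translation m_obj + r_attr - m_cand.
    """
    return -np.linalg.norm(m_obj + r_attr - m_cand, axis=1)


def matching_score(m_obj, m_cand, r_attr, weights):
    """Matching information: attribute scores combined by attribute weights."""
    return float(np.asarray(weights) @ attribute_matching(m_obj, m_cand, r_attr))
```

A higher score means a closer match, so the candidate with the largest `matching_score` would be preferred.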
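Claim 6 trains the attribute weights on triplets whose second sample shares the first sample's category and whose third sample does not. Given labelled object samples, such triplets could be drawn as follows. Uniform random sampling is an assumption (hard-negative mining is a common alternative), and `sample_triplets` is a hypothetical helper.

```python
import random
from collections import defaultdict


def sample_triplets(labels: dict[str, str], n: int, seed: int = 0) -> list[tuple[str, str, str]]:
    """Sample (first, second, third) object ids for triplet training.

    labels maps object id -> category code. The second sample shares the first
    sample's category; the third sample has a different category.
    """
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for obj, cat in labels.items():
        by_cat[cat].append(obj)
    # A first sample needs at least one other object in its own category.
    anchors = [o for o, c in labels.items() if len(by_cat[c]) > 1]
    if not anchors or len(by_cat) < 2:
        raise ValueError("need a category with >= 2 objects and >= 2 categories")
    triplets = []
    for _ in range(n):
        first = rng.choice(anchors)
        cat = labels[first]
        second = rng.choice([o for o in by_cat[cat] if o != first])
        third = rng.choice([o for o, c in labels.items() if c != cat])
        triplets.append((first, second, third))
    return triplets
```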
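Claim 7 updates the attribute weights from a loss over the first matching information (first vs. second sample) and the second matching information (first vs. third sample). A minimal sketch, assuming the per-attribute matching scores are already computed, a weighted-sum matching function, a hinge (triplet margin) loss, plain gradient descent, and non-negative clipping of the weights; none of these choices is stated in the claim, and `triplet_weight_step` is a hypothetical name.

```python
import numpy as np


def triplet_weight_step(w, s_pos, s_neg, margin=0.5, lr=0.1):
    """One update of the attribute weights from a single triplet.

    w:     (K,) attribute weights
    s_pos: (K,) attribute matching between the first and second object samples
    s_neg: (K,) attribute matching between the first and third object samples
    Returns (new_weights, loss_before_update).
    """
    m_pos = float(w @ s_pos)  # first matching information
    m_neg = float(w @ s_neg)  # second matching information
    loss = max(0.0, margin - m_pos + m_neg)  # assumed triplet hinge loss
    if loss > 0.0:
        grad = s_neg - s_pos  # d(loss)/dw while the hinge is active
        w = np.clip(w - lr * grad, 0.0, None)  # keep weights non-negative (assumed)
    return w, loss
```

If one attribute separates same-category from different-category samples better than another, repeated steps shift weight toward it until the margin is met.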
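Claim 8 chains the steps together: prefix-based candidate retrieval, projection under the relation information of each object attribute, attribute matching, weighting, and selection of the best-matching candidate. The end-to-end sketch below assumes one projection matrix per attribute, a translation-style distance, a weighted sum, and argmax selection; `classify` and all arrays are hypothetical.

```python
import numpy as np


def classify(x, prefix, cand_vecs, M, r, w):
    """Pick the target candidate code for one data object (hypothetical sketch).

    x:         (d,) first feature representation of the data object
    prefix:    first category information (partial code) from the first data analyzer
    cand_vecs: dict code -> (d,) second feature representation of each category
    M:         (K, d_r, d) per-attribute projections (assumed relation information)
    r:         (K, d_r) third feature representations, one row per object attribute
    w:         (K,) attribute weights
    Returns the best-matching code, or None if no candidate shares the prefix.
    """
    candidates = [code for code in cand_vecs if code.startswith(prefix)]
    if not candidates:
        return None
    m1 = M @ x  # (K, d_r) first mapping representations

    def matching(c):
        m2 = M @ c  # (K, d_r) second mapping representations
        s = -np.linalg.norm(m1 + r - m2, axis=1)  # attribute matching information
        return float(w @ s)  # matching information (weighted sum)

    return max(candidates, key=lambda code: matching(cand_vecs[code]))
```

In a toy setup with identity projections and zero attribute vectors, a code outside the predicted prefix is never scored, even if its vector would match the data object exactly; the prefix filter bounds the search before fine-grained matching.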