Artificial intelligence-based task prediction method, apparatus, device, and medium
By masking unlabeled table samples and performing self-supervised learning, a pre-trained model is generated and then trained in a supervised manner. This solves the problem of insufficient feature extraction capability for table data and improves the performance and accuracy of task prediction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA PING AN LIFE INSURANCE CO LTD
- Filing Date
- 2022-11-17
- Publication Date
- 2026-06-16
AI Technical Summary
In situations where tabular data labels are scarce, existing technologies have limited neural network models' ability to extract features from tabular data, resulting in limited task prediction performance, and training the model requires a large amount of manual labeling costs.
By masking unlabeled table samples, generating semantic vector sequences using an encoder, and obtaining a pre-trained model through self-supervised learning of the encoder, a task model is then built on the pre-trained model and trained in a supervised manner, reducing labeling costs.
While reducing labeling costs, it improves the task model's ability to extract features from tabular data and enhances task prediction performance, ensuring prediction accuracy even with insufficient labels.
Figure CN115828153B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence technology, specifically to a task prediction method, apparatus, device, and medium based on artificial intelligence.
Background Technology
[0002] With the development of artificial intelligence technology, there are more and more scenarios where artificial intelligence technology is used to perform various task predictions. For example, in the product recommendation scenario of e-commerce, neural networks are often used to predict the products that the target object is interested in based on the attribute characteristics of the target object, and then product recommendation content is displayed to the target object according to the prediction results.
[0003] In practical task prediction scenarios, neural networks often need to output prediction results based on tabular modal data. To enable the neural network to learn the features of tabular data, it is usually necessary to collect a large number of tabular modal samples and manually label them, using these labeled samples for supervised training of the neural network. However, for specific tasks with scarce tabular data and labels, the effectiveness of task models obtained through supervised training is limited, and the task models have a low ability to extract features from tabular data.
Summary of the Invention
[0004] This application provides a task prediction method, apparatus, device, and medium based on artificial intelligence, which can save the labeling cost of training task models while ensuring the task model's feature extraction capability for tabular data, thereby improving the performance and accuracy of the task model in performing task prediction.
[0005] In a first aspect, embodiments of this application provide a task prediction method based on artificial intelligence, the method comprising the following steps:
[0006] Obtain a first sample set, which includes multiple unlabeled first table samples. Each first table sample includes multiple fields, and each field includes an initial field value.
[0007] For each of the first table samples, some initial field values in the first table sample are masked to obtain the masked table sample.
[0008] The semantic vector sequence corresponding to the masked table sample is determined by a preset encoder;
[0009] Obtain the mask position in the masked table sample;
[0010] The first loss value is determined based on the semantic vector sequence corresponding to the masked table sample, the first table sample, and the mask position;
[0011] The encoder is updated based on the first loss value to obtain a pre-trained model;
[0012] A task model is constructed based on the pre-trained model;
[0013] Obtain a second sample set, which includes multiple second table samples, each of which has a corresponding label;
[0014] The task model is trained based on each of the second table samples and the corresponding labels of the second table samples to obtain the target task model;
[0015] Obtain the target table data, and determine the task prediction result corresponding to the target table data through the target task model.
[0016] In some embodiments, the encoder includes a feature extraction layer and a semantic encoding layer; determining the semantic vector sequence corresponding to the masked table sample using a preset encoder includes:
[0017] The feature extraction layer determines the first embedding vector sequence based on the field and target field values included in the masked table sample;
[0018] The semantic encoding layer determines the semantic vector sequence corresponding to the masked table sample based on the first embedding vector sequence.
[0019] In some embodiments, the feature extraction layer includes a first feature extraction layer and a second feature extraction layer; determining the first embedding vector sequence through the feature extraction layer based on the fields included in the masked table sample and the target field values includes:
[0020] For each field in the masked table sample, the first feature extraction layer determines the first embedding vector of the field based on the type corresponding to the field;
[0021] The second feature extraction layer determines the second embedding vector of the field based on the target field value corresponding to the field;
[0022] Based on the first embedding vector and the second embedding vector corresponding to the field, determine the third embedding vector corresponding to the field;
[0023] The first embedding vector sequence is determined based on the third embedding vector corresponding to all fields in the masked table sample.
[0024] In some embodiments, determining the second embedding vector of the field based on the target field value corresponding to the field through the second feature extraction layer includes:
[0025] Obtain the numeric type of the target field value corresponding to the specified field;
[0026] When the numerical type of the target field value is a continuous numerical value, the target numerical range corresponding to the target field value is determined according to a plurality of preset numerical ranges, and the second embedding vector corresponding to the field is determined according to the first preset embedding vector corresponding to the target numerical range.
[0027] When the numerical type of the target field value is a discrete numerical value, the second embedding vector corresponding to the field is determined according to the second preset embedding vector corresponding to the target field value;
[0028] When the numerical type of the target field value is a masked value, the second embedding vector corresponding to the field is determined according to the third preset embedding vector corresponding to the masked value.
[0029] In some embodiments, determining the first loss value based on the semantic vector sequence corresponding to the masked table sample, the first table sample, and the mask position includes:
[0030] Based on the field corresponding to the mask position in the masked table sample, determine the label corresponding to each field in the masked table sample, such that the label of the field corresponding to the mask position is 1, and the label of the field not corresponding to the mask position is 0;
[0031] A label sequence is obtained based on the labels corresponding to each of the fields;
[0032] An initial field value sequence is determined from the first table sample corresponding to the masked table sample, and a second embedding vector sequence is determined based on the initial field value sequence through a preset feature extraction network;
[0033] The first loss value is calculated based on the label sequence, the second embedding vector sequence, and the semantic vector sequence.
[0034] The labels in the label sequence, the embedding vectors in the second embedding vector sequence, and the semantic vectors in the semantic vector sequence are all arranged in the order of the fields.
[0035] In some embodiments, the first loss value is calculated using the following formula:
[0036]
[0037] Where loss represents the first loss value; L represents the number of fields included in the masked table sample; $A_i$ represents the i-th semantic vector in the semantic vector sequence; $M_i$ represents the i-th label in the label sequence; and the similarity calculation function measures the similarity between $A_i$ and the i-th embedding vector in the second embedding vector sequence.
[0038] In some embodiments, constructing a task model based on the pre-trained model includes:
[0039] The pre-trained model is used as a preprocessing layer for the task model;
[0040] An output layer is added at the output of the preprocessing layer to obtain the task model.
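The construction in the two steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `TaskModel`, the mean pooling over per-field semantic vectors, and the single linear output layer are all assumptions, and the encoder is replaced by a stand-in function.

```python
class TaskModel:
    """Pre-trained encoder used as a preprocessing layer, plus an output layer.

    Hypothetical sketch: pooling and the linear output layer are assumptions.
    """

    def __init__(self, encoder, weights, bias):
        self.encoder = encoder  # callable: table sample -> list of semantic vectors
        self.weights = weights  # output-layer weights, one per vector dimension
        self.bias = bias        # output-layer bias

    def predict(self, sample):
        vectors = self.encoder(sample)
        dim = len(vectors[0])
        # Assumption: average the per-field semantic vectors into one vector.
        pooled = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
        # Linear output layer producing a single score.
        return sum(w * x for w, x in zip(self.weights, pooled)) + self.bias


def stand_in_encoder(sample):
    """Placeholder for the pre-trained model; returns fixed toy vectors."""
    return [[1.0, 0.0], [0.0, 1.0]]


model = TaskModel(stand_in_encoder, weights=[2.0, 4.0], bias=1.0)
score = model.predict({"gender": "male", "height": 175})
print(score)
```

During supervised training only `weights` and `bias` would need to be initialized from scratch; whether the encoder parameters are also updated is left open here.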
[0041] In a second aspect, embodiments of this application provide an artificial intelligence-based task prediction device, the device comprising:
[0042] The first acquisition module is used to acquire a first sample set, which includes multiple unlabeled first table samples. Each first table sample includes multiple fields, and each field includes an initial field value.
[0043] The masking module is used to mask some of the initial field values in each of the first table samples to obtain the masked table samples.
[0044] The first determining module is used to determine the semantic vector sequence corresponding to the masked table sample through a preset encoder;
[0045] The second acquisition module is used to acquire the mask position in the masked table sample;
[0046] The second determining module is used to determine a first loss value based on the semantic vector sequence corresponding to the masked table sample, the first table sample, and the mask position.
[0047] A first training module is used to update the encoder based on the first loss value to obtain a pre-trained model.
[0048] The model building module is used to build a task model based on the pre-trained model;
[0049] The third acquisition module is used to acquire a second sample set, which includes multiple second table samples, each of which has a corresponding label.
[0050] The second training module is used to train the task model based on each of the second table samples and the labels corresponding to the second table samples, so as to obtain the target task model.
[0051] The third determining module is used to acquire target table data and determine the task prediction result corresponding to the target table data through the target task model.
[0052] In a third aspect, embodiments of this application provide an electronic device, which includes a memory and a processor. The memory stores computer programs or instructions, and the processor executes the computer programs or instructions to implement the method provided in the first aspect of embodiments of this application.
[0053] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program or instructions, which, when executed by a processor, implement the method provided in the first aspect of embodiments of this application.
[0054] The solution provided in this application first masks the unlabeled first table sample, then outputs the semantic vector sequence corresponding to the masked table sample through an encoder. The encoder's first loss value is then determined based on the semantic vector sequence, and the encoder's parameters are updated based on this first loss value. This allows the encoder to learn the table data features in a self-supervised manner during the iterative update process, obtaining a pre-trained model capable of extracting table data features. A task model is then built based on the pre-trained model, and supervised training is performed using labeled second table samples to obtain a target task model for performing the prediction task. This solution reduces the labeling cost of training the target task model and ensures the target task model's ability to extract table data features even when the number of labeled table samples is insufficient, thereby improving the performance and accuracy of the task model in performing task prediction.
Attached Figure Description
[0055] Figure 1 is a flowchart illustrating the artificial intelligence-based task prediction method provided in an embodiment of this application;
[0056] Figure 2 is a schematic diagram illustrating the specific implementation process of step S103 in Figure 1;
[0057] Figure 3 is a schematic diagram illustrating the specific implementation process of step S201 in Figure 2;
[0058] Figure 4 is a schematic diagram illustrating the specific implementation process of step S302 in Figure 3;
[0059] Figure 5 is a schematic diagram illustrating the process of determining a semantic vector sequence according to an embodiment of this application;
[0060] Figure 6 is a schematic diagram illustrating the specific implementation process of step S105 in Figure 1;
[0061] Figure 7 is a schematic diagram of the training process of the pre-trained model provided in the embodiments of this application;
[0062] Figure 8 is a schematic diagram of the task model provided in the embodiments of this application;
[0063] Figure 9 is a schematic diagram of the structure of an artificial intelligence-based task prediction device provided in an embodiment of this application;
[0064] Figure 10 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0065] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0066] It should be noted that, unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.
[0067] First, some of the terms used in this application are explained:
[0068] Artificial Intelligence (AI) is a new branch of computer science that studies, develops, and applies theories, methods, technologies, and systems to simulate, extend, and expand human intelligence. It aims to understand the essence of intelligence and produce intelligent machines that can react in a way similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. AI can simulate the information processes of human consciousness and thought. Furthermore, AI utilizes digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceiving the environment, acquiring knowledge, and using that knowledge to achieve optimal results.
[0069] Self-supervised learning primarily uses auxiliary (pretext) tasks to extract supervisory information from large-scale unlabeled data. This constructed supervisory information is then used to train the network, enabling it to learn representations valuable for downstream tasks.
[0070] Supervised learning is a process that uses a set of labeled data to learn the mapping from input to output, and then applies this mapping to unknown data to achieve the purpose of classification or regression.
[0071] Embedding is a method of representing an object using a numerical vector. This object can be a word, an item, a movie, etc. An item can be represented by a vector because the distance between this vector and other item vectors reflects the similarity between those items. Furthermore, the distance vector between two vectors can even reflect the relationship between them.
[0072] With the development of artificial intelligence technology, there are more and more scenarios where artificial intelligence technology is used to perform various task predictions. For example, in the product recommendation scenario of e-commerce, neural networks are often used to predict the products that the target object is interested in based on the attribute characteristics of the target object, and then product recommendation content is displayed to the target object according to the prediction results.
[0073] In practical task prediction scenarios, neural networks often need to output prediction results based on tabular modal data. To enable the neural network to learn the features of tabular data, it is usually necessary to collect a large number of tabular modal samples and manually label them, using these labeled samples for supervised training of the neural network. However, for specific tasks with scarce tabular data and labels, the effectiveness of task models obtained through supervised training is limited, and the task models have a low ability to extract features from tabular data.
[0074] In view of this, embodiments of this application provide an artificial intelligence-based task prediction method, task prediction device, electronic device, and computer-readable storage medium, which can save the labeling cost of training task models, while ensuring the task model's feature extraction capability for tabular data, thereby improving the performance of the task model in performing task prediction.
[0075] The task prediction method, task prediction device, electronic device, and storage medium provided in this application are specifically described through the following embodiments. First, the task prediction method in the embodiments of this application is described.
[0076] The task prediction method provided in this application can be applied to a terminal, a server, or software running on either a terminal or a server. In some embodiments, the terminal can be a smartphone, tablet, laptop, desktop computer, etc.; the server can be configured as an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms; the software can be an application that implements the task prediction method, but is not limited to the above forms.
[0077] This application can be used in a wide variety of general-purpose or special-purpose computer system environments or configurations. Examples include: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. This application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.
[0078] It should be noted that in various specific embodiments of the present invention, when processing is required based on data related to the characteristics of an object (such as a user's attribute information or a set of attribute information), the permission or consent of the corresponding object will be obtained first. Furthermore, the collection, use, and processing of this data will comply with the relevant laws, regulations, and standards of the relevant countries and regions. In addition, when an embodiment of the present invention needs to obtain the attribute information of an object, separate permission or consent from the corresponding object will be obtained through pop-ups or redirection to a confirmation page. Only after obtaining the separate permission or consent of the corresponding object will the necessary object-related data for the normal operation of the embodiments of the present invention be obtained.
[0079] Please refer to Figure 1, which is a flowchart illustrating an artificial intelligence-based task prediction method provided in an embodiment of this application. The method includes steps S101-S110, which are described in detail below:
[0080] Step S101: Obtain a first sample set, which includes multiple unlabeled first table samples. Each first table sample includes multiple fields, and each field includes an initial field value.
[0081] It is understood that the first sample set is an unlabeled sample set, meaning that none of the samples in the first sample set have corresponding labels. In this embodiment of the application, the first sample set includes multiple first table samples, each of which has multiple fields, and each field has an initial field value.
[0082] Please refer to Table 1, which is an exemplary first table sample provided in the embodiments of this application. The first table sample shown in Table 1 contains the following fields: "gender", "height", "age", and "occupation", and the initial field values corresponding to each field are "male", "175", "28", and "teacher".
[0083] Table 1
[0084]
| gender | height | age | occupation |
| --- | --- | --- | --- |
| male | 175 | 28 | teacher |
[0085] Step S102: For each of the first table samples, perform masking on some of the initial field values in the first table sample to obtain the masked table sample.
[0086] In practice, a mask ratio can be preset, and then the number of initial field values to be masked can be determined based on the number of initial field values contained in the first table sample and the mask ratio. Then, the initial field values in the first table sample can be randomly masked according to the number of initial field values to be masked.
[0087] Please refer to Table 2, which shows the masked table sample obtained after masking some of the initial field values in the first table sample shown in Table 1. Assuming a preset mask ratio of 25%, one of the four initial field values in the first table sample shown in Table 1 needs to be masked. The initial field value corresponding to "age" is then randomly selected and masked; that is, the initial field value "28" corresponding to "age" is replaced with "[MASK]", resulting in the masked table sample. A higher mask ratio can also be set to mask more initial field values in the first table sample; the embodiments of this application do not limit this.
[0088] Table 2
[0089]
| gender | height | age | occupation |
| --- | --- | --- | --- |
| male | 175 | [MASK] | teacher |
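The masking of step S102 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `mask_sample`, the dictionary representation of a table sample, and the rounding rule (round down, but mask at least one value) are not specified in the source.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder string as shown in Table 2


def mask_sample(sample, mask_ratio, rng):
    """Return (masked_sample, mask_positions) for one table sample.

    sample: dict mapping field name -> initial field value (order preserved).
    mask_ratio: fraction of field values to mask, e.g. 0.25.
    rng: random.Random instance, passed in so runs can be reproduced.
    """
    fields = list(sample)
    # Assumption: round down, but always mask at least one field value.
    num_to_mask = max(1, int(len(fields) * mask_ratio))
    mask_positions = sorted(rng.sample(range(len(fields)), num_to_mask))
    masked = dict(sample)  # copy, so the original sample stays intact
    for pos in mask_positions:
        masked[fields[pos]] = MASK_TOKEN
    return masked, mask_positions


sample = {"gender": "male", "height": 175, "age": 28, "occupation": "teacher"}
masked, positions = mask_sample(sample, 0.25, random.Random(0))
print(masked, positions)
```

With four fields and a 25% ratio exactly one value is masked; which one depends on the random generator. The returned positions are what step S104 later retrieves.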
[0090] Step S103: Determine the semantic vector sequence corresponding to the masked table sample using a preset encoder.
[0091] For example, the encoder can use the BERT model. In a specific implementation, the masked table sample can be input into the encoder, and the encoder can output a sequence of semantic vectors corresponding to the masked table sample. The sequence of semantic vectors contains the semantic vectors corresponding to each field in the table.
[0092] In some embodiments, the encoder includes a feature extraction layer and a semantic encoding layer. Correspondingly, step S103 can be implemented through steps S201-S202 shown in Figure 2:
[0093] Step S201: The feature extraction layer determines the first embedding vector sequence based on the fields and target field values included in the masked table sample;
[0094] Step S202: Based on the first embedding vector sequence, the semantic encoding layer determines the semantic vector sequence corresponding to the masked table sample.
[0095] Understandably, the feature extraction layer is used to extract features from the fields included in the masked table sample and the target field values corresponding to each field, and generate a vector representation for each field, resulting in a first embedding vector sequence. Here, the target field values represent the field values corresponding to each field in the masked table sample, where some field values are masked. The semantic encoding layer is used to generate semantic vectors corresponding to each field based on the first embedding vector sequence, resulting in a semantic vector sequence.
[0096] In some embodiments, the feature extraction layer includes a first feature extraction layer and a second feature extraction layer. Correspondingly, step S201 can be implemented through steps S301-S304 shown in Figure 3:
[0097] Step S301: For each field in the masked table sample, the first feature extraction layer determines the first embedding vector of the field based on the type corresponding to the field.
[0098] Step S302: The second feature extraction layer determines the second embedding vector of the field based on the target field value corresponding to the field.
[0099] Step S303: Determine the third embedding vector corresponding to the field based on the first embedding vector and the second embedding vector corresponding to the field;
[0100] Step S304: Determine the first embedding vector sequence based on the third embedding vectors corresponding to all fields in the masked table sample.
[0101] Understandably, the first feature extraction layer is used to extract the features of the fields and generate the first embedding vector corresponding to each field; the second feature extraction layer is used to extract the features of the target field values and generate the second embedding vector corresponding to each target field value; the corresponding first embedding vector and second embedding vector are added together to obtain the third embedding vector corresponding to each field; and the third embedding vectors corresponding to all fields in the masked table sample are combined to form the first embedding vector sequence.
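Steps S303 and S304 amount to an element-wise sum per field followed by collecting the results in field order. A minimal sketch, assuming embeddings are plain lists of floats; the function names and the toy numbers are illustrative only:

```python
def add_vectors(a, b):
    """Element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions must match")
    return [x + y for x, y in zip(a, b)]


def first_embedding_sequence(field_embeddings, value_embeddings):
    """Build the first embedding vector sequence from per-field embeddings.

    field_embeddings[i]: first embedding vector of field i (from the field type).
    value_embeddings[i]: second embedding vector of field i (from the field value).
    The i-th result is the third embedding vector of field i.
    """
    return [add_vectors(f, v) for f, v in zip(field_embeddings, value_embeddings)]


# Toy 2-dimensional embeddings for a sample with three fields (made-up numbers).
field_embeddings = [[0.5, 0.25], [1.0, 1.0], [0.0, 2.0]]
value_embeddings = [[1.0, 2.0], [0.5, 0.5], [4.0, 0.0]]
e = first_embedding_sequence(field_embeddings, value_embeddings)
print(e)  # [[1.5, 2.25], [1.5, 1.5], [4.0, 2.0]]
```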
[0102] In some embodiments, step S302 can be implemented through steps S401-S404 shown in Figure 4:
[0103] Step S401: Obtain the numeric type of the target field value corresponding to the field;
[0104] Step S402: When the numerical type of the target field value is a continuous numerical value, the target numerical range corresponding to the target field value is determined according to a plurality of preset numerical ranges, and the second embedding vector corresponding to the field is determined according to the first preset embedding vector corresponding to the target numerical range.
[0105] Step S403: When the numerical type of the target field value is a discrete numerical value, determine the second embedding vector corresponding to the field according to the second preset embedding vector corresponding to the target field value;
[0106] Step S404: When the value type of the target field value is a masked value, determine the second embedding vector corresponding to the field according to the third preset embedding vector corresponding to the masked value.
[0107] Specifically, the numeric types include continuous, discrete, and masked. Continuous numeric values are in numerical form, such as the field value corresponding to the field "height"; discrete numeric values are in categorical form, such as the field value "male / female" corresponding to the field "gender"; masked numeric values refer to the values after being masked, i.e., "[MASK]".
[0108] For continuous target field values, firstly, based on multiple preset numerical intervals, it is determined which numerical interval the current target field value belongs to. Each preset numerical interval has a corresponding first preset embedding vector. Then, based on the numerical interval to which the current target field value belongs, the second embedding vector corresponding to the field is determined (i.e., ... ).
[0109] For discrete target field values, a second preset embedding vector is first set for each category. For example, for the "gender" field, the second preset embedding vector for "male" is set to "embedding1" and the second preset embedding vector for "female" is set to "embedding2". The second embedding vector corresponding to the current target field value can be determined based on the second preset embedding vector.
[0110] For a mask-type target field value, a fixed embedding is set for the mask, which is the third preset embedding vector. Therefore, the second embedding vector corresponding to the mask-type target field value is the fixed third preset embedding vector.
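The three cases of steps S402-S404 can be sketched as a single lookup function. This is an illustration under assumptions: the numerical type is inferred here from the Python type of the value, the interval boundaries and all embedding values are made up, and the function and parameter names are hypothetical.

```python
from bisect import bisect_right

MASK_TOKEN = "[MASK]"


def value_embedding(value, boundaries, interval_embeddings,
                    category_embeddings, mask_embedding):
    """Select the second embedding vector for one target field value.

    boundaries: sorted cut points defining len(boundaries) + 1 numerical intervals.
    interval_embeddings: one first preset embedding vector per interval.
    category_embeddings: dict of category -> second preset embedding vector.
    mask_embedding: the fixed third preset embedding vector.
    """
    if value == MASK_TOKEN:
        # Masked value: always the same fixed embedding.
        return mask_embedding
    if isinstance(value, (int, float)):
        # Continuous value: find the interval the value falls into.
        return interval_embeddings[bisect_right(boundaries, value)]
    # Discrete value: direct lookup by category.
    return category_embeddings[value]


# Hypothetical presets: heights split at 160, 170 and 180 give four intervals.
height_boundaries = [160, 170, 180]
height_embeddings = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
gender_embeddings = {"male": [0.5, 0.5], "female": [-0.5, -0.5]}
mask_embedding = [9.0, 9.0]

print(value_embedding(175, height_boundaries, height_embeddings, {}, mask_embedding))
print(value_embedding("male", [], [], gender_embeddings, mask_embedding))
print(value_embedding(MASK_TOKEN, [], [], {}, mask_embedding))
```

Bucketing continuous values means nearby heights such as 171 and 179 share one embedding; how many intervals to preset is a design choice the source leaves open.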
[0111] Please refer to Figure 5, which is a schematic diagram illustrating the process of determining a semantic vector sequence according to an embodiment of this application. As shown in Figure 5, the masked table sample is input into the encoder. The masked table sample includes multiple fields and the target field value corresponding to each field. The encoder includes a feature extraction layer, which consists of a first feature extraction layer and a second feature extraction layer. The first feature extraction layer extracts features based on the type corresponding to each field, generating the first embedding vector for each field. The second feature extraction layer determines the second embedding vector for each field based on the target field value corresponding to each field. The first and second embedding vectors corresponding to each field are added together to obtain the third embedding vector for that field, and the first embedding vector sequence $e = [e_0, e_1, \ldots, e_{L-1}]$ is determined based on the third embedding vectors corresponding to all fields in the masked table sample. The encoder also includes a semantic encoding layer, which receives the first embedding vector sequence $e$ from the feature extraction layer as input and outputs the semantic vector sequence $A = [A_0, A_1, \ldots, A_{L-1}]$ corresponding to the masked table sample.
[0112] Step S104: Obtain the mask position in the masked table sample.
[0113] It can be understood that the mask position refers to the location of the masked field value in the masked table sample.
[0114] Step S105: Determine the first loss value based on the semantic vector sequence corresponding to the masked table sample, the first table sample, and the mask position.
[0115] Understandably, the first loss value can be determined based on the similarity between the embedding representation of the first table sample at the mask position and the embedding representation of the masked table sample at the mask position.
[0116] For example, step S105 can be implemented through steps S501-S504 shown in Figure 6:
[0117] Step S501: Based on the field corresponding to the mask position in the masked table sample, determine the label corresponding to each field in the masked table sample, such that the label of a field corresponding to the mask position is 1 and the label of a field not corresponding to the mask position is 0.
[0118] Step S502: Obtain a label sequence based on the labels corresponding to each of the fields.
[0119] Taking Table 2 as an example, the label sequence corresponding to the masked table sample shown in Table 2 is [0,0,1,0].
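Steps S501-S502 can be illustrated with a short sketch. The function name `label_sequence` and the zero-based indexing of mask positions are assumptions made for this example.

```python
def label_sequence(num_fields, mask_positions):
    """Return 1 for each field at a mask position and 0 for every other field."""
    masked = set(mask_positions)
    return [1 if i in masked else 0 for i in range(num_fields)]


# Four fields with the third field (index 2) masked, as in the Table 2 example.
print(label_sequence(4, [2]))  # [0, 0, 1, 0]
```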
[0120] Step S503: Determine an initial field value sequence from the first table sample corresponding to the masked table sample, and determine a second embedding vector sequence based on the initial field value sequence using a preset feature extraction network.
[0121] For example, the first table sample corresponding to the masked table sample shown in Table 2 is shown in Table 1, thus the initial field value sequence can be obtained as [Male, 175, 28, Teacher]. This initial field value sequence is input into the feature extraction network to obtain the second embedding vector sequence. The second embedding vector sequence includes the embedding vectors corresponding to each initial field value of the first table sample.
[0122] The labels in the label sequence, the embedding vectors in the second embedding vector sequence, and the semantic vectors in the semantic vector sequence are all arranged in the order of the fields.
[0123] Step S504: Calculate the first loss value based on the label sequence, the second embedding vector sequence, and the semantic vector sequence.
[0124] Specifically, the first loss value can be calculated using the following formula:
[0125]
[0126] Where loss represents the first loss value, sim represents the similarity calculation function, $L$ represents the number of fields included in the masked table sample, $A_i$ represents the i-th semantic vector in the semantic vector sequence, $E_i$ represents the i-th embedding vector in the second embedding vector sequence, and $M_i$ represents the i-th label in the label sequence.
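The formula itself is not reproduced in this text, so the following sketch shows only one plausible way to combine the quantities defined above; it is an assumption, not the formula of this application. It assumes cosine similarity as the similarity calculation function and averages (1 - similarity) over the fields whose label is 1, so that fields outside the mask positions do not contribute. The names `cosine` and `first_loss` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_loss(labels, second_embeddings, semantic_vectors, sim=cosine):
    """Assumed loss: mean of (1 - sim) over the masked positions only."""
    if not (len(labels) == len(second_embeddings) == len(semantic_vectors)):
        raise ValueError("all three sequences must have one entry per field")
    num_masked = sum(labels)
    if num_masked == 0:
        return 0.0
    total = sum(
        m * (1.0 - sim(a, e))
        for m, e, a in zip(labels, second_embeddings, semantic_vectors)
    )
    return total / num_masked


labels = [0, 0, 1, 0]
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # second embedding vectors
A = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # semantic vectors
print(first_loss(labels, E, A))  # 1.0, since the masked pair is orthogonal
```

Under this assumed form, the loss decreases toward 0 as the semantic vector at each mask position becomes more similar to the embedding of the original field value.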
[0127] Step S106: Update the encoder according to the first loss value to obtain a pre-trained model.
[0128] It can be understood that whether the training termination condition of the encoder is met is determined based on the first loss value and a preset first loss threshold. If the training termination condition is met, the current encoder is used as the pre-trained model; if it is not met, the encoder parameters are adjusted according to the first loss value, and the encoder is trained again based on the first sample set.
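The termination logic of step S106 can be sketched as a generic loop. This is an illustration under stated assumptions: the termination condition is taken to be "loss at or below the threshold", the `max_rounds` cap is an added safeguard that the text does not mention, and the single-parameter quadratic "encoder" is a toy stand-in. In a real framework the `step` callable would apply a gradient update derived from the first loss value.

```python
def train_until_threshold(params, compute_loss, step, loss_threshold,
                          max_rounds=10_000):
    """Update params until the loss meets the threshold (assumed condition)."""
    for rounds in range(max_rounds):
        loss = compute_loss(params)
        if loss <= loss_threshold:
            # Termination condition met: keep the current parameters.
            return params, loss, rounds
        params = step(params)  # adjust parameters, then train again
    return params, compute_loss(params), max_rounds


# Toy stand-in: one parameter w with loss (w - 3)^2 and a gradient step.
TARGET = 3.0


def toy_loss(w):
    return (w - TARGET) ** 2


def toy_step(w, lr=0.1):
    return w - lr * 2.0 * (w - TARGET)


w, loss, rounds = train_until_threshold(0.0, toy_loss, toy_step,
                                        loss_threshold=1e-4)
print(loss <= 1e-4)  # True
```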
[0129] Referring to Figure 7, which is a schematic diagram illustrating the training process of the pre-trained model provided in an embodiment of this application: as shown in Figure 7, during the training of the pre-trained model, the unlabeled first table sample is masked to obtain the masked table sample; the encoder outputs the semantic vector sequence A corresponding to the masked table sample; and a preset feature extraction network determines the second embedding vector sequence E based on the initial field value sequence in the first table sample. The first loss value of the encoder is determined based on the semantic vector sequence A and the second embedding vector sequence E, and the parameters of the encoder are updated based on the first loss value. In this way, the encoder learns the features of the tabular data in a self-supervised manner during the iterative update process, yielding a pre-trained model with the ability to extract features from tabular data.
[0130] Step S107: Construct a task model based on the pre-trained model.
[0131] It is understood that the embodiments of this application construct a task model for task prediction based on a pre-trained model.
[0132] Specifically, step S107 can be achieved through the following steps S601-S602:
[0133] Step S601: Use the pre-trained model as the preprocessing layer of the task model;
[0134] Step S602: Add an output layer at the output end of the preprocessing layer to obtain the task model.
[0135] Referring to Figure 8, which is a schematic diagram of the task model provided in the embodiments of this application: the preprocessing layer in the task model shown in Figure 8 is the pre-trained model obtained in step S106 and used as the preprocessing layer in step S107. This preprocessing layer has undergone self-supervised learning and has the ability to extract features from tabular data. It can output the corresponding semantic embedding vectors based on the input tabular data. The output layer in the task model shown in Figure 8 is connected to the output of the preprocessing layer. The output layer performs task prediction based on the semantic embedding vectors output by the preprocessing layer and outputs the task prediction result.
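The composition in steps S601-S602 can be sketched as a wrapper that places an output layer after a pre-trained encoder. The text does not specify the form of the output layer, so the mean pooling over fields, the single sigmoid unit (a binary prediction), and the class name `TaskModel` are all assumptions made for illustration; the encoder here is a fixed stand-in function.

```python
import math


class TaskModel:
    """Pre-trained encoder as preprocessing layer, plus an assumed output layer."""

    def __init__(self, pretrained_encoder, weights, bias=0.0):
        self.encoder = pretrained_encoder  # preprocessing layer (step S601)
        self.weights = weights             # output layer parameters (step S602)
        self.bias = bias

    def predict(self, table_sample):
        vectors = self.encoder(table_sample)  # one semantic vector per field
        dim = len(vectors[0])
        # Assumed pooling: average the per-field semantic vectors.
        pooled = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
        logit = sum(w * x for w, x in zip(self.weights, pooled)) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))  # probability of the positive class


def stand_in_encoder(table_sample):
    # Placeholder for a pre-trained encoder; returns fixed 2-dimensional vectors.
    return [[1.0, 0.0], [0.0, 1.0]]


model = TaskModel(stand_in_encoder, weights=[0.0, 0.0])
print(model.predict(["any", "sample"]))  # 0.5
```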
[0136] Step S108: Obtain a second sample set, which includes multiple second table samples, each of which has a corresponding label;
[0137] It can be understood that the second sample set is a labeled sample set, and each second table sample included in the second sample set has a corresponding label.
[0138] Step S109: Train the task model based on each of the second table samples and the labels corresponding to the second table samples to obtain the target task model.
[0139] It can be understood that each second table sample is input into the task model described above, causing the task model to output the task prediction result corresponding to that second table sample. Then, based on the task prediction result and the label corresponding to each second table sample, a second loss value is determined. Based on the second loss value and a preset second loss threshold, it is determined whether the training termination condition of the task model is met. If the training termination condition is met, the current task model is used as the target task model. If it is not met, the parameters of the task model are adjusted according to the second loss value, and the task model is trained again based on the second sample set.
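The supervised stage of step S109 can be sketched as follows. Several choices here are assumptions rather than details from the text: the second loss is taken to be binary cross-entropy, only the output layer is updated (the encoder output is represented by fixed feature vectors), plain gradient descent is used, and `max_epochs` is an added safeguard. The feature values and labels are made-up toy data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, features):
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in features]


def bce(preds, labels, eps=1e-12):
    """Binary cross-entropy, used here as the assumed second loss."""
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(preds, labels)
    ) / len(labels)


def train_output_layer(features, labels, loss_threshold, lr=0.5,
                       max_epochs=5000):
    w, b, n = [0.0] * len(features[0]), 0.0, len(labels)
    for _ in range(max_epochs):
        preds = forward(w, b, features)
        if bce(preds, labels) <= loss_threshold:
            break  # termination condition met
        # Gradient of the cross-entropy with respect to w and b.
        errors = [p - y for p, y in zip(preds, labels)]
        w = [wj - lr * sum(e * x[j] for e, x in zip(errors, features)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b, bce(forward(w, b, features), labels)


# Toy pooled encoder outputs for four labeled second table samples.
features = [[0.0, 1.0], [1.0, 0.0], [0.1, 0.9], [0.9, 0.1]]
labels = [1, 0, 1, 0]
w, b, second_loss = train_output_layer(features, labels, loss_threshold=0.2)
print(second_loss <= 0.2)  # True
```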
[0140] Step S110: Obtain target table data and determine the task prediction result corresponding to the target table data through the target task model.
[0141] It can be understood that, once the target task model has been obtained, it can be used to perform prediction tasks based on tabular data. Specifically, the target table data is input into the target task model, which first extracts features from the target table data and outputs the corresponding semantic embedding vectors, and then performs task prediction based on the semantic embedding vectors and outputs the corresponding task prediction result.
[0142] The solution provided in this application first masks the unlabeled first table sample, then outputs the semantic vector sequence corresponding to the masked table sample through an encoder. The encoder's first loss value is then determined based on the semantic vector sequence, and the encoder's parameters are updated based on this first loss value. This allows the encoder to self-supervisedly learn the features of the table data during iterative updates, obtaining a pre-trained model capable of extracting table data features. A task model is then built based on the pre-trained model, and supervised training is performed using labeled second table samples to obtain a target task model for performing prediction tasks. This solution reduces the labeling cost of training the target task model and ensures the target task model's ability to extract table data features even when the number of labeled table samples is insufficient, thereby improving the performance and accuracy of the task model in performing task predictions.
[0143] The target task model of this application can be applied to e-commerce scenarios. For example, the object attribute information of the target object is recorded in tabular form, and the tabular object attribute information is then input into the target task model. The target task model performs feature extraction on the tabular object attribute information to obtain the corresponding semantic embedding vector. Recall prediction is then performed based on the semantic embedding vector, and the recall prediction result corresponding to the target object is output. Finally, the recall text corresponding to the target object is obtained based on the recall prediction result.
[0144] Referring to Figure 9, an embodiment of this application provides an artificial intelligence-based task prediction device, the device comprising:
[0145] The first acquisition module 801 is used to acquire a first sample set, which includes multiple unlabeled first table samples, each first table sample including multiple fields, and each field including an initial field value.
[0146] The masking module 802 is used to mask some of the initial field values in each of the first table samples to obtain a masked table sample.
[0147] The first determining module 803 is used to determine the semantic vector sequence corresponding to the masked table sample through a preset encoder;
[0148] The second acquisition module 804 is used to acquire the mask position in the masked table sample;
[0149] The second determining module 805 is used to determine a first loss value based on the semantic vector sequence corresponding to the masked table sample, the first table sample, and the mask position.
[0150] The first training module 806 is used to update the encoder according to the first loss value to obtain a pre-trained model.
[0151] The model building module 807 is used to build a task model based on the pre-trained model;
[0152] The third acquisition module 808 is used to acquire a second sample set, which includes multiple second table samples, each of which has a corresponding label.
[0153] The second training module 809 is used to train the task model based on each of the second table samples and the labels corresponding to the second table samples, so as to obtain the target task model.
[0154] The third determining module 810 is used to acquire target table data and determine the task prediction result corresponding to the target table data through the target task model.
[0155] The specific implementation of this task prediction device is basically the same as the specific implementation of the task prediction method described above, and will not be repeated here.
[0156] This application also provides an electronic device, which includes a memory and a processor. The memory stores computer programs or instructions, and the processor executes the computer programs or instructions to implement the above-described task prediction method. This electronic device can be a server or a smart terminal.
[0157] Referring to Figure 10, which illustrates the hardware structure of an electronic device, the electronic device includes:
[0158] The processor 1010 can be implemented using a general-purpose central processing unit (CPU), microprocessor, application specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application.
[0159] The memory 1020 can be implemented as a read-only memory (ROM), static storage device, dynamic storage device, or random access memory (RAM). The memory 1020 can store the operating system and other applications. When the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 1020 and is called by the processor 1010 to execute the task prediction method of the embodiments of this application.
[0160] The input / output interface 1030 is used to implement information input and output;
[0161] The communication interface 1040 is used to enable communication and interaction between this device and other devices. Communication can be achieved through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).
[0162] Bus 1050 transmits information between various components of the device (e.g., processor 1010, memory 1020, input / output interface 1030, and communication interface 1040);
[0163] The processor 1010, memory 1020, input / output interface 1030 and communication interface 1040 are connected to each other within the device via bus 1050.
[0164] This application also provides a computer-readable storage medium storing a computer program or instructions that, when executed by a processor, implement the above-described task prediction method.
[0165] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
[0166] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.
[0167] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, and may include more or fewer steps than shown, or combine certain steps, or different steps.
[0168] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0169] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.
[0170] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0171] It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And / or" is used to describe the relationship between related objects, indicating that three relationships can exist. For example, "A and / or B" can represent three cases: only A exists, only B exists, and both A and B exist simultaneously, where A and B can be singular or plural. The character " / " generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
[0172] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units described above is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.
[0173] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0174] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0175] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes multiple instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0176] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.
Claims
1. A task prediction method based on artificial intelligence, characterized in that the method includes the following steps: obtaining a first sample set, which includes multiple unlabeled first table samples, each first table sample including multiple fields, and each field including an initial field value; for each of the first table samples, masking some of the initial field values in the first table sample to obtain a masked table sample; determining the semantic vector sequence corresponding to the masked table sample through a preset encoder; obtaining the mask position in the masked table sample; determining a first loss value based on the semantic vector sequence corresponding to the masked table sample, the first table sample, and the mask position; updating the encoder based on the first loss value to obtain a pre-trained model; constructing a task model based on the pre-trained model; obtaining a second sample set, which includes multiple second table samples, each of which has a corresponding label; training the task model based on each of the second table samples and the labels corresponding to the second table samples to obtain the target task model; and obtaining target table data and determining the task prediction result corresponding to the target table data through the target task model; wherein the encoder includes a feature extraction layer and a semantic encoding layer, the feature extraction layer including a first feature extraction layer and a second feature extraction layer, and determining the semantic vector sequence corresponding to the masked table sample through the preset encoder includes: for each field in the masked table sample, determining, through the first feature extraction layer, the first embedding vector of the field based on the type corresponding to the field; determining, through the second feature extraction layer, the second embedding vector of the field based on the target field value corresponding to the field; determining the third embedding vector corresponding to the field based on the first embedding vector and the second embedding vector corresponding to the field; determining the first embedding vector sequence based on the third embedding vectors corresponding to all fields in the masked table sample; and determining, through the semantic encoding layer, the semantic vector sequence corresponding to the masked table sample based on the first embedding vector sequence.
2. The task prediction method based on artificial intelligence according to claim 1, characterized in that the step of determining the second embedding vector of the field based on the target field value corresponding to the field through the second feature extraction layer includes: obtaining the numerical type of the target field value corresponding to the field; when the numerical type of the target field value is a continuous numerical value, determining the target numerical range corresponding to the target field value from a plurality of preset numerical ranges, and determining the second embedding vector corresponding to the field according to the first preset embedding vector corresponding to the target numerical range; when the numerical type of the target field value is a discrete numerical value, determining the second embedding vector corresponding to the field according to the second preset embedding vector corresponding to the target field value; and when the target field value is a masked value, determining the second embedding vector corresponding to the field according to the third preset embedding vector corresponding to the masked value.
3. The task prediction method based on artificial intelligence according to claim 1, characterized in that the step of determining the first loss value based on the semantic vector sequence corresponding to the masked table sample, the first table sample, and the mask position includes: based on the field corresponding to the mask position in the masked table sample, determining the label corresponding to each field in the masked table sample, such that the label of a field corresponding to the mask position is 1 and the label of a field not corresponding to the mask position is 0; obtaining a label sequence based on the labels corresponding to each of the fields; determining an initial field value sequence from the first table sample corresponding to the masked table sample, and determining a second embedding vector sequence based on the initial field value sequence through a preset feature extraction network; and calculating the first loss value based on the label sequence, the second embedding vector sequence, and the semantic vector sequence; wherein the labels in the label sequence, the embedding vectors in the second embedding vector sequence, and the semantic vectors in the semantic vector sequence are all arranged in the order of the fields.
4. The task prediction method based on artificial intelligence according to claim 3, characterized in that the first loss value is calculated using the following formula: wherein loss represents the first loss value, sim represents the similarity calculation function, $L$ represents the number of fields included in the masked table sample, $A_i$ represents the i-th semantic vector in the semantic vector sequence, $E_i$ represents the i-th embedding vector in the second embedding vector sequence, and $M_i$ represents the i-th label in the label sequence.
5. The task prediction method based on artificial intelligence according to claim 1, characterized in that, The construction of the task model based on the pre-trained model includes: The pre-trained model is used as a preprocessing layer for the task model; An output layer is added at the output of the preprocessing layer to obtain the task model.
6. A task prediction device based on artificial intelligence, characterized in that, The apparatus, applicable to the method of any one of claims 1-5, comprises: The first acquisition module is used to acquire a first sample set, which includes multiple unlabeled first table samples. Each first table sample includes multiple fields, and each field includes an initial field value. The masking module is used to mask some of the initial field values in each of the first table samples to obtain the masked table samples. The first determining module is used to determine the semantic vector sequence corresponding to the masked table sample through a preset encoder; The second acquisition module is used to acquire the mask position in the masked table sample; The second determining module is used to determine a first loss value based on the semantic vector sequence corresponding to the masked table sample, the first table sample, and the mask position. A first training module is used to update the encoder based on the first loss value to obtain a pre-trained model. The model building module is used to build a task model based on the pre-trained model; The third acquisition module is used to acquire a second sample set, which includes multiple second table samples, each of which has a corresponding label. The second training module is used to train the task model based on each of the second table samples and the labels corresponding to the second table samples, so as to obtain the target task model. The third determination module is used to acquire target table data and determine the task prediction result corresponding to the target table data through the target task model.
7. An electronic device, characterized in that, It includes a memory and a processor, the memory storing a computer program or instructions, and the processor executing the computer program or instructions to implement the method of any one of claims 1 to 5.
8. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program or instructions that, when executed by a processor, implement the method as described in any one of claims 1 to 5.