Short video recommendation method and device, computer device, and storage medium

By combining the first and second neural network models to assign weights to user features and video features and to perform feature concatenation scoring, the problem of existing short video recommendation algorithms not considering the importance of video features is solved, thereby improving the accuracy of recommendations and user satisfaction.

CN115757932B · Active · Publication Date: 2026-06-23 · 东莞市步步高教育软件有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
东莞市步步高教育软件有限公司
Filing Date
2022-06-21
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing short video recommendation algorithms fail to effectively consider the importance of video features themselves, resulting in low recommendation accuracy.

Method used

The system employs a combination of a first neural network model and a second neural network model to extract user and video features, assigning different weights to different features, and then recommending short videos through feature concatenation and scoring calculation.

Benefits of technology

It improves the accuracy and targeting of short video recommendations, enhances the user experience, and is particularly effective in mining low-frequency features.


Abstract

The application provides a short video recommendation method and device, computer equipment and a storage medium, wherein the short video recommendation method comprises: obtaining user features and video features of short videos; inputting the obtained features into a pre-trained first neural network model to obtain a first feature matrix, wherein the weights of each feature in the first feature matrix are the same; inputting the obtained features into a pre-trained second neural network model to obtain a second feature matrix, wherein different weights are given to each feature in the second feature matrix according to the importance of each feature; scoring each short video based on the first feature matrix and the second feature matrix; and recommending short videos according to the scores of each short video. In this process, in addition to performing high-order feature extraction and feature cross operations, the contribution of different features to the overall effect is also considered, important features are given high weights, and secondary features are given low weights, thereby improving the accuracy of capturing user preference changes and more specifically achieving short video recommendation.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and more particularly to a short video recommendation method and apparatus, computer equipment and storage medium.

Background Technology

[0002] In recent years, with the rapid development of the internet, short video platforms have achieved tremendous success, with both the number of users and the amount of short video content experiencing explosive growth. Against this backdrop, it has become particularly important to enable users to quickly access content they are interested in, provide personalized recommendations, and identify potentially engaging video content from a vast library of video resources to enhance user engagement.

[0003] Currently, traditional video recommendation algorithms generally include a video selection stage and a video scoring stage. The selection stage filters a set of videos that the user might be interested in from a massive amount of video data, while the scoring stage ranks the selected videos according to the user's preferences. However, most video recommendation algorithms do not consider the importance of individual video features, meaning they do not consider the contribution of features to the overall video effect, thus affecting the accuracy of video recommendations.

Summary of the Invention

[0004] The purpose of this invention is to provide a short video recommendation method and apparatus, computer equipment and storage medium, which effectively solves the technical problem of low accuracy in existing short video recommendation methods.

[0005] The technical solution provided by this invention is as follows:

[0006] On the one hand, the present invention provides a short video recommendation method, including:

[0007] Obtain user characteristics and video characteristics of short videos;

[0008] The acquired features are input into a pre-trained first neural network model to obtain a first feature matrix, in which each feature has the same weight.

[0009] The acquired features are input into a pre-trained second neural network model to obtain a second feature matrix, in which different weights are assigned according to the importance of each feature.

[0010] Each short video is scored based on the first feature matrix and the second feature matrix;

[0011] Short videos are recommended based on their ratings.

[0012] More preferably, the first neural network model consists of multiple serially connected feature extraction blocks, each of which includes a multi-head neural network, a first standardized residual network, a feature enhancement network, and a second standardized residual network connected in sequence, and the multi-head neural network includes multiple self-attention networks;

[0013] In the step of inputting the acquired features into the pre-trained first neural network model to obtain the first feature matrix, the feature extraction performed in each feature extraction block includes:

[0014] The input features are fed into a multi-head neural network and linearly transformed to obtain the first matrix. For the first feature extraction block, the input features are the acquired user features and the video features of the short video. For other feature extraction blocks, the input features are the output of the previous feature extraction block.

[0015] The input features and the first matrix are input into a first standardized residual network to perform residual and standardization operations to obtain a second matrix.

[0016] The second matrix is input into the feature enhancement network and subjected to a nonlinear transformation to obtain the third matrix;

[0017] The second and third matrices are input into the second standardized residual network for residual and standardization operations.

[0018] More preferably, the second neural network model includes a feature compression network, a feature importance prediction network, and a feature labeling network connected in sequence;

[0019] The step of inputting the acquired features into the pre-trained second neural network model to obtain the second feature matrix includes:

[0020] The acquired features are input into a feature compression network and subjected to average pooling to compress the input features.

[0021] The compressed features are input into a feature importance prediction network to predict the weights of different features. The feature importance prediction network includes two fully connected layers connected after the feature compression network.

[0022] The weights of the predicted different features are input into the feature labeling network, and the weights are applied to the corresponding features to complete the relabeling of the acquired user features and video features.

[0023] More preferably, scoring each short video based on the first feature matrix and the second feature matrix includes:

[0024] The first feature matrix and the second feature matrix are horizontally concatenated to obtain the concatenated feature matrix;

[0025] The concatenated feature matrix is input into a fully connected layer for scoring calculation, resulting in a score for each short video.

[0026] More preferably, recommending short videos based on the rating of each short video includes:

[0027] The short videos are sorted from highest to lowest score to determine the sorting queue.

[0028] Short video recommendations are made based on the sorting queue; or

[0029] After obtaining the user features and the video features of the short videos, the process also includes:

[0030] Dimensionality reduction is performed on the user features and video features.

[0031] On the other hand, the present invention provides a short video recommendation device, comprising:

[0032] The feature acquisition module is used to acquire user features and video features of short videos;

[0033] The first feature extraction module is used to input the acquired features into a pre-trained first neural network model to obtain a first feature matrix, wherein each feature in the first feature matrix has the same weight.

[0034] The second feature extraction module is used to input the acquired features into the pre-trained second neural network model to obtain the second feature matrix, in which different weights are assigned according to the importance of each feature.

[0035] The scoring module is used to score each short video based on the first feature matrix and the second feature matrix;

[0036] The video recommendation module is used to recommend short videos based on their ratings.

[0037] More preferably, in the first feature extraction module, the first neural network model consists of multiple serially connected feature extraction blocks. Each feature extraction block includes a multi-head neural network, a first standardized residual network, a feature enhancement network, and a second standardized residual network connected in sequence.

[0038] The multi-head neural network includes multiple self-attention networks for linearly transforming the input features to obtain a first matrix. For the first feature extraction block, the input features are the acquired user features and the video features of the short video. For other feature extraction blocks, the input features are the output of the previous feature extraction block.

[0039] The first standardized residual network is used to perform residual and standardized operations on the input features and the first matrix to obtain the second matrix;

[0040] The feature enhancement network is used to perform a nonlinear transformation on the second matrix to obtain a third matrix;

[0041] The second standardized residual network is used to perform residual and standardization operations on the second and third matrices.

[0042] More preferably, in the second feature extraction module, the second neural network model includes a feature compression network, a feature importance prediction network, and a feature labeling network connected in sequence; wherein,

[0043] The feature compression network is used to perform average pooling on the acquired features to compress the input features;

[0044] The feature importance prediction network is used to predict the weights of the compressed features. The feature importance prediction network includes two fully connected layers connected after the feature compression network.

[0045] The feature labeling network is used to weight the predicted weights of different features to the corresponding features, thereby completing the relabeling of the acquired user features and video features.

[0046] More preferably, the scoring module includes an interconnected feature splicing network and a scoring network, wherein,

[0047] The feature splicing network is used to horizontally splice the first feature matrix and the second feature matrix to obtain a spliced feature matrix.

[0048] The scoring network is used to calculate scores for the spliced feature matrix to obtain scores for each short video.

[0049] More preferably, the video recommendation module includes interconnected sorting units and recommendation units, wherein,

[0050] The sorting unit is used to sort the short videos from highest to lowest score and determine the sorting queue of the short videos;

[0051] The recommendation unit is used to recommend short videos based on the sorting queue; or

[0052] The short video recommendation device also includes a dimensionality reduction module, which is used to perform dimensionality reduction operations on the user features and video features.

[0053] On the other hand, the present invention provides a computer-readable storage medium storing a computer program, which, when executed by a processor, causes the processor to perform the steps of the above-described short video recommendation method.

[0054] On the other hand, the present invention provides a computer device including a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor performs the steps of the above-described short video recommendation method.

[0055] The short video recommendation method, apparatus, computer equipment, and storage medium provided by this invention combine a first neural network model and a second neural network model to simultaneously extract features from acquired user features and video features. In this process, in addition to high-order feature extraction and feature cross-referencing, the contribution of different features to the overall effect is considered, assigning high weights to important features and low weights to secondary features. This approach effectively leverages the contribution of some low-frequency features to the overall effect, improving the accuracy of capturing changes in user preferences, enabling more targeted short video recommendations, and enhancing the user experience.

Attached Figure Description

[0056] The preferred embodiments will now be described in a clear and easy-to-understand manner, with reference to the accompanying drawings, to further explain the above-mentioned characteristics, technical features, advantages, and implementation methods.

[0057] Figure 1 is a schematic flowchart of an embodiment of the short video recommendation method of the present invention;

[0058] Figure 2 is a schematic flowchart of another embodiment of the short video recommendation method of the present invention;

[0059] Figure 3 is a structural diagram of the first neural network model of the present invention;

[0060] Figure 4 is a schematic diagram of the feature extraction block structure of the present invention;

[0061] Figure 5 is a structural diagram of the second neural network model of the present invention;

[0062] Figure 6 is a schematic flowchart of an embodiment of the short video recommendation device of the present invention;

[0063] Figure 7 is a schematic flowchart of another embodiment of the short video recommendation device of the present invention;

[0064] Figure 8 is a schematic diagram of the computer device structure of the present invention.

[0065] Explanation of reference numerals:

[0066] 100 - First Neural Network Model, 110 - Feature Extraction Block, 111 - Multi-Head Neural Network, 112 - First Standardized Residual Network, 113 - Feature Enhancement Network, 114 - Second Standardized Residual Network, 200 - Second Neural Network Model, 210 - Feature Compression Network, 220 - Feature Importance Prediction Network, 230 - Feature Labeling Network, 300 - Short Video Recommendation Device, 310 - Feature Acquisition Module, 320 - First Feature Extraction Module, 330 - Second Feature Extraction Module, 340 - Scoring Module, 350 - Video Recommendation Module, 360 - Dimensionality Reduction Module.

Detailed Implementation

[0067] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, specific embodiments of the present invention will be described below with reference to the accompanying drawings. Obviously, the drawings described below are merely some embodiments of the present invention. For those skilled in the art, other drawings and other embodiments can be obtained based on these drawings without creative effort.

[0068] In one embodiment of the present invention, as shown in Figure 1, a short video recommendation method includes:

[0069] S10: acquiring user features and video features of short videos;

[0070] S20: inputting the acquired features into the pre-trained first neural network model to obtain the first feature matrix, in which each feature has the same weight;

[0071] S30: inputting the acquired features into the pre-trained second neural network model to obtain the second feature matrix, in which different weights are assigned according to the importance of each feature;

[0072] S40: scoring each short video based on the first feature matrix and the second feature matrix;

[0073] S50: recommending short videos based on their ratings.

[0074] In this embodiment, since each user has different interests and daily needs, the terminal software, when recommending short videos to users, scores the videos based on user characteristics and video features to make the recommendations more targeted and better meet the user's needs and preferences. The terminal software refers to the software within the terminal that provides short videos, such as Yuanfudao. User characteristics can include user age, gender, viewing history, etc. Video features can include the video's cover image, instructor information, and knowledge points; these are not specifically limited and can be adjusted according to the actual application.

[0075] In practical applications, the acquired user features and short video features are generally high-dimensional raw data. Performing subsequent calculations directly on this high-dimensional raw data leads to high computational complexity and low accuracy. Therefore, in other embodiments, to improve computational efficiency, after acquiring the user features and video features, a further step S60 is included, as shown in Figure 2, to perform a dimensionality reduction operation on the acquired features, reducing the high-dimensional raw data to low-dimensional data. Specifically, the user features and video features are input into the embedding layer for linear projection to obtain low-dimensional user features and video features.
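The dimensionality reduction of step S60 can be illustrated with a minimal NumPy sketch. It assumes each raw feature is a one-hot vector and that the embedding layer is one linear projection matrix per feature field. The field names, the sizes, and the random initialisation are hypothetical and are not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: each raw feature is a 1000-dim one-hot vector,
# projected down to an 8-dim embedding.
raw_dim, embed_dim = 1000, 8

# One projection matrix per feature field (random here; learned in practice).
fields = ["user_age", "user_grade", "video_knowledge_point"]
projections = {name: rng.normal(scale=0.01, size=(raw_dim, embed_dim))
               for name in fields}

def one_hot(index, size=raw_dim):
    """Build a high-dimensional one-hot raw feature."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

def embed(raw_features):
    """Linearly project every raw feature to a low-dimensional vector."""
    return np.stack([raw_features[name] @ projections[name] for name in fields])

sample = {
    "user_age": one_hot(12),
    "user_grade": one_hot(6),
    "video_knowledge_point": one_hot(431),
}
E = embed(sample)  # shape (3, 8): one low-dimensional vector per feature
```

For a one-hot input the projection simply selects one row of the projection matrix, which is why embedding layers are usually implemented as table lookups.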

[0076] After obtaining user features and video features, they are input into the first neural network model and the second neural network model respectively for feature extraction. In this process, two different methods are used for feature extraction, which greatly improves the feature richness in the subsequent short video scoring process. Different weights are assigned to features from different aspects to explore the contribution of different features to video recommendation. In particular, low-frequency features that may be ignored in traditional recommendation methods are fully explored in this embodiment to improve the accuracy of short video recommendation and the user's satisfaction with the recommended short videos.

[0077] The following describes the first and second neural network models. After creating the first and second neural network models, they are trained on a training set and a validation set composed of a large number of user features and video features. After training, they are applied in practice to recommend short videos.

[0078] As shown in Figure 3 and Figure 4, the first neural network model 100 consists of multiple serially connected feature extraction blocks 110 (including feature extraction block 1, feature extraction block 2, ..., feature extraction block n in the figure). Each feature extraction block 110 includes a multi-head neural network 111 (including multiple self-attention networks), a first standardized residual network 112, a feature enhancement network 113 and a second standardized residual network 114 connected in sequence.

[0079] Based on this, in step S20, the acquired features are input into the pre-trained first neural network model 100 to obtain the first feature matrix. The feature extraction steps in each feature extraction block include:

[0080] S21: The input features are fed into the multi-head neural network 111 and linearly transformed to obtain the first matrix. For the first feature extraction block, the input features are the acquired user features and the video features of the short video. For other feature extraction blocks, the input features are the output of the previous feature extraction block.

[0081] The first matrix MultiHead(x) obtained by the transformation of the multi-head neural network (including multiple self-attention networks) in the j-th feature extraction block is as shown in equations (1) to (2):

[0082] $\mathrm{MultiHead}(x) = \mathrm{Concat}(\mathrm{head}_{j1}, \ldots, \mathrm{head}_{jh})\, W_j^{O}$ (1)

[0083] $\mathrm{head}_{ji} = \mathrm{Attention}(Q_j W_{ji}^{Q},\; K_j W_{ji}^{K},\; V_j W_{ji}^{V})$ (2)

[0084] Where j is the index of the feature extraction block; Concat(·) denotes the concatenation function; $W_{ji}^{Q}$, $W_{ji}^{K}$ and $W_{ji}^{V}$ denote the weight matrices of the i-th self-attention network in the j-th multi-head neural network (corresponding to the multi-head neural network in the j-th feature extraction block), which can be obtained through initialization; $\mathrm{head}_{ji}$ denotes the output of the i-th self-attention network in the j-th multi-head neural network, where i = 1, 2, ..., h; $Q_j$, $K_j$ and $V_j$ denote the Q (Query), K (Key), and V (Value) matrices of the j-th multi-head neural network, respectively. In this embodiment, $Q_j = K_j = V_j = x$, where x denotes the input feature; $W_j^{O}$ denotes the weight matrix corresponding to the j-th multi-head neural network, which can be obtained through initialization.

[0085] For any self-attention network in the multi-head neural network 111, the output is as shown in equation (3):

[0086] $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\dfrac{Q K^{T}}{\sqrt{d_k}}\right) V$ (3)

[0087] Where Q, K, and V represent the Q, K, and V matrices of the self-attention network, respectively; $d_k$ denotes the number of columns in the Q and K matrices, i.e., the vector dimension.
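Equations (1) to (3) can be sketched in NumPy as follows. The sketch assumes that equation (3) is the standard scaled dot-product attention and that Q, K and V all equal the input x, as stated for this embodiment. The matrix sizes, the number of heads and the random weights are illustrative assumptions only.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Equation (3): scaled dot-product attention (assumed standard form)."""
    d_k = Q.shape[-1]  # number of columns of Q and K
    return softmax(Q @ K.T / np.sqrt(d_k), axis=-1) @ V

def multi_head(x, W_q, W_k, W_v, W_o):
    """Equations (1)-(2) with Q = K = V = x.

    W_q, W_k, W_v: lists of h per-head matrices of shape (d_model, d_k).
    W_o: output matrix of shape (h * d_k, d_model).
    """
    heads = [attention(x @ wq, x @ wk, x @ wv)
             for wq, wk, wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o

# Hypothetical sizes: 5 features, model width 8, 2 heads of width 4.
rng = np.random.default_rng(0)
n, d_model, h, d_k = 5, 8, 2, 4
x = rng.normal(size=(n, d_model))
W_q = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_k = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_v = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_o = rng.normal(size=(h * d_k, d_model))

first_matrix = multi_head(x, W_q, W_k, W_v, W_o)  # same shape as x
```

Keeping the output the same shape as x is what allows the residual addition in the next step.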

[0088] S22: The input features and the first matrix are input into the first standardized residual network 112, which performs residual and standardization operations to obtain the second matrix L1, as shown in equation (4):

[0089] L1=LayerNorm(x+MultiHead(x)) (4)

[0090] S23: The second matrix is input into the feature enhancement network 113 and subjected to a nonlinear transformation to obtain the third matrix.

[0091] In an instance where the feature enhancement network is a two-layer feedforward neural network whose first layer uses the ReLU activation function and whose second layer uses a linear activation function, the resulting third matrix FFN(x) is as shown in equation (5), where x here denotes the input of the feature enhancement network, i.e., the second matrix L1:

[0092] $\mathrm{FFN}(x) = \mathrm{ReLU}(x W_{j1} + b_{j1})\, W_{j2} + b_{j2}$ (5)

[0093] Where $W_{j1}$ and $W_{j2}$ denote the weight matrices corresponding to the first and second layers, respectively, and $b_{j1}$ and $b_{j2}$ denote the bias terms corresponding to the first and second layers, respectively.

[0094] S24: The second and third matrices are input into the second standardized residual network 114 for residual and standardization operations, and the output result L2 is as shown in equation (6):

[0095] L2 = LayerNorm(L1 + FFN(x)) (6)

[0096] When the feature extraction block is not the last feature extraction block in the cascaded connection of the first neural network model, its output L2 will be used as the input of the next feature extraction block; conversely, when the feature extraction block is the last feature extraction block in the cascaded connection of the first neural network model, its output L2 is the output of the first neural network model, i.e., the first feature matrix. In practical applications, the number of cascaded feature extraction blocks in the first neural network model can be determined according to actual needs, such as setting 2, 3, or even more.
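The cascading of feature extraction blocks described above can be sketched as follows, under simplifying assumptions. The layer normalization here has no learnable gain or bias, the multi-head neural network is replaced by a single-head self-attention with identity projections as a stand-in, and the feedforward input is taken to be L1. All sizes and weights are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Standardize each row to zero mean and unit variance (no learned gain/bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def self_attention(x):
    """Stand-in for the multi-head network: one head, identity projections."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

def ffn(x, W1, b1, W2, b2):
    """Equation (5): ReLU first layer, linear second layer."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def feature_extraction_block(x, params):
    L1 = layer_norm(x + self_attention(x))   # equation (4)
    L2 = layer_norm(L1 + ffn(L1, *params))   # equations (5)-(6)
    return L2

def first_model(x, all_params):
    """Feed the output of each block into the next; the last output is returned."""
    for params in all_params:
        x = feature_extraction_block(x, params)
    return x

# Hypothetical sizes: 5 features of width 8, hidden width 16, 3 cascaded blocks.
rng = np.random.default_rng(0)
n, d, d_ff, num_blocks = 5, 8, 16, 3
x = rng.normal(size=(n, d))
all_params = [
    (rng.normal(size=(d, d_ff)), np.zeros(d_ff),
     rng.normal(size=(d_ff, d)), np.zeros(d))
    for _ in range(num_blocks)
]
first_feature_matrix = first_model(x, all_params)
```

Because every block preserves the input shape, the number of cascaded blocks can be changed without touching the rest of the pipeline.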

[0097] As shown in Figure 5, the second neural network model 200 includes a feature compression network 210, a feature importance prediction network 220, and a feature labeling network 230 connected in sequence.

[0098] Based on this, step S30 inputs the acquired features into the pre-trained second neural network model 200 to obtain the second feature matrix, including:

[0099] S31: The acquired features are input into the feature compression network 210 and subjected to average pooling to compress the input features. The compression result $F_{sq}(e_m)$ is as shown in equation (7):

[0100] $F_{sq}(e_m) = \dfrac{1}{K'} \sum_{t=1}^{K'} e_m^{(t)}$ (7)

[0101] Where $F_{sq}(\cdot)$ denotes the compression function; the input feature matrix is $E = [e_1, \ldots, e_m, \ldots, e_n]$, m = 1, 2, ..., n; $K'$ denotes the dimension of vector $e_m$, i.e., $e_m$ is a $K'$-dimensional vector; and $e_m^{(t)}$ denotes the t-th component of vector $e_m$, t = 1, 2, ..., $K'$.

[0102] The feature compression process compresses each feature vector into a single real number, i.e., the compression result $F_{sq}(e_m)$ is a scalar. The output dimension matches the number of input features, and each scalar represents the global information of the corresponding feature.
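As a small worked example of the average pooling in step S31, the following sketch stores each feature vector as one row of a matrix and reduces it to its mean. The numeric values are made up for illustration.

```python
import numpy as np

def squeeze(E):
    """Average-pool each K'-dimensional feature vector (row) to one scalar."""
    return E.mean(axis=1)

# Three hypothetical features, each a 4-dimensional vector.
E = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 2.0, 2.0],
    [5.0, 5.0, 5.0, 5.0],
])
Z = squeeze(E)  # one scalar per feature: 2.5, 1.0, 5.0
```

The output has one entry per input feature, which is the property the description relies on when the importance weights are predicted in the next step.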

[0103] S32: The compressed features are input into the feature importance prediction network 220 to predict the weights of different features. The feature importance prediction network 220 includes two fully connected layers (a first fully connected layer and a second fully connected layer) connected after the feature compression network 210. That is, the importance of the features output in step S31 is learned using these two fully connected layers. The prediction result A is as shown in equation (8):

[0104] $A = F_{ex}(F_{sq}(e_m)) = \sigma_2\!\left(W_2\, \sigma_1\!\left(W_1 F_{sq}(e_m)\right)\right)$ (8)

[0105] Where $F_{ex}(\cdot)$ denotes the prediction function; $\sigma_1$ denotes the activation function of the first fully connected layer, $\sigma_2$ denotes the activation function of the second fully connected layer, $W_1$ denotes the weight matrix of the first fully connected layer, and $W_2$ denotes the weight matrix of the second fully connected layer.
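The two fully connected layers of step S32 can be sketched as below. The text does not name the activation functions, so ReLU for the first layer and sigmoid for the second are assumptions (a common choice in squeeze-and-excitation style networks). The hidden width and the random weights are likewise hypothetical.

```python
import numpy as np

def predict_importance(Z, W1, W2):
    """Equation (8): A = sigma_2(W2 @ sigma_1(W1 @ Z))."""
    hidden = np.maximum(W1 @ Z, 0.0)             # sigma_1 = ReLU (assumed)
    return 1.0 / (1.0 + np.exp(-(W2 @ hidden)))  # sigma_2 = sigmoid (assumed)

# Hypothetical sizes: 6 compressed features, hidden width 3.
rng = np.random.default_rng(0)
n, r = 6, 3
Z = rng.normal(size=n)        # output of the feature compression network
W1 = rng.normal(size=(r, n))  # first fully connected layer
W2 = rng.normal(size=(n, r))  # second fully connected layer

A = predict_importance(Z, W1, W2)  # one weight per feature
```

With a sigmoid output, each weight lies between 0 and 1, so the next step can only scale features down relative to each other rather than amplify them without bound.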

[0106] S33: The predicted weights of the different features are input into the feature labeling network 230, and the weights are applied to the corresponding features (scaling the importance of the originally acquired features), thus completing the relabeling of the acquired user features and video features. The result $F_{scale}(A, E)$ is as shown in equation (9):

[0107] $F_{scale}(A, E) = [a_1 \cdot e_1, \ldots, a_n \cdot e_n] = [v_1, \ldots, v_n]$ (9)

[0108] Where $E = [e_1, e_2, \ldots, e_n]$ denotes the input matrix of the feature compression network and $A = [a_1, a_2, \ldots, a_n]$ denotes the predicted weight matrix, in which $a_1$ is the predicted weight of feature $e_1$, $v_1 = a_1 \cdot e_1$, and so on.
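The reweighting of step S33 amounts to multiplying each feature vector by its scalar weight. A minimal sketch with made-up numbers, storing the features as rows:

```python
import numpy as np

def rescale(A, E):
    """Multiply every feature vector e_m (row of E) by its weight a_m."""
    return A[:, None] * E

E = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
A = np.array([0.9, 0.5, 0.1])  # hypothetical predicted weights

V = rescale(A, E)  # second feature matrix [v_1, ..., v_n]
```

In this example the first feature is kept almost unchanged while the third is scaled down to a tenth of its original magnitude, which is how a secondary feature receives a low weight.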

[0109] After obtaining the first and second feature matrices using the above methods, each short video is then scored, including:

[0110] S41: The first feature matrix and the second feature matrix are horizontally concatenated to obtain the concatenated feature matrix;

[0111] S42: The concatenated feature matrix is input into the fully connected layer for scoring calculation, obtaining a score for each short video.

[0112] The concatenated feature matrix is formed by horizontally concatenating the first feature matrix and the second feature matrix. Assume the first feature matrix is $[y_1, y_2, \ldots, y_p]$ and the second feature matrix is $[z_1, z_2, \ldots, z_q]$. The concatenated feature matrix is then $[y_1, y_2, \ldots, y_p, z_1, z_2, \ldots, z_q]$ or $[z_1, z_2, \ldots, z_q, y_1, y_2, \ldots, y_p]$. The scoring network consists of several fully connected networks that perform further feature cross-interactions. The crossed features are then passed through a sigmoid activation function and output as the user's rating of the video.
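Steps S41 and S42 can be sketched as follows. The sketch assumes that the two feature matrices have been flattened to one row per candidate video, and that the scoring network is one hidden fully connected layer with ReLU followed by a sigmoid output. The layer sizes and weights are hypothetical; the text only says "several fully connected networks".

```python
import numpy as np

def score_videos(Y, Z, W_hidden, b_hidden, w_out, b_out):
    """Concatenate both feature matrices and score each video in (0, 1)."""
    concatenated = np.concatenate([Y, Z], axis=1)                 # S41
    hidden = np.maximum(concatenated @ W_hidden + b_hidden, 0.0)  # feature crossing
    logits = hidden @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))                          # sigmoid output

# Hypothetical sizes: 4 candidate videos, p = q = 6 columns, hidden width 5.
rng = np.random.default_rng(0)
num_videos, p, q, hidden_dim = 4, 6, 6, 5
Y = rng.normal(size=(num_videos, p))  # rows from the first feature matrix
Z = rng.normal(size=(num_videos, q))  # rows from the second feature matrix
W_hidden = rng.normal(size=(p + q, hidden_dim))
b_hidden = np.zeros(hidden_dim)
w_out = rng.normal(size=hidden_dim)
b_out = 0.0

scores = score_videos(Y, Z, W_hidden, b_hidden, w_out, b_out)
```

The order of concatenation does not matter as long as it is the same during training and inference, which is consistent with the description allowing either order.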

[0113] After obtaining the ratings, step S50, which recommends short videos based on their ratings, includes:

[0114] S51: The short videos are sorted according to their ratings from highest to lowest to determine the sorting queue of the short videos;

[0115] S52: Short videos are recommended based on the sorted queue.

[0116] In this process, short videos are sorted from highest to lowest rating to determine their ranking queue, and then recommended based on this queue. For example, if short videos A, B, C, and D have ratings of 70, 60, 80, and 65 respectively, they are ranked in descending order as follows: Short Video C, Short Video A, Short Video D, Short Video B. Short videos are then recommended sequentially based on this ranking queue. In practical applications, if short videos have the same rating, they can be randomly ranked, or recommendation rules can be pre-defined based on user characteristics, such as prioritizing short videos corresponding to the user's grade level.
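The ranking example above, including the two tie-handling options, can be sketched in plain Python. The function name `rank_videos`, the `tie_break` parameter and the grade-level data are hypothetical names introduced for illustration.

```python
import random

def rank_videos(scores, tie_break=None, seed=None):
    """Return video ids sorted from highest to lowest score.

    Ties are broken by `tie_break(video_id)` (lower value first) when given,
    otherwise randomly.
    """
    rng = random.Random(seed)

    def key(video_id):
        secondary = tie_break(video_id) if tie_break else rng.random()
        return (-scores[video_id], secondary)

    return sorted(scores, key=key)

# Ratings from the example in the text.
queue = rank_videos({"A": 70, "B": 60, "C": 80, "D": 65})
print(queue)  # ['C', 'A', 'D', 'B']

# Rule-based tie-breaking: prefer videos matching the user's grade level.
video_grade = {"E": 5, "F": 6}
user_grade = 6
tied = rank_videos({"E": 75, "F": 75},
                   tie_break=lambda vid: video_grade[vid] != user_grade)
print(tied)  # ['F', 'E']
```

Negating the score in the sort key keeps the primary order descending while letting the secondary key sort ascending, so a matching grade level (False) is placed before a mismatch (True).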

[0117] Another embodiment of the present invention provides a short video recommendation device 300, as shown in Figure 6, including: a feature acquisition module 310 for acquiring user features and video features of short videos; a first feature extraction module 320 for inputting the acquired features into a pre-trained first neural network model 100 to obtain a first feature matrix, wherein each feature in the first feature matrix has the same weight; a second feature extraction module 330 for inputting the acquired features into a pre-trained second neural network model 200 to obtain a second feature matrix, wherein each feature in the second feature matrix is assigned a different weight according to its importance; a scoring module 340 for scoring each short video based on the first feature matrix and the second feature matrix; and a video recommendation module 350 for recommending short videos based on their scores.

[0118] In this embodiment, since each user has different interests and daily needs, the terminal software (the short video recommendation device is applied in the terminal) scores the short videos based on user characteristics and video features to make the recommendations more targeted and better meet the user's needs and preferences. The terminal software is the software used to provide short videos in the terminal, such as Yuanfudao. User characteristics can be user age, gender, grade level, etc., and video features can be the corresponding textbook, chapter, or knowledge points contained in the video.

[0119] In practical applications, the user features and video features obtained from short videos are generally high-dimensional raw data. Performing subsequent calculations directly on this high-dimensional raw data leads to high computational complexity and low accuracy. Therefore, in other embodiments, to improve computational efficiency, a dimensionality reduction module 360 is also configured in the short video recommendation device 300, as shown in Figure 7. This module performs dimensionality reduction on the user features and video features, reducing the high-dimensional raw data to low-dimensional data. In practical applications, this dimensionality reduction module can be an embedding layer, which linearly projects the user features and video features to obtain low-dimensional user features and video features.

[0120] After obtaining user features and video features, they are input into the first neural network model 100 and the second neural network model 200 respectively for feature extraction. In this process, two different methods are used for feature extraction, which greatly improves the feature richness in the subsequent short video scoring process. Different weights are assigned to features from different aspects to explore the contribution of different features to video recommendation. In particular, low-frequency features that may be ignored in traditional recommendation methods are fully explored in this embodiment to improve the accuracy of short video recommendation and the user's satisfaction with the recommended short videos.

[0121] The following is a further explanation of each network module in the short video recommendation device 300. After the network model is created, it is trained on a training set and a validation set formed by a large number of user features and video features. After training is completed, it is applied to practice for short video recommendation.

[0122] As shown in Figures 3 and 4, the first neural network model 100 in the first feature extraction module 320 consists of multiple serially connected feature extraction blocks 110 (including feature extraction block 1, feature extraction block 2, ..., feature extraction block n in the figure). Each feature extraction block 110 includes a multi-head neural network 111, a first standardized residual network 112, a feature enhancement network 113, and a second standardized residual network 114 connected in sequence.

[0123] The multi-head neural network 111 includes multiple self-attention networks and linearly transforms the input features to obtain the first matrix. For the first feature extraction block, the input features are the acquired user features and the video features of the short video. For other feature extraction blocks, the input features are the output of the previous feature extraction block. The first matrix MultiHead(x) obtained by the multi-head neural network in the j-th feature extraction block is as shown in equations (1) to (2).
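A multi-head network built from several self-attention networks can be sketched as follows. This sketch assumes the standard scaled dot-product formulation, which may differ in detail from equations (1) to (2); the sizes, the random weights, and the names `attention_head` and `multi_head` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def attention_head(x, w_q, w_k, w_v):
    """One self-attention network: softmax(Q K^T / sqrt(d)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def multi_head(x, heads, w_o):
    """Concatenate the outputs of all heads and apply an output projection."""
    head_outputs = [attention_head(x, *h) for h in heads]
    concat = [sum((out[i] for out in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)


# Hypothetical sizes: 5 feature fields, model width 8, 2 heads.
rng = random.Random(0)
num_fields, d_model, num_heads = 5, 8, 2
d_head = d_model // num_heads
x = random_matrix(num_fields, d_model, rng)
heads = [tuple(random_matrix(d_model, d_head, rng) for _ in range(3))
         for _ in range(num_heads)]
w_o = random_matrix(num_heads * d_head, d_model, rng)
first_matrix = multi_head(x, heads, w_o)
```

The output has the same shape as the input, which is what allows the residual connection in the following network to add the two together.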

[0124] The first standardized residual network 112 is used to perform residual and standardization operations on the input features and the first matrix to obtain the second matrix L1, as shown in equation (4).

[0125] The feature enhancement network 113 is used to perform a nonlinear transformation on the second matrix to obtain the third matrix. In one example, the feature enhancement network is a two-layer feedforward neural network in which the first layer uses a ReLU activation function and the second layer uses a linear activation function; the resulting third matrix FFN(x) is as shown in equation (5).
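The two-layer feedforward example can be written out as a short sketch. The weight values below are hand-picked for illustration and are not trained parameters; the function names `linear`, `relu`, and `feed_forward` are hypothetical.

```python
def linear(x, w, b):
    """Affine map x @ w + b for a list of rows x, weights w (in x out), bias b."""
    return [[sum(xi * wij for xi, wij in zip(row, col)) + bj
             for col, bj in zip(zip(*w), b)] for row in x]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def feed_forward(x, w1, b1, w2, b2):
    """First layer with ReLU activation, second layer with linear activation."""
    hidden = relu(linear(x, w1, b1))
    return linear(hidden, w2, b2)


# Toy second matrix with one row and two columns.
second_matrix = [[1.0, -2.0]]
w1 = [[1.0, 0.0], [0.0, 1.0]]   # identity, so ReLU simply clips the negative entry
b1 = [0.0, 0.0]
w2 = [[2.0], [3.0]]
b2 = [0.5]
third_matrix = feed_forward(second_matrix, w1, b1, w2, b2)
# hidden = [1.0, 0.0]; output = 1.0 * 2.0 + 0.0 * 3.0 + 0.5 = 2.5
```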

[0126] The second standardized residual network 114 is used to perform residual and standardization operations on the second and third matrices, and the output result L2 is shown in equation (6).
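The residual and standardization operation used by both standardized residual networks can be sketched as an add-then-normalize step. This sketch assumes the standardization is a per-row layer normalization without learnable scale or shift, which equations (4) and (6) may refine; `add_and_norm` and the toy matrices are hypothetical.

```python
import math


def add_and_norm(x, sublayer_out, eps=1e-6):
    """Residual connection followed by standardization of each row."""
    out = []
    for x_row, s_row in zip(x, sublayer_out):
        summed = [a + b for a, b in zip(x_row, s_row)]            # residual sum
        mean = sum(summed) / len(summed)
        var = sum((v - mean) ** 2 for v in summed) / len(summed)
        out.append([(v - mean) / math.sqrt(var + eps) for v in summed])
    return out


# Toy values: input features, first matrix (attention output), third matrix (FFN output).
x = [[1.0, 2.0, 3.0]]
first_matrix = [[0.0, 0.0, 3.0]]
l1 = add_and_norm(x, first_matrix)       # second matrix L1
third_matrix = [[0.5, -0.5, 0.0]]
l2 = add_and_norm(l1, third_matrix)      # block output L2
```

Each output row has approximately zero mean and unit variance, which keeps the scale of the features stable as blocks are stacked.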

[0127] When the feature extraction block is not the last feature extraction block in the cascaded connection of the first neural network model, its output L2 will be used as the input of the next feature extraction block; conversely, when the feature extraction block is the last feature extraction block in the cascaded connection of the first neural network model, its output L2 is the output of the first neural network model, i.e., the first feature matrix. In practical applications, the number of cascaded feature extraction blocks in the first neural network model can be determined according to actual needs, such as setting 2, 3, or even more.

[0128] As shown in Figure 5, in the second feature extraction module 330, the second neural network model 200 includes a feature compression network 210, a feature importance prediction network 220, and a feature labeling network 230 connected in sequence, wherein:

[0129] The feature compression network 210 is used to perform average pooling on the acquired features to compress the input features; the compression result $F_{sq}(e_i)$ is as shown in equation (7). This process compresses the two-dimensional features into a real number, i.e., the compression result $F_{sq}(e_i)$ is a scalar, and the output dimension matches the number of input features, representing the global information of the dimensional features.

[0130] The feature importance prediction network 220 is used to predict the weights of the compressed features, including two fully connected layers (including the first fully connected layer and the second fully connected layer) connected after the feature compression network 210. The prediction result A is as shown in Equation (8).

[0131] The feature labeling network 230 is used to apply the predicted weights to the corresponding features, completing the relabeling of the acquired user features and video features; the result $F_{scale}(A, E)$ is as shown in equation (8).
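The three stages of the second neural network model (compression, importance prediction, and reweighting) can be sketched together. The activation of the two fully connected layers is not stated in this passage, so ReLU is assumed here; the weights are hand-picked rather than trained, and the names `squeeze`, `excite`, and `rescale` are hypothetical.

```python
def squeeze(embeddings):
    """Average pooling: one scalar per feature embedding."""
    return [sum(e) / len(e) for e in embeddings]


def dense_relu(vec, w):
    """Fully connected layer (weights w: in x out) with an assumed ReLU activation."""
    return [max(0.0, sum(v * w_row[j] for v, w_row in zip(vec, w)))
            for j in range(len(w[0]))]


def excite(z, w1, w2):
    """Two fully connected layers predicting one weight per feature."""
    return dense_relu(dense_relu(z, w1), w2)


def rescale(weights, embeddings):
    """Multiply each feature embedding by its predicted weight."""
    return [[a * v for v in e] for a, e in zip(weights, embeddings)]


# Three feature embeddings of dimension 2 (toy values).
E = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
w1 = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]        # 3 features -> 2 hidden units
w2 = [[0.25, 0.0, 0.5], [0.0, 0.5, 0.25]]        # 2 hidden units -> 3 weights

Z = squeeze(E)               # one scalar per feature
A = excite(Z, w1, w2)        # one predicted weight per feature
second_feature_matrix = rescale(A, E)
```

With these toy values the third feature receives the largest weight and the first the smallest, illustrating how important features are amplified and secondary features are attenuated.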

[0132] After the first feature extraction module 320 and the second feature extraction module 330 obtain the first feature matrix and the second feature matrix, they then score each short video based on the network structure in the scoring module 340. Specifically, the scoring module 340 includes an interconnected feature concatenation network and a scoring network. The feature concatenation network is used to horizontally concatenate the first feature matrix and the second feature matrix to obtain a concatenated feature matrix; the scoring network is used to calculate the score based on the concatenated feature matrix to obtain a score for each short video.

[0133] The concatenated feature matrix is formed by horizontally concatenating the first feature matrix and the second feature matrix. Assume the first feature matrix is $[y_1, y_2, \ldots, y_p]$ and the second feature matrix is $[z_1, z_2, \ldots, z_q]$. The concatenated feature matrix obtained after splicing is then $[y_1, y_2, \ldots, y_p, z_1, z_2, \ldots, z_q]$ or $[z_1, z_2, \ldots, z_q, y_1, y_2, \ldots, y_p]$. The scoring network consists of several fully connected networks that perform further feature cross-interactions. The crossed features are then activated by a sigmoid function and output as the user's rating of the video.
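Horizontal concatenation followed by fully connected scoring can be sketched as below. The sketch treats each feature matrix as a flat vector for a single short video, assumes ReLU on the hidden layer (the passage only specifies the final sigmoid), and uses hand-picked weights; `concat_horizontal` and `score` are hypothetical names.

```python
import math


def concat_horizontal(first, second):
    """[y1..yp] followed by [z1..zq]."""
    return list(first) + list(second)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def score(features, layers):
    """layers is a list of (weights, bias); hidden layers use ReLU, the last uses sigmoid."""
    h = features
    for idx, (w, b) in enumerate(layers):
        h = [sum(x * w_row[j] for x, w_row in zip(h, w)) + b[j] for j in range(len(b))]
        if idx < len(layers) - 1:
            h = [max(0.0, v) for v in h]     # assumed hidden activation
    return sigmoid(h[0])


y = [0.2, -0.1, 0.4]          # first feature matrix (p = 3)
z = [0.5, 0.3]                # second feature matrix (q = 2)
concatenated = concat_horizontal(y, z)

hidden = ([[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2], [0.2, 0.0], [0.0, 0.4]], [0.0, 0.1])
output = ([[0.5], [-0.3]], [0.0])
rating = score(concatenated, [hidden, output])
```

The sigmoid keeps the rating strictly between 0 and 1; it can be rescaled if ratings on another range are wanted.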

[0134] After obtaining the rating, the video recommendation module 350 recommends videos based on the results. Specifically, the video recommendation module 350 includes an interconnected sorting unit and a recommendation unit. The sorting unit is used to sort the short videos from highest to lowest rating to determine the sorting queue of short videos. The recommendation unit is used to recommend short videos based on the sorting queue.

[0135] In this process, the ranking unit sorts the short videos from highest to lowest rating, determining their ranking queue. The recommendation unit then recommends the short videos based on this queue. For example, if short videos A, B, C, and D have ratings of 70, 60, 80, and 65 respectively, they are ranked in descending order as follows: C, A, D, B. The short videos are then recommended sequentially based on this ranking queue. In practical applications, if short videos have the same rating, they can be randomly ranked, or recommendation rules can be pre-defined based on user characteristics, such as prioritizing short videos corresponding to the user's grade level.
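The sorting and tie-handling described above can be sketched as follows. The function name `rank_videos` and the grade-based tie rule are illustrative assumptions; ties not resolved by the rule keep their input order here rather than being shuffled randomly.

```python
def rank_videos(scores, user_grade=None, video_grade=None):
    """Return video ids ordered from highest to lowest rating.

    Ties are broken by placing videos that match the user's grade level first.
    """
    def key(video_id):
        matches = (user_grade is not None and video_grade is not None
                   and video_grade.get(video_id) == user_grade)
        return (-scores[video_id], 0 if matches else 1)

    return sorted(scores, key=key)


# Ratings from the example in the text.
queue = rank_videos({"A": 70, "B": 60, "C": 80, "D": 65})

# Hypothetical tie: E and F share a rating, and F matches the user's grade level.
tie_queue = rank_videos({"E": 70, "F": 70},
                        user_grade=6,
                        video_grade={"E": 5, "F": 6})
```

Here `queue` is `["C", "A", "D", "B"]`, matching the order given in the text, and `tie_queue` is `["F", "E"]`.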

[0136] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of program modules is merely an example. In practical applications, the above functions can be assigned to different program modules as needed, that is, the internal structure of the device can be divided into different program units or modules to complete all or part of the functions described above. The program modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one processing unit. The integrated unit can be implemented in hardware or as a software program unit. Furthermore, the specific names of the program modules are only for easy differentiation and are not intended to limit the scope of protection of this invention.

[0137] Figure 8 is a schematic diagram of a computer device provided in one embodiment of the present invention. As shown, the computer device 400 includes a processor 420, a memory 410, and a computer program 411 stored in the memory 410 and executable on the processor 420, such as a short video recommendation update program. When the processor 420 executes the computer program 411, it implements the steps in the above-described embodiments of the various short video recommendation methods; or, when the processor 420 executes the computer program 411, it implements the functions of each module in the above-described embodiments of the various short video recommendation devices.

[0138] Computer device 400 may include, but is not limited to, processor 420 and memory 410. Those skilled in the art will understand that Figure 8 is merely an example of computer device 400 and does not constitute a limitation on computer device 400. It may include more or fewer components than shown, or combine certain components, or use different components.

[0139] Processor 420 may be a Central Processing Unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor.

[0140] The memory 410 can be an internal storage unit of the computer device 400, such as a hard disk or RAM of the computer device 400. The memory 410 can also be an external storage device of the computer device 400, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card equipped on the computer device 400. Furthermore, the memory 410 can include both internal and external storage units of the computer device 400. The memory 410 is used to store the computer program 411 and other programs and data required by the computer device 400. The memory 410 can also be used to temporarily store data that has been output or will be output.

[0141] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0142] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0143] In the embodiments provided by this invention, it should be understood that the disclosed computer devices and methods can be implemented in other ways. For example, the computer device embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or other forms.

[0144] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0145] Furthermore, in the various embodiments of the present invention, the functional units may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The integrated unit described above can be implemented in hardware or as a software functional unit.

[0146] If an integrated module/unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can also be accomplished by a computer program 411 instructing related hardware. The computer program 411 can be stored in a computer-readable storage medium. When executed by the processor 420, the computer program 411 can implement the steps of the various method embodiments described above. The computer program 411 includes computer program code, which can be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable storage medium can include any entity or device capable of carrying the code of the computer program 411, a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content covered by a computer-readable storage medium may be appropriately expanded or reduced as required by the legislation and patent practice in a jurisdiction. For example, in some jurisdictions, computer-readable media may not include electrical carrier signals and telecommunication signals, in accordance with the legislation and patent practice.

[0147] It should be noted that the above embodiments can be freely combined as needed. The above description is only a preferred embodiment of the present invention. It should be pointed out that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A short video recommendation method, characterized in that the method comprises: obtaining user features and video features of a short video; inputting the obtained user features and video features into a pre-trained first neural network model to obtain a first feature matrix, wherein the weights of each feature in the first feature matrix are the same; inputting the obtained user features and video features into a pre-trained second neural network model to obtain a second feature matrix, wherein different weights are assigned to each feature according to the importance of the feature; scoring each short video based on the first feature matrix and the second feature matrix; recommending short videos according to the scores of the short videos; the second neural network model comprises a feature compression network, a feature importance prediction network and a feature calibration network connected in sequence; the inputting of the obtained features into the pre-trained second neural network model to obtain the second feature matrix comprises: performing average pooling operation on the input features in the feature compression network to compress the input features; inputting the compressed features into the feature importance prediction network to predict the weights of different features, wherein the feature importance prediction network comprises two fully connected layers connected after the feature compression network; inputting the predicted weights of different features into the feature calibration network to weight the corresponding features, thereby completing the re-calibration of the obtained user features and video features.

2. The short video recommendation method of claim 1, wherein the first neural network model is composed of a plurality of serially connected feature extraction blocks, each of which comprises a multi-head neural network, a first standardized residual network, a feature enhancement network and a second standardized residual network connected in sequence, and the multi-head neural network comprises a plurality of self-attention networks; in the inputting of the obtained user features and video features into the pre-trained first neural network model to obtain the first feature matrix, each feature extraction block performs the following feature extraction steps: inputting the input features into the multi-head neural network to obtain a first matrix through linear transformation, wherein for the first feature extraction block, the input features are the obtained user features and video features of the short video, and for other feature extraction blocks, the input features are the output of the previous feature extraction block; inputting the input features and the first matrix into the first standardized residual network to obtain a second matrix through residual and standardization operations; inputting the second matrix into the feature enhancement network to obtain a third matrix through nonlinear transformation; inputting the second matrix and the third matrix into the second standardized residual network to perform residual and standardization operations.

3. The short video recommendation method of claim 1 or 2, wherein the scoring of each short video based on the first feature matrix and the second feature matrix, and further the short video recommendation, comprises: splicing the first feature matrix and the second feature matrix horizontally to obtain a spliced feature matrix; inputting the spliced feature matrix into a fully connected layer to calculate the score, thereby obtaining the score of each short video.

4. The short video recommendation method of claim 1 or 2, wherein the short video recommendation according to the scores of the short videos comprises: sorting the short videos according to the scores from large to small to determine a sorting queue of the short videos; recommending short videos according to the sorting queue; or the obtaining the user features and the video features of the short video further comprises: performing dimension reduction operation on the user features and the video features.

5. A short video recommendation apparatus, characterized in that it comprises: a feature acquisition module, configured to acquire user features and video features of a short video; a first feature extraction module, configured to input the acquired user features and the video features into a pre-trained first neural network model to obtain a first feature matrix, wherein the weights of each feature in the first feature matrix are the same; a second feature extraction module, configured to input the acquired user features and the video features into a pre-trained second neural network model to obtain a second feature matrix, wherein different weights are assigned to each feature in the second feature matrix according to the importance of each feature; a scoring module, configured to score each short video based on the first feature matrix and the second feature matrix; a video recommendation module, configured to recommend short videos according to the scores of the short videos; in the second feature extraction module, the second neural network model comprises a feature compression network, a feature importance prediction network and a feature calibration network connected in sequence; wherein the feature compression network is configured to perform average pooling operation on the acquired features to compress the input features; the feature importance prediction network is configured to predict the weights of the compressed features, and the feature importance prediction network comprises two fully connected layers connected after the feature compression network; the feature calibration network is configured to weight the predicted weights of different features to the corresponding features, and complete the re-calibration of the acquired user features and video features.

6. The short video recommendation apparatus of claim 5, wherein, in the first feature extraction module, the first neural network model is composed of a plurality of feature extraction blocks connected in series, and each feature extraction block comprises a multi-head neural network, a first standardized residual network, a feature enhancement network and a second standardized residual network connected in sequence, wherein the multi-head neural network comprises a plurality of self-attention networks, configured to perform linear transformation on the input features to obtain a first matrix, wherein for the first feature extraction block, the input features are the acquired user features and the video features of the short video, and for other feature extraction blocks, the input features are the output of the previous feature extraction block; the first standardized residual network is configured to perform residual and standardization operation on the input features and the first matrix to obtain a second matrix; the feature enhancement network is configured to perform non-linear transformation on the second matrix to obtain a third matrix; the second standardized residual network is configured to perform residual and standardization operation on the second matrix and the third matrix.

7. The short video recommendation apparatus of claim 5 or 6, wherein the scoring module comprises a feature splicing network and a scoring network connected to each other, wherein the feature splicing network is configured to horizontally splice the first feature matrix and the second feature matrix to obtain a spliced feature matrix; the scoring network is configured to perform scoring calculation on the spliced feature matrix to obtain the score of each short video.

8. The short video recommendation apparatus of claim 5 or 6, wherein the video recommendation module comprises a sorting unit and a recommendation unit connected to each other, wherein the sorting unit is configured to sort the short videos from large to small according to the scores to determine a sorting queue of the short videos; and the recommendation unit is configured to recommend short videos according to the sorting queue; or the short video recommendation apparatus further comprises a dimension reduction module configured to perform dimension reduction operation on the user features and the video features.

9. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, causes the processor to perform the steps of the short video recommendation method according to any one of claims 1 to 4.

10. A computer device, comprising a memory and a processor, characterized in that the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to perform the steps of the short video recommendation method according to any one of claims 1 to 4.