A vehicle-cargo matching method based on cross-attention and dual-dependency interaction

The CAMBNet model solves the problem of vehicle-cargo matching in online freight platforms where driver preferences are not fixed by using cross-attention and dual-dependency interaction networks, achieving more accurate and efficient vehicle-cargo matching recommendations.

CN122241192A · Pending · Publication Date: 2026-06-19 · GUIZHOU UNIVERSITY OF FINANCE AND ECONOMICS

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUIZHOU UNIVERSITY OF FINANCE AND ECONOMICS
Filing Date
2026-05-18
Publication Date
2026-06-19


Abstract

This invention discloses a vehicle-cargo matching method based on cross-attention and dual-dependency interaction, which relates to the field of deep learning and vehicle-cargo matching recommendation system. The method includes the following steps: (1) extracting features from the historical interaction wide table of driver-cargo source, and using the embedding layer to vectorize the discrete fields to obtain the preliminary feature representations of the driver side and the cargo source side; (2) sending the embedding vectors on both sides into the cross-attention fusion module, and realizing feature complementarity of channel and spatial dimensions through the cross-attention fusion module; (3) concatenating the output result of the cross-attention mechanism with the numerical field to form a complete feature vector; (4) inputting the complete vector obtained in the third step into the dual-dependency interaction module (AMDNet) to iteratively extract high-order feature interaction dependencies; (5) outputting the recommendation score through the prediction layer based on the feature interaction vector in the fourth step, and optimizing the model through the cross-entropy loss function to improve the timeliness and accuracy of vehicle-cargo matching recommendation.

Description

Technical Field

[0001] This invention relates to the field of deep learning and vehicle-cargo matching recommendation systems, specifically to a vehicle-cargo matching method based on cross-attention and dual-dependency interaction.

Background Technology

[0002] Online freight platforms (also known as "truck-freight matching platforms") are a typical business model of "Internet+" in road freight transportation. Essentially, they move the traditionally fragmented "freight-finding-truck, truck-finding-freight" information online, using algorithms to achieve efficient matching between supply and demand. Following the Ministry of Transport's issuance of the "Opinions on Promoting Reform Pilots and Accelerating the Innovative Development of Non-Vehicle-Owned Logistics" in 2016, the online freight licensing system was officially implemented, and the industry entered a stage of compliant development. According to statistics from the Ministry of Transport's online freight information exchange system, as of the end of June 2024, there were 3,286 online freight companies nationwide, connecting 8.044 million vehicles and 7.377 million drivers. In the first half of the year, a total of 80.877 million waybills were uploaded, a year-on-year increase of 52.8%. Online freight has covered more than 30 sub-categories of goods, including coal, steel, fast-moving consumer goods, and cold chain logistics, becoming the mainstream organizational method for trunk road transportation.

[0003] As the core intelligent hub of such platforms, the accuracy and efficiency of vehicle-cargo matching directly determine the platform's operational effectiveness and user experience. Early matching research focused primarily on management science and operations research, such as constructing credit systems using fuzzy comprehensive evaluation methods or applying game theory and genetic algorithms to solve path optimization and bilateral matching problems. While these methods have achieved results under specific constraints, they generally lack the ability to deeply mine the massive amounts of historical behavioral data accumulated by the platform. With the advent of the big data era, click-through rate prediction models based on machine learning and deep learning have been introduced into this field. Breakthroughs in deep learning in areas such as image, speech, and natural language processing have provided new ideas for personalized vehicle-cargo matching.

[0004] The massive amounts of historical data accumulated by the platform (clicks, bids, transactions, complaints) and multi-source heterogeneous data (GPS trajectories, electronic waybills, payment records, credit scores) naturally constitute a high-dimensional tensor of "drivers and vehicles - shippers and goods," providing ample training fuel for deep learning models. According to Convoy's sustainability report, international companies such as Uber Freight, Convoy, and Amazon Freight have applied deep ranking models to actual order dispatch, resulting in an 8%–12% reduction in average empty mileage. Domestic leading platforms such as Full Truck Alliance, Lalamove, and Didi Freight have also launched intelligent recommendation functions, but publicly available literature discloses very little detail about matching algorithms, and most of it remains at the stage of coarse-grained collaborative filtering or GBDT. Research on key issues such as driver personalized preferences, multi-attribute interaction of goods, and cold start of sparse data is still weak.

[0005] Therefore, a vehicle-cargo matching recommendation method designed for online freight platforms is needed. This method can deeply integrate dynamic driver behavior characteristics and static cargo characteristics through collaborative modeling of driver behavior and dual-dependency interaction methods to solve the problems of unstable driver preferences, coarse conditions in the recall stage, and long vehicle-cargo matching time in freight platform vehicle-cargo matching.

Summary of the Invention

[0006] The CAMBNet model addresses the challenges of rapidly changing driver preferences and complex cargo attributes. It utilizes cross-attention to weight fields on both sides and combines a dual-dependency interaction network to mine drivers' historical behavior and preferences, thereby more accurately recommending goods that meet drivers' needs and preferences.

[0007] This invention is achieved through the following technical solution:

[0008] A vehicle-cargo matching method based on cross-attention and dual-dependency interaction includes the following steps:

[0009] (1) Extract features from the historical interaction wide table of driver-cargo source, and use the embedding layer to vectorize the discrete fields to obtain the preliminary feature representations of the driver side and the cargo source side;

[0010] (2) The embedding vectors on both sides are fed into the Cross-Attention Fusion Module, and the feature complementarity of the channel and spatial dimensions is achieved through the Cross-Attention Fusion Module;

[0011] (3) The output of the cross-attention mechanism is concatenated with the numerical field to form a complete feature vector;

[0012] (4) Input the complete vector obtained in step (3) into the dual-dependency interaction module (AMDNet) and iteratively extract high-order feature interaction dependencies;

[0013] (5) Based on the feature interaction vector from step (4), the prediction layer outputs the recommendation score, and the model is optimized by the cross-entropy loss function.

[0014] Furthermore, step (1) involves vectorizing the discrete fields using an embedding layer, specifically including the following steps:

[0015] Relevant features are extracted from the historical interaction table of drivers and cargo sources. The discrete feature fields are first one-hot encoded, and then the one-hot vector of each category is mapped to a low-dimensional continuous vector space through the embedding query matrix table. The embedding layer transforms it into a low-dimensional dense vector.

[0016] Suppose there are f discrete features. After passing through the embedding layer, an embedding vector is obtained for each of the h fields, where h is the number of fields, $e_i \in \mathbb{R}^{k}$ denotes the embedding vector of the i-th field, and k is the embedding dimension. The embedding vector fields are then divided into a driver-side group and a cargo-side group for further processing.
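The one-hot-then-lookup description above can be illustrated with a minimal sketch. It assumes NumPy, a single discrete field, and hypothetical sizes (`vocab_size`, `k`); none of these values come from the patent. It only shows that multiplying a one-hot vector by the embedding table is the same as selecting one row of that table.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 5   # hypothetical number of categories in one discrete field
k = 4            # hypothetical embedding dimension

# Embedding lookup table: one k-dimensional row per category.
table = rng.normal(size=(vocab_size, k))


def one_hot(index: int, size: int) -> np.ndarray:
    """Return a one-hot vector of the given size with a 1 at `index`."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


category = 3
via_matmul = one_hot(category, vocab_size) @ table   # one-hot vector times table
via_lookup = table[category]                         # direct row lookup
```

In practice the row lookup is what an embedding layer performs, since it avoids materializing the sparse one-hot vector.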

[0017] Min-Max normalization is applied to the numerical features, and missing values are imputed with the mean. Once normalization has removed scale differences, the numerical features are used directly: they are concatenated with the processed discrete features in the concatenation layer as input to the higher-order feature interaction module.
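As a small worked example of this preprocessing, the sketch below imputes missing entries with the column mean and then rescales to [0, 1]. The column name and values are hypothetical, and imputing before scaling is an assumption about the order of operations.

```python
from typing import List, Optional


def impute_and_minmax(values: List[Optional[float]]) -> List[float]:
    """Fill missing entries (None) with the mean of the observed values,
    then rescale the column to the [0, 1] range."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in values]
    lo, hi = min(filled), max(filled)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in filled]
    return [(v - lo) / (hi - lo) for v in filled]


# Hypothetical numerical field with one missing value.
distances_km = [120.0, None, 480.0, 300.0]
scaled = impute_and_minmax(distances_km)   # [0.0, 0.5, 1.0, 0.5]
```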

[0018] Furthermore, the embedding vectors on both sides in step (2) are fed into the cross-attention, specifically including the following steps:

[0019] 1) Linearly map the driver-side and cargo-side features to obtain two two-dimensional feature maps, then apply global average pooling and global max pooling to each map to obtain its channel descriptors:

[0020]

[0021]

[0022]

[0023]

[0024] 2) The descriptors are fed into a shared two-layer MLP with ReLU activation, the two outputs are summed, and the channel weights are generated using the Sigmoid function:

[0025]

[0026]
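Steps 1) and 2) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes NumPy, feature maps of shape (C, H, W), a reduction ratio r = 16, and the same MLP weights for both sides. The names `channel_weights`, `w1`, `w2` and all sizes are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_weights(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """feat: (C, H, W); w1: (C // r, C); w2: (C, C // r).
    Returns one weight in (0, 1) per channel."""
    avg_desc = feat.mean(axis=(1, 2))   # global average pooling -> (C,)
    max_desc = feat.max(axis=(1, 2))    # global max pooling     -> (C,)

    def shared_mlp(desc: np.ndarray) -> np.ndarray:
        # Two linear layers with a ReLU in between; the same weights
        # are applied to both descriptors.
        return w2 @ np.maximum(w1 @ desc, 0.0)

    return sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))


rng = np.random.default_rng(0)
C, H, W, r = 32, 4, 4, 16            # hypothetical sizes
w1 = rng.normal(scale=0.1, size=(C // r, C))
w2 = rng.normal(scale=0.1, size=(C, C // r))
driver_map = rng.normal(size=(C, H, W))
cargo_map = rng.normal(size=(C, H, W))

w_driver = channel_weights(driver_map, w1, w2)
w_cargo = channel_weights(cargo_map, w1, w2)
```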

[0027] 3) The channel weights of the two sub-networks obtained above are multiplied to construct a cross-attention matrix of shape C × C. After Softmax normalization, the matrix is used to weight the feature maps, yielding the channel-enhanced features of the driver side and the cargo side:

[0028]

[0029]
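One possible reading of step 3) is sketched below. The text does not say which feature map each normalized matrix is applied to, so the choices here are assumptions: the C × C matrix is the outer product of the two channel-weight vectors, Softmax is taken over each row, and each side re-weights its own feature map. All names and sizes are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    shifted = x - x.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_channel_enhance(feat_d, feat_c, w_d, w_c):
    """feat_d, feat_c: (C, H, W) feature maps; w_d, w_c: (C,) channel weights."""
    C, H, W = feat_d.shape
    attn_d = softmax(np.outer(w_d, w_c))   # (C, C), each row sums to 1
    attn_c = softmax(np.outer(w_c, w_d))   # (C, C), each row sums to 1
    # Assumption: each side mixes its own channels with its attention matrix.
    enh_d = (attn_d @ feat_d.reshape(C, -1)).reshape(C, H, W)
    enh_c = (attn_c @ feat_c.reshape(C, -1)).reshape(C, H, W)
    return attn_d, enh_d, enh_c


rng = np.random.default_rng(0)
C, H, W = 8, 4, 4                        # hypothetical sizes
feat_d = rng.normal(size=(C, H, W))
feat_c = rng.normal(size=(C, H, W))
w_d = rng.uniform(size=C)                # stand-ins for the Sigmoid channel weights
w_c = rng.uniform(size=C)

attn_d, enh_d, enh_c = cross_channel_enhance(feat_d, feat_c, w_d, w_c)
```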

[0030] 4) For the two channel-enhanced feature maps, perform average pooling and max pooling along the channel dimension, concatenate the results along the channel axis, and generate the spatial weights through a convolution:

[0031]

[0032]

[0033] where the convolution operator denotes a convolutional layer with a kernel size of k×k; in this model, k = 3.
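The spatial-weight step can be sketched in NumPy as follows. The final Sigmoid and the zero padding are assumptions borrowed from common spatial-attention designs; the text only states that a k×k convolution (k = 3) produces the weights. Function names and sizes are hypothetical.

```python
import numpy as np


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """x: (Cin, H, W); kernel: (Cin, k, k) with odd k.
    Returns an (H, W) map using zero padding (cross-correlation, as in
    most deep learning frameworks)."""
    _, k, _ = kernel.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out


def spatial_weights(feat: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """feat: (C, H, W). Returns an (H, W) map of weights in (0, 1)."""
    avg_map = feat.mean(axis=0, keepdims=True)              # (1, H, W)
    max_map = feat.max(axis=0, keepdims=True)               # (1, H, W)
    stacked = np.concatenate([avg_map, max_map], axis=0)    # (2, H, W)
    return 1.0 / (1.0 + np.exp(-conv2d_same(stacked, kernel)))


rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 5, 5))                  # hypothetical enhanced feature map
kernel = rng.normal(scale=0.1, size=(2, 3, 3))     # 3x3 kernel over the 2 pooled maps
weights = spatial_weights(feat, kernel)
```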

[0034] 5) The operations above yield the channel-enhanced features and the spatial weight coefficients of both sides. The spatial weight coefficients are multiplied element-wise with the corresponding features to obtain the fused features, and a residual connection is added. The final outputs are the fused, enhanced feature maps of the driver side and the cargo side:

[0035]

[0036]

[0037] Cross-attention mechanisms can dynamically learn the importance of different features and adjust feature weights based on the correlation between drivers and cargo sources, thereby achieving precise alignment and fusion of features.

[0038] 6) Global average pooling is then applied to the two enhanced feature maps to obtain the driver-side and cargo-side vectors:

[0039]

[0040]
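Steps 5) and 6) reduce to an element-wise product, a residual addition, and a pooling. The sketch below assumes NumPy and assumes that the residual adds back the same feature map that was weighted; the text does not specify which tensor the residual comes from. Names and sizes are hypothetical.

```python
import numpy as np


def fuse_and_pool(feat: np.ndarray, spatial_w: np.ndarray):
    """feat: (C, H, W) feature map; spatial_w: (H, W) spatial weights.
    Returns the fused map and its globally average-pooled vector of shape (C,)."""
    fused = feat * spatial_w[None, :, :] + feat   # spatial weighting plus residual
    pooled = fused.mean(axis=(1, 2))              # global average pooling
    return fused, pooled


feat = np.ones((4, 3, 3))             # hypothetical feature map
spatial_w = np.full((3, 3), 0.5)      # hypothetical spatial weights
fused, pooled = fuse_and_pool(feat, spatial_w)   # every fused entry is 1.5
```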

[0041] Furthermore, step (3), concatenating the continuous fields, specifically includes the following steps:

[0042] A simple concatenation is used. After Min-Max normalization, the continuous features form an m-dimensional vector. This vector is directly concatenated with the driver-side and cargo-side vectors produced by the cross-attention module to obtain the complete vector, which serves as the input to the subsequent dual-dependency interaction network (AMDNet).

[0043] Furthermore, step (4) involves the dual-dependency interaction network extracting high-order feature interactions, specifically including the following steps:

[0044] 1) First, the input is reshaped and treated as a single-channel time series with length L = 7;

[0045] 2) Secondly, perform dual interaction modeling to capture complex dependencies between fields. First, perform patch partitioning, dividing according to patch length P=7:

[0046]

[0047] where the length of the complete vector is 2C + m = 273.
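The arithmetic behind the patch partition can be checked with a short sketch: a vector of length 273 splits evenly into 273 / 7 = 39 patches of length P = 7. The individual values of C and m below are hypothetical (only 2C + m = 273 and P = 7 come from the text), and reshaping into a (39, 7) array is one assumed reading of the partition.

```python
import numpy as np

# C and m are hypothetical; they are chosen only so that 2C + m = 273.
C, m, P = 128, 17, 7

v_driver = np.zeros(C)       # stand-in for the pooled driver-side vector
v_cargo = np.ones(C)         # stand-in for the pooled cargo-side vector
x_num = np.full(m, 0.5)      # stand-in for the normalized numerical features

full = np.concatenate([v_driver, v_cargo, x_num])   # complete vector, length 273
patches = full.reshape(-1, P)                       # 39 patches of length 7
```

Flattening `patches` row by row recovers the original vector, which is the inverse ("unpatch") direction.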

[0048] 3) Next, temporal mixing and channel mixing are computed. An MLP shared across time steps aggregates the information of each channel along the time dimension to capture temporal dependencies. Temporal mixing is performed on each patch:

[0049]

[0050] 4) Channel mixing transposes the time mixing result and then passes it through a shared MLP again:

[0051]

[0052] 5) After the unpatch operation, the output is split back into individual channels to obtain the residual. The residual connection helps the model retain its ability to capture temporal dependencies while effectively exploiting cross-channel dependencies. Finally, global average pooling is performed.

[0053] 6) The output z of the dual-dependency interaction module is fed into the predictor to further predict the final prediction result.

[0054] Furthermore, the prediction optimization in step (5) specifically includes the following steps:

[0055] The probability of a driver clicking on a specific cargo source is calculated using a fully connected layer with only one output node.

[0056]

[0057] The evaluation metric, AUC, is calculated;

[0058] The loss function used is the binary cross-entropy loss.
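The prediction and loss steps can be sketched with the standard library alone. The Sigmoid on the single output node is an assumption (the text only says the layer outputs a click probability), and the function names and toy inputs are hypothetical.

```python
import math
from typing import Sequence


def predict_click(z: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Single-output fully connected layer followed by a Sigmoid."""
    logit = sum(w * x for w, x in zip(weights, z)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def binary_cross_entropy(labels: Sequence[int], probs: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


p = predict_click([0.0, 0.0], [1.0, 1.0], 0.0)        # logit 0 -> probability 0.5
loss = binary_cross_entropy([1, 0], [0.5, 0.5])       # equals ln 2
```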

[0059] The beneficial effects of the vehicle-cargo matching recommendation method based on driver behavior and dual-dependency collaboration provided by this invention are:

[0060] (1) This invention utilizes historical data information for effective modeling and introduces a cross-attention mechanism to better capture the rapid changes in driver preferences in vehicle-cargo matching scenarios, thereby improving the timeliness and accuracy of vehicle-cargo matching recommendations.

[0061] (2) This invention solves the problem of high signal noise by using deep matching of dual-dependency networks, thereby improving the real-time performance and accuracy of vehicle-cargo matching recommendations.

[0062] (3) This invention achieves effective integration of driver's real-time preferences and cargo attributes through bi-branch interaction update with cross attention and dual dependency interaction, which not only improves the recommendation effect, but also enhances the interpretability of the recommendation system.

Description of the Drawings

[0063] To more clearly illustrate the technical solutions in the implementation of this invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. The accompanying drawings described below are only some embodiments of this invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0064] Figure 1 is a flowchart illustrating the steps of the vehicle-cargo matching recommendation method based on driver behavior and dual-dependency collaboration according to the present invention.

[0065] Figure 2 is a diagram illustrating the overall framework of the vehicle-cargo matching recommendation method based on driver behavior and dual-dependency collaboration according to the present invention.

[0066] Figure 3 is part of the overall framework of the model in this invention, specifically the expanded diagram of ChannelAttention within the Cross-Attention Fusion Module.

[0067] Figure 4 is the expanded diagram of SpatialAttention within the Cross-Attention Fusion Module of the present invention.

Detailed Implementation

[0068] To further illustrate the technical solution of the present invention in detail, this embodiment is implemented based on the technical solution of the present invention, and provides a detailed implementation plan and specific steps.

[0069] The data source is a public dataset from an algorithm competition released by Yunmanman, a global highway trunk logistics matching and scheduling platform. The data spans fourteen days, taking a complete "driver browse-click-call-place order" behavior chain as a sample. The fields include 1-dimensional label features, 11-dimensional features on the driver side, 10-dimensional features on the cargo side, and 10-dimensional contextual scene fields, resulting in the final session dataset containing 1,000,000 interaction data points.

[0070] Based on the features extracted from the historical driver-cargo source interaction table, the discrete features are first one-hot encoded, and the one-hot vector of each category is then mapped to a low-dimensional continuous vector space through the embedding lookup table. Assuming there are z discrete features in the dataset, each categorical feature is mapped by the embedding layer to an h-dimensional dense vector, where $e_i \in \mathbb{R}^{h}$ denotes the embedding vector of the i-th feature and h is the dimension of the embedding layer. The final result is the embedding matrix $E = [e_1, e_2, \dots, e_z] \in \mathbb{R}^{z \times h}$, where z is the number of fields.

[0071] Min-Max normalization is applied to the numerical features, and missing values are imputed with the mean. Once normalization has removed scale differences, the numerical features are used directly: they are concatenated with the processed discrete features in the concatenation layer as input to the higher-order feature interaction module.

[0072] Subsequently, after the low-dimensional dense vectors of the discrete features are obtained, the embedding vector fields are divided into driver-side fields, cargo-side fields, and context fields. The driver-side and cargo-side parts are then fed into the cross-attention mechanism for further processing.

[0073] 1) First, the driver-side and cargo-side features are each linearly mapped to obtain two two-dimensional feature maps:

[0074]

[0075]

[0076] 2) Secondly, global average pooling and global max pooling are applied to each of the two feature maps:

[0077]

[0078]

[0079]

[0080]

[0081] in

[0082] 3) Next, the two descriptors are fed into a shared two-layer MLP (dimensionality reduction ratio r=16), activated by ReLU, summed, and then channel weights are generated using the Sigmoid function:

[0083]

[0084]

[0085]

[0086]

[0087] where the weight matrices of the shared MLP are all learnable parameters.
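For readers who prefer the math, steps 2) and 3) can be written out as below. The symbols are assumptions introduced here for illustration: $F_s \in \mathbb{R}^{C \times H \times W}$ is the feature map of side $s \in \{d, c\}$ (driver, cargo), $a_s$ and $m_s$ are its average-pooled and max-pooled channel descriptors, $W_1$ and $W_2$ are the shared MLP weights, and $w_s$ is the resulting channel weight vector. Only the pooling types, the ReLU, the summation, the Sigmoid, and r = 16 are taken from the text.

```latex
\begin{aligned}
a_s &= \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} F_s[:, i, j], \qquad
m_s = \max_{i,j}\, F_s[:, i, j], \qquad s \in \{d, c\},\\[4pt]
w_s &= \sigma\!\Big( W_2\,\mathrm{ReLU}(W_1 a_s) + W_2\,\mathrm{ReLU}(W_1 m_s) \Big),\\[4pt]
W_1 &\in \mathbb{R}^{(C/r)\times C}, \qquad W_2 \in \mathbb{R}^{C\times (C/r)}, \qquad r = 16 .
\end{aligned}
```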

[0088] 4) Then, the channel weights of the two sub-networks obtained above are multiplied to construct a cross-attention matrix of shape C × C. After Softmax normalization, the matrix is used in a weighted summation over the feature maps, yielding the channel-enhanced features:

[0089]

[0090]

[0091] Following this, spatial attention fusion is performed on the two channel-enhanced features. Average pooling and max pooling are applied to each of them along the channel dimension:

[0092]

[0093]

[0094]

[0095]

[0096] in .

[0097] 5) For two features of shape C×H×W, obtain two 1×H×W spatial descriptions by max pooling and average pooling respectively, and connect these two descriptions along the channel dimension to generate spatial weights. Then, obtain the weight coefficients by passing them through a shared weight convolutional layer with a kernel size of k×k.

[0098]

[0099]

[0100] where the convolution operator denotes a convolutional layer with a kernel size of k×k; in this model, k = 3.

[0101] 6) The operations above yield the channel-enhanced features and the spatial weight coefficients of both sides. The spatial weight coefficients are multiplied element-wise with the corresponding features to obtain the fused features, and a residual connection is added. The final outputs are the fused, enhanced feature maps of the driver side and the cargo side:

[0102]

[0103]

[0104] in .

[0105] Next, global average pooling is applied to the two enhanced feature maps to obtain the driver-side and cargo-side vectors:

[0106]

[0107]
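The spatial attention, residual fusion, and pooling steps can be summarized in one hedged set of equations. The symbols are assumptions introduced here: $\tilde{F}_s \in \mathbb{R}^{C \times H \times W}$ is the channel-enhanced feature map of side $s \in \{d, c\}$, $M_s$ its spatial weight map, $\hat{F}_s$ the fused output, and $v_s$ the pooled vector. The Sigmoid $\sigma$ on the convolution output and the choice of $\tilde{F}_s$ as the residual branch are also assumptions; the text specifies only the two poolings, the concatenation, the 3×3 convolution, the element-wise product, the residual, and the final global average pooling.

```latex
\begin{aligned}
M_s &= \sigma\!\Big( \mathrm{Conv}_{3\times 3}\big( \big[\, \mathrm{AvgPool}_{\mathrm{ch}}(\tilde{F}_s)\,;\ \mathrm{MaxPool}_{\mathrm{ch}}(\tilde{F}_s) \,\big] \big) \Big) \in \mathbb{R}^{1\times H\times W},\\[4pt]
\hat{F}_s &= M_s \odot \tilde{F}_s + \tilde{F}_s,\\[4pt]
v_s &= \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} \hat{F}_s[:, i, j] \in \mathbb{R}^{C}, \qquad s \in \{d, c\}.
\end{aligned}
```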

[0108] 7) The importance of different features is dynamically learned through the cross-attention mechanism, and the feature weights are adjusted according to the correlation between drivers and cargo sources, thereby achieving accurate alignment and fusion of features.

[0109] After Min-Max normalization, the continuous features form an m-dimensional vector, which is then concatenated with the driver-side and cargo-side vectors to obtain the complete feature vector:

[0110]

[0111] Next, the complete feature vector is fed into the dual-dependency interaction network. The input is reshaped and treated as a single-channel time series with length L = 7.

[0112] Then, dual-interaction modeling is performed to capture complex dependencies between fields. First, patch partitioning is performed, with a patch length of P=7:

[0113]

[0114] where the length of the complete vector is 2C + m = 273.

[0115] 8) Next, temporal mixing and channel mixing are computed. An MLP shared across time steps aggregates the information of each channel along the time dimension to capture temporal dependencies. Temporal mixing is performed on each patch:

[0116]

[0117] The MLP consists of two fully connected layers with a hidden dimension of 2×C and the activation function GELU. Channel mixing transposes the temporal mixing result and then passes it through a shared MLP again.

[0118]

[0119] where the symbols denote, respectively, the patch carrying the aggregated information, the embedded output of the residual network, and a learnable scalar scaling factor. After the unpatch operation, the output is split back into individual channels to obtain the residual. The residual connection helps the model retain its ability to capture temporal dependencies while effectively exploiting cross-channel dependencies. Finally, global average pooling is performed to obtain the output z.
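The mixing steps can be illustrated with a minimal NumPy sketch. Several choices here are assumptions rather than details from the patent: the input is a (39, 7) array of patches (273 / 7 = 39), the MLP hidden width is twice its input width, the first MLP mixes along the within-patch time axis, the second mixes along the patch axis after a transpose, and the final pooling averages over the time axis. All names are hypothetical.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def make_mlp(rng: np.random.Generator, dim: int):
    """Two fully connected layers; the hidden width 2 * dim is an assumption."""
    hidden = 2 * dim
    return (rng.normal(scale=0.1, size=(dim, hidden)), np.zeros(hidden),
            rng.normal(scale=0.1, size=(hidden, dim)), np.zeros(dim))


def mlp(x: np.ndarray, params) -> np.ndarray:
    w1, b1, w2, b2 = params
    return gelu(x @ w1 + b1) @ w2 + b2


def dual_mixing(patches: np.ndarray, time_params, chan_params, alpha: float):
    """patches: (N, P). Returns the residual output (N, P) and a pooled vector (N,)."""
    mixed_t = mlp(patches, time_params)           # mix along the time axis of each patch
    mixed_c = mlp(mixed_t.T, chan_params).T       # transpose, mix across patches, transpose back
    out = patches + alpha * mixed_c               # residual with a scalar scaling factor
    z = out.mean(axis=1)                          # global average pooling over time
    return out, z


rng = np.random.default_rng(0)
N, P = 39, 7
patches = rng.normal(size=(N, P))
time_params = make_mlp(rng, P)
chan_params = make_mlp(rng, N)
out, z = dual_mixing(patches, time_params, chan_params, alpha=0.1)
```

With `alpha = 0.0` the block reduces to the identity on `patches`, which is a convenient sanity check for the residual path.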

[0120] 9) Feed the output z of the dual-dependency interaction module into the predictor to further predict the final prediction result:

[0121] The probability of a driver clicking on a specific cargo source is calculated using a fully connected layer with only one output node:

[0122]

[0123] The final evaluation metric, AUC, is then calculated;

[0124] The loss is calculated with the binary cross-entropy loss function.
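For reference, AUC can be computed directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below uses only the standard library; the labels and scores are made-up toy values, and the quadratic loop is meant for illustration rather than for a million-row dataset.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive-negative pairs are ranked correctly.
value = auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.6])   # 0.75
```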

[0125] The above embodiments merely illustrate the best implementation of the present invention, and the above description should not be construed as limiting the scope of this application. Those skilled in the art can make various modifications and variations within the principles of this invention, but any modifications, equivalent substitutions, improvements, etc., should be included within the protection scope of this invention.

Claims

1. A vehicle-cargo matching method based on cross-attention and dual-dependency interaction, characterized in that it includes the following steps: (1) Extract features from the historical interaction wide table of driver-cargo source, and use the embedding layer to vectorize the discrete fields to obtain the preliminary feature representations of the driver side and the cargo source side; (2) The embedding vectors on both sides are fed into the Cross-Attention Fusion Module, and the feature complementarity of the channel and spatial dimensions is achieved through the Cross-Attention Fusion Module; (3) The output of the cross-attention mechanism is concatenated with the numerical field to form a complete feature vector; (4) Input the complete vector obtained in step (3) into the dual-dependency interaction module (AMDNet) and iteratively extract high-order feature interaction dependencies; (5) Based on the feature interaction vector from step (4), the prediction layer outputs the recommendation score, and the model is optimized by the cross-entropy loss function.

2. The vehicle-cargo matching method based on cross-attention and dual-dependency interaction according to claim 1, characterized in that: the method processes the public dataset of the algorithm competition released by Yunmanman, a global highway trunk logistics matching and scheduling platform, and adopts a strategy that combines the cross-attention mechanism with the dual-dependency interaction method in order to dynamically capture the complex interaction between driver interests and cargo attributes, thereby improving the accuracy of recommendations; the model's structural design is also highly interpretable, which helps to understand and optimize key factors in the vehicle-cargo matching process.

3. The vehicle-cargo matching method based on cross-attention and dual-dependency interaction according to claim 1, characterized in that: In step (1), relevant features are extracted from the historical interaction table of driver-cargo source, and dense fields are normalized. The discrete feature fields are first one-hot encoded, and then the one-hot vector of each category is mapped to a low-dimensional continuous vector space through the embedding query matrix table, which is then transformed into a low-dimensional dense vector by the embedding layer.

4. The vehicle-cargo matching method based on cross-attention and dual-dependency interaction according to claim 3, characterized in that: suppose there are f discrete features; after passing through the embedding layer, an embedding vector of dimension k is obtained for each of the h fields, where h is the number of fields and k is the embedding dimension; Min-Max normalization is applied to the numerical features and missing values are imputed with the mean; once normalization has removed scale differences, the numerical features are used directly and are concatenated with the processed discrete features in the concatenation layer as input to the higher-order feature interaction module; after the embedding vectors of the discrete features are obtained, the embedding vector fields are divided into driver-side fields and cargo-side fields for subsequent use.

5. The vehicle-cargo matching method based on cross-attention and dual-dependency interaction according to claim 1, characterized in that: In step (2), the feature fusion module is mainly responsible for further refining the classification fields processed by the Embedding layer into driver-side fields and cargo-side fields, and inputting the two feature fields into the CAFM (Cross-Attention Fusion Module) module for processing.

6. The vehicle-cargo matching method based on cross-attention and dual-dependency interaction according to claim 1, characterized in that the CAFM module processing in step (2) specifically includes the following steps: 1) linearly map the driver-side and cargo-side features to obtain two two-dimensional feature maps, then apply global average pooling and global max pooling to each map to obtain its channel descriptors; 2) feed the descriptors into a shared two-layer MLP with ReLU activation, sum the two outputs, and generate the channel weights using the Sigmoid function; 3) multiply the channel weights of the two sub-networks to construct a cross-attention matrix of shape C × C and, after Softmax normalization, use it to weight the feature maps, yielding the channel-enhanced features of the driver side and the cargo side; 4) for the two channel-enhanced feature maps, perform average pooling and max pooling along the channel dimension, concatenate the results along the channel axis, and generate the spatial weights through a convolutional layer with a kernel size of k×k, where k = 3 in this model; 5) multiply the spatial weight coefficients element-wise with the corresponding features to obtain the fused features and add a residual connection, so that the final outputs are the fused, enhanced feature maps of the driver side and the cargo side.

7. The vehicle-cargo matching method based on cross-attention and dual-dependency interaction according to claim 1, characterized in that: by dynamically learning the importance of different features through the cross-attention mechanism and adjusting feature weights based on the correlation between drivers and cargo sources, precise alignment and fusion of features is achieved.

8. The vehicle-cargo matching method based on cross-attention and dual-dependency interaction according to claim 1, characterized in that: a simple concatenation is used; after Min-Max normalization, the continuous features form an m-dimensional vector, which is directly concatenated with the driver-side and cargo-side vectors produced by the cross-attention module to obtain the complete vector; the complete vector serves as the input to the subsequent dual-dependency interaction network (AMDNet).

9. The vehicle-cargo matching method based on cross-attention and dual-dependency interaction according to claim 8, wherein the complete vector is input into the AMDNet module, characterized in that: 1) the input is reshaped and treated as a single-channel time series with length L = 7; 2) dual interaction modeling is then performed to capture complex dependencies between fields, beginning with patch partitioning according to patch length P = 7, where the length of the complete vector is 2C + m = 273; 3) temporal mixing and channel mixing are then computed: an MLP shared across time steps aggregates the information of each channel along the time dimension to capture temporal dependencies, and temporal mixing is performed on each patch; 4) channel mixing transposes the temporal mixing result and passes it through a shared MLP again; 5) after the unpatch operation, the output is split back into individual channels to obtain the residual; the residual connection helps the model retain its ability to capture temporal dependencies while effectively exploiting cross-channel dependencies, and global average pooling is performed last; 6) the output z of the dual-dependency interaction module is fed into the predictor to produce the final prediction result.

10. The vehicle-cargo matching method based on cross-attention and dual-dependency interaction according to claim 1, characterized in that the prediction optimization specifically includes the following steps: the probability of a driver clicking on a specific cargo source is calculated using a fully connected layer with only one output node; the evaluation metric, AUC, is calculated; and the loss function used is the binary cross-entropy loss.