A processing method and system for lane line recognition based on images

By constructing an end-to-end lane line prediction model, the problem of repetitive calculations in lane line recognition in autonomous driving systems is solved, recognition efficiency is improved and system computing power is saved, and efficient processing of lane line instance segmentation and attribute recognition is achieved.

CN115830566B · Active · Publication Date: 2026-06-26 · SUZHOU QINGZHOU ZHIHANG INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SUZHOU QINGZHOU ZHIHANG INTELLIGENT TECH CO LTD
Filing Date: 2022-12-27
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing autonomous driving systems suffer from redundant calculations in lane line recognition, resulting in low recognition efficiency and wasted system computing resources.

Method used

An end-to-end lane prediction model is constructed, including a three-layer FPN network, a binary segmentation network, a 2D convolutional network, a lane instance recognition module, a lane feature extraction module, and a lane attribute prediction network. This model simultaneously handles lane instance segmentation and attribute recognition, avoiding redundant computation.

Benefits of technology

It improves lane line recognition efficiency, saves system computing resources, and achieves end-to-end lane line instance segmentation and attribute recognition.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN115830566B_ABST

Abstract

Embodiments of the present application relate to a processing method and system for image-based lane line recognition. The method comprises: receiving a first image; resizing the first image according to a preset image size to obtain a corresponding second image; and inputting the second image into a preset lane line prediction model for lane line instance and attribute recognition to obtain a corresponding third image. The end-to-end lane line prediction model provided in the present application improves recognition efficiency and saves system computing resources.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, and in particular to a processing method and system for lane line recognition based on images.

Background Technology

[0002] Autonomous vehicles need to recognize the road environment during operation. This environment includes not only the vehicle and surrounding obstacles, but also the lane markings. Determining lane marking information requires the vehicle's autonomous driving system to perform instance segmentation and attribute recognition of lane markings within the road environment. Typically, autonomous driving systems use multiple image processing models to handle lane marking instance segmentation and attribute classification separately. In practical applications, we have found that this conventional approach is prone to redundant computation, which not only reduces recognition efficiency but also wastes system computing resources.

Summary of the Invention

[0003] The purpose of this invention is to address the shortcomings of existing technologies by providing a method, system, electronic device, and computer-readable storage medium for lane line recognition based on images. It constructs a lane line prediction model capable of simultaneously handling lane line instance segmentation and attribute recognition end-to-end, and performs lane line instance segmentation, lane line type attribute recognition, and lane line color attribute recognition on the input image based on this model. The end-to-end model provided by this invention avoids the problem of redundant computation in conventional multi-model schemes, thereby improving recognition efficiency and saving system computing resources.

[0004] To achieve the above objectives, a first aspect of the present invention provides a method for lane line recognition based on images, the method comprising:

[0005] Receive the first image;

[0006] The first image is resized according to a preset image size to obtain the corresponding second image; the preset image size is H0×W0×C0, where H0, W0, and C0 are the preset image height, width, and feature dimension, respectively.

[0007] The second image is input into a preset lane prediction model to perform lane line instance and attribute recognition processing to obtain the corresponding third image; the size of the third image is H0×W0×(C0+3); the third image adds three pixel-level semantic features compared to the second image, namely lane line index features, lane line type semantic features, and lane line color semantic features; the lane line type semantic features include multiple lane line types; the lane line color semantic features include multiple lane line colors.

[0008] Preferably, the lane prediction model includes a three-layer FPN network, a binary segmentation network, a 2D convolutional network, a lane line instance recognition module, a lane line feature extraction module, a lane line attribute prediction network, and a lane line instance and attribute allocation module.

[0009] The input of the three-layer FPN network is the input of the lane prediction model, and its output is connected to the inputs of the binary segmentation network and the 2D convolutional network, respectively. The three-layer FPN network includes a downsampling residual network side and an upsampling feature extraction network side. The downsampling residual network side includes first-, second-, and third-level residual units. The upsampling feature extraction network side includes first-, second-, and third-level feature extraction units. The input of the first-level residual unit is the input of the three-layer FPN network, and its output is connected to the input of the second-level residual unit and the first input of the first-level feature extraction unit, respectively. The output of the second-level residual unit is connected to the input of the third-level residual unit and the first input of the second-level feature extraction unit, respectively. The output of the third-level residual unit is connected to the input of the third-level feature extraction unit. The output of the third-level feature extraction unit is connected to the second input of the second-level feature extraction unit. The output of the second-level feature extraction unit is connected to the second input of the first-level feature extraction unit and the input of the binary segmentation network, respectively. The output of the first-level feature extraction unit is connected to the input of the 2D convolutional network. By default, the first-, second-, and third-level residual units are the conv1, conv2_x, and conv3_x modules of the ResNet101 network.

[0010] The output of the binary segmentation network is connected to the input of the lane line instance recognition module. The binary segmentation network includes a first convolutional unit, a second convolutional unit, a feature vector conversion unit, a first multilayer perceptron, a second multilayer perceptron, and a binary image conversion unit. The input of the first convolutional unit is the input of the binary segmentation network, and its output is connected to the input of the second convolutional unit. The output of the second convolutional unit is connected to the input of the feature vector conversion unit. The output of the feature vector conversion unit is connected to the input of the first multilayer perceptron. The output of the first multilayer perceptron is connected to the input of the second multilayer perceptron. The output of the second multilayer perceptron is connected to the input of the binary image conversion unit. The output of the binary image conversion unit is connected to the input of the lane line instance recognition module.

[0011] The output of the lane line instance recognition module is connected to the first input of the lane line feature extraction module and the first input of the lane line instance and attribute allocation module, respectively.

[0012] The output of the 2D convolutional network is connected to the second input of the lane line feature extraction module;

[0013] The output of the lane line feature extraction module is connected to the input of the lane line attribute prediction network;

[0014] The output of the lane line attribute prediction network is connected to the second and third inputs of the lane line instance and attribute allocation module; the lane line attribute prediction network includes a type attribute prediction unit and a color attribute prediction unit; the input of the type attribute prediction unit is connected to the output of the lane line feature extraction module, and its output is connected to the second input of the lane line instance and attribute allocation module; the input of the color attribute prediction unit is connected to the output of the lane line feature extraction module, and its output is connected to the third input of the lane line instance and attribute allocation module.

[0015] The fourth input of the lane line instance and attribute allocation module is the input of the lane line prediction model.
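The connections listed in [0009] to [0015] can be summarized as a small dependency graph. The following is a minimal sketch, not the patented implementation: the module names are illustrative labels chosen here, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) is used only to derive one valid evaluation order from the wiring.

```python
from graphlib import TopologicalSorter

# Each key is a module; its value lists what feeds it, following [0009]-[0015].
# All names are hypothetical labels, not identifiers from the patent.
WIRING = {
    "fpn": ["model_input"],
    "binary_segmentation": ["fpn"],   # receives the second-level feature map
    "conv2d": ["fpn"],                # receives the first-level feature map
    "instance_recognition": ["binary_segmentation"],
    "feature_extraction": ["instance_recognition", "conv2d"],
    "type_prediction": ["feature_extraction"],
    "color_prediction": ["feature_extraction"],
    "instance_attribute_allocation": [
        "instance_recognition",
        "type_prediction",
        "color_prediction",
        "model_input",
    ],
}


def execution_order(wiring):
    """Return one order in which every module runs after all of its inputs."""
    return list(TopologicalSorter(wiring).static_order())


order = execution_order(WIRING)
print(order)
```

Because every module is computed once and its output is shared by all downstream consumers, the graph makes the claimed saving visible: the FPN features feed both the segmentation branch and the attribute branch.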

[0016] Preferably, the step of inputting the second image into a preset lane prediction model for lane line instance and attribute recognition processing to obtain the corresponding third image specifically includes:

[0017] The lane prediction model inputs the second image into the three-layer FPN network for feature extraction to obtain corresponding first-level and second-level feature maps. The first-level feature map has a shape of H1×W1×C1, where H1, W1, and C1 are the height, width, and feature dimension of the first-level feature map, respectively. H1 = H0 / 2, W1 = W0 / 2, and the feature dimension C1 is 64 by default. The second-level feature map has a shape of H2×W2×C2, where H2, W2, and C2 are the height, width, and feature dimension of the second-level feature map, respectively. H2 = H1 / 2, W2 = W1 / 2, and C2 = C1*2.

[0018] The first-level feature map is input into the 2D convolutional network for 2D convolution processing to obtain the corresponding first feature map; the shape of the first feature map is H1×W1×C4; C4 is the feature dimension of the first feature map, C4=C1 / 4;

[0019] The second-level feature map is input into the binary segmentation network for binary image segmentation to obtain the corresponding first binary image; the shape of the first binary image is H1×W1×1;

[0020] The first binary image is input into the lane line instance recognition module for lane line instance recognition processing to obtain the corresponding first lane line coordinate tensor; the shape of the first lane line coordinate tensor is B×H1×2; B is the number of lane line instances; the first lane line coordinate tensor includes B first lane line coordinate vectors {p_{k,j}} of length H1, where 1 ≤ first lane index k ≤ B and 1 ≤ coordinate index j ≤ H1; each first lane line coordinate vector {p_{k,j}} includes H1 first lane line coordinates p_{k,j}; each first lane line coordinate p_{k,j} includes an x-axis coordinate and a y-axis coordinate;

[0021] The first lane line coordinate tensor and the first feature map are input into the lane line feature extraction module for lane line feature fusion processing to obtain the corresponding first lane line feature tensor; the shape of the first lane line feature tensor is B×(H1*C4); the first lane line feature tensor includes B first lane line feature vectors, one per lane line instance.

[0022] The first lane feature tensor is input into the type attribute prediction unit of the lane attribute prediction network to predict the type attribute of each lane line instance, resulting in a corresponding first prediction tensor. The first lane feature tensor is also input into the color attribute prediction unit of the lane attribute prediction network to predict the color attribute of each lane line instance, resulting in a corresponding second prediction tensor. The first prediction tensor has a shape of B×N1, where N1 is a preset number of lane line types. The first prediction tensor includes B first prediction vectors of shape 1×N1, and each first prediction vector includes N1 first lane line type prediction scores. The second prediction tensor has a shape of B×N2, where N2 is a preset number of lane line colors. The second prediction tensor includes B second prediction vectors of shape 1×N2, and each second prediction vector includes N2 first lane line color prediction scores.

[0023] The first lane line coordinate tensor, the first prediction tensor, the second prediction tensor, and the second image are input into the lane line instance and attribute allocation module to perform pixel-level lane line type and lane line color semantic feature addition processing on the second image to obtain the corresponding third image.
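The shape relationships stated in [0006], [0007] and [0017] to [0022] can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name, the dictionary keys, and the 512×1024×3 example input are assumptions made here, while the formulas (H1 = H0 / 2, C2 = C1*2, C4 = C1 / 4, and so on) are taken from the text.

```python
def trace_shapes(h0, w0, c0, b, n1, n2, c1=64):
    """Return the tensor shapes named in the description as a dictionary."""
    if h0 % 4 or w0 % 4:
        # Assumption of this sketch: sizes divide cleanly at both levels.
        raise ValueError("h0 and w0 are assumed to be divisible by 4")
    h1, w1 = h0 // 2, w0 // 2
    h2, w2, c2 = h1 // 2, w1 // 2, c1 * 2
    c4 = c1 // 4
    return {
        "second_image": (h0, w0, c0),
        "first_level_feature_map": (h1, w1, c1),
        "second_level_feature_map": (h2, w2, c2),
        "first_feature_map": (h1, w1, c4),
        "first_binary_image": (h1, w1, 1),
        "lane_coordinate_tensor": (b, h1, 2),
        "lane_feature_tensor": (b, h1 * c4),
        "type_prediction_tensor": (b, n1),
        "color_prediction_tensor": (b, n2),
        "third_image": (h0, w0, c0 + 3),
    }


# Hypothetical example: a 512x1024 RGB input, 4 lanes, 2 types, 2 colors.
shapes = trace_shapes(512, 1024, 3, b=4, n1=2, n2=2)
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```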

[0024] Furthermore, the lane prediction model inputs the second image into the three-layer FPN network for feature extraction to obtain the corresponding first-level and second-level feature maps, specifically including:

[0025] The second image with shape H0×W0×C0 is input into the first-level residual unit to perform downsampling residual operation to obtain the corresponding first-level downsampling feature tensor; the shape of the first-level downsampling feature tensor is H1×W1×C1;

[0026] The first-level downsampled feature tensor is input into the second-level residual unit to perform downsampled residual calculation to obtain the corresponding second-level downsampled feature tensor; the shape of the second-level downsampled feature tensor is H2×W2×C2;

[0027] The second-level downsampling feature tensor is input into the third-level residual unit to perform downsampling residual calculation to obtain the corresponding third-level downsampling feature tensor; the shape of the third-level downsampling feature tensor is H3×W3×C3, where H3, W3, and C3 are the height, width, and feature dimension of the third-level downsampling feature tensor, respectively, and H3 = H2 / 2, W3 = W2 / 2, C3 = C2*2;

[0028] The third-level downsampled feature tensor is input into the third-level feature extraction unit; and the third-level feature extraction unit performs convolution operation on the third-level downsampled feature tensor based on a preset 3×3 first convolution kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding third-level feature map; the shape of the third-level feature map is consistent with the shape of the third-level downsampled feature tensor, which is H3×W3×C3;

[0029] The second-level downsampled feature tensor and the third-level feature map are input into the second-level feature extraction unit; the second-level feature extraction unit performs 2x upsampling and feature dimensionality reduction on the third-level feature map to obtain a first upsampled feature map with a shape of H2×W2×C2; and the second-level downsampled feature tensor and the first upsampled feature map are summed to obtain the corresponding second-level feature map.

[0030] The first-level downsampled feature tensor and the second-level feature map are input into the first-level feature extraction unit; the first-level feature extraction unit performs 2x upsampling and feature dimensionality reduction on the second-level feature map to obtain the corresponding second upsampled feature map with shape H1×W1×C1; and the first-level downsampled feature tensor and the second upsampled feature map are tensor summed to obtain the corresponding first-level feature map.
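The top-down merging in [0028] to [0030] can be illustrated on toy tensors. This is a sketch under stated assumptions: feature maps are plain nested lists of shape H×W×C, upsampling is nearest-neighbour, and the feature dimensionality reduction is replaced by averaging adjacent channel pairs. The text does not specify the upsampling or reduction operators, and the 3×3 convolution of the third-level feature extraction unit is omitted here.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of an H x W x C nested list (assumed)."""
    out = []
    for row in fmap:
        wide = [list(px) for px in row for _ in range(2)]
        out.append(wide)
        out.append([list(px) for px in wide])
    return out


def halve_channels(fmap):
    """Reduce C to C/2 by averaging adjacent channel pairs.

    Stand-in for a learned dimensionality-reducing layer (assumption)."""
    return [[[(px[i] + px[i + 1]) / 2 for i in range(0, len(px), 2)]
             for px in row] for row in fmap]


def add_maps(a, b):
    """Element-wise sum of two maps with identical shapes."""
    return [[[x + y for x, y in zip(pa, pb)] for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def top_down_merge(lateral, coarser):
    """Merge a coarser map (H/2 x W/2 x 2C) into a lateral map (H x W x C)."""
    return add_maps(lateral, halve_channels(upsample2x(coarser)))


def shape(fmap):
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))


def constant_map(h, w, c, value):
    return [[[value] * c for _ in range(w)] for _ in range(h)]


# Toy sizes: H1=4, W1=8, C1=4, so level 2 is 2x4x8 and level 3 is 1x2x16.
down1 = constant_map(4, 8, 4, 1.0)    # first-level downsampled feature tensor
down2 = constant_map(2, 4, 8, 2.0)    # second-level downsampled feature tensor
level3 = constant_map(1, 2, 16, 3.0)  # third-level feature map
level2 = top_down_merge(down2, level3)  # second-level feature map
level1 = top_down_merge(down1, level2)  # first-level feature map
print(shape(level2), shape(level1))
```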

[0031] Furthermore, the step of inputting the first-level feature map into the 2D convolutional network for 2D convolution processing to obtain the corresponding first feature map specifically includes:

[0032] The first-level feature map is input into the 2D convolutional network; and the 2D convolutional network performs convolution operations on the first-level feature map based on a preset 3×3 second convolutional kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding first feature map.
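The reason a 3×3 kernel with padding 1 and stride 1 leaves the height and width unchanged follows from the standard convolution output-size formula. A minimal check, with a function name chosen here for illustration:

```python
def conv_output_size(size, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# With the preset 3x3 kernel, padding 1 and stride 1, the size is preserved.
print(conv_output_size(256))
```

Only the feature dimension changes in such a layer (for example from C1 to C4 in the 2D convolutional network), which is determined by the number of kernels rather than by the kernel size.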

[0033] Furthermore, the step of inputting the second-level feature map into the binary segmentation network for binary image segmentation processing to obtain the corresponding first binary image specifically includes:

[0034] The binary segmentation network performs 2x upsampling on the input second-level feature map of shape H2×W2×C2 while keeping the feature dimension unchanged to obtain the corresponding third upsampled feature map; the shape of the third upsampled feature map is H1×W1×C2.

[0035] The third upsampled feature map is input into the first convolutional unit; and the first convolutional unit performs convolution operation on the third upsampled feature map based on a preset 3×3 third convolutional kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding first convolutional tensor.

[0036] The first convolution tensor is input into the second convolution unit; and the second convolution unit performs a convolution operation on the first convolution tensor based on a preset 3×3 fourth convolution kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding second convolution tensor; the shape of the second convolution tensor is H1×W1×C1;

[0037] The second convolution tensor is input into the feature vector conversion unit; and the feature vector conversion unit performs tensor flattening on the second convolution tensor to obtain the corresponding global feature vector; the shape of the global feature vector is (H1*W1*C1)×1;

[0038] The global feature vector is input into the first multilayer perceptron; the first fully connected layer of the first multilayer perceptron performs linear calculation on the global feature vector to obtain the corresponding first fully connected vector; and the first normalization layer of the first multilayer perceptron performs normalization calculation on the first fully connected vector to obtain the corresponding first normalized vector; the first multilayer perceptron includes the first fully connected layer and the first normalization layer; the output of the first fully connected layer is connected to the input of the first normalization layer.

[0039] The first normalized vector is input into the second multilayer perceptron; the second fully connected layer of the second multilayer perceptron performs linear calculation on the first normalized vector to obtain the corresponding second fully connected vector; and the first softmax function layer of the second multilayer perceptron performs positive and negative class score regression prediction based on the second fully connected vector to obtain the corresponding first rating tensor; the second multilayer perceptron includes the second fully connected layer and the first softmax function layer; the output of the second fully connected layer is connected to the input of the first softmax function layer; the shape of the first rating tensor is H1×W1×2, and the first rating tensor includes H1*W1 first rating vectors of length 2, and the first rating vector includes a first positive class rating and a first negative class rating.

[0040] The first rating tensor is input into the binary image conversion unit; the binary image conversion unit initializes an all-zero feature map of shape H1×W1×1 and iterates through each of the first rating vectors of the first rating tensor; during the iteration, it checks whether the first positive class rating of the currently iterated first rating vector is greater than the first negative class rating; when it is confirmed that the first positive class rating is greater than the first negative class rating, the first feature data corresponding to the currently iterated first rating vector in the all-zero feature map is set to 1; and at the end of the iteration, the updated feature map is output as the corresponding first binary image; at initialization, the all-zero feature map consists of H1*W1 first feature data initialized to 0, and the first feature data correspond one-to-one with the first rating vectors.
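The conversion described in [0040] amounts to a per-pixel comparison of the two class ratings. The following sketch assumes the rating tensor is held as an H1×W1 nested list of (positive, negative) pairs and stores the single output channel as a scalar per pixel; the function and variable names are illustrative.

```python
def to_binary_image(ratings):
    """Turn an H1 x W1 grid of (positive, negative) ratings into a 0/1 grid."""
    h1, w1 = len(ratings), len(ratings[0])
    binary = [[0] * w1 for _ in range(h1)]  # all-zero feature map
    for r in range(h1):
        for c in range(w1):
            positive, negative = ratings[r][c]
            if positive > negative:  # ties stay 0, as in the text
                binary[r][c] = 1
    return binary


ratings = [
    [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5)],
    [(0.3, 0.7), (0.6, 0.4), (0.1, 0.9)],
]
binary = to_binary_image(ratings)
print(binary)  # [[1, 0, 0], [0, 1, 0]]
```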

[0041] Furthermore, the step of inputting the first binary image into the lane line instance recognition module for lane line instance recognition processing to obtain the corresponding first lane line coordinate tensor specifically includes:

[0042] Step 71: The lane line instance recognition module divides the first binary image into H1 first row pixel sequences in bottom-to-top order; the 1st first row pixel sequence is the bottom row of the first binary image, and the H1-th first row pixel sequence is the top row of the first binary image; each first row pixel sequence includes W1 first row pixels; the pixel value of each first row pixel is 0 or 1;

[0043] Step 72: Traverse each first row pixel sequence of the first binary image; during traversal, record the currently traversed first row pixel sequence as the corresponding current row pixel sequence; record each pixel subsequence formed by consecutive first row pixels with a value of 1 in the current row pixel sequence as a corresponding first subsequence; and identify whether the sequence length of each first subsequence in the current row pixel sequence is odd; if the sequence length of the current first subsequence is odd, then only the pixel value of the first row pixel in the middle position of the current first subsequence is kept at 1, and the pixel values of all other first row pixels in the current first subsequence are modified to 0; if the sequence length of the current first subsequence is even, then one of the two first row pixels in the middle position of the current first subsequence is selected as the corresponding first reserved point, and only the pixel value of the first reserved point in the current first subsequence is kept at 1, while the pixel values of all other first row pixels in the current first subsequence are modified to 0; at the end of the traversal, the modified first binary image is used as the corresponding second binary image; the second binary image includes H1 second row pixel sequences L_i from bottom to top, 1 ≤ row index i ≤ H1; second row pixel sequence L_{i=1} is the bottom row of the second binary image, and second row pixel sequence L_{i=H1} is the top row of the second binary image; each second row pixel sequence L_i includes W1 second row pixels; the pixel value of each second row pixel is either 0 or 1; no second row pixel sequence L_i contains consecutive second row pixels whose values are all 1;

[0044] Step 73: Initialize the first index A to 1; and initialize the first lane index k to 0;

[0045] Step 74: In the second row pixel sequence L_{i=1}, search for second row pixels with a pixel value of 1 one by one from left to right; each time one is found, take the currently found second row pixel as the corresponding current search point, and take the pixel coordinates of the current search point in the second binary image as the corresponding current point coordinates; increment the first lane index k by 1, and add a new all-zero coordinate vector of length H1 as the corresponding first lane line coordinate vector {p_{k,j}}; and in the newly added first lane line coordinate vector {p_{k,j}}, set the first lane line coordinate p_{k,j} whose coordinate index j matches the first index A to the current point coordinates; then proceed to step 76;

[0046] Step 75: In the second row pixel sequence L_{i=A}, search for second row pixels with a pixel value of 1 one by one from left to right; each time one is found, take the currently found second row pixel as the current search point; take the pixel coordinates of the current search point in the second binary image as the current point coordinates; in each existing first lane line coordinate vector {p_{k,j}}, take the first non-zero first lane line coordinate p_{k,j} whose coordinate index j precedes the first index A as the corresponding first preceding lane line coordinate; calculate the absolute value of the lateral coordinate difference between the current point coordinates and each first preceding lane line coordinate to obtain the corresponding first lateral coordinate difference; select the minimum value from all the obtained first lateral coordinate differences as the corresponding minimum lateral coordinate difference; and identify whether the minimum lateral coordinate difference exceeds a preset lateral coordinate difference threshold; if the minimum lateral coordinate difference does not exceed the lateral coordinate difference threshold, then in the first lane line coordinate vector {p_{k,j}} to which the first preceding lane line coordinate corresponding to the minimum lateral coordinate difference belongs, set the first lane line coordinate p_{k,j} whose coordinate index j matches the first index A to the current point coordinates; if the minimum lateral coordinate difference exceeds the lateral coordinate difference threshold, then increment the first lane index k by 1, add a new all-zero coordinate vector of length H1 as the corresponding first lane line coordinate vector {p_{k,j}}, and in the newly added first lane line coordinate vector {p_{k,j}}, set the first lane line coordinate p_{k,j} whose coordinate index j matches the first index A to the current point coordinates;

[0047] Step 76: Increment the first index A by 1; and identify whether the first index A after incrementing by 1 is greater than H1; if yes, proceed to step 77; if no, proceed to step 75.

[0048] Step 77: Take the latest value of the first lane index k as the corresponding number of lane line instances B; and form the corresponding first lane line coordinate tensor from the B obtained first lane line coordinate vectors {p_{k,j}}.
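Steps 71 to 77 can be sketched in plain Python as a row-thinning pass followed by a bottom-to-top linking pass. This is an illustration under stated assumptions, not the patented code: the image is a nested list with row 0 at the top, missing points are stored as `None` instead of all-zero coordinates (so that a genuine point at x = 0 is not confused with "no point"), the "first non-zero preceding coordinate" is read as the nearest earlier point of a lane, even-length runs keep the left of the two middle pixels, and `max_dx` is a hypothetical name for the lateral coordinate difference threshold.

```python
def thin_rows(binary):
    """Step 72: keep only the middle pixel of every horizontal run of 1s."""
    thinned = []
    for row in binary:
        out = [0] * len(row)
        col = 0
        while col < len(row):
            if row[col] == 1:
                start = col
                while col < len(row) and row[col] == 1:
                    col += 1
                # Middle pixel; for even runs, the left of the two middles.
                out[start + (col - start - 1) // 2] = 1
            else:
                col += 1
        thinned.append(out)
    return thinned


def trace_lanes(thinned, max_dx):
    """Steps 73-77: link per-row key points into lane instances."""
    h1 = len(thinned)
    lanes = []  # each lane holds H1 entries: (x, y) or None
    for a in range(1, h1 + 1):   # first index A; A = 1 is the bottom row
        y = h1 - a               # image row index of L_A
        for x, value in enumerate(thinned[y]):
            if value != 1:
                continue
            best = None          # (lateral difference, lane position)
            for k, lane in enumerate(lanes):
                earlier = [p for p in lane[:a - 1] if p is not None]
                if not earlier:
                    continue     # lane has no point below this row
                dx = abs(x - earlier[-1][0])
                if best is None or dx < best[0]:
                    best = (dx, k)
            if best is not None and best[0] <= max_dx:
                lanes[best[1]][a - 1] = (x, y)
            else:
                lane = [None] * h1   # new lane instance (k incremented)
                lane[a - 1] = (x, y)
                lanes.append(lane)
    return lanes


binary_image = [
    [0, 0, 1, 1, 1, 0, 0, 1, 0, 0],  # top row
    [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],  # bottom row
]
thinned = thin_rows(binary_image)
lanes = trace_lanes(thinned, max_dx=2)
print(len(lanes), lanes)
```

In this toy image the two blobs are reduced to one key point per row, and the linking pass yields B = 2 lane instances, each with one (x, y) point per row.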

[0049] Furthermore, the step of inputting the first lane line coordinate tensor and the first feature map into the lane line feature extraction module for lane line feature fusion processing to obtain the corresponding first lane line feature tensor specifically includes:

[0050] The first lane line coordinate tensor and the first feature map are input into the lane line feature extraction module; for each first lane line coordinate vector {p_{k,j}} of the first lane line coordinate tensor, the lane line feature extraction module marks, in sequence, the H1 first feature map pixels on the first feature map that correspond to the H1 first lane line coordinates p_{k,j}; the C4 feature data of each first feature map pixel are extracted to form a first pixel feature vector of length C4; the obtained H1 first pixel feature vectors of length C4 are used to form a first lane line feature matrix of shape H1×C4; the first lane line feature matrix is flattened into a one-dimensional vector to obtain a first lane line feature vector of shape (H1*C4)×1; and the B obtained first lane line feature vectors are used to form a first lane line feature tensor of shape B×(H1*C4).
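The gathering step in [0050] can be sketched as a lookup of one C4-long feature vector per lane point, followed by flattening. The sketch assumes nested-list tensors and (x, y) pixel coordinates, and it fills rows where a lane has no point (represented here as `None`) with zeros; that handling of missing points is an assumption, since the text does not address it explicitly.

```python
def gather_lane_features(lanes, feature_map):
    """Build a B x (H1*C4) tensor from lane points and an H1 x W1 x C4 map."""
    c4 = len(feature_map[0][0])
    tensor = []
    for lane in lanes:
        flat = []
        for point in lane:
            if point is None:
                flat.extend([0.0] * c4)      # assumed fill for missing points
            else:
                x, y = point
                flat.extend(feature_map[y][x])  # C4 values at this pixel
        tensor.append(flat)
    return tensor


# Toy map with H1=2, W1=3, C4=2; the first channel encodes the pixel position.
feature_map = [[[10.0 * y + x, 0.5] for x in range(3)] for y in range(2)]
lanes = [
    [(0, 1), (1, 0)],
    [(2, 1), None],
]
lane_features = gather_lane_features(lanes, feature_map)
print(lane_features)
```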

[0051] Furthermore, the step of inputting the first lane line feature tensor into the type attribute prediction unit of the lane line attribute prediction network to predict the type attribute of each lane line instance to obtain the corresponding first prediction tensor specifically includes:

[0052] The first lane feature tensor is input into the type attribute prediction unit of the lane attribute prediction network; the type attribute prediction unit inputs each of the first lane feature vectors of the first lane feature tensor into the third fully connected layer for linear calculation, inputs the calculation result into the second normalization layer for normalization processing, inputs the processing result into the third fully connected layer for linear calculation, and inputs the calculation result into the second softmax layer for lane line type classification score prediction to obtain the corresponding first prediction vector; the B obtained first prediction vectors form the corresponding first prediction tensor with shape B×N1; the type attribute prediction unit includes the third fully connected layer, the second normalization layer, the third fully connected layer and the second softmax layer; the output vector of the second softmax layer has shape N1×1, where N1 is a preset number of lane line types; the first prediction vector has shape N1×1 and includes N1 first lane line type prediction scores; the lane line types include at least a dashed line type and a solid line type.

[0053] Furthermore, the step of inputting the first lane line feature tensor into the color attribute prediction unit of the lane line attribute prediction network to predict the color attribute of each lane line instance to obtain the corresponding second prediction tensor specifically includes:

[0054] The first lane feature tensor is input into the color attribute prediction unit of the lane attribute prediction network; the color attribute prediction unit inputs each of the first lane feature vectors of the first lane feature tensor into the fourth fully connected layer for linear calculation, inputs the calculation result into the third normalization layer for normalization processing, inputs the processing result into the fifth fully connected layer for linear calculation, and inputs the calculation result into the third softmax layer for lane line color classification score prediction to obtain the corresponding second prediction vector; the B obtained second prediction vectors form the corresponding second prediction tensor with a shape of B×N2; the color attribute prediction unit includes the fourth fully connected layer, the third normalization layer, the fifth fully connected layer and the third softmax layer; the output vector of the third softmax layer has a shape of N2×1, where N2 is the preset number of lane line colors; the second prediction vector has a shape of N2×1 and includes N2 first lane line color prediction scores; the lane line colors include at least yellow and white.
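Both attribute prediction units share the same layer sequence: fully connected, normalization, fully connected, softmax. The sketch below shows that sequence for a single lane line feature vector. It is illustrative only: the weights are made-up toy values, the normalization is assumed to be zero-mean, unit-variance scaling (the text does not say which normalization is used), and the label list is a hypothetical two-class example.

```python
import math


def linear(vec, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def normalize(vec, eps=1e-5):
    """Zero-mean, unit-variance scaling (assumed normalization type)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def softmax(vec):
    """Numerically stable softmax; the scores sum to 1."""
    peak = max(vec)
    exps = [math.exp(v - peak) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_head(feature, w_a, b_a, w_b, b_b):
    """Fully connected -> normalization -> fully connected -> softmax."""
    hidden = normalize(linear(feature, w_a, b_a))
    return softmax(linear(hidden, w_b, b_b))


# Toy lane feature vector of length 4 and hand-picked weights.
feature = [1.0, 2.0, 3.0, 4.0]
w_a = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0], [0.0, 0.0, 0.1, 0.1]]
b_a = [0.0, 0.0, 0.0]
w_b = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
b_b = [0.0, 0.0]

scores = attribute_head(feature, w_a, b_a, w_b, b_b)
labels = ["dashed", "solid"]  # hypothetical N1 = 2 type labels
predicted = labels[scores.index(max(scores))]
print(scores, predicted)
```

Running the same function with a second set of weights and a color label list gives the color attribute prediction unit; applying it to each of the B feature vectors yields the B×N1 and B×N2 prediction tensors.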

[0055] Furthermore, the step of inputting the first lane line coordinate tensor, the first prediction tensor, the second prediction tensor, and the second image into the lane line instance and attribute allocation module to perform pixel-level lane line type and lane line color semantic feature addition processing on the second image to obtain the corresponding third image specifically includes:

[0056] The lane line instance and attribute allocation module constructs B all-zero semantic feature maps of shape H1×W1×3, one per lane line instance, as the corresponding first semantic feature maps. Each first semantic feature map includes H1*W1 pixels, each pixel corresponding to a first semantic feature vector of length 3. The first semantic feature vector includes three pixel-level semantic features: the lane line index feature, the lane line type semantic feature, and the lane line color semantic feature. The first semantic feature maps correspond one-to-one with the first lane line coordinate vectors {p_{k,j}} of the first lane line coordinate tensor, one-to-one with the first prediction vectors of the first prediction tensor, and one-to-one with the second prediction vectors of the second prediction tensor.

[0057] On each first semantic feature map, the pixels matching each first lane line coordinate p_k,j of the corresponding first lane line coordinate vector {p_k,j} are marked as the corresponding first lane line key points; and, taking each first lane line key point as the center, a specified number of pixels in the same row are marked as the first lane line extension points corresponding to that key point.

[0058] Based on the first lane line index k of the first lane line coordinate vector {p_k,j} corresponding to each first semantic feature map, the lane line index feature of each first lane line key point and each first lane line extension point on that first semantic feature map is set;

[0059] The maximum score of the first lane line type prediction score of the first prediction vector corresponding to each first semantic feature map is identified, and the lane line type corresponding to the maximum score is taken as the corresponding first lane line type; and based on the first lane line type corresponding to each first semantic feature map, the lane line type semantic features of each first lane line key point and the first lane line extension point on the first semantic feature map are set.

[0060] The maximum score of the first lane line color prediction score of the second prediction vector corresponding to each first semantic feature map is identified, and the lane line color corresponding to the maximum score is taken as the corresponding first lane line color; and based on the first lane line color corresponding to each first semantic feature map, the lane line color semantic features of each first lane line key point and the first lane line extension point on the first semantic feature map are set.

[0061] The B first semantic feature maps of shape H1×W1×3, one per lane line instance, are fused at pixel level by point-by-point feature addition to obtain the corresponding second semantic feature map of shape H1×W1×3.
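The per-instance map construction and point-wise fusion can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: lane coordinates are given as plain lists of `(row, col)` key points, the extension half-width is a hypothetical parameter, and type and color identifiers are taken as `argmax + 1` so that lane pixels stay distinguishable from the all-zero background.

```python
def argmax(scores):
    """Index of the maximum score."""
    return max(range(len(scores)), key=lambda i: scores[i])

def build_instance_map(h, w, coords, lane_index, lane_type, lane_color, half_width=1):
    """Build one all-zero H1 x W1 x 3 map and mark one lane instance on it."""
    fmap = [[[0, 0, 0] for _ in range(w)] for _ in range(h)]
    for row, col in coords:
        # Key point plus extension points in the same row, centred on the key point.
        for c in range(max(0, col - half_width), min(w, col + half_width + 1)):
            fmap[row][c] = [lane_index, lane_type, lane_color]
    return fmap

def fuse_maps(maps):
    """Point-wise addition of the per-instance maps."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[[sum(m[r][c][k] for m in maps) for k in range(3)]
             for c in range(w)] for r in range(h)]

# Toy data (hypothetical): B = 2 lane instances on a 4 x 8 grid.
type_scores = [[0.9, 0.1], [0.2, 0.8]]    # B x N1 type scores
color_scores = [[0.3, 0.7], [0.6, 0.4]]   # B x N2 color scores
lane_coords = [[(3, 1), (2, 1)], [(3, 6), (2, 5)]]
maps = [build_instance_map(4, 8, coords, k + 1,
                           argmax(type_scores[k]) + 1,
                           argmax(color_scores[k]) + 1)
        for k, coords in enumerate(lane_coords)]
second_map = fuse_maps(maps)
```

Addition leaves each lane's features intact only where the marked regions of different instances do not overlap; the sketch does not handle overlapping regions.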

[0062] Based on the graphic ratio of the second image and the second semantic feature map, the second semantic feature map is upsampled using bilinear interpolation to obtain a third semantic feature map with a corresponding shape of H0×W0×3.

[0063] The obtained third semantic feature map and the second image with shape H0×W0×C0 are subjected to pixel-level feature fusion processing by vector concatenation to obtain the corresponding third image with shape H0×W0×(C0+3); the size of the third image is H0×W0×(C0+3); the third image adds three pixel-level semantic features compared to the second image, namely the lane line index feature, the lane line type semantic feature, and the lane line color semantic feature; the lane line type semantic feature includes multiple lane line types, of which at least include dashed line type and solid line type; the lane line color semantic feature includes multiple lane line colors, of which at least include white and yellow.
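The upsampling and concatenation steps can be illustrated with a small pure-Python sketch. The half-pixel sampling convention and the nested-list layout are assumptions; the text only states that bilinear interpolation and vector concatenation are used.

```python
def bilinear_upsample(fmap, out_h, out_w):
    """Resize an H x W x C nested list to out_h x out_w x C by bilinear interpolation."""
    in_h, in_w, ch = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for r in range(out_h):
        # Map the output pixel centre back to input coordinates (assumed convention).
        y = min(max((r + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for c in range(out_w):
            x = min(max((c + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            row.append([
                (1 - dy) * ((1 - dx) * fmap[y0][x0][k] + dx * fmap[y0][x1][k])
                + dy * ((1 - dx) * fmap[y1][x0][k] + dx * fmap[y1][x1][k])
                for k in range(ch)])
        out.append(row)
    return out

def concat_channels(image, fmap):
    """Pixel-level concatenation: H x W x C0 with H x W x 3 gives H x W x (C0 + 3)."""
    return [[image[r][c] + fmap[r][c] for c in range(len(image[0]))]
            for r in range(len(image))]

# Toy sizes (hypothetical): H1 = W1 = 2, H0 = W0 = 4, C0 = 3.
second_map = [[[1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3]]]
second_image = [[[0.0, 0.0, 0.0] for _ in range(4)] for _ in range(4)]
third_map = bilinear_upsample(second_map, 4, 4)
third_image = concat_channels(second_image, third_map)
ramp = bilinear_upsample([[[0.0], [4.0]]], 1, 4)
```

Bilinear interpolation of discrete identifiers yields fractional values where neighbouring pixels differ, as the `ramp` example shows; the text does not state how such values are treated.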

[0064] A second aspect of the present invention provides a system for implementing the image-based lane line recognition processing method described in the first aspect above, the system comprising: a data receiving module, an image preprocessing module, and a lane line prediction model processing module;

[0065] The data receiving module is used to receive the first image;

[0066] The image preprocessing module is used to adjust the size of the first image according to a preset image size to obtain a corresponding second image; the preset image size is H0×W0×C0, where H0, W0, and C0 are the preset image height, width, and feature dimension, respectively.

[0067] The lane line prediction model processing module is used to input the second image into a preset lane line prediction model to perform lane line instance and attribute recognition processing to obtain the corresponding third image; the size of the third image is H0×W0×(C0+3); the third image adds three pixel-level semantic features compared to the second image, namely lane line index features, lane line type semantic features, and lane line color semantic features; the lane line type semantic features include multiple lane line types; the lane line color semantic features include multiple lane line colors.

[0068] A third aspect of the present invention provides an electronic device, including: a memory, a processor, and a transceiver;

[0069] The processor is used to couple with the memory, read and execute instructions in the memory to implement the steps of the method described in the first aspect above;

[0070] The transceiver is coupled to the processor, and the processor controls the transceiver to send and receive messages.

[0071] A fourth aspect of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the steps of the method described in the first aspect.

[0072] This invention provides a method, system, electronic device, and computer-readable storage medium for image-based lane line recognition. It constructs a lane line prediction model capable of simultaneously processing lane line instance segmentation and attribute recognition end-to-end. Based on this model, it performs lane line instance segmentation, lane line type attribute recognition, and lane line color attribute recognition on the input image. The end-to-end model provided by this invention avoids the redundant computation problem in conventional multi-model schemes, improving recognition efficiency and saving system computing resources.

Description of the Drawings

[0073] Figure 1 is a schematic diagram of the image-based lane line recognition processing method provided in Embodiment 1 of the present invention;

[0074] Figure 2 is a module structure diagram of the lane line prediction model provided in Embodiment 1 of the present invention;

[0075] Figure 3 is a module structure diagram of the image-based lane line recognition processing system provided in Embodiment 2 of the present invention;

[0076] Figure 4 is a schematic structural diagram of the electronic device provided in Embodiment 3 of the present invention.

Detailed Description of the Embodiments

[0077] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of this invention, and not all embodiments. Based on the embodiments of this invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this invention.

[0078] Through the image-based lane line recognition processing method provided in Embodiment 1 of the present invention, the vehicle's autonomous driving system can perform lane line instance segmentation, lane line type attribute recognition, and lane line color attribute recognition on the input image using an end-to-end model (i.e., the lane line prediction model described below). Figure 1 is a schematic diagram of the image-based lane line recognition processing method provided in Embodiment 1 of the present invention. As shown in Figure 1, the method mainly includes the following steps:

[0079] Step 1: Receive the first image.

[0080] Here, the first image is a two-dimensional image of the vehicle and its surrounding environment, output by the perception module of the vehicle's autonomous driving system. This image is obtained by the image sensor (such as a camera) of the perception module capturing the current environment.

[0081] Step 2: Adjust the size of the first image according to the preset image size to obtain the corresponding second image;

[0082] The preset image size is H0×W0×C0, where H0, W0, and C0 are the preset image height, width, and feature dimension, respectively.

[0083] Here, the input image size of the lane line prediction model in this embodiment of the invention is fixed at H0×W0×C0. Because the first images received may have different sizes, each first image must be resized to this fixed size. The resizing can be done in many ways, such as cropping, scaling, rotation, downsampling / interpolation, and other conventional processing methods, which are not described in detail here.
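Of the conventional resizing options listed, plain scaling can be sketched with nearest-neighbour sampling. The function name and the nested-list image layout are assumptions made for illustration only; a production system would more likely call an image library.

```python
def resize_nearest(image, out_h, out_w):
    """Scale an H x W x C nested-list image to out_h x out_w x C (nearest neighbour)."""
    in_h, in_w = len(image), len(image[0])
    return [[list(image[r * in_h // out_h][c * in_w // out_w])
             for c in range(out_w)]
            for r in range(out_h)]

# Toy example: a 2 x 2 RGB image scaled to a hypothetical preset size of 4 x 4.
first_image = [[[10, 10, 10], [20, 20, 20]],
               [[30, 30, 30], [40, 40, 40]]]
second_image = resize_nearest(first_image, 4, 4)
```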

[0084] Here, after obtaining the second image, the autonomous driving system of this embodiment of the invention will send it to the lane prediction model for processing through subsequent steps to obtain a third image with three additional pixel-level semantic features (lane line index feature, lane line type semantic feature, and lane line color semantic feature). Before describing the subsequent step 3 in detail, the lane prediction model of this embodiment of the invention will be described below.

[0085] As shown in Figure 2, which is the module structure diagram of the lane line prediction model provided in Embodiment 1 of the present invention, the lane line prediction model of this embodiment includes a three-layer FPN network, a binary segmentation network, a 2D convolutional network, a lane line instance recognition module, a lane line feature extraction module, a lane line attribute prediction network, and a lane line instance and attribute assignment module; wherein,

[0086] 1) Three-layer FPN (Feature Pyramid Networks)

[0087] The input of the three-layer FPN network is connected to the input of the lane line prediction model, and its outputs are connected to the inputs of the binary segmentation network and the 2D convolutional network, respectively. The three-layer FPN network includes a downsampling residual network side and an upsampling feature extraction network side. The downsampling residual network side includes first-level, second-level, and third-level residual units. The upsampling feature extraction network side includes first-level, second-level, and third-level feature extraction units. The input of the first-level residual unit is the input of the three-layer FPN network, and its output is connected to the input of the second-level residual unit and the first input of the first-level feature extraction unit, respectively. The output of the second-level residual unit is connected to the input of the third-level residual unit and the first input of the second-level feature extraction unit, respectively; the output of the third-level residual unit is connected to the input of the third-level feature extraction unit; the output of the third-level feature extraction unit is connected to the second input of the second-level feature extraction unit; the output of the second-level feature extraction unit is connected to the second input of the first-level feature extraction unit and the input of the binary segmentation network, respectively; the output of the first-level feature extraction unit is connected to the input of the 2D convolutional network; the first-level, second-level, and third-level residual units are by default the conv1, conv2_x, and conv3_x modules of the ResNet101 network;

[0088] Here, the three-layer FPN network is in fact a three-layer pyramid feature extraction network, consisting of a three-level downsampling residual network and a three-level upsampling feature extraction network. Pyramid feature extraction networks are widely used; their principles are described in the paper "Feature Pyramid Networks for Object Detection" and are not repeated here. It should be noted that this embodiment uses the conv1, conv2_x, and conv3_x modules of the ResNet101 network as the first-level, second-level, and third-level residual units on the downsampling residual network side to refine the granularity of feature extraction. On the downsampling residual network side, the output tensor size (H×W) of each residual unit is 1 / 4 of that of its input tensor (height and width are each halved), and the output feature dimension (C) of each residual unit is twice that of its input tensor. On the upsampling feature extraction network side, the third-level feature extraction unit performs a convolution operation that preserves both shape and feature dimension on the output tensor of the third-level residual unit and outputs the corresponding third-level feature map. The second-level feature extraction unit upsamples the third-level feature map by 2x in tensor size and halves its feature dimension to obtain the corresponding upsampled feature map, and then fuses this upsampled feature map with the output tensor of the second-level residual unit through tensor summation to obtain the corresponding second-level feature map. The first-level feature extraction unit upsamples the second-level feature map by 2x in tensor size and halves its feature dimension to obtain the corresponding upsampled feature map, and then fuses this upsampled feature map with the output tensor of the first-level residual unit through tensor summation to obtain the corresponding first-level feature map. This embodiment uses the first-level and second-level feature maps output by the three-layer FPN network: the second-level feature map is input into the binary segmentation network for binary map construction, and the first-level feature map is input into the 2D convolutional network for feature dimensionality reduction.

[0089] 2) Binary segmentation network

[0090] The output of the binary segmentation network is connected to the input of the lane line instance recognition module. The binary segmentation network includes a first convolutional unit, a second convolutional unit, a feature vector transformation unit, a first multilayer sensing unit, a second multilayer sensing unit, and a binary image transformation unit. The input of the first convolutional unit is the input of the binary segmentation network, and its output is connected to the input of the second convolutional unit. The output of the second convolutional unit is connected to the input of the feature vector transformation unit. The output of the feature vector transformation unit is connected to the input of the first multilayer sensing unit. The output of the first multilayer sensing unit is connected to the input of the second multilayer sensing unit. The output of the second multilayer sensing unit is connected to the input of the binary image transformation unit. The output of the binary image transformation unit is connected to the input of the lane line instance recognition module.

[0091] The binary segmentation network of this embodiment of the invention retains the feature dimension of the second-level feature map and performs 2x upsampling on its tensor size to obtain an upsampled feature map whose size matches that of the first-level feature map and whose feature granularity is richer. This upsampled feature map is then passed through two convolutional units (the first and second convolutional units), with the tensor size (H×W) unchanged, to obtain the corresponding convolutional tensor. The convolutional tensor is then flattened from three dimensions to one dimension by the feature vector transformation unit to obtain the corresponding one-dimensional global feature vector. Next, the fully connected layer and the normalization layer of the first multilayer sensing unit (a multi-layer perceptron, MLP) perform pixel-level linear feature calculation and normalization on the global feature vector to obtain the corresponding normalized vector. The fully connected layer and the softmax function layer of the second multilayer sensing unit (also an MLP) then perform pixel-level lane line pixel scoring based on the normalized vector and output the corresponding score tensor. The tensor size (H×W) of the score tensor is consistent with the size (H×W) of the first-level feature map, and its feature dimension is 2, consisting of a positive class score and a negative class score. The positive class score represents the probability that the corresponding pixel is a lane line pixel, and the negative class score represents the probability that the corresponding pixel is not a lane line pixel. After obtaining the score tensor, the binary segmentation network uses the binary map transformation unit to construct a first binary map based on the score tensor. The size (H×W) of the first binary map is consistent with the size (H×W) of the first-level feature map, and its feature dimension is 1. The pixel value of the first binary map is set to 1 if the positive class score is greater than the negative class score, and 0 otherwise. Thus, through the binary segmentation network, this embodiment of the invention obtains a binary map with lane line pixels as foreground points and non-lane line pixels as background points.
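The final thresholding rule of the binary map transformation unit is simple enough to show directly. In this sketch the score tensor is a nested list of shape H×W×2, and storing the positive class score first is an assumption.

```python
def to_binary_map(score_tensor):
    """Convert H x W x 2 scores ([positive, negative] per pixel) into an H x W binary map."""
    return [[1 if positive > negative else 0 for positive, negative in row]
            for row in score_tensor]

# Toy 2 x 3 score tensor; each pair sums to 1 as softmax outputs would.
score_tensor = [[[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]],
                [[0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]]
first_binary_map = to_binary_map(score_tensor)
```

A tie (0.5 against 0.5) falls into the "otherwise" branch and is marked as background.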

[0092] 3) Lane line instance recognition module

[0093] The output of the lane line instance recognition module is connected to the first input of the lane line feature extraction module and the first input of the lane line instance and attribute assignment module, respectively.

[0094] After obtaining the first binary image output by the binary segmentation network, the lane line instance recognition module of this embodiment of the invention first performs a binary image reconstruction: that is, it scans line by line from the bottom line to the top line. During the scanning process, isolated lane line pixels in each line are retained, and for multiple consecutive lane line pixels, only the middle pixel is retained. This results in a new binary image, namely the second binary image mentioned below. In the second binary image, the lane line pixels in each line are isolated.
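The binary image reconstruction can be sketched as a per-row thinning pass. Because each row is processed independently, the bottom-to-top scan order does not affect the result. For runs of even length the text does not say which of the two central pixels is kept; the sketch keeps the left one.

```python
def thin_rows(binary_map):
    """Keep isolated lane pixels; reduce each run of consecutive lane pixels to its middle pixel."""
    result = []
    for row in binary_map:
        new_row = [0] * len(row)
        c = 0
        while c < len(row):
            if row[c] == 1:
                start = c
                while c < len(row) and row[c] == 1:
                    c += 1
                # The run covers columns start .. c-1; keep its middle pixel.
                new_row[(start + c - 1) // 2] = 1
            else:
                c += 1
        result.append(new_row)
    return result

first_binary_map = [[0, 1, 1, 1, 0, 1, 0, 1, 1],
                    [1, 1, 0, 0, 0, 0, 0, 0, 1]]
second_binary_map = thin_rows(first_binary_map)
```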

[0095] After obtaining the second binary image, the lane line instance recognition module of this embodiment of the invention re-scans it line by line from the bottom row to the top row.

[0096] When scanning the bottom row of the second binary image, every isolated lane line pixel is regarded as the starting point of a lane line, and a lane line instance coordinate vector whose fixed length equals the height H of the first-level feature map is created for each lane line, namely the first lane line coordinate vector below. The first lane line coordinate vector consists of H first lane line coordinates initialized to zero coordinates, and the first entry of each newly created first lane line coordinate vector is updated to the starting point coordinate of the corresponding lane line.

[0097] When scanning the other rows of the second binary image, each isolated lane line pixel in the current row is traversed. During traversal, for the lane line instance coordinate vectors (first lane line coordinate vectors) of all previously created lane line instances, the absolute value of the lateral coordinate difference between the currently traversed lane line pixel and the nearest non-zero coordinate in each vector is calculated, and the coordinate point corresponding to the minimum value is regarded as the reference point nearest to the currently traversed lane line pixel. If the absolute value of the lateral coordinate difference between the currently traversed lane line pixel and the reference point meets a preset threshold, the two are regarded as belonging to the same lane line instance. In this case, the pixel coordinate of the currently traversed lane line pixel is written into the lane line instance coordinate vector (first lane line coordinate vector) corresponding to the reference point: in that vector, the first lane line coordinate whose row index is aligned with the row index of the currently traversed lane line pixel is reset to the pixel coordinate of the currently traversed lane line pixel. Conversely, if the absolute value of the lateral coordinate difference between the currently traversed lane line pixel and the reference point does not meet the preset threshold, the currently traversed lane line pixel is regarded as the starting point of a new lane line. A lane line instance coordinate vector (first lane line coordinate vector) whose fixed length equals the height H of the first-level feature map is created for the new lane line, and the pixel coordinate of the currently traversed lane line pixel is written into it: in the newly created first lane line coordinate vector, the first lane line coordinate whose row index is aligned with the row index of the currently traversed lane line pixel is reset to the pixel coordinate of the currently traversed lane line pixel.
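The row-by-row grouping described above can be sketched as follows. Several points are interpretations rather than statements from the text: unset entries are represented by `None` instead of zero coordinates, "meets the preset threshold" is read as "difference not greater than the threshold", the nearest coordinate of a lane is taken to be its most recently set entry from the rows below, and index 0 of each vector corresponds to the bottom row.

```python
def group_lanes(binary_map, threshold):
    """Group isolated lane pixels into lane instances, scanning from bottom row to top row."""
    h = len(binary_map)
    lanes = []                                  # each lane: list of H entries, None = unset
    for row in range(h - 1, -1, -1):
        j = h - 1 - row                         # vector index aligned with this row
        for col, value in enumerate(binary_map[row]):
            if value != 1:
                continue
            best = None                         # (lateral difference, lane)
            for lane in lanes:
                # Most recently set coordinate of this lane among the rows below.
                ref = next((p for p in reversed(lane[:j]) if p is not None), None)
                if ref is not None:
                    diff = abs(col - ref[1])
                    if best is None or diff < best[0]:
                        best = (diff, lane)
            if best is not None and best[0] <= threshold:
                best[1][j] = (row, col)         # same instance: update the aligned entry
            else:
                lane = [None] * h               # starting point of a new lane instance
                lane[j] = (row, col)
                lanes.append(lane)
    return lanes

second_binary_map = [[0, 0, 1, 0, 0, 1, 0, 0],
                     [0, 1, 0, 0, 0, 1, 0, 0],
                     [0, 1, 0, 0, 0, 0, 1, 0],
                     [1, 0, 0, 0, 0, 0, 0, 1]]
lanes = group_lanes(second_binary_map, threshold=1)
num_instances = len(lanes)                      # the number of lane line instances B
```

In the bottom row no lane has an earlier entry, so every pixel starts a new instance, which matches the special handling of the bottom row without a separate branch.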

[0098] The lane line instance recognition module of this embodiment of the invention will eventually obtain multiple lane line instance coordinate vectors (first lane line coordinate vectors) from the second binary image through the above-described line-by-line scanning method. By counting the number of first lane line coordinate vectors, the corresponding number of lane line instances B can be obtained. Then, by combining the first lane line coordinate vectors of the number of lane line instances B, a lane line instance segmentation tensor that can reflect the coordinate information of all lane line instances can be obtained, namely the first lane line coordinate tensor in the following text.

[0099] 4) 2D Convolutional Networks

[0100] The output of the 2D convolutional network is connected to the second input of the lane line feature extraction module;

[0101] The 2D convolutional network of this invention performs 2D convolution operations on the input first-level feature map to obtain a first feature map with the same size (H×W) as the first-level feature map, but with the feature dimension becoming 1 / 4 of the feature dimension of the first-level feature map; here, 2D convolution actually refers to the convolution kernel being a two-dimensional convolution kernel;

[0102] 5) Lane line feature extraction module

[0103] The output of the lane line feature extraction module is connected to the input of the lane line attribute prediction network;

[0104] The lane line feature extraction module of this embodiment of the invention has two inputs: a first feature map output by a 2D convolutional network and a first lane line coordinate tensor output by a lane line instance recognition module. Since the coordinate positions of each lane line instance can be obtained from the first lane line coordinate tensor and the pixel features of each coordinate can be obtained from the first feature map, the lane line feature extraction module actually obtains the pixel features corresponding to each lane line instance on the first feature map based on the first lane line coordinate vector in the first lane line coordinate tensor that reflects the coordinate information of each lane line instance, and constructs the corresponding lane line feature matrix, i.e., the first lane line feature matrix mentioned below. Then, it flattens each first lane line feature matrix from a two-dimensional matrix to a one-dimensional vector to obtain the corresponding first lane line feature vector. Finally, the first lane line feature vectors of the obtained number B lane line instances are used to form the corresponding first lane line feature tensor.
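The gather-and-flatten behaviour of the lane line feature extraction module can be sketched as follows. Each lane is given as a list of H entries holding `(row, col)` or `None`; using `None` for unset entries and filling them with zero features is an assumption made so that all feature vectors have the same length H×C4.

```python
def extract_lane_features(feature_map, lanes):
    """Gather per-lane pixel features and flatten each H x C matrix into one vector."""
    channels = len(feature_map[0][0])
    tensor = []
    for lane in lanes:
        # Lane feature matrix: one row of C features per lane coordinate.
        matrix = [list(feature_map[p[0]][p[1]]) if p is not None else [0.0] * channels
                  for p in lane]
        tensor.append([v for features in matrix for v in features])
    return tensor

# Toy first feature map with H = W = 2 and C4 = 2, and B = 2 lane instances.
first_feature_map = [[[1.0, 2.0], [3.0, 4.0]],
                     [[5.0, 6.0], [7.0, 8.0]]]
lanes = [[(1, 0), (0, 1)], [(1, 1), None]]
lane_feature_tensor = extract_lane_features(first_feature_map, lanes)
```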

[0105] 6) Lane line attribute prediction network

[0106] The output of the lane line attribute prediction network is connected to the second and third inputs of the lane line instance and attribute allocation module; the lane line attribute prediction network includes a type attribute prediction unit and a color attribute prediction unit; the input of the type attribute prediction unit is connected to the output of the lane line feature extraction module, and the output is connected to the second input of the lane line instance and attribute allocation module; the input of the color attribute prediction unit is connected to the output of the lane line feature extraction module, and the output is connected to the third input of the lane line instance and attribute allocation module.

[0107] The lane line attribute prediction network of this embodiment of the invention may include multiple attribute prediction units, each of which performs independent attribute prediction using the first lane line feature tensor output by the lane line feature extraction module as input. In this embodiment, two specific attribute prediction units are provided: a type attribute prediction unit and a color attribute prediction unit. The type attribute prediction unit performs classification prediction of the lane line type attribute of each lane line instance based on the first lane line feature tensor output by the lane line feature extraction module. The lane line type refers to the line type of the lane line, such as dashed line type, solid line type, etc. The color attribute prediction unit performs classification prediction of the lane line color attribute of each lane line instance based on the first lane line feature tensor output by the lane line feature extraction module. The lane line color refers to the color of the lane line, such as yellow, white, etc. The type attribute prediction unit and the color attribute prediction unit have similar model structures: both are classification prediction networks based on a multilayer perceptron (MLP) network (fully connected layer + normalization layer + fully connected layer + softmax layer). It should be noted that, because each attribute prediction unit in the lane line attribute prediction network is independent, the network can be extended horizontally with further attribute prediction units. That is to say, in addition to the type attribute prediction unit and the color attribute prediction unit given here, other attribute prediction units can be added according to actual application needs.

[0108] 7) Lane line instance and attribute assignment module

[0109] The fourth input of the lane line instance and attribute assignment module is the input of the lane line prediction model;

[0110] The inputs to the lane line instance and attribute assignment module in this embodiment of the invention include the initial input image of the lane line prediction model, i.e., the second image; the lane line instance segmentation tensor output by the lane line instance recognition module of the lane line prediction model, i.e., the first lane line coordinate tensor; and the lane line instance attribute classification prediction tensors output by the lane line attribute prediction network, i.e., the first and second prediction tensors. The lane line instance and attribute assignment module first constructs a first semantic feature map for each lane line instance, based on the first lane line coordinate vector, the first prediction vector, and the second prediction vector corresponding to that instance in the first lane line coordinate tensor and the first and second prediction tensors. Each pixel of the first semantic feature map has three semantic features: the lane line index feature, the lane line type semantic feature, and the lane line color semantic feature. For each pixel in the first semantic feature map that belongs to the corresponding lane line instance, the three semantic features are non-zero; for all remaining pixels, the three semantic features are 0. After obtaining the first semantic feature maps of all lane line instances, i.e., B first semantic feature maps, the module fuses the B first semantic feature maps by feature map addition to obtain the corresponding second semantic feature map. Because the image size (H×W) of the second semantic feature map is the same as the size (H×W) of the first-level feature map but differs from that of the second image, the module also upsamples the second semantic feature map by bilinear interpolation, according to the size ratio between the second image and the second semantic feature map, to obtain a third semantic feature map with the same size (H×W) as the second image. Finally, by superimposing the third semantic feature map on the second image, a third image with three added semantic features is obtained. This third image carries not only the original image information of the second image but also the instance segmentation information and attribute classification information of each lane line.

[0111] As described above, the lane line prediction model constructed in this embodiment of the invention can perform lane line instance segmentation, lane line type attribute recognition, and lane line color attribute recognition on the input image. Furthermore, it should be noted that when training the lane line prediction model, this embodiment of the invention can use the focal loss function to evaluate the classification error output by each attribute prediction unit in the lane line attribute prediction network, and use the L1 loss function to evaluate the lane line instance localization error (such as instance length error, instance lateral offset error, etc.) output by the lane line instance recognition module.
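The two training losses named in this paragraph can be sketched as follows, assuming the classification loss is the standard focal loss FL(p_t) = -α(1 - p_t)^γ log(p_t). The values γ = 2 and α = 1, the use of a mean in the L1 loss, and the function names are illustrative assumptions; the text does not give the hyperparameters.

```python
import math

def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: probs is a softmax output, target the true class index."""
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

def l1_loss(predicted, expected):
    """Mean absolute error, e.g. between predicted and labelled lateral coordinates."""
    return sum(abs(a - b) for a, b in zip(predicted, expected)) / len(predicted)

uncertain = focal_loss([0.5, 0.5], target=0)     # poorly classified sample
confident = focal_loss([0.99, 0.01], target=0)   # well classified sample, strongly down-weighted
offset_error = l1_loss([10.0, 12.0], [11.0, 14.0])
```

The factor (1 - p_t)^γ makes confident, correct predictions contribute far less than uncertain ones, which is the usual motivation for focal loss on class-imbalanced data.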

[0112] Step 3: Input the second image into the preset lane line prediction model to perform lane line instance and attribute recognition processing to obtain the corresponding third image;

[0113] The third image has a size of H0×W0×(C0+3). Compared with the second image, the third image adds three pixel-level semantic features: lane line index feature, lane line type semantic feature, and lane line color semantic feature. The lane line type semantic feature includes multiple lane line types. The lane line color semantic feature includes multiple lane line colors.

[0114] Specifically, this includes: Step 31, where the lane line prediction model sends the second image input to the model into a three-layer FPN network for feature extraction to obtain the corresponding first-level and second-level feature maps;

[0115] The first-level feature map has a shape of H1×W1×C1, where H1, W1, and C1 are the height, width, and feature dimension of the first-level feature map, respectively. H1 = H0 / 2, W1 = W0 / 2, and the feature dimension C1 is 64 by default. The second-level feature map has a shape of H2×W2×C2, where H2, W2, and C2 are the height, width, and feature dimension of the second-level feature map, respectively. H2 = H1 / 2, W2 = W1 / 2, and C2 = C1*2.

[0116] Specifically, it includes: step 311, inputting the second image with shape H0×W0×C0 into the first-level residual unit to perform downsampling residual operation to obtain the corresponding first-level downsampling feature tensor;

[0117] The shape of the first-level downsampling feature tensor is H1×W1×C1;

[0118] Step 312: Input the first-level downsampled feature tensor into the second-level residual unit to perform downsampled residual calculation to obtain the corresponding second-level downsampled feature tensor;

[0119] The shape of the second-level downsampled feature tensor is H2×W2×C2;

[0120] Step 313: Input the second-level downsampled feature tensor into the third-level residual unit to perform downsampled residual calculation to obtain the corresponding third-level downsampled feature tensor;

[0121] The shape of the third-level downsampling feature tensor is H3×W3×C3, where H3, W3, and C3 are the height, width, and feature dimension of the third-level downsampling feature tensor, respectively, and H3 = H2 / 2, W3 = W2 / 2, C3 = C2*2.
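The shape relations of steps 311 to 313 can be checked with a few lines of arithmetic. The helper name and the 512×1024 input size are purely illustrative; only the halving and doubling rules and the default C1 = 64 come from the text.

```python
def pyramid_shapes(h0, w0, c1=64):
    """Return the (H, W, C) shapes of the three downsampled feature tensors."""
    shapes = []
    h, w, c = h0 // 2, w0 // 2, c1           # level 1: H1 = H0 / 2, W1 = W0 / 2, C1 = 64
    for _ in range(3):
        shapes.append((h, w, c))
        h, w, c = h // 2, w // 2, c * 2      # each further level halves H and W, doubles C
    return shapes

shapes = pyramid_shapes(512, 1024)           # hypothetical preset H0 x W0
```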

[0122] Step 314: Input the third-level downsampled feature tensor into the third-level feature extraction unit; and the third-level feature extraction unit performs convolution operation on the third-level downsampled feature tensor based on a preset 3×3 first convolution kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding third-level feature map.

[0123] The shape of the third-level feature map matches that of the third-level downsampled feature tensor, namely H3×W3×C3;

[0124] Step 315: Input the second-level downsampled feature tensor and the third-level feature map into the second-level feature extraction unit; the second-level feature extraction unit performs 2x upsampling and feature dimensionality reduction on the third-level feature map to obtain the corresponding first upsampled feature map with shape H2×W2×C2, and performs tensor summation on the second-level downsampled feature tensor and the first upsampled feature map to obtain the corresponding second-level feature map;

[0125] Step 316: Input the first-level downsampled feature tensor and the second-level feature map into the first-level feature extraction unit; the first-level feature extraction unit performs 2x upsampling and feature dimensionality reduction on the second-level feature map to obtain the corresponding second upsampled feature map with shape H1×W1×C1, and performs tensor summation on the first-level downsampled feature tensor and the second upsampled feature map to obtain the corresponding first-level feature map;
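The shape bookkeeping of steps 311 to 316 can be sketched as follows. This is a minimal NumPy illustration, not the patented network: the internals of the downsampling residual units are not given in the text, so `downsample_stub` (2×2 average pooling plus a random channel projection) is a hypothetical stand-in, and `upsample_reduce` assumes nearest-neighbour upsampling with a 1×1 projection for the "feature dimensionality reduction". The 3×3 convolution of step 314 preserves shape and is replaced by an identity here.

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample_stub(x, c_out):
    """Hypothetical stand-in for a downsampling residual unit:
    2x2 average pooling followed by a random 1x1 channel projection."""
    h, w, c = x.shape
    pooled = x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    proj = rng.standard_normal((c, c_out)) / np.sqrt(c)
    return pooled @ proj

def upsample_reduce(x, c_out):
    """Assumed form of '2x upsampling and feature dimensionality reduction':
    nearest-neighbour upsampling, then a random 1x1 channel projection."""
    up = x.repeat(2, axis=0).repeat(2, axis=1)
    proj = rng.standard_normal((x.shape[2], c_out)) / np.sqrt(x.shape[2])
    return up @ proj

H0, W0, C0, C1 = 64, 128, 3, 64            # example sizes; C1 = 64 as in the text
image = rng.standard_normal((H0, W0, C0))  # the "second image"

d1 = downsample_stub(image, C1)            # step 311: H1 x W1 x C1
d2 = downsample_stub(d1, 2 * C1)           # step 312: H2 x W2 x C2
d3 = downsample_stub(d2, 4 * C1)           # step 313: H3 x W3 x C3

p3 = d3                                    # step 314: shape-preserving conv (identity here)
p2 = d2 + upsample_reduce(p3, d2.shape[2]) # step 315: second-level feature map
p1 = d1 + upsample_reduce(p2, d1.shape[2]) # step 316: first-level feature map
```

With these example sizes, `p1` has shape 32×64×64 and `p2` has shape 16×32×128, matching H1 = H0 / 2, H2 = H1 / 2 and C2 = C1*2.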

[0126] Step 32: Input the first-level feature map into a 2D convolutional network for 2D convolution processing to obtain the corresponding first feature map;

[0127] The shape of the first feature map is H1×W1×C4; C4 is the feature dimension of the first feature map, C4=C1 / 4;

[0128] Specifically, this includes: inputting the first-level feature map into a 2D convolutional network; and having the 2D convolutional network perform convolution operations on the first-level feature map based on a preset 3×3 second convolutional kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding first feature map;
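A 3×3 kernel with padding 1 and stride 1 leaves the height and width unchanged, which is why the first feature map keeps the H1×W1 extent and only the feature dimension changes from C1 to C4. The sketch below shows this with plain NumPy; the function name `conv3x3_same` and the random kernel are illustrative assumptions, and the operation is the cross-correlation form used by most deep learning frameworks.

```python
import numpy as np

def conv3x3_same(x, kernel):
    """3x3 convolution with padding 1 and stride 1.

    x:      (H, W, C_in) feature map
    kernel: (3, 3, C_in, C_out) weights
    returns (H, W, C_out)
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero padding of 1 on each side
    out = np.zeros((h, w, kernel.shape[3]))
    for dy in range(3):
        for dx in range(3):
            # Shifted window times the (C_in, C_out) slice of the kernel.
            out += padded[dy:dy + h, dx:dx + w, :] @ kernel[dy, dx]
    return out

rng = np.random.default_rng(0)
H1, W1, C1 = 8, 12, 64
C4 = C1 // 4
first_level = rng.standard_normal((H1, W1, C1))
first_feature_map = conv3x3_same(first_level, rng.standard_normal((3, 3, C1, C4)))
```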

[0129] Step 33: Input the second-level feature map into the binary segmentation network to perform binary image segmentation processing to obtain the corresponding first binary image;

[0130] The shape of the first binary image is H1×W1×1;

[0131] Specifically, it includes: Step 331, where the binary segmentation network performs 2x upsampling on the input second-level feature map with shape H2×W2×C2, leaving the feature dimension unchanged, to obtain the corresponding third upsampled feature map;

[0132] The shape of the third upsampled feature map is H1×W1×C2;

[0133] Step 332: Input the third upsampled feature map into the first convolution unit; and the first convolution unit performs convolution operation on the third upsampled feature map based on a preset 3×3 third convolution kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding first convolution tensor.

[0134] Step 333: Input the first convolution tensor into the second convolution unit; and the second convolution unit performs convolution operation on the first convolution tensor based on a preset 3×3 fourth convolution kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding second convolution tensor.

[0135] The shape of the second convolution tensor is H1×W1×C1;

[0136] Step 334: Input the second convolution tensor into the feature vector transformation unit; and the feature vector transformation unit performs tensor flattening on the second convolution tensor to obtain the corresponding global feature vector;

[0137] The shape of the global feature vector is (H1*W1*C1)×1;

[0138] Step 335: Input the global feature vector into the first multilayer perceptron; and use the first fully connected layer of the first multilayer perceptron to perform linear calculation on the global feature vector to obtain the corresponding first fully connected vector; and use the first normalization layer of the first multilayer perceptron to perform normalization calculation on the first fully connected vector to obtain the corresponding first normalized vector.

[0139] The first multilayer perceptron includes a first fully connected layer and a first normalization layer; the output of the first fully connected layer is connected to the input of the first normalization layer.

[0140] Step 336: Input the first normalized vector into the second multilayer perceptron; and use the second fully connected layer of the second multilayer perceptron to perform linear calculation on the first normalized vector to obtain the corresponding second fully connected vector; and use the first softmax function layer of the second multilayer perceptron to perform positive and negative class score regression prediction based on the second fully connected vector to obtain the corresponding first score tensor.

[0141] The second multilayer perceptron includes a second fully connected layer and a first softmax function layer; the output of the second fully connected layer is connected to the input of the first softmax function layer; the shape of the first score tensor is H1×W1×2; the first score tensor includes H1*W1 first score vectors of length 2, and each first score vector includes a first positive class score and a first negative class score;

[0142] Step 337: Input the first score tensor into the binary image conversion unit; the binary image conversion unit initializes an all-zero feature map of shape H1×W1×1 and traverses each first score vector of the first score tensor; during traversal, it checks whether the first positive class score of the currently traversed first score vector is greater than the first negative class score; if so, it sets the first feature data corresponding to the currently traversed first score vector in the all-zero feature map to 1; and at the end of the traversal, it outputs the updated feature map as the corresponding first binary image.

[0143] The all-zero feature map at initialization consists of H1*W1 first feature data initialized to 0, and the first feature data correspond one-to-one with the first score vectors;
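The conversion in step 337 amounts to a per-pixel comparison of the two class scores. A minimal NumPy sketch follows; the channel order (positive score first, negative score second) and the name `scores_to_binary` are assumptions made for illustration.

```python
import numpy as np

def scores_to_binary(score_tensor):
    """Turn an (H1, W1, 2) score tensor into an (H1, W1, 1) binary image.

    Assumed layout: [..., 0] is the positive class score,
    [..., 1] is the negative class score.
    """
    h, w, _ = score_tensor.shape
    binary = np.zeros((h, w, 1), dtype=np.uint8)  # all-zero feature map
    # Set a pixel to 1 only where the positive score is strictly greater.
    binary[score_tensor[..., 0] > score_tensor[..., 1]] = 1
    return binary

scores = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.5, 0.5], [0.6, 0.4]]])
first_binary_image = scores_to_binary(scores)
```

Because the comparison is strict, a pixel whose two scores are equal stays 0, consistent with the "greater than" condition in the text.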

[0144] Step 34: Input the first binary image into the lane line instance recognition module to perform lane line instance recognition processing to obtain the corresponding first lane line coordinate tensor;

[0145] The shape of the first lane line coordinate tensor is B×H1×2, where B is the number of lane line instances; the first lane line coordinate tensor includes B first lane line coordinate vectors {p_{k,j}} of length H1, with first lane index 1≤k≤B and coordinate index 1≤j≤H1; each first lane line coordinate vector {p_{k,j}} includes H1 first lane line coordinates p_{k,j}, and each first lane line coordinate p_{k,j} includes an x-axis coordinate and a y-axis coordinate;

[0146] Specifically, it includes: Step 341, where the lane line instance recognition module divides the first binary image into H1 first row pixel sequences, ordered from bottom to top;

[0147] The 1st first row pixel sequence is the bottom row of the first binary image, and the H1-th first row pixel sequence is the top row of the first binary image; each first row pixel sequence includes W1 first row pixels; the pixel value of each first row pixel is 0 or 1.

[0148] Step 342: Traverse each first row pixel sequence of the first binary image. During traversal, take the currently traversed first row pixel sequence as the current row pixel sequence, and record each pixel subsequence formed by consecutive first row pixels with a value of 1 as a first subsequence. For each first subsequence in the current row pixel sequence, identify whether its sequence length is odd. If the sequence length is odd, keep the pixel value of the first row pixel at the middle position of the first subsequence as 1 and set the pixel values of all other first row pixels in that subsequence to 0. If the sequence length is even, select one of the two first row pixels at the middle position of the first subsequence as the corresponding first retention point, keep the pixel value of the first retention point as 1, and set the pixel values of all other first row pixels in that subsequence to 0. At the end of the traversal, use the modified first binary image as the corresponding second binary image.

[0149] From bottom to top, the second binary image includes H1 second row pixel sequences L_i, with row index 1≤i≤H1; the second row pixel sequence L_{i=1} is the bottom row of the second binary image, and the second row pixel sequence L_{i=H1} is the top row of the second binary image; each second row pixel sequence L_i includes W1 second row pixels; the pixel value of each second row pixel is 0 or 1; and no second row pixel sequence L_i contains multiple consecutive second row pixels whose pixel values are all 1.
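The row-wise thinning of step 342 can be sketched as below. The text allows either of the two middle pixels to be kept for an even-length run; this sketch assumes the left one. The function name `thin_rows` is illustrative only.

```python
import numpy as np

def thin_rows(binary):
    """Keep a single middle pixel of every horizontal run of 1s.

    binary: (H, W) array of 0/1 values. Returns a new array of the same shape.
    """
    out = np.zeros_like(binary)
    h, w = binary.shape
    for r in range(h):
        c = 0
        while c < w:
            if binary[r, c] == 1:
                start = c
                while c < w and binary[r, c] == 1:
                    c += 1
                # The run covers columns [start, c). For an odd length this is
                # the exact middle; for an even length it is the left of the
                # two middle pixels (an assumption, the text permits either).
                out[r, start + (c - start - 1) // 2] = 1
            else:
                c += 1
    return out

row = np.array([[0, 1, 1, 1, 0, 1, 1, 0, 1]])
thinned_row = thin_rows(row)
```

For the example row, the run of three keeps its centre, the run of two keeps its left pixel, and the isolated pixel is unchanged.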

[0150] Step 343: Initialize the first index A to 1; and initialize the first lane index k to 0;

[0151] Step 344: In the second row pixel sequence L_{i=1}, search one by one from left to right for second row pixels with a pixel value of 1. Each time such a pixel is found, take it as the current search point and take its pixel coordinates in the second binary image as the current point coordinates; increment the first lane index k by 1; add a new all-zero coordinate vector of length H1 as the corresponding first lane line coordinate vector {p_{k,j}}; and, in the newly added first lane line coordinate vector {p_{k,j}}, set the first lane line coordinate p_{k,j} whose coordinate index j matches the first index A to the current point coordinates. Then proceed to step 346;

[0152] Step 345: In the second row pixel sequence L_{i=A}, search one by one from left to right for second row pixels with a pixel value of 1. Each time such a pixel is found, take it as the current search point and take its pixel coordinates in the second binary image as the current point coordinates. In each existing first lane line coordinate vector {p_{k,j}}, take the first non-zero first lane line coordinate p_{k,j} found before the first index A as the corresponding first lane line previous coordinate. Calculate the absolute value of the lateral coordinate difference between the current point coordinates and each first lane line previous coordinate to obtain the corresponding first lateral coordinate difference; select the minimum of all first lateral coordinate differences as the minimum lateral coordinate difference; and identify whether the minimum lateral coordinate difference exceeds a preset lateral coordinate difference threshold. If it does not, then in the first lane line coordinate vector {p_{k,j}} corresponding to the first lane line previous coordinate that yields the minimum lateral coordinate difference, set the first lane line coordinate p_{k,j} whose coordinate index j matches the first index A to the current point coordinates. If it does, increment the first lane index k by 1, add a new all-zero coordinate vector of length H1 as the corresponding first lane line coordinate vector {p_{k,j}}, and, in the newly added vector, set the first lane line coordinate p_{k,j} whose coordinate index j matches the first index A to the current point coordinates;

[0153] Step 346: Increment the first index A by 1; and check whether the first index A after incrementing by 1 is greater than H1; if yes, proceed to step 347; if no, proceed to step 345.

[0154] Step 347: Take the latest value of the first lane index k as the corresponding lane line instance number B; and form the corresponding first lane line coordinate tensor from the obtained B first lane line coordinate vectors {p_{k,j}};
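Steps 343 to 347 describe a greedy bottom-to-top grouping of thinned pixels into lane instances. The sketch below is one possible reading, with several labelled assumptions: the "previous coordinate" of a lane is taken to be its most recently filled entry before the current row; a separate boolean mask (`filled`) tracks which entries are set, so a genuine coordinate at column 0 is not confused with an unset all-zero entry; ties go to the lane with the lower index; and a pixel starts a new lane when no lane has a previous coordinate. The names `group_lanes` and `max_dx` (the lateral coordinate difference threshold) are hypothetical.

```python
import numpy as np

def group_lanes(thinned, max_dx):
    """Group thinned lane pixels into instances, scanning bottom to top.

    thinned: (H1, W1) 0/1 array with row 0 at the top of the image.
    Returns a (B, H1, 2) integer array of (x, y) coordinates, where entry j
    belongs to the j-th row counted from the bottom.
    """
    h1, _ = thinned.shape
    coords, filled = [], []            # per lane: (H1, 2) coordinates and (H1,) mask
    for a in range(h1):                # a = 0 corresponds to first index A = 1
        y = h1 - 1 - a                 # image row for this bottom-up index
        for x in np.flatnonzero(thinned[y]):      # left to right
            best, best_dx = None, None
            for k in range(len(coords)):
                prev = np.flatnonzero(filled[k][:a])  # entries set before this row
                if prev.size == 0:
                    continue
                dx = abs(int(x) - int(coords[k][prev[-1], 0]))
                if best_dx is None or dx < best_dx:
                    best, best_dx = k, dx
            if best is None or best_dx > max_dx:
                # No lane is close enough: start a new lane instance.
                coords.append(np.zeros((h1, 2), dtype=np.int64))
                filled.append(np.zeros(h1, dtype=bool))
                best = len(coords) - 1
            coords[best][a] = (x, y)
            filled[best][a] = True
    if not coords:
        return np.zeros((0, h1, 2), dtype=np.int64)
    return np.stack(coords)

thinned = np.array([
    [0, 0, 1, 0, 1, 0, 0],   # top row
    [0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],   # bottom row
])
lanes = group_lanes(thinned, max_dx=1)
```

In this example two lanes are found: one drifting right from column 1 to column 2, the other drifting left from column 5 to column 4. If two pixels in one row match the same lane, the later one overwrites the earlier one, which mirrors a literal reading of step 345.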

[0155] Step 35: Input the first lane line coordinate tensor and the first feature map into the lane line feature extraction module for lane line feature fusion processing to obtain the corresponding first lane line feature tensor;

[0156] The shape of the first lane line feature tensor is B×(H1*C4); the first lane line feature tensor includes B first lane line feature vectors, one per lane line instance.

[0157] Specifically, this includes: inputting the first lane line coordinate tensor and the first feature map into the lane line feature extraction module; for each first lane line coordinate vector {p_{k,j}} of the first lane line coordinate tensor, the lane line feature extraction module marks, in order, the H1 first feature map pixels on the first feature map that correspond to the H1 first lane line coordinates p_{k,j}; extracts the C4 feature data of each first feature map pixel to form a first pixel feature vector of length C4; forms a first lane line feature matrix of shape H1×C4 from the H1 first pixel feature vectors; flattens the first lane line feature matrix into a one-dimensional vector to obtain a first lane line feature vector of shape (H1*C4)×1; and forms a first lane line feature tensor of shape B×(H1*C4) from the B first lane line feature vectors.
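The feature extraction of step 35 is a gather along each lane's coordinates followed by a flatten. A minimal NumPy sketch using integer array indexing follows; the name `gather_lane_features` is illustrative, and the sketch samples every coordinate entry as-is, so an unset all-zero entry would read the feature at pixel (0, 0).

```python
import numpy as np

def gather_lane_features(coords, feature_map):
    """Collect per-lane features along lane coordinates.

    coords:      (B, H1, 2) integer (x, y) coordinates
    feature_map: (H1, W1, C4) first feature map
    returns      (B, H1 * C4) first lane line feature tensor
    """
    xs, ys = coords[..., 0], coords[..., 1]
    per_point = feature_map[ys, xs]               # (B, H1, C4)
    return per_point.reshape(coords.shape[0], -1) # flatten H1 x C4 per lane

feature_map = np.arange(2 * 3 * 2).reshape(2, 3, 2)   # H1=2, W1=3, C4=2
coords = np.array([[[0, 1], [2, 0]]])                 # B=1: (x=0, y=1), (x=2, y=0)
lane_features = gather_lane_features(coords, feature_map)
```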

[0158] Step 36: Input the first lane line feature tensor into the type attribute prediction unit of the lane line attribute prediction network to predict the type attribute of each lane line instance to obtain the corresponding first prediction tensor; and input the first lane line feature tensor into the color attribute prediction unit of the lane line attribute prediction network to predict the color attribute of each lane line instance to obtain the corresponding second prediction tensor.

[0159] The first prediction tensor has a shape of B×N1, where N1 is the preset number of lane line types; the first prediction tensor includes B first prediction vectors of shape 1×N1, and each first prediction vector includes N1 first lane line type prediction scores; the second prediction tensor has a shape of B×N2, where N2 is the preset number of lane line colors; the second prediction tensor includes B second prediction vectors of shape 1×N2, and each second prediction vector includes N2 first lane line color prediction scores;

[0160] Specifically, it includes: Step 361, inputting the first lane line feature tensor into the type attribute prediction unit of the lane line attribute prediction network to predict the type attribute of each lane line instance to obtain the corresponding first prediction tensor.

[0161] Specifically, this includes: inputting the first lane line feature tensor into the type attribute prediction unit of the lane line attribute prediction network; the type attribute prediction unit inputting each first lane line feature vector of the first lane line feature tensor into the third fully connected layer for linear calculation, and inputting the calculation result into the second normalization layer for normalization processing, and inputting the processing result into the third fully connected layer for linear calculation, and inputting the calculation result into the second softmax layer for lane line type classification score prediction to obtain the corresponding first prediction vector; and forming the corresponding first prediction tensor with shape B×N1 from the obtained first prediction vectors of lane line instance number B.

[0162] The type attribute prediction unit includes a third fully connected layer, a second normalization layer, a third fully connected layer, and a second softmax layer. The output vector of the second softmax layer has a shape of N1×1, where N1 is the preset number of lane line types. The first prediction vector has a shape of N1×1 and includes N1 first lane line type prediction scores, one per lane line type. The N1 lane line types include at least the dashed line type and the solid line type.

[0163] Step 362: Input the first lane line feature tensor into the color attribute prediction unit of the lane line attribute prediction network to predict the color attribute of each lane line instance to obtain the corresponding second prediction tensor.

[0164] Specifically, this includes: inputting the first lane line feature tensor into the color attribute prediction unit of the lane line attribute prediction network; the color attribute prediction unit inputting each first lane line feature vector of the first lane line feature tensor into the fourth fully connected layer for linear calculation, inputting the calculation result into the third normalization layer for normalization processing, inputting the processing result into the fifth fully connected layer for linear calculation, and inputting the calculation result into the third softmax layer for lane line color classification score prediction to obtain the corresponding second prediction vector; and forming the corresponding second prediction tensor with shape B×N2 from the obtained second prediction vectors of lane line instance number B.

[0165] The color attribute prediction unit includes a fourth fully connected layer, a third normalization layer, a fifth fully connected layer, and a third softmax layer. The output vector of the third softmax layer has a shape of N2×1, where N2 is the preset number of lane line colors. The second prediction vector has a shape of N2×1 and includes N2 first lane line color prediction scores, one per lane line color. The N2 lane line colors include at least yellow and white.
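Both attribute prediction units follow the same pattern: fully connected layer, normalization, fully connected layer, softmax. The sketch below illustrates that pattern in NumPy. The text does not specify the kind of normalization or the hidden width, so per-vector standardization and the size `HIDDEN` are assumptions, and the random weights are placeholders for trained parameters.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attribute_head(features, w1, b1, w2, b2, eps=1e-5):
    """FC -> normalization -> FC -> softmax, applied to each lane feature vector.

    features: (B, D) lane line feature tensor, D = H1 * C4
    returns:  (B, N) class scores that sum to 1 per lane
    """
    hidden = features @ w1 + b1
    # Assumed normalization: standardize each hidden vector.
    mean = hidden.mean(axis=-1, keepdims=True)
    std = hidden.std(axis=-1, keepdims=True)
    hidden = (hidden - mean) / (std + eps)
    return softmax(hidden @ w2 + b2)

rng = np.random.default_rng(0)
B, D, HIDDEN, N1, N2 = 3, 32, 16, 2, 2   # e.g. dashed/solid and yellow/white
features = rng.standard_normal((B, D))

type_scores = attribute_head(
    features,
    rng.standard_normal((D, HIDDEN)), np.zeros(HIDDEN),
    rng.standard_normal((HIDDEN, N1)), np.zeros(N1),
)
color_scores = attribute_head(
    features,
    rng.standard_normal((D, HIDDEN)), np.zeros(HIDDEN),
    rng.standard_normal((HIDDEN, N2)), np.zeros(N2),
)
```

Both heads read the same lane feature tensor, which reflects the point of the end-to-end design: the shared features are computed once and reused for type and color prediction.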

[0166] Step 37: Input the first lane line coordinate tensor, the first prediction tensor, the second prediction tensor, and the second image into the lane line instance and attribute allocation module to perform pixel-level lane line type and lane line color semantic feature addition processing on the second image to obtain the corresponding third image.

[0167] Specifically, it includes: Step 371, where the lane line instance and attribute allocation module constructs B all-zero semantic feature maps of shape H1×W1×3, one per lane line instance, as the corresponding first semantic feature maps;

[0168] Each first semantic feature map includes H1*W1 pixels, and each pixel corresponds to a first semantic feature vector of length 3. The first semantic feature vector includes three pixel-level semantic features: a lane line index feature, a lane line type semantic feature, and a lane line color semantic feature. Each first semantic feature map corresponds one-to-one with a first lane line coordinate vector {p_{k,j}} of the first lane line coordinate tensor, a first prediction vector of the first prediction tensor, and a second prediction vector of the second prediction tensor.

[0169] Step 372: On each first semantic feature map, mark the pixels that match the first lane line coordinates p_{k,j} of the corresponding first lane line coordinate vector {p_{k,j}} as the corresponding first lane line key points; and, with each first lane line key point as the center, mark a specified number of pixels before and after it in the same row as the first lane line extension points corresponding to that key point.

[0170] Step 373: Based on the first lane index k of the first lane line coordinate vector {p_{k,j}} corresponding to each first semantic feature map, set the lane line index feature of each first lane line key point and first lane line extension point on that first semantic feature map;

[0171] Step 374: For each first semantic feature map, identify the maximum among the N1 first lane line type prediction scores of the corresponding first prediction vector, and take the lane line type corresponding to the maximum score as the corresponding first lane line type; then, based on that first lane line type, set the lane line type semantic feature of each first lane line key point and first lane line extension point on the first semantic feature map.

[0172] Step 375: For each first semantic feature map, identify the maximum among the N2 first lane line color prediction scores of the corresponding second prediction vector, and take the lane line color corresponding to the maximum score as the corresponding first lane line color; then, based on that first lane line color, set the lane line color semantic feature of each first lane line key point and first lane line extension point on the first semantic feature map.

[0173] Step 376: Perform pixel-level feature fusion on the B first semantic feature maps of shape H1×W1×3 by point-by-point feature addition to obtain the corresponding second semantic feature map with shape H1×W1×3.
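Steps 371 to 376 can be sketched as painting one semantic map per lane and summing the maps. The numeric encoding below is an assumption, since the text does not define it: the index feature is k (starting at 1), and the type and color features are the arg-max class index plus 1, so that 0 stays reserved for background. The parameter `extend` stands for the "specified number of pixels" on each side of a key point, and every coordinate entry is treated as a valid key point.

```python
import numpy as np

def build_semantic_map(coords, type_scores, color_scores, h1, w1, extend=1):
    """Build the fused (H1, W1, 3) semantic feature map.

    coords:       (B, H1, 2) integer (x, y) lane coordinates
    type_scores:  (B, N1) type prediction scores
    color_scores: (B, N2) color prediction scores
    """
    fused = np.zeros((h1, w1, 3), dtype=np.int64)
    for k in range(coords.shape[0]):
        lane_map = np.zeros((h1, w1, 3), dtype=np.int64)   # step 371
        features = (
            k + 1,                                # lane line index feature
            int(type_scores[k].argmax()) + 1,     # type with the maximum score
            int(color_scores[k].argmax()) + 1,    # color with the maximum score
        )
        for x, y in coords[k]:
            # Key point plus extension points in the same row (step 372).
            lo, hi = max(x - extend, 0), min(x + extend, w1 - 1)
            lane_map[y, lo:hi + 1] = features     # steps 373 to 375
        fused += lane_map                         # step 376: point-wise addition
    return fused

coords = np.array([[[1, 1], [1, 0]],
                   [[3, 1], [3, 0]]])
type_scores = np.array([[0.9, 0.1], [0.2, 0.8]])
color_scores = np.array([[0.3, 0.7], [0.6, 0.4]])
second_semantic_map = build_semantic_map(coords, type_scores, color_scores, 2, 5, extend=0)
```

Point-wise addition yields clean labels only where the painted regions of different lanes do not overlap; where they do, the summed values no longer correspond to a single lane.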

[0174] Step 377: Based on the size ratio between the second image and the second semantic feature map, the second semantic feature map is upsampled using bilinear interpolation to obtain the corresponding third semantic feature map with shape H0×W0×3.

[0175] Step 378: The obtained third semantic feature map and the second image with shape H0×W0×C0 are combined and pixel-level feature fusion is performed by vector concatenation to obtain the corresponding third image with shape H0×W0×(C0+3).
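Steps 377 and 378 can be sketched as a bilinear resize followed by channel concatenation. The text does not state the sampling convention, so the pixel-centre alignment used in `bilinear_resize` is an assumption, and both function names are illustrative.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Bilinear interpolation of an (H, W, C) array to (out_h, out_w, C).

    Sampling positions are aligned on pixel centres (an assumed convention).
    """
    in_h, in_w, _ = x.shape
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]     # vertical interpolation weights
    wx = (xs - x0)[None, :, None]     # horizontal interpolation weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bottom = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def fuse_with_image(image, semantic):
    """Upsample the semantic map to the image size and append it as 3 channels."""
    h0, w0, _ = image.shape
    upsampled = bilinear_resize(semantic.astype(float), h0, w0)   # step 377
    return np.concatenate([image, upsampled], axis=2)             # step 378

image = np.zeros((8, 12, 3))          # second image, H0 x W0 x C0
semantic = np.ones((4, 6, 3))         # second semantic feature map, H1 x W1 x 3
third_image = fuse_with_image(image, semantic)
```

The result has C0+3 channels: the original image channels followed by the three upsampled semantic channels.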

[0176] The third image has a size of H0×W0×(C0+3). Compared with the second image, the third image adds three pixel-level semantic features: lane line index feature, lane line type semantic feature, and lane line color semantic feature. The lane line type semantic feature includes multiple lane line types, including at least dashed line type and solid line type. The lane line color semantic feature includes multiple lane line colors, including at least white and yellow.

[0177] Figure 3 is a module structure diagram of an image-based lane line recognition processing system provided in Embodiment 2 of the present invention. This system can be a system, terminal device, or server implementing the aforementioned method embodiment 1, or it can be an apparatus capable of enabling the aforementioned system, terminal device, or server to implement the aforementioned method embodiment 1. For example, the apparatus can be a device or chip system of the aforementioned terminal device or server. As shown in Figure 3, the system includes: a data receiving module 201, an image preprocessing module 202, and a lane line prediction model processing module 203.

[0178] The data receiving module 201 is used to receive the first image.

[0179] The image preprocessing module 202 is used to adjust the size of the first image according to the preset image size to obtain the corresponding second image; the preset image size is H0×W0×C0, where H0, W0, and C0 are the preset image height, width, and feature dimensions, respectively.

[0180] The lane line prediction model processing module 203 is used to input the second image into the preset lane line prediction model to perform lane line instance and attribute recognition processing to obtain the corresponding third image; the size of the third image is H0×W0×(C0+3); the third image adds three pixel-level semantic features compared to the second image, namely lane line index features, lane line type semantic features and lane line color semantic features; the lane line type semantic features include multiple lane line types; the lane line color semantic features include multiple lane line colors.

[0181] The image-based lane line recognition processing system provided in Embodiment 2 of the present invention can execute the method steps in Embodiment 1 above. Its implementation principle and technical effect are similar, and will not be repeated here.

[0182] It should be noted that the division of the various modules in the above system is merely a logical functional division. In actual implementation, they can be fully or partially integrated into a single physical entity, or they can be physically separated. Furthermore, these modules can be implemented entirely in software via processing element calls; they can be fully implemented in hardware; or some modules can be implemented by processing element calls to software, while others are implemented in hardware. For example, the data receiving module can be a separate processing element, or it can be integrated into a chip in the aforementioned device. Alternatively, it can be stored as program code in the memory of the aforementioned device, and called and executed by a processing element of the system. The implementation of other modules is similar. Moreover, these modules can be fully or partially integrated together, or they can be implemented independently. The processing element described here can be an integrated circuit with signal processing capabilities. In the implementation process, the method steps of the aforementioned method or the processing steps of the modules of the aforementioned system can be completed through hardware integrated logic circuits in the processor element or software instructions.

[0183] For example, these modules in the above system can be one or more integrated circuits configured to implement the aforementioned methods, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). As another example, when a module in the above system is implemented by a processing element scheduling program code, the processing element can be a general-purpose processor, such as a central processing unit (CPU) or other processor capable of calling program code. Furthermore, these modules can be integrated together and implemented as a system-on-a-chip (SOC).

[0184] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. This computer program product includes one or more computer instructions. When these computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the foregoing method embodiments are generated. The computer described above can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The aforementioned computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the aforementioned computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, Bluetooth, microwave, etc.) means. The aforementioned computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The aforementioned available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state disks (SSDs)).

[0185] Figure 4 is a schematic diagram of an electronic device provided in Embodiment 3 of the present invention. This electronic device can be a terminal device or server implementing the aforementioned method, or it can be a terminal device or server connected to the aforementioned terminal device or server implementing the aforementioned method. As shown in Figure 4, the electronic device may include: a processor 301 (e.g., CPU), a memory 302, and a transceiver 303; the transceiver 303 is coupled to the processor 301, and the processor 301 controls the transmission and reception operations of the transceiver 303. The memory 302 may store various instructions for performing various processing functions and implementing the processing steps described in the foregoing method embodiments. Preferably, the electronic device involved in Embodiment 3 of the present invention further includes: a power supply 304, a system bus 305, and a communication port 306. The system bus 305 is used to realize communication connections between components. The communication port 306 is used for communication between the electronic device and other peripherals.

[0186] The system bus mentioned in Figure 4 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The system bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, it is shown in Figure 4 as a single thick line, but this does not imply that there is only one bus or one type of bus. The communication interface is used to enable communication between the database access device and other devices (e.g., clients, read-write libraries, and read-only libraries). The memory may include random access memory (RAM) and may also include non-volatile memory, such as at least one disk drive.

[0187] The processors mentioned above can be general-purpose processors, including central processing units (CPUs), network processors (NPs), graphics processing units (GPUs), etc.; they can also be digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.

[0188] It should be noted that the embodiments of the present invention also provide a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the aforementioned methods and processes.

[0189] This invention also provides a chip for executing instructions, which is used to perform the processing steps described in the foregoing method embodiments.

[0190] This invention provides a method, system, electronic device, and computer-readable storage medium for image-based lane line recognition. It constructs a lane line prediction model capable of simultaneously processing lane line instance segmentation and attribute recognition end-to-end. Based on this model, it performs lane line instance segmentation, lane line type attribute recognition, and lane line color attribute recognition on the input image. The end-to-end model provided by this invention avoids the redundant computation problem in conventional multi-model schemes, improving recognition efficiency and saving system computing resources.

[0191] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0192] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented in hardware, a software module executed by a processor, or a combination of both. The software module can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.

[0193] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above description is only a specific embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A processing method for lane line recognition based on images, characterized in that the method includes: receiving a first image; resizing the first image according to a preset image size to obtain a corresponding second image, where the preset image size is H0×W0×C0, and H0, W0, and C0 are the preset image height, width, and feature dimension, respectively; and inputting the second image into a preset lane line prediction model to perform lane line instance and attribute recognition processing to obtain a corresponding third image; the size of the third image is H0×W0×(C0+3); the third image adds three pixel-level semantic features compared to the second image, namely a lane line index feature, a lane line type semantic feature, and a lane line color semantic feature; the lane line type semantic feature covers multiple lane line types; the lane line color semantic feature covers multiple lane line colors; The lane line prediction model includes a three-layer FPN network, a binary segmentation network, a 2D convolutional network, a lane line instance recognition module, a lane line feature extraction module, a lane line attribute prediction network, and a lane line instance and attribute allocation module. The step of inputting the second image into the preset lane line prediction model for lane line instance and attribute recognition processing to obtain the corresponding third image specifically includes: The second image is input into the three-layer FPN network for feature extraction to obtain the corresponding first-level and second-level feature maps. The shape of the first-level feature map is H1×W1×C1, with height H1=H0 / 2, width W1=W0 / 2, and feature dimension C1, which is 64 by default. The shape of the second-level feature map is H2×W2×C2, with height H2=H1 / 2, width W2=W1 / 2, and feature dimension C2=C1*2. 
The first-level feature map is input into the 2D convolutional network for 2D convolution processing to obtain the first feature map; the shape of the first feature map is H1×W1×C4; the feature dimension C4=C1 / 4; The second-level feature map is input into the binary segmentation network for binary image segmentation to obtain a first binary image; the shape of the first binary image is H1×W1×1; The first binary image is input into the lane line instance recognition module for lane line instance recognition processing to obtain the corresponding first lane line coordinate tensor; the shape of the first lane line coordinate tensor is B×H1×2; B is the number of lane line instances; the first lane line coordinate tensor includes B first lane line coordinate vectors {p_{k,j}} of length H1, where 1 ≤ first lane index k ≤ B and 1 ≤ coordinate index j ≤ H1; each first lane line coordinate vector {p_{k,j}} includes H1 first lane line coordinates p_{k,j}; each first lane line coordinate p_{k,j} includes an x-axis coordinate and a y-axis coordinate; The first lane line coordinate tensor and the first feature map are input into the lane line feature extraction module for lane line feature fusion processing to obtain the corresponding first lane line feature tensor; the shape of the first lane line feature tensor is B×(H1*C4); the first lane line feature tensor includes B first lane line feature vectors. The first lane line feature tensor is input into the type attribute prediction unit of the lane line attribute prediction network to predict the type attribute of each lane line instance, resulting in a first prediction tensor. The first lane line feature tensor is then input into the color attribute prediction unit of the lane line attribute prediction network to predict the color attribute of each lane line instance, resulting in a second prediction tensor. The first prediction tensor has a shape of B×N1, where N1 is a preset number of lane line types. 
The first prediction tensor includes B first prediction vectors of shape 1×N1, each containing N1 first lane line type prediction scores. The second prediction tensor has a shape of B×N2, where N2 is a preset number of lane line colors. The second prediction tensor includes B second prediction vectors of shape 1×N2, each containing N2 first lane line color prediction scores. The first lane line coordinate tensor, the first prediction tensor, the second prediction tensor, and the second image are input into the lane line instance and attribute allocation module to perform pixel-level lane line type and lane line color semantic feature addition processing on the second image to obtain the corresponding third image; The step of inputting the first binary image into the lane line instance recognition module for lane line instance recognition processing to obtain the corresponding first lane line coordinate tensor specifically includes:
Step 71: Divide the first binary image into H1 first row pixel sequences in bottom-to-top order; the 1st first row pixel sequence is the bottom row of the first binary image, and the H1-th first row pixel sequence is the top row of the first binary image; each first row pixel sequence includes W1 first row pixels; the pixel value of each first row pixel is 0 or 1;
Step 72: Traverse each first row pixel sequence of the first binary image; during the traversal, take the currently traversed first row pixel sequence as the current row pixel sequence; take each pixel subsequence formed by multiple consecutive first row pixels with a value of 1 in the current row pixel sequence as a first subsequence; and identify whether the sequence length of each first subsequence in the current row pixel sequence is odd; if the sequence length of the current first subsequence is odd, keep the pixel value of only the first row pixel at the middle position of the current first subsequence at 1, and modify the pixel values of all other first row pixels in the subsequence to 0; if the sequence length of the current first subsequence is even, select either of the two first row pixels at the middle position of the current first subsequence as the first reserved point, keep the pixel value of only the first reserved point at 1, and modify the pixel values of all other first row pixels in the subsequence to 0; at the end of the traversal, take the modified first binary image as the second binary image; the second binary image includes H1 second row pixel sequences L_i from bottom to top, with 1 ≤ row index i ≤ H1; the second row pixel sequence L_1 is the bottom row of the second binary image, and the second row pixel sequence L_{H1} is the top row of the second binary image; each second row pixel sequence L_i includes W1 second row pixels; the pixel value of each second row pixel is 0 or 1; no second row pixel sequence L_i contains consecutive second row pixels whose values are all 1;
Step 73: Initialize the first index A to 1, and initialize the first lane index k to 0;
Step 74: In the second row pixel sequence L_i whose row index i equals the first index A, search from left to right, one by one, for second row pixels with a pixel value of 1; each time one is found, take it as the current search point, and take its pixel coordinates in the second binary image as the current point coordinates; increment the first lane index k by 1, add a new all-zero coordinate vector of length H1 as the corresponding first lane line coordinate vector {p_{k,j}}, and, in the newly added first lane line coordinate vector {p_{k,j}}, set the first lane line coordinate p_{k,j} whose coordinate index j matches the first index A to the current point coordinates; then proceed to step 76;
Step 75: In the second row pixel sequence L_i whose row index i equals the first index A, search from left to right, one by one, for second row pixels with a pixel value of 1; each time one is found, take it as the current search point, and take its pixel coordinates in the second binary image as the current point coordinates; in each existing first lane line coordinate vector {p_{k,j}}, take the first non-zero first lane line coordinate p_{k,j} whose coordinate index j precedes the first index A as the corresponding first lane line preceding coordinate; calculate the absolute value of the lateral coordinate difference between the current point coordinates and each first lane line preceding coordinate to obtain the corresponding first lateral coordinate difference; select the minimum of all the first lateral coordinate differences as the minimum lateral coordinate difference; and identify whether the minimum lateral coordinate difference exceeds a preset lateral coordinate difference threshold; if the minimum lateral coordinate difference does not exceed the threshold, then, in the first lane line coordinate vector {p_{k,j}} that contains the first lane line preceding coordinate corresponding to the minimum lateral coordinate difference, set the first lane line coordinate p_{k,j} whose coordinate index j matches the first index A to the current point coordinates; if the minimum lateral coordinate difference exceeds the threshold, increment the first lane index k by 1, add a new all-zero coordinate vector of length H1 as the corresponding first lane line coordinate vector {p_{k,j}}, and, in the newly added first lane line coordinate vector {p_{k,j}}, set the first lane line coordinate p_{k,j} whose coordinate index j matches the first index A to the current point coordinates;
Step 76: Increment the first index A by 1, and identify whether the incremented first index A is greater than H1; if yes, proceed to step 77; if no, proceed to step 75;
Step 77: Take the latest value of the first lane index k as the number of lane line instances B, and form the first lane line coordinate tensor from the B first lane line coordinate vectors {p_{k,j}} obtained;
The step of inputting the first lane line coordinate tensor and the first feature map into the lane line feature extraction module for lane line feature fusion processing to obtain the corresponding first lane line feature tensor specifically includes: Based on the H1 first lane line coordinates p_{k,j} of each first lane line coordinate vector {p_{k,j}} of the first lane line coordinate tensor, the H1 corresponding first feature map pixels are marked sequentially on the first feature map; the C4 feature data of each first feature map pixel are extracted to form a first pixel feature vector of length C4; the H1 first pixel feature vectors of length C4 form a first lane line feature matrix of shape H1×C4; the first lane line feature matrix is flattened into a one-dimensional vector to obtain a first lane line feature vector of shape (H1*C4)×1; and the B first lane line feature vectors obtained form a first lane line feature tensor of shape B×(H1*C4).
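
For illustration only, steps 71 to 77 and the subsequent feature extraction can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, the 2-D array layout (row 0 at the top, so the A-th row from the bottom is array row H1 - A), the choice of the left one of the two middle pixels for even-length runs, the reading of "first non-zero coordinate preceding A" as the nearest preceding one, and the handling of a row reached while no lane exists yet are all assumptions. As in the claim, an all-zero coordinate means "not set", so a real pixel at (0, 0) cannot be distinguished from an unset entry.

```python
import numpy as np

def thin_rows(binary):
    """Step 72: keep a single middle pixel for every horizontal run of 1s."""
    out = np.zeros_like(binary)
    for r, row in enumerate(binary):
        c = 0
        while c < len(row):
            if row[c] == 1:
                start = c
                while c < len(row) and row[c] == 1:
                    c += 1
                # Odd run: exact middle. Even run: left of the two middles (assumption).
                out[r, (start + c - 1) // 2] = 1
            else:
                c += 1
    return out

def trace_lanes(thin, max_dx):
    """Steps 73-77: link row points bottom-to-top into lane coordinate vectors."""
    h1 = thin.shape[0]
    lanes = []  # each entry: (H1, 2) array of (x, y); all-zero rows mean "not set"
    for a in range(1, h1 + 1):              # first index A, 1 = bottom row
        y = h1 - a                          # array row of the A-th row from the bottom
        for x in np.flatnonzero(thin[y]):   # left-to-right search
            best, best_dx = None, None
            if a > 1:                       # step 75: compare with existing lanes
                for lane in lanes:
                    prev = lane[: a - 1]
                    set_idx = np.flatnonzero(prev.any(axis=1))
                    if set_idx.size == 0:
                        continue
                    dx = abs(int(prev[set_idx[-1], 0]) - int(x))
                    if best_dx is None or dx < best_dx:
                        best, best_dx = lane, dx
            if best is None or best_dx > max_dx:   # step 74, or no close lane: new lane
                best = np.zeros((h1, 2), dtype=np.int64)
                lanes.append(best)
            best[a - 1] = (x, y)
    return np.stack(lanes) if lanes else np.zeros((0, h1, 2), dtype=np.int64)

def gather_lane_features(coords, feat):
    """Sample the C4 features at each of the H1 points of every lane, then flatten."""
    b, h1, _ = coords.shape
    sampled = feat[coords[..., 1], coords[..., 0]]        # (B, H1, C4)
    return sampled.reshape(b, h1 * feat.shape[-1])        # (B, H1*C4)

# Hypothetical 4x8 binary image (H1 = 4, W1 = 8) and feature map with C4 = 3.
binary = np.array([
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 1],
], dtype=np.uint8)
coords = trace_lanes(thin_rows(binary), max_dx=2)
feat = np.arange(4 * 8 * 3, dtype=np.float64).reshape(4, 8, 3)
lane_feats = gather_lane_features(coords, feat)
```

With this input the sketch finds two lane instances, so `coords` has shape B×H1×2 = 2×4×2 and `lane_feats` has shape B×(H1*C4) = 2×12.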

2. The image-based lane line recognition method according to claim 1, characterized in that, The input of the three-layer FPN network is the input of the lane prediction model, and its output is connected to the inputs of the binary segmentation network and the 2D convolutional network, respectively. The three-layer FPN network includes a downsampling residual network side and an upsampling feature extraction network side. The downsampling residual network side includes first, second, and third-level residual units. The upsampling feature extraction network side includes first, second, and third-level feature extraction units. The input of the first-level residual unit is the input of the three-layer FPN network, and its output is connected to the input of the second-level residual unit and the first input of the first-level feature extraction unit, respectively. The output of the second-level residual unit is connected to the input of the third-level residual unit and the first input of the second-level feature extraction unit, respectively. The output of the third-level residual unit is connected to the input of the third-level feature extraction unit; the output of the third-level feature extraction unit is connected to the second input of the second-level feature extraction unit; the output of the second-level feature extraction unit is connected to the second input of the first-level feature extraction unit and the input of the binary segmentation network, respectively; the output of the first-level feature extraction unit is connected to the input of the 2D convolutional network; the first, second, and third-level residual units are by default the conv1, conv2_x, and conv3_x modules of the ResNet101 network; The output of the binary segmentation network is connected to the input of the lane line instance recognition module. 
The binary segmentation network includes a first convolutional unit, a second convolutional unit, a feature vector conversion unit, a first multilayer perceptron, a second multilayer perceptron, and a binary image conversion unit. The input of the first convolutional unit is the input of the binary segmentation network, and its output is connected to the input of the second convolutional unit. The output of the second convolutional unit is connected to the input of the feature vector conversion unit. The output of the feature vector conversion unit is connected to the input of the first multilayer perceptron. The output of the first multilayer perceptron is connected to the input of the second multilayer perceptron. The output of the second multilayer perceptron is connected to the input of the binary image conversion unit. The output of the binary image conversion unit is connected to the input of the lane line instance recognition module. The output of the lane line instance recognition module is connected to the first input of the lane line feature extraction module and the first input of the lane line instance and attribute allocation module, respectively. 
The output of the 2D convolutional network is connected to the second input of the lane line feature extraction module; The output of the lane line feature extraction module is connected to the input of the lane line attribute prediction network; The output of the lane line attribute prediction network is connected to the second and third inputs of the lane line instance and attribute allocation module; the lane line attribute prediction network includes a type attribute prediction unit and a color attribute prediction unit; the input of the type attribute prediction unit is connected to the output of the lane line feature extraction module, and its output is connected to the second input of the lane line instance and attribute allocation module; the input of the color attribute prediction unit is connected to the output of the lane line feature extraction module, and its output is connected to the third input of the lane line instance and attribute allocation module. The fourth input of the lane line instance and attribute allocation module is the input of the lane line prediction model.

3. The image-based lane line recognition method according to claim 2, characterized in that, The lane prediction model inputs the second image into the three-layer FPN network for feature extraction to obtain the corresponding first-level and second-level feature maps, specifically including: The second image with shape H0×W0×C0 is input into the first-level residual unit to perform downsampling residual operation to obtain the corresponding first-level downsampling feature tensor; the shape of the first-level downsampling feature tensor is H1×W1×C1; The first-level downsampled feature tensor is input into the second-level residual unit to perform downsampled residual calculation to obtain the corresponding second-level downsampled feature tensor; the shape of the second-level downsampled feature tensor is H2×W2×C2; The second-level downsampling feature tensor is input into the third-level residual unit to perform downsampling residual calculation to obtain the corresponding third-level downsampling feature tensor; the shape of the third-level downsampling feature tensor is H3×W3×C3, where H3, W3, and C3 are the height, width, and feature dimension of the third-level downsampling feature tensor, respectively, and H3=H2 / 2, W3=W2 / 2, C3=C2*2; The third-level downsampled feature tensor is input into the third-level feature extraction unit; and the third-level feature extraction unit performs convolution operation on the third-level downsampled feature tensor based on a preset 3×3 first convolution kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding third-level feature map; the shape of the third-level feature map is consistent with the shape of the third-level downsampled feature tensor, which is H3×W3×C3; The second-level downsampled feature tensor and the third-level feature map are input into the second-level feature extraction unit; the second-level feature extraction unit performs 2x upsampling and feature dimensionality 
reduction on the third-level feature map to obtain a first upsampled feature map with a shape of H2×W2×C2; and the second-level downsampled feature tensor and the first upsampled feature map are summed to obtain the corresponding second-level feature map. The first-level downsampled feature tensor and the second-level feature map are input into the first-level feature extraction unit; the first-level feature extraction unit performs 2x upsampling and feature dimensionality reduction on the second-level feature map to obtain the corresponding second upsampled feature map with shape H1×W1×C1; and the first-level downsampled feature tensor and the second upsampled feature map are tensor summed to obtain the corresponding first-level feature map.
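
As a quick consistency check on the shapes stated in this claim, the following Python sketch only does the shape bookkeeping of the three-layer FPN; it performs no convolution. The function name `fpn_shapes`, the 512×1024 input size, and the requirement that H0 and W0 be multiples of 8 are assumptions made for the example.

```python
def fpn_shapes(h0, w0, c1=64):
    """Return the (height, width, channels) of the three FPN feature maps."""
    if h0 % 8 or w0 % 8:
        raise ValueError("H0 and W0 are assumed to be multiples of 8")
    # Downsampling residual side: each level halves H and W and doubles C.
    h1, w1 = h0 // 2, w0 // 2
    h2, w2, c2 = h1 // 2, w1 // 2, c1 * 2
    h3, w3, c3 = h2 // 2, w2 // 2, c2 * 2
    down = {1: (h1, w1, c1), 2: (h2, w2, c2), 3: (h3, w3, c3)}

    # Third-level unit: 3x3 kernel, padding 1, stride 1 keeps the shape.
    level3 = down[3]
    # Second-level unit: 2x upsampling plus channel reduction, then summation.
    up1 = (level3[0] * 2, level3[1] * 2, level3[2] // 2)
    assert up1 == down[2], "summation requires identical shapes"
    level2 = down[2]
    # First-level unit: same pattern one level up.
    up2 = (level2[0] * 2, level2[1] * 2, level2[2] // 2)
    assert up2 == down[1], "summation requires identical shapes"
    level1 = down[1]
    return {"first_level": level1, "second_level": level2, "third_level": level3}

shapes = fpn_shapes(512, 1024)  # hypothetical H0 x W0
```

For this hypothetical input the first-level feature map is 256×512×64 and the second-level feature map is 128×256×128, matching H1=H0 / 2, C2=C1*2, and so on.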

4. The image-based lane line recognition method according to claim 2, characterized in that, The step of inputting the first-level feature map into the 2D convolutional network for 2D convolution processing to obtain the corresponding first feature map specifically includes: The first-level feature map is input into the 2D convolutional network; and the 2D convolutional network performs convolution operations on the first-level feature map based on a preset 3×3 second convolutional kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding first feature map.
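
A 3×3 kernel with padding 1 and stride 1 leaves the height and width unchanged, which is why the first feature map keeps the H1×W1 size while only the feature dimension changes. The NumPy sketch below illustrates this with a plain, unoptimized convolution (computed as cross-correlation, as is usual in neural networks); the function name, the zero padding, and the tiny tensor sizes are assumptions for the example.

```python
import numpy as np

def conv3x3_same(x, kernel):
    """x: (H, W, Cin); kernel: (3, 3, Cin, Cout); zero padding 1, stride 1."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # one zero pixel on each side
    out = np.zeros((h, w, kernel.shape[-1]), dtype=np.result_type(x, kernel))
    for dy in range(3):
        for dx in range(3):
            # Shifted window times the (Cin, Cout) weights at this kernel tap.
            out += padded[dy:dy + h, dx:dx + w] @ kernel[dy, dx]
    return out

# Hypothetical sizes: H1 = 4, W1 = 6, C1 = 8, C4 = C1 / 4 = 2.
x = np.ones((4, 6, 8))
kernel = np.ones((3, 3, 8, 2))
y = conv3x3_same(x, kernel)
```

The output `y` has shape 4×6×2: the spatial size is preserved and the feature dimension drops from C1 to C4.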

5. The image-based lane line recognition method according to claim 2, characterized in that, The step of inputting the second-level feature map into the binary segmentation network for binary image segmentation to obtain the corresponding first binary image specifically includes: The binary segmentation network performs 2x upsampling on the input second-level feature map of shape H2×W2×C2 and keeps the feature dimension unchanged to obtain the corresponding third upsampled feature map; the shape of the third upsampled feature map is H1×W1×C2. The third upsampled feature map is input into the first convolutional unit; and the first convolutional unit performs convolution operation on the third upsampled feature map based on a preset 3×3 third convolutional kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding first convolutional tensor. The first convolution tensor is input into the second convolution unit; and the second convolution unit performs a convolution operation on the first convolution tensor based on a preset 3×3 fourth convolution kernel, convolution padding of 1, and convolution stride of 1 to obtain the corresponding second convolution tensor; the shape of the second convolution tensor is H1×W1×C1; The second convolution tensor is input into the feature vector conversion unit; and the feature vector conversion unit performs tensor flattening on the second convolution tensor to obtain the corresponding global feature vector; the shape of the global feature vector is (H1*W1*C1)×1; The global feature vector is input into the first multilayer perceptron; the first fully connected layer of the first multilayer perceptron performs linear calculation on the global feature vector to obtain the corresponding first fully connected vector; and the first normalization layer of the first multilayer perceptron performs normalization calculation on the first fully connected vector to obtain the corresponding first normalized vector; the 
first multilayer perceptron includes the first fully connected layer and the first normalization layer; the output of the first fully connected layer is connected to the input of the first normalization layer. The first normalized vector is input into the second multilayer perceptron; the second fully connected layer of the second multilayer perceptron performs linear calculation on the first normalized vector to obtain the corresponding second fully connected vector; and the first softmax function layer of the second multilayer perceptron performs positive and negative class score regression prediction based on the second fully connected vector to obtain the corresponding first rating tensor; the second multilayer perceptron includes the second fully connected layer and the first softmax function layer; the output of the second fully connected layer is connected to the input of the first softmax function layer; the shape of the first rating tensor is H1×W1×2, and the first rating tensor includes H1*W1 first rating vectors of length 2, and the first rating vector includes a first positive class rating and a first negative class rating. 
The first rating tensor is input into the binary image conversion unit; the binary image conversion unit initializes an all-zero feature map of shape H1×W1×1 and traverses each first rating vector of the first rating tensor; during the traversal, it checks whether the first positive class rating of the current first rating vector is greater than its first negative class rating; if so, the first feature data corresponding to the current first rating vector in the feature map is set to 1; at the end of the traversal, the updated feature map is output as the corresponding first binary image; at initialization, the all-zero feature map consists of H1*W1 first feature data items initialized to 0, and the first feature data items correspond one-to-one with the first rating vectors.
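
The binary image conversion unit described above amounts to a per-pixel comparison of the two class ratings. A minimal NumPy sketch follows; the function name and the convention that index 0 of the last axis holds the positive class rating are assumptions.

```python
import numpy as np

def scores_to_binary(scores):
    """scores: (H1, W1, 2); [..., 0] = positive rating, [..., 1] = negative rating."""
    positive, negative = scores[..., 0], scores[..., 1]
    # A pixel becomes 1 only when the positive rating is strictly greater.
    return (positive > negative).astype(np.uint8)[..., None]  # (H1, W1, 1)

# Hypothetical 2x2 rating tensor.
scores = np.array([[[0.9, 0.1], [0.3, 0.7]],
                   [[0.5, 0.5], [0.6, 0.4]]])
binary = scores_to_binary(scores)
```

Ties stay at 0 because the claim requires the positive class rating to be greater than the negative class rating.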

6. The image-based lane line recognition method according to claim 2, characterized in that, The step of inputting the first lane line feature tensor into the type attribute prediction unit of the lane line attribute prediction network to predict the type attribute of each lane line instance to obtain the corresponding first prediction tensor specifically includes: The first lane feature tensor is input into the type attribute prediction unit of the lane attribute prediction network; the type attribute prediction unit inputs each of the first lane feature vectors of the first lane feature tensor into the third fully connected layer for linear calculation, and inputs the calculation result into the second normalization layer for normalization processing, and inputs the processing result into the third fully connected layer for linear calculation, and inputs the calculation result into the second softmax layer for lane type classification score prediction to obtain the corresponding first prediction vector; and the first prediction vectors of the obtained number of lane instances B form the corresponding first prediction tensor with shape B×N1; the type attribute prediction unit includes the third fully connected layer, the second normalization layer, the third fully connected layer and the second softmax layer; the output vector of the second softmax layer has shape N1×1, where N1 is a preset number of lane types; the first prediction vector has shape N1×1 and includes the first lane type prediction score of the number of lane types N1; the number of lane types N1 includes at least dashed line type and solid line type.

7. The image-based lane line recognition method according to claim 2, characterized in that, The step of inputting the first lane line feature tensor into the color attribute prediction unit of the lane line attribute prediction network to predict the color attribute of each lane line instance to obtain the corresponding second prediction tensor specifically includes: The first lane feature tensor is input into the color attribute prediction unit of the lane attribute prediction network; the color attribute prediction unit inputs each of the first lane feature vectors of the first lane feature tensor into the fourth fully connected layer for linear calculation, and inputs the calculation result into the third normalization layer for normalization processing, and inputs the processing result into the fifth fully connected layer for linear calculation, and inputs the calculation result into the third softmax layer for lane color classification score prediction to obtain the corresponding second prediction vector; and the second prediction vectors of the obtained number of lane instances B form the corresponding second prediction tensor with a shape of B×N2; the color attribute prediction unit includes the fourth fully connected layer, the third normalization layer, the fifth fully connected layer and the third softmax layer; the output vector of the third softmax layer has a shape of N2×1, where N2 is the preset number of lane colors; the second prediction vector has a shape of N2×1, including the first lane color prediction score of the number of lane colors N2; the number of lane colors N2 includes at least yellow and white.
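
The type attribute prediction unit of claim 6 and the color attribute prediction unit of claim 7 share one structure: a fully connected layer, a normalization layer, a second fully connected layer, and a softmax layer. The NumPy sketch below shows that structure with random, untrained weights. The claims do not name the kind of normalization, so layer normalization is used here as an assumption; the hidden width of 32, the function names, and the sizes B=3, H1*C4=48, N1=N2=2 are likewise hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attribute_head(feats, w_a, b_a, w_b, b_b, eps=1e-5):
    """feats: (B, H1*C4) lane feature vectors -> (B, N) class scores."""
    hidden = feats @ w_a + b_a                      # first fully connected layer
    mean = hidden.mean(axis=-1, keepdims=True)
    var = hidden.var(axis=-1, keepdims=True)
    normed = (hidden - mean) / np.sqrt(var + eps)   # layer normalization (assumption)
    return softmax(normed @ w_b + b_b)              # second FC layer, then softmax

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 48))                    # B = 3 lanes, H1*C4 = 48
n_types, n_colors, hidden_dim = 2, 2, 32            # e.g. dashed/solid, white/yellow

def random_params(n_out):
    return (rng.normal(size=(48, hidden_dim)), np.zeros(hidden_dim),
            rng.normal(size=(hidden_dim, n_out)), np.zeros(n_out))

type_scores = attribute_head(feats, *random_params(n_types))     # (B, N1)
color_scores = attribute_head(feats, *random_params(n_colors))   # (B, N2)
```

Each row of `type_scores` and `color_scores` is one prediction vector whose entries sum to 1; the two heads use separate weights but read the same lane feature tensor.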

8. The image-based lane line recognition method according to claim 2, characterized in that, The step of inputting the first lane line coordinate tensor, the first prediction tensor, the second prediction tensor, and the second image into the lane line instance and attribute allocation module to perform pixel-level lane line type and lane line color semantic feature addition processing on the second image to obtain the corresponding third image specifically includes: The lane line instance and attribute allocation module constructs B all-zero semantic feature maps of shape H1×W1×3, one for each of the B lane line instances, as the corresponding first semantic feature maps. Each first semantic feature map includes H1*W1 pixels, each pixel corresponding to a first semantic feature vector of length 3. The first semantic feature vector includes three pixel-level semantic features: the lane line index feature, the lane line type semantic feature, and the lane line color semantic feature. The first semantic feature maps correspond one-to-one with the first lane line coordinate vectors {p_{k,j}} of the first lane line coordinate tensor, one-to-one with the first prediction vectors of the first prediction tensor, and one-to-one with the second prediction vectors of the second prediction tensor. On each first semantic feature map, the pixels that match the first lane line coordinates p_{k,j} of the corresponding first lane line coordinate vector {p_{k,j}} are marked as the corresponding first lane line key points; and, with each first lane line key point as the center, a specified number of pixels in the same row are marked as the first lane line extension points corresponding to that first lane line key point. 
Based on the first lane index k of the first lane line coordinate vector {p_{k,j}} corresponding to each first semantic feature map, the lane line index feature of each first lane line key point and first lane line extension point on that first semantic feature map is set; The maximum of the first lane line type prediction scores of the first prediction vector corresponding to each first semantic feature map is identified, and the lane line type corresponding to the maximum score is taken as the corresponding first lane line type; and based on the first lane line type corresponding to each first semantic feature map, the lane line type semantic feature of each first lane line key point and first lane line extension point on that first semantic feature map is set. The maximum of the first lane line color prediction scores of the second prediction vector corresponding to each first semantic feature map is identified, and the lane line color corresponding to the maximum score is taken as the corresponding first lane line color; and based on the first lane line color corresponding to each first semantic feature map, the lane line color semantic feature of each first lane line key point and first lane line extension point on that first semantic feature map is set. The B first semantic feature maps of shape H1×W1×3 are fused at pixel level by point-by-point feature addition to obtain the corresponding second semantic feature map of shape H1×W1×3. Based on the size ratio between the second image and the second semantic feature map, the second semantic feature map is upsampled using bilinear interpolation to obtain the corresponding third semantic feature map of shape H0×W0×3. 
The obtained third semantic feature map and the second image with shape H0×W0×C0 are subjected to pixel-level feature fusion processing by vector concatenation to obtain the corresponding third image with shape H0×W0×(C0+3); the size of the third image is H0×W0×(C0+3); the third image adds three pixel-level semantic features compared to the second image, namely the lane line index feature, the lane line type semantic feature, and the lane line color semantic feature; the lane line type semantic feature includes multiple lane line types, of which at least include dashed line type and solid line type; the lane line color semantic feature includes multiple lane line colors, of which at least include white and yellow.
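
The allocation steps of this claim can be sketched in NumPy as follows. This is an illustrative sketch under several assumptions that the claim leaves open: class identifiers are encoded as 1-based integers so that 0 means background, one extension point is marked on each side of a key point (`half_width=1`), all-zero coordinates are treated as unset and skipped, and the bilinear interpolation uses the align-corners sampling convention. All function and parameter names are hypothetical.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear interpolation of an (H, W, C) array, align-corners convention."""
    in_h, in_w = img.shape[:2]
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy, wx = (ys - y0)[:, None, None], (xs - x0)[None, :, None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def assign_attributes(image, coords, type_scores, color_scores, half_width=1):
    """image: (H0, W0, C0); coords: (B, H1, 2) of (x, y) -> (H0, W0, C0 + 3)."""
    h0, w0, _ = image.shape
    h1, w1 = h0 // 2, w0 // 2
    fused = np.zeros((h1, w1, 3))                   # second semantic feature map
    for k, lane in enumerate(coords, start=1):
        sem = np.zeros((h1, w1, 3))                 # one all-zero map per instance
        label = (k,                                 # lane line index feature
                 type_scores[k - 1].argmax() + 1,   # 1-based type id (assumption)
                 color_scores[k - 1].argmax() + 1)  # 1-based color id (assumption)
        for x, y in lane:
            if x == 0 and y == 0:                   # unset coordinate (assumption)
                continue
            lo, hi = max(x - half_width, 0), min(x + half_width, w1 - 1)
            sem[y, lo:hi + 1] = label               # key point plus extension points
        fused += sem                                # point-by-point addition
    upsampled = bilinear_resize(fused, h0, w0)      # third semantic feature map
    return np.concatenate([image, upsampled], axis=-1)

# Hypothetical inputs: H0 = 8, W0 = 16, C0 = 3, two lanes of H1 = 4 points each.
image = np.zeros((8, 16, 3))
coords = np.array([[[0, 3], [1, 2], [2, 1], [1, 0]],
                   [[7, 3], [6, 2], [6, 1], [5, 0]]])
type_scores = np.array([[0.2, 0.8], [0.9, 0.1]])
color_scores = np.array([[0.7, 0.3], [0.4, 0.6]])
third_image = assign_attributes(image, coords, type_scores, color_scores)
```

Because the B maps are summed, pixels where two lanes overlap receive the sum of both labels, which follows the point-by-point addition in the claim; whether overlapping pixels need separate handling is not addressed in the source.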

9. A system for implementing the image-based lane line recognition processing method according to any one of claims 1-8, characterized in that, The system includes: a data receiving module, an image preprocessing module, and a lane line prediction model processing module; The data receiving module is used to receive the first image; The image preprocessing module is used to adjust the size of the first image according to a preset image size to obtain a corresponding second image; the preset image size is H0×W0×C0, where H0, W0, and C0 are the preset image height, width, and feature dimension, respectively. The lane line prediction model processing module is used to input the second image into a preset lane line prediction model to perform lane line instance and attribute recognition processing to obtain the corresponding third image; the size of the third image is H0×W0×(C0+3); the third image adds three pixel-level semantic features compared to the second image, namely lane line index features, lane line type semantic features, and lane line color semantic features; the lane line type semantic features include multiple lane line types; the lane line color semantic features include multiple lane line colors.

10. An electronic device, characterized in that it includes: a memory, a processor, and a transceiver; The processor is coupled to the memory and is configured to read and execute instructions in the memory to implement the method according to any one of claims 1-8; The transceiver is coupled to the processor, and the processor controls the transceiver to send and receive messages.

11. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores computer instructions that, when executed by a computer, cause the computer to perform the method described in any one of claims 1-8.