Method, device, and storage medium for warehousing by photographing accompanying delivery notes based on correction prompts
By using an image quality assessment model to evaluate and correct waybill images, the problems of real-time performance and resource waste in waybill recognition are solved, enabling an efficient waybill warehousing process and improving recognition accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHENGDU ZIJIELIU TECH CO LTD
- Filing Date
- 2026-04-09
- Publication Date
- 2026-06-12
AI Technical Summary
In existing technologies, accompanying document recognition suffers from insufficient real-time performance, a lack of quality assessment, and serious waste of resources. Cloud-based OCR recognition is sensitive to network latency, and localized recognition devices are costly and inflexible, resulting in low recognition efficiency.
An image quality assessment model is used to evaluate the target accompanying order image. If it does not meet the preset conditions, a shooting correction guidance prompt is output until the image quality meets the requirements. Then, the differential image is extracted and compressed for transmission to reduce invalid background and improve real-time performance and resource utilization efficiency.
It achieves real-time order recognition, reduces resource waste, improves recognition accuracy, and provides shooting guidance function, making it suitable for large-scale applications.
Figure CN121998978B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of image recognition technology, specifically relating to a method, device, and storage medium for taking photos of goods documents for warehousing based on correction prompts.
Background Technology
[0002] According to the 2024 report of the China Federation of Logistics and Purchasing, there are currently three main technical routes for order recognition in the industry: (1) manual input method, which accounts for 37.2%, with an average processing time of 28 seconds per order and an error rate of 6.8%-12.4% (related to operator proficiency); (2) cloud-based OCR recognition, which accounts for 52.1%, with typical systems such as WMS3.0 of a logistics company and the warehouse brain of an e-commerce group. This recognition method is sensitive to network latency, and when the network latency is greater than 150ms, the recognition failure rate increases to 24.7%; (3) localized recognition equipment, which accounts for 10.7%. Its main problems are high cost and poor flexibility (the unit price of dedicated equipment is greater than 8,000 yuan and it cannot be adapted to different specifications of orders).
[0003] The aforementioned existing technologies have the following defects: (1) Insufficient real-time performance. Existing mobile solutions require uploading complete images to the cloud before feedback can be obtained, with an average response time of 6.3 seconds (under 4G network conditions); (2) Lack of quality assessment. 78% of recognition errors are due to image quality issues (blur, tilt, and overexposure), but existing systems do not have shooting guidance functions; (3) Serious waste of resources. Tests show that 42% of uploaded images contain more than 60% invalid background areas. Therefore, how to provide a method for photographing accompanying documents for warehousing that is real-time, resource-saving, and capable of quality assessment and shooting guidance has become an urgent problem to be solved.
Summary of the Invention
[0004] The purpose of this invention is to provide a method, device, and storage medium for warehousing by taking photos of goods with corrective prompts, in order to solve the problems of insufficient real-time performance, lack of quality assessment, and serious waste of resources in the existing technology.
[0005] To achieve the above objectives, the present invention adopts the following technical solution:
[0006] Firstly, a method for warehousing by photographing accompanying delivery notes based on correction prompts is provided, including:
[0007] Obtain the image of the target shipment document;
[0008] An image quality assessment model is used to assess the image quality of the target accompanying document image, and the image quality assessment results are obtained.
[0009] Based on the image quality assessment results, determine whether the image quality of the target accompanying shipment image meets the preset conditions;
[0010] If so, acquire the reference accompanying document image and extract the difference image between the target accompanying document image and the reference accompanying document image; otherwise, output image shooting correction guidance prompt information and, after outputting it, reacquire the target accompanying document image until its image quality meets the preset conditions. The difference image characterizes the difference between the target accompanying document image and the reference accompanying document image.
[0011] The differential image is compressed to obtain a compressed image;
[0012] The compressed image is transmitted to the cloud so that the cloud can reconstruct the target shipment image based on the compressed image and the reference shipment image, and then store the target shipment image in the database.
[0013] Based on the above-disclosed content, after acquiring the target accompanying document image, this invention first uses an image quality assessment model to assess its quality and obtain an image quality assessment result. Then, based on the image quality assessment result, it determines whether the image quality of the target accompanying document image meets preset conditions. If it does not meet the conditions, an image shooting correction guidance prompt will be output to prompt the user to take a corrective photo to re-acquire the target accompanying document image. Then, the aforementioned quality assessment process is repeated until the image quality meets the preset conditions. Then, the subsequent differential transmission process can be performed, namely: acquiring a reference accompanying document image and extracting a differential image to represent the difference between the target accompanying document image and the reference accompanying document image; then, compressing the differential image to obtain a compressed image; finally, transmitting the compressed image to the cloud, thus completing the photo capture and storage of the target accompanying document.
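The assess-and-retake loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the callables `capture`, `assess`, and `prompt`, and the `max_attempts` safety limit, are hypothetical names introduced here (the text itself simply repeats until the preset conditions are met).

```python
from typing import Callable, Optional


def capture_until_acceptable(
    capture: Callable[[], bytes],
    assess: Callable[[bytes], float],
    threshold: float,
    prompt: Callable[[str], None],
    max_attempts: int = 5,  # assumption: the source does not bound the number of retries
) -> Optional[bytes]:
    """Re-capture the image until its quality score reaches the threshold."""
    for attempt in range(1, max_attempts + 1):
        image = capture()
        score = assess(image)
        if score >= threshold:
            return image  # quality is acceptable; differential transmission can start
        # Quality is too low: output a correction guidance prompt and try again.
        prompt(
            f"Attempt {attempt}: quality {score:.2f} is below "
            f"{threshold:.2f}; please retake the photo."
        )
    return None  # gave up after max_attempts
```

Only an image returned by this loop would be passed on to difference extraction, compression, and upload.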
[0014] Through the above design, this invention utilizes an image quality assessment model to achieve quality assessment of the target accompanying order image. When the image quality does not meet preset conditions, it outputs image shooting correction guidance information to prompt the user to take corrective photos, thus realizing the shooting correction guidance function. Simultaneously, after the image quality of the acquired target accompanying order image meets preset conditions, a difference image is extracted to characterize the difference between the target accompanying order image and the reference accompanying order image. Only the difference image is uploaded. Finally, the target accompanying order image is reconstructed in the cloud using the difference image and the reference accompanying order image, thus completing the order entry into the warehouse. Therefore, this invention only transmits the difference between the current image and the reference image, improving real-time performance and reducing invalid background compared to traditional technologies, thereby avoiding resource waste. Based on this, this invention provides a real-time, resource-saving, quality assessment-enabled, and shooting guidance-enabled accompanying order photography and warehousing technology, which can improve the accuracy of subsequent order recognition. Therefore, it is highly suitable for large-scale application and promotion.
[0015] In one possible design, the image quality assessment model includes: an initial convolutional layer, an inverted residual bottleneck block layer, a global average pooling layer, and a fully connected layer. The fully connected layer includes a first branch connection layer, a second branch connection layer, and a third branch connection layer, and each branch connection layer has its own corresponding activation function.
[0016] The initial convolutional layer is used to perform feature extraction processing on the input target accompanying document image to obtain the first feature image;
[0017] The inverted residual bottleneck block layer is used to perform feature re-extraction processing on the first feature image based on the depthwise separable convolution mechanism and the attention mechanism to obtain the second feature image;
[0018] A global average pooling layer is used to perform global average pooling on the second feature image to obtain a one-dimensional feature vector, and then input the one-dimensional feature vector into the three branch connection layers respectively.
[0019] The first branch connection layer is used to map the one-dimensional feature vector using the corresponding activation function to obtain the sharpness score of the target accompanying document image.
[0020] The second branch connection layer is used to map the one-dimensional feature vector using the corresponding activation function to obtain the sine and cosine values of the yaw angle of the target accompanying document image.
[0021] The third branch connection layer is used to map the one-dimensional feature vector using the corresponding activation function to obtain the illumination level of the target accompanying document image.
[0022] The image quality assessment result of the target accompanying document image is composed of the sharpness score, the illumination level, and the sine and cosine values of the yaw angle.
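The three branch connection layers can be illustrated with a small pure-Python sketch that maps one pooled feature vector to the three outputs. The parameter names, the scaling of the Sigmoid output to 0-100 for sharpness, and the use of Sigmoid for the illumination branch are assumptions made for illustration; the text only fixes that each branch has its own activation function.

```python
import math
from typing import Dict, List, Sequence


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def quality_heads(features: Sequence[float], params: Dict[str, list]) -> Dict[str, float]:
    """Map a pooled feature vector to sharpness, yaw sine/cosine, and illumination."""
    # Branch 1: one scalar, Sigmoid, scaled to 0-100 (scaling is an assumption).
    sharpness = 100.0 * sigmoid(linear(features, params["sharp_w"], params["sharp_b"])[0])
    # Branch 2: two scalars squashed by tanh into [-1, 1] as sine and cosine.
    sin_v, cos_v = (math.tanh(v) for v in linear(features, params["angle_w"], params["angle_b"]))
    # Branch 3: one scalar in (0, 1) as illumination level (activation is an assumption).
    illumination = sigmoid(linear(features, params["illum_w"], params["illum_b"])[0])
    return {"sharpness": sharpness, "sin": sin_v, "cos": cos_v, "illumination": illumination}
```

In a trained model the weights would come from learning; here they are plain lists so the data flow is easy to follow.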
[0023] In one possible design, the inverted residual bottleneck block layer includes a bottleneck block and multiple MBConv blocks connected in sequence. The output feature of any MBConv block is fused with the input feature of that MBConv block to obtain the fused feature, which is then output to the next MBConv block. The fused feature output by the last MBConv block is input to the bottleneck block, and the bottleneck block is used to output the second feature image.
[0024] Each MBConv block includes a 1×1 up-dimensional convolutional layer, a 3×3 first depthwise separable convolutional layer, a 5×5 second depthwise separable convolutional layer, an SE attention layer, and a 1×1 down-dimensional convolutional layer connected in sequence.
[0025] In one possible design, the initial convolutional layer is a 3×3 two-dimensional convolutional layer, wherein the convolution stride of the two-dimensional convolutional layer is 3, the number of convolutional kernels is 16, and the inverted residual bottleneck block layer includes 8 MBConv blocks.
[0026] In one possible design, the image quality assessment results include: the sharpness score of the target accompanying document image, the sine and cosine values of the yaw angle of the target accompanying document image, and the illumination level of the target accompanying document image.
[0027] Among these, based on the image quality assessment results, it is determined whether the image quality of the target accompanying shipment image meets preset conditions, including:
[0028] Based on the sine and cosine values of the yaw angle, the angle deviation value of the target accompanying document image is calculated;
[0029] The sharpness score, the angle deviation value, and the illumination level are normalized to obtain normalized sharpness, normalized angle deviation, and normalized illumination level.
[0030] The normalized sharpness, the normalized angle deviation, and the normalized illumination level are weighted and summed to obtain the quality score of the target accompanying shipment image;
[0031] Determine whether the quality score is greater than or equal to the quality threshold;
[0032] If so, the image quality of the target accompanying order image is determined to meet the preset conditions.
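The scoring steps above can be sketched as follows. The weights, the 45-degree cap used to normalize the angle deviation, and the assumption that the illumination level already lies in [0, 1] are illustrative choices that do not come from the text.

```python
import math
from typing import Tuple


def angle_deviation_deg(sin_v: float, cos_v: float) -> float:
    """Absolute tilt in degrees, in [0, 180], recovered from sine and cosine."""
    return abs(math.degrees(math.atan2(sin_v, cos_v)))


def quality_score(
    sharpness: float,        # 0-100 sharpness score
    sin_v: float,
    cos_v: float,
    illumination: float,     # assumed to lie in [0, 1]
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical weights
    max_deviation_deg: float = 45.0,                        # hypothetical cap
) -> float:
    """Weighted sum of normalized sharpness, angle deviation, and illumination."""
    norm_sharp = min(max(sharpness / 100.0, 0.0), 1.0)
    deviation = angle_deviation_deg(sin_v, cos_v)
    # Smaller deviation is better, so invert it after scaling to [0, 1].
    norm_angle = 1.0 - min(deviation / max_deviation_deg, 1.0)
    norm_illum = min(max(illumination, 0.0), 1.0)
    w_sharp, w_angle, w_illum = weights
    return w_sharp * norm_sharp + w_angle * norm_angle + w_illum * norm_illum


def meets_preset_conditions(score: float, threshold: float) -> bool:
    """The image passes when the score is greater than or equal to the threshold."""
    return score >= threshold
```

With weights that sum to 1, the score stays in [0, 1], which makes it directly comparable to a per-type threshold.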
[0033] In one possible design, the quality threshold is determined in the following manner:
[0034] Document type recognition is performed on the target accompanying document image to obtain its document type, wherein the document type includes standard printed document, handwritten document, and damaged document;
[0035] Obtain a threshold mapping dictionary, wherein the threshold mapping dictionary stores score thresholds corresponding to different document types;
[0036] Based on the document type, the corresponding score threshold is matched in the threshold mapping dictionary, and the matched score threshold is used as the quality threshold.
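A threshold mapping dictionary of this kind might look like the sketch below. The three keys mirror the types named in the text, but the threshold values and the function name are purely hypothetical.

```python
from typing import Mapping

# Hypothetical score thresholds per document type (values are not from the source).
QUALITY_THRESHOLDS: Mapping[str, float] = {
    "standard_printed": 0.80,
    "handwritten": 0.70,
    "damaged": 0.60,
}


def quality_threshold_for(doc_type: str,
                          mapping: Mapping[str, float] = QUALITY_THRESHOLDS) -> float:
    """Look up the quality threshold that matches the recognized document type."""
    try:
        return mapping[doc_type]
    except KeyError:
        raise ValueError(f"no quality threshold configured for type {doc_type!r}") from None
```

Raising an error for an unknown type, instead of silently falling back to a default, is a design choice of this sketch.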
[0037] In one possible design, the difference image between the target accompanying document image and the reference accompanying document image is extracted, including:
[0038] Feature point matching processing is performed on the target accompanying document image and the reference accompanying document image to obtain matching feature pairs;
[0039] Based on the matching feature pairs, the perspective transformation matrix is calculated, and based on the perspective transformation matrix, the target accompanying document image and the reference accompanying document image are aligned to obtain the aligned target accompanying document image.
[0040] The aligned target shipment image and the reference shipment image are compared by performing a difference operation to obtain the initial difference image;
[0041] Contour extraction is performed on the initial difference image to obtain a difference contour image, and the difference contour image is used as the difference image;
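The difference and extraction steps can be illustrated with a simplified pure-Python sketch. It assumes two already aligned grayscale images stored as nested lists of 0-255 integers; feature point matching and perspective alignment are omitted, and a bounding box stands in for contour extraction. The tolerance `tol` and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # top, left, bottom, right (bottom/right exclusive)


def difference_mask(target: Image, reference: Image, tol: int = 16) -> List[List[bool]]:
    """Mark pixels whose absolute difference exceeds the tolerance."""
    if len(target) != len(reference) or any(
        len(t) != len(r) for t, r in zip(target, reference)
    ):
        raise ValueError("images must have the same shape")
    return [
        [abs(t - r) > tol for t, r in zip(t_row, r_row)]
        for t_row, r_row in zip(target, reference)
    ]


def bounding_box(mask: List[List[bool]]) -> Optional[Box]:
    """Smallest box that contains every changed pixel, or None if nothing changed."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, hit in enumerate(row) if hit]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def crop(image: Image, box: Box) -> Image:
    """Cut the changed region out of the target image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]
```

Only the cropped region would then be compressed and uploaded, which is where the saving over sending the full photo comes from.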
[0042] The process of transmitting compressed images to the cloud includes:
[0043] Obtain the ID information of the baseline shipping document image;
[0044] A data packet is generated using the compressed image, the perspective transformation matrix, and the ID information of the reference shipment document image;
[0045] The data packet is transmitted to the cloud so that the cloud can retrieve the reference accompanying document image based on the ID information in the data packet, and decompress the compressed image to obtain a differential image. The differential image is then pasted into the reference accompanying document image to obtain a reconstructed image. The reconstructed image is then corrected based on the perspective transformation matrix to restore the target accompanying document image.
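The packet round trip can be sketched as below. Several parts are assumptions: zlib stands in for the unspecified compression method, JSON with base64 is an arbitrary packet encoding, and the `box` field (where to paste the patch) is added here because a cropped patch needs a position. The final perspective correction is left out because it needs an image library; the matrix is simply returned.

```python
import base64
import json
import zlib
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]


def build_packet(patch: Image, box: Sequence[int],
                 matrix: List[List[float]], reference_id: str) -> bytes:
    """Bundle the compressed patch, the perspective matrix, and the reference ID."""
    raw = bytes(v for row in patch for v in row)  # pixel values must be 0-255
    payload = {
        "reference_id": reference_id,
        "matrix": matrix,
        "box": list(box),  # top, left, bottom, right (assumed extra field)
        "patch": base64.b64encode(zlib.compress(raw)).decode("ascii"),
    }
    return json.dumps(payload).encode("utf-8")


def reconstruct(packet: bytes,
                reference_store: Dict[str, Image]) -> Tuple[Image, List[List[float]]]:
    """Cloud side: fetch the reference by ID and paste the decompressed patch into it."""
    payload = json.loads(packet.decode("utf-8"))
    reference = reference_store[payload["reference_id"]]
    top, left, bottom, right = payload["box"]
    raw = zlib.decompress(base64.b64decode(payload["patch"]))
    width = right - left
    image = [row[:] for row in reference]  # copy so the stored reference is untouched
    for i, y in enumerate(range(top, bottom)):
        image[y][left:right] = raw[i * width:(i + 1) * width]
    # The returned matrix would be used to undo the alignment warp (not shown).
    return image, payload["matrix"]
```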
[0046] Secondly, a device for photo-based warehousing of goods accompanied by delivery notes based on correction prompts is provided, including:
[0047] The acquisition unit is used to acquire the image of the target accompanying shipping document;
[0048] The image quality assessment unit is used to perform image quality assessment on the target accompanying document image using an image quality assessment model, and obtain the image quality assessment result.
[0049] The image quality assessment unit is also used to determine, based on the image quality assessment results, whether the image quality of the target accompanying shipment image meets the preset conditions;
[0050] The image processing unit is configured to acquire a reference accompanying document image when the image quality assessment unit determines that the image quality meets the preset conditions, and extract the difference image between the target accompanying document image and the reference accompanying document image; and to output image shooting correction guidance prompt information when the image quality assessment unit determines that the image quality does not meet the preset conditions, so that after outputting the image shooting correction guidance prompt information, the target accompanying document image is reacquired until the image quality of the target accompanying document image meets the preset conditions. The difference image is used to characterize the image corresponding to the difference between the target accompanying document image and the reference accompanying document image.
[0051] A compression unit is used to compress the differential image to obtain a compressed image;
[0052] The transmission unit is used to transmit the compressed image to the cloud, so that the cloud can reconstruct the target shipment image based on the compressed image and the reference shipment image, and store the target shipment image in the database.
[0053] Thirdly, another device for photo-taking and warehousing with accompanying delivery note based on correction prompts is provided. Taking the device as an electronic device as an example, it includes a memory, a processor, and a transceiver that are connected in sequence. The memory is used to store computer programs, the transceiver is used to send and receive messages, and the processor is used to read the computer programs and execute the photo-taking and warehousing method based on correction prompts with accompanying delivery note as described in the first aspect or any possible design in the first aspect.
[0054] Fourthly, a storage medium is provided, on which instructions are stored, which, when executed on a computer, perform the method for photo-taking and warehousing based on correction prompts as described in the first aspect or any possible design of the first aspect.
[0055] Fifthly, a computer program product containing instructions is provided, which, when executed on a computer, causes the computer to perform the method for photo-taking and warehousing based on correction prompts as described in the first aspect or any possible design of the first aspect.
[0056] Beneficial effects:
[0057] (1) This invention utilizes an image quality assessment model to achieve quality assessment of the target accompanying order image. When the image quality does not meet the preset conditions, it outputs image shooting correction guidance information to prompt the user to take a corrective photo, thereby realizing the shooting correction guidance function. At the same time, after the image quality of the acquired target accompanying order image meets the preset conditions, a difference image is extracted to represent the difference between the target accompanying order image and the reference accompanying order image. Only the difference image is uploaded. Finally, the target accompanying order image is restored in the cloud using the difference image and the reference accompanying order image, thus completing the order entry into the warehouse. Therefore, this invention only transmits the difference between the current image and the reference image. Compared with traditional technology, it improves real-time performance and reduces invalid background, thereby avoiding the problem of resource waste. Based on this, this invention provides a real-time, resource-saving, quality assessment, and shooting guidance technology for accompanying order photography and warehouse entry, which can improve the accuracy of subsequent order recognition. Therefore, it is very suitable for large-scale application and promotion.
[0058] (2) The present invention dynamically adjusts the quality threshold according to different types of accompanying orders, thereby improving the accuracy and flexibility of image quality assessment.
Attached Figure Description
[0059] Figure 1 is a flowchart illustrating the steps of the method for photo-taking and warehousing with a delivery note based on correction prompts, as provided in an embodiment of the present invention;
[0060] Figure 2 is a network structure diagram of the image quality assessment model provided in an embodiment of the present invention;
[0061] Figure 3 is a network structure diagram of the inverted residual bottleneck block layer provided in an embodiment of the present invention;
[0062] Figure 4 is a schematic diagram of the structure of the accompanying order photo-taking and warehousing device based on correction prompts provided in an embodiment of the present invention;
[0063] Figure 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0064] To illustrate the technical solutions in the embodiments of the present invention or the prior art more clearly, the present invention is briefly introduced below in conjunction with the accompanying drawings and the description of the embodiments or the prior art. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort. It should be noted that the description of these embodiments is intended to aid understanding of the present invention and does not constitute a limitation of it.
[0065] It should be understood that although the terms first, second, etc., may be used herein to describe various units, these units should not be limited by these terms. These terms are only used to distinguish one unit from another. For example, a first unit may be referred to as a second unit, and similarly, a second unit may be referred to as a first unit, without departing from the scope of the exemplary embodiments of the invention.
[0066] It should be understood that the term "and / or" that may appear in this document merely describes a relationship between related objects, indicating that three relationships can exist. For example, A and / or B can mean: A exists alone, B exists alone, or A and B exist simultaneously. The term " / and" that may appear in this document describes another relationship between related objects, indicating that two relationships can exist. For example, A / and B can mean: A exists alone, or A and B exist simultaneously. In addition, the character " / " that may appear in this document generally indicates that the related objects before and after it are in an "or" relationship.
[0067] Example:
[0068] As shown in Figure 1, in the method provided in this embodiment for warehousing by photographing accompanying documents based on correction prompts, after the target accompanying document image is acquired, an image quality assessment model first assesses its quality to obtain an image quality assessment result. Based on that result, the method determines whether the image quality of the target accompanying document image meets preset conditions. If not, an image shooting correction guidance prompt is output to prompt the user to take a corrective photo and re-acquire the target accompanying document image, and the quality assessment process is repeated until the image quality meets the preset conditions. The subsequent differential transmission process can then proceed, namely: acquiring a reference accompanying document image and extracting a difference image that characterizes the difference between the target accompanying document image and the reference accompanying document image; then compressing the difference image to obtain a compressed image; and finally transmitting the compressed image to the cloud to complete the photographing and warehousing of the target accompanying document. This method thus provides a real-time, resource-saving technique for photographing accompanying documents for warehousing, with quality assessment and shooting guidance, which can improve the accuracy of subsequent document recognition and is therefore suitable for large-scale application. For example, the method can run on the terminal used to photograph accompanying documents. It is understood that this execution entity does not limit the embodiments of this application. Accordingly, the operation steps of this method can be, but are not limited to, steps S1 to S6 below.
[0069] S1. Obtain the target accompanying document image. In specific implementation, for example, but not limited to, a photographing terminal can be used to photograph the target accompanying document to obtain the corresponding target accompanying document image; the photographing terminal can be, but is not limited to, an Android-based handheld terminal. After the target accompanying document image is obtained, image quality assessment can be performed so that, depending on the assessment result, either image difference processing is carried out or an image shooting correction guidance prompt is output.
[0070] The quality assessment process is shown in steps S2 and S3 below.
[0071] S2. An image quality assessment model is used to assess the image quality of the target accompanying document and obtain the image quality assessment result. In specific applications, this embodiment constructs a multi-branch task image quality assessment model from three aspects: sharpness, tilt, and exposure (i.e., illumination level).
[0072] Optionally, one specific network structure of the aforementioned image quality assessment model is disclosed below.
[0073] As shown in Figure 2, the aforementioned image quality assessment model may include, but is not limited to, an initial convolutional layer, an inverted residual bottleneck block layer, a global average pooling layer, and a fully connected layer. For example, the fully connected layer includes a first branch connection layer, a second branch connection layer, and a third branch connection layer, and each branch connection layer has its own corresponding activation function. In this embodiment, the three branch connection layers correspond, in order, to the three sub-tasks of sharpness, tilt, and exposure.
[0074] Furthermore, the detailed working process of each of the aforementioned network layers is disclosed below:
[0075] In specific implementation, the initial convolutional layer is used to perform feature extraction processing on the input target image to obtain the first feature image. In this embodiment, the initial convolutional layer mainly uses a 3×3 two-dimensional convolutional layer to perform downsampling, thereby realizing the initial feature extraction. For example, the convolution stride of the aforementioned two-dimensional convolutional layer is 3, the number of convolutional kernels is 16, and the HSigmoid activation function is used. In this way, the two-dimensional convolutional layer can be used to complete the initial feature extraction and obtain the first feature map.
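To get a feel for how strongly a 3×3 convolution with stride 3 downsamples, the standard convolution output-size formula can be applied. The padding and the 224-pixel input side used below are not stated in the text and are assumptions for illustration only.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 3, padding: int = 0) -> int:
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    if size + 2 * padding < kernel:
        raise ValueError("input is smaller than the kernel")
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical 224-pixel input side, no padding: roughly one third of the input.
side = conv_output_size(224)
```

With 16 convolution kernels, a hypothetical 224×224 input would therefore yield a first feature map of 16 channels at about one third of the original resolution per side.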
[0076] Then, the first feature map is input to the inverted residual bottleneck block layer for secondary feature extraction. That is, the inverted residual bottleneck block layer performs feature re-extraction on the first feature image based on the depthwise separable convolution mechanism and the attention mechanism to obtain the second feature image. In specific applications, the inverted residual bottleneck block layer mainly uses depthwise separable convolution to perform convolution operations on the first feature image, thereby significantly reducing the number of parameters and the amount of computation. At the same time, it uses the internal Squeeze-and-Excitation (SE) attention module to adaptively calibrate the channel feature response, so that the network pays more attention to information-rich features. In this way, the depthwise separable convolution and the attention module together achieve secondary feature extraction.
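The parameter saving from depthwise separable convolution can be checked with simple arithmetic. The sketch below counts weights only (no biases); the channel counts in the example are hypothetical.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard convolution: k * k * C_in * C_out."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Depthwise (k * k * C_in) plus pointwise 1x1 (C_in * C_out) weights."""
    return kernel * kernel * c_in + c_in * c_out


# Hypothetical example: 3x3 kernel, 16 input channels, 32 output channels.
standard = standard_conv_params(3, 16, 32)          # 4608
separable = depthwise_separable_params(3, 16, 32)   # 656
reduction = standard / separable                    # about 7x fewer weights
```

The ratio approaches k² as the number of output channels grows, which is why the saving is larger for the 5×5 layer than for the 3×3 layer.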
[0077] Furthermore, the following discloses one specific structure of the inverted residual bottleneck block layer:
[0078] As shown in Figure 3, the inverted residual bottleneck block layer may include, but is not limited to, a bottleneck block and multiple sequentially connected MBConv blocks. The output feature of any MBConv block is fused with the input feature of that MBConv block to obtain a fused feature, which is then output to the next MBConv block (i.e., the MBConv blocks adopt a residual connection structure). The fused feature output by the last MBConv block is input to the bottleneck block, whose convolutional layers extract features and output the second feature image (the bottleneck block has no skip connections, i.e., no residual connections). That is, the output feature of the bottleneck block is the second feature image.
[0079] After describing the overall structure of the inverted residual bottleneck block layer, this embodiment discloses the detailed structure of any of the aforementioned MBConv blocks:
[0080] As shown in Figure 3, any MBConv block can include, but is not limited to, the following sequentially connected layers: a 1×1 up-dimensional convolutional layer, a 3×3 first depthwise separable convolutional layer, a 5×5 second depthwise separable convolutional layer, an SE attention layer (i.e., an SE attention module), and a 1×1 down-dimensional convolutional layer. Thus, the input features to any MBConv block are first processed by a 1×1 convolution in the up-dimensional convolutional layer. Then, depthwise convolution and pointwise convolution are performed within each depthwise separable convolutional layer to reduce the computational cost and the number of model parameters. Next, the channel attention mechanism in the SE attention layer recalibrates the channels of the feature map to enhance useful features and suppress useless ones. Dimensionality reduction is then performed through the down-dimensional convolutional layer to obtain the output features. Finally, the output features of the MBConv block are fused with its input features (e.g., concatenated) to obtain fused features, which are input to the next MBConv block.
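The channel recalibration performed by the SE attention layer can be sketched in plain Python. It follows the usual squeeze-and-excitation recipe: a global average per channel, a reducing layer with ReLU, an expanding layer with Sigmoid, and channel-wise scaling. The weight matrices, the omission of biases, and the function name are illustrative assumptions.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # channels x height x width


def se_attention(feature_map: FeatureMap,
                 w_reduce: Sequence[Sequence[float]],
                 w_expand: Sequence[Sequence[float]]) -> FeatureMap:
    """Scale each channel by a learned gate in (0, 1)."""
    # Squeeze: one average value per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: reduce the channel dimension and apply ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    # Excitation, step 2: expand back to one gate per channel and apply Sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Recalibrate: multiply every value in a channel by that channel's gate.
    return [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
```

Channels whose gate is close to 1 pass through almost unchanged, while channels with a gate near 0 are suppressed.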
[0081] Optionally, for example, the aforementioned inverted residual bottleneck block layer may include, but is not limited to, 8 MBConv blocks, and the bottleneck block includes three convolutional layers: the first and third are 1×1 convolutions for dimensionality reduction and dimensionality increase, respectively, and the middle one is a 3×3 convolution for feature extraction. The input and output dimensions of the bottleneck block are different.
[0082] Thus, after feature re-extraction by the aforementioned inverted residual bottleneck block layer, the resulting second feature map can be input into the global average pooling layer. That is, the global average pooling layer performs global average pooling on the second feature image to obtain a one-dimensional feature vector. It averages all values of each channel of the feature map, which avoids a large fully connected layer over the flattened feature map, greatly reducing parameters and helping to prevent overfitting. After global average pooling, the one-dimensional feature vector is input into each of the three branch connection layers, which output the different quality index values.
[0083] Specifically, the first branch connection layer is used to map the one-dimensional feature vector using the corresponding activation function to obtain the sharpness score of the target accompanying order image. In this embodiment, the first branch connection layer outputs a scalar value, which is mapped to 0-100 points through the Sigmoid function, representing the relative sharpness of the target accompanying order image.
[0084] Similarly, the second branch connection layer is used to map the one-dimensional feature vector using the corresponding activation function to obtain the sine and cosine values of the yaw angle of the target image. In specific implementation, the second branch connection layer outputs two values, representing the sine (sin(θ)) and cosine (cos(θ)) values of the yaw angle (or pitch angle), respectively. This representation avoids the problem of angle periodicity (for example, 359° and 1° are very close, but the values are very different), and it represents the degree of tilt of the image.
[0085] In practice, the output of this connection layer is processed by an activation function (such as tanh or linear) to give two scalar values, which correspond to the sine component sin(θ) and the cosine component cos(θ) of the yaw angle (or pitch angle) of the target accompanying document image, respectively. This two-parameter representation effectively solves the boundary discontinuity problem in the angle interval [0, 2π). For example, when the true angle varies slightly between 359° and 1°, direct regression of the angle produces a huge numerical jump, whereas the sine and cosine mapping ensures that the loss function varies continuously on the unit circle, thereby improving the model's stability and convergence speed in detecting image tilt.
[0086] Furthermore, in the second branch connection layer, the one-dimensional feature vector first passes through a fully connected layer (the fully connected layer has a dimension of 2, that is, the number of neurons is 2), thereby outputting two linear output values. Then, the tanh activation function is used to map the two linear output values to obtain the sine and cosine values.
[0087] When the output layer of the second branch connection layer adopts the tanh activation function, the output features can be restricted to the range of [-1,1], which is strictly aligned with the theoretical range of sine and cosine functions. This helps the model to quickly lock the parameter magnitude in the early stage of training and improve convergence efficiency.
[0088] This layer can also use linear activation combined with L2 norm normalization. By dividing the output vector by its magnitude, the output point is forced to fall on the unit circle, ensuring that the prediction result meets the geometric constraint that the sum of the squares of sin(θ) and cos(θ) equals 1, thus enhancing the physical rationality of the predicted angle. Of course, the processing also goes through a fully connected layer and then is mapped, and the process will not be described in detail.
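The angle representation discussed above can be illustrated with a short Python sketch. It shows that 359° and 1° are far apart as raw numbers but close as (sin, cos) pairs, and how L2 normalization places a raw two-value output on the unit circle. The function names are hypothetical and the sample values are chosen only for illustration.

```python
import math
from typing import Tuple


def encode_angle(theta_deg: float) -> Tuple[float, float]:
    """Represent an angle as its (sin, cos) pair."""
    t = math.radians(theta_deg)
    return math.sin(t), math.cos(t)


def l2_normalize(s: float, c: float, eps: float = 1e-12) -> Tuple[float, float]:
    """Divide the output vector by its magnitude so that s^2 + c^2 = 1."""
    norm = max(math.hypot(s, c), eps)
    return s / norm, c / norm


def decode_angle(s: float, c: float) -> float:
    """Recover the angle in degrees, in (-180, 180], via four-quadrant arctangent."""
    return math.degrees(math.atan2(s, c))


s359, c359 = encode_angle(359.0)
s1, c1 = encode_angle(1.0)

raw_gap = abs(359.0 - 1.0)                       # large jump in direct regression
encoded_gap = math.hypot(s359 - s1, c359 - c1)   # small distance on the unit circle

# A raw linear output that does not yet satisfy the unit-circle constraint.
s, c = l2_normalize(0.6, 1.6)
```

With tanh, each output is bounded to [-1, 1] but the pair is not guaranteed to lie on the unit circle; the normalization variant enforces that constraint explicitly.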
[0089] Finally, the third branch connection layer is used to map the one-dimensional feature vector using the corresponding activation function to obtain the illumination level of the target image. In this embodiment, the third branch connection layer outputs a scalar value, which can be defined as a discrete level (e.g., 0 indicates underexposure, 1 indicates normal exposure, and 2 indicates overexposure) or a continuous illumination intensity value. The final output result is obtained through the Softmax (discrete) or Sigmoid (continuous) function.
[0090] Thus, based on the quantitative results of the three quality indicators output by the aforementioned three branch connection layers, the image quality assessment result of the target accompanying document image can be generated. That is, the image quality assessment result of the target accompanying document image is composed of the sharpness score, illumination level, and sine and cosine values of the yaw angle of the target accompanying document image.
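To make the three-branch output concrete, the following is a rough sketch of how a one-dimensional feature vector could be mapped to the three quality indicators. The weights are toy values, the helper names (`dense`, `assess`) and the parameter dictionary layout are hypothetical, and the discrete illumination variant with Softmax is assumed.

```python
import math
from typing import Dict, List, Sequence


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs: Sequence[float]) -> List[float]:
    m = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def assess(feature: Sequence[float], params: Dict[str, dict]) -> Dict[str, float]:
    """Run the three branches on the pooled feature vector."""
    sharp_logit = dense(feature, **params["sharpness"])[0]
    sin_raw, cos_raw = dense(feature, **params["angle"])
    probs = softmax(dense(feature, **params["illumination"]))
    return {
        "sharpness": 100.0 * sigmoid(sharp_logit),   # branch 1: 0-100 points
        "sin": math.tanh(sin_raw),                   # branch 2: values in [-1, 1]
        "cos": math.tanh(cos_raw),
        "illumination": probs.index(max(probs)),     # branch 3: 0 / 1 / 2
    }


# Toy parameters, chosen only so the outputs are easy to check by hand.
params = {
    "sharpness": {"weights": [[0.0, 0.0]], "bias": [0.0]},
    "angle": {"weights": [[0.0, 0.0], [1.0, 0.0]], "bias": [0.0, 0.0]},
    "illumination": {"weights": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
                     "bias": [0.0, 0.0, 0.0]},
}
result = assess([1.0, -1.0], params)
```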
[0091] Of course, the aforementioned image quality assessment model is pre-trained: it is trained by taking each sample accompanying document image as input, and taking the sharpness score, illumination level, and sine and cosine values of the yaw angle of each sample accompanying document image as output.
[0092] Therefore, after obtaining the image quality assessment result of the target accompanying order image through the aforementioned image quality assessment model, the image quality can be judged based on this, as shown in step S3 below.
[0093] S3. Based on the image quality assessment results, determine whether the image quality of the target accompanying order image meets the preset conditions; in specific applications, for example, but not limited to, the following steps S31 to S35 can be used to determine whether the image quality meets the preset conditions.
[0094] S31. Based on the sine and cosine values of the yaw angle, calculate the angular deviation value of the target accompanying document image. In this embodiment, the formula for calculating the angular deviation value may be, but is not limited to: θ = atan2(sin(θ), cos(θ)), where atan2 represents the four-quadrant arctangent function, and sin(θ) and cos(θ) are the sine and cosine values output by the second branch connection layer.
[0095] After calculating the angle deviation value, the data can be normalized to unify the dimensions, as shown in step S32 below.
[0096] S32. Normalize the sharpness score, the angle deviation value, and the illumination level to obtain normalized sharpness, normalized angle deviation, and normalized illumination level. In this embodiment, for example, the three indicators are normalized to a score between 0-100 or 0-1. The sharpness score itself is between 0-100, so no normalization is required. The angle deviation value can be converted into a score.
[0097] Specifically, for example, but not limited to, the following formula can be used to obtain the normalized angle deviation.
[0098] ;
[0099] In the formula, the symbols denote, respectively, the normalized angle deviation, the angular deviation value, and the scaling factor, whose value is 2.
[0100] Finally, for the illumination level, the normalization process is as follows: if the illumination level is a discrete value (0, 1, 2), it can be mapped to (0, 100, 0) points; while if it is a continuous value, it can be mapped to 0-100 points.
[0101] Thus, after the aforementioned three indicators have been normalized, the quality score of the target accompanying document image can be calculated, as shown in step S33 below.
[0102] S33. Perform a weighted summation of the normalized sharpness, the normalized angle deviation, and the normalized illumination level to obtain the quality score of the target accompanying image; in this embodiment, for example, the weights of the aforementioned normalized sharpness, the normalized angle deviation, and the normalized illumination level may be, but are not limited to, 0.5, 0.3, and 0.2.
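Steps S31 to S33 can be sketched as follows. The four-quadrant arctangent, the discrete illumination mapping (0, 100, 0), and the example weights 0.5, 0.3, and 0.2 come from the text. The normalization formula for the angle deviation is not reproduced in this text, so the linear penalty of k points per degree used below (with k = 2) is purely an assumption for illustration, as is the use of the absolute deviation in degrees.

```python
import math

# Example weights from the text: sharpness, angle deviation, illumination.
WEIGHTS = {"sharpness": 0.5, "angle": 0.3, "illumination": 0.2}

# Discrete illumination levels mapped to points: under, normal, over.
DISCRETE_ILLUMINATION_SCORE = {0: 0.0, 1: 100.0, 2: 0.0}


def angle_deviation_deg(sin_value: float, cos_value: float) -> float:
    """S31: angular deviation from the predicted sine and cosine values."""
    return abs(math.degrees(math.atan2(sin_value, cos_value)))


def normalize_angle(deviation_deg: float, k: float = 2.0) -> float:
    """S32 (ASSUMED form): lose k points per degree, clamped to [0, 100]."""
    return max(0.0, 100.0 - k * deviation_deg)


def quality_score(sharpness: float, sin_value: float, cos_value: float,
                  illumination_level: int) -> float:
    """S33: weighted sum of the three normalized indicators."""
    angle_score = normalize_angle(angle_deviation_deg(sin_value, cos_value))
    illumination_score = DISCRETE_ILLUMINATION_SCORE[illumination_level]
    return (WEIGHTS["sharpness"] * sharpness
            + WEIGHTS["angle"] * angle_score
            + WEIGHTS["illumination"] * illumination_score)


# Sharpness 90, a 5 degree tilt, normal exposure.
theta = math.radians(5.0)
score = quality_score(90.0, math.sin(theta), math.cos(theta), 1)
```

Under these assumptions the example evaluates to 0.5 × 90 + 0.3 × 90 + 0.2 × 100 = 92 points.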
[0103] After calculating the quality score based on the aforementioned step S33, the quality score can be compared with the quality threshold to determine whether the image quality of the target accompanying order image meets the preset conditions; the determination process is as shown in the following steps S34 and S35.
[0104] S34. Determine whether the quality score is greater than or equal to the quality threshold; In specific applications, this embodiment provides a dynamic adjustment method for the quality threshold to improve the flexibility of use and the accuracy of quality judgment; For example, but not limited to, the following steps can be used to determine the quality threshold used this time.
[0105] Step 1: Perform order type identification on the target accompanying document image to obtain its order type. The order type includes standard printed orders, handwritten orders, and damaged orders. In this embodiment, for example, but not limited to, a lightweight image classification model can be used to quickly classify the image and obtain its order type. The image classification model can share the backbone network with the image quality assessment model, that is, share the initial convolutional layer, the inverted residual bottleneck block layer, and the global average pooling layer; a branch fully connected layer is then added to learn the order classification subtask. Further, a standard printed order is an order without damage or handwritten content, a handwritten order is an order with handwritten content, and a damaged order is an order with creases and/or stains.
[0106] Meanwhile, this embodiment maintains a type-threshold mapping dictionary (i.e., the threshold mapping dictionary below). Therefore, after obtaining the order type, threshold matching can be performed based on the mapping dictionary, as shown in the second and third steps below.
[0107] Step 2: Obtain the threshold mapping dictionary, which stores the score thresholds corresponding to different order types. In this embodiment, for example, the score threshold for a standard printed order is 85, the score threshold for a handwritten order is 75, and the score threshold for a damaged order is 65. Of course, the score thresholds for the aforementioned different order types can be set according to actual use, and this embodiment is not limited to the aforementioned example.
[0108] After obtaining the threshold mapping dictionary, the quality threshold corresponding to the target accompanying document image can be matched, as shown in the following process.
[0109] Step 3: Based on the order type, match the score threshold corresponding to the order type in the threshold mapping dictionary, and use the matched score threshold as the quality threshold.
[0110] Thus, this embodiment dynamically adjusts the quality threshold according to different order types, which can improve the accuracy and flexibility of image quality assessment and thus adapt to different order scenarios.
[0111] After obtaining the quality threshold, the quality score of the target accompanying document image can be compared with it. When the quality score is greater than or equal to the quality threshold, the image quality of the target accompanying document image is determined to be qualified, that is, it meets the conditions of sharpness, tilt and exposure. Otherwise, the image quality is determined to be unqualified. The process is shown in step S35.
[0112] S35. If so, determine that the image quality of the target accompanying order image meets the preset conditions.
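The threshold lookup and comparison in steps S34 and S35 can be sketched as below. The threshold values 85, 75, and 65 are the examples given in the text; the dictionary keys and the function name `meets_preset_conditions` are hypothetical.

```python
from typing import Dict

# Example type-threshold mapping dictionary (values from the text, keys assumed).
THRESHOLD_MAP: Dict[str, float] = {
    "standard_printed": 85.0,
    "handwritten": 75.0,
    "damaged": 65.0,
}


def meets_preset_conditions(quality_score: float, order_type: str,
                            threshold_map: Dict[str, float] = THRESHOLD_MAP) -> bool:
    """Match the quality threshold by order type, then compare the score to it."""
    quality_threshold = threshold_map[order_type]
    return quality_score >= quality_threshold


# The same score of 80 passes for a handwritten order but not for a printed one.
handwritten_ok = meets_preset_conditions(80.0, "handwritten")
printed_ok = meets_preset_conditions(80.0, "standard_printed")
```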
[0113] After completing the quality assessment of the target shipment image through the aforementioned steps S31 to S35, different processing methods can be selected based on the image quality judgment results, as shown in step S4 below.
[0114] S4. If so, acquire the baseline accompanying document image and extract the difference image between the target accompanying document image and the baseline accompanying document image; otherwise, output image shooting correction guidance information, and after outputting it, reacquire the target accompanying document image until the image quality of the target accompanying document image meets the preset conditions. The difference image characterizes the differences between the target accompanying document image and the baseline accompanying document image. In this embodiment, when the image quality does not meet the preset conditions, image shooting correction guidance information is output to prompt the user to retake the photo with corrections, and the target accompanying document image is then reacquired. The image quality is evaluated again, and this repeats until the image quality meets the preset conditions, after which the differential transmission process can begin.
[0115] Furthermore, the image shooting correction guidance information may include, but is not limited to, a quality score, quality issues, and shooting suggestions. Among these, adjustment suggestions corresponding to different quality issues are pre-stored (such as adjustment suggestions for overexposure, adjustment suggestions for blur, etc.). When in use, the aforementioned quality assessment results are used to match the issues, and then the suggestions are matched to obtain the adjustment suggestions.
[0116] Furthermore, a circular progress bar can be used to display the quality score on the image capture interface of the order form, and to show quality problems such as underexposure, overexposure, tilt, and blur, as well as output adjustment suggestions, such as using dynamic AR arrows to guide the adjustment direction. Based on this, the function of shooting correction guidance can be realized.
[0117] In addition, in this embodiment, if the quality assessment fails three times in a row, a voice prompt can be triggered.
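A possible shape for the correction guidance logic is sketched below: quality issues are matched from the assessment result, pre-stored suggestions are looked up, and a voice prompt flag is raised after three consecutive failures. The suggestion texts, the per-issue limits (`min_sharpness`, `max_deviation_deg`), and all names are assumptions for illustration; only the three-failure rule and the issue categories come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical pre-stored adjustment suggestions, one per quality issue.
SUGGESTIONS: Dict[str, str] = {
    "blur": "Hold the device steady and refocus on the document.",
    "tilt": "Rotate the device until the document edges are level.",
    "underexposure": "Move to a brighter area or turn on the light.",
    "overexposure": "Avoid direct light or reflections on the document.",
}


def detect_issues(sharpness: float, deviation_deg: float, illumination_level: int,
                  min_sharpness: float = 60.0,
                  max_deviation_deg: float = 10.0) -> List[str]:
    """Match quality issues from the assessment result (limits are assumed)."""
    issues = []
    if sharpness < min_sharpness:
        issues.append("blur")
    if deviation_deg > max_deviation_deg:
        issues.append("tilt")
    if illumination_level == 0:
        issues.append("underexposure")
    elif illumination_level == 2:
        issues.append("overexposure")
    return issues


@dataclass
class GuidanceSession:
    consecutive_failures: int = 0

    def report(self, passed: bool, score: float, issues: List[str]) -> dict:
        """Build the guidance information for one assessment."""
        self.consecutive_failures = 0 if passed else self.consecutive_failures + 1
        return {
            "passed": passed,
            "quality_score": score,
            "issues": [] if passed else issues,
            "suggestions": [] if passed else [SUGGESTIONS[i] for i in issues],
            "voice_prompt": (not passed) and self.consecutive_failures >= 3,
        }


session = GuidanceSession()
issues = detect_issues(sharpness=40.0, deviation_deg=15.0, illumination_level=2)
reports = [session.report(False, 55.0, issues) for _ in range(3)]
```

A user interface could render `quality_score` in the progress bar and map each issue to a visual cue; that rendering is outside the scope of this sketch.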
[0118] Optionally, when the image quality of the target accompanying document image meets the preset conditions, differential transmission can be performed, that is, the differential image between the target accompanying document image and the reference accompanying document image can be extracted. The reference accompanying document image is determined as follows: when the current document is photographed and put into storage, the first image of the accompanying document whose image quality meets the preset conditions is used as the reference frame (its ID information is recorded), that is, as the reference accompanying document image, and is encoded with a standard, high compression ratio (such as JPEG quality set to 75), and then transmitted to the cloud; this frame serves as the reference image for subsequent frames.
[0119] Thus, after obtaining the baseline accompanying document image, the differential image can be extracted, and the process can be, but is not limited to, as shown in steps S41 to S44 below.
[0120] S41. Perform feature point matching processing on the target accompanying document image and the reference accompanying document image to obtain matching feature pairs. In this embodiment, for example, but not limited to, a fast feature point detection algorithm (such as ORB) can be used to perform feature matching on the reference frame and the target accompanying document image to obtain matching feature pairs (i.e., matching feature point pairs). Then, based on this, the perspective transformation matrix can be determined, and the process is shown in step S42 below.
[0121] S42. Based on the matching feature pairs, calculate the perspective transformation matrix, and based on the perspective transformation matrix, align the target accompanying document image with the reference accompanying document image to obtain the aligned target accompanying document image. In this embodiment, using the perspective transformation matrix to align the target accompanying document image with the reference accompanying document image can eliminate lens shake or displacement. Of course, the aforementioned feature matching, perspective transformation matrix calculation, and image alignment are all common image processing techniques, and their principles will not be elaborated upon one by one.
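For readers unfamiliar with perspective transformation, the following sketch shows how a 3×3 perspective transformation matrix maps a pixel coordinate, which is the basic operation behind the alignment in step S42. Estimating the matrix from the matched feature pairs is not shown; the matrices below are hand-written examples and the function name `warp_point` is hypothetical.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def warp_point(h: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Apply a 3x3 perspective transformation to one point (homogeneous divide)."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this transformation")
    return xh / w, yh / w


# A pure displacement of the camera: shift by (+5, -3) pixels.
shift = [[1.0, 0.0, 5.0],
         [0.0, 1.0, -3.0],
         [0.0, 0.0, 1.0]]
moved = warp_point(shift, 10.0, 10.0)

# A mild perspective term in the last row rescales points depending on x.
perspective = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.001, 0.0, 1.0]]
foreshortened = warp_point(perspective, 100.0, 50.0)
```

Aligning a whole image amounts to applying this mapping (or its inverse) to every pixel location and resampling.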
[0122] Once the aligned target image is obtained, differential calculation can be performed, as shown in step S43 below.
[0123] S43. Perform a difference operation on the aligned target shipment image and the reference shipment image to obtain an initial difference image. In this embodiment, the absolute value difference operation is performed on the grayscale images of the two to obtain the initial difference image. The brighter areas in this image represent the parts with greater differences between the two images. After obtaining the initial difference image, binarization and contour finding can be performed, as shown in step S44 below.
[0124] S44. Extract contours from the initial difference image to obtain a difference contour image, and use the difference contour image as the difference image. In specific implementation, threshold binarization is performed on the initial difference image and contours are then found; the regions enclosed by these contours are the parts that have changed relative to the reference frame, such as added handwritten notes or stamped seals. In this way, the image blocks within the contours, together with their recorded coordinate information, are used as the difference image.
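Steps S43 and S44 can be approximated with the dependency-free sketch below, which operates on already aligned grayscale images stored as nested lists. It takes the absolute difference, binarizes it, and collects the bounding box and pixels of each changed region. Connected-component bounding boxes are used here as a simple stand-in for contour finding, and the binarization threshold of 30 is an assumed value.

```python
from collections import deque
from typing import List, Tuple

Gray = List[List[int]]
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def abs_diff(a: Gray, b: Gray) -> Gray:
    """S43: per-pixel absolute difference of two aligned grayscale images."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def binarize(img: Gray, threshold: int = 30) -> List[List[bool]]:
    return [[p > threshold for p in row] for row in img]


def changed_regions(mask: List[List[bool]]) -> List[Box]:
    """S44 stand-in: bounding box of each 4-connected changed region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes


def extract_patches(target: Gray, boxes: List[Box]) -> List[Tuple[Box, Gray]]:
    """Cut the image blocks inside each box and keep their coordinates."""
    return [(box, [row[box[0]:box[0] + box[2]] for row in target[box[1]:box[1] + box[3]]])
            for box in boxes]


reference = [[0] * 5 for _ in range(4)]
target = [row[:] for row in reference]
target[1][1] = target[1][2] = target[2][1] = 200  # e.g. a newly added stamp

boxes = changed_regions(binarize(abs_diff(target, reference)))
patches = extract_patches(target, boxes)
```

Only the patches and their boxes need to be transmitted, which is where the reduction of invalid background comes from.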
[0125] After the differential image is extracted through the aforementioned steps S41 to S44, image compression can be performed, as shown in step S5 below.
[0126] S5. The differential image is compressed to obtain a compressed image; after image compression is completed, data transmission can be performed, as shown in step S6 below.
[0127] S6. The compressed image is transmitted to the cloud so that the cloud can reconstruct the target shipment image based on the compressed image and the reference shipment image, and store the target shipment image in the database. In specific implementation, the compressed image is transmitted as follows. First, the ID information of the reference shipment image is obtained. Then, a data packet is generated from the compressed image, the perspective transformation matrix, and the ID information of the reference shipment image. Finally, the data packet is transmitted to the cloud. The cloud retrieves the reference shipment image based on the ID information in the data packet and decompresses the compressed image in the data packet to obtain the difference image. It then pastes the difference image into the reference shipment image (i.e., pastes it back at the corresponding position of the reference shipment image according to the coordinate information of the corresponding area of the difference image) to obtain the reconstructed image. The reconstructed image is then corrected based on the perspective transformation matrix to restore the target shipment image. Finally, the target shipment image is sent to the OCR engine for recognition to obtain the shipment data, which is then entered into the database.
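The packet generation and cloud-side paste-back described above might look like the following sketch. The JSON packet layout, the field names, and the in-memory `reference_store` are assumptions; `zlib` is used as a lossless stand-in for whatever image codec is actually applied in step S5, and the final perspective correction is left out (the matrix is simply returned alongside the pasted image).

```python
import json
import zlib
from typing import Dict, List, Tuple

Gray = List[List[int]]
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def build_packet(reference_id: str, matrix: List[List[float]],
                 patches: List[Tuple[Box, Gray]]) -> bytes:
    """Device side: bundle the patches, the matrix, and the reference ID."""
    payload = {
        "reference_id": reference_id,
        "matrix": matrix,
        "patches": [{"box": list(box), "pixels": rows} for box, rows in patches],
    }
    return zlib.compress(json.dumps(payload).encode("utf-8"))


def reconstruct(packet: bytes, reference_store: Dict[str, Gray]):
    """Cloud side: look up the reference image and paste the patches back."""
    payload = json.loads(zlib.decompress(packet).decode("utf-8"))
    reference = reference_store[payload["reference_id"]]
    image = [row[:] for row in reference]  # never modify the stored reference
    for patch in payload["patches"]:
        x, y, w, h = patch["box"]
        for dy in range(h):
            image[y + dy][x:x + w] = patch["pixels"][dy]
    return image, payload["matrix"]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reference_store = {"ref-001": [[0] * 5 for _ in range(4)]}
packet = build_packet("ref-001", identity, [((1, 1, 2, 2), [[200, 200], [200, 0]])])
image, matrix = reconstruct(packet, reference_store)
```

In a full pipeline the returned matrix would then be used to correct the pasted image before it is handed to the OCR engine.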
[0128] Optionally, in this embodiment, the success rate of each type of order can be recorded. If the pass rate of a certain type of order is consistently too low, its corresponding threshold can be automatically adjusted to achieve more intelligent optimization.
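One simple way such an automatic adjustment could work is sketched below. The text does not specify the rule, so every parameter here (minimum pass rate, step size, lower bound, minimum number of attempts) and the function name `adjust_threshold` are assumptions.

```python
def adjust_threshold(threshold: float, passes: int, attempts: int,
                     min_pass_rate: float = 0.5, step: float = 2.0,
                     floor: float = 50.0, min_attempts: int = 20) -> float:
    """Lower a type's threshold slightly when its recorded pass rate is too low."""
    if attempts < min_attempts:
        return threshold  # not enough data to judge yet
    if passes / attempts < min_pass_rate:
        return max(floor, threshold - step)
    return threshold


# 5 passes out of 20 attempts is below the assumed 50% target, so 85 becomes 83.
new_threshold = adjust_threshold(85.0, passes=5, attempts=20)
```

The lower bound keeps repeated adjustments from driving a threshold so low that unreadable images would be accepted.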
[0129] In addition, this embodiment provides comparative data on the effects of the method provided in this embodiment and traditional techniques.
[0130] Based on on-site testing (at a cold chain logistics warehouse, March 2024), the following results were achieved:
[0131] Table 1 is a comparison table of the effects of the method provided in the embodiments and the traditional technology.
[0132] Table 1
[0133]
[0134] In the table above, the improvement in single processing time refers to the processing efficiency of the present invention, which is 66.7% higher than that of the traditional solution. The improvement in network traffic consumption refers to the network traffic consumption of the present invention, which is 35% lower than that of the traditional solution.
[0135] The aforementioned comparison of effects demonstrates that the real-time performance and accuracy of the method provided in this implementation are significantly higher than those of traditional technologies.
[0136] Therefore, through the aforementioned steps S1 to S6, which describe in detail the method of taking photos of goods with the delivery note for warehousing, this invention provides a real-time, resource-saving, quality assessment, and photo-taking guidance technology for taking photos of goods with the delivery note. This technology can improve the accuracy of subsequent goods note recognition and is therefore very suitable for large-scale application and promotion.
[0137] As shown in Figure 4, the second aspect of this embodiment provides a hardware device for implementing the method of taking photos of goods documents for warehousing based on correction prompts as described in the first aspect of the embodiment, including:
[0138] The acquisition unit is used to acquire the image of the target shipment document.
[0139] The image quality assessment unit is used to perform image quality assessment on the target accompanying document image using an image quality assessment model, and obtain the image quality assessment result.
[0140] The image quality assessment unit is also used to determine, based on the image quality assessment results, whether the image quality of the target accompanying shipment image meets the preset conditions.
[0141] The image processing unit is configured to acquire a reference accompanying document image when the image quality assessment unit determines that the image quality meets preset conditions, and extract the difference image between the target accompanying document image and the reference accompanying document image; and to output image shooting correction guidance information when the image quality assessment unit determines that the image quality does not meet preset conditions, so that after outputting the image shooting correction guidance information, the target accompanying document image is reacquired until the image quality of the target accompanying document image meets preset conditions. The difference image is used to characterize the image corresponding to the difference between the target accompanying document image and the reference accompanying document image.
[0142] A compression unit is used to compress the differential image to obtain a compressed image.
[0143] The transmission unit is used to transmit the compressed image to the cloud, so that the cloud can reconstruct the target shipment image based on the compressed image and the reference shipment image, and store the target shipment image in the database.
[0144] The working process, working details and technical effects of the device provided in this embodiment can be found in the first aspect of the embodiment, and will not be repeated here.
[0145] As shown in Figure 5, the third aspect of this embodiment provides another device for photo-taking and warehousing with a delivery note based on correction prompts. Taking an electronic device as an example, the device includes a memory, a processor, and a transceiver that are connected in sequence. The memory is used to store a computer program, the transceiver is used to send and receive messages, and the processor is used to read the computer program and execute the photo-taking and warehousing method with a delivery note based on correction prompts as described in the first aspect of the embodiment.
[0146] For example, the memory may include, but is not limited to, random access memory (RAM), read-only memory (ROM), flash memory, first-in-first-out (FIFO) memory, and/or first-in-last-out (FILO) memory. The processor may include one or more processing cores, such as a 4-core or 8-core processor. The processor may be implemented in at least one hardware form among DSP (Digital Signal Processor), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor may also include a main processor and a coprocessor. The main processor, also known as the CPU (Central Processing Unit), processes data in the wake-up state; the coprocessor is a low-power processor that processes data in the standby state.
[0147] In some embodiments, the processor may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. For example, the processor may be, but is not limited to, a microprocessor of the STM32F105 series, a reduced instruction set computer (RISC) microprocessor, an x86 architecture processor, or a processor with an integrated neural network processing unit (NPU). The transceiver may be, but is not limited to, a Wi-Fi transceiver, a Bluetooth transceiver, a General Packet Radio Service (GPRS) transceiver, a ZigBee (a low-power LAN protocol based on the IEEE 802.15.4 standard) transceiver, a 3G transceiver, a 4G transceiver, and/or a 5G transceiver. Furthermore, the device may also include, but is not limited to, a power module, a display screen, and other necessary components.
[0148] The working process, working details and technical effects of the electronic device provided in this embodiment can be found in the first aspect of the embodiment, and will not be repeated here.
[0149] The fourth aspect of this embodiment provides a storage medium storing instructions that implement the method for taking photos of goods documents for warehousing based on correction prompts as described in the first aspect of the embodiment. That is, the storage medium stores instructions that, when run on a computer, execute the method for taking photos of goods documents for warehousing based on correction prompts as described in the first aspect of the embodiment.
[0150] The storage medium refers to a carrier for storing data, which may include, but is not limited to, floppy disks, optical disks, hard disks, flash memory, USB flash drives, and / or memory sticks. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
[0151] The working process, working details and technical effects of the storage medium provided in this embodiment can be found in the first aspect of the embodiment, and will not be repeated here.
[0152] The fifth aspect of this embodiment provides a computer program product containing instructions that, when executed on a computer, cause the computer to perform the method for taking photos of goods and putting them into storage based on correction prompts as described in the first aspect of the embodiment. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
[0153] Finally, it should be noted that the above description is merely a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for warehousing by photographing accompanying delivery notes based on correction prompts, characterized in that, include: Obtain the image of the target shipment document; An image quality assessment model is used to assess the image quality of the target accompanying document image, and the image quality assessment results are obtained. Based on the image quality assessment results, determine whether the image quality of the target accompanying shipment image meets the preset conditions; If so, then acquire the baseline packing slip image and extract the difference image between the target packing slip image and the baseline packing slip image; otherwise, output image shooting correction guidance prompt information, so that after outputting the image shooting correction guidance prompt information, reacquire the target packing slip image until the image quality of the target packing slip image meets the preset conditions. The difference image is used to characterize the image corresponding to the difference between the target packing slip image and the baseline packing slip image. The differential image is compressed to obtain a compressed image; The compressed image is transmitted to the cloud so that the cloud can reconstruct the target shipment image based on the compressed image and the reference shipment image, and then store the target shipment image in the database. The image quality assessment model includes: an initial convolutional layer, an inverted residual bottleneck block layer, a global average pooling layer, and a fully connected layer. The fully connected layer includes a first branch connection layer, a second branch connection layer, and a third branch connection layer, and each branch connection layer has its own corresponding activation function. 
The initial convolutional layer is used to perform feature extraction processing on the input target image along with the delivery note to obtain the first feature image; The inverted residual bottleneck block layer is used to perform feature re-extraction processing on the first feature image based on the depthwise separable convolution mechanism and the attention mechanism to obtain the second feature image; A global average pooling layer is used to perform global average pooling on the second feature image to obtain a one-dimensional feature vector, and then input the one-dimensional feature vector into the three branch connection layers respectively. The first branch connection layer is used to map the one-dimensional feature vector using the corresponding activation function to obtain the clarity score of the target image along with the shipment order. The second branch connection layer is used to map the one-dimensional feature vector using the corresponding activation function to obtain the sine and cosine values of the yaw angle of the target along the waybill image. The third branch connection layer is used to map the one-dimensional feature vector using the corresponding activation function to obtain the illumination level of the target image along with the shipment order. The image quality assessment result of the target cargo manifest image is composed of the sharpness score, illumination level, and sine and cosine values of the yaw angle. Based on the image quality assessment results, determine whether the image quality of the target accompanying shipment image meets preset conditions, including: Based on the sine and cosine values of the yaw angle, the angular deviation value of the target along with the cargo manifest image is calculated; The sharpness score, the angle deviation value, and the illumination level are normalized to obtain normalized sharpness, normalized angle deviation, and normalized illumination level. 
The normalized sharpness, the normalized angle deviation, and the normalized illumination level are weighted and summed to obtain the quality score of the target accompanying shipment image; Determine whether the quality score is greater than or equal to the quality threshold; If so, the image quality of the target accompanying order image is determined to meet the preset conditions; The quality threshold is determined in the following manner; The order type is identified by performing order type recognition on the target order image to obtain the order type of the target order image, wherein the order type includes standard printed order, handwritten order and damaged order; Obtain a threshold mapping dictionary, wherein the threshold mapping dictionary stores score thresholds corresponding to different bill of lading types; Based on the order type, a score threshold corresponding to the order type is matched in the threshold mapping dictionary, and the matched score threshold is used as the quality threshold. Extracting the difference image between the target accompanying document image and the reference accompanying document image includes: Feature point matching processing is performed on the target accompanying document image and the reference accompanying document image to obtain matching feature pairs; Based on the matching feature pairs, the perspective transformation matrix is calculated, and based on the perspective transformation matrix, the target accompanying document image and the reference accompanying document image are aligned to obtain the aligned target accompanying document image. 
The aligned target shipment image and the reference shipment image are compared by performing a difference operation to obtain the initial difference image; Contour extraction is performed on the initial difference image to obtain a difference contour image, and the difference contour image is used as the difference image; The process of transmitting compressed images to the cloud includes: Obtain the ID information of the baseline shipping document image; A data packet is generated using the compressed image, the perspective transformation matrix, and the ID information of the reference shipment document image; The data packet is transmitted to the cloud so that the cloud can retrieve the reference accompanying document image based on the ID information in the data packet, and decompress the compressed image to obtain a differential image. The differential image is then pasted into the reference accompanying document image to obtain a reconstructed image. The reconstructed image is then corrected based on the perspective transformation matrix to restore the target accompanying document image.
2. The method according to claim 1, characterized in that the inverted residual bottleneck block layer includes a bottleneck block and multiple MBConv blocks connected in sequence; the output feature of any MBConv block is fused with the input feature of that MBConv block to obtain a fused feature, which is output to the next MBConv block; the fused feature output by the last MBConv block is input to the bottleneck block, and the bottleneck block outputs the second feature image; each MBConv block includes, connected in sequence, a 1×1 dimension-raising convolutional layer, a 3×3 first depthwise separable convolutional layer, a 5×5 second depthwise separable convolutional layer, an SE (squeeze-and-excitation) attention layer, and a 1×1 dimension-reducing convolutional layer.
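The data flow of claim 2 (five stages per MBConv block, input/output fusion after every block, then the bottleneck block) can be traced with a toy sketch. The stages below are placeholder arithmetic on a list of numbers, not real convolutions, and element-wise addition is assumed as the fusion operation; the claim does not specify it. All names are hypothetical.

```python
# Stage order inside one MBConv block, as listed in claim 2.
STAGE_ORDER = ["expand_1x1", "dwsep_3x3", "dwsep_5x5", "se_attention", "project_1x1"]


def make_mbconv(stages):
    """Compose the five stages of one MBConv block in the claimed order."""
    def block(x):
        for name in STAGE_ORDER:
            x = stages[name](x)
        return x
    return block


def fuse(block_output, block_input):
    """Fusion of a block's output with its input (assumed: element-wise addition)."""
    return [o + i for o, i in zip(block_output, block_input)]


def run_inverted_residual_layer(features, mbconv_blocks, bottleneck):
    """Chain the MBConv blocks with fusion, then apply the bottleneck block."""
    x = features
    for block in mbconv_blocks:
        x = fuse(block(x), x)  # the fused feature feeds the next block
    return bottleneck(x)


def _project(x):
    """Toy dimension reduction: fold the doubled channel list back in half."""
    n = len(x) // 2
    return [a + b for a, b in zip(x[:n], x[n:])]


# Placeholder stages: they only mimic the channel expansion and reduction.
toy_stages = {
    "expand_1x1": lambda x: x + x,                    # doubles the "channels"
    "dwsep_3x3": lambda x: [v * 0.5 for v in x],
    "dwsep_5x5": lambda x: [v * 0.5 for v in x],
    "se_attention": lambda x: [v * 1.0 for v in x],   # unit gate for simplicity
    "project_1x1": _project,
}

blocks = [make_mbconv(toy_stages) for _ in range(2)]
result = run_inverted_residual_layer([2.0, 4.0], blocks, bottleneck=list)
```

In a real network each stage would be a convolutional layer and the projected output would need the same shape as the block input for the fusion to be well defined; the toy stages are chosen to preserve that property.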
3. The method according to claim 1, characterized in that the initial convolutional layer is a 3×3 two-dimensional convolutional layer with a convolution stride of 3 and 16 convolution kernels, and the inverted residual bottleneck block layer includes 8 MBConv blocks.
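To see what the initial layer of claim 3 does to the input resolution, the standard convolution output-size formula can be evaluated for a 3×3 kernel with stride 3. The sketch below assumes zero padding, a 3-channel (RGB) input of 640×480 pixels, and one bias per kernel; none of these values are stated in the claim.

```python
def conv2d_output_size(size, kernel=3, stride=3, padding=0):
    """Spatial output size of a 2-D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_param_count(in_channels, out_channels=16, kernel=3, bias=True):
    """Number of learnable parameters in a standard 2-D convolutional layer."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# Assumed input: a 640x480 RGB photo of the accompanying document.
out_height = conv2d_output_size(480)        # 160
out_width = conv2d_output_size(640)         # 213
params = conv2d_param_count(in_channels=3)  # 448
```

Under these assumptions the first layer reduces each spatial dimension by roughly a factor of three with only 448 parameters, which keeps the cost of the subsequent 8 MBConv blocks low.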
4. A device for photographing accompanying documents for warehousing based on correction prompts, characterized in that the device is used to perform the method for photographing accompanying documents for warehousing based on correction prompts according to any one of claims 1 to 3, wherein the device comprises: an acquisition unit, which is used to acquire the target accompanying document image; an image quality assessment unit, which is used to perform image quality assessment on the target accompanying document image using an image quality assessment model to obtain an image quality assessment result, and is further used to determine, based on the image quality assessment result, whether the image quality of the target accompanying document image meets the preset conditions; an image processing unit, which is configured to acquire a reference accompanying document image and extract the difference image between the target accompanying document image and the reference accompanying document image when the image quality assessment unit determines that the image quality meets the preset conditions, and to output image shooting correction guidance prompt information when the image quality assessment unit determines that the image quality does not meet the preset conditions, so that the target accompanying document image is reacquired after the prompt information is output, until the image quality of the target accompanying document image meets the preset conditions, wherein the difference image characterizes the differences between the target accompanying document image and the reference accompanying document image.
A compression unit is used to compress the difference image to obtain a compressed image; a transmission unit is used to transmit the compressed image to the cloud, so that the cloud reconstructs the target accompanying document image based on the compressed image and the reference accompanying document image, and stores the target accompanying document image in the database.
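The interplay of the acquisition unit, the image quality assessment unit, and the correction prompt in claim 4 amounts to a capture loop gated by the quality score of claim 1. The following sketch is one way to read it, not the patented implementation. The weights, the threshold values, the prompt texts, the rule of prompting on the weakest metric, and the `max_attempts` cap are all assumptions; the metrics are assumed to be normalized so that 1.0 is best.

```python
# Threshold mapping dictionary: order type -> score threshold (assumed values).
THRESHOLDS = {"standard_printed": 0.80, "handwritten": 0.70, "damaged": 0.60}
# Weights for the weighted sum of the three normalized metrics (assumed values).
WEIGHTS = {"sharpness": 0.5, "angle": 0.3, "illumination": 0.2}
# Hypothetical correction prompts, keyed by the weakest metric.
PROMPTS = {
    "sharpness": "Hold the camera steady and refocus.",
    "angle": "Hold the camera parallel to the document.",
    "illumination": "Move to better lighting or avoid glare.",
}


def quality_score(metrics, weights=WEIGHTS):
    """Weighted sum of the normalized metrics."""
    return sum(weights[name] * metrics[name] for name in weights)


def correction_prompt(metrics):
    """Pick the prompt for the lowest-scoring metric (an assumed heuristic)."""
    weakest = min(WEIGHTS, key=lambda name: metrics[name])
    return PROMPTS[weakest]


def capture_until_acceptable(capture, assess, max_attempts=5):
    """Re-acquire the image until its score reaches the type-specific threshold."""
    prompts = []
    for _ in range(max_attempts):
        image = capture()
        order_type, metrics = assess(image)
        if quality_score(metrics) >= THRESHOLDS[order_type]:
            return image, prompts
        prompts.append(correction_prompt(metrics))
    raise RuntimeError("no acceptable image within max_attempts")


# Simulated camera: the first frame is blurry, the second is acceptable.
frames = iter([
    ("frame-1", {"sharpness": 0.4, "angle": 0.9, "illumination": 0.9}),
    ("frame-2", {"sharpness": 0.9, "angle": 0.9, "illumination": 0.8}),
])
measured = {}


def capture():
    name, metrics = next(frames)
    measured[name] = metrics
    return name


def assess(image):
    # Stand-in for the image quality assessment model and order type recognition.
    return "handwritten", measured[image]


accepted, prompts = capture_until_acceptable(capture, assess)
```

Here the first frame scores 0.65, below the assumed handwritten threshold of 0.70, so a sharpness prompt is issued and the second frame is accepted. Only an accepted image would then be passed on for difference extraction, compression, and transmission.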
5. A device for photographing accompanying documents for warehousing based on correction prompts, characterized in that it includes a memory, a processor, and a transceiver that are communicatively connected in sequence, wherein the memory is used to store a computer program, the transceiver is used to send and receive messages, and the processor is used to read the computer program and execute the method for photographing accompanying documents for warehousing based on correction prompts according to any one of claims 1 to 3.
6. A storage medium, characterized in that the storage medium stores instructions that, when run on a computer, perform the method for photographing accompanying documents for warehousing based on correction prompts according to any one of claims 1 to 3.