Attitude correction method and electronic device

By using a neural network to obtain the ground normal vector and accelerometer gain coefficient on a mobile device, the attitude parameters of the AHRS algorithm are corrected, which solves the correction error caused by accelerometer motion error and improves the accuracy of attitude estimation.

WO2026123788A1 · PCT designated stage · Publication Date: 2026-06-18 · BEIJING AUTONAVI YUNMAP TECH CO LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
BEIJING AUTONAVI YUNMAP TECH CO LTD
Filing Date
2025-08-27
Publication Date
2026-06-18

Abstract

An attitude correction method and an electronic device. By means of the attitude correction method, a ground normal vector can be calculated by using a depth map and a ground segmentation mask of an image captured by a mobile device, and camera intrinsic parameters. Because the ground normal vector is close to an actual gravity direction, the ground normal vector can be used for correcting a measurement error generated by a built-in accelerometer of the mobile device in the vertical direction during movement. In this way, when attitude correction is performed on a gyroscope by using measurement data of the accelerometer, the accuracy of the attitude correction is improved, thereby ensuring the accuracy of the attitude of the mobile device in the vertical direction determined by an AHRS algorithm.

Description

An attitude correction method and electronic device

[0001] This disclosure claims priority to Chinese Patent Application No. 202411845246.9, filed on December 13, 2024, entitled "A Posture Correction Method and Electronic Device", the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This disclosure relates to the field of positioning technology, and in particular to an attitude correction method and electronic device.

Background Technology

[0003] The Attitude and Heading Reference System (AHRS) algorithm utilizes data measured separately by gyroscopes, accelerometers, and/or magnetometers integrated into mobile devices (such as smartphones). By fusing this data with an optimization-based residual correction approach, the algorithm obtains the mobile device's attitude. The mobile device attitude obtained using the AHRS algorithm can be used to support functions such as AR (Augmented Reality) navigation and indoor positioning.

[0004] The inventors of this disclosure have discovered that, when the AHRS algorithm is used to obtain the attitude parameters of a mobile device, the acceleration measured by the accelerometer on the mobile device can be used to correct the attitude parameters of the mobile device in the vertical direction. Furthermore, the closer the measured acceleration is to the direction of gravity, the more accurate the correction of the attitude parameters in the vertical direction. However, when the mobile device itself moves, the measured acceleration is affected both by gravitational acceleration and by the acceleration generated by the device's own movement, so the accelerometer no longer measures only the vertical gravitational acceleration. In this case, using the measured acceleration to correct the attitude parameters of the mobile device in the vertical direction may actually introduce errors into the correction. Therefore, a new attitude correction technical solution is needed to overcome the problems existing in the prior art.

Summary of the Invention

[0005] In view of this, the present disclosure provides an attitude correction method and an electronic device to improve the accuracy of attitude correction in the vertical direction.

[0006] In a first aspect, this disclosure provides an attitude correction method for correcting the attitude of a mobile device equipped with at least an accelerometer, a camera, and a gyroscope, comprising:

[0007] The image captured by the camera is input into the first trained neural network unit to obtain the depth map and ground segmentation mask of the image;

[0008] The image, the depth map, and the ground segmentation mask are input into a trained second neural network unit to obtain the confidence level of the ground normal vector corresponding to the depth map;

[0009] Based on the depth map, the ground segmentation mask, and the camera intrinsic parameters, the ground normal vector is determined;

[0010] The gain coefficient of the accelerometer is determined based on the measured value of the accelerometer, the gravitational acceleration, the preset adjustment coefficient, and the confidence level of the ground normal vector;

[0011] Based on the preset adjustment coefficient and the confidence level of the ground normal vector, the gain coefficient of the ground normal vector is determined;

[0012] The attitude parameters of the mobile device in the vertical direction, obtained by the AHRS algorithm, are corrected based on the initial rotation matrix of the AHRS algorithm, the accelerometer measurement, the ground normal vector, the gravitational acceleration, the gain coefficient of the accelerometer, and the gain coefficient of the ground normal vector.

[0013] In a second aspect of this disclosure, an attitude correction device is provided for attitude correction of a mobile device equipped with at least an accelerometer, a camera, and a gyroscope, the device comprising:

[0014] The acquisition unit is used to input the image captured by the camera into the first trained neural network unit to obtain the depth map and ground segmentation mask of the image;

[0015] The acquisition unit is further configured to input the image, the depth map, and the ground segmentation mask into a trained second neural network unit to obtain the confidence level of the ground normal vector corresponding to the depth map;

[0016] The determining unit is used to determine the ground normal vector based on the depth map, the ground segmentation mask, and the camera intrinsic parameters;

[0017] The determining unit is used to determine the gain coefficient of the accelerometer based on the measured value of the accelerometer, the gravitational acceleration, the preset adjustment coefficient, and the confidence level of the ground normal vector;

[0018] The determining unit is further configured to determine the gain coefficient of the ground normal vector based on the preset adjustment coefficient and the confidence level of the ground normal vector;

[0019] The correction unit is used to correct the vertical attitude parameters of the mobile device obtained by the AHRS algorithm based on the initial rotation matrix of the AHRS algorithm, the measurement value of the accelerometer, the ground normal vector, the gravitational acceleration, the gain coefficient of the accelerometer, and the gain coefficient of the ground normal vector.

[0020] In a third aspect, this disclosure provides an electronic device, which includes: a memory and a processor;

[0021] The memory is used to store the relevant program code;

[0022] The processor is used to call the program code to execute the attitude correction method described in the first aspect above.

[0023] In a fourth aspect, this disclosure provides a computer-readable storage medium for storing a computer program for performing the attitude correction method described in the first aspect.

[0024] In a fifth aspect, a computer program product is provided that, when executed by a processor, implements the attitude correction method as described in the first aspect.

[0025] Therefore, this disclosure has the following beneficial effects:

[0026] In the above implementation of this disclosure, for a mobile device equipped with an accelerometer, camera, and gyroscope, the image captured by the camera is input into a trained first neural network unit to obtain a depth map and a ground segmentation mask corresponding to the image. The image, depth map, and ground segmentation mask are then input into a trained second neural network unit to obtain the confidence level of the ground normal vector corresponding to the depth map. This confidence level represents the reliability of the ground normal vector calculated based on the depth map and the ground segmentation mask. Based on the depth map, the ground segmentation mask, and camera intrinsic parameters, the ground normal vector is determined. The ground normal vector generally points in a direction perpendicular to the ground, i.e., the direction of gravity. Since the ground normal vector calculated based on the depth map, the ground segmentation mask, and the camera intrinsic parameters is close to the actual direction of gravity, the gain coefficient of the accelerometer, determined based on the accelerometer measurement, gravitational acceleration, a preset adjustment coefficient, and the confidence level of the ground normal vector, characterizes the degree of consistency between the accelerometer output signal and the actual direction of gravity. Meanwhile, although the calculated ground normal vector is close to the actual gravity direction, it is not exactly equal to it. Therefore, it is necessary to determine the gain coefficient of the ground normal vector based on a preset adjustment coefficient and the confidence level of the ground normal vector. This gain coefficient characterizes the reliability of the ground normal vector relative to the actual gravity direction. 
Finally, based on the initial rotation matrix of the AHRS (Attitude and Heading Reference System) algorithm, the accelerometer measurements, the ground normal vector, gravitational acceleration, the accelerometer gain coefficient, and the gain coefficient of the ground normal vector, the attitude parameters of the mobile device in the vertical direction obtained by the AHRS algorithm are corrected. This correction method can compensate for the error of the accelerometer in the vertical direction when the mobile device is moving, and improve the vertical attitude accuracy of the mobile device obtained by the AHRS algorithm.

Attached Figure Description

[0027] To more clearly illustrate the technical solutions in the embodiments of this disclosure, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments provided in this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings.

[0028] Figure 1 is a schematic diagram of a neural network unit structure provided in an embodiment of this disclosure;

[0029] Figure 2 is a flowchart of an attitude correction method provided in an embodiment of this disclosure;

[0030] Figure 3 is a schematic diagram of an attitude correction technology framework provided in an embodiment of this disclosure;

[0031] Figure 4 is a structural diagram of an attitude correction device provided in an embodiment of this disclosure;

[0032] Figure 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of this disclosure.

Detailed Implementation

[0033] The technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. The described embodiments are merely exemplary implementations of this disclosure and not all implementation methods. Those skilled in the art can obtain other embodiments in conjunction with the embodiments of this disclosure without creative effort, and these embodiments are also within the protection scope of this disclosure.

[0034] Currently, some applications installed on mobile devices (hereinafter referred to as devices) can provide users with services such as AR navigation and indoor positioning. These services rely on the AHRS algorithm to obtain the attitude information of mobile devices (such as mobile phones and tablets). The AHRS algorithm determines the attitude of the device through various sensors on the device, such as gyroscopes, accelerometers, and magnetometers. Gyroscopes can measure the angular velocity of the device and calculate the rotation angle by integrating the angular velocity, thereby obtaining the attitude of the device. However, the measurement data of gyroscopes contains errors and noise. After integration, these errors gradually accumulate, causing the deviation to increase over time, ultimately affecting the accuracy of attitude estimation. Accelerometers can measure gravitational acceleration. When the device is stationary or moving at a constant speed, the direction of gravity is vertically downward. At this time, the accumulated error of the gyroscope in the vertical direction can be corrected by the data from the accelerometer, which can obtain more accurate attitude information. The closer the acceleration measured by the accelerometer is to the direction of gravity, the more accurate the correction effect. However, the acceleration measured by the accelerometer includes not only gravitational acceleration but is also affected by the acceleration generated by the device's own motion. Therefore, when using accelerometer data to correct the vertical attitude parameters of a gyroscope, this additional acceleration component introduces errors into the correction process.
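The two error sources described above can be illustrated numerically. The following is a minimal sketch, not part of the disclosure: the value of g, the 2 m/s^2 motion acceleration, and the gyroscope bias are assumed figures, and the sign convention follows the simplified framing of this paragraph (measured acceleration = gravity + motion acceleration).

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def tilt_error_deg(measured_accel, gravity_dir):
    """Angle in degrees between a measured acceleration and the gravity direction."""
    a = np.asarray(measured_accel, dtype=float)
    g = np.asarray(gravity_dir, dtype=float)
    cos_angle = np.dot(a, g) / (np.linalg.norm(a) * np.linalg.norm(g))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def integrated_gyro_drift(bias_rad_s, dt, steps):
    """Angle error accumulated by integrating a constant gyroscope bias."""
    angle = 0.0
    for _ in range(steps):
        angle += bias_rad_s * dt  # each integration step adds the same small error
    return angle

gravity = np.array([0.0, 0.0, -G])
at_rest = gravity                                    # only gravity is sensed
accelerating = gravity + np.array([2.0, 0.0, 0.0])   # hypothetical horizontal motion

print("tilt error at rest (deg):", tilt_error_deg(at_rest, gravity))
print("tilt error while accelerating (deg):", tilt_error_deg(accelerating, gravity))
print("gyro drift after 100 s (rad):", integrated_gyro_drift(0.001, 0.01, 10_000))
```

At rest the measured direction coincides with gravity, while a modest horizontal acceleration tilts it by several degrees, which is the error the method aims to avoid feeding into the correction.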

[0035] Based on this, embodiments of this disclosure provide an attitude correction method. This method inputs an image captured by a mobile device into a pre-trained multi-task first neural network unit to obtain a depth map and a ground segmentation mask. Then, the image, depth map, and ground segmentation mask are input into a trained second neural network unit to obtain the confidence level of the ground normal vector corresponding to the depth map. Based on the depth map, ground segmentation mask, and camera intrinsic parameters, the ground normal vector is determined. Since this ground normal vector is close to the actual gravity direction, the gain coefficient of the accelerometer, determined based on accelerometer measurements, gravitational acceleration, a preset adjustment coefficient, and the confidence level of the ground normal vector, characterizes the degree of consistency between the accelerometer output signal and the actual gravity direction. Furthermore, although the calculated ground normal vector is close to the actual gravity direction, it is not exactly equal to it. Therefore, the gain coefficient of the ground normal vector is determined based on a preset adjustment coefficient and the confidence level of the ground normal vector. This gain coefficient characterizes the reliability of the ground normal vector relative to the actual gravity direction. Finally, based on the initial rotation matrix of the AHRS algorithm, the accelerometer measurements, the ground normal vector, gravitational acceleration, the accelerometer gain coefficient, and the ground normal vector gain coefficient, the vertical attitude parameters of the mobile device obtained by the AHRS algorithm are corrected. This correction method can compensate for the vertical error of the accelerometer during mobile device movement, improving the vertical attitude accuracy of the mobile device determined by the AHRS algorithm.

[0036] To facilitate understanding of the technical solutions provided in the embodiments of this disclosure, a detailed description will be given below in conjunction with the accompanying drawings.

[0037] Given that the technical solution provided in this disclosure requires the use of trained neural network units, the neural network units in the technical solution of this disclosure will be explained below to facilitate understanding of the technical solution.

[0038] Figure 1 is a schematic diagram of a neural network unit structure provided in an embodiment of this disclosure. As shown in Figure 1, the technical solution of this disclosure includes a first neural network unit and a second neural network unit. The first neural network unit is used to obtain a depth map and a ground segmentation mask corresponding to the original image (an image captured by a camera mounted on the device). Since the first neural network unit can output the depth map and ground segmentation mask of the original image, it is a multi-task neural network unit. The depth map is a grayscale image. The value of each pixel in the depth map represents the distance between the object corresponding to that pixel in the original image and the camera. Generally, a higher pixel value means that the object is closer to the camera, while a lower pixel value indicates that the object is farther away from the camera. The ground segmentation mask is a binary or labeled image that assigns a label to each pixel to distinguish whether the pixel belongs to the ground. Specifically, different labels in the mask are used to identify which areas in the original image are considered ground and which areas are classified as non-ground (i.e., background or other objects).

[0039] The second neural network unit processes the original image, the corresponding depth map, and the ground segmentation mask to obtain the confidence level of the ground normal vector corresponding to the depth map. In other words, the second neural network unit outputs a confidence level that reflects how reliable the ground normal vector calculated from the depth map is.

[0040] The neuron structure of the first neural network unit and the second neural network unit can be set according to the actual application scenario, and this embodiment does not limit it.

[0041] Since the first neural network unit is a multi-task neural network unit, it includes an encoding neural network unit 101, a first decoding neural network unit 102, and a second decoding neural network unit 103 as shown in Figure 1; the second neural network unit includes a third decoding neural network unit 104.

[0042] (I) Training of the first neural network unit

[0043] The first neural network unit includes a first decoding neural network unit 102 and a second decoding neural network unit 103, which are respectively cascaded with an encoding neural network unit 101. The training process of the first neural network unit includes:

[0044] (1) Input a single-frame grayscale image with labeled depth values into the cascaded encoding neural network unit 101 and first decoding neural network unit 102 to obtain the depth values of the depth map that the first decoding neural network unit 102 predicts for the grayscale image, and obtain a loss based on the predicted depth values and the labeled depth values of the grayscale image. Use the loss to iteratively train the encoding neural network unit 101 and the first decoding neural network unit 102 until the loss reaches the preset training target.

[0045] That is, a single-frame grayscale image and its corresponding annotation are input into the encoding neural network unit 101. The encoded feature vector output by the encoding neural network unit 101 is input into the first decoding neural network unit 102. The first decoding neural network unit 102 outputs the depth value of the predicted depth map for that grayscale image. Then, the first decoding neural network unit 102 obtains a loss value based on the depth value annotated in the grayscale image and the predicted depth value. Based on this loss value, the encoding neural network unit 101 and the first decoding neural network unit 102 are iteratively trained until the loss between the predicted depth value and the annotated depth value satisfies the training objective.

[0046] The loss value represents the error between the labeled depth value (true value) and the predicted depth value (predicted value). A larger loss value indicates a greater error between the true and predicted values, meaning the parameters of the encoding neural network unit 101 and the first decoding neural network unit 102 are less accurate. Therefore, the parameters of the encoding neural network unit 101 and the first decoding neural network unit 102 need to be adjusted, and training repeated, until the determined loss value is less than a preset threshold. This indicates that the error between the true and predicted values meets the requirements, yielding a well-trained encoding neural network unit 101 and first decoding neural network unit 102.
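The stopping rule described here (adjust parameters until the loss falls below a preset threshold) can be sketched with a toy model. This is only an illustration under stated assumptions: a one-dimensional linear model and mean squared error stand in for the encoder/decoder pair and its depth loss, and the learning rate, threshold, and data are hypothetical.

```python
import numpy as np

def train_until_threshold(x, y, lr=0.1, threshold=1e-4, max_iters=10_000):
    """Gradient descent on a 1-D linear model, stopping once the MSE is below threshold."""
    w, b = 0.0, 0.0
    loss = float("inf")
    step = 0
    for step in range(max_iters):
        err = (w * x + b) - y                 # predicted value minus true value
        loss = float(np.mean(err ** 2))       # loss: error between prediction and label
        if loss < threshold:                  # training target reached
            break
        w -= lr * float(np.mean(2.0 * err * x))   # adjust parameters along the gradient
        b -= lr * float(np.mean(2.0 * err))
    return w, b, loss, step

x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 0.5                             # synthetic labels (assumed)
w, b, loss, steps = train_until_threshold(x, y)
print(f"w={w:.3f} b={b:.3f} loss={loss:.2e} after {steps} steps")
```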

[0047] (2) Input the sample image and the ground segmentation mask corresponding to the sample image into the cascaded encoding neural network unit 101 and second decoding neural network unit 103. Based on the loss between the ground segmentation mask predicted for the sample image by the second decoding neural network unit 103 and the ground segmentation mask corresponding to the sample image, iteratively train the encoding neural network unit 101 and the second decoding neural network unit 103 until the loss reaches the preset training target.

[0048] That is, the sample image and its corresponding ground segmentation mask are input into the encoding neural network unit 101, and the encoded feature vector output by the encoding neural network unit 101 is input into the second decoding neural network unit 103. The second decoding neural network unit 103 outputs the predicted ground segmentation mask for the sample image. Then, the second decoding neural network unit 103 obtains a loss value based on the ground segmentation mask labeled in the sample image and the predicted ground segmentation mask. Based on this loss value, the encoding neural network unit 101 and the second decoding neural network unit 103 are iteratively trained until the loss between the predicted ground segmentation mask and the labeled ground segmentation mask satisfies the training objective.

[0049] In this embodiment, the loss value is used to represent the error between the labeled ground segmentation mask (true value) and the predicted ground segmentation mask (predicted value). A larger loss value indicates a larger error between the true value and the predicted value, meaning the parameters of the encoding neural network unit 101 and the second decoding neural network unit 103 are less accurate. Therefore, the parameters of the encoding neural network unit 101 and the second decoding neural network unit 103 need to be adjusted, and retraining is required until the determined loss value is less than a preset threshold. This indicates that the error between the true value and the predicted value meets the requirements, and the trained encoding neural network unit 101 and the second decoding neural network unit 103 are obtained.

[0050] The sample image can be the single-frame grayscale image mentioned above, or it can be another image. The training objective corresponding to the first decoding neural network unit 102 and the training objective corresponding to the second decoding neural network unit 103 can be the same, for example, both training objectives are to minimize the loss; the training objectives can also be different, for example, the training objective of the first decoding neural network unit 102 is that the loss is less than or equal to a first threshold, and the training objective of the second decoding neural network unit 103 is that the loss is less than or equal to a second threshold, where the first threshold is not equal to the second threshold.

[0051] As can be seen from the above training process, the first decoding neural network unit 102 for predicting the depth map and the second decoding neural network unit 103 for predicting the ground segmentation mask are trained using the same encoding neural network unit 101. The two training branches jointly train the encoding neural network unit 101, thereby realizing the joint training of multi-task neural network units.

[0052] The encoding neural network unit 101 and the first decoding neural network unit 102 can form a monocular depth estimation network to obtain the depth map of the image; the encoding neural network unit 101 and the second decoding neural network unit 103 form an image segmentation network to obtain the ground segmentation mask of the image.

[0053] When performing multi-task joint training, the encoding neural network unit 101 and the first decoding neural network unit 102 can be trained as a whole, or the encoding neural network unit 101 and the second decoding neural network unit 103 can be trained as a whole. During training, the encoding neural network unit 101 and the first decoding neural network unit 102 can be trained first. After obtaining the trained encoding neural network unit 101 and the first decoding neural network unit 102, when training the encoding neural network unit 101 and the second decoding neural network unit 103, the parameters of the encoding neural network unit 101 are no longer adjusted. Instead, only the parameters of the second decoding neural network unit 103 are adjusted to obtain the trained encoding neural network unit 101 and the second decoding neural network unit 103.

[0054] Alternatively, the encoding neural network unit 101 and the second decoding neural network unit 103 can be trained first. After the trained encoding neural network unit 101 and the second decoding neural network unit 103 are obtained, when training the encoding neural network unit 101 and the first decoding neural network unit 102, the parameters of the encoding neural network unit 101 are no longer adjusted. Instead, the parameters of the first decoding neural network unit 102 are adjusted to obtain the trained encoding neural network unit 101 and the first decoding neural network unit 102.
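The staged scheme in the two preceding paragraphs (train the encoder with one decoder, then freeze the encoder while training the other decoder) can be sketched as follows. This is a toy illustration, not the disclosed networks: the encoder and decoders are plain linear maps, the loss is mean squared error, and all data, shapes, and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(x, y, enc, dec, lr=0.02, steps=500, update_encoder=True):
    """Train a linear decoder (and optionally the shared linear encoder) with MSE."""
    enc, dec = enc.copy(), dec.copy()
    n = x.shape[0]
    for _ in range(steps):
        feat = x @ enc                          # shared encoded features
        err = feat @ dec - y                    # prediction error of this branch
        grad_dec = 2.0 * feat.T @ err / n
        grad_enc = 2.0 * x.T @ (err @ dec.T) / n
        dec -= lr * grad_dec
        if update_encoder:                      # skipped when the encoder is frozen
            enc -= lr * grad_enc
    loss = float(np.mean((x @ enc @ dec - y) ** 2))
    return enc, dec, loss

x = rng.normal(size=(64, 4))
y_depth = x @ np.array([[1.0], [-2.0], [0.5], [0.0]])   # synthetic "depth" targets
y_mask = x @ np.array([[0.0], [1.0], [1.0], [-1.0]])    # synthetic "mask" targets

enc0 = 0.1 * rng.normal(size=(4, 3))
dec_depth0 = 0.1 * rng.normal(size=(3, 1))
dec_mask0 = 0.1 * rng.normal(size=(3, 1))

# Stage 1: encoder and first decoder are trained together.
enc1, dec_depth, loss_depth = train(x, y_depth, enc0, dec_depth0, update_encoder=True)
# Stage 2: encoder parameters are frozen; only the second decoder is adjusted.
enc2, dec_mask, loss_mask = train(x, y_mask, enc1, dec_mask0, update_encoder=False)
```

Swapping the two stages gives the alternative order, in which the mask branch is trained first and the depth decoder is fitted on top of the frozen encoder.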

[0055] (II) Training of the second neural network unit

[0056] The second neural network unit can be an independent third decoding neural network unit 104. The training process includes: inputting a single-frame grayscale image, the predicted depth map output by the first decoding neural network unit 102, and the predicted ground segmentation mask output by the second decoding neural network unit 103 into the third decoding neural network unit 104 to train the third decoding neural network unit 104. That is, the third decoding neural network unit 104 is used to evaluate the confidence level of the gravity direction calculated from the predicted depth map.

[0057] The third decoding neural network unit 104 can be trained using supervised learning. Specifically, the confidence level (ground truth) for annotation is determined based on a single-frame grayscale image, a predicted depth map, and a predicted ground segmentation mask. Then, a loss value is determined based on the annotation confidence level and the prediction confidence level (predicted value). This loss value represents the error between the ground truth and the predicted value. A larger loss value indicates a larger error between the ground truth and the predicted value, and the more inaccurate the parameters of the third decoding neural network unit 104 are. The parameters of the third decoding neural network unit 104 are adjusted, and retraining is performed until the determined loss value is less than a preset threshold, indicating that the error between the ground truth and the predicted value is small and meets the requirements. This yields the trained third decoding neural network unit 104.

[0058] After completing the above training, the trained first and second neural network units can be integrated into the client of the application software installed on the mobile device for correcting the mobile device's attitude. This application software can be any software capable of providing services such as AR navigation and indoor positioning. It is understood that AR navigation and indoor positioning are merely examples of services, not an exhaustive list; any service relying on the mobile device's attitude can utilize the technical solutions provided in this disclosure.

[0059] Referring to Figure 2, which is a flowchart of an attitude correction method provided in an embodiment of this disclosure, the method is used to correct the attitude of a mobile device equipped with at least an accelerometer, a camera and a gyroscope.

[0060] The method may include the following steps:

[0061] S201: Input the image captured by the camera into the first trained neural network unit to obtain the depth map and ground segmentation mask of the image.

[0062] When users use AR navigation or indoor positioning services provided by software installed on their mobile devices, they can capture images using the camera on their mobile devices.

[0063] As can be seen from the above training process, the first neural network unit after training can perform multi-task processing. Therefore, after inputting an image into the first neural network unit, the first neural network unit can predict the depth map and ground segmentation mask of the image.

[0064] In this embodiment, after obtaining the image captured by the camera, the image can be converted to grayscale, thereby reducing the amount of image data processed by the first neural network unit and improving prediction efficiency.

[0065] In one embodiment of this disclosure, the first neural network unit includes: an encoding neural network unit 101, a first decoding neural network unit 102, and a second decoding neural network unit 103, wherein the first decoding neural network unit 102 and the second decoding neural network unit 103 are respectively cascaded with the encoding neural network unit 101. The steps of obtaining the depth map and ground segmentation mask of the image described above can be implemented as follows:

[0066] An image is input to an encoding neural network unit 101. The encoded feature vector output by the encoding neural network unit 101 is then fed into a first decoding neural network unit 102 and a second decoding neural network unit 103. The first decoding neural network unit 102 outputs the depth map of the image, and the second decoding neural network unit 103 outputs the ground segmentation mask of the image. That is, the first decoding neural network unit 102 and the second decoding neural network unit 103 perform different prediction tasks based on the output of the encoding neural network unit 101.
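The data flow of this step (one encoder pass whose features feed two decoders) can be sketched as below. The layer definitions are hypothetical stand-ins chosen for brevity (per-pixel linear maps with random weights), not the network structure of the disclosure, which is left open.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(gray, w_enc):
    """Map an HxW grayscale image to HxWxC features (a 1x1-convolution stand-in)."""
    return np.tanh(gray[..., None] * w_enc)

def depth_decoder(features, w_depth):
    """Predict a positive per-pixel depth value from the shared features."""
    return np.exp(features @ w_depth)          # exp keeps every depth value positive

def mask_decoder(features, w_mask):
    """Predict a binary ground mask (1 = ground, 0 = non-ground)."""
    prob = 1.0 / (1.0 + np.exp(-(features @ w_mask)))
    return (prob > 0.5).astype(np.uint8)

def first_unit(gray, params):
    """Run the encoder once and feed its output to both decoders."""
    features = encoder(gray, params["enc"])
    return depth_decoder(features, params["depth"]), mask_decoder(features, params["mask"])

params = {
    "enc": rng.normal(size=8),
    "depth": 0.1 * rng.normal(size=8),
    "mask": rng.normal(size=8),
}
image = rng.random((4, 6))                      # hypothetical grayscale image in [0, 1)
depth_map, ground_mask = first_unit(image, params)
```

Both outputs have the same height and width as the input image, which is what later lets the mask select ground pixels in the depth map.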

[0067] S202: Input the image, depth map, and ground segmentation mask into the trained second neural network unit to obtain the confidence level of the ground normal vector corresponding to the depth map.

[0068] The confidence level represents the reliability of the gravity direction (i.e., the ground normal vector) calculated based on the depth map predicted by the first neural network unit and the ground segmentation mask. The confidence level can take values in the range [0, 1]. The higher the confidence level, the higher the reliability of the ground normal vector calculated based on the predicted depth map and the ground segmentation mask.

[0069] S203: Determine the ground normal vector based on the depth map, ground segmentation mask, and camera intrinsic parameters.

[0070] Here, the ground normal vector is a vector describing the orientation of a plane in three-dimensional space; in this embodiment, it specifically refers to a vector perpendicular to the ground. S202 and S203 of this disclosure can be executed in parallel, or in either order (S202 followed by S203, or S203 followed by S202).
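One plausible way to carry out S203 is to back-project the ground pixels into 3D with the camera intrinsics and fit a plane to them. The sketch below makes several assumptions that this passage does not state: a pinhole camera with intrinsics fx, fy, cx, cy; a depth map that stores metric distance along the optical axis; a camera frame with x right, y down, z forward; and a least-squares plane fit via SVD. The function name and the synthetic scene are hypothetical.

```python
import numpy as np

def ground_normal(depth, mask, fx, fy, cx, cy):
    """Fit a plane to the back-projected ground pixels and return its unit normal."""
    v, u = np.nonzero(mask)                     # row (v) and column (u) of ground pixels
    z = depth[v, u].astype(float)
    valid = z > 0
    u, v, z = u[valid], v[valid], z[valid]
    if z.size < 3:
        raise ValueError("need at least three ground pixels to fit a plane")
    # Pinhole back-projection of each pixel into camera coordinates.
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    n = vt[-1]
    if np.dot(n, centroid) > 0:                 # orient the normal toward the camera
        n = -n
    return n / np.linalg.norm(n)

# Synthetic check: a flat ground 1.5 m below a level camera.
h, w = 48, 64
fx = fy = 50.0
cx, cy = 32.0, 24.0
rows = np.arange(h, dtype=float)[:, None] * np.ones((1, w))
below_horizon = rows > cy + 2
depth = np.zeros((h, w))
depth[below_horizon] = 1.5 * fy / (rows[below_horizon] - cy)
normal = ground_normal(depth, below_horizon, fx, fy, cx, cy)
print(normal)
```

In this camera frame the recovered normal points along negative y, i.e. upward from the ground toward the camera; a real pipeline would still need to rotate it into the device frame used by the AHRS algorithm.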

[0071] S204: Determine the gain coefficient of the accelerometer based on the accelerometer's measured values, gravitational acceleration, preset adjustment coefficients, and the confidence level of the ground normal vector.

[0072] The gain coefficient of the accelerometer represents the degree of consistency between the accelerometer output signal and the actual direction of gravity.

[0073] In this embodiment, the influence of different factors (such as gravitational acceleration, confidence level of ground normal vector, etc.) is considered when determining the gain coefficient of the accelerometer to ensure the accuracy of the determined gain coefficient of the accelerometer.

[0074] S205: Determine the gain coefficient of the ground normal vector based on the preset adjustment coefficient and the confidence level of the ground normal vector.

[0075] The gain coefficient of the ground normal vector characterizes the reliability of the ground normal vector relative to the actual gravity direction.

[0076] It should be noted that the embodiments of this disclosure do not limit the execution order of steps S204-S205. That is, they can be executed in the order of S204 and S205, or in the order of S205 and S204, or the two steps can be executed simultaneously. The step numbers should not be regarded as a restriction on the execution order. In addition, if step S203 is executed after step S202, it can also be executed in parallel with S204 and S205, or before or after S204 or S205, without affecting the implementation of this disclosure.

[0077] S206: Based on the initial rotation matrix of the AHRS algorithm, the accelerometer measurement, the ground normal vector, the gravitational acceleration, the gain coefficient of the accelerometer, and the gain coefficient of the ground normal vector, the attitude parameters of the mobile device in the vertical direction obtained by the AHRS algorithm are corrected.

[0078] The initial rotation matrix in the AHRS algorithm is determined based on the mobile device's attitude, which is typically calculated from data from the device's built-in sensors (such as accelerometers, magnetometers, and gyroscopes) to represent the transformation relationship between the navigation coordinate system and the device's local coordinate system. A common method is to rotate in a certain order, such as first rotating around the Z-axis, then around the X-axis, and finally around the Y-axis, but in practice, different rotation orders can be chosen depending on the specific situation. Since the mobile device's attitude changes continuously with time and spatial position, the corresponding rotation matrix also needs to be updated in real time. In this embodiment, the so-called "initial rotation matrix" specifically refers to the final rotation matrix used in the last execution of the AHRS algorithm.
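As an illustration of the rotation order mentioned above (Z-axis, then X-axis, then Y-axis), the following sketch composes a rotation matrix from three angles. It assumes an intrinsic rotation convention, so the matrices are multiplied as Rz·Rx·Ry. The disclosure does not fix this convention, and the function names are hypothetical.

```python
import numpy as np


def rot_z(a):
    """Rotation by angle a (radians) about the Z-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(a):
    """Rotation by angle a (radians) about the X-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(a):
    """Rotation by angle a (radians) about the Y-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rotation_zxy(angle_z, angle_x, angle_y):
    """Compose Z, then X, then Y rotations (intrinsic convention assumed)."""
    return rot_z(angle_z) @ rot_x(angle_x) @ rot_y(angle_y)


R = rotation_zxy(np.deg2rad(30.0), np.deg2rad(10.0), np.deg2rad(-5.0))
```

Whatever order is chosen, the result remains an orthonormal matrix with determinant 1, which gives a simple sanity check after each real-time update.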

[0079] In practice, the attitude error of the mobile device in the vertical direction can be calculated based on the initial rotation matrix of the AHRS algorithm, the measurement value of the accelerometer, the ground normal vector, the gravitational acceleration, the gain coefficient of the accelerometer, and the gain coefficient of the ground normal vector. The attitude error can then be used to correct the attitude parameters in the vertical direction obtained by the AHRS algorithm.

[0080] The above describes the method provided by this embodiment. Using this method, a ground normal vector can be calculated based on the depth map of the image captured by the mobile device's camera, the ground segmentation mask, and camera intrinsic parameters. Since this ground normal vector is close to the actual direction of gravity, it can be used to correct the measurement error in the vertical direction caused by the accelerometer built into the mobile device during movement. In this way, the accuracy of gyroscope attitude correction using accelerometer measurement data is improved, thereby ensuring the accuracy of the mobile device's vertical attitude determined by the AHRS algorithm.

[0081] In one embodiment of this disclosure, step S206 can be implemented in the following way:

[0082] Based on the initial rotation matrix of the AHRS algorithm, accelerometer measurements, ground normal vector, gravitational acceleration, accelerometer gain coefficient, and ground normal vector gain coefficient, the vertical attitude error of the mobile device is determined. This attitude error is then used to correct the vertical attitude parameters of the mobile device obtained by the AHRS algorithm. In other words, the attitude error is used to compensate for the vertical attitude parameters of the mobile device to obtain an accurate vertical attitude.

[0083] In practical implementation, the attitude error can be determined in the following ways, including:

[0084] Determine the cosine value between the gravitational acceleration and the initial rotation matrix of the AHRS algorithm; determine a first error based on the gain coefficient of the ground normal vector, the ground normal vector, and the cosine value; for example, the first error can be determined by multiplying the ground normal vector, its gain coefficient, and the cosine value. Determine a second error based on the gain coefficient of the accelerometer, the accelerometer measurement, and the cosine value; for example, the second error can be determined by multiplying the accelerometer gain coefficient, its measurement, and the cosine value. Determine the attitude error of the mobile device in the vertical direction based on the first and second errors; for example, the attitude error can be determined by multiplying the first and second errors.

[0085] The above process can be summarized by the following formula: Residual=[Kpd*DepNorm*cos(R,G)]*[Kpa*AccNorm*cos(R,G)]

[0086] Wherein, Residual represents attitude error, Kpd represents the gain coefficient of the ground normal vector, DepNorm represents the ground normal vector, R represents the initial rotation matrix, G represents gravitational acceleration, Kpa represents the gain coefficient of the accelerometer, and AccNorm represents the measured value of the accelerometer.
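The formula above can be transcribed literally as in the sketch below. Two points are assumptions, because the disclosure does not specify them: the cosine value cos(R, G) is taken as a precomputed scalar input `cos_rg`, and the product of the first and second errors is evaluated element-wise on three-component vectors. The function name is hypothetical.

```python
import numpy as np


def attitude_residual(kpd, dep_norm, kpa, acc_norm, cos_rg):
    """Residual = [Kpd*DepNorm*cos(R,G)] * [Kpa*AccNorm*cos(R,G)].

    cos_rg is a scalar supplied by the caller (assumption); the final
    product is element-wise (assumption).
    """
    first_error = kpd * np.asarray(dep_norm, dtype=float) * cos_rg
    second_error = kpa * np.asarray(acc_norm, dtype=float) * cos_rg
    return first_error * second_error


# Example with a ground normal and an accelerometer direction both along z.
residual = attitude_residual(
    kpd=0.5, dep_norm=[0.0, 0.0, 1.0],
    kpa=2.0, acc_norm=[0.0, 0.0, 1.0],
    cos_rg=0.5,
)
```

Other interpretations of the products (for example, a cross product as used in some complementary filters) would give a different residual. This sketch only mirrors the formula as written.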

[0087] In one embodiment of this disclosure, the camera intrinsic parameters include the principal point and the focal length. Step S203, in which the ground normal vector is determined based on the depth map, the ground segmentation mask, and the camera intrinsic parameters, can be implemented as follows:

[0088] Based on the depth map and ground segmentation mask, the pixels belonging to ground elements in the depth map are identified as target pixels. Based on the pixel coordinates, depth value, principal point coordinates, and focal length of the target pixels, the 3D point coordinates corresponding to the target pixels are determined. Based on the 3D point coordinates of at least three target pixels, the ground normal vector is determined.

[0089] In this embodiment, since each pixel in the ground segmentation mask has a label indicating whether it is a ground element, the pixels in the depth map can be compared with the pixels in the ground segmentation mask to determine the pixels in the depth map that belong to ground elements as target pixels. There can be multiple pixels in the depth map that belong to ground elements.
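The pixel-wise comparison described above can be sketched with NumPy as follows. The function name is hypothetical, and the extra check that discards non-finite or non-positive depth values is an assumption added for robustness; it is not part of the disclosure.

```python
import numpy as np


def select_target_pixels(depth_map, ground_mask):
    """Return (u, v, d) arrays for pixels labelled as ground."""
    depth_map = np.asarray(depth_map, dtype=float)
    ground_mask = np.asarray(ground_mask, dtype=bool)
    if depth_map.shape != ground_mask.shape:
        raise ValueError("depth map and mask must have the same shape")
    # Keep ground pixels whose depth is usable (validity check is an assumption).
    valid = ground_mask & np.isfinite(depth_map) & (depth_map > 0)
    v, u = np.nonzero(valid)  # rows index v (vertical), columns index u (horizontal)
    return u, v, depth_map[v, u]


depth = np.array([[2.0, 3.0], [4.0, 0.0]])
mask = np.array([[0, 1], [1, 1]])
u, v, d = select_target_pixels(depth, mask)
```

In this toy input, three pixels are labelled as ground, but the one with zero depth is dropped, leaving two target pixels.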

[0090] In one possible implementation, to improve the accuracy of the target pixels, they can be determined as follows: first, initial pixels belonging to ground elements in the depth map are determined based on the depth map and the ground segmentation mask; then, the initial pixels are filtered using a random sample consensus algorithm to obtain the target pixels.

[0091] The Random Sample Consensus (RANSAC) algorithm can estimate the parameters of a mathematical model from a dataset containing outliers, and thereby identify the valid data (inliers). In other words, RANSAC can remove noise points from the initial pixel set to obtain the target pixels belonging to the ground elements.
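A minimal sketch of such filtering is given below, assuming the model being fitted is a plane and that the initial pixels have already been converted to three-dimensional points. The function name, the distance threshold, the iteration count, and the fixed random seed are all illustrative assumptions; the disclosure does not specify them.

```python
import numpy as np


def ransac_plane_inliers(points, threshold=0.05, iterations=200, seed=0):
    """Return a boolean mask of points close to the best plane found."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        # Hypothesize a plane from three randomly chosen points.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        length = np.linalg.norm(normal)
        if length < 1e-12:  # collinear sample, no unique plane
            continue
        normal /= length
        # Score the hypothesis by counting points near the plane.
        distances = np.abs((points - sample[0]) @ normal)
        mask = distances < threshold
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask


# Toy data: 25 points on the plane z = 1 plus three off-plane noise points.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
ground = np.column_stack([xs.ravel(), ys.ravel(), np.ones(25)])
noise = np.array([[0.5, 0.5, 5.0], [2.5, 1.5, 6.0], [3.5, 3.5, 7.0]])
all_points = np.vstack([ground, noise])
inlier_mask = ransac_plane_inliers(all_points)
target_points = all_points[inlier_mask]
```

The points retained in `target_points` play the role of the target pixels; the three noise points are rejected because they lie far from the dominant plane.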

[0092] In one implementation, the three-dimensional point coordinates corresponding to the target pixel can be determined from its pixel coordinates, its depth value, the principal point coordinates, and the focal length as follows:

[0093] The coordinates of the principal point are pixel coordinates, which can be represented as (u0, v0). The coordinates of the target pixel are also pixel coordinates, which can be represented as (u, v). The principal point and the target pixel are expressed in the same coordinate system. For example, when u represents the position of the target pixel along the image width and v its position along the image height, u0 likewise represents the position of the principal point along the image width and v0 its position along the image height. The 3D point coordinates of the target pixel can be represented as (x, y, z).

[0094] Given the pixel coordinates of the principal point and the target pixel, the 3D point coordinates of the target pixel can be determined using the following method:

[0095] Determine the first difference between u of the target pixel and u0 of the principal point. Based on the ratio of this first difference to the focal length, determine the x coordinate in the 3D point coordinates. For example, the ratio of the first difference to the focal length is determined as x.

[0096] Similarly, determine the second difference between v of the target pixel and v0 of the principal point, and then determine the y coordinate in the 3D point coordinates based on the ratio of this second difference to the focal length. For example, the ratio of the second difference to the focal length is determined as y.

[0097] The z coordinate in the 3D point coordinates is determined based on the depth value of the target pixel. For example, the depth value corresponding to the target pixel is determined as z. Here, the x, y, and z coordinates of the 3D point represent the coordinate values corresponding to the x-axis, y-axis, and z-axis in 3D space.

[0098] The above process can be expressed by the following calculation formula: x=(u-u0)/f, y=(v-v0)/f, z=d

[0099] Wherein, the three-dimensional point coordinates of the target pixel are represented as (x, y, z), the pixel coordinates of the principal point are represented as (u0, v0), the focal length is represented as f, the pixel coordinates of the target pixel are represented as (u, v), and d is the depth value.

[0100] Using the above implementation method, the 3D point coordinates corresponding to the target pixel in the depth map can be determined, that is, the 3D point cloud reconstruction of the image can be completed.
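The conversion described in paragraphs [0095] to [0097] can be sketched as follows. The function name is hypothetical. The optional `scale_by_depth` flag is not part of the disclosure: it is an assumption that switches to the common pinhole back-projection, in which x and y are additionally multiplied by the depth value.

```python
import numpy as np


def pixels_to_points(u, v, d, u0, v0, f, scale_by_depth=False):
    """Map pixel coordinates and depth values to 3D point coordinates."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    z = np.asarray(d, dtype=float)  # depth value is used as z
    x = (u - u0) / f                # ratio of first difference to focal length
    y = (v - v0) / f                # ratio of second difference to focal length
    if scale_by_depth:              # assumption: common pinhole variant
        x = x * z
        y = y * z
    return np.stack([x, y, z], axis=-1)


# Two target pixels, principal point (320, 240), focal length 500 pixels.
points = pixels_to_points([320, 420], [240, 340], [2.0, 4.0], 320.0, 240.0, 500.0)
scaled = pixels_to_points([320, 420], [240, 340], [2.0, 4.0], 320.0, 240.0, 500.0,
                          scale_by_depth=True)
```

With the default setting, the second pixel maps to (0.2, 0.2, 4.0); with the depth-scaled variant, it maps to (0.8, 0.8, 4.0).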

[0101] After obtaining the 3D coordinates of the target pixels, determining the ground normal vector based on the 3D coordinates of at least three target pixels can be implemented as follows:

[0102] Based on the three-dimensional plane equation, the three-dimensional point coordinates of at least three target pixels are constructed into a matrix; the ground normal vector is obtained by solving the matrix.

[0103] The three-dimensional plane equation can be expressed as z = ax + by + c, and the three-dimensional coordinates of the target pixel are represented as (x, y, z). The three-dimensional plane equation includes three unknowns a, b, and c, therefore, the three-dimensional coordinates of at least three target pixels are needed to solve for the unknowns.

[0104] Specifically, based on the three-dimensional plane equation and the three-dimensional point coordinates of at least three target pixels, the matrix equation is constructed as follows: $A N = B$, where $A$ is the coefficient matrix, $B$ is the vector of z coordinates, and $N$ is the vector of unknowns. The dimensions of A and B depend on the number of target pixels selected. Taking three target pixels as an example, with three-dimensional point coordinates (x1, y1, z1), (x2, y2, z2), and (x3, y3, z3), A and B can be represented in the following form:

$$A = \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}$$

[0105] According to the three-dimensional plane equation, the ground normal vector can be represented as $N = [a, b, c]^T$. Since A and B are known, the specific value of vector N can be calculated by performing singular value decomposition on A, thus obtaining the ground normal vector.

[0106] Similarly, when k target pixels are selected to calculate the ground normal vector, A can be represented as a k*3 matrix and B as a k*1 matrix.
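As a hedged illustration of solving the plane equation z = ax + by + c for k target pixels, the sketch below assumes that A stacks one row [x, y, 1] per point and that B stacks the z coordinates. It uses `numpy.linalg.lstsq`, which relies on an SVD-based solver. Returning the unit vector proportional to (a, b, -1) as the geometric normal of the fitted plane is an added assumption, and the function name is hypothetical.

```python
import numpy as np


def fit_ground_plane(points):
    """Fit z = a*x + b*y + c; return (N, unit_normal) with N = [a, b, c]."""
    points = np.asarray(points, dtype=float)
    if points.shape[0] < 3:
        raise ValueError("at least three target pixels are required")
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    B = points[:, 2]
    N, *_ = np.linalg.lstsq(A, B, rcond=None)  # SVD-based least squares
    a, b, _c = N
    # Geometric normal of the plane a*x + b*y - z + c = 0 (assumption).
    unit_normal = np.array([a, b, -1.0])
    unit_normal /= np.linalg.norm(unit_normal)
    return N, unit_normal


# Four points lying on the plane z = 0.1*x - 0.2*y + 3.
pts = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.1), (0.0, 1.0, 2.8), (1.0, 1.0, 2.9)]
N, unit_normal = fit_ground_plane(pts)
```

The sign of the returned normal is arbitrary. A real implementation would orient it consistently, for example toward the camera, before comparing it with the gravity direction.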

[0107] Step S204, in which the gain coefficient of the accelerometer is determined based on the accelerometer's measured value, the gravitational acceleration, the preset adjustment coefficients, and the confidence level of the ground normal vector, can be implemented as follows:

[0108] The absolute value of the difference between the accelerometer measurement and the gravitational acceleration is calculated; the confidence level of the ground normal vector is subtracted from a set constant to obtain the first parameter value; the product of the first adjustment coefficient, the absolute value of the difference, and the first parameter value is obtained, and the sum of the product and the second adjustment coefficient is used as the gain coefficient of the accelerometer. The specific value of the set constant is determined according to the actual application scenario, and this embodiment does not limit it. For example, if the confidence level ranges from 0 to 1, the set constant can be 1.

[0109] Specifically, the above process can be expressed by the following formula: Kpa=α*(1-Conf)*|AccNorm-G|+β

[0110] Where Kpa represents the gain coefficient of the accelerometer, α represents the first adjustment coefficient, β represents the second adjustment coefficient, Conf represents the confidence level, AccNorm represents the measured value of the accelerometer, G represents the gravitational acceleration, and 1 is the set constant.
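The formula for Kpa can be sketched as a small function. Treating AccNorm and G as scalar magnitudes in the same unit is an assumption, and the function and parameter names are hypothetical.

```python
def accelerometer_gain(acc_norm, g, conf, alpha, beta, set_constant=1.0):
    """Kpa = alpha * (set_constant - Conf) * |AccNorm - G| + beta."""
    first_parameter = set_constant - conf   # "first parameter value"
    deviation = abs(acc_norm - g)           # |AccNorm - G|, scalars assumed
    return alpha * first_parameter * deviation + beta


# Example: measured magnitude 10.81 m/s^2, gravity 9.81 m/s^2, confidence 0.8.
kpa = accelerometer_gain(acc_norm=10.81, g=9.81, conf=0.8, alpha=0.001, beta=0.001)
```

In this example, Kpa is approximately 0.0012. When the confidence level equals the set constant, Kpa reduces to the second adjustment coefficient β.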

[0111] Step S205, in which the gain coefficient of the ground normal vector is determined based on the preset adjustment coefficients and the confidence level of the ground normal vector, can be implemented as follows:

[0112] Obtain the product of the first adjustment coefficient and the confidence level of the ground normal vector; use the sum of the product and the second adjustment coefficient as the gain coefficient of the ground normal vector.

[0113] Specifically, the above process can be expressed by the following formula: Kpd=α*Conf+β

[0114] Where Kpd represents the gain coefficient of the ground normal vector, α represents the first adjustment coefficient, β represents the second adjustment coefficient, and Conf represents the confidence level.

[0115] In this embodiment, the specific values of the first adjustment coefficient and the second adjustment coefficient are not limited; they can be the same or different, and can be determined according to the actual application scenario. For example, α = 0.001 and β = 0.001 can be set.
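The formula for Kpd can be sketched in the same way, here evaluated with the example coefficients of 0.001. The function name is hypothetical, and the range check on the confidence level is an added assumption based on the [0, 1] range given earlier.

```python
def ground_normal_gain(conf, alpha, beta):
    """Kpd = alpha * Conf + beta."""
    if not 0.0 <= conf <= 1.0:  # range check is an assumption
        raise ValueError("confidence level is expected to lie in [0, 1]")
    return alpha * conf + beta


alpha, beta = 0.001, 0.001
kpd_low = ground_normal_gain(0.0, alpha, beta)    # no trust in the ground normal
kpd_mid = ground_normal_gain(0.5, alpha, beta)
kpd_high = ground_normal_gain(1.0, alpha, beta)   # full trust in the ground normal
```

With these values, Kpd grows linearly from 0.001 at zero confidence to 0.002 at full confidence, so a more reliable ground normal vector contributes more strongly to the correction.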

[0116] It should be noted that the implementation of determining the gain coefficient of the accelerometer and the gain coefficient of the ground normal vector provided in the above embodiments is only an exemplary illustration and is not limited to the above calculation method. Variations conceived based on the above implementation method are also within the protection scope of this disclosure.

[0117] The method provided in this disclosure obtains a depth map and a ground segmentation mask from images captured by the built-in camera of a mobile device and, from these, obtains a ground normal vector that represents the direction of gravitational acceleration. Since most mobile devices already have a built-in camera, this disclosure ensures the accuracy of the vertical attitude parameters of the mobile device obtained by the AHRS algorithm without increasing the hardware cost of the mobile device.

[0118] To facilitate understanding of the overall technical framework of the attitude correction scheme provided in this disclosure, the technical principle of the attitude correction provided in this disclosure will be introduced below with reference to the schematic diagram of the attitude correction technical framework shown in Figure 3.

[0119] Figure 3 is a schematic diagram of an attitude correction technology framework provided in an embodiment of this disclosure. As shown in Figure 3, the image captured by the camera of the mobile device is input into the encoding neural network unit 101 to obtain the encoded feature vector. Then, the encoded feature vector is input into the first decoding neural network unit 102 and the second decoding neural network unit 103 respectively. The first decoding neural network unit 102 outputs the depth map of the image, and the second decoding neural network unit 103 outputs the ground segmentation mask of the image. The image, the corresponding depth map, and the ground segmentation mask are then input into the third decoding neural network unit 104, which outputs the confidence level of the ground normal vector. The ground normal vector is calculated based on the depth map, the ground segmentation mask, and the camera intrinsic parameters. Then, by combining the ground normal vector, the accelerometer measurement value, and the confidence level of the ground normal vector, the attitude error of the mobile device in the vertical direction is calculated, and the attitude parameters of the mobile device in the vertical direction obtained by the AHRS algorithm are corrected based on the attitude error.

[0120] Based on the above method embodiments, this disclosure provides an attitude correction device and an electronic device, which will be described below with reference to the accompanying drawings.

[0121] Referring to Figure 4, which is a structural diagram of an attitude correction device provided in an embodiment of the present disclosure, the device 400 is applied to attitude correction of a mobile device equipped with at least an accelerometer, a camera and a gyroscope, and includes: an acquisition unit 401, a determination unit 402 and a correction unit 403.

[0122] The acquisition unit 401 is used to input the image captured by the camera into the trained first neural network unit to obtain the depth map and ground segmentation mask of the image;

[0123] The acquisition unit 401 is further configured to input the image, the depth map, and the ground segmentation mask into a trained second neural network unit to obtain the confidence level of the ground normal vector corresponding to the depth map;

[0124] The determining unit 402 is used to determine the ground normal vector based on the depth map, the ground segmentation mask, and the camera intrinsic parameters;

[0125] The determining unit 402 is further configured to determine the gain coefficient of the accelerometer based on the measured value of the accelerometer, the gravitational acceleration, the preset adjustment coefficient, and the confidence level of the ground normal vector;

[0126] The determining unit 402 is further configured to determine the gain coefficient of the ground normal vector based on the preset adjustment coefficient and the confidence level of the ground normal vector;

[0127] The correction unit 403 is used to correct the vertical attitude parameters of the mobile device obtained by the AHRS algorithm based on the initial rotation matrix of the AHRS algorithm, the measurement value of the accelerometer, the ground normal vector, the gravitational acceleration, the gain coefficient of the accelerometer, and the gain coefficient of the ground normal vector.

[0128] In some embodiments, the correction unit 403 is specifically used to determine the attitude error of the mobile device in the vertical direction based on the initial rotation matrix of the AHRS algorithm, the measurement value of the accelerometer, the ground normal vector, the gravitational acceleration, the gain coefficient of the accelerometer, and the gain coefficient of the ground normal vector; and to correct the attitude parameters of the mobile device in the vertical direction obtained by the AHRS algorithm using the attitude error.

[0129] In some embodiments, the correction unit 403 is specifically used to determine the cosine value of the gravitational acceleration and the initial rotation matrix of the AHRS algorithm; determine a first error based on the gain coefficient of the ground normal vector, the ground normal vector, and the cosine value; determine a second error based on the gain coefficient of the accelerometer, the measured value of the accelerometer, and the cosine value; and determine the attitude error based on the first error and the second error.

[0130] In some implementations, the determining unit 402 is specifically used to determine, based on the depth map and the ground segmentation mask, pixels belonging to ground elements in the depth map as target pixels; determine the three-dimensional point coordinates corresponding to the target pixels based on the pixel coordinates of the target pixels, the depth value of the target pixels, the coordinates of the principal point and the focal length; and determine the ground normal vector based on the three-dimensional point coordinates of at least three target pixels.

[0131] In some implementations, the determining unit 402 is specifically used to construct a matrix of the three-dimensional point coordinates of the at least three target pixels based on the three-dimensional plane equation; and to solve the matrix to obtain the ground normal vector.

[0132] In some implementations, the determining unit 402 is specifically used to determine the initial pixel points belonging to ground elements in the depth map based on the depth map and the ground segmentation mask; and to filter the initial pixel points based on a random sample consensus algorithm to obtain the target pixel points.

[0133] In some embodiments, the adjustment coefficient includes: a first adjustment coefficient and a second adjustment coefficient. The determining unit 402 is specifically used to calculate the absolute value of the difference between the measured value of the accelerometer and the gravitational acceleration; subtract the confidence level of the ground normal vector from a set constant to obtain a first parameter value; obtain the product of the first adjustment coefficient, the absolute value, and the first parameter value; and use the sum of the product and the second adjustment coefficient as the gain coefficient of the accelerometer.

[0134] In some embodiments, the adjustment coefficient includes a first adjustment coefficient and a second adjustment coefficient. The determining unit 402 is specifically used to obtain the product of the first adjustment coefficient and the confidence level of the ground normal vector; and use the sum of the product and the second adjustment coefficient as the gain coefficient of the ground normal vector.

[0135] In some embodiments, the first neural network unit includes an encoding neural network unit, a first decoding neural network unit, and a second decoding neural network unit. The first decoding neural network unit and the second decoding neural network unit are respectively cascaded with the encoding neural network unit. The acquisition unit 401 is specifically used to input the image into the encoding neural network unit. The encoded feature vector output by the encoding neural network unit is respectively input into the first decoding neural network unit and the second decoding neural network unit. The first decoding neural network unit outputs the depth map of the image, and the second decoding neural network unit outputs the ground segmentation mask of the image.

[0136] In some implementations, the training process of the first neural network unit includes:

[0137] A single-frame grayscale image with labeled depth values is input into the cascaded encoding neural network unit and first decoding neural network unit. Based on the loss between the depth values of the predicted depth map of the grayscale image output by the first decoding neural network unit and the labeled depth values of the grayscale image, the encoding neural network unit and the first decoding neural network unit are iteratively trained until the loss reaches the preset training target.

[0138] The sample image and the ground segmentation mask corresponding to the sample image are input into the cascaded encoding neural network unit and second decoding neural network unit. Based on the loss between the predicted ground segmentation mask of the sample image output by the second decoding neural network unit and the ground segmentation mask corresponding to the sample image, the encoding neural network unit and the second decoding neural network unit are iteratively trained until the loss reaches the preset training target.

[0139] In some embodiments, the second neural network unit specifically includes a third decoding neural network unit. The training process of the third decoding neural network unit includes: inputting the single-frame grayscale image, the predicted depth map output by the first decoding neural network unit, and the predicted ground segmentation mask output by the second decoding neural network unit into the third decoding neural network unit, training the third decoding neural network unit, and the third decoding neural network unit outputting the confidence level of the ground normal vector.

[0140] It should be noted that the specific implementation of each unit in the above device embodiments can be found in the relevant descriptions in the above method embodiments. The division of units in this disclosure is illustrative and only represents a logical functional division; in actual implementation, there may be other division methods. The functional units in this disclosure can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. For example, in the above embodiments, the acquisition unit and the determining unit can be the same unit or different units. The integrated units can be implemented in hardware or as software functional units.

[0141] Based on the attitude correction method provided in the above method embodiments, this disclosure also provides an electronic device, including: one or more processors; a memory storing one or more program codes thereon, wherein when the one or more program codes are executed by the one or more processors, the one or more processors implement the method described in any of the above embodiments.

[0142] Referring now to Figure 5, a schematic diagram of the structure of an electronic device 500 suitable for implementing embodiments of the present disclosure is shown. The electronic devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital TVs (Televisions) and desktop computers. The electronic device shown in Figure 5 is merely an example and should not impose any limitation on the functionality and scope of use of embodiments of the present disclosure.

[0143] As shown in Figure 5, the electronic device 500 may include a processing unit (e.g., a central processing unit, a graphics processing unit, etc.) 501, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 502 or a program loaded from storage device 508 into random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing unit 501, ROM 502, and RAM 503 are interconnected via a bus 504. An input / output (I / O) interface 505 is also connected to the bus 504.

[0144] Typically, the following devices can be connected to I / O interface 505: input devices 506 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 507 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 508 including, for example, magnetic tapes, hard disks, etc.; and communication devices 509. Communication device 509 allows electronic device 500 to communicate wirelessly or wiredly with other devices to exchange data. Although Figure 5 shows an electronic device 500 with various devices, it should be understood that it is not required to implement or possess all of the devices shown. More or fewer devices may be implemented or possessed alternatively.

[0145] In particular, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 509, or installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing unit 501, it performs the functions defined in the methods of embodiments of this disclosure.

[0146] The electronic device provided in this embodiment and the attitude correction method provided in the above embodiments belong to the same inventive concept. Technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.

[0147] Based on the attitude correction method provided in the above embodiments, this disclosure provides a computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the attitude correction method as described in any of the above embodiments.

[0148] It should be noted that the computer-readable medium described above in this disclosure can be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this disclosure, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this disclosure, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. The propagated data signal can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. The computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (Radio Frequency), etc., or any suitable combination thereof.

[0149] The aforementioned computer-readable medium may be included in the aforementioned electronic device; or it may exist independently and not assembled into the electronic device.

[0150] The aforementioned computer-readable medium carries one or more programs, which, when executed by the electronic device, cause the electronic device to perform the aforementioned attitude correction method.

[0151] The units described in the embodiments of this disclosure can be implemented in software or in hardware. The name of a unit or module does not necessarily limit the unit or module itself; for example, a voice data acquisition module can also be described as a "data acquisition module".

[0152] The functions described above in this document can be performed, at least in part, by one or more hardware logic components. For example, exemplary types of hardware logic components that can be used, without limitation, include: Field Programmable Gate Array (FPGA), Application-Specific Integrated Circuit (ASIC), Application-Specific Standard Product (ASSP), System on Chip (SOC), Complex Programmable Logic Device (CPLD), and so on.

[0153] It should be noted that the various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the other embodiments. For parts that are identical or similar between embodiments, the embodiments can be referred to one another. Because the systems or apparatus disclosed in the embodiments correspond to the methods disclosed in the embodiments, their descriptions are relatively brief; for the relevant details, refer to the description of the methods.

[0154] It should be understood that in this disclosure, "at least one item" means one or more, and "more than one" means two or more. "And/or" describes the relationship between related objects and indicates that three relationships can exist. For example, "A and/or B" can represent three cases: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural. The character "/" generally indicates that the related objects before and after it are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.

[0155] It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0156] The above description of the disclosed embodiments enables those skilled in the art to make or use this disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this disclosure. Therefore, this disclosure is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An attitude correction method for correcting the attitude of a mobile device equipped with at least an accelerometer, a camera, and a gyroscope, the method comprising: inputting an image captured by the camera into a trained first neural network unit to obtain a depth map and a ground segmentation mask of the image; inputting the image, the depth map, and the ground segmentation mask into a trained second neural network unit to obtain a confidence level of a ground normal vector corresponding to the depth map; determining the ground normal vector based on the depth map, the ground segmentation mask, and camera intrinsic parameters; determining a gain coefficient of the accelerometer based on a measured value of the accelerometer, gravitational acceleration, a preset adjustment coefficient, and the confidence level of the ground normal vector; determining a gain coefficient of the ground normal vector based on the preset adjustment coefficient and the confidence level of the ground normal vector; and correcting attitude parameters of the mobile device in the vertical direction, obtained by an AHRS algorithm, based on an initial rotation matrix of the AHRS algorithm, the measured value of the accelerometer, the ground normal vector, the gravitational acceleration, the gain coefficient of the accelerometer, and the gain coefficient of the ground normal vector.

2. The method according to claim 1, wherein correcting the attitude parameters of the mobile device in the vertical direction, obtained by the AHRS algorithm, based on the initial rotation matrix of the AHRS algorithm, the measured value of the accelerometer, the ground normal vector, the gravitational acceleration, the gain coefficient of the accelerometer, and the gain coefficient of the ground normal vector comprises: determining an attitude error of the mobile device in the vertical direction based on the initial rotation matrix of the AHRS algorithm, the measured value of the accelerometer, the ground normal vector, the gravitational acceleration, the gain coefficient of the accelerometer, and the gain coefficient of the ground normal vector; and correcting, using the attitude error, the attitude parameters of the mobile device in the vertical direction obtained by the AHRS algorithm.

3. The method according to claim 2, wherein determining the attitude error of the mobile device in the vertical direction based on the initial rotation matrix of the AHRS algorithm, the measured value of the accelerometer, the ground normal vector, the gravitational acceleration, the gain coefficient of the accelerometer, and the gain coefficient of the ground normal vector comprises: determining a cosine value of the gravitational acceleration and the initial rotation matrix of the AHRS algorithm; determining a first error based on the gain coefficient of the ground normal vector, the ground normal vector, and the cosine value; determining a second error based on the gain coefficient of the accelerometer, the measured value of the accelerometer, and the cosine value; and determining the attitude error of the mobile device in the vertical direction based on the first error and the second error.
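As a non-limiting illustration, the error combination of claim 3 may be sketched in the style of a Mahony complementary filter. Everything specific here is an assumption and is not taken from this disclosure: the "cosine value" is read as the direction cosines of the vertical axis expressed in the body frame (`R.T @ [0, 0, 1]`), each error is formed as a gain-weighted cross product, the attitude error is the sum of the two errors, the rotation matrix `R` is taken to map body coordinates to world coordinates, and the ground normal is assumed to be expressed in the body frame. The function name `vertical_attitude_error` is hypothetical.

```python
import numpy as np


def vertical_attitude_error(R, accel, normal, k_acc, k_normal):
    """Mahony-style vertical attitude error (illustrative assumption only).

    R        : 3x3 rotation matrix, assumed to map body frame to world frame
    accel    : accelerometer measurement in the body frame
    normal   : ground normal vector, assumed to be in the body frame
    k_acc    : gain coefficient of the accelerometer
    k_normal : gain coefficient of the ground normal vector
    """
    # Assumed reading of the "cosine value": direction cosines of the world
    # vertical axis as seen from the body frame.
    v = R.T @ np.array([0.0, 0.0, 1.0])
    a = np.asarray(accel, dtype=float)
    n = np.asarray(normal, dtype=float)
    a = a / np.linalg.norm(a)
    n = n / np.linalg.norm(n)
    first_error = k_normal * np.cross(n, v)   # ground normal term
    second_error = k_acc * np.cross(a, v)     # accelerometer term
    return first_error + second_error


# Level device: both measurements agree with the estimated vertical axis.
err_level = vertical_attitude_error(
    np.eye(3), [0.0, 0.0, 9.81], [0.0, 0.0, 1.0], k_acc=0.5, k_normal=0.5
)
```

In this sketch, a measurement that is already aligned with the estimated vertical axis contributes no error, so only the disagreeing source drives the correction.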

4. The method according to any one of claims 1-3, wherein the camera intrinsic parameters comprise a principal point and a focal length, and determining the ground normal vector based on the depth map, the ground segmentation mask, and the camera intrinsic parameters comprises: determining, based on the depth map and the ground segmentation mask, pixels in the depth map that belong to ground elements as target pixels; determining three-dimensional point coordinates corresponding to each target pixel based on the pixel coordinates of the target pixel, the depth value of the target pixel, the coordinates of the principal point, and the focal length; and determining the ground normal vector based on the three-dimensional point coordinates of at least three target pixels.
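As a non-limiting illustration, the steps of claim 4 may be sketched as follows. The pinhole back-projection and the least-squares plane fit by singular value decomposition are assumptions chosen for the sketch; the claim only requires that the normal be determined from the three-dimensional coordinates of at least three target pixels. The function name `ground_normal`, the separate focal lengths `fx` and `fy`, and the convention of orienting the normal toward the camera are likewise hypothetical.

```python
import numpy as np


def ground_normal(depth, mask, fx, fy, cx, cy):
    """Estimate a unit ground normal in the camera frame.

    depth  : HxW depth map (same units as the desired 3D points)
    mask   : HxW boolean ground segmentation mask
    fx, fy : focal lengths in pixels (assumed pinhole model)
    cx, cy : principal point in pixels
    """
    rows, cols = np.nonzero(mask)             # target pixels (ground)
    d = depth[rows, cols]
    keep = d > 0                              # drop pixels without valid depth
    rows, cols, d = rows[keep], cols[keep], d[keep]
    if d.size < 3:
        raise ValueError("at least three ground pixels are required")

    # Back-project each target pixel (u = column, v = row) to a 3D point.
    pts = np.stack([(cols - cx) * d / fx, (rows - cy) * d / fy, d], axis=1)

    # Least-squares plane fit: the normal is the right singular vector with
    # the smallest singular value of the centered point cloud.
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    n = vt[-1]

    # Assumed sign convention: the normal points from the ground to the camera.
    if np.dot(n, centroid) > 0:
        n = -n
    return n / np.linalg.norm(n)
```

Fitting a plane to all target pixels rather than to exactly three points is a design choice of this sketch; it reduces sensitivity to noise in individual depth values.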

5. The method according to any one of claims 1-4, wherein the adjustment coefficient comprises a first adjustment coefficient and a second adjustment coefficient, and determining the gain coefficient of the accelerometer based on the measured value of the accelerometer, the gravitational acceleration, the preset adjustment coefficient, and the confidence level of the ground normal vector comprises: calculating an absolute value of the difference between the measured value of the accelerometer and the gravitational acceleration; obtaining a first parameter value by subtracting the confidence level of the ground normal vector from a set constant; obtaining a product of the first adjustment coefficient, the absolute value, and the first parameter value; and using the sum of the product and the second adjustment coefficient as the gain coefficient of the accelerometer.
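As a non-limiting illustration, the arithmetic of claim 5 may be sketched as follows. Two points are assumptions rather than statements of this disclosure: the "measured value of the accelerometer" is reduced to a scalar by taking the Euclidean norm of the three-axis reading, and the set constant defaults to 1.0. The function name `accelerometer_gain`, the parameter names, and the numeric values are hypothetical.

```python
import math


def accelerometer_gain(accel, g, confidence, k1, k2, set_constant=1.0):
    """Gain coefficient of the accelerometer, following the steps of claim 5.

    accel        : three-axis accelerometer reading
    g            : gravitational acceleration (scalar)
    confidence   : confidence level of the ground normal vector
    k1, k2       : first and second adjustment coefficients
    set_constant : the "set constant" (value of 1.0 is an assumption)
    """
    magnitude = math.sqrt(sum(x * x for x in accel))   # assumed scalar form
    deviation = abs(magnitude - g)                     # |measurement - g|
    first_parameter = set_constant - confidence
    return k1 * deviation * first_parameter + k2


# Hypothetical values: reading of norm 5, g = 10, confidence 0.8.
gain_example = accelerometer_gain((3.0, 4.0, 0.0), 10.0, 0.8, k1=2.0, k2=0.1)
```

Under these assumptions, a reading whose magnitude equals the gravitational acceleration reduces the gain to the second adjustment coefficient alone.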

6. The method according to any one of claims 1-4, wherein the adjustment coefficient comprises a first adjustment coefficient and a second adjustment coefficient, and determining the gain coefficient of the ground normal vector based on the preset adjustment coefficient and the confidence level of the ground normal vector comprises: obtaining a product of the first adjustment coefficient and the confidence level of the ground normal vector; and using the sum of the product and the second adjustment coefficient as the gain coefficient of the ground normal vector.
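As a non-limiting illustration, the gain of claim 6 is a linear function of the confidence level and may be sketched as follows. The function name `ground_normal_gain` and the numeric values are hypothetical and are not taken from this disclosure.

```python
def ground_normal_gain(confidence, k1, k2):
    """Gain coefficient of the ground normal vector, following claim 6.

    confidence : confidence level of the ground normal vector
    k1, k2     : first and second adjustment coefficients
    """
    return k1 * confidence + k2


# Hypothetical coefficients, evaluated at several confidence levels.
gains = [ground_normal_gain(c, k1=2.0, k2=0.5) for c in (0.0, 0.5, 1.0)]
```

With a positive first adjustment coefficient, a higher confidence level gives the ground normal vector a larger weight in the correction.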

7. The method according to any one of claims 1-6, wherein the first neural network unit comprises an encoding neural network unit, a first decoding neural network unit, and a second decoding neural network unit, the first decoding neural network unit and the second decoding neural network unit each being cascaded with the encoding neural network unit, and inputting the image captured by the camera into the trained first neural network unit to obtain the depth map and the ground segmentation mask of the image comprises: inputting the image into the encoding neural network unit, and inputting the encoded feature vector output by the encoding neural network unit into the first decoding neural network unit and the second decoding neural network unit respectively, wherein the first decoding neural network unit outputs the depth map of the image and the second decoding neural network unit outputs the ground segmentation mask of the image.

8. The method according to claim 7, wherein the training process of the first neural network unit comprises: inputting a single-frame grayscale image with labeled depth values into the cascaded encoding neural network unit and first decoding neural network unit; iteratively training the encoding neural network unit and the first decoding neural network unit, based on the loss between the depth values of the predicted depth map of the grayscale image output by the first decoding neural network unit and the labeled depth values of the grayscale image, until the loss reaches a preset training target; inputting a sample image and the ground segmentation mask corresponding to the sample image into the cascaded encoding neural network unit and second decoding neural network unit; and iteratively training the encoding neural network unit and the second decoding neural network unit, based on the loss between the predicted ground segmentation mask of the sample image output by the second decoding neural network unit and the ground segmentation mask corresponding to the sample image, until the loss reaches the preset training target.

9. The method according to claim 8, wherein the second neural network unit comprises a third decoding neural network unit, and the training process of the third decoding neural network unit comprises: inputting the single-frame grayscale image, the predicted depth map output by the first decoding neural network unit, and the predicted ground segmentation mask output by the second decoding neural network unit into the third decoding neural network unit to train the third decoding neural network unit, wherein the third decoding neural network unit outputs the confidence level of the ground normal vector.

10. An electronic device, comprising a memory and a processor, wherein the memory is configured to store program code, and the processor is configured to call the program code to execute the attitude correction method according to any one of claims 1-9.

11. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for performing the attitude correction method according to any one of claims 1-9.

12. A computer program product, wherein the computer program product, when executed by a processor, implements the attitude correction method according to any one of claims 1-9.