Method for classifying skin burn depth based on wound image features
A wound depth undulation distribution field is constructed by acquiring a video sequence with a monocular imaging device, generating masks, and solving homography matrices. This addresses inaccurate wound measurement with handheld devices and enables high-precision wound area correction and burn depth assessment.
Patent Information
- Authority / Receiving Office: CN (China)
- Patent Type: Application (China)
- Current Assignee / Owner: DALIAN LUQIAO TECH CO LTD
- Filing Date: 2026-03-27
- Publication Date: 2026-06-26
AI Technical Summary
In existing approaches, wound area measured from handheld photographs is inaccurate because of perspective projection compression, which in turn undermines burn depth assessment. Professional 3D scanning equipment avoids this problem but is complicated to operate and difficult to deploy widely.
A video sequence is acquired with a monocular imaging device in a free-moving state. Masks are generated for the wound and the label, homography matrices are calculated, and a wound depth undulation distribution field is constructed. The wound area is then corrected to eliminate perspective distortion, achieving high-precision area measurement.
It improves the stability and accuracy of wound area measurement, provides quantifiable information on wound depth, and enhances the reliability and consistency of burn depth classification.

Figure CN122289355A_ABST
Description
Technical Field
[0001] This invention relates to the field of medical burn technology, specifically to a method for classifying skin burn depth based on wound image features.
Background Technology
[0002] Accurate assessment of burn wound depth is crucial for developing a treatment plan. Superficial burns (such as superficial second-degree burns) retain some dermal appendages and can heal spontaneously through epithelial cell proliferation, so the wound area shrinks steadily and significantly. In contrast, deep burns (such as deep second-degree or third-degree burns) involve full-thickness skin necrosis and lack a source of regeneration; their wound area often stops shrinking or fluctuates repeatedly because of infection. Therefore, the true rate of change of wound area over time is an important objective indicator for determining burn depth.
[0003] Currently, clinical practice primarily measures wound area with two-dimensional imaging: the wound is photographed by hand together with a scale bar, and the area is estimated by counting pixels. However, the body surface is irregular and curved, and during bedside handheld photography it is difficult to keep the device's optical axis perpendicular to the wound. This readily causes perspective compression, so the measured value is smaller than the actual unfolded area. The error can also mask the subtle early contraction of a healing wound, hindering accurate differentiation of burn depth. Professional three-dimensional scanning equipment can solve this problem, but it is expensive and complex to operate, making it impractical for routine, high-frequency bedside follow-up. The pressing technical problem, therefore, is how to use convenient handheld imaging devices to eliminate projection distortion, obtain a high-precision true unfolded wound area, and thereby provide objective auxiliary assessment of burn depth.
Summary of the Invention
[0004] To address the difficulty of measuring wound area accurately with two-dimensional imaging, where perspective projection compression caused by handheld devices distorts the result, this invention provides a skin burn depth classification method based on wound image features. The specific technical solution is as follows. The method comprises: acquiring a video sequence, wherein the video sequence is acquired by a monocular imaging device in a free-moving state and includes the wound region and a rigid planar label; selecting a frame from the sequence as a reference frame; dividing the reference frame into regions to generate a first mask for the wound region and a second mask for the label; determining the unit physical area corresponding to one pixel based on the label image in the second mask and the known physical area of the label; detecting target feature points within the first mask and tracking the actual image coordinates of the target feature points in each frame of the sequence; solving the homography matrix from the reference frame to each frame in the sequence; mapping, based on the homography matrix, the coordinates of each target feature point in the reference frame to each frame to obtain the planar projection coordinates of each target feature point in each frame; combining the planar projection coordinates with the actual image coordinates to generate a depth undulation distribution field covering the wound region; and determining the corrected unfolded area of the wound region according to the spatial rate of change of the depth undulation distribution field and the unit physical area.
Based on the corrected unfolded areas of the same wound at two consecutive visits and the time interval between them, the average daily area shrinkage rate is determined; this rate is compared with a preset judgment threshold, and auxiliary judgment information on wound depth is output based on the comparison result.
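The final comparison step can be sketched as below. This is a minimal illustration under stated assumptions, not the patented formula: the patent does not define how the average daily shrinkage rate is computed, so a relative rate (percent of the earlier area per day) is assumed here, and the `Visit` record, both function names, the threshold value, and the output labels are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One follow-up visit of the same wound (hypothetical record type)."""
    area_mm2: float  # corrected unfolded area
    day: float       # time of the visit, in days


def average_daily_shrinkage_rate(prev: Visit, curr: Visit) -> float:
    """Relative area shrinkage per day, in percent of the earlier area.

    Assumption: the patent does not state the formula; a relative rate is
    used so that wounds of different sizes can share one threshold.
    """
    days = curr.day - prev.day
    if days <= 0:
        raise ValueError("visits must be in chronological order")
    if prev.area_mm2 <= 0:
        raise ValueError("earlier area must be positive")
    return (prev.area_mm2 - curr.area_mm2) / prev.area_mm2 / days * 100.0


def depth_hint(rate: float, threshold: float) -> str:
    """Auxiliary hint from comparing the rate with a preset threshold.

    The threshold value and the output labels are placeholders.
    """
    if rate >= threshold:
        return "suggests superficial burn"
    return "suggests deep burn"
```

For example, a wound whose corrected area falls from 1000 mm² to 850 mm² over 5 days shrinks by 3 % per day under this definition; against a placeholder threshold of 2 % per day the sketch returns "suggests superficial burn".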
[0005] Further, acquiring the video sequence includes: acquiring a video sequence that meets preset acquisition conditions, wherein the preset acquisition conditions include at least the following: each frame of the video sequence fully contains the rigid planar label; during acquisition, the wound region shows no discernible displacement relative to bony landmarks; and the movement of the imaging device is mainly displacement parallel to the image plane, with the displacement amplitude perpendicular to the image plane smaller than that in the parallel direction.
[0006] Further, dividing the reference frame into regions to generate a first mask for the wound region and a second mask for the label includes: performing image analysis on the reference frame to identify the pixel regions belonging to the wound and to the rigid planar label; and generating, based on the recognition results, a first mask and a second mask of the same size as the reference frame, wherein the first mask marks the pixel positions of the wound and the second mask marks the pixel positions of the label.
[0007] Furthermore, the second mask is a binary image in which the region with pixel value 1 is used to locate and extract the image content of the rigid planar label from the reference frame as the label image. Determining the unit physical area includes: within the area covered by the second mask, detecting the pixel coordinates of the four vertices of the label in the label image and refining them to sub-pixel precision; setting, based on the known geometry of the label, the coordinates of the label's four standard vertices in the frontal view; calculating the perspective correction matrix from the correspondence between the four pixel vertex coordinates and the four standard vertex coordinates; transforming the label image in the second mask with the perspective correction matrix to map it to a standard geometric image in the frontal view; binarizing the standard geometric image to generate a binary image of the label; counting the pixels belonging to the label in this binary image as the pixel count; and dividing the known physical area of the label by the pixel count to obtain the unit physical area represented by each pixel.
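The last two steps of [0007] reduce to a pixel count and one division. The sketch below assumes the rectified, binarized label is available as a nested list of 0/1 values (1 = label pixel); the function name is hypothetical.

```python
from typing import Sequence


def unit_physical_area(binary_label: Sequence[Sequence[int]], label_area_mm2: float) -> float:
    """Physical area (mm^2) represented by one pixel.

    binary_label: rectified frontal-view label image after binarization,
                  1 = label pixel, 0 = background.
    label_area_mm2: known physical area of the rigid planar label.
    """
    pixel_count = sum(1 for row in binary_label for v in row if v)
    if pixel_count == 0:
        raise ValueError("binary label image contains no label pixels")
    return label_area_mm2 / pixel_count
```

With the example dimensions used later in the description (a 50 mm square label rectified into a 500×500-pixel frontal view that the label fills completely), the label area is 2500 mm² over 250000 pixels, i.e. 0.01 mm² per pixel.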
[0008] Furthermore, the first mask is a binary image in which pixel value 1 indicates that the pixel position belongs to the wound region and pixel value 0 indicates that it does not. Detecting target feature points within the first mask and tracking their actual image coordinates in each frame of the sequence includes: within the area covered by the first mask of the reference frame, detecting multiple target feature points with distinct image texture and refining their coordinates to sub-pixel precision; for each frame of the video sequence other than the reference frame, extracting a fixed-size image patch centered on the target feature point's position in the previous frame as the reference patch to be matched, and setting a local search window centered on the same position; within the local search window, extracting candidate patches of the same size as the reference patch in order from left to right and top to bottom, and finding the candidate patch most similar to the reference patch as the target patch; and taking the center of the target patch as the actual image coordinates of the target feature point in the current frame.
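A minimal sketch of the block-matching tracker in [0008], assuming grayscale frames stored as nested Python lists. The patent does not name a similarity measure, so the sum of squared differences (SSD) is an assumption, as are the patch half-size `r`, the search radius `search`, and the function names.

```python
from typing import List, Tuple

Image = List[List[float]]


def extract_patch(img: Image, cx: int, cy: int, r: int) -> Image:
    """Square patch of side 2r+1 centred on (cx, cy); assumes it lies inside img."""
    return [row[cx - r: cx + r + 1] for row in img[cy - r: cy + r + 1]]


def ssd(a: Image, b: Image) -> float:
    """Sum of squared differences between two equally sized patches."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def track_point(prev: Image, curr: Image, pt: Tuple[int, int],
                r: int = 3, search: int = 5) -> Tuple[int, int]:
    """Find pt's position in curr by exhaustive block matching around pt."""
    x0, y0 = pt
    ref = extract_patch(prev, x0, y0, r)  # reference patch from the previous frame
    h, w = len(curr), len(curr[0])
    best, best_pos = float("inf"), pt
    # Scan candidate centres top-to-bottom, left-to-right inside the window.
    for cy in range(y0 - search, y0 + search + 1):
        for cx in range(x0 - search, x0 + search + 1):
            if cy - r < 0 or cx - r < 0 or cy + r >= h or cx + r >= w:
                continue  # skip candidates whose patch would leave the image
            score = ssd(ref, extract_patch(curr, cx, cy, r))
            if score < best:  # strict '<' keeps the first best in scan order
                best, best_pos = score, (cx, cy)
    return best_pos
```

This exhaustive search returns integer positions; a sub-pixel refinement step could be applied to the best match if needed.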
[0009] Furthermore, detecting multiple target feature points with distinct image texture within the area covered by the first mask of the reference frame includes: for each pixel position within the first mask, calculating the absolute grayscale difference between the pixel and its right neighbor as the degree of horizontal grayscale change, the absolute difference between the pixel and its lower neighbor as the degree of vertical grayscale change, and the absolute difference between the pixel and its lower-right neighbor as the degree of diagonal grayscale change; taking the largest of these three values as the response value of the pixel position; selecting pixel positions whose response values exceed a preset threshold as candidate feature points; calculating the Euclidean distance between adjacent candidate feature points; and removing candidate feature points whose Euclidean distance is less than a preset distance threshold to obtain the final target feature points.
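The response and suppression steps of [0009] can be sketched as follows. The patent does not say which point of a too-close pair is removed; this sketch assumes a greedy rule that visits candidates from strongest to weakest response and drops any candidate within `min_dist` of an already accepted point. Images are nested lists, and `detect_features` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def detect_features(gray: Sequence[Sequence[float]], mask: Sequence[Sequence[int]],
                    response_thresh: float, min_dist: float) -> List[Tuple[int, int]]:
    """Return (x, y) positions of target feature points inside the mask."""
    h, w = len(gray), len(gray[0])
    candidates = []
    # The last row and column lack right/lower neighbours and are skipped.
    for y in range(h - 1):
        for x in range(w - 1):
            if not mask[y][x]:
                continue
            g = gray[y][x]
            response = max(abs(g - gray[y][x + 1]),      # horizontal change
                           abs(g - gray[y + 1][x]),      # vertical change
                           abs(g - gray[y + 1][x + 1]))  # diagonal change
            if response > response_thresh:
                candidates.append((response, x, y))
    # Assumed rule: strongest first; reject points too close to accepted ones.
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept: List[Tuple[int, int]] = []
    for _, x, y in candidates:
        if all(math.hypot(x - kx, y - ky) >= min_dist for kx, ky in kept):
            kept.append((x, y))
    return kept
```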
[0010] Furthermore, solving the homography matrix from the reference frame to each frame of the sequence includes: within the area covered by the second mask of the reference frame, detecting multiple reference feature points with distinct image texture and refining their coordinates to sub-pixel precision; for each frame of the video sequence other than the reference frame, tracking the image coordinates of each reference feature point in the current frame; and calculating the homography matrix from the reference frame to the current frame based on the correspondence between the coordinates of the reference feature points in the reference frame and their tracked coordinates in the current frame.
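The patent does not say how the homography is computed from the tracked label points. One common choice is the direct linear transform with H[2][2] fixed to 1, solved in the least-squares sense through the normal equations; the pure-Python sketch below assumes that formulation, and its function names are hypothetical. With real pixel coordinates, a production version would normally normalize the coordinates first and reject outlier tracks, for example with OpenCV's `cv2.findHomography` and its RANSAC option.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the n x n system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Least-squares homography H (with H[2][2] = 1) mapping src -> dst."""
    if len(src) != len(dst) or len(src) < 4:
        raise ValueError("need at least 4 point pairs")
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y]); rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y]); rhs.append(v)
    # Normal equations (A^T A) h = A^T b for the 8 unknown entries.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * bi for r, bi in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]
```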
[0011] Furthermore, obtaining the planar projection coordinates includes: obtaining, for each target feature point detected within the first mask, its coordinates in the reference frame; obtaining, for each frame of the video sequence other than the reference frame, the homography matrix from the reference frame to the current frame; representing the two-dimensional coordinates of the target feature point in the reference frame as a three-component vector whose first two values are the x and y coordinates of the point and whose third value is a first fixed constant; multiplying the homography matrix from the reference frame to the current frame by this vector to obtain a new three-component result; and normalizing the result and converting it into two-dimensional coordinates in the current frame's image coordinate system, which serve as the planar projection coordinates of the target feature point in the current frame. The planar projection coordinates represent the theoretical position at which the target feature point would appear in the current frame if it lay on the plane of the label.
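A sketch of the projection in [0011], assuming the "first fixed constant" is 1 (the usual homogeneous-coordinate convention; the patent does not state its value) and the homography is a 3×3 nested list. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def project_point(h: Sequence[Sequence[float]], x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) through homography h using homogeneous coordinates."""
    p = (x, y, 1.0)  # third component assumed to be 1
    u, v, w = (sum(h[r][c] * p[c] for c in range(3)) for r in range(3))
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    # Normalize by the third component to return to image coordinates.
    return u / w, v / w
```

A pure translation such as `[[1, 0, 5], [0, 1, -3], [0, 0, 1]]` shifts (2, 4) to (7, 1); a non-zero bottom row makes `w` differ from 1, which is where the perspective division matters.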
[0012] Furthermore, combining the planar projection coordinates with the actual image coordinates to generate a depth undulation distribution field covering the wound region includes: for each tracked target feature point, obtaining its planar projection coordinates and actual image coordinates in each frame and calculating the Euclidean distance between them as the original deviation value; for each frame, decomposing the camera translation vector from the frame's homography matrix and taking its magnitude as the frame's motion amplitude factor, then dividing the original deviation value of each target feature point in the frame by this factor to obtain the normalized local depth deviation value; calculating the arithmetic mean of the local depth deviation values of the same target feature point across all frames as the depth undulation index of that point; and estimating, by spatial interpolation from the depth undulation indices of all target feature points and their pixel positions in the reference frame, the depth undulation index at each pixel position in the wound region. Spatial interpolation here means estimating the index at pixel positions without feature points from the known indices of the target feature points, following the principle that nearer points have greater influence. The depth undulation indices of all pixel positions are assembled into a two-dimensional array of the same size as the reference frame, which serves as the depth undulation distribution field covering the wound region.
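The per-point index and the interpolation of [0012] can be sketched as below, with two loudly stated assumptions. First, the motion amplitude factors are taken as precomputed inputs: extracting a translation from a homography requires camera intrinsics (OpenCV's `cv2.decomposeHomographyMat` is one existing route) and is not shown. Second, "nearer points have greater influence" is implemented as inverse-distance weighting with power 2, which is one common choice rather than the patent's stated method. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def depth_index(projected: Sequence[Point], actual: Sequence[Point],
                motion: Sequence[float]) -> float:
    """Depth undulation index of one target feature point.

    projected[i], actual[i]: planar projection and tracked coordinates in frame i.
    motion[i]: motion amplitude factor of frame i (assumed precomputed).
    Frames with a non-positive factor are skipped to avoid division by zero.
    """
    vals = [math.dist(p, a) / m
            for p, a, m in zip(projected, actual, motion) if m > 0]
    if not vals:
        raise ValueError("no frame with a positive motion amplitude factor")
    return sum(vals) / len(vals)


def idw_field(points: Sequence[Point], values: Sequence[float],
              width: int, height: int, power: float = 2.0) -> List[List[float]]:
    """Inverse-distance-weighted interpolation of point values onto a grid."""
    field = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            num = den = 0.0
            exact = None
            for (px, py), v in zip(points, values):
                d = math.hypot(x - px, y - py)
                if d == 0.0:
                    exact = v  # pixel coincides with a feature point
                    break
                wgt = d ** -power  # nearer points get larger weights
                num += wgt * v
                den += wgt
            field[y][x] = exact if exact is not None else num / den
    return field
```

The grid covers the whole reference frame; pixels outside the first mask can simply be ignored by the later area step.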
[0013] Furthermore, determining the corrected unfolded area includes: for each pixel position within the first mask of the wound region, calculating the horizontal and vertical partial derivatives of the depth undulation distribution field at that position; squaring each partial derivative and adding the two squares to obtain the spatial rate of change at the pixel position; multiplying the spatial rate of change by a pre-calibrated mapping coefficient to obtain an intermediate value, adding a second fixed constant to the intermediate value, and taking the square root of the sum to obtain the magnification factor; multiplying the unit physical area by the magnification factor to obtain the corrected local area at the pixel position; and traversing all pixel positions within the first mask and summing their corrected local areas, the sum being taken as the corrected unfolded area of the wound region.
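Written out, the per-pixel magnification in [0013] is sqrt(k · ((∂D/∂x)² + (∂D/∂y)²) + c), where D is the depth undulation field, k the pre-calibrated mapping coefficient, and c the second fixed constant. The sketch below assumes c = 1, so that a flat region contributes exactly one unit physical area per pixel (the same form as the surface-area element sqrt(1 + |∇z|²)), and assumes central finite differences for the partial derivatives; the value of k and the function name are hypothetical.

```python
import math
from typing import Sequence


def corrected_unfolded_area(field: Sequence[Sequence[float]],
                            mask: Sequence[Sequence[int]],
                            unit_area: float, k: float, c: float = 1.0) -> float:
    """Sum of per-pixel areas magnified by the local slope of the field.

    field: depth undulation distribution field (same size as the mask).
    mask: first mask, 1 = wound pixel.
    unit_area: physical area per pixel; k: calibrated mapping coefficient
    (hypothetical value); c: second fixed constant (assumed 1).
    """
    h, w = len(field), len(field[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Central differences inside the grid, one-sided at its border.
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            fx = (field[y][xr] - field[y][xl]) / (xr - xl) if xr != xl else 0.0
            fy = (field[yd][x] - field[yu][x]) / (yd - yu) if yd != yu else 0.0
            rate = fx * fx + fy * fy                 # spatial rate of change
            magnification = math.sqrt(k * rate + c)
            total += unit_area * magnification       # corrected local area
    return total
```

On a flat field the result equals the plain pixel-count area; on a field sloping one unit per pixel with k = 1, each pixel is magnified by √2.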
[0014] The present invention has the following beneficial effects. It generates masks for the wound and the label in a reference frame and uses the known physical area of the label to establish a mapping between one pixel and its physical area, eliminating errors caused by image scale and shooting distance and providing a reliable physical benchmark for subsequent area calculation. By solving the homography matrix from the reference frame to each frame, it obtains the planar projection coordinates of the target feature points and compares them with the actual image coordinates to construct a depth undulation distribution field covering the entire wound, thereby recovering three-dimensional wound information from ordinary two-dimensional video and overcoming the inability of static images to reflect wound unevenness. The unit physical area is corrected pixel by pixel according to the spatial rate of change of the depth undulation distribution field, eliminating the influence of wound unevenness, tilt, and perspective distortion on area calculation and yielding a high-precision corrected unfolded area, which significantly improves the stability and accuracy of wound area measurement. Finally, the average daily area shrinkage rate is calculated from the corrected unfolded areas of two consecutive visits to the same wound and the time interval between them, and auxiliary wound depth information is output by comparison with a preset threshold. This turns empirical judgment into a quantifiable, reproducible objective indicator and improves the consistency and reliability of burn depth classification.
Attached Figure Description
[0015] To more clearly illustrate the technical solutions and advantages in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0016] Figure 1 is a flowchart of a skin burn depth classification method based on wound image features according to an embodiment of the present invention; Figure 2 is an example diagram of the process of determining the unit physical area according to an embodiment of the present invention; Figure 3 is an example diagram of the homography matrix solving process according to an embodiment of the present invention.
Detailed Implementation
[0017] To further illustrate the technical means and effects adopted by the present invention to achieve its intended purpose, the following, in conjunction with the accompanying drawings and preferred embodiments, details the specific implementation, structure, features, and effects of a skin burn depth classification method based on wound image features proposed according to the present invention. In the following description, different "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, specific features, structures, or characteristics in one or more embodiments can be combined in any suitable form.
[0018] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0019] The following description, in conjunction with the accompanying drawings, details a specific scheme for a skin burn depth classification method based on wound image features provided by the present invention.
[0020] Please refer to Figure 1, which shows a flowchart of a skin burn depth classification method based on wound image features according to an embodiment of the present invention. The method includes: S101: Acquire a video sequence, wherein the video sequence is acquired by a monocular imaging device in a free-moving state and includes the wound region and a rigid planar label.
[0021] Monocular imaging devices refer to imaging devices that contain only a single camera, such as smartphones, medical handheld cameras, tablets, and other everyday portable devices.
[0022] Free-moving state refers to the imaging device being unrestrained by external mechanical structures during video acquisition, allowing the operator to hold the device and take pictures. This corresponds to the real-world scenario in clinical practice where doctors or nurses use mobile phones or cameras to take photos of patients' wounds.
[0023] It should be emphasized that the "free-moving state" described in this invention, while not dependent on external mechanical constraints, does not imply arbitrary shaking. The parallax produced by translational motion corresponds monotonically to object depth, whereas the image changes produced by rotational motion are independent of depth, and forward/backward motion changes the object's scale, disrupting the cross-frame mapping stability of the feature points. Therefore, in clinical practice the operator only needs to hold the device, aim it at the wound, and move it slowly left/right or up/down over 1 to 2 seconds while avoiding large rotations; this satisfies the requirement that the movement of the imaging device be primarily translation parallel to the image plane. The procedure is simple and requires no additional training, ensuring data quality while remaining practical in the clinic.
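The three motion types in [0023] can be made concrete with standard pinhole-camera relations. These are textbook geometry, not formulas stated in the patent; the sketch assumes focal length f in pixels, a scene point (X, Y, Z) in camera coordinates, principal point at the image origin, intrinsic matrix K, and homogeneous image points written with a tilde.

```latex
\begin{aligned}
\text{lateral translation } t_x:&\quad \Delta x = \frac{f\,(X - t_x)}{Z} - \frac{fX}{Z} = -\frac{f\,t_x}{Z} \\
\text{pure rotation } R:&\quad \tilde{\mathbf{x}}' \simeq K R K^{-1}\,\tilde{\mathbf{x}} \quad (\text{independent of } Z) \\
\text{forward motion } t_z:&\quad x' = \frac{fX}{Z - t_z} = x\,\frac{Z}{Z - t_z}
\end{aligned}
```

Under lateral translation the image shift scales with 1/Z, so a wound point lying above or below the label plane moves by a different amount than the label-plane homography predicts; rotation carries no such depth cue, and forward motion mainly rescales the image.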
[0024] A rigid planar label is a flat object with a known physical area and a regular geometric shape (such as a square). For example, it can be made of thin medical-grade stainless steel or rigid plastic sheet with a standard pattern (such as a checkerboard) etched or printed on the surface. When used, it is placed next to the wound and temporarily secured with medical tape.
[0025] In this invention, the rigid planar label serves three functions. First, its known physical area allows the number of pixels in the image to be converted into actual physical area, achieving scale calibration from image space to physical space. Second, the plane of the label serves as a zero-depth reference plane for measuring wound undulation, so that the deviation of the wound region from this plane, i.e., the wound depth undulation information, can subsequently be determined. Third, because the label is a rigid plane, its image changes across the video sequence strictly follow a homography transformation, so tracking the label allows the projective transformation from the reference frame to each frame, i.e., the homography matrix, to be calculated accurately.
[0026] It is important to emphasize that the label must remain in a rigid, flat state (i.e., it should not bend or deform) and must be attached to a healthy skin area adjacent to the wound to ensure that the plane on which the label is located is spatially close to the wound area.
[0027] It should be noted that the acquisition duration of the video sequence in this invention is typically 1 to 2 seconds. Wound healing takes days, and no visible change occurs within seconds, so within this very short acquisition time the morphology, location, and surface texture of the burn wound can be considered unchanged. This means that any change in the image positions of the target feature points detected in the reference frame across subsequent frames is due entirely to the change in viewing angle caused by the free movement of the imaging device, not to deformation of the wound itself. Furthermore, the first mask obtained by region division in the reference frame can be applied to subsequent frames either directly or after transformation, without repeated segmentation.
[0028] In this embodiment, a video sequence that meets preset acquisition conditions is acquired; wherein, the preset acquisition conditions include at least the following conditions: each frame of the video sequence completely contains the rigid planar label; during the acquisition of the video sequence, the wound area has no identifiable displacement relative to the bony landmark; the movement of the imaging device is mainly a displacement parallel to the image plane, and the displacement amplitude in the direction perpendicular to the image plane is smaller than that in the parallel direction.
[0029] Since bony landmarks are defined relative to the patient's own skeletal structure, rather than the imaging equipment, movement of the imaging equipment is permissible during video sequence acquisition. However, the wound should not move relative to the patient's body. In clinical settings, this generally corresponds to minimizing large-scale patient movement during video sequence acquisition.
[0030] It should be noted that the acquisition condition "the movement of the imaging device is mainly displacement parallel to the image plane, and the displacement amplitude in the direction perpendicular to the image plane is smaller than that in the parallel direction" constrains the movement mode of the imaging device in two respects. First, the movement should be mainly translational, that is, the device moves left-right or up-down. Second, the movement amplitude in the front-back direction (i.e., toward or away from the wound) should be smaller than the amplitude in the translational directions above.
[0031] S102: Select a frame from the sequence as a reference frame; divide the reference frame into regions to generate a first mask for the wound region and a second mask for the label; determine the unit physical area corresponding to a unit pixel based on the label image in the second mask and the known physical area of the label; detect target feature points within the first mask and track the actual image coordinates of the target feature points in each frame of the sequence.
[0032] In a preferred implementation, the first frame of the video sequence is selected as the reference frame.
[0033] It is understandable that choosing the first frame is not the only option. If the first frame is of poor quality due to motion blur, reflection, or occlusion, the frame with the best quality from the video sequence can be selected as the reference frame. Once the reference frame is selected, it is fixed in all subsequent calculations.
[0034] Because all subsequent geometric calculations (including target feature point detection, homography matrix calculation, and depth information extraction) must be performed within specific regions rather than over the entire image, the reference frame must be divided into regions to identify "where the wound is" and "where the label is." Specifically, a first mask generated by image segmentation defines the wound region for subsequent target feature point detection and area correction, and a second mask defines the label region for subsequent perspective correction and homography matrix calculation.
[0035] In this embodiment, image analysis is performed on the reference frame to identify pixel regions belonging to the wound area and pixel regions belonging to the rigid planar label. Based on the identification results, a first mask and a second mask with the same size as the reference frame are generated. The first mask is used to mark the pixel position of the wound, and the second mask is used to mark the pixel position of the label.
[0036] It should be noted that the purpose of dividing the reference frame into regions is to locate two key regions involved in subsequent processing: the wound region and the label region. This division can be achieved using conventional image segmentation techniques in the field, such as deep learning-based semantic segmentation models or traditional image segmentation algorithms. The specific implementation methods will not be described in detail in this embodiment.
[0037] It should be noted that the mask can be generated using conventional binary image generation methods in this field, which will not be described in detail in this embodiment.
[0038] It can be understood that the second mask is a binary image, in which the area with a pixel value of 1 is used to locate and extract the image content of the rigid planar label from the reference frame, as the label image.
[0039] It can be understood that the first mask is a binary image, where a pixel value of 1 indicates that the pixel location belongs to the wound area, and a pixel value of 0 indicates that the pixel location does not belong to the wound area.
[0040] Subsequent steps require the physical area of the wound region, but what can be obtained directly is only the number of pixels the wound occupies in the image, not how many square millimeters of real skin each pixel corresponds to. A scale relationship from pixel space to physical space must therefore be established, that is, the unit physical area (the physical area represented by one pixel) must be determined. The rigid planar label provides this relationship: its physical dimensions are known, so analyzing the number of pixels it occupies in the image gives the physical area corresponding to each pixel. This conversion from pixel space to physical space is the foundation of all subsequent area calculations, and its accuracy directly affects the accuracy of the final corrected unfolded area.
[0041] As shown in Figure 2, the process of determining the unit physical area includes: S102-1: Within the area covered by the second mask, detect the pixel vertex coordinates of the label's four vertices in the label image and refine them to sub-pixel precision; set the coordinates of the label's four standard vertices in the frontal view based on its known geometry; and calculate the perspective correction matrix from the correspondence between the four pixel vertex coordinates and the four standard vertex coordinates.
[0042] The four vertices of a label refer to the four corners of its geometric shape. For example, if the label is a square label, then its four vertices are the top left corner, top right corner, bottom right corner, and bottom left corner.
[0043] Pixel vertex coordinates are determined based on an image coordinate system with the top-left corner of the image as the origin, the positive X-axis pointing horizontally to the right, and the positive Y-axis pointing vertically downwards. The pixel vertex coordinates of a vertex can be represented in the form (x, y), where x represents the number of pixel columns from the vertex to the left edge of the image, and y represents the number of pixel rows from the vertex to the top edge of the image.
[0044] It should be noted that conventional corner detection algorithms in this field can be used to detect corners in the label image and obtain the pixel coordinates of its four vertices, such as Harris corner detection, Shi-Tomasi corner detection, etc., which will not be described in detail in this embodiment.
[0045] Corner detection algorithms typically locate only integer pixel positions, whereas in real images object edges and corners often fall between pixels. Optimization to the sub-pixel level therefore means further refining the initially detected integer vertex coordinates (e.g., (152, 284)) to fractional values (e.g., (152.34, 284.71)). Sub-pixel optimization is an existing image processing technique and can be applied directly in this embodiment. For example, it may first analyze the grayscale distribution within a small neighborhood around the corner and then use interpolation to estimate the precise corner position in continuous space, yielding sub-pixel corner coordinates.
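One simple interpolation scheme of the kind described in [0045] fits a parabola through the response at the integer peak and its two neighbors, separately along x and y, and takes the vertex as the refined position. This is an illustrative choice, not necessarily the method the patent intends; OpenCV's gradient-based `cv2.cornerSubPix` is a widely used alternative. The response map layout and function names below are assumptions.

```python
from typing import Sequence, Tuple


def parabolic_offset(left: float, centre: float, right: float) -> float:
    """Vertex offset of the parabola through samples at -1, 0, +1.

    The result is clamped to [-0.5, 0.5] so the refined point stays within
    half a pixel of the integer peak.
    """
    denom = left - 2.0 * centre + right
    if denom == 0.0:
        return 0.0  # flat neighbourhood: no refinement possible
    off = 0.5 * (left - right) / denom
    return max(-0.5, min(0.5, off))


def refine_subpixel(resp: Sequence[Sequence[float]], x: int, y: int) -> Tuple[float, float]:
    """Refine an integer peak (x, y) of a response map; (x, y) must not be on the border."""
    dx = parabolic_offset(resp[y][x - 1], resp[y][x], resp[y][x + 1])
    dy = parabolic_offset(resp[y - 1][x], resp[y][x], resp[y + 1][x])
    return x + dx, y + dy
```

For a response sampled from -(x - 1.3)² - (y - 0.8)² around the integer peak (1, 1), the sketch recovers (1.3, 0.8), because the fitted parabola matches the true quadratic exactly.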
[0046] The frontal view (also referred to as the normal or orthogonal view) is the view obtained when the optical axis of the imaging device is perpendicular to the plane of the label. In this view, the label appears in the image with its true geometric shape and without perspective distortion.
[0047] Since the geometry and physical dimensions of the label are known (e.g., a square with 50 mm sides), the ideal image position that the label should occupy in the frontal view can be set; that is, the coordinates of the four standard vertices are set according to the label's known dimensions. For example, if a 500×500 pixel frontal-view image is to be generated, the four standard vertices can be set to (0, 0), (499, 0), (499, 499), and (0, 499).
[0048] These standard vertex coordinates and the pixel vertex coordinates detected in the image form four one-to-one point pairs, which are used to solve the perspective correction matrix.
[0049] The perspective correction matrix is solved as follows. First, the standard vertex coordinates and the pixel vertex coordinates are paired in the same order (clockwise or counterclockwise) to form four coordinate pairs, so that each pair maps the same label vertex from the actual image to the frontal-view image. Next, following the basic principle of perspective transformation, a system of linear equations is constructed from the four coordinate pairs. The system is then solved by the least squares method to reduce the effect of small errors in the measured coordinates, yielding a 3×3 perspective correction matrix. This matrix maps the label image in the reference frame to a distortion-free standard geometric image in the frontal view.
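The linear system described above can be sketched with NumPy, assuming the perspective correction matrix is normalized so that its bottom-right entry is 1. With exactly four point pairs the eight equations determine the eight unknowns exactly, so least squares returns the exact solution; additional pairs would make the system overdetermined. The name `perspective_matrix` and the sample coordinates are hypothetical.

```python
import numpy as np


def perspective_matrix(src_pts, dst_pts) -> np.ndarray:
    """Solve the 3x3 matrix M (M[2, 2] fixed to 1) with dst ~ M @ [x, y, 1] per pair."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 4:
        raise ValueError("need at least four matching point pairs")
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (m0 x + m1 y + m2) / (m6 x + m7 y + 1), rearranged to be linear in m0..m7
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v = (m3 x + m4 y + m5) / (m6 x + m7 y + 1), rearranged the same way
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    m, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.append(m, 1.0).reshape(3, 3)
```

For example, passing four detected label corners as `src_pts` and `[(0, 0), (499, 0), (499, 499), (0, 499)]` as `dst_pts` gives a matrix that sends each detected corner onto the corresponding standard vertex of the 500×500 frontal view.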
[0050] S102-2: Using the perspective correction matrix, transform the label image within the second mask into a standard geometric image in the frontal view.
[0051] Because the perspective correction matrix is computed from the four label vertices and their standard vertices, the transformation maps the four vertices exactly onto the preset standard vertex coordinates. The label image, originally distorted by the tilted shooting angle, is thereby "straightened", eliminating the perspective distortion caused by the shooting angle.
[0052] For example, the label image in the second mask is transformed with the calculated perspective correction matrix as follows. The coordinates of each pixel in the label area of the second mask are converted into three-dimensional homogeneous coordinates (the first two components are the pixel's horizontal and vertical coordinates, and the third is the constant 1). These homogeneous coordinates are multiplied by the perspective correction matrix to obtain transformed homogeneous coordinates, which are then normalized to two-dimensional pixel coordinates, namely the coordinates of the pixel in the frontal view. Repeating this for every pixel in the label area of the second mask yields a standard geometric image free of perspective distortion in the frontal view, consistent with the label's actual shape and proportions.
[0053] Normalizing three-dimensional homogeneous coordinates to two-dimensional pixel coordinates is an existing technique in the art and is not described in detail in this embodiment. For example, the first two components (horizontal and vertical coordinates) of the transformed homogeneous coordinates are each divided by the third component; the two resulting values are the two-dimensional pixel coordinates of the corresponding pixel in the frontal-view standard geometric image.
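A minimal NumPy sketch of the homogeneous multiply-and-divide mapping, and of the pixel-by-pixel forward transformation in [0052], is shown below. Rounding mapped coordinates to the nearest pixel is an assumption; forward mapping of this kind can leave some output pixels unfilled, which is why many implementations instead iterate over output pixels and apply the inverse matrix. The function names are hypothetical.

```python
import numpy as np


def map_points(M: np.ndarray, pts) -> np.ndarray:
    """Map (x, y) points with a 3x3 matrix using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float).reshape(-1, 2)
    homo = np.hstack([pts, np.ones((len(pts), 1))])  # (x, y) -> (x, y, 1)
    out = homo @ M.T                                 # each row becomes M @ [x, y, 1]
    return out[:, :2] / out[:, 2:3]                  # divide by the third component


def forward_warp(image: np.ndarray, mask: np.ndarray, M: np.ndarray, out_shape) -> np.ndarray:
    """Copy each masked pixel of `image` to its mapped, rounded position in the output."""
    ys, xs = np.nonzero(mask)
    mapped = map_points(M, np.column_stack([xs, ys]))
    cols = np.rint(mapped[:, 0]).astype(int)
    rows = np.rint(mapped[:, 1]).astype(int)
    inside = (rows >= 0) & (rows < out_shape[0]) & (cols >= 0) & (cols < out_shape[1])
    out = np.zeros(out_shape, dtype=image.dtype)
    out[rows[inside], cols[inside]] = image[ys[inside], xs[inside]]
    return out
```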
[0054] S102-3: Perform binarization processing on the standard geometric image to generate a binary image corresponding to the label.
[0055] As an example, binarization is performed as follows: a preset grayscale threshold is set; pixels in the standard geometric image whose grayscale values are higher than the threshold are identified as the label area and assigned 1, and all other pixels are identified as the background area and assigned 0, producing a binary image corresponding to the label.
[0056] It should be noted that the specific value of the preset grayscale threshold is determined from the grayscale histogram of the standard geometric image. Usually, the grayscale value at the boundary between the label-area and background-area distributions in the histogram is selected as the preset grayscale threshold. For example, the preset grayscale threshold is 128 (grayscale value range 0 to 255).
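The text leaves the choice of the boundary value open. One common histogram-based choice, swapped in here as an assumption, is Otsu's method, which picks the threshold that maximizes the between-class variance. The sketch also applies the "higher than the threshold means label" rule from [0055], assuming an 8-bit image in which the label is brighter than the background; the function names are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the 8-bit threshold t that maximises the between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of class "<= t"
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean of class "<= t"
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    between[~np.isfinite(between)] = 0.0       # one class empty: not a valid split
    return int(np.argmax(between))


def binarize(gray: np.ndarray, threshold: int) -> np.ndarray:
    """Label pixels (brighter than the threshold) -> 1, background -> 0, as in [0055]."""
    return (gray > threshold).astype(np.uint8)
```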
[0057] S102-4: Count the number of pixels belonging to the label in the binary image corresponding to the label, and use this as the number of pixels; divide the known physical area of the label by the number of pixels to obtain the unit physical area represented by each pixel.
[0058] The number of pixels is the total number of pixels with value 1 in the binary image corresponding to the label.
[0059] Because the label is fully visible in every frame of the video sequence (as guaranteed by the preset acquisition conditions) and has been accurately marked in the second mask, the label area necessarily exists in the corrected standard geometric image and contains a nonzero number of pixels. The pixel count for the label therefore cannot be zero, and the division in S102-4 is always well defined.
[0060] The unit physical area is the actual physical area represented by each pixel of the corrected standard geometric image, in square millimeters per pixel. It serves as a conversion coefficient that maps image pixel space to real physical space: in subsequent calculations, only the number of pixels occupied by the wound area needs to be counted and multiplied by the unit physical area to obtain the projected area of the wound under the flatness assumption.
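The pixel counting and division in S102-4 reduce to a few lines. In this sketch (function names hypothetical), a 50 mm square label that fills a 500×500 frontal-view image gives 2500 mm² / 250000 pixels = 0.01 mm² per pixel; the wound pixel count is assumed to be measured at the same scale as the corrected label image.

```python
import numpy as np


def unit_physical_area(binary_label: np.ndarray, label_area_mm2: float) -> float:
    """Physical area (mm^2) represented by one pixel of the corrected label image."""
    n_pixels = int(np.count_nonzero(binary_label == 1))
    if n_pixels == 0:
        raise ValueError("no label pixels found")  # excluded by the acquisition conditions
    return label_area_mm2 / n_pixels


def projected_wound_area(wound_pixel_count: int, unit_area_mm2: float) -> float:
    """Projected wound area (mm^2) under the flatness assumption."""
    return wound_pixel_count * unit_area_mm2
```

With these numbers, a wound occupying 12000 pixels would have a projected area of about 120 mm².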
[0061] Because the first mask accurately marks the wound area, detecting target feature points only within it avoids interference from irrelevant background pixels, ensures that every detected feature point belongs to the wound, and makes the target feature points more effective and specific. Tracking the actual image coordinates of the target feature points in each frame of the video sequence captures their positional shifts as the imaging device moves and the wound morphology changes, providing reliable coordinate data for the subsequent calculation of the homography matrix and construction of the wound depth undulation distribution field.
[0062] In this embodiment, multiple target feature points with distinct image texture are detected within the area covered by the first mask of the reference frame, and their coordinates are refined to sub-pixel precision. For each frame of the video sequence other than the reference frame, a fixed-size image block centered on the target feature point's position in the previous frame is extracted from the previous frame as the reference image block to be matched, and a local search window is set around the same position. Within the local search window, candidate image blocks of the same size as the reference image block are extracted in turn from left to right and from top to bottom; the candidate most similar to the reference image block is selected as the target image block, and the center of the target image block is taken as the actual image coordinates of the target feature point in the current frame.
[0063] In this method, target feature points are detected only once, in the reference frame; in subsequent frames they are only tracked, not re-detected.
[0064] To detect target feature points accurately, as an example, the following is computed for each pixel location within the area covered by the first mask: the absolute grayscale difference between the pixel and its right-hand neighbor, as the degree of grayscale change in the horizontal direction; the absolute grayscale difference between the pixel and its lower neighbor, as the degree of grayscale change in the vertical direction; and the absolute grayscale difference between the pixel and its lower-right neighbor, as the degree of grayscale change in the diagonal direction. The largest of these three values is taken as the response value of the pixel location. Pixel locations whose response values exceed a preset threshold are selected as candidate feature points; the Euclidean distances between neighboring candidate feature points are calculated, and candidate feature points closer than a preset distance threshold are removed to obtain the final target feature points.
[0065] To ensure that the retained target feature points still have high texture saliency, candidate feature points closer than the preset distance threshold are removed, as an example, as follows: all candidate feature points are sorted in descending order of response value; the sorted candidates are traversed in turn, the remaining candidate with the largest response value is retained and recorded as a target point, and all other candidate feature points within the preset distance threshold of that target point are removed; this process is repeated until all candidate feature points have been processed, yielding the final target feature points.
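The response computation in [0064] and the greedy suppression in [0065] can be sketched as follows. Keeping a candidate only if it is at least `min_dist` from every previously kept, higher-response point is equivalent to the retain-and-remove loop described above. Responses are left at zero in the last row and column, where the right, lower, or lower-right neighbor does not exist; this border handling and the function name are assumptions.

```python
import numpy as np


def detect_feature_points(gray, mask, response_thresh, min_dist):
    """Return (x, y) feature points inside `mask`, strongest first."""
    g = np.asarray(gray, dtype=float)
    resp = np.zeros_like(g)
    base = g[:-1, :-1]
    horizontal = np.abs(g[:-1, 1:] - base)   # right-hand neighbour
    vertical = np.abs(g[1:, :-1] - base)     # lower neighbour
    diagonal = np.abs(g[1:, 1:] - base)      # lower-right neighbour
    resp[:-1, :-1] = np.maximum(np.maximum(horizontal, vertical), diagonal)
    resp[~np.asarray(mask, dtype=bool)] = 0.0          # only pixels covered by the mask
    ys, xs = np.nonzero(resp > response_thresh)        # candidate feature points
    order = np.argsort(-resp[ys, xs], kind="stable")   # descending response
    kept = []
    for i in order:
        x, y = int(xs[i]), int(ys[i])
        # Keep the candidate only if no stronger, already-kept point is closer than min_dist.
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_dist ** 2 for kx, ky in kept):
            kept.append((x, y))
    return kept
```

The threshold must match the grayscale scale of `gray` (for example 0.01 for values normalized to 0 to 1, per [0068]).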
[0066] Since the position of each pixel is uniquely determined by its column and row coordinates (for example, a pixel in the 100th column and 200th row of the image can be written as (100, 200)), the terms "pixel position", "pixel point", and "pixel coordinates" have the same meaning in the description of this invention and all refer to a single pixel with specific coordinates in the image.
[0067] The response value is a quantitative indicator of whether a pixel location is suitable as a target feature point. A larger response value indicates a more pronounced grayscale change at that pixel (in the strongest of the horizontal, vertical, and diagonal directions), i.e., more significant texture, making the pixel more suitable as a target feature point.
[0068] The specific value of the preset threshold is determined by statistical analysis of historical data; this embodiment does not impose a specific limitation. For example, the value can be selected as follows: collect a batch of representative historical wound images, manually mark texture points suitable for tracking, analyze the distribution of their response values, and choose a value that effectively separates textured areas from flat areas. For example, when the image grayscale values (originally in the range 0 to 255) are normalized to the range 0 to 1, the preset threshold can be set to 0.01.
[0069] Neighboring candidate feature points are candidate feature points that are spatially close to each other in the image.
[0070] It should be noted that calculating Euclidean distance is a common technique, and this embodiment will not elaborate on it further. For example, the specific calculation process is: the square root of the sum of the squares of the differences in the horizontal and vertical coordinates.
[0071] The specific value of the preset distance threshold is determined based on engineering experience; this embodiment does not impose a specific limitation. When choosing it, note that if two feature points are too close, the local search windows used in subsequent tracking overlap heavily, and the two points cannot provide independent depth information. A typical value for the preset distance threshold is 5 to 10 pixels.
[0072] A reference image patch is a small square region extracted from an image, centered on the target feature point.
[0073] For example, if the position of a target feature point in the previous frame is the pixel coordinate (152, 284), i.e., column 152 and row 284, and the fixed size is 11×11 pixels, the block extends 5 pixels up, down, left, and right of the target feature point: all pixels from row 279 to row 289 and from column 147 to column 157 are extracted to form a square image block of 11 rows × 11 columns.
[0074] The local search window is the area within which the same target feature point is expected to appear in the current frame. Its size is determined based on engineering experience; this embodiment does not impose a specific limitation. For example, if the feature point's position in the previous frame is (152, 284) and the search window size is 31×31 pixels, the search range is the rectangle from row 269 to row 299 and from column 137 to column 167, centered on the target feature point and extending 15 pixels up, down, left, and right.
[0075] Because movement of the imaging device displaces the feature points, the local search window is usually larger than the reference image block so that such displacements can be accommodated.
[0076] To find the target image patch most similar to the reference image patch, as an example, candidate image patches of the same size as the reference image patch are extracted in turn within the local search window, from left to right and from top to bottom. For each candidate image patch, the absolute difference between the grayscale values of each pair of corresponding pixels in the candidate and reference patches is calculated as the grayscale difference, and the grayscale differences of all pixels are summed to obtain an accumulated value measuring how much the candidate patch differs from the reference patch. The candidate image patch with the smallest accumulated value is selected as the target image patch most similar to the reference image patch.
[0077] Corresponding pixels between a candidate image patch and a reference image patch refer to pairs of pixels in the two image patches that are at the same relative position. For example, if both image patches are 11×11 pixels, then the pixels located in the 3rd row and 4th column of each image patch constitute a pair of corresponding pixels.
[0078] Since the accumulated value measures the overall difference in grayscale values of corresponding pixels between a candidate image patch and a reference image patch, a smaller accumulated value means that the grayscale values of the two image patches at each pixel position are closer, i.e., their texture features are more similar. The reference image patch reflects the real texture around the target feature point in the previous frame. Therefore, selecting the candidate image patch with the smallest accumulated value means selecting the location of the candidate image patch that is most similar to the reference image patch, i.e., the location where the target feature point is most likely to appear in the current frame.
[0079] For example, assuming a target image block is 11×11 pixels in size, its center point is located in the 6th row and 6th column of the block. The pixel coordinates of this center point are the actual image coordinates of the target feature point in the current frame.
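A minimal sketch of the sum-of-absolute-differences block matching in [0076], using the 11×11 block and 31×31 window from the examples. Here candidate blocks are kept entirely inside the search window, which limits candidate centers to ±10 pixels of the previous position; whether candidates may extend beyond the window is not specified in the text, so this is an assumption, as are the integer input coordinates and the function name `track_point`.

```python
import numpy as np


def track_point(prev_gray, cur_gray, x, y, patch=11, window=31):
    """Track integer point (x, y) from prev_gray to cur_gray by minimum-SAD block matching."""
    r, s = patch // 2, window // 2
    prev = np.asarray(prev_gray, dtype=float)
    cur = np.asarray(cur_gray, dtype=float)
    h, w = cur.shape
    if not (r <= y < prev.shape[0] - r and r <= x < prev.shape[1] - r):
        raise ValueError("reference block would extend beyond the previous frame")
    ref = prev[y - r:y + r + 1, x - r:x + r + 1]  # reference block from the previous frame
    best_sad, best_xy = np.inf, (x, y)
    # Candidate centres keep the whole candidate block inside the search window.
    for cy in range(y - s + r, y + s - r + 1):        # top to bottom
        for cx in range(x - s + r, x + s - r + 1):    # left to right
            if cy - r < 0 or cx - r < 0 or cy + r >= h or cx + r >= w:
                continue  # candidate block would leave the current frame
            cand = cur[cy - r:cy + r + 1, cx - r:cx + r + 1]
            sad = np.abs(cand - ref).sum()  # accumulated grayscale difference
            if sad < best_sad:
                best_sad, best_xy = sad, (cx, cy)
    return best_xy
```

The result is an integer position; the sub-pixel refinement mentioned in [0062] would be a separate step.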
[0080] S103: Solve the homography matrix from the reference frame to each frame in the sequence; based on the homography matrix, map the coordinates of each target feature point in the reference frame to each frame to obtain the planar projection coordinates of each target feature point in each frame; combine the planar projection coordinates with the actual image coordinates to generate the depth undulation distribution field covering the wound area; determine the corrected unfolded area of the wound area according to the spatial change rate of the depth undulation distribution field and the unit physical area.
[0081] Because the images of a rigid planar label viewed from different viewpoints are related exactly by a homography, this method can transfer the geometric relationships established in the reference frame (such as the target feature points and the mask regions) accurately to every frame of the video sequence; for this purpose, the homography matrix from the reference frame to each frame must be calculated. The homography matrix serves two functions: first, it predicts the theoretical position (i.e., the planar projection coordinates) at which a target feature point would appear in the current frame if it lay on the plane of the label; second, it maps the first mask of the reference frame to subsequent frames, keeping the wound area consistent throughout the video sequence and avoiding repeated image segmentation of each frame.
[0082] The homography transformation law follows from fundamental results of projective geometry. The reasoning is as follows: the label is a spatial plane, and imaging this plane from different viewpoints can be described as a central projection between two planes. For example, let $I_1$ be the image of the spatial plane $S$ from the first viewpoint and $I_2$ its image from the second viewpoint. Then, for any point $P$ on $S$, its image point $p_1 = (x_1, y_1)$ in $I_1$ and its image point $p_2 = (x_2, y_2)$ in $I_2$ satisfy a fixed 3×3 matrix relationship, $\lambda\,(x_2, y_2, 1)^{\top} = H\,(x_1, y_1, 1)^{\top}$, where $\lambda$ is a nonzero scale factor and $H$ is the homography matrix. This relationship holds regardless of whether the intrinsic parameters of the imaging device are known and regardless of how the imaging device moves; as long as the label remains a rigid plane and is fully visible in both images, it holds exactly, in accordance with the homography transformation law.
[0083] The homography matrix is solved as shown in Figure 3, which includes: S103-1: Within the area covered by the second mask of the reference frame, detect multiple reference feature points with distinct image texture and refine their coordinates to sub-pixel precision.
[0084] Reference feature points are specific pixel locations on the surface of a rigid planar label that possess distinct image texture. Similar to target feature points, each reference feature point corresponds to a specific pixel coordinate (x, y) in the image, and its coordinate values can also be optimized to the sub-pixel level. The essential difference between reference and target feature points lies only in their spatial location: target feature points are located within the wound area defined by the first mask, used to characterize the depth undulations of the wound surface; while reference feature points are located within the label area defined by the second mask, used to solve for the homography matrix of the reference plane.
[0085] In the description of this invention, terms such as "reference feature point", "pixel position of reference feature point", and "coordinates of reference feature point" all refer to the same concept, namely, the specific pixel point selected as the tracking target on the surface of the label and its position in the image coordinate system.
[0086] It should be noted that the basic principle of detecting reference feature points within the label area covered by the second mask of the reference frame is the same as that of target feature point detection. Both employ a corner detection method based on grayscale change analysis, selecting locations with obvious image texture as feature points and optimizing their coordinates to the sub-pixel level. The only difference between detecting reference and target feature points lies in the detection area: target feature points are detected within the wound area of the first mask, while reference feature points are detected within the label area of the second mask.
[0087] S103-2: For each frame in the video sequence other than the reference frame, track the reference image coordinates of each reference feature point in the current frame.
[0088] It should be noted that the basic principle and implementation of tracking the reference image coordinates of reference feature points in subsequent frames are exactly the same as those of tracking target feature points. That is, both adopt the local search window method based on reference image patch matching. The only difference is the tracking object: the target feature point is located in the wound area, and its tracking result is used for subsequent depth information extraction; the reference feature point is located in the label area, and its tracking result is used to solve the homography matrix.
[0089] S103-3: Calculate the homography matrix from the reference frame to the current frame based on the correspondence between the coordinates of the reference feature points in the reference frame and their reference image coordinates in the current frame.
[0090] For example, solving the homography matrix from the correspondence between the coordinates of the reference feature points in the reference frame and their reference image coordinates in the current frame amounts to solving for a 3×3 transformation matrix, in the following steps. First, for all tracked reference feature points, establish a one-to-one correspondence between their coordinates in the reference frame and their reference image coordinates in the current frame, ensuring that each pair gives the positions of the same reference feature point in the two frames and that no pairs are mismatched. Next, based on the mathematical principle of perspective transformation, each coordinate pair yields a set of linear equations, and multiple pairs (at least four) together form a system of linear equations. Finally, the system is solved by the least squares method, which reduces the effect of minor coordinate measurement errors, yielding a 3×3 homography matrix.
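The least-squares solution in [0090] is commonly implemented as the normalized direct linear transform (DLT): coordinates are translated and scaled for numerical conditioning, each correspondence contributes two rows of a homogeneous system A h = 0, and the solution minimizing the residual under a unit-norm constraint is the right singular vector of the smallest singular value. This specific formulation is an assumption; the text only requires least squares over at least four pairs. The function names are hypothetical.

```python
import numpy as np


def _normalize(pts: np.ndarray):
    """Translate points to their centroid and scale them to mean distance sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.sqrt(((pts - centroid) ** 2).sum(axis=1)).mean()
    scale = np.sqrt(2.0) / mean_dist
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homo = np.hstack([pts, np.ones((len(pts), 1))]) @ T.T
    return homo, T


def homography_dlt(src_pts, dst_pts) -> np.ndarray:
    """Estimate H with dst ~ H @ [x, y, 1] from at least four correspondences."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    if src.shape != dst.shape or len(src) < 4:
        raise ValueError("need at least four matching point pairs")
    s, Ts = _normalize(src)
    d, Td = _normalize(dst)
    A = []
    for (x, y, _), (u, v, _) in zip(s, d):
        A.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        A.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    Hn = Vt[-1].reshape(3, 3)           # right singular vector of the smallest singular value
    H = np.linalg.inv(Td) @ Hn @ Ts     # undo the normalisation of both point sets
    return H / H[2, 2]
```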
[0091] Since the homography matrix precisely describes the projective transformation of the rigid planar label from the reference-frame viewpoint to the current-frame viewpoint, and the plane containing the label is defined as the reference plane of this invention, the homography matrix can predict the theoretical position in the current frame of any spatial point lying on the reference plane. Specifically, for a target feature point within the wound area, if it is assumed to lie at the height of the reference plane (i.e., on the same plane as the label), its theoretical position in the current frame, namely its planar projection coordinates, is obtained by multiplying the homogeneous coordinates of the target feature point in the reference frame by the homography matrix and normalizing the result.
[0092] The process of obtaining planar projection coordinates specifically includes the following steps: 1) For each target feature point detected in the first mask, obtain its coordinates in the reference frame; for each frame in the video sequence other than the reference frame, obtain the homography matrix from the reference frame to the current frame.
[0093] 2) Represent the two-dimensional coordinates of the target feature point in the reference frame as three-dimensional homogeneous coordinates, where the first two values are the x- and y-coordinates of the target feature point and the third value is a first fixed constant; multiply these homogeneous coordinates by the homography matrix from the reference frame to the current frame to obtain a new three-component result.
[0094] It should be noted that the first fixed constant is typically set to 1, so that the three values form the standard homogeneous representation of the point (x, y).
[0095] 3) Normalize the calculation result into two-dimensional coordinates in the image coordinate system of the current frame, which serve as the planar projection coordinates of the target feature point in the current frame; the planar projection coordinates represent the theoretical position at which the target feature point would appear in the current frame if it lay on the plane of the label.
[0096] It should be noted that since the calculation result is a three-dimensional homogeneous coordinate, the specific method for normalizing the calculation result is the same as the method for normalizing the three-dimensional homogeneous coordinate in step S102, and will not be repeated in this embodiment.
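Steps 1) to 3), together with the Euclidean distance to the tracked position that [0097] uses as the original deviation value, can be sketched as follows with NumPy. The function names are hypothetical, and the first fixed constant is taken as 1.

```python
import numpy as np


def planar_projection(H: np.ndarray, ref_pts) -> np.ndarray:
    """Map reference-frame points into the current frame, assuming they lie on the label plane."""
    pts = np.asarray(ref_pts, dtype=float).reshape(-1, 2)
    homo = np.hstack([pts, np.ones((len(pts), 1))])  # first fixed constant = 1
    proj = homo @ H.T                                # step 2): multiply by H
    return proj[:, :2] / proj[:, 2:3]                # step 3): normalise


def original_deviation(H: np.ndarray, ref_pts, actual_pts) -> np.ndarray:
    """Euclidean distance between planar projection and tracked (actual) coordinates."""
    actual = np.asarray(actual_pts, dtype=float).reshape(-1, 2)
    return np.linalg.norm(planar_projection(H, ref_pts) - actual, axis=1)
```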
[0097] To generate a complete depth fluctuation distribution field covering the wound area, as an example: for each tracked target feature point, obtain its planar projection coordinates and actual image coordinates in each frame and calculate the Euclidean distance between them as the original deviation value. For each frame, decompose the homography matrix of that frame to obtain the camera translation vector, and take the magnitude of this vector as the frame's motion amplitude factor. Divide the original deviation value of each target feature point in the frame by the frame's motion amplitude factor to obtain the normalized local depth deviation value. Take the arithmetic mean of the local depth deviation values of the same target feature point across all frames as that point's depth fluctuation index. From the depth fluctuation indices of all target feature points and their pixel positions in the reference frame, estimate the depth fluctuation index at every pixel position in the wound area by spatial interpolation, i.e., estimate the index at unknown pixel positions from the known indices of the target feature points, with nearer points having greater influence. The depth fluctuation indices of all pixel positions together form a two-dimensional array of the same size as the reference frame, which serves as the depth fluctuation distribution field covering the wound area.
[0098] Because this invention acquires video in a free-moving state, the amount of camera translation varies randomly from frame to frame. If the original Euclidean distance were used directly as the local depth deviation value of the target feature point, that value would depend on both the wound depth and the camera's translation amplitude, and the depth fluctuation index computed from it would lose objectivity. Therefore, this invention decomposes the homography matrix of each frame to obtain the camera translation vector, takes its magnitude as the frame's motion amplitude factor, and divides the original Euclidean distance by this factor to obtain a normalized local depth deviation value, thereby eliminating the influence of camera motion amplitude on the depth measurement.
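Given the original deviation values of every tracked point in every non-reference frame and each frame's motion amplitude factor, the normalization and averaging can be sketched as below. The array layout (frames × points) and the function name are assumptions; the reference frame itself is excluded because its translation relative to itself is zero.

```python
import numpy as np


def depth_fluctuation_index(original_dev, motion_factor) -> np.ndarray:
    """Per-point depth fluctuation index.

    original_dev:  (n_frames, n_points) original deviation values, reference frame excluded
    motion_factor: (n_frames,) magnitude of the decomposed camera translation per frame
    """
    dev = np.asarray(original_dev, dtype=float)
    m = np.asarray(motion_factor, dtype=float)
    if dev.ndim != 2 or m.shape != (dev.shape[0],):
        raise ValueError("expected (n_frames, n_points) deviations and one factor per frame")
    if np.any(m <= 0):
        raise ValueError("motion amplitude factors must be positive")
    local = dev / m[:, None]      # normalised local depth deviation values
    return local.mean(axis=0)     # arithmetic mean over frames for each point
```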
[0099] It should be noted that the camera translation vector can be obtained with conventional homography decomposition algorithms in this field. The underlying principle is that the homography matrix describes the projective transformation of a rigid plane between two viewpoints, which can be expressed as $H = K\left(R + \frac{t\,n^{\top}}{d}\right)K^{-1}$, where $K$ is the camera intrinsic parameter matrix, $R$ is the rotation matrix, $t$ is the translation vector, $d$ is the distance to the reference plane, $n$ is the plane normal vector, and $n^{\top}$ is its transpose. Through singular value decomposition or iterative optimization, the rotation component $R$ and the translation component $t$ can be separated from $H$, and the magnitude of the decomposed translation vector $t$ is taken as the camera motion amplitude factor. Because the decomposition recovers the translation only up to the scale $d$ (i.e., as $t/d$), and $d$ is the same for all frames when every homography starts from the reference frame, the resulting motion amplitude factors differ from $\lVert t \rVert$ only by a common constant, which does not affect the relative normalization between frames. This decomposition is a standard technique in computer vision, and its implementation details are not elaborated in this embodiment.
[0100] The standard relationship between the homography matrix H and the camera intrinsic parameter matrix, rotation matrix, and translation vector given above comes from existing computer-vision technology and can be applied directly in this embodiment.
[0101] The magnitude of the translation vector reflects how far the camera has translated in the current frame relative to the reference frame.
[0102] The camera intrinsic parameter matrix is a parameter matrix that describes the internal geometric characteristics of the imaging device. The camera intrinsic parameter matrix is a standard concept and existing technology in the field of computer vision, and this embodiment can directly apply it.
[0103] The local depth deviation value reflects how far a target feature point deviates from the reference plane (i.e., the plane of the label) in a given frame. If the target feature point lies exactly at the height of the reference plane, its actual image coordinates coincide with its planar projection coordinates, and the local depth deviation value is zero. If the target feature point lies above the reference plane (the wound is raised), it moves faster in the image than points on the reference plane when the imaging device translates (motion parallax), so its actual image coordinates no longer coincide with its planar projection coordinates; a point below the reference plane deviates likewise, in the opposite direction. The magnitude of the local depth deviation value is therefore positively correlated with the height difference between the target feature point and the reference plane.
[0104] Specifically, a larger local depth deviation value of a target feature point in a frame indicates that the point deviates further from the reference plane in that frame, i.e., the wound surface it represents is more undulating; a smaller value indicates that the point is closer to the height of the reference plane, i.e., the wound surface it represents is flatter.
[0105] It should be noted that outlier screening can be performed before calculating the depth fluctuation index: all local depth deviation values of each target feature point are statistically analyzed, and outlier deviation values that exceed the range of "mean ± 3 times standard deviation" are removed; if the number of local depth deviation values after removal is less than 50% of the total number of frames in the video sequence, the corresponding target feature point is determined to have failed tracking and is not included in the depth fluctuation index calculation.
[0106] Because individual frames are still affected by illumination fluctuations, sensor noise, and minor motion blur, the local depth deviation value may fluctuate randomly from frame to frame. To suppress this random noise and minor motion error, the arithmetic mean of the local depth deviation values of the same target feature point across all frames is calculated, so that the resulting depth fluctuation index more faithfully reflects the true undulation of the wound at that target feature point.
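The outlier screening in [0105] and the averaging in [0106] can be sketched together as follows. This is an illustration under stated assumptions: the standard deviation is the population value (NumPy's default `ddof=0`), non-finite values are treated as frames with no usable measurement, and the function name is hypothetical.

```python
import numpy as np

def depth_fluctuation_index(deviations, n_frames, k=3.0, min_ratio=0.5):
    """Mean local depth deviation after 'mean ± k·std' screening, or None on tracking failure."""
    values = np.asarray(deviations, dtype=float)
    values = values[np.isfinite(values)]          # drop frames with no usable value
    if values.size == 0:
        return None
    mean, std = values.mean(), values.std()       # population std (ddof=0), an assumption
    kept = values[np.abs(values - mean) <= k * std]
    if kept.size < min_ratio * n_frames:          # fewer than 50% of frames survive
        return None                               # treated as a tracking failure
    return float(kept.mean())                     # arithmetic mean over retained frames

print(depth_fluctuation_index([1.0] * 19 + [50.0], n_frames=20))  # 1.0
print(depth_fluctuation_index([1.0] * 5, n_frames=20))            # None
```

In the first call the single value of 50.0 lies more than three standard deviations from the mean and is discarded; in the second, only 5 of 20 frames carry a value, so the point is treated as having failed tracking.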
[0107] The depth fluctuation index represents the average degree of undulation of a target feature point relative to the reference plane. A larger depth fluctuation index indicates a higher elevation or deeper depression of the wound surface at that point, i.e., a more drastic change in local depth. Conversely, a smaller depth fluctuation index indicates that the wound surface at that point is closer to the height of the reference plane, i.e., flatter.
[0108] For example, the depth fluctuation index of each pixel in the wound area can be estimated by spatial interpolation as follows: the pixel positions of all target feature points in the reference frame and their depth fluctuation indices are taken as known samples; an interpolation method such as nearest-neighbor or bilinear interpolation is applied under the principle that "the closer a target feature point is, the greater its influence on an unknown pixel", so that the depth fluctuation index of each unknown pixel is obtained from distance weights between that pixel and the surrounding known target feature points. This yields the depth fluctuation index of every pixel in the wound area and forms a complete depth undulation distribution field.
[0109] The distance weights and the fitted depth fluctuation index are calculated as follows. First, several known target feature points around the unknown pixel are selected (typically the 4 to 8 points nearest in Euclidean distance), and the Euclidean distance between the unknown pixel and each of them is calculated. Next, the weight of each selected point is calculated by the inverse distance principle: the closer the point is to the unknown pixel, the larger its weight, and the farther away, the smaller its weight, with the weights of all selected points summing to 1. Finally, the depth fluctuation index of each selected point is multiplied by its weight and the products are summed; this sum is the fitted depth fluctuation index of the unknown pixel.
[0110] Because inverse distance weighted interpolation is used, an unknown pixel whose location coincides exactly with the coordinates of a target feature point has a Euclidean distance of zero to that point, which makes the inverse-distance weight undefined and the calculation numerically unstable. In that case, the depth fluctuation index of the target feature point is assigned directly to the unknown pixel without weighted calculation. If the unknown pixel coincides with multiple target feature points, the depth fluctuation index of any one of them may be used, or the arithmetic mean of their depth fluctuation indices may be taken. For all other unknown pixels, weights are still calculated by the inverse distance principle, and the weighted sum gives the depth fluctuation index of the unknown pixel.
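The inverse distance weighting described in [0108] to [0110] can be sketched as below. This is a direct, unoptimized illustration under assumptions not fixed by the text: weights are 1/distance (power 1), the `k` nearest feature points are used, coincident points are averaged, points are given as (x, y) = (column, row), and pixels outside the wound mask are left at zero. The function name is hypothetical.

```python
import numpy as np

def idw_field(shape, points, values, mask, k=6, power=1.0):
    """Fill a depth undulation field over `mask` by inverse distance weighting."""
    pts = np.asarray(points, dtype=float)     # (N, 2) feature points as (x, y)
    vals = np.asarray(values, dtype=float)    # depth fluctuation index per point
    field = np.zeros(shape, dtype=float)
    k = min(k, len(pts))
    rows, cols = np.nonzero(mask)
    for r, c in zip(rows, cols):
        dist = np.hypot(pts[:, 0] - c, pts[:, 1] - r)
        coincident = dist == 0
        if coincident.any():
            # Zero distance: assign directly (mean of all coincident points).
            field[r, c] = vals[coincident].mean()
            continue
        nearest = np.argsort(dist)[:k]        # k nearest known feature points
        w = 1.0 / dist[nearest] ** power      # closer points get larger weights
        w /= w.sum()                          # normalize so the weights sum to 1
        field[r, c] = float(np.dot(w, vals[nearest]))
    return field

f = idw_field((1, 3), [(0.0, 0.0), (2.0, 0.0)], [1.0, 3.0], np.ones((1, 3), bool), k=2)
print(f)  # [[1. 2. 3.]]
```

The middle pixel is equidistant from both feature points, so each receives a weight of 0.5. For full-resolution frames, a spatial index such as a k-d tree would replace the per-pixel distance scan.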
[0111] The depth undulation distribution field is a two-dimensional array of the same size as the reference frame. For example, if the reference frame is 1920×1080 pixels (width × height), the depth undulation distribution field is a matrix of 1080 rows × 1920 columns. Each element corresponds to a pixel position in the reference frame, and its value is the depth fluctuation index at that pixel position.
[0112] When calculating the depth fluctuation index, this invention treats the plane in which the label lies as the zero-depth reference plane. However, there may be a fixed height difference between the label and the wound surface; for example, the label may be attached to healthy skin while the wound surface is raised or lowered by swelling. To prevent this systematic bias from entering the depth fluctuation index and degrading the accuracy of the subsequently obtained corrected unfolded area, a calibration step is added after the depth undulation distribution field is generated: the edge pixel region where the wound edge meets healthy skin is selected (this region typically has no significant undulation); the arithmetic mean of the depth fluctuation indices of all pixels in this region is taken as the systematic height bias; and this bias is subtracted from the depth fluctuation index of every pixel, eliminating the influence of the fixed height difference between the label and the wound surface.
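A minimal sketch of the bias calibration in [0112]. The text does not define how wide the "edge pixel region" is; here it is assumed, for illustration only, to be the one-pixel ring of wound-mask pixels that have at least one 4-connected neighbour outside the mask. The function name is hypothetical.

```python
import numpy as np

def remove_systematic_bias(field, wound_mask):
    """Subtract the mean depth fluctuation index of the wound's edge ring.

    Assumption: the edge region is the set of wound-mask pixels with at least
    one 4-connected neighbour outside the mask (a one-pixel boundary ring).
    """
    field = np.asarray(field, dtype=float)
    mask = np.asarray(wound_mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior if its up, down, left and right neighbours are all in the mask.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    edge = mask & ~interior
    if not edge.any():
        return field.copy(), 0.0               # nothing to calibrate against
    bias = float(field[edge].mean())          # systematic height bias
    return field - bias, bias                 # subtracted from every pixel

field = np.full((5, 5), 2.0)
field[2, 2] = 5.0
corrected, bias = remove_systematic_bias(field, np.ones((5, 5), dtype=bool))
print(bias, corrected[2, 2])  # 2.0 3.0
```

With a mask covering the whole 5×5 array, the edge ring is the outer border, whose values are all 2.0; subtracting that bias leaves the raised centre at 3.0 and the border at 0.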
[0113] It should be noted that at some pixel locations in the depth undulation distribution field the depth fluctuation index is directly set to zero. These locations mainly correspond to flat areas within the wound where the degree of undulation is negligible and the depth does not change significantly.
[0114] The process of determining the corrected unfolded area specifically includes the following steps: 1) For each pixel position within the first mask of the wound area, calculate the horizontal and vertical partial derivatives of the depth undulation distribution field at that pixel position.
[0115] It should be noted that the horizontal and vertical partial derivatives at a pixel position are usually calculated with gradient operators. This calculation is disclosed in the prior art and is not described in detail in this embodiment.
[0116] The horizontal and vertical partial derivatives characterize the rates of change of depth in the horizontal and vertical directions, respectively.
[0117] 2) Squaring the partial derivative in the horizontal direction yields the squared value in the horizontal direction; squaring the partial derivative in the vertical direction yields the squared value in the vertical direction; adding the squared values in the horizontal and vertical directions gives the spatial rate of change at the pixel location.
[0118] The spatial rate of change reflects the local steepness of the wound surface at a given pixel location. A larger spatial rate of change at a given pixel location indicates a steeper wound surface, meaning more dramatic topographic relief (e.g., the edge of eschar or the raised area of granulation tissue); conversely, a smaller spatial rate of change at a given pixel location indicates a smoother wound surface, meaning gentler topographic relief (e.g., a flat eschar surface or a smooth area in the healing process).
[0119] 3) Multiply the spatial rate of change by the pre-calibrated mapping coefficient to obtain the intermediate value; calculate the sum of the intermediate value and the second fixed constant as the sum value, and perform a square root operation on the sum value to obtain the magnification factor.
[0120] It should be noted that the specific value of the pre-calibrated mapping coefficient can be determined through calibration experiments, as follows: using the imaging device to be calibrated, a standard curved object with a known actual undulation height (such as a calibration plate with a known ripple height) is imaged at a fixed shooting distance, and its actual unfolded area is calculated. When the corrected unfolded area of the object is then calculated according to the method of this invention, the mapping coefficient is treated as the variable to be optimized and is adjusted iteratively to minimize the error between the calculated corrected unfolded area and the actual unfolded area. The value at which the two areas are equal, or at which the error is smallest, is taken as the optimal pre-calibrated mapping coefficient. For example, for a standard mobile phone lens with a field of view of about 60 to 70 degrees at a typical shooting distance of 20 to 40 cm, the pre-calibrated mapping coefficient usually lies between 1.0 and 5.0, such as 2.0.
[0121] It should be noted that the second fixed constant is usually set to 1. This keeps the value inside the square root positive, avoiding errors in the square root calculation, and makes the magnification factor exactly 1 wherever the surface is flat.
[0122] 4) Multiply the unit physical area by the magnification factor to obtain the local area after pixel position correction.
[0123] Since the spatial rate of change reflects the local steepness of the wound surface at a pixel location, the magnification factor geometrically measures how much the surface area is expanded relative to the projected area (i.e., the area obtained by projecting the wound surface perpendicularly onto a plane perpendicular to the optical axis of the device). If the wound surface were completely flat, the spatial rate of change would be zero, the magnification factor would be 1 (with the second fixed constant set to 1), and the actual surface area would equal the projected area.
[0124] The larger the magnification factor of the wound area at a pixel location, the steeper the local slope and the more severe the surface undulation at that location, and the greater the area increase required for compensation. Conversely, the closer the magnification factor at a pixel location is to 1, the flatter that location, and the smaller the area increase required for compensation. Therefore, the corrected local area can be expressed as:
$$A_{\mathrm{loc}} = A_{\mathrm{unit}}\sqrt{E\left[\left(\frac{\partial D}{\partial x}\right)^{2} + \left(\frac{\partial D}{\partial y}\right)^{2}\right] + F}$$
where $A_{\mathrm{loc}}$ denotes the corrected local area; $A_{\mathrm{unit}}$ denotes the unit physical area; $F$ denotes the second fixed constant; $E$ denotes the pre-calibrated mapping coefficient; $\frac{\partial D}{\partial x}$ denotes the partial derivative of the depth undulation distribution field $D$ in the horizontal direction; and $\frac{\partial D}{\partial y}$ denotes its partial derivative in the vertical direction.
[0125] 5) Traverse all pixel positions within the first mask of the wound area, and sum up the corrected local area of each pixel position in turn. The sum is used as the corrected unfolded area of the wound area.
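Steps 1) to 5) can be sketched in a few lines of NumPy. The text leaves the gradient operator open; `np.gradient` (central differences inside the array, one-sided differences at its borders) is used here as a stand-in, and E = 2.0 is taken from the example value in [0120]. The function name is hypothetical, and the mapping coefficient is assumed to absorb the units of the depth fluctuation index.

```python
import numpy as np

def corrected_unfolded_area(field, wound_mask, unit_area, E=2.0, F=1.0):
    """Sum of corrected local areas over the first (wound) mask."""
    # Step 1: vertical (axis 0) and horizontal (axis 1) partial derivatives.
    d_vertical, d_horizontal = np.gradient(np.asarray(field, dtype=float))
    spatial_rate = d_horizontal ** 2 + d_vertical ** 2        # step 2
    magnification = np.sqrt(E * spatial_rate + F)             # step 3
    local_area = unit_area * magnification                    # step 4
    # Step 5: sum the corrected local areas of all pixels inside the wound mask.
    return float(local_area[np.asarray(wound_mask, dtype=bool)].sum())

mask = np.ones((10, 10), dtype=bool)
flat = np.zeros((10, 10))
ramp = np.tile(0.5 * np.arange(10.0), (10, 1))    # constant horizontal slope of 0.5
print(corrected_unfolded_area(flat, mask, unit_area=0.01))  # ~1.0, the projected area
print(corrected_unfolded_area(ramp, mask, unit_area=0.01))  # ~1.2247 = sqrt(1.5)
```

On the flat field every magnification factor is 1, so the result equals the projected area of 100 pixels × 0.01. On the ramp the spatial rate of change is 0.25 everywhere, giving a magnification factor of sqrt(2.0 × 0.25 + 1).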
[0126] It's important to understand that the corrected unfolded area, by eliminating projection compression errors caused by the tilted shooting angle and compensating for the underestimation of area due to surface undulations, is closer to the true physical coverage of wound tissue than the traditional direct pixel area. Specifically, a larger corrected unfolded area indicates a wider wound area or more pronounced surface undulations; a smaller corrected unfolded area indicates a more localized wound area or a flatter surface.
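The calibration of the pre-calibrated mapping coefficient described in [0120] can be sketched as a one-dimensional search. The text only says the coefficient is "adjusted iteratively"; a simple grid search over candidate values is substituted here as one way to do that. The candidate range, the synthetic ramp standing in for a measured calibration plate, and the function name are assumptions.

```python
import numpy as np

def calibrate_mapping_coefficient(field, wound_mask, unit_area, true_area,
                                  candidates=None, F=1.0):
    """Return the candidate E minimizing |corrected area - true area|, and that error."""
    if candidates is None:
        candidates = np.linspace(0.0, 10.0, 1001)   # assumed search range, step 0.01
    d_v, d_h = np.gradient(np.asarray(field, dtype=float))
    rate = (d_h ** 2 + d_v ** 2)[np.asarray(wound_mask, dtype=bool)]
    best_E, best_err = None, np.inf
    for E in candidates:
        area = unit_area * np.sqrt(E * rate + F).sum()
        err = abs(area - true_area)
        if err < best_err:
            best_E, best_err = float(E), float(err)
    return best_E, best_err

# Synthetic "calibration plate": a uniform slope whose unfolded area is known
plate = np.tile(0.5 * np.arange(10.0), (10, 1))
mask = np.ones((10, 10), dtype=bool)
true_area = 0.01 * 100 * np.sqrt(1.5)
print(calibrate_mapping_coefficient(plate, mask, 0.01, true_area))  # E close to 2.0
```

Because the corrected area increases monotonically with E, the error has a single minimum, so a coarse grid followed by a finer one (or a bisection) would also work.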
[0127] S104: Determine the average daily area shrinkage rate based on the corrected unfolded areas of the same wound at two consecutive visits and the time interval between them; compare the average daily area shrinkage rate with the preset judgment threshold, and output auxiliary judgment information on wound depth based on the comparison result.
[0128] Two consecutive visits refer to two consecutive clinical follow-up records for the same patient regarding the same wound area.
[0129] As a preferred implementation method, the specific calculation process of the average daily area shrinkage rate is as follows: subtract the corrected unfolded area of the current visit from the corrected unfolded area of the previous visit to obtain the area shrinkage during the interval period; divide the area shrinkage by the corrected unfolded area of the previous visit to obtain the total shrinkage percentage during the interval period; and then divide the total shrinkage percentage by the number of days between the two visits to obtain the average daily area shrinkage rate.
[0130] It is understood that if the time interval is recorded in units other than days (such as hours), it must be converted to days before the division. In clinical application of this invention, however, consultation dates are usually recorded to the day, so the number of days between visits can usually be calculated directly from the date difference.
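The calculation in [0129] and [0130] amounts to the following sketch, using Python's `datetime.date` subtraction for the day count. Rejecting non-positive areas and non-increasing dates is an assumed safeguard, and the function name and example values are hypothetical.

```python
from datetime import date

def average_daily_shrinkage_rate(prev_area, curr_area, prev_visit, curr_visit):
    """Average daily area shrinkage rate between two consecutive visits (as a fraction)."""
    if prev_area <= 0:
        raise ValueError("previous corrected unfolded area must be positive")
    days = (curr_visit - prev_visit).days
    if days <= 0:
        raise ValueError("visits must be on different, increasing dates")
    shrinkage = prev_area - curr_area            # area shrinkage in the interval
    total_percentage = shrinkage / prev_area     # total shrinkage percentage (fraction)
    return total_percentage / days               # per day; negative if the area grew

rate = average_daily_shrinkage_rate(20.0, 17.0, date(2026, 3, 1), date(2026, 3, 6))
print(f"{rate:.2%} per day")  # 3.00% per day
```

A wound that shrinks from 20.0 to 17.0 area units over 5 days loses 15% of its area, or 3% per day.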
[0131] It should be noted that if the corrected unfolded area at the current visit is larger than that at the previous visit and the increase exceeds the preset increase threshold, the wound is directly judged as showing a "tendency to deep burns". If the increase does not exceed the preset increase threshold, it is treated as measurement error or slight fluctuation: the average daily area shrinkage rate is calculated by the original formula (yielding a negative value) and then compared with the preset judgment threshold.
[0132] Area change rate = total shrinkage percentage × 100% (negative when the wound area has increased).
[0133] It should be noted that the specific value of the preset increase threshold is determined by statistical analysis of large-sample clinical data, as follows: wound data from confirmed superficial and deep burn cases are collected; the area change at different healing stages is calculated using the method of this invention; the maximum fluctuation in area change during normal healing of superficial burns is analyzed statistically, together with the minimum area increase caused by infection and necrosis in deep burns; and the boundary value between the two scenarios is selected as the preset increase threshold. For example, a typical preset increase threshold is 5%.
[0134] It should be noted that the corrected unfolded area refers to the actual physical area of the wound calculated by the method of this invention. Since this invention is applied to the assessment of existing burn wounds, under normal clinical conditions the corrected unfolded area is always greater than zero, so dividing by the corrected unfolded area of the previous visit is well defined.
[0135] It is important to understand that, according to clinicopathological principles, superficial burns preserve dermal appendages and have strong epithelial regeneration capacity, and therefore show continuous and rapid area shrinkage; deep burns, with full-thickness skin necrosis and no source of regeneration, show slow or stalled shrinkage, or even expansion due to infection. Based on this principle, two preset judgment thresholds, a rapid healing threshold and a healing stagnation threshold, can be set to further classify wounds.
[0136] For example, if the average daily area shrinkage rate is greater than the rapid healing threshold, the auxiliary judgment information of "tendency to superficial burns" is output; if the average daily area shrinkage rate is less than the healing stagnation threshold or is negative, the auxiliary judgment information of "tendency to deep burns" is output; if the average daily area shrinkage rate is between the rapid healing threshold and the healing stagnation threshold, the prompt information of "requires comprehensive judgment in conjunction with other clinical indicators" is output.
[0137] It should be noted that the preset judgment thresholds include two values: a rapid healing threshold and a healing stagnation threshold. Their specific values are determined based on statistical analysis of large clinical sample data. The specific determination process is as follows: a batch of clinically diagnosed superficial and deep burn cases are collected, and the average daily area shrinkage rate at different healing stages is calculated according to the method of this invention. The shrinkage rate distribution range of the two groups of cases is statistically analyzed, and the boundary value that best distinguishes the two types of samples is selected as the preset judgment threshold. For example, the rapid healing threshold can be set to 0.8% / day to 1.2% / day, and the healing stagnation threshold can be set to 0.3% / day to 0.5% / day.
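The decision logic of [0131] and [0136] can be sketched as a single function. The default thresholds are illustrative picks from the example ranges in [0133] and [0137] (5% increase, 1.0%/day rapid healing, 0.4%/day healing stagnation), not validated clinical values. Comparing the increase as a fraction of the previous area, and the handling of values exactly at a threshold, are assumptions the text does not settle.

```python
def auxiliary_judgment(prev_area, curr_area, daily_rate,
                       increase_threshold=0.05,
                       rapid_threshold=0.010,
                       stagnation_threshold=0.004):
    """Map two visits' corrected areas and the daily shrinkage rate to auxiliary information."""
    relative_increase = (curr_area - prev_area) / prev_area
    if relative_increase > increase_threshold:
        # Area grew beyond the preset increase threshold: judged directly.
        return "tendency to deep burns"
    if daily_rate > rapid_threshold:
        return "tendency to superficial burns"
    if daily_rate < stagnation_threshold:   # also covers negative rates
        return "tendency to deep burns"
    return "requires comprehensive judgment in conjunction with other clinical indicators"

print(auxiliary_judgment(20.0, 22.0, -0.02))   # area grew 10%
print(auxiliary_judgment(20.0, 17.0, 0.03))    # shrinking 3% per day
print(auxiliary_judgment(20.0, 19.8, 0.006))   # between the two thresholds
```

The output remains auxiliary judgment information for clinicians, as stated in [0138].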
[0138] It is important to emphasize that the output of this invention is "auxiliary judgment information," intended to provide clinicians with objective quantitative references rather than final medical diagnostic conclusions.
[0139] It should be noted that the order of the above embodiments of the present invention is merely for descriptive purposes and does not represent the superiority or inferiority of the embodiments. The processes depicted in the accompanying drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
[0140] The various embodiments in this specification are described in a progressive manner. The same or similar parts between the various embodiments can be referred to each other. Each embodiment focuses on describing the differences from other embodiments.
Claims
1. A method for classifying skin burn depth based on wound image features, characterized in that, The method includes: Acquire a video sequence; wherein the video sequence is acquired by a monocular imaging device in a free-moving state, and the sequence contains the wound area and a rigid planar label; Select a frame from the sequence as a reference frame; divide the reference frame into regions to generate a first mask for the wound area and a second mask for the label; determine the unit physical area corresponding to a unit pixel based on the label image in the second mask and the known physical area of the label; detect target feature points within the first mask and track the actual image coordinates of the target feature points in each frame of the sequence. Solve the homography matrix from the reference frame to each frame in the sequence; based on the homography matrix, map the coordinates of each target feature point in the reference frame to each frame to obtain the planar projection coordinates of each target feature point in each frame; combine the planar projection coordinates with the actual image coordinates to generate the depth undulation distribution field covering the wound area; determine the corrected unfolded area of the wound area according to the spatial rate of change of the depth undulation distribution field and the unit physical area. Based on the corrected unfolded areas of the same wound at two consecutive visits and the time interval between them, determine the average daily area shrinkage rate; compare the average daily area shrinkage rate with a preset judgment threshold, and output auxiliary judgment information on wound depth based on the comparison result.
2. The skin burn depth classification method based on wound image features according to claim 1, characterized in that, The acquisition of the video sequence includes: Acquire a video sequence that meets preset acquisition conditions; wherein the preset acquisition conditions include at least the following conditions: each frame of the video sequence fully contains the rigid planar label; during the acquisition of the video sequence, the wound area has no discernible displacement relative to the bony landmark; the movement of the imaging device is mainly a displacement parallel to the image plane, and the displacement amplitude in the direction perpendicular to the image plane is smaller than that in the parallel direction.
3. The skin burn depth classification method based on wound image features according to claim 1, characterized in that, The process of dividing the reference frame into regions to generate a first mask for the wound area and a second mask for the label includes: Image analysis is performed on the reference frame to identify pixel regions belonging to the wound area and pixel regions belonging to the rigid planar label. Based on the recognition results, a first mask and a second mask with the same size as the reference frame are generated; wherein the first mask is used to mark the pixel positions of the wound, and the second mask is used to mark the pixel positions of the label.
4. The skin burn depth classification method based on wound image features according to claim 3, characterized in that, The second mask is a binary image, where the region with a pixel value of 1 is used to locate and extract the image content of the rigid planar label from the reference frame as the label image; The process of determining the unit physical area includes: Within the area covered by the second mask, the pixel vertex coordinates of the four vertices of the label in the label image are detected and refined to sub-pixel accuracy; based on the known geometry of the label, the coordinates of the four standard vertices of the label in the frontal view are set; based on the correspondence between the four pixel vertex coordinates and the four standard vertex coordinates, the perspective correction matrix is calculated. The label image in the second mask is transformed using the perspective correction matrix and mapped to a standard geometric image in the frontal view. Binarize the standard geometric image to generate a binary image corresponding to the label; The number of pixels belonging to the label in the binary image corresponding to the label is counted as the pixel count; the known physical area of the label is divided by the pixel count to obtain the unit physical area represented by each pixel.
5. The skin burn depth classification method based on wound image features according to claim 3, characterized in that, The first mask is a binary image, where a pixel value of 1 indicates that the pixel location belongs to the wound area, and a pixel value of 0 indicates that the pixel location does not belong to the wound area. The step of detecting target feature points within the first mask and tracking the actual image coordinates of the target feature points in each frame of the sequence includes: Within the area covered by the first mask of the reference frame, multiple target feature points with obvious image texture are detected, and the coordinates of the target feature points are optimized to the sub-pixel level. For each frame in the video sequence other than the reference frame, a fixed-size image block is extracted centered on the position of the target feature point in the previous frame as a reference image block to be matched, and a local search window is set centered on the position of the target feature point in the previous frame. Within the local search window, candidate image patches of the same size as the reference image patch are extracted sequentially from left to right and top to bottom; the target image patch that is most similar to the reference image patch is then found from the candidate image patches. The center position of the target image block is determined as the actual image coordinates of the target feature point in the current frame.
6. The skin burn depth classification method based on wound image features according to claim 5, characterized in that, Within the area covered by the first mask of the reference frame, multiple target feature points with obvious image textures are detected, including: For each pixel position within the coverage area of the first mask, calculate the absolute difference between the pixel position and its right-side adjacent pixel as the degree of grayscale change of the pixel position in the horizontal direction; calculate the absolute difference between the pixel position and its lower-side adjacent pixel as the degree of grayscale change of the pixel position in the vertical direction; calculate the absolute difference between the pixel position and its lower-right adjacent pixel as the degree of grayscale change of the pixel position in the diagonal direction. Compare the grayscale changes of the pixel position in the horizontal, vertical and diagonal directions, and select the largest value as the response value of the pixel position. Pixel locations whose response values exceed a preset threshold are selected as candidate feature points; Calculate the Euclidean distance between adjacent candidate feature points; Candidate feature points whose Euclidean distance is less than a preset distance threshold are removed to obtain the final target feature points.
7. The skin burn depth classification method based on wound image features according to claim 1, characterized in that, The calculation of the homography matrix from the reference frame to each frame of the sequence includes: Within the area covered by the second mask of the reference frame, multiple reference feature points with obvious image textures are detected, and the coordinates of the reference feature points are optimized to the sub-pixel level. For each frame in the video sequence other than the reference frame, track the reference image coordinates of each reference feature point in the current frame; Based on the correspondence between the coordinates of the reference feature points in the reference frame and the coordinates of the reference image in the current frame, the homography matrix from the reference frame to the current frame is calculated.
8. A method for classifying skin burn depth based on wound image features according to claim 7, characterized in that, The process of obtaining the planar projection coordinates includes: For each target feature point detected within the first mask, its coordinates in the reference frame are obtained; For each frame in the video sequence other than the reference frame, obtain the homography matrix from the reference frame to the current frame; The two-dimensional coordinates of the target feature point in the reference frame are represented as a format containing three values, where the first two values are the x and y coordinates of the target feature point, and the third value is set as a first fixed constant; the format containing three values is multiplied with the homography matrix from the reference frame to the current frame to obtain a new calculation result containing three values. The calculation results are normalized and converted into two-dimensional coordinates in the current frame image coordinate system, which are used as the planar projection coordinates of the target feature point in the current frame. The planar projection coordinates are used to represent the theoretical position that the target feature point should appear in the current frame image if it is located on the plane of the label.
9. A method for classifying skin burn depth based on wound image features according to claim 1, characterized in that, The process of combining planar projection coordinates with actual image coordinates to generate a depth undulation distribution field covering the wound area includes: For each tracked target feature point, obtain the planar projection coordinates and actual image coordinates of the target feature point in each frame, calculate the Euclidean distance between the two, and use it as the original deviation value. For each frame, the camera translation vector is decomposed from the homography matrix corresponding to the frame, and the magnitude of the camera translation vector is calculated as the motion amplitude factor of the frame; the original deviation value of each target feature point in the frame is divided by the motion amplitude factor of the frame to obtain the normalized local depth deviation value. Calculate the arithmetic mean of the local depth deviation values of the same target feature point in all frames, and use it as the depth fluctuation index of the target feature point. Based on the depth fluctuation indices of all target feature points and their pixel positions in the reference frame, the depth fluctuation index corresponding to each pixel position in the wound area is estimated by spatial interpolation. Spatial interpolation refers to estimating the depth fluctuation index at unknown pixel positions from the known depth fluctuation indices of the target feature points, according to the principle that the closer the distance, the greater the influence. The depth fluctuation indices of all pixel positions are combined to form a two-dimensional array with the same size as the reference frame, which serves as the depth undulation distribution field covering the wound area.
10. A method for classifying skin burn depth based on wound image features according to claim 9, characterized in that, The process for determining the corrected unfolded area includes: For each pixel location within the first mask of the wound area, calculate the horizontal and vertical partial derivatives of the depth undulation distribution field at the pixel location; Squaring the partial derivative in the horizontal direction yields the squared value in the horizontal direction; squaring the partial derivative in the vertical direction yields the squared value in the vertical direction; adding the squared values in the horizontal and vertical directions gives the spatial rate of change at the pixel location. Multiply the spatial rate of change by the pre-calibrated mapping coefficient to obtain an intermediate value; calculate the sum of the intermediate value and the second fixed constant as the sum value, and perform a square root operation on the sum value to obtain the magnification factor; Multiply the unit physical area by the magnification factor to obtain the local area after pixel position correction; Traverse all pixel positions within the first mask of the wound area, and sum up the corrected local areas of each pixel position. The sum is taken as the corrected unfolded area of the wound area.