Mask vectorization and lightweight human-computer collaborative labeling method and application thereof

By employing mask vectorization and a lightweight human-computer collaborative annotation method, the problems of high interaction costs, data transmission redundancy, and poor cross-platform compatibility in medical cell image annotation are solved. This method enables the generation of low-volume JSON files and efficient human-computer collaborative annotation, thereby improving annotation efficiency and compatibility.

CN122244604A · Pending · Publication Date: 2026-06-19 · FOSHAN UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
FOSHAN UNIVERSITY
Filing Date
2026-05-22
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing medical cell image annotation technologies suffer from high interaction costs, data transmission and storage redundancy, and poor cross-platform compatibility. In particular, in AI pre-annotation of medical cell images, existing technical solutions cannot effectively address poor usability for doctors, large data file sizes, front-end rendering lag, and parsing crashes.

Method used

We employ a mask vectorization and lightweight human-computer collaborative annotation method. By performing topological dimensionality reduction, polygon smoothing approximation, and topological anomaly verification on high-dimensional pixel masks, we generate a JSON file with extremely low data volume, which is compatible with lightweight annotation software across all platforms. The Douglas-Peucker algorithm is used for polygon approximation, and topological anomaly verification is used to ensure data security and compatibility.

Benefits of technology

It achieves non-destructive reconstruction of core cell morphological features, generates JSON files with extremely low data volume, reduces the cognitive load on doctors, improves data transmission efficiency and cross-platform compatibility, avoids front-end software crashes, and enables efficient human-computer collaborative annotation.

✦ Generated by Eureka AI based on patent content.

[Figure: CN122244604A_ABST]

Abstract

This invention relates to a mask vectorization and lightweight human-computer collaborative annotation method and its application, in the interdisciplinary fields of computer vision, artificial intelligence, and smart healthcare. Addressing the severe technical gap between the "low-level AI pixel output" and the "high-level lightweight doctor interaction" of existing medical pre-annotation systems, the method of this invention is an intermediate-layer vectorization method that performs minimal topological dimensionality reduction, polygon smooth approximation (fitting compression), and rigorous topological anomaly verification on high-dimensional pixel masks. This method can generate a structured sequence (JSON) with extremely low data volume, absolute security, and perfect compatibility with lightweight annotation software across all platforms without damaging the core morphological features of cells, thereby completely reconstructing the underlying workflow of medical cell annotation.

Description

Technical Field

[0001] This invention relates to computer systems based on specific computational models, and to a mask vectorization and lightweight human-computer collaborative annotation method and its application.

Background Technology

[0002] With the development of deep learning technology, the automated analysis of medical cell microscopic images (such as bone marrow cell classification and counting for the diagnosis of leukemia and multiple myeloma) has become an important branch of smart healthcare. Because bone marrow cell images typically possess physical characteristics such as "dense target density, varied morphology, blurred boundaries, and significant overlap and adhesion," building high-precision deep learning models requires massive amounts of accurate cell contour annotation data. In practical clinical applications, relying entirely on doctors to manually outline polygonal contours from scratch is not only extremely time-consuming (a single image containing hundreds of cells can take several hours) but also suffers from extremely high inter-observer variability. Therefore, a human-in-the-loop workflow combining "AI pre-annotation + manual fine-tuning" has become an inevitable trend.

[0003] Currently, for AI pre-annotation of medical cell images, the industry mainly has two similar existing technical solutions: Solution 1: A closed-loop pre-annotation system based on pure pixel-level interaction. Implementation: Utilizing instance segmentation models specifically designed for biological cells (such as Cellpose or StarDist), the system calculates the spatial flow field or convex polygon probability of the input image, outputting a pixel-level instance mask matrix. Subsequently, the system packages the original image and the mask matrix, requiring annotators to load it in a dedicated GUI client and use mouse drawing tools to repair or erase erroneous cell regions pixel by pixel. The drawbacks of this solution are: strong reliance on underlying academic software, complex UI, and extreme unfriendliness for clinicians. Mask data (such as in .npy format) is a dense matrix, resulting in large file sizes that hinder cross-system transfer and cannot be directly integrated into the currently common industry-standard vector graphics-based data flywheel pipeline.

[0004] Existing technical solution two: a coarse vectorization method based on general contour extraction. In some conventional computer vision projects, traditional edge detection operators (such as the Canny operator) or basic topological border-following algorithms (such as the findContours function in OpenCV) are used to extract the binary image output by the AI directly into a point set, which is written straight to a JSON file for mainstream software such as LabelStudio to read. (Source: offline pre-annotation processing script libraries of mainstream open-source annotation platforms, such as CVAT and LabelStudio.) The shortcomings of this solution are as follows.

Lack of topological dimensionality reduction for densely packed medical targets: directly extracting cell edges produces extremely dense vertices (a single cell can have thousands of coordinate points). When there are hundreds of cells in a bone marrow smear, the generated JSON file becomes abnormally large, causing severe lag and even out-of-memory (OOM) failures in annotation software based on web or lightweight front-end frameworks.

Lack of a crash-proof topological verification mechanism: when processing medical staining impurities or red blood cell afterimages, the AI pre-segmentation model often outputs "isolated noise" of only 1-2 pixels. Conventional algorithms forcibly convert this data into unclosed line segments or single points and write it into the JSON file. However, mainstream polygon rendering engines (PolygonRenderer) strictly require the number of polygon vertices N≥3. Reading such dirty data triggers an underlying JSON parsing exception, causing the software to crash and completely disrupting the doctor's annotation workflow.

Summary of the Invention

[0005] To address the aforementioned issues, this invention provides a mask vectorization and lightweight human-computer collaborative annotation method. This method is an intermediate-layer vectorization method that performs minimal topological dimensionality reduction, polygon smooth approximation (fitting compression), and rigorous topological anomaly verification on high-dimensional pixel masks. This method can generate a structured sequence (JSON) with extremely low data volume, absolute security, and perfect compatibility with lightweight annotation software across all platforms without damaging the core morphological features of cells, thereby completely reconstructing the underlying workflow of medical cell annotation.

[0006] This invention provides a mask vectorization and lightweight human-computer collaborative annotation method, comprising the following steps:

Image acquisition and preprocessing: the original image is decoded and normalized to convert it into a pixel matrix, and the image parameters are extracted;

Instance segmentation mask generation based on deep learning: the image tensor of the pixel matrix is input into the instance segmentation neural network. Through the calculation of the spatial topological flow field and the cell probability map, each independent instance is assigned a positive integer label and a high-dimensional two-dimensional mask matrix is output;

Mask decoupling and edge contour extraction of binary connected domains: for each positive integer label of the high-dimensional two-dimensional mask matrix, a temporary binary mask is generated. The outermost contour boundary of the connected domain in the temporary binary mask is extracted by the topological edge detection algorithm to obtain the original pixel set;

Polygon topology dimensionality reduction and contour smoothing fitting: the original pixel set is compressed and reduced in dimensionality by a polygon approximation algorithm to obtain a lightweight polygon vertex array;

Geometric topology anomaly verification and security interception: the lightweight polygon vertex array is subjected to anomaly verification and security interception using a topology anomaly filter to obtain a valid polygon vertex array;

Encapsulation and disk storage of lightweight vector structured files: the valid polygon vertex array is standardized and assembled to obtain a standardized JSON tree (a nested dictionary structure). The standardized JSON tree is output for front-end rendering and human interaction, and the valid JSON vector file returned after interactive correction is received. The JSON tree includes: the parameters, the relative path of the original image, and a list of the valid polygon vertex arrays.

[0007] In the development of medical AI models (especially cell morphology analysis models), high-quality data annotation is a core bottleneck. This invention aims to address three major technical pain points in existing medical cell pre-annotation workflows:

Pain Point 1: High interaction cost of pixel-level masks. Existing medical image AI pre-segmentation models output dense two-dimensional pixel matrix files (such as .npy or high-bit-depth .tiff formats). When medical experts correct these pre-annotation results, they must rely on heavyweight academic software (such as the Cellpose GUI or ITK-SNAP) and can only modify the results through pixel-level smearing and erasing, resulting in a heavy cognitive load and extremely difficult interaction for clinicians without a computer science background.

Pain Point 2: Redundancy in data transmission and storage. Pixel-level mask files are large, leading to significant waste of bandwidth and storage resources in cloud deployment and large-scale distributed collaborative annotation by doctors.

Pain Point 3: Poor cross-platform compatibility and parsing crashes. General lightweight annotation front-ends (such as Labelme and X-AnyLabeling) typically operate on a JSON vector architecture. When complex cell pixel edges are directly converted into polygons, it is easy to generate extremely large objects containing thousands of redundant vertices, or to generate "illegal topologies" with fewer than 3 vertices due to model prediction noise, causing the front-end parsing engine to lag or even crash.

[0008] Based on this, the inventors propose the above-mentioned mask vectorization and lightweight human-computer collaborative annotation method. This method is an intermediate layer vectorization method that performs minimal topological dimensionality reduction, polygon smooth approximation (fitting compression), and strict topological anomaly verification on high-dimensional pixel masks. This method can generate a structured sequence (JSON) with extremely low data volume, absolute security, and perfect compatibility with lightweight annotation software across all platforms without damaging the core morphological features of cells, thereby completely reconstructing the underlying workflow of medical cell annotation.

[0009] In one embodiment, in the image acquisition and preprocessing step, the normalization process includes color space normalization, the pixel matrix includes a pixel matrix in a three-channel color space, and the parameters include absolute height and width. In the deep learning-based instance segmentation mask generation step, each independent instance is assigned a unique positive integer label, and the positive integer labels of all instances are an increasing sequence.

[0010] In one embodiment, in the polygon topology dimensionality reduction and contour smoothing fitting step, the polygon approximation algorithm includes the Douglas-Peucker algorithm, and the compression and dimensionality reduction includes: calculating the perimeter of the original pixel contour; setting a dynamic approximation accuracy threshold based on the perimeter; using the Douglas-Peucker algorithm to compare the vertical distance from each original pixel point to the baseline segment with the dynamic approximation accuracy threshold; with that threshold as the judgment criterion, recursively retaining the key anchor point with the largest vertical distance; and eliminating redundant pixel points within the approximation error range to obtain a lightweight polygon vertex array.

[0011] Understandably, in the recursive operation of the Douglas-Peucker algorithm, the dynamic approximation accuracy threshold is used to evaluate how far a pixel deviates from the baseline segment. The algorithm calculates the vertical distance from each contour pixel to the baseline segment, finds the pixel with the largest distance, and compares that distance with the threshold. If the largest distance exceeds the threshold, that pixel is determined to be a "critical anchor point" with a significant curvature change and is retained, and the procedure recurses on the two sub-segments on either side of it; pixels whose vertical distance falls within the error range are determined to be "redundant pixels" and are discarded.
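The recursion described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names (`perimeter`, `rdp`, `simplify_contour`) are hypothetical, the closed contour is handled by appending its first point as the last point, and a production pipeline would more likely call OpenCV's `cv2.arcLength` and `cv2.approxPolyDP`.

```python
import math

def perimeter(points):
    """Perimeter of a closed contour (the ArcLength term)."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def _distance_to_baseline(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:          # degenerate baseline: a == b
        return math.dist(p, a)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def rdp(points, epsilon):
    """Douglas-Peucker simplification of an open polyline (endpoints kept)."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_baseline(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax > epsilon:
        # Keep the farthest point as an anchor and recurse on both halves.
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    # Every interior point lies within epsilon: drop them all.
    return [points[0], points[-1]]

def simplify_contour(contour, alpha):
    """Simplify a closed contour with epsilon = alpha * perimeter."""
    epsilon = alpha * perimeter(contour)
    closed = list(contour) + [contour[0]]   # assumption: close the loop explicitly
    return rdp(closed, epsilon)[:-1]        # drop the duplicated closing point

# A 4x4 square traced pixel by pixel (16 points) collapses to its 4 corners.
square = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3),
          (4, 4), (3, 4), (2, 4), (1, 4), (0, 4), (0, 3), (0, 2), (0, 1)]
corners = simplify_contour(square, alpha=0.01)
```

Collinear edge pixels have zero distance to their baseline, so only points where the contour changes direction survive as anchors.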

[0012] In one embodiment, the formula for calculating the dynamic approximation accuracy threshold is as follows: ε = α × ArcLength; where ε is the dynamic approximation accuracy threshold; α is the smoothing coefficient, with a value between 0.0005 and 0.05; and ArcLength is the perimeter of the original pixel contour.

[0013] In one embodiment, when the pixel area of the connected region in the temporary binary mask is less than a preset minimum value, the value of α is 0.02-0.05; when the pixel area of the connected region in the temporary binary mask is greater than or equal to the lower limit of the conventional area of the target cell, the value of α is 0.0005-0.001.

[0014] In one embodiment, the geometric topology anomaly verification and security interception step includes: detecting the length L of the polygon vertex array; when the length L≥3, the polygon vertex array is determined to be valid, allowed, and pushed onto the valid shape stack; when the length L<3, it is determined to be an illegal topology structure and silently discarded.

[0015] In one embodiment, the mask vectorization and lightweight human-computer collaborative annotation method further includes a small-sample closed-loop fine-tuning step based on inverse rasterization and momentum gradient protection, performed after the encapsulation and disk storage step of the lightweight vector structured file. This step includes: parsing the polygon anchor point coordinates of the valid JSON vector file, initializing an all-zero tensor, and using a computer vision filling operator to inversely rasterize the vector coordinates back into a high-fidelity binary mask tensor; feeding the high-fidelity binary mask tensor into the instance segmentation neural network for updating; and, during backpropagation, updating the network weights with a stochastic gradient descent algorithm with momentum and weight decay, in which a fixed learning rate and momentum inertia constrain the gradient update step size.
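The weight update named in this embodiment, stochastic gradient descent with momentum and weight decay, can be illustrated on a flat list of scalar weights. This is a sketch only: the source does not state the hyperparameter values or the exact update convention, so the formulation below (weight decay added to the gradient, then a velocity buffer, as in common deep learning frameworks) and every default value are assumptions.

```python
def sgd_momentum_step(weights, grads, velocity,
                      lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One SGD step with momentum and L2 weight decay (hypothetical defaults).

    g <- g + weight_decay * w
    v <- momentum * v + g
    w <- w - lr * v
    """
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w        # weight decay pulls weights toward zero
        v = momentum * v + g            # momentum smooths successive gradients
        new_velocity.append(v)
        new_weights.append(w - lr * v)  # fixed learning rate bounds the step
    return new_weights, new_velocity

# Two steps with a constant gradient and no weight decay.
w, v = [1.0], [0.0]
w, v = sgd_momentum_step(w, [0.5], v, lr=0.1, momentum=0.9, weight_decay=0.0)
first = w[0]
w, v = sgd_momentum_step(w, [0.5], v, lr=0.1, momentum=0.9, weight_decay=0.0)
second = w[0]
```

With a fixed learning rate the per-step change is bounded by the accumulated velocity, which is one plausible reading of the "momentum inertia" constraint in the text.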

[0016] The present invention also provides a mask vectorization annotation system, through which the mask vectorization and lightweight human-computer collaborative annotation method is implemented. The mask vectorization annotation system includes an image parsing module, an instance segmentation inference engine, a polygon approximation and vector dimensionality reduction module, and a topology gatekeeper and serialization output module. The image parsing module is used to input the original image, decode the image, and output a pixel matrix. The instance segmentation inference engine embeds an instance segmentation neural network, which is used to input the image tensor of the pixel matrix, perform forward propagation of the spatial flow field, and output a high-dimensional two-dimensional mask matrix. The polygon approximation and vector dimensionality reduction module is used to separate the binary masks of independent instances in the high-dimensional two-dimensional mask matrix, perform edge tracking, extract contour boundaries, and use a polygon approximation algorithm to fit key anchor points and perform smooth compression to obtain a lightweight polygon vertex array. The topology gatekeeper and serialization output module is used to forcibly verify the geometric validity of the lightweight polygon vertex array, assemble a standardized JSON tree (a nested dictionary structure), output the standardized JSON tree for front-end rendering and human interaction, and receive a valid JSON vector file returned after interactive correction.

[0017] In one embodiment, the mask vectorization annotation system further includes a closed-loop fine-tuning module, which is used to parse the valid JSON vector file, use a computer vision filling operator to inversely rasterize the vector coordinates to restore them into a high-fidelity binary mask tensor, and feed it into the instance segmentation neural network for updating.

[0018] The present invention also provides the application of the aforementioned mask vectorization and lightweight human-computer collaborative annotation method or the aforementioned mask vectorization annotation system in cell image annotation.

[0019] The present invention also provides a method for cell image annotation, which uses the mask vectorization and lightweight human-computer collaborative annotation method or the mask vectorization annotation system described above for annotation.

[0020] Compared with the prior art, the present invention has the following beneficial effects: Addressing the severe technical gap between the "low-level AI pixel output" and the "high-level lightweight doctor interaction" of existing medical pre-annotation systems, this invention presents a mask vectorization and lightweight human-computer collaborative annotation method. This method employs an intermediate-layer vectorization approach that performs minimal topological dimensionality reduction, polygon smooth approximation (fitting compression), and rigorous topological anomaly verification on high-dimensional pixel masks. This method can generate a structured sequence (JSON) with extremely low data volume, absolute security, and perfect compatibility with lightweight annotation software across all platforms without damaging the core morphological features of cells, thereby completely reconstructing the underlying workflow of medical cell annotation.

Description of the Drawings

[0021] Figure 1 is a schematic diagram illustrating the core steps of the mask vectorization and lightweight human-computer collaborative annotation method in the embodiment; Figure 2 is a schematic diagram of the core modules of the mask vectorization annotation system in the embodiment; Figures 3-5 are a comparison atlas of original microscopic cell images and the lightweight annotation interfaces reconstructed by vectorization according to this invention, wherein Figures 3(a), 4(a), and 5(a) are raw Wright-stained medical cell images acquired by a microscope, and Figures 3(b), 4(b), and 5(b) are schematic diagrams of the polygon annotation results automatically generated and rendered in a single pass, without loss, after processing by the "instance mask dimensionality reduction and vectorization reconstruction" technology of the present invention; Figures 6-9 are a set of images showing the accurate recognition results of the segmentation model based on the closed-loop fine-tuning mechanism of this invention, wherein Figures 6-8 are schematic diagrams illustrating the results on three actual single samples, and Figure 9 is a schematic diagram illustrating the results on a test dataset with several samples.

Detailed Implementation

[0022] To facilitate understanding of the present invention, a more complete description will be given below with reference to the accompanying drawings. Preferred embodiments of the invention are shown in the drawings. However, the invention can be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure of the invention will be thorough and complete.

[0023] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and / or" as used herein includes any and all combinations of one or more of the associated listed items.

[0024] Unless otherwise specified, all reagents, materials, and equipment used in this embodiment are commercially available; unless otherwise specified, all test methods are conventional test methods in this field.

[0025] Example I. A mask vectorization and lightweight human-computer collaborative annotation method.

[0026] As shown in the flowchart of Figure 1, the method of the present invention mainly includes the following eight core technical steps:

Step S1: Acquisition and Preprocessing of Medical Microscopic Images

The system acquires the medical cell microscopic images (such as bone marrow cell fluorescence images or smear images) to be processed. Because medical images come in diverse formats (which may include Raw, TIFF, BMP, etc.), the system first performs low-level image decoding and color space normalization, uniformly converting them into a pixel matrix in the standard RGB (Red Green Blue) three-channel color space, and extracts the absolute height and width parameters of the image for subsequent scaling and mapping.

[0027] Step S2: Generation of Instance Segmentation Mask Based on Deep Learning

The RGB image tensor processed in step S1 is input into a pre-trained instance segmentation neural network. Internally, the network computes the spatial flow field and the cell probability map and outputs a high-dimensional two-dimensional mask matrix that is exactly the same size as the original image.

[0028] Technical Implementation: In this two-dimensional matrix, the background pixels have a value of 0, while each identified individual cell entity (instance) is assigned a unique, incrementing positive integer label (e.g., 1, 2, 3...N). The system output is then an extremely large and dense set of pixel-level data.

[0029] Step S3: Mask Decoupling and Edge Contour Extraction of Binary Connected Regions

Iterate through all positive integer labels i∈[1,N] in the above two-dimensional mask matrix. For each label i:

Technical Implementation: A separate temporary binary mask is generated, in which pixels equal to i in the matrix are set to 1 (white) and the rest are set to 0 (black). Subsequently, the system calls a topological edge detection algorithm (such as the findContours function in OpenCV, which implements the Suzuki-Abe border-following algorithm) to extract only the "external contours" of the binary connected domain of the cell, obtaining the original pixel set composed of continuous pixel coordinates.
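The decoupling part of step S3 can be sketched without any imaging library. The code below is illustrative only and makes two assumptions: the label mask is a nested Python list, and the boundary helper returns an unordered set of edge pixels using a simple 4-neighbour test, which is a stand-in for, not an implementation of, the ordered contour tracing that a function such as OpenCV's `findContours` performs. The function names are hypothetical.

```python
def decouple_instances(label_mask):
    """Return {label: temporary binary mask} for every positive label."""
    labels = sorted({v for row in label_mask for v in row if v > 0})
    return {
        i: [[1 if v == i else 0 for v in row] for row in label_mask]
        for i in labels
    }

def boundary_pixels(binary_mask):
    """(x, y) of foreground pixels that touch background or the image edge.

    Unordered 4-neighbour test; real contour tracing returns an ordered path.
    """
    h, w = len(binary_mask), len(binary_mask[0])
    edge = []
    for y in range(h):
        for x in range(w):
            if not binary_mask[y][x]:
                continue
            neighbours = ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
            if any(not (0 <= nx < w and 0 <= ny < h) or not binary_mask[ny][nx]
                   for nx, ny in neighbours):
                edge.append((x, y))
    return edge

# Background is 0; two instances carry labels 1 and 2.
label_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 0, 2],
]
masks = decouple_instances(label_mask)

# A 3x3 block inside a 5x5 image: every pixel except the centre is boundary.
block = [[1 if 1 <= x <= 3 and 1 <= y <= 3 else 0 for x in range(5)]
         for y in range(5)]
edge = boundary_pixels(block)
```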

[0030] Step S4: Polygon topology dimensionality reduction and contour smoothing fitting Because the original pixel set extracted in step S3 is extremely dense (usually containing hundreds or thousands of points with severe jagged edges), direct output would cause front-end rendering to lag. This step performs polygon approximation compression and dimensionality reduction on it.

[0031] Technical Implementation: The Douglas-Peucker algorithm is introduced. First, the full perimeter (ArcLength) of the original pixel contour is calculated. Then, a dynamic approximation accuracy threshold ε (epsilon) is set, calculated as ε = α × ArcLength (where α is a smoothing coefficient, typically between 0.001 and 0.01). The algorithm recursively retains the key anchor points with the largest vertical distance, removes redundant pixels within the approximation error range, and finally compresses the huge original pixel set, with no loss of core morphological features, into a lightweight polygon vertex array containing only dozens of two-dimensional floating-point coordinates.
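To give a sense of scale for the threshold, consider a hypothetical contour with a perimeter of 400 pixels (this figure is illustrative and does not come from the source). At the two ends of the typical α range, the threshold works out as follows:

```latex
\begin{aligned}
\varepsilon &= \alpha \times \mathrm{ArcLength} \\
\alpha = 0.001:&\quad \varepsilon = 0.001 \times 400 = 0.4\ \text{px} \\
\alpha = 0.01:&\quad \varepsilon = 0.01 \times 400 = 4\ \text{px}
\end{aligned}
```

Under this assumption, the smaller coefficient discards only points that deviate from the baseline by less than half a pixel, while the larger one tolerates deviations of up to four pixels.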

[0032] Meanwhile, when performing Douglas-Peucker (RDP) polygon topology dimensionality reduction, using a globally fixed smoothing coefficient may result in the incomplete removal of small staining noise spots, or in the loss of crucial pathological serrated edges of target cells with large areas and extremely irregular edges.

[0033] Therefore, in calculating the dynamic approximation accuracy threshold (ε = α × ArcLength), the smoothing coefficient α is an adaptive dynamic variable. The system dynamically assigns weights based on the extracted "pixel area of the binary connected region" or the "classification label (e.g., red blood cells / white blood cells)": when the area of the connected region is less than the preset minimum value, a very large α value is assigned to accelerate smooth removal; when the connected region is a main target cell and its area is greater than or equal to the lower limit of the conventional target cell area (e.g., greater than 500 square pixels), a very small α value is assigned to achieve high-fidelity fitting of the edge curvature. The preset connected component area is mainly based on the natural difference in physical pixel area between "real cells" and "stained debris / AI noise" in medical images. Specifically, the "preset minimum value" is set to 50 to 100 square pixels (or defined as less than 0.01% of the total area of the entire image). When the extracted binary connected component area is less than 50 pixels, it is judged to be a non-target noise afterimage and assigned an α value of 0.02-0.05 (i.e., a very large α value). A connected component judged to be small noise is smoothed coarsely, which quickly degenerates it into a line segment with L<3, so that it is intercepted and discarded in the subsequent topology verification step. When the connected component is a main target cell and its area is greater than or equal to the lower limit of the conventional target cell area, it is judged to be a real cell and assigned an α value of 0.0005-0.001 (i.e., a very small α value), so as to preserve for medical personnel extremely small pathological protrusion features, such as those on the edge of white blood cells, and achieve high-fidelity fitting of edge curvature. This invention proposes an "adaptive dynamic weighting" mechanism, in which the system handles targets of conventional area with an α value within the traditional range of 0.001 to 0.01. However, when the specific "minimal area" or "critical target cell" conditions are triggered, the system actively goes beyond these conventional limits: it either jumps upward to a "maximum" of 0.02-0.05 to perform aggressive erasure, or drops downward to a "minimum" of 0.0005-0.001 to achieve extremely high-fidelity fitting. This overcomes the limitations of the traditional α range (0.001-0.01) by extending it dynamically toward both ends (maximum and minimum values).
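The area-based selection of α described above can be sketched as a small lookup. The thresholds of 50 and 500 square pixels and the three α ranges come from the text; the specific values chosen inside each range, the function names, and the treatment of mid-sized regions as "conventional" are assumptions made for illustration.

```python
def pixel_area(binary_mask):
    """Pixel area of a connected region: the count of foreground pixels."""
    return sum(sum(row) for row in binary_mask)

def adaptive_alpha(area_px, min_area=50, target_area=500,
                   noise_alpha=0.03, fidelity_alpha=0.001, default_alpha=0.005):
    """Pick the smoothing coefficient from the region's pixel area.

    noise_alpha    lies in the 0.02-0.05 range (aggressive erasure),
    fidelity_alpha lies in the 0.0005-0.001 range (high-fidelity fitting),
    default_alpha  lies in the conventional 0.001-0.01 range.
    The exact values inside each range are hypothetical.
    """
    if area_px < min_area:
        return noise_alpha        # likely staining debris or AI noise
    if area_px >= target_area:
        return fidelity_alpha     # main target cell: keep fine edge detail
    return default_alpha          # conventional area: traditional range

noise = adaptive_alpha(10)
conventional = adaptive_alpha(200)
target = adaptive_alpha(500)
```

The classification-label variant mentioned in the text (for example, red versus white blood cells) could be layered on top of this by passing the label as an extra argument; that extension is not shown here.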

[0034] This achieves the following technical effects: it balances "extreme background compression" and "high-fidelity restoration of target cells," and in terms of patent layout, it blocks competitors from using "dynamic parameters / variable parameters" to circumvent infringement.

[0035] Step S5: Geometric Topology Anomaly Verification and Security Interception (Core Anti-Crash Mechanism)

Because medical images may contain minute staining impurities or noise, the AI in step S2 is prone to outputting "tiny pseudo-connected regions" containing only 1 to 2 pixels. After dimensionality reduction in step S4, these pseudo-connected regions degenerate into line segments or isolated points containing fewer than 3 vertices.

[0036] Technical Implementation: Before writing to the structured file, the system applies a strict topology anomaly filter. The program automatically checks the length L of each polygon vertex array. Only when L ≥ 3 (i.e., the vertices can geometrically form a closed, valid polygon) is the array allowed through and pushed onto the valid shape stack; if L < 3, it is judged to be an "illegal topology" and silently discarded. This eliminates, at the source of the data flow, the parsing crashes that occur when lightweight front-end software reads the JSON file.
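The L ≥ 3 gate is simple enough to show directly. The following is a minimal sketch with a hypothetical function name; it checks only the vertex count, as the text describes, and does not test for other degeneracies such as collinear or duplicated vertices.

```python
def filter_valid_polygons(polygons):
    """Keep only vertex arrays that can form a closed polygon (L >= 3)."""
    valid_shapes = []
    for vertices in polygons:
        if len(vertices) >= 3:
            valid_shapes.append(vertices)   # pushed onto the valid shape stack
        # L < 3: illegal topology, silently discarded (no exception raised)
    return valid_shapes

candidates = [
    [(0, 0), (4, 0), (4, 4), (0, 4)],   # a real cell outline
    [(5, 5), (6, 5)],                   # noise that degenerated to a segment
    [(9, 9)],                           # isolated point
    [],                                 # empty array
]
valid = filter_valid_polygons(candidates)
```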

[0037] Step S6: Encapsulation and Disk Storage of Lightweight Vector Structured Files (JSON)

All valid polygon vertex arrays that passed the verification in step S5 are standardized and assembled.

[0038] Technical Implementation: The system constructs a standardized JSON (JavaScript Object Notation) tree, i.e., a nested dictionary structure, according to the parsing specifications of mainstream lightweight image annotation software (such as Labelme). The tree contains: the global image dimensions (height and width extracted in step S1), the relative path of the original image, and a list of shapes consisting of all valid polygons (each shape contains a category label and a vertex array of points). Finally, the tree is serialized and saved as a .json vector file.
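As an illustration of the structure described above, the sketch below assembles a dictionary using the field names found in Labelme annotation files (`shapes`, `imagePath`, `imageHeight`, `imageWidth`, and per-shape `label`, `points`, `shape_type`). The function name, the default label "cell", the example file name, and the version string are placeholders assumed for this example, not values taken from the source.

```python
import json

def build_annotation(image_path, height, width, polygons, label="cell"):
    """Assemble a Labelme-style annotation dictionary (illustrative only)."""
    return {
        "version": "5.0.0",          # placeholder version string
        "flags": {},
        "shapes": [
            {
                "label": label,      # hypothetical category label
                "points": [[float(x), float(y)] for x, y in vertices],
                "group_id": None,
                "shape_type": "polygon",
                "flags": {},
            }
            for vertices in polygons
        ],
        "imagePath": image_path,     # relative path of the original image
        "imageData": None,           # image bytes are not embedded
        "imageHeight": height,
        "imageWidth": width,
    }

doc = build_annotation("smear_001.png", 6, 8, [[(1, 1), (4, 1), (4, 4), (1, 4)]])
text = json.dumps(doc, indent=2)     # this string is what gets saved as .json
restored = json.loads(text)
```

Leaving `imageData` empty keeps the file small, since only vertex coordinates and metadata are stored.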

[0039] Application expansion: Medical personnel only need to load the JSON file in a lightweight, Chinese-language general-purpose annotation software, and they can directly add, delete, modify, and fine-tune pathological cells on the original image by manipulating just a few anchor points (for example, selecting and deleting red blood cell backgrounds with one click), achieving human-computer collaborative interaction with extremely low cognitive load.

[0040] Meanwhile, bone marrow smears often show overlapping and adhering cells or impurities embedded within cells. In traditional pixel-level annotation, overlapping areas are extremely difficult to handle, and the masks of adjacent cells easily overwrite each other.

[0041] Regarding the problems caused by the overlapping / nesting of dense medical cells, since this system naturally supports polygon overlay and union calculation, medical personnel only need to drag the polygon anchor points at the front end to geometrically separate the overlapping parts.

[0042] Moreover, traditional AI annotation software often requires medical personnel to redraw a box and label it as "background" when dealing with "false positives" (that is, misidentifying background impurities as cells), which places an extremely high cognitive and operational burden on them.

[0043] To address this issue, the human-computer collaborative annotation method of this invention establishes an implicit mapping between the front-end "deletion action" and the back-end "loss calculation". In the lightweight front-end software, medical personnel only need to select the redundant, erroneous cell polygons, perform a "delete" operation, and save the JSON file. The mapping is established as follows:

1. Recording the "missing state" of front-end operations: When medical personnel select a redundant cell polygon and perform a "delete" operation in the lightweight front-end software, the software automatically removes the corresponding vertex coordinate array from the shapes list when saving the JSON vector file.

2. Back-end reverse rasterization reconstruction: Before entering the closed-loop fine-tuning stage, the system back end parses the modified JSON file. It first initializes an all-zero background tensor (Empty Mask, all pixel values are 0) in memory, with exactly the same size as the original image. It then traverses the remaining valid polygon vertices in the JSON and calls a computer vision filling operator (such as cv2.fillPoly) to reverse rasterize the retained vector coordinates, writing positive integer labels (such as 1, 2, 3...) onto the all-zero tensor.

3. Implicit generation of negative samples and loss calculation: Because the polygons deleted in the front end no longer take part in the back end's cv2.fillPoly filling, their regions remain "0 (background)" in the reconstructed mask tensor. This reconstructed mask is then fed into the instance segmentation neural network as the ground truth for fine-tuning. When the loss function is calculated, the high-probability responses (false positives) that the network originally predicted in these regions conflict strongly with the current ground-truth label "0", which produces a large error penalty gradient.

4. Weight update: The penalty gradient produced by these empty-mask regions takes part in backpropagation, which suppresses the model's over-activation on background noise. Hard example mining and model weight correction are thus completed implicitly, without anyone having to explicitly select a region and label it "this is background".
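The size of this penalty can be made concrete with a short worked example. The text does not name the loss function, so the following assumes, purely for illustration, a per-pixel binary cross-entropy on a sigmoid output $p = \sigma(z)$, with ground truth $y$:

```latex
\[
\mathcal{L}(p, y) = -\bigl[\, y \ln p + (1 - y) \ln (1 - p) \,\bigr],
\qquad
\frac{\partial \mathcal{L}}{\partial z} = p - y .
\]
% Pixel inside a deleted polygon: ground truth y = 0.
\[
p = 0.95:\quad \mathcal{L} = -\ln(0.05) \approx 3.00,\qquad
\frac{\partial \mathcal{L}}{\partial z} = 0.95
\]
\[
p = 0.05:\quad \mathcal{L} = -\ln(0.95) \approx 0.05,\qquad
\frac{\partial \mathcal{L}}{\partial z} = 0.05
\]
```

Under this assumption, a confident false positive in a deleted region costs roughly sixty times more than a correct background prediction, and its gradient pushes the logit down about nineteen times harder. This is the sense in which the deletion alone acts as a hard negative.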

[0044] Step S7: Adaptive data downsampling fine-tuning mechanism based on a hardware safety threshold (OOM protection). Medical images have extremely high resolution. During the "closed-loop fine-tuning" stage, if a medical institution imports hundreds or thousands of original images together with their corrected annotations for reverse rasterization and neural network training, this can easily cause the computer to run out of memory (Out of Memory, OOM) and crash the software.

[0045] Technical approach employed: Before any subsequent incremental fine-tuning (training) of the model, the system applies a "system-level resource overload interceptor". This interceptor sits downstream of the "topology gatekeeper and serialization output module" and, during fine-tuning, acts as the pre-filter (gatekeeper) of the "closed-loop fine-tuning module". A dynamic or fixed hardware safety threshold is set (e.g., MAX_IMAGES_LIMIT = 200). The interceptor is activated once the system receives a training instruction containing original images and valid JSON files, and before the large training tensor is constructed in memory. It first traverses and verifies the imported image and JSON file pairs and counts them. If the number of pairs N exceeds the safety threshold (e.g., N > 200), it immediately stops the full data set from being loaded into memory, intercepts the full training stream, and forces a "random sampling strategy" or "hard example mining strategy" to be enabled. Only a safe number of samples, no more than the threshold, is passed from the oversized set to the training tensor construction queue and on to the subsequent "reverse rasterization" and "network forward / backward propagation" stages.
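A minimal sketch of such an interceptor is given below. It covers only the random sampling branch; the hard example mining branch is not shown because the text does not specify how hard examples are scored. The function name intercept_training_pairs, the seed parameter, and the file names are hypothetical.

```python
import random

MAX_IMAGES_LIMIT = 200  # example threshold taken from the text


def intercept_training_pairs(pairs, limit=MAX_IMAGES_LIMIT, seed=None):
    """Return at most `limit` (image, json) pairs before any tensor is built.

    Only file names are handled here, so no image data is loaded into
    memory while the decision is being made.
    """
    pairs = list(pairs)
    if len(pairs) <= limit:
        return pairs                      # within the safe range: pass through
    rng = random.Random(seed)             # seeded only to make runs repeatable
    return rng.sample(pairs, limit)       # random sampling without replacement


# 1000 imported pairs exceed the threshold, so only 200 are forwarded.
imported = [(f"img_{i}.jpg", f"img_{i}.json") for i in range(1000)]
selected = intercept_training_pairs(imported, seed=0)
```

Counting file pairs rather than bytes is the criterion the text describes; a threshold derived from available memory would be an equally plausible variant of the "dynamic" threshold it mentions.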

[0046] This achieves the following technical effects: it completely solves the problem of the underlying training engine crashing due to the uncontrollable amount of medical data during localized lightweight deployment, and greatly improves the industrial-grade robustness of the system.

[0047] Step S8: Small-sample closed-loop fine-tuning based on inverse rasterization and momentum gradient protection. Once the system receives a valid JSON vector file modified by medical personnel on the lightweight front end, it initiates a closed-loop fine-tuning process. Technical implementation (1), reverse rasterization reconstruction: The system parses the polygon anchor point coordinates in the JSON, initializes an all-zero tensor in memory, and calls a computer vision filling operator (such as cv2.fillPoly) to reverse rasterize the vector coordinates and restore them to a high-fidelity binary mask tensor.

[0048] Technical implementation (2), momentum gradient descent and anti-forgetting update: The reconstructed mask tensor (including the strong negative samples formed by empty masks) is fed into the instance segmentation engine as fine-tuning data. When network weights are updated through backpropagation, the system is forced to use stochastic gradient descent with momentum and weight decay (SGD with Momentum and Weight Decay) instead of an adaptive optimizer (such as Adam). By fixing the learning rate and the momentum coefficient (the historical inertia), the step size of each gradient update is constrained. While the geometric features of newly added target cells are absorbed, the general cytological identification weights of the base model are protected, which prevents the catastrophic forgetting caused by abrupt, large gradient updates and helps the updated model generalize robustly in complex medical and pathological scenarios.
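The update rule itself can be written out in a few lines. The sketch below uses the common formulation of SGD with momentum and L2 weight decay (the same form as torch.optim.SGD without dampening or Nesterov momentum). The text gives no hyperparameter values, so the learning rate, momentum, and weight decay shown here are illustrative assumptions, and sgd_momentum_step is a hypothetical name.

```python
def sgd_momentum_step(weights, grads, velocity,
                      lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One SGD update with momentum and L2 weight decay on flat lists."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w          # weight decay pulls w toward zero
        v = momentum * v + g              # velocity accumulates past gradients
        new_velocity.append(v)
        new_weights.append(w - lr * v)    # fixed learning rate bounds the step
    return new_weights, new_velocity


# Two steps with a constant gradient show the velocity building up gradually.
weights, velocity = [1.0, -2.0], [0.0, 0.0]
grads = [0.5, 0.5]
for _ in range(2):
    weights, velocity = sgd_momentum_step(
        weights, grads, velocity, lr=0.1, momentum=0.9, weight_decay=0.0)
```

Unlike Adam, the step here is not rescaled per parameter by a running estimate of gradient magnitude, so a handful of corrected samples with unusual gradients cannot produce disproportionately large updates on individual weights.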

[0049] Meanwhile, in response to the problems caused by the overlapping / nesting of the aforementioned dense medical cells, the inventors proposed a "Z-axis hierarchical reverse rendering" technology: when the "reverse rasterization reconstruction" step of the closed-loop fine-tuning is started, the system calls a computer vision filling operator (such as cv2.fillPoly) to render the overlapping polygons sequentially from bottom to top according to the primitive order and hierarchical relationship recorded in the front-end JSON.
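The bottom-to-top rendering can be illustrated with a small pure-Python rasterizer. This is a sketch only: point_in_polygon is a simple even-odd test at pixel centres that stands in for cv2.fillPoly (whose boundary handling differs), and the (label_id, polygon) pair format is a hypothetical simplification of the shapes list in the JSON.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test for a point against a closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge crosses the scanline
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_in_order(shapes, height, width):
    """Paint polygons onto an all-zero mask; later shapes overwrite earlier ones."""
    mask = [[0] * width for _ in range(height)]
    for label_id, polygon in shapes:                   # list order = bottom to top
        for row in range(height):
            for col in range(width):
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = label_id
    return mask


# Two overlapping square "cells"; the overlap covers columns 3-4, rows 1-3.
cell_a = (1, [(0, 0), (5, 0), (5, 5), (0, 5)])
cell_b = (2, [(3, 1), (8, 1), (8, 4), (3, 4)])
mask_ab = rasterize_in_order([cell_a, cell_b], height=6, width=8)  # b on top
mask_ba = rasterize_in_order([cell_b, cell_a], height=6, width=8)  # a on top
mask_a_only = rasterize_in_order([cell_a], height=6, width=8)      # b deleted
```

In mask_ab the overlapping pixels carry label 2, in mask_ba they carry label 1, so the order recorded in the JSON decides which cell occludes the other. In mask_a_only the pixels that belonged only to cell_b stay 0, which is the implicit background label produced by a front-end deletion.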

[0050] This achieves the following technical effects: it solves the problem of "target overlap and occlusion" that is characteristic of medical microscopic images, so that the overlaps and intersections of front-end polygons are mapped losslessly to the topological relationships of the high-dimensional tensors in the back end.

[0051] In addition, to address the cognitive and operational burden caused by "false positive" samples, the system detects which of the original connected components no longer have coordinates when it reverse-parses the modified valid JSON vector file. At the position of each such original mask, it initializes a pure black tensor (Empty Mask). When the loss function is calculated during fine-tuning, this pure black tensor is fed into the instance segmentation engine as a "Hard Negative Sample".

[0052] This achieves the following technical effects: It greatly innovates the human-computer interaction logic: the complex negative sample labeling action is reduced to a "one-click deletion" action that best aligns with human intuition. It endows the model with the ability to learn autonomously and filter out complex medical background impurities without the need for manual redrawing of pixels.

[0053] II. A mask vectorization annotation system.

[0054] The system is instantiated in the computer's memory and processor as the following 5 core modules (as shown in Figure 2): 1. Medical Image Parsing Module: This module receives raw cell image sets from microscopes or cloud storage, performs multi-format (Raw / TIFF / JPG) decoding and adaptation, and outputs standard-sized RGB three-channel pixel tensors.

[0055] 2. Instance Segmentation Engine: Embedded with a deep learning network (such as U-Net architecture) trained on medical data, responsible for performing forward propagation of the spatial flow field on the input pixel tensor to generate a multi-class label two-dimensional mask matrix that distinguishes different independent cell entities.

[0056] 3. Polygon Approximation and Vector Compression Module: This module separates the binary connected domains of individual cells, performs edge tracking to extract the outer contour, and uses the Ramer-Douglas-Peucker (RDP) algorithm with a dynamic perimeter-based threshold to fit key anchor points and smooth jagged edges, transforming dense pixels into a sparse two-dimensional coordinate array.

[0057] 4. Topological Gatekeeper & Serialization Module: This module serves as the secure output gateway for the system. It enforces the geometric validity of each vertex array (intercepting abnormal primitives where the length L of the polygon vertex array is less than 3), assembles valid primitives into a vector format protocol tree conforming to lightweight front-end rendering specifications, and finally serializes it into a very small JSON file for output to the local machine or a medical collaboration database.

[0058] 5. Closed-loop fine-tuning module: This includes an inverse rasterization module and a momentum gradient descent and anti-forgetting update module. The inverse rasterization module parses the polygon anchor point coordinates in a valid JSON vector file and calls a computer vision filling operator (such as cv2.fillPoly) to inverse rasterize the vector coordinates back into a high-fidelity binary mask tensor. The momentum gradient descent and anti-forgetting update module feeds the high-fidelity binary mask tensor (containing strong negative samples of empty masks) obtained above as fine-tuning data into the instance segmentation engine. During the backpropagation update of network weights, a stochastic gradient descent algorithm with momentum and weight decay is used, and the gradient update step size is constrained by a fixed learning rate and momentum inertia.

[0059] Based on this, the mask vectorization annotation system of the present invention includes two independent data flows on two timelines: 1. Downstream flow (AI pre-annotation output): Image → Image parsing module → Instance segmentation inference engine → Polygon approximation and vector dimensionality reduction module → Topology gatekeeper and serialization output module → Output standardized JSON tree; 2. Upstream flow (data feedback after correction by medical personnel): Traverse and verify the valid JSON vector files modified by medical personnel together with the original images → System-level resource overload interceptor → Inverse rasterization module → Momentum gradient descent and anti-forgetting update module → Fine-tuning update of the underlying model.

[0060] III. Summary

[0061] Key point one of this invention: Mask dimensionality reduction and vectorization reconstruction technology based on a polygon approximation algorithm. Existing implementation methods: After inference, existing medical cell instance segmentation models (such as Cellpose) directly output a dense two-dimensional mask matrix (usually stored in .npy or .tiff format) with the same size as the original image. If modification is required, the technical means is to call a specific image processing library to overwrite the mask values pixel by pixel in the pixel coordinate system.

[0062] The implementation scheme of the present invention: After the model outputs the mask matrix, the present invention adds an independent vectorization dimensionality reduction transformation layer.

[0063] The specific technical approach is as follows: First, the set of outer edge pixels of the binary connected domain of each cell instance is extracted by traversal. Then, the Douglas-Peucker polygon approximation algorithm is applied: it computes perpendicular distances against a dynamic, perimeter-based threshold, removes redundant pixels within the approximation error range, and compresses hundreds of thousands of pixel coordinates, with high fidelity, into a sparse vertex array containing only dozens of two-dimensional floating-point coordinates. Finally, the array is serialized and assembled into a JSON structure.
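The approximation step can be sketched in pure Python as follows. The threshold is the smoothing coefficient alpha times the contour perimeter, as the claims describe, and alpha = 0.01 is an arbitrary value inside the claimed 0.0005 to 0.05 range. The way the closed contour is split into two open chains is an assumption made for this sketch. In practice a library routine such as OpenCV's cv2.arcLength together with cv2.approxPolyDP would normally be used instead.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [a, b]                       # everything in between is redundant
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right                # keep the farthest point exactly once


def arc_length(contour):
    """Perimeter of a closed contour, including the closing segment."""
    n = len(contour)
    return sum(math.dist(contour[i], contour[(i + 1) % n]) for i in range(n))


def simplify_closed_contour(contour, alpha=0.01):
    """Simplify a closed contour with threshold alpha * perimeter."""
    epsilon = alpha * arc_length(contour)
    # Split at the point farthest from the start so both chains are open.
    far = max(range(len(contour)), key=lambda i: math.dist(contour[0], contour[i]))
    first = rdp(contour[: far + 1], epsilon)
    second = rdp(contour[far:] + [contour[0]], epsilon)
    return first[:-1] + second[:-1]


# Pixel-by-pixel outline of a 20 x 20 square: 80 boundary points.
side = 20
contour = ([(x, 0) for x in range(side)] + [(side, y) for y in range(side)]
           + [(x, side) for x in range(side, 0, -1)]
           + [(0, y) for y in range(side, 0, -1)])
simplified = simplify_closed_contour(contour, alpha=0.01)
```

For this outline the 80 traced points reduce to the four corners of the square. Because the threshold scales with the perimeter, large and small cells are simplified to a comparable relative tolerance.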

[0064] The specific differences between the two are as follows: existing technologies directly output and manipulate the underlying "dense pixel tensor", relying on the heavy rendering front-end of academia; while this invention adds a mathematical dimensionality reduction process, using the "curve smoothing fitting algorithm (RDP)" to transform pixel edges into sparse "vector polygon anchor points", achieving lightweight cross-platform compatibility from the data structure level.

[0065] The second key point of this invention: a topology anomaly interception and secure access mechanism for lightweight graphics engines. Existing implementation methods: Open-source contour extraction scripts commonly used in the industry (such as those built on OpenCV's findContours) apply an "unconditional transformation". That is, no matter how small the connected components output by the model are (even if they are only 1-2 noisy pixels), their coordinates are extracted as is and written directly to a JSON file.

[0066] The implementation scheme of this invention: After the vertex array is generated but before it is written to the JSON file, a mandatory topological gatekeeper module is added. Specifically, the code includes array length verification logic to strictly determine the length L of each extracted vertex array. If L ≥ 3, it is determined to be a valid polygon that can form a closed plane and is allowed to be written; if L < 3, it is determined to be "pseudo-connected noise" caused by coloring impurities or AI misjudgment, and an exception interception mechanism is triggered to silently remove it from the memory stack.
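The length check amounts to a short filter applied before serialization. The sketch below is illustrative; the function name filter_valid_polygons and the returned count of dropped arrays are hypothetical additions (the text only states that invalid arrays are discarded silently).

```python
def filter_valid_polygons(vertex_arrays, min_vertices=3):
    """Keep only vertex arrays that can form a closed polygon (L >= 3)."""
    valid, dropped = [], 0
    for vertices in vertex_arrays:
        if len(vertices) >= min_vertices:
            valid.append(vertices)        # allowed through to the JSON shapes list
        else:
            dropped += 1                  # a point or a bare segment: discard
    return valid, dropped


candidates = [
    [(10, 10), (40, 12), (25, 35)],       # triangle: valid
    [(5, 5)],                             # single noisy pixel: invalid
    [(7, 7), (8, 8)],                     # two-point segment: invalid
]
valid_shapes, dropped_count = filter_valid_polygons(candidates)
```

Returning the dropped count leaves the discard itself silent while still allowing the caller to log how much noise was filtered, should that be useful.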

[0067] The specific differences between the two are as follows: Existing technologies lack the legality verification of graphical topology, which makes it easy to trigger low-level parsing anomalies when the generated invalid structures (points or unclosed line segments) are input into a demanding polygon rendering engine (such as Labelme); This invention hardcodes a conditional filtering statement based on L≥3 before the structured data is written to disk, thereby blocking the risk of front-end software crashes from the source of the data flow.

[0068] Key Point 3: Closed-loop fine-tuning mechanism for medical data based on vector inverse rasterization and momentum gradient protection. Existing implementation methods: Current frameworks typically retrain on the full dataset, or default to adaptive learning rate optimizers such as Adam during fine-tuning in pursuit of fast convergence. For medical images, this strategy is prone to getting stuck in an extremely narrow "sharp minimum", resulting in poor generalization; and when only a very small number of corrected samples is available, aggressive parameter updates can cause the model to lose the recognition ability of its original base model.

[0069] The implementation scheme of this invention: This invention innovatively combines "vector inverse rasterization" with "momentum gradient protection strategy". The system can not only inversely restore JSON anchors into tensor matrices to participate in loss calculation, but also strictly limit the optimizer to SGD+Momentum at the underlying level.

[0070] The specific differences between the two are as follows: Existing technologies lack a protection mechanism for "medical small-sample incremental learning" at the underlying algorithm level and rely too heavily on the fast convergence of adaptive optimizers, at the cost of robustness. This invention constrains the gradient descent path with momentum inertia, which suppresses overfitting to the small number of newly added corrected error samples, and achieves a smooth improvement of the model's resistance to impurities without rebuilding the underlying pre-trained computation graph.

[0071] In summary, compared with existing technologies, this invention has the following significant advantages: First, this invention achieves extremely lightweight medical data interaction and absolutely stable cross-platform compatibility, reducing the time cost of expert annotation by more than 90%. Existing technologies directly output dense pixel matrices, resulting in large files and easily causing general-purpose lightweight front-end rendering crashes. The reason this invention can achieve the above advantages is because it adopts a different technical means from existing technologies, namely, a "polygon approximation algorithm (such as the RDP algorithm)," which reduces and compresses a high-dimensional mask of hundreds of thousands of pixels into a sparse vector JSON sequence of dozens of anchor points with high fidelity. At the same time, this invention innovatively introduces a "topology anomaly verification and filtering mechanism based on the number of vertices L≥3" before output, eliminating illegal pseudo-connected components caused by AI noise at the source, completely blocking the risk of parsing crashes in lightweight front-ends, allowing medical personnel to complete modifications that originally required hundreds or thousands of clicks simply by dragging / deleting a few anchor points.

[0072] Secondly, this invention achieves low-cost, automated evolution of the anti-interference capability of medical AI models (data flywheel closed loop). Existing technologies typically only output pixel prediction results in one direction, and when faced with incorrectly identified negative samples (such as incorrectly cut red blood cells), they can only be simply discarded, failing to be utilized efficiently. This invention achieves this advantage by employing a technique combining "lightweight vector inverse rasterization (JSON to Mask)" with "forced empty mask matrix participation in loss function calculation." It can losslessly reconstruct the action of medical personnel simply pressing the "Delete" key in the front-end software to remove redundant polygons into a high-dimensional negative sample tensor, and forcefully feed this back to the model for hard example mining (Hard Negative Mining) fine-tuning. This technique endows the model with the ability to autonomously learn and shield itself from complex medical background impurities without requiring manual redrawing of pixels.

[0073] Experimental Example 1. Explanation of the improvement in lightweight annotation interaction efficiency based on the method of this invention (see Figure 3). Background: In routine medical bone marrow smears, a single field-of-view image (e.g., 840×800 resolution, see Figure 3) typically contains hundreds of densely distributed cell targets. With traditional manual pixel-level outlining or bounding box annotation, annotating a single image is extremely time-consuming. Even with commercially available interactive AI-assisted annotation tools, annotators still need to select or prompt cells one by one to generate outlines. Because the boundaries of dense cells are blurred, the outlines generated on the fly by AI are often inaccurate and require tedious secondary dragging and correction of irregular polygon edge nodes. Faced with hundreds of targets in a single image, this workflow of "serial interaction and fine-tuning target by target" offers only a very limited reduction in overall annotation time.

[0074] Effects of implementing this invention: As shown in Figures 3-5, Figures 3(a), 4(a), and 5(a) are raw Wright-stained medical cell images acquired by a microscope; Figures 3(b), 4(b), and 5(b) are the polygon annotation results that, after processing by the "instance mask dimensionality reduction and vectorization reconstruction" technology of this invention, are generated automatically and rendered losslessly in a single pass in a lightweight front-end software (such as X-AnyLabeling). Figures 3-5 provide three sets of actual usage results.

[0075] Data Support: This invention constructs a seamless workflow integrating "second-level pre-computation" and "minimalist manual quality inspection." In actual deployment, AI instance segmentation and topology vectorization for a single high-resolution image containing hundreds of dense cells complete fully automatically within 10 seconds. When medical personnel load the resulting lightweight JSON vector file, the target cells on the original image already carry automatically generated polygonal outlines. Experts only need to perform a minimal operation of "scanning and verifying, then clicking to delete redundancy (such as red blood cells)." Clinical testing has shown that this technology reduces the manual intervention time for a single image from the traditional average of 45 minutes for sketching from scratch to 1-3 minutes, while maintaining extremely high pathological accuracy and reducing the manual interaction load by more than 90%.

[0076] 2. Description of reverse rasterization and rapid model iteration verification based on the present invention (see Figures 6-9). Implementation background: To verify the closed-loop data flywheel mechanism of the invention, in which vector JSON is reverse-reconstructed into a mask tensor for fine-tuning the model.

[0077] Effects of implementing this invention: Figures 6-9 show the segmentation performance, on a real test set, of the V1 bone marrow cell segmentation model iterated with the mechanism of this invention. The background of each image contains a large number of normal red blood cells that do not require attention, while the bright areas represent the pathological target cells that need to be extracted.

[0078] Data Support: Using the method proposed in this invention ("obtaining expert-corrected JSON → forcibly re-rendering as an empty mask → introducing model loss calculation"), the system successfully trained a customized model with extremely strong "negative sample resistance" on a very small (few-shot) dataset of fewer than 50 samples. The prediction results in Figures 6-9 show that the model accurately identified the target cell population while achieving near 100% silent filtering of the redundant red blood cell background. These experimental results demonstrate the core value of this invention in accelerating medical AI data processing and model evolution.

[0079] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0080] The embodiments described above are merely illustrative of several implementations of the present invention, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this invention patent should be determined by the appended claims.

Claims

1. A mask vectorization and lightweight human-computer collaborative annotation method, characterized in that it includes the following steps: Image acquisition and preprocessing: The original image is decoded and normalized to convert it into a pixel matrix, and the image parameters are extracted; Instance segmentation mask generation based on deep learning: The image tensor of the pixel matrix is input into the instance segmentation neural network, and through the calculation of a spatial topological flow field and a cell probability map, each independent instance is assigned a positive integer label and a high-dimensional two-dimensional mask matrix is output; Mask decoupling and edge contour extraction of binary connected domains: For each positive integer label of the high-dimensional two-dimensional mask matrix, a temporary binary mask is generated, and the outermost contour boundary of the connected domain in the temporary binary mask is extracted by a topological edge detection algorithm to obtain the original pixel set; Polygon topology dimensionality reduction and contour smoothing fitting: The original pixel set is compressed and reduced in dimensionality by a polygon approximation algorithm to obtain a lightweight polygon vertex array; Geometric topology anomaly verification and security interception: The lightweight polygon vertex array is subjected to anomaly verification and security interception using a topology anomaly filter to obtain a valid polygon vertex array; Encapsulation and disk storage of lightweight vector structured files: The valid polygon vertex array is standardized and assembled to obtain a standardized JSON tree, the standardized JSON tree is output for front-end rendering and human interaction, and the valid JSON vector file returned after interactive correction is received; the JSON tree includes: the parameters, the relative path of the original image, and a list of the valid polygon vertex arrays.

2. The mask vectorization and lightweight human-computer collaborative annotation method according to claim 1, characterized in that, In the image acquisition and preprocessing steps, the normalization process includes color space normalization, the pixel matrix includes a pixel matrix in a three-channel color space, and the parameters include absolute height and width. In the deep learning-based instance segmentation mask generation step, each independent instance is assigned a unique positive integer label, and the positive integer labels of all instances are an increasing sequence.

3. The mask vectorization and lightweight human-computer collaborative annotation method according to claim 1, characterized in that, in the polygon topology dimensionality reduction and contour smoothing fitting step, the polygon approximation algorithm includes the Douglas-Peucker algorithm. The compression dimensionality reduction includes: calculating the perimeter of the original pixel contour, setting a dynamic approximation accuracy threshold based on the perimeter, comparing the perpendicular distance from each original pixel point to the baseline segment with the dynamic approximation accuracy threshold using the Douglas-Peucker algorithm, using the dynamic approximation accuracy threshold as the judgment criterion, retaining the key anchor point with the largest perpendicular distance through recursion, and eliminating redundant pixel points within the approximation error range to obtain a lightweight polygon vertex array.

4. The mask vectorization and lightweight human-computer collaborative annotation method according to claim 3, characterized in that the dynamic approximation accuracy threshold is calculated as follows: ε = α × ArcLength; where ε is the dynamic approximation accuracy threshold; α is the smoothing coefficient, with a value between 0.0005 and 0.05; and ArcLength is the perimeter of the original pixel contour.

5. The mask vectorization and lightweight human-computer collaborative annotation method according to claim 1, characterized in that, In the geometric topology anomaly verification and security interception steps, the anomaly verification and security interception include: detecting the length L of the polygon vertex array; when the length L≥3, the polygon vertex array is determined to be valid, allowed, and pushed onto the valid shape stack; when the length L<3, it is determined to be an illegal topology structure and silently discarded.

6. The mask vectorization and lightweight human-computer collaborative annotation method according to any one of claims 1-5, characterized in that the method also includes a small-sample closed-loop fine-tuning step based on inverse rasterization and momentum gradient protection, which is located after the encapsulation and disk storage step of the lightweight vector structured file; the small-sample closed-loop fine-tuning step based on inverse rasterization and momentum gradient protection includes: parsing the polygon anchor point coordinates of the valid JSON vector file, initializing an all-zero tensor, and using a computer vision filling operator to inverse rasterize the vector coordinates to restore them to a high-fidelity binary mask tensor; feeding the high-fidelity binary mask tensor into the instance segmentation neural network for updating; and using a stochastic gradient descent algorithm with momentum and weight decay during backpropagation to update the network weights, with the gradient update step size constrained by a fixed learning rate and momentum inertia.

7. A mask vectorization annotation system, characterized in that the mask vectorization and lightweight human-computer collaborative annotation method according to any one of claims 1-6 is implemented by the mask vectorization annotation system, which includes an image parsing module, an instance segmentation inference engine, a polygon approximation and vector dimensionality reduction module, and a topology gatekeeper and serialization output module; the image parsing module is used to input the original image, decode the image, and output a pixel matrix; the instance segmentation inference engine embeds an instance segmentation neural network, which is used to input the image tensor of the pixel matrix, perform forward propagation of the spatial flow field, and output a high-dimensional two-dimensional mask matrix; the polygon approximation and vector dimensionality reduction module is used to separate the binary mask of independent instances in the high-dimensional two-dimensional mask matrix, perform edge tracking, extract contour boundaries, and use a polygon approximation algorithm to fit key anchor points and perform smooth compression to obtain a lightweight polygon vertex array; the topology gatekeeper and serialization output module is used to forcibly verify the geometric validity of the lightweight polygon vertex array, assemble a standardized JSON tree, output the standardized JSON tree for front-end rendering and human interaction, and receive a valid JSON vector file returned after interactive correction.

8. The mask vectorization annotation system according to claim 7, characterized in that, The mask vectorization annotation system also includes a closed-loop fine-tuning module, which is used to parse the valid JSON vector file, use a computer vision filling operator to inversely rasterize the vector coordinates to restore them into a high-fidelity binary mask tensor, and feed it into the instance segmentation neural network for updating.

9. The application of the mask vectorization and lightweight human-computer collaborative annotation method according to any one of claims 1-6 or the mask vectorization annotation system according to any one of claims 7-8 in cell image annotation.

10. A method for cell image annotation, characterized in that, The mask vectorization and lightweight human-computer collaborative annotation method according to any one of claims 1-6 or the mask vectorization annotation system according to any one of claims 7-8 is used for annotation.