Multimodal large-model artificial intelligence surgical navigation system based on EdgeAI and AIGC
A multimodal large-model surgical navigation system based on EdgeAI and AIGC uses the ONE-PEACE-SZYYv3 algorithm and edge computing to address the dependence of surgical equipment on operator skill, helping doctors at different levels improve surgical quality and enabling efficient treatment in special scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SUZHOU SHUZHI YUANYU ARTIFICIAL INTELLIGENCE TECH CO LTD
- Filing Date
- 2023-12-27
- Publication Date
- 2026-06-26
AI Technical Summary
The reliance on the operator's skill level for existing surgical equipment leads to significant differences in the quality of surgeries performed by doctors at different skill levels. This is especially true in resource-scarce remote and wartime environments, making it difficult to achieve the digital dissemination of top-level surgical expertise.
A multimodal large-model artificial intelligence surgical navigation system based on EdgeAI and AIGC is adopted. The system collects multimodal data, trains on it with the ONE-PEACE-SZYYv3 algorithm, and is deployed in an edge computing environment using pruning and distillation, so that it can provide real-time surgical navigation guidance.
It has enabled the digital dissemination of surgical skills from top surgical experts, improved the quality of surgery for doctors at different levels, especially in remote and wartime environments, and alleviated the problem of resource scarcity.
[Figure CN117481814B_ABST: abstract drawing]
Abstract
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence technology, and in particular to a multimodal large-model artificial intelligence surgical navigation system based on EdgeAI and AIGC.
Background Technology
[0002] EdgeAI technology is the implementation of artificial intelligence in an edge computing environment, also known as edge AI. Its computation process occurs near the user at the network edge, close to where the data resides, rather than centralized in cloud computing facilities or private data centers. Devices can make faster and more informed decisions within milliseconds without relying on the internet or the cloud.
[0003] With the explosive growth of mobile computing and Internet of Things (IoT) applications, billions of mobile and IoT devices are connecting to the internet, generating massive amounts of data at the network edge. This results in extremely high latency and network bandwidth consumption associated with the collection of massive amounts of data in cloud computing centers, making it imperative to push artificial intelligence computing loads to the network edge to fully unleash the potential of big data. Edge AI, a combination of edge computing and artificial intelligence, is a key concept in state-of-the-art artificial intelligence.
[0004] AIGC is a generative technology in the field of artificial intelligence, a method of content production that discovers rules through data and automatically generates content. Unlike traditional computer intelligence, AIGC technologies, represented by ChatGPT, are increasingly demonstrating near-human levels of understanding and creativity.
[0005] Multimodality is an important concept in the field of artificial intelligence. It refers to the fusion of information or perceptual data from multiple modalities to achieve near-human-level reasoning and decision-making. Multimodal systems are built around three basic elements: an input module, a fusion module, and an output module. The input module is a set of neural networks that can receive and process multiple data types. The fusion module integrates and processes the relevant data from each data type. The output module generates outputs that contribute to a holistic understanding of the data.
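The three-module structure described above can be illustrated with a minimal sketch. It is not taken from the invention: the toy encoders, the mean-based fusion, and the summing output head are all hypothetical stand-ins chosen only to show how data flows from input to fusion to output.

```python
from typing import Callable, Dict, List

Vector = List[float]

# Input module: one toy encoder per modality (hypothetical stand-ins for neural networks).
def encode_text(text: str) -> Vector:
    return [float(len(text)), float(len(text.split()))]

def encode_audio(samples: List[float]) -> Vector:
    return [sum(samples) / len(samples), max(samples)]

ENCODERS: Dict[str, Callable[..., Vector]] = {"text": encode_text, "audio": encode_audio}

# Fusion module: element-wise mean of the per-modality feature vectors.
def fuse(features: Dict[str, Vector]) -> Vector:
    vectors = list(features.values())
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all modality features must have the same length")
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

# Output module: reduces the fused vector to one score (placeholder for a task head).
def output_head(fused: Vector) -> float:
    return sum(fused)

def run_pipeline(raw: Dict[str, object]) -> float:
    features = {name: ENCODERS[name](data) for name, data in raw.items()}
    return output_head(fuse(features))
```

In a real system each encoder would be a trained network and the fusion step would usually be learned rather than a fixed mean; the sketch only fixes the interfaces between the three modules.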
[0006] The field of surgery has undergone two major transformations in its history. The first was the upgrade from traditional open surgery to laparoscopic minimally invasive surgery, and the second was the upgrade from laparoscopic minimally invasive surgery to robotic surgery. The third transformation, which is currently underway and is generating fierce international competition, is the upgrade from robotic surgery to intelligent surgery that combines cutting-edge technologies such as AI with surgical navigation systems.
Summary of the Invention
[0007] The purpose of this invention is to upgrade traditional surgical equipment into intelligent equipment that integrates cutting-edge technologies. It provides a multimodal large-model artificial intelligence surgical navigation system based on EdgeAI and AIGC, which solves the problem of operator skill dependence in existing surgical equipment.
[0008] The objective of this invention is achieved through the following technical solution: a multimodal large-model artificial intelligence surgical navigation system based on EdgeAI and AIGC, whose method includes the following steps. Step 1, multimodal data fusion and decomposition processing: massive intraoperative images are collected and classified according to surgical completion quality, and the data of each modality is added to form a unified, formatted multimodal intraoperative image database. Step 2: the AIGC-based ONE-PEACE-SZYYv3 large-model algorithm is trained on this large-scale data to generate and suggest key intraoperative elements. Step 3: the large model is deployed on edge computing using pruning and distillation, enabling real-time guidance so that surgeons can complete surgeries quickly, safely, and accurately even in offline, network-disconnected environments.
[0009] The quality of surgical completion is categorized into three grades: excellent, good, and suboptimal. Excellent represents very high surgical completion quality, but with a very high level of operational complexity; good represents good surgical completion quality with relatively low operational difficulty; suboptimal represents poor surgical completion quality, with significant intraoperative bleeding, excessively long surgical time, or intraoperative errors.
[0010] The modal data includes: intraoperative image data, intraoperative video data, intraoperative audio data, and intraoperative text data.
[0011] The key elements of the operation include: ideal anatomical path, recommended instruments, recommended operating area, and risk area warning.
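The quality grades, modalities, and key elements listed above suggest a simple record layout for the database and for the navigation output. The sketch below is one possible encoding, assuming hypothetical names (`Grade`, `Modality`, `IntraopRecord`, `NavigationAdvice`, `group_by_grade`) and file-path strings as asset references; the source does not specify a storage format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List

class Grade(Enum):
    EXCELLENT = "excellent"
    GOOD = "good"
    SUBOPTIMAL = "suboptimal"

class Modality(Enum):
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"
    TEXT = "text"

@dataclass
class IntraopRecord:
    """One database entry: a graded case with its per-modality assets."""
    case_id: str
    grade: Grade
    assets: Dict[Modality, str] = field(default_factory=dict)  # modality -> file path (assumed)

@dataclass
class NavigationAdvice:
    """The four key intraoperative elements the system is meant to suggest."""
    anatomical_path: str
    recommended_instruments: List[str]
    recommended_area: str
    risk_warnings: List[str]

def group_by_grade(records: List[IntraopRecord]) -> Dict[Grade, List[IntraopRecord]]:
    """Partition records by quality grade, keeping empty grades in the result."""
    groups: Dict[Grade, List[IntraopRecord]] = {g: [] for g in Grade}
    for record in records:
        groups[record.grade].append(record)
    return groups
```

Keeping the grade on each record makes it straightforward to select, for example, only excellent-grade cases when assembling training data.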
[0012] The offline, disconnected, isolated environments include: operating rooms without internet access, hospitals in remote areas, and hospitals in battlefield environments.
[0013] This invention has the following beneficial effects: 1. This invention can, to a certain extent, realize the digital dissemination of the medical skills of top surgical experts, alleviating the scarcity of top surgical experts in China; 2. This invention can address the problem of operator skill dependence in the surgical field. Surgical instruments of the same type and model can produce vastly different surgical quality depending on the surgeon's skill level. The widespread adoption of intelligent intraoperative navigation can help surgeons at all levels come as close as possible to an ideal surgery; 3. This invention can, to a certain extent, improve the level of treatment for surgical patients in special scenarios, especially in remote areas and field hospitals.
Description of the Drawings
[0014] Figure 1 is a schematic diagram of the multimodal data of the present invention.
[0015] Figure 2 is a functional diagram of the ONE-PEACE algorithm.
[0016] Figure 3 is a detailed diagram of the ONE-PEACE algorithm architecture.
[0017] Figure 4 is a schematic diagram of the ONE-PEACE-SZYYv3 algorithm architecture.
[0018] Figure 5 is a functional diagram of the ONE-PEACE-SZYYv3 algorithm.
[0019] Figure 6 is a schematic diagram of the cross-modal generator architecture and functions.
[0020] Figure 7 is a schematic diagram of the hospital deployment of the present invention.
Detailed Implementation
[0021] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of the present invention provided with reference to the accompanying drawings is not intended to limit the scope of protection of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort are within the scope of protection of the present invention. The present invention will be further described below with reference to the accompanying drawings.
[0022] This invention relates to a multimodal large-model artificial intelligence surgical navigation system based on EdgeAI and AIGC. Its multimodal data fusion and decomposition processing method is shown in Figure 1: a large number of intraoperative images are collected and classified into excellent, good, and suboptimal grades based on the quality of surgical completion. The excellent grade represents very high surgical completion quality, but with very high operative complexity; the good grade represents good surgical completion quality and relatively low operative difficulty; the suboptimal grade represents poor surgical completion quality, with significant intraoperative bleeding, excessively long operation time, or intraoperative errors.
[0023] The ONE-PEACE algorithm proposes a scalable, high-parameter large model that can be extended to fuse an unlimited number of modalities. Under the ONE-PEACE algorithm, text modal data and audio modal data can be used to generate image modal data accurately and in real time. For example, as shown in Figure 2, the input text data "beach" and audio data "dog barking" can accurately generate fused image modal data.
[0024] Figure 3 shows a detailed analysis of the ONE-PEACE algorithm architecture used in this invention. Under the ONE-PEACE architecture, the data input receives text modal data and audio modal data. After passing through a normalization layer (LN) and a linear layer (Linear), the model applies a multi-head attention mechanism with relative position encoding (RPB). Following further processing through multiple normalization layers (LN) and linear layers (Linear), the final image modal data is generated, which integrates key information from both the text and audio modal data.
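The building blocks named above (LN, Linear, and attention with a relative position term) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the ONE-PEACE implementation: it shows a single attention head rather than multiple heads, the normalization has no learned scale or shift, and the position term is assumed to be a scalar looked up by the offset between two token positions (the `rel_bias` dictionary is a hypothetical representation).

```python
import math
from typing import Dict, List

Vector = List[float]

def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one token vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_with_rpb(q: List[Vector], k: List[Vector], v: List[Vector],
                       rel_bias: Dict[int, float]) -> List[Vector]:
    """Single-head scaled dot-product attention.

    rel_bias maps the position offset (j - i) to a scalar added to the
    attention score between query i and key j; missing offsets add 0.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + rel_bias.get(j - i, 0.0)
            for j, kj in enumerate(k)
        ]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

A multi-head version would run this routine once per head on separate projections of the input and concatenate the results.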
[0025] The ONE-PEACE-SZYYv3 algorithm developed in this invention reverse-engineers the ONE-PEACE architecture and uses a self-developed cross-modal generator. It takes single-modal image data as input and outputs multimodal text and speech data, and it applies this technology to the field of surgical procedures for the first time. Its specific loss function is shown below:
[0026] In the above formula, N represents the batch size, i and j are two-dimensional indices, σ is a learnable variable, and φ is a constant used to prevent vanishing or exploding gradients. For example, given real-time intraoperative image data as input, the model can accurately generate multi-dimensional text modal data and audio modal data and, by integrating the various modalities, comprehensively guide doctors and provide early warnings in real time.
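The formula itself is not reproduced in this text, so it cannot be restated here. The symbols described (a per-batch quantity N, a pair of indices i and j, and a learnable σ) resemble those of a batch-wise contrastive loss with a learnable temperature. The sketch below shows a generic symmetric contrastive loss of that family purely as an assumption for orientation; it is not the patent's loss function, and φ is left out because its position in the original formula cannot be recovered.

```python
import math
from typing import List

Vector = List[float]

def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def _row_nll(sim: List[Vector], sigma: float) -> float:
    """Mean over rows i of -log softmax(sim[i] / sigma)[i]."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [sim[i][j] / sigma for j in range(n)]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n

def contrastive_loss(a: List[Vector], b: List[Vector], sigma: float) -> float:
    """Symmetric contrastive loss over a batch of N paired embeddings.

    a[i] and b[i] are treated as a matching pair; every a[i], b[j] with
    i != j is treated as a mismatch. sigma plays the role of a temperature.
    """
    sim = [[_dot(ai, bj) for bj in b] for ai in a]
    sim_t = [list(col) for col in zip(*sim)]  # b-to-a direction
    return 0.5 * (_row_nll(sim, sigma) + _row_nll(sim_t, sigma))
```

With matched pairs the loss is lower than with shuffled pairs, which is the property such a loss is trained to exploit.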
[0027] For example, as shown in Figure 4, when real-time intraoperative image data is input, it is processed by a cross-modal generator, multiple rounds of normalization layers (LN) and linear layers (Linear), and a multi-head attention mechanism with relative position encoding (RPB), after which the model can generate intraoperative navigation information based on the current situation.
[0028] Figure 5 gives a more intuitive demonstration of the effect of intraoperative guidance: the system displays the optimal anatomical path, hazard warning information, and recommended anatomical methods in real time during the operation.
[0029] Figure 6 shows the architecture and functions of the cross-modal generator in detail. The cross-modal generator has a single input end and a multi-dimensional output end. The input end receives single-modal image data and, after multiple convolution and pooling operations, encodes it into tensor-format data that can be mapped one-to-many. The tensor-format data then triggers the multimodal database, which, based on the trigger classification, outputs multi-dimensional text modal data and audio modal data in real time.
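The flow described above (convolution, pooling, a tensor code, then a database lookup that returns text and audio together) can be sketched as follows. This is a toy illustration under assumptions: one convolution and one pooling stage instead of several, a fixed rather than learned kernel, and nearest-neighbour matching as a stand-in for the unspecified trigger classification. All function names and the database layout are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]
Code = Tuple[float, ...]
Entry = Tuple[Code, Dict[str, str]]  # (key code, {"text": ..., "audio": ...})

def conv2d_valid(img: Image, kernel: Image) -> Image:
    """2-D cross-correlation without padding (the usual deep-learning 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [sum(img[r + a][c + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]

def max_pool2x2(img: Image) -> Image:
    """Non-overlapping 2x2 max pooling; a trailing odd row or column is dropped."""
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, len(img[0]) - 1, 2)]
        for r in range(0, len(img) - 1, 2)
    ]

def encode(img: Image, kernel: Image) -> Code:
    """Single-modal image -> flat tensor code."""
    pooled = max_pool2x2(conv2d_valid(img, kernel))
    return tuple(v for row in pooled for v in row)

def lookup(code: Code, database: List[Entry]) -> Dict[str, str]:
    """One-to-many mapping: the nearest key returns text and audio together."""
    def sq_dist(key: Code) -> float:
        return sum((x - y) ** 2 for x, y in zip(key, code))
    return min(database, key=lambda entry: sq_dist(entry[0]))[1]

def generate(img: Image, kernel: Image, database: List[Entry]) -> Dict[str, str]:
    return lookup(encode(img, kernel), database)
```

A single image code selects one database entry, and that entry carries both modalities, which is one simple way to read the "single input, multi-dimensional output" description.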
[0030] As shown in Figure 7, the ONE-PEACE-SZYYv3 algorithm model ultimately undergoes pruning and distillation so that the multimodal large model can be deployed on edge computing, helping doctors complete ideal surgeries in real time in offline and disconnected environments.
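The text does not say which pruning or distillation methods are used. As a hedged illustration, the sketch below shows two widely used choices: magnitude pruning, which zeroes the smallest weights, and a temperature-softened distillation loss, which trains a small student model to match a large teacher's output distribution. Both are assumptions about what "pruning and distillation" could mean here, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]

def magnitude_prune(weights: List[Vector], sparsity: float) -> List[Vector]:
    """Zero out roughly the `sparsity` fraction of weights with the smallest magnitude.

    Ties at the threshold are all pruned, so slightly more than the
    requested fraction may be removed.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]

def softmax_t(logits: Vector, temperature: float) -> Vector:
    """Softmax of logits divided by a temperature; higher values give softer outputs."""
    m = max(logits)
    exps = [math.exp((l - m) / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits: Vector, student_logits: Vector,
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature * temperature * kl
```

Pruning reduces the number of nonzero parameters and distillation transfers behaviour to a smaller model; either or both can shrink a model enough to run on local hardware without a network connection.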
[0031] Each module in the system of this invention is used to implement the corresponding method steps in the method of this invention, and specifically includes the corresponding operation steps of the method of this invention.
[0032] The above description is merely a preferred embodiment of the present invention. It should be understood that the present invention is not limited to the forms disclosed herein and should not be construed as excluding other embodiments. It can be used in various other combinations, modifications, and environments, and can be altered within the scope of the concept described herein through the above teachings or related technologies or knowledge. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention should be within the protection scope of the appended claims.
Claims
1. A multimodal large-model artificial intelligence surgical navigation system based on EdgeAI and AIGC, characterized in that the system comprises the following main steps:
Step 1: Multimodal data fusion and decomposition processing: acquiring massive intraoperative images, classifying them according to surgical completion quality, and adding data for each modality to form a unified, formatted multimodal intraoperative image database. Surgical completion quality includes: excellent, good, and suboptimal. Excellent represents very high surgical completion quality but with very high operational complexity; good represents good surgical completion quality with relatively low operational difficulty; suboptimal represents poor surgical completion quality, with significant intraoperative bleeding, prolonged operation time, or intraoperative errors. The data for each modality includes: intraoperative image data, intraoperative video data, intraoperative audio data, and intraoperative text data.
Step 2: Training on the large-scale data using the AIGC-based ONE-PEACE-SZYYv3 large-model algorithm to generate and suggest key intraoperative elements. The key intraoperative elements include: ideal anatomical path, recommended instruments, recommended operating area, and risk area warning. Real-time intraoperative image data is input into this model. After processing by a cross-modal generator, multiple rounds of normalization layers (LN) and linear layers (Linear), and a multi-head attention mechanism with relative position encoding (RPB), the model can generate intraoperative navigation information based on the current situation. The cross-modal generator has a single input and a multi-dimensional output. The input receives single-modal image data, which is encoded, after multiple convolution and pooling operations, into tensor-format data that can be mapped one-to-many. The tensor-format data then triggers a multimodal database, which, based on the trigger classification, outputs multi-dimensional text and audio modal data in real time.
Step 3: Using pruning and distillation, the large model is deployed on edge computing, enabling real-time guidance for doctors to complete surgery quickly, safely, and accurately even in offline, network-disconnected environments.