Surgical data acquisition and processing method, apparatus, device, and storage medium

By synchronously collecting and storing surgical video and audio streams, constructing structured data, and generating surgical knowledge products, the problems of inconvenient data management and low information utilization efficiency in minimally invasive surgery are solved, and intelligent personalized data processing is realized.

CN122201255A · Pending · Publication Date: 2026-06-12 · ZHIYING MATRIX (XIONGAN) MEDICAL TECHNOLOGY CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: ZHIYING MATRIX (XIONGAN) MEDICAL TECHNOLOGY CO LTD
Filing Date: 2026-03-17
Publication Date: 2026-06-12

Patent Timeline

17 Mar 2026: Application
12 Jun 2026: Publication (CN122201255A)

IPC: G10L15/02; G10L15/26; G10L25/48; G10L15/22; H04N5/265

AI Tagging

Application Domain

Television system details; Color television details

Technical Efficacy Phrases

Reduce the risk of loss; increase profit




AI Technical Summary

Technical Problem

In current minimally invasive surgeries, recording intraoperative video and audio is cumbersome, data is easily lost, information utilization efficiency is low, and there is a lack of intelligent processing, making it difficult to meet the personalized needs of doctors.

Method used

Simultaneously collect surgical video streams and doctors' audio streams, store them in real time as multimodal data files on a unified timeline, convert them into timestamped transcribed text, construct structured data, and generate surgical knowledge products, including video highlights, PPT teaching materials, and surgical summary reports.

Benefits of technology

It enables the simultaneous acquisition and storage of surgical video and audio, reducing the risk of data loss, improving information utilization, reducing the postoperative burden on doctors, providing an intelligent intraoperative cognitive terminal, and supporting personalized data management.



Abstract

The application discloses a surgical data acquisition and processing method, apparatus, device, and storage medium. The method comprises: synchronously acquiring a surgical video stream and a doctor's voice stream during surgery, and storing them in real time as a multimodal data file on a unified timeline; converting the doctor's voice data into timestamped transcribed text, extracting timestamped surgical information from the transcribed text, and constructing structured data based on the surgical information, wherein the structured data comprises a semantic timeline constructed for the surgical video data; and generating a surgical knowledge product based on the structured data and the surgical video data. The application can significantly reduce the overall postoperative burden on doctors, reduce the risk of data loss, and improve the utilization of surgical information. It can also provide surgeons with an intelligent intraoperative cognitive terminal, so that each surgeon can conveniently save, analyze, and reuse his or her own surgical process and knowledge.


Description

Technical Field

[0001] This application relates to the field of surgical data acquisition technology, and in particular to a surgical data acquisition and processing method, device, equipment and storage medium.

Background Technology

[0002] Currently, in minimally invasive surgery, intraoperative video and audio recording generally relies on manual operation. This includes methods such as connecting a USB drive or external hard drive to the endoscopic imaging unit (CCU) or interventional DSA imaging machine for video copying, or logging into the hospital server post-operatively to retrieve key information frame by frame. These methods are cumbersome, prone to data loss, and inefficient in information utilization, severely impacting the efficiency of post-operative review, teaching preparation, and / or experience accumulation for doctors.

Summary of the Invention

[0003] This application provides a surgical data acquisition and processing method, apparatus, computer equipment, computer-readable storage medium, and computer program product, which can solve at least one of the technical problems mentioned in the background art.

[0004] In view of the above, in a first aspect, embodiments of this application provide a method for surgical data acquisition and processing, including: synchronously acquiring a surgical video stream and a doctor's voice stream during surgery, and storing them in real time as multimodal data files on a unified timeline; converting the doctor's voice data into time-stamped transcribed text, extracting time-stamped surgical information from the transcribed text, and constructing structured data based on the surgical information, the structured data including a semantic timeline constructed for the surgical video data; and generating surgical knowledge products based on the structured data and the surgical video data.

[0005] Optionally, the surgical information includes: surgical steps, key events, and physician semantic annotations.

[0006] Optionally, the structured data may also include: a summary of surgical procedures and video screenshot descriptions.

[0007] Optionally, a deep learning-based automatic speech recognition system converts the doctor's voice data into the transcribed text; and / or the transcribed text is parsed using a semantic analysis system based on a large language model to extract the surgical information.

[0008] Optionally, the surgical knowledge product includes at least one of the following: a video compilation obtained by cropping and splicing video segments from the surgical video data based on the semantic timeline; PPT surgical teaching materials, which include video screenshots and explanations of the surgery; and a surgical summary report document.

[0009] Optionally, after generating the surgical knowledge product based on the structured data and the surgical video data, the method further includes: uploading the multimodal data file and at least some of its derived files to a cloud storage platform.

[0010] In a second aspect, embodiments of this application also provide a surgical data acquisition and processing device, including: an acquisition and storage module, used to synchronously acquire a surgical video stream and a doctor's voice stream during surgery and store them in real time as multimodal data files on a unified timeline; a processing module, used to convert the doctor's voice data into time-stamped transcribed text, extract time-stamped surgical information from the transcribed text, and construct structured data based on the surgical information, the structured data including a semantic timeline constructed for the surgical video data; and a generation module, used to generate surgical knowledge products based on the structured data and the surgical video data.

[0011] In a third aspect, embodiments of this application also provide a computer device, including a memory and a processor. The memory is connected to the processor and is used to store computer programs, and the processor is used to call the computer programs so that the computer device executes the surgical data acquisition and processing method described in any implementation of the first aspect.

[0012] Optionally, the device further includes a video input interface and an audio receiving module, wherein the video input interface is used to connect to the output port of an imaging device, and the audio receiving module is used to communicate with a voice acquisition device worn by a doctor to receive a doctor's voice stream from the voice acquisition device.

[0013] In a fourth aspect, embodiments of this application also provide a computer-readable storage medium storing a computer program adapted to be loaded by a processor to execute the surgical data acquisition and processing method described in any implementation of the first aspect.

[0014] Fifthly, embodiments of this application also provide a computer program product, including a computer program that, when executed by a processor, implements the steps of the surgical data acquisition and processing method described in any one of the first aspects.

[0015] With the aforementioned surgical data acquisition and processing method, device, computer equipment, computer-readable storage medium, and computer program product, the surgical video stream and the doctor's voice stream are synchronously acquired during the intraoperative phase and stored in real time as multimodal data files on a unified timeline. In the postoperative phase, the doctor's voice data in the multimodal data files is converted into time-stamped transcribed text, and time-stamped surgical information is extracted from the transcribed text. Structured data is then constructed based on the surgical information, including a semantic timeline built for the surgical video data. Finally, surgical knowledge products are generated based on the structured data and surgical video data. Therefore, the embodiments of this application can achieve synchronous acquisition and real-time storage of surgical video streams and doctor's voice streams during surgery, and can perform timestamp alignment. Postoperatively, they can automatically generate structured data and surgical knowledge products, significantly reducing the overall postoperative burden on doctors, reducing the risk of data loss, and improving the utilization rate of surgical information. Furthermore, the embodiments of this application break away from the traditional hospital-based recording system, providing individual surgeons with intelligent intraoperative cognitive terminals, enabling each surgeon to conveniently save, analyze, and reuse their surgical procedures and knowledge.

Description of the Drawings

[0016] Figure 1 is a flowchart illustrating the surgical data acquisition and processing method according to an embodiment of this application.

[0017] Figure 2 is a schematic diagram of the surgical data acquisition and processing device according to an embodiment of this application.

[0018] Figure 3 is a schematic diagram of a computer device according to an embodiment of this application.

Detailed Description

[0019] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.

[0020] Currently, surgeons primarily rely on the following two methods for intraoperative video recording. In the first method, a USB flash drive or external hard drive is inserted directly into the endoscopic imaging processing unit (CCU) or DSA device for local recording. Although this method is simple to operate, it has significant limitations. For example, the recorded videos are easily lost, the file names are chaotic, the information cannot be stored in a structured way, and doctors still need to manually review and edit them after surgery, which is extremely inefficient. Moreover, this method offers almost no support for remote storage or intelligent analysis.

[0021] In the second method, which relies on the hospital's PACS or video management server, intraoperative images / videos are centrally uploaded to the hospital's server, and doctors need to log in to the system to review the recordings after the operation. These systems generally lack the ability to record and recognize voice simultaneously, cannot accurately locate key segments of the operation worth reviewing, and do not have functions such as automatically generating reports and teaching documents. They rely entirely on manual operation by doctors, resulting in serious repetitive work and making them difficult to extend to individual doctors.

[0022] Furthermore, while some intraoperative video recording products exist on the market, most are passive recording devices that only provide recording and playback functions, lacking AI assistance, simultaneous voice recording, and semantic understanding capabilities. There are currently no "intraoperative cognitive devices" designed for individual physician use that possess voice-driven capabilities, intraoperative marking, structured analysis, and automatic content generation.

[0023] In general, the existing technology has the following main drawbacks: (1) Intraoperative data acquisition lacks completeness and intelligence: Most current surgical recording systems only support video acquisition and lack the ability to record synchronously with the surgeon's voice information, resulting in the inability to capture and utilize semantic information such as intraoperative instructions, discussion content, and key markers. This is because traditional systems are mostly designed by image equipment manufacturers and do not take into account the doctor's voice behavior and its information value during the operation process.

[0024] (2) Postoperative processing relies heavily on manual operation, resulting in low efficiency: Doctors need to manually search for key images, take screenshots, edit, and write summaries and teaching materials, which is time-consuming, labor-intensive, and prone to errors, seriously affecting the efficiency of doctors' postoperative review, teaching preparation, and / or experience accumulation. The root cause is that the existing system does not integrate any form of semantic understanding or automated content extraction function, and cannot perform structured processing and intelligent assistance on video content.

[0025] (3) Data management is decentralized and risky: The use of USB flash drives or external hard drives for recording results in problems such as easy file loss, disordered naming, and difficulty in backing up content, which affects the preservation and subsequent use of medical records. The reason is that this method relies on manual physical transfer and has not formed a centralized and standardized data management mechanism.

[0026] (4) Lack of flexible deployment solutions for individual doctors: Existing server-based intraoperative recording systems are mostly deployed centrally by hospitals, which makes it difficult to meet the personalized needs of doctors and also limits convenience and usability. This is mainly because the system is designed for institutional rather than personal use scenarios and lacks a lightweight product form that is "mobile, interactive, and intelligent".

[0027] To address these issues, this application provides a surgical data acquisition and processing method, a surgical data acquisition and processing device, a computer device, a computer-readable storage medium, and a computer program product. The embodiments of this application will now be described in detail with reference to the accompanying drawings.

[0028] Referring to Figure 1, this application discloses a surgical data acquisition and processing method, which can be widely applied to scenarios such as laparoscopic surgery and vascular / neural interventional surgery, and supports doctors' intraoperative operations and postoperative review.

[0029] The surgical data acquisition and processing method includes the following steps. 101. Synchronously acquire surgical video streams and doctor's voice streams during surgery, and store them in real time as multimodal data files on a unified timeline.

[0030] It should be noted that the surgical video stream is generated by capturing the surgical scene. For example, a laparoscopic image processing host or a DSA device acquires and outputs images in real time, which allows the surgical video stream to be acquired in real time. The doctor's voice stream is formed by acquiring the doctor's voice in real time during the surgery. It should be understood that "stream" here means that acquisition is continuous, not that the doctor speaks continuously.

[0031] In some embodiments, a video input interface and an audio receiving module may be provided on the device performing the method. The video input interface is used to connect to the output port of the imaging device, and the audio receiving module is used to communicate with a voice acquisition device worn by the doctor to receive the doctor's voice stream from the voice acquisition device.

[0032] Specifically, the imaging equipment can be an endoscopic image processing host or a DSA device, etc. The audio receiving module can be a Bluetooth receiving module, and the voice acquisition device can be a Bluetooth headset or microphone worn by the doctor. The Bluetooth receiving module pairs with the Bluetooth headset or microphone to acquire the doctor's voice.

[0033] It should be noted that when capturing video, the video capture resolution needs to be compatible with the output specifications of imaging equipment such as endoscope image processing host or DSA device, the audio sampling rate needs to ensure the accuracy of speech recognition, and the Bluetooth protocol needs to be compatible with Bluetooth headsets or microphones.

[0034] It should be noted that real-time storage as multimodal data files on a unified timeline means that the surgical video and the doctor's audio are time-aligned, with each frame of video and each segment of audio given a precise timestamp, which can be based on the system startup time as the starting time. Furthermore, the video and audio are encoded before storage.

[0035] Before synchronously acquiring / recording surgical video streams and doctor's voice streams, it is necessary to start the system first. This can be done via voice commands or by using the touchscreen on the device. Once started, the surgical video stream and doctor's voice stream can be acquired synchronously. At the same time, the system will generate a unified timestamp stream, and then a multimodal data file on a unified timeline can be formed based on the surgical video stream, doctor's voice stream, and unified timestamp stream.
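As a non-limiting illustration of the unified timeline described above, the following Python sketch places video frames and audio chunks on one time base, measured in milliseconds from the start of recording. The names `Sample`, `video_samples`, `audio_samples`, and `unified_timeline`, the 25 fps frame rate, and the 50 ms audio chunk length are assumptions made for this example and do not appear in the application. An actual device would more likely stamp each frame and chunk from a single monotonic clock at capture time rather than derive timestamps from nominal rates.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    t_ms: int   # milliseconds since recording start (the unified timeline)
    kind: str   # "video" or "audio"
    index: int  # frame or chunk index within its own stream


def video_samples(n_frames, fps):
    """Timestamp each video frame from its index and a nominal frame rate."""
    return [Sample(round(i * 1000 / fps), "video", i) for i in range(n_frames)]


def audio_samples(n_chunks, chunk_ms):
    """Timestamp each audio chunk from its index and a fixed chunk length."""
    return [Sample(i * chunk_ms, "audio", i) for i in range(n_chunks)]


def unified_timeline(video, audio):
    """Interleave both streams by timestamp; audio sorts first on ties."""
    return list(heapq.merge(video, audio, key=lambda s: (s.t_ms, s.kind)))


if __name__ == "__main__":
    # Hypothetical rates: 25 fps video, 50 ms audio chunks.
    for sample in unified_timeline(video_samples(3, 25), audio_samples(2, 50)):
        print(sample)
```

Because both streams share one time base, a timestamp taken from the audio side can be used directly to locate the corresponding video frames.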

[0036] In addition, preoperative preparation is required before starting recording. During this stage, the doctor can create a recording task (which can be based on case tags or a custom file), complete the communication settings between the audio receiving module and the voice acquisition device (such as pairing the Bluetooth receiving module with the Bluetooth microphone), and check whether the video and voice signals can be transmitted normally.

[0037] 102. Convert the doctor's voice data into time-stamped transcribed text, extract time-stamped surgical information from the transcribed text, and construct structured data based on the surgical information. The structured data includes a semantic timeline constructed for the surgical video data.

[0038] After the surgery is completed, the doctor's voice data and surgical video data can be automatically packaged. Based on the stored doctor's voice data, time-stamped surgical information can be obtained, and then structured data can be built based on the surgical information.

[0039] In some embodiments, a deep learning-based automatic speech recognition system converts a doctor's speech data into transcribed text. The transcribed text can be in JSON format, but is not limited to this.
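By way of example only, a time-stamped transcript in JSON format might be organized as in the following Python sketch. The field names `segments`, `start_ms`, `end_ms`, and `text`, and the function name `to_transcript_json`, are hypothetical; the application states only that the transcribed text can be in JSON format and does not define a schema.

```python
import json


def to_transcript_json(segments):
    """Serialize (start_ms, end_ms, text) tuples as a JSON transcript.

    The schema is an assumption made for illustration only.
    """
    records = [
        {"start_ms": start, "end_ms": end, "text": text}
        for start, end, text in segments
    ]
    # ensure_ascii=False keeps non-English speech readable in the file.
    return json.dumps({"segments": records}, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(to_transcript_json([
        (1000, 2500, "Next step, dissecting the gallbladder"),
        (5000, 5800, "Mark here"),
    ]))
```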

[0040] In some embodiments, a semantic analysis system based on a large language model (such as GPT-4o or Claude) is used to parse the transcribed text to extract surgical information. However, this is not a limitation. Furthermore, when using a large language model to perform semantic parsing of the transcribed text, predefined rules can be incorporated.

[0041] In some embodiments, surgical information includes surgical steps, key events, and physician semantic annotations.

[0042] During surgery, doctors can record the upcoming steps verbally. They can also record key procedures or other surgical events verbally. Furthermore, if the doctor deems it necessary to mark or edit, they can use verbal prompts such as "Mark here" or "Edit." All of the doctor's speech is collected in real-time as part of their audio stream. After the surgery, when the doctor's audio data is converted into timestamped transcribed text, the aforementioned information—surgical steps, key events, and the doctor's semantic annotations—can be extracted.
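The extraction described above can be sketched, under stated assumptions, with a small rule-based classifier. This is a stand-in for illustration, not the large-language-model analysis named in the embodiments; it corresponds more closely to the predefined rules that may accompany such a model. The rule patterns, the labels `step`, `key_event`, and `annotation`, and the input field names are all hypothetical.

```python
import re

# Hypothetical keyword rules; a real system would use far richer analysis.
RULES = [
    ("annotation", re.compile(r"\b(mark here|edit)\b", re.IGNORECASE)),
    ("step", re.compile(r"\b(next step|now starting)\b", re.IGNORECASE)),
    ("key_event", re.compile(r"\b(bleeding|perforation)\b", re.IGNORECASE)),
]


def extract_surgical_info(segments):
    """Label transcript segments; segments matching no rule are skipped."""
    info = []
    for seg in segments:
        for label, pattern in RULES:
            if pattern.search(seg["text"]):
                info.append(
                    {"t_ms": seg["start_ms"], "type": label, "text": seg["text"]}
                )
                break  # first matching rule wins
    return info


if __name__ == "__main__":
    transcript = [
        {"start_ms": 1000, "text": "Next step, dissecting the gallbladder"},
        {"start_ms": 5000, "text": "Mark here"},
        {"start_ms": 7000, "text": "Suction please"},
        {"start_ms": 9000, "text": "Minor bleeding at the cystic artery"},
    ]
    for item in extract_surgical_info(transcript):
        print(item)
```

Each extracted item keeps the timestamp of the utterance it came from, which is what later allows it to be matched against the video.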

[0043] In some embodiments, the structured data further includes: a summary of surgical procedures and video screenshot descriptions.

[0044] After extracting the timestamped surgical information, structured data can be constructed based on this information. This structured data includes a semantic timeline built for the surgical video data, as well as surgical step summaries and video screenshot descriptions. The semantic timeline can be understood as a record of the sequence and content of events with time as the coordinate axis, for example an entry stating that surgical step B was entered at time A. Video screenshot descriptions are textual explanations of video screenshots. These screenshots can be generated based on the doctor's editing commands or other surgical information, and the textual descriptions are generated based on the corresponding surgical information.
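One possible way to build such a semantic timeline is sketched below in Python, assuming that each surgical step lasts until the next step is announced and that key events and annotations are attached to the step during which they occur. This segmentation rule, the function name `build_semantic_timeline`, and the field names are assumptions for illustration; the application does not prescribe a data layout.

```python
def build_semantic_timeline(info, video_end_ms):
    """Group time-stamped items into step segments along the video."""
    steps = sorted(
        (item for item in info if item["type"] == "step"),
        key=lambda item: item["t_ms"],
    )
    timeline = []
    # Pair each step with its successor; the last step runs to the video end.
    for current, following in zip(steps, steps[1:] + [None]):
        end_ms = following["t_ms"] if following else video_end_ms
        timeline.append({
            "start_ms": current["t_ms"],
            "end_ms": end_ms,
            "step": current["text"],
            "events": [
                item for item in info
                if item["type"] != "step"
                and current["t_ms"] <= item["t_ms"] < end_ms
            ],
        })
    return timeline


if __name__ == "__main__":
    info = [
        {"t_ms": 0, "type": "step", "text": "Trocar placement"},
        {"t_ms": 30_000, "type": "key_event", "text": "Minor bleeding"},
        {"t_ms": 60_000, "type": "step", "text": "Gallbladder dissection"},
        {"t_ms": 70_000, "type": "annotation", "text": "Mark here"},
    ]
    for segment in build_semantic_timeline(info, video_end_ms=120_000):
        print(segment["start_ms"], segment["end_ms"], segment["step"])
```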

[0045] In some embodiments, after the structured data is generated, desensitization can be performed, that is, private information such as patient information, as well as identifying marks such as logos, is removed or blurred. OCR and NER models can be called to detect such private information.

[0046] 103. Generate surgical knowledge products based on structured data and surgical video data.

[0047] Because structured data and surgical video data have a temporal correspondence, surgical knowledge products can be generated jointly based on structured data and surgical video data.

[0048] It can be understood that, in addition to being generated from structured data alone, surgical knowledge products can also be generated by combining the structured data with prompts entered by the user after surgery.

[0049] In some embodiments, the surgical knowledge product may include video highlights, which are obtained by cropping and splicing video segments from surgical video data based on a semantic timeline.
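As a hedged sketch of how clip boundaries for such video highlights might be derived, the following Python example opens a window around each marked timestamp on the semantic timeline, clamps it to the video length, and merges overlapping windows so that no footage is duplicated when the clips are spliced. The function name `highlight_clips` and the 10-second and 20-second padding values are assumptions for this example. The actual cropping and splicing of video data would be performed by a video-processing tool and is not shown.

```python
def highlight_clips(marks_ms, video_end_ms, before_ms=10_000, after_ms=20_000):
    """Return merged (start_ms, end_ms) clip windows around marked times."""
    windows = sorted(
        (max(0, t - before_ms), min(video_end_ms, t + after_ms))
        for t in marks_ms
    )
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Hypothetical marks at 5 s, 20 s, and 100 s in a 110 s recording.
    print(highlight_clips([5_000, 20_000, 100_000], video_end_ms=110_000))
```

Merging before cutting keeps the compilation in chronological order and avoids repeating the same footage when two marks fall close together.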

[0050] In some embodiments, surgical knowledge products may include PowerPoint surgical teaching materials, which include video screenshots and descriptions of the surgery. To facilitate the generation of PowerPoint surgical teaching materials, structured templates for preoperative / intraoperative / postoperative procedures can be created, allowing users to directly fill in the templates when generating the materials. Doctors can manually upload preoperative information such as medical history and preoperative images. Postoperative information may include postoperative rehabilitation suggestions, which can be filled in by the doctor or generated by the system based on information stored in the database.

[0051] In some embodiments, the surgical knowledge product may include a surgical summary report document. The surgical summary report document may be in Word or PDF format, but is not limited to these formats.
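For illustration only, the text of a surgical summary report might be assembled from the semantic timeline as in the following Python sketch. The names `format_time` and `summary_report`, the input field names, and the report layout are hypothetical, and export to Word or PDF is not shown.

```python
def format_time(ms):
    """Render milliseconds since recording start as HH:MM:SS."""
    seconds = ms // 1000
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def summary_report(title, timeline):
    """Build a plain-text report listing each step and its events."""
    lines = [title, ""]
    for number, segment in enumerate(timeline, start=1):
        span = f"{format_time(segment['start_ms'])} to {format_time(segment['end_ms'])}"
        lines.append(f"{number}. [{span}] {segment['step']}")
        for event in segment.get("events", []):
            lines.append(
                f"   * {format_time(event['t_ms'])} {event['type']}: {event['text']}"
            )
    return "\n".join(lines)


if __name__ == "__main__":
    timeline = [
        {
            "start_ms": 0,
            "end_ms": 60_000,
            "step": "Trocar placement",
            "events": [
                {"t_ms": 30_000, "type": "key_event", "text": "Minor bleeding"}
            ],
        },
    ]
    print(summary_report("Surgical summary (example)", timeline))
```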

[0052] In some embodiments, surgical knowledge products may include content that supports intraoperative review and remote teaching collaboration.

[0053] In some embodiments, after generating a surgical knowledge product based on structured data and surgical video data, the method further includes: uploading the multimodal data file and at least some of its derived files to the cloud storage platform.

[0054] Specifically, the device executing the method can be equipped with communication modules such as WLAN and LAN for cloud connectivity and remote uploading. Doctors can remotely access, manage, download, or further edit content through apps or web platforms.

[0055] To ensure data security, all data is stored encrypted by default.

[0056] The ability to upload multimodal data files and at least some of their derived files to a cloud storage platform, along with the support for encryption, makes the files more secure.

[0057] In addition, it can support anonymization configuration to meet the needs of clinical teaching or research.

[0058] In this embodiment, during the intraoperative phase, the surgical video stream and the doctor's voice stream are simultaneously acquired and stored in real time as a multimodal data file on a unified timeline. Postoperatively, the doctor's voice data in the multimodal data file is converted into time-stamped transcribed text, and time-stamped surgical information is extracted from the transcribed text. Structured data is then constructed based on the surgical information, including a semantic timeline built for the surgical video data. Finally, surgical knowledge products are generated based on the structured data and the surgical video data. Therefore, this embodiment can achieve simultaneous acquisition and real-time storage of the surgical video stream and the doctor's voice stream during surgery, and can align the timestamps. Postoperatively, it can automatically generate structured data and surgical knowledge products, significantly reducing the overall postoperative burden on doctors, reducing the risk of data loss, and improving the utilization rate of surgical information. Furthermore, this embodiment breaks away from the traditional hospital-based recording system, providing individual surgeons with an intelligent intraoperative cognitive terminal, enabling each surgeon to conveniently save, analyze, and reuse their surgical process and knowledge.

[0059] It should be understood that although the steps in the flowcharts of the embodiments described above are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the embodiments described above may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.

[0060] Based on the same inventive concept, this application also provides a surgical data acquisition and processing device for implementing the surgical data acquisition and processing method described above. The solution provided by this device is similar to the solution described in the above method; therefore, the specific limitations / descriptions in the embodiments of the surgical data acquisition and processing device provided below can be found in the limitations / descriptions of the surgical data acquisition and processing method above, and will not be repeated here.

[0061] Referring to Figure 2, the surgical data acquisition and processing device according to an embodiment of this application includes: an acquisition and storage module 201, used to synchronously acquire surgical video streams and doctor's voice streams during surgery and store them in real time as multimodal data files on a unified timeline; a processing module 202, used to convert the doctor's voice data into time-stamped transcribed text, extract time-stamped surgical information from the transcribed text, and construct structured data based on the surgical information, the structured data including a semantic timeline constructed for the surgical video data; and a generation module 203, used to generate surgical knowledge products based on the structured data and the surgical video data.

[0062] This application embodiment can achieve synchronous acquisition and real-time storage of surgical video streams and doctor's voice streams during surgery, and can perform timestamp alignment. After surgery, it can automatically generate structured data and surgical knowledge products, which can significantly reduce the overall burden on doctors after surgery, reduce the risk of data loss, and improve the utilization rate of surgical information. In addition, this application embodiment breaks away from the traditional recording system deployed on a hospital-by-hospital basis, and can provide surgeons with an intelligent intraoperative cognitive terminal, enabling each surgeon to conveniently save, analyze, and reuse their own surgical process and knowledge.

[0063] In this application embodiment, the term "module" refers to a computer program or part of a computer program that has a predetermined function and works with other related parts to achieve a predetermined goal, and can be implemented wholly or partially using software, hardware (such as processing circuitry or memory), or a combination thereof. Similarly, a processor (or multiple processors or memory) can be used to implement one or more modules or units. Furthermore, each module or unit can be part of an overall module or unit that includes the functionality of that module or unit.

[0064] Figure 3 is a schematic diagram of the structure of a computer device provided in an embodiment of this application. As shown in Figure 3, the computer device may include a processor 601 and a memory 602. The memory 602 is connected to the processor 601 and is used to store computer programs. The processor 601 is used to invoke the computer programs to cause the computer device to execute the surgical data acquisition and processing method described in the above embodiments. Furthermore, the computer device may also include at least one communication bus 603. The communication bus 603 is used to enable communication between components. The memory 602 may be a high-speed RAM or a non-volatile memory, such as at least one disk storage device.

[0065] In some embodiments, the device further includes a video input interface and an audio receiving module, wherein the video input interface is used to connect to the output port of an imaging device, and the audio receiving module is used to communicate with a voice acquisition device worn by a doctor to receive a doctor's voice stream from the voice acquisition device.

[0066] In some embodiments, the device may employ a custom embedded all-in-one host (specifically, an SoC module with lightweight computing and AI acceleration capabilities, such as an AI edge computing module based on the ARM architecture), and its casing should meet medical-grade insulation, anti-interference, and sterilization requirements.

[0067] This embodiment of the application can achieve synchronous acquisition and real-time storage of the surgical video stream and the doctor's voice stream during surgery, with timestamp alignment between the two. After surgery, it can automatically generate structured data and surgical knowledge products, which can significantly reduce the doctor's post-operative workload, lower the risk of data loss, and improve the utilization of surgical information. In addition, this embodiment departs from traditional recording systems deployed hospital by hospital and can provide surgeons with an intelligent intraoperative cognitive terminal, so that each surgeon can conveniently save, analyze, and reuse their own surgical processes and knowledge.
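
One of the surgical knowledge products named in the claims is a video compilation obtained by cropping and splicing video segments based on the semantic timeline. The following is a minimal sketch of how the clip ranges for such a compilation might be planned; it is an illustration under stated assumptions, not the method of this application. The function name plan_compilation, the (start, end, kind) tuple layout, and the padding parameter pad_s are all hypothetical.

```python
def plan_compilation(entries, kinds=("key_event",), pad_s=5.0, video_duration_s=None):
    """Return merged (start, end) clip ranges, in seconds on the video timeline.

    entries: iterable of (start_s, end_s, kind) tuples taken from a semantic
             timeline (layout assumed for this sketch).
    kinds:   which categories of entry to include in the compilation.
    pad_s:   context added before and after each entry.
    """
    ranges = []
    for start_s, end_s, kind in entries:
        if kind not in kinds:
            continue
        lo = max(0.0, start_s - pad_s)
        hi = end_s + pad_s
        if video_duration_s is not None:
            hi = min(hi, video_duration_s)
        ranges.append((lo, hi))

    # Merge overlapping or touching ranges so the splice has no repeated footage.
    ranges.sort()
    merged = []
    for lo, hi in ranges:
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


if __name__ == "__main__":
    timeline = [
        (28.0, 31.0, "key_event"),
        (8.0, 12.0, "step"),
        (34.0, 40.0, "key_event"),
    ]
    print(plan_compilation(timeline))
```

The resulting ranges could then be handed to whatever video tool performs the actual cropping and splicing; that step is left out here because the application does not name a specific tool.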

[0068] This application also provides a computer-readable storage medium storing a computer program adapted to be loaded by a processor to execute the surgical data acquisition and processing method described in the above embodiments.

[0069] This application also provides a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the surgical data acquisition and processing method described in the above embodiments.

[0070] It should be understood that, in the embodiments of this application, the processor may be a central processing unit (CPU), but it may also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor.

[0071] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can carry out the processes of the above method embodiments. The storage medium can be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

[0072] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.

[0073] The above-disclosed examples are merely preferred embodiments of this application and should not be construed as limiting the scope of this application. Therefore, any equivalent variations made in accordance with the claims of this application shall fall within the scope of this application.

Claims

1. A method for surgical data acquisition and processing, characterized in that it comprises: synchronously acquiring surgical video streams and doctors' voice streams during surgery, and storing them in real time as multimodal data files on a unified timeline; converting the doctor's voice data into time-stamped transcribed text, and extracting time-stamped surgical information from the transcribed text; constructing structured data based on the surgical information, the structured data including a semantic timeline constructed for the surgical video data; and generating surgical knowledge products based on the structured data and the surgical video data.

2. The surgical data acquisition and processing method according to claim 1, characterized in that the surgical information includes: surgical steps, key events, and semantic annotations by the doctor.

3. The surgical data acquisition and processing method according to claim 1, characterized in that the structured data further includes: a summary of the surgical procedure and descriptions of video screenshots.

4. The surgical data acquisition and processing method according to claim 1, characterized in that the doctor's voice data is converted into the transcribed text by a deep learning-based automatic speech recognition system; and/or the transcribed text is parsed by a semantic analysis system based on a large language model to extract the surgical information.

5. The surgical data acquisition and processing method according to claim 1, characterized in that the surgical knowledge product includes at least one of the following: a video compilation obtained by cropping and splicing video segments from the surgical video data based on the semantic timeline; PPT surgical teaching materials that include video screenshots and explanations of the surgery; and a surgical summary report document.

6. The surgical data acquisition and processing method according to claim 1, characterized in that, after generating the surgical knowledge product based on the structured data and the surgical video data, the method further includes: uploading the multimodal data file and at least some of its derived files to a cloud storage platform.

7. A surgical data acquisition and processing device, characterized in that it comprises: an acquisition and storage module, used to synchronously acquire surgical video streams and doctors' voice streams during surgery and store them in real time as multimodal data files on a unified timeline; a processing module, used to convert the doctor's voice data into time-stamped transcribed text, extract time-stamped surgical information from the transcribed text, and construct structured data based on the surgical information, the structured data including a semantic timeline constructed for the surgical video data; and a generation module, used to generate surgical knowledge products based on the structured data and the surgical video data.

8. A computer device, characterized in that it comprises a memory and a processor; the memory is connected to the processor and is used to store computer programs, and the processor is used to invoke the computer programs so that the computer device executes the surgical data acquisition and processing method according to any one of claims 1 to 6.

9. The computer device according to claim 8, characterized in that it further comprises a video input interface and an audio receiving module, wherein the video input interface is used to connect to the output port of an imaging device, and the audio receiving module is used to communicate with a voice acquisition device worn by a doctor to receive the doctor's voice stream from the voice acquisition device.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor to execute the surgical data acquisition and processing method according to any one of claims 1 to 6.