Apparatus and method for monitoring swallowing function on basis of VFSS images
The VFSS image-based monitoring device uses AI for rapid and accurate assessment of swallowing disorders, addressing the need for skilled interpretation and enabling real-time risk detection in swallowing studies.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- WONKWANG UNIV CENT FOR IND ACAD COOP
- Filing Date
- 2025-08-08
- Publication Date
- 2026-07-02
AI Technical Summary
Existing videofluoroscopic swallowing studies (VFSS) require skilled personnel for interpretation and are time-consuming, making it difficult for unskilled users to accurately assess swallowing disorders and identify potential aspiration risks.
A VFSS image-based monitoring device uses artificial intelligence: a class identification model (EfficientNetV2) classifies the swallowing class of each frame, object detection models (YOLOv7) locate the food in the VFSS images, and a Penetration-Aspiration Scale (PAS) score is calculated for real-time monitoring and guidance.
Enables unskilled users to quickly and accurately determine the severity of swallowing disorders and provide real-time warnings for potential aspiration, facilitating timely intervention.
Abstract
Description
VFSS image-based swallowing function monitoring device and method
[0001] The present invention relates to a VFSS image-based swallowing function monitoring device and method that objectively evaluate the severity of swallowing disorders based on the Penetration-Aspiration Scale (PAS) and enable real-time monitoring of the condition of patients at risk of complications such as aspiration pneumonia.
[0002] Dysphagia is a condition in which a person is unable to consume food smoothly due to a problem with food passing from the mouth to the esophagus.
[0003] Oropharyngeal dysphagia is characterized by difficulty swallowing and may be accompanied by penetration, aspiration, or a sensation of food remaining in the pharynx.
[0004] Penetration, a mild form of dysphagia, refers to a condition in which food enters the laryngeal vestibule but fails to reach the vocal cords, and it generally resolves naturally.
[0005] Aspiration is a severe form of dysphagia in which food or liquid is accidentally inhaled into the airway past the vocal cords. The primary purpose of tube feeding for patients with dysphagia is to prevent aspiration and aspiration pneumonia, a form of pneumonia in which foreign substances such as food, saliva, and phlegm enter the lungs and alveoli through the trachea rather than the esophagus.
[0006] Furthermore, in patients with stroke or dementia, aspiration pneumonia can lead to irreversible sequelae. The prevalence of aspiration pneumonia has risen steeply in recent years, reaching 20%, and it ranked as the fourth leading cause of death in 2016. Treating this condition with broad-spectrum antibiotics takes weeks to months. Therefore, medical professionals must accurately diagnose the patient's dysphagia, determine whether to provide tube feeding, and identify the appropriate enteral nutrition product for the patient.
[0007] Accordingly, the videofluoroscopic swallowing study (VFSS) has been proposed, in which a patient ingests food mixed with barium and its passage through the oral cavity, pharynx, and esophagus is recorded using X-ray imaging.
[0008] However, VFSS results can only be interpreted and analyzed by skilled personnel, and analyzing the large amount of data in a VFSS recording to infer the degree of swallowing disorder takes a considerable amount of time.
[0009] Accordingly, in order to solve the aforementioned problems, the present invention aims to provide a VFSS image-based swallowing function monitoring device and method that enables even unskilled users to quickly and accurately determine the degree of swallowing impairment by analyzing VFSS images using artificial intelligence technology.
[0010] The objectives of the present invention are not limited to those mentioned above, and other unmentioned objectives will be clearly understood by those skilled in the art to which the present invention pertains from the description below.
[0011] As a means to solve the above problem, one embodiment of the present invention provides a VFSS image-based swallowing function monitoring device comprising: a VFSS (videofluoroscopic swallowing study) image acquisition unit that acquires a VFSS video of a patient's food swallowing behavior, divides it into frames, and extracts a plurality of single images; a swallowing class classification unit that identifies the swallowing class corresponding to each of the single images using a class identification model pre-trained on the correlation between an image and a swallowing class; an object detection unit that has five object detection models, pre-trained separately for each swallowing class on the correlation between an image and a food detection result, and that identifies the food location corresponding to each of the single images by switching the object detection model according to the swallowing class identification result; a PAS score calculation unit that identifies a food location change pattern by tracking and monitoring the food location frame by frame and calculates a PAS score based on the food location change pattern; and a patient condition monitoring unit that generates monitoring information including at least one of the VFSS image, the PAS score, and the food location and provides real-time guidance.
[0012] The swallowing class classification unit is implemented with EfficientNetV2 and is pre-trained on the correlation between an image and a swallowing class using training data that has an image as an input condition and a swallowing class as an output condition.
[0013] Each of the object detection models is implemented with YOLOv7 and is pre-trained on the correlation between an image and a food detection result using training data that has an image as an input condition and a food detection result as an output condition.
[0014] The above PAS score calculation unit is characterized by setting the PAS score to "1" when food moves into the esophagus, setting it to "2" when food enters the airway but reaches the supraglottic region and is immediately discharged into the duct, setting it to "3" when food remains in the supraglottic region, setting it to "4" when food remains in the supraglottic region for a long time and is discharged into the duct, setting it to "5" when food remains in the supraglottic region, setting it to "6" when food reaches the lower airway and is discharged into the duct again, setting it to "7" when food remains in the lower airway and its position changes, and setting it to "8" when food remains in the lower airway and its position does not change.
[0015] The above-mentioned patient condition monitoring unit is characterized by further including a function that guides a treatment method corresponding to the patient's current PAS score based on a predefined treatment method for each PAS score.
[0016] In addition, the patient condition monitoring unit is characterized by further including a function that searches and analyzes papers uploaded to the internet to extract and guide treatment methods corresponding to the patient's current PAS score.
[0017]
[0018] As a means to solve the above problem, another embodiment of the present invention provides a VFSS image-based swallowing function monitoring method comprising: a step of acquiring a VFSS (videofluoroscopic swallowing study) video of a patient's food swallowing behavior and dividing it into frames to extract a plurality of single images; a step of identifying the swallowing class corresponding to each of the single images using a class identification model pre-trained on the correlation between an image and a swallowing class; a step of identifying the food location corresponding to each of the single images using five object detection models, pre-trained separately for each swallowing class on the correlation between an image and a food detection result, while switching the object detection model according to the swallowing class identification result; a step of tracking and monitoring the food location frame by frame to identify a food location change pattern and calculating a PAS score based on the food location change pattern; and a step of generating monitoring information including at least one of the VFSS video, the PAS score, and the food location and providing real-time guidance.
[0019] The present invention enables the objective analysis of the degree of swallowing disorder using artificial intelligence technology in VFSS images, thereby allowing even unskilled individuals without medical image analysis capabilities to quickly and accurately identify the degree of swallowing disorder.
[0020] In addition, it enables real-time monitoring of the patient's condition and allows dangerous conditions resulting from penetration and aspiration of food to be identified and flagged quickly.
[0021] FIG. 1 is a drawing for explaining a VFSS image-based swallowing function monitoring device according to one embodiment of the present invention.
[0022] FIGS. 2 and 3 are drawings for illustrating a VFSS image-based swallowing function monitoring method according to an embodiment of the present invention.
[0023] FIG. 4 is a drawing illustrating an example of a patient condition monitoring screen according to an embodiment of the present invention.
[0024] Before specifically describing the present disclosure, the method of description in the specification and drawings is described.
[0025] First, the terms used in this specification and claims have been selected based on general terms considering their functions in the various embodiments of this disclosure. However, these terms may vary depending on the intent of those skilled in the art, legal or technical interpretations, and the emergence of new technologies. Additionally, some terms have been arbitrarily selected by the applicant. Such terms may be interpreted according to the meanings defined in this specification; in the absence of specific definitions, they may be interpreted based on the overall content of this specification and common technical knowledge in the relevant field.
[0026] In addition, the same reference numbers or symbols described in each drawing attached to this specification represent parts or components that perform substantially the same function. For convenience of explanation and understanding, the same reference numbers or symbols are used to describe different embodiments. That is, even if components having the same reference number are all depicted in multiple drawings, the multiple drawings do not imply a single embodiment.
[0027] Additionally, in this specification and claims, terms including ordinal numbers, such as "first," "second," etc., may be used to distinguish between components. These ordinal numbers are used to distinguish identical or similar components from one another, and the meaning of the terms should not be limited by the use of such ordinal numbers. For example, the order of use or arrangement of components combined with such ordinal numbers should not be restricted by the number. If necessary, each ordinal number may be used interchangeably.
[0028] In this specification, singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprising" or "consisting of" are intended to specify the existence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.
[0029] In the embodiments of the present disclosure, terms such as "module," "unit," "part," etc. are used to refer to a component that performs at least one function or operation, and such component may be implemented in hardware or software, or in a combination of hardware and software. Additionally, a plurality of "modules," "units," "parts," etc. may be integrated into at least one part or chip and implemented as at least one processor, except where each needs to be implemented in specific individual hardware.
[0030] Furthermore, in the embodiments of the present disclosure, when a part is described as being connected to another part, this includes not only a direct connection but also an indirect connection through another medium. Additionally, the meaning that a part includes a certain component implies that, unless specifically stated otherwise, it does not exclude other components but may include additional components.
[0031]
[0032] FIG. 1 is a drawing for explaining a VFSS image-based swallowing function monitoring device according to one embodiment of the present invention.
[0033] Referring to FIG. 1, the device (100) of the present invention includes: a VFSS image acquisition unit (110) that acquires a VFSS (videofluoroscopic swallowing study) video of a patient's food swallowing behavior and divides it into frames to extract a plurality of single images; a swallowing class classification unit (120) that identifies the swallowing class corresponding to each of the single images using a class identification model pre-trained on the correlation between an image and a swallowing class; an object detection unit (130) that has five object detection models, pre-trained separately for each swallowing class on the correlation between an image and a food detection result, and that identifies the food location corresponding to each of the single images by switching the object detection model according to the swallowing class identification result; a PAS score calculation unit (140) that identifies the food location change pattern by tracking and monitoring the food location frame by frame and calculates the PAS score based on the food location change pattern; and a patient condition monitoring unit (150) that generates monitoring information including at least one of the VFSS image, the PAS score, and the food location and provides real-time guidance.
[0034] For reference, the VFSS examination device (200) applied to the present invention is implemented as at least one of a computed tomography device, a magnetic resonance imaging device, and an X-ray device, and acquires and provides a VFSS (videofluoroscopic swallowing study) image of the patient's food swallowing behavior by repeatedly photographing the patient's body at a preset frame rate.
[0035]
[0036] Hereinafter, the operation method of the present invention will be described with reference to FIGS. 2 and 3.
[0037] First, a VFSS video of the patient's food swallowing behavior is received from the VFSS examination device (200) through the VFSS image acquisition unit (110) (S1).
[0038]
[0039] The VFSS video is divided into frames and converted into multiple single images. Then, data preprocessing is performed, such as changing the file format of each single image from a medical image file format (e.g., DICOM) to a general image file format (e.g., TIFF, JPG, etc.) so that image processing is possible (S2).
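As an illustration of step S2, the following is a minimal sketch of one preprocessing operation that a format conversion typically involves: rescaling each decoded frame to 8-bit intensities. It assumes the recording has already been decoded into per-frame lists of pixel rows; the names `rescale_to_uint8` and `preprocess_video` are hypothetical and do not come from the specification, and reading DICOM files or writing TIFF/JPG files is left out.

```python
from typing import List, Sequence

Frame = List[List[int]]  # one grayscale frame as rows of pixel intensities


def rescale_to_uint8(frame: Frame) -> Frame:
    """Linearly map the frame's intensity range onto 0..255 using integer math."""
    flat = [pixel for row in frame for pixel in row]
    lo, hi = min(flat), max(flat)  # assumes the frame is not empty
    if hi == lo:
        # A constant frame carries no contrast; map it to all zeros.
        return [[0 for _ in row] for row in frame]
    return [[(pixel - lo) * 255 // (hi - lo) for pixel in row] for row in frame]


def preprocess_video(frames: Sequence[Frame]) -> List[Frame]:
    """Turn a decoded multi-frame recording into a list of 8-bit single images."""
    return [rescale_to_uint8(frame) for frame in frames]


# Two tiny 2x2 frames with 12-bit intensities stand in for a real recording.
video = [[[0, 2048], [4095, 1024]], [[7, 7], [7, 7]]]
single_images = preprocess_video(video)
print(single_images[0])  # [[0, 127], [255, 63]]
```

Per-frame min/max scaling is only one possible choice; a fixed intensity window shared by all frames would keep brightness consistent across the recording.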
[0040]
[0041] Then, the swallowing class classification unit (120) uses a class identification model pre-trained on the correlation between an image and a swallowing class to identify the swallowing class corresponding to each of the multiple single images (S3).
[0042] The class identification model of the present invention can be implemented using EfficientNetV2, a CNN-based classifier, and is pre-trained on the correlation between an image and a swallowing class using multiple training data sets in which an image is the input condition and a swallowing class is the output condition.
[0043] The swallowing class is divided into five classes, as shown in FIG. 2: the oral class, where food is present in the mouth; the pharyngeal class, where food reaches the pharynx and involuntary swallowing occurs; the esophageal class, where food moves into the esophagus; the penetration class, where food enters up to the supraglottic region; and the aspiration class, where food passes through the glottis and enters the lower airway.
[0044] Accordingly, the present invention analyzes the input image through a class identification model to immediately determine which stage the current swallowing class corresponds to.
[0045]
[0046] When the swallowing class is determined through step S3, the object detection unit (130) changes and uses the object detection model according to the swallowing class determined through step S3, and additionally identifies the food location corresponding to each single image (S4).
[0047] Each object detection model of the present invention can be implemented with YOLOv7 and is pre-trained on the correlation between an image and the location of food using multiple training data that have an image as an input condition and a food detection result as an output condition.
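The model switching in step S4 can be sketched as a lookup from swallowing class to detector. This is a minimal sketch only: the stub detectors below stand in for the five trained YOLOv7 models, and the class labels, the `Detection` type, and the function names are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels
Detection = Tuple[str, Box]       # (name of the model that fired, box)
Detector = Callable[[object], List[Detection]]

# The five swallowing classes named in the description (labels are assumed).
SWALLOWING_CLASSES = ("oral", "pharyngeal", "esophageal", "penetration", "aspiration")


def make_stub_detector(name: str) -> Detector:
    """Stand-in for a trained detector; always returns one dummy box."""
    def detect(image: object) -> List[Detection]:
        return [(name, (0, 0, 1, 1))]
    return detect


# One detector per swallowing class, five in total.
DETECTORS: Dict[str, Detector] = {c: make_stub_detector(c) for c in SWALLOWING_CLASSES}


def detect_food(image: object, swallowing_class: str) -> List[Detection]:
    """Run the detector that matches the class identified in step S3."""
    try:
        detector = DETECTORS[swallowing_class]
    except KeyError:
        raise ValueError(f"unknown swallowing class: {swallowing_class!r}")
    return detector(image)


print(detect_food(None, "penetration"))  # [('penetration', (0, 0, 1, 1))]
```

Keeping the mapping in a dictionary makes the per-class choice explicit and lets a single class's model be replaced without touching the rest of the pipeline.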
[0048]
[0049] Then, the PAS score calculation unit (140) collects and analyzes, frame by frame, the food locations identified by the object detection unit (130) (i.e., tracks and monitors them) to identify the food location change pattern, and calculates and reports the PAS score based on this pattern (S5).
[0050] For reference, the PAS score consists of a scale from 1 to 8 points.
[0051] 1. Food (bolus) does not enter the airway (normal swallowing).
[0052] 2. Food enters the airway but remains above the glottis and is removed (spontaneous removal after penetration).
[0053] 3. Food enters the airway but remains above the glottis and is not removed (penetration, failure to remove).
[0054] 4. Food enters the upper part of the glottis and is removed by the effort to close the glottis (removal after deep penetration).
[0055] 5. Food enters the supraglottic region and is not removed (deep penetration, failure to remove).
[0056] 6. Food passes over the glottis into the lower airway but is removed by coughing, etc. (spontaneous removal after aspiration).
[0057] 7. Food passes over the glottis and enters the lower airway, where it is not removed (aspiration, failure to remove).
[0058] 8. Food crosses the glottis and enters the lower airway with no response (cough, swallowing reflex) (asymptomatic aspiration).
[0059] Accordingly, the PAS score calculation unit (140) of the present invention identifies the food location change pattern and sets the PAS score to "1" when the food moves into the esophagus, sets it to "2" when the food enters the airway but reaches the supraglottic region and is immediately discharged into the duct, sets it to "3" when the food remains in the supraglottic region, sets it to "4" when the food remains in the supraglottic region for a long time and is discharged into the duct, sets it to "5" when the food remains in the supraglottic region, sets it to "6" when the food reaches the lower airway and is discharged into the duct again, sets it to "7" when the food remains in the lower airway and its position changes, and sets it to "8" when the food remains in the lower airway and there is no change in position.
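The rules of step S5 can be sketched as a function over a per-frame track of food observations. Everything concrete here is an assumption made for illustration: the region labels, the two thresholds, and the reading of "discharged" as the food later being detected outside the airway. The text gives the same condition for scores 3 and 5, so this sketch separates them with the same dwell-time threshold that separates 2 from 4; the specification does not state that rule.

```python
from typing import List, Optional, Tuple

# One entry per frame: (region label, bounding-box center), or None if no food was found.
Observation = Optional[Tuple[str, Tuple[float, float]]]

LONG_DWELL_FRAMES = 15  # hypothetical: frames in the supraglottic region that count as "a long time"
MOVE_TOLERANCE = 2.0    # hypothetical: pixel shift that counts as "its position changes"


def pas_score(track: List[Observation]) -> int:
    """Map a food location change pattern to a PAS score from 1 to 8."""
    seen = [obs for obs in track if obs is not None]
    regions = [region for region, _ in seen]

    if "lower_airway" in regions:
        if regions[-1] != "lower_airway":
            return 6  # reached the lower airway, later detected elsewhere
        centers = [center for region, center in seen if region == "lower_airway"]
        x0, y0 = centers[0]
        moved = any(abs(x - x0) > MOVE_TOLERANCE or abs(y - y0) > MOVE_TOLERANCE
                    for x, y in centers[1:])
        return 7 if moved else 8

    if "supraglottic" in regions:
        long_dwell = regions.count("supraglottic") >= LONG_DWELL_FRAMES
        if regions[-1] != "supraglottic":
            return 4 if long_dwell else 2  # discharged late versus immediately
        return 5 if long_dwell else 3      # assumed split, see the note above

    return 1  # food never entered the airway (also returned for an empty track)


track = [("pharynx", (10.0, 5.0)), ("supraglottic", (11.0, 9.0)), ("esophagus", (10.0, 20.0))]
print(pas_score(track))  # 2
```

A production version would need clinically validated thresholds and a more robust notion of clearance than the last detected region.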
[0060]
[0061] Finally, the patient condition monitoring unit (150) generates monitoring information including at least one of a VFSS image, a PAS score, and a food location to provide real-time guidance to the medical staff (S6).
[0062] In addition, the patient's risk status is checked based on the PAS score, and if the patient is at risk, warning information can be generated and provided immediately to the medical staff.
[0063]
[0064] FIG. 4 is a drawing illustrating an example of a patient condition monitoring screen according to an embodiment of the present invention.
[0065] As illustrated in FIG. 4, the patient condition monitoring unit (150) of the present invention configures a screen that displays the VFSS image and the PAS score together in real time.
[0066] In addition, by generating at least one of a bounding box and text indicating the food location and overlaying them onto the VFSS image, medical personnel and others can more quickly and accurately identify the current location of the food. Furthermore, if there are multiple food detection locations, the bounding box display color can be set differently for each food location.
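The overlay data described above can be sketched as follows: each detected food location gets a text label and a display color, with colors differing between locations. The palette, the label wording, and the `build_overlays` name are assumptions; the actual drawing onto the VFSS image is left to whatever imaging library the implementation uses.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
Color = Tuple[int, int, int]     # RGB

# Hypothetical palette; colors repeat if there are more boxes than entries.
PALETTE: List[Color] = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]


def build_overlays(boxes: List[Box]) -> List[Dict[str, object]]:
    """Attach a text label and a distinct display color to each food location."""
    return [
        {"box": box, "label": f"food {i + 1}", "color": PALETTE[i % len(PALETTE)]}
        for i, box in enumerate(boxes)
    ]


for overlay in build_overlays([(10, 20, 40, 60), (55, 80, 70, 95)]):
    print(overlay)
```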
[0067] In addition, the degree of swallowing difficulty is classified into three grades based on the PAS score, and this grade can be provided in real time. For example, a PAS score of 1 is classified as normal, 2 to 3 as mild swallowing difficulty, 4 to 5 as moderate swallowing difficulty, and 6 to 8 as severe swallowing difficulty. Furthermore, if the PAS score exceeds a preset threshold (e.g., PAS 4) or the degree of swallowing difficulty becomes moderate or higher, the patient is determined to be in a dangerous condition with a possibility of respiratory arrest due to food, and warning information is generated and provided immediately to medical personnel.
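The grading and warning rule in this paragraph can be sketched directly. The function names are hypothetical; the grade boundaries and the example threshold of PAS 4 follow the text, and the warning condition is implemented literally as "score exceeds the threshold, or grade is moderate or higher".

```python
def severity_grade(pas: int) -> str:
    """Map a PAS score to the grade named in the description."""
    if not 1 <= pas <= 8:
        raise ValueError(f"PAS score must be between 1 and 8, got {pas}")
    if pas == 1:
        return "normal"
    if pas <= 3:
        return "mild"
    if pas <= 5:
        return "moderate"
    return "severe"


def needs_warning(pas: int, threshold: int = 4) -> bool:
    """Warn if the score exceeds the threshold or the grade is moderate or higher."""
    return pas > threshold or severity_grade(pas) in ("moderate", "severe")


for score in range(1, 9):
    print(score, severity_grade(score), needs_warning(score))
```

With the default threshold, the second condition already triggers at PAS 4, so warnings start at a score of 4 even though 4 does not exceed the threshold.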
[0068]
[0069] In addition, considering that the treatment method should vary depending on the severity of the swallowing disorder, the present invention may additionally provide guidance by searching for the optimal treatment method for the patient based on the PAS score through the patient condition monitoring unit (150).
[0070] In this case, the treatment method corresponding to the PAS score may be a method predefined by medical professionals, but if necessary, it may also be the result of extracting and organizing treatment methods by searching and analyzing papers uploaded to the internet. This can be performed by accessing online paper collection platforms using the PubMed API, selecting papers related to swallowing disorders, and then automatically extracting phrases related to 'treatment methods by PAS score' using the PAS score as a search keyword to provide guidance, but it is not limited to this method.
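One possible way to build such a literature query is through the NCBI E-utilities `esearch` endpoint for PubMed. The sketch below only constructs the request URL and does not send it; the search term wording and the `build_pubmed_search_url` name are assumptions, and the phrase extraction step described above is not shown.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(pas_score: int, max_results: int = 20) -> str:
    """Build an esearch URL that uses the PAS score as a search keyword."""
    # Hypothetical query wording; a real deployment would tune this term.
    term = f'dysphagia AND "penetration-aspiration scale" AND "PAS {pas_score}" AND treatment'
    params = {
        "db": "pubmed",          # search the PubMed database
        "term": term,
        "retmax": max_results,   # maximum number of article IDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_pubmed_search_url(6, max_results=5))
```

The returned article IDs would then be passed to a fetch step to retrieve abstracts for analysis.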
[0071] For example, treatment methods corresponding to the PAS score may be guided as follows.
[0072] 1. Mild swallowing difficulty (PAS scores 1-3): Cases where food is expelled even if it reaches the vocal cords, or where the problem is not significant.
[0073] ※ Treatment Approach:
[0074] - Swallowing rehabilitation exercises: Strengthen oral and pharyngeal muscles.
[0075] Supraglottic Swallow: Closes and protects the airway by holding the breath before swallowing and coughing immediately afterward.
[0076] Mendelsohn technique: Activates the hypopharyngeal muscles during swallowing to make swallowing more efficient.
[0077] - Food texture control:
[0078] A method to make swallowing easier by changing to a thicker food (e.g., in the form of jelly or puree).
[0079] Use viscosity-controlled (thickened) beverages.
[0080] - Behavioral therapy: Posture adjustment (e.g., chin tuck posture, tilting the head forward).
[0081]
[0082] 2. Moderate swallowing difficulty (PAS scores 4-5): Cases where food penetrates the vocal cords but is not expelled, or where there is a high risk of recurrent aspiration.
[0083] ※ Treatment Approach:
[0084] - Strengthening swallowing training:
[0085] Shaker Exercise: Strengthens neck muscles to improve airway closure.
[0086] Effortful Swallow: Swallow forcefully to remove residue.
[0087] - Strict control of food texture:
[0088] Limit to thicker liquids or purees.
[0089] Stimulating the swallowing sensation by regulating the temperature of food and beverages.
[0090] - Use of assistive devices:
[0091] Electrical stimulation therapy (e-stim): Improves airway closure and activates the swallowing muscles.
[0092] - Insertion of nasogastric tube or gastrostomy tube (temporary): Assists with nutritional intake problems.
[0093]
[0094] 3. Severe swallowing difficulty (PAS scores 6-8): Cases where food is aspirated beyond the vocal cords and there is no cough reflex, or where aspiration persists.
[0095]
[0096] ※ Treatment Approach:
[0097] - Non-oral (tube) nutrition:
[0098] Safe nutritional support via nasogastric tube (NGT) or gastrostomy tube (PEG) insertion.
[0099] - Swallowing training:
[0100] Sensorimotor stimulation therapy: Activation of the swallowing reflex with heat / taste stimulation.
[0101] Improvement of airway closure ability through electrical stimulation therapy.
[0102] - Surgical treatment for airway protection:
[0103] Vocal cord injection or airway diversion (e.g., tracheostomy).
[0104] - Patient Education and Management:
[0105] To prevent involuntary aspiration during swallowing, limit oral intake or allow it only under supervision.
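The predefined guidance listed above can be sketched as a lookup table keyed by severity group. The group boundaries follow this example list (scores 1-3, 4-5, 6-8), and the entries are abbreviated from the headings above; the table layout and the function names are assumptions.

```python
from typing import Dict, List

# Treatment headings abbreviated from the example list, keyed by severity group.
TREATMENTS: Dict[str, List[str]] = {
    "mild": [
        "Swallowing rehabilitation exercises",
        "Food texture control",
        "Behavioral therapy (posture adjustment)",
    ],
    "moderate": [
        "Strengthening swallowing training",
        "Strict control of food texture",
        "Use of assistive devices",
        "Nasogastric or gastrostomy tube (temporary)",
    ],
    "severe": [
        "Tube nutrition via NGT or PEG",
        "Swallowing training",
        "Surgical treatment for airway protection",
        "Patient education and management",
    ],
}


def treatment_group(pas: int) -> str:
    """Group PAS scores as in the example list: 1-3, 4-5, 6-8."""
    if not 1 <= pas <= 8:
        raise ValueError(f"PAS score must be between 1 and 8, got {pas}")
    if pas <= 3:
        return "mild"
    if pas <= 5:
        return "moderate"
    return "severe"


def guide_treatment(pas: int) -> List[str]:
    """Return the predefined treatment headings for the patient's current PAS score."""
    return TREATMENTS[treatment_group(pas)]


print(guide_treatment(5))
```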
[0106]
[0107] Meanwhile, the various embodiments described above may be implemented in a recording medium readable by a computer or a similar device using software, hardware, or a combination thereof.
[0108] According to hardware implementation, the embodiments described in this disclosure may be implemented using at least one of ASICs (Application Specific Integrated Circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), processors, controllers, microcontrollers, microprocessors, and other electrical units for performing functions.
[0109] In some cases, the embodiments described herein may be implemented as the processor itself. In a software implementation, embodiments such as the procedures and functions described herein may be implemented as separate software parts. Each of the aforementioned software parts may perform one or more functions and operations described herein.
[0110] Meanwhile, computer instructions for performing processing operations in electronic devices, etc., according to the various embodiments of the present disclosure described above may be stored in a non-transitory computer-readable medium. When computer instructions stored in such a non-transitory computer-readable medium are executed by a processor of a specific device, they cause the specific device described above to perform processing operations according to the various embodiments described above.
[0111] A non-transitory computer-readable medium refers to a medium that stores data semi-permanently and can be read by a device, unlike media that store data for a short period of time such as registers, caches, and memory. Specific examples of non-transitory computer-readable media include CDs, DVDs, hard disks, Blu-ray discs, USBs, memory cards, and ROMs.
[0112]
[0113] Although preferred embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described above. It is understood that various modifications can be made by those skilled in the art without departing from the essence of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical spirit or perspective of the present disclosure.
Claims
1. A VFSS image-based swallowing function monitoring device comprising: a VFSS (videofluoroscopic swallowing study) image acquisition unit that acquires a VFSS video of a patient's food swallowing behavior, divides it into frames, and extracts a plurality of single images; a swallowing class classification unit that identifies the swallowing class corresponding to each of the single images using a class identification model pre-trained on the correlation between an image and a swallowing class; an object detection unit that comprises five object detection models, pre-trained separately for each swallowing class on the correlation between an image and a food detection result, and that identifies the food location corresponding to each of the single images while varying the object detection model according to the swallowing class identification result; a PAS score calculation unit that tracks and monitors the food location frame by frame to identify a food location change pattern and calculates a PAS score based on the food location change pattern; and a patient condition monitoring unit that generates monitoring information including at least one of the VFSS image, the PAS score, and the food location, and provides real-time guidance.
2. The VFSS image-based swallowing function monitoring device of claim 1, wherein the swallowing class classification unit is implemented with EfficientNetV2 and is pre-trained on the correlation between an image and a swallowing class using training data having an image as an input condition and a swallowing class as an output condition.
3. The VFSS image-based swallowing function monitoring device of claim 1, wherein each of the object detection models is implemented with YOLOv7 and is pre-trained on the correlation between an image and a food detection result using training data having an image as an input condition and a food detection result as an output condition.
4. The VFSS image-based swallowing function monitoring device of claim 1, wherein the PAS score calculation unit sets the PAS score to "1" when food moves into the esophagus, sets it to "2" when food enters the airway but reaches the supraglottic region and is immediately discharged into the duct, sets it to "3" when food remains in the supraglottic region, sets it to "4" when food remains in the supraglottic region for a long time and is discharged into the duct, sets it to "5" when food remains in the supraglottic region, sets it to "6" when food reaches the lower airway and is discharged into the duct again, sets it to "7" when food remains in the lower airway and its position changes, and sets it to "8" when food remains in the lower airway and its position does not change.
5. The VFSS image-based swallowing function monitoring device of claim 1, wherein the patient condition monitoring unit further includes a function that guides a treatment method corresponding to the patient's current PAS score based on a treatment method predefined for each PAS score.
6. The VFSS image-based swallowing function monitoring device of claim 1, wherein the patient condition monitoring unit further includes a function that searches and analyzes papers uploaded to the internet to extract and guide treatment methods corresponding to the patient's current PAS score.
7. A VFSS image-based swallowing function monitoring method comprising: a step of acquiring a videofluoroscopic swallowing study (VFSS) video of a patient's food swallowing behavior, dividing it into frames, and extracting a plurality of single images; a step of identifying the swallowing class corresponding to each of the single images using a class identification model pre-trained on the correlation between an image and a swallowing class; a step of identifying the food location corresponding to each of the single images using five object detection models, pre-trained separately for each swallowing class on the correlation between an image and a food detection result, while varying the object detection model according to the swallowing class identification result; a step of tracking and monitoring the food location frame by frame to identify a food location change pattern and calculating a PAS score based on the food location change pattern; and a step of generating monitoring information including at least one of the VFSS image, the PAS score, and the food location, and providing real-time guidance.