System and method for detecting occupant illness symptoms

By using multiple sensors to detect and fuse data in shared spaces, disease symptoms can be identified and visualized, solving the problem of the inability to effectively detect disease symptoms in existing technologies and improving the ability to provide infectious disease early warning and cleanliness information.

CN114511768B · Active · Publication Date: 2026-06-23 · ROBERT BOSCH GMBH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: ROBERT BOSCH GMBH
Filing Date: 2021-10-22
Publication Date: 2026-06-23

Application Information

Patent Timeline

22 Oct 2021: Application
23 Jun 2026: Publication (CN114511768B)

IPC: G06V10/94; G06V40/70; G16H50/80

AI Tagging

Application Domain

Epidemiological alert systems; Multiple biometrics use

Technology Topics

Engineering; Source image

AI Technical Summary

Technical Problem

Existing technologies are unable to effectively detect and provide early warning of potential disease symptoms in shared spaces, increasing the risk of infectious disease transmission in public places and shared transportation.

Method used

It uses multiple sensors (such as audio, video, and radar sensors) to detect and fuse data, identify and visualize potential disease symptoms, and provide cleanliness information to service providers and occupants.

Benefits of technology

It improves the accuracy and timeliness of detecting potential disease symptoms, helps reduce the spread of infectious diseases, and provides cleanliness information to facilitate decision-making.

✦ Generated by Eureka AI based on patent content.

Abstract

The present invention relates to systems and methods for detecting occupant illness symptoms. Systems and methods for detecting occupant illness symptoms are disclosed herein. In embodiments, a memory is configured to maintain a visualization application and data from one or more sources, such as an audio source, an image source, and / or a radar source. A processor is in communication with the memory and a user interface. The processor is programmed to receive data from the one or more sources, execute a human detection model based on the received data, execute an activity recognition model that identifies illness symptoms based on the data from the one or more sources, determine a location of the identified symptoms, and execute the visualization application to display information in the user interface. The visualization application can display a background image with an overlay image that includes an indicator for each location of the identified illness symptoms. Additionally, data from the audio source, the image source, and / or the radar source can be fused.

Description

Technical Field

[0001] This disclosure relates to systems and methods for detecting disease symptoms in occupants. In some embodiments, the systems and methods are capable of detecting disease symptoms in people in public or crowded places, or in shared mobility settings such as public transportation or ride sharing.

Background Technology

[0002] Infectious diseases can spread more easily in crowded places such as restaurants, stadiums, and public buildings. The same is true for shared mobility services, such as buses, trains, taxis, and ride-sharing services. Current technology is insufficient to provide owners and / or occupants of such places and services with adequate information about cleanliness. If service providers or occupants had knowledge about potential illnesses among occupants, they could make better decisions to help contain the spread of infectious diseases.

Summary of the Invention

[0003] In one embodiment, a system for detecting occupant disease symptoms is provided. The system includes a user interface, a memory configured to maintain a visualization application and image data from an image source, and a processor. The processor communicates with the memory and the user interface. The processor is programmed to receive image data from the image source, the image data including a background image associated with an area occupied by the occupant. The processor is further programmed to: execute a human detection model configured to detect occupants within the image data; execute an activity recognition model configured to recognize image-based disease symptoms of the detected occupant based on the motion of the detected occupant within the image data; determine the location of the identified disease symptoms using the image data from the image source; and execute a visualization application to display an overlay image superimposed on the background image in the user interface. The overlay image includes an indicator for the location of each identified disease symptom, the indicator displaying information that the identified disease symptom occurred at that location.

[0004] In one embodiment, a system for detecting occupant disease symptoms includes a user interface, a memory configured to maintain a visualization application and audio data from an audio source, and a processor communicating with the memory and the user interface. The processor is programmed to: receive a background image from a camera over the area occupied by the occupant; receive audio data from the audio source; execute a classification model configured to classify portions of the audio data as indicative of disease symptoms; determine the location of the disease symptoms based on the classified portions of the audio data; and execute a visualization application to display an overlay image superimposed on the background image in the user interface, the overlay image including an indicator for the location of each determined disease symptom, the indicator displaying information that the disease symptom occurred at that location.

[0005] In another embodiment, another system for detecting occupant disease symptoms includes a user interface, a memory configured to maintain a visualization application and radar data from a radar source, and a processor communicating with the memory and the user interface. The processor is programmed to: receive a background image from a camera over the area occupied by the occupant; receive radar data from the radar source; execute a human detection model configured to detect the occupant based on the radar data; execute an activity recognition model or vital sign recognition model configured to identify radar-based disease symptoms of the detected occupant based on the radar data; determine the location of the radar-based identified disease symptoms using the radar data from the radar source; and execute a visualization application to display an overlay image superimposed on the background image in the user interface, the overlay image including an indicator indicating the location of the radar-based identified disease symptom for each determined symptom location.

Attached Figure Description

[0006] Figure 1 shows an example of a system for detecting occupant disease symptoms according to an embodiment.

[0007] Figure 2 shows the interior of a vehicle, including the location of a sensor, according to an embodiment.

[0008] Figure 3 shows the interior of a bus, including the locations of one or more sensors, according to an embodiment.

[0009] Figure 4 shows a flowchart for detecting and displaying occupant disease symptoms based on audio data, according to an embodiment.

[0010] Figure 5 shows the output of a visualization application, according to an embodiment, which highlights areas with a larger number of detected occupant disease symptoms.

[0011] Figure 6 shows a flowchart for detecting and displaying occupant disease symptoms based on image data, according to an embodiment.

[0012] Figure 7 shows an implementation of a human detection application for detecting humans based on data from sensors, according to an embodiment.

[0013] Figure 8 is a sequence of frames illustrating the use of disease detection operations or classification.

[0014] Figure 9 shows a flowchart for detecting and displaying occupant disease symptoms based on the fusion of image data and audio data, according to an embodiment.

[0015] Figure 10 shows a flowchart for detecting and displaying occupant disease symptoms based on the fusion of image data and audio data, according to another embodiment.

[0016] Figure 11 shows a flowchart for detecting and displaying occupant disease symptoms based on radar data, according to an embodiment.

[0017] Figure 12 shows a flowchart for detecting and displaying occupant disease symptoms based on radar data, according to another embodiment.

[0018] Figure 13 shows a flowchart for detecting and displaying occupant disease symptoms based on radar data, according to another embodiment.

[0019] Figure 14 shows a flowchart for detecting and displaying occupant disease symptoms based on radar data, according to another embodiment.

[0020] Figure 15 shows a flowchart for detecting and displaying occupant disease symptoms based on the fusion of radar data, image data, and audio data, according to an embodiment.

[0021] Figure 16 shows a flowchart for detecting and displaying occupant disease symptoms based on the fusion of radar data, image data, and audio data, according to an embodiment.

Detailed Implementation

[0022] Embodiments of this disclosure are described herein. However, it should be understood that the disclosed embodiments are merely examples and other embodiments can take various alternative forms. The drawings are not necessarily to scale; and some features may be enlarged or reduced to show details of specific components. Therefore, the specific structural and functional details described herein are not intended to be limiting, but are merely intended as a representative basis for teaching those skilled in the art to broadly apply the embodiments. As those skilled in the art will understand, the various features illustrated and described with reference to any of the drawings can be combined with features in one or more other drawings to produce embodiments not explicitly shown or described. The combinations of features shown provide representative embodiments of typical applications. However, for a particular application or implementation, various combinations and modifications of features consistent with the teachings of this disclosure may be expected.

[0023] People increasingly rely on shared mobility services, such as buses, trains, taxis, and ride-hailing services like Uber and Lyft. In these shared mobility services, many different people occupy public spaces at different times. With the spread of new infectious diseases, the risk of infection increases when sharing such public spaces. Current technologies cannot provide occupants with sufficient information to assess the cleanliness of shared spaces. This disclosure proposes several novel techniques to assist occupants of shared mobility services in making informed decisions based on disease-indicating activities (such as coughing or sneezing events) of previous occupants, indicated by one or more different types of sensors (such as audio sensors, video sensors, and / or radar sensors). If more than one different type of sensor is used to detect potential disease symptoms in occupants, the sensor data can be fused.

[0024] In other embodiments, the sensors are used in other large, crowded environments, such as restaurants, public buildings, concert venues, and sporting events. The sensors can be used to detect symptoms of illness in occupants of these locations.

[0025] This disclosure also proposes providing such information to providers (e.g., owners or managers) of fleets (such as vehicle rental services). For example, one or more of the sensors described herein can be placed in each vehicle of the fleet and used to detect illness symptoms of occupants in that vehicle and communicate this information to the fleet provider. When the fleet provider learns that a previous occupant may be ill due to the detection of signs such as coughing or sneezing, the fleet provider can disinfect such vehicles and notify co-occupants or subsequent occupants of the potential for infection. This information can also help city planners to understand broadly which routes germs spread more rapidly and the associated symptoms.

[0026] Figure 1 shows an example system 100 for detecting occupant disease symptoms and visualizing the detected symptoms. System 100 can also be referred to as a detection and visualization system because it is at least partially configured to process images and determine specific features or qualities of the images that represent occupant disease symptoms, and to provide a visualization of the detected symptoms so that the occupant or another user can make informed decisions and take action. In other embodiments, the system uses audio or radio frequency (RF) data to determine occupant disease symptoms. The illustrated system 100 is configured not only to detect occupant disease symptoms but also to display information about the symptoms (e.g., image annotations or image overlays) so that users can act on data illustrating the detected or determined symptoms.

[0027] In one or more embodiments, system 100 is configured for capturing image data 102. Combined with or separated from image data 102, system 100 may be configured to capture and process audio data 104 and / or radar data 106. System 100 includes a server 108 hosting a visualization application 110 accessible via a network 114 by one or more client devices 112. Server 108 includes a processor 116 operatively connected to memory 118 and network device 120. Server 108 further includes an image data input source 122 for receiving image data 102, operatively connected to processor 116 and memory 118. Server 108 may also include an audio data input source 124 for receiving audio data 104, operatively connected to processor 116 and memory 118. Server 108 may also include a radar data input source 126 for receiving radar data 106, operatively connected to processor 116 and memory 118. Client device 112 includes a processor 128 operatively connected to memory 130, a display device 132, a human-machine interface (HMI) control 134, and a network device 136. Client device 112 may allow an operator to access network client 138.

[0028] It should be noted that the illustrated system 100 is only an example, and other systems composed of multiple units can be used. For example, although only one client device 112 is shown, a system 100 including multiple client devices 112 is contemplated. As another possibility, although the example implementation is shown as a network-based application, alternative systems can be implemented as standalone systems, local systems, or client-server systems with thick client software. Data 102, 104, 106 from components such as image source 122, audio source 124, and radar source 126 can be received and processed locally at the client devices of system 100 instead of at server 108.

[0029] Each of the processor 116 of server 108 and the processor 128 of client device 112 may include one or more integrated circuits that implement the functions of a central processing unit (CPU) and / or a graphics processing unit (GPU). In some examples, processors 116, 128 are system-on-a-chip (SoCs) that integrate the functions of the CPU and GPU. The SoC may optionally include other components such as memory 118 and network devices 120 or 136 into a single integrated device. In other examples, the CPU and GPU are connected to each other via a peripheral connectivity device such as PCI express or another suitable peripheral data connection. In one example, the CPU is a commercially available central processing unit that implements an instruction set such as x86, ARM, Power, or MIPS instruction set families.

[0030] Regardless of the specific circumstances, during operation, processors 116 and 128 execute stored program instructions retrieved from memories 118 and 130, respectively. The stored program instructions therefore include software that controls the operation of processors 116 and 128 to perform the operations described herein. Memories 118 and 130 may include both non-volatile memory and volatile memory devices. Non-volatile memory includes solid-state memory such as NAND flash memory, magnetic and optical storage media, or any other suitable data storage device that retains data when system 100 is disabled or powered off. Volatile memory includes static and dynamic random access memory (RAM), which stores program instructions and data during operation of system 100.

[0031] The GPU of client device 112 may include hardware and software for displaying at least two-dimensional (2D) and optionally three-dimensional (3D) graphics to display device 132 of client device 112. Display device 132 may include an electronic display screen, projector, printer, or any other suitable device for reproducing the graphics display. In some examples, processor 128 of client device 112 uses the hardware capabilities of the GPU to execute software programs to accelerate the performance of machine learning or other computational operations described herein.

[0032] The HMI control 134 of client device 112 may include any of a variety of devices that enable client device 112 of system 100 to receive control input from workers, fleet vehicle managers, or other users. Examples of suitable input devices for receiving human-machine interface input may include a keyboard, mouse, trackball, touchscreen, voice input device, graphical tablet, etc. As described herein, the user interface may include any one or both of display device 132 and HMI control 134.

[0033] Network devices 120 and 136 may each include any of a variety of devices that enable server 108 and client device 112 to send and / or receive data from external devices via network 114, respectively. Examples of suitable network devices 120 and 136 include network adapters or peripheral interconnect devices that receive data from another computer or external data storage device, which are capable of receiving large datasets in an efficient manner.

[0034] Visualization application 110 is an example of a network application executed by server 108. When executed, visualization application 110 can use various algorithms to perform aspects of the operations described herein. In the example, visualization application 110 may include instructions stored in memory 118 and executable by processor 116 of server 108, as discussed above. Computer-executable instructions can be compiled or interpreted from computer programs created using various programming languages and / or technologies, including, but not limited to, individually or in combination, Java, C, C++, C#, Visual Basic, JavaScript, Python, Perl, PL / SQL, etc. Generally, processor 116 receives instructions, for example, from memory 118, computer-readable media, etc., and executes these instructions to perform one or more processes, including one or more of the processes described herein. Such instructions and other data can be stored and transmitted using various computer-readable media.

[0035] The network client 138 may be a web browser or other web-based client executed by the client device 112. When executed, the web client 138 may allow the client device 112 to access the visualization application 110 to display the user interface of the visualization application 110. The network client 138 may further provide input received via HMI control 134 to the visualization application 110 of the server 108 via the network 114.

[0036] In artificial intelligence (AI) or machine learning systems, model-based reasoning refers to inference methods that operate on a machine learning model 140 of the worldview to be analyzed. Generally, the machine learning model 140 is trained to learn a function that provides a precise correlation between input and output values. At runtime, the machine learning engine applies the knowledge encoded in the machine learning model 140 to observed data to derive conclusions such as diagnoses or predictions. An example machine learning system may include the TensorFlow AI engine, available from Alphabet Inc. in Mountain View, California, although other machine learning systems may be used additionally or alternatively. As discussed in detail herein, the visualization application 110 communicates with the machine learning model 140 and can be configured to identify features of image data 102 for use in efficient and scalable ground-truth generation systems and methods that produce high-precision (pixel-level accuracy) annotations for developing object detection / localization and object tracking. In some embodiments, the visualization application 110 communicates with the machine learning model 140 and can be configured to identify audio features or patterns of audio data 104 for use in similar systems to generate visual output on a display 132 or web client 138 at the location of the audio source. In some embodiments, the visualization application 110 communicates with the machine learning model 140 and can be configured to identify radar features or patterns in the radar data 106 for use in similar systems to generate visual output on a display 132 or web client 138 at the location of a person who is a target detectable by radar. In short, the visualization application may include or communicate with the machine learning model 140 for performing image recognition (e.g., steps 606-612 of Figure 6), audio recognition (e.g., steps 406-412 of Figure 4), and / or radar recognition (e.g., steps 1106-1112 of Figure 11), and / or any fusion step including two or more of these techniques.

[0037] Image data input source 122 may be a camera, for example, mounted in a location such as a vehicle, fleet of vehicles, public transportation, a restaurant, an airplane, a movie theater, or other location where large-scale traffic or gatherings of people occur, or other locations where the presence and location of a person exhibiting symptoms of illness may be determined. Image data input source 122 is configured to capture image data 102. In another example, image data input source 122 may be an interface, such as a network device 120 or an interface to memory 118, for retrieving previously captured image data 102. Image data 102 may be a single image or a video recording, such as a sequence of images. Each image in image data 102 may be referred to herein as a frame. For privacy reasons, faces and license plates may be blurred from image data 102 for certain annotation or visualization tasks.

[0038] Audio source 124 may be an acoustic sensor or microphone mounted in the exemplary locations described above and configured to detect and locate events of interest (e.g., areas where disease symptoms occur). Audio source 124 is configured to capture audio data 104. In another example, audio source 124 may be an interface, such as network device 120 or an interface to memory 118, for retrieving previously recorded audio data 104. Audio data 104 may be audio received from audio source 124 (e.g., a microphone), which can be detected and / or recorded continuously whenever audio source 124 is enabled. As will also be described herein, audio source 124 may be multiple audio sources 124 arranged in an array or at various locations, thereby allowing triangulation or another determination of the location of a subject occupant with disease symptoms.

[0039] Radar source 126 can be a contactless sensor configured to detect human vital signs, such as respiration, respiratory rate, heart rate, heart rate variability, and human emotions, by analyzing the interaction between radio frequency signals and physiological movements, without any contact with the human body. A non-limiting example of such radar source 126 is a Doppler SD radar, in which a continuous wave (CW) narrowband signal is transmitted, reflected from a human target, and subsequently demodulated in the receiver of radar source 126. Other radar sources 126 include ultra-wideband (UWB) radar or other CW radar devices, or millimeter-wave sensors, such as 60-GHz or 77-GHz millimeter-wave sensors.

[0040] Figure 2 illustrates an embodiment of the placement of sensor 200 within vehicle 202. Vehicle 202 may be a passenger vehicle, such as a sedan, van, truck, SUV, etc. As described herein, in other embodiments, the vehicle is a bus, train, airplane, or other public transport vehicle. The sensor can be one or more of image source 122, audio source 124, radar source 126, or any combination thereof. The deployment and placement of the sensor may depend on the environment. For example, in the illustrated embodiment, sensor 200 is mounted on or attached to the dashboard 204 of vehicle 202. In other embodiments, sensor 200 is mounted on or attached to the windshield 206, rearview mirror 208, or other locations within vehicle 202. In this embodiment, sensor 200 is positioned such that it can appropriately receive image data, audio data, and / or radar data from occupants within vehicle 202.

[0041] Instead of using a single sensor 200, arrays or multiple sensors 200 can be placed throughout the vehicle. In embodiments where the vehicle is a bus or other large, multi-passenger vehicle, multiple sensors 200 can be utilized throughout the vehicle. More sensors can be used to cover large shared mobility spaces, such as in buses or trains. As an example, Figure 3 illustrates the deployment of multiple sensors 200 in bus 302. Sensors can be deployed in other areas of bus 302, including the ceiling, under or above the seats, and other locations.

[0042] As described herein, sensor 200 can be used in any vehicle, particularly for transporting multiple occupants simultaneously (e.g., buses) or separately at different times (e.g., ride-hailing or fleet vehicles, vehicle rentals, etc.). Similarly, sensor 200 can be located in non-vehicle locations such as restaurants, public buildings, airports, arenas, stadiums, and other such locations where high pedestrian traffic or density is likely. In short, the descriptions and illustrations provided herein are not intended to limit sensor 200 to use solely within vehicles.

[0043] Figure 4 is a flowchart illustrating an embodiment of a system 400 for detecting events indicative of occupant disease symptoms, locating the events, and visually displaying relevant information. These steps can be performed using at least some of the structures shown in Figure 1, such as processors 116 and 128, audio source 124, memory 118, audio data 104, etc. In this embodiment, one or more of the sensors 200 are placed around a desired location with an occupant, such as the aforementioned vehicle, building, etc. In this embodiment, one or more of the sensors include audio source 124, such as a microphone. Audio source 124 is configured to continuously listen for sounds at a specific sampling rate when in use. In other words, at 402, the system receives audio data 104, such as acoustic signals, from audio source 124.

[0044] System 400 can include a preprocessing step at 404. The captured audio data 104 is denoised using filters. The audio data 104 is then segmented using a sliding window algorithm. In addition, privacy-preserving audio processing can be used to meet user privacy requirements. For example, the system can be configured to selectively cancel or reject human speech from a continuous audio stream using a voice activity detection (VAD) algorithm. By performing VAD in the preprocessing stage, unnecessary encoding or transmission of silence packets can be avoided, or noise or irrelevant speech can be removed, thereby saving computation and network bandwidth. Various embodiments of VAD are envisioned and should be included within the scope of this disclosure. For example, many VAD systems follow a general architecture of: (i) first performing noise reduction, then (ii) calculating features or qualities from a segment of the input signal, such as audio data 104, and then (iii) applying classification rules to classify the segment as speech or non-speech, optionally applying a threshold and comparing the classified noise with the threshold.
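As an illustration only, the sliding-window segmentation and the silence-dropping side of this preprocessing might be sketched as follows. This is a minimal energy gate, not a speech / non-speech VAD classifier, and every name and parameter here (sliding_windows, active_window_starts, the window length, hop, and threshold) is a hypothetical choice that does not come from the disclosure.

```python
from typing import List, Sequence, Tuple


def sliding_windows(samples: Sequence[float], win: int, hop: int) -> List[Tuple[int, Sequence[float]]]:
    """Split samples into fixed-length windows, returning (start_index, window) pairs."""
    return [(start, samples[start:start + win])
            for start in range(0, len(samples) - win + 1, hop)]


def short_time_energy(window: Sequence[float]) -> float:
    """Mean squared amplitude of one window."""
    return sum(x * x for x in window) / len(window)


def active_window_starts(samples: Sequence[float], win: int, hop: int, threshold: float) -> List[int]:
    """Start indices of windows whose energy exceeds the (assumed) threshold.

    Windows at or below the threshold are treated as silence and dropped,
    so later stages never encode or transmit them.
    """
    return [start for start, window in sliding_windows(samples, win, hop)
            if short_time_energy(window) > threshold]


if __name__ == "__main__":
    # Eight silent samples, a four-sample burst, then four silent samples.
    audio = [0.0] * 8 + [1.0] * 4 + [0.0] * 4
    print(active_window_starts(audio, win=4, hop=4, threshold=0.5))
```

In practice the threshold would have to be tuned to the noise floor of the deployment site, and a real VAD would add the feature and classification stages described above.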

[0045] System 400 can also include a feature extraction model or application at 406. At this step, the relevant audio data, which has been denoised and filtered as described above, is extracted for analysis. Relevant features of the audio data can be extracted using Mel-frequency cepstral coefficients (MFCC), a SoundNet convolutional neural network (CNN), or other types of machine learning, temporal features, frequency domain features, and / or combinations thereof. Depending on the type of feature extraction algorithm, the extracted data (audio feature representation) can be stored as a multidimensional vector or matrix.
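To make the idea of an audio feature representation concrete, the sketch below computes two simple temporal features per window, root-mean-square amplitude and zero-crossing rate, and stores them as a vector. It deliberately does not implement MFCC or a CNN; the function names and the choice of these two features are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def rms(window: Sequence[float]) -> float:
    """Root-mean-square amplitude of the window."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def zero_crossing_rate(window: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(window) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(window) - 1)


def feature_vector(window: Sequence[float]) -> List[float]:
    """Two-dimensional temporal feature vector: [rms, zero_crossing_rate]."""
    return [rms(window), zero_crossing_rate(window)]


if __name__ == "__main__":
    print(feature_vector([1.0, -1.0, 1.0, -1.0]))
    print(feature_vector([0.5, 0.5, 0.5, 0.5]))
```

Stacking one such vector per window yields the matrix form mentioned in the paragraph above.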

[0046] System 400 can also include a classification model or application at 408. At this step, a classifier is used to classify audio events. A portion of the preprocessed and extracted audio data can be classified as sneezing, coughing, shortness of breath, or other audio that indicates the likelihood of an occupant's illness. For this purpose, a Support Vector Machine (SVM), Random Forest, or Multilayer Perceptron classifier can be used. The machine learning model 140 described herein can be implemented for this purpose. Similarly, audio feature learning and classification can be performed end-to-end using deep audio analysis algorithms, where time-domain waveforms are used as input. For example, a CNN with 34 weight layers can efficiently optimize very long sequences, such as vectors of size 32,000, to process acoustic waveforms. This can be achieved using batch normalization and residual learning. An example of such a model is discussed in "Very deep convolutional neural networks for raw waveforms" presented by Wei Dai, Chia Dai, Shuhui Qu, Juncheng Li, and Samarjit Das at the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
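The classification step can be pictured with a much simpler stand-in than the SVM, Random Forest, or deep models named above. The sketch below uses a nearest-centroid classifier over feature vectors; the labels, the training values, and the function names are all invented for illustration and are not data or methods from the disclosure.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(examples: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Compute one centroid per label from labeled example vectors."""
    return {label: centroid(vectors) for label, vectors in examples.items()}


def classify(model: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], features))


if __name__ == "__main__":
    # Hypothetical two-dimensional feature vectors for two event classes.
    model = train({
        "cough": [[0.9, 0.2], [0.8, 0.3]],
        "background": [[0.1, 0.1], [0.0, 0.2]],
    })
    print(classify(model, [0.85, 0.25]))
```

A production system would replace this with one of the classifiers listed in the paragraph, trained on labeled recordings.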

[0047] System 400 can also include angle of arrival (AoA) estimation or determination at 410. AoA can be implemented to estimate the location of a sound source, such as a cough or sneeze. To perform this, the system can include multiple sensors 200 or audio sources 124. Beamforming algorithms can be used to estimate the AoA of the incoming acoustic signal. If the audio source is a microphone, for example, this can be achieved using a single microphone array with multiple microphones together with delay-and-sum beamforming and multiple signal classification (MUSIC) algorithms.

[0048] After AoA estimation is completed, the localization process can occur at 412. Audio direction-finding techniques, such as triangulation, can be implemented. This provides the source location of the event of interest (e.g., coughing, sneezing, etc.). In a simplified example, the location of the sound source being analyzed can be determined by a processor by measuring the time difference between each audio source receiving the audio. For example, if an array of microphones is used, the time between the first and second microphones receiving the audio signal is recorded by the processor(s) and compared with the time between the second and third microphones receiving the audio signal. This process can continue for any number of sensors provided at the system's location.
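The time-difference idea can be sketched for a single pair of microphones. The example below is a simplification rather than delay-and-sum beamforming or MUSIC: it finds the lag that maximizes the cross-correlation between two channels and converts it to a far-field angle. The speed of sound, sample rate, microphone spacing, and function names are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed for air at room temperature

def best_lag(ref, other, max_lag):
    """Lag in samples by which `other` trails `ref`, found by maximizing
    the cross-correlation over lags in [-max_lag, max_lag]."""
    best, best_score = 0, float("-inf")
    n = len(ref)
    for lag in range(-max_lag, max_lag + 1):
        score = sum(ref[i] * other[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best, best_score = lag, score
    return best

def angle_of_arrival(lag_samples, sample_rate, mic_spacing):
    """Far-field angle in degrees from broadside for a two-microphone pair."""
    delay = lag_samples / sample_rate
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(ratio))

# The second microphone receives the same short pulse three samples later.
mic1 = [0.0] * 20 + [1.0, 0.5, -0.5] + [0.0] * 20
mic2 = [0.0] * 3 + mic1[:-3]
lag = best_lag(mic1, mic2, max_lag=10)
angle = angle_of_arrival(lag, sample_rate=48000, mic_spacing=0.1)
```

With three or more microphones, the pairwise delays obtained this way could be combined to triangulate a position rather than a single bearing.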

[0049] In another embodiment, as shown in Figure 4, after the classification at 408, the AoA estimation can be skipped and the localization at 412 can be performed based on the intensity of the acoustic signal itself, instead of using data from the AoA estimation step 410.

[0050] At step 414, the system then performs time-series aggregation. In this step, audio events of interest detected throughout the day are aggregated. The system is able to calculate how many times each audio event occurred in each region of the location. For example, if the system is implemented on a bus, the aggregation can compile the number of sneezing or coughing events that occurred at a specific seat on the bus. In the case of a restaurant, the aggregation can compile the number of sneezing or coughing events that occurred at a specific table in the restaurant. This aggregation 414 can aggregate the number of audio events indicating a disease at each audio source (e.g., a microphone) or at each defined (e.g., triangulated) location. The aggregation results can be stored locally on memory 118 or stored in the cloud via network 114.

[0051] The result of aggregation 414 can trigger a flag in the system, indicating that a specific region of interest has been exposed to a large number of occupant symptoms and requires disinfection. For example, aggregation could indicate via audio signal processing that a specific seat on a bus has been exposed to a large number of occupant symptoms, and that area of the bus could be flagged as infected until the seat is cleaned. The number of detected occupant symptoms can be compared to a threshold for flagging an area as infected. For example, the threshold could be three, so that if the system detects three occupant symptoms (e.g., a cough or sneeze detected from the audio signal) since the most recent cleaning, the system flags the area as infected until it is cleaned again. After the target area is disinfected, the aggregation can be reset to zero.
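The counting, thresholding, and reset behavior can be sketched as a small data structure. This is an illustration under assumptions, not the disclosed implementation: the class name, the region labels, and the in-memory storage are hypothetical, and the threshold of three is taken from the example above.

```python
from collections import Counter, defaultdict

class SymptomAggregator:
    """Counts symptom events per region since that region was last cleaned."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = defaultdict(Counter)  # region -> event label -> count

    def record(self, region, event):
        """Add one detected event (e.g. 'cough') at a region (e.g. a seat)."""
        self.counts[region][event] += 1

    def total(self, region):
        """Total number of events recorded at a region since its last reset."""
        return sum(self.counts[region].values()) if region in self.counts else 0

    def flagged(self):
        """Regions whose total count has reached the threshold."""
        return [r for r in sorted(self.counts) if self.total(r) >= self.threshold]

    def mark_cleaned(self, region):
        """Reset the count for a region after it has been disinfected."""
        self.counts.pop(region, None)

agg = SymptomAggregator(threshold=3)
for region, event in [("seat_12", "cough"), ("seat_12", "sneeze"),
                      ("seat_3", "cough"), ("seat_12", "cough")]:
    agg.record(region, event)
flagged_before = agg.flagged()   # ['seat_12']
agg.mark_cleaned("seat_12")
flagged_after = agg.flagged()    # []
```

Keeping per-event counts as well as totals leaves room for a visualization that distinguishes, for example, coughs from sneezes.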

[0052] The system can then perform visualization at 416. At this step, the aggregated information from 414 is displayed to personnel in a visually friendly format. The visualization can be viewed locally or, when transmitted over network 114, at client device 112 (e.g., a display device or user interface). In one example, a "heatmap" can be displayed to personnel for visualization. The heatmap can be color-coded, showing different colors at each location corresponding to the number of disease symptoms detected at those locations. The visualization may include a background image. The background image can be a static single image of the occupant's location (e.g., an empty bus). Alternatively, the background image can be a live view of the occupant's location (e.g., a video). A heatmap with colors corresponding to the locations of detected disease symptoms can be overlaid on the background image.

[0053] Figure 5 presents an example of a visualization 500 shown on a monitor for human observation. The images shown in Figure 5 are from an image or video source, such as a camera or image source 122. In this example, image source 122 is installed inside the bus to display real-time images of the interior of bus 502. The system can be pre-programmed such that the locations shown in the images are matched with corresponding locations detected from audio source 124. In other words, the locations of disease symptoms detected by audio source 124, as explained herein, can be superimposed on the image from image source 122. Matching can be performed at an initial step between the locations shown in the images and the locations determined by the audio source, so that the processor can perform simple color coding on the images in the regions that match the determined locations of the disease symptoms detected from audio source 124.

[0054] In the embodiment illustrated in Figure 5, an overlay image 504 is superimposed on the majority of the background image 502. In this embodiment, the overlay image 504 includes a blue or dark tone where no disease symptoms have been detected. In other embodiments, the overlay image 504 is transparent such that the background image 502 is not distorted or color-coded in areas where no disease symptoms have been detected. Using the system explained herein, the signal received from the audio source 124 is processed, and the locations of detected occupant disease symptoms are determined. These locations correspond to different color tones or shades, as shown in regions 506 and 508. Region 506 may correspond to a location with five recently detected occupant disease symptoms, while region 508 may correspond to a location with four recently detected occupant disease symptoms. Regions 506 and 508 are also part of the overlay image 504 superimposed on the background image 502. Thus, the heatmap shows region 506, which is redder or has a brighter color, superimposed on image 502. The heatmap shown in Figure 5 is merely one example of an indicator displaying detected occupant disease symptoms occurring at locations 506 and 508. In other embodiments, instead of a color-coded heatmap, the overlay image 504 can display boxes, stars, circles, or other such indicators corresponding to areas where disease symptoms were detected.
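One simple way to produce such an overlay is to map each region's count to a color and blend that color into the background pixels. The sketch below assumes a linear blue-to-red scale and a fixed blending weight; both choices, and the function names, are illustrative assumptions rather than details given in this disclosure.

```python
def count_to_color(count, max_count):
    """Map a symptom count to an RGB color from blue (none) to red (many)."""
    level = 0.0 if max_count <= 0 else min(count / max_count, 1.0)
    return (round(255 * level), 0, round(255 * (1.0 - level)))

def blend(background, overlay, alpha=0.4):
    """Alpha-blend one overlay pixel onto one background pixel."""
    return tuple(round((1.0 - alpha) * b + alpha * o)
                 for b, o in zip(background, overlay))

# Regions with five and four detections, as in the example of Figure 5.
color_506 = count_to_color(5, max_count=5)   # reddest
color_508 = count_to_color(4, max_count=5)   # slightly less red
color_none = count_to_color(0, max_count=5)  # blue tone
pixel = blend((100, 100, 100), color_506, alpha=0.4)
```

Setting `alpha` to zero for regions with no detections would give the transparent variant in which the background image is left unaltered.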

[0055] This exemplary visualization 500 can be shown in various configurations. For example, the visualization can be provided to the owner or manager of a location, such as a fleet of vehicles, buses, restaurants, etc. Alternatively, the visualization can be displayed on the smartphones or mobile devices (e.g., client device 112) of passengers or occupants at the location so that they can make informed decisions about locations to avoid, reducing the chance of infection transmission. Network 114 can convey such information to mobile devices through the exemplary structure explained herein. The visualization can also be integrated into augmented reality (AR) applications on the mobile devices of passengers or occupants. The visualization can also be provided on displays installed within an area (e.g., inside a bus) to notify the current occupants of potential contamination.

[0056] In another embodiment, instead of displaying augmented information, the aggregated information can be stored locally, and a user can be notified when near a location where a large number (e.g., above a threshold) of occupant disease symptoms have been detected. Each sensor 200 may be equipped with a speaker and can output audio notifications when a user approaches such a potentially contaminated area that has not yet been cleaned.

[0057] Figure 6 shows a flowchart of an embodiment of a system 600 for detecting events indicative of occupant disease symptoms, locating the events, and visually displaying relevant information. Again, these steps can be performed using at least some of the structures shown in Figure 1, such as processors 116 and 128, image source 122, memory 118, image data 102, etc. In this embodiment, one or more of the sensors 200 are placed around a desired location with an occupant, such as the aforementioned vehicle, building, etc. In this embodiment, one or more of the sensors include image source 122, such as a camera. Image source 122 is configured to continuously capture images or a series of images (video) at a specific sampling rate when in use. In other words, at 602, the system receives image data 102, such as captured images, from image source 122.

[0058] System 600 can include a preprocessing step at 604. For consistency across all images fed into the system, the captured images can be resized to a base size at 604. The captured images can also be denoised to smooth the image and remove unwanted noise. One example of denoising is Gaussian blur. Still within the preprocessing step at 604, the image can be segmented to separate the foreground objects from the background. Other preprocessing functions can be performed to prepare the image for processing such as human detection and feature extraction.

[0059] Once the image has been preprocessed at 604, the system performs a human detection step at 606. One or more object detection techniques can be used, such as You Only Look Once (YOLO), Single-Shot Multi-Box Detector (SSD), Faster R-CNN, and so on. Many of these object detection techniques utilize pre-trained models for "human" or "person" detection. This can be performed, for example, as part of machine learning model 140.

[0060] Figure 7 shows an image 700 of occupants within a test area (such as a bus). The human detection technique in step 606 provides bounding boxes around each detected human, such as the bounding boxes 702, 704, and 706 shown in yellow in Figure 7. Some object detectors (such as YOLO) also provide outputs including a confidence percentage that the detected target is actually a human. By default, a bounding box is placed around a human only if a certain confidence level (e.g., 50% or higher) is met. However, this confidence threshold can be adjusted.
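The adjustable confidence threshold amounts to a filter over the detector's output. The sketch below assumes a hypothetical `Detection` record with a label, a confidence between 0 and 1, and a pixel bounding box; real detectors such as YOLO return their results in their own formats, which would need to be converted first.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # class name reported by the detector
    confidence: float   # detector confidence in the range [0, 1]
    box: tuple          # (x_min, y_min, x_max, y_max) in pixels

def keep_people(detections, min_confidence=0.5):
    """Keep only 'person' detections at or above the confidence threshold."""
    return [d for d in detections
            if d.label == "person" and d.confidence >= min_confidence]

detections = [
    Detection("person", 0.91, (40, 60, 180, 400)),
    Detection("person", 0.42, (300, 80, 380, 260)),
    Detection("backpack", 0.88, (200, 300, 260, 380)),
]
default_boxes = keep_people(detections)                     # one person kept
relaxed_boxes = keep_people(detections, min_confidence=0.4)  # two people kept
```

Lowering the threshold trades fewer missed occupants for more false detections, which is why it is worth exposing as a parameter.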

[0061] Returning to Figure 6, if a human is detected at 606, the system can perform feature extraction or modeling at 608. At this step, relevant visual features are extracted from each person for action recognition to identify sneezing, coughing, or other such movements that might indicate an underlying disease. To capture spatiotemporal features, a two-dimensional (2D) convolutional network (ConvNet) can be inflated into a three-dimensional (3D) convolutional network, and the inflated 3D ConvNet (I3D) features can be used. The filters and pooling kernels of a very deep image classification ConvNet can be expanded into 3D, making it possible to learn seamless spatiotemporal feature extractors from images or videos. Alternatively, deep convolutional networks such as VGG16 (Simonyan, Karen, and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556 (2014)) or ResNet (He, Kaiming, et al., "Deep residual learning for image recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)) can be used to extract spatial features, which are then fed into long short-term memory (LSTM) based networks for action recognition. A sliding window can be used to capture the features of each person within that time window. Similarly, neural networks such as OpenFace (Amos, Brandon, Bartosz Ludwiczuk, and Mahadev Satyanarayanan, "OpenFace: A general-purpose face recognition library with mobile applications," CMU School of Computer Science 6 (2016)) or DeepFace (Taigman, Yaniv, et al., "DeepFace: Closing the gap to human-level performance in face verification," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)) can be used to capture facial features.
By using these feature extraction systems, facial features can be used for activity recognition and detection of additional health parameters. For example, facial feature extraction systems can extract human facial or body features, which can then be used to detect potential illness symptoms such as sneezing, coughing, runny nose, red eyes, fatigue, rashes, or body aches. Thus, a person's nose, eyes, mouth, and hands can be detected and extracted at 608 using a feature extraction model.

[0062] Privacy-preserving technologies can be used to protect the privacy of occupants. In one embodiment, the pixels of the captured image are transformed in a way that facial recognition algorithms cannot identify people, but features used for activity recognition are minimally affected by this transformation.

[0063] With the facial and body features extracted, an activity recognition step can be performed at 610. At this step, a classifier is used to classify human activities using the extracted visual features. For this purpose, a fully connected layer can be added after the feature map extracted in step 608. Alternatively, a Support Vector Machine (SVM), Random Forest, or Multilayer Perceptron classifier can be used. The classifier can classify visual events into the following events of interest: sneezing, coughing, shortness of breath, runny nose, tearing, red eyes, fatigue, body aches, and / or vomiting. This can be referred to as performing a disease detection operation or, more broadly, as an activity recognition model. The model can use machine learning systems, such as those described herein.

[0064] As an example, Figure 8 shows a sequence of frames of a person sneezing. When a person sneezes or coughs, the activity can be classified by detecting whether the person's hand covers their face during the event and combining this with head movements. This is an example of the output of a disease detection operation using image data.

[0065] The classifier can also categorize visual events as someone disinfecting an area by indicating that someone is wiping or spraying the area. This can be recorded as a positive cleaning event, which can reset time-series aggregations, or it can be used to update the area cleanliness stored in the system.

[0066] Returning to Figure 6, the system can perform localization at 612. At this step, the location of the event is estimated using the coordinates of the bounding box of the person of interest. This can be achieved through depth analysis of the person relative to his or her surroundings within the field of view. This can be performed using a single image capture device or multiple image capture devices (for additional confidence). A prior calibration step can be provided to map how each pixel of the image capture devices relates to a physical real-world location. One or more of the image sources 122 can be equipped with onboard depth detection to determine the depth (e.g., distance from the image source) of any given target within the image. Alternatively, such information can be determined by an off-board system that analyzes the image using known variables such as the location of the image sources, distances between certain features in the image, etc.
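One common way to realize a pixel-to-location calibration, offered here only as an assumption, is a planar homography that maps image pixels on the floor plane to floor coordinates. The sketch takes the bottom-center of a bounding box as the point where the person meets the floor and applies a 3x3 matrix obtained from a prior calibration. The matrix values and function names are hypothetical, and the approach does not use the onboard depth sensing mentioned above.

```python
def foot_point(box):
    """Bottom-center of a bounding box (x_min, y_min, x_max, y_max)."""
    x_min, _y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, y_max)

def apply_homography(h, point):
    """Map a pixel to floor coordinates using a 3x3 homography matrix `h`."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)

# Hypothetical calibration result: a pure scale of 1 cm per pixel.
H = [[0.01, 0.0, 0.0],
     [0.0, 0.01, 0.0],
     [0.0, 0.0, 1.0]]
location_m = apply_homography(H, foot_point((100, 50, 300, 400)))
```

A real calibration matrix would also encode perspective, so the bottom row would generally differ from `[0, 0, 1]`.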

[0067] At 614, the system then performs time-series aggregation similar to step 414 of Figure 4. Events of interest detected throughout the day are aggregated to determine the cleanliness of a specific area. For example, the processor can calculate and store how many times each captured potential disease event (e.g., coughing, sneezing, etc.) occurred in each area of the field of view. This can be calculated locally at each sensor or in the cloud. The value can be automatically reset after someone has been detected cleaning the area. Alternatively, or additionally, the value can be reset after a specific amount of time (e.g., 12 hours or a whole night) has elapsed without human activity, or it can be manually reset.

[0068] After time-series aggregation has been performed, the information can be presented to the user via visualization at point 616. The visualization can be similar to the visualization at point 416 described above. Specifically, the image in the field of view of the image capture device can be overlaid with a "heatmap," which changes in intensity or color based on the number of potential disease events detected in these areas.

[0069] In another embodiment, in addition to an RGB camera, a thermal camera can be used as an additional image capture device. The thermal camera can be used to estimate the body temperature of a detected human in order to detect a potential fever, thereby enhancing the analysis described above.

[0070] Figure 9 and Figure 10 illustrate flowcharts of embodiments of a system for detecting events indicative of occupant disease symptoms, locating the events, and visually displaying relevant information while using a fusion of audio and visual data. In the embodiments of Figure 9 and Figure 10, image data 102 and audio data 104 are fused together to improve the system's recognition capability. The sensor described above may include both audio and image sources. Alternatively, the test area may be equipped with separate arrays of audio and image sources distributed throughout the area.

[0071] Referring to Figure 9, an embodiment of a system 900 for detecting and displaying occupant disease symptoms using the fusion of audio and image data is shown. Regarding the audio data 104, the acoustic signal is acquired at 402 and preprocessed at 404, and feature extraction is performed at 406. These steps are similar to those described with reference to Figure 4. Regarding image data 102, an image is captured from an image source (e.g., a camera) at 602. The image is preprocessed at 604, human detection is performed at 606, and feature extraction is performed at 608. These steps are similar to those described with reference to Figure 6.

[0072] A fusion layer is added at 902 to fuse the audio data from steps 402, 404, and 406 with the image data from steps 602, 604, 606, and 608. Fusion can be implemented to confirm or improve the confidence level of the obtained data. For example, a subset of occupant disease symptom data detected from a single individual may indicate that that individual has the disease, but not all individuals will indicate all possible symptoms of the disease. Furthermore, some symptom indications may not be as severe as others. The accuracy of disease symptom determination can be indicated using a probability scale. The information needed to determine the probability scale can be obtained from any of the various resources available.

[0073] Accuracy can be improved when fusing audio and image data. For example, if the angle of arrival of a cough determined from the audio source coincides with the location, determined from the image source, of the head movement associated with that cough, the cough data can be determined to be accurate and reliable. In the case of feature fusion, events of interest are detected using the fused feature map. Downstream of the fusion, an activity recognition step can be performed at 904, which is similar to step 610 described above, except that the confidence from the audio is now combined with that from the video. For example, if the image signal processing described herein and shown in Figure 8 indicates a certain disease symptom, the fused audio data can confirm the presence of the disease symptom by matching the activity identified by the image processing with an audio event such as the sound of a sneeze.
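The coincidence check between an audio angle of arrival and the positions of people seen by the camera can be sketched as a comparison of bearings. The example assumes that both modalities have already been expressed as bearings in degrees in a shared reference frame and that a fixed angular tolerance is acceptable; the tolerance value and function names are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two bearings, in degrees."""
    diff = (a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)

def confirmed_by_image(audio_aoa_deg, person_bearings_deg, tolerance_deg=10.0):
    """Index of the closest person within tolerance of the audio bearing,
    or None when no person's bearing agrees with the sound direction."""
    best = None
    for index, bearing in enumerate(person_bearings_deg):
        diff = angular_difference(audio_aoa_deg, bearing)
        if diff <= tolerance_deg and (best is None or diff < best[1]):
            best = (index, diff)
    return None if best is None else best[0]

# A cough heard at 32 degrees; three people seen at these bearings.
match = confirmed_by_image(32.0, [-40.0, 30.0, 75.0])
no_match = confirmed_by_image(0.0, [90.0])
```

A match attributes the cough to a specific person, while the absence of a match could be treated as a lower-confidence audio-only detection.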

[0074] The system then performs the AoA estimation (906), localization (908), time series aggregation (910), and visualization (912) as described above.

[0075] Figure 10 illustrates an embodiment of a system 1000 for capturing audio and image data, processing the data, fusing the data, and constructing visualizations from the fused data. Here, the detections from each modality (e.g., microphone and camera) are compared and checked for consistency. For example, acoustic signals are captured at 402, preprocessing is performed at 404, feature extraction is performed at 406, classification is performed at 408, an optional step of AoA estimation is performed at 410, and localization is performed at 412. Simultaneously, camera images are acquired at 602, preprocessing of these images is performed at 604, human detection is performed at 606, feature extraction is performed at 608, activity recognition is performed at 610, and localization is performed at 612. In the fusion step at 1002, confidence scores for each modality are considered to filter out incorrect detections. For example, in order to label an event as one in which disease symptoms have occurred, both the audio and camera data must have confidence scores above a certain threshold. In another embodiment, a sliding scale can be implemented, where a lower threshold for one modality (e.g., a camera) is acceptable when confidence in the other modality (e.g., a microphone) increases. As one data source becomes more reliable, the threshold for positive disease symptom detection from the other data source can be lowered.
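Both fusion rules can be written as small predicates over the two confidence scores. The sketch below is one possible reading of the sliding scale: any confidence that one modality has in excess of the base threshold lowers the threshold for the other modality by the same amount, down to a floor. The base threshold, the floor, and the linear relationship are assumptions, not values from this disclosure.

```python
def strict_fusion(audio_conf, image_conf, threshold=0.6):
    """Both modalities must exceed the same fixed threshold."""
    return audio_conf >= threshold and image_conf >= threshold

def sliding_fusion(audio_conf, image_conf, base_threshold=0.6, min_threshold=0.3):
    """Higher confidence in one modality relaxes the threshold for the other."""
    def relaxed_threshold(other_conf):
        surplus = max(0.0, other_conf - base_threshold)
        return max(min_threshold, base_threshold - surplus)

    return (audio_conf >= relaxed_threshold(image_conf)
            and image_conf >= relaxed_threshold(audio_conf))

both_confident = sliding_fusion(0.70, 0.70)     # accepted under either rule
strong_audio = sliding_fusion(0.95, 0.40)       # accepted only on the sliding scale
weak_support = sliding_fusion(0.65, 0.40)       # rejected: audio surplus too small
very_weak_audio = sliding_fusion(0.20, 0.99)    # rejected: below the floor
```

The floor prevents a single highly confident modality from carrying a detection entirely on its own, which keeps the consistency check meaningful.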

[0076] After the information or data is fused, time-series aggregation of the fused data is performed at 1004. Based on the time-series aggregation of the fused data, a visualization is output at 1006. The visualization can be a heatmap, as described herein.

[0077] The system disclosed here can also be operated using radar, rather than audio and image data (or a combination thereof). Figure 11 shows a flowchart of an embodiment of a system 1100 for detecting events indicating occupant disease symptoms via radar, locating the events, and visually displaying relevant information. Radar devices such as radar source 126 described herein enable the perception of vital sign parameters, such as respiratory rate, heart rate, heart rate variability, and human emotion, that might otherwise be unavailable using the audio and image techniques described herein.

[0078] In addition, the radar source 126 can also detect coughing, sneezing, sudden falls, or other movements that indicate underlying disease symptoms. Coughing and sneezing introduce unique patterns of chest, upper body, or whole-body movements, which can be detected by the radar source 126 and processed as described herein. Vital signs can also be used to differentiate benign cases such as seasonal allergies and asthma from actual diseases. In other words, if the radar source 126 does not detect deviations from established norms in heart rate, respiratory rate, chest movements, etc., a sneeze detected alone may not be a marker of an underlying disease.

[0079] The system first detects the location of the target person. Location information can be obtained using radar sources through distance and angle estimation. Radar signals reflected from the target person can capture body movements in a non-contact manner. With the help of signal processing techniques and / or machine learning models, events such as coughing, sneezing, or other disease symptoms can be detected. The system also maps the detected disease symptoms to the location of the target person.

[0080] System 1100 first acquires the radar baseband signal. One or more radar sources 126 are deployed and installed at desired locations for occupant detection, such as vehicles in a fleet. Radar sources 126 can include infrared (IR) radar and frequency modulated continuous wave (FMCW) radar. The location of radar sources 126 is also recorded during deployment. Acquisition of the raw radar signal is performed by connecting radar sensors to a data recording device to obtain and record radar data 106. The raw radar signal may include I and Q samples and amplitude and / or phase information.

[0081] Having acquired and recorded the baseband radar signal 106, preprocessing of the data can occur at 1104. At this step, the system performs one or more methods, including denoising, alignment, filtering, processing missing data, and upsampling. This allows for better conditioning of the data for key processing steps such as human detection, feature extraction, and vital sign recognition.

[0082] At point 1106, the system performs human detection based on preprocessed radar data. Given the known radar sensor locations, the positions of one or more occupant targets in 2D or 3D space are extracted. Radar data is obtained by receiving reflected radio waves at the sensors. Thus, the human detection step can be accomplished, for example, by estimating the distance and / or angle to the target occupant based on the reflected radio waves. Human detection can be accomplished by various methods, one of which is disclosed in Ram M. Narayanan, Sonny Smith, and Kyle A. Gallagher's article "A Multifrequency Radar System for Detecting Humans and Characterizing Human Activities for Short-Range Through-Wall and Long-Range Foliage Penetration Applications" published in the International Journal of Microwave Science and Technology, Volume 2014, Article ID 958905.

[0083] At 1108, features can be extracted from detected humans based on radar data. These features include time-domain, frequency-domain, and spatial-domain features. A feature extraction procedure for radar human recognition based on the Mellin transform of time-series radar cross-section (RCS) measurements can also be used; the mathematical relationship between the distribution of target scattering within the cross-section and the RCS amplitude is derived and analyzed, and RCS features are extracted using a sequential method. Time-domain features allow respiratory rate, heart rate, etc., to be extracted by identifying signal patterns that need to be observed over time. Similarly, identifiable human regions, such as eyes, nose, mouth, hands, chest, etc., can be extracted, and disease symptoms within them can be analyzed. For example, detecting hands covering the face or sudden head movements first requires identifying the hands and face.

[0084] With feature extraction, it is possible to identify vital signs (e.g., heart rate, respiratory rate, etc.) and sudden movements (e.g., coughing, sneezing, falling, etc.) at 1110. This can be performed using classification models (such as those described in reference to audio and image classification) that are capable of predicting and estimating with significant confidence what activity, vital signs, or sudden movements the target occupant is performing. This can include signal processing and / or machine learning models, which may include, but are not limited to, Fast Fourier Transform (FFT), Independent Component Analysis (ICA), Principal Component Analysis (PCA), Nonnegative Matrix Factorization (NMF), and wavelet transform classification models.
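As an example of FFT-style vital sign recognition, the sketch below estimates breathing and heart rates by locating the strongest spectral peak within an assumed physiological frequency band. The input is a synthetic chest-displacement signal standing in for a demodulated radar phase signal; the band limits, sample rate, and function name are assumptions, and a naive DFT is used instead of an FFT library to keep the example self-contained.

```python
import cmath
import math

def dominant_frequency(signal, sample_rate, f_min, f_max):
    """Frequency in Hz of the largest DFT magnitude within [f_min, f_max]."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant offset
    best_f, best_mag = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        f = k * sample_rate / n
        if not (f_min <= f <= f_max):
            continue
        mag = abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                      for i, x in enumerate(centered)))
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f

# Synthetic signal: breathing at 0.25 Hz plus a weaker heartbeat at 1.2 Hz,
# sampled at 10 Hz for 40 seconds.
SAMPLE_RATE = 10.0
chest = [math.sin(2 * math.pi * 0.25 * i / SAMPLE_RATE)
         + 0.1 * math.sin(2 * math.pi * 1.2 * i / SAMPLE_RATE)
         for i in range(400)]
breaths_per_minute = 60.0 * dominant_frequency(chest, SAMPLE_RATE, 0.1, 0.5)
beats_per_minute = 60.0 * dominant_frequency(chest, SAMPLE_RATE, 0.8, 2.0)
```

Searching each band separately keeps the much stronger breathing component from masking the heartbeat peak.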

[0085] Although the radar data already provides location information, the localization step at 1112 can include locating the target occupant in view of any potential anomalies. For example, if one of the radar sources 126 has unwanted noise or other errors, one or more other radar sources can be used to determine the location of the target occupant exhibiting symptoms of illness.

[0086] At 1114 and 1116, time-series aggregation and visualization can be performed, respectively. These steps can be similar to those described herein, such as steps 414, 416, 614, 616, 910, 912, 1004, and 1006. For example, heatmaps as described above can be displayed to the user or occupant, overlaying a shaded color onto the image at the occupant's location.

[0087] Figure 12 shows a flowchart of an embodiment of another system 1200 for detecting events indicating occupant disease symptoms via radar, locating the events, and visually displaying relevant information. System 1200 is a simplified version of the system 1100 described above and includes many of the same steps. In this embodiment, feature extraction is removed, and activity recognition 1210 is used alone. The step at 1210 involves detecting sudden movements, such as coughing, sneezing, or falling. A classification model predicts whether the current event is a cough, a sneeze, a fall, or another similar disease symptom event.

[0088] Figure 13 and Figure 14 show additional flowcharts of embodiments of other systems 1300 and 1400 for detecting events indicating occupant disease symptoms via radar, locating the events, and visually displaying relevant information. In Figure 13, activity recognition at 1210 and vital sign recognition at 1310 are separated, and the two are then merged in the localization step at 1112. This allows separate radar sensors to be provided, one dedicated to activity recognition and the other to vital sign recognition. A more simplified embodiment is shown in Figure 14, in which system 1400 optionally removes the feature extraction step 1108 and combines activity and vital sign identification into a single step at 1110.

[0089] Figure 15 illustrates a flowchart of an embodiment of a system for detecting events indicating occupant disease symptoms via audio, image, and radar, fusing information from all three types of sensors, and visualizing the output based on the fused data. One or more image sources 122 are used to capture images at 602, and one or more of the processors and associated structures of Figure 1 are then used for preprocessing 604, human detection 606, and feature extraction 608. One or more audio sources 124 are used to acquire acoustic signals at 402, and one or more of the processors and associated structures of Figure 1 are then used for preprocessing 404 and feature extraction 406. One or more radar sources 126 are used to acquire the radio frequency baseband signal at 1102, and one or more of the processors and associated structures of Figure 1 are then used for preprocessing 1104, human detection 1106, and optionally feature extraction 1108.

[0090] System 1500 includes a fusion step 1502, in which audio, image, and radar data are all fused together to generate a comprehensive examination and analysis of potential occupant conditions. It is possible to examine radar data for accuracy by processing audio and image data; to examine image data for accuracy by comparing it with radar and audio data; and to examine audio data for accuracy by comparing it with radar and image data. This step can be similar to the fusion step 902 described above, except that radar data is added.

[0091] The fusion results of the features at 1502 are then passed to 1110, where activity and / or vital signs are identified as described above. Subsequently, localization is performed at 1504, time-series aggregation of the fused data is performed at 1506, and visualization is performed at 1508. By fusing radar data with audio and image data, a more comprehensive and accurate visualization can be provided to the user.

[0092] Figure 16 shows a system 1600 according to a similar embodiment, except that information fusion 1602 occurs after localization is performed at 412, 612, and 1112. This embodiment illustrates that various architectures and layouts for the steps of signal processing and fusion are contemplated in this disclosure; data fusion can occur at many different points along the processing pipeline.

[0093] The techniques described herein can be corroborated by additional systems in the surrounding area. For example, if the techniques described herein are used in a passenger vehicle, the processor can access data from other vehicle systems. In one embodiment, the vehicle seats may have weight sensors; if a sudden fluctuation in weight is detected in the seat at the same time as a sneeze or cough is detected, this can further contribute to the accuracy of the systems described herein (e.g., by providing a sanity check).

[0094] The processes, methods, or algorithms disclosed herein can be deliverable to or implemented by a processing device, controller, or computer, which can include any existing programmable electronic control unit or dedicated electronic control unit. Similarly, the processes, methods, or algorithms can be stored in many forms as data and instructions executable by a controller or computer, including but not limited to information permanently stored on non-writable storage media (such as ROM devices) and information alterably stored on writable storage media (such as floppy disks, magnetic tapes, CDs, RAM devices, and other magnetic and optical media). The processes, methods, or algorithms can also be implemented in a software executable object. Alternatively, the processes, methods, or algorithms can be implemented wholly or partially using suitable hardware components, such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), state machines, controllers, or other hardware components or devices, or a combination of hardware, software, and firmware components.

[0095] While exemplary embodiments have been described above, these embodiments are not intended to describe all possible forms covered by the claims. The language used in this specification is descriptive rather than limiting, and it should be understood that various modifications can be made without departing from the spirit and scope of this disclosure. As previously stated, features of various embodiments can be combined to form further embodiments of the invention that are not explicitly described or illustrated. Although various embodiments may have been described as having advantages over other embodiments or prior art implementations in terms of one or more desired features, those skilled in the art will recognize that one or more features or characteristics can be compromised to achieve desired overall system properties, depending on the specific application and implementation. These properties may include, but are not limited to, cost, strength, durability, life cycle cost, merchantability, appearance, packaging, size, suitability, weight, manufacturability, ease of assembly, etc. Therefore, even if an embodiment is described as less desirable than other embodiments or prior art implementations in terms of one or more features, that embodiment is not outside the scope of the invention and may be desirable for a particular application.

Claims

1. A system for detecting disease symptoms in an occupant, the system comprising: a user interface; a memory configured to maintain a visualization application and image data from an image source; and a processor in communication with the memory and the user interface and programmed to: receive the image data from the image source, the image data including a background image associated with an area occupied by the occupant; execute a human detection model configured to detect the occupant within the image data; execute an activity recognition model configured to identify image-based disease symptoms of the detected occupant within the image data based on movement of the detected occupant; determine a location of the identified disease symptoms using the image data from the image source; and execute the visualization application to display, in the user interface, an overlay image superimposed on the background image, the overlay image including an indicator for each location of the identified disease symptoms, the indicator displaying information that the identified disease symptoms occurred at that location.

2. The system of claim 1, wherein the overlay image comprises a color-coded heatmap that varies in intensity to correspond to the number of disease symptoms identified at that location.

3. The system of claim 1, wherein the processor is further programmed to extract relevant features from the image data using a convolutional network and to send the extracted relevant features to the activity recognition model for identifying the disease symptoms.

4. The system of claim 1, wherein the processor is further programmed to aggregate the identified disease symptoms over time to determine a time series aggregate, wherein the indicator at each location changes based on the time series aggregate at that location.

5. The system of claim 1, wherein the processor is further programmed to: receive audio data from an audio source; execute one or more models to determine audible disease symptoms based on the audio data; fuse the audible disease symptoms with the image-based disease symptoms; and execute the visualization application based on the fused audible disease symptoms and image-based disease symptoms.

6. The system of claim 5, wherein the fusion of the audible disease symptoms and the image-based disease symptoms occurs prior to the execution of the activity recognition model, such that the activity recognition model is configured to use both the audible disease symptoms and the image-based disease symptoms to identify the disease symptoms of the detected occupant.

7. The system of claim 5, wherein the fusion of the audible disease symptoms and the image-based disease symptoms occurs after the activity recognition model is executed and before the visualization application is executed.

8. The system of claim 1, wherein the processor is further programmed to: receive radar data from a radar source; execute a human detection model configured to detect the occupant based on the radar data; execute a radar-based activity recognition model or vital sign recognition model configured to identify disease symptoms of the detected occupant based on the radar data; fuse the identified radar-based disease symptoms with the image-based disease symptoms; and execute the visualization application based on the fused radar-based disease symptoms and image-based disease symptoms.

9. The system of claim 8, wherein the processor is further programmed to: receive audio data from an audio source; execute one or more models to determine audible disease symptoms based on the audio data; fuse the audible disease symptoms with the image-based disease symptoms and the radar-based disease symptoms; and execute the visualization application based on the fused audible disease symptoms, image-based disease symptoms, and radar-based disease symptoms.

10. A system for detecting disease symptoms in an occupant, the system comprising: a user interface; a memory configured to maintain a visualization application and audio data from an audio source; and a processor in communication with the memory and the user interface and programmed to: receive, from a camera, a background image of an area occupied by the occupant; receive the audio data from the audio source; execute a classification model configured to classify portions of the audio data as indicating disease symptoms, in order to identify audible disease symptoms; determine a location of the disease symptoms based on the classified portions of the audio data; and execute the visualization application to display, in the user interface, an overlay image superimposed on the background image, the overlay image including an indicator for each location of the identified disease symptoms, the indicator displaying information that the disease symptoms occurred at that location.

11. The system of claim 10, wherein the overlay image comprises a color-coded heatmap, the heatmap varying in intensity to correspond to the number of identified disease symptoms at that location.

12. The system of claim 10, wherein the system comprises a plurality of audio sources, and the processor is configured to determine the location of the disease symptom based on triangulation of audio data output from the plurality of audio sources.

13. The system of claim 10, wherein the processor is further programmed to aggregate identified disease symptoms over time to determine a time series aggregate, wherein the indicator at each location changes based on the time series aggregate at that location.

14. The system of claim 10, wherein the processor is further programmed to: receive image data from the camera; execute one or more models to determine image-based disease symptoms based on the image data; fuse the audible disease symptoms with the image-based disease symptoms; and execute the visualization application based on the fused audible disease symptoms and image-based disease symptoms.

15. The system of claim 14, wherein the processor is further programmed to: receive radar data from a radar source; execute a human detection model configured to detect the occupant based on the radar data; execute a radar-based activity recognition model or vital sign recognition model configured to identify disease symptoms of the detected occupant based on the radar data; fuse the identified radar-based disease symptoms with the audible disease symptoms; and execute the visualization application based on the fused radar-based disease symptoms, audible disease symptoms, and image-based disease symptoms.

16. The system of claim 10, wherein the processor is further programmed to: receive radar data from a radar source; execute a human detection model configured to detect the occupant based on the radar data; execute a radar-based activity recognition model or vital sign recognition model configured to identify disease symptoms of the detected occupant based on the radar data; fuse the identified radar-based disease symptoms with the audible disease symptoms; and execute the visualization application based on the fused radar-based disease symptoms and audible disease symptoms.

17. A system for detecting disease symptoms in an occupant, the system comprising: a user interface; a memory configured to maintain a visualization application and radar data from a radar source; and a processor in communication with the memory and the user interface and programmed to: receive, from a camera, a background image of an area occupied by the occupant; receive the radar data from the radar source; execute a human detection model configured to detect the occupant based on the radar data; execute a radar-based activity recognition model or vital sign recognition model configured to identify disease symptoms of the detected occupant based on the radar data; determine a location of the radar-based identified disease symptoms using the radar data from the radar source; and execute the visualization application to display, in the user interface, an overlay image superimposed on the background image, the overlay image including an indicator for each location of the identified disease symptoms, the indicator indicating where the radar-based identified disease symptoms occurred.

18. The system of claim 17, wherein the overlay image comprises a color-coded heatmap that varies in intensity to correspond to the number of disease symptoms identified at that location.

19. The system of claim 17, wherein the processor is further programmed to: receive image data from the camera; execute one or more models to determine image-based disease symptoms based on the image data; fuse the image-based disease symptoms with the radar-based disease symptoms; and execute the visualization application based on the fused image-based disease symptoms and radar-based disease symptoms.

20. The system of claim 17, wherein the processor is further programmed to: receive audio data from an audio source; execute one or more models to determine audible disease symptoms based on the audio data; fuse the audible disease symptoms with the radar-based disease symptoms; and execute the visualization application based on the fused audible disease symptoms and radar-based disease symptoms.