Information processing device

The information processing device addresses frame rate issues in multimodal AI by using dedicated and zero-shot object detection models to enhance traffic sign and signal detection, improving accident analysis accuracy and efficiency.

WO2026140328A1 · PCT designated stage · Publication Date: 2026-07-02 · PIONEER IP

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
PIONEER IP
Filing Date
2025-07-30
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Multimodal AI systems face challenges in obtaining appropriate analysis results due to varying frame rates of input videos, leading to computational inefficiencies and inadequate learning, especially in accident analysis scenarios.

Method used

An information processing device and method that includes a video information acquisition unit, driving data acquisition unit, and a sign and signal detection unit using both dedicated and zero-shot object detection models to accurately identify traffic signs and signals, combined with an accident analysis model for precise accident analysis.

Benefits of technology

Enables high-accuracy detection of traffic signs and signals, resulting in more appropriate accident analysis outcomes by integrating dedicated and zero-shot object detection models with specialized accident analysis models.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure JP2025026966_02072026_PF_FP_ABST
Patent Text Reader

Abstract

The present invention obtains an appropriate analysis result. The present invention: acquires a moving image captured from a moving body in a prescribed period including the occurrence timepoint of an accident that has occurred in the moving body; acquires travel data of the moving body in the prescribed period; by using a dedicated object detection model for detecting the state of a traffic light and a traffic sign and on the basis of the moving image, detects the positions of the traffic sign and the traffic light in each frame of an analysis moving image; by using a zero shot object detection model, detects the positions of the traffic sign and the traffic light in each of the frames of the analysis moving image; and analyzes the accident on the basis of the analysis moving image, the travel data, and the state of the traffic light and the traffic sign detected in each of the frames of the analysis moving image.

Description

Information processing apparatus

[0001] The present invention relates to an information processing apparatus.

[0002] Large language models (LLMs), which take an instruction (prompt) written in natural language as input and output an answer written in natural language, have been developed and are used in various fields (for example, Patent Document 1). Multimodal AI has also been developed that takes as input a prompt together with multiple different types of data, such as video, audio, and text, processes the information based on these inputs, and outputs an analysis result written in natural language (for example, Non-Patent Document 1).

[0003] Patent Document 1: Japanese Patent No. 7586386

[0004] Non-Patent Document 1: A. Yang et al., "Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning," 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 2023, pp. 10714-10726, doi: 10.1109/CVPR52729.2023.01032.

[0005] In the multimodal AI, when the frame rate of the input video is too low, it may not be possible to obtain an appropriate analysis result. On the other hand, when the frame rate of the video is too high, it may cause a wasteful computational load or a long calculation time. In addition, depending on the field in which the multimodal AI performs analysis, the learning of the multimodal AI may be insufficient, and it may not be possible to obtain an appropriate analysis result.

[0006] An example of the problem to be solved by the present invention is to obtain an appropriate analysis result.

[0007] To solve the above problems, the invention described in claim 1 is an information processing device comprising: a video information acquisition processing unit that acquires video footage taken from a moving object during a predetermined period including the time of an accident that occurred to the moving object; a driving data acquisition processing unit that acquires driving data of the moving object during the predetermined period; a sign and signal detection processing unit that detects the state of traffic signs and signals in each frame of an analysis video based on the video; and an accident analysis processing unit that performs an analysis of the accident based on the analysis video, the driving data, and the state of traffic signs and signals detected in each frame of the analysis video, wherein the sign and signal detection processing unit comprises: a first sign and signal detection unit that detects the position of traffic signs and signals in each frame of the analysis video using a dedicated object detection model for detecting the state of traffic signs and signals; and a second sign and signal detection unit that detects the position of traffic signs and signals in each frame of the analysis video using a zero-shot object detection model.

[0008] The invention described in claim 7 is an information processing method performed by a computer, comprising: a video information acquisition process step of acquiring video footage taken from a moving object during a predetermined period including the time of an accident that occurred to the moving object; a driving data acquisition process step of acquiring driving data of the moving object during the predetermined period; a sign and signal detection process step of detecting the state of traffic signs and signals in each frame of an analysis video based on the video; and an accident analysis process step of performing an analysis of the accident based on the analysis video, the driving data, and the state of traffic signs and signals detected in each frame of the analysis video, wherein the sign and signal detection process step comprises: a first sign and signal detection step of detecting the position of traffic signs and signals in each frame of the analysis video using a dedicated object detection model for detecting the state of traffic signs and signals; and a second sign and signal detection step of detecting the position of traffic signs and signals in each frame of the analysis video using a zero-shot object detection model.

[0009] The invention described in claim 8 is an information processing program that causes a computer to execute the information processing method described in claim 7.

[0010] The invention described in claim 9 is a computer-readable storage medium that stores the information processing program described in claim 8.

[0011] This figure shows an accident analysis device 100 according to one embodiment of the present invention.
This figure shows an example of a control unit 110.
This figure illustrates an example of a frame of an analysis video.
This figure illustrates an example of a detection result output by the sign / signal detection processing unit 113.
This figure shows an example of a processing operation performed in the control unit 110.
This figure shows an example of a processing operation performed in the control unit 110.
This figure shows an example of a processing operation performed in the control unit 110.
This figure shows an example of a priority road determination processing unit 119.
This figure illustrates an example of an aerial image that includes the road R1 traveled by the moving object M and the road R2 traveled by the other party object involved in the accident with the moving object M.
This figure shows an example of a processing operation performed in the control unit 110.
This figure shows an example of a processing operation performed in the priority road determination processing unit 119.
This figure shows an example of a sign / signal detection processing unit 113.
This figure shows an example of a processing operation performed in the sign / signal detection processing unit 113.

[0012] An information processing device according to one embodiment of the present invention includes: a video information acquisition processing unit that acquires video footage taken from a moving object during a predetermined period including the time of an accident that occurred to the moving object; a driving data acquisition processing unit that acquires driving data of the moving object during the predetermined period; a sign and signal detection processing unit that detects the state of traffic signs and signals in each frame of an analysis video based on the video; and an accident analysis processing unit that performs an analysis of the accident based on the analysis video, the driving data, and the state of traffic signs and signals detected in each frame of the analysis video. The sign and signal detection processing unit includes: a first sign and signal detection unit that detects the position of traffic signs and signals in each frame of the analysis video using a dedicated object detection model for detecting the state of traffic signs and signals; and a second sign and signal detection unit that detects the position of traffic signs and signals in each frame of the analysis video using a zero-shot object detection model. Therefore, in this embodiment, it is possible to recognize and detect the state of traffic signs and signals with high accuracy, and by combining this with an accident analysis model specialized for accident analysis, it is possible to obtain more appropriate accident analysis results.

[0013] The prompts input to the zero-shot object detection model may include instructions to cause the zero-shot object detection model to consider changes in the appearance of the traffic signs and signals under a given environment. Doing so makes it possible to recognize and detect the state of the traffic signs and signals with higher accuracy.

[0014] The sign and signal detection processing unit may further include a sign and signal classification unit that uses a zero-shot classification model to classify the type of traffic sign and the state of traffic signals in each frame of the analysis video based on cropped images of traffic signs and traffic signals detected by the first sign and signal detection unit and cropped images of traffic signs and traffic signals detected by the second sign and signal detection unit. This makes it possible to detect the type of traffic sign and the state of traffic signals more accurately and to obtain more appropriate accident analysis results.
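As an illustration of how cropped regions from the two detectors might be gathered before classification, the following is a minimal sketch. The text does not state how the two sets of detections are combined, so the overlap rule used here (drop a zero-shot box that largely overlaps a box already kept) is an assumption, as are the names `iou` and `collect_crop_boxes` and the 0.5 threshold.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2): upper-left and lower-right corners in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def collect_crop_boxes(
    dedicated: List[Box], zero_shot: List[Box], iou_threshold: float = 0.5
) -> List[Box]:
    """Keep every dedicated-model box, then add each zero-shot box that does
    not substantially overlap a box already kept (hypothetical merge rule)."""
    merged = list(dedicated)
    for box in zero_shot:
        if all(iou(box, kept) < iou_threshold for kept in merged):
            merged.append(box)
    return merged
```

Each box in the returned list would then be cropped from the frame and passed to the classification step. Other merge rules, such as keeping both boxes and letting the classifier decide, are equally compatible with the description.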

[0015] The accident analysis processing unit may further include a priority road determination processing unit that determines whether the road the moving body was traveling on was a priority road relative to the road the other party's object was traveling on, and the accident analysis processing unit may perform the accident analysis based on the analysis video, the driving data, the state of traffic signs and signals detected in each frame of the analysis video, and the determination result of the priority road determination unit. The priority road determination processing unit may also include a first priority road determination unit that determines, based on the analysis video, whether the road the moving body was traveling on was a priority road relative to the road the other party's object was traveling on. By doing so, it becomes possible to obtain more appropriate accident analysis results, in particular, more appropriate fault ratios for the accident.

[0016] The priority road determination processing unit may further include: an aerial image acquisition processing unit that acquires aerial images of the road the moving object is traveling on and the road the other object is traveling on when the first priority road determination unit is unable to determine whether the road the moving object is traveling on is a priority road; and a second priority road determination unit that uses an image caption generation model to determine, based on the aerial images, whether the road the moving object is traveling on is a priority road relative to the road the other object is traveling on. In this way, it becomes possible to determine whether a road is a priority road even when it is not possible to determine it based on the video for analysis.

[0017] An information processing method according to one embodiment of the present invention is an information processing method performed by a computer, comprising: a video information acquisition processing step of acquiring video footage taken from a moving object during a predetermined period including the time of an accident that occurred to the moving object; a driving data acquisition processing step of acquiring driving data of the moving object during the predetermined period; a sign and signal detection processing step of detecting the state of traffic signs and signals in each frame of an analysis video based on the video; and an accident analysis processing step of performing an analysis of the accident based on the analysis video, the driving data, and the state of traffic signs and signals detected in each frame of the analysis video, wherein the sign and signal detection processing step comprises: a first sign and signal detection step of detecting the position of traffic signs and signals in each frame of the analysis video using a dedicated object detection model for detecting the state of traffic signs and signals; and a second sign and signal detection step of detecting the position of traffic signs and signals in each frame of the analysis video using a zero-shot object detection model. Therefore, in this embodiment, it is possible to recognize and detect the status of traffic signs and signals with high accuracy, and by combining this with an accident analysis model specialized for accident analysis, it is possible to obtain more appropriate accident analysis results.

[0018] An information processing program according to one embodiment of the present invention causes a computer to execute the above-described information processing method. Therefore, in this embodiment, it is possible to recognize and detect the status of traffic signs and signals with high accuracy, and by combining it with an accident analysis model specialized for accident analysis, it is possible to obtain more appropriate accident analysis results.

[0019] A computer-readable storage medium according to one embodiment of the present invention stores the above-mentioned information processing program. Therefore, in this embodiment, the above-mentioned information processing program can be distributed independently in addition to being incorporated into a device, and version upgrades and the like can be easily performed.

[0020] <Accident Analysis Device 100> Figure 1 shows an accident analysis device 100 according to one embodiment of the present invention. The accident analysis device 100 analyzes an accident that occurred in a moving object M (for example, a vehicle) based on video footage taken from the moving object M.

[0021] The accident analysis device 100 includes a control unit 110, a storage unit 120, and a communication unit 130. The control unit 110 is an information processing device (computer) for processing information, and for example, includes a CPU (Central Processing Unit). The accident analysis device 100 is typically a server device. Alternatively, the accident analysis device 100 may be a PC (Personal Computer). The storage unit 120 is a storage device for storing information, and for example, includes a hard disk drive, a solid-state drive, or memory. The storage unit 120 stores various programs executed in the accident analysis device 100, such as an operating system and software. These various programs include programs for executing processing related to accident analysis in the accident analysis device 100. Part of the storage unit 120 may be a storage device such as a memory card that can be removed from the accident analysis device 100. The communication unit 130 is a communication device for sending and receiving information with other devices.

[0022] Figure 2 shows an example of the control unit 110. The control unit 110 includes a video information acquisition processing unit 111, a driving data acquisition processing unit 112, a sign / signal detection processing unit 113, and an accident analysis processing unit 114.

[0023] The video information acquisition processing unit 111 acquires video footage (accident video) taken from the moving object M during a first period including the time of the accident that occurred to the moving object M. The accident video is, for example, video footage taken by a drive recorder installed on the moving object M. For example, the drive recorder is equipped with a camera that takes video footage and an impact sensor. When the impact sensor detects an impact of a predetermined value or higher, an accident is considered to have occurred, and the video footage taken by the camera during the first period including the time of the accident is stored as the accident video. The first period is, for example, the period from a first length of time before the time of the accident that occurred to the moving object M to a second length of time after the time of the accident. The first and second lengths of time are set as appropriate and may be the same or different.
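The trigger and recording window described here can be sketched as follows. This is only an illustration: the function names, the use of `datetime` values, and the 20-second and 10-second offsets in the usage note are assumptions, not values given in the text.

```python
from datetime import datetime, timedelta
from typing import Tuple


def is_accident(impact: float, threshold: float) -> bool:
    """An impact at or above the predetermined value is treated as an accident."""
    return impact >= threshold


def accident_clip_window(
    accident_time: datetime, before: timedelta, after: timedelta
) -> Tuple[datetime, datetime]:
    """Start and end of the first period surrounding the accident time."""
    return accident_time - before, accident_time + after
```

For example, `accident_clip_window(t, timedelta(seconds=20), timedelta(seconds=10))` would select footage from 20 seconds before to 10 seconds after the detected impact at time `t`.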

[0024] The video information acquisition processing unit 111 may acquire accident videos from the storage unit 120 where the accident videos are stored. For example, if the accident analysis device 100 is a PC, the storage unit 120 may be a memory card on which the drive recorder stores the accident videos. Alternatively, the video information acquisition processing unit 111 may acquire accident videos from an external device using communication via the communication unit 130. For example, if the accident analysis device 100 is a server device, the video information acquisition processing unit 111 may acquire accident videos from the drive recorder via communication over the Internet using the communication unit 130.

[0025] The driving data acquisition processing unit 112 acquires driving data of the moving object M during the first period described above. The driving data of the moving object M includes, for example, the speed, acceleration, and driving history of the moving object M, which are acquired based on the output of various sensors provided on the moving object M. The driving history of the moving object M includes the time-based position information of the moving object M and information about the links (driving links) that the moving object M traveled on (for example, the road type of the driving link). The driving data of the moving object M may also include information about the operation of the moving object M (for example, information about the operation of the accelerator, brakes, and turn signals of the moving object M).

[0026] The driving data acquisition processing unit 112 may acquire driving data from the storage unit 120 where the driving data is stored. For example, if the accident analysis device 100 is a PC, the storage unit 120 may be a memory card on which the drive recorder stores the driving data. Alternatively, the driving data acquisition processing unit 112 may acquire driving data from an external device using communication via the communication unit 130. For example, if the accident analysis device 100 is a server device, the driving data acquisition processing unit 112 may acquire driving data from the drive recorder via communication over the Internet using the communication unit 130.

[0027] The sign / signal detection processing unit 113 detects the state of traffic signs and signals in each frame of the analysis video based on the accident video acquired by the video information acquisition processing unit 111. The analysis video may be the accident video itself acquired by the video information acquisition processing unit 111, or, as detailed below, it may be a video generated by the analysis video generation processing unit 115 based on the accident video acquired by the video information acquisition processing unit 111.

[0028] Traffic signs include regulatory signs (for example, signs installed on the side of the road or above the road that notify of prohibitions, regulations, and restrictions, such as no entry, speed limits, and one-way streets), regulatory markings (for example, speed limits and lane divisions marked on the road), directional signs that indicate points where traffic caution is required (for example, signs installed on the side of the road or above the road that indicate the location of stop lines and pedestrian crossings), directional markings (for example, stop lines and pedestrian crossings marked on the road), and supplementary signs that supplement the main signs (for example, signs installed below a main sign that notify the dates and times when the traffic regulation indicated by the main sign is in effect). Traffic signs may also include warning signs that draw attention (for example, signs installed on the side of the road or above the road that notify of the presence of schools, kindergartens, or daycare centers nearby). For example, the sign / signal detection processing unit 113 detects the location of a traffic sign and identifies the type of the detected traffic sign, as detailed below.

[0029] The state of a traffic light includes the illumination state of the traffic light's lamps. The illumination state of the lamps includes whether the red, yellow, or green lamp is illuminated, and also includes the state of the arrow lights. For example, the sign / signal detection processing unit 113 detects the location of a traffic light and identifies the illumination state of the detected traffic light's lamps, as detailed below.

[0030] Figure 3 is a diagram illustrating an example of a frame from a video for analysis. In the example shown in Figure 3, the sign / signal detection processing unit 113 detects the positions of traffic sign 1, traffic sign 2, traffic sign 3, traffic sign 4, traffic sign 5, traffic sign 6, traffic sign 7, traffic sign 8, traffic sign 9, and traffic light 1, and identifies that the type of traffic sign 1 is "Stop", the type of traffic sign 2 is "Stop line", the type of traffic sign 3 is "School, kindergarten, nursery school etc.", the type of traffic sign 4 is "Crosswalk", the type of traffic sign 5 is "Crosswalk ahead", the type of traffic sign 6 is "Crosswalk ahead", the type of traffic sign 7 is "Stop line", the type of traffic sign 8 is "Crosswalk", the type of traffic sign 9 is "Stop line", and the lighting state of traffic light 1 is "Green". Furthermore, the sign / signal detection processing unit 113 identifies the location of traffic signs and traffic lights by a rectangular area (bounding box) surrounding the traffic signs and traffic lights, for example, as shown in Figure 3. In the example shown in Figure 3, the bounding box is indicated by a thick dashed line.

[0031] Figure 4 illustrates an example of the detection results output by the sign / signal detection processing unit 113. The example shown in Figure 4 gives the detection results for the frame shown in Figure 3: the location and type of traffic signs 1 to 9, and the location and illumination state of traffic light 1, detected in that frame. For example, for each frame of the analysis video, the sign / signal detection processing unit 113 obtains text data summarizing the location and type of the traffic signs and the location and state of the traffic lights detected in that frame, as shown in Figure 4. In the example shown in Figure 4, the locations of the traffic signs and traffic lights are indicated by the coordinates of the upper left and lower right corners of the bounding boxes surrounding them.
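A per-frame text summary of this kind could be produced along the following lines. The exact layout of the text in Figure 4 is not reproduced here; the `Detection` class, its field names, the line format, and the coordinates in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    name: str                       # e.g. "traffic sign 1" or "traffic light 1"
    label: str                      # sign type or lamp illumination state
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2): upper-left, lower-right


def frame_to_text(frame_index: int, detections: List[Detection]) -> str:
    """Summarize one frame's detections as plain text, one object per line."""
    lines = [f"frame {frame_index}"]
    for d in detections:
        x1, y1, x2, y2 = d.box
        lines.append(
            f"{d.name}: {d.label}, upper-left=({x1}, {y1}), lower-right=({x2}, {y2})"
        )
    return "\n".join(lines)
```

Calling `frame_to_text(0, [Detection("traffic sign 1", "Stop", (100, 50, 140, 90))])` yields a two-line string that can be passed to a language model as part of a prompt.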

[0032] The accident analysis processing unit 114 analyzes the accident that occurred to the moving object M based on the analysis video, the driving data, and the state of traffic signs and signals detected in each frame of the analysis video by the sign / signal detection processing unit 113, and obtains the analysis results of the accident. The analysis results of the accident include, for example, a description of the accident situation, the cause of the accident, and the fault ratio for the accident.

[0033] The accident analysis processing unit 114 uses, for example, a multimodal AI (also called an accident analysis model) that combines an image (video) analysis model and a natural language processing model to analyze an accident that occurred to the moving object M and obtain the analysis results of the accident. In other words, the accident analysis processing unit 114 inputs to the accident analysis model the analysis video, the driving data, the state of traffic signs and signals detected in each frame of the analysis video by the sign / signal detection processing unit 113, and instructions (prompts) that direct the accident analysis model to analyze the accident captured in the analysis video based on that state and the driving data, and obtains the analysis results of the accident output from the accident analysis model. The accident analysis processing unit 114 may include an accident analysis model, or it may use an external accident analysis model.

[0034] In this embodiment, the sign and signal detection processing unit 113 detects the status of traffic signs and signals in advance, and the status of traffic signs and signals detected by the sign and signal detection processing unit 113, in addition to the video for analysis and driving data, is input to the accident analysis model. Therefore, in this embodiment, the accident analysis model performs accident analysis after the status of traffic signs and signals in each frame of the video for analysis has already been identified. In other words, in this embodiment, by combining the sign and signal detection processing unit 113, which can accurately detect the status of signs and signals, with an accident analysis model specialized for accident analysis, it is possible to obtain more appropriate accident analysis results.

[0035] Figure 5 shows an example of processing operations performed in the control unit 110. The video information acquisition processing unit 111 acquires video footage (accident video) taken from the moving object M during a first period including the time of the accident that occurred to the moving object M, and the driving data acquisition processing unit 112 acquires driving data of the moving object M during the first period (step S501). The sign / signal detection processing unit 113 detects the state of traffic signs and signals in each frame of the analysis video based on the accident video acquired by the video information acquisition processing unit 111 (step S502). The accident analysis processing unit 114 analyzes the accident that occurred to the moving object M based on the analysis video, the driving data, and the state of traffic signs and signals detected in each frame of the analysis video by the sign / signal detection processing unit 113, and obtains the analysis results of the accident (step S503).

[0036] <Analysis video generation processing unit 115, traffic light state identification processing unit 116> When the light source of a traffic light is an LED, the traffic light flashes according to the frequency of the AC power supply (50Hz or 60Hz). If the traffic light is photographed at a frame rate equivalent to the frequency of the AC power supply, or half of that frequency, depending on the timing, the traffic light may be photographed as being off even though it is lit. Therefore, the frame rate of many dashcams is set to, for example, 27.5fps. If all frames of a video shot at 27.5fps are analyzed, it places a heavy computational load on the system and the calculation time becomes long.
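The interaction between frame rate and lamp flicker can be illustrated with a small calculation. The sketch below assumes, as a simplification, that the lamp flickers at exactly the power supply frequency and that frames are captured at perfectly regular intervals; the function name `sampled_phases` is hypothetical. It counts the distinct points within the flicker cycle at which frames are captured. A single point means every frame sees the lamp in the same part of the cycle, which may be the dark part.

```python
from fractions import Fraction
from typing import Set


def sampled_phases(mains_hz: Fraction, fps: Fraction, n_frames: int = 1000) -> Set[Fraction]:
    """Distinct positions (0 <= phase < 1) within the flicker cycle at which
    frames are captured, using exact rational arithmetic."""
    cycles_per_frame = mains_hz / fps
    return {(n * cycles_per_frame) % 1 for n in range(n_frames)}


# 25 fps against 50 Hz: every frame lands on the same phase.
locked = sampled_phases(Fraction(50), Fraction(25))
# 27.5 fps against 50 Hz or 60 Hz: frames spread over several phases.
spread_50 = sampled_phases(Fraction(50), Fraction(55, 2))
spread_60 = sampled_phases(Fraction(60), Fraction(55, 2))
```

Under these assumptions, 25 fps against 50 Hz and 30 fps against 60 Hz each sample one fixed phase, whereas 27.5 fps samples 11 distinct phases for both 50 Hz and 60 Hz, so a lit lamp cannot appear dark in every frame.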

[0037] Therefore, the control unit 110 may further include an analysis video generation processing unit 115, as shown in Figure 2. The analysis video generation processing unit 115 downsamples the frames of the accident video acquired by the video information acquisition processing unit 111 to generate an analysis video. This reduces the computational load during image analysis and shortens the computation time.

[0038] As the speed of the moving object M increases, the amount of movement of the object M per unit time increases, resulting in a greater change in the scenery captured by the camera between temporally consecutive frames in the accident video. Here, if the amount of frame decimation from the accident video is kept constant regardless of the speed of the moving object M and a video for analysis is generated, frames containing information necessary for accident analysis may be omitted from the video for analysis, potentially reducing the accuracy of the accident analysis. Furthermore, the speed of the moving object M is generally proportional to the speed limit of the road, and the higher the speed limit of the road, the higher the speed of the moving object M traveling on that road.

[0039] Therefore, the analysis video generation processing unit 115 may generate an analysis video by downsampling frames from the accident video based on the driving data. In this case, the driving data may include at least one of the speed of the moving object M and the type of road on which the moving object M is traveling, and the analysis video generation processing unit 115 may determine the frame rate of the analysis video based on the speed of the moving object M or the road on which the moving object M is traveling. By doing so, it becomes possible to obtain appropriate analysis results while reducing the computational load.

[0040] In this case, the analysis video generation processing unit 115 may set the frame rate of the analysis video higher as the speed of the moving object M increases. Alternatively, the analysis video generation processing unit 115 may set the frame rate of the analysis video higher as the speed limit of the road on which the moving object M is traveling increases. For example, the analysis video generation processing unit 115 may use one or more thresholds and change the frame rate of the analysis video in steps according to the relationship between the thresholds and the speed of the moving object M or the speed limit of the road on which the moving object M is traveling.
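The stepwise frame rate selection and the resulting frame thinning can be sketched as follows. The threshold values, the frame rates, and the function names `analysis_fps` and `downsample_indices` are illustrative assumptions; the text specifies only that the frame rate rises in steps with speed or speed limit.

```python
from bisect import bisect_right
from typing import List, Sequence


def analysis_fps(speed: float, thresholds: Sequence[float], rates: Sequence[float]) -> float:
    """Choose a frame rate in steps. `thresholds` must be sorted ascending and
    `rates` must have one more entry than `thresholds`; a speed equal to a
    threshold falls into the higher step."""
    if len(rates) != len(thresholds) + 1:
        raise ValueError("rates must have exactly one more entry than thresholds")
    return rates[bisect_right(thresholds, speed)]


def downsample_indices(n_frames: int, source_fps: float, target_fps: float) -> List[int]:
    """Indices of the source frames to keep so that the analysis video has
    approximately target_fps."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    if target_fps >= source_fps:
        return list(range(n_frames))
    step = source_fps / target_fps
    indices: List[int] = []
    k = 0
    while int(k * step) < n_frames:
        indices.append(int(k * step))
        k += 1
    return indices
```

With hypothetical thresholds of 30 and 60 km/h and rates of 2, 5, and 10 fps, a speed of 20 km/h selects 2 fps and a speed of 80 km/h selects 10 fps. The same function works for the speed-limit variant by passing the speed limit instead of the measured speed.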

[0041] As mentioned above, when the light source of a traffic light is an LED, the traffic light flashes according to the frequency of the AC power supply (50Hz or 60Hz). Therefore, in some frames of the video for analysis, the traffic light may be recorded as being off or dimly lit, making it impossible to determine the state of the traffic light (the color of the light). In other words, there are frames in which the state of the traffic light detected by the sign / signal detection processing unit 113 is unknown (frames with unknown traffic light state).

[0042] Therefore, the control unit 110 may further include a traffic light state identification processing unit 116, as shown in Figure 2. The traffic light state identification processing unit 116 identifies the state of the traffic light in a frame where the traffic light state is unknown, based on the state of the traffic light in the frames immediately preceding and following that frame. The accident analysis processing unit 114 may then analyze the accident that occurred to the moving object M based on the analysis video, the driving data, the state of traffic signs and traffic lights detected in each frame of the analysis video by the sign / signal detection processing unit 113, and the state of the traffic light identified by the traffic light state identification processing unit 116, and obtain the analysis results of the accident. For example, the traffic light state identification processing unit 116 may assign to a frame where the traffic light state is unknown the state of the traffic light in the frame that is closest in time to it among the frames in which the state of the traffic light detected by the sign / signal detection processing unit 113 is not unknown. By doing so, it becomes possible to identify the state of the traffic lights in every frame of the analysis video before conducting the accident analysis, thereby obtaining more accurate accident analysis results.

[0043] The sign / signal detection processing unit 113 obtains the reliability of each detected traffic light state. The traffic light state identification processing unit 116 treats frames in which this reliability is less than a predetermined threshold as frames in which the traffic light state is unknown (unknown traffic light state frames), and frames in which this reliability is equal to or greater than the predetermined threshold as frames in which the traffic light state is known (known traffic light state frames). Furthermore, the traffic light state identification processing unit 116 may take the traffic light state in the known traffic light state frame that is temporally closest to an unknown traffic light state frame as the traffic light state in that unknown traffic light state frame.
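The reliability thresholding and nearest-frame rule described in paragraphs [0042] and [0043] can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `FrameSignal` type, the function name `fill_unknown_states`, the default threshold of 0.5, and the tie-break toward the earlier frame are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameSignal:
    """Per-frame detection result (hypothetical structure for illustration)."""
    state: Optional[str]  # e.g. "red", "green"; None if nothing was detected
    reliability: float    # detector confidence, assumed to lie in [0, 1]


def fill_unknown_states(frames: List[FrameSignal],
                        threshold: float = 0.5) -> List[Optional[str]]:
    """Assign each frame the state of the temporally closest known frame.

    A frame is "known" when a state was detected with reliability >= threshold.
    Ties between an earlier and a later known frame go to the earlier one
    (an assumption; the source text does not specify a tie-break).
    """
    known = [i for i, f in enumerate(frames)
             if f.state is not None and f.reliability >= threshold]
    if not known:
        # No frame has a trustworthy state, so nothing can be propagated.
        return [None] * len(frames)

    result: List[Optional[str]] = []
    for i in range(len(frames)):
        # A known frame is its own nearest neighbour (distance 0).
        nearest = min(known, key=lambda k: (abs(k - i), k))
        result.append(frames[nearest].state)
    return result


if __name__ == "__main__":
    video = [
        FrameSignal("red", 0.9),
        FrameSignal("red", 0.2),    # below threshold: treated as unknown
        FrameSignal(None, 0.0),     # light recorded as off: unknown
        FrameSignal("green", 0.8),
        FrameSignal("green", 0.1),  # dimly lit: unknown
    ]
    print(fill_unknown_states(video))
```

Using frame index as the measure of temporal distance assumes a constant frame rate; with timestamps available, the same rule could be applied to time differences instead.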

[0044] Figure 6 shows an example of processing operations performed in the control unit 110. The video information acquisition processing unit 111 acquires video footage (accident video) taken from the moving object M during a first period including the time of the accident that occurred to the moving object M, and the driving data acquisition processing unit 112 acquires driving data of the moving object M during the first period (step S601). The analysis video generation processing unit 115 generates an analysis video based on the accident video acquired by the video information acquisition processing unit 111 (step S602). The sign / signal detection processing unit 113 detects the state of traffic signs and signals in each frame of the analysis video generated by the analysis video generation processing unit 115 (step S603). The traffic light state identification processing unit 116 identifies the state of the signal in a frame where the state of the signal detected by the sign / signal detection processing unit 113 is unknown (unknown traffic light state frame) based on the state of the signal in the frames before and after that frame (step S604). The accident analysis processing unit 114 analyzes the accident that occurred to the moving object M based on the analysis video, the driving data, the state of traffic signs and signals detected in each frame of the analysis video by the sign / signal detection processing unit 113, and the state of the signal identified by the traffic light state identification processing unit 116, and obtains the analysis results of the accident (step S605).

[0045] <Accident type information acquisition processing unit 117, other party determination processing unit 118> Generally, the fault ratio of an accident is determined based on the accident type. Therefore, as shown in FIG. 2, the control unit 110 may further include an accident type information acquisition processing unit 117. The accident type information acquisition processing unit 117 acquires accident type information, which is information related to the accident type. Then, based on the analysis video, the driving data, the accident type information, and the states of traffic signs and signal lights detected in each frame of the analysis video by the sign / signal detection processing unit 113, the accident analysis processing unit 114 may perform an analysis of the accident that occurred to the moving body M and obtain the analysis result of the accident, particularly the fault ratio of the accident. By doing so, even if an accident analysis model that has not undergone special learning for the accident type is used to analyze the accident, it becomes possible to identify the accident type of the accident that occurred to the moving body M, and it becomes possible to obtain a more appropriate accident analysis result, particularly a more appropriate accident fault ratio.

[0046] The accident type information may be, for example, a summary text of the accident type. In civil traffic litigation, there are hundreds of accident types. These accident types are summarized as images with accompanying text about the accident situation, the fault ratio, the correction factors of the fault ratio, and so on. The summary text of each accident type may therefore be prepared in advance from the corresponding accident type image, for example by using an image caption generation model.

[0047] The accident fault ratio may be corrected based on the attributes of the other party. For example, when the other party is a vulnerable road user such as a child or an elderly person, the fault ratio can be corrected from the perspective of protecting vulnerable road users. Therefore, the accident type information may include elements for correcting the fault ratio, and the control unit 110 may further include an other party determination processing unit 118, as shown in FIG. 2. The other party determination processing unit 118 determines the attributes of the other party object in the accident that occurred to the moving body M based on the analysis video. When the other party object is a person, the other party determination processing unit 118 determines, as the attributes of the other party object, for example, whether the person is a child (for example, a person aged 6 or more and less than 13 years old), an elderly person (for example, a person aged 65 or more), an infant (a person under 6 years old), or a person with a physical disability. When the other party object is a vehicle, the other party determination processing unit 118 may determine the vehicle type (for example, a large vehicle, a motorcycle, or a bicycle) as the attributes of the other party object. Then, the accident analysis processing unit 114 analyzes the accident that occurred to the moving body M based on the analysis video, the driving data, the accident type information, the states of the traffic signs and signals detected in each frame of the analysis video by the sign / signal detection processing unit 113, and the attributes of the other party object determined by the other party determination processing unit 118, and obtains the analysis result of the accident, particularly the fault ratio of the accident. By doing so, even if the accident is analyzed using an accident analysis model that has not undergone special learning for the accident type, it is possible to obtain a more appropriate accident fault ratio that takes the correction elements into account.
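The age bands given in paragraph [0047] can be expressed as a small mapping. The sketch below is only an illustration: it assumes that an upstream image (video) analysis model has already produced an estimated age and a disability flag, and the function name `person_attributes` and the attribute labels are hypothetical.

```python
from typing import List


def person_attributes(estimated_age: float,
                      has_physical_disability: bool = False) -> List[str]:
    """Map an estimated age to the attribute labels named in paragraph [0047].

    Age bands follow the examples in the text: infant (< 6),
    child (6 to < 13), elderly (>= 65). Other ages yield no age attribute.
    """
    if estimated_age < 0:
        raise ValueError("estimated_age must be non-negative")

    attributes: List[str] = []
    if estimated_age < 6:
        attributes.append("infant")
    elif estimated_age < 13:
        attributes.append("child")
    elif estimated_age >= 65:
        attributes.append("elderly")

    if has_physical_disability:
        attributes.append("physical_disability")
    return attributes


if __name__ == "__main__":
    for age in (4, 9, 30, 70):
        print(age, person_attributes(age))
```

Returning a list rather than a single label lets one person carry more than one correction-relevant attribute, for example an elderly person with a physical disability.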

[0048] In particular, the accident type information may include at least elements for correcting the fault ratio when the object of the other party is a child or an elderly person, and the attributes of the object of the other party in the accident may include at least whether the object of the other party is a child or an elderly person.

[0049] The other party determination processing unit 118 uses, for example, an image (video) analysis model to determine the attributes of the other party object involved in the accident with the moving object M based on the analysis video. The image (video) analysis model may be a dedicated object detection model that has been specially trained on infants, children, the elderly, people with disabilities, and vehicle types; it may be a zero-shot object detection model; or it may be a multimodal AI (for example, a natural language processing model capable of analyzing images (videos)).

[0050] Figure 7 shows an example of processing operations performed in the control unit 110. The video information acquisition processing unit 111 acquires video footage (accident video) taken from the moving object M during a first period including the time of the accident that occurred to the moving object M, the driving data acquisition processing unit 112 acquires driving data of the moving object M during the first period, and the accident type information acquisition processing unit 117 acquires accident type information, which is information about the type of accident (step S701). The sign / signal detection processing unit 113 detects the state of traffic signs and signals in each frame of the analysis video based on the accident video acquired by the video information acquisition processing unit 111, and the other party determination processing unit 118 determines the attributes of the other party object involved in the accident that occurred to the moving object M based on the analysis video (step S702). The accident analysis processing unit 114 analyzes the accident that occurred to the moving object M based on the analysis video, the driving data, the accident type information, the state of traffic signs and signals detected in each frame of the analysis video by the sign / signal detection processing unit 113, and the attributes of the other party object determined by the other party determination processing unit 118, and obtains the analysis results of the accident, in particular the percentage of fault in the accident (step S703).

[0051] Between steps S702 and S703 of the processing operation shown in Figure 7, the traffic light state identification processing unit 116 may identify the state of the signal in a frame where the state of the signal detected by the sign / signal detection processing unit 113 is unknown (unknown traffic light state frame) based on the state of the signal in the frames before and after that frame. In that case, in step S703 of the processing operation shown in Figure 7, the accident analysis processing unit 114 analyzes the accident that occurred to the moving object M based on the analysis video, the driving data, the accident type information, the state of traffic signs and signals detected by the sign / signal detection processing unit 113 in each frame of the analysis video, the attributes of the other party object determined by the other party determination processing unit 118, and the state of the signal identified by the traffic light state identification processing unit 116, and obtains the analysis results of the accident, in particular the percentage of fault in the accident.

[0052] <Priority Road Determination Processing Unit 119> In determining the percentage of fault in an accident, it is also important to determine whether the road on which the moving object M traveled (traveling road) was a priority road with respect to the travel road of the other party involved in the accident. Therefore, the control unit 110 may further include a priority road determination processing unit 119, as shown in Figure 2. The priority road determination processing unit 119 determines whether the travel road of the moving object M was a priority road with respect to the travel road of the other party involved in the accident with the moving object M. The accident analysis processing unit 114 may then analyze the accident based on the analysis video, the travel data, the status of traffic signs and signals detected in each frame of the analysis video, and the determination result of the priority road determination processing unit 119, and obtain the analysis result of the accident, in particular, the percentage of fault in the accident. In this way, even if the accident analysis is performed using an accident analysis model that has not undergone special training for accident analysis, it becomes possible to obtain a more appropriate accident analysis result, in particular, a more appropriate percentage of fault in the accident.

[0053] The priority road determination processing unit 119 may also have a first priority road determination unit 1191, as shown in Figure 8. The first priority road determination unit 1191 uses an image (video) analysis model to determine, based on the video for analysis, whether the road traveled by the moving object M was a priority road relative to the road traveled by the other object. In this case, the image (video) analysis model may be a dedicated object detection model that has been specially trained on traffic signs, a zero-shot object detection model, or a multimodal AI (for example, a natural language processing model capable of analyzing images (videos)).

[0054] As shown in Figure 8, the priority road determination processing unit 119 may further include an aerial image acquisition processing unit 1192 and a second priority road determination unit 1193. If the first priority road determination unit 1191 could not determine whether the road traveled by the moving body M was a priority road to the road traveled by the other party object involved in the accident, the aerial image acquisition processing unit 1192 acquires an aerial image that includes the road traveled by the moving body M and the road traveled by the other party object involved in the accident with the moving body M. The aerial image is an image of the road as seen from above, and may be, for example, an aerial photograph, a satellite photograph, or a map image (for example, a map image that allows confirmation of road width). The second priority road determination unit 1193 uses an image caption generation model to determine, based on the aerial image acquired by the aerial image acquisition processing unit 1192, whether the road traveled by the moving body M was a priority road to the road traveled by the other party object involved in the accident with the moving body M. This approach makes it possible to determine which roads are priority even when it is not possible to determine priority roads based on the analysis video.

[0055] In this case, the prompt input to the image caption generation model should include an instruction to cause the image caption generation model to determine that the road on which the moving object M is traveling is a priority road to the road on which the other party's object is traveling when the road on which the moving object M is traveling satisfies at least one of the following determination conditions. The one or more determination conditions include, for example, "there is a 'priority road' traffic sign on the road on which the moving object M is traveling" (Condition 1), "there is a 'stop' traffic sign on the road on which the other party's object is traveling" (Condition 2), "the center line runs through the intersection on the road on which the moving object M is traveling" (Condition 3), "the width of the road on which the moving object M is traveling is wider than the width of the road on which the other party's object is traveling" (Condition 4), and "the road on which the moving object M is traveling does not satisfy any of conditions 1 to 4, and the moving object M is traveling from the left side of the other party's object" (Condition 5).
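The five determination conditions of paragraph [0055] are given to the image caption generation model as prompt instructions. As a cross-check of that rule, the same logic can be written out directly. The sketch below is an assumption-laden illustration: the `RoadObservation` fields, the function names, and the use of numeric road widths are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoadObservation:
    """Facts read off an aerial image (hypothetical structure)."""
    own_has_priority_sign: bool       # Condition 1: "priority road" sign on M's road
    other_has_stop_sign: bool         # Condition 2: "stop" sign on the other road
    own_center_line_continues: bool   # Condition 3: center line runs through intersection
    own_width_m: float                # Condition 4 compares the two widths
    other_width_m: float
    own_approaches_from_left: bool    # Condition 5: M comes from the other party's left


def satisfied_conditions(obs: RoadObservation) -> List[int]:
    """Return the numbers of the conditions that M's road satisfies."""
    satisfied: List[int] = []
    if obs.own_has_priority_sign:
        satisfied.append(1)
    if obs.other_has_stop_sign:
        satisfied.append(2)
    if obs.own_center_line_continues:
        satisfied.append(3)
    if obs.own_width_m > obs.other_width_m:
        satisfied.append(4)
    # Condition 5 applies only when none of conditions 1 to 4 holds.
    if not satisfied and obs.own_approaches_from_left:
        satisfied.append(5)
    return satisfied


def is_priority_road(obs: RoadObservation) -> bool:
    """M's road is the priority road if at least one condition holds."""
    return bool(satisfied_conditions(obs))


if __name__ == "__main__":
    # No signs or center line visible, but M's road is wider
    # (the widths are illustrative values only).
    obs = RoadObservation(False, False, False, 6.0, 4.0, False)
    print(satisfied_conditions(obs), is_priority_road(obs))
```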

[0056] Figure 9 illustrates an example of an aerial image that includes the road R1 traveled by the moving object M and the road R2 traveled by the other party involved in the accident with the moving object M. In the example aerial image shown in Figure 9, traffic signs such as "priority road" and "stop" and the center line cannot be seen, but it can be confirmed that the width of the road R1 traveled by the moving object M is wider than the width of the road R2 traveled by the other party involved in the accident. Therefore, in the example shown in Figure 9, when the second priority road determination unit 1193 inputs the above prompt to the image caption generation model, the second priority road determination unit 1193 determines that the road R1 traveled by the moving object M is a priority road relative to the road R2 traveled by the other party involved in the accident.

[0057] Figure 10 shows an example of processing operations performed in the control unit 110. The video information acquisition processing unit 111 acquires video footage (accident video) taken from the moving object M during a first period including the time of the accident that occurred to the moving object M, and the driving data acquisition processing unit 112 acquires the driving data of the moving object M during the first period (step S1001). The sign / signal detection processing unit 113 detects the state of traffic signs and signals in each frame of the analysis video based on the accident video acquired by the video information acquisition processing unit 111, and the priority road determination processing unit 119 determines whether the road on which the moving object M was traveling was a priority road relative to the road on which the other party in the accident that occurred to the moving object M was traveling (step S1002). The accident analysis processing unit 114 analyzes the accident that occurred to the moving object M based on the analysis video, the driving data, the state of traffic signs and signals detected in each frame of the analysis video by the sign / signal detection processing unit 113, and the determination result of the priority road determination processing unit 119, and obtains the analysis result of the accident (step S1003).

[0058] Between steps S1002 and S1003 of the processing operation shown in Figure 10, the traffic light state identification processing unit 116 may identify the signal state in a frame where the signal state detected by the sign / signal detection processing unit 113 is unknown (unknown traffic light state frame) based on the signal state in the frames before and after that frame. Furthermore, in step S1001 of the processing operation shown in Figure 10, the accident type information acquisition processing unit 117 may acquire accident type information, which is information related to the accident type, and in step S1002 of the processing operation shown in Figure 10, the other party determination processing unit 118 may determine the attributes of the other party object involved in the accident that occurred to the moving object M based on the analysis video. Furthermore, in step S1003 of the processing operation shown in Figure 10, the accident analysis processing unit 114 may analyze the accident that occurred to the moving object M based on the analysis video, the driving data, the state of traffic signs and signals detected in each frame of the analysis video by the sign / signal detection processing unit 113, the determination result of the priority road determination processing unit 119, as well as the signal state identified by the traffic light state identification processing unit 116 and / or the attributes of the other party object determined by the other party determination processing unit 118, and obtain the analysis result of the accident, in particular the percentage of fault in the accident.

[0059] Figure 11 shows an example of processing operations performed in the priority road determination processing unit 119. The first priority road determination unit 1191 uses an image (video) analysis model to determine, based on the video for analysis, whether the road traveled by the moving object M was a priority road relative to the road traveled by the other object (step S1101). If the first priority road determination unit 1191 can determine whether the road traveled by the moving object M is a priority road (step S1102, YES), the process ends. If the first priority road determination unit 1191 cannot determine whether the road traveled by the moving object M is a priority road (step S1102, NO), the aerial image acquisition processing unit 1192 acquires an aerial image that includes the road traveled by the moving object M and the road traveled by the other object involved in the accident with the moving object M (step S1103). The second priority road determination unit 1193 uses an image caption generation model to determine, based on the aerial image acquired by the aerial image acquisition processing unit 1192, whether the road on which the moving object M was traveling was a priority road relative to the road on which the other party object involved in the accident with the moving object M was traveling (step S1104).
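The two-stage flow of Figure 11 amounts to a fallback: the aerial image is consulted only when the analysis video is inconclusive. A minimal sketch follows, assuming that each determiner is supplied as a callable returning `True`, `False`, or `None` (could not determine). The function and parameter names are hypothetical stand-ins for units 1191, 1192, and 1193.

```python
from typing import Any, Callable, Optional

# True / False = determined; None = could not determine (assumed convention).
Determination = Optional[bool]


def determine_priority_road(
    analysis_video: Any,
    first_determiner: Callable[[Any], Determination],
    fetch_aerial_image: Callable[[], Any],
    second_determiner: Callable[[Any], Determination],
) -> Determination:
    """Run the video-based check, then fall back to the aerial image."""
    result = first_determiner(analysis_video)   # step S1101
    if result is not None:                      # step S1102, YES
        return result
    aerial_image = fetch_aerial_image()         # step S1103
    return second_determiner(aerial_image)      # step S1104


if __name__ == "__main__":
    # Stub callables standing in for real models and an image source.
    outcome = determine_priority_road(
        analysis_video="video",
        first_determiner=lambda video: None,    # video was inconclusive
        fetch_aerial_image=lambda: "aerial",
        second_determiner=lambda image: True,
    )
    print(outcome)
```

Passing the aerial image source as a callable keeps the acquisition lazy, so no aerial image is fetched when the first determination succeeds.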

[0060] <Sign and Signal Detection Processing Unit 113> The sign and signal detection processing unit 113 detects the location of traffic signs and traffic lights, and identifies the type of the detected traffic sign and the lighting state of the detected traffic light. At this time, the sign and signal detection processing unit 113 may have a first sign and signal detection unit 1131, as shown in Figure 12. The first sign and signal detection unit 1131 uses a dedicated object detection model (e.g., YOLO) for detecting the state of traffic signs and traffic lights to detect the location of traffic signs and traffic lights in each frame of the analysis video, and detects the type of traffic sign and the state of the traffic light. The dedicated object detection model is, for example, an object detection model specialized in detecting traffic signs and traffic lights, which has been trained on images of various types of traffic signs and traffic lights. In this way, it becomes possible to recognize and detect the state of traffic signs and traffic lights with high accuracy. The sign and signal detection processing unit 113 may include a dedicated object detection model, or it may use an external dedicated object detection model.

[0061] While dedicated object detection models can accurately recognize pre-trained traffic signs and signals, their accuracy decreases if the training data is biased towards specific regions or countries, resulting in reduced recognition accuracy for traffic signs and signals in other regions or countries. Including images of traffic signs and signals from all regions and countries in the training data is costly.

[0062] Therefore, the sign / signal detection processing unit 113 may further include a second sign / signal detection unit 1132, as shown in Figure 12. The second sign / signal detection unit 1132 uses a zero-shot object detection model (e.g., Florence) to detect the positions of traffic signs and signals in each frame of the analysis video, and to detect the type of traffic sign and the state of the signal. Since the zero-shot object detection model has the ability to recognize objects of classes that do not exist in the training data, it is possible to detect objects with traffic sign and signal designs that do not exist in the training data as traffic signs and signals. The sign / signal detection processing unit 113 may include a zero-shot object detection model, or it may use an external zero-shot object detection model.

[0063] Furthermore, the appearance of traffic signs and traffic lights changes during adverse weather conditions such as rain, fog, and snow, as well as at night. Their appearance also changes if parts of them are obscured by obstacles (e.g., trees or trucks). If the training data does not account for these changes in the appearance of traffic signs and traffic lights, the accuracy of their recognition will decrease. Including all images of traffic signs and traffic lights that account for these changes in appearance in the training data would be very costly. Zero-shot object detection models enable flexible object recognition based on the context of the prompt.

[0064] Therefore, the prompts input to the zero-shot object detection model should include instructions to cause the model to consider changes in the appearance of traffic signs and signals under a given environment. Here, the given environment includes, for example, adverse weather conditions such as rain, fog, or snow, or environments where parts of traffic signs and signals are hidden by obstacles (e.g., trees or trucks).
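One way to assemble such a prompt is sketched below. The wording of the instruction, the constant `BASE_INSTRUCTION`, and the function name `build_detection_prompt` are hypothetical; actual zero-shot object detection models differ in the prompt formats they accept, so this only illustrates attaching environment hints to a detection request.

```python
from typing import Iterable

# Hypothetical base instruction; real models may require a specific format.
BASE_INSTRUCTION = "Detect all traffic signs and traffic lights in the image."


def build_detection_prompt(environments: Iterable[str]) -> str:
    """Append an instruction to account for appearance changes.

    `environments` lists the given environments, e.g. "rain", "fog",
    "snow", "night", or "partial occlusion by trees or trucks".
    """
    cleaned = [env.strip() for env in environments if env.strip()]
    if not cleaned:
        return BASE_INSTRUCTION
    return (
        BASE_INSTRUCTION
        + " Take into account that their appearance may change under the"
        + " following conditions: "
        + ", ".join(cleaned)
        + "."
    )


if __name__ == "__main__":
    print(build_detection_prompt(
        ["rain", "fog", "snow", "partial occlusion by trees or trucks"]
    ))
```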

[0065] The sign / signal detection processing unit 113 may further include a sign / signal classification unit 1133, as shown in Figure 12. The sign / signal classification unit 1133 uses a zero-shot classification model (e.g., CLIP) to classify the type of traffic sign and the state of the traffic light in the cropped image in each frame of the analysis video, based on the cropped image of the location of the traffic sign and traffic light (bounding box surrounding the detected traffic sign and traffic light) detected by the first sign / signal detection unit 1131 and the cropped image of the location of the traffic sign and traffic light (bounding box surrounding the detected traffic sign and traffic light) detected by the second sign / signal detection unit 1132. The sign / signal detection processing unit 113 may include a zero-shot classification model or use an external zero-shot classification model.

[0066] In other words, the sign / signal classification unit 1133 may use a zero-shot classification model to infer objects within the cropped image of the locations of traffic signs and signals detected by the first sign / signal detection unit 1131 and within the cropped image of the locations of traffic signs and signals detected by the second sign / signal detection unit 1132, and obtain classification results for the type of traffic sign and the state of the signal from the zero-shot classification model. The accident analysis processing unit 114 may use the classification results for the type of traffic sign and the state of the signal obtained by the sign / signal classification unit 1133 as the state of traffic signs and signals detected in each frame of the analysis video by the sign / signal detection processing unit 113. By doing so, it becomes possible to detect the type of traffic sign and the state of the signal more accurately, and to obtain more appropriate accident analysis results.

[0067] Figure 13 shows an example of processing operations performed in the sign / signal detection processing unit 113. The processing operations shown in Figure 13 are examples of processing operations performed by the sign / signal detection processing unit 113 in step S502 in Figure 5, step S603 in Figure 6, step S702 in Figure 7, and step S1002 in Figure 10. The first sign / signal detection unit 1131 uses a dedicated object detection model for detecting the state of traffic signs and signals to detect the positions of traffic signs and signals in each frame of the analysis video, and the second sign / signal detection unit 1132 uses a zero-shot object detection model to detect the positions of traffic signs and signals in each frame of the analysis video (step S1301). The sign / signal classification unit 1133 uses a zero-shot classification model to classify the type of traffic sign and the state of the traffic lights in the cropped image for each frame of the analysis video, based on the cropped image of the locations of traffic signs and traffic lights detected by the first sign / signal detection unit 1131 (locations of traffic signs and traffic lights detected using a dedicated object detection model) and the cropped image of the locations of traffic signs and traffic lights detected by the second sign / signal detection unit 1132 (locations of traffic signs and traffic lights detected using a zero-shot object detection model) (step S1302).
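Steps S1301 and S1302 can be sketched as a detect-then-classify routine. This is an illustration only: the detectors and the classifier are passed in as plain callables rather than real YOLO, Florence, or CLIP calls, the frame is represented as a nested list of pixel values, and all names are hypothetical. The source does not say how overlapping detections from the two models are reconciled, so the sketch simply keeps every box.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2), end-exclusive
Frame = List[List[int]]           # rows of pixel values (stand-in for an image)


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the bounding-box region out of the frame."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def detect_and_classify(
    frame: Frame,
    dedicated_detector: Callable[[Frame], Sequence[Box]],
    zero_shot_detector: Callable[[Frame], Sequence[Box]],
    classifier: Callable[[Frame], str],
) -> List[Tuple[Box, str]]:
    """Detect positions with both models, then classify each crop."""
    # Step S1301: positions from the dedicated and the zero-shot model.
    boxes = list(dedicated_detector(frame)) + list(zero_shot_detector(frame))
    # Step S1302: classify the sign type / light state in each cropped image.
    return [(box, classifier(crop(frame, box))) for box in boxes]


if __name__ == "__main__":
    frame = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    results = detect_and_classify(
        frame,
        dedicated_detector=lambda f: [(0, 0, 2, 2)],
        zero_shot_detector=lambda f: [(2, 2, 4, 4)],
        # Toy classifier keyed on pixel sum, purely for demonstration.
        classifier=lambda c: "red light" if sum(map(sum, c)) > 20 else "stop sign",
    )
    print(results)
```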

[0068] The present invention has been described above with reference to preferred embodiments. Although the present invention has been described with reference to specific examples, various modifications and changes can be made to these examples without departing from the spirit and scope of the invention as described in the claims.

[0069]
100 Accident analysis device
110 Control unit
111 Video information acquisition processing unit
112 Driving data acquisition processing unit
113 Sign / signal detection processing unit
1131 First sign / signal detection unit
1132 Second sign / signal detection unit
1133 Sign / signal classification unit
114 Accident analysis processing unit
115 Analysis video generation processing unit
116 Traffic light state identification processing unit
117 Accident type information acquisition processing unit
118 Other party determination processing unit
119 Priority road determination processing unit
1191 First priority road determination unit
1192 Aerial image acquisition processing unit
1193 Second priority road determination unit
120 Storage unit
130 Communication unit

Claims

1. An information processing device comprising: a video information acquisition processing unit that acquires video footage taken from a moving object during a predetermined period including the time of an accident that occurred to the moving object; a driving data acquisition processing unit that acquires driving data of the moving object during the predetermined period; a sign and signal detection processing unit that detects the state of traffic signs and signals in each frame of an analysis video based on the video; and an accident analysis processing unit that performs an analysis of the accident based on the analysis video, the driving data, and the state of traffic signs and signals detected in each frame of the analysis video, wherein the sign and signal detection processing unit comprises: a first sign and signal detection unit that detects the position of traffic signs and signals in each frame of the analysis video using a dedicated object detection model for detecting the state of traffic signs and signals; and a second sign and signal detection unit that detects the position of traffic signs and signals in each frame of the analysis video using a zero-shot object detection model.

2. The information processing device according to claim 1, wherein the prompt input to the zero-shot object detection model includes an instruction to cause the zero-shot object detection model to take into account changes in the appearance of the traffic signs and signals under a predetermined environment.

3. The information processing device according to claim 1, wherein the sign and signal detection processing unit further comprises a sign and signal classification unit that uses a zero-shot classification model to classify the type of traffic sign and the state of the traffic signal in each frame of the analysis video based on cropped images of traffic signs and traffic signals detected by the first sign and signal detection unit and cropped images of traffic signs and traffic signals detected by the second sign and signal detection unit.

4. The information processing device according to claim 1, further comprising a priority road determination processing unit that determines whether the road on which the moving object was traveling was a priority road with respect to the road on which the other party object in the accident was traveling, wherein the accident analysis processing unit performs an analysis of the accident based on the analysis video, the driving data, the state of traffic signs and signals detected in each frame of the analysis video, and the determination result of the priority road determination processing unit.

5. The information processing device according to claim 4, wherein the priority road determination processing unit has a first priority road determination unit that determines, based on the analysis video, whether the road on which the moving object traveled was a priority road with respect to the road on which the other party object traveled.

6. The information processing device according to claim 5, wherein the priority road determination processing unit further comprises: an aerial image acquisition processing unit that acquires aerial images of the road on which the moving object was traveling and the road on which the other party object was traveling when the first priority road determination unit could not determine whether the road on which the moving object was traveling was a priority road; and a second priority road determination unit that uses an image caption generation model to determine, based on the aerial images, whether the road on which the moving object was traveling was a priority road with respect to the road on which the other party object was traveling.

7. An information processing method performed by a computer, comprising: a video information acquisition process step of acquiring video footage taken from a moving object during a predetermined period including the time of an accident that occurred to the moving object; a driving data acquisition process step of acquiring driving data of the moving object during the predetermined period; a sign and signal detection process step of detecting the state of traffic signs and signals in each frame of an analysis video based on the video; and an accident analysis process step of performing an analysis of the accident based on the analysis video, the driving data, and the state of traffic signs and signals detected in each frame of the analysis video, wherein the sign and signal detection process step comprises: a first sign and signal detection step of detecting the position of traffic signs and signals in each frame of the analysis video using a dedicated object detection model for detecting the state of traffic signs and signals; and a second sign and signal detection step of detecting the position of traffic signs and signals in each frame of the analysis video using a zero-shot object detection model.

8. An information processing program that causes a computer to execute the information processing method described in claim 7.

9. A computer-readable storage medium storing the information processing program described in claim 8.