Information processing system

The information processing system addresses the challenge of managing large volumes of vehicle-captured video data by generating and utilizing feature text, enhancing data management efficiency and reducing costs through flexible search capabilities.

JP2026109319A · Pending · Publication Date: 2026-07-01 · TOYOTA JIDOSHA KK

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
TOYOTA JIDOSHA KK
Filing Date
2024-12-19
Publication Date
2026-07-01

AI Technical Summary

Technical Problem

Existing systems struggle to effectively manage large volumes of vehicle-captured video data, leading to inefficiencies in data utilization and management, particularly due to the transmission of only data meeting predetermined conditions, which limits the utilization of non-transmitted data.

Method used

An information processing system that includes storage means for vehicle-captured images, generation of feature text from image frames, storage of feature text outside the vehicle, acquisition of search queries, and retrieval of matching images based on feature text, reducing the need to transmit raw video data.

Benefits of technology

Enables efficient management and retrieval of vehicle-captured images using feature text, minimizing storage and communication costs while allowing flexible search conditions.

✦ Generated by Eureka AI based on patent content.

Abstract

[Problem] To appropriately manage video captured by a vehicle while reducing costs. [Solution] The information processing system includes: storage means for storing, within a vehicle, video captured by a camera mounted on the vehicle; generation means for generating feature text that indicates the characteristics of a scene from image frames contained in the video stored in the storage means; feature storage means provided outside the vehicle for storing the feature text; query acquisition means for acquiring, from a user outside the vehicle, a search query for searching the video stored in the storage means; search means for searching the feature text stored in the feature storage means for feature text that matches the search query; and output means for extracting, from the storage means, the video corresponding to the feature text that matches the search query and outputting it to the user.

Description

Technical Field

[0001] This disclosure relates to the technical field of information processing systems.

Background Art

[0002] Systems of this type that manage video captured by a vehicle are known. For example, Patent Document 1 discloses a technique in which, of the video data captured by a vehicle, only the data that meets predetermined detection conditions (such as position information and motion conditions) set according to the intended use of the data is transmitted to a server.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] By using a large language model (LLM) constructed with a large amount of data and deep learning technology, text related to images can be automatically generated. Although it is conceivable to manage images using such text, it is not easy to appropriately manage the huge amount of video data captured by a vehicle. For example, in the technique described in Patent Document 1, since only data that meets the detection conditions is transmitted to the server as described above, it is difficult to utilize the data that has not been transmitted to the server, and it cannot be said that the images are appropriately managed.

[0005] This disclosure has been made in view of the above problems, and an object thereof is to provide an information processing system capable of appropriately managing images captured by a vehicle while reducing costs.

Means for Solving the Problems

[0006] An information processing system according to one aspect of this disclosure includes: storage means for storing, within a vehicle, video captured by a camera mounted on the vehicle; generation means for generating feature text indicating the characteristics of a scene from image frames included in the video stored in the storage means; feature storage means provided outside the vehicle for storing the feature text; query acquisition means for acquiring, from a user outside the vehicle, a search query for searching the video stored in the storage means; search means for searching the feature text stored in the feature storage means for feature text that matches the search query; and output means for extracting, from the storage means, the video corresponding to the feature text that matches the search query and outputting it to the user.

Brief Description of the Drawings

[0007]
Figure 1 is a block diagram showing the hardware configuration of the information processing system according to the embodiment.
Figure 2 is a block diagram showing the functional configuration of the information processing system according to the embodiment.
Figure 3 is a flowchart showing the flow of the data storage operation by the information processing system according to the embodiment.
Figure 4 is a flowchart showing the flow of the extraction operation by the information processing system according to the embodiment.

Modes for Carrying Out the Invention

[0008] The following describes an embodiment of the information processing system with reference to the drawings.

[0009] (Hardware configuration) First, the hardware configuration of the information processing system according to the embodiment will be described with reference to Figure 1. Figure 1 is a block diagram showing the hardware configuration of the information processing system according to the embodiment.

[0010] In Figure 1, the information processing system 1 according to the embodiment includes an in-vehicle device 10 and a server 50. The in-vehicle device 10 is mounted on a vehicle, whereas the server 50 is installed outside the vehicle. The in-vehicle device 10 and the server 50 are configured to communicate with each other via wireless communication. For the sake of explanation, one in-vehicle device 10 and one server 50 are shown here, but multiple in-vehicle devices 10 may each communicate with one server 50, one in-vehicle device 10 may communicate with multiple servers 50, and multiple in-vehicle devices 10 may communicate with multiple servers 50.

[0011] The in-vehicle device 10 comprises an arithmetic unit 110, a storage device 120, a communication device 130, an input device 140, and an output device 150. The arithmetic unit 110, storage device 120, communication device 130, input device 140, and output device 150 are connected to each other via a data bus. All of the devices included in the in-vehicle device 10 described above may be mounted in the vehicle. Alternatively, some of the devices included in the in-vehicle device 10 may be mounted in the vehicle while the remaining devices are installed outside the vehicle.

[0012] The arithmetic unit 110 is configured to perform various arithmetic processes in the in-vehicle device 10. The arithmetic unit 110 may have a processor. The arithmetic unit 110 may have a single processor or may have multiple processors. In other words, the arithmetic unit 110 may have one or more processors. The processor may be a multi-core processor. If the arithmetic unit 110 has a single processor that is a multi-core processor, then logically, the arithmetic unit 110 can be said to have multiple processors.

[0013] The processor in the arithmetic unit 110 may be at least one of the following: CPU (Central Processing Unit), GPU (Graphics Processing Unit), FPGA (Field Programmable Gate Array), and TPU (Tensor Processing Unit).

[0014] The storage device 120 may be at least one of the following: RAM (Random Access Memory), ROM (Read Only Memory), hard disk drive, magneto-optical disk drive, SSD (Solid State Drive), and optical disk array. In other words, the storage device 120 may be implemented by a single device or by multiple devices.

[0015] The storage device 120 is capable of storing desired data. The storage device 120 may store the computer program CP that the arithmetic unit 110 will execute. The storage device 120 may temporarily store data that the arithmetic unit 110 will use temporarily when the arithmetic unit 110 is executing the computer program CP.

[0016] The computer program CP may be recorded on a computer-readable, non-transitory recording medium. In this case, the computer program CP may be stored in the storage device 120 by reading the recording medium using a recording medium reading device (not shown) provided in the in-vehicle device 10. At least one of the following may be used as the recording medium: an optical disc, a magnetic medium, a magneto-optical disc, a semiconductor memory, and any other medium capable of storing a program. The computer program CP may also be obtained from a device (not shown) external to the in-vehicle device 10 via the communication device 130. In other words, the computer program CP may be downloaded from an external device to the storage device 120 of the in-vehicle device 10.

[0017] The arithmetic unit 110 (for example, a processor) may execute the processing that the in-vehicle device 10 should perform together with the storage device 120 in which the computer program CP is stored (in other words, together with the storage device 120 and the computer program CP stored in the storage device 120). For example, by the arithmetic unit 110 executing the computer program CP, a logical functional block for executing the processing that the in-vehicle device 10 should perform may be realized within the arithmetic unit 110 (for example, within the processor).

[0018] The communication device 130 is configured to communicate with devices outside the in-vehicle device 10. The communication device 130 may use wired communication or wireless communication.

[0019] The input device 140 is a device capable of receiving information input to the in-vehicle device 10 from an external source. The input device 140 may include an operating device (e.g., a keyboard, mouse, touch panel, etc.) that can be operated by the user of the in-vehicle device 10. The input device 140 may also include a recording medium reader capable of reading information recorded on a recording medium that can be attached to or detached from the in-vehicle device 10, such as a USB (Universal Serial Bus) memory. When information is input to the in-vehicle device 10 via the communication device 130 (in other words, when the in-vehicle device 10 acquires information via the communication device 130), the communication device 130 may function as an input device.

[0020] The output device 150 is a device capable of outputting information to the outside of the in-vehicle device 10. The output device 150 may have a display device capable of outputting visual information such as characters and images as the above information. The output device 150 may have a speaker capable of outputting auditory information such as voice as the above information. The output device 150 may be configured to output, as the above information, information directed at other devices (for example, control information for those devices). The output device 150 may be capable of outputting information to a recording medium that can be attached to and removed from the in-vehicle device 10, such as a USB memory. When the in-vehicle device 10 outputs information via the communication device 130, the communication device 130 may function as an output device.

[0021] Note that the server 50 may also have the same hardware configuration as the in-vehicle device 10 described above. For example, the server 50 may be configured to include components similar to the arithmetic unit 110, the storage device 120, the communication device 130, the input device 140, and the output device 150.

[0022] (Functional configuration) Next, the functional configuration of the information processing system 1 according to the embodiment will be described with reference to Figure 2. Figure 2 is a block diagram showing the functional configuration of the information processing system according to the embodiment.

[0023] In Figure 2, the information processing system 1 is configured as a system for managing video data captured by a vehicle. The in-vehicle device 10 in the information processing system 1 includes, as components for realizing its functions, a video data storage unit 210, a feature text generation unit 220, and a video extraction unit 230. Further, the server 50 in the information processing system 1 includes, as components for realizing its functions, a feature text storage unit 510, a query acquisition unit 520, a feature text search unit 530, and a video output unit 540. Each of the feature text generation unit 220, the video extraction unit 230, the query acquisition unit 520, the feature text search unit 530, and the video output unit 540 may be a processing block realized by the arithmetic unit 110 described above. Also, each of the video data storage unit 210 and the feature text storage unit 510 may be a database realized by the storage device 120 described above.

[0024] The video data storage unit 210 is configured to be able to store video data captured by a vehicle. The video data typically includes data of an image of the surrounding area of the vehicle captured by a camera mounted on the vehicle, but may also include an image inside the vehicle (i.e., inside the passenger compartment). Further, the video data storage unit 210 may be configured to acquire and store video captured by a terminal (e.g., a smartphone, etc.) owned by a user using the vehicle via wireless communication or the like.

[0025] The video data storage unit 210 may have a function to delete low-priority video data when the remaining storage capacity becomes low. This priority may be assigned based on at least one of the following: information about the vehicle at the time the video was captured, and information about the surrounding conditions of the vehicle. More specifically, video captured in situations that rarely occur may be given a higher priority (i.e., made less likely to be deleted).
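
The priority-based deletion described in [0025] can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `Clip` and `VideoStore`, the integer priority scale (higher means rarer and kept longer), and the rule of evicting the single lowest-priority clip until the new data fits are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Clip:
    clip_id: str
    size_mb: int
    priority: int  # assumed scale: higher = rarer situation, deleted later


@dataclass
class VideoStore:
    """Hypothetical stand-in for the video data storage unit 210."""
    capacity_mb: int
    clips: List[Clip] = field(default_factory=list)

    def used_mb(self) -> int:
        return sum(c.size_mb for c in self.clips)

    def add(self, clip: Clip) -> List[str]:
        """Store a clip; while over capacity, delete the lowest-priority clip.

        Returns the ids of the clips that were deleted.
        """
        evicted = []
        self.clips.append(clip)
        while self.used_mb() > self.capacity_mb and self.clips:
            victim = min(self.clips, key=lambda c: c.priority)
            self.clips.remove(victim)
            evicted.append(victim.clip_id)
        return evicted


store = VideoStore(capacity_mb=100)
store.add(Clip("commute", size_mb=60, priority=1))
store.add(Clip("near_miss", size_mb=30, priority=9))
evicted = store.add(Clip("parking", size_mb=40, priority=2))
remaining = [c.clip_id for c in store.clips]
```

Note that under this rule a newly captured clip can itself be deleted immediately if it has the lowest priority; whether that is desirable is a design choice the disclosure does not address.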

[0026] The feature text generation unit 220 is configured to generate feature text from video data stored in the video data storage unit 210. Specifically, the feature text generation unit 220 analyzes each image frame contained in the video data to generate feature text. Feature text is text data that describes the characteristics of a scene.

[0027] The feature text may include text corresponding to predetermined conditions. Specific examples of predetermined conditions include the time, location, and circumstances of the video recording. For example, the feature text generated from video footage of children crossing a crosswalk may be: "Time of recording: Month / Day / Hour / Minute, Location: Town / City / Prefecture, Circumstance: Many children are crossing the crosswalk." These predetermined conditions may be changed as needed. When they are changed, the feature text generation unit 220 need only regenerate the feature text, based on the changed predetermined conditions, from the video data stored in the video data storage unit 210. For example, if a new item is added as a predetermined condition, the feature text generation unit 220 need only generate additional feature text related to the new item.
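
As a rough sketch of how feature text keyed by predetermined conditions could be extended when an item is added, consider the following. Everything here is hypothetical: the function names, the dictionary representation, the placeholder frame values, and the "Weather" item are illustrative assumptions, and the simple lookups stand in for whatever model actually describes the frame.

```python
from typing import Callable, Dict, Optional

# A describer produces the text for one condition item from a frame record.
Describer = Callable[[dict], str]


def generate_feature_text(
    frame: dict,
    conditions: Dict[str, Describer],
    existing: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Fill in only the condition items that are missing from `existing`."""
    features = dict(existing or {})
    for item, describe in conditions.items():
        if item not in features:
            features[item] = describe(frame)
    return features


def render(features: Dict[str, str]) -> str:
    """Join the items into a single feature-text string."""
    return ", ".join(f"{item}: {value}" for item, value in features.items())


# Placeholder frame record; the values are illustrative only.
frame = {
    "timestamp": "Month / Day / Hour / Minute",
    "place": "Town / City / Prefecture",
    "caption": "Many children are crossing the crosswalk.",
    "weather": "clear",
}
conditions: Dict[str, Describer] = {
    "Time of recording": lambda f: f["timestamp"],
    "Location": lambda f: f["place"],
    "Circumstance": lambda f: f["caption"],
}
features = generate_feature_text(frame, conditions)

# A new item is added later; only that item is generated.
conditions["Weather"] = lambda f: f["weather"]
features = generate_feature_text(frame, conditions, existing=features)
text = render(features)
```

This sketch covers only the added-item case. If the definition of an existing item changed, its old entry would have to be dropped from `existing` before regenerating.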

[0028] The feature text generation unit 220 may generate feature text using a machine learning model. This model may take image frames contained in video data as input and output feature text. This model may be, for example, a large language model (LLM). It may also be a multimodal LLM that handles multiple modalities.

[0029] The video extraction unit 230 is configured to extract video data corresponding to feature text input from the server 50 (specifically, the feature text search unit 530) from among multiple video data stored in the video data storage unit 210. The video extraction unit 230 is also configured to transmit the video data extracted from the video data storage unit 210 to the server 50 (specifically, the video output unit 540).

[0030] The feature text storage unit 510 is configured to store feature text. Specifically, the feature text storage unit 510 is configured to sequentially acquire and store the feature text generated by the feature text generation unit 220.

[0031] The query acquisition unit 520 acquires a search query from the user. The search query is query information for searching video data stored in the video data storage unit 210, and may be text information including natural language. For example, the search query may include the text, "Time of day: daytime, video footage of one minute before and after a scene in which a pedestrian is running a red light at an intersection." The query acquisition unit 520 may acquire text information entered by the user of the vehicle using a touch panel or the like as a search query. Alternatively, the query acquisition unit 520 may acquire voice information entered by the user of the vehicle using a microphone or the like as a search query. In this case, the query acquisition unit 520 may have a function to convert the acquired voice information into text (i.e., a voice recognition function).

[0032] The feature text search unit 530 is configured to search the feature texts stored in the feature text storage unit 510 for those that match the search query acquired by the query acquisition unit 520. Furthermore, when the feature text search unit 530 finds a feature text that matches the search query, it outputs information about that feature text to the in-vehicle device 10 (specifically, the video extraction unit 230) and requests extraction of the corresponding video.
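
The disclosure does not specify how a feature text is judged to "match" a query. One simple possibility, shown here purely as an assumption, is word overlap: a feature text matches when it contains at least a given fraction of the query's words. The function names, the clip identifiers, and the `min_overlap` threshold are hypothetical; an embedding-based or LLM-based matcher could take the place of this scoring.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_feature_texts(
    query: str, index: Dict[str, str], min_overlap: float = 0.5
) -> List[str]:
    """Return ids of feature texts sharing enough words with the query.

    `index` maps a clip id to its stored feature text. Results are ordered
    by descending overlap score, then by clip id.
    """
    query_words = tokenize(query)
    if not query_words:
        return []
    hits = []
    for clip_id, text in index.items():
        score = len(query_words & tokenize(text)) / len(query_words)
        if score >= min_overlap:
            hits.append((clip_id, score))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return [clip_id for clip_id, _ in hits]


index = {
    "clip-001": "Circumstance: Many children are crossing the crosswalk.",
    "clip-002": "Circumstance: A pedestrian is running a red light at an intersection.",
    "clip-003": "Circumstance: Empty highway at night.",
}
matched = search_feature_texts("pedestrian running a red light", index)
```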

[0033] The video output unit 540 is configured to receive video data extracted by the video extraction unit 230 in the in-vehicle device 10 and output it to the user. The video output unit 540 may, for example, output the video data to a display and play it automatically. The video output unit 540 may also output multiple video data. In this case, the video output unit 540 may present the user with a list containing multiple video data and output the video data selected by the user from the list. The video data output by the video output unit 540 may be used, for example, for accident cause analysis or automatic generation of car life log videos.

[0034] (Storage operation) Next, with reference to Figure 3, the flow of the data storage operation by the information processing system 1 according to the embodiment (specifically, the operation when storing video data and feature text corresponding to the video data) will be explained. Figure 3 is a flowchart showing the flow of the data storage operation by the information processing system according to the embodiment.

[0035] As shown in Figure 3, when the storage operation by the information processing system 1 according to the embodiment is started, first video is captured in the vehicle (step S101), and the video data storage unit 210 stores the captured video data (step S102).

[0036] Next, the feature text generation unit 220 generates feature text from the video data stored in the video data storage unit 210 (step S103). Then, the feature text generation unit 220 sends the generated feature text to the server 50 (step S104). The server 50 receives the feature text sent by the feature text generation unit 220 and stores it in the feature text storage unit 510 (step S105).

[0037] Although this example shows feature text being generated immediately after video is captured, the feature text may be generated after a period of time has elapsed since the video was captured (for example, after a sufficient amount of video has been stored in the video data storage unit 210). Furthermore, if predetermined conditions related to the feature text are changed, the feature text may be regenerated at that time.
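
Steps S101 to S105 can be sketched as below. The class and method names are hypothetical, the `describe` callable is a stub standing in for the feature text generation unit 220, and a direct method call stands in for the wireless link. The point of the sketch is that only text crosses from the vehicle to the server while the video stays on the vehicle.

```python
from typing import Callable, Dict


class Server:
    """Stands in for the server 50; it holds feature text only, never video."""

    def __init__(self) -> None:
        self.feature_texts: Dict[str, str] = {}

    def receive_feature_text(self, clip_id: str, text: str) -> None:
        self.feature_texts[clip_id] = text  # step S105


class InVehicleDevice:
    """Stands in for the in-vehicle device 10."""

    def __init__(self, server: Server, describe: Callable[[bytes], str]) -> None:
        self.server = server
        self.describe = describe  # stub for the feature text generation unit 220
        self.videos: Dict[str, bytes] = {}

    def on_video_captured(self, clip_id: str, video: bytes) -> str:
        self.videos[clip_id] = video                      # step S102: store locally
        text = self.describe(video)                       # step S103: generate text
        self.server.receive_feature_text(clip_id, text)   # step S104: send text only
        return text


server = Server()
device = InVehicleDevice(
    server,
    describe=lambda video: f"Circumstance: placeholder description ({len(video)} bytes)",
)
device.on_video_captured("clip-001", b"\x00" * 1024)  # step S101: video captured
```

Deferred generation, as mentioned in [0037], would amount to calling the describe-and-send steps later rather than inside `on_video_captured`.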

[0038] (Extraction operation) Next, with reference to Figure 4, the flow of the extraction operation by the information processing system 1 according to the embodiment (specifically, the operation of outputting video data that matches the search query entered by the user from the stored video data) will be explained. Figure 4 is a flowchart showing the flow of the extraction operation by the information processing system according to the embodiment.

[0039] As shown in Figure 4, when the extraction operation by the information processing system 1 according to the embodiment is started, the query acquisition unit 520 first acquires a search query from the user (step S201). Then, the feature text search unit 530 searches the feature text stored in the feature text storage unit 510 for one that matches the search query acquired by the query acquisition unit 520 (step S202).

[0040] Next, the feature text search unit 530 transmits information about feature texts that match the search query to the in-vehicle device 10 and requests video extraction (step S203). Then, the video extraction unit 230 extracts the video corresponding to the requested feature text from among the multiple video data stored in the video data storage unit 210 (step S204).

[0041] Next, the video extraction unit 230 transmits the video data extracted from the video data storage unit 210 to the video output unit 540 in the server 50 (step S205). The video output unit 540 then receives the video data extracted by the video extraction unit 230 and outputs it to the user (step S206).
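
The round trip in steps S201 to S206 can be sketched with plain functions. All names are hypothetical, the `matches` predicate is a stand-in for whatever matching the feature text search unit 530 performs, and skipping clips that are no longer stored on the vehicle (for example, after low-priority deletion) is an assumption the disclosure does not spell out.

```python
from typing import Callable, Dict, List


def search(query: str, feature_texts: Dict[str, str],
           matches: Callable[[str, str], bool]) -> List[str]:
    """Server side, step S202: find ids whose feature text matches the query."""
    return [cid for cid, text in feature_texts.items() if matches(query, text)]


def extract(clip_ids: List[str], vehicle_videos: Dict[str, bytes]) -> Dict[str, bytes]:
    """Vehicle side, step S204: pull the requested clips that are still stored."""
    return {cid: vehicle_videos[cid] for cid in clip_ids if cid in vehicle_videos}


def extraction_operation(query: str, feature_texts: Dict[str, str],
                         vehicle_videos: Dict[str, bytes],
                         matches: Callable[[str, str], bool]) -> Dict[str, bytes]:
    matched_ids = search(query, feature_texts, matches)   # S201-S202
    extracted = extract(matched_ids, vehicle_videos)      # S203-S204
    return extracted                                      # S205-S206: sent and output


def all_words_present(query: str, text: str) -> bool:
    """Illustrative matcher: every query word appears in the feature text."""
    return all(word in text.lower() for word in query.lower().split())


feature_texts = {
    "clip-001": "Circumstance: Many children are crossing the crosswalk.",
    "clip-002": "Circumstance: A pedestrian is running a red light at an intersection.",
    "clip-003": "Circumstance: A pedestrian is waiting at a red light.",
}
# clip-003 is assumed to have been deleted on the vehicle already.
vehicle_videos = {"clip-001": b"video-1", "clip-002": b"video-2"}

result = extraction_operation("pedestrian red light", feature_texts,
                              vehicle_videos, all_words_present)
```

Only the matched clip leaves the vehicle, which is where the communication saving described in the technical effects comes from.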

[0042] (Technical effects) Next, the technical effects obtained by the information processing system 1 according to this embodiment will be described.

[0043] As explained with reference to Figures 1 to 4, in the information processing system 1 according to this embodiment, video data captured by the vehicle is managed by the server 50 using feature text generated from the video data. There is therefore no need to send all of the video data captured by the vehicle to the server 50; the video data can be properly managed simply by sending the feature text generated from it to the server 50. For example, a user who wants to obtain particular video data can enter a search query on the server 50 and search the feature text managed on the server 50. As a result, only the video data matching the search query is extracted on the vehicle side and sent to the server 50. With this configuration, the storage cost on the server 50 side and the communication cost between the in-vehicle device 10 and the server 50 can be reduced. In addition, since video searches can be performed using feature text, it is possible to extract videos that match flexible conditions specified in text.

[0044] This disclosure is not limited to the embodiments described above and may be modified as appropriate without contradicting the gist or idea of the invention as can be inferred from the claims and the specification as a whole. Information processing systems with such modifications are also included within the technical scope of the present invention.

Explanation of Symbols

[0045] 1…Information processing system, 10…In-vehicle device, 50…Server, 110…Arithmetic unit, 120…Storage device, 130…Communication device, 140…Input device, 150…Output device, 210…Video data storage unit, 220…Feature text generation unit, 230…Video extraction unit, 510…Feature text storage unit, 520…Query acquisition unit, 530…Feature text search unit, 540…Video output unit

Claims

1. An information processing system comprising: storage means for storing, within a vehicle, video captured by a camera mounted on the vehicle; generation means for generating feature text that indicates the characteristics of a scene from image frames contained in the video stored in the storage means; feature storage means provided outside the vehicle for storing the feature text; query acquisition means for acquiring, from a user outside the vehicle, a search query for searching the video stored in the storage means; search means for searching the feature text stored in the feature storage means for feature text that matches the search query; and output means for extracting, from the storage means, the video corresponding to the feature text that matches the search query and outputting it to the user.

2. The information processing system according to claim 1, wherein the feature text includes text corresponding to predetermined conditions, and the generation means regenerates the feature text based on the changed predetermined conditions if the predetermined conditions are changed after the feature text has been generated.

3. The information processing system according to claim 1 or 2, wherein the storage means assigns a priority to the video based on at least one of information regarding the vehicle at the time the video was captured and information regarding the surrounding conditions of the vehicle, stores the video with that priority, and, when deleting video, deletes it in order starting from the lowest priority.

4. The information processing system according to claim 1 or 2, wherein the generation means is a large language model that takes the image frame as input and outputs the feature text.