Image extraction device, vehicle, image extraction system, image extraction method, and storage medium storing image extraction program

CN115905592B · Active · Publication Date: 2026-06-19 · TOYOTA JIDOSHA KK

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TOYOTA JIDOSHA KK
Filing Date
2022-09-05
Publication Date
2026-06-19


Abstract

The present disclosure relates to an image extraction device, a vehicle, an image extraction system, an image extraction method, and a storage medium having an image extraction program recorded thereon. The image extraction device of the present disclosure includes: an acquisition unit that acquires an image outside a vehicle captured by an imaging device mounted on the vehicle; an extraction unit that extracts a plurality of extraction images that are images including an image captured at a time point that satisfies a predetermined condition; an inference unit that infers a desired image that is an image most desired by an occupant of the vehicle from the plurality of extraction images; and an output unit that outputs the desired image.

Description

Technical Field

[0001] This disclosure relates to an image extraction device, a vehicle, an image extraction system, an image extraction method, and a storage medium storing an image extraction program.

Background Technology

[0002] Japanese Patent Application Publication No. 2019-114875 discloses an image capturing device that infers subjects that match the occupant's preferences based on machine learning results of images uploaded by occupants to social media, and instructs the camera unit used for capturing images of the exterior of the vehicle to capture the inferred subjects.

[0003] The imaging device disclosed in Japanese Patent Application Publication No. 2019-114875 captures an image only once, so the captured image may not be the image most desired by the occupants.

Summary of the Invention

[0004] This disclosure was made in consideration of the above facts, and its purpose is to provide an image extraction device, vehicle, image extraction system, image extraction method, and storage medium storing an image extraction program that are more likely to output the image most desired by the occupant compared to the case of taking only one picture.

[0005] The image extraction device of the first form includes: an acquisition unit that acquires images of the exterior of a vehicle captured by a camera mounted on the vehicle; an extraction unit that extracts multiple extracted images from the images, including images captured at time points that meet predetermined conditions; an inference unit that infers a desired image from the multiple extracted images as the image most desired by the occupants of the vehicle; and an output unit that outputs the desired image.

[0006] In the image extraction device of the first form, the acquisition unit acquires images of the exterior of the vehicle captured by a camera mounted on the vehicle; the extraction unit extracts multiple extracted images from these images, including images captured at time points that meet predetermined conditions; the inference unit infers a desired image from the multiple extracted images, which is the image most desired by the occupants of the vehicle; and the output unit outputs the desired image. According to the image extraction device of the first form, the image most desired by the occupants is more likely to be output than when only one image is captured.

[0007] The image extraction device of the second form is configured such that, in the image extraction device of the first form, the extraction unit extracts, as the extracted images, images including an image captured at a time point when the occupant's condition changes.

[0008] According to the image extraction device of the second form, even when the occupant cannot give a physical shooting instruction such as pressing a switch, it is possible to extract images including an image captured at a time point when the occupant's condition changes.

[0009] The image extraction device of the third form is configured such that, in the image extraction device of the first or second form, the extraction unit extracts, as the extracted images, images including an image captured at a time point when the occupant utters a sound containing a predetermined statement.

[0010] According to the image extraction device of the third form, even when the occupant cannot give a physical shooting instruction such as pressing a switch, it is possible to extract images including an image captured at a time point when the occupant utters a sound containing a predetermined statement.

[0011] The image extraction device of the fourth form is configured such that, in the image extraction device of any of the first to third forms, the extraction unit extracts, as the extracted images, images including an image captured at a time point when the vehicle passes a predetermined location.

[0012] According to the image extraction device of the fourth form, even when the occupant cannot give a physical shooting instruction such as pressing a switch, it is possible to extract images including an image captured at a time point when the vehicle passes a predetermined location.

[0013] The image extraction device of the fifth form is configured such that, in the image extraction device of any of the first to fourth forms, the inference unit infers the desired image based on the results of machine learning performed on images submitted to a social networking service.

[0014] According to the fifth form of the image extraction device, a desired image can be inferred from multiple extracted images based on images submitted to social networking services.

[0015] The image extraction device described in the sixth form is configured such that, based on the image extraction device of the fifth form, the inference unit infers the desired image based on the results of machine learning on the image submitted by the occupant to the social networking service.

[0016] According to the image extraction device of the sixth form, it is possible to infer a desired image from multiple extracted images based on images submitted by occupants to social networking services.

[0017] The image extraction device of the seventh form is configured such that, based on the image extraction devices of any of the first to fourth forms, the inference unit infers the desired image based on the results of machine learning of the occupant's action history.

[0018] According to the image extraction device of the seventh form, it is possible to infer the desired image from multiple extracted images based on the occupant's action history.

[0019] The vehicle of the eighth form is equipped with the image extraction device of any of the first to seventh forms and an imaging device.

[0020] According to the vehicle of the eighth form, the image most desired by the occupants is more likely to be output than when only one picture is taken.

[0021] The image extraction system of the ninth form comprises the image extraction device of any one of the first to seventh forms and a mobile terminal, wherein the mobile terminal comprises a display device for displaying the desired image.

[0022] According to the image extraction system of the ninth form, compared with the case of taking only one picture, it is more likely to display the image most desired by the occupants on the display device.

[0023] The image extraction method of the tenth form is configured such that a computer performs the following processing: acquiring images of the exterior of a vehicle captured by a camera mounted on the vehicle; extracting multiple extracted images from the images, including images captured at time points that meet predetermined conditions; inferring a desired image from the multiple extracted images as the image most desired by the occupants of the vehicle; and outputting the desired image.

[0024] According to the image extraction method of the tenth form, the image most desired by the occupants is more likely to be output than when only one picture is taken.

[0025] The non-transitory storage medium of the eleventh form stores an image extraction program that causes a computer to perform the following processing: acquiring images of the exterior of a vehicle captured by a camera mounted on the vehicle; extracting multiple extracted images from the images, including images captured at time points that meet predetermined conditions; inferring a desired image from the multiple extracted images as the image most desired by the occupants of the vehicle; and outputting the desired image.

[0026] According to the non-transitory storage medium storing the image extraction program of the eleventh form, the image most desired by the occupants is more likely to be output than when only one picture is taken.

[0027] According to this disclosure, compared to the case of taking only one picture, there is a higher probability that the image most desired by the occupants can be output.

Attached Figure Description

[0028] Exemplary embodiments of this disclosure will be described in detail based on the following figures.

[0029] Figure 1 is a diagram showing a simplified structure of the image extraction system according to the first embodiment.

[0030] Figure 2 is a block diagram showing the hardware structure of the equipment mounted on the vehicle.

[0031] Figure 3 is a block diagram illustrating an example of the functional structure of the CPU in the control device.

[0032] Figure 4 is a flowchart illustrating an example of the trigger decision processing performed by the control device according to the embodiment.

[0033] Figure 5 is a flowchart illustrating an example of the generation process performed by the control device according to the first embodiment.

[0034] Figure 6 is a flowchart illustrating an example of the output processing performed by the control device according to the embodiment.

[0035] Figure 7 is a diagram showing a simplified structure of the image extraction system according to the second embodiment.

[0036] Figure 8 is a flowchart illustrating an example of the generation process performed by the control device according to the second embodiment.

[0037] Figure 9 is a flowchart illustrating an example of the extraction process performed by the control device according to the second embodiment.

[0038] Figure 10 is a flowchart illustrating an example of the inference process performed by the inference device according to the second embodiment.

Detailed Implementation

[0039] Hereinafter, an image extraction system as an embodiment of the present disclosure will be described using the accompanying drawings.

[0040] [First Embodiment]

[0041] (Structure)

[0042] Figure 1 is a block diagram illustrating a simplified structure of the image extraction system 10 according to this embodiment.

[0043] As shown in Figure 1, the image extraction system 10 according to this embodiment includes a vehicle 12, an SNS (Social Networking Service) server 40, and a smartphone 50 as a mobile terminal. The vehicle 12, the SNS server 40, and the smartphone 50 are interconnected via a network N1. In addition, the vehicle 12 includes a control device 20 as an image extraction device and an exterior camera 22 as an imaging device.

[0044] The SNS server 40 functions as a management server for a social networking service (hereinafter referred to as "SNS"). The SNS server 40 stores data related to submissions for each user's account. The following explanation uses, as an example, the case where the user of the SNS server 40 is an occupant of the vehicle 12.

[0045] The smartphone 50 is a mobile terminal held by the occupants of the vehicle 12 (hereinafter referred to as "occupants"). Alternatively, a mobile terminal such as a personal computer or tablet can be used instead of the smartphone 50. The smartphone 50 is equipped with a liquid crystal display 50A, which serves as a display device for displaying desired images (described later) output from the control device 20.

[0046] Figure 2 is a block diagram showing the hardware structure of the devices mounted on the vehicle 12 in this embodiment.

[0047] The exterior camera 22 is a camera device used to capture images of the exterior of the vehicle. The exterior camera 22 can be installed either outside or inside the vehicle.

[0048] Microphone 23 is a device for collecting the sounds emitted by the occupants. Microphone 23 is located in the dashboard, center console, front pillar, or control panel, etc.

[0049] GPS device 24 is a device for obtaining the location information of vehicle 12.

[0050] The seating sensor 25 is, for example, a sensor that uses a piezoelectric element to detect pressure changes caused by the occupant sitting down. The seating sensor 25 is installed on the seat surface of the driver's seat, etc.

[0051] The human sensor 26 is, for example, a sensor that uses infrared, ultrasonic, or visible light to detect people approaching the vehicle 12. The human sensor 26 is installed in the bumper or similar location. Alternatively, the human sensor 26 can also be a sensor for detecting occupants. In this case, the human sensor 26 is installed inside the vehicle 12.

[0052] The control device 20 is configured to include a CPU 20A, a ROM 20B, a RAM 20C, a storage device 20D, an input/output I/F 20E, and a wireless communication I/F 20F. The CPU 20A, ROM 20B, RAM 20C, storage device 20D, input/output I/F 20E, and wireless communication I/F 20F are communicatively connected to each other via a bus 21. The CPU 20A is an example of a processor.

[0053] CPU 20A is a central processing unit that executes various programs and controls various components. Specifically, CPU 20A reads programs from ROM 20B or storage device 20D and executes them using RAM 20C as its working area. In this embodiment, the control program 250, described later, is stored in storage device 20D. By executing the control program 250, CPU 20A functions as the acquisition unit 200, extraction unit 210, generation unit 220, inference unit 230, and output unit 240 shown in Figure 3.

[0054] ROM 20B stores various programs and data. RAM 20C temporarily stores programs or data as a working area. Storage device 20D, as a storage unit, is composed of HDD (Hard Disk Drive) or SSD (Solid State Drive) and stores various programs and data, including the operating system. In this embodiment, storage device 20D stores control program 250 and calculation data 260.

[0055] The input/output I/F 20E is an interface for communicating with various devices mounted on the vehicle 12. In this embodiment, the control device 20 is connected to an exterior camera 22, a microphone 23, a GPS device 24, a seating sensor 25, and a human sensor 26 via the input/output I/F 20E. Alternatively, the exterior camera 22, microphone 23, GPS device 24, seating sensor 25, and human sensor 26 can be directly connected to the bus 21.

[0056] The wireless communication I/F 20F is an interface used to communicate with other devices such as the SNS server 40 and the smartphone 50, using standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark).

[0057] The control program 250, which is an image extraction program, is a program used to control the control device 20.

[0058] The calculation data 260 stores a score calculation formula for calculating the evaluation score of an extracted image. One formula is stored for each type, where a type is a broad category of the photographed subject, such as the sea, an airplane, or a person.

[0059] Figure 3 is a block diagram illustrating an example of the functional structure of the CPU 20A. As shown in Figure 3, the CPU 20A includes an acquisition unit 200, an extraction unit 210, a generation unit 220, an inference unit 230, and an output unit 240. Each functional structure is implemented by the CPU 20A reading and executing the control program 250 stored in the storage device 20D.

[0060] The acquisition unit 200 is capable of acquiring images of the exterior of the vehicle captured by the exterior camera 22. Additionally, the acquisition unit 200 acquires the voices emitted by the occupants from the microphone 23. Furthermore, the acquisition unit 200 acquires the location information of the vehicle 12 from the GPS device 24.

[0061] In addition, the acquisition unit 200 has the function of acquiring images (hereinafter also referred to as "submitted images") that occupants uploaded to the SNS and that are sent from the SNS server 40. Besides submitted images, the acquisition unit 200 can also acquire text or audio information uploaded to the SNS by occupants. The acquisition unit 200 can also acquire, from the SNS server 40, the locations where images uploaded to the SNS by a predetermined number (e.g., 100,000) or more of accounts were taken.

[0062] The extraction unit 210 has the function of extracting multiple images from the images acquired by the acquisition unit 200, including images captured at a time point (hereinafter referred to as the "trigger acquisition time") that meets a predetermined condition. In this embodiment, all images captured from a predetermined time before the trigger acquisition time (e.g., 5 seconds before) to a predetermined time after the trigger acquisition time (e.g., 5 seconds after) are used as the extracted images.
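
The windowing in [0062] can be sketched as follows. This is a minimal illustration, assuming each captured frame carries a capture time in seconds; the `Frame` record, the `extract_window` function, and the sample data are hypothetical names introduced here and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame record: capture time in seconds plus an image identifier."""
    timestamp: float
    image_id: str


def extract_window(frames: List[Frame], trigger_time: float,
                   before: float = 5.0, after: float = 5.0) -> List[Frame]:
    """Return every frame captured in [trigger_time - before, trigger_time + after]."""
    start, end = trigger_time - before, trigger_time + after
    return [f for f in frames if start <= f.timestamp <= end]


# Assumed sample data: one frame per second for 21 seconds, trigger at t = 10 s.
frames = [Frame(float(t), f"img{t}") for t in range(21)]
extracted = extract_window(frames, trigger_time=10.0)
```

With the 5-second defaults taken from the example values in the text, the window around a trigger at 10 s covers the frames captured from 5 s to 15 s inclusive.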

[0063] In this embodiment, the trigger acquisition time is a moment when the occupant's condition changes. Specifically, this is the moment when the change in the volume of the occupant's voice reaches or exceeds a predetermined threshold V (hereinafter referred to as "threshold V"), or the moment when the occupant leaves the seat. The moment when the occupant's condition changes may also be the moment when the change in the occupant's gaze reaches a predetermined threshold, the moment when the occupant's facial expression changes, and so on.

[0064] In this embodiment, the trigger acquisition time may also be the moment when the occupant utters a sound including a predetermined statement. In this embodiment, the predetermined statement is at least one of the following: a statement expressing interest in the outside world, such as "What's outside?", or a statement expressing feelings about the scenery, such as "What a beautiful sea!". Additionally, in this embodiment, the trigger acquisition time may be the moment when the occupant gives an instruction to take a picture of the vehicle exterior.

[0065] In this embodiment, the trigger acquisition time may also be the time when the vehicle 12 passes through a predetermined location. In this embodiment, the predetermined location is a location captured in images posted to the SNS by a predetermined number or more of accounts, as acquired by the acquisition unit 200, or a location where a predetermined number of people P (hereinafter referred to as "number of people P") or more are within the detection range of the human sensor 26. The predetermined location may also be the destination that the occupant set in the vehicle navigation device installed in the vehicle 12.
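
One way to decide whether the vehicle passed a predetermined location is to compare GPS fixes against the location's coordinates. The sketch below is an assumption-laden illustration: the disclosure does not state how proximity is judged, so the haversine great-circle distance, the 100 m radius, and the names `haversine_m` and `passed_location` are all hypothetical choices made for this example.

```python
import math
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def passed_location(track: Iterable[Tuple[float, float, float]],
                    spot: Tuple[float, float],
                    radius_m: float = 100.0) -> Optional[float]:
    """Return the first timestamp at which a (t, lat, lon) fix lies within radius_m of spot."""
    for t, lat, lon in track:
        if haversine_m(lat, lon, spot[0], spot[1]) <= radius_m:
            return t
    return None


# Assumed sample track: the vehicle drives north toward the spot.
track = [(0.0, 35.0000, 139.0), (1.0, 35.0005, 139.0), (2.0, 35.0010, 139.0)]
trigger_time = passed_location(track, spot=(35.0010, 139.0))
```

The returned timestamp could then serve as a trigger acquisition time; the radius would in practice need tuning to the GPS accuracy and the vehicle speed.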

[0066] The generation unit 220 uses a dataset consisting of a combination of submitted images pre-acquired by the acquisition unit 200 and scores reflecting the occupants' preferences as teaching data. It then generates a score calculation formula for each type using machine learning methods such as neural networks and stores the generated score calculation formula in the storage device 20D. Furthermore, the generation unit 220 can also use a dataset consisting of a submitted image and the number of positive evaluations of that submitted image as teaching data.

[0067] The inference unit 230 has the function of inferring a desired image, which is the most desired image for the vehicle occupants, from multiple extracted images extracted by the extraction unit 210. Furthermore, the desired image includes not only still images but also animations. In this embodiment, the inference unit 230 infers the desired image based on the results of machine learning applied to the submitted images acquired by the acquisition unit 200. Specifically, the inference unit 230 determines the type of extracted image by inputting the extracted image into a convolutional neural network model (hereinafter referred to as a CNN model). Furthermore, the inference unit 230 selects a score calculation formula associated with the determined type from the storage device 20D. The inference unit 230 inputs the extracted image into the selected score calculation formula and calculates an evaluation score for each extracted image. Finally, the inference unit 230 infers the image with the highest evaluation score among the extracted images as the desired image.
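
The selection procedure in [0067] (classify each extracted image, look up the score calculation formula for its type, score it, keep the highest) can be sketched as below. The classifier and formulas are passed in as plain callables; `toy_classify`, `toy_formulas`, and the two-number "images" are hypothetical stand-ins for the CNN model and the stored formulas, not the disclosed implementation.

```python
from typing import Callable, Dict, Sequence, Tuple

Image = Sequence[float]  # hypothetical stand-in for pixel data or a feature vector


def infer_desired_image(images: Sequence[Image],
                        classify: Callable[[Image], str],
                        formulas: Dict[str, Callable[[Image], float]]) -> Tuple[int, float]:
    """Return (index, score) of the extracted image with the highest evaluation score."""
    if not images:
        raise ValueError("no extracted images to choose from")
    best_index, best_score = -1, float("-inf")
    for i, img in enumerate(images):
        kind = classify(img)          # determine the type of the extracted image
        score = formulas[kind](img)   # apply the formula associated with that type
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Toy stand-ins (assumptions): first value decides the type, second drives the score.
def toy_classify(img: Image) -> str:
    return "sea" if img[0] > 0.5 else "person"


toy_formulas = {
    "sea": lambda img: 10 * img[1],
    "person": lambda img: 5 * img[1],
}

extracted = [(0.9, 0.25), (0.1, 0.9), (0.8, 0.5)]
best = infer_desired_image(extracted, toy_classify, toy_formulas)
```

Because each type has its own formula, scores are only comparable across types if the formulas are calibrated to a common scale; the sketch simply assumes they are.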

[0068] Furthermore, the inference unit 230 can also infer the desired image based on the results of machine learning on images posted by accounts that are connected to the occupant on SNS. In other words, the inference unit 230 can also infer the desired image based on the results of machine learning on images posted by at least one of the accounts that the occupant follows on SNS (hereinafter referred to as "followers") and the accounts that follow the occupant on SNS (hereinafter referred to as "fans"). Specifically, the acquisition unit 200 acquires images posted by at least one of the followers and fans, and the number of positive evaluations of those images. The generation unit 220 then uses the posted images and the numbers of positive evaluations acquired by the acquisition unit 200 as teaching data, and generates a score calculation formula for each type through machine learning such as a neural network. The inference unit 230 then inputs the extracted images into the score calculation formula associated with the determined type, and infers the image with the highest evaluation score as the desired image.

[0069] Furthermore, the inference unit 230 can also infer the desired image based on the results of machine learning on images posted by accounts with a predetermined number of followers (hereinafter referred to as "number of followers") or more on SNS. Specifically, the acquisition unit 200 acquires images posted by accounts with the predetermined number of followers or more, and the number of positive evaluations of the images posted by those accounts. The generation unit 220 then uses the posted images and the numbers of positive evaluations acquired by the acquisition unit 200 as teaching data, and generates a score calculation formula for each type using machine learning such as a neural network. The inference unit 230 then inputs the extracted images into the score calculation formula associated with the determined type, and infers the image with the highest evaluation score as the desired image.

[0070] Furthermore, even without teaching data, the inference unit 230 can infer the desired image from multiple extracted images extracted by the extraction unit 210 through reinforcement learning. Alternatively, the inference unit 230 can infer the desired image without using machine learning. For example, the inference unit 230 can infer the image that is most similar to the most recent submission image or the submission image that received the most positive reviews as the desired image.

[0071] The output unit 240 has the function of outputting the desired image inferred by the inference unit 230. In this embodiment, the output unit 240 outputs the desired image to the smartphone 50. However, it is not limited to this example. For example, the output unit 240 may also output the desired image to a display on the dashboard of the vehicle 12.

[0072] (Processing flow)

[0073] Next, the processing flow of the image extraction system 10 in this embodiment will be described.

[0074] First, the trigger decision processing executed by the CPU 20A reading the control program 250 in the control device 20 will be explained using Figure 4.

[0075] First, in step S100, CPU 20A acquires images of the outside of the vehicle captured by the exterior camera 22. Then, CPU 20A proceeds to step S101.

[0076] In step S101, CPU 20A remains in standby mode until a predetermined time (e.g., 5 minutes) has elapsed. In other words, CPU 20A sleeps for a predetermined time. If the predetermined time has elapsed, CPU 20A proceeds to step S102.

[0077] In step S102, the CPU 20A determines whether the occupant instructed the exterior camera 22 to take pictures during the period when the exterior camera 22 was capturing the images acquired in step S100. Specifically, the CPU 20A determines whether input instructing the exterior camera 22 to take pictures was received during that period via a switch or similar device provided in the vehicle 12. If it is determined that the occupant gave such an instruction during that period (step S102: Yes), the CPU 20A proceeds to step S108. On the other hand, if it is determined that the occupant did not give such an instruction during that period (step S102: No), the CPU 20A proceeds to step S103.

[0078] In step S103, CPU 20A determines whether the change in the volume of the sound emitted by the occupant reached or exceeded the threshold V during the period when the exterior camera 22 was capturing the images acquired in step S100. Specifically, CPU 20A determines whether the change in the sound received via microphone 23 during that period reached or exceeded the threshold V. If it is determined that the change reached or exceeded the threshold V during that period (step S103: Yes), CPU 20A proceeds to step S108. On the other hand, if it is determined that the change did not reach the threshold V during that period (step S103: No), CPU 20A proceeds to step S104.

[0079] In step S104, the CPU 20A determines whether the seating sensor 25 switched from on to off during the period when the exterior camera 22 was capturing the images acquired in step S100. In other words, the CPU 20A determines whether the occupant left the seat during that period. If it is determined that the seating sensor 25 switched from on to off during that period (step S104: Yes), the CPU 20A proceeds to step S108. On the other hand, if it is determined that the seating sensor 25 did not switch from on to off during that period (step S104: No), the CPU 20A proceeds to step S105.

[0080] In step S105, CPU 20A determines whether sound including a predetermined statement was acquired during the period when the exterior camera 22 was capturing the images acquired in step S100. Specifically, CPU 20A determines whether the sound acquired via microphone 23 during that period includes at least one of a statement expressing interest in the outside world and a statement expressing feelings about the scenery. If it is determined that sound including a predetermined statement was acquired during that period (step S105: Yes), CPU 20A proceeds to step S108. On the other hand, if it is determined that no such sound was acquired during that period (step S105: No), CPU 20A proceeds to step S106.

[0081] In step S106, the CPU 20A determines whether, during the period when the exterior camera 22 was capturing the images acquired in step S100, the vehicle 12 passed through a location where the number of people P or more were within a predetermined range from the vehicle 12 (i.e., within the range that the human sensor 26 can detect). In other words, the CPU 20A determines whether the change in the value of the human sensor 26 during that period reached or exceeded a predetermined threshold S. If it is determined that the vehicle 12 passed through such a location during that period (step S106: Yes), the CPU 20A proceeds to step S108. On the other hand, if it is determined that the vehicle 12 did not pass through such a location during that period (step S106: No), the CPU 20A proceeds to step S107.

[0082] In step S107, CPU 20A determines whether the vehicle 12 passed through a predetermined location during the period when the exterior camera 22 was capturing the images acquired in step S100. Specifically, CPU 20A determines whether the vehicle 12 passed through a location captured in images posted to the SNS by a predetermined number or more of accounts. If it is determined that the vehicle 12 passed through such a location during that period (step S107: Yes), CPU 20A proceeds to step S108. On the other hand, if it is determined that the vehicle 12 did not pass through such a location during that period (step S107: No), CPU 20A proceeds to step S109.

[0083] In step S108, CPU 20A acquires the trigger acquisition time. In other words, CPU 20A acquires the time at which the condition of whichever step from S102 to S107 produced an affirmative determination was satisfied. For example, if CPU 20A made an affirmative determination in step S103, then in step S108 CPU 20A acquires the time when the change in the volume of the sound emitted by the occupant reached or exceeded the threshold V. Then, CPU 20A proceeds to step S109.

[0084] In step S109, CPU 20A determines whether the processing from steps S102 to S107 has been performed on all of the vehicle-exterior images acquired in step S100. If it is determined that the processing has been performed on all of those images (step S109: Yes), CPU 20A proceeds to step S110. On the other hand, if it is determined that the processing has not been performed on all of those images (step S109: No), CPU 20A returns to step S102.

[0085] In step S110, CPU 20A stores all the acquired trigger acquisition times in the waiting queue Q and ends the current trigger decision process.

[0086] In the above process, the CPU 20A determines the trigger acquisition time through the trigger decision process.
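
The cascade of steps S102 to S110 can be sketched as an ordered list of checks evaluated for each captured image, where the first affirmative check records a trigger acquisition time. This is a simplified illustration: the `Sample` record and its fields are hypothetical per-image sensor snapshots, the threshold values are made up, and the people-count comparison uses "P or more" as in [0065].

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable


@dataclass(frozen=True)
class Sample:
    """Hypothetical sensor snapshot aligned with one captured image."""
    timestamp: float
    switch_pressed: bool = False    # S102: explicit shooting instruction
    volume_change: float = 0.0      # S103: change in occupant voice volume
    left_seat: bool = False         # S104: seating sensor went from on to off
    keyword_heard: bool = False     # S105: predetermined statement detected
    people_nearby: int = 0          # S106: people within human-sensor range
    at_popular_spot: bool = False   # S107: location posted by many SNS accounts


def decide_triggers(samples: Iterable[Sample], threshold_v: float,
                    people_p: int) -> Deque[float]:
    """Run the S102-S107 checks in order per sample; queue the time of each hit (S108, S110)."""
    checks = [
        lambda s: s.switch_pressed,
        lambda s: s.volume_change >= threshold_v,
        lambda s: s.left_seat,
        lambda s: s.keyword_heard,
        lambda s: s.people_nearby >= people_p,
        lambda s: s.at_popular_spot,
    ]
    queue_q: Deque[float] = deque()
    for s in samples:  # the loop over all images corresponds to S109
        if any(check(s) for check in checks):  # any() stops at the first affirmative check
            queue_q.append(s.timestamp)
    return queue_q


# Assumed sample data with made-up thresholds.
samples = [
    Sample(0.0),
    Sample(1.0, volume_change=12.0),
    Sample(2.0, people_nearby=3),
    Sample(3.0, switch_pressed=True, keyword_heard=True),
    Sample(4.0, people_nearby=5),
]
queue_q = decide_triggers(samples, threshold_v=10.0, people_p=5)
```

A sample that satisfies several conditions at once, like the one at 3.0 s, still contributes a single trigger acquisition time, which mirrors the flowchart jumping to S108 on the first "Yes".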

[0087] Next, the generation process executed by CPU 20A of the control device 20 reading the control program 250 will be described with reference to Figure 5.

[0088] First, in step S200, CPU 20A acquires the submitted images from the SNS server 40. Then, CPU 20A proceeds to step S201.

[0089] In step S201, CPU 20A uses the dataset consisting of the combination of the submitted images obtained in step S200 and the scores of the occupants' preferences as teaching data, and generates a score calculation formula for each type through machine learning such as a neural network. Then, CPU 20A proceeds to step S202.

[0090] In step S202, CPU 20A stores the score calculation formulas generated in step S201 in storage device 20D for each type, and ends the generation process.

[0091] In the above-described generation process, CPU 20A generates a score calculation formula for each type.
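As an illustration of steps S201 and S202, the sketch below groups the teaching data by type and fits one score calculation formula per type. The text mentions machine learning such as a neural network; here a much simpler linear model fitted by gradient descent is substituted purely to keep the example short. The feature vectors, the type labels, and the names `fit_linear` and `generate_formulas` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, Sequence[float], float]   # (type, feature vector, preference score)
Formula = Callable[[Sequence[float]], float]


def fit_linear(xs: List[Sequence[float]], ys: List[float],
               epochs: int = 2000, lr: float = 0.1) -> Formula:
    """Fit score = w . x + b by batch gradient descent on the squared error.
    A stand-in for the neural network mentioned in the text."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def generate_formulas(samples: List[Sample]) -> Dict[str, Formula]:
    """Group the teaching data by type and fit one formula per type (S201);
    the returned dict plays the role of the per-type storage in S202."""
    by_type = defaultdict(list)
    for image_type, features, score in samples:
        by_type[image_type].append((features, score))
    return {t: fit_linear([f for f, _ in rows], [s for _, s in rows])
            for t, rows in by_type.items()}


# Made-up teaching data: two features per image and a preference score.
samples = [
    ("landscape", [0.0, 0.0], 1.0), ("landscape", [1.0, 0.0], 3.0),
    ("landscape", [0.0, 1.0], 0.0), ("landscape", [1.0, 1.0], 2.0),
    ("food", [0.0, 0.0], 0.0), ("food", [1.0, 0.0], 1.0),
    ("food", [0.0, 1.0], 1.0), ("food", [1.0, 1.0], 2.0),
]
formulas = generate_formulas(samples)
print(round(formulas["landscape"]([1.0, 0.0]), 3))
print(round(formulas["food"]([1.0, 1.0]), 3))
```

Because each type receives its own formula, the same feature vector can score differently depending on the type it is associated with.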

[0092] Next, the output process executed by CPU 20A of the control device 20 reading the control program 250 will be described with reference to Figure 6.

[0093] First, in step S300, CPU 20A determines whether a trigger acquisition time is stored in the waiting queue Q. If it is determined that a trigger acquisition time is stored in the waiting queue Q (step S300: Yes), CPU 20A proceeds to step S301. On the other hand, if it is determined that no trigger acquisition time is stored in the waiting queue Q (step S300: No), CPU 20A ends this output process.

[0094] In step S301, CPU 20A obtains the trigger acquisition time from the waiting queue Q. Then, CPU 20A proceeds to step S302.

[0095] In step S302, CPU 20A extracts consecutive images. Specifically, CPU 20A extracts all images captured from a predetermined time before the trigger acquisition time obtained in step S301 to a predetermined time after that trigger acquisition time. Then, CPU 20A proceeds to step S303.
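The extraction in step S302 amounts to selecting the images whose capture time falls inside a window around the trigger acquisition time. The following is a minimal sketch under the assumption that each image is stored with its capture time. The name `extract_window` and the separate `before` and `after` parameters are hypothetical; the text only refers to a predetermined time.

```python
from typing import List, Tuple

TimedImage = Tuple[float, str]  # (capture time in seconds, image identifier)


def extract_window(images: List[TimedImage], trigger_time: float,
                   before: float, after: float) -> List[TimedImage]:
    """Return every image captured in [trigger_time - before, trigger_time + after]."""
    start, end = trigger_time - before, trigger_time + after
    return [img for img in images if start <= img[0] <= end]


images = [(t * 0.5, f"frame_{t}") for t in range(10)]   # one image every 0.5 s
window = extract_window(images, trigger_time=2.0, before=1.0, after=1.0)
print([name for _, name in window])
# prints ['frame_2', 'frame_3', 'frame_4', 'frame_5', 'frame_6']
```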

[0096] In step S303, CPU 20A classifies the extracted images. Specifically, CPU 20A determines the type of extracted image by inputting the extracted image into the CNN model. Then, CPU 20A proceeds to step S304.

[0097] In step S304, CPU 20A selects from storage device 20D the score calculation formula associated with the type determined in step S303. Then, CPU 20A proceeds to step S305.

[0098] In step S305, CPU 20A inputs the extracted images into the score calculation formula selected in step S304, and calculates the evaluation score for each extracted image. Then, CPU 20A proceeds to step S306.

[0099] In step S306, CPU 20A infers the image with the highest evaluation score from the extracted images as the desired image. Then, CPU 20A proceeds to step S307.

[0100] In step S307, CPU 20A outputs the desired image inferred in step S306 to smartphone 50. Then, CPU 20A returns to step S300.

[0101] In the output process described above, the image with the highest evaluation score among the extracted images is output as the desired image.
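Steps S303 to S306 can be summarized as classify, select, score, and take the maximum. The sketch below assumes that each extracted image is already reduced to a small feature vector, and it uses a trivial rule in place of the CNN model. The function `infer_desired_image`, the type names, and the two example formulas are hypothetical.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Formula = Callable[[Features], float]


def infer_desired_image(extracted: Dict[str, Features],
                        classify: Callable[[Features], str],
                        formulas: Dict[str, Formula]) -> str:
    """Classify each extracted image (S303), select the formula for its type
    (S304), score it (S305), and return the highest-scoring image (S306)."""
    scores = {}
    for name, features in extracted.items():
        image_type = classify(features)
        scores[name] = formulas[image_type](features)
    return max(scores, key=scores.get)


def classify(features: Features) -> str:
    """Placeholder for the CNN model: a threshold on the first feature."""
    return "sky" if features[0] > 0.5 else "street"


formulas = {
    "sky": lambda f: 10.0 * f[1],
    "street": lambda f: 5.0 * f[1] + 1.0,
}
extracted = {
    "frame_a": (0.9, 0.3),    # sky, score 3.0
    "frame_b": (0.2, 0.8),    # street, score 5.0
    "frame_c": (0.7, 0.45),   # sky, score 4.5
}
desired = infer_desired_image(extracted, classify, formulas)
print(desired)
# prints frame_b
```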

[0102] (Summary of the first embodiment)

[0103] In summary, because the image extraction system 10 according to this embodiment infers the desired image from a plurality of extracted images, it is more likely to output the image most desired by the occupant than when only one image is captured.

[0104] Here, the control device 20 in this embodiment has a function of extracting, as the extracted images, images that include an image captured at a time point when the change in the volume of sound emitted by the occupant exceeds the threshold V, or when the occupant leaves the seat. Therefore, the extracted images can be extracted even when the occupant cannot give a physical instruction to capture an image, such as pressing a switch.

[0105] Furthermore, the control device 20 in this embodiment has a function of extracting, as the extracted images, images that include an image captured at a time point when the occupant utters speech including at least one of a statement expressing interest in the outside of the vehicle or a statement expressing feelings about the scenery. Therefore, the extracted images can be extracted even when the occupant cannot give a physical instruction to capture an image, such as pressing a switch.

[0106] Furthermore, the control device 20 in this embodiment has a function of extracting, as the extracted images, images that include an image captured at a time point when the vehicle 12 passes through a location from which a predetermined number or more of accounts have posted images to an SNS, or a location where P or more people are within the detection range of the human sensor 26. Therefore, the extracted images can be extracted even when the occupant cannot give a physical instruction to capture an image, such as pressing a switch.

[0107] Furthermore, the control device 20 in this embodiment has a function of inferring the desired image based on the results of machine learning on images submitted by the occupant to an SNS. Therefore, the desired image can be inferred from the plurality of extracted images in line with the images the occupant has submitted to the SNS.

[0108] [Second Embodiment]

[0109] In the first embodiment, CPU 20A infers the desired image based on the results of machine learning on images submitted by occupants to SNS. In the second embodiment, the desired image is inferred based on the results of machine learning on the occupants' activity history. Furthermore, in the first embodiment, CPU 20A infers the desired image and outputs it. In the second embodiment, a device different from the control device 20 infers the desired image and outputs it. The differences from the first embodiment will be explained below.

[0110] (Structure)

[0111] Figure 7 is a block diagram illustrating a schematic structure of the image extraction system 10 according to this embodiment.

[0112] The schematic structure of the image extraction system 10 shown in the block diagram of Figure 7 differs from that shown in the block diagram of Figure 1 in that the image extraction system 10 has an inference device 60 instead of the SNS server 40.

[0113] The inference device 60 has the function of inferring a desired image and outputting the desired image.

[0114] (Processing flow)

[0115] Next, the processing flow in the image extraction system 10 of this embodiment will be described. Furthermore, the trigger determination process is the same as in the first embodiment, so its description will be omitted.

[0116] First, the generation process executed by CPU 20A of the control device 20 reading the control program 250 will be described with reference to Figure 8.

[0117] First, in step S400, CPU 20A retrieves the browsing history of web pages visited by the occupant using a web (World Wide Web) browser from smartphone 50. In this embodiment, the browsing history includes information such as the URL (Uniform Resource Locator), title, and browsing start time. Then, CPU 20A proceeds to step S401.

[0118] In step S401, CPU 20A uses a dataset consisting of combinations of the web page browsing history obtained in step S400 and scores of the occupant's preferences as teaching data, and generates a score calculation formula for each type through machine learning such as a neural network. Then, CPU 20A proceeds to step S402.

[0119] In step S402, CPU 20A stores the score calculation formulas generated in step S401 in storage device 20D for each type, and ends the generation process.

[0120] In the generation process described above, CPU 20A uses combinations of the occupant's web browsing history and the occupant's preference scores as teaching data, and generates a score calculation formula for each type.

[0121] Furthermore, in this embodiment, the occupant's web browsing history is used as the occupant's activity history. However, it is not limited to this example. For example, the vehicle 12's driving history can also be used as the occupant's activity history.

[0122] Next, the extraction process executed by CPU 20A of the control device 20 reading the control program 250 will be described with reference to Figure 9.

[0123] The processing from step S500 to step S502 in the extraction process is the same as the processing from step S300 to step S302 in the output process of the first embodiment, so the description is omitted.

[0124] In step S503 of Figure 9, CPU 20A sends the extracted images obtained in step S502 to the inference device 60. Then, CPU 20A returns to step S500.

[0125] Next, the inference process executed by CPU 60A of the inference device 60 reading a program will be described with reference to Figure 10.

[0126] In step S600, CPU 60A receives the extracted image from control device 20. Then, CPU 60A proceeds to step S601.

[0127] In step S601, CPU 60A classifies the extracted images received from control device 20. Specifically, CPU 60A determines the type of extracted image by inputting the extracted image into the CNN model. Then, CPU 60A proceeds to step S602.

[0128] In step S602, CPU 60A receives from control device 20 the score calculation formula associated with the type determined in step S601. Then, CPU 60A proceeds to step S603.

[0129] In step S603, CPU 60A inputs the extracted images to the score calculation formula received in step S602, and calculates an evaluation score for each extracted image. Then, CPU 60A proceeds to step S604.

[0130] In step S604, CPU 60A infers the image with the highest evaluation score from the extracted images as the desired image. Then, CPU 60A proceeds to step S605.

[0131] In step S605, CPU 60A outputs the desired image inferred in step S604 to smartphone 50 and ends the inference process.

[0132] In the inference process described above, the image with the highest evaluation score among the extracted images is output as the desired image.
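The division of roles in the second embodiment, in which the control device 20 sends the extracted images and supplies the score calculation formula while the inference device 60 classifies, scores, and outputs, can be sketched as two cooperating objects. This is only an illustration: direct method calls stand in for the communication between the devices, each image is reduced to a single feature value, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Image = Tuple[str, float]          # (identifier, single feature value), hypothetical
Formula = Callable[[float], float]


class ControlDevice:
    """Stands in for control device 20: stores the formulas and sends images."""

    def __init__(self, formulas: Dict[str, Formula]) -> None:
        self._formulas = formulas

    def formula_for(self, image_type: str) -> Formula:
        return self._formulas[image_type]              # supplied in S602

    def send(self, images: List[Image], inference: "InferenceDevice") -> str:
        return inference.infer(images, self)           # S503


class InferenceDevice:
    """Stands in for inference device 60: classifies, scores, and outputs."""

    def __init__(self, classify: Callable[[float], str]) -> None:
        self._classify = classify                      # placeholder for the CNN model

    def infer(self, images: List[Image], control: ControlDevice) -> str:
        best_name, best_score = "", float("-inf")
        for name, feature in images:                   # S600: received images
            image_type = self._classify(feature)       # S601
            formula = control.formula_for(image_type)  # S602
            score = formula(feature)                   # S603
            if score > best_score:                     # S604
                best_name, best_score = name, score
        return best_name                               # S605: output to the smartphone


control = ControlDevice({"near": lambda x: x * 2.0, "far": lambda x: x + 3.0})
inference = InferenceDevice(lambda x: "near" if x < 1.0 else "far")
desired = control.send([("a", 0.5), ("b", 2.0), ("c", 0.9)], inference)
print(desired)
# prints b
```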

[0133] Furthermore, in this embodiment, the inference device 60 and the smartphone 50 are different devices. However, it is not limited to this example. For example, the inference device 60 and the smartphone 50 could also be the same device. In other words, the smartphone 50 could also infer a desired image and display that desired image on the liquid crystal display 50A.

[0134] In this embodiment, CPU 60A of the inference device 60 inputs the extracted images into the score calculation formula generated by CPU 20A of the control device 20 and calculates the evaluation scores. However, it is not limited to this example. CPU 60A of the inference device 60 may also input the extracted images into a score calculation formula generated by CPU 60A itself and calculate the evaluation scores. In this case, the calculation formula data 260 is stored not in the storage device 20D of the control device 20 but in the storage device of the inference device 60.

[0135] [Remark]

[0136] In the image extraction system 10 described above, the control device 20 is integrated into the vehicle 12. However, the control device 20 can also be located outside the vehicle 12. In this case, the control device 20 acquires images of the outside of the vehicle from the external camera 22 via the network N1.

[0137] Furthermore, the trigger determination process, generation process, output process, and extraction process that CPU 20A executes in the above embodiments by reading software (a program) may also be executed by various processors other than a CPU. Examples of such processors include a PLD (Programmable Logic Device) whose circuit structure can be changed after manufacturing, such as an FPGA (Field-Programmable Gate Array), and a dedicated electronic circuit, which is a processor with a circuit structure designed specifically to execute particular processing, such as an ASIC (Application Specific Integrated Circuit). In addition, the trigger determination process, generation process, output process, and extraction process may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (e.g., a plurality of FPGAs, or a combination of a CPU and an FPGA). More specifically, the hardware structure of these various processors is an electronic circuit combining circuit elements such as semiconductor elements.

[0138] Furthermore, although the above embodiments describe the control program 250 as being stored (installed) in advance in the storage device 20D, the disclosure is not limited to this. The program may also be provided in a form stored on a storage medium such as a CD-ROM (Compact Disc Read Only Memory), DVD-ROM (Digital Versatile Disc Read Only Memory), or USB (Universal Serial Bus) memory. Alternatively, the program may be downloaded from an external device via a network.

[0139] The processing flows described in the above embodiments are also examples. Unnecessary steps may be deleted, new steps may be added, and the processing order may be changed without departing from the gist of the disclosure.

[0140] Furthermore, the structures of the vehicle 12, control device 20, SNS server 40, smartphone 50, and inference device 60 described in the above embodiments are examples, and may be changed according to the situation without departing from the gist of the disclosure.

Claims

1. An image extraction device, comprising: an acquisition unit that acquires an image of the exterior of a vehicle captured by an imaging device mounted on the vehicle; an extraction unit that extracts, from the image, a plurality of extracted images including an image captured at a time point that satisfies a predetermined condition; an inference unit that infers, from the plurality of extracted images, a desired image that is the image most desired by an occupant of the vehicle; and an output unit that outputs the desired image, wherein the inference unit is configured to: input the extracted images into a convolutional neural network model to determine the type of each of the plurality of extracted images; select a score calculation formula associated with the type of each of the plurality of extracted images; calculate an evaluation score for each of the plurality of extracted images by inputting the extracted images into the score calculation formula; and infer the image with the highest evaluation score among the plurality of extracted images as the desired image.

2. The image extraction device according to claim 1, wherein the extraction unit extracts, as the extracted images, images including an image captured at a time point when the state of the occupant changes.

3. The image extraction device according to claim 1, wherein the extraction unit extracts, as the extracted images, images including an image captured at a time point when the occupant utters a sound containing predetermined speech.

4. The image extraction device according to claim 1, wherein the extraction unit extracts, as the extracted images, images including an image captured at a time point when the vehicle passes a predetermined location.

5. The image extraction device according to any one of claims 1 to 4, wherein the inference unit infers the desired image based on the results of machine learning on images submitted to a social networking service.

6. The image extraction device according to claim 5, wherein the inference unit infers the desired image based on the results of machine learning on images submitted by the occupant to the social networking service.

7. The image extraction device according to any one of claims 1 to 4, wherein the inference unit infers, as the desired image, the image most similar to the image that received the most positive evaluations among the images submitted by the occupant to a social networking service.

8. The image extraction device according to any one of claims 1 to 4, wherein the inference unit infers the desired image based on the results of machine learning on the occupant's activity history.

9. The image extraction device according to claim 8, wherein the inference unit infers the desired image based on the results of machine learning on the occupant's web browsing history.

10. The image extraction device according to claim 8, wherein the inference unit infers the desired image based on the results of machine learning on the vehicle's driving history.

11. A vehicle, comprising: the image extraction device according to any one of claims 1 to 10; and an imaging device.

12. An image extraction system comprising the image extraction device according to any one of claims 1 to 10 and a mobile terminal, wherein the mobile terminal has a display device that displays the desired image.

13. An image extraction method in which a computer performs processing comprising: acquiring an image of the exterior of a vehicle captured by an imaging device mounted on the vehicle; extracting, from the image, a plurality of extracted images including an image captured at a time point that satisfies a predetermined condition; inputting the extracted images into a convolutional neural network model to determine the type of each of the plurality of extracted images; selecting a score calculation formula associated with the type of each of the plurality of extracted images; calculating an evaluation score for each of the plurality of extracted images by inputting the extracted images into the score calculation formula; inferring the image with the highest evaluation score among the plurality of extracted images as a desired image that is the image most desired by an occupant of the vehicle; and outputting the desired image.

14. A non-transitory storage medium storing an image extraction program, wherein the image extraction program causes a computer to perform processing comprising: acquiring an image of the exterior of a vehicle captured by an imaging device mounted on the vehicle; extracting, from the image, a plurality of extracted images including an image captured at a time point that satisfies a predetermined condition; inputting the extracted images into a convolutional neural network model to determine the type of each of the plurality of extracted images; selecting a score calculation formula associated with the type of each of the plurality of extracted images; calculating an evaluation score for each of the plurality of extracted images by inputting the extracted images into the score calculation formula; inferring the image with the highest evaluation score among the plurality of extracted images as a desired image that is the image most desired by an occupant of the vehicle; and outputting the desired image.