Information processing apparatus, mobile body, control method therefor, and storage medium

By extracting feature quantities from images captured around the user, calculating the impurity of each, and generating efficient questions, the invention avoids the excessive questioning of existing techniques and estimates the target object efficiently.

CN116778216B · Active · Publication Date: 2026-06-23 · HONDA MOTOR CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HONDA MOTOR CO LTD
Filing Date: 2023-02-07
Publication Date: 2026-06-23

Abstract figure: CN116778216B_ABST

Abstract

An information processing apparatus, a mobile body, control methods therefor, and a storage medium are provided that estimate a target object by generating efficient questions using feature quantities obtained through image recognition. The information processing apparatus acquires a captured image, detects a plurality of objects in the captured image, and extracts a plurality of feature quantities for each detected object. It then calculates an impurity for each extracted feature quantity; the impurity indicates the degree to which a predetermined object cannot be separated from the plurality of objects when the user is asked a question, based on that feature quantity, for estimating the predetermined object from the plurality of objects. Finally, based on the extracted feature quantities and the impurity of each, the apparatus generates questions so as to minimize the impurity with as few questions as possible.

Description

Technical Field

[0001] The present invention relates to an information processing apparatus, a mobile body, control methods therefor, and a storage medium.

Background Art

[0002] In recent years, small mobile bodies have emerged, such as electric vehicles with a passenger capacity of about one or two people (also known as micromobility) and mobile conversational robots that provide various services to people. These mobile bodies provide their services by identifying which object among a set of landmarks, such as people and buildings, is the target object (hereinafter, the target). To identify the user as the target, the mobile body converses with the user to narrow down the candidates.

[0003] Regarding questioning of users, Patent Document 1 proposes a technique in which the user is asked multiple questions through dialogue and candidate classification results are narrowed down based on the user's answers; a decision tree is generated that reduces the number of questions asked of the user even when some of the user's answers are incorrect.

[0004] Prior Art Documents

[0005] Patent Literature

[0006] Patent Document 1: Japanese Patent Application Publication No. 2018-5624

Summary of the Invention

[0007] Problems to Be Solved by the Invention

[0008] However, the above prior art has the following problem. Although it reduces the number of questions asked of the user and accounts for possible errors in the user's answers when narrowing down classification and retrieval candidates, it narrows down candidates using only the answers to multiple questions and does not effectively use information other than those answers. In particular, when estimating a target user from among multiple people, feature data from images captured around the user is highly valuable information.

[0009] The present invention was made in view of the above problem, and its object is to estimate a target object by generating efficient questions that use feature quantities obtained through image recognition.

[0010] Means for Solving the Problems

[0011] According to the present invention, for example, an information processing apparatus is characterized by comprising: an acquisition mechanism that acquires a captured image; an extraction mechanism that detects a plurality of objects contained in the captured image and extracts a plurality of feature quantities for each of the detected objects; a calculation mechanism that calculates an impurity for each feature quantity extracted by the extraction mechanism, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects when a user is asked a question, based on that feature quantity, for estimating the predetermined object from the plurality of objects; and a generation mechanism that generates the questions, based on the feature quantities extracted by the extraction mechanism and the impurity of each feature quantity, so as to reduce the number of questions required to minimize the impurity.
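The text does not say which impurity measure the calculation mechanism uses. As a minimal sketch, assuming Gini impurity and assuming every detected object is equally likely to be the target, the impurity of a feature quantity can be taken as the size-weighted Gini impurity of the groups that the user's answer would leave. Under these assumptions it reduces to (N - G) / N, where N is the number of candidates and G is the number of distinct values of the feature. The feature names and values below are hypothetical.

```python
from collections import Counter


def group_gini(size: int) -> float:
    """Gini impurity of a group of `size` distinct, equally likely candidates."""
    # Each candidate is its own class with probability 1/size, so
    # 1 - sum(p^2) = 1 - size * (1/size)^2 = 1 - 1/size.
    return 1.0 - 1.0 / size


def feature_impurity(values: list[str]) -> float:
    """Expected impurity left after asking the user about one feature.

    values[i] is the recognized value of this feature for candidate i.
    The answer keeps only the candidates sharing the target's value, so the
    result is the Gini impurity of each value group weighted by its size.
    """
    n = len(values)
    return sum((size / n) * group_gini(size) for size in Counter(values).values())


# Hypothetical feature quantities for four detected pedestrians.
features = {
    "clothing_color": ["black", "black", "red", "white"],
    "bag_color": ["red", "red", "red", "blue"],
    "wears_mask": ["yes", "yes", "yes", "yes"],
}
for name, values in features.items():
    print(name, round(feature_impurity(values), 3))
```

Here `clothing_color` scores lowest (0.25), so it would be the most informative first question; a feature shared by every candidate, such as `wears_mask` in this example, scores (N - 1) / N and cannot separate anyone.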

[0012] Additionally, according to the present invention, for example, a mobile body is characterized by comprising: an acquisition mechanism that acquires a captured image; an extraction mechanism that detects a plurality of objects contained in the captured image and extracts a plurality of feature quantities for each of the detected objects; a calculation mechanism that calculates an impurity for each feature quantity extracted by the extraction mechanism, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects when a user is asked a question, based on that feature quantity, for estimating the predetermined object from the plurality of objects; and a generation mechanism that generates the questions, based on the feature quantities extracted by the extraction mechanism and the impurity of each feature quantity, so as to reduce the number of questions required to minimize the impurity.

[0013] Furthermore, according to the present invention, for example, a control method for an information processing apparatus is characterized by comprising: an acquisition step of acquiring an image; an extraction step of detecting a plurality of objects contained in the image and extracting a plurality of feature quantities for each of the detected objects; a calculation step of calculating an impurity for each feature quantity extracted in the extraction step, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects when a user is asked a question, based on that feature quantity, for estimating the predetermined object from the plurality of objects; and a generation step of generating the questions, based on the feature quantities extracted in the extraction step and the impurity of each feature quantity, so as to reduce the number of questions required to minimize the impurity.
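The four steps of the control method can be wired together roughly as follows. This is a minimal sketch, not the patented implementation: the `DetectedObject` type, the stub `acquire` and `extract` callables, the Gini-based impurity, and the question template are hypothetical stand-ins for the camera, the image recognition model, and the question generator.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class DetectedObject:
    object_id: str
    features: dict[str, str]  # feature name -> recognized value


def impurity(values: list[str]) -> float:
    # Size-weighted Gini impurity of the value groups (uniform prior over candidates).
    n = len(values)
    return sum((c / n) * (1.0 - 1.0 / c) for c in Counter(values).values())


def control_method(
    acquire: Callable[[], bytes],
    extract: Callable[[bytes], list[DetectedObject]],
) -> str:
    image = acquire()                           # acquisition step
    objects = extract(image)                    # extraction step
    names = sorted(objects[0].features)         # assumes all objects share feature names
    scores = {n: impurity([o.features[n] for o in objects]) for n in names}  # calculation step
    best = min(names, key=lambda n: scores[n])  # generation step: lowest impurity first
    return f"What is your {best.replace('_', ' ')}?"


def fake_extract(_image: bytes) -> list[DetectedObject]:
    # Stand-in for the image recognition model.
    return [
        DetectedObject("A", {"clothing_color": "black", "bag_color": "red"}),
        DetectedObject("B", {"clothing_color": "black", "bag_color": "blue"}),
        DetectedObject("C", {"clothing_color": "white", "bag_color": "green"}),
    ]


print(control_method(lambda: b"", fake_extract))  # What is your bag color?
```

Passing the acquisition and extraction steps in as callables keeps the sketch independent of any particular camera or recognition model, which mirrors how the same method is claimed for both a server and a mobile body.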

[0014] Furthermore, according to the present invention, a control method for a mobile body is characterized by comprising: an acquisition step of acquiring a captured image; an extraction step of detecting a plurality of objects contained in the captured image and extracting a plurality of feature quantities for each of the detected objects; a calculation step of calculating an impurity for each feature quantity extracted in the extraction step, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects when a user is asked a question, based on that feature quantity, for estimating the predetermined object from the plurality of objects; and a generation step of generating questions, based on the feature quantities extracted in the extraction step and the impurity of each feature quantity, so as to reduce the number of questions required to minimize the impurity.

[0015] Furthermore, according to the present invention, for example, a storage medium is characterized by storing a program for causing a computer of an information processing apparatus to function as: an acquisition mechanism that acquires a captured image; an extraction mechanism that detects a plurality of objects contained in the captured image and extracts a plurality of feature quantities for each of the detected objects; a calculation mechanism that calculates an impurity for each feature quantity extracted by the extraction mechanism, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects when a user is asked a question, based on that feature quantity, for estimating the predetermined object from the plurality of objects; and a generation mechanism that generates the questions, based on the feature quantities extracted by the extraction mechanism and the impurity of each feature quantity, so as to reduce the number of questions required to minimize the impurity.

[0016] Furthermore, according to the present invention, for example, a storage medium is characterized by storing a program for causing a computer of a mobile body to function as: an acquisition mechanism that acquires a captured image; an extraction mechanism that detects a plurality of objects contained in the captured image and extracts a plurality of feature quantities for each of the detected objects; a calculation mechanism that calculates an impurity for each feature quantity extracted by the extraction mechanism, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects when the user is asked a question, based on that feature quantity, for estimating the predetermined object from the plurality of objects; and a generation mechanism that generates the questions, based on the feature quantities extracted by the extraction mechanism and the impurity of each feature quantity, so as to reduce the number of questions required to minimize the impurity.

[0017] Effects of the Invention

[0018] According to the present invention, a target object can be estimated by generating efficient questions that use feature quantities obtained through image recognition.

Brief Description of the Drawings

[0019] Figure 1 is a diagram illustrating an example of a system according to an embodiment of the present invention.

[0020] Figure 2 is a block diagram illustrating an example of the hardware configuration of the mobile body according to this embodiment.

[0021] Figure 3 is a block diagram illustrating an example of the functional configuration of the mobile body according to this embodiment.

[0022] Figure 4 is a block diagram illustrating example configurations of the server and the communication device according to this embodiment.

[0023] Figure 5 is a diagram for explaining image acquisition according to this embodiment.

[0024] Figure 6 is a diagram for explaining image analysis according to this embodiment.

[0025] Figure 7 is a diagram for explaining question generation according to this embodiment.

[0026] Figure 8 is a diagram comparing the questions according to this embodiment with the questions in a comparative example.

[0027] Figure 9 is a flowchart illustrating a series of operations in the user estimation processing using speech and images according to this embodiment.

[0028] Figure 10 is a flowchart illustrating a series of operations in the user estimation processing (S106) using speech and captured images according to this embodiment.

[0029] Figure 11 is a flowchart illustrating a series of operations in the detailed processing of S206 according to this embodiment.

[0030] Figure 12 is a diagram illustrating an example of a system according to another embodiment.

[0031] Description of Reference Numerals

[0032] 100, 1210: Vehicle; 110: Server; 120: Communication device; 404: Control unit; 413: User data acquisition unit; 414: Voice information processing unit; 415: Image information processing unit; 416: Meeting point estimation unit; 417: User estimation unit.

Description of Embodiments

[0033] Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims, and not all combinations of features described in the embodiments are necessarily essential to the invention. Two or more of the features described in the embodiments may be combined as desired. Identical or similar components are given the same reference numerals, and repeated descriptions are omitted.

[0034] <System Structure>

[0035] The configuration of system 1 according to this embodiment will be described with reference to Figure 1. System 1 includes a vehicle (mobile body) 100, a server 110, and a communication device (communication terminal) 120. In this embodiment, the server 110 estimates the user 130 using the speech information of the user 130 and images captured around the vehicle 100, and brings the user 130 and the vehicle 100 together. The user interacts with the server 110 through a predetermined application running on the communication device 120, describes their location by speaking, and moves toward a designated meeting point (e.g., a nearby red mailbox). The server 110 estimates the user and the meeting point and controls the vehicle 100 to move toward the estimated meeting point. Each component is described in detail below.

[0036] Vehicle 100 is equipped with a battery and is, for example, an ultra-compact mobility vehicle that moves mainly using the power of a motor. An ultra-compact mobility vehicle is a vehicle that is more compact than a typical automobile and has a passenger capacity of about one or two people. In this embodiment, vehicle 100 is described as an example of an ultra-compact mobility vehicle, but this is not intended to limit the invention; for example, it could also be a four-wheeled vehicle or a straddle-type vehicle. Furthermore, the vehicle of the present invention is not limited to a means of transporting people; it may be a vehicle that carries goods while traveling alongside a walking person, or a vehicle that leads a person. The present invention is also not limited to four-wheeled or two-wheeled vehicles; it can be applied to autonomous walking robots and the like. That is, the present invention can be applied to mobile bodies such as these vehicles and walking robots, and vehicle 100 is one example of such a mobile body.

[0037] Vehicle 100 connects to network 140 via wireless communication such as Wi-Fi or 5G. Using various sensors, vehicle 100 can measure conditions inside and outside the vehicle (vehicle position, driving state, surrounding objects, etc.) and send the measured data to server 110. Data collected and transmitted in this way is generally called floating car data, probe data, traffic information, or the like. Vehicle-related information is sent to server 110 at regular intervals or when specific events occur. Vehicle 100 can drive autonomously even when user 130 is not on board. Vehicle 100 controls its own operation based on control commands and other information received from server 110, or on data it measures itself.
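The rule that vehicle-related information is sent at regular intervals or when specific events occur can be sketched as a small scheduler. This is an illustrative sketch only: the interval length, the event names, and the `ReportScheduler` class are hypothetical, since the text names none of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportScheduler:
    """Decides whether the vehicle should send its information to the server now."""

    interval_s: float = 5.0  # hypothetical reporting period in seconds
    trigger_events: frozenset = frozenset({"entered_target_area", "obstacle_detected"})
    last_sent: Optional[float] = None

    def should_send(self, now: float, event: Optional[str] = None) -> bool:
        # Periodic condition: nothing sent yet, or the period has elapsed.
        due = self.last_sent is None or now - self.last_sent >= self.interval_s
        # Event condition: a listed event occurred.
        triggered = event is not None and event in self.trigger_events
        if due or triggered:
            self.last_sent = now  # restart the period after any report
            return True
        return False


scheduler = ReportScheduler()
timeline = [(0.0, None), (2.0, None), (3.0, "obstacle_detected"), (7.0, None), (8.0, None)]
print([scheduler.should_send(t, e) for t, e in timeline])
# [True, False, True, False, True]
```

Restarting the period after an event-triggered report is a design choice made here to avoid sending a redundant periodic report immediately afterwards; an implementation could equally keep the two schedules independent.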

[0038] Server 110 is an example of an information processing apparatus and consists of one or more server devices. Via network 140, it can acquire vehicle-related information sent from vehicle 100, speech information sent from communication device 120, and location information, estimate user 130, and control the movement of vehicle 100. Movement control of vehicle 100 includes adjusting the meeting point between user 130 and vehicle 100.

[0039] The communication device 120 is, for example, a smartphone, but is not limited to this; it could also be a headset-type communication terminal, a personal computer, a tablet computer, a game console, etc. The communication device 120 is connected to the network 140 via wireless communication such as Wi-Fi or 5G.

[0040] Network 140 includes, for example, communication networks such as the Internet and mobile phone networks, and carries information between server 110, vehicle 100, and communication device 120. In system 1, once user 130 and vehicle 100, initially far apart, have approached each other closely enough to visually confirm objects serving as visual landmarks, the system uses the speech information and the image information captured by vehicle 100 to estimate the user and adjust the meeting point. In this embodiment, an example is described in which a camera mounted on vehicle 100 captures images of its surroundings, but a camera on vehicle 100 is not strictly required. For example, images captured by surveillance cameras already installed around vehicle 100 can be used instead, or both sources can be used. This allows images captured from a more suitable angle to be used when determining the user's position. For example, when the user describes their position relative to a landmark, analyzing images captured by a camera close to that landmark and to the predicted position allows the user requesting to meet the ultra-compact mobility vehicle to be identified more accurately.
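Choosing between the on-board camera and nearby surveillance cameras can be sketched as picking the camera closest to both the landmark the user mentioned and the user's predicted position. The cost function (sum of straight-line distances), the local metric coordinate frame, and the camera names are assumptions for illustration; the text only says that a camera close to the landmark and the predicted position gives better results.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    name: str
    x: float  # position in a hypothetical local metric frame (meters)
    y: float


def pick_camera(
    cameras: list[Camera],
    landmark: tuple[float, float],
    predicted_user: tuple[float, float],
) -> Camera:
    """Return the camera with the smallest summed distance to landmark and user."""

    def cost(cam: Camera) -> float:
        here = (cam.x, cam.y)
        return math.dist(here, landmark) + math.dist(here, predicted_user)

    return min(cameras, key=cost)


cameras = [
    Camera("vehicle_front", 0.0, 0.0),        # camera mounted on vehicle 100
    Camera("surveillance_cam_3", 48.0, 12.0),  # fixed camera near the landmark
]
print(pick_camera(cameras, landmark=(50.0, 10.0), predicted_user=(45.0, 10.0)).name)
# surveillance_cam_3
```

A fuller version would also account for field of view and occlusion, which plain distance ignores.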

[0041] Before user 130 and vehicle 100 are close enough to visually confirm landmarks, server 110 first moves vehicle 100 to an area containing the user's current or predicted location. Once vehicle 100 reaches the approximate area, server 110 sends voice messages to communication device 120, based on a pre-planned image of user 130, asking about visual landmarks and user-related information (e.g., "Are there any shops nearby?", "Is your clothing black?"). Locations associated with visual landmarks include, for example, place names included in map information. Here, visual landmarks are physical objects that the user can recognize visually, such as buildings, traffic lights, rivers, mountains, statues, and signs. Server 110 receives from communication device 120 speech information containing a location associated with a visual landmark (e.g., "There's a building with a xx coffee shop"). Server 110 then obtains the position of that location from map information and moves vehicle 100 to its vicinity (i.e., close enough that the vehicle and the user can visually confirm the landmark). Then, in this embodiment, efficient questions that reduce the number of questions asked are generated from the feature quantities that an image recognition model predicts from captured images of the user's surroundings, and the user is estimated from the user's answers to those questions. The question generation method is described in detail later. Although this embodiment describes the case where the target to be estimated is a person, the target may instead be another landmark, such as a sign or building designated as a marker. In that case, the questions concern that landmark.

[0042] <Configuration of the Mobile Body>

[0043] Next, the configuration of vehicle 100, an example of a mobile body according to this embodiment, will be described with reference to Figure 2. Figure 2(A) is a side view of vehicle 100, and Figure 2(B) shows the internal configuration of vehicle 100. In the figures, arrow X indicates the front-rear direction of vehicle 100, with F indicating the front and R the rear. Arrows Y and Z indicate the width (left-right) and height (up-down) directions of vehicle 100.

[0044] Vehicle 100 is an electric autonomous vehicle that includes a driving unit 12 and uses a battery 13 as its main power source. The battery 13 is, for example, a rechargeable battery such as a lithium-ion battery, and vehicle 100 is propelled by the driving unit 12 using power supplied from the battery 13. The driving unit 12 is of a four-wheeled type with a pair of left and right front wheels 20 and a pair of left and right rear wheels 21. The driving unit 12 may also be of a three-wheeled type or another form. Vehicle 100 has a seat 14 for one or two persons.

[0045] The driving unit 12 includes a steering mechanism 22. The steering mechanism 22 is a mechanism that uses a motor 22a as a drive source to change the steering angle of a pair of front wheels 20. By changing the steering angle of the pair of front wheels 20, the direction of travel of the vehicle 100 can be changed. The driving unit 12 also includes a drive mechanism 23. The drive mechanism 23 is a mechanism that uses a motor 23a as a drive source to rotate a pair of rear wheels 21. By rotating the pair of rear wheels 21, the vehicle 100 can move forward or backward.

[0046] Vehicle 100 is equipped with detection units 15 to 17 that detect objects around vehicle 100. Detection units 15 to 17 are a group of external sensors that monitor the surroundings of vehicle 100. In this embodiment, each is an imaging device that captures images of the surroundings of vehicle 100 and includes, for example, an optical system such as a lens and an image sensor. However, radar or LiDAR (Light Detection and Ranging) may be used instead of, or in addition to, the imaging devices.

[0047] Two detection units 15 are arranged at the front of vehicle 100, spaced apart in the Y direction, and mainly detect objects in front of vehicle 100. Detection units 16 are arranged on the left and right sides of vehicle 100 and mainly detect objects to the sides of vehicle 100. Detection unit 17 is arranged at the rear of vehicle 100 and mainly detects objects behind vehicle 100.

[0048] <Control Configuration of the Mobile Body>

[0049] Figure 3 is a block diagram of the control system of vehicle 100, which is a mobile body. The configuration mainly required to implement the present invention is described here; other configurations may be included in addition to those described below. Vehicle 100 includes a control unit (ECU) 30. The control unit 30 includes a processor, typically a CPU, a storage device such as a semiconductor memory, and an interface to external devices. The storage device stores the programs executed by the processor and the data the processor uses in processing. Multiple sets of processors, storage devices, and interfaces may be provided for the different functions of vehicle 100 and configured to communicate with each other.

[0050] The control unit 30 acquires the detection results from detection units 15 to 17, input information from the operation panel 31, voice information input from the voice input device 33, and control commands from server 110 (e.g., capture an image, send the current location), and executes the corresponding processing. The control unit 30 controls motors 22a and 23a (drive control of the driving unit 12), controls the display on the operation panel 31, notifies the occupants of vehicle 100 by sound, and outputs information.

[0051] The voice input device 33 picks up the voices of the occupants of vehicle 100. The control unit 30 can recognize the input voice and perform the corresponding processing. The GNSS (Global Navigation Satellite System) sensor 34 receives GNSS signals to detect the current position of vehicle 100. The storage device 35 is a large-capacity storage device that stores map data including information such as roads on which vehicle 100 can travel, landmarks, and shops. The storage device 35 can also store programs executed by the processor and data the processor uses in processing. The storage device 35 can also store various parameters (such as the learned parameters and hyperparameters of deep neural networks) of the machine learning models for voice recognition and image recognition executed by the control unit 30. The communication unit 36 is, for example, a communication device that can connect to network 140 via wireless communication such as Wi-Fi or 5G mobile communication.

[0052] <Configuration of the Server and Communication Device>

[0053] Next, example configurations of server 110 and communication device 120, which are information processing apparatuses according to this embodiment, will be described with reference to Figure 4. The functions of server 110 described below can also be implemented by vehicle 100, as in the modified example described later. In that case, the control unit 404 of server 110, described later, is integrated with the control unit 30 of the mobile body described above.

[0054] (Server configuration)

[0055] First, an example configuration of server 110 will be described. The configuration mainly required to implement the present invention is described here; other configurations may be included in addition to those described below. The control unit 404 includes a processor, typically a CPU, a storage device such as a semiconductor memory, and an interface to external devices. The storage device stores the programs executed by the processor and the data the processor uses in processing. Multiple sets of processors, storage devices, and interfaces may be provided for the different functions of server 110 and configured to communicate with each other. By executing programs, the control unit 404 carries out the various operations of server 110, such as the meeting position adjustment processing described later. In addition to the CPU, the control unit 404 may include a GPU or dedicated hardware suited to executing machine learning models such as neural networks.

[0056] The user data acquisition unit 413 acquires images and location information transmitted from vehicle 100. The user data acquisition unit 413 also acquires at least one of the speech information of user 130 transmitted from communication device 120 and the location information of communication device 120. The user data acquisition unit 413 can store the acquired images and location information in the storage unit 403. The images and speech information acquired by the user data acquisition unit 413 are input to trained models in the inference phase to obtain inference results, and can also be used as training data for the machine learning models executed by server 110.

[0057] The voice information processing unit 414 includes a machine learning model for processing voice information and executes the learning and inference phases of that model. The machine learning model of the voice information processing unit 414 performs computations using a deep learning algorithm based on a deep neural network (DNN) to identify place names, landmarks such as buildings, shop names, object names, and the like contained in the speech information. Objects may include pedestrians, signs, markers, vending machines and other outdoor equipment, building elements such as windows and entrances, roads, vehicles, and two-wheeled vehicles. Once the DNN has completed its learning phase, new speech information can be input to the trained DNN to perform recognition processing (inference phase processing) on it. In this embodiment, the case where voice recognition is performed by server 110 is described as an example, but voice recognition can also be performed in the vehicle or the communication device, with the recognition results sent to server 110.

[0058] The image information processing unit 415 includes a machine learning model for processing image information and performs learning and inference phases of the machine learning model. The machine learning model of the image information processing unit 415, for example, performs operations using a deep learning algorithm employing a deep neural network (DNN) to identify objects contained in the image information. Objects may include pedestrians, signs, markers, vending machines and other outdoor equipment, building elements such as windows and entrances, roads, vehicles, and bicycles. For example, the machine learning model of the image information processing unit 415 is an image recognition model that extracts features of pedestrians contained in the image (e.g., objects near the pedestrian, the color of clothing, the color of a bag, whether or not a mask is worn, whether or not a smartphone is present, etc.).
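The text states that the image recognition model outputs pedestrian features, and elsewhere that each feature comes with a reliability, but does not say how low-reliability predictions are handled. One simple option, assumed here, is to keep a value only when its reliability reaches a threshold and otherwise record it as `unknown`, so that uncertain predictions are not treated as facts when questions are chosen. The `FeaturePrediction` type, the feature names, and the 0.6 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeaturePrediction:
    feature: str        # e.g. "clothing_color", "bag_color", "wears_mask", "has_smartphone"
    value: str          # predicted value, e.g. "black" or "yes"
    reliability: float  # model confidence in [0, 1]


def to_feature_table(
    predictions: dict[str, list[FeaturePrediction]], min_reliability: float = 0.6
) -> dict[str, dict[str, str]]:
    """Turn per-pedestrian predictions into a feature table, masking unreliable values."""
    table: dict[str, dict[str, str]] = {}
    for pedestrian_id, preds in predictions.items():
        table[pedestrian_id] = {
            p.feature: p.value if p.reliability >= min_reliability else "unknown"
            for p in preds
        }
    return table


predictions = {
    "p1": [FeaturePrediction("clothing_color", "black", 0.92),
           FeaturePrediction("bag_color", "red", 0.41)],
    "p2": [FeaturePrediction("clothing_color", "white", 0.88),
           FeaturePrediction("wears_mask", "yes", 0.75)],
}
print(to_feature_table(predictions))
# {'p1': {'clothing_color': 'black', 'bag_color': 'unknown'}, 'p2': {'clothing_color': 'white', 'wears_mask': 'yes'}}
```

Rather than masking, reliabilities could instead be used as weights in the impurity calculation; thresholding is shown only because it is the simplest to follow.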

[0059] The question generation unit 416 obtains the impurity of each feature quantity based on the plurality of feature quantities that an image recognition model extracts from images captured by vehicle 100, together with their reliabilities. Based on the obtained impurities, it recursively generates the shortest question set that minimizes the impurity. Impurity represents the degree to which the target cannot be separated from the set of other objects. The user estimation unit 417 estimates the user based on the user's answers to the generated questions. Here, user estimation means estimating the user (target) who has requested to meet vehicle 100 from among one or more people within a predetermined area. The meeting position estimation unit 418 performs the processing for adjusting the meeting position of user 130 and vehicle 100. The impurity acquisition processing, user estimation processing, and meeting position adjustment processing are described in detail later.
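Recursively generating a short question set resembles greedy decision-tree induction: at each level, ask about the remaining feature with the lowest impurity, split the candidates by the possible answers, and recurse until one candidate is left. The sketch below assumes Gini impurity with a uniform prior over candidates and ignores reliabilities. Greedy selection keeps the number of questions small but is not guaranteed to produce the globally shortest set. All feature names and candidates are hypothetical.

```python
from collections import Counter


def expected_impurity(candidates: dict[str, dict[str, str]], feature: str) -> float:
    # Size-weighted Gini impurity of the groups an answer about `feature` would leave.
    n = len(candidates)
    groups = Counter(c[feature] for c in candidates.values())
    return sum((size / n) * (1.0 - 1.0 / size) for size in groups.values())


def build_questions(candidates: dict[str, dict[str, str]], features: list[str]):
    """Return a question tree; leaves are sorted lists of remaining candidate ids."""
    if len(candidates) <= 1 or not features:
        return sorted(candidates)
    best = min(features, key=lambda f: expected_impurity(candidates, f))
    values = sorted({c[best] for c in candidates.values()})
    if len(values) == 1:
        return sorted(candidates)  # no remaining feature can split these candidates
    rest = [f for f in features if f != best]
    return {
        "question": best,
        "branches": {
            v: build_questions({k: c for k, c in candidates.items() if c[best] == v}, rest)
            for v in values
        },
    }


def estimate_user(tree, answers: dict[str, str]) -> list[str]:
    # Follow the user's answers down the tree until a leaf is reached.
    while isinstance(tree, dict):
        tree = tree["branches"][answers[tree["question"]]]
    return tree


pedestrians = {
    "p1": {"clothing_color": "black", "bag_color": "red", "wears_mask": "yes"},
    "p2": {"clothing_color": "black", "bag_color": "blue", "wears_mask": "no"},
    "p3": {"clothing_color": "white", "bag_color": "red", "wears_mask": "yes"},
    "p4": {"clothing_color": "red", "bag_color": "blue", "wears_mask": "yes"},
}
tree = build_questions(pedestrians, ["clothing_color", "bag_color", "wears_mask"])
print(tree["question"])  # clothing_color
print(estimate_user(tree, {"clothing_color": "black", "bag_color": "red"}))  # ['p1']
```

In this example every pedestrian is identified with at most two questions, because the first question (clothing color) already isolates two of the four candidates.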

[0060] Furthermore, server 110 generally has access to more computing resources than vehicle 100. Additionally, by receiving and storing image data captured by various vehicles, it can collect learning data for diverse situations, enabling learning to address a wider range of scenarios. Based on the stored information, an image recognition model is generated, and this model is used to extract features from the captured images.

[0061] The communication unit 401 is, for example, a communication device including communication circuitry, that communicates with external devices such as the vehicle 100 and the communication device 120. In addition to receiving at least one of image information and location information from the vehicle 100, and voice information from the communication device 120, the communication unit 401 also sends control commands to the vehicle 100 and voice information to the communication device 120. The power supply unit 402 supplies power to the various components within the server 110. The storage unit 403 is a non-volatile memory such as a hard disk or semiconductor memory.

[0062] (Communication device configuration)

[0063] Next, the configuration of communication device 120 will be described. Communication device 120 is a portable device, such as a smartphone, owned by user 130. The configuration mainly required to implement the present invention is described here; other configurations may be included in addition to those described below. Communication device 120 includes a control unit 501, a storage unit 502, an external communication device 503, a display operation unit 504, a microphone 507, a speaker 508, and a speed sensor 509. The external communication device 503 includes a GPS 505 and a communication unit 506.

[0064] The control unit 501 includes a processor such as a CPU. The storage unit 502 stores the programs executed by the processor, the data the processor uses during processing, and the like. Alternatively, the storage unit 502 may be incorporated into the control unit 501. The control unit 501 is connected to the other components 502, 503, 504, 508, and 509 via signal lines such as buses, so that it can exchange signals with them and control the communication device 120 as a whole.

[0065] The control unit 501 can communicate with the communication unit 401 of the server 110 over the network 140 using the communication unit 506 of the external communication device 503. The control unit 501 also acquires various information via the GPS 505, which acquires the current location of the communication device 120. For example, location information can therefore be provided to the server 110 together with the user's speech information. The GPS 505 is not an essential component of this invention, however. This invention provides a system that can be used even in facilities such as indoor spaces, where location information from the GPS 505 cannot be obtained. Location information based on the GPS 505 is therefore treated as supplementary information for estimating the user's location.

[0066] The display operation unit 504 is, for example, a touch-panel LCD that can display various information and accept user operations. The display operation unit 504 displays information such as the content of queries from the server 110 and the meeting point with the vehicle 100. When a query is received from the server 110, the user can activate the microphone 507 of the communication device 120 by operating a selectable microphone button displayed on the operation screen. The microphone 507 then captures the user's speech as audio information. When the server 110 queries the user, the speaker 508 outputs a sound-based message (e.g., "Is the bag red?") according to the server's instructions. If the query is sound-based, the device can interact with the user even with a simple configuration that has no display screen, such as a headset. Even when the user is not holding the communication device 120, the user can hear the query from the server 110 through a headset or similar means. If the query is text-based, the query from the server 110 is displayed on the display operation unit 504 of the communication device 120. The user can respond by pressing a button displayed on the operation screen or by entering text in a chat window. In this case, unlike a voice-based query, the query is not affected by ambient noise.

[0067] The speed sensor 509 is an accelerometer that detects the acceleration of the communication device 120 in the forward/backward, left/right, and up/down directions. The acceleration values output by the speed sensor 509 are stored in a ring buffer in the storage unit 502, and the oldest record is overwritten first. The server 110 can also acquire this data and use it to detect the user's direction of movement.
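The overwrite-oldest behavior of the ring buffer is sketched below in Python. This is a minimal illustration. It assumes a fixed capacity and stores samples as (forward/backward, left/right, up/down) acceleration tuples. The class name `AccelRingBuffer` and the capacity value are hypothetical and do not come from the source.

```python
from collections import deque
from typing import List, Tuple

# (forward/backward, left/right, up/down) acceleration; layout assumed for illustration
Sample = Tuple[float, float, float]


class AccelRingBuffer:
    """Hypothetical fixed-capacity store for accelerometer output.

    Once the buffer is full, appending a new sample discards the oldest one,
    which mirrors "the oldest record is overwritten first".
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) drops items from the left end once it is full
        self._buf: deque = deque(maxlen=capacity)

    def push(self, sample: Sample) -> None:
        self._buf.append(sample)

    def snapshot(self) -> List[Sample]:
        """Return the stored samples, oldest first (e.g. for upload to the server)."""
        return list(self._buf)


if __name__ == "__main__":
    buf = AccelRingBuffer(capacity=3)
    for i in range(5):
        buf.push((float(i), 0.0, 9.8))
    print(buf.snapshot())  # only the three most recent samples remain
```

A `deque` with `maxlen` provides the overwrite behavior directly, so no manual head and tail indices are needed.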

[0068] <Summary of Question Generation Using Speech and Images>

[0069] With reference to Figures 5 to 8, an overview of the question generation process executed on the server 110 using speech and images will be described. Specifically, this section describes how efficient questions are generated from images captured by the vehicle 100 to identify targets such as users or signs.

[0070] (Captured image)

[0071] Figure 5 is a diagram illustrating an example of an image captured by the vehicle 100. In Figure 5, the vehicle 100 has moved to an approximate location based on the user's speech and location information. After moving to that location, the vehicle 100 uses at least one of the detection units 15 to 17 to capture images of the area around the location where the target user is presumed to be. The captured image 600 includes pedestrians A, B, C, and D, a building 601, a utility pole 602, and pedestrian crossings 603 and 604 on the road. When the vehicle 100 acquires the captured image 600, it sends the image to the server 110. If the vehicle 100 has an image recognition model, it can also extract feature quantities from the captured image itself. If the vehicle 100 does not have a camera function, it can acquire images captured by cameras installed on other nearby vehicles or buildings. Image analysis can also be performed using multiple such captured images.

[0072] (Feature extraction)

[0073] Figure 6 is a diagram representing the feature quantities extracted from the captured image 600 by an image recognition model in the server 110. Reference numeral 610 denotes the extracted features (hereinafter referred to as feature quantities). The image information processing unit 415 of the server 110 first detects people using the image recognition model. In the captured image 600, four people, pedestrians A to D, are detected. The image information processing unit 415 then extracts feature quantities for each detected person. As shown in 610, the feature quantities associated with each detected person include, for example, objects near the person, the color and type of the person's clothing, the color of their pants, and the color of their bag. The person's behavior is also detected, such as whether they are looking at a smartphone, whether they are wearing a mask, whether they are standing still, and which direction they are facing. As shown in 610, feature quantities are extracted for each of pedestrians A to D. When the target object is a building or a sign, feature quantities can likewise be obtained by detecting objects near the detected object, the color and category of the detected object, and the text and patterns displayed on it.
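A per-object feature table like 610 might be represented as follows. This Python sketch assumes that each feature quantity carries a value and a recognizer reliability in [0, 1]. The class names, feature keys, and all values are hypothetical and invented for illustration; they are not the contents of Figure 6. The `partition` helper groups candidates by their value for one feature, which is the basic operation a question performs on the candidate set.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FeatureValue:
    value: str          # e.g. "red", "yes"
    reliability: float  # recognizer confidence, assumed to lie in [0, 1]


@dataclass
class DetectedObject:
    label: str                                                   # e.g. "A" to "D"
    features: Dict[str, FeatureValue] = field(default_factory=dict)


def partition(objects: List[DetectedObject], feature: str) -> Dict[Optional[str], List[str]]:
    """Group object labels by their value for `feature` (None if the feature was not detected)."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for obj in objects:
        fv = obj.features.get(feature)
        groups[fv.value if fv is not None else None].append(obj.label)
    return dict(groups)


# Hypothetical table, loosely modeled on the examples quoted in the text.
pedestrians = [
    DetectedObject("A", {"bag_color": FeatureValue("black", 0.8), "smartphone": FeatureValue("yes", 0.9)}),
    DetectedObject("B", {"bag_color": FeatureValue("red", 0.7), "smartphone": FeatureValue("yes", 0.9)}),
    DetectedObject("C", {"bag_color": FeatureValue("black", 0.6), "smartphone": FeatureValue("no", 0.8)}),
    DetectedObject("D", {"smartphone": FeatureValue("no", 0.95)}),  # no bag detected
]

if __name__ == "__main__":
    print(partition(pedestrians, "bag_color"))
    print(partition(pedestrians, "smartphone"))
```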

[0074] (Generation of questions corresponding to impurity)

[0075] Figure 7 is a diagram illustrating the method of generating questions using impurity according to this embodiment. First, the question generation unit 416 of the server 110 extracts one or more feature quantities using the image recognition model. It then obtains the value of each feature quantity, its reliability, and the weight of the feature quantity itself. Reliability represents, for example, how confident the image recognition model is in its prediction of the feature value. Weight represents how strongly the feature quantity is reflected in the impurity calculation. Reliability and weight may be values updated in real time through machine learning, and the weights may also be set heuristically for each feature quantity. The question generation unit 416 then recursively generates optimal, efficient questions from the obtained feature quantities and their weights and reliabilities. The generated questions are preferably ones that a person can answer with "yes" or "no", which reduces the variety of possible answers. As a secondary effect, this makes the user's spoken answers easier for the computer to recognize and understand.

[0076] The example shown in Figure 7 will be used for illustration. As shown in 610, feature quantities are extracted from the captured image 600 for pedestrians A to D. As shown in 701, the user requesting the rendezvous, that is, the target user, is B. As mentioned above, impurity represents the degree to which a target cannot be separated from the other objects in a set. When the set includes all of pedestrians A to D, the impurity given by the impurity calculation model described later is "4.8".

[0077] Here, assuming that all feature values have equal weight and reliability, the question generation unit 416 generates a question that minimizes the impurity in its shortest form. This is a question about a feature possessed by only one user, such as "Is the bag red?". If no feature is possessed by only one user, multiple questions may be generated. In that case, the questions can be asked in sequence, or they can be ordered using other information, such as the features of users considered more likely given their location. In the example of 610, if the user answers "yes" to the above question, pedestrian B can be presumed to be the target user. If the user answers "no", the set is narrowed down to pedestrians A, C, and D, and the next question is generated.
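Under the equal-weight, equal-reliability assumption above, choosing the question reduces to finding the feature whose target value is shared by the fewest other candidates. The Python sketch below assumes, as in 701, that the target (B) is known when the worked example is constructed. The feature table is hypothetical and only loosely modeled on the questions quoted in the text; it is not the content of 610.

```python
from typing import Dict, List

Table = Dict[str, Dict[str, str]]

# Hypothetical feature table; values are invented for illustration.
TABLE: Table = {
    "A": {"near_building": "yes", "clothes": "black", "smartphone": "yes", "mask": "no",  "bag": "black"},
    "B": {"near_building": "yes", "clothes": "black", "smartphone": "yes", "mask": "yes", "bag": "red"},
    "C": {"near_building": "yes", "clothes": "white", "smartphone": "no",  "mask": "yes", "bag": "black"},
    "D": {"near_building": "yes", "clothes": "blue",  "smartphone": "no",  "mask": "no",  "bag": "grey"},
}


def others_sharing(table: Table, target: str, feature: str) -> int:
    """Number of non-target candidates whose value for `feature` equals the target's."""
    t_val = table[target][feature]
    return sum(1 for name, feats in table.items() if name != target and feats[feature] == t_val)


def pick_feature(table: Table, target: str) -> str:
    """Feature that leaves the fewest non-target candidates after a 'yes' answer."""
    features = list(table[target])
    return min(features, key=lambda f: others_sharing(table, target, f))


def narrow(table: Table, feature: str, value: str, answer_yes: bool) -> List[str]:
    """Candidates consistent with the answer to 'Is the <feature> <value>?'."""
    return [n for n, feats in table.items() if (feats[feature] == value) == answer_yes]


if __name__ == "__main__":
    f = pick_feature(TABLE, "B")
    v = TABLE["B"][f]
    print(f, v)                        # in this table, the bag color is unique to B
    print(narrow(TABLE, f, v, True))   # a 'yes' isolates B
    print(narrow(TABLE, f, v, False))  # a 'no' leaves A, C and D
```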

[0078] When the weight and reliability of the bag color are low, on the other hand, the question generation unit 416 generates questions from other features with higher weight and reliability, such as "Are you looking at your smartphone?". If the user answers "yes", the set is narrowed down to pedestrians A and B, with an impurity of "1.9". Next, the question generation unit 416 generates the question "Are you wearing a mask?", and the target user can then be inferred whether the user answers "yes" or "no". In this way, the question generation unit 416 generates optimal, efficient questions by taking the weight and reliability of the feature values into account.

[0079] Impurity calculation models can be formalized in various ways. For example, they can be formalized heuristically or approximated using neural networks. As mentioned above, the weights of features can be set heuristically or learned from data through machine learning.

[0080] An example of an impurity calculation model is shown as 702 in Figure 7. Term 703 is the number of objects in the set other than the target. For example, if the targets are people, it is the number of people in the set other than the predetermined person. The fewer such objects there are, the lower the impurity. Term 704 is a penalty based on the weights of the features and the reliability of the feature values; the smaller the penalty, the lower the impurity. 705 lists the meaning of each variable. F denotes the set of features (the set of feature values), and M denotes the dimension of the features. f_k denotes the set of feature values possessed by each object for the k-th feature, and f*_k denotes the value of the k-th feature possessed by the target user. N denotes the number of objects, and w denotes the set of weights for the features. C_{f_k} denotes the reliability obtained from the image recognition result of each object for the k-th feature. The impurity calculation model of 702 is merely an example and is not intended to limit the invention. For example, instead of simply summing the terms 703 and 704, coefficients or normalization based on the number of objects could be introduced. Likewise, instead of simply using the weights or the reciprocals of the reliabilities, the penalty term could use other operations or functions. Furthermore, depending on the amount of data collected, function approximation based on a neural network or the like could be introduced.
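Formula 702 itself appears only in Figure 7, so the following Python sketch is an assumed stand-in and not the patented formula. It adds two terms:
- A count term, in the spirit of 703: the number of non-target objects that share the target's value f*_k.
- A penalty term, in the spirit of 704: lam * (1/w_k + 1/c_k), where c_k is the mean reliability C_{f_k} of feature k over the current candidate set.

The table, weights, reliabilities, and `lam` are all hypothetical. The resulting numbers are not meant to reproduce the values 4.8 or 1.9 quoted above. The sketch only mirrors the qualitative behavior of [0077] and [0078]:
- A reliable bag color is chosen first.
- An unreliable, low-weight bag color is skipped in favor of the smartphone question.
- The mask question then separates A from B.

```python
from typing import Dict

Table = Dict[str, Dict[str, str]]
Scores = Dict[str, Dict[str, float]]

# Hypothetical candidate table (values invented for illustration).
TABLE: Table = {
    "A": {"near_building": "yes", "clothes": "black", "smartphone": "yes", "mask": "no",  "bag": "black"},
    "B": {"near_building": "yes", "clothes": "black", "smartphone": "yes", "mask": "yes", "bag": "red"},
    "C": {"near_building": "yes", "clothes": "white", "smartphone": "no",  "mask": "yes", "bag": "black"},
    "D": {"near_building": "yes", "clothes": "blue",  "smartphone": "no",  "mask": "no",  "bag": "grey"},
}
FEATURES = list(TABLE["A"])


def impurity(table: Table, rel: Scores, weights: Dict[str, float],
             target: str, feature: str, lam: float = 1.0) -> float:
    """Assumed impurity of asking about `feature` (lower is better); not formula 702 itself."""
    t_val = table[target][feature]
    # Count term: non-target objects that a 'yes' answer would fail to exclude.
    count = sum(1 for n, f in table.items() if n != target and f[feature] == t_val)
    # Penalty term: reciprocal weight plus reciprocal mean reliability over the current set.
    mean_rel = sum(rel[n][feature] for n in table) / len(table)
    return count + lam * (1.0 / weights[feature] + 1.0 / mean_rel)


def best_feature(table: Table, rel: Scores, weights: Dict[str, float], target: str) -> str:
    return min(FEATURES, key=lambda f: impurity(table, rel, weights, target, f))


def uniform_reliability(value: float) -> Scores:
    return {n: {f: value for f in FEATURES} for n in TABLE}


weights = {"near_building": 1.0, "clothes": 1.0, "smartphone": 2.0, "mask": 1.0, "bag": 1.0}
rel = uniform_reliability(0.9)

if __name__ == "__main__":
    print(best_feature(TABLE, rel, weights, "B"))            # reliable bag color wins

    low_rel = uniform_reliability(0.9)
    for n in low_rel:
        low_rel[n]["bag"] = 0.2                              # bag color poorly recognized
    low_w = dict(weights, bag=0.2)                           # and given a low weight
    print(best_feature(TABLE, low_rel, low_w, "B"))          # falls back to the smartphone question

    subset = {n: TABLE[n] for n in ("A", "B")}               # after a 'yes' to the smartphone question
    print(best_feature(subset, low_rel, low_w, "B"))         # mask now separates A from B
```

Raising a feature's weight or reliability lowers its penalty, so among features with the same count term, the better-recognized and more heavily weighted one is asked first.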

[0081] (Generated efficient questions)

[0082] Figure 8 illustrates the efficient questioning method of this embodiment together with a questioning method used for comparison. In the comparison example, questions are generated one after another from the extracted feature values shown in 610 to narrow down the target user, so multiple questions are likely to be needed. For example, as shown in Figure 8, the comparison method may generate "Are there any buildings nearby?", which applies to all of pedestrians A to D, and "Are their clothes black?", which applies to pedestrians A and B. In contrast, according to the invention of this application, as shown in Figure 8, a question such as "Are the shoes red?" is generated using a feature shared by as few pedestrians as possible. For example, if pedestrian B is the target user, a "yes" answer is received and the target user is identified with a single question. Thus, according to this embodiment, the impurity can be minimized in the shortest possible form, which minimizes the number of dialogue turns required to estimate the target user.
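The difference in dialogue length can be illustrated with a small Python simulation. It makes the same assumptions as the worked example: target B is known when the example is constructed, and each question asks whether the target's own value holds, so the true answer is always "yes". The sketch counts how many questions a fixed feature order needs, compared with a greedy rule that always asks about the feature shared by the fewest remaining non-target candidates. The table and the fixed order are hypothetical and only loosely follow the questions quoted for Figure 8.

```python
from typing import Dict, List, Optional, Set, Tuple

Table = Dict[str, Dict[str, str]]

# Hypothetical table; "shoes" is red only for B, echoing "Are the shoes red?".
TABLE: Table = {
    "A": {"near_building": "yes", "clothes": "black", "smartphone": "yes", "mask": "no",  "shoes": "white"},
    "B": {"near_building": "yes", "clothes": "black", "smartphone": "yes", "mask": "yes", "shoes": "red"},
    "C": {"near_building": "yes", "clothes": "white", "smartphone": "no",  "mask": "yes", "shoes": "black"},
    "D": {"near_building": "yes", "clothes": "blue",  "smartphone": "no",  "mask": "no",  "shoes": "white"},
}


def run_dialogue(table: Table, target: str,
                 order: Optional[List[str]] = None) -> Tuple[List[str], Set[str]]:
    """Ask yes/no questions until one candidate remains or no features are left.

    With `order`, features are asked in that fixed sequence (comparison example).
    Otherwise, a greedy rule picks the feature shared by the fewest remaining
    non-target candidates.
    """
    candidates = set(table)
    remaining = list(order) if order is not None else list(table[target])
    asked: List[str] = []
    while len(candidates) > 1 and remaining:
        if order is not None:
            feature = remaining[0]
        else:
            feature = min(remaining, key=lambda f: sum(
                1 for n in candidates if n != target and table[n][f] == table[target][f]))
        remaining.remove(feature)
        asked.append(feature)
        value = table[target][feature]
        # The target answers truthfully, so only candidates with the same value survive.
        candidates = {n for n in candidates if table[n][feature] == value}
    return asked, candidates


if __name__ == "__main__":
    fixed = ["near_building", "clothes", "smartphone", "mask", "shoes"]
    print(run_dialogue(TABLE, "B", fixed))  # four questions in this table
    print(run_dialogue(TABLE, "B"))         # one question in this table
```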

[0083]

[0084] Next, with reference to Figure 9, a series of operations related to rendezvous control in the server 110 according to this embodiment will be described. This processing is implemented by the control unit 404 executing a program. For simplicity, each process is described below as being executed by the control unit 404, although the corresponding processes are actually executed by the respective units within the control unit 404. The description covers the process up to the point where the user and the vehicle finally meet. However, the characteristic configuration of this invention concerns estimating (identifying) the user, and the configuration for estimating the meeting position is not essential. That is, the processing steps described below include control related to estimating the meeting position, but control that performs only the steps related to user estimation can also be executed.

[0085] In S101, the control unit 404 receives a request (meeting request) from the communication device 120 to initiate a rendezvous with the vehicle 100. In S102, the control unit 404 obtains the user's location information from the communication device 120. This location information is obtained by the GPS 505 of the communication device 120 and may also be received together with the request in S101. In S103, the control unit 404 determines the approximate rendezvous area (also referred to as the meeting area or predetermined area) based on the user's location obtained in S102. The rendezvous area is, for example, an area within a predetermined distance (e.g., several hundred meters) of the current location of the user 130 (communication device 120).

[0086] In S104, the control unit 404 tracks the movement of vehicle 100 toward the rendezvous area, for example, based on location information periodically sent from vehicle 100. Furthermore, the control unit 404 can, for example, select the vehicle closest to the user 130's current location (or the destination after a predetermined time) from among multiple vehicles located in the vicinity of the user 130's current location. Alternatively, the control unit 404 may select a specific vehicle 100 as the vehicle to rendezvous with the user 130 if the rendezvous request includes information specifying that particular vehicle 100.

[0087] In S105, the control unit 404 determines whether the vehicle 100 has arrived at the rendezvous area. For example, if the distance between the vehicle 100 and the communication device 120 is within the radius of the rendezvous area, the control unit 404 determines that the vehicle 100 has arrived at the rendezvous area and causes the process to proceed to S106. Otherwise, the server 110 causes the process to return to S105 and waits for the vehicle 100 to arrive at the rendezvous area.
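Steps S103 to S105 amount to simple distance checks, sketched below in Python. The sketch assumes positions are (latitude, longitude) pairs in degrees and uses the haversine great-circle distance. It takes 300 m as one hypothetical value for "several hundred meters". The function names and vehicle IDs are illustrative only.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius
AREA_RADIUS_M = 300.0          # hypothetical rendezvous-area radius ("several hundred meters")


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def nearest_vehicle(user: LatLon, vehicles: Dict[str, LatLon]) -> str:
    """S104 option: choose the vehicle closest to the user's current location."""
    return min(vehicles, key=lambda vid: haversine_m(user, vehicles[vid]))


def has_arrived(vehicle: LatLon, user: LatLon, radius_m: float = AREA_RADIUS_M) -> bool:
    """S105: the vehicle is inside the rendezvous area centered on the user."""
    return haversine_m(vehicle, user) <= radius_m


if __name__ == "__main__":
    user = (35.6812, 139.7671)
    fleet = {"car-1": (35.6900, 139.7000), "car-2": (35.6830, 139.7660)}
    vid = nearest_vehicle(user, fleet)
    print(vid, has_arrived(fleet[vid], user))
```

In an actual system, S105 would call `has_arrived` each time a new position report arrives from the vehicle, rather than busy-waiting.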

[0088] In S106, the control unit 404 estimates the user using speech and captured images. This user estimation process is described in detail later. Next, in S107, the control unit 404 estimates the meeting point based on the user estimated in S106. For example, suppose the user has been located in the captured image and names "a nearby red mailbox" as the meeting point. The control unit 404 can then estimate the meeting point more accurately by searching for the red mailbox closest to the estimated user. Then, in S108, the control unit 404 sends the location of the meeting point estimated in S107 to the vehicle 100, causing the vehicle 100 to move toward the meeting point. After sending the meeting point to the vehicle 100, the control unit 404 ends the series of operations.
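The meeting-point refinement in S107 can be sketched in Python as a nearest-neighbor search among landmarks of the category the user named. The sketch assumes the estimated user and the detected landmarks already have positions in a common planar frame, in meters (for example, derived from the vehicle's sensors). The landmark IDs, categories, and coordinates are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def nearest_landmark(user_xy: Point,
                     landmarks: Dict[str, Tuple[str, Point]],
                     category: str) -> Optional[str]:
    """Return the ID of the landmark of `category` closest to the estimated user, or None."""
    matches = [(lid, xy) for lid, (cat, xy) in landmarks.items() if cat == category]
    if not matches:
        return None
    # math.dist (Python 3.8+) gives the Euclidean distance between two points.
    return min(matches, key=lambda item: math.dist(user_xy, item[1]))[0]


if __name__ == "__main__":
    detected = {
        "mailbox-1": ("red_mailbox", (40.0, 5.0)),
        "mailbox-2": ("red_mailbox", (6.0, -3.0)),
        "pole-1": ("utility_pole", (2.0, 1.0)),
    }
    print(nearest_landmark((0.0, 0.0), detected, "red_mailbox"))  # mailbox-2
```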

[0089] <Series of operations for estimating the user from speech and captured images>

[0090] Next, with reference to Figure 10, a series of operations in the server 110 related to the user estimation process (S106) using speech and captured images will be described. As with the processing shown in Figure 9, this processing is implemented by the control unit 404 executing a program.

[0091] In S201, the control unit 404 acquires images captured by the vehicle 100. It may also acquire images from cameras on vehicles other than the vehicle 100, or from surveillance cameras on buildings around the location where the target user is believed to be.

[0092] In S202, the control unit 404 uses an image recognition model to detect one or more people in the acquired image. Then, in S203, the control unit 404 uses the image recognition model to extract feature quantities for each detected person. The processing in S202 and S203 yields, for example, the people and their respective feature quantities shown in 610 of Figure 6. Weights and reliabilities are also associated with the extracted feature quantities.

[0093] Next, in S204, the control unit 404 uses the calculation formula described above to obtain the impurity of each feature quantity extracted in S203. Then, in S205, the control unit 404 generates questions based on the impurity so as to minimize the number of questions.

[0094] In S206, the control unit 404 sends the generated questions to the user. It repeats the questioning based on the user's answers until the user can be correctly identified, and then ends the processing of this flowchart. This processing is described in detail later with reference to Figure 11.

[0095] The detailed processing of S206 will be described below with reference to Figure 11. As with the processing shown in Figure 9, this processing is implemented by the control unit 404 executing a program.

[0096] In S301, the control unit 404 sends the question group with the fewest questions to the communication device 120. It selects this group based on the weights and reliabilities of the features related to each question in the generated question groups and on the number of questions. Here, a question group is a set of one or more questions such that the target user can be inferred by conducting a dialogue with the user according to those questions.

[0097] Next, in S302, the control unit 404 determines whether a user response to the question sent in S301 has been received from the communication device 120. If a response has been received, the process proceeds to S303; otherwise, it waits in S302 until a response is received. Furthermore, if no user response is received even after a predetermined time has elapsed since the question was sent, the question can be resent, or an error message can be displayed to terminate the process.

[0098] In S303, the control unit 404 determines whether the target user can be narrowed down based on the user's answer. If the target user can be inferred, the process proceeds to S304. Otherwise, the process returns to S301 and the next question is sent. In S304, the control unit 404 infers the target user and ends the processing of this flowchart.
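The S301 to S304 loop can be sketched in Python as follows. Unlike the worked examples above, the server does not know in advance which candidate is the requester. This sketch therefore chooses each yes/no question with a minimax rule, picking the question whose less informative answer leaves the fewest candidates. That rule is an assumed stand-in for the impurity-based question groups, and the sketch asks one question at a time rather than sending a precomputed group. The `ask` callback, the resend count, and the feature table are hypothetical. `ask` returns True or False, or None when no reply arrives in time.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Table = Dict[str, Dict[str, str]]
Ask = Callable[[str, str], Optional[bool]]   # (feature, value) -> yes / no / None on timeout

# Hypothetical feature table (values invented for illustration).
TABLE: Table = {
    "A": {"near_building": "yes", "clothes": "black", "smartphone": "yes", "mask": "no",  "bag": "black"},
    "B": {"near_building": "yes", "clothes": "black", "smartphone": "yes", "mask": "yes", "bag": "red"},
    "C": {"near_building": "yes", "clothes": "white", "smartphone": "no",  "mask": "yes", "bag": "black"},
    "D": {"near_building": "yes", "clothes": "blue",  "smartphone": "no",  "mask": "no",  "bag": "grey"},
}


def choose_question(table: Table, cands: Set[str]) -> Optional[Tuple[str, str]]:
    """Minimax choice of 'Is the <feature> <value>?' over the current candidates."""
    best = None
    for feature in sorted(next(iter(table.values()))):      # sorted for deterministic ties
        for value in sorted({table[n][feature] for n in cands}):
            yes = sum(1 for n in cands if table[n][feature] == value)
            worst = max(yes, len(cands) - yes)
            if worst < len(cands) and (best is None or worst < best[0]):
                best = (worst, feature, value)
    return None if best is None else (best[1], best[2])


def estimate_target(table: Table, ask: Ask, max_resends: int = 2) -> Set[str]:
    cands = set(table)
    while len(cands) > 1:                        # S303: not yet narrowed to one candidate
        question = choose_question(table, cands)
        if question is None:                     # no feature separates the remaining candidates
            break
        feature, value = question
        answer = None
        for _ in range(max_resends + 1):         # S301 send, S302 wait; resend on timeout
            answer = ask(feature, value)
            if answer is not None:
                break
        if answer is None:
            raise TimeoutError("no response from the user")
        cands = {n for n in cands if (table[n][feature] == value) == answer}
    return cands                                 # S304: the inferred target


if __name__ == "__main__":
    user_b = lambda f, v: TABLE["B"][f] == v     # simulated user B answering truthfully
    print(estimate_target(TABLE, user_b))
```

Because every truthful answer keeps the real requester among the candidates, and every chosen question strictly shrinks the set, the loop ends on the requester whenever the features distinguish all candidates.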

[0099] <Variation Example>

[0100] Variations of the present invention are described below. The above embodiment described an example in which the server 110 executes rendezvous control, including user estimation. However, the above processing can also be performed by a mobile body such as a vehicle or a walking robot. In this case, as shown in Figure 12, a system 1200 consists of a vehicle 1210 and the communication device 120. The user's speech information is transmitted from the communication device 120 to the vehicle 1210. Image information captured by the vehicle 1210 is processed by the control unit within the vehicle instead of being transmitted over the network. The vehicle 1210 can have the same configuration as the vehicle 100, except that its control unit 30 can perform rendezvous control. The control unit 30 of the vehicle 1210 operates as a control device within the vehicle 1210 and executes the processing described above by running a stored program. The interactions between the server and the vehicle in the series of operations shown in Figures 9 to 11 can instead take place within the vehicle (e.g., inside the control unit 30, or between the control unit 30 and the detection unit 15). All other processing can be performed in the same way as in the server.

[0101] <Summary of Embodiments>

[0102] 1. The information processing apparatus (e.g., 110) according to the above embodiments includes:

[0103] An acquisition mechanism (401) that acquires a captured image;

[0104] An extraction mechanism (415, S203) that detects multiple objects contained in the captured image and extracts multiple feature quantities for each of the detected objects;

[0105] A calculation mechanism (415, S204) that calculates impurity for each feature quantity extracted by the extraction mechanism, the impurity representing the degree to which the predetermined object cannot be separated from the plurality of objects when a question for inferring the predetermined object from the plurality of objects based on the respective feature quantity is posed to a user; and

[0106] A generation mechanism (416, S205) that generates the question based on the feature quantities extracted by the extraction mechanism and the impurity of each feature quantity, so as to reduce the number of questions required to minimize the impurity.

[0107] According to this embodiment, efficient questions can be generated from image recognition features, and the target object can thereby be estimated.

[0108] 2. In the information processing apparatus of the above embodiment, the extraction mechanism extracts the feature quantities using an image recognition model (S203). The generation mechanism generates the question that minimizes the impurity in the shortest form based on the feature quantities and the impurity, and also on the reliability and weight of the feature quantities extracted using the image recognition model (S205).

[0109] According to this embodiment, feature quantities can be extracted efficiently using the trained image recognition model, and optimal questions can be generated based on their reliability and weights.

[0110] 3. In the information processing apparatus of the above embodiments, the reliability represents the reliability of the feature value, and the feature value represents the value of the feature extracted by the image recognition model for each of the plurality of objects (Figure 7). The weights are set for each feature heuristically or by machine learning (Figure 7).

[0111] According to this embodiment, feature quantities can be extracted efficiently using the trained image recognition model, and the best question can be generated based on their reliability and weights. In addition, the weight of each feature quantity can be set appropriately.

[0112] 4. In the information processing apparatus of the above embodiments, the impurity is obtained from at least one of the following: the number of objects other than the predetermined object included in the set of the plurality of objects, and a penalty based on the weight and/or reliability of the feature quantity (Figure 7).

[0113] According to this embodiment, the impurity can be derived while taking into account the reliability and weight of each feature quantity, which enables efficient question generation.

[0114] 5. The information processing apparatus of the above embodiments further includes: a sending mechanism (401, S301) that sends a question generated by the generating mechanism to a communication device owned by the user; a receiving mechanism (401, S302) that receives an answer to the question from the communication device; and an estimation mechanism (417, S304) that estimates the predetermined object from the plurality of objects based on the answer received by the receiving mechanism.

[0115] According to this embodiment, objects such as users can be estimated efficiently from questions generated to minimize the impurity in the shortest possible form.

[0116] 6. In the information processing apparatus of the above embodiments, the acquisition mechanism acquires location information from the communication device owned by the user, and acquires from outside the images of the area surrounding the location indicated by that information (401, 413).

[0117] According to this embodiment, the approximate location of the user can be determined, and captured images of the surrounding area can be used to generate questions.

[0118] 7. In the information processing apparatus of the above embodiment, the acquisition mechanism acquires images captured by the vehicle that the user has requested to meet (15-17, S201).

[0119] According to this embodiment, the target can be estimated more accurately, and the vehicle can meet the target user.

[0120] 8. In the information processing apparatus of the above embodiments, the acquisition mechanism acquires images captured by a camera located around the location indicated by the location information.

[0121] According to this embodiment, images of the target user's surroundings can be acquired even if the vehicle does not have a camera function.

[0122] 9. In the information processing apparatus of the above embodiments, for a person, the feature quantity is at least one piece of information indicating nearby objects, the color of clothing, the type of clothing, the color of a bag, whether the person is looking at a communication device, and whether the person is wearing a mask (Figure 8). For an object, the feature quantity is at least one of the object's color, its category, text displayed on it, and its pattern.

[0123] According to this embodiment, objects (including users treated as objects) can be estimated efficiently from various feature quantities.

[0124] 10. The mobile body (e.g., 1210) according to the above embodiment includes:

[0125] An acquisition mechanism (401) that acquires a captured image;

[0126] An extraction mechanism (415, S203) that detects multiple objects contained in the captured image and extracts multiple feature quantities for each of the detected objects;

[0127] A calculation mechanism (415, S204) that calculates impurity for each feature quantity extracted by the extraction mechanism, the impurity representing the degree to which the predetermined object cannot be separated from the plurality of objects when a question for inferring the predetermined object from the plurality of objects based on the respective feature quantity is posed to a user; and

[0128] A generation mechanism (416, S205) that generates the question based on the feature quantities extracted by the extraction mechanism and the impurity of each feature quantity, so as to reduce the number of questions required to minimize the impurity.

[0129] According to this embodiment, a mobile body can generate efficient questions from image recognition features without going through a server, and can thereby estimate the target.

Claims

1. An information processing device, characterized in that the information processing device includes: an image acquisition mechanism that acquires a captured image; an extraction mechanism that detects multiple objects contained in the captured image and extracts multiple feature quantities for each of the detected objects; an impurity acquisition mechanism that acquires impurity for each feature quantity extracted by the extraction mechanism, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects when a question for inferring the predetermined object from the plurality of objects based on the respective feature quantity is posed to a user and the candidate set consisting of the plurality of objects is divided into multiple subsets; and a generating mechanism that, based on the feature quantities extracted by the extraction mechanism and the impurity of each feature quantity, selects a feature quantity that efficiently separates the predetermined object from the candidate set so as to reduce the number of queries required to minimize the impurity, and generates a query corresponding to the selected feature quantity.

2. The information processing device according to claim 1, characterized in that the extraction mechanism uses an image recognition model to extract the feature quantities, and the generating mechanism generates the question that minimizes the impurity in the shortest form based on the feature quantities and the impurity, and also on the reliability and weight of the feature quantities extracted using the image recognition model.

3. The information processing device according to claim 2, characterized in that the reliability is the reliability of the feature value, the feature value representing the value of the feature extracted by the image recognition model for each of the plurality of objects.

4. The information processing apparatus according to claim 2, characterized in that the weights are set for each feature heuristically or based on machine learning.

5. The information processing apparatus according to claim 2, characterized in that the impurity is obtained from at least one of the following: the number of objects other than the predetermined object included in the set of the plurality of objects, and a penalty based on the weight and/or reliability of the feature quantity.

6. The information processing apparatus according to claim 1, characterized in that the information processing device further includes: a sending mechanism that sends the query generated by the generating mechanism to the communication device owned by the user; a receiving mechanism that receives a response to the question from the communication device; and an estimating mechanism that estimates the predetermined object from the plurality of objects based on the response received by the receiving mechanism.

7. The information processing apparatus according to claim 1, characterized in that the image acquisition mechanism obtains location information from the user's communication device, and acquires from outside the images of the area surrounding that location.

8. The information processing apparatus according to claim 7, characterized in that the image acquisition mechanism acquires images captured by the vehicle that the user has requested to meet.

9. The information processing apparatus according to claim 7, characterized in that the image acquisition mechanism acquires images captured by cameras located around the location indicated by the location information.

10. The information processing apparatus according to claim 1, characterized in that, for a person, the feature quantity is at least one of the following: nearby objects, color of clothing, type of clothing, color of bag, type of bag, whether the person is looking at a communication device, and whether the person is wearing a mask.

11. The information processing apparatus according to claim 1, characterized in that the feature quantity is at least one of the following: the object's color, its category, text displayed on the object, and its pattern.

12. A mobile body, characterized in that the mobile body has: an image acquisition mechanism that acquires a captured image; an extraction mechanism that detects multiple objects contained in the captured image and extracts multiple feature quantities for each of the detected objects; an impurity acquisition mechanism that acquires impurity for each feature quantity extracted by the extraction mechanism, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects when a question for inferring the predetermined object from the plurality of objects based on the respective feature quantity is posed to a user and the candidate set consisting of the plurality of objects is divided into multiple subsets; and a generating mechanism that, based on the feature quantities extracted by the extraction mechanism and the impurity of each feature quantity, selects a feature quantity that efficiently separates the predetermined object from the candidate set so as to reduce the number of queries required to minimize the impurity, and generates a query corresponding to the selected feature quantity.

13. A control method for an information processing apparatus, characterized in that the control method comprises: an image acquisition step of acquiring a captured image; an extraction step of detecting a plurality of objects contained in the captured image and extracting a plurality of feature quantities for each of the detected objects; an impurity acquisition step of acquiring an impurity for each feature quantity extracted in the extraction step, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects in a case where a question for estimating the predetermined object from the plurality of objects based on that feature quantity is posed to a user and the candidate set consisting of the plurality of objects is divided into a plurality of subsets; and a generation step of selecting, based on the feature quantities extracted in the extraction step and the impurity of each feature quantity, a feature quantity that efficiently separates the predetermined object from the candidate set so as to reduce the number of questions required to minimize the impurity, and generating a question corresponding to the selected feature quantity.
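The claimed steps aim to reduce the number of questions needed to isolate the predetermined object. The sketch below shows one way of repeating the selection step after each answer. Several parts are assumptions: the Gini impurity with a uniform prior over candidates, the stopping rule, and the use of a callback (`answer_fn`) to simulate the user's reply. The candidate data and the names `weighted_gini` and `narrow_down` are hypothetical.

```python
from collections import defaultdict


def weighted_gini(remaining, feature):
    """Weighted Gini impurity of the remaining candidates split by one feature."""
    groups = defaultdict(int)
    for feats in remaining.values():
        groups[feats[feature]] += 1
    n = len(remaining)
    # With a uniform prior over candidates, a subset of size k has Gini 1 - 1/k.
    return sum((k / n) * (1.0 - 1.0 / k) for k in groups.values())


def narrow_down(candidates, answer_fn):
    """Ask lowest-impurity questions until at most one candidate remains.

    Returns the remaining candidate IDs and the features asked, in order.
    """
    remaining = dict(candidates)
    asked = []
    while len(remaining) > 1:
        features = next(iter(remaining.values()))
        # Only features that still distinguish the remaining candidates are useful.
        usable = [f for f in features
                  if f not in asked and len({c[f] for c in remaining.values()}) > 1]
        if not usable:
            break  # the remaining candidates cannot be told apart by any feature
        best = min(usable, key=lambda f: weighted_gini(remaining, f))
        asked.append(best)
        answer = answer_fn(best)  # stands in for the user's reply
        remaining = {cid: c for cid, c in remaining.items() if c[best] == answer}
    return list(remaining), asked


# Hypothetical feature quantities for five detected persons.
candidates = {
    "p1": {"clothing_color": "red", "bag_type": "backpack", "wearing_mask": True},
    "p2": {"clothing_color": "red", "bag_type": "tote", "wearing_mask": False},
    "p3": {"clothing_color": "blue", "bag_type": "backpack", "wearing_mask": True},
    "p4": {"clothing_color": "blue", "bag_type": "none", "wearing_mask": True},
    "p5": {"clothing_color": "green", "bag_type": "tote", "wearing_mask": True},
}

# Simulated user whose target is p4 (hypothetical).
result = narrow_down(candidates, lambda f: candidates["p4"][f])
print(result)  # (['p4'], ['clothing_color', 'bag_type'])
```

In the first round, clothing color and bag type tie at an impurity of 0.4, and `min` keeps the first of them. The user's answer ("blue") leaves p3 and p4. A single question about bag type then isolates p4, so two questions are needed in total.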

14. A control method for a mobile body, characterized in that the control method comprises: an image acquisition step of acquiring a captured image; an extraction step of detecting a plurality of objects contained in the captured image and extracting a plurality of feature quantities for each of the detected objects; an impurity acquisition step of acquiring an impurity for each feature quantity extracted in the extraction step, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects in a case where a question for estimating the predetermined object from the plurality of objects based on that feature quantity is posed to a user and the candidate set consisting of the plurality of objects is divided into a plurality of subsets; and a generation step of selecting, based on the feature quantities extracted in the extraction step and the impurity of each feature quantity, a feature quantity that efficiently separates the predetermined object from the candidate set so as to reduce the number of questions required to minimize the impurity, and generating a question corresponding to the selected feature quantity.

15. A storage medium, characterized in that it stores a program for causing a computer of an information processing apparatus to function as: an image acquisition mechanism that acquires a captured image; an extraction mechanism that detects a plurality of objects contained in the captured image and extracts a plurality of feature quantities for each of the detected objects; an impurity acquisition mechanism that acquires an impurity for each feature quantity extracted by the extraction mechanism, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects in a case where a question for estimating the predetermined object from the plurality of objects based on that feature quantity is posed to a user and the candidate set consisting of the plurality of objects is divided into a plurality of subsets; and a generating mechanism that, based on the feature quantities extracted by the extraction mechanism and the impurity of each feature quantity, selects a feature quantity that efficiently separates the predetermined object from the candidate set so as to reduce the number of questions required to minimize the impurity, and generates a question corresponding to the selected feature quantity.

16. A storage medium, characterized in that it stores a program for causing a computer of a mobile body to function as: an image acquisition mechanism that acquires a captured image; an extraction mechanism that detects a plurality of objects contained in the captured image and extracts a plurality of feature quantities for each of the detected objects; an impurity acquisition mechanism that acquires an impurity for each feature quantity extracted by the extraction mechanism, the impurity representing the degree to which a predetermined object cannot be separated from the plurality of objects in a case where a question for estimating the predetermined object from the plurality of objects based on that feature quantity is posed to a user and the candidate set consisting of the plurality of objects is divided into a plurality of subsets; and a generating mechanism that, based on the feature quantities extracted by the extraction mechanism and the impurity of each feature quantity, selects a feature quantity that efficiently separates the predetermined object from the candidate set so as to reduce the number of questions required to minimize the impurity, and generates a question corresponding to the selected feature quantity.