Map generation system
By processing building image data with a Visual Language Model (VLM), the system automatically generates map data containing information that affects the movement of autonomous mobile bodies, which addresses the difficulty of acquiring such information in existing technologies and improves generation efficiency and accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TOYOTA JIDOSHA KK
- Filing Date
- 2025-12-25
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies cannot effectively generate map information for areas where autonomous moving bodies cannot move, and require a large amount of manual operation to obtain detailed structural and object information.
The Visual Language Model (VLM) is used to generate map data from building image data and request input, automatically extracting information related to the building's internal structure and objects, and generating map data related to the movement of autonomous moving bodies.
It enables the automatic generation of map data containing information on the movement impact of autonomous mobile bodies without requiring a large amount of manpower, avoiding the lack of information acquisition in the simulation environment and improving generation efficiency and accuracy.
Abstract
Description
Technical Field
[0001] This disclosure relates to a map generation system.
Background Technology
[0002] Japanese Patent Application Publication No. 2021-196487 describes a map conversion system for obtaining a two-dimensional or three-dimensional map that specifies the movable path of a moving object. This map conversion system obtains the map through a simulation in which the moving object estimates its own location, based on three-dimensional BIM data representing a space to which information on the internal structure and structural attributes has been assigned in advance, and on predetermined individual information corresponding to the type of moving object. BIM is an abbreviation for Building Information Modeling.
Summary of the Invention
[0003] However, the technology described in Japanese Patent Application Publication No. 2021-196487 has the potential to fail to obtain map information such as areas where autonomous mobile bodies cannot move. Furthermore, in the technology described in Japanese Patent Application Publication No. 2021-196487, information related to detailed structures and objects not included in the data needs to be obtained through manual input and measurements taken during inspections, thus requiring a significant amount of time. Therefore, there is a need to develop a technology that can easily generate map data, such as map data containing information that would affect the movement of autonomous mobile bodies, based on image data of the internal structure of buildings.
[0004] In the map generation system disclosed herein, building data, which includes at least image data of the internal structure of a building, and a request are input into a visual language model, wherein the request includes a requirement to generate first information, and the first information is information related to at least one of the internal structure contained in the building data and the objects present therein. The visual language model is a model that takes image data and language data as input and outputs at least one of image data and language data. The system obtains output information output from the visual language model in response to the request, and generates map data associated with the internal structure of the building based on the obtained output information.
[0005] According to this disclosure, map data containing desired information can be easily generated from image data of the internal structure of a building.
Attached Figure Description
[0006] The features, advantages, technical and industrial significance of representative embodiments of the present invention will be depicted in the following drawings for reference, wherein the same symbols denote the same elements.
[0007] Figure 1 is a schematic diagram illustrating a structural example of a management system that uses map data generated by the map generation system according to the embodiment.
[0008] Figure 2 is a block diagram illustrating a structural example of the map generation system according to the embodiment.
[0009] Figure 3 is a schematic diagram illustrating an example of single-floor map data of a facility that is input into the map generation system of Figure 2.
[0010] Figure 4 is a diagram illustrating an example of a table showing the correspondence between waypoint attributes and actions.
[0011] Figure 5 is a schematic diagram illustrating an example of single-floor map data of a facility that the map generation system of Figure 2 outputs when the map data of Figure 3 is input.
[0012] Figure 6 is a flowchart illustrating an example of the map generation method according to the embodiment.
Detailed Implementation
[0013] Although the invention is described below by way of embodiments, the invention as defined in the claims is not limited to the embodiments described below. Furthermore, not all of the structures described in the embodiments are necessarily essential as means to solve the problem.
Implementation
[0014] The map data generated in the map generation system (hereinafter, this system) according to this embodiment can be used, for example, in a management system for managing autonomous mobile bodies, as described below.
Example of the general structure of a management system
[0015] Figure 1 is a schematic diagram illustrating the structure of the management system 1. The management system 1 includes a management device 100, robots 200, cameras 500, a network 600, user terminals 400, and auxiliary units 700. The management system 1 is a system for managing one or more robots 200. The management device 100 manages the movement and tasks of the multiple robots 200.
[0016] Robot 200 is an autonomous mobile body that performs tasks such as transport. Robot 200 moves autonomously within medical and welfare facilities such as hospitals, rehabilitation centers, nursing facilities, and elderly care facilities. Robot 200 is used to transport items such as medicines, medical equipment, meals, tableware, medical records, spare supplies, specimens, and linen products, and can also transport people, including patients. Furthermore, the management system 1 can also be used in commercial facilities such as shopping malls. Robot 200 has wheels, a chassis, motors, sensors, batteries, and controllers. At least one of the robots 200 can be of a different type from the others, or all of the robots 200 can be of the same type. Each robot 200 is assigned a unique identification number (ID). Although Figure 1 shows three robots 200, the number of robots is not specifically limited; it only needs to be one or more.
[0017] Furthermore, at least one of the robots 200 can also perform tasks other than the delivery task. These other tasks include cleaning, security, and guidance. The robot 200 can perform multiple tasks such as cleaning, security, and guidance using the auxiliary unit 700, or it can perform tasks independently. The robot 200 can perform various tasks, for example, by combining the auxiliary unit 700 with itself. The robot 200 can also prepare different auxiliary units depending on the task. By replacing the auxiliary unit 700, the robot 200 becomes a multi-tasking robot capable of performing multiple tasks.
[0018] When performing a transport task, the auxiliary unit 700 is a wheeled trolley or delivery vehicle for carrying the transported items. When performing a cleaning task, the auxiliary unit 700, unlike the illustrated example, includes a vacuum cleaner for sucking up trash and other debris. When performing a security task, the auxiliary unit 700, unlike the illustrated example, includes sensors such as LiDAR (registered trademark, hereinafter the same) and a camera. The following description primarily focuses on the scenario where robot 200 performs a transport task.
[0019] User U1 or User U2 can use User Terminal 400 to perform tasks such as entrusting the delivery of goods. For example, User Terminal 400 is a tablet computer or a smartphone. User Terminal 400 only needs to be an information processing device capable of wireless or wired communication.
[0020] Robot 200 and user terminal 400 are connected to management device 100 via network 600. Network 600 is a wired or wireless LAN (Local Area Network) or WAN (Wide Area Network). Management device 100 is also connected to network 600 via wired or wireless means. Communication between devices can be achieved, for example, using common communication standards such as Wi-Fi (registered trademark).
[0021] Various signals sent from user terminals 400 of users U1 and U2 are transmitted to management device 100 via network 600 and then forwarded from management device 100 to robot 200. Similarly, various signals sent from robot 200 are transmitted to management device 100 via network 600 and then forwarded from management device 100 to user terminals 400. Management device 100 is a server connected to each device, which collects data from each device. Furthermore, management device 100 is not limited to a single physical device and may have multiple devices implementing distributed processing. In addition, management device 100 may be configured to be distributed among edge devices such as robot 200. For example, part or all of management system 1 may be mounted on robot 200.
[0022] The robot 200 includes a drive motor, wheels, and a battery. It also includes a camera, LiDAR sensors, and a processing unit. The robot 200 infers its own position based on sensor detection results. Based on its position, the robot 200 autonomously moves along a route on a map from its starting point to its destination. The starting point becomes the robot's current position, and the destination becomes the target for transporting the goods. Alternatively, the source of the goods can be used as a route point for pathfinding. Furthermore, the route search can be performed either by the management device 100 or by the robot 200 itself.
[0023] User terminal 400 and robot 200 can also transmit and receive signals without going through management device 100. For example, user terminal 400 and robot 200 can also directly transmit and receive signals via wireless communication. Furthermore, management device 100 can also collect data from camera 500. Camera 500 is a surveillance camera or a burglar alarm camera, etc. Additionally, management device 100 can also collect data from communication devices or sensors (not shown).
[0024] The facility is designed to utilize multiple types of robots 200. A management device 100 assigns tasks to each robot 200. Each robot 200 can also mount an auxiliary unit 700 corresponding to its assigned task and perform that task. The task performed by each of the three robots 200 can be input by user U1 or user U2, or it can be pre-scheduled. For example, user U1 or others can delegate tasks by operating a user terminal 400. User U1 or others can input the type of task to be performed. User U1 or others can also input the area and time period for the task. The management device 100 creates a schedule for the robots 200 to perform tasks efficiently.
[0025] Users U1 or U2 can also operate the user terminal 400 to delegate delivery tasks. In this case, users U1 or U2 input information related to the delivery item. Furthermore, users U1 or U2 can also input predetermined arrival information indicating the scheduled arrival time of the delivery item. The management device 100 allocates robots to perform delivery tasks based on the predetermined arrival information. Moreover, the management device 100 sends control signals for the robots to perform tasks. These control signals may include delivery item information such as the route to the destination and the delivery item itself.
[0026] In this overall structure, the management system 1 can be constructed as a whole by distributing its various elements across the robot 200, user terminal 400, and management device 100. Furthermore, it can also be constructed by concentrating the essential elements for transporting the goods into a single device.
[0027] The management device 100 includes a server computer or similar device to perform calculations for controlling and managing the robots 200. The management device 100 can be installed as a program-executing device, such as a central processing unit (CPU) of a computer. Furthermore, the functions described later can also be implemented through a program. The management device 100 manages each robot 200 based on internally stored map data, the delivery item ID, and the robot ID of the robot 200.
[0028] For example, the management device 100 manages the schedules of multiple robots 200 in a way that enables the robot 200 to perform tasks efficiently. For example, when the management device 100 receives a task assignment from a user terminal 400, it selects one robot 200 from the multiple robots 200 and issues instructions to the robot 200 to perform the task. Alternatively, the management device 100 instructs the robot 200 to use an auxiliary unit 700.
[0029] The map data used in management system 1 may include waypoints that correspond to locations or areas on the map. Attributes may also be set for these waypoints. Based on the map data or waypoints, at least one of the following can be set: actions to be taken by a moving vehicle, and passage distinction information such as whether the vehicle can pass or its priority.
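As an illustration only, the relationship between map data, waypoints, attributes, and passage distinction information described above could be represented as follows. This is a minimal sketch: the class names, field names, and passage values (`Waypoint`, `MapData`, `Passage`, `priority`) are assumptions made for the example and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum


class Passage(Enum):
    """Hypothetical passage distinction values for a location."""
    ALLOWED = "allowed"
    PROHIBITED = "prohibited"


@dataclass
class Waypoint:
    """A waypoint tied to a location on the map (all field names are assumptions)."""
    wp_id: str
    x: float
    y: float
    # More than one attribute may be assigned; "general" is the default.
    attributes: set[str] = field(default_factory=lambda: {"general"})
    passage: Passage = Passage.ALLOWED
    priority: int = 0


@dataclass
class MapData:
    """Map data for one floor, holding waypoints keyed by their ID."""
    floor: str
    waypoints: dict[str, Waypoint] = field(default_factory=dict)

    def add(self, wp: Waypoint) -> None:
        self.waypoints[wp.wp_id] = wp

    def with_attribute(self, name: str) -> list[Waypoint]:
        """Return every waypoint that carries the given attribute."""
        return [wp for wp in self.waypoints.values() if name in wp.attributes]


floor_map = MapData(floor="3F")
floor_map.add(Waypoint("wp1", 2.0, 5.5))
floor_map.add(Waypoint("wp2", 8.0, 5.5, attributes={"general", "charger"}))
chargers = floor_map.with_attribute("charger")
```

Keeping attributes as a set reflects the point that a waypoint may carry several attributes at once.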
[0030] The map data is data representing a floor plan of the facility (also simply called a map). This map data may also include information related to restricted driving areas or waypoints. Alternatively, the map data may not be a floor plan of the entire facility, but rather data representing a portion of the area where services are scheduled to be performed. Each robot 200 autonomously travels to its destination by referring to the map data.
[0031] Map data is data generated by this system. Map data can be generated based on, for example, architectural drawings of a facility, image data obtained from cameras installed inside the facility, distance sensor measurement results, or other building data, or based on a combination of these. Architectural drawing data can be image data of architectural drawings obtained from paper media, or it can be CAD data, BIM data, or image data converted to PDF (registered trademark, hereinafter the same). CAD is an abbreviation for Computer-Aided Design, and PDF is an abbreviation for Portable Document Format. Distance sensor measurement results are an example of image data obtained while a robot 200 equipped with the sensor, another robot, or a person moves inside a building. Distance sensors can be, for example, LiDAR, depth sensors, stereo cameras, radar, or combinations thereof. Map data is not limited to two-dimensional map data; it can also be three-dimensional map data. The measurement data that forms the basis of map data is, in the case of a LiDAR ranging sensor for example, two-dimensional or three-dimensional point cloud data obtained using LiDAR.
Example of the system structure
[0032] The system that generates the map data described above is explained with reference to Figure 2. Figure 2 is a block diagram illustrating a map generation system 10, which serves as a structural example of this system.
[0033] The map generation system 10 can be configured as a computer and includes a processing unit 11 consisting of a processor and memory, a storage unit 12 consisting of a storage device, a communication unit 13 for communicating with external devices, an operation unit 14 for receiving user operations, and a display unit 15 for displaying information. The communication unit 13 has a communication interface. Furthermore, the map generation system 10 can also be configured as a distributed system where some functions are distributed across multiple devices.
[0034] The storage unit 12 stores the learning model 12a in a state accessible from the processing unit 11. The learning model 12a is an example of a learned model that has undergone machine learning and includes at least a Vision-Language Model (VLM). The type of VLM is not limited. If the learning model 12a includes a machine learning model other than a VLM, the model can be configured as a model for pre-processing, intermediate processing, or post-processing of the VLM, and its algorithm only needs to be a model capable of working in conjunction with the VLM to perform the requested processing.
[0035] The map generation system 10 inputs the building data described above, which includes at least image data of the internal structure of a building such as the aforementioned facility, together with a request, into the learning model 12a or the VLM contained in the learning model 12a.
[0036] A VLM is a model that takes image data and language data as input and outputs at least one of image data and language data. A VLM can also be generative AI (Artificial Intelligence) such as ChatGPT (registered trademark), but is not limited to this. Building data includes, for example, the various architectural drawings mentioned above, image data obtained by cameras installed inside the facility, or a plurality of such data, and can also be referred to as building drawing data. Furthermore, building data may also include measurement result data from distance sensors.
[0037] The request can be received through the operation unit 14 and includes a requirement to generate first information, which is information related to at least one of the internal structure and objects contained in the building data. The first information can be, for example, a predetermined keyword or a predetermined icon. The request can also be a request pre-stored in the storage unit 12, in which case the request is read and input into the VLM. More specifically, the request may read, for example, as follows: "Read the internal structure and objects of the building contained in the building data input together with this request, extract the structures and installed objects that correspond to walls, passageways, passageways of a predetermined width or less, stairs, elevators, and installations, and output map data in which waypoints corresponding to their attributes are set." The request may also include waypoint definitions and request the setting of waypoints that do not match them. Alternatively, the request may include waypoint definitions and request only the setting of general waypoints.
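The composition of such a request can be sketched as a small text builder. This is only an illustration under stated assumptions: the function name `build_request`, its parameters, and the exact wording of the generated text are hypothetical, and the disclosure does not prescribe any particular request format.

```python
def build_request(targets, waypoint_definitions=None):
    """Compose a natural-language request for a VLM (hypothetical format).

    targets: names of structures or objects to extract, e.g. "walls".
    waypoint_definitions: optional mapping of waypoint attribute name to
    a short definition that is appended to the request.
    """
    lines = [
        "Read the internal structure and objects of the building in the "
        "attached building data.",
        "Extract the items that correspond to: " + ", ".join(targets) + ".",
        "Output map data in which waypoints corresponding to their "
        "attributes are set.",
    ]
    if waypoint_definitions:
        lines.append("Waypoint definitions:")
        for name, definition in waypoint_definitions.items():
            lines.append(f"- {name}: {definition}")
    return "\n".join(lines)


request = build_request(
    ["walls", "narrow passages", "stairs", "elevators"],
    {"general": "a location along a movement route"},
)
```

A request built this way could equally be stored in advance and read back when needed, as the paragraph above describes for the storage unit 12.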
[0038] The processing unit 11 of the map generation system 10 acquires output information from the VLM in response to the input building data and request, and generates map data associated with the internal structure of the building based on the acquired output information. The output information only needs to be information representing the same type as the first information, and only needs to be information extracted from the building data that matches the first information. The output information can be, for example, keywords or icons processed in the management system 1 such as abbreviations or category IDs related to a part of the internal structure or an object. A part of the internal structure includes walls, elevators, etc., hereinafter referred to as a structure. An object can refer to an object installed in the building. The output information includes location information such as coordinates corresponding to the building data.
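To make the shape of the output information concrete, the sketch below assumes that the VLM returns JSON text in which each entry has a category keyword and coordinates. That JSON layout, the `OutputItem` class, and the `parse_output` function are assumptions for illustration; the disclosure only states that the output information contains keywords, icons, or IDs together with location information.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputItem:
    """One extracted structure or object with its location (hypothetical)."""
    category: str
    x: float
    y: float


def parse_output(raw: str) -> list[OutputItem]:
    """Turn assumed JSON output from a VLM into typed items.

    Entries that lack a category or usable coordinates are skipped.
    """
    items = []
    for entry in json.loads(raw):
        try:
            items.append(
                OutputItem(str(entry["category"]), float(entry["x"]), float(entry["y"]))
            )
        except (KeyError, TypeError, ValueError):
            continue
    return items


sample = (
    '[{"category": "wall", "x": 0, "y": 3.5},'
    ' {"category": "elevator", "x": 12, "y": 4},'
    ' {"category": "stairs"}]'
)
items = parse_output(sample)
```

Skipping incomplete entries is one possible design choice; a real system might instead flag them for review.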
[0039] The processing unit 11 assigns output information to the original building data or image data generated based on the building data, and generates and outputs map data corresponding to the user's request. The output target can be an external device via the communication unit 13, a display unit 15, storage in the storage unit 12, or a combination thereof.
[0040] Image data generated from building data can be configured to be generated according to an input request. Requirements related to image data generation may include, for example, requirements for processing building data such as deleting redundant line segments and converting location information into map data in a format used in management system 1. Furthermore, requirements related to image data generation may be replaced by requirements for generating other types of data. For example, such requirements may include requirements for generating a database that establishes a correspondence between coordinates representing locations and descriptive text describing the structure or entity.
[0041] The generated map data can be associated with keywords or icons related to the output information, such as parts of the internal structure or objects, based on location information, or with the original building data or image data generated from the building data.
[0042] Thus, the output map data can be map data used for reference when the robot 200 moves, and the output information is set in a way that establishes a correspondence with the positions on the map data. Here, the setting that establishes a correspondence with the positions can be either assigning position information such as coordinate information, or updating the position information set in the building data.
[0043] Next, an example of map data generation in the map generation system 10 is explained with reference to Figures 3 to 5. Figure 3 is a schematic diagram illustrating an example of single-floor map data of a facility that is input into the map generation system 10. Figure 4 is a diagram illustrating an example of a table showing the correspondence between waypoint attributes and actions. Figure 5 is a schematic diagram illustrating an example of single-floor map data of a facility that the map generation system 10 of Figure 2 outputs when the map data of Figure 3 is input.
[0044] As an example of building data input into the learning model 12a or the VLM it contains, the architectural drawing data 1000a for a single floor illustrated in Figure 3 is used for explanation.
[0045] Architectural drawing data 1000a represents the drawing data used during the construction of a specific floor within a facility. The illustrated facility is, for example, a hospital. As shown in Figure 3, this floor has employee workstations Ss1, Ss2, Ss3, Ss4, stairs St, elevator EV, etc. as structures, and multiple installed objects such as tables or shelves T1, T2, etc.
[0046] The first information included in the request, that is, the information to be obtained as output information, can be set as information related to at least one of the structures and objects that affect the movement of robot 200 (hereinafter referred to as second information). The second information may include, for example, one or more of the following: waypoints representing locations through which the robot 200 passes, narrow passages, walls, pedestrian-restricted areas, pedestrian-accessible areas, robot 200-restricted areas, robot 200-accessible areas, obstacles, etc. The locations represented by waypoints may also be referred to as relay points.
[0047] Information related to waypoints can include the definition and description of the waypoint, such as the definition and description of waypoint attributes (hereinafter referred to as WP attributes) that are attributes of the waypoint.
[0048] Here, waypoints are explained. Table 40 describes the WP attributes set for a waypoint and the corresponding actions taken at that waypoint when the map data is used in management system 1. More than one attribute can be assigned to a waypoint. Specifically, in addition to the general WP attribute, one or more other attributes can be assigned.
[0049] General waypoints are waypoints that represent locations along a movement route. These waypoints can be either start or end points. Examples of candidate waypoints include, but are not limited to, locations near exits, narrow passages, and elevators. General waypoints serve as transit points along a planned route. The robot 200 moves autonomously through general waypoints towards the next waypoint. For example, after each waypoint is set as a general waypoint in the map data, the other WP attributes shown in Table 40 can be set by overwriting or appending them. WP attributes can also be attributes corresponding to the configuration and type of surrounding rooms.
[0050] When a charger is set as a WP attribute, the available actions include connecting to the charger, disconnecting from the charger, and relative position correction by recognizing a mark on the charger. For example, when the remaining battery level of robot 200 falls below a certain value, robot 200 moves towards the charger's waypoint. When robot 200 reaches the charger's waypoint, it performs relative position correction. For example, a mark is attached to the charger, and relative position correction is performed by photographing the mark using robot 200's camera. Then, robot 200 connects to the charger to begin charging. When charging is complete, robot 200 disconnects from the charger. The remaining WP attributes are explained more briefly below.
[0051] When a return point is set as a WP attribute, the available action is for the robot 200 to identify its own position using markers provided at various locations within the facility. When a door is set as a WP attribute, the available action is automatic door opening. When an elevator (EV) is set as a WP attribute for floor switching, the available action is switching the map to that of the destination floor. When an EV is set as a WP attribute for boarding, the available actions include calling the EV car, detecting people and obstacles inside the car, boarding the elevator, and emitting a sound announcing that the robot is boarding or is temporarily declining to board. When an EV is set as a WP attribute for alighting, the available actions include getting off the elevator and emitting a sound calling for caution as the robot gets off. When a trolley is set as a WP attribute, the available actions include relative position correction with respect to the trolley based on recognition of markers on the trolley, moving under the trolley, and lifting the trolley. When a trolley is set as a WP attribute for unloading, the available actions include obstacle detection at the trolley's placement location, movement to the unloading location, and lowering the trolley. When queue waiting is set as a WP attribute, the available action is to wait until entry permission is received from the management device 100 (which acts as a server) or from the preceding robot 200. When a right-of-way is set as a WP attribute for waiting, the available action is to request the right of way for the right-of-way area at a waypoint near that area and wait until entry permission is received from the server. When a right-of-way is set as a WP attribute for release, the available action is to notify the server that the robot has passed through the right-of-way area.
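The correspondence between WP attributes and actions can be pictured as a lookup table. The sketch below is a partial, hypothetical rendering based only on the prose above: the contents of Table 40 itself are not reproduced in the text, so the keys, action names, and the `actions_for` helper are all assumptions.

```python
# Hypothetical excerpt of an attribute-to-action table; the identifiers are
# illustrative and are not taken from Table 40.
WP_ACTIONS = {
    "general": ["pass_through"],
    "charger": [
        "correct_relative_position",
        "connect_charger",
        "disconnect_charger",
    ],
    "door": ["open_door"],
    "queue_waiting": ["wait_for_entry_permission"],
    "right_of_way_waiting": ["request_right_of_way", "wait_for_entry_permission"],
}


def actions_for(attributes):
    """Collect the actions for a waypoint's attributes, in order, without repeats.

    Unknown attributes contribute no actions.
    """
    result = []
    for attribute in attributes:
        for action in WP_ACTIONS.get(attribute, []):
            if action not in result:
                result.append(action)
    return result


door_actions = actions_for(["general", "door"])
waiting_actions = actions_for(["queue_waiting", "right_of_way_waiting"])
```

Because a waypoint can carry several attributes, the helper merges their actions and drops duplicates such as a shared wait action.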
[0052] When a request containing information related to the waypoints of robot 200 as the first information is input, the processing unit 11 inputs the request into the learning model 12a. The processing unit 11 then obtains, as the output information of the VLM, the location information of each place in the architectural drawing data 1000a, input together with the request, that corresponds to a waypoint, and generates map data in which that location information is set.
[0053] The map data generated from the architectural drawing data 1000a and the request is, for example, the map data 1000b illustrated in Figure 5. Map data 1000b is an example generated when a request containing, as the first information, information on waypoints defined near narrow passages is input. In map data 1000b, information on waypoints WP is assigned to the locations indicated by black dots. The first information can also simultaneously include information on the waypoints defined for the various WP attributes illustrated in Table 40. Thus, the map generation system 10 can output map data either when building data and a request containing only one type of information as the first information are input, or when a request containing multiple types of information as the first information is input.
[0054] Furthermore, if information other than waypoint-related information is set as the first information, map data with that information can also be generated at the location corresponding to that information. To give a simpler example, when the first information includes information representing walls, the generated map data is then assigned the information that each wall is a wall.
[0055] Furthermore, candidate waypoints can also be locations set in correspondence with areas divided by a segmentation algorithm such as a Voronoi segmentation algorithm. In this case, the processing unit 11 inputs the output information from the VLM into a segmentation algorithm provided in the learning model 12a at the stage after the VLM. The processing unit 11 then performs segmentation processing based on this output information, sets waypoints at predetermined locations in each of the segmented areas, and outputs the map data in which they are set. For example, if first information containing information representing walls is input, the VLM can output the wall information as part or all of the output information, and the segmentation algorithm can perform its processing using this output information as a key. This processing is performed on the building data input to the VLM or on the map data output from the VLM, and the map data can be output in a form that includes information about the areas segmented based on the information representing walls. Through such processing, waypoints serving as transit points can be set at predetermined locations in each area, such as the center of the area or the boundary with adjacent areas. Furthermore, by including not only walls but also other types of information in the first information, attributes related to the waypoints in each area, or the actions that can be taken there, can also be set. Additionally, the segmentation algorithm can also be a machine learning model trained using, as teaching data, the results of area segmentation performed on various types of building data.
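The idea of dividing a floor into areas using wall information and then placing one waypoint at the center of each area can be sketched as follows. Note the substitution: this example does not implement Voronoi segmentation; a simple flood fill over a grid stands in for the segmentation algorithm so that the example stays self-contained. The grid encoding (`#` for wall cells) and both function names are assumptions.

```python
from collections import deque


def segment_regions(grid):
    """Label 4-connected free areas of a grid; '#' cells are walls."""
    rows, cols = len(grid), len(grid[0])
    labels = [[None] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "#" or labels[r][c] is not None:
                continue
            region_id = len(regions)
            cells = []
            queue = deque([(r, c)])
            labels[r][c] = region_id
            while queue:
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and grid[nr][nc] != "#" and labels[nr][nc] is None:
                        labels[nr][nc] = region_id
                        queue.append((nr, nc))
            regions.append(cells)
    return regions


def region_waypoints(regions):
    """Place one waypoint at the free cell nearest each region's centroid."""
    waypoints = []
    for cells in regions:
        mean_r = sum(r for r, _ in cells) / len(cells)
        mean_c = sum(c for _, c in cells) / len(cells)
        nearest = min(
            cells, key=lambda rc: (rc[0] - mean_r) ** 2 + (rc[1] - mean_c) ** 2
        )
        waypoints.append(nearest)
    return waypoints


# Two 3x3 rooms separated by a wall; coordinates are (row, column).
grid = [
    "#########",
    "#...#...#",
    "#...#...#",
    "#...#...#",
    "#########",
]
regions = segment_regions(grid)
waypoints = region_waypoints(regions)
```

For this grid the two rooms become two regions, and each receives a waypoint at its central cell. Boundary waypoints between adjacent areas, also mentioned above, are not covered by this sketch.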
[0056] Furthermore, if the rules by which a segmentation algorithm divides various types of building data into areas are known, those rules can be included in the request as part or all of the definition and description of the waypoints, that is, as information related to the waypoints. In that case, the waypoint information can be obtained as output information from the VLM even if the segmentation algorithm is not included in the learning model 12a.
Summary of the processing of this system
[0057] As exemplified by the processing of the map generation system 10, this disclosure also includes a method in which a computer generates map data as described above. The map generation method is explained briefly with reference to Figure 6, but the various application examples given for this system also apply to it. Figure 6 is a flowchart illustrating an example of the map generation method according to this embodiment.
[0058] In this map generation method, the computer exemplified by the map generation system 10 first inputs building data as image data into the VLM (S1), and also inputs a request into the VLM (S2). The order of S1 and S2 is not limited, and they can also be performed simultaneously. Next, the computer executes the computation in the VLM (S3) and obtains the output information output from the VLM in response to the request (S4). Then, the computer generates map data associated with the internal structure of the building based on the obtained output information (S5), and the processing ends.
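Steps S1 to S5 can be outlined in code as below. This is a sketch under explicit assumptions: the VLM is represented by an arbitrary callable that takes image bytes and a request string and returns JSON text, and `stub_vlm` is a stand-in that returns a fixed answer so the example runs without any model. The function names and the JSON layout are hypothetical.

```python
import json
from typing import Callable


def generate_map_data(
    building_image: bytes,
    request: str,
    vlm: Callable[[bytes, str], str],
) -> dict:
    """Run the S1 to S5 flow with a pluggable VLM callable (hypothetical API)."""
    # S1 and S2: the building image data and the request are input to the VLM.
    # S3: the VLM performs its computation inside the call.
    raw_output = vlm(building_image, request)
    # S4: obtain the output information returned in response to the request.
    output_items = json.loads(raw_output)
    # S5: generate map data by setting each item at its location.
    return {
        "waypoints": [
            {"x": item["x"], "y": item["y"], "attributes": [item["category"]]}
            for item in output_items
        ]
    }


def stub_vlm(image: bytes, request: str) -> str:
    """Stand-in for a real VLM; always returns one general waypoint."""
    return json.dumps([{"category": "general", "x": 4.0, "y": 2.5}])


map_data = generate_map_data(
    b"<image bytes>",
    "Set general waypoints near narrow passages.",
    stub_vlm,
)
```

Passing the model in as a callable keeps the flow independent of any particular VLM, in line with the statement that the type of VLM is not limited.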
[0059] Furthermore, this disclosure also includes a program for causing a computer to execute the processes shown in such a map generation method. Additionally, some or all of the processes in the robot 200, the management device 100, and so on described above can also be implemented as programs. These programs can be stored on, and supplied to a computer using, various types of non-transitory computer-readable media. Non-transitory computer-readable media include various types of tangible recording media. Examples of non-transitory computer-readable media include magnetic recording media (e.g., floppy disks, magnetic tapes, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROMs (Read Only Memory), CD-Rs, CD-R/Ws, and semiconductor memories (e.g., mask ROMs, PROMs (Programmable ROMs), EPROMs (Erasable PROMs), flash ROMs, and RAMs (Random Access Memory)). Furthermore, programs can also be supplied to a computer via various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. Transitory computer-readable media can supply programs to a computer via wired communication paths, such as electric wires and optical fibers, or via wireless communication paths.
Effects of this embodiment
[0060] According to this embodiment, map data containing the desired information that affects the movement of an autonomous moving body can be generated easily, and without a large amount of time, from image data of the building's internal structure. Furthermore, because information about the building's interior is generated automatically by the VLM, map data can be generated even without moving an autonomous moving body within a simulated environment, that is, even without measurement result data. Therefore, regardless of whether the building data includes measurement result data, this embodiment avoids the situation in which map information cannot be obtained for areas of the simulated environment where the autonomous moving body cannot move. Even when the building data consists only of measurement result data, this embodiment can output map data in which waypoints and other information are set within a range that minimally affects the movement of the autonomous moving body. Furthermore, since the VLM can automatically generate information about the building's interior, including information that cannot be read directly from pre-prepared data such as BIM data, the time required to obtain additional information about the building's interior can be reduced.
Other application examples
[0061] The present invention is not limited to the above-described embodiment and can be modified as appropriate without departing from the spirit of the invention.
[0062] For example, the generated map data can also be used for purposes other than the movement of an autonomous moving body such as the robot 200. Although Figures 3-5 assume that the input building data and the generated map data are two-dimensional data, this data can also be three-dimensional, as explained regarding the map data. With three-dimensional map data, attributes relating to ceilings and floors can be set for waypoints. For example, attributes can be set such as the ceiling having a vent, the ceiling being below a predetermined height, the floor having guide lights, and whether the floor material is a predetermined material. Accordingly, based on the height of the robot 200, an action such as prohibiting passage can be set at a waypoint whose attributes include a ceiling below a predetermined height. Likewise, an action such as prohibiting parking can be set at a waypoint whose attributes include a floor with guide lights. In addition, three-dimensional map data can also be used for the operation of autonomous flying bodies.
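The attribute-based actions described in paragraph [0062] can be sketched as a simple rule table. In this illustration the `Waypoint` class, its field names, the action labels `no_passage` and `no_parking`, and the clearance margin are all hypothetical. The disclosure states only that such attributes and actions can be set, not how they are represented.

```python
from dataclasses import dataclass, field


@dataclass
class Waypoint:
    """Hypothetical waypoint record with ceiling and floor attributes."""
    name: str
    ceiling_height_m: float
    floor_has_guide_lights: bool = False
    actions: set = field(default_factory=set)


def assign_actions(waypoints, robot_height_m, clearance_m=0.1):
    """Attach actions to waypoints based on their attributes.

    clearance_m is an assumed safety margin added to the robot height.
    """
    for wp in waypoints:
        # A ceiling lower than the robot plus margin prohibits passage.
        if wp.ceiling_height_m < robot_height_m + clearance_m:
            wp.actions.add("no_passage")
        # A floor with guide lights prohibits parking on top of them.
        if wp.floor_has_guide_lights:
            wp.actions.add("no_parking")
    return waypoints


route = [
    Waypoint("corridor", ceiling_height_m=2.4),
    Waypoint("duct_underpass", ceiling_height_m=1.2),
    Waypoint("exit_hall", ceiling_height_m=2.6, floor_has_guide_lights=True),
]
assign_actions(route, robot_height_m=1.5)
for wp in route:
    print(wp.name, sorted(wp.actions))
```

Because the passage rule depends on the robot height, the same map data can yield different actions for different moving bodies.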
[0063] Furthermore, although the description assumes that the learning model 12a, or the VLM contained therein, is a trained model, it can also be a model capable of retraining. For example, the learning model 12a can be a VLM that incorporates an open-source mechanism such as RAG (Retrieval-Augmented Generation). In that case, the computation processing unit 11 can update the learning model 12a based on the latest database, because updating the database makes it possible to obtain more accurate output information.
Claims
1. A map generation system, wherein: building data comprising at least image data of the internal structure of a building, and a request, are input into a visual language model, wherein the request includes a requirement to generate first information, the first information is information relating to at least one of the internal structure contained in the building data and the objects present therein, and the visual language model is a model that takes image data and language data as input and outputs at least one of image data and language data; output information output from the visual language model in response to the request is obtained; and map data associated with the internal structure of the building is generated based on the obtained output information.
2. The map generation system as described in claim 1, wherein the first information is information relating to at least one of the internal structure and the objects present therein that affect the movement of a moving body.
3. The map generation system as described in claim 2, wherein the first information includes information related to waypoints of the moving body.
4. The map generation system as described in claim 1 or 2, wherein the map data is map data referenced to enable a moving body to move, and is set so as to establish a correspondence between the output information and positions on the map data.
5. The map generation system as described in claim 1 or 2, wherein the building data is at least one of the following: architectural drawings of the building, image data obtained by a camera installed inside the building, and image data obtained by an autonomous moving body equipped with sensors moving inside the building.