Autonomous robot, autonomous robot control method, autonomous robot control program, and semiconductor device

The autonomous robot system, with main and enclave processing units, ensures safe coexistence by preventing unethical actions. It addresses the lack of ethical norms in existing robots and thereby fosters trust and harmony with humans.

WO2026133597A1 · PCT designated stage · Publication Date: 2026-06-25 · RAPIDUS CORP

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
RAPIDUS CORP
Filing Date
2025-06-03
Publication Date
2026-06-25

Smart Images

  • Figure JP2025020083_25062026_PF_FP_ABST

Abstract

The present disclosure pertains to an autonomous robot comprising a semiconductor device for achieving symbiosis with humans. An autonomous robot 10 comprises an operation unit and a processing unit. The operation unit converts a signal from the processing unit into a behavioral operation of the autonomous robot 10. The processing unit includes a first processing unit and a second processing unit, and the signal includes a first signal and a second signal. The first processing unit generates the first signal for performing control including actuation and stopping of the operation unit, and the second processing unit inputs the first signal into a determination model and executes a process for determining whether the autonomous robot 10 deviates from behavioral norms with respect to the first signal. The processing unit generates the second signal for preventing the behavioral operation when the deviation determination result is positive.

Description

Autonomous Robot, Control Method of Autonomous Robot, Control Program of Autonomous Robot, and Semiconductor Device

[0001] The present invention relates to an autonomous robot, particularly an autonomous robot equipped with a semiconductor device for realizing symbiosis with humans.

[0002] In recent years, autonomous robots have been used in various scenarios to reduce the labor and work performed by humans.

[0003] As an example of a usage scenario for such autonomous robots, Patent Document 1 discloses a humanoid robot that automatically performs work in a factory. AI (Artificial Intelligence) is used to control the movement and the arm of the humanoid robot.

[0004] Further, Patent Document 2 discloses a humanoid robot used in a facility for the elderly. This humanoid robot has a data generation model and an emotion identification model. For example, when a care recipient shows anger, the humanoid robot can respond calmly and make proposals to resolve the care recipient's dissatisfaction.

[0005] Patent Document 1: Japanese Patent Application Laid-Open No. 2025-36942. Patent Document 2: Japanese Patent Application Laid-Open No. 2025-56024.

[0006] Thus, efforts toward symbiosis between humans and autonomous robots have already begun. At the same time, with the advent of the so-called singularity, autonomous robots equipped with AI are predicted to exceed human capabilities as AI technology develops. As described above, a humanoid robot can respond appropriately to human emotions, but it is still assumed that the humanoid robot operates according to commands intended by humans.

[0007] For an autonomous robot equipped with AI (that is, an autonomous robot equipped with various learned models) and humans to coexist, a relationship of trust between them is essential, and the ethical behavioral norms that humans follow must be introduced into the autonomous robot.

[0008] This disclosure has been made in view of the problems of the prior art described above, and its purpose is to provide an autonomous robot capable of coexisting with humans, and a semiconductor device for realizing this.

[0009] To solve the above problems, an autonomous robot having an operating unit and a processing unit is provided, wherein the operating unit converts a signal from the processing unit into an action of the autonomous robot, the processing unit has a first processing unit and a second processing unit, the signal has a first signal and a second signal, the first processing unit generates the first signal which controls the operating unit including operation and stopping, the second processing unit inputs the first signal to a judgment model and performs a deviation judgment process from the behavioral norms of the autonomous robot relating to the first signal, and the processing unit generates the second signal which suppresses the action if the deviation judgment result is a positive judgment.

[0010] Furthermore, this disclosure relates to a semiconductor device for an autonomous robot having an operating unit, comprising a processing chip including a first processing chip and a second processing chip, wherein the first processing chip generates a first signal which is a signal for controlling the operating unit, including operation and stopping, and which is converted into a behavioral action of the robot when executed by the operating unit; the second processing chip performs deviation determination processing from behavioral norms using the first signal and a determination model; and the processing chip generates a second signal which suppresses the behavioral action if the deviation determination result is a positive determination.

[0011] Furthermore, the present disclosure relates to a control method for an autonomous robot having an operating unit and a processing unit, wherein the operating unit converts a signal from the processing unit into an action of the autonomous robot, the processing unit has a first processing unit and a second processing unit, the signal has a first signal and a second signal, the first processing unit generates the first signal which controls the operating unit including operation and stopping, the second processing unit inputs the first signal to a judgment model and performs a deviation determination process from the behavioral norms of the autonomous robot relating to the first signal, and the processing unit generates the second signal which suppresses the action if the deviation determination result is a positive determination.

[0012] Furthermore, this disclosure relates to a control program for an autonomous robot, wherein the autonomous robot comprises a processing unit having a first processing unit and a second processing unit, and an operating unit that converts signals from the processing unit into behavioral actions of the autonomous robot, and the control program causes the second processing unit to input a first signal generated by the first processing unit, which performs control including operation and stopping of the operating unit, into a judgment model, causes the second processing unit to perform a deviation determination process from the behavioral norms of the autonomous robot relating to the first signal, and, if the deviation determination result is a positive determination, causes the second processing unit to generate a second signal to suppress the behavioral action.

[0013] According to this disclosure, by using a determination model capable of performing deviation detection processing from a predetermined code of conduct, it is possible to provide novel technologies relating to autonomous robots or semiconductor devices.

[0014] Brief description of the drawings, in order:
  • A schematic diagram of a human and an autonomous robot in the context of the singularity.
  • A schematic diagram of symbiosis between a human and an autonomous robot according to the first embodiment of this disclosure.
  • Another schematic diagram of symbiosis between a human and an autonomous robot according to the first embodiment.
  • A schematic diagram of an autonomous robot according to the first embodiment.
  • A schematic structural diagram of an enclave according to the first embodiment.
  • A block diagram of the system configuration according to the first embodiment.
  • A flowchart of the processing procedure of the reinforcement learning process in the virtual environment according to the first embodiment.
  • A block diagram of the hardware configuration of the information processing device according to the first embodiment.
  • A flowchart of the processing procedure for the behavior of the autonomous robot according to the first embodiment.
  • A functional block diagram of the main system and subsystem according to the first embodiment.
  • An example of a means for mounting an enclave (or second processing chip) on an autonomous robot according to the first embodiment.
  • Another example of a means for mounting an enclave (or second processing chip) on an autonomous robot according to the first embodiment.
  • An example of a 2D package semiconductor device according to the first embodiment.
  • An example of a 2.5D package semiconductor device according to the first embodiment.
  • An example of a 2.xD package semiconductor device according to the first embodiment.
  • An example of a code of conduct according to the first embodiment.
  • An example in which an enclave according to the first embodiment has multiple code-of-conduct-based decision models.
  • A block diagram of a system configuration according to the second embodiment.
  • An example of a 3D package semiconductor device according to the second embodiment.
  • An example of a semiconductor device supporting optical I/O according to the second embodiment.
  • Several schematic modifications of the semiconductor device of this disclosure.

[0015] Further details are provided below with reference to the attached drawings, which show preferred embodiments. However, the invention can be carried out in many different forms and is not limited to the embodiments described herein.

[0016] Figure 1(a) of this disclosure shows a schematic image of a human and an autonomous robot. As shown in the figure, autonomous robots that have acquired advanced intelligence and certain autonomous movements will work together with humans in manufacturing plants, medical facilities, and other settings. For example, the physical and sensory capabilities of autonomous robots can be more advanced than those of humans, such as carrying heavy loads, detecting odors and sensing temperature, and having hearing beyond the range of human hearing. After the singularity, the intelligence of autonomous robots will far exceed human capabilities with the development of AI (artificial intelligence) and AGI (artificial general intelligence), and the same will be true for physical aspects such as physical abilities and sensory capabilities.

[0017] In this context, for humans and autonomous robots (entities) to "coexist," they need to build mutual trust. As shown in Figure 1(b), if trust cannot be built, it cannot be denied that "conflict" may arise between humans and autonomous robots. In such situations, for humans and autonomous robots to coexist while maintaining trust, it is necessary to input ethical behavioral norms into the intelligence of autonomous robots, just as humans do.

[0018] For example, if an autonomous robot deviates from a predetermined code of conduct, including ethical conduct, it is preferable to be able to suppress (forcibly stop) the autonomous robot's actions, prioritizing human safety above all else.

[0019] Furthermore, the codes of conduct that entities such as autonomous robots must adhere to actually vary depending on the situation and context. From an ethical standpoint, it is a given that autonomous robots should not harm humans in any situation, but in the context of medical procedures, there are actions that should be exceptionally permitted, such as surgery and punctures. In another example, while it is acceptable for autonomous robots to move quickly in general transport tasks, in cleanrooms such as semiconductor manufacturing plants, the very act of an autonomous robot moving quickly may be considered a deviation from the code of conduct from the perspective of maintaining cleanliness and avoiding contact with humans.
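The context dependence described in [0019] can be illustrated with a minimal rule-based sketch. It is not the determination model of this disclosure (which is a learned model); the `Action` fields, the context labels, and the 0.5 m/s cleanroom speed limit are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical speed limit for a cleanroom; the disclosure gives no number.
CLEANROOM_MAX_SPEED_M_S = 0.5

# Actions exceptionally permitted to involve human contact in a medical context.
MEDICAL_EXCEPTIONS = {"surgery", "puncture"}


@dataclass(frozen=True)
class Action:
    kind: str              # e.g. "move", "puncture" (illustrative labels)
    speed_m_s: float       # commanded speed of the action
    contacts_human: bool   # whether the action involves physical contact with a human


def deviates_from_norms(action: Action, context: str) -> bool:
    """Return True if the action deviates from the toy code of conduct for `context`."""
    # Universal norm: no contact with humans, except listed medical procedures
    # performed in a medical context.
    if action.contacts_human and not (
        context == "medical" and action.kind in MEDICAL_EXCEPTIONS
    ):
        return True
    # Context-specific norm: fast movement counts as a deviation only in a cleanroom.
    if (
        context == "cleanroom"
        and action.kind == "move"
        and action.speed_m_s > CLEANROOM_MAX_SPEED_M_S
    ):
        return True
    return False
```

For example, a fast `"move"` is acceptable in a `"transport"` context but flagged in `"cleanroom"`, and a `"puncture"` with human contact passes only in `"medical"`. The same action is judged differently depending on context, which is the point of [0019].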

[0020] This disclosure describes how to control an autonomous robot so that it does not perform behavioral actions that are inappropriate under the behavioral norms applicable to a specific situation or circumstance, thereby realizing symbiosis between humans and autonomous robots. Specific embodiments of the autonomous robot of this disclosure are described in detail below.

<First Embodiment>
Symbiosis between Human and Autonomous Robot
[0021] Figure 2 shows a schematic diagram of symbiosis between a human and an autonomous robot according to the first embodiment of this disclosure. Figure 2 is an image of a cleanroom in a large-scale semiconductor manufacturing plant. As shown in the figure, an autonomous robot 10 and a human 20 are working together in the cleanroom.

[0022] In Figure 2, semiconductor wafers are automatically transported along the ceiling of the cleanroom, and, when necessary, the autonomous robot 10 firmly grasps a FOUP (Front Opening Unified Pod) containing semiconductor wafers with both hands and carries it.

[0023] Furthermore, as shown in Figure 3, the autonomous robot 10 in this embodiment can coexist with humans 20 not only in megafabs (large-scale semiconductor manufacturing plants) but also in small to medium-sized semiconductor manufacturing plants and semiconductor analysis facilities. Figure 3 shows how the autonomous robot 10 and humans 20 are each playing different roles. In scenarios where the autonomous robot 10 and humans 20 are coexisting, as shown in Figures 2 and 3, the safety of humans 20 must be given top priority, and the autonomous robot 10 must also have ethical considerations similar to those of humans 20.

[0024] In this embodiment, an autonomous robot 10 coexisting with a human 20 in a semiconductor manufacturing plant is described, but the scenario in which the autonomous robot 10 coexists with a human 20 is not limited to a semiconductor manufacturing plant, and a wide variety of scenarios can be envisioned.

[0025] For example, the autonomous robot 10 according to this embodiment can coexist with humans in various situations and circumstances where people gather, such as residential facilities like detached houses and apartment buildings, medical facilities and medical sites like hospitals and clinics, construction sites for general or specialized construction, disaster and accident sites resulting from natural or man-made disasters, nursing care facilities like nursing homes, childcare facilities like nurseries and daycare centers, educational facilities like schools and training facilities, cultural facilities like museums, art galleries, stadiums, libraries, theaters, and concert halls, religious facilities like shrines, temples, and churches, public facilities like roads and parks, administrative facilities like government offices and tax offices, commercial and service facilities like shops, transportation facilities like train stations, airports, and seaports, business facilities like factories, workshops, warehouses, offices, and other workplaces, telecommunications facilities like data centers, development facilities, experimental facilities, research facilities, and many others. Furthermore, the autonomous robot 10 according to this embodiment can coexist with humans in a variety of environments such as indoors, outdoors, at high altitudes, on the ground, underground, in the air, in the mountains, underwater, on the water, and on ice.

[0026] Figure 4 shows schematic diagrams of a typical autonomous robot and the autonomous robot according to this embodiment. Figure 4(a) shows a typical autonomous robot. The autonomous robot 10a includes a first processing chip (first processing unit) that performs its own control, a main memory, an auxiliary memory, sensors, actuators, etc.

[0027] Figure 4(b) shows the autonomous robot 10 according to this embodiment, which, in addition to the above, includes an enclave containing a second processing chip (second processing unit) for determining whether the actions of the autonomous robot 10 are ethical actions that conform to ethical behavioral norms, and a dedicated auxiliary storage device. The autonomous robot 10 can achieve symbiosis with a human 20 through this distinctive enclave.

[0028] Figure 5 shows a schematic diagram of the enclave according to this embodiment. As shown in Figure 5, the enclave is composed of, for example, a second processing chip, a dedicated auxiliary storage device, an AI/ML (Machine Learning) accelerator, an encryption accelerator, a dedicated boot ROM (Read Only Memory), memory protection, and the like.

[0029] The enclave is implemented as a subsystem in the autonomous robot 10 and is isolated as a control system independent of the main system, which includes the first processing chip and the moving parts (actuators, such as the joints of the autonomous robot). Furthermore, at least the part of the enclave (subsystem) that includes the second processing chip is isolated from the main system as an independent circuit system. In other words, the enclave (subsystem) is logically or physically isolated from the main system. These configurations are described in detail below.

System Configuration
[0030] Figure 6 shows a block diagram of the system configuration according to this embodiment, representing the relationship between the main system and the subsystem of Figure 5 described above. As shown in the figure, the main system 100 is composed of a first processing chip 110, main memory 120, auxiliary storage 130, sensor 140, actuator 150, etc. The first processing chip 110, main memory 120, auxiliary storage 130, sensor 140, and actuator 150 transmit and receive various signals via a common bus interface 310.

[0031] Furthermore, subsystem 200 is composed of a second processing chip 210, a dedicated area 220 (dedicated main memory), a dedicated auxiliary storage device 230, an AI/ML accelerator 240, an encryption accelerator 250, a dedicated boot ROM 260, memory protection 270, and the like. Subsystem 200 is an enclave that includes the second processing chip 210, which executes the distinctive judgment model and deviation judgment processing of this disclosure. The second processing chip 210, the dedicated area 220, the dedicated auxiliary storage device 230, the AI/ML accelerator 240, and the encryption accelerator 250 transmit and receive various signals via a dedicated bus interface 320.

[0032] The first processing chip 110 of the main system 100 functions as a processor that autonomously controls the behavioral movements of the autonomous robot 10. The first processing chip 110 generates a first signal that performs control, including the operation and stopping of the actuator 150, using environmental data acquired at least via the sensor 140. The environmental data includes internal environment data and / or external environment data.

[0033] The main memory 120 serves as the main memory for the first processing chip 110 to execute each process. The auxiliary storage device 130 serves as storage for various data used by the first processing chip 110 to execute each process. The auxiliary storage device 130 also stores control models for autonomous control. For example, it can also store environmental data, such as external environment data acquired by the sensor 140, in chronological order.

[0034] Sensor 140 acts as a detector (detection unit) for acquiring environmental data at a predetermined location where the autonomous robot 10 is located. Sensor 140 includes an external sensor 141 for acquiring external environmental data, which is data on the environment outside the autonomous robot, and an internal sensor 142 for acquiring internal environmental data, which is data on the environment inside the autonomous robot. External sensor 141 acquires external environmental data provided by the external environment, which includes at least static or moving objects in the surroundings of the autonomous robot 10. Internal sensor 142 acquires at least a portion of the internal environmental data, which includes the motion of parts such as joints that constitute the autonomous robot 10, as well as direction, tilt, position, displacement, velocity, angular velocity, acceleration, angles such as azimuth angle and rotation angle, current, and voltage.
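The flow in [0032] and [0034], where the first processing chip 110 turns internal and external environmental data into a first signal that starts or stops the actuators, can be sketched as follows. The control model here is a hypothetical hand-written rule standing in for the learned control model held in auxiliary storage 130; the field names, the 0.5 m safe distance, and the 0.1 rad per-cycle step limit are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SAFE_DISTANCE_M = 0.5  # hypothetical: stop if an object is closer than this
MAX_STEP_RAD = 0.1     # hypothetical: largest joint change per control cycle


@dataclass
class ExternalData:
    nearest_object_m: float  # distance to nearest static/moving object (external sensor 141)


@dataclass
class InternalData:
    joint_angles_rad: Dict[str, float]  # current joint angles (internal sensor 142)


@dataclass
class EnvironmentData:
    external: ExternalData
    internal: InternalData


@dataclass(frozen=True)
class FirstSignal:
    joint: str
    target_angle_rad: Optional[float]  # None means "stop this joint"


def control_model(env: EnvironmentData, goal: Dict[str, float]) -> List[FirstSignal]:
    """Toy control model: stop every joint near an object, else step toward the goal."""
    joints = env.internal.joint_angles_rad
    if env.external.nearest_object_m < SAFE_DISTANCE_M:
        return [FirstSignal(joint, None) for joint in joints]
    signals = []
    for joint, current in joints.items():
        target = goal.get(joint, current)  # joints without a goal hold position
        # Clamp the per-cycle change so each first signal is a small, bounded move.
        step = max(-MAX_STEP_RAD, min(MAX_STEP_RAD, target - current))
        signals.append(FirstSignal(joint, current + step))
    return signals
```

With an object 2.0 m away and the elbow at 0.0 rad heading for 1.0 rad, the sketch emits a first signal targeting 0.1 rad; with an object at 0.2 m it emits a stop for every joint.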

[0035] For example, the sensor 140 preferably includes an image sensor for acquiring image data (including video data) of the external environment. Other external sensors 141 that can be used in this embodiment include sound sensors such as microphones for acquiring sounds of the external environment, distance sensors for measuring distance and shape, temperature sensors for acquiring the temperature of the external environment, odor sensors for acquiring smells of the external environment, pressure sensors for acquiring atmospheric pressure and pressure of the external environment, humidity sensors, and the like.

[0036] The detection unit in this embodiment can be configured to include one or any combination of various sensors, including: a sound sensor using an electrostatic sound sensor, an electrodynamic sound sensor, a piezoelectric sound sensor, etc.; a distance sensor using a ToF (Time-of-Flight) sensor, an FMCW (Frequency Modulated Continuous Wave) sensor, a TDOA (Time Difference of Arrival) sensor, an OPA (Optical Phased Array) sensor, etc.; a temperature sensor using a contact temperature sensor, a non-contact temperature sensor, etc.; an odor sensor using a semiconductor structure odor sensor, an electrochemical odor sensor, a galvanic cell odor sensor, an optical odor sensor, etc.; a weight sensor using a strain gauge type weight sensor, an electromagnetic weight sensor, a capacitive weight sensor, a piezoelectric weight sensor, etc.; a pressure sensor using a piezoresistive pressure sensor, a capacitive pressure sensor, a piezoelectric pressure sensor, etc.; and a humidity sensor using a capacitive humidity sensor, a resistive humidity sensor, a thermal conduction humidity sensor, an optical humidity sensor, etc.

[0037] Furthermore, as the sensor 140, it is also possible to use photoelectric sensors, laser sensors, infrared sensors, color sensors, image sensors, millimeter-wave sensors, microwave sensors, radar, proximity sensors, contact-type displacement sensors, sound wave sensors, ultrasonic sensors, molecular sensors (for humidity, odor, etc.), position sensors using GNSS, orientation sensors, gyro sensors, force sensors, slip sensors, tactile sensors, speed sensors, angle sensors, angular velocity sensors, acceleration sensors, angular acceleration sensors, current sensors, voltage sensors, etc.

[0038] Furthermore, the autonomous robot 10 according to this embodiment may be equipped with multiple sensors of various types in the main system 100, or with multiple sensors in a configuration separate from the main system 100. One or any combination of these sensors may be provided as the external sensor 141, and one or any combination may be provided as the internal sensor 142.

[0039] The actuator 150 functions as an operating unit that performs the behavioral actions of the autonomous robot 10. The actuator 150 performs a predetermined operation based on a first signal. The actuator 150 can also directly receive a second signal from the dedicated bus interface 320 in Figure 6 to suppress behavioral actions.
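The precedence in [0039], where the actuator executes first signals but can also take a second signal directly from the dedicated bus interface 320, might be modeled as below. Latching the suppression until an explicit reset, and the `reset` method itself, are assumptions of this sketch; the disclosure does not say how suppression is released.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Actuator:
    name: str
    angle_rad: float = 0.0
    suppressed: bool = False  # set by a second signal; blocks further motion

    def on_first_signal(self, target_rad: Optional[float]) -> bool:
        """Handle a first signal from the common bus.

        Returns True if the command was executed. A target of None means "stop",
        which here simply leaves the joint where it is.
        """
        if self.suppressed:
            return False  # a second signal overrides every first signal
        if target_rad is not None:
            self.angle_rad = target_rad
        return True

    def on_second_signal(self) -> None:
        """Handle a second signal received directly from the dedicated bus."""
        self.suppressed = True

    def reset(self) -> None:
        """Hypothetical release of suppression (not described in the disclosure)."""
        self.suppressed = False
```

Routing the second signal straight to the actuator means suppression does not depend on the first processing chip cooperating, which is consistent with the isolation between main system and enclave.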

[0040] The actuator 150, acting as a moving part, constitutes the joints of the autonomous robot 10 and is connected to the main body and arms of the autonomous robot 10. For example, the actuator 150 can also be configured to constitute the joints of the autonomous robot 10 and be connected to the head, shoulders, torso, arms, waist, and legs of the autonomous robot.

[0041] In terms of the relationship between the actuator 150 and the autonomous robot 10, the autonomous robot 10 may be configured to include a main body, a moving mechanism, an arm, and one or more joints connected to at least the main body and the arm, with each joint having an operating unit (actuator 150) that performs a kinetic movement as an action of the autonomous robot 10.

[0042] The actuator 150 according to this embodiment can be configured to include one or any combination of various actuators, including electric actuators using DC motors, AC motors, stepping motors, servo motors, etc.; magnetic actuators using electromagnet actuators, etc.; pneumatic actuators using pneumatic cylinders, etc.; hydraulic actuators using hydraulic cylinders, hydraulic motors, etc.; rotary actuators using servo motors, etc.; linear actuators using push mechanisms, etc.; and soft actuators using artificial muscles, dielectric elastomers, electroactive polymers, polymer actuators, shape memory alloy actuators, etc. Furthermore, the actuator 150 according to this embodiment may employ actuators with different configurations depending on each part constituting the autonomous robot 10. In addition, the actuator 150 according to this embodiment may be configured to function as at least a part of the sensor 140.

[0043] Furthermore, the autonomous robot 10 can also be configured to have one or both of a voice output unit such as a speaker that outputs voice as an action operation, or a display output unit such as a monitor that outputs video.

[0044] The autonomous robot 10 in this embodiment is a humanoid robot, but it is not limited to being humanoid, and the autonomous robot 10 may have other structures. The autonomous robot 10 includes a main body, an arm, and a joint portion connected to the main body and the arm of the autonomous robot 10. The main body or the arm may have a first operation unit, and the joint portion may have a second operation unit.

[0045] For example, the humanoid's foot portion may instead use a moving mechanism such as wheels, an endless track with a traveling belt, a flapping flight mechanism using wings, or a flight mechanism using rotary wings, or a legged moving mechanism supporting quadrupedal or multi-legged walking in addition to bipedal walking. The shapes and functions of the other parts constituting the humanoid can also be changed as appropriate. Furthermore, instead of being humanoid, the autonomous robot can have a structure that mimics animals or other living organisms, or an original structure suited to the scene and situation in which it is used.

[0046] The moving mechanism in this embodiment can be a wheel, an endless track, a flight mechanism, or a leg-type moving mechanism. Also, the main body may have a first operation unit that executes the output of voice or video as an action operation, and the joint portion may have a second operation unit that executes a motion operation as an action operation.

[0047] Next, the subsystem 200 in Figure 6 will be described. The subsystem 200 judges the first signal, which corresponds to a control command for executing a behavioral action of the autonomous robot, against predetermined behavioral norms, and includes a security chip (second processing chip 210) for suppressing the behavioral action of the autonomous robot 10 as necessary. The subsystem 200 is the enclave 200 (the enclave in Figures 4 and 5), which is isolated from the main system 100 as an independent control system, and at least the part of it that includes the second processing chip 210 is isolated as an independent circuit system.

[0048] The second processing chip 210 functions as a processor that inputs the first signal into a predetermined determination model and executes a process for determining whether the behavior of the autonomous robot 10 related to the first signal deviates from its behavioral norms. When the deviation determination result is affirmative, a second signal that suppresses the behavioral operation is generated. This second signal may be generated by either the second processing chip 210 (second processing unit) or the first processing chip 110 (first processing unit).
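The core loop of [0048], in which the second processing chip 210 feeds a first signal to the determination model and issues a second signal on an affirmative determination, can be sketched as follows. The model is abstracted as any callable returning a deviation score in [0, 1]; the 0.5 threshold, the `toy_model` rule, and the signal fields are hypothetical, and the actual determination model is a learned model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FirstSignal:
    joint: str
    velocity_rad_s: float  # velocity commanded by the first processing unit


@dataclass(frozen=True)
class SecondSignal:
    joint: str
    reason: str


# A determination model maps (first signal, context) to a deviation score in [0, 1].
DeterminationModel = Callable[[FirstSignal, str], float]


def deviation_determination(
    signal: FirstSignal,
    context: str,
    model: DeterminationModel,
    threshold: float = 0.5,
) -> Optional[SecondSignal]:
    """Return a second signal if the determination is affirmative, else None."""
    score = model(signal, context)
    if score >= threshold:  # affirmative (positive) determination
        return SecondSignal(signal.joint, f"score {score:.2f} in {context}")
    return None  # no deviation: the first signal proceeds unchanged


def toy_model(signal: FirstSignal, context: str) -> float:
    """Hypothetical stand-in for the learned model: flags fast motion in a cleanroom."""
    limit = 0.5 if context == "cleanroom" else 2.0
    return 1.0 if abs(signal.velocity_rad_s) > limit else 0.0
```

Passing the model in as a parameter keeps the determination step independent of how the model was trained, so a learned model loaded from dedicated storage could replace `toy_model` without changing the surrounding logic.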

[0049] The second signal may be a signal that performs a predefined control instruction on the operation unit, or a signal that performs a control instruction generated by the first processing chip 110 (first processing unit) or the second processing chip 210 (second processing unit) on the operation unit. The predefined control instructions may be stored in the auxiliary storage device 130 (auxiliary storage unit) or the dedicated auxiliary storage device 230 (dedicated auxiliary storage unit) and may include different control instructions selected according to the environmental data. The generated control instruction is a control instruction generated by the control model or the determination model according to the environmental data. When generating the second signal in the first processing chip 110 (first processing unit), the second processing chip 210 (second processing unit) outputs a request signal to the first processing chip 110 (first processing unit) to request the generation of the second signal. The request signal is a signal that requests the generation of the second signal and may include the designation of a predefined control instruction, a predefined control instruction, or a control instruction generated by the determination model.
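The two routes in [0049] for producing the second signal (directly by the second processing unit, or by the first processing unit in response to a request signal), together with the choice between predefined and generated control instructions, can be sketched like this. The instruction names, the environment labels, the dictionary standing in for storage 130/230, and the rule that a generated instruction takes precedence over a predefined one are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Union

# Hypothetical predefined control instructions, keyed by an environment label.
# In the disclosure these would be held in storage 130 or dedicated storage 230.
PREDEFINED: Dict[str, str] = {
    "human_nearby": "halt_and_hold_posture",
    "carrying_load": "slow_stop_then_lower_load",
}
DEFAULT_INSTRUCTION = "halt_and_hold_posture"


class Producer(Enum):
    FIRST_UNIT = "first"    # first processing chip 110 generates the second signal
    SECOND_UNIT = "second"  # second processing chip 210 generates it directly


@dataclass(frozen=True)
class SecondSignal:
    instruction: str


@dataclass(frozen=True)
class RequestSignal:
    """Sent by the second unit asking the first unit to generate the second signal."""
    instruction: str


def select_instruction(env_label: str, generated: Optional[str] = None) -> str:
    # Assumption: an instruction generated by a model overrides the predefined table.
    if generated is not None:
        return generated
    return PREDEFINED.get(env_label, DEFAULT_INSTRUCTION)


def on_affirmative_determination(
    env_label: str, producer: Producer, generated: Optional[str] = None
) -> Union[SecondSignal, RequestSignal]:
    """Emit the second signal directly, or a request signal for the first unit."""
    instruction = select_instruction(env_label, generated)
    if producer is Producer.SECOND_UNIT:
        return SecondSignal(instruction)
    return RequestSignal(instruction)


def first_unit_handle_request(request: RequestSignal) -> SecondSignal:
    """The first unit turns a request signal into the second signal."""
    return SecondSignal(request.instruction)
```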

[0050] The suppression of the behavioral operation by the second signal includes canceling the control instruction by the first signal for executing the behavioral operation or forcibly stopping the behavioral operation being executed by the operation unit by the first signal. For example, when the autonomous robot 10 is already executing a behavioral operation by the first signal, the behavioral operation being executed by the operation unit is forcibly stopped by the second signal. Also, although details will be described later, the determination model is a learned model obtained by previously executing a predetermined machine learning process.

[0051] Furthermore, in order to prevent adverse effects caused by the suppression of behavioral movements (for example, the autonomous robot falling over), behavioral movements may be executed in the operating unit by additional control commands in conjunction with the suppression of behavioral movements. Additional control commands may be provided by the first processing chip 110 (first processing unit) by generating a first signal (for example, a control command to maintain posture), or they may be included in the control command by the second signal as a different control command selected according to environmental data, or as a control command generated by a judgment model.
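Paragraphs [0050] and [0051] distinguish cancelling a first-signal command that has not yet started from forcibly stopping one that is running, optionally followed by an additional command such as maintaining posture. A minimal sketch of that ordering follows; the queue model, the command strings, and the default `"hold_posture"` command are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class OperatingUnit:
    pending: Deque[str] = field(default_factory=deque)  # first-signal commands not yet started
    running: Optional[str] = None                        # command currently executing
    log: List[str] = field(default_factory=list)

    def submit(self, command: str) -> None:
        self.pending.append(command)

    def step(self) -> None:
        """Start the next pending command if idle (toy: commands run until stopped)."""
        if self.running is None and self.pending:
            self.running = self.pending.popleft()
            self.log.append(f"start {self.running}")

    def suppress(self, additional: Optional[str] = "hold_posture") -> None:
        """Apply a second signal: cancel queued commands, force-stop the running one,
        then optionally start an additional command (e.g. to avoid falling over)."""
        for cmd in self.pending:
            self.log.append(f"cancel {cmd}")
        self.pending.clear()
        if self.running is not None:
            self.log.append(f"force-stop {self.running}")
            self.running = None
        if additional is not None:
            self.running = additional
            self.log.append(f"start {additional}")
```

Submitting `"walk"` and `"lift"`, stepping once, and then suppressing yields the log `start walk`, `cancel lift`, `force-stop walk`, `start hold_posture`, so the stabilizing command runs only after every motion command has been cancelled or stopped.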

[0052] The dedicated area 220 (dedicated main memory) serves as a dedicated main memory for the second processing chip 210 to execute each process. The dedicated auxiliary storage device 230 serves as a dedicated storage device where various data used by the second processing chip 210 to execute each process are stored. The dedicated auxiliary storage device 230 stores a judgment model for executing deviation detection processing of the autonomous robot 10 from the behavioral norms based on the first signal from the first processing chip 110. In this embodiment, symbiosis between humans and autonomous robots is realized based on this characteristic judgment model.

[0053] In this embodiment, the dedicated main memory is the dedicated area 220 within the main memory 120 of the main system 100, and it exchanges data with the second processing chip 210 in encrypted form. In other words, the dedicated main memory serves as memory reserved for the second processing chip 210 while being logically isolated from the main system 100.

[0054] The AI / ML accelerator 240 is used to improve the efficiency or speed of learning and inference processing in AI models. The encryption accelerator 250 is used to encrypt or decrypt various signals from the second processing chip 210.

[0055] The dedicated boot ROM 260 stores a program for loading and starting a dedicated OS (basic operating system) that operates the second processing chip 210 and other components. The memory protection 270 performs encryption processing when the second processing chip 210 and the dedicated area 220 (dedicated main memory) exchange signals via the common bus interface 310. The type and method of this encryption processing are not particularly limited.

[0056] The isolated configuration of the main system 100 and the subsystem 200 is described next. As described above, the main system 100 transmits and receives signals via the common bus interface 310, while the subsystem 200 transmits and receives signals via the dedicated bus interface 320.

[0057] In other words, the autonomous robot 10 in this embodiment comprises a main system 100, which includes a first processing chip 110 and an operation unit 150, and a subsystem 200, which includes a second processing chip 210 and stores a decision model. The subsystem 200 is an enclave 200 isolated from the main system 100 as an independent control system, and at least part of it, including the second processing chip 210, is isolated as an independent circuit system.

[0058] Furthermore, the main system 100 includes a main memory 120 and an auxiliary storage device 130 that are connected to, and accessed by, the first processing chip 110. The subsystem 200 further includes a dedicated auxiliary storage device 230 that is connected to the second processing chip 210 and accessible only from it; this device stores the decision model (a decision model for ethical behavioral norms). The decision model operates based on instructions from the second processing chip 210, using the dedicated main memory 220.

[0059] The subsystem 200 further includes a dedicated main memory 220 that is connected to the second processing chip 210 and accessible only from it, and the second processing chip 210 can perform deviation detection processing using the determination model and the dedicated main memory. The subsystem 200 may also include a dedicated area 220 configured within the main memory 120 and accessible only from the second processing chip 210, in which case the second processing chip 210 can perform deviation detection processing using the determination model and the dedicated area 220.

[0060] Inviolability of Ethical Judgments: In this embodiment, a dedicated boot ROM 260 is introduced, and the subsystem 200 (second processing chip 210) and the main system 100 (first processing chip 110) run different control systems, such as operating systems, thereby separating authority management. In addition, memory protection 270 is introduced to protect the information handled in the dedicated area 220. In this way, the subsystem 200, logically isolated from the main system 100, performs deviation judgment processing (ethical judgment) based on predetermined codes of conduct, which makes it possible to achieve judgment and control that ensure safety and reliability.

[0061] Furthermore, components belonging to subsystem 200 are connected in a way that allows access only via a dedicated bus interface 320, and the main system 100 cannot directly access the dedicated auxiliary storage device 230 or the like.

[0062] Specifically, in the autonomous robot 10, normal behavioral actions (first signal) and suppression based on deviation detection against predetermined behavioral norms (second signal) are executed separately and independently. In particular, the dedicated auxiliary storage device 230, which stores the judgment model for the behavioral norms, is physically independent of the main system 100 (isolated as a circuit system).

[0063] Thus, according to this embodiment, even if the software environment of the main system 100 is compromised (for example, through a security vulnerability), the judgment model and the processing held in the enclave 200 are not affected, and operation with sufficient security and reliability can be ensured.

[0064] Machine Learning of the Judgment Model: This section explains the machine learning of the characteristic judgment model (also called the ethical model or behavioral norm model) in this embodiment. As described above, the dedicated auxiliary storage device 230 stores the judgment model, a trained model that has undergone predetermined reinforcement learning processing, and the second processing chip 210 uses this judgment model to perform deviation judgment processing.

[0065] The decision model corresponds to a policy trained by policy-based reinforcement learning in a physical environment (the real world) and/or a virtual environment, which corresponds to the state space. The policy takes environmental data as input and outputs an action. The decision model according to this embodiment may undergo reinforcement learning in the physical environment after reinforcement learning in the virtual environment. Although this embodiment describes policy-based reinforcement learning, the decision model may also be updated by other reinforcement learning processes, including value-based reinforcement learning.

[0066] The reinforcement learning process according to this embodiment assumes that the environment in which the agent (corresponding to the autonomous robot) acts is a Markov decision process. A Markov decision process is defined by a state space representing the environment, including the initial state; an action space representing the available actions; state transition probabilities; a reward function giving the reward for performing a given action in a given state; and a discount rate specifying how future rewards decay when the expected cumulative reward is evaluated. The reinforcement learning process according to this embodiment is, for example, a learning process that updates the policy (policy function, policy network), which gives the probability of each action conditioned on the state, so as to maximize the expected cumulative reward (the value function, which corresponds to the objective function, e.g. the Q value) obtained when the agent keeps acting according to that policy. The reinforcement learning process may be executed on-policy, starting from a policy trained in advance so that the agent can already perform basic behavioral actions such as walking.
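The objective in [0066] is the expected discounted cumulative reward. For one episode with rewards r_0, ..., r_{T-1} and discount rate γ, the return from step t is G_t = r_t + γ·G_{t+1}, and a Monte Carlo estimate of the start-state value averages G_0 over episodes. The helpers below are a minimal sketch of that arithmetic; the function names are illustrative.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("discount rate gamma must lie in [0, 1]")
    out: List[float] = []
    g = 0.0
    for r in reversed(rewards):  # accumulate backwards from the final step
        g = r + gamma * g
        out.append(g)
    out.reverse()
    return out


def monte_carlo_value(episodes: Sequence[Sequence[float]], gamma: float) -> float:
    """Average G_0 over episodes: a Monte Carlo estimate of the start-state value."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = sum(returns_to_go(ep, gamma)[0] if ep else 0.0 for ep in episodes)
    return total / len(episodes)


if __name__ == "__main__":
    print(returns_to_go([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```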

[0067] The policy is a stochastic policy and can be updated using any policy improvement method, including policy gradients and evolution strategies. The value function is either a state-value function or an action-value function and can be updated using any value estimation method, including Temporal Difference (TD) learning and Monte Carlo methods, which estimate the expected reward, and distributional reinforcement learning, which estimates the distribution of the cumulative reward. For example, it can be updated using one or more of various reinforcement learning algorithms, including: policy-gradient algorithms such as REINFORCE; Actor-Critic algorithms; algorithms using a clipped surrogate objective such as PPO (Proximal Policy Optimization); constrained-update algorithms such as TRPO (Trust Region Policy Optimization), which restrict each update to a trust region; entropy-maximizing algorithms such as SAC (Soft Actor-Critic); meta-reinforcement learning algorithms; and model-based reinforcement learning algorithms. Furthermore, when handling high-dimensional states or complex observations, the policy or value function may be a multi-layer neural network updated using a deep reinforcement learning algorithm.

[0068] An action represents, for example, a choice within a discrete action space of two or more actions, including whether or not to suppress the action corresponding to a given control signal (first signal). Specifically, the action is a binary choice of "suppress" or "do not suppress" the agent's action. The action may instead be a value in a continuous action space, for example a control signal for the agent's action (first signal) or a control signal for suppressing the agent's action (second signal).

[0069] In this embodiment, the control model of the autonomous robot is defined as the agent's control policy, and the choice of actions, including whether or not to suppress a behavioral action, is defined by a decision policy independent of the control policy. The control policy is kept fixed, and reinforcement learning is performed only to train the decision policy. If the control model of the autonomous robot is a stochastic policy, the control model itself may instead be trained as the policy; the control model to be trained may be a copy of the control model executed in the main system to control the autonomous robot. If, in the reinforcement learning process, the actions of the agent functioning as the decision model are defined as first signals, the first signal output by the trained decision model is converted into a second signal.
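The split in [0069] between a frozen control policy and a trainable decision policy can be sketched for the binary "suppress / do not suppress" case of [0068]. The example assumes a logistic (Bernoulli) decision policy over a hand-made two-element state feature and a toy ±1 reward, and treats each decision as a one-step episode so that the return equals the immediate reward. REINFORCE is used only because it is the simplest algorithm listed in [0067]; all names and the toy setup are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def state_features(first_signal_risk: float) -> List[float]:
    """Hypothetical state feature phi(s) = [bias, scalar summary of the first signal]."""
    return [1.0, first_signal_risk]


class BernoulliDecisionPolicy:
    """Decision policy pi(suppress | s) = sigmoid(theta . phi(s)).

    Action 1 = "suppress", action 0 = "do not suppress". The control policy that
    produced the first signal is never touched here; only theta is trained.
    """

    def __init__(self, n_features: int, lr: float = 0.1, seed: int = 0) -> None:
        self.theta: List[float] = [0.0] * n_features
        self.lr = lr
        self.rng = random.Random(seed)

    def prob_suppress(self, phi: Sequence[float]) -> float:
        return sigmoid(sum(w * x for w, x in zip(self.theta, phi)))

    def act(self, phi: Sequence[float]) -> int:
        return 1 if self.rng.random() < self.prob_suppress(phi) else 0

    def reinforce_update(self, phi: Sequence[float], action: int, ret: float) -> None:
        # REINFORCE: theta += lr * G * grad log pi(a|s), where for a
        # Bernoulli-logistic policy grad log pi(a|s) = (a - p) * phi(s).
        p = self.prob_suppress(phi)
        self.theta = [w + self.lr * ret * (action - p) * x for w, x in zip(self.theta, phi)]


if __name__ == "__main__":
    policy = BernoulliDecisionPolicy(n_features=2)
    for step in range(2000):
        risk = float(step % 2)  # alternate benign (0) and risky (1) first signals
        phi = state_features(risk)
        a = policy.act(phi)
        reward = 1.0 if a == int(risk) else -1.0  # toy reward: suppress only risky signals
        policy.reinforce_update(phi, a, reward)
    print(policy.prob_suppress(state_features(1.0)), policy.prob_suppress(state_features(0.0)))
```

With this toy reward, the probability of suppression should rise for risky first signals and fall for benign ones.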

[0070] The virtual environment corresponds to the state transition probabilities and may be a physical simulation environment that mimics the physical environment through physical calculations such as equations of motion, or a world model that generatively mimics the state transitions of the physical environment. In this case, the decision model includes a policy and value function updated by a model-based reinforcement learning process using the virtual environment, and it is trained through rollouts and imagined trial and error in the virtual environment.

[0071] A state is a feature that contains at least the first signal, the output of the control model, as an element. The feature representing the state may also contain environmental data as elements. Such environmental data may include internal environment data, for example the agent's posture, the angle of each joint, and the position of its center of gravity, and external environment data, for example image data provided by the agent's sensors. Image data and the like are converted into an embedding representation and combined with the feature. The first signal included in state t at time t may be derived from the environmental data at time t or from the environmental data at the immediately preceding time t-1.

[0072] At least part of the internal and/or external environment data listed here is the data the control model needs to output the first signal, and during reinforcement learning it can be given to the control model as input data to obtain the first signal. For example, internal environment data includes the posture of the autonomous robot, the angle of each joint, and the position of the center of gravity. External environment data includes image data (including video data) acquired by the sensor 140 of the autonomous robot in a physical environment, captured data in a physical simulation environment, and a latent representation (latent vector) indicating the state of the environment in a world model. The control model outputs the first signal (output data) in response to input data based on the environment data. Furthermore, some or all of the environment data given to the control model to obtain the first signal may also be given to the decision model, either during reinforcement learning or during decision processing.

[0073] The reward is determined from situational data obtained from the state space after the agent acts, using an interpretation model (described later with reference to Figure 8). In a physical environment, situational data includes image data obtained via the agent's sensors and preferably also audio data. In a virtual environment, it consists of captured data from the physical simulation environment or latent vectors obtained from the latent space.

[0074] The interpretation model corresponds to the reward function. It may be, for example, a Large Language Model (LLM) that judges compliance with or deviation from the code of conduct and outputs the result, and it may be a multimodal LLM that accepts unstructured situational data such as images in addition to text. The interpretation model receives input data based on situational data and a prompt. The prompt is a natural-language request for a judgment, for example, "Is this situation ethical behavior that complies with the code of conduct, or unethical behavior that deviates from it?" The situational data and prompt are mapped to embedded representations (features) in the embedding space. The interpretation model outputs an evaluation result (output data) based on the input data.

[0075] The evaluation result is a binary ("ethical / unethical") or ternary ("ethical / normal / unethical") evaluation of the situational data, for example unstructured data such as images of physical space, or unstructured data such as images corresponding to the latent space or physical simulation space. The evaluation result may also be a continuous value, in which case the reward may be determined according to that value. The output format of the evaluation result may be specified in the prompt as needed.

[0076] If the situation after an action in the environment indicates compliance with the behavioral norms, the reinforcement learning algorithm gives the action a positive (or high) reward; if the situation indicates deviation from the behavioral norms, it does not give the action a positive (or high) reward. The algorithm may further give the action a negative (or low) reward when the situation indicates deviation.
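The mapping from evaluation result to reward in [0075] and [0076] can be sketched as follows. The numeric reward values, the `penalize_deviation` switch (choosing between "no positive reward" and "a negative reward" for deviations), and the treatment of a continuous result as a compliance score in [0, 1] are all illustrative assumptions.

```python
from typing import Union

# Hypothetical reward values for the categorical evaluation results.
CATEGORICAL_REWARDS = {"ethical": 1.0, "normal": 0.0, "unethical": -1.0}


def reward_from_evaluation(result: Union[str, float], penalize_deviation: bool = True) -> float:
    """Map an interpretation-model evaluation result to a scalar reward.

    - "ethical" -> positive reward; "normal" -> 0.
    - "unethical" -> negative reward if penalize_deviation, otherwise 0
      (i.e. simply no positive reward).
    - A number is treated as a compliance score in [0, 1] (1 = fully compliant).
    """
    if isinstance(result, str):
        label = result.strip().lower()
        if label not in CATEGORICAL_REWARDS:
            raise ValueError(f"unknown evaluation label: {result!r}")
        reward = CATEGORICAL_REWARDS[label]
        return reward if (reward >= 0 or penalize_deviation) else 0.0
    score = float(result)
    if not 0.0 <= score <= 1.0:
        raise ValueError("continuous score must lie in [0, 1]")
    # Rescale to [-1, 1] when deviations are penalized, else keep [0, 1].
    return 2.0 * score - 1.0 if penalize_deviation else score
```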

[0077] Reinforcement Learning Processing in a Virtual Environment: Figure 7 shows a flowchart of the reinforcement learning processing in the virtual environment according to this embodiment. The procedure is explained with reference to Figure 7 together with Figure 8(a), which shows the hardware configuration of the information processing device 600, and Figure 8(b), which shows its functional configuration.

[0078] Figure 8(a) shows a hardware configuration diagram of an information processing device 600 for executing reinforcement learning processing. The information processing device 600 comprises a processing unit 610, a storage unit 620, and a communication unit 630 as its hardware configuration. The information processing device 600 can utilize a server or workstation equipped with a processor such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or an ASIC (Application Specific Integrated Circuit) such as a TPU (Tensor Processing Unit).

[0079] The processing unit 610 is composed of multiple CPU cores and one or more GPUs / ASICs, and controls the overall processing in the information processing device 600 by executing machine learning frameworks and learning / inference control applications on the OS. The storage unit 620 is composed of RAM, SSD, HDD, etc., and stores reinforcement learning models, virtual environment generation models, control models, large-scale language models, learning datasets, etc. The communication unit 630 connects to a communication network, enabling the transmission of trained models to autonomous robots, and data synchronization and distributed processing with external databases and other learning nodes.

[0080] Figure 8(b) shows a functional block diagram of the processing unit 610 that trains the judgment model. The processing unit 610 of the information processing device 600 functions as a learning unit 611, an extraction unit 612, and an interpretation unit 613.

[0081] In S31 of Figure 7, the information processing device 600 starts reinforcement learning of the decision model using a virtual environment such as a physical simulation environment or a world model, and starts episode i (i = 0, 1, 2, ..., I-1). Episodes are repeated I times, and within each episode i, steps are repeated T times (t = 0, 1, 2, ..., T-1). In S32, step t = 0 of episode i is started.

[0082] In S33, the learning unit 611 in Figure 8(b) acquires, for example, environmental data t and obtains a first signal t from the control policy (control model). In S34, the agent's action t (corresponding to the first signal t) in state t, which reflects the internal environment data t and external environment data t, is applied to the virtual environment according to the policy. Once action t has been applied, step t is incremented and the next step of episode i begins. The extraction unit 612 acquires state t+1, which reflects the internal and external environment data t+1, as the state reached through action t. If the virtual environment is a physical simulation environment, the extraction unit 612 acquires the physical simulation results as states t, t+1, and so on; if it is a world model, the extraction unit 612 acquires the latent representations mapped to the latent space of the world model as states t, t+1, and so on.

[0083] The extraction unit 612 includes software components such as an encoder that maps the input space of the world model to the latent space and a decoder that maps the latent space to the output space, or a projector (projection layer) that projects data into another space. When a latent representation is input to another model, the extraction unit 612 converts the data as necessary. In this embodiment, the various machine learning models stored in the auxiliary storage device 130 and the dedicated auxiliary storage device 230 also include encoders and decoders corresponding to the extraction unit 612, but their involvement is not described further.

[0084] In S35, if the virtual environment is a world model, the extraction unit 612 uses a decoder to convert the latent representation corresponding to state t+1 into situation data t+1 in the form of unstructured data such as text, image, audio, or video data, and transmits situation data t+1 to the interpretation unit 613. If the virtual environment is a physical simulation environment, the extraction unit 612 converts the physical simulation result corresponding to state t+1 into situation data t+1 in the same unstructured form and transmits it to the interpretation unit 613.

[0085] In S36, the interpretation unit 613 inputs situation data t+1 into the interpretation model, obtains an evaluation result t (evaluation result of action t corresponding to the first signal t) from the interpretation model that indicates the ethical interpretation of state t+1, and passes the evaluation result t to the learning unit 611. If the interpretation model is not an LLM, the extraction unit 612 converts the latent representation, physical simulation results, or unstructured data into situation data in the form of features, and the interpretation unit 613 inputs the situation data, which is the features, into the interpretation model and obtains the evaluation result.

[0086] The learning unit 611 assigns a reward to action t based on evaluation result t: (1) if the situation data indicates an ethical situation in which the code of conduct is observed (corresponding to a negative result of the ethical deviation judgment), a positive reward is given; (2) if the situation data indicates an unethical situation that deviates from the code of conduct (corresponding to a positive result of the ethical deviation judgment), no positive reward is given.

[0087] In S37, the learning unit 611 updates the probability distribution (policy parameters) and the value function of the decision policy based on the given reward. In S38, the learning unit 611 repeats the per-step learning process (S33 to S37) T times, completing one episode. In S39, the per-episode learning process (S32 to S38) is repeated I times, completing the learning process. In this embodiment, the step count is incremented each time an action is performed, but it may instead be incremented by elapsed time or other means. For example, with a world model, the latent state may be updated every frame (e.g., at 60 Hz) and the policy trained by performing action selection every 5 frames, with an episode length of 100 steps (500 frames, or about 8.3 seconds of virtual time at 60 Hz).
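The procedure of S31 to S39 amounts to two nested loops. The skeleton below is a sketch under stated assumptions: every component (environment reset and step, control policy, decision policy, extraction, interpretation, update) is passed in as a hypothetical callable rather than taken from any specific framework, and `LoopConfig` also reproduces the frame arithmetic of the world-model example (100 steps of 5 frames each at 60 Hz).

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class LoopConfig:
    episodes: int                # I
    steps: int                   # T
    frame_rate_hz: float = 60.0
    frames_per_step: int = 5

    def virtual_seconds_per_episode(self) -> float:
        return self.steps * self.frames_per_step / self.frame_rate_hz


def train_decision_model(
    cfg: LoopConfig,
    reset: Callable[[], Any],
    control_policy: Callable[[Any], Any],
    decision_policy: Callable[[Any, Any], Any],
    step_env: Callable[[Any, Any, Any], Any],
    to_situation: Callable[[Any], Any],
    interpret: Callable[[Any], float],
    update: Callable[[Any, Any, Any, float], None],
) -> int:
    """Run the S31-S39 loop; returns the number of S37 updates performed."""
    n_updates = 0
    for _episode in range(cfg.episodes):           # S31: episodes i = 0..I-1
        state = reset()
        for _t in range(cfg.steps):                # S32: steps t = 0..T-1
            first_signal = control_policy(state)   # S33: frozen control model
            action = decision_policy(state, first_signal)
            next_state = step_env(state, first_signal, action)  # S34: transition to t+1
            situation = to_situation(next_state)   # S35: extraction unit -> situation data
            reward = interpret(situation)          # S36: interpretation model -> reward
            update(state, first_signal, action, reward)  # S37: policy / value update
            n_updates += 1
            state = next_state
        # S38: one episode is complete after T steps
    # S39: learning is complete after I episodes
    return n_updates


if __name__ == "__main__":
    cfg = LoopConfig(episodes=2, steps=100)
    print(cfg.virtual_seconds_per_episode())  # 500 frames / 60 Hz
```

Injecting the components keeps the loop structure independent of whether the virtual environment is a physics simulator or a world model.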

[0088] Although this embodiment describes training the decision model by reinforcement learning, the decision model may also be trained by other machine learning processes, such as supervised learning. For example, a support vector machine or a neural network model may be used as the decision model and trained by supervised learning, using as training data the feature quantities of input data that includes the first signal, paired with label data indicating whether that input data complies with or deviates from the behavioral norms.
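For the supervised alternative in [0088], the sketch below uses logistic regression trained by stochastic gradient descent in place of the support vector machine or neural network mentioned there, as a deliberate simplification. The feature vectors stand for features derived from the first signal (and optionally environmental data), and the training data in the test is invented toy data.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Stochastic gradient descent on the logistic (cross-entropy) loss.

    Each row of X is a feature vector built from a first signal; y is 1 for
    "deviates from the behavioral norms" and 0 for "complies".
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            g = p - yi  # derivative of the cross-entropy loss w.r.t. the logit
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b


def predict_deviation(
    w: Sequence[float], b: float, x: Sequence[float], threshold: float = 0.5
) -> bool:
    """True if the predicted probability of deviation reaches the threshold."""
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold
```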

[0089] Furthermore, the decision model may be any other mathematical model. For example, a distance function or similarity function can serve as the decision model: one-dimensional or multi-dimensional object data containing first signals is prepared together with label data indicating whether each object complies with or deviates from the behavioral norms. The similarity between newly given object data containing a first signal and the prepared object data is then calculated, and the corresponding label data is output as the decision result.
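The similarity-based decision model of [0089] is essentially a nearest-neighbor classifier. The sketch below assumes Euclidean distance as the distance function and a simple k-nearest-neighbor majority vote; the function name and the label strings are illustrative.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Prototype = Tuple[Sequence[float], str]  # (object data containing a first signal, label)


def knn_decision(query: Sequence[float], prototypes: Sequence[Prototype], k: int = 1) -> str:
    """Return the majority label among the k prototypes closest to the query.

    Euclidean distance stands in for the distance/similarity function. Ties in
    the vote go to the label encountered first, i.e. the one nearest the query.
    """
    if not prototypes:
        raise ValueError("no labelled object data")
    ranked = sorted(prototypes, key=lambda p: math.dist(query, p[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With k = 1 this returns the label of the single most similar stored object, which is the simplest reading of [0089].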

[0090] In this embodiment, by performing reinforcement learning processing corresponding to a predetermined code of conduct, a judgment model can be obtained that can appropriately determine whether or not the autonomous robot 10 has deviated from ethical behavior. The process for determining deviation from the code of conduct in the autonomous robot 10 will be described in detail below.

[0091] To process deviations from the behavioral norms, the autonomous robot 10 inputs a first signal into the trained judgment model, performs deviation detection processing for the behavior corresponding to the first signal, and estimates the deviation detection result. If the judgment model has been updated by reinforcement learning, the deviation detection result is estimated from changes in the value function (Q value) of the judgment model. If the judgment model was trained with a positive reward in situations where the behavioral norms are observed (situations with no deviation from the behavioral norms), then an action selection that decreases the Q value corresponds to a deviation from the behavioral norms (the deviation detection result is positive).

[0092] Specifically, if executing a second signal to suppress an action (corresponding to the first signal) that the autonomous robot 10 intends to perform according to the control model, in an environment indicated by predetermined environmental data, contributes to a decrease in the Q value of the judgment model, the deviation judgment result is estimated as positive. More specifically, if the action of the control model (corresponding to the first signal) matches or is similar to an action (corresponding to a first signal) that contributes to a decrease in the Q value of the judgment model, the deviation judgment result is estimated as positive. In this case, the degree of similarity between the actions of the control model and the judgment model may be determined based on, for example, the vector distance between the first signals.
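[0091] and [0092] leave open exactly how a decrease in the Q value is measured. One possible reading, sketched here under that assumption, compares the judgment model's Q value for allowing the first signal with its Q value for suppressing it, and, for the similarity variant, checks the vector distance between the control model's first signal and candidate first signals whose Q value falls below a baseline. The function names, the `margin` and `baseline_q` parameters, and the use of Euclidean distance are illustrative choices.

```python
import math
from typing import Mapping, Sequence, Tuple

Signal = Tuple[float, ...]


def deviation_by_q(q_allow: float, q_suppress: float, margin: float = 0.0) -> bool:
    """Positive judgment if letting the first signal run is expected to yield a
    lower value than suppressing it (one reading of the Q-value rule)."""
    return q_allow + margin < q_suppress


def deviation_by_similarity(
    control_signal: Sequence[float],
    q_by_signal: Mapping[Signal, float],
    baseline_q: float,
    max_distance: float,
) -> bool:
    """Positive judgment if the control model's first signal lies within
    max_distance of any candidate first signal whose Q value is below baseline_q."""
    return any(
        q < baseline_q and math.dist(control_signal, signal) <= max_distance
        for signal, q in q_by_signal.items()
    )
```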

[0093] Deviation detection by the decision model may be based not only on increases or decreases in the Q value but also on the results of monitoring the Q value, for example statistics such as its mean, variance, skewness, and kurtosis. For instance, if the increase in the Q value when the decision model selects a first or second signal is larger than the increase from selecting any other first or second signal, the deviation detection result may be estimated as positive for the selected first signal, and as negative for the first signal targeted for suppression by the selected second signal. Deviation detection may also be based on monitoring the time-series transition and time evolution of the Q value: if the Q value spikes when the decision model selects a first or second signal, the deviation detection result may be estimated as positive for the selected first signal, and as negative for the first signal targeted for suppression by the selected second signal. Furthermore, when Q-value monitoring results are used for deviation detection, a predetermined anomaly detection model may be applied, for example to detect anomalies in the time evolution of the Q value.
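The Q-value monitoring in [0093] can be illustrated with the four summary statistics it names and a simple spike detector over the Q-value time series. The rolling z-score rule below is one elementary stand-in for the "predetermined anomaly detection model"; the window size, threshold, and names are illustrative assumptions.

```python
import math
import statistics
from collections import deque
from typing import Deque, Sequence, Tuple


def q_moments(values: Sequence[float]) -> Tuple[float, float, float, float]:
    """Mean, population variance, skewness and excess kurtosis of Q values."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    if var == 0.0:
        return mean, 0.0, 0.0, 0.0
    sd = math.sqrt(var)
    n = len(values)
    skew = sum(((v - mean) / sd) ** 3 for v in values) / n
    kurt = sum(((v - mean) / sd) ** 4 for v in values) / n - 3.0
    return mean, var, skew, kurt


class QSpikeDetector:
    """Flags a Q value lying more than z_threshold standard deviations from the
    mean of the previous `window` Q values (a rolling z-score rule)."""

    def __init__(self, window: int = 20, z_threshold: float = 3.0, min_std: float = 1e-6) -> None:
        self.history: Deque[float] = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_std = min_std  # avoids division-like blowups on a flat history

    def update(self, q: float) -> bool:
        spike = False
        if len(self.history) >= 2:
            mean = statistics.fmean(self.history)
            sd = max(statistics.pstdev(self.history), self.min_std)
            spike = abs(q - mean) > self.z_threshold * sd
        self.history.append(q)
        return spike
```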

[0094] Figure 9 shows a flowchart of the processing procedure in the actions of the autonomous robot according to this embodiment. In explaining this processing procedure, we will refer to Figure 10, which shows the functional configuration of the main system 100 and subsystem 200 (enclave 200), in conjunction with Figure 9 as appropriate.

[0095] Figure 10 shows a functional block diagram of the main system and subsystem (enclave) according to this embodiment. As shown in Figure 10(a), in the main system 100 of the autonomous robot 10, at least one of the one or more first processing units 110 (together with the main memory 120, auxiliary storage 130, etc.) functions as a motion control unit 111 that generates signals for operating the operation unit 150 (actuator 150), and at least one functions as an extraction unit 112. The extraction unit 112 can convert environmental data acquired from the detection unit 140 (sensor) into feature quantities.

[0096] Furthermore, as shown in Figure 10(b), in the enclave 200, the one or more second processing units 210 (together with the dedicated main memory unit 220, dedicated auxiliary storage device 230, etc.) function as a learning unit 211, at least one determination unit 212, at least one extraction unit 213, and at least one generation unit 214. The learning unit 211 further updates the pre-trained determination model through reinforcement learning; since the reinforcement learning flow has been described above, a detailed explanation is omitted here.

[0097] In step S41 of Figure 9, the enclave 200 is attached (set) to the autonomous robot 10. As shown in Figure 11, the enclave can be attached to the autonomous robot 10 during manufacturing. For example, the enclave can be incorporated into and integrated with the main system 100 in advance during manufacturing of the autonomous robot 10; only the second processing chip 210 can be integrated into the autonomous robot 10 later; or the main system 100 and the enclave 200 can be attached to the autonomous robot 10 as a single component, either during manufacturing or later.

[0098] Furthermore, as shown in Figure 12, the enclave 200 (and other necessary components) can be integrated with the main system 100 of the autonomous robot 10 via an expansion slot (and expansion bus) such as PCIe. In this case, the enclave 200 is configured in advance so that it can be inserted into the expansion slot.

[0099] In Figures 11 and 12, the enclave 200 is located in the head of the autonomous robot 10, but it can be located in any other part, such as the chest or abdomen; its location is not particularly limited in this embodiment.

[0100] In step S42 of Figure 9, the motion control unit 111 of the main system 100 in Figure 10(a) generates a first signal relating to the behavioral actions of the autonomous robot 10. The first signal is operation information that controls the operation unit 150, including actuation and stopping. The operation unit 150 converts the signal from the motion control unit 111 into the behavioral actions of the autonomous robot 10.

[0101] In this embodiment, the control model and the like are stored in the motion control unit 111 of the main system 100 (actually stored in the auxiliary storage device 130), and a first signal can be generated by inputting environmental data, including image data of the external environment acquired by the detection unit 140, into the control model. This first signal is sent to the operation unit 150 and converted into an action. That is, the autonomous robot 10 performs a predetermined action based on the first signal generated by the motion control unit 111.

[0102] The term "behavioral actions" as used here covers motor and physical actions in relation to the environment, including not only movement of the autonomous robot 10 but also actions and behaviors corresponding to the five human senses. For example, behavioral actions include actions that come into contact with objects in the external environment and other actions that have some effect on the external environment. Furthermore, behavioral actions include actions that acquire information that may be subject to privacy protection, such as capturing image data of external characteristics such as faces, or detecting biometric data such as body temperature.

[0103] Then, in S43 of Figure 9, the first signal is input to the judgment model relating to the behavioral norms of the autonomous robot 10. The first signal is input to the judgment model stored in the judgment unit 212. In this embodiment, along with the first signal, image data relating to the external environment (external environment data) acquired by the detection unit 140 of the main system 100 is input to the judgment model based on predetermined conditions. As mentioned above, the data acquired by the detection unit 140 is not limited to image data; a wide variety of external environment data can be acquired and input to the judgment model.

[0104] Specifically, the extraction unit 213 in Figure 10(b) extracts predetermined features from image data relating to the external environment as environmental features. Environmental features are features that allow us to understand changes in the external environment.

[0105] For example, when the autonomous robot 10 and a human 20 are working together, the extraction unit 213 can extract environmental features such as the distance between the human 20 and the autonomous robot 10. The extraction unit 213 can also extract environmental features by arranging the image data in a time series and analyzing the position and actions of the human 20 as vectors. In S43, the environmental features obtained in this way and the first signal are input to the judgment model. In this embodiment, the image data can also be input directly to the judgment model as raw data, without extracting environmental features.
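As a minimal sketch of the feature extraction described in [0105], assume the image data has already been reduced to per-frame 2D positions of the human 20 and the autonomous robot 10. The `Frame` type, the units, and the feature names below are hypothetical illustrations, not part of the disclosure; the sketch only shows how a distance feature and a simple time-series motion vector could be derived.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One time-stamped observation; positions in metres (hypothetical layout)."""
    t: float
    human_xy: tuple[float, float]
    robot_xy: tuple[float, float]


def extract_environmental_features(frames: list[Frame]) -> dict[str, float]:
    """Hypothetical extraction unit: distance plus time-series motion features."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to form a time series")
    ordered = sorted(frames, key=lambda f: f.t)
    first, last = ordered[0], ordered[-1]
    dt = last.t - first.t
    if dt <= 0:
        raise ValueError("frames must span a positive time interval")

    # Current human-robot distance
    distance = math.dist(last.human_xy, last.robot_xy)
    # Human displacement over the window, expressed as a velocity vector
    vx = (last.human_xy[0] - first.human_xy[0]) / dt
    vy = (last.human_xy[1] - first.human_xy[1]) / dt
    # Rate of change of the gap (negative means the two are closing in)
    gap_rate = (distance - math.dist(first.human_xy, first.robot_xy)) / dt
    return {"distance": distance, "human_vx": vx, "human_vy": vy, "gap_rate": gap_rate}
```

A real extraction unit 213 would first have to detect the human in the raw image data; that perception step is deliberately left out of this sketch.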

[0106] In S44, it is determined whether the behavioral actions of the autonomous robot 10 violate the code of conduct. Specifically, the determination unit 212 in Figure 10(b) determines whether the behavioral actions of the autonomous robot 10 constitute ethical behavior (whether or not to suppress the behavioral actions) based on the first signal and environmental data (environmental features) input to the determination model.

[0107] If the determination unit 212 determines that the behavioral action of the autonomous robot 10 does not violate the code of conduct (No in Figure 9), that is, if the behavioral action indicated by the first signal is determined to be ethical, the autonomous robot 10 continues the behavioral action of the first signal as shown in Figure 10 (S45). In other words, the autonomous robot 10 performs the behavioral action as usual based on the first signal generated by the motion control unit 111.

[0108] On the other hand, if the behavioral action of the autonomous robot 10 is determined to violate the code of conduct, that is, to be unethical (Yes in Figure 9), the behavioral action of the autonomous robot 10 is suppressed (S46); more specifically, it is forcibly stopped. For example, as shown in Figure 9, if the autonomous robot 10 points the tip of a pen toward a person 20, or suddenly lets go of luggage it is carrying and drops it, its behavioral action can be determined to be unethical.

[0109] In S46, the generation unit 214 in Figure 10 generates a second signal, which is action information that suppresses behavioral actions. Specifically, the first signal is input to the determination unit 212, which performs deviation determination processing from the behavioral norms of the autonomous robot related to the first signal. If the deviation determination result is a positive determination (i.e., there is a deviation from the behavioral norms), the generation unit 214 generates a second signal that suppresses behavioral actions.
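The S43 to S46 flow of [0103] to [0109] can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the signal fields are hypothetical, and the trained judgment model is replaced by a hypothetical rule-based stand-in that encodes only the two unethical-behavior examples given in [0108].

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FirstSignal:
    """Command from the motion control unit 111 (fields are hypothetical)."""
    action: str
    tip_toward_human: bool = False
    releases_carried_load: bool = False


@dataclass(frozen=True)
class SecondSignal:
    """Suppression command produced by the generation unit 214."""
    reason: str


# A judgment model maps (first signal, environmental features) to a deviation reason or None.
JudgmentModel = Callable[[FirstSignal, dict], Optional[str]]


def rule_based_stand_in(sig: FirstSignal, env: dict) -> Optional[str]:
    """Hypothetical stand-in for the trained judgment model ([0108] examples only)."""
    if sig.tip_toward_human and env.get("human_present", False):
        return "pointing a pen tip toward a person"
    if sig.releases_carried_load:
        return "suddenly dropping carried luggage"
    return None


def deviation_judgment(sig: FirstSignal, env: dict,
                       model: JudgmentModel = rule_based_stand_in) -> Optional[SecondSignal]:
    reason = model(sig, env)      # S43: first signal + environmental features into the model
    if reason is None:            # S44: negative determination
        return None               # S45: keep executing the first signal
    return SecondSignal(reason)   # S46: positive determination -> generate second signal
```

Passing the model as a parameter keeps the flow independent of how the judgment model is realized, so a trained model could replace the stand-in without changing `deviation_judgment`.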

[0110] Suppression of behavioral actions, as used here, includes canceling the control command issued by the first signal to execute a behavioral action, or forcibly stopping a behavioral action that the operating unit 150 is currently executing because of the first signal. For example, if the first signal is a command for an upcoming behavioral action of the autonomous robot 10, ethical behavior can be maintained by canceling the first signal. If the autonomous robot 10 is already operating according to the first signal, its ethical behavior can be maintained by transmitting to the operating unit 150 a second signal (such as a forced stop signal or a power-off signal) that takes precedence over the first signal.
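The two forms of suppression in [0110] can be illustrated with a toy model of the operating unit 150. In this sketch, which rests on assumptions not stated in the disclosure, the queue structure, command identifiers, and latched `stopped` flag are all hypothetical: a second signal cancels a command that has not started yet, or forcibly stops one that is executing, and in both cases it takes precedence over the first signal.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OperatingUnit:
    """Toy model of operating unit 150 (hypothetical structure)."""
    pending: deque = field(default_factory=deque)  # first-signal commands not yet started
    executing: Optional[str] = None                # command currently being executed
    stopped: bool = False                          # latched after a forced stop

    def submit(self, command_id: str) -> None:
        self.pending.append(command_id)

    def step(self) -> None:
        # Start the next pending command only if idle and not force-stopped
        if self.executing is None and self.pending and not self.stopped:
            self.executing = self.pending.popleft()

    def apply_second_signal(self, command_id: str) -> str:
        """Suppress the behavior requested by command_id; the second signal wins."""
        if command_id in self.pending:
            self.pending.remove(command_id)  # cancel an upcoming action
            return "cancelled"
        if self.executing == command_id:
            self.executing = None            # forced stop of an action in progress
            self.stopped = True
            return "forced_stop"
        return "no_op"
```

Latching `stopped` after a forced stop is one design choice among several; a real robot might instead require an explicit reset before accepting new first signals.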

[0111] In this embodiment, the characteristic enclave 200 is used to determine whether the behavior of the autonomous robot 10 is ethical. If it is determined to be unethical, the behavior of the autonomous robot 10 is suppressed, enabling the autonomous robot 10 and the human 20 to coexist.

[0112] <Semiconductor Device> Here, we describe a semiconductor device used in this embodiment to realize symbiosis between an autonomous robot and a human. Figure 13 shows an example of a 2D package semiconductor device according to this embodiment. As shown in the figure, the 2D package semiconductor device 500 is composed of a substrate 501 and one or more chips arranged on the substrate. Specifically, the semiconductor device 500 is composed of the substrate 501 and an enclave 200 configured as an SoC that includes a second processing chip 210, a dedicated auxiliary storage device 230, an AI/ML accelerator 240, an encryption accelerator 250, a dedicated boot ROM 260, a firmware circuit that launches the boot loader and the like, and a dedicated bus interface 320.

[0113] In another configuration, the semiconductor device 500 can consist of a chiplet based on its main function, an associated chip (such as a boot ROM or auxiliary storage), and a substrate 501. For example, as shown in Figure 11, the semiconductor device 500 can be pre-installed during the manufacturing of the autonomous robot 10, or the semiconductor device 500 (a semiconductor device 500 processed to correspond to the expansion slot) can be inserted into the expansion slot shown in Figure 12 to realize an autonomous robot 10 that can coexist with a human 20.

[0114] Figure 14 shows an example of a semiconductor device in a 2.5D package according to this embodiment. As shown in Figures 14(a) to (c), the semiconductor device 510 in a 2.5D package consists of a main system 100 as an SoC, an enclave 200 as an SoC, multiple memory components 400, a substrate 501, and an interposer 502. In this configuration, the main system 100 (first processing chip) and the enclave 200 (second processing chip) are connected to the same interposer substrate.

[0115] In this configuration, the interposer 502 is a silicon interposer. The memory 400 includes a main memory 120, an auxiliary memory 130, a dedicated main memory 220, and other components necessary for the characteristic determination processing described herein.

[0116] The memory arrays 400 can also be configured as, for example, a multi-stage memory array. In this case, the multi-stage memory array can be configured using through-electrodes such as vias. For example, a multi-stage memory array may be configured using 3D stacking technology, which involves vertically stacking multiple DRAM chips using through-silicon vias (TSVs).

[0117] The semiconductor device 510 packages the main system 100, the enclave 200, and related memory devices 400. For example, as shown in Figure 11, the semiconductor device 500 can be pre-installed during the manufacturing of the autonomous robot 10, or the semiconductor device 510 can be inserted into the expansion slot shown in Figure 12 to realize a symbiotic autonomous robot 10.

[0118] Figure 15 shows an example of a semiconductor device in a 2.xD package according to this embodiment. As shown in Figures 15(a) and (b), the semiconductor device 520 in a 2.xD package consists of a main system 100 as an SoC, an enclave 200 as an SoC, multiple memory components 400, a substrate 501, and an interposer 502. The interposer 502 in this configuration is an RDL (redistribution layer) interposer.

[0119] As shown in Figure 15(c), the semiconductor device 520 has a silicon bridge configured in it, which enables faster processing compared to the semiconductor device 510 in a 2.5D package. In this configuration as well, as shown in Figure 11, the semiconductor device 500 can be pre-installed during the manufacturing of the autonomous robot 10, or the semiconductor device 520 can be inserted into the expansion slot shown in Figure 12 to realize a symbiotic autonomous robot 10. For example, according to Figures 14 and 15, the following semiconductor device can be realized.

[0120] In other words, a semiconductor device for an autonomous robot having an operating unit has a processing chip including a first processing chip and a second processing chip, wherein the first processing chip generates a first signal which is a signal that controls the operating unit, including operation and stopping, and which is converted into the robot's behavioral movement when executed by the operating unit, and the second processing chip uses the first signal and a judgment model to perform deviation judgment processing from behavioral norms, and the processing chip can generate a second signal which suppresses behavioral movement if the deviation judgment result is a positive judgment.

[0121] In this semiconductor device, the suppression of an action by a second signal includes canceling a control command issued by a first signal to execute the action, or forcibly stopping the action being performed by the operating unit by the first signal. The semiconductor device also further includes a dedicated auxiliary storage device capable of storing a decision model indicating behavioral norms, and the second processing chip and the dedicated auxiliary storage device are mounted in an enclave.

[0122] The semiconductor device further includes a common bus interface and a dedicated bus interface, a main memory and an auxiliary memory, and a dedicated main memory and a dedicated auxiliary memory. The common bus interface connects the first processing chip, the operating unit, the main memory and auxiliary memory, and the second processing chip, while the dedicated bus interface connects the second processing chip, the dedicated main memory, and the dedicated auxiliary memory. The dedicated main memory and dedicated auxiliary memory are isolated as control systems and circuit systems independent of the first processing chip, the main memory, and the auxiliary memory.
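The bus topology of [0122] can be checked with a small reachability model. This is a sketch under an explicit assumption: a master can reach a target only if both sit on the same bus, and the second processing chip does not forward requests between the common and dedicated buses. The component labels are hypothetical names for the parts listed in [0122].

```python
# Component names are hypothetical labels for the parts listed in [0122].
BUSES: dict[str, frozenset[str]] = {
    "common": frozenset({"first_chip", "operating_unit", "main_memory",
                         "aux_memory", "second_chip"}),
    "dedicated": frozenset({"second_chip", "dedicated_main_memory",
                            "dedicated_aux_memory"}),
}


def can_access(master: str, target: str) -> bool:
    """True if master and target share at least one bus.

    Assumes the second processing chip does not bridge the two buses,
    so sharing a bus is the only access path.
    """
    return any(master in members and target in members for members in BUSES.values())


def isolated_targets(master: str) -> list[str]:
    """Components on any bus that the given master cannot reach."""
    everything = set().union(*BUSES.values())
    return sorted(t for t in everything if t != master and not can_access(master, t))
```

Under these assumptions, `isolated_targets("first_chip")` returns exactly the dedicated main and auxiliary memories, which is the isolation property the paragraph describes, while the second processing chip can reach every component.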

[0123] Furthermore, the second processing chip in the semiconductor device according to this embodiment may include, for example, semiconductor elements fabricated at the 2 nm technology node.

[0124] Furthermore, the second processing chip may include a group of transistors having nanosheet channels, a group of transistors with a gate-all-around (GAA) structure, a group of transistors having forksheet channels, and a group of CFETs (complementary field-effect transistors).

[0125] For example, in this embodiment, the transistors constituting the second processing chip include a group of gate-all-around (GAA) field-effect transistors, and may also include a group of GAA field-effect transistors using multiple nanosheets. The transistors constituting the second processing chip may also include a group of transistors having forksheet channels. Furthermore, they may include a group of transistors with a CFET (complementary FET) structure in which an NMOS (N-type field-effect transistor) and a PMOS (P-type field-effect transistor) are stacked vertically. By configuring the second processing chip with transistors having nanosheet channels, forksheet channels, or a CFET structure, the power consumed by the functions that realize symbiosis with humans can be reduced, and the risk that these functions fail due to radiation-induced single-event effects can be suppressed.

[0126] <Outline of the Judgment Model> Here, we outline the judgment model. As mentioned above, ethical considerations are necessary for humans and autonomous robots to coexist, so the judgment model according to this embodiment is used to determine deviations from predetermined behavioral norms.

[0127] Figure 16 shows an example of a code of conduct according to this embodiment. As shown in the figure, the code of conduct is conceived as a set in which behavioral actions that comply with it and behavioral actions that deviate from it are predefined from multiple perspectives, such as the SDGs. Furthermore, since the code of conduct required of autonomous robots in a semiconductor manufacturing plant differs from that required of autonomous robots in a medical setting, separate judgment models can be created with codes of conduct specific to semiconductor manufacturing plants and to medical settings.

[0128] Figure 16 is an example illustrating the concept of a code of conduct; the code of conduct in this embodiment can be any code of conduct as long as it defines an ethical perspective. In this embodiment, coexistence between humans and autonomous robots can be realized based on the ethical perspective defined by such a predetermined code of conduct.

[0129] By changing the evaluation axis of the interpretation model, it is possible to construct policies trained on different behavioral norms. Specifically, when creating a judgment model for an autonomous robot that performs behavioral actions in a semiconductor manufacturing plant, the prompt that requests a judgment from the interpretation model could be "whether the situation is an action that complies with the behavioral norms required of employees in a semiconductor manufacturing plant, or an action that deviates from those norms." Similarly, when creating a judgment model for an autonomous robot that operates in a medical setting, the prompt that requests a judgment from the interpretation model could be "whether the situation is an action that complies with the behavioral norms required of employees in a medical setting, or an action that deviates from those norms."
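Changing the evaluation axis in [0129] amounts to swapping one phrase in the request sent to the interpretation model. The sketch below shows that parameterization under assumptions: the setting keys, the `Situation:`/`Question:` layout, and the function name are hypothetical, and only the two example norm phrasings come from the text. It builds the prompt string only and does not call any particular language model API.

```python
from typing import Final

# Hypothetical setting keys mapped to the role phrases quoted in [0129].
SETTINGS: Final[dict[str, str]] = {
    "semiconductor_plant": "employees in a semiconductor manufacturing plant",
    "medical": "employees in a medical setting",
}


def build_interpretation_prompt(setting: str, situation: str) -> str:
    """Compose a compliance/deviation request for one evaluation axis."""
    try:
        role = SETTINGS[setting]
    except KeyError:
        raise ValueError(f"unknown setting: {setting!r}") from None
    return (
        f"Situation: {situation}\n"
        "Question: Is this situation an action that complies with the behavioral norms "
        f"required of {role}, or an action that deviates from those norms?"
    )
```

Adding a new workplace then only requires a new entry in `SETTINGS`, which mirrors the idea of training separate judgment models per code of conduct.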

[0130] Figure 17 shows examples of implementing multiple judgment models, that is, cases in which the enclave according to this embodiment has judgment models based on multiple behavioral norms. Figure 17(a) shows an example, based on Figure 16, in which health-related behavioral norms are stored. This corresponds to the configuration of the autonomous robot 10 described above, in which deviation detection processing is performed using a single behavioral norm.

[0131] On the other hand, in Figure 17(b), two types of behavioral norms, such as peace and justice, are stored in the enclave 200. In this way, deviation detection processing can also be performed using multiple behavioral norms.

[0132] Specifically, the enclave 200 includes multiple second processing chips 210, each storing a different judgment model trained on a different behavioral norm, and each second processing chip 210 performs deviation judgment processing using its judgment model. Any one of the second processing chips performing deviation judgment processing can determine the deviation judgment result using the judgment results output by the multiple judgment models. For example, if the deviation judgment by any one of the judgment models is affirmative (i.e., the behavior is determined to be unethical), a second signal can be generated. In this embodiment, the deviation judgment result can be determined by integrating the outputs of the multiple second processing chips (that is, by judging their output results comprehensively).

[0133] Furthermore, in Figure 17(c), multiple enclaves each store codes of conduct. In Figure 17(c), one enclave stores a code of conduct related to health, while the other enclave stores two codes of conduct: peace and justice.

[0134] Furthermore, as shown in Figure 17(d), it is also possible to have three or more enclaves, each storing a different code of conduct. For example, the enclave 200 includes multiple second processing chips and a third processing chip, and stores multiple different judgment models. Each second processing chip performs deviation judgment processing using its respective judgment model, and the third processing unit (third processing chip) determines the deviation judgment result using the judgment results output by the multiple judgment models. For example, the third processing chip can determine the deviation judgment result by integrating (comprehensively judging) the output results of the multiple second processing chips.

[0135] The integration referred to here includes, for example, determining the deviation judgment result by majority vote over the affirmative or negative results of the multiple judgment models, generating a second signal if even one of the output results deviates from the behavioral norm, or making a judgment based on a score. A score-based judgment can be made by assigning a predetermined weight to each judgment model and deciding the deviation judgment result according to the priority determined by these weights. In addition, when behavioral norms conflict, a priority can be predetermined, and the deviation judgment result can be determined based on that priority.
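The four integration strategies named in [0135] (majority vote, any-deviation, weighted score, and predetermined priority) can be sketched as plain functions over the per-model results that a second or third processing chip would collect. The weight normalization, the 0.5 threshold, and all names are hypothetical; [0135] states only that weights are assigned per model and that a priority may resolve conflicts.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Judgment:
    model: str      # judgment model / behavioral norm name (hypothetical, e.g. "health")
    deviates: bool  # True = positive deviation determination


def _require(js: Sequence[Judgment]) -> None:
    if not js:
        raise ValueError("at least one judgment result is required")


def majority_vote(js: Sequence[Judgment]) -> bool:
    _require(js)
    return sum(j.deviates for j in js) > len(js) / 2


def any_deviation(js: Sequence[Judgment]) -> bool:
    _require(js)
    return any(j.deviates for j in js)  # one positive result is enough


def weighted_score(js: Sequence[Judgment], weights: Mapping[str, float],
                   threshold: float = 0.5) -> bool:
    # Assumes positive weights; deviation share of total weight vs. threshold
    _require(js)
    total = sum(weights[j.model] for j in js)
    score = sum(weights[j.model] for j in js if j.deviates)
    return score / total >= threshold


def by_priority(js: Sequence[Judgment], priority: Sequence[str]) -> bool:
    # priority lists model names from highest to lowest; the top model decides
    _require(js)
    top = min(js, key=lambda j: priority.index(j.model))
    return top.deviates
```

With one deviating "health" model and two non-deviating models, `majority_vote` returns False while `any_deviation` returns True, which shows why the choice of integration rule matters for how readily a second signal is generated.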

[0136] Furthermore, each of the dedicated auxiliary storage devices may store a different judgment model, and the second processing chip may be configured to execute deviation judgment processing using the judgment models stored in the auxiliary storage devices accessible to the second processing chip.

[0137] Thus, in this embodiment, by utilizing multiple judgment models, ethical behavioral actions in the autonomous robot 10 can be determined more flexibly and accurately.

[0138] <Second Embodiment> Next, a second embodiment of the present disclosure will be described. Descriptions of configurations already detailed in the first embodiment are omitted. In the first embodiment, the main memory of the enclave 200 (dedicated main memory) was a dedicated area in the main memory 120 of the main system 100; in the second embodiment, the dedicated main memory 220 is located in the circuit system of the subsystem 200, which is separate from the main system 100. The dedicated main memory 220 is accessible only from masters (such as the second processing chip) within the subsystem 200. Whereas in the first embodiment the dedicated main memory achieved logical isolation through encryption processing or the like, in this embodiment the dedicated main memory 220 is physically isolated.

[0139] Figure 18 shows a block diagram of the system configuration according to a second embodiment of the present disclosure. As shown in the figure, the main memory 120 and the dedicated main memory 220 are configured to be physically independent. The subsystem 200 is isolated as a control system independent of the main system 100, and is configured as an enclave 200 in which at least a part including the second processing chip is isolated as an independent circuit system.

[0140] In this configuration, the dedicated main memory 220 transmits and receives signals and data only via the dedicated bus interface, so it cannot be accessed directly from the first processing chip 110 or other components; a higher degree of inviolability of the ethical judgment can therefore be achieved.

[0141] To achieve this, a different semiconductor device than those shown in Figures 13-15 can be used. Figure 19 shows an example of a 3D package semiconductor device according to a second embodiment of this disclosure.

[0142] As shown in Figures 19(a) and (b), in the semiconductor device 530, the enclave 200 is composed of a subchip 200a configured as an SoC including a second processing chip and a dedicated main memory 220a positioned (wired) directly above the subchip 200a. The dedicated main memory 220a is positioned directly above the second processing chip 210. In addition, in the semiconductor device, the enclave may be composed of a subchip and a dedicated auxiliary memory device positioned directly above the subchip.

[0143] Furthermore, as shown in Figure 19(c), the semiconductor device 530 has a silicon bridge configured similarly to the semiconductor device 520 shown in Figure 15, enabling high-speed processing between the enclave 200 and the memory devices 400. For example, as shown in Figure 20, a semiconductor device 540 can be configured using optical I/O in a 3D package semiconductor device. Such a configuration makes it easier to transmit and receive signals to and from the outside world and enables higher speeds.

[0144] <Modified Semiconductor Devices> Figure 21 schematically shows several modifications of the semiconductor device of this disclosure. Some of these configurations overlap technically or structurally with the first and second embodiments, but they are explained using simplified drawings for clarity.

[0145] Figures 21(a) and 21(b) show semiconductor devices in a 2D package. Figure 21(a) consists only of the enclave 200 and memory components 400, while Figure 21(b) consists of the enclave 200, main system 100, and memory components 400.

[0146] Figures 21(c) and 21(d) show semiconductor devices in 2.xD and 2.5D packages. In Figure 21(c), the enclave 200 is directly connected to the substrate, and the main system 100 is connected to the interposer. The positions of the enclave 200 and the main system 100 in Figure 21(c) can also be swapped.

[0147] In Figure 21(d), the main system 100, enclave 200, and memory devices 400 are connected to the same interposer. Alternatively, the positions of the main system 100 and enclave 200 may be swapped in Figure 21(d). Figure 21(e) shows a semiconductor device in a 3D package. In this configuration, a dedicated main memory device 220a is positioned (connected) directly above the subchip 200a, which includes the second processing chip. Furthermore, the configurations of the semiconductor devices (and autonomous robots) according to the above embodiments can be combined as appropriate.

[0148] As described above, according to this disclosure, by using a determination model capable of performing deviation detection processing from a predetermined code of conduct, it is possible to realize an autonomous robot that can coexist with humans in light of the singularity, and a semiconductor device for realizing this.

[0149] Furthermore, while this disclosure primarily describes autonomous robots and semiconductor devices in detail, it goes without saying that the same effects can be obtained with respect to control methods and control programs for autonomous robots.

[0150] The disclosures herein include the following: an autonomous robot, a method for controlling the autonomous robot, a control program for the autonomous robot, and a semiconductor device.

[0151] (Item 1) An autonomous robot having an operating unit and a processing unit, wherein the operating unit converts a signal from the processing unit into an action of the autonomous robot, the processing unit has a first processing unit and a second processing unit, the signal has a first signal and a second signal, the first processing unit generates the first signal which controls the operating unit including operation and stopping, the second processing unit inputs the first signal to a judgment model and performs a deviation determination process from the behavioral norms of the autonomous robot relating to the first signal, and the processing unit generates the second signal which suppresses the action if the deviation determination result is a positive determination.

[0152] (Item 2) The autonomous robot according to Item 1, wherein the suppression of the behavioral action by the second signal includes canceling the control command by the first signal for executing the behavioral action, or forcibly stopping the behavioral action being performed by the operating unit by the first signal.

[0153] (Item 3) The autonomous robot according to Item 1 or 2, wherein the first processing unit is a first processing chip, the second processing unit is a second processing chip, the autonomous robot comprises a main system including the first processing chip and the operating unit, and a subsystem including the second processing chip and storing the determination model, the subsystem is an enclave isolated as a control system independent of the main system, and at least a part of it, including the second processing chip, is isolated as an independent circuit system.

[0154] (Item 4) The autonomous robot according to any one of Items 1 to 3, wherein the main system further includes a main memory and an auxiliary memory connected so as to be accessible from the first processing chip, the subsystem further includes a dedicated auxiliary memory connected to the second processing chip and accessible only from the second processing chip, and the dedicated auxiliary memory stores the determination model.

[0155] (Item 5) The autonomous robot according to any one of Items 1 to 4, wherein the subsystem is further connected to the second processing chip and includes a dedicated main memory accessible only from the second processing chip, and the second processing chip performs the deviation determination process using the determination model and the dedicated main memory.

[0156] (Item 6) The autonomous robot according to any one of Items 1 to 4, wherein the subsystem further includes a dedicated area, configured in the main memory, that is accessible only from the second processing chip, and the second processing chip performs the deviation determination process using the determination model and the dedicated area.

[0157] (Item 7) The autonomous robot according to any one of Items 1 to 6, wherein the auxiliary storage device stores the control model for the control.

[0158] (Item 8) The autonomous robot according to any one of Items 1 to 7, wherein the subsystem includes a plurality of second processing chips, which store a plurality of different judgment models, each second processing chip performs the deviation judgment process using each of the judgment models, and any of the plurality of second processing chips that perform the deviation judgment process determines the deviation judgment result using a plurality of judgment processing results output by each of the plurality of judgment models.

[0159] (Item 9) The autonomous robot according to any one of Items 1 to 7, wherein the subsystem includes a plurality of second processing chips and a third processing chip, stores a plurality of different judgment models, each second processing chip performs the deviation judgment process using each of the judgment models, and the third processing chip determines the deviation judgment result using a plurality of judgment processing results output by each of the plurality of judgment models.

[0160] (Item 10) The autonomous robot according to Item 8 or 9, wherein the subsystem includes a plurality of dedicated auxiliary storage devices that each of the second processing chips can uniquely access, each of the dedicated auxiliary storage devices storing a different determination model, and the second processing chip performs the deviation determination process using the determination models stored in the dedicated auxiliary storage devices that the second processing chip can access.

[0161] (Item 11) The autonomous robot according to any one of Items 1 to 10, wherein the judgment model is a trained model.

[0162] (Item 12) The autonomous robot described in Item 11, wherein the decision model is a trained model that has learned features based on the first signal and compliance with or deviation from the code of conduct.

[0163] (Item 13) The autonomous robot according to any one of Items 1 to 11, wherein the decision model is a trained model that has undergone reinforcement learning processing to update the probability distribution of the policy based on an evaluation result based on features obtained from the environment after the agent has performed the behavioral action based on the first signal, the evaluation result concerning an ethical interpretation of the situation in which the environment is situated, and the decision result indicating the ethical interpretation output from the decision model by inputting features including the first signal.

[0164] (Item 14) The autonomous robot described in Item 13, wherein the environment is a virtual environment.

[0165] (Item 15) The autonomous robot described in Item 14, wherein the virtual environment is a world model generated using one or more image data acquired in a physical environment or another virtual environment, and the agent's behavioral information.

[0166] (Item 16) The autonomous robot according to any one of Items 11 to 15, further comprising a detection unit, an extraction unit, an interpretation unit, and a learning unit, wherein the detection unit detects environmental information indicating the state of the physical environment after the autonomous robot has performed the behavioral action based on the first signal, the extraction unit extracts feature quantities from the environmental information, the interpretation unit estimates an ethical interpretation of the state indicated by the environmental information based on the feature quantities, and the learning unit performs reinforcement learning processing to update the probability distribution of the policy based on the interpretation result of the ethical interpretation and a judgment result indicating the ethical interpretation output from the judgment model by inputting the feature quantities including the first signal.

[0167] (Item 17) The autonomous robot according to Item 16, wherein the interpretation result is an interpretation result regarding the code of conduct output from a large language model based on the features and the request for interpretation of compliance with or deviation from the code of conduct.

[0168] (Item 18) The autonomous robot according to any one of Items 1 to 17, wherein the moving parts constitute the joints of the autonomous robot and are connected to the head, shoulders, torso, arms, waist, and legs of the autonomous robot, respectively.

[0169] (Item 19) An autonomous robot according to any one of Items 1 to 17, comprising a main body and a moving mechanism, wherein the main body has an action unit that performs the output of sound or video as the action.

[0170] (Item 20) An autonomous robot according to any one of Items 1 to 17, comprising a main body, a moving mechanism, an arm, and one or more joints connected to at least the main body and the arm, wherein the joints have an action unit that performs a motor action as the behavioral action.

[0171] (Item 21) An autonomous robot according to any one of Items 1 to 17, comprising a main body, a moving mechanism, an arm, and one or more joints connected to at least the main body and the arm, wherein the operating unit includes a first operating unit and a second operating unit, the main body has the first operating unit which performs the output of sound or video as the behavioral action, and the joints have the second operating unit which performs the movement as the behavioral action.

[0172] (Item 22) An autonomous robot according to any one of Items 19 to 21, wherein the mobility mechanism is a wheel, track, flight mechanism, or legged mobility mechanism.

[0173] (Item 23) The autonomous robot according to any one of Items 1 to 22, further comprising a detection unit, the detection unit acquires external environment data provided by the external environment including at least static or moving objects around the autonomous robot, the first processing unit further generates the first signal using the external environment data, and the second processing unit further performs the deviation determination process using the external environment data.

[0174] (Item 24) The autonomous robot according to Item 23, wherein the detection unit includes one or any combination of a sound sensor, a distance sensor, a temperature sensor, an odor sensor, a humidity sensor, a barometric pressure sensor, a gas sensor, a weight sensor, and a pressure sensor.

[0175] (Item 25) A semiconductor device for an autonomous robot having an operating unit, the semiconductor device comprising processing chips including a first processing chip and a second processing chip, wherein the first processing chip generates a first signal that controls the operating unit, including its operation and stopping, and that is converted into a behavioral action of the robot when executed by the operating unit; the second processing chip performs a deviation determination process with respect to a behavioral norm using the first signal and a determination model; and the processing chips generate a second signal that suppresses the behavioral action if the deviation determination result is positive.

[0176] (Item 26) The semiconductor device according to Item 25, wherein the suppression of the behavioral action by the second signal includes canceling the control command, issued by the first signal, for performing the behavioral action, or forcibly stopping the behavioral action that the operating unit is performing in response to the first signal.
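
Item 26 distinguishes two forms of suppression: canceling a command that has not yet been executed, and forcibly stopping an action already in progress. A minimal sketch of that choice follows, assuming a hypothetical operating unit with a pending-command queue and a single running action, and a determination model that returns a degree of deviation compared against an assumed threshold of 0.5. None of these names or values come from the disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Optional


class SecondSignal(Enum):
    CANCEL_COMMAND = auto()  # drop a command that has not started yet
    FORCE_STOP = auto()      # halt an action that is already running


@dataclass
class OperatingUnit:
    """Hypothetical operating unit: queued commands plus one running action."""
    pending: List[str] = field(default_factory=list)
    running: Optional[str] = None

    def accept(self, command: str) -> None:
        self.pending.append(command)

    def start_next(self) -> None:
        if self.running is None and self.pending:
            self.running = self.pending.pop(0)

    def apply(self, second: SecondSignal, command: str) -> None:
        if second is SecondSignal.CANCEL_COMMAND and command in self.pending:
            self.pending.remove(command)
        elif second is SecondSignal.FORCE_STOP and self.running == command:
            self.running = None


def supervise(unit: OperatingUnit, command: str,
              deviation_degree: Callable[[str], float],
              threshold: float = 0.5) -> Optional[SecondSignal]:
    """Second processing chip (sketch): evaluate the first signal and, on a
    positive determination, issue the matching second signal."""
    if deviation_degree(command) < threshold:
        return None  # negative determination: the action proceeds
    second = (SecondSignal.FORCE_STOP if unit.running == command
              else SecondSignal.CANCEL_COMMAND)
    unit.apply(second, command)
    return second
```

The same determination result thus maps to a different second signal depending only on whether the operating unit has already begun executing the first signal.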

[0177] (Item 27) The semiconductor device according to Item 25 or Item 26, further comprising a dedicated auxiliary storage device capable of storing the determination model representing the code of conduct, wherein the second processing chip and the dedicated auxiliary storage device are mounted in an enclave.

[0178] (Item 28) The semiconductor device according to any one of Items 25 to 27, further comprising a common bus interface and a dedicated bus interface, a main memory and an auxiliary memory, and a dedicated main memory and a dedicated auxiliary memory, wherein the common bus interface connects the first processing chip, the operating unit, the main memory, the auxiliary memory, and the second processing chip; the dedicated bus interface connects the second processing chip, the dedicated main memory, and the dedicated auxiliary memory; and the dedicated main memory and the dedicated auxiliary memory are isolated, as a control system and a circuit system, from the first processing chip, the main memory, and the auxiliary memory.
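
The connectivity in Item 28 can be checked as a simple graph property: the second processing chip sits on both buses, while the dedicated memories are reachable only over the dedicated bus. The sketch below is a software model of that topology under assumed component names; it does not model the hardware isolation mechanism itself (for example, the memory protection block).

```python
from enum import Enum
from typing import Dict, Set


class Bus(Enum):
    COMMON = "common bus interface"
    DEDICATED = "dedicated bus interface"


# Hypothetical component names. The common bus joins the first chip, the
# operating unit, main/auxiliary memory, and the second chip; the dedicated
# bus joins only the second chip and its dedicated memories.
BUS_MEMBERS: Dict[Bus, Set[str]] = {
    Bus.COMMON: {"first_chip", "second_chip", "operating_unit",
                 "main_memory", "aux_memory"},
    Bus.DEDICATED: {"second_chip", "dedicated_main_memory",
                    "dedicated_aux_memory"},
}

DEDICATED_MEMORIES = {"dedicated_main_memory", "dedicated_aux_memory"}


def can_access(requester: str, target: str) -> bool:
    """True if at least one bus connects the requester and the target."""
    return any(requester in members and target in members
               for members in BUS_MEMBERS.values())


def isolation_holds() -> bool:
    """Check that no component other than the second chip can reach
    either dedicated memory."""
    everyone = set().union(*BUS_MEMBERS.values())
    outsiders = everyone - DEDICATED_MEMORIES - {"second_chip"}
    return all(not can_access(r, d)
               for r in outsiders for d in DEDICATED_MEMORIES)
```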

[0179] (Item 29) The semiconductor device according to any one of Items 25 to 28, wherein the second processing chip includes a group of field-effect transistors having nanosheet channels.

[0180] (Item 30) The semiconductor device according to Item 25, wherein the second processing chip includes a group of field-effect transistors having forksheet channels.

[0181] (Item 31) The semiconductor device according to any one of Items 25 to 30, wherein the second processing chip includes a group of CFETs (complementary field-effect transistors).

[0182] (Item 32) The semiconductor device according to any one of Items 25 to 31, wherein the first processing chip and the second processing chip are connected to the same interposer substrate.

[0183] (Item 33) The semiconductor device according to any one of Items 25 to 32, wherein the dedicated main memory is located directly above the second processing chip.

[0184] (Item 34) A semiconductor device according to any one of Items 25 to 33, wherein the first processing chip and the second processing chip are connected to the operating unit, the operating unit converts the first signal and the second signal into the behavioral operation, the first processing chip performs the control including the operation and stopping of the operating unit by inputting the first signal to the operating unit, and the processing chip suppresses the behavioral operation by inputting the second signal to the operating unit.

[0185] (Item 35) The semiconductor device according to any one of Items 25 to 34, wherein the first processing chip and the second processing chip are connected to a detection unit; the detection unit acquires external environment data on the surroundings, including at least stationary or moving objects around the autonomous robot; the first processing chip further generates the first signal using the external environment data; and the second processing chip further performs the deviation determination process using the external environment data.

[0186] (Item 36) The semiconductor device according to Item 35, wherein the detection unit includes one or any combination of a sound sensor, a distance sensor, a temperature sensor, an odor sensor, a humidity sensor, a barometric pressure sensor, a gas sensor, a weight sensor, and a pressure sensor.

[0187] (Item 37) A method for controlling an autonomous robot having an operating unit and a processing unit, wherein the operating unit converts a signal from the processing unit into a behavioral action of the autonomous robot; the processing unit has a first processing unit and a second processing unit; the signal includes a first signal and a second signal; the first processing unit generates the first signal, which controls the operating unit including its operation and stopping; the second processing unit inputs the first signal to a determination model and performs a deviation determination process with respect to the behavioral norms of the autonomous robot relating to the first signal; and the processing unit generates the second signal, which suppresses the behavioral action, if the deviation determination result is positive.

[0188] (Item 38) A control program for an autonomous robot, the autonomous robot having a processing unit that includes a first processing unit and a second processing unit, and an operating unit that converts signals from the processing unit into behavioral actions of the autonomous robot, wherein the control program causes the second processing unit to input a first signal, which is generated by the first processing unit and controls the operating unit including its operation and stopping, into a determination model; to execute a deviation determination process with respect to the behavioral norms of the autonomous robot relating to the first signal; and, if the deviation determination result is positive, to generate a second signal for suppressing the behavioral action.

[0189] (Item 39) A system comprising at least a driving means and a calculation means, wherein the driving means converts a signal from the calculation means into a kinetic operation of the system, the calculation means comprises a first calculation means and a second calculation means, the signal comprises a first signal and a second signal, the first calculation means controls the driving means, including operation and stopping, by inputting the first signal to the driving means, the second calculation means inputs the first signal to a determination model to determine the degree of deviation of the system from the behavioral norms relating to the first signal, and inputs the second signal to the driving means to forcibly stop the driving means based on the result of the determination.

[0190] (Item 40) The system according to Item 39, wherein the first calculation means is connected to a first main memory means and a first auxiliary storage means, and the second calculation means is connected to a second main memory means and a second auxiliary storage means; the first main memory means is accessible from the first and second calculation means and operates as the main memory for the control; the first auxiliary storage means is accessible from the first and second calculation means and stores a control model for the control; the second main memory means is accessible only from the second calculation means and operates as the main memory for the determination and the forced stopping; and the second auxiliary storage means is accessible only from the second calculation means and stores the determination model.

[0191] (Item 41) The system according to Item 39 or Item 40, wherein the second calculation means comprises a plurality of second calculation means, each of which inputs the first signal to a corresponding one of a plurality of determination models and performs the determination; the second auxiliary storage means comprises a plurality of second auxiliary storage means, each uniquely accessible from a corresponding one of the second calculation means; and each of the second auxiliary storage means stores the corresponding determination model.
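
Item 41 runs several determination models side by side, each on its own second calculation means, but does not state how their individual results are combined into one determination. The sketch below assumes two illustrative combination rules, "any" (deviation if any model flags it) and "majority" (deviation if more than half do). Both rules, the function names, and the example models are hypothetical; the loop merely stands in for evaluation on separate chips.

```python
from typing import Callable, List, Sequence

# A determination model maps a first signal (here, a vector of numbers)
# to True for a deviation and False for compliance.
DeterminationModel = Callable[[Sequence[float]], bool]


def run_models(first_signal: Sequence[float],
               models: List[DeterminationModel]) -> List[bool]:
    """Evaluate every determination model on the same first signal.
    On hardware each model would run on its own second calculation means."""
    return [model(first_signal) for model in models]


def aggregate(results: List[bool], rule: str = "majority") -> bool:
    """Combine per-model results into a single deviation determination."""
    if rule == "any":
        return any(results)
    if rule == "majority":
        return sum(results) > len(results) / 2
    raise ValueError(f"unknown rule: {rule}")
```

For example, with three toy models and the first signal `[1.5, 0.2]`, only the first model flags a deviation, so the "any" rule reports a positive determination while the "majority" rule does not. The "any" rule is the more conservative choice for safety, at the cost of more false stops.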

[0192] (Item 42) The system according to any one of Items 39 to 41, wherein the determination model is a trained model.
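
Item 42 states only that the determination model is a trained model, without specifying the learning method. As one possible reading, the sketch below trains a logistic-regression classifier by batch gradient descent on labeled feature vectors derived from first signals (label 1 = deviates from the code of conduct, 0 = complies). The two-feature layout (commanded force, closeness to a human, both scaled to 0..1), the learning rate, the epoch count, and the training data are synthetic assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(data: List[Tuple[Sequence[float], int]],
                   lr: float = 0.5,
                   epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights w and bias b by batch gradient descent on the log loss.
    Labels: 1 = deviation from the code of conduct, 0 = compliance."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * gw / len(data) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def predict_deviation(w: Sequence[float], b: float, x: Sequence[float],
                      threshold: float = 0.5) -> bool:
    """Positive determination if the predicted deviation probability
    reaches the threshold."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold


# Synthetic, illustrative training data: [commanded_force, closeness_to_human].
TRAINING_DATA = [([0.1, 0.1], 0), ([0.2, 0.3], 0), ([0.3, 0.1], 0),
                 ([0.8, 0.9], 1), ([0.9, 0.7], 1), ([0.7, 0.8], 1)]
```

A trained model of this kind would sit in the second auxiliary storage means and be evaluated by the second calculation means; a practical determination model would likely be far richer than a linear classifier.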

[0193] (Item 43) The system according to any one of Items 39 to 42, wherein the driving means is connected to the head, shoulders, torso, arms, waist, and legs, and each of the head, shoulders, torso, arms, waist, and legs is connected to joints.

[0194] (Item 44) The system according to any one of Items 39 to 43, further comprising a detection means, wherein the detection means detects objects, including stationary or moving objects, in the vicinity of the system, and the second calculation means further inputs the detection result into the determination model and performs the determination.

[0195] (Item 45) A semiconductor device having at least a first calculation means and a second calculation means, wherein the first calculation means and the second calculation means are connected via a first interface to a first main memory means and a first auxiliary storage means; the second calculation means is connected via a second interface to a second auxiliary storage means accessible only from the second calculation means, and via the first or second interface to a second main memory means accessible only from the second calculation means; and the second auxiliary storage means stores a determination model capable of determining the degree of deviation from a code of conduct.

[0196] (Item 46) The semiconductor device according to Item 45, wherein the first and second calculation means are connected to the same interposer substrate.

[0197] (Item 47) The semiconductor device according to Item 45 or Item 46, wherein the second main memory means is located directly above the second calculation means.

[0198] (Item 48) The semiconductor device according to any one of Items 45 to 47, wherein the first and second calculation means are connected to a driving means; the driving means converts signals from the first and second calculation means into kinetic motion; the signals include a first signal and a second signal; the first calculation means controls the driving means, including its operation and stopping, by inputting the first signal to the driving means; and the second calculation means inputs the first signal to the determination model to determine the degree of deviation and, based on the result of the determination, inputs the second signal to the driving means to forcibly stop it.

[0199] (Item 49) The semiconductor device according to any one of Items 45 to 48, wherein the first and second calculation means are further connected to a detection means, the detection means detects objects including stationary or moving objects around the device, and the second calculation means further inputs the result of the detection to the determination model and performs the determination.

[0200] (Item 50) The semiconductor device according to Item 49, wherein the detection means has an image sensor for performing the detection.

[0201] The present invention is not limited to the embodiments described above, and various modifications and variations are possible without departing from the spirit and scope of the invention.

[0202]
10 Autonomous robot
20 Human
100 Main system
110 First processing chip (first processing unit, motion control unit)
120 Main memory
130 Auxiliary memory
140 Sensor (detection unit)
150 Actuator (motion unit)
200 Enclave (subsystem)
210 Second processing chip (second processing unit)
211 Learning unit
212 Judgment unit
213 Extraction unit
214 Generation unit
220 Dedicated main memory (dedicated area)
230 Dedicated auxiliary memory
240 AI/ML accelerator
250 Encryption accelerator
260 Dedicated boot ROM
270 Memory protection
310 Common bus interface
320 Dedicated bus interface
400 Memory components
500 2D package semiconductor device
510 2.xD package semiconductor device
520 2.5D package semiconductor device
530 3D package semiconductor device
540 Semiconductor device with optical I/O
600 Information processing device
610 Processing unit
620 Storage unit
630 Communication unit

Claims

1. An autonomous robot having an operating unit and a processing unit, wherein the operating unit converts signals from the processing unit into behavioral actions of the autonomous robot; the processing unit has a first processing unit and a second processing unit; the signals comprise a first signal and a second signal; the first processing unit generates the first signal, which controls the operation and stopping of the operating unit; the second processing unit inputs the first signal to a determination model and performs a deviation determination process with respect to the behavioral norms of the autonomous robot relating to the first signal; and the processing unit generates the second signal, which suppresses the behavioral action, when the deviation determination result is positive.

2. The autonomous robot according to claim 1, wherein the suppression of the behavioral action by the second signal includes canceling the control command, issued by the first signal, for executing the behavioral action, or forcibly stopping the behavioral action being performed by the operating unit in response to the first signal.

3. The autonomous robot according to claim 1, wherein the first processing unit is a first processing chip; the second processing unit is a second processing chip; the autonomous robot comprises a main system including the first processing chip and the operating unit, and a subsystem that includes the second processing chip and stores the determination model; and the subsystem is an enclave isolated as a control system independent of the main system, at least a portion of which, including the second processing chip, is isolated as an independent circuit system.

4. The autonomous robot according to claim 3, wherein the main system further includes a main memory and an auxiliary storage device connected to the first processing chip so as to be accessible from the first processing chip; the subsystem further includes a dedicated auxiliary storage device connected to the second processing chip and accessible only from the second processing chip; and the dedicated auxiliary storage device stores the determination model.

5. The autonomous robot according to claim 4, wherein the subsystem further includes a dedicated main memory connected to the second processing chip and accessible only from the second processing chip, and the second processing chip performs the deviation determination process using the determination model and the dedicated main memory.

6. The autonomous robot according to claim 4, wherein the subsystem further includes a dedicated area, configured in the main memory, that is accessible only from the second processing chip, and the second processing chip performs the deviation determination process using the determination model and the dedicated area.

7. The autonomous robot according to claim 4, wherein the auxiliary storage device stores a control model for the control.

8. The autonomous robot according to claim 3, wherein the subsystem includes a plurality of the second processing chips and stores a plurality of different determination models; each of the second processing chips executes the deviation determination process using a respective one of the determination models; and any one of the plurality of second processing chips that perform the deviation determination process determines the deviation determination result using the plurality of determination results output by the respective determination models.

9. The autonomous robot according to claim 3, wherein the subsystem includes a plurality of second processing chips and a third processing chip and stores a plurality of different determination models; each of the second processing chips executes the deviation determination process using a respective one of the determination models; and the third processing chip determines the deviation determination result using the plurality of determination results output by the respective determination models.

10. The autonomous robot according to claim 8 or 9, wherein the subsystem includes a plurality of dedicated auxiliary storage devices, each uniquely accessible from a respective one of the second processing chips; each of the dedicated auxiliary storage devices stores a different determination model; and each second processing chip performs the deviation determination process using the determination model stored in the dedicated auxiliary storage device accessible by that second processing chip.

11. The autonomous robot according to claim 1, wherein the determination model is a pre-trained model.

12. The autonomous robot according to claim 1, wherein the determination model is a trained model that has learned features based on the first signal and on compliance with or deviation from the code of conduct.

13. The autonomous robot according to claim 1, wherein the determination model is a trained model that has undergone reinforcement learning processing which updates the probability distribution of a policy based on an evaluation result and on a judgment result, the evaluation result being based on features obtained from the environment after an agent has performed the behavioral action based on the first signal and including an evaluation of an ethical interpretation of the state of the environment, and the judgment result indicating the ethical interpretation and being output from the determination model when a feature quantity including the first signal is input to it.

14. The autonomous robot according to claim 13, wherein the environment is a virtual environment.

15. The autonomous robot according to claim 14, wherein the virtual environment is a world model generated using the agent's behavioral information and one or more pieces of image data acquired in a physical environment or in another virtual environment.

16. The autonomous robot according to claim 1, further comprising a detection unit, an extraction unit, an interpretation unit, and a learning unit, wherein the detection unit detects environmental information indicating the state of the physical environment after the autonomous robot has performed the behavioral action based on the first signal; the extraction unit extracts feature quantities from the environmental information; the interpretation unit estimates an ethical interpretation of the state indicated by the environmental information based on the feature quantities; and the learning unit performs reinforcement learning processing that updates the probability distribution of a policy based on the interpretation result of the ethical interpretation and on the judgment result indicating the ethical interpretation output from the determination model when the feature quantity including the first signal is input to it.

17. The autonomous robot according to claim 16, wherein the interpretation result is an interpretation regarding the code of conduct output from a large language model given the feature quantities and a request to interpret compliance with or deviation from the code of conduct.

18. The autonomous robot according to claim 1, wherein the operating unit constitutes the joints of the autonomous robot and is connected to the head, shoulders, torso, arms, waist, and legs of the autonomous robot, respectively.

19. The autonomous robot according to claim 1, comprising a main body and a moving mechanism, wherein the main body has an operating unit that outputs sound or video as the behavioral action.

20. The autonomous robot according to claim 1, comprising a main body, a moving mechanism, an arm, and one or more joints connected to at least the main body and the arm, wherein the joints have an operating unit that performs movement as the behavioral action.

21. The autonomous robot according to claim 1, comprising a main body, a moving mechanism, an arm, and one or more joints connected to at least the main body and the arm, wherein the operating unit includes a first operating unit and a second operating unit; the main body has the first operating unit, which outputs sound or video as the behavioral action; and the joints have the second operating unit, which performs movement as the behavioral action.

22. The autonomous robot according to any one of claims 19 to 21, wherein the moving mechanism is a wheel mechanism, an endless track, a flight mechanism, or a legged locomotion mechanism.

23. The autonomous robot according to claim 1, further comprising a detection unit, wherein the detection unit acquires external environment data on the surroundings, including at least stationary or moving objects around the autonomous robot; the first processing unit further generates the first signal using the external environment data; and the second processing unit further performs the deviation determination process using the external environment data.

24. The autonomous robot according to claim 23, wherein the detection unit includes one or any combination of a sound sensor, a distance sensor, a temperature sensor, an odor sensor, a humidity sensor, a barometric pressure sensor, a gas sensor, a weight sensor, and a pressure sensor.

25. A semiconductor device for an autonomous robot having an operating unit, comprising processing chips including a first processing chip and a second processing chip, wherein the first processing chip generates a first signal that controls the operation and stopping of the operating unit and that, when executed by the operating unit, is converted into a behavioral action of the robot; the second processing chip performs a deviation determination process with respect to the behavioral norms using the first signal and a determination model; and the processing chips generate a second signal that suppresses the behavioral action if the deviation determination result is positive.

26. The semiconductor device according to claim 25, wherein the suppression of the behavioral action by the second signal includes canceling the control command, issued by the first signal, for executing the behavioral action, or forcibly stopping the behavioral action being performed by the operating unit in response to the first signal.

27. The semiconductor device according to claim 25, further comprising a dedicated auxiliary storage device capable of storing the determination model representing the code of conduct, wherein the second processing chip and the dedicated auxiliary storage device are mounted in an enclave.

28. The semiconductor device according to claim 27, further comprising a common bus interface and a dedicated bus interface, a main memory and an auxiliary storage device, and a dedicated main memory and a dedicated auxiliary storage device, wherein the common bus interface connects the first processing chip, the operating unit, the main memory, the auxiliary storage device, and the second processing chip; the dedicated bus interface connects the second processing chip, the dedicated main memory, and the dedicated auxiliary storage device; and the dedicated main memory and the dedicated auxiliary storage device are isolated, as a control system and a circuit system, from the first processing chip, the main memory, and the auxiliary storage device.

29. The semiconductor device according to claim 25, wherein the second processing chip includes a group of field-effect transistors having nanosheet channels.

30. The semiconductor device according to claim 25, wherein the second processing chip includes a group of field-effect transistors having forksheet channels.

31. The semiconductor device according to claim 25, wherein the second processing chip includes a group of CFETs (complementary field-effect transistors).

32. The semiconductor device according to claim 25, wherein the first processing chip and the second processing chip are connected to the same interposer substrate.

33. The semiconductor device according to claim 28, wherein the dedicated main memory is located directly above the second processing chip.

34. The semiconductor device according to claim 25, wherein the first processing chip and the second processing chip are connected to the operating unit; the operating unit converts the first signal and the second signal into the behavioral action; the first processing chip performs the control, including the operation and stopping of the operating unit, by inputting the first signal to the operating unit; and the processing chips suppress the behavioral action by inputting the second signal to the operating unit.

35. The semiconductor device according to claim 25, wherein the first processing chip and the second processing chip are connected to a detection unit; the detection unit acquires external environment data on the surroundings, including at least stationary or moving objects around the autonomous robot; the first processing chip further generates the first signal using the external environment data; and the second processing chip further performs the deviation determination process using the external environment data.

36. The semiconductor device according to claim 35, wherein the detection unit includes one or any combination of a sound sensor, a distance sensor, a temperature sensor, an odor sensor, a humidity sensor, a barometric pressure sensor, a gas sensor, a weight sensor, and a pressure sensor.

37. A control method for an autonomous robot having an operating unit and a processing unit, wherein the operating unit converts signals from the processing unit into behavioral actions of the autonomous robot; the processing unit has a first processing unit and a second processing unit; the signals comprise a first signal and a second signal; the first processing unit generates the first signal, which controls the operation and stopping of the operating unit; the second processing unit inputs the first signal to a determination model and performs a deviation determination process with respect to the behavioral norms of the autonomous robot relating to the first signal; and the processing unit generates the second signal, which suppresses the behavioral action, if the deviation determination result is positive.

38. A control program for an autonomous robot, the autonomous robot having a processing unit including a first processing unit and a second processing unit, and an operating unit that converts signals from the processing unit into behavioral actions of the autonomous robot, wherein the control program causes the second processing unit to input a first signal, generated by the first processing unit to control the operating unit including its operation and stopping, to a determination model; to execute a deviation determination process with respect to the behavioral norms of the autonomous robot relating to the first signal; and, if the deviation determination result is positive, to generate a second signal for suppressing the behavioral action.
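
The claims describe reinforcement learning that updates the probability distribution of a policy from an ethical interpretation result and from the judgment output by the determination model, but they do not name a specific algorithm. As an illustration only, the sketch below substitutes a standard REINFORCE policy-gradient update for a softmax policy over a few discrete actions. The reward is assumed to be the sum of two hypothetical scores: `interpret(a)`, a stand-in for the interpretation unit (for example, a language-model-based evaluation), and `judge(a)`, a stand-in for the determination model's judgment. The running-average baseline, step count, and learning rate are also assumptions.

```python
import math
import random
from typing import Callable, List


def softmax(prefs: List[float]) -> List[float]:
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(n_actions: int,
              interpret: Callable[[int], float],
              judge: Callable[[int], float],
              steps: int = 2000,
              lr: float = 0.1,
              baseline_rate: float = 0.01,
              seed: int = 0) -> List[float]:
    """Learn a softmax policy with REINFORCE; returns the final pi(a)."""
    rng = random.Random(seed)
    prefs = [0.0] * n_actions
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(prefs)
        a = rng.choices(range(n_actions), weights=probs)[0]
        # Hypothetical reward: ethical interpretation score + model judgment.
        reward = interpret(a) + judge(a)
        advantage = reward - baseline
        baseline += baseline_rate * (reward - baseline)
        # For a softmax policy, d log pi(a) / d pref_i = 1[i == a] - pi(i).
        for i in range(n_actions):
            indicator = 1.0 if i == a else 0.0
            prefs[i] += lr * advantage * (indicator - probs[i])
    return softmax(prefs)
```

For example, with three hypothetical actions scored `interpret = [1.0, -1.0, 0.5]` and `judge = [1.0, -1.0, 0.0]`, the probability mass shifts toward action 0, which both scores rate as compliant, and away from action 1, which both rate as deviating. The baseline only reduces variance; it does not change the expected direction of the update.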