167 results about "Imitation learning" patented technology

Imitation learning is learning by imitation, in which an individual observes the behavior of a demonstrator and replicates that behavior.
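As a minimal illustration of this definition, the sketch below fits a linear policy to synthetic (state, action) pairs recorded from a demonstrator; the data and the least-squares "policy" are illustrative assumptions, not part of any patent on this page.

```python
# Minimal behavioural-cloning sketch on synthetic demonstration data.
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))       # demonstrator observations
expert_w = rng.normal(size=(4, 2))
actions = states @ expert_w              # the demonstrator's actions

# Least-squares "policy": the simplest imitator of the demonstration set.
w, *_ = lstsq(states, actions, rcond=None)
print("imitation error:", np.abs(states @ w - actions).max())
```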

Learning by imitation dialogue generation method based on generative adversarial networks

The invention relates to a learning-by-imitation dialogue generation method based on generative adversarial networks. The method comprises the following steps: 1) building an expert corpus of dialogue statements; 2) building the generative adversarial network, wherein the generator comprises an encoder-decoder pair; 3) building a false corpus; 4) performing initial classification training of the discriminator; 5) inputting an input statement into the generator, and training the encoder and decoder in the generator through a reinforcement learning architecture; 6) adding the output statements generated in step 5) to the false corpus and continuing to train the discriminator; 7) alternately training the generator and the discriminator in the usual adversarial fashion until both converge. Compared with the prior art, the method generates statements closer to human utterances, avoids the emergence of overly generic answers, improves the training of the dialogue generation model, and addresses the problem of excessively frequent generic answers.
Owner:TONGJI UNIV
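A compressed sketch of the alternating scheme above: a GRU encoder-decoder generator samples replies token by token, the discriminator is trained to separate expert from generated statements, and the generator is updated by REINFORCE with the discriminator's score as reward. The vocabulary, toy "expert" replies, and network sizes are placeholders, not the patented architecture.

```python
import torch, torch.nn as nn

V, H, T = 50, 32, 8                                   # vocab, hidden, length
emb = nn.Embedding(V, H)
encoder = nn.GRU(H, H, batch_first=True)
decoder = nn.GRU(H, H, batch_first=True)
head = nn.Linear(H, V)                                # decoder state -> vocab logits
disc = nn.Sequential(nn.Linear(H, 1), nn.Sigmoid())   # real/fake on mean state
opt_g = torch.optim.Adam([*emb.parameters(), *encoder.parameters(),
                          *decoder.parameters(), *head.parameters()], lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCELoss()

def generate(src):
    _, h = encoder(emb(src))                          # encode the input statement
    tok, logps, hs = src[:, :1], [], []
    for _ in range(T):                                # sample a reply token by token
        out, h = decoder(emb(tok), h)
        dist = torch.distributions.Categorical(logits=head(out[:, -1]))
        tok = dist.sample().unsqueeze(1)
        logps.append(dist.log_prob(tok.squeeze(1)))
        hs.append(out[:, -1])
    return torch.stack(logps, 1), torch.stack(hs, 1).mean(1)

for _ in range(3):                                    # alternate D and G updates
    src = torch.randint(V, (16, T))
    real = emb(torch.randint(V, (16, T))).mean(1).detach()  # stand-in expert replies
    logps, fake = generate(src)
    # 1) discriminator: expert replies -> 1, generated replies -> 0
    d_loss = bce(disc(real), torch.ones(16, 1)) + \
             bce(disc(fake.detach()), torch.zeros(16, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) generator: REINFORCE with the discriminator's score as the reward
    reward = disc(fake).detach().squeeze(1)
    g_loss = -(logps.sum(1) * reward).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```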

Method for selecting reward function in adversarial imitation learning

The invention provides a method for selecting reward functions in adversarial imitation learning. The method comprises the following steps: constructing a policy network with parameter theta, a discriminator network with parameter w, and at least two reward functions; obtaining teaching data under the expert policy and storing it in an expert data buffer containing expert trajectories, wherein the input of the policy network is the state returned by the simulation environment and the output is a decision action; updating the discriminator network's parameters using state-action pairs from both the expert policy and the policy network; in the reward calculation stage, taking the policy network's state-action pairs as the discriminator's input and its output as the reward value computed by each reward function; selecting the reward function for the current task according to the performance indexes of the different reward functions; and storing the parameters of the policy network corresponding to the selected reward function. The agent learns under the guidance of the different reward functions, and the optimal reward function for a specific task scene is then selected according to the performance evaluation index.
Owner:SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV
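A hedged sketch of the selection stage: train one policy per candidate reward, score each on a task performance index, and keep the best. The two candidate rewards, the "training" update, and the index below are all illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

def reward_log(d):    return -np.log(1.0 - d + 1e-8)   # GAIL-style reward
def reward_linear(d): return d                           # alternative shaping

def train_policy(reward_fn, steps=200):
    theta = rng.normal(size=4)                 # stand-in policy parameters
    for _ in range(steps):
        d = 1 / (1 + np.exp(-theta.sum()))     # stand-in discriminator output
        theta += 0.01 * reward_fn(d) * rng.normal(size=4)  # noisy stand-in update
    return theta

def performance_index(theta):
    return -np.linalg.norm(theta - 1.0)        # higher is better (placeholder)

candidates = {"log": reward_log, "linear": reward_linear}
scored = {name: train_policy(fn) for name, fn in candidates.items()}
best = max(scored, key=lambda n: performance_index(scored[n]))
print("selected reward:", best)                # store scored[best] as the policy
```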

Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic movement primitives

Active CN111618847A · Solves the problem of uneven joint movement · Adaptable · Programme-controlled manipulator · Pattern recognition · Camera image
The invention discloses a mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic movement primitives. The method includes the following steps: firstly, a camera image assembly is installed such that the recognition area is not occluded, and images of the grabbing target area are preprocessed and sent to a deep reinforcement learning agent as state information; secondly, a local-policy proximal optimization training model is established on the basis of the state and the deep reinforcement learning principle; thirdly, a new mixed movement primitive model is established by fusing dynamic movement primitives with imitation learning; and fourthly, the mechanical arm is trained to autonomously grab objects on the basis of these models. The method effectively solves the problem of unsmooth joint movement under traditional deep reinforcement learning; by combining the dynamic movement primitive algorithm, the learning of primitive parameters is converted into a reinforcement learning problem, and through deep reinforcement learning training the mechanical arm can complete the autonomous grabbing task.
Owner:NANTONG UNIVERSITY
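A minimal discrete dynamic movement primitive rollout, the kind of primitive whose parameters the method above learns; in practice the forcing-term weights w would come from imitation or reinforcement learning, while here w=0 so the system is a pure point attractor.

```python
import numpy as np

def dmp_rollout(y0, goal, w, tau=1.0, dt=0.01, alpha=25.0, beta=6.25, ax=1.0):
    n = len(w); centers = np.exp(-ax * np.linspace(0, 1, n))
    y, dy, x, traj = y0, 0.0, 1.0, []
    while x > 1e-3:
        psi = np.exp(-50.0 * (x - centers) ** 2)            # RBF basis over phase
        f = x * (goal - y0) * psi @ w / (psi.sum() + 1e-8)   # learned forcing term
        ddy = alpha * (beta * (goal - y) - dy) + f           # spring-damper + f
        dy += ddy * dt / tau; y += dy * dt / tau
        x += -ax * x * dt / tau                              # canonical system decay
        traj.append(y)
    return np.array(traj)

traj = dmp_rollout(y0=0.0, goal=1.0, w=np.zeros(10))
print(len(traj), traj[-1])                                    # converges near the goal
```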

Robot imitation learning method based on virtual scene training

Pending CN110991027A · Fast model training · Fast later migration · Artificial life · Design optimisation/simulation · Data set · Algorithm
The invention discloses a robot imitation learning method based on virtual scene training. The method comprises the following steps: designing a robot model and a virtual interaction environment for a specific task; collecting and organizing an expert data set; determining a state value space S and an action value space A for the task, and determining the network structures of the policy generator and the discriminator accordingly; sampling data from the policy generator, designing a parameter update strategy, and alternately training the policy generator and the discriminator against the expert data set with an adversarial training method until the discriminator converges to a saddle point; and testing the trained network model composed of the policy generator and the discriminator, taking the real environment state as the generator's input to obtain the action output. The method learns the value return function through the discriminator's judgments, bypasses the many computationally expensive intermediate steps of inverse reinforcement learning, and makes the learning process simpler and more efficient.
Owner:SOUTH CHINA UNIV OF TECH
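A compressed GAIL-style sketch matching the alternating scheme above. The MLPs, toy expert data, fixed Gaussian exploration, and plain score-function policy update are assumptions for brevity; the identity -log(1 - sigmoid(x)) = softplus(x) gives the policy reward.

```python
import torch, torch.nn as nn

S, A = 4, 2
policy = nn.Sequential(nn.Linear(S, 32), nn.Tanh(), nn.Linear(32, A))
disc = nn.Sequential(nn.Linear(S + A, 32), nn.Tanh(), nn.Linear(32, 1))
opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

expert_sa = torch.randn(256, S + A)            # placeholder expert (s, a) pairs

for _ in range(5):                              # alternate D and policy updates
    s = torch.randn(256, S)
    mean = policy(s)
    a = (mean + 0.1 * torch.randn_like(mean)).detach()  # Gaussian exploration
    gen_sa = torch.cat([s, a], dim=1)
    # discriminator: expert -> 1, generated -> 0
    d_loss = bce(disc(expert_sa), torch.ones(256, 1)) + \
             bce(disc(gen_sa), torch.zeros(256, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # policy reward -log(1 - D(s,a)) = softplus(logit); score-function update
    with torch.no_grad():
        r = torch.nn.functional.softplus(disc(gen_sa))
    logp = -((a - mean) ** 2).sum(1, keepdim=True) / (2 * 0.1 ** 2)
    pi_loss = -(logp * r).mean()
    opt_pi.zero_grad(); pi_loss.backward(); opt_pi.step()
```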

Remote dual-arm robot system with imitation learning mechanism, and method

The invention discloses a remote dual-arm robot system with an imitation learning mechanism, and a corresponding method. The system comprises a demonstrator teaching module, a self-designed action execution module composed of two humanoid robot arms and a digital servo controller, an XBOX360 body-sensing perception module, a remote upper-computer communication module, and an imitation learning algorithm module. Each module is powered independently. A demonstrator performs demonstration actions in front of the XBOX360 sensor, which collects the demonstrator's motion data; the data is processed on a local upper computer (server side), learned with the imitation learning algorithm, and exchanged in real time between the server side and a remote upper computer (client side). The remote client then sends the data in real time to the controller through a serial port, and the controller guides the two robot arms to imitate and learn the demonstrated teaching actions. The system improves the operating intelligence of robot arms, greatly improves their operating efficiency in dangerous spaces, and has practical application value.
Owner:BEIJING UNIV OF TECH
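A sketch of the server-to-remote-client link in the pipeline above: the server streams processed demonstrator joint angles one frame per line, and the client would forward each frame to the servo controller over a serial port. The frame format, host, and port are hypothetical.

```python
import json, socket, threading, time

def server(host="127.0.0.1", port=5005):
    srv = socket.create_server((host, port))
    conn, _ = srv.accept()
    for frame in [{"left": [0.1, 0.2], "right": [0.3, 0.4]}] * 3:  # sensor frames
        conn.sendall((json.dumps(frame) + "\n").encode())          # one frame/line
    conn.close(); srv.close()

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                                # let the server start listening
cli = socket.create_connection(("127.0.0.1", 5005))
for line in cli.makefile():                    # client: read frames until EOF
    angles = json.loads(line)                  # forwarded to the servo controller
    print("to controller:", angles)            # over a serial port in practice
```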

Intelligent automobile in-loop simulation test method based on mixed traffic flow model

The invention provides an intelligent vehicle in-loop simulation test method based on a mixed traffic flow model. The method comprises the steps of: building a mixed traffic flow model with a generative adversarial network and an Actor-Critic network, solving the traffic-flow vehicles' driving strategy with a proximal policy optimization algorithm, and interacting with the environment to form vehicle driving trajectories; and using a discrimination model to distinguish generated trajectories from real ones and pass the result back to the traffic flow environment as a reward signal. A combinatorial test method combines the values of the multiple factors influencing the mixed traffic flow model, reducing the number of tests while exploring the influence of factor interactions; the traffic flow model generation method based on generative adversarial imitation learning lets vehicles make decisions similar to real traffic flow; and the greedy combinatorial test-case generation method improves test efficiency. Empirical analysis shows a good improvement effect.
Owner:JILIN UNIV
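A sketch of the greedy combinatorial step: cover every pairwise combination of traffic-flow factor levels with far fewer cases than the full factorial. The factor names and levels below are invented for illustration.

```python
from itertools import combinations, product

factors = {"density": ["low", "mid", "high"], "speed": ["30", "60", "90"],
           "aggressiveness": ["calm", "normal", "aggressive"]}
names = list(factors)
uncovered = {((f1, v1), (f2, v2))
             for f1, f2 in combinations(names, 2)
             for v1 in factors[f1] for v2 in factors[f2]}

def pairs(case):
    return {((f1, case[f1]), (f2, case[f2])) for f1, f2 in combinations(names, 2)}

suite = []
while uncovered:                                   # greedy: always take the case
    case = max((dict(zip(names, vs)) for vs in product(*factors.values())),
               key=lambda c: len(pairs(c) & uncovered))  # covering most new pairs
    suite.append(case)
    uncovered -= pairs(case)
print(len(suite), "tests instead of 27")           # typically ~9-12 for 3x3x3
```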

Hierarchical reinforcement learning training method and device based on imitation learning

The invention discloses a hierarchical reinforcement learning training method and device based on imitation learning, and an electronic device. The method comprises the steps of: obtaining teaching data from a human expert; pre-training with imitation learning on the teaching data to determine an initial strategy; and, starting from the initial strategy, retraining with reinforcement learning to determine the training model. Using the teaching data for pre-training and retraining exploits prior knowledge and strategies effectively, reduces the search space, and improves training efficiency.
Owner:INFORMATION SCI RES INST OF CETC
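A sketch of the two-stage scheme above: behavioural cloning on expert demonstrations gives the initial policy, then plain REINFORCE fine-tunes it. The environment, demonstration data, and reward are toy placeholders.

```python
import torch, torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stage 1: pre-training on expert (state, action) pairs (supervised imitation).
expert_s, expert_a = torch.randn(512, 4), torch.randint(2, (512,))
for _ in range(100):
    loss = nn.functional.cross_entropy(policy(expert_s), expert_a)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: REINFORCE fine-tuning starting from the cloned initial strategy.
for _ in range(50):
    s = torch.randn(64, 4)
    dist = torch.distributions.Categorical(logits=policy(s))
    a = dist.sample()
    r = (a == (s[:, 0] > 0).long()).float()        # stand-in task reward
    loss = -(dist.log_prob(a) * (r - r.mean())).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```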

Unmanned aerial vehicle flight control method based on imitation learning and reinforcement learning algorithms

The invention discloses an unmanned aerial vehicle flight control method based on imitation learning and reinforcement learning algorithms. The method comprises the following steps: creating a UAV flight simulation environment; defining a set of basic flight actions; classifying the trajectory data according to those basic actions; for each basic flight action, learning by imitation the parameters of a network mapping the basic action to primitive actions; counting the minimum number of consecutive steps for each basic action; constructing an upper-layer reinforcement learning network, with the minimum consecutive-step count added as a penalty p for inconsistent aircraft actions; in the simulator, obtaining the current observation and reward and selecting the corresponding basic flight action with a pDQN algorithm; inputting the aircraft's state into the imitation learning network corresponding to the chosen basic action, which outputs the simulator's primitive action; feeding that primitive action into the simulator to obtain the observation and reward at the next moment; and training with the pDQN algorithm until the upper-layer strategy network converges.
Owner:NANJING UNIV
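A sketch of the hierarchical control loop only (pDQN training, replay, and exploration are omitted): an upper-layer Q-network picks a basic flight action, the per-action imitation network emits primitive commands for that action's minimum duration, and switching actions is penalised. All networks, durations, and the "simulator" are placeholders.

```python
import torch, torch.nn as nn

OBS, N_BASIC, PRIM = 8, 4, 3
q_net = nn.Linear(OBS, N_BASIC)                      # upper layer (pDQN stand-in)
low_level = [nn.Linear(OBS, PRIM) for _ in range(N_BASIC)]  # one IL net per action
min_len = [3, 2, 4, 2]                               # minimum consecutive steps
p = 0.1                                              # inconsistency penalty

obs, prev, total_r = torch.randn(OBS), None, 0.0
for t in range(20):
    basic = int(q_net(obs).argmax())                 # choose a basic action
    if prev is not None and basic != prev:
        total_r -= p                                 # penalise action switching
    for _ in range(min_len[basic]):                  # hold for the minimum duration
        prim = low_level[basic](obs)                 # primitive command for simulator
        obs = torch.randn(OBS)                       # stand-in simulator step
        total_r += float(obs[0])                     # stand-in reward
    prev = basic
print("return:", round(total_r, 2))
```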

Mechanical arm action learning method and system based on third-person imitation learning

Active CN111136659A · Breaks the game balance · Game balance maintenance · Programme-controlled manipulator · Third party · Automatic control
The invention discloses a mechanical arm action learning method and system based on third-person imitation learning, used for automatic control of a mechanical arm so that it can automatically learn to complete a control task by watching a third-party demonstration. Samples exist in video form, avoiding the large number of sensors otherwise needed to obtain state information. An image-difference method in the discriminator module lets the discriminator ignore the appearance and environment background of the demonstrator, so third-party demonstration data can be used for imitation learning, greatly reducing sample acquisition cost. A variational discriminator bottleneck in the discriminator module constrains the discriminator's accuracy on demonstrations generated by the mechanical arm, better balancing the training of the discriminator module and the control strategy module. The system can quickly imitate a user's demonstrated action, is simple and flexible to operate, and places low requirements on the environment and demonstrators.
Owner:NANJING UNIV
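A sketch of the two discriminator ideas above: frame differencing to strip appearance and background, and a variational bottleneck that KL-limits the discriminator's embedding. Shapes, weights, and the fixed penalty weight (the usual VDB uses dual ascent) are simplifying assumptions.

```python
import torch, torch.nn as nn

enc = nn.Linear(64 * 64, 2 * 16)                  # outputs (mu, logvar)
clf = nn.Linear(16, 1)
Ic = 0.5                                          # information constraint

def vdb_logits(frames):                           # frames: (B, 2, 64, 64)
    diff = (frames[:, 1] - frames[:, 0]).flatten(1)       # image-difference input
    mu, logvar = enc(diff).chunk(2, dim=1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # sampled embedding
    kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum(1).mean()
    return clf(z), kl

demo, gen = torch.rand(8, 2, 64, 64), torch.rand(8, 2, 64, 64)
bce = nn.BCEWithLogitsLoss()
d_logit, kl_d = vdb_logits(demo)
g_logit, kl_g = vdb_logits(gen)
loss = bce(d_logit, torch.ones(8, 1)) + bce(g_logit, torch.zeros(8, 1)) \
       + 1.0 * torch.clamp((kl_d + kl_g) / 2 - Ic, min=0.0)  # bottleneck penalty
print(float(loss))
```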

Imitation learning social navigation method based on feature map fused with pedestrian information

Active CN112965081A · Expands the feasible area · Reasonable and efficient perception · Internal combustion piston engines · Navigational calculation instruments · Point cloud · RGB image
The invention discloses an imitation learning social navigation method based on a feature map fused with pedestrian information. The method guides a robot to imitate the movement habits of experts via imitation learning, plans navigation that conforms to social norms, improves planning efficiency, alleviates the robot-freezing problem, and helps the robot integrate better into a human-robot coexistence environment. The method comprises the steps of: acquiring the time-series motion states of pedestrians through pedestrian detection and tracking in sequential RGB images aligned with three-dimensional point clouds; then combining two-dimensional laser data with a social force model to obtain a local feature map annotated with dynamic pedestrian information; and finally building a deep network that takes the local feature map, the robot's current speed, and the relative position of the target as input and a robot control instruction as output, trained under the supervision of expert teaching data, to obtain a navigation strategy conforming to social norms.
Owner:ZHEJIANG UNIV
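A sketch of the fused input described above: rasterise 2-D laser points into a robot-centred grid, then stamp each tracked pedestrian with a social-force-style Gaussian shifted along its walking direction. The grid size, decay constants, and the point/pedestrian lists are assumptions.

```python
import numpy as np

G, RES = 64, 0.1                                   # 6.4 m x 6.4 m local map
grid = np.zeros((G, G), np.float32)

def to_cell(x, y):                                 # robot-centred coordinates
    return int(G / 2 + x / RES), int(G / 2 + y / RES)

laser_pts = [(1.0, 0.5), (1.2, -0.3), (-0.8, 0.9)] # obstacles from the laser scan
for x, y in laser_pts:
    i, j = to_cell(x, y)
    grid[i, j] = 1.0                               # occupied cell

pedestrians = [((0.5, 0.0), (0.3, 0.1))]           # (position, velocity)
ii, jj = np.meshgrid(np.arange(G), np.arange(G), indexing="ij")
for (px, py), (vx, vy) in pedestrians:
    ci, cj = to_cell(px, py)
    # anisotropic Gaussian cost shifted along the walking direction
    d2 = (ii - ci - 5 * vx) ** 2 + (jj - cj - 5 * vy) ** 2
    grid = np.maximum(grid, np.exp(-d2 / 20.0))
print(grid.max(), grid.shape)                      # fed into the policy network
```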

Unmanned vehicle lane changing decision-making method and system based on adversarial imitation learning

The invention discloses an unmanned vehicle lane-changing decision method and system based on adversarial imitation learning. The method describes the lane-changing decision task as a partially observable Markov decision process; trains on examples provided by professional driving demonstrations with an adversarial imitation learning method to obtain a lane-changing decision model; and, while the vehicle drives autonomously, feeds the currently observed environmental vehicle information into the model as input to obtain the lane-change decision. Because the lane-changing strategy is learned from professional driving demonstrations through adversarial imitation learning, no task reward function needs to be designed by hand, and a direct mapping from vehicle state to lane-change decision can be established, effectively improving the correctness, robustness, and adaptivity of the unmanned vehicle's lane-change decisions under dynamic traffic flow.
Owner:GUANGZHOU UNIVERSITY
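A sketch of the deployment step only: the trained decision model maps the currently observed surrounding-vehicle features to keep/left/right. The feature layout and network are illustrative stand-ins, not the patented model.

```python
import torch, torch.nn as nn

# features: ego speed + (gap, relative speed) for four neighbouring vehicles
model = nn.Sequential(nn.Linear(9, 64), nn.ReLU(), nn.Linear(64, 3))
ACTIONS = ["keep_lane", "change_left", "change_right"]

def decide(obs):                                   # acts on the partial observation
    with torch.no_grad():
        return ACTIONS[int(model(torch.as_tensor(obs)).argmax())]

obs = [25.0, 30.0, -2.0, 45.0, 1.5, 28.0, 0.0, 60.0, 3.0]
print(decide(obs))
```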

Deep reinforcement learning strategy optimization defense method and device based on imitation learning

The invention discloses a deep reinforcement learning strategy optimization defense method and device based on imitation learning. The method comprises the steps of: building an autonomous-driving simulation environment for a deep reinforcement learning agent, constructing a target agent based on a deep Q-network, and training it with reinforcement learning to optimize the deep Q-network's parameters; using the optimized deep Q-network to generate a sequence of state-action pairs of the target agent over T moments as expert data, where the action in each state-action pair is the one with the minimum Q value; constructing an adversarial agent based on a generative adversarial network and training it by imitation learning, that is, taking the states in the expert data as the generative adversarial network's input and supervising the optimization of its parameters with the expert data as labels; and adversarially training the target agent on the states generated by the adversarial agent, further optimizing the deep Q-network's parameters to achieve the deep reinforcement learning strategy optimization defense.
Owner:ZHEJIANG UNIV OF TECH
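A sketch of the expert-data step above: roll out the trained Q-network, record the *minimum*-Q action per state (the method's adversarial twist), and fit an imitator on those labels. The patent supervises a generative adversarial network; a plain classifier stands in here, and all networks and states are placeholders.

```python
import torch, torch.nn as nn

q_net = nn.Linear(6, 4)                                  # trained target-agent DQN
imitator = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(imitator.parameters(), lr=1e-3)

states = torch.randn(200, 6)                             # T recorded states
with torch.no_grad():
    labels = q_net(states).argmin(dim=1)                 # action with minimum Q

for _ in range(100):                                     # supervised imitation
    loss = nn.functional.cross_entropy(imitator(states), labels)
    opt.zero_grad(); loss.backward(); opt.step()
print("imitation accuracy:",
      float((imitator(states).argmax(1) == labels).float().mean()))
```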

Internet of Vehicles mobile edge computing task offloading method and system based on learned pruning

The invention provides an Internet of Vehicles mobile edge computing task offloading method and system based on learned pruning, belonging to the technical field of Internet of Vehicles communication. The method comprises the following steps: computing the vehicle parameters in an Internet of Vehicles mobile edge computing scene and deriving the offloading parameters from them; constructing a task offloading utility model from the offloading parameters; and solving the optimal solution of the utility model with a branch and bound algorithm combined with an imitation learning method, thereby selecting the task offloading mode that optimizes utility and determining the computing resources obtained by bidding. Taking vehicle mobility into account, a vehicle utility function for the Internet of Vehicles scene is established so that the vehicle's choice of offloading mode and the computing resources it bids for are made in a utility-optimal way; when a task is offloaded to a service vehicle, vehicles travelling in different directions are selected to increase the transmission rate; and the branch and bound method, combined with a learning-based pruning strategy, accelerates the branch pruning process and reduces complexity.
Owner:SHANDONG NORMAL UNIV
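A sketch of branch and bound with a pruning rule for a 0/1 offloading choice; in the method above the pruning score would come from a model trained to imitate the full search, while here a simple optimistic bound stands in, and the utility model is a toy placeholder.

```python
import numpy as np

rng = np.random.default_rng(2)
utility = rng.uniform(0, 1, size=6)              # per-task offloading utility
cost = rng.uniform(0, 1, size=6); budget = 1.5

def prune_score(fixed):                           # stand-in for the learned model
    return sum(utility[i] for i, v in enumerate(fixed) if v) \
           + utility[len(fixed):].sum()           # optimistic upper bound

best, best_val = None, -1.0
stack = [[]]
while stack:                                      # depth-first branch and bound
    fixed = stack.pop()
    if prune_score(fixed) <= best_val:            # learned/bound-based pruning
        continue
    if len(fixed) == len(utility):                # leaf: check budget feasibility
        if sum(c for c, v in zip(cost, fixed) if v) <= budget:
            val = sum(u for u, v in zip(utility, fixed) if v)
            if val > best_val: best, best_val = fixed, val
        continue
    stack += [fixed + [0], fixed + [1]]           # branch on the next task
print(best, round(best_val, 3))
```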

Robot demonstration teaching method based on meta-imitation learning

Pending CN111983922A · Fast generalization · Quick teaching · Programme-controlled manipulator · Artificial life · Network structure · Machine
The invention discloses a robot demonstration teaching method based on meta-imitation learning, relating to the technical field of machine learning. The method comprises the steps of: obtaining a set of robot demonstration teaching tasks; constructing a network structure model and obtaining an adaptive objective loss function; in the meta-training stage, learning and optimizing the loss function and its initialization values and parameters with algorithm I; in the meta-test stage, learning the expert-demonstrated trajectory with algorithm II to obtain a learning strategy; and, taking the expert demonstration trajectory as input and combining it with the learning strategy, generating a robot imitation trajectory with the network structure model and mapping it to robot actions using the robot state information. The method generalizes rapidly to new scenes from the small number of demonstration examples the expert provides, requires no task-specific engineering, and lets the robot self-learn task-independent strategies from the expert demonstration, generating a trajectory and achieving one-shot demonstration and rapid teaching.
Owner:GUANGZHOU INST OF ADVANCED TECH CHINESE ACAD OF SCI
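A MAML-style meta-imitation sketch of the two stages above: the inner step adapts to one demonstration, the outer step updates the initialization so that a single gradient step generalizes. The tasks and imitation loss are synthetic placeholders, not the patent's algorithms I and II.

```python
import torch

theta = torch.zeros(4, requires_grad=True)            # shared initialization
meta_opt = torch.optim.Adam([theta], lr=1e-2)
inner_lr = 0.1

def demo_loss(params, task_target):                   # imitation loss on one demo
    return ((params - task_target) ** 2).sum()

for step in range(100):                               # meta-training stage
    meta_loss = 0.0
    for _ in range(4):                                # sample a batch of tasks
        target = torch.randn(4)
        inner = demo_loss(theta, target)
        grad, = torch.autograd.grad(inner, theta, create_graph=True)
        adapted = theta - inner_lr * grad             # one-shot adaptation (meta-test)
        meta_loss = meta_loss + demo_loss(adapted, target)
    meta_opt.zero_grad(); meta_loss.backward(); meta_opt.step()
print(theta.detach())
```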

Autonomous learning method and system for agents in human-machine cooperative work

The invention belongs to the technical field of artificial intelligence and discloses an autonomous learning method and system for agents engaged in human-machine cooperative work. The method comprises the steps of: obtaining a cooperation data set and training a cooperation agent and an imitation agent on it; and, from assessment data generated by the trained cooperation agent and imitation agent cooperating in the environment, assessing whether the two agents meet the assessment requirements; if they do, judging whether the trained imitation agent needs a new round of imitation learning, and if they do not, ending the autonomous learning of the trained cooperation agent. The system comprises a cooperation agent, an imitation agent, and a server. The scheme adapts to dynamic changes in the environment, obtains the same performance in similar environments, and imitates the demonstration behaviors of different demonstrators, so the trained agent adapts to changing demonstrators, and demonstrators of different operating levels achieve the same cooperation effect.
Owner:启元世界(北京)信息技术服务有限公司
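A sketch of the assessment loop above: run the trained cooperation agent with the imitation agent, score the episodes, and decide whether the imitator needs a new round of imitation learning. The episode runner, thresholds, and drift check are invented placeholders.

```python
import random

def run_episode(coop, imit):                        # stand-in joint rollout score
    return random.uniform(0, 1)

def assess(scores, threshold=0.6):
    return sum(scores) / len(scores) >= threshold

coop_agent, imit_agent = object(), object()         # trained agents (placeholders)
scores = [run_episode(coop_agent, imit_agent) for _ in range(20)]
if assess(scores):                                  # requirements met: check imitator
    needs_new_demo = min(scores) < 0.3              # hypothetical drift criterion
    print("retrain imitation agent:", needs_new_demo)
else:                                               # requirements not met: stop
    print("cooperation agent autonomous learning ends")
```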