A warehouse inventory management and optimization method based on artificial intelligence

By establishing a grid model in the warehousing system and using deep Q-networks (DQN) to optimize picking strategies, the problem of low processing efficiency of traditional warehousing systems for small-batch, multi-category orders is solved, achieving more efficient inventory management and space utilization.

CN120494697B · Active · Publication Date: 2026-06-26 · WUXI INSTITUTE OF TECHNOLOGY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: WUXI INSTITUTE OF TECHNOLOGY
Filing Date: 2025-05-30
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Traditional warehousing systems struggle to efficiently handle small-batch, multi-category orders. Inefficient storage location allocation leads to long search times, complex picking routes, inflexible scheduling, and poor adaptability.

Method used

By establishing a warehouse inventory grid model and using a deep Q-network (DQN) to train an order picking scheme optimization model, the optimal picking strategy is generated. This strategy comprehensively considers the product storage location, pallet partitioning, order priority, and system status, thereby reducing the stacker crane travel distance and the number of picking operations.

Benefits of technology

Shorten order processing time, improve warehousing efficiency, reduce operating costs, enhance the accuracy and flexibility of inventory management, achieve full pallet clearance, and improve space utilization.

Abstract

The present application relates to a warehouse inventory management and optimization method based on artificial intelligence. By comprehensively considering commodity storage location, pallet partitioning, order priority, and system state, the method constructs a new mathematical model and solves it with a deep Q-network (DQN), a deep reinforcement learning algorithm, to intelligently decide where each commodity is picked from and to generate the optimal picking path. For small-batch, multi-category orders, the method can shorten order processing time, improve warehouse efficiency, reduce stacker crane travel distance and picking frequency, and lower operating costs; it can also schedule flexibly according to priority and adapt to real-time changes in system state. Compared with the traditional method, the number of pallet retrievals is significantly reduced after adopting the optimization strategy provided by the present application. The method can be widely applied in the field of intelligent warehousing to raise the level of automation, informatization, and intelligence of warehouse management and to meet the modern logistics industry's demand for efficient warehousing systems.

Description

Technical Field

[0001] This invention relates to the field of intelligent warehousing technology, and in particular to a method for warehouse inventory management and optimization based on artificial intelligence.

Background Technology

[0002] With the rapid development of information technology and the booming rise of e-commerce, the logistics industry is undergoing unprecedented changes. Against this backdrop, intelligent warehousing systems have attracted widespread attention due to their advantages in improving logistics efficiency and reducing operating costs. By integrating advanced automated equipment, IoT technology, big data analysis, and artificial intelligence algorithms, intelligent warehousing systems achieve automation, informatization, and intelligence in warehouse management, thereby improving the speed and accuracy of logistics and distribution. In many warehousing application scenarios, small-batch, multi-category orders typically involve multiple products, each with a small demand, but a wide variety of categories. Due to the complexity and variability of these products, higher demands are placed on the picking, sorting, and distribution of warehousing systems. Traditional warehousing systems focus more on the storage and distribution of large-scale, single-category products, often struggling to efficiently handle the demands of small-batch, multi-category orders. Furthermore, unreasonable allocation of storage locations leads to excessively long product search times and complex picking paths, reducing overall efficiency. At the same time, traditional warehousing systems cannot flexibly schedule according to priorities and have poor adaptability to real-time changes in system status.

Summary of the Invention

[0003] In view of this, the present invention provides an artificial intelligence-based warehouse inventory management and optimization method. The method uses artificial intelligence algorithms to solve the optimal picking plan for the total number of goods required by the user, effectively solving the problem that existing warehousing systems are unable to efficiently handle small-batch, multi-category order demands, shortening order processing time, improving warehousing efficiency, and reducing operating costs.

[0004] To achieve the above objectives, the present invention provides an artificial intelligence-based warehouse inventory management and optimization method, comprising the following steps:

[0005] S1. Establish a warehouse inventory grid model using a grid, including picking stations, shelves, and aisles. Each shelf includes several pallets, each pallet includes several partitions, and each partition stores one SKU of goods in a certain inventory quantity. Mark the location of the picking station, the location of the shelf, and the location and partitioning of the pallets on each shelf in the grid model.

[0006] S2. Obtain the state space and action space;

[0007] S3. Determine the objective function and set constraints to establish an order picking scheme optimization model;

[0008] S301. The objective is to minimize the number of pallets required and to maximize full-pallet clearing;

[0009] S302. Set constraints to meet order requirements, including order requirement constraints and pallet inventory limit constraints;

[0010] S303. The established order picking scheme optimization model is expressed as:

[0011] ;

[0012] where the terms denote, respectively, the objective function for the required number of pallets, the objective function for full-pallet clearing, and the weighting coefficient;

[0013] S4. Train the order picking scheme optimization model using a deep Q-network (DQN) to generate a picking strategy;

[0014] S5. Evaluate and optimize the picking strategy, and output the optimal picking strategy.

[0015] Preferably, the state space includes order information, inventory information, stacker crane location and status information, and currently picked product information. The information contained in the state space is quantified and combined to generate a state vector to describe the current state of the warehousing system.

[0016] The action space refers to the picking actions that the stacker crane can perform in its current state.

[0017] Preferably, the objective function for the required number of pallets is expressed as:

[0018] ;

[0019] where the symbols denote, respectively, the quantity of a given item that a given order takes from a given pallet, the quantity of goods in stock in the warehouse, and the set of orders in the current task pool;

[0020] The objective function for full-pallet clearing is expressed as:

[0021] ;

[0022] where the symbol denotes the inventory of a given item on a given pallet.

[0023] Preferably, the order demand constraint expression is:

[0024] ;

[0025] The expression for the pallet inventory limit constraint is:

[0026] ;

[0027] where the symbol denotes the quantity of a given item required by a given order.

[0028] Preferably, the deep Q-network (DQN) training of the order picking scheme optimization model includes the following steps:

[0029] S401. Initialize the network weights and set the hyperparameters (learning rate and experience pool size). Randomly initialize the state and the action, and select the action to perform based on the initial state;

[0030] S402. The stacker crane interacts with the warehouse environment and executes a picking operation; the warehouse environment then returns the next state, which is the new state of the warehousing system after the action is performed, together with the reward. The experience quadruple, consisting of the current state, the action performed, the reward obtained, and the next state, is stored in the experience pool;

[0031] S403. Take the next state as the current state, and repeat steps S401 and S402 until the number of experiences in the pool reaches the preset threshold;

[0032] S404. Update the DQN network parameters;

[0033] Rewards and next states are randomly sampled from the filled experience pool. The current DQN network calculates the predicted Q-value of the action, and the target network calculates the Q-value of the next state, from which the target Q-value is obtained. An optimization algorithm minimizes the mean squared error loss between the predicted Q-value and the target Q-value, and the weights and bias parameters of the Q-Network are updated accordingly. The updated Q-Network then interacts with the environment to generate new experience quadruples, which are stored in the experience pool;

[0034] S405. Take the next state as the current state and repeat steps S403 and S404 until the Q-Network converges. The converged Q-Network is the picking strategy.

[0035] Preferably, evaluating and optimizing the generated picking strategy includes the following steps:

[0036] S501. Input the test order into the intelligent warehousing system, and the stacker crane will execute the picking task according to the picking strategy produced by the DQN network.

[0037] S502. Calculate various performance indicators during the test, including the number of pallets used, order processing time, picking path length, and full pallet clearing rate, and determine whether the preset performance requirements are met. If the preset performance requirements are not met, repeat step S404.

[0038] S503. Output the picking strategy that meets the preset performance requirements, which is the optimal picking strategy.

[0039] Compared with the prior art, the beneficial effects of the present invention are:

[0040] This invention establishes a mathematical model and combines it with a deep reinforcement learning (DQN) algorithm to intelligently determine the location of goods for picking and generate the optimal picking route. For multi-batch, multi-category inventory, it comprehensively considers the storage location of goods, pallet partitioning, order priority, and system status to reduce the movement distance and picking frequency of stacker cranes, thereby shortening order processing time and improving warehousing efficiency. At the same time, the method provided by this invention can also achieve full pallet clearing, freeing up storage space, improving the utilization rate of warehousing space, and further reducing warehousing costs. Moreover, the intelligent decision-making capability helps to improve the accuracy and flexibility of inventory management, better meeting customers' requirements for order processing speed and accuracy.

Attached Figure Description

[0041] Figure 1 is a flowchart of the overall framework of the present invention;

[0042] Figure 2 is a schematic diagram of the core process of the DQN network in the present invention;

[0043] Figure 3 is a graph showing how the reward function of the present invention changes with iteration;

[0044] Figure 4 is a schematic diagram of the intelligent warehouse inventory grid model of the present invention;

[0045] Figure 5 is a comparison chart of the number of pallet retrievals under the optimized strategy of the present invention and under the traditional strategy.

Detailed Implementation

[0046] To further illustrate the technical means adopted by the present invention to achieve its intended purpose and the resulting effects, the specific implementations, structures, features, and effects of the present invention are described in detail below in conjunction with the accompanying drawings and preferred embodiments.

[0047] Example 1

[0048] This embodiment proposes a solution for the automatic scheduling optimization of outbound warehousing orders in small batches and multiple product categories. It intelligently determines which pallet in which aisle each item should be retrieved from based on the specific requirements of the order. This decision-making process considers multiple factors, including but not limited to the storage location of the goods, pallet partitioning, order priority, and the real-time status of the warehousing system. Furthermore, in recent years, deep reinforcement learning algorithms have demonstrated powerful intelligent decision-making capabilities in various fields. Through interactive learning between the agent and the environment, they can automatically explore optimal strategies in complex environments, providing a new approach to solving outbound scheduling of warehousing orders. Simultaneously, deep reinforcement learning algorithms can handle high-dimensional state spaces and action spaces, adapting to dynamically changing environments, giving them potential application advantages in complex and ever-changing scenarios such as warehousing and logistics.

[0049] Stacker cranes are key automated equipment in warehousing systems. They are responsible for moving within aisles and can automatically identify and reach the aisles and shelves where the goods required for an order are located through a navigation system. Once they reach the designated location, the stacker crane will perform picking operations, taking the required goods off the pallet and transporting them to the next processing stage.

[0050] When an order arrives, the warehousing system needs to calculate the optimal picking plan based on the demand for each item in the order, ensuring the fewest picking operations while using the fewest number of pallets to meet the order requirements. This reduces the stacker crane's movement time and energy consumption. The core objective is to optimize the picking plan to reduce the number of pallets required, thereby improving overall warehousing efficiency.

[0051] Therefore, this embodiment provides a warehouse inventory management and optimization method based on artificial intelligence; specifically, it includes the following steps:

[0052] S1. Establish a grid-based warehouse inventory model that includes picking stations, shelves, and aisles. Multiple layers of shelves are installed on both sides of each aisle. Each shelf includes several pallets, and each pallet includes several partitions. Each partition stores one SKU of goods in a certain inventory quantity. This design allows a single pallet to store multiple products, thereby increasing storage flexibility and space utilization. Mark the picking station location, shelf location, and the location and partitioning of pallets on each shelf in the grid model, so that the spatial relationships and distance information between locations are clearly defined.
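The grid model of step S1 can be sketched as a small data structure. This is a minimal illustration, not the patented implementation: the cell codes, the class names `Partition` and `Pallet`, the sample layout, and the use of Manhattan distance as the "distance information" are all assumptions introduced here.

```python
from dataclasses import dataclass, field

# Assumed cell codes for the grid (not specified in the source).
AISLE, SHELF, STATION = 0, 1, 2


@dataclass
class Partition:
    sku: str        # each partition holds exactly one SKU
    quantity: int   # units of that SKU currently stored


@dataclass
class Pallet:
    pallet_id: str
    cell: tuple[int, int]  # (row, col) of the shelf cell holding the pallet
    partitions: list[Partition] = field(default_factory=list)

    def stock(self) -> dict[str, int]:
        """Return total units per SKU on this pallet."""
        totals: dict[str, int] = {}
        for part in self.partitions:
            totals[part.sku] = totals.get(part.sku, 0) + part.quantity
        return totals


def manhattan(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Grid distance between two cells (an assumed distance measure)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# A toy 3x4 layout: one picking station, one shelf column, aisles elsewhere.
grid = [
    [STATION, AISLE, SHELF, AISLE],
    [AISLE,   AISLE, SHELF, AISLE],
    [AISLE,   AISLE, SHELF, AISLE],
]
station = (0, 0)
pallets = [
    Pallet("P1", (0, 2), [Partition("SKU-A", 5), Partition("SKU-B", 2)]),
    Pallet("P2", (2, 2), [Partition("SKU-A", 3)]),
]
```

Keeping the partition as the smallest unit mirrors the rule that one partition stores one SKU, while a pallet can still hold several products.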

[0053] S2. Obtain the state space and action space;

[0054] The state space includes order information (such as the demand for each item in the order), inventory information (the inventory of each item in each pallet), the location and status information of the stacker crane, and the information of the currently picked items. This information is quantified and combined to generate a state vector to describe the current state of the warehousing system.

[0055] The action space is the picking behavior that the stacker crane can perform in the current state (e.g., selecting a certain product on a specific pallet for picking; each action in the action space corresponds to a different combination of pallets and products).
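The state vector and action space described above might be encoded as follows. This is a sketch under assumptions: the function names `build_state` and `valid_actions`, the flat concatenation order, and the rule that an action is valid only when the pallet stocks an item that is still in demand are illustrative choices, not details given in the source.

```python
import numpy as np


def build_state(demand, inventory, crane_pos, picked):
    """Flatten the four state components and join them into one vector.

    demand:    (orders, items) quantities requested
    inventory: (pallets, items) quantities stored
    crane_pos: (row, col) of the stacker crane
    picked:    (items,) quantities already picked
    """
    parts = (demand, inventory, crane_pos, picked)
    return np.concatenate([np.asarray(p, dtype=float).ravel() for p in parts])


def valid_actions(demand, inventory, picked):
    """List (pallet, item) pairs that stock an item with unmet demand."""
    remaining = np.asarray(demand).sum(axis=0) - np.asarray(picked)
    inv = np.asarray(inventory)
    return [(int(p), int(i)) for p, i in zip(*np.nonzero(inv)) if remaining[i] > 0]


demand = [[1, 0, 2], [0, 1, 0]]       # 2 orders x 3 items
inventory = [[2, 0, 0], [0, 3, 1]]    # 2 pallets x 3 items
state = build_state(demand, inventory, crane_pos=(0, 0), picked=[0, 0, 0])
actions = valid_actions(demand, inventory, picked=[0, 0, 0])
```

Each action is a pallet and item combination, which matches the description that different actions correspond to different combinations of pallets and products.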

[0056] S3. Determine the objective function and set constraints to establish an order picking scheme optimization model.

[0057] S301. The objective is to minimize the number of pallets required and to maximize full-pallet clearing;

[0058] Minimizing the required number of pallets means reducing the number of racks and aisles accessed by the stacker crane, thereby reducing movement time and energy consumption. The objective function for the required number of pallets is expressed as:

[0059] ;

[0060] where the symbols denote, respectively, the quantity of a given item that a given order takes from a given pallet, the quantity of goods in stock in the warehouse, and the set of orders in the current task pool;

[0061] Full-pallet clearing aims to maximize the utilization of storage space and reduce warehousing costs by emptying entire pallets of goods. The objective function for full-pallet clearing is expressed as:

[0062] ;

[0063] where the symbol denotes the inventory of a given item on a given pallet.

[0064] S302. Set constraints to meet order requirements, including order requirement constraints and pallet inventory limit constraints;

[0065] To ensure that for each item in each order, the total quantity of that item picked from all relevant pallets is at least equal to the order's demand for that item, we can mathematically model the order demand and picking quantity. This ensures the integrity and accuracy of the order. The order demand constraint expression is as follows:

[0066] ;

[0067] Ensuring that the total amount of a certain item taken from each pallet during the picking process does not exceed the inventory of that item on that pallet can prevent inventory shortages or data inconsistencies caused by over-picking. The pallet inventory limit constraint expression is as follows:

[0068] ;

[0069] where the symbol denotes the quantity of a given item required by a given order.
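The two constraints are stated in prose in [0065] and [0067], so one possible formalization can be written down from that prose alone. All symbols below are assumptions introduced for illustration: $x_{o,p,i}$ is the quantity of item $i$ that order $o$ takes from pallet $p$, $d_{o,i}$ is the quantity of item $i$ required by order $o$, $s_{p,i}$ is the inventory of item $i$ on pallet $p$, $O$ is the set of orders in the current task pool, $P$ is the set of pallets, and $I$ is the set of items.

```latex
% Order demand constraint: picks across all pallets cover each order's demand.
\sum_{p \in P} x_{o,p,i} \;\ge\; d_{o,i}
  \qquad \forall\, o \in O,\ \forall\, i \in I

% Pallet inventory limit: picks across all orders never exceed pallet stock.
\sum_{o \in O} x_{o,p,i} \;\le\; s_{p,i}
  \qquad \forall\, p \in P,\ \forall\, i \in I

% Picked quantities are non-negative integers.
x_{o,p,i} \in \mathbb{Z}_{\ge 0}
```

The first inequality reflects "at least equal to the order's demand", and the second reflects "does not exceed the inventory of that item on that pallet"; the integrality condition is an additional assumption.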

[0070] S303. The established order picking scheme optimization model is expressed as:

[0071] ;

[0072] where the terms denote, respectively, the objective function for the required number of pallets, the objective function for full-pallet clearing, and the weighting coefficient; adjusting the weighting coefficient balances the importance of the two optimization objectives.
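To make the two objectives and the constraints concrete, the sketch below scores a candidate picking plan. It is an illustration under stated assumptions: the function name `evaluate_plan`, the array layout, and in particular the combined score `pallets_used - weight * pallets_cleared` are hypothetical, since the source only says that a weighting coefficient balances the two objectives.

```python
import numpy as np


def evaluate_plan(x, stock, demand, weight=0.5):
    """Score a picking plan.

    x:      (orders, pallets, items) quantity each order takes per pallet/item
    stock:  (pallets, items) inventory per pallet
    demand: (orders, items) quantity required per order
    """
    x, stock, demand = (np.asarray(a) for a in (x, stock, demand))
    picked = x.sum(axis=0)                       # (pallets, items)

    # Constraints described in S302.
    demand_met = bool(np.all(x.sum(axis=1) >= demand))
    within_stock = bool(np.all(picked <= stock))

    # Objective 1: pallets touched at least once.
    pallets_used = int(np.count_nonzero(picked.sum(axis=1)))
    # Objective 2: non-empty pallets whose whole stock was picked.
    non_empty = stock.sum(axis=1) > 0
    pallets_cleared = int(np.count_nonzero(non_empty & np.all(picked == stock, axis=1)))

    # ASSUMED combination: lower is better.
    score = pallets_used - weight * pallets_cleared
    return {
        "feasible": demand_met and within_stock,
        "pallets_used": pallets_used,
        "pallets_cleared": pallets_cleared,
        "score": score,
    }


stock = [[2, 0], [3, 1]]
demand = [[2, 0], [0, 1]]

plan_a = np.zeros((2, 2, 2), dtype=int)
plan_a[0, 0, 0] = 2   # order 0 empties pallet 0
plan_a[1, 1, 1] = 1   # order 1 takes item 1 from pallet 1

plan_b = np.zeros((2, 2, 2), dtype=int)
plan_b[0, 1, 0] = 2   # both orders are served from pallet 1
plan_b[1, 1, 1] = 1

result_a = evaluate_plan(plan_a, stock, demand)
result_b = evaluate_plan(plan_b, stock, demand)
```

Plan A touches two pallets but clears one, while plan B touches a single pallet and clears none; the weighting coefficient decides which trade-off is preferred.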

[0073] S4. Use a deep Q-network (DQN) to train an order picking scheme optimization model and generate picking strategies.

[0074] In the constructed order picking scheme optimization model, the objective function is non-convex, non-smooth, and non-coercive, making it difficult to solve using traditional methods. The Deep Q-Network (DQN), an algorithm that combines deep learning and reinforcement learning, can handle complex state spaces effectively: its core idea is to use a deep neural network to approximate the Q-value function, which in reinforcement learning evaluates the expected reward of taking a given action in a given state. Traditional Q-learning performs well when the state space is small, but in complex environments the state space is often very high-dimensional, which makes storing and updating a Q-value table impractical. By approximating the Q-value function with a neural network, DQN can handle high-dimensional state spaces. Therefore, in this embodiment, the order picking scheme optimization model is trained using DQN; the core steps are shown in Figure 2.

[0075] Construct a Deep Q-Network (DQN) consisting of an input layer, hidden layers, and an output layer. The number of neurons in the input layer should match the dimension of the state vector to receive and process state information. The hidden layers can use multiple fully connected layers or convolutional layers (if the state information has a spatial structure). The number of neurons in the output layer is consistent with the size of the action space and is used to output the Q-value for each possible action.
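The network shape described above can be sketched as a small fully connected network in NumPy. This is a minimal forward pass only, under assumptions: the class name `QNetwork`, the two hidden layers of 64 units, ReLU activations, and the He-style initialization are illustrative choices that the source does not specify.

```python
import numpy as np


class QNetwork:
    """Fully connected network mapping a state vector to one Q-value per action."""

    def __init__(self, state_dim, n_actions, hidden=(64, 64), seed=0):
        rng = np.random.default_rng(seed)
        sizes = [state_dim, *hidden, n_actions]
        # One weight matrix and one bias vector per layer.
        self.weights = [
            rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
            for m, n in zip(sizes[:-1], sizes[1:])
        ]
        self.biases = [np.zeros(n) for n in sizes[1:]]

    def forward(self, states):
        """Return Q-values with shape (batch, n_actions)."""
        h = np.atleast_2d(np.asarray(states, dtype=float))
        for w, b in zip(self.weights[:-1], self.biases[:-1]):
            h = np.maximum(h @ w + b, 0.0)   # ReLU hidden layers
        return h @ self.weights[-1] + self.biases[-1]  # linear output layer


net = QNetwork(state_dim=17, n_actions=5)
q_values = net.forward(np.ones((4, 17)))
best_actions = q_values.argmax(axis=1)  # greedy action per state
```

The input width equals the state vector length and the output width equals the action space size, as the paragraph requires; the output layer is left linear because Q-values are unbounded.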

[0076] S401. Initialize the network weights and set the hyperparameters (learning rate and experience pool size) to ensure good initial performance at the start of training. Randomly initialize the state and the action. The initial state can be an empty order pool or a scenario containing a small number of initial orders; at this point the experience pool is empty and ready to collect experience samples. Based on the initial state, select the action to perform;

[0077] S402. The stacker crane (the agent) interacts with the warehouse environment and executes the picking operation. The warehouse environment then returns the next state, which is the new state of the warehousing system after the action is performed, together with the reward. The experience quadruple, consisting of the current state, the action performed, the reward obtained, and the next state, is stored in the experience pool;

[0078] S403. Take the next state as the current state, and repeat steps S401 and S402 until the number of experiences in the pool reaches the preset threshold;

[0079] S404. Update the DQN network parameters;

[0080] Rewards and next states are obtained by randomly sampling from the filled experience pool. The current DQN network calculates the predicted Q-value of the action, and the target network calculates the Q-value of the next state, from which the target Q-value is obtained. An optimization algorithm minimizes the mean squared error loss between the predicted Q-value and the target Q-value, and the weights and bias parameters of the Q-Network are updated accordingly. Finally, the updated Q-Network interacts with the environment to generate new experience quadruples, which are stored in the experience pool;

[0081] S405. Take the next state as the current state and repeat steps S403 and S404 until the Q-Network converges. The converged Q-Network is the picking strategy.
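The experience pool of S402 and S403 and the loss of S404 can be sketched as below. This is an illustration under assumptions: the names `ExperiencePool`, `td_targets`, and `mse_loss` are hypothetical, the discount factor `gamma` is not mentioned in the source, and the target `reward + gamma * max Q_target(next_state)` is the standard DQN form rather than a formula confirmed by the patent. Terminal-state handling and the gradient step are omitted.

```python
import random
from collections import deque

import numpy as np


class ExperiencePool:
    """Fixed-size store of (state, action, reward, next_state) quadruples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped first

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def is_ready(self, threshold):
        """True once the pool holds at least `threshold` experiences (S403)."""
        return len(self.buffer) >= threshold

    def sample(self, batch_size):
        """Draw a random mini-batch without replacement (S404)."""
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return (np.array(states), np.array(actions),
                np.array(rewards, dtype=float), np.array(next_states))


def td_targets(rewards, next_q_target, gamma=0.99):
    """Target Q-value from the target network's next-state Q-values."""
    return rewards + gamma * next_q_target.max(axis=1)


def mse_loss(q_pred, actions, targets):
    """Mean squared error between predicted and target Q-values."""
    chosen = q_pred[np.arange(len(actions)), actions]  # Q of the action taken
    return float(np.mean((chosen - targets) ** 2))


pool = ExperiencePool(capacity=3)
for step in range(4):
    pool.store(state=[step, 0], action=step % 2, reward=1.0, next_state=[step + 1, 0])

targets = td_targets(np.array([1.0, 0.0]),
                     np.array([[0.0, 2.0], [1.0, 0.5]]), gamma=0.5)
loss = mse_loss(np.array([[1.0, 3.0], [0.0, 0.5]]), np.array([1, 0]), targets)
```

Sampling at random from the pool, rather than training on consecutive steps, is what breaks the correlation between successive experiences.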

[0082] S5. Evaluate and optimize the picking strategy, and output the optimal picking strategy.

[0083] S501. Input the test order into the intelligent warehousing system, and the stacker crane will execute the picking task according to the picking strategy produced by the DQN network.

[0084] S502. Calculate various performance indicators during the test, including the number of pallets used, order processing time, picking path length, and full pallet clearing rate, and determine whether the preset performance requirements are met. If the preset performance requirements are not met, repeat step S404 or adjust the DQN network structure and hyperparameters.

[0085] The process of repeated training, evaluation, and optimization is used to continuously improve the performance of the strategy until satisfactory optimization results are achieved. This can effectively solve the problem of outbound scheduling optimization for small-batch, multi-category orders in a real warehousing environment and meet the enterprise's warehousing efficiency and cost control requirements.

[0086] S503. Output the picking strategy that meets the preset performance requirements, which is the optimal picking strategy.
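The evaluation loop of S501 to S503 amounts to computing the listed indicators over test orders and comparing them with preset requirements. The sketch below is illustrative only: the names `RunRecord`, `summarize`, and `meets_requirements`, the units, the threshold values, and the definition of the full-pallet clearing rate as cleared pallets divided by pallets used are assumptions, not definitions from the source.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Measurements for one test order (field names and units are assumed)."""
    pallets_used: int
    processing_time_s: float
    path_length_m: float
    pallets_cleared: int


def summarize(runs):
    """Aggregate the four indicators named in S502."""
    n = len(runs)
    total_used = sum(r.pallets_used for r in runs)
    total_cleared = sum(r.pallets_cleared for r in runs)
    return {
        "avg_pallets_used": total_used / n,
        "avg_processing_time_s": sum(r.processing_time_s for r in runs) / n,
        "avg_path_length_m": sum(r.path_length_m for r in runs) / n,
        "full_clear_rate": total_cleared / total_used if total_used else 0.0,
    }


def meets_requirements(metrics, limits):
    """Check each metric against a ("max" | "min", bound) requirement."""
    for name, (kind, bound) in limits.items():
        value = metrics[name]
        if kind == "max" and value > bound:
            return False
        if kind == "min" and value < bound:
            return False
    return True


runs = [RunRecord(2, 30.0, 12.0, 1), RunRecord(4, 50.0, 20.0, 1)]
metrics = summarize(runs)
limits = {"avg_pallets_used": ("max", 3.0), "full_clear_rate": ("min", 0.3)}
accepted = meets_requirements(metrics, limits)  # if False, return to training (S404)
```

When `accepted` is false, the procedure goes back to S404 or adjusts the network structure and hyperparameters, as S502 describes.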

[0087] Example 2

[0088] In this embodiment, the intelligent warehouse map model is constructed using grid modeling, as shown in Figure 3. It includes three picking stations on the far left, seven rows of shelves in the middle area, and six aisles. Each shelf has several pallets, each pallet has several partitions, and each partition can store only one stock keeping unit (SKU).

[0089] Twenty orders were randomly selected from the actual order data. To simplify the calculation, pallets containing only materials not needed in this batch of orders were excluded. This batch of orders requires 24 types of materials in total, and 162 pallets contain these materials. The number of pallets was thus reduced from the initial 1779 to 162, more than a tenfold reduction. With this simplification, the inventory matrix was reduced from 1799×920 to 162×24, and the order matrix was reduced to 20×24.

[0090] The calculation process was further optimized so that the decision of whether to retrieve a material from a pallet is made only when the pallet actually contains that material; if the material is not on the pallet, the loop body is skipped. This reduces the number of loop iterations from 3240 to about 500, roughly a sixfold reduction in computation.
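The two simplifications described in this example, dropping irrelevant pallets and materials and then skipping order and pallet pairs that share no material, can be sketched as follows. The function names `reduce_problem` and `count_candidate_pairs` and the toy matrices are assumptions for illustration; the 3240 figure in the text corresponds to 20 orders × 162 pallets.

```python
import numpy as np


def reduce_problem(inventory, orders):
    """Keep only materials some order needs and pallets that stock them.

    inventory: (pallets, materials), orders: (orders, materials)
    """
    inventory, orders = np.asarray(inventory), np.asarray(orders)
    needed = np.flatnonzero(orders.sum(axis=0) > 0)       # material columns to keep
    inv = inventory[:, needed]
    useful = np.flatnonzero(inv.sum(axis=1) > 0)          # pallet rows to keep
    return inv[useful], orders[:, needed], useful, needed


def count_candidate_pairs(inventory, orders):
    """Count (order, pallet) pairs that share at least one needed material."""
    has_stock = (np.asarray(inventory) > 0).astype(int)   # (pallets, materials)
    needs = (np.asarray(orders) > 0).astype(int)          # (orders, materials)
    shared = needs @ has_stock.T                          # (orders, pallets)
    return int(np.count_nonzero(shared))


inventory = [
    [3, 0, 0, 0, 0],
    [0, 0, 0, 4, 0],   # stocks only an unneeded material
    [0, 2, 0, 0, 1],
    [0, 0, 0, 0, 0],   # empty pallet
]
orders = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]

small_inv, small_orders, kept_pallets, kept_materials = reduce_problem(inventory, orders)
all_pairs = small_orders.shape[0] * small_inv.shape[0]
candidate_pairs = count_candidate_pairs(small_inv, small_orders)
```

In this toy case the inventory matrix shrinks from 4×5 to 2×2, and only 2 of the 4 remaining order and pallet pairs need to be examined.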

[0091] During training, the values of the reward function are shown in Figure 4. Comparing the types and quantities of materials required with the types and quantities actually retrieved shows that both meet the requirements of the actual orders.

[0092] The 20 test orders contained 27 distinct material requirements. Under the previous rules, this would have required 27 pallet retrievals. After optimization, the number of pallet retrievals was reduced to 24. As the number of orders increases, this reduction in required pallets can significantly improve overall warehousing efficiency.

[0093] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the present invention. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the present invention. Any simple modifications, equivalent changes and alterations made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the scope of the present invention.

Claims

1. A warehouse inventory management and optimization method based on artificial intelligence, characterized in that it includes the following steps:
S1. Establish a grid-based warehouse inventory model that includes picking stations, shelves, and aisles. Each shelf includes several pallets, each pallet includes several partitions, and each partition stores one SKU of goods in a certain inventory quantity. Mark the location of the picking station, the location of the shelf, and the location and partitioning of the pallets on each shelf in the grid model;
S2. Obtain the state space and action space. The state space includes order information, inventory information, stacker crane location and status information, and currently picked product information. The information contained in the state space is quantified and combined to generate a state vector, which is used to describe the current state of the warehousing system. The action space includes the picking actions that the stacker crane can perform in the current state;
S3. Determine the objective function and set constraints to establish an order picking scheme optimization model;
S301. The objective is to minimize the number of pallets required and to maximize full-pallet clearing;
S302. Set constraints to meet order requirements, including order requirement constraints and pallet inventory limit constraints;
S303. The established order picking scheme optimization model is expressed in terms of the objective function for the required number of pallets, the objective function for full-pallet clearing, and a weighting coefficient;
S4. Train the order picking scheme optimization model using a deep Q-network (DQN) to generate a picking strategy;
S5. Evaluate and optimize the picking strategy, and output the optimal picking strategy.

2. The warehouse inventory management and optimization method based on artificial intelligence according to claim 1, characterized in that the objective function for the required number of pallets is expressed in terms of the quantity of a given item that a given order takes from a given pallet, the quantity of goods in stock in the warehouse, and the set of orders in the current task pool; and the objective function for full-pallet clearing is expressed in terms of the inventory of a given item on a given pallet.

3. The warehouse inventory management and optimization method based on artificial intelligence according to claim 2, characterized in that the order demand constraint and the pallet inventory limit constraint are each expressed by a formula, in which one symbol denotes the quantity of a given item required by a given order.

4. The warehouse inventory management and optimization method based on artificial intelligence according to claim 1, characterized in that the deep Q-network (DQN) training of the order picking scheme optimization model includes the following steps:
S401. Initialize the network weights and set the hyperparameters (learning rate and experience pool size). Randomly initialize the state and the action, and select the action to perform based on the initial state;
S402. The stacker crane interacts with the warehouse environment and executes a picking operation; the warehouse environment then returns the next state, which is the new state of the warehousing system after the action is performed, together with the reward. The experience quadruple, consisting of the current state, the action performed, the reward obtained, and the next state, is stored in the experience pool;
S403. Take the next state as the current state, and repeat steps S401 and S402 until the number of experiences in the pool reaches the preset threshold;
S404. Update the DQN network parameters. Rewards and next states are obtained by randomly sampling from the filled experience pool. The current DQN network calculates the predicted Q-value of the action, and the target network calculates the Q-value of the next state, from which the target Q-value is obtained. An optimization algorithm minimizes the mean squared error loss between the predicted Q-value and the target Q-value, and the weights and bias parameters of the Q-Network are updated accordingly. The updated Q-Network then interacts with the environment to generate new experience quadruples, which are stored in the experience pool;
S405. Take the next state as the current state and repeat steps S403 and S404 until the Q-Network converges. The converged Q-Network is the picking strategy.

5. The warehouse inventory management and optimization method based on artificial intelligence according to claim 1, characterized in that evaluating and optimizing the generated picking strategy includes the following steps:
S501. Input the test order into the intelligent warehousing system, and the stacker crane executes the picking task according to the picking strategy produced by the DQN network;
S502. Calculate the performance indicators during the test, including the number of pallets used, order processing time, picking path length, and full-pallet clearing rate, and determine whether the preset performance requirements are met. If the preset performance requirements are not met, repeat step S404;
S503. Output the picking strategy that meets the preset performance requirements, which is the optimal picking strategy.