A machine learning based abnormal inventory identification method

By combining machine learning methods with multi-factor feature learning and clustering algorithms, the problem of low efficiency and misjudgment in abnormal inventory identification in aviation manufacturing enterprises has been solved, achieving adaptive abnormal inventory identification and improving the efficiency and accuracy of inventory management.

CN122310291A · Pending · Publication Date: 2026-06-30 · CHENGDU AIRCRAFT INDUSTRY GROUP

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHENGDU AIRCRAFT INDUSTRY GROUP
Filing Date
2026-03-19
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, the identification of abnormal inventory in aerospace manufacturing enterprises relies on manual operation, which is inefficient and prone to misjudgment. Multi-source inventory data lacks a dynamic learning framework, making it impossible to achieve accurate adaptive identification.

Method used

A machine learning-based approach is adopted, combining multi-factor feature learning, K-Means clustering and XGBoost algorithm to process the temporal, discrete and continuous features of inventory data. By using K-Means clustering and XGBoost algorithm to mine the potential knowledge of inventory data, adaptive identification of abnormal inventory is achieved.

Benefits of technology

It significantly improves inventory management efficiency, reduces labor and time costs, and enhances the accuracy and adaptability of abnormal inventory identification.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses a machine learning-based method for identifying abnormal inventory, belonging to the field of machine learning. The method includes: collecting and filtering raw inventory data from a production management information system to obtain collected inventory data containing time, discrete, and continuous features; employing different processing strategies to extract effective features from different categories of data within the collected inventory data, and then cascading these features to form a multi-factor feature set; using the K-Means clustering method to cluster different categories of data within the multi-factor feature set to achieve inventory data segmentation; and introducing the XGBoost algorithm for each inventory data block to mine the implicit knowledge of multiple data blocks and distinguish between normal and abnormal inventory. This invention, by mining the differences and consistency among inventory samples, obtains the potential implicit knowledge of inventory samples, adaptively determines the inventory status, reduces labor costs and time consumption, and improves the efficiency and effectiveness of inventory management.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and more specifically to a method for identifying abnormal inventory based on machine learning.

Background Technology

[0002] Existing inventory management methods primarily focus on refining systems and optimizing strategies from a management perspective, providing some support for enterprise inventory management. However, they generally overlook the importance of handling abnormal inventory. The abnormal inventory investigation process heavily relies on manual operation, and as enterprises expand and inventory volumes increase, labor and time costs rise significantly. This is especially true in aerospace manufacturing companies, where inventory data is characterized by complex structures, diverse categories, and massive volumes. Manual investigation is not only inefficient but also prone to misjudgment and omissions due to differences in understanding of the judgment rules among different investigators. Furthermore, existing methods lack dynamic learning frameworks for multi-source inventory data, failing to effectively reduce measurement variability caused by different data sources and hindering accurate and adaptive identification of abnormal inventory.

Summary of the Invention

[0003] This invention aims to address the problems of manual labor, low efficiency, and susceptibility to misjudgment in the identification of abnormal inventory in aerospace manufacturing enterprises, as well as the large differences in the measurement of multi-source inventory data in existing technologies. It provides an abnormal inventory identification method based on machine learning, which combines multi-factor feature learning, K-Means clustering and block division with XGBoost algorithm to mine the potential knowledge of inventory data, achieve adaptive and accurate identification of abnormal inventory, reduce management costs, and improve inventory management efficiency.

[0004] To achieve the above-mentioned objectives, the technical solution of the present invention is as follows: A machine learning-based method for identifying abnormal inventory includes the following steps:
a. Based on business management experience, data collection, data filtering and data cleaning are carried out on the original inventory data in the production management information system to obtain collected inventory data containing multiple data types including time characteristics, discrete characteristics and continuous characteristics;
b. For different types of data in the collected inventory data, different processing strategies are used to extract effective features, and then various feature data samples are cascaded to form a multi-factor feature set;
c. Use the K-Means clustering method to perform feature clustering on data samples with multi-factor feature sets to achieve inventory data segmentation;
d. Introduce the XGBoost algorithm for each inventory data block to mine the implicit knowledge of multiple data blocks, distinguish between normal inventory and abnormal inventory, and achieve adaptive abnormal inventory identification.

[0005] Furthermore, the time features are directly related to the final inventory status, including the warehousing date, production start date, and last operation date; the discrete features are highly expressive of basic inventory information and inventory status, including model code, model name, task number, status, main manufacturing unit, and current specialized factory; the continuous features include data ID, MES barcode, material code, and drawing number.

[0006] Furthermore, in step b, regarding the processing of time characteristics, based on the actual operation of inventory management, one or more combinations of date interval method, Boolean processing method, and direct decomposition method are selected to mine effective information of time characteristics.

[0007] Furthermore, the date interval method includes: determining the target date for identifying the inventory status of each category based on the actual turnover of each category; selecting a time feature that is strongly correlated with the inventory dwell time and status, and calculating the number of days between it and the target date as a new time feature dimension.
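The date interval method can be illustrated with a minimal sketch. It assumes the per-category target dates used later in the embodiment (end of quarter for non-metallic materials, end of half-year for other categories); the function names, the category label `"non-metallic"`, and the reference date argument are hypothetical and not part of the source.

```python
from datetime import date, timedelta


def target_date(ref: date, category: str) -> date:
    """Hypothetical rule: quarter end for non-metallic materials,
    half-year end for every other category."""
    if category == "non-metallic":
        end_month = ((ref.month - 1) // 3 + 1) * 3
    else:
        end_month = 6 if ref.month <= 6 else 12
    if end_month == 12:
        return date(ref.year, 12, 31)
    # Last day of end_month = first day of the next month minus one day.
    return date(ref.year, end_month + 1, 1) - timedelta(days=1)


def days_to_target(event: date, target: date) -> int:
    """New time feature: number of days between an event date and the target date."""
    return (target - event).days


# Example: an item warehoused on 2023-05-01, assessed against the half-year end.
target = target_date(date(2023, 5, 10), "forging")
interval = days_to_target(date(2023, 5, 1), target)
```

Applying `days_to_target` to each selected time field (for example warehousing date, production start date, last operation date) would give one new feature column per field.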

[0008] Furthermore, the Boolean processing method includes: matching production information with current inventory data by project, product, and flight range to determine whether the inventory has been put into production or whether the production plan has been opened, and generating Boolean features, i.e., the plan is opened as 1 and not opened as 0.
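A rough sketch of the Boolean processing method follows. The record layout (dictionary keys `project`, `product`, `flight`, `first_flight`, `last_flight`, `opened`) is an assumption made for illustration; the source only states that matching is done by project, product, and flight range.

```python
def plan_opened_flag(item: dict, plans: list) -> int:
    """Return 1 if an opened production plan covers the item's project,
    product and flight number, otherwise 0 (hypothetical record layout)."""
    for plan in plans:
        if (plan["opened"]
                and plan["project"] == item["project"]
                and plan["product"] == item["product"]
                and plan["first_flight"] <= item["flight"] <= plan["last_flight"]):
            return 1
    return 0


plans = [
    {"project": "P1", "product": "wing", "first_flight": 1, "last_flight": 20, "opened": True},
    {"project": "P1", "product": "tail", "first_flight": 1, "last_flight": 20, "opened": False},
]
inventory = [
    {"project": "P1", "product": "wing", "flight": 7},
    {"project": "P1", "product": "wing", "flight": 25},
    {"project": "P1", "product": "tail", "flight": 7},
]
flags = [plan_opened_flag(item, plans) for item in inventory]
```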

[0009] Furthermore, the direct decomposition method includes: setting the window value to one month or one quarter, calculating the number of weeks in which the outbound time and the last operation time fall within the corresponding window, as a new time feature dimension.
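One possible reading of the direct decomposition method is sketched below: each date is mapped to the 1-based index of the week it falls in, counted from the start of its month or quarter window. This interpretation of "number of weeks" and the function names are assumptions, not details given in the source.

```python
from datetime import date


def window_start(d: date, window: str) -> date:
    """First day of the month or quarter that contains d."""
    if window == "month":
        return date(d.year, d.month, 1)
    if window == "quarter":
        return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)
    raise ValueError("window must be 'month' or 'quarter'")


def week_in_window(d: date, window: str) -> int:
    """1-based week index of d inside its window (assumed interpretation)."""
    return (d - window_start(d, window)).days // 7 + 1


last_operation_week = week_in_window(date(2023, 5, 31), "month")
outbound_week = week_in_window(date(2023, 5, 31), "quarter")
```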

[0010] Furthermore, in step b, the discrete features are processed using the ONE-HOT feature encoding strategy, which treats different values of discrete features as different labels to form sparse encoding, thereby obtaining multidimensional feature vectors that are mutually orthogonal and independent of one another.
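A minimal sketch of one-hot encoding for a single discrete field is given below, assuming NumPy is available; the status values are made up for illustration.

```python
import numpy as np


def one_hot(values):
    """Encode a list of categorical values as a sparse 0/1 matrix,
    one column per distinct value (columns in sorted order)."""
    categories = sorted(set(values))
    column = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)), dtype=np.int8)
    encoded[np.arange(len(values)), [column[v] for v in values]] = 1
    return encoded, categories


status = ["in stock", "scrapped", "in stock", "frozen"]  # hypothetical values
encoded, categories = one_hot(status)
```

Each row contains exactly one 1, so the vectors of two different values have a zero dot product, which is the orthogonality the paragraph above refers to.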

[0011] Furthermore, in step b, the processing of continuous features adopts the Label-Encoder feature encoding strategy. Continuous features are treated as features with a weakly ordered relationship, and each feature dimension is directly converted into a continuous numerical representation, forming a new continuous feature dimension. Inventory data contains relatively few continuous features, with most being discrete features. To avoid the curse of dimensionality, this solution treats discrete inventory features with a wide range of values and many feature labels, such as model codes, quality numbers, and component drawing numbers, as continuous features. For these features, this solution introduces the Label-Encoder feature encoding strategy for learning. The Label-Encoder feature encoding strategy is theoretically more suitable for processing features with an ordered sequence. However, in inventory data, although features such as model codes and quality numbers do not have an absolutely ordered sequence, the encoding rules are usually formulated based on project conditions and production management requirements, exhibiting certain regularities.
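The label-encoding idea can be sketched as follows. Assigning codes in sorted order is an assumption made here so that the lexical regularity of the identifiers carries over into the integer codes; the drawing numbers are made up.

```python
def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order,
    so one high-cardinality field becomes a single numeric column."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


drawing_numbers = ["B-02", "A-10", "B-02", "A-01"]  # hypothetical identifiers
encoded, codes = label_encode(drawing_numbers)
```

Compared with one-hot encoding, a field with thousands of distinct values stays one column wide instead of thousands, which is the dimensionality concern the paragraph above describes.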

[0012] The process of using K-Means clustering to divide inventory data into blocks in step c is as follows: initialize k cluster centers, calculate the Euclidean distance from each data sample to each cluster center, and assign the sample to the cluster corresponding to the nearest cluster center; recalculate the cluster center of each cluster, iterate the above process until the cluster centers tend to stabilize, and output k inventory data blocks.

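[0013] Furthermore, the objective function of the XGBoost algorithm is:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

In the formula, $y_i$ is the true label of the i-th sample, $\hat{y}_i^{(t-1)}$ is the predicted label of the i-th sample in round t-1, $f_t$ is the tree added in round t, and $\Omega(f_t)$ is a regularization term. The regularization term is calculated as:

$$\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$$

where $T$ is the number of leaf nodes of the t-th tree, $w_j$ is the weight of the j-th leaf node, and γ and λ are control parameters. The optimized objective function obtained by performing a second-order Taylor expansion on the objective function is:

$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \right] + \gamma T$$

In the formula, $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, where $g_i$ and $h_i$ are the first and second derivatives of the loss $l$ with respect to $\hat{y}_i^{(t-1)}$, and $I_j$ is the set of samples assigned to the j-th leaf node.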
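As an illustrative check of the leaf-wise form of the optimized objective, the sketch below computes the per-sample gradients $g_i$ and Hessians $h_i$ for one leaf and the weight that minimizes $G w + \frac{1}{2}(H + \lambda) w^2$, namely $w^* = -G / (H + \lambda)$. The logistic loss is an assumption made here (the source does not fix the loss), and the function names are hypothetical.

```python
import numpy as np


def logistic_grad_hess(y, margin):
    """First and second derivatives of the logistic loss with respect to
    the raw prediction (margin) from round t-1. Assumed loss function."""
    p = 1.0 / (1.0 + np.exp(-margin))
    return p - y, p * (1.0 - p)


def leaf_objective(w, g, h, lam):
    """Contribution of one leaf: G*w + 0.5*(H + lambda)*w^2."""
    return g.sum() * w + 0.5 * (h.sum() + lam) * w ** 2


def best_leaf_weight(g, h, lam):
    """Minimizer of the quadratic leaf objective: w* = -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)


# Three samples in one leaf, all with a previous-round margin of 0.
y = np.array([1.0, 1.0, 0.0])
g, h = logistic_grad_hess(y, np.zeros(3))
w_star = best_leaf_weight(g, h, lam=1.0)
```

A larger λ shrinks the leaf weight toward zero, which is how the regularization term limits model complexity.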

[0014] Furthermore, in step d, when training the XGBoost classifier, the maximum model depth is set to 5, the learning rate to 0.15, the maximum number of trees generated to 100, the model solution method to tree structure mode, and the number of classes to 2, corresponding to normal inventory and abnormal inventory respectively. Accuracy (Acc), Precision (Pre), and Recall (Recall) are used as evaluation metrics for model training and parameter tuning, calculated as follows: Acc = (TP + TN) / (TP + FN + TN + FP); Pre = TP / (TP + FP); Recall = TP / (TP + FN); where TP is the number of abnormal inventory samples correctly identified as abnormal, TN is the number of normal inventory samples correctly identified as normal, FP is the number of normal inventory samples incorrectly identified as abnormal, and FN is the number of abnormal inventory samples incorrectly identified as normal.
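The evaluation metrics can be sketched in a few lines. The sketch uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), with abnormal inventory encoded as 1 (positive) and normal inventory as 0; the label encoding and the function names are assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN with the abnormal class as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def evaluate(tp, tn, fp, fn):
    """Return (accuracy, precision, recall); a ratio with a zero denominator is 0.0."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    return acc, pre, rec


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
acc, pre, rec = evaluate(*confusion_counts(y_true, y_pred))
```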

[0015] In summary, the present invention has the following advantages: 1. This invention constructs an abnormal inventory identification framework for aerospace manufacturing enterprises. Through multi-factor feature learning, it differentiates the temporal, discrete, and continuous features of inventory. Combining K-Means clustering and XGBoost algorithm, it achieves adaptive identification of abnormal inventory, significantly improving inventory clearance efficiency and reducing labor and time costs.

[0016] 2. This invention introduces a multi-factor feature learning strategy, adopting appropriate processing methods for different types of features to obtain more robust feature representations, effectively mining the potential knowledge of inventory data, and improving the accuracy of abnormal inventory identification.

[0017] 3. This invention utilizes K-Means clustering technology to divide multi-factor feature sets into data blocks, reducing information interference from different data sources, ensuring the effective transfer of implicit knowledge within data blocks, and improving the training and recognition performance of subsequent algorithms.

[0018] 4. The K-XGB combined algorithm proposed in this invention has been verified to be effective on real inventory datasets of aviation manufacturing enterprises, providing new technical ideas and practical solutions for inventory data management and abnormal inventory handling in aviation manufacturing enterprises.

Attached Figure Description

[0019] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments, wherein:

Figure 1 is a schematic diagram of time feature processing. The target date in the diagram is May 31, 2023. The first dimension is the number of intervals between the year and the target year, the second dimension is the number of months in the current year, and the third dimension is the number of days between the entry date and the target date.

Figure 2 is a schematic diagram of discrete feature processing. In the diagram, the main process section information is processed using binary encoding, in which only one dimension is 1 and the others are 0 for each value. After processing, the first type of process section information corresponds to the first row and first column of the transformed matrix, the second type corresponds to the second row and second column, and so on, giving a 22×22 feature matrix.

Figure 3 is a schematic diagram of continuous feature processing. The model code information is processed using continuous encoding, with each type of information corresponding to a code.

Figure 4 is a flowchart of the abnormal inventory identification method based on machine learning according to the present invention.

Detailed Implementation

[0020] To more clearly illustrate the present invention, the following description, in conjunction with preferred embodiments and accompanying drawings, further clarifies the invention. Those skilled in the art should understand that the specific description below is illustrative rather than restrictive and should not be construed as limiting the scope of protection of the present invention.

[0021] This embodiment proposes a machine learning-based method for identifying abnormal inventory. Using inventory data from an aerospace manufacturing company from January 2016 to June 2023 as the experimental subject, the method includes four steps: data preparation, multi-factor feature learning, K-Means clustering data segmentation, and XGBoost anomaly identification. As shown in Figures 1 to 4, the specific implementation process is as follows:

Step 1: Data Preparation

① Data Filtering: Analyze the correlation between inventory fields and inventory status to determine the 21 fields of raw inventory data to be collected: data ID, MES barcode, model code, model name, task number, material code, drawing number, quality number, current quantity, material type, production date, status, main manufacturing unit, current specialized factory, completion rate, main manufacturing section, whether paperless, warehousing date, outbound date, last operation time, and responsible unit name.

[0022] ② Data collection: Collect inventory data of all products and flights of the aircraft manufacturing enterprise from January 2016 to June 2023, with a contract period of 2016-2023.

[0023] ③ Data preprocessing: The collected data is deduplicated and cleaned. For each record with missing fields, the 50 data samples nearest to it in Euclidean distance are selected and the missing fields are filled using the local mean of those samples, finally yielding an inventory data sample of size 284354×21.
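One way to read the local mean method is as nearest-neighbor imputation, sketched below with NumPy. The source does not say how distances are computed for incomplete records; measuring distance over the observed fields only, drawing neighbors from complete records, and requiring at least one complete record are all assumptions of this sketch.

```python
import numpy as np


def local_mean_impute(X, k=50):
    """Fill NaN entries of each incomplete row with the mean of that field
    over its k nearest complete rows (Euclidean distance on observed fields)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    incomplete = np.isnan(X).any(axis=1)
    complete = X[~incomplete]  # assumed non-empty
    for i in np.flatnonzero(incomplete):
        missing = np.isnan(X[i])
        diff = complete[:, ~missing] - X[i, ~missing]
        dist = np.sqrt((diff ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]
        filled[i, missing] = complete[nearest][:, missing].mean(axis=0)
    return filled


# Toy data with k=2 instead of the 50 neighbors used in the embodiment.
X = [[1.0, 1.0], [1.1, 3.0], [5.0, 10.0], [1.0, np.nan]]
filled = local_mean_impute(X, k=2)
```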

[0024] Step 2: Multi-factor feature learning

The preprocessed inventory data is processed separately according to time, discrete, and continuous features, then concatenated to form a multi-factor feature set, and a 10-fold cross-validation method is used to divide the training and test sets.

① Time feature processing: Time sample sets are formed by selecting time feature fields such as warehousing date, production start date, and last operation date. The target date for non-metallic materials is set to the last day of the quarter, and the target date for forgings and castings, metallic materials, finished products, and parts is set to the last day of the half-year. The date interval method is used to calculate, by category, the number of days between each of the time feature fields (warehousing date, production start date, and last operation date) and the target date, forming a feature of 284354×3. Using Boolean processing, based on the production status and production plan opening status of each project and product at each stage of production, inventory data is matched to obtain information such as whether the plan is open and whether production has been completed, forming a feature of 284354×2. Based on the relationship between inventory inbound / outbound cycles and inventory status, time feature fields such as inbound date, outbound date, and last operation date are selected and processed using the direct decomposition method: the window period for inbound / outbound dates is set to quarters, the window period for the last operation date is set to months, and the number of weeks corresponding to the inbound date, outbound date, and last operation date is calculated respectively, forming a feature of 284354×3. By cascading the above features, a time feature sample of 284354×8 is obtained.

[0025] ② Discrete Feature Processing: Nine fields, including model code, model name, and task number, are selected to form a discrete sample set. After uniform cleaning of the feature data, ONE-HOT encoding is used. The task number is encoded as 31 dimensions, the model code / name as 25 dimensions, the main manufacturing section as 22 dimensions, the responsible unit name as 20 dimensions, the main manufacturing unit / current specialized factory as 15 dimensions, the material type as 10 dimensions, the status as 5 dimensions, and so on. In total, 284354×168 discrete feature samples are formed.

[0026] ③ Continuous feature processing: Select 10 fields with large value ranges, such as data ID, MES barcode, material code, and drawing number, to form a continuous sample set. After summarizing and cleaning, use Label-Encoder to encode and form 284354×10 continuous feature samples.

[0027] ④ Feature cascading: Concatenate time, discrete and continuous feature samples in columns to obtain a multi-factor inventory feature set of 284354×186; divide the multi-factor inventory feature set into 10 equal parts, and randomly select 9 parts as the training set and 1 part as the test set each time to iterate the algorithm training.
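The column-wise cascading and the 10-part split can be sketched as follows, assuming NumPy. Rotating the held-out part across the ten folds is the usual reading of 10-fold cross-validation and is assumed here; the toy matrices stand in for the real 284354-row feature blocks.

```python
import numpy as np


def cascade(*blocks):
    """Concatenate feature blocks column-wise into one multi-factor feature set."""
    return np.hstack(blocks)


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices, split them into n_folds parts, and yield
    (train_idx, test_idx) with each part held out once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]


# Toy stand-ins for the time (8), discrete (168) and continuous (10) blocks.
n = 20
features = cascade(np.zeros((n, 8)), np.zeros((n, 168)), np.zeros((n, 10)))
splits = list(k_fold_indices(n))
```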

[0028] Step 3: Segmenting inventory data based on K-Means clustering

This step utilizes unsupervised clustering technology to divide sample data from multiple sources into blocks, making samples within a class as similar as possible and samples between classes as different as possible, thereby enhancing the feature representation capabilities of different data blocks and reducing information interference caused by different data sources.

[0029] Specifically, suppose we have a data sample matrix $X = \{x_1, x_2, \dots, x_n\}$ containing n samples, where $x_i$ denotes the i-th data sample and each sample has m-dimensional feature attributes. The K-Means algorithm aims to cluster the n samples into k specified clusters based on the similarity between data samples. Each sample belongs to exactly one cluster, namely the one whose cluster center is nearest to it. The algorithm first initializes k cluster centers $c_1, \dots, c_k$, then calculates the distance from each data sample to each cluster center. Euclidean distance is typically used, as shown in formula (2-1).

[0030] $$d(x_i, c_j) = \sqrt{\sum_{p=1}^{m} (x_{ip} - c_{jp})^2} \tag{2-1}$$

where $x_{ip}$ is the p-th dimensional feature of the i-th data sample and $c_{jp}$ is the p-th dimensional feature of the j-th cluster center.

[0031] By comparing the distance of each data sample to each cluster center in turn, the sample is assigned to the cluster with the nearest cluster center, resulting in k clusters. Then, the cluster center of each cluster is recalculated using formula (2-2).

[0032] $$c_j = \frac{1}{|S_j|} \sum_{x_i \in S_j} x_i \tag{2-2}$$

where $c_j$ is the updated j-th cluster center and $|S_j|$ is the number of data samples in the j-th cluster $S_j$.

[0033] By iteratively calculating new cluster centers and re-dividing the data points into clusters based on the new cluster centers, the cluster centers are stabilized, and finally the clustering results are output.

[0034] For example, set the number of data blocks k=7 and initialize 7 cluster centers; calculate the Euclidean distance between each data sample and each cluster center, and assign the sample to the nearest cluster; recalculate the cluster center of each cluster, iterate the sample allocation and center update process until the cluster centers tend to stabilize, and finally output 7 inventory data blocks.
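The assignment and update loop described above can be sketched with NumPy as follows. Initializing the centers from randomly chosen samples, the convergence tolerance, and the iteration cap are assumptions; the toy data uses k=2 rather than the k=7 of the embodiment.

```python
import numpy as np


def assign(X, centers):
    """Index of the nearest cluster center (Euclidean distance) for each sample."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def k_means(X, k, n_iter=100, tol=1e-6, seed=0):
    """Alternate nearest-center assignment and center recomputation
    until the centers stop moving (or n_iter is reached)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # assumed initialization
    for _ in range(n_iter):
        labels = assign(X, centers)
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return assign(X, centers), centers


X = np.array([[0, 0], [0, 1], [1, 0], [100, 100], [100, 101], [101, 100]], dtype=float)
labels, centers = k_means(X, k=2)
```

Each distinct label value corresponds to one inventory data block that is later handled by its own classifier.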

[0035] Step 4: XGBoost Algorithm for Abnormal Inventory Identification

The objective function of the XGBoost algorithm is expressed as follows:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{2-3}$$

The first term represents the loss between the actual value and the predicted value, which is usually a convex function, where $y_i$ is the true label of the i-th sample, $\hat{y}_i^{(t-1)}$ is the predicted label of the i-th sample in round t-1, and $f_t$ is the tree added in round t. The second term is the regularization term, used to control the complexity of the model and reduce the risk of overfitting to some extent. It is calculated as follows:

$$\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \tag{2-4}$$

where $T$ is the number of leaf nodes of the t-th tree and $w_j$ is the weight of the j-th leaf node.

[0036] To further accelerate the optimization process, formula (2-3) can be expanded as a second-order Taylor series and transformed into an approximate expression, as follows:

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t) \tag{2-5}$$

where $g_i$ and $h_i$ are the first and second derivatives of the loss $l$ with respect to $\hat{y}_i^{(t-1)}$. Let:

$$G_j = \sum_{i \in I_j} g_i \tag{2-6}$$

$$H_j = \sum_{i \in I_j} h_i \tag{2-7}$$

where $I_j$ is the set of samples assigned to the j-th leaf node. Dropping the constant term, the final objective function for optimization is:

$$Obj^{(t)} = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \right] + \gamma T \tag{2-8}$$

The specific operation is carried out according to the following steps:

1) First, call Python's XGBoost algorithm library to train one XGBoost classifier for each data block;

2) Then tune the model parameters: set the maximum depth (max_depth) to 5, the learning rate (LR) to 0.15, the maximum number of trees generated (n_estimators) to 100, the model solution mode (booster) to 'gbtree' (tree structure mode), and the number of classes (num_class) to 2 (representing "normal" and "abnormal");

3) Use accuracy (Acc), precision (Pre), and recall (Recall) for model training and parameter tuning, calculated as follows: Acc = (TP + TN) / (TP + FN + TN + FP); Pre = TP / (TP + FP); Recall = TP / (TP + FN); where TP is the number of positive inventory samples correctly identified as positive, TN is the number of negative inventory samples correctly identified as negative, FP is the number of negative inventory samples incorrectly identified as positive, and FN is the number of positive inventory samples incorrectly identified as negative. This scheme treats inventory data in abnormal states as positive examples and inventory data in normal states as negative examples.

[0037] 4) Apply the trained parameter set and model to output the identified inventory status results.
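The per-block training and identification flow of steps 1) to 4) can be sketched as follows. To keep the sketch runnable without extra dependencies, a trivial majority-vote model stands in for the classifier; routing a new sample to the block of its nearest cluster center and defaulting to "normal" when a block has no trained model are assumptions, since the source does not describe them.

```python
import numpy as np


class MajorityClassifier:
    """Stand-in model (not XGBoost): predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = int(np.bincount(y).argmax())
        return self

    def predict(self, X):
        return np.full(len(X), self.label_, dtype=int)


def nearest_block(X, centers):
    """Route each sample to the data block of its nearest cluster center."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def fit_block_models(X, y, centers, make_model=MajorityClassifier):
    """Train one model per data block; make_model builds a fresh model."""
    blocks = nearest_block(X, centers)
    return {int(b): make_model().fit(X[blocks == b], y[blocks == b])
            for b in np.unique(blocks)}


def predict_by_block(X, centers, models):
    """Predict 0 (normal) or 1 (abnormal) with the model of each sample's block.
    Samples routed to a block without a model keep the default 0."""
    blocks = nearest_block(X, centers)
    pred = np.zeros(len(X), dtype=int)
    for b, model in models.items():
        mask = blocks == b
        if mask.any():
            pred[mask] = model.predict(X[mask])
    return pred


centers = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]], dtype=float)
y = np.array([0, 0, 1, 1, 1, 0])
models = fit_block_models(X, y, centers)
pred = predict_by_block(np.array([[0.5, 0.5], [9.0, 9.0]]), centers, models)
```

With the xgboost package installed, one could presumably pass `make_model=lambda: xgboost.XGBClassifier(max_depth=5, learning_rate=0.15, n_estimators=100, booster="gbtree")` to use the parameter values listed in step 2); for two classes that wrapper normally infers the binary objective from the labels, so `num_class` would not need to be set there.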

[0038] This invention first introduces multi-factor feature learning to extract features from the original inventory data, obtaining multi-factor feature expressions with rich potential and effective information. Then, considering the certain differences and internal consistency between different categories of inventory data, K-Means clustering technology is used to perform feature clustering on the complex and diverse inventory data, realizing inventory data block division to improve the coupling within the data block and enhance the accuracy of inventory identification. Furthermore, the XGBoost algorithm is introduced for each inventory data block to mine the implicit knowledge of multiple data blocks, distinguish between normal and abnormal inventory, and realize adaptive abnormal inventory identification, thereby providing effective support for inventory status early warning and subsequent inventory disposal.

[0039] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any simple modifications or equivalent changes made to the above embodiments based on the technical essence of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for identifying abnormal inventory based on machine learning, characterized in that it includes the following steps:
a. Based on business management experience, data collection, data filtering and data cleaning are carried out on the original inventory data in the production management information system to obtain collected inventory data containing multiple data types including time characteristics, discrete characteristics and continuous characteristics;
b. For different types of data in the collected inventory data, different processing strategies are used to extract effective features, and then various feature data samples are cascaded to form a multi-factor feature set;
c. Use the K-Means clustering method to perform feature clustering on data samples with multi-factor feature sets to achieve inventory data segmentation;
d. Introduce the XGBoost algorithm for each inventory data block to mine the implicit knowledge of multiple data blocks, distinguish between normal inventory and abnormal inventory, and achieve adaptive abnormal inventory identification.

2. The abnormal inventory identification method based on machine learning as described in claim 1, characterized in that the time features include the warehousing date, the production start date, and the last operation date; the discrete features include the model code, model name, task number, status, main manufacturing unit, and current specialized factory; the continuous features include the data ID, MES barcode, material code, and drawing number.

3. The abnormal inventory identification method based on machine learning as described in claim 2, characterized in that, in step b, regarding the processing of time characteristics, based on the actual operation of inventory management, one or more combinations of date interval method, Boolean processing method, and direct decomposition method are selected to mine effective information of time characteristics.

4. The abnormal inventory identification method based on machine learning as described in claim 3, characterized in that the date interval method includes: determining the target date for identifying the inventory status of each category based on the actual turnover of each category; selecting time features that are strongly correlated with the inventory dwell time and status, and calculating the number of days between them and the target date as a new time feature dimension.

5. The abnormal inventory identification method based on machine learning as described in claim 3, characterized in that the Boolean processing method includes: matching production information with current inventory data by project, product, and flight range; determining whether the inventory has been put into production or whether the production plan has been opened; and generating Boolean features, i.e., 1 indicates that the plan has been opened and 0 indicates that it has not.

6. The abnormal inventory identification method based on machine learning as described in claim 3, characterized in that the direct decomposition method includes: setting the window value to one month or one quarter, calculating the number of weeks in which the outbound time and the last operation time fall within the corresponding window, and using this as a new time feature dimension.

7. The abnormal inventory identification method based on machine learning as described in claim 2, characterized in that, in step b, the discrete features are processed using the ONE-HOT feature encoding strategy, which treats different values of discrete features as different labels to form sparse encoding, thereby obtaining multidimensional feature vectors that are mutually orthogonal and independent of one another.

8. The abnormal inventory identification method based on machine learning as described in claim 2, characterized in that, in step b, the continuous features are processed using the Label-Encoder feature encoding strategy, which treats continuous features as features with weakly ordered relationships and directly converts each feature dimension into a continuous numerical representation, forming a new continuous feature dimension.

9. The abnormal inventory identification method based on machine learning as described in claim 1, characterized in that the process of using K-Means clustering to divide inventory data into blocks in step c is as follows: initialize k cluster centers, calculate the Euclidean distance from each data sample to each cluster center, and assign the sample to the cluster corresponding to the nearest cluster center; recalculate the cluster center of each cluster, iterate the above process until the cluster centers tend to stabilize, and output k inventory data blocks.

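10. The abnormal inventory identification method based on machine learning as described in claim 1, characterized in that the objective function of the XGBoost algorithm is:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

In the formula, $y_i$ is the true label of the i-th sample, $\hat{y}_i^{(t-1)}$ is the predicted label of the i-th sample in round t-1, $f_t$ is the tree added in round t, and $\Omega(f_t)$ is a regularization term. The regularization term is calculated as:

$$\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$$

where $T$ is the number of leaf nodes of the t-th tree, $w_j$ is the weight of the j-th leaf node, and γ and λ are control parameters. The optimized objective function obtained by performing a second-order Taylor expansion on the objective function is:

$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \right] + \gamma T$$

In the formula, $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, where $g_i$ and $h_i$ are the first and second derivatives of the loss $l$ with respect to $\hat{y}_i^{(t-1)}$, and $I_j$ is the set of samples assigned to the j-th leaf node.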

11. The abnormal inventory identification method based on machine learning as described in claim 10, characterized in that, in step d, when training the XGBoost classifier, the maximum model depth is set to 5, the learning rate to 0.15, the maximum number of trees generated to 100, the model solution method to tree structure mode, and the number of classes to 2, corresponding to normal inventory and abnormal inventory respectively. Accuracy (Acc), Precision (Pre), and Recall (Recall) are used as evaluation metrics for model training and parameter tuning, calculated as follows: Acc = (TP + TN) / (TP + FN + TN + FP); Pre = TP / (TP + FP); Recall = TP / (TP + FN); where TP is the number of abnormal inventory samples correctly identified as abnormal, TN is the number of normal inventory samples correctly identified as normal, FP is the number of normal inventory samples incorrectly identified as abnormal, and FN is the number of abnormal inventory samples incorrectly identified as normal.