Small-sample X-ray security screening contraband detection method based on fused SVM
By combining a two-stage training method with Faster R-CNN and SVM modules, the data dependency problem of identifying new contraband in X-ray security inspections was solved, enabling efficient detection of irregularly shaped contraband with limited data, thus improving recognition accuracy and adaptability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CIVIL AVIATION UNIV OF CHINA
- Filing Date
- 2022-12-30
- Publication Date
- 2026-06-19
AI Technical Summary
Existing automated X-ray security inspection algorithms require a large amount of data to detect new types of prohibited items, making it difficult to quickly deploy and identify irregularly shaped prohibited items in emergency situations.
A two-stage training method is adopted. First, the base class is trained using the Faster R-CNN model. Then, the SVM module is integrated in the fine-tuning stage, and the SVM constraint module is used to fine-tune the model with small sample data to enhance the model's ability to detect new contraband.
It enables rapid identification of novel contraband with limited data, improves the model's detection accuracy and adaptability, and supports rapid deployment and identification of irregularly shaped contraband.
Figure CN116403002B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of X-ray security inspection for the detection of prohibited items, and more specifically to a small-sample X-ray security inspection method that integrates SVM for the detection of prohibited items.
Background Technology
[0002] To maintain public safety, China has equipped transportation hubs such as airports, train stations, and subway entrances with security screening machines for inspecting luggage and bags. Security personnel operate these machines, using X-ray imaging to check passengers' bags for prohibited items. To improve efficiency, many manufacturers have begun equipping security screening machines with automated algorithms for detecting prohibited items. These algorithms typically let a computer assist human operators by automatically identifying the location and type of prohibited items in X-ray images. Such automated detection algorithms play a crucial role in deterring criminals and curbing criminal activity.
[0003] In recent years, with increasing social conflicts and the rise of new types of crime, many criminals choose to carry unconventionally shaped prohibited items, such as oddly shaped knives and firearms, into public areas. These oddly shaped prohibited items can also be used for criminal purposes and are more likely to slip through routine security checks, necessitating strengthened prevention measures.
[0004] However, existing automated detection algorithms all employ deep learning methods, and deep learning models are highly dependent on the amount of available supervised data. In such emergency situations, security personnel may not yet have sufficient data for training. If the sample size is insufficient, the model will struggle to detect contraband in luggage. Therefore, there is an urgent need for an automated security contraband detection technology that can achieve identification without requiring extensive data collection, enabling rapid deployment of security algorithms and a swift response to new types of contraband.
Summary of the Invention
[0005] This invention overcomes the shortcomings of the prior art and provides a small-sample X-ray security inspection method for detecting contraband by incorporating SVM.
[0006] The objective of this invention is achieved through the following technical solution.
[0007] A small-sample X-ray security inspection method integrating SVM for contraband detection is proposed, employing a two-stage training approach for the basic detection model. This two-stage training method includes a first-stage base class training phase and a second-stage fine-tuning phase.
[0008] The first stage, the base class training stage, is used to train a model with basic detection capabilities for X-ray images of contraband.
[0009] The second stage, the fine-tuning stage, is used to train a model that can detect new classes in small samples.
[0010] The specific steps are as follows:
[0011] Step 1: Collect or create a large number of labeled X-ray images of luggage containing contraband;
[0012] Step 2: Use this large amount of base class data for base class training;
[0013] Step 3: Collect a small number of labeled X-ray images of luggage containing rare and unusual contraband;
[0014] Step 4: Use this small amount of new class data for fine-tuning, obtaining a detection model that can detect rare and unusual contraband.
[0015] The basic detection model is a two-stage Faster R-CNN comprising a backbone network, a region proposal network (RPN), a RoI feature extraction head consisting of two fully connected layers, and a classifier and regressor that perform the final detection. The backbone network extracts image information and compresses it into feature information, and a feature pyramid outputs four layers of image features at different scales. The RPN generates predicted anchor boxes: once the multi-scale features are input into the RPN, candidate boxes (proposals) are generated for each scale. RoI pooling extracts the proposal regions corresponding to the proposed anchor boxes and standardizes the length of their feature information. The proposal regions then pass through the two fully connected layers, which output RoI feature vectors of a fixed dimension; these vectors are input into the classifier for the final category recognition task.
[0016] In the first stage, the base class training stage, the Faster R-CNN model is trained using a large amount of base class data to obtain general multi-level semantic information about X-ray data. The training dataset consists of all base class data, and the model's representational ability is trained through multiple iterations over this data.
[0017] In the second stage, the fine-tuning stage, new class data is fused with base class data for simultaneous training. K samples are selected from each category to form the training dataset for the fine-tuning phase. During training, the model uses data randomly sampled from this dataset, and parameters are adjusted using gradient descent. The parameters of the backbone network are frozen, while the remaining parameters are fine-tuned normally. In this stage, a trainable SVM constraint module (an embedding layer module) is fused into the basic detection model to add an additional loss constraint, so that the model parameters adapt better to small-sample tasks. Before being fed into the SVM constraint module, the proposal region features are mapped by a remapping module consisting of two fully connected layers.
[0018] The SVM constraint module includes an IoU filter, a multi-class SVM optimization task, and a QP interpreter.
[0019] The IoU filter is used to filter out low-quality proposal region features;
[0020] The multi-class SVM optimization task is used to construct an additional classification task;
[0021] The QP interpreter is used to solve the gradient propagation task of the above SVM optimization task.
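The IoU filter described above can be sketched as follows. This is a minimal illustration only, assuming boxes in (x1, y1, x2, y2) form and a threshold of 0.5; the patent text does not state the box format or the threshold, and the function names `iou` and `filter_proposals` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_proposals(proposals, gt_boxes, threshold=0.5):
    """Keep proposals whose best IoU with any ground-truth box reaches the threshold.

    The threshold value is an assumption made for this sketch.
    """
    kept = []
    for box in proposals:
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        if best >= threshold:
            kept.append(box)
    return kept


# Hypothetical boxes: one tight proposal, one barely overlapping, one disjoint.
gt_boxes = [(10, 10, 50, 50)]
proposals = [(12, 12, 50, 50), (40, 40, 90, 90), (200, 200, 240, 240)]
kept = filter_proposals(proposals, gt_boxes)
```

Only the first proposal survives here, since it is the only one that overlaps the ground-truth box substantially. In the method itself, the features of the surviving proposals would be the ones passed on to the SVM optimization task.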
[0022] The multi-class SVM optimization task classifies the input vectors into different categories by determining support vectors and establishing a maximum-margin objective function. The specific solution process is as follows:
[0023]
[0024] The above formula involves a penalty coefficient, slack (relaxation) variables, and the Kronecker delta. It is solved with the QP solver provided by the qpth library, which obtains the parameters and computes the gradient by solving the KKT conditions;
[0025] Let:
[0026] The objective function in the dual space is as follows:
[0027]
[0028] The above formula involves the one-hot encoding of each input feature and the mapping matrix to be solved. The kernel method is used to compute the inner product of the input data by projecting it into a higher-dimensional inner product space. The objective function using the kernel function is as follows:
[0029]
[0030] In the above formula, the kernel term is the kernel function of the features;
[0031] To measure the classification performance of the module, the negative log-likelihood loss is used to calculate the SVM loss. This loss measures the similarity between the distribution of the logits output by the SVM layer and the true distribution, as follows:
[0032]
[0033] In the second fine-tuning stage, the SVM loss constraint is added, and the final joint loss function is:
[0034]
[0035] The above formula includes a proportionality coefficient used to balance the losses.
[0036] The proportionality coefficient for adjusting the loss balance is 0.5.
[0037] The beneficial effects of this invention are as follows: This solution provides automated detection support for novel contraband items for which a large number of usable samples are not yet available. A model with rapid learning capabilities can be obtained with only one long-term training session. For certain scarce contraband items that are difficult to obtain, security inspectors do not need to acquire a large amount of contraband data; the model can support the identification of such contraband items, providing conditions for rapid deployment of X-ray contraband automatic detection algorithms and providing technical support for quickly combating certain criminal activities.
Attached Figure Description
[0038] Figure 1 is a flowchart of the model;
[0039] Figure 2 is a diagram of the SVM module;
[0040] Figure 3 shows the experimental results using the SIXray dataset as training data (10 shots);
[0041] Figure 4 shows the experimental results using the SIXray dataset as training data (30 shots).
Detailed Implementation
[0042] Example
[0043] This invention employs Faster R-CNN, a two-stage object detection model with relatively high detection accuracy, as the basic detection framework for few-shot tasks. The components of this model are:
[0044] The system consists of a backbone network, a region proposal network (RPN), a RoI feature extraction head composed of two fully connected layers (FC layers), and a classifier and regressor that ultimately perform detection.
[0045] To detect the categories and locations of various contraband items, the backbone network extracts image information and compresses it into feature information, and a feature pyramid outputs four layers of image features at different scales. The RPN generates predicted anchor boxes: the multi-scale features are input into the RPN to generate candidate boxes (proposals) for each scale. RoI pooling extracts the proposal regions corresponding to the proposed anchor boxes and standardizes the length of their feature information. The proposal regions then pass through the two fully connected layers, which output RoI feature vectors of a fixed dimension. These vectors are input into the classifier for the final category recognition task.
[0046] As shown in Figure 1, the two-stage training method consists of a base-training stage and a fine-tuning stage.
[0047] In the case of few-shot problems, base class data refers to a large amount of existing supervised data, while new class data refers to supervised data of new categories that are not yet seen by the model and have insufficient available samples. In the first stage, the Faster R-CNN model will be trained on a large amount of base class data to obtain general multi-level semantic information about X-ray data. The training dataset consists of all base class data. The model's representational capabilities are trained through multiple iterations using base class data. The knowledge learned by the model in the first stage using a large amount of base class data will be selectively transferred to new classes during fine-tuning with small samples.
[0048] In the second stage, the small-sample fine-tuning stage, this method fuses new class data with base class data for simultaneous training. K samples are selected from each category to form the training dataset for the fine-tuning phase. The model is trained on data randomly sampled from this dataset, and parameters are adjusted using gradient descent. The amount of data used in this stage is much smaller than that used in base class training. To reduce the number of parameters that need to be adjusted, this method freezes the parameters of the backbone network, while the remaining parameters are fine-tuned normally.
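The construction of the K-shot fine-tuning set can be illustrated with a short sketch. It assumes annotations are available as (image id, category) pairs and that sampling is done without replacement; the function name `build_few_shot_set`, the image ids, and the class counts are hypothetical and not taken from the patent.

```python
import random
from collections import defaultdict


def build_few_shot_set(annotations, k, seed=0):
    """Select up to k samples per category from (image_id, category) pairs."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_category = defaultdict(list)
    for image_id, category in annotations:
        by_category[category].append(image_id)

    few_shot = {}
    for category, image_ids in sorted(by_category.items()):
        # Sample without replacement; keep everything if fewer than k exist.
        few_shot[category] = rng.sample(image_ids, min(k, len(image_ids)))
    return few_shot


# Hypothetical annotations: two base classes with many samples, one new class with few.
annotations = (
    [(f"wrench_{i}", "wrench") for i in range(50)]
    + [(f"pliers_{i}", "pliers") for i in range(40)]
    + [(f"knife_{i}", "knife") for i in range(12)]
)
subset = build_few_shot_set(annotations, k=10)
```

Base and new classes are sampled in the same way, so each category contributes the same number of images to the fine-tuning set regardless of how much base class data exists.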
[0049] To add new constraints to the fine-tuning stage, this method integrates a trainable SVM embedding layer module into the object detection framework. This SVM layer adds an additional loss constraint, using the SVM loss to guide the fine-tuning of the model parameters so that they adapt to tasks with small sample sizes. Before being fed into the SVM module, the proposal region features are mapped by a remapping module consisting of two fully connected layers.
[0050] As shown in Figure 2, the SVM module includes a built-in solution process for a multi-class SVM task. The support vector machine (SVM) is one of the most common and effective classification algorithms in traditional machine learning. It classifies input vectors into different categories by determining support vectors and establishing a maximum-margin objective function. The objective function of this multi-class SVM algorithm is:
[0051] The objective function involves a penalty coefficient, slack (relaxation) variables, and the Kronecker delta. The objective function of the SVM is convex and the problem being solved is a quadratic programming (QP) problem, so the QP solver provided by the qpth library can be used to solve for the parameters and to compute the gradient by solving the KKT conditions.
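The formula itself is not reproduced in this text. For orientation, one standard multi-class SVM objective that matches the quantities named here (a penalty coefficient, slack variables, and a Kronecker term, taken here to be the Kronecker delta) is the Crammer-Singer form below. The symbols are assumptions made for this sketch: $w_k$ is the weight vector of class $k$, $f(x_n)$ the remapped feature of proposal $n$, $y_n$ its label, $C$ the penalty coefficient, $\xi_n$ the slack variable, and $\delta$ the Kronecker delta. It should not be read as the patent's exact formula.

```latex
\min_{\{w_k\}} \; \frac{1}{2}\sum_{k}\lVert w_k\rVert_2^2 + C\sum_{n}\xi_n
\quad \text{s.t.} \quad
w_{y_n}^{\top} f(x_n) - w_k^{\top} f(x_n) \;\ge\; 1 - \delta_{y_n,k} - \xi_n,
\qquad \forall\, n,\, k
```

The objective is quadratic in the weights and the constraints are linear, which is why a QP solver applies.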
[0052] When actually defining the SVM objective function, transforming it to the dual space can reduce the complexity of the problem. Let:
[0053] The objective function in the dual space is as follows:
[0054] The dual objective involves the one-hot encoding of each input feature and the mapping matrix to be solved. Since the computation in the dual space depends only on inner products of the input data, the kernel method is used to project this inner product into a higher-dimensional inner product space. The objective function using the kernel function is as follows:
[0055] In the above formula, the kernel term is the kernel function of the features; different kernel functions can be selected and substituted into the formula.
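Because the dual problem depends on the inputs only through pairwise inner products, swapping kernels amounts to swapping the function that fills the Gram matrix. The sketch below shows this with a linear and an RBF kernel; the choice of these two kernels, the `gamma` value, and the toy feature vectors are assumptions for illustration, since the patent does not name a specific kernel.

```python
import math


def linear_kernel(x, y):
    """Plain inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed value for this sketch."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(features, kernel):
    """Pairwise kernel values, the only form in which the dual sees the inputs."""
    return [[kernel(xi, xj) for xj in features] for xi in features]


# Hypothetical remapped proposal features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K_linear = gram_matrix(features, linear_kernel)
K_rbf = gram_matrix(features, rbf_kernel)
```

Either matrix could then be handed to the QP solver in place of the raw inner products.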
[0056] To measure the classification performance of the module, this method uses the common negative log-likelihood loss to calculate the SVM loss. This loss measures the similarity between the distribution of the logits output by the SVM layer and the true distribution, as follows:
[0057] During the fine-tuning phase, the SVM loss constraint mentioned above is added, and the final joint loss function is:
[0058] The above formula includes a proportionality coefficient used to balance the losses, taken as 0.5.
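A minimal sketch of the loss computation follows. It assumes the SVM layer yields one logit per class for each proposal and that the joint loss has the additive form "detection loss plus weighted SVM loss"; the additive form and the function names are assumptions, while the weight 0.5 is the value stated in paragraph [0036].

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def nll_loss(batch_logits, labels):
    """Mean negative log-likelihood of the true class for each sample."""
    total = 0.0
    for logits, label in zip(batch_logits, labels):
        total -= log_softmax(logits)[label]
    return total / len(labels)


def joint_loss(detection_loss, svm_loss, weight=0.5):
    """Assumed additive combination; 0.5 is the coefficient from paragraph [0036]."""
    return detection_loss + weight * svm_loss


# Hypothetical SVM-layer logits for two proposals over three classes.
svm_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
labels = [0, 1]
svm_loss = nll_loss(svm_logits, labels)
total_loss = joint_loss(detection_loss=1.0, svm_loss=svm_loss)
```

With uniform logits the per-sample loss equals the logarithm of the number of classes, which is a quick sanity check for any implementation of this loss.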
[0059] Based on the above method, a two-stage training model is used. During the fine-tuning stage, reducing the parameter space and applying SVM loss constraints effectively improves the model's ability to identify prohibited item categories with small sample sizes. The model obtained through this method can support prohibited item categories with insufficient sample sizes.
[0060] The experimental results were obtained using SIXray images, a public dataset in the field of security inspection of contraband, as training data. The metric used is the mean average precision (mAP).
[0061] In this embodiment, only guns and knives are treated as small-sample classes, while the remaining categories are treated as large-sample classes. The average precision of the large-sample classes (base classes) is denoted bAP, and the average precision of the small-sample classes (new classes) is denoted nAP. The table in Figure 3 contains the results of the 10-shot experiment, where 10-shot means that only 10 samples from each new class are used for fine-tuning. The table in Figure 4 contains the results of the 30-shot experiment, where 30-shot means that only 30 samples from each new class are used for fine-tuning.
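The split into bAP and nAP can be illustrated as below, assuming each is the plain mean of per-class average precision over the corresponding group of classes. The per-class values are placeholders chosen for the example, not results from the patent, and the base class names other than guns and knives are assumptions.

```python
def mean_ap(per_class_ap, classes):
    """Mean of per-class average precision over the given classes."""
    return sum(per_class_ap[c] for c in classes) / len(classes)


# Placeholder AP values for demonstration only; these are not measured results.
per_class_ap = {
    "gun": 0.50,
    "knife": 0.30,
    "wrench": 0.80,
    "pliers": 0.90,
    "scissors": 0.70,
}
novel_classes = ["gun", "knife"]  # small-sample (new) classes in this embodiment
base_classes = [c for c in per_class_ap if c not in novel_classes]

nAP = mean_ap(per_class_ap, novel_classes)
bAP = mean_ap(per_class_ap, base_classes)
```

Reporting the two means separately makes it visible whether gains on the new classes come at the cost of the base classes.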
[0062] Experimental results show that, with a 10-shot dataset, the conventional training method achieves only 31.3 and 15.0 mAP for novel samples, while the SVM-constrained small-sample contraband detection scheme achieves 54.8 and 28.9 mAP. In the fine-tuning task with 30 samples, the conventional training method achieves only 26.9 and 14.1 mAP for novel samples, while the SVM-constrained small-sample contraband detection scheme achieves 70.1 and 38.9 mAP. For the conventional training method, the experimental results for 10 and 30 shots are not significantly different, while the SVM-constrained small-sample detection method shows a significant improvement with 30 shots. Compared to the conventional training method, the SVM-constrained small-sample detection method improves the nAP50 index by 27%. These experimental results demonstrate that the present invention effectively improves the recognition performance of novel samples.
[0063] The foregoing has provided a detailed description of one embodiment of the present invention, but this description is merely a preferred embodiment and should not be construed as limiting the scope of the invention. All equivalent variations and modifications made within the scope of the claims of this invention should still fall within the patent coverage of this invention.
Claims
1. A small-sample X-ray security inspection method for detecting contraband using SVM, characterized by: A two-stage training method is used for the basic detection model, consisting of a first stage of base class training and a second stage of fine-tuning. The first stage, the base class training stage, is used to train a model with basic detection capabilities for X-ray images of contraband. The second fine-tuning stage is used to train a model capable of detecting new classes in small samples. A trainable SVM constraint module embedding layer module is integrated into the basic detection model in the second fine-tuning stage to add additional loss constraints to the model, so that the model parameters change in a direction that is more suitable for small sample tasks. The specific steps are as follows: Step 1: Collect or create a large number of X-ray images of luggage packages containing contraband, labeled with tags; Step 2: Use the large amount of base class data mentioned above to train the base classes; Step 3: Collect a small number of X-ray images of luggage packages containing rare and unusual contraband with tags; Step 4: Use the small amount of new data mentioned above to fine-tune the training and obtain a detection model that can detect rare and unusual contraband. The SVM constraint module includes an IoU filter, a multi-class SVM optimization task, and a QP interpreter.
2. The small-sample X-ray security inspection method for detecting contraband using SVM as described in claim 1, characterized in that: The basic detection model is a two-stage Faster R-CNN comprising a backbone network, a region proposal network (RPN), a RoI feature extraction head consisting of two fully connected layers, and a classifier and regressor that perform the final detection. The backbone network extracts image information and compresses it into feature information, and a feature pyramid outputs four layers of image features at different scales. The RPN generates predicted anchor boxes: once the multi-scale features are input into the RPN, candidate boxes are generated for each scale. RoI pooling extracts the proposal regions corresponding to the proposed anchor boxes and standardizes the length of their feature information. The proposal regions then pass through the two fully connected layers, which output RoI feature vectors of a fixed dimension; these vectors are input into the classifier for the final category recognition task.
3. The small-sample X-ray security inspection method for detecting contraband using SVM as described in claim 2, characterized in that: In the first stage, the base class training stage, the Faster R-CNN model is trained using a large amount of base class data to obtain general multi-level semantic information about X-ray data. The training dataset consists of all base class data, and the model's representational ability is trained through multiple iterations over this data.
4. The small-sample X-ray security inspection method for detecting contraband using SVM as described in claim 3, characterized in that: In the second stage, the fine-tuning stage, new class data is fused with base class data for simultaneous training. K samples are selected from each category to form the training dataset for the fine-tuning phase. During training, the model uses data randomly sampled from this dataset, and parameters are adjusted using gradient descent. The parameters of the backbone network are frozen, while the remaining parameters are fine-tuned normally. Before being fed into the SVM constraint module, the proposal region features are mapped by a remapping module consisting of two fully connected layers.
5. The small-sample X-ray security inspection method for detecting contraband using SVM as described in claim 4, characterized in that: The IoU filter is used to filter out low-quality proposal region features; the multi-class SVM optimization task is used to construct an additional classification task; the QP interpreter is used to solve the gradient propagation task of the above SVM optimization task.
6. The small-sample X-ray security inspection method for detecting contraband using SVM as described in claim 5, characterized in that: The multi-class SVM optimization task classifies the input vectors into different categories by determining support vectors and establishing a maximum-margin objective function. In the specific solution process, the objective function involves a penalty coefficient, slack (relaxation) variables, and the Kronecker delta, and is solved with the QP solver provided by the qpth library, which obtains the parameters and computes the gradient by solving the KKT conditions; after a substitution, the objective function in the dual space is obtained, which involves the one-hot encoding of each input feature and the mapping matrix to be solved; the kernel method is used to compute the inner product of the input data by projecting it into a higher-dimensional inner product space, giving the objective function using the kernel function, in which the kernel term is the kernel function of the features; to measure the classification performance of the module, the negative log-likelihood loss is used to calculate the SVM loss, which measures the similarity between the distribution of the logits output by the SVM layer and the true distribution; in the second fine-tuning stage, the SVM loss constraint is added to form the final joint loss function, which includes a proportionality coefficient used to balance the losses.
7. The small-sample X-ray security inspection method for detecting contraband using SVM as described in claim 6, characterized in that: The proportionality coefficient for adjusting the loss balance is 0.5.