To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Figure 1 shows a schematic diagram of the system structure of the present invention. The basic hardware required to realize the system structure of the present invention is a computer with a 2.4 GHz CPU and 1 GB of memory; the required software is a programming environment (Visual C++ 6.0). The system structure of the present invention is implemented in a computer and includes: scene image reading module 1, grayscale image judgment module 2, grayscale image conversion module 3, image level division module 4, local binary pattern feature calculation module 5, local binary pattern feature quantization module 6, histogram calculation module 7, principal component analysis calculation module 8, histogram feature fusion module 9, counting module 10, structural information feature fusion module 11, classifier training module 12, first-level classifier 13, similar category pair calculation module 14, local texture feature calculation module 15, second-level classifier 16, and classification result fusion module 17.
The scene image reading module 1 reads the scene image. The grayscale image judgment module 2 is connected to the scene image reading module 1; it receives the scene image, judges whether it is a color image or a grayscale image, and outputs it accordingly. The grayscale image conversion module 3 is connected to the grayscale image judgment module 2; it receives color images and converts them into grayscale images. The image level division module 4 is connected to the grayscale image judgment module 2 and the grayscale image conversion module 3; it divides the grayscale image at three levels, obtaining 31 image blocks corresponding to the first-level, second-level, and third-level divisions. The local binary pattern feature calculation module 5 is connected to the image level division module 4; it calculates the structural information feature of each pixel in an image block, obtaining an 8-dimensional local binary pattern feature. The local binary pattern feature quantization module 6 is connected to the local binary pattern feature calculation module 5; it quantizes the 8-dimensional local binary pattern features, obtaining 1-dimensional local binary pattern quantized features. The histogram calculation module 7 is connected to the local binary pattern feature quantization module 6; it calculates the histogram of the 1-dimensional local binary pattern quantized features, obtaining the 255-dimensional histogram feature H_ps. The principal component analysis calculation module 8 is connected to the histogram calculation module 7; it performs principal component analysis on the 255-dimensional histogram feature H_ps, obtaining the 40-dimensional histogram feature H_p; the histogram calculation module 7 is then used to calculate the histogram of the 8-dimensional local binary pattern features, obtaining the 8-dimensional histogram feature H_b. The histogram feature fusion module 9 is connected to the principal component analysis calculation module 8; it fuses the 8-dimensional histogram feature H_b and the 40-dimensional histogram feature H_p, obtaining the 48-dimensional structural information feature H_f = (H_b, H_p) of an image block. The counting module 10 is connected to the histogram feature fusion module 9; it counts the number of image blocks whose 48-dimensional structural information feature H_f has been calculated; when the count is less than 31, it outputs the unprocessed image blocks to the local binary pattern feature calculation module 5, and when the count reaches 31, it outputs the 48-dimensional structural information features H_f of all 31 image blocks. The structural information feature fusion module 11 is connected to the counting module 10; it fuses the 48-dimensional structural information features H_f of the 31 image blocks, obtaining the global structural information feature H_g = (H_f1, ..., H_f31) of the image. The classifier training module 12 is connected to the structural information feature fusion module 11; it trains on the global structural information features to obtain the first-level classifier 13. The first-level classifier 13 is connected to the classifier training module 12; it classifies the scene image and obtains the classification result R_1 in probability form. The similar category pair calculation module 14 is connected to the first-level classifier 13; it counts the first two candidates of the classification results R_1, that is, the pairs (R_11, R_12), to obtain N similar category pairs. The local texture feature calculation module 15 is connected to the similar category pair calculation module 14; it performs calculations on the N similar category pairs to obtain the local texture information features of the scene image. The classifier training module 12 is also connected to the local texture feature calculation module 15; it trains on the local texture information features of the scene image to obtain N second-level classifiers. The second-level classifiers 16 are connected to the classifier training module 12; they classify the scene image, obtaining N classification results in probability form C_i = (C_i1, C_i2), i ∈ [1, N], where C_i1 and C_i2 are arranged from largest to smallest. The classification result fusion module 17 is connected to the second-level classifiers 16; it fuses the result R_1 obtained by the first-level classifier 13 with the results C_i obtained by the second-level classifiers 16, obtaining the final classification result of the scene image.
The specific steps of the classifier training module 12 are as follows:
The global structural features output by the structural information feature fusion module 11 and the local texture features output by the local texture feature calculation module 15 are used as the input learning samples x. For a two-class problem on scene images, the discriminant function given by the support vector machine (SVM) is:
$$f(x) = \sum_{i=1}^{N} y_i \alpha_i k(x, x_i) + b$$
where N is the number of learning samples, y_i is the category label of learning sample x_i (+1 for a positive sample, -1 for a negative sample), b is a constant, and k(x, x_i) is a kernel function, defined as follows:
$$k(x, x_i) = \Phi(x) \cdot \Phi(x_i)$$
where Φ(x) and Φ(x_i) denote a function that transforms x into a high-dimensional space. The discriminant function above can then be regarded as linear in the weight vector w:
$$w = \sum_{i=1}^{N} y_i \alpha_i \Phi(x_i)$$
The parameters α_i, i = 1, 2, ..., N, are determined from the learning samples by solving the following optimization problem:
$$\min J(w) = \frac{1}{2}\|w\|^2$$
$$\text{s.t.}\quad y_i f(x_i) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1, 2, \ldots, N$$
where ξ_i is a slack variable. This is a quadratic programming problem, which can be solved by converting it to its dual problem:
$$\max W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j k(x_i, x_j)$$
$$\text{s.t.}\quad \sum_{i=1}^{N} \alpha_i y_i = 0,\quad 0 \le \alpha_i \le C,\quad i = 1, 2, \ldots, N$$
In the formula, C is called the penalty factor; it is a specified constant that controls the degree of penalty for misclassified samples and strikes a balance between the proportion of misclassified samples and the complexity of the algorithm.
The above quadratic programming problem can be solved by an optimization algorithm; the learning samples whose α_i are non-zero are called support vectors (SVs). For multi-class problems, a one-versus-rest method can be used to convert the multi-class problem into a set of two-class problems: the discriminant function of each category is obtained by training that category against all remaining categories, and the final category is the one whose discriminant function output is the maximum.
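As an illustrative sketch only, the one-versus-rest training and probability-form classification described above might look as follows in Python with scikit-learn; the data shapes, the RBF kernel, and the variable names are assumptions, since the patent does not prescribe a particular SVM implementation:

```python
# Sketch: one-versus-rest SVM with probability-form outputs.
# Shapes and kernel choice are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

X_train = np.random.rand(200, 1488)      # 1488 = 31 blocks x 48-D H_f (assumed)
y_train = np.random.randint(0, 10, 200)  # n = 10 scene categories (assumed)

clf = SVC(kernel='rbf', probability=True, decision_function_shape='ovr')
clf.fit(X_train, y_train)

# R_1: per-category probabilities arranged in descending order (cf. step S8)
proba = clf.predict_proba(X_train[:1])[0]
order = np.argsort(proba)[::-1]
R1 = proba[order]        # (R_11, R_12, ..., R_1n)
top_pair = order[:2]     # candidate pair (R_11, R_12) for module 14
```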
The specific steps of the local texture feature calculation module 15 are as follows:
Let X be the set of pixel features of the similar category pictures output by the similar category pair calculation module 14. Ordinary vector quantization then needs to satisfy:
$$\min_{V} \sum_{m=1}^{M} \min_{k=1,\ldots,K} \|x_m - v_k\|_2^2$$
where V = [v_1, ..., v_K]^T is the set of visual words in the codebook (or dictionary) formed from the similar category pictures output by the similar category pair calculation module 14, generally obtained by clustering X. This quantization essentially represents the feature x_m by its nearest visual word v_k', which is likely to cause a large quantization error. To reduce the quantization error, the method of the present invention represents x_m by a linear combination of all visual words; at the same time, to prevent over-fitting, the coefficients of the linear combination need to be restricted, as shown in the formula:
$$\min_{U, V} \sum_{m=1}^{M} \|x_m - u_m V\|_2^2 + \lambda \|u_m\|_0$$
$$\text{s.t.}\quad \|v_k\| \le 1$$
where U = [u_1, ..., u_M]^T is the set of linear combination coefficients, and ||u_m||_0 denotes the 0-norm of u_m, that is, the number of non-zero elements in u_m. The codebook V should be over-complete, that is, K > D, where D is the feature dimension. The above optimization equation is solved, and u_m is taken as the local texture feature.
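As a sketch of this step, the over-complete codebook and the sparse coefficients u_m can be approximated with scikit-learn's dictionary learning, using orthogonal matching pursuit to enforce the sparsity that the 0-norm term calls for; the sizes M, D, K and the sparsity level are assumptions, and the patent does not fix a particular solver:

```python
# Sketch: learn an over-complete codebook V and sparse codes u_m.
# M, D, K and the sparsity level are illustrative assumptions.
import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.rand(500, 8)   # M = 500 features of dimension D = 8 (assumed)
K = 64                       # over-complete codebook: K > D

# OMP approximates the l0-constrained problem
# min ||x_m - u_m V||^2 + lambda ||u_m||_0, with atoms normalized (||v_k|| <= 1).
dl = DictionaryLearning(n_components=K,
                        transform_algorithm='omp',
                        transform_n_nonzero_coefs=5)
U = dl.fit_transform(X)      # U: (M, K) linear combination coefficients u_m
V = dl.components_           # V: (K, D) codebook of visual words
```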
As shown in Figure 2, the present invention provides a method for extracting global structural information features of scene images. The specific steps of the method are as follows:
Step S1: Use the scene image reading module 1 to read the scene image, and use the grayscale image judgment module 2 to determine whether the scene image is a color image; if it is a color image, use the grayscale image conversion module 3 to convert the color image into a grayscale image; if it is already a grayscale image, skip to step S2;
Step S2: Use the image level division module 4 to divide the grayscale image at three levels, obtaining the image blocks corresponding to the first-level, second-level, and third-level divisions; after the three levels of division, 31 image blocks are obtained;
Step S3: Use the local binary pattern feature calculation module 5 to calculate the structural information feature of each pixel in the image block to obtain an 8-dimensional local binary pattern feature;
Step S4: Use the local binary pattern feature quantization module 6 to quantize the 8-dimensional local binary pattern features, obtaining 1-dimensional local binary pattern quantized features, and use the histogram calculation module 7 to calculate the histogram of the 1-dimensional quantized features, obtaining the 255-dimensional histogram feature H_ps; use the principal component analysis calculation module 8 to perform principal component analysis on the 255-dimensional histogram feature H_ps, obtaining the 40-dimensional histogram feature H_p; then use the histogram calculation module 7 to calculate the histogram of the 8-dimensional local binary pattern features, obtaining the 8-dimensional histogram feature H_b; finally, use the histogram feature fusion module 9 to fuse the 8-dimensional histogram feature H_b and the 40-dimensional histogram feature H_p, obtaining the 48-dimensional structural information feature H_f = (H_b, H_p) of an image block (a code sketch of this step is given after step S13);
Step S5: Use the counting module 10 to determine whether the 48-dimensional structural information features H_f of all 31 image blocks have been calculated; if not, repeat steps S3 to S4; if all calculations are completed, proceed to step S6;
Step S6: Use the structural information feature fusion module 11 to fuse the 48-dimensional structural information features H_f of all 31 image blocks, obtaining the global structural information feature H_g = (H_f1, ..., H_f31) of the image;
Step S7: Use the classifier training module 12 to train the global structure information feature to obtain the first-level classifier 13;
Step S8: Classify the scene image with the first-level classifier 13 to obtain the classification result in probability form R_1 = (R_11, R_12, ..., R_1n), where the R_1i (i ∈ [1, n]) are arranged in descending order and n is the number of scene image categories;
Step S9: Use the similar category pair calculation module 14 to count the first two candidates of the classification results R_1, that is, the possible combinations (R_11, R_12), to obtain N similar category pairs, N ∈ [1, n(n-1)/2]; to reduce subsequent computational complexity, N can generally be set to n/5;
Step S10: Use the local texture feature calculation module 15 to perform calculations for N pairs of similar categories to obtain local texture information features of the scene image;
Step S11: Use the classifier training module 12 to train the local texture information features to obtain N second-level classifiers 16;
Step S12: Classify the scene image with the second-level classifiers 16, obtaining N classification results in probability form C_i = (C_i1, C_i2), i ∈ [1, N], where C_i1 and C_i2 are arranged from largest to smallest;
Step S13: Use the classification result fusion module 17 to fuse the result R_1 obtained in step S8 with the results obtained in step S12, obtaining the final classification result of the scene image.
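For illustration, step S4 for a single image block might be sketched as follows; the bit weighting used for quantization and the per-position reading of the 8-dimensional histogram H_b are assumptions consistent with the dimensions stated above, and the 40-dimensional projection presumes a PCA basis learned beforehand from training blocks:

```python
# Sketch of step S4 for one image block.
# F_b: (num_pixels, 8) array of 0/1 LBP features from step S3.
import numpy as np

def block_structural_feature(F_b, pca_projection):
    # 1-D quantized feature: read the 8 bits as an integer in [0, 255]
    # (the bit weighting is an assumption; the patent only says "quantize").
    q = F_b @ (2 ** np.arange(8))

    # 255-D histogram H_ps (bin count follows the stated dimension)
    H_ps = np.bincount(q.astype(int), minlength=256)[:255]

    # 40-D H_p: project with a PCA basis learned from training blocks
    H_p = pca_projection @ H_ps        # pca_projection: (40, 255), assumed given

    # 8-D histogram H_b: frequency of a '1' at each neighbor position
    H_b = F_b.mean(axis=0)

    # 48-D fused structural information feature H_f = (H_b, H_p)
    return np.concatenate([H_b, H_p])
```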
The specific steps by which the image level division module 4 performs the first-level division to obtain the corresponding image blocks are as follows:
Step S211: First, the grayscale image is uniformly divided into a 4×4 grid according to its aspect ratio, obtaining 16 image blocks, as shown by the solid lines of the first-level division in Figure 3;
Step S212: After cutting off 1/8 of the grayscale image on each of its four sides, the remaining image is evenly divided into a 3×3 grid according to its aspect ratio, obtaining 9 image blocks. The first-level division thus yields a total of 25 image blocks, as shown by the dotted lines of the first-level division in Figure 3.
Among them, the specific steps by which the image level division module 4 performs the second-level division to obtain the corresponding image blocks are as follows:
Step S221: First, the grayscale image is reduced to half its size while keeping the original aspect ratio, and the reduced image is then evenly divided into a 2×2 grid, obtaining 4 image blocks, as shown by the solid lines of the second-level division in Figure 3;
Step S222: After cutting off 1/4 of the periphery of the grayscale image, 1 image block is obtained. The second-level division thus yields a total of 5 image blocks, as shown by the dotted lines of the second-level division in Figure 3.
Among them, the image level division module 4 performs the third-level division of the corresponding image block as follows: the grayscale image is reduced while keeping the original aspect ratio, obtaining 1 image block, as shown by the line of the third-level division in Figure 3.
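A sketch of the complete three-level division appears below; the reduction factors at the second and third levels and the use of integer slicing are assumptions based on one reading of the description:

```python
# Sketch of the three-level division, yielding 31 blocks in total.
# Reduction factors and interpolation are illustrative assumptions.
import cv2

def grid_blocks(img, rows, cols):
    h, w = img.shape[:2]
    return [img[r * h // rows:(r + 1) * h // rows,
                c * w // cols:(c + 1) * w // cols]
            for r in range(rows) for c in range(cols)]

def three_level_division(gray):
    h, w = gray.shape[:2]
    blocks = []
    # Level 1: 4x4 grid (16 blocks) plus 3x3 grid after trimming 1/8 per side (9)
    blocks += grid_blocks(gray, 4, 4)
    blocks += grid_blocks(gray[h // 8:h - h // 8, w // 8:w - w // 8], 3, 3)
    # Level 2: half-size image in a 2x2 grid (4) plus a central crop (1)
    half = cv2.resize(gray, (w // 2, h // 2))
    blocks += grid_blocks(half, 2, 2)
    blocks.append(gray[h // 4:h - h // 4, w // 4:w - w // 4])
    # Level 3: the reduced whole image as a single block (1)
    blocks.append(half)
    return blocks  # 16 + 9 + 4 + 1 + 1 = 31
```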
Among them, the specific steps for the local binary pattern feature calculation module 5 to obtain 8-dimensional local binary pattern features are as follows:
Step S31: First, for a pixel P_0 in the image block, define its 8 neighborhood pixels as P_i, where i = 1 to 8;
Step S32: Compare the grayscale intensity of pixel P_0 with that of each of the 8 neighborhood pixels P_i; if P_0 ≥ P_i, record 0, and if P_0 < P_i, record 1. Each pixel thus yields an 8-dimensional local binary pattern feature F_b = (f_b1, f_b2, f_b3, f_b4, f_b5, f_b6, f_b7, f_b8), where f_bi = 0 or 1, i = 1 to 8.
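A vectorized sketch of steps S31 and S32 follows; the ordering of the 8 neighbors is an assumption, since the patent does not specify it:

```python
# Sketch of steps S31-S32: one 8-D local binary pattern feature per pixel.
import numpy as np

def lbp_features(block):
    # 8 neighborhood offsets of P_0 (the ordering is an assumption)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = block.shape
    center = block[1:h - 1, 1:w - 1]
    F = np.zeros((h - 2, w - 2, 8), dtype=np.uint8)
    for i, (dy, dx) in enumerate(offsets):
        neighbor = block[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        F[:, :, i] = (center < neighbor).astype(np.uint8)  # f_bi per step S32
    return F.reshape(-1, 8)  # one feature F_b per interior pixel
```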
Among them, the specific steps by which the similar category pair calculation module 14 determines similar category pairs are as follows:
Step S91: Count the number of common occurrences C(R_11, R_12) of each pair (R_11, R_12);
Step S92: Calculate the classification accuracies P(R_11) and P(R_12) of categories R_11 and R_12;
Step S93: Calculate the similarity S = C(R_11, R_12) / (P(R_11) · P(R_12));
Step S94: Take the top N category pairs (R_11, R_12) with the largest similarity S as the similar category pairs.
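Steps S91 to S94 can be sketched as follows; the input statistics are assumed to come from running the first-level classifier over a validation set:

```python
# Sketch of steps S91-S94: select the N most similar category pairs.
from collections import Counter

def similar_pairs(top2_predictions, accuracy, N):
    # top2_predictions: one (R_11, R_12) category pair per validation sample
    # accuracy: per-category classification accuracy P(.), assumed precomputed
    C = Counter(tuple(sorted(p)) for p in top2_predictions)       # step S91
    S = {pair: c / (accuracy[pair[0]] * accuracy[pair[1]])        # steps S92-S93
         for pair, c in C.items()}
    return sorted(S, key=S.get, reverse=True)[:N]                 # step S94
```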
Among them, the specific steps by which the classification result fusion module 17 fuses the result R_1 obtained by the first-level classifier 13 with the results C_i obtained by the second-level classifiers 16 are as follows:
Step S131: Find the maximum value Tmax among R_11 of the first-level classification result R_1 = (R_11, R_12, ..., R_1n) and the C_i1 of the second-level classification results C_i = (C_i1, C_i2), i ∈ [1, N];
Step S132: Determine whether Tmax is greater than θ, where θ is an empirical value set according to the scene images; if Tmax is greater than θ, the final classification result of the scene image is R_1; if Tmax is less than or equal to θ, the final classification result of the scene image is the weighted average of R_1 and the C_i (i ∈ [1, N]).
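A sketch of this fusion rule follows; the specific weights of the weighted average are an assumption, since the patent fixes only the threshold test and the fallback to a weighted average:

```python
# Sketch of steps S131-S132: fuse first- and second-level results.
# The weighting scheme (w1 and the per-pair split) is an assumption.
import numpy as np

def fuse_results(R1, C_list, pairs, theta, w1=0.5):
    # R1: (n,) first-level probabilities; C_list: N results (C_i1, C_i2)
    # pairs: the N similar category pairs aligned with C_list
    Tmax = max([R1.max()] + [c[0] for c in C_list])
    if Tmax > theta:
        return int(np.argmax(R1))      # final result is given by R_1
    fused = w1 * R1.copy()             # weighted average of R_1 and the C_i
    for (a, b), (c1, c2) in zip(pairs, C_list):
        fused[a] += (1 - w1) * c1 / len(C_list)
        fused[b] += (1 - w1) * c2 / len(C_list)
    return int(np.argmax(fused))
```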
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution readily conceivable by a person familiar with the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.