[0027] With reference to Figure 1, the behavior recognition method based on wavelet domain joint statistical descriptors of the present invention proceeds as follows:
[0028] Step 1: Densely sample the behavior video to extract the dense trajectory of the video sequence.
[0029] Common trajectory extraction methods include trajectory tracking based on KLT (Kanade-Lucas-Tomasi), trajectory tracking based on SIFT (Scale Invariant Feature Transform) descriptor matching, and trajectory tracking based on dense optical flow. The present invention uses the dense optical flow-based trajectory tracking method proposed by Wang et al. in the article "Action recognition by dense trajectories" in 2011 to extract the motion trajectory of the behavior video, and the steps are as follows:
[0030] (1.1) Use dense grids to densely sample the video in eight scale spaces in turn, where the scaling factor between adjacent scale spaces is 1/√2 and the sampling interval is 5 pixels;
[0031] (1.2) Calculate the dense optical flow of the sampled video, and track and match the dense sampling points of adjacent frames according to the dense optical flow of the sampling points to form a motion trajectory.
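Steps (1.1)–(1.2) can be sketched as follows. This is a minimal illustration that assumes the dense optical flow fields between consecutive frames have already been computed (the patent obtains them with the method of Wang et al.); `track_points` is a hypothetical helper name, and the nearest-neighbor flow lookup is a simplification of the median-filtered lookup used in the cited work.

```python
import numpy as np

def track_points(points, flows):
    """Track densely sampled points across frames by following dense flow.

    points : (N, 2) array of (x, y) positions sampled in frame 0.
    flows  : list of (H, W, 2) arrays; flows[t][y, x] is the displacement
             of pixel (x, y) from frame t to frame t+1.
    Returns an (L+1, N, 2) array of trajectories, L = len(flows).
    """
    traj = [np.asarray(points, dtype=float)]
    for flow in flows:
        h, w, _ = flow.shape
        cur = traj[-1]
        # Nearest-neighbor lookup of the flow at each tracked point.
        xi = np.clip(np.round(cur[:, 0]).astype(int), 0, w - 1)
        yi = np.clip(np.round(cur[:, 1]).astype(int), 0, h - 1)
        traj.append(cur + flow[yi, xi])
    return np.stack(traj)
```

For example, a point at (2, 2) followed through three frames of constant rightward flow ends at (5, 2).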
[0032] Step 2: Take the motion trajectory extracted in Step 1 as the center and construct a cube curved along the trajectory.
[0033] A cube of size W×H×L is constructed around each trajectory, where W and H are the width and height of the cube's cross section at each time point, and L is the length of the motion trajectory.
[0034] Step 3: Perform 3D stationary wavelet transform decomposition on the video data.
[0035] Compared with the 3-D discrete wavelet transform, the 3-D stationary wavelet transform performs no downsampling during the decomposition of the video, which ensures the translation invariance of the algorithm. At the same time, the 3-D stationary wavelet transform decomposes the video into subbands of the same size as the original data, which is convenient for joint analysis and processing of the different subbands. The decomposition steps are as follows:
[0036] (3.1) The video samples in the data set are preprocessed from color data to grayscale data to reduce computational complexity;
[0037] (3.2) Decompose the preprocessed video data into approximation coefficient subbands LLL_l and 7×l_e detail coefficient subbands in seven different directions, HLL_l, LHL_l, LLH_l, HHL_l, HLH_l, LHH_l, HHH_l, all of the same size as the original data, where l is the wavelet decomposition level, l = 1, 2, ..., l_e, and l_e is the total number of decomposition levels. LLL indicates that the subband contains approximation information in all three dimensions; HLL indicates that it contains detail information in the first dimension and approximation information in the second and third dimensions, and so on for the subbands in the other directions.
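One decomposition level of step (3.2) can be sketched as below. This is a minimal illustration assuming Haar analysis filters and periodic boundary handling; the patent does not specify the wavelet family, and `swt3_level1` is an illustrative name. Because no downsampling is performed, all eight subbands have the same size as the input volume.

```python
import numpy as np

def swt3_level1(vol):
    """One level of an undecimated (stationary) 3-D Haar wavelet transform.

    vol : 3-D array (frames, height, width). Returns a dict of 8 subbands,
    each the same size as vol, keyed 'LLL', 'HLL', ..., 'HHH'
    (one letter per axis, in axis order 0, 1, 2).
    """
    def lo(a, ax):  # Haar low-pass: sum with the circular neighbor
        return (a + np.roll(a, -1, axis=ax)) / np.sqrt(2)

    def hi(a, ax):  # Haar high-pass: difference with the circular neighbor
        return (a - np.roll(a, -1, axis=ax)) / np.sqrt(2)

    bands = {}
    for name in ('LLL', 'LLH', 'LHL', 'LHH', 'HLL', 'HLH', 'HHL', 'HHH'):
        out = vol.astype(float)
        for ax, ch in enumerate(name):
            out = lo(out, ax) if ch == 'L' else hi(out, ax)
        bands[name] = out
    return bands
```

On a constant volume, every detail subband is zero and only LLL carries energy, as expected.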
[0038] Step 4. Construct a mutual information descriptor between wavelet coefficient subbands in a cube curved along the track.
[0039] The coefficients of the subbands obtained by the stationary wavelet transform exhibit a certain dependence: coefficients of relatively large magnitude tend to appear at the same spatial locations in subbands of different scales and different directions.
[0040] The wavelet coefficient subbands located in the same direction at different scales are defined as a parent-child relationship, and the wavelet coefficient subbands located at the same scale in different directions are defined as a cousin relationship.
[0041] In order to quantitatively measure the dependency between the coefficient subbands with a parent-child or cousin relationship, it is necessary to construct a mutual information descriptor between subbands. The steps are as follows:
[0042] (4.1) Calculate the mutual information between subbands of coefficients with parent-child or cousin relationship:
[0043] MI(X; Y) = Σ_i Σ_j (h_ij / N) · log[(N · h_ij) / (h_i · h_j)]
[0044] where X represents the coefficients of the parent subband or of one cousin subband in the cube curved along the trajectory, Y represents the coefficients of the child subband or of the other cousin subband in the cube curved along the trajectory, and MI(X; Y) is the mutual information between the two subbands with a parent-child or cousin relationship in the trajectory cube; h_ij is the value of bin (i, j) of the joint statistical histogram of the two coefficient subbands, h_i = Σ_j h_ij is the marginal histogram of the parent (or first cousin) subband, h_j = Σ_i h_ij is the marginal histogram of the child (or second cousin) subband, and N is the total number of pixels in the cube curved along the trajectory;
[0045] (4.2) Calculate the mutual information between all subband pairs satisfying the above relationships among the subbands obtained by the stationary wavelet decomposition, and concatenate them to obtain the inter-subband mutual information descriptor D_m, expressed as:
[0046] D_m = [MI_1, MI_2, ..., MI_f, ..., MI_Z],
[0047] where MI_f represents the mutual information between the f-th pair of coefficient subbands having a parent-child or cousin relationship, f = 1, 2, ..., Z, and Z is the number of coefficient subband pairs having a parent-child or cousin relationship.
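Step (4.1) can be sketched directly from the joint histogram formula. This is a minimal illustration; the number of histogram bins is an assumption (the patent does not fix it), and `subband_mutual_info` is an illustrative name.

```python
import numpy as np

def subband_mutual_info(x, y, bins=16):
    """Mutual information between two coefficient subbands via their
    joint statistical histogram: MI = sum (h_ij/N) log(N h_ij / (h_i h_j))."""
    x, y = np.ravel(x), np.ravel(y)
    h, _, _ = np.histogram2d(x, y, bins=bins)   # joint histogram h_ij
    n = h.sum()                                  # N, total sample count
    hi = h.sum(axis=1, keepdims=True)            # marginal of first subband
    hj = h.sum(axis=0, keepdims=True)            # marginal of second subband
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = (h / n) * np.log((n * h) / (hi * hj))
    return np.nansum(terms)  # empty bins contribute zero
```

As a sanity check, a subband compared with itself yields much higher mutual information than two independent subbands.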
[0048] Step 5: Construct a co-occurrence histogram descriptor between the stationary wavelet transform approximation coefficient subbands and the detail coefficient subbands in the cube curved along the trajectory:
[0049] (5.1) The approximation subbands and detail subbands obtained in step (3.2) form Q subband pairs (LLL_l, HLL_l), (LLL_l, LHL_l), (LLL_l, LLH_l), (LLL_l, HHL_l), (LLL_l, HLH_l), (LLL_l, LHH_l) and (LLL_l, HHH_l), which are used to qualitatively analyze the dependence between the detail subbands in each direction and the approximation subband of the same scale.
[0050] (5.2) Taking the subband pair (LLL_l, HLL_l) as an example, define the first histogram H_1 and the second histogram H_2 as the two co-occurrence histograms used for the joint statistics of the subband pair (LLL_l, HLL_l), and initialize the frequency of every channel in H_1 and H_2 to zero;
[0051] (5.3) In the approximation subband LLL_l, select the coefficient a corresponding to any point in the cube curved along the trajectory as the target coefficient. By comparing min(a, d_t) and min(d, a_t), calculate, in the a→a_t, d→d_t neighborhood direction, the frequency H_1(C_h) of the channel of the first histogram H_1 to which a belongs and the frequency H_2(C_h) of the channel of the second histogram H_2:
[0052] If min(a, d_t) ≥ min(d, a_t), then:
[0053] H_1(C_h) = H_1(C_h) + 1;
[0054] If min(a, d_t) < min(d, a_t), then:
[0055] H_2(C_h) = H_2(C_h) + 1;
[0056] where a_t represents a coefficient in the three-dimensional neighborhood at distance 1 from the target coefficient a, d is the coefficient at the position in the detail subband HLL_l corresponding to a, d_t is the coefficient at the position in the detail subband HLL_l corresponding to a_t, and C_h is the histogram channel to which the coefficient a belongs;
[0057] (5.4) Calculate the channel frequencies H_1(C_hg) and H_2(C_hg) for the coefficients a_g of all points in the cube curved along the trajectory, obtaining, in the a→a_t, d→d_t neighborhood direction, the co-occurrence histograms H_1 and H_2 of the subband pair:
[0058] H_1 = [H_1(C_h1), H_1(C_h2), ..., H_1(C_hg), ..., H_1(C_hU)], H_2 = [H_2(C_h1), H_2(C_h2), ..., H_2(C_hg), ..., H_2(C_hU)]
[0059] where g = 1, 2, ..., U, and U is the total number of channels in the histogram;
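Steps (5.2)–(5.4) for one neighborhood direction can be sketched as below. This is a minimal illustration under stated assumptions: the channel C_h is obtained by uniform quantization of a over an assumed value range, the number of channels is an assumption, and periodic wrapping (via `np.roll`) stands in for proper boundary handling; `cooccurrence_histograms` is an illustrative name.

```python
import numpy as np

def cooccurrence_histograms(A, D, offset, channels=8, lo=-1.0, hi=1.0):
    """Co-occurrence histograms H1, H2 of an approximation subband A and a
    detail subband D for one neighborhood direction.

    offset : (dz, dy, dx) giving the a -> a_t neighbor at distance 1.
    Each voxel a votes into H1 if min(a, d_t) >= min(d, a_t), otherwise
    into H2, at the channel C_h obtained by uniformly quantizing a into
    `channels` bins over [lo, hi].
    """
    # Neighbor coefficients a_t and d_t (periodic wrap at the borders).
    At = np.roll(A, shift=[-o for o in offset], axis=(0, 1, 2))
    Dt = np.roll(D, shift=[-o for o in offset], axis=(0, 1, 2))
    # Channel C_h of every target coefficient a.
    ch = np.clip(((A - lo) / (hi - lo) * channels).astype(int),
                 0, channels - 1)
    vote1 = np.minimum(A, Dt) >= np.minimum(D, At)
    H1 = np.bincount(ch[vote1].ravel(), minlength=channels)
    H2 = np.bincount(ch[~vote1].ravel(), minlength=channels)
    return H1, H2
```

Every coefficient votes into exactly one of the two histograms, so their total counts sum to the cube size.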
[0060] (5.5) Calculate the first cumulative co-occurrence histogram H_a1 corresponding to the first co-occurrence histogram H_1 and the second cumulative co-occurrence histogram H_a2 corresponding to the second co-occurrence histogram H_2:
[0061] H_a1(C_hg) = Σ_{j=1}^{g} H_1(C_hj), H_a2(C_hg) = Σ_{j=1}^{g} H_2(C_hj), g = 1, 2, ..., U
[0062] (5.6) Normalize the first cumulative co-occurrence histogram H_a1 and the second cumulative co-occurrence histogram H_a2, respectively, to obtain the first normalized cumulative co-occurrence histogram H_n1 and the second normalized cumulative co-occurrence histogram H_n2;
[0063] (5.7) Use the least squares method to fit, for the two normalized cumulative histograms H_n1 and H_n2, the two straight-line equations corresponding to the highest point of each channel: y_1 = k_1·x_1 + b_1 and y_2 = k_2·x_2 + b_2, where x is the value of the histogram channel to which a coefficient belongs, y is the number of coefficients whose value is less than or equal to x, k is the slope of the line, and b is a constant; the distance from the intersection of the two straight lines to the line y = 0 is d_1, and the distance to the line y = 1 is d_2;
[0064] (5.8) Select from the two straight lines [k_1, k_2, b_1, b_2, d_1, d_2] as the co-occurrence histogram feature description vector v in the a→a_t, d→d_t neighborhood direction, and concatenate the feature description vectors v_t of all selected neighborhood directions to obtain the description vector of one subband pair V = [v_1, v_2, ..., v_t, ..., v_P], where t = 1, 2, ..., P and P is the number of selected directions;
[0065] (5.9) Concatenate the description vectors of all selected subband pairs to obtain the wavelet coefficient co-occurrence histogram descriptor D_c = [V_1, V_2, ..., V_s, ..., V_Q], where V_s is the description vector of the s-th subband pair, s = 1, 2, ..., Q, and Q is the number of selected subband pairs.
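Steps (5.5)–(5.8) for one pair of co-occurrence histograms can be sketched as follows. This is a minimal illustration under the assumption that the line of each histogram is least-squares fitted to the points of its normalized cumulative curve (one point per channel); `line_fit_features` is an illustrative name, and the degenerate parallel-line case is handled with a fallback the patent does not discuss.

```python
import numpy as np

def line_fit_features(H1, H2):
    """Feature vector v = [k1, k2, b1, b2, d1, d2] from one pair of
    co-occurrence histograms.

    Each histogram is accumulated, normalized to [0, 1], and a straight
    line y = k*x + b is least-squares fitted to the cumulative curve;
    d1 and d2 are the distances from the intersection of the two fitted
    lines to the lines y = 0 and y = 1.
    """
    x = np.arange(1, len(H1) + 1, dtype=float)     # channel values
    fits = []
    for H in (H1, H2):
        c = np.cumsum(H) / max(H.sum(), 1)         # normalized cumulative
        k, b = np.polyfit(x, c, 1)                 # least-squares line
        fits.append((k, b))
    (k1, b1), (k2, b2) = fits
    # Intersection point of the two fitted lines (fallback if parallel).
    y_int = k1 * (b2 - b1) / (k1 - k2) + b1 if k1 != k2 else 0.0
    d1, d2 = abs(y_int), abs(1.0 - y_int)
    return np.array([k1, k2, b1, b2, d1, d2])
```

Since normalized cumulative histograms are nondecreasing, both fitted slopes are nonnegative for any valid histogram pair.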
[0066] Step 6: Concatenate the inter-subband mutual information descriptor D_m and the wavelet coefficient co-occurrence histogram descriptor D_c to obtain the wavelet domain joint statistical descriptor D_u.
[0067] The mutual information descriptor D_m = [MI_1, MI_2, ..., MI_f, ..., MI_Z] and the wavelet coefficient co-occurrence histogram descriptor D_c = [V_1, V_2, ..., V_s, ..., V_Q] are concatenated to obtain the wavelet domain joint statistical descriptor D_u = [MI_1, MI_2, ..., MI_f, ..., MI_Z, V_1, V_2, ..., V_s, ..., V_Q].
[0068] Step 7: Construct a bag-of-words model for the wavelet domain joint statistical descriptors, obtain the video representation, and train the SVM classifier.
[0069] (7.1) According to the commonly used division ratios for the different human behavior data sets, divide the wavelet domain joint statistical descriptors corresponding to all video samples into a training set D_tr and a test set D_te. Taking the human behavior database KTH as an example, each type of behavior has 25 video samples; the wavelet domain joint statistical descriptors corresponding to 16 samples are used as the training set, and those corresponding to the remaining 9 samples are used as the test set.
[0070] (7.2) Apply the K-means clustering method to the training set D_tr to generate a dictionary DI_{De×Ce}; using the dictionary DI_{De×Ce}, perform quantization coding on the training set D_tr and the test set D_te to obtain the histogram vector h_tr of the training set D_tr and the histogram vector h_te of the test set D_te, where De denotes the feature dimension and Ce denotes the number of cluster centers;
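The dictionary learning and quantization coding in step (7.2) can be sketched as below. This is a minimal stand-in using a tiny K-means written from scratch (any standard K-means implementation can be substituted); `kmeans_codebook` and `encode` are illustrative names not taken from the patent.

```python
import numpy as np

def kmeans_codebook(X, k, iters=20, seed=0):
    """Tiny K-means: cluster descriptors X (n, De) into k centers (the
    dictionary, k = Ce cluster centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every descriptor to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def encode(descs, centers):
    """Quantize one video's descriptors against the dictionary and return
    its normalized bag-of-words histogram vector."""
    labels = np.argmin(((descs[:, None] - centers) ** 2).sum(-1), axis=1)
    h = np.bincount(labels, minlength=len(centers)).astype(float)
    return h / max(h.sum(), 1)
```

The resulting histogram vector has one entry per cluster center and sums to one, so videos with different numbers of trajectories remain comparable.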
[0071] Step 8: Use the histogram vectors h_tr of the training set to train the SVM classifier, input the histogram vectors h_te of the test set into the trained SVM, and output the behavior category to which each test sample in the test set D_te belongs.
[0072] To verify the effectiveness of the present invention, the present invention is used to recognize behaviors on the commonly used human behavior databases KTH and UCF-Sports.
[0073] The recognition results are as follows: the correct recognition rate on the KTH database is 97.17%, and the correct recognition rate on the UCF-Sports database is 96.00%.