Method for identifying taxicabs in real time by utilizing video images

A recognition method using video image technology, applied to character and pattern recognition, road vehicle traffic control systems, instruments, etc. It addresses problems such as unpredictable traffic-flow interference and a chaotic, complex external environment that is difficult to control and predict, and achieves high judgment accuracy and robustness, increased effectiveness, and improved judgment accuracy.

Inactive Publication Date: 2013-02-13
Cites: 3 | Cited by: 8

AI-Extracted Technical Summary

Problems solved by technology

[0015] (4) The influence of a complex external environment, such as scene shadows and unpredictable traffic-flow interference, which is chaotic and difficult to control or predict.
Since taxis account for 30% of th...



The invention relates to the field of vehicle identification, and particularly to a method for identifying taxicabs in real time by utilizing video images. The method comprises the following steps: establishing a sample library and setting a color category set for taxicabs; defining a color parameter boundary and an area threshold for each color category in the set, and training a support vector machine classifier; tracking the taxicabs to be identified in the input video frames; extracting the area of each color category in the tracking window and comparing it with the corresponding area threshold, where the tracking stage finishes if the area of at least one color category exceeds its threshold, and otherwise continues; translating the tracking window by several pixels in the up, down, left and right directions to obtain four sub-windows; and inputting the sub-windows and the standard tracking window into the support vector machine classifier to obtain identification results, on which polling statistics are performed. Experiments show that the taxicab identification method improves the accuracy of judgment, and is high in judgment accuracy and robustness.

Application Domain

Road vehicle traffic control · Character and pattern recognition

Technology Topic

Vehicle identification · Pattern recognition +2




  • Experimental program(1)

Example Embodiment

[0040] The present invention is described in further detail below in conjunction with the accompanying drawings, to facilitate understanding by those skilled in the art:
[0041] Compared with other motor vehicles, taxis have the following salient features:
[0042] Feature 1: the dome light. The dome light of a taxi is a clear sign distinguishing it from other vehicles, as it significantly changes the contour of the vehicle body.
[0043] Feature 2: the vehicle type. Taxis are small passenger cars, clearly different from medium and large vehicles in shape and outline.
[0044] Feature 3: color. To highlight the difference between taxis and private cars and make taxis easy for citizens to distinguish, each region typically imposes clear limits on taxi body colors, so taxis differ significantly from private cars in color.
[0045] For example, in Beijing all taxis are two-tone cars whose color scheme alludes to the ancient Chinese Five Elements culture: every taxi operating in the city, whatever its main color, carries a yellow band in the middle of the body, signifying that the element earth occupies the center; the remaining body colors include blue, purple, white, and red.
[0046] Another example is Shanghai, where it is stipulated that companies with more than 2,000 vehicles may choose their own taxi colors. According to statistics, there are 9 different taxi colors in Shanghai; these are known as corporate colors and differ significantly from the colors used on private cars.
[0047] Referring to Figures 2-4, this embodiment provides a real-time taxi recognition method for video images. The method relies on the distinctive features of taxis relative to other vehicles described above: it first analyzes the body color to perform a rough first classification of the vehicle, then extracts contour features and uses them to train a support vector machine (SVM) classifier, and finally recognizes the taxi.
[0048] Referring to Figure 1, which shows the overall design flow of the taxi identification method: the main design idea is the same as in conventional vehicle identification, i.e., a taxi classifier is first constructed and taxis are then identified with it. The two steps are described in detail below:
[0049] 1. Build a taxi classifier, which consists of the following steps:
[0050] 1.1. Build a sample library containing positive samples (taxis) and negative samples (other motor vehicles). For a given algorithm, the size and typicality of the sample library jointly determine the accuracy of the SVM classifier. Taxis in one area often differ considerably from those in another: besides the color characteristics described above, contour characteristics such as the size of the dome light and its outline under night lighting also vary greatly. Therefore, the sample library should preferably be built from vehicle images collected by multiple surveillance cameras in the same area (note: "area" here means a region in which all taxis follow the same specification; in practice the region should be chosen according to the requirements of the local traffic control department). An SVM classifier trained on such a library is inevitably regional and unsuitable for other regions, but its accuracy improves accordingly. Conversely, if vehicle images from surveillance cameras in multiple regions are used, the classifier becomes universal, but its accuracy usually decreases to some extent.
[0051] Moreover, the collected vehicle images must be comprehensive and typical: the sample library should contain images of various models from various angles under various weather and lighting conditions. Only with such a sample library is the accuracy of the SVM classifier guaranteed.
[0052] 1.2. Based on the positive sample images collected above, determine all main body colors of local taxis and define them as the taxi color category set. In this embodiment, based on traffic surveillance video from a certain location, the color category set contains eight colors: orange, earthy yellow, green, light blue, dark blue, dark purple, white, and red.
[0053] 1.3. Determine the color parameter boundary of each category in the color category set. Careful analysis of a large number of samples, in particular of how the same color changes under different illumination, shows that illumination mainly changes the brightness of a color: red, for instance, ranges from dark red through red to light red. Accordingly, the color parameter boundaries are determined by selecting positive-sample images of taxis of each color under different illuminations, manually calibrating regions of each color, and extracting the ranges of the three HSV parameters H, S, and V within the calibrated regions. Table 1 lists the color parameter boundaries of the eight colors of the color category set in this embodiment.
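The HSV boundary test of step 1.3 can be sketched as follows. Note that the boundary values below are hypothetical placeholders, not the manually calibrated boundaries of Table 1, and the helper names are illustrative only:

```python
# Hypothetical colour parameter boundaries: colour -> (H range, S range, V range),
# with H in degrees [0, 360), S and V in [0, 1]. Placeholder values only.
COLOR_BOUNDS = {
    "red":    ((0.0, 15.0),  (0.5, 1.0),  (0.3, 1.0)),
    "orange": ((15.0, 45.0), (0.4, 1.0),  (0.3, 1.0)),
    "white":  ((0.0, 360.0), (0.0, 0.15), (0.8, 1.0)),
}

def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB to (H in degrees, S, V in [0, 1])."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx
    s = 0.0 if mx == 0 else (mx - mn) / mx
    if mx == mn:
        h = 0.0
    elif mx == r:
        h = (60.0 * (g - b) / (mx - mn)) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / (mx - mn) + 120.0
    else:
        h = 60.0 * (r - g) / (mx - mn) + 240.0
    return h, s, v

def matches_color(rgb, color):
    """True if the pixel's HSV values fall inside the colour's boundaries."""
    h, s, v = rgb_to_hsv(*rgb)
    (h_lo, h_hi), (s_lo, s_hi), (v_lo, v_hi) = COLOR_BOUNDS[color]
    return h_lo <= h <= h_hi and s_lo <= s <= s_hi and v_lo <= v <= v_hi
```

In a real deployment the per-colour bounds would be read from the calibrated table, and hues that wrap around 0° (such as red) would need a two-interval test.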
[0055] The HSV color space is briefly described below. The HSV color model evolved from the CIE three-dimensional color space and adopts an intuitive, user-oriented color description. It is close to the HVC spherical color solid of the Munsell color system, but the HSV model is an inverted hexagonal pyramid (hexcone), corresponding to only half of the Munsell sphere (the southern hemisphere), so all pure colors without black lie on the color plane at the top surface of the hexagonal pyramid. In the HSV hexcone model, hue (H) lies on the color plane parallel to the top surface and rotates around the central axis V; the six standard colors red, yellow, green, cyan, blue, and magenta are spaced 60 degrees apart. Value (V) changes from top to bottom along the central axis of the hexcone: the top of the axis is white (V=1) and the bottom is black (V=0), representing the achromatic grayscale colors. Saturation (S) changes in the horizontal direction: the closer a color is to the central axis, the lower its saturation; at the center of the hexagon the saturation is zero (S=0), coinciding with the maximum brightness V=1, while the most saturated colors lie on the edge of the hexagon (S=1).
[0056] The color plane (H, S) is based on the x, y chromaticity plane of the CIE chromaticity diagram.
[0057] The lightness axis (V) is based on the luminance factor Y of the CIE three-dimensional color space.
[0058] 1.4. Having established the above color parameter ranges, select a large number of representative positive and negative samples and, by traversing them, extract the areas of the category-set colors in both taxi and non-taxi images; comparing these areas yields the threshold values that distinguish positive from negative samples. Since the positive and negative samples differ in size, the judgment window extracted from each sample must first be normalized in size, after which the area of each color category is extracted from the normalized window.
[0059] Table 2 lists the threshold values set for each color. Since the color judgment is the first stage of the entire vehicle identification, the thresholds are set close to the maximum color areas observed in the negative samples, so as to avoid missing positive samples.
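The area-threshold comparison of step 1.4 can be sketched as below. Pixel labels are assumed to come from the HSV boundary check already described; the threshold values are hypothetical stand-ins for Table 2:

```python
# Hypothetical per-colour area thresholds, expressed as fractions of the
# normalized judgment window. Placeholder values, not Table 2.
AREA_THRESHOLDS = {"orange": 0.12, "green": 0.12, "white": 0.35, "red": 0.12}

def color_areas(label_grid):
    """Compute the area fraction of each colour in a normalized window.

    label_grid: 2-D list of colour names, with None for pixels that match
    no category colour. Returns a dict colour -> fraction of window area.
    """
    total = sum(len(row) for row in label_grid)
    counts = {}
    for row in label_grid:
        for lbl in row:
            if lbl is not None:
                counts[lbl] = counts.get(lbl, 0) + 1
    return {c: n / total for c, n in counts.items()}

def passes_color_judgment(label_grid):
    """True if at least one colour's area exceeds its threshold."""
    areas = color_areas(label_grid)
    return any(areas.get(c, 0.0) > t for c, t in AREA_THRESHOLDS.items())
```

A window half-covered by orange pixels, for instance, passes because 0.5 exceeds the (hypothetical) orange threshold of 0.12.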
[0061] 1.5. Extract the HOG features of the normalized judgment window using the HOG algorithm, and use the HOG features to train the SVM classifier. The HOG algorithm and the SVM classifier are briefly described below:
[0062] The HOG algorithm can extract vehicle contour features fairly accurately even at low image definition, because it computes gradients over local cell units and therefore does not require high clarity. This lowers the requirements on the capture equipment and greatly reduces cost.
[0063] It also addresses a problem common in vehicle recognition: vehicle appearance differs under different lighting. Since the HOG method operates on local cell units of the image, it maintains good invariance to both geometric and photometric deformations, which increases the robustness of the recognition.
[0064] The HOG feature is a local region descriptor: it describes the vehicle's edges well by computing a histogram of gradient directions over a local region to form the vehicle's appearance feature, and it is insensitive to illumination changes and small shifts. The gradient at pixel (x, y) of the input image is given by
[0065] G_x(x, y) = H(x+1, y) − H(x−1, y)
[0066] G_y(x, y) = H(x, y+1) − H(x, y−1)
[0067] where G_x(x, y), G_y(x, y) and H(x, y) denote the horizontal gradient, the vertical gradient and the pixel value at pixel (x, y) of the input image, respectively. The gradient magnitude and gradient direction at pixel (x, y) are
[0068] G(x, y) = sqrt( G_x(x, y)^2 + G_y(x, y)^2 )
[0069] α(x, y) = tan^(−1)( G_y(x, y) / G_x(x, y) )
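The gradient formulas above can be sketched with centred differences, e.g. using NumPy. The function name is illustrative; H is the grayscale input image as a 2-D array:

```python
import numpy as np

def pixel_gradients(H):
    """Return (Gx, Gy, magnitude, direction) at every interior pixel.

    Gx(x, y) = H(x+1, y) - H(x-1, y)  (axis 1 is the x/column direction)
    Gy(x, y) = H(x, y+1) - H(x, y-1)  (axis 0 is the y/row direction)
    Border pixels, which lack a neighbour, are left at zero.
    """
    H = np.asarray(H, dtype=float)
    Gx = np.zeros_like(H)
    Gy = np.zeros_like(H)
    Gx[:, 1:-1] = H[:, 2:] - H[:, :-2]
    Gy[1:-1, :] = H[2:, :] - H[:-2, :]
    mag = np.sqrt(Gx ** 2 + Gy ** 2)
    # Orientation as tan^-1(Gy/Gx), folded into (-pi/2, pi/2] as in the text.
    ang = np.arctan2(Gy, Gx)
    ang = np.where(ang > np.pi / 2, ang - np.pi, ang)
    ang = np.where(ang <= -np.pi / 2, ang + np.pi, ang)
    return Gx, Gy, mag, ang
```

For a horizontal ramp image, the interior pixels get a purely horizontal gradient with direction zero, matching the formulas term by term.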
[0070] HOG feature extraction proceeds as follows. Divide the image into cells of 8 × 8 pixels, and divide the gradient direction range [−π/2, π/2] evenly into 9 bins. In each cell, compute a histogram of the gradient magnitudes of all pixels over the 9 direction bins, yielding a 9-dimensional feature vector (see Figure 1.7). Each group of 4 adjacent cells forms a block, and concatenating the 4 cell vectors gives the block's 36-dimensional feature vector. The block scans the sample image with a step of one cell, and finally the features of all blocks are concatenated to obtain the vehicle's feature vector. In the method of Dalal, all blocks have a fixed size, so the information obtained is limited and incomplete. In this embodiment of the invention, blocks of variable size are used to extract the HOG features, with aspect ratios of 1:1, 2:1, and 1:2 and sizes ranging from 16×16 to 64×128; each block is divided equally into 4 cells and moves with a step of 8 pixels, giving a total of 438 blocks. The HOG features in each block are normalized using the following formula.
[0071] V = v / ( ||v|| + ε )
[0072] where v is the vector to be normalized and ε prevents the denominator from being 0; ε = 0.05 in this embodiment. To improve computation speed, integral images are introduced when computing the HOG features: 9 integral histograms represent the gradient integral image of each pixel in the 9 gradient directions (which means the trilinear interpolation voting method cannot be used when discretizing the gradient directions). With the integral images, the histogram of any rectangular region can be computed quickly from the integral values at its four corners, which avoids the repeated computation caused by overlapping blocks and improves the computation speed.
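The integral-histogram trick described above can be sketched as follows: one integral image per gradient-direction bin, so the 9-bin histogram of any rectangle costs four corner lookups per bin. Function names are illustrative:

```python
import numpy as np

N_BINS = 9  # gradient-direction bins, as in the text

def build_integral_histograms(bin_idx, magnitude):
    """Build one integral image per direction bin.

    bin_idx:   2-D int array of per-pixel bin indices (0..8).
    magnitude: 2-D float array of per-pixel gradient magnitudes.
    Returns a (rows+1, cols+1, 9) array of cumulative sums, padded with a
    leading row/column of zeros so corner lookups need no special cases.
    """
    rows, cols = bin_idx.shape
    ih = np.zeros((rows + 1, cols + 1, N_BINS))
    for b in range(N_BINS):
        layer = np.where(bin_idx == b, magnitude, 0.0)
        ih[1:, 1:, b] = layer.cumsum(axis=0).cumsum(axis=1)
    return ih

def rect_histogram(ih, top, left, bottom, right):
    """9-bin histogram of rows [top, bottom) x cols [left, right),
    computed from the four corner values of each integral layer."""
    return (ih[bottom, right] - ih[top, right]
            - ih[bottom, left] + ih[top, left])
```

Once the integral images are built, every block histogram is O(1) regardless of block size, which is exactly why overlapping variable-size blocks stay cheap.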
[0073] Compared with other feature descriptors, the HOG algorithm has many advantages. First, since it operates on local cell units of the image, it maintains good invariance to both geometric and photometric deformations, which appear only over larger spatial regions.
[0074] The main idea of the SVM can be summarized in two points: (1) for linearly inseparable cases, a nonlinear mapping transforms the samples from the low-dimensional input space into a high-dimensional feature space in which they become linearly separable, making it possible to analyze the nonlinear characteristics of the samples with a linear algorithm in that feature space; (2) based on the theory of structural risk minimization, the SVM constructs the optimal separating hyperplane in the feature space, so that learning is globally optimized and the expected risk over the whole sample space satisfies a certain upper bound with a certain probability.
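Point (1) above can be illustrated with a toy sketch: XOR-style data that is not linearly separable in 2-D becomes separable after a simple nonlinear feature map. A plain perceptron stands in for the SVM solver here (an assumption for brevity; it merely demonstrates that a separating hyperplane now exists, without the margin maximization of a real SVM):

```python
import numpy as np

def feature_map(x):
    """Nonlinear map from 2-D input (x1, x2) to (x1, x2, x1*x2)."""
    return np.array([x[0], x[1], x[0] * x[1]])

def train_perceptron(X, y, epochs=100):
    """Find w, b with sign(w.x + b) == y for all samples, if separable."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified (or on the boundary)
                w += yi * xi
                b += yi
    return w, b

# XOR-style data: labels follow the sign of x1*x2, so no line in the
# 2-D input space separates the classes.
raw = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
labels = np.array([1, 1, -1, -1])

mapped = np.array([feature_map(x) for x in raw])
w, b = train_perceptron(mapped, labels)
predictions = np.sign(mapped @ w + b)
```

In the mapped 3-D space the hyperplane defined by the third coordinate separates the two classes, which is the essence of the kernel idea summarized above.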
[0075] Based on these principles, in this embodiment the judgment window is normalized and then transformed by the HOG algorithm, reducing a 100×100 image from 10,000 dimensions to 900 dimensions. After HOG feature extraction, the image features highlight the contour of the taxi dome light. The extracted features are then fed to the SVM for training, yielding the SVM vehicle classifier.
[0076] 2. Taxi identification. Taxi identification is based on vehicle tracking: all vehicles entering the monitoring area are tracked, and identification and judgment are performed only on the tracked target windows. Since vehicle tracking technology is familiar to those skilled in the art, its principles are not repeated here. Taxi recognition consists of the following steps:
[0077] 2.1. Color determination: a rough first classification of the vehicle by the taxi's color features.
[0078] Vehicle tracking is performed on the input sequence of video frames containing the vehicle to be recognized. When tracking succeeds, the tracking window containing the complete image of the candidate taxi in the first frame after successful tracking is normalized in size.
[0079] 2.2. Extract the area of each color category in the normalized tracking window and compare it with the corresponding area threshold. If the area of one or more color categories in the standard tracking window exceeds its threshold, the vehicle corresponding to the tracking window is considered to pass the color judgment and proceeds to the next step; otherwise it is discarded. Judging by color features effectively excludes a large number of private vehicles of other colors, and increases the effectiveness and real-time performance of taxi identification. The selection of the color parameter boundaries and the color judgment step can be understood in conjunction with Figure 4.
[0080] 2.3. Use the SVM classifier described above to identify the standard tracking windows that pass the color determination step. In practice, as the sample library grows, the rate of improvement of the SVM classifier's accuracy gradually slows. Therefore, to further improve the accuracy, the applicant proposes multi-window voting mechanisms in both the spatial domain and the time domain.
[0081] Spatial domain
[0082] In the applicant's repeated experiments it was found that even with the same SVM classifier and the same image frame, different placements of the standard tracking window change the background behind the judged vehicle and can, under certain conditions, change the judgment result. After slightly translating the standard tracking window several times, the applicant found that most of the resulting sub-windows favor a correct judgment, while a small number lead to wrong judgments because of a special background. In other words, the standard tracking window itself may also yield a wrong judgment because of a special background. Based on this observation, the multi-window voting mechanism in the spatial domain works as follows:
[0083] Translate the standard tracking window of an image frame by x pixels upward and downward to obtain two sub-windows, and by y pixels leftward and rightward to obtain another two sub-windows. Normalize the sizes of the 4 sub-windows, then input the 4 normalized sub-windows and the standard tracking window into the support vector machine classifier to obtain five recognition results. By the observation above, most of the five results based on the same image frame are obtained under conditions favoring correct judgment, i.e., they are correct. The five results are then voted on, and whichever of the two outcomes (taxi or not) receives the most votes wins, yielding the first revised identification result.
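The spatial-domain vote can be sketched as follows. The `classify` callable is a stand-in for the trained SVM classifier (an assumption for illustration), and windows are (left, top, right, bottom) tuples:

```python
def shifted_windows(win, x, y):
    """Return the standard window plus four translated copies:
    shifted x pixels up/down and y pixels left/right."""
    l, t, r, b = win
    return [
        (l, t, r, b),              # standard tracking window
        (l, t - x, r, b - x),      # shifted up
        (l, t + x, r, b + x),      # shifted down
        (l - y, t, r - y, b),      # shifted left
        (l + y, t, r + y, b),      # shifted right
    ]

def spatial_vote(win, x, y, classify):
    """Majority vote (True = taxi) over the five classifier results."""
    votes = [classify(w) for w in shifted_windows(win, x, y)]
    return sum(votes) > len(votes) // 2
```

With five voters a single background-induced misjudgment is outvoted, which is the mechanism's whole point.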
[0084] time domain
[0085] During vehicle operation there are inevitably factors unfavorable to judgment, such as occlusion, complex backgrounds, and reflections, which cause misjudgment. However, from studying a large number of videos the applicant found that, over a continuous period of a vehicle's operation, conditions favor correct judgment most of the time. Based on this observation, the multi-window voting mechanism in the time domain works as follows:
[0086] From the continuous image frames containing the vehicle to be recognized, select multiple frames. Based on the observation that conditions favor correct judgment most of the time during continuous operation, input the tracking windows containing the complete image of the same candidate taxi in these frames into the SVM classifier to obtain multiple recognition results, most of which are correct. After weighting these results and counting the votes, whichever of the two outcomes (taxi or not) receives the most votes wins, yielding the revised judgment result. The time-domain multi-window voting mechanism avoids the chance wrong identification results obtained under special conditions, and effectively increases the robustness of the judgment.
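The time-domain vote can be sketched in a few lines. Uniform weights are assumed here, since the text only states that per-frame results are weighted before counting:

```python
def temporal_vote(frame_results, weights=None):
    """Weighted majority vote over per-frame results (True = taxi).

    frame_results: list of booleans, one per sampled key frame.
    weights:       optional per-frame weights; uniform if omitted.
    """
    if weights is None:
        weights = [1.0] * len(frame_results)
    yes = sum(w for r, w in zip(frame_results, weights) if r)
    no = sum(w for r, w in zip(frame_results, weights) if not r)
    return yes > no
```

Combining this with the spatial vote, as described next, means each frame's entry is itself already a spatially corrected result.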
[0087] In fact, combining the spatial and time-domain multi-window voting mechanisms further improves the robustness of the judgment. Following this design idea, the vehicle identification step in this embodiment is implemented as follows:
[0088] After the color judgment is passed, key frames are sampled from the video starting at the first image frame (the key-frame sampling rate depends on the processing time of one frame). In this embodiment, one frame is taken every 0.2 seconds within one second, and the spatial-domain algorithm is applied to each of the five resulting frames (including the first) to obtain its spatially corrected result. The spatially corrected results of the 5 frames within that second are then voted on, i.e., time-domain processing, completing the joint spatio-temporal correction of the initial judgment. Note that 0.2 seconds and 1 second are merely examples; the actual interval should be long enough both to complete the processing of one frame and for the vehicle to produce a noticeable displacement. The number of frames processed in the time domain should be neither too small, which raises the error rate, nor too large, which raises the computational load.
[0089] In step 2.3, the window shifts x and y for the spatial-domain voting are chosen as follows: if the standard tracking window is a pixels wide and b pixels high, x ranges over 0.1a-0.2a and y over 0.1b-0.2b.
[0090] The above ranges of x and y are empirical values obtained by the applicant through extensive experiments. If the translation is too large, the sub-windows may fail to cover the target well; if it is too small, the translation has little effect.
[0091] 2.4. Interval correction recognition of subsequent frames
[0092] When the first identification step is completed, a vehicle identification result has been obtained. Once the vehicle has been in the monitoring area for some time, it no longer needs to be identified continuously, which reduces computation when traffic is heavy and slow. To further increase judgment accuracy and robustness, interval recognition and correction is applied to the subsequent video frames.
[0093] That is, after the first identification of a motor vehicle is completed, steps 2.1-2.3 are repeated periodically for as long as the vehicle remains successfully tracked in the monitoring area, and the result obtained in each cycle is used to correct the previous judgment result.
[0094] The specific implementation of the proposed identification method has been described in detail through the embodiment above. Below, the identification method of this embodiment is compared experimentally with existing identification methods based on the HOG algorithm and the SVM classifier:
[0095] Experimental design
[0096] Randomly select 30 positive and negative test samples from the test sample library, and randomly select 150, 250, 350, and 500 positive and negative samples from the sample library for training. Identification is performed with both the existing method and the method of this embodiment, and the detection results are compared and evaluated, yielding the following table.
[0097] Comparison of the accuracy of the original algorithm and the multi-angle judgment algorithm
[0099] Those of ordinary skill in the art will understand that, owing to space limitations, the above embodiment is only one exemplary embodiment of the present invention; other implementations based on the same principles also fall within the protection scope of the present invention.
[0100] It should be noted that the above is a further detailed description of the present invention in combination with specific embodiments, and the specific implementation of the present invention is not limited thereto. Guided by the above embodiments, those skilled in the art can make various improvements and modifications on the basis of the invention, and such improvements and modifications fall within the protection scope of the present invention.

