Effective vehicle logo recognition in real-world application using mapreduce based convolutional neural networks with a pre-training strategy

Abstract

Large amounts of data are generated by intelligent transportation systems (ITS) every day. These data exceed the storage and processing capacity of conventional systems and do not fit the structures of current databases, so efficient methods are needed to address these challenges. Vehicle logo recognition (VLR) is a significant application in ITS. VLR is difficult because geometric distortions and varied imaging conditions occur simultaneously, and traditional methods based on hand-crafted features have many limitations. Convolutional neural networks (CNNs) have been highly successful in many machine vision tasks. Inspired by their performance, we design and develop a novel distributed VLR system framework based on the Hadoop ecosystem and deep learning. We propose a MapReduce-based CNN, called MRCNN, to train the networks, which significantly increases training speed while reducing computational cost. Furthermore, unlike classical CNNs that start from a random initialization, we propose an approach called GABR, which combines genetic algorithm (GA) global optimization with Bayesian regularization to initialize the weights of the classifier; this helps prevent overfitting and avoid local optima. Compared with other algorithms, the proposed method performs best, and the good initial weights optimized by GABR increase recognition accuracy. The results show that the distributed system framework and the proposed algorithms are suitable for real-world VLR applications.

Keywords

MRCNN; vehicle logo recognition (VLR); Hadoop ecosystem; GA optimization; intelligent transportation system (ITS)

1 Introduction

The large amounts of data in intelligent transportation systems (ITS) make information processing challenging, and harvesting valuable knowledge and intelligence from ITS data is difficult [1]. Using substantial computing power to speed up the training process has therefore shown significant potential in ITS.

Deep learning refers to a set of machine learning techniques that learn multiple levels of representation in deep architectures [2–5]. Until recently, most deep learning algorithms were designed for a single machine. In the big data era this is no longer adequate: it is impractical to train a deep neural network on massive data using a single machine with a CPU or GPU and limited storage. Distributed architectures built on clusters of machines are therefore a better choice [6–8]. Distributed deep learning strategies have been developed for the forward and backward propagations, based on data parallelism, model parallelism, or both. Developing high performance computing (HPC) infrastructure is therefore important for building deep learning systems that scale to massive data [9, 10].

ITS is an important application area for machine vision, drawing on techniques such as computer vision and deep learning. In real applications, vehicle frontal-view images [11] are captured by traffic cameras at many locations, such as highways, checkpoints and intersections. Many features have been used on such images, including Sobel edges, direct normalized gradients, locally normalized gradients and Harris corners. Psyllos et al. [12] used SIFT features to recognize the logo, manufacturer and model of vehicles.

The vehicle logo is an important vehicle feature that provides ITS with useful information for categorizing vehicles [13]. It can be applied in highway monitoring or public security, for example in detecting illegal vehicles and incidents. VLR can also provide government and business with valuable statistics for intelligent city planning and other commercial uses.

It is therefore necessary to recognize vehicle logos automatically to meet practical processing demands. There are two main challenges: first, the large amounts of data generated by ITS must be stored and processed properly; second, recognition accuracy and efficiency must be increased for real-world applications.

Wang et al. [14] proposed a vehicle logo detection method using edge features. Sam and Tian [15] presented a VLR solution using Modest AdaBoost. SIFT features were used by Dlagnekov and Belongie [16] and by Psyllos et al. [17] to recognize the vehicle manufacturer. Yu et al. [18] designed a Bag-of-Words model for VLR in which SIFT is used to extract robust features. Dai et al. [19] applied SVM to the recognition of binary vehicle logo images.

However, these methods remain limited because they rely on hand-crafted features such as HOG and SIFT. Such features are not discriminative enough for vehicle logos, and they perform poorly because logo images are easily affected by geometric distortions.

The convolutional neural network (CNN), a deep learning method, automatically learns multiple stages of invariant features for a specific task and has been successful in a wide range of applications [20, 21]. Owing to its hierarchical learning structure, a CNN is strongly robust to geometric distortions such as shifts, scaling and inclination. Furthermore, unlike many traditional methods, a CNN learns features automatically without manual feature design, which makes it suitable for real-time applications.

CNNs have recently demonstrated excellent performance on various visual tasks, including the classification of two-dimensional images. CNNs were first introduced by Fukushima [22]. They have recently outperformed other conventional methods, and in some cases even human performance [23], on many vision-related tasks, including image classification [24], scene labeling [25], house number digit classification [26] and face recognition [27]. CNNs have also been shown to provide better classification performance than traditional SVM classifiers [28]. The hierarchical architecture of CNNs is increasingly recognized as one of the most efficient and successful ways to learn visual representations [29].

Huang et al. [13] proposed a CNN-based VLR system. However, both the CNN training and the system are deployed on a single machine, which, as described above, is not suitable for large numbers of samples. Distributed infrastructure and parallel computation are preferable for addressing this problem.

Although CNNs achieve excellent performance in VLR, CNN training is time consuming, which is inadequate for real-world applications [30–32]. In addition, VLR systems need to be retrained with frequently updated training samples. To address this problem, we propose a distributed VLR system and the MRCNN training algorithm, which dramatically increases the speed of CNN training. The distributed VLR system we designed and developed is also capable of handling the data volumes of the big data era.

On the other hand, training the fully connected layers may suffer from overfitting and local minima [33, 34]. To address this problem, Bayesian regularization and genetic algorithm (GA) global optimization are combined in the training procedure. GA is valuable because of its strong ability to avoid local optima, so we use GA global optimization to initialize the weights of the classifier in the CNN. Combining the neural network with GA increases recognition accuracy through good initial weights optimized by GA. A Bayesian approach can potentially avoid the above pitfalls in training neural networks [35]. The Bayesian framework not only infers hyperparameters automatically by marginalizing them out of the posterior distribution, but also naturally accounts for uncertainty in parameter estimates and propagates this uncertainty to predictions. Furthermore, Bayesian techniques are often more robust to overfitting because they average over parameter values rather than choosing a single point estimate.

The major contributions of this paper are as follows.

A distributed MapReduce-based CNN training algorithm (MRCNN) is designed. Unlike other CNN methods deployed on a single machine, the proposed distributed system significantly reduces training cost while increasing processing efficiency.

The proposed distributed framework satisfies the requirements of real-world logo recognition tasks, such as handling frequently updated training samples generated by monitors. It also improves storage capacity and system scalability.

Compared with traditional methods that use hand-crafted features, the developed architecture extracts features automatically, providing better recognition accuracy and overall performance. Moreover, unlike classical CNNs, which suffer from overfitting when randomly initialized, a pre-training strategy called GABR is proposed to further increase the flexibility, robustness and recognition accuracy of the CNN model.

The rest of the paper is organized as follows. Section 2 details the distributed system. Section 3 presents the experiments and comparisons. Section 4 concludes the paper.

2 Framework of distributed VLR system

The distributed VLR system is illustrated in Fig. 1. It consists of two stages: offline training and online recognition. In the offline training stage, the feature extractor and the classifier are trained on the training samples. In the online recognition stage, the vehicle logo is segmented from new images and sent to the trained framework for recognition.

Fig.1

Overview of distributed VLR framework.

2.1 Hadoop ecosystem and distributed system for data processing

Hadoop has attracted substantial attention from both industry and academia. Instead of relying on expensive hardware to store and process data, Hadoop enables distributed processing of big data on large clusters of commodity servers. The Apache Hadoop framework has grown into an ecosystem with many advantages and is particularly suitable for data management and analysis. Hadoop also allows the hardware infrastructure to be scaled up and down easily to accommodate hardware changes.

MapReduce has attracted considerable attention as a powerful tool for building large-scale data processing applications. The Hadoop distributed file system (HDFS) is a distributed file system that can store large numbers of files on a cluster. It exploits data locality by moving computation to the data nodes rather than moving data to the computation nodes. We store the training image data in HDFS.

2.2 Mapreduce based CNN

Because CNN training is time consuming, we propose the MRCNN training algorithm to dramatically increase the speed of CNN training.

As shown in Fig. 2, we propose a framework supporting data parallelism, where multiple replicas of the same CNN model are utilized to optimize a single objective.

Fig.2

Architecture of Mapreduce based CNN.

We employ a set of CNN model replicas that simultaneously address a single optimization problem, and we use a centralized parameter server through which the model replicas update and share their parameters. The approach also takes advantage of distributed computation within each individual replica.

In this sense, the optimization algorithm implements data parallelism using HDFS and MapReduce: training samples are processed simultaneously in each CNN model replica, and the replicas' outputs are periodically combined to optimize the objective function.

In the parallelized implementation of the optimization algorithm, the training data are distributed across many machines, and each machine computes the gradient on a specific subset of the training samples. The computed gradients are sent back to the central server, and the results are stored on HDFS for further processing.

The MRCNN is implemented as follows:

MRCNN algorithm for training the distributed CNN framework
Procedure 1 Mapper()
Input: <Class ID, training sample>.
Output: <weight w, local Δw>.
(1) Load the parameters and initialize the network;
(2) Read each training sample and output the <key, value> pairs as <Class ID, training sample>;
(3) Feed forward, using the value as the input of the network;
(4) Back-propagate and output the local Δw;
(5) Output the <key, value> pairs as <weight w, local Δw>.
Procedure 2 Reducer()
Input: <weight w, local Δw>.
Output: <weight w, global Δw>.
(1) Reduce by key and output the global Δw;
(2) Output the <key, value> pairs to HDFS as <weight w, global Δw>.
Procedure 3 Main()
Input: <training samples, parameters of network>.
Output: <weight file>.
While (the precision of the network is below the expected precision)
(1) Run the job with Mapper() in Procedure 1;
(2) Execute Reducer() in Procedure 2;
(3) Reduce by key and compute the output value of each reducer. Do a batch update on the weights of the network and write them to the weight file.
End While
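The Mapper/Reducer/Main loop above can be simulated in a single process to make the data flow concrete. The sketch below is a minimal illustration under stated assumptions, not the MRCNN implementation: a linear least-squares model stands in for the CNN, Python lists stand in for HDFS input splits, the reducer averages the local gradients into the global Δw, and a mean-squared-error threshold plays the role of the "expected precision". All function names (`predict`, `mapper`, `reducer`, `train`) are hypothetical.

```python
import random
from collections import defaultdict


def predict(w, x):
    """Linear model standing in for the CNN (hypothetical); w[-1] is the bias."""
    return sum(wk * xk for wk, xk in zip(w, x)) + w[-1]


def mapper(w, shard):
    """Procedure 1: feed forward, back-propagate, emit <weight index, local gradient>."""
    for x, t in shard:
        err = predict(w, x) - t                 # feed-forward and output error
        grads = [err * xk for xk in x] + [err]  # gradient of err**2 / 2 w.r.t. each weight
        for k, g in enumerate(grads):
            yield k, g


def reducer(pairs, n_samples):
    """Procedure 2: group pairs by weight index and average them into the global gradient."""
    totals = defaultdict(float)
    for k, g in pairs:
        totals[k] += g
    return {k: v / n_samples for k, v in totals.items()}


def mse(w, data):
    return sum((predict(w, x) - t) ** 2 for x, t in data) / len(data)


def train(data, n_shards=3, lr=0.3, target_mse=1e-4, max_iters=5000):
    """Procedure 3: repeat map, reduce and batch update until the target error is met."""
    w = [0.0] * (len(data[0][0]) + 1)
    shards = [data[i::n_shards] for i in range(n_shards)]  # stand-in for HDFS input splits
    for _ in range(max_iters):
        if mse(w, data) <= target_mse:  # plays the role of the "expected precision"
            break
        pairs = [p for shard in shards for p in mapper(w, shard)]  # map phase
        global_grad = reducer(pairs, len(data))                    # reduce phase
        w = [wk - lr * global_grad[k] for k, wk in enumerate(w)]   # batch weight update
    return w
```

Because every mapper uses the same weights and the reducer averages over all samples, the update equals full-batch gradient descent on one machine; the distribution only changes where the per-sample gradients are computed. For instance, fitting t = 2x₀ − 3x₁ + 0.5 on 60 random points recovers weights close to [2, −3, 0.5].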

2.3 Data pre-processing

All data generated by monitors are stored in HDFS, as shown in Fig. 3.

Fig.3

The image originally captured from the monitoring system.

The vehicle logo detection method follows [13]; Fig. 4 illustrates the approach.

Fig.4

Vehicle logo detection method.

First, the vehicle license plate is identified with the LPL module. Then, the area inside the blue box above the license plate is segmented, as shown in Fig. 5a. The segmented image is 100×100 pixels so that it can accommodate more types of logos, including wide logos such as Audi's, as shown in Fig. 5b. The images in Fig. 5 are examples of the samples used in the training and test sets.

Fig.5

Segmented vehicle logo. (a) logo with common size, (b) logo with wide size.

2.4 Architecture of CNN

A CNN consists mainly of two parts: a feature extractor and a classifier. The feature extractor begins with two alternating types of layers, convolutional and downsampling layers, and this sequence of convolution and downsampling can be repeated many times.

Each convolution is followed by a nonlinear activation, producing several feature maps. The outputs of the last downsampling layer are arranged into a feature vector, which is then sent to the classifier.

The nonlinear activation of a convolutional layer is given by

$y_{j}^{(l)} = f \left( \sum_{i} M_{ij} \otimes x_{i}^{(l - 1)} + b_{j} \right)$ (1)

where $y_{j}^{(l)}$ is the j-th output feature map of the l-th convolutional layer C_l and f(·) is a nonlinear function. $M_{ij}$ is a filter convolved with the feature map $x_{i}^{(l - 1)}$ from the previous layer, producing a new feature map in the current layer. The symbol ⊗ denotes convolution and $b_{j}$ is a bias. The downsampling layer then reduces the spatial resolution of each feature map to provide invariance to distortions.
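Eq. (1) and the downsampling step can be written out directly. The pure-Python sketch below assumes stride-1 "valid" convolution, tanh as the nonlinearity f, and non-overlapping average pooling for downsampling; the paper does not specify these choices, and the function names are hypothetical. The kernel is flipped so that the code computes the mathematical convolution ⊗ (many deep learning libraries actually compute cross-correlation instead).

```python
import math


def conv2d_valid(x, k):
    """2-D 'valid' convolution with a flipped kernel; output is (H-kh+1) x (W-kw+1)."""
    kh, kw = len(k), len(k[0])
    rows, cols = len(x) - kh + 1, len(x[0]) - kw + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    s += x[r + i][c + j] * k[kh - 1 - i][kw - 1 - j]
            row.append(s)
        out.append(row)
    return out


def conv_layer(inputs, filters, biases, f=math.tanh):
    """Eq. (1): y_j = f(sum_i M_ij (*) x_i + b_j); filters[i][j] is the kernel M_ij."""
    outputs = []
    for j, b in enumerate(biases):
        acc = None
        for i, x in enumerate(inputs):
            z = conv2d_valid(x, filters[i][j])
            acc = z if acc is None else [[a + v for a, v in zip(ra, rz)]
                                         for ra, rz in zip(acc, z)]
        outputs.append([[f(v + b) for v in row] for row in acc])
    return outputs


def avg_pool(x, p=2):
    """Non-overlapping p x p average pooling (one common downsampling choice)."""
    return [[sum(x[r + i][c + j] for i in range(p) for j in range(p)) / (p * p)
             for c in range(0, len(x[0]) - p + 1, p)]
            for r in range(0, len(x) - p + 1, p)]
```

With a 2×2 all-ones kernel, zero bias and the identity as f, each output value is simply the sum of a 2×2 window, which makes the layer easy to check by hand.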

2.5 Learning the weights by Bayesian regularization

One weakness of fully connected layers is their tendency to overfit the training data strongly [36, 37]. A CNN is composed of alternating convolutional and pooling layers. The convolutional layers extract patterns from local regions of the input image by convolving a filter over its pixels. After the inner product of the filter is computed at every location in the image, a feature map is constructed for each filter in the layer.

The objective function F is given by

$F = β E_{D} + α E_{W}$ (2)

where α and β are the hyperparameters of the objective function, E_D is the mean squared error over the n samples and E_W is the mean squared weight over the m weights:

$E_{D} = \frac{1}{n} \sum_{i = 1}^{n} e (i)^{2} = \frac{1}{n} \sum_{i = 1}^{n} (t (i) - a (i))^{2}$ (3)

$E_{W} = \frac{1}{m} \sum_{j = 1}^{m} w (j)^{2}$ (4)

Here t(i) is the target output, a(i) is the network output, e(i) is the error, n is the number of samples and m is the number of weights w(j). The posterior distribution of the weights follows from Bayes' rule:

$p (W | D, α, β, H) = \frac{p (D | W, β, H) \, p (W | α, H)}{p (D | α, β, H)}$ (5)

where H is the neural network model and W is its weight vector. p(W | α, H) is the prior distribution, p(D | W, β, H) is the likelihood function, and p(D | α, β, H) is a normalization factor that does not depend on W.
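As a worked instance of Eqs. (2) to (4), the objective can be computed directly from targets, network outputs and weights. This small sketch follows the averaged forms of E_D and E_W exactly as written above; the function name `objective` is hypothetical.

```python
def objective(targets, outputs, weights, alpha, beta):
    """F = beta * E_D + alpha * E_W, with E_D and E_W averaged as in Eqs. (3)-(4)."""
    n, m = len(targets), len(weights)
    e_d = sum((t - a) ** 2 for t, a in zip(targets, outputs)) / n  # Eq. (3)
    e_w = sum(w ** 2 for w in weights) / m                          # Eq. (4)
    return beta * e_d + alpha * e_w                                 # Eq. (2)


# Two samples with errors 0.5 and -0.5 give E_D = 0.25; weights (1, -1, 2) give E_W = 2.0,
# so F = 2.0 * 0.25 + 0.1 * 2.0 = 0.7 (up to floating-point rounding).
print(objective([1.0, 0.0], [0.5, 0.5], [1.0, -1.0, 2.0], alpha=0.1, beta=2.0))
```

Increasing α relative to β shifts the balance of F toward small weights, which is the regularizing effect that the Bayesian procedure tunes automatically.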

Likelihood: A network with architecture H and weights W can be regarded as making predictions about the target output according to the probability distribution

$p (D | W, β, H) = \frac{1}{Z_{D} (β)} \exp (- β E_{D})$ (6)

where Z_D(β) is given by

$Z_{D} (β) = {(\frac{π}{β})}^{\frac{n}{2}}$ (7)

Prior: The prior probability of W can be given by

$p (W | α, H) = \frac{1}{Z_{W} (α)} \exp (- α E_{W})$ (8)

where Z_W(α) is given by

$Z_{W} (α) = {(\frac{π}{α})}^{\frac{m}{2}}$ (9)

The posterior probability of W is then given by

$p (W | D, α, β, H) = \frac{\frac{1}{Z_{W} (α)} \frac{1}{Z_{D} (β)} \exp (- (β E_{D} + α E_{W}))}{p (D | α, β, H)} = \frac{1}{Z_{F} (α, β)} \exp (- F (W))$ (10)
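Comparing the two forms of Eq. (10) makes the normalization constant explicit. Since exp(−F) = exp(−βE_D) exp(−αE_W), matching the two expressions gives the following worked step (a standard identity in the evidence framework of [35], written out here with the symbols defined above):

$$Z_{F}(α, β) = Z_{W}(α) \, Z_{D}(β) \, p(D | α, β, H), \qquad p(D | α, β, H) = \frac{Z_{F}(α, β)}{Z_{W}(α) \, Z_{D}(β)}$$

The second form expresses the evidence for α and β through normalization constants; maximizing this evidence with respect to α and β is what leads to the re-estimation formulas for α and β [35, 38].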

According to [38], the values for α and β can be computed as follows:

$α_{MP} = \frac{γ}{2 E_{W} (W_{MP})}$ (11)

$β_{MP} = \frac{n - γ}{2 E_{D} (W_{MP})}$ (12)

where W_MP denotes the most probable weights (the minimizer of F), $γ = m - 2 α_{MP} \, \mathrm{tr} \left( \nabla^{2} F (W_{MP}) \right)^{-1}$ is the effective number of parameters, n is the number of training samples and m is the total number of parameters in the network.

Then, Bayesian regularization procedure can be summarized as follows:

Algorithm: Bayesian regularization procedure
(1) Initialize α, β and the weights;
(2) Train the networks to minimize the objective function F(W);
(3) Calculate γ using the Gauss-Newton approximation to the Hessian matrix;
(4) Calculate new estimates of α and β for the objective function;
(5) Iterate from (2) to (4) until convergence.
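Steps (1) to (5) can be illustrated on a model simple enough that each step has a closed form. The sketch below is an assumption-laden stand-in, not the authors' code: it uses a linear model y = Xw (for which the Gauss-Newton Hessian is exact), assumes NumPy is available, and uses the unnormalized sums E_D = Σe² and E_W = Σw², because that is the convention of MacKay [35] and Foresee and Hagan [38] under which Eqs. (7), (9), (11) and (12) hold. The function name is hypothetical.

```python
import numpy as np


def bayesian_regularization(X, t, alpha=0.01, beta=1.0, max_iter=100, tol=1e-8):
    """Steps (1)-(5) for a linear model y = X @ w (stand-in for the classifier)."""
    n, m = X.shape
    JtJ = X.T @ X  # the Jacobian of the outputs with respect to w is X itself
    for _ in range(max_iter):
        # (2) minimise F(w) = beta*E_D + alpha*E_W; closed form for a linear model
        H = 2.0 * beta * JtJ + 2.0 * alpha * np.eye(m)  # Hessian of F (Gauss-Newton is exact here)
        w = np.linalg.solve(H, 2.0 * beta * (X.T @ t))
        e_d = float(np.sum((t - X @ w) ** 2))
        e_w = float(np.sum(w ** 2))
        # (3) effective number of parameters
        gamma = m - 2.0 * alpha * np.trace(np.linalg.inv(H))
        # (4) re-estimate alpha and beta with Eqs. (11)-(12)
        new_alpha = gamma / (2.0 * e_w)
        new_beta = (n - gamma) / (2.0 * e_d)
        done = (abs(new_alpha - alpha) <= tol * new_alpha
                and abs(new_beta - beta) <= tol * new_beta)
        alpha, beta = new_alpha, new_beta
        if done:  # (5) stop once the hyperparameters have converged
            break
    return w, alpha, beta, gamma
```

With Gaussian noise of standard deviation σ, β should settle near 1/(2σ²) under this convention, because exp(−βΣe²) is a Gaussian likelihood with variance 1/(2β); γ approaches m when all weights are well determined by the data.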

2.6 Weight initialization by GA global optimization

Back-propagation training of a neural network starts from a random initialization and can therefore converge to a local optimum. GA helps avoid local optima, and properly initialized parameters make the optimization process more effective. We use GA global optimization to initialize the weights of the classifier in the CNN, which increases recognition accuracy through good initial weights optimized by GA.

GAs are inspired by biological evolution, in which new and better populations develop over generations. Strong individuals selected for reproduction have more opportunities to pass their genes to the next generation, while weak and unfit individuals face extinction through natural selection. A GA has two operators that generate new solutions from existing ones: crossover and mutation. Through crossover, genes of good chromosomes are expected to appear more frequently in the population, leading to an overall good solution.

The mutation operator introduces random changes into chromosomes and plays a critical role in GA: it allows global search of the design space and helps prevent the algorithm from getting trapped in local minima.

Let n₁ be the length of the 1-D feature vector produced by the CNN feature extractor, n₂ the number of neurons in the hidden layer, and n₃ the number of neurons in the output layer, one per vehicle class. Each chromosome encodes all weights and biases of the classifier, so its length is n₁ × n₂ + n₂ + n₂ × n₃ + n₃. We take 1/F as the fitness function, where F = βE_D + αE_W is the objective in Eq. (2). After repeated selection, crossover and mutation, the best individual is found and decoded into the weights of each layer of the neural network. The algorithm is described in Fig. 6.
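The GA initialization can be sketched in a few dozen lines. This is a minimal, hypothetical illustration rather than the authors' implementation: the paper does not specify the encoding or the operators, so the sketch assumes real-valued genes, size-3 tournament selection, one-point crossover, Gaussian mutation with elitism, and a classifier with a tanh hidden layer and linear outputs. The chromosome layout and the fitness 1/F follow the text above; for vector targets, E_D sums the squared errors over output units before averaging over samples (also an assumption).

```python
import math
import random


def chromosome_length(n1, n2, n3):
    """Genes: hidden weights + hidden biases + output weights + output biases."""
    return n1 * n2 + n2 + n2 * n3 + n3


def decode(chrom, n1, n2, n3):
    """Split a flat chromosome into W1 (n2 x n1), b1 (n2), W2 (n3 x n2), b2 (n3)."""
    genes = iter(chrom)
    W1 = [[next(genes) for _ in range(n1)] for _ in range(n2)]
    b1 = [next(genes) for _ in range(n2)]
    W2 = [[next(genes) for _ in range(n2)] for _ in range(n3)]
    b2 = [next(genes) for _ in range(n3)]
    return W1, b1, W2, b2


def forward(params, x):
    """Classifier forward pass: tanh hidden layer, linear output layer (assumed)."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(W2, b2)]


def fitness(chrom, data, dims, alpha, beta):
    """Fitness = 1 / F with F = beta * E_D + alpha * E_W (Eqs. (2)-(4))."""
    params = decode(chrom, *dims)
    e_d = sum(sum((t - a) ** 2 for t, a in zip(target, forward(params, x)))
              for x, target in data) / len(data)
    e_w = sum(g * g for g in chrom) / len(chrom)
    return 1.0 / (beta * e_d + alpha * e_w)


def ga_init(data, dims, pop_size=30, generations=40, p_mut=0.1, sigma=0.3,
            alpha=0.01, beta=1.0, seed=0):
    """Evolve classifier weights; returns (decoded best weights, best fitness, history)."""
    rng = random.Random(seed)
    length = chromosome_length(*dims)

    def fit(c):
        return fitness(c, data, dims, alpha, beta)

    pop = [[rng.uniform(-1.0, 1.0) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        history.append(fit(ranked[0]))
        children = [ranked[0][:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            # Selection: two size-3 tournaments pick the parents.
            p1 = max(rng.sample(ranked, 3), key=fit)
            p2 = max(rng.sample(ranked, 3), key=fit)
            # Crossover: one cut point strictly inside the chromosome.
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            # Mutation: each gene receives Gaussian noise with probability p_mut.
            child = [g + rng.gauss(0.0, sigma) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fit)
    return decode(best, *dims), fit(best), history
```

Elitism guarantees that the best fitness never decreases from one generation to the next; the decoded best weights would then serve as the starting point for back-propagation training of the classifier.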

Fig.6

Weights initialization by GA global optimization.

3 Experiment and analysis

Our experiments were performed on a cluster consisting of 1 master and 3 slaves. The master was configured with 4 CPUs at 2.0 GHz each and 4 GB of RAM. Each slave was configured with 4 CPUs at 2.0 GHz each, 8 GB of RAM and 500 GB of disk space. All experimental data are stored in HDFS with a replication factor of 3.

3.1 Data sets description

After the data pre-processing described above, 9000 logo images from 9 manufacturers are generated; each manufacturer has 1000 images of 100×100 pixels. The parameters of the CNN in the distributed VLR system are shown in Table 1.

Table 1
Architecture and construction of CNN

Layer	Definition	Feature maps	Kernel size
0	Image input (100×100 pixels)	1	–
1	Convolutional layer C1	6	9×9
2	Pooling layer S1	6	2
3	Convolutional layer C2	12	31×31
4	Pooling layer S2	12	2
5	Classifier: vector input	–	–
6	Hidden layer	–	–
7	Output layer	–	–
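Table 1 does not list the length n₁ of the feature vector passed to the classifier, but it can be derived from the layer sizes if one assumes stride-1 "valid" convolutions and non-overlapping 2×2 pooling. These are assumptions: the paper does not state padding or stride. Under them, the 100×100 input shrinks to 92, 46, 16 and finally 8 pixels per side, so n₁ = 12 × 8 × 8 = 768. The function name below is hypothetical.

```python
def feature_vector_length(input_size=100,
                          layers=(("conv", 9, 6), ("pool", 2, 6),
                                  ("conv", 31, 12), ("pool", 2, 12))):
    """Propagate the spatial size through Table 1; returns (side, maps, n1)."""
    side, maps = input_size, 1
    for kind, k, out_maps in layers:
        if kind == "conv":
            side = side - k + 1  # 'valid' convolution with stride 1 (assumed)
        else:
            if side % k:
                raise ValueError("pooling window does not tile the feature map")
            side = side // k     # non-overlapping k x k pooling (assumed)
        maps = out_maps
        print(f"{kind}: {maps} maps of {side}x{side}")
    return side, maps, side * side * maps


feature_vector_length()
```

That each pooling step divides the map size evenly (92 → 46 and 16 → 8) is consistent with these assumptions, but it does not confirm them.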

Of these images, 7200 are used as training samples and the remaining 1800 as test samples: for each vehicle type, 800 images are used for training and the other 200 for testing.
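The per-class split can be reproduced with a stratified split. The paper does not say how the 800 training images per manufacturer were chosen; this sketch assumes a seeded random shuffle within each class, and the function name `stratified_split` is hypothetical.

```python
import random


def stratified_split(samples_by_class, n_train=800, seed=0):
    """Shuffle each class independently; the first n_train samples go to training."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(samples_by_class):
        items = list(samples_by_class[label])
        rng.shuffle(items)
        train.extend((x, label) for x in items[:n_train])
        test.extend((x, label) for x in items[n_train:])
    return train, test
```

Splitting within each class keeps the test set balanced at 200 images per manufacturer, which is what allows the per-class rows of Tables 2 to 4 to be compared directly.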

3.2 Experimental results

We compare the classification accuracy of CNN, CNN with GABR, and SIFT with SVM. SIFT with SVM is a traditional machine learning approach based on hand-crafted features. CNN is a deep learning architecture that learns feature representations automatically, starting from a random initialization. The proposed method is the CNN framework with the GABR pre-training strategy. The three methods are applied to the data sets, and the resulting VLR precision is shown in Tables 2, 3 and 4.

Table 2
VLR accuracy by CNN+GABR

Vehicle type	Correct	False	Precision
Toyota	185	15	92.50%
Audi	191	9	95.50%
Honda	188	12	94.00%
VW	187	13	93.50%
Citroen	192	8	96.00%
Hyundai	195	5	97.50%
Peugeot	190	10	95.00%
Ford	191	9	95.50%
Buick	183	17	91.50%
Average	1702	98	94.56%

Table 3

VLR accuracy by CNN

Vehicle type	Correct	False	Precision
Toyota	179	21	89.50%
Audi	186	14	93.00%
Honda	182	18	91.00%
VW	180	20	90.00%
Citroen	183	17	91.50%
Hyundai	189	11	94.50%
Peugeot	184	16	92.00%
Ford	188	12	94.00%
Buick	178	22	89.00%
Average	1649	151	91.61%

Table 4

VLR accuracy by SIFT+SVM

Vehicle type	Correct	False	Precision
Toyota	171	29	85.50%
Audi	175	25	87.50%
Honda	173	27	86.50%
VW	177	23	88.50%
Citroen	180	20	90.00%
Hyundai	182	18	91.00%
Peugeot	174	26	87.00%
Ford	173	27	86.50%
Buick	172	28	86.00%
Average	1577	223	87.61%

As shown in Tables 3 and 4, VLR by CNN is more accurate than the traditional machine learning approach of SIFT with SVM, which indicates that the feature representation learned by the CNN is more powerful than hand-crafted features. According to Tables 2 and 3, the weights optimized by GABR increase the VLR accuracy of the CNN, which shows that proper weight initialization is important and can improve CNN performance. The comparison of the three methods is shown in Fig. 7.
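The Average rows of Tables 2 to 4 can be checked from the per-class counts. Because every manufacturer has exactly 200 test images, the overall rate (total correct over 1800) equals the mean of the per-class rates. The snippet below copies the Correct columns from the tables; the names `TABLES` and `summarize` are illustrative only.

```python
# Correct counts per manufacturer (200 test images each), copied from Tables 2-4.
TABLES = {
    "CNN+GABR": [185, 191, 188, 187, 192, 195, 190, 191, 183],
    "CNN": [179, 186, 182, 180, 183, 189, 184, 188, 178],
    "SIFT+SVM": [171, 175, 173, 177, 180, 182, 174, 173, 172],
}
PER_CLASS = 200


def summarize(correct):
    """Return (total correct, total false, overall %, mean of per-class %)."""
    total = sum(correct)
    n = PER_CLASS * len(correct)
    overall = 100.0 * total / n
    per_class_mean = sum(100.0 * c / PER_CLASS for c in correct) / len(correct)
    return total, n - total, round(overall, 2), round(per_class_mean, 2)


for name, correct in TABLES.items():
    print(name, summarize(correct))
# CNN+GABR (1702, 98, 94.56, 94.56)
# CNN (1649, 151, 91.61, 91.61)
# SIFT+SVM (1577, 223, 87.61, 87.61)
```

The counts reproduce the Average rows exactly; in terms of errors, GABR reduces the number of misrecognized test images from 151 to 98.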

Fig.7

Comparison of accuracy in VLR system by CNN+GABR, CNN and SIFT+SVM.

In addition, unlike traditional CNN methods that focus on VLR accuracy without considering computational cost and system scalability, the proposed distributed framework and MRCNN reduce computational cost while increasing training efficiency. Table 5 compares the computational cost of CNN and distributed CNN.

Table 5

Comparison of computational cost

Training methods	Time
CNN	1.4 hours
Distributed CNN	28 minutes

As shown in Table 5, distributed CNN reduces training time from 1.4 hours (84 minutes) to 28 minutes, a threefold speedup, and thus lowers computational cost. It is therefore better suited to real-world logo recognition tasks, such as handling frequently updated training samples generated by monitors. The MapReduce-based framework is also efficient and effective in large-scale data environments.
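The speedup in Table 5 can be expressed as a ratio of training times and, tentatively, as a per-worker efficiency. The efficiency figure assumes the single-machine baseline ran on hardware comparable to one slave node, which the paper does not state; treat it as a rough indication only.

```python
# Training times from Table 5, converted to minutes.
single_machine_min = 1.4 * 60  # CNN on a single machine
distributed_min = 28.0         # distributed CNN (MRCNN)
slaves = 3                     # worker nodes in the cluster described in Section 3

speedup = single_machine_min / distributed_min
efficiency = speedup / slaves  # meaningful only if the hardware is comparable (assumed)
print(f"speedup = {speedup:.1f}x, per-worker efficiency = {efficiency:.2f}")
```

Under that assumption, a threefold speedup on three workers would correspond to near-linear scaling.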

We also applied our algorithms to the extended data set from [17, 39]. As shown in Table 6, CNN performs better than the traditional approaches. However, the computational cost of single-machine CNN makes it unsuitable for real-world applications, so distributed CNN is the more suitable choice.

Table 6

Comparison with other methods

Method	Recognition precision	Automatic feature learning	Computation & system scalability
MFM [17]	93.2%	No	No
MSIFT [39]	94.5%	No	No
CNN	96.1%	Yes	No
Distributed CNN	98.7%	Yes	Yes

3.3 Robustness evaluation

To evaluate the robustness of the CNN model and the pre-training strategy, we compute the accuracy using different percentages of the training samples, with and without pre-training. In addition, because training a traditional CNN faces limitations such as overfitting and local minima, we include another technique, weight decay, for comparison.

The results, reported in Table 7 and Fig. 8, show that the pre-training strategy helps the traditional CNN achieve better classification accuracy.

Table 7
Average accuracy with different numbers of convolutional layers

CNN model	CNN	CNN+weight decay	CNN+GABR
Convolutional layer C1	0.81	0.89	0.91
Convolutional layers C1 + C2	0.92	0.94	0.95

Fig.8

Robustness evaluation of the CNN model with different percentages of training samples.

Compared with weight decay, the proposed method is more robust, even with only 10% of the training samples. This shows that the CNN model must control its parameters properly when trained on training sets of varying size: the balance between model complexity and data set size must be properly controlled.

These observations demonstrate that the proposed method is more robust because it adapts automatically to varied training samples and different model structures. It helps the CNN model control its parameters adaptively according to the scale of the training set, which makes it more suitable for handling frequently updated training samples.

4 Conclusions

Previous CNN methods mainly focused on VLR accuracy without considering computational cost and system scalability. In contrast, the proposed distributed system framework is better suited to real-world applications. Unlike other CNN methods deployed on a single machine, the proposed MRCNN reduces computational cost while increasing training efficiency. In addition, we propose the GABR approach for initializing the weights of the classifier to prevent overfitting and avoid local optima. Compared with the traditional CNN approach with random initialization, the results show that well-initialized weights are important to CNN performance, and the proposed method is more flexible and robust and achieves higher recognition accuracy. The distributed system framework and proposed algorithms are suitable for the VLR tasks of ITS. Furthermore, because the distributed system is scalable, it can easily be scaled up and down, for example by adding more machines to accommodate larger data.

Conflict of interest

The authors declare that they have no conflict of interest.

Acknowledgments

This work is supported by National Natural Science Foundation of China under Grant U1435220.

References

1. Chen X.W. and Lin X.G., Big data and deep learning: Challenges and perspectives, IEEE Access 2 (2014), 514–525.

2. Dahl G.E., Yu, Deng and Acero, Context-dependent pretrained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing 20(1) (2012), 30–41.

3. Cireşan, Meier, Gambardella and Schmidhuber, Deep, big, simple neural nets for handwritten digit recognition, Neural Computation 22(12) (2010), 3207–3220.

4. Raina, Madhavan and Ng, Large-scale deep unsupervised learning using graphics processors, In Proceedings of the 26th International Conference on Machine Learning (2009), 873–880.

5. Zhang and Chen, Large-scale deep belief nets with MapReduce, IEEE Access 2 (2014), 395–403.

6. Chen and Huo, Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering, In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (2016), 5880–5884.

7. Gupta, Zhang and Milthorpe, Model accuracy and runtime tradeoff in distributed deep learning, arXiv preprint, arXiv:1509.04210, 2015.

8. Iandola F.N., Ashraf, Moskewicz M.W. and Keutzer K., FireCaffe: Near-linear acceleration of deep neural network training on compute clusters, arXiv preprint, arXiv:1511.00175, 2015.

9. Odena, Faster asynchronous SGD, arXiv preprint, arXiv:1601.04033, 2016.

10. … and Chen, Experiments on parallel training of deep neural network using model averaging, arXiv preprint, arXiv:1507.01239, 2015.

11. Zheng, Zhao and Wang, An efficient method of license plate location, Pattern Recognition Letters 26(15) (2005), 2431–2438.

12. Psyllos, Anagnostopoulos C.N. and Kayafas, Vehicle model recognition from frontal view image measurements, Computer Standards & Interfaces 33(2) (2011), 142–151.

13. Huang, Wu, Sun, Wang and Ding, Vehicle Logo Recognition System Based on Convolutional Neural Networks With a Pretraining Strategy, IEEE Transactions on Intelligent Transportation Systems 16(4) (2015), 1951–1960.

14. Wang, Liu and Xiao, A fast coarse to fine vehicle logo detection and recognition method, in Proc IEEE Int Conf Robot Biomim (2007), 691–696.

15. Sam K.T. and Tian X.L., Vehicle logo recognition using modest adaboost and radial tchebichef moments, in Proc 4th Int Conf Mach Learn Comput (2012), 91–95.

16. Dlagnekov and Belongie, Recognizing cars, Dept. Comput. Sci. Eng., Univ. California, San Diego, CA, USA, 2005.

17. Psyllos, Anagnostopoulos C.N. and Kayafas, Vehicle logo recognition using a SIFT-based enhanced matching scheme, IEEE Trans Intell Transp Syst 11(2) (2010), 322–328.

18. Yu, Zheng, Yang and Liang, Vehicle logo recognition based on Bag-of-Words, in Proc 10th IEEE Int Conf AVSS (2013), 353–358.

19. Dai, He, Gao, Li and Xiao, Vehicle-logo recognition method based on Tchebichef moment invariants and SVM, in Proc WCSE (2009), 18–21.

20. Sun, Wang and Tang, Deep convolutional network cascade for facial point detection, In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2013), 3476–3483.

21. Krizhevsky, Sutskever and Hinton, ImageNet classification with deep convolutional neural networks, In Advances in Neural Information Processing Systems (2012), 1106–1114.

22. Fukushima, Neocognitron: A hierarchical neural network capable of visual pattern recognition, Neural Networks 1(2) (1988), 119–130.

23. Sermanet and LeCun, Traffic sign recognition with multiscale convolutional networks, In Proceedings of the International Joint Conference on Neural Networks (2011), 2809–2813.

24. Ciregan, Meier and Schmidhuber, Multi-column deep neural networks for image classification, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2012), 3642–3649.

25. Farabet, Couprie, Najman and LeCun, Learning hierarchical features for scene labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8) (2013), 1915–1929.

26. Sermanet, Chintala and LeCun, Convolutional neural networks applied to house numbers digit classification, In Proceedings of the 21st International Conference on Pattern Recognition (2012), 3288–3291.

27. Taigman, Yang, Ranzato and Wolf, DeepFace: Closing the gap to human-level performance in face verification, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014), 1701–1708.

28. Sutskever and Hinton G.E., Deep, narrow sigmoid belief networks are universal approximators, Neural Computation 20(11) (2008), 2629–2636.

29. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning 2(1) (2009), 1–127.

30. Sermanet et al., OverFeat: Integrated recognition, localization and detection using convolutional networks, in Proc ICLR (2014), 1–16.

31. … and Sun, Convolutional neural networks at constrained time cost, in Proc IEEE CVPR (2015), 5353–5360.

32. Szegedy et al., Going deeper with convolutions, in Proc CVPR (2015), 1–9.

33. Erhan et al., Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research 11 (2010), 625–660.

34. Fan, Xu, Wu and Gong, Human tracking using convolutional neural networks, IEEE Transactions on Neural Networks 21(10) (2010), 1610–1623.

35. MacKay D.J.C., A practical Bayesian framework for backpropagation networks, Neural Computation 4(3) (1992), 448–472.

36. Snoek, Larochelle and Adams R.P., Practical Bayesian optimization of machine learning algorithms, In Advances in Neural Information Processing Systems (2012), 2951–2959.

37. Rumelhart, Hinton and Williams, Learning representations by back-propagating errors, Nature 323 (1986), 533–536.

38. Foresee F.D. and Hagan M.T., Gauss-Newton approximation to Bayesian regularization, Proceedings of the 1997 International Joint Conference on Neural Networks (1997), 1930–1935.

39. Psyllos, Anagnostopoulos C.N. and Kayafas, M-SIFT: A new method for vehicle logo recognition, in Proc IEEE Int Conf Veh Electron Safety (2013), 24–27.
