first training with YOLOv2

Chadrick
5 min read · Jan 11, 2018


Intro

I have been studying YOLOv2 for a while, and my first real attempt was to use it for car detection in actual road scenes. I used tiny-yolo as the base model with the pretrained binary weights. While it recognized cars very well in conventional full-shot car images, like the ones you would see in a commercial, it did not work well on the kind of images a driver sees from the driver's seat.

Preparing Dataset

Get Images from Blackbox Video Footage

Clearly, the pretrained model was not trained on driver's-POV car images. To gather some data, I took the liberty of copying the blackbox videos from my Dad's car, roughly an hour of footage in total. I used the ffprobe tool to extract a screenshot every 3~10 seconds from all the videos (a sketch of this extraction step follows the list below). There were two cases where I removed images from the dataset:

  • Night images. These were very dark, and the cars looked very different from what a driver would see in daylight. It seemed like a good idea to work only with daylight images for now.
  • Nearly identical images while stopped. When the car is waiting at a signal, it is halted but the blackbox keeps recording, so some extracted screenshots were nearly identical and labeling them would have been redundant.
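A minimal sketch of the extraction step, written here with ffmpeg called from Python (ffprobe itself only reports stream metadata, so the actual frame grabbing is done by its sibling tool ffmpeg). The folder names, file extension, and sampling interval are placeholders, not the exact values I used:

    import subprocess
    from pathlib import Path

    VIDEO_DIR = Path("blackbox_videos")   # placeholder folder holding the dashcam clips
    OUT_DIR = Path("frames")
    OUT_DIR.mkdir(exist_ok=True)

    for video in VIDEO_DIR.glob("*.mp4"):
        # fps=1/5 keeps roughly one frame every 5 seconds; adjust between 1/3 and 1/10
        subprocess.run([
            "ffmpeg", "-i", str(video),
            "-vf", "fps=1/5",
            str(OUT_DIR / f"{video.stem}_%04d.jpg"),
        ], check=True)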

In the end I was able to extract 330 images. It is a small dataset, but let's see how far we can go with it.

Labeling the Images

This came out of another, separate project I had been working on: building a tool for labeling objects in images. I first tried making an Android app, which I did roughly complete, but it turned out to be quite irritating to do this job on a small screen, and labeling productivity was lower than I expected.

I moved on to creating a website to increase labeling efficiency. After two weeks of Django programming, I was able to set up a website where I could not only label my images but also manage them in sets.

With this website, I was able to label all 330 images in about an hour. Of course, that does not include the time I spent fixing bugs in the website once I started labeling.

Converting Label Data to a darkflow-Compatible JSON Format

After looking into the example XML annotations included in the darkflow source code, I identified the key-value pairs that darkflow needs in order to understand a label. The box data accumulated on my web server then had to be converted into a compatible format.
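As an illustration only (the export schema of my labeling website is not shown in this post, so the record field names below are made up), the conversion amounts to writing one JSON file per image carrying the same information as the darkflow XML annotations: the image size plus a class name and pixel coordinates for each box.

    import json

    def convert_record(record, out_path):
        """record: one image's labels exported from the labeling website (hypothetical schema)."""
        annotation = {
            "filename": record["image_name"],
            "width": record["image_width"],
            "height": record["image_height"],
            "objects": [
                {
                    "name": box["label"],            # e.g. "car"
                    "xmin": box["x1"], "ymin": box["y1"],
                    "xmax": box["x2"], "ymax": box["y2"],
                }
                for box in record["boxes"]
            ],
        }
        with open(out_path, "w") as f:
            json.dump(annotation, f)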

Training

Prepare Custom JSON Parser

By default, darkflow can only parse annotations in XML format. However, I find JSON much easier to handle, so I added a custom JSON parser to darkflow and tweaked it to read JSON files instead of XML files.
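A minimal sketch of such a parser, assuming the per-image JSON files from the previous step and mirroring (as far as I recall the source) the nested-list structure darkflow's built-in XML parser produces, i.e. one [filename, [width, height, boxes]] entry per image with each box as [name, xmin, ymin, xmax, ymax]:

    import json
    import os

    def json_annotations(ann_dir, pick):
        """Parse *.json annotation files into the same structure darkflow builds from XML.

        pick: list of class names to keep, e.g. ["car"].
        """
        dumps = []
        for fname in os.listdir(ann_dir):
            if not fname.endswith(".json"):
                continue
            with open(os.path.join(ann_dir, fname)) as f:
                ann = json.load(f)
            boxes = [
                [obj["name"], obj["xmin"], obj["ymin"], obj["xmax"], obj["ymax"]]
                for obj in ann["objects"]
                if obj["name"] in pick
            ]
            if boxes:
                dumps.append([ann["filename"], [ann["width"], ann["height"], boxes]])
        return dumps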

The training procedure is simple and documented in the README of the darkflow source code. Following those guidelines was sufficient to start training.
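In practice that boils down to one invocation of the flow command, roughly like the sketch below (wrapped in Python here for consistency with the other snippets; the cfg, weights, and data paths are placeholders for my local setup):

    import subprocess

    # sketch of the darkflow training call described in the README
    subprocess.run([
        "flow",
        "--model", "cfg/tiny-yolo-voc.cfg",      # tiny-yolo config, adjusted for my classes
        "--load", "bin/tiny-yolo-voc.weights",   # pretrained binary weights as the starting point
        "--train",
        "--annotation", "data/annotations",      # the converted JSON labels
        "--dataset", "data/images",              # the 330 extracted frames
        "--batch", "16",
        "--epoch", "10",
        "--gpu", "1.0",
    ], check=True)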

Result

Train Stats

  • total images: 330
  • batch size: 16
  • epoch: 10

Training Progress Graph

In total, 200 steps were run, and the loss dropped to roughly half of its initial value.

Test Set Images

I had another driver's-POV video that I had recorded myself, to use as a test set. I picked a few images from it and ran three kinds of predictions (a sketch of the prediction call follows the list):

  • Pretrained (no additional training done by me)
  • Step-105 model
  • Step-200 model
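For reference, a minimal sketch of how a saved checkpoint can be loaded and run on a test image with darkflow's Python API; the cfg path, checkpoint step, and threshold below are placeholders:

    import cv2
    from darkflow.net.build import TFNet

    options = {
        "model": "cfg/tiny-yolo-voc.cfg",
        "load": 200,          # load the step-200 checkpoint; point to the .weights file for the pretrained model
        "threshold": 0.3,     # assumed confidence threshold
    }
    tfnet = TFNet(options)

    img = cv2.imread("test_frame.jpg")
    predictions = tfnet.return_predict(img)
    # each prediction is a dict with 'label', 'confidence', 'topleft', 'bottomright'

The same script, run once per checkpoint, is enough to produce the comparison images below.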

Below are the results.

Pretrained (Step-0)

Step-105

Step-200

Conclusion

  • Training with driver's-POV images, even on a small dataset, noticeably improves car detection from the driver's POV.
  • The step-200 model seems to draw excessive rectangles. I do not know whether this is due to poor handling of half-hidden cars; if the model were trained better on half-hidden cars, this issue might disappear.
  • The rectangle positions and sizes are still quite far from what I anticipated. How can I improve this?
  • Rectangle position and size prediction is tied to the grid size and the anchor boxes inside YOLOv2, so I should take a deeper look into these (see the sketch below).
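For context on that last point, this is, to my understanding, how YOLOv2 turns its raw outputs into a box: the predicted offsets are interpreted relative to a grid cell and a prior anchor box, so both the grid resolution and the anchor shapes directly constrain where the predicted rectangles can sit and how large they can be. A toy sketch of the decoding:

    import math

    def decode_box(tx, ty, tw, th, cx, cy, anchor_w, anchor_h):
        """Decode one YOLOv2 box prediction for the grid cell at (cx, cy).

        tx..th are the raw network outputs; anchor_w/anchor_h are the prior
        box sizes. All returned values are in grid-cell units.
        """
        sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
        bx = cx + sigmoid(tx)          # center is kept inside the responsible cell
        by = cy + sigmoid(ty)
        bw = anchor_w * math.exp(tw)   # size is a scaling of the anchor prior
        bh = anchor_h * math.exp(th)
        return bx, by, bw, bh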
