The code I used for doing this project can be found in project05.py
and Project05-Training.ipynb
. The main file imports functions defined in other files -- training.py
, detection.py
and tracking.py
. The upcoming sections go into further detail about the specific points described in the Rubric.
CarND Project 05 by Thomas Antony -- see http://www.thomasantony.com
Usage:
project05.py MODEL INPUT OUTPUT
project05.py (-h | --help)
Arguments:
MODEL SKLearn model file created using Project05-Training.ipynb
INPUT Input video file
OUTPUT Output video filename
Options:
-h --help Show this screen.
To run the program on a video:
python project05.py models/clf_9885_newhog.pkl input_video.mp4 output/output_video.mp4
Trained SKlearn classifier model files are stored in the models
folder. The best one so far is models/clf_9885_newhog.pkl
The code for this step is contained in the second and third code cells of the IPython notebook Project05-Training.ipynb
as well as lines 13-134 of training.py
.
First, the glob
package is used to read in all the files in the data-set. Here is an example of one of each of the vehicle
and non-vehicle
classes:
Different colorspaces and HoG parameters were explored. The final configuration was chosen according to the one which gave the best test-set accuracy from the classifier (described later). Here is an example using the Y channel of the YUV
color space and HOG parameters of orientations=8
, pixels_per_cell=(8, 8)
and cells_per_block=(2, 2)
:
Note: OpenCV's HOGDescriptor
class was used in place of skimage.hog
as it was found to be significantly faster. The method init_hog
in training.py
initializes the descriptor with the parameters and then it is re-used for the rest of the process.
A LinearSVC
was used as the classifier for this project. The training process can be seen in code cells 4-6 of Project05-Training.ipynb
. The features are extracted and concatenated using functions in training.py
. Since the training data consists of PNG files, and because of how mpimg.imread
loads PNG files, the image data is scaled up to 0-255 before being passed into the feature extractor.
The features include HOG features, spatial features and color histograms. The classifier is set up as a pipeline that includes a scaler as shown below:
clf = Pipeline([('scaling', StandardScaler()),
('classification', LinearSVC(loss='hinge')),
])
This keeps the scaling factors embedded in the model object when saved to a file. One round of hard negative mining was also used to enhance the results. This was done by making copies of images that were identified as false positives and adding them to the training data set. This improved the performance of the classifier. The model stored in the file models/clf_9885_newhog.pkl
obtained a test accuracy of 98.85% where the test-set was 20% of the total data.
The function create_windows
in lines 141-147 of detection.py
was used to generate the list of windows to search. The input to the function specifies both the window size, along with the 'y' range of the image that the window is to be applied to. Different scales were explored and the following set was eventually selected based on classification performance:
Window size | Y-range |
---|---|
(64, 64) | [400, 500] |
(96, 96) | [400, 500] |
(128, 128) | [450, 600] |
The final image used the Y-channel of a YUV-image along with the sliding window approach shown above to detect windows of interest. These windows were used to create a heatmap which was then thresholded to remove false positives. This strategy is of course more effective in a video stream where there the heatmap is stacked over multiple frames. scipy.ndimage.measurements.label
was then used to define clusters on this heatmap to be labeled as possible vehicles. This is shown in lines 48-56 of the process_image
function in project05.py
. The heatmap is created using the update_heatmap
function in detection.py
in lines 171-186.
In the video pipeline (described in the next section), the output of the label
function was used as measurements for a Kalman Filter which estimated the corners of the bounding box.
This same video can also be found at: project_video_out.mp4
The vehicle tracking pipeline was also run on the challenge video from project #4.
The pipeline worked by first creating a heatmap using all the windows marked as positive by the classifier. The first step to reducing false positives was to use classifier's decision function rather than the predict
method. By setting a threshold on the decision function as shown below, many spurious detections were avoided:
dec = clf.decision_function(test_features)
prediction = int(dec > 0.75)
The heatmap was then added up over 25 frames, and any pixels below a threshold of 10 were zeroed out. This excluded any detections that didn't show up consistently in 10 out of the last 25 frames.
The process of tracking the vehicle, given a set of bounding boxes from the thresholded heatmap is performed by the VehicleTracker
class in tracking.py
. After the clusters were extracted from the heatmap using scipy.ndimage.measurements.label
, the bounding boxes were passed in as measurements to a Kalman Filter.
The tracking process is mainly done by the track
method of the VehicleTracker
class. The first step in the process is to perform non-maximum suppression of the bounding boxes to remove duplicates.
The following rules were used to identify possible candidates for tracking. These are implemented in lines 182-208 of tracking.py
.
-
If there are no candidates being tracked, assume all the non-max suppressed bounding boxes as possible tracking candidates.
-
If there are candidates currently being tracked, compute the overlap between the measured bounding boxes as well as the distance between the centroids of these boxes and the candidates.
-
The measured bounding boxes are assigned to a candidates based on the overlap as well as the distance between the centroids.
Once each measurement has been assigned to an existing (or new) candidate, the Kalman Filter is used to update the state of each candidate. The filter uses the x and y positions of the top-left and bottom-right corners of the bounding box as the state along with their second and third derivatives.
The cleanup
method in the VehicleTracker
class removes stray vehicle candidates that haven't haven't been seen in a few frames or those that have gone beyond a pre-defined area of interest. This is implemented in lines 238-241 of tracking.py
.
Each candidate also has an "age" value that is initially set to zero. Once the age hits -50
, the variable is locked, and the candidate is assumed to be valid vehicle detection and permanently tracked even if it is not seen for a few frames. The Kalman Filter estimates the velocity of the bounding box and predicts the position whenever the measurements are lacking. A permanently tracked vehicle is only deleted once it goes outside a certain region-of-interest (horizon). This helps maintain the tracking even when vehicles are obscured.
The is decremented by one any time the candidate has at least one measurement assigned to it. Conversely, the age is increased if no measurements are assigned to a candidate in a frame (assuming it's age > -50). Once the age reaches 5
, the candidate is considered a false positive and removed.
There is also a threshold on the age at which a candidate is drawn on the image. An orange bounding box indicates that the covariance estimated by the Kalman filter has increased beyond a certain limit. This happens when a vehicle candidate is obscured behind another and hasn't been detected for a few frames.
The pipeline still detects a few false positives. One thing that can help make this more reliable would be a way to detect the horizon and automatically mask out only those places where a vehicle can show up.
The current pipeline runs at a speed of ~3-6 frames per second on a 2011 Macbook Pro. Using convolutional neural networks for the initial segmentation of the image might be much faster than a Support Vector Machine classifier such as the one used here. This might enable real-time vehicle detection and tracking.
[1] "Get HOG image features from OpenCV + Python?" http://stackoverflow.com/questions/6090399/get-hog-image-features-from-opencv-python
[2] Malisiewicz et al., "Faster Non-maximum Suppression in Python" http://www.pyimagesearch.com/2015/02/16/faster-non-maximum-suppression-python/