## [Translation] Finding a free parking space with Python

I live in a nice city. But, as in many others, finding a parking space is always an ordeal. Free spaces fill up quickly, and even if you have a space of your own, friends who come to visit will have nowhere to park.

So I decided to point a camera out of the window and use deep learning so that my computer would notify me when a space becomes free:

It may sound hard, but writing a working prototype with deep learning is actually quick and easy. All the necessary components already exist - you just need to know where to find them and how to put them together.

So let's have a little fun and build an accurate free-parking notification system using Python and deep learning.

When we have a difficult task that we want to solve with the help of machine learning, the first step is to break it down into a sequence of simple tasks. Then we can use different tools to solve each of them. By combining a few simple solutions together, we get a system that is capable of something complicated.

Here is how I broke my task down:

The input to the pipeline is a video stream from a webcam pointed out of the window:

We will pass each frame of the video through the pipeline, one at a time.

The first step is to recognize all possible parking spaces in the frame. Obviously, before we can look for unoccupied spaces, we need to know which parts of the image are parking spaces.

The second step is to find all the cars in each frame. This will allow us to track the movement of each car from frame to frame.

The third step is to determine which spaces are occupied by cars and which are not. To do this, we combine the results of the first two steps.

Finally, the program should send an alert when a parking space becomes available. This will be determined by changes in the location of the cars between video frames.

Each of these stages can be completed in different ways using different technologies. There is no single right or wrong way to build this pipeline; different approaches have their own advantages and disadvantages. Let's take a closer look at each step.
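The four stages can be sketched as a plain Python generator. This is only a structural outline under stated assumptions: `detect_cars` and `is_occupied` are hypothetical callables standing in for the real detector and the real overlap check, and the parking spaces are simply taken to be the cars seen in the first frame.

```python
def run_pipeline(frames, detect_cars, is_occupied):
    """Yield the list of free parking spaces for every frame after the first.

    frames      -- iterable of video frames (any objects)
    detect_cars -- hypothetical callable: frame -> list of car boxes
    is_occupied -- hypothetical callable: (space, car_boxes) -> bool
    """
    parking_spaces = None
    for frame in frames:
        # Step 2: find the cars in this frame.
        car_boxes = detect_cars(frame)
        if parking_spaces is None:
            # Step 1 (simplified): treat the first frame's cars as the spaces.
            parking_spaces = car_boxes
            continue
        # Step 3: combine both results to find the unoccupied spaces.
        free = [s for s in parking_spaces if not is_occupied(s, car_boxes)]
        # Step 4: the caller decides when a free space is worth an alert.
        yield free


# Toy run: the "frames" are already lists of boxes, and a space counts as
# occupied only if an identical box is still detected.
frames = [[(0, 0, 10, 10), (0, 20, 10, 30)], [(0, 0, 10, 10)]]
for free in run_pipeline(frames, detect_cars=lambda f: f,
                         is_occupied=lambda s, cars: s in cars):
    print(free)  # [(0, 20, 10, 30)]
```

Keeping each stage behind its own function makes it possible to swap one technique for another without touching the rest of the loop.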

### Recognize parking spaces

This is what our camera sees:

We need to somehow scan this image and get a list of places to park:

The brute-force solution would be to hard-code the locations of all parking spaces by hand instead of recognizing them automatically. But then, if we move the camera or want to watch parking spaces on another street, we would have to repeat the whole procedure. That is not very appealing, so let's look for an automatic way to recognize parking spaces.

Alternatively, you can search for parking meters on the image and assume that there is a parking space next to each of them:

However, this approach has its problems. Firstly, not every parking space has a parking meter, and in fact we are more interested in finding spaces that we do not have to pay for. Secondly, the location of a parking meter does not tell us exactly where the parking space is; it only lets us make a guess.

Another idea is to create an object recognition model that looks for the parking space markings painted on the road:

But this approach is not great either. Firstly, in my city these markings are very small and hard to see from a distance, so they will be difficult to detect with a computer. Secondly, the street is full of all sorts of other lines and markings. It will be difficult to separate the parking markings from lane dividers and pedestrian crossings.

When you run into a problem that seems difficult at first glance, take a few minutes to look for another approach that sidesteps some of the technical difficulties. What is a parking space, anyway? It is just a place where a car stays parked for a long time. Perhaps we do not need to recognize parking spaces at all. Why don't we just detect cars that have been standing still for a long time and assume that they are standing in parking spaces?

In other words, parking spaces are located where cars stand for a long time:

Thus, if we can recognize the cars and find out which of them do not move between frames, we will be able to guess where the parking spaces are. Simple enough. On to recognizing cars!
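The idea of "cars that do not move between frames" can be sketched without any machine learning at all. The snippet below is an illustration, not the method used later in the article: it assumes boxes are given as `(y1, x1, y2, x2)` tuples and treats a car as stationary if a box with nearly the same center shows up in every frame. The `max_shift` tolerance of 10 pixels is an arbitrary assumption.

```python
def center(box):
    """Center point (y, x) of a (y1, x1, y2, x2) box."""
    y1, x1, y2, x2 = box
    return ((y1 + y2) / 2, (x1 + x2) / 2)


def is_same_place(a, b, max_shift=10):
    """True if the centers of the two boxes are within max_shift pixels."""
    (ay, ax), (by, bx) = center(a), center(b)
    return abs(ay - by) <= max_shift and abs(ax - bx) <= max_shift


def stationary_boxes(frames_of_boxes, max_shift=10):
    """Boxes from the first frame that have a close match in every later frame."""
    first, *rest = frames_of_boxes
    return [box for box in first
            if all(any(is_same_place(box, other, max_shift) for other in boxes)
                   for boxes in rest)]


frames_of_boxes = [
    [(492, 871, 551, 961), (100, 100, 160, 200)],  # frame 1
    [(493, 870, 552, 960), (100, 300, 160, 400)],  # frame 2: second car moved
    [(492, 872, 551, 962)],                        # frame 3: second car is gone
]
print(stationary_boxes(frames_of_boxes))  # [(492, 871, 551, 961)]
```

Small jitter in the detected box is tolerated, while a car that drives away or disappears is dropped from the list of candidate parking spaces.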

### Recognize cars

Recognizing cars in a video frame is a classic object detection task. There are many machine learning approaches that we could use for it. Here are some of them, in order from "old school" to "new school":

• You can train a HOG-based detector (Histogram of Oriented Gradients) and slide it across the entire image to find all the cars. This older approach, which does not use deep learning, is relatively fast, but it does not cope well with cars seen in different orientations.
• You can train a CNN-based detector (Convolutional Neural Network) and slide it across the entire image until we have found all the cars. This approach is accurate but not as efficient, since we need to scan the image many times with the CNN to find all the cars. And although we can find cars in different orientations, we will need much more training data than for the HOG detector.
• You can use a newer deep learning approach like Mask R-CNN, Faster R-CNN or YOLO, which combines the accuracy of a CNN with a set of technical tricks that greatly increase detection speed. Such models run relatively quickly (on a GPU), provided we have a lot of data to train the model.
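To see why the first two approaches are expensive, it helps to count how many window positions a sliding-window detector has to evaluate. The sketch below only enumerates the windows; the frame size, window size and stride are illustrative assumptions, and a real detector would repeat the scan at several scales.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield (x1, y1, x2, y2) for every window position that fits in the image."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


# Number of classifier evaluations for one Full HD frame at a single scale.
n = sum(1 for _ in sliding_windows(1920, 1080, win_w=128, win_h=64, stride=16))
print(n)  # 7232
```

Each of those positions means one run of the classifier, which is cheap for HOG features but costly for a CNN. Models such as Mask R-CNN avoid this exhaustive scan.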

In general, we want the simplest solution that works as it should and requires the least amount of training data. It does not have to be the newest and fastest algorithm. However, in our particular case Mask R-CNN is a sensible choice, even though it is fairly new and fast.

The Mask R-CNN architecture is designed to detect objects across the entire image in a computationally efficient way, without using the sliding window approach. In other words, it works pretty fast. With a modern GPU, we should be able to detect objects in high-definition video at a speed of several frames per second. For our project this should be enough.

In addition, Mask R-CNN gives a lot of information about each detected object. Most detection algorithms return only the bounding box of each object. Mask R-CNN, however, gives us not only the location of each object but also its outline (mask):

To train Mask R-CNN, we need a lot of images of the objects that we want to recognize. We could go outside, take pictures of cars and label them in the photos, which would take several days of work. Fortunately, cars are one of those objects that people often want to recognize, so there are already several publicly available datasets with images of cars.

One of them is the popular COCO dataset (short for Common Objects in Context), which has images annotated with object masks. This dataset contains more than 12,000 images with cars already labeled. Here is an example of an image from the dataset:

Such data is great for training models based on Mask R-CNN.

But hold your horses, there is even better news! We are not the first to want to train a model on the COCO dataset - many people have already done it before us and shared their results. So instead of training our own model, we can take a ready-made one that can already recognize cars. For our project we will use the open-source model from Matterport.

If we feed an image from the camera into this model, here's what we get out of the box:

The model recognized not only cars, but also objects such as traffic lights and people. Amusingly, it recognized a tree as a potted plant.

For each recognized object, the Mask R-CNN model returns four things:

• Type of object detected (integer). The pre-trained COCO model is able to recognize 80 different common objects such as cars and trucks. A full list of them can be found here.
• The degree of confidence in the recognition results. The higher the number, the more confident the model is that the object is recognized correctly.
• The bounding box for an object in the form of XY coordinates of pixels in an image.
• A “mask” that shows which pixels within the bounding box are part of an object. Using the mask data, you can find the outline of the object.
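To make the relationship between a mask and a bounding box concrete, here is a small sketch that derives the tight box and the pixel area from a binary mask. It is a simplification: the mask is a plain list of lists rather than the array the library returns, and treating `y2` and `x2` as exclusive end coordinates is a convention chosen for this example.

```python
def mask_to_box(mask):
    """Return the tight (y1, x1, y2, x2) box around the set pixels, or None.

    y2 and x2 are exclusive: one past the last row/column containing a pixel.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # the mask is empty
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def mask_area(mask):
    """Number of pixels that belong to the object."""
    return sum(1 for row in mask for px in row if px)


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_box(mask))  # (1, 1, 3, 4)
print(mask_area(mask))    # 5
```

The box covers 2 x 3 = 6 pixels while the object itself covers only 5, which is exactly the extra detail a mask provides over a bounding box.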

The following Python code detects the bounding boxes of cars using the pre-trained Mask R-CNN model and OpenCV:

import numpy as np
import cv2
import mrcnn.config
import mrcnn.utils
from mrcnn.model import MaskRCNN
from pathlib import Path


# Configuration that will be used by the Mask R-CNN library.
class MaskRCNNConfig(mrcnn.config.Config):
    NAME = "coco_pretrained_model_config"
    IMAGES_PER_GPU = 1
    GPU_COUNT = 1
    NUM_CLASSES = 1 + 80  # COCO dataset has 80 classes + 1 background class.
    DETECTION_MIN_CONFIDENCE = 0.6

# Filter the list of recognition results so that only cars remain.
def get_car_boxes(boxes, class_ids):
    car_boxes = []

    for i, box in enumerate(boxes):
        # Keep the box only if the object is a car (3), truck (8) or bus (6).
        if class_ids[i] in [3, 8, 6]:
            car_boxes.append(box)

    return np.array(car_boxes)

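# Project root directory.
ROOT_DIR = Path(".")

# Directory for saving logs and trained model.
MODEL_DIR = ROOT_DIR / "logs"

# Local path to the file with the trained weights.
# The file name is an assumption (the name used by the Matterport release); adjust it if yours differs.
COCO_MODEL_PATH = ROOT_DIR / "mask_rcnn_coco.h5"

# Download the COCO trained weights if necessary.
if not COCO_MODEL_PATH.exists():
    mrcnn.utils.download_trained_weights(str(COCO_MODEL_PATH))

# Directory with images for processing.
IMAGE_DIR = ROOT_DIR / "images"

# Video file or camera to process - set this to 0 to use the camera instead of a video file.
VIDEO_SOURCE = "test_images/parking.mp4"

# Create a Mask R-CNN model in inference mode.
model = MaskRCNN(mode="inference", model_dir=str(MODEL_DIR), config=MaskRCNNConfig())

# Load the pre-trained weights.
model.load_weights(str(COCO_MODEL_PATH), by_name=True)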

# Location of parking spaces.
parked_car_boxes = None

# Load the video file for which we want to run the recognition.
video_capture = cv2.VideoCapture(VIDEO_SOURCE)

# Loop over each frame of the video.
while video_capture.isOpened():
    success, frame = video_capture.read()
    if not success:
        break

    # Convert the image from the BGR color model (used by OpenCV) to RGB.
    rgb_image = frame[:, :, ::-1]

    # Run the image through the Mask R-CNN model to get the result.
    results = model.detect([rgb_image], verbose=0)

    # Mask R-CNN assumes that we recognize objects in multiple images.
    # We passed only one image, so we retrieve only the first result.
    r = results[0]

    # The variable r now contains the recognition results:
    # - r['rois'] - the bounding box for each recognized object;
    # - r['class_ids'] - identifier (type) of the object;
    # - r['scores'] - degree of confidence;
    # - r['masks'] - object masks (which give you their outline).

    # Filter the result to get the bounding boxes of the cars.
    car_boxes = get_car_boxes(r['rois'], r['class_ids'])

    print("Cars found in frame of video:")

    # Draw each box on the frame.
    for box in car_boxes:
        print("Car:", box)

        y1, x1, y2, x2 = box

        # Draw the box.
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 1)

    # Show the frame on the screen.
    cv2.imshow('Video', frame)

    # Press 'q' to exit.
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Clean up after finishing.
video_capture.release()
cv2.destroyAllWindows()

After running this script, an image with a box around each detected car will appear on the screen:

The coordinates of each car will also be printed to the console:

Cars found in frame of video:
Car: [492 871 551 961]
Car: [450 819 509 913]
Car: [411 774 470 856]

So now we know how to recognize the cars in an image.

### Recognize empty parking spaces

We know the pixel coordinates of each car. By looking at several consecutive frames, we can easily determine which cars did not move and assume that those locations are parking spaces. But how do we tell that a car has left its parking space?

The problem is that the bounding boxes of the cars partially overlap:

Therefore, if we assume that each bounding box represents a parking space, a space may appear partially occupied by a neighboring car when it is in fact empty. We need a way to measure how much two objects overlap so that we can look for only the "mostly empty" boxes.

We will use a measure called Intersection over Union, or IoU (the ratio of the intersection area to the area of the union). IoU is found by counting the number of pixels where two objects overlap and dividing by the number of pixels covered by either object:

This tells us how strongly a car's bounding box overlaps the box of a parking space, which makes it easy to determine whether the space is free. If the IoU value is low, like 0.15, then the car takes up only a small part of the parking space. If it is high, like 0.6, then the car takes up most of the space and you cannot park there.

Since IoU is used quite often in computer vision, the libraries you are using very likely implement it already. In our Mask R-CNN library, it is implemented as the function mrcnn.utils.compute_overlaps().

If we have a list of bounding boxes for the parking spaces, then checking whether cars are inside those boxes takes just a line or two of code:
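For intuition, IoU for two axis-aligned boxes can be computed by hand. The following is a minimal sketch that assumes boxes in `(y1, x1, y2, x2)` form; it is written for illustration and is not the library's implementation.

```python
def box_area(box):
    """Area of a (y1, x1, y2, x2) box; degenerate boxes have area 0."""
    y1, x1, y2, x2 = box
    return max(0, y2 - y1) * max(0, x2 - x1)


def iou(box_a, box_b):
    """Intersection over Union of two (y1, x1, y2, x2) boxes."""
    # The intersection rectangle is bounded by the innermost edges.
    y1 = max(box_a[0], box_b[0])
    x1 = max(box_a[1], box_b[1])
    y2 = min(box_a[2], box_b[2])
    x2 = min(box_a[3], box_b[3])
    intersection = box_area((y1, x1, y2, x2))
    union = box_area(box_a) + box_area(box_b) - intersection
    return intersection / union if union else 0.0


space = (0, 0, 10, 20)
print(iou(space, (0, 0, 10, 20)))   # 1.0: the car fills the space exactly
print(iou(space, (0, 10, 10, 30)))  # 100 / 300, about 0.33: half-shifted car
print(iou(space, (0, 30, 10, 50)))  # 0.0: no overlap at all
```

Note that the union is the sum of both areas minus the intersection, so the overlapping region is counted only once.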

# Filter the result to get the bounding boxes of the cars.
car_boxes = get_car_boxes(r['rois'], r['class_ids'])

# See how much the cars overlap with the known parking spaces.
# Parking spaces go first so that each row of the result is one parking space.
overlaps = mrcnn.utils.compute_overlaps(parking_areas, car_boxes)

print(overlaps)

The result should look something like this:

[
 [1.         0.07040032 0.         0.        ]
 [0.07040032 1.         0.07673165 0.        ]
 [0.         0.         0.02332112 0.        ]
]

In this two-dimensional array, each row corresponds to one parking space bounding box, and each column shows how strongly that space overlaps with one of the detected cars. A value of 1.0 means that the car occupies the space completely, while a low value, like 0.02, means that the car encroaches on the space slightly, but you can still park there.

To find free spaces, you only need to check each row of this array. If all the numbers are close to zero, then the space is most likely free!

However, keep in mind that object detection does not always work perfectly on live video. Although the Mask R-CNN model is pretty accurate, from time to time it can miss a car or two in a single frame of video. Therefore, before declaring a space free, we need to make sure that it stays that way for the next 5-10 frames of video. This way we avoid situations where the system mistakenly marks a space as empty because of a glitch in one video frame. As soon as we are sure that the space has remained free for several frames, we can send a message!
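The row check can be sketched in a few lines. This example uses plain Python lists in place of the array and the 0.15 threshold mentioned earlier; `free_space_indices` is a hypothetical helper name introduced only for illustration.

```python
def free_space_indices(overlaps, threshold=0.15):
    """Return the indices of rows whose largest IoU is below the threshold.

    Each row holds the IoU of one parking space with every detected car.
    A row with no cars at all (empty row) counts as free.
    """
    return [i for i, row in enumerate(overlaps)
            if max(row, default=0.0) < threshold]


overlaps = [
    [1.0, 0.07040032, 0.0, 0.0],
    [0.07040032, 1.0, 0.07673165, 0.0],
    [0.0, 0.0, 0.02332112, 0.0],
]
print(free_space_indices(overlaps))  # [2]
```

Only the third space qualifies here: its strongest overlap with any car is about 0.02, while the first two spaces each contain a car with an IoU of 1.0.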
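One way to express this "wait for several frames" rule is a small counter object. The sketch below is an assumption about how the check could be structured, not the article's final code; the class name and the `required_frames` parameter are made up for the example.

```python
class FreeSpaceDebouncer:
    """Report a free space only after it was seen in N consecutive frames."""

    def __init__(self, required_frames=10):
        self.required_frames = required_frames
        self.streak = 0  # consecutive frames in which a free space was seen

    def update(self, free_space_seen):
        """Feed one frame's result; return True once the streak is long enough."""
        self.streak = self.streak + 1 if free_space_seen else 0
        return self.streak >= self.required_frames


debouncer = FreeSpaceDebouncer(required_frames=3)
observations = [True, False, True, True, True, True]
print([debouncer.update(o) for o in observations])
# [False, False, False, False, True, True]
```

A single missed detection (the `False` in the second frame) resets the streak, so one glitchy frame can neither trigger nor sustain an alert.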

### Send SMS

The last part of our pipeline is sending an SMS notification when a free parking space appears.

Sending a message from Python is very easy if you use Twilio. Twilio is a popular API that lets you send SMS messages from almost any programming language with just a few lines of code. Of course, if you prefer another service, you can use that instead. I am not affiliated with Twilio; it is just the first thing that comes to mind.

To use Twilio, register a trial account, create a Twilio phone number and get your account credentials. Then install the client library:

$ pip3 install twilio

After that use the following code to send the message:

from twilio.rest import Client

# Twilio account details.
twilio_account_sid = 'Your Twilio Account SID'
twilio_auth_token = 'Your Twilio Authentication Token'
twilio_source_phone_number = 'Your Twilio Phone Number'

# Create a Twilio client object.
client = Client(twilio_account_sid, twilio_auth_token)

# Send the SMS.
message = client.messages.create(
    body="Message body",
    from_=twilio_source_phone_number,
    to="Your number, where the message will come"
)

To add SMS sending to our script, just copy this code into it. However, we need to make sure that a message is not sent on every frame in which a free space is visible. So we will keep a flag that, once set, prevents further messages from being sent for some time or until another space frees up.
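The "for some time" variant of that flag can be sketched as a cooldown timer. This is an illustrative assumption rather than what the final script does (it uses a simple boolean); the class name and the 60-second cooldown are invented for the example, and the clock is injectable so the demonstration runs instantly.

```python
import time


class NotificationThrottle:
    """Allow at most one notification per cooldown period."""

    def __init__(self, cooldown_seconds=600, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.last_sent = None  # time of the last notification, if any

    def should_send(self):
        """Return True (and start the cooldown) if a message may be sent now."""
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.cooldown_seconds:
            return False
        self.last_sent = now
        return True


# Demonstration with a fake clock instead of real waiting.
fake_time = [0.0]
throttle = NotificationThrottle(cooldown_seconds=60, clock=lambda: fake_time[0])
decisions = []
for t in (0, 10, 59, 60, 61):
    fake_time[0] = t
    decisions.append(throttle.should_send())
print(decisions)  # [True, False, False, True, False]
```

In the main loop, the SMS call would be wrapped in `if throttle.should_send():`, so repeated frames showing the same free space produce a single message.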

### Putting it all together

import numpy as np
import cv2
import mrcnn.config
import mrcnn.utils
from mrcnn.model import MaskRCNN
from pathlib import Path
from twilio.rest import Client


# Configuration that will be used by the Mask R-CNN library.
class MaskRCNNConfig(mrcnn.config.Config):
    NAME = "coco_pretrained_model_config"
    IMAGES_PER_GPU = 1
    GPU_COUNT = 1
    NUM_CLASSES = 1 + 80  # COCO dataset has 80 classes + 1 background class.
    DETECTION_MIN_CONFIDENCE = 0.6

# Filter the list of recognition results so that only cars remain.
def get_car_boxes(boxes, class_ids):
    car_boxes = []

    for i, box in enumerate(boxes):
        # Keep the box only if the object is a car (3), truck (8) or bus (6).
        if class_ids[i] in [3, 8, 6]:
            car_boxes.append(box)

    return np.array(car_boxes)

# Twilio configuration.
twilio_account_sid = 'Your Twilio Account SID'
twilio_auth_token = 'Your Twilio Authentication Token'
twilio_phone_number = 'Your Twilio Phone Number'
destination_phone_number = 'Number where the message will come'
client = Client(twilio_account_sid, twilio_auth_token)

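# Project root directory.
ROOT_DIR = Path(".")

# Directory for saving logs and trained model.
MODEL_DIR = ROOT_DIR / "logs"

# Local path to the file with the trained weights.
# The file name is an assumption (the name used by the Matterport release); adjust it if yours differs.
COCO_MODEL_PATH = ROOT_DIR / "mask_rcnn_coco.h5"

# Download the COCO trained weights if necessary.
if not COCO_MODEL_PATH.exists():
    mrcnn.utils.download_trained_weights(str(COCO_MODEL_PATH))

# Directory with images for processing.
IMAGE_DIR = ROOT_DIR / "images"

# Video file or camera to process - set this to 0 to use the camera instead of a video file.
VIDEO_SOURCE = "test_images/parking.mp4"

# Create a Mask R-CNN model in inference mode.
model = MaskRCNN(mode="inference", model_dir=str(MODEL_DIR), config=MaskRCNNConfig())

# Load the pre-trained weights.
model.load_weights(str(COCO_MODEL_PATH), by_name=True)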

# Location of parking spaces.
parked_car_boxes = None

# Load the video file for which we want to run the recognition.
video_capture = cv2.VideoCapture(VIDEO_SOURCE)

# How many frames in a row we have seen with a free space.
free_space_frames = 0

# Have we already sent an SMS?
sms_sent = False

# Loop over each frame of the video.
while video_capture.isOpened():
    success, frame = video_capture.read()
    if not success:
        break

    # Convert the image from the BGR color model (used by OpenCV) to RGB.
    rgb_image = frame[:, :, ::-1]

    # Run the image through the Mask R-CNN model to get the result.
    results = model.detect([rgb_image], verbose=0)

    # Mask R-CNN assumes that we recognize objects in multiple images.
    # We passed only one image, so we retrieve only the first result.
    r = results[0]

    # The variable r now contains the recognition results:
    # - r['rois'] - the bounding box for each recognized object;
    # - r['class_ids'] - identifier (type) of the object;
    # - r['scores'] - degree of confidence;
    # - r['masks'] - object masks (which give you their outline).

    if parked_car_boxes is None:
        # This is the first frame of the video - assume that all the cars found are parked.
        # Save the location of each car as a parking space and go to the next frame.
        parked_car_boxes = get_car_boxes(r['rois'], r['class_ids'])
    else:
        # We already know where the spaces are. Check whether any of them are free.

        # Find the cars in the current frame.
        car_boxes = get_car_boxes(r['rois'], r['class_ids'])

        # See how much these cars overlap with the known parking spaces.
        overlaps = mrcnn.utils.compute_overlaps(parked_car_boxes, car_boxes)

        # Assume that there are no free spaces until we find at least one.
        free_space = False

        # Loop over each known parking space.
        for parking_area, overlap_areas in zip(parked_car_boxes, overlaps):

            # Find the maximum overlap with any car detected
            # in the frame (it does not matter which one).
            max_IoU_overlap = np.max(overlap_areas)

            # Get the top-left and bottom-right coordinates of the parking space.
            y1, x1, y2, x2 = parking_area

            # Check whether the space is free by checking the IoU value.
            if max_IoU_overlap < 0.15:
                # The space is free! Draw a green box around it.
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 3)
                # Note that we have found at least one free space.
                free_space = True
            else:
                # The space is still occupied - draw a red box.
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 1)

            # Write the IoU value inside the box.
            font = cv2.FONT_HERSHEY_DUPLEX
            cv2.putText(frame, f"{max_IoU_overlap:0.2}", (x1 + 6, y2 - 6), font, 0.3, (255, 255, 255))

        # If at least one space was free, start counting frames.
        # This makes sure that the space is really free
        # and that we do not send a notification too early.
        if free_space:
            free_space_frames += 1
        else:
            # If everything is occupied, reset the counter.
            free_space_frames = 0

        # If a space has been free for several frames, we can say that it is free.
        if free_space_frames > 10:
            # Display SPACE AVAILABLE! at the top of the screen.
            font = cv2.FONT_HERSHEY_DUPLEX
            cv2.putText(frame, "SPACE AVAILABLE!", (10, 150), font, 3.0, (0, 255, 0), 2, cv2.FILLED)

            # Send a message if we have not done so yet.
            if not sms_sent:
                print("SENDING SMS!!!")
                message = client.messages.create(
                    body="Parking space open - go go go!",
                    from_=twilio_phone_number,
                    to=destination_phone_number
                )
                sms_sent = True

    # Show the frame on the screen.
    cv2.imshow('Video', frame)

    # Press 'q' to exit.
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Clean up after finishing.
video_capture.release()
cv2.destroyAllWindows()

To run this code, you first need to install Python 3.6+, Matterport Mask R-CNN and OpenCV.

I deliberately kept the code as simple as possible. For example, it assumes that every car it sees in the first frame is parked. Try experimenting with it and see if you can improve its reliability.

By simply changing the identifiers of the objects that the model looks for, you can turn the code into something completely different. For example, imagine that you work at a ski resort. With a couple of changes, you can turn this script into a system that automatically recognizes snowboarders jumping off a ramp and records videos of great jumps. Or, if you work at a nature reserve, you can create a system that counts zebras. You are limited only by your imagination.
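As a sketch of what "changing the identifiers" could look like, the filter can take the wanted class IDs as a parameter instead of hard-coding them. The class list below is an assumption: it shows only the first entries of the COCO class list in the order that matches the IDs 3, 6 and 8 used in the script, so verify the IDs against your own model before relying on them. `ids_for` and `filter_boxes` are hypothetical helper names.

```python
# First entries of the class list assumed for the pre-trained COCO model.
# Index 0 is the background class.
CLASS_NAMES = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck']


def ids_for(names, class_names=CLASS_NAMES):
    """Translate class names into the integer IDs the model returns."""
    return {class_names.index(name) for name in names}


def filter_boxes(boxes, class_ids, wanted_ids):
    """Keep only the boxes whose class ID is in wanted_ids."""
    return [box for box, cid in zip(boxes, class_ids) if cid in wanted_ids]


print(sorted(ids_for(['car', 'truck', 'bus'])))  # [3, 6, 8]

boxes = [(0, 0, 5, 5), (10, 10, 20, 20), (30, 30, 40, 40)]
class_ids = [1, 3, 8]
print(filter_boxes(boxes, class_ids, ids_for(['person'])))  # [(0, 0, 5, 5)]
```

Swapping `['car', 'truck', 'bus']` for another set of names is then the only change needed to point the same pipeline at a different kind of object.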


Happy learning, and keep experimenting!