Face Detection Simplified

Shrey Shivam
Published in Analytics Vidhya · Feb 8, 2021 · 4 min read

We have all watched spy and sci-fi movies in which the surveillance system of some secret service identifies an unknown individual by matching their face against a database. Many toys and games also use face identification to gamify the experience.

In the real world, identity verification systems for office security, visas and immigration, and employee attendance rely on face biometrics. Face identification is also used to spot shoplifters in images and video captured by CCTV. It has even become a household feature, with face identification now a ubiquitous way to unlock mobile phones.

This blog post explains, with a worked example, how face biometrics is used for face detection, identification, and verification.

Multi Task Cascaded Convolution Networks

Face detection means locating human faces in an image or a video; Facebook's photo tagging is a familiar example. Face identification is a one-to-many search that matches a face against a database, whereas face verification is a one-to-one match that authenticates the claimed identity of a given person. There are various libraries and algorithms that perform face detection, but Multi-task Cascaded Convolutional Networks (MTCNN) is one of the most popular and easiest-to-use ML libraries, providing face detection with a high degree of accuracy. It is also a lightweight detector, which gives it a short inference time.

The MTCNN algorithm consists of three cascaded stages that detect the bounding box of a human face in an image and identify five landmark points.

Stage 1: The image is scaled down repeatedly to create an image pyramid, and every scaled-down version is passed through a CNN to propose candidate face regions.

Image Pyramid Scale Down

Stage 2: Image patches for the candidate bounding boxes are extracted and refined.

Bounding Box

Stage 3: The five landmark points (left eye, right eye, nose, mouth left, mouth right) are computed.

Landmark points detection

Together, these three stages form the MTCNN cascade.
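To build some intuition for Stage 1, here is a minimal sketch of how the scale factors of an image pyramid can be generated. The scale factor 0.709 and the 12-pixel receptive field are defaults commonly seen in MTCNN implementations; the helper below is illustrative, not the library's actual internals.

```python
def pyramid_scales(height, width, min_face_size=20, scale_factor=0.709):
    """Compute scale factors for an image pyramid.

    Each scale shrinks the image so that progressively larger faces map
    onto the fixed 12x12 input window of the first-stage CNN.
    """
    # Smallest ratio at which a min_face_size face fills 12 pixels
    m = 12.0 / min_face_size
    min_side = min(height, width) * m
    scales = []
    while min_side >= 12:
        scales.append(m * scale_factor ** len(scales))
        min_side *= scale_factor
    return scales

scales = pyramid_scales(800, 600)  # a descending list of scale factors
```

Each scale in the list corresponds to one level of the pyramid; the CNN scans every level, which is how faces of different sizes are found with a fixed-size detector window.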

A Python implementation of MTCNN is available on GitHub at https://github.com/ipazc/mtcnn. It is based on the paper Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks by Zhang et al. (2016).

Environment details: Conda, Python 3

pip install mtcnn            # the MTCNN face-detection library
pip install tensorflow       # TensorFlow backend required by mtcnn
pip install opencv-python    # OpenCV (cv2) for loading and drawing on images
pip install cv2-plt-imshow   # helper to display cv2 images in notebooks

Download any image to the server. I downloaded below image from wikipedia: https://upload.wikimedia.org/wikipedia/commons/thumb/8/8d/President_Barack_Obama.jpg/800px-President_Barack_Obama.jpg

Once you have loaded the image (note that cv2.imread returns BGR, so convert to RGB with cv2.cvtColor before passing it to MTCNN), you can create a detector object for face and landmark detection.

from mtcnn import MTCNN

detector = MTCNN()
faces = detector.detect_faces(image)  # image is an RGB numpy array
for face in faces:
    print(face)

This returns a list of dictionaries, one per detected face, each containing the face bounding box, a confidence score, and the keypoints (landmarks):

{'box': [178, 45, 118, 148], 'confidence': 0.9990620017051697, 'keypoints': {'left_eye': (213, 103), 'right_eye': (264, 104), 'nose': (237, 132), 'mouth_left': (209, 153), 'mouth_right': (266, 153)}}
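The 'box' field is [x, y, width, height], measured from the image's top-left corner. A quick sketch of how those four values slice the face out of the image array (the zero-filled array here is just a stand-in for a loaded photo):

```python
import numpy as np

face_result = {'box': [178, 45, 118, 148], 'confidence': 0.999}
x, y, w, h = face_result['box']

# Stand-in for an image loaded with cv2.imread (rows x cols x channels)
image = np.zeros((600, 800, 3), dtype=np.uint8)

# NumPy indexes rows first, so y comes before x in the slice
face_crop = image[y:y + h, x:x + w]
```

The resulting crop has shape (height, width, 3), i.e. (148, 118, 3) for the box above.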

Now, to visualise the detections, we will wrap these steps in a small function:

def create_bounding_box(image):
    faces = detector.detect_faces(image)
    bounding_box = faces[0]['box']        # only one face in our image
    keypoints = faces[0]['keypoints']

    cv2.rectangle(image,
                  (bounding_box[0], bounding_box[1]),
                  (bounding_box[0] + bounding_box[2], bounding_box[1] + bounding_box[3]),
                  (0, 155, 255), 2)

    cv2.circle(image, keypoints['left_eye'], 2, (0, 155, 255), 2)
    cv2.circle(image, keypoints['right_eye'], 2, (0, 155, 255), 2)
    cv2.circle(image, keypoints['mouth_left'], 2, (0, 155, 255), 2)
    cv2.circle(image, keypoints['mouth_right'], 2, (0, 155, 255), 2)
    cv2.circle(image, keypoints['nose'], 2, (0, 155, 255), 2)

    return image

Here cv2.rectangle comes from the OpenCV library, widely used for computer vision problems. It draws a rectangle between a start point and an end point:

cv2.rectangle(image, start_point, end_point, color, thickness)

cv2.rectangle(img, (x1, y1), (x2, y2), (255,0,0), 2)


x1,y1 ----------
|              |
|              |
|              |
 ----------x2,y2

Similarly, cv2.circle is used to draw a circle on any image.

image = cv2.circle(image, center_coordinates, radius, color, thickness)

image_with_markers = create_bounding_box(image) # method call

cv2_plt_imshow(image_with_markers) # to display the image with rectangles and circles

As you can see, the bounding box representing the face and the facial landmarks are drawn on the image.

As a next step, this base logic could be extended to match facial biometrics against other images using a similarity measure such as cosine distance, or to power an intruder-alert system that takes action based on the face detected by a camera. It can further be leveraged to build enterprise-scale use cases such as attendance systems, surveillance, and criminal investigation.
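To illustrate the matching idea, here is a minimal sketch of cosine similarity with NumPy. In a real system the vectors would be face embeddings produced by a recognition model (e.g. a FaceNet-style network); the 4-dimensional vectors and the 0.9 threshold below are purely illustrative.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors (1.0 = identical direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" standing in for real face-embedding vectors
probe = [0.1, 0.8, 0.3, 0.4]       # face captured at the camera
enrolled = [0.1, 0.7, 0.35, 0.45]  # same person, from the database
impostor = [0.9, 0.1, 0.05, 0.2]   # a different person

same = cosine_similarity(probe, enrolled)
diff = cosine_similarity(probe, impostor)

# A verification system accepts when similarity exceeds a tuned threshold
is_match = same > 0.9
```

The matched pair scores much higher than the impostor pair; choosing the acceptance threshold is a trade-off between false accepts and false rejects that must be tuned on real data.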
