This is a MicroPython binding for ESP-DL (Deep Learning) models that enables face detection, face recognition, human detection, and image classification on ESP32 devices.
I spent a lot of time and effort to make this. If you find this project useful, please consider donating to support my work.
- FaceDetector: Detects faces in images and provides bounding boxes and facial features
- FaceRecognizer: Recognizes enrolled faces and manages a face database
- HumanDetector: Detects people in images and provides bounding boxes
- ImageNet: Classifies images into predefined categories
Precompiled firmware images are available in two ways:
- In the Actions section, as artifacts of passed workflow runs
- By forking the repo and starting the action manually
- Clone the required repositories:
  git clone https://github.com/cnadler86/mp_esp_dl_models.git
  git clone https://github.com/cnadler86/micropython-camera-API.git
  git clone https://github.com/cnadler86/mp_jpeg.git
- Build the firmware: Make sure you have the complete ESP32 build environment for MicroPython available.
  cd boards/
  idf.py -D MICROPY_DIR=<micropython-dir> -D MICROPY_BOARD=<BOARD_NAME> -D MICROPY_BOARD_VARIANT=<BOARD_VARIANT> -B build-<your-build-name> build
  cd build-<your-build-name>
  python ~/micropython/ports/esp32/makeimg.py sdkconfig bootloader/bootloader.bin partition_table/partition-table.bin micropython.bin firmware.bin micropython.uf2
All models require input images in RGB888 format. You can use mp_jpeg to decode camera images to the correct format.
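Because RGB888 stores three bytes per pixel, checking the length of the decoded buffer can catch a wrong pixel format or image size before it reaches a model. The following is a minimal sketch: `expected_rgb888_size` and `check_framebuffer` are hypothetical helper names, not part of this binding or of mp_jpeg, and the sketch assumes the decoded framebuffer supports `len()`.

```python
def expected_rgb888_size(width, height):
    # RGB888 uses one byte each for red, green and blue
    return width * height * 3


def check_framebuffer(framebuffer, width=320, height=240):
    # Hypothetical helper: raise early if the buffer length does not
    # match the width and height the model was created with
    expected = expected_rgb888_size(width, height)
    if len(framebuffer) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(framebuffer)))
    return True


# A 320x240 RGB888 frame needs 320 * 240 * 3 bytes
print(expected_rgb888_size(320, 240))
```

The same arithmetic gives a rough idea of how much RAM a single frame occupies at a given resolution.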
The FaceDetector module detects faces in images and can optionally provide facial feature points.
FaceDetector(width=320, height=240, features=True)
Parameters:
- width (int, optional): Input image width. Default: 320
- height (int, optional): Input image height. Default: 240
- features (bool, optional): Whether to return facial feature points. Default: True
- run(framebuffer)
  Detects faces in the provided image.
  Parameters:
  - framebuffer: RGB888 image data (required)
  Returns: List of dictionaries with detection results, each containing:
  - score: Detection confidence (float)
  - box: Bounding box coordinates [x1, y1, x2, y2]
  - features: Facial feature points [(x,y) coordinates for: left eye, right eye, nose, left mouth, right mouth] if enabled, None otherwise
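The result list described above is plain Python data, so it can be post-processed without touching the model. The sketch below picks the most confident detection and computes the center of its bounding box. `best_detection` and `box_center` are hypothetical helpers, and the sample dictionaries are made-up values that only mirror the documented `score`, `box`, and `features` keys.

```python
def best_detection(results):
    # Return the detection with the highest score, or None for an empty list
    best = None
    for det in results:
        if best is None or det["score"] > best["score"]:
            best = det
    return best


def box_center(box):
    # box is [x1, y1, x2, y2]; integer division keeps pixel coordinates
    x1, y1, x2, y2 = box
    return ((x1 + x2) // 2, (y1 + y2) // 2)


# Made-up sample data in the documented result shape
sample = [
    {"score": 0.6, "box": [10, 20, 50, 80], "features": None},
    {"score": 0.9, "box": [100, 40, 180, 140], "features": None},
]
face = best_detection(sample)
if face:
    print(face["score"], box_center(face["box"]))
```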
The FaceRecognizer module manages a database of faces and can recognize previously enrolled faces.
FaceRecognizer(width=320, height=240, db_path="face.db")
Parameters:
- width (int, optional): Input image width. Default: 320
- height (int, optional): Input image height. Default: 240
- db_path (str, optional): Path to the face database file. Default: "face.db"
- run(framebuffer)
  Detects and recognizes faces in the provided image.
  Parameters:
  - framebuffer: RGB888 image data (required)
  Returns: List of dictionaries with recognition results, each containing:
  - score: Detection confidence
  - box: Bounding box coordinates [x1, y1, x2, y2]
  - features: Facial feature points (if enabled)
  - person: Recognition result containing:
    - id: Face ID
    - similarity: Match confidence (0-1)
    - name: Person name (if provided during enrollment)
- enroll(framebuffer, validate=False, name=None)
  Enrolls a new face in the database.
  Parameters:
  - framebuffer: RGB888 image data
  - validate (bool, optional): Check if face is already enrolled. Default: False
  - name (str, optional): Name to associate with the face. Default: None
  Returns: ID of the enrolled face
- delete_face(id)
  Deletes a face from the database.
  Parameters:
  - id (int): ID of the face to delete
- print_database()
  Prints the contents of the face database.
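Since `similarity` is reported on a 0-1 scale, an application will usually want to accept only matches above some cutoff. The sketch below shows one way to do that on the list returned by `run()`. It is an illustration under assumptions: `recognized_names` is a hypothetical helper, the 0.6 cutoff is an arbitrary example value rather than a recommended setting, and the code assumes `person` is falsy for unrecognized faces and `name` is None when no name was given at enrollment.

```python
def recognized_names(results, min_similarity=0.6):
    # Collect a label for every face whose match is at or above the cutoff.
    # min_similarity=0.6 is an arbitrary example value, not a tuned threshold.
    names = []
    for face in results:
        person = face.get("person")
        if person and person["similarity"] >= min_similarity:
            # Fall back to the numeric ID when no name was enrolled
            names.append(person["name"] or "id %d" % person["id"])
    return names


# Made-up sample data in the documented result shape
sample = [
    {"person": {"id": 1, "similarity": 0.82, "name": "John"}},
    {"person": None},
    {"person": {"id": 2, "similarity": 0.4, "name": None}},
]
print(recognized_names(sample))
```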
The HumanDetector module detects people in images.
HumanDetector(width=320, height=240)
Parameters:
- width (int, optional): Input image width. Default: 320
- height (int, optional): Input image height. Default: 240
- run(framebuffer)
  Detects people in the provided image.
  Parameters:
  - framebuffer: RGB888 image data
  Returns: List of dictionaries with detection results, each containing:
  - score: Detection confidence
  - box: Bounding box coordinates [x1, y1, x2, y2]
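Drawing or tracking code often expects a box as origin plus size rather than two corners. As a rough sketch, the helpers below convert the documented `[x1, y1, x2, y2]` box and drop low-confidence detections. `to_xywh` and `confident_people` are hypothetical names, and the 0.5 score cutoff is only an example value.

```python
def to_xywh(box):
    # Convert [x1, y1, x2, y2] corners to (x, y, width, height)
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def confident_people(results, min_score=0.5):
    # Keep detections at or above min_score as (score, xywh) tuples.
    # min_score=0.5 is an example value, not a recommendation.
    return [(r["score"], to_xywh(r["box"])) for r in results if r["score"] >= min_score]


# Made-up sample data in the documented result shape
sample = [
    {"score": 0.9, "box": [10, 20, 110, 220]},
    {"score": 0.3, "box": [0, 0, 5, 5]},
]
print(confident_people(sample))
```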
The ImageNet module classifies images into predefined categories.
ImageNet(width=320, height=240)
Parameters:
- width (int, optional): Input image width. Default: 320
- height (int, optional): Input image height. Default: 240
- run(framebuffer)
  Classifies the provided image.
  Parameters:
  - framebuffer: RGB888 image data
  Returns: List alternating between class names and confidence scores:
  [class1, score1, class2, score2, ...]
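The alternating list is easier to work with once it is grouped into (class, score) pairs. The sketch below shows one way to do that; `pair_classes` and `top_class` are hypothetical helpers, and the class names in the sample are invented for illustration.

```python
def pair_classes(flat):
    # Turn [class1, score1, class2, score2, ...] into [(class1, score1), ...]
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat) - 1, 2)]


def top_class(flat):
    # Return the (class, score) pair with the highest score, or None if empty
    best = None
    for pair in pair_classes(flat):
        if best is None or pair[1] > best[1]:
            best = pair
    return best


# Invented sample in the documented alternating layout
sample = ["tabby cat", 0.71, "tiger cat", 0.18, "lynx", 0.05]
print(pair_classes(sample))
print(top_class(sample))
```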
# Face detection example
from espdl import FaceDetector
import camera
from jpeg import Decoder

# Initialize components
cam = camera.Camera()
decoder = Decoder()
face_detector = FaceDetector()

# Capture and process image
img = cam.capture()
framebuffer = decoder.decode(img)  # Convert to RGB888
results = face_detector.run(framebuffer)

if results:
    for face in results:
        print(f"Face detected with confidence: {face['score']}")
        print(f"Bounding box: {face['box']}")
        if face['features']:
            print(f"Facial features: {face['features']}")

# Face recognition example
from espdl import FaceRecognizer
import camera
from jpeg import Decoder

# Initialize components
cam = camera.Camera()
decoder = Decoder()
recognizer = FaceRecognizer(db_path="/faces.db")

# Enroll a face
img = cam.capture()
framebuffer = decoder.decode(img)
face_id = recognizer.enroll(framebuffer, name="John")
print(f"Enrolled face with ID: {face_id}")

# Later, recognize faces
img = cam.capture()
framebuffer = decoder.decode(img)
results = recognizer.run(framebuffer)

if results:
    for face in results:
        if face['person']:
            print(f"Recognized {face['person']['name']} (ID: {face['person']['id']})")
            print(f"Similarity: {face['person']['similarity']}")

The following table shows the frames per second (fps) for different image sizes and models. The results are based on a test with a 2MP camera and an ESP32-S3.
Frame Size | FaceDetector (fps) | HumanDetector (fps) |
---|---|---|
QQVGA | 14.5 | 6.6 |
R128x128 | 21 | 6.6 |
QCIF | 19.7 | 6.5 |
HQVGA | 18 | 6.3 |
R240X240 | 16.7 | 6.1 |
QVGA | 15.2 | 6.6 |
CIF | 13 | 5.5 |
HVGA | 11.9 | 5.3 |
VGA | 8.2 | 4.4 |
SVGA | 6.2 | 3.8 |
XGA | 4.1 | 2.8 |
HD | 3.6 | 2.6 |
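When budgeting a processing loop, per-frame latency is often more convenient than fps. The sketch below converts a few FaceDetector figures from the table above into milliseconds per frame. `frame_time_ms` is a hypothetical helper, and only the fps values themselves come from the table.

```python
def frame_time_ms(fps):
    # Time spent on one frame, in milliseconds, rounded to one decimal
    return round(1000.0 / fps, 1)


# FaceDetector fps values copied from the table above
FACE_DETECTOR_FPS = {"QVGA": 15.2, "VGA": 8.2, "HD": 3.6}

for name, fps in FACE_DETECTOR_FPS.items():
    print(name, frame_time_ms(fps), "ms per frame")
```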
- Image Format: Always ensure input images are in RGB888 format. Use mp_jpeg for JPEG decoding from camera.
- Memory Management:
  - Close/delete detector objects when no longer needed
  - Consider memory constraints when choosing image dimensions
- Face Recognition:
  - Enroll faces in good lighting conditions
  - Multiple enrollments of the same person can improve recognition
  - Use validate=True during enrollment to avoid duplicates
- Storage:
  - Face database is persistent across reboots
  - Consider backing up the face database file
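One simple way to back up the face database is to copy the file in small chunks, which keeps RAM use low on the device. The following is a minimal sketch rather than a feature of this binding: `backup_file` is a hypothetical helper, the paths in the usage note are examples, and it assumes the recognizer is not writing to the database while the copy runs.

```python
def backup_file(src, dst, chunk_size=512):
    # Copy src to dst in small chunks and return the number of bytes copied.
    # chunk_size=512 is an example value chosen to keep RAM use low.
    total = 0
    with open(src, "rb") as fin:
        with open(dst, "wb") as fout:
            while True:
                chunk = fin.read(chunk_size)
                if not chunk:
                    break
                fout.write(chunk)
                total += len(chunk)
    return total
```

For example, `backup_file("/faces.db", "/faces.db.bak")` would copy the database used in the recognition example above to a second file on the same filesystem.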