GitHub - OcularEngineering/notebooks: Learn how to use Ocular Foundry to fine-tune or train powerful state-of-the-art models like YOLOv11 for real-time object detection, SAM 2 for image segmentation, Florence-2 for visual reasoning tasks, and PaliGemma 2 for multimodal learning.

Ocular AI: Data Engine for The Multimodal AI Era

This repository is a growing collection of tutorial notebooks for computer vision and multimodal AI. Learn how to use state-of-the-art models such as YOLOv11 for real-time object detection, SAM 2 for image segmentation, Florence-2 for visual reasoning tasks, PaliGemma 2 for multimodal learning, and Qwen2.5-VL for video-language tasks. The tutorials cover applications including object detection, image and video segmentation, pose estimation, data extraction, and optical character recognition (OCR).

We keep this repository up to date and add new notebooks regularly to cover emerging techniques and use cases. Contributions from the community are welcome: feel free to submit your own tutorials, improvements, or ideas to help grow this resource.

Notebooks

Notebook Title	Colab Link	Resources	Publisher's Paper & Repo
Fine-tuning YOLOv11 for Object Detection		Ocular Blog	YOLOv11 Paper, YOLOv11 Repository

Folders and files

assets
automation
notebooks
README.md
