👓 Object Understanding with Vision Language Models

Explore object detection, visual grounding, keypoint detection, and object counting through natural language prompts.

Powered by Qwen3-VL 4B and Moondream 3 Preview. Inspired by the PyImageSearch tutorial "Object Detection and Visual Grounding with Qwen 2.5". Moondream 3 uses moondream-preview, selecting detect for categories with "Object Detection", point for those with "Keypoint Detection", and reasoning-based querying for all others.
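
The Moondream routing rule described above can be sketched as a small lookup. This is a minimal illustration, not the demo's actual code: the function name select_moondream_method is hypothetical, the substring match on the category label is an assumption about what "categories with" means, and the returned strings only name the method to call rather than calling any Moondream API.

```python
def select_moondream_method(category: str) -> str:
    """Return the name of the Moondream method to use for a task category.

    Hypothetical helper: the substring matching is an assumption, and the
    return value is only a method name, not a call into the model.
    """
    if "Object Detection" in category:
        return "detect"
    if "Keypoint Detection" in category:
        return "point"
    # Every other category falls back to reasoning-based querying.
    return "query"


if __name__ == "__main__":
    # "Visual Grounding" and "Object Counting" are assumed category labels,
    # taken from the task list in the description.
    for label in ("Object Detection", "Keypoint Detection",
                  "Visual Grounding", "Object Counting"):
        print(label, "->", select_moondream_method(label))
```

Keeping the rule in one function means a new category only needs a new branch, and anything unrecognized still reaches the query fallback.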

Qwen/Qwen3-VL-4B-Instruct

moondream/moondream3-preview
