👓 Object Understanding with Vision Language Models

Powered by Qwen3-VL 4B and Moondream 3 Preview. Inspired by the PyImageSearch tutorial "Object Detection and Visual Grounding with Qwen 2.5". Moondream 3 uses moondream-preview, selecting detect for categories with "Object Detection", point for those with "Keypoint Detection", and reasoning-based querying for all others.
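The routing rule described above can be sketched as a small dispatch function. This is only an illustration of the stated rule, not the demo's actual code: the function name select_moondream_skill is hypothetical, and matching on a substring of the category label is an assumption about what "categories with" means.

```python
def select_moondream_skill(category: str) -> str:
    """Pick a Moondream skill name for a task category label.

    Hypothetical helper: the substring matching below is an assumption,
    since the page does not list the exact category strings.
    """
    if "Object Detection" in category:
        return "detect"   # bounding-box style output
    if "Keypoint Detection" in category:
        return "point"    # point-coordinate style output
    return "query"        # reasoning-based querying for everything else


if __name__ == "__main__":
    for label in ("Object Detection", "Keypoint Detection", "Image Captioning"):
        print(label, "->", select_moondream_skill(label))
```

Checking "Object Detection" before "Keypoint Detection" only matters if a label could contain both phrases; with distinct labels the order makes no difference.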

Input Image

Select Task Category

Query, Caption, Point, Detect

Prompt

Annotated Image

Text Output

Annotated Image

Text Output
