MEVZU N°128 · 08.05.2026 · ISTANBUL · YEAR I — VOL. III

MEVZU N° TAG / VOL. 102

#multimodal

0 blog · 0 news · 3 wiki

§03

Wiki


VLM — Vision-Language Model

A model that takes images and text together as input and generates text as output.

EN: VLM (Vision-Language Model)
TR: VLM — Görü-Dil Modeli
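One common design, though not the only one, lets a VLM read images and text together. It splits the image into patches, maps each patch to a vector the same size as a text-token embedding, and places the image vectors in front of the text embeddings as a single sequence for a language-model decoder. The sketch below shows only that sequence-building step, using the Python standard library. Everything in it is a hypothetical stand-in: the grayscale image given as a list of pixel rows, the `EMBED_DIM` and `PATCH` sizes, the tiny vocabulary, and a random linear projection that takes the place of a trained vision encoder.

```python
"""Toy sketch: building one VLM input sequence from an image and a text prompt.

Hypothetical assumptions (illustration only): grayscale image as a list of
pixel rows, a random linear projection instead of a trained vision encoder,
and a tiny made-up vocabulary with random token embeddings.
"""
import random

EMBED_DIM = 4  # hypothetical shared embedding width for image and text vectors
PATCH = 2      # hypothetical patch size: PATCH x PATCH pixels per patch


def patchify(image, patch=PATCH):
    """Split an H x W image into non-overlapping patch x patch blocks, each flattened row-major."""
    h, w = len(image), len(image[0])
    patches = []
    # Edge pixels that do not fill a whole patch are dropped.
    for r in range(0, h - h % patch, patch):
        for c in range(0, w - w % patch, patch):
            patches.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return patches


def make_projection(in_dim, out_dim, seed=0):
    """Random out_dim x in_dim weight matrix (stand-in for a learned projector)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(vec, weights):
    """Linear map: each output element is the dot product of one weight row with vec."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def embed_text(tokens, vocab, table):
    """Look up one embedding row per token."""
    return [table[vocab[t]] for t in tokens]


def build_sequence(image, tokens, vocab, table, weights):
    """Image patch embeddings first, then text token embeddings, as one sequence."""
    image_embeds = [project(p, weights) for p in patchify(image)]
    text_embeds = embed_text(tokens, vocab, table)
    return image_embeds + text_embeds


if __name__ == "__main__":
    vocab = {"what": 0, "is": 1, "this": 2, "?": 3}
    rng = random.Random(1)
    table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in vocab]
    image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]  # 4 x 4 image
    weights = make_projection(PATCH * PATCH, EMBED_DIM)
    seq = build_sequence(image, ["what", "is", "this", "?"], vocab, table, weights)
    print(len(seq), len(seq[0]))  # 4 image patches + 4 text tokens, each of width EMBED_DIM
```

Because both kinds of vector share the width `EMBED_DIM`, the decoder can attend across image and text positions in the same way. In a real model, the combined sequence would then pass through a transformer that generates the text response. This sketch stops before that step.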

Multimodal

Models that can understand or generate more than one modality, such as text, images, audio, or video.

EN: Multimodal
TR: Çok-Modlu

MLLM — Multimodal LLM

A large language model extended to process other modalities as well, such as images, audio, or video.

EN: MLLM (Multimodal LLM)
TR: MLLM — Çok-Modlu LLM