Words and their images in 100 languages

View the Project on GitHub penn-nlp/mmid

The Massively Multilingual Image Dataset (MMID)

MMID is a large-scale, massively multilingual dataset of images paired with the words they represent collected at the University of Pennsylvania. The dataset is doubly parallel: for each language, words are stored parallel to images that represent the word, and parallel to the word’s translation into English (and corresponding images.)

By far the largest dataset of its kind, with 100 languages and up to 10,000 words per language, it is useful for evaluating image-based translation methods.

Getting Started

See the documentation page

If you have questions, please email the MMID users list,


CNN features and web text for the 30 languages evaluated on in the paper are up.

We’re happy to announce that MMID will be available via the Amazon Public Datasets program! Through their generosity, we’ll soon be able to provide all of MMID free of charge via a public S3 bucket.

Check out the downloads page for options on how to access the dataset.


If you use this dataset for your research, please cite:

Learning Translations via Images with a Massively Multilingual Image Dataset.
John Hewitt*, Daphne Ippolito*, Brendan Callahan, Reno Kriz, Derry Tanti Wijaya and Chris Callison-Burch.
ACL 2018.

  author    = {Hewitt, John  and  Ippolito, Daphne  and  Callahan, Brendan and Kriz, Reno and Wijaya, Derry Tanti and Callison-Burch, Chris},
  title     = {Learning Translations via Images with a Massively Multilingual Image Dataset},
  booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month     = {July},
  year      = {2018},
  address   = {Melbourne, Australia},
  publisher = {Association for Computational Linguistics}