Dall-E: Can you draw me a picture?

Jan 6, 2021

OpenAI has continued to push its AI research forward despite the challenges it faces. After the commercialization of GPT-3, everyone was waiting for OpenAI's next step.

What is Dall-E?

Dall-E is a 12-billion-parameter version of the GPT-3 model, trained to generate images from text descriptions. It is a unique combination of an artist and a robot, and its name reflects that: it combines the name of the artist Salvador Dalí with that of WALL·E, Pixar’s famous and probably most loved robot.

What can Dall-E do?

Dall-E is a trained model that can generate images based on the text you provide. To illustrate, OpenAI has provided some examples of the capabilities of its new model.

Ex: An armchair in the shape of an avocado

Or, let's say:

A stained glass window with an image of a blue strawberry

OpenAI claims that DALL·E is able to create plausible images for a great variety of sentences that explore the compositional structure of language.

Is it just an image search?

No. A transformer model is trained on a large dataset, which in this case consists of images paired with text. The model does not just recognize the objects in an image; it can also combine multiple objects and intelligently put the missing pieces together.

As an example, given the description “a painting of a fox sitting in a field during winter,” the model was able to determine that a shadow was needed.

“Unlike a 3D rendering engine, whose inputs must be specified unambiguously and in complete detail, DALL·E is often able to ‘fill in the blanks’ when the caption implies that the image must contain a certain detail that is not explicitly stated,” according to the OpenAI team.

So what does it mean in terms of advancement in AI technology?

With GPT-3, OpenAI achieved a remarkable leap in comprehending text and generating text as an output or response. Trained on a very large dataset, it answered questions and predicted text with remarkable accuracy. OpenAI has now gone further and developed a model that generates images from the text provided. This result shows that AI can now manipulate visual concepts through language.

It means that we will be able to generate pictures and images from just a simple description.

So, possibly in the near future, you can tell Alexa or Google to create a specific image and boom! You have it in your mailbox or on your phone.

“Alexa/Google/Dall-E, can you create an image of a collection of glasses on a table?”

Conclusion

Any advancement in AI is welcomed with enormous enthusiasm in today's world. We are in an age of exponential growth in AI, and OpenAI’s Dall-E is an amazing tool developed in that direction.