Built on top of the revolutionary GPT-3 model, DALL-E can generate breathtaking images from mere text descriptions. Discover how!
OpenAI’s DALL-E can create images from text descriptions!
OpenAI, a San Francisco-based artificial intelligence research company co-founded by Elon Musk, has created a program called DALL-E (a combination of Dalí and WALL-E) that can produce plausible images from text descriptions. The program uses an autoregressive language model with 12 billion parameters to create virtually any image.
The technology DALL-E uses to understand language, Generative Pre-trained Transformer 3 (GPT-3), was also created by OpenAI. In short, the program creates illustrations, paintings, photos, renderings, sketches – anything that can be described in words.
OpenAI’s article about DALL-E, for example, features “an armchair in the shape of an avocado” with several renderings as a result. As expected, many of the results are good, but occasionally some are less convincing.
The OpenAI description of DALL-E is as follows: “The DALL-E model transforms graphics and text into one single stream of data containing up to 1280 tokens. The model is then trained using maximum likelihood to generate each token sequentially.”
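The quoted description boils down to two ideas: text tokens and image tokens are concatenated into one sequence, and the model is trained to predict each next token (maximum likelihood). The snippet below is a toy illustration of that objective only, not OpenAI’s code: the vocabulary size, the pretend tokens, and the random bigram lookup table are stand-ins for DALL-E’s real tokenisers and its 12-billion-parameter transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 16                      # stand-in; DALL-E uses far larger text/image vocabularies
text_tokens = [1, 4, 2]              # pretend-tokenised caption
image_tokens = [9, 9, 12, 7]         # pretend-discretised image content
stream = text_tokens + image_tokens  # one single stream (capped at 1280 tokens in DALL-E)

# A random "model": next-token logits conditioned on the current token.
# In DALL-E this conditional distribution comes from a transformer instead.
logits = rng.normal(size=(VOCAB_SIZE, VOCAB_SIZE))

def negative_log_likelihood(seq, logits):
    """Sum of -log p(next token | current token) over the whole stream."""
    nll = 0.0
    for cur, nxt in zip(seq[:-1], seq[1:]):
        # softmax over the vocabulary for the current position
        probs = np.exp(logits[cur] - logits[cur].max())
        probs /= probs.sum()
        nll -= np.log(probs[nxt])
    return nll

loss = negative_log_likelihood(stream, logits)
print(f"maximum-likelihood training minimises this loss: {loss:.2f}")
```

Training “using maximum likelihood to generate each token sequentially” means adjusting the model’s parameters to shrink exactly this kind of loss over a huge dataset of caption–image pairs.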
“We find that DALL·E is able to create images for a great variety of sentences that explore the compositional structure of language, including creating anthropomorphised versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.” – OpenAI.
OpenAI found that DALL-E is able to render the same scene in a variety of styles and can adapt lighting, shadows and ambience based on the time of day or season, even when these details are not specified.
DALL-E: What can it do?
● It can change the attributes of objects and the number of times an object appears in an image:
Text Prompt: “A collection of glasses is sitting on a table”.
● It can draw multiple objects simultaneously and control their spatial relationship:
Text Prompt: “a small red block sitting on a large green block”.
● Scenes can be controlled both in terms of their viewpoint and the way they are rendered in 3D:
Text Prompt: “an extreme close-up view of a capybara sitting in a field”.
● It can visualise the internal and external structure of an object:
Text Prompt: “a cross-section view of a walnut”.
● It can infer contextual details:
Text Prompt: “a store front that has the word ‘openai’ written on it”.
● It can create fashion and interior design pieces based on the description provided:
Text Prompt: “a female mannequin dressed in a black leather jacket and gold pleated skirt”.
● It can combine unrelated concepts and create realistic objects:
Text Prompt: “a snail made of a harp”.
● It can create illustrations of anthropomorphised animals and plants based on the description provided:
Text Prompt: “an illustration of a baby daikon radish in a tutu walking a dog”.
● It can conduct zero-shot visual reasoning:
Text Prompt: “the exact same cat on the top as a sketch on the bottom”.
● It can reason about geographical facts, landmarks and neighbourhoods:
Text Prompt: “a photo of the food of china”.
● It can reason with time information and use its temporal knowledge:
Text Prompt: “a photo of a phone from the 20s”.
This article gave a quick overview of OpenAI’s new model, DALL-E. Using text prompts, DALL-E appears able to perform a wide range of image-generation and classification tasks. While there are several other solutions on the market, OpenAI goes a step further and is leading the way with its state-of-the-art capabilities.
What DALL-E does is not entirely new, but OpenAI’s program works remarkably well and handles input variations with impressive success. Obviously, a generated image of an animal will not have the same quality or sharpness as a genuine image captured by a digital camera – but this too is likely to change in the near future.
“Unlike a 3D rendering engine, whose inputs must be specified unambiguously and in full detail, DALL-E is often able to ‘fill in the blanks’ when the caption implies that the image must contain a certain detail that is not explicitly stated.” – OpenAI.
If you want to learn about the latest AI technology, business hacks and behavioural insights that you can apply immediately to your strategy, you should definitely follow my page.