Generative artificial intelligence (generative AI, GenAI, or GAI) is artificial intelligence capable of generating text, images, or other data using generative models, often in response to prompts. Generative AI models learn the patterns and structure of their training data and then generate new data with similar characteristics. Outputs can include text, images, music, speech, and other forms of media. What sets generative AI apart is its ability to learn from a vast dataset and then use that learned information to create new, original content that was not explicitly programmed into it.
The most common types of generative AI models include:
- Generative Adversarial Networks (GANs): Introduced by Ian Goodfellow and his colleagues in 2014, GANs consist of two networks: a generator that produces new data instances and a discriminator that evaluates their authenticity, i.e., whether they look like real data. The two are trained simultaneously in a competitive manner, each improving the other, until the generated instances are indistinguishable from real ones (a minimal training-loop sketch follows this list).
- Variational Autoencoders (VAEs): These are generative models that learn a latent representation of the input data and can generate new data from that representation. They are particularly useful for tasks like image generation, where they learn to encode images into a lower-dimensional latent space and then decode points in that space back into new images (see the VAE sketch below).
- Transformers: Although initially designed for natural language processing tasks, transformer models such as OpenAI's GPT (Generative Pre-trained Transformer) series have shown remarkable generative capabilities. They can generate highly coherent, contextually relevant text from an input prompt, making them useful for applications like chatbots, content creation, and more (see the usage sketch below).
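To make the adversarial training described above concrete, here is a minimal sketch of a GAN training loop in PyTorch. The network sizes, learning rates, and the toy 2-D "real" distribution are illustrative assumptions, not any published architecture.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2

# Generator: maps random noise vectors to candidate data instances.
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
# Discriminator: outputs the probability that an instance is real.
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0  # stand-in "real" data (assumption)
    fake = G(torch.randn(64, latent_dim))         # generated data instances

    # Discriminator step: learn to label real data 1 and generated data 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: adjust G so the discriminator scores its output as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```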
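Similarly, a compact sketch of a VAE's encode/sample/decode cycle, assuming flattened 28x28 images; the layer sizes and single-sample reparameterization are illustrative choices.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, data_dim=784, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)      # mean of the latent Gaussian
        self.to_logvar = nn.Linear(128, latent_dim)  # log-variance of the latent Gaussian
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, data_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample the latent code differentiably.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

vae = VAE()
x = torch.rand(16, 784)  # stand-in batch of flattened images (assumption)
recon, mu, logvar = vae(x)
# Loss: reconstruction error plus a KL term pulling the latent toward N(0, I).
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.binary_cross_entropy(recon, x, reduction="sum") + kl
# New images are generated by decoding samples drawn from the prior.
samples = vae.decoder(torch.randn(4, 8))
```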
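Finally, a brief usage sketch of prompt-driven text generation with a pre-trained transformer, via the Hugging Face `transformers` library; the small public GPT-2 checkpoint is one choice among many causal language models.

```python
# Requires: pip install transformers torch
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Generative AI is", max_new_tokens=30)
print(result[0]["generated_text"])
```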
The applications of generative AI are diverse and growing. In content creation, these models can produce realistic images, write stories or articles, compose music, and even generate synthetic human voices. In science, generative AI is being used to create new molecular structures for drug discovery. In design, it aids in creating virtual environments, fashion items, and architectural models.
Generative AI raises important ethical considerations, particularly concerning authenticity, copyright, and the potential for misuse, such as creating deepfakes. As such, the development and deployment of generative AI technologies are accompanied by discussions on guidelines and regulations to ensure they are used responsibly.
History
The academic discipline of artificial intelligence was established at a research workshop held at Dartmouth College in 1956 and has experienced several waves of advancement and optimism in the decades since. Since its inception, researchers in the field have raised philosophical and ethical arguments about the nature of the human mind and the consequences of creating artificial beings with human-like intelligence; these issues have previously been explored by myth, fiction and philosophy since antiquity. The concept of automated art dates back at least to the automata of ancient Greek civilization, where inventors such as Daedalus and Hero of Alexandria were described as having designed machines capable of writing text, generating sounds, and playing music. The tradition of creative automatons has flourished throughout history, exemplified by Maillardet’s automaton created in the early 1800s.
Artificial intelligence is an idea that has captivated society since the mid-20th century. Science fiction first familiarized the world with the concept, but it was not examined scientifically until Alan Turing, a polymath, investigated its feasibility. Turing's groundbreaking 1950 paper, "Computing Machinery and Intelligence," posed fundamental questions about whether machines could reason in a manner similar to human intelligence, significantly contributing to the conceptual groundwork of AI. Progress was slow at first because of high costs and the limited capabilities of early computers. This changed with the 1956 Dartmouth Summer Research Project on AI, whose inspiring call for AI research set the precedent for two decades of rapid advancement in the field.
Since the founding of AI in the 1950s, artists and researchers have used artificial intelligence to create artistic works. By the early 1970s, Harold Cohen was creating and exhibiting generative AI works created by AARON, the computer program Cohen created to generate paintings.
Markov chains have been used to model natural language since their development by the Russian mathematician Andrey Markov in the early 20th century. Markov published his first paper on the topic in 1906 and later analyzed the pattern of vowels and consonants in the novel Eugene Onegin using Markov chains. Once a Markov chain is trained on a text corpus, it can be used as a probabilistic text generator.
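A minimal sketch of such a probabilistic text generator, using a word-level chain over a toy corpus (Markov's own analysis was letter-level; the word-level formulation here is an illustrative simplification):

```python
import random
from collections import defaultdict

def train(corpus: str) -> dict:
    """Map each word to the list of words observed to follow it."""
    words = corpus.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain: dict, start: str, length: int = 12) -> str:
    """Walk the chain, sampling each next word with its observed frequency."""
    word, output = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

chain = train("the cat sat on the mat and the dog sat on the rug")
print(generate(chain, "the"))
```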
The field of machine learning often uses statistical models, including generative models, to model and predict data. Beginning in the late 2000s, the emergence of deep learning drove progress and research in image classification, speech recognition, natural language processing and other tasks. Neural networks in this era were typically trained as discriminative models, due to the difficulty of generative modeling.
In 2014, advancements such as the variational autoencoder and generative adversarial network produced the first practical deep neural networks capable of learning generative models, as opposed to discriminative ones, for complex data such as images. These deep generative models were the first to output not only class labels for images but also entire images.
In 2017, the Transformer network enabled advancements in generative models over the older long short-term memory (LSTM) models, leading to the first generative pre-trained transformer (GPT), known as GPT-1, in 2018. This was followed in 2019 by GPT-2, which demonstrated the ability to generalize unsupervised to many different tasks as a foundation model.
In 2021, the release of DALL-E, a transformer-based pixel generative model, followed by Midjourney and Stable Diffusion marked the emergence of practical high-quality artificial intelligence art from natural language prompts.
In March 2023, GPT-4 was released. A team from Microsoft Research argued that “it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system”. Other scholars have disputed that GPT-4 reaches this threshold, calling generative AI “still far from reaching the benchmark of ‘general human intelligence’” as of 2023. In 2023, Meta released an AI model called ImageBind which combines data from text, images, video, thermal data, 3D data, audio, and motion which is expected to allow for more immersive generative AI content.
Modalities
A generative AI system is constructed by applying unsupervised or self-supervised machine learning to a data set. The capabilities of a generative AI system depend on the modality or type of the data set used.
Generative AI can be either unimodal or multimodal; unimodal systems take only one type of input, whereas multimodal systems can take more than one type of input. For example, one version of OpenAI’s GPT-4 accepts both text and image inputs.
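As an illustration of the self-supervised setup for the text modality, the sketch below trains a toy next-token predictor in PyTorch: the training labels are taken from the data stream itself, so no human annotation is needed. The bigram model and random token stream are stand-ins, not a real system.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32
# Toy bigram model: predicts the next token from the current one (assumption;
# real systems use deep networks such as transformers over long contexts).
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                      nn.Linear(embed_dim, vocab_size))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (1000,))  # stand-in tokenized corpus

for step in range(100):
    i = torch.randint(0, len(tokens) - 1, (64,))
    inputs, targets = tokens[i], tokens[i + 1]  # targets come from the data itself
    loss = loss_fn(model(inputs), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
```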
Text
Code
Images
Audio
Music
Video
[Video: Runway Gen2, prompt: "A golden retriever in a suit sitting at a podium giving a speech to the White House press corps"]