What is Generative AI and How Does It Work in 2024

Artificial Intelligence (AI) and Generative AI are everywhere these days – from the news to LinkedIn and even discussions at the local pub. Everyone has their own opinion or prediction about it. Many predict (or at least hope) that it will revolutionize how we live, work, and interact. But what exactly is it, and why is there so much hype around it at the moment?

At its core, AI is a broad term for machines or software that aim to mimic human intelligence: to learn, think, perceive, reason, communicate, and make decisions like humans. This evolving technology is commonly divided into three categories: narrow AI, designed for a specific task such as speech recognition; general AI, which could perform any intellectual task that a human can do; and superintelligent AI, which would surpass human capabilities in most economically valuable jobs.

Within this broad definition of AI, one specific subset is currently all over the media: so-called Generative AI, which can generate deceptively realistic text, images, and other content. This article focuses on what Generative AI is, how it works, and which notable examples demonstrate its potential.

What is Generative AI?

Generative AI is a subset of Artificial Intelligence that has established itself as a distinct field. It is a set of trained AI models and techniques that use statistical methods to generate content based on learned probabilities. These AI systems learn to imitate (importantly: imitate, not understand and apply) the data they are trained on and then produce similar content, which is not the same as producing facts.

Unlike Discriminative AI, which classifies input content into predefined categories (e.g., spam filters), Generative AI produces new, synthetic data that reflects the training data.

Generative AI is built on machine learning and, in particular, deep learning. Machine learning uses algorithms that learn from data and use what they have learned to make decisions or predictions. Deep learning is a subset of machine learning that uses so-called neural networks with multiple layers.

Each layer consists of nodes that behave a little like neurons connected by synapses in the brain: each one fires with a certain probability. So when a word like “Great” appears, various nodes signal that “Britain” or “Wall” might follow with a certain probability. The more context is given, the better the prediction becomes. If London, the Queen, and the Union Jack appear somewhere nearby, it is highly probable that the phrase is “Great Britain” and not “The Great Wall”.
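The “Great Britain” versus “Great Wall” intuition can be sketched with plain word counts. The tiny corpus and the helper name next_word_probs below are invented for illustration; real generative models learn such probabilities with neural networks over vastly more data rather than by counting.

```python
from collections import Counter

# Toy corpus, invented purely for illustration.
corpus = [
    "london is the capital of great britain",
    "the queen lives in great britain",
    "the union jack is the flag of great britain",
    "tourists in china visit the great wall",
    "the great wall is in china",
]

def next_word_probs(sentences, target="great", context=frozenset()):
    """Estimate P(next word | target word, context words) by counting.

    Only sentences that contain every context word are taken into account.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if not set(context).issubset(words):
            continue
        # Count which word directly follows the target word.
        for current, following in zip(words, words[1:]):
            if current == target:
                counts[following] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}

no_context = next_word_probs(corpus)
with_context = next_word_probs(corpus, context={"london"})
print(no_context)
print(with_context)
```

Without context, both “britain” and “wall” remain plausible continuations in this toy corpus. Once “london” is required as context, only “britain” is left.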

Classes of models

Generative models use several classes of statistical models (usually neural networks). Many of them follow an encoder/decoder pattern: the encoder analyzes the input and converts it into a numerical representation, which is passed through a trained neural network, and the decoder turns the resulting numbers back into output such as text. The most famous example currently, ChatGPT, is built on GPT models, which use only the decoder part of the transformer architecture (see Transformer-Based Models below).

Put simply, the text entered by the user is broken down into smaller pieces, the network processes these pieces, and it then generates the most probable answer, which is converted back into human-readable text and output. Everything is therefore based only on probability, which is why false statements also appear: in those cases, the model rated them as “more likely” than the facts.

The hype around this technology in the news and on social media is probably based on the fact that these models are very good at generating convincing and deceptively real content, which makes us believe there is intelligence behind it. Generative AI models are not limited to image and text generation, however. They are also applied to data augmentation, anomaly detection, filling in missing data, and content classification.

How Generative AI Works

Recently, computing power has become cheap enough to process large datasets at a “reasonable cost”. This has enabled significant progress in artificial intelligence, because different models can now be trained at a large enough scale to produce reasonable output.

These are the three main types of models behind generative AI, each with its own strengths, weaknesses, and possible use cases:

Generative Adversarial Networks (GANs)

GANs essentially consist of two neural networks, the generator and the discriminator, that compete with each other: one creates new outputs and the other checks them. It works similarly to a counterfeiter trying to create fake money while a detective tries to tell fake money from real money.

The generator network creates a sample and passes it to the discriminator. At first, the discriminator may be poor at telling the difference and might classify a fake as real money. Both networks therefore need to be trained, and as both learn from their mistakes, their performance improves over time.

The generator’s goal is to produce outputs that the discriminator cannot distinguish from real data. At the same time, the discriminator tries to get better and better at distinguishing real data from fake data. This process continues until the generator produces data so realistic that the discriminator can no longer tell real and generated data apart.
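The tug-of-war between the two networks is usually expressed as two loss functions. The following is a minimal sketch, assuming the discriminator outputs the probability that a sample is real and using the common binary cross-entropy formulation. The function names and the example scores are hypothetical, and no actual network is trained here.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and fake samples score near 0."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss(d_fake):
    """Low when the discriminator scores the fakes near 1, i.e. is fooled."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Early in training (made-up scores): the discriminator spots the fakes easily.
early_d = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
early_g = generator_loss(d_fake=[0.1, 0.2])

# Idealized end state: every sample scores 0.5, so the discriminator is guessing.
final_d = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
final_g = generator_loss(d_fake=[0.5, 0.5])

print(early_d, early_g)
print(final_d, final_g)
```

In this sketch the generator's loss falls as its fakes become harder to spot, while the discriminator's loss rises toward the value it gets from pure guessing.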

Variational Autoencoders (VAEs)

VAEs rely on principles of probability and statistics to generate synthetic data. They describe data using simple statistical quantities, such as a mean and a standard deviation.

VAEs consist of an encoder and a decoder (as briefly described above). The encoder compresses the input data into a so-called “latent space representation”, expressed as the parameters of a probability distribution (mean and variance). A sample is then drawn from this learned distribution in the latent space, and the decoder network uses it to reconstruct the original input data. The model is trained to minimize the difference between input and output, ensuring that the generated data closely resembles the original data.
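The sampling step and the training objective can be sketched in a few lines. This is a simplified illustration that assumes the encoder has already produced a mean and a log-variance per latent dimension. The function names are hypothetical, and the KL term shown is part of the standard VAE formulation rather than something described above.

```python
import math
import random

def sample_latent(mean, log_var, rng):
    """Draw z = mean + sigma * eps with eps ~ N(0, 1), one value per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]

def reconstruction_error(original, reconstructed):
    """Squared difference between the input and the decoder's output."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed))

def kl_to_standard_normal(mean, log_var):
    """KL divergence between N(mean, sigma^2) and N(0, 1), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mean, log_var))

rng = random.Random(0)
mean, log_var = [0.5, -1.0], [0.0, -2.0]  # made-up encoder outputs
z = sample_latent(mean, log_var, rng)

# A typical VAE loss adds both terms; the "reconstruction" here is invented.
loss = reconstruction_error([1.0, 2.0], [0.9, 2.2]) + kl_to_standard_normal(mean, log_var)
print(z, loss)
```

The reconstruction term keeps outputs close to the inputs, while the KL term keeps the latent space well-behaved enough that fresh samples from it decode into plausible data.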

Transformer-Based Models

Compared to GANs and VAEs, transformer-based models such as GPT-3 and GPT-4 are mainly used for tasks involving sequential data, that is, data whose elements have specific semantic relationships to one another, as in natural language processing.

Transformer-based models use an architecture based on an “attention mechanism” that prioritizes certain parts of the input data during the task in an attempt to extract and weigh the meaning of the sentence.
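How such a mechanism weighs parts of the input can be illustrated with scaled dot-product attention, the formulation commonly used in transformers. The sketch below handles a single query in plain Python. The two-dimensional vectors are made up for illustration, whereas real models learn them and use far more dimensions.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)  # how strongly each token is attended to
    output = [sum(w * value[i] for w, value in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output

tokens = ["Great", "Britain", "London"]
keys = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.9]]    # invented key vectors
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # invented value vectors
query = [0.1, 1.0]  # hypothetical query for the token "Britain"

weights, output = attention(query, keys, values)
for token, weight in zip(tokens, weights):
    print(token, round(weight, 3))
```

The weights always sum to one, so the output is a weighted average of the value vectors in which the tokens most similar to the query contribute the most.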

GPT models use a variation of the transformer, called the transformer decoder, which processes an entire sequence of data (e.g., a sentence) at once and can therefore capture complex dependencies between the words in it. These models are trained on huge text corpora and then fine-tuned for tasks such as translation, question answering, or text generation. The resulting language models can produce sentences, paragraphs, and even entire articles with excellent coherence and context. However, like other models, they are based only on probabilities and therefore also “hallucinate”, inventing content that is “likely” but wrong.
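Why a “likely” continuation can win over a correct one becomes clearer when looking at how the next token is typically chosen. The sketch below assumes the model has already produced raw scores (logits) for a few candidate tokens. The scores, the candidate words, and the function names are made up for illustration, and temperature sampling is only one common decoding strategy, not necessarily the one a given product uses.

```python
import math
import random

def token_probabilities(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = {token: score / temperature for token, score in logits.items()}
    peak = max(scaled.values())  # subtract the maximum for numerical stability
    exps = {token: math.exp(s - peak) for token, s in scaled.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}

def sample_token(logits, temperature, rng):
    """Pick the next token at random, weighted by its probability."""
    probs = token_probabilities(logits, temperature)
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Invented scores for continuing "The capital of Australia is ...".
logits = {"Sydney": 2.0, "Canberra": 1.5, "Melbourne": 0.5}

probs = token_probabilities(logits)
sharp = token_probabilities(logits, temperature=0.2)
picked = sample_token(logits, temperature=1.0, rng=random.Random(0))
print(probs)
print(sharp)
print(picked)
```

Nothing in this procedure checks whether the chosen token is true. If the wrong answer happens to score highest, as “Sydney” does in this invented example, it is also the one most likely to be produced.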

Use Cases for Generative AI Models

Now that we understand the basic principles of these systems and roughly where their limits are, we can talk about how to apply these models. In general, you could say that the current wave of generative AI is limited to applications that need a good imitation of existing data (GAN models) or an output that “might be something”, like a transcription of speech or generated text. The use cases below should give you a rough idea of the possibilities:

Creative Arts and Design

Generative AI has found many applications in art and design and is changing the way we create and experience art. DALL-E, Midjourney, and many other image generators have shown that creating realistic and compelling art with AI is possible.

GANs, in particular, have played a significant role in this field. For example, an AI-generated portrait created using GANs by the art collective “Obvious” sold at Christie’s for $432,500.

Music Composition and Generation:

Generative AI models are also used to create music. A few years ago, it was unthinkable that a machine could generate something as complex and creative as music. Models like OpenAI’s MuseNet, which was trained on MIDI files from different genres and sources, or Google’s MusicLM, which generates audio from text descriptions, can produce compositions in many different styles.

Transferring art into different styles:

AI can not only create new works but also transform existing ones. AI models can learn the stylistic elements of one image and apply them to another, a technique known as neural style transfer. The result is a hybrid image that combines one image’s content with another’s artistic style.
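One widely used way to capture “style” in neural style transfer is the Gram matrix of a network's feature maps, which records which features tend to occur together regardless of where they appear in the image. The sketch below assumes the feature maps are already available as flat lists of activations per channel. The numbers and the function names are invented, and a real system would extract the features with a pretrained convolutional network.

```python
def gram_matrix(feature_maps):
    """Channel-by-channel correlations, averaged over all spatial positions."""
    positions = len(feature_maps[0])
    return [[sum(a * b for a, b in zip(channel_i, channel_j)) / positions
             for channel_j in feature_maps]
            for channel_i in feature_maps]

def style_loss(generated_features, style_features):
    """Squared difference between the two Gram matrices."""
    gram_generated = gram_matrix(generated_features)
    gram_style = gram_matrix(style_features)
    return sum((g - s) ** 2
               for row_g, row_s in zip(gram_generated, gram_style)
               for g, s in zip(row_g, row_s))

# Two channels with three spatial positions each (made-up activations).
style = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
# The same activations moved to different positions: same "style" statistics.
rearranged = [[3.0, 1.0, 2.0], [0.0, 0.0, 1.0]]
# Different activations: different "style" statistics.
different = [[1.0, 0.0, 0.0], [0.0, 3.0, 3.0]]

print(style_loss(rearranged, style))
print(style_loss(different, style))
```

Because the Gram matrix ignores position, rearranging the activations leaves the style loss at zero. This is what allows the style of one image to be combined with the spatial content of another.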

Natural Language Processing (NLP)

Generative AI plays a key role in NLP tasks, such as content creation, dialogue systems, translation, and virtual assistant creation.

Text and content creation:

Models such as GPT-3 and GPT-4 have contributed significantly to the current hype. Their remarkable ability to create human-like text has captured people’s imagination. These models can write articles and poems, and write or improve code, making them valuable tools for automatic content creation that take work off our hands. The problem is that the content is not always accurate, and everything they write tends to sound the same.

Dialogue systems and virtual assistants:

By processing language and generating content in a targeted way, generative models can also enable conversations between humans and machines. They can generate contextual responses and hold human-like conversations. This ability increases the effectiveness of virtual assistants, chatbots, and AI in customer service and many other areas.

Transcription and speech enhancement:

Another widely known use case is models that turn speech into text. The challenge is that these models need to take context into account to compensate for a poor-quality microphone or noise in the room. In this way, generative AI can produce clear output and create better transcriptions of video and audio content.

Computer Vision and Image Synthesis

Generative AI significantly impacts computer vision tasks, as neural networks can recognize objects or create deceptive replicas.

Image Synthesis:

GANs are widely used to generate realistic synthetic images. For example, Nvidia’s StyleGAN produces incredibly lifelike images of faces that do not exist. Other AI systems generate movie content without professional cameras. Image synthesis also includes deepfakes, computer-generated fake versions of real people.

Image Augmentation:

Generative models can also fill in missing parts of an image in a process called inpainting. They predict the missing part based on the context of the surrounding pixels. Photoshop’s AI features have become a hit on social media for this reason: they extend images with content that was never there. Google’s “Magic Eraser”, which uses generative AI to remove people or objects from pictures and fill the gap with the “most likely” content, has also made headlines.

Drug Development and Healthcare

Generative AI has great potential in healthcare and drug development as it can also predict or “invent” different structures or compounds.

New Drug Discovery:

Generative models can predict the molecular structures of potential drugs, accelerating the drug discovery process. For years, various companies have been trying to use AI models to invent new molecular compounds and use them to develop drugs to treat diseases.

Personalized Medicine:

Generative models can also help in personalized medicine. By learning patterns from patient data, these models can help find the most effective treatment for individual patients.

Examples of Generative AI in Real-World Scenarios

OpenAI’s GPT-4: a transformer-based, high-capacity language generator capable of drafting emails, writing code, creating written content, tutoring, and translating.

DeepArt: an application, similar to Prisma, that uses generative models to transform user-uploaded photos into artwork inspired by famous artists.

Midjourney: a text-to-image generator that creates images and graphics based on user input and descriptions.

Google’s DeepDream: a program that uses AI to find and enhance image patterns, creating dreamy, psychedelic transformations.

Jukin Composer: a tool, powered by OpenAI’s MuseNet, that uses AI to create original music for video content.

Insilico Medicine: a biotech company that uses generative models to predict the molecular structures of potential drugs, speeding up the drug discovery process.

ChatGPT: an AI chatbot developed by OpenAI that can have human-like text conversations for customer service and personal assistant applications.

Nvidia’s StyleGAN: a generative adversarial network that generates hyper-realistic images of faces that don’t exist in reality.

Artbreeder: a platform that uses GANs to merge user-input images to create complex and novel images, such as portraits and landscapes.