I Just Travelled to the Future and Saw What I’ll Look Like at 75 Years Old With GANs
Artificial Intelligence Can Help Us Time-Travel With Generative Adversarial Networks
I have a confession to make.
In 2017, I spent hours on FaceApp, seeing what everyone around me would look like at an older or younger age.
I was obsessed.
And, I think it was because of how realistic the final images were. They weren’t another Snapchat filter that didn’t align with my face, but an actual progression of what I could look like… not that the wrinkles looked good. Either way, each result looked like a photo I’d taken in real time.
It was definitely the closest I’ve gotten to time-travelling.
Back then, I never looked into the ‘how’ behind this feature. I was too busy producing baby and elderly versions of all my friends and family members, and having a good laugh, to dig into the workings behind it.
Then, the other day, I saw a picture of Elon Musk as an old man and thought it was real, until I realized it was another FaceApp edit. This old fixation of mine had completely slipped my mind, and all of a sudden I was reminded of how trendy the app was when it first released.
But the difference between 6 years ago and now is that the first thing I wanted to know was… how? From there, I connected the dots in my head and realized it had to be related to Artificial Intelligence, specifically GANs, or Generative Adversarial Networks.
Generative… What Now?
Humans possess remarkable abilities to recognize and create new things. And, while we’ve made significant progress in teaching computers to recognize objects the way humans do, the capability to generate novel content has long been elusive for AI systems.
However, in 2014, Ian Goodfellow changed this when he invented Generative Adversarial Networks, finally unlocking this ‘creation’ potential for AI systems.
According to Yann LeCun, GANs are “the most interesting idea in the last 10 years in Machine Learning” — and when a prominent researcher within the field of AI compliments this idea, you know it’s about to be good.
And, he’s not wrong. The hype around GANs is what initially got me excited about the world of artificial intelligence.
Also, a couple of quick notes about machine learning before we continue… if you’re new to the world of AI, these terms are pretty important to know:
1) Supervised Learning: A machine learning approach where the algorithm learns to predict outputs based on input data that has been labeled with the correct output values.
2) Unsupervised Learning: A machine learning approach where the algorithm learns to identify patterns and relationships in the input data without any explicit labeling of the data.
3) Reinforcement Learning: A machine learning approach where the algorithm learns to make decisions based on feedback from the environment, with the goal of maximizing a reward signal that is tied to the quality of the decisions made.
Formally, Generative Adversarial Networks (GANs) are a type of unsupervised machine learning algorithm that uses two neural networks (a generator and a discriminator) to generate new data that closely mimics pre-existing data. The GAN does this by learning patterns and relationships in the original data, then using them to create new, realistic samples.
But, here’s my personal favourite way to think of GANs:
I’ve always loved drawing, but I’ve never reached a level where my drawings look like photographs. But, maybe, through a bit of training I could.
Let’s say I wanted someone to ‘mentor’ me so I could refine my drawings, and the exercise was to keep drawing a flower until it looked like an actual photograph.
I would repeatedly draw the flower and submit it to my mentor, who would assess whether it resembled a photograph or not. If they decided it was a drawing, I would iterate on their feedback until the drawing eventually passed as a photograph, and then I’d have achieved my goal of creating a realistic flower drawing from scratch.
Within that context, I’m the generator and my mentor is the discriminator; and in the end, we produce a new ‘image’ that’s never been seen before.
But, let’s break this down further.
Training Generative ‘Adversarial’ Networks
The ‘adversarial’ nature of this algorithm comes from pitting these two neural networks against each other until a convincing final output is created.
Even though we know the gist of them, let’s get the formal definitions out of the way first:
1) Generator: A neural network within the GAN that’s an unsupervised model, which learns to generate new data samples similar to a set of real data samples.
2) Discriminator: A neural network within the GAN that’s a supervised model, which takes a data sample as input and determines whether it’s real or fake. (Both are sketched in code right below.)
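To make these two concrete, here’s a minimal sketch of what the pair might look like in PyTorch. The sizes are my own assumptions for illustration (a 100-dimensional noise vector and flattened 28×28 grayscale images), not anything tied to a specific app:

```python
import torch
import torch.nn as nn

LATENT_DIM = 100   # size of the random noise vector z (assumed)
IMG_DIM = 28 * 28  # flattened 28x28 grayscale image (assumed)

# Generator: turns random noise into a fake image
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, IMG_DIM),
    nn.Tanh(),  # squashes pixel values into [-1, 1]
)

# Discriminator: scores an image with "probability this is real"
discriminator = nn.Sequential(
    nn.Linear(IMG_DIM, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)

z = torch.randn(16, LATENT_DIM)      # a batch of 16 noise vectors
fake_images = generator(z)           # 16 fake images
scores = discriminator(fake_images)  # 16 real/fake probabilities
```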
The architecture of a GAN can vary depending on the data type being generated, but more often than not, it follows this general process: random noise goes into the generator, which turns it into a fake sample; the discriminator then receives both real samples from the training data and the generator’s fakes, and tries to label each one correctly.
This is exactly the adversarial loop we established with the drawing and mentor analogy.
But both of these neural networks are trained at the exact same time, each iterating to get better at its respective role. And, as the two networks train, they’re part of a kind of ‘min-max game’ where the generator and discriminator play against each other.
This ‘game’ is formally known as an optimization function:
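min_G max_D V(D, G) = E_x[log(D(x))] + E_z[log(1 - D(G(z)))]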
And, behind all these complex symbols, here’s what’s happening.
G represents the generator and D the discriminator; x is a real data point, z is a random noise vector the generator uses to create fake data, and E denotes the expected value (roughly, an average over many samples).
But, the generator is trying to minimize the discriminator’s ability to tell real data from fake data, while the discriminator is trying to maximize its ability to tell real data from fake data — hence, the max & the min involved.
So, we can break the function down into two parts: the first part, E_x[log(D(x))], measures how well the discriminator can identify real data points, and the second part, E_z[log(1-D(G(z)))], measures how well the discriminator can identify fake data points created by the generator.
The generator wants to minimize the second part of the function, whereas the discriminator wants to maximize the entire function: an ongoing argument between the two.
This stops when an equilibrium is reached, where the discriminator can no longer do better than a coin flip (outputting about 0.5 for everything), so the generated images pass as ‘real’.
And this tug-of-war is a good thing. As the generator gets better at creating data that looks real to the discriminator, the discriminator gets better at identifying fakes, which in turn pushes the generator to produce the most realistic data possible.
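To tie the pieces together, here’s what one step of that back-and-forth might look like in PyTorch, reusing the generator and discriminator sketched earlier. The Adam optimizers, learning rate, and the real_images batch are placeholder assumptions; and, as in the original paper’s ‘non-saturating’ trick, the generator here maximizes log(D(G(z))) rather than literally minimizing log(1 - D(G(z))):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # binary cross-entropy: the log terms in the value function
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)  # assumed lr
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

def training_step(real_images):
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Discriminator step: get better at telling real from fake
    z = torch.randn(batch, LATENT_DIM)
    fake_images = generator(z).detach()  # don't update the generator here
    d_loss = (bce(discriminator(real_images), real_labels)
              + bce(discriminator(fake_images), fake_labels))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Generator step: get better at fooling the discriminator
    z = torch.randn(batch, LATENT_DIM)
    g_loss = bce(discriminator(generator(z)), real_labels)  # pretend fakes are real
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```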
Some Variations of GANs
There are 20+ types of GANs, with many more being developed and researched daily. They all follow the same core concept of a generator plus a discriminator, but they differ in their use cases and the data types involved.
Each GAN has its own strengths and weaknesses, but here’s a collection of GANs I found interesting during my whole deep-dive.
ProgressiveGANs
A couple years back, Remini — an app used to turn blurry photos into HD — was trending all over social media. It took extremely zoomed-in and pixelated images and turned them into sharpened photos, without taking anything away from the actual image.
That app, along with many others, is an example of ProgressiveGANs at work. They generate high-quality images in a progressive manner: training starts with low-resolution images and gradually increases the resolution, producing higher-quality images at each stage.
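Here’s a heavily simplified sketch of that schedule, showing only the data side: real images are downsampled to match whatever resolution the networks are currently working at. The actual technique also grows new layers onto both networks and fades them in gradually, which is omitted here:

```python
import torch
import torch.nn.functional as F

real_batch = torch.randn(2, 3, 1024, 1024)  # stand-in for a batch of HD photos

# Train at progressively higher resolutions, from 4x4 up to 1024x1024
for resolution in [4, 8, 16, 32, 64, 128, 256, 512, 1024]:
    low_res = F.interpolate(real_batch, size=(resolution, resolution),
                            mode='bilinear', align_corners=False)
    # ...train the generator & discriminator at this resolution...
    print(resolution, tuple(low_res.shape))
```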
ConditionalGANs
Traditional GANs generate images randomly by taking noise as input. cGANs, on the other hand, use additional information to control the generation process: the generator network takes in both random noise and a condition or label as its input.
For example, if we had a model that generates images of handwritten digits conditioned on a specific digit label, the cGAN would use the digit label (0–9) to generate an image of the digit.
So, basically, the user gets control over what image is being generated as well.
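As a sketch of how that conditioning might look in PyTorch: the digit label is embedded into a small vector and concatenated with the noise before going through the network. The layer sizes and the 16-dimensional label embedding are my own illustrative assumptions:

```python
import torch
import torch.nn as nn

LATENT_DIM = 100
NUM_CLASSES = 10   # digit labels 0-9
IMG_DIM = 28 * 28

class ConditionalGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.label_embedding = nn.Embedding(NUM_CLASSES, 16)  # label -> vector
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + 16, 256),
            nn.ReLU(),
            nn.Linear(256, IMG_DIM),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # The output depends on both the random noise and the requested digit
        cond = self.label_embedding(labels)
        return self.net(torch.cat([z, cond], dim=1))

gen = ConditionalGenerator()
z = torch.randn(4, LATENT_DIM)
labels = torch.tensor([3, 3, 7, 7])  # ask for two 3s and two 7s
fake_digits = gen(z, labels)
```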
StyleGANs
StyleGANs are able to generate high-quality images with fine-grained control over their appearance. Traditional GANs learn to map a random vector straight to an image, whereas StyleGANs first map that vector into an intermediate ‘style’ representation, which controls the style and structure of individual features (like pose, hair, or age) in the generated image.
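Here’s a very stripped-down sketch of that idea: noise z is run through a mapping network to get an intermediate latent w, and w then scales and shifts the feature maps inside the generator (an AdaIN-style modulation, heavily simplified; real StyleGANs also normalize the features and inject per-layer noise):

```python
import torch
import torch.nn as nn

W_DIM = 512  # latent size, as in the StyleGAN papers

# Mapping network: z -> intermediate "style" latent w
mapping = nn.Sequential(
    nn.Linear(W_DIM, W_DIM), nn.ReLU(),
    nn.Linear(W_DIM, W_DIM),
)

class StyledLayer(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.to_scale = nn.Linear(W_DIM, channels)  # w -> per-channel scale
        self.to_shift = nn.Linear(W_DIM, channels)  # w -> per-channel shift

    def forward(self, features, w):
        scale = self.to_scale(w).unsqueeze(-1).unsqueeze(-1)
        shift = self.to_shift(w).unsqueeze(-1).unsqueeze(-1)
        return features * scale + shift  # the style modulates the features

z = torch.randn(1, W_DIM)
w = mapping(z)
features = torch.randn(1, 64, 16, 16)  # stand-in intermediate feature maps
styled = StyledLayer(64)(features, w)
```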
And, these GANs are responsible for my initial curiosity — how we can generate pictures that are older versions of ourselves.
How StyleGANs Work Within FaceApp
This process is a little more specific than the general overview of traditional GANs above.
The first step involves training our neural networks on a large and diverse dataset of images, which in this case means faces across a wide range of ages.
Both our generator and discriminator learn from this training set in their own manner.
The generator is trained to create new images that mimic those in the training dataset. It receives a random input vector (also known as a latent code) and generates an image from that code. The generated image is then compared against real images from the dataset, and the generator is updated so it produces images that look more and more like the real ones.
On the other hand, we already know that the discriminator is trained to distinguish between real and fake images, so our discriminator familiarizes itself with the features of real human faces to ensure that the generated images stay realistic.
So, once our StyleGAN has been trained, the next step is to generate an older version of a specific person’s face. The network takes the person’s face as input (so, I could input a picture of my face) and encodes it into a latent code that captures the essential features of the image.
From there, the encoded latent code is modified to create a new one that represents an older version of the person’s face, drawing on all the training our generator has done on older images. This process continues until the generated image passes through the discriminator and gets its final checkmark.
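I don’t know FaceApp’s exact pipeline, but a common research approach to this kind of edit (e.g. InterFaceGAN) is to move the latent code along a learned ‘age direction’. Here’s a minimal sketch of just that latent arithmetic; the latent code and direction below are random stand-ins for what a real encoder and a trained direction would provide:

```python
import torch

W_DIM = 512                            # typical StyleGAN latent size
w = torch.randn(1, W_DIM)              # latent code of the input face (stand-in)
age_direction = torch.randn(1, W_DIM)  # learned "older" direction (stand-in)
age_direction /= age_direction.norm()

strength = 3.0                      # how far to push: bigger = older-looking
w_older = w + strength * age_direction

# Decoding w_older with the trained generator would give the aged face:
# aged_face = generator(w_older)
```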
From there, voila, you’ve got your output: you, 30–35 years older… even though it was most definitely not my favourite image.
The Future of GANs
The future of Generative Adversarial Networks is extremely exciting and full of possibilities. Reaching a point in Artificial Intelligence where we can create novel content is hugely significant for the development of the field.
So many industries can be impacted by this advance in deep learning, including but not limited to:
Art and Design: We could generate new forms of art within a matter of seconds, allowing for a personalized and creative world of visuals. More than anything, this could be applied to the industries behind our everyday comforts and luxuries: architecture, fashion, and product design.
Medical Research: We could leverage GANs to generate realistic and accurate models of human anatomy, which could aid medical research and training, and support diagnosis and treatment.
Robotics and Autonomous Systems: This one is my favourite. We could use GANs to generate synthetic training data for autonomous systems. For example, we could generate images of different environments for self-driving cars, so they learn how to navigate and interact with their surroundings.
Truly a wide range.
Of course, there are a few limitations and problems.
Deepfakes are the obvious one. GANs make them extremely easy to produce, and they carry a lot of controversy, since the technology can be misused for malicious purposes.
Training instability is another. Because there are two neural networks involved, it can be difficult to keep the training process efficient and stable, and sometimes the networks collapse; a common failure is ‘mode collapse’, where the generator gets stuck producing only a handful of outputs.
Although, I don’t want to sideline the potential GANs hold. For all the reasons and applications I talked about before the limitations, GANs have changed, and continue to change, the way we create content for our day-to-day lives.
TL;DR
- Generative Adversarial Networks consist of a generator and a discriminator, which train against each other until the generator produces images realistic enough to pass as real.
- There are many types of GANs for different use cases and data types, including ProgressiveGANs, ConditionalGANs, and StyleGANs.
- Despite their limitations, GANs can impact so many industries in a positive manner, leading to a more efficient and creative world.
My personal TL;DR from all this? I agree with Yann LeCun, GANs are super sick.
Hey, I’m Priyal, a 17-year-old driven to impact the world using emerging technologies. If you enjoyed my article or learned something new, feel free to subscribe to my monthly newsletter to keep up with my progress in AI & quantum computing exploration and an insight into everything I’ve been up to. You can also connect with me on LinkedIn and follow my Medium for more content! Thank you so much for reading.