Generative AI Models for the Metaverse

Author: Jimi Vaubien

Content plays a significant role in the success of a platform, and the Metaverse won’t be an exception. Who wants to hang out in an empty virtual universe?

Recent models like Google Imagen and OpenAI DALL-E 2 have shown impressive results in image generation, hinting that AI will be able to generate 3D assets in the near future. However, these models need massive quantities of data and processing power to train, which makes them a perfect use case for High-Performance Computing.

What do we need to fill the Metaverse?

To fill the Metaverse, we need 3D assets in quantity and diversity.


First, we need assets in quantity. The Metaverse could be used for work, meetings, gaming, contemplation, and more, which requires a wide variety of objects, buildings, and environments.


Quantity is not enough, though: if every place and user in the virtual world looks and feels the same for lack of customization, immersion will suffer.

Here a problem appears: creating realistic 3D assets is costly and time-consuming. Experienced 3D designers must produce the assets, their geometry variations, their textures, and more, and reaching high-quality results can take a long time.

This manual process alone won’t keep pace with a content-hungry virtual world like the Metaverse.

Generative AI research so far

This is where AI can help alleviate the problem, potentially boosting 3D artists’ productivity tenfold.

For many years, machine learning research has worked on generative models, which learn to produce new samples from a data distribution. But how do you create anything with a mathematical model? We rarely create from nothing, right?

Interestingly, generative models work as distribution transformers: they transform a prior distribution from which you know how to sample into the target distribution.
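A classic, non-neural instance of this idea is inverse transform sampling: samples from a uniform prior, which is trivial to sample, are pushed through the inverse cumulative distribution function (CDF) of the target distribution. The sketch below uses an exponential target with rate `lam`, an illustrative choice that is not taken from any model in this article; a neural generator learns a far more complex transformation instead of using a closed-form one.

```python
import math
import random


def sample_exponential(lam: float, n: int, seed: int = 0) -> list[float]:
    """Transform Uniform[0, 1) samples into Exponential(lam) samples.

    The prior (uniform) is easy to sample; the target's inverse CDF,
    F^-1(u) = -ln(1 - u) / lam, acts as the "distribution transformer".
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.random()                          # sample from the prior
        samples.append(-math.log(1.0 - u) / lam)  # map it into the target
    return samples


if __name__ == "__main__":
    draws = sample_exponential(lam=2.0, n=100_000)
    # The empirical mean should be close to the exponential mean, 1 / lam = 0.5.
    print(sum(draws) / len(draws))
```

A generative model plays the same role as `-math.log(1.0 - u) / lam`, except that the mapping is a neural network learned from data, because no closed-form inverse CDF exists for "human faces".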

Take, for example, a model generating human faces. The main workflow is as follows: we first sample random vectors from a multivariate normal distribution; then a neural network is trained to transform this normal prior into the distribution of images (2D matrices) representing human faces. To succeed, this process requires huge numbers of images from the target distribution and a long training time.
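The two-step workflow above (sample from a normal prior, then transform with a network) can be sketched with NumPy. This is a minimal, untrained illustration: the layer sizes, the two-layer MLP architecture, and the names `init_generator` and `generate` are hypothetical, and a real face generator is far deeper and only produces faces after long training on many images.

```python
import numpy as np

LATENT_DIM = 64    # hypothetical size of the random prior vector z
IMAGE_SIZE = 32    # hypothetical output resolution (32 x 32 grayscale)
HIDDEN_DIM = 256   # hypothetical hidden-layer width


def init_generator(rng: np.random.Generator) -> dict[str, np.ndarray]:
    """Randomly initialise a two-layer MLP generator (untrained)."""
    return {
        "w1": rng.normal(0.0, 0.02, size=(LATENT_DIM, HIDDEN_DIM)),
        "b1": np.zeros(HIDDEN_DIM),
        "w2": rng.normal(0.0, 0.02, size=(HIDDEN_DIM, IMAGE_SIZE * IMAGE_SIZE)),
        "b2": np.zeros(IMAGE_SIZE * IMAGE_SIZE),
    }


def generate(params: dict[str, np.ndarray], z: np.ndarray) -> np.ndarray:
    """Map latent vectors z ~ N(0, I) to images with pixel values in [-1, 1]."""
    h = np.maximum(0.0, z @ params["w1"] + params["b1"])  # ReLU hidden layer
    x = np.tanh(h @ params["w2"] + params["b2"])          # squash pixels to [-1, 1]
    return x.reshape(-1, IMAGE_SIZE, IMAGE_SIZE)          # one 2D matrix per sample


rng = np.random.default_rng(0)
params = init_generator(rng)
z = rng.standard_normal((8, LATENT_DIM))  # step 1: sample from the normal prior
images = generate(params, z)              # step 2: transform into "images"
print(images.shape)  # (8, 32, 32)
```

Training would adjust `params` so that the generated matrices become indistinguishable from real face images; that step is what requires the large datasets and compute mentioned above.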

Recently, researchers have made breakthroughs in several generation tasks that could be used to build the Metaverse:

Image generation: GANs and diffusion-based models

Image generation is the task of generating images from a given distribution, such as human faces or buildings.

Generative Adversarial Networks (GANs) and diffusion-based models have shown excellent capabilities on this task. For instance, StyleGAN generators produce high-fidelity human faces, and StyleGAN-NADA can shift the domain of a pre-trained StyleGAN generator without gathering additional data: it adapts the generator to a new domain using only a textual prompt and no training images.
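The adversarial idea behind GANs can be summarised by two losses: a discriminator D learns to output 1 on real images and 0 on generated ones, while the generator G tries to make D(G(z)) approach 1. The sketch below follows the original GAN formulation with the commonly used non-saturating generator loss; it operates on discriminator outputs only, so the networks themselves, StyleGAN's architecture, and StyleGAN-NADA's CLIP-based objective are not shown.

```python
import numpy as np

EPS = 1e-12  # avoids log(0) when the discriminator is fully confident


def discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """Binary cross-entropy: D should output 1 on real images, 0 on generated ones.

    d_real = D(x) for real images x, d_fake = D(G(z)) for generated images;
    both are probabilities in [0, 1].
    """
    return float(-np.mean(np.log(d_real + EPS)) - np.mean(np.log(1.0 - d_fake + EPS)))


def generator_loss(d_fake: np.ndarray) -> float:
    """Non-saturating generator loss: G is rewarded when D(G(z)) approaches 1."""
    return float(-np.mean(np.log(d_fake + EPS)))


# At the theoretical equilibrium, D cannot tell real from fake and outputs 0.5.
half = np.full(4, 0.5)
print(discriminator_loss(half, half))  # 2 * ln 2, about 1.386
print(generator_loss(half))            # ln 2, about 0.693
```

In training, the two losses are minimised alternately by gradient descent on D's and G's parameters; the competition between them is what drives G toward the target distribution.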


Text-to-image generation: DALL-E 2 and Imagen

Text-to-image generation is the task of generating images matching a text description. It’s powerful since you can guide the image generation with natural language, a simple and natural way to interact with the AI model for nontechnical users or artists.

Recently, OpenAI's DALL-E 2 and Google's Imagen achieved astonishing results on this task. Both are based on diffusion models, and they can even be used to edit images: for example, replacing the hat on someone’s head, adding an object to someone’s hand, or removing an object.
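Diffusion models learn to reverse a gradual noising process. The sketch below shows two pieces commonly described in the diffusion literature: the closed-form forward (noising) step of DDPM, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε, and classifier-free guidance, the technique Imagen and DALL-E 2 use to make samples follow the text prompt more closely. The linear beta schedule, the step count, the guidance weight `w`, and the stand-in image are assumptions for illustration; no trained denoiser is included, and this is not DALL-E 2's or Imagen's actual code.

```python
import numpy as np

T = 1000                              # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)    # linear noise schedule (assumed)
alpha_bars = np.cumprod(1.0 - betas)  # ᾱ_t = product over s <= t of (1 - β_s)


def add_noise(x0: np.ndarray, t: int, rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Forward process q(x_t | x_0): jump straight to noise level t in closed form."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps  # a denoiser is trained to predict eps from (xt, t, prompt)


def guided_noise(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: push the noise prediction toward the text-conditioned one.

    w = 1 recovers the plain conditional prediction; w > 1 follows the prompt more strongly.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(32, 32))  # stand-in for a training image
xt, eps = add_noise(x0, t=T - 1, rng=rng)
print(alpha_bars[-1])  # close to 0: x_T is almost pure Gaussian noise
```

Generation runs the process backwards: starting from pure noise, the model repeatedly predicts the noise (with and without the prompt, combined by `guided_noise`) and removes a little of it at each step.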

Images to 3D scene generation: NVIDIA Instant NeRF

Here the task is to build a 3D scene from several 2D pictures. Capturing 2D views of a scene from different angles is far easier than creating the entire corresponding 3D representation by hand.

NVIDIA Instant NeRF is a recent model that takes a set of photos of a scene, captured from different camera angles and positions, and reconstructs the corresponding 3D scene: you can then move the camera anywhere, and the model renders the corresponding view. It fills in viewpoints that were never photographed while keeping the scene coherent.
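Instant NeRF builds on neural radiance fields (NeRF), which render each pixel by marching along a camera ray and compositing the density σ and color c that the network predicts at sample points: C = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i, with transmittance T_i = exp(−Σ_{j<i} σ_j δ_j). The sketch below implements only this compositing step on hand-written sample values; the neural network and Instant NeRF's multiresolution hash encoding are not shown, and `composite_ray` is a hypothetical name.

```python
import numpy as np


def composite_ray(sigmas: np.ndarray, colors: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    """Discrete volume rendering along one ray.

    sigmas: (N,) densities at the N samples along the ray
    colors: (N, 3) RGB colors at those samples
    deltas: (N,) distances between consecutive samples
    Returns the (3,) RGB color seen by the camera for this ray.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity contributed by each segment
    # Transmittance T_i: probability the ray reaches sample i without being absorbed.
    trans = np.exp(-np.concatenate(([0.0], np.cumsum(sigmas * deltas)[:-1])))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)


# Empty space, then a dense red surface, then a green surface hidden behind it.
sigmas = np.array([0.0, 50.0, 50.0])
colors = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
deltas = np.array([0.5, 0.5, 0.5])
print(composite_ray(sigmas, colors, deltas))  # about [1, 0, 0]: red occludes the rest
```

Because this rendering rule is differentiable, the network can be trained by comparing rendered pixels with the input photos, which is how the model learns a consistent 3D scene from 2D views.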

Sustainable AI development for content generation

Research on generative models is progressing quickly, and these models are promising tools for significantly increasing 3D artists’ productivity.

In particular, progress in guided generation and in turning images into 3D scenes could play a central role in creating realistic worlds for games and virtual universes such as the Metaverse.

However, these models require immense quantities of data and therefore substantial computational power to train in a reasonable amount of time. For example, GPT-3, a large transformer language model comparable to those used to interpret prompts in text-to-image systems (Imagen, for instance, uses a frozen T5-XXL text encoder), has 175 billion parameters, and its training corpus was filtered from about 45TB of compressed text.
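A back-of-the-envelope calculation shows why a model of this size does not fit on a single machine. The sketch below relies on two common accounting rules that are assumptions, not figures from this article: mixed-precision training with the Adam optimizer needs roughly 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and two Adam moments), and training compute is roughly 6 × parameters × tokens FLOPs. The 300 billion training tokens are GPT-3's reported figure; the 80 GB of accelerator memory and the 100 TFLOP/s sustained throughput are hypothetical hardware numbers.

```python
PARAMS = 175e9           # GPT-3 parameter count
TOKENS = 300e9           # tokens seen during GPT-3 training
BYTES_PER_PARAM = 16     # assumed: mixed-precision Adam (2 + 2 + 4 + 4 + 4 bytes)
GPU_MEMORY_BYTES = 80e9  # hypothetical accelerator with 80 GB of memory
GPU_FLOPS = 100e12       # hypothetical sustained throughput per accelerator (FLOP/s)

weights_fp16_gb = PARAMS * 2 / 1e9                   # weights alone, in half precision
training_state_tb = PARAMS * BYTES_PER_PARAM / 1e12  # weights + grads + optimizer state
min_gpus_for_state = training_state_tb * 1e12 / GPU_MEMORY_BYTES
train_flops = 6 * PARAMS * TOKENS                    # common 6 * N * D approximation
gpu_days = train_flops / GPU_FLOPS / 86_400          # 86,400 seconds per day

print(f"fp16 weights: {weights_fp16_gb:.0f} GB")                   # 350 GB
print(f"training state: {training_state_tb:.1f} TB")               # 2.8 TB
print(f"accelerators just to hold it: {min_gpus_for_state:.0f}")   # 35
print(f"training compute: {train_flops:.2e} FLOPs")                # 3.15e+23
print(f"about {gpu_days:,.0f} accelerator-days at {GPU_FLOPS / 1e12:.0f} TFLOP/s")
```

Treat these numbers as lower bounds: activation memory, communication overhead, and imperfect hardware utilisation all come on top of them.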

Such workloads must run on High-Performance Computing (HPC) clusters.

However, given the energy consumed to train such gigantic models, we should pay attention to where computing providers get their energy.

HPC cloud providers such as DeepSquare take this seriously and are building a sustainable, decentralized cloud. As we continue to act to preserve our environment by limiting carbon emissions, such providers should be prioritized for cloud computing, especially for computationally intensive workloads such as AI training.
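The reason the energy source matters can be made concrete with a rough estimate: a training run's footprint is approximately the energy it consumes multiplied by the carbon intensity of that electricity. In the sketch below, every input (cluster size, duration, power per accelerator, power usage effectiveness, and the two grid intensities) is a hypothetical placeholder, not a measurement of any real provider or model.

```python
def training_footprint(
    n_accelerators: int,
    hours: float,
    watts_per_accelerator: float,
    pue: float,
    kg_co2e_per_kwh: float,
) -> tuple[float, float]:
    """Return (energy in kWh, emissions in kg CO2e) for a training run.

    PUE (power usage effectiveness) scales IT power up to include cooling
    and other data-centre overhead.
    """
    energy_kwh = n_accelerators * hours * watts_per_accelerator / 1000 * pue
    return energy_kwh, energy_kwh * kg_co2e_per_kwh


# Hypothetical run: 1,000 accelerators for 30 days at 400 W each, PUE of 1.2.
run = dict(n_accelerators=1_000, hours=30 * 24, watts_per_accelerator=400.0, pue=1.2)
energy, co2_fossil = training_footprint(**run, kg_co2e_per_kwh=0.7)  # assumed carbon-heavy grid
_, co2_low = training_footprint(**run, kg_co2e_per_kwh=0.03)         # assumed low-carbon supply
print(f"{energy:,.0f} kWh; {co2_fossil / 1000:,.0f} t vs {co2_low / 1000:,.0f} t CO2e")
```

With identical hardware and workload, only the carbon intensity changes between the two calls, which is why the provider's energy supply can dominate the footprint of a training run.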

We are continuously working on expanding our community and ecosystem. If you want to learn more about the project or connect with the team and the community, follow us on Twitter, LinkedIn, Instagram, Telegram, and Discord.