Nvidia has developed a way to turn 2D photos into 3D scenes

AI researchers at Nvidia have developed a way to convert a handful of 2D images into a 3D scene almost instantly by using ultra-fast neural network training alongside rapid rendering.

Known as inverse rendering, the process leverages AI to approximate how light behaves in the real world to turn 2D images taken at different angles into 3D scenes.

Nvidia’s researchers applied their novel approach to a popular new technology called neural radiance fields, or NeRF for short. The result, which the company has dubbed Instant NeRF, is the fastest NeRF technique to date, achieving speedups of more than 1,000 times in some cases. The neural model takes just seconds to train on a few dozen still photos, though it also requires data on the camera angles they were taken from.

David Luebke, VP for graphics research at Nvidia, provided further insight into the difference between NeRF and Instant NeRF in a blog post, saying:

“If traditional 3D representations like polygonal meshes are akin to vector images, NeRFs are like bitmap images: they densely capture the way light radiates from an object or within a scene. In that sense, Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography — vastly increasing the speed, ease and reach of 3D capture and sharing.” 

Potential use cases

By using neural networks, NeRFs are able to render realistic 3D scenes based on an input collection of 2D images. However, the most interesting part is how the neural networks used to create them are able to fill in the blanks between the 2D images even when the objects or people in them are blocked by obstructions.
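
For readers curious how a trained NeRF turns its predictions into a pixel, the sketch below shows the standard volume rendering step used by NeRF-style methods in general: the network is queried at sample points along a camera ray, and the predicted densities and colors are blended front to back. This is a simplified illustration, not Nvidia's code. The function name `composite_ray` and the fixed step size are assumptions made for the example, and the densities and colors would normally come from the neural network rather than being passed in by hand.

```python
import math

def composite_ray(densities, colors, step):
    """Blend samples along one camera ray into a single RGB pixel.

    densities: non-negative density predicted at each sample (front to back)
    colors:    (r, g, b) predicted at each sample
    step:      distance between consecutive samples (assumed constant here)
    """
    transmittance = 1.0  # fraction of light still reaching the camera
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb in zip(densities, colors):
        # Opacity contributed by this small segment of the ray.
        alpha = 1.0 - math.exp(-sigma * step)
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        # Light that passes through this segment continues to the next one.
        transmittance *= 1.0 - alpha
    return pixel
```

Because nearer samples reduce the light available to those behind them, a dense surface close to the camera hides whatever lies further along the ray, which is how occlusion falls out of the same formula.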

Normally, creating a 3D scene with traditional methods can take hours, depending on the complexity and resolution of the visualization. Bringing AI into the picture speeds this up: even early NeRF models could render crisp scenes without artifacts in a few minutes, though only after being trained for several hours.

Nvidia’s Instant NeRF cuts the required rendering time by several orders of magnitude using a technique the company developed called multi-resolution hash grid encoding, which has been optimized to run efficiently on Nvidia GPUs. The model shown off by the company at GTC 2022 uses the Nvidia CUDA Toolkit and the Tiny CUDA Neural Networks library. It can be trained and run on a single Nvidia GPU, though graphics cards with Nvidia Tensor Cores can handle the work even faster.
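
To give a rough feel for what a multi-resolution hash grid encoding does, here is a minimal pure-Python sketch of the general idea: a 3D point is looked up in several grids of increasing resolution, each grid corner is hashed into a small table of feature values, and the corner features are blended by trilinear interpolation. It is an illustration only and not Nvidia's GPU implementation. The helper names (`hash_corner`, `make_tables`, `encode`), the default resolutions and the table sizes are all assumptions chosen for the example, and the hashing constants are taken from the published description of the technique as best understood here.

```python
import math
import random

# Per-axis multipliers for the spatial hash (assumed from the published description).
PRIMES = (1, 2654435761, 805459861)

def hash_corner(ix, iy, iz, table_size):
    """Map integer grid coordinates to a slot in a fixed-size table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size

def make_tables(num_levels, table_size, feat_dim, seed=0):
    """One table of small random feature vectors per resolution level.
    In a real system these values would be learned during training."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)]
             for _ in range(table_size)]
            for _ in range(num_levels)]

def encode(point, tables, base_res=16, growth=2.0):
    """Encode a point in [0, 1]^3 as the concatenation of per-level features."""
    x, y, z = point
    features = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)  # finer grid at each level
        sx, sy, sz = x * res, y * res, z * res
        x0, y0, z0 = math.floor(sx), math.floor(sy), math.floor(sz)
        fx, fy, fz = sx - x0, sy - y0, sz - z0
        feat = [0.0] * len(table[0])
        # Trilinear interpolation over the 8 corners of the enclosing cell.
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    w = ((fx if dx else 1.0 - fx)
                         * (fy if dy else 1.0 - fy)
                         * (fz if dz else 1.0 - fz))
                    entry = table[hash_corner(x0 + dx, y0 + dy, z0 + dz, len(table))]
                    for k in range(len(feat)):
                        feat[k] += w * entry[k]
        features.extend(feat)
    return features
```

For example, `encode((0.3, 0.5, 0.7), make_tables(4, 1024, 2))` returns a list of 8 numbers (4 levels times 2 features), which a small neural network would then take as input. Keeping the tables compact means the network that follows can be small, which is one plausible reason the approach trains quickly.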

Going forward, Instant NeRF technology could be used to quickly create avatars or scenes for virtual worlds, to capture video conferencing participants and their environments in 3D or to reconstruct scenes for 3D digital maps. The technology could also be used to train robots and self-driving cars to better understand the size and shape of real-world objects from 2D images or video footage of them. At the same time, the architecture and entertainment industries could use Instant NeRF to rapidly generate digital representations of real environments that creators can modify and build on.

Nvidia’s researchers are also exploring how their new input encoding technique could be used to accelerate various AI challenges such as reinforcement learning, language translation and general-purpose deep learning algorithms.
