
Major breakthrough - Realtime NPC Vision - History is Made

Shadowfinder Studios

17 Aug 2023 • 7 min read

This is probably a first for a 3D video game with conversational non-player characters! It's not a flashy demonstration but it is a major first for the technology:

I've come up with a way, using MiniGPT4 and TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g, to give the NPC the ability to see his surroundings and describe them to me. I used a SceneCaptureComponent2D camera and a render target texture in Unreal 5 to get a screenshot of what the NPC sees. I fed this to an Nvidia A40 cloud instance hosted with Vultr and allowed him to perceive his reality for the first time instead of just being told about it.
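For anyone curious about the plumbing, here is a rough Python sketch of the hand-off between the captured frame and the vision model. It is not the project's actual code: the endpoint URL, the JSON field names (`image_b64`, `question`, `description`) and both function names are hypothetical, made up for illustration. It only assumes that the screenshot arrives as raw image bytes and that the cloud instance exposes some HTTP endpoint that returns a text description.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: the post does not say how the cloud instance is exposed.
CAPTION_URL = "http://localhost:5000/describe"


def build_vision_request(image_bytes: bytes, question: str) -> bytes:
    """Pack a captured frame and the player's question into a JSON body.

    The field names are assumptions for this sketch, not a real MiniGPT4 API.
    """
    payload = {
        # JSON cannot carry raw bytes, so the image is base64-encoded.
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "question": question,
    }
    return json.dumps(payload).encode("utf-8")


def describe_scene(image_bytes: bytes, question: str,
                   url: str = CAPTION_URL, timeout: float = 30.0) -> str:
    """POST the frame to the (hypothetical) vision server and return its text."""
    request = urllib.request.Request(
        url,
        data=build_vision_request(image_bytes, question),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    # Assumed response shape: {"description": "..."}
    return reply["description"]
```

In a setup like this, the game would export the render target to PNG bytes, call something like `describe_scene(png_bytes, "What do you see?")`, and pass the returned text on to the NPC's dialogue and voice pipeline.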

King Ogard then accurately describes what he's seeing: a small village in the middle of nowhere, straw and mud homes, and a campfire burning in front of them. He responds aptly and quickly, and even comes up with additional details to describe the experience beyond just the basics of what is in the scene.

I had this vision in my head of how it would all come together, and after a long night and morning of coding without sleep, it finally did! It seemed almost miraculous as it was happening. I could barely contain myself as I tried to calmly ask him about the world.

This was a very simple scene, so the technology will be more interesting in unexpected environments. I've been running a series of tests with other environments to see how it reacts to different objects, arrays of furniture, people, and places.

There could be a lot of interesting interactions that stem from this beyond just simple descriptions. This technique finally gives the characters some awareness. Imagine, for example, a blacksmith seeing you attempt to strike a blade and saying, "Oh, no no no, not like that. Let me show you," because they visually perceived what was actually happening. My mind boggles at all the possibilities yet undiscovered.

I need to adjust the lip sync a little; it's a bit tight in this latest update to the code.

Also, this is another first for me: the code is now written 100% in-house. It is not based on ConvAI or Inworld.

All business and investment inquiries may be sent to:

shadowfinderstudios@gmail.com

A personal thank you goes out to the following in no specific order!

Meta’s GenAI Team for creating and releasing LLAMA and LLAMA 2 to the world:

LMSYS and the Vicuna Team for their work on Vicuna:

TheBloke for his hard work:

The Meta Oculus team for LipSync OVR:

https://developer.oculus.com/documentation/unreal/audio-ovrlipsync-unreal/

The oobabooga team for their great text generation tool which made working with the models a breeze:

Nvidia for making the most amazing graphics chips which made all of this possible:

Vultr for the cloud GPU, which gave me the capability to finish and test my code, and the price of which drove me to try to finish as much as I could in a single night:

https://www.vultr.com/

A thank you to playht for their services:

MobaXterm for making a fantastic SSH terminal for Windows:

Epic Games for making a great game engine:

Microsoft for their great development tools, OS and strong backing of the AI world through their funding of OpenAI that is driving the world forward:

Thanks go out to Tortoise and ElevenLabs for their work on improving voice cloning and TTS:

ElevenLabs - Generative AI Text to Speech & Voice Cloning


The Linux team for a great OS on which I ran the server:

Linux.org


GNU for making free software possible:

Canonical for their hard work on Ubuntu which is the flavor of Linux I chose for the server:

Google for their hard work on YouTube and the field of AI which laid the foundation for everything we're working with today:

OpenAI for ChatGPT and Whisper:

Georgi Gerganov for GGML editions of just about everything:

And last but not least, the thing that tied all of this together and made it possible: the MiniGPT4 team!

@article{zhu2023minigpt,
title={MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models},
author={Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed},
journal={arXiv preprint arXiv:2304.10592},
year={2023}
}

Without these people and their hard work, none of this would have been possible.
