Major breakthrough - Realtime NPC Vision - History is Made

This is probably a first for a 3D video game with conversational non-player characters! It's not a flashy demonstration but it is a major first for the technology:

I've come up with a way, using MiniGPT4 and TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g, to give the NPC the ability to see his surroundings and describe them to me. I used a SceneCaptureComponent2D camera and a render target texture in Unreal 5 to get a screenshot of what the NPC sees. I fed this to an Nvidia A40 cloud instance hosted with Vultr, which allowed him to perceive his reality for the first time instead of just being told about it.
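
As a rough illustration of the capture-and-send step, here is a minimal Python sketch of one way a frame exported from the render target could be packaged with a question and unpacked again on the GPU server. The JSON field names and both function names are hypothetical, the transport and the MiniGPT4 call itself are left out, and this is not the actual code used here.

```python
import base64
import json
from typing import Tuple


def build_vision_request(image_bytes: bytes, question: str, npc_name: str) -> str:
    """Package one captured frame and a player question as a JSON string.

    The field names ("npc", "question", "image_b64") are assumptions
    made for this sketch, not a documented protocol.
    """
    if not image_bytes:
        raise ValueError("captured frame is empty")
    payload = {
        "npc": npc_name,
        "question": question,
        # Base64 keeps the binary image safe inside a JSON text body.
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)


def parse_vision_request(body: str) -> Tuple[bytes, str, str]:
    """Server-side counterpart: recover the image bytes and text fields."""
    payload = json.loads(body)
    image_bytes = base64.b64decode(payload["image_b64"])
    return image_bytes, payload["question"], payload["npc"]
```

On the server, the recovered image bytes and question would then be handed to whatever vision-language model is running there, and the generated description sent back to the game as the NPC's reply.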

King Ogard then accurately describes what he's seeing: a small village in the middle of nowhere, straw and mud homes, and a campfire burning in front of them. He responds aptly and quickly, and even comes up with additional details to describe the experience beyond just the basics of what is in the scene.

I had this vision in my head of how it would all come together, and after a long night and morning of coding without sleep, it finally did! It seemed almost miraculous as it was happening. I could barely contain myself as I tried to calmly ask him about the world.

This was a very simple scene, so the technology will be more interesting in unexpected environments. I've been running a series of tests with other environments to see how it reacts to different objects, arrays of furniture, people, and places.

There could be a lot of interesting interactions that stem from this beyond just simple descriptions. This technique finally gives the characters some awareness. Imagine, for example, a blacksmith seeing you attempt to strike a blade and saying, "Oh, no no no, not like that. Let me show you," because they visually perceived what was actually happening. My mind boggles at all the possibilities yet undiscovered.

I need to adjust the lip sync a bit; it's a bit tight in this latest update to the code.

Also, this is another first for me, as this code is now 100% written in-house. It is not based on ConvAI or Inworld.

All business and investment inquiries may be sent to:

A personal thank you goes out to the following in no specific order!

Meta’s GenAI Team for creating and releasing LLAMA and LLAMA 2 to the world:

Llama 2 - Meta AI
Llama 2 — The next generation of our open source large language model, available for free for research and commercial use.

LMSYS and the Vicuna Team for their work on Vicuna:

Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality | LMSYS Org
We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation…

TheBloke for his hard work:

TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.

The Meta Oculus team for LipSync OVR:

The oobabooga team for their great text generation tool which made working with the models a breeze:

GitHub - oobabooga/text-generation-webui: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, llama.cpp (ggml), Llama models.

Nvidia for making the most amazing graphics chips which made all of this possible:

NVIDIA A40 GPU for Visual Computing
The world’s most powerful data center GPU for visual computing.

Vultr for the cloud GPU, which gave me the capability to finish and test my code, and whose price drove me to finish as much as I could in a single night:

A thank you to playht for their services:

AI Voice Generator & Realistic Text to Speech Online
AI Voice Generator with 600+ AI voices. Generate realistic Text to Speech voice over online with AI. Convert text to audio and download as MP3 & WAV files.

MobaXterm for making a fantastic SSH terminal for Windows:

MobaXterm free Xserver and tabbed SSH client for Windows
The ultimate toolbox for remote computing - includes X server, enhanced SSH client and much more!

Epic Games for making a great game engine:

Unreal Engine 5
Unreal Engine 5 empowers all creators across all industries to deliver stunning real-time content and experiences.

Microsoft for their great development tools, OS and strong backing of the AI world through their funding of OpenAI that is driving the world forward:

What’s new in Visual Studio 2022 | Download for free - Visual Studio
Visual Studio 2022 has the latest features to bring you real-time collaboration with Live Share, AI-assisted code completions, & many more. Download for free.

Thanks go out to tortoise and elevenlabs for their work on improving voice cloning and TTS:

GitHub - neonbjb/tortoise-tts: A multi-voice TTS system trained with an emphasis on quality
ElevenLabs - Generative AI Text to Speech & Voice Cloning

The Linux team for a great OS on which I ran the server:
Friendly Linux Forum

GNU for making free software possible:

The GNU Operating System and the Free Software Movement
Since 1983, developing the free Unix style operating system GNU, so that computer users can have the freedom to share and improve the software they use.

Canonical for their hard work on Ubuntu which is the flavor of Linux I chose for the server:

Enterprise Open Source and Linux | Ubuntu
Ubuntu is the modern, open source operating system on Linux for the enterprise server, desktop, cloud, and IoT.

Google for their hard work on YouTube and the field of AI which laid the foundation for everything we're working with today:

Watch your favorite videos, listen to the music you like, upload original content, and share it all with friends, family, and others on YouTube.

OpenAI for ChatGPT and Whisper:

Creating safe AGI that benefits all of humanity

Georgi Gerganov for GGML editions of just about everything:

GitHub - ggerganov/whisper.cpp: Port of OpenAI’s Whisper model in C/C++
Port of OpenAI’s Whisper model in C/C++. Contribute to ggerganov/whisper.cpp development by creating an account on GitHub.

And last but not least, the thing that tied all of this together and made it possible: the MiniGPT4 team!


Zhu, Deyao; Chen, Jun; Shen, Xiaoqian; Li, Xiang; Elhoseiny, Mohamed. "MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models." arXiv preprint arXiv:2304.10592.

Without these people and their hard work, none of this would have been possible.