MediaTech

Gamers of a certain vintage will remember the early days of 3D consoles as a hit-and-miss affair.

For every jaw-dropping Super Mario 64 moment there was an irritating adventure game with a permanently swivelling camera and jagged polygons. Yet these opposing ends of the pleasure spectrum had one thing in common: little to no facial animation.

“Narrative games have been around for a long time, but once the hardware and the rendering capabilities improved, the quality of the animation needed to follow,” Speech Graphics CEO Gregor Hofer tells BusinessCloud. 

“When you had blocky characters, it didn’t matter as much, right? Because you couldn’t really see the difference anyway.”

The Edinburgh-headquartered firm, spun out of the city’s university in 2010, delivers speech animation and lip synchronisation from audio alone. Its software is used by triple-A studios for characters’ facial animations in smash-hit games including The Last of Us Part II, and it featured third on our MediaTech 50 ranking this year.

University spinout

Co-founders Hofer and Michael Berger, who serves as CTO, met in 2007 during their PhDs in Informatics at the Centre for Speech Technology Research. Berger’s work focused on visual speech while Hofer’s studies were in the related area of non-verbal animation driven by speech. 

They both saw the great impact facial animation technology could have on the world. “We worked on analysing interactions – with machine learning, we could see how people relate to each other, how they talk to each other,” Hofer explains. “It’s emotional content: there’s a lot of context. 

“You may start at one point, but as you move from one point to another, with all those different data sources, there is a continuous flow of data which you are trying to map: there’s the intonation, the pitch, the energy in the audio.

“In a specific extraction we might look at different frequency bands, and how those frequency bands relate to each other over time. By extracting these kinds of cues, you can eventually build a model of that information in specific contexts.”
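What Hofer describes maps onto standard prosodic and spectral feature extraction. As a rough, illustrative sketch only – not Speech Graphics’ actual pipeline – here is how pitch, energy and per-band spectral cues could be pulled from a clip of dialogue using the open-source librosa library; the filename and all parameter values are placeholders.

```python
# Illustrative only: extracting the kinds of audio cues Hofer mentions
# (pitch, energy, frequency bands over time) with librosa. This is a
# generic sketch, not Speech Graphics' actual feature pipeline.
import numpy as np
import librosa

HOP = 256  # frame step in samples (placeholder value)

y, sr = librosa.load("line_of_dialogue.wav", sr=16000)  # hypothetical file

# Intonation: fundamental frequency (pitch) per frame via the YIN tracker.
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, hop_length=HOP)

# Energy: root-mean-square amplitude per frame.
rms = librosa.feature.rms(y=y, hop_length=HOP)[0]

# Frequency bands: a coarse mel spectrogram, so we can compare how the
# energy in different bands evolves relative to the others over time.
bands = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=8, hop_length=HOP)

# One feature vector per frame: [pitch, energy, band_1..band_8].
n = min(len(f0), len(rms), bands.shape[1])
features = np.column_stack([f0[:n], rms[:n], bands[:, :n].T])
print(features.shape)  # (n_frames, 10)
```

A sequence of frame-level vectors like this is the “continuous flow of data” Hofer refers to; a model can then learn how those cues relate to facial movement in a given context.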

The duo started a conference with fellow PhDs and postdocs around the world focused on facial animation and analysis. “Through that, we made a lot of industry contacts – we had speakers from Microsoft, Sony, Industrial Light & Magic,” says Hofer.

“They became aware of the work we were doing at the university. And even before we spun out the company, we had people contacting us about using the technology in their projects.”


Milo & Kate

The all-conquering videogames industry would become the main focus for Hofer and Berger’s fledgling tech. Early on they were involved in a famous R&D project at Lionhead Studios, the now-closed Guildford developer, built around Kinect for Microsoft’s Xbox 360 – a sensor bar which tracks your movements and allows you to use your body as a game controller.

Milo & Kate, led by industry legend Peter Molyneux, sought to have virtual character Milo react to the user – a sort of interactive friend who could talk to players, recognise them and respond in realistic fashion.

“They had a lot of speech for that character. We produced a lot of the animation that you see in that demo – and that was before we had even started the company,” says Hofer.

With the rise of the powerful 360 and PlayStation 3 consoles, the founders recognised a huge problem: modern games contain vast amounts of recorded voiceover, possibly repeated in many languages for localisation, which is incredibly difficult and expensive to animate by hand or through motion capture.

From service company to SaaS

Bootstrapped until 2018, Speech Graphics operated as a service company which used in-house tech to “charge for minutes” in delivering facial animation.

“Over time, we turned it into a product,” Hofer says of the move to an infinitely more scalable Software-as-a-Service licensing platform. “We found there was a real niche for what we were doing – companies didn’t have the systems to really scale animation production. 

“You typically produce animation either by hand, or through performance capture – having someone put on a suit and a helmet covered in sensors. While that produces very good results, it’s very, very time consuming, and extremely expensive.

“Our software analyses the speech and automatically produces the animation – it’s very impactful and a lot cheaper. It’s very comparable to something that you would get with performance capture systems. The only difference is that you might not get the complete likeness: we don’t know exactly what the character is, or what the actor’s doing, as we’re inferring from the audio.”


Used by 90% of triple-A studios today, its systems include SGX, a facial animation accelerator which interprets vocal characteristics and maps the data onto a detailed model of the associated muscle systems, generating anatomically correct lip sync and emotionally accurate expressions; and SG Com, which generates animation on the fly from audio streamed or downloaded to the gaming device in real time.
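To make the real-time half of that concrete, the sketch below shows, in deliberately simplified form, what an audio-driven animation loop of the kind SG Com is described as performing could look like: streamed audio chunks in, one vector of facial animation weights per frame out. Every name here (AnimModel, stream_animation, the frame rate and channel count) is invented for illustration and is not Speech Graphics’ API.

```python
# A hypothetical real-time loop in the spirit of what the article describes:
# streamed audio in, per-frame facial animation weights out. None of these
# names come from Speech Graphics' SDK; they are placeholders.
import numpy as np

FPS = 30                       # animation frame rate (assumed)
SR = 16000                     # audio sample rate (assumed)
SAMPLES_PER_FRAME = SR // FPS  # audio samples backing one animation frame

class AnimModel:
    """Stand-in for a model mapping audio to muscle/blendshape weights."""
    N_CHANNELS = 40  # e.g. jaw, lips, brows... (illustrative count)

    def infer(self, audio_frame: np.ndarray) -> np.ndarray:
        # A real system would run a trained audio-to-animation model here;
        # this stub just drives a "jaw" channel from the frame's energy.
        weights = np.zeros(self.N_CHANNELS)
        weights[0] = min(1.0, float(np.sqrt(np.mean(audio_frame ** 2))) * 10)
        return weights

def stream_animation(audio_chunks, renderer_push):
    """Consume streamed audio chunks; emit one weight vector per frame."""
    model, buf = AnimModel(), np.zeros(0)
    for chunk in audio_chunks:
        buf = np.concatenate([buf, chunk])
        while len(buf) >= SAMPLES_PER_FRAME:
            frame, buf = buf[:SAMPLES_PER_FRAME], buf[SAMPLES_PER_FRAME:]
            renderer_push(model.infer(frame))
```

Buffering the audio and emitting weights frame by frame is what lets a character keep talking while the rest of the stream is still arriving – the property that distinguishes runtime generation from offline batch animation.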

The Last of Us

The Last of Us and its contemporaries have raised the bar in terms of storytelling within games: the believability of the interactions between characters is now all-important in successfully immersing players. I won’t be alone in having cried as that game drew to a close.

No wonder, then, that Speech Graphics is now taking the power of emotion into new enterprise verticals: its generative AI platform Rapport is a modular system which automates character animation from speech and other vocalisations, and is looking to tap into new markets such as brand engagement in the metaverse, eCommerce, training and education, and triage assessment in healthcare.


“One of our key drivers for growth is making it much more accessible. We’re now working with more AA games studios and indies,” says Hofer. “And over the last few years, other industries have begun to embrace digital characters. We even do the Meerkats! There’s a lot of this happening right now that wasn’t the case even three years ago.

“It’s a more interesting and engaging way to interact with brands. We’re building a platform to make it really, really easy for anybody to connect all the different bits that you need to have a successful experience around digital characters. 

“It’s not just about games – it could be a character on a website, in a metaverse, music artists who want to appear in Fortnite or Roblox, brand ambassadors… we’re integrating the platform with different character creators, voice services, rendering environments.”
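Purely as an illustration of that modular idea – none of these interfaces are Rapport’s – the pieces Hofer lists could be modelled as swappable components behind a single session object, reusing the hypothetical stream_animation loop sketched earlier:

```python
# Purely illustrative: a modular pipeline in the spirit of Hofer's
# description of Rapport. These interfaces are invented for this sketch.
from typing import Iterator, Protocol
import numpy as np

class VoiceService(Protocol):
    """Any voice source: TTS, recorded audio, a live microphone..."""
    def synthesise(self, text: str) -> Iterator[np.ndarray]: ...

class Renderer(Protocol):
    """Any rendering environment: game engine, web page, metaverse client."""
    def apply_weights(self, weights: np.ndarray) -> None: ...

class CharacterSession:
    """Wires a voice source to an animated character in some renderer."""
    def __init__(self, voice: VoiceService, renderer: Renderer):
        self.voice, self.renderer = voice, renderer

    def say(self, text: str) -> None:
        # Reuses the stream_animation loop from the earlier sketch:
        # TTS audio chunks in, per-frame weights out to whichever
        # renderer happens to be plugged in.
        stream_animation(self.voice.synthesise(text),
                         self.renderer.apply_weights)
```

The point of the design is the one Hofer makes: because each component sits behind a common interface, a brand can swap the character creator, the voice or the rendering environment without rebuilding the rest of the experience.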

‘You have to commit fully to entrepreneurship’