LLAniMAtion: LLAMA Driven Gesture Animation

Windle, Jonathan; Matthews, Iain; Taylor, Sarah

Date

2024

Authors

Windle, Jonathan
Matthews, Iain
Taylor, Sarah

Publisher

The Eurographics Association and John Wiley & Sons Ltd.

Abstract

Co-speech gesturing is an important modality in conversation, providing context and social cues. In character animation, appropriate and synchronised gestures add realism and can make interactive agents more engaging. Historically, methods for automatically generating gestures were predominantly audio-driven, exploiting the prosodic and speech-related content encoded in the audio signal. In this paper we instead experiment with using Large Language Model (LLM) features, extracted from text using LLAMA2, for gesture generation. We compare against audio features and explore combining the two modalities, in both objective tests and a user study. Surprisingly, our results show that LLAMA2 features on their own perform significantly better than audio features, and that including both modalities yields no significant difference compared with using LLAMA2 features in isolation. We demonstrate that the LLAMA2-based model can generate both beat and semantic gestures without any audio input, suggesting that LLMs can provide rich encodings that are well suited to gesture generation.

CCS Concepts: Computing methodologies → Machine learning algorithms; Animation

@article{10.1111:cgf.15167,
journal = {Computer Graphics Forum},
title = {{LLAniMAtion: LLAMA Driven Gesture Animation}},
author = {Windle, Jonathan and Matthews, Iain and Taylor, Sarah},
year = {2024},
publisher = {The Eurographics Association and John Wiley & Sons Ltd.},
ISSN = {1467-8659},
DOI = {10.1111/cgf.15167}
}

URI

https://doi.org/10.1111/cgf.15167
https://diglib.eg.org/handle/10.1111/cgf15167

Collections

43-Issue 8
SCA 2024: Eurographics/SIGGRAPH Symposium on Computer Animation