Modelling Sentiment Analysis scores and acoustic features of emotional speech with neural networks

A pilot study

Authors

  • Enrico Zovato, Laboratorio di Fonetica Sperimentale “Arturo Genre”, Università di Torino
  • Vito Quinci, Dipartimento di Informatica, Università di Torino
  • Paolo Mairano, Département des Études Anglophones, Université de Lille, France

DOI:

https://doi.org/10.17469/O2106AISV000023

Keywords:

emotional speech, sentiment analysis, prosody, voice quality

Abstract

Abundant literature has shown that emotional speech is characterized by various acoustic cues. However, most studies have focused on sentences produced by actors, disregarding more naturally produced speech because of the difficulty of finding suitable emotional data. In our previous work we analysed audiobook data to test whether sentiment analysis could help select emotional sentences from read speech. A regression analysis with Linear Mixed Models revealed only small effects, and the statistical power of the models was low. Given the success of neural networks in the speech literature, we propose here an analysis with a neural network classifier predicting sentiment on the basis of acoustic cues. However, classification accuracy was only 0.13 above chance level, suggesting that the different components used to express emotions (acoustic and lexical) tend to be complementary rather than additive, at least in audiobooks.
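The setup described in the abstract can be illustrated with a minimal sketch: a neural network classifier mapping acoustic features to sentiment classes, with accuracy compared against chance level. This is not the authors' implementation; the feature set, the data (synthetic here), and the scikit-learn model choice are all assumptions made for illustration.

```python
# Hypothetical sketch of the approach described in the abstract:
# predict sentiment classes from acoustic cues with a small neural
# network. Data and feature names are invented for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300
# Synthetic acoustic cues per sentence, e.g. mean f0, f0 range,
# intensity, speech rate, HNR (assumed feature set).
X = rng.normal(size=(n, 5))
# Synthetic sentiment labels: 0 = negative, 1 = neutral, 2 = positive.
y = rng.integers(0, 3, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)
scaler = StandardScaler().fit(X_tr)

# A single small hidden layer; the paper's actual architecture
# is not specified here.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                    random_state=0)
clf.fit(scaler.transform(X_tr), y_tr)

acc = clf.score(scaler.transform(X_te), y_te)
chance = 1.0 / 3  # three balanced classes
print(f"accuracy relative to chance: {acc - chance:+.2f}")
```

With random synthetic data the network should hover around chance; on real audiobook data the abstract reports a gain of only 0.13 over chance, which motivates its conclusion about the complementarity of acoustic and lexical cues.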

Published

31-12-2019