An automatic speech recognition Android app for ALS patients

Authors

  • Cecilia Di Nardi, School of Postgraduate Studies & Research, Royal College of Ireland, Dublin 2, Ireland
  • Rosanna Turrisi, Multiscale Brain Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
  • Alberto Inuggu, Robotics Brain and Cognitive Sciences Unit, Istituto Italiano di Tecnologia, Genoa, Italy
  • Nilo Riva, Department of Neurology, Clinical Neurophysiology and Neurorehabilitation, IRCCS Ospedale San Raffaele, Milan, Italy
  • Ilaria Mauri, Speech Therapy, IRCCS Ospedale San Raffaele, Milan, Italy
  • Leonardo Badino, Multiscale Brain Communication, Istituto Italiano di Tecnologia, Ferrara, Italy

DOI:

https://doi.org/10.17469/O2104AISV000012

Keywords:

automatic speech recognition, amyotrophic lateral sclerosis, smartphone application, deep neural networks

Abstract

This paper describes AllSpeak, an Automatic Speech Recognition (ASR) Android application developed for Italian-speaking patients with Amyotrophic Lateral Sclerosis (ALS). It recognizes a predefined and customizable set of basic utterances that the patient uses in everyday life (e.g., “I’m thirsty”, “I feel pain”). The ASR engine is based on deep learning architectures and uses a simple decoding strategy that enables fast, fully offline decoding (i.e., without any network connection). Although deep learning approaches have achieved outstanding results on many speech recognition tasks, recognition of impaired speech remains quite challenging for an ASR system, mainly due to the scarcity of training data and the large variability of impairments. We address these two problems by limiting recognition to a set of key phrases/words corresponding to the patient’s primary needs and by strongly adapting the neural networks to the target speaker’s voice. Results show that both the type of network architecture and the training strategy have a very significant impact on recognition accuracy for dysarthric speech. Although different architectures and training strategies perform similarly on healthy speakers, recurrent neural networks trained in a sequence-to-sequence fashion significantly outperform every other method on most ALS speakers.
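The two ideas the abstract combines — closed-vocabulary recognition over a small phrase set, and per-speaker adaptation of a pretrained model — can be illustrated with a deliberately minimal sketch. This is not the paper's architecture (the actual system uses deep and recurrent networks); here a linear softmax classifier over fixed-length feature vectors stands in for the acoustic model, the phrase list and feature dimension are hypothetical, and "adaptation" is simply continued gradient descent on a few recordings from the target speaker.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical closed vocabulary of everyday-need phrases
PHRASES = ["I'm thirsty", "I feel pain", "Call the nurse"]
N_CLASSES = len(PHRASES)
DIM = 16  # assumed acoustic-feature dimension

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, W=None, epochs=200, lr=0.5):
    """Cross-entropy training of a linear softmax classifier.
    Passing an existing W continues training from it, which is
    how speaker adaptation is modelled in this sketch."""
    if W is None:
        W = np.zeros((DIM, N_CLASSES))
    Y = np.eye(N_CLASSES)[y]  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / len(X)
    return W

def recognize(W, x):
    """Closed-set decoding: argmax over the phrase set, no search."""
    return PHRASES[int(np.argmax(x @ W))]

# Synthetic "healthy speech" training data: one cluster per phrase
centers = rng.normal(size=(N_CLASSES, DIM))
X_base = np.vstack([centers[c] + 0.3 * rng.normal(size=(50, DIM))
                    for c in range(N_CLASSES)])
y_base = np.repeat(np.arange(N_CLASSES), 50)
W = train(X_base, y_base)

# A dysarthric speaker shifts the feature distribution; a handful
# of recordings per phrase suffices to fine-tune the base model
shift = rng.normal(size=DIM)
X_spk = np.vstack([centers[c] + shift + 0.3 * rng.normal(size=(5, DIM))
                   for c in range(N_CLASSES)])
y_spk = np.repeat(np.arange(N_CLASSES), 5)
W_adapted = train(X_spk, y_spk, W=W.copy(), epochs=100, lr=0.2)

# The adapted model classifies a new utterance from this speaker
print(recognize(W_adapted, centers[1] + shift))
```

Restricting decoding to an argmax over a few dozen classes is what makes fast, offline, on-device recognition feasible; the heavy lifting in the real system lies in the neural acoustic model and the sequence-to-sequence training the abstract reports.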

Published

31-12-2018