Excerpt of clients we work with

audEERING’s audio analysis technology can be experienced in various end-user products developed in cooperation with our clients. Our sensAI framework has been extended into custom speech, music, and sound processing and recognition solutions for consumer research, call center data analysis, in-car emotion recognition, acoustic scene classification, DJ apps, gaming, and many more.

Excerpt of excellence partners

audEERING maintains a strong link to academia and continuously advances the state-of-the-art in intelligent audio analysis by actively contributing to various research projects. Parts of audEERING’s research on speech emotion recognition are funded by an ERC Proof-of-concept grant from the European Commission.

Excerpt of government-funded projects

audEERING is a partner in various government-funded projects that aim to improve wellbeing for society.


VocEmoApI: Voice Emotion Detection by Appraisal Inference


In the VocEmoApI project, a first-of-its-kind software prototype for voice emotion detection is created, based on a fundamentally different approach: focusing on vocal nonverbal behavior and sophisticated acoustic voice analysis, detection exploits the building blocks of emotional processes, namely a person’s appraisal of relevant events and situations, which triggers the action tendencies and expressions that constitute an emotional episode. Evidence for emotion-antecedent appraisals is tracked continuously in recordings of running speech. This also allows tracking continuous changes in emotion intensity and quality as they occur in many real-life contexts (for example, in phone calls or political debates). Using Bayesian inference rules to combine expert knowledge, theoretical predictions, and empirical data, this approach makes it possible to infer not only the usual basic emotion categories but also much finer distinctions, such as subcategories of emotion families (e.g., irritation and rage within anger) as well as subtle emotions such as interest, pleasure, doubt, boredom, admiration, or fascination.
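The appraisal-based inference described above can be illustrated with a toy Bayesian update. All priors, likelihoods, and category names below are invented for illustration; the actual VocEmoApI model combines expert knowledge, theory, and empirical data in a far richer way.

```python
# Toy sketch of appraisal-based emotion inference via Bayes' rule.
# Priors and likelihoods are hypothetical, not the project's model.

def bayes_update(prior, likelihood):
    """Return posterior P(category | evidence) from a prior and P(evidence | category)."""
    unnorm = {c: prior[c] * likelihood[c] for c in prior}
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}

# Fine-grained categories (two from the anger family, plus contrasts).
prior = {"irritation": 0.25, "rage": 0.25, "interest": 0.25, "neutral": 0.25}

# P(observed acoustic cue | category), e.g. a sharp rise in vocal effort
# suggesting an appraisal of goal obstruction (values invented).
likelihood_effort_rise = {"irritation": 0.6, "rage": 0.9, "interest": 0.2, "neutral": 0.1}

posterior = bayes_update(prior, likelihood_effort_rise)
best = max(posterior, key=posterior.get)
print(best)  # "rage" is most probable after this cue
```

Because the update is applied cue by cue, the posterior can be tracked over running speech, mirroring the continuous tracking of emotion intensity and quality described above.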


ECoWeB: Assessing and Enhancing Emotional Competence for Well-Being (ECoWeB) in the Young: A principled, evidence-based, mobile-health approach to prevent mental disorders and promote mental wellbeing

Although there are effective mental well-being promotion and mental disorder prevention interventions for young people, there is a need for more robust evidence on resilience factors, for more effective interventions, and for approaches that are scalable and accessible at a population level. To tackle these challenges and move beyond the state of the art, ECoWeB uniquely integrates three multidisciplinary approaches: (a) for the first time to our knowledge, we will systematically use an established theoretical model of normal emotional functioning (the Emotional Competence Process) to guide the identification and targeting of mechanisms robustly implicated in well-being and psychopathology in young people; (b) a personalized-medicine approach: systematic assessment of personal Emotional Competence (EC) profiles is used to select targeted interventions to promote well-being; (c) mobile application delivery to achieve scalability, accessibility, and acceptability among young people. Our aim is to improve mental health promotion by developing, evaluating, and disseminating a comprehensive mobile app that assesses deficits in three major components of EC (production, regulation, knowledge) and selectively augments pertinent EC abilities in adolescents and young adults. It is hypothesized that the targeted interventions, based on state-of-the-art assessment, will efficiently increase resilience toward adversity, promote mental well-being, and act as primary prevention for mental disorders. The EC intervention will be tested in cohort multiple randomized trials with young people from many European countries against a usual-care control and an established, non-personalized socio-emotional learning digital intervention. Building directly on a fundamental understanding of emotion, combined with a personalized approach and leading-edge digital technology, is a novel and innovative approach with the potential to deliver a breakthrough in the effective prevention of mental disorders.
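The personalized selection step in (b) might be sketched as choosing app modules for the weakest Emotional Competence components. The three component names come from the text; the scores, the threshold, and the module names are hypothetical.

```python
# Hypothetical sketch: map an Emotional Competence (EC) profile to
# targeted app modules. Scores, threshold, and module names are
# invented for illustration; only the three EC components are from
# the project description.

MODULES = {
    "production": "expression-training module",
    "regulation": "emotion-regulation module",
    "knowledge": "emotion-knowledge module",
}

def select_interventions(profile, threshold=0.5):
    """Return modules for every EC component scoring below the threshold."""
    return [MODULES[c] for c, score in sorted(profile.items()) if score < threshold]

profile = {"production": 0.8, "regulation": 0.3, "knowledge": 0.4}
print(select_interventions(profile))
# ['emotion-knowledge module', 'emotion-regulation module']
```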


TAPAS: Training Network on Automatic Processing of PAthological Speech

There is an increasing number of people across Europe with debilitating speech pathologies (e.g., due to stroke or Parkinson’s disease). These groups face communication problems that can lead to social exclusion. They are now being further marginalised by a new wave of speech technology that is increasingly woven into everyday life but is not robust to atypical speech. TAPAS proposes a programme of pathological speech research that aims to transform the well-being of these people. The TAPAS work programme targets three key research problems:
(a) Detection: We will develop speech processing techniques for the early detection of conditions that impact speech production. The outcomes will be cheap, non-invasive diagnostic tools that provide early warning of the onset of progressive conditions such as Alzheimer’s and Parkinson’s.
(b) Therapy: We will use newly emerging speech processing techniques to produce automated speech therapy tools. These tools will make therapy more accessible and more individually targeted. Better therapy can increase the chances of recovering intelligible speech after traumatic events such as a stroke or oral surgery.
(c) Assisted Living: We will redesign current speech technology so that it works well for people with speech impairments and also helps in making informed clinical choices. People with speech impairments often have other co-occurring conditions that make them reliant on carers. Speech-driven tools for assisted living are a way to allow such people to live more independently.
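The early-detection idea in (a) can be illustrated with a toy longitudinal check: track a speech biomarker across repeated recordings and flag a sustained decline relative to a personal baseline. The biomarker, window size, and threshold here are all hypothetical, not TAPAS methodology.

```python
# Toy early-warning sketch: flag a decline of a speech biomarker
# (e.g. articulation rate in syllables/s) against a personal baseline.
# Values, window size, and the 10% threshold are invented.
from statistics import mean

def flag_decline(series, baseline_n=3, drop=0.10):
    """True if the latest value falls more than `drop` below the baseline mean."""
    if len(series) <= baseline_n:
        return False
    baseline = mean(series[:baseline_n])
    return series[-1] < baseline * (1 - drop)

stable = [4.1, 4.0, 4.2, 4.1, 4.0]       # no warning expected
declining = [4.1, 4.0, 4.2, 3.8, 3.5]    # sustained drop, warning expected
print(flag_decline(stable), flag_decline(declining))  # False True
```

A real system would of course use validated acoustic markers and clinical thresholds; the point is only that cheap, non-invasive monitoring reduces to comparing new recordings against a baseline.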

Funded by the German Federal Ministry of Education and Research (BMBF).

OPTAPEB: Optimierung der Psychotherapie durch Agentengeleitete Patientenzentrierte Emotionsbewältigung (Optimizing Psychotherapy through Agent-Guided, Patient-Centered Emotion Coping)

OPTAPEB aims to develop an immersive, interactive virtual-reality system that assists users in overcoming phobias. The system will let users experience phobia-inducing situations and will record this emotional experience and the user’s behaviour. Emotional reactions will be monitored continuously and in real time at various levels, using sensors based on innovative e-wear technology, speech signals, and other pervasive technologies (e.g., accelerometers). A further goal of the project is the development of a game-like algorithm that controls the user’s experience of anxiety during exposure therapy and automatically adapts the course of the therapy to the user’s needs and the current situation.
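The game-like adaptation loop could look like the minimal sketch below. The arousal scale, the bands, and the step sizes are hypothetical; only the idea of adapting exposure intensity to continuously monitored emotional reactions comes from the project description.

```python
# Minimal sketch of adaptive exposure control: raise the exposure level
# while measured arousal stays tolerable, back off when it spikes.
# Arousal in [0, 1]; bands and level range are invented.

def next_level(level, arousal, min_level=1, max_level=10):
    """Adapt exposure intensity to the user's current arousal reading."""
    if arousal > 0.8:            # overwhelming: step down
        level -= 1
    elif arousal < 0.4:          # under-challenging: step up
        level += 1
    return max(min_level, min(max_level, level))

print(next_level(5, 0.9))  # 4
print(next_level(5, 0.2))  # 6
print(next_level(5, 0.6))  # 5
```

In the project, the arousal input would come from fused sensor streams (e-wear, speech, accelerometers) rather than a single scalar reading.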

Excerpt of scientific studies using openSMILE

audEERING’s openSMILE software is widely used in the affective computing research community, as it can be applied to a wide range of automatic audio analysis tasks. The publication list below is an excerpt of the more than 1,000 scientific studies referencing openSMILE and does not include publications by audEERING. A complete list of publications by audEERING and its team members can be found here.

Try openSMILE and use it for:

  1. emotion recognition
  2. personality recognition
  3. depression detection
  4. social interaction analysis
  5. stress recognition
  6. laughter detection
  7. speaker likability recognition
  8. autism diagnosis
  9. virtual agents
  10. bird sound identification
  11. emotional speech synthesis
  12. Parkinson’s disease diagnosis
  13. intoxication detection
  14. intelligibility classification
  15. aggression detection
  16. speech recognition optimization
  17. uncertainty detection
  18. articulatory disorder detection
  19. eating behavior analysis
  20. multimedia event detection
  21. whisper detection
  22. speaking style analysis
  23. head motion synthesis
  24. music mood recognition
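Across all of these tasks, openSMILE follows the same paradigm: compute frame-wise low-level descriptors (LLDs) and summarize each LLD contour with statistical functionals into a fixed-length feature vector. The sketch below illustrates that paradigm in plain Python with a single descriptor (RMS energy) and four functionals; real openSMILE feature sets such as eGeMAPS contain many more descriptors and functionals, and the frame/hop sizes here are just typical values.

```python
# Simplified sketch of the openSMILE paradigm: frame-wise low-level
# descriptors (here only RMS energy) summarized by statistical
# functionals (mean, stdev, min, max). Not the openSMILE API itself.
import math
from statistics import mean, pstdev

def rms_energy_lld(signal, frame_len=400, hop=160):
    """Frame-wise RMS energy (a typical low-level descriptor)."""
    frames = range(0, max(len(signal) - frame_len, 0) + 1, hop)
    return [math.sqrt(mean(x * x for x in signal[i:i + frame_len])) for i in frames]

def functionals(lld):
    """Summarize an LLD contour into a fixed-length feature vector."""
    return {"mean": mean(lld), "stdev": pstdev(lld), "min": min(lld), "max": max(lld)}

# 1 s of a synthetic 100 Hz tone at 16 kHz stands in for real audio.
sr = 16000
signal = [math.sin(2 * math.pi * 100 * t / sr) for t in range(sr)]
feats = functionals(rms_energy_lld(signal))
print(sorted(feats))  # ['max', 'mean', 'min', 'stdev']
```

Because the functionals produce a fixed-length vector regardless of recording length, the output can feed any standard classifier, which is why one toolkit serves tasks as different as depression detection and bird sound identification.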

Emotion Recognition

Dhall, A., Goecke, R., Joshi, J., Wagner, M., & Gedeon, T. (2013, December). Emotion recognition in the wild challenge 2013. In Proceedings of the 15th ACM on International conference on multimodal interaction (pp. 509-516). ACM.

Dhall, A., Goecke, R., Joshi, J., Sikka, K., & Gedeon, T. (2014, November). Emotion recognition in the wild challenge 2014: Baseline, data and protocol. In Proceedings of the 16th International Conference on Multimodal Interaction (pp. 461-466). ACM.

Liu, M., Wang, R., Li, S., Shan, S., Huang, Z., & Chen, X. (2014, November). Combining multiple kernel methods on riemannian manifold for emotion recognition in the wild. In Proceedings of the 16th International Conference on Multimodal Interaction (pp. 494-501). ACM.

Dhall, A., Ramana Murthy, O. V., Goecke, R., Joshi, J., & Gedeon, T. (2015, November). Video and image based emotion recognition challenges in the wild: Emotiw 2015. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (pp. 423-426). ACM.

Savran, A., Cao, H., Shah, M., Nenkova, A., & Verma, R. (2012, October). Combining video, audio and lexical indicators of affect in spontaneous conversation via particle filtering. In Proceedings of the 14th ACM international conference on Multimodal interaction (pp. 485-492). ACM.

Poria, S., Cambria, E., & Gelbukh, A. F. (2015, September). Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-level Multimodal Sentiment Analysis. In EMNLP (pp. 2539-2544).

Zheng, W., Xin, M., Wang, X., & Wang, B. (2014). A novel speech emotion recognition method via incomplete sparse least square regression. IEEE Signal Processing Letters, 21(5), 569-572.

Bhattacharya, A., Wu, W., & Yang, Z. (2012). Quality of experience evaluation of voice communication: an affect-based approach. Human-centric Computing and Information Sciences, 2(1), 7.

Bone, D., Lee, C. C., & Narayanan, S. (2014). Robust unsupervised arousal rating: A rule-based framework with knowledge-inspired vocal features. IEEE Transactions on Affective Computing, 5(2), 201-213.

Liu, M., Wang, R., Huang, Z., Shan, S., & Chen, X. (2013, December). Partial least squares regression on grassmannian manifold for emotion recognition. In Proceedings of the 15th ACM on International conference on multimodal interaction (pp. 525-530). ACM.

Audhkhasi, K., & Narayanan, S. (2013). A globally-variant locally-constant model for fusion of labels from multiple diverse experts without using reference labels. IEEE transactions on pattern analysis and machine intelligence, 35(4), 769-783.

Mariooryad, S., & Busso, C. (2013). Exploring cross-modality affective reactions for audiovisual emotion recognition. IEEE Transactions on affective computing, 4(2), 183-196.

Chen, J., Chen, Z., Chi, Z., & Fu, H. (2014, November). Emotion recognition in the wild with feature fusion and multiple kernel learning. In Proceedings of the 16th International Conference on Multimodal Interaction (pp. 508-513). ACM.

Rosenberg, A. (2012). Classifying Skewed Data: Importance Weighting to Optimize Average Recall. In Interspeech (pp. 2242-2245).

Sun, R., & Moore, E. (2011). Investigating glottal parameters and teager energy operators in emotion recognition. Affective computing and intelligent interaction, 425-434.

Sun, B., Li, L., Zuo, T., Chen, Y., Zhou, G., & Wu, X. (2014, November). Combining multimodal features with hierarchical classifier fusion for emotion recognition in the wild. In Proceedings of the 16th International Conference on Multimodal Interaction (pp. 481-486). ACM.

Mariooryad, S., & Busso, C. (2015). Correcting time-continuous emotional labels by modeling the reaction lag of evaluators. IEEE Transactions on Affective Computing, 6(2), 97-108.

Ivanov, A., & Riccardi, G. (2012, March). Kolmogorov-Smirnov test for feature selection in emotion recognition from speech. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on (pp. 5125-5128). IEEE.

Mariooryad, S., & Busso, C. (2013, September). Analysis and compensation of the reaction lag of evaluators in continuous emotional annotations. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on (pp. 85-90). IEEE.

Alonso-Martín, F., Malfaz, M., Sequeira, J., Gorostiza, J. F., & Salichs, M. A. (2013). A multimodal emotion detection system during human–robot interaction. Sensors, 13(11), 15549-15581.

Moore, J. D., Tian, L., & Lai, C. (2014, April). Word-level emotion recognition using high-level features. In International Conference on Intelligent Text Processing and Computational Linguistics (pp. 17-31). Springer Berlin Heidelberg.

Cao, H., Verma, R., & Nenkova, A. (2015). Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech. Computer speech & language, 29(1), 186-202.

Mariooryad, S., & Busso, C. (2014). Compensating for speaker or lexical variabilities in speech for emotion recognition. Speech Communication, 57, 1-12.

Wu, C. H., Lin, J. C., & Wei, W. L. (2014). Survey on audiovisual emotion recognition: databases, features, and data fusion strategies. APSIPA transactions on signal and information processing, 3, e12.

Busso, C., Mariooryad, S., Metallinou, A., & Narayanan, S. (2013). Iterative feature normalization scheme for automatic emotion detection from speech. IEEE transactions on Affective computing, 4(4), 386-397.

Galanis, D., Karabetsos, S., Koutsombogera, M., Papageorgiou, H., Esposito, A., & Riviello, M. T. (2013, December). Classification of emotional speech units in call centre interactions. In Cognitive Infocommunications (CogInfoCom), 2013 IEEE 4th International Conference on (pp. 403-406). IEEE.

Sidorov, M., Brester, C., Minker, W., & Semenkin, E. (2014, May). Speech-Based Emotion Recognition: Feature Selection by Self-Adaptive Multi-Criteria Genetic Algorithm. In LREC (pp. 3481-3485).

Oflazoglu, C., & Yildirim, S. (2013). Recognizing emotion from Turkish speech using acoustic features. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1), 26.

Kaya, H., & Salah, A. A. (2016). Combining modality-specific extreme learning machines for emotion recognition in the wild. Journal on Multimodal User Interfaces, 10(2), 139-149.

Amer, M. R., Siddiquie, B., Richey, C., & Divakaran, A. (2014, May). Emotion detection in speech using deep networks. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 3724-3728). IEEE.

Poria, S., Chaturvedi, I., Cambria, E., & Hussain, A. (2016, December). Convolutional MKL based multimodal emotion recognition and sentiment analysis. In Data Mining (ICDM), 2016 IEEE 16th International Conference on (pp. 439-448). IEEE.

Kaya, H., Çilli, F., & Salah, A. A. (2014, November). Ensemble CCA for continuous emotion prediction. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge (pp. 19-26). ACM.

Mariooryad, S., Lotfian, R., & Busso, C. (2014, September). Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora. In INTERSPEECH (pp. 238-242).

Busso, C., Parthasarathy, S., Burmania, A., AbdelWahab, M., Sadoughi, N., & Provost, E. M. (2017). MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception. IEEE Transactions on Affective Computing, 8(1), 67-80.

Jin, Q., Li, C., Chen, S., & Wu, H. (2015, April). Speech emotion recognition with acoustic and lexical features. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on (pp. 4749-4753). IEEE.

Song, P., Jin, Y., Zhao, L., & Xin, M. (2014). Speech emotion recognition using transfer learning. IEICE Transactions on Information and Systems, 97(9), 2530-2532.

Huang, D. Y., Zhang, Z., & Ge, S. S. (2014). Speaker state classification based on fusion of asymmetric simple partial least squares (SIMPLS) and support vector machines. Computer Speech & Language, 28(2), 392-419.

Sun, Y., Wen, G., & Wang, J. (2015). Weighted spectral features based on local Hu moments for speech emotion recognition. Biomedical signal processing and control, 18, 80-90.

Kaya, H., Gürpinar, F., Afshar, S., & Salah, A. A. (2015, November). Contrasting and combining least squares based learners for emotion recognition in the wild. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (pp. 459-466). ACM.

Banda, N., & Robinson, P. (2011, November). Noise analysis in audio-visual emotion recognition. In Proceedings of the International Conference on Multimodal Interaction (pp. 1-4).

Chen, S., & Jin, Q. (2015, October). Multi-modal dimensional emotion recognition using recurrent neural networks. In Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge (pp. 49-56). ACM.

Audhkhasi, K., Sethy, A., Ramabhadran, B., & Narayanan, S. S. (2012, March). Creating ensemble of diverse maximum entropy models. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on (pp. 4845-4848). IEEE.

Lubis, N., Sakti, S., Neubig, G., Toda, T., Purwarianti, A., & Nakamura, S. (2016). Emotion and its triggers in human spoken dialogue: Recognition and analysis. In Situated Dialog in Speech-Based Human-Computer Interaction (pp. 103-110). Springer International Publishing.

Song, P., Jin, Y., Zha, C., & Zhao, L. (2014). Speech emotion recognition method based on hidden factor analysis. Electronics Letters, 51(1), 112-114.

Dhall, A., Goecke, R., Joshi, J., Wagner, M., & Gedeon, T. (2013, December). Emotion recognition in the wild challenge (EmotiW) challenge and workshop summary. In Proceedings of the 15th ACM on International conference on multimodal interaction (pp. 371-372). ACM.

Chen, L., Yoon, S. Y., Leong, C. W., Martin, M., & Ma, M. (2014, November). An initial analysis of structured video interviews by using multimodal emotion detection. In Proceedings of the 2014 workshop on Emotion Representation and Modelling in Human-Computer-Interaction-Systems (pp. 1-6). ACM.

Brester, C., Semenkin, E., Sidorov, M., & Minker, W. (2014). Self-adaptive multi-objective genetic algorithms for feature selection. In Proceedings of International Conference on Engineering and Applied Sciences Optimization (pp. 1838-1846).

Personality Recognition

Vinciarelli, A., & Mohammadi, G. (2014). A survey of personality computing. IEEE Transactions on Affective Computing, 5(3), 273-291.

Pohjalainen, J., Räsänen, O., & Kadioglu, S. (2015). Feature selection methods and their combinations in high-dimensional classification of speaker likability, intelligibility and personality traits. Computer Speech & Language, 29(1), 145-171.

Ivanov, A. V., Riccardi, G., Sporka, A. J., & Franc, J. (2011). Recognition of Personality Traits from Human Spoken Conversations. In INTERSPEECH (pp. 1549-1552).

Chastagnol, C., & Devillers, L. (2012). Personality traits detection using a parallelized modified SFFS algorithm. computing, 15, 16.

Alam, F., & Riccardi, G. (2013, August). Comparative study of speaker personality traits recognition in conversational and broadcast news speech. In INTERSPEECH (pp. 2851-2855).

Alam, F., & Riccardi, G. (2014, May). Fusion of acoustic, linguistic and psycholinguistic features for speaker personality traits recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 955-959). IEEE.

Depression Detection

Grünerbl, A., Muaremi, A., Osmani, V., Bahle, G., Oehler, S., Tröster, G., … & Lukowicz, P. (2015). Smartphone-based recognition of states and state changes in bipolar disorder patients. IEEE Journal of Biomedical and Health Informatics, 19(1), 140-148.

Gravenhorst, F., Muaremi, A., Bardram, J., Grünerbl, A., Mayora, O., Wurzer, G., … & Tröster, G. (2015). Mobile phones as medical devices in mental disorder treatment: an overview. Personal and Ubiquitous Computing, 19(2), 335-353.

Cummins, N., Joshi, J., Dhall, A., Sethu, V., Goecke, R., & Epps, J. (2013, October). Diagnosis of depression by behavioural signals: a multimodal approach. In Proceedings of the 3rd ACM international workshop on Audio/visual emotion challenge (pp. 11-20). ACM.

Alghowinem, S., Goecke, R., Wagner, M., Epps, J., Breakspear, M., & Parker, G. (2012, May). From Joyous to Clinically Depressed: Mood Detection Using Spontaneous Speech. In FLAIRS Conference.

Joshi, J., Goecke, R., Alghowinem, S., Dhall, A., Wagner, M., Epps, J., … & Breakspear, M. (2013). Multimodal assistive technologies for depression diagnosis and monitoring. Journal on Multimodal User Interfaces, 7(3), 217-228.

Alghowinem, S., Goecke, R., Wagner, M., Epps, J., Breakspear, M., & Parker, G. (2013, May). Detecting depression: a comparison between spontaneous and read speech. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 7547-7551). IEEE.

Cummins, N., Epps, J., Sethu, V., Breakspear, M., & Goecke, R. (2013, August). Modeling spectral variability for the classification of depressed speech. In Interspeech (pp. 857-861).

Alghowinem, S., Goecke, R., Wagner, M., Epps, J., Gedeon, T., Breakspear, M., & Parker, G. (2013, May). A comparative study of different classifiers for detecting depression from spontaneous speech. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 8022-8026). IEEE.

Gupta, R., Malandrakis, N., Xiao, B., Guha, T., Van Segbroeck, M., Black, M., … & Narayanan, S. (2014, November). Multimodal prediction of affective dimensions and depression in human-computer interactions. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge (pp. 33-40). ACM.

Karam, Z. N., Provost, E. M., Singh, S., Montgomery, J., Archer, C., Harrington, G., & Mcinnis, M. G. (2014, May). Ecologically valid long-term mood monitoring of individuals with bipolar disorder using speech. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 4858-4862). IEEE.

Mitra, V., Shriberg, E., McLaren, M., Kathol, A., Richey, C., Vergyri, D., & Graciarena, M. (2014, November). The SRI AVEC-2014 evaluation system. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge (pp. 93-101). ACM.

Sidorov, M., & Minker, W. (2014, November). Emotion recognition and depression diagnosis by acoustic and visual features: A multimodal approach. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge (pp. 81-86). ACM.

Kaya, H., & Salah, A. A. (2014, November). Eyes whisper depression: A cca based multimodal approach. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 961-964). ACM.

Hönig, F., Batliner, A., Nöth, E., Schnieder, S., & Krajewski, J. (2014, September). Automatic modelling of depressed speech: relevant features and relevance of gender. In INTERSPEECH (pp. 1248-1252).

Alghowinem, S., Goecke, R., Wagner, M., Epps, J., Parker, G., & Breakspear, M. (2013). Characterising depressed speech for classification. In Interspeech (pp. 2534-2538).

Asgari, M., Shafran, I., & Sheeber, L. B. (2014, September). Inferring clinical depression from speech and spoken utterances. In Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on (pp. 1-5). IEEE.

Social Interaction Analysis

Rehg, J., Abowd, G., Rozga, A., Romero, M., Clements, M., Sclaroff, S., … & Rao, H. (2013). Decoding children’s social behavior. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3414-3421).

Wagner, J., Lingenfelser, F., Baur, T., Damian, I., Kistler, F., & André, E. (2013, October). The social signal interpretation (SSI) framework: multimodal signal processing and recognition in real-time. In Proceedings of the 21st ACM international conference on Multimedia (pp. 831-834). ACM.

Black, M. P., Katsamanis, A., Baucom, B. R., Lee, C. C., Lammert, A. C., Christensen, A., … & Narayanan, S. S. (2013). Toward automating a human behavioral coding system for married couples’ interactions using speech acoustic features. Speech Communication, 55(1), 1-21.

Lee, C. C., Katsamanis, A., Black, M. P., Baucom, B. R., Christensen, A., Georgiou, P. G., & Narayanan, S. S. (2014). Computing vocal entrainment: A signal-derived PCA-based quantification scheme with application to affect analysis in married couple interactions. Computer Speech & Language, 28(2), 518-539.

Black, M., Georgiou, P. G., Katsamanis, A., Baucom, B. R., & Narayanan, S. S. (2011, August). “You made me do it”: Classification of blame in married couples’ interactions by fusing automatically derived speech and language information. In Interspeech (pp. 89-92).

Lubold, N., & Pon-Barry, H. (2014, November). Acoustic-prosodic entrainment and rapport in collaborative learning dialogues. In Proceedings of the 2014 ACM workshop on Multimodal Learning Analytics Workshop and Grand Challenge (pp. 5-12). ACM.

Neiberg, D., & Gustafson, J. (2011). Predicting Speaker Changes and Listener Responses with and without Eye-Contact. In INTERSPEECH (pp. 1565-1568).

Wagner, J., Lingenfelser, F., & André, E. (2013). Using phonetic patterns for detecting social cues in natural conversations. In INTERSPEECH (pp. 168-172).

Avril, M., Leclère, C., Viaux, S., Michelet, S., Achard, C., Missonnier, S., … & Chetouani, M. (2014). Social signal processing for studying parent–infant interaction. Frontiers in psychology, 5, 1437.

Stress Recognition

Muaremi, A., Arnrich, B., & Tröster, G. (2013). Towards measuring stress with smartphones and wearable devices during workday and sleep. BioNanoScience, 3(2), 172-183.

Van Segbroeck, M., Travadi, R., Vaz, C., Kim, J., Black, M. P., Potamianos, A., & Narayanan, S. S. (2014, September). Classification of cognitive load from speech using an i-vector framework. In INTERSPEECH (pp. 751-755).

Laughter Detection

Niewiadomski, R., Hofmann, J., Urbain, J., Platt, T., Wagner, J., Piot, B., … & Geist, M. (2013, May). Laugh-aware virtual agent and its impact on user amusement. In Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems.

Gupta, R., Audhkhasi, K., Lee, S., & Narayanan, S. (2013). Paralinguistic event detection from speech using probabilistic time-series smoothing and masking. In Interspeech (pp. 173-177).

Oh, J., Cho, E., & Slaney, M. (2013, August). Characteristic contours of syllabic-level units in laughter. In Interspeech (pp. 158-162).

Speaker Likability Recognition

Pohjalainen, J., Räsänen, O., & Kadioglu, S. (2015). Feature selection methods and their combinations in high-dimensional classification of speaker likability, intelligibility and personality traits. Computer Speech & Language, 29(1), 145-171.

Autism Diagnosis

Bone, D., Lee, C. C., Black, M. P., Williams, M. E., Lee, S., Levitt, P., & Narayanan, S. (2014). The psychologist as an interlocutor in autism spectrum disorder assessment: Insights from a study of spontaneous prosody. Journal of Speech, Language, and Hearing Research, 57(4), 1162-1177.

Räsänen, O., & Pohjalainen, J. (2013, August). Random subset feature selection in automatic recognition of developmental disorders, affective states, and level of conflict from speech. In INTERSPEECH (pp. 210-214).

Bone, D., Chaspari, T., Audhkhasi, K., Gibson, J., Tsiartas, A., Van Segbroeck, M., … & Narayanan, S. (2013). Classifying language-related developmental disorders from speech cues: the promise and the potential confounds. In INTERSPEECH (pp. 182-186).

Virtual Agents

Reidsma, D., de Kok, I., Neiberg, D., Pammi, S. C., van Straalen, B., Truong, K., & van Welbergen, H. (2011). Continuous interaction with a virtual human. Journal on Multimodal User Interfaces, 4(2), 97-118.

Bevacqua, E., De Sevin, E., Hyniewska, S. J., & Pelachaud, C. (2012). A listener model: introducing personality traits. Journal on Multimodal User Interfaces, 6(1-2), 27-38.

Kopp, S., van Welbergen, H., Yaghoubzadeh, R., & Buschmeier, H. (2014). An architecture for fluid real-time conversational agents: integrating incremental output generation and input processing. Journal on Multimodal User Interfaces, 8(1), 97-108.

Neiberg, D., & Truong, K. P. (2011, May). Online detection of vocal listener responses with maximum latency constraints. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on (pp. 5836-5839). IEEE.

Maat, M. (2011). Response selection and turn-taking for a sensitive artificial listening agent. University of Twente.

Gebhard, P., Baur, T., Damian, I., Mehlmann, G., Wagner, J., & André, E. (2014, May). Exploring interaction strategies for virtual characters to induce stress in simulated job interviews. In Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems (pp. 661-668). International Foundation for Autonomous Agents and Multiagent Systems.

Bird Sound Identification

Potamitis, I., Ntalampiras, S., Jahn, O., & Riede, K. (2014). Automatic bird sound detection in long real-field recordings: Applications and tools. Applied Acoustics, 80, 1-9.

Goëau, H., Glotin, H., Vellinga, W. P., Planqué, R., Rauber, A., & Joly, A. (2014, September). LifeCLEF bird identification task 2014. In CLEF2014.

Lasseck, M. (2014). Large-scale Identification of Birds in Audio Recordings. In CLEF (Working Notes) (pp. 643-653).

Emotional Speech Synthesis Research

Black, A. W., Bunnell, H. T., Dou, Y., Muthukumar, P. K., Metze, F., Perry, D., … & Vaughn, C. (2012, March). Articulatory features for expressive speech synthesis. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on (pp. 4005-4008). IEEE.

Steidl, S., Polzehl, T., Bunnell, H. T., Dou, Y., Muthukumar, P. K., Perry, D., … & Metze, F. (2012). Emotion identification for evaluation of synthesized emotional speech.

Parkinson’s Disease Diagnosis

Bayestehtashk, A., Asgari, M., Shafran, I., & McNames, J. (2015). Fully automated assessment of the severity of Parkinson’s disease from speech. Computer speech & language, 29(1), 172-185.

Bocklet, T., Steidl, S., Nöth, E., & Skodda, S. (2013). Automatic evaluation of Parkinson’s speech: acoustic, prosodic and voice related cues. In Interspeech (pp. 1149-1153).

Intoxication Detection

Gajšek, R., Mihelic, F., & Dobrišek, S. (2013). Speaker state recognition using an HMM-based feature extraction method. Computer Speech & Language, 27(1), 135-150.

Bone, D., Li, M., Black, M. P., & Narayanan, S. S. (2014). Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors. Computer speech & language, 28(2), 375-391.

Suendermann-Oeft, D., Ramanarayanan, V., Teckenbrock, M., Neutatz, F., & Schmidt, D. (2015). HALEF: An Open-Source Standard-Compliant Telephony-Based Modular Spoken Dialog System: A Review and An Outlook. In Natural Language Dialog Systems and Intelligent Assistants (pp. 53-61). Springer International Publishing.

Speech Intelligibility Classification

Kim, J., Kumar, N., Tsiartas, A., Li, M., & Narayanan, S. S. (2015). Automatic intelligibility classification of sentence-level pathological speech. Computer speech & language, 29(1), 132-144.

Aggression Detection

Lefter, I., Rothkrantz, L. J., & Burghouts, G. J. (2013). A comparative study on automatic audio–visual fusion for aggression detection using meta-information. Pattern Recognition Letters, 34(15), 1953-1963.

Speech Recognition Optimization

Audhkhasi, K., Zavou, A. M., Georgiou, P. G., & Narayanan, S. S. (2014). Theoretical analysis of diversity in an ensemble of automatic speech recognition systems. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(3), 711-726.

Uncertainty Detection

Forbes-Riley, K., Litman, D., Friedberg, H., & Drummond, J. (2012, June). Intrinsic and extrinsic evaluation of an automatic user disengagement detector for an uncertainty-adaptive spoken dialogue system. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 91-102). Association for Computational Linguistics

Articulatory Disorder Detection

Cmejla, R., Rusz, J., Bergl, P., & Vokral, J. (2013). Bayesian changepoint detection for the automatic assessment of fluency and articulatory disorders. Speech Communication, 55(1), 178-189.

Eating Behavior Analysis

Kalantarian, H., & Sarrafzadeh, M. (2015). Audio-based detection and evaluation of eating behavior using the smartwatch platform. Computers in biology and medicine, 65, 1-9.

Multimedia Event Detection

Metze, F., Rawat, S., & Wang, Y. (2014, July). Improved audio features for large-scale multimedia event detection. In Multimedia and Expo (ICME), 2014 IEEE International Conference on (pp. 1-6). IEEE.

Rawat, S., Schulam, P. F., Burger, S., Ding, D., Wang, Y., & Metze, F. (2013). Robust audio-codebooks for large-scale event detection in consumer videos.

Whisper Speech Analysis

Tran, T., Mariooryad, S., & Busso, C. (2013, May). Audiovisual corpus to analyze whisper speech. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 8101-8105). IEEE.

Speaking Style Analysis

Mariooryad, S., Kannan, A., Hakkani-Tur, D., & Shriberg, E. (2014, May). Automatic characterization of speaking styles in educational videos. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 4848-4852). IEEE.

Head Motion Synthesis

Ben Youssef, A., Shimodaira, H., & Braude, D. A. (2013). Articulatory features for speech-driven head motion synthesis. Proceedings of Interspeech, Lyon, France.

Music Mood Recognition

Fan, Y., & Xu, M. (2014, October). MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level Regression. In MediaEval.