REFERENCES

Excerpt of clients we work with

audEERING’s audio analysis technology can be experienced in various end-user products developed in cooperation with our clients. Our sensAI framework has been extended into custom speech, music, and sound processing and recognition solutions for consumer research, call-center data analysis, in-car emotion recognition, acoustic scene classification, DJ apps, gaming, and many more applications.

Huawei
BMW
GfK
Red Bull Media House
Algoriddim
Sensum
SANPSY
Spitch
Compedia
Beatclip
Lionapps

Excerpt of excellence partners

audEERING maintains a strong link to academia and continuously advances the state of the art in intelligent audio analysis by actively contributing to various research projects. Parts of audEERING’s research on speech emotion recognition are funded by an ERC Proof of Concept grant from the European Commission.

University of Passau
Technische Universität München (TUM)
European Research Council (ERC)
DAVID Systems

Selected scientific studies applying or referencing audEERING’s openSMILE software

Narayanan, S., & Georgiou, P. G. (2013). Behavioral signal processing: Deriving human behavioral informatics from speech and language. Proceedings of the IEEE, 101(5), 1203-1233.

Vinciarelli, A., & Mohammadi, G. (2014). A survey of personality computing. IEEE Transactions on Affective Computing, 5(3), 273-291.

Dhall, A., Goecke, R., Joshi, J., Wagner, M., & Gedeon, T. (2013, December). Emotion recognition in the wild challenge 2013. In Proceedings of the 15th ACM on International conference on multimodal interaction (pp. 509-516). ACM.

Grünerbl, A., Muaremi, A., Osmani, V., Bahle, G., Oehler, S., Tröster, G., … & Lukowicz, P. (2015). Smartphone-based recognition of states and state changes in bipolar disorder patients. IEEE Journal of Biomedical and Health Informatics, 19(1), 140-148.

Dhall, A., Goecke, R., Joshi, J., Sikka, K., & Gedeon, T. (2014, November). Emotion recognition in the wild challenge 2014: Baseline, data and protocol. In Proceedings of the 16th International Conference on Multimodal Interaction (pp. 461-466). ACM.

Rehg, J., Abowd, G., Rozga, A., Romero, M., Clements, M., Sclaroff, S., … & Rao, H. (2013). Decoding children’s social behavior. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3414-3421).

Wagner, J., Lingenfelser, F., Baur, T., Damian, I., Kistler, F., & André, E. (2013, October). The social signal interpretation (SSI) framework: multimodal signal processing and recognition in real-time. In Proceedings of the 21st ACM international conference on Multimedia (pp. 831-834). ACM.

Joly, A., Goëau, H., Glotin, H., Spampinato, C., Bonnet, P., Vellinga, W. P., … & Müller, H. (2016, September). LifeCLEF 2016: multimedia life species identification challenges. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 286-310). Springer International Publishing.

Black, M. P., Katsamanis, A., Baucom, B. R., Lee, C. C., Lammert, A. C., Christensen, A., … & Narayanan, S. S. (2013). Toward automating a human behavioral coding system for married couples’ interactions using speech acoustic features. Speech Communication, 55(1), 1-21.

Degottex, G., Kane, J., Drugman, T., Raitio, T., & Scherer, S. (2014, May). COVAREP—A collaborative voice analysis repository for speech technologies. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 960-964). IEEE.

Muaremi, A., Arnrich, B., & Tröster, G. (2013). Towards measuring stress with smartphones and wearable devices during workday and sleep. BioNanoScience, 3(2), 172-183.

Liu, M., Wang, R., Li, S., Shan, S., Huang, Z., & Chen, X. (2014, November). Combining multiple kernel methods on Riemannian manifold for emotion recognition in the wild. In Proceedings of the 16th International Conference on Multimodal Interaction (pp. 494-501). ACM.

Dhall, A., Ramana Murthy, O. V., Goecke, R., Joshi, J., & Gedeon, T. (2015, November). Video and image based emotion recognition challenges in the wild: EmotiW 2015. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (pp. 423-426). ACM.

Lee, C. C., Katsamanis, A., Black, M. P., Baucom, B. R., Christensen, A., Georgiou, P. G., & Narayanan, S. S. (2014). Computing vocal entrainment: A signal-derived PCA-based quantification scheme with application to affect analysis in married couple interactions. Computer Speech & Language, 28(2), 518-539.

Wagner, J., Lingenfelser, F., & André, E. (2011). The Social Signal Interpretation Framework (SSI) for Real Time Signal Processing and Recognition. In INTERSPEECH (pp. 3245-3248).

Niewiadomski, R., Hofmann, J., Urbain, J., Platt, T., Wagner, J., Piot, B., … & Geist, M. (2013, May). Laugh-aware virtual agent and its impact on user amusement. In Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems (pp. 619-626). International Foundation for Autonomous Agents and Multiagent Systems.

Pohjalainen, J., Räsänen, O., & Kadioglu, S. (2015). Feature selection methods and their combinations in high-dimensional classification of speaker likability, intelligibility and personality traits. Computer Speech & Language, 29(1), 145-171.

Gravenhorst, F., Muaremi, A., Bardram, J., Grünerbl, A., Mayora, O., Wurzer, G., … & Tröster, G. (2015). Mobile phones as medical devices in mental disorder treatment: an overview. Personal and Ubiquitous Computing, 19(2), 335-353.

Savran, A., Cao, H., Shah, M., Nenkova, A., & Verma, R. (2012, October). Combining video, audio and lexical indicators of affect in spontaneous conversation via particle filtering. In Proceedings of the 14th ACM international conference on Multimodal interaction (pp. 485-492). ACM.

Cummins, N., Joshi, J., Dhall, A., Sethu, V., Goecke, R., & Epps, J. (2013, October). Diagnosis of depression by behavioural signals: a multimodal approach. In Proceedings of the 3rd ACM international workshop on Audio/visual emotion challenge (pp. 11-20). ACM.

Poria, S., Cambria, E., & Gelbukh, A. F. (2015, September). Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-level Multimodal Sentiment Analysis. In EMNLP (pp. 2539-2544).

Bone, D., Lee, C. C., Black, M. P., Williams, M. E., Lee, S., Levitt, P., & Narayanan, S. (2014). The psychologist as an interlocutor in autism spectrum disorder assessment: Insights from a study of spontaneous prosody. Journal of Speech, Language, and Hearing Research, 57(4), 1162-1177.

Alghowinem, S., Goecke, R., Wagner, M., Epps, J., Breakspear, M., & Parker, G. (2012, May). From Joyous to Clinically Depressed: Mood Detection Using Spontaneous Speech. In FLAIRS Conference.

Reidsma, D., de Kok, I., Neiberg, D., Pammi, S. C., van Straalen, B., Truong, K., & van Welbergen, H. (2011). Continuous interaction with a virtual human. Journal on Multimodal User Interfaces, 4(2), 97-118.

Joshi, J., Goecke, R., Alghowinem, S., Dhall, A., Wagner, M., Epps, J., … & Breakspear, M. (2013). Multimodal assistive technologies for depression diagnosis and monitoring. Journal on Multimodal User Interfaces, 7(3), 217-228.

Potamitis, I., Ntalampiras, S., Jahn, O., & Riede, K. (2014). Automatic bird sound detection in long real-field recordings: Applications and tools. Applied Acoustics, 80, 1-9.

Zheng, W., Xin, M., Wang, X., & Wang, B. (2014). A novel speech emotion recognition method via incomplete sparse least square regression. IEEE Signal Processing Letters, 21(5), 569-572.

Räsänen, O., & Pohjalainen, J. (2013, August). Random subset feature selection in automatic recognition of developmental disorders, affective states, and level of conflict from speech. In INTERSPEECH (pp. 210-214).

Bhattacharya, A., Wu, W., & Yang, Z. (2012). Quality of experience evaluation of voice communication: an affect-based approach. Human-centric Computing and Information Sciences, 2(1), 7.

Black, A. W., Bunnell, H. T., Dou, Y., Muthukumar, P. K., Metze, F., Perry, D., … & Vaughn, C. (2012, March). Articulatory features for expressive speech synthesis. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on (pp. 4005-4008). IEEE.

Bone, D., Lee, C. C., & Narayanan, S. (2014). Robust unsupervised arousal rating: A rule-based framework with knowledge-inspired vocal features. IEEE Transactions on Affective Computing, 5(2), 201-213.

Liu, M., Wang, R., Huang, Z., Shan, S., & Chen, X. (2013, December). Partial least squares regression on Grassmannian manifold for emotion recognition. In Proceedings of the 15th ACM on International conference on multimodal interaction (pp. 525-530). ACM.

Goëau, H., Glotin, H., Vellinga, W. P., Planqué, R., Rauber, A., & Joly, A. (2014, September). LifeCLEF bird identification task 2014. In CLEF 2014.

Alghowinem, S., Goecke, R., Wagner, M., Epps, J., Breakspear, M., & Parker, G. (2013, May). Detecting depression: a comparison between spontaneous and read speech. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 7547-7551). IEEE.

Bayestehtashk, A., Asgari, M., Shafran, I., & McNames, J. (2015). Fully automated assessment of the severity of Parkinson’s disease from speech. Computer speech & language, 29(1), 172-185.

Audhkhasi, K., & Narayanan, S. (2013). A globally-variant locally-constant model for fusion of labels from multiple diverse experts without using reference labels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4), 769-783.

Mariooryad, S., & Busso, C. (2013). Exploring cross-modality affective reactions for audiovisual emotion recognition. IEEE Transactions on Affective Computing, 4(2), 183-196.

Cummins, N., Epps, J., Sethu, V., Breakspear, M., & Goecke, R. (2013, August). Modeling spectral variability for the classification of depressed speech. In Interspeech (pp. 857-861).

Alghowinem, S., Goecke, R., Wagner, M., Epps, J., Gedeon, T., Breakspear, M., & Parker, G. (2013, May). A comparative study of different classifiers for detecting depression from spontaneous speech. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 8022-8026). IEEE.

Gupta, R., Malandrakis, N., Xiao, B., Guha, T., Van Segbroeck, M., Black, M., … & Narayanan, S. (2014, November). Multimodal prediction of affective dimensions and depression in human-computer interactions. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge (pp. 33-40). ACM.

Ebrahimi Kahou, S., Michalski, V., Konda, K., Memisevic, R., & Pal, C. (2015, November). Recurrent neural networks for emotion recognition in video. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (pp. 467-474). ACM.

Karam, Z. N., Provost, E. M., Singh, S., Montgomery, J., Archer, C., Harrington, G., & Mcinnis, M. G. (2014, May). Ecologically valid long-term mood monitoring of individuals with bipolar disorder using speech. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 4858-4862). IEEE.

Chen, J., Chen, Z., Chi, Z., & Fu, H. (2014, November). Emotion recognition in the wild with feature fusion and multiple kernel learning. In Proceedings of the 16th International Conference on Multimodal Interaction (pp. 508-513). ACM.

Rosenberg, A. (2012). Classifying Skewed Data: Importance Weighting to Optimize Average Recall. In Interspeech (pp. 2242-2245).

Sun, R., & Moore, E. (2011). Investigating glottal parameters and teager energy operators in emotion recognition. Affective computing and intelligent interaction, 425-434.

Sun, B., Li, L., Zuo, T., Chen, Y., Zhou, G., & Wu, X. (2014, November). Combining multimodal features with hierarchical classifier fusion for emotion recognition in the wild. In Proceedings of the 16th International Conference on Multimodal Interaction (pp. 481-486). ACM.

Ivanov, A. V., Riccardi, G., Sporka, A. J., & Franc, J. (2011). Recognition of Personality Traits from Human Spoken Conversations. In INTERSPEECH (pp. 1549-1552).

Black, M., Georgiou, P. G., Katsamanis, A., Baucom, B. R., & Narayanan, S. S. (2011, August). “You made me do it”: Classification of Blame in Married Couples’ Interactions by Fusing Automatically Derived Speech and Language Information. In Interspeech (pp. 89-92).

Gupta, R., Audhkhasi, K., Lee, S., & Narayanan, S. (2013). Paralinguistic event detection from speech using probabilistic time-series smoothing and masking. In Interspeech (pp. 173-177).

Mariooryad, S., & Busso, C. (2015). Correcting time-continuous emotional labels by modeling the reaction lag of evaluators. IEEE Transactions on Affective Computing, 6(2), 97-108.

Gajšek, R., Mihelic, F., & Dobrišek, S. (2013). Speaker state recognition using an HMM-based feature extraction method. Computer Speech & Language, 27(1), 135-150.

Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., … & Serra, X. (2013, October). Essentia: an open-source library for sound and music analysis. In Proceedings of the 21st ACM international conference on Multimedia (pp. 855-858). ACM.

Ivanov, A., & Riccardi, G. (2012, March). Kolmogorov-Smirnov test for feature selection in emotion recognition from speech. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on (pp. 5125-5128). IEEE.

Bone, D., Li, M., Black, M. P., & Narayanan, S. S. (2014). Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors. Computer speech & language, 28(2), 375-391.

Brueckner, R., & Schuller, B. (2014, May). Social signal classification using deep BLSTM recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 4823-4827). IEEE.

Kim, J., Kumar, N., Tsiartas, A., Li, M., & Narayanan, S. S. (2015). Automatic intelligibility classification of sentence-level pathological speech. Computer speech & language, 29(1), 132-144.

Bevacqua, E., De Sevin, E., Hyniewska, S. J., & Pelachaud, C. (2012). A listener model: introducing personality traits. Journal on Multimodal User Interfaces, 6(1-2), 27-38.

Mariooryad, S., & Busso, C. (2013, September). Analysis and compensation of the reaction lag of evaluators in continuous emotional annotations. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on (pp. 85-90). IEEE.

Lefter, I., Rothkrantz, L. J., & Burghouts, G. J. (2013). A comparative study on automatic audio–visual fusion for aggression detection using meta-information. Pattern Recognition Letters, 34(15), 1953-1963.

Alonso-Martín, F., Malfaz, M., Sequeira, J., Gorostiza, J. F., & Salichs, M. A. (2013). A multimodal emotion detection system during human–robot interaction. Sensors, 13(11), 15549-15581.

Moore, J. D., Tian, L., & Lai, C. (2014, April). Word-level emotion recognition using high-level features. In International Conference on Intelligent Text Processing and Computational Linguistics (pp. 17-31). Springer Berlin Heidelberg.

Audhkhasi, K., Zavou, A. M., Georgiou, P. G., & Narayanan, S. S. (2014). Theoretical analysis of diversity in an ensemble of automatic speech recognition systems. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(3), 711-726.

Cao, H., Verma, R., & Nenkova, A. (2015). Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech. Computer speech & language, 29(1), 186-202.

Mariooryad, S., & Busso, C. (2014). Compensating for speaker or lexical variabilities in speech for emotion recognition. Speech Communication, 57, 1-12.

Mitra, V., Shriberg, E., McLaren, M., Kathol, A., Richey, C., Vergyri, D., & Graciarena, M. (2014, November). The SRI AVEC-2014 evaluation system. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge (pp. 93-101). ACM.

Wu, C. H., Lin, J. C., & Wei, W. L. (2014). Survey on audiovisual emotion recognition: databases, features, and data fusion strategies. APSIPA Transactions on Signal and Information Processing, 3, e12.

Forbes-Riley, K., Litman, D., Friedberg, H., & Drummond, J. (2012, June). Intrinsic and extrinsic evaluation of an automatic user disengagement detector for an uncertainty-adaptive spoken dialogue system. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 91-102). Association for Computational Linguistics.

Busso, C., Mariooryad, S., Metallinou, A., & Narayanan, S. (2013). Iterative feature normalization scheme for automatic emotion detection from speech. IEEE Transactions on Affective Computing, 4(4), 386-397.

Galanis, D., Karabetsos, S., Koutsombogera, M., Papageorgiou, H., Esposito, A., & Riviello, M. T. (2013, December). Classification of emotional speech units in call centre interactions. In Cognitive Infocommunications (CogInfoCom), 2013 IEEE 4th International Conference on (pp. 403-406). IEEE.

Van Segbroeck, M., Travadi, R., Vaz, C., Kim, J., Black, M. P., Potamianos, A., & Narayanan, S. S. (2014, September). Classification of cognitive load from speech using an i-vector framework. In INTERSPEECH (pp. 751-755).

Cmejla, R., Rusz, J., Bergl, P., & Vokral, J. (2013). Bayesian changepoint detection for the automatic assessment of fluency and articulatory disorders. Speech Communication, 55(1), 178-189.

Urbain, J., Cakmak, H., & Dutoit, T. (2013, September). Automatic phonetic transcription of laughter and its application to laughter synthesis. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on (pp. 153-158). IEEE.

Kalantarian, H., & Sarrafzadeh, M. (2015). Audio-based detection and evaluation of eating behavior using the smartwatch platform. Computers in biology and medicine, 65, 1-9.

Lasseck, M. (2014). Large-scale Identification of Birds in Audio Recordings. In CLEF (Working Notes) (pp. 643-653).

Sidorov, M., Brester, C., Minker, W., & Semenkin, E. (2014, May). Speech-Based Emotion Recognition: Feature Selection by Self-Adaptive Multi-Criteria Genetic Algorithm. In LREC (pp. 3481-3485).

Oflazoglu, C., & Yildirim, S. (2013). Recognizing emotion from Turkish speech using acoustic features. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1), 26.

Kaya, H., & Salah, A. A. (2016). Combining modality-specific extreme learning machines for emotion recognition in the wild. Journal on Multimodal User Interfaces, 10(2), 139-149.

Amer, M. R., Siddiquie, B., Richey, C., & Divakaran, A. (2014, May). Emotion detection in speech using deep networks. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 3724-3728). IEEE.

Poria, S., Chaturvedi, I., Cambria, E., & Hussain, A. (2016, December). Convolutional MKL based multimodal emotion recognition and sentiment analysis. In Data Mining (ICDM), 2016 IEEE 16th International Conference on (pp. 439-448). IEEE.

Ganchev, T. (2011). Contemporary methods for speech parameterization (pp. 1-106). Springer New York.

Kopp, S., van Welbergen, H., Yaghoubzadeh, R., & Buschmeier, H. (2014). An architecture for fluid real-time conversational agents: integrating incremental output generation and input processing. Journal on Multimodal User Interfaces, 8 (1), 97-108.

Kaya, H., Çilli, F., & Salah, A. A. (2014, November). Ensemble CCA for continuous emotion prediction. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge (pp. 19-26). ACM.

Mariooryad, S., Lotfian, R., & Busso, C. (2014, September). Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora. In INTERSPEECH (pp. 238-242).

Muñoz, D., Gutierrez, F. J., Ochoa, S. F., & Baloian, N. (2013, December). Enhancing Social Interaction between Older Adults and Their Families. In IWAAL (pp. 47-54).

Muaremi, A., Gravenhorst, F., Grünerbl, A., Arnrich, B., & Tröster, G. (2014, May). Assessing bipolar episodes using speech cues derived from phone calls. In International Symposium on Pervasive Computing Paradigms for Mental Health (pp. 103-114). Springer International Publishing.

Neiberg, D., & Truong, K. P. (2011, May). Online detection of vocal listener responses with maximum latency constraints. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on (pp. 5836-5839). IEEE.

Bone, D., Chaspari, T., Audhkhasi, K., Gibson, J., Tsiartas, A., Van Segbroeck, M., … & Narayanan, S. (2013). Classifying language-related developmental disorders from speech cues: the promise and the potential confounds. In INTERSPEECH (pp. 182-186).

Sidorov, M., & Minker, W. (2014, November). Emotion recognition and depression diagnosis by acoustic and visual features: A multimodal approach. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge (pp. 81-86). ACM.

Chastagnol, C., & Devillers, L. (2012). Personality traits detection using a parallelized modified SFFS algorithm.

Busso, C., Parthasarathy, S., Burmania, A., AbdelWahab, M., Sadoughi, N., & Provost, E. M. (2017). MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception. IEEE Transactions on Affective Computing, 8(1), 67-80.

Metze, F., Rawat, S., & Wang, Y. (2014, July). Improved audio features for large-scale multimedia event detection. In Multimedia and Expo (ICME), 2014 IEEE International Conference on (pp. 1-6). IEEE.

Kaya, H., & Salah, A. A. (2014, November). Eyes whisper depression: A cca based multimodal approach. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 961-964). ACM.

Knauf, R., Kürsten, J., Kurze, A., Ritter, M., Berger, A., Heinich, S., & Eibl, M. (2011, December). Produce. Annotate. Archive. Repurpose: accelerating the composition and metadata accumulation of TV content. In Proceedings of the 2011 ACM international workshop on Automated media analysis and production for novel TV services (pp. 31-36). ACM.

ter Maat, M. (2011). Response selection and turn-taking for a sensitive artificial listening agent (Doctoral dissertation, University of Twente).

Lubold, N., & Pon-Barry, H. (2014, November). Acoustic-prosodic entrainment and rapport in collaborative learning dialogues. In Proceedings of the 2014 ACM workshop on Multimodal Learning Analytics Workshop and Grand Challenge (pp. 5-12). ACM.

Suendermann-Oeft, D., Ramanarayanan, V., Teckenbrock, M., Neutatz, F., & Schmidt, D. (2015). HALEF: An Open-Source Standard-Compliant Telephony-Based Modular Spoken Dialog System: A Review and An Outlook. In Natural Language Dialog Systems and Intelligent Assistants (pp. 53-61). Springer International Publishing.

Jin, Q., Li, C., Chen, S., & Wu, H. (2015, April). Speech emotion recognition with acoustic and lexical features. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on (pp. 4749-4753). IEEE.

Song, P., Jin, Y., Zhao, L., & Xin, M. (2014). Speech emotion recognition using transfer learning. IEICE Transactions on Information and Systems, 97(9), 2530-2532.

Huang, D. Y., Zhang, Z., & Ge, S. S. (2014). Speaker state classification based on fusion of asymmetric simple partial least squares (SIMPLS) and support vector machines. Computer Speech & Language, 28(2), 392-419.

Sun, Y., Wen, G., & Wang, J. (2015). Weighted spectral features based on local Hu moments for speech emotion recognition. Biomedical signal processing and control, 18, 80-90.

Alam, F., & Riccardi, G. (2013, August). Comparative study of speaker personality traits recognition in conversational and broadcast news speech. In INTERSPEECH (pp. 2851-2855).

Hönig, F., Batliner, A., Nöth, E., Schnieder, S., & Krajewski, J. (2014, September). Automatic modelling of depressed speech: relevant features and relevance of gender. In INTERSPEECH (pp. 1248-1252).

Alghowinem, S., Goecke, R., Wagner, M., Epps, J., Parker, G., & Breakspear, M. (2013). Characterising depressed speech for classification. In Interspeech (pp. 2534-2538).

Kaya, H., Gürpinar, F., Afshar, S., & Salah, A. A. (2015, November). Contrasting and combining least squares based learners for emotion recognition in the wild. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (pp. 459-466). ACM.

Kaya, H., Özkaptan, T., Salah, A. A., & Gürgen, F. (2015). Random discriminative projection based feature selection with application to conflict recognition. IEEE Signal Processing Letters, 22(6), 671-675.

Banda, N., & Robinson, P. (2011, November). Noise analysis in audio-visual emotion recognition. In Proceedings of the International Conference on Multimodal Interaction (pp. 1-4).

Steidl, S., Polzehl, T., Bunnell, H. T., Dou, Y., Muthukumar, P. K., Perry, D., … & Metze, F. (2012). Emotion identification for evaluation of synthesized emotional speech.

Tick, D., Rahman, T., Busso, C., & Gans, N. (2012, May). Indoor robotic terrain classification via angular velocity based hierarchical classifier selection. In Robotics and Automation (ICRA), 2012 IEEE International Conference On (pp. 3594-3600). IEEE.

Oh, J., Cho, E., & Slaney, M. (2013, August). Characteristic contours of syllabic-level units in laughter. In Interspeech (pp. 158-162).

Chen, S., & Jin, Q. (2015, October). Multi-modal dimensional emotion recognition using recurrent neural networks. In Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge (pp. 49-56). ACM.

Ben-Youssef, A., Shimodaira, H., & Braude, D. A. (2014, May). Speech driven talking head from estimated articulatory features. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 4573-4577). IEEE.

Audhkhasi, K., Sethy, A., Ramabhadran, B., & Narayanan, S. S. (2012, March). Creating ensemble of diverse maximum entropy models. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on (pp. 4845-4848). IEEE.

Bocklet, T., Steidl, S., Nöth, E., & Skodda, S. (2013). Automatic evaluation of Parkinson’s speech: acoustic, prosodic and voice related cues. In Interspeech (pp. 1149-1153).

Lubis, N., Sakti, S., Neubig, G., Toda, T., Purwarianti, A., & Nakamura, S. (2016). Emotion and its triggers in human spoken dialogue: Recognition and analysis. In Situated Dialog in Speech-Based Human-Computer Interaction (pp. 103-110). Springer International Publishing.

Tran, T., Mariooryad, S., & Busso, C. (2013, May). Audiovisual corpus to analyze whisper speech. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 8101-8105). IEEE.

Rawat, S., Schulam, P. F., Burger, S., Ding, D., Wang, Y., & Metze, F. (2013). Robust audio-codebooks for large-scale event detection in consumer videos.

Gebhard, P., Baur, T., Damian, I., Mehlmann, G., Wagner, J., & André, E. (2014, May). Exploring interaction strategies for virtual characters to induce stress in simulated job interviews. In Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems (pp. 661-668). International Foundation for Autonomous Agents and Multiagent Systems.

Song, P., Jin, Y., Zha, C., & Zhao, L. (2014). Speech emotion recognition method based on hidden factor analysis. Electronics Letters, 51(1), 112-114.

Neiberg, D., & Gustafson, J. (2011). Predicting Speaker Changes and Listener Responses with and without Eye-Contact. In INTERSPEECH (pp. 1565-1568).

Wagner, J., Lingenfelser, F., & André, E. (2013). Using phonetic patterns for detecting social cues in natural conversations. In INTERSPEECH (pp. 168-172).

Cowley, B., Filetti, M., Lukander, K., Torniainen, J., Henelius, A., Ahonen, L., … & Ravaja, N. (2016). The Psychophysiology Primer: A Guide to Methods and a Broad Review with a Focus on Human–Computer Interaction. Foundations and Trends® in Human–Computer Interaction, 9(3-4), 151-308.

Mariooryad, S., Kannan, A., Hakkani-Tur, D., & Shriberg, E. (2014, May). Automatic characterization of speaking styles in educational videos. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 4848-4852). IEEE.

Ben Youssef, A., Shimodaira, H., & Braude, D. A. (2013). Articulatory features for speech-driven head motion synthesis. Proceedings of Interspeech, Lyon, France.

Asgari, M., Shafran, I., & Sheeber, L. B. (2014, September). Inferring clinical depression from speech and spoken utterances. In Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on (pp. 1-5). IEEE.

Alam, F., & Riccardi, G. (2014, May). Fusion of acoustic, linguistic and psycholinguistic features for speaker personality traits recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 955-959). IEEE.

Avril, M., Leclère, C., Viaux, S., Michelet, S., Achard, C., Missonnier, S., … & Chetouani, M. (2014). Social signal processing for studying parent–infant interaction. Frontiers in psychology, 5, 1437.

Fan, Y., & Xu, M. (2014, October). MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level Regression. In MediaEval.

Dhall, A., Goecke, R., Joshi, J., Wagner, M., & Gedeon, T. (2013, December). Emotion recognition in the wild challenge (EmotiW) challenge and workshop summary. In Proceedings of the 15th ACM on International conference on multimodal interaction (pp. 371-372). ACM.

Chen, L., Yoon, S. Y., Leong, C. W., Martin, M., & Ma, M. (2014, November). An initial analysis of structured video interviews by using multimodal emotion detection. In Proceedings of the 2014 workshop on Emotion Representation and Modelling in Human-Computer-Interaction-Systems (pp. 1-6). ACM.

Brester, C., Semenkin, E., Sidorov, M., & Minker, W. (2014). Self-adaptive multi-objective genetic algorithms for feature selection. In Proceedings of International Conference on Engineering and Applied Sciences Optimization (pp. 1838-1846).

Heckmann, M. (2014, September). Steps towards more natural human-machine interaction via audio-visual word prominence detection. In International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction (pp. 15-24). Springer International Publishing.

Lopez-Otero, P., Docio-Fernandez, L., & Garcia-Mateo, C. (2014). iVectors for continuous emotion recognition.

Orozco-Arroyave, J. R., Hönig, F., Arias-Londoño, J. D., Vargas-Bonilla, J. F., Daqrouq, K., Skodda, S., … & Nöth, E. (2016). Automatic detection of Parkinson’s disease in running speech spoken in three different languages. The Journal of the Acoustical Society of America, 139(1), 481-500.

Brester, C., Sidorov, M., & Semenkin, E. (2014, September). Acoustic emotion recognition: two ways of features selection based on self-adaptive multi-objective genetic algorithm. In Informatics in Control, Automation and Robotics (ICINCO), 2014 11th International Conference on (Vol. 2, pp. 851-855). IEEE.

Bojanic, M., Crnojevic, V., & Delic, V. (2012, September). Application of neural networks in emotional speech recognition. In Neural Network Applications in Electrical Engineering (NEUREL), 2012 11th Symposium on (pp. 223-226). IEEE.

Kim, J. C., & Clements, M. A. (2015). Multimodal affect classification at various temporal lengths. IEEE Transactions on Affective Computing, 6(4), 371-384.

Jones, H. E., Sabouret, N., Damian, I., Baur, T., André, E., Porayska-Pomsta, K., & Rizzo, P. (2014). Interpreting social cues to generate credible affective reactions of virtual job interviewers. arXiv preprint arXiv:1402.5039.

Pokorny, F. B., Graf, F., Pernkopf, F., & Schuller, B. W. (2015, September). Detection of negative emotions in speech signals using bags-of-audio-words. In Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on (pp. 879-884). IEEE.

Hönig, F., Bocklet, T., Riedhammer, K., Batliner, A., & Nöth, E. (2012). The Automatic Assessment of Non-native Prosody: Combining Classical Prosodic Analysis with Acoustic Modelling. In INTERSPEECH (pp. 823-826).

Wagner, J., Lingenfelser, F., & André, E. (2012). A Frame Pruning Approach for Paralinguistic Recognition Tasks. In INTERSPEECH (pp. 274-277).

Sánchez-Gutiérrez, M. E., Albornoz, E. M., Martinez-Licona, F., Rufiner, H. L., & Goddard, J. (2014, June). Deep learning for emotional speech recognition. In Mexican Conference on Pattern Recognition (pp. 311-320). Springer International Publishing.

Avila, S., Moreira, D., Perez, M., Moraes, D., Cota, I., Testoni, V., … & Rocha, A. (2014). RECOD at MediaEval 2014: Violent scenes detection task. In CEUR Workshop Proceedings. CEUR-WS.

Litman, D. J., Friedberg, H., & Forbes-Riley, K. (2012). Prosodic Cues to Disengagement and Uncertainty in Physics Tutorial Dialogues. In INTERSPEECH (pp. 755-758).

Bone, D., Lee, C. C., Potamianos, A., & Narayanan, S. S. (2014). An investigation of vocal arousal dynamics in child-psychologist interactions using synchrony measures and a conversation-based model. In INTERSPEECH (pp. 218-222).

Weng, S., Chen, S., Yu, L., Wu, X., Cai, W., Liu, Z., … & Li, M. (2015, December). The SYSU system for the interspeech 2015 automatic speaker verification spoofing and countermeasures challenge. In Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific (pp. 152-155). IEEE.

Aguiar, A. C., Kaiseler, M., Meinedo, H., Abrudan, T. E., & Almeida, P. R. (2013, September). Speech stress assessment using physiological and psychological measures. In Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication (pp. 921-930). ACM.

Lasseck, M. (2015, September). Towards automatic large-scale identification of birds in audio recordings. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 364-375). Springer International Publishing.

Day, M. (2013, December). Emotion recognition with boosted tree classifiers. In Proceedings of the 15th ACM on International conference on multimodal interaction (pp. 531-534). ACM.

Sidorov, M., Ultes, S., & Schmitt, A. (2014, May). Comparison of Gender- and Speaker-adaptive Emotion Recognition. In LREC (pp. 3476-3480).

Kim, J., Nasir, M., Gupta, R., Van Segbroeck, M., Bone, D., Black, M. P., … & Narayanan, S. S. (2015, September). Automatic estimation of Parkinson’s disease severity from diverse speech tasks. In INTERSPEECH (pp. 914-918).

Tian, L., Moore, J. D., & Lai, C. (2015, September). Emotion recognition in spontaneous and acted dialogues. In Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on (pp. 698-704). IEEE.

Zhao, R., Sinha, T., Black, A. W., & Cassell, J. (2016, September). Automatic recognition of conversational strategies in the service of a socially-aware dialog system. In 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (p. 381).

Lehner, B., Widmer, G., & Böck, S. (2015, August). A low-latency, real-time-capable singing voice detection method with LSTM recurrent neural networks. In Signal Processing Conference (EUSIPCO), 2015 23rd European (pp. 21-25). IEEE.

Feese, S., Muaremi, A., Arnrich, B., Tröster, G., Meyer, B., & Jonas, K. (2011, October). Discriminating individually considerate and authoritarian leaders by speech activity cues. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom) (pp. 1460-1465). IEEE.

Ghosh, A., & Riccardi, G. (2014, November). Recognizing human activities from smartphone sensor signals. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 865-868). ACM.

Liu, G., & Hansen, J. H. (2014). Supra-segmental feature based speaker trait detection. In Proc. Odyssey.

Sun, B., Li, L., Zhou, G., Wu, X., He, J., Yu, L., … & Wei, Q. (2015, November). Combining multimodal features within a fusion network for emotion recognition in the wild. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (pp. 497-502). ACM.

Ellis, J. G., Lin, W. S., Lin, C. Y., & Chang, S. F. (2014, December). Predicting evoked emotions in video. In Multimedia (ISM), 2014 IEEE International Symposium on (pp. 287-294). IEEE.

Urbain, J., Niewiadomski, R., Hofmann, J., Bantegnie, E., Baur, T., Berthouze, N., … & Griffin, H. (2013). Laugh machine. Proceedings eNTERFACE, 12, 13-34.

Brester, C., Semenkin, E., Kovalev, I., Zelenkov, P., & Sidorov, M. (2015, May). Evolutionary feature selection for emotion recognition in multilingual speech analysis. In Evolutionary Computation (CEC), 2015 IEEE Congress on (pp. 2406-2411). IEEE.

Albornoz, E. M., Sánchez-Gutiérrez, M., Martinez-Licona, F., Rufiner, H. L., & Goddard, J. (2014, November). Spoken emotion recognition using deep learning. In Iberoamerican Congress on Pattern Recognition (pp. 104-111). Springer International Publishing.

Zhang, B., Provost, E. M., Swedberg, R., & Essl, G. (2015, January). Predicting Emotion Perception Across Domains: A Study of Singing and Speaking. In AAAI (pp. 1328-1335).

Ursu, M. F., Falelakis, M., Groen, M., Kaiser, R., & Frantzis, M. (2015, June). Experimental enquiry into automatically orchestrated live video communication in social settings. In Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video (pp. 63-72). ACM.

Serban, O., & Pauchet, A. (2013, September). Agentslang: A fast and reliable platform for distributed interactive systems. In Intelligent Computer Communication and Processing (ICCP), 2013 IEEE International Conference on (pp. 35-42). IEEE.

Rasheed, U., Tahir, Y., Dauwels, S., Dauwels, J., Thalmann, D., & Magnenat-Thalmann, N. (2013, October). Real-Time Comprehensive Sociometrics for Two-Person Dialogs. In HBU (pp. 196-208).

Li, M. (2014). Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens.

Brester, C., Sidorov, M., & Semenkin, E. (2014). Speech-based emotion recognition: Application of collective decision making concepts. In Proceedings of the 2nd International Conference on Computer Science and Artificial Intelligence (ICCSAI2014) (pp. 216-220).

Lukacs, G., Jani, M., & Takacs, G. (2013, September). Acoustic feature mining for mixed speech and music playlist generation. In ELMAR, 2013 55th International Symposium (pp. 275-278). IEEE.

Lopez-Otero, P., Docio-Fernandez, L., & Garcia-Mateo, C. (2014, May). A study of acoustic features for the classification of depressed speech. In Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014 37th International Convention on (pp. 1331-1335). IEEE.

Cao, H., Savran, A., Verma, R., & Nenkova, A. (2015). Acoustic and lexical representations for affect prediction in spontaneous conversations. Computer speech & language, 29(1), 203-217.

Huang, C. L., Tsao, Y., Hori, C., & Kashioka, H. (2011, October). Feature normalization and selection for robust speaker state recognition. In Speech Database and Assessments (Oriental COCOSDA), 2011 International Conference on (pp. 102-105). IEEE.

Gao, B., Dellandréa, E., & Chen, L. (2012, June). Music sparse decomposition onto a midi dictionary of musical words and its application to music mood classification. In Content-Based Multimedia Indexing (CBMI), 2012 10th International Workshop on (pp. 1-6). IEEE.

Sidorov, M., Brester, C., Semenkin, E., & Minker, W. (2014, September). Speaker state recognition with neural network-based classification and self-adaptive heuristic feature selection. In Informatics in Control, Automation and Robotics (ICINCO), 2014 11th International Conference on (Vol. 1, pp. 699-703). IEEE.

Lopez-Otero, P., Docio-Fernandez, L., & Garcia-Mateo, C. (2014, March). A study of acoustic features for depression detection. In Biometrics and Forensics (IWBF), 2014 International Workshop on (pp. 1-6). IEEE.

Sha, C. Y., Yang, Y. H., Lin, Y. C., & Chen, H. H. (2013, May). Singing voice timbre classification of Chinese popular music. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 734-738). IEEE.

Liu, C. J., Wu, C. H., & Chiu, Y. H. (2013, October). BFI-based speaker personality perception using acoustic-prosodic features. In Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific (pp. 1-6). IEEE.

Friedberg, H. (2011, June). Turn-taking cues in a human tutoring corpus. In Proceedings of the ACL 2011 Student Session (pp. 94-98). Association for Computational Linguistics.

Sapru, A., & Bourlard, H. (2015). Automatic recognition of emergent social roles in small group interactions. IEEE Transactions on Multimedia, 17(5), 746-760.

Gosztolya, G. (2015). Conflict intensity estimation from speech using greedy forward-backward feature selection.

Steidl, S., Riedhammer, K., Bocklet, T., Hönig, F., & Nöth, E. (2011). Java Visual Speech Components for Rapid Application Development of GUI Based Speech Processing Applications. In INTERSPEECH (pp. 3257-3260).

Finkelstein, S., Ogan, A., Vaughn, C., & Cassell, J. (2013). Alex: A virtual peer that identifies student dialect. In Proc. Workshop on Culturally-aware Technology Enhanced Learning in conjunction with EC-TEL 2013, Paphos, Cyprus, September 17.

Sapru, A., & Bourlard, H. (2013, September). Investigating the impact of language style and vocal expression on social roles of participants in professional meetings. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on (pp. 324-329). IEEE.

Tian, L., Lai, C., & Moore, J. (2015, April). Recognizing emotions in dialogues with disfluencies and non-verbal vocalisations. In Proceedings of the 4th Interdisciplinary Workshop on Laughter and Other Non-verbal Vocalisations in Speech (Vol. 14, p. 15).

Tickle, A., Raghu, S., & Elshaw, M. (2013). Emotional recognition from the speech signal for a virtual education agent. In Journal of Physics: Conference Series (Vol. 450, No. 1, p. 012053). IOP Publishing.

Le, B. V., Bang, J. H., & Lee, S. (2013, December). Hierarchical emotion classification using genetic algorithms. In Proceedings of the Fourth Symposium on Information and Communication Technology (pp. 158-163). ACM.

Gallardo-Antolín, A., Montero, J. M., & King, S. (2014). A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis.

Gangamohan, P., Kadiri, S. R., & Yegnanarayana, B. (2016). Analysis of Emotional Speech—A Review. In Toward Robotic Socially Believable Behaving Systems-Volume I (pp. 205-238). Springer International Publishing.

Burkhardt, F. (2011). Speechalyzer: a software tool to process speech data. Proc. Elektronische Sprachsignalverarbeitung.

Hönig, F., Batliner, A., Bocklet, T., Stemmer, G., Nöth, E., Schnieder, S., & Krajewski, J. (2014, May). Are men more sleepy than women or does it only look like: Automatic analysis of sleepy speech. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 995-999). IEEE.

Chen, S., Jin, Q., Li, X., Yang, G., & Xu, J. (2014, September). Speech emotion classification using acoustic features. In Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on (pp. 579-583). IEEE.

Asgari, M., Shafran, I., & Bayestehtashk, A. (2014, September). Inferring social contexts from audio recordings using deep neural networks. In Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on (pp. 1-6). IEEE.

Kaya, Y., Karabacak, O., & Çaliskan, A. (2013, April). A computer vision system for classification of some Euphorbia (Euphorbiaceae) seeds based on local binary patterns. In Signal Processing and Communications Applications Conference (SIU), 2013 21st (pp. 1-4). IEEE.

Cao, H., Verma, R., & Nenkova, A. (2012, September). Combining Ranking and Classification to Improve Emotion Recognition in Spontaneous Speech. In INTERSPEECH (pp. 358-361).

An, G., Brizan, D. G., Ma, M., Morales, M., Syed, A. R., & Rosenberg, A. (2015). Automatic recognition of unified Parkinson’s disease rating from speech with acoustic, i-vector and phonotactic features. In INTERSPEECH (pp. 508-512).

Wagner, J., Lingenfelser, F., & André, E. (2015). Building a robust system for multimodal emotion recognition. Emotion recognition: A pattern analysis approach, 379-410.

Xia, W., Gibson, J., Xiao, B., Baucom, B., & Georgiou, P. G. (2015). A dynamic model for behavioral analysis of couple interactions using acoustic features. In Sixteenth Annual Conference of the International Speech Communication Association.

Tang, Y., Huang, Y., Wu, Z., Meng, H., Xu, M., & Cai, L. (2016, March). Question detection from acoustic features using recurrent neural network with gated recurrent unit. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on (pp. 6125-6129). IEEE.

Albornoz, E. M., Vignolo, L. D., Martinez, C. E., & Milone, D. H. (2013, November). Genetic wrapper approach for automatic diagnosis of speech disorders related to Autism. In Computational Intelligence and Informatics (CINTI), 2013 IEEE 14th International Symposium on (pp. 387-392). IEEE.

Caraty, M. J., & Montacié, C. (2014). Vocal fatigue induced by prolonged oral reading: Analysis and detection. Computer Speech & Language, 28(2), 453-466.

Yu, Z., Ramanarayanan, V., Suendermann-Oeft, D., Wang, X., Zechner, K., Chen, L., … & Qian, Y. (2015, December). Using bidirectional LSTM recurrent neural networks to learn high-level abstractions of sequential features for automated scoring of non-native spontaneous speech. In Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on (pp. 338-345). IEEE.

Chao, L., Tao, J., Yang, M., & Li, Y. (2014, September). Improving generation performance of speech emotion recognition by denoising autoencoders. In Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on (pp. 341-344). IEEE.

Pon-Barry, H., & Nelakurthi, A. R. (2014). Challenges for robust prosody-based affect recognition. In Proceedings of Speech Prosody (pp. 144-148).

Sidorov, M., Brester, C., & Schmitt, A. (2015). Contemporary stochastic feature selection algorithms for speech-based emotion recognition. In INTERSPEECH (pp. 2699-2703).

Sun, B., Li, L., Wu, X., Zuo, T., Chen, Y., Zhou, G., … & Zhu, X. (2016). Combining feature-level and decision-level fusion in a hierarchical classifier for emotion recognition in the wild. Journal on Multimodal User Interfaces, 10(2), 125-137.

Sidorov, M., Ultes, S., & Schmitt, A. (2014, November). Automatic recognition of personality traits: A multimodal approach. In Proceedings of the 2014 Workshop on Mapping Personality Traits Challenge and Workshop (pp. 11-15). ACM.

Gupta, R., Audhkhasi, K., & Narayanan, S. (2015, April). A mixture of experts approach towards intelligibility classification of pathological speech. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on (pp. 1986-1990). IEEE.

Lyakso, E., Frolova, O., Dmitrieva, E., Grigorev, A., Kaya, H., Salah, A. A., & Karpov, A. (2015, September). EmoChildRu: emotional child Russian speech corpus. In International Conference on Speech and Computer (pp. 144-152). Springer International Publishing.

Auguin, N., Huang, S., & Fung, P. (2013, October). Identification of live or studio versions of a song via supervised learning. In Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific (pp. 1-4). IEEE.

Nikolaou, N. (2011). Music emotion classification (Doctoral dissertation, Technical University of Crete).

Savran, A., Cao, H., Nenkova, A., & Verma, R. (2015). Temporal Bayesian fusion for affect sensing: Combining video, audio, and lexical modalities. IEEE Transactions on Cybernetics, 45(9), 1927-1941.

Thomason, J., & Litman, D. J. (2013). Differences in User Responses to a Wizard-of-Oz versus Automated System. In HLT-NAACL (pp. 796-801).

Prylipko, D., Egorow, O., Siegert, I., & Wendemuth, A. (2014). Application of image processing methods to filled pauses detection from spontaneous speech. In INTERSPEECH (pp. 1816-1820).

Alghowinem, S. (2013, September). From joyous to clinically depressed: Mood detection using multimodal analysis of a person’s appearance and speech. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on (pp. 648-654). IEEE.

Chaspari, T., Dimitriadis, D., & Maragos, P. (2014, September). Emotion classification of speech using modulation features. In Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European (pp. 1552-1556). IEEE.

Gosztolya, G. (2015). On evaluation metrics for social signal detection.

Wu, J., Lin, Z., & Zha, H. (2015, November). Multiple models fusion for emotion recognition in the wild. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (pp. 475-481). ACM.

Milde, B., & Biemann, C. (2015, September). Using representation learning and out-of-domain data for a paralinguistic speech task. In INTERSPEECH (pp. 904-908).

Huang, Z., Epps, J., & Ambikairajah, E. (2015). An investigation of emotion change detection from speech. In INTERSPEECH (pp. 1329-1333).

Wang, Y., Rawat, S., & Metze, F. (2014, May). Exploring audio semantic concepts for event-based video retrieval. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 1360-1364). IEEE.

Sandulescu, V., Andrews, S., Ellis, D., Dobrescu, R., & Martinez-Mozos, O. (2015, November). Mobile app for stress monitoring using voice features. In E-Health and Bioengineering Conference (EHB), 2015 (pp. 1-4). IEEE.

Jani, M., Lukács, G., & Takács, G. (2014, April). Experimental Investigation of Transitions for Mixed Speech and Music Playlist Generation. In Proceedings of International Conference on Multimedia Retrieval (p. 392). ACM.

Escalante, H. J., Ponce-López, V., Wan, J., Riegler, M. A., Chen, B., Clapés, A., … & Müller, H. (2016). ChaLearn joint contest on multimedia challenges beyond visual analysis: An overview. Proceedings of ICPRW.

Chang, K. H. (2012). Speech Analysis Methodologies towards Unobtrusive Mental Health Monitoring.

Seppi, D., Demuynck, K., & Van Compernolle, D. (2011, January). Template-Based Automatic Speech Recognition Meets Prosody. In Interspeech (pp. 545-548).

Kaya, H., & Salah, A. A. (2014, November). Continuous mapping of personality traits: a novel challenge and failure conditions. In Proceedings of the 2014 Workshop on Mapping Personality Traits Challenge and Workshop (pp. 17-24). ACM.

Lotfian, R., & Busso, C. (2016, March). Practical considerations on the use of preference learning for ranking emotional speech. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on (pp. 5205-5209). IEEE.

Borrie, S. A., Lubold, N., & Pon-Barry, H. (2015). Disordered speech disrupts conversational entrainment: a study of acoustic-prosodic entrainment and communicative success in populations with communication challenges. Frontiers in psychology, 6.

Hussain, M. S., D’Mello, S. K., & Calvo, R. A. (2014). Research and Development Tools in Affective Computing. The Oxford Handbook of Affective Computing, 349.

Murphy, A., & Redfern, S. (2013). Utilizing bimodal emotion recognition for adaptive artificial intelligence. International Journal of Engineering Science and Innovative Technology (IJESIT), 2(4), 167-173.

Hassan Awadallah, A., Gurunath Kulkarni, R., Ozertem, U., & Jones, R. (2015, October). Characterizing and predicting voice query reformulation. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (pp. 543-552). ACM.

Bhattacharya, A., Wu, W., & Yang, Z. (2011, November). Quality of experience evaluation of voice communication systems using affect-based approach. In Proceedings of the 19th ACM international conference on Multimedia (pp. 929-932). ACM.

Zhang, Y., Coutinho, E., Zhang, Z., Quan, C., & Schuller, B. (2015). Agreement-based Dynamic Active Learning with Least and Medium Certainty Query Strategies. In Proc. of Advances in Active Learning: Bridging Theory and Practice Workshop, ICML 2015 (p. 5).

Shah, M., Cooper, D. G., Cao, H., Gur, R. C., Nenkova, A., & Verma, R. (2013, September). Action unit models of facial expression of emotion in the presence of speech. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on (pp. 49-54). IEEE.

Gupta, R., Kumar, N., & Narayanan, S. (2015, August). Affect prediction in music using boosted ensemble of filters. In Signal Processing Conference (EUSIPCO), 2015 23rd European (pp. 11-15). IEEE.

Markov, K., Matsui, T., Septier, F., & Peters, G. (2015, August). Dynamic speech emotion recognition with state-space models. In Signal Processing Conference (EUSIPCO), 2015 23rd European (pp. 2077-2081). IEEE.

Arsikere, H., Shriberg, E., & Ozertem, U. (2014, May). Computationally-efficient endpointing features for natural spoken interaction with personal-assistant systems. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 3241-3245). IEEE.

Lubis, N., Lestari, D., Purwarianti, A., Sakti, S., & Nakamura, S. (2014, December). Emotion recognition on Indonesian television talk shows. In Spoken Language Technology Workshop (SLT), 2014 IEEE (pp. 466-471). IEEE.

Keary, A., & Walsh, P. (2014, November). How affective computing could complement and advance the quantified self. In Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on (pp. 24-31). IEEE.

Siegert, I. (2015). Emotional and user-specific cues for improved analysis of naturalistic interactions (Doctoral dissertation, Universität Magdeburg).

Braude, D. A., Shimodaira, H., & Youssef, A. B. (2013). Template-warping based speech driven head motion synthesis. In Interspeech (pp. 2763-2767).

Qian, K., Janott, C., Zhang, Z., Heiser, C., & Schuller, B. (2016, March). Wavelet features for classification of VOTE snore sounds. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on (pp. 221-225). IEEE.

Traista, A., & Elshaw, M. (2012, September). A hybrid neural emotion recogniser for human-robotic agent interaction. In International Conference on Engineering Applications of Neural Networks (pp. 353-362). Springer Berlin Heidelberg.

Zhao, T., Black, A. W., & Eskenazi, M. (2015, September). An incremental turn-taking model with active system barge-in for spoken dialog systems. In 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (p. 42).

Riccardi, G., Stepanov, E. A., & Chowdhury, S. A. (2016, March). Discourse connective detection in spoken conversations. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on (pp. 6095-6099). IEEE.

Proença, J., Veiga, A., Candeias, S., & Perdigão, F. (2013, October). Acoustic, Phonetic and Prosodic Features of Parkinson’s disease Speech. In STIL-IX Brazilian Symposium in Information and Human Language Technology, 2nd Brazilian Conference on Intelligent Systems (BRACIS 2013), Fortaleza/Ceará, Brazil.

Ono, Y., Otake, M., Shinozaki, T., Nisimura, R., Yamada, T., Ishizuka, K., … & Imai, S. (2012, December). Open answer scoring for S-CAT automated speaking test system using support vector regression. In Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific (pp. 1-4). IEEE.

Truong, K. P., Westerhof, G. J., Lamers, S., de Jong, F., & Sools, A. (2013). Emotional expression in oral history narratives: Comparing results of automated verbal and nonverbal analyses. In OASIcs-OpenAccess Series in Informatics (Vol. 32). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.

Nasir, M., Xia, W., Xiao, B., Baucom, B., Narayanan, S. S., & Georgiou, P. G. (2015). Still together?: The role of acoustic features in predicting marital outcome. In Sixteenth Annual Conference of the International Speech Communication Association.

Lopes, J., Chorianopoulou, A., Palogiannidi, E., Moniz, H., Abad, A., Louka, K., … & Potamianos, A. (2016). The SpeDial datasets: datasets for spoken dialogue system analytics. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC).

Kayaoglu, M., & Eroglu Erdem, C. (2015, November). Affect recognition using key frame selection based on minimum sparse reconstruction. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (pp. 519-524). ACM.

Black, M. P., Bone, D., Skordilis, Z. I., Gupta, R., Xia, W., Papadopoulos, P., … & Georgiou, P. G. (2015). Automated evaluation of non-native English pronunciation quality: Combining knowledge- and data-driven features at multiple time scales. In Sixteenth Annual Conference of the International Speech Communication Association.

Xia, R., & Liu, Y. (2015). A multi-task learning framework for emotion recognition using 2D continuous space. IEEE Transactions on Affective Computing.

Jani, M., Takács, G., & Lukács, G. (2013, June). Evaluation of speech music transitions in Radio programs based on acoustic features. In Content-Based Multimedia Indexing (CBMI), 2013 11th International Workshop on (pp. 97-102). IEEE.

Bozkurt, E., Toledo-Ronen, O., Sorin, A., & Hoory, R. (2014, September). Exploring modulation spectrum features for speech-based depression level classification. In INTERSPEECH (pp. 1243-1247).

Kaya, H., Karpov, A. A., & Salah, A. A. (2016, July). Robust acoustic emotion recognition based on cascaded normalization and extreme learning machines. In International Symposium on Neural Networks (pp. 115-123). Springer International Publishing.

Zong, Y., Zheng, W., Huang, X., Yan, K., Yan, J., & Zhang, T. (2016). Emotion recognition in the wild via sparse transductive transfer linear discriminant analysis. Journal on Multimodal User Interfaces, 10(2), 163-172.

Burkhardt, F. (2012). Fast Labeling and Transcription with the Speechalyzer Toolkit. In LREC (pp. 196-200).

Zhang, B., Provost, E. M., & Essl, G. (2016, March). Cross-corpus acoustic emotion recognition from singing and speaking: A multi-task learning approach. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on (pp. 5805-5809). IEEE.

Nguyen, P., Tran, D., Huang, X., & Ma, W. (2013, November). Age and gender classification using EEG paralinguistic features. In Neural Engineering (NER), 2013 6th International IEEE/EMBS Conference on (pp. 1295-1298). IEEE.

Kim, J., Truong, K. P., & Evers, V. (2016). Automatic detection of children’s engagement using non-verbal features and ordinal learning. In Workshop on Child Computer Interaction (pp. 29-34).

Gonzalez, S., & Anguera, X. (2013, May). Perceptually inspired features for speaker likability classification. In ICASSP (pp. 8490-8494).

Huckvale, M. (2014). Prediction of cognitive load from speech with the VOQAL voice quality toolbox for the Interspeech 2014 Computational Paralinguistics Challenge. In INTERSPEECH (pp. 741-745).

Serban, O. (2013). Detection and integration of affective feedback into distributed interactive systems (Doctoral dissertation, INSA de Rouen).

Rojas, V., Ochoa, S. F., & Hervás, R. (2014, December). Monitoring Moods in Elderly People through Voice Processing. In IWAAL (pp. 139-146).

Chisholm, D., Siddiquie, B., Divakaran, A., & Shriberg, E. (2015, June). Audio-based affect detection in web videos. In Multimedia and Expo (ICME), 2015 IEEE International Conference on (pp. 1-6). IEEE.

Xia, R., & Liu, Y. (2015, April). Leveraging valence and activation information via multi-task learning for categorical emotion recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on (pp. 5301-5305). IEEE.

Rozgic, V., Vazquez-Reina, A., Crystal, M., Srivastava, A., Tan, V., & Berka, C. (2014, May). Multi-modal prediction of PTSD and stress indicators. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on (pp. 3636-3640). IEEE.

Cummins, N., Sethu, V., Epps, J., Schnieder, S., & Krajewski, J. (2015). Analysis of acoustic space variability in speech affected by depression. Speech Communication, 75, 27-49.

Lotfian, R., & Busso, C. (2015, April). Emotion recognition using synthetic speech as neutral reference. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on (pp. 4759-4763). IEEE.

Fang, C., Li, H., Ma, L., & Zhao, X. (2013, July). Nonlinear dynamic analysis of pathological voices. In International Conference on Intelligent Computing (pp. 401-409). Springer Berlin Heidelberg.

Hämäläinen, A., Meinedo, H., Tjalve, M., Pellegrini, T., Trancoso, I., & Dias, M. S. (2014, October). Improving Speech Recognition through Automatic Selection of Age Group–Specific Acoustic Models. In International Conference on Computational Processing of the Portuguese Language (pp. 12-23). Springer International Publishing.

Nguyen, P., Tran, D., Vo, T., Huang, X., Ma, W., & Phung, D. (2013, November). EEG-based age and gender recognition using tensor decomposition and speech features. In International Conference on Neural Information Processing (pp. 632-639). Springer Berlin Heidelberg.

Toledo-Ronen, O., & Sorin, A. (2013, May). Voice-based sadness and anger recognition with cross-corpora evaluation. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 7517-7521). IEEE.

Tsiartas, A., Georgiou, P. G., & Narayanan, S. (2013, August). Toward transfer of acoustic cues of emphasis across languages. In INTERSPEECH (pp. 3483-3486).

Kalantarian, H., Sideris, C., Mortazavi, B., Alshurafa, N., & Sarrafzadeh, M. (2017). Dynamic computation offloading for low-power wearable health monitoring systems. IEEE Transactions on Biomedical Engineering, 64(3), 621-628.

Lubold, N., & Pon-Barry, H. (2014, December). A comparison of acoustic-prosodic entrainment in face-to-face and remote collaborative learning dialogues. In Spoken Language Technology Workshop (SLT), 2014 IEEE (pp. 288-293). IEEE.

Huang, C. L., & Hori, C. (2013, October). Classification of children with voice impairments using deep neural networks. In Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific (pp. 1-5). IEEE.

Bencherif, M. A., Alsulaiman, M., Muhammad, G., Ali, Z., Mahmood, A., & Faisal, M. (2012, October). Gender Effect in Trait Recognition. In Proceedings of the World Congress on Engineering and Computer Science (Vol. 1).

Lee, C. C., Kim, J., Metallinou, A., Busso, C., Lee, S., & Narayanan, S. S. (2014). Speech in affective computing.

Yang, Z., Wu, Q., Leung, C., & Miao, C. (2014). OS-ELM Based Emotion Recognition for Empathetic Elderly Companion. Proceedings of ELM-2014 Volume 2: Applications, 4, 331.

Lubis, N., Sakti, S., Neubig, G., Yoshino, K., Toda, T., & Nakamura, S. (2015, December). A study of social-affective communication: Automatic prediction of emotion triggers and responses in television talk shows. In Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on (pp. 777-783). IEEE.

Chaspari, T., Al Moubayed, S., & Fain Lehman, J. (2015, November). Exploring Children’s Verbal and Acoustic Synchrony: Towards Promoting Engagement in Speech-Controlled Robot-Companion Games. In Proceedings of the 1st Workshop on Modeling INTERPERsonal SynchrONy And infLuence (pp. 21-24). ACM.

Gosztolya, G. (2014). Estimating the level of conflict based on audio information using Inverse Distance Weighting. Acta Universitatis Sapientiae, Electrical and Mechanical Engineering, 6, 47-58.

Zuo, X., & Fung, P. N. (2011). A cross gender and cross lingual study on acoustic features for stress recognition in speech. In Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS XVII), Hong Kong.

Faurholt-Jepsen, M., Busk, J., Frost, M., Vinberg, M., Christensen, E. M., Winther, O., … & Kessing, L. V. (2016). Voice analysis as an objective state marker in bipolar disorder. Translational Psychiatry, 6(7), e856.

Matsuyama, Y., Bhardwaj, A., Zhao, R., Romero, O. J., Akoju, S. A., & Cassell, J. (2016, September). Socially-aware animated intelligent personal assistant agent. In 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue (p. 224).

Wu, C. H., Liang, W. B., Cheng, K. C., & Lin, J. C. (2015, September). Hierarchical modeling of temporal course in emotional expression for speech emotion recognition. In Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on (pp. 810-814). IEEE.

Sidorov, M., & Minker, W. (2014, November). Emotion recognition in real-world conditions with acoustic and visual features. In Proceedings of the 16th International Conference on Multimodal Interaction (pp. 521-524). ACM.

Su, D., Fung, P., & Auguin, N. (2013, May). Multimodal music emotion classification using AdaBoost with decision stumps. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 3447-3451). IEEE.

Lubold, N., Walker, E., & Pon-Barry, H. (2015). Relating Entrainment, Grounding, and Topic of Discussion in Collaborative Learning Dialogues. In Proceedings of Computer Supported Collaborative Learning.

Kaya, H., Salah, A. A., Gurgen, S. F., & Ekenel, H. (2014, April). Protocol and baseline for experiments on Bogazici University Turkish emotional speech corpus. In Signal Processing and Communications Applications Conference (SIU), 2014 22nd (pp. 1698-1701). IEEE.

Oflazoglu, Ç., & Yildirim, S. (2012, April). Anger recognition in Turkish speech using acoustic information. In Signal Processing and Communications Applications Conference (SIU), 2012 20th (pp. 1-4). IEEE.

Soury, M. (2014). Détection multimodale du stress pour la conception de logiciels de remédiation [Multimodal stress detection for the design of remediation software] (Doctoral dissertation, Université Paris Sud-Paris XI).

Visser, N. (2011). Recognizing Natural Emotions in Speech, Having Two Classes.

Gamage, K. W., Sethu, V., Le, P. N., & Ambikairajah, E. (2015, December). An i-vector GPLDA system for speech based emotion recognition. In Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific (pp. 289-292). IEEE.

Kim, J. C. (2014). Classification of affect using novel voice and visual features (Doctoral dissertation, Georgia Institute of Technology).

Zhang, B., Essl, G., & Mower Provost, E. (2016, October). Automatic recognition of self-reported and perceived emotion: does joint modeling help?. In Proceedings of the 18th ACM International Conference on Multimodal Interaction (pp. 217-224). ACM.

Yan, J., Zheng, W., Cui, Z., Tang, C., Zhang, T., Zong, Y., & Sun, N. (2016, October). Multi-clue fusion for emotion recognition in the wild. In Proceedings of the 18th ACM International Conference on Multimodal Interaction (pp. 458-463). ACM.

Gupta, R., Audhkhasi, K., Lee, S., & Narayanan, S. (2016). Detecting paralinguistic events in audio stream using context in features and probabilistic decisions. Computer Speech & Language, 36, 72-92.

Schäfer, H. J. (2015). Deriving Conversational Social Contexts from Audio-Data (Master’s thesis, Technical University of Munich).

Rao, H., Kim, J. C., Clements, M. A., Rozga, A., & Messinger, D. S. (2014). Detection of children’s paralinguistic events in interaction with caregivers. In 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014. International Speech Communication Association.

Proença, J., Veiga, A., Candeias, S., Lemos, J., Januário, C., & Perdigão, F. (2014, October). Characterizing Parkinson’s Disease Speech by Acoustic and Phonetic Features. In International Conference on Computational Processing of the Portuguese Language (pp. 24-35). Springer International Publishing.

Tang, Y., Wu, Z., Meng, H., Xu, M., & Cai, L. (2016). Analysis on gated recurrent unit based question detection approach. Interspeech 2016, 735-739.

Huang, D. Y., Li, H., & Dong, M. (2014, December). Ensemble Nyström method for predicting conflict level from speech. In Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA) (pp. 1-5). IEEE.

Gao, B. (2014). Contributions to music semantic analysis and its acceleration techniques (Doctoral dissertation, Ecole Centrale de Lyon).

Zong, Y., Zheng, W., Zhang, T., & Huang, X. (2016). Cross-corpus speech emotion recognition based on domain-adaptive least-squares regression. IEEE Signal Processing Letters, 23(5), 585-589.

D’Mello, S., Dieterle, E., & Duckworth, A. (2017). Advanced, analytic, automated (AAA) measurement of engagement during learning. Educational Psychologist, 52(2), 104-123.

Gao, B., Dellandréa, E., & Chen, L. (2012, October). Accelerated dictionary learning with GPU/Multi-core CPU and its application to music classification. In Signal Processing (ICSP), 2012 IEEE 11th International Conference on (Vol. 2, pp. 1188-1193). IEEE.

Theodorou, T., Mporas, I., & Fakotakis, N. (2014, May). Audio Feature Selection for Recognition of Non-Linguistic Vocalization Sounds. In Hellenic Conference on Artificial Intelligence (pp. 395-405). Springer International Publishing.

Liu, Z., Hu, B., Yan, L., Wang, T., Liu, F., Li, X., & Kang, H. (2015, September). Detection of depression in speech. In Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on (pp. 743-747). IEEE.

Kim, J. C., Rao, H., & Clements, M. A. (2014). Speech intelligibility estimation using multi-resolution spectral features for speakers undergoing cancer treatment. The Journal of the Acoustical Society of America, 136(4), EL315-EL321.

Song, P., Zheng, W., & Liang, R. (2015). Speech emotion recognition based on sparse transfer learning method. IEICE Transactions on Information and Systems, 98(7), 1409-1412.

Levitan, S. I., An, G., Ma, M., Levitan, R., Rosenberg, A., & Hirschberg, J. (2016). Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection. Interspeech 2016, 2006-2010.

Mera, D., Batko, M., & Zezula, P. (2016). Speeding up the multimedia feature extraction: a comparative study on the big data approach. Multimedia Tools and Applications, 1-21.

Caixinha, M., Amaro, J., Santos, M., Perdigão, F., Gomes, M., & Santos, J. (2016). In-vivo Automatic Nuclear Cataract Detection and Classification in an Animal Model by Ultrasounds. IEEE Transactions on Biomedical Engineering, 63(11), 2326-2335.

Shivakumar, P. G., Chakravarthula, S. N., & Georgiou, P. (2016). Multimodal fusion of multirate acoustic, prosodic, and lexical speaker characteristics for native language identification. Interspeech 2016, 2408-2412.

Koutsombogera, M., Galanis, D., Riviello, M. T., Tseres, N., Karabetsos, S., Esposito, A., & Papageorgiou, H. (2015). Conflict cues in call center interactions. In Conflict and Multimodal Communication (pp. 431-447). Springer International Publishing.

Shangguan, Y., & Provost, E. M. (2015, September). EmoShapelets: Capturing local dynamics of audio-visual affective speech. In Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on (pp. 229-235). IEEE.

Prasad, A., & Ghosh, P. K. (2015). Automatic classification of eating conditions from speech using acoustic feature selection and a set of hierarchical support vector machine classifiers. In INTERSPEECH (pp. 884-888).

Williamson, J. R., Godoy, E., Cha, M., Schwarzentruber, A., Khorrami, P., Gwon, Y., … & Quatieri, T. F. (2016, October). Detecting Depression using Vocal, Facial and Semantic Communication Cues. In Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge (pp. 11-18). ACM.

Theodorou, T., Mporas, I., & Fakotakis, N. (2015, September). Automatic Sound Recognition of Urban Environment Events. In International Conference on Speech and Computer (pp. 129-136). Springer International Publishing.

Kim, S., Georgiou, P. G., & Narayanan, S. (2013, August). Annotation and classification of political advertisements. In INTERSPEECH (pp. 1092-1096).

Zhang, B., Essl, G., & Provost, E. M. (2015, September). Recognizing emotion from singing and speaking using shared models. In Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on (pp. 139-145). IEEE.

Lopez-Otero, P., Docio-Fernandez, L., & Garcia-Mateo, C. (2016). Finding relevant features for zero-resource query-by-example search on speech. Speech Communication, 84, 24-35.

Matsumiya, S., Sakti, S., Neubig, G., Toda, T., & Nakamura, S. (2014). Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus. In INTERSPEECH (pp. 1801-1805).

Ellis, J. G., Jou, B., & Chang, S. F. (2014, November). Why we watch the news: A dataset for exploring sentiment in broadcast video news. In Proceedings of the 16th International Conference on Multimodal Interaction (pp. 104-111). ACM.

Poria, S., Peng, H., Hussain, A., Howard, N., & Cambria, E. (2017). Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis. Neurocomputing.

Jing, H., Hu, T. Y., Lee, H. S., Chen, W. C., Lee, C. C., Tsao, Y., & Wang, H. M. (2014). Ensemble of machine learning algorithms for cognitive and physical speaker load detection. In INTERSPEECH (pp. 447-451).

Truong, K. P., Nieuwenhuys, A., Beek, P., & Evers, V. (2015). A database for analysis of speech under physical stress: detection of exercise intensity while running and talking.

Kaushik, L., Sangwan, A., & Hansen, J. H. (2015). Laughter and filler detection in naturalistic audio. In INTERSPEECH (pp. 2509-2513).

and many many more…