Author-wise List of Papers
A G, Ramakrishnan
Online Speech Translation System for Tamil
Madhavaraj Ayyavu, Shiva Kumar H R and Ramakrishnan A G
Session: Show and Tell 5
Estimation of the Vocal Tract Length of Vowel Sounds Based on the Frequency of the Significant Spectral Valley
TV Ananthapadmanabha and Ramakrishnan A G
Session: Signal Analysis for the Natural, Biological and Social Sciences
Abrol, Vinayak
Deep Convex Representations: Feature Representations for Bioacoustics Classification
Anshul Thakur, Vinayak Abrol, Pulkit Sharma and Padmanabhan Rajan
Session: Signal Analysis for the Natural, Biological and Social Sciences
ASe: Acoustic Scene Embedding Using Deep Archetypal Analysis and GMM
Pulkit Sharma, Vinayak Abrol and Anshul Thakur
Session: Acoustic Scenes and Rare Events
Adachi, Hiroyoshi
Detection of Dementia from Responses to Atypical Questions Asked by Embodied Conversational Agents
Tsuyoki Ujiro, Hiroki Tanaka, Hiroyoshi Adachi, Hiroaki Kazui, Manabu Ikeda, Takashi Kudo and Satoshi Nakamura
Session: Integrating Speech Science and Technology for Clinical Applications
Afouras, Triantafyllos
The Conversation: Deep Audio-Visual Speech Enhancement
Triantafyllos Afouras, Joon Son Chung and Andrew Zisserman
Session: Deep Enhancement
Deep Lip Reading: A Comparison of Models and an Online Application
Triantafyllos Afouras, Joon Son Chung and Andrew Zisserman
Session: Multimodal Systems
Afshan, Amber
Effectiveness of Voice Quality Features in Detecting Depression
Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Jonathan Flint and Abeer Alwan
Session: Integrating Speech Science and Technology for Clinical Applications
Using Voice Quality Supervectors for Affect Identification
Soo Jin Park, Amber Afshan, Zhi Ming Chua and Abeer Alwan
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Aggarwal, Ritu
Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training
Chandana S, Chiranjeevi Yarra, Ritu Aggarwal, Sanjeev Kumar Mittal, Kausthubha N K, Raseena K T, Astha Singh and Prasanta Kumar Ghosh
Session: Articulatory Information, Modeling and Inversion
Agrawal, Purvi
PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation
Naoya Takahashi, Purvi Agrawal, Nabarun Goswami and Yuki Mitsufuji
Session: Spatial and Phase Cues for Source Separation and Speech Recognition
Comparison of Unsupervised Modulation Filter Learning Methods for ASR
Purvi Agrawal and Sriram Ganapathy
Session: Neural Network Training Strategies for ASR
Ainger, Eloise
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks
Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Sunčica Petrović, Eloise Ainger, Nicholas Cummins and Björn Schuller
Session: Speech and Language Analytics for Mental Health
Airaksinen, Manu
Time-regularized Linear Prediction for Noise-robust Extraction of the Spectral Envelope of Speech
Manu Airaksinen, Lauri Juvela, Okko Räsänen and Paavo Alku
Session: Speech Analysis and Representation
Speaker-independent Raw Waveform Model for Glottal Excitation
Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi and Paavo Alku
Session: Voice Conversion and Speech Synthesis
Akhanov, Egor
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
Alam, Md Jahangir
Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification
Gautam Bhattacharya, Md Jahangir Alam, Vishwa Gupta and Patrick Kenny
Session: Speaker Verification Using Neural Network Methods II
Investigating Speech Enhancement and Perceptual Quality for Speech Emotion Recognition
Anderson R. Avila, Md Jahangir Alam, Douglas O'Shaughnessy and Tiago Falk
Session: Emotion Recognition and Analysis
Alcorn, Alyssa
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks
Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Sunčica Petrović, Eloise Ainger, Nicholas Cummins and Björn Schuller
Session: Speech and Language Analytics for Mental Health
Aleksic, Petar
Semantic Lattice Processing in Contextual Automatic Speech Recognition for Google Assistant
Leonid Velikovich, Ian Williams, Justin Scheiner, Petar Aleksic, Pedro Moreno and Michael Riley
Session: Recurrent Neural Models for ASR
Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search
Ian Williams, Anjuli Kannan, Petar Aleksic, David Rybach and Tara Sainath
Session: Recurrent Neural Models for ASR
Alku, Paavo
Time-regularized Linear Prediction for Noise-robust Extraction of the Spectral Envelope of Speech
Manu Airaksinen, Lauri Juvela, Okko Räsänen and Paavo Alku
Session: Speech Analysis and Representation
Speaker-independent Raw Waveform Model for Glottal Excitation
Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi and Paavo Alku
Session: Voice Conversion and Speech Synthesis
Dysarthric Speech Classification Using Glottal Features Computed from Non-words, Words and Sentences
Narendra N P and Paavo Alku
Session: Speech Pathology, Depression, and Medical Applications
Alwan, Abeer
On the Difficulties of Automatic Speech Recognition for Kindergarten-Aged Children
Gary Yeung and Abeer Alwan
Session: Applications in Education and Learning
Effectiveness of Voice Quality Features in Detecting Depression
Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Jonathan Flint and Abeer Alwan
Session: Integrating Speech Science and Technology for Clinical Applications
Using Voice Quality Supervectors for Affect Identification
Soo Jin Park, Amber Afshan, Zhi Ming Chua and Abeer Alwan
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Filter Sampling and Combination CNN (FSC-CNN): A Compact CNN Model for Small-footprint ASR Acoustic Modeling Using Raw Waveforms
Jinxi Guo, Ning Xu, Xin Chen, Yang Shi, Kaiyuan Xu and Abeer Alwan
Session: Acoustic Modelling
Ambikairajah, Eliathamby
Detection of Replay-Spoofing Attacks Using Frequency Modulation Features
Tharshini Gunendradasan, Buddhi Wickramasinghe, Ngoc Phu Le, Eliathamby Ambikairajah and Julien Epps
Session: Spoofing Detection
Frequency Domain Linear Prediction Features for Replay Spoofing Attack Detection
Buddhi Wickramasinghe, Saad Irtza, Eliathamby Ambikairajah and Julien Epps
Session: Spoofing Detection
Deep Siamese Architecture Based Replay Detection for Secure Voice Biometric
Kaavya Sriskandaraja, Vidhyasaharan Sethu and Eliathamby Ambikairajah
Session: Spoofing Detection
Modulation Dynamic Features for the Detection of Replay Attacks
Gajan Suthokumar, Vidhyasaharan Sethu, Chamith Wijenayake and Eliathamby Ambikairajah
Session: Spoofing Detection
Sub-band Envelope Features Using Frequency Domain Linear Prediction for Short Duration Language Identification
Sarith Fernando, Vidhyasaharan Sethu and Eliathamby Ambikairajah
Session: Language Identification
Amiriparian, Shahin
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks
Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Sunčica Petrović, Eloise Ainger, Nicholas Cummins and Björn Schuller
Session: Speech and Language Analytics for Mental Health
An, Guozhen
Lexical and Acoustic Deep Learning Model for Personality Recognition
Guozhen An and Rivka Levitan
Session: Speaker Characterization and Analysis
Deep Personality Recognition for Deception Detection
Guozhen An, Sarah Ita Levitan, Julia Hirschberg and Rivka Levitan
Session: Deception, Personality, and Culture Attribute
An, Kwanghoon
Automatic Early Detection of Amyotrophic Lateral Sclerosis from Intelligible Speech Using Convolutional Neural Networks
Kwanghoon An, Myungjong Kim, Kristin Teplansky, Jordan Green, Thomas Campbell, Yana Yunusova, Daragh Heitzman and Jun Wang
Session: Integrating Speech Science and Technology for Clinical Applications
Dysarthric Speech Recognition Using Convolutional LSTM Neural Network
Myungjong Kim, Beiming Cao, Kwanghoon An and Jun Wang
Session: Application of ASR in Medical Practice
Ando, Atsushi
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder
Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hirokazu Masataki and Yushi Aono
Session: Selected Topics in Neural Speech Processing
Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training
Atsushi Ando, Reine Asakawa, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa and Yushi Aono
Session: Speaker Characterization and Analysis
André, Elisabeth
Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?
Johannes Wagner, Dominik Schiller, Andreas Seiderer and Elisabeth André
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Anumula, Jithendar
Speaker Activity Detection and Minimum Variance Beamforming for Source Separation
Enea Ceolini, Jithendar Anumula, Adrian Huber, Ilya Kiselev and Shih-Chii Liu
Session: Source Separation and Spatial Analysis
Multi-channel Attention for End-to-End Speech Recognition
Stefan Braun, Daniel Neil, Jithendar Anumula, Enea Ceolini and Shih-Chii Liu
Session: End-to-End Speech Recognition
Aono, Yushi
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder
Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hirokazu Masataki and Yushi Aono
Session: Selected Topics in Neural Speech Processing
Automatic DNN Node Pruning Using Mixture Distribution-based Group Regularization
Tsukasa Yoshida, Takafumi Moriya, Kazuho Watanabe, Yusuke Shinohara, Yoshikazu Yamaguchi and Yushi Aono
Session: Selected Topics in Neural Speech Processing
Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training
Atsushi Ando, Reine Asakawa, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa and Yushi Aono
Session: Speaker Characterization and Analysis
Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition
Takafumi Moriya, Sei Ueno, Yusuke Shinohara, Marc Delcroix, Yoshikazu Yamaguchi and Yushi Aono
Session: Adjusting to Speaker, Accent, and Domain
Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition
Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono and Tatsuya Kawahara
Session: Adjusting to Speaker, Accent, and Domain
Neural Error Corrective Language Models for Automatic Speech Recognition
Tomohiro Tanaka, Ryo Masumura, Hirokazu Masataki and Yushi Aono
Session: ASR Systems and Technologies
Armstrong, Zeb
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Atkins, David
Language Features for Automated Evaluation of Cognitive Behavior Psychotherapy Sessions
Nikolaos Flemotomos, Victor Martinez, James Gibson, David Atkins, Torrey Creed and Shrikanth Narayanan
Session: Integrating Speech Science and Technology for Clinical Applications
Computational Modeling of Conversational Humor in Psychotherapy
Anil Ramakrishna, Timothy Greer, David Atkins and Shrikanth Narayanan
Session: Speech and Language Analytics for Mental Health
Using Prosodic and Lexical Information for Learning Utterance-level Behaviors in Psychotherapy
Karan Singla, Zhuohao Chen, Nikolaos Flemotomos, James Gibson, Dogan Can, David Atkins and Shrikanth Narayanan
Session: Speech Pathology, Depression, and Medical Applications
Azarbayejani, Ali
Attention-based Sequence Classification for Affect Detection
Cristina Gorrostieta, Richard Brutti, Kye Taylor, Avi Shapiro, Joseph Moran, Ali Azarbayejani and John Kane
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Bacchiani, Michiel
Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition
Khe Chai Sim, Arun Narayanan, Ananya Misra, Anshuman Tripathi, Golan Pundak, Tara Sainath, Parisa Haghani, Bo Li and Michiel Bacchiani
Session: Acoustic Model Adaptation
Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models
Chanwoo Kim, Ehsan Variani, Arun Narayanan and Michiel Bacchiani
Session: Distant ASR
Badino, Leonardo
Analyzing Vocal Tract Movements During Speech Accommodation
Sankar Mukherjee, Thierry Legou, Leonardo Lancia, Pauline Hilt, Alice Tomassini, Luciano Fadiga, Alessandro D'Ausilio, Leonardo Badino and Noël Nguyen
Session: Spoken Dialogue Systems and Conversational Analysis
Conditional-Computation-Based Recurrent Neural Networks for Computationally Efficient Acoustic Modelling
Raffaele Tavarone and Leonardo Badino
Session: Selected Topics in Neural Speech Processing
Bai, Linxue
Exploring How Phone Classification Neural Networks Learn Phonetic Information by Visualising and Interpreting Bottleneck Features
Linxue Bai, Philip Weber, Peter Jančovič and Martin Russell
Session: Deep Neural Networks: How Can We Interpret What They Learned?
Phone Recognition Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs
Mengjie Qian, Linxue Bai, Peter Jančovič and Martin Russell
Session: Acoustic Modelling
Baird, Alice
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks
Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Sunčica Petrović, Eloise Ainger, Nicholas Cummins and Björn Schuller
Session: Speech and Language Analytics for Mental Health
The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech
Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins and Björn Schuller
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Categorical vs Dimensional Perception of Italian Emotional Speech
Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird and Björn Schuller
Session: Emotion Recognition and Analysis
Barker, Jon
On the Usefulness of the Speech Phase Spectrum for Pitch Extraction
Erfan Loweimi, Jon Barker and Thomas Hain
Session: Speech Analysis and Representation
The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines
Jon Barker, Shinji Watanabe, Emmanuel Vincent and Jan Trmal
Session: Robust Speech Recognition
DNN Driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation
Mandar Gogate, Ahsan Adeel, Ricard Marxer, Jon Barker and Amir Hussain
Session: Spatial and Phase Cues for Source Separation and Speech Recognition
Barrios, Maria A.
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Bartels, Chris
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Articulatory Features for ASR of Pathological Speech
Emre Yılmaz, Vikramjit Mitra, Chris Bartels and Horacio Franco
Session: Application of ASR in Medical Practice
Bartl-Pokorny, Katrin D.
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Baskar, Murali Karthick
BUT OpenSAT 2017 Speech Recognition System
Martin Karafiát, Murali Karthick Baskar, Igor Szöke, Vladimír Malenovský, Karel Veselý, František Grézl, Lukáš Burget and Jan Černocký
Session: Topics in Speech Recognition
BUT System for Low Resource Indian Language ASR
Bhargav Pulugundla, Murali Karthick Baskar, Santosh Kesiraju, Ekaterina Egorova, Martin Karafiát, Lukáš Burget and Jan Černocký
Session: Low Resource Speech Recognition Challenge for Indian Languages
Batliner, Anton
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Categorical vs Dimensional Perception of Italian Emotional Speech
Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird and Björn Schuller
Session: Emotion Recognition and Analysis
Baucom, Brian
Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions
Sandeep Nallan Chakravarthula, Brian Baucom and Panayiotis Georgiou
Session: Speech and Language Analytics for Mental Health
Towards an Unsupervised Entrainment Distance in Conversational Speech Using Deep Neural Networks
Md Nasir, Brian Baucom, Shrikanth Narayanan and Panayiotis Georgiou
Session: Speech Pathology, Depression, and Medical Applications
Baumann, Timo
DialogOS: Simple and Extensible Dialogue Modeling
Alexander Koller, Timo Baumann and Arne Köhn
Session: Show and Tell 1
An Empirical Analysis of the Correlation of Syntax and Prosody
Arne Köhn, Timo Baumann and Oskar Dörfler
Session: Speech Prosody
Analysing the Focus of a Hierarchical Attention Network: the Importance of Enjambments When Classifying Post-modern Poetry
Timo Baumann, Hussein Hussein and Burkhard Meyer-Sickendiek
Session: Speech Prosody
Baumeister, Harald
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
State of Mind: Classification through Self-reported Affect and Word Use in Speech
Eva-Maria Rathner, Yannik Terhorst, Nicholas Cummins, Björn Schuller and Harald Baumeister
Session: Speaker State and Trait
How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives
Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe and Harald Baumeister
Session: Speech Pathology, Depression, and Medical Applications
Bellamkonda, Mallikarjuna Rao
HoloCompanion: An MR Friend for EveryOne
Annam Naresh, Rushabh Gandhi, Mallikarjuna Rao Bellamkonda and Mithun Das Gupta
Session: Show and Tell 3
Bengio, Yoshua
Twin Regularization for Online Speech Recognition
Mirco Ravanelli, Dmitriy Serdyuk and Yoshua Bengio
Session: Acoustic Modelling
Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linares, Renato de Mori and Yoshua Bengio
Session: End-to-End Speech Recognition
Berisha, Visar
A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment
Megan Willi, Stephanie A. Borrie, Tyson S. Barrett, Ming Tu and Visar Berisha
Session: Spoken Dialogue Systems and Conversational Analysis
Investigating the Role of L1 in Automatic Pronunciation Evaluation of L2 Speech
Ming Tu, Anna Grabek, Julie Liss and Visar Berisha
Session: Applications in Education and Learning
Triplet Network with Attention for Speaker Diarization
Huan Song, Megan Willi, Jayaraman J. Thiagarajan, Visar Berisha and Andreas Spanias
Session: Speaker Verification Using Neural Network Methods II
Bhat, Chitralekha
Dysarthric Speech Recognition Using Time-delay Neural Network Based Denoising Autoencoder
Chitralekha Bhat, Biswajit Das, Bhavik Vachhani and Sunil Kumar Kopparapu
Session: Automatic Detection and Recognition of Voice and Speech Disorders
Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition
Bhavik Vachhani, Chitralekha Bhat and Sunil Kumar Kopparapu
Session: Automatic Detection and Recognition of Voice and Speech Disorders
Biswas, Arijit
Temporal Noise Shaping with Companding
Arijit Biswas, Per Hedelin, Lars Villemoes and Vinay Melkote
Session: Coding
Biswas, Astik
Building a Unified Code-Switching ASR System for South African Languages
Emre Yılmaz, Astik Biswas, Ewald van der Westhuizen, Febe de Wet and Thomas Niesler
Session: Speech Technologies for Code-Switching in Multilingual Communities
Multilingual Neural Network Acoustic Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech
Astik Biswas, Febe de Wet, Ewald van der Westhuizen, Emre Yılmaz and Thomas Niesler
Session: Topics in Speech Recognition
Black, Alan W
Investigating Utterance Level Representations for Detecting Intent from Acoustics
SaiKrishna Rallabandi, Bhavya Karki, Carla Viegas, Eric Nyberg and Alan W Black
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Multimodal Polynomial Fusion for Detecting Driver Distraction
Yulun Du, Alan W Black, Louis-Philippe Morency and Maxine Eskenazi
Session: Spoken Dialogue Systems and Conversational Analysis
An Investigation of Convolution Attention Based Models for Multilingual Speech Synthesis of Indian Languages
Pallavi Baljekar, SaiKrishna Rallabandi and Alan W Black
Session: Speech Synthesis Paradigms and Methods
Bonafonte, Antonio
Visualizing Punctuation Restoration in Speech Transcripts with Prosograph
Alp Öktem, Mireia Farrús and Antonio Bonafonte
Session: Show and Tell 4
Spanish Statistical Parametric Speech Synthesis Using a Neural Vocoder
Antonio Bonafonte, Santiago Pascual and Georgina Dorca
Session: Voice Conversion and Speech Synthesis
Expressive Speech Synthesis Using Sentiment Embeddings
Igor Jauk, Jaime Lorenzo-Trueba, Junichi Yamagishi and Antonio Bonafonte
Session: Expressive Speech Synthesis
Bonastre, Jean-François
Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons
Moez Ajili, Jean-François Bonastre and Solange Rossato
Session: Speaker Verification II
Speech Database and Protocol Validation Using Waveform Entropy
Itshak Lapidot, Héctor Delgado, Massimiliano Todisco, Nicholas Evans and Jean-François Bonastre
Session: Spoken Corpora and Annotation
Boulard, Yannick
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
Bourlard, Hervé
CNN Based Query by Example Spoken Term Detection
Dhananjay Ram, Lesly Miculicich and Hervé Bourlard
Session: Spoken Term Detection
Single-channel Late Reverberation Power Spectral Density Estimation Using Denoising Autoencoders
Ina Kodrasi and Hervé Bourlard
Session: Dereverberation
Evolution of Neural Network Architectures for Speech Recognition
Hervé Bourlard
Session: Plenary Talk-2
Phonological Posterior Hashing for Query by Example Spoken Term Detection
Afsaneh Asaei, Dhananjay Ram and Hervé Bourlard
Session: Extracting Information from Audio
Fast Language Adaptation Using Phonological Information
Sibo Tong, Philip N. Garner and Hervé Bourlard
Session: Adjusting to Speaker, Accent, and Domain
Boves, Lou
Analyzing Reaction Time Sequences from Human Participants in Auditory Experiments
Louis ten Bosch, Mirjam Ernestus and Lou Boves
Session: Models of Speech Perception
Analyzing EEG Signals in Auditory Speech Comprehension Using Temporal Response Functions and Generalized Additive Models
Kimberley Mulder, Louis ten Bosch and Lou Boves
Session: Cognition and Brain Studies
Information Encoding by Deep Neural Networks: What Can We Learn?
Louis ten Bosch and Lou Boves
Session: Deep Neural Networks: How Can We Interpret What They Learned?
Bruguier, Antoine
Sequence-to-sequence Neural Network Model with 2D Attention for Learning Japanese Pitch Accents
Antoine Bruguier, Heiga Zen and Arkady Arkhangorodsky
Session: Selected Topics in Neural Speech Processing
Dictionary Augmented Sequence-to-Sequence Neural Network for Grapheme to Phoneme Prediction
Antoine Bruguier, Anton Bakhtin and Dravyansh Sharma
Session: Acoustic Modelling
Brutti, Richard
Attention-based Sequence Classification for Affect Detection
Cristina Gorrostieta, Richard Brutti, Kye Taylor, Avi Shapiro, Joseph Moran, Ali Azarbayejani and John Kane
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Burget, Lukáš
Fast Variational Bayes for Heavy-tailed PLDA Applied to i-vectors and x-vectors
Anna Silnova, Niko Brümmer, Daniel Garcia-Romero, David Snyder and Lukáš Burget
Session: Speaker Verification I
BUT OpenSAT 2017 Speech Recognition System
Martin Karafiát, Murali Karthick Baskar, Igor Szöke, Vladimír Malenovský, Karel Veselý, František Grézl, Lukáš Burget and Jan Černocký
Session: Topics in Speech Recognition
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge
BUT System for Low Resource Indian Language ASR
Bhargav Pulugundla, Murali Karthick Baskar, Santosh Kesiraju, Ekaterina Egorova, Martin Karafiát, Lukáš Burget and Jan Černocký
Session: Low Resource Speech Recognition Challenge for Indian Languages
i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models
Karel Beneš, Santosh Kesiraju and Lukáš Burget
Session: Language Modeling
Busso, Carlos
Role of Regularization in the Prediction of Valence from Speech
Kusha Sridhar, Srinivas Parthasarathy and Carlos Busso
Session: Emotion Modeling
Predicting Categorical Emotions by Jointly Learning Primary and Secondary Emotions through Multitask Learning
Reza Lotfian and Carlos Busso
Session: Emotion Modeling
Audiovisual Speech Activity Detection with Advanced Long Short-Term Memory
Fei Tao and Carlos Busso
Session: Syllabification, Rhythm, and Voice Activity Detection
Preference-Learning with Qualitative Agreement for Sentence Level Emotional Annotations
Srinivas Parthasarathy and Carlos Busso
Session: Speaker State and Trait
Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes
Srinivas Parthasarathy and Carlos Busso
Session: Emotion Recognition and Analysis
Bäckström, Tom
Dithered Quantization for Frequency-Domain Speech and Audio Coding
Tom Bäckström, Johannes Fischer and Sneha Das
Session: Coding
Postfiltering with Complex Spectral Correlations for Speech and Audio Coding
Sneha Das and Tom Bäckström
Session: Coding
Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding
Sneha Das and Tom Bäckström
Session: Coding
C M, Vikram
Spoken Keyword Detection Using Joint DTW-CNN
Ravi Shankar, Vikram C M and S R Mahadeva Prasanna
Session: Spoken Term Detection
Epoch Extraction from Pathological Children Speech Using Single Pole Filtering Approach
Vikram C M and S R Mahadeva Prasanna
Session: Measuring Pitch and Articulation
Detection of Glottal Activity Errors in Production of Stop Consonants in Children with Cleft Lip and Palate
Vikram C M, S R Mahadeva Prasanna, Ajish K Abraham, Pushpavathi M and Girish K S
Session: Acoustic Analysis-Synthesis of Speech Disorders
C, Mahima
Development of Large Vocabulary Speech Recognition System with Keyword Search for Manipuri
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Speech Recognition for Indian Languages
An Automatic Speech Transcription System for Manipuri Language
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Show and Tell 6
TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages
Noor Fathima, Tanvina Patel, Mahima C and Anuroop Iyengar
Session: Low Resource Speech Recognition Challenge for Indian Languages
Cahill, Aoife
Improvements to an Automated Content Scoring System for Spoken CALL Responses: the ETS Submission to the Second Spoken CALL Shared Task
Keelan Evanini, Matthew Mulholland, Rutuja Ubale, Yao Qian, Robert Pugh, Vikram Ramanarayanan and Aoife Cahill
Session: Spoken CALL Shared Task, Second Edition
Cai, Lianhong
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection
Ziwei Zhu, Zhiyong Wu, Runnan Li, Helen Meng and Lianhong Cai
Session: Spoken Term Detection
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms
Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng and Lianhong Cai
Session: Emotion Recognition and Analysis
Caines, Andrew
Impact of ASR Performance on Free Speaking Language Assessment
Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang and Andrew Caines
Session: Applications in Education and Learning
Overview of the 2018 Spoken CALL Shared Task
Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei
Session: Spoken CALL Shared Task, Second Edition
Campbell, Thomas
Fusing Text-dependent Word-level i-Vector Models to Screen ‘at Risk’ Child Speech
Prasanna Kothalkar, Johanna Rudolph, Christine Dollaghan, Jennifer McGlothlin, Thomas Campbell and John H.L. Hansen
Session: Integrating Speech Science and Technology for Clinical Applications
Automatic Early Detection of Amyotrophic Lateral Sclerosis from Intelligible Speech Using Convolutional Neural Networks
Kwanghoon An, Myungjong Kim, Kristin Teplansky, Jordan Green, Thomas Campbell, Yana Yunusova, Daragh Heitzman and Jun Wang
Session: Integrating Speech Science and Technology for Clinical Applications
Cao, Beiming
Dysarthric Speech Recognition Using Convolutional LSTM Neural Network
Myungjong Kim, Beiming Cao, Kwanghoon An and Jun Wang
Session: Application of ASR in Medical Practice
Articulation-to-Speech Synthesis Using Articulatory Flesh Point Sensors’ Orientation Information
Beiming Cao, Myungjong Kim, Jun R. Wang, Jan van Santen, Ted Mau and Jun Wang
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Carmiel, Yishay
Punctuation Prediction Model for Conversational Speech
Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel and Najim Dehak
Session: Topics in Speech Recognition
Deep Neural Networks for Emotion Recognition Combining Audio and Transcripts
Jaejin Cho, Raghavendra Pappagari, Purva Kulkarni, Jesús Villalba, Yishay Carmiel and Najim Dehak
Session: Speaker State and Trait
Casillas, Marisa
Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions
Okko Räsänen, Seshadri Shreyas and Marisa Casillas
Session: Syllabification, Rhythm, and Voice Activity Detection
Talker Diarization in the Wild: the Case of Child-centered Daylong Audio-recordings
Alejandrina Cristia, Shobhana Ganesh, Marisa Casillas and Sriram Ganapathy
Session: Second Language Acquisition and Code-switching
Ceolini, Enea
Speaker Activity Detection and Minimum Variance Beamforming for Source Separation
Enea Ceolini, Jithendar Anumula, Adrian Huber, Ilya Kiselev and Shih-Chii Liu
Session: Source Separation and Spatial Analysis
Multi-channel Attention for End-to-End Speech Recognition
Stefan Braun, Daniel Neil, Jithendar Anumula, Enea Ceolini and Shih-Chii Liu
Session: End-to-End Speech Recognition
Cerda, Mauricio
A French-Spanish Multimodal Speech Communication Corpus Incorporating Acoustic Data, Facial, Hands and Arms Gestures Information
Lucas D. Terissi, Gonzalo Sad, Mauricio Cerda, Slim Ouni, Rodrigo Galvez, Juan C. Gómez, Bernard Girau and Nancy Hitschfeld-Kahler
Session: Spoken Corpora and Annotation
Cervone, Alessandra
Coherence Models for Dialogue
Alessandra Cervone, Evgeny Stepanov and Giuseppe Riccardi
Session: Multimodal Dialogue Systems
Chaudhuri, Sourish
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
Session: Syllabification, Rhythm, and Voice Activity Detection
Chen, Huan-Yu
Self-Assessed Affect Recognition Using Fusion of Attentional BLSTM and Static Acoustic Features
Bo-Hao Su, Sung-Lin Yeh, Ming-Ya Ko, Huan-Yu Chen, Shun-Chang Zhong, Jeng-Lin Li and Chi-Chun Lee
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Chen, Jie
Text-Dependent Speech Enhancement for Small-Footprint Robust Keyword Detection
Meng Yu, Xuan Ji, Yi Gao, Lianwu Chen, Jie Chen, Jimeng Zheng, Dan Su and Dong Yu
Session: Topics in Speech Recognition
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures
Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Chen, Kuan-Yu
Discourse Marker Detection for Hesitation Events on Mandarin Conversation
Yu-Wun Wang, Hen-Hsen Huang, Kuan-Yu Chen and Hsin-Hsi Chen
Session: Speaker Characterization and Analysis
Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings
Da-Rong Liu, Kuan-Yu Chen, Hung-yi Lee and Lin-shan Lee
Session: Acoustic Modelling
Chen, Lianwu
Text-Dependent Speech Enhancement for Small-Footprint Robust Keyword Detection
Meng Yu, Xuan Ji, Yi Gao, Lianwu Chen, Jie Chen, Jimeng Zheng, Dan Su and Dong Yu
Session: Topics in Speech Recognition
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures
Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Chen, Nanxin
An Investigation of Non-linear i-vectors for Speaker Verification
Nanxin Chen, Jesús Villalba and Najim Dehak
Session: Speaker Verification I
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
End-to-end Deep Neural Network Age Estimation
Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesús Villalba, Daniel Povey, Sanjeev Khudanpur and Najim Dehak
Session: Speaker State and Trait
Chen, Szu-Jui
Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline
Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu and Shinji Watanabe
Session: Robust Speech Recognition
Student-Teacher Learning for BLSTM Mask-based Speech Enhancement
Aswin Shanmugam Subramanian, Szu-Jui Chen and Shinji Watanabe
Session: Deep Enhancement
Chen, Zhehuai
A GPU-based WFST Decoder with Exact Lattice Generation
Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey and Sanjeev Khudanpur
Session: Recurrent Neural Models for ASR
Knowledge Distillation for Sequence Model
Mingkun Huang, Yongbin You, Zhehuai Chen, Yanmin Qian and Kai Yu
Session: Acoustic Modelling
Cheng, Gaofeng
Output-Gate Projected Gated Recurrent Unit for Speech Recognition
Gaofeng Cheng, Daniel Povey, Lu Huang, Ji Xu, Sanjeev Khudanpur and Yonghong Yan
Session: Novel Neural Network Architectures for Acoustic Modelling
Investigation on the Combination of Batch Normalization and Dropout in BLSTM-based Acoustic Modeling for ASR
Li Wenjie, Gaofeng Cheng, Fengpei Ge, Pengyuan Zhang and Yonghong Yan
Session: Neural Network Training Strategies for ASR
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks
Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi and Sanjeev Khudanpur
Session: Acoustic Modelling
Cheng, Jian
Real-Time Scoring of an Oral Reading Assessment on Mobile Devices
Jian Cheng
Session: Applications in Education and Learning
Modeling Self-Reported and Observed Affect from Speech
Jian Cheng, Jared Bernstein, Elizabeth Rosenfeld, Peter W. Foltz, Alex S. Cohen, Terje B. Holmlund and Brita Elvevåg
Session: Emotion Recognition and Analysis
Chiu, Chung-Cheng
Compression of End-to-End Models
Ruoming Pang, Tara Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang and Chung-Cheng Chiu
Session: End-to-End Speech Recognition
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Chng, Eng Siong
Mandarin-English Code-switching Speech Recognition
Haihua Xu, Van Tung Pham, Zin Tun Kyaw, Zhi Hao Lim, Eng Siong Chng and Haizhou Li
Session: Show and Tell 2
Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition
Pengcheng Guo, Haihua Xu, Lei Xie and Eng Siong Chng
Session: Speech Technologies for Code-Switching in Multilingual Communities
Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR
Yerbolat Khassanov and Eng Siong Chng
Session: Language Modeling
A Shifted Delta Coefficient Objective for Monaural Speech Separation Using Multi-task Learning
Chenglin Xu, Wei Rao, Eng Siong Chng and Haizhou Li
Session: Source Separation from Monaural Input
Choi, Iksoo
Hierarchical Recurrent Neural Networks for Acoustic Modeling
Jinhwan Park, Iksoo Choi, Yoonho Boo and Wonyong Sung
Session: Acoustic Modelling
Character-level Language Modeling with Gated Hierarchical Recurrent Neural Networks
Iksoo Choi, Jinhwan Park and Wonyong Sung
Session: ASR Systems and Technologies
Chou, Katherine
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Chung, Joon Son
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung, Arsha Nagrani and Andrew Zisserman
Session: Speaker Verification II
The Conversation: Deep Audio-Visual Speech Enhancement
Triantafyllos Afouras, Joon Son Chung and Andrew Zisserman
Session: Deep Enhancement
Deep Lip Reading: A Comparison of Models and an Online Application
Triantafyllos Afouras, Joon Son Chung and Andrew Zisserman
Session: Multimodal Systems
Co, Chris
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Cristia, Alejandrina
The ACLEW DiViMe: An Easy-to-use Diarization Tool
Adrien Le Franc, Eric Riebling, Julien Karadayi, Yun Wang, Camila Scaff, Florian Metze and Alejandrina Cristia
Session: Speaker Diarization
Talker Diarization in the Wild: the Case of Child-centered Daylong Audio-recordings
Alejandrina Cristia, Shobhana Ganesh, Marisa Casillas and Sriram Ganapathy
Session: Second Language Acquisition and Code-switching
Automated Classification of Children’s Linguistic versus Non-Linguistic Vocalisations
Zixing Zhang, Alejandrina Cristia, Anne Warlaumont and Björn Schuller
Session: Second Language Acquisition and Code-switching
Csapó, Tamás Gábor
Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces
László Tóth, Gábor Gosztolya, Tamás Grósz, Alexandra Markó and Tamás Gábor Csapó
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Cui, Jia
Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition
Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu
Session: Sequence Models for ASR
A Multistage Training Framework for Acoustic-to-Word Model
Chengzhu Yu, Chunlei Zhang, Chao Weng, Jia Cui and Dong Yu
Session: Sequence Models for ASR
Cummins, Nicholas
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks
Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Sunčica Petrović, Eloise Ainger, Nicholas Cummins and Björn Schuller
Session: Speech and Language Analytics for Mental Health
The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech
Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins and Björn Schuller
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
State of Mind: Classification through Self-reported Affect and Word Use in Speech
Eva-Maria Rathner, Yannik Terhorst, Nicholas Cummins, Björn Schuller and Harald Baumeister
Session: Speaker State and Trait
How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives
Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe and Harald Baumeister
Session: Speech Pathology, Depression, and Medical Applications
D. S., Pavan Kumar
Implementing Fusion Techniques for the Classification of Paralinguistic Information
Bogdan Vlasenko, Jilt Sebastian, Pavan Kumar D. S. and Mathew Magimai.-Doss
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech
Jilt Sebastian, Manoj Kumar, Pavan Kumar D. S., Mathew Magimai.-Doss, Hema Murthy and Shrikanth Narayanan
Session: Speaker State and Trait
Dai, Li-Rong
WaveNet Vocoder with Limited Training Data for Voice Conversion
Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou and Li-Rong Dai
Session: Voice Conversion and Speech Synthesis
Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis
Xiao Zhou, Zhen-Hua Ling, Zhi-Ping Zhou and Li-Rong Dai
Session: Speech Synthesis Paradigms and Methods
Dai, Lirong
Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition
Jian Tang, Yan Song, Lirong Dai and Ian McLoughlin
Session: Novel Neural Network Architectures for Acoustic Modelling
An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition
Pengcheng Li, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
Session: Representation Learning for Emotion
An Improved Deep Embedding Learning Method for Short Duration Speaker Verification
Zhifu Gao, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
Session: Speaker Verification Using Neural Network Methods II
Dang, Jianwu
Multiple Phase Information Combination for Replay Attacks Detection
Dongbo Li, Longbiao Wang, Jianwu Dang, Meng Liu, Zeyan Oo, Seiichi Nakagawa, Haotian Guan and Xiangang Li
Session: Spoofing Detection
Revealing Spatiotemporal Brain Dynamics of Speech Production Based on EEG and Eye Movement
Bin Zhao, Jinfeng Huang, Gaoyan Zhang, Jianwu Dang, Minbo Chen, Yingjian Fu and Longbiao Wang
Session: Cognition and Brain Studies
Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network
Lili Guo, Longbiao Wang, Jianwu Dang, Linjuan Zhang, Haotian Guan and Xiangang Li
Session: Robust Speech Recognition
Das, Sneha
Dithered Quantization for Frequency-Domain Speech and Audio Coding
Tom Bäckström, Johannes Fischer and Sneha Das
Session: Coding
Postfiltering with Complex Spectral Correlations for Speech and Audio Coding
Sneha Das and Tom Bäckström
Session: Coding
Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding
Sneha Das and Tom Bäckström
Session: Coding
Davis, Chris
Investigating the Role of Familiar Face and Voice Cues in Speech Processing in Noise
Jeesun Kim, Sonya Karisma, Vincent Aubanel and Chris Davis
Session: Speech Perception in Adverse Conditions
Characterizing Rhythm Differences between Strong and Weak Accented L2 Speech
Chris Davis and Jeesun Kim
Session: Second Language Acquisition and Code-switching
Dehak, Najim
An Investigation of Non-linear i-vectors for Speaker Verification
Nanxin Chen, Jesús Villalba and Najim Dehak
Session: Speaker Verification I
Investigation on Bandwidth Extension for Speaker Recognition
Phani Sankar Nidadavolu, Cheng-I Lai, Jesús Villalba and Najim Dehak
Session: Speaker Verification II
Visualizing Phoneme Category Adaptation in Deep Neural Networks
Odette Scharenborg, Sebastian Tiesmeyer, Mark Hasegawa-Johnson and Najim Dehak
Session: Deep Neural Networks: How Can We Interpret What They Learned?
Effectiveness of Single-Channel BLSTM Enhancement for Language Identification
Peter Sibbern Frederiksen, Jesús Villalba, Shinji Watanabe, Zheng-Hua Tan and Najim Dehak
Session: Language Identification
Automatic Speech Recognition and Topic Identification from Speech for Almost-Zero-Resource Languages
Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak and Sanjeev Khudanpur
Session: Extracting Information from Audio
Punctuation Prediction Model for Conversational Speech
Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel and Najim Dehak
Session: Topics in Speech Recognition
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
Session: The First DIHARD Speech Diarization Challenge
Deep Neural Networks for Emotion Recognition Combining Audio and Transcripts
Jaejin Cho, Raghavendra Pappagari, Purva Kulkarni, Jesús Villalba, Yishay Carmiel and Najim Dehak
Session: Speaker State and Trait
Emotion Identification from Raw Speech Signals Using DNNs
Mousmita Sarma, Pegah Ghahremani, Daniel Povey, Nagendra Kumar Goel, Kandarpa Kumar Sarma and Najim Dehak
Session: Representation Learning for Emotion
End-to-end Deep Neural Network Age Estimation
Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesús Villalba, Daniel Povey, Sanjeev Khudanpur and Najim Dehak
Session: Speaker State and Trait
Deka, Abhash
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information
Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha, S R Mahadeva Prasanna, Priyankoo Sarmah, K Samudravijaya and Nirmala S.R.
Session: Show and Tell 7
Deka, Barsha
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information
Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha, S R Mahadeva Prasanna, Priyankoo Sarmah, K Samudravijaya and Nirmala S.R.
Session: Show and Tell 7
Delcroix, Marc
Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition
Takafumi Moriya, Sei Ueno, Yusuke Shinohara, Marc Delcroix, Yoshikazu Yamaguchi and Yushi Aono
Session: Adjusting to Speaker, Accent, and Domain
Auxiliary Feature Based Adaptation of End-to-end ASR Systems
Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita and Tomohiro Nakatani
Session: Adjusting to Speaker, Accent, and Domain
Semi-Supervised End-to-End Speech Recognition
Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Atsunori Ogawa and Marc Delcroix
Session: End-to-End Speech Recognition
Integrating Neural Network Based Beamforming and Weighted Prediction Error Dereverberation
Lukas Drude, Christoph Boeddeker, Jahn Heymann, Reinhold Haeb-Umbach, Keisuke Kinoshita, Marc Delcroix and Tomohiro Nakatani
Session: Distant ASR
Delgado, Héctor
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion
Massimiliano Todisco, Héctor Delgado, Kong Aik Lee, Md Sahidullah, Nicholas Evans, Tomi Kinnunen and Junichi Yamagishi
Session: Speaker Verification I
Speech Database and Protocol Validation Using Waveform Entropy
Itshak Lapidot, Héctor Delgado, Massimiliano Todisco, Nicholas Evans and Jean-François Bonastre
Session: Spoken Corpora and Annotation
The EURECOM Submission to the First DIHARD Challenge
Jose Patino, Héctor Delgado and Nicholas Evans
Session: The First DIHARD Speech Diarization Challenge
Dellwo, Volker
The Zurich Corpus of Vowel and Voice Quality, Version 1.0
Dieter Maurer, Christian d’Heureuse, Heidy Suter, Volker Dellwo, Daniel Friedrichs and Thayabaran Kathiresan
Session: Phonation
Influences of Fundamental Oscillation on Speaker Identification in Vocalic Utterances by Humans and Computers
Volker Dellwo, Thayabaran Kathiresan, Elisa Pellegrino, Lei He, Sandra Schwab and Dieter Maurer
Session: Speech and Speaker Perception
Demuynck, Kris
Cross-lingual Speech Emotion Recognition through Factor Analysis
Brecht Desplanques and Kris Demuynck
Session: Emotion Recognition and Analysis
Vowel Space as a Tool to Evaluate Articulation Problems
Rob van Son, Catherine Middag and Kris Demuynck
Session: Acoustic Analysis-Synthesis of Speech Disorders
Deng, Shiwen
Unsupervised Temporal Feature Learning Based on Sparse Coding Embedded BoAW for Acoustic Event Recognition
Liwen Zhang, Jiqing Han and Shiwen Deng
Session: Acoustic Scenes and Rare Events
A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification
Hongwei Song, Jiqing Han and Shiwen Deng
Session: Acoustic Scenes and Rare Events
Dernoncourt, Franck
A Framework for Speech Recognition Benchmarking
Franck Dernoncourt, Trung Bui and Walter Chang
Session: Show and Tell 1
Dey, Abhishek
Robust Mizo Continuous Speech Recognition
Abhishek Dey, Biswajit Dev Sarma, Wendy Lalhminghlui, Lalnunsiami Ngente, Parismita Gogoi, Priyankoo Sarmah, S R Mahadeva Prasanna, Rohit Sinha and Nirmala S.R.
Session: Speech Recognition for Indian Languages
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information
Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha, S R Mahadeva Prasanna, Priyankoo Sarmah, K Samudravijaya and Nirmala S.R.
Session: Show and Tell 7
Dey, Subhadeep
Analysis of Language Dependent Front-End for Speaker Recognition
Srikanth Madikeri, Subhadeep Dey and Petr Motlicek
Session: Speaker Verification II
End-to-end Text-dependent Speaker Verification Using Novel Distance Measures
Subhadeep Dey, Srikanth Madikeri and Petr Motlicek
Session: Speaker Verification Using Neural Network Methods II
Diez, Mireia
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge
Ding, Shaojin
Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function
Shaojin Ding, Guanlong Zhao, Christopher Liberatore and Ricardo Gutierrez-Osuna
Session: Voice Conversion
Learning Structured Dictionaries for Exemplar-based Voice Conversion
Shaojin Ding, Christopher Liberatore and Ricardo Gutierrez-Osuna
Session: Voice Conversion
Djamali, Julia
How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives
Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe and Harald Baumeister
Session: Speech Pathology, Depression, and Medical Applications
Dmitriev, Evgeny
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
Do Carmo Blanco, Noelia
Loud and Shouted Speech Perception at Variable Distances in a Forest
Julien Meyer, Fanny Meunier, Laure Dentel, Noelia Do Carmo Blanco and Frédéric Sèbe
Session: Speech Perception in Adverse Conditions
Phoneme Resistance and Phoneme Confusion in Noise: Impact of Dyslexia
Noelia Do Carmo Blanco, Julien Meyer, Michel Hoen and Fanny Meunier
Session: Speech Perception in Adverse Conditions
Dong, Fengquan
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of Results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Dong, Linhao
Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese
Shiyu Zhou, Linhao Dong, Shuang Xu and Bo Xu
Session: Sequence Models for ASR
Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin
Linhao Dong, Shiyu Zhou, Wei Chen and Bo Xu
Session: Sequence Models for ASR
Du, Jun
Speaker Diarization with Enhancing Speech for the First DIHARD Challenge
Lei Sun, Jun Du, Chao Jiang, Xueyang Zhang, Shan He, Bing Yin and Chin-Hui Lee
Session: The First DIHARD Speech Diarization Challenge
Error Modeling via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement
Li Chai, Jun Du and Chin-Hui Lee
Session: Deep Enhancement
Duan, Pengfei
Multi-frame Quantization of LSF Parameters Using a Deep Autoencoder and Pyramid Vector Quantizer
Yaxing Li, Eshete Derb Emiru, Shengwu Xiong, Anna Zhu, Pengfei Duan and Yichang Li
Session: Coding
Multi-frame Coding of LSF Parameters Using Block-Constrained Trellis Coded Vector Quantization
Yaxing Li, Shan Xu, Shengwu Xiong, Anna Zhu, Pengfei Duan and Yueming Ding
Session: Coding
Dupoux, Emmanuel
End-to-End Speech Recognition from the Raw Waveform
Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert and Emmanuel Dupoux
Session: Sequence Models for ASR
Sampling Strategies in Siamese Networks for Unsupervised Speech Representation Learning
Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz and Emmanuel Dupoux
Session: Topics in Speech Recognition
Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments
Nils Holzenberger, Mingxing Du, Julien Karadayi, Rachid Riad and Emmanuel Dupoux
Session: Zero-resource Speech Recognition
Earnshaw, Kate
Variation in the FACE Vowel across West Yorkshire: Implications for Forensic Speaker Comparisons
Kate Earnshaw and Erica Gold
Session: Dialectal Variation
The ‘West Yorkshire Regional English Database’: Investigations into the Generalizability of Reference Populations for Forensic Speaker Comparison Casework
Erica Gold, Sula Ross and Kate Earnshaw
Session: Dialectal Variation
Einspieler, Christa
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of Results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Ellis, Daniel P. W.
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
Session: Syllabification, Rhythm, and Voice Activity Detection
Enrique Yalta Soplin, Nelson
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
Epps, Julien
Detection of Replay-Spoofing Attacks Using Frequency Modulation Features
Tharshini Gunendradasan, Buddhi Wickramasinghe, Ngoc Phu Le, Eliathamby Ambikairajah and Julien Epps
Session: Spoofing Detection
Frequency Domain Linear Prediction Features for Replay Spoofing Attack Detection
Buddhi Wickramasinghe, Saad Irtza, Eliathamby Ambikairajah and Julien Epps
Session: Spoofing Detection
Transfer Learning for Improving Speech Emotion Classification Accuracy
Siddique Latif, Rajib Rana, Shahzad Younis, Junaid Qadir and Julien Epps
Session: Speaker State and Trait
Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study
Siddique Latif, Rajib Rana, Junaid Qadir and Julien Epps
Session: Representation Learning for Emotion
Depression Detection from Short Utterances via Diverse Smartphones in Natural Environmental Conditions
Zhaocheng Huang, Julien Epps, Dale Joachim and Michael Chen
Session: Speech Pathology, Depression, and Medical Applications
Demonstrating and Modelling Systematic Time-varying Annotator Disagreement in Continuous Emotion Annotation
Mia Atcheson, Vidhyasaharan Sethu and Julien Epps
Session: Emotion Recognition and Analysis
Erdogan, Hakan
Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks
Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao and Fil Alleva
Session: Distant ASR
Investigations on Data Augmentation and Loss Functions for Deep Learning Based Speech-Background Separation
Hakan Erdogan and Takuya Yoshioka
Session: Source Separation from Monaural Input
Erzin, Engin
Monitoring Infant's Emotional Cry in Domestic Environments Using the Capsule Network Architecture
Mehmet Ali Tuğtekin Turan and Engin Erzin
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions
Bekir Berker Türker, Engin Erzin, Yücel Yemez and Metin Sezgin
Session: Speaker Characterization and Analysis
Espy-Wilson, Carol
Noise Robust Acoustic to Articulatory Speech Inversion
Nadee Seneviratne, Ganesh Sivaraman, Vikramjit Mitra and Carol Espy-Wilson
Session: Articulatory Information, Modeling and Inversion
On Enhancing Speech Emotion Recognition Using Generative Adversarial Networks
Saurabh Sahu, Rahul Gupta and Carol Espy-Wilson
Session: Emotion Recognition and Analysis
Estève, Yannick
Task Specific Sentence Embeddings for ASR Error Detection
Sahar Ghannay, Yannick Estève and Nathalie Camelin
Session: Selected Topics in Neural Speech Processing
Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition
Natalia Tomashenko, Yuri Khokhlov and Yannick Estève
Session: Adjusting to Speaker, Accent, and Domain
Acoustic-dependent Phonemic Transcription for Text-to-speech Synthesis
Kévin Vythelingum, Yannick Estève and Olivier Rosec
Session: Speech Synthesis Paradigms and Methods
Evanini, Keelan
Game-based Spoken Dialog Language Learning Applications for Young Students
Keelan Evanini, Veronika Timpe-Laughlin, Eugene Tsuprun, Ian Blood, Jeremy Lee, James Bruno, Vikram Ramanarayanan, Patrick Lange and David Suendermann-Oeft
Session: Show and Tell 2
Toward Scalable Dialog Technology for Conversational Language Learning: Case Study of the TOEFL® MOOC
Vikram Ramanarayanan, David Pautler, Patrick Lange, Eugene Tsuprun, Rutuja Ubale, Keelan Evanini and David Suendermann-Oeft
Session: Show and Tell 5
Improvements to an Automated Content Scoring System for Spoken CALL Responses: the ETS Submission to the Second Spoken CALL Shared Task
Keelan Evanini, Matthew Mulholland, Rutuja Ubale, Yao Qian, Robert Pugh, Vikram Ramanarayanan and Aoife Cahill
Session: Spoken CALL Shared Task, Second Edition
Evans, Nicholas
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion
Massimiliano Todisco, Héctor Delgado, Kong Aik Lee, Md Sahidullah, Nicholas Evans, Tomi Kinnunen and Junichi Yamagishi
Session: Speaker Verification I
Artificial Bandwidth Extension with Memory Inclusion Using Semi-supervised Stacked Auto-encoders
Pramod Bachhav, Massimiliano Todisco and Nicholas Evans
Session: Novel Approaches to Enhancement
Speech Database and Protocol Validation Using Waveform Entropy
Itshak Lapidot, Héctor Delgado, Massimiliano Todisco, Nicholas Evans and Jean-François Bonastre
Session: Spoken Corpora and Annotation
The EURECOM Submission to the First DIHARD Challenge
Jose Patino, Héctor Delgado and Nicholas Evans
Session: The First DIHARD Speech Diarization Challenge
Fathima, Noor
Development of Large Vocabulary Speech Recognition System with Keyword Search for Manipuri
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Speech Recognition for Indian Languages
An Automatic Speech Transcription System for Manipuri Language
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Show and Tell 6
TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages
Noor Fathima, Tanvina Patel, Mahima C and Anuroop Iyengar
Session: Low Resource Speech Recognition Challenge for Indian Languages
Fedotov, Dmitrii
LSTM Based Cross-corpus and Cross-task Acoustic Emotion Recognition
Heysem Kaya, Dmitrii Fedotov, Ali Yeşilkanat, Oxana Verkholyak, Yang Zhang and Alexey Karpov
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Feng, Siyuan
Improving Cross-Lingual Knowledge Transferability Using Multilingual TDNN-BLSTM with Language-Dependent Pre-Final Layer
Siyuan Feng and Tan Lee
Session: Adjusting to Speaker, Accent, and Domain
Exploiting Speaker and Phonetic Diversity of Mismatched Language Resources for Unsupervised Subword Modeling
Siyuan Feng and Tan Lee
Session: Zero-resource Speech Recognition
Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning
Ying Qin, Tan Lee, Siyuan Feng and Anthony Pak Hin Kong
Session: Speech Pathology, Depression, and Medical Applications
Feng, Zhe
An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification
Ahmed Imtiaz Humayun, Md. Tauhiduzzaman Khan, Shabnam Ghaffarzadegan, Zhe Feng and Taufiq Hasan
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Flemotomos, Nikolaos
Combined Speaker Clustering and Role Recognition in Conversational Speech
Nikolaos Flemotomos, Pavlos Papadopoulos, James Gibson and Shrikanth Narayanan
Session: Speaker Diarization
Language Features for Automated Evaluation of Cognitive Behavior Psychotherapy Sessions
Nikolaos Flemotomos, Victor Martinez, James Gibson, David Atkins, Torrey Creed and Shrikanth Narayanan
Session: Integrating Speech Science and Technology for Clinical Applications
Using Prosodic and Lexical Information for Learning Utterance-level Behaviors in Psychotherapy
Karan Singla, Zhuohao Chen, Nikolaos Flemotomos, James Gibson, Dogan Can, David Atkins and Shrikanth Narayanan
Session: Speech Pathology, Depression, and Medical Applications
Franco, Horacio
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Articulatory Features for ASR of Pathological Speech
Emre Yılmaz, Vikramjit Mitra, Chris Bartels and Horacio Franco
Session: Application of ASR in Medical Practice
Fu, Ruibo
Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis
Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
Session: Statistical Parametric Speech Synthesis
On the Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis
Yibin Zheng, Jianhua Tao, Zhengqi Wen and Ruibo Fu
Session: Statistical Parametric Speech Synthesis
Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer
Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
Session: Speech Synthesis Paradigms and Methods
Gafos, Adamantios
Speaker-specific Structure in German Voiceless Stop Voice Onset Times
Marc Antony Hullebus, Stephen Tobin and Adamantios Gafos
Session: Phonation
Perceptual Sensitivity to Spectral Change in Australian English Close Front Vowels: An Electroencephalographic Investigation
Daniel Williams, Paola Escudero and Adamantios Gafos
Session: Cognition and Brain Studies
Gales, Mark
Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems
Yu Wang, Chao Zhang, Mark Gales and Philip Woodland
Session: Acoustic Model Adaptation
A Deep Learning Approach to Assessing Non-native Pronunciation of English Using Phone Distances
Konstantinos Kyriakopoulos, Kate Knill and Mark Gales
Session: Applications in Education and Learning
Impact of ASR Performance on Free Speaking Language Assessment
Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang and Andrew Caines
Session: Applications in Education and Learning
Automatic Speech Recognition System Development in the "Wild"
Anton Ragni and Mark Gales
Session: Recurrent Neural Models for ASR
Active Memory Networks for Language Modeling
Oscar Chen, Anton Ragni, Mark Gales and Xie Chen
Session: Language Modeling
Gallagher, Andrew
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
Session: Syllabification, Rhythm, and Voice Activity Detection
Galvez, Rodrigo
A French-Spanish Multimodal Speech Communication Corpus Incorporating Acoustic Data, Facial, Hands and Arms Gestures Information
Lucas D. Terissi, Gonzalo Sad, Mauricio Cerda, Slim Ouni, Rodrigo Galvez, Juan C. Gómez, Bernard Girau and Nancy Hitschfeld-Kahler
Session: Spoken Corpora and Annotation
Gamble, Paul
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Deep Speech Denoising with Vector Space Projections
Jeffrey Hetherly, Paul Gamble, Maria Alejandra Barrios, Cory Stephenson and Karl Ni
Session: Source Separation from Monaural Input
Ganapathy, Sriram
Supervised I-vector Modeling - Theory and Applications
Shreyas Ramoji and Sriram Ganapathy
Session: Speaker Verification II
On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification
Rajath Kumar, Vaishnavi Yeruva and Sriram Ganapathy
Session: Speaker Verification II
Talker Diarization in the Wild: the Case of Child-centered Daylong Audio-recordings
Alejandrina Cristia, Shobhana Ganesh, Marisa Casillas and Sriram Ganapathy
Session: Second Language Acquisition and Code-switching
Comparison of Unsupervised Modulation Filter Learning Methods for ASR
Purvi Agrawal and Sriram Ganapathy
Session: Neural Network Training Strategies for ASR
Far-Field Speech Recognition Using Multivariate Autoregressive Models
Sriram Ganapathy and Madhumita Harish
Session: Distant ASR
Speaker and Language Recognition -- From Laboratory Technologies to the Wild
Sriram Ganapathy
Session: Perspective Talk-4
Garcia-Romero, Daniel
Fast Variational Bayes for Heavy-tailed PLDA Applied to i-vectors and x-vectors
Anna Silnova, Niko Brümmer, Daniel Garcia-Romero, David Snyder and Lukáš Burget
Session: Speaker Verification I
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
Session: The First DIHARD Speech Diarization Challenge
Garner, Philip N.
Fast Language Adaptation Using Phonological Information
Sibo Tong, Philip N. Garner and Hervé Bourlard
Session: Adjusting to Speaker, Accent, and Domain
A Neural Model to Predict Parameters for a Generalized Command Response Model of Intonation
Bastian Schnell and Philip N. Garner
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Georgiou, Panayiotis
An Unsupervised Neural Prediction Framework for Learning Speaker Embeddings Using Recurrent Neural Networks
Arindam Jati and Panayiotis Georgiou
Session: Speaker Verification II
Multimodal Speaker Segmentation and Diarization Using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks
Tae Jin Park and Panayiotis Georgiou
Session: Speaker Diarization
Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions
Sandeep Nallan Chakravarthula, Brian Baucom and Panayiotis Georgiou
Session: Speech and Language Analytics for Mental Health
Towards an Unsupervised Entrainment Distance in Conversational Speech Using Deep Neural Networks
Md Nasir, Brian Baucom, Shrikanth Narayanan and Panayiotis Georgiou
Session: Speech Pathology, Depression, and Medical Applications
Ghaffarzadegan, Shabnam
An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification
Ahmed Imtiaz Humayun, Md. Tauhiduzzaman Khan, Shabnam Ghaffarzadegan, Zhe Feng and Taufiq Hasan
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Ghahremani, Pegah
Acoustic Modeling from Frequency Domain Representations of Speech
Pegah Ghahremani, Hossein Hadian, Hang Lv, Daniel Povey and Sanjeev Khudanpur
Session: Robust Speech Recognition
Emotion Identification from Raw Speech Signals Using DNNs
Mousmita Sarma, Pegah Ghahremani, Daniel Povey, Nagendra Kumar Goel, Kandarpa Kumar Sarma and Najim Dehak
Session: Representation Learning for Emotion
End-to-end Deep Neural Network Age Estimation
Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesús Villalba, Daniel Povey, Sanjeev Khudanpur and Najim Dehak
Session: Speaker State and Trait
Ghosh, Prasanta Kumar
Whispered Speech to Neutral Speech Conversion Using Bidirectional LSTMs
G. Nisha Meenakshi and Prasanta Kumar Ghosh
Session: Voice Conversion
Intonation tutor by SPIRE (In-SPIRE): An Online Tool for an Automatic Feedback to the Second Language Learners in Learning Intonation
Anand P A, Chiranjeevi Yarra, Kausthubha N K and Prasanta Kumar Ghosh
Session: Show and Tell 2
Subband Weighting for Binaural Speech Source Localization
Karthik Girija Ramesan, Parth Suresh and Prasanta Kumar Ghosh
Session: Source Separation and Spatial Analysis
Reconstructing Neutral Speech from Tracheoesophageal Speech
Abinay Reddy N, Achuth Rao MV, G. Nisha Meenakshi and Prasanta Kumar Ghosh
Session: Speech and Singing Production
SPIRE-SST: An Automatic Web-based Self-learning Tool for Syllable Stress Tutoring (SST) to the Second Language Learners
Chiranjeevi Yarra, Anand P A, Kausthubha N K and Prasanta Kumar Ghosh
Session: Show and Tell 6
Relating Articulatory Motions in Different Speaking Rates
Astha Singh, G. Nisha Meenakshi and Prasanta Kumar Ghosh
Session: Source and Supra-segmentals
Automatic Glottis Localization and Segmentation in Stroboscopic Videos Using Deep Neural Network
Achuth Rao MV, Rahul Krishnamurthy, Pebbili Gopikishore, Veeramani Priyadharshini and Prasanta Kumar Ghosh
Session: Source and Supra-segmentals
Low Resource Acoustic-to-articulatory Inversion Using Bi-directional Long Short Term Memory
Aravind Illa and Prasanta Kumar Ghosh
Session: Articulatory Information, Modeling and Inversion
Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training
Chandana S, Chiranjeevi Yarra, Ritu Aggarwal, Sanjeev Kumar Mittal, Kausthubha N K, Raseena K T, Astha Singh and Prasanta Kumar Ghosh
Session: Articulatory Information, Modeling and Inversion
Air-Tissue Boundary Segmentation in Real-Time Magnetic Resonance Imaging Video Using Semantic Segmentation with Fully Convolutional Networks
Valliappan CA, Renuka Mannem and Prasanta Kumar Ghosh
Session: Articulatory Information, Modeling and Inversion
Speech Enhancement Using Deep Mixture of Experts Based on Hard Expectation Maximization
Pavan Karjol and Prasanta Kumar Ghosh
Session: Deep Enhancement
Gibson, James
Combined Speaker Clustering and Role Recognition in Conversational Speech
Nikolaos Flemotomos, Pavlos Papadopoulos, James Gibson and Shrikanth Narayanan
Session: Speaker Diarization
Language Features for Automated Evaluation of Cognitive Behavior Psychotherapy Sessions
Nikolaos Flemotomos, Victor Martinez, James Gibson, David Atkins, Torrey Creed and Shrikanth Narayanan
Session: Integrating Speech Science and Technology for Clinical Applications
Using Prosodic and Lexical Information for Learning Utterance-level Behaviors in Psychotherapy
Karan Singla, Zhuohao Chen, Nikolaos Flemotomos, James Gibson, Dogan Can, David Atkins and Shrikanth Narayanan
Session: Speech Pathology, Depression, and Medical Applications
Girau, Bernard
A French-Spanish Multimodal Speech Communication Corpus Incorporating Acoustic Data, Facial, Hands and Arms Gestures Information
Lucas D. Terissi, Gonzalo Sad, Mauricio Cerda, Slim Ouni, Rodrigo Galvez, Juan C. Gómez, Bernard Girau and Nancy Hitschfeld-Kahler
Session: Spoken Corpora and Annotation
Glass, James
Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech
Yu-An Chung and James Glass
Session: Sequence Models for ASR
Scalable Factorized Hierarchical Variational Autoencoder Training
Wei-Ning Hsu and James Glass
Session: Deep Neural Networks: How Can We Interpret What They Learned?
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition
Wei-Ning Hsu, Hao Tang and James Glass
Session: Robust Speech Recognition
Detecting Depression with Audio/Text Sequence Modeling of Interviews
Tuka Al Hanai, Mohammad Ghassemi and James Glass
Session: Integrating Speech Science and Technology for Clinical Applications
A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition
Hao Tang, Wei-Ning Hsu, François Grondin and James Glass
Session: Neural Network Training Strategies for ASR
Glembek, Ondřej
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge
Gobl, Christer
Voice Source Contribution to Prominence Perception: Rd Implementation
Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide and Christer Gobl
Session: Speech Segments and Voice Quality
On the Relationship between Glottal Pulse Shape and Its Spectrum: Correlations of Open Quotient, Pulse Skew and Peak Flow with Source Harmonic Amplitudes
Christer Gobl, Andy Murphy, Irena Yanushevskaya and Ailbhe Ní Chasaide
Session: Speech Segments and Voice Quality
Gogoi, Parismita
Robust Mizo Continuous Speech Recognition
Abhishek Dey, Biswajit Dev Sarma, Wendy Lalhminghlui, Lalnunsiami Ngente, Parismita Gogoi, Priyankoo Sarmah, S R Mahadeva Prasanna, Rohit Sinha and Nirmala S.R.
Session: Speech Recognition for Indian Languages
Analysis of Breathiness in Contextual Vowel of Voiceless Nasals in Mizo
Pamir Gogoi, Sishir Kalita, Parismita Gogoi, Ratree Wayland, Priyankoo Sarmah and S R Mahadeva Prasanna
Session: Speech Segments and Voice Quality
Gold, Erica
Articulation Rate as a Speaker Discriminant in British English
Erica Gold
Session: Production of Prosody
Variation in the FACE Vowel across West Yorkshire: Implications for Forensic Speaker Comparisons
Kate Earnshaw and Erica Gold
Session: Dialectal Variation
The ‘West Yorkshire Regional English Database’: Investigations into the Generalizability of Reference Populations for Forensic Speaker Comparison Casework
Erica Gold, Sula Ross and Kate Earnshaw
Session: Dialectal Variation
Goldwater, Sharon
Low-Resource Speech-to-Text Translation
Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez and Sharon Goldwater
Session: Selected Topics in Neural Speech Processing
Multilingual Bottleneck Features for Subword Modeling in Zero-resource Languages
Enno Hermann and Sharon Goldwater
Session: Zero-resource Speech Recognition
Gong, Yifan
Cycle-Consistent Speech Enhancement
Zhong Meng, Jinyu Li, Yifan Gong and Biing-Hwang (Fred) Juang
Session: Novel Approaches to Enhancement
Layer Trajectory LSTM
Jinyu Li, Changliang Liu and Yifan Gong
Session: Novel Neural Network Architectures for Acoustic Modelling
Industry presentation by Microsoft
Yifan Gong
Session: Industry Presentation-4
Adversarial Feature-Mapping for Speech Enhancement
Zhong Meng, Jinyu Li, Yifan Gong and Biing-Hwang (Fred) Juang
Session: Deep Enhancement
Gorrostieta, Cristina
Attention-based Sequence Classification for Affect Detection
Cristina Gorrostieta, Richard Brutti, Kye Taylor, Avi Shapiro, Joseph Moran, Ali Azarbayejani and John Kane
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Gosztolya, Gábor
General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats
Gábor Gosztolya, Tamás Grósz and László Tóth
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
User-centric Evaluation of Automatic Punctuation in ASR Closed Captioning
Máté Ákos Tündik, György Szaszák, Gábor Gosztolya and András Beke
Session: Topics in Speech Recognition
Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces
László Tóth, Gábor Gosztolya, Tamás Grósz, Alexandra Markó and Tamás Gábor Csapó
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Identifying Schizophrenia Based on Temporal Parameters in Spontaneous Speech
Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi and Ildikó Hoffmann
Session: Speech Pathology, Depression, and Medical Applications
Govender, Avashna
Using Pupillometry to Measure the Cognitive Load of Synthetic Speech
Avashna Govender and Simon King
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Measuring the Cognitive Load of Synthetic Speech Using a Dual Task Paradigm
Avashna Govender and Simon King
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Graciarena, Martin
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson and Martin Graciarena
Session: Speaker Verification II
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Green, Jordan
Automatic Detection of Orofacial Impairment in Stroke
Andrea Bandini, Jordan Green, Brian Richburg and Yana Yunusova
Session: Integrating Speech Science and Technology for Clinical Applications
Automatic Early Detection of Amyotrophic Lateral Sclerosis from Intelligible Speech Using Convolutional Neural Networks
Kwanghoon An, Myungjong Kim, Kristin Teplansky, Jordan Green, Thomas Campbell, Yana Yunusova, Daragh Heitzman and Jun Wang
Session: Integrating Speech Science and Technology for Clinical Applications
Grósz, Tamás
General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats
Gábor Gosztolya, Tamás Grósz and László Tóth
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces
László Tóth, Gábor Gosztolya, Tamás Grósz, Alexandra Markó and Tamás Gábor Csapó
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Guan, Haotian
Multiple Phase Information Combination for Replay Attacks Detection
Dongbo Li, Longbiao Wang, Jianwu Dang, Meng Liu, Zeyan Oo, Seiichi Nakagawa, Haotian Guan and Xiangang Li
Session: Spoofing Detection
Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network
Lili Guo, Longbiao Wang, Jianwu Dang, Linjuan Zhang, Haotian Guan and Xiangang Li
Session: Robust Speech Recognition
Guarino Reid, Loretta
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
Session: Syllabification, Rhythm, and Voice Activity Detection
Gui, Xiangquan
Acoustic Features Associated with Sustained Vowel and Continuous Speech Productions by Chinese Children with Functional Articulation Disorders
Wang Zhang, Xiangquan Gui, Tianqi Wang, Manwa Ng, Feng Yang, Lan Wang and Nan Yan
Session: Integrating Speech Science and Technology for Clinical Applications
Guo, Jinxi
Effectiveness of Voice Quality Features in Detecting Depression
Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Jonathan Flint and Abeer Alwan
Session: Integrating Speech Science and Technology for Clinical Applications
Filter Sampling and Combination CNN (FSC-CNN): A Compact CNN Model for Small-footprint ASR Acoustic Modeling Using Raw Waveforms
Jinxi Guo, Ning Xu, Xin Chen, Yang Shi, Kaiyuan Xu and Abeer Alwan
Session: Acoustic Modelling
Guo, Wu
Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification
Lanhua You, Wu Guo, Yan Song and Sheng Zhang
Session: Speaker Verification I
Gated Convolutional Neural Network for Sentence Matching
Peixin Chen, Wu Guo, Zhi Chen, Jian Sun and Lanhua You
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition
Pengcheng Li, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
Session: Representation Learning for Emotion
An Improved Deep Embedding Learning Method for Short Duration Speaker Verification
Zhifu Gao, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
Session: Speaker Verification Using Neural Network Methods II
Gupta, Vishwa
CRIM's System for the MGB-3 English Multi-Genre Broadcast Media Transcription
Vishwa Gupta and Gilles Boulianne
Session: Topics in Speech Recognition
Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification
Gautam Bhattacharya, Md Jahangir Alam, Vishwa Gupta and Patrick Kenny
Session: Speaker Verification Using Neural Network Methods II
Gurugubelli, Krishna
An Exploration towards Joint Acoustic Modeling for Indian Languages: IIIT-H Submission for Low Resource Speech Recognition Challenge for Indian Languages, INTERSPEECH 2018
Hari Krishna, Krishna Gurugubelli, Vishnu Vidyadhara Raju V and Anil Kumar Vuppala
Session: Low Resource Speech Recognition Challenge for Indian Languages
Gutierrez-Osuna, Ricardo
Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function
Shaojin Ding, Guanlong Zhao, Christopher Liberatore and Ricardo Gutierrez-Osuna
Session: Voice Conversion
Learning Structured Dictionaries for Exemplar-based Voice Conversion
Shaojin Ding, Christopher Liberatore and Ricardo Gutierrez-Osuna
Session: Voice Conversion
L2-ARCTIC: A Non-native English Speech Corpus
Guanlong Zhao, Sinem Sonsaat, Alif Silpachai, Ivana Lucic, Evgeny Chukharev-Hudilainen, John Levis and Ricardo Gutierrez-Osuna
Session: Spoken Corpora and Annotation
Gómez, Juan C.
A French-Spanish Multimodal Speech Communication Corpus Incorporating Acoustic Data, Facial, Hands and Arms Gestures Information
Lucas D. Terissi, Gonzalo Sad, Mauricio Cerda, Slim Ouni, Rodrigo Galvez, Juan C. Gómez, Bernard Girau and Nancy Hitschfeld-Kahler
Session: Spoken Corpora and Annotation
Hadian, Hossein
Acoustic Modeling from Frequency Domain Representations of Speech
Pegah Ghahremani, Hossein Hadian, Hang Lv, Daniel Povey and Sanjeev Khudanpur
Session: Robust Speech Recognition
End-to-end Speech Recognition Using Lattice-free MMI
Hossein Hadian, Hossein Sameti, Daniel Povey and Sanjeev Khudanpur
Session: End-to-End Speech Recognition
Haeb-Umbach, Reinhold
Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery
Thomas Glarner, Patrick Hanebrink, Janek Ebbers and Reinhold Haeb-Umbach
Session: Zero-resource Speech Recognition
Integrating Neural Network Based Beamforming and Weighted Prediction Error Dereverberation
Lukas Drude, Christoph Boeddeker, Jahn Heymann, Reinhold Haeb-Umbach, Keisuke Kinoshita, Marc Delcroix and Tomohiro Nakatani
Session: Distant ASR
Haider, Fasih
Improving Response Time of Active Speaker Detection Using Visual Prosody Information Prior to Articulation
Fasih Haider, Saturnino Luz, Carl Vogel and Nick Campbell
Session: Speaker Characterization and Analysis
An Active Feature Transformation Method for Attitude Recognition of Video Bloggers
Fasih Haider, Fahim A. Salim, Owen Conlan and Saturnino Luz
Session: Deception, Personality, and Culture Attribute
Hain, Thomas
On the Usefulness of the Speech Phase Spectrum for Pitch Extraction
Erfan Loweimi, Jon Barker and Thomas Hain
Session: Speech Analysis and Representation
Improved Acoustic Modelling for Automatic Literacy Assessment of Children
Mauro Nicolao, Michiel Sanders and Thomas Hain
Session: Applications in Education and Learning
Han, Jing
Evolving Learning for Analysing Mood-Related Infant Vocalisation
Zixing Zhang, Jing Han, Kun Qian and Björn Schuller
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech
Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval and Björn Schuller
Session: Representation Learning for Emotion
Han, Jiqing
Unsupervised Temporal Feature Learning Based on Sparse Coding Embedded BoAW for Acoustic Event Recognition
Liwen Zhang, Jiqing Han and Shiwen Deng
Session: Acoustic Scenes and Rare Events
A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification
Hongwei Song, Jiqing Han and Shiwen Deng
Session: Acoustic Scenes and Rare Events
Hansen, John H.L.
Speaker Recognition with Nonlinear Distortion: Clipping Analysis and Impact
Wei Xia and John H.L. Hansen
Session: Speech Analysis and Representation
Compensation for Domain Mismatch in Text-independent Speaker Recognition
Fahimeh Bahmaninezhad and John H.L. Hansen
Session: Speaker Verification II
Fusing Text-dependent Word-level i-Vector Models to Screen ‘at Risk’ Child Speech
Prasanna Kothalkar, Johanna Rudolph, Christine Dollaghan, Jennifer McGlothlin, Thomas Campbell and John H.L. Hansen
Session: Integrating Speech Science and Technology for Clinical Applications
Testing Paradigms for Assistive Hearing Devices in Diverse Acoustic Environments
Ram Charan Chandra Shekar, Hussnain Ali and John H.L. Hansen
Session: Integrating Speech Science and Technology for Clinical Applications
Assessing Speaker Engagement in 2-Person Debates: Overlap Detection in United States Presidential Debates
Midia Yousefi, Navid Shokouhi and John H.L. Hansen
Session: Signal Analysis for the Natural, Biological and Social Sciences
Leveraging Native Language Information for Improved Accented Speech Recognition
Shahram Ghorbani and John H.L. Hansen
Session: Adjusting to Speaker, Accent, and Domain
Fearless Steps: Apollo-11 Corpus Advancements for Speech Technologies from Earth to the Moon
John H.L. Hansen, Abhijeet Sangwan, Aditya Joglekar, Ahmet E. Bulut, Lakshmish Kaushik and Chengzhu Yu
Session: Spoken Corpora and Annotation
Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams
Harishchandra Dubey, Abhijeet Sangwan and John H.L. Hansen
Session: Speaker Verification Using Neural Network Methods II
Hantke, Simone
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech
Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins and Björn Schuller
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Annotator Trustability-based Cooperative Learning Solutions for Intelligent Audio Analysis
Simone Hantke, Christoph Stemp and Björn Schuller
Session: Multimodal Systems
Hasan, Taufiq
An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification
Ahmed Imtiaz Humayun, Md. Tauhiduzzaman Khan, Shabnam Ghaffarzadegan, Zhe Feng and Taufiq Hasan
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Hasegawa-Johnson, Mark
Visualizing Phoneme Category Adaptation in Deep Neural Networks
Odette Scharenborg, Sebastian Tiesmeyer, Mark Hasegawa-Johnson and Najim Dehak
Session: Deep Neural Networks: How Can We Interpret What They Learned?
Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning
Wenda Chen, Mark Hasegawa-Johnson and Nancy F. Chen
Session: Extracting Information from Audio
Improving DNNs Trained with Non-Native Transcriptions Using Knowledge Distillation and Target Interpolation
Amit Das and Mark Hasegawa-Johnson
Session: Adjusting to Speaker, Accent, and Domain
Improved ASR for Under-resourced Languages through Multi-task Learning with Acoustic Landmarks
Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson and Deming Chen
Session: Topics in Speech Recognition
Infant Emotional Outbursts Detection in Infant-parent Spoken Interactions
Yijia Xu, Mark Hasegawa-Johnson and Nancy McElwain
Session: Speaker State and Trait
Speaker Adaptive Audio-Visual Fusion for the Open-Vocabulary Section of AVICAR
Leda Sari, Mark Hasegawa-Johnson, Kumaran S, Georg Stemmer and Krishnakumar N Nair
Session: Multimodal Systems
Hayashi, Tomoki
Multi-Head Decoder for End-to-End Speech Recognition
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda and Kazuya Takeda
Session: Sequence Models for ASR
Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder
Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing and Tomoki Toda
Session: Voice Conversion and Speech Synthesis
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
He, Lei
A New Glottal Neural Vocoder for Speech Synthesis
Yang Cui, Xi Wang, Lei He and Frank K. Soong
Session: Voice Conversion and Speech Synthesis
Influences of Fundamental Oscillation on Speaker Identification in Vocalic Utterances by Humans and Computers
Volker Dellwo, Thayabaran Kathiresan, Elisa Pellegrino, Lei He, Sandra Schwab and Dieter Maurer
Session: Speech and Speaker Perception
He, Liang
Speaker Embedding Extraction with Phonetic Information
Yi Liu, Liang He, Jia Liu and Michael T. Johnson
Session: Speaker Verification Using Neural Network Methods I
MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks
Wenhao Ding and Liang He
Session: Speaker Verification Using Neural Network Methods II
Heaton, Kristin
Vocal Biomarkers for Cognitive Performance Estimation in a Working Memory Task
Jennifer Sloboda, Adam Lammert, James Williamson, Christopher Smalt, Daryush D. Mehta, COL Ian Curry, Kristin Heaton, Jeffrey Palmer and Thomas Quatieri
Session: Speaker Characterization and Analysis
The Effect of Exposure to High Altitude and Heat on Speech Articulatory Coordination
James Williamson, Thomas Quatieri, Adam Lammert, Katherine Mitchell, Katherine Finkelstein, Nicole Ekon, Caitlin Dillon, Robert Kenefick and Kristin Heaton
Session: Speaker State and Trait
Heitzman, Daragh
Automatic Early Detection of Amyotrophic Lateral Sclerosis from Intelligible Speech Using Convolutional Neural Networks
Kwanghoon An, Myungjong Kim, Kristin Teplansky, Jordan Green, Thomas Campbell, Yana Yunusova, Daragh Heitzman and Jun Wang
Session: Integrating Speech Science and Technology for Clinical Applications
Heldner, Mattias
Creak in the Respiratory Cycle
Kätlin Aare, Pärtel Lippus, Marcin Wlodarczak and Mattias Heldner
Session: Phonation
Hermes, Anne
Age-related Effects on Sensorimotor Control of Speech Production
Anne Hermes, Jane Mertens and Doris Mücke
Session: Speech and Singing Production
Structural Effects on Properties of Consonantal Gestures in Tashlhiyt
Anne Hermes, Doris Mücke, Bastian Auris and Rachid Ridouane
Session: Speech Segments and Voice Quality
Hetherly, Jeffrey
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Deep Speech Denoising with Vector Space Projections
Jeffrey Hetherly, Paul Gamble, Maria Alejandra Barrios, Cory Stephenson and Karl Ni
Session: Source Separation from Monaural Input
Heymann, Jahn
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
Integrating Neural Network Based Beamforming and Weighted Prediction Error Dereverberation
Lukas Drude, Christoph Boeddeker, Jahn Heymann, Reinhold Haeb-Umbach, Keisuke Kinoshita, Marc Delcroix and Tomohiro Nakatani
Session: Distant ASR
Himawan, Ivan
Deep Learning Techniques for Koala Activity Detection
Ivan Himawan, Michael Towsey, Bradley Law and Paul Roe
Session: Signal Analysis for the Natural, Biological and Social Sciences
Employing Phonetic Information in DNN Speaker Embeddings to Improve Speaker Recognition Performance
Md Hafizur Rahman, Ivan Himawan, Mitchell McLaren, Clinton Fookes and Sridha Sridharan
Session: Speaker Verification Using Neural Network Methods II
Hirschberg, Julia
The Role of Cognate Words, POS Tags and Entrainment in Code-Switching
Victor Soto, Nishmar Cestero and Julia Hirschberg
Session: Speech Technologies for Code-Switching in Multilingual Communities
A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis
Kai-Zhan Lee, Erica Cooper and Julia Hirschberg
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Predicting Arousal and Valence from Waveforms and Spectrograms Using Deep Neural Networks
Zixiaofan Yang and Julia Hirschberg
Session: Representation Learning for Emotion
Acoustic-Prosodic Indicators of Deception and Trust in Interview Dialogues
Sarah Ita Levitan, Angel Maredia and Julia Hirschberg
Session: Deception, Personality, and Culture Attribute
Deep Personality Recognition for Deception Detection
Guozhen An, Sarah Ita Levitan, Julia Hirschberg and Rivka Levitan
Session: Deception, Personality, and Culture Attribute
Hitschfeld-Kahler, Nancy
A French-Spanish Multimodal Speech Communication Corpus Incorporating Acoustic Data, Facial, Hands and Arms Gestures Information
Lucas D. Terissi, Gonzalo Sad, Mauricio Cerda, Slim Ouni, Rodrigo Galvez, Juan C. Gómez, Bernard Girau and Nancy Hitschfeld-Kahler
Session: Spoken Corpora and Annotation
Hoffmeister, Björn
Device-directed Utterance Detection
Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas and Björn Hoffmeister
Session: Syllabification, Rhythm, and Voice Activity Detection
Industry presentation by Amazon
Björn Hoffmeister and Sri Garimella
Session: Industry Presentation-1
Hoory, Ron
The IBM Virtual Voice Creator
Alexander Sorin, Slava Shechtman, Zvi Kons, Ron Hoory, Shay Ben-David, Joe Pavitt, Shai Rozenberg, Carmel Rabinovitz and Tal Drory
Session: Show and Tell 2
Word Emphasis Prediction for Expressive Text to Speech
Yosi Mass, Slava Shechtman, Moran Mordechay, Ron Hoory, Oren Sar Shalom, Guy Lev and David Konopnicki
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Hori, Takaaki
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
Hrúz, Marek
ZCU-NTIS Speaker Diarization System for the DIHARD 2018 Challenge
Zbyněk Zajíc, Marie Kunešová, Jan Zelinka and Marek Hrúz
Session: The First DIHARD Speech Diarization Challenge
Multimodal Name Recognition in Live TV Subtitling
Marek Hrúz, Aleš Pražák and Michal Bušta
Session: Multimodal Systems
Hsu, Wei-Ning
Scalable Factorized Hierarchical Variational Autoencoder Training
Wei-Ning Hsu and James Glass
Session: Deep Neural Networks: How Can We Interpret What They Learned?
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition
Wei-Ning Hsu, Hao Tang and James Glass
Session: Robust Speech Recognition
A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition
Hao Tang, Wei-Ning Hsu, François Grondin and James Glass
Session: Neural Network Training Strategies for ASR
Hu, Shoukang
Gaussian Process Neural Networks for Speech Recognition
Max W. Y. Lam, Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Rongfeng Su, Xunying Liu and Helen Meng
Session: Novel Neural Network Architectures for Acoustic Modelling
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus
Jianwei Yu, Xurong Xie, Shansong Liu, Shoukang Hu, Max W. Y. Lam, Xixin Wu, Ka Ho Wong, Xunying Liu and Helen Meng
Session: Application of ASR in Medical Practice
Huang, Dongyan
Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition
Danqing Luo, Yuexian Zou and Dongyan Huang
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Huber, Rainer
Prediction of Perceived Speech Quality Using Deep Machine Listening
Jasper Ooster, Rainer Huber and Bernd T. Meyer
Session: Models of Speech Perception
Prediction of Subjective Listening Effort from Acoustic Data with Non-Intrusive Deep Models
Paul Kranzusch, Rainer Huber, Melanie Krüger, Birger Kollmeier and Bernd T. Meyer
Session: Models of Speech Perception
Huckvale, Mark
Neural Network Architecture That Combines Temporal and Summative Features for Infant Cry Classification in the Interspeech 2018 Computational Paralinguistics Challenge
Mark Huckvale
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Humayun, Ahmed Imtiaz
An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification
Ahmed Imtiaz Humayun, Md. Tauhiduzzaman Khan, Shabnam Ghaffarzadegan, Zhe Feng and Taufiq Hasan
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Hunger-Schoppe, Christina
How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives
Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe and Harald Baumeister
Session: Speech Pathology, Depression, and Medical Applications
Hwang, Hsin-Te
Exemplar-Based Spectral Detail Compensation for Voice Conversion
Yu-Huai Peng, Hsin-Te Hwang, Yichiao Wu, Yu Tsao and Hsin-Min Wang
Session: Voice Conversion
Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model Based on BLSTM
Szu-wei Fu, Yu Tsao, Hsin-Te Hwang and Hsin-Min Wang
Session: Speech Intelligibility and Quality
Ikeda, Manabu
Detection of Dementia from Responses to Atypical Questions Asked by Embodied Conversational Agents
Tsuyoki Ujiro, Hiroki Tanaka, Hiroyoshi Adachi, Hiroaki Kazui, Manabu Ikeda, Takashi Kudo and Satoshi Nakamura
Session: Integrating Speech Science and Technology for Clinical Applications
Imani, Siddika
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information
Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha, S R Mahadeva Prasanna, Priyankoo Sarmah, K Samudravijaya and Nirmala S.R.
Session: Show and Tell 7
Inoue, Koji
Engagement Recognition in Spoken Dialogue via Neural Network by Aggregating Different Annotators' Models
Koji Inoue, Divesh Lala, Katsuya Takanashi and Tatsuya Kawahara
Session: Spoken Dialogue Systems and Conversational Analysis
Prediction of Turn-taking Using Multitask Learning with Prediction of Backchannels and Fillers
Kohei Hara, Koji Inoue, Katsuya Takanashi and Tatsuya Kawahara
Session: Multimodal Dialogue Systems
Inoue, Nakamasa
Detecting Alzheimer’s Disease Using Gated Convolutional Neural Network from Audio Data
Tifani Warnita, Nakamasa Inoue and Koichi Shinoda
Session: Integrating Speech Science and Technology for Clinical Applications
I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification
Jiacen Zhang, Nakamasa Inoue and Koichi Shinoda
Session: Speaker Verification Using Neural Network Methods II
Irie, Kazuki
Improved Training of End-to-end Attention Models for Speech Recognition
Albert Zeyer, Kazuki Irie, Ralf Schlüter and Hermann Ney
Session: End-to-End Speech Recognition
Investigation on Estimation of Sentence Probability by Combining Forward, Backward and Bi-directional LSTM-RNNs
Kazuki Irie, Zhihong Lei, Liuhui Deng, Ralf Schlüter and Hermann Ney
Session: ASR Systems and Technologies
Irino, Toshio
Multi-resolution Gammachirp Envelope Distortion Index for Intelligibility Prediction of Noisy Speech
Katsuhiko Yamamoto, Toshio Irino, Narumi Ohashi, Shoko Araki, Keisuke Kinoshita and Tomohiro Nakatani
Session: Speech Intelligibility and Quality
Frequency Domain Variants of Velvet Noise and Their Application to Speech Processing and Synthesis
Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda and Toshio Irino
Session: Voice Conversion and Speech Synthesis
Iyengar, Anuroop
Development of Large Vocabulary Speech Recognition System with Keyword Search for Manipuri
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Speech Recognition for Indian Languages
An Automatic Speech Transcription System for Manipuri Language
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Show and Tell 6
TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages
Noor Fathima, Tanvina Patel, Mahima C and Anuroop Iyengar
Session: Low Resource Speech Recognition Challenge for Indian Languages
Jaitly, Navdeep
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Jančovič, Peter
Exploring How Phone Classification Neural Networks Learn Phonetic Information by Visualising and Interpreting Bottleneck Features
Linxue Bai, Philip Weber, Peter Jančovič and Martin Russell
Session: Deep Neural Networks: How Can We Interpret What They Learned?
The University of Birmingham 2018 Spoken CALL Shared Task Systems
Mengjie Qian, Xizi Wei, Peter Jančovič and Martin Russell
Session: Spoken CALL Shared Task, Second Edition
Phone Recognition Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs
Mengjie Qian, Linxue Bai, Peter Jančovič and Martin Russell
Session: Acoustic Modelling
Jaunzeikare, Diana
Semi-supervised Learning for Information Extraction from Dialogue
Anjuli Kannan, Kai Chen, Diana Jaunzeikare and Alvin Rajkomar
Session: Extracting Information from Audio
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Jia, Yuan
WaveNet Vocoder with Limited Training Data for Voice Conversion
Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou and Li-Rong Dai
Session: Voice Conversion and Speech Synthesis
Stress Distribution of Given Information in Chinese Reading Texts
Yuan Jia and Xiaoxiao Ma
Session: Speech Prosody
Jin, Hongxia
Training Recurrent Neural Network through Moment Matching for NLP Applications
Yue Deng, Yilin Shen, KaWai Chen and Hongxia Jin
Session: Language Modeling
A Deep Reinforcement Learning Based Multimodal Coaching Model (DCM) for Slot Filling in Spoken Language Understanding (SLU)
Yu Wang, Abhishek Patel, Yilin Shen and Hongxia Jin
Session: Spoken Language Understanding
Robust Spoken Language Understanding via Paraphrasing
Avik Ray, Yilin Shen and Hongxia Jin
Session: Spoken Language Understanding
User Information Augmented Semantic Frame Parsing Using Progressive Neural Networks
Yilin Shen, Xiangyu Zeng, Yu Wang and Hongxia Jin
Session: Spoken Language Understanding
Juang, Biing-Hwang (Fred)
Cycle-Consistent Speech Enhancement
Zhong Meng, Jinyu Li, Yifan Gong and Biing-Hwang (Fred) Juang
Session: Novel Approaches to Enhancement
Adversarial Feature-Mapping for Speech Enhancement
Zhong Meng, Jinyu Li, Yifan Gong and Biing-Hwang (Fred) Juang
Session: Deep Enhancement
Julka, Sahib
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks
Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Sunčica Petrović, Eloise Ainger, Nicholas Cummins and Björn Schuller
Session: Speech and Language Analytics for Mental Health
Juvela, Lauri
Time-regularized Linear Prediction for Noise-robust Extraction of the Spectral Envelope of Speech
Manu Airaksinen, Lauri Juvela, Okko Räsänen and Paavo Alku
Session: Speech Analysis and Representation
Speaker-independent Raw Waveform Model for Glottal Excitation
Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi and Paavo Alku
Session: Voice Conversion and Speech Synthesis
Jyothi, Preethi
Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning
Abhinav Jain, Minali Upreti and Preethi Jyothi
Session: Adjusting to Speaker, Accent, and Domain
Dual Language Models for Code Switched Speech Recognition
Saurabh Garg, Tanmay Parekh and Preethi Jyothi
Session: Topics in Speech Recognition
Time Aggregation Operators for Multi-label Audio Event Detection
Pankaj Joshi, Digvijaysingh Gautam, Ganesh Ramakrishnan and Preethi Jyothi
Session: Acoustic Scenes and Rare Events
K T, Raseena
Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training
Chandana S, Chiranjeevi Yarra, Ritu Aggarwal, Sanjeev Kumar Mittal, Kausthubha N K, Raseena K T, Astha Singh and Prasanta Kumar Ghosh
Session: Articulatory Information, Modeling and Inversion
Kadiri, Sudarsana Reddy
Discriminating Nasals and Approximants in English Language Using Zero Time Windowing
RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty and Bayya Yegnanarayana
Session: Speech Segments and Voice Quality
Detection of Glottal Closure Instants in Degraded Speech Using Single Frequency Filtering Analysis
Gunnam Aneeja, Sudarsana Reddy Kadiri and Bayya Yegnanarayana
Session: Measuring Pitch and Articulation
Estimation of Fundamental Frequency from Singing Voice Using Harmonics of Impulse-like Excitation Source
Sudarsana Reddy Kadiri and Bayya Yegnanarayana
Session: Measuring Pitch and Articulation
Breathy to Tense Voice Discrimination Using Zero-Time Windowing Cepstral Coefficients (ZTWCCs)
Sudarsana Reddy Kadiri and Bayya Yegnanarayana
Session: Speech Segments and Voice Quality
Analysis and Detection of Phonation Modes in Singing Voice Using Excitation Source Features and Single Frequency Filtering Cepstral Coefficients (SFFCC)
Sudarsana Reddy Kadiri and Bayya Yegnanarayana
Session: Deception, Personality, and Culture Attribute
Kalita, Sishir
Exploration of Compressed ILPR Features for Replay Attack Detection
Sarfaraz Jelil, Sishir Kalita, S R Mahadeva Prasanna and Rohit Sinha
Session: Spoofing Detection
Processing Transition Regions of Glottal Stop Substituted /S/ for Intelligibility Enhancement of Cleft Palate Speech
Protima Nomo Sudro, Sishir Kalita and S R Mahadeva Prasanna
Session: Speech and Singing Production
Estimation of Hypernasality Scores from Cleft Lip and Palate Speech
Vikram C M, Ayush Tripathi, Sishir Kalita and S R Mahadeva Prasanna
Session: Integrating Speech Science and Technology for Clinical Applications
Analysis of Breathiness in Contextual Vowel of Voiceless Nasals in Mizo
Pamir Gogoi, Sishir Kalita, Parismita Gogoi, Ratree Wayland, Priyankoo Sarmah and S R Mahadeva Prasanna
Session: Speech Segments and Voice Quality
Self-similarity Matrix Based Intelligibility Assessment of Cleft Lip and Palate Speech
Sishir Kalita, S R Mahadeva Prasanna and Samarendra Dandapat
Session: Acoustic Analysis-Synthesis of Speech Disorders
Kamble, Madhu
Effectiveness of Speech Demodulation-Based Features for Replay Detection
Madhu Kamble, Hemlata Tak and Hemant Patil
Session: Spoofing Detection
Novel Variable Length Energy Separation Algorithm Using Instantaneous Amplitude Features for Replay Detection
Madhu Kamble and Hemant Patil
Session: Spoofing Detection
Auditory Filterbank Learning for Temporal Modulation Features in Replay Spoof Speech Detection
Hardik Sailor, Madhu Kamble and Hemant Patil
Session: Spoofing Detection
DA-IICT/IIITV System for Low Resource Speech Recognition Challenge 2018
Hardik B. Sailor, Maddala Venkata Siva Krishna, Diksha Chhabra, Ankur T. Patil, Madhu Kamble and Hemant Patil
Session: Low Resource Speech Recognition Challenge for Indian Languages
Kamper, Herman
Low-Resource Speech-to-Text Translation
Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez and Sharon Goldwater
Session: Selected Topics in Neural Speech Processing
Fast ASR-free and Almost Zero-resource Keyword Spotting Using DTW and CNNs for Humanitarian Monitoring
Raghav Menon, Herman Kamper, John Quinn and Thomas Niesler
Session: Topics in Speech Recognition
Kane, John
Attention-based Sequence Classification for Affect Detection
Cristina Gorrostieta, Richard Brutti, Kye Taylor, Avi Shapiro, Joseph Moran, Ali Azarbayejani and John Kane
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Kang, Yongguo
Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions
Yu Gu and Yongguo Kang
Session: Voice Conversion and Speech Synthesis
EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System
Hao Li, Yongguo Kang and Zhenyu Wang
Session: Expressive Speech Synthesis
Kannan, Anjuli
Semi-supervised Learning for Information Extraction from Dialogue
Anjuli Kannan, Kai Chen, Diana Jaunzeikare and Alvin Rajkomar
Session: Extracting Information from Audio
Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search
Ian Williams, Anjuli Kannan, Petar Aleksic, David Rybach and Tara Sainath
Session: Recurrent Neural Models for ASR
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Kao, Chieh-Chi
A Simple Model for Detection of Rare Sound Events
Weiran Wang, Chieh-Chi Kao and Chao Wang
Session: Audio Events and Acoustic Scenes
R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection
Chieh-Chi Kao, Weiran Wang, Ming Sun and Chao Wang
Session: Audio Events and Acoustic Scenes
Karadayi, Julien
The ACLEW DiViMe: An Easy-to-use Diarization Tool
Adrien Le Franc, Eric Riebling, Julien Karadayi, Yun Wang, Camila Scaff, Florian Metze and Alejandrina Cristia
Session: Speaker Diarization
Sampling Strategies in Siamese Networks for Unsupervised Speech Representation Learning
Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz and Emmanuel Dupoux
Session: Topics in Speech Recognition
Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments
Nils Holzenberger, Mingxing Du, Julien Karadayi, Rachid Riad and Emmanuel Dupoux
Session: Zero-resource Speech Recognition
Karafiát, Martin
BUT OpenSAT 2017 Speech Recognition System
Martin Karafiát, Murali Karthick Baskar, Igor Szöke, Vladimír Malenovský, Karel Veselý, František Grézl, Lukáš Burget and Jan Černocký
Session: Topics in Speech Recognition
BUT System for Low Resource Indian Language ASR
Bhargav Pulugundla, Murali Karthick Baskar, Santosh Kesiraju, Ekaterina Egorova, Martin Karafiát, Lukáš Burget and Jan Černocký
Session: Low Resource Speech Recognition Challenge for Indian Languages
Karita, Shigeki
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
Auxiliary Feature Based Adaptation of End-to-end ASR Systems
Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita and Tomohiro Nakatani
Session: Adjusting to Speaker, Accent, and Domain
Semi-Supervised End-to-End Speech Recognition
Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Atsunori Ogawa and Marc Delcroix
Session: End-to-End Speech Recognition
Karki, Bhavya
Investigating Utterance Level Representations for Detecting Intent from Acoustics
SaiKrishna Rallabandi, Bhavya Karki, Carla Viegas, Eric Nyberg and Alan W Black
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Karpov, Alexey
LSTM Based Cross-corpus and Cross-task Acoustic Emotion Recognition
Heysem Kaya, Dmitrii Fedotov, Ali Yeşilkanat, Oxana Verkholyak, Yang Zhang and Alexey Karpov
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Kathiresan, Thayabaran
The Zurich Corpus of Vowel and Voice Quality, Version 1.0
Dieter Maurer, Christian d’Heureuse, Heidy Suter, Volker Dellwo, Daniel Friedrichs and Thayabaran Kathiresan
Session: Phonation
Influences of Fundamental Oscillation on Speaker Identification in Vocalic Utterances by Humans and Computers
Volker Dellwo, Thayabaran Kathiresan, Elisa Pellegrino, Lei He, Sandra Schwab and Dieter Maurer
Session: Speech and Speaker Perception
Kaver, Liat
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
Session: Syllabification, Rhythm, and Voice Activity Detection
Kawahara, Tatsuya
Engagement Recognition in Spoken Dialogue via Neural Network by Aggregating Different Annotators' Models
Koji Inoue, Divesh Lala, Katsuya Takanashi and Tatsuya Kawahara
Session: Spoken Dialogue Systems and Conversational Analysis
Prediction of Turn-taking Using Multitask Learning with Prediction of Backchannels and Fillers
Kohei Hara, Koji Inoue, Katsuya Takanashi and Tatsuya Kawahara
Session: Multimodal Dialogue Systems
Forward-Backward Attention Decoder
Masato Mimura, Shinsuke Sakai and Tatsuya Kawahara
Session: Recurrent Neural Models for ASR
Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition
Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono and Tatsuya Kawahara
Session: Adjusting to Speaker, Accent, and Domain
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
Session: Acoustic Modelling
Kawai, Hisashi
Temporal Attentive Pooling for Acoustic Event Detection
Xugang Lu, Peng Shen, Sheng Li, Yu Tsao and Hisashi Kawai
Session: Audio Events and Acoustic Scenes
Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification
Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
Session: Language Identification
Multilingual Grapheme-to-Phoneme Conversion with Global Character Vectors
Jinfu Ni, Yoshinori Shiga and Hisashi Kawai
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
Session: Acoustic Modelling
Kaya, Heysem
LSTM Based Cross-corpus and Cross-task Acoustic Emotion Recognition
Heysem Kaya, Dmitrii Fedotov, Ali Yeşilkanat, Oxana Verkholyak, Yang Zhang and Alexey Karpov
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Kazui, Hiroaki
Detection of Dementia from Responses to Atypical Questions Asked by Embodied Conversational Agents
Tsuyoki Ujiro, Hiroki Tanaka, Hiroyoshi Adachi, Hiroaki Kazui, Manabu Ikeda, Takashi Kudo and Satoshi Nakamura
Session: Integrating Speech Science and Technology for Clinical Applications
Kesiraju, Santosh
BUT System for Low Resource Indian Language ASR
Bhargav Pulugundla, Murali Karthick Baskar, Santosh Kesiraju, Ekaterina Egorova, Martin Karafiát, Lukáš Burget and Jan Černocký
Session: Low Resource Speech Recognition Challenge for Indian Languages
i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models
Karel Beneš, Santosh Kesiraju and Lukáš Burget
Session: Language Modeling
Khan, Md. Tauhiduzzaman
An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification
Ahmed Imtiaz Humayun, Md. Tauhiduzzaman Khan, Shabnam Ghaffarzadegan, Zhe Feng and Taufiq Hasan
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Khokhlov, Yuri
Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition
Natalia Tomashenko, Yuri Khokhlov and Yannick Estève
Session: Adjusting to Speaker, Accent, and Domain
An Investigation of Mixup Training Strategies for Acoustic Models in ASR
Ivan Medennikov, Yuri Khokhlov, Aleksei Romanenko, Dmitry Popov, Natalia Tomashenko, Ivan Sorokin and Alexander Zatvornitskiy
Session: Neural Network Training Strategies for ASR
Khudanpur, Sanjeev
Acoustic Modeling from Frequency Domain Representations of Speech
Pegah Ghahremani, Hossein Hadian, Hang Lv, Daniel Povey and Sanjeev Khudanpur
Session: Robust Speech Recognition
Output-Gate Projected Gated Recurrent Unit for Speech Recognition
Gaofeng Cheng, Daniel Povey, Lu Huang, Ji Xu, Sanjeev Khudanpur and Yonghong Yan
Session: Novel Neural Network Architectures for Acoustic Modelling
Automatic Speech Recognition and Topic Identification from Speech for Almost-Zero-Resource Languages
Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak and Sanjeev Khudanpur
Session: Extracting Information from Audio
A GPU-based WFST Decoder with Exact Lattice Generation
Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey and Sanjeev Khudanpur
Session: Recurrent Neural Models for ASR
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
Session: The First DIHARD Speech Diarization Challenge
End-to-end Deep Neural Network Age Estimation
Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesús Villalba, Daniel Povey, Sanjeev Khudanpur and Najim Dehak
Session: Speaker State and Trait
Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
Ke Li, Hainan Xu, Yiming Wang, Daniel Povey and Sanjeev Khudanpur
Session: Language Modeling
End-to-end Speech Recognition Using Lattice-free MMI
Hossein Hadian, Hossein Sameti, Daniel Povey and Sanjeev Khudanpur
Session: End-to-End Speech Recognition
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks
Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi and Sanjeev Khudanpur
Session: Acoustic Modelling
Kim, Jeesun
Investigating the Role of Familiar Face and Voice Cues in Speech Processing in Noise
Jeesun Kim, Sonya Karisma, Vincent Aubanel and Chris Davis
Session: Speech Perception in Adverse Conditions
Characterizing Rhythm Differences between Strong and Weak Accented L2 Speech
Chris Davis and Jeesun Kim
Session: Second Language Acquisition and Code-switching
Kim, Myungjong
Automatic Speech Recognition with Articulatory Information and a Unified Dictionary for Hindi, Marathi, Bengali and Oriya
Debadatta Dash, Myungjong Kim, Kristin Teplansky and Jun Wang
Session: Speech Recognition for Indian Languages
Automatic Early Detection of Amyotrophic Lateral Sclerosis from Intelligible Speech Using Convolutional Neural Networks
Kwanghoon An, Myungjong Kim, Kristin Teplansky, Jordan Green, Thomas Campbell, Yana Yunusova, Daragh Heitzman and Jun Wang
Session: Integrating Speech Science and Technology for Clinical Applications
Dysarthric Speech Recognition Using Convolutional LSTM Neural Network
Myungjong Kim, Beiming Cao, Kwanghoon An and Jun Wang
Session: Application of ASR in Medical Practice
Articulation-to-Speech Synthesis Using Articulatory Flesh Point Sensors’ Orientation Information
Beiming Cao, Myungjong Kim, Jun R. Wang, Jan van Santen, Ted Mau and Jun Wang
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Kim, Yulia
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
King, Simon
Learning Interpretable Control Dimensions for Speech Synthesis by Using External Data
Zack Hodari, Oliver Watts, Srikanth Ronanki and Simon King
Session: Prosody Modeling and Generation
Exemplar-based Speech Waveform Generation
Oliver Watts, Cassia Valentini-Botinhao, Felipe Espic and Simon King
Session: Voice Conversion and Speech Synthesis
Impact of Different Speech Types on Listening Effort
Olympia Simantiraki, Martin Cooke and Simon King
Session: Speech Perception in Adverse Conditions
Using Pupillometry to Measure the Cognitive Load of Synthetic Speech
Avashna Govender and Simon King
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Measuring the Cognitive Load of Synthetic Speech Using a Dual Task Paradigm
Avashna Govender and Simon King
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Kinnunen, Tomi
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion
Massimiliano Todisco, Héctor Delgado, Kong Aik Lee, Md Sahidullah, Nicholas Evans, Tomi Kinnunen and Junichi Yamagishi
Session: Speaker Verification I
Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks
Akihiro Kato and Tomi Kinnunen
Session: Deep Learning for Source Separation and Pitch Tracking
Kinoshita, Keisuke
Multi-resolution Gammachirp Envelope Distortion Index for Intelligibility Prediction of Noisy Speech
Katsuhiko Yamamoto, Toshio Irino, Narumi Ohashi, Shoko Araki, Keisuke Kinoshita and Tomohiro Nakatani
Session: Speech Intelligibility and Quality
Integrating Neural Network Based Beamforming and Weighted Prediction Error Dereverberation
Lukas Drude, Christoph Boeddeker, Jahn Heymann, Reinhold Haeb-Umbach, Keisuke Kinoshita, Marc Delcroix and Tomohiro Nakatani
Session: Distant ASR
Knill, Kate
A Deep Learning Approach to Assessing Non-native Pronunciation of English Using Phone Distances
Konstantinos Kyriakopoulos, Kate Knill and Mark Gales
Session: Applications in Education and Learning
Impact of ASR Performance on Free Speaking Language Assessment
Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang and Andrew Caines
Session: Applications in Education and Learning
Ko, Ming-Ya
Self-Assessed Affect Recognition Using Fusion of Attentional BLSTM and Static Acoustic Features
Bo-Hao Su, Sung-Lin Yeh, Ming-Ya Ko, Huan-Yu Chen, Shun-Chang Zhong, Jeng-Lin Li and Chi-Chun Lee
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Ko, Tom
Long Distance Voice Channel Diagnosis Using Deep Neural Networks
Zhen Qin, Tom Ko and Guangjian Tian
Session: Application of ASR in Medical Practice
Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
Yingke Zhu, Tom Ko, David Snyder, Brian Mak and Daniel Povey
Session: Speaker Verification Using Neural Network Methods II
Kocharov, Daniil
Language-Dependent Melody Embeddings
Daniil Kocharov and Alla Menshikova
Session: Speech Prosody
Kochetov, Alexei
An Ultrasound Study of Gemination in Coronal Stops in Eastern Oromo
Maida Percival, Alexei Kochetov and Yoonjung Kang
Session: Speech and Singing Production
Gestural Lenition of Rhotics Captures Variation in Brazilian Portuguese
Phil Howson and Alexei Kochetov
Session: Speech Segments and Voice Quality
The Retroflex-dental Contrast in Punjabi Stops and Nasals: A Principal Component Analysis of Ultrasound Images
Alexei Kochetov, Matthew Faytak and Kiranpreet Nara
Session: Speech Segments and Voice Quality
Kopparapu, Sunil Kumar
Dysarthric Speech Recognition Using Time-delay Neural Network Based Denoising Autoencoder
Chitralekha Bhat, Biswajit Das, Bhavik Vachhani and Sunil Kumar Kopparapu
Session: Automatic Detection and Recognition of Voice and Speech Disorders
Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition
Bhavik Vachhani, Chitralekha Bhat and Sunil Kumar Kopparapu
Session: Automatic Detection and Recognition of Voice and Speech Disorders
Analysis of the Effect of Speech-Laugh on Speaker Recognition System
Sri Harsha Dumpala, Ashish Panda and Sunil Kumar Kopparapu
Session: Speaker Characterization and Analysis
Krishna, Hari
An Exploration towards Joint Acoustic Modeling for Indian Languages: IIIT-H Submission for Low Resource Speech Recognition Challenge for Indian Languages, INTERSPEECH 2018
Hari Krishna, Krishna Gurugubelli, Vishnu Vidyadhara Raju V and Anil Kumar Vuppala
Session: Low Resource Speech Recognition Challenge for Indian Languages
Kudo, Takashi
Detection of Dementia from Responses to Atypical Questions Asked by Embodied Conversational Agents
Tsuyoki Ujiro, Hiroki Tanaka, Hiroyoshi Adachi, Hiroaki Kazui, Manabu Ikeda, Takashi Kudo and Satoshi Nakamura
Session: Integrating Speech Science and Technology for Clinical Applications
Kumar, Avinash
Analysis of Variational Mode Functions for Robust Detection of Vowels
Surbhi Sakshi, Avinash Kumar and Gayadhar Pradhan
Session: Speech Analysis and Representation
Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition
Ishwar Chandra Yadav, Avinash Kumar, Syed Shahnawazuddin and Gayadhar Pradhan
Session: Robust Speech Recognition
Kumar, Deepak
Development of Large Vocabulary Speech Recognition System with Keyword Search for Manipuri
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Speech Recognition for Indian Languages
An Automatic Speech Transcription System for Manipuri Language
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Show and Tell 6
Kumar, Kanhaiya
Automatic Detection of Expressiveness in Oral Reading
Kamini Sabu, Kanhaiya Kumar and Preeti Rao
Session: Show and Tell 4
A Study of Lexical and Prosodic Cues to Segmentation in a Hindi-English Code-switched Discourse
Preeti Rao, Mugdha Pandya, Kamini Sabu, Kanhaiya Kumar and Nandini Bondale
Session: Speech Technologies for Code-Switching in Multilingual Communities
Kumar, Manoj
A Knowledge Driven Structural Segmentation Approach for Play-Talk Classification During Autism Assessment
Manoj Kumar, Pooja Chebolu, So Hyun Kim, Kassandra Martinez, Catherine Lord and Shrikanth Narayanan
Session: Spoken Corpora and Annotation
Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech
Jilt Sebastian, Manoj Kumar, Pavan Kumar D. S., Mathew Magimai.-Doss, Hema Murthy and Shrikanth Narayanan
Session: Speaker State and Trait
Kumar, Rajath
On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification
Rajath Kumar, Vaishnavi Yeruva and Sriram Ganapathy
Session: Speaker Verification II
Music Source Activity Detection and Separation Using Deep Attractor Network
Rajath Kumar, Yi Luo and Nima Mesgarani
Session: Deep Learning for Source Separation and Pitch Tracking
Kurata, Gakuto
Data Augmentation Improves Recognition of Foreign Accented Speech
Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin and Gakuto Kurata
Session: Adjusting to Speaker, Accent, and Domain
Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models
Masayuki Suzuki, Tohru Nagano, Gakuto Kurata and Samuel Thomas
Session: Neural Network Training Strategies for ASR
Kyriakopoulos, Konstantinos
A Deep Learning Approach to Assessing Non-native Pronunciation of English Using Phone Distances
Konstantinos Kyriakopoulos, Kate Knill and Mark Gales
Session: Applications in Education and Learning
Impact of ASR Performance on Free Speaking Language Assessment
Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang and Andrew Caines
Session: Applications in Education and Learning
Laaridh, Imed
Automatic Evaluation of Speech Intelligibility Based on I-vectors in the Context of Head and Neck Cancers
Imed Laaridh, Corinne Fredouille, Alain Ghio, Muriel Lalain and Virginie Woisard
Session: Application of ASR in Medical Practice
Perceptual and Automatic Evaluations of the Intelligibility of Speech Degraded by Noise Induced Hearing Loss Simulation
Imed Laaridh, Julien Tardieu, Cynthia Magnen, Pascal Gaillard, Jérôme Farinas and Julien Pinquier
Session: Application of ASR in Medical Practice
Lam, Max W. Y.
Gaussian Process Neural Networks for Speech Recognition
Max W. Y. Lam, Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Rongfeng Su, Xunying Liu and Helen Meng
Session: Novel Neural Network Architectures for Acoustic Modelling
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus
Jianwei Yu, Xurong Xie, Shansong Liu, Shoukang Hu, Max W. Y. Lam, Xixin Wu, Ka Ho Wong, Xunying Liu and Helen Meng
Session: Application of ASR in Medical Practice
Lamel, Lori
Exploring Temporal Reduction in Dialectal Spanish: A Large-scale Study of Lenition of Voiced Stops and Coda-s
Ioana Vasilescu, Nidia Hernandez, Bianca Vieru and Lori Lamel
Session: Dialectal Variation
Studying Vowel Variation in French-Algerian Arabic Code-switched Speech
Jane Wottawa, Amazouz Djegdjiga, Martine Adda-Decker and Lori Lamel
Session: Dialectal Variation
Lammert, Adam
Vocal Biomarkers for Cognitive Performance Estimation in a Working Memory Task
Jennifer Sloboda, Adam Lammert, James Williamson, Christopher Smalt, Daryush D. Mehta, COL Ian Curry, Kristin Heaton, Jeffrey Palmer and Thomas Quatieri
Session: Speaker Characterization and Analysis
The Effect of Exposure to High Altitude and Heat on Speech Articulatory Coordination
James Williamson, Thomas Quatieri, Adam Lammert, Katherine Mitchell, Katherine Finkelstein, Nicole Ekon, Caitlin Dillon, Robert Kenefick and Kristin Heaton
Session: Speaker State and Trait
Landini, Federico
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge
Lane, Ian
Densely Connected Networks for Conversational Speech Recognition
Kyu Han, Akshay Chandrashekaran, Jungsuk Kim and Ian Lane
Session: Sequence Models for ASR
Online Incremental Learning for Speaker-Adaptive Language Models
Chih Chi Hu, Bing Liu, John Shen and Ian Lane
Session: Language Modeling
Lange, Patrick
Game-based Spoken Dialog Language Learning Applications for Young Students
Keelan Evanini, Veronika Timpe-Laughlin, Eugene Tsuprun, Ian Blood, Jeremy Lee, James Bruno, Vikram Ramanarayanan, Patrick Lange and David Suendermann-Oeft
Session: Show and Tell 2
Toward Scalable Dialog Technology for Conversational Language Learning: Case Study of the TOEFL® MOOC
Vikram Ramanarayanan, David Pautler, Patrick Lange, Eugene Tsuprun, Rutuja Ubale, Keelan Evanini and David Suendermann-Oeft
Session: Show and Tell 5
Latif, Siddique
Transfer Learning for Improving Speech Emotion Classification Accuracy
Siddique Latif, Rajib Rana, Shahzad Younis, Junaid Qadir and Julien Epps
Session: Speaker State and Trait
Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study
Siddique Latif, Rajib Rana, Junaid Qadir and Julien Epps
Session: Representation Learning for Emotion
Lawson, Aaron
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson and Martin Graciarena
Session: Speaker Verification II
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Analysis of Complementary Information Sources in the Speaker Embeddings Framework
Mahesh Kumar Nandwana, Mitchell McLaren, Diego Castan, Julien van Hout and Aaron Lawson
Session: Speaker Verification Using Neural Network Methods II
Lee, Chi-Chun
Self-Assessed Affect Recognition Using Fusion of Attentional BLSTM and Static Acoustic Features
Bo-Hao Su, Sung-Lin Yeh, Ming-Ya Ko, Huan-Yu Chen, Shun-Chang Zhong, Jeng-Lin Li and Chi-Chun Lee
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
An Interlocutor-Modulated Attentional LSTM for Differentiating between Subgroups of Autism Spectrum Disorder
Yun-Shao Lin, Susan Shur-Fen Gau and Chi-Chun Lee
Session: Speech and Language Analytics for Mental Health
Encoding Individual Acoustic Features Using Dyad-Augmented Deep Variational Representations for Dialog-level Emotion Recognition
Jeng-Lin Li and Chi-Chun Lee
Session: Representation Learning for Emotion
Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition
Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng and Chi-Chun Lee
Session: Speech Pathology, Depression, and Medical Applications
Automatic Assessment of Individual Culture Attribute of Power Distance Using a Social Context-Enhanced Prosodic Network Representation
Fu-Sheng Tsai, Hao-Chun Yang, Wei-Wen Chang and Chi-Chun Lee
Session: Deception, Personality, and Culture Attribute
Lee, Chin-Hui
Speaker Diarization with Enhancing Speech for the First DIHARD Challenge
Lei Sun, Jun Du, Chao Jiang, Xueyang Zhang, Shan He, Bing Yin and Chin-Hui Lee
Session: The First DIHARD Speech Diarization Challenge
Error Modeling via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement
Li Chai, Jun Du and Chin-Hui Lee
Session: Deep Enhancement
Lee, Hung-yi
Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations
Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee and Lin-shan Lee
Session: Voice Conversion
Joint Learning of Interactive Spoken Content Retrieval and Trainable User Simulator
Pei-Hung Chung, Kuan Tung, Ching-Lun Tai and Hung-yi Lee
Session: Extracting Information from Audio
Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension
Chia-Hsuan Lee, Szu-Lin Wu, Chi-Liang Liu and Hung-yi Lee
Session: Spoken Language Understanding
Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings
Da-Rong Liu, Kuan-Yu Chen, Hung-yi Lee and Lin-shan Lee
Session: Acoustic Modelling
Lee, Kong Aik
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion
Massimiliano Todisco, Héctor Delgado, Kong Aik Lee, Md Sahidullah, Nicholas Evans, Tomi Kinnunen and Junichi Yamagishi
Session: Speaker Verification I
Co-whitening of I-vectors for Short and Long Duration Speaker Verification
Longting Xu, Kong Aik Lee, Haizhou Li and Zhen Yang
Session: Speaker Verification II
Lee, Lin-shan
Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations
Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee and Lin-shan Lee
Session: Voice Conversion
Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings
Da-Rong Liu, Kuan-Yu Chen, Hung-yi Lee and Lin-shan Lee
Session: Acoustic Modelling
Lee, Tan
Improving Cross-Lingual Knowledge Transferability Using Multilingual TDNN-BLSTM with Language-Dependent Pre-Final Layer
Siyuan Feng and Tan Lee
Session: Adjusting to Speaker, Accent, and Domain
Exploiting Speaker and Phonetic Diversity of Mismatched Language Resources for Unsupervised Subword Modeling
Siyuan Feng and Tan Lee
Session: Zero-resource Speech Recognition
Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning
Ying Qin, Tan Lee, Siyuan Feng and Anthony Pak Hin Kong
Session: Speech Pathology, Depression, and Medical Applications
Cross-cultural (A)symmetries in Audio-visual Attitude Perception
Hansjörg Mixdorff, Albert Rilliard, Tan Lee, Matthew K. H. Ma and Angelika Hönemann
Session: Deception, Personality, and Culture Attribute
Lei, Ming
Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning
ShiLiang Zhang and Ming Lei
Session: Sequence Models for ASR
Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting
Mengzhe Chen, ShiLiang Zhang, Ming Lei, Yong Liu, Haitao Yao and Jie Gao
Session: Topics in Speech Recognition
Levitan, Rivka
Lexical and Acoustic Deep Learning Model for Personality Recognition
Guozhen An and Rivka Levitan
Session: Speaker Characterization and Analysis
Deep Personality Recognition for Deception Detection
Guozhen An, Sarah Ita Levitan, Julia Hirschberg and Rivka Levitan
Session: Deception, Personality, and Culture Attribute
Levitan, Sarah Ita
Acoustic-Prosodic Indicators of Deception and Trust in Interview Dialogues
Sarah Ita Levitan, Angel Maredia and Julia Hirschberg
Session: Deception, Personality, and Culture Attribute
Deep Personality Recognition for Deception Detection
Guozhen An, Sarah Ita Levitan, Julia Hirschberg and Rivka Levitan
Session: Deception, Personality, and Culture Attribute
Li, Bin
Acoustic Analysis of Whispery Voice Disguise in Mandarin Chinese
Cuiling Zhang, Bin Li, Si Chen and Yike Yang
Session: Phonation
Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement
Shuai Nie, Shan Liang, Bin Liu, Yaping Zhang, Wenju Liu and Jianhua Tao
Session: Deep Enhancement
Li, Haizhou
Mandarin-English Code-switching Speech Recognition
Haihua Xu, Van Tung Pham, Zin Tun Kyaw, Zhi Hao Lim, Eng Siong Chng and Haizhou Li
Session: Show and Tell 2
Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion
Berrak Sisman and Haizhou Li
Session: Prosody Modeling and Generation
Co-whitening of I-vectors for Short and Long Duration Speaker Verification
Longting Xu, Kong Aik Lee, Haizhou Li and Zhen Yang
Session: Speaker Verification II
Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search
Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma and Haizhou Li
Session: Spoken Term Detection
Automatic Pronunciation Evaluation of Singing
Chitralekha Gupta, Haizhou Li and Ye Wang
Session: Speech and Singing Production
A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder
Berrak Sisman, Mingyang Zhang and Haizhou Li
Session: Voice Conversion and Speech Synthesis
A Shifted Delta Coefficient Objective for Monaural Speech Separation Using Multi-task Learning
Chenglin Xu, Wei Rao, Eng Siong Chng and Haizhou Li
Session: Source Separation from Monaural Input
Li, Hao
Mandarin-English Code-switching Speech Recognition
Haihua Xu, Van Tung Pham, Zin Tun Kyaw, Zhi Hao Lim, Eng Siong Chng and Haizhou Li
Session: Show and Tell 2
EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System
Hao Li, Yongguo Kang and Zhenyu Wang
Session: Expressive Speech Synthesis
Li, Jeng-Lin
Self-Assessed Affect Recognition Using Fusion of Attentional BLSTM and Static Acoustic Features
Bo-Hao Su, Sung-Lin Yeh, Ming-Ya Ko, Huan-Yu Chen, Shun-Chang Zhong, Jeng-Lin Li and Chi-Chun Lee
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Encoding Individual Acoustic Features Using Dyad-Augmented Deep Variational Representations for Dialog-level Emotion Recognition
Jeng-Lin Li and Chi-Chun Lee
Session: Representation Learning for Emotion
Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition
Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng and Chi-Chun Lee
Session: Speech Pathology, Depression, and Medical Applications
Li, Jinyu
Cycle-Consistent Speech Enhancement
Zhong Meng, Jinyu Li, Yifan Gong and Biing-Hwang (Fred) Juang
Session: Novel Approaches to Enhancement
Layer Trajectory LSTM
Jinyu Li, Changliang Liu and Yifan Gong
Session: Novel Neural Network Architectures for Acoustic Modelling
Improved Training for Online End-to-end Speech Recognition Systems
Suyoun Kim, Michael Seltzer, Jinyu Li and Rui Zhao
Session: Neural Network Training Strategies for ASR
Adversarial Feature-Mapping for Speech Enhancement
Zhong Meng, Jinyu Li, Yifan Gong and Biing-Hwang (Fred) Juang
Session: Deep Enhancement
Li, Juncheng
Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks
Yun Wang, Juncheng Li and Florian Metze
Session: Audio Events and Acoustic Scenes
Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection
Shao-Yen Tseng, Juncheng Li, Yun Wang, Florian Metze, Joseph Szurley and Samarjit Das
Session: Acoustic Scenes and Rare Events
Li, Ke
Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
Ke Li, Hainan Xu, Yiming Wang, Daniel Povey and Sanjeev Khudanpur
Session: Language Modeling
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks
Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi and Sanjeev Khudanpur
Session: Acoustic Modelling
Li, Ming
An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individuals
Dengke Tang, Junlin Zeng and Ming Li
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Analysis of Length Normalization in End-to-End Speaker Verification System
Weicheng Cai, Jinkun Chen and Ming Li
Session: Speaker Verification Using Neural Network Methods II
Li, Sheng
Temporal Attentive Pooling for Acoustic Event Detection
Xugang Lu, Peng Shen, Sheng Li, Yu Tsao and Hisashi Kawai
Session: Audio Events and Acoustic Scenes
Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification
Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
Session: Language Identification
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
Session: Acoustic Modelling
Li, Xiangang
Multiple Phase Information Combination for Replay Attacks Detection
Dongbo Li, Longbiao Wang, Jianwu Dang, Meng Liu, Zeyan Oo, Seiichi Nakagawa, Haotian Guan and Xiangang Li
Session: Spoofing Detection
Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network
Lili Guo, Longbiao Wang, Jianwu Dang, Linjuan Zhang, Haotian Guan and Xiangang Li
Session: Robust Speech Recognition
Li, Ya
BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End
Yibin Zheng, Jianhua Tao, Zhengqi Wen and Ya Li
Session: Prosody Modeling and Generation
Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function
Jian Huang, Ya Li, Jianhua Tao and Zhen Lian
Session: Emotion Recognition and Analysis
Li, Yaxing
Multi-frame Quantization of LSF Parameters Using a Deep Autoencoder and Pyramid Vector Quantizer
Yaxing Li, Eshete Derb Emiru, Shengwu Xiong, Anna Zhu, Pengfei Duan and Yichang Li
Session: Coding
Multi-frame Coding of LSF Parameters Using Block-Constrained Trellis Coded Vector Quantization
Yaxing Li, Shan Xu, Shengwu Xiong, Anna Zhu, Pengfei Duan and Yueming Ding
Session: Coding
Liberatore, Christopher
Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function
Shaojin Ding, Guanlong Zhao, Christopher Liberatore and Ricardo Gutierrez-Osuna
Session: Voice Conversion
Learning Structured Dictionaries for Exemplar-based Voice Conversion
Shaojin Ding, Christopher Liberatore and Ricardo Gutierrez-Osuna
Session: Voice Conversion
Lin, Huibin
Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings
Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
Session: Speaker Verification I
Joint Learning of J-Vector Extractor and Joint Bayesian Model for Text Dependent Speaker Verification
Ziqiang Shi, Liu Liu, Huibin Lin and Rujie Liu
Session: Speaker Verification II
Latent Factor Analysis of Deep Bottleneck Features for Speaker Verification with Random Digit Strings
Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
Session: Speaker Verification II
Ling, Zhen-Hua
WaveNet Vocoder with Limited Training Data for Voice Conversion
Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou and Li-Rong Dai
Session: Voice Conversion and Speech Synthesis
Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis
Xiao Zhou, Zhen-Hua Ling, Zhi-Ping Zhou and Li-Rong Dai
Session: Speech Synthesis Paradigms and Methods
Lippus, Pärtel
Creak in the Respiratory Cycle
Kätlin Aare, Pärtel Lippus, Marcin Wlodarczak and Mattias Heldner
Session: Phonation
Liu, Liu
Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings
Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
Session: Speaker Verification I
Joint Learning of J-Vector Extractor and Joint Bayesian Model for Text Dependent Speaker Verification
Ziqiang Shi, Liu Liu, Huibin Lin and Rujie Liu
Session: Speaker Verification II
Latent Factor Analysis of Deep Bottleneck Features for Speaker Verification with Random Digit Strings
Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
Session: Speaker Verification II
Liu, Rujie
Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings
Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
Session: Speaker Verification I
Joint Learning of J-Vector Extractor and Joint Bayesian Model for Text Dependent Speaker Verification
Ziqiang Shi, Liu Liu, Huibin Lin and Rujie Liu
Session: Speaker Verification II
Latent Factor Analysis of Deep Bottleneck Features for Speaker Verification with Random Digit Strings
Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
Session: Speaker Verification II
Liu, Shansong
Gaussian Process Neural Networks for Speech Recognition
Max W. Y. Lam, Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Rongfeng Su, Xunying Liu and Helen Meng
Session: Novel Neural Network Architectures for Acoustic Modelling
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus
Jianwei Yu, Xurong Xie, Shansong Liu, Shoukang Hu, Max W. Y. Lam, Xixin Wu, Ka Ho Wong, Xunying Liu and Helen Meng
Session: Application of ASR in Medical Practice
Liu, Shih-Chii
Speaker Activity Detection and Minimum Variance Beamforming for Source Separation
Enea Ceolini, Jithendar Anumula, Adrian Huber, Ilya Kiselev and Shih-Chii Liu
Session: Source Separation and Spatial Analysis
Multi-channel Attention for End-to-End Speech Recognition
Stefan Braun, Daniel Neil, Jithendar Anumula, Enea Ceolini and Shih-Chii Liu
Session: End-to-End Speech Recognition
Liu, Songxiang
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance
Songxiang Liu, Jinghua Zhong, Lifa Sun, Xixin Wu, Xunying Liu and Helen Meng
Session: Voice Conversion
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis
Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu and Helen Meng
Session: Expressive Speech Synthesis
Liu, Xunying
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance
Songxiang Liu, Jinghua Zhong, Lifa Sun, Xixin Wu, Xunying Liu and Helen Meng
Session: Voice Conversion
Gaussian Process Neural Networks for Speech Recognition
Max W. Y. Lam, Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Rongfeng Su, Xunying Liu and Helen Meng
Session: Novel Neural Network Architectures for Acoustic Modelling
Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis
Xu Li, Shaoguang Mao, Xixin Wu, Kun Li, Xunying Liu and Helen Meng
Session: Second Language Acquisition and Code-switching
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus
Jianwei Yu, Xurong Xie, Shansong Liu, Shoukang Hu, Max W. Y. Lam, Xixin Wu, Ka Ho Wong, Xunying Liu and Helen Meng
Session: Application of ASR in Medical Practice
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis
Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu and Helen Meng
Session: Expressive Speech Synthesis
Semi-supervised Cross-domain Visual Feature Learning for Audio-Visual Broadcast Speech Transcription
Rongfeng Su, Xunying Liu and Lan Wang
Session: Multimodal Systems
Liu, Yang
Liulishuo's System for the Spoken CALL Shared Task 2018
Huy Nguyen, Lei Chen, Ramon Prieto, Chuan Wang and Yang Liu
Session: Spoken CALL Shared Task, Second Edition
Pitch Characteristics of L2 English Speech by Chinese Speakers: A Large-scale Study
Jiahong Yuan, Qiusi Dong, Fei Wu, Huan Luan, Xiaofei Yang, Hui Lin and Yang Liu
Session: Second Language Acquisition and Code-switching
Lu, Xugang
Temporal Attentive Pooling for Acoustic Event Detection
Xugang Lu, Peng Shen, Sheng Li, Yu Tsao and Hisashi Kawai
Session: Audio Events and Acoustic Scenes
Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification
Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
Session: Language Identification
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
Session: Acoustic Modelling
Luo, Yi
Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network
Yi Luo and Nima Mesgarani
Session: Deep Learning for Source Separation and Pitch Tracking
Music Source Activity Detection and Separation Using Deep Attractor Network
Rajath Kumar, Yi Luo and Nima Mesgarani
Session: Deep Learning for Source Separation and Pitch Tracking
Luong, Hieu-Thi
Investigating Accuracy of Pitch-accent Annotations in Neural Network-based Speech Synthesis and Denoising Effects
Hieu-Thi Luong, Xin Wang, Junichi Yamagishi and Nobuyuki Nishizawa
Session: Prosody Modeling and Generation
Multimodal Speech Synthesis Architecture for Unsupervised Speaker Adaptation
Hieu-Thi Luong and Junichi Yamagishi
Session: Speech Synthesis Paradigms and Methods
Luz, Saturnino
Improving Response Time of Active Speaker Detection Using Visual Prosody Information Prior to Articulation
Fasih Haider, Saturnino Luz, Carl Vogel and Nick Campbell
Session: Speaker Characterization and Analysis
An Active Feature Transformation Method for Attitude Recognition of Video Bloggers
Fasih Haider, Fahim A. Salim, Owen Conlan and Saturnino Luz
Session: Deception, Personality, and Culture Attribute
Maciejewski, Matthew
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
Session: The First DIHARD Speech Diarization Challenge
Madikeri, Srikanth
Analysis of Language Dependent Front-End for Speaker Recognition
Srikanth Madikeri, Subhadeep Dey and Petr Motlicek
Session: Speaker Verification II
End-to-end Text-dependent Speaker Verification Using Novel Distance Measures
Subhadeep Dey, Srikanth Madikeri and Petr Motlicek
Session: Speaker Verification Using Neural Network Methods II
Maggu, Akshay Raj
Learning Two Tone Languages Enhances the Brainstem Encoding of Lexical Tones
Akshay Raj Maggu, Wenqing Zong, Vina Law and Patrick C. M. Wong
Session: Cognition and Brain Studies
Experience-dependent Influence of Music and Language on Lexical Pitch Learning Is Not Additive
Akshay Raj Maggu, Patrick C. M. Wong, Hanjun Liu and Francis C. K. Wong
Session: Speech and Speaker Perception
Magimai.-Doss, Mathew
Implementing Fusion Techniques for the Classification of Paralinguistic Information
Bogdan Vlasenko, Jilt Sebastian, Pavan Kumar D. S. and Mathew Magimai.-Doss
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
On Learning Vocal Tract System Related Speaker Discriminative Information from Raw Signal Using CNNs
Hannah Muckenhirn, Mathew Magimai.-Doss and Sebastien Marcel
Session: Speaker Verification II
On Learning to Identify Genders from Raw Speech Signal Using CNNs
Selen Hande Kabil, Hannah Muckenhirn and Mathew Magimai.-Doss
Session: Speaker State and Trait
Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech
Jilt Sebastian, Manoj Kumar, Pavan Kumar D. S., Mathew Magimai.-Doss, Hema Murthy and Shrikanth Narayanan
Session: Speaker State and Trait
Magron, Paul
Expectation-Maximization Algorithms for Itakura-Saito Nonnegative Matrix Factorization
Paul Magron and Tuomas Virtanen
Session: Source Separation and Spatial Analysis
Reducing Interference with Phase Recovery in DNN-based Monaural Singing Voice Separation
Paul Magron, Konstantinos Drossos, Stylianos Ioannis Mimilakis and Tuomas Virtanen
Session: Deep Learning for Source Separation and Pitch Tracking
Mak, Brian
Fast Derivation of Cross-lingual Document Vectors from Self-attentive Neural Machine Translation Model
Wei Li and Brian Mak
Session: Spoken Term Detection
Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
Yingke Zhu, Tom Ko, David Snyder, Brian Mak and Daniel Povey
Session: Speaker Verification Using Neural Network Methods II
Mandel, Michael
Large Vocabulary Concatenative Resynthesis
Soumi Maiti, Joey Ching and Michael Mandel
Session: Novel Approaches to Enhancement
Concatenative Resynthesis with Improved Training Signals for Speech Enhancement
Ali Raza Syed, Viet Anh Trinh and Michael Mandel
Session: Novel Approaches to Enhancement
Manohar, Vimal
Automatic Speech Recognition and Topic Identification from Speech for Almost-Zero-Resource Languages
Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak and Sanjeev Khudanpur
Session: Extracting Information from Audio
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
Session: The First DIHARD Speech Diarization Challenge
Marschik, Peter B.
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Marshall, David
Computational Paralinguistics: Automatic Assessment of Emotions, Mood and Behavioural State from Acoustics of Speech
Zafi Sherhan Syed, Julien Schroeter, Kirill Sidorov and David Marshall
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Marvin, Radhika
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
Session: Syllabification, Rhythm, and Voice Activity Detection
Masataki, Hirokazu
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder
Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hirokazu Masataki and Yushi Aono
Session: Selected Topics in Neural Speech Processing
Neural Error Corrective Language Models for Automatic Speech Recognition
Tomohiro Tanaka, Ryo Masumura, Hirokazu Masataki and Yushi Aono
Session: ASR Systems and Technologies
Masumura, Ryo
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder
Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hirokazu Masataki and Yushi Aono
Session: Selected Topics in Neural Speech Processing
Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training
Atsushi Ando, Reine Asakawa, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa and Yushi Aono
Session: Speaker Characterization and Analysis
Neural Error Corrective Language Models for Automatic Speech Recognition
Tomohiro Tanaka, Ryo Masumura, Hirokazu Masataki and Yushi Aono
Session: ASR Systems and Technologies
Matveeva, Anastasia
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
Matějka, Pavel
Dereverberation and Beamforming in Robust Far-Field Speaker Recognition
Ladislav Mošner, Oldřich Plchot, Pavel Matějka, Ondřej Novotný and Jan Černocký
Session: Dereverberation
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge
Maurer, Dieter
The Zurich Corpus of Vowel and Voice Quality, Version 1.0
Dieter Maurer, Christian d’Heureuse, Heidy Suter, Volker Dellwo, Daniel Friedrichs and Thayabaran Kathiresan
Session: Phonation
Influences of Fundamental Oscillation on Speaker Identification in Vocalic Utterances by Humans and Computers
Volker Dellwo, Thayabaran Kathiresan, Elisa Pellegrino, Lei He, Sandra Schwab and Dieter Maurer
Session: Speech and Speaker Perception
McCree, Alan
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
Session: The First DIHARD Speech Diarization Challenge
McLaren, Mitchell
A Generalization of PLDA for Joint Modeling of Speaker Identity and Multiple Nuisance Conditions
Luciana Ferrer and Mitchell McLaren
Session: Speaker Verification I
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson and Martin Graciarena
Session: Speaker Verification II
Analysis of Complementary Information Sources in the Speaker Embeddings Framework
Mahesh Kumar Nandwana, Mitchell McLaren, Diego Castan, Julien van Hout and Aaron Lawson
Session: Speaker Verification Using Neural Network Methods II
Employing Phonetic Information in DNN Speaker Embeddings to Improve Speaker Recognition Performance
Md Hafizur Rahman, Ivan Himawan, Mitchell McLaren, Clinton Fookes and Sridha Sridharan
Session: Speaker Verification Using Neural Network Methods II
McLoughlin, Ian
Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition
Jian Tang, Yan Song, Lirong Dai and Ian McLoughlin
Session: Novel Neural Network Architectures for Acoustic Modelling
An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition
Pengcheng Li, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
Session: Representation Learning for Emotion
Early Detection of Continuous and Partial Audio Events Using CNN
Ian McLoughlin, Yan Song, Lam Dang Pham, Ramaswamy Palaniappan, Huy Phan and Yue Lang
Session: Acoustic Scenes and Rare Events
An Improved Deep Embedding Learning Method for Short Duration Speaker Verification
Zhifu Gao, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
Session: Speaker Verification Using Neural Network Methods II
Meenakshi, G. Nisha
Whispered Speech to Neutral Speech Conversion Using Bidirectional LSTMs
G. Nisha Meenakshi and Prasanta Kumar Ghosh
Session: Voice Conversion
Reconstructing Neutral Speech from Tracheoesophageal Speech
Abinay Reddy N, Achuth Rao MV, G. Nisha Meenakshi and Prasanta Kumar Ghosh
Session: Speech and Singing Production
Relating Articulatory Motions in Different Speaking Rates
Astha Singh, G. Nisha Meenakshi and Prasanta Kumar Ghosh
Session: Source and Supra-segmentals
Melkote, Vinay
Temporal Noise Shaping with Companding
Arijit Biswas, Per Hedelin, Lars Villemoes and Vinay Melkote
Session: Coding
Meng, Helen
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance
Songxiang Liu, Jinghua Zhong, Lifa Sun, Xixin Wu, Xunying Liu and Helen Meng
Session: Voice Conversion
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection
Ziwei Zhu, Zhiyong Wu, Runnan Li, Helen Meng and Lianhong Cai
Session: Spoken Term Detection
Gaussian Process Neural Networks for Speech Recognition
Max W. Y. Lam, Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Rongfeng Su, Xunying Liu and Helen Meng
Session: Novel Neural Network Architectures for Acoustic Modelling
Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis
Xu Li, Shaoguang Mao, Xixin Wu, Kun Li, Xunying Liu and Helen Meng
Session: Second Language Acquisition and Code-switching
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus
Jianwei Yu, Xurong Xie, Shansong Liu, Shoukang Hu, Max W. Y. Lam, Xixin Wu, Ka Ho Wong, Xunying Liu and Helen Meng
Session: Application of ASR in Medical Practice
Speech and Language Processing for Learning and Wellbeing
Helen Meng
Session: Plenary Talk-3
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis
Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu and Helen Meng
Session: Expressive Speech Synthesis
Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method
Shuai Yang, Zhiyong Wu, Binbin Shen and Helen Meng
Session: Deep Learning for Source Separation and Pitch Tracking
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms
Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng and Lianhong Cai
Session: Emotion Recognition and Analysis
Menshikova, Alla
Language-Dependent Melody Embeddings
Daniil Kocharov and Alla Menshikova
Session: Speech Prosody
Mesgarani, Nima
Speech Processing in the Human Brain Meets Deep Learning
Nima Mesgarani
Session: Perspective Talk-3
Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network
Yi Luo and Nima Mesgarani
Session: Deep Learning for Source Separation and Pitch Tracking
Music Source Activity Detection and Separation Using Deep Attractor Network
Rajath Kumar, Yi Luo and Nima Mesgarani
Session: Deep Learning for Source Separation and Pitch Tracking
Metze, Florian
Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks
Yun Wang, Juncheng Li and Florian Metze
Session: Audio Events and Acoustic Scenes
The ACLEW DiViMe: An Easy-to-use Diarization Tool
Adrien Le Franc, Eric Riebling, Julien Karadayi, Yun Wang, Camila Scaff, Florian Metze and Alejandrina Cristia
Session: Speaker Diarization
Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection
Shao-Yen Tseng, Juncheng Li, Yun Wang, Florian Metze, Joseph Szurley and Samarjit Das
Session: Acoustic Scenes and Rare Events
Subword and Crossword Units for CTC Acoustic Models
Thomas Zenkel, Ramon Sanabria, Florian Metze and Alex Waibel
Session: ASR Systems and Technologies
Meunier, Fanny
Loud and Shouted Speech Perception at Variable Distances in a Forest
Julien Meyer, Fanny Meunier, Laure Dentel, Noelia Do Carmo Blanco and Frédéric Sèbe
Session: Speech Perception in Adverse Conditions
Phoneme Resistance and Phoneme Confusion in Noise: Impact of Dyslexia
Noelia Do Carmo Blanco, Julien Meyer, Michel Hoen and Fanny Meunier
Session: Speech Perception in Adverse Conditions
Meyer, Bernd T.
Prediction of Perceived Speech Quality Using Deep Machine Listening
Jasper Ooster, Rainer Huber and Bernd T. Meyer
Session: Models of Speech Perception
Prediction of Subjective Listening Effort from Acoustic Data with Non-Intrusive Deep Models
Paul Kranzusch, Rainer Huber, Melanie Krüger, Birger Kollmeier and Bernd T. Meyer
Session: Models of Speech Perception
Meyer, Julien
Length Contrast and Covarying Features: Whistled Speech as a Case Study
Rachid Ridouane, Giuseppina Turco and Julien Meyer
Session: Production of Prosody
Loud and Shouted Speech Perception at Variable Distances in a Forest
Julien Meyer, Fanny Meunier, Laure Dentel, Noelia Do Carmo Blanco and Frédéric Sèbe
Session: Speech Perception in Adverse Conditions
Phoneme Resistance and Phoneme Confusion in Noise: Impact of Dyslexia
Noelia Do Carmo Blanco, Julien Meyer, Michel Hoen and Fanny Meunier
Session: Speech Perception in Adverse Conditions
Mimura, Masato
Forward-Backward Attention Decoder
Masato Mimura, Shinsuke Sakai and Tatsuya Kawahara
Session: Recurrent Neural Models for ASR
Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition
Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono and Tatsuya Kawahara
Session: Adjusting to Speaker, Accent, and Domain
Minematsu, Nobuaki
A Comparative Study of Statistical Conversion of Face to Voice Based on Their Subjective Impressions
Yasuhito Ohsugi, Daisuke Saito and Nobuaki Minematsu
Session: Multimodal Dialogue Systems
A Study of Objective Measurement of Comprehensibility through Native Speakers' Shadowing of Learners' Utterances
Yusuke Inoue, Suguru Kabashima, Daisuke Saito, Nobuaki Minematsu, Kumi Kanamura and Yutaka Yamauchi
Session: Applications in Education and Learning
Mitra, Vikramjit
Articulatory Features for ASR of Pathological Speech
Emre Yılmaz, Vikramjit Mitra, Chris Bartels and Horacio Franco
Session: Application of ASR in Medical Practice
Noise Robust Acoustic to Articulatory Speech Inversion
Nadee Seneviratne, Ganesh Sivaraman, Vikramjit Mitra and Carol Espy-Wilson
Session: Articulatory Information, Modeling and Inversion
Mittal, Sanjeev Kumar
Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training
Chandana S, Chiranjeevi Yarra, Ritu Aggarwal, Sanjeev Kumar Mittal, Kausthubha N K, Raseena K T, Astha Singh and Prasanta Kumar Ghosh
Session: Articulatory Information, Modeling and Inversion
Montacié, Claude
Vocalic, Lexical and Prosodic Cues for the INTERSPEECH 2018 Self-Assessed Affect Challenge
Claude Montacié and Marie-José Caraty
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
Moran, Joseph
Attention-based Sequence Classification for Affect Detection
Cristina Gorrostieta, Richard Brutti, Kye Taylor, Avi Shapiro, Joseph Moran, Ali Azarbayejani and John Kane
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Moriya, Takafumi
Automatic DNN Node Pruning Using Mixture Distribution-based Group Regularization
Tsukasa Yoshida, Takafumi Moriya, Kazuho Watanabe, Yusuke Shinohara, Yoshikazu Yamaguchi and Yushi Aono
Session: Selected Topics in Neural Speech Processing
Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition
Takafumi Moriya, Sei Ueno, Yusuke Shinohara, Marc Delcroix, Yoshikazu Yamaguchi and Yushi Aono
Session: Adjusting to Speaker, Accent, and Domain
Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition
Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono and Tatsuya Kawahara
Session: Adjusting to Speaker, Accent, and Domain
Motlicek, Petr
Analysis of Language Dependent Front-End for Speaker Recognition
Srikanth Madikeri, Subhadeep Dey and Petr Motlicek
Session: Speaker Verification II
Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network
Weipeng He, Petr Motlicek and Jean-Marc Odobez
Session: Deep Learning for Source Separation and Pitch Tracking
Iterative Learning of Speech Recognition Models for Air Traffic Control
Ajay Srinivasamurthy, Petr Motlicek, Mittul Singh, Youssef Oualil, Matthias Kleinert, Heiko Ehr and Hartmut Helmke
Session: Multimodal Systems
End-to-end Text-dependent Speaker Verification Using Novel Distance Measures
Subhadeep Dey, Srikanth Madikeri and Petr Motlicek
Session: Speaker Verification Using Neural Network Methods II
Mower Provost, Emily
Classification of Huntington Disease Using Acoustic and Lexical Features
Matthew Perez, Wenyu Jin, Duc Le, Noelle Carlozzi, Praveen Dayalu, Angela Roberts and Emily Mower Provost
Session: Integrating Speech Science and Technology for Clinical Applications
The PRIORI Emotion Dataset: Linking Mood to Emotion Detected In-the-Wild
Soheil Khorram, Mimansa Jaiswal, John Gideon, Melvin McInnis and Emily Mower Provost
Session: Integrating Speech Science and Technology for Clinical Applications
Mošner, Ladislav
Dereverberation and Beamforming in Robust Far-Field Speaker Recognition
Ladislav Mošner, Oldřich Plchot, Pavel Matějka, Ondřej Novotný and Jan Černocký
Session: Dereverberation
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge
Muckenhirn, Hannah
On Learning Vocal Tract System Related Speaker Discriminative Information from Raw Signal Using CNNs
Hannah Muckenhirn, Mathew Magimai.-Doss and Sebastien Marcel
Session: Speaker Verification II
On Learning to Identify Genders from Raw Speech Signal Using CNNs
Selen Hande Kabil, Hannah Muckenhirn and Mathew Magimai.-Doss
Session: Speaker State and Trait
Mulholland, Matthew
Improvements to an Automated Content Scoring System for Spoken CALL Responses: the ETS Submission to the Second Spoken CALL Shared Task
Keelan Evanini, Matthew Mulholland, Rutuja Ubale, Yao Qian, Robert Pugh, Vikram Ramanarayanan and Aoife Cahill
Session: Spoken CALL Shared Task, Second Edition
Murphy, Andy
Voice Source Contribution to Prominence Perception: Rd Implementation
Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide and Christer Gobl
Session: Speech Segments and Voice Quality
On the Relationship between Glottal Pulse Shape and Its Spectrum: Correlations of Open Quotient, Pulse Skew and Peak Flow with Source Harmonic Amplitudes
Christer Gobl, Andy Murphy, Irena Yanushevskaya and Ailbhe Ní Chasaide
Session: Speech Segments and Voice Quality
Murthy, Hema
Mobile Application for Learning Languages for the Unlettered
Gayathri G, Mohana N, Radhika Pal and Hema Murthy
Session: Show and Tell 2
Decision-level Feature Switching as a Paradigm for Replay Attack Detection
Saranya M S and Hema Murthy
Session: Spoofing Detection
Brain-Computer Interface using Electroencephalogram Signatures of Eye Blinks
Srihari Maruthachalam, Sidharth Aggarwal, Mari Ganesh Kumar, Mriganka Sur and Hema Murthy
Session: Show and Tell 3
Information Bottleneck Based Percussion Instrument Diarization System for Taniavartanam Segments of Carnatic Music Concerts
Nauman Dawalatabad, Jom Kuriakose, Chandra Sekhar Chellu and Hema Murthy
Session: Syllabification, Rhythm, and Voice Activity Detection
Early Vocabulary Development Through Picture-based Software Solutions
Kasthuri G, Prabha Ramanathan, Hema Murthy, Namita Jacob and Anil Prabhakar
Session: Show and Tell 4
Code-switching in Indic Speech Synthesisers
Anju Leela Thomas, Anusha Prakash, Arun Baby and Hema Murthy
Session: Speech Technologies for Code-Switching in Multilingual Communities
Resyllabification in Indian Languages and Its Implications in Text-to-speech Systems
Mahesh M, Jeena J Prakash and Hema Murthy
Session: Speech Segments and Voice Quality
Transcription Correction for Indian Languages Using Acoustic Signatures
Jeena J Prakash, Golda Brunet Rajan and Hema Murthy
Session: Low Resource Speech Recognition Challenge for Indian Languages
Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech
Jilt Sebastian, Manoj Kumar, Pavan Kumar D. S., Mathew Magimai.-Doss, Hema Murthy and Shrikanth Narayanan
Session: Speaker State and Trait
Mücke, Doris
Age-related Effects on Sensorimotor Control of Speech Production
Anne Hermes, Jane Mertens and Doris Mücke
Session: Speech and Singing Production
Structural Effects on Properties of Consonantal Gestures in Tashlhiyt
Anne Hermes, Doris Mücke, Bastian Auris and Rachid Ridouane
Session: Speech Segments and Voice Quality
N K, Kausthubha
Intonation tutor by SPIRE (In-SPIRE): An Online Tool for an Automatic Feedback to the Second Language Learners in Learning Intonation
Anand P A, Chiranjeevi Yarra, Kausthubha N K and Prasanta Kumar Ghosh
Session: Show and Tell 2
SPIRE-SST: An Automatic Web-based Self-learning Tool for Syllable Stress Tutoring (SST) to the Second Language Learners
Chiranjeevi Yarra, Anand P A, Kausthubha N K and Prasanta Kumar Ghosh
Session: Show and Tell 6
Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training
Chandana S, Chiranjeevi Yarra, Ritu Aggarwal, Sanjeev Kumar Mittal, Kausthubha N K, Raseena K T, Astha Singh and Prasanta Kumar Ghosh
Session: Articulatory Information, Modeling and Inversion
N, Krishna D
Development of Large Vocabulary Speech Recognition System with Keyword Search for Manipuri
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Speech Recognition for Indian Languages
An Automatic Speech Transcription System for Manipuri Language
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Show and Tell 6
Nagarsheth, Parav
Speech Synthesis in the Wild
Ganesh Sivaraman, Parav Nagarsheth and Elie Khoury
Session: Show and Tell 7
Nakamura, Satoshi
Compressing End-to-end ASR Networks by Tensor-Train Decomposition
Takuma Mori, Andros Tjandra, Sakriani Sakti and Satoshi Nakamura
Session: Sequence Models for ASR
Machine Speech Chain with One-shot Speaker Adaptation
Andros Tjandra, Sakriani Sakti and Satoshi Nakamura
Session: Acoustic Model Adaptation
Incremental TTS for Japanese Language
Tomoya Yanagita, Sakriani Sakti and Satoshi Nakamura
Session: Statistical Parametric Speech Synthesis
Detection of Dementia from Responses to Atypical Questions Asked by Embodied Conversational Agents
Tsuyoki Ujiro, Hiroki Tanaka, Hiroyoshi Adachi, Hiroaki Kazui, Manabu Ikeda, Takashi Kudo and Satoshi Nakamura
Session: Integrating Speech Science and Technology for Clinical Applications
Nakashika, Toru
DNN-based Speech Synthesis for Small Data Sets Considering Bidirectional Speech-Text Conversion
Kentaro Sone and Toru Nakashika
Session: Speech Synthesis Paradigms and Methods
LSTBM: A Novel Sequence Representation of Speech Spectra Using Restricted Boltzmann Machine with Long Short-Term Memory
Toru Nakashika
Session: Speech Synthesis Paradigms and Methods
Nakatani, Tomohiro
Multi-resolution Gammachirp Envelope Distortion Index for Intelligibility Prediction of Noisy Speech
Katsuhiko Yamamoto, Toshio Irino, Narumi Ohashi, Shoko Araki, Keisuke Kinoshita and Tomohiro Nakatani
Session: Speech Intelligibility and Quality
Auxiliary Feature Based Adaptation of End-to-end ASR Systems
Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita and Tomohiro Nakatani
Session: Adjusting to Speaker, Accent, and Domain
Integrating Neural Network Based Beamforming and Weighted Prediction Error Dereverberation
Lukas Drude, Christoph Boeddeker, Jahn Heymann, Reinhold Haeb-Umbach, Keisuke Kinoshita, Marc Delcroix and Tomohiro Nakatani
Session: Distant ASR
Nandwana, Mahesh Kumar
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson and Martin Graciarena
Session: Speaker Verification II
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Analysis of Complementary Information Sources in the Speaker Embeddings Framework
Mahesh Kumar Nandwana, Mitchell McLaren, Diego Castan, Julien van Hout and Aaron Lawson
Session: Speaker Verification Using Neural Network Methods II
Narayanan, Arun
Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition
Khe Chai Sim, Arun Narayanan, Ananya Misra, Anshuman Tripathi, Golan Pundak, Tara Sainath, Parisa Haghani, Bo Li and Michiel Bacchiani
Session: Acoustic Model Adaptation
Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models
Chanwoo Kim, Ehsan Variani, Arun Narayanan and Michiel Bacchiani
Session: Distant ASR
Narayanan, Shrikanth
Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics
Pavlos Papadopoulos, Colin Vaz and Shrikanth Narayanan
Session: Novel Approaches to Enhancement
Combined Speaker Clustering and Role Recognition in Conversational Speech
Nikolaos Flemotomos, Pavlos Papadopoulos, James Gibson and Shrikanth Narayanan
Session: Speaker Diarization
Language Features for Automated Evaluation of Cognitive Behavior Psychotherapy Sessions
Nikolaos Flemotomos, Victor Martinez, James Gibson, David Atkins, Torrey Creed and Shrikanth Narayanan
Session: Integrating Speech Science and Technology for Clinical Applications
Computational Modeling of Conversational Humor in Psychotherapy
Anil Ramakrishna, Timothy Greer, David Atkins and Shrikanth Narayanan
Session: Speech and Language Analytics for Mental Health
A Knowledge Driven Structural Segmentation Approach for Play-Talk Classification During Autism Assessment
Manoj Kumar, Pooja Chebolu, So Hyun Kim, Kassandra Martinez, Catherine Lord and Shrikanth Narayanan
Session: Spoken Corpora and Annotation
Improving Gender Identification in Movie Audio Using Cross-Domain Data
Rajat Hebbar, Krishna Somandepalli and Shrikanth Narayanan
Session: Speaker State and Trait
Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech
Jilt Sebastian, Manoj Kumar, Pavan Kumar D. S., Mathew Magimai.-Doss, Hema Murthy and Shrikanth Narayanan
Session: Speaker State and Trait
Using Prosodic and Lexical Information for Learning Utterance-level Behaviors in Psychotherapy
Karan Singla, Zhuohao Chen, Nikolaos Flemotomos, James Gibson, Dogan Can, David Atkins and Shrikanth Narayanan
Session: Speech Pathology, Depression, and Medical Applications
Towards an Unsupervised Entrainment Distance in Conversational Speech Using Deep Neural Networks
Md Nasir, Brian Baucom, Shrikanth Narayanan and Panayiotis Georgiou
Session: Speech Pathology, Depression, and Medical Applications
Stochastic Shake-Shake Regularization for Affective Learning from Speech
Che-Wei Huang and Shrikanth Narayanan
Session: Emotion Recognition and Analysis
Ney, Hermann
Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition
Eugen Beck, Mirko Hannemann, Patrick Dötsch, Ralf Schlüter and Hermann Ney
Session: Sequence Models for ASR
Comparison of BLSTM-Layer-Specific Affine Transformations for Speaker Adaptation
Markus Kitza, Ralf Schlüter and Hermann Ney
Session: Acoustic Model Adaptation
Improved Training of End-to-end Attention Models for Speech Recognition
Albert Zeyer, Kazuki Irie, Ralf Schlüter and Hermann Ney
Session: End-to-End Speech Recognition
Investigation on LSTM Recurrent N-gram Language Models for Speech Recognition
Zoltán Tüske, Ralf Schlüter and Hermann Ney
Session: Language Modeling
Investigation on Estimation of Sentence Probability by Combining Forward, Backward and Bi-directional LSTM-RNNs
Kazuki Irie, Zhihong Lei, Liuhui Deng, Ralf Schlüter and Hermann Ney
Session: ASR Systems and Technologies
Ng, Manwa
Acoustic Features Associated with Sustained Vowel and Continuous Speech Productions by Chinese Children with Functional Articulation Disorders
Wang Zhang, Xiangquan Gui, Tianqi Wang, Manwa Ng, Feng Yang, Lan Wang and Nan Yan
Session: Integrating Speech Science and Technology for Clinical Applications
Nguyen, Patrick
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Ni, Karl
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Deep Speech Denoising with Vector Space Projections
Jeffrey Hetherly, Paul Gamble, Maria Alejandra Barrios, Cory Stephenson and Karl Ni
Session: Source Separation from Monaural Input
Nidadavolu, Phani Sankar
Investigation on Bandwidth Extension for Speaker Recognition
Phani Sankar Nidadavolu, Cheng-I Lai, Jesús Villalba and Najim Dehak
Session: Speaker Verification II
End-to-end Deep Neural Network Age Estimation
Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesús Villalba, Daniel Povey, Sanjeev Khudanpur and Najim Dehak
Session: Speaker State and Trait
Niehues, Jan
Low-Latency Neural Speech Translation
Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber and Alex Waibel
Session: Selected Topics in Neural Speech Processing
Term Extraction via Neural Sequence Labeling: A Comparative Evaluation of Strategies Using Recurrent Neural Networks
Maren Kucza, Jan Niehues, Thomas Zenkel, Alex Waibel and Sebastian Stüker
Session: Extracting Information from Audio
Self-Attentional Acoustic Models
Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker and Alex Waibel
Session: Acoustic Modelling
Niesler, Thomas
Building a Unified Code-Switching ASR System for South African Languages
Emre Yılmaz, Astik Biswas, Ewald van der Westhuizen, Febe de Wet and Thomas Niesler
Session: Speech Technologies for Code-Switching in Multilingual Communities
Multilingual Neural Network Acoustic Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech
Astik Biswas, Febe de Wet, Ewald van der Westhuizen, Emre Yılmaz and Thomas Niesler
Session: Topics in Speech Recognition
Fast ASR-free and Almost Zero-resource Keyword Spotting Using DTW and CNNs for Humanitarian Monitoring
Raghav Menon, Herman Kamper, John Quinn and Thomas Niesler
Session: Topics in Speech Recognition
Nishitoba, Jiro
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
Nixon, Jessie S.
Neural Response Development During Distributional Learning
Natalie Boll-Avetisyan, Jessie S. Nixon, Tomas O. Lentz, Liquan Liu, Sandrien van Ommen, Çağrı Çöltekin and Jacolien van Rij
Session: Cognition and Brain Studies
Effective Acoustic Cue Learning Is Not Just Statistical, It Is Discriminative
Jessie S. Nixon
Session: Cognition and Brain Studies
Novotný, Ondřej
Dereverberation and Beamforming in Robust Far-Field Speaker Recognition
Ladislav Mošner, Oldřich Plchot, Pavel Matějka, Ondřej Novotný and Jan Černocký
Session: Dereverberation
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge
Nyberg, Eric
Investigating Utterance Level Representations for Detecting Intent from Acoustics
SaiKrishna Rallabandi, Bhavya Karki, Carla Viegas, Eric Nyberg and Alan W Black
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Ní Chasaide, Ailbhe
Voice Source Contribution to Prominence Perception: Rd Implementation
Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide and Christer Gobl
Session: Speech Segments and Voice Quality
On the Relationship between Glottal Pulse Shape and Its Spectrum: Correlations of Open Quotient, Pulse Skew and Peak Flow with Source Harmonic Amplitudes
Christer Gobl, Andy Murphy, Irena Yanushevskaya and Ailbhe Ní Chasaide
Session: Speech Segments and Voice Quality
Nöth, Elmar
A Multitask Learning Approach to Assess the Dysarthria Severity in Patients with Parkinson's Disease
Juan Camilo Vásquez Correa, Tomas Arias, Juan Rafael Orozco-Arroyave and Elmar Nöth
Session: Automatic Detection and Recognition of Voice and Speech Disorders
Multimodal I-vectors to Detect and Evaluate Parkinson's Disease
Nicanor Garcia, Juan Camilo Vásquez Correa, Juan Rafael Orozco-Arroyave and Elmar Nöth
Session: Speech and Language Analytics for Mental Health
Ochi, Keiko
Automatic Evaluation of Soft Articulatory Contact for Stuttering Treatment
Keiko Ochi, Koichi Mori and Naomi Sakai
Session: Speech and Singing Production
Respiratory and Respiratory Muscular Control in JL1’s and JL2’s Text Reading Utilizing 4-RSTs and a Soft Respiratory Mask with a Two-Way Bulb
Toshiko Isei-Jaakkola, Keiko Ochi and Keikichi Hirose
Session: Source and Supra-segmentals
Ochiai, Tsubasa
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
Odobez, Jean-Marc
Robust and Discriminative Speaker Embedding via Intra-Class Distance Variance Regularization
Nam Le and Jean-Marc Odobez
Session: Speaker Verification Using Neural Network Methods I
Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network
Weipeng He, Petr Motlicek and Jean-Marc Odobez
Session: Deep Learning for Source Separation and Pitch Tracking
Ogawa, Atsunori
Auxiliary Feature Based Adaptation of End-to-end ASR Systems
Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita and Tomohiro Nakatani
Session: Adjusting to Speaker, Accent, and Domain
Semi-Supervised End-to-End Speech Recognition
Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Atsunori Ogawa and Marc Delcroix
Session: End-to-End Speech Recognition
Ondel, Lucas
Automatic Speech Recognition and Topic Identification from Speech for Almost-Zero-Resource Languages
Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak and Sanjeev Khudanpur
Session: Extracting Information from Audio
Unsupervised Word Segmentation from Speech with Attention
Pierre Godard, Marcely Zanon Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio and Laurent Besacier
Session: Zero-resource Speech Recognition
Orozco-Arroyave, Juan Rafael
A Multitask Learning Approach to Assess the Dysarthria Severity in Patients with Parkinson's Disease
Juan Camilo Vásquez Correa, Tomas Arias, Juan Rafael Orozco-Arroyave and Elmar Nöth
Session: Automatic Detection and Recognition of Voice and Speech Disorders
Multimodal I-vectors to Detect and Evaluate Parkinson's Disease
Nicanor Garcia, Juan Camilo Vásquez Correa, Juan Rafael Orozco-Arroyave and Elmar Nöth
Session: Speech and Language Analytics for Mental Health
Ottl, Sandra
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks
Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Sunčica Petrović, Eloise Ainger, Nicholas Cummins and Björn Schuller
Session: Speech and Language Analytics for Mental Health
Ouni, Slim
A French-Spanish Multimodal Speech Communication Corpus Incorporating Acoustic Data, Facial, Hands and Arms Gestures Information
Lucas D. Terissi, Gonzalo Sad, Mauricio Cerda, Slim Ouni, Rodrigo Galvez, Juan C. Gómez, Bernard Girau and Nancy Hitschfeld-Kahler
Session: Spoken Corpora and Annotation
Phoneme-to-Articulatory Mapping Using Bidirectional Gated RNN
Théo Biasutto-Lervat and Slim Ouni
Session: Articulatory Information, Modeling and Inversion
P A, Anand
Intonation tutor by SPIRE (In-SPIRE): An Online Tool for an Automatic Feedback to the Second Language Learners in Learning Intonation
Anand P A, Chiranjeevi Yarra, Kausthubha N K and Prasanta Kumar Ghosh
Session: Show and Tell 2
SPIRE-SST: An Automatic Web-based Self-learning Tool for Syllable Stress Tutoring (SST) to the Second Language Learners
Chiranjeevi Yarra, Anand P A, Kausthubha N K and Prasanta Kumar Ghosh
Session: Show and Tell 6
Pandey, Laxmi
LSTM Based Attentive Fusion of Spectral and Prosodic Information for Keyword Spotting in Hindi Language
Laxmi Pandey and Karan Nathwani
Session: Spoken Term Detection
Monoaural Audio Source Separation Using Variational Autoencoders
Laxmi Pandey, Anurendra Kumar and Vinay Namboodiri
Session: Source Separation from Monaural Input
Pandey, Prem C.
Implementation of Digital Hearing Aid as a Smartphone Application
Saketh Sharma, Nitya Tiwari and Prem C. Pandey
Session: Novel Approaches to Enhancement
Detection of Glottal Excitation Epochs in Speech Signal Using Hilbert Envelope
Hirak Dasgupta, Prem C. Pandey and K S Nataraj
Session: Signal Analysis for the Natural, Biological and Social Sciences
Pantofaru, Caroline
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
Session: Syllabification, Rhythm, and Voice Activity Detection
Papadopoulos, Pavlos
Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics
Pavlos Papadopoulos, Colin Vaz and Shrikanth Narayanan
Session: Novel Approaches to Enhancement
Combined Speaker Clustering and Role Recognition in Conversational Speech
Nikolaos Flemotomos, Pavlos Papadopoulos, James Gibson and Shrikanth Narayanan
Session: Speaker Diarization
Papangelis, Alexandros
Comparison of an End-to-end Trainable Dialogue System with a Modular Statistical Dialogue System
Norbert Braunschweiler and Alexandros Papangelis
Session: Spoken Dialogue Systems and Conversational Analysis
A Case Study on the Importance of Belief State Representation for Dialogue Policy Management
Margarita Kotti, Vassilios Diakoloukas, Alexandros Papangelis, Michail Lagoudakis and Yannis Stylianou
Session: Multimodal Dialogue Systems
Parada-Cabaleiro, Emilia
The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech
Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins and Björn Schuller
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Categorical vs Dimensional Perception of Italian Emotional Speech
Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird and Björn Schuller
Session: Emotion Recognition and Analysis
Park, Jinhwan
Hierarchical Recurrent Neural Networks for Acoustic Modeling
Jinhwan Park, Iksoo Choi, Yoonho Boo and Wonyong Sung
Session: Acoustic Modelling
Character-level Language Modeling with Gated Hierarchical Recurrent Neural Networks
Iksoo Choi, Jinhwan Park and Wonyong Sung
Session: ASR Systems and Technologies
Park, Soo Jin
Effectiveness of Voice Quality Features in Detecting Depression
Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Jonathan Flint and Abeer Alwan
Session: Integrating Speech Science and Technology for Clinical Applications
Using Voice Quality Supervectors for Affect Identification
Soo Jin Park, Amber Afshan, Zhi Ming Chua and Abeer Alwan
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Parthasarathy, Sarangarajan
What to Expect from Expected Kneser-Ney Smoothing
Michael Levit, Sarangarajan Parthasarathy and Shuangyu Chang
Session: Language Modeling
Entity-Aware Language Model as an Unsupervised Reranker
Mohammad Sadegh Rasooli and Sarangarajan Parthasarathy
Session: ASR Systems and Technologies
Parthasarathy, Srinivas
Role of Regularization in the Prediction of Valence from Speech
Kusha Sridhar, Srinivas Parthasarathy and Carlos Busso
Session: Emotion Modeling
Preference-Learning with Qualitative Agreement for Sentence Level Emotional Annotations
Srinivas Parthasarathy and Carlos Busso
Session: Speaker State and Trait
Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes
Srinivas Parthasarathy and Carlos Busso
Session: Emotion Recognition and Analysis
Patel, Tanvina
Development of Large Vocabulary Speech Recognition System with Keyword Search for Manipuri
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Speech Recognition for Indian Languages
An Automatic Speech Transcription System for Manipuri Language
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Show and Tell 6
TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages
Noor Fathima, Tanvina Patel, Mahima C and Anuroop Iyengar
Session: Low Resource Speech Recognition Challenge for Indian Languages
Patil, Hemant
Effectiveness of Speech Demodulation-Based Features for Replay Detection
Madhu Kamble, Hemlata Tak and Hemant Patil
Session: Spoofing Detection
Novel Variable Length Energy Separation Algorithm Using Instantaneous Amplitude Features for Replay Detection
Madhu Kamble and Hemant Patil
Session: Spoofing Detection
Auditory Filterbank Learning for Temporal Modulation Features in Replay Spoof Speech Detection
Hardik Sailor, Madhu Kamble and Hemant Patil
Session: Spoofing Detection
Auditory Filterbank Learning Using ConvRBM for Infant Cry Classification
Hardik B. Sailor and Hemant Patil
Session: Speech Analysis and Representation
Effectiveness of Dynamic Features in INCA and Temporal Context-INCA
Nirmesh Shah and Hemant Patil
Session: Speech Analysis and Representation
Novel Empirical Mode Decomposition Cepstral Features for Replay Spoof Detection
Prasad Tapkir and Hemant Patil
Session: Speech Analysis and Representation
Novel Linear Frequency Residual Cepstral Features for Replay Attack Detection
Hemlata Tak and Hemant Patil
Session: Speech Analysis and Representation
Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion
Nirmesh Shah, Maulik C. Madhavi and Hemant Patil
Session: Voice Conversion and Speech Synthesis
Effectiveness of Generative Adversarial Network for Non-Audible Murmur-to-Whisper Speech Conversion
Neil Shah, Nirmesh Shah and Hemant Patil
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
DA-IICT/IIITV System for Low Resource Speech Recognition Challenge 2018
Hardik B. Sailor, Maddala Venkata Siva Krishna, Diksha Chhabra, Ankur T. Patil, Madhu Kamble and Hemant Patil
Session: Low Resource Speech Recognition Challenge for Indian Languages
Petrović, Sunčica
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks
Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Sunčica Petrović, Eloise Ainger, Nicholas Cummins and Björn Schuller
Session: Speech and Language Analytics for Mental Health
Pham, Van Tung
Mandarin-English Code-switching Speech Recognition
Haihua Xu, Van Tung Pham, Zin Tun Kyaw, Zhi Hao Lim, Eng Siong Chng and Haizhou Li
Session: Show and Tell 2
Plchot, Oldřich
Dereverberation and Beamforming in Robust Far-Field Speaker Recognition
Ladislav Mošner, Oldřich Plchot, Pavel Matějka, Ondřej Novotný and Jan Černocký
Session: Dereverberation
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge
Pokorny, Florian B.
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of Results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Povey, Daniel
Acoustic Modeling from Frequency Domain Representations of Speech
Pegah Ghahremani, Hossein Hadian, Hang Lv, Daniel Povey and Sanjeev Khudanpur
Session: Robust Speech Recognition
Output-Gate Projected Gated Recurrent Unit for Speech Recognition
Gaofeng Cheng, Daniel Povey, Lu Huang, Ji Xu, Sanjeev Khudanpur and Yonghong Yan
Session: Novel Neural Network Architectures for Acoustic Modelling
A GPU-based WFST Decoder with Exact Lattice Generation
Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey and Sanjeev Khudanpur
Session: Recurrent Neural Models for ASR
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
Session: The First DIHARD Speech Diarization Challenge
Emotion Identification from Raw Speech Signals Using DNNs
Mousmita Sarma, Pegah Ghahremani, Daniel Povey, Nagendra Kumar Goel, Kandarpa Kumar Sarma and Najim Dehak
Session: Representation Learning for Emotion
End-to-end Deep Neural Network Age Estimation
Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesús Villalba, Daniel Povey, Sanjeev Khudanpur and Najim Dehak
Session: Speaker State and Trait
Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
Ke Li, Hainan Xu, Yiming Wang, Daniel Povey and Sanjeev Khudanpur
Session: Language Modeling
Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
Yingke Zhu, Tom Ko, David Snyder, Brian Mak and Daniel Povey
Session: Speaker Verification Using Neural Network Methods II
End-to-end Speech Recognition Using Lattice-free MMI
Hossein Hadian, Hossein Sameti, Daniel Povey and Sanjeev Khudanpur
Session: End-to-End Speech Recognition
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks
Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi and Sanjeev Khudanpur
Session: Acoustic Modelling
Pradhan, Gayadhar
Analysis of Variational Mode Functions for Robust Detection of Vowels
Surbhi Sakshi, Avinash Kumar and Gayadhar Pradhan
Session: Speech Analysis and Representation
Enhancement of Noisy Speech Signal by Non-Local Means Estimation of Variational Mode Functions
Nagapuri Srinivas, Gayadhar Pradhan and Syed Shahnawazuddin
Session: Novel Approaches to Enhancement
Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition
Ishwar Chandra Yadav, Avinash Kumar, Syed Shahnawazuddin and Gayadhar Pradhan
Session: Robust Speech Recognition
Prakash, Anusha
Code-switching in Indic Speech Synthesisers
Anju Leela Thomas, Anusha Prakash, Arun Baby and Hema Murthy
Session: Speech Technologies for Code-Switching in Multilingual Communities
Articulatory and Stacked Bottleneck Features for Low Resource Speech Recognition
Vishwas M. Shetty, Rini A Sharon, Basil Abraham, Tejaswi Seeram, Anusha Prakash, Nithya Ravi and S. Umesh
Session: Low Resource Speech Recognition Challenge for Indian Languages
Prasad, RaviShankar
Discriminating Nasals and Approximants in English Language Using Zero Time Windowing
RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty and Bayya Yegnanarayana
Session: Speech Segments and Voice Quality
Identification and Classification of Fricatives in Speech Using Zero Time Windowing Method
RaviShankar Prasad and Bayya Yegnanarayana
Session: Speech Segments and Voice Quality
Prasanna, S R Mahadeva
Exploration of Compressed ILPR Features for Replay Attack Detection
Sarfaraz Jelil, Sishir Kalita, S R Mahadeva Prasanna and Rohit Sinha
Session: Spoofing Detection
Robust Mizo Continuous Speech Recognition
Abhishek Dey, Biswajit Dev Sarma, Wendy Lalhminghlui, Lalnunsiami Ngente, Parismita Gogoi, Priyankoo Sarmah, S R Mahadeva Prasanna, Rohit Sinha and Nirmala S.R.
Session: Speech Recognition for Indian Languages
Spoken Keyword Detection Using Joint DTW-CNN
Ravi Shankar, Vikram C M and S R Mahadeva Prasanna
Session: Spoken Term Detection
Processing Transition Regions of Glottal Stop Substituted /S/ for Intelligibility Enhancement of Cleft Palate Speech
Protima Nomo Sudro, Sishir Kalita and S R Mahadeva Prasanna
Session: Speech and Singing Production
Estimation of Hypernasality Scores from Cleft Lip and Palate Speech
Vikram C M, Ayush Tripathi, Sishir Kalita and S R Mahadeva Prasanna
Session: Integrating Speech Science and Technology for Clinical Applications
Epoch Extraction from Pathological Children Speech Using Single Pole Filtering Approach
Vikram C M and S R Mahadeva Prasanna
Session: Measuring Pitch and Articulation
Glotto Vibrato Graph: A Device and Method for Recording, Analysis and Visualization of Glottal Activity
Kishalay Chakraborty, Senjam Shantirani Devi, Sanjeevan Devnath, S R Mahadeva Prasanna and Priyankoo Sarmah
Session: Show and Tell 6
Analysis of Breathiness in Contextual Vowel of Voiceless Nasals in Mizo
Pamir Gogoi, Sishir Kalita, Parismita Gogoi, Ratree Wayland, Priyankoo Sarmah and S R Mahadeva Prasanna
Session: Speech Segments and Voice Quality
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information
Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha, S R Mahadeva Prasanna, Priyankoo Sarmah, K Samudravijaya and Nirmala S.R.
Session: Show and Tell 7
Self-similarity Matrix Based Intelligibility Assessment of Cleft Lip and Palate Speech
Sishir Kalita, S R Mahadeva Prasanna and Samarendra Dandapat
Session: Acoustic Analysis-Synthesis of Speech Disorders
Pitch-Adaptive Front-end Feature for Hypernasality Detection
Akhilesh Kumar Dubey, S R Mahadeva Prasanna and S Dandapat
Session: Acoustic Analysis-Synthesis of Speech Disorders
Detection of Glottal Activity Errors in Production of Stop Consonants in Children with Cleft Lip and Palate
Vikram C M, S R Mahadeva Prasanna, Ajish K Abraham, Pushpavathi M and Girish K S
Session: Acoustic Analysis-Synthesis of Speech Disorders
Pugh, Robert
Improvements to an Automated Content Scoring System for Spoken CALL Responses: the ETS Submission to the Second Spoken CALL Shared Task
Keelan Evanini, Matthew Mulholland, Rutuja Ubale, Yao Qian, Robert Pugh, Vikram Ramanarayanan and Aoife Cahill
Session: Spoken CALL Shared Task, Second Edition
Qadir, Junaid
Transfer Learning for Improving Speech Emotion Classification Accuracy
Siddique Latif, Rajib Rana, Shahzad Younis, Junaid Qadir and Julien Epps
Session: Speaker State and Trait
Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study
Siddique Latif, Rajib Rana, Junaid Qadir and Julien Epps
Session: Representation Learning for Emotion
Qian, Kun
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of Results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Evolving Learning for Analysing Mood-Related Infant Vocalisation
Zixing Zhang, Jing Han, Kun Qian and Björn Schuller
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Qian, Mengjie
Overview of the 2018 Spoken CALL Shared Task
Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei
Session: Spoken CALL Shared Task, Second Edition
The CSU-K Rule-Based System for the 2nd Edition Spoken CALL Shared Task
Dominik Jülg, Mario Kunstek, Cem Philipp Freimoser, Kay Berkling and Mengjie Qian
Session: Spoken CALL Shared Task, Second Edition
The University of Birmingham 2018 Spoken CALL Shared Task Systems
Mengjie Qian, Xizi Wei, Peter Jančovič and Martin Russell
Session: Spoken CALL Shared Task, Second Edition
Phone Recognition Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs
Mengjie Qian, Linxue Bai, Peter Jančovič and Martin Russell
Session: Acoustic Modelling
Qian, Yanmin
Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks
Xuankai Chang, Yanmin Qian and Dong Yu
Session: Robust Speech Recognition
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures
Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Knowledge Distillation for Sequence Model
Mingkun Huang, Yongbin You, Zhehuai Chen, Yanmin Qian and Kai Yu
Session: Acoustic Modelling
Qian, Yao
Improvements to an Automated Content Scoring System for Spoken CALL Responses: the ETS Submission to the Second Spoken CALL Shared Task
Keelan Evanini, Matthew Mulholland, Rutuja Ubale, Yao Qian, Robert Pugh, Vikram Ramanarayanan and Aoife Cahill
Session: Spoken CALL Shared Task, Second Edition
Quatieri, Thomas
Vocal Biomarkers for Cognitive Performance Estimation in a Working Memory Task
Jennifer Sloboda, Adam Lammert, James Williamson, Christopher Smalt, Daryush D. Mehta, COL Ian Curry, Kristin Heaton, Jeffrey Palmer and Thomas Quatieri
Session: Speaker Characterization and Analysis
The Effect of Exposure to High Altitude and Heat on Speech Articulatory Coordination
James Williamson, Thomas Quatieri, Adam Lammert, Katherine Mitchell, Katherine Finkelstein, Nicole Ekon, Caitlin Dillon, Robert Kenefick and Kristin Heaton
Session: Speaker State and Trait
Ragni, Anton
Impact of ASR Performance on Free Speaking Language Assessment
Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang and Andrew Caines
Session: Applications in Education and Learning
Automatic Speech Recognition System Development in the "Wild"
Anton Ragni and Mark Gales
Session: Recurrent Neural Models for ASR
Active Memory Networks for Language Modeling
Oscar Chen, Anton Ragni, Mark Gales and Xie Chen
Session: Language Modeling
Rajan, Padmanabhan
All-Conv Net for Bird Activity Detection: Significance of Learned Pooling
Arjun Pankajakshan, Anshul Thakur, Daksh Thapar, Padmanabhan Rajan and Aditya Nigam
Session: Signal Analysis for the Natural, Biological and Social Sciences
Deep Convex Representations: Feature Representations for Bioacoustics Classification
Anshul Thakur, Vinayak Abrol, Pulkit Sharma and Padmanabhan Rajan
Session: Signal Analysis for the Natural, Biological and Social Sciences
Rallabandi, SaiKrishna
Investigating Utterance Level Representations for Detecting Intent from Acoustics
SaiKrishna Rallabandi, Bhavya Karki, Carla Viegas, Eric Nyberg and Alan W Black
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
An Investigation of Convolution Attention Based Models for Multilingual Speech Synthesis of Indian Languages
Pallavi Baljekar, SaiKrishna Rallabandi and Alan W Black
Session: Speech Synthesis Paradigms and Methods
Ram, Dhananjay
CNN Based Query by Example Spoken Term Detection
Dhananjay Ram, Lesly Miculicich and Hervé Bourlard
Session: Spoken Term Detection
Phonological Posterior Hashing for Query by Example Spoken Term Detection
Afsaneh Asaei, Dhananjay Ram and Hervé Bourlard
Session: Extracting Information from Audio
Ramabhadran, Bhuvana
Open Problems in Speech Recognition
Bhuvana Ramabhadran
Session: Perspective Talk-2
Data Augmentation Improves Recognition of Foreign Accented Speech
Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin and Gakuto Kurata
Session: Adjusting to Speaker, Accent, and Domain
Ramachandran, Samith
Industry presentation by Uniphore
Samith Ramachandran
Session: Industry Presentation-3
Ramanarayanan, Vikram
Game-based Spoken Dialog Language Learning Applications for Young Students
Keelan Evanini, Veronika Timpe-Laughlin, Eugene Tsuprun, Ian Blood, Jeremy Lee, James Bruno, Vikram Ramanarayanan, Patrick Lange and David Suendermann-Oeft
Session: Show and Tell 2
FACTS: A Hierarchical Task-based Control Model of Speech Incorporating Sensory Feedback
Benjamin Parrell, Vikram Ramanarayanan, Srikantan Nagarajan and John Houde
Session: Speech and Singing Production
Toward Scalable Dialog Technology for Conversational Language Learning: Case Study of the TOEFL® MOOC
Vikram Ramanarayanan, David Pautler, Patrick Lange, Eugene Tsuprun, Rutuja Ubale, Keelan Evanini and David Suendermann-Oeft
Session: Show and Tell 5
Improvements to an Automated Content Scoring System for Spoken CALL Responses: the ETS Submission to the Second Spoken CALL Shared Task
Keelan Evanini, Matthew Mulholland, Rutuja Ubale, Yao Qian, Robert Pugh, Vikram Ramanarayanan and Aoife Cahill
Session: Spoken CALL Shared Task, Second Edition
Ramasubramanian, V
Indian Languages ASR: A Multilingual Phone Recognition Framework with IPA Based Common Phone-set, Predicted Articulatory Features and Feature Fusion
Manjunath K E, K. Sreenivasa Rao, Dinesh Babu Jayagopi and V Ramasubramanian
Session: Speech Recognition for Indian Languages
Semi-supervised and Active-learning Scenarios: Efficient Acoustic Model Refinement for a Low Resource Indian Language
Maharajan Chellapriyadharshini, Anoop Toffy, Srinivasa Raghavan K. M. and V Ramasubramanian
Session: Speech Recognition for Indian Languages
Rana, Rajib
Transfer Learning for Improving Speech Emotion Classification Accuracy
Siddique Latif, Rajib Rana, Shahzad Younis, Junaid Qadir and Julien Epps
Session: Speaker State and Trait
Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study
Siddique Latif, Rajib Rana, Junaid Qadir and Julien Epps
Session: Representation Learning for Emotion
Rao MV, Achuth
Reconstructing Neutral Speech from Tracheoesophageal Speech
Abinay Reddy N, Achuth Rao MV, G. Nisha Meenakshi and Prasanta Kumar Ghosh
Session: Speech and Singing Production
Automatic Glottis Localization and Segmentation in Stroboscopic Videos Using Deep Neural Network
Achuth Rao MV, Rahul Krishnamurthy, Pebbili Gopikishore, Veeramani Priyadharshini and Prasanta Kumar Ghosh
Session: Source and Supra-segmentals
Rao, K. Sreenivasa
Analysis of Sparse Representation Based Feature on Speech Mode Classification
Kumud Tripathi and K. Sreenivasa Rao
Session: Speech Analysis and Representation
Harmonic-Percussive Source Separation of Polyphonic Music by Suppressing Impulsive Noise Events
Gurunath Reddy M, K. Sreenivasa Rao and Partha Pratim Das
Session: Source Separation and Spatial Analysis
Indian Languages ASR: A Multilingual Phone Recognition Framework with IPA Based Common Phone-set, Predicted Articulatory Features and Feature Fusion
Manjunath K E, K. Sreenivasa Rao, Dinesh Babu Jayagopi and V Ramasubramanian
Session: Speech Recognition for Indian Languages
Classification of Disorders in Vocal Folds Using Electroglottographic Signal
Tanumay Mandal, K. Sreenivasa Rao and Sanjay Kumar Gupta
Session: Source and Supra-segmentals
Rao, Preeti
Acoustic-Prosodic Features of Tabla Bol Recitation and Correspondence with the Tabla Imitation
Rohit M A and Preeti Rao
Session: Syllabification, Rhythm, and Voice Activity Detection
A Non-convolutive NMF Model for Speech Dereverberation
Nikhil M, Rajbabu Velmurugan and Preeti Rao
Session: Dereverberation
Automatic Detection of Expressiveness in Oral Reading
Kamini Sabu, Kanhaiya Kumar and Preeti Rao
Session: Show and Tell 4
A Study of Lexical and Prosodic Cues to Segmentation in a Hindi-English Code-switched Discourse
Preeti Rao, Mugdha Pandya, Kamini Sabu, Kanhaiya Kumar and Nandini Bondale
Session: Speech Technologies for Code-Switching in Multilingual Communities
Rastrow, Ariya
Device-directed Utterance Detection
Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas and Björn Hoffmeister
Session: Syllabification, Rhythm, and Voice Activity Detection
Contextual Language Model Adaptation for Conversational Agents
Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh and Ariya Rastrow
Session: Language Modeling
Rathner, Eva-Maria
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of Results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
State of Mind: Classification through Self-reported Affect and Word Use in Speech
Eva-Maria Rathner, Yannik Terhorst, Nicholas Cummins, Björn Schuller and Harald Baumeister
Session: Speaker State and Trait
How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives
Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe and Harald Baumeister
Session: Speech Pathology, Depression, and Medical Applications
Rayner, Manny
A Robust Context-Dependent Speech-to-Speech Phraselator Toolkit for Alexa
Manny Rayner, Nikos Tsourakis and Jan Stanek
Session: Show and Tell 1
Overview of the 2018 Spoken CALL Shared Task
Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei
Session: Spoken CALL Shared Task, Second Edition
Reale, Nathan
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
Session: Syllabification, Rhythm, and Voice Activity Detection
Ren, Zhao
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of Results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech
Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval and Björn Schuller
Session: Representation Learning for Emotion
Renduchintala, Adithya
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
Multi-Modal Data Augmentation for End-to-end ASR
Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner and Shinji Watanabe
Session: Adjusting to Speaker, Accent, and Domain
Renkens, Vincent
Capsule Networks for Low Resource Spoken Language Understanding
Vincent Renkens and Hugo van Hamme
Session: Spoken Dialogue Systems and Conversational Analysis
State Gradients for RNN Memory Analysis
Lyan Verwimp, Hugo van Hamme, Vincent Renkens and Patrick Wambacq
Session: Deep Neural Networks: How Can We Interpret What They Learned?
Riad, Rachid
Sampling Strategies in Siamese Networks for Unsupervised Speech Representation Learning
Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz and Emmanuel Dupoux
Session: Topics in Speech Recognition
Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments
Nils Holzenberger, Mingxing Du, Julien Karadayi, Rachid Riad and Emmanuel Dupoux
Session: Zero-resource Speech Recognition
Richey, Colleen
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson and Martin Graciarena
Session: Speaker Verification II
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Ridouane, Rachid
Length Contrast and Covarying Features: Whistled Speech as a Case Study
Rachid Ridouane, Giuseppina Turco and Julien Meyer
Session: Production of Prosody
Structural Effects on Properties of Consonantal Gestures in Tashlhiyt
Anne Hermes, Doris Mücke, Bastian Auris and Rachid Ridouane
Session: Speech Segments and Voice Quality
Rohdin, Johan
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge
Roth, Joseph
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
Session: Syllabification, Rhythm, and Voice Activity Detection
Russell, Martin
Exploring How Phone Classification Neural Networks Learn Phonetic Information by Visualising and Interpreting Bottleneck Features
Linxue Bai, Philip Weber, Peter Jančovič and Martin Russell
Session: Deep Neural Networks: How Can We Interpret What They Learned?
Overview of the 2018 Spoken CALL Shared Task
Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei
Session: Spoken CALL Shared Task, Second Edition
The University of Birmingham 2018 Spoken CALL Shared Task Systems
Mengjie Qian, Xizi Wei, Peter Jančovič and Martin Russell
Session: Spoken CALL Shared Task, Second Edition
Analysis of Phone Errors Attributable to Phonological Effects Associated With Language Acquisition Through Bottleneck Feature Visualisations
Eva Fringi and Martin Russell
Session: Second Language Acquisition and Code-switching
Phone Recognition Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs
Mengjie Qian, Linxue Bai, Peter Jančovič and Martin Russell
Session: Acoustic Modelling
Räsänen, Okko
Time-regularized Linear Prediction for Noise-robust Extraction of the Spectral Envelope of Speech
Manu Airaksinen, Lauri Juvela, Okko Räsänen and Paavo Alku
Session: Speech Analysis and Representation
Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions
Okko Räsänen, Shreyas Seshadri and Marisa Casillas
Session: Syllabification, Rhythm, and Voice Activity Detection
S, Chandana
Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training
Chandana S, Chiranjeevi Yarra, Ritu Aggarwal, Sanjeev Kumar Mittal, Kausthubha N K, Raseena K T, Astha Singh and Prasanta Kumar Ghosh
Session: Articulatory Information, Modeling and Inversion
S.R., Nirmala
Robust Mizo Continuous Speech Recognition
Abhishek Dey, Biswajit Dev Sarma, Wendy Lalhminghlui, Lalnunsiami Ngente, Parismita Gogoi, Priyankoo Sarmah, S R Mahadeva Prasanna, Rohit Sinha and Nirmala S.R.
Session: Speech Recognition for Indian Languages
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information
Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha, S R Mahadeva Prasanna, Priyankoo Sarmah, K Samudravijaya and Nirmala S.R.
Session: Show and Tell 7
Sabu, Kamini
Automatic Detection of Expressiveness in Oral Reading
Kamini Sabu, Kanhaiya Kumar and Preeti Rao
Session: Show and Tell 4
A Study of Lexical and Prosodic Cues to Segmentation in a Hindi-English Code-switched Discourse
Preeti Rao, Mugdha Pandya, Kamini Sabu, Kanhaiya Kumar and Nandini Bondale
Session: Speech Technologies for Code-Switching in Multilingual Communities
Sad, Gonzalo
A French-Spanish Multimodal Speech Communication Corpus Incorporating Acoustic Data, Facial, Hands and Arms Gestures Information
Lucas D. Terissi, Gonzalo Sad, Mauricio Cerda, Slim Ouni, Rodrigo Galvez, Juan C. Gómez, Bernard Girau and Nancy Hitschfeld-Kahler
Session: Spoken Corpora and Annotation
Sailor, Hardik B.
Auditory Filterbank Learning Using ConvRBM for Infant Cry Classification
Hardik B. Sailor and Hemant Patil
Session: Speech Analysis and Representation
DA-IICT/IIITV System for Low Resource Speech Recognition Challenge 2018
Hardik B. Sailor, Maddala Venkata Siva Krishna, Diksha Chhabra, Ankur T. Patil, Madhu Kamble and Hemant Patil
Session: Low Resource Speech Recognition Challenge for Indian Languages
Sainath, Tara
Compression of End-to-End Models
Ruoming Pang, Tara Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang and Chung-Cheng Chiu
Session: End-to-End Speech Recognition
Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition
Khe Chai Sim, Arun Narayanan, Ananya Misra, Anshuman Tripathi, Golan Pundak, Tara Sainath, Parisa Haghani, Bo Li and Michiel Bacchiani
Session: Acoustic Model Adaptation
Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search
Ian Williams, Anjuli Kannan, Petar Aleksic, David Rybach and Tara Sainath
Session: Recurrent Neural Models for ASR
Saito, Daisuke
A Comparative Study of Statistical Conversion of Face to Voice Based on Their Subjective Impressions
Yasuhito Ohsugi, Daisuke Saito and Nobuaki Minematsu
Session: Multimodal Dialogue Systems
A Study of Objective Measurement of Comprehensibility through Native Speakers' Shadowing of Learners' Utterances
Yusuke Inoue, Suguru Kabashima, Daisuke Saito, Nobuaki Minematsu, Kumi Kanamura and Yutaka Yamauchi
Session: Applications in Education and Learning
Sak, Hasim
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Sakai, Shinsuke
Forward-Backward Attention Decoder
Masato Mimura, Shinsuke Sakai and Tatsuya Kawahara
Session: Recurrent Neural Models for ASR
Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition
Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono and Tatsuya Kawahara
Session: Adjusting to Speaker, Accent, and Domain
Sakti, Sakriani
Compressing End-to-end ASR Networks by Tensor-Train Decomposition
Takuma Mori, Andros Tjandra, Sakriani Sakti and Satoshi Nakamura
Session: Sequence Models for ASR
Machine Speech Chain with One-shot Speaker Adaptation
Andros Tjandra, Sakriani Sakti and Satoshi Nakamura
Session: Acoustic Model Adaptation
Incremental TTS for Japanese Language
Tomoya Yanagita, Sakriani Sakti and Satoshi Nakamura
Session: Statistical Parametric Speech Synthesis
Salamon, Gudrun
How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives
Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe and Harald Baumeister
Session: Speech Pathology, Depression, and Medical Applications
Samudravijaya, K
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information
Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha, S R Mahadeva Prasanna, Priyankoo Sarmah, K Samudravijaya and Nirmala S.R.
Session: Show and Tell 7
San Segundo Fernández, Eugenia
The Individual and the System: Assessing the Stability of the Output of a Semi-automatic Forensic Voice Comparison System
Vincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh and Eugenia San Segundo Fernández
Session: Speech Segments and Voice Quality
Sangwan, Abhijeet
Fearless Steps: Apollo-11 Corpus Advancements for Speech Technologies from Earth to the Moon
John H.L. Hansen, Abhijeet Sangwan, Aditya Joglekar, Ahmet E. Bulut, Lakshmish Kaushik and Chengzhu Yu
Session: Spoken Corpora and Annotation
Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams
Harishchandra Dubey, Abhijeet Sangwan and John H.L. Hansen
Session: Speaker Verification Using Neural Network Methods II
Sankar, Ananth
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Sarma, Mousmita
Extracting Speaker’s Gender, Accent, Age and Emotional State from Speech
Nagendra Goel, Mousmita Sarma, Tejendra Kushwah, Dharmesh Agarwal, Zikra Iqbal and Surbhi Chauhan
Session: Show and Tell 6
Emotion Identification from Raw Speech Signals Using DNNs
Mousmita Sarma, Pegah Ghahremani, Daniel Povey, Nagendra Kumar Goel, Kandarpa Kumar Sarma and Najim Dehak
Session: Representation Learning for Emotion
Sarmah, Priyankoo
Robust Mizo Continuous Speech Recognition
Abhishek Dey, Biswajit Dev Sarma, Wendy Lalhminghlui, Lalnunsiami Ngente, Parismita Gogoi, Priyankoo Sarmah, S R Mahadeva Prasanna, Rohit Sinha and Nirmala S.R.
Session: Speech Recognition for Indian Languages
Glotto Vibrato Graph: A Device and Method for Recording, Analysis and Visualization of Glottal Activity
Kishalay Chakraborty, Senjam Shantirani Devi, Sanjeevan Devnath, S R Mahadeva Prasanna and Priyankoo Sarmah
Session: Show and Tell 6
Analysis of Breathiness in Contextual Vowel of Voiceless Nasals in Mizo
Pamir Gogoi, Sishir Kalita, Parismita Gogoi, Ratree Wayland, Priyankoo Sarmah and S R Mahadeva Prasanna
Session: Speech Segments and Voice Quality
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information
Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha, S R Mahadeva Prasanna, Priyankoo Sarmah, K Samudravijaya and Nirmala S.R.
Session: Show and Tell 7
Scharenborg, Odette
Visualizing Phoneme Category Adaptation in Deep Neural Networks
Odette Scharenborg, Sebastian Tiesmeyer, Mark Hasegawa-Johnson and Najim Dehak
Session: Deep Neural Networks: How Can We Interpret What They Learned?
Articulatory Feature Classification Using Convolutional Neural Networks
Danny Merkx and Odette Scharenborg
Session: Signal Analysis for the Natural, Biological and Social Sciences
The Conversation Continues: the Effect of Lyrics and Music Complexity of Background Music on Spoken-Word Recognition
Odette Scharenborg and Martha Larson
Session: Speech Perception in Adverse Conditions
Schiller, Dominik
Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?
Johannes Wagner, Dominik Schiller, Andreas Seiderer and Elisabeth André
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Schlüter, Ralf
Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition
Eugen Beck, Mirko Hannemann, Patrick Dötsch, Ralf Schlüter and Hermann Ney
Session: Sequence Models for ASR
Comparison of BLSTM-Layer-Specific Affine Transformations for Speaker Adaptation
Markus Kitza, Ralf Schlüter and Hermann Ney
Session: Acoustic Model Adaptation
Improved Training of End-to-end Attention Models for Speech Recognition
Albert Zeyer, Kazuki Irie, Ralf Schlüter and Hermann Ney
Session: End-to-End Speech Recognition
Investigation on LSTM Recurrent N-gram Language Models for Speech Recognition
Zoltán Tüske, Ralf Schlüter and Hermann Ney
Session: Language Modeling
Investigation on Estimation of Sentence Probability by Combining Forward, Backward and Bi-directional LSTM-RNNs
Kazuki Irie, Zhihong Lei, Liuhui Deng, Ralf Schlüter and Hermann Ney
Session: ASR Systems and Technologies
Schmitt, Maximilian
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech
Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval and Björn Schuller
Session: Representation Learning for Emotion
Schroeter, Julien
Computational Paralinguistics: Automatic Assessment of Emotions, Mood and Behavioural State from Acoustics of Speech
Zafi Sherhan Syed, Julien Schroeter, Kirill Sidorov and David Marshall
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Schuller, Björn
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Towards Temporal Modelling of Categorical Speech Emotion Recognition
Wenjing Han, Huabin Ruan, Xiaomin Chen, Zhixiang Wang, Haifeng Li and Björn Schuller
Session: Emotion Modeling
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Evolving Learning for Analysing Mood-Related Infant Vocalisation
Zixing Zhang, Jing Han, Kun Qian and Björn Schuller
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks
Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Sunčica Petrović, Eloise Ainger, Nicholas Cummins and Björn Schuller
Session: Speech and Language Analytics for Mental Health
Automated Classification of Children’s Linguistic versus Non-Linguistic Vocalisations
Zixing Zhang, Alejandrina Cristia, Anne Warlaumont and Björn Schuller
Session: Second Language Acquisition and Code-switching
The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech
Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins and Björn Schuller
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
State of Mind: Classification through Self-reported Affect and Word Use in Speech
Eva-Maria Rathner, Yannik Terhorst, Nicholas Cummins, Björn Schuller and Harald Baumeister
Session: Speaker State and Trait
Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech
Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval and Björn Schuller
Session: Representation Learning for Emotion
How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives
Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe and Harald Baumeister
Session: Speech Pathology, Depression, and Medical Applications
Annotator Trustability-based Cooperative Learning Solutions for Intelligent Audio Analysis
Simone Hantke, Christoph Stemp and Björn Schuller
Session: Multimodal Systems
Categorical vs Dimensional Perception of Italian Emotional Speech
Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird and Björn Schuller
Session: Emotion Recognition and Analysis
Schultz, Tanja
Investigating the Effect of Audio Duration on Dementia Detection Using Acoustic Features
Jochen Weiner, Miguel Angrick, Srinivasan Umesh and Tanja Schultz
Session: Speech and Language Analytics for Mental Health
Investigating Objective Intelligibility in Real-Time EMG-to-Speech Conversion
Lorenz Diener and Tanja Schultz
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Domain-Adversarial Training for Session Independent EMG-based Speech Recognition
Michael Wand, Tanja Schultz and Jürgen Schmidhuber
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Schwartz, Jean-Luc
Picture Naming or Word Reading: Does the Modality Affect Speech Motor Adaptation and Its Transfer?
Tiphaine Caudrelier, Pascal Perrier, Jean-Luc Schwartz and Amélie Rochet-Capellan
Session: Models of Speech Perception
COSMO SylPhon: A Bayesian Perceptuo-motor Model to Assess Phonological Learning
Marie-Lou Barnaud, Julien Diard, Pierre Bessière and Jean-Luc Schwartz
Session: Speech and Speaker Perception
Sebastian, Jilt
Implementing Fusion Techniques for the Classification of Paralinguistic Information
Bogdan Vlasenko, Jilt Sebastian, Pavan Kumar D. S. and Mathew Magimai.-Doss
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech
Jilt Sebastian, Manoj Kumar, Pavan Kumar D. S., Mathew Magimai.-Doss, Hema Murthy and Shrikanth Narayanan
Session: Speaker State and Trait
Seelamantula, Chandra Sekhar
Multicomponent 2-D AM-FM Modeling of Speech Spectrograms
Jitendra Kumar Dhiman, Neeraj Sharma and Chandra Sekhar Seelamantula
Session: Speech Analysis and Representation
An Optimization Framework for Recovery of Speech from Phase-Encoded Spectrograms
Abhilash Sainathan, Sunil Rudresh and Chandra Sekhar Seelamantula
Session: Speech Analysis and Representation
Speech Enhancement Using the Minimum-probability-of-error Criterion
Jishnu Sadasivan, Subhadip Mukherjee and Chandra Sekhar Seelamantula
Session: Novel Approaches to Enhancement
Seiderer, Andreas
Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?
Johannes Wagner, Dominik Schiller, Andreas Seiderer and Elisabeth André
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Sell, Gregory
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
Session: The First DIHARD Speech Diarization Challenge
Sethu, Vidhyasaharan
Deep Siamese Architecture Based Replay Detection for Secure Voice Biometric
Kaavya Sriskandaraja, Vidhyasaharan Sethu and Eliathamby Ambikairajah
Session: Spoofing Detection
Modulation Dynamic Features for the Detection of Replay Attacks
Gajan Suthokumar, Vidhyasaharan Sethu, Chamith Wijenayake and Eliathamby Ambikairajah
Session: Spoofing Detection
Sub-band Envelope Features Using Frequency Domain Linear Prediction for Short Duration Language Identification
Sarith Fernando, Vidhyasaharan Sethu and Eliathamby Ambikairajah
Session: Language Identification
Demonstrating and Modelling Systematic Time-varying Annotator Disagreement in Continuous Emotion Annotation
Mia Atcheson, Vidhyasaharan Sethu and Julien Epps
Session: Emotion Recognition and Analysis
Shafaei-Bajestan, Elnaz
Wide Learning for Auditory Comprehension
Elnaz Shafaei-Bajestan and R. Harald Baayen
Session: Models of Speech Perception
Shah, Nirmesh
Effectiveness of Dynamic Features in INCA and Temporal Context-INCA
Nirmesh Shah and Hemant Patil
Session: Speech Analysis and Representation
Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion
Nirmesh Shah, Maulik C. Madhavi and Hemant Patil
Session: Voice Conversion and Speech Synthesis
Effectiveness of Generative Adversarial Network for Non-Audible Murmur-to-Whisper Speech Conversion
Neil Shah, Nirmesh Shah and Hemant Patil
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Shah, Nisar
Development of Large Vocabulary Speech Recognition System with Keyword Search for Manipuri
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Speech Recognition for Indian Languages
An Automatic Speech Transcription System for Manipuri Language
Tanvina Patel, Krishna D N, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar and Anuroop Iyengar
Session: Show and Tell 6
Shahnawazuddin, Syed
Enhancement of Noisy Speech Signal by Non-Local Means Estimation of Variational Mode Functions
Nagapuri Srinivas, Gayadhar Pradhan and Syed Shahnawazuddin
Session: Novel Approaches to Enhancement
Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition
Ishwar Chandra Yadav, Avinash Kumar, Syed Shahnawazuddin and Gayadhar Pradhan
Session: Robust Speech Recognition
Shapiro, Avi
Attention-based Sequence Classification for Affect Detection
Cristina Gorrostieta, Richard Brutti, Kye Taylor, Avi Shapiro, Joseph Moran, Ali Azarbayejani and John Kane
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Sharma, Dravyansh
On Training and Evaluation of Grapheme-to-Phoneme Mappings with Limited Data
Dravyansh Sharma
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Dictionary Augmented Sequence-to-Sequence Neural Network for Grapheme to Phoneme Prediction
Antoine Bruguier, Anton Bakhtin and Dravyansh Sharma
Session: Acoustic Modelling
Sharma, Pulkit
Deep Convex Representations: Feature Representations for Bioacoustics Classification
Anshul Thakur, Vinayak Abrol, Pulkit Sharma and Padmanabhan Rajan
Session: Signal Analysis for the Natural, Biological and Social Sciences
ASe: Acoustic Scene Embedding Using Deep Archetypal Analysis and GMM
Pulkit Sharma, Vinayak Abrol and Anshul Thakur
Session: Acoustic Scenes and Rare Events
Sharon, Rini A
Correlational Networks for Speaker Normalization in Automatic Speech Recognition
Rini A Sharon, Sandeep Reddy Kothinti and Umesh Srinivasan
Session: Acoustic Model Adaptation
Articulatory and Stacked Bottleneck Features for Low Resource Speech Recognition
Vishwas M. Shetty, Rini A Sharon, Basil Abraham, Tejaswi Seeram, Anusha Prakash, Nithya Ravi and S. Umesh
Session: Low Resource Speech Recognition Challenge for Indian Languages
Shechtman, Slava
The IBM Virtual Voice Creator
Alexander Sorin, Slava Shechtman, Zvi Kons, Ron Hoory, Shay Ben-David, Joe Pavitt, Shai Rozenberg, Carmel Rabinovitz and Tal Drory
Session: Show and Tell 2
Word Emphasis Prediction for Expressive Text to Speech
Yosi Mass, Slava Shechtman, Moran Mordechay, Ron Hoory, Oren Sar Shalom, Guy Lev and David Konopnicki
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Shen, Peng
Temporal Attentive Pooling for Acoustic Event Detection
Xugang Lu, Peng Shen, Sheng Li, Yu Tsao and Hisashi Kawai
Session: Audio Events and Acoustic Scenes
Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification
Peng Shen, Xugang Lu, Sheng Li and Hisashi Kawai
Session: Language Identification
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara and Hisashi Kawai
Session: Acoustic Modelling
Shen, Yilin
Training Recurrent Neural Network through Moment Matching for NLP Applications
Yue Deng, Yilin Shen, KaWai Chen and Hongxia Jin
Session: Language Modeling
A Deep Reinforcement Learning Based Multimodal Coaching Model (DCM) for Slot Filling in Spoken Language Understanding (SLU)
Yu Wang, Abhishek Patel, Yilin Shen and Hongxia Jin
Session: Spoken Language Understanding
Robust Spoken Language Understanding via Paraphrasing
Avik Ray, Yilin Shen and Hongxia Jin
Session: Spoken Language Understanding
User Information Augmented Semantic Frame Parsing Using Progressive Neural Networks
Yilin Shen, Xiangyu Zeng, Yu Wang and Hongxia Jin
Session: Spoken Language Understanding
Shi, Ziqiang
Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings
Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
Session: Speaker Verification I
Joint Learning of J-Vector Extractor and Joint Bayesian Model for Text Dependent Speaker Verification
Ziqiang Shi, Liu Liu, Huibin Lin and Rujie Liu
Session: Speaker Verification II
Latent Factor Analysis of Deep Bottleneck Features for Speaker Verification with Random Digit Strings
Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
Session: Speaker Verification II
Shinoda, Koichi
Detecting Alzheimer’s Disease Using Gated Convolutional Neural Network from Audio Data
Tifani Warnita, Nakamasa Inoue and Koichi Shinoda
Session: Integrating Speech Science and Technology for Clinical Applications
Attentive Statistics Pooling for Deep Speaker Embedding
Koji Okabe, Takafumi Koshinaka and Koichi Shinoda
Session: Speaker Verification Using Neural Network Methods I
I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification
Jiacen Zhang, Nakamasa Inoue and Koichi Shinoda
Session: Speaker Verification Using Neural Network Methods II
Shinohara, Yusuke
Automatic DNN Node Pruning Using Mixture Distribution-based Group Regularization
Tsukasa Yoshida, Takafumi Moriya, Kazuho Watanabe, Yusuke Shinohara, Yoshikazu Yamaguchi and Yushi Aono
Session: Selected Topics in Neural Speech Processing
Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition
Takafumi Moriya, Sei Ueno, Yusuke Shinohara, Marc Delcroix, Yoshikazu Yamaguchi and Yushi Aono
Session: Adjusting to Speaker, Accent, and Domain
Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition
Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono and Tatsuya Kawahara
Session: Adjusting to Speaker, Accent, and Domain
Shlykov, Andrei
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
Sidorov, Kirill
Computational Paralinguistics: Automatic Assessment of Emotions, Mood and Behavioural State from Acoustics of Speech
Zafi Sherhan Syed, Julien Schroeter, Kirill Sidorov and David Marshall
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Silnova, Anna
Fast Variational Bayes for Heavy-tailed PLDA Applied to i-vectors and x-vectors
Anna Silnova, Niko Brümmer, Daniel Garcia-Romero, David Snyder and Lukáš Burget
Session: Speaker Verification I
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge
Singh, Astha
Relating Articulatory Motions in Different Speaking Rates
Astha Singh, G. Nisha Meenakshi and Prasanta Kumar Ghosh
Session: Source and Supra-segmentals
Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training
Chandana S, Chiranjeevi Yarra, Ritu Aggarwal, Sanjeev Kumar Mittal, Kausthubha N K, Raseena K T, Astha Singh and Prasanta Kumar Ghosh
Session: Articulatory Information, Modeling and Inversion
Sinha, Rohit
Exploration of Compressed ILPR Features for Replay Attack Detection
Sarfaraz Jelil, Sishir Kalita, S R Mahadeva Prasanna and Rohit Sinha
Session: Spoofing Detection
Robust Mizo Continuous Speech Recognition
Abhishek Dey, Biswajit Dev Sarma, Wendy Lalhminghlui, Lalnunsiami Ngente, Parismita Gogoi, Priyankoo Sarmah, S R Mahadeva Prasanna, Rohit Sinha and Nirmala S.R.
Session: Speech Recognition for Indian Languages
A Novel Approach for Effective Recognition of the Code-Switched Data on Monolingual Language Model
Sreeram Ganji and Rohit Sinha
Session: Speech Technologies for Code-Switching in Multilingual Communities
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information
Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha, S R Mahadeva Prasanna, Priyankoo Sarmah, K Samudravijaya and Nirmala S.R.
Session: Show and Tell 7
Sinyavskaya, Yadviga
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
Sisman, Berrak
Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion
Berrak Sisman and Haizhou Li
Session: Prosody Modeling and Generation
A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder
Berrak Sisman, Mingyang Zhang and Haizhou Li
Session: Voice Conversion and Speech Synthesis
Sitaram, Sunayana
Effect of TTS Generated Audio on OOV Detection and Word Error Rate in ASR for Low-resource Languages
Savitha Murthy, Dinkar Sitaram and Sunayana Sitaram
Session: Speech Recognition for Indian Languages
Homophone Identification and Merging for Code-switched Speech Recognition
Brij Mohan Lal Srivastava and Sunayana Sitaram
Session: Speech Technologies for Code-Switching in Multilingual Communities
Sivaraman, Ganesh
Noise Robust Acoustic to Articulatory Speech Inversion
Nadee Seneviratne, Ganesh Sivaraman, Vikramjit Mitra and Carol Espy-Wilson
Session: Articulatory Information, Modeling and Inversion
Speech Synthesis in the Wild
Ganesh Sivaraman, Parav Nagarsheth and Elie Khoury
Session: Show and Tell 7
Snyder, David
Fast Variational Bayes for Heavy-tailed PLDA Applied to i-vectors and x-vectors
Anna Silnova, Niko Brümmer, Daniel Garcia-Romero, David Snyder and Lukáš Burget
Session: Speaker Verification I
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
Session: The First DIHARD Speech Diarization Challenge
Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
Yingke Zhu, Tom Ko, David Snyder, Brian Mak and Daniel Povey
Session: Speaker Verification Using Neural Network Methods II
Song, Eunwoo
A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems
Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim and Hong-Goo Kang
Session: Statistical Parametric Speech Synthesis
Acoustic Modeling Using Adversarially Trained Variational Recurrent Neural Network for Speech Synthesis
Joun Yeop Lee, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim and Eunwoo Song
Session: Statistical Parametric Speech Synthesis
Song, Yan
Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification
Lanhua You, Wu Guo, Yan Song and Sheng Zhang
Session: Speaker Verification I
Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition
Jian Tang, Yan Song, Lirong Dai and Ian McLoughlin
Session: Novel Neural Network Architectures for Acoustic Modelling
An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition
Pengcheng Li, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
Session: Representation Learning for Emotion
Early Detection of Continuous and Partial Audio Events Using CNN
Ian McLoughlin, Yan Song, Lam Dang Pham, Ramaswamy Palaniappan, Huy Phan and Yue Lang
Session: Acoustic Scenes and Rare Events
An Improved Deep Embedding Learning Method for Short Duration Speaker Verification
Zhifu Gao, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
Session: Speaker Verification Using Neural Network Methods II
Sorin, Alexander
The IBM Virtual Voice Creator
Alexander Sorin, Slava Shechtman, Zvi Kons, Ron Hoory, Shay Ben-David, Joe Pavitt, Shai Rozenberg, Carmel Rabinovitz and Tal Drory
Session: Show and Tell 2
Data Augmentation Improves Recognition of Foreign Accented Speech
Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin and Gakuto Kurata
Session: Adjusting to Speaker, Accent, and Domain
Sperber, Matthias
Low-Latency Neural Speech Translation
Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber and Alex Waibel
Session: Selected Topics in Neural Speech Processing
Self-Attentional Acoustic Models
Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker and Alex Waibel
Session: Acoustic Modelling
Stauffer, Allen
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson and Martin Graciarena
Session: Speaker Verification II
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Steidl, Stefan
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of Results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Stephenson, Cory
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Deep Speech Denoising with Vector Space Projections
Jeffrey Hetherly, Paul Gamble, Maria Alejandra Barrios, Cory Stephenson and Karl Ni
Session: Source Separation from Monaural Input
Stylianou, Yannis
A Case Study on the Importance of Belief State Representation for Dialogue Policy Management
Margarita Kotti, Vassilios Diakoloukas, Alexandros Papangelis, Michail Lagoudakis and Yannis Stylianou
Session: Multimodal Dialogue Systems
Weighting Time-Frequency Representation of Speech Using Auditory Saliency for Automatic Speech Recognition
Cong-Thanh Do and Yannis Stylianou
Session: Robust Speech Recognition
Speech Intelligibility Enhancement Based on a Non-causal Wavenet-like Model
Muhammed Shifas PV, Vassilis Tsiaras and Yannis Stylianou
Session: Speech Intelligibility and Quality
Stüker, Sebastian
Term Extraction via Neural Sequence Labeling: A Comparative Evaluation of Strategies Using Recurrent Neural Networks
Maren Kucza, Jan Niehues, Thomas Zenkel, Alex Waibel and Sebastian Stüker
Session: Extracting Information from Audio
Neural Language Codes for Multilingual Acoustic Models
Markus Müller, Sebastian Stüker and Alex Waibel
Session: Adjusting to Speaker, Accent, and Domain
Self-Attentional Acoustic Models
Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker and Alex Waibel
Session: Acoustic Modelling
Su, Bo-Hao
Self-Assessed Affect Recognition Using Fusion of Attentional BLSTM and Static Acoustic Features
Bo-Hao Su, Sung-Lin Yeh, Ming-Ya Ko, Huan-Yu Chen, Shun-Chang Zhong, Jeng-Lin Li and Chi-Chun Lee
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Su, Dan
Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition
Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu
Session: Sequence Models for ASR
Deep Discriminative Embeddings for Duration Robust Speaker Verification
Na Li, Deyi Tuo, Dan Su, Zhifeng Li and Dong Yu
Session: Speaker Verification Using Neural Network Methods I
Text-Dependent Speech Enhancement for Small-Footprint Robust Keyword Detection
Meng Yu, Xuan Ji, Yi Gao, Lianwu Chen, Jie Chen, Jimeng Zheng, Dan Su and Dong Yu
Session: Topics in Speech Recognition
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis
Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu and Helen Meng
Session: Expressive Speech Synthesis
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures
Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Su, Rongfeng
Gaussian Process Neural Networks for Speech Recognition
Max W. Y. Lam, Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Rongfeng Su, Xunying Liu and Helen Meng
Session: Novel Neural Network Architectures for Acoustic Modelling
Semi-supervised Cross-domain Visual Feature Learning for Audio-Visual Broadcast Speech Transcription
Rongfeng Su, Xunying Liu and Lan Wang
Session: Multimodal Systems
Subramanian, Aswin Shanmugam
Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline
Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu and Shinji Watanabe
Session: Robust Speech Recognition
Student-Teacher Learning for BLSTM Mask-based Speech Enhancement
Aswin Shanmugam Subramanian, Szu-Jui Chen and Shinji Watanabe
Session: Deep Enhancement
Suendermann-Oeft, David
Game-based Spoken Dialog Language Learning Applications for Young Students
Keelan Evanini, Veronika Timpe-Laughlin, Eugene Tsuprun, Ian Blood, Jeremy Lee, James Bruno, Vikram Ramanarayanan, Patrick Lange and David Suendermann-Oeft
Session: Show and Tell 2
Toward Scalable Dialog Technology for Conversational Language Learning: Case Study of the TOEFL® MOOC
Vikram Ramanarayanan, David Pautler, Patrick Lange, Eugene Tsuprun, Rutuja Ubale, Keelan Evanini and David Suendermann-Oeft
Session: Show and Tell 5
An Automated Assistant for Medical Scribes
Gregory Finley, Erik Edwards, Amanda Robinson, Najmeh Sadoughi, James Fone, Mark Miller, David Suendermann-Oeft, Michael Brenndoerfer and Nico Axtmann
Session: Show and Tell 7
Sun, Sining
Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition
Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang and Lei Xie
Session: Robust Speech Recognition
Training Augmentation with Adversarial Examples for Robust Speech Recognition
Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang and Lei Xie
Session: Adjusting to Speaker, Accent, and Domain
A Probability Weighted Beamformer for Noise Robust ASR
Suliang Bu, Yunxin Zhao, Meiyuh Hwang and Sining Sun
Session: Distant ASR
Sung, Wonyong
Hierarchical Recurrent Neural Networks for Acoustic Modeling
Jinhwan Park, Iksoo Choi, Yoonho Boo and Wonyong Sung
Session: Acoustic Modelling
Character-level Language Modeling with Gated Hierarchical Recurrent Neural Networks
Iksoo Choi, Jinhwan Park and Wonyong Sung
Session: ASR Systems and Technologies
Syed, Zafi Sherhan
Computational Paralinguistics: Automatic Assessment of Emotions, Mood and Behavioural State from Acoustics of Speech
Zafi Sherhan Syed, Julien Schroeter, Kirill Sidorov and David Marshall
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Szöke, Igor
BUT OpenSAT 2017 Speech Recognition System
Martin Karafiát, Murali Karthick Baskar, Igor Szöke, Vladimír Malenovský, Karel Veselý, František Grézl, Lukáš Burget and Jan Černocký
Session: Topics in Speech Recognition
Lightly Supervised vs. Semi-supervised Training of Acoustic Model on Luxembourgish for Low-resource Automatic Speech Recognition
Karel Veselý, Carlos Segura, Igor Szöke, Jordi Luque and Jan Černocký
Session: Neural Network Training Strategies for ASR
Tak, Hemlata
Effectiveness of Speech Demodulation-Based Features for Replay Detection
Madhu Kamble, Hemlata Tak and Hemant Patil
Session: Spoofing Detection
Novel Linear Frequency Residual Cepstral Features for Replay Attack Detection
Hemlata Tak and Hemant Patil
Session: Speech Analysis and Representation
Takanashi, Katsuya
Engagement Recognition in Spoken Dialogue via Neural Network by Aggregating Different Annotators' Models
Koji Inoue, Divesh Lala, Katsuya Takanashi and Tatsuya Kawahara
Session: Spoken Dialogue Systems and Conversational Analysis
Prediction of Turn-taking Using Multitask Learning with Prediction of Backchannels and Fillers
Kohei Hara, Koji Inoue, Katsuya Takanashi and Tatsuya Kawahara
Session: Multimodal Dialogue Systems
Tanaka, Hiroki
Detection of Dementia from Responses to Atypical Questions Asked by Embodied Conversational Agents
Tsuyoki Ujiro, Hiroki Tanaka, Hiroyoshi Adachi, Hiroaki Kazui, Manabu Ikeda, Takashi Kudo and Satoshi Nakamura
Session: Integrating Speech Science and Technology for Clinical Applications
Tanaka, Tomohiro
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder
Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hirokazu Masataki and Yushi Aono
Session: Selected Topics in Neural Speech Processing
Neural Error Corrective Language Models for Automatic Speech Recognition
Tomohiro Tanaka, Ryo Masumura, Hirokazu Masataki and Yushi Aono
Session: ASR Systems and Technologies
Tang, Hao
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition
Wei-Ning Hsu, Hao Tang and James Glass
Session: Robust Speech Recognition
A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition
Hao Tang, Wei-Ning Hsu, François Grondin and James Glass
Session: Neural Network Training Strategies for ASR
Tansuwan, Justin
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Tao, Jianhua
BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End
Yibin Zheng, Jianhua Tao, Zhengqi Wen and Ya Li
Session: Prosody Modeling and Generation
Sparsity-Constrained Weight Mapping for Head-Related Transfer Functions Individualization from Anthropometric Features
Xiaoke Qi and Jianhua Tao
Session: Source Separation and Spatial Analysis
Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis
Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
Session: Statistical Parametric Speech Synthesis
On the Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis
Yibin Zheng, Jianhua Tao, Zhengqi Wen and Ruibo Fu
Session: Statistical Parametric Speech Synthesis
Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer
Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
Session: Speech Synthesis Paradigms and Methods
Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement
Shuai Nie, Shan Liang, Bin Liu, Yaping Zhang, Wenju Liu and Jianhua Tao
Session: Deep Enhancement
Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function
Jian Huang, Ya Li, Jianhua Tao and Zhen Lian
Session: Emotion Recognition and Analysis
Taylor, Kye
Attention-based Sequence Classification for Affect Detection
Cristina Gorrostieta, Richard Brutti, Kye Taylor, Avi Shapiro, Joseph Moran, Ali Azarbayejani and John Kane
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Teixeira, Francisco
Mining Multimodal Repositories for Speech Affecting Diseases
Joana Correia, Bhiksha Raj, Isabel Trancoso and Francisco Teixeira
Session: Application of ASR in Medical Practice
Patient Privacy in Paralinguistic Tasks
Francisco Teixeira, Alberto Abad and Isabel Trancoso
Session: Speech Pathology, Depression, and Medical Applications
Teplansky, Kristin
Automatic Speech Recognition with Articulatory Information and a Unified Dictionary for Hindi, Marathi, Bengali and Oriya
Debadatta Dash, Myungjong Kim, Kristin Teplansky and Jun Wang
Session: Speech Recognition for Indian Languages
Automatic Early Detection of Amyotrophic Lateral Sclerosis from Intelligible Speech Using Convolutional Neural Networks
Kwanghoon An, Myungjong Kim, Kristin Teplansky, Jordan Green, Thomas Campbell, Yana Yunusova, Daragh Heitzman and Jun Wang
Session: Integrating Speech Science and Technology for Clinical Applications
Terhorst, Yannik
State of Mind: Classification through Self-reported Affect and Word Use in Speech
Eva-Maria Rathner, Yannik Terhorst, Nicholas Cummins, Björn Schuller and Harald Baumeister
Session: Speaker State and Trait
How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives
Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe and Harald Baumeister
Session: Speech Pathology, Depression, and Medical Applications
Terissi, Lucas D.
A French-Spanish Multimodal Speech Communication Corpus Incorporating Acoustic Data, Facial, Hands and Arms Gestures Information
Lucas D. Terissi, Gonzalo Sad, Mauricio Cerda, Slim Ouni, Rodrigo Galvez, Juan C. Gómez, Bernard Girau and Nancy Hitschfeld-Kahler
Session: Spoken Corpora and Annotation
Thakur, Anshul
All-Conv Net for Bird Activity Detection: Significance of Learned Pooling
Arjun Pankajakshan, Anshul Thakur, Daksh Thapar, Padmanabhan Rajan and Aditya Nigam
Session: Signal Analysis for the Natural, Biological and Social Sciences
Deep Convex Representations: Feature Representations for Bioacoustics Classification
Anshul Thakur, Vinayak Abrol, Pulkit Sharma and Padmanabhan Rajan
Session: Signal Analysis for the Natural, Biological and Social Sciences
ASe: Acoustic Scene Embedding Using Deep Archetypal Analysis and GMM
Pulkit Sharma, Vinayak Abrol and Anshul Thakur
Session: Acoustic Scenes and Rare Events
Thomas, Samuel
Data Augmentation Improves Recognition of Foreign Accented Speech
Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin and Gakuto Kurata
Session: Adjusting to Speaker, Accent, and Domain
Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models
Masayuki Suzuki, Tohru Nagano, Gakuto Kurata and Samuel Thomas
Session: Neural Network Training Strategies for ASR
Tjandra, Andros
Compressing End-to-end ASR Networks by Tensor-Train Decomposition
Takuma Mori, Andros Tjandra, Sakriani Sakti and Satoshi Nakamura
Session: Sequence Models for ASR
Machine Speech Chain with One-shot Speaker Adaptation
Andros Tjandra, Sakriani Sakti and Satoshi Nakamura
Session: Acoustic Model Adaptation
Toda, Tomoki
Multi-Head Decoder for End-to-End Speech Recognition
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda and Kazuya Takeda
Session: Sequence Models for ASR
Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder
Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing and Tomoki Toda
Session: Voice Conversion and Speech Synthesis
Frequency Domain Variants of Velvet Noise and Their Application to Speech Processing and Synthesis
Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda and Toshio Irino
Session: Voice Conversion and Speech Synthesis
Audio-visual Voice Conversion Using Deep Canonical Correlation Analysis for Deep Bottleneck Features
Satoshi Tamura, Kento Horio, Hajime Endo, Satoru Hayamizu and Tomoki Toda
Session: Speech Synthesis Paradigms and Methods
Designing a Pneumatic Bionic Voice Prosthesis - A Statistical Approach for Source Excitation Generation
Farzaneh Ahmadi and Tomoki Toda
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Todisco, Massimiliano
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion
Massimiliano Todisco, Héctor Delgado, Kong Aik Lee, Md Sahidullah, Nicholas Evans, Tomi Kinnunen and Junichi Yamagishi
Session: Speaker Verification I
Artificial Bandwidth Extension with Memory Inclusion Using Semi-supervised Stacked Auto-encoders
Pramod Bachhav, Massimiliano Todisco and Nicholas Evans
Session: Novel Approaches to Enhancement
Speech Database and Protocol Validation Using Waveform Entropy
Itshak Lapidot, Héctor Delgado, Massimiliano Todisco, Nicholas Evans and Jean-François Bonastre
Session: Spoken Corpora and Annotation
Tomashenko, Natalia
Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition
Natalia Tomashenko, Yuri Khokhlov and Yannick Estève
Session: Adjusting to Speaker, Accent, and Domain
An Investigation of Mixup Training Strategies for Acoustic Models in ASR
Ivan Medennikov, Yuri Khokhlov, Aleksei Romanenko, Dmitry Popov, Natalia Tomashenko, Ivan Sorokin and Alexander Zatvornitskiy
Session: Neural Network Training Strategies for ASR
Trancoso, Isabel
Acoustic-prosodic Entrainment in Structural Metadata Events
Vera Cabarrão, Fernando Batista, Helena Moniz, Isabel Trancoso and Ana Isabel Mata
Session: Speech Prosody
Mining Multimodal Repositories for Speech Affecting Diseases
Joana Correia, Bhiksha Raj, Isabel Trancoso and Francisco Teixeira
Session: Application of ASR in Medical Practice
Patient Privacy in Paralinguistic Tasks
Francisco Teixeira, Alberto Abad and Isabel Trancoso
Session: Speech Pathology, Depression, and Medical Applications
Trinh, Viet Anh
Concatenative Resynthesis with Improved Training Signals for Speech Enhancement
Ali Raza Syed, Viet Anh Trinh and Michael Mandel
Session: Novel Approaches to Enhancement
Bubble Cooperative Networks for Identifying Important Speech Cues
Viet Anh Trinh, Brian McFee and Michael I Mandel
Session: Robust Speech Recognition
Tripathi, Anshuman
Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition
Khe Chai Sim, Arun Narayanan, Ananya Misra, Anshuman Tripathi, Golan Pundak, Tara Sainath, Parisa Haghani, Bo Li and Michiel Bacchiani
Session: Acoustic Model Adaptation
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Trmal, Jan
The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines
Jon Barker, Shinji Watanabe, Emmanuel Vincent and Jan Trmal
Session: Robust Speech Recognition
Automatic Speech Recognition and Topic Identification from Speech for Almost-Zero-Resource Languages
Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak and Sanjeev Khudanpur
Session: Extracting Information from Audio
Tsao, Yu
Exemplar-Based Spectral Detail Compensation for Voice Conversion
Yu-Huai Peng, Hsin-Te Hwang, Yichiao Wu, Yu Tsao and Hsin-Min Wang
Session: Voice Conversion
Temporal Attentive Pooling for Acoustic Event Detection
Xugang Lu, Peng Shen, Sheng Li, Yu Tsao and Hisashi Kawai
Session: Audio Events and Acoustic Scenes
Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model Based on BLSTM
Szu-wei Fu, Yu Tsao, Hsin-Te Hwang and Hsin-Min Wang
Session: Speech Intelligibility and Quality
Tsiaras, Vassilis
Speech Intelligibility Enhancement Based on a Non-causal Wavenet-like Model
Muhammed Shifas PV, Vassilis Tsiaras and Yannis Stylianou
Session: Speech Intelligibility and Quality
Speaker-independent Raw Waveform Model for Glottal Excitation
Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi and Paavo Alku
Session: Voice Conversion and Speech Synthesis
Tsuprun, Eugene
Game-based Spoken Dialog Language Learning Applications for Young Students
Keelan Evanini, Veronika Timpe-Laughlin, Eugene Tsuprun, Ian Blood, Jeremy Lee, James Bruno, Vikram Ramanarayanan, Patrick Lange and David Suendermann-Oeft
Session: Show and Tell 2
Toward Scalable Dialog Technology for Conversational Language Learning: Case Study of the TOEFL® MOOC
Vikram Ramanarayanan, David Pautler, Patrick Lange, Eugene Tsuprun, Rutuja Ubale, Keelan Evanini and David Suendermann-Oeft
Session: Show and Tell 5
Tu, Ming
A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment
Megan Willi, Stephanie A. Borrie, Tyson S. Barrett, Ming Tu and Visar Berisha
Session: Spoken Dialogue Systems and Conversational Analysis
Investigating the Role of L1 in Automatic Pronunciation Evaluation of L2 Speech
Ming Tu, Anna Grabek, Julie Liss and Visar Berisha
Session: Applications in Education and Learning
Tucker, Benjamin V.
A Comparison of Input Types to a Deep Neural Network-based Forced Aligner
Matthew C. Kelley and Benjamin V. Tucker
Session: Syllabification, Rhythm, and Voice Activity Detection
Implementing DIANA to Model Isolated Auditory Word Recognition in English
Filip Nenadić, Louis ten Bosch and Benjamin V. Tucker
Session: Speech and Speaker Perception
Turan, Mehmet Ali Tuğtekin
Monitoring Infant's Emotional Cry in Domestic Environments Using the Capsule Network Architecture
Mehmet Ali Tuğtekin Turan and Engin Erzin
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Tzirakis, Panagiotis
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Tóth, László
General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats
Gábor Gosztolya, Tamás Grósz and László Tóth
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces
László Tóth, Gábor Gosztolya, Tamás Grósz, Alexandra Markó and Tamás Gábor Csapó
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Ubale, Rutuja
Toward Scalable Dialog Technology for Conversational Language Learning: Case Study of the TOEFL® MOOC
Vikram Ramanarayanan, David Pautler, Patrick Lange, Eugene Tsuprun, Rutuja Ubale, Keelan Evanini and David Suendermann-Oeft
Session: Show and Tell 5
Improvements to an Automated Content Scoring System for Spoken CALL Responses: the ETS Submission to the Second Spoken CALL Shared Task
Keelan Evanini, Matthew Mulholland, Rutuja Ubale, Yao Qian, Robert Pugh, Vikram Ramanarayanan and Aoife Cahill
Session: Spoken CALL Shared Task, Second Edition
Ueno, Sei
Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition
Takafumi Moriya, Sei Ueno, Yusuke Shinohara, Marc Delcroix, Yoshikazu Yamaguchi and Yushi Aono
Session: Adjusting to Speaker, Accent, and Domain
Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition
Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono and Tatsuya Kawahara
Session: Adjusting to Speaker, Accent, and Domain
Ujiro, Tsuyoki
Detection of Dementia from Responses to Atypical Questions Asked by Embodied Conversational Agents
Tsuyoki Ujiro, Hiroki Tanaka, Hiroyoshi Adachi, Hiroaki Kazui, Manabu Ikeda, Takashi Kudo and Satoshi Nakamura
Session: Integrating Speech Science and Technology for Clinical Applications
Unno, Yuya
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
Usova, Maria
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
V, Vishnu Vidyadhara Raju
An Exploration towards Joint Acoustic Modeling for Indian Languages: IIIT-H Submission for Low Resource Speech Recognition Challenge for Indian Languages, INTERSPEECH 2018
Hari Krishna, Krishna Gurugubelli, Vishnu Vidyadhara Raju V and Anil Kumar Vuppala
Session: Low Resource Speech Recognition Challenge for Indian Languages
Vachhani, Bhavik
Dysarthric Speech Recognition Using Time-delay Neural Network Based Denoising Autoencoder
Chitralekha Bhat, Biswajit Das, Bhavik Vachhani and Sunil Kumar Kopparapu
Session: Automatic Detection and Recognition of Voice and Speech Disorders
Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition
Bhavik Vachhani, Chitralekha Bhat and Sunil Kumar Kopparapu
Session: Automatic Detection and Recognition of Voice and Speech Disorders
Valentini-Botinhao, Cassia
Exemplar-based Speech Waveform Generation
Oliver Watts, Cassia Valentini-Botinhao, Felipe Espic and Simon King
Session: Voice Conversion and Speech Synthesis
Velmurugan, Rajbabu
A Non-convolutive NMF Model for Speech Dereverberation
Nikhil M, Rajbabu Velmurugan and Preeti Rao
Session: Dereverberation
Vepa, Jithendra
CACTAS - Collaborative Audio Categorization and Transcription for ASR Systems
Mithul Mathivanan, Kinnera Saranu, Abhishek Pandey and Jithendra Vepa
Session: Show and Tell 4
Hierarchical Accent Determination and Application in a Large Scale ASR System
Ramya Viswanathan, Periyasamy Paramasivam and Jithendra Vepa
Session: Show and Tell 5
Speech Emotion Recognition Using Spectrogram & Phoneme Embedding
Promod Yenigalla, Abhay Kumar, Suraj Tripathi, Chirag Singh, Sibsambhu Kar and Jithendra Vepa
Session: Emotion Recognition and Analysis
Leveraging Second-Order Log-Linear Model for Improved Deep Learning Based ASR Performance
Ankit Raj, Shakti P Rath and Jithendra Vepa
Session: Acoustic Modelling
Verkholyak, Oxana
LSTM Based Cross-corpus and Cross-task Acoustic Emotion Recognition
Heysem Kaya, Dmitrii Fedotov, Ali Yeşilkanat, Oxana Verkholyak, Yang Zhang and Alexey Karpov
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Veselý, Karel
BUT OpenSAT 2017 Speech Recognition System
Martin Karafiát, Murali Karthick Baskar, Igor Szöke, Vladimír Malenovský, Karel Veselý, František Grézl, Lukáš Burget and Jan Černocký
Session: Topics in Speech Recognition
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge
Lightly Supervised vs. Semi-supervised Training of Acoustic Model on Luxembourgish for Low-resource Automatic Speech Recognition
Karel Veselý, Carlos Segura, Igor Szöke, Jordi Luque and Jan Černocký
Session: Neural Network Training Strategies for ASR
Viegas, Carla
Investigating Utterance Level Representations for Detecting Intent from Acoustics
SaiKrishna Rallabandi, Bhavya Karki, Carla Viegas, Eric Nyberg and Alan W Black
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Viksnin, Ilya
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
Villalba, Jesús
An Investigation of Non-linear i-vectors for Speaker Verification
Nanxin Chen, Jesús Villalba and Najim Dehak
Session: Speaker Verification I
Investigation on Bandwidth Extension for Speaker Recognition
Phani Sankar Nidadavolu, Cheng-I Lai, Jesús Villalba and Najim Dehak
Session: Speaker Verification II
Effectiveness of Single-Channel BLSTM Enhancement for Language Identification
Peter Sibbern Frederiksen, Jesús Villalba, Shinji Watanabe, Zheng-Hua Tan and Najim Dehak
Session: Language Identification
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
Session: The First DIHARD Speech Diarization Challenge
Deep Neural Networks for Emotion Recognition Combining Audio and Transcripts
Jaejin Cho, Raghavendra Pappagari, Purva Kulkarni, Jesús Villalba, Yishay Carmiel and Najim Dehak
Session: Speaker State and Trait
End-to-end Deep Neural Network Age Estimation
Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesús Villalba, Daniel Povey, Sanjeev Khudanpur and Najim Dehak
Session: Speaker State and Trait
Villemoes, Lars
Temporal Noise Shaping with Companding
Arijit Biswas, Per Hedelin, Lars Villemoes and Vinay Melkote
Session: Coding
Vincent, Emmanuel
The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines
Jon Barker, Shinji Watanabe, Emmanuel Vincent and Jan Trmal
Session: Robust Speech Recognition
Keyword Based Speaker Localization: Localizing a Target Speaker in a Multi-speaker Environment
Sunit Sivasankaran, Emmanuel Vincent and Dominique Fohr
Session: Spatial and Phase Cues for Source Separation and Speech Recognition
Virtanen, Tuomas
Expectation-Maximization Algorithms for Itakura-Saito Nonnegative Matrix Factorization
Paul Magron and Tuomas Virtanen
Session: Source Separation and Spatial Analysis
Reducing Interference with Phase Recovery in DNN-based Monaural Singing Voice Separation
Paul Magron, Konstantinos Drossos, Stylianos Ioannis Mimilakis and Tuomas Virtanen
Session: Deep Learning for Source Separation and Pitch Tracking
Vlasenko, Bogdan
Implementing Fusion Techniques for the Classification of Paralinguistic Information
Bogdan Vlasenko, Jilt Sebastian, Pavan Kumar D. S. and Mathew Magimai.-Doss
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Vuppala, Anil Kumar
An Exploration towards Joint Acoustic Modeling for Indian Languages: IIIT-H Submission for Low Resource Speech Recognition Challenge for Indian Languages, INTERSPEECH 2018
Hari Krishna, Krishna Gurugubelli, Vishnu Vidyadhara Raju V and Anil Kumar Vuppala
Session: Low Resource Speech Recognition Challenge for Indian Languages
Vásquez Correa, Juan Camilo
A Multitask Learning Approach to Assess the Dysarthria Severity in Patients with Parkinson's Disease
Juan Camilo Vásquez Correa, Tomas Arias, Juan Rafael Orozco-Arroyave and Elmar Nöth
Session: Automatic Detection and Recognition of Voice and Speech Disorders
Multimodal I-vectors to Detect and Evaluate Parkinson's Disease
Nicanor Garcia, Juan Camilo Vásquez Correa, Juan Rafael Orozco-Arroyave and Elmar Nöth
Session: Speech and Language Analytics for Mental Health
Wagner, Johannes
Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?
Johannes Wagner, Dominik Schiller, Andreas Seiderer and Elisabeth André
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Waibel, Alex
Low-Latency Neural Speech Translation
Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber and Alex Waibel
Session: Selected Topics in Neural Speech Processing
Term Extraction via Neural Sequence Labeling: A Comparative Evaluation of Strategies Using Recurrent Neural Networks
Maren Kucza, Jan Niehues, Thomas Zenkel, Alex Waibel and Sebastian Stüker
Session: Extracting Information from Audio
Neural Language Codes for Multilingual Acoustic Models
Markus Müller, Sebastian Stüker and Alex Waibel
Session: Adjusting to Speaker, Accent, and Domain
Self-Attentional Acoustic Models
Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker and Alex Waibel
Session: Acoustic Modelling
Subword and Crossword Units for CTC Acoustic Models
Thomas Zenkel, Ramon Sanabria, Florian Metze and Alex Waibel
Session: ASR Systems and Technologies
Wan, Nathan
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Wang, Chao
A Simple Model for Detection of Rare Sound Events
Weiran Wang, Chieh-Chi Kao and Chao Wang
Session: Audio Events and Acoustic Scenes
R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection
Chieh-Chi Kao, Weiran Wang, Ming Sun and Chao Wang
Session: Audio Events and Acoustic Scenes
Detecting Media Sound Presence in Acoustic Scenes
Constantinos Papayiannis, Justice Amoh, Viktor Rozgic, Shiva Sundaram and Chao Wang
Session: Audio Events and Acoustic Scenes
Wang, DeLiang
A New Framework for Supervised Speech Enhancement in the Time Domain
Ashutosh Pandey and DeLiang Wang
Session: Novel Approaches to Enhancement
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction
Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang and John Hershey
Session: Spatial and Phase Cues for Source Separation and Speech Recognition
Integrating Spectral and Spatial Features for Multi-Channel Speaker Separation
Zhong-Qiu Wang and DeLiang Wang
Session: Spatial and Phase Cues for Source Separation and Speech Recognition
A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement
Ke Tan and DeLiang Wang
Session: Deep Enhancement
All-Neural Multi-Channel Speech Enhancement
Zhong-Qiu Wang and DeLiang Wang
Session: Deep Enhancement
Deep Learning for Acoustic Echo Cancellation in Noisy and Double-Talk Scenarios
Hao Zhang and DeLiang Wang
Session: Deep Enhancement
A Two-Stage Approach to Noisy Cochannel Speech Separation with Gated Residual Networks
Ke Tan and DeLiang Wang
Session: Source Separation from Monaural Input
Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks
Zhong-Qiu Wang, Xueliang Zhang and DeLiang Wang
Session: Deep Learning for Source Separation and Pitch Tracking
Wang, Hsin-Min
Exemplar-Based Spectral Detail Compensation for Voice Conversion
Yu-Huai Peng, Hsin-Te Hwang, Yichiao Wu, Yu Tsao and Hsin-Min Wang
Session: Voice Conversion
Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model Based on BLSTM
Szu-wei Fu, Yu Tsao, Hsin-Te Hwang and Hsin-Min Wang
Session: Speech Intelligibility and Quality
Wang, Jun
Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition
Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu
Session: Sequence Models for ASR
Automatic Speech Recognition with Articulatory Information and a Unified Dictionary for Hindi, Marathi, Bengali and Oriya
Debadatta Dash, Myungjong Kim, Kristin Teplansky and Jun Wang
Session: Speech Recognition for Indian Languages
Automatic Early Detection of Amyotrophic Lateral Sclerosis from Intelligible Speech Using Convolutional Neural Networks
Kwanghoon An, Myungjong Kim, Kristin Teplansky, Jordan Green, Thomas Campbell, Yana Yunusova, Daragh Heitzman and Jun Wang
Session: Integrating Speech Science and Technology for Clinical Applications
Dysarthric Speech Recognition Using Convolutional LSTM Neural Network
Myungjong Kim, Beiming Cao, Kwanghoon An and Jun Wang
Session: Application of ASR in Medical Practice
Articulation-to-Speech Synthesis Using Articulatory Flesh Point Sensors’ Orientation Information
Beiming Cao, Myungjong Kim, Jun R. Wang, Jan van Santen, Ted Mau and Jun Wang
Session: Novel Paradigms for Direct Synthesis Based on Speech-Related Biosignals
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures
Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Wang, Ke
Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition
Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang and Lei Xie
Session: Robust Speech Recognition
Empirical Evaluation of Speaker Adaptation on DNN Based Acoustic Model
Ke Wang, Junbo Zhang, Yujun Wang and Lei Xie
Session: Adjusting to Speaker, Accent, and Domain
Wang, Lan
Acoustic Features Associated with Sustained Vowel and Continuous Speech Productions by Chinese Children with Functional Articulation Disorders
Wang Zhang, Xiangquan Gui, Tianqi Wang, Manwa Ng, Feng Yang, Lan Wang and Nan Yan
Session: Integrating Speech Science and Technology for Clinical Applications
Semi-supervised Cross-domain Visual Feature Learning for Audio-Visual Broadcast Speech Transcription
Rongfeng Su, Xunying Liu and Lan Wang
Session: Multimodal Systems
Wang, Longbiao
Multiple Phase Information Combination for Replay Attacks Detection
Dongbo Li, Longbiao Wang, Jianwu Dang, Meng Liu, Zeyan Oo, Seiichi Nakagawa, Haotian Guan and Xiangang Li
Session: Spoofing Detection
Revealing Spatiotemporal Brain Dynamics of Speech Production Based on EEG and Eye Movement
Bin Zhao, Jinfeng Huang, Gaoyan Zhang, Jianwu Dang, Minbo Chen, Yingjian Fu and Longbiao Wang
Session: Cognition and Brain Studies
Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network
Lili Guo, Longbiao Wang, Jianwu Dang, Linjuan Zhang, Haotian Guan and Xiangang Li
Session: Robust Speech Recognition
Wang, Tianqi
Acoustic Features Associated with Sustained Vowel and Continuous Speech Productions by Chinese Children with Functional Articulation Disorders
Wang Zhang, Xiangquan Gui, Tianqi Wang, Manwa Ng, Feng Yang, Lan Wang and Nan Yan
Session: Integrating Speech Science and Technology for Clinical Applications
Wang, Weiran
A Simple Model for Detection of Rare Sound Events
Weiran Wang, Chieh-Chi Kao and Chao Wang
Session: Audio Events and Acoustic Scenes
R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection
Chieh-Chi Kao, Weiran Wang, Ming Sun and Chao Wang
Session: Audio Events and Acoustic Scenes
Wang, Yiming
A GPU-based WFST Decoder with Exact Lattice Generation
Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey and Sanjeev Khudanpur
Session: Recurrent Neural Models for ASR
Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
Ke Li, Hainan Xu, Yiming Wang, Daniel Povey and Sanjeev Khudanpur
Session: Language Modeling
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks
Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi and Sanjeev Khudanpur
Session: Acoustic Modelling
Wang, Yu
Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems
Yu Wang, Chao Zhang, Mark Gales and Philip Woodland
Session: Acoustic Model Adaptation
Impact of ASR Performance on Free Speaking Language Assessment
Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang and Andrew Caines
Session: Applications in Education and Learning
A Deep Reinforcement Learning Based Multimodal Coaching Model (DCM) for Slot Filling in Spoken Language Understanding (SLU)
Yu Wang, Abhishek Patel, Yilin Shen and Hongxia Jin
Session: Spoken Language Understanding
User Information Augmented Semantic Frame Parsing Using Progressive Neural Networks
Yilin Shen, Xiangyu Zeng, Yu Wang and Hongxia Jin
Session: Spoken Language Understanding
Wang, Yujun
Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition
Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang and Lei Xie
Session: Robust Speech Recognition
Attention-based End-to-End Models for Small-Footprint Keyword Spotting
Changhao Shan, Junbo Zhang, Yujun Wang and Lei Xie
Session: Extracting Information from Audio
Empirical Evaluation of Speaker Adaptation on DNN Based Acoustic Model
Ke Wang, Junbo Zhang, Yujun Wang and Lei Xie
Session: Adjusting to Speaker, Accent, and Domain
Wang, Yun
Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks
Yun Wang, Juncheng Li and Florian Metze
Session: Audio Events and Acoustic Scenes
The ACLEW DiViMe: An Easy-to-use Diarization Tool
Adrien Le Franc, Eric Riebling, Julien Karadayi, Yun Wang, Camila Scaff, Florian Metze and Alejandrina Cristia
Session: Speaker Diarization
Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection
Shao-Yen Tseng, Juncheng Li, Yun Wang, Florian Metze, Joseph Szurley and Samarjit Das
Session: Acoustic Scenes and Rare Events
Wang, Zhong-Qiu
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction
Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang and John Hershey
Session: Spatial and Phase Cues for Source Separation and Speech Recognition
Integrating Spectral and Spatial Features for Multi-Channel Speaker Separation
Zhong-Qiu Wang and DeLiang Wang
Session: Spatial and Phase Cues for Source Separation and Speech Recognition
All-Neural Multi-Channel Speech Enhancement
Zhong-Qiu Wang and DeLiang Wang
Session: Deep Enhancement
Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks
Zhong-Qiu Wang, Xueliang Zhang and DeLiang Wang
Session: Deep Learning for Source Separation and Pitch Tracking
Watanabe, Shinji
Multi-Head Decoder for End-to-End Speech Recognition
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda and Kazuya Takeda
Session: Sequence Models for ASR
The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines
Jon Barker, Shinji Watanabe, Emmanuel Vincent and Jan Trmal
Session: Robust Speech Recognition
Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline
Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu and Shinji Watanabe
Session: Robust Speech Recognition
Effectiveness of Single-Channel BLSTM Enhancement for Language Identification
Peter Sibbern Frederiksen, Jesús Villalba, Shinji Watanabe, Zheng-Hua Tan and Najim Dehak
Session: Language Identification
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
Multi-Modal Data Augmentation for End-to-end ASR
Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner and Shinji Watanabe
Session: Adjusting to Speaker, Accent, and Domain
Auxiliary Feature Based Adaptation of End-to-end ASR Systems
Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita and Tomohiro Nakatani
Session: Adjusting to Speaker, Accent, and Domain
Semi-Supervised End-to-End Speech Recognition
Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Atsunori Ogawa and Marc Delcroix
Session: End-to-End Speech Recognition
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe and Sanjeev Khudanpur
Session: The First DIHARD Speech Diarization Challenge
Student-Teacher Learning for BLSTM Mask-based Speech Enhancement
Aswin Shanmugam Subramanian, Szu-Jui Chen and Shinji Watanabe
Session: Deep Enhancement
Watts, Oliver
Learning Interpretable Control Dimensions for Speech Synthesis by Using External Data
Zack Hodari, Oliver Watts, Srikanth Ronanki and Simon King
Session: Prosody Modeling and Generation
Exemplar-based Speech Waveform Generation
Oliver Watts, Cassia Valentini-Botinhao, Felipe Espic and Simon King
Session: Voice Conversion and Speech Synthesis
Wei, Xizi
Overview of the 2018 Spoken CALL Shared Task
Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei
Session: Spoken CALL Shared Task, Second Edition
The University of Birmingham 2018 Spoken CALL Shared Task Systems
Mengjie Qian, Xizi Wei, Peter Jančovič and Martin Russell
Session: Spoken CALL Shared Task, Second Edition
Wen, Zhengqi
BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End
Yibin Zheng, Jianhua Tao, Zhengqi Wen and Ya Li
Session: Prosody Modeling and Generation
Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis
Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
Session: Statistical Parametric Speech Synthesis
On the Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis
Yibin Zheng, Jianhua Tao, Zhengqi Wen and Ruibo Fu
Session: Statistical Parametric Speech Synthesis
Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer
Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
Session: Speech Synthesis Paradigms and Methods
Weng, Chao
Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition
Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu
Session: Sequence Models for ASR
A Multistage Training Framework for Acoustic-to-Word Model
Chengzhu Yu, Chunlei Zhang, Chao Weng, Jia Cui and Dong Yu
Session: Sequence Models for ASR
Wickramasinghe, Buddhi
Detection of Replay-Spoofing Attacks Using Frequency Modulation Features
Tharshini Gunendradasan, Buddhi Wickramasinghe, Ngoc Phu Le, Eliathamby Ambikairajah and Julien Epps
Session: Spoofing Detection
Frequency Domain Linear Prediction Features for Replay Spoofing Attack Detection
Buddhi Wickramasinghe, Saad Irtza, Eliathamby Ambikairajah and Julien Epps
Session: Spoofing Detection
Wiesner, Matthew
Automatic Speech Recognition and Topic Identification from Speech for Almost-Zero-Resource Languages
Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak and Sanjeev Khudanpur
Session: Extracting Information from Audio
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala and Tsubasa Ochiai
Session: Recurrent Neural Models for ASR
Multi-Modal Data Augmentation for End-to-end ASR
Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner and Shinji Watanabe
Session: Adjusting to Speaker, Accent, and Domain
Willi, Megan
A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment
Megan Willi, Stephanie A. Borrie, Tyson S. Barrett, Ming Tu and Visar Berisha
Session: Spoken Dialogue Systems and Conversational Analysis
Triplet Network with Attention for Speaker Diarization
Huan Song, Megan Willi, Jayaraman J. Thiagarajan, Visar Berisha and Andreas Spanias
Session: Speaker Verification Using Neural Network Methods II
Williams, Ian
Semantic Lattice Processing in Contextual Automatic Speech Recognition for Google Assistant
Leonid Velikovich, Ian Williams, Justin Scheiner, Petar Aleksic, Pedro Moreno and Michael Riley
Session: Recurrent Neural Models for ASR
Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search
Ian Williams, Anjuli Kannan, Petar Aleksic, David Rybach and Tara Sainath
Session: Recurrent Neural Models for ASR
Williamson, James
Vocal Biomarkers for Cognitive Performance Estimation in a Working Memory Task
Jennifer Sloboda, Adam Lammert, James Williamson, Christopher Smalt, Daryush D. Mehta, COL Ian Curry, Kristin Heaton, Jeffrey Palmer and Thomas Quatieri
Session: Speaker Characterization and Analysis
The Effect of Exposure to High Altitude and Heat on Speech Articulatory Coordination
James Williamson, Thomas Quatieri, Adam Lammert, Katherine Mitchell, Katherine Finkelstein, Nicole Ekon, Caitlin Dillon, Robert Kenefick and Kristin Heaton
Session: Speaker State and Trait
Wilson, Kevin
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
Session: Syllabification, Rhythm, and Voice Activity Detection
Wong, Patrick C. M.
Learning Two Tone Languages Enhances the Brainstem Encoding of Lexical Tones
Akshay Raj Maggu, Wenqing Zong, Vina Law and Patrick C. M. Wong
Session: Cognition and Brain Studies
Experience-dependent Influence of Music and Language on Lexical Pitch Learning Is Not Additive
Akshay Raj Maggu, Patrick C. M. Wong, Hanjun Liu and Francis C. K. Wong
Session: Speech and Speaker Perception
Woodland, Philip
Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems
Yu Wang, Chao Zhang, Mark Gales and Philip Woodland
Session: Acoustic Model Adaptation
Semi-tied Units for Efficient Gating in LSTM and Highway Networks
Chao Zhang and Philip Woodland
Session: Novel Neural Network Architectures for Acoustic Modelling
Combining Natural Gradient with Hessian Free Methods for Sequence Training
Adnan Haider and Philip Woodland
Session: Neural Network Training Strategies for ASR
Wu, Ji
Temporal Transformer Networks for Acoustic Scene Classification
Teng Zhang, Kailai Zhang and Ji Wu
Session: Audio Events and Acoustic Scenes
Data Independent Sequence Augmentation Method for Acoustic Scene Classification
Teng Zhang, Kailai Zhang and Ji Wu
Session: Acoustic Scenes and Rare Events
Multi-modal Attention Mechanisms in LSTM and Its Application to Acoustic Scene Classification
Teng Zhang, Kailai Zhang and Ji Wu
Session: Acoustic Scenes and Rare Events
Wu, Xixin
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance
Songxiang Liu, Jinghua Zhong, Lifa Sun, Xixin Wu, Xunying Liu and Helen Meng
Session: Voice Conversion
Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis
Xu Li, Shaoguang Mao, Xixin Wu, Kun Li, Xunying Liu and Helen Meng
Session: Second Language Acquisition and Code-switching
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus
Jianwei Yu, Xurong Xie, Shansong Liu, Shoukang Hu, Max W. Y. Lam, Xixin Wu, Ka Ho Wong, Xunying Liu and Helen Meng
Session: Application of ASR in Medical Practice
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis
Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu and Helen Meng
Session: Expressive Speech Synthesis
Wu, Yonghui
Compression of End-to-End Models
Ruoming Pang, Tara Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang and Chung-Cheng Chiu
Session: End-to-End Speech Recognition
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Wu, Zhiyong
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection
Ziwei Zhu, Zhiyong Wu, Runnan Li, Helen Meng and Lianhong Cai
Session: Spoken Term Detection
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis
Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu and Helen Meng
Session: Expressive Speech Synthesis
Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method
Shuai Yang, Zhiyong Wu, Binbin Shen and Helen Meng
Session: Deep Learning for Source Separation and Pitch Tracking
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms
Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng and Lianhong Cai
Session: Emotion Recognition and Analysis
Xi, Zhonghua
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson and Zhonghua Xi
Session: Syllabification, Rhythm, and Voice Activity Detection
Xie, Lei
Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search
Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma and Haizhou Li
Session: Spoken Term Detection
Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition
Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang and Lei Xie
Session: Robust Speech Recognition
Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition
Pengcheng Guo, Haihua Xu, Lei Xie and Eng Siong Chng
Session: Speech Technologies for Code-Switching in Multilingual Communities
Attention-based End-to-End Models for Small-Footprint Keyword Spotting
Changhao Shan, Junbo Zhang, Yujun Wang and Lei Xie
Session: Extracting Information from Audio
Training Augmentation with Adversarial Examples for Robust Speech Recognition
Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang and Lei Xie
Session: Adjusting to Speaker, Accent, and Domain
Empirical Evaluation of Speaker Adaptation on DNN Based Acoustic Model
Ke Wang, Junbo Zhang, Yujun Wang and Lei Xie
Session: Adjusting to Speaker, Accent, and Domain
Xie, Xurong
Gaussian Process Neural Networks for Speech Recognition
Max W. Y. Lam, Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Rongfeng Su, Xunying Liu and Helen Meng
Session: Novel Neural Network Architectures for Acoustic Modelling
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus
Jianwei Yu, Xurong Xie, Shansong Liu, Shoukang Hu, Max W. Y. Lam, Xixin Wu, Ka Ho Wong, Xunying Liu and Helen Meng
Session: Application of ASR in Medical Practice
Xie, Yanlu
A Preliminary Study on Tonal Coarticulation in Continuous Speech
Lixia Hao, Wei Zhang, Yanlu Xie and Jinsong Zhang
Session: Source and Supra-segmentals
Interactions between Vowels and Nasal Codas in Mandarin Speakers’ Perception of Nasal Finals
Chong Cao, Wei Wei, Wei Wang, Yanlu Xie and Jinsong Zhang
Session: Speech and Speaker Perception
Improving Mandarin Tone Recognition Using Convolutional Bidirectional Long Short-Term Memory with Attention
Longfei Yang, Yanlu Xie and Jinsong Zhang
Session: Deep Learning for Source Separation and Pitch Tracking
Xiong, Shengwu
Multi-frame Quantization of LSF Parameters Using a Deep Autoencoder and Pyramid Vector Quantizer
Yaxing Li, Eshete Derb Emiru, Shengwu Xiong, Anna Zhu, Pengfei Duan and Yichang Li
Session: Coding
Multi-frame Coding of LSF Parameters Using Block-Constrained Trellis Coded Vector Quantization
Yaxing Li, Shan Xu, Shengwu Xiong, Anna Zhu, Pengfei Duan and Yueming Ding
Session: Coding
Xu, Bo
Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese
Shiyu Zhou, Linhao Dong, Shuang Xu and Bo Xu
Session: Sequence Models for ASR
Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin
Linhao Dong, Shiyu Zhou, Wei Chen and Bo Xu
Session: Sequence Models for ASR
Single-channel Speech Dereverberation via Generative Adversarial Training
Chenxing Li, Tieqiang Wang, Shuang Xu and Bo Xu
Session: Dereverberation
Xu, Haihua
Mandarin-English Code-switching Speech Recognition
Haihua Xu, Van Tung Pham, Zin Tun Kyaw, Zhi Hao Lim, Eng Siong Chng and Haizhou Li
Session: Show and Tell 2
Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition
Pengcheng Guo, Haihua Xu, Lei Xie and Eng Siong Chng
Session: Speech Technologies for Code-Switching in Multilingual Communities
Xu, Hainan
Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline
Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu and Shinji Watanabe
Session: Robust Speech Recognition
A GPU-based WFST Decoder with Exact Lattice Generation
Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey and Sanjeev Khudanpur
Session: Recurrent Neural Models for ASR
Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
Ke Li, Hainan Xu, Yiming Wang, Daniel Povey and Sanjeev Khudanpur
Session: Language Modeling
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks
Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi and Sanjeev Khudanpur
Session: Acoustic Modelling
Xu, Mingxing
Imbalance Learning-based Framework for Fear Recognition in the MediaEval Emotional Impact of Movies Task
Xiaotong Zhang, Xingliang Cheng, Mingxing Xu and Thomas Fang Zheng
Session: Emotion Recognition and Analysis
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms
Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng and Lianhong Cai
Session: Emotion Recognition and Analysis
Xu, Shuang
Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese
Shiyu Zhou, Linhao Dong, Shuang Xu and Bo Xu
Session: Sequence Models for ASR
Single-channel Speech Dereverberation via Generative Adversarial Training
Chenxing Li, Tieqiang Wang, Shuang Xu and Bo Xu
Session: Dereverberation
Yamagishi, Junichi
Investigating Accuracy of Pitch-accent Annotations in Neural Network-based Speech Synthesis and Denoising Effects
Hieu-Thi Luong, Xin Wang, Junichi Yamagishi and Nobuyuki Nishizawa
Session: Prosody Modeling and Generation
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion
Massimiliano Todisco, Héctor Delgado, Kong Aik Lee, Md Sahidullah, Nicholas Evans, Tomi Kinnunen and Junichi Yamagishi
Session: Speaker Verification I
Speaker-independent Raw Waveform Model for Glottal Excitation
Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi and Paavo Alku
Session: Voice Conversion and Speech Synthesis
Multimodal Speech Synthesis Architecture for Unsupervised Speaker Adaptation
Hieu-Thi Luong and Junichi Yamagishi
Session: Speech Synthesis Paradigms and Methods
Expressive Speech Synthesis Using Sentiment Embeddings
Igor Jauk, Jaime Lorenzo-Trueba, Junichi Yamagishi and Antonio Bonafonte
Session: Expressive Speech Synthesis
Yamaguchi, Yoshikazu
Automatic DNN Node Pruning Using Mixture Distribution-based Group Regularization
Tsukasa Yoshida, Takafumi Moriya, Kazuho Watanabe, Yusuke Shinohara, Yoshikazu Yamaguchi and Yushi Aono
Session: Selected Topics in Neural Speech Processing
Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition
Takafumi Moriya, Sei Ueno, Yusuke Shinohara, Marc Delcroix, Yoshikazu Yamaguchi and Yushi Aono
Session: Adjusting to Speaker, Accent, and Domain
Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition
Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono and Tatsuya Kawahara
Session: Adjusting to Speaker, Accent, and Domain
Yan, Nan
Acoustic Features Associated with Sustained Vowel and Continuous Speech Productions by Chinese Children with Functional Articulation Disorders
Wang Zhang, Xiangquan Gui, Tianqi Wang, Manwa Ng, Feng Yang, Lan Wang and Nan Yan
Session: Integrating Speech Science and Technology for Clinical Applications
Yan, Yonghong
Cross-Lingual Multi-Task Neural Architecture for Spoken Language Understanding
Yujiang Li, Xuemin Zhao, Weiqun Xu and Yonghong Yan
Session: Spoken Dialogue Systems and Conversational Analysis
Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming
Lu Yin, Ziteng Wang, Risheng Xia, Junfeng Li and Yonghong Yan
Session: Source Separation and Spatial Analysis
Output-Gate Projected Gated Recurrent Unit for Speech Recognition
Gaofeng Cheng, Daniel Povey, Lu Huang, Ji Xu, Sanjeev Khudanpur and Yonghong Yan
Session: Novel Neural Network Architectures for Acoustic Modelling
Investigation on the Combination of Batch Normalization and Dropout in BLSTM-based Acoustic Modeling for ASR
Wenjie Li, Gaofeng Cheng, Fengpei Ge, Pengyuan Zhang and Yonghong Yan
Session: Neural Network Training Strategies for ASR
Deep Convolutional Neural Network with Scalogram for Audio Scene Modeling
Hangting Chen, Pengyuan Zhang, Haichuan Bai, Qingsheng Yuan, Xiuguo Bao and Yonghong Yan
Session: Acoustic Scenes and Rare Events
Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition
Yike Zhang, Pengyuan Zhang and Yonghong Yan
Session: Language Modeling
Yang, Feng
Acoustic Features Associated with Sustained Vowel and Continuous Speech Productions by Chinese Children with Functional Articulation Disorders
Wang Zhang, Xiangquan Gui, Tianqi Wang, Manwa Ng, Feng Yang, Lan Wang and Nan Yan
Session: Integrating Speech Science and Technology for Clinical Applications
Yanushevskaya, Irena
Voice Source Contribution to Prominence Perception: Rd Implementation
Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide and Christer Gobl
Session: Speech Segments and Voice Quality
On the Relationship between Glottal Pulse Shape and Its Spectrum: Correlations of Open Quotient, Pulse Skew and Peak Flow with Source Harmonic Amplitudes
Christer Gobl, Andy Murphy, Irena Yanushevskaya and Ailbhe Ní Chasaide
Session: Speech Segments and Voice Quality
Yarra, Chiranjeevi
Intonation tutor by SPIRE (In-SPIRE): An Online Tool for an Automatic Feedback to the Second Language Learners in Learning Intonation
Anand P A, Chiranjeevi Yarra, Kausthubha N K and Prasanta Kumar Ghosh
Session: Show and Tell 2
SPIRE-SST: An Automatic Web-based Self-learning Tool for Syllable Stress Tutoring (SST) to the Second Language Learners
Chiranjeevi Yarra, Anand P A, Kausthubha N K and Prasanta Kumar Ghosh
Session: Show and Tell 6
Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training
Chandana S, Chiranjeevi Yarra, Ritu Aggarwal, Sanjeev Kumar Mittal, Kausthubha N K, Raseena K T, Astha Singh and Prasanta Kumar Ghosh
Session: Articulatory Information, Modeling and Inversion
Yegnanarayana, Bayya
Discriminating Nasals and Approximants in English Language Using Zero Time Windowing
RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty and Bayya Yegnanarayana
Session: Speech Segments and Voice Quality
Identification and Classification of Fricatives in Speech Using Zero Time Windowing Method
RaviShankar Prasad and Bayya Yegnanarayana
Session: Speech Segments and Voice Quality
Detection of Glottal Closure Instants in Degraded Speech Using Single Frequency Filtering Analysis
Gunnam Aneeja, Sudarsana Reddy Kadiri and Bayya Yegnanarayana
Session: Measuring Pitch and Articulation
Estimation of Fundamental Frequency from Singing Voice Using Harmonics of Impulse-like Excitation Source
Sudarsana Reddy Kadiri and Bayya Yegnanarayana
Session: Measuring Pitch and Articulation
Determining Speaker Location from Speech in a Practical Environment
BHVS Narayanamurthy, JV Satyanarayana and Bayya Yegnanarayana
Session: Show and Tell 6
Breathy to Tense Voice Discrimination using Zero-Time Windowing Cepstral Coefficients (ZTWCCs)
Sudarsana Reddy Kadiri and Bayya Yegnanarayana
Session: Speech Segments and Voice Quality
Analysis and Detection of Phonation Modes in Singing Voice using Excitation Source Features and Single Frequency Filtering Cepstral Coefficients (SFFCC)
Sudarsana Reddy Kadiri and Bayya Yegnanarayana
Session: Deception, Personality, and Culture Attribute
Yeh, Sung-Lin
Self-Assessed Affect Recognition Using Fusion of Attentional BLSTM and Static Acoustic Features
Bo-Hao Su, Sung-Lin Yeh, Ming-Ya Ko, Huan-Yu Chen, Shun-Chang Zhong, Jeng-Lin Li and Chi-Chun Lee
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Yeşilkanat, Ali
LSTM Based Cross-corpus and Cross-task Acoustic Emotion Recognition
Heysem Kaya, Dmitrii Fedotov, Ali Yeşilkanat, Oxana Verkholyak, Yang Zhang and Alexey Karpov
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Yoshioka, Takuya
Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks
Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao and Fil Alleva
Session: Distant ASR
Investigations on Data Augmentation and Loss Functions for Deep Learning Based Speech-Background Separation
Hakan Erdogan and Takuya Yoshioka
Session: Source Separation from Monaural Input
You, Lanhua
Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification
Lanhua You, Wu Guo, Yan Song and Sheng Zhang
Session: Speaker Verification I
Gated Convolutional Neural Network for Sentence Matching
Peixin Chen, Wu Guo, Zhi Chen, Jian Sun and Lanhua You
Session: Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis
Yu, Chengzhu
Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition
Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu
Session: Sequence Models for ASR
A Multistage Training Framework for Acoustic-to-Word Model
Chengzhu Yu, Chunlei Zhang, Chao Weng, Jia Cui and Dong Yu
Session: Sequence Models for ASR
Fearless Steps: Apollo-11 Corpus Advancements for Speech Technologies from Earth to the Moon
John H.L. Hansen, Abhijeet Sangwan, Aditya Joglekar, Ahmet E. Bulut, Lakshmish Kaushik and Chengzhu Yu
Session: Spoken Corpora and Annotation
Yu, Dong
Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition
Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu
Session: Sequence Models for ASR
A Multistage Training Framework for Acoustic-to-Word Model
Chengzhu Yu, Chunlei Zhang, Chao Weng, Jia Cui and Dong Yu
Session: Sequence Models for ASR
Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks
Xuankai Chang, Yanmin Qian and Dong Yu
Session: Robust Speech Recognition
Deep Discriminative Embeddings for Duration Robust Speaker Verification
Na Li, Deyi Tuo, Dan Su, Zhifeng Li and Dong Yu
Session: Speaker Verification Using Neural Network Methods I
Text-Dependent Speech Enhancement for Small-Footprint Robust Keyword Detection
Meng Yu, Xuan Ji, Yi Gao, Lianwu Chen, Jie Chen, Jimeng Zheng, Dan Su and Dong Yu
Session: Topics in Speech Recognition
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis
Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu and Helen Meng
Session: Expressive Speech Synthesis
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures
Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Yu, Jianwei
Gaussian Process Neural Networks for Speech Recognition
Max W. Y. Lam, Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Rongfeng Su, Xunying Liu and Helen Meng
Session: Novel Neural Network Architectures for Acoustic Modelling
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus
Jianwei Yu, Xurong Xie, Shansong Liu, Shoukang Hu, Max W. Y. Lam, Xixin Wu, Ka Ho Wong, Xunying Liu and Helen Meng
Session: Application of ASR in Medical Practice
Yu, Kai
Structured Word Embedding for Low Memory Neural Network Language Model
Kaiyu Shi and Kai Yu
Session: Selected Topics in Neural Speech Processing
High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder
Kuan Chen, Bo Chen, Jiahao Lai and Kai Yu
Session: Voice Conversion and Speech Synthesis
Angular Softmax for Short-Duration Text-independent Speaker Verification
Zili Huang, Shuai Wang and Kai Yu
Session: Speaker Verification Using Neural Network Methods II
Knowledge Distillation for Sequence Model
Mingkun Huang, Yongbin You, Zhehuai Chen, Yanmin Qian and Kai Yu
Session: Acoustic Modelling
Yu, Meng
Text-Dependent Speech Enhancement for Small-Footprint Robust Keyword Detection
Meng Yu, Xuan Ji, Yi Gao, Lianwu Chen, Jie Chen, Jimeng Zheng, Dan Su and Dong Yu
Session: Topics in Speech Recognition
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures
Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu
Session: Deep Learning for Source Separation and Pitch Tracking
Yuan, Jiahong
GlobalTIMIT: Acoustic-Phonetic Datasets for the World’s Languages
Nattanun Chanchaochai, Christopher Cieri, Japhet Debrah, Hongwei Ding, Yue Jiang, Sishi Liao, Mark Liberman, Jonathan Wright, Jiahong Yuan, Juhong Zhan and Yuqing Zhan
Session: Speech Segments and Voice Quality
Pitch Characteristics of L2 English Speech by Chinese Speakers: A Large-scale Study
Jiahong Yuan, Qiusi Dong, Fei Wu, Huan Luan, Xiaofei Yang, Hui Lin and Yang Liu
Session: Second Language Acquisition and Code-switching
Yunusova, Yana
Automatic Detection of Orofacial Impairment in Stroke
Andrea Bandini, Jordan Green, Brian Richburg and Yana Yunusova
Session: Integrating Speech Science and Technology for Clinical Applications
Automatic Early Detection of Amyotrophic Lateral Sclerosis from Intelligible Speech Using Convolutional Neural Networks
Kwanghoon An, Myungjong Kim, Kristin Teplansky, Jordan Green, Thomas Campbell, Yana Yunusova, Daragh Heitzman and Jun Wang
Session: Integrating Speech Science and Technology for Clinical Applications
Yılmaz, Emre
Building a Unified Code-Switching ASR System for South African Languages
Emre Yılmaz, Astik Biswas, Ewald van der Westhuizen, Febe de Wet and Thomas Niesler
Session: Speech Technologies for Code-Switching in Multilingual Communities
Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech
Emre Yılmaz, Henk van den Heuvel and David van Leeuwen
Session: Speech Technologies for Code-Switching in Multilingual Communities
Multilingual Neural Network Acoustic Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech
Astik Biswas, Febe de Wet, Ewald van der Westhuizen, Emre Yılmaz and Thomas Niesler
Session: Topics in Speech Recognition
Articulatory Features for ASR of Pathological Speech
Emre Yılmaz, Vikramjit Mitra, Chris Bartels and Horacio Franco
Session: Application of ASR in Medical Practice
Zafeiriou, Stefanos
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Zarazinski, Adam
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
Zeghidour, Neil
End-to-End Speech Recognition from the Raw Waveform
Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert and Emmanuel Dupoux
Session: Sequence Models for ASR
Sampling Strategies in Siamese Networks for Unsupervised Speech Representation Learning
Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz and Emmanuel Dupoux
Session: Topics in Speech Recognition
Zenkel, Thomas
Term Extraction via Neural Sequence Labeling: A Comparative Evaluation of Strategies Using Recurrent Neural Networks
Maren Kucza, Jan Niehues, Thomas Zenkel, Alex Waibel and Sebastian Stüker
Session: Extracting Information from Audio
Subword and Crossword Units for CTC Acoustic Models
Thomas Zenkel, Ramon Sanabria, Florian Metze and Alex Waibel
Session: ASR Systems and Technologies
Zhang, Chao
Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems
Yu Wang, Chao Zhang, Mark Gales and Philip Woodland
Session: Acoustic Model Adaptation
Semi-tied Units for Efficient Gating in LSTM and Highway Networks
Chao Zhang and Philip Woodland
Session: Novel Neural Network Architectures for Acoustic Modelling
Zhang, Dajie
Self-assessed Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Summary of results
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Heart Beat Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Crying Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Atypical Affect Sub-Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Zhang, Hao
Cross-Corpora Convolutional Deep Neural Network Dereverberation Preprocessing for Speaker Verification and Speech Enhancement
Peter Guzewich, Stephen Zahorian, Xiao Chen and Hao Zhang
Session: Dereverberation
Deep Learning for Acoustic Echo Cancellation in Noisy and Double-Talk Scenarios
Hao Zhang and DeLiang Wang
Session: Deep Enhancement
Zhang, Hui
Improving Mongolian Phrase Break Prediction by Using Syllable and Morphological Embeddings with BiLSTM Model
Rui Liu, Feilong Bao, Guanglai Gao, Hui Zhang and Yonghe Wang
Session: Prosody Modeling and Generation
Using Shifted Real Spectrum Mask as Training Target for Supervised Speech Separation
Yun Liu, Hui Zhang and Xueliang Zhang
Session: Novel Approaches to Enhancement
Zhang, Jinsong
Emotional Prosody Perception in Mandarin-speaking Congenital Amusics
Yixin Zhang, Tianzhu Geng and Jinsong Zhang
Session: Speech Prosody
Analysis of L2 Learners’ Progress of Distinguishing Mandarin Tone 2 and Tone 3
Yue Sun, Win Thuzar Kyaw, Jinsong Zhang and Yoshinori Sagisaka
Session: Second Language Acquisition and Code-switching
A Preliminary Study on Tonal Coarticulation in Continuous Speech
Lixia Hao, Wei Zhang, Yanlu Xie and Jinsong Zhang
Session: Source and Supra-segmentals
Interactions between Vowels and Nasal Codas in Mandarin Speakers’ Perception of Nasal Finals
Chong Cao, Wei Wei, Wei Wang, Yanlu Xie and Jinsong Zhang
Session: Speech and Speaker Perception
Improving Mandarin Tone Recognition Using Convolutional Bidirectional Long Short-Term Memory with Attention
Longfei Yang, Yanlu Xie and Jinsong Zhang
Session: Deep Learning for Source Separation and Pitch Tracking
Zhang, Junbo
Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition
Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang and Lei Xie
Session: Robust Speech Recognition
Attention-based End-to-End Models for Small-Footprint Keyword Spotting
Changhao Shan, Junbo Zhang, Yujun Wang and Lei Xie
Session: Extracting Information from Audio
Empirical Evaluation of Speaker Adaptation on DNN Based Acoustic Model
Ke Wang, Junbo Zhang, Yujun Wang and Lei Xie
Session: Adjusting to Speaker, Accent, and Domain
Zhang, Kailai
Temporal Transformer Networks for Acoustic Scene Classification
Teng Zhang, Kailai Zhang and Ji Wu
Session: Audio Events and Acoustic Scenes
Data Independent Sequence Augmentation Method for Acoustic Scene Classification
Teng Zhang, Kailai Zhang and Ji Wu
Session: Acoustic Scenes and Rare Events
Multi-modal Attention Mechanisms in LSTM and Its Application to Acoustic Scene Classification
Teng Zhang, Kailai Zhang and Ji Wu
Session: Acoustic Scenes and Rare Events
Zhang, Pengyuan
Investigation on the Combination of Batch Normalization and Dropout in BLSTM-based Acoustic Modeling for ASR
Wenjie Li, Gaofeng Cheng, Fengpei Ge, Pengyuan Zhang and Yonghong Yan
Session: Neural Network Training Strategies for ASR
Deep Convolutional Neural Network with Scalogram for Audio Scene Modeling
Hangting Chen, Pengyuan Zhang, Haichuan Bai, Qingsheng Yuan, Xiuguo Bao and Yonghong Yan
Session: Acoustic Scenes and Rare Events
Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition
Yike Zhang, Pengyuan Zhang and Yonghong Yan
Session: Language Modeling
Zhang, ShiLiang
Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning
ShiLiang Zhang and Ming Lei
Session: Sequence Models for ASR
Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting
Mengzhe Chen, ShiLiang Zhang, Ming Lei, Yong Liu, Haitao Yao and Jie Gao
Session: Topics in Speech Recognition
Zhang, Teng
Temporal Transformer Networks for Acoustic Scene Classification
Teng Zhang, Kailai Zhang and Ji Wu
Session: Audio Events and Acoustic Scenes
Multi-modal Attention Mechanisms in LSTM and Its Application to Acoustic Scene Classification
Teng Zhang, Kailai Zhang and Ji Wu
Session: Acoustic Scenes and Rare Events
Zhang, Wang
Acoustic Features Associated with Sustained Vowel and Continuous Speech Productions by Chinese Children with Functional Articulation Disorders
Wang Zhang, Xiangquan Gui, Tianqi Wang, Manwa Ng, Feng Yang, Lan Wang and Nan Yan
Session: Integrating Speech Science and Technology for Clinical Applications
Zhang, Xuedong
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
Session: Application of ASR in Medical Practice
Zhang, Xueliang
Using Shifted Real Spectrum Mask as Training Target for Supervised Speech Separation
Yun Liu, Hui Zhang and Xueliang Zhang
Session: Novel Approaches to Enhancement
Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks
Zhong-Qiu Wang, Xueliang Zhang and DeLiang Wang
Session: Deep Learning for Source Separation and Pitch Tracking
Zhang, Yang
LSTM Based Cross-corpus and Cross-task Acoustic Emotion Recognition
Heysem Kaya, Dmitrii Fedotov, Ali Yeşilkanat, Oxana Verkholyak, Yang Zhang and Alexey Karpov
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Zhang, Zixing
Evolving Learning for Analysing Mood-Related Infant Vocalisation
Zixing Zhang, Jing Han, Kun Qian and Björn Schuller
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
Automated Classification of Children’s Linguistic versus Non-Linguistic Vocalisations
Zixing Zhang, Alejandrina Cristia, Anne Warlaumont and Björn Schuller
Session: Second Language Acquisition and Code-switching
Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition
Ziping Zhao, Yu Zheng, Zixing Zhang, Haishuai Wang, Yiqin Zhao and Chao Li
Session: Speaker State and Trait
Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech
Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval and Björn Schuller
Session: Representation Learning for Emotion
Zhao, Guanlong
Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function
Shaojin Ding, Guanlong Zhao, Christopher Liberatore and Ricardo Gutierrez-Osuna
Session: Voice Conversion
L2-ARCTIC: A Non-native English Speech Corpus
Guanlong Zhao, Sinem Sonsaat, Alif Silpachai, Ivana Lucic, Evgeny Chukharev-Hudilainen, John Levis and Ricardo Gutierrez-Osuna
Session: Spoken Corpora and Annotation
Zheng, Yibin
BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End
Yibin Zheng, Jianhua Tao, Zhengqi Wen and Ya Li
Session: Prosody Modeling and Generation
Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis
Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
Session: Statistical Parametric Speech Synthesis
On the Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis
Yibin Zheng, Jianhua Tao, Zhengqi Wen and Ruibo Fu
Session: Statistical Parametric Speech Synthesis
Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer
Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
Session: Speech Synthesis Paradigms and Methods
Zhong, Shun-Chang
Self-Assessed Affect Recognition Using Fusion of Attentional BLSTM and Static Acoustic Features
Bo-Hao Su, Sung-Lin Yeh, Ming-Ya Ko, Huan-Yu Chen, Shun-Chang Zhong, Jeng-Lin Li and Chi-Chun Lee
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 2
Zhou, Shiyu
Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese
Shiyu Zhou, Linhao Dong, Shuang Xu and Bo Xu
Session: Sequence Models for ASR
Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin
Linhao Dong, Shiyu Zhou, Wei Chen and Bo Xu
Session: Sequence Models for ASR
Zhu, Anna
Multi-frame Quantization of LSF Parameters Using a Deep Autoencoder and Pyramid Vector Quantizer
Yaxing Li, Eshete Derb Emiru, Shengwu Xiong, Anna Zhu, Pengfei Duan and Yichang Li
Session: Coding
Multi-frame Coding of LSF Parameters Using Block-Constrained Trellis Coded Vector Quantization
Yaxing Li, Shan Xu, Shengwu Xiong, Anna Zhu, Pengfei Duan and Yueming Ding
Session: Coding
Zhukova, Yulia
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov and Maria Usova
Session: Speaker Verification II
Zisserman, Andrew
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung, Arsha Nagrani and Andrew Zisserman
Session: Speaker Verification II
The Conversation: Deep Audio-Visual Speech Enhancement
Triantafyllos Afouras, Joon Son Chung and Andrew Zisserman
Session: Deep Enhancement
Deep Lip Reading: A Comparison of Models and an Online Application
Triantafyllos Afouras, Joon Son Chung and Andrew Zisserman
Session: Multimodal Systems
Zou, Yuexian
Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with an Acoustic Vector Sensor
Disong Wang and Yuexian Zou
Session: Source Separation and Spatial Analysis
Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition
Danqing Luo, Yuexian Zou and Dongyan Huang
Session: The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1
de Wet, Febe
Building a Unified Code-Switching ASR System for South African Languages
Emre Yılmaz, Astik Biswas, Ewald van der Westhuizen, Febe de Wet and Thomas Niesler
Session: Speech Technologies for Code-Switching in Multilingual Communities
Multilingual Neural Network Acoustic Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech
Astik Biswas, Febe de Wet, Ewald van der Westhuizen, Emre Yılmaz and Thomas Niesler
Session: Topics in Speech Recognition
ten Bosch, Louis
Analyzing Reaction Time Sequences from Human Participants in Auditory Experiments
Louis ten Bosch, Mirjam Ernestus and Lou Boves
Session: Models of Speech Perception
Analyzing EEG Signals in Auditory Speech Comprehension Using Temporal Response Functions and Generalized Additive Models
Kimberley Mulder, Louis ten Bosch and Lou Boves
Session: Cognition and Brain Studies
Information Encoding by Deep Neural Networks: What Can We Learn?
Louis ten Bosch and Lou Boves
Session: Deep Neural Networks: How Can We Interpret What They Learned?
Implementing DIANA to Model Isolated Auditory Word Recognition in English
Filip Nenadić, Louis ten Bosch and Benjamin V. Tucker
Session: Speech and Speaker Perception
van Hamme, Hugo
Capsule Networks for Low Resource Spoken Language Understanding
Vincent Renkens and Hugo van Hamme
Session: Spoken Dialogue Systems and Conversational Analysis
State Gradients for RNN Memory Analysis
Lyan Verwimp, Hugo van Hamme, Vincent Renkens and Patrick Wambacq
Session: Deep Neural Networks: How Can We Interpret What They Learned?
Memory Time Span in LSTMs for Multi-Speaker Source Separation
Jeroen Zegers and Hugo van Hamme
Session: Deep Neural Networks: How Can We Interpret What They Learned?
van Hout, Julien
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson and Martin Graciarena
Session: Speaker Verification II
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson and Karl Ni
Session: Robust Speech Recognition
Analysis of Complementary Information Sources in the Speaker Embeddings Framework
Mahesh Kumar Nandwana, Mitchell McLaren, Diego Castan, Julien van Hout and Aaron Lawson
Session: Speaker Verification Using Neural Network Methods II
van der Westhuizen, Ewald
Building a Unified Code-Switching ASR System for South African Languages
Emre Yılmaz, Astik Biswas, Ewald van der Westhuizen, Febe de Wet and Thomas Niesler
Session: Speech Technologies for Code-Switching in Multilingual Communities
Multilingual Neural Network Acoustic Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech
Astik Biswas, Febe de Wet, Ewald van der Westhuizen, Emre Yılmaz and Thomas Niesler
Session: Topics in Speech Recognition
Černocký, Jan
Dereverberation and Beamforming in Robust Far-Field Speaker Recognition
Ladislav Mošner, Oldřich Plchot, Pavel Matějka, Ondřej Novotný and Jan Černocký
Session: Dereverberation
BUT OpenSAT 2017 Speech Recognition System
Martin Karafiát, Murali Karthick Baskar, Igor Szöke, Vladimír Malenovský, Karel Veselý, František Grézl, Lukáš Burget and Jan Černocký
Session: Topics in Speech Recognition
Lightly Supervised vs. Semi-supervised Training of Acoustic Model on Luxembourgish for Low-resource Automatic Speech Recognition
Karel Veselý, Carlos Segura, Igor Szöke, Jordi Luque and Jan Černocký
Session: Neural Network Training Strategies for ASR
BUT System for Low Resource Indian Language ASR
Bhargav Pulugundla, Murali Karthick Baskar, Santosh Kesiraju, Ekaterina Egorova, Martin Karafiát, Lukáš Burget and Jan Černocký
Session: Low Resource Speech Recognition Challenge for Indian Languages
Žmolíková, Kateřina
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner and Pavel Matějka
Session: The First DIHARD Speech Diarization Challenge