The Expressive Power of Word Embeddings
Abstract: We seek to better understand the information encoded in word embeddings. We propose several tasks that help to distinguish the characteristics of different publicly released embeddings. Our evaluation shows that embeddings are able to capture surprisingly nuanced semantics even in the absence of sentence structure. Moreover, benchmarking the embeddings shows great variance in the quality and characteristics of the semantics they capture. Finally, we show the impact of varying the number of dimensions and the resolution of each dimension on the useful features effectively captured by the embedding space. Our contributions highlight the importance of embeddings for NLP tasks and the effect of their quality on the final results.
ICML version (pdf)
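As a rough illustration of the kind of probing described above, the sketch below (not from the paper; the vocabulary and vectors are made-up placeholders) checks how a nearest-neighbour query under cosine similarity changes as the number of embedding dimensions is truncated.

```python
# Minimal sketch, assuming hypothetical toy embeddings rather than any of the
# publicly released ones evaluated in the paper.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["king", "queen", "apple", "banana", "car"]
emb = rng.normal(size=(len(vocab), 64))                  # pretend 64-d embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def nearest(word, dims):
    """Nearest neighbour of `word` using only the first `dims` dimensions."""
    sub = emb[:, :dims]
    sub = sub / np.linalg.norm(sub, axis=1, keepdims=True)
    q = sub[vocab.index(word)]
    sims = sub @ q
    sims[vocab.index(word)] = -np.inf                    # exclude the query itself
    return vocab[int(np.argmax(sims))]

for d in (4, 16, 64):
    print(d, nearest("king", d))
```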
Deep Learning for Topical Words and Thematic Sentences
Abstract: This paper presents a hierarchical theme and topic model for deep representation of sentences and words from heterogeneous documents. We extract the latent themes from sentences and simultaneously identify the latent topics from words in different sentences. We flexibly conduct structural learning via Bayesian nonparametrics, where the numbers of themes and topics are unknown. A tree stick-breaking process is proposed to determine the theme proportions for sentence representation, and a hierarchical Dirichlet process is adopted to sample the topical words of a text corpus under the same theme. In the experiments, the proposed method is shown to be effective for finding topical words and thematic sentences in the DUC 2007 corpus.
ICML version (pdf)
Text Segmentation with Character-level Text Embeddings
G. Chrupala
Abstract: Learning word representations has recently seen much success in computational linguistics. However, assuming sequences of word tokens as input to linguistic analysis is often unjustified. For many languages word segmentation is a non-trivial task, and naturally occurring text is sometimes a mixture of natural language strings and other character data. We propose to learn text representations directly from raw character sequences by training a Simple Recurrent Network to predict the next character in text. The network uses its hidden layer to evolve abstract representations of the character sequences it sees. To demonstrate the usefulness of the learned text embeddings, we use them as features in a supervised character-level text segmentation and labeling task: recognizing spans of text containing programming language code. Using the embeddings as features, we are able to substantially improve over a baseline that uses only surface character n-grams.
ICML version (pdf)
Deep Learning Based on Manhattan Update Rule
Y. Hifny
Abstract: Acoustic models based on Deep Neural Networks (DNNs) lead to significant improvements in recognition accuracy. In these methods, Hidden Markov Model (HMM) state scores are computed using flexible discriminant DNNs. Training DNNs is computationally expensive, and efficient training of DNNs is an active area of research. Similar to HMMs, Deep Conditional Random Fields (DCRFs) use DNNs to compute state scores. In this paper, we present a method to estimate DCRFs using the Manhattan (MH) update rule. The Manhattan update rule does not involve the gradient magnitude. The method is general and can be used to estimate any model for which the gradient can be computed.
ICML version (pdf)
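For intuition, the following sketch shows one plausible reading of a sign-based (Manhattan) update, in which each parameter moves by a fixed step in the direction of its gradient's sign. This is an assumption made for illustration, not the authors' implementation.

```python
# Minimal sketch (an assumed interpretation, not the paper's code): update each
# parameter by a fixed step in the direction of the gradient's sign, so the
# gradient magnitude plays no role.
import numpy as np

def manhattan_update(params, grads, step=0.01):
    """Apply a sign-based (Manhattan) update to a list of parameter arrays."""
    return [p - step * np.sign(g) for p, g in zip(params, grads)]

# Toy usage on a single weight matrix with a made-up gradient.
w = np.zeros((2, 3))
g = np.array([[0.5, -2.0, 0.0], [1e-4, -1e-4, 3.0]])
w = manhattan_update([w], [g])[0]
print(w)   # every entry with a nonzero gradient moved by exactly 0.01
```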
Acoustic Modeling Based on Deep Conditional Random Fields
Y. Hifny
Abstract: Acoustic modeling based on Hidden Markov Models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. In continuous density HMMs, the state scores are computed using Gaussian mixture models. Alternatively, Deep Neural Networks (DNNs) can be used to compute the HMM state scores, which leads to significant improvements in recognition accuracy. Conditional Random Fields (CRFs) are undirected graphical models that maintain the Markov properties of HMMs and are formulated using the maximum entropy (MaxEnt) principle. It is also possible to use DNNs to compute the state scores in CRFs. Using CRFs on top of DNNs leads to an acoustic model known as Deep Conditional Random Fields (DCRFs). In this paper, we present a phone recognition task based on DCRFs. Preliminary results on the TIMIT task show that DCRFs can lead to good results.
ICML version (pdf)
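The sketch below illustrates the kind of scoring the DCRF abstract describes: per-frame state scores from a neural network combined with transition scores in a linear-chain decode. It is a toy illustration under my own assumptions, not the authors' system.

```python
# Minimal sketch: Viterbi decoding over DNN-produced state scores plus a
# transition matrix, the scoring structure a linear-chain CRF on top of a
# DNN would use. Inputs here are random placeholders.
import numpy as np

def viterbi(emissions, transitions):
    """emissions: (T, S) per-frame state scores; transitions: (S, S) transition scores."""
    T, S = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)      # best previous state for each current state
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(6, 4)), rng.normal(size=(4, 4))))  # toy 6-frame, 4-state decode
```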
Vocal Tract Length Perturbation (VTLP) Improves Speech Recognition
N. Jaitly and G. Hinton
Abstract: Augmenting datasets by transforming inputs in a way that does not change the label is a crucial ingredient of state-of-the-art methods for object recognition using neural networks. However, this approach has (to our knowledge) not been exploited successfully in speech recognition (with or without neural networks). In this paper we lay the foundation for this approach and show one way of augmenting speech datasets by transforming spectrograms with a random linear warping along the frequency dimension. In practice this can be achieved using the warping techniques employed for vocal tract length normalization (VTLN), with the difference that a warp factor is generated randomly each time during training, rather than fitting a single warp factor to each training and test speaker (or utterance). At test time, a prediction is made by averaging the predictions over multiple warp factors. When this technique is applied to TIMIT using Deep Neural Networks (DNNs) of different depths, the Phone Error Rate (PER) improves by an average of 0.65% on the test set. For a Convolutional Neural Network (CNN) with a convolutional layer at the bottom, a gain of 1.0% is observed. These improvements are achieved without increasing the number of training epochs, and they suggest that data transformations should be an important component of training neural networks for speech, especially for data-limited projects.
ICML version (pdf)
Rectifier Nonlinearities Improve Neural Network Acoustic Models
Abstract: Deep neural network acoustic models produce substantial gains in large-vocabulary continuous speech recognition systems. Emerging work with rectified linear (ReL) hidden units demonstrates additional gains in final system performance relative to the more commonly used sigmoidal nonlinearities. In this work, we explore the use of deep rectifier networks as acoustic models for the 300-hour Switchboard conversational speech recognition task. Using simple training procedures without pretraining, networks with rectifier nonlinearities produce 2% absolute reductions in word error rates over their sigmoidal counterparts. We analyze hidden-layer representations to quantify differences in how ReL units encode inputs compared to sigmoidal units. Finally, we evaluate a variant of the ReL unit with a gradient more amenable to optimization, in an attempt to further improve deep rectifier networks.
ICML version (pdf)
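As a toy illustration of the contrast the rectifier abstract discusses, the sketch below (synthetic data, not the paper's acoustic model) compares how sparsely rectified linear and sigmoid units encode the same pre-activations in a single hidden layer.

```python
# Minimal sketch, assuming a made-up single hidden layer over random "frames";
# it only demonstrates the qualitative ReL-vs-sigmoid encoding difference.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 40))            # 100 toy frames, 40 features
W = rng.normal(size=(40, 256)) * 0.1      # one hidden layer of 256 units
pre = x @ W

relu_h = np.maximum(0.0, pre)             # ReL: negative inputs become exact zeros
sigm_h = 1.0 / (1.0 + np.exp(-pre))       # sigmoid: everything squashed into (0, 1)

print("ReL units exactly zero:  %.1f%%" % (100 * (relu_h == 0).mean()))
print("sigmoid units below 0.1: %.1f%%" % (100 * (sigm_h < 0.1).mean()))
```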
Effect of Non-linear Deep Architecture in Sequence Labeling
Abstract: If we compare the widely used Conditional Random Fields (CRFs) with the newly proposed “deep architecture” sequence models (Collobert et al., 2011), two things change: the architecture goes from linear to non-linear, and the feature representation goes from discrete to distributional. It is unclear, however, what utility non-linearity offers in conventional feature-based models. In this study, we show the close connection between CRFs and “sequence model” neural nets, and present an empirical investigation comparing their performance on two sequence labeling tasks: Named Entity Recognition and Syntactic Chunking. Our results suggest that non-linear models are highly effective in low-dimensional distributional spaces. Somewhat surprisingly, we find that a non-linear architecture offers no benefits in a high-dimensional discrete feature space.
ICML version (pdf)
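To make the linear-versus-non-linear contrast concrete, the sketch below (synthetic data and a hypothetical setup, not the paper's experiments) compares a linear classifier and a small MLP on dense, low-dimensional features versus sparse, high-dimensional discrete features.

```python
# Minimal sketch under assumed synthetic data; scikit-learn is assumed available.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 2000
labels = rng.integers(0, 2, size=n)

dense = rng.normal(size=(n, 50)) + labels[:, None] * 0.5    # low-dimensional, distributional
sparse = (rng.random((n, 5000)) < 0.002).astype(float)      # high-dimensional, discrete
sparse[np.arange(n), labels] = 1.0                          # a few label-indicative features

for name, X in [("dense", dense), ("sparse", sparse)]:
    linear = LogisticRegression(max_iter=1000).fit(X[:1500], labels[:1500])
    mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X[:1500], labels[:1500])
    print(name,
          "linear:", round(linear.score(X[1500:], labels[1500:]), 3),
          "mlp:", round(mlp.score(X[1500:], labels[1500:]), 3))
```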