Abstract: We will review the design, analysis, and implementation of several sparsity-promoting learning algorithms. We start with an efficient algorithm for projecting gradient updates onto the L1 ball. We then describe a forward-backward splitting (Fobos) method that incorporates L1 and mixed-norm regularization. We next present adaptive-gradient versions of the above methods that generalize well-studied subgradient methods. We conclude with a description of a recent approach to "sparse counting" which facilitates compact yet accurate language modeling.
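The two building blocks named above can be illustrated concretely. The sketch below is not the talk's implementation: it uses the standard O(n log n) sort-based L1-ball projection (the method discussed in the talk is more efficient, but computes the same projection) and the closed-form soft-thresholding that serves as the backward step in Fobos with L1 regularization.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto the L1 ball {w : ||w||_1 <= radius}.

    Sort-based O(n log n) variant; an expected-linear-time version exists,
    but the resulting projection is identical.
    """
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                  # magnitudes, descending
    css = np.cumsum(u)
    # Largest index rho with u[rho] * (rho+1) > css[rho] - radius
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)     # shrinkage threshold
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def fobos_l1_step(w, grad, eta, lam):
    """One forward-backward (Fobos) step with L1 regularization:
    a gradient step followed by soft-thresholding (the closed-form
    solution of the backward, proximal step)."""
    w = w - eta * grad                            # forward (gradient) step
    return np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0)
```

Both operations zero out small coordinates exactly, which is the mechanism by which these methods promote sparsity.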
Return to Keynote and Invited Speakers
Abstract: Object recognition is a little like translation: a picture (text in a source language) goes in, and a description (text in a target language) comes out. I will use this analogy, which has proven fertile, to describe recent progress in object recognition.
We have very good methods to spot some objects in images, but extending these methods to produce descriptions of images remains very difficult. The description might come in the form of a set of words, indicating objects, and boxes or regions spanned by the object. This representation is difficult to work with, because some objects seem to be much more important than others, and because objects interact. An alternative is a sentence or a paragraph describing the picture, and recent work indicates how one might generate rich structures like this. Furthermore, recent work suggests that it is easier and more effective to generate descriptions of images in terms of chunks of meaning ("person on a horse") rather than just objects ("person"; "horse").
Finally, if the picture contains objects that are unfamiliar, then we need to generate useful descriptions that will make it possible to interact with them, even though we don't know what they are.
Abstract: In February 2011, IBM's Watson system became the first computer contestant on Jeopardy!, a popular quiz show on American television. In the two-game match, Watson defeated the show's two all-time best players, along the way demonstrating its impressive ability to answer natural language questions with high precision and speed.
Underlying Watson are technologies that build on and integrate advances in Natural Language Processing (NLP), Information Retrieval (IR), Machine Learning (ML), Knowledge Representation and Reasoning (KR&R), and Text-to-Speech (TTS) over the last few decades. Watson was developed as part of the DeepQA project, which was aimed at exploring how integrating NLP, IR, ML, and KR&R technologies, as well as massively parallel computation, can advance the science and application of automatic Question Answering. Open domain Question Answering holds tremendous promise for facilitating informed decision making over vast volumes of natural language content. Applications in business intelligence, healthcare, customer support, enterprise knowledge management, social computing, science and government could all benefit from computer systems capable of deeper language understanding.
Attaining champion-level performance at Jeopardy! requires a computer to rapidly and accurately answer rich open-domain questions, and to predict its own performance on any given question. The system must deliver high degrees of precision and confidence over a very broad range of knowledge and natural language content with a 3-second response time. To do this, the DeepQA team advanced a broad array of NLP techniques to find and generate many competing hypotheses over large volumes of natural language content, and to gather and analyze the evidence for each, in building Watson (www.ibmwatson.com). An important contributor to Watson's success is its ability to automatically learn and combine accurate confidences across a wide array of algorithms and over different dimensions of evidence. Watson produced accurate confidences to know when to "buzz in" against its competitors and how much to bet. High precision and accurate confidence computations are critical for real business settings, where helping users focus on the right content sooner and with greater confidence can make all the difference. The need for speed and high precision demands a massively parallel computing platform capable of generating, evaluating, and combining thousands of hypotheses and their associated evidence. In this talk, I will introduce the audience to the Jeopardy! Challenge, explain how Watson was built on DeepQA to ultimately achieve high performance in precision, confidence, and speed, and discuss applications of the Watson technology in areas such as healthcare.
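The idea of learning to combine per-algorithm confidences can be sketched schematically. The toy example below is not DeepQA's actual model: it fits a simple logistic combiner over synthetic scorer outputs (each column standing in for one evidence-scoring algorithm) and uses the resulting probability-like confidence to decide whether to "buzz in" against a threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic evidence, purely for illustration: each row is a candidate
# answer, each column a score from one hypothetical evidence scorer;
# labels mark whether that candidate was actually correct.
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = (X @ true_w + rng.normal(scale=0.5, size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit a logistic combiner by gradient descent: it learns how much to
# trust each scorer and maps scores to a single confidence in [0, 1].
w = np.zeros(3)
for _ in range(2000):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - y) / len(y)

def confidence(scores):
    return sigmoid(scores @ w)

# "Buzz in" only when the combined confidence clears a threshold.
should_buzz = confidence(np.array([1.5, -0.5, 0.2])) > 0.8
```

The key property, as in the abstract, is calibration: the combiner's output must be reliable enough to gate both buzzing and betting decisions.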
Abstract: Speech production is a highly complex sensorimotor task involving tightly coordinated processing in the frontal, temporal, and parietal lobes of the cerebral cortex. To better understand these processes, our laboratory has designed, experimentally tested, and iteratively refined a neural network model whose components correspond to the brain regions involved in speech. Babbling and imitation phases are used to train neural mappings between phonological, articulatory, auditory, and somatosensory representations. After learning, the model can produce syllables and words it has learned by generating movements of an articulatory synthesizer. Because the model’s components correspond to neural populations and are given precise anatomical locations, activity in the model’s cells can be compared directly to neuroimaging data. Computer simulations of the model account for a wide range of experimental findings, including data on acquisition of speaking skills, articulatory kinematics, and brain activity during speech. Furthermore, “damaged” versions of the model are being used to investigate several communication disorders, including stuttering, apraxia of speech, and spasmodic dysphonia. Finally, the model was used to guide development of a neural prosthesis aimed at restoring speech output to a profoundly paralyzed individual with an electrode permanently implanted in his speech motor cortex. The volunteer maintained a 70% hit rate after 5-10 practice attempts of each vowel in a vowel production task, supporting the feasibility of brain-machine interfaces with the potential to restore conversational speech abilities to the profoundly paralyzed.