3:30, ITT-L3.1
STANDARDIZATION OF THE SELECTABLE MODE VOCODER
S. GREER, A. DEJACO
The migration from the first generation of cellular telephony to the second has included a transition from an analog speech channel to a digital channel that employs digital speech codecs. As the deployment of these second-generation systems matures, system capacity concerns have increased the pressure for more efficient encoding of speech. At the same time, market pressures have added the competing requirement that these wireless systems provide improved voice quality. This paper presents the motivation for, and the execution of, a program to standardize a new variable-rate speech codec for the cdmaOne/cdma2000 wireless system. Several codecs have previously been standardized. This new codec, known as the selectable mode vocoder or SMV, offers a significant improvement in voice quality over that of existing codec standards, as well as the flexibility of allowing the system operator to make tradeoffs between voice quality and system capacity.
3:50, ITT-L3.2
2.4 KB/SEC COMPRESSED DOMAIN TELECONFERENCE BRIDGE WITH UNIVERSAL TRANSCODER
R. ZINSER, P. CHOONG, S. KOCH
Advanced new technologies, such as cellular-telephone-quality ultra-low-rate speech coders, model-domain transcoders, and compressed-domain conferencing algorithms, provide an opportunity to develop a compressed-domain conference bridge system for use in secure, survivable military communications environments. The new conference bridge will allow seamless interoperability with diverse voice terminals and enable full-duplex teleconference operation. Unlike users of half-duplex systems, conferencing participants will be able to talk at the same time and hear the two most relevant simultaneous talkers over a single 2.4 kbps connection. This paper describes a system architecture that implements these features. Compared to conventional multicast conferencing algorithms, the new system will consume a significantly smaller portion of the satellite resources: for N conference participants, conventional multicast requires N^2 channels, while the new system will use only 2N channels.
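The channel counts quoted in this abstract can be tabulated with a few lines of Python. This is only an illustrative sketch of the N^2 versus 2N arithmetic; the function names are hypothetical and do not come from the paper.

```python
def multicast_channels(n: int) -> int:
    """Channels for conventional multicast, per the abstract: N^2."""
    return n * n


def bridge_channels(n: int) -> int:
    """Channels for the compressed-domain bridge, per the abstract: 2N."""
    return 2 * n


def channel_savings(n: int) -> int:
    """Channels saved by the bridge relative to multicast."""
    return multicast_channels(n) - bridge_channels(n)


# Print a small comparison table for a few conference sizes.
for n in (3, 5, 10, 20):
    print(n, multicast_channels(n), bridge_channels(n), channel_savings(n))
```

For a ten-party conference this gives 100 channels against 20. The two counts are equal at N = 2, so the saving appears only for conferences of three or more participants.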
4:10, ITT-L3.3
ON INTEGRATING ACOUSTIC ECHO AND NOISE CANCELLATION SYSTEMS FOR HANDS-FREE TELEPHONY
S. PARK, C. CHO, C. LEE, D. YOUN
An integrated acoustic echo and noise cancellation system for hands-free telephony is presented. The proposed system includes a new residual echo cancellation scheme based on spectral analysis and a new double-talk detector suitable for real-time implementation. The residual echo is whitened via autoregressive (AR) analysis during periods without near-end speech and is then cancelled by noise reduction. Because whitening removes the speech characteristics of the residual echo signal, noise reduction successfully reduces the power of the residual echo as well as that of the ambient noise. For further integration with commercial low-bit-rate speech coders, the noise reduction in IS-127 (EVRC) was considered.
For hands-free operation in a moving car, the proposed system attenuated the interference by more than 30 dB at a constant speed of 90 km/h. The proposed system was implemented on a low-cost DSP with 16-bit fixed-point arithmetic.
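The idea of whitening a signal through AR analysis can be sketched generically as follows. This is not the authors' algorithm: it is a textbook autocorrelation-method linear predictor (Levinson-Durbin recursion) applied to a synthetic AR(1) signal, and every function name, the model order, and the test signal are assumptions made for illustration.

```python
import random


def autocorr(x, max_lag):
    """Biased (unnormalized) autocorrelation r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, err): prediction-error filter with a[0] = 1, and residual power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err  # reflection coefficient for stage m
        new_a = a[:]
        for k in range(1, m):
            new_a[k] = a[k] + k_m * a[m - k]
        new_a[m] = k_m
        a = new_a
        err *= 1.0 - k_m * k_m
    return a, err


def whiten(x, a):
    """Apply the FIR prediction-error filter: e[n] = sum_k a[k] * x[n - k]."""
    return [
        sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        for n in range(len(x))
    ]


def lag1_corr(x):
    """Normalized lag-1 autocorrelation; near 0 for a white signal."""
    r = autocorr(x, 1)
    return r[1] / r[0]


def demo_signal(n, seed=0, coeff=0.9):
    """Hypothetical stand-in for a correlated signal: AR(1) driven by Gaussian noise."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = coeff * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


x = demo_signal(4000, seed=1)
a, err = levinson_durbin(autocorr(x, 2), 2)
e = whiten(x, a)
print("lag-1 correlation before:", lag1_corr(x))
print("lag-1 correlation after: ", lag1_corr(e))
```

The residual e has a much flatter spectrum than x, which is the property that lets a downstream noise suppressor treat it like ambient noise. A fixed-point implementation such as the one the abstract mentions would need scaling and saturation handling that this floating-point sketch ignores.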
4:30, ITT-L3.4
A NEW W3C MARKUP STANDARD FOR TEXT-TO-SPEECH SYNTHESIS
M. WALKER, J. LARSON, A. HUNT
A new set of XML-based markup standards developed for the purpose of enabling voice browsing of the Internet will begin emerging in 2001 from the Voice Browser working group, recently organized under the auspices of the W3C. Among the first in this series of soon-to-be-released specifications is the speech synthesis text markup standard. The Speech Synthesis Markup Language (SSML) Specification is largely based on JSML, but also incorporates elements and concepts from SABLE, a previously published text markup standard, and from VoiceXML, which is itself based on JSML and SABLE. SSML also includes new elements designed to optimize the capabilities of contemporary speech synthesis engines in the task of converting text into speech. This paper summarizes the markup element design philosophy and includes descriptions of each of the speech synthesis markup elements.
4:50, ITT-L3.5
SPOKEN WORD RECOGNITION WITH DIGITAL COCHLEA USING 32 DSP-BOARDS
M. NAMIKI, S. HANGAI, T. HAMAMOTO
A digital cochlea, which has a cascade of 16 filter sections, is realized with 32 commercially available DSP boards. Each section consists of a travelling-wave filter, a velocity transformation filter, and a second filter. The artificial cochlea is also applied to spoken word recognition by feeding its 16 output signals through a multi-channel A/D converter on a PC. From the experimental results, it is found that 50 Japanese words uttered by three speakers are recognized with a 3% error rate. This indicates that the cochlea extracts feature parameters useful for speech recognition and shows the potential of the signal processor for cochlear implants.
5:10, ITT-L3.6
FORCE XXI LAND WARRIOR: A SYSTEMS APPROACH TO SPEECH RECOGNITION
C. BROUN, W. CAMPBELL
Speech recognition is continually being realized as a user interface in new applications. As this technology progresses, it enables new ways for humans to interact with machines and information. The performance in many domains has approached users’ expectations. Although there are still abundant technology challenges ahead, speech recognition has reached a maturity level that requires one to consider its deployment in complex systems and environments. It is in this vein that we discuss a systems approach to the successful execution of speech recognition within the Force XXI Land Warrior program.