1:00, AUDIO-L3.1
A PREDOMINANT-F0 ESTIMATION METHOD FOR CD RECORDINGS: MAP ESTIMATION USING EM ALGORITHM FOR ADAPTIVE TONE MODELS
M. GOTO
This paper describes a predominant-F0 (fundamental frequency) estimation method called PreFEst, which can detect melody and bass lines in monaural audio signals containing sounds of various instruments. While most previous methods assumed mixtures of only a few sounds and had difficulty dealing with such complex signals, our method can estimate the F0 of the melody and bass lines in compact-disc recordings without assuming the number of sound sources. In this paper we propose three extensions to our previous PreFEst to make it more adaptive and flexible: introducing multiple harmonic-structure tone models, estimating the shape of the tone models, and introducing a prior distribution over the tone-model shapes and F0 estimates. These extensions were implemented as MAP (Maximum A Posteriori Probability) estimation using the Expectation-Maximization (EM) algorithm. Experimental results with compact-disc recordings showed that our real-time system based on the extended PreFEst achieved improved performance.
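As an illustration only, the following sketch applies EM to a mixture of fixed harmonic templates, which is a much simpler setting than the one the abstract describes: it has no prior (so it is maximum likelihood rather than MAP) and does not adapt the template shapes. The semitone-bin axis, the harmonic offsets, the Gaussian width, and the names `tone_model` and `em_weights` are all assumptions made for this example, not details taken from the paper.

```python
import math

# Assumed layout: one bin per semitone, harmonics 1x-4x at rounded semitone offsets.
HARMONIC_OFFSETS = (0, 12, 19, 24)

def tone_model(f0_bin, n_bins, width=1.0):
    """Fixed harmonic template: equal-height Gaussian bumps, normalized to sum to 1."""
    shape = [0.0] * n_bins
    for offset in HARMONIC_OFFSETS:
        center = f0_bin + offset
        for x in range(n_bins):
            shape[x] += math.exp(-0.5 * ((x - center) / width) ** 2)
    total = sum(shape)
    return [v / total for v in shape]

def em_weights(observed, models, n_iter=50):
    """Estimate mixture weights of fixed templates by EM (no prior)."""
    n_models = len(models)
    weights = [1.0 / n_models] * n_models
    for _ in range(n_iter):
        new_weights = [0.0] * n_models
        for x, mass in enumerate(observed):
            mix = sum(w * m[x] for w, m in zip(weights, models))
            if mass == 0.0 or mix == 0.0:
                continue
            for k in range(n_models):
                # E-step responsibility of template k for bin x,
                # accumulated with the observed mass for the M-step.
                new_weights[k] += mass * weights[k] * models[k][x] / mix
        norm = sum(new_weights)
        weights = [w / norm for w in new_weights]
    return weights

n_bins = 48
candidates = list(range(24))          # candidate F0 bins
models = [tone_model(f0, n_bins) for f0 in candidates]
observed = tone_model(10, n_bins)     # synthetic "spectrum" of one tone at bin 10
weights = em_weights(observed, models)
predominant = candidates[max(range(len(weights)), key=weights.__getitem__)]
```

The candidate with the largest weight is read off as the predominant F0. A MAP variant would add prior terms to the M-step.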
1:20, AUDIO-L3.2
ESTIMATION OF SINUSOIDS IN AUDIO SIGNALS USING AN ANALYSIS-BY-SYNTHESIS NEURAL NETWORK
G. GARCIA
In this paper we present a new method for estimating the frequency, amplitude and phase of sinusoidal components in audio signals. An analysis-by-synthesis system of neural networks is used to extract the sinusoidal parameters from the signal spectrum at each window position of the Short-Time Fourier Transform. The system attempts to find the set of sinusoids that best fits the spectral representation in a least-squares sense. Unlike the traditional approach, the method does not require preliminary detection and interpolation of spectral peaks, and it works even when spectral peaks are not well resolved in frequency. This allows for shorter analysis windows and therefore better time resolution of the estimated sinusoidal parameters. Results have also shown robust performance in the presence of high levels of additive noise, with signal-to-noise ratios as low as 0 dB.
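To make the least-squares criterion concrete, here is a minimal sketch for a much simpler case than the abstract's: a single sinusoid of known frequency, fitted in the time domain by solving the 2x2 normal equations. It does not use a neural network or the spectral representation, and the name `fit_sinusoid` is hypothetical.

```python
import math

def fit_sinusoid(signal, omega):
    """Least-squares fit of A*cos(omega*n + phi) for a known omega (radians/sample)."""
    scc = scs = sss = sxc = sxs = 0.0
    for n, x in enumerate(signal):
        c, s = math.cos(omega * n), math.sin(omega * n)
        scc += c * c
        scs += c * s
        sss += s * s
        sxc += x * c
        sxs += x * s
    # Solve [[scc, scs], [scs, sss]] @ [a, b] = [sxc, sxs] for x ~ a*cos + b*sin.
    det = scc * sss - scs * scs
    a = (sxc * sss - sxs * scs) / det
    b = (sxs * scc - sxc * scs) / det
    # A*cos(omega*n + phi) = A*cos(phi)*cos(omega*n) - A*sin(phi)*sin(omega*n)
    return math.hypot(a, b), math.atan2(-b, a)

omega = 2.0 * math.pi * 0.05
window = [0.7 * math.cos(omega * n + 0.3) for n in range(64)]  # about 3 cycles
amplitude, phase = fit_sinusoid(window, omega)
```

Because the fit never locates a spectral peak, it works on a window only a few cycles long, which is the property the abstract highlights.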
1:40, AUDIO-L3.3
IDENTIFICATION OF MUSICAL CHORDS USING CONSTANT-Q SPECTRA
R. WOTIZ, S. AYYASH, S. NAWAB
We present an approach to the extraction of frequencies corresponding to chords in Western polyphonic music. In the first phase of this approach, constant-Q spectral analysis directly provides the features from which the fundamental frequencies for 43 of the 57 possible categories of chords can be extracted without ambiguity. Each remaining chord category has a potential ambiguity associated with it because of frequency resolution problems. The second phase of our approach is designed to address such ambiguities. A software implementation was used to validate the approach successfully on a representative set of polyphonic musical signals.
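For readers unfamiliar with constant-Q analysis, the sketch below shows the geometric bin spacing that makes it a natural fit for musical pitch. The choice of 12 bins per octave and a lowest bin at middle C (261.63 Hz) are assumptions for the example; the abstract does not state the parameters used.

```python
def constant_q_centers(f_min, bins_per_octave, n_bins):
    """Geometrically spaced centers: f_k = f_min * 2**(k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def quality_factor(bins_per_octave):
    """Q = f_k / (f_{k+1} - f_k), which is the same for every bin k."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

# Assumed parameters: one bin per semitone, starting at middle C.
centers = constant_q_centers(f_min=261.63, bins_per_octave=12, n_bins=13)
q = quality_factor(12)
```

With one bin per semitone, each note of the equal-tempered scale falls on its own bin, and bin 12 sits exactly one octave above bin 0.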
2:00, AUDIO-L3.4
MULTI-TIMBRE CHORD CLASSIFICATION USING WAVELET TRANSFORM AND SELF-ORGANIZED MAP NEURAL NETWORKS
B. SU, S. JENG
This paper presents a new method for musical chord recognition based on a model of human perception. We classify the chords directly from the sound, without prior information about timbres or notes. A wavelet-based transform and a self-organizing map (SOM) neural network are adopted to imitate the human ear and cerebrum, respectively. The resulting system classifies chords well even in a noisy environment.
2:20, AUDIO-L3.5
MULTIPITCH ESTIMATION AND SOUND SEPARATION BY THE SPECTRAL SMOOTHNESS PRINCIPLE
A. KLAPURI
A processing principle is proposed for finding the pitches and separating the spectra of concurrent musical sounds. The principle, spectral smoothness, is used in the human auditory system, which separates sounds partly by assuming that the spectral envelopes of real sounds are continuous. Both theoretical and experimental evidence is presented for the vital importance of spectral smoothness in resolving sound mixtures. Three algorithms of varying complexity are described which successfully implement the new principle. In validation experiments, random pitch and sound source combinations were analyzed in a single time frame. The number of simultaneous sounds ranged from one to six, and the database comprised sung vowels and 26 musical instruments. A specific yet straightforward smoothing operation corrected approximately half of the pitch errors that occurred in a system which was otherwise identical but did not use the smoothness principle. In random four-voice mixtures, the pitch error rate fell from 18% to 8.1%.
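The abstract does not specify the smoothing operation, so the following is only one plausible illustration of the idea: each harmonic amplitude is clipped to the moving average of its neighbors, so that a partial inflated by an overlapping sound is pulled back toward a continuous envelope. The function name, window width, and sample values are assumptions for this sketch.

```python
def smooth_harmonics(amplitudes, width=3):
    """Clip each harmonic amplitude to a centered moving average of its neighbors."""
    half = width // 2
    smoothed = []
    for i, amp in enumerate(amplitudes):
        lo = max(0, i - half)
        hi = min(len(amplitudes), i + half + 1)
        window = amplitudes[lo:hi]
        mean = sum(window) / len(window)
        # Taking the minimum only ever lowers an amplitude, never raises it.
        smoothed.append(min(amp, mean))
    return smoothed

# Hypothetical harmonic series whose third partial coincides with another sound.
series = [1.0, 0.8, 3.0, 0.5, 0.4]
result = smooth_harmonics(series)
```

Here the third partial is reduced well below its measured value, while partials already below their local average are left unchanged.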
2:40, AUDIO-L3.6
PHYSICAL MODELING OF DRUMS BY TRANSFER FUNCTION METHODS
L. TRAUTMANN, S. PETRAUSCH, R. RABENSTEIN
Multidimensional (MD) physical systems are usually given in terms of partial differential equations (PDEs). Like one-dimensional systems, they can also be described by transfer function models (TFMs). In addition to including initial and boundary conditions as well as excitation functions exactly, the TFM can also be discretized in a simple way. This leads to suitable implementations for digital signal processors. It is therefore possible to implement physics-based digital sound synthesis algorithms derived from TFMs in real time. This paper extends the recently presented solution for vibrating strings with one spatial dimension to two-dimensional drum models.