I am affiliated with the Auditory Neuroscience Lab, where we "explore spatial hearing and auditory perception in multi-source natural environments through acoustical, behavioral and computational studies". My areas of interest include computational auditory scene analysis, source segregation and related topics. My advisor is Prof. Barbara Shinn-Cunningham. Recently, I have been working with Bhiksha Raj and Paris Smaragdis at MERL on probabilistic models for single-channel audio processing. Below are brief descriptions of some of the projects that I'm currently working on.
Probabilistic Models for Single-Channel Audio Processing
- with Paris Smaragdis and Bhiksha Raj.
Auditory scene analysis refers to the human ability to extract different perceptual objects from a sound mixture. Building artificial systems with this ability has become an active area of research. Most attempts so far have used hand-designed systems that build on the knowledge of psychophysics and heuristics used by the human auditory system. Statistical methods, on the other hand, have almost exclusively focused on multi-channel audio data (ICA, blind source separation, microphone arrays and so on).
I have been working on a probabilistic framework for modeling single-channel audio data. The framework is built on latent-variable generative models. The basic idea is to extract frequency structure, in the form of 'basis vectors', from two-dimensional time-frequency representations of sound. The models can be used in an unsupervised fashion to learn acoustic structure from different classes of sound signals, and in a supervised fashion for applications such as sound separation and denoising. I have been specifically interested in using the concept of sparse coding in this framework: sparsity can be enforced on the model's distributions by imposing an entropic prior, which favors low-entropy distributions.
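To make the 'basis vector' idea concrete, here is a minimal sketch of one common latent-variable decomposition of a magnitude spectrogram, fitted with EM. It is an illustration under stated assumptions, not necessarily the exact model used in this work: the function name `fit_latent_variable_model`, the parameters `n_components`, `n_iter` and `seed`, and the use of NumPy are all hypothetical choices, and the entropic (sparsity) prior is left out for brevity. Each time frame of the spectrogram is treated as a distribution over frequency, modeled as a mixture of basis distributions P(f|z) with frame-specific weights P_t(z).

```python
import numpy as np


def fit_latent_variable_model(V, n_components, n_iter=100, seed=0):
    """Fit P_t(f) ~= sum_z P(f|z) P_t(z) to a nonnegative spectrogram V.

    V has shape (n_freq, n_time). Returns:
      W: (n_freq, n_components), column z is the basis distribution P(f|z)
      H: (n_components, n_time), column t holds the mixture weights P_t(z)
    Illustrative sketch only: plain maximum-likelihood EM, no sparsity prior.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12  # guards against division by zero
    n_freq, n_time = V.shape

    # Random initialisation, normalised so every column is a distribution.
    W = rng.random((n_freq, n_components))
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((n_components, n_time))
    H /= H.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # E-step (implicit): the posterior P_t(z|f) equals
        # W[f, z] * H[z, t] / (W @ H)[f, t]; R holds the data/model ratio.
        R = V / (W @ H + eps)
        # M-step: reweight by the data and renormalise.
        W_new = W * (R @ H.T)   # sum over t of V[f, t] * P_t(z|f)
        H_new = H * (W.T @ R)   # sum over f of V[f, t] * P_t(z|f)
        W = W_new / (W_new.sum(axis=0, keepdims=True) + eps)
        H = H_new / (H_new.sum(axis=0, keepdims=True) + eps)
    return W, H
```

In a supervised setting one would typically learn `W` separately for each sound class and then hold those bases fixed while estimating `H` on a mixture; that step is not shown here.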
The models are not specific to audio and can also be useful for analyzing data from other domains, such as text corpora (for semantic analysis) and image data.
I maintain a speaker separation page, which contains more details and results from applying these models to the problem of separating speakers from single-channel mixtures.
Applications of Secure Multi-party Computations for Classification
- with Paris Smaragdis.
Consider two parties, Alice and Bob, with private data a and b respectively, who want to compute the result of a function f(a, b). Imagine a trusted third party who takes the private data, computes the result c = f(a, b), and communicates it to both parties. A protocol between Alice and Bob that computes f(a, b) is said to be secure only if it leaks no more information about a and b than what each party could learn from receiving the result c from the trusted third party. This idea generalizes to the case where multiple parties are involved and is referred to as secure multi-party computations (SMC).
Using SMC principles, we developed protocols that enable secure classification. Suppose Alice has private d-dimensional data and Bob has a classifier. Alice wants Bob to classify her data, but she does not want him to learn either her data or the result of the classification. In addition, Bob does not want Alice to learn anything about his classifier. We developed protocols that enable such computations when Bob's classifier is a Gaussian mixture model. The basic building block is a cryptographic primitive that enables secure computation of dot products. We have extended this work to encompass Hidden Markov Model computations. This work can be used to develop secure speech recognition systems. Further research directions include extending this work to other classifiers and developing more efficient (in terms of computation and communication complexity) cryptographic primitives that function as building blocks for these protocols.
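To give a feel for what a secure dot-product building block does, here is a minimal sketch in which Alice and Bob end up with additive shares of x · y without revealing x or y to each other. It is a textbook-style construction and not necessarily the primitive used in this work. It assumes a helper (the hypothetical `deal_randomness`) that hands out correlated random values before the protocol starts and never sees the private inputs; the modulus `P` and all function names are likewise assumptions made for illustration.

```python
import secrets

P = 2**61 - 1  # prime modulus (assumption): all arithmetic is done modulo P


def dot(u, v):
    """Dot product of two integer vectors, reduced modulo P."""
    return sum(ui * vi for ui, vi in zip(u, v)) % P


def deal_randomness(n):
    """Hypothetical dealer: gives Alice (a, c_alice) and Bob (b, c_bob)
    such that c_alice + c_bob == a . b (mod P). Independent of x and y."""
    a = [secrets.randbelow(P) for _ in range(n)]
    b = [secrets.randbelow(P) for _ in range(n)]
    c_alice = secrets.randbelow(P)
    c_bob = (dot(a, b) - c_alice) % P
    return (a, c_alice), (b, c_bob)


def secure_dot_shares(x, y):
    """Simulate both parties in one process. Alice holds x, Bob holds y.
    Returns (share_alice, share_bob) with share_alice + share_bob == x . y (mod P)."""
    (a, c_alice), (b, c_bob) = deal_randomness(len(x))

    # Alice masks x with her random vector a and sends d to Bob.
    d = [(xi - ai) % P for xi, ai in zip(x, a)]
    # Bob masks y with his random vector b and sends e to Alice.
    e = [(yi - bi) % P for yi, bi in zip(y, b)]

    # Each party computes its share from values it legitimately holds.
    share_alice = (dot(a, e) + c_alice) % P   # uses a, e, c_alice
    share_bob = (dot(d, y) + c_bob) % P       # uses d, y, c_bob
    return share_alice, share_bob
```

The shares sum to x · y because a · (y − b) + (x − a) · y + a · b = x · y. Neither share alone reveals the result, which fits the setting above where Bob must not learn the outcome of the classification. Real-valued inputs would first have to be scaled to integers, which is not shown.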
Some of the projects I've worked on in the past include:
Understanding speech in the presence of multiple talkers in natural reverberant environments