Machine learning

I am broadly interested in using statistical models to understand and extract value from large data sets. This includes fully supervised learning as well as learning from weak supervisory signals and self-supervision.

Computer vision

Deep neural networks have led to major advances in computer vision. Most of my work at Yahoo applies deep learning to visual data for classification, ranking and retrieval, and object detection.

(Deep) Metric learning

The goal of metric learning is to learn a function that maps inputs to an embedding space where related inputs are near one another. This function can then be used to match new inputs to one another, or to class prototypes that were not available at training time. “Deep” metric learning just means the function is a deep neural network. Several of my projects at Yahoo have involved metric learning for matching product, content, and ad images. My vision-and-language work also draws on metric learning techniques.
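
As a concrete illustration, here is a minimal sketch of deep metric learning with a triplet loss, assuming PyTorch. The tiny convolutional network, embedding dimension, and margin are illustrative placeholders, not details of any of the projects mentioned above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EmbeddingNet(nn.Module):
        """Toy CNN mapping images to L2-normalized embedding vectors."""
        def __init__(self, dim=64):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(64, dim)

        def forward(self, x):
            z = self.fc(self.conv(x).flatten(1))
            # Unit-norm embeddings, so distance reflects similarity.
            return F.normalize(z, dim=1)

    net = EmbeddingNet()
    loss_fn = nn.TripletMarginLoss(margin=0.2)

    # Anchor and positive share a class; the negative comes from another
    # class. Random tensors stand in for real image batches here.
    anchor, positive, negative = (torch.randn(8, 3, 64, 64) for _ in range(3))
    loss = loss_fn(net(anchor), net(positive), net(negative))
    loss.backward()

At test time, new inputs are embedded with the same network and matched by nearest-neighbor search in the embedding space, which is what allows matching against classes that were never seen during training.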

Vision and language

I am interested in multimodal models and the intersection of vision and language in particular.



Neuroscience

Normative models of neural sensory systems

In my neuroscience work, I tried to understand sensory systems (particularly vision and audition) starting from a notion of optimality: how should the system work, given what we think it is trying to do? Supposing, for example, that it is beneficial to form efficient or sparse representations of the sensory signals an animal encounters, we expect the brain to be adapted for this purpose. We can then develop a statistical model of these signals with some of the structure and constraints of a biological neural system, and compare the parameters or behavior of this model to properties of a real brain.
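
For example, here is a minimal sketch of the sparse-coding version of this idea (in the spirit of Olshausen and Field), assuming scikit-learn and NumPy. The random array is a placeholder where a real study would use whitened patches cut from photographs of natural scenes, and the dictionary size and penalty weight are illustrative.

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    # Placeholder for whitened natural image patches (e.g., 12x12 pixels).
    rng = np.random.default_rng(0)
    patches = rng.standard_normal((5000, 144))
    patches -= patches.mean(axis=1, keepdims=True)

    # Sparse coding: describe each patch as a sparse combination of learned
    # basis functions; the penalty weight alpha enforces sparsity.
    learner = MiniBatchDictionaryLearning(
        n_components=100, alpha=1.0, batch_size=200, random_state=0
    )
    codes = learner.fit_transform(patches)   # sparse coefficients per patch
    basis = learner.components_              # learned basis, shape (100, 144)

Trained on real natural-image patches, the basis functions tend to become localized, oriented, bandpass filters, which is what one compares against the receptive fields of simple cells in primary visual cortex.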

Natural ensemble statistics

The statistical models described above have to be suited to the sensory signals that animals actually encounter if we hope to learn anything about the brain. This led me to study the statistical structure of natural sounds and images as objects of interest in their own right.
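
As one example of such an ensemble statistic, the sketch below (assuming NumPy) computes a radially averaged power spectrum; for natural images, power falls off roughly as 1/f^2 with spatial frequency f. The random array again stands in for a corpus of natural photographs.

    import numpy as np

    # Placeholder image; a real analysis averages over many photographs.
    rng = np.random.default_rng(0)
    img = rng.standard_normal((256, 256))

    # 2D power spectrum, with the zero-frequency component at the center.
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean()))) ** 2

    # Average power over annuli of (roughly) equal spatial frequency.
    cy, cx = np.array(spectrum.shape) // 2
    y, x = np.indices(spectrum.shape)
    r = np.hypot(y - cy, x - cx).astype(int)
    radial = (np.bincount(r.ravel(), weights=spectrum.ravel())
              / np.bincount(r.ravel()))
    # radial[f] estimates the mean power at spatial frequency f.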