Thursday, September 18, 2014

Getting Started

In which the obligatory introductory post is found:

My interest in working with data and machine learning was rekindled a couple of years ago when a friend and I started thinking about building devices that would record audio in the wild (or in backyards, or ...) along with a program that would analyze the audio to identify which bird species were present. Further goals for this project might include mapping the location of each call and identifying the type of call (for example, a territorial song, companion call, juvenile begging, or alarm). The basic identification tool could of course be useful for cataloging the birds in an area (useful information for both research and home use). The extended goals could allow for things like mapping individual bird territories, identifying nesting success, and potentially more.

Getting started, I thought that locating sounds from a stereo signal might be an accessible beginning: it would not be too difficult to generate a set of training data, and at least in an ideal world the mathematics of locating a sound source is approachable. Of course, outside in the real world things get messy, and my naive approaches had problems.
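To give a flavor of the "ideal world" version of the math, here is a minimal sketch of one common approach: estimate how much later a sound arrives at one microphone than the other, then convert that delay into a bearing. This is an illustration, not the code I actually used. The function names, the microphone spacing, the sample rate, and the speed of sound are all assumptions, and the source is assumed to be far enough away that its wavefront is effectively flat.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)


def estimate_lag(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive result means the sound reached the left microphone first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum of left[n] * right[n + lag].
        score = sum(
            left[n] * right[n + lag]
            for n in range(len(left))
            if 0 <= n + lag < len(right)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_degrees(lag, sample_rate, mic_spacing):
    """Convert a lag into a far-field bearing away from straight ahead.

    Positive angles point toward the left microphone.
    """
    path_difference = SPEED_OF_SOUND * lag / sample_rate
    # Clamp so noise cannot push the ratio outside the domain of asin.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing))
    return math.degrees(math.asin(ratio))


# Synthetic check: the right channel is the left channel delayed by 5 samples.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
left = noise + [0.0] * 5
right = [0.0] * 5 + noise

sample_rate = 44100  # Hz (assumed)
mic_spacing = 0.2    # metres between the two microphones (assumed)

# No physical source can produce a delay longer than the time sound
# takes to travel from one microphone to the other.
max_lag = math.ceil(mic_spacing / SPEED_OF_SOUND * sample_rate)

lag = estimate_lag(left, right, max_lag)
angle = bearing_degrees(lag, sample_rate, mic_spacing)
print(lag, angle)
```

Restricting the search to physically possible lags keeps the brute-force loop small. On clean synthetic data like this the correlation peak is unambiguous; echoes, wind, and overlapping calls are exactly where this kind of simple approach starts to struggle.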

It was time to start moving beyond naive approaches. I started reading more about machine learning, including some papers on recent progress on the bird call identification problem (it appears that the basic idea of the project has been implemented by at least a couple of companies at this point), and I had the opportunity to take the University of Washington's introductory data science course on Coursera this summer. That course was very helpful in getting me to use some of the tools and techniques available (in particular, I have mainly been playing with various classifiers in Python's scikit-learn), and for one of the assignments I began working on a Kaggle competition.

Initially, much of my posting here will likely revolve around my work on the Kaggle forest cover type competition. I also have an ongoing write-up of some of my results on my website. My intention is to post about some of the things I have tried and learned, both as a way of keeping track for myself and, hopefully, as a help to others who might run into similar problems along the way.