This was a personal project mostly completed between the day after Christmas 2017 and the day after New Year's Day 2018. It's not perfect, but it was awfully fun to play with, and it was nice to learn a little about image recognition. Bird detection and classification (I wanted to try the latter too but didn't get to it) are certainly things people have looked at, with birdsong as well as with images, but it seemed like a cute little problem to play with at my own birdfeeder.
To get started, I set up a Raspberry Pi 3 (with this camera, which seems to work OK, although everything has a reddish tinge; that's almost certainly user error that I never bothered to fix) to take a picture of my birdfeeder every second during daylight hours. I printed this case for it, which I've used in the past and like a lot. As you can see below, I attached the Pi to my windowsill with painter's tape, the camera to a pen (which had a stick-on thing on the end meant to attach it to a whiteboard or something: a good find, as it kept things pretty stable and made it easy to make fine adjustments), and the pen to some cardboard with... more painter's tape. The very apogee of elegance!
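The capture side is tiny. Here's a minimal sketch of the kind of loop involved, using the standard `raspistill` tool; the daylight window and output path are made-up placeholders, not the values I actually used:

```python
# Minimal capture-loop sketch for the Pi. The daylight window (7:00-17:00)
# and the output path are illustrative placeholders.
import subprocess
import time
from datetime import datetime

def in_daylight(hour, start=7, end=17):
    """True if `hour` falls inside the capture window."""
    return start <= hour < end

def capture_loop(out_path="/tmp/feeder.jpg", interval=1.0):
    while True:
        if in_daylight(datetime.now().hour):
            # raspistill flags: -o output file, -t 1 = minimal delay before
            # capture (ms), -n = no preview window
            subprocess.run(["raspistill", "-o", out_path, "-t", "1", "-n"])
        time.sleep(interval)

if __name__ == "__main__":
    capture_loop()
```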
I used dlib to train several machine learning models (details later on) to detect birds in different poses, which was more accurate than training a single detector. I labeled the training data using the handy dlib imglab tool (I collected the images by writing a tiny script to take a picture every 4 seconds and sticking it in my crontab on the Pi, so I didn't have to wake up too early to start the collection!), trying to get a variety of species with different backgrounds and lighting conditions in each set of data.
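For the curious, the dlib side of this is only a few lines per detector. A sketch along the lines of what's involved, where the pose names, imglab XML filenames, and parameter values are all hypothetical rather than my exact settings:

```python
# Sketch of training one HOG detector per pose with dlib. The XML files are
# hypothetical imglab-format annotation files, one per pose cluster.
POSES = ("side", "front", "back", "up-down", "upside-down")

def train_pose_detector(train_xml, detector_path, C=5):
    import dlib  # imported here so the sketch reads as a standalone file

    options = dlib.simple_object_detector_training_options()
    options.C = C                # SVM regularization; worth tuning per detector
    options.num_threads = 4
    options.be_verbose = True
    dlib.train_simple_object_detector(train_xml, detector_path, options)
    # Reports precision, recall, and average precision on a labeled set
    return dlib.test_simple_object_detector(train_xml, detector_path)

if __name__ == "__main__":
    for pose in POSES:
        print(pose, train_pose_detector(f"{pose}.xml", f"{pose}.svm"))
```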
The Pi now posts a new image to this website's server every second, where the detectors are run on it. Any detections are drawn on the image and recorded in the database; either way, the image you see on the feeder feed refreshes every second. I save the detection locations in a Postgres database so I can calculate statistics and keep history, and I used Django to build the webapp.
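A hypothetical Django model along these lines (the field names are illustrative, not my actual schema) is enough to record each bounding box and query history later:

```python
# Hypothetical Django model for recording detections; lives in a normal
# Django app's models.py. Field names are illustrative.
from django.db import models

class Detection(models.Model):
    taken_at = models.DateTimeField(db_index=True)  # when the frame arrived
    detector = models.CharField(max_length=32)      # which pose detector fired
    # Bounding box in pixel coordinates
    left = models.IntegerField()
    top = models.IntegerField()
    right = models.IntegerField()
    bottom = models.IntegerField()
```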
So... it sort of works, but on test data my detectors tended to have high precision and poor recall (0.5–0.7). I trained HOG detectors using dlib, but I really ought to have tried a CNN. The trouble with birds is that they can appear in any orientation at all, especially at a feeder: they hang upside down, perch right side up, crouch in the platform feeder, and show up on the windowsill, in the bushes, and at different distances from the camera. Sometimes their wings are prominent (in flight), or their tails are, or... Anyway, this is not a great recipe for HOG detector success.

To somewhat mitigate this, I initially trained five detectors: side view, front view, back view, "up-down" view (hanging on the side of a feeder), and "upside down" view (meant to recognize birds hanging off the bottom of the red circular feeder). To show the bounding boxes, I use all the outputs of these detectors, except where detections from different detectors overlap, in which case I show just one (since sometimes more than one detector will pick up a bird).

A week or so after I initially deployed the website, it snowed a foot; then the temperature shot up to 60 °F a few days later and melted the snow, giving me lots of good backgrounds to collect additional data from! So I labeled more data and trained 8 new detectors, using dlib's data clustering to determine which general type of image went to each detector (these do roughly correspond to the detectors I initially trained). All but one of the detectors are currently in use (one of those trained to detect birds hanging upside down from the red feeder false-alarms too much on the snowless bushes).
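The overlap-handling step is simple to sketch in pure Python. Here boxes are plain (left, top, right, bottom) tuples rather than dlib rectangles, and the 0.5 overlap threshold is an assumption, not my exact setting:

```python
# Sketch of keeping one box when detections from different detectors overlap.
# Boxes are (left, top, right, bottom) tuples; the 0.5 threshold is illustrative.

def overlap_fraction(a, b):
    """Intersection area over the smaller box's area."""
    il, it = max(a[0], b[0]), max(a[1], b[1])
    ir, ib = min(a[2], b[2]), min(a[3], b[3])
    if ir <= il or ib <= it:
        return 0.0
    inter = (ir - il) * (ib - it)
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return inter / smaller

def dedupe(boxes, threshold=0.5):
    """Keep only the first box of each group of overlapping detections."""
    kept = []
    for box in boxes:
        if all(overlap_fraction(box, k) < threshold for k in kept):
            kept.append(box)
    return kept
```

For example, `dedupe([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)])` keeps the first of the two overlapping boxes plus the distant one.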
One reason I didn't try a CNN was that I feared I wouldn't have enough training data, but the labeling went faster than I expected. I had intended to turk the labeling but ended up finding it quicker to just do it myself; if I were going to do a lot more of it, I might turk it, though. For the HOG detectors, I had on the order of just 30–200 training images per detector (sometime I'll get around to labeling more); I'd need many more for a CNN.
To close, I must thank Hansie for supporting this effort. Well, actually he mostly distracted me from this effort but I appreciated him *trying* to assist by destroying my computer (that's one way to debug!). :P