Reconnaissance automatisée de sons naturels – Application aux pics (Aves), by Ms. Juliette FLORENTIN
Supervisor: Prof. Olivier Verlinden
Summary: Automated Recognition of Natural Sounds – Application to Woodpeckers (Aves)
There are eleven species of woodpeckers on the European continent. Ten of them drum on trees and
seven have long-distance advertising calls. Every year from March to May, these signals contribute to
forest soundscapes while woodpeckers draw territories, find mates and dig tree cavities. Each drum
and each call is species-specific and easily picked up by a trained ear. In this thesis, we have worked
toward automating this process and thus toward making the continuous acoustic monitoring of
woodpeckers practical. There were two main steps to implement: first, the detection of woodpecker
signals against the backdrop of diverse acoustic communities, and second, the identification of the
different species. Because continuous monitoring generates hundreds of gigabytes of data, detection
had to be progressive: first we coarsely trimmed the datasets using a simple indicator, the Acoustic
Complexity Index (ACI), then we analyzed more elaborate sound features. Species identification
required mostly a description of duration and rhythm for the drums and an analysis of the
spectrograms for the calls. For both detection and species identification, and for both drums and
calls, deep neural networks provided the most efficient, if not the only, solution. Two favorable
circumstances made this possible: 1) legacy very deep image nets (up to 169 layers) were made public
and could be re-trained to address specific image problems and 2) the sound problem could be
transformed into an image problem via the spectrogram. When tested on development datasets
obtained from online archives such as Xeno-Canto, very deep nets easily recognized 95% of the submitted
drums and calls, even in the presence of other noises. On real-life datasets, false positives were more
numerous: the nets were confused by the countless other birds that could not be taken into account during
training. Another point that calls for caution is that the image invariants that underpinned the
original training of the deep nets (e.g. the enlarged image of a car still represents a car) do not
necessarily apply to spectrograms. Overall, the woodpecker signals were recognized with a high
accuracy in March and early April, when the forest is relatively quiet. Later in the season, false
positives crept in, but the nets still made it possible to discard more than 95% of the recordings. This
proportion increased further when the nets were trained with known confusing signals. In the end, a
reasonable number of audio files remained, which could be reviewed manually. This dataset reduction is a
substantial improvement over other techniques and allowed very deep nets to make the
acoustic monitoring of woodpeckers a reality.
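
The coarse first pass mentioned in the summary, trimming recordings with the Acoustic Complexity Index before any finer analysis, can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the spectrogram is already available as a list of frequency-bin rows of non-negative intensities, computes the index over a single temporal step, and keeps recordings whose index reaches a threshold. The function names, the threshold value and the choice to keep high-index recordings are hypothetical and only serve the illustration.

```python
def acoustic_complexity_index(spectrogram):
    """ACI over one temporal step (simplifying assumption).

    spectrogram: list of rows, one per frequency bin, each row holding
    non-negative intensities for successive time frames.
    """
    total = 0.0
    for row in spectrogram:
        energy = sum(row)
        if energy == 0:
            continue  # a silent frequency bin contributes nothing
        # Sum of absolute intensity changes between adjacent time frames
        variation = sum(abs(b - a) for a, b in zip(row, row[1:]))
        total += variation / energy
    return total


def keep_for_review(spectrograms, threshold):
    """Indices of recordings whose ACI reaches the (hypothetical) threshold."""
    return [
        i
        for i, spec in enumerate(spectrograms)
        if acoustic_complexity_index(spec) >= threshold
    ]


# Toy data: a steady hum versus a pulsed, drum-like pattern
flat = [[1, 1, 1, 1], [2, 2, 2, 2]]
pulsed = [[0, 4, 0, 4], [1, 1, 1, 1]]
kept = keep_for_review([flat, pulsed], threshold=1.0)
```

In this toy case the steady signal has an index of zero and is dropped, while the pulsed one passes. In practice the threshold would have to be calibrated on annotated recordings.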
7000 Mons, Belgium