Last time we converted audio buffers into images. This time we’ll take those images and train a neural network on them using deeplearn.js. The result is a browser-based demo that lets you speak a command (“yes” or “no”) and see the classifier’s output in real time.
Curious to play with it and see whether it recognizes “yay” or “nay” in addition to “yes” and “no”? Try it out live. You will quickly see that the performance is far from perfect. But that’s OK with me: this example is intended to be a reasonable starting point for all sorts of audio recognition on the web. Now, let’s dive into how it works.