Web-based voice command recognition

Last time we converted audio buffers into images. This time we’ll take these images and train a neural network using deeplearn.js. The result is a browser-based demo that lets you speak a command (“yes” or “no”) and see the output of the classifier in real time, like this:

Curious to play with it and see whether or not it recognizes yay or nay in addition to yes and no? Try it out live. You will quickly see that the performance is far from perfect. But that’s ok with me: this example is intended to be a reasonable starting point for doing all sorts of audio recognition on the web. Now, let’s dive into how this works. Continue reading

Audio features for web-based ML

One of the first problems presented to students of deep learning is to classify handwritten digits in the MNIST dataset. This was recently ported to the web thanks to deeplearn.js. The web version has distinct educational advantages over the relatively dry TensorFlow tutorial. You can immediately get a feeling for the model, and start building intuition for what works and what doesn’t. Let’s preserve this interactivity, but change domains to audio. This post sets the scene for the auditory equivalent of MNIST. Rather than recognize handwritten digits, we will focus on recognizing spoken commands. We’ll do this by converting sounds like this:

Into images like this, called log-mel spectrograms, and in the next post, feed these images into the same types of models that do handwriting recognition so well:

[Image: final log-mel spectrogram]

The audio feature extraction technique I discuss here is generic enough to work for all sorts of audio, not just human speech. The rest of the post explains how. If you don’t care and just want to see the code, or play with some live demos, be my guest! Continue reading

How to publish about your research results for academic and non-academic audiences

As graduate students, one of our goals is to produce research that will be useful to the world, research that will be known and used by other people. This usefulness can come in many forms; for example, our work can inspire future research that takes the topic one step further, or it can be used by people in industry as part of their work. But for any of this to happen, the methods, results, and takeaways of our research need to be communicated to the world. Of course, most research programs require the student to write a thesis or dissertation, but the reality is that very few people will read it besides the evaluation committee. A thesis or dissertation might eventually also be read by other graduate students who are working on the same topic and want to know the existing literature in detail. But other than that, most people would prefer to read a summarized version of the research instead of the whole thesis or dissertation. Continue reading

A world full of emojis

In 2010, a new trend emerged in electronic messages and web pages: emojis. There is an interesting journey behind these cute little images, and it is definitely worth understanding how and why they were initially created.

Emojis (less commonly known as pictographs) are images encoded as text and exist in various genres: facial expressions, common objects, food, places ⛰️, activities ⛷, animals, and most of what you can think of. The word comes from the Japanese 絵 (e ≅ picture) + 文字 (moji ≅ written character). 2823 emojis exist in total (as of today) and it is estimated that about 6 billion emojis are sent every single day. Continue reading

PHP for Backend Web Applications

Web Applications

It is reasonable to consider any website whose functionality is carried out entirely by the client machine to be a webpage. By contrast, any website that requires further communication with the server after a page has been requested for display could be considered a web application. PHP is one programming language that can be used on a web server to support web application functionality.
Continue reading