In this post, I am going to talk about precision and recall and their importance in information retrieval. First of all, let’s talk about what we mean by information retrieval. Suppose you wake up one morning and decide you want to make muffins for breakfast. You take out your laptop and search for “healthy muffin recipe” on Google. Then, you go through the search results, decide on a recipe and get started on it. This is an example of information retrieval where the search engine (Google in this case) retrieved the results for your search query “healthy muffin recipe”.
Now let’s think about the importance of getting back good search results. Usually, whenever you search for something on a search engine, you have in mind some ideal result. You know exactly what you are looking for and you type in your query accordingly. For instance, if your search query is “The Beatles songs”, you might be thinking about a list of all songs by “The Beatles” as your ideal search result or as the ideal web page that you want to see. These ideal web pages can be called relevant to the search query. A search engine aims at retrieving relevant pages for your search query.
Let’s say we want to evaluate our search engine on a set of benchmark search queries.
Then for a search query, we have an ideal set of web pages that should be returned for that query. Let us call this the Relevant set of pages. Now you might be wondering how we find this ideal set of web pages. One way of finding it is by asking many different people what web pages they would expect for that query.
Now a search engine has access to millions of pages, but it needs to decide which web pages to return for a search query. How do you think it does this? Clearly, we cannot just ask different people and store the ideal web pages they state for each possible search query because the number of search queries possible is infinite. We need some automated way of returning the best web pages for any search query. Search engines use algorithms to figure out the best web pages to return. Let us call the set of web pages returned by a search engine for a query the Retrieved set of pages (as these are the pages that the search engine retrieved for you).
Let’s go through an example.
- Say the search query is “Weather in Los Angeles”.
- Relevant Pages: Let us assume that the ideal set of web pages that answer this query are the pages that describe the current weather in LA and pages with a 7-day forecast of the weather in LA.
- Retrieved Pages: Let us say that for this query, a search engine retrieves the following pages based on its algorithm: one page describing the current weather, some pages talking about a book titled “Under the Weather in Los Angeles” and some pages talking about the overall climate of Los Angeles.
For every search query, a search engine tries to make the retrieved pages as close to the relevant pages as possible. Basically, a search engine wants to retrieve all of the relevant pages for a query and nothing else, but this is more difficult than it sounds. We are not going to talk about the retrieval algorithm in this post, but we are going to jump to evaluating search engines as this is where precision and recall are used.
If we have three different search engines and we want to find the one that performs the best for a benchmark query, how do we do it? This is where we need precision and recall. These are extremely useful and we are going to see why. First, let us see what they mean.
Recall is defined as the fraction of relevant documents that are retrieved and precision is the fraction of retrieved documents that are relevant. Let us look at the formulas of precision and recall for a better understanding.
Recall = Number of pages that were retrieved and relevant / Total number of relevant pages.
Precision = Number of pages that were retrieved and relevant / Total number of retrieved pages.
You can see that the numerator in both cases is the same, only the denominator changes. Also, this definition of precision is specific to information retrieval, and is different from the statistical definition of precision.
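As a quick sketch, the two formulas can be written as operations on Python sets (the helper name and page labels here are illustrative, not from the post):

```python
def precision_recall(relevant, retrieved):
    """Precision and recall for one query, given sets of page IDs."""
    hits = relevant & retrieved              # pages that are both retrieved and relevant
    precision = len(hits) / len(retrieved)   # fraction of retrieved pages that are relevant
    recall = len(hits) / len(relevant)       # fraction of relevant pages that were retrieved
    return precision, recall

# Illustrative page sets for a made-up query: 2 of the 3 retrieved
# pages are relevant, and 2 of the 4 relevant pages were retrieved.
p, r = precision_recall({"A", "B", "C", "D"}, {"A", "B", "E"})
print(p, r)  # precision = 2/3, recall = 2/4
```

Note that only the denominator differs between the two lines, matching the formulas above.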
Let’s go through an example on precision and recall. Say there are a total of 5 pages, labelled P1, P2, P3, P4 and P5. Let us assume that for the query “weather in Los Angeles”, the relevant pages are P3, P4 and P5, so the total number of relevant pages is 3. Let us assume that a search engine returns the pages P2 and P3, so the number of retrieved pages is 2.
The search engine returns the pages P2 and P3 but only P3 is relevant. So the number of pages that are retrieved and relevant is 1 (only P3).
So based on the formula,
Recall = 1 / 3 ≈ 0.33
Precision = 1 / 2 = 0.5
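The same arithmetic can be checked with a few lines of Python, using the page labels from the example:

```python
relevant = {"P3", "P4", "P5"}   # ideal pages for "weather in Los Angeles"
retrieved = {"P2", "P3"}        # pages the search engine returned

hits = relevant & retrieved               # {"P3"} — retrieved and relevant
recall = len(hits) / len(relevant)        # 1 / 3 ≈ 0.33
precision = len(hits) / len(retrieved)    # 1 / 2 = 0.5
```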
Higher values of precision and recall (closer to 1) are better.
Now let us think about why we need both precision and recall. Suppose we are trying to build our own search engine. In one case, say we design our search engine to return only one page for any query. If that one page is relevant, the precision will be:
Number of Retrieved and Relevant Pages / Number of Retrieved Pages = 1 / 1 = 100%.
If there are actually 1000 relevant pages that exist, the recall will be 1 / 1000 which is 0.1%. Clearly, this system is not performing well with such a poor recall. If we didn’t have recall but only had precision as an evaluation metric, this system would be incorrectly assumed to be performing very well, whereas in reality it isn’t.
Now let’s think of another case. Suppose we design a search engine that always returns every page that exists. Then the recall will be:
Number of relevant and retrieved pages / Number of relevant pages = 100%, because every relevant page is among the retrieved pages.
If there were 1 million pages that exist, out of which 1000 pages are actually relevant, then all 1 million pages would be retrieved and the precision =
Number of retrieved and relevant / Number of retrieved = 1000 / 1000000 = 0.001 = 0.1%
Clearly, this search engine is not performing well because it is simply returning every page that exists. But if we ignored precision and only stated the recall = 100%, it would seem like the system is doing really well.
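The two extreme designs above can be simulated directly; the page counts below mirror the numbers used in the text (1,000 relevant pages out of 1 million total):

```python
relevant = set(range(1000))           # assume 1,000 relevant pages exist
all_pages = set(range(1_000_000))     # out of 1 million pages in total

# Case 1: retrieve a single page that happens to be relevant
retrieved = {0}
hits = relevant & retrieved
p1, r1 = len(hits) / len(retrieved), len(hits) / len(relevant)
# precision = 1.0 (100%), but recall = 0.001 (0.1%)

# Case 2: retrieve every page that exists
retrieved = all_pages
hits = relevant & retrieved
p2, r2 = len(hits) / len(retrieved), len(hits) / len(relevant)
# recall = 1.0 (100%), but precision = 0.001 (0.1%)
```

Each design looks perfect on one metric and terrible on the other, which is exactly why both metrics are needed.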
So as you can see, we need both precision and recall to evaluate a search engine or any information retrieval task.
Overall, in this post we talked about precision and recall and how they are important in information retrieval.