Ultra-Efficient via Sublinearity

For a long time in the design and analysis of algorithms, calling an algorithm efficient has meant that it runs in time polynomial in the input size n, and a linear-time algorithm has been considered the most efficient way to solve a problem. The reasoning behind this is the assumption that we need to read the entire input at least once to solve the problem, so it seems we cannot do much better. But nowadays, data sets in various areas and applications are growing so fast that they barely fit in storage, and in that regime even linear time is prohibitive. To work with these massive amounts of data, the traditional notion of an efficient algorithm is no longer sufficient, and we need to design more efficient algorithms and data structures. This encourages researchers to ask whether it is possible to solve problems using only a sublinear amount of resources. What exactly do we mean by ‘sublinear resources’?
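To make the question concrete, here is a minimal sketch (not from the post itself) of one flavor of sublinearity: estimating a global statistic of a huge array by reading only a random sample of its entries, so the running time is independent of n. The function name estimate_fraction and the sample size are illustrative choices.

```python
import random

def estimate_fraction(data, num_samples=1000):
    """Estimate the fraction of 1s in a 0/1 array by inspecting
    only a random sample of entries, never the whole input.

    With num_samples independent draws, standard concentration
    bounds give an additive error of roughly 1/sqrt(num_samples)
    with high probability, regardless of len(data).
    """
    hits = sum(data[random.randrange(len(data))] for _ in range(num_samples))
    return hits / num_samples

# Hypothetical usage: an array far too large to scan in full.
huge = [1 if i % 3 == 0 else 0 for i in range(1_000_000)]
print(estimate_fraction(huge))  # ~0.333, after reading only 1,000 entries
```

Sampling-based estimators like this trade exactness for speed: the error shrinks with the number of samples drawn, not with the size of the input.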
We can think of sublinear algorithms for big data as falling into three different categories:

Continue reading

Laying the Foundation for a Common Ground

This week, the Simons Institute hosted a workshop entitled Unifying Theory and Experiment for Large-Scale Networks. The goal of the workshop was to bring together researchers working on a variety of large-scale network problems to discuss both the theoretical models and the empirical processes for testing and validating them. Even further, the “unifying” in the title suggests a forum on where the two ends of the spectrum may meet.

Continue reading

More with Less: Deduplication

Deduplication is a critical technology for modern production and research systems. In many domains, such as cloud computing, it is often taken for granted [0]. Deduplication increases the effective amount of data you can hold in memory [1], store on disk [2], and send across the network [3]. It comes at the cost of extra CPU cycles and, potentially, extra I/O operations at the origin and destination storage backends. Microsoft [4], IBM [5], EMC [6], Riverbed [7], Oracle [8], NetApp [9], and other companies tout deduplication as a major feature and differentiator across the computing industry. So what exactly is deduplication?
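As a rough preview of the answer, here is a minimal sketch of one common scheme, assuming fixed-size chunking with SHA-256 content hashes; production systems typically add content-defined chunk boundaries, persistent indexes, and compression. The names dedup_write, dedup_read, and CHUNK_SIZE are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096   # fixed-size chunking; real systems often prefer
                    # content-defined (variable-size) chunk boundaries

store = {}          # content-addressed chunk store: digest -> bytes

def dedup_write(data: bytes) -> list[str]:
    """Split data into chunks, store each distinct chunk once,
    and return the 'recipe' (list of chunk digests) for the object."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # pay storage only for new content
        recipe.append(digest)
    return recipe

def dedup_read(recipe: list[str]) -> bytes:
    """Reassemble an object from its recipe."""
    return b"".join(store[d] for d in recipe)

# Two objects that share most of their content:
obj1 = b"A" * 10_000
obj2 = b"A" * 10_000 + b"tail"
r1, r2 = dedup_write(obj1), dedup_write(obj2)
assert dedup_read(r1) == obj1 and dedup_read(r2) == obj2
print(len(store))   # 3 unique chunks stored, though 6 were written
```

The digest-to-chunk map is what lets the second object reuse the first one’s identical chunks, so only the differing tail chunk costs new space.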

Continue reading