Many years ago (I will not reveal my age), I began working on my PhD thesis in the area of Domain-Specific Languages (DSLs). Research was booming at the time, and many research articles stated in their introductions that DSLs are very useful and increase productivity by reducing lines of code, etc. All these claims seemed logical to me, but I always considered them something like urban legends: we all know they are correct, but cannot easily prove it. Keeping that in the back of my mind, I searched for a way to bring the “legend” down to measurable facts that would provide solid motivation for the importance of DSLs in everyday programming. I decided to do a simple experiment that measures DSL usage in open source programs. Continue reading
Ultra-Efficient via Sublinearity
For a long time in the area of design and analysis of algorithms, when we said that an algorithm was efficient we meant that it runs in time polynomial in the input size n, and a linear-time algorithm was considered the most efficient way to solve a problem. This was because of the assumption that we need to at least read all of the input to solve the problem; seen this way, we cannot do much better! But nowadays data sets in various areas and applications are growing so fast that they hardly fit in storage, and in that setting even linear time is prohibitive. To work with this massive amount of data, the traditional notion of an efficient algorithm is no longer sufficient, and we need to design more efficient algorithms and data structures. This encourages researchers to ask whether it is possible to solve problems using only a sublinear amount of resources. What exactly do we mean by ‘sublinear resources’?
We can think of sublinear algorithms in the area of big data in three different categories:
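To give a taste of what “sublinear” can mean in practice, here is a minimal sketch (the function name and parameters are illustrative, not from any particular paper) of a classic sublinear-time technique: estimating a global statistic of a huge array by random sampling, in time that depends only on the number of samples, not on the input length.

```python
import random

def estimate_fraction(data, num_samples=2000):
    """Estimate the fraction of elements in `data` equal to 1 by
    inspecting only `num_samples` randomly chosen positions.

    Runs in O(num_samples) time, independent of len(data) -- a
    sublinear-time algorithm. By standard concentration bounds, the
    estimate is within +/- epsilon of the true fraction with high
    probability once num_samples is on the order of 1/epsilon^2.
    """
    n = len(data)
    hits = sum(1 for _ in range(num_samples)
               if data[random.randrange(n)] == 1)
    return hits / num_samples
```

For an array of a million elements, half of them 1, a couple of thousand samples already give an estimate close to 0.5, and the cost would be the same for a billion elements. The trade-off is the defining feature of sublinear algorithms: we give up exactness for an approximate answer with probabilistic guarantees.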
Laying the Foundation for a Common Ground
This week, the Simons Institute hosted a workshop entitled Unifying Theory and Experiment for Large-Scale Networks. The goal of the workshop was to bring together researchers working on various large-network problems to discuss both the theoretical models and the empirical processes for testing and validating them. Even further, the “unifying” in the title suggests a forum where the ends of the spectrum may meet.
More with Less: Deduplication
Deduplication is a critical technology for modern production and research systems. In many domains, such as cloud computing, it is often taken for granted [0]. Deduplication increases the effective amount of data you can store in memory [1], on disk [2], and in transmission across the network [3]. It comes at the cost of more CPU cycles, and potentially more I/O operations at the origin and destination storage backends. Microsoft [4], IBM [5], EMC [6], Riverbed [7], Oracle [8], NetApp [9], and other companies tout deduplication as a major feature and differentiator across the computing industry. So what exactly is deduplication?
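The core idea can be sketched in a few lines of Python. This is a toy scheme with fixed-size chunks and invented function names; production systems typically use content-defined chunking, compression, and persistent indexes, but the principle is the same: store each unique chunk once, keyed by a cryptographic hash, plus a “recipe” of hashes to rebuild the original stream.

```python
import hashlib

def dedup_store(data: bytes, chunk_size: int = 4096):
    """Split `data` into fixed-size chunks, keeping one copy of each
    unique chunk keyed by its SHA-256 digest. Returns the chunk store
    and a recipe (list of digests) for reconstructing the data."""
    store, recipe = {}, []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # duplicate chunks stored only once
        recipe.append(digest)
    return store, recipe

def restore(store: dict, recipe: list) -> bytes:
    """Rebuild the original byte stream from the store and the recipe."""
    return b"".join(store[d] for d in recipe)
```

For a 16 KiB input made of four chunks where three are identical, the store holds just two chunks while the recipe still lists four, which is exactly where the space savings come from: repeated data costs only a digest, not another copy.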
The Scary Reality of Identity Theft
One of the most basic philosophical questions stems from attempting to identify oneself, the first step being to prove that you actually exist. René Descartes provides a proof with
Cogito ergo sum
meaning, “I think, therefore I am.” The intuition is that the mere fact of thinking forms a proof that you exist. But who or what are you exactly? What identifies you? How can we definitively prove you are what you claim to be? Who you claim to be? The problem of identity is an incredibly hard one—how do you know a letter in the mail is from the person that signed it? How do you know a text was written by the owner of a certain phone? How do you know an email comes from the person that owns an email address? This is a fundamental problem that faces the fields of computer science and cryptography, and it is incredibly hard to solve.