My $300 Home Cloud Server: A Story of Blood, Sweat, and eBay

“I really wish I had a dedicated Linux computer to run computer vision algorithms on,” said my fiancée a couple of weeks ago. If you had been there, you would have been blinded by the metaphorical light bulb that lit over my head. You see, just the week before, my friend and co-worker had ordered an old, decommissioned (complete with “non-classified” stickers!) Apple Xserve off of eBay for merely $40. Like my fiancée, he wanted a machine for a special purpose: testing compilation of open-source software on a big-endian architecture. I was quite envious that he was able to hack on such cool hardware at such a low price. But I wasn’t yet ready to bring out my wallet. I couldn’t justify indulging a new hobby without good reason; I was stuck waiting for just the right impetus. I didn’t wait long. My fiancée’s wish became my command!

Eyes Clouded by Distributed Systems

You are probably reading this article on a machine with a dual- or quad-core processor, perhaps with even more cores. Your computer is already a distributed system, with multiple computing components (cores) communicating with each other via main memory and other channels, such as the physical buses, or wires, between them. As you browse multiple web pages you are interacting with the largest distributed system ever created: the Internet. We recently celebrated IPv6 Day [0]: IPv6 is a new scheme for addressing devices connected to the Internet, created because the Internet’s sheer scale has outgrown the previous standard, IPv4, and its pool of 4+ billion addresses. Every Internet company depends on distributed systems, and, by extension, the economies of the world are now tied to them.
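To put those address-space numbers in concrete terms, here is the arithmetic behind the exhaustion problem; this is just a back-of-the-envelope sketch based on the address widths (32 bits for IPv4, 128 bits for IPv6):

```python
# IPv4 addresses are 32 bits wide, IPv6 addresses are 128 bits wide.
ipv4_addresses = 2 ** 32
ipv6_addresses = 2 ** 128

print(f"IPv4 address space: {ipv4_addresses:,}")
# IPv4 address space: 4,294,967,296
print(f"IPv6 address space: {ipv6_addresses:,}")
```

That 4.29 billion figure is the “4+ billion” ceiling mentioned above, and in practice the usable pool is even smaller once reserved and private ranges are carved out.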

Companies such as Google, Facebook, and Amazon are all interested in building highly efficient large-scale distributed systems to power their businesses. Over the past decade, Google has described the Google File System (GFS) [1], a file system spanning thousands of computers that stores more data than any single computer could, as well as a technology that has shaped almost every form of large-scale computing since its publication: MapReduce [2]. MapReduce is distributed computing for the masses because it distills everything down to two functions, Map and Reduce; once they are specified, it handles all other aspects of coordinating thousands of computers on behalf of the programmer. Facebook has released open source projects such as Thrift [3] for implementing communication between programs written in different programming languages. Amazon built the first, and largest, public cloud, EC2 [4], by inventing new distributed systems designed to bring datacenter scale to the masses: with EC2 you can easily start 100 servers within minutes. Amazon offers many other services that enhance its cloud, such as a storage substrate called S3 [5] (think of it as a building block for a GFS) and CloudFront [6], a content distribution network (CDN) designed to distribute data around the world for low-latency, high-bandwidth access. Akamai [7] also helps deliver the web’s content with one of the largest CDNs in the world. Netflix built its own distributed CDN [8] after outgrowing the solutions provided by Akamai and Amazon.
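The two-function model at the heart of MapReduce can be sketched in a few lines of Python. This is a single-process toy, not Google’s implementation: the real framework shards the input, runs map tasks across thousands of machines, shuffles intermediate pairs over the network, and recovers from failures, all of which is elided here. The classic word-count example shows the shape of the programmer’s job:

```python
from collections import defaultdict

def map_fn(document):
    # Map: emit an intermediate (word, 1) pair for every word seen.
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce: combine all counts emitted for one word into a total.
    return (word, sum(counts))

def run_mapreduce(documents):
    # "Shuffle" phase: group intermediate values by key,
    # the part the real framework does across the network.
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_fn(k, v) for k, v in grouped.items())

print(run_mapreduce(["the cat sat", "the dog sat"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

The appeal is that `map_fn` and `reduce_fn` are all the programmer writes; everything inside `run_mapreduce` is what the framework scales out and makes fault-tolerant on your behalf.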
