RailsConfEurope: EC2, MapReduce, and Distributed Processing
The next session was “EC2, MapReduce, and Distributed Processing” by Jonathan Dahl of Tumblon.
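He did a nice job of explaining MapReduce, the algorithm Google made famous for distributing its indexing work across many machines. A map step turns each input record into key/value pairs, and a reduce step combines all the values that share a key.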
MapReduce is most naturally expressed in a functional programming language, but Jonathan showed how simple it is to implement in Ruby, which is of course well suited to a functional style thanks to its lambdas and closures. It can just as easily be written in Java, or even in a procedural language like C.
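To make the idea concrete, here is a minimal single-process sketch in Ruby with the mapper and reducer passed in as lambdas. It is not Jonathan's code from the talk: `map_reduce`, `word_mapper` and `word_reducer` are hypothetical names, and word counting is just the customary example. The three phases (map, group by key, reduce) are the same ones a distributed implementation spreads across machines.

```ruby
# Hypothetical single-process MapReduce sketch (not the code from the talk).
def map_reduce(inputs, mapper, reducer)
  # Map phase: each input record yields zero or more [key, value] pairs.
  pairs = inputs.flat_map { |record| mapper.call(record) }

  # Shuffle phase: group all pairs that share a key.
  grouped = pairs.group_by { |key, _value| key }

  # Reduce phase: collapse each key's list of values into one result.
  grouped.to_h do |key, kv_pairs|
    [key, reducer.call(key, kv_pairs.map { |_key, value| value })]
  end
end

# Word count with hypothetical input data.
word_mapper  = ->(line) { line.downcase.split.map { |word| [word, 1] } }
word_reducer = ->(_word, counts) { counts.sum }

documents = ["the quick brown fox", "the lazy dog", "The end"]
result = map_reduce(documents, word_mapper, word_reducer)
p result
```

Because the mapper and reducer are just lambdas, the same `map_reduce` skeleton can run a different job by swapping in two other small functions.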
Jonathan also showed how easy it is to use EC2 (Amazon’s Elastic Compute Cloud, a utility computing service) to run a MapReduce system. He also described a use case for this setup: a big American newspaper used MapReduce to convert images of old newspapers into PDF files.
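The reason MapReduce fits a pool of EC2 instances so well is that the map work splits into independent chunks. The sketch below illustrates only that idea: Ruby threads stand in for separate machines, nothing here talks to Amazon, and `distributed_word_count` is a hypothetical name. Each worker counts words in its own chunk, and the partial counts are then merged.

```ruby
# Hypothetical sketch: threads stand in for separate EC2 instances.
# No Amazon API is used; this only shows partitioning and merging.
def distributed_word_count(lines, worker_count)
  # Split the input into roughly equal chunks, one per "machine".
  # worker_count must be a positive integer.
  chunk_size = (lines.size.to_f / worker_count).ceil
  chunks = lines.each_slice([chunk_size, 1].max).to_a

  # Map phase: each worker counts the words in its own chunk independently.
  workers = chunks.map do |chunk|
    Thread.new do
      chunk.each_with_object(Hash.new(0)) do |line, counts|
        line.downcase.split.each { |word| counts[word] += 1 }
      end
    end
  end
  partials = workers.map(&:value) # Thread#value waits for each worker

  # Reduce phase: merge partial counts, summing values for shared keys.
  partials.reduce({}) do |total, partial|
    total.merge(partial) { |_word, a, b| a + b }
  end
end

lines = ["to be or not to be", "to do is to be", "do be do"]
counts = distributed_word_count(lines, 2)
p counts
```

On real hardware the chunks would live on different machines and the merge would happen after their results were collected, but the shape of the computation is the same.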
His advice, though, is not to implement MapReduce yourself; use an open source framework such as Hadoop instead.
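Using Hadoop does not mean giving up Ruby: Hadoop Streaming runs any executable as a mapper or reducer, passing records on standard input and output as tab-separated key/value lines and handing the reducer its input sorted by key. The sketch below assumes that contract. `stream_map` and `stream_reduce` are hypothetical names, written to take IO objects so they can be tried locally. In a real job each would be a separate script reading `STDIN` and writing `STDOUT`. The job-launch command is not shown because the streaming jar's location depends on the installation.

```ruby
require "stringio"

# Mapper: read raw text lines, emit one "word<TAB>1" line per word.
def stream_map(input, output)
  input.each_line do |line|
    line.downcase.split.each { |word| output.puts("#{word}\t1") }
  end
end

# Reducer: input arrives sorted by key, so all lines for one word are
# adjacent; sum them and emit a total whenever the key changes.
def stream_reduce(input, output)
  current_word = nil
  current_count = 0
  input.each_line do |line|
    word, count = line.chomp.split("\t", 2)
    if word == current_word
      current_count += count.to_i
    else
      output.puts("#{current_word}\t#{current_count}") if current_word
      current_word = word
      current_count = count.to_i
    end
  end
  output.puts("#{current_word}\t#{current_count}") if current_word
end

# Local dry run: here we sort by hand, which Hadoop does between phases.
mapped = StringIO.new
stream_map(StringIO.new("b a\na b b\n"), mapped)
sorted = StringIO.new(mapped.string.lines.sort.join)
reduced = StringIO.new
stream_reduce(sorted, reduced)
puts reduced.string
```

The reducer keeps only one running total in memory, which is why the sorted-input guarantee matters: it never has to hold every key at once.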
Resources
Slides: http://railspikes.com