Friday, 1 March 2013

Project ideas for Hadoop

If this is for an undergraduate class, I would suggest something that
lets you work with basic data structures, such as building an inverted
index over a few million documents (Wikipedia pages, perhaps). You will
also need to get a general feel for Hadoop along the way.
The University of Washington has some really nice project ideas for
their distributed systems class:


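To give a rough idea of what the inverted index project involves, here is
a minimal sketch in plain Java that mimics the map, shuffle, and reduce
steps in memory. It does not use the Hadoop API. The class name
InvertedIndexSketch, the method names, and the two sample documents are
all made up for illustration. In a real job this logic would live in
Hadoop Mapper and Reducer classes, and the framework would handle the
grouping step.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory sketch; not the Hadoop API.
class InvertedIndexSketch {

    // "Map" step: emit one (term, docId) pair for every word in a document.
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, docId));
            }
        }
        return pairs;
    }

    // "Shuffle" and "reduce" steps: group the pairs by term and keep
    // each document ID once per term, in sorted order.
    static Map<String, TreeSet<String>> reduce(List<Map.Entry<String, String>> pairs) {
        Map<String, TreeSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            index.computeIfAbsent(pair.getKey(), k -> new TreeSet<>())
                 .add(pair.getValue());
        }
        return index;
    }

    // Run the map step over every document, then reduce all emitted pairs.
    static Map<String, TreeSet<String>> build(Map<String, String> docs) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            pairs.addAll(map(doc.getKey(), doc.getValue()));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "Hadoop runs MapReduce jobs");
        docs.put("doc2", "Hive runs on Hadoop");
        // Print each term followed by the documents that contain it.
        build(docs).forEach((term, ids) -> System.out.println(term + " -> " + ids));
    }
}
```

On a few million documents the interesting work is in the parts this
sketch leaves out: tokenizing real page markup, deciding what to store
alongside each document ID, and keeping the reduce output manageable.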
If you want to tackle something a little more advanced, you could
take a look at Pete Skomoroch’s article on finding trends with Hadoop
and Hive:



Things to keep in mind:

1.) Hadoop won't be as simple as writing a single Java app

2.) There will be some overhead involved in rewriting algorithms in MapReduce

3.) There will also be some overhead involved in setup and maintenance
of the Hadoop cluster

Take these three things into account when planning how to manage your
time for the project during the semester. A semester can seem a lot
shorter when you spend too much time on things other than implementing
and testing your algorithm.
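
As a small illustration of the rewriting overhead in point 2, here is a
hedged sketch in plain Java (again, not the Hadoop API; every name in it
is hypothetical). Computing a mean is a single loop in a normal program.
Once the data is split across workers, each worker has to emit a partial
(sum, count) pair, and a final step merges them, because averaging the
per-split averages gives the wrong answer when the splits differ in size.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of restructuring a mean for a split-up data set.
class MeanRewriteSketch {

    // Single-process version: one loop over all the values.
    static double meanSimple(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // What one worker would emit for its split: {sum, count}.
    static double[] partial(List<Double> split) {
        double sum = 0;
        for (double v : split) {
            sum += v;
        }
        return new double[] {sum, split.size()};
    }

    // Final step: merge the {sum, count} partials into one mean.
    static double meanFromPartials(List<double[]> partials) {
        double sum = 0;
        double count = 0;
        for (double[] p : partials) {
            sum += p[0];
            count += p[1];
        }
        return sum / count;
    }

    public static void main(String[] args) {
        List<Double> splitA = Arrays.asList(1.0, 2.0, 3.0);
        List<Double> splitB = Arrays.asList(10.0);
        double merged = meanFromPartials(Arrays.asList(partial(splitA), partial(splitB)));
        // A naive mean of the two per-split means, shown for comparison.
        double naive = (meanSimple(splitA) + meanSimple(splitB)) / 2;
        System.out.println("merged partials: " + merged);
        System.out.println("mean of means:   " + naive);
    }
}
```

Even for something this small, the distributed version needs an extra
intermediate format and a merge step, which is the kind of work worth
budgeting time for.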


Good luck!
