TagShadow Dreaming – Hadoop

May 29, 2010

I started the weekend watching through these Hadoop videos over at Cloudera. That led me to research manipulating large matrices with Hadoop. I found a spectacular paper about multiplying matrices with Hadoop. A random statement in that paper about sparse matrices sent me back to the documentation for JAMA (the Java matrix library I used for TagShadow processing) and the competing library, Jampack. It turns out neither of them has algorithms optimized for sparse matrices. I did find some notes about how to optimize matrix multiplication for a sparse matrix.
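To make the sparse idea concrete, here is a rough sketch in plain Java. It is not taken from those notes, from the paper, or from JAMA or Jampack; the class name `SparseMatrixSketch`, the helper `set`, and the map-of-maps layout are all my own assumptions for illustration. The point is only that when you store just the nonzero cells, the multiplication loop can skip every zero entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a sparse matrix stored as
// row index -> (column index -> nonzero value). Zeros are simply absent.
public class SparseMatrixSketch {

    // Store a single nonzero cell (illustrative helper, not a library call).
    static void set(Map<Integer, Map<Integer, Double>> m, int row, int col, double value) {
        m.computeIfAbsent(row, r -> new HashMap<>()).put(col, value);
    }

    // Compute C = A * B, visiting only the nonzero cells of A and B.
    static Map<Integer, Map<Integer, Double>> multiply(
            Map<Integer, Map<Integer, Double>> a,
            Map<Integer, Map<Integer, Double>> b) {
        Map<Integer, Map<Integer, Double>> c = new HashMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> rowA : a.entrySet()) {
            int i = rowA.getKey();
            for (Map.Entry<Integer, Double> cellA : rowA.getValue().entrySet()) {
                int k = cellA.getKey();
                // A[i][k] only contributes if row k of B has nonzero cells.
                Map<Integer, Double> rowB = b.get(k);
                if (rowB == null) {
                    continue;
                }
                for (Map.Entry<Integer, Double> cellB : rowB.entrySet()) {
                    double product = cellA.getValue() * cellB.getValue();
                    // Accumulate A[i][k] * B[k][j] into C[i][j].
                    c.computeIfAbsent(i, r -> new HashMap<>())
                     .merge(cellB.getKey(), product, Double::sum);
                }
            }
        }
        return c;
    }
}
```

The work here scales with the number of nonzero pairs that actually line up rather than with the full dimensions of the matrices, which is the property a dense library does not give you.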

I’ve been considering pushing TagShadow to its very limits. Even if I just included the titles that have been added to ISFDB, it looks like that’s pushing 500k entries. Of course I’d like to handle the online magazines that ISFDB doesn’t cover as well… Handling that much data will involve some real-time abstractions to remain functional. It’s these thoughts that have me investigating parallel computing solutions like Hadoop. This is rather stream of consciousness at the moment, but I’m excited.
