hadoop fs is space-sensitive

HDFS, the Hadoop Distributed File System, is useful for big data. However, hadoop fs is not quite there as a shell replacement. Today I kept getting the message

cp: When copying multiple files, destination should be a directory.

when trying to copy multiple files to a directory using

hadoop fs -cp /path/to/files/*  /path/to/destination/directory

Finally figured out that the problem was I had two spaces between the file list and the directory path, which made hadoop not see the directory path in the command. Aaahh.
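A possible way to sidestep whitespace issues when scripting this, as a rough sketch only: build the command as a list of arguments instead of one string, so spacing between arguments never enters into it. The helper name hadoop_cp_args is made up for illustration, and I have not verified that this avoids the exact error above.

```ruby
# Hypothetical helper (not part of Hadoop): build the argument list
# for `hadoop fs -cp` so each path is its own argument.
def hadoop_cp_args(sources, destination)
  raise ArgumentError, "no source paths given" if sources.empty?
  ["hadoop", "fs", "-cp", *sources, destination]
end

args = hadoop_cp_args(["/path/to/files/*"], "/path/to/destination/directory")
puts args.inspect

# To actually run it (requires a Hadoop install on the PATH):
#   system(*args)
```

With several arguments, Ruby's system runs the program directly rather than through a shell, so the glob is passed along untouched for hadoop fs to deal with.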

ruby non-intuitive multi-dimensional array assignment

All I want to do is work with an array of arrays…

ruby-1.9.2-p290 :012 > a = Array.new(8,[]) # here lies the problem...
=> [[], [], [], [], [], [], [], []]
ruby-1.9.2-p290 :013 > a[1].push("a")
=> ["a"]
ruby-1.9.2-p290 :014 > a
=> [["a"], ["a"], ["a"], ["a"], ["a"], ["a"], ["a"], ["a"]]

Trying again…

ruby-1.9.2-p290 :019 > a = Array.new(8,Array.new()) # This doesn't solve it
=> [[], [], [], [], [], [], [], []]
ruby-1.9.2-p290 :020 > a[1][0] = 'a'
=> "a"
ruby-1.9.2-p290 :021 > a
=> [["a"], ["a"], ["a"], ["a"], ["a"], ["a"], ["a"], ["a"]]

Argh! Perl makes this so easy…

Ok, the problem was that the first two ways of initializing the array were just creating 8 references to the SAME array.

Now do it the right way:

ruby-1.9.2-p290 :031 > a = Array.new(8) { Array.new(0) } # NOW we have an array of different arrays
=> [[], [], [], [], [], [], [], []]
ruby-1.9.2-p290 :032 > a[1].push('a')
=> ["a"]
ruby-1.9.2-p290 :033 > a
=> [[], ["a"], [], [], [], [], [], []]

Ahh…but I miss an interpreter that always tries to ‘Do The Right Thing’

And, I wish the two versions didn’t look so identical when inspected…
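One way to tell the two apart, as a small sketch: inspect only shows contents, but object_id (or equal?) shows whether the inner arrays are really the same object. The variable names are just for illustration.

```ruby
shared   = Array.new(3, [])     # one inner array, referenced three times
distinct = Array.new(3) { [] }  # the block runs once per slot

# Both print [[], [], []], which is exactly the problem.
puts shared.inspect
puts distinct.inspect

# Count how many different objects are actually in each outer array.
shared_ids   = shared.map(&:object_id).uniq.size
distinct_ids = distinct.map(&:object_id).uniq.size

puts shared_ids                          # 1
puts distinct_ids                        # 3
puts shared[0].equal?(shared[1])         # true
puts distinct[0].equal?(distinct[1])     # false
```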

don’t try creating a gdbm file on an NFS mount

gems/gdbm-1.2/lib/gdbm.rb:256:in `initialize': Empty database (GDBMError)

error occurs when trying to use

g = GDBM.new('somefile')

on an NFS-mounted partition. GDBM works fine on local drives; just don’t try it on NFS mounts. I’m posting this because I found nothing when I googled the error message, and wasted several minutes before I realized the problem. The error message may be specific to the ruby ‘gdbm’ gem, but the rule is a general one.

Wordpress debug notes

Note: I’m not a WordPress expert, just returning to it after several years without having touched PHP – and looking for the best way to quickly understand the flow of a WordPress site using BuddyPress and a few other plugins. Raw notes here, will be annotated as I progress…


Clojure makes the JVM a friendly place…

Yes! Someone gets it – “It has always been an unfortunate characteristic of using classes for application domain information that it resulted in information being hidden behind class-specific micro-languages, e.g. even the seemingly harmless employee.getName() is a custom interface to data. Putting information in such classes is a problem, much like having every book being written in a different language would be a problem. You can no longer take a generic approach to information processing. This results in an explosion of needless specificity, and a dearth of reuse.”
–Rich Hickey, http://clojure.org/datatypes

Data is just data. Please, coders – free the data from all those bureaucratic OO controls on it, and just expose the rules if any, let us obey them thoughtfully our own way. It has got to be better than all these little bureaucratic fiefdoms exerting paranoiac control over their bits of data.

Ok, so it sounds good…now to finish the clojure koans, having realized that the ___ construct is just where you put your answers and not a new triple-underscore special variable. But I’m also hacking on a larger clojure app while learning the basic syntax…it really does seem promising…

The most helpful message yet

My Canon iP4300 printer is exceptionally helpful when anything goes wrong:

“Error Number : 311 Printer is in use or an error has occurred. If an error has occurred, eliminate the cause of the error.”

If an error has occurred, eliminate the cause! Why didn’t I think of that!

Data Visualization

The D3 javascript library looks awesome – clean, extensible, and powerful.

Check out this example of mashing US Census boundaries with unemployment stats…


Oracle has broken registration form…

All I wanted was an updated copy of berkeley db…

Oracle has it, it’s free, but you have to register. Ok, I’ll register. The form has lots of required fields…two of which are broken select drop-downs with no options! This is embarrassing for a multi-million dollar company with a supposed commitment to open source, no?

Here is the culprit:


See the drop-downs for Job Title and State/Province.

Oh, and I tried the live chat. The rep told me to call the help line:
“No I can do nothing, please call one of those numbers to log and start a ticket sir”


need flat, fast namespacing + tags

tags are ok but they are missing a namespace; maybe there is an implicit one from the blogger, but for global discovery what about

gvelez:chickens (means my chickens, not chickens in general)

quick to write without requiring much forethought or looking up namespaces, but more precise and less prone to overlap than flat tags
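A rough sketch of how such tags might be parsed, assuming a single colon separates the namespace from the tag and a bare tag has no namespace. The parse_tag name and the hash layout are my own invention, not an existing API.

```ruby
# Hypothetical parser for "namespace:tag" strings.
# A bare "tag" (no colon) gets a nil namespace.
def parse_tag(raw)
  first, rest = raw.strip.downcase.split(":", 2)
  if rest.nil?
    { namespace: nil, tag: first }
  else
    { namespace: first, tag: rest }
  end
end

mine    = parse_tag("gvelez:chickens")
general = parse_tag("chickens")

puts mine.inspect
puts general.inspect

# Same tag word, but the namespace keeps the two from colliding.
puts mine == general   # false
```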

fwix beats outside.in for ease of use

Looking for local? (i.e., an API to get local data specific to a subject). I tried http://outside.in; the API looked promising and their site had good data, but after a week of 403 errors using their example code snippets and no response from the forums or support, I was about to give up for a while. Then I noticed the Weather Underground folks using fwix. They have an extremely simple API, decent quality data (at least if you filter for News or Places), and although it’s not as complete as I’d like, it does have more than just twitter posts. For ease of use they get 5 stars. Will see how they do on updating, relevance and completeness…