Data Structures in Ruby – Proposed Table of Contents

I’m thinking of writing an ebook on Data Structures in Ruby.  Any feedback, thoughts or comments are greatly appreciated!  I’ve included my proposed table of contents below.

Data Structures in Ruby

1. Intro
2. Analysis of Data Structures
  a. What is a data structure?
  b. Understanding algorithms
  c. Big-O notation (incl. big-O vs theta, omega)
  d. Memory and Time Complexity
  e. Inserting data
  f. Retrieving data
  g. Searching for data
3. Primitive Types
  a. What are primitive types?
  b. What are they good for?
  c. Numeric types
  d. Strings & Symbols
  e. Booleans
  f. Objects
4. Arrays
  a. What are arrays?
  b. What are they good for?
  c. Basic arrays
  d. Dynamic arrays
5. Hashes (aka Maps, Associative Arrays, Dictionaries)
  a. What are hashes?
  b. What are they good for?
  c. Concept of a hash – key, value
  d. Implementations of hashes
6. Lists
  a. What are lists?
  b. What are they good for?
  c. Linked Lists
  d. Doubly-linked lists
  e. Unrolled lists
  f. Self-organizing lists
7. Sets
  a. What are sets?
  b. What are they good for?
8. Stacks
  a. What are stacks?
  b. What are they good for?
  c. Implementations – array vs. linked list
9. Queues
  a. What are queues?
  b. What are they good for?
  c. Queue implementations
  d. Priority Queues
10. Trees
  a. What are trees?
  b. What are they good for?
  d. Binary Trees
  e. Binary Search Trees
  f. Red-black trees
  g. Splay trees
  h. B-trees
11. Graphs
  a. What are graphs?
  b. What are they good for?
  c. Sparse Graphs
  d. Dense graphs
  e. Matrix representations
12. Tries
  a. What are tries?
  b. What are they good for?
  c. Regular tries
  d. Radix tries
  e. Judy arrays
13. Heaps
  a. What are heaps?
  b. What are they good for?
  c. Regular heaps
  d. Binary heaps
  e. Fibonacci Heaps
14. Multiway Trees
  a. What are multiway trees?
  b. What are they good for?
  c. Ternary trees
  d. And-or trees
  e. Spaghetti stacks
15. Probabilistic Data Structures
  a. What are probabilistic data structures?
  b. What are they good for?
  c. Bloom Filters
  d. Skip Lists
  e. Locality-sensitive hashes

Ruby HoodPop and RubyClean – Two Static Analysis Tools for Ruby

While the Ruby culture seems to really celebrate dynamic testing, there is less emphasis on static analysis tools.  While there are certainly good examples of static analysis tools for Ruby such as flog, they’re just not as popular as statically-typed languages like Java.  Now part of this is because doing static analysis on a dynamic language is much more difficult.  However, that’s no reason not to try!  With that in mind, I have put together two small scripts for helping with static analysis of your Ruby code.

The first, HoodPop, allows you to see how Ruby tokenizes, parses, and compiles your code.  This will probably only be of interest for the die-hard performance nuts out there, most of whom are probably coding in C anyways.  It can be interesting to see, however, and will definitely help you understand what Ruby is doing with your code “under the hood.”  I wrote this tool as part of a Lightning Talk I gave at Pittsburgh Ruby in February, if you’d like to see the slides or video of my “Ruby Under the Hood” talk.  To run it, just give it the name of a Ruby (.rb) file as an argument.


The second is arguably more useful.  When I review people’s code (and when my code is reviewed), one of the most common issues I see is people forgetting to remove their old commented-out code.  There’s no real need to keep most of it around when you have version control, and almost every time it’s just an oversight.  Therefore, I wrote a simple script which will check for potentially commented-out code in a codebase.  It’s not fool-proof, but I tried to err on the side of showing you more than less – you can always grep -v some lines out from the output if, for instance, your comments contain an abundance of curly braces.  To run this, just give it a list of root directories as arguments.  It will only look at .rb files, but you could easily extend it for .erb and other files.  You’d probably want to change the regexes around in the possibly_code? function first, but it should be relatively self-explanatory.


Those Who Don’t Know Their UNIX History Are Doomed to Re-type It

This is a bit of a diversion from my series on probabilistic data structures (I promise, I’m working on putting an article on skip lists together), but I wanted to write about something which I think could benefit your UNIX (or UNIX-esque, e.g., GNU/Linux) development environment immensely.

It turns out that not everybody who uses the UNIX command line knows about the wealth of history commands available.  Used properly, you should rarely ever have to re-type a full command, and if you make a quick typo, you will be able to correct it almost on the fly. Note that these commands are not universal (especially if you’re using one of the odder shells, with which I am not very familiar – I’ve really only ever used sh, bash, and tcsh), but should work on bash.  I tried these out on both Ubuntu 12.04 and Mac OS X 10.6 without difficulty, but your mileage may vary.  If you don’t know which shell you’re using, you can try either of the following commands:

echo $SHELL


ps -p $$

The first tip, and probably the most common, is the use of the !-operators.  These refer to previous commands in a variety of ways.  In case you didn’t know, there is a list of commands that you recently typed.  This can easily be accessed by the command history.  You can also add an integer argument to limit it to the last n commands. For example, here is the output of running history 5 on my machine.

$ history 5
 496 clear
 497 ruby -w BayesianClassifier.rb 
 498 vi foo
 499 wc foo
 500 ping

Here you can see that for some reason, I cleared the screen, ran a Ruby program, edited a file, ran a word count on it, and pinged  All quotidian stuff.  If I wanted to re-edit foo, I could type:


and command number 498 would run again, opening up the foo file for me to edit.

You can also edit by referring to commands relatively.  For example, let’s say that instead of editing foo, I just wanted to run a word count on it again.  Instead of typing history and getting the exact number, you can type:


and run the command “two commands ago,” wc foo.  If you want to run the command immediately before, a shortcut for “run the previous command again” is !!.  For example,

$ touch foo
$ !!
touch foo

Note that whenever you use the !-operators, bash will print out the command and then execute it.  If you ONLY want to print out the command, say to double-check that it’s the right one, add the :p operator to the end.

$ touch foo
$ !!:p
touch foo
This will print the command out but importantly, it will not execute it.  However, it “counts” as a command; you could print something out using :p, and then run !! to actually execute it.  From the point of view of the shell, it was the last command “executed.”
You are not limited to the strict text of the command entered.  You can specify which arguments you want to use by using the :n operator after the !-operator.  For example, let’s say you want to list some files and see how big they are, then you will determine which one to edit.
$ ls -l billtest.rb helpers_tests.rb test_recommend.rb 
-rw-r--r-- 1 wlaboon staff 320 Jan 13 18:38 billtest.rb
-rw-r--r-- 1 wlaboon staff 1732 Jan 13 09:39 helpers_tests.rb
-rw-r--r-- 1 wlaboon staff 624 Jan 13 09:39 test_recommend.rb
$ vi !:3
vi helpers_tests.rb

Note that the numbering starts at 0, so !:0 would be ls, !:1 would be -l, and so on.  !$ is a special command meaning the last argument – in this case, test_recommend.rb.  You can also combine operators, like so:

$ !763:p
ls -l billtest.rb helpers_tests.rb test_recommend.rb 
$ !!:$:p

The last example translates to “print out, but do not execute, the last argument to the most recently typed command.”

The ^ operator is also a very useful command.  It allows you to easily correct typos, or slightly modify the previous command.  It works by typing ^, what you wish to change, ^ again, and what you wish to change the previous string to.  This is one of those things which makes more sense when you see it in action:

$ la -l
-bash: la: command not found
$ ^la^ls
ls -l
total 24
-rw-r--r-- 1 wlaboon staff 320 Jan 13 18:38 billtest.rb
-rw-r--r-- 1 wlaboon staff 0 Jan 20 14:32 foo
-rw-r--r-- 1 wlaboon staff 1732 Jan 13 09:39 helpers_tests.rb
-rw-r--r-- 1 wlaboon staff 624 Jan 13 09:39 test_recommend.rb

Note that this only replaces the FIRST instance of the string with the second.

One of the cooler things available in modern shells is the reverse-i-search.  Type control-r at the prompt and you can interactively search for previous commands.

(reverse-i-search)`cblas': locate cblas.h

It’s kind of difficult to tell, but I typed in cblas and it found the last command I used that text string in.  It also interactively shows you on the right the closest match it’s found as you type.

Finally, you can modify some environment variables to make your history completion even better.  Set HISTSIZE to the number of commands you want to store in your history.  You can set HISTCONTROL to ignorespace, ignoredupes, or ignoreboth.  ignorespace will not store any commands that start with a space.  Since the shell ignores initial whitespace, this won’t matter to the execution of the commands, but it won’t store that command in the history.  Honestly, I’ve never felt much need for that, but I guess if you’re embarassed about the number you times you have to run “man *command*”, then perhaps you will find it more useful than I do.  ignoredupes will not put duplicate entries into your history.  ignoreboth will (intuitively) set both the ignoredupes and ignorespace flags.  Since you don’t want to keep re-typing these every time you log in, of course, you should probably add them to your .bashrc or equivalent.