17 March 2011
Humans have come a long way since our cave-dwelling days. No, that’s not a metaphor for primitive software. I mean literally since we lived in caves. One of the big inventions is the lock. There are birds that bury food and will move it later if they notice they were watched burying it. But they have no access control. Any bird may come along and dig up the food.
Humans, though, are smarter than the average bird.
We have numerous systems that implement the analog of a lock, namely, some sort of access control. For every one of these systems, we have other systems that attempt to circumvent or defeat the access control. Two sides of the ubiquitous coin of life.
In software, attempts to implement access control typically involve distinguishing between source code and some form of executable code. Direct access to the source code is not permitted. Further, the format of the executable code resists attempts to derive the source code. There are several reasons for this:
The process that separates the source code from the executable program is typically a compilation step. However, Ruby code is not typically associated with any sort of compilation. That’s one of the great things about Ruby, right? There is no edit-compile-link-load cycle to wait on. Just edit and run. But if there is no compilation step, how do we separate the source code from the executable code?
You may recall from my last post that Rubinius does compile Ruby code to a bytecode format that the virtual machine executes. I also promised to explain how you could run the bytecode directly.
But first, let me very clearly state that there are a number of caveats. In fact, I’ve included a whole section on them below. Please read them. We will assume that you have and that you understand them. If you have any questions, please ask.
Let’s review what we would like to accomplish. We’ll assume affable Abe is a developer writing an application for customer Cain.
In this scenario, I’m assuming a very vague definition of application. In other words, the process below will fit in with a broad spectrum of bundling and distribution schemes.
Let’s assume that you have the following application layout. This mirrors what you would expect to see in a gem. You could also consider this as a subtree in your larger project.
widget
|- lib
   |- widget.rb
   \- widget
      |- red.rb
      |- blue.rb
      \- green.rb
# widget.rb
require 'widget/red'
require 'widget/blue'
require 'widget/green'

# widget/red.rb
puts "I am red"

# widget/blue.rb
puts "I am blue"

# widget/green.rb
puts "I am green"
The Rubinius bytecode compiler is accessible through a command-line script. See rbx compile -h for all options. We will only need one simple option in our case to easily create a separate tree containing one compiled file for every Ruby source file in our source tree.
rbx compile -s '^widget:widget-compiled' widget/
Let’s dissect this command. The -s option defines a transformation to apply to every filename. The transformation has the form <source>:<destination>, where <source> can be a Regexp. In our case, we would like to change any path starting with widget to start with widget-compiled (the ^ anchors the match to the beginning of the path). This way, we create a separate tree of our compiled files. The final argument is the widget/ directory. The rbx compile command will happily compile a single file or a directory of files. Note that if we did not pass the -s option, rbx compile would have created the compiled files alongside the source files.
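To make the <source>:<destination> rule concrete, here is a small plain-Ruby model of the renaming it describes. It is only a sketch under assumptions: compiled_path is a hypothetical helper, it assumes the pattern is applied once to each path (like String#sub) and that a compiled file simply swaps .rb for .rbc, and it neither calls nor reproduces the real rbx compile code.

```ruby
# Hypothetical model of a "<source>:<destination>" filename transform.
# This is NOT the rbx compile implementation; it only mimics the renaming.
def compiled_path(path, transform = nil)
  if transform
    source, destination = transform.split(":", 2)
    # Treat <source> as a Regexp and replace its first match with <destination>.
    path = path.sub(Regexp.new(source), destination)
  end
  # Without a transform, only the extension changes, so the compiled file
  # would land next to its source.
  path.sub(/\.rb\z/, ".rbc")
end

sources = %w[
  widget/lib/widget.rb
  widget/lib/widget/red.rb
  widget/lib/widget/blue.rb
  widget/lib/widget/green.rb
]

sources.each do |src|
  puts "#{src} -> #{compiled_path(src, '^widget:widget-compiled')}"
end
# widget/lib/widget.rb -> widget-compiled/lib/widget.rbc
# widget/lib/widget/red.rb -> widget-compiled/lib/widget/red.rbc
# widget/lib/widget/blue.rb -> widget-compiled/lib/widget/blue.rbc
# widget/lib/widget/green.rb -> widget-compiled/lib/widget/green.rbc
```

Because of the ^ anchor, only the leading widget is rewritten; the inner lib/widget directory keeps its name, which is what the require 'widget/red' calls depend on.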
If we now look at widget-compiled, we should see the following:
widget-compiled
|- lib
   |- widget.rbc
   \- widget
      |- red.rbc
      |- blue.rbc
      \- green.rbc
Now that we have a separate tree of only compiled files, how do we load them? Well, first, let’s load our source files so we know what to expect. Note that the technique used in this post should not substitute for a robust test suite.
$ rbx -Iwidget/lib -e "require 'widget/lib/widget'"
I am red
I am blue
I am green
Ok, that is what I would expect. Now, to load the compiled files:
$ rbx -Iwidget-compiled/lib -e "Rubinius::CodeLoader.require_compiled 'widget/lib/widget'"
I am red
I am blue
I am green
The crowd erupts with applause and hooting.
Golly gee, you guys… Blush
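Comparing the two runs by eye works for three lines of output. A slightly sturdier habit is to capture both outputs and compare them. The sketch below does that with Ruby’s standard Open3 library; same_output? is a hypothetical helper, and since Rubinius may not be installed where you try it, the demo compares two runs of the current Ruby interpreter instead. The commented rbx command lines are the ones used above. Like the manual check, this is a smoke test, not a substitute for a test suite.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: runs two commands (each an argv array, no shell) and
# returns true only if both succeed and print identical stdout+stderr.
def same_output?(cmd_a, cmd_b)
  out_a, status_a = Open3.capture2e(*cmd_a)
  out_b, status_b = Open3.capture2e(*cmd_b)
  status_a.success? && status_b.success? && out_a == out_b
end

# With Rubinius installed, you could compare the two runs from this post:
#   same_output?(
#     ["rbx", "-Iwidget/lib", "-e", "require 'widget/lib/widget'"],
#     ["rbx", "-Iwidget-compiled/lib", "-e",
#      "Rubinius::CodeLoader.require_compiled 'widget/lib/widget'"]
#   )
# Here the current Ruby interpreter stands in for rbx so the sketch runs anywhere.
ruby = RbConfig.ruby
puts same_output?([ruby, "-e", 'puts "I am red"'],
                  [ruby, "-e", 'print "I am red\n"'])   # prints true
```

Passing argv arrays rather than a single command string avoids shell quoting, which matters here because the -e arguments contain both single and double quotes.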
Let’s review. Our goal is to take a tree of Ruby source files and create a tree of compiled files that can be sent to a customer and loaded to perform exactly as the Ruby source would if loaded directly. The most direct and simple way to accomplish this is to use the Rubinius compiler command-line script to compile the tree of Ruby source files to a separate tree. Then, load the root of that tree with Rubinius::CodeLoader.require_compiled "root".
I will admit, I have fiercely resisted encouraging, or even permitting, Rubinius users to use what I showed above in their code. Not because I am an ogre who is trying to steal your fun, but because there are serious issues with allowing this. So, please read the following carefully.
We will respect the contract of the Rubinius::CodeLoader.require_compiled(name) method. That contract says: given a name, we will load a representation of that name. DO NOT assume that "some_file" is actually referencing "some_file.rbc". We may change the way compiled files are stored and may change the format of the compiled output.
Rubinius compiles Ruby code to bytecode before running it. It is possible to save the bytecode representation and reload it later. Using this mechanism, it is possible to avoid providing the Ruby source code and run an application directly from the compiled bytecode. The mechanism we use to do this was created to solve our problem of bootstrapping the Rubinius bytecode compiler, which is written in Ruby. The mechanism is not intended to be used for security.
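For comparison, CRuby (MRI) exposes a similar save-and-reload facility through RubyVM::InstructionSequence. The sketch below uses that CRuby API, not Rubinius’s, and is only meant to illustrate the general idea of storing bytecode on disk and running it later without the source text. The helper name save_and_reload and the file name red.bin are hypothetical, and CRuby’s binary format is only loadable by the same Ruby version and platform, which echoes the caveat that a compiled format may change.

```ruby
require "tmpdir"

# CRuby-specific sketch (not the Rubinius API): compile source to bytecode,
# write the bytecode to disk, then load it back without the source text.
def save_and_reload(source, path)
  iseq = RubyVM::InstructionSequence.compile(source)
  File.binwrite(path, iseq.to_binary)     # this file could ship instead of the .rb
  RubyVM::InstructionSequence.load_from_binary(File.binread(path))
end

Dir.mktmpdir do |dir|
  loaded = save_and_reload('puts "I am red"', File.join(dir, "red.bin"))
  loaded.eval                             # prints "I am red"
end
```

Calling loaded.disasm on the reloaded sequence prints a readable listing of its instructions, a reminder that shipping bytecode instead of source is not a security measure.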
It is possible to extend the Rubinius code loading mechanism to support custom formats for on-disk compiled bytecode and to load those formats. This can be done entirely in Ruby code. If this interests you, please talk with us about it.