2010 September

The Dexy Command

September 26, 2010   ·   By   ·   Comments Off   ·   Posted in Uncategorized

The dexy command-line tool is the primary way of interacting with Dexy. Let’s take a look at the options which are available by calling:

dexy -h


Blogging With Dexy

September 15, 2010   ·   By   ·   2 Comments   ·   Posted in Uncategorized

Blogging about programming often means talking about code and including code snippets in your posts. Anyone who has tried to set up a technical blog will know that this isn’t easy. GUI text editors like to reformat your code, and are especially prone to mangling whitespace. Then there’s the thorny issue of syntax highlighting. Blog posts also have the same issues as any other type of writing when it comes to including source code. If you write the source code directly into the document, it might be wrong (and I find it to be a relatively slow process). If you write it separately and test it, then copy and paste, it’s probably correct, but any later changes you make may introduce errors. (I always find that, as I write, I iterate on the example code.) Also, the snippet you want to include in your post may not be able to run on its own.

You want to be able to write code in code files using your favourite text editor, test and run it, and then leave it there to be run again at any time or modified and re-tested, while pulling out just the useful and interesting sections to include in your discussion. This post is going to show you how Dexy makes this possible.


Installing Dexy

September 13, 2010   ·   By   ·   Comments Off   ·   Posted in Uncategorized

Here’s what’s needed to install Dexy on a totally blank Arch Linux installation:

pacman -Sy --noconfirm python
pacman -Sy --noconfirm mercurial
pacman -Sy --noconfirm setuptools

# Packages Used by Dexy
easy_install nose
easy_install ordereddict # only needed if your python < 2.7
easy_install simplejson
easy_install pydot

# Packages Used by Handlers/Examples/Tests
pacman -Sy --noconfirm r
easy_install jinja2
easy_install pexpect
easy_install pygments
easy_install http://dexy.it/tmp/idiopidae-0.5.tgz
easy_install http://dexy.it/tmp/zapps-0.5.tgz

hg clone http://bitbucket.org/ananelson/dexy
cd dexy/
mkdir artifacts
nosetests
python setup.py install

If you wanted to, you could actually run that script (I did, to check it). However, you can also just use it as a handy checklist of what should be installed to get Dexy working. Pacman is the Arch Linux package manager, so you might need to check for different package names and, of course, use the appropriate syntax for your package manager of choice. Hopefully easy_install will be the same everywhere, and if you already have it, this shouldn’t take long. You need everything on the list in order to run the tests. If you won’t be using the Dexy filters which require them, you can try leaving off some of the packages in the second list. (I’ll be tidying up the filter system soon too, so Dexy won’t complain if you don’t have the ingredients for every filter installed.)

A note about pexpect. Dexy uses pexpect for many of its filters, including the filter which runs Python code. This allows tremendous flexibility, in that anything with a command line interface can be controlled using pexpect. However, pexpect only works on *nix-like operating systems, not on Windows. This means that Dexy’s functionality will be very limited on Windows for the time being (and I haven’t done any testing of Dexy on Windows yet).

However, I would encourage Windows users to consider using Dexy nonetheless by running it in a virtual machine which runs a flavour of Linux. You are likely to have a better experience using the sorts of tools which are more suited to Dexy on Linux, and running Linux within a virtual machine gives you a “soft” start if you are new to this environment. You won’t need to worry about many of the issues, such as device drivers, which tend to pose problems for new users; you can just use Linux for research or documentation-related tasks and continue using Windows as before for remaining tasks. Working with virtual machines has many advantages, such as being able to easily revert to ‘snapshots’ of your system if you make a mistake in installing or configuring software.

If you run into any issues, please let me know.

Hello, World

September 12, 2010   ·   By   ·   6 Comments   ·   Posted in Uncategorized

I’m delighted to announce the imminent launch of Dexy. Dexy is open source software for automating documents and documentation. The goal of Dexy is for you to never manually type runnable code or the results of running that code into your document. If you’re writing a tutorial, the example code and the example output all come from running live code. If you’re writing a scientific paper or lab report, then your graphs, results of statistical analysis, raw data tables and any source code examples which appear in your paper are all fully automated. Using this approach you can put together complex documents with surprising speed. Dexy is powerful open source productivity software for scientists, programmers and anyone else who writes about code or data.

Dexy’s approach helps you produce software documentation quickly and accurately. How many times have you seen mistakes in tutorials, or in RDoc/JavaDoc-style comments right next to the code? This is far less likely when the example scripts and example output come from live code. Even more importantly, Dexy helps you maintain documentation over time. I’ve seen many projects with excellent documentation written for the 0.1 release, but not updated to reflect the changes in the 0.3 release, causing frustration and confusion to users. Good documentation is key to attracting and retaining users, and maximizing the benefits they receive from your software. Dexy makes it easier not just to write documentation, but to write good documentation, in any format you like, and to keep it up to date.

For scientific writing, Dexy helps you compose documents quickly and easily, but also helps you (or others) validate the correctness of those documents by clearly linking sources to the end results. If you’re revisiting work that’s a few years old, having built it with Dexy will ensure that you can update it easily and remind yourself how it was put together. “Show me how you arrived at this number” will be a question you’ll look forward to being asked (or, if you publish your full sources, a question your readers won’t need to ask). Using Dexy (and publishing your full source code) can make it much easier for fellow researchers to reproduce your results or explore your model, which may lead to valuable feedback and discussions, and eventually more papers based on (and citing) your work.

Dexy works for any kind of document. The most common approach taken in the demos is to write prose in a LaTeX or HTML document, and pull in sections of source code, graphs, or output from other scripts using tags. This allows you to write any type of document: tutorials, getting started guides, architecture overviews, literate test output, a personal research journal, blog posts. Since Dexy simply structures the composition of various tools, and it’s very easy to write new Dexy filters, this is just the beginning of what Dexy can do. Dexy is written in Python, and this makes it slightly easier to incorporate tools with a native Python interface, but with Python’s pexpect module you can run any command line process as a filter. Or multiple command line processes, as in the C filter which first calls the clang/LLVM compiler, then runs the freshly compiled binary.

You can accomplish document automation to some degree with shell scripts, rake/make, and other automation tools, and there are existing software tools which support literate documentation, but no approach is as convenient, complete or flexible as Dexy. If you already have a favourite tool, then you don’t need to stop using it to get the benefits of Dexy. If it has a command line interface or it can be scripted, then it can probably work as a Dexy filter.

Dexy                     Sphinx           Sweave
any language             Python, C/C++    R
any templating system    ReST             noweb
any markup               ReST             LaTeX
runnable code
standalone source code files
freeform documents
mix multiple languages
smart caching

(help me improve this feature table by letting me know if there are other packages I should include, or any other suggestions/corrections)

How Dexy Works

If you want you can skip ahead to the examples.

input. filter. combine.

Dexy works by taking inputs, usually raw text files, and running the text through filters. Filters are arbitrary code which change the input text in some way. Filters can be chained together, so the output of one filter becomes the input to the next filter.
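To make the chaining idea concrete, here is a minimal sketch in Python. It is not Dexy's actual filter API: the two filter functions and the run_chain helper are hypothetical names invented for illustration, and each "filter" is simply a function from text to text.

```python
from functools import reduce


def strip_trailing_whitespace(text):
    """Hypothetical filter: remove trailing whitespace from each line."""
    return "\n".join(line.rstrip() for line in text.splitlines())


def number_lines(text):
    """Hypothetical filter: prefix each line with its line number."""
    return "\n".join(
        f"{i}: {line}" for i, line in enumerate(text.splitlines(), start=1)
    )


def run_chain(text, filters):
    """Feed text through each filter in turn; each output becomes the next input."""
    return reduce(lambda current, f: f(current), filters, text)


result = run_chain("int x;   \nint y;", [strip_trailing_whitespace, number_lines])
print(result)
# 1: int x;
# 2: int y;
```

Because every filter has the same shape (text in, text out), the order of the list is the only thing that determines the pipeline.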

Filters can change the input text in a subtle way, such as applying syntax highlighting tags to source code. Or filters can take more drastic action, like running code and using the input text as a parameter specification (or ignoring the input text altogether). You can do absolutely anything with a filter.
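A filter that runs code can be sketched as a thin wrapper around an external process. The example below is an illustration only: it uses the standard library's subprocess module rather than whatever mechanism a real Dexy filter uses, process_filter is a hypothetical name, and the current Python interpreter stands in for an arbitrary command line tool.

```python
import subprocess
import sys


def process_filter(command, input_text):
    """Run `command`, send input_text to its stdin, and return its stdout."""
    completed = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command exits non-zero
    )
    return completed.stdout


# The current Python interpreter stands in for any command line tool.
upper = process_filter(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    "hello, world\n",
)
print(upper, end="")
# HELLO, WORLD
```

The same wrapper shape would work for a compiler, an interpreter, or a converter, as long as the tool reads its input and writes its result in a way the wrapper can capture.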

Taking input text and piping it through filters is hardly new; where Dexy really shines is in the way you specify which filters should be run, and in letting you compose documents with other documents as inputs. Dexy looks for specification files named .dexy, which are JSON-formatted text files that you place in the directory containing files you wish to process, or in any parent directory within the project. Within the .dexy file, you specify individual files or wildcard patterns and use the | symbol to indicate the filters you wish to apply.

So, for example:

"001.c|c"

tells Dexy to look for a file named 001.c, and to apply the ‘c’ filter to this file. The ‘c’ filter compiles the C code and executes the resulting binary. The output from running this filter is going to be whatever was written to STDOUT by the C programme. The specification:

"*.c|c"

tells Dexy to look for any file with a .c extension and run the ‘c’ filter on it, and

"*.c|pyg"

tells Dexy to look for any file with a .c extension and to apply the ‘pyg’ filter to it (which runs the Pygments syntax highlighter on the input). The ‘pyg’ filter defaults to HTML output; if we wanted LaTeX instead, we could achieve this by specifying:

"*.c|pyg|l"

where ‘l’ is a filter which simply passes through the input text unchanged, but it restricts its input to .latex files (which forces the ‘pyg’ filter to output LaTeX).
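The specification strings above are easy to take apart. The following sketch shows one way to split a specification into a file pattern and a list of filter aliases, and to find the files a wildcard selects. The function names are hypothetical, and this is an illustration of the syntax rather than Dexy's own parsing code.

```python
from fnmatch import fnmatch


def parse_spec(spec):
    """Split a spec such as "*.c|pyg|l" into a file pattern and filter aliases."""
    pattern, *filters = spec.split("|")
    return pattern, filters


def matching_files(spec, filenames):
    """Return the filenames selected by the pattern part of a spec."""
    pattern, _ = parse_spec(spec)
    return [name for name in filenames if fnmatch(name, pattern)]


files = ["001.c", "002.c", "tutorial.html"]
print(parse_spec("*.c|pyg|l"))         # ('*.c', ['pyg', 'l'])
print(matching_files("*.c|c", files))  # ['001.c', '002.c']
```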

Now, these C files and their output and their syntax highlighting could be useful on their own, but how about this:

"tutorial.html|jinja" : {
   "inputs" : [
     "*.c|pyg",
     "*.c|c"
   ]
 }

That tells Dexy that we want to take the contents of tutorial.html and pipe it through the Jinja templating system, and we want the items listed as inputs to be available. So using Jinja tags, we can pull in source code and the output of running that source code anywhere in our HTML document. When you make a change in 001.c, then the syntax highlighting and the record of STDOUT are updated in the tutorial.
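As a rough picture of what "pulling in" an input means, the sketch below substitutes tags of the form {{ d['key'] }} with entries from a dictionary of inputs. It is a deliberately tiny stand-in written with a regular expression, not Jinja itself (which supports far more than this), and the render function and the sample template are assumptions made for illustration.

```python
import re

# Matches tags such as {{ d['001.c|c'] }} and captures the key.
TAG = re.compile(r"\{\{\s*d\['([^']+)'\]\s*\}\}")


def render(template, d):
    """Replace each {{ d['key'] }} tag with the matching entry of d."""
    # A function is used as the replacement so backslashes in the
    # inserted text are not treated as regex escapes.
    return TAG.sub(lambda match: d[match.group(1)], template)


# Hypothetical inputs: the key is the specification, the value its output.
inputs = {"001.c|c": "hello, world\n"}
template = "<h2>Output</h2>\n<pre>{{ d['001.c|c'] }}</pre>"
html = render(template, inputs)
print(html)
```

A tag that names an input which was not made available raises a KeyError here, which mirrors the idea that a document can only use the inputs listed for it.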

Prefer to publish a PDF? No problem. Write your tutorial as a .tex file.

"tutorial.tex|jinja|latex" : {
   "inputs" : [
     "*.c|pyg|l",
     "*.c|c"
   ]
 }

The jinja filter will pull in the source code and outputs, and the latex filter will run ‘pdflatex’ and actually create a PDF.

Of course, with wildcards you don’t need to specify this for each document you want to create. You can say:

"*.html|jinja" : {
   "inputs" : [
     "*.c|pyg",
     "*.c|c"
   ]
 }

in the root of your project and this rule will apply in all subdirectories (unless overridden). In fact, there’s even a shorter way:

"*.html|jinja" : {
 "allinputs" : true
 }
 

That tells Dexy to make available all other outputs (except those also marked as ‘allinputs’).

Dexy will run inputs in the correct order (and will raise an error if there are circular dependencies).
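Running inputs "in the correct order" amounts to a topological sort of the dependency graph. Here is a minimal depth-first sketch, assuming dependencies are given as a dictionary mapping each document to the inputs it needs. The run_order name and the data layout are hypothetical and are not taken from Dexy's source.

```python
def run_order(dependencies):
    """Order documents so that every input comes before the documents using it.

    `dependencies` maps each document to the list of inputs it needs.
    Raises ValueError when the dependencies form a cycle.
    """
    order = []
    state = {}  # document -> "visiting" or "done"

    def visit(doc):
        if state.get(doc) == "done":
            return
        if state.get(doc) == "visiting":
            raise ValueError(f"circular dependency involving {doc!r}")
        state[doc] = "visiting"
        for dep in dependencies.get(doc, []):
            visit(dep)
        state[doc] = "done"
        order.append(doc)

    for doc in dependencies:
        visit(doc)
    return order


specs = {
    "tutorial.html|jinja": ["001.c|pyg", "001.c|c"],
    "001.c|pyg": [],
    "001.c|c": [],
}
print(run_order(specs))
# ['001.c|pyg', '001.c|c', 'tutorial.html|jinja']
```

A document that is still marked "visiting" when it is reached again must depend on itself, directly or indirectly, which is how the cycle is detected.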

cache.

Dexy’s caching system makes it easy to include the results of multiple inputs, and it also makes Dexy fast and correct. Fast because filters are only processed when they need to be, and correct because when filters need to be re-run, they are. The caching system stores output in a file whose name is a hashcode calculated based on all vital inputs to the file (plus the file extension). When any of those ingredients change, the hashcode changes, and the next time the file is requested the input text gets run through filters again. If a document depends on other documents, a change in any of these documents will change the hashcode, so the document will be updated. The use of files for caching also makes it very easy to write filters incorporating command line tools which take a filename as input. (I have started to use this caching system in other projects to good effect.)
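The idea of naming a cache file after a hash of its ingredients can be sketched in a few lines. Everything specific here is an assumption: the choice of SHA-256, the exact ingredients (source text, filter names, hashes of inputs), and the cache_filename helper are illustrative and may differ from what Dexy actually hashes.

```python
import hashlib


def cache_filename(source_text, filters, input_hashes, extension):
    """Build a cache file name from everything that affects the output."""
    digest = hashlib.sha256()
    parts = [source_text, "|".join(filters), *sorted(input_hashes)]
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent parts cannot run together
    return digest.hexdigest() + extension


before = cache_filename('printf("hello");', ["c"], [], ".txt")
same = cache_filename('printf("hello");', ["c"], [], ".txt")
after = cache_filename('printf("hello, world");', ["c"], [], ".txt")

print(before == same)   # True: unchanged ingredients reuse the cached file
print(before == after)  # False: an edited source gets a new cache file
```

Because the hashes of a document's inputs feed into its own hash, a change anywhere upstream produces a new file name downstream, so stale output is never looked up by mistake.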

Say Hello

Here are some Hello, World examples to give you a taste of what Dexy can do. The source code for this blog is hosted on bitbucket, so check that out if you want to see the source of this document.

Voice Example

Let’s let Dexy do some talking.

Hi, I’m Dexy. This mp3 was made by the voice filter, I can say dynamic text like {{ d['random.R|rart'] }}.

The dynamic text was generated by running this simple R script:

cat(round(rnorm(1, 100, 10)))

C Example

Here is the source of a hello, world in C:

#include <stdio.h>

int main()
{
  printf("hello, world\n");
}

And here is the outcome of running this:

hello, world

Python Example

And, here’s a hello from Python:

print "hello, world!"

And here is the outcome of running this:


Fancy R Example

And, let’s not leave out R!

Here is the R script before any processing:

cat("hello world :-)\n")


# Let's graph the waveform of the mp3 we made earlier...
require(tuneR, quiet=TRUE)
require(seewave, quiet=TRUE)
w <- readMP3("artifacts/{{ d['filename']['hello.txt|jinja|voice']  }}")
png(file="{{ a.create_input_file('waveform', 'png') }}", width=550, height=550)
spectro(w, osc=TRUE)
dev.off()

You might notice some Jinja tags in there. This R file gets run through the Jinja2 filter before being executed in R, and the create_input_file function generates random filenames. Dexy will keep track of these for us; we just need to know the key we have chosen, in this case “waveform”. This system ensures that we write to a different file every time this code is run, without having to worry about carefully naming files. This is a convenience for an example like this, but it becomes very important when dealing with more complex documents.

In the line above, we also use Jinja tags to insert the name of the mp3 file generated earlier by the voice filter.

Here’s how this looks when it’s actually run in R, after the tags are filled in:

> cat("hello world :-)\n")
hello world :-)
> 
> 
> # Let's graph the waveform of the mp3 we made earlier...
> require(tuneR, quiet=TRUE)
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called 'tuneR'
> require(seewave, quiet=TRUE)
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called 'seewave'
> w <- readMP3("artifacts/531a5d91fb25ba1c2d8b7dd214cb89c2.mp3")
Error: could not find function "readMP3"
Execution halted

And here is the waveform graph (try playing the mp3 file while looking at this):

Dexy Project

If you are interested enough to want to know more about Dexy, you can follow @dexyit on twitter or subscribe to this blog’s RSS feed for updates. The dexy.it website will also be launching very soon. This blog will feature tutorials for using and extending Dexy, examples and case studies demonstrating the benefits of Dexy, and discussions of Dexy’s internals. If you’re in Dublin, I’ll be officially launching Dexy at ossbarcamp in 2 weeks, so come along!

Dexy isn’t ready for prime time yet: the command line tool is very basic and not yet well documented. But if you’re curious and want to poke around, feel free to check out the source at bitbucket.