
Hello, World

September 12, 2010   ·   6 Comments   ·   Posted in Uncategorized

I’m delighted to announce the imminent launch of Dexy. Dexy is open source software for automating documents and documentation. The goal of Dexy is for you to never manually type runnable code or the results of running that code into your document. If you’re writing a tutorial, the example code and the example output all come from running live code. If you’re writing a scientific paper or lab report, then your graphs, results of statistical analysis, raw data tables and any source code examples which appear in your paper are all fully automated. Using this approach you can put together complex documents with surprising speed. Dexy is powerful open source productivity software for scientists, programmers and anyone else who writes about code or data.

Dexy’s approach helps you produce software documentation quickly and accurately. How many times have you seen mistakes in tutorials, or in RDoc/JavaDoc-style comments right next to the code? Such mistakes are far less likely when the example scripts and example output come from live code. Even more importantly, Dexy helps you maintain documentation over time. I’ve seen many projects with excellent documentation written for the 0.1 release that was never updated to reflect the changes in the 0.3 release, causing frustration and confusion for users. Good documentation is key to attracting and retaining users, and to maximizing the benefits they receive from your software. Dexy makes it easier not just to write documentation, but to write good documentation, in any format you like, and to keep it up to date.

For scientific writing, Dexy helps you compose documents quickly and easily, but also helps you (or others) validate the correctness of those documents by clearly linking sources to the end results. If you’re revisiting work that’s a few years old, having built it with Dexy will ensure that you can update it easily and remind yourself how it was put together. “Show me how you arrived at this number” will be a question you’ll look forward to being asked (or, if you publish your full sources, a question your readers won’t need to ask). Using Dexy (and publishing your full source code) can make it much easier for fellow researchers to reproduce your results or explore your model, which may lead to valuable feedback and discussions, and eventually more papers based on (and citing) your work.

Dexy works for any kind of document. The most common approach taken in the demos is to write prose in a LaTeX or HTML document, and pull in sections of source code, graphs, or output from other scripts using tags. This allows you to write any type of document: tutorials, getting started guides, architecture overviews, literate test output, a personal research journal, blog posts. Since Dexy simply structures the composition of various tools, and it’s very easy to write new Dexy filters, this is just the beginning of what Dexy can do. Dexy is written in Python, which makes it slightly easier to incorporate tools with a native Python interface, but with Python’s pexpect module you can run any command line process as a filter. You can even run several command line processes in sequence, as the C filter does: it first calls the clang/LLVM compiler, then runs the freshly compiled binary.
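To make the idea of wrapping a command line tool concrete, here is a minimal sketch of such a filter. It uses Python’s standard subprocess module rather than pexpect (a deliberate substitution, so the sketch has no third-party dependency), and the function name command_filter and its arguments are hypothetical, not Dexy’s actual filter API. It writes the input text to a temporary file, runs a command on that file, and returns whatever the command printed to STDOUT. A C filter would chain two such calls: one to compile, one to run the binary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def command_filter(input_text: str, command: List[str], ext: str) -> str:
    """Hypothetical filter: run a command line tool on the input text.

    The text is saved to a temporary file ending in `ext`, the file's path is
    appended to `command`, and the tool's STDOUT becomes the filter output.
    """
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / ("input" + ext)
        source.write_text(input_text)
        result = subprocess.run(
            command + [str(source)],
            capture_output=True,  # collect STDOUT/STDERR instead of printing them
            text=True,            # decode the output bytes to str
            check=True,           # raise CalledProcessError on a non-zero exit
        )
    return result.stdout


if __name__ == "__main__":
    # The running Python interpreter stands in for any command line tool.
    print(command_filter('print("hello, world!")', [sys.executable], ".py"), end="")
```

Passing a filename rather than piping text suits tools that only accept files, which is the common case for compilers and typesetters.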

You can accomplish document automation to some degree with shell scripts, rake/make, and other automation tools, and there are existing software tools which support literate documentation, but no approach is as convenient, complete or flexible as Dexy. If you already have a favourite tool, then you don’t need to stop using it to get the benefits of Dexy. If it has a command line interface or it can be scripted, then it can probably work as a Dexy filter.

Feature                        Dexy                    Sphinx          Sweave
Languages                      any language            Python, C/C++   R
Templating                     any templating system   ReST            noweb
Markup                         any markup              ReST            LaTeX
Runnable code
Standalone source code files
Freeform documents
Mix multiple languages
Smart caching

(help me improve this feature table by letting me know if there are other packages I should include, or any other suggestions/corrections)

How Dexy Works

If you prefer, you can skip ahead to the examples.

input. filter. combine.

Dexy works by taking inputs, usually raw text files, and running the text through filters. Filters are arbitrary code which change the input text in some way. Filters can be chained together, so the output of one filter becomes the input to the next filter.
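The chaining idea can be modelled in a few lines of Python. This is a toy sketch, not Dexy’s internal filter classes: each filter is assumed to be a plain function from text to text, and reverse and add_exclamation are made-up filters. Swapping their order changes the result, which is why the order of filters in a chain matters.

```python
from functools import reduce
from typing import Callable, List

# A filter is modelled here as a plain function from text to text.
Filter = Callable[[str], str]


def reverse(text: str) -> str:
    """Toy filter: reverse the characters of the input."""
    return text[::-1]


def add_exclamation(text: str) -> str:
    """Toy filter: append an exclamation mark."""
    return text + "!"


def run_chain(text: str, filters: List[Filter]) -> str:
    """Feed the output of each filter into the next, left to right."""
    return reduce(lambda current, f: f(current), filters, text)


print(run_chain("abc", [reverse, add_exclamation]))  # cba!
print(run_chain("abc", [add_exclamation, reverse]))  # !cba
```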

Filters can change the input text in a subtle way, such as applying syntax highlighting tags to source code. Or filters can take more drastic action, like running code and using the input text as a parameter specification (or ignoring the input text altogether). You can do absolutely anything with a filter.

Taking input text and piping it through filters is hardly new; where Dexy really shines is in the way you specify which filters should be run, and in letting you compose documents with other documents as inputs. Dexy looks for specification files named .dexy, which are JSON-formatted text files that you place in the directory containing files you wish to process, or in any parent directory within the project. Within the .dexy file, you specify individual files or wildcard patterns and use the | symbol to indicate the filters you wish to apply.
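As a rough illustration of how such a specification might be read, the sketch below splits a key on the | symbol and expands its wildcard against a list of project files using Python’s fnmatch module. The function names are hypothetical, and wrapping the entries in a single JSON object is an assumption about the file layout, not a description of Dexy’s actual parser.

```python
import fnmatch
import json
from typing import List, Tuple


def parse_spec_key(key: str) -> Tuple[str, List[str]]:
    """Split "pattern|filter1|filter2" into the pattern and its filter list."""
    pattern, *filters = key.split("|")
    return pattern, filters


def expand_spec(key: str, files: List[str]) -> List[Tuple[str, List[str]]]:
    """Pair every file matching the key's pattern with the key's filters."""
    pattern, filters = parse_spec_key(key)
    return [(name, filters) for name in fnmatch.filter(files, pattern)]


# Assumed .dexy contents: entries like those in this post, wrapped in one JSON object.
config = json.loads('{"*.c|pyg": {}, "001.c|c": {}}')
project_files = ["001.c", "002.c", "tutorial.html"]

for key in config:
    print(key, "->", expand_spec(key, project_files))
```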

So, for example:

"001.c|c"

tells Dexy to look for a file named 001.c and to apply the ‘c’ filter to it. The ‘c’ filter compiles the C code and executes the resulting binary, so the output of this filter is whatever the C programme writes to STDOUT. The specification:

"*.c|c"

tells Dexy to look for every file with a .c extension and run the ‘c’ filter on each one, while

"*.c|pyg"

tells Dexy to look for every file with a .c extension and to apply the ‘pyg’ filter to it (which runs the Pygments syntax highlighter on the input). The ‘pyg’ filter defaults to HTML output; if we want LaTeX instead, we can specify:

"*.c|pyg|l"

where ‘l’ is a filter that passes its input text through unchanged but only accepts .latex files as input, which forces the ‘pyg’ filter to output LaTeX.
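One way to picture how ‘l’ steers ‘pyg’ is as extension negotiation: each filter lists the extensions it accepts and can produce, and a filter picks the first output extension that its successor accepts. The sketch below is a simplified model under that assumption; the extension table is invented for illustration (the .txt output for ‘c’ in particular is made up) and is not Dexy’s real filter metadata.

```python
from typing import Dict, List

# Hypothetical extension metadata; "*" means "accepts any input".
FILTER_EXTENSIONS: Dict[str, Dict[str, List[str]]] = {
    "pyg": {"inputs": ["*"], "outputs": [".html", ".latex"]},
    "l": {"inputs": [".latex"], "outputs": [".latex"]},
    "c": {"inputs": [".c"], "outputs": [".txt"]},
}


def accepts(filter_name: str, ext: str) -> bool:
    allowed = FILTER_EXTENSIONS[filter_name]["inputs"]
    return "*" in allowed or ext in allowed


def resolve_extensions(start_ext: str, filters: List[str]) -> List[str]:
    """Return the output extension chosen for each filter in the chain.

    A filter's first listed output is its default; when another filter
    follows, the first output that the next filter accepts is used instead.
    """
    current = start_ext
    chosen = []
    for i, name in enumerate(filters):
        if not accepts(name, current):
            raise ValueError(f"filter {name!r} does not accept {current} input")
        outputs = FILTER_EXTENSIONS[name]["outputs"]
        if i + 1 < len(filters):
            outputs = [ext for ext in outputs if accepts(filters[i + 1], ext)]
            if not outputs:
                raise ValueError(f"no output of {name!r} suits {filters[i + 1]!r}")
        current = outputs[0]
        chosen.append(current)
    return chosen


print(resolve_extensions(".c", ["pyg"]))       # ['.html']
print(resolve_extensions(".c", ["pyg", "l"]))  # ['.latex', '.latex']
```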

Now, these C files and their output and their syntax highlighting could be useful on their own, but how about this:

"tutorial.html|jinja" : {
   "inputs" : [
     "*.c|pyg",
     "*.c|c"
   ]
 }

That tells Dexy to take the contents of tutorial.html, pipe them through the Jinja templating system, and make the items listed as inputs available to the template. Using Jinja tags, we can then pull the source code, and the output of running that source code, into any part of our HTML document. When you change 001.c, both the syntax-highlighted source and the recorded STDOUT are updated in the tutorial.
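Jinja itself does far more, but the core idea of exposing inputs to a template can be shown with a toy stand-in. This sketch does not use Jinja: it substitutes only the simple {{ d['...'] }} form with a regular expression, and it assumes each input is stored in a dictionary d keyed by file name plus filter chain, mirroring the d['random.R|rart'] tag used later in this post.

```python
import re
from typing import Dict

# Matches tags of the form {{ d['some-key'] }}, with optional spaces.
TAG = re.compile(r"\{\{\s*d\['([^']+)'\]\s*\}\}")


def render(template: str, d: Dict[str, str]) -> str:
    """Replace each {{ d['key'] }} tag with the text stored under that key."""
    return TAG.sub(lambda match: d[match.group(1)], template)


# Hypothetical inputs, keyed by file name plus filter chain.
d = {"001.c|c": "hello, world"}
print(render("<pre>{{ d['001.c|c'] }}</pre>", d))  # <pre>hello, world</pre>
```

A missing key raises KeyError here, which is arguably the behaviour you want: a template that refers to an input nobody declared should fail loudly.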

Prefer to publish a PDF? No problem. Write your tutorial as a .tex file.

"tutorial.tex|jinja|latex" : {
   "inputs" : [
     "*.c|pyg|l",
     "*.c|c"
   ]
 }

The jinja filter will pull in the source code and outputs, and the latex filter will run ‘pdflatex’ and actually create a PDF.

Of course, with wildcards you don’t need to specify this for each document you want to create. You can say:

"*.html|jinja" : {
   "inputs" : [
     "*.c|pyg",
     "*.c|c"
   ]
 }

in the root of your project and this rule will apply in all subdirectories (unless overridden). In fact, there’s even a shorter way:

"*.html|jinja" : {
 "allinputs" : true
 }
 

That tells Dexy to make all other outputs available as inputs (except those also marked as ‘allinputs’).
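The ‘allinputs’ rule can be sketched as follows, assuming documents are held in a dictionary keyed by their specification (file name plus filters) with their options as values. This illustrates the rule described above rather than reproducing Dexy’s code; excluding other ‘allinputs’ documents presumably keeps two such documents from depending on each other.

```python
from typing import Dict, List


def resolve_inputs(docs: Dict[str, dict]) -> Dict[str, List[str]]:
    """Work out each document's inputs from its options.

    Documents marked "allinputs" get every other document except those that
    are also marked "allinputs"; the rest keep their explicit "inputs" list.
    """
    resolved = {}
    for key, options in docs.items():
        if options.get("allinputs"):
            resolved[key] = [
                other
                for other, other_options in docs.items()
                if other != key and not other_options.get("allinputs")
            ]
        else:
            resolved[key] = list(options.get("inputs", []))
    return resolved


docs = {
    "001.c|pyg": {},
    "001.c|c": {},
    "index.html|jinja": {"allinputs": True},
    "about.html|jinja": {"allinputs": True},
}
print(resolve_inputs(docs)["index.html|jinja"])  # ['001.c|pyg', '001.c|c']
```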

Dexy will run inputs in the correct order (and will raise an error if there are circular dependencies).
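Running inputs in dependency order, and refusing circular dependencies, is a topological sort. As an illustrative sketch (not Dexy’s own implementation), Python’s standard graphlib module, available from Python 3.9, does both: static_order yields each document after its inputs and raises CycleError on a cycle. The build_order name and the example documents are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def build_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order documents so that every input comes before the documents using it.

    `dependencies` maps each document to the set of documents it takes as
    inputs. graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


print(build_order({
    "tutorial.html|jinja": {"001.c|c"},
    "001.c|c": {"001.c"},
    "001.c": set(),
}))  # ['001.c', '001.c|c', 'tutorial.html|jinja']

try:
    build_order({"a.html|jinja": {"b.html|jinja"}, "b.html|jinja": {"a.html|jinja"}})
except CycleError as err:
    print("circular dependency:", err.args[1])
```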

cache.

Dexy’s caching system makes it easy to include the results of multiple inputs, and it also makes Dexy fast and correct. Fast because filters are only processed when they need to be, and correct because when filters need to be re-run, they are. The caching system stores output in a file whose name is a hashcode calculated based on all vital inputs to the file (plus the file extension). When any of those ingredients change, the hashcode changes, and the next time the file is requested the input text gets run through filters again. If a document depends on other documents, a change in any of these documents will change the hashcode, so the document will be updated. The use of files for caching also makes it very easy to write filters incorporating command line tools which take a filename as input. (I have started to use this caching system in other projects to good effect.)
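Here is a minimal sketch of such a hash-named cache, under stated assumptions. The ‘vital inputs’ are taken to be the input text, the filter chain and the hashes of any documents it depends on. MD5 is used because the 32-character artifact name in the R transcript later in this post is consistent with an MD5 hex digest, but that is a guess. The function names cache_filename and run_cached are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def cache_filename(input_text: str, filters: List[str], input_hashes: List[str], ext: str) -> str:
    """Name a cache file after a hash of everything assumed to affect its output."""
    ingredients = json.dumps(
        {"text": input_text, "filters": filters, "inputs": input_hashes},
        sort_keys=True,
    )
    return hashlib.md5(ingredients.encode("utf-8")).hexdigest() + ext


def run_cached(input_text: str, filters: List[str], input_hashes: List[str], ext: str,
               run: Callable[[str], str], cache_dir: Path) -> str:
    """Return the cached output if present; otherwise call run() and store its result."""
    path = cache_dir / cache_filename(input_text, filters, input_hashes, ext)
    if not path.exists():
        path.write_text(run(input_text))
    return path.read_text()


calls = []


def shout(text: str) -> str:
    """Toy filter that records each time it actually runs."""
    calls.append(text)
    return text.upper()


with tempfile.TemporaryDirectory() as tmp:
    for _ in range(2):
        print(run_cached("hello", ["shout"], [], ".txt", shout, Path(tmp)))  # HELLO
print(len(calls))  # 1: the second call was served from the cache
```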

Say Hello

Here are some Hello, World examples to give you a taste of what Dexy can do. The source code for this blog is hosted on Bitbucket, so check it out if you want to see the source of this document.

Voice Example

Let’s let Dexy do some talking.

Hi, I’m Dexy. This mp3 was made by the voice filter. I can say dynamic text like {{ d['random.R|rart'] }}.

The dynamic text was generated by running this simple R script:

cat(round(rnorm(1, 100, 10)))

C Example

Here is the source of a hello, world in C:

#include <stdio.h>

int main(void)
{
  printf("hello, world\n");
  return 0;
}

And here is the outcome of running this:

hello, world

Python Example

And, here’s a hello from Python:

print("hello, world!")

And here is the outcome of running this:

hello, world!


Fancy R Example

And, let’s not leave out R!

Here is the R script before any processing:

cat("hello world :-)\n")


# Let's graph the waveform of the mp3 we made earlier...
require(tuneR, quiet=TRUE)
require(seewave, quiet=TRUE)
w <- readMP3("artifacts/{{ d['filename']['hello.txt|jinja|voice']  }}")
png(file="{{ a.create_input_file('waveform', 'png') }}", width=550, height=550)
spectro(w, osc=TRUE)
dev.off()

You might notice some Jinja tags in there. This R file is run through the Jinja2 filter before being executed in R, and the create_input_file function generates random filenames. Dexy keeps track of these for us; we just need to know the key we have chosen, in this case “waveform”. This ensures that we write to a different file every time this code is run, without having to carefully name files ourselves. That is a convenience for a small example like this one, but it becomes very important in more complex documents.
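The bookkeeping behind create_input_file can be pictured with a small sketch. Only the call create_input_file('waveform', 'png') comes from the template above; the class name InputFileRegistry, its files dictionary, and the use of uuid4 to make the names random are assumptions for illustration.

```python
import uuid
from typing import Dict


class InputFileRegistry:
    """Hypothetical stand-in for the object `a` used in the R template."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def create_input_file(self, key: str, ext: str) -> str:
        """Generate a random filename with the given extension and record it under key."""
        filename = f"{uuid.uuid4().hex}.{ext}"
        self.files[key] = filename
        return filename


a = InputFileRegistry()
png_name = a.create_input_file("waveform", "png")
print(a.files["waveform"] == png_name)  # True
```

Because the name is random, every run writes a fresh file, and later steps look the file up by its key rather than by a hard-coded name.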

In the readMP3 line, we also use Jinja tags to insert the name of the mp3 file generated earlier by the voice filter.

Here’s how this looks when it’s actually run in R, after the tags are filled in:

> cat("hello world :-)\n")
hello world :-)
> 
> 
> # Let's graph the waveform of the mp3 we made earlier...
> require(tuneR, quiet=TRUE)
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called 'tuneR'
> require(seewave, quiet=TRUE)
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called 'seewave'
> w <- readMP3("artifacts/531a5d91fb25ba1c2d8b7dd214cb89c2.mp3")
Error: could not find function "readMP3"
Execution halted

And here is the waveform graph (try playing the mp3 file while looking at this):

Dexy Project

If you want to know more about Dexy, you can follow @dexyit on Twitter or subscribe to this blog’s RSS feed for updates. The dexy.it website will also be launching very soon. This blog will feature tutorials for using and extending Dexy, examples and case studies demonstrating its benefits, and discussions of Dexy’s internals. If you’re in Dublin, I’ll be officially launching Dexy at ossbarcamp in two weeks, so come along!

Dexy isn’t ready for prime time yet: the command line tool is very basic and not yet well documented. But if you’re curious and want to poke around, feel free to check out the source on Bitbucket.

6 Comments
  1. admin

    Apologies if these mp3 files autoplay; I believe it’s an issue with Chrome/Opera. I will use a different setup next time.

  2. hi ana, looks very interesting! my first thought is that this blog post/your documentation could use some screen shots. that would be really helpful. second is that it seems like a non-trivial learning curve (perhaps like learning latex or something), but also intriguing and promising.

  3. Hi Ana,

    wow, this looks very, very promising. I will follow the project and hopefully I can find some time to play with this.

  4. Vijayan Padmanabhan

    Hi Ana
    Impressive.. Very Impressive.
    Do you intend having a Windows Version of Dexy available for download as a zip package from CRAN in the near future.. I would be the first to try if that is the case..
    Regards
    Vijayan Padmanabhan

    • admin

      Hi, Vijayan,

      I will let you know when I have Dexy running on Windows. In theory the core should work on Windows, but many of the filters depend on pexpect, which does not run on Windows. My main priority right now is getting the core functionality solid and well documented. In the meantime I am encouraging people, especially Windows users, to download a Linux virtual machine which has Dexy already installed. I’ll be blogging more about this within the next few days.

      Regards,
      Ana

  5. More recent versions of Dexy support Windows, although not all the filters work there.