
Wolfram’s Computable Document Format and Dexy

July 21, 2011

This morning, several people pointed me to Wolfram’s Computable Document Format asking whether it is a direct competitor to Dexy. In short, no. CDF documents created using Mathematica have some superficial resemblances to documents created using Dexy, but there are many significant differences.

The Computable Document Format (CDF) is described as a document format with a published, open specification, although I haven’t been able to actually find this specification as yet. Right now the only way to create CDF documents is by using Mathematica, and the only way to view CDF documents is by using Wolfram’s CDF viewer.

Here is a summary of some of the differences between Mathematica/CDF and Dexy:

Free vs. Commercial

The most obvious difference is that Dexy is free software. For now, in order to create CDF documents, you must have Mathematica. Even after paying your Mathematica license fee, you may need to pay more in order to distribute the CDF documents you create. It’s free to give away your CDF documents (your readers will have to put up with a Wolfram splash screen), but you can’t sell them, or even prevent others from republishing your blog posts, without paying. Also, “Enhanced capabilities, including import/export of external data, arbitrary input fields, dialog windows, and file saving” cost extra. Yep, this solution for “reproducibility” won’t let you export the data (unless Wolfram provides the data) until you pay up.

Choice of Output Formats

Wolfram’s Computable Document Format is a document format. Dexy is a tool which lets you output in basically any document format which can be scripted or compiled. CDF does give you some choice in output formats, in the sense that you can publish CDF documents for viewing on the web (except, for now, on Linux), on the desktop, and on iPad/mobile devices (these are “coming soon”). However, all of these options require your user to download and install new software. Currently, this software is 231 MB in size, big enough that I cannot download it at present as I am travelling.

Right now the only way to author CDF documents is to use Mathematica, and the only way to view CDF documents is to use Wolfram’s viewer. If there were an open source authoring tool to create CDF documents that did not depend on Mathematica code, well, then CDF would simply be another output option for Dexy.

Reusability

One of the things that differentiates Dexy from most other literate programming frameworks is the decoupling of code and documents. With Dexy, your code lives in code files which can be located anywhere, even in-situ in your project. You simply pull code into your document when and where you need it, via filters which transform it in various ways. This means you can write an example script and re-use this script in several places, say a blog post, a tutorial, a research paper and a book. You can just show code, actually run it, or both.
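As a rough illustration of this decoupling, here is a minimal Python sketch. It is not Dexy’s actual API: the `show` and `run` filters, the `render` helper and the `example.py` script are all hypothetical names made up for this example. The point is only that one code file, kept outside any document, can be shown in one place and run in another.

```python
import contextlib
import io
import tempfile
from pathlib import Path


def show(source: str) -> str:
    """Hypothetical filter: present the code as an indented listing."""
    return "\n".join("    " + line for line in source.splitlines())


def run(source: str) -> str:
    """Hypothetical filter: execute the code and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<example>", "exec"), {})
    return buffer.getvalue()


def render(template: str, script: Path) -> str:
    """Fill a document template from a script that lives outside the document."""
    source = script.read_text()
    return template.format(code=show(source), output=run(source))


with tempfile.TemporaryDirectory() as tmp:
    # The example script is an ordinary code file, not part of any document.
    script = Path(tmp) / "example.py"
    script.write_text("total = sum(range(5))\nprint(total)\n")

    # Two different documents reuse the same script in different ways.
    blog_post = render("Here is the script:\n{code}\n", script)
    tutorial = render("Running it prints: {output}", script)

print(blog_post)
print(tutorial)
```

Changing `example.py` once would update both documents the next time they are rendered, which is the kind of re-use an embedded, one-document-only code cell cannot give you.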

Mathematica-created CDFs use the traditional 1:1 paradigm of having you write your code within your document. So your code is embedded and can really only be used in that one place. This may be a little simpler and more intuitive, especially for non-programmers, but it seriously limits what you can do. (I’m sure there are workarounds but this defeats the purpose.)

Interactivity

The CDF format is highly interactive. You can tweak parameters and watch how this changes a graph. That is cool, and may be really useful for getting people interested in your data, but interactivity is not typically compatible with reproducibility.

A nice exception to this is an approach like http://www.commentspace.net/ where you can explore interactive data, snapshot it so others can see how you were viewing it, and comment.

Interactivity without a “save as source” option means that readers are simply consumers. A student interacting with your model using CDF can’t write a report about it; a customer can’t change some specifications and send you back a document saying “I think this is more realistic”. They can take screenshots perhaps, but these are not reproducible.

Documents created using Dexy are static, not interactive (unless you use Dexy to feed data and source code to, say, a Flash app where the results can be manipulated). However, in many circumstances this is preferable. With Dexy you can manipulate source code and see the results quickly, rather than instantly. If you share your sources, anyone can interact with your data and source at a deeper, more meaningful level than just playing around with rotating a 3D model, as fun as that might be. And if you use open source software, then people don’t have to pay someone else a license fee to engage with your content.

Choice

Languages supported by Mathematica: Mathematica

Languages/Systems supported by Dexy (so far)*: Python, R, Ruby, Java, PHP, Erlang, JavaScript, JRuby, Jython, Dot, C, C++, Clojure, Ragel

Formats supported by CDF: CDF

Formats supported by Dexy (so far)*: plain text, markdown, textile, HTML, LaTeX, Asciidoc

*If what you want isn’t listed, get in touch.

Transparency

When you document your scripts with Dexy, you can choose to document the source code underlying the implementations of the functions you call, assuming these are open source. Basing scientific computations on a closed-source interpreter means having to take Mathematica’s word for the correctness of its implementation, and having to rely on Mathematica’s documentation for the specifics of that implementation.
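To make the idea concrete, here is a small Python sketch of documenting a computation together with the source of the function it calls. This uses only Python’s standard `inspect` module and is not how Dexy itself does it; `document_call` is a hypothetical helper, and `statistics.median` is just a convenient open source function to inspect.

```python
import inspect
import statistics


def document_call(func, *args):
    """Hypothetical helper: report a call's result next to the function's source."""
    result = func(*args)
    # getsource works for functions written in Python whose source file is
    # available; it raises TypeError for built-ins implemented in C.
    source = inspect.getsource(func)
    arg_text = ", ".join(repr(arg) for arg in args)
    return f"{func.__name__}({arg_text}) = {result!r}\n\nImplementation:\n\n{source}"


section = document_call(statistics.median, [3, 1, 4, 1, 5])
print(section)
```

A reader of the resulting document can check exactly how the median was computed, rather than relying on a vendor’s description of it.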

It’s Not All Bad

I have been very harsh here on Mathematica/CDF. That is not to say that it isn’t a highly streamlined and convenient package which I’m sure will provide a lot of value to business users. Open source projects can always use more inspiration on how to make data management and reporting a more seamless part of the process. However, for the sake of science and learning, a closed source pay-to-play engine with prohibitively large dependencies is going to make reproducibility and meaningful interactivity worse, not better.

I will absolutely agree that for a non-programmer, and probably even for most programmers, creating a document for the first time in Mathematica is a much faster and smoother process than doing the same in Dexy. Dexy does have a learning curve, although I am always working to make this as gentle as possible. For some people, this convenience will trump all other considerations. However, for reproducibility, flexibility and control over your data and source code, these other considerations are very weighty indeed.

When we eventually get to look at the open specification for CDF, it may turn out to be a useful standalone document format. Once decoupled from Mathematica, it could provide the benefits of interactivity to open source projects, and hopefully enable the development of features to promote iterative feedback and learning. This would depend on the standard itself and also on the attitude of Wolfram towards CDF being a genuinely open standard. If this did happen, Mathematica users would unfortunately probably be left out, as their licensing terms would preclude using these features.

Please note that due to the huge download size I was unable to actually run a CDF viewer, and while I have done my best to be accurate, it is possible that I have some details wrong (since I also have no way of running Mathematica). Please, please let me know in the comments if I have made any inaccurate statements about how Mathematica or CDF works.