Wolfram’s Computable Document Format and Dexy

July 21, 2011 · Comments Off · Posted in Uncategorized

This morning, several people pointed me to Wolfram’s Computable Document Format asking whether it is a direct competitor to Dexy. In short, no. CDF documents created using Mathematica have some superficial resemblances to documents created using Dexy, but there are many significant differences.

The Computable Document Format (CDF) is a document format with a published, open specification (apparently, although I haven’t been able to actually find this specification as yet). Right now the only way to create CDF documents is by using Mathematica, and the only way to view CDF documents is by using Wolfram’s CDF viewer.

Here is a summary of some of the differences between Mathematica/CDF and Dexy:

Free vs. Commercial

The most obvious difference is that Dexy is free software. For now, in order to create CDF documents, you must have Mathematica. Even after paying your Mathematica license fee, you may need to pay more in order to distribute the CDF documents you create. It’s free to give away your CDF documents (your readers will have to put up with a Wolfram splash screen), but you can’t sell them, or even prevent others from republishing your blog posts, without paying. Also, “Enhanced capabilities, including import/export of external data, arbitrary input fields, dialog windows, and file saving” cost extra. Yep, this solution for “reproducibility” won’t let you export the data (unless Wolfram provides the data) unless you pay up.

Choice of Output Formats

Wolfram’s Computable Document Format is a document format. Dexy is a tool which lets you output in basically any document format which can be scripted or compiled. CDF does give you some choice in output formats, in the sense that you can publish CDF documents for viewing on the web (except, for now, on Linux), on the desktop, and on iPad/mobile devices (these are “coming soon”). However, all of these options require your user to download and install new software. Currently, this software is 231 MB in size, big enough that I cannot download it at present as I am travelling.

Right now the only way to author CDF documents is to use Mathematica, and the only way to view CDF documents is to use Wolfram’s viewer. If there were an open source authoring tool to create CDF documents that did not depend on Mathematica code, well, then CDF would simply be another output option for Dexy.

Reusability

One of the things that differentiates Dexy from most other literate programming frameworks is the decoupling of code and documents. With Dexy, your code lives in code files which can be located anywhere, even in-situ in your project. You simply pull code into your document when and where you need it, via filters which transform it in various ways. This means you can write an example script and re-use this script in several places, say a blog post, a tutorial, a research paper and a book. You can just show code, actually run it, or both.
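To make the decoupling concrete, here is a minimal sketch in plain Python. It is not Dexy's actual API: the `show` and `run` "filters", the `render` helper, the `example.py` script and the two templates are all hypothetical names invented for illustration. The point is only that one script, kept in its own file, can be shown and run in more than one document.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def show(source: str) -> str:
    """Hypothetical 'show' filter: return the source as an indented listing."""
    return "\n".join("    " + line for line in source.splitlines())


def run(script: Path) -> str:
    """Hypothetical 'run' filter: execute the script and capture what it prints."""
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def render(template: str, script: Path) -> str:
    """Fill a document template from a script that lives outside the document."""
    return template.format(code=show(script.read_text()), output=run(script))


def build_documents(workdir: Path) -> dict:
    """Reuse a single example script in two different documents."""
    script = workdir / "example.py"
    script.write_text("total = sum(range(5))\nprint(total)\n")

    templates = {
        "blog_post": "Here is the script:\n\n{code}\n\nIt prints {output}.",
        "paper": "Listing 1:\n\n{code}\n\nResult: {output}",
    }
    return {name: render(text, script) for name, text in templates.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, document in build_documents(Path(tmp)).items():
            print(f"--- {name} ---\n{document}\n")
```

If the script changes, both documents pick up the new source and the new output the next time they are built, which is the property the 1:1 code-in-document approach lacks.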

Mathematica-created CDFs use the traditional 1:1 paradigm of having you write your code within your document. So your code is embedded and can really only be used in that one place. This may be a little simpler and more intuitive, especially for non-programmers, but it seriously limits what you can do. (I’m sure there are workarounds but this defeats the purpose.)

Interactivity

The CDF format is highly interactive. You can tweak parameters and watch how this changes a graph. That is cool, and may be really useful for getting people interested in your data, but interactivity is not typically compatible with reproducibility.

A nice exception to this is an approach like http://www.commentspace.net/ where you can explore interactive data, snapshot it so others can see how you were viewing it, and comment.

Interactivity without a “save as source” option means that readers are simply consumers. A student interacting with your model using CDF can’t write a report about it, and a customer can’t change some specifications and send you back a document saying “I think this is more realistic”. They can take screenshots perhaps, but these are not reproducible.

Documents created using Dexy are static, not interactive (unless you use Dexy to feed data and source code to, say, a flash app where the results can be manipulated). However, in many circumstances this is preferable. With Dexy you can manipulate source code and see the results quickly, rather than instantly, and if you share your sources anyone can interact with your data and source at a deeper, more meaningful level than just playing around with rotating a 3D model, as fun as that might be. And if you use open source software, then people don’t have to pay someone else a license fee to engage with your content.

Choice

Languages supported by Mathematica: Mathematica

Languages/Systems supported by Dexy (so far)*: Python, R, Ruby, Java, PHP, Erlang, JavaScript, JRuby, Jython, Dot, C, C++, Clojure, Ragel

Formats supported by CDF: CDF

Formats supported by Dexy (so far)*: plain text, markdown, textile, HTML, LaTeX, Asciidoc

*If what you want isn’t listed, get in touch.

Transparency

When you document your scripts with Dexy, you can choose to document the source code underlying the implementations of the functions you call, assuming these are open source. Basing scientific computations on a closed-source interpreter means having to take Mathematica’s word for the correctness of its implementation, and having to rely on Mathematica’s documentation for the specifics of that implementation.

It’s Not All Bad

I have been very harsh here on Mathematica/CDF. That is not to say that it isn’t a highly streamlined and convenient package which I’m sure will provide a lot of value to business users. Open source projects can always use more inspiration on how to make data management and reporting a more seamless part of the process. However, for the sake of science and learning, a closed source pay-to-play engine with prohibitively large dependencies is going to make reproducibility and meaningful interactivity worse, not better.

I will absolutely agree that for a non-programmer, and probably even for most programmers, creating a document for the first time in Mathematica is probably a much faster and smoother process than doing the same in Dexy. Dexy does have a learning curve, although I am always working to make this as gentle as possible. For some people, this convenience will trump all other considerations. However, for reproducibility, flexibility and control over your data and source code, these other considerations are very weighty indeed.

When we eventually get to look at the open specification for CDF, it may turn out to be a useful standalone document format which, once decoupled from Mathematica, could provide the benefits of interactivity to open source projects. Hopefully this would enable the development of features to promote iterative feedback and learning. This would depend on the standard itself and also on Wolfram’s attitude towards CDF being a genuinely open standard. If this did happen, unfortunately Mathematica users would probably be left out, as their licensing terms would preclude using these features.

Please note that due to the huge download size I was unable to actually run a CDF viewer, and while I have done my best to be accurate, it is possible that I have some details wrong (since I also have no way of running Mathematica). Please, please let me know in the comments if I have made any inaccurate statements about how Mathematica or CDF works.

Lights, Camera, Stack Trace

May 24, 2011 · 4 Comments · Posted in Uncategorized

I have been setting up a new computer (yay!) which means installing a bunch of software (boo!). It’s actually been a while, at least in software years, since I did this, so I’m enjoying some pleasant surprises (like Homebrew, and having no reason to need MySQL yet) along with the unpleasant bumps along the way.

I’ve also been chatting with several new and prospective Dexy users, and trying to get a feel for what issues people encounter when they try to install and run Dexy and start working through the tutorials. It’s great when I can provide just a few small tips to help people get started, but of course I have no idea how many other people run into those small little roadblocks and decide “oh, I don’t have time to troubleshoot this now, I’ll come back to this later” or even “that error looks scary, I’m never running this program again”. I do use feedback to refine tutorials and improve error messages, but I know I’m not seeing all the potential problems.

On the other side of this, I had an incredibly frustrating Sunday with a not-quite-polished web application, and a particularly frustrating afternoon today trying to figure out how to set up my EC2 environment for command line use. In both cases I really wanted to provide constructive feedback on my experiences so the app and documentation (respectively) could be improved, but by the time I had solved the issues in question I realized that (a) I really didn’t want to spend any more time on them and (b) I probably couldn’t remember and reconstruct all the issues I ran into, so it wouldn’t have been a terribly useful exercise for me to provide feedback, without a lot of work (GOTO a).

However, out of frustration came an idea. While last week I was toying with the idea of suggesting that prospective Dexy users arrange a time and Skype me when they are first attempting to use Dexy so I can provide real-time assistance, now I have refined this to the following idea which I am going to put into practice:

Whenever you are about to install or upgrade software, use new or recently upgraded software, or try to do something unfamiliar, or just any time your spidey senses are tingling, start a screen recorder. Preferably one that catches sound and, ideally, webcam video. Use the audio to explain what you are doing as you do it. When you are finished, if all has gone well, then you can just delete the recording. If all has not gone well, then you can watch the recording, note down the times at which important things happen, and email the developers with your documented tale of woe.

Will this be practical? I have no idea. Will this backfire with users not bothering to do as much troubleshooting as they otherwise might? Not sure about that either.

I know that I, as a developer, want to know about people having bad experiences with my software. Many of these experiences will have nothing to do with my software, but I still want to know about them.

I know that I, as a user, want to be able to express my frustration when things go wrong to the creators of software. I want to do this in a constructive way, but part of me also wants to convey the very real cost that poor design or poor documentation imposes. I also want to be able to easily illustrate ways in which software could be made more beneficial, and to be able to provide accurate and useful feedback when I encounter an issue.

So, I personally will be doing this screen recording experiment. And, if you are a Dexy user or are about to try Dexy, I encourage you to do it too. There’s no need to send me links to videos where everything went well (unless you’ve done something very cool with Dexy and want to tell me about it – then please do!) but I encourage you to send me links to videos that illustrate a problem or roadblock, or to get in touch if you have suggestions for improving the tutorials or other documentation. Preferably, include an explanatory email telling me what you were attempting to do (if it’s not clear from the voiceover); I’d also really appreciate timestamps where important things happen. I can’t promise to watch everything, but I will do what I can. And, please use this as a supplement to, rather than a replacement for, the usual information you submit with a support request.

I have started a list of screen recording software with an emphasis on free, multi-platform software. Please feel free to make additional suggestions in the comments. I suggest you do a trial run to ensure that the software doesn’t place an undue load on your machine, and that your console screen is actually readable (a text-based, searchable console transcript is probably a good idea anyway – easy if you’re using GNU Screen). A console transcript is also an option if you don’t want to record a video, and you can delete any sensitive content this way too.
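If you don't have a terminal logger to hand, a text transcript with elapsed-time stamps can be sketched in a few lines of Python. This is only an illustration, not part of Dexy or any screen recorder; the `record` and `stamp` helpers and the `transcript.txt` file name are made up for this example.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def stamp(start: float) -> str:
    """Format the seconds elapsed since `start`, e.g. '[   1.3s]'."""
    return f"[{time.monotonic() - start:6.1f}s]"


def record(command: list, transcript: Path) -> int:
    """Run a command, echo its output, and append timestamped lines to a transcript."""
    start = time.monotonic()
    with transcript.open("a", encoding="utf-8") as log:
        log.write(f"{stamp(start)} $ {' '.join(command)}\n")
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error messages in the same transcript
            text=True,
        )
        for line in process.stdout:
            sys.stdout.write(line)  # still show the output on screen
            log.write(f"{stamp(start)} {line.rstrip()}\n")
        status = process.wait()
        log.write(f"{stamp(start)} exit status {status}\n")
    return status


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "transcript.txt"
        record([sys.executable, "--version"], path)
        print(path.read_text(encoding="utf-8"))
```

Because the transcript is plain text, you can search it for the moment things went wrong and edit out anything sensitive before sending it on.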

If you upload your screencast to a hosted service, take note of their Ts and Cs in relation to the ownership of your upload. Take care that your screencasts don’t include any sensitive information like passwords or API keys, or any dodgy browser history that you don’t want the world to see. I certainly won’t rebroadcast anybody’s screencast without their explicit consent, or mention anybody by name [assuming recordings are in good faith, that is], but I will take note of the OS and software you are using in order to better tailor training materials, and I may discuss these in an anonymized way like “I notice a number of people who submitted screencasts to me weren’t using GNU Screen, let me tell you what a great utility this is.” If that bothers you, then, well, probably best not to send me anything since it will be hard for me to unsee. If you are happy for me to publish your screencast on the Dexy blog or elsewhere, then let me know and this will make things quicker for me in the event that I want to share some examples in future. If you give me suggestions for documentation improvements, I will assume that you want me to go ahead and make use of these.

SciBarCamb

April 9, 2011 · Comments Off · Posted in Uncategorized

I’m attending SciBarCamb in beautiful, sunny Cambridge. Here are my slides on Scientific Data Automation using Dexy: dexy-scibarcambtalk

Dexy Business Cards

March 19, 2011 · Comments Off · Posted in Uncategorized

If you met me at a conference recently, you probably got one of my very dexy business cards. I love the idea of business cards which are useful in some way, and these are business cards which are also mini demos of Dexy, not to mention the awesomeness of having business cards with source code!

I’m down to the very last card of my first order from Moo (except for the 4 I’m keeping for myself – 1 of each design), so as I’m about to order more, it occurred to me that I haven’t actually blogged about them yet. To make up for this omission, I made the first Dexy screencast talking about how the cards are made.

Read more…

Riak with Dexy

March 18, 2011 · 1 Comment · Posted in Uncategorized

I have recently been playing with Riak, a key-value store database inspired by Dynamo, and in particular with its map-reduce functionality. This gave me a nice excuse to write this blog post, which describes the Python interface to Riak and also demonstrates how Dexy can be used to run and document map-reduce jobs and to make use of the returned data.

Read more…