Non-XML experiment format?

From this post:

Improve the MWorks user experience, particularly the process of creating and modifying experiments

(Timeline: ongoing; working prototype of non-XML experiment format by end of April?)

Non-XML experiment format?
Any details on that?

I’m not privy to what you are thinking on this, but my somewhat-superficial view is:
Please use some established scripting language with a well-developed and tested language definition and parser. Ideally Python or Matlab.

Best,
Mark

Hi Mark,

Non-XML experiment format?
Any details on that?

In a nutshell: Many users (though not all) find the editor very frustrating. This leads them to write/edit experiment XML by hand, which is painful and error-prone. If we supported a non-XML (but still text-based) experiment format, these users could edit their experiments in a text editor, without the pain of dealing with XML.

Please use some established scripting language with a well-developed and tested language definition and parser. Ideally Python or Matlab.

Yeah, there’s been some debate about what the experiment syntax should be, in particular whether it should be straight Python or a custom (possibly Python-like) language. There’s still more discussing to be done, so any input you have would be appreciated.

Chris

Agreed with all the above. We in the Maunsell lab all edit XML by hand.

I feel strongly that a switch is not worth it unless it uses, or ideally
embeds, a proven, existing parser/interpreter. My experience with Cortex
bears this out: they implemented a ‘simplified C’ lexer/parser. But real C
compilers have had years to work out all the bugs and corner cases, and the
Cortex parser invariably had annoying, undiagnosable bugs. As we have seen
recently with the MW XML, the last things to be implemented are often the
error messages that help you find bugs. Lablib, from John Maunsell, uses
Objective-C and the Mac compiler and avoided these issues (though it was not
free of other bugs).

A counterargument might be: look at e.g. sendmail/procmail/cron - they all
implement their own config file parser and it works for them. Yes, but
their language is much less complex. MWorks demands arrays, for example,
and also string functions in the XML would be nice. Even though MW appears
to achieve simplicity by needing only a basic state machine, our recent
experience with flow control shows we really need most of the conditional
and looping semantics of a full scripting language.

So I have argued for the use of a full-featured language, and to embed an
existing, debugged implementation of the language. But if you decide that
is infeasible, I would rather keep the XML than switch to a new custom
written parser. You guys have sanded off a lot of the sharp edges, and I
think it will take a long time to get anything new up to the existing
standard.

But you should take what I say with a grain of salt: I ended up being wrong
about the optimal release branch structure and you guys were right, so that
means you should deweight my opinion a bit.

Hey Mark,

[UPDATE: I just saw your response come in while I’ve been writing. I
will respond to it after this]

Thanks for bringing this up. Here’s my current thinking on the
subject. To shortcut ahead to the upshot: while I understand where
you’re coming from, I don’t think using an existing scripting language
(by itself) is a good idea, but I do think there is a path where we
can have our cake and eat it too, using a (very) simple domain-specific
language (DSL). I’m certainly open to outside opinions and arguments,
and obviously everything is open source, so folks are welcome to go in
a different direction if they don’t agree with me.

Here’s the longer discussion:

Some desirables:

  1. something editable in any text editor, something that plays well
    with version control
  2. something lightweight: an experiment should be expressible in very
    few characters with very little extra “line noise” junk
  3. something that fits the multi-threaded, asynchronous nature of
    these experiments without exposing the user to threading issues
  4. it would be nice if non-programmer users had some route whereby
    they can create always-guaranteed-to-be-valid experiments with a
    shallower learning curve
  5. it should be possible to verify that an experiment won’t
    crash/raise prior to setting it in motion
  6. as much as possible, all experiment state should be logged by design

XML arguably fails on #1 and decidedly fails on #2. The editor
enables #4, but yeah, the current editor irritates me too. We haven’t
had enough development bandwidth to do as good a job on it as I would
have liked.

An obvious alternative is to scrap the XML and replace it with a
scripting front-end. Personally, I’d rather eat my own face than do
this in MATLAB, so basically we’re talking about Python. For the
record, I come pretty close to zealotry when it comes to Python, so
bear that in mind in what follows.

Python is great, but it doesn’t excel at everything. A huge
blind spot for Python is multi-threading (#3). While Python includes
threading libraries, CPython has a global interpreter lock that makes
it effectively only run on one processor/core at a time. You can’t
even have more than one interpreter live in the same process. MW
conceptually relies heavily on multithreading, so it would be hard to
imagine replacing much of the core functionality of MW with Python,
unless you’re content to effectively run on one core.

This isn’t a problem, per se, since it is easy to wrap C or C++ code
with Python bindings, and let the C/C++ do the threading behind the
scenes. I’m a big fan of Python as a wrapper language in this
capacity, which is why I started building out the infrastructure for
conduit and analysis bindings in Python.

So why not build experiments in Python using more extensive bindings
like this? This could certainly work, and I am in favor of us
building out first class Python bindings so that you can do this if
you want. I’m not super excited about it, though, for my own use,
because explicit “experiment building” semantics in my eyes fail on #2
(lightweight) above. Actually, in a lot of the realistic experiments
that I’ve mocked up in this kind of syntax, the result isn’t much less
verbose than the XML (even if it is somewhat easier to look at), and
the structure of the experiment isn’t terribly obvious at a casual
glance (it’s easy to come up with contrived simple experiments where
this isn’t evident, but that’s not terribly helpful). Plus, the door
to #4 (use of an editor for non-programmers) is completely closed.
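
To make the verbosity point concrete, here is a rough sketch of what explicit experiment-building code might look like for a single state with three actions and two transitions. Every class name here (Experiment, Protocol, TaskSystem, State, Action, Transition) is a hypothetical stand-in invented for illustration; none of them is an existing MWorks binding.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-ins for Python bindings; none of these exist in MWorks.

@dataclass
class Action:
    kind: str
    args: dict = field(default_factory=dict)

@dataclass
class Transition:
    condition: str
    target: str

@dataclass
class State:
    name: str
    actions: List[Action] = field(default_factory=list)
    transitions: List[Transition] = field(default_factory=list)

@dataclass
class TaskSystem:
    name: str
    states: List[State] = field(default_factory=list)

@dataclass
class Protocol:
    name: str
    task_systems: List[TaskSystem] = field(default_factory=list)

@dataclass
class Experiment:
    name: str
    protocols: List[Protocol] = field(default_factory=list)

def build_experiment() -> Experiment:
    # One state: wait, report, assign, then two possible transitions.
    start = State("Start state")
    start.actions.append(Action("wait", {"duration": "100ms"}))
    start.actions.append(Action("report", {"message": "s[1]"}))
    start.actions.append(
        Action("assignment", {"variable": "x", "value": "4 * (3 + y) * 2"})
    )
    start.transitions.append(Transition("timer_expired(blah)", "State 2"))
    start.transitions.append(Transition("lick_sensor1 > 5", "Initiated"))

    task_system = TaskSystem("My task system", states=[start])
    protocol = Protocol("Test protocol", task_systems=[task_system])
    return Experiment("My experiment", protocols=[protocol])

experiment = build_experiment()
```

Even in this toy case, the nesting of the experiment is spread across a series of appends and constructor calls rather than being visible in the layout of the text.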

There are more aggressive ways of using Python (e.g. putting the
entirety of MW under the control of a Python interpreter) that perhaps
mitigate the #2 argument a bit, but I would argue that these violate
(or potentially violate) #5 and #6 above, and make a potentially huge
mess on the multithreading (#3) front. It’s also not clear to me that
if you go too far down this route that it wouldn’t make more sense to
start with something like VisionEgg anyways (which would be fine, but
that’s a different discussion).

Here’s what I propose as an alternative:

I think a simple DSL to specify experiment structure is the way to go.
This isn’t as scary or as error-prone as it might sound. I’ve
already constructed a simple DSL with a lightweight syntax (and a
working parser) that maps pretty obviously onto the original XML.
The parser is written in just a few hundred lines of Python (using
pyparsing), and it reduces the number of characters needed to specify
an experiment by around 60% relative to XML. It’s also quite a bit
less verbose than any realistic Python-to-build-up-the-experiment type
approaches that I’ve been able to come up with. Here’s a quick
example snippet:

experiment["My experiment"]{

    protocol["Test protocol", randomization="random_with_replacement",
             draw=4]{

        task_system["My task system"]{

            state["Start state"]{
                # actions
                wait(100ms)

                python{
                    print("These can go anywhere an action can.  But use sparingly")
                }

                report(s[1])
                x = 4 * (3 + y) * 2
            } transition {
                timer_expired(blah)  -> "State 2"
                (lick_sensor1 > 5)   -> "Initiated"
            }
        }
    }

}

As you can see, it’s pretty obvious what’s going on here, and there is
some basic syntactic sugar that makes creation of assignment actions,
etc. less cumbersome to specify. Just to be clear, the parser for all
of this already exists and works (and already has much better error
reporting than the current MW parser). There’s a switch to make it
work with “Python-like” significant whitespace syntax, if you prefer;
this is a detail. There are also some fancier features I’ve
implemented on top of this, like template expansion, which I think
could greatly enhance the maintainability of large experiments. I’m
planning to check in a copy of this prototype parser soon so that
others can take a look, kick the tires, and offer feedback.
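
For a sense of how little machinery the nested block structure needs, here is a minimal hand-written recursive-descent sketch in plain Python. It is not the pyparsing-based prototype described above, and it handles only nested kind["label", key=value]{ ... } blocks, not actions or transitions. The Block class and the function names are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Illustrative only: a hand-rolled parser, not the pyparsing prototype.
TOKEN_RE = re.compile(r'''
    (?P<string>"[^"\n]*")
  | (?P<number>\d+)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<punct>[\[\]{}=,])
''', re.VERBOSE)

Token = Tuple[str, str]

def tokenize(text: str) -> List[Token]:
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            line = text.count("\n", 0, pos) + 1
            raise SyntaxError(f"unexpected character {text[pos]!r} on line {line}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

@dataclass
class Block:
    kind: str
    label: str
    params: Dict[str, str] = field(default_factory=dict)
    children: List["Block"] = field(default_factory=list)

class Parser:
    def __init__(self, tokens: List[Token]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Token:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", "")

    def expect(self, kind: str, value: str = None) -> str:
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(
                f"expected {value or kind}, found {tok[1] or 'end of input'!r}"
            )
        self.pos += 1
        return tok[1]

    def parse_block(self) -> Block:
        kind = self.expect("ident")
        self.expect("punct", "[")
        label = self.expect("string").strip('"')
        params = {}
        while self.peek() == ("punct", ","):
            self.pos += 1
            key = self.expect("ident")
            self.expect("punct", "=")
            tok = self.peek()
            if tok[0] not in ("string", "number"):
                raise SyntaxError(f"expected a string or number for {key!r}")
            self.pos += 1
            params[key] = tok[1].strip('"')
        self.expect("punct", "]")
        self.expect("punct", "{")
        children = []
        while self.peek() != ("punct", "}"):
            children.append(self.parse_block())
        self.expect("punct", "}")
        return Block(kind, label, params, children)

def parse(text: str) -> Block:
    parser = Parser(tokenize(text))
    block = parser.parse_block()
    parser.expect("eof")  # reject trailing junk after the top-level block
    return block

source = '''
experiment["My experiment"]{
    protocol["Test protocol", randomization="random_with_replacement", draw=4]{
        task_system["My task system"]{
            state["Start state"]{ }
        }
    }
}
'''
tree = parse(source)
```

A malformed input such as an unclosed brace raises a SyntaxError naming what was expected, which is the sort of error reporting a small dedicated grammar makes cheap to provide.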

I also propose that we include “interpreted code” actions (+ good
bindings) to serve as a stop-gap for anything we don’t anticipate
(that’s what the “python” block in the snippet above is). Python is
the first, obvious candidate here, but I’d also like to see at least
Ruby represented here as well. This lets you get stuff done quickly
if there’s a hole in what MW covers that hasn’t been patched yet, but
it doesn’t turn everyone’s experiments into the wild west.

I think that this approach enables us to satisfy #1-6 – it’s very
terse, it’s easy to see the structure of the experiment, experiments
are verifiable at parse-time, it retains the existing basic structure
of MW (enabling native multi-threadedness), state changes are all done
within the infrastructure of MW so it all gets logged properly, and
you can use a full-featured scripting language in a pinch if you need
to.
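
As one illustration of what parse-time verification could mean in practice, the sketch below checks that every transition points at a state that actually exists. The dictionary layout and the function name find_dangling_transitions are hypothetical, chosen only to keep the example small; they are not part of MWorks.

```python
from typing import Dict, List

def find_dangling_transitions(states: Dict[str, List[str]]) -> List[str]:
    """Return one message per transition whose target state is undefined.

    `states` maps each state name to the names of the states it can
    transition to (a hypothetical, simplified representation).
    """
    errors = []
    for state_name, targets in states.items():
        for target in targets:
            if target not in states:
                errors.append(
                    f'state "{state_name}" transitions to undefined state "{target}"'
                )
    return errors

task_system = {
    "Start state": ["State 2", "Initiated"],
    "State 2": ["Start state"],
}
errors = find_dangling_transitions(task_system)
```

Here "Initiated" is never defined, so the check reports it before the experiment is ever run.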

This approach also offers an interim solution whereby users can start
using the new DSL with the existing infrastructure, since the DSL is
one-to-one compilable into XML, as an option. I’ve also got the
beginnings of a translator that will automatically convert old XML
experiments into the new DSL. Of course, even with the availability
of the new DSL, XML and the editor would/could naturally remain a
one-to-one equivalent path into basically the same parser.
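
A rough sketch of the DSL-to-XML direction, assuming the parsed experiment is available as nested (kind, label, params, children) tuples. That tuple layout is hypothetical, and the element and attribute names, including the use of a "tag" attribute for the block label, are assumptions rather than a confirmed description of the MWorks schema.

```python
import xml.etree.ElementTree as ET

def to_xml(block):
    """Map one parsed block onto one XML element, recursively.

    `block` is a (kind, label, params, children) tuple; this shape is an
    assumption made for the example.
    """
    kind, label, params, children = block
    element = ET.Element(kind, {"tag": label, **params})
    for child in children:
        element.append(to_xml(child))
    return element

tree = ("experiment", "My experiment", {}, [
    ("protocol", "Test protocol", {"randomization": "random_with_replacement"}, [
        ("task_system", "My task system", {}, []),
    ]),
])
xml_text = ET.tostring(to_xml(tree), encoding="unicode")
```

Because each block becomes exactly one element, the reverse translation (XML back into the DSL) is the same walk in the other direction.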

Anyhow, lots of options, and lots of room for debate.

Dave

[FOLLOWING UP, OUT OF ORDER]

Please don’t judge us by the sins of Cortex. :wink:

Seriously, we’ve come a long way since lex and yacc (and I don’t just
mean flex and bison), and there are a lot of nice tools for building
straightforward, robust, comprehensible parsers.

The core strategy of my above message, distilled, is:

  1. create less weighty syntax that still basically maps onto the same
    skeleton as the XML, but makes it so that you don’t want to gouge your
    eyes out while manually editing it. I am confident that we can
    achieve this in a bulletproof, modern, non-Cortex-like way.
  2. integrate “foreign code” block actions as an infinitely-extensible
    stopgap measure.
  3. work on building out “internal” MW infrastructure to make it more
    expressive so that you do not need to fall back on #2 so much.

I think that #3 is an entirely achievable goal, but we’ve been so busy
“sanding down rough edges”, that relatively little time has been
devoted to this. Realistically, there are only a handful of control
flow idioms, and none of them are hard to implement. I think the lack
of a true “for” loop has really confounded people, and that is super
easy to fix.

And even if you don’t believe me, #2 still lets you get done what you
need to get done.

Anyhow, again, very open to debate,
Dave