Python bridge round-trip variable latency

Hi Chris,

In the course of using the Python bridge for state-system processing, I’ve needed to know how long a round trip takes: I set a variable in the XML, respond to it via a callback in the Python bridge by setting another variable, and the XML then acts on that variable.
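
In case it’s useful without digging into the attachment, the Python side is essentially an echo callback, along these lines (conduit methods as in your bridge example; the resource name and the ping/pong variable names are placeholders):

from mworks.conduit import IPCClientConduit  # IPCServerConduit for the server bridge

conduit = IPCClientConduit('python_bridge_plugin_conduit')  # placeholder resource name
conduit.initialize()

def on_ping(event):
    # Echo the new value into 'pong'; the XML watches 'pong', acts on it,
    # and computes the elapsed time from its own timestamps.
    conduit.send_data(conduit.reverse_codec['pong'], event.data)

conduit.register_callback_for_name('ping', on_ping)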

I constructed a self-contained test for this, which I’m attaching. You should be able to run it easily (it’s based on your bridge example).
I find a round trip takes ~300 ms for the client bridge, and either 100 or 200 ms for the server bridge, so the worst cases are 300 ms and 200 ms.
That includes about 75 ms for “conduit.send_data()” on both the client and server bridges.

Is that what you expect?

The example illustrates one more small nit I found: in string variable assignments, quotes are treated as part of the string, whereas everywhere else they are just syntax. For example:
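
Take an assignment along these lines (the variable name here is arbitrary, just to illustrate):

<action type="assignment" variable="myString" value="'a'"/>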

If myString is then read in the Python bridge, its value is three characters long, with the first and last characters being the quotes.

Thanks,
Mark

Whoops, here is the code.


Hi Mark,

I find a round trip takes ~300 ms for the client bridge, and either 100 or 200 ms for the server bridge, so the worst cases are 300 ms and 200 ms.

This took a little while to figure out. These delays reflect the “work cycle” of several MWorks threads that handle the sending, receiving, and processing of events. I’ll spare you the hairy details, but the upshot is that by changing two scheduling intervals in the MWorks source code, I can reduce the round-trip time to ~20ms for both the client and server bridges. (I think the 20ms delay, in turn, is fixed by the work cycle of some other thread(s), but I haven’t determined that conclusively.)

Do you have a sense of what an acceptable round-trip time would be for your purposes?

in string variable assignments quotes are treated as part of the string and everywhere else they are just syntax

This relates to this discussion, so I’ll answer in that thread.

Cheers,
Chris

A 25 ms worst-case round trip time should be fine.

I plan to use a “wait” action for a fixed 25 ms before acting on the output from Python, so I’d be interested in the worst case. I could add another state each time I need to wait for this, but if you expect a non-constant latency, perhaps a new action that waits for a variable to take a value would be useful.

By the way - Dave said a while ago that the server bridge had lower latency - does that mean only for passing events from MWorks to Python (and not the reverse), or is that no longer true?

Thanks,
Mark

Hi Mark,

Any progress on the round-trip bridge times?

No, I haven’t been working on this lately. The code changes I mentioned previously are trivial, but I’ve been hesitant to commit them without some more thought and testing. I’ll try to get to that soon.

Chris

Thanks, I am making some changes soon that would benefit from this. Mark

Hi Chris,

We’d be willing to test a build with these changes if that helps you. I could use it on our human psychophysics rig and then, after a few days, move it into more daily use in our training rigs. If you have other suggestions, let me know.

Mark

cc:ing Bram, who would also use these changes.

Hi Mark,

After experimenting with this some more, I think I’m comfortable with pushing my changes into the nightly build. Do you want them in the task definition file build, too?

I plan to use a “wait” action for a fixed 25 ms before acting on the output from Python, so I’d be interested in the worst case. I could add another state each time I need to wait for this, but if you expect a non-constant latency, perhaps a new action that waits for a variable to take a value would be useful.

After tweaking the timing a bit more, I now get consistent round-trip times of ~20ms with the client bridge and ~10ms with the server bridge. However, I don’t think I can ensure a hard upper limit on the round trip time, so waiting for explicit confirmation that the output from Python has been received would be the more robust approach.

Dave said a while ago that the server bridge had lower latency - does that mean only for passing events from MWorks to Python (and not the reverse), or is that no longer true?

It’s still true, and the latency is lower in both directions. The reason is that events sent over the client-side bridge have to first pass from server to client before going from the client to the Python process, whereas events going over the server-side bridge go directly from the server to Python.

Chris

Hi Chris,
Comments below.

After experimenting with this some more, I think I’m comfortable with pushing my changes into the nightly build. Do you want them in the task definition file build, too?

Great!! I’ll test them in the nightly. I don’t need them in the task definition build for now.

I plan to use a “wait” action for a fixed 25 ms before acting on the output from Python, so I’d be interested in the worst case. I could add another state each time I need to wait for this, but if you expect a non-constant latency, perhaps a new action that waits for a variable to take a value would be useful.

After tweaking the timing a bit more, I now get consistent round-trip times of ~20ms with the client bridge and ~10ms with the server bridge. However, I don’t think I can ensure a hard upper limit on the round trip time, so waiting for explicit confirmation that the output from Python has been received would be the more robust approach.

Great. Can I request a wait_for_condition action? I.e.:
<action type="wait_for_true" condition="python_output_received==1" error_timeout_ms="100" error_message="Python semaphore not high within 100 ms"/>
(If the condition is not true within 100 ms, the task system stops with a console error.)

We can implement this now in the state system, but it requires creating a separate new state for each wait, so this action would be a huge improvement.

In practice, I’m likely to use a fixed 30 ms delay action plus an assert for each wait, instead of a new state, so this action would just make for faster, more precise round trips.
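
For concreteness, that interim pattern in the XML would be something like this sketch (the variable name is a placeholder, and I’m writing the wait duration and assert parameters from memory):

<action type="wait" duration="30ms"/>
<action type="assert" condition="python_output_received == 1" message="Python semaphore not high within 30 ms"/>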

Dave said a while ago that the server bridge had lower latency - does that mean only for passing events from MWorks to Python (and not the reverse), or is that no longer true?

It’s still true, and the latency is lower in both directions. The reason is that events sent over the client-side bridge have to first pass from server to client before going from the client to the Python process, whereas events going over the server-side bridge go directly from the server to Python.

Good, and it’s nice to know the difference quantitatively: 20 ms and 10 ms for round trips.

Thanks again, this is great.
Mark

Hi Chris,

I am a postdoc working together with Mark Histed on a mouse project. I’m thinking about buying a new Mac Pro for my experiment but am not sure whether MWorks would run properly on it. Do you know of any users who have employed MWorks successfully on the new Mac Pro?

Thanks a lot,
Bram


Hi Chris,

After testing for a few days, it appears the reduced-latency changes cause no visible problems.

Do you mind doing a TDF build with the reduced-latency changes? Thank you, Mark

Hi Mark,

After testing for a few days, it appears the reduced-latency changes cause no visible problems. Do you mind doing a TDF build with the reduced-latency changes?

Thanks very much for your testing. I’ll merge these changes into the TDF branch and do a new build.

Can I request a wait_for_condition action?

Sure, no problem. That should be straightforward to implement, so I’ll try to get it done soon and roll it into the TDF build, too.

Cheers,
Chris

Before you spend time on this: we are now seeing bugs with the low-latency build. I will give you a call this afternoon if that’s ok. -M

Before you spend time on this: we are now seeing bugs with the low-latency build. I will give you a call this afternoon if that’s ok.

Sure, that’s fine.

Chris

Hi Chris,

I played around with the client and was able to put together a test case that shows the behavior on my laptop, not just on the rig machine.

1. Load the attached experiment and launch the matching Python script from the Client bridge. You should get “Done setting up…” and then a bunch of variable/event state-change messages.
2. Terminate and reload the Python script repeatedly until it loads but you no longer see any event messages (i.e., no callbacks are being called). Once that happens, clicking the green arrow in the client will start the server, but the client will crash.
3. If you can’t get the bridge into the bad state, try starting and stopping the experiment a few times (you may need to use MWServer->Action->Stop to stop it).

This happens without reloading the experiment about two-thirds of the time on our rig machines.

Also, try uncommenting all the variables in the attached XML; you get a Boost error that seems to arise from too many variables.

There may be other issues we’re experiencing, but I think this is the main one; if you fix this, I’ll test again.

best
Mark


Hi Chris,

I wanted to clarify our limits. We typically have up to 220 variables per experiment. I can see that possibly going as high as 300, but I believe lists/arrays will limit that growth somewhat. I register a callback for each variable in Python that saves its current value for use by other Python code.
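
That registration is essentially the following sketch (conduit methods as in the bridge example; the resource name and variable list are placeholders):

from mworks.conduit import IPCClientConduit

conduit = IPCClientConduit('python_bridge_plugin_conduit')  # placeholder resource name
conduit.initialize()

variable_names = ['var1', 'var2']  # in practice, the ~220 experiment variables
current_values = {}

def make_callback(name):
    def callback(event):
        # Cache the latest value for use by other Python code
        current_values[name] = event.data
    return callback

for name in variable_names:
    conduit.register_callback_for_name(name, make_callback(name))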

Mark

Hi Mark,

Thanks for the test case. Your suspicions were correct: I’ve confirmed, using both your test and my own, that the issue is a big, ugly deadlock. (Note that the reduction in event-handling latency exposed the issue but didn’t cause it.) I haven’t untangled things yet, but I’ll let you know when I do.

Also, try uncommenting all the variables in the attached XML; you get a Boost error that seems to arise from too many variables.

Yeah, this is an old problem. My past testing indicated that you run into this issue once you get past ~500 experiment-defined variables. There’s no reason that would have changed.

Chris

Hi Chris,

Thanks, keep me posted.

Mark