Python bridge round-trip variable latency

Hi Mark,

I resolved the deadlock. The fix will be in tonight’s nightly build.

Unfortunately, now when I run your test case, I get a bunch of these messages when registering callbacks in the Python script:

ERROR: Send on outgoing queue timed out; event will not be sent

Basically, MWClient is filling the interprocess message queue faster than the Python-side conduit can empty it. Because the queue is fixed size, once it’s full, outgoing events are just dropped.

You can avoid this problem by sleeping after each callback registration. (In my tests, I used time.sleep(0.1), although a shorter sleep interval may work, too.) However, this is kludgy. I think the real fix is to replace boost::interprocess::message_queue with a different interprocess transport (probably ZeroMQ-based). Getting away from a fixed-length, fixed-message-size queue would eliminate this problem as well as the “too many variables” issue.
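
In case it helps, here's a minimal sketch of the workaround. The register_callback_for_name name and the fake conduit are stand-ins for illustration (the real conduit object comes from the MWorks Python package); the only substantive part is the sleep between registrations, which gives the Python-side conduit time to drain the interprocess queue:

```python
import time

REGISTRATION_DELAY = 0.1  # seconds; a shorter interval may work, too

def register_with_delay(conduit, callbacks, delay=REGISTRATION_DELAY):
    """Register each callback, pausing between registrations so the
    Python-side conduit can drain the interprocess message queue."""
    for name, callback in callbacks.items():
        conduit.register_callback_for_name(name, callback)
        time.sleep(delay)

# Stand-in conduit so the sketch is self-contained; use the real
# MWorks conduit object in an actual script.
class _FakeConduit:
    def __init__(self):
        self.registered = []
    def register_callback_for_name(self, name, callback):
        self.registered.append(name)

conduit = _FakeConduit()
register_with_delay(conduit, {"foo": print, "bar": print}, delay=0.01)
```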

That said, I do think the kludgy version will allow you to proceed with further testing. Should I push the deadlock fix into the TDF build and get to work on a wait_for_condition action? Or would you prefer to wait for the “real” fix?

Chris

Chris,

Just to summarize our conversation today:

I will test the new nightly with the delays.

In the next few months, we were thinking about, in order:

  1. Fix the issue with IPC queue overflow by going to a new IPC mechanism
  2. Add a wait_for_condition action
  3. I’d test/roll out the TDF stuff at this point
  4. Add a conduit function to return the current state of a variable, avoiding the need to use Python callbacks on all vars and maintain their state
  5. Long run: Think about Python script actions/embedded interpreter, or MWorks-Server-as-a-Python library.

thanks,
Mark

Hi Chris,

With the registration delays in place, we haven’t seen any problems with the new low-latency build after a few days of testing.
I’ll let you know if anything else comes up.

Mark

[From Mark]

Hi Chris,

I wanted to summarize where I was at with MWorks.

First, the nightlies work fine with the new conduit registration delays. Thanks!

I have switched to using ZeroMQ (instead of a raw socket) for some Python-to-Python communication in our experiment code. If you haven’t implemented this yet, I can give you a few pointers on issues I’ve run into. (Summary: use PAIR sockets, use ZeroMQ 4.0.x, set the receive and send timeouts and the high-water mark, and turn LINGER off. I didn’t use an extra heartbeat socket.)
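
For reference, here's roughly what those settings look like with pyzmq. This is a sketch under my setup's assumptions (modern pyzmq option names; the timeout and high-water-mark values are illustrative, not recommendations), demonstrated over inproc:// so it's self-contained; the same pattern applies to tcp:// between processes:

```python
import zmq

ctx = zmq.Context.instance()

def make_pair_socket(ctx, timeout_ms=1000, hwm=1000):
    """PAIR socket configured per the pointers above: explicit send/receive
    timeouts, a bounded high-water mark, and LINGER off."""
    sock = ctx.socket(zmq.PAIR)
    sock.setsockopt(zmq.RCVTIMEO, timeout_ms)  # recv() raises zmq.Again on timeout
    sock.setsockopt(zmq.SNDTIMEO, timeout_ms)  # send() likewise
    sock.setsockopt(zmq.SNDHWM, hwm)           # bound the outgoing queue
    sock.setsockopt(zmq.RCVHWM, hwm)           # bound the incoming queue
    sock.setsockopt(zmq.LINGER, 0)             # don't block on close with unsent messages
    return sock

a = make_pair_socket(ctx)
b = make_pair_socket(ctx)
a.bind("inproc://demo")      # for inproc, bind must happen before connect
b.connect("inproc://demo")
b.send(b"ping")
msg = a.recv()
a.close()
b.close()
```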

As for my todo list, it now looks like this:

  1. Add a wait_for_condition action
  2. Add capability to edit LISTs in the variables window
  3. I’d test/roll out the TDF stuff at this point

(2 and 3 can be swapped)

Lower priority:

  • destructor-type state called on state system stop
  • fix the issue with IPC queue overflow by going to a new IPC mechanism
  • Add a conduit function to return the current state of a variable, avoiding a need to use Python callbacks on all vars and maintain their state
  • Long run: Think about Python script actions/embedded interpreter, or MWorks-Server-as-a-Python library.

One more comment on the “long run” point: Either of those models works for me. Either way, I think it would be best if MWorks and Python did not run as independent threads, so that a call into Python code would, by default, block until it returned. That would save intro users the need to think about threading, and advanced users could still do multithreading on the Python side via the multiprocessing module. (Right now, to make a blocking call, I set a variable, a Python callback does the work and then sets a ‘finished’ variable, and MWorks waits on the ‘finished’ value. This is an OK solution, but it probably should be easier by default.)
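
The handshake I'm describing can be sketched as follows. This is a simulation, not real MWorks code: a dict stands in for MWorks shared variables, a threading.Event stands in for the ‘finished’ variable, and a thread stands in for the conduit delivering the callback; all names are hypothetical:

```python
import threading

variables = {}                # stand-in for MWorks shared variables
finished = threading.Event()  # stand-in for the 'finished' variable

def python_callback(value):
    """Runs on the Python side when the trigger variable changes."""
    variables["result"] = value * 2  # do the actual work
    finished.set()                   # signal MWorks that we're done

def mworks_side():
    """What the experiment does: set a variable, then wait on 'finished'."""
    # Setting the trigger variable causes the conduit to fire the callback
    # (simulated here with a thread).
    threading.Thread(target=python_callback, args=(21,)).start()
    finished.wait(timeout=1.0)  # MWorks blocks until Python signals

mworks_side()
```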

thanks again!
Mark

Hi Mark,

The wait_for_condition action is done and will be in tonight’s nightly build. Example usage:

<action type="wait_for_condition" tag="Wait until foo is 2" condition="foo == 2" timeout="100ms" timeout_message="Wait timed out (foo = $foo)" stop_on_timeout="YES"></action>

This will wait until either foo == 2 evaluates to true or 100ms elapses. If the wait times out, an error will be generated using the given message (which can include $varname substitutions like report and assert), and the experiment will stop.

The “condition” and “timeout” parameters are required. If “timeout_message” is omitted, a default message is generated. If “stop_on_timeout” is omitted, it defaults to true; set it to NO or 0 if you don’t want the experiment to stop on timeout.

When you get a chance, please try this out and let me know how it works for you.

Thanks,
Chris

Great! I will test this in the next few days.
-M