Server crash, latest nightly

Hi Chris,

With the most recent nightly I’m seeing crashes like the one below. Have
you seen anything similar, and can I provide more info?
We are registering callbacks in Python for every variable so we can
maintain the current value of all of them.

Thanks,
Mark

Thread 5 Crashed:
0 org.mworks-project.MWorksCore 0x000000010c7ee970 mw::KeyedEventCallbackPair::getCallback() + 30
1 org.mworks-project.MWorksCore 0x000000010c7eebfc mw::EventCallbackHandler::handleCallbacks(mworks_boost::shared_ptr<mw::Event>) + 426
2 org.mworks-project.MWorksCore 0x000000010c764b1b mw::Server::handleEvent(mworks_boost::shared_ptr<mw::Event>) + 51
3 org.mworks-project.MWorksCore 0x000000010c7784c9 mw::EventListener::service() + 215
4 org.mworks-project.MWorksCore 0x000000010c7783b7 mw::readReader(mworks_boost::shared_ptr<mw::EventListener> const&) + 23
5 org.mworks-project.ZenSchedulerPlugin 0x000000010ff6aa8a mw::zen_scheduler::zenScheduledExecutionThread(void*) + 2032
6 libsystem_pthread.dylib 0x00007fff86b7999d _pthread_body + 131
7 libsystem_pthread.dylib 0x00007fff86b7991a _pthread_start + 168
8 libsystem_pthread.dylib 0x00007fff86b77351 thread_start + 13

Hi Mark,

I just ran some tests with a Python script that watches many variables, and everything worked fine.

Have you recompiled all your custom plugins (client and server) against the new MWorks build? To me, this looks like the seemingly random crashes you get when there are binary compatibility issues, so I’d check that first.

Chris

Hi Chris,

Thanks for checking.
I recompiled all our plugins with the latest nightly and I still see this
crash.

I was able to reduce the problem to a smaller test case (attached). (The
XML file has a lot of variables and could probably be even smaller, but I
didn’t try to reduce it further.) Load the Python script in the
client script bridge. The first time I load it, there is a Client error.
When I unload and reload it, the server crashes on the second load.

Can you reproduce this? I’m using this nightly version: 0.7.dev-20160411
(28c5bfa)

Thanks!
Mark

Attachment: Archive.zip (4.14 KB)

Hi Mark,

Thanks for the test case. It does crash for me.

The problem appears to be that you’re trying to connect a client-side Python bridge script to a server-side conduit. It’s not immediately clear to me why that would be a problem, but I can eliminate the crash by changing the line

conduit_resource_name = 'server_conduit'

in your Python script to

conduit_resource_name = sys.argv[1]
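
For illustration only, here is a minimal sketch of taking the resource name from the command line instead of hard-coding it. The helper name get_conduit_resource_name is hypothetical, and the sketch assumes (based only on the change above) that the bridge passes the conduit resource name as the script's first argument.

```python
import sys


def get_conduit_resource_name(argv):
    """Return the conduit resource name given as the first script argument.

    Hypothetical helper: assumes the Python bridge launches the script with
    the conduit resource name in argv[1], as the one-line change implies.
    """
    if len(argv) < 2:
        # Fail loudly rather than fall back to a hard-coded name such as
        # 'server_conduit', which is what triggered the crash in this thread.
        raise ValueError('expected the conduit resource name as argv[1]')
    return argv[1]
```

In the script itself this would be called as conduit_resource_name = get_conduit_resource_name(sys.argv).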

While it’s certainly an ugly failure mode, I’m not sure I’d call this a bug, as the client and server conduits were never intended to be mixed in that way.

I also was seeing the following error message in the Python window console:

Assertion failed: (! detail::singleton_wrapper< T >::m_is_destroyed), function get_instance, file /Library/Application Support/MWorks/Developer/include/boost/serialization/singleton.hpp, line 131.

However, that seems to be an artifact of the script exiting immediately after registering a callback. If I make it sleep for a while at the end, then it terminates normally.
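
As a rough sketch of that idea, the script can keep one callback per variable to track current values and then block the main thread instead of exiting right after registration. None of this is MWorks API: ValueCache and run_until_stopped are hypothetical names, and the callbacks here take a bare value, whereas the real conduit callback signature may differ.

```python
import threading


class ValueCache(object):
    """Thread-safe store for the most recent value of each variable."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def make_callback(self, name):
        # One callback per variable; each records the latest value it sees.
        def callback(value):
            with self._lock:
                self._values[name] = value
        return callback

    def get(self, name, default=None):
        with self._lock:
            return self._values.get(name, default)


def run_until_stopped(stop_event, timeout=None):
    """Block the calling thread so registered callbacks can keep firing.

    Returns True if stop_event was set, False if the timeout expired.
    """
    return stop_event.wait(timeout)
```

After registering the callbacks, the script would call run_until_stopped(threading.Event()) (or pass a timeout) as its last step, which plays the same role as the sleep described above.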

Chris

Hi Mark,

> While it’s certainly an ugly failure mode, I’m not sure I’d call this a bug, as the client and server conduits were never intended to be mixed in that way.

As we discussed offline, I hadn’t realized that this used to work, which makes me think that it probably is a recently introduced bug. I’ll try to figure out what’s happening and get a fix into the nightly build soon.

Thanks,
Chris

> I hadn’t realized that this used to work, which makes me think that it probably is a recently-introduced bug.

I just tried running your test case under MWorks 0.5.1 and 0.6, and it still crashes, making me think this is not a recently-introduced bug. Are you sure this used to work? If so, can you tell me what versions of MWorks and OS X it worked on?

Thanks,
Chris

Hi Chris,
I know this worked with Mountain Lion with MWorks versions from 2014.

I’m cc’ing Bram and Lindsey here. I know Lindsey was seeing crashes with
this nightly that might be the same problem. Bram, have you seen this
crash when you initialize the server python bridge from the Client
PythonBridgePlugin?

Mark

I don’t think so, I have no python bridge failures.
Let me know if you’d like me to test something.

Bram

We’re not getting Python bridge failures, but when we try to load the XML
the server quits without showing an error.
Lindsey.

Hi Lindsey,

Have you recompiled any custom plugins against the new MWorks build? (I’m mostly thinking of your LabJack plugin, but maybe there are others.) If yes, and the server is still crashing, can you please send me more details about the experiment you’re trying to load? If you can send the whole XML file, that would be great.

Thanks,
Chris

I’ve verified that at least Bram and I have been using the server_conduit from inside the ClientBridgePlugin since about May 2014, first on Mountain Lion, and then on Mavericks and El Capitan.

Is it possible that this is a 10.11.4 issue? That’s the OS X version I made the test case on.

Mark

> Is it possible that this is a 10.11.4 issue? That’s the OS X version I made the testcase on.

That’s the OS X version I tested on, too. There may have been a latent bug in some MWorks code used by the conduits, which has been revealed by recent OS changes.

Also, I forgot to mention that I was able to crash the server by loading Mark’s test case via the server-side conduit, so it seems I was wrong to assume that mixing the client and server conduits caused the problem.

I’m out of the office this week, but I’ll dig into this more when I return.

Thanks,
Chris

Hi Mark,

This issue should be fixed in the current nightly build (i.e. the one available now). The root problem was some rather dubious usage of the C++ standard library: While it looked like it should work correctly (and seemingly did prior to OS X 10.11.4), it was definitely living on the edge with respect to the standard, and I’m still not sure whether this issue represents a longstanding bug in MWorks or a new bug in libc++.

Whatever the case, I replaced the dubious code with something non-dubious, and your test no longer crashes. When you have a chance, please download the new build and see if it fixes the problem for you, too.

Thanks,
Chris

After about one day of testing, this appears to work correctly now.
Thanks!
Mark