MW Scheduling

We’re having some trouble tracking down a bug associated with an IODevice we’ve written for National Instruments hardware. It seems likely that it’s arising from the way we’re handling the scheduling of some function calls. We get a crash pretty reliably after about 60 trials. Each trial makes two scheduler requests (one to sample the digital input periodically and one to give the juice). If we comment out either of these, we crash after about 120 trials. I was hoping you could answer a few quick questions for us.

We have a couple of calls of the form: pollScheduleNode = scheduler->scheduleUS(…); and I’d like to understand better how the MW scheduler works.

  1. If we use this type of call for a one-shot call (e.g., to turn off the juice), do we need to do any clean up for the scheduler?

  2. If we used this for a call that repeats indefinitely, what the correct way to stop and clean up? Is pollScheduleNode->cancel(); pollScheduleNode->kill(); what’s needed? All that’s needed?

  3. IODevice provide stopAllScheduleNodes(), but it looks like that only cancels the scheduled nodes and then clears them. Does cancel free all the node resources, or is the kill also needed? What clears the node resources (i.e. at what point does our node pointer become dangling)?

I’ve attached a copy of the file in case that helps. We’ve also wondered whether we could be getting in trouble by using the wrong sort of shared pointers.

Thanks,
John

Hi John,

Jim’s reply didn’t include your attachment, so I haven’t looked at your code. However, I can give general answers to your questions.

We have a couple of calls of the form: pollScheduleNode = scheduler->scheduleUS(…); and I’d like to understand better how the MW scheduler works.

  1. If we use this type of call for a one-shot call (e.g., to turn off the juice), do we need to do any clean up for the scheduler?

No. If the task runs to completion, it will clean up after itself (i.e. remove itself from the scheduler). Your code is responsible for deallocating the ScheduleTask object, but you don’t need to notify the scheduler in any way.

  1. If we used this for a call that repeats indefinitely, what the correct way to stop and clean up? Is pollScheduleNode->cancel(); pollScheduleNode->kill(); what’s needed? All that’s needed?

If you want to stop a task that’s been scheduled to run indefinitely, the correct method to call is cancel(). This is all that’s needed to stop the task and remove it from the scheduler.

The kill() method is a last resort. It blindly calls pthread_cancel() to blow away the task’s thread, circumventing all normal cleanup and shutdown. It exists so that you can kill a task that’s become wedged (e.g. stuck waiting for I/O, or caught in an infinite loop). Practically speaking, you should never call it.

  1. IODevice provide stopAllScheduleNodes(), but it looks like that only cancels the scheduled nodes and then clears them. Does cancel free all the node resources, or is the kill also needed? What clears the node resources (i.e. at what point does our node pointer become dangling)?

The calls to cancel() stop each task, and the clear() call drops all schedule node references held by the IODevice base class. These references are shared_ptr instances, so if there is no other shared_ptr referencing a given node, then its destructor is called and its resources are released. However, if there are other shared_ptr instances that wrap a given node, then it is not deallocated until all such instances are released (i.e. they go out of scope, or you call their reset() method).

If your code is both adding nodes to IODevice’s schedule_nodes vector and keeping its own pointers to the nodes, then the pointers should be wrapped in shared_ptr instances. That way, even if you call stopAllScheduleNodes(), your pointers will still be valid, and the nodes won’t be deallocated until you release them.

FYI, the right way to add nodes to the schedule_nodes vector is as follows (where “node” is an instance of boost::shared_ptr):

schedule_nodes_lock.lock();
schedule_nodes.push_back(node);
schedule_nodes_lock.unlock();

I’ve attached a copy of the file in case that helps. We’ve also wondered whether we could be getting in trouble by using the wrong sort of shared pointers.

Once I have a copy of your code, I’ll take a look and see if I can spot any issues.

Chris

Hi Chris and Jim,

Thanks for your replies. What you sent was very helpful, although ultimately our real problem seemed to lie in the National Instrument library. Mark switched to a different Lab IO (LabJack) and the problem seems to have gone away.

Thanks,
John

Chris, thanks for the comments on memory management and who should own which object. That is useful.

The kill() method is a last resort. It blindly calls pthread_cancel() to blow away the task’s thread, circumventing all normal cleanup and shutdown. It exists so that you can kill a task that’s become wedged (e.g. stuck waiting for I/O, or caught in an infinite loop). Practically speaking, you should never call it.

Argh. I figured this out myself a few days ago. In particular, kill() prevents any locks from being released.

One more question which we can discuss tomorrow: In some places you use your own class Lockable and in others a boost::mutex. I am locking the LabJack plugin with boost mutexes (principally scoped_lock objects). Will that cause any issues?

NB on National Instruments: The NIdaqMXBase library, which is all that is provided on Mac OS X, appears to have some pointer bug where it blows away parts of memory after ~120 or so calls into the library. Since it’s a binary, we have no way to fix it. Using the LabJack device and its library appears to work great.

Mark

Hi Mark,

One more question which we can discuss tomorrow: In some places you use your own class Lockable and in others a boost::mutex. I am locking the LabJack plugin with boost mutexes (principally scoped_lock objects). Will that cause any issues?

Both the Lockable class and the boost::mutex stuff use pthread mutexes internally, so you should be fine using either.

Looking at the code, I gather that the Lockable class was intended to be a mix-in for adding locking capabilities to other classes. However, it’s not clear to me that it simplifies things much. While it does save you the trouble off having to add a lock instance to your class, it also forces you to deal with multiple inheritance (assuming your class also inherits from some other, more useful base class), in particular how to ensure that all the base class constructors/destructors get called properly. That alone makes me think it’s more trouble than it’s worth.

Unless you see a really good reason to do otherwise, I’d recommend sticking with boost::mutex and friends.