NE-500 pump issue

Hi Chris,

As Davide anticipated, we are experiencing some problems with the new MW version (0.5.1).

To summarize: we are running it on MacMinis with Yosemite (OS 10.10). We are using a Client-Server configuration, with three simultaneously working setups. The three setups have the same MW and OS version as the Client. The pumps (New Era NE-500 Syringe Pump) are connected to the server configuration via a serial-to-ethernet bridge (Startech 1 Port RS-232/422/485 Serial over IP Ethernet Device Server).

Two setups out of three are working. The other one has problems interacting with the pumps. After a few trials (say, 50), the pumps stop responding. I need to restart the server and, sometimes, the MacMini to make them work again. The NE500 plugin we have comes with MW version 0.5.1.
I’ve noticed that MW gives a “timeout connection” error on the server console when the connection with the pumps fails.
We assume this might be a software problem, since both the serial-to-ethernet bridge and the pumps work fine when connected to the other servers. Moreover, we have the same protocol running on the three setups, so it is unlikely that there is a bug in it.

I’m sure I’m missing something, so please do not hesitate to ask me for more details!

Thanks in advance
BW
Rosilari

Hi Rosilari,

If all three Mac minis are running the same versions of OS X, MWorks, and your protocol, and the pump has problems on only one of them, then it seems very unlikely that this is a software issue.

The first thing I’d suggest is to try using a different Ethernet cable to connect the problematic pump. I’ve found that Ethernet cables fail surprisingly often, and this could be the cause of the connection timeouts you’re seeing.

If swapping the cable doesn’t help, then I can borrow Mark’s pump and try to reproduce the issue. To do that, it’d be helpful to have a copy of your protocol. However, I don’t have much confidence that I’ll be able to reproduce the problem, given that you have three identical setups, yet only one manifests the issue. Still, I can give it a shot.

Cheers,
Chris

Hi Chris,

this makes sense.
I’ll test with a new cable and see if it fixes the problem.
However, I ran the protocol today and the pumps worked fine. This makes a software problem even more unlikely, I guess.
I’ll keep you posted!

Thanks
Rosilari

Hi Chris,
I’ve tried to change the ethernet cables, as suggested.
The problem, however, is still there.
Now it happens with all three setups. After a certain (totally random) number of trials, the pumps stop responding.
I’ve noticed that, at the same time, the MW servers also get stuck and give an “OpenGL” error.
If I restart the MacMinis just before training, I have a better chance of collecting more than 100 trials without problems, but it is not guaranteed (for example, today I had to restart them twice and the pumps still got stuck a couple of times).
Any idea what is going on? I have the feeling that the NE500 plugin might be out of date…
Thanks again,
Rosilari

Just a quick update: when the pumps are stuck, MW doesn’t always give an OpenGL error. Most of the time I have to force quit, so I cannot even read the error on the server console. Is there any way to save the log stream in these cases?
Thanks
Rosilari

Hi Chris,

things are kind of running out of control with the pump plugin. The pumps of all three rigs stop delivering in an apparently random and erratic way. I am quite convinced that the issue is the plugin. I believe the latest version of this plugin was made by Dave at least 3 years ago. I think it needs to be revised, especially because we are not the only lab using it (as far as I understand, Mark Histed is also going to need it soon).

Can you take care of this?

BTW, do you have the code of the plugin?

Thanks
Davide

Hi Rosilari,

After a certain (totally random) number of trials, the pumps stop responding. I’ve noticed that, at the same time, the MW servers also get stuck and give an “OpenGL” error.

If MWServer is hanging, then my guess is that some I/O operation is getting stuck. For example, MWorks could be waiting for a response from the pump that never arrives. However, I’ll need to be able to reproduce the issue in order to confirm this. Can you send me a copy of an experiment that’s hitting this problem?

Thanks,
Chris

Hi Davide,

things are kind of running out of control with the pump plugin. The pumps of all three rigs stop delivering in an apparently random and erratic way. I am quite convinced that the issue is the plugin. I believe the latest version of this plugin was made by Dave at least 3 years ago. I think it needs to be revised, especially because we are not the only lab using it (as far as I understand, Mark Histed is also going to need it soon).

Can you take care of this?

Mark is sending me a pump. If I am able to reproduce the issue (again, an example experiment from your lab would be most helpful), I can try to fix it.

BTW, do you have the code of the plugin?

The only version I’ve seen is the one in the main MWorks repository.

Cheers,
Chris

Hi Chris, attached is the script I’m using.
It consists of two protocols. I use both, and both give problems.
Let me know if there’s anything else you need to know!
Thanks
Rosilari

Hi Rosilari,

I now have a NE-500 pump and StarTech serial-over-IP adaptor. I’ve been conducting some tests with your experiment, but so far I haven’t seen the pump stop responding as you have. However, I do have a few more questions for you.

First, I see that whenever the experiment performs an infusion, the pump responds with an “Out of Range” error. Here’s an example from the server console:

00:00:52:  SENT: 00 RAT 100.0 MM
00:00:52:  WARNING: The syringe pump returned an error: Out of Range (OOR) [at line 29]
00:00:52:  RETURNED: 00S?OOR

Looking at the NE-500 manual, I see that the maximum rate for a 15mm-diameter syringe is somewhere around 9-10mL/min, so it makes sense that requesting 100mL/min would give the OOR error. Do you see these messages, too?

Second, I’d like to know more about how your hardware is connected. Specifically:

  • Is the StarTech adaptor connected directly to an Ethernet port on the Mac running MWServer? If not, how are they connected?

  • How many pumps are in the NE-500 network? I see that your experiment uses two connected pumps (IDs 1 and 2). Are there any others in the connection chain?

  • Is there one StarTech adaptor (and attached NE-500 network) per MWServer instance? Or are multiple MWServer processes talking to the same StarTech adaptor (assuming that would even function, which seems unlikely)?

Finally, regarding this:

when the pumps are stuck, MW doesn’t always give an OpenGL error. Most of the time I have to force quit, so I cannot even read the error on the server console. Is there any way to save the log stream in these cases?

One thing you can do is launch MWServer via the command line and redirect MWorks messages to a file. To do this, open Terminal and run the following command:

MWORKS_WRITE_MESSAGES_TO_STDERR=1 /Applications/MWServer.app/Contents/MacOS/MWServer 2>&1 | tee log.txt

That will store all messages in the file log.txt. If you can use this technique to record the messages from a session where the pump stops responding, I’d appreciate it if you could send me the log.
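
If you would rather have a separate, timestamped log per session, the same redirection can be scripted. This is only a rough sketch: the MWServer path and the MWORKS_WRITE_MESSAGES_TO_STDERR variable are the ones from the command above, while run_and_log and the log file naming are hypothetical helpers made up for illustration.

```python
import os
import subprocess
import sys

# Path and environment variable are taken from the shell command above.
MWSERVER = "/Applications/MWServer.app/Contents/MacOS/MWServer"


def run_and_log(cmd, log_path, extra_env=None):
    """Run cmd, echoing its stdout and stderr to the terminal and to log_path."""
    env = dict(os.environ, **(extra_env or {}))
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            cmd,
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            text=True,
            bufsize=1,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the file current in case of a force quit
        return proc.wait()


# Hypothetical usage (requires MWServer to be installed at the path above):
#   import time
#   name = time.strftime("mwserver_%Y%m%d_%H%M%S.log")
#   run_and_log([MWSERVER], name, {"MWORKS_WRITE_MESSAGES_TO_STDERR": "1"})
```

Flushing after every line means the file should still contain the last messages even if MWServer has to be force quit.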

Thanks,
Chris

Thank you very much for your help Chris,

I am forwarding this message to Rosilari, who I am sure will reply to you soon.

Best
davide

Hi Chris,

thanks a lot for the instructions to save the log file! I find it pretty
useful.

About the setup:

Is the StarTech adaptor connected directly to an Ethernet port on
the Mac running MWServer? If not, how are they connected?

the startechs, the macminis (both servers and client) and three
surveillance webcams are all connected via ethernet cables to a Netgear
16 port ethernet switch (JGS516). The Netgear creates a local network
where:

the client communicates with 3 server setups
the client communicates with the 3 cams
each setup communicates with its own StarTech

How many pumps are in the NE-500 network? I see that your experiment
uses two connected pumps (IDs 1 and 2). Are there any others in the
connection chain?

I have two pumps per setup, both directly connected to a startech. So,
for three server setups, there are 3 startechs and 6 pumps.

Is there one StarTech adaptor (and attached NE-500 network) per
MWServer instance? Or are multiple MWServer processes talking to the
same StarTech adaptor (assuming that would even function, which
seems unlikely)?

as said above, there is one startech per server setup.

Finally, responses are collected via a customized sensor connected to an
ADC Phidget (one per server setup) that is directly linked via usb to
the server macmini.
Now, I’ve observed that when the server crashes for some reason (mainly,
an OpenGL problem), these phidgets sometimes get stuck as well. In the past
few days I haven’t had any pump issue, but I’ve had phidget problems and
I don’t think the problem is over. So I reckon that I’ll send you a log
file ASAP.

Hope everything is clear. Don’t hesitate to ask me in case I missed
something.

Thanks in advance

Rosilari

Hi Rosilari,

Thanks for the additional info.

Now, I’ve observed that when the server crashes for some reason (mainly, an OpenGL problem), these phidgets sometimes get stuck as well. In the past few days I haven’t had any pump issue, but I’ve had phidget problems and I don’t think the problem is over. So I reckon that I’ll send you a log file ASAP.

It’s beginning to sound like the pump problems are a side effect of some other issue. The log file will definitely be helpful.

Thanks,
Chris

Hi Chris,
I totally agree!
I’ll send you the log file as soon as some setup crashes again (it is
going to happen soon, I’m afraid).
Thanks again

Rosilari

Hi Chris,
as promised, a couple of log files.

So, today this was the scenario:

  • I run a 1-hour experiment on three servers at the same time
  • I reset the experiment to start a new session with another bunch of rats
  • the three servers seem to reset properly
  • I run the new experiment, load the variables etc. and, as every other time, I test the pumps
  • on one server only, the pumps are not responding (see log file “log_pump_issue.txt”)
  • I reboot the server with the issue, and an OpenGL problem emerges (see “log_OpenGL_issue.txt”)
  • I reboot the macmini and the server finally works

I took a look at the txt files, and I found a “WARNING: Did not receive a complete response from the pump”. Unfortunately, it is not clear why pumps did not respond…

Thanks again for taking a look

Rosilari


Hi Rosilari,

Thanks for the log files.

The first thing I notice is that you are getting the “Out of Range” errors that I mentioned previously, e.g.

SENT: 01 RAT 100.0 MM
WARNING: The syringe pump returned an error: Out of Range (OOR)
RETURNED: 01S?OOR

Based on my (limited) testing, I believe the pump’s response to an invalid rate request is to select some default, valid infusion rate. Maybe that doesn’t matter to you, but it’s probably worth checking.

That aside, it looks like the core issue is that some responses from the pumps are delayed. For example, here’s an excerpt from your first log file:

REWARD! (1886573870)
SENT: 02 DIR INF
WARNING: Did not receive a complete response from the pump
RETURNED: 
SENT: 02 RAT 100.0 MM
WARNING: Did not receive a complete response from the pump
RETURNED: 
SENT: 02 VOL 0.020
WARNING: Did not receive a complete response from the pump
RETURNED: 
SENT: 02 RUN
WARNING: Did not receive a complete response from the pump
RETURNED: 
Give additional licking time (1886990835)

Here, MWorks tries to send four commands to pump 2. Each time, it fails to get a response within the allotted time (100ms), issues a warning, and moves on. Then, a bit later, we see the following:

REWARD! (1889457068)
SENT: 01 DIR INF
RETURNED: 02S02S?OOR
SENT: 01 RAT 100.0 MM
RETURNED: 01S
SENT: 01 VOL 0.020
WARNING: The syringe pump returned an error: Unspecified error (OOR01S)
RETURNED: 01S?OOR01S
SENT: 01 RUN
RETURNED: 01I
Give additional licking time (1889673917)

Here, the first “RETURNED” line contains the (overdue) responses to the first two commands sent to pump 2 in the previous log excerpt. After that, we see the expected responses from the new commands being sent to pump 1 (mashed together in one case, leading to the “Unspecified error” message). However, we don’t see responses for the other two commands sent to pump 2, so either those responses were lost, or the pump never received those commands.
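
To make the "mashed together" case concrete, here is a small sketch of how a RETURNED string could be split into individual replies. The pattern (two-digit pump address, one status letter, optional "?" plus error code) is inferred only from the log excerpts above, so treat it as an assumption rather than the actual NE-500 reply grammar or the plugin's parsing code. The names split_replies and stale_replies are made up for illustration.

```python
import re

# One reply as it appears in the log: two-digit address, one status letter,
# and optionally "?" followed by an error code (e.g. "01S?OOR").
# ASSUMPTION: inferred from the log excerpts only.
REPLY = re.compile(r"(\d{2})([A-Z])(?:\?([A-Z]+))?")


def split_replies(returned):
    """Split a RETURNED string into (address, status, error) tuples."""
    return [(m.group(1), m.group(2), m.group(3)) for m in REPLY.finditer(returned)]


def stale_replies(returned, expected_address):
    """Replies whose address differs from the pump that was just addressed."""
    return [r for r in split_replies(returned) if r[0] != expected_address]


# The overdue pump 2 replies that arrived after a command sent to pump 1:
print(stale_replies("02S02S?OOR", "01"))
# → [('02', 'S', None), ('02', 'S', 'OOR')]

# Two pump 1 replies delivered in a single read:
print(split_replies("01S?OOR01S"))
# → [('01', 'S', 'OOR'), ('01', 'S', None)]
```

Checking the address of each reply against the pump that was just addressed is one way to spot responses that belong to an earlier, timed-out command.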

In this case, it appears that the pumps recovered, as both pumps 1 and 2 exhibit normal command/response behavior afterward. However, the delayed-response issue appears again at the very beginning of the last run in the log file, which is when you noticed the pumps weren’t responding.

The second log file (after reboot) is more puzzling. After some response timeouts, MWorks reports an extremely long response from pump 1:

Wait for triggered trial ... (81705195)
SENT: 01 DIR INF
WARNING: Did not receive a complete response from the pump
RETURNED: 01S001S001S001S001S001S... [this goes on and on]

While I don’t know where all this response data is coming from, a message this long would trigger a buffer-overflow bug (which I just discovered) inside MWorks’ NE500 plugin. This, in turn, would result in memory corruption, which is probably the cause of the weird OpenGL errors you’re seeing.

At this point, there are a few obvious steps we can take to try to eliminate this issue. On my end, I can

  1. fix the buffer-overflow bug and
  2. extend the timeout interval for responses from the pump. (I think one second would be reasonable, but it can be a user-configurable parameter, if you prefer.)
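
In rough terms, the two changes amount to something like the following. This is an illustrative Python sketch, not the plugin's code: read_reply, its parameters, and the use of an ETX byte (0x03) as the end-of-reply marker are all assumptions made for the example.

```python
import socket
import time

ETX = b"\x03"  # ASSUMED end-of-reply marker


def read_reply(sock, timeout_s=0.1, max_len=64):
    """Read until ETX, the deadline, or max_len bytes.

    Returns (data, complete). The read never stores more than max_len
    bytes, so an endless reply cannot overrun the buffer.
    """
    deadline = time.monotonic() + timeout_s
    data = bytearray()
    while len(data) < max_len:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # response timeout reached
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(max_len - len(data))
        except socket.timeout:
            break
        if not chunk:  # peer closed the connection
            break
        data += chunk
        if ETX in data:
            # Bytes after the first ETX are dropped in this sketch.
            end = data.index(ETX) + 1
            return bytes(data[:end]), True
    return bytes(data), False
```

A caller would treat complete == False as an error and could then discard whatever is left on the connection before sending the next command.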

On your end, the next time the pumps stop responding, I recommend shutting down MWServer and restarting the pumps and the StarTech. This should clear out any unsent command responses or other bad state.

Beyond that, I’m wondering if we can ditch the StarTech and switch to a serial-over-USB interface. (For example, I have a FTDI cable that would do the job.) This would eliminate one potential point of failure and simplify your setup a bit. Or is there some reason you need a serial-over-TCP solution?

Thanks,
Chris

Hi Chris,
Thanks for the quick reply!

About the Out Of Range (OOR) error, you write:

Based on my (limited) testing, I believe the pump’s response to an
invalid rate request is to select some default, valid infusion rate.

What I do to set the rate is to assign a value to the “flow_rate” field
in the NE500 channels in my MW protocol. I mean a uL/minute (UL) rate,
but since I do not specify any unit of measure, I think that somewhere
the default unit of measure is set to “mL/minute (MM)”. Of course, a 100
MM rate is too much for our syringes, and this might explain the OOR
error.
This could happen either in the plugin or in the pumps themselves.
However, since no OOR error is raised when I set the rate via terminal
(that is, directly on the pumps), I think the problem might be in the plugin.

About the main pump issue:

That aside, it looks like the core issue is that some responses from
the pumps are delayed.

I agree with you that extending the timeout interval for responses from
the pump might do the job.
However, a one-second delay may be too long for the behavioral protocol,
since in that case the reward couldn’t be immediately associated with the
response. Having a custom timeout interval would surely allow me to test
what the best compromise is.

About the startech solution:

Beyond that, I’m wondering if we can ditch the StarTech and switch to
a serial-over-USB interface. (For example, I have a FTDI cable
that would do the job.) This would eliminate one potential point of
failure and simplify your setup a bit. Or is there some reason you
need a serial-over-TCP solution?

The reason to use a serial-to-ethernet bridge was mainly to have control
from the client to any connected device. However, this is not
fundamental.
Switching to a serial-to-usb interface would need some changes in the
plugin as well. At the moment the plugin asks for an ethernet address,
one for each pump network.

I don’t know how long these changes can take. If this is something that
could be done, I’d be more than happy to help testing and debugging!

Thanks again
Rosilari

Hi Rosilari,

I’ve made some changes to the MWorks NE-500 interface, which will be in tonight’s nightly build.

First, I fixed the buffer-overflow bug, which hopefully will eliminate memory corruption due to pump I/O failures.

Second, per your suggestion, I made the response timeout interval a user-configurable parameter. The new parameter is called response_timeout, and the attached example experiment shows how it’s used. If the parameter is omitted, the old value (100ms) is used.

Third, I made the error handling more robust. When communication with the pump fails, or the pump returns an error code, MWorks now generates an error message (instead of just a warning). Also, if you attempt to set an invalid infusion rate, or initial pump configuration fails for any other reason, then your experiment will fail to load.

Finally, regarding this:

What I do to set the rate is to assign a value to the “flow_rate” field in the NE500 channels in my MW protocol. I mean a uL/minute (UL) rate, but since I do not specify any unit of measure, I think that somewhere the default unit of measure is set to “mL/minute (MM)”. Of course, a 100 MM rate is too much for our syringes, and this might explain the OOR error.

Yes, MWorks always interprets the rate as mL/minute. You can see that in your log:

SENT: 01 RAT 100.0 MM

If you want 100uL/minute, then you need to set flow_rate to 0.1.
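
As a quick sanity check for the conversion, something like the sketch below may help. The function names are hypothetical, and the 9.0 mL/min limit in the example is just the rough figure for a 15mm syringe mentioned earlier in this thread, so check the NE-500 manual for your own syringe.

```python
def ul_to_ml_per_min(rate_ul_per_min):
    """Convert a rate in uL/min to the mL/min value that flow_rate expects."""
    return rate_ul_per_min / 1000.0


def check_flow_rate(rate_ml_per_min, max_ml_per_min):
    """Return the rate if it lies in (0, max_ml_per_min], else raise ValueError."""
    if not 0 < rate_ml_per_min <= max_ml_per_min:
        raise ValueError(
            "flow rate %g mL/min is outside (0, %g]" % (rate_ml_per_min, max_ml_per_min)
        )
    return rate_ml_per_min


# 100 uL/min is 0.1 mL/min, which is within the assumed limit.
flow_rate = check_flow_rate(ul_to_ml_per_min(100), max_ml_per_min=9.0)
print(flow_rate)  # → 0.1
```

Passing 100.0 directly (that is, 100 mL/min) to check_flow_rate with the same limit raises ValueError, which mirrors the OOR response from the pump.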

When you have a chance, please try the new build and see how it works. I’m hopeful that you’ll be able to eliminate the problems you’re having simply by selecting a larger response timeout, but if things do fail, then MWorks should at least do a better job of alerting you.

Cheers,
Chris

Attachment: ne500_demo.xml (2.21 KB)