ehFeedback error

Hi Lindsey,

Apologies for not following up on this sooner.

This looks like a different issue from the one we fixed previously. The old issue led to errors when the plugin attempted to receive a response from the device. Based on the messages written to the server console and stdout, it looks like the new issue is due to a failure to send data to the device. Specifically, a call to LJUSB_Write is failing, which causes ehFeedback to fail, which causes ljU6ReadPorts to fail, which causes readLeverDI to call stopDeviceIO, which ultimately leads to more LJUSB_Write failures and more error messages. It seems like, after the first write failure, all subsequent write attempts also fail.
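
For reference, the failure is detected at the point where the feedback command is written to the device. Going from memory on the exodriver API (LJUSB_Write returning the number of bytes written, with a short or zero return on failure), the check looks roughly like the simplified sketch below; this is not the actual plugin code:

#include "labjackusb.h"
#include <cstdio>

// Simplified sketch of how a write failure is detected. A short or zero
// return from LJUSB_Write means the write failed.
static bool sendFeedbackCommand(HANDLE hDevice, unsigned char *sendBuff, unsigned long sendSize) {
    unsigned long written = LJUSB_Write(hDevice, sendBuff, sendSize);
    if (written < sendSize) {
        // This is the condition behind the "ehFeedback error : write failed"
        // messages; the failure then propagates up through ljU6ReadPorts and
        // readLeverDI, which responds by calling stopDeviceIO.
        std::fprintf(stderr, "ehFeedback error : write failed\n");
        return false;
    }
    return true;
}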

At the moment, I don’t have any theories as to why writes to the device start failing. I’m going to try running the experiments you sent me, in the hope that I can reproduce the problem.

On your end, can you try running the failing experiment with LJ_DEBUG set to 1? To do this, edit line 47 in labjackusb.c to look like this:

#define LJ_DEBUG 1

then rebuild and reinstall your plugin. This will lead to a lot more messages on stderr. When the write failure happens, I’m expecting you’ll see a message like

LJUSB_DoTransfer: Transfer timed out. Returning.

or

libusb error: LIBUSB_ERROR_IO

or some other message starting with “libusb error”. If you can capture those and any subsequent stderr messages and send them to me, they may provide additional insight into what’s going wrong.

Thanks,
Chris

Hi Chris,
So, now I have a crash with a different error and a different experiment.
Here are the logs from the server terminal and the console.
Lindsey.

Attachment: Archive.zip (645 KB)

Hi Lindsey,

Thanks for sending the logs. It looks like the error is once again occurring in readLeverDI. However, there are two differences:

  1. The failure happens in LJUSB_Read, instead of LJUSB_Write (so it’s similar to the old issue in that regard).
  2. The connection seems to recover, because the subsequent attempts to shut down the device don’t produce further errors.

The stderr output contains exactly two error messages, both in response to the same LJUSB_Read failure:

LJUSB_DoTransfer: Transfer timed out. Returning.
ehFeedback error : response buffer is too small

So the read is timing out (after the default timeout interval of 1 second). ehFeedback neither detects nor handles the timeout and just treats it like a read failure (“response buffer is too small”). Given that subsequent attempts to communicate with the hardware succeed, it seems like retrying the read would probably succeed.
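
If it turns out that a single retry is enough, the workaround would look something like the sketch below. This is just the idea, not code that exists in the plugin; I'm going from memory on LJUSB_Read's signature (bytes read on success, 0 on failure), and the retry count is arbitrary:

#include "labjackusb.h"

// Sketch only: retry a read that timed out, on the assumption that the
// device is still healthy and the next attempt will succeed.
static unsigned long readWithRetry(HANDLE hDevice, unsigned char *buff, unsigned long count) {
    const int maxAttempts = 3;  // arbitrary
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
        unsigned long bytesRead = LJUSB_Read(hDevice, buff, count);
        if (bytesRead > 0) {
            return bytesRead;
        }
        // A zero return here is what ehFeedback currently reports as
        // "response buffer is too small"; instead of failing immediately,
        // we just try the read again.
    }
    return 0;  // every attempt failed
}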

None of this explains why the read is timing out in the first place. One second is likely much longer than any normal sort of USB I/O delay, so it seems like something must be going wrong. The old issue was due to multiple, overlapping attempts by the plugin to communicate with the device. I know I went through the plugin code looking for other spots where that might happen, but maybe I missed something. I’ll take another look.

FYI, I ran your VisStimRet experiment for several hours, using a simulated counter input, and I never saw any errors.

Cheers,
Chris

Hi Chris,
Thanks for the update.
I am not surprised you haven’t been able to replicate the error on
VisStimRet yet- it has only happened twice so far. We have now created a
desktop command to open the Server through terminal so I’ll be able to send
you full readouts of future errors.
Do you have any other suggestions moving forward?
Thanks,
Lindsey.

Hi Chris,
I had another crash today. I’ve attached the error logs and the mwel code
for this experiment. It runs like VisStimRet (with pulses needed to drive
transitions between task states).
Lindsey.

Attachment: 220202_Error.zip (14.8 KB)

Hi Lindsey,

Thanks for sending the additional logs. This looks like another occurrence of the new issue (LJUSB_Write failure). Instead of a timeout, the first write failure produces this error message:

libusb error: LIBUSB_ERROR_IO

and the subsequent write failures produce:

libusb error: LIBUSB_ERROR_PIPE

Looking at the libusb source code, LIBUSB_ERROR_IO could indicate a lot of different failure modes, but LIBUSB_ERROR_PIPE mostly seems to indicate that the endpoint has halted (i.e. the pipe is stalled), which probably means the connection is dead, more or less. This supports the idea that communication with the device has failed for some reason, but it doesn't tell us whether that's due to a hardware failure or a software issue.
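
For what it's worth, if the endpoint really is halted, libusb has a standard way to try to recover: libusb_clear_halt(). I'm not suggesting the plugin do this today (the exodriver code owns the libusb handle), but the generic pattern looks roughly like this:

#include <libusb-1.0/libusb.h>

// Generic libusb recovery sketch (not something the plugin currently does):
// if a bulk transfer reports LIBUSB_ERROR_PIPE (endpoint halted), clear the
// halt and retry the transfer once.
static int bulkWriteWithClearHalt(libusb_device_handle *dev, unsigned char endpoint,
                                  unsigned char *data, int length, unsigned int timeoutMs) {
    int transferred = 0;
    int ret = libusb_bulk_transfer(dev, endpoint, data, length, &transferred, timeoutMs);
    if (ret == LIBUSB_ERROR_PIPE) {
        // The endpoint is stalled; try to clear it, then repeat the transfer.
        if (libusb_clear_halt(dev, endpoint) == 0) {
            ret = libusb_bulk_transfer(dev, endpoint, data, length, &transferred, timeoutMs);
        }
    }
    return ret;  // 0 on success, otherwise a LIBUSB_ERROR_* code
}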

I went through your plugin code again today. I found one spot where multiple, overlapping attempts to communicate with the device could occur. However, it was in the initialize() method, which gets called exactly once when the experiment is loaded. It seems unlikely that an error there could lead to a failure much later in your experiment, but I guess it’s possible. I’ll send you a pull request with the fix, but I don’t expect it will fix this issue.
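
To be concrete about the kind of fix I mean: it just serializes access to the device so that two threads can never be in the middle of a write/read exchange at the same time. Here's a simplified sketch; the class and method names are made up, and this isn't the code in the pull request:

#include <mutex>

// Sketch: a single mutex guards every exchange with the U6, so a
// configuration sequence and a scheduled read can never overlap.
class LabJackU6DeviceSketch {
    std::mutex deviceLock;

public:
    bool configure() {
        std::lock_guard<std::mutex> lock(deviceLock);
        // ... configuration writes and reads happen here ...
        return true;
    }

    bool readPorts() {
        std::lock_guard<std::mutex> lock(deviceLock);
        // ... build the feedback command, write it, read the response ...
        return true;
    }
};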

As for what to do next: At this point, I think we have all the info we’re going to get from setting LJ_DEBUG to 1, so you can set it back to 0. If it’s a software issue, and I can reproduce it while running in the debugger, then I might be able to figure out what’s happening. So far, though, I can’t reproduce it at all. Maybe it’s worth sending the terminal error messages to the LabJack folks to see if they have any insight?

Chris

Hi Chris,
I have reverted LJ_DEBUG to 0, and I just pinged LabJack and will see
what they say.
The initial read error was very closely associated with updating to the
most recent version of MWorks. The write errors are more recent, but could
also be associated with the update given that they are much rarer. Is there
anything that changed in MWorks that could explain this?
Lindsey.

At this point, I think we have all the info we’re going to get from setting LJ_DEBUG to 1, so you can set it back to 0.

Actually, before you do that, there’s one more thing you could try. If you launch MWServer from Terminal with the following command:

LIBUSB_DEBUG=3 /Applications/MWServer.app/Contents/MacOS/MWServer

then, in addition to the LabJack debugging messages, libusb errors and warnings should be written to stderr. Maybe you can try running that way until you get another failure, then send me the output (which hopefully will have even more specific info about what failed)?
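
If launching from Terminal ever becomes inconvenient, recent libusb versions can also set the same log level from code. This is just the stock libusb API, not something the plugin does today:

#include <libusb-1.0/libusb.h>

// Equivalent to LIBUSB_DEBUG=3: enable error, warning, and info logging on
// the default context (requires libusb 1.0.22 or later).
static void enableLibusbLogging() {
    libusb_set_option(nullptr, LIBUSB_OPTION_LOG_LEVEL, LIBUSB_LOG_LEVEL_INFO);
}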

Thanks,
Chris

Ok- I can try that.
However- is it possible that the extra overhead of the debugger is
triggering some of these crashes?
Lindsey.

The initial read error was very closely associated with updating to the most recent version of MWorks. The write errors are more recent, but could also be associated with the update given that they are much rarer. Is there anything that changed in MWorks that could explain this?

Hmm. MWorks 0.10 added the DATAPixx device, which also uses libusb. I see that libusb creates some “worker” threads when MWServer launches, even before an experiment is loaded. I guess it’s possible that having two plugins both embedding libusb could lead to conflicts.

If you want to test this, remove the DATAPixx plugin. It’s at

/Library/Application Support/MWorks/Plugins/Core/DATAPixx.bundle

Then restart MWServer.

That's the only connection I can think of between the inner workings of your LabJack plugin and anything else in MWorks.

However- is it possible that the extra overhead of the debugger is triggering some of these crashes?

You aren’t running a debugger. You’re just producing a lot of output on stderr. I can’t imagine how that could cause USB I/O failures.

Chris

Ok- I’ll try deleting the DATAPixx.bundle
Thanks,
Lindsey.

Hi Lindsey,

I submitted a pull request with the changes I mentioned. I haven’t seen any failures while running twoStim_2P_Frames.

Chris

Hi Chris,
Your changes in the pull request seem to have solved the original
ehFeedback error (the read failure).
We’ll see if it also solves the write failure issues (deleting the DATAPixx
bundle did not).
Thanks,
Lindsey.

Hi Chris,
Unfortunately, it did not solve the write error with VisStimRet.
I’ve attached the error logs.
Lindsey.

Attachment: 220206_errors.zip (394 KB)

Hi Lindsey,

Your changes in the pull request seem to have solved the original ehFeedback error (the read failure).

I’m surprised but pleased to hear it!

We’ll see if it also solves the write failure issues (deleting the DATAPixx bundle did not).

OK. If nothing else, it’s good to know that using libusb in multiple plugins isn’t an issue.

Unfortunately, it did not solve the write error with VisStimRet. I've attached the error logs.

Thanks for sending those, as they do include a little new information. Here’s the relevant portion of the stderr log:

Calling LJUSB_SetupTransfer with count = 16 and operation = 0.
Calling LJUSB_DoTransfer with endpoint = 0x1, count = 16, and isBulk = 1.
libusb: warning [darwin_transfer_status] transfer error: device not responding (value = 0xe00002ed)
libusb error: LIBUSB_ERROR_IO
ehFeedback error : write failed
Calling LJUSB_SetupTransfer with count = 26 and operation = 0.
Calling LJUSB_DoTransfer with endpoint = 0x1, count = 26, and isBulk = 1.
libusb: error [submit_bulk_transfer] bulk transfer failed (dir = Out): pipe is stalled (code = 0xe000404f)
libusb error: LIBUSB_ERROR_PIPE
Error : getCalibrationInfo write failed
Calling LJUSB_SetupTransfer with count = 10 and operation = 0.
Calling LJUSB_DoTransfer with endpoint = 0x1, count = 10, and isBulk = 1.
libusb: error [submit_bulk_transfer] bulk transfer failed (dir = Out): pipe is stalled (code = 0xe000404f)
libusb error: LIBUSB_ERROR_PIPE
ehFeedback error : write failed
Calling LJUSB_SetupTransfer with count = 10 and operation = 0.
Calling LJUSB_DoTransfer with endpoint = 0x1, count = 10, and isBulk = 1.
libusb: error [submit_bulk_transfer] bulk transfer failed (dir = Out): pipe is stalled (code = 0xe000404f)
libusb error: LIBUSB_ERROR_PIPE
ehFeedback error : write failed

It looks like the initial write failure produces a message from libusb saying “device not responding”, whereas subsequent attempts generate “pipe is stalled”. This aligns with my previous description of this failure mode: Communication with the device fails for some reason and doesn’t recover.

“Device not responding” certainly makes it sound like USB I/O with the device has broken. I don’t see any reason to think the plugin has done something wrong: Everything appears to be running smoothly, then the connection up and dies. On the other hand, I can think of many possibilities elsewhere, such as

  • Hardware or firmware issue on the U6 itself
  • Faulty USB cable
  • Faulty USB hub (if you’re using one)
  • Bug in libusb
  • Bug in some OS-level USB driver

Unfortunately, I don’t see any evidence to suggest that any of them is the cause, and I still can’t reproduce the issue myself.

Are you seeing this issue with more than one LabJack? If so, that makes a hardware fault seem much less likely (though not impossible). Are you seeing it on multiple Macs? If it only happens on one machine, then it's possible there's a fault in its hardware (or in the macOS version it's running).

Chris

Hi Chris,
We are just seeing this issue on one computer because we only have one
version of this microscope.
LabJack also suggested it might be a hardware issue. I tried swapping out
the USB cable; that didn't help. I just tried moving to a new port on the
USB hub, but will now also try swapping out the hub. Any suggestions for
how to test for bugs in libusb or the driver?
One new observation: today it crashed repeatedly after very few trials
(this is unusual; it normally happens much less often). After one crash we
tried just stopping and starting MWorks without closing out the session,
and it just proceeded as if nothing had happened; maybe that re-initiates
the connection? Also, we then did restart MWorks, and it didn't crash
again (today) after that.
Lindsey.

Hi Lindsey,

Any suggestions for how to test for bugs in libusb or the driver?

Probably the best thing would be to use the U6 with a Windows machine, since that would involve completely different USB driver code and different OS-specific code in libusb. But obviously you can’t run your MWorks experiment under Windows, so that’s not really an option.

Apart from that, someone with deep knowledge of libusb, the macOS USB driver stack, and/or the low-level USB I/O performed by the U6 might be able to assess whether things are working correctly on the software side. But I don’t think I’d be able to figure it out.

After one crash we tried just stopping and starting MWorks without closing out the session, and it just proceeded as if nothing had happened; maybe that re-initiates the connection?

Interesting. Starting the device (i.e. calling startDeviceIO) will close and re-open the USB connection, but only if the first attempt to configure the device fails. If this happens, the MWServer console should show a warning saying:

LJU6 found dead, restarting.  (bug if not on MWServer restart)

Is that what you see?
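
For reference, the restart path boils down to something like the sketch below. It's simplified and not the actual plugin code; LJUSB_CloseDevice, LJUSB_OpenDevice, and U6_PRODUCT_ID are the exodriver names as I remember them, and the device number is 1 for the first U6 on the bus:

#include "labjackusb.h"

// Sketch of the restart path: if the first attempt to configure the device
// fails, the plugin logs the "LJU6 found dead, restarting." warning to the
// MWServer console, closes the stale handle, and re-opens the U6.
static HANDLE reopenU6(HANDLE staleHandle, unsigned int deviceNumber) {
    LJUSB_CloseDevice(staleHandle);
    return LJUSB_OpenDevice(deviceNumber, 0, U6_PRODUCT_ID);
}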

Chris