Image loading limits

In response to a problem reported by Najib (namely, MWServer crashing when trying to load an experiment with a large number of image stimuli), I’ve been investigating MWorks’ limitations with respect to image loading. Here’s a summary of what I’ve found.

First problem: Scarab send buffer limit

Although Najib didn’t have this problem, the first potential limitation is in the Scarab networking code.

When loading an experiment, MWorks sends all required files, including images, over a TCP connection from the client to the server. Internally, Scarab buffers all the data before sending. The buffer size is fixed at 100,000,000 bytes (95.4 MB). If the total on-disk size of all the files in an experiment exceeds this limit, then experiment loading will fail. (Unhelpfully, no error will be reported, and the client will just hang forever in the loading phase.)

As a concrete example, I wrote an experiment that loads a set of 969×684 PNG images, each 789 KB on disk. If I try to use 123 such images (94.8 MB total), the experiment loads fine. If I try to use 124 images (95.6 MB total), then the client hangs indefinitely at “Loading Experiment…”.

This issue was originally discovered by Elias and still needs to be addressed.

Second problem: memory limits (RAM and VRAM)

Assuming you get past the first problem, the next potential issue is running out of memory.

When an image is loaded, it’s first converted into RGBA format, with each color component stored as a 32-bit float (so 16 bytes per pixel). The image is then converted into a mipmapped OpenGL texture. The memory used by each texture isn’t exactly 16 bytes per pixel, but you can use that as a rough guide. Based on some experimentation, it appears that the textures are initially stored in main memory (RAM) and are transferred to video memory (VRAM) when first displayed.
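
As a rough sanity check, consider the 969×684 images described above: 969 × 684 ≈ 663,000 pixels, so at 16 bytes per pixel the base texture alone comes to a bit over 10 MB per image, and a full mipmap chain adds roughly another third on top of that.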

To get some real figures, I ran an experiment that displays 100 of the previously-described 969×684 images. The maximum amount of RAM used by MWServer was 3.95 GB (out of 8 GB total). The main and mirror MWorks displays ran off separate video cards, using about 550 MB of VRAM (out of 1 GB total) on each. Although this experiment did not exceed the limits of RAM and VRAM, it would be easy to construct one that did (say, by loading 300 images instead of 100).

Also, in the test described above, the server ran as a 64-bit process. When running as a 32-bit process, the server crashes while loading the experiment due to a memory allocation failure, around the time it hits 2 GB of virtual memory. Unfortunately, users of the ITC-18 are required to run MWServer in 32-bit mode, as the ITC-18 drivers are available only as 32-bit binaries.

Potential solutions

The Scarab buffer problem is an implementation issue that doesn’t reflect any fundamental limit. It can and should be addressed.

On the other hand, memory limits are fundamental. By running MWServer in 64-bit mode and optimizing MWorks’ image loading code, we can (potentially) increase the maximum number of images that can be loaded at once. However, we should also give users more explicit control over the loading and unloading of image data, so that they can optimize memory usage for their experiment.

Probably the right way to do this is to add “Load Stimulus” and “Unload Stimulus” actions. Combined with deferred loading (already supported), this would allow users to load and unload images as needed.

This is a very helpful list. A few thoughts:

  1. There already is a LoadStimulus action, though there isn’t a corresponding unload action. Aside from memory concerns, these actions are also important for controlling when loading-time delays are incurred. There are currently three flavors of loading: “regular” (load when the experiment is parsed), “deferred” (load when the stimulus is queued), and “explicit” (load only when the LoadStimulus action is executed); see the sketch after these notes. These semantics could stand some revisiting, especially since they aren’t widely deployed or publicized yet, so there’s probably still time to tweak them.

  2. The Scarab buffering problem shouldn’t be too hard to fix (i.e., flush the buffer when it fills up), but we may also want to consider replacing the Scarab networking layer with something more robust and modern. I’ve been using ZeroMQ (zeromq.org) in another context, and it has been very pleasant to use (it also has the advantage of an active community developing it, unlike Scarab, which has been abandoned). We’d obviously want to tread very carefully if we decided to go this way, and any change would need to be transparent to users.

This is a separate issue from Scarab as a serialization scheme. I’m not sure whether we (or users) have the stomach to change that at this point, but there are now some relatively nice self-describing binary serialization libraries (e.g., BSON, which is used by MongoDB) that didn’t exist when we started. They might at least be worth evaluating. Again, user transparency would be key.
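
For reference, here’s roughly how the three loading flavors look at the XML level. The image_file stimulus type, the paths, and the “no”/“yes” attribute values here are from memory and may not be exact; other image parameters are omitted, and only deferred="explicit" matches the usage shown later in this thread.

     <!-- “Regular”: loaded when the experiment is parsed -->
     <stimulus tag="img_regular" type="image_file" path="images/a.png" deferred="no"></stimulus>

     <!-- “Deferred”: loaded when the stimulus is first queued -->
     <stimulus tag="img_deferred" type="image_file" path="images/b.png" deferred="yes"></stimulus>

     <!-- “Explicit”: loaded only when a load_stimulus action is executed -->
     <stimulus tag="img_explicit" type="image_file" path="images/c.png" deferred="explicit"></stimulus>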

I made a couple changes that should help in the area of memory limits.

First, during my investigation, I discovered that MWorks was calling DevIL’s initialization routines every time an image was loaded, when it should call them only once. After fixing this and re-running my 100-image test, the maximum amount of RAM used by MWServer was 2.39 GB instead of 3.95 GB, which is a pretty substantial savings. VRAM usage is unchanged.

Second, I added an unload_stimulus action and implemented an unload method for image stimuli that deletes the OpenGL textures associated with the image and frees the RAM and VRAM they occupied. To make use of this in an experiment, you need to do the following in your XML (a complete sketch follows the list):

  1. Add deferred="explicit" to all image stimuli whose loading you want to control.

  2. To load an image, use the load_stimulus action:

     <action tag="Load Stimulus" type="load_stimulus" stimulus="my_image"></action>
    
  3. When you’re done with an image, unload it with the unload_stimulus action:

     <action tag="Unload Stimulus" type="unload_stimulus" stimulus="my_image"></action>
    

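Putting the pieces together, a minimal sketch of the whole pattern might look like the following. Only the deferred attribute and the load_stimulus/unload_stimulus actions are the new pieces described above; the image_file stimulus type, its size parameters, and the queue/dequeue/display-update actions are shown just for context and reflect my assumptions about a typical experiment.

     <stimulus tag="my_image" type="image_file" path="images/my_image.png"
               x_size="5" y_size="5" deferred="explicit"></stimulus>

     <!-- Load the image shortly before it’s needed -->
     <action tag="Load Stimulus" type="load_stimulus" stimulus="my_image"></action>

     <!-- Display it as usual -->
     <action tag="Queue Stimulus" type="queue_stimulus" stimulus="my_image"></action>
     <action tag="Update Display" type="update_stimulus_display"></action>

     <!-- ... present the stimulus, run the rest of the trial ... -->

     <!-- Remove it from the display, then free its RAM and VRAM -->
     <action tag="Dequeue Stimulus" type="dequeue_stimulus" stimulus="my_image"></action>
     <action tag="Update Display" type="update_stimulus_display"></action>
     <action tag="Unload Stimulus" type="unload_stimulus" stimulus="my_image"></action>
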
Both the bug fix and unload_stimulus support will be in tonight’s nightly build.

I just checked in a simple fix for the Scarab send buffer issue. If the buffer fills up, Scarab now empties it by forcing an immediate send. This should allow experiments to use more than 95.4 MB worth of images.

As a follow-up to this discussion, I ran some tests to assess the delays incurred by loading and unloading images during an experiment.

The attached experiment loads and unloads a single image stimulus 100 times and computes the average time for each operation. I ran it on PNG files of different sizes. (I also tested JPEG versions of the same images and got similar results.) The tests were run on a mid-2010 Mac Pro (2.8 GHz Quad-Core Xeon, 7200-rpm Serial ATA 3Gb/s HDD). Here are the results:

Dimensions   File Size   Average Load Time (ms)   Average Unload Time (ms)
100x100      29 KB       3.5                      0.1
250x250      154 KB      16.0                     0.1
500x500      547 KB      65.8                     0.3
750x750      1.1 MB      217.1                    1.0
1000x1000    1.8 MB      269.5                    1.0

With all image sizes, most of the load time is spent in the DevIL functions ilLoadImage and ilutGLBindMipmaps.

While the exact load times will depend on the images and hardware used, these results suggest that with sufficiently small images, it may be feasible to load only one image at a time, without impacting the timing of the experiment. (Unload times appear to be negligible for all image sizes.)

Attachment: load_time.xml (4.13 KB)