How to load a randomized set of images in batches

Hey Dave,

Najib wants to write an experiment to do the following:

  1. Create a stimulus group containing 3000 image stimuli, each of which uses deferred="explicit".
  2. Randomize the order of the images in the group.
  3. Present the images in three batches of 1000. In order to avoid running out of memory, each batch will be loaded, queued, displayed, and unloaded before moving on to the next batch.

We’ve been trying to figure out how to accomplish this in an MWorks experiment, and we haven’t had much success.

It’s easy enough to give the image files sequential names (e.g. img1.png, img2.png, etc.), create the stimulus group using a range replicator, and select images randomly via the “Selection and Randomization” parameters of a block, trial, or list. However, we can’t figure out how to use subsets of the randomized images while also ensuring that each image is used only once.

Do you see a way to do this using MWorks’ current capabilities? If not, I guess we’ll have to think about features we can add to make this possible. One possibility is adding a “Shuffle Stimulus Group” action, which randomizes the order of elements of a stimulus group. Another option might be to make selection variables indexable as discussed here. Any other ideas?

Thanks,
Chris

Hey,

I’m not sure that I’m fully understanding the problem. Could you possibly do one of the following?

  1. Define a block inside a replicator that indexes an offset parameter (e.g. “batch_number”). The stimulus groups would then be indexed using an expression (e.g. stimulus_group[batch_number*1000 + i]). I think that this is permitted in the current framework (and if it isn’t, it should be).

  2. Define three blocks (that sample their contents randomly) that each contain a replicator indexed on a local variable “i”, where i goes from 0 to 999 in block 1, 1000 to 1999 in block 2, and so on. Less desirable due to duplication of block contents, but potentially manageable. The parser has some ability to interpret references/aliases to paradigm components, though this has never been exposed in the editor, and not much attention has been paid lately to validating that this feature still works.

  3. Create three stimulus groups, one for each batch. Then create three blocks (whose parent samples sequentially), each of which samples a different group. This is a variant on #2, though one could create a (somewhat clumsy) if-then structure to decide which stimulus group to load / queue from.
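
For concreteness, here is the up-front partitioning that all three options share, sketched in plain Python standing in for the MWorks constructs (the function name and the 3000/1000 sizes are just illustrative):

```python
# Sketch of the partitioning shared by options 1-3: batch k always
# covers the fixed index slice [k*1000, (k+1)*1000), and only the
# presentation order *within* each slice is randomized.
import random

def fixed_batches(n_images=3000, batch_size=1000, seed=None):
    rng = random.Random(seed)
    batches = []
    for batch_number in range(n_images // batch_size):
        # The batch is always the same fixed slice of the stimulus group...
        batch = list(range(batch_number * batch_size,
                           (batch_number + 1) * batch_size))
        # ...and only the order within that slice gets shuffled.
        rng.shuffle(batch)
        batches.append(batch)
    return batches
```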

I’m sure there are other ways in addition; these are just the ones that jump to mind. Is there some angle to this that I’m disregarding?

– Dave

Hi Dave,

The problem arises because I can’t load 3000 images at once due to memory constraints, so the answer is to load the images 1000 at a time. But the replicator will only go through items sequentially. This means that, as you suggested, I will always load the first 1000 images first, and while I can randomize within them, they will always be first. And as far as I know, there is no way to create a list of 3000 random (shuffled) numbers and access that list 1000 items at a time in order.

In the past we have used a selection variable as a way to randomize images. However, once you access the selection variable (i.e. for loading an image in a replicator), you can’t access it again for displaying the images. At least I didn’t think so, but I might be mistaken.
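
What Najib describes reduces to a few lines of plain Python (not MWorks code): shuffle all 3000 indices once, then walk the shuffled list 1000 at a time.

```python
# The desired behavior: shuffle all indices once, then take them
# batch_size at a time. Every batch is a random, non-overlapping subset,
# and no image repeats. This is exactly the "list of 3000 shuffled
# numbers accessed 1000 items at a time" that MWorks currently has no
# construct for.
import random

def shuffled_batches(n_images=3000, batch_size=1000, seed=None):
    order = list(range(n_images))
    random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n_images, batch_size)]
```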

Finally, there might be a way to do it if one can index using an expression. I will have to think about it more, but I don’t know whether we can actually index using an expression as of now.

Best
Najib

OK - I think I see what you mean now: you want to load three batches
of 1000 images where the partitioning of the random selections into
each batch is unique every time (e.g. batch 1 might be [1,56,99,104,…
1999] on one run, and a completely different set of 1000 on another
run). But you also want the batches to be non-overlapping. The
solutions I proposed only work if you are okay with an up-front
partitioning of the images into batches. Gotcha.

OK, so yes, the fundamental limitation here is that you need to run
through a randomized selection 1000 times, and then run through the
exact same sequence again at some later time. You can reset a
selection object in MW, but it won’t run through the same sequence
again.

There are a couple of mechanisms that I could imagine adding that
could solve this issue.

One way would be to make array data types available to users; this is
already on our list of desired features. This way, you could load a
random sampling of stimuli by whatever means you see fit, and then
store the indices in an array (e.g. with a variable assignment). This
would require modification of the expression parser, variable
assignment action, etc.

The other alternative, as Chris suggests, would be to have selection
variables be indexable themselves, e.g. my_selection_variable[4] means
“the fourth thing selected on this variable since it was created or
last reset”. I think that this one would be easiest to implement,
though we’d need to be very careful in documenting the behavior of
such a thing, since I could imagine a variety of different behaviors
that one might expect such indexing to have.

An easier option still would be to add a “generate identical
randomization after reject selection” option to the selection object.
The way this would work would be that the random number generator
would be reseeded after reset/reject/accept selection actions, and
that seed would be stored and restored as needed. So you’d draw 1000
times from the selection to load the stimuli, issue a
reject_selections action (rolling back the “clock” by 1000
selections), and then you’d get back the same values for the next 1000
draws (because you’d be generating random numbers with the same seed).
You would want this to be OFF by default, possibly with scary
warnings next to the check box in the editor, since this is often
explicitly not the behavior you’d want.
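
The reseed-and-replay idea might be modeled roughly like this. This is a Python sketch of the proposed semantics, not MWorks code; the class and method names are made up, and draws here are with replacement for simplicity (a real selection variable would also track its without-replacement pool):

```python
# Rough model of "generate identical randomization after reject
# selection": remember the RNG seed at each accept/reject/reset
# boundary, and restore it on reject so the next draws repeat the
# rejected sequence exactly.
import random

class ReplayableSelection:
    def __init__(self, values, seed=0):
        self.values = list(values)
        self._seed = seed                  # seed at the last boundary
        self._rng = random.Random(seed)

    def next_selection(self):
        return self._rng.choice(self.values)

    def accept_selections(self):
        # Commit: start a fresh seed epoch from the RNG's current state.
        self._seed = self._rng.random()
        self._rng = random.Random(self._seed)

    def reject_selections(self):
        # Roll back: restore the stored seed, so the same draws repeat.
        self._rng = random.Random(self._seed)
```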

I think you could be up and running in an hour or two with the last
option, and it would represent the smallest change to the code base
and your experiment.

– Dave

> An easier option still would be to add a “generate identical randomization after reject selection” option to the selection object.

Rather than modifying the behavior of reject_selections, maybe we should add a “replay_selections” action that rolls back all non-accepted selections and re-seeds the random number generator with the last-used seed.

For Najib’s problem, another major annoyance with selection variables is that the set of selectable values must be explicitly enumerated. For example, if you want to draw from integers 1…10, your selection variable must specify values="1,2,3,4,5,6,7,8,9,10". This isn’t a problem for a small set of values, but Najib needs to use integers 1…3000. His solution has been to generate the list in Python and cut and paste it into the XML, but that’s pretty ugly.
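
For reference, the Python side of that workaround is a one-liner (note that Python’s range excludes its endpoint, hence 3001 to get 1…3000):

```python
# Generate the comma-separated values attribute in Python, then paste
# the result into the selection variable's XML by hand.
values = ",".join(str(i) for i in range(1, 3001))
```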

It’d be great if the values could be specified using something like Python’s range function, e.g. values="range(1,3000)". Alternatively, we could accept interval notation ([1,3000]). While I don’t necessarily like the idea of adding an ad hoc expression syntax for selection values, it seems like some kind of shorthand is necessary if we’re going to be using selection variables in this way.

Chris

Sure, adding a “replay_selections” action sounds like an excellent compromise, though obviously all of the selection-oriented actions will need some internal modification to manage the seeds.

Re: enumerating lists in selection variables, I’m pretty sure that we had this feature prior to the latest parser implementation, but it fell by the wayside. That one should be straightforward to add; I’d vote for also supporting “mixed format” lists like “1,2,4…6,9,10” while we’re at it. I don’t think it needs to be ad hoc: we already have a family of “getNumber”, “getVariable”, “getExpression”, etc. calls, and adding a “getList” one doesn’t seem too far out of line. This would naturally be reused in the list_replicator as well. Actually, looking at it, right now there is some nasty-looking Spirit code there that I don’t recognize and don’t think I wrote. A centralized “getList” method with various bells and whistles would be a welcome change.

– Dave

Further discussion between me and Najib has revealed another complication, which none of our current suggested solutions (generic arrays, indexable selection variables, or a “replay_selections” action) address.

So far, we’ve been thinking about ways to “pre-select” a list of random indices and then iterate over it multiple times. For example, if MWorks had a user-accessible array type, then you could generate a list of 10 random indices by creating an array variable and calling next_selection 10 times, appending each selection to the array. You would then load 10 images, display each one, and unload them using 3 range replicators that iterate over the array.
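
In miniature, the pre-select-into-an-array idea looks like this (a Python illustration with made-up names; the choice/remove loop stands in for repeated next_selection calls, and none of this is MWorks syntax):

```python
# Draw n indices without replacement, store them in an array, and let
# the same stored array drive the load, display, and unload passes,
# mirroring the three range replicators.
import random

def preselect_and_run(n=10, seed=0):
    pool = list(range(n))
    rng = random.Random(seed)

    # Phase 1: append each selection to the array.
    indices = []
    for _ in range(n):
        choice = rng.choice(pool)
        pool.remove(choice)          # draw without replacement
        indices.append(choice)

    # Phases 2-4: three passes over the same stored indices.
    log = [(phase, i) for phase in ("load", "display", "unload")
           for i in indices]
    return indices, log
```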

The issue is how to deal with accept/reject on the selections. If the monkey breaks fixation while an image is being displayed, Najib wants to reject the selection and display the image again later. However, in the above example, there are 9 other “uncommitted” selections in addition to the one currently displayed. The current implementation of reject_selections will reject all of them, so even images that were displayed successfully will go back into the pool and get shown again. The net result is that the monkey must maintain fixation for all 10 images in order for any of them to be accepted. Depending on the experiment and the monkey’s behavior, this may not be OK.

Maybe what we need is a way to reject a particular selection. Probably this would make the most sense with indexable selection variables: Rejecting selection_var would reject all non-accepted selections, while rejecting selection_var[3] would reject the third non-accepted selection without affecting the others. Of course, this potentially could be confusing (e.g. what happens if you ask for the value of selection_var[3] after rejecting it?). Also, I haven’t thought about how we’d need to change the implementation of selection variables to make this work; it could be ugly.
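
One possible semantics for per-selection reject, sketched in Python. This is an entirely hypothetical API (indexing here is zero-based, and the open questions above, like what selection_var[3] means after its rejection, are deliberately left unresolved):

```python
# Model the non-accepted selections as a list of pending draws:
# selection_var[k] names the k-th non-accepted selection, and rejecting
# it removes only that value; later pending selections shift down.
class IndexableSelection:
    def __init__(self):
        self.pending = []            # non-accepted selections, in draw order

    def record(self, value):
        self.pending.append(value)

    def __getitem__(self, k):
        return self.pending[k]

    def reject_one(self, k):
        # Reject only the k-th non-accepted selection.
        return self.pending.pop(k)

    def accept_all(self):
        committed, self.pending = self.pending, []
        return committed
```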

Hey,

I somehow missed this last portion of the discussion.

I think that even rejecting a single selection is not going to help you here (if it did, then you could just “accept” after each good trial, and all would be well). The issue, I think, is that once you’ve rejected anything, the next draw is going to be random again, and potentially outside of your originally-drawn partition.

I suspect that you’ve already worked around this (I recall there being a no-you-really-can-load-stimuli-on-the-fly thread a little while back), but here are some things I could imagine working in this kind of general situation in the future:

  1. a “draw batch” action. Basically, this could be used for making an arbitrary number of draws out of one selection variable, depositing those indices into another. I believe such a thing could have reasonably broad utility, though complexity is obviously increased a bit. Sounds like you may already be thinking in this direction.

  2. a custom C++ selection variable object. It will always be possible to imagine something too complex for MW to support. This may be one of those times.

  3. a custom, interpreted (e.g. Python) selection variable object. There have always been vague plans to include such a thing, and this might be a good test case for that. Obviously, there are many levels of abstraction that this could exist at, ranging from a highly constrained, special purpose environment (think GLSL shader), to full-access, riding-with-live-ammunition-and-the-safety-off-type scenarios.

I think there is an argument to be made for the “shader” approach (e.g. small, constrained code snippets that fit into a more complex environment that the user doesn’t need to worry about). For one, I think selection is going to be one of those bugbears that keeps coming back to bite us, because people will always want to do fancy things here. However, the entry points for selection-type actions are potentially well defined, and a constrained environment also potentially gives us better leverage to check up on the user to ensure they are doing something valid, though there are obviously limits here. Selection is also usually not time critical, which is a plus.