Stimulus hashing

Hi Yoon,

Could you provide more details about MWorks’ hashing policy for the stimulus set? Does MWorks hash the pixel data, or does it generate unique hashes from filenames? Given the potential for redundant images in large stimulus sets, is there anything users should be cautious about?

For details, see this article. There’s also a bit more info in this discussion.

In short:

  • Only image file stimuli have hashes recorded.
  • The hash is computed from the raw bytes of the file. This means that the filename and the file access/modification times do not go into the hash value, but anything in the file data itself (including any embedded metadata) does.
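To make the second point concrete, here is a minimal Python sketch of hashing a file’s raw bytes. I’m assuming SHA-1 here purely for illustration; confirm the exact algorithm in the MWorks docs before relying on it:

```python
import hashlib
from pathlib import Path

def file_hash(path):
    """Hash the raw bytes of a file (SHA-1 assumed for illustration).
    Renaming the file or touching its access/modification times does not
    change this value; altering any byte of the file data, including
    embedded metadata such as Exif tags, does."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()
```

This is why two pixel-identical images with different embedded metadata will still produce different hash values.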

The only thing I’d be cautious about is changes to metadata (e.g. Exif tags), since those will cause otherwise-identical image files to produce different hash values.

Currently, I log the stimulus ID based on the MWorks experiment code, like the “stimulus_presented” variable in your RSVP example. Is there a way to access the hash table MWorks uses? Would doing so improve the reliability of managing stimulus metadata (compared to relying solely on the stimulus ID variable)?

My RSVP demo doesn’t include a “stimulus_presented” variable. Are you just storing the current index into a stimulus group in “stimulus_presented”? If so, then I’d say that keeping track of filename and file hash would be a much more robust method of establishing image-file identity. Those values aren’t directly available to experiment code, but they’re recorded in the event file, as described in the previously-cited docs.

For example, if a user mapped their stimulus ID incorrectly in MWorks, could I still recover the exact filenames associated with the hash codes used in each RSVP trial (fixation)?

Yes, that’s the idea. As long as you have the file hashes from your event file, and you have access to the image files used in your experiment (so that you can compute each file’s hash), you can robustly associate each image stimulus presentation with the image file presented. You can also compare images used across multiple experiments, even when the experiments define the “stimulus_presented” variable differently (or don’t define it at all).
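As an illustration of that workflow, here’s a Python sketch: hash every file in the experiment’s image directory, then look up the hashes recorded in the event file. The `build_hash_index` helper is hypothetical (not part of any MWorks API), and SHA-1 is again an assumption to verify against the docs:

```python
import hashlib
from pathlib import Path

def build_hash_index(image_dir):
    """Map file hash -> list of filenames for every file in image_dir.
    A hash can map to multiple names if byte-identical copies exist."""
    index = {}
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file():
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            index.setdefault(digest, []).append(path.name)
    return index

# Usage sketch: given hashes pulled from the event file,
# recover which image file each presentation corresponds to:
#   index = build_hash_index("path/to/stimuli")
#   presented = [index[h] for h in recorded_hashes]
```

Because the mapping is keyed on file content rather than on an experiment-defined ID, it works even across experiments that define (or omit) “stimulus_presented” differently.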

I recall that Chong ran an experiment with a large image set. He chose to “vectorize” the repeats, defining 16,000 stimuli presented once each instead of presenting 8,000 unique images twice each. Is there a difference in maintaining image order between these two approaches?

That was a poor experimental design choice, and as I recall, it caused problems for Sarah and Jon. If the goal is to present each image twice without insisting that every image is presented once before any image is repeated, then MWorks can do that without the experiment defining two different image stimuli for each image file. Please see this discussion for details.
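Outside of MWorks, the ordering logic in question amounts to the following Python sketch (purely illustrative; the actual MWorks mechanism is the one described in the linked discussion):

```python
import random

def twice_shuffled(n_images, seed=None):
    """Return a presentation order in which each image index appears
    exactly twice, with no constraint that every image is shown once
    before any image repeats. Only n_images stimuli need to exist;
    duplicating each stimulus definition is unnecessary."""
    order = list(range(n_images)) * 2
    random.Random(seed).shuffle(order)
    return order
```

The key point is that the repetition lives in the presentation order, not in the stimulus definitions, so there is no need for 16,000 distinct stimuli to present 8,000 images twice.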

I recently conducted an experiment involving 8,400 images, each presented twice. The parsed stimulus events and neural responses appear consistent, but I’d appreciate it if you could look over the MWorks code to confirm that it handles image ordering properly.

I’m happy to take a look, but I’m not sure how you’re defining “properly”. Can you be more specific about what you’d like me to check?

Thanks,
Chris