Draw timing

Hello,

I’ve been following along as this thread has been evolving. I’m particularly interested in option #3.
GL_ARB_timer_query is 3.2 only, but isn’t GL_EXT_timer_query available in OpenGL 2.1 in Lion?

One other quick thing to add: there are hardware solutions to this problem as well. In my lab, we always measure the screen with photodiodes.

We have a special stimulus that is persistently queued in the bottom corner of the screen, and a photodiode array affixed to the screen over it. The stimulus is designed so that on each and every display update event, it displays a unique 4 bit “code” (white =1, black=0, and there are four spots), and we pipe the photodiode signals into our data acquisition system. By accounting for the screen position of the “code” stimulus on the display, we can estimate with relatively high accuracy when any given stimulus actually appeared on the screen, based on where it was positioned and when we measure the pixel code.
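To make the scheme concrete, here is a minimal sketch of the encode/decode logic for such a 4-bit code (the function names and details are hypothetical, not our actual stimulus code):

```c
#include <stdbool.h>

#define CODE_BITS 4

/* Patch intensities (1.0 = white, 0.0 = black) for the low 4 bits of a
 * frame counter; one patch per photodiode. */
void frame_code_encode(unsigned frame, float patch[CODE_BITS]) {
    for (int i = 0; i < CODE_BITS; i++)
        patch[i] = ((frame >> i) & 1u) ? 1.0f : 0.0f;
}

/* Recover the 4-bit code from thresholded photodiode samples. */
unsigned frame_code_decode(const bool diode[CODE_BITS]) {
    unsigned code = 0;
    for (int i = 0; i < CODE_BITS; i++)
        if (diode[i]) code |= 1u << i;
    return code;
}

/* Frames elapsed between two observed codes, assuming fewer than 16
 * frames passed (the code wraps modulo 16). */
unsigned frame_code_delta(unsigned earlier, unsigned later) {
    return (later - earlier) & ((1u << CODE_BITS) - 1u);
}
```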

Obviously, there is some significant infrastructure associated with setting such a thing up, but I’d be happy to share / merge-in the MWorks stimulus, and share the basic hardware specs for the array that goes with it. I personally feel much more confident having such a system in place, since it does an end-run around the vagaries of the OpenGL drawing cycle.

  • Dave

You mean 0.4.5, right? The stimulus display code changed significantly between 0.4.4 and 0.4.5 (e.g. that’s when we started using the display link), so I really hope we haven’t been talking about results obtained under 0.4.4.

Yep, sorry about that. 0.4.5 is what I was talking about.

If I’m interpreting your time stamps correctly, what these results show is a single instance of glFinishFenceAPPLE returning “late”, by which I mean “later than we’d like”, since the API makes no guarantee about when it will return, other than that it will be after all previously submitted GL commands have completed.

The API does guarantee that it will return after all prior GL commands are processed. If it does not guarantee that it returns promptly, what’s the point of using it? The only thing I am really interested in is a solid estimate of the VBL; if the fence occasionally returns late, that delay should not end up in the timestamp.

Despite the late return, drawing for the next refresh does complete on time, which is why the time stamp for the next announcement is +8554.000000 (i.e. half a refresh cycle after the previous announcement, and coincident with the next VBI and buffer swap). Hence, the “short” interval here is just an artifact of the method you’re using to compare display update times, and the corresponding display update is actually right on time.

In this case, that is true. However, I think I gave you enough evidence already (post 7) that the opposite also happens, namely first a short, then a long delay. Hence the short timings are not artifacts in general, although the short interval in this specific example is. From what I have seen so far, the ‘off-time’ seems to be at least one quarter of the refresh period. It can also be half a cycle or three quarters. I can’t confirm this with data, but maybe it means something to you.

**I tried solution 2 today and succeeded in building a fully functional version of MWorks**, which means that I can run my tests and I don’t see any irregularities whatsoever. As for your implementation, I would suggest not using the outputTime estimate but waiting for the next Now value.
I think you received the messages of Kenneth Dyke as well. Although he claims that the Now value has nothing to do with the VBL, it is different from the estimate given by the previous invocation of CVDisplayLinkCallback. It’s just a feeling, but I would trust it more.

For this approach to be reliable, we would need to be confident that if frames were skipped, then the previously-drawn frame actually appeared on screen during the most recent refresh. As far as I know, the display link in no way guarantees this, so this approach makes me nervous.

Me too. We would have to compare the estimates of CVDisplayLink with the actual output of the screen. It’s quite easy to build a closed loop with a photodiode connected to the IO device. If the timestamps of the display link and the diode activity were then perfectly correlated, I would trust the technique more. However, it is the first time since I started using MWorks that I see perfectly smooth motion on the screen. That makes me think that the estimate of CVDisplayLink can’t be that bad.

More problematically, updating MWorks to use OpenGL 3.2 would require rewriting the drawing code for most stimuli, as 3.2 breaks compatibility with 2.1. That would require a significant amount of work. However, we will have to make that jump eventually, so the question is really “how soon?”.

I agree that updating to OpenGL 3.2 looks very promising. I installed Lion about three days ago, and since then I have been trying to get it to run. You are perfectly right about it being a fair amount of work, and I don’t have time to do it, now or in the foreseeable future.

If you ask me to make up a list of priorities: let’s get one stable version for Snow Leopard first, and then switch to OpenGL 3.2 in MWorks 0.6.


One other quick thing to add: there are hardware solutions to this problem as well. In my lab, we always measure the screen with photodiodes.

Hi Dave, thank you for your suggestions! Are you sure that we need GL_ARB_timer_query at all? There are these nice functions called glWaitSync() and glClientWaitSync() that come with GL 3.2. Is this not exactly what we want to have?

We have a special stimulus that is persistently queued in the bottom corner of the screen, and a photodiode array affixed to the screen over it.

I was thinking about something similar already, but your version (4-bit code) sounds very promising. Setting up the hardware doesn’t sound too complicated to me, it’s just four times the diode-device we already have, right? Having the stimulus and the analysis routines for the signal would be a real step forward for me. I would be very happy to share knowledge with you, although there is not much I can offer you at this point :). Did you talk to Stefan about this array?

Cheers,
Philipp

Hi Dave, thank you for your suggestions! Are you sure that we need GL_ARB_timer_query at all? There are these nice functions called glWaitSync() and glClientWaitSync() that come with GL 3.2. Is this not exactly what we want to have?

No, no. GL_EXT_timer_query is supposedly supported on OpenGL 2.1 in Lion. No OpenGL rewrite would be required. See: http://developer.apple.com/graphicsimaging/opengl/capabilities/

As with all of these extensions, there’s some non-uniformity in support across cards.

I was thinking about something similar already, but your version (4-bit code) sounds very promising. Setting up the hardware doesn’t sound too complicated to me, it’s just four times the diode-device we already have, right? Having the stimulus and the analysis routines for the signal would be a real step forward for me. I would be very happy to share knowledge with you, although there is not much I can offer you at this point :). Did you talk to Stefan about this array?

No, I’ve never talked to Stefan about this. I’m not sure we’ve even ever met…

Yes, just four photo diodes and mounting hardware for affixing them to the screen / avoiding light leakage, etc.

I’ll look into merging a version of our bit-code stimulus into the main-line plugins, since it seems like many could benefit from this. My only moment of hesitation is that the current version is a little bit tailored to our needs, insofar as it is listening to our (externally running) data acquisition system via zmq sockets (and doing real-time time-sync estimation), whereas you probably have your IO integrated differently. I don’t see any reason that it couldn’t be refactored a bit to suit a broader audience. In any event, we’ve been very happy with this strategy, since we get highly accurate “ground-truth” for what’s happening on the screen, irrespective of what the OS deigns to tell us in this regard.

  • Dave

David Cox, Ph.D.
The Rowland Institute at Harvard
cox@rowland.harvard.edu
http://www.rowland.harvard.edu/cox
office: 617-497-4682

GL_ARB_timer_query is 3.2 only, but isn’t GL_EXT_timer_query available in OpenGL 2.1 in Lion?

Thanks for pointing that out. I wasn’t aware of the earlier extension.

Unfortunately, based on the GL_EXT_timer_query spec, it looks like the older extension only supports glBeginQuery/glEndQuery for determining elapsed time, whereas the newer version also supports using glQueryCounter to get the current GL timestamp. I believe the latter is necessary in order to convert GL time into MWorks time.

Specifically, I imagine doing something like this at the end of the draw loop:

drawFrame();
flushBuffer();

// Get the current GL time in nanoseconds (synchronous)
GLint64 glFlushTime;
glGetInteger64v(GL_TIMESTAMP, &glFlushTime);

// Get the current MW time
mwFlushTime = getCurrentTimeUS();

// Set up asynchronous query for GL finish time
// (query was created earlier with glGenQueries)
glQueryCounter(query, GL_TIMESTAMP);

Then, at the beginning of the next draw loop iteration:

// Get the GL finish time (blocks until the query result is available)
GLuint64 glFinishTime;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &glFinishTime);

// Convert GL finish time into MW time (including nano- to micro-second conversion)
mwFinishTime = mwFlushTime + (glFinishTime - glFlushTime) / 1000;

Without the ability to get the current GL timestamp, I don’t see how you can turn the elapsed GL time between flush and finish into an absolute MWorks time. Thus, I don’t think GL_EXT_timer_query is sufficient for our needs.
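To spell out the arithmetic in the sketch above, the conversion itself is just the MW time at flush plus the GL-measured elapsed time, converted from nanoseconds to microseconds (a self-contained, hedged version; the names are illustrative):

```c
#include <stdint.h>

/* GL timestamps are in nanoseconds; MWorks time is in microseconds.
 * The absolute MW finish time is the MW time recorded at flush plus
 * the GL-measured flush-to-finish interval. */
int64_t mw_finish_time_us(int64_t mw_flush_us,
                          int64_t gl_flush_ns,
                          int64_t gl_finish_ns) {
    return mw_flush_us + (gl_finish_ns - gl_flush_ns) / 1000;
}
```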

One other quick thing to add: there are hardware solutions to this problem as well.

Yeah, that was something Jim and Stefan pointed out, too. If you really need to know, you can always measure.

Obviously, there is some significant infrastructure associated with setting such a thing up, but I’d be happy to share / merge-in the MWorks stimulus, and share the basic hardware specs for the array that goes with it.

Sounds like a great idea. I’m sure many folks would appreciate it.

Chris

The API does guarantee that it will return after all prior GL commands are processed. If it does not guarantee that it returns promptly, what’s the point of using it? The only thing I am really interested in is a solid estimate of the VBL; if the fence occasionally returns late, that delay should not end up in the timestamp.

Agreed. My point was just that the API was designed to let you know that something finished, not to tell you exactly when it finished.

I’d also like to note that it was the very same Ken Dyke who you’ve just been in touch with who recommended that we use the fence to determine the buffer swap time.

However, I think I gave you enough evidence already (post 7) that the opposite also happens, namely first a short, then a long delay. Hence the short timings are not artifacts in general, although the short interval in this specific example is.

I have yet to see such an event in my own tests, so I remain skeptical.

As for your implementation, I would suggest not using the outputTime estimate but waiting for the next Now value. I think you received the messages of Kenneth Dyke as well. Although he claims that the Now value has nothing to do with the VBL, it is different from the estimate given by the previous invocation of CVDisplayLinkCallback. It’s just a feeling, but I would trust it more.

Based on what Ken said (as well as the display link docs), my understanding is that the inNow parameter is just the current wall-clock time when the display link callback is invoked. Of course that would be different from the previous inOutputTime, as that was an estimate of when the previous frame would be displayed on screen, which happens before the callback is invoked for the next refresh.

Since the point of the timestamp is to record an estimate of when the stimulus actually appeared on screen, the inOutputTime value is what you want. inNow is irrelevant.

However, it is the first time since I started using MWorks that I see perfectly smooth motion on the screen. That makes me think that the estimate of CVDisplayLink can’t be that bad.

I don’t think that says anything about the time estimates, but it does provide further support for the theory that waiting on the GL fence is the source of most of the animation glitches you’ve seen. That’s good info.

Are you sure that we need GL_ARB_timer_query at all? There are these nice functions called glWaitSync() and glClientWaitSync() that come with GL 3.2. Is this not exactly what we want to have?

Those are the vendor-neutral versions of the Apple fence stuff. They don’t give us any functionality we don’t already have (although, like I said in a previous post, it’s possible that their implementation could be better than the old stuff; on the other hand, it could also be worse).

Chris

For the record, here’s the exchange between Philipp and Ken Dyke at Apple:

Hi Ken,

in this post (http://lists.apple.com/archives/mac-opengl/2009/Oct/msg00020.html) you claim to be the person who wrote CVDisplayLink originally, and that is why I am writing you.

I am revising a scientific stimulator whose purpose is to output video frames as precisely as possible. I did not trust CVDisplayLinkOutputCallback enough to believe its timestamps were exactly VBL-synced; instead I used the APPLE fence mechanism to signal command completion.

In OS X 10.5 this used to work, but in 10.6 I am experiencing strange timing problems in the fence’s signal. Also, CVDisplayLinkOutputCallback usually signals around 200 µs prior to the fence. I believe this could have something to do with Apple’s progress towards OpenGL 3.2 (which makes APPLE fences obsolete).

My question to you, since I can’t find anybody else who seems to be an expert on the matter, is how CVDisplayLink synchronizes with the GPU. The reference says: “… function, which is called whenever the display link wants the application to output a frame”. I understand the output time signaled by the callback is an estimate for the next frame, but is inNow really the VBL timestamp, or just the time at which CVDisplayLink supposes the GL sync happened?

From what I remember (and this is from a very long time ago), the inNow value actually reflects the current CPU wall clock time and has nothing to do with when the previous VBL actually fired. For various implementation-detail reasons, we don’t actually have a direct path from the VBL interrupt to the CVDisplayLink firing. The general idea of the CVDisplayLink callback was to tell you when “now” was (in terms of CPU wall clock time) and when the next VBL is supposed to hit. We have to do a bunch of complicated math/filtering/etc. to come up with that estimate (based on previous VBL-to-VBL time deltas, etc.), and they can unfortunately jitter a bit. :( In a system that’s not under heavy load, and where the GPU is also not completely swamped, the idea is that if you draw something with GL and then send that frame to the display (with the swap interval set to 1 on the GL context), you have a good chance that the frame will hit the display at the next VBL. Of course, trying to guarantee this with 100% accuracy is tricky. (The display pipeline on Mac OS X is incredibly complicated due to compositing, etc.)

In 10.7, for power reasons, we actually now disable VBL interrupts for long periods of time to keep from constantly waking up the CPU, and then we have to rely on the graphics drivers giving us very accurate information about pixel clock values etc. so we can make good estimates as to when future VBL interrupts would have fired. Some drivers are better here than others.
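As a rough illustration of the kind of filtering Ken describes (the real driver logic is certainly far more sophisticated), one could low-pass the observed VBL-to-VBL deltas and extrapolate the next VBL from the latest one:

```c
/* Toy VBL estimator: exponential moving average of observed
 * VBL-to-VBL deltas, used to predict the next VBL time. */
typedef struct {
    double avg_delta_us;   /* filtered refresh period */
    double last_vbl_us;    /* time of the most recent observed VBL */
} vbl_estimator;

void vbl_init(vbl_estimator *e, double nominal_period_us) {
    e->avg_delta_us = nominal_period_us;
    e->last_vbl_us = 0.0;
}

/* Feed in an observed VBL time; returns the predicted next VBL time. */
double vbl_observe(vbl_estimator *e, double vbl_us) {
    if (e->last_vbl_us > 0.0) {
        double delta = vbl_us - e->last_vbl_us;
        /* low-pass filter: mostly trust history, absorb jitter slowly */
        e->avg_delta_us = 0.9 * e->avg_delta_us + 0.1 * delta;
    }
    e->last_vbl_us = vbl_us;
    return vbl_us + e->avg_delta_us;
}
```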

Hi Chris,

However, I think I gave you enough evidence already (post 7) that the opposite also happens, namely first a short, then a long delay. Hence the short timings are not artifacts in general, although the short interval in this specific example is.

I have yet to see such an event in my own tests, so I remain skeptical.

In that case, look at frames 61 and 2797 in your own test data. You posted the datafile macmini.mwk previously; I just spent two minutes looking at it myself.
I noted already that your Mac mini has an attenuated version of my problem, and after looking closely at the data I am now fully convinced that this is so. The error is on the order of 1 ms in your system, which I don’t believe is caused by inter-process communication (I don’t think you were putting heavy load on the CPU).

I don’t think that says anything about the time estimates, but it does provide further support for the theory that waiting on the GL fence is the source of most of the animation glitches you’ve seen. That’s good info.

I think it says a lot about the quality of the estimates, since the timestamps are what we base our motion calculation on. If I am not able to visually observe irregularities, the jitter must be below a certain order of magnitude.

Are you sure that we need GL_ARB_timer_query at all? There are these nice functions called glWaitSync() and glClientWaitSync() that come with GL 3.2. Is this not exactly what we want to have?

Those are the vendor-neutral versions of the Apple fence stuff. They don’t give us any functionality we don’t already have (although, like I said in a previous post, it’s possible that their implementation could be better than the old stuff; on the other hand, it could also be worse).

As far as I understand the documentation, there is quite a difference between Apple Fences and the ARB_SYNC extension. The most obvious change should be that an Apple Fence almost always invokes a flush, whereas the ARB extension does not. In our case, however, that makes absolutely no difference.

They don’t give us any functionality we don’t already have

glFinish() also gives us what we want, although it completely idles the graphics pipeline. This just goes to show that a good implementation of a function providing the information we want is crucial. If the Apple fence stuff broke, I would strongly recommend giving ARB_SYNC a try and seeing how it performs.

As for GL Counters, I don’t really see where we are going with this, since conversion to CPU time will introduce some new errors. Just a quick note on the subject: In the CVTimeStamp structures provided by CVDisplayLinkCallback, the ‘videoTime’ should be what you are trying to measure.


Chris, I think this discussion has already gone too far, and before we lose our good manners I am now trying to end it.

Most probably I will be able to confirm with a new diode test that my version of the display code works, although I will have to wait at least until tonight to run it. The new version also provides a frame count that can be queried by dynamic stimuli, as well as other improvements that seemed elegant to me.

As soon as I have data on how good the performance of this version is, I will send it your way. I really hope that you are willing to look at it and revise it again.

Regards,
Philipp

Hey guys,

I’m sort of confused by the turn the conversation took at the end.

For the record, I don’t like that you published this here without asking me or Ken Dyke for permission. The purpose of having you in the CC was solely to keep you, personally, informed.

For what it’s worth, I really appreciate being able to see what Ken actually said, so I don’t have to guess about vague references to “what Ken said”.

since the timestamps are what we base our motion calculation on.

GPUs are incredibly powerful and good at what they do, but they weren’t built for psychophysics. We’ve known since the beginning of MW that getting accurate frame times was going to be a challenge (leaving aside that the GPU can’t know about the characteristics of the electron beam or liquid crystals that actually give rise to the image). If your research depends critically on knowing accurate frame times, you’ll need to measure them directly (which we’ve already discussed).

That said, we’d all love for the timestamps to be as accurate as possible, and I think that will mean trying a bunch of alternatives (which you’ve been doing), evaluating them all carefully, and then refactoring and consolidating. And the issue will need to be revisited periodically as OS’s, GPUs and APIs change.

  • Dave

I’ve been swamped and not following changes to the display code closely enough, so please forgive me for being way behind in coming up to speed in the following questions:

  1. Why is announceDisplayUpdate dispatched into the global GCD queue? It was my understanding that this queue is shared across the entire system, and there can be serious latency issues if other parts of the system are scheduling at high priority into that queue. I haven’t used GCD myself, and I’m not an expert on it, so I could very well be misunderstanding.

  2. Why is GCD being used at all? My understanding was that GCD is primarily a throughput-oriented scheduling framework (a la TBB and other such alternatives), rather than a latency-oriented one. What level of control / guarantees do we have over the latency behavior of GCD?

  3. Why is the announce being done concurrently/dispatched at all? From a casual look, it appears that the dispatched thread will immediately hit several synchronization road blocks (first from the refreshSync barrier, and then implicitly from the OpenGL pipeline). What is being gained here?

Shouldn’t the glFinish / glFence stuff always be synchronous with the draw stuff anyway? Is the hope to offload the drawing thread? But the drawing thread needs exclusive access to the GL state machine, and the announce thread is making GL calls. Also, there doesn’t appear to be any mechanism to guard against order inversion in these two operations (e.g. announceDisplayUpdate is sufficiently delayed that the next display link callback comes first, or worse, announceDisplayUpdate issues an (otherwise thread-safe) glFinish smack dab in the middle of the OpenGL drawing, stalling the GL pipeline).

I’ll keep looking at it to see if I figure it out, but any hints you could provide would be helpful.

Thanks,
Dave

Hi Dave,

I am quite happy to hear about your concerns regarding the design of the display code. As I said before, I performed some other modifications that include more or less everything you just came up with.

I can’t tell you why the announce thread is dispatched, but I can tell you that not dispatching it does not break anything.

Despite the fact that it is untested, I will attach my version to this post so you can look at it. If you happen to have time, I would appreciate any feedback on my changes. This is the version that works perfectly, at least when evaluated by eye. It makes no use of glFinish or glFence, but relies solely on CVDisplayLinkCallback.

Thank you,
Philipp

Attachment: mworks-project-mw_core-b5a77c1-2.zip (1 MB)

Shouldn’t the glFinish / glFence stuff always be synchronous with the draw stuff anyway? Is the hope to offload the drawing thread? But the drawing thread needs exclusive access to the GL state machine, and the announce thread is making GL calls. Also, there doesn’t appear to be any mechanism to guard against order inversion in these two operations (e.g. announceDisplayUpdate is sufficiently delayed that the next display link callback comes first, or worse, announceDisplayUpdate issues an (otherwise thread-safe) glFinish smack dab in the middle of the OpenGL drawing, stalling the GL pipeline).

It took a while for me to figure this one out, but the announce thread only makes GL calls after it has synchronized with the draw thread. Synchronization takes place just before the draw thread ends, by which point it has already injected the fence and, more importantly, finished submitting all the GL commands.

After thinking about this a bit, I guess the basic idea was that the announce thread is started well before it actually has to process anything, so the dispatching is not time-critical. The thread then sleeps on a semaphore, and as far as my understanding goes, it should be able to wake up from that immediately, because it will not have to wait for the next scheduler time slice.

Of course, what is actually gained by this is a very good question, since we could also sleep on the Fence barrier directly inside the DisplayLink callback function. But then again, the callback is invoked only very briefly before we expect the Fence to unblock, so we could miss the exact time occasionally …
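A stripped-down sketch of that pre-started worker pattern (using a mutex/condition pair in place of the actual semaphore; the names are illustrative, and the fence wait plus announcement is replaced by a counter):

```c
#include <pthread.h>
#include <stdbool.h>

/* A worker ("announce") thread is created well ahead of time and sleeps
 * until signaled, so waking it avoids thread-creation cost on the
 * time-critical path. */
typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    bool            ready;       /* a frame is ready to announce */
    bool            done;        /* shut down after pending work */
    int             work_count;  /* stand-in for fence wait + announce */
} announce_sync;

static void *announce_thread(void *arg) {
    announce_sync *s = arg;
    pthread_mutex_lock(&s->mtx);
    for (;;) {
        while (!s->ready && !s->done)
            pthread_cond_wait(&s->cond, &s->mtx);  /* sleep until signaled */
        if (s->ready) {
            s->ready = false;
            s->work_count++;       /* real code would wait on the fence here */
        } else {
            break;                 /* done, and no pending work */
        }
    }
    pthread_mutex_unlock(&s->mtx);
    return NULL;
}

/* Called from the draw thread at the end of a frame (or at shutdown). */
static void signal_announce(announce_sync *s, bool shutdown) {
    pthread_mutex_lock(&s->mtx);
    if (shutdown) s->done = true; else s->ready = true;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->mtx);
}

/* Signals one frame, shuts down, and returns how many announcements ran. */
int announce_demo(void) {
    announce_sync s = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                        false, false, 0 };
    pthread_t t;
    pthread_create(&t, NULL, announce_thread, &s);
    signal_announce(&s, false);   /* one frame */
    signal_announce(&s, true);    /* shut down (pending work still runs) */
    pthread_join(t, NULL);
    return s.work_count;
}
```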

Hey Dave,

I’ve been swamped and not following changes to the display code closely enough, so please forgive me for being way behind in coming up to speed in the following questions:

The answers to your questions are related, so I’ll try to explain the reasoning behind the current implementation.

As you point out, ideally we’d like to call glFinishFence in the same thread as and immediately following all drawing commands. However, in my tests, that approach caused the display link to consistently skip the next frame. I can only speculate as to why this happens; my guess is that the display link driver notices that the callback continues running into the next refresh cycle, and surmises that the actual buffer swap will have to take place during the next refresh. This is frustrating, since as long as glFinishFence returns in a timely manner, there should be plenty of time to draw and present the next frame on schedule.

Since blocking the display-link callback leads to dropped frames, my workaround was to wait for the fence on a separate thread. You’re correct in noting that GL contexts are not thread safe. The drawing and announcement threads use some fairly tricky (confusing, horrible) locking to ensure that they don’t perform GL operations concurrently, and to ensure that the next draw doesn’t start until the previous announcement completes.

As for why I opted to use GCD for the announcement, my hope was that it would be lighter-weight (and therefore more efficient) than spinning up a whole new thread to handle that one task. According to Apple’s Concurrency Programming Guide, the “global” queues are global to the application, not the whole system. (Supposedly, there is some whole-system scheduling/optimization of GCD tasks, but I haven’t seen any details of how that works.) The queues are there whether you ask for them or not, so I thought, “Why not use it for the announcement?”

In light of this discussion, the answer to that question may be that the (supposedly high-priority) thread responsible for executing the queued action isn’t being scheduled as quickly as we’d like, which, as we’ve determined, can lead to an inaccurately late time stamp on the announcement and block drawing of the next frame. This is why item 1 in my list of possible fixes for the fence-finish delay is “try raising the priority of the thread calling glFinishFenceAPPLE”. Specifically, I meant that we should try waiting on the fence in a separate, non-GCD thread, whose priority is at least as high as that of the display link thread. Maybe that will eliminate the problem, maybe not; I’ll have to try it and see.
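For reference, a sketch of what waiting in a dedicated, explicitly prioritized thread might look like (pure pthreads, with the fence wait stubbed out; SCHED_FIFO typically needs elevated privileges, so the sketch falls back to the default policy if the request is denied):

```c
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Stand-in for the fence-waiting work; real code would call
 * glFinishFenceAPPLE here and record the time stamp. */
static void *fence_wait_thread(void *arg) {
    int *ran = arg;
    *ran = 1;
    return NULL;
}

/* Create the waiter with maximum SCHED_FIFO priority if permitted,
 * otherwise with default scheduling. Returns 1 if the thread ran. */
int start_high_priority_waiter(void) {
    int ran = 0;
    pthread_attr_t attr;
    pthread_attr_init(&attr);

    struct sched_param sp;
    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);

    pthread_t t;
    if (pthread_create(&t, &attr, fence_wait_thread, &ran) != 0) {
        /* EPERM without privileges: retry with default scheduling */
        pthread_create(&t, NULL, fence_wait_thread, &ran);
    }
    pthread_join(t, NULL);
    pthread_attr_destroy(&attr);
    return ran;
}
```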

I think that answers most of your questions. As for this:

My understanding was that GCD is primarily a throughput-oriented scheduling framework (a la TBB and other such alternatives), rather than a latency-oriented one. What level of control / guarantees do we have over the latency behavior of GCD?

I think your characterization is accurate. GCD provides three global queues that differ in priority (low, normal, or high), and I think it guarantees that higher-priority tasks are always dequeued before lower priority ones. (The stimulus announcement is dispatched on the highest-priority queue.) Apart from that, there are no latency guarantees.

Because of that, you could reasonably argue that GCD isn’t the right tool for this job. I think an even stronger argument is that it doesn’t make sense to use GCD to run a task that we know is going to block, since doing so effectively steals a thread from GCD. As for whether this (mis)use of GCD is causing problems for us, testing will (hopefully) tell.

Chris

Since blocking the display-link callback leads to dropped frames

Given that the behavior of the display link prevents us from taking the obvious, simple approach (i.e. drawing and calling glFinishFence in the same thread), I think it’s fair to ask why we’re using the display link at all.

The short answer is that Apple recommends this technique for driving OpenGL animation loops. It’s purpose-made for the task, and it provides potentially useful (and hopefully accurate) info about when future display updates will happen.

The alternative (which is also described in Apple’s Q&A) is to use an application-created thread that relies on the NSOpenGLCPSwapInterval setting to prevent it from drawing frames faster than the graphics card can display them. The downside to this approach is that we don’t get any info about where the display is in its refresh cycle. That isn’t a problem if you want to draw every frame; in that case, you just draw, wait on the fence, draw, wait on the fence, ad infinitum. However, if you don’t want to draw every frame, then you have to do some careful bookkeeping and thread scheduling to keep your draws in sync with the display refreshes. Probably that could be made to work, but it seems like the end result would just be a half-baked, inferior reimplementation of the display link.

That said, I’m certainly willing to consider arguments in favor of the latter approach. My purpose here is simply to share an additional bit of insight into why we currently do things the way we do.

Chris

the end result would just be a half-baked, inferior reimplementation of the display link.

But it seems like the issue here is that the display link is wrapping our callback in other code that is making life difficult – e.g. maybe it is expecting to be in charge of finishing the pipeline, which is why adding the glFinish to the callback causes skipped frames.

If we created our own “display link”, we could be in full control of the fences and finishes, and we could give that thread defined-time-slice scheduling using Mach APIs (which is probably what the display link does in the first place). If this were a free-running thread, it could be synchronized to any phase of the display cycle we like (it’ll naturally be synced to the beginning of the cycle after the first glFinish/glFinishFence()). I think what we’re doing is sufficiently abnormal that it might not be unreasonable to expect that we’d need a custom solution.

Do either of you have a clean test experiment that you could give me to play with? I’m imagining something at this stage that doesn’t require a photodiode but that raises big alarm bells when things aren’t as expected.

  • Dave

Do either of you have a clean test experiment that you could give me to play with? I’m imagining something at this stage that doesn’t require a photodiode but that raises big alarm bells when things aren’t as expected.

I don’t have any experiment that reliably triggers the “glFinishFence returns too late” problem. The ongoing investigation into the top level problem in this thread (animation glitches in dynamic stimuli, plus weird display-update timestamps) has uncovered a number of contributing issues:

  1. In setups with both a main and a mirror display, interacting with the desktop (particularly using the dock) can cause MWorks to skip display refreshes (which should be reported if #warnOnSkippedRefresh is set to 1). In my experience, this is very easy to reproduce if the main and mirror displays run off the same video card, much more difficult if they’re running on separate cards.

  2. Dynamic stimuli use a poorly-chosen algorithm for determining whether they currently need to draw and, if so, what frame they should draw. Basically, they compute the frame number by calculating the time elapsed since they started playing and multiplying by the frame rate. Two big issues with this:

    1. The “current time” used in calculating the elapsed time is literally the current time when the computation is done, instead of the time when the frame will actually be displayed. Thus, the appearance of the stimulus depends on when in the refresh period its draw method is called, which is not fixed and can lead to uneven or jittery animation.
    2. If the frame rate isn’t an integer, the frame-number calculation is imprecise and therefore potentially inaccurate.
  3. As we’ve most recently been discussing, sometimes glFinishFence returns much later than expected. Since the next draw loop iteration is blocked until it returns, this can cause the next frame to be dropped.
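To make issue 2 concrete, here is a minimal sketch (in Python, with hypothetical names – not the actual MWorks code) of the kind of frame-number computation that would address both sub-issues: use the predicted output time of the upcoming refresh rather than the current time, and round rather than truncate so non-integer frame rates stay accurate.

```python
# Hypothetical sketch of a fix for issue 2: compute the frame number from
# the *predicted* output time of the upcoming refresh instead of "now",
# and round to the nearest frame so non-integer rates don't drift.

def frame_for_output_time(start_time_us, output_time_us, frames_per_second):
    """Return the animation frame that should be on screen at output_time_us.

    start_time_us  -- time the stimulus started playing (microseconds)
    output_time_us -- predicted display time of the next refresh (microseconds)
    """
    elapsed_us = output_time_us - start_time_us
    # elapsed_us * rate / 1e6 gives the frame position in fractional frames;
    # rounding to the nearest integer avoids the truncation error that bites
    # at rates like 59.94 Hz.
    return round(elapsed_us * frames_per_second / 1e6)

# At 60 Hz, 500 ms after onset we should be on frame 30:
print(frame_for_output_time(0, 500_000, 60.0))  # -> 30
```

The key design point is that the caller must pass in the refresh’s predicted output time (e.g. from the display link’s output timestamp), so the drawn frame no longer depends on where in the refresh period the draw method happens to run.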

My guess (although I don’t have data to back it up) is that the majority of observed animation glitches that don’t produce skipped-refresh warnings are caused by issue 2. Thus, I think it will be easier to identify instances of issue 3 and any ensuing problems once issue 2 is addressed. That’s my top priority right now, and I hope to get a fix into the nightly build soon.

As for issue 1, I think the right solution is to give the mirror window its own independent draw loop, so that the main display is never blocked by it. However, that will require some pretty serious refactoring of the stimulus display code. For the time being, the best approach is probably to run the main and mirror windows on separate video cards or (if possible) disable the mirror window altogether.

All that said, in case you still want something to play with, I’ve attached an example experiment from Stefan, which he used to demonstrate animation glitches in the drifting grating. The grating in the experiment is full-screen width, so you can follow an individual bar as it crosses the screen, which makes it pretty easy to see any unevenness in the motion.

Chris

Attachment: _timing2.xml (1.8 KB)

OK, I see. So lots of stuff going on here.

My primary interest is in trying out a different “display-link” strategy (since we’ve been noticing the 8 ms offset from reality in the stimulusDisplayUpdate timestamps as well). I was hoping that someone had a simple test case for that problem (ideally without requiring a photodiode), but now that I’ve got MW back up and running for me under Lion, I will direct some effort towards building one.

  • Dave

These are the accuracy measurements for a version using only the DisplayLink:

Test Hardware:
MacPro5,1 2x6-core Xeon with 2.93GHz each, 16GB RAM
Primary graphics: NVIDIA GeForce GT 120 with Samsung Syncmaster 2443 running at 60Hz
Secondary graphics: ATI Radeon HD 5870 with Quato 240m running at 59.9 Hz
OSX 10.6.8, MWorks 0.5.dev (the version I posted above)

Experiment: a live running experiment with two random dot patterns, a couple of different fixation points, auditory stimuli, and a rather complex state system that displays a lot of messages per trial. I recorded about 2 hours of real monkey training (n = 123772 display update events).
I used Dave’s BitCode stimulus with a setting of one bit. This means it displayed a rectangle in one corner of the screen that changed its color (black or white) on every refresh.

A photodiode was attached to the Quato screen; its signal was sampled by the BKOHG IO device at 1 kHz. To rule out irregularities in the screen’s decay rate, only black→white transitions (low→high diode activity) were considered in the following analysis. To convert the analog diode signal into a binary representation, I simulated a Schmitt trigger in the subsequent data analysis.
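For readers unfamiliar with the technique, a Schmitt trigger can be simulated in a few lines. This is only an illustrative sketch with made-up thresholds, not the actual analysis code:

```python
# Minimal software Schmitt trigger: two thresholds with hysteresis, so
# noise between them cannot cause spurious transitions in the output.

def schmitt_trigger(samples, low_thresh, high_thresh):
    """Convert an analog photodiode trace into a binary signal.

    The output flips high only once the signal exceeds high_thresh, and
    flips low again only once it drops below low_thresh.
    """
    state = 0
    out = []
    for s in samples:
        if state == 0 and s >= high_thresh:
            state = 1
        elif state == 1 and s <= low_thresh:
            state = 0
        out.append(state)
    return out

# The brief dip to 0.55 stays above low_thresh, so the output holds high
# instead of chattering:
trace = [0.1, 0.2, 0.9, 0.8, 0.55, 0.9, 0.2, 0.1]
print(schmitt_trigger(trace, low_thresh=0.3, high_thresh=0.7))
# -> [0, 0, 1, 1, 1, 1, 0, 0]
```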

I downsampled both signals (diode and logged screen refreshes) to a sample rate of 1 kHz. Then I calculated the cross-correlation between the timestamps of all low→high transitions in both signals, allowing for a lag of up to 200 ms and using the display update announcements as reference.
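As a rough illustration of this step (a sketch with simplified names, not the actual analysis code), the lag search over binary 1 kHz event trains might look like this:

```python
# Illustrative sketch: both signals are binned at 1 kHz (one sample per
# millisecond, 1 = a low->high transition in that bin), then a normalized
# cross-correlation is scanned over lags up to +/-200 ms to find the shift
# that best aligns the two event trains.

def best_lag(announced, measured, max_lag_ms=200):
    """Return (lag_ms, normalized correlation) for two binary 1 kHz trains."""
    n = min(len(announced), len(measured))
    best = (0, 0.0)
    # Normalize by the geometric mean of the event counts, so a perfect
    # one-to-one alignment scores 1.0.
    denom = (sum(announced) * sum(measured)) ** 0.5 or 1.0
    for lag in range(-max_lag_ms, max_lag_ms + 1):
        hits = sum(
            announced[i] * measured[i + lag]
            for i in range(n)
            if 0 <= i + lag < n
        )
        score = hits / denom
        if score > best[1]:
            best = (lag, score)
    return best

# Three events shifted by exactly 5 ms align perfectly at lag 5:
a = [0] * 100
m = [0] * 100
for i in (10, 50, 90):
    a[i] = 1
for i in (15, 55, 95):
    m[i] = 1
print(best_lag(a, m))  # -> (5, 1.0)
```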

The result:
Cross-correlation (n = 123772) peaked at a -82 ms shift; that’s long but not unreasonable. The normalized correlation at that lag was 0.4518, which is not too good.

It seems like there is substantial jitter in my timestamps as well, so the approach to rely on DisplayLink alone does not seem suitable for our needs.

-Philipp

I’m not sure a cross-correlation is necessarily the right way to look at this (particularly if there is a chance for “frame-shift” errors). Also, with just a one-bit code, there are substantial opportunities for sync ambiguity – if you could muster at least one more photodiode, you’d have greater confidence that you’re not matching events that don’t go together. I think you’ll want to match the events up one by one and plot the distribution of the differences between the two event streams. Ideally the distribution should be unimodal with a small standard deviation. When we run that test, we get a strongly bimodal distribution (which we interpret as “on time” and “late timestamp”).
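One possible sketch of this per-event matching (hypothetical names and tolerance, just to illustrate the idea): pair each announced display update with its nearest diode transition and collect the timestamp differences, whose distribution can then be inspected for multimodality.

```python
import bisect

def match_events(announced_ts, measured_ts, max_diff_ms=50):
    """Pair each announced timestamp with its nearest measured timestamp.

    Both inputs are sorted lists of timestamps in milliseconds. Returns
    the (measured - announced) differences; pairs farther apart than
    max_diff_ms are treated as unmatched and dropped.
    """
    diffs = []
    for t in announced_ts:
        i = bisect.bisect_left(measured_ts, t)
        # Candidates are the neighbors on either side of the insertion point.
        candidates = measured_ts[max(0, i - 1):i + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda m: abs(m - t))
        if abs(nearest - t) <= max_diff_ms:
            diffs.append(nearest - t)
    return diffs

# Two matched events (diode trailing by 8 ms and 7 ms); the third announced
# event has no diode transition within 50 ms, so it is dropped:
print(match_events([100, 200, 300], [108, 207, 500]))  # -> [8, 7]
```

A histogram of the returned differences would directly show the unimodal vs. bimodal shape described above.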

  • Dave