Hi Philipp,
However, after populating the code with debug messages, I figured out that glSetFenceAPPLE, glFinishFenceAPPLE, and glTestFenceAPPLE all need a tremendous amount of time to return (usually around 1ms each).
It’s expected that glFinishFenceAPPLE would take a long time to return, since that’s where we actually wait for the buffer swap to complete. However, I don’t see why glSetFenceAPPLE or glTestFenceAPPLE would be taking on the order of 1ms to return, so I’m suspicious of that result.
In addition to that, sometimes the DisplayLink Callback would get activated (I assume that’s also an indicator of a completed VBL), but the FinishFence still had not returned. Thus the Announce Thread would pause for some additional time and block all display updates during this wait, because it holds the display lock.
That’s interesting. I agree that the display link callback getting invoked should indicate that the previous VBL and buffer swap completed. This suggests that there’s a delay in restarting the announcement thread after the buffer swap completes.
One possible explanation for such a delay is that the display link thread runs at a higher scheduling priority than the announcement thread. I should do some experiments to see if that’s true and whether matching their priorities makes any difference.
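(For the record, a quick way to check would be something like the sketch below, using pthreads; calling it from both the display link callback and the announcement thread would show whether their priorities actually differ. The function name is just for illustration.)

#include <pthread.h>
#include <cstdio>

// Print the calling thread's scheduling policy and priority
static void reportThreadPriority(const char *label) {
    int policy;
    struct sched_param param;
    if (pthread_getschedparam(pthread_self(), &policy, &param) == 0) {
        fprintf(stderr, "%s: policy=%d, priority=%d\n",
                label, policy, param.sched_priority);
    }
}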
So what I did was to replace the glFenceAPPLE stuff with a boost::condition_variable (paired with a mutex). The thread that actually frees the waiting Stimulus Announcer is the Display Link Callback. That seemed straightforward to me and works like a charm. It does not solve all the problems (sometimes flushBuffer takes very long), but a lot of the stutter is gone now. That’s at least what I can see by eye (looking at the debug timestamps and the dots on the display).
I’m not sure I understand the change you made. Are you saying that, instead of blocking on glFinishFenceAPPLE, the announcement thread waits on a condition variable, which is signaled at the beginning of the display link callback?
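If so, I’d guess the waiting and signaling look roughly like the following sketch (hypothetical names, not your actual code):

#include <boost/thread/condition_variable.hpp>
#include <boost/thread/locks.hpp>
#include <boost/thread/mutex.hpp>

static boost::mutex swapMutex;
static boost::condition_variable swapCondition;
static bool swapCompleted = false;

// Called at the start of the display link callback
void signalSwapCompleted() {
    {
        boost::unique_lock<boost::mutex> lock(swapMutex);
        swapCompleted = true;
    }
    swapCondition.notify_one();
}

// Called by the announcement thread in place of glFinishFenceAPPLE
void waitForSwapCompleted() {
    boost::unique_lock<boost::mutex> lock(swapMutex);
    while (!swapCompleted) {
        swapCondition.wait(lock);
    }
    swapCompleted = false;  // reset for the next frame
}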
The reason for all the fence business is to determine, as accurately as possible, when the buffer swap actually happened. The display link will tell you when the next swap should happen, and you can use that time and the refresh rate to compute when the previous swap probably happened. But waiting on the fence seems more robust than either of those approaches (although, as noted above, it’s possible that there can be a substantial delay between the buffer swap completing and MWorks being notified).
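For reference, the fence pattern is approximately this (a simplified sketch of the APPLE_fence calls, not the actual MWorks code; a valid OpenGL context must be current on the calling thread):

#include <OpenGL/gl.h>
#include <OpenGL/glext.h>

static GLuint swapFence = 0;

void setupFence() {
    glGenFencesAPPLE(1, &swapFence);   // one-time: create a fence object
}

void markSwapIssued() {
    glSetFenceAPPLE(swapFence);        // insert the fence after the swap commands
}

void waitForSwap() {
    glFinishFenceAPPLE(swapFence);     // block until the GPU has executed past the fence
    // ...record the current time as the approximate swap completion time...
}

void teardownFence() {
    glDeleteFencesAPPLE(1, &swapFence);
}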
However, I do think it’s significant that your change reduces the amount of observable stutter, as it supports the hypothesis that delays in the announcement thread are impacting drawing of the next frame.
It seems to me that calling glFinish() before flushBuffer improves performance. I have no explanation for that, but I know that flushBuffer only calls glFlush(), which is similar to, but not the same as, glFinish().
Interesting. By calling glFinish before flushBuffer, you’re guaranteeing that all drawing has completed on the back buffer before the display link callback exits. I don’t know why that would improve performance, but the fact that it does is noteworthy.
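In CGL terms (flushBuffer is, as I understand it, essentially a wrapper around CGLFlushDrawable), the change amounts to something like this sketch:

#include <OpenGL/OpenGL.h>
#include <OpenGL/gl.h>

// At the end of the display link callback, after all drawing commands
void swapBuffers(CGLContextObj ctx) {
    glFinish();             // block until all submitted GL commands complete
    CGLFlushDrawable(ctx);  // then swap the front and back buffers
}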
Also, it seems like a good idea to force NSOpenGLPFABackingStore, at least for the main display, as this makes the behavior of flushBuffer more predictable (although it disables some performance tweaks).
In what way does it make the behavior of flushBuffer more predictable?
Dynamic Stimuli have no way to access the last frame timestamp. I think it would be a very good idea to have a function in Stimulus Display that returns the last frame time. Right now, all the moving stimuli use the time at which an update is issued for their movement calculations. That can be inaccurate, because it is not perfectly correlated with the VBLs.
Stefan stopped by today, and we discussed this very issue (among others). The upshot is that we agree with what you say. The important thing for a dynamic stimulus to know is what display refresh cycle it’s drawing for (with cycle 0 being the first refresh which displays the stimulus). Given that and the refresh rate, the stimulus can accurately calculate how much time has elapsed (and therefore how much it should have moved) since the last refresh.
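The calculation itself is trivial; as a sketch (hypothetical names):

// Elapsed stimulus time, given the refresh cycle currently being drawn
// (cycle 0 = the first refresh that displays the stimulus) and the
// display's refresh rate in Hz
double elapsedSeconds(long refreshCycle, double refreshRateHz) {
    return static_cast<double>(refreshCycle) / refreshRateHz;
}

// e.g. a stimulus drifting at speed units/second would be drawn at
//   position = startPosition + speed * elapsedSeconds(cycle, refreshRateHz);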
The first type of lost frame can be triggered by expanding a stack from the Dock, so it might be due to a busy GPU. Another hint is that it is possible to eliminate this type of lost frame (at least the portion that occurs without touching the system) by disabling Quartz Extreme via the Quartz Debug tools.
This makes sense. My understanding (based on this Wikipedia entry) is that when Quartz Extreme is enabled, the windowing system uses OpenGL to accelerate composition of all graphics visible on screen into a single frame buffer. This increases the load on the graphics hardware, sometimes to the point where it can’t keep up with both the needs of the windowing system and the needs of MWorks.
The second type of lost frames seems to be related to synchronization issues between the graphics card and the screen.
I wouldn’t rule this out, but I think a more likely culprit is the way that dynamic stimuli determine which frame to draw. I haven’t worked through the math, but I suspect that the stimulus is concluding (in its needDraw method) that it has already drawn the current frame, when in fact it hasn’t.
Actually, you should be able to verify that with another debug statement. Just override needDraw in your stimulus class to look something like this:
bool MyDynamicStimulus::needDraw() {
    bool ret = StandardDynamicStimulus::needDraw();

    // If the stimulus is running but has declined to draw, report it
    if (started && !ret) {
        fprintf(stderr, "stimulus is running but doesn't want to draw!\n");
    }

    return ret;
}
For a stimulus whose frames_per_second is set to the refresh rate, the message should never print. I’ll try this out and let you know what I find. Maybe you can do the same?
I’ll be leaving for the weekend shortly, but I’ll follow up on this stuff with you (and Stefan) next week.
Chris