Draw timing

cstawarz · June 3, 2011, 3:18pm

Hi Philipp,

To explore this issue further, I ran some additional tests on my Mac Pro, this time with both displays connected to a single video card. While I still don’t see the ultra-short update intervals that you observed, I do see many occurrences of “late” stimulus announcements, which happen up to 8ms (i.e. half a refresh cycle) later than I’d expect. Some observations:

Within a given experiment, the timing of the late announcements is very consistent. With a drifting grating, they were all 8600-8700us later than expected; with dynamic random dots, they were all about 2500us late.
Disabling the mirror window eliminates almost all of the late announcements, although I still see a few.
Interacting with the Dock at all (even just moving the mouse over it) results in a large increase in the number of late announcements. The worst case is when I expand a stack over the mirror window, which leads to a ton of late announcements and skipped refreshes.
When both monitors run at the same refresh rate, I can eliminate the late announcements by syncing mirror window buffer swaps to the VBI. In that case, it seems that the only way I can get late announcements is by opening a Dock stack over the mirror window. However, when the two monitors do not run at the same refresh rate, syncing mirror window buffer swaps to the VBL makes timing performance much worse.

The last point makes me think that MWorks should have separate threads drawing to the main and mirror displays, and that in both cases the buffer swaps should be synced to the VBI.

As for the Dock issues, it seems reasonable that opening a translucent stack over the mirror window would degrade performance, since the stack and the window must be composited. However, it’s not clear why interacting with the Dock in other ways degrades performance.

If you think it is worth trying, I could set up a small testing environment with a photo-diode and oscilloscope to check if there are any irregularities in the screen flicker. For that, however, I would need a dynamic stimulus that changes the luminance of the screen (e.g. white vs black background) on every frame.

This probably would be worth trying. My current theory is that the buffer swaps are happening at the right time (i.e. during the VBI), but the announcements are delayed because the graphics card is busy with the mirror window. If that’s correct, then the timing of the luminance changes should be steady, irrespective of whether the announcement is late.

In case you want to give this a shot, I’ve attached an experiment that switches between a black and white background on every frame (using a movie stimulus).

Chris

cstawarz · June 3, 2011, 3:18pm

Attachment: luminance_test.xml (2.36 KB)

cstawarz · June 6, 2011, 12:46pm

Hi Chris,

setting up the diode-test will take some time but for now I compared your movie stimulus experiment to the drifting grating experiment. The results combined with the fact that I am actually able to visually observe the irregularities lead me to the conclusion that the problem is not due to delayed announcements.

First of all, I do see the periodic occurrences of the skipped refreshes in both experiments, but I cannot replicate your finding about late announcements of draw times. Only when I do something demanding for the graphics hardware (e.g. expanding a stack from the dock) the late announcements occur and then they usually come together with skipped refreshes.

I attached a plot to this post that shows several interesting things. I plot frame number versus the time between the frames in usec. All messages are indicated using red dashed lines, regardless of their type (skipped refresh or delayed announcement). Two things can be seen on the plot: First, there are a lot of draw events that are delayed but do not have any type of warning attached to them. Second, the short timings and glitches (the reason for this bug report) are still there and do also not have any warnings. I magnified one relevant example for every experiment (bottom graphs). These are what you call the “ultra short” times, but as I noted previously, there is only one machine in this lab which has the ultra short intervals. All other machines with only one graphics card show a behavior comparable to your Mac-Mini testing machine that has the timing issue I am talking about. This is the type of glitch that I am able to visually observe. Late draw timings seem to produce a type of “stutter” but not the “slow-fast” glitch.

Concerning your idea that there should be two separate threads drawing to the mirror window and the experimental screen, I am on your side. I was wondering before wether this was the case, but since playing with the beam-sync settings in OSX didn’t change anything to the behavior of the draw timings I was under the impression that drawing to the mirror window was not causing the glitches. In previous tests I could also not see any dependence of the position, size and even presence of the mirror window on the draw timings. It did also make no difference wether the mirror was drawn on a screen attached to the same or a different graphics card.
Is there anything new to the code that might have changed this?

Sorry for not having better news or more ideas,
Philipp

cstawarz · June 6, 2011, 12:46pm

Attachment: drawtiming2.jpg (99 KB)

cstawarz · August 2, 2011, 11:25am

Hi Chris,

finally, I found some time to run the diode test. It was a lot of work, so I really hope I can convince you with this data that there exists a problem with the screen refreshes in dynamic stimuli and that it is much worse than I previously thought.

First of all, the setup:

It’s the Mac Pro described in previous posts, running as a headless server with two graphics cards. The stimulus display (a standard Dell TFT E2009W), is hooked up directly via DVI-D cable to a ATI Radeon HD 5870 and is the only screen on this GPU. The system uses an older NVIDIA card that also displays the mirror window. There is also only one screen hooked up to this (primary) GPU.
The client runs on a different system and connects via TCP link. The server was never touched during the experimental runs.
These are the settings that have proven to work most reliably in our previous tests.

The experiment:

I modified your luminance_test example so that it displays exactly 400ms of the black&white flicker stimulus. In addition to that I am sending a trigger pulse to the oscilloscope using our custom made BKOHG IO-Device. It was tested thoroughly and works very reliably.
The experiment xml file is attached to this post.

I hooked a standard photo diode that was taped to the TFT up to a digital oscilloscope. Each experimental run triggered the oscilloscope and recorded the following 500ms of diode activity. A typical reading shows medium activity in the beginning, which is then followed by pulses of low and high (black and white) activity. Because the movie stimulus consists of a white image followed by a black image, the expected result was that from medium activity the reading would go to high diode activity (white screen) after some delay (screen latency). The high activity pulse would then be followed by a period of low activity (black screen) and afterwards again by a period of high activity and so forth.
After 400ms of flicker activity it should go back to medium activity (gray background).

I did 8 runs for the current nightly build, and 8 runs for the 0.4.5 release that many people currently use to obtain their data. There was not a single run that showed the expected result. Readings show lost frames, duplicate frames, late frames, even frames in reverse order (black image first), and irregular timings for screen updates. Detailed results can be seen on the attached pdf sheet.

Up to this point I didn’t do any analysis on the data files obtained during this test. The only thing I can report in addition to the oscilloscope readings is that the only difference between 0.4.5 and 0.5dev that I observed was a high number of late announcements warnings in 0.5dev. The #warnOnSkippedRefresh parameter was set to true, but no such warnings occurred, even though refreshed were obviously skipped.

Given the variety of errors I observed, I am proposing that the core code responsible for updating dynamic stimuli should be revised. Quite obviously, there is something really wrong here, and given the fact that after seeing errors in the Dynamic Random Dots and the Dynamic Drifting Gratings now also the Movie Stimuli have similar problems, the issue seems to be further down in the process hierarchy than previously thought. However, right now I have no idea where to begin the search.

Chris, I really hope that I was now finally able to convince you that there is indeed a (larger) problem concerning dynamic stimuli, but possibly also static ones. If for some reasons, you really need additional measures on the matter, please contact me as this seems to be an issue of highest importance for this (and potentially other) labs.

Best regards,
Philipp

cstawarz · August 2, 2011, 11:25am

Attachments:

summary.pdf (6.51 MB)
luminance_test.xml (4.19 KB)
05d_luminance.mwk (214 KB)
045_luminance.mwk (196 KB)

cstawarz · August 2, 2011, 8:22pm

Hi Philipp,

Thank you for running these tests and sharing the results. I need to take some time to process this stuff, but I’ll try to get back to you ASAP.

Chris

cstawarz · August 5, 2011, 3:43pm

Hi Chris,

I performed some intensive hacking in the graphics core code (mostly StimulusDisplay and a bit in OpenGLContextManager).

It took me a bit to understand how exactly the displays get updated and what synchronizes them to the VBL. However, I think I have a vague idea now.

To make a long story short, I think I found something strange that might be worth investigating. After a display update has completed and was flushed to the FrontBuffer the Announce Thread takes over, acquires the display lock and waits for a VBL sync completion before it starts logging. To do that it uses Synchronization Fences ( glFinishFenceAPPLE(), that are set via glSetFence beforehand). Apple suggests that one does synchronize to the VBL using this method.

However, after population the code with debug messages, I figured out that both glSetFenceAPPLE and glFinishFenceAPPLE (also glTestFence) need a tremendous amount of time to return (usually around 1ms each). In addition to that, sometimes the DisplayLink Callback would get activated (I assume that’s also an indicator for a completed VBL), but the FinishFence did not return. Thus the Announce Thread would pause some additional time and block all display updates during this wait time because it has the display lock.

So what I did was to replace the glFenceAPPLE stuff with a boost::condition_variable that waits for a mutex. The thread that actually frees the waiting Stimulus Announcer is the Display Link Callback. That seemed straight forward to me and works like a charm. It does not solve all the problems (sometimes flushBuffer takes very long) but a lot of the stutter is gone now. That’s at least what I can see by eye (looking at the debug timestamps and the dots on the display). This hacked version does not run with anything else than my test RDP and that’s why I can’t do the diode test with it.

Some other things I found:
It seems to me that calling glFinish() before flushBuffer improves performance. I have no explanation for that, but I know that flushBuffer only calls glFlush() which is similar, but not equal to glFinish().
Also, it seems like a good idea to force NSOpenGLPFABackingStore, at least for the main display, as this makes the behavior of flushBuffer more predictable (although it disables some performance tweaks).
Dynamic Stimuli have no way to access the last frame timestamp. I think it would be a very good idea to have a function in Stimulus Display that returns the last frame time. Right now all the moving stimuli use the time when an update is issued for their movement calculation. That might be inaccurate because it is not perfectly correlated with the VBLs.

Cheers,
Philipp

cstawarz · August 5, 2011, 5:47pm

and here are the results of todays hacking attempt:

I can eliminate all but one type of glitch, or stutter. And this is for my iMac which previously showed all types of errors and hat, in general, the worst performance of all machines I tested so far.

The short timings are completely gone - looks like your intuition about this being an artifact produced by incorrect logged frames was true.
Strange irregularities in the timings are gone - times between frames are either ~16ms (60Hz) or ~32ms (one skipped frame).

There are two types of lost frames: those that seem related to a busy GPU (they have a ‘Lost Frame’ warning attached), and frames that get lost after an ever the same period of time (regular occurring losses). The latter type has no warning attached to it.

The first type of lost frames can be triggered by expanding a stack from the dock, therefore it might be due to a busy GPU. Another hint is that it is possible to eliminate this type of lost frame (the portion of it occurring without touching the system) by disabling Quartz Extreme via the Quartz debug tools. As far as my understanding about the function of Quartz Extreme goes it is a link for the system to use the GPU for calculations of various kinds. Disabling this link should preserve the GPUs processing power to only those tasks that try to draw something to the screen in any given moment. (I might be wrong here, I am not an expert on the topic).

The second type of lost frames seems to be related to synchronization issues between the graphics card and the screen. At least that’s my current theory. Reason for my belief is that skipped frames occur in bursts, or more precisely, regularly. If the display had an internal clock which was used to physically update the screen, and the graphics card had an internal clock which was used to synchronize data transmission between screen and graphics card, then there is a high probability that these clocks are not exactly in sync.
Then, on a regular basis, transmission errors would occur because the screen might be a little bit slower than the graphics card and busy drawing in the exact same moment the graphics card tries to send new data. The transmission would fail, and most importantly would not show up as a lost frame, because it’s only delayed and will be re-send the next time the synchronization clock fires.
It would be easy to test this theory by mixing a set of different graphics cards with a set of different screens. The result (more importantly the frequency of delayed frames) should be different for each pair of graphics card - screen.

However, before I waste more time in investigating this further (it’s not exactly my primary field of work), I am very interested in what you have to say to all this.

Philipp

cstawarz · August 5, 2011, 5:47pm

Attachments:

with_qe.jpg (25.8 KB)
without_qe.jpg (18.1 KB)

cstawarz · August 5, 2011, 8:37pm

Hi Philipp,

However, after population the code with debug messages, I figured out that both glSetFenceAPPLE and glFinishFenceAPPLE (also glTestFence) need a tremendous amount of time to return (usually around 1ms each).

It’s expected that glFinishFenceAPPLE would take a long time to return, since that’s where we actually wait for the buffer swap to complete. However, I don’t see why glSetFenceAPPLE or glTestFenceAPPLE would be taking on the order of 1ms to return, so I’m suspicious of that result.

In addition to that, sometimes the DisplayLink Callback would get activated (I assume that’s also an indicator for a completed VBL), but the FinishFence did not return. Thus the Announce Thread would pause some additional time and block all display updates during this wait time because it has the display lock.

That’s interesting. I agree that the display link callback getting invoked should indicate that the previous VBL and buffer swap completed. This suggests that there’s a delay in restarting the announcement thread after the buffer swap completes.

One possible explanation for such a delay is that the display link thread runs at a higher scheduling priority than the announcement thread. I should do some experiments to see if that’s true and whether matching their priorities makes any difference.

So what I did was to replace the glFenceAPPLE stuff with a boost::condition_variable that waits for a mutex. The thread that actually frees the waiting Stimulus Announcer is the Display Link Callback. That seemed straight forward to me and works like a charm. It does not solve all the problems (sometimes flushBuffer takes very long) but a lot of the stutter is gone now. That’s at least what I can see by eye (looking at the debug timestamps and the dots on the display).

I’m not sure I understand the change you made. Are you saying that, instead of blocking on glFinishFenceAPPLE, the announcement thread waits on a condition variable, which is signaled at the beginning of the display link callback?

The reason for all the fence business is to determine, as accurately as possible, when the buffer swap actually happened. The display link will tell you when the next swap should happen, and you can use that time and the refresh rate to compute when the previous swap probably happened. But waiting on the fence seems more robust than either of those approaches (although, as noted above, it’s possible that there can be a substantial delay between the buffer swap completing and MWorks being notified).

However, I do think it’s significant that your change reduces the amount of observable stutter, as it supports the hypothesis that delays in the announcement thread are impacting drawing of the next frame.

It seems to me that calling glFinish() before flushBuffer improves performance. I have no explanation for that, but I know that flushBuffer only calls glFlush() which is similar, but not equal to glFinish().

Interesting. By calling glFinish before flushBuffer, you’re guaranteeing that all drawing has completed on the back buffer before the display link callback exits. I don’t know why that would improve performance, but the fact that it does is noteworthy.

Also, it seems like a good idea to force NSOpenGLPFABackingStore, at least for the main display, as this makes the behavior of flushBuffer more predictable (although it disables some performance tweaks).

In what way does it make the behavior of flushBuffer more predictable?

Dynamic Stimuli have no way to access the last frame timestamp. I think it would be a very good idea to have a function in Stimulus Display that returns the last frame time. Right now all the moving stimuli use the time when an update is issued for their movement calculation. That might be inaccurate because it is not perfectly correlated with the VBLs.

Stefan stopped by today, and we discussed this very issue (among others). The upshot is that we agree with what you say. The important thing for a dynamic stimulus to know is what display refresh cycle it’s drawing for (with cycle 0 being the first refresh which displays the stimulus). Given that and the refresh rate, the stimulus can accurately calculate how much time has elapsed (and therefore how much it should have moved) since the last refresh.

The first type of lost frames can be triggered by expanding a stack from the dock, therefore it might be due to a busy GPU. Another hint is that it is possible to eliminate this type of lost frame (the portion of it occurring without touching the system) by disabling Quartz Extreme via the Quartz debug tools.

This makes sense. My understanding (based on this Wikipedia entry) is that when Quartz Extreme is enabled, the windowing system uses OpenGL to accelerate composition of all graphics visible on screen into a single frame buffer. This increases the load on the graphics hardware, sometimes to the point where it can’t keep up with both the needs of the windowing system and the needs of MWorks.

The second type of lost frames seems to be related to synchronization issues between the graphics card and the screen.

I wouldn’t rule this out, but I think a more likely culprit is the way that dynamic stimuli determine which frame to draw. I haven’t worked through the math, but I suspect that the stimulus is concluding (in it’s needDraw method) that it has already drawn the current frame, when in fact it hasn’t.

Actually, you should be able to verify that with another debug statement. Just override needDraw in your stimulus class to look something like this:

bool MyDynamicStimulus::needDraw() {
    bool ret = StandardDynamicStimulus::needDraw();
    if (started && (!ret))
        fprintf(stderr, "stimulus is running but doesn't want to draw!\n");
    return ret;
}

For a stimulus whose frames_per_second is set to the refresh rate, the message should never print. I’ll try this out and let you know what I find. Maybe you can do the same?

I’ll be leaving for the weekend shortly, but I’ll follow up on this stuff with you (and Stefan) next week.

Chris

cstawarz · August 6, 2011, 2:34pm

The second type of lost frames seems to be related to synchronization issues between the graphics card and the screen. At least that’s my current theory.

You run your test on a Dell TFT? Then I would agree. Most of the cheaper TFTs run internally around 50Hz - but communicate with the graphic card with 60Hz. Only the incredible expensive Quato CutterScreens, good Gamer TFTs and the cheap 3D-TFTs (like the 120Hz Samsung 2233RZ) work extern & intern with the same speed.

Philipp, maybe you can check this with one of the better screens (por favor)?

Gruß Ralf

cstawarz · August 6, 2011, 4:26pm

Ok, todays programming session yielded the following results:

Having debug messages print to the system console via ‘fprintf(stderr,“foo +%f\n”,timestamp);’ eliminates most of my previous findings. This a bit embarrassing, but I have to admit that I might be wrong about all my findings involving the APPLE Fence and the flush timing problems. It seems like the mprintf stream was simply overloaded and thus might even have caused some of the reported errors. Sorry about that.

Using the current nightly build I still have the slow-fast glitch when using the APPLE Fence, although I didn’t try modifying it like yesterdays 0.4.4 source.

I’m not sure I understand the change you made. Are you saying that, instead of blocking on glFinishFenceAPPLE, the announcement thread waits on a condition variable, which is signaled at the beginning of the display link callback?

Exactly. Although I might, given what you say about the Callback assuming that the VBL happened is true, just changed the logged timestamps to this assumed-to-be-regular VBL. The big question now is… what’s more accurate here?

Applying a simple fix (for you it’s probably a crude hack), solves the lost-without-warning frame issue. It looks like this:

//pschwed: allow for 1Hz asynchrony between real display refresh rate and requested value
if (interval_us > 1000) {
    interval_us -= 1000;
} 
else {
    merror(M_DISPLAY_MESSAGE_DOMAIN,
           "Requested frame rate (%g) is rather short",
           frameRate);
}

and goes into the play() function of DynamicStimulusDriver.

The purpose of this code is to add a 5% confidence interval to the assumed refresh rate of the screen.

I wouldn’t rule this out, but I think a more likely culprit is the way that dynamic stimuli determine which frame to draw. I haven’t worked through the math, but I suspect that the stimulus is concluding (in it’s needDraw method) that it has already drawn the current frame, when in fact it hasn’t.

I probed it with debug messages and I think that’s exactly what happens. To evaluate if a stimulus needs a redraw, the stimulus is asked if it is playing. In addition there is a check for the frameNumber that is calculated by the DynamicStimulusDriver. If the same frame happens to be requested for draw a second time, the Driver returns that there is no need to draw again. That makes a lot of sense, but has a nasty pitfall:
Frame number is calculated by taking the current time and dividing it by the expected time for one frame. The result is then truncated to the next integer. The problem here is that an estimate for a frame time is always slightly off and when it is actually longer than the real refresh interval, it happens now and then (depending of how much the estimate is off), that the truncation to the next integer eats up a whole frame. The simplest way to avoid this is to make sure that the estimate is shorter than the real refresh period. With the fix I am shortening the estimate by 1Hz, which seemes like a good compromise to me. If you choose to apply a fix like this, you should probably check that requested framerates are at least 3Hz (for whoever will be using such a strange setting).

** OK, lets look closer at a slow-fast glitch (figure attached) - this is not a lost frame, it has no warning attached, it’s just strange**

06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback is active (this is the request fort the next frame)
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread working on all the announcements .. +16730.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread is done and ends. Timestamp restetted to logged time. +25.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback has unique display lock and starts processing now +36.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback attempts to refresh Display +44.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Frame Number Evaluation yielded: 1907
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues draw to main +52.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Frame Number Evaluation yielded: 1907
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	mData::addElement(Datum value)
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Can't add element in order in something other than M_LIST -- Type => 6
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues update and Flush (main) +158.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display sets APPLE synchronization fence +250.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display calls for the announce thread +259.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues draw to mirror +270.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread alive +274.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread has Display lock +283.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Frame Number Evaluation yielded: 1907
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues update and Flush (mirror) +312.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display hears from the announce thread and ends. +458.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread has synchronized with Refresh Display +462.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback has finished updating Display +466.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback ends +498.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread now waits for APPLE Fence +512.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback is active (this is the request fort the next frame)
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread working on all the announcements .. +24804.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread is done and ends. Timestamp restetted to logged time. +32.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback has unique display lock and starts processing now +49.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback attempts to refresh Display +58.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Frame Number Evaluation yielded: 1909
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues draw to main +66.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Frame Number Evaluation yielded: 1909
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	mData::addElement(Datum value)
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Can't add element in order in something other than M_LIST -- Type => 6
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues update and Flush (main) +168.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display sets APPLE synchronization fence +212.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display calls for the announce thread +217.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues draw to mirror +231.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread alive +237.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Frame Number Evaluation yielded: 1909
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread has Display lock +249.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues update and Flush (mirror) +273.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display hears from the announce thread and ends. +423.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread has synchronized with Refresh Display +428.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback has finished updating Display +430.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback ends +445.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread now waits for APPLE Fence +462.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback is active (this is the request fort the next frame)
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread working on all the announcements .. +8554.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread is done and ends. Timestamp restetted to logged time. +20.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback has unique display lock and starts processing now +53.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback attempts to refresh Display +66.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Frame Number Evaluation yielded: 1910
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues draw to main +79.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Frame Number Evaluation yielded: 1910
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	mData::addElement(Datum value)
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Can't add element in order in something other than M_LIST -- Type => 6
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues update and Flush (main) +190.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display sets APPLE synchronization fence +234.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display calls for the announce thread +239.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues draw to mirror +247.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread alive +249.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread has Display lock +257.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Frame Number Evaluation yielded: 1910
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display issues update and Flush (mirror) +289.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Refresh Display hears from the announce thread and ends. +420.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback has finished updating Display +427.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread has synchronized with Refresh Display +428.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback ends +430.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread now waits for APPLE Fence +468.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Display Link Callback is active (this is the request fort the next frame)
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread working on all the announcements .. +16701.000000
06.08.11 17:29:07	[0x0-0xa40a4].org.mworks-project.MWServer[3656]	Announce Thread is done and ends. Timestamp restetted to logged time. +22.000000

This is the logging part corresponding to the two strangely timed frames plus one normal timed frame before and after those two. Look at Announce Thread working on all the announcements : It’s timestamp is what shows up in the plot as delta t (usec). Everything points to the fact that the APPLEFence really IS not on time, or the announce thread misses the Fences occurrence sometimes, whatever you prefer.

Also, it seems like a good idea to force NSOpenGLPFABackingStore, at least for the main display, as this makes the behavior of flushBuffer more predictable (although it disables some performance tweaks).

In what way does it make the behavior of flushBuffer more predictable?

Well, it makes sure that buffers are copied rather than swapped. Copying is slower, but the same technique for flushing would apply to all machines, even the older ones.

Stefan stopped by today, and we discussed this very issue (among others). The upshot is that we agree with what you say. The important thing for a dynamic stimulus to know is what display refresh cycle it’s drawing for (with cycle 0 being the first refresh which displays the stimulus). Given that and the refresh rate, the stimulus can accurately calculate how much time has elapsed (and therefore how much it should have moved) since the last refresh.

Yes. But the dynamic stimulus is rendering for the next refresh and therefore has to make an assumption about when this refresh is going to happen. Right now it takes the last “inter-frame-interval”, or worse, the last “inter-update-call-interval”. Having the first would give a better estimate for the time that really elapses between two updates and therefore could the dynamic stimulus render independent to the time when the rendering was issued.

A good compromise with little to no overhead would be to take the requested framerate and combine it with the last logged VBL timestamp. Like that we would get a very good estimate of when the next update is going to happen and therewith where the dynamic objects should be at that time. If for some reason the VBL sync does not happen regularly, we know about that because it’s in the datafile and there are warnings on the console (hopefully). Like that it’s at least possible to take care of irregularities in the offline analysis of the datafile.

I’ll also begin my weekend now, but what do you think about going back to skype? At least for me, it takes a bit of an effort to write these reports and I assume were the only ones reading them…

Philipp

cstawarz · August 6, 2011, 4:26pm

Attachment: nightly_with_fix.jpg (24.5 KB)

cstawarz · August 6, 2011, 4:32pm

Philipp, maybe you can check this with one of the better screens (por favor)?

Yep, yep… soon

cstawarz · August 6, 2011, 9:44pm

I just read This MWorks discussion. Isn’t that somehow related?

It seems to me that we are ready to reopen that one.

cstawarz · August 9, 2011, 2:46pm

Most of the cheaper TFTs run internally around 50Hz - but communicate with the graphic card with 60Hz.

Are you saying that the display receives frame buffers from the graphics card at 60Hz, but actually updates the LCD at 50Hz? How does that work?

I would really like to see Philipp’s oscilloscope tests re-run on a display with known-sensible timing characteristics. Any chance you could find a CRT to run them on?

Chris

cstawarz · August 9, 2011, 2:57pm

I just read This MWorks discussion. Isn’t that somehow related?

No. The symptom in that case was a very obvious frame tear or full-screen flicker (depending on whether you used a CRT or an LCD) every time you did a display update. It had nothing to do with dynamic stimuli or the GL fence. It also didn’t manifest on every graphics card, which led us to conclude that it was a regression in the drivers for certain cards that appeared in one of the 10.6.x updates.

Subsequent to the fix I described in that thread, Dave re-implemented the fullscreen display code using Apple’s recommended method for 10.6. That change and/or subsequent 10.6.x driver updates eliminated the problem.

Chris

cstawarz · August 9, 2011, 4:10pm

Using the current nightly build I still have the slow-fast glitch when using the APPLE Fence, although I didn’t try modifying it like yesterdays 0.4.4 source.

You mean 0.4.5, right? The stimulus display code changed significantly between 0.4.4 and 0.4.5 (e.g. that’s when we started using the display link), so I really hope we haven’t been talking about results obtained under 0.4.4.

Frame number is calculated by taking the current time and dividing it by the expected time for one frame. The result is then truncated to the next integer. The problem here is that an estimate for a frame time is always slightly off and when it is actually longer than the real refresh interval, it happens now and then (depending of how much the estimate is off), that the truncation to the next integer eats up a whole frame. The simplest way to avoid this is to make sure that the estimate is shorter than the real refresh period. With the fix I am shortening the estimate by 1Hz, which seemes like a good compromise to me.

I suppose that’s one way to do it. I was thinking that the base class would keep track of how many refresh cycles have elapsed since it started playing (which would require giving it access to more of the timing info passed to the display-link callback). That seems cleaner and more straightforward to me, although we’d have to be sure to include any skipped refresh cycles in the count.

OK, lets look closer at a slow-fast glitch (figure attached) - this is not a lost frame, it has no warning attached, it’s just strange

If I’m interpreting your time stamps correctly, what these results show is a single instance of glFinishFenceAPPLE returning “late”, by which I mean “later than we’d like”, since the API makes no guarantee about when it will return, other than that it will be after all subsequent GL actions have completed.

The late return corresponds to the time stamp +24804.000000. In this case, glFinishFenceAPPLE doesn’t return until half way in to the next refresh cycle (i.e. ~8ms after the VBI and buffer swap). Despite the late return, drawing for the next refresh does complete on time, which is why the time stamp for the next announcement is +8554.000000 (i.e. half a refresh cycle after the previous announcement, and coincident with the next VBI and buffer swap). Hence, the “short” interval here is just an artifact of the method you’re using to compare display update times, and the corresponding display update is actually right on time.

Nonetheless, these results indicate that we can’t rely on glFinishFenceAPPLE returning quickly enough to avoid impacting drawing of the subsequent frame. Yesterday, Stefan, Jim, and I discussed this problem and outlined several possible solutions:

Try raising the priority of the thread calling glFinishFenceAPPLE, so that it’s less likely to be delayed if the CPU is busy. If nothing else, it should match the priority of the display link thread, which is not true currently (I think).
Remove the fence wait and instead rely on the timing info provided by the display link to estimate when each display update occurs. Essentially, that would mean that the logged time of a display update would be outputTime->hostTime, unless we detect that the next draw occurs late. Here’s what this would look like in pseudocode:
```
 // Announce the previous frame
 if (numSkippedFrames > 0) {
   announcementTime += numSkippedFrames * refreshPeriod;
 }
 announceDisplayUpdate(announcementTime);
 
 // Draw the current frame
 ...
 
 // Record the expected annoucement time for the current frame
 annoucementTime = outputTime->hostTime;
```
For this approach to be reliable, we would need to be confident that if frames were skipped, then the previously-drawn frame actually appeared on screen during the most recent refresh. As far as I know, the display link in no way guarantees this, so this approach makes me nervous.
Use some other GL mechanism to determine when each buffer swap happens. OpenGL 3.2 (available in OS X 10.7) includes the GL_ARB_sync extension, which provides a standardized replacement for the Apple-specific fence mechanism. Theoretically, there’s no reason why this would work better than the current approach, but it’s possible that the standard mechanism will be better supported by graphics hardware vendors.

More promising is the GL_ARB_timer_query extension, which allows GL client code to get timing information from the graphics pipeline asynchronously. Using this extension, we could issue a glQueryCounter call immediately after flushBuffer, and then wait until the beginning of the next draw loop to access the result and announce the draw. This would eliminate the need for a separate thread that waits on a fence, potentially delaying the next frame draw.

Unfortunately, the timer query extension is part of OpenGL 3.3, which is not fully supported under 10.7. However, the extension is supported on the two 10.7 systems I have access to (one of which has an ATI GPU, the other NVIDIA, so it doesn’t seem to be a vendor-specific feature).

More problematically, updating MWorks to use OpenGL 3.2 would require rewriting the drawing code for most stimuli, as 3.2 breaks compatibility with 2.1. That would require a significant amount of work. However, we will have to make that jump eventually, so the question is really “how soon?”.

Chris

cstawarz · August 9, 2011, 4:12pm

other than that it will be after all subsequent GL actions have completed.

I meant that it will be after all preceding GL actions have completed. Sorry.

Topic		Replies	Views
Stim display update delay patterns Support	3	95	April 2, 2024
Can't get stimulus display to update at 60Hz Support	20	185	July 20, 2022
Timing Issue in MWorks Support	6	124	July 19, 2022
Help debugging? Support	9	139	July 19, 2022
Ball Moving Jittered on Monitor Support	12	91	July 19, 2022

Draw timing

Related topics