Closed Bug 867183 Opened 7 years ago Closed 6 years ago

Intermittent image\test\unit\test_async_notification.js | Test timed out

Categories

(Core :: ImageLib, defect)

x86_64
Windows 8
defect
Not set

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox24 --- affected
firefox25 --- affected
firefox26 --- affected

People

(Reporter: RyanVM, Assigned: seth)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 8 obsolete files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=22413278&tree=Mozilla-Central

WINNT 6.2 mozilla-central pgo test xpcshell on 2013-04-30 07:12:48 PDT for push dd0c611a0a27
slave: t-w864-ix-020

07:15:58     INFO -  TEST-INFO | C:\slave\test\build\tests\xpcshell\tests\image\test\unit\test_async_notification.js | running test ...

command timed out: 1200 seconds without output, attempting to kill
program finished with exit code 1
Is there really an issue with the test itself? I see random issues on Windows that AFAICT are due to the test harness being unreliable.

If this doesn't reoccur within a month, I'm inclined to WORKSFORME the bug.
(In reply to Seth Fowler [:seth] from comment #1)
> Is there really an issue with the test itself? I see random issues on
> Windows that AFAICT are due to the test harness being unreliable.
> 
> If this doesn't reoccur within a month, I'm inclined to WORKSFORME the bug.

No such luck :(

Does the new logging help at least?
Flags: needinfo?(seth)
(In reply to Ryan VanderMeulen [:RyanVM][Out May 23-27] from comment #7)
> No such luck :(
> 
> Does the new logging help at least?

It seems like the frequency has suddenly jumped up as of the past week or so. Odd.

The new logging looks like it shows how far we get before things grind to a halt. That is probably going to be very useful. The next step is to correlate this with the test code and figure out which notification is never getting delivered (which I'm guessing is the problem here).
Flags: needinfo?(seth)
Summary: Intermittent image/test/unit/test_async_notification.js command timed out: 1200 seconds without output, attempting to kill → Intermittent image\test\unit\test_async_notification.js | Test timed out
Seth, any chance you could take a fresh look at this?
Flags: needinfo?(seth)
Off the top of my head, I wonder if this is because it's now possible for us to finish decoding synchronously from within callbacks from decode notifications (eg if someone calls RequestDecode() from a notification, we will see if a thread has finished decoding an image, sending out notifications), and that might screw up the logic of these tests.

This is another thing that might be fixed by not doing reentrant notifications.
(In reply to Joe Drew (:JOEDREW! \o/) from comment #88)
> This is another thing that might be fixed by not doing reentrant
> notifications.

Is there a bug tracking that work that we can mark this as depending on?
Presuming that this is a regression from multithreaded image decoding.
Assignee: nobody → joe
Blocks: 716140
Flags: needinfo?(seth) → needinfo?
Flags: needinfo?
Attached patch don't reentrantly notify (obsolete) — Splinter Review
I earlier tried not calling FinishedSomeDecoding reentrantly, and that failed. But really all we care about is not notifying reentrantly. This might be safer.

Try: https://tbpl.mozilla.org/?tree=Try&rev=01a8f5391835
Attachment #780598 - Flags: review?(seth)
Attached patch whitespace-insensitive (obsolete) — Splinter Review
A whitespace-insensitive diff of attachment 780598 [details] [diff] [review]
Unfortunately this patch broke some browser-chrome tests, among others. Ultrasigh.
Comment on attachment 780598 [details] [diff] [review]
don't reentrantly notify

Review of attachment 780598 [details] [diff] [review]:
-----------------------------------------------------------------

I'm clearing the review flag; set it again once the tests pass.

::: image/src/RasterImage.cpp
@@ +2989,5 @@
>  
> +    mNotifying = false;
> +
> +  } else {
> +    DecodeDoneWorker::NotifyFinishedSomeDecoding(this, aRequest);

So if someone tries to reentrantly notify we perform the notification asynchronously? Not surprised this broke some tests. Hopefully it's the tests that are bad and there's not a good reason to actually require these notifications to be synchronous.

::: image/src/RasterImage.h
@@ +645,5 @@
>    // END LOCKED MEMBER VARIABLES
>  
>    bool                       mInDecoder;
>  
> +  bool                       mNotifying;

Nit: Any reason this (and mInDecoder, for that matter) aren't bitfields?
Attachment #780598 - Flags: review?(seth)
(In reply to Seth Fowler [:seth] from comment #183)
> ::: image/src/RasterImage.cpp
> @@ +2989,5 @@
> >  
> > +    mNotifying = false;
> > +
> > +  } else {
> > +    DecodeDoneWorker::NotifyFinishedSomeDecoding(this, aRequest);
> 
> So if someone tries to reentrantly notify we perform the notification
> asynchronously? Not surprised this broke some tests. Hopefully it's the
> tests that are bad and there's not a good reason to actually require these
> notifications to be synchronous.

Yeah, that's the intent. Unfortunately I think there's a code requirement for this; hopefully only one that I can fix and be done with it.

> Nit: Any reason this (and mInDecoder, for that matter) aren't bitfields?

Yep: you can't locklessly update bitfields on multiple threads.

(mNotifying could probably be a bitfield since it should only be modified on the main thread.)
Joe, any more luck with this toporange since comment 188? :-)
Flags: needinfo?(joe)
Assignee: joe → nobody