Closed Bug 738645 Opened 8 years ago Closed 7 years ago

Intermittent Robocop testAboutPage | page title match - about: page title is correct

Categories

(Firefox for Android :: General, defect)

All
Android
defect
Not set

Tracking

()

RESOLVED WORKSFORME
Tracking Status
blocking-fennec1.0 --- -

People

(Reporter: kats, Assigned: wesj)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 1 obsolete file)

cc'ing wesj since the first cset i see this on was his.
One patch that went in a bit before this started cropping up was bug 738137, which could potentially affect layers rendering on android.
(In reply to Jeff Gilbert [:jgilbert] from comment #3)
> One patch that went in a bit before this started cropping up was bug 738137,
> which could potentially affect layers rendering on android.

I'm guessing that comment was intended for bug 738652? (There's two different kinds of intermittent failures in robocop at the moment)
(In reply to Kartikaya Gupta (:kats) from comment #4)
> (In reply to Jeff Gilbert [:jgilbert] from comment #3)
> > One patch that went in a bit before this started cropping up was bug 738137,
> > which could potentially affect layers rendering on android.
> 
> I'm guessing that comment was intended for bug 738652? (There's two
> different kinds of intermittent failures in robocop at the moment)

Yes, sorry.
(In reply to Kartikaya Gupta (:kats) from comment #2)
> Trying a possible fix at https://tbpl.mozilla.org/?tree=Try&rev=95c80b2ac614

Wesj: see the top patch on this try run. It seems to fix the testAboutPage and makes testFormHistory and testPasswordProvider consistently fail. I think your patch changes behaviour so these tests are incorrect. Do you agree?
https://tbpl.mozilla.org/php/getParsedLog.php?id=10324137&tree=Firefox
https://tbpl.mozilla.org/php/getParsedLog.php?id=10323394&tree=Mozilla-Inbound
Summary: Robocop: testAboutPage is failing intermittently → Robocop: testAboutPage is failing intermittently (page title match - about: page title is correct)
Assignee: nobody → wjohnston
Yeah, those tests expect that the provider's aren't initialized, but we changed their behavior.

These failures probably shouldn't have been starred without a backout or a fix in hand.
Summary: Robocop: testAboutPage is failing intermittently (page title match - about: page title is correct) → Intermittent Robocop testAboutPage | page title match - about: page title is correct
blocking-fennec1.0: --- → -
This is kats patch plus some changes to the Passwords/FormHistory tests:

https://tbpl.mozilla.org/?tree=Try&rev=8b6ca0c96afb
There are 2 cases when the tests can fail:
1. The Robocop receives the DOMContentLoaded event _before_ GeckoApp receives it.
2. The Robocop tries to look at the title _before_ GeckoApp informs it's UI thread to update the title.

Though Fennec works fine, the way to fix the tests are:
1. The tests should wait for DOMContentLoaded event, just like now.
2. The tests should wait for change in EditText.
   -- this would require an EventExpecter added, which waits for onTextChanged event on the specified EditText / TextView.

Things to consider:
1. What if we don't get the event from EventExpecter at all? (same as, what will happen if we don't get an event from Gecko, which we are eagerly waiting for?)
2. If we aren't waiting for UI changes, shouldn't the normal test wait for GeckoApp to process the event, before Robocop process it?
Attached patch Patch (obsolete) — Splinter Review
As per discussions in IRC, this patch adds a delay before notifying the blocking threads about received event. This allows GeckoApp to process the event before the Test does.
Attachment #610230 - Flags: review?(mark.finkle)
Comment on attachment 610230 [details] [diff] [review]
Patch

I don't think this does what you want it to do. You're wait()ing on the thread that is dispatching the event, which means that the other listeners that are handling this event will probably also not be running. A better place to do the sleep is in the blockForEvent() function, after exiting the while loop but before returning from the function.
Comment on attachment 610230 [details] [diff] [review]
Patch

This looks OK to me as a start. We could try different approaches for the "wait" if we need to. I'd love to find a different magical way to handle this type race.

Adding Joel for a review too.

More importantly, does this "fix" the oranges we think it should?
Attachment #610230 - Flags: review?(mark.finkle)
Attachment #610230 - Flags: review?(jmaher)
Attachment #610230 - Flags: review+
(In reply to Kartikaya Gupta (:kats) from comment #179)
> Comment on attachment 610230 [details] [diff] [review]
> Patch
> 
> I don't think this does what you want it to do. You're wait()ing on the
> thread that is dispatching the event, which means that the other listeners
> that are handling this event will probably also not be running. A better
> place to do the sleep is in the blockForEvent() function, after exiting the
> while loop but before returning from the function.

I don't understand this part. How do blockForEvent() help us? notifyForEvent() notifies all listeners. And I'm waiting to send the notification. Isn't it right? blockForEvent() will wait the caller right? So, a further wait of 1000ms, can be cancelled any time by the notify from notifyForEvent() right?
Comment on attachment 610230 [details] [diff] [review]
Patch

Review of attachment 610230 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry Sriram...just a couple more drive-by comments.

::: build/mobile/robocop/FennecNativeActions.java.in
@@ +187,5 @@
>                  "received event " + mGeckoEvent);
>              synchronized (this) {
>                  mEventReceived = true;
> +
> +                // Waiting for specified delay to allow Fennec to process the event.

Can we make this comment something like "Delay notification to reduce the chance of random test failures caused by delayed processing in Fennec."?

@@ +200,5 @@
>      }
> +
> +    public EventExpecter expectGeckoEvent(String geckoEvent) {
> +        // Default time to wait before notifying the expecters, in milliseconds.
> +        return expectGeckoEvent(geckoEvent, 1000);

1000 ms seems like a long time to me -- would 500 ms do? 100?? We don't want to cut it too short (and risk even less frequent random failures), but 1000 ms is arguably longer than a human would wait before reacting.
Attached patch PatchSplinter Review
This incorporates kats and geoffbrown's suggestions.
Attachment #610230 - Attachment is obsolete: true
Attachment #610230 - Flags: review?(jmaher)
Attachment #610261 - Flags: review?(mark.finkle)
Attachment #610261 - Flags: review?(jmaher)
Comment on attachment 610261 [details] [diff] [review]
Patch

The approach is still OK with me, but let's get kats to do the review, since I failed the first time.
Attachment #610261 - Flags: review?(mark.finkle) → review?(bugmail.mozilla)
Comment on attachment 610261 [details] [diff] [review]
Patch

Review of attachment 610261 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm
Attachment #610261 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 610261 [details] [diff] [review]
Patch

Review of attachment 610261 [details] [diff] [review]:
-----------------------------------------------------------------

we should run this on try a few times
Attachment #610261 - Flags: review?(jmaher) → review+
FYI, this might have been caused by bug 753845, a fix for which is now on inbound.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.