Closed Bug 1111137 Opened 10 years ago Closed 6 years ago

Intermittent test_user_agent_overrides.html | Navigator UA not overridden at step 1 - got Mozilla/5.0 (Android; Mobile; rv:37.0) Gecko/37.0 Firefox/37.0, expected DummyUserAgent

Categories

(Core :: Networking, defect, P3)

36 Branch
x86_64
Android
defect

Tracking


RESOLVED INACTIVE
Tracking Status
firefox35 --- disabled
firefox36 --- disabled
firefox37 --- disabled
firefox38 --- disabled
firefox-esr31 --- disabled

People

(Reporter: KWierso, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [test disabled on Android][leave open][necko-backlog])

14:15:58 INFO - 125 INFO TEST-START | /tests/netwerk/test/mochitests/test_user_agent_overrides.html
14:15:58 INFO - 126 INFO Overriding mochi.test with DummyUserAgent
14:15:58 INFO - 127 INFO TEST-PASS | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | request failed
14:15:58 INFO - 128 INFO TEST-PASS | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | invalid response
14:15:58 INFO - 129 INFO TEST-UNEXPECTED-FAIL | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | Navigator UA not overridden at step 1 - got Mozilla/5.0 (Android; Mobile; rv:37.0) Gecko/37.0 Firefox/37.0, expected DummyUserAgent
14:15:58 INFO - 130 INFO TEST-PASS | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | request failed
14:15:58 INFO - 131 INFO TEST-PASS | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | invalid response
14:15:58 INFO - 132 INFO TEST-PASS | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | Header UA not overridden at step 2
14:15:58 INFO - 133 INFO TEST-PASS | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | Navigator UA not restored at step 3
14:15:58 INFO - 134 INFO TEST-PASS | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | request failed
14:15:58 INFO - 135 INFO TEST-PASS | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | invalid response
14:15:58 INFO - 136 INFO TEST-PASS | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | Header UA not restored at step 4
14:15:58 INFO - 137 INFO Overriding mochi.test with \wozilla#DummyUserAgent
14:15:58 INFO - 138 INFO TEST-PASS | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | request failed
14:15:58 INFO - 139 INFO TEST-PASS | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | invalid response
14:15:58 INFO - 140 INFO TEST-UNEXPECTED-FAIL | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | Navigator UA not overridden at step 5 - got Mozilla/5.0 (Android; Mobile; rv:37.0) Gecko/37.0 Firefox/37.0, expected DummyUserAgent/5.0 (Android; Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
14:15:58 INFO - 141 INFO TEST-OK | /tests/netwerk/test/mochitests/test_user_agent_overrides.html | took 17837ms
14:15:58 INFO - 142 INFO TEST-START | /tests/netwerk/test/mochitests/test_user_agent_updates.html
14:15:58 INFO - 143 INFO TEST-UNEXPECTED-FAIL | /tests/netwerk/test/mochitests/test_user_agent_updates.html | Test timed out. - expected PASS
14:15:58 INFO - 144 INFO TEST-OK | /tests/netwerk/test/mochitests/test_user_agent_updates.html | took 330044ms
14:15:58 INFO - 145 INFO TEST-START | /tests/netwerk/test/mochitests/test_user_agent_updates_reset.html
14:15:58 INFO - org.mozilla.fennec still alive after SIGABRT: waiting...
14:15:58 WARNING - TEST-UNEXPECTED-FAIL | /tests/netwerk/test/mochitests/test_user_agent_updates_reset.html | application timed out after 330 seconds with no output
14:15:58 INFO - INFO | automation.py | Application ran for: 0:34:05.500884
14:15:58 INFO - INFO | zombiecheck | Reading PID log: /tmp/tmpiBZQbcpidlog
14:15:58 INFO - Contents of /data/anr/traces.txt:
14:15:58 INFO - ----- pid 597 at 2014-12-12 14:13:49 -----
14:15:58 INFO - Cmd line: org.mozilla.fennec
14:15:58 INFO - DALVIK THREADS:
14:15:58 INFO - (mutexes: tll=0 tsl=0 tscl=0 ghl=0 hwl=0 hwll=0)
14:15:58 INFO - "main" prio=5 tid=1 NATIVE
14:15:58 INFO - | group="main" sCount=1 dsCount=0 obj=0x4001f188 self=0xce48
14:15:58 INFO - | sysTid=597 nice=0 sched=0/0 cgrp=default handle=-1345006528
14:15:58 INFO - | schedstat=( 7296340783 13505211533 2269 )
14:15:58 INFO - at android.os.MessageQueue.nativePollOnce(Native Method)
14:15:58 INFO - at android.os.MessageQueue.next(MessageQueue.java:119)
14:15:58 INFO - at android.os.Looper.loop(Looper.java:117)
14:15:58 INFO - at android.app.ActivityThread.main(ActivityThread.java:3683)
14:15:58 INFO - at java.lang.reflect.Method.invokeNative(Native Method)
14:15:58 INFO - at java.lang.reflect.Method.invoke(Method.java:507)
14:15:58 INFO - at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
14:15:58 INFO - at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
14:15:58 INFO - at dalvik.system.NativeStart.main(Native Method)
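For readers without the test in front of them: the "Navigator UA" and "Header UA" lines in the log above correspond to two different checks — the UA string visible to the page DOM, and the UA actually sent on the wire and echoed back by a server-side helper. A rough, hedged sketch of those two checks (not the actual test_user_agent_overrides.html source; the helper URL name is hypothetical):

// Illustrative only; "user_agent.sjs" is a made-up server-side helper
// that echoes back the request's User-Agent header.
function checkNavigatorUA(expected, step) {
  // Produces the "Navigator UA not overridden at step N" messages above.
  is(navigator.userAgent, expected,
     "Navigator UA not overridden at step " + step);
}

function checkHeaderUA(expected, step, done) {
  // Produces the "Header UA ..." messages; "request failed" and
  // "invalid response" are the labels of the two sanity checks.
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "user_agent.sjs");
  xhr.onload = function () {
    ok(xhr.status === 200, "request failed");
    ok(typeof xhr.responseText === "string", "invalid response");
    is(xhr.responseText, expected,
       "Header UA not overridden at step " + step);
    done();
  };
  xhr.send();
}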
A bunch of these were getting starred into bug 1065132 before this got filed too.
Flags: needinfo?(nchen)
Ehsan, I see that you have some history with this infra. Any chance you can help me find an owner for this very frequent failure?
Flags: needinfo?(ehsan.akhgari)
I'll take a look today or tomorrow.
Flags: needinfo?(nchen)
(In reply to TBPL Robot from comment #33)
> repository: mozilla-esr31

So does this test depend on an external resource which changed, and thus the test should be disabled on every single branch, or does it depend on something related to the infra it runs on?
(In reply to Phil Ringnalda (:philor) from comment #109)
> (In reply to TBPL Robot from comment #33)
> > repository: mozilla-esr31
>
> So does this test depend on an external resource which changed, and thus the
> test should be disabled on every single branch, or does it depend on
> something related to the infra it runs on?

The test doesn't depend on external resources. I want to say it's some kind of infra issue and we should just disable it on Android for now. (I couldn't even get it to reproduce on try after I added some logging: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8b5fc5d72b52)
Apologies for the delay here.

I have a theory here. UserAgentOverrides.init() calls UserAgentUpdates.init() which is asynchronous, and relies on the general.useragent.override.* prefs set by the test. The test first calls UserAgentOverrides.init() and then sets the pref asynchronously. I think this is a race condition that can potentially break overriding the UA string, as happens in these test failures.
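A minimal sketch of the ordering described above, assuming the general.useragent.override.<host> pref naming and the UserAgentOverrides.jsm module path; this is illustrative, not the actual test or module code:

// Sketch of the suspected race; the module path and pref name are assumptions.
const { utils: Cu } = Components;
Cu.import("resource://gre/modules/Services.jsm");
Cu.import("resource://gre/modules/UserAgentOverrides.jsm");

UserAgentOverrides.init(); // also kicks off the async UserAgentUpdates.init()

// The override pref only lands afterwards; if init() has already sampled
// general.useragent.override.* and nothing observes the later change, the
// UA string never gets rewritten and the step 1 check fails.
Services.prefs.setCharPref("general.useragent.override.mochi.test",
                           "DummyUserAgent");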
Flags: needinfo?(ehsan)
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (not reading bugmail, needinfo? me!) from comment #203)
> Apologies for the delay here.
>
> I have a theory here. UserAgentOverrides.init() calls
> UserAgentUpdates.init() which is asynchronous, and relies on the
> general.useragent.override.* prefs set by the test. The test first calls
> UserAgentOverrides.init() and then sets the pref asynchronously. I think
> this is a race condition that can potentially break overriding the UA
> string, as happens in these test failures.

That theory was wrong since UserAgentOverrides will be initialized at startup.

Landing a debugging patch to see if the correct component is registered on each build, testing another hypothesis...
<https://hg.mozilla.org/integration/mozilla-inbound/rev/0b4e473beaa5>
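The linked debugging patch isn't reproduced here; as a hedged illustration, a registration check of that sort can be as simple as logging whether a contract ID resolves — the contract ID below is a placeholder, not necessarily the component the patch actually probes:

// Placeholder contract ID for illustration only.
const contract = "@mozilla.org/example/ua-override-component;1";
dump("component registered: " + (contract in Components.classes) + "\n");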
Keywords: leave-open
(In reply to :Ehsan Akhgari (not reading bugmail, needinfo? me!) from comment #212)
> (In reply to :Ehsan Akhgari (not reading bugmail, needinfo? me!) from
> comment #203)
> > Apologies for the delay here.
> >
> > I have a theory here. UserAgentOverrides.init() calls
> > UserAgentUpdates.init() which is asynchronous, and relies on the
> > general.useragent.override.* prefs set by the test. The test first calls
> > UserAgentOverrides.init() and then sets the pref asynchronously. I think
> > this is a race condition that can potentially break overriding the UA
> > string, as happens in these test failures.
>
> That theory was wrong since UserAgentOverrides will be initialized at
> startup.
>
> Landing a debugging patch to see if the correct component is registered on
> each build, testing another hypothesis...
> <https://hg.mozilla.org/integration/mozilla-inbound/rev/0b4e473beaa5>

The correct component is registered in all of the failing test runs. There goes another theory out of the window...
Ryan, sorry but I have no more theories as to what's happening. This code is testing an Android-specific component that is not tested elsewhere, so it would be unfortunate to disable the test on Android. But if the mobile folks are unresponsive to this, I guess you should do the honors. :/ (If I had a debugging environment for Android mochitests, I'd help more... Sorry!)
Flags: needinfo?(ehsan)
Bit late for me to do it today, but I'll disable it across the branches first thing tomorrow. Shame :(
Flags: needinfo?(ryanvm)
Ehsan very astutely points out that these failures appear to be build-specific (i.e. you can retrigger a failed run as many times as you want and it will continue to fail indefinitely). Also, this bug started around the same time the split APK work was landing. Coincidence?
Flags: needinfo?(jlund)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #676)
> Ehsan very astutely points out that these failures appear to be
> build-specific (i.e. you can retrigger a failed run as many times as
> you want and it will continue to fail indefinitely). Also, this bug started
> around the same time the split APK work was landing. Coincidence?

Not just around the same day, *the* same day:

https://bugzilla.mozilla.org/show_bug.cgi?id=1111137#c0
http://hg.mozilla.org/build/buildbot-configs/rev/6e2c613210e9

302 to rnewman for two reasons:

1) hoping he or someone from #mobile can provide some insight.
2) removing this test from mochitest [comment 667] affects mobile, so they can determine how valuable it is.

rnewman, this isn't specific to the constrained API 9-10 or API 11+ builds, as you can see from the TBPL Robot comments. I can lend a machine from within our infra and help run known bad revisions on it with someone from mobile if it helps.
Flags: needinfo?(jlund) → needinfo?(rnewman)
See Also: → 1078267
My favorite part of this is the line in the log immediately before the failure that shows the override occurring:

11:16:20 INFO - 519 INFO Overriding mochi.test with DummyUserAgent
11:16:20 INFO - 520 INFO TEST-PASS | netwerk/test/mochitests/test_user_agent_overrides.html | request failed
11:16:20 INFO - 521 INFO TEST-PASS | netwerk/test/mochitests/test_user_agent_overrides.html | invalid response
11:16:20 INFO - 522 INFO TEST-UNEXPECTED-FAIL | netwerk/test/mochitests/test_user_agent_overrides.html | Navigator UA not overridden at step 1 - got Mozilla/5.0 (Android; Tablet; rv:38.0) Gecko/38.0 Firefox/38.0, expected DummyUserAgent

Yeah, I think we want to fix this, but the split APK work shouldn't have any effect on this apart from timing, and if timing is the cause then we should see a more usual smear of failures, rather than being build-specific.

Nick, any thoughts on Comment 679?
Flags: needinfo?(nalexander)
(In reply to Richard Newman [:rnewman] from comment #680)
> My favorite part of this is the line in the log immediately before the
> failure that shows the override occurring:
>
> 11:16:20 INFO - 519 INFO Overriding mochi.test with DummyUserAgent
> 11:16:20 INFO - 520 INFO TEST-PASS |
> netwerk/test/mochitests/test_user_agent_overrides.html | request failed
> 11:16:20 INFO - 521 INFO TEST-PASS |
> netwerk/test/mochitests/test_user_agent_overrides.html | invalid response
> 11:16:20 INFO - 522 INFO TEST-UNEXPECTED-FAIL |
> netwerk/test/mochitests/test_user_agent_overrides.html | Navigator UA not
> overridden at step 1 - got Mozilla/5.0 (Android; Tablet; rv:38.0) Gecko/38.0
> Firefox/38.0, expected DummyUserAgent
>
> Yeah, I think we want to fix this, but the split APK work shouldn't have any
> effect on this apart from timing, and if timing is the cause then we should
> see a more usual smear of failures, rather than being build-specific.
>
> Nick, any thoughts on Comment 679?

I am short on context and mostly confused. I agree that the split APK work should not have impacted this save for timing (due to the loading of additional classes, time spent initializing Google Play Services, etc). But I don't understand why you consider this "build-specific": it certainly seems to impact both 9-10 and 11+ builds, suggesting that it is *not* build-specific. Is your claim that it should not have impacted 11+ builds at all, because they did not change? I don't think that's true, because we strip some 9-10-only stuff from there too.

In short, this really does appear to be intermittent.
Flags: needinfo?(nalexander)
"build-specific" in this context means that a given build will either always hit these failures or always not.
Nick, can you please tell me what the "split APK" work is about? Perhaps I can give you some possible theories, but I don't really know what people mean by that term. :-)
Flags: needinfo?(nalexander)
(In reply to :Ehsan Akhgari [Away: 1/29-2/20] (not reading bugmail, needinfo? me!) from comment #683)
> Nick, can you please tell me what the "split APK" work is about? Perhaps I
> can give you some possible theories, but I don't really know what people
> mean by that term. :-)

Split APK divided Android builds into two parts: those targeting Android SDK 9-10, and those targeting Android SDK 11+. Conceptually it's similar to ARM vs. x86, or OS X 10.6 vs. 10.7+. We disable some features on 9-10, and cull some code and resources on 11+. That culling could impact timing, since Java code loads at different times. It should not impact JavaScript execution at all, but it appears to be doing so.

rnewman and jlund can provide details as needed.
Flags: needinfo?(nalexander)
Could this have affected XPCOM component registration in any way?
It should not. Right now the only effects are in Java-land. Furthermore, if there were something wrong with the constrained build stuff, I'd expect it to hit either API9-10 or API11+, not some builds for each. One hypothesis that might explain the "some builds, some of the time" problem is that there's a build dependency somewhere that isn't correctly indicated, and an incremental build is doing something wrong. But I don't think our builders mix and match mozconfigs against incremental object directories...
Flags: needinfo?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #686)
> It should not. Right now the only effects are in Java-land.

Ah, then I'm afraid I don't have any good theories...

> Furthermore, if there were something wrong with the constrained build stuff,
> I'd expect it to hit either API9-10 or API11+, not some builds for each.

Well, if we had an intermittent *build* issue, we'd observe the exact same symptoms as we are observing now, right?

> One hypothesis that might explain the "some builds, some of the time"
> problem is that there's a build dependency somewhere that isn't correctly
> indicated, and an incremental build is doing something wrong. But I don't
> think our builders mix and match mozconfigs against incremental object
> directories...

You don't need the mix and match to get unknown build dependencies to bite you like this, do you? It would be interesting to go back through the logs and see if this ever happened on a periodic clobber, to (dis)prove that theory.
I'm seeing try failures of this test on desktop with bug 1127448. Perhaps we should disable it everywhere while that code is on the tree.
Blocks: 1127448
Priority: -- → P5
Whiteboard: [test disabled on Android][leave open] → [test disabled on Android][leave open][necko-backlog]
Priority: P5 → P1
Priority: P1 → P3
The leave-open keyword is set and there has been no activity for 6 months. :selenamarie, maybe it's time to close this bug?
Flags: needinfo?(sdeckelmann)
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(sdeckelmann)
Resolution: --- → INCOMPLETE
Resolution: INCOMPLETE → INACTIVE