Closed
Bug 536473
Opened 15 years ago
Closed 14 years ago
Electrolysis-windows-talos: Seemingly random Talos "browser frozen"
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: benjamin, Assigned: smaug)
Details
Electrolysis has been having a problem since late last week: for some seemingly random percentage of pushes, all of the Talos runs on Windows are orange with "browser frozen" (at startup... the test doesn't seem to produce any results at all). This affects tdhtml, tsspider, tgfx, tsvg, and tp4, but not the Ts tests.

Sample logs:
http://tinderbox.mozilla.org/showlog.cgi?log=Electrolysis/1261522502.1261523822.3152.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Electrolysis/1261514366.1261516401.16815.gz

I've downloaded the exact same hourly builds that Talos is downloading (from stage) and run StandaloneTalos on them without incident.

For about 25% of builds, all the tests pass. But it's always all-or-nothing: for a particular build/set-of-talos, they either all fail or all pass. For example, the Talos runs for cset afc656f387fe, pushed Monday, December 21, 2009 9:24:14 AM -0800, all failed, and the runs for 3d5dcaeba50f, pushed Monday, December 21, 2009 9:37:55 AM -0800, all passed. The code in between those two pushes is not run by Talos at all, and can't have affected them.

The last reliably green push was 0e3ed118aedd, pushed Thursday, December 17, 2009 1:39:39 PM -0800. The orangeness started with 07c66d63ecb7, pushed Thursday, December 17, 2009 4:12:27 PM -0800 (which is also IPC-only code that Talos can't hit).

I mentioned it to lsblakk and nthomas today on IRC, and they suggested clobbering the builders, but that didn't help. I don't know what to do next. Maybe somebody from releng can catch the "browser frozen" on one of the actual machines and see what it's doing (e.g. whether it's displaying a dialog box of some sort that prevents the tests from running), or, if there's a spare Talos slave, I can try to reproduce it there over VNC.
Comment 1•15 years ago
I've just rerun "WINNT 5.1 electrolysis talos" using http://stage.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/electrolysis-win32/1261517623/firefox-3.7a1pre.en-US.win32.zip (the current latest build) on an xp talos machine. Minefield has launched enough that Windows drew a titlebar and the window frame, but everything inside it is white (no chrome or content drawn). It's sitting idle, no processes using any cpu time.
Comment 2•15 years ago
I've tidied up talos-rev1-xp02 for you to play with. It is located at the MV office, so you will need to use the MV office VPN to access it through VNC. Please ping me on IRC for the user/password.
Reporter
Comment 3•15 years ago
Error console messages:
failed to load XPCOM component: C:\.....\firefox\components\tp-cmdline.js
Failed to load XPCOM component: C:\.....\firefox\components\nsProgresDialog.js
Warning: unrecognized command line flag -tp
Warning: unrecognized command line flag -tpchrome
Warning: unrecognized command line flag -tpformat
Warning: unrecognized command line flag -tpcycles
Reporter
Comment 4•15 years ago
This appears to be because, at least on the Talos slave I have for testing, talos\page_load_test\components doesn't exist.
Comment 5•15 years ago
The script generate-tpcomponent.py is meant to populate the page_load_test\components directory: http://hg.mozilla.org/build/tools/file/default/buildfarm/utils/generate-tpcomponent.py Has e10s moved any of the files that that script is trying to copy? Maybe we're getting a silent copy failure early on.
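One quick way to rule out the silent-copy-failure theory is to check, before launching Talos, that the components directory actually got populated. This is a minimal sketch, not part of generate-tpcomponent.py or Talos: it assumes the layout mentioned in comment 4 (talos\page_load_test\components), and the expected file name is taken from the error console output in comment 3 (tp-cmdline.js). The function name missing_components and the EXPECTED_COMPONENTS list are hypothetical.

```python
from pathlib import Path

# Hypothetical list of component files the pageloader needs. tp-cmdline.js is
# the file the error console reported as failing to load; extend as needed.
EXPECTED_COMPONENTS = ("tp-cmdline.js",)


def missing_components(talos_root, expected=EXPECTED_COMPONENTS):
    """Return a list of problems with <talos_root>/page_load_test/components.

    An empty list means the directory exists and contains every expected file.
    """
    components = Path(talos_root) / "page_load_test" / "components"
    if not components.is_dir():
        # The state described in comment 4: the directory is not there at all.
        return [f"missing directory: {components}"]
    # Directory exists; report each expected file that is absent.
    return [
        f"missing file: {components / name}"
        for name in expected
        if not (components / name).is_file()
    ]
```

For example, calling missing_components on the talos checkout of a slave whose components directory was cleaned up would return a single "missing directory" entry, which would point at slave setup rather than at anything in the Electrolysis build.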
Reporter
Comment 6•15 years ago
Hrm, maybe I'm not using the slave correctly (I was trying to use it like a standalone talos setup, running run_tests.py my.config), so perhaps I'm not running that script correctly. I don't know of any relevant changes e10s has made to the files in question.
Comment 7•15 years ago
Sorry, that was overly aggressive slave cleanup on my part. I'll install the pageloader in the correct location for you.
Comment 8•15 years ago
(In reply to comment #7)
> Sorry, that was overly aggressive slave cleanup on my part. I'll install the
> pageloader in the correct location for you.

Pushing over to Alice.
Assignee: nobody → anodelman
Comment 9•15 years ago
Fix already in place, this bug should not be assigned to me.
Assignee: anodelman → nobody
Comment 10•15 years ago
(In reply to comment #9)
> Fix already in place, this bug should not be assigned to me.

Per IRC: I didn't know Alice had already put her fix in place, so we think her work here is done. bsmedberg, can you see whether this is still a problem and, if so, whether it is an Electrolysis problem?
Assignee: nobody → benjamin
Comment 11•15 years ago
Sorry, fix as in talos-rev1-xp02 is now correctly configured for testing - not as in a fix for the actual browser freezing issue.
Reporter
Comment 12•15 years ago
In the hanging case, I get a JS exception:

Error: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIWebNavigation.loadURI]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://global/content/bindings/browser.xml :: loadURIWithFlags :: line 187" data: no]

When I set up a try/catch, the error occurs trying to load http://localhost/page_load_test/tp4/www.youtube.com/www.youtube.com/index.html, which is a perfectly normal URL. I then get a couple of errors:

Error: evt.originalTarget.defaultView is undefined
Source File: chrome://pageloader/content/pageloader.js
Line: 293

There is no mozilla-runtime process, so I'm pretty sure we're not running into any issues with accidentally trying to do remote tabs. That leaves the set of content changes for e10s which aren't plugin-related somehow causing the initial pageload to fail. bz/smaug, any clues about that (and why it would only show up on the Talos machines and not locally)?
Comment 13•15 years ago
Hmm. LoadURI can return NS_ERROR_FAILURE in the following cases (if we ignore the history-entry cases, which I assume we're not hitting here):

1) Empty (or whitespace-only) URI string
2) CreateFixupURI failed
3) GetService for the security manager fails with that error code
4) IsSystemPrincipal fails with that error code
5) SchemeIs on the given URI fails, or it's a wyciwyg URI
6) It's a targeted load and window.open fails with that error code
7) mIsBeingDestroyed is true
8) CheckLoadingPermissions returns this error code
9) NS_DispatchToCurrentThread returns this error code
10) Load is external and CreateAboutBlankContentViewer fails
11) It's an anchor scroll and session history is not working right
12) Stop() fails with that error code
13) DoURILoad fails with that error code

My money is on #7... but that's checked twice in nsDocShell::InternalLoad and the first check doesn't actually return NS_ERROR_FAILURE. It's worth double-checking by adding some code to log things around the second check, I guess, and in general to see whether we make it into the DoURILoad call here.
Comment 14•15 years ago
Oh, and talos vs locally is almost certainly a timing issue of some sort.
Reporter
Comment 15•15 years ago
->smaug for more investigation. This seems to be specific to the tab/frameloader changes in the Electrolysis branch which aren't in m-c (or is just really freaky).
Assignee: benjamin → Olli.Pettay
Comment 16•14 years ago
Per IRC with bsmedberg, this has been working for about a month now. It's unclear what changed or what fixed it. Reopen if this recurs.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Updated•11 years ago
Product: mozilla.org → Release Engineering