Closed Bug 584613 Opened 14 years ago Closed 14 years ago

Missing files in talos and unit tests - test_places/head_common.js, Components.interfaces.nsITestCrasher, CharsetConversionTests.js, chrome://pageloader/content/pageloader.js, httpd.js, ...

Categories

(Core Graveyard :: File Handling, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: zwol, Unassigned)

References

Details

(Keywords: intermittent-failure, regression, Whiteboard: [fixed by backout of bug 286382])

Summary: Missing files in talos and unit tests - test_places/head_common.js, Components.interfaces.nsITestCrasher, CharsetConversionTests.js, chrome://pageloader/content/pageloader.js, ... → Missing files in talos and unit tests - test_places/head_common.js, Components.interfaces.nsITestCrasher, CharsetConversionTests.js, chrome://pageloader/content/pageloader.js, httpd.js, ...
Blocks: 438871
Whiteboard: [orange]
Has anyone actually checked the test packages to see whether the files are there?
Affected slaves, since it seems reasonable that this could be coming from flaky storage, and the list should tell someone who knows whether or not they are all on the same storage:

talos-r3-w7-024
win32-slave12
talos-r3-w7-019
talos-r3-w7-021
win32-slave30
win32-slave01
talos-r3-w7-044
talos-r3-w7-024
talos-r3-w7-032
talos-r3-w7-009
talos-r3-w7-008
Comment 2 is the reason I don't think it's truly missing files: that was mochitest-a11y failing to open httpd.js and server.js, after chrome and browser-chrome had already run, so I think they would have actually existed.
From the comment 7 log:

  inflating: xpcshell/tests/test_intl_uconv/unit/CharsetConversionTests.js  
...
TEST-UNEXPECTED-FAIL | c:\talos-slave\mozilla-central_win7_test-xpcshell\build\xpcshell\tests\test_intl_uconv\unit\test_decode_armscii.js | test failed (with xpcshell return code: 3), see following log:
  >>>>>>>
  c:/talos-slave/mozilla-central_win7_test-xpcshell/build/xpcshell/tests/test_intl_uconv/unit/test_decode_armscii.js:3: Error: cannot open file 'CharsetConversionTests.js' for reading

  <<<<<<<
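
For anyone who wants to check other logs the same way, here is a rough sketch (not part of any harness) that cross-checks the "inflating:" lines against the "cannot open file" errors. The function name is made up, the two patterns are based only on the excerpt above, and matching by base name is an approximation.

```python
import re

# Patterns taken from the log excerpt above; other logs may differ.
INFLATED = re.compile(r"^\s*inflating:\s+(\S+)")
CANNOT_OPEN = re.compile(r"cannot open file '([^']+)' for reading")


def unpacked_but_unreadable(log_text):
    """List files the log shows being unpacked and later failing to open.

    Matching is by base name only, which is an approximation: two
    different directories could contain a file with the same name.
    """
    inflated_names = set()
    failed = []
    for line in log_text.splitlines():
        m = INFLATED.match(line)
        if m:
            # Keep only the last path component of the unpacked file.
            inflated_names.add(m.group(1).rsplit("/", 1)[-1])
        m = CANNOT_OPEN.search(line)
        if m and m.group(1) not in failed:
            failed.append(m.group(1))
    return [name for name in failed if name in inflated_names]
```

A non-empty result means the file was in the test package and was unpacked, so the failure to open it happened later, at run time.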
After rebuilding a recent mozilla-central with pymake (recent = just after the pymake upgrade), this happened to me once locally on my PC: a Places test failed with a "missing" head_common.js. Running the test again worked fine.
Tinderbox isn't using pymake. mak, were you doing -jN builds? It's possible there's some weird parallelism issue.
Running the test again works, so perhaps it's gecko that's broken?
(In reply to comment #17)
> Tinderbox isn't using pymake. mak, were you doing -jN builds? It's possible
> there's some weird parallelism issue.

I know. Yes, I'm using -j2, but this never happened to me before and I run those tests daily. Plus, rerunning the same test immediately succeeded, without rebuilding anything in between. I just did "run test -> fail, run test -> success".
There's another thing: in one of the failures the failing test was test_places\bookmarks\test_getBookmarkedURIFor.js, but a lot of tests that run before it also use head_common.js and they run fine. All Places xpcshell-tests include that file.
This bug is easily reproducible (at least here on my Win7 box): when running all the xpcshell-tests in toolkit/components/places/tests/, sometimes one of them fails with a missing head_common.js. If anybody has ideas on what to try or back out, I can try it locally.
I tried backing out the redirect API patch from bug 546606 locally, but the problem is still there, so it's not that change.
(In reply to comment #25)
> Too late to bisect using the builds from

Yes, it's a bit late, but I can try building at revisions around the time when this bug first appeared (so, around the landing of the async redirect API). Unfortunately, building on Windows takes some time, even with pymake, and I only have a Core2.
I am 99% certain that this has nothing to do with redirect API: it's either a change to nsLocalFileWin (but that doesn't appear to have changed recently) or to the JS engine (which has changed a lot!)

Why is it late to bisect using tinderbox-builds? They have test packages that should reproduce this just fine.
(In reply to comment #29)
> Why is it late to bisect using tinderbox-builds? They have test packages that
> should reproduce this just fine.

I meant it was too late yesterday (and to use that folder), not too late to bisect. Btw, it's a time-consuming task; if releng could build and test on some server, that would help.
I have confirmed that the issue was already present when we landed the async redirect API. Now I've tried changeset b22036de463a, which is just before Khuey's changes to the MSVC config; since that was a Windows-only change I considered it interesting. But I still see the problem.
I'm going to try changeset 79aa28daf1f4, just before the TM merge on 03 Aug.
I think I can give a first regression window. It's still a bit large, but better than nothing: it could be any changeset in this range (which includes the TM merge, which I'll test next).
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=79aa28daf1f4&tochange=267f561c325a
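
Since the failure is intermittent, a single passing run at a revision proves little. Below is a minimal sketch of how the window could be narrowed under that constraint; it is not the procedure actually used here. The names `bisect_flaky`, `run_test` and the `r0`..`r9` revision labels are hypothetical, and `make_simulated_test` only exists to make the sketch runnable without a build.

```python
def is_bad(rev, run_test, runs):
    """Treat a revision as bad if any of `runs` attempts fails."""
    return any(not run_test(rev) for _ in range(runs))


def bisect_flaky(revs, run_test, runs=20):
    """Narrow an ordered revision list to (last_good, first_bad).

    Assumes revs[0] is known good and revs[-1] is known bad. A "good"
    verdict is only probabilistic: with too few runs, a bad revision
    can pass every time and send the search the wrong way.
    """
    lo, hi = 0, len(revs) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revs[mid], run_test, runs):
            hi = mid
        else:
            lo = mid
    return revs[lo], revs[hi]


def make_simulated_test(revs, first_bad, fail_every=5):
    """Build a fake test: revisions from `first_bad` on fail on every
    `fail_every`-th call overall; earlier revisions always pass."""
    bad = set(revs[revs.index(first_bad):])
    calls = {"n": 0}

    def run_test(rev):
        calls["n"] += 1
        return not (rev in bad and calls["n"] % fail_every == 0)

    return run_test


revs = ["r%d" % i for i in range(10)]  # hypothetical revision labels
window = bisect_flaky(revs, make_simulated_test(revs, "r6"), runs=20)
```

In a real run, `run_test` would build (or download) the given revision and run the affected xpcshell tests once, which is why each step is so slow.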
Keywords: regression
bug 286382 touched Windows file access; it should not hurt test files, but it's the only interesting push in that range. Ehsan said he will look into it; in the meantime I'm trying to further reduce the range.
(In reply to comment #33)
> bug 286382 touched Windows files access, it should not hurt test files but it's
> the only interesting push in that range. Ehsan said will look into it, in the
> meanwhile I'm trying to further reduce the range.

I don't think that there is any way for my patch in bug 286382 to have affected the packaging steps.
IIUC, I don't think packaging is the issue here -- comment 20 suggests that the "missing" files are actually there, but some tests sporadically have trouble finding/opening them.
(In reply to comment #35)
> IIUC, I don't think packaging is the issue here -- comment 20 suggests that the
> "missing" files are actually there, but some tests sporadically have trouble
> finding/opening them.

Then again, it doesn't really feel related to that bug.  Let's wait until mak finishes his bisect.
I could see Ehsan's patch introducing a race condition where Thread A tries to load a DLL at nearly the same time as Thread B tries to open a file by relative pathname, and the loader hook pulls the current working directory out from under Thread B.  Only I was under the impression you *couldn't* open a file by relative pathname from (chrome) JavaScript.
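
To make the suspected race concrete, here is a minimal sketch in Python rather than in the Windows loader code. It says nothing about what the patch in bug 286382 actually does. It only shows that the working directory is process-wide, so a relative open in one thread can fail while another thread has temporarily changed directory. The function names and the file are illustrative, and the interleaving is forced with events so the outcome is deterministic.

```python
import os
import shutil
import tempfile
import threading


def can_open(path):
    """Return True if `path` can be opened for reading."""
    try:
        with open(path):
            return True
    except OSError:
        return False


def demo_cwd_race():
    """Return (relative_ok, absolute_ok) as seen by thread B mid-race."""
    start_dir = os.getcwd()
    data_dir = tempfile.mkdtemp()
    other_dir = tempfile.mkdtemp()
    name = "head_common.js"  # illustrative name; any file would do
    abs_path = os.path.join(data_dir, name)
    with open(abs_path, "w") as f:
        f.write("// placeholder\n")

    os.chdir(data_dir)
    cwd_moved = threading.Event()
    b_finished = threading.Event()
    seen = {}

    def thread_a():
        # Stand-in for a hook that temporarily switches the working directory.
        os.chdir(other_dir)
        cwd_moved.set()
        b_finished.wait()
        os.chdir(data_dir)

    def thread_b():
        # Runs while the working directory points somewhere else.
        cwd_moved.wait()
        try:
            seen["relative"] = can_open(name)
            seen["absolute"] = can_open(abs_path)
        finally:
            b_finished.set()

    threads = [threading.Thread(target=thread_a),
               threading.Thread(target=thread_b)]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        os.chdir(start_dir)
        shutil.rmtree(data_dir, ignore_errors=True)
        shutil.rmtree(other_dir, ignore_errors=True)
    return seen["relative"], seen["absolute"]
```

Here the relative open fails and the absolute one succeeds. Without the forced ordering the same code would fail only occasionally, which would match the "run test -> fail, run test -> success" pattern.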
IINM, chrome URIs resolve to absolute path names, don't they?
Sorry Ehsan, my final regression window is
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6d75d4e390b7&tochange=4b801de1074a

Excluding the IPC stuff, which doesn't look interesting, there are just your changesets. I suggest trying a temporary backout and seeing how tinderbox behaves. We should see fewer failures on Windows (no orange talos and green Xo Xd, at least).
Can we move this bug to another component or is it still believed that it might be a releng bug?
This can move, not sure where to though.
Once we have confirmation of the final cause, it can move to the appropriate component.
Bug 286382 has been backed out so this might be fixed.
(In reply to comment #44)
> Bug 286382 has been backed out so this might be fixed.

It seems that the backout has fixed this, doesn't it?
At least Windows talos runs are no longer failing intermittently, and the most common failures from above are no longer appearing.
Not sure about comment 47, the error log is quite different but hard to tell.
Blocks: 584589
Whatever this was, it has been fixed by the backout. I'd suggest filing a new bug for the error in comment 47, since the backout clearly had an effect (both on this and on bug 584589).
Status: NEW → RESOLVED
Closed: 14 years ago
Component: Release Engineering → File Handling
Product: mozilla.org → Core
QA Contact: release → file-handling
Resolution: --- → FIXED
Whiteboard: [orange] → [orange][fixed by backout of bug 286382]
Version: other → Trunk
Blocks: 286382
Whiteboard: [orange][fixed by backout of bug 286382] → [fixed by backout of bug 286382]
Product: Core → Core Graveyard