Closed
Bug 883918
Opened 11 years ago
Closed 10 years ago
"hg purge" causing intermittent problems on linux+osx
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: Gavin, Unassigned)
Attachments
(1 file)
1.23 KB, patch | rail: review+ | catlee: checked-in+
The push: https://tbpl.mozilla.org/?tree=Try&rev=d8c746ad3326
The build failure: https://tbpl.mozilla.org/php/getParsedLog.php?id=24157495&tree=Try&full=1

TEST-UNEXPECTED-FAIL | xpccheck | test test_webappsActor.js is missing from test manifest /builds/slave/try-osx64-00000000000000000000/build/dom/apps/tests/unit/xpcshell.ini!

My push didn't touch test_webappsActor.js, so this suggests that somehow that Try build on bld-lion-r5-027 wasn't properly clobbered before the build started. No idea if this was just a one-off problem.
Comment 1•11 years ago
We don't clobber on try since the try bits of bug 851270 landed. We now do 'hg purge' instead of a clobber. Perhaps this is another occurrence of bug 873067.
Comment 2•11 years ago
Similar to the other bug, the file exists on disk, and hg thinks that's just fine:

bld-lion-r5-027:build cltbld$ hg ident
1066d9fca2ee
bld-lion-r5-027:build cltbld$ hg status ./dom/apps/tests/unit/test_webappsActor.js
bld-lion-r5-027:build cltbld$ hg log !$
hg log ./dom/apps/tests/unit/test_webappsActor.js
changeset:   130931:c50f597b1e6a
user:        Alexandre Poirot <poirot.alex@gmail.com>
date:        Mon May 06 09:51:53 2013 -0400
summary:     Bug 844227 - Add more functions to the webapps actor. r=fabrice

However, http://hg.mozilla.org/try/file/1066d9fca2ee/dom/apps/tests/unit doesn't list that file, nor does http://hg.mozilla.org/try/file/d8c746ad3326/dom/apps/tests/unit.

I think we should disable hg purge until we can track this bug down.
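Since `hg status` reports nothing unusual here, one possible cross-check is to compare the working directory against an explicit list of tracked paths instead of trusting hg's own view. The following is a minimal Python sketch under stated assumptions: `stray_files` is a hypothetical helper (not part of the build automation), and the `tracked` set is assumed to come from something like saved `hg manifest` output. It will also report ignored files such as object directories, so the result needs filtering.

```python
import os


def stray_files(root, tracked):
    """Return files under root that are not in the tracked set.

    `tracked` holds repo-relative paths with forward slashes, e.g. the
    lines printed by `hg manifest` (an assumption about where the list
    comes from). Mercurial's own .hg directory is skipped.
    """
    strays = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune Mercurial's metadata directory from the walk.
        dirnames[:] = [d for d in dirnames if d != ".hg"]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            if rel not in tracked:
                strays.append(rel)
    return sorted(strays)
```

With the manifest of the checked-out revision as `tracked`, a leftover such as dom/apps/tests/unit/test_webappsActor.js would show up in the returned list even when `hg status` is silent about it.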
Comment 3•11 years ago
slave is disabled in slavealloc for investigation
Comment 4•11 years ago
Unlike bug 873067, I don't see anything obviously weird in the logs (e.g. a missing parent changeset). Possibilities:
1) Mercurial bug (we're running 2.5.4, right?)
2) Performing the purge before |hg up| results in oddities (but I don't think it should matter)
3) Filesystem or other weirdness.

I'd hate to disable hg purge because it results in such a nice perf win. But, if it's buggy, that doesn't leave us much choice.
Comment 5•11 years ago
This machine hasn't gotten the hg update yet, it's still running 2.0.2
Comment 6•11 years ago
(In reply to Chris AtLee [:catlee] from comment #5)
> This machine hasn't gotten the hg update yet, it's still running 2.0.2

In that case I'm inclined to blame an old, buggy hg version.
Comment 7•11 years ago
alright. I'll clobber the machine and throw it back in the pool. I don't know what the status of hg upgrades on OSX builders is.
Comment 8•11 years ago
(In reply to Chris AtLee [:catlee] from comment #7)
> I don't know what the status of hg upgrades on OSX builders is.

Around the corner: see bug 868192. I'm currently leaving the final deploy to our OSX builders up to the puppet320 deploy for OSX. Bug 760093 tracks the puppet320 part.
Updated•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Comment 9•11 years ago
I just saw this on the try build at https://tbpl.mozilla.org/?tree=Try&rev=1844e440cadf
Log at https://tbpl.mozilla.org/php/getParsedLog.php?id=24389790&tree=Try

Slave details:
Linux try build on 2013-06-20 10:45:25 PDT for push 1844e440cadf
slave: bld-centos6-hp-041
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 10•11 years ago
[cltbld@bld-centos6-hp-041.build.scl1.mozilla.com ~]$ hg --version
Mercurial Distributed SCM (version 2.5.4)
Comment 11•11 years ago
(In reply to Chris AtLee [:catlee] from comment #10)
> [cltbld@bld-centos6-hp-041.build.scl1.mozilla.com ~]$ hg --version
> Mercurial Distributed SCM (version 2.5.4)

This is starting to look like a Mercurial bug :(
Comment 12•11 years ago
From a random try push, https://tbpl.mozilla.org/php/getParsedLog.php?id=24370008&tree=Try

Can we please stop using hg purge, stop giving vastly more utterly bogus try results than we know we are giving, and *then* decide where the bug lies and what we should do about it?
Comment 13•11 years ago
(In reply to Phil Ringnalda (:philor) from comment #12)
> From a random try push,
> https://tbpl.mozilla.org/php/getParsedLog.php?id=24370008&tree=Try
>
> Can we please stop using hg purge, stop giving vastly more utterly bogus try
> results than we know we are giving, and *then* decide where the bug lies and
> what we should do about it?

Agreed.
Comment 14•11 years ago
Attachment #765932 -
Flags: review?(rail)
Comment 15•11 years ago
I was going to suggest printing the output of |hg status -A| after |hg up| so we can identify the next culprit. But I concede we'd be in an undefined state and backing out is probably best. This is such a weird bug.
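If the `hg status -A` output were captured after `hg up`, it could be grouped by status code to make the culprit easier to spot. Here is a rough Python sketch of that idea; `group_status` is a hypothetical helper, not something the automation contains, and it assumes the usual one-character status code followed by a space and the path.

```python
from collections import defaultdict

# Status codes printed by `hg status`; "C" (clean) and "I" (ignored)
# are only listed when requested, e.g. with -A/--all.
STATUS_NAMES = {
    "M": "modified",
    "A": "added",
    "R": "removed",
    "C": "clean",
    "!": "missing",
    "?": "not tracked",
    "I": "ignored",
}


def group_status(output):
    """Group `hg status -A` output lines by their status code."""
    groups = defaultdict(list)
    for line in output.splitlines():
        # Skip blank lines and anything that is not "<code> <path>",
        # such as the indented copy-source lines.
        if len(line) < 3 or line[0] not in STATUS_NAMES or line[1] != " ":
            continue
        groups[line[0]].append(line[2:])
    return dict(groups)
```

A path listed under "C" that is absent from the target revision on the server would be a candidate for the kind of stale file seen in this bug.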
Comment 16•11 years ago
Yeah, I'd certainly like to be able to reproduce this.

What role does .hg/dirstate play? Is there anything we can look at in there to see why hg thinks the file belongs?

Also, if anybody catches this in action again, poke me or buildduty on IRC and we can set aside the machine for debugging.
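For anyone wanting to look inside .hg/dirstate on an affected machine, a small Python sketch of a reader follows. It assumes the classic (v1) dirstate layout that Mercurial 2.x is understood to use: two 20-byte parent node IDs, then one record per tracked file. `parse_dirstate` is a hypothetical helper written for illustration and has not been checked against the affected slaves.

```python
import struct

# Assumed layout (dirstate v1): two 20-byte parent node IDs, then one
# record per tracked file: state char, mode, size, mtime, name length
# (all big-endian), followed by the file name itself.
_HEADER_SIZE = 40
_ENTRY = struct.Struct(">cllll")


def parse_dirstate(data):
    """Return (parent1_hex, parent2_hex, {path: (state, mode, size, mtime)})."""
    p1, p2 = data[:20].hex(), data[20:40].hex()
    entries = {}
    pos = _HEADER_SIZE
    while pos < len(data):
        state, mode, size, mtime, name_len = _ENTRY.unpack_from(data, pos)
        pos += _ENTRY.size
        name = data[pos:pos + name_len]
        pos += name_len
        # A copied file stores "dest\0source"; keep only the destination.
        path = name.split(b"\0", 1)[0].decode("utf-8", "replace")
        entries[path] = (state.decode("ascii"), mode, size, mtime)
    return p1, p2, entries
```

Reading the file in binary mode and checking whether dom/apps/tests/unit/test_webappsActor.js appears with state "n" (normal) would show whether hg believes the file is tracked, and the parent hashes would show which changeset the dirstate thinks is checked out.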
Updated•11 years ago
Attachment #765932 -
Flags: review?(rail) → review+
Updated•11 years ago
Attachment #765932 -
Flags: checked-in+
Comment 17•11 years ago
In production
Updated•11 years ago
Product: mozilla.org → Release Engineering
Comment 18•11 years ago
Found in triage, and moving to "Buildduty" because of comment#16. Tweaked summary, and bug dependencies, based on comments so far.

gps: Looks like no further occurrences have been reported since "hg purge" was disabled in comment#17. Any word from hg folks - is this a known issue?

(In reply to Chris AtLee [:catlee] from comment #16)
> Yeah, I'd certainly like to be able to reproduce this.
>
> What does role does .hg/dirstate play? Is there anything we can look at in
> there to see why hg thinks the file belongs?
>
> Also, if anybody catches this in action again, poke me or buildduty on irc
> and we can set aside the machine for debugging.

emorley, ryanvm, tomcat: if you see this again, please ping buildduty so we can pull the machine from production for investigation.
Component: Other → Buildduty
Depends on: 868192
Flags: needinfo?(ryanvm)
Flags: needinfo?(gps)
Flags: needinfo?(emorley)
Flags: needinfo?(cbook)
Summary: failure to clobber on try? → "hg purge" causing intermittent problems on linux+osx
Comment 20•11 years ago
We don't know what the underlying issue was. But considering that not using purge with Mercurial (or the Git equivalent) is costing tons of time in builders (5-10% of total build job time for some builders), I highly encourage the appropriate people to investigate the causes of this.

IMO we should start upgrading Mercurial clients to 2.6.3 or 2.7.0 at the earliest convenience (the client version doesn't need to match the server version) - this is something we should be doing anyway, regardless of this bug.

We should also investigate selectively enabling purging on platforms that are known not to have problems - it's possible something wonky on some builders/platforms is confusing things.

This bug should likely also be resolved, possibly duped to bug 851270.
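As an illustration of what selectively enabling purge might look like, here is a minimal Python sketch that gates purge on the client version and platform. Everything in it is an assumption for discussion: `parse_hg_version` and `purge_allowed` are hypothetical helpers, and the minimum version and platform list are placeholders, since this bug does not establish which versions or platforms are actually safe (comment 10 shows a failure on 2.5.4).

```python
import re


def parse_hg_version(version_output):
    """Extract (major, minor, patch) from `hg --version` output."""
    match = re.search(r"\(version (\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError("unrecognized hg --version output")
    major, minor, patch = match.groups()
    # Releases such as "2.7" have no patch component; treat it as 0.
    return int(major), int(minor), int(patch or 0)


def purge_allowed(version_output, platform, min_version, allowed_platforms):
    """Allow purge only on listed platforms with a new enough client.

    min_version and allowed_platforms are policy inputs chosen by the
    caller; nothing in this bug confirms a safe combination.
    """
    if platform not in allowed_platforms:
        return False
    return parse_hg_version(version_output) >= min_version
```

A caller falling back to a full clobber whenever `purge_allowed` returns False would keep the perf win where purge is trusted without risking stale files elsewhere.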
Flags: needinfo?(gps)
Updated•11 years ago
Flags: needinfo?(ryanvm)
Comment 21•11 years ago
It sounds like we suspect that older versions of hg have a broken purge in some situations. Is that right? It certainly reads to me like this is about debugging our usage of "hg purge" and then turning it back on, so I'm moving it out of buildduty...
Component: Buildduty → General Automation
Flags: needinfo?(cbook)
Comment 22•10 years ago
The purge code was backed out ages ago, and so we haven't had subsequent issues. We think we've tracked down the cause of the issue in bug 969689, so let's move future discussion there.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → WORKSFORME
Updated•6 years ago
Component: General Automation → General