Closed Bug 464080 Opened 11 years ago Closed 11 years ago

parallel builds easily end up with incomplete .jar files

Categories

(Firefox Build System :: General, defect, major)

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9.1b2

People

(Reporter: kairo, Assigned: neil)

Attachments

(1 file)

Since we parallelized building suite/ (bug 462736) with the new mechanism from bug 461395, I've repeatedly ended up with incomplete chrome.

While investigating today, I found that depend rebuilds that have little (or actually nothing) to do are more likely to run into this, and that removing the -j2 option from my mozconfig makes it go away.

What happens is that comm.jar ends up missing some files, probably because multiple jar.mn files write to it and parallel runs of JarMaker race each other, overwriting each other's changes to the .jar file. We will probably run into this in the mozilla tree as well once toolkit/ or browser/ are converted to the parallel build infrastructure. I think suite/ just happened to be the first consumer of it that actually runs JarMaker.
Which platform, and which version of Python?

Which files are missing, and in which jar.mn are they?
Yeah, we really meant for parallel builds to be mainly focused on the compiler stuff, not chrome. The locking mechanism for rebuilding chrome is probably completely broken, if it exists at all.
I see this on Linux (openSUSE Factory) with Python 2.6. All the content/navigator/ files (suite/browser/jar.mn) end up missing practically every time. I also had a case where even more files were missing, but I didn't preserve it.
My habit of verifying problems in a build by nuking dist/ or dist/bin and rebuilding the same tree might help trigger this, as no actual compilation work needs to be done in that case, just recreation of symlinks, some preprocessing and rebuilding chrome.

Neil, what do you think about comment #2?
We create the jar files with locking at http://mxr.mozilla.org/mozilla-central/source/config/JarMaker.py#257; the locking is implemented in MozZipFile and then in http://mxr.mozilla.org/mozilla-central/source/config/utils.py#59.

Thus my question for more details on how to reproduce this problem.
Attached patch: Proposed patch
Before:
* First process opens comm.jar
* First process creates lockfile
* First process writes comm.jar
* Second process opens comm.jar (!)
* Second process waits
* First process deletes lockfile
* Second process creates lockfile
* Second process overwrites comm.jar (!!)
* Second process deletes lockfile
After:
* First process creates lockfile
* First process opens comm.jar
* First process writes comm.jar
* Second process waits
* First process deletes lockfile
* Second process creates lockfile
* Second process opens comm.jar
* Second process appends to comm.jar
* Second process deletes lockfile
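The "After" ordering can be sketched roughly as below. This is only an illustration of taking the lock before opening the archive, not the actual patch: `lock_file`, `add_to_jar`, `demo`, the `.lck` suffix and the two file names are all hypothetical, threads stand in for the parallel JarMaker processes, and it is written for Python 3's standard library rather than the Python 2.6 mentioned above.

```python
import os
import tempfile
import threading
import time
import zipfile
from contextlib import contextmanager


@contextmanager
def lock_file(path, poll=0.01, timeout=10.0):
    """Hypothetical helper: hold an exclusive lock file while the block runs."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists,
            # so only one writer at a time gets past this point.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire " + path)
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)


def add_to_jar(jar_path, name, data):
    """Create the lock first, then open the jar, then write."""
    with lock_file(jar_path + ".lck"):  # assumed lock file name
        # Opening only after the lock is held means any earlier writer
        # has finished, so mode "a" appends to the complete archive.
        with zipfile.ZipFile(jar_path, "a") as jar:
            jar.writestr(name, data)


def demo():
    """Two concurrent writers add one (made-up) entry each to comm.jar."""
    with tempfile.TemporaryDirectory() as tmp:
        jar_path = os.path.join(tmp, "comm.jar")
        entries = [
            ("content/navigator/navigator.js", "// first writer"),
            ("content/communicator/utils.js", "// second writer"),
        ]
        threads = [
            threading.Thread(target=add_to_jar, args=(jar_path, name, data))
            for name, data in entries
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with zipfile.ZipFile(jar_path) as jar:
            return sorted(jar.namelist())
```

In the "Before" ordering the archive would be opened outside the `with lock_file(...)` block, so the second writer could start from a stale view of comm.jar and drop the first writer's entries when it writes.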
Assignee: nobody → neil
Status: NEW → ASSIGNED
Attachment #349152 - Flags: review?
Comment on attachment 349152 [details] [diff] [review]
Proposed patch

Bah, silly calendar QA contact name conflict
Attachment #349152 - Flags: review? → review?(l10n)
Comment on attachment 349152 [details] [diff] [review]
Proposed patch

Duh.

r=me.
Attachment #349152 - Flags: review?(l10n) → review+
Could we get this into mozilla-central ASAP please?
Target Milestone: --- → mozilla1.9.1b2
Attachment #349152 - Flags: approval1.9.1b2? → approval1.9.1b2+
Comment on attachment 349152 [details] [diff] [review]
Proposed patch

a191b2=beltzner
It happened to me a few times too, on my Windows 2000 machine.
Pushed changeset 2f0fe196aa89 to mozilla-central.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
OS: Linux → All
Hardware: PC → All
Product: Core → Firefox Build System