Closed Bug 669953 Opened 13 years ago Closed 13 years ago

build debug builds with optimization

Categories

(Release Engineering :: General, defect, P3)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ted, Assigned: espindola)

References

Details

(Whiteboard: [buildfaster:p1])

Attachments

(2 files, 4 obsolete files)

I was sure our debug builds were already optimized, but looking at the mozconfigs shows that I'm wrong: we explicitly pass --disable-optimize in their mozconfigs.

Because they're not optimized, test runs are much slower than they are on opt builds. If we're worried about getting good assertion stacktraces, we could use --enable-profiling to ensure that we have frame pointers.

Are there any other reasons to want these builds to be unoptimized?
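For illustration, a minimal sketch of what a "fast debug" mozconfig could look like (hypothetical, not the actual in-tree mozconfig; the include path and exact option set are assumptions):

. $topsrcdir/browser/config/mozconfig        # assumed include; real path may differ
ac_add_options --enable-application=browser
ac_add_options --enable-debug
ac_add_options --enable-optimize             # instead of the current --disable-optimize
ac_add_options --enable-profiling            # keep frame pointers for assertion stacks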
Whiteboard: [buildfaster]
Well, we'd still lose a good bit of info in the stack traces due to inlining -- which I suspect is also where the bulk of the performance benefit comes from.
Seems like it's worth trying to see what the perf benefit is, and we can also gauge if the stack traces are a problem.
fwiw, I want to flip the default when you're doing a debug build to --disable-optimize, so you're going to want to explicitly specify --enable-optimize here.
Whiteboard: [buildfaster] → [buildfaster:p2]
Assignee: nobody → coop
Priority: -- → P3
Assignee: coop → ehsan
(In reply to comment #4)
> the resulting build looks pretty crashy.

That _might_ be because you didn't have frame pointers enabled.  Anyways, I'll have more info on this shortly.
Try normal debug builds: http://tbpl.mozilla.org/?tree=Try&rev=23957115624b
Try "fast" debug builds: http://tbpl.mozilla.org/?tree=Try&rev=2dd95b3f917c

Good news is that builds are less than 10% slower and tests are around 50% faster!  Bad news is that Windows builds break, and Linux 32-bit debug builds all crash in JS_Assert.  I was not able to reproduce either issue locally, but I should also note that I wasn't using the compilers on our builders (VC8 and GCC 4.5).

Maybe someone with the right tools can step in and look into this?
Assignee: ehsan → nobody
The Linux builds are crashing in unittests, so it should be feasible for developers to download the builds and run them locally to try to reproduce the crash.
Also, the Windows hang looks like it happens while running xpcshell during "make package" (we run xpcshell to generate the startupcache).
I reduced the Linux failure to GCC removing the "address < 46" check in the following testcase:

extern void JS_Assert();

typedef enum {
  eax, ecx, edx, ebx, esp, ebp, esi, edi
} RegisterID;

union StateRemat {
  RegisterID reg_;
  int offset_;
};

static StateRemat FromRegister(RegisterID reg) {
  StateRemat sr;
  sr.reg_ = reg;
  return sr;
}

static StateRemat FromAddress3(int address) {
  StateRemat sr;
  sr.offset_ = address;
  //sr.offset_ = 0;
  if (address < 46 && address >= 0) {
    JS_Assert();
  }
  return sr;
}

struct FrameState {
  StateRemat dataRematInfo2(bool y, int z) {
    if (y)
      return FromRegister(RegisterID(1));
    return FromAddress3(z);
  }
};

FrameState frame;
StateRemat x;

void jsop_setelem(bool y, int z) {
  x = frame.dataRematInfo2(y, z);
}


which I am almost sure is a GCC bug. I will report it.
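For what it's worth, a rough way to check a testcase like this (file name and flags are illustrative, not what was actually run) is to compile to assembly at different optimization levels and see whether the call to JS_Assert survives:

# sketch only -- assumes the testcase above is saved as testcase.cpp
g++ -O0 -S -o testcase-O0.s testcase.cpp
g++ -O2 -S -o testcase-O2.s testcase.cpp
g++ -O2 -fno-tree-vrp -S -o testcase-novrp.s testcase.cpp
grep -c JS_Assert testcase-O0.s testcase-O2.s testcase-novrp.s

With a miscompiling GCC the -O2 output would drop the guarded call; -fno-tree-vrp is the kind of switch that sidesteps the broken pass (see the configure workaround mentioned further down).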
Rafael: is there a way we can work around that compiler bug?
It seems hard to work around. I will try adding the patch Richi mentions in the bug to our spec file and will report back whether that works.
I tried to reproduce the failure on Windows, but could not. My setup was VC 2005 on Windows 7.

I did a new Try push to see whether the problem still exists:

http://tbpl.allizom.org/?usebuildbot=1&tree=Try&rev=9c8e5599d475
Whiteboard: [buildfaster:p2] → [buildfaster:p1]
Assignee: nobody → respindola
That was just an upload failure. I am still going to debug what the

### ERROR: WalkStack64: Only part of a ReadProcessMemory or WriteProcessMemory request was completed.

was, but there is a new try run in:

http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=644d28700788
Depends on: 680277
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #16)
> That was just an upload fail. I am still going to debug what the
> 
> ### ERROR: WalkStack64: Only part of a ReadProcessMemory or
> WriteProcessMemory request was completed.
> 
> was, but there is a new try run in:
> 
> http://tbpl.allizom.org/?tree=Try&usebuildbot=1&rev=644d28700788

We figured this out.  This is just the Windows stack walking API failing to give us full stack information because of the lack of frame pointers.  I filed bug 680277 to fix this issue.
Our configure script now disables VRP if it finds a broken GCC (like the ones on the builders), and the missing frame pointers can be worked around by adding --enable-profiling.
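For context, a minimal sketch of the kind of detection configure could do (variable name, version test, and flag are assumptions; the real check may instead try compiling the reduced testcase):

# sketch only -- not the actual configure.in change
if test -n "$GNU_CC"; then
  case "`$CC -dumpversion`" in
    4.5.*)
      # assumed workaround: turn off the broken VRP pass
      CFLAGS="$CFLAGS -fno-tree-vrp"
      CXXFLAGS="$CXXFLAGS -fno-tree-vrp"
      ;;
  esac
fi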

There is a try push in:

http://tbpl.allizom.org/?usebuildbot=1&tree=Try&rev=e6b1e04eda75
Attached patch Enable optimizations (obsolete) — Splinter Review
Attachment #550489 - Attachment is obsolete: true
Attachment #556819 - Flags: review?(catlee)
Attachment #556819 - Flags: feedback?(ehsan)
So did the Win32 hang turn out to be intermittent or something? Your try runs in comment 16 and comment 18 look good.
(In reply to Ted Mielczarek [:ted, :luser] from comment #20)
> So did the Win32 hang turn out to be intermittent or something? Your try
> runs in comment 16 and comment 18 look good.

It was caused by some libraries having frame pointers and others not. Bug 680277 tracks the real fix. This patch avoids it by also using --enable-profiling.
Attachment #556819 - Flags: review?(catlee) → review+
Attachment 556819 [details] [diff] landed at 
  http://hg.mozilla.org/build/buildbot-configs/rev/ff5fc1e933d0
and went into production at 'Thu Sep 01 14:05:36 2011 -0700' with
  http://hg.mozilla.org/build/buildbot-configs/rev/b44dceb10136

Subsequently backed out for bustage on default
  http://hg.mozilla.org/build/buildbot-configs/rev/af7327dcc81a
and merged to production at 'Thu Sep 01 21:25:43 2011 -0700' with
  http://hg.mozilla.org/build/buildbot-configs/rev/9e669f6c07ec
Attachment #556819 - Flags: feedback-
Comment on attachment 556819 [details] [diff] [review]
Enable optimizations

Oops, wrong flag.
Attachment #556819 - Flags: feedback- → checked-in-
What caused this to bounce?
Linux32 debug was crashing during make package; espindola and bz sounded like they were getting a handle on why last night.
The Linux 32-bit problem was bug 684526.
Comment on attachment 556819 [details] [diff] [review]
Enable optimizations

I thought I had plussed this...
Attachment #556819 - Flags: feedback?(ehsan) → feedback+
Attached patch Enable optimizations (obsolete) — Splinter Review
Attachment #556819 - Attachment is obsolete: true
Attachment #561770 - Flags: review?(catlee)
I was able to reproduce the 10.5 make check failure. I am debugging it.
After finding:

* a bug in GCC
* a strict aliasing violation in the JS engine
* a compartment GC + TI bug
* a build bug that caused js and the rest of the tree to be built with different flags
* and a build bug that caused us to create debug builds without frame pointers

it looks like this is finally working :-)

Is the patch OK?
Attachment #561770 - Flags: review?(catlee) → review+
This is now live.
\o/
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Reopening to enable this on try and m-i too....
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #563500 - Flags: review?(aki)
Attachment #563500 - Flags: review?(aki) → review+
This is now also live for try and m-i!
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
This caused WinXP debug reftest to go orange with:

TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 256 bytes during test execution
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of Mutex with size 12 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of ReentrantMonitor with size 16 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsTArray_base with size 4 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsThread with size 128 bytes
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | leaked 1 instance of nsTimerImpl with size 72 bytes

on both mozilla-central and mozilla-inbound.
Backed out.

http://hg.mozilla.org/build/buildbot-configs/rev/b275f12b6356

  934  hg pull -u 
  935  hg up -r default
  936  hg export 4822 | patch -R -p1
  937  hg export 4816 | patch -R -p1
  938  hg commit
  939  hg up -r production
  940  hg merge -r default
  941  hg commit
  942  hg push
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Incidentally, this ended up triggering the following odd command, with a confused optimize on/off state, in Windows debug build logs:
===
running /bin/sh /e/builds/moz2_slave/m-cen-w32-dbg/build/nsprpub/configure  --enable-application=browser --enable-jemalloc --enable-debug --enable-libxul --enable-trace-malloc --enable-tests --with-dist-prefix=e:/builds/moz2_slave/m-cen-w32-dbg/build/obj-firefox/dist --with-mozilla --enable-optimize --enable-debug --disable-optimize --cache-file=.././config.cache --srcdir=/e/builds/moz2_slave/m-cen-w32-dbg/build/nsprpub
===
(note the "--enable-optimize --enable-debug --disable-optimize" there)

https://tbpl.mozilla.org/php/getParsedLog.php?id=6615313&tree=Firefox
WINNT 5.2 mozilla-central leak test build on 2011-09-29 12:38:54 PDT for push 60e86b847759

This may merit a separate bug, but I thought I'd point it out in case it's worth fixing before this re-lands.
Was the leak permanent? A Bugzilla search suggests:

https://bugzilla.mozilla.org/show_bug.cgi?id=687857

I am out tomorrow, but if no one gets to it first, on Monday I will try to debug both the nsprpub/configure invocation and the leak.
Yes, permanent, and 256 bytes in reftest, while bug 687857 is 2-4 megabytes in mochitest-chrome.
I was able to reproduce the bug and I am trying to reduce the testcase a bit.

While at it, I have a try run that tries at least to fix the "--enable-optimize --enable-debug --disable-optimize" silliness.
 
https://tbpl.mozilla.org/?tree=Try&rev=a88fa132ddaf
This patch makes us always pass --(enable|disable)-(optimize|debug) explicitly to the NSPR configure. That way we don't have to worry about keeping the defaults of the two configures in sync.

It also removes the trace-malloc hack from the top-level configure. If there are NSPR-only restrictions on trace malloc and optimization, the NSPR configure should handle them.
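For illustration, the shape of the change in the top-level configure might look roughly like this (variable names assumed; not the actual patch):

# sketch: pass explicit flags to the NSPR sub-configure rather than
# relying on its defaults matching ours
if test -n "$MOZ_DEBUG"; then
  ac_configure_args="$ac_configure_args --enable-debug"
else
  ac_configure_args="$ac_configure_args --disable-debug"
fi
if test -n "$MOZ_OPTIMIZE"; then
  ac_configure_args="$ac_configure_args --enable-optimize"
else
  ac_configure_args="$ac_configure_args --disable-optimize"
fi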
Attachment #564300 - Flags: review?(khuey)
One interesting bit of info: the leak only happens if NS_TRACE_MALLOC_DISABLE_STACKS is set.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #45)
> It also removes the hack for trace malloc from the top level configure. If
> there are nspr only restrictions on tracing and optimizations, the nspr
> configure should handle them.

Did you look at cvs/hg blame to see why it was added?

(In reply to Rafael Ávila de Espíndola (:espindola) from comment #46)
> One interesting info, the leak only happens if
> NS_TRACE_MALLOC_DISABLE_STACKS is set.

That seems most likely to be due to a change in performance characteristics.
Adding some logging shows that it is the mMainThread in the thread manager that is leaking.
Depends on: 692122
Comment on attachment 564300 [details] [diff] [review]
don't pass "--enable-optimize --enable-debug --disable-optimize" to sub configure

Sorry for the delay, I've been away from the computer for a few days.
Attachment #564300 - Flags: review?(khuey) → review+
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #49)
> A new try run that includes the fix to bug 692122 is at
> 
> https://tbpl.mozilla.org/?tree=Try&rev=0d0283093e31

There's a mochitest-chrome perma-leak on Windows XP debug with this patch.  Also, all of the Android debug tests are orange too.  Maybe somebody from the mobile team can tell us whether that's normal or not?
I think joel is tracking an issue with debug tests on Android; I can't recall which bug, though.
Yes, debug tests on Android have been perma-orange for a while: bug 691071
That is not at all the problem with those runs. The problem with those runs is that some mobile meeting told aki that it would be useful to run debug tests even without having the assertions make it into the logs, which was, to put the best possible face on it, something which turns out not to be entirely the truth.

That is 26 utterly wasted test runs which (absent a few infra failures) are orange because they don't match the expected assertion counts, doing things like having 8 assertions in min-width-1b.html. What assertions? Who knows? Annotate them? As what? File a bug on the assertions? Which assertions?
And a new try run on top of the previous fixes is at

https://tbpl.mozilla.org/?tree=Try&rev=f696664da2c0

I will debug the leak if it is still there on Tuesday.
(In reply to David Baron [:dbaron] from comment #47)
> (In reply to Rafael Ávila de Espíndola (:espindola) from comment #45)
> > It also removes the hack for trace malloc from the top level configure. If
> > there are nspr only restrictions on tracing and optimizations, the nspr
> > configure should handle them.
> 
> Did you look at cvs/hg blame to see why it was added?

That was

https://github.com/ehsan/mozilla-history/commit/785dd8c8b40e7676ec0acfbffc964aee8177dbcd

https://bugzilla.mozilla.org/show_bug.cgi?id=126915
Attachment #561770 - Attachment is obsolete: true
Attachment #563500 - Attachment is obsolete: true
Attachment #566113 - Flags: review?(ted.mielczarek)
Comment on attachment 566113 [details] [diff] [review]
Enable optimizations

Review of attachment 566113 [details] [diff] [review]:
-----------------------------------------------------------------

Go for it!
Attachment #566113 - Flags: review?(ted.mielczarek) → review+
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #61)
> Enable optimizations

https://hg.mozilla.org/mozilla-central/rev/52103e8ae17e
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 731936
Product: mozilla.org → Release Engineering