Closed Bug 670175 Opened 9 years ago Closed 8 years ago

[10.7] Cannot start nightly trunk build after updating from 20110707 build

Categories

(Core :: Memory Allocator, defect, critical)

x86
macOS

RESOLVED WORKSFORME

People

(Reporter: marcia, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: hang, regression)

Attachments

(3 files)

I updated from 20110707 to 20110708 and now I cannot launch the trunk build. Either it hangs at the Profile Manager and has to be force quit, or it launches and then hangs instantly when I try to load any site, and I have to force quit. I tried several new profiles and the same thing happens.

I updated on my 10.6 machine and had no issues there.
Possibly related crash report on IRC: https://crash-stats.mozilla.com/report/index/298c0a59-a3ce-4e37-9213-496092110708

I am going to try the other 10.7 machine in the lab as well to see what happens there. Both machines are running 11A511.
I just downloaded today's mozilla-central nightly (firefox-2011-07-08-03-08-00-mozilla-central) separately, and had no trouble running it on the 10.7 GM (build 11A511).

Later I'll try explicitly updating from yesterday's nightly.

Does it make any difference to use a clean profile?
Oops, I spoke too soon.

I tried today's nightly with a fresh profile, and it loaded.  But then it hung as soon as I tried to visit http://www.apple.com/.
I tried using a clean profile but I could not get it to launch once I had already updated to today's build.

The 10.7 machine in the lab is exhibiting the same behavior after updating. The build hangs and I have to force quit. When I try to create a new profile it hangs at the profile manager and does not let me get beyond that step.
Summary: [10.7] Cannot start nightly trunk build → [10.7] Cannot start nightly trunk build after updating from 20110707 build
The way I create a clean profile is to delete or rename ~/Library/Application Support/Firefox -- which is (probably) why I don't see your Profile Manager hang.

I'm building an opt build with debug symbols (from current trunk code), to see what I can find out.
I just tested today's aurora nightly, and it doesn't have this problem.

Big sigh of relief!
http://tinyurl.com/6goc6mr shows up in crash stats today as a new signature for 10.7 only - users other than the one in Comment 1 are hitting it.

Adding Paul since I see libjemalloc.dylib in the module list up at the top.
I can no longer reproduce this bug as we've been describing it ... for reasons I can't fathom.

But now I'm seeing something else just as bad:  I start today's mozilla-central nightly, wait about 30 seconds, and move the mouse -- then I either crash or hang.

jemalloc sounds like a plausible reason for these problems.  I'll try disabling it (in source code) and see if that makes any difference.
Attached file Gdb crash stack
Here's a stack I got crashing in gdb (using my opt build with debug symbols).

And here are two crash stacks from using today's nightly:
bp-dbd2afff-b240-443e-aa49-4c8d42110708
bp-4dca78b0-9389-435d-96ec-874f82110708

Interesting that both of the latter are in font code ... but I suspect that's not relevant.
Turning off jemalloc on the Mac made my problems go away.

I'm doing a tryserver build, which should be available tomorrow morning.  Marcia, you can test with it once it's available.
The timing seems to match the enabling of jemalloc on Mac in bug 414946:

1ad1fd67e97a 2011-07-07 14:38 -0700	Paul Biggar - Bug 414946 (part 2): Enable jemalloc on Mac (r=pavlov)
2b2f584dc5fd 2011-05-21 20:27 -0700	Paul Biggar - Bug 414946 (part 1): Fix jemalloc on Mac, but leave disabled (r=pavlov)
Steven, can you try the original crashing builds with the NO_MAC_JEMALLOC environmental variable set? Also, when you say "turning off jemalloc" in comment 11, can you describe what you did (I'm guessing reverting "part 2" (1ad1fd67e97a))?
(In reply to comment #10)
> Created attachment 544890 [details]
> Gdb crash stack
> 
> Here's a stack I got crashing in gdb (using my opt build with debug symbols).

Thanks for this. This points very strongly to jemalloc being the culprit. I'm investigating another jemalloc related bug now (talos tp5 regression on 10.5), and will get back to you shortly.
> Steven, can you try the original crashing builds with the
> NO_MAC_JEMALLOC environmental variable set?

I'll do that (I didn't realize it was possible to turn off jemalloc
without altering the code).

> Also, when you say "turning off jemalloc" in comment 11, can you
> describe what you did (I'm guessing reverting "part 2"
> (1ad1fd67e97a))?

Reverting "part 2" is exactly what I did.
(Following up comment #9 and comment #15)

> But now I'm seeing something else just as bad: I start today's
> mozilla-central nightly, wait about 30 seconds, and move the mouse
> -- then I either crash or hang.

Here are more precise STR:

1) Start FF, then after the main window comes up wait for 30 seconds
   without doing anything.

2) Move the mouse to a location where the cursor would normally change
   shape -- for example over the location bar, where it normally
   changes from an arrow to an I-beam. 

   At this point I normally hang or crash.

I don't hang or crash if I run firefox-bin from a Terminal prompt, and
prior to that enter the following at the command line:

export NO_MAC_JEMALLOC=YES
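
For anyone repeating this workaround, here is a minimal sketch of scoping the variable to a single launch instead of exporting it for the whole Terminal session. The install path and the helper name run_without_jemalloc are assumptions for illustration, not something taken from this bug; only the NO_MAC_JEMALLOC=YES setting comes from the comments above.

```shell
# Assumed install location of the nightly; override FIREFOX_BIN if yours differs.
FIREFOX_BIN="${FIREFOX_BIN:-/Applications/Nightly.app/Contents/MacOS/firefox-bin}"

# Hypothetical helper: run one command with NO_MAC_JEMALLOC set only for that
# command, so the rest of the shell session is left untouched.
run_without_jemalloc() {
    NO_MAC_JEMALLOC=YES "$@"
}

if [ -x "$FIREFOX_BIN" ]; then
    run_without_jemalloc "$FIREFOX_BIN"
else
    echo "firefox-bin not found at $FIREFOX_BIN" >&2
fi
```

Launching the same binary without the helper should then show the original behavior, which makes it easy to compare the two cases back to back.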
> Here are more precise STR:
>
> 1) Start FF, then after the main window comes up wait for 30 seconds
>    without doing anything.
>
> 2) Move the mouse to a location where the cursor would normally
>    change shape -- for example over the location bar, where it
>    normally changes from an arrow to an I-beam.
>
>    At this point I normally hang or crash.

This *still* isn't quite right.  Here's better:

1) Start FF, then after the main window appears (the default
   about:home page), wait until the cursor I-beam in the Google search
   box momentarily stops flashing.

2) Quickly (before the I-beam cursor starts flashing again) move the
   mouse over "about:home" in the location bar.

   At this point you'll normally hang.

For some reason it's quite difficult to reproduce these STR in gdb.
It helps if you first 'set args -foreground' in gdb (to make FF run in
the foreground).
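
As a rough sketch, the same thing can be done from the shell in one step: gdb's --args option passes the remaining arguments to the program, which has the same effect as 'set args -foreground' inside gdb. The install path below is an assumption, not something stated in this bug.

```shell
# Assumed install location of the nightly; override FIREFOX_BIN if yours differs.
FIREFOX_BIN="${FIREFOX_BIN:-/Applications/Nightly.app/Contents/MacOS/firefox-bin}"

if command -v gdb >/dev/null 2>&1 && [ -x "$FIREFOX_BIN" ]; then
    # Equivalent to starting gdb on firefox-bin and then 'set args -foreground';
    # type 'run' at the (gdb) prompt to start Firefox.
    gdb --args "$FIREFOX_BIN" -foreground
else
    echo "gdb or firefox-bin not available; nothing to debug" >&2
fi
```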

The "hang" doesn't necessarily last forever.
Crash Signature: [@ libsystem_c.dylib@0x6ac31 ]
Blocks: 414946
Component: General → jemalloc
Product: Firefox → Core
QA Contact: general → jemalloc
I don't think this happens on all 10.7 machines, as I tried it on jruderman's machine (prerelease 10.7) and didn't have a problem.

Can I get access to a machine on which this happens? Do we have one in our test lab?
You need to find a machine with the GM DP (build 11A511).  That's what both Marcia and I have been testing with.
I have a machine in the QA lab that has the Gold Master installed.
I tried to duplicate this in the test lab, but couldn't. A good way to move forward is to disable jemalloc on 10.7, and keep it enabled on 10.6. Here's a nightly build with that change made:

https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/pbiggar@mozilla.com-933061c83fd2/

Can you confirm that you can no longer replicate this bug?
> Can you confirm that you can no longer replicate this bug?

I can't (testing on the 10.7 GM, build 11A511).
Um, slight ambiguity there. I think you can no longer replicate the bug?
Oops, sorry.

I can no longer reproduce the bug with your tryserver build.

Though I still can reproduce it with the 2011-07-08 nightly (the first nightly with jemalloc), using my STR from comment #17.
Since we backed out jemalloc on 10.7, this is worksforme.  We're going to track enabling jemalloc on 10.7 in bug 694896.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME