Closed Bug 664453 Opened 13 years ago Closed 13 years ago

build with --enable-jprof often crashes at startup if JP_START is set

Status:

RESOLVED FIXED

People

(Reporter: jesup, Assigned: jesup)

Attachments

(1 file, 3 obsolete files)

Patch that defers initing jprof until we launch the real app (obsolete): 13 years ago, Randell Jesup [:jesup] (needinfo me), 2.88 KB, patch
Patch to defer start of profiling, and pause it around dangerous operations (fork/execv*) (obsolete): 13 years ago, Randell Jesup [:jesup] (needinfo me), 12.72 KB, patch
Patch - one more update (obsolete): 13 years ago, Randell Jesup [:jesup] (needinfo me), 15.46 KB, patch
Fix bitrot due to bug 552864: 13 years ago, Randell Jesup [:jesup] (needinfo me), 17.83 KB, patch, dbaron: review+

Randell Jesup [:jesup] (needinfo me)

Assignee

Description

•

13 years ago

Firefox frequently (50-80% of the time) crashes at startup if JP_START is set in JPROF_FLAGS; it may be worse with high interrupt rates.  This is probably a timing issue with setting the signal handler.

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

13 years ago

Assignee: nobody → rjesup

Status: NEW → ASSIGNED

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 1

•

13 years ago

Attached patch: Patch that defers initing jprof until we launch the real app (obsolete)

Defers jprof startup until we have a profile (etc); also pauses jprof when we're about to launch another app of any type.  Works much more reliably - I've been unable to break it now.

Attachment #540051 - Flags: review?(benjamin)

Benjamin Smedberg

Updated

•

13 years ago

Attachment #540051 - Flags: review?(benjamin) → review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 2

•

13 years ago

Could you explain why this makes sense?  It seems like starting at startup ought to work.

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 3

•

13 years ago

Starting it earlier causes random (frequent) crashes in library opens, things like segfaults, failures to catch the signal, and errors like this:

Inconsistency detected by ld.so: dl-open.c: 260: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!

Also, profiling the profile manager really is separate from profiling normal app startup. I tried some lesser changes (pausing it before exec-ing the new app, though I left that in) and they didn't solve the problem.

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 4

•

13 years ago

As backup, from the setitimer() man page:

  Notes

  A child created via fork(2) does not inherit its parent's interval timers. 
  Interval timers are preserved across an execve(2). 

I assume they're preserved across execv*().  Signal handlers (of course) are not, so any combination of setitimer() and execv* is DANGEROUS unless preceded by fork() (and maybe spawn()).
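The hazard described above can be reproduced in isolation. The sketch below is an illustration, not code from the patch: it forks a child, arms ITIMER_PROF with a handler installed, and then calls execv() without stopping the timer. The function name and the shell busy-loop are made up for the demonstration. Because the timer survives the exec but the handler is reset to the default action, the new program should be terminated by SIGPROF.

```c
#include <assert.h>
#include <signal.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_sigprof(int sig)
{
    (void)sig;   /* stand-in for a profiler's sampling handler */
}

/* Hypothetical demo helper. Returns the signal that terminated the exec'd
 * child, 0 if it exited normally, or -1 on error. */
static int exec_with_timer_running(long period_usec)
{
    assert(period_usec > 0 && period_usec < 1000000);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct itimerval tv = { {0, period_usec}, {0, period_usec} };
        /* A bounded CPU-burning loop, so the child ends even if it is not killed. */
        char *argv[] = { "sh", "-c",
                         "i=0; while [ $i -lt 3000000 ]; do i=$((i+1)); done",
                         NULL };

        signal(SIGPROF, on_sigprof);        /* handler: lost at exec */
        setitimer(ITIMER_PROF, &tv, NULL);  /* timer: kept across exec */
        execv("/bin/sh", argv);
        _exit(127);                         /* only reached if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling exec_with_timer_running(1000) and comparing the result with SIGPROF shows whether the exec'd program was killed by the inherited timer.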

Also, it appears that the atexit() handler that pauses profiling and dumps the address map won't get called:

From the atexit manpage:

       When  a child process is created via fork(2), it inherits copies of its parent’s registrations.
       Upon a successful call to one of the exec(3) functions, all registrations are removed.
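The quoted atexit() behavior can be checked with a small experiment. This is a sketch with made-up names, not jprof code: a forked child registers an atexit() handler that writes one byte to a pipe, then either calls exit() or execs a trivial shell command. The parent sees the byte only if the handler actually ran.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;   /* write end of the pipe, set in the child */

static void at_exit_report(void)
{
    char byte = 'x';
    if (write(report_fd, &byte, 1) != 1) {
        /* nothing useful to do at exit time */
    }
}

/* Hypothetical demo helper. Returns 1 if the child's atexit() handler ran,
 * 0 if it did not, or -1 on error. */
static int atexit_handler_ran(int do_exec)
{
    assert(do_exec == 0 || do_exec == 1);

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);
        report_fd = fds[1];
        if (atexit(at_exit_report) != 0)
            _exit(126);
        if (do_exec) {
            char *argv[] = { "sh", "-c", "exit 0", NULL };
            execv("/bin/sh", argv);   /* registrations are dropped here */
            _exit(127);               /* only reached if execv failed */
        }
        exit(0);                      /* normal exit runs the handler */
    }

    close(fds[1]);
    char byte;
    ssize_t n = read(fds[0], &byte, 1);   /* 0 bytes means the handler never ran */
    close(fds[0]);

    int status;
    waitpid(pid, &status, 0);
    return n == 1;
}
```

With do_exec set to 0 the handler runs at exit(); with do_exec set to 1 the exec'd shell exits without ever running it, which is why an atexit()-based dump of the address map cannot be relied on once exec is involved.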

See also https://code.google.com/p/chromium/issues/detail?id=84911 and https://bugzilla.redhat.com/show_bug.cgi?id=645528 for examples of some of the asynchronous dangers of setitimer() and fork/exec. 

It's simply unsafe to leave the setitimer running when you might call fork() or execv*() (and quite possibly spawn()).
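A minimal sketch of the "pause around fork/exec" approach follows. The helper names are hypothetical and this is not the code in the attached patches; it only shows the pattern of stopping ITIMER_PROF, spawning, and re-arming in the parent.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helpers; the names are not taken from jprof. */
static struct itimerval saved_timer;

/* Stop ITIMER_PROF and remember its previous setting. Returns 0 on success. */
static int profiling_pause(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, &saved_timer);
}

/* Re-arm ITIMER_PROF with whatever was saved by profiling_pause(). */
static int profiling_resume(void)
{
    return setitimer(ITIMER_PROF, &saved_timer, NULL);
}

/* fork() + execv() with the profiling timer stopped for the duration of the
 * fork in the parent. Returns the child's pid, or -1 on failure. */
static pid_t spawn_paused(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    if (profiling_pause() != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: fork() does not copy interval timers, so no SIGPROF
         * can arrive after the exec resets the handler. */
        execv(path, argv);
        _exit(127);   /* only reached if execv failed */
    }

    /* Parent, whether or not fork() succeeded: resume profiling. */
    profiling_resume();
    return pid;
}
```

Saving the old timer value through the third argument of setitimer() means the resume step restores the original period without the caller having to remember it. Code that calls execv*() directly, without a fork(), would need only the pause step, since nothing of the old program remains to resume.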

I'm revising the patch to also cover spawning plugin-container processes as well.

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 5

•

13 years ago

Comment on attachment 540051
Patch that defers initing jprof until we launch the real app

Pulling review request for now

Attachment #540051 - Flags: review?(dbaron)

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 6

•

13 years ago

The fork/exec issue may explain why launching flash (plugin-container) was sometimes failing with complaints about broken pipes in chromium ipc.

I'll note that there are some fork()/execv*() uses I can't easily modify: one in NSS (in safe_popen()), the other in ForkAndExec() in NSPR.

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 7

•

13 years ago

Attached patch: Patch to defer start of profiling, and pause it around dangerous operations (fork/execv*) (obsolete)

Ok, updated patch given what I found about spawning subprocesses via fork and execv*.  

This does not hit every use of those calls.  In particular, it doesn't hit nspr or nss (we can file a follow-on bug/patch on them), the updater app isn't touched (but normally isn't jprof'd anyway), and js.cpp is modified by this patch but the mod is disabled because there's a build-order issue to resolve (jprof.h hasn't been published in dist/include when js is built).  That can also be a follow-on.  These missed changes don't seem to cause much if any problem in general profiling, though there may be cases where they would fail.

In practice, normally the only things that had problems were initial system startup (crashing around profilemanager/etc) and spawning plugin-containers (i-looping when trying to start flash - this likely was the same problem Chromium had that I mentioned above).  Using profiling/setitimer() in a complex program is tricky at best, especially if you don't directly control every use of fork & exec in every library you use.  This is actually ok, so long as this is solely a debugging tool and it's "ok" if it fails once in a while.  Failing 8 of 10 startups though was obviously unacceptable.

Attachment #540051 - Attachment is obsolete: true

Attachment #540362 - Flags: review?(dbaron)

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 8

•

13 years ago

Attached patch: Patch - one more update (obsolete)

One last (I hope) update to cover another very indirect way we call fork() (the filepicker, which ~25 frames deep calls fork() from dbus).

There will be more I haven't found.  That's ok - jprof is a tool, and most uses of it won't hit any of these forks, OR the user will be using a longer period (the short period greatly increases the odds of the i-loop trying to clone() the memory map), OR the period will be short but the process won't be as huge as my testcase (the bigger the VM map, the bigger the chance of this biting you).  Even now, the only reason I noticed there was an issue is that it was biting me when profiling the startup code; normally you kill -PROF it right before your test, and -USR1 right after.

Attachment #540362 - Attachment is obsolete: true

Attachment #540362 - Flags: review?(dbaron)

Attachment #540377 - Flags: review?(dbaron)

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 9

•

13 years ago

Attached patch: Fix bitrot due to bug 552864

Bug 552864 changed how firefox gets started and shut down, and for example broke the use of atexit() to dump the address map.  This update to the patch gets things working again.

Attachment #540377 - Attachment is obsolete: true

Attachment #540377 - Flags: review?(dbaron)

Attachment #540977 - Flags: review?(dbaron)

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

13 years ago

Depends on: 666501

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 10

•

13 years ago

I've found one small hole in jprofing startup: if we get a SIGPROF while inside a malloc pthread lock, and backtrace() needs to load a library, and that load calls malloc(), you can get deadlocked.  I've only hit it once, and there's not too much one can do about it (probably force-load or statically load the library in question before enabling the setitimers).  I don't want to hold this patch (which makes profiling startup usable) for that edge case.

dbaron: any chance of a review before lockdown?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 11

•

13 years ago

Comment on attachment 540977
Fix bitrot due to bug 552864

I should have decided to do this ages ago, but given my review backlog, and the failures of my attempts to find somebody else interested in reviewing jprof code, I'm just going to rubber stamp this r=dbaron.  Sorry for taking so long to decide to do that.

Attachment #540977 - Flags: review?(dbaron) → review+

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 12

•

13 years ago

checked in as http://hg.mozilla.org/mozilla-central/rev/71c422d27ed4

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED


Categories

(Core :: General, defect)


Security

(public)
