Closed Bug 599610 Opened 14 years ago Closed 14 years ago

PurgeScriptFragments may not purge all fragments

Categories

(Core :: JavaScript Engine, defect)

1.9.2 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta8+
status1.9.2 --- wontfix
status1.9.1 --- unaffected

People

(Reporter: igor, Assigned: igor)

References

Details

(Whiteboard: [sg:critical] [qa-examined-192] [qa-needs-STR])

Attachments

(2 files, 4 obsolete files)

PurgeScriptFragments only looks for trace fragments in the TraceMonitor of the current thread. This misses other threads that may also hold fragments for the script. For example, the GC never runs on a web worker thread, so the GC will never purge any fragments for web worker functions. This is hazardous because the stale fragments may match a newly allocated script that reused memory from the deleted script.
The safest fix for branches would be to enumerate all threads in search of the script's fragments. But with web workers active this would imply that any finalized script (including function scripts) would need to scan at least two extra vmfragments instances, implying a read of at least two 2K segments per script finalization. That could affect GC pause timing.

An alternative is to link all fragments created for a JSScript into a list so the script can enumerate them all.
Assignee: general → igor
Whiteboard: [sg:critical]
blocking2.0: --- → ?
blocking2.0: ? → beta8+
Any update here?
blocking1.9.2: --- → ?
status1.9.2: --- → ?
On branches there is a problem with just enumerating all threads. There JS_DestroyScript really destroys the script, and that function can be called outside the GC. Thus enumerating all threads brings in the complexity of using the GC machinery just to enumerate the thread data. This is too complex to add to branches.

A safer solution would be to add a field to JSScript that tracks the thread used for the first recording and aborts any following recording if that thread does not match the current one. That could hurt web workers if a worker is transferred to another thread, but that should be acceptable.
Attached patch v1 (obsolete) — Splinter Review
The idea above of using an owner-thread field in JSScript would not lead to a simple patch. With such a change it would be necessary to clear the field when the JSThread is GC-ed but the script is not. That brings a lot of extra complexity.

So here is an implementation of the enumerate-all-threads idea that should be easy to port to branches. On branches it would not address the issue of JS_DestroyScript being called outside the GC for scripts with fragments on other threads, but no such usage exists in the FF code base.
I would imagine this affects 1.9.1 as well, yes?
blocking1.9.1: --- → ?
blocking1.9.2: ? → .12+
status1.9.1: --- → ?
(In reply to comment #5)
> I would imagine this affects 1.9.1 as well, yes?

No - on 1.9.1 the GC flushes all JIT-generated code. So there the bug can only affect scripts explicitly destroyed outside the GC phase, but those scripts do not escape to other threads in the FF code base.
blocking1.9.1: ? → ---
Attached patch v2 (obsolete) — Splinter Review
Rebased version of v1 with minimal amount of changes.
Attachment #481230 - Attachment is obsolete: true
Attached patch v3 (obsolete) — Splinter Review
The new patch adds a hash of scripts with recorded fragments to offset the performance impact of enumerating the fragments for all threads when destroying a script.
Attachment #482507 - Attachment is obsolete: true
Attachment #482522 - Flags: review?(gal)
Attached patch v4Splinter Review
The new version adds missing OOM reporting.
Attachment #482522 - Attachment is obsolete: true
Attachment #482547 - Flags: review?(gal)
Attachment #482522 - Flags: review?(gal)
gal, can you do this review please?
Whiteboard: [sg:critical] → [sg:critical][needs r=gal]
Attachment #482547 - Flags: review?(gal) → review+
Whiteboard: [sg:critical][needs r=gal] → [sg:critical]
Fell off the plate until sayrer pinged me. Sorry for the delay.
landed to TM - http://hg.mozilla.org/tracemonkey/rev/f9785814bdbc

For 1.9.2 the patch requires some porting effort, as the hashmap is not available there. So I suggest moving this to the next release on that branch.
blocking1.9.2: .13+ → ?
http://hg.mozilla.org/mozilla-central/rev/f9785814bdbc
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
per comment 12 it's now the "next release" on the branch.
blocking1.9.2: ? → .14+
Attached patch patch for 192 (obsolete) — Splinter Review
Here is an untested backport to 1.9.2 that uses JSDHashTable to record traced scripts.
Attached patch patch for 192Splinter Review
The patch for 1.9.2 deviates from the trunk version too much and requires its own review. Part of the reason is that JSDHashTable does not have a clear method, so the table must be finished and re-initialized to remove all elements during flushing.
Attachment #497967 - Attachment is obsolete: true
Attachment #498162 - Flags: review?(gal)
gal - do you have time to look at 192 patch today before the code freeze ?
Comment on attachment 498162 [details] [diff] [review]
patch for 192

If I don't review a patch within 24h, please nag (I overlooked the email in that case).
Attachment #498162 - Flags: review?(gal) → review+
Attachment #498162 - Flags: approval1.9.2.14?
It's blocking, no approval needed.
(In reply to comment #19)
> It's blocking, no approval needed.

That's never the case on the branch, please see the tree rules at the top of every tinderbox page. That may be true (most of the time) on the trunk; the branches, however, operate in permanent "last days of the release" mode. (I assume you're referring to branch since it was checked in on the trunk in comment 13.)
Comment on attachment 498162 [details] [diff] [review]
patch for 192

a=LegNeato for 1.9.2.14. Please land this asap today.
Attachment #498162 - Flags: approval1.9.2.14? → approval1.9.2.14+
I see. My mistake. In that case though, what's the point of "blocking" a release branch? We should probably not use blocking on branches then, only wanted and approved.
blocking means we'll hold the release for it, wanted means we want it but won't hold the release. Because of the short cycle times we want extra eyes on the actual patch -- think of it as a super-review-required type step.
We briefly had a super-review-required rule for security patches, but we backed off of that because it was pointless.

Now the rule seems to be back, just different and even more pointless. Previously the super-review came from a super-reviewer who was at least potentially qualified to review the patch at a technical level. Now the super-review comes from a driver who most likely has no clue what the patch does?

Chris, do you have the slightest idea what the code in the patch you just approved does? Don't feel bad if not. It's a super-specialized corner case that probably 3-4 people on the planet really understand. You can't expect drivers to have a technical clue/review opinion on individual patches. That's simply not their job. These extra eyes are not qualified to see anything useful (and shouldn't be expected to). Reviews should come from module owners/peers.

For the last few branch patches I have seen the approval step is basically just a delay/paperwork/stamping step. Why do it then? If you mark a bug blocking a branch, it means: module owners/peers, write a patch, have it reviewed, and land it asap. If you are worried about upcoming freezes/timing/scheduling and want to stabilize a branch, just close it.
(In reply to comment #24)
> Now the rule seems to be back, just different and even more pointless.

"back"? The branches have had the same rules now for as long as I can remember... probably at least 4-5 years, if not longer. This process isn't anything new. All branch patches require approval.

While it may seem burdensome, this process has caught numerous problems in the past that would have been missed otherwise had the branch drivers not specifically looked at the patch (even if they didn't understand all of it). Considering the branches get very little testing due to lack of nightly users and limited QA, having an extra pair of eyes is important.
(In reply to comment #24)

> Chris, do you have the slightest idea what the code in the patch does you just
> approved? Don't feel bad if not. Its a super specialized corner case that
> probably 3-4 people on the planet really understand. You can't expect drivers
> to have a technical clue/review opinion for individual patches. That's simply
> not their job. These extra eyes are not qualified to see anything useful (and
> shouldn't be expected to). Reviews should come from module owners/peers.

Nope, I do not. Sometimes I do, though more often I just look at the extent of the change and approve unless it looks huge. I see the approval required more to meter what goes into a release, not to determine if the fix is "correct". It also gives an additional chance for me (or other release drivers) to ask clarifying questions, make sure 1.9.1 gets a patch if needed, etc. Sure, the same thing can be done w/o needing the up front approvals, but the reality would be that the additional checking would go away after a month or so.

> For the last few branch patches I have seen the approval step is basically just
> a delay/paperwork/stamping step. Why do it then? If you mark a bug blocking a
> branch, it means: module owners/peers, write a patch, have it reviewed, and
> land it asap. If you are worried about upcoming freezes/timing/scheduling and
> want to stabilize a branch, just close it.

Yes, it is generally rubberstamping for JS blackmagic these days :-). It makes a lot more sense for other areas of the code.

I do think we need to revamp the way we manage branch blocking/approvals, I just haven't figured out the best way to do it. I've never encountered a release process that only approves bugs without requiring approval on the individual patches as well FWIW.
On 192 the patch led to a busted leak test:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1295472073.1295478386.7556.gz&fulltext=1

My plan is to check the patch one more time and back it out in an hour or so.
To 1.9.2 drivers:

I could not find anything suspicious in the patch. Given the closed-tree status I think it is better to wait for a fresh build when the tree reopens.

I will be offline for the next 9 hours or so (need to sleep in CEST timezone).
The tinderbox failure was caused by bug 627225 and is unrelated to the landed patch.
Are there any STR for 1.9.2 to verify this fix?
Whiteboard: [sg:critical] → [sg:critical] [qa-examined-192] [qa-needs-STR]
Depends on: 631105
I backed this out of 1.9.2 (both default and the .14 relbranch) due to it probably causing bug 631105.
blocking1.9.2: .14+ → needed
It probably didn't cause bug 631105, not directly. Backing this out moved the crash to bug 633869 but the underlying cause of both now appears to be frankenbuilds.
We've landed and then (fairly publicly) backed out this change. We should try to get it into the next release if we can.
blocking1.9.2: needed → ?
Attachment #498162 - Flags: approval1.9.2.15?
blocking1.9.2: ? → .15+
Attachment #498162 - Flags: approval1.9.2.15? → approval1.9.2.15+
Comment on attachment 498162 [details] [diff] [review]
patch for 192

This turned out not to be the culprit for the crash spike, we want it back in now. Approved for 1.9.2.15, a=dveditz for release-drivers

clegnitto will try to re-land the patch he backed out, but may need a merge if something else changed.
Attachment #498162 - Flags: approval1.9.2.14+ → approval1.9.2.14-
blocking1.9.2: .17+ → .18+
Comment on attachment 498162 [details] [diff] [review]
patch for 192

Since this is one of the ones that led to the frankenbuild crash spike we want to wait another release. 3.6.17 is getting additional install/update fixes that should help that problem.
Attachment #498162 - Flags: approval1.9.2.17+ → approval1.9.2.18?
Comment on attachment 498162 [details] [diff] [review]
patch for 192

Approved for 1.9.2.18, a=dveditz for release-drivers

Now that users have the install fixes we want to land this.
Attachment #498162 - Flags: approval1.9.2.18? → approval1.9.2.18+
Flags: in-testsuite?
Keywords: testcase-wanted
blocking1.9.2: .18+ → .19+
Attachment #498162 - Flags: approval1.9.2.19?
Attachment #498162 - Flags: approval1.9.2.18-
Attachment #498162 - Flags: approval1.9.2.18+
We decided there is just too much risk at this point to take this in 3.6 due to our lower test population. Marking as wontfix for 1.9.2.
blocking1.9.2: .20+ → ---
Attachment #498162 - Flags: approval1.9.2.20? → approval1.9.2.20-
Group: core-security