PurgeScriptFragments may not purge all fragments

RESOLVED FIXED

Status

RESOLVED FIXED (defect)
Opened: 9 years ago
Last modified: 4 years ago

People

(Reporter: igor, Assigned: igor)

Tracking

1.9.2 Branch
Points:
---
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(blocking2.0 beta8+, status1.9.2 wontfix, status1.9.1 unaffected)

Details

(Whiteboard: [sg:critical] [qa-examined-192] [qa-needs-STR])

Attachments

(2 attachments, 4 obsolete attachments)

(Assignee)

Description

9 years ago
PurgeScriptFragments only looks for trace fragments in the TraceMonitor of the current thread. This misses other threads that may also hold fragments for the script. For example, the GC never runs on a web worker thread, so the GC will never purge any fragments for web worker functions. This is hazardous, as the fragments may match a newly allocated script that reused memory from the deleted script.
(Assignee)

Comment 1

9 years ago
The safest fix for the branches would be to enumerate over all threads in search of the script's fragments. But with web workers active, this would mean that any finalized script (including function scripts) would need to scan at least two extra vmfragments instances, implying a read of at least two 2K segments per script finalization. That could affect GC pause timing.

An alternative is to link all fragments for a JSScript into a list so the script can enumerate them all.
Assignee: general → igor
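The per-script list alternative from comment 1 can be sketched like this; `Fragment::nextForScript` and the struct layout are hypothetical, not the real data structures. Every recorded fragment is linked onto its owning script, so destruction walks only that script's fragments regardless of which thread recorded them:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical intrusive list node: one per recorded trace fragment.
struct Fragment {
    int id;
    Fragment* nextForScript = nullptr;  // link owned by the script
};

struct Script {
    Fragment* fragments = nullptr;  // head of this script's fragment list

    void addFragment(Fragment* f) {
        f->nextForScript = fragments;
        fragments = f;
    }

    // Release every fragment recorded for this script, on any thread.
    // Returns the number of fragments released.
    size_t purgeFragments() {
        size_t n = 0;
        for (Fragment* f = fragments; f;) {
            Fragment* next = f->nextForScript;
            delete f;
            ++n;
            f = next;
        }
        fragments = nullptr;
        return n;
    }
};
```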

Updated

9 years ago
Whiteboard: [sg:critical]
(Assignee)

Updated

9 years ago
blocking2.0: --- → ?

Updated

9 years ago
blocking2.0: ? → beta8+

Comment 2

9 years ago
Any update here?
blocking1.9.2: --- → ?
status1.9.2: --- → ?
(Assignee)

Comment 3

9 years ago
On the branches there is a problem with simply enumerating over all threads. There, JS_DestroyScript really destroys the script, and that function can be called outside the GC. Thus enumerating over all threads would require pulling in the GC machinery just to enumerate the thread data. This is too complex to add to the branches.

A safer solution would be to add a field to JSScript that tracks the thread used for the first recording and aborts subsequent recordings if that thread does not match the current one. That could hurt web workers if a worker is transferred to another thread, but that should be acceptable.
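A minimal sketch of that owner-thread idea (the field and method names are hypothetical): the first recording claims the script, and a recording attempt from any other thread is refused.

```cpp
#include <cassert>

// Hypothetical owner-thread guard: the script remembers the first thread
// that recorded a trace for it and rejects recording from any other thread.
struct Script {
    const void* ownerThread = nullptr;  // opaque thread identity

    bool tryStartRecording(const void* currentThread) {
        if (!ownerThread) {
            ownerThread = currentThread;  // first recording claims the script
            return true;
        }
        return ownerThread == currentThread;  // mismatch => abort recording
    }
};
```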
(Assignee)

Comment 4

9 years ago
Posted patch v1 (obsolete) — Splinter Review
The idea above of using an owner-thread field in JSScript would not lead to a simple patch. With such a change it would be necessary to clear the field when the JSThread is GC-ed but the script is not, which brings a lot of extra complexity.

So here is an implementation of the enumerate-all-threads idea that should be easy to port to the branches. On the branches it would not address the issue of JS_DestroyScript being called outside the GC for scripts that have fragments on other threads, but no such usage exists in the Firefox code base.

Comment 5

9 years ago
I would imagine this affects 1.9.1 as well, yes?
blocking1.9.1: --- → ?
blocking1.9.2: ? → .12+
status1.9.1: --- → ?
(Assignee)

Comment 6

9 years ago
(In reply to comment #5)
> I would imagine this affects 1.9.1 as well, yes?

No - on 1.9.1 the GC flushes all JIT-generated code. So there the bug can only affect scripts explicitly destroyed outside the GC phase, but those scripts do not escape to other threads in the Firefox code base.
blocking1.9.1: ? → ---
(Assignee)

Comment 7

9 years ago
Posted patch v2 (obsolete) — Splinter Review
Rebased version of v1 with a minimal set of changes.
Attachment #481230 - Attachment is obsolete: true
(Assignee)

Comment 8

9 years ago
Posted patch v3 (obsolete) — Splinter Review
The new patch adds a hash of scripts with recorded fragments to offset the performance impact of enumerating the fragments for all threads when destroying a script.
Attachment #482507 - Attachment is obsolete: true
Attachment #482522 - Flags: review?(gal)
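The optimization can be modeled as a set of scripts known to have recorded fragments: only members of the set pay for the all-threads scan at finalization. `TracedScripts` and its method names are illustrative, not the names in the patch.

```cpp
#include <cassert>
#include <set>

// Hypothetical registry: a script is added when a fragment is recorded for
// it, and a dying script only scans all threads' fragment tables if it is
// found (and removed) here.
struct TracedScripts {
    std::set<const void*> traced;

    void noteRecorded(const void* script) { traced.insert(script); }

    // True if the script had recorded fragments; removes it either way.
    bool needsAllThreadsPurge(const void* script) {
        return traced.erase(script) != 0;
    }
};
```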
(Assignee)

Comment 9

9 years ago
Posted patch v4 — Splinter Review
The new version adds missing OOM reporting.
Attachment #482522 - Attachment is obsolete: true
Attachment #482547 - Flags: review?(gal)
Attachment #482522 - Flags: review?(gal)
gal, can you do this review please?
Whiteboard: [sg:critical] → [sg:critical][needs r=gal]

Updated

9 years ago
Attachment #482547 - Flags: review?(gal) → review+

Updated

9 years ago
Whiteboard: [sg:critical][needs r=gal] → [sg:critical]

Comment 11

9 years ago
Fell off the plate until sayrer pinged me. Sorry for the delay.
(Assignee)

Comment 12

9 years ago
landed to TM - http://hg.mozilla.org/tracemonkey/rev/f9785814bdbc

For 1.9.2 the patch requires some porting effort, as the hashmap is not available there, so I suggest moving this to the next release on that branch.
blocking1.9.2: .13+ → ?

Comment 13

9 years ago
http://hg.mozilla.org/mozilla-central/rev/f9785814bdbc
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
per comment 12 it's now the "next release" on the branch.
blocking1.9.2: ? → .14+
(Assignee)

Comment 15

9 years ago
Posted patch patch for 192 (obsolete) — Splinter Review
Here is an untested backport to 1.9.2 that uses JSDHashTable to record traced scripts.
(Assignee)

Comment 16

9 years ago
Posted patch patch for 192 — Splinter Review
The patch for 1.9.2 deviates from the trunk version too much and requires its own review. Part of the reason is that JSDHashTable does not have a clear method, so the table has to be finished and re-initialized to remove all elements during flushing.
Attachment #497967 - Attachment is obsolete: true
Attachment #498162 - Flags: review?(gal)
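The workaround for the missing clear method can be modeled as a table API that offers only init and finish, so "remove all elements" during a flush is implemented as finish followed by re-init. The `DHashLike` type below is a stand-in, not the real JSDHashTable API:

```cpp
#include <cassert>
#include <map>

// Stand-in for a table API that has init/finish but no clear().
struct DHashLike {
    std::map<int, int>* storage = nullptr;

    bool init() {
        storage = new std::map<int, int>;
        return storage != nullptr;
    }
    void finish() {
        delete storage;
        storage = nullptr;
    }
    // Note: no clear() in the API.
};

// Flushing workaround from comment 16: tear the table down, rebuild it empty.
bool flushAll(DHashLike& table) {
    table.finish();
    return table.init();
}
```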
(Assignee)

Comment 17

8 years ago
gal - do you have time to look at the 1.9.2 patch today before the code freeze?

Comment 18

8 years ago
Comment on attachment 498162 [details] [diff] [review]
patch for 192

If I don't review a patch within 24h, please nag (I overlooked the email in that case).
Attachment #498162 - Flags: review?(gal) → review+
(Assignee)

Updated

8 years ago
Attachment #498162 - Flags: approval1.9.2.14?

Comment 19

8 years ago
It's blocking, no approval needed.
(In reply to comment #19)
> It's blocking, no approval needed.

That's never the case on the branch; please see the tree rules at the top of every tinderbox page. That may be true (most of the time) on the trunk, but the branches operate in permanent "last days of the release" mode. (I assume you're referring to the branch, since the patch was checked in on the trunk in comment 13.)

Comment 21

8 years ago
Comment on attachment 498162 [details] [diff] [review]
patch for 192

a=LegNeato for 1.9.2.14. Please land this asap today.
Attachment #498162 - Flags: approval1.9.2.14? → approval1.9.2.14+

Comment 22

8 years ago
I see. My mistake. In that case, though, what's the point of "blocking" a release branch? We should probably not use blocking on branches then, only wanted and approved.
Blocking means we'll hold the release for it; wanted means we want it but won't hold the release. Because of the short cycle times we want extra eyes on the actual patch -- think of it as a super-review-required type of step.

Comment 24

8 years ago
We briefly had a super-review-required rule for security patches, but we backed off of that because it was pointless.

Now the rule seems to be back, just different and even more pointless. Previously the super review was coming from a super-reviewer who was at least potentially qualified to review the patch at a technical level. Now the super-review is coming from a driver who most likely has no clue what the patch does?

Chris, do you have the slightest idea what the code in the patch you just approved does? Don't feel bad if not. It's a super-specialized corner case that probably 3-4 people on the planet really understand. You can't expect drivers to have a technical clue/review opinion on individual patches. That's simply not their job. These extra eyes are not qualified to see anything useful (and shouldn't be expected to). Reviews should come from module owners/peers.

For the last few branch patches I have seen, the approval step is basically just a delay/paperwork/stamping step. Why do it then? If you mark a bug blocking a branch, it means: module owners/peers, write a patch, have it reviewed, and land it asap. If you are worried about upcoming freezes/timing/scheduling and want to stabilize a branch, just close it.
(In reply to comment #24)
> Now the rule seems to be back, just different and even more pointless.

"back"? The branches have had the same rules now for as long as I can remember... probably at least 4-5 years, if not longer. This process isn't anything new. All branch patches require approval.

While it may seem burdensome, this process has caught numerous problems in the past that would have been missed otherwise had the branch drivers not specifically looked at the patch (even if they didn't understand all of it). Considering the branches get very little testing due to lack of nightly users and limited QA, having an extra pair of eyes is important.

Comment 27

8 years ago
(In reply to comment #24)

> Chris, do you have the slightest idea what the code in the patch does you just
> approved? Don't feel bad if not. Its a super specialized corner case that
> probably 3-4 people on the planet really understand. You can't expect drivers
> to have a technical clue/review opinion for individual patches. That's simply
> not their job. These extra eyes are not qualified to see anything useful (and
> shouldn't be expected to). Reviews should come from module owners/peers.

Nope, I do not. Sometimes I do, though more often I just look at the extent of the change and approve unless it looks huge. I see the approval required more to meter what goes into a release, not to determine if the fix is "correct". It also gives an additional chance for me (or other release drivers) to ask clarifying questions, make sure 1.9.1 gets a patch if needed, etc. Sure, the same thing can be done w/o needing the up front approvals, but the reality would be that the additional checking would go away after a month or so.

> For the last few branch patches I have seen the approval step is basically just
> a delay/paperwork/stamping step. Why do it then? If you mark a bug blocking a
> branch, it means: module owners/peers, write a patch, have it reviewed, and
> land it asap. If you are worried about upcoming freezes/timing/scheduling and
> want to stabilize a branch, just close it.

Yes, it is generally rubberstamping for JS blackmagic these days :-). It makes a lot more sense for other areas of the code.

I do think we need to revamp the way we manage branch blocking/approvals, I just haven't figured out the best way to do it. I've never encountered a release process that only approves bugs without requiring approval on the individual patches as well FWIW.
(Assignee)

Comment 28

8 years ago
On 1.9.2 the patch led to a busted leak test:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1295472073.1295478386.7556.gz&fulltext=1

My plan is to check the patch one more time and backout in an hour or so.
(Assignee)

Comment 29

8 years ago
To 1.9.2 drivers:

I could not find anything suspicious in the patch. Given the closed tree status, I think it is better to wait for a fresh build when the tree reopens.

I will be offline for the next 9 hours or so (need to sleep in CEST timezone).
(Assignee)

Comment 30

8 years ago
The tinderbox failure was caused by bug 627225 and is unrelated to the landed patch.
Are there any STR for 1.9.2 to verify this fix?
Whiteboard: [sg:critical] → [sg:critical] [qa-examined-192] [qa-needs-STR]
Depends on: 631105

Comment 32

8 years ago
I backed this out of 1.9.2 (both default and the .14 relbranch) due to it probably causing bug 631105
blocking1.9.2: .14+ → needed
It probably didn't cause bug 631105, not directly. Backing this out moved the crash to bug 633869 but the underlying cause of both now appears to be frankenbuilds.
We've landed and then (fairly publicly) backed out this change. We should try to get it into the next release if we can.
blocking1.9.2: needed → ?
Attachment #498162 - Flags: approval1.9.2.15?
blocking1.9.2: ? → .15+

Updated

8 years ago
Attachment #498162 - Flags: approval1.9.2.15? → approval1.9.2.15+
Comment on attachment 498162 [details] [diff] [review]
patch for 192

This turned out not to be the culprit for the crash spike, we want it back in now. Approved for 1.9.2.15, a=dveditz for release-drivers

clegnitto will try to re-land the patch he backed out, but may need a merge if something else changed.
Attachment #498162 - Flags: approval1.9.2.14+ → approval1.9.2.14-
blocking1.9.2: .17+ → .18+
Comment on attachment 498162 [details] [diff] [review]
patch for 192

Since this is one of the ones that led to the frankenbuild crash spike we want to wait another release. 3.6.17 is getting additional install/update fixes that should help that problem.
Attachment #498162 - Flags: approval1.9.2.17+ → approval1.9.2.18?
Comment on attachment 498162 [details] [diff] [review]
patch for 192

Approved for 1.9.2.18, a=dveditz for release-drivers

Now that users have the install fixes we want to land this.
Attachment #498162 - Flags: approval1.9.2.18? → approval1.9.2.18+
Flags: in-testsuite?
Keywords: testcase-wanted
blocking1.9.2: .18+ → .19+
Attachment #498162 - Flags: approval1.9.2.19?
Attachment #498162 - Flags: approval1.9.2.18-
Attachment #498162 - Flags: approval1.9.2.18+

Comment 38

8 years ago
We decided there is just too much risk at this point to take this in 3.6 due to our lower test population. Marking as wontfix for 1.9.2.
blocking1.9.2: .20+ → ---
Attachment #498162 - Flags: approval1.9.2.20? → approval1.9.2.20-
Group: core-security