Closed Bug 605754 Opened 9 years ago Closed 9 years ago

crash [@ memcpy | DoReplace ]

Categories

(Core :: JavaScript Engine, defect, critical)

x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: scoobidiver, Assigned: cdleary)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files, 7 obsolete files)

8.01 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
14.71 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
12.58 KB, patch
Details | Diff | Splinter Review
Build: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b8pre) Gecko/20101019
Firefox/4.0b8pre

It is a residual crash signature that exists in trunk builds.
It is #79 top crasher in 4.0b8pre for the last week.

Comments say that it is related to HTML mail composing in RoundCube webmail or rich text note composing in Facebook.

Signature	memcpy | DoReplace
UUID	9c9ab159-ddc5-4bc1-b44d-4f7712101019
Time 	2010-10-19 08:14:36.784706
Uptime	113
Last Crash	149 seconds before submission
Install Age	113 seconds since version was first installed.
Product	Firefox
Version	4.0b8pre
Build ID	20101019041714
Branch	2.0
OS	Windows NT
OS Version	6.1.7600
CPU	x86
CPU Info	GenuineIntel family 6 model 23 stepping 10
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x1e9ffffe
App Notes 	AdapterVendorID: 8086, AdapterDeviceID: 2e32

Frame 	Module 	Signature [Expand] 	Source
0 	mozcrt19.dll 	memcpy 	memcpy.asm:257
1 	mozjs.dll 	DoReplace 	js/src/jsstr.cpp:2155
2 	mozjs.dll 	ReplaceCallback 	js/src/jsstr.cpp:2190
3 	mozjs.dll 	DoMatch 	js/src/jsstr.cpp:1845
4 	mozjs.dll 	str_replace_regexp 	js/src/jsstr.cpp:2358
5 	mozjs.dll 	js::str_replace 	js/src/jsstr.cpp:2475
6 		@0x4721114 	
7 	mozjs.dll 	js::mjit::EnterMethodJIT 	js/src/methodjit/MethodJIT.cpp:742
8 	mozjs.dll 	CheckStackAndEnterMethodJIT 	js/src/methodjit/MethodJIT.cpp:767
9 	mozjs.dll 	js::mjit::JaegerShot 	js/src/methodjit/MethodJIT.cpp:784
10 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:635
11 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:747
12 	mozjs.dll 	js_fun_apply 	js/src/jsfun.cpp:2369
13 		@0xe65dcc4 	
14 	mozjs.dll 	js::mjit::EnterMethodJIT 	js/src/methodjit/MethodJIT.cpp:742
15 	mozjs.dll 	CheckStackAndEnterMethodJIT 	js/src/methodjit/MethodJIT.cpp:767
16 	mozjs.dll 	js::mjit::JaegerShot 	js/src/methodjit/MethodJIT.cpp:784
17 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:635
18 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:747
19 	mozjs.dll 	js_fun_apply 	js/src/jsfun.cpp:2369
....

More reports at:
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=memcpy%20|%20DoReplace&range_value=4&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=memcpy%20|%20DoReplace
Assignee: general → cdleary
Status: NEW → ASSIGNED
Interesting crash signature! Notably, this started happening around the time of bug 601296 landing on m/c.

Some random suppositions:

- Mostly read violations, so the src of the memcpy, |sub.chars| is suspect, which points at RegExpStatics::getParen/LastMatch/etc.
- Nearly all of the addresses end with 0x...ffffe. I'm betting we're getting access violations when we roll over a megabyte boundary due to a bogus length. Length is determined by the contents of the RegExpStatic's matchPairs values.
- RoundCube uses TinyMCE for their rich text editing, and that one guy said it was happening when he hit "Send", which points at the serializer. Looking at the TinyMCE source, the serializer only seems to use numerical dollar substitutions, which would also point at bad parens/matchPairs values.
Yeah, there are those 2 before it, but they seem to have different signatures -- one is a write violation to something like looks like a valid CharBuffer address, and the other one is sign extended to a 64b address for some reason: 0xffffffffffff0005, but that couldn't be a reasonable offset from a valid |res->input| value, which is why I figured the rest of them correlate to the changes we made in bug 601296.

Hard to tell without heap data. I couldn't repro on Win7 with my RoundCube webmail account -- the guy said he was replying to somebody else's email, so the serializer might be handling data that the TinyMCE formatter didn't generate itself. I'm just shooting in the dark.
(In reply to comment #3)
> Yeah, there are those 2 before it

I wasn't going by date, but rather on the assumption that b6pre didn't have the patches in bug 601296 (so there are more than 2).
(In reply to comment #4)
> I wasn't going by date, but rather on the assumption that b6pre didn't have the
> patches

Ah yes, silly me. There are two crash reports there that looks like the others!
This is in the top 10 crash list for beta7 so far. We are going to nominate this one.
blocking2.0: --- → ?
Attached patch Failing memcpy diagnostics. (obsolete) — Splinter Review
Doing a bunch of things to see what sticks -- this patch just tryies to get a handle on what the source of the problem might be beyond "InterpretDollar gives something bad" by crashing closer to the source. A bunch of the call graph was duplicated with crashes replacing asserts.

Notably, since all the crashes are from reads at an address with suffix 0xffffe I detect whether the memcpy will cross such a value, and, if so, suck a bunch of the replaceValue chars into the black box.
Attachment #490776 - Flags: review?(dmandelin)
Comment on attachment 490776 [details] [diff] [review]
Failing memcpy diagnostics.

You might want to use something slightly different from 0xbbadbeef: The Nitro assembler uses that for its CRASH macro. But it's hard to see how they could overlap.
Attachment #490776 - Flags: review?(dmandelin)
Attachment #490776 - Flags: review+
Attachment #490776 - Flags: approval2.0+
My checkin got backed out because the following are not equivalent in MSVC -- I wrote:

volatile enum Color {

} * color;

Whereas it wanted:

enum Color {

};
volatile Color *color;

In any case, I put my MQ repo up at http://hg.mozilla.org/users/cleary_mozilla.com/regexp-crasher-mq -- checkin will be needed given success on http://hg.mozilla.org/try/rev/57ecec70d6f3
Looks good on try, so checkin needed!
Whiteboard: checkin-needed
Keywords: checkin-needed
Whiteboard: checkin-needed
(Checkin is for the MQ repo referenced above, not the patch attachment.)
Whiteboard: [c-n:http://hg.mozilla.org/users/cleary_mozilla.com/regexp-crasher-mq/
Whiteboard: [c-n:http://hg.mozilla.org/users/cleary_mozilla.com/regexp-crasher-mq/
blocking2.0: ? → beta9+
blocking2.0: beta9+ → betaN+
Could somebody mark this as blocking b8 for removal of the diagnostic patch? I'm looking at the results of the diagnostic patch this evening.
blocking2.0: betaN+ → ?
blocking2.0: ? → beta8+
can we remove the diagnostic patch?
I doubt the removal of the diagnostic patch needs to block the beta8 release. I also feel the underlying crash exists on beta7 so it's ok to ship beta8 if a fix doesn't come in before we freeze. Of course, a safe fix that lands before beta8 is the ideal case...just pretty sure we wouldn't hold for this particular bug.

Moving this to betaN, feel free to disagree.
blocking2.0: beta8+ → betaN+
(In reply to comment #15)
> Moving this to betaN, feel free to disagree.

Nope, was just unaware of the protocol! Thanks for the explanation. I can also #ifdef the diagnostics so they only trigger on the win32 platforms where we're seeing the issue.

In other news, I think I may have found the problem, thanks to the minidumps pointing me in the right direction. Patch coming up!
I found at least one potential source of error (and made a test to demonstrate it). Our regexps represent match segments as pairs of ints in a big match-pair vector. In a regexp like "foobarbaz".replace(/foo(bar)?bar/, "$1") the JIT initially assigns [3, 6] as the segment for the $1 paren, and later backtracks to make it not-so by turning the segment into [-1, 6]. In debug mode, both endpoints of the segment are set to -1, so some code was expecting [-1, -1] to result in (end - start) == 0 for some of the |JSSubString|s.

I've rectified that mistake, added some more diagnostics so we can better observe the remaining crashes, and #ifdef'd everything to XP_WIN32.

Would be cool if we can get this in for tomorrow's nightlies!

http://hg.mozilla.org/try/rev/49b201a6f0db
Attachment #494333 - Flags: review?(dmandelin)
(In reply to comment #17)
> Created attachment 494333 [details] [diff] [review]
> Fix bad start == end == -1 assumption.
> 
> I found at least one potential source of error (and made a test to demonstrate
> it). Our regexps represent match segments as pairs of ints in a big match-pair
> vector. In a regexp like "foobarbaz".replace(/foo(bar)?bar/, "$1") the JIT
> initially assigns [3, 6] as the segment for the $1 paren, and later backtracks
> to make it not-so by turning the segment into [-1, 6]. In debug mode, both
> endpoints of the segment are set to -1, so some code was expecting [-1, -1] to
> result in (end - start) == 0 for some of the |JSSubString|s.
> 
> I've rectified that mistake, added some more diagnostics so we can better
> observe the remaining crashes, and #ifdef'd everything to XP_WIN32.
> 
> Would be cool if we can get this in for tomorrow's nightlies!
> 
> http://hg.mozilla.org/try/rev/49b201a6f0db

First question--is anything in this patch meant to be permanent, or will you be backing out the entire thing in favor of a different bugfix patch later?
Regarding my question in comment 18:

- If this is a purely temporary patch, then it looks OK. I see you are storing a bunch of pointers to the stack to get back in the minidump, so I assume you just want to know if they are null, which ones are equal, etc., and not the pointed-to data.

- If part of this is to be a permanent fix, then:

  - please split into 2 patches, so we can land the fix permanently and the diagnostics temporarily
  - The #if 0 part can't go into the permanent fix
  - RegExpStatics::getLastParen is a little cryptic with the 2s. The + 1 is also non-obvious in getParen. I gather that the parens list really starts with the whole match. It might be better to just index that way everywhere. Otherwise there should be some more indication in the function names about what indexing scheme is in force, and getLastParen made more obvious with something along the lines of:

  int index = pairCountCrash() - 1 - 1 /* first is whole match */;
  if (index < 0) ...
Separated from the additional diagnostics.
Attachment #494333 - Attachment is obsolete: true
Attachment #494535 - Flags: review?(dmandelin)
Attachment #494333 - Flags: review?(dmandelin)
Attached patch 2. Make paren indexing uniform. (obsolete) — Splinter Review
This is certainly better -- the pre-YARR code used 0-based indexing because there was a separate paren substring vector.
Attachment #494536 - Flags: review?(dmandelin)
Attached patch 3. Additional diagnostics. (obsolete) — Splinter Review
Just looking at some more offsets and lengths, plus looking at the start of the repstr instead of starting at rdata.dollar because they're not very long strings in the common case.
Attachment #494537 - Flags: review?(dmandelin)
Attached patch 3. Additional diagnostics. (obsolete) — Splinter Review
Was missing the DOLLAR_EMPTY path after a patch conflict, added it back in.
Attachment #494537 - Attachment is obsolete: true
Attachment #494538 - Flags: review?(dmandelin)
Attachment #494537 - Flags: review?(dmandelin)
Thanks to the cleanup I realized that one of the guards is implied.
Attachment #494536 - Attachment is obsolete: true
Attachment #494539 - Flags: review?(dmandelin)
Attachment #494536 - Flags: review?(dmandelin)
Attached patch 3. Additional diagnostics. (obsolete) — Splinter Review
Simple rebasing.
Attachment #494538 - Attachment is obsolete: true
Attachment #494540 - Flags: review?(dmandelin)
Attachment #494538 - Flags: review?(dmandelin)
Attachment #494535 - Flags: review?(dmandelin) → review+
Attachment #494539 - Flags: review?(dmandelin) → review+
Attachment #494540 - Flags: review?(dmandelin) → review+
Not really fixed-in-tracemonkey until we see the crash stats I suppose.
Whiteboard: fixed-in-tracemonkey
up to 500 crashes per day on 4.0b7 and 20 per day on b8pre.  this should be one that we try and move up to blocking b8+ if the patch looks good.
Getting this patch in for beta8 - is this a viable proposal?
Sorry for the delayed response. These patches did, in fact, land on M-C earlier today:

http://hg.mozilla.org/mozilla-central/rev/03ff7e80545f
http://hg.mozilla.org/mozilla-central/rev/a93d62654d2d
http://hg.mozilla.org/mozilla-central/rev/be1f9b4e522e

They are *not* known to fix the problem in its entirety -- they fix an issue I've identified that *may* be the cause of crashers, but we won't know if that's the primary issue until we see the crash stats. The patches add some additional diagnostics as well.

Will those make it into beta 8?
We have not release-branched for beta 8 (and likely won't until right before it's ready to ship), so yes, anything on mozilla-central now will make it into b8.
Looks like the last reported crash build is 2010-12-02-030316. /me happily whips up a patch to rip out the diagnostics!
Attached patch Remove diagnostics. (obsolete) — Splinter Review
Leaves in JS_CRASH_UNLESS macro for future use. I actually also left in the #if 0 in YARR-JIT to make things more obvious when merging from upstream.
Attachment #490776 - Attachment is obsolete: true
Attachment #494540 - Attachment is obsolete: true
Attachment #495570 - Flags: review?(dmandelin)
Attachment #495570 - Flags: review?(dmandelin) → review+
Forgotten qrefresh.
Attachment #495570 - Attachment is obsolete: true
Resolving as fixed, with blog entry on how to successfully capture heap data to follow.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Crash Signature: [@ memcpy | DoReplace ]
You need to log in before you can comment on or make changes to this bug.