149336 - Linux Flash crashes and causes stack curruption in NPP_New when swLiveconnect=TRUE [Crash loading www.joins.com] [@ 0x00000000 - nsTableCellFrame::Reflow]

Reporter

Description

•

23 years ago

seen on linux commercial branch build 2002-06-05-07-1.0.0 -load www.joins.com the page loads veryu slowly. it takes a few minutes, but the app eventually crashes. unfortunately, the talkback incidents I submitted aren't showing up in the database.

Tracy Walker [:tracy]

Reporter

Comment 1

•

23 years ago

this is an international topsite.

Severity: critical → blocker

Keywords: smoketest

Roy Yokoyama

Comment 2

•

23 years ago

any other platforms? cc i18n folks.

Tracy Walker [:tracy]

Reporter

Comment 3

•

23 years ago

mac osx and windows builds from this morning do not crash.

Roy Yokoyama

Comment 4

•

23 years ago

this site again, shhh. All crashing bugs (83670, 105127, 126229, 109356) resulted to WFM. Can anyone else reproduce this?

Yuying Long

Comment 5

•

23 years ago

I can reproduce it on 06-05 branch build, I saw the crash after finished load the page. Somehow I can not find the talkback ID right now. However, with 06-04 branch build I didn't get this crash, a regression from yesterday?

Roy Yokoyama

Comment 6

•

23 years ago

Any stack trace/talkback help here since I have no linux. shanjian/ftang/smontagu: I need help on Linux platform.

Roy Yokoyama

Comment 7

•

23 years ago

WFM with smontagu's debug build in Linux. His machine didn't have the plugin. tracy/ylong: do you have the plugin installed?

Tracy Walker [:tracy]

Reporter

Comment 8

•

23 years ago

this all the talkback incident has: Incident ID 7032910 Stack Signature Email Address twalker@netscape.com Product ID Gecko1.0 Build ID 2002060507 Trigger Time 2002-06-05 10:27:14 Platform LinuxIntel Operating System Linux 2.2.12-20smp Module URL visited www.joins.com User Comments crashed loading www.joins.com Trigger Reason SIGSEGV: Segmentation Fault: (signal 11) Source File Name Trigger Line No. Stack Trace no stack trace present. :-/ flash plugin is installed and works on other sites

Rui Xu

Updated

•

23 years ago

QA Contact: ruixu → ylong

Rui Xu

Updated

•

23 years ago

Keywords: intl

Roy Yokoyama

Comment 9

•

23 years ago

>WFM with smontagu's debug build in Linux. My bad. We were looking at the today's trunk build. After downloading the branch, we were able to reproduce the crash. Simon is pulling the branch source now.

Roy Yokoyama

Comment 10

•

23 years ago

nsDocumentViewer.cpp change is the only checkin I can think of may cause this problem. Bug 132672 Other checkins are mail/news specific. cc rods@netscape.com

Roy Yokoyama

Comment 11

•

23 years ago

smontagu/shanjian: can I ask you to rollback rod's change (Bug 132672) on your linux branch build?

Simon Montagu :smontagu

Comment 12

•

23 years ago

I can't reproduce this in a mozilla branch build either with or without the plugin. I should soon have a commercial branch debug build for testing.

Keywords: intl

QA Contact: ylong → ruixu

Simon Montagu :smontagu

Comment 13

•

23 years ago

... but I can reproduce the crash with a downloaded mozilla branch build, BuildID 2002060507

Rui Xu

Updated

•

23 years ago

QA Contact: ruixu → ylong

Peter Lubczynski

Comment 14

•

23 years ago

If anyone has an updated branch build, could we try backing out Rod and see if that makes the crash go away?

QA Contact: ylong → ruixu

Peter Lubczynski

Updated

•

23 years ago

QA Contact: ruixu → ylong

serge (gone)

Assignee

Comment 15

•

23 years ago

I doubt rods chechin causes this crash, here is the simple test I've tried: the patch http://bugzilla.mozilla.org/attachment.cgi?id=84435&action=view changed only libgkcontent.so, and there is no more changes in libgkcontent.so, I swapped that lib between 2002-06-04-11-1.0.0 & 2002-06-05-07-1.0.0 builds, the first one is still stable, when last one is still crashing:(

Peter Lubczynski

Comment 16

•

23 years ago

Is this an old bug? Did this ever work correctly?

rods (gone)

Comment 17

•

23 years ago

You can easily "back out" my checkin by commenting out lines 1372 & 1373 in the Documentviewer.cpp http://lxr.mozilla.org/mozilla1.0/source/content/base/src/nsDocumentViewer.cpp#1372 But I don't think my check in has anything to do with the crash, I can load the URL with the trunk and without the flash plugin, but that probably isn't a very meaningful test.

QA Contact: ylong → ruixu

rods (gone)

Comment 18

•

23 years ago

Can you attach a stack trace now that you can get it to crash?

Yuying Long

Comment 19

•

23 years ago

The crash in this site is an old problem. We never had a solution to fix it. And because of this web page contains changes everyday, so once a while, we have this crash and then after a few days, then the crash gone away.

Keywords: crash, intl

QA Contact: ruixu → ylong

Simon Montagu :smontagu

Comment 20

•

23 years ago

If the problem was with the site content, we would be seeing the crash when testing today with both yesterday's build and today's build. Is that the case?

Keywords: crash, intl

QA Contact: ylong → ruixu

Yuying Long

Comment 21

•

23 years ago

With same web page content(today), it crashes on today's build but not yesterday's build for me and Tracy.

Keywords: crash, intl

serge (gone)

Assignee

Comment 22

•

23 years ago

it could be plugins problem bug 143178, I'm bulding debug build now

Rui Xu

Comment 23

•

23 years ago

Reassign to the correct IQA owner: ylong@netscape.com and feel free to ask us for any help.

QA Contact: ruixu → ylong

Shanjian Li

Comment 24

•

23 years ago

I could not reproduce the problem with my branch debug build (today's pull). Base one ylong's information, we might want to later this one and open the tree.

Simon Montagu :smontagu

Comment 25

•

23 years ago

I still haven't succeeded in getting a stack trace, because I can't reproduce consistently in a debug build, but I got some information from running a downloaded build in debugger: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 1024 (LWP 10910)] 0x4153f176 in NSGetModule () from /u/smontagu/moz_0605/mozilla/components/libgklayout.so (gdb)

Simon Montagu :smontagu

Comment 26

•

23 years ago

Attached file Finally a stack trace — Details

Peter Lubczynski

Comment 27

•

23 years ago

hm...stack points to Reflow....but I think there are some other bugs just like this one. I'm not convinced this is a new problem but it's strange yesterday's builds work. Maybe some curruption of sorts? Anyone who can reproduce this in a debug branch build can you try backing out Rod?

Simon Montagu :smontagu

Comment 28

•

23 years ago

I backed out Rod and haven't crashed since, but this may be a false lead because I didn't crash every time before.

serge (gone)

Assignee

Comment 29

•

23 years ago

I cannot crash branch debug build with plugins or w/o plugins all just WFM, I did surf joins.com for about 15+ min:(

QA Contact: ylong → ruixu

Roy Yokoyama

Comment 30

•

23 years ago

serge: Is your build with or without rod's patch?

QA Contact: ruixu → ylong

Adam Hauner

Comment 31

•

23 years ago

Bugs with nsBlockFrame::ReflowDirtyLines - bug 117088, bug 120165, bug 134917 - all marked as WFM.

URL: http://www.joins.com → http://www.joins.com/

rods (gone)

Comment 32

•

23 years ago

I just pulled the branch on Linux w/o the plugin is loads fine, no crash. I haven't installed the plugin yet.

URL: http://www.joins.com/ → http://www.joins.com

Keywords: crash, intl

QA Contact: ylong → ruixu

Kai Engert [:KaiE:]

Comment 33

•

23 years ago

If I try really hard, click a lot in that page, resize a lot, move the popup window around while the page below it is loading, reload the page, have two windows open and close, and stuff, I can make it crash after about 5 minutes heavy usage. While during all my tests I was running it in the debugger, I have problems reporting a crash location. Once the debugger was unable to give out a crash location, it just displayed garbage. Next, I was shown a crash somewhere in the reflow code. It crashed in a variable, where the first parameter was "aState", given as a reference (not as a pointer). It crashed when trying to assign to aState.X, IIRC. I can't tell exactly, because the debugger crashed while I was looking around. However, it seemed to me, that it might have been wrong information, because it seemed that reference was passed on from a caller, and in that caller the variable did not point to zero. Next, I'm not in the debugger again, and it crashed at a completely different location. Argh! Sorry, I was about to copy the stack from the window, but now the debugger is again in an endless loop, no longer allowing me to control it. But I can say that it was unable to lock a mutex, return code 22. nsWebShellWindow::FirePersistenceTimer. Does all of that indicate memory corruption? Normally I'm able to debug without problems. Linux branch build from today, flash plugin installed.

Kai Engert [:KaiE:]

Comment 34

•

23 years ago

I forgot to say, when I crashed in the reflow code, that aState reference variable did point to zero.

Oliver Klee

Updated

•

23 years ago

URL: http://www.joins.com → http://www.joins.com/

Keywords: crash

hugo van woerkom

Comment 35

•

23 years ago

For what it's worth: this site crashes my CVS MOZILLA_1_0_RELEASE build, but *not* the CVS trunk build of 5/27 (ID 2002052722) that I have.

Randell Jesup [:jesup] (needinfo me)

Comment 36

•

23 years ago

ReflowDirtyLines - waterson's been in there a bit recently. Adding myself and a few reflow people.

Rui Xu

Updated

•

23 years ago

Keywords: intl

QA Contact: ruixu → ylong

Roy Yokoyama

Comment 37

•

23 years ago

I don't think I should be the owner of this bug. I dont' even have linux mahcine. Assign to smontagu.

Assignee: yokoyama → smontagu

Darin Fisher

Comment 38

•

23 years ago

so i just pulled and built the latest branch bits under redhat 7.2. i installed the latest flash/shockwave plugin and could not reproduce this crash. why is this a smoketest blocker?

Darin Fisher

Comment 39

•

23 years ago

after loading the site a couple more times i observed both a hang and a crash. the stack trace after the crash seems odd... #0 __pthread_mutex_lock (mutex=0x410) at mutex.c:99 #1 0x40659be8 in __libc_free (mem=0x854e398) at malloc.c:3152 #2 0x43137215 in morkUsage_g_static_init_done () from /export/darin/mozilla/dist/bin/plugins/libflashplayer.so #3 0x41f14500 in compositeClassRec () from /usr/X11R6/lib/libXt.so.6 this particular stack trace seems to implicate the flash plugin... but it might just be the victum instead of the culprit.

Darin Fisher

Comment 40

•

23 years ago

verified the crash is not present in the 2002-06-04-14 linux build but is present in the 2002-06-05-07 linux build. looking now to see what changed...

Darin Fisher

Comment 41

•

23 years ago

i tried backing out changes by av, rods, and nhotta and that didn't eliminate the crash. 80% of the time i crash in nsObjectFrame::Destroy calling nsIPluginInstance::Stop... there appears to be heap corruption because the libc free call typically isued by libflashplayer results in either a crash or an infinite loop. out of the list of checkins between 6/4/14 and 6/5/7, there doesn't appear to be anything obvious that might have caused this. at this point i'm somewhat stuck... the only thing i can think to do is to simply backout everything one-by-one. what a pain!!

Frank Tang

Comment 42

•

23 years ago

darin- I appreciate if you can take this bug. simon's experties is on Window and he is just start to learn Linux debugging right now. He is the wrong person to own this issue. Reassign to me if you do not agree to take it. Thanks

Assignee: smontagu → darin

Frank Tang

Comment 43

•

23 years ago

cc av since darin mention about object frame. also, make sure you create a new profile when you try to reproduce it. with smontague's machine, we cannot reproduce it once the 6/4 load the page in a same profile. After that, we cannot reproduce it on the 6/5 build with the same profile. (could be because some content are cached). darin, do you have any trick to reproduce it? which versoin of linux are you on? redhat 7.2 7.3 ?

Darin Fisher

Comment 44

•

23 years ago

i was testing this on a redhat 7.2 system. i was able to reproduce the problem almost always when reloading www.joins.com... only sometimes did it happen on initial load. i know nothing about the related code... i was only looking at this bug because i happened to have a linux branch build and because this bug was preventing me from checking in changes to the branch. unfortunately, i don't have much time to look at this today... i will be able to spend some time on this come monday.

av (gone)

Comment 45

•

23 years ago

My check in on 06.04.2002 causes doing some new things in cases when a plugin is not installed. When plugin is present it does not change anything. Windows build gives the followin warning of every plugin instantiation, if this can be of any help: WARNING: empty damage rect: update caller to avoid fcn call overhead, file y:\branch\mozilla\layout\html\base\src\nsFrame.cpp, line 2287

Samir Gehani

Comment 46

•

23 years ago

Serge will be looking into this. Thanks Serge!

Assignee: darin → serge

Chris McAfee

Comment 47

•

23 years ago

Two things, 1) can we save off a static version of this site so it doesn't change on us? 2) maybe it's time to pull-by date search through the checkins to figure this one out. It's painful, but we've done this many times before to find hard-to-find bugs like this.

serge (gone)

Assignee

Comment 48

•

23 years ago

finally I got the stack trace, here is top part of it: (gdb) bt #0 0x4240ef8e in nsTableFrame::ReflowChildren (this=0x8a39c08, aPresContext=0x96761e8, aReflowState=@0xbfffb720, aDoColGroups=1, aDirtyOnly=0, aStatus=@0x0, aLastChildReflowed=@0xbfffb7b8, aReflowedAtLeastOne=0x0) at nsTableFrame.cpp:3372 #1 0x4240c5bf in nsTableFrame::ReflowTable (this=0x8a39c08, aPresContext=0x96761e8, aDesiredSize=@0xbfffbae0, aReflowState=@0xbfffb8b0, aAvailHeight=1073741824, aReason=eReflowReason_Resize, aLastChildReflowed=@0xbfffb7b8, aDoCollapse=@0xbfffb7f0, aDidBalance=@0xbfffb7e8, aStatus=@0x0) at nsTableFrame.cpp:2301 #2 0x4240c00c in nsTableFrame::Reflow (this=0x8a39c08, aPresContext=0x96761e8, aDesiredSize=@0xbfffbae0, aReflowState=@0xbfffb8b0, aStatus=@0x0) at nsTableFrame.cpp:2160 #3 0x4230802f in nsContainerFrame::ReflowChild (this=0x8a399e8, aKidFrame=0x8a39c08, aPresContext=0x96761e8, aDesiredSize=@0xbfffbae0, aReflowState=@0xbfffb8b0, aX=0, aY=0, aFlags=3, aStatus=@0xbfffc084) at nsContainerFrame.cpp:784 #4 0x4241e9b7 in nsTableOuterFrame::OuterReflowChild (this=0x8a399e8, aPresContext=0x96761e8, aChildFrame=0x8a39c08, aOuterRS=@0xbfffbf10, aMetrics=@0xbfffbae0, aAvailWidth=0x0, aDesiredSize=@0xbfffbb60, aMargin=@0xbfffbb50, aMarginNoAuto=@0xbfffbb40, aPadding=@0xbfffbb30, aReflowReason=eReflowReason_Incremental, aStatus=@0xbfffc084) at nsTableOuterFrame.cpp:1026 #5 0x4241f836 in nsTableOuterFrame::IR_InnerTableReflow (this=0x8a399e8, aPresContext=0x96761e8, aOuterMet=@0xbfffc1c8, aOuterRS=@0xbfffbf10, aStatus=@0xbfffc084) at nsTableOuterFrame.cpp:1342 #6 0x4241ed69 in nsTableOuterFrame::IR_TargetIsInnerTableFrame (this=0x8a399e8, aPresContext=0x96761e8, aDesiredSize=@0xbfffc1c8, aReflowState=@0xbfffbf10, aStatus=@0xbfffc084) at nsTableOuterFrame.cpp:1132 #7 0x4241ec63 in nsTableOuterFrame::IR_TargetIsChild (this=0x8a399e8, aPresContext=0x96761e8, aDesiredSize=@0xbfffc1c8, aReflowState=@0xbfffbf10, aStatus=@0xbfffc084, aNextFrame=0x8a39c08) at nsTableOuterFrame.cpp:1104 --- notice: aStatus=@0x0 in frame 2 but aStatus=@0xbfffc084 in all frames below frame 2, some how it's getting corrupted in nsContainerFrame::ReflowChild() and I crashed on assignment to aStatus. ccing karnaze

serge (gone)

Assignee

Comment 49

•

23 years ago

I've played for a while (heavily resizing & reloading joins.com) with branch build 2002060411, which is a more stable compare the latest ones, I've submitted 3 TB reports, all have different stack traces, only one is plugin related: http://climate.netscape.com/reports/incidenttemplate.cfm?setvar=DeveloperDeveloperTabSet:Overview+%28Developer%29&bbid=7125026#DeveloperMachineState http://climate.netscape.com/reports/incidenttemplate.cfm?setvar=DeveloperDeveloperTabSet:Overview+%28Developer%29&bbid=7124802#DeveloperMachineState http://climate.netscape.com/reports/incidenttemplate.cfm?setvar=DeveloperDeveloperTabSet:Overview+%28Developer%29&bbid=7124620#DeveloperMachineState and here is TB report from 2002060707 which has corrupted stack trace, http://climate.netscape.com/reports/incidenttemplate.cfm?setvar=DeveloperDeveloperTabSet:Code+Around+the+PC&bbid=7124007#DeveloperMachineState but EIP points to /builds/release/branch/2002-06-07-07-1.0.0/components/libgklayout.so 0x406d2000 0x408b5333 I'm lowering severity for this bug to critical, and continue to investigate.

Severity: blocker → critical

Darin Fisher

Comment 50

•

23 years ago

i think the various stack traces result from heap corruption. so, the stack traces unfortunately don't really point us at the real problem... we need to figure out where the heap corruption first occurs. so far, we're only seeing victimized bits of code dying as a result of someone else's badness (i think).

Roland Mainz

Comment 51

•

23 years ago

Can anyone run a session with Rational Purify and look for FMW (Free Memory Writes) etc., please ?

Darin Fisher

Comment 52

•

23 years ago

Roland: this is linux only... isn't Rational Purify only available under Windows and Solaris? would love to have a tool like this available under linux :)

Roland Mainz

Comment 53

•

23 years ago

Darin Fisher wrote: > Roland: this is linux only... isn't Rational Purify only available under > Windows and Solaris? I would be surprised if this crash does not occur under Solaris, too. Note that pavlov has a couple of nice Solaris machines... =:-) > would love to have a tool like this available under linux :) Does Linux come with "watchmalloc" ? Another possible solution under Linux may be "Valgrind" (http://developer.kde.org/~sewardj/) but I never had the time to test it myself in detail...

Peter Lubczynski

Comment 54

•

23 years ago

We've got quite a few bugs marked WFM that crash on joins.com. It doesn't look like any of them have anything helpful though :(

Roland Mainz

Comment 55

•

23 years ago

Peter Lubczynski wrote: > We've got quite a few bugs marked WFM that crash on joins.com. It doesn't look > like any of them have anything helpful though :( I guess the content from joins.com is dynamically generated and (maybe) differs based on the type of client, locale and used OS... Maybe we can ask the staff which does the "smoketest" testing to monitor this issue and do debugging on the same machine where it occurs...

av (gone)

Comment 56

•

23 years ago

Even if Purify is not available on a platform of interest, it still could be helpful to run it on Windows. The thing can be that corruption occurs there too but Windows just survives.

Roland Mainz

Comment 57

•

23 years ago

Just did some testing with my Solaris builds, joins.com does not crash it ad-hoc but sooner or later it crashes while loading other pages like http://www.mozilla.org/ or http://www.spiegel.de/ Running Purify on Solaris should give some hints what's going wrong...

Tracy Walker [:tracy]

Reporter

Comment 58

•

23 years ago

I was able to reproduce this with ease until today. The only thing different about the test machine is that it has been rebooted since the series of crashes seen at this sight during last weeks smoketesting. I shut the machine off for the weekend. Also of note: a slow mail issue with this same machine is also gone today.

Roland Mainz

Comment 59

•

23 years ago

BTW: Solaris has another usefull tool to track FMRs: Try this: % export LD_PRELOAD=libmapmalloc.so % ./mozilla -g The "mapmalloc" shared library then replaces the normal |malloc()| allocator with a function which uses mmap() for obtaining memory. A side-effect is that we get more or less unique memory addresses which are surrounded by areas of non-mapped memory, e.g. writing beyond the area of the allocated space will result in a SEGV. |free()|'ing the memory will unmap it, any further read/write access to that memory will result in a SEGV.

serge (gone)

Assignee

Comment 60

•

23 years ago

well, the things were changed (mozilla code + joins.com content), I cannot reproduce the crash with 20020610 branch builds, neither release one nor debug:(

Darin Fisher

Comment 61

•

23 years ago

i can still crash easily with the 1.0-20020605-07 build under linux.

Darin Fisher

Comment 62

•

23 years ago

i'm able to reproduce the crash using just TestGtkEmbed which nicely eliminates a number of checkins.

Darin Fisher

Comment 63

•

23 years ago

i was also able to repro the crash after removing all of the security libraries from my build.

Peter Lubczynski

Comment 64

•

23 years ago

Ran joins.com through Purify on Windows and nothing interesting showed up. :( Does an older version of the Flash plugin work better?

Darin Fisher

Comment 65

•

23 years ago

i'm doing an electric fence build now FWIW.

serge (gone)

Assignee

Comment 66

•

23 years ago

hmm, it appears my debug build of TestGtkEmbed is linked against old mozilla libs, which came with red hat distribution, and reside in /usr/lib /usr/lib/mozilla dirs. Darin, if you have /usr/lib/libgtkembedmoz.so[libgtksuperwin.so] installed, you are probably linking against those libs too:(

Darin Fisher

Comment 67

•

23 years ago

nope... ldd tells me that i'm picking up the correct version of libmozembedgtk.

Peter Lubczynski

Comment 68

•

23 years ago

Okay, I'm seeing something really weird on Windows which could be causing this problem: Setting a breakpoint in ns4xPluginInstance before we call NPP_NewProc, I sometimes see us trying to use the default plugin on a mime type of "application/x-ableclick-dottag". This only happens sometimes, on random reloads. This could mean Andrei's patch is being touched sometimes. Also setting a breakpoint before calling NPP_SetWindow, I sometimes see us passing in width == height == 0 after a previous call with values for width/height. It might be a long shot, but I wonder if Flash is getting confused because of the multiple calls with differenet values?

Darin Fisher

Comment 69

•

23 years ago

peterl: i tried backing out av's change and it still crashed.

Peter Lubczynski

Comment 70

•

23 years ago

Setting env var NSPR_LOG_MODULES=Plugin:8,PluginNPN:8,PluginNPP:8 will output some plugin debug info.

Darin Fisher

Comment 71

•

23 years ago

unfortunately, my ElectricFence build doesn't run. during startup it complains about insufficient memory. even with 1GB of RAM and what i would consider plenty of swapspace it continues to fail. oh well :(

rubydoo123

Updated

•

23 years ago

Priority: -- → P2

Target Milestone: --- → mozilla1.1alpha

rubydoo123

Comment 72

•

23 years ago

Serge and AV has checked that this is not the cause of Andrei's checkins, assigning back to darin

Assignee: serge → darin

Darin Fisher

Comment 73

•

23 years ago

targeting 1.0.1 since this is a branch bug not a trunk bug.

Target Milestone: mozilla1.1alpha → mozilla1.0.1

ly nguyen

Comment 74

•

23 years ago

i found a stack trace that matched simon's stack trace (incident #7091063). examining a talkback list with the same stack sig, looks like there are 3 incidents for moz1.0 where win machines are crashing, 9 incidents for rc3 where linux and win machines were crashing (mostly linux though), and 1 incident under pr1 for a win machine. this stack sig dates all the way back to build 2000110801. take note that the list only conveys stack sigs as a common thread not the entire stack trace. will take a closer look...

ly nguyen

Comment 75

•

23 years ago

Attached file values of locals and params — Details

from talkback info and the stack trace that matched simon's, here's the values of locals and params.

Darin Fisher

Comment 76

•

23 years ago

so, i switched over to the latest branch build and a single processor machine. for some reason, i'm now able to repro the crash nearly 100% of the time... only now the crash happens in nsBlockFrame::DoReflowInlineFrames. it crashes because |this| becomes null. i inspected the code, added some ASSERTIONS checking for a null this pointer, and it looks like the culprit is stack corruption. so, with dbaron's help i discovered that the source of the problem is actually stack corruption resulting from the invocation of CallNPP_NewProc from ns4xPluginInstance::InitializePlugin. we pushed a large buffer onto the stack and populated it with known values. then we checked the contents of that buffer before and after the call to CallNPP_NewProc, and discovered corruption upon return from CallNPP_NewProc. i'll attach a patch that demonstrates the problem.

Darin Fisher

Comment 77

•

23 years ago

Attached patch patch to demonstrate stack corruption — Details — Splinter Review

this patch allocates a 256 byte buffer on the stack inside nsBlockFrame::ReflowInlineFrames and populates it with uniformly with the ASCII 'a' character. it then verifies the values in this buffer before and after invoking CallNPP_NewProc inside ns4xPluginInstance::InitializePlugin. after the call to CallNPP_NewProc, the stack buffer is corrupt. without this stack buffer, the |this| pointer of nsBlockFrame get's nulled out. chances are you might not be able to reproduce this bug using this patch, but hopefully this information will mean something to the plugin engineers.

Darin Fisher

Comment 78

•

23 years ago

-> plugins

Assignee: darin → beppe

Component: Internationalization → Plug-ins

QA Contact: ylong → shrir

serge (gone)

Assignee

Comment 79

•

23 years ago

confirm linux flash 5.0 r48 & 4.0 r12 plugins corrupt the stack if swliveconnect="TRUE" argument is present:(

rubydoo123

Comment 80

•

23 years ago

handing this back to Serge for debug

Assignee: beppe → serge

Frank Tang

Comment 81

•

23 years ago

My question is what changes we made in the tree in 06-05 to trigger this problem by flash ? or the problem is always but corrupt the stack differently which does not introduce crash?

serge (gone)

Assignee

Comment 82

•

23 years ago

the problem is always, I've seen it with flash 4.0, but some how it's getting exposed randomly, and probably depends on stack layout, in my observation it always zeroes something like couple of 4 bytes addresses, in some cases it could harmless.

Darin Fisher

Comment 83

•

23 years ago

exactly! i also saw it nulling out one DWORD. on my machine it just happens to null out the |this| pointer for a nsBlockFrame. pure coincidence.

serge (gone)

Assignee

Comment 84

•

23 years ago

*** Bug 133118 has been marked as a duplicate of this bug. ***

Peter Lubczynski

Updated

•

23 years ago

Keywords: nsbeta1, topcrash

Summary: Crash loading www.joins.com → Linux Flash crashes and causes stack curruption in NPP_New when swLiveconnect=TRUE [Crash loading www.joins.com] [@ 0x00000000 - nsTableCellFrame::Reflow]

Whiteboard: [PL RTM]

serge (gone)

Assignee

Comment 85

•

23 years ago

Attached patch patch v1. (obsolete) — Details — Splinter Review

this is the hack, but what else I can do:(

Roland Mainz

Comment 86

•

23 years ago

Comment on attachment 88356 [details] [diff] [review] patch v1. Can you add an env var to turn this hack _off_, please ? Maybe the plugin vendor fixes the issue some day and this hack may then be incompatible with the fixed plugin...

Attachment #88356 - Flags: needs-work+

serge (gone)

Assignee

Comment 87

•

23 years ago

I doubt, it'll be ever fixed, they probably hacked LiveConnect support in ns4.x, but surely I can add hidden pref to disable this hack, or env var is preferable?

serge (gone)

Assignee

Comment 88

•

23 years ago

Attached patch patch v2 w/ env var to disable this hack (obsolete) — Details — Splinter Review

added check for env var MOZ_DISABLE_FLASH_SWLIVECONNECT_HACK

Roland Mainz

Comment 89

•

23 years ago

serge wrote: > I doubt, it'll be ever fixed, they probably hacked LiveConnect support in > ns4.x, but surely I can add hidden pref to disable this hack, > or env var is preferable? Mhhh, I am not sure... when I hit such problems in the past I always used env vars since they are less AFAIK expensive than the prefs service. Choose something you like - but do not forget to add a comment in the release notes! :)

serge (gone)

Assignee

Comment 90

•

23 years ago

Attached patch patch v3 (obsolete) — Details — Splinter Review

call getenv() only for "application/x-shockwave-flash" mimetype

Attachment #88356 - Attachment is obsolete: true

Attachment #88359 - Attachment is obsolete: true

Darin Fisher

Comment 91

•

23 years ago

Comment on attachment 88360 [details] [diff] [review] patch v3 >Index: ns4xPluginInstance.cpp >+ if (count && !PL_strncmp(mimetype, flashMimeType, sizeof(flashMimeType)-1)) { beware this usage of PL_strncmp... it will also succeed if mimetype equals... "application/x-shockwave-flash-just-kidding" strcmp is probably more appropriate.

serge (gone)

Assignee

Comment 92

•

23 years ago

Attached patch patch v4 (obsolete) — Details — Splinter Review

agreed, the same is true for + if (!PL_strncasecmp(names[i], blockedParam, sizeof(blockedParam)-1)) { all PL_srtn changed to PL_strcasecmp()

Attachment #88360 - Attachment is obsolete: true

av (gone)

Comment 93

•

23 years ago

Comment on attachment 88360 [details] [diff] [review] patch v3 r=av

Attachment #88360 - Flags: review+

av (gone)

Comment 94

•

23 years ago

Comment on attachment 88369 [details] [diff] [review] patch v4 commented in the wrong attachment, sorry. r=av should be here.

Attachment #88369 - Flags: review+

Roland Mainz

Comment 95

•

23 years ago

Three minor, non-mandatory nits: - env vars should be build like this MOZILLA_%name_of_module%_%description% (most of the code does that like this), e.g. s/MOZ_DISABLE_FLASH_SWLIVECONNECT_HACK/MOZILLA_PLUGIN_DISABLE_FLASH_SWLIVECONNECT_HACK/ - Please use |PR_GetEnv()| instead of |getenv()| - What about caching the result of |PR_GetEnv()|/|getenv()| ? Or is that place not performance-critical ?

serge (gone)

Assignee

Comment 96

•

23 years ago

Attached patch patch v5 (obsolete) — Details — Splinter Review

With Roland's concerns addressed. Thanks.

Attachment #88369 - Attachment is obsolete: true

Roland Mainz

Comment 97

•

23 years ago

Comment on attachment 88382 [details] [diff] [review] patch v5 r=Roland.Mainz@informatik.med.uni-giessen.de

Attachment #88382 - Flags: review+

serge (gone)

Assignee

Comment 98

•

23 years ago

Darin, could you sr= ,please?

rubydoo123

Comment 99

•

23 years ago

this should have been marked as nsbeta1+ already, adding ADT1 - this is a topcrash

Keywords: nsbeta1 → nsbeta1+

Whiteboard: [PL RTM] → [PL RTM][ADT1]

Darin Fisher

Comment 100

•

23 years ago

Comment on attachment 88382 [details] [diff] [review] patch v5 >Index: ns4xPluginInstance.cpp >+#ifdef XP_UNIX >+ int cnt = (int) count; >+ for (int i=0; i<cnt; i++) { why do you need |cnt|? how about this instead: for (PRUint16 i=0; i<count; i++) { >+ if (!PL_strcasecmp(names[i], blockedParam)) { >+ char *val = (char*) values[i]; >+ if (val && *val) { >+ // we cannot just *val=0, it wont be free properly in such case >+ val[0] = '0'; >+ val[1] = 0; >+ } |values| is declared |const char* const*| meaning that you shouldn't be allowed to alter it, so is this code really valid? what if |values| contains pointers to string literals? wouldn't this cause a crash?

serge (gone)

Assignee

Comment 101

•

23 years ago

1. (PRUint16 i=0; makes sense 2. |const char* const* values| is alloced at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/html/base/src/nsObjec tFrame.cpp&rev=1.349&root=/cvsroot#3020 and freed at line #2096 so, it couldn't be a const ptr to string literal, of course, I can dup all of name/value strings modify and free it, but I think it would be unnecessary performance hit, correct me if I'm wrong, please.

Darin Fisher

Comment 102

•

23 years ago

serge: it just looks like a bug waiting to happen. why prototype the variable as |const char * const| if you really intend to modify it? it just seems scary to do so without at least adding a big fat warning and comments to the effect that you know where this data is coming from. but even with that, i'd worry.

Andrew Schultz

Comment 103

•

23 years ago

*** Bug 153639 has been marked as a duplicate of this bug. ***

Jaime Rodriguez, Jr.

Comment 104

•

23 years ago

lowering impact to adt2 RTM.

Blocks: 143047

Whiteboard: [PL RTM][ADT1] → [ADT1] [PL RTM] [Need ETA]

Jaime Rodriguez, Jr.

Updated

•

23 years ago

Whiteboard: [ADT1] [PL RTM] [Need ETA] → [ADT2 RTM] [PL RTM] [Need ETA]

rubydoo123

Updated

•

23 years ago

Whiteboard: [ADT2 RTM] [PL RTM] [Need ETA] → [ADT2 RTM] [PL RTM] [Need ETA][Need SR=]

serge (gone)

Assignee

Comment 105

•

23 years ago

Attached patch patch v6 (obsolete) — Details — Splinter Review

All right, I've moved this hack into nsObjectFrame.cpp:nsPluginInstanceOwner::EnsureCachedAttrParamArrays() "swliveconnect" is getting removed from attributes list completely. please review.

Attachment #88382 - Attachment is obsolete: true

av (gone)

Comment 106

•

23 years ago

I don't think we need checking prefs or env to disable the hack. If Flash is going to be scriptable with Mozilla this parameter will no longer be relevant anyway and they will likely just ignore it in their code. All this is because scriptability can be achieved a different way, and LiveConnect is deprecated for this purpose. I vote for just ignoring it if it causes problems, this will make the code much cleaner.

serge (gone)

Assignee

Comment 107

•

23 years ago

Attached patch patch v7 (obsolete) — Details — Splinter Review

I agreed with av, here is the patch

Attachment #88971 - Attachment is obsolete: true

Peter Lubczynski

Comment 108

•

23 years ago

Comment on attachment 88980 [details] [diff] [review] patch v7 This patch won't catch cases where the TYPE attribute is missing and we guess via extension or HTTP header. eg: <EMBED SRC="myflash.swf" swLiveconnect=true> I like patch v.5 better. It's only because of the deprecated XPCOM API that we use const pointers.

Attachment #88980 - Flags: needs-work+

serge (gone)

Assignee

Comment 109

•

23 years ago

Attached patch patch v8 — Details — Splinter Review

Peter is right, w/o type= we'll missed swliveconnect= :( This patch is the same a patch v5, with BIG FAT WORNING added Darin, would you agree on this worning?

Attachment #88980 - Attachment is obsolete: true

Darin Fisher

Comment 110

•

23 years ago

Comment on attachment 89001 [details] [diff] [review] patch v8 sr=darin

Attachment #89001 - Flags: superreview+

serge (gone)

Assignee

Comment 111

•

23 years ago

on the trunk mozilla/modules/plugin/base/src/ns4xPluginInstance.cpp,v <-- ns4xPluginInstance.cpp new revision: 1.95; previous revision: 1.94 Thanks all.

Status: NEW → RESOLVED

Closed: 23 years ago

Keywords: mozilla1.0.1

Resolution: --- → FIXED

Whiteboard: [ADT2 RTM] [PL RTM] [Need ETA][Need SR=] → [ADT2 RTM] [PL RTM] [fix-trunk]

Blake Ross

Updated

•

23 years ago

Keywords: adt1.0.1

shrirang khanzode

Comment 112

•

23 years ago

*** Bug 153576 has been marked as a duplicate of this bug. ***

shrirang khanzode

Comment 113

•

23 years ago

verified on 0625 trunk on linux, this no longer crashes, works great.

Status: RESOLVED → VERIFIED

serge (gone)

Assignee

Comment 114

•

23 years ago

*** Bug 144520 has been marked as a duplicate of this bug. ***

scottputterman

Comment 115

•

23 years ago

adding adt1.0.1+. Please get drivers approval before checking in.

Keywords: adt1.0.1 → adt1.0.1+

Judson Valeski

Comment 116

•

23 years ago

please checkin to the 1.0.1 branch. once there, remove the "mozilla1.0.1+" keyword and add the "fixed1.0.1" keyword.

Keywords: mozilla1.0.1 → mozilla1.0.1+

serge (gone)

Assignee

Comment 117

•

23 years ago

on the branch mozilla/modules/plugin/base/src/ns4xPluginInstance.cpp,v <-- ns4xPluginInstance.cpp new revision: 1.90.2.4; previous revision: 1.90.2.3

Keywords: mozilla1.0.1+ → fixed1.0.1

serge (gone)

Assignee

Comment 118

•

23 years ago

*** Bug 123709 has been marked as a duplicate of this bug. ***

rubydoo123

Comment 119

•

23 years ago

*** Bug 149116 has been marked as a duplicate of this bug. ***

shrirang khanzode

Comment 120

•

23 years ago

verfd fixd, no crash.

Keywords: fixed1.0.1 → verified1.0.1

Jay Yan

Comment 121

•

23 years ago

.

Nobody; OK to take it and work on it

Updated

•

14 years ago

Crash Signature: [@ 0x00000000 - nsTableCellFrame::Reflow]

BMO Automation

Updated

•

3 years ago

Product: Core → Core Graveyard

Finally a stack trace 23 years ago Simon Montagu :smontagu 13.92 KB, text/plain		Details
values of locals and params 23 years ago ly nguyen 1.71 KB, text/plain		Details
patch to demonstrate stack corruption 23 years ago Darin Fisher 6.44 KB, patch		Details \| Diff \| Splinter Review
patch v1. 23 years ago serge (gone) 1.35 KB, patch		Details \| Diff \| Splinter Review
patch v2 w/ env var to disable this hack 23 years ago serge (gone) 1.35 KB, patch		Details \| Diff \| Splinter Review
patch v3 23 years ago serge (gone) 1.47 KB, patch	serhunt : review+	Details \| Diff \| Splinter Review
patch v4 23 years ago serge (gone) 1.43 KB, patch	serhunt : review+	Details \| Diff \| Splinter Review
patch v5 23 years ago serge (gone) 1.65 KB, patch	roland.mainz : review+	Details \| Diff \| Splinter Review
patch v6 23 years ago serge (gone) 2.48 KB, patch		Details \| Diff \| Splinter Review
patch v7 23 years ago serge (gone) 2.18 KB, patch		Details \| Diff \| Splinter Review
patch v8 23 years ago serge (gone) 2.00 KB, patch	darin.moz : superreview+	Details \| Diff \| Splinter Review