Closed
Bug 565312
Opened 15 years ago
Closed 11 years ago
[SoftFloat] NULL pointer dereferenced in js/src/nanojit/Assembler.cpp
Categories
(Core Graveyard :: Nanojit, defect, P3)
Tracking
(Not tracked)
RESOLVED
WONTFIX
flash10.2
People
(Reporter: junkmailnotread, Unassigned)
References
Details
Attachments
(3 files)
43.59 KB,
text/plain
|
Details | |
894 bytes,
patch
|
Details | Diff | Splinter Review | |
680.27 KB,
text/plain
|
Details |
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
Build Identifier: fennec-2.0a1pre
fennec-2.0a1pre built by myself and running on an HP iPAQ hx4700 (armv5te).
fennec crashes immediately after loading an internet page, for example Google.
I traced the crash to js/src/nanojit/Assembler.cpp, whose function Assembler::findVictim() returns a NULL pointer to Assembler::registerAlloc(). The NULL pointer is immediately dereferenced in Assembler::registerAlloc().
Adding my own debug to Assembler::findVictim():
LIns* Assembler::findVictim(RegisterMask allow)
{
printf("allow=0x%x\n", allow);
NanoAssert(allow);
LIns *ins, *vic = 0;
int allow_pri = 0x7fffffff;
printf("vic=%p, allow_pri=%d, FirstReg=%d, LastReg=%d\n", vic, allow_pri, FirstReg, LastReg);
for (Register r = FirstReg; r <= LastReg; r = nextreg(r))
{
printf("r=%d\n", r);
if ((allow & rmask(r)) && (ins = _allocator.getActive(r)) != 0)
{
int pri = canRemat(ins) ? 0 : _allocator.getPriority(r);
printf("ins=%p, pri=%d\n", ins, pri);
if (!vic || pri < allow_pri) {
vic = ins;
allow_pri = pri;
printf("vic=%p, allow_pri=%d\n", vic, allow_pri);
}
}
}
NanoAssert(vic != 0);
printf("return vic=%p\n", vic);
return vic;
}
showed that Assembler::findVictim() was called 861 times, and returned NULL on the last call. This is the debug from the last 2 calls:
allow=0x1
vic=(nil), allow_pri=2147483647, FirstReg=0, LastReg=22
r=0
ins=0xfd7e7c, pri=151
vic=0xfd7e7c, allow_pri=151
r=1
r=2
r=3
r=4
r=5
r=6
r=7
r=8
r=9
r=10
r=11
r=12
r=13
r=14
r=15
r=16
r=17
r=18
r=19
r=20
r=21
r=22
return vic=0xfd7e7c
allow=0x7f0000
vic=(nil), allow_pri=2147483647, FirstReg=0, LastReg=22
r=0
r=1
r=2
r=3
r=4
r=5
r=6
r=7
r=8
r=9
r=10
r=11
r=12
r=13
r=14
r=15
r=16
r=17
r=18
r=19
r=20
r=21
r=22
return vic=(nil)
Segmentation fault
That final "allow" value of 0x7f0000 was the only one not to have any of the l.s. 16 bits set:
% grep '^allow' db2.out | sort -u
allow=0x1
allow=0x170
allow=0x190
allow=0x1c0
allow=0x1f0
allow=0x2
allow=0x270
allow=0x2b0
allow=0x320
allow=0x370
allow=0x380
allow=0x3b0
allow=0x3d0
allow=0x3f0
allow=0x4f0
allow=0x570
allow=0x5f0
allow=0x610
allow=0x670
allow=0x6f0
allow=0x750
allow=0x760
allow=0x770
allow=0x7a0
allow=0x7b0
allow=0x7d0
allow=0x7e0
allow=0x7f0
allow=0x7f0000
allow=0x7ff
allow=0xbfff
allow=0xfbff
allow=0xfdff
allow=0xfeff
allow=0xff7f
allow=0xffbf
allow=0xffdf
allow=0xffef
allow=0xfffb
allow=0xfffd
allow=0xfffe
allow=0xffff
I can add more debug if required.
Reproducible: Always
Reporter | ||
Comment 1•15 years ago
|
||
Output from --enable-debug build. It shows the final assertion failure from Assembler.cpp:
Assertion failed: vic != 0 (/home/software/ipaq/user/mozilla-central/js/src/nanojit/Assembler.cpp:2392)
Comment 2•15 years ago
|
||
Thanks for the debug output, that helps a lot.
This is on ARM. allow=0x7f0000 is equivalent to allow=FpRegs.
Somehow we're trying to allocate an FpReg, and we decide that none of them are free, but then when we try to find one to evict we fail because all of them are actually free. Ie. findVictim() should not have been called. We need to work out why it was called. Sounds like the 'free' and 'active' lists have gotten out of sync.
junkmailnotread, can you apply the attached patch (on top of your first debugging patch) and rerun, and attach the output from the final two occurrences again? It'll need to be a debug build.
Updated•15 years ago
|
Attachment #445014 -
Attachment description: patch → patch with additional diagnostic printf calls
Reporter | ||
Comment 3•15 years ago
|
||
OK here it is; my debug + your debug + --enable-debug is their entirety.
Comment 4•15 years ago
|
||
Thanks.
Well that's weird. We have this:
setA___=0x7f0000, set__F_=0x407f, setA_F_=0x0
So any FpReg is allowed, no FpReg is free, and yet we have this:
r=0
r=1
r=2
r=3
r=4
r=5
r=6
r=7
r=8
r=9
r=10
r=11
r=12
r=13
r=14
r=15
r=16
r=17
r=18
r=19
r=20
r=21
r=22
which indicates that none of the registers have an entry in the 'active'
list. Which would indicate a monumental free/active out-of-sync condition,
ie. every FpReg is out-of-sync.
The most interesting thing is that this is the problematic instruction:
ii2d1 = ii2d js_UnboxDouble1, hcalli2
So I'm guessing it is a SoftFloat problem. junkmailnotread, is the armv5te
an ARM chip that lack floating-point hardware? That would explain why
others haven't hit this, because most ARM chips that Nanojit is used on do
have floating-point hardware, AFAIK.
Hmm, should we be trying to allocate FpRegs when using SoftFloat? I think
Ed Smith knows more about SoftFloat than I do, but he's away this week.
Bug 543636 is also relevant here.
Comment 5•15 years ago
|
||
BTW, this bug is marked critical but I suspect it could be downgraded, depending on how important the "HP iPAQ hx4700 (armv5te)" platform is to our goals. Input about this from people who know more about it than me would be welcome!
Updated•15 years ago
|
Summary: NULL pointer dereferenced in js/src/nanojit/Assembler.cpp → [SoftFloat] NULL pointer dereferenced in js/src/nanojit/Assembler.cpp
Comment 6•15 years ago
|
||
(In reply to comment #4)
> junkmailnotread, is the armv5te an ARM chip that lack floating-point hardware?
Another possibility is that it has FP hardware but the OS has disabled it. See bug 543636 comment 3.
Reporter | ||
Comment 7•15 years ago
|
||
As for FP hardware, I really had no idea. I just pass the -march=armv5te flag to my gcc cross-compiler. So I compiled a small test program with floating point operations and examined the assembler output. Sure enough it was making library calls.
As for "critical", that just matches the descriptions presented when one opens a new bug; i.e. that the application crashed. I don't mind what severity is chosen.
Reporter | ||
Comment 8•15 years ago
|
||
There is no accessible FP hardware; passing -mfloat-abi=softfp caused the test program to generate an illegal instruction. I could see nothing obvious in the kernel configuration regarding FP hardware.
Reporter | ||
Updated•15 years ago
|
Severity: critical → normal
Comment 9•15 years ago
|
||
(In reply to comment #8)
> There is no accessible FP hardware; passing -mfloat-abi=softfp caused the test
> program to generate an illegal instruction. I could see nothing obvious in the
> kernel configuration regarding FP hardware.
I see. AFAIK the OS-disabling-FP-hardware thing is Windows-only.
So the question becomes: do we (Mozilla, Adobe) care about this HP iPAQ hx4700 device?
Reporter | ||
Comment 10•15 years ago
|
||
(In reply to comment #9)
> So the question becomes: do we (Mozilla, Adobe) care about this HP iPAQ hx4700
> device?
Or more generally, do you care about devices without FP hardware.
Comment 11•15 years ago
|
||
Adobe still cares about devices without FP hardware. In 2009 we were still dealing with Arm-11 devices where the device manufacturer would disable the VFP to conserve power. It is quite possible that we're still dealing with low-end devices (e.g, settop boxes, feature phones) where an FPU is not available. I'll look at getting a more definitive answer from product mgmt.
Reporter | ||
Comment 12•15 years ago
|
||
I have solved my immediate problem by building fennec without nanojit support; it no longer crashes. I cannot tell what the performance impact has been.
Comment 13•15 years ago
|
||
(In reply to comment #11)
> Adobe still cares about devices without FP hardware.
It'll fall on someone from Adobe to fix this bug, then. Thanks for the info.
Comment 14•15 years ago
|
||
(In reply to comment #4)
>
> Hmm, should we be trying to allocate FpRegs when using SoftFloat? I think
> Ed Smith knows more about SoftFloat than I do, but he's away this week.
>
> Bug 543636 is also relevant here.
We do care about SoftFloat, yes, when used from TR.
I would expect in soft-float mode, that no FP-reg requiring instructions are emitted by the LIR frontend, and no FP registers would be marked available or free by the ARM backend. A few more asserts in Assembler::gen(), or in ValidateWriter, might catch the problem earlier if this is truly a softfloat problem.
Since the bug came up with TM, there isn't clearly something fixable here for TR.
Comment 15•15 years ago
|
||
(In reply to comment #14)
>
> Since the bug came up with TM, there isn't clearly something fixable here for
> TR.
AFAICT all the SoftFloat code is in NJ itself. In particular, SoftFloatFilter, which introduced the SoftFloat opcodes. So if you do care about SoftFloat I think you shouldn't ignore this problem, because it's likely to be a problem for TR as well.
Bug 543636 said something about Adobe getting buildbot testing of SoftFloat up. Has that happened?
Updated•15 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → Windows CE
Priority: -- → P3
Hardware: Other → ARM
Target Milestone: --- → flash10.2
Reporter | ||
Updated•15 years ago
|
OS: Windows CE → Linux
Assignee | ||
Updated•11 years ago
|
Product: Core → Core Graveyard
Comment 16•11 years ago
|
||
Nanojit has been dead for several years. Its Bugzilla component has been moved to the graveyard (bug 984276).
I checked all the open bugs. They're all uninteresting, so I'm WONTFIXing them all. Apologies for the bugspam.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•