Status

()

defect
RESOLVED FIXED
9 years ago
8 years ago

People

(Reporter: jbramley, Assigned: cdleary)

Tracking

unspecified
ARM
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 -, fennec2.0b4+)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(11 attachments, 9 obsolete attachments)

7.08 KB, text/plain
Details
42.58 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
38.45 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
6.02 KB, patch
cdleary
: review+
Details | Diff | Splinter Review
54.68 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
17.34 KB, patch
cdleary
: review+
Details | Diff | Splinter Review
20.84 KB, patch
cdleary
: review+
Details | Diff | Splinter Review
33.30 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
8.68 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
14.07 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
19.32 KB, patch
dmandelin
: review+
Details | Diff | Splinter Review
Reporter

Description

9 years ago
No description provided.
Reporter

Comment 1

9 years ago
This is looking like it will involve special cases all over the place, and I'd really like to avoid that if possible. For example:

----
    /* Copy the slot value to the expression stack. */
    Address slot(objReg, 1 << 24);
#if defined JS_NUNBOX32
    masm.loadTypeTag(slot, shapeReg);
    DBGLABEL(dbgTypeLoad);
    masm.loadPayload(slot, objReg);
    DBGLABEL(dbgDataLoad);
#elif defined JS_PUNBOX64
    Label inlineValueOffsetLabel =
        masm.loadValueAsComponents(slot, shapeReg, objReg);
#endif
----

On x86, the offset is a 32-bit literal in the instruction stream. Is is emitted twice, once by loadTypeTag and once by loadPayload, and thus patched twice in the PIC. On x64, it is still inline, but is issued as one instruction and one patch site. However, in that case the offset is calculated at run-time, per-PIC. ARM really wants to sit somewhere in the middle; the offset will be constant, but we only want one offset (and ideally one load):

(As currently emitted by load64WithAddressOffsetPatch:)
  LDR  <scratch>, =offset    @ <-- offset patch location
  ADD  <scratch>, <scratch>, objReg
  LDR  objReg, [<scratch>, #0]
  LDR  shapeReg, [<scratch>, #4]

The existing solution, with two separate address calculations, is not only slow (on ARM), but also broken as the back-end will try to optimize the ADD so that slot.offset (1<<24) is encoded inline in the ADD. We could use a constant that won't do that, but that solution is fragile and still not ideal.

I can go and hack in an ARM-friendly case in every PIC, and that's what I did for MICs, but it seems to me that we'd be better off trying to abstract this neatly. Could we, for example, implement loadValueAsComponents for all platforms? I can't think of a decent way to do that without losing one of the DBGLABELs.

MacroAssembler has its own methods for handling patchable offsets so we might to be able to re-use or extend them somehow. I already added load64WithAddressOffsetPatch for ARM; this is essentially loadValueAsComponents, I think.

----

I'm happy to go and think about this, but let me know if there are other restrictions which I haven't considered, or if there are some obstacles that I'm likely to hit. Ideally, I want many of the #ifdefs in jsop_*_pic to disappear, and be replaced with something like this:

Label patchableThing = masm.loadValueWithAddressOffsetPatch(slot, shapeReg, objReg);

... which is then repatched with something like that:

repatcher.repatchAddressOffset(pic.patchableThing, offset);

That should do away with a lot of the nastiness. Some methods will still require platform-specific code, of course, but we should only need to implement them once.
The approach looks good to me. dvander, can you weigh in too, especially on whether there are any gotchas here?
Removing the DBGLABEL makes me uncomfortable, because we have no other means to be assured of constant validity. There's a way we can keep them, though:

loadValueWithAddressOffsetPatch(foo, RegisterID bar, RegisterID baz, Label &firstLoadLabel, Label &secondLoadLabel)

Each platform can then pass in labels in all circumstances, including on opt builds. We can then switch on the CPU type at the bottom of the function when we go to assert; e.g., x86 would use both labels, whereas x64 would only use the first.

Removing #ifdef is always a noble cause.
Reporter

Comment 4

9 years ago
(In reply to comment #3)
> Removing the DBGLABEL makes me uncomfortable, because we have no other means to
> be assured of constant validity. There's a way we can keep them, though:

You still get the second one, though, so unless x86's 'mov' has variants that can cause the first label to move but not the second, it should be safe. We can also assert this to a degree anyway because the offset of the first move will be 4 bytes apart from the offset of the second move.
Reporter

Comment 5

9 years ago
For example:

Label patchableThing = masm.loadValueWithAddressOffsetPatch(slot, shapeReg,
objReg);

----

loadValueWithAddressOffsetPatch(...) {
    ...
#if defined(JS_CPU_ARM)
    Label label = masm.load64WithAddressOffsetPatch(slot, hiReg, loReg);
    JS_ASSERT(end.getValueAtOffset(0) == slot.offset);
    return label;
#elif defined(JS_CPU_X86)
    Label start = masm.label();
    ... (emit code)
    Label end = masm.label();
    JS_ASSERT(differenceBetween(start, end) == 12);  // #define the constant.
    JS_ASSERT(end.getValueAtOffset(-6) == slot.offset);
    JS_ASSERT(end.getValueAtOffset(0) == slot.offset+4);
    return end;
#endif
}

----

Ok, I made up a few method names, but the gist is there. X64 would use the loadValueAsComponents method in "PunboxAssembler.h". I already put ARM's load64WithAddressOffsetPatch directly in MacroAssemblerARM.h, alongside some similar methods.
That's fine so long as "Label start = masm.label()" becomes "DBGLABEL(start)" -- otherwise there will be compiler warnings in opt builds.
Reporter

Comment 7

9 years ago
Posted patch Abstraction example. (obsolete) — Splinter Review
Ok, here's an example of my proposed abstraction applied to MICs (because they're simpler). PICs may require additional methods in "ICRepatcher.h", depending on the usage; I looked at a couple and they looked straightforward, but this may not be true of all of them.

There are several TODOs remaining, and clearly I haven't actually touched the PICs yet, but this is an example of the general direction I'm heading in. Also, I've not done any release builds yet so there may be warnings to fix etc.

I want to finish this by Wednesday as I'll be away for a couple of days after that, so I'll push ahead with this immediately. (This is the last thing left on my "must do before 2010-09-01" list.) If you have any comments or concerns, please let me know.
Reporter

Comment 8

9 years ago
   440 #if defined JS_PUNBOX64
   441         pic.labels.setprop.stubShapeJump = masm.differenceBetween(start, stubShapeJumpLabel);
   442         JS_ASSERT(pic.labels.setprop.stubShapeJump == masm.differenceBetween(start, stubShapeJumpLabel));
   443 #endif

Isn't this assertion redundant? (There are a few similar sequences throughout PolyIC.cpp.)
Reporter

Comment 9

9 years ago
(In reply to comment #7)
> I want to finish this by Wednesday

By the way, at the moment I can't see that happening. There are just too many PICs with too many #ifdefs and x86-specifics to change individually, and getting a reasonable abstraction working well is going to take some time.
(In reply to comment #8)
>    440 #if defined JS_PUNBOX64
>    441         pic.labels.setprop.stubShapeJump = masm.differenceBetween(start,
> stubShapeJumpLabel);
>    442         JS_ASSERT(pic.labels.setprop.stubShapeJump ==
> masm.differenceBetween(start, stubShapeJumpLabel));
>    443 #endif
> 
> Isn't this assertion redundant? (There are a few similar sequences throughout
> PolyIC.cpp.)

I was surprised by code like that in a review the other day. I was thinking maybe it deserves a comment, but it didn't reach the threshold for me to actually say something. But apparently it does deserve a comment. 

Anyway, the purpose is to make sure the value doesn't overflow the type of the field it's stored in.

Updated

9 years ago
blocking2.0: --- → ?
tracking-fennec: --- → 2.0b2+

Updated

9 years ago
blocking2.0: ? → -

Comment 11

9 years ago
Any updates here? We'd really like to get this out the door for Firefox 4 Android Beta 2.
Reporter

Comment 12

9 years ago
(In reply to comment #11)
> Any updates here? We'd really like to get this out the door for Firefox 4
> Android Beta 2.

I'm struggling to find the time at the moment. I've started to port the MIC abstraction [attachment 468322 [details] [diff] [review]] to PICs but it's only part of the problem.

If anyone at Mozilla (who undoubtedly knows PICs better than I do) is twiddling their thumbs, I'd be happy to share the work somehow.
Reporter

Comment 13

9 years ago
This is a very rough start which introduces (load|store)ValueWithAddressOffsetPatch to replace much of the nastiness. I successfully replaced bits in the MICs. (Not tested for a while, though.) It also adds a few necessary bits to the ARM back-end.

I've put all the abstractions into a new "ICRepatcher.h", which is shared between MICs and PICs.

I've literally just pulled this out of my MQ so it could include debug code or dirty hacks that I've forgotten about!
Attachment #468322 - Attachment is obsolete: true
Reporter

Comment 14

9 years ago
Posted patch Utilities to support PICs. (obsolete) — Splinter Review
This one adds to ICRepatcher to support mechanisms used by PICs. (Specifically, compare-and-branch where both the guard and the target are patchable.) It also adds another overloaded implementation of storeValueWithAddressOffsetPatch because it simplifies things in a couple of places.

I did experiment a bit with replacing some code in the PICs themselves, but I didn't produce anything solid.

(Like the last one, this is pulled straight from my MQ without being tested.)

----

Aside from the Galaxy problems, I'm currently working mostly on bug 602468 because test failures scare me, though I can prioritize PICs if it's considered more important for Fennec.
Jacob and I talked through the state of things. MICs are working, but will probably be affected by any refactoring. We're going to collaborate on getting PIC patching working in an x86-like style to minimize code churn, using indirect jumps to get access to the full possible 32b address range.

Once we have a working x86-like implementation we'll experiment with the possible tuning options (balancing, tables, etc.). Made a user repo here for us to work out of: http://hg.mozilla.org/users/cleary_mozilla.com/tm-arm-pic

Next things I'm doing: finishing up some other bugs and ramping up on the existing ARM PIC code. We should be getting started on 10/15.

Updated

9 years ago
tracking-fennec: 2.0b2+ → 2.0b3+
Status update in bug 605415 comment 0.

PIC work switched to an MQ repo because we find that's easier, will break it up before commit: http://hg.mozilla.org/users/cleary_mozilla.com/tm-arm-pic-mq

(Per-platform patching details are refactored behind the ICRepatcher as we go, so testing happens against all three CPU architectures for these changes.)
We're working through a gdb snafu that makes debugging JIT'd JaegerMonkey code difficult on ARM: see bug 605758.
Depends on: 605758
Progress update: we've got GET/CALLPROP and SETPROP PICs working on all three platforms. NAME PICs are well on their way.

High-level summary is that we've created a set of new labels and variable patching-offsets in PICInfo for cross-platform use. Once we reach a state where all the PICs are working we'll be able to ifdef out the variable offset members of the PICInfo on platforms where they are unnecessary. At that point we will also drop the offsets back into bitfields.

On deck:
- BindName
- Get/Call/SetElement
Posted patch 1. Tracer build tweaks. (obsolete) — Splinter Review
Fix one build error and one build warning with --enable-methodjit --disable-tracejit, because that's what I started out doing.
Assignee: Jacob.Bramley → cdleary
Attachment #492530 - Flags: review?(dvander)
Attachment #492530 - Attachment description: Tracer build tweaks. → 1. Tracer build tweaks.
Posted patch 2. Cross-platform PIC labels. (obsolete) — Splinter Review
As originally made by dvander for the farjump patch.
Attachment #492531 - Flags: review?(dvander)
Posted patch 3. ARM asm tweaks. (obsolete) — Splinter Review
Fixes some ARM repatch-based backend quirks.
Attachment #492532 - Flags: review?(dvander)
Posted patch 4. Newlabel MIC refactor. (obsolete) — Splinter Review
Introduces "loadLabelWithAddressOffsetPatch" and "storeValueWithAddressOffsetPatch" as a cross-platform load/store value interface with a cross-platform Repatcher interface to accompany it.
Attachment #492535 - Flags: review?(dvander)
Posted patch 5. Cross-platform GETPROP. (obsolete) — Splinter Review
Moves a bunch of things behind platform-agnostic interfaces and enables *solely* GETPROP PICs for ARM. Also introduces machinery to ensure that a constant pool doesn't get dumped into an IC's inline path, which requires a followup patch to apply to the existing MICs.
Attachment #492536 - Flags: review?(dvander)
Attachment #481812 - Attachment is obsolete: true
Attachment #481815 - Attachment is obsolete: true
Also noteworthy, on my Tegra2 board I saw a 8% speedup on SunSpider and 60% speedup on V8 with GETPROP PICs enabled.
Attachment #492530 - Flags: review?(dvander) → review+
Attachment #492532 - Flags: review?(dvander) → review?(Jacob.Bramley)
Reporter

Comment 25

9 years ago
Comment on attachment 492532 [details] [diff] [review]
3. ARM asm tweaks.

I'm giving r+ because it all looks sensible. Most of the individual changes in this particular patch (if not the collation and suchlike) were mine anyway, so I'm not sure how valid my review is.

Ironically, you're probably the best person to review this. I think you already have, but I'll tag it for the Bugzilla formality anyway.
Attachment #492532 - Flags: review?(cdleary)
Attachment #492532 - Flags: review?(Jacob.Bramley)
Attachment #492532 - Flags: review+
Reporter

Comment 26

9 years ago
I see one unexpected* jit-test failure with -m:

    -m /work/moz/pic/js/src/jit-test/tests/basic/bug584499-1.js

I configured with "--enable-debug --disable-optimize --enable-methodjit --disable-tracejit", and used the latest MQ stack available in tm-arm-pic-mq, with all +newlabels patches up to newlabel-getprop.

* I also see the failures from bug 613482 and the broken date test.
(In reply to comment #25)

Could you double check the mask on repatchLEAToLoadPtr? I had to change that a bit more since we last spoke about it.
Reporter

Comment 28

9 years ago
(In reply to comment #27)
> Could you double check the mask on repatchLEAToLoadPtr? I had to change that a
> bit more since we last spoke about it.

It's sound.
Attachment #492532 - Flags: review?(cdleary) → review+
Reporter

Comment 29

9 years ago
This assumes that calls generated by infallibleVMCall are never patched.

http://infomonkey.cdleary.com/questions/57/can-an-infalliblevmcall-ever-be-patched

If it can be patched, the change to my patch is simple to make.
Attachment #492660 - Flags: review?(cdleary)
Canceling reviews and postponing until beta 4 for a smoother transition and better testing. SETPROP is nearly done being tested with the new labels so new patches for review should be up soon.
Attachment #492531 - Flags: review?(dvander)
Attachment #492535 - Flags: review?(dvander)
Attachment #492536 - Flags: review?(dvander)
Attachment #492660 - Flags: review?(cdleary)

Updated

9 years ago
tracking-fennec: 2.0b3+ → 2.0b4+
Reporter

Updated

9 years ago
Depends on: 614907
Comparing against pre-PIC methodjit-only we get 1.078x on SunSpider and 1.98x on V8-V4 in the SunSpider harness. jbramley has some ideas for how we can meaningfully profile the slowdowns (3d-morph is 20% slower, math-spectral-norm is 11% slower, v8-crypto is 12% slower). Overall, we have some nice wins, though.

Reviewable patch queue to follow.
Initial port to the cross-platform PIC labels. Static instances are used on X86 to avoid memory usage overhead (since that platform has known fixed-constant offsets for all patchable code locations). This small memory reduction could also be done for ARM, eventually.

These labels evolve a bit over the course of the patch set, sprouting more convenient accessors.
Attachment #492530 - Attachment is obsolete: true
Attachment #492531 - Attachment is obsolete: true
Attachment #492532 - Attachment is obsolete: true
Attachment #498304 - Flags: review?(dmandelin)
Switches the MICs to use a "repatcher" interface -- the idea being that higher-level code constructs, like branchWithPatch32 or loadValueFromComponents, can be patched through a platform independent API.
Attachment #492535 - Attachment is obsolete: true
Attachment #498305 - Flags: review?(dmandelin)
This addresses a bug in the way fallibleVMCalls were being formed.
Attachment #498307 - Flags: review?(cdleary)
Attachment #492536 - Attachment is obsolete: true
Attachment #498308 - Flags: review?(dmandelin)
Attachment #492660 - Attachment is obsolete: true
Attachment #498311 - Flags: review?(dmandelin)
Attachment #498313 - Flags: review?(dmandelin)
Review checklists should include that IC space has been appropriately reserved for both inline and out-of-line paths, so that constant pools don't get dumped in the middle of offset-sensitive IC code.
Chris, it looks like some of these reviews have you as the requestee. Were those patches not written by you or is that some bugzilla bustage?
Comment on attachment 498304 [details] [diff] [review]
1. Cross-platform PIC labels.

I think this is pretty close, but I have a few questions. As a sanity preservation device, to track which patches have been reviewed and not +'d, I'm actually going to minus them (gasp! :-) ). This should not actually be considered a negative evaluation of the patch.

Anyway, my notes on this patch:

1. The 8-bit bitfields make me a bit nervous. We have had bugs in the past from too-small offset bitfields. Can we get either (a) a comment explaining why 8 bits is certainly enough, or (b) assertions that the values have been stored properly?

2. Shouldn't the data fields be #if'd out if IC_HAS_LABELS is not defined?

3. I really like the tagged union, checked and made obvious by assertions, for the PIC labels in the PICInfo structs.

4. It seems to me that "offsets" would be a more apt name than "labels" in this case. But it seems too much work to change at this point. Maybe we can do that automatically someday for fun.

5. The "dbg" names in the compiler do seem wrong, though--in the current abstraction they are real, not just debugging bits.

Action is required only on #1, #2, and #5.
Attachment #498304 - Flags: review?(dmandelin) → review-
Comment on attachment 498305 [details] [diff] [review]
2. Newlabel MIC refactor.

Nice-looking patch. I have only a few small requests. By the way, let me know how you want to process the requests--are you going to revise each patch in place, or do you want to address comments in further patches at the end?

Action-required comments:

1. There are magic numbers in ICChecker.h. They are very limited use so it is not egregious, but I think we would be better off with const definitions for them, just to make it easier to maintain later.

2. There are a few typos in comments:

- In ICRepatcher, search for ' tore', which should be 'store'. 
- There is also a missing paren near there. 
- In Nunbox.h, comments refer to 'type' and 'payload' while the arg names are 'treg' and 'dreg'.
- In MonoIC.h, there is a bit of ambiguity in the "Used by GET/SET" part: the first comment seems to be referring to "load", but it's not explicit.
- Also there, there is a typo 'pint' for 'point'.

No-action-required comments:

3. Abbreviation B for byte and b for bit hasn't been used before in SM codebase. I figured it out, and it's standard in general usage, so it doesn't seem too bad. Might want to ask others about that, though.

4. The DBGLABEL names are kind of gross to me. You didn't invent them, though, and it's not obvious how to really improve them, so it's not necessary to fix them now. 

5. Excellent comment blocks in ICRepatcher.h!
Attachment #498305 - Flags: review?(dmandelin) → review-
Comment on attachment 498308 [details] [diff] [review]
3. Newlabel GETPROP.

Looks good. There seems to be an old dead variable "Jump j" in Compiler::jsop_getprop() that you could delete. I studied the offset logic added here and its uses pretty closely, and it looks right to me. It would be a good idea next time we meet in person for you to give me an explanation of the constant pool stuff (I do get the gist of it here, and I see the macros where I think they belong). 

I think also, at the end, I should recheck that all the x86 fixed offsets are checked by assertions. I can do that after landing, though.
Attachment #498308 - Flags: review?(dmandelin) → review+
Comment on attachment 498311 [details] [diff] [review]
6. Newlabel SETPROP.

I spotted only 2 minor issues, so I'm just going to r+ with change requests.

>diff --git a/js/src/methodjit/ICLabels.h b/js/src/methodjit/ICLabels.h


1. The ifdefs are now unneeded:

>     CodeLocationJump getInlineTypeJump(CodeLocationLabel fastPathStart) {
> #if defined JS_CPU_X86 || defined JS_CPU_X64
>         return fastPathStart.jumpAtOffset(getInlineTypeJumpOffset());
> #elif defined JS_CPU_ARM
>         /* Type check is after the testObject, so offset by one instruction to get the jump. */
>-        return fastPathStart.jumpAtOffset(getInlineTypeJumpOffset() - sizeof(ARMWord));
>+        return fastPathStart.jumpAtOffset(getInlineTypeJumpOffset());
> #endif
>     }
>+    void setDslotsLoadOffset(int offset, bool isConstant, bool isTypeKnown) {
>+#if defined JS_HAS_IC_LABELS
>+#elif defined JS_CPU_X86
>+        JS_ASSERT_IF(isConstant, offset == INLINE_DSLOTS_BEFORE_CONSTANT);
>+        JS_ASSERT_IF(isTypeKnown && !isConstant, offset == INLINE_DSLOTS_BEFORE_KTYPE);

2. Need a 3rd assertion here to cover the DYNAMIC case.

That's it.
Attachment #498311 - Flags: review?(dmandelin) → review+
Attachment #498312 - Flags: review?(dmandelin) → review+
Attachment #498313 - Flags: review?(dmandelin) → review+
Attachment #498314 - Flags: review?(dmandelin) → review+
Attachment #498310 - Flags: review?(cdleary) → review+
Attachment #498309 - Flags: review?(cdleary) → review+
Attachment #498307 - Flags: review?(cdleary) → review+
FWIW, the patches in this bug caused a 10% V8 regression (http://www.arewefastyet.com/awfy2.php?machine=9). Looking at the individual tests, raytrace, earley and splay regressed the most.
Depends on: 625701
Depends on: 625718
Chris and I went over the important questions from review IRL. Only nits are left, so r+ing the patches so we can fix nits and land.
Attachment #498304 - Flags: review- → review+
Attachment #498305 - Flags: review- → review+
Depends on: 625757
No longer depends on: 625757
How could I forget about http://hg.mozilla.org/mozilla-central/rev/87d45053e28f ? That was heartless.
You need to log in before you can comment on or make changes to this bug.