many misaligned 32-bit loads from jitted regexps

NEW
Unassigned

Core
JavaScript Engine
8 years ago
4 years ago

People

(Reporter: luke, Unassigned)

Details

Attachments

(1 attachment)

(Reporter)

Description

8 years ago
With njn's misalignment patch applied (bug 476122), valgrind is reporting ~2 million misaligned 32-bit loads from jitted regexp code when running regexp-dna.js.

For the following microbenchmark:

s = "abcdabcdabcd";
for (var i = 0; i < 10000; ++i) {
    /cdab/.test(s);
}

valgrind reports 10000 misaligned 32-bit loads.
Will investigate. Have we seen what kind of wins we get from any other misaligned load bugs so I know where to prioritize?
Assignee: general → cdleary
Status: NEW → ASSIGNED
(In reply to comment #1)
> Will investigate. Have we seen what kind of wins we get from any other
> misaligned load bugs so I know where to prioritize?

So far, it's been mostly with misaligned doubles. It should be possible to find out whether this matters by making a quick-and-dirty tweak that reduces the unaligned loads in the microbenchmark and seeing if that has any effect.
(Reporter)

Comment 3

8 years ago
(In reply to comment #1)
In bug 589526 comment 0, I seemed to get a 1-2 ms speedup (on a way-old 1.8 GHz laptop) from a hack to remove around 200K misaligned double loads/stores. YMMV.
(In reply to comment #1)
I think it's worth investigating, although I suspect it might end
up feeling like a trip into the microarchitectural Twilight Zone.

One thing to bear in mind is, there may be a (big?) cost difference
between misaligned accesses that straddle a D1 or L2 line, as opposed
to those that don't.  In the former case the processor has to fish
out both cache lines and glue the result together, which sounds
slow.  See (e.g.) the 2nd paragraph of the "Introduction" of
http://software.intel.com/en-us/articles/reducing-the-impact-of-misaligned-memory-accesses
(In reply to comment #4)
> misaligned accesses that straddle a D1 or L2 line

Or worse, a page boundary!
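
To make the distinction in the last two comments concrete, here is a minimal sketch that checks whether an access crosses a given boundary. The helper name `straddles` is hypothetical, and the 64-byte cache line and 4096-byte page sizes are assumptions (typical for x86, but not stated anywhere in this bug).

```javascript
// Hypothetical helper: does an access of `size` bytes starting at `address`
// cross a multiple of `boundary` (e.g. a cache line or a page)?
function straddles(address, size, boundary) {
    // Index of the boundary-sized block holding the first and last byte.
    // Math.floor is used instead of bit shifts so that addresses wider
    // than 32 bits are not truncated.
    var firstBlock = Math.floor(address / boundary);
    var lastBlock = Math.floor((address + size - 1) / boundary);
    return firstBlock !== lastBlock;
}

// Assumed sizes, for illustration only.
var CACHE_LINE = 64;
var PAGE = 4096;

// A misaligned 32-bit load that stays within one line: the cheaper case.
var withinLine = straddles(0x7d254f2, 4, CACHE_LINE);
// A misaligned 32-bit load whose last two bytes fall in the next line.
var acrossLine = straddles(62, 4, CACHE_LINE);
// The same idea at a page boundary.
var acrossPage = straddles(4094, 4, PAGE);
```

Under these assumptions, a misaligned load is only in the expensive category when its first and last bytes land in different blocks, so the count of misaligned loads alone does not say how many of them are costly.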
Created attachment 470561 [details]
Annotated asm found during debug.

My debugging session showed only aligned accesses to the string with our malloc.

The assembled regexp program always fails to match on the first two (coalesced and aligned) characters of the string in my debugging session, but Valgrind is showing the error at an address of 0x7d254f2.

It says, "Address 0x7d254f2 is 2 bytes inside a block of size 26 alloc'd" -- I'm guessing that means the access is at _an offset of two bytes_ within a block sized 26 bytes? If so, I can't repro that behavior under debug ATM. Will ponder a bit.
> It says, "Address 0x7d254f2 is 2 bytes inside a block of size 26 alloc'd" --
> I'm guessing that means the accesses is at _an offset of two bytes_ within a
> block sized 26 bytes?

Yes.

> If so, I can't repro that behavior under debug ATM. Will
> ponder a bit.

Rerun with --db-attach=yes.  This allows you to optionally attach GDB
to the process at any error V reports, so you can look at the
registers exactly at the point where the alleged misalignment
occurred.  (--db-attach only works on Linux, be warned.)
Never mind, this makes sense to me now. The increment after the first char test is only one char (two bytes), so our misaligned dword load comes from the "b" char, two bytes in, as Valgrind is reporting. Now thinking about how to get a dword-sized increment in the most general case we can.

Thanks for the tip Julian! Will definitely try that out.
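
The explanation above can be modelled with a small sketch. This is not the real jitted code: it assumes 2-byte chars, a matcher that compares two coalesced chars per 32-bit load, a start position that advances by one char after each failure, and an even-length pattern. The function name `countMisalignedLoads` is hypothetical, and the base address is taken from the Valgrind message (0x7d254f2 minus the 2-byte offset).

```javascript
// Hypothetical model of the coalesced two-char compare, not the actual JIT.
// Assumes `pattern` has an even number of chars.
function countMisalignedLoads(text, pattern, baseAddress) {
    var misaligned = 0;

    // Model one 32-bit load of the char pair starting at charIndex.
    function load32(charIndex) {
        var address = baseAddress + 2 * charIndex; // 2 bytes per char
        if (address % 4 !== 0) {
            misaligned++;
        }
        return text.substr(charIndex, 2);
    }

    for (var start = 0; start + pattern.length <= text.length; start++) {
        var matched = true;
        for (var k = 0; k < pattern.length; k += 2) {
            if (load32(start + k) !== pattern.substr(k, 2)) {
                matched = false;
                break;
            }
        }
        if (matched) {
            break; // test() stops at the first match
        }
    }
    return misaligned;
}

// One test() call from the microbenchmark in the description.
var perCall = countMisalignedLoads("abcdabcdabcd", "cdab", 0x7d254f0);
```

In this model the load at "a" is aligned and fails, the load at "b" is misaligned and fails, and the loads at "c" and the following "a" are aligned and match, so `perCall` is 1. Over the 10000 iterations of the microbenchmark that gives 10000 misaligned loads, which agrees with the count in the description.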
Mass-reassigning cdleary's bugs to default. He won't be working on any of them anymore, I guess.

@cdleary: shout if you take issue with this.
Assignee: cdleary → general
Status: ASSIGNED → NEW
(Assignee)

Updated

4 years ago
Assignee: general → nobody