Open Bug 1845775 Opened 1 year ago Updated 21 days ago

Google blog page loads very slowly and often fails to fully load

Tracking

()

Status:

ASSIGNED

Project Flags:

Performance Impact

low

People

(Reporter: mccr8, Assigned: iain)

References

(Blocks 2 open bugs,
URL
)

Details

(Keywords: webcompat:platform-bug, webcompat:site-report)

User Story

platform:windows,mac,linux,android
impact:site-broken
configuration:general
affects:some
branch:release
diagnosis-team:performance

Attachments

(2 files)

reduced-test-case.html 1 year ago Jeff Muizelaar [:jrmuizel] 153.07 KB, text/html		Details
WIP: Bug 1845775: Generate cmpb instead of movzx/test in CheckBitInTable 1 year ago Iain Ireland [:iain] 48 bytes, text/x-phabricator-request		Details \| Review

Andrew McCreight [:mccr8]

Reporter

Description

•

1 year ago

This web page doesn't fully load for me in desktop Firefox: https://security.googleblog.com/2023/07/the-ups-and-downs-of-0-days-year-in.html

I see the header and the footer, but nothing else. Reloading doesn't help.

In another profile, I had these errors after a few reloads:

TypeError: can't access property "GetAvailRect", screen is nullPanelMultiView.sys.mjs:1189:5
    _calculateMaxHeight resource:///modules/PanelMultiView.sys.mjs:1189
    handleEvent resource:///modules/PanelMultiView.sys.mjs:1259

Request to access cookie or storage on “https://www.youtube.com/subscribe_embed? etc"
was blocked because it came from a tracker and content blocking is enabled.
(twice)

But after reloading 3 or so times it eventually worked. In my regular profile, it still is not working.

In my main profile, I'm getting this error message:

Exception trying to format object for log message: SyntaxError: JSON.parse: unexpected end of data at line 1 column 1 of the JSON data(resource://services-sync/resource.sys.mjs:222:21) JS Stack trace: _processResponse/<@resource.sys.mjs:222:21
format@Log.sys.mjs:639:19
formatText@Log.sys.mjs:568:44
format@Log.sys.mjs:586:12
append@Log.sys.mjs:679:37
append@logmanager.sys.mjs:137:44
log@Log.sys.mjs:376:16
warn@Log.sys.mjs:387:10
_doQuickWrite@tabs.sys.mjs:308:17

and then:

Uncaught InternalError: too much recursion
    initContent https://security.googleblog.com/2023/07/the-ups-and-downs-of-0-days-year-in.html:3882
    jQuery 2
    initContent https://security.googleblog.com/2023/07/the-ups-and-downs-of-0-days-year-in.html:3875
    process https://security.googleblog.com/2023/07/the-ups-and-downs-of-0-days-year-in.html:3932
    <anonymous> https://security.googleblog.com/2023/07/the-ups-and-downs-of-0-days-year-in.html:3973
    jQuery 11

Andrew McCreight [:mccr8]

Reporter

Comment 1

•

1 year ago

I'm seeing the JSON.parse message even without the page loaded, so it is probably unrelated.

Andrew McCreight [:mccr8]

Reporter

Comment 2

•

1 year ago

We also take about twice as long to show content on this page as Chrome. Looking at a profile, approximately 100% of the 9 seconds it takes to show anything is being spent in is being spent in the JS engine's regexp code, so maybe something is going weird there? https://share.firefox.dev/45mtSfh

Andrew McCreight [:mccr8]

Reporter

Comment 3

•

1 year ago

I'll move this over to the JS engine component then.

Iain, is there somebody who might be able to look at irregexp performance issues? Thanks.

URL: https://security.googleblog.com/2023/...

Component: Desktop → JavaScript Engine

Flags: needinfo?(iireland)

Product: Web Compatibility → Core

Summary: https://security.googleblog.com/2023/07/the-ups-and-downs-of-0-days-year-in.html doesn't fully load → Google blog page loads very slowly and often fails to fully load

Andrew McCreight [:mccr8]

Reporter

Updated

•

1 year ago

Performance Impact: --- → ?

Webcompat Priority: --- → ?

Iain Ireland [:iain]

Assignee

Comment 4

•

1 year ago

Profiling Chrome on the same site, it takes several seconds to load and spends 94% of the total time in the same regexp (/([\s\S]+?)<div data-is-preview.+?>([\s\S\]+)<\/div>/). It looks like it is stuck in a pathological backtracking case, although not for as long as we are.

I don't know why Chrome gets through the pathological regexp faster than we do. We use the same regexp engine as V8, so we should be doing basically the same work. We have our own codegen layer, so maybe there's an issue there.

Should we reach out to the blog author and let them know that their regexp is bad?

Flags: needinfo?(iireland)

Andrew McCreight [:mccr8]

Reporter

Comment 5

•

1 year ago

I posted about it on Twitter in reply to this tweet.

Andrew McCreight [:mccr8]

Reporter

Comment 6

•

1 year ago

They're looking into it.

Randell Jesup [:jesup] (needinfo me)

Updated

•

1 year ago

Performance Impact: ? → low

Ksenia Berezina [:ksenia]

Updated

•

1 year ago

Webcompat Priority: ? → ---

Ksenia Berezina [:ksenia]

Updated

•

1 year ago

Keywords: webcompat:needs-diagnosis

Andrew McCreight [:mccr8]

Reporter

Comment 7

•

1 year ago

Somebody else filed a webcompat issue on this same site.

See Also: → https://github.com/webcompat/web-bugs/issues/125193

Jeff Muizelaar [:jrmuizel]

Comment 8

•

1 year ago

Attached file reduced-test-case.html — Details

Iain Ireland [:iain]

Assignee

Comment 9

•

1 year ago

Locally, d8 takes about 3.4s and SM takes about 4 seconds when running a reduced testcase in the shell. In SM, roughly 98% of all samples are in this instruction sequence in jitted regexp code:

cmp rdx, -0x7
jge $+0x28
movzx ecx, byte [rax + rdx * 1 + 0x7]
mov rsi, 0x7fbc96c40c14
mov edi, 0x7f
and edi, ecx
movzx esi, byte [rsi + rdi * 1]
test esi, esi
jnz $+0x6
add rdx, 0x8
jmp $-0x32
cmp rdx, -0x1c
jge $+0x418
mov ecx, dword [rax + rdx * 1]
cmp ecx, 0x7669643c
jz $+0x6
add rdx, 0x1
jmp $-0x51

In d8, we get the following:

cmpl rdi,0xf9		
jge 0x34dc000840d4  <+0x94>
movzxbl rdx,[rsi+rdi*1+0x7]
REX.W movq rax,0x34dc082124
REX.W movq rbx,rdx		
REX.W andq rbx,0x7f	
cmpb [rax+rbx*1+0x7],0x0	
jnz 0x34dc000840d4  <+0x94>
REX.W addq rdi,0x8		
jmp 0x34dc000840a4  <+0x64>
cmpl rdi,0xe4		
jge 0x34dc00084501  <+0x4c1
movl rdx,[rsi+rdi*1]	
cmpl rdx,0x7669643c	
jz 0x34dc000840f2  <+0xb2>	
REX.W addq rdi,0x1		
jmp 0x34dc000840a4  <+0x64>

Setting aside differences in how things are formatted, the generated code is mostly identical except for this chunk:

          ***SM***                             ***V8***
movzx esi, byte [rsi + rdi * 1]		 cmpb [rax+rbx*1+0x7],0x0	
test esi, esi				 jnz 0x34dc000840d4  <+0x94>
jnz $+0x6

We're doing a load and a test; V8 is doing a comparison directly in memory. The test instruction is the single hottest instruction in the benchmark, with more than 1/3 of all samples (31K/89K), so it seems plausible that fixing this would help. It's generated by this code here. The macroassembler doesn't currently support branch8 with a BaseIndex and an immediate operand, but I don't see any reason we can't add it.

Iain Ireland [:iain]

Assignee

Comment 10

•

1 year ago

I wrote a patch that generates code matching V8:

cmp byte [rsi + rdi * 1], 0x0
jnz $+0x6

Unfortunately, at least on my machine, it slows things down: d8 is still around 3.4s, but my modified version of SM has regressed from 4s to nearly 5s. In cases like this, where there's a tiny amount of hot code and small changes that look like they should be improvements regress performance, I start to suspect that we're getting unlucky with eg cache line alignment: maybe V8's regexp prologue is a slightly different length, code is aligned in slightly different places, and since we spend a huge amount of time in a tiny number of instructions, even a small hiccup in the hardware can lead to large performance differences. If that's what's going on here, my experience is that it's difficult/impossible to modify the compiler to avoid ever hitting this kind of problem, so you should just generate the code that should be better and assume that it will be a win overall.

I'm not sure if that's an argument for landing my patch or dropping it. I'll put it up as a WIP for now. If anybody can measure a real improvement from it, we can consider landing it.

Iain Ireland [:iain]

Assignee

Comment 11

•

1 year ago

Attached file WIP: Bug 1845775: Generate cmpb instead of movzx/test in CheckBitInTable — Details

Jeff Muizelaar [:jrmuizel]

Comment 12

•

1 year ago

Why do we throw InternalError: too much recursion and Chrome doesn't?

Jeff Muizelaar [:jrmuizel]

Updated

•

1 year ago

Flags: needinfo?(iireland)

Nicolas B. Pierron [:nbp]

Updated

•

1 year ago

Assignee: nobody → iireland

Blocks: sm-defects-site, sm-jits

Severity: -- → S4

Status: NEW → ASSIGNED

Priority: -- → P1

Iain Ireland [:iain]

Assignee

Comment 13

•

1 year ago

Hmm, interesting question. "too much recursion" is basically our default error when things go wrong in a regexp. (We should maybe fix that at some point.) I was assuming it was this code, which limits the number of times we retry after being interrupted. I didn't actually verify that, though, and now that I think about it, it doesn't make much sense: the slow script dialog doesn't appear here.

Flags: needinfo?(iireland)

Iain Ireland [:iain]

Assignee

Comment 14

•

1 year ago

•

Edited

Actually, scratch that: I tested it out in the browser, and we are indeed triggering interrupts via the watchdog thread. I guess we count the number of times we're interrupted, not the number of times we show the dialog.

Long story short: the watchdog thread interrupted us four times, so we gave up and threw an error. This also explains why Chrome is significantly faster here: when interrupted, we stop the regexp, handle the interrupt, then restart regexp execution from the beginning, whereas V8 does scary code crimes to keep the regexp running despite the possibility of triggering a GC with unrooted values on the stack. (The words "manually relocate unhandlified references" are involved.)

Hal Wine [:hwine] use NI!

Comment 15

•

1 year ago

anecdotal info: pages may load for folks with more RAM -- I'm on win64 arm64 with 16GBi ram and always fails, others with 64GBi ram get it rendered.

Worcester12345

Comment 16

•

1 year ago

"Perf" key word?

Matthew Gaudet (he/him) [:mgaudet]

Updated

•

1 year ago

Priority: P1 → P3

James Graham [:jgraham]

Updated

•

8 months ago

Keywords: webcompat:needs-diagnosis

James Graham [:jgraham]

Updated

•

2 months ago

Keywords: webcompat:platform-bug, webcompat:site-report

Dennis Schubert [:denschub]

Updated

•

2 months ago

Severity: S4 → S2

User Story: (updated)

Matthew Gaudet (he/him) [:mgaudet]

Comment 17

•

2 months ago

Hey Iain,

What ever happened to https://v8.dev/blog/non-backtracking-regexp? Would that help here? Are there other things beyond scary code crimes we could do here? (I suppose maybe we allow more watchdog interrupts from regexps, with some backoff allowed).

Flags: needinfo?(iireland)

Iain Ireland [:iain]

Assignee

Comment 18

•

21 days ago

It's still experimental. It doesn't look particularly active; it looks like there might be a grad student plugging away at it (eg here: "This work was supervised by Aurèle Barrière, from SYSTEMF at EPFL.")

It's disabled by default in V8.

If we want to fix this, we might need the scary code crimes. Working through what those crimes would actually entail:

The core problem is that handling an interrupt could trigger a GC in the middle of regexp execution, and we haven't necessarily rooted everything.
Inside the generated irregexp code, we mostly use offsets. The exception is a single register (input_end_pointer_) that points to the byte after the last character of the input. This is what everything else is an offset from. If we could root the underlying string before triggering an interrupt, and update this pointer after handling the interrupt, everything else should Just Work.
The irregexp code never actually sees a string. It is called with a pointer to an InputOutputData, which in turn contains two pointers to the beginning and end of the string. The generated irregexp code uses the InputOutputData to set up its stack frame (see FrameData) and registers, then drops the reference to the InputOutputData.
As far as I can tell, all callers have access to a real JSString, and are currently responsible for computing the start/end pointers. I believe I copied this design decision from bhackett's original irregexp port. In V8, the calling convention for the generated code passes in the string pointer directly. It should be mostly trivial to rewrite this code to pass a JSString* in InputOutputData, and store it in the irregexp FrameData.
If we did that, then on stack overflow (which is how interrupts are signalled), instead of immediately throwing an exception and terminating regexp execution, we could a) pass a FrameData* to the interrupt handler and manually update the JSString*, or (maybe better?) b) push an exit frame and treat this frame like any other frame containing data that needs to be traced.
This would require us to audit the code that calls into the generated irregexp code, to make sure that nothing is unrooted in our caller, either. When called from C++, this is ExecuteRaw and its only caller here. (One downside of option b is that this code currently skips most of the overhead of calling into jitcode, like pushing a new activation, and that might not be possible if we want to trace frames the standard way.) When called from jit code, the caller is generated by PrepareAndExecuteRegExp, which also probably needs work to be GC-safe.

All in all, it seems doable, but would need careful auditing.

Flags: needinfo?(iireland)

You need to log in before you can comment on or make changes to this bug.