Google blog page loads very slowly and often fails to fully load
Categories
(Core :: JavaScript Engine, defect, P3)
Tracking
()
Performance Impact | low |
People
(Reporter: mccr8, Assigned: iain)
References
(Blocks 2 open bugs, )
Details
(Keywords: webcompat:platform-bug, webcompat:site-report)
User Story
platform:windows,mac,linux,android impact:site-broken configuration:general affects:some branch:release diagnosis-team:performance
Attachments
(2 files)
This web page doesn't fully load for me in desktop Firefox: https://security.googleblog.com/2023/07/the-ups-and-downs-of-0-days-year-in.html
I see the header and the footer, but nothing else. Reloading doesn't help.
In another profile, I had these errors after a few reloads:
TypeError: can't access property "GetAvailRect", screen is nullPanelMultiView.sys.mjs:1189:5
_calculateMaxHeight resource:///modules/PanelMultiView.sys.mjs:1189
handleEvent resource:///modules/PanelMultiView.sys.mjs:1259
Request to access cookie or storage on “https://www.youtube.com/subscribe_embed? etc"
was blocked because it came from a tracker and content blocking is enabled.
(twice)
But after reloading 3 or so times it eventually worked. In my regular profile, it still is not working.
In my main profile, I'm getting this error message:
Exception trying to format object for log message: SyntaxError: JSON.parse: unexpected end of data at line 1 column 1 of the JSON data(resource://services-sync/resource.sys.mjs:222:21) JS Stack trace: _processResponse/<@resource.sys.mjs:222:21
format@Log.sys.mjs:639:19
formatText@Log.sys.mjs:568:44
format@Log.sys.mjs:586:12
append@Log.sys.mjs:679:37
append@logmanager.sys.mjs:137:44
log@Log.sys.mjs:376:16
warn@Log.sys.mjs:387:10
_doQuickWrite@tabs.sys.mjs:308:17
and then:
Uncaught InternalError: too much recursion
initContent https://security.googleblog.com/2023/07/the-ups-and-downs-of-0-days-year-in.html:3882
jQuery 2
initContent https://security.googleblog.com/2023/07/the-ups-and-downs-of-0-days-year-in.html:3875
process https://security.googleblog.com/2023/07/the-ups-and-downs-of-0-days-year-in.html:3932
<anonymous> https://security.googleblog.com/2023/07/the-ups-and-downs-of-0-days-year-in.html:3973
jQuery 11
Reporter | ||
Comment 1•1 year ago
|
||
I'm seeing the JSON.parse message even without the page loaded, so it is probably unrelated.
Reporter | ||
Comment 2•1 year ago
|
||
We also take about twice as long to show content on this page as Chrome. Looking at a profile, approximately 100% of the 9 seconds it takes to show anything is being spent in is being spent in the JS engine's regexp code, so maybe something is going weird there? https://share.firefox.dev/45mtSfh
Reporter | ||
Comment 3•1 year ago
|
||
I'll move this over to the JS engine component then.
Iain, is there somebody who might be able to look at irregexp performance issues? Thanks.
Reporter | ||
Updated•1 year ago
|
Assignee | ||
Comment 4•1 year ago
|
||
Profiling Chrome on the same site, it takes several seconds to load and spends 94% of the total time in the same regexp (/([\s\S]+?)<div data-is-preview.+?>([\s\S\]+)<\/div>/
). It looks like it is stuck in a pathological backtracking case, although not for as long as we are.
I don't know why Chrome gets through the pathological regexp faster than we do. We use the same regexp engine as V8, so we should be doing basically the same work. We have our own codegen layer, so maybe there's an issue there.
Should we reach out to the blog author and let them know that their regexp is bad?
Reporter | ||
Comment 5•1 year ago
|
||
I posted about it on Twitter in reply to this tweet.
Reporter | ||
Comment 6•1 year ago
|
||
They're looking into it.
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Reporter | ||
Comment 7•1 year ago
|
||
Somebody else filed a webcompat issue on this same site.
Comment 8•1 year ago
|
||
Assignee | ||
Comment 9•1 year ago
|
||
Locally, d8 takes about 3.4s and SM takes about 4 seconds when running a reduced testcase in the shell. In SM, roughly 98% of all samples are in this instruction sequence in jitted regexp code:
cmp rdx, -0x7
jge $+0x28
movzx ecx, byte [rax + rdx * 1 + 0x7]
mov rsi, 0x7fbc96c40c14
mov edi, 0x7f
and edi, ecx
movzx esi, byte [rsi + rdi * 1]
test esi, esi
jnz $+0x6
add rdx, 0x8
jmp $-0x32
cmp rdx, -0x1c
jge $+0x418
mov ecx, dword [rax + rdx * 1]
cmp ecx, 0x7669643c
jz $+0x6
add rdx, 0x1
jmp $-0x51
In d8, we get the following:
cmpl rdi,0xf9
jge 0x34dc000840d4 <+0x94>
movzxbl rdx,[rsi+rdi*1+0x7]
REX.W movq rax,0x34dc082124
REX.W movq rbx,rdx
REX.W andq rbx,0x7f
cmpb [rax+rbx*1+0x7],0x0
jnz 0x34dc000840d4 <+0x94>
REX.W addq rdi,0x8
jmp 0x34dc000840a4 <+0x64>
cmpl rdi,0xe4
jge 0x34dc00084501 <+0x4c1
movl rdx,[rsi+rdi*1]
cmpl rdx,0x7669643c
jz 0x34dc000840f2 <+0xb2>
REX.W addq rdi,0x1
jmp 0x34dc000840a4 <+0x64>
Setting aside differences in how things are formatted, the generated code is mostly identical except for this chunk:
***SM*** ***V8***
movzx esi, byte [rsi + rdi * 1] cmpb [rax+rbx*1+0x7],0x0
test esi, esi jnz 0x34dc000840d4 <+0x94>
jnz $+0x6
We're doing a load and a test; V8 is doing a comparison directly in memory. The test instruction is the single hottest instruction in the benchmark, with more than 1/3 of all samples (31K/89K), so it seems plausible that fixing this would help. It's generated by this code here. The macroassembler doesn't currently support branch8
with a BaseIndex and an immediate operand, but I don't see any reason we can't add it.
Assignee | ||
Comment 10•1 year ago
|
||
I wrote a patch that generates code matching V8:
cmp byte [rsi + rdi * 1], 0x0
jnz $+0x6
Unfortunately, at least on my machine, it slows things down: d8 is still around 3.4s, but my modified version of SM has regressed from 4s to nearly 5s. In cases like this, where there's a tiny amount of hot code and small changes that look like they should be improvements regress performance, I start to suspect that we're getting unlucky with eg cache line alignment: maybe V8's regexp prologue is a slightly different length, code is aligned in slightly different places, and since we spend a huge amount of time in a tiny number of instructions, even a small hiccup in the hardware can lead to large performance differences. If that's what's going on here, my experience is that it's difficult/impossible to modify the compiler to avoid ever hitting this kind of problem, so you should just generate the code that should be better and assume that it will be a win overall.
I'm not sure if that's an argument for landing my patch or dropping it. I'll put it up as a WIP for now. If anybody can measure a real improvement from it, we can consider landing it.
Assignee | ||
Comment 11•1 year ago
|
||
Comment 12•1 year ago
|
||
Why do we throw InternalError: too much recursion
and Chrome doesn't?
Updated•1 year ago
|
Updated•1 year ago
|
Assignee | ||
Comment 13•1 year ago
|
||
Hmm, interesting question. "too much recursion" is basically our default error when things go wrong in a regexp. (We should maybe fix that at some point.) I was assuming it was this code, which limits the number of times we retry after being interrupted. I didn't actually verify that, though, and now that I think about it, it doesn't make much sense: the slow script dialog doesn't appear here.
Assignee | ||
Comment 14•1 year ago
•
|
||
Actually, scratch that: I tested it out in the browser, and we are indeed triggering interrupts via the watchdog thread. I guess we count the number of times we're interrupted, not the number of times we show the dialog.
Long story short: the watchdog thread interrupted us four times, so we gave up and threw an error. This also explains why Chrome is significantly faster here: when interrupted, we stop the regexp, handle the interrupt, then restart regexp execution from the beginning, whereas V8 does scary code crimes to keep the regexp running despite the possibility of triggering a GC with unrooted values on the stack. (The words "manually relocate unhandlified references" are involved.)
Comment 15•1 year ago
|
||
anecdotal info: pages may load for folks with more RAM -- I'm on win64 arm64 with 16GBi ram and always fails, others with 64GBi ram get it rendered.
Comment 16•1 year ago
|
||
"Perf" key word?
Updated•1 year ago
|
Updated•8 months ago
|
Updated•2 months ago
|
Updated•2 months ago
|
Comment 17•2 months ago
|
||
Hey Iain,
What ever happened to https://v8.dev/blog/non-backtracking-regexp? Would that help here? Are there other things beyond scary code crimes we could do here? (I suppose maybe we allow more watchdog interrupts from regexps, with some backoff allowed).
Assignee | ||
Comment 18•21 days ago
|
||
It's still experimental. It doesn't look particularly active; it looks like there might be a grad student plugging away at it (eg here: "This work was supervised by Aurèle Barrière, from SYSTEMF at EPFL.")
It's disabled by default in V8.
If we want to fix this, we might need the scary code crimes. Working through what those crimes would actually entail:
- The core problem is that handling an interrupt could trigger a GC in the middle of regexp execution, and we haven't necessarily rooted everything.
- Inside the generated irregexp code, we mostly use offsets. The exception is a single register (input_end_pointer_) that points to the byte after the last character of the input. This is what everything else is an offset from. If we could root the underlying string before triggering an interrupt, and update this pointer after handling the interrupt, everything else should Just Work.
- The irregexp code never actually sees a string. It is called with a pointer to an InputOutputData, which in turn contains two pointers to the beginning and end of the string. The generated irregexp code uses the InputOutputData to set up its stack frame (see FrameData) and registers, then drops the reference to the InputOutputData.
- As far as I can tell, all callers have access to a real JSString, and are currently responsible for computing the start/end pointers. I believe I copied this design decision from bhackett's original irregexp port. In V8, the calling convention for the generated code passes in the string pointer directly. It should be mostly trivial to rewrite this code to pass a JSString* in InputOutputData, and store it in the irregexp FrameData.
- If we did that, then on stack overflow (which is how interrupts are signalled), instead of immediately throwing an exception and terminating regexp execution, we could a) pass a FrameData* to the interrupt handler and manually update the JSString*, or (maybe better?) b) push an exit frame and treat this frame like any other frame containing data that needs to be traced.
- This would require us to audit the code that calls into the generated irregexp code, to make sure that nothing is unrooted in our caller, either. When called from C++, this is ExecuteRaw and its only caller here. (One downside of option b is that this code currently skips most of the overhead of calling into jitcode, like pushing a new activation, and that might not be possible if we want to trace frames the standard way.) When called from jit code, the caller is generated by PrepareAndExecuteRegExp, which also probably needs work to be GC-safe.
All in all, it seems doable, but would need careful auditing.
Description
•