Initial mozilla safebrowsing/tracking download almost always fails on local artifact build
Categories
(Core :: Networking: HTTP, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox109 | --- | fixed |
People
(Reporter: Gijs, Assigned: kershaw)
References
Details
(Whiteboard: [necko-triaged])
Attachments
(4 files)
I was having issues testing out styling changes to the tracking protection panel because in new profiles, there was no effective TP at all.
Digging into it, what seems to happen is that on an artifact build and a clean profile, only the mozilla
provider is being listed on about:url-classifier
, and attempting to force an update hits the backoff timer ("cannot update").
From logging, it appears that the initial request errors out without a status
argument provided to https://searchfox.org/mozilla-central/rev/2d1097c27f31cb15f7588b043d691cda9710d14b/toolkit/components/url-classifier/UrlClassifierListManager.jsm#755-759 . The url that fails is https://shavar.services.mozilla.com/downloads?client=Firefox&appver=108.0a&pver=2.2
but no request is logged at all in the network part of the browser toolbox. I have the impression the request times out entirely or fails to connect, but I'm not sure how to get more details for it.
It doesn't appear possible to force an update despite the backoff time from the UI, but if I set a breakpoint in the debugger here and overwrite the nextRequestTime_
when clicking "Force update" in about:url-classifier
the request succeeds fine and TP starts working.
The same issue is more intermittent when using mozregression downloaded builds, which makes it difficult/impossible to determine if this is some kind of regression or specific to artifact builds. It smells like a race condition to me but I'm unsure how to dig into it more.
Reporter | ||
Comment 1•3 years ago
|
||
Comment 2•3 years ago
•
|
||
Thanks for the detailed report! I couldn't reproduce this locally with an artifact build and mach run --temp-profile
yet. The ETP list seems to get populated within a few seconds of browser startup.
Dimi, do you have time to take a look at this issue when you're back? I'm worried that it might be more widespread and actually affect users in the wild. Happy to take a look together too, feel free to reach out on Slack or Matrix.
Comment 3•3 years ago
•
|
||
I'm able to reproduce this issue on my platform. It looks like when SafeBrowsing tries to fetch update from the server, the OnStartRequest is never called, hence SafeBrowsing returns a timeout error.
Somehow this issue only happens when SafeBrowsing triggers an update during startup, if I trigger an update manually after startup, everything works fine.
Hi Kershaw,
I collected a debug log with http logging, could someone in your team help take a look at the log to see if you have any idea about why this issue is happening? Thanks. Please also let me know if you need more information from me.
Some information about the attached log
- The first two SafeBrowsing requests succeed (line 31721, line 125277), the third one fails (line 161661).
- I change the SafeBrowsing timeout to 10sec. The timeout callback is called in line 169290 (the second file)
Assignee | ||
Comment 4•3 years ago
|
||
Thanks for the log, Dimi.
This is actually a necko bug. I'll try to fix this.
Basically, what happens in necko is that we got stuck in an infinite loop of dispatching a transaction.
The first condition to trigger this is having a http3 connection (tracking-protection.cdn.mozilla.net
) available.
2022-11-07 11:27:04.586894 UTC - [Parent 8909: Socket Thread]: V/nsHttp nsHttpConnectionMgr::DispatchAbstractTransaction [ci=.S........[tlsflags0x00000000]tracking-protection.cdn.mozilla.net:443 <ROUTE-via tracking-protection.cdn.mozilla.net:443> {NPN-TOKEN h3}^firstPartyDomain=safebrowsing.86868755-6b82-4842-b301-72671a0db32e.mozilla trans=1489f2320 caps=400 conn=139d6fd20]
2022-11-07 11:27:04.586897 UTC - [Parent 8909: Socket Thread]: E/nsHttp HttpConnectionUDP::Activate [this=139d6fd20 trans=1489f2320 caps=400]
Next, another channel to the same host (tracking-protection.cdn.mozilla.net
) is created, but with http3 disabled.
2022-11-07 11:27:04.585965 UTC - [Parent 8909: Main Thread]: D/nsHttp Creating nsHttpChannel [this=14a20be00]
2022-11-07 11:27:04.585969 UTC - [Parent 8909: Main Thread]: E/nsHttp HttpBaseChannel::Init [this=14a20be00]
2022-11-07 11:27:04.585975 UTC - [Parent 8909: Main Thread]: E/nsHttp uri=https://tracking-protection.cdn.mozilla.net/social-track-digest256/1604686195
Then, we successfully created a http2 connection to the server. But this connection can not be used, since we try to coalesce this http2 connection to the http3 one.
2022-11-07 11:27:04.652369 UTC - [Parent 8909: Socket Thread]: V/nsHttp FindCoalescableConnection .S........[tlsflags0x00000000]tracking-protection.cdn.mozilla.net:443^firstPartyDomain=safebrowsing.86868755-6b82-4842-b301-72671a0db32e.mozilla
2022-11-07 11:27:04.652380 UTC - [Parent 8909: Socket Thread]: V/nsHttp FindCoalescableConnectionByHashKey() found match conn=139d6fd20 key=34.120.158.37~.:~.:443/[^firstPartyDomain=safebrowsing.86868755-6b82-4842-b301-72671a0db32e.mozilla]viaDNS newCI=.S........[tlsflags0x00000000]tracking-protection.cdn.mozilla.net:443^firstPartyDomain=safebrowsing.86868755-6b82-4842-b301-72671a0db32e.mozilla matchedCI=.S........[tlsflags0x00000000]tracking-protection.cdn.mozilla.net:443 <ROUTE-via tracking-protection.cdn.mozilla.net:443> {NPN-TOKEN h3}^firstPartyDomain=safebrowsing.86868755-6b82-4842-b301-72671a0db32e.mozilla join ok
2022-11-07 11:27:04.652384 UTC - [Parent 8909: Socket Thread]: V/nsHttp FindCoalescableConnection(.S........[tlsflags0x00000000]tracking-protection.cdn.mozilla.net:443^firstPartyDomain=safebrowsing.86868755-6b82-4842-b301-72671a0db32e.mozilla) match conn 139d6fd20 on dns key 34.120.158.37~.:~.:443/[^firstPartyDomain=safebrowsing.86868755-6b82-4842-b301-72671a0db32e.mozilla]viaDNS
2022-11-07 11:27:04.652388 UTC - [Parent 8909: Socket Thread]: V/nsHttp UpdateCoalescingForNewConn() found existing active conn that could have served newConn graceful close of newConn=14cb82800 to migrate to existingConn 139d6fd20
We dispatch the same transaction again. Since http3 is disabled, we end up with a look of creating a http2 connection, closing the connection, and dispatching the transaction again.
Assignee | ||
Comment 5•3 years ago
•
|
||
More context about what triggers this bug.
This bug is triggered by a server side change. To be more specifically, this is triggered by a server that advertises alt-svc
header.
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
When necko sees this header, we create a HTTP/3 connection to the server to verify this information. However, the request to load tracking protection list has beConservative
flag set, which means HTTP/3 is disabled for this request. In the end, this old necko bug is triggered.
We should also ask the server to stop advertising alt-svc
header, since this bug also affects multiple Firefox release versions (also ESR).
Dimi, do you know who to talk at the server side?
Comment 6•3 years ago
|
||
Sven,
Is there any recent change on the server side that make the server starts to send alt-svc
header?
Assignee | ||
Comment 7•3 years ago
|
||
Assignee | ||
Comment 8•3 years ago
|
||
Comment 9•3 years ago
|
||
Thanks Dimi and Kershaw for tracking this down!
(In reply to Tim Huang[:timhuang] from comment #6)
Is there any recent change on the server side that make the server starts to send
alt-svc
header?
There were no changes to the Shavar servers, i.e. the servers backing the domain name shavar.services.mozilla.com. However, as mentioned above, there were changes to the CDN we use to distribute the actual tracking protection lists, i.e. to the CDN that tracking-protection.cdn.mozilla.net points to. We used to use Amazon CloudFront before, and now we are using Google Cloud CDN. The header is added by Google. I'll look into whether we can get rid of it.
Assignee | ||
Comment 10•3 years ago
|
||
Add leave-open
, since I am working on a test case as a separate patch.
Comment 11•3 years ago
|
||
I've disabled HTTP3 on the production CDN https://tracking-protection.cdn.mozilla.net . I've left it on the stage CDN https://cdn.stage.shavar.nonprod.cloudops.mozgcp.net in case folks need to use that for testing fixes
Comment 12•3 years ago
|
||
Comment 13•3 years ago
|
||
bugherder |
Comment 14•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Comment 15•3 years ago
|
||
Backed out for causing failures complaining about security/sandbox/test/browser_bug1717599_XDG-CONFIG-DIRS.ini.
Backout link: https://hg.mozilla.org/integration/autoland/rev/1a5ebe6b0dac211cf4f8096c5409de5255fcd737
Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&revision=f16db56098b0ec268a7dd0cf551f2abebba42135
failure log: https://treeherder.mozilla.org/logviewer?job_id=396266394&repo=autoland&lineNumber=2333
Assignee | ||
Comment 16•3 years ago
|
||
(In reply to Atila Butkovits from comment #15)
Backed out for causing failures complaining about security/sandbox/test/browser_bug1717599_XDG-CONFIG-DIRS.ini.
Backout link: https://hg.mozilla.org/integration/autoland/rev/1a5ebe6b0dac211cf4f8096c5409de5255fcd737
Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&revision=f16db56098b0ec268a7dd0cf551f2abebba42135
failure log: https://treeherder.mozilla.org/logviewer?job_id=396266394&repo=autoland&lineNumber=2333
Not sure why adding a xpcshell test can cause this failure...
Anyway, I'll try to rename my test and land again.
Assignee | ||
Updated•3 years ago
|
Comment 17•3 years ago
•
|
||
I'm seeing a crash on my M1 MacOS 13 running a debug build with this change 2e39366a830c.
thread #10, name = 'Socket Thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x00000002bcb8b608 XUL`mozilla::net::Http2Session::Caps(this=0x00000001770f4e00) at Http2Session.cpp:4319:3
frame #1: 0x00000002bcda9adc XUL`mozilla::net::nsHttpConnection::StartSpdy(this=0x0000000173785000, sslControl=0x0000000173750400, spdyVersion=HTTP_2) at nsHttpConnection.cpp:299:38
frame #2: 0x00000002bcdb5914 XUL`mozilla::net::nsHttpConnection::HandshakeDoneInternal(this=0x0000000173785000) at nsHttpConnection.cpp:2370:7
frame #3: 0x00000002bcd57d78 XUL`mozilla::net::TlsHandshaker::HandshakeDone(this=0x0000000163957c98)::$_22::operator()() const at TlsHandshaker.cpp:45:27
frame #4: 0x00000002bcd57c7c XUL`mozilla::detail::RunnableFunction<mozilla::net::TlsHandshaker::HandshakeDone()::$_22>::Run(this=0x0000000163957c70) at nsThreadUtils.h:531:5
frame #5: 0x00000002bbf89e2c XUL`nsThread::ProcessNextEvent(this=0x0000000113c8b580, aMayWait=true, aResult=0x000000016d412527) at nsThread.cpp:1198:16
frame #6: 0x00000002bbf91c20 XUL`NS_ProcessNextEvent(aThread=0x0000000113c8b580, aMayWait=true) at nsThreadUtils.cpp:465:10
frame #7: 0x00000002bc498100 XUL`mozilla::net::nsSocketTransportService::Run(this=0x0000000103ba9900) at nsSocketTransportService2.cpp:1235:11
frame #8: 0x00000002bbf89e2c XUL`nsThread::ProcessNextEvent(this=0x0000000113c8b580, aMayWait=false, aResult=0x000000016d412ae7) at nsThread.cpp:1198:16
frame #9: 0x00000002bbf91c20 XUL`NS_ProcessNextEvent(aThread=0x0000000113c8b580, aMayWait=false) at nsThreadUtils.cpp:465:10
frame #10: 0x00000002bd423e6c XUL`mozilla::ipc::MessagePumpForNonMainThreads::Run(this=0x0000000113c91400, aDelegate=0x000000016d412dc8) at MessagePump.cpp:300:20
frame #11: 0x00000002bd28d434 XUL`MessageLoop::RunInternal(this=0x000000016d412dc8) at message_loop.cc:381:10
frame #12: 0x00000002bd28d378 XUL`MessageLoop::RunHandler(this=0x000000016d412dc8) at message_loop.cc:374:3
frame #13: 0x00000002bd28d2f4 XUL`MessageLoop::Run(this=0x000000016d412dc8) at message_loop.cc:356:3
frame #14: 0x00000002bbf8469c XUL`nsThread::ThreadFunc(aArg=0x00000001138f5c40) at nsThread.cpp:383:10
frame #15: 0x00000001042c1e04 libnss3.dylib`_pt_root(arg=0x00000001032cd8b0) at ptthread.c:201:5
frame #16: 0x0000000196b4e06c libsystem_pthread.dylib`_pthread_start + 148
Assignee | ||
Comment 18•3 years ago
|
||
(In reply to Dan Baker[:dbaker] from comment #17)
I'm seeing a crash on my M1 MacOS 13 running a debug build with this change 2e39366a830c.
thread #10, name = 'Socket Thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0) * frame #0: 0x00000002bcb8b608 XUL`mozilla::net::Http2Session::Caps(this=0x00000001770f4e00) at Http2Session.cpp:4319:3 frame #1: 0x00000002bcda9adc XUL`mozilla::net::nsHttpConnection::StartSpdy(this=0x0000000173785000, sslControl=0x0000000173750400, spdyVersion=HTTP_2) at nsHttpConnection.cpp:299:38 frame #2: 0x00000002bcdb5914 XUL`mozilla::net::nsHttpConnection::HandshakeDoneInternal(this=0x0000000173785000) at nsHttpConnection.cpp:2370:7 frame #3: 0x00000002bcd57d78 XUL`mozilla::net::TlsHandshaker::HandshakeDone(this=0x0000000163957c98)::$_22::operator()() const at TlsHandshaker.cpp:45:27 frame #4: 0x00000002bcd57c7c XUL`mozilla::detail::RunnableFunction<mozilla::net::TlsHandshaker::HandshakeDone()::$_22>::Run(this=0x0000000163957c70) at nsThreadUtils.h:531:5 frame #5: 0x00000002bbf89e2c XUL`nsThread::ProcessNextEvent(this=0x0000000113c8b580, aMayWait=true, aResult=0x000000016d412527) at nsThread.cpp:1198:16 frame #6: 0x00000002bbf91c20 XUL`NS_ProcessNextEvent(aThread=0x0000000113c8b580, aMayWait=true) at nsThreadUtils.cpp:465:10 frame #7: 0x00000002bc498100 XUL`mozilla::net::nsSocketTransportService::Run(this=0x0000000103ba9900) at nsSocketTransportService2.cpp:1235:11 frame #8: 0x00000002bbf89e2c XUL`nsThread::ProcessNextEvent(this=0x0000000113c8b580, aMayWait=false, aResult=0x000000016d412ae7) at nsThread.cpp:1198:16 frame #9: 0x00000002bbf91c20 XUL`NS_ProcessNextEvent(aThread=0x0000000113c8b580, aMayWait=false) at nsThreadUtils.cpp:465:10 frame #10: 0x00000002bd423e6c XUL`mozilla::ipc::MessagePumpForNonMainThreads::Run(this=0x0000000113c91400, aDelegate=0x000000016d412dc8) at MessagePump.cpp:300:20 frame #11: 0x00000002bd28d434 XUL`MessageLoop::RunInternal(this=0x000000016d412dc8) at message_loop.cc:381:10 frame #12: 0x00000002bd28d378 XUL`MessageLoop::RunHandler(this=0x000000016d412dc8) at message_loop.cc:374:3 frame #13: 0x00000002bd28d2f4 XUL`MessageLoop::Run(this=0x000000016d412dc8) at message_loop.cc:356:3 frame #14: 0x00000002bbf8469c XUL`nsThread::ThreadFunc(aArg=0x00000001138f5c40) at nsThread.cpp:383:10 frame #15: 0x00000001042c1e04 libnss3.dylib`_pt_root(arg=0x00000001032cd8b0) at ptthread.c:201:5 frame #16: 0x0000000196b4e06c libsystem_pthread.dylib`_pthread_start + 148
Thanks for letting me know.
I'll submit a patch in bug 1800690 to fix this.
Comment 19•3 years ago
|
||
(In reply to Kershaw Chang [:kershaw] from comment #18)
(In reply to Dan Baker[:dbaker] from comment #17)
I'm seeing a crash on my M1 MacOS 13 running a debug build with this change 2e39366a830c.
thread #10, name = 'Socket Thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0) * frame #0: 0x00000002bcb8b608 XUL`mozilla::net::Http2Session::Caps(this=0x00000001770f4e00) at Http2Session.cpp:4319:3 frame #1: 0x00000002bcda9adc XUL`mozilla::net::nsHttpConnection::StartSpdy(this=0x0000000173785000, sslControl=0x0000000173750400, spdyVersion=HTTP_2) at nsHttpConnection.cpp:299:38 frame #2: 0x00000002bcdb5914 XUL`mozilla::net::nsHttpConnection::HandshakeDoneInternal(this=0x0000000173785000) at nsHttpConnection.cpp:2370:7 frame #3: 0x00000002bcd57d78 XUL`mozilla::net::TlsHandshaker::HandshakeDone(this=0x0000000163957c98)::$_22::operator()() const at TlsHandshaker.cpp:45:27 frame #4: 0x00000002bcd57c7c XUL`mozilla::detail::RunnableFunction<mozilla::net::TlsHandshaker::HandshakeDone()::$_22>::Run(this=0x0000000163957c70) at nsThreadUtils.h:531:5 frame #5: 0x00000002bbf89e2c XUL`nsThread::ProcessNextEvent(this=0x0000000113c8b580, aMayWait=true, aResult=0x000000016d412527) at nsThread.cpp:1198:16 frame #6: 0x00000002bbf91c20 XUL`NS_ProcessNextEvent(aThread=0x0000000113c8b580, aMayWait=true) at nsThreadUtils.cpp:465:10 frame #7: 0x00000002bc498100 XUL`mozilla::net::nsSocketTransportService::Run(this=0x0000000103ba9900) at nsSocketTransportService2.cpp:1235:11 frame #8: 0x00000002bbf89e2c XUL`nsThread::ProcessNextEvent(this=0x0000000113c8b580, aMayWait=false, aResult=0x000000016d412ae7) at nsThread.cpp:1198:16 frame #9: 0x00000002bbf91c20 XUL`NS_ProcessNextEvent(aThread=0x0000000113c8b580, aMayWait=false) at nsThreadUtils.cpp:465:10 frame #10: 0x00000002bd423e6c XUL`mozilla::ipc::MessagePumpForNonMainThreads::Run(this=0x0000000113c91400, aDelegate=0x000000016d412dc8) at MessagePump.cpp:300:20 frame #11: 0x00000002bd28d434 XUL`MessageLoop::RunInternal(this=0x000000016d412dc8) at message_loop.cc:381:10 frame #12: 0x00000002bd28d378 XUL`MessageLoop::RunHandler(this=0x000000016d412dc8) at message_loop.cc:374:3 frame #13: 0x00000002bd28d2f4 XUL`MessageLoop::Run(this=0x000000016d412dc8) at message_loop.cc:356:3 frame #14: 0x00000002bbf8469c XUL`nsThread::ThreadFunc(aArg=0x00000001138f5c40) at nsThread.cpp:383:10 frame #15: 0x00000001042c1e04 libnss3.dylib`_pt_root(arg=0x00000001032cd8b0) at ptthread.c:201:5 frame #16: 0x0000000196b4e06c libsystem_pthread.dylib`_pthread_start + 148
Thanks for letting me know.
I'll submit a patch in bug 1800690 to fix this.
Just to be clear this is happening on startup preventing me from launching the debug build.
Comment 20•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Comment 21•3 years ago
|
||
bugherder |
Comment 22•3 years ago
|
||
bugherder |
Description
•