Can no longer select individual requests in Developer tools / network tab
Categories
(DevTools :: Netmonitor, defect, P3)
Tracking
(firefox81 fixed)
Tracking | Status
---|---
firefox81 | fixed
People
(Reporter: kremit, Assigned: user, NeedInfo)
References
Details
(Keywords: regressionwindow-wanted)
Attachments
(2 files)
User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0
Steps to reproduce:
Open the Web Developer tools, then Network tab. Attempt to select an individual HTTP request.
Actual results:
In some cases (see screenshot), clicking an entry selects multiple entries at once. You can still double-click an entry to resend the request in a new tab, but you cannot select a single one to view its details, because other entries get selected along with it. This appears to be a regression around Firefox 72.
Expected results:
Previous versions, and other dev tools like Chrome, allow you to choose individual requests and view their details in the sidebar as usual.
Comment 1•5 years ago
Bugbug thinks this bug should belong to this component, but please revert this change in case of error.
Updated•5 years ago
Comment 2•5 years ago
I was testing this using VirtualBox and the Fedora ISO downloaded from here:
https://download.fedoraproject.org/pub/fedora/linux/releases/32/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-32-1.6.iso
And I still can't reproduce the multi-selection problem.
Wesley, can you please try to see the problem in Firefox Nightly?
https://www.mozilla.org/en-US/firefox/channel/desktop/
Honza
Comment 3•4 years ago
I can't reproduce this issue either; I tested on Ubuntu 18.04 using the latest Nightly 79.0a1 and Firefox 77.
Wesley, in your previous attempts to reproduce this issue, did you ever use a clean profile? If not, could you please give it a try?
https://support.mozilla.org/en-US/kb/profile-manager-create-remove-switch-firefox-profiles
Assignee
Comment 4•4 years ago
I seem to suffer from the same issue. I tried the Fedora Live ISO (version 32-1.6) inside VirtualBox and wasn't able to reproduce the error there. However, I can still readily reproduce the issue on my machine, using the very same Firefox version (78.0b4). Can I provide any more information to aid the debugging?
Assignee
Comment 5•4 years ago
I'll try building Firefox from source on my machine and poking around the code. Any advice on where I should look to provide more information for the developers?
Comment 6•4 years ago
We seem to have found STR that relate to this.
Could you try https://bugzilla.mozilla.org/show_bug.cgi?id=1628162#c13
Assignee
Comment 7•4 years ago
Yes, I can see the issue using this STR, both on my machine (Arch Linux) and Fedora in VirtualBox.
Comment 8•4 years ago
Tried to reproduce this issue again in order to provide a regression window - I used the steps indicated in Comment 6 and also the steps from the duplicate reports of Bug 1628162. I still cannot reproduce it on my machine. Considering this, I'm going to add the whiteboard [qa-regression-triage]; please feel free to remove it if additional information that could lead to finding the regression window is added.
Assignee
Comment 9•4 years ago
I suspect that this isn't a regression inside Firefox, but is instead somehow related to changes in lower-level components (for example the kernel). I think this because testing with "stable" distributions (Ubuntu, Fedora) in VirtualBox doesn't reproduce the issue, even though exactly the same Firefox build is used.
I'm still trying to build Firefox on my machine to drill deeper into the failure, but unfortunately I'm pressed for time and compiling Firefox turned out to be harder than I thought (currently I'm hitting some codegen bugs inside one of the Rust components).
Assignee
Comment 10•4 years ago
I finished building Firefox on my machine (the "tip" branch). The bug is still present in the freshly built instance.
I'll try searching for the bug myself, but given that I'm unfamiliar with the codebase and Firefox is a huge project, that will probably take a lot of time. Meanwhile, is there anything I can do on my machine (tweak some code, provide debug output, etc.) that would help you diagnose the cause of the error?
Assignee
Comment 11•4 years ago
This is bizarre, but just as I sat down and started drilling into the netmonitor code, the bug stopped reproducing - both on the freshly built version and on the stock one on my machine. I'm quite sure that I didn't make any drastic changes to my environment during that time - no reboots, no package updates (I verified this using system logs and bash history).
I'll keep an eye out in case the problem reoccurs.
Assignee
Comment 12•4 years ago
Hooray, the bug came back!
Comment 13•4 years ago
I still can't repro this issue (running Windows + Firefox Nightly/Beta/Release).
Is anyone else able to see this and provide baby-steps to reproduce it?
Honza
Assignee
Comment 14•4 years ago
I can reproduce the bug reliably at the moment, but suspect that this is somehow related to network activity outside Firefox (it stopped reproducing yesterday, came back today, all without any relevant changes on the machine).
Your analysis in bug 1628162 seems to be on point. I'm currently working through the network monitor code to verify your analysis in the second comment, and will get back with additional details.
At the moment I can definitely say that several (different) requests in a row get exactly the same actor ids:
// console.log("server/network-monitor sends networkEvent", this.parentID, actor.form().actor, actor.form().url);
server/network-monitor sends networkEvent server1.conn0.consoleActor3 server1.conn0.netEvent239 http://localhost:5501/test.png/000
server/network-monitor sends networkEvent server0.conn0.child2/consoleActor2 server0.conn0.netEvent24 http://localhost:5501/test.png/000
server/network-monitor sends networkEvent server1.conn0.consoleActor3 server1.conn0.netEvent239 http://localhost:5501/test.png/001
server/network-monitor sends networkEvent server0.conn0.child2/consoleActor2 server0.conn0.netEvent24 http://localhost:5501/test.png/001
server/network-monitor sends networkEvent server1.conn0.consoleActor3 server1.conn0.netEvent239 http://localhost:5501/test.png/002
server/network-monitor sends networkEvent server0.conn0.child2/consoleActor2 server0.conn0.netEvent24 http://localhost:5501/test.png/002
server/network-monitor sends networkEvent server1.conn0.consoleActor3 server1.conn0.netEvent280 http://localhost:5501/test.png/003
server/network-monitor sends networkEvent server0.conn0.child2/consoleActor2 server0.conn0.netEvent65 http://localhost:5501/test.png/003
server/network-monitor sends networkEvent server1.conn0.consoleActor3 server1.conn0.netEvent294 http://localhost:5501/test.png/004
server/network-monitor sends networkEvent server0.conn0.child2/consoleActor2 server0.conn0.netEvent79 http://localhost:5501/test.png/004
Assignee
Comment 15•4 years ago
I'm not familiar with the codebase - is "channel" in network-observer supposed to be shared between several different requests? Currently the bad events originate from the same channelId, and that channelId is used to look up the actor, and that in turn leads to request id collisions.
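To make this concrete, here's a minimal sketch (hypothetical names and data, not the actual network-observer code) of how a shared channelId would collapse two requests into one actor:

// Sketch of the suspected failure mode: two different requests arriving
// with the same channelId get the same actor back from the lookup map.
const openRequests = new Map();

function getOrCreateActor(channelId, url) {
  let actor = openRequests.get(channelId);
  if (!actor) {
    actor = { id: `netEvent${openRequests.size}`, url };
    openRequests.set(channelId, actor);
  }
  return actor;
}

// illustrative id in the colliding range
const a = getOrCreateActor(16502183474233404, "http://localhost:5501/test.png/000");
const b = getOrCreateActor(16502183474233404, "http://localhost:5501/test.png/001");
console.log(a.id === b.id); // true - two distinct requests share one actor id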
P.S. Maybe it will be more appropriate to ask newbie questions like above in some Matrix room - which one?
Comment 16•4 years ago
(In reply to Platon Pronko from comment #14)
> Your analysis in bug 1628162 seems to be on point. I'm currently working through the network monitor code to verify your analysis in the second comment, and will get back with additional details.
Excellent, thanks for doing that!
(In reply to Platon Pronko from comment #15)
> I'm not familiar with the codebase - is "channel" in network-observer supposed to be shared between several different requests? Currently the bad events originate from the same channelId, and that channelId is used to look up the actor, and that in turn leads to request id collisions.
channel is supposed to be used for one request only. The problem might be in the way we track these channels.
- We use the openRequests map to track all open channels:
https://searchfox.org/mozilla-central/rev/72a1334982cadde0ca8b3d3583877afea80f5639/devtools/server/actors/network-monitor/network-observer.js#157
- We are adding new channels into the map here:
https://searchfox.org/mozilla-central/rev/72a1334982cadde0ca8b3d3583877afea80f5639/devtools/server/actors/network-monitor/network-observer.js#923
- We are sending an event to the client side (to the Network panel) here:
https://searchfox.org/mozilla-central/rev/72a1334982cadde0ca8b3d3583877afea80f5639/devtools/server/actors/network-monitor/network-observer.js#839
It might be that we are sending this event multiple times for the same channel. Perhaps we remove the channel from this.openRequests at some point, then re-add it and send it to the client again - just as if it were a new request...
- Also note that we are using this ChannelMap for openRequests:
https://searchfox.org/mozilla-central/rev/72a1334982cadde0ca8b3d3583877afea80f5639/devtools/server/actors/network-monitor/utils/channel-map.js#26
> P.S. Maybe it will be more appropriate to ask newbie questions like above in some Matrix room - which one?
DevTools or DevTools Network Monitor
Honza
Assignee
Comment 17•4 years ago
Unfortunately the bug stopped reproducing again. Could it possibly be somehow time-related? It stopped reproducing yesterday at about the same time. Will keep trying.
Possibly interesting: I had left one tab open from two hours ago, and the bug still reproduces inside it. But if I open a new tab or start a new browser instance, the bug doesn't reproduce.
It doesn't look like the activity objects or the channel map are causing the issue. The channels already arrive with the same channelId but different name. For example, here's a log from NetworkObserver._onRequestHeader (the log from NetworkObserver._createNetworkEvent is similar):
// console.log("NetworkObserver._onRequestHeader", channel.channelId, channel.name);
NetworkObserver._onRequestHeader 16502183474233400 http://localhost:5501/
NetworkObserver._onRequestHeader 16502183474233400 http://localhost:5501/
NetworkObserver._onRequestHeader 16502183474233404 http://localhost:5501/test.png/000
NetworkObserver._onRequestHeader 16502183474233404 http://localhost:5501/test.png/000
NetworkObserver._onRequestHeader 16502183474233404 http://localhost:5501/test.png/001
NetworkObserver._onRequestHeader 16502183474233404 http://localhost:5501/test.png/001
NetworkObserver._onRequestHeader 16502183474233404 http://localhost:5501/test.png/002
NetworkObserver._onRequestHeader 16502183474233404 http://localhost:5501/test.png/002
NetworkObserver._onRequestHeader 16502183474233406 http://localhost:5501/test.png/003
NetworkObserver._onRequestHeader 16502183474233406 http://localhost:5501/test.png/003
NetworkObserver._onRequestHeader 16502183474233408 http://localhost:5501/test.png/004
NetworkObserver._onRequestHeader 16502183474233408 http://localhost:5501/test.png/004
Note how channel *404 shows up several times with different URLs.
Assignee
Comment 18•4 years ago
So, the bug is back and I managed to make significant progress.
The problem is that nsHttpChannel stores channelId as uint64_t, but JavaScript uses double-precision floats for numbers, so if channelId is bigger than 2^53 the information in its lower bits is lost.
I haven't looked into the channelId generation code yet, but it seems that ids are generated sequentially, at least if the requests are all made at the same time. So while the ids are correct in the C++ code, by the time they get to NetworkObserver several neighbouring ids collapse into one. Here's an example (the ids are taken from my actual request logs):
$ node
Welcome to Node.js v12.18.1.
Type ".help" for more information.
> 16125102525513871
16125102525513872
> 16125102525513872
16125102525513872
> 16125102525513873
16125102525513872
I'm off to look at the channelId generation code. Just a shot in the dark - could it be that the channelId generator uses the tab process's PID?
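For reference, the same collapse can be checked with Number.isSafeInteger (same ids as in the node session above):

// Everything above Number.MAX_SAFE_INTEGER (2^53 - 1) is inexact in JS,
// so neighbouring channelIds round to the same double:
console.log(Number.MAX_SAFE_INTEGER);                 // 9007199254740991
console.log(Number.isSafeInteger(16125102525513871)); // false
console.log(16125102525513871 === 16125102525513873); // true - both round to ...872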
Assignee
Comment 19•4 years ago
Indeed, my guess was correct - ids are generated sequentially and use the PID. Here's the actual code:
nsresult nsHttpHandler::NewChannelId(uint64_t& channelId) {
channelId =
((static_cast<uint64_t>(mProcessId) << 32) & 0xFFFFFFFF00000000LL) |
mNextChannelId++;
return NS_OK;
}
Thus if we get a PID greater than ~2000000 (2^21), the generated channel ids overflow the "safe" integer range of floats and the errors surface.
Usually PIDs are less than 32768, but this is configurable by writing to /proc/sys/kernel/pid_max. The actual maximum allowed PID value is PID_MAX_LIMIT, 2^22.
And systemd since version 243 sets pid_max to 2^22 by default on 64-bit systems (https://github.com/systemd/systemd/blob/master/NEWS#L1182).
So the fix is simple - we're actually only one bit short, so we just need to shift the PID by one bit less. This works for the whole range of possible PIDs:
@@ -2743,7 +2743,7 @@ void nsHttpHandler::ShutdownConnectionManager() {
nsresult nsHttpHandler::NewChannelId(uint64_t& channelId) {
channelId =
- ((static_cast<uint64_t>(mProcessId) << 32) & 0xFFFFFFFF00000000LL) |
+ ((static_cast<uint64_t>(mProcessId) << 31) & 0xFFFFFFFF80000000LL) |
mNextChannelId++;
return NS_OK;
}
(I tested it by forking lots of processes to cycle through the PID range, and the bug goes away for all PIDs from 0 to PID_MAX_LIMIT.)
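For anyone who wants to double-check the arithmetic without a full build, here's a sketch (my own code, using BigInt to emulate the exact uint64_t math; makeId and the test values are illustrative, not Firefox code):

// Emulate nsHttpHandler::NewChannelId with exact integer math and compare
// the old shift-by-32 layout against the patched shift-by-31 layout.
const makeId = (pid, counter, shift, mask) =>
  ((BigInt(pid) << BigInt(shift)) & mask) | BigInt(counter);

const PID_MAX_LIMIT = 2 ** 22;      // Linux kernel maximum
const worstPid = PID_MAX_LIMIT - 1; // highest PID systemd now allows

const oldId = makeId(worstPid, 0xffff, 32, 0xffffffff00000000n);
const newId = makeId(worstPid, 0xffff, 31, 0xffffffff80000000n);

console.log(Number.isSafeInteger(Number(oldId))); // false - JS mangles the id
console.log(Number.isSafeInteger(Number(newId))); // true  - always below 2^53

With a 22-bit PID shifted left by 31, the highest bit in use is bit 52, so every generated id stays below 2^53 and survives the float conversion intact.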
Assignee
Comment 20•4 years ago
What is the purpose of & 0xFFFFFFFF00000000LL in nsHttpHandler::NewChannelId? We already have uint64_t on the lhs, so we are not zeroing the upper bits - and since we just did the left shift, zeroing out the lower bits doesn't make sense either.
Comment 21•4 years ago
Thanks Platon for investigating this.
This clearly looks like a platform issue?
Would you like to open a patch with the fix?
Also adding someone from the networking platform team.
Honza,
Not sure who is best to help with Platon's observations and question in comment 20.
Thanks
Assignee
Comment 22•4 years ago
Ok, I'll start preparing the patch. The "How to submit a patch" tutorial states that the patch submitter should be the assignee of the bug - can somebody assign me to this bug?
Updated•4 years ago
Updated•4 years ago
Assignee
Comment 23•4 years ago
On Linux PID_MAX_LIMIT is 2^22. By default pid_max was set to 2^15, but
systemd since version 243 sets it to 2^22.
nsHttpHandler::NewChannelId uses PID shifted by 32 bits to generate
channelId, and that results in values greater than 2^53.
Such values are fine for uint64_t, but when we pass channelId to
devtools it gets converted to JavaScript floats, and values greater
than 2^53 get errors in lower two bits. This in turn leads to much
sorrow, like request id collisions or "no headers for this request".
This patch shifts the PID by one bit less, resulting in channelIds
that always fit into safe JavaScript integer range for all PIDs.
Comment 24•4 years ago
Platon, you've done great work, thanks for the patch!
Let's see what the Necko team says.
Comment 25•4 years ago
Comment 26•4 years ago
Platon, Valentin, thanks for working on this. This is a great improvement!
Honza
Comment 27•4 years ago
bugherder