poisoning sessionstore + document.title heart attack (emoji problem) = persistant dos scenario
Categories
(Firefox :: Session Restore, defect)
Tracking
()
People
(Reporter: p.wojciechowski, Unassigned, NeedInfo)
References
(Depends on 1 open bug)
Details
(Keywords: csectype-dos, reporter-external, Whiteboard: [client-bounty-form])
Attachments
(1 file)
|
1.86 MB,
application/pdf
|
Details |
Hello guys.
I found 3 bugs and described them in detail in attached pdf
Best regards,
Piotr Wojciechowski
p.wojciechowski@mixedpixel.pl
| Reporter | ||
Comment 1•1 year ago
|
||
Comment 2•1 year ago
|
||
You should file 3 separate bugs in Bugzilla for these 3 separate issues. You can repurpose this for one of the three issues or file new issues entirely.
Please attach test cases to the bug instead of describing the test case in a PDF. A PDF is not really necessary for a straightforward and short report like this. Attaching test cases will make it easier to investigate the issues, which will get it triaged and fixed more quickly.
I don't think these really qualify as a persistent DOS. The browser does have some built in detection for when it fails immediately on startup, so it should stop reloading the previous session after a few tries.
Updated•1 year ago
|
| Reporter | ||
Comment 3•1 year ago
|
||
I will do that, but please take a look at my screen record. Imho its persistant,
because default action after few fails didnt solve issue.
Its about third bug (emoji problem in document.title) described in my pdf:
https://mixedpixel.pl/bb/152ecfb3-4f7a-4a38-9256-a3159642d2e2/heart.mov
Piotr Wojciechowski
Comment 4•1 year ago
|
||
Your test case waits 5 seconds before triggering the hang, so the user will have 5 seconds to close the tab. If the hang happens quickly before the user can do anything, then I think it should trigger the behavior where it stops loading the previous session, but I have not verified this.
On my own MacOS machine, the large emoji title made the browser UI difficult to interact with, but I was able to open a new tab and then close the original tab. I don't know if that's because my machine is very powerful or maybe there's some specific different in the versions of MacOS we're using. I ran the profiler, and it looked like the CPU time was being spent in some kind of system font rendering code. While I could close the tab, I did get a shutdown hang, so maybe the browser wasn't entirely back to normal.
Comment 5•1 year ago
|
||
It looks like we have bug 1868925 already on file for the "large emoji title" issue, so there's no need to file something for that.
| Reporter | ||
Updated•1 year ago
|
Comment 6•1 year ago
|
||
Irrespective of the specific cause of the perf issue/hang (bug 1868925 in this case), is there more we could do at start-up to detect repeated problems like this and help the user get out of it? If the misbehaving tab causes the problem "during startup" we take some actions, but in this case the attack is delayed to make sure the bad tab is actually saved by session restore.
Maybe this should go in the session restore component? Or is it a general browser start-up issue.
It doesn't need to be hidden... this is not the first time we've seen this kind of thing but maybe it's a convenient hook to think about solutions.
Comment 7•1 year ago
|
||
In terms of restoring a session, there is a crash screen that is shown if a tab has caused an issue with saving the session, and there is work to improve it but that is mostly about the UX side of it see bug 1625831.
Its not clear to me what the expectation is with "helping the user get out of it"... I've cc'd Gijs and Dao as they might be able to weigh in on this.
Comment 8•1 year ago
|
||
Interestingly, I just resolved bug 1896088 which describes a similar issue. When a single tab is crashing or causing problems, ideally we would have a way to not restore that tab. Knowing it is "causing problems" is a bit ambiguous though. If it crashed that's a signal we can feed into session restore. If its using too much memory that's harder to know - how much is too much? That is very much the case here - at the time we capture and then restore the tab on startup, there is nothing wrong with this tab. Its only later is causes problems.
In general, about:sessionrestore has the information the user might need to selectively restore some tabs and not others. We could show this in more circumstances than we do currently. But that assumes the user knows which tab was the culprit and can pick it out of the list - which isn't always going to be true.
So I'm not sure this is really a session restore problem. If we do have a signal that a tab is running too slow, gobbling up memory or whatever then yes lets get a bug on file to use that information in session restore. But for the specific case, I don't think that would help.
Comment 9•1 year ago
|
||
I think a number of things are true here:
- the behaviour of the tab is bad
- the fact that it is bad is not in any way exposed / obvious to session restore code if seen separate from the rest of Firefox, so it's not like we can just add an extra
ifcondition somewhere in session restore and then not restore the tab. I don't see a good proxy for this type of thing currently existing. - the fact that session restore inadvertently exacerbates the badness (by putting the user in a position where they experience the badness repeatedly if automatic session restore is enabled) is still bad.
I don't know what a solution here looks like but it's going to take some folks working together and thinking about the problem and then adapting session restore to store more information about tabs, and then use that information to make different decisions on startup.
From the bug Sam referenced:
I don't see how the browser is to know whether a tab is a non-response tab
We don't store this right now but I think we could do something. Solving the general halting problem is obviously a non-starter, but we can actually detect whether we're hanging or using lots of CPU. Florian may be able to help expand on this - we have something called BHR, background hang reporter, on Nightly, and we have the slow script warning, and the profiler knows about CPU use.
For correlating with parent process badness (which is sort of what this bug is about), I expect we're a bit more limited, for 2 reasons:
- parent process activity is often not directly correlatable with a tab (unlike child-process-to-tab linkage which is fairly direct)
- if the parent process is busy hanging due to a "bad" tab, we're gonna have a bad time writing session restore information (like "hey maybe don't restore this tab") from said parent process.
Either way though: we don't write any information about hanging / CPU use to session restore - only whether we shut down cleanly. Perhaps we want to do something different in the "spontaneous crash despite being responsive previously" and "the user killed us after we were hung for 20s" cases. Perhaps instead of increasing max_resumed_crashes, we should decrease it so that we always show the page after a crash (but also make the session restore page more user-friendly, and perhaps exclude OS restarts/sleeps which I believe currently mean we show the page way too often) ?
Hopefully Florian has ideas for the detection side of things.
| Reporter | ||
Comment 10•1 year ago
|
||
Hi guys. Let's talk about sessionstore and recovery.jsonlz4 file.
I am sure you guys know more about this whole mechanism than I do, but I just wanted to share a few observations:
- looks like file "sessionstore-backups/recovery.jsonlz4" represents session store and consitency of this data file is important in tabs recovery operation. According to comments that i found in resource:///modules/sessionstore/SessionFile.sys.mjs
// This file [recovery.jsonlz4] is designed to protect against crashes / sudden shutdown"
- recovery.jsonlz4 is compressed json file which can be decompressed like this:
file = \`${PathUtils.profileDir}/sessionstore-backups/recovery.jsonlz4\`;
data = await IOUtils.readJSON(file, {decompress:true});
IOUtils.writeJSON(file+".uncompressed.json", data);
- according to my observations both active and inactive tab can trigger sessionstore update operation (writing to recovery.jsonlz4)
- constantly changing value of document.title can trigger sessionstore update not more often that ~16sec [see my screenrecord below]
- even empty new tab is able to trigger sessionstore update operation (strange)
Don't miss my screenrecord where I watched syscalls:
https://mixedpixel.pl/bb/152ecfb3-4f7a-4a38-9256-a3159642d2e2/session_store_dtrace.mov
- on my machine this uncompressed recovery.jsonlz4 file is over 12MB heavy because this long title heart payload in the past. In my opinion, the title length should be fixed to something like the first 1kB. Retrieving such long titles from the session store is a problem in itself and can cause on my Mac the Spinning Beach Ball of Death (confirmed info) [check my screen: https://mixedpixel.pl/bb/152ecfb3-4f7a-4a38-9256-a3159642d2e2/sessionstore_uncompressed_heart_problem.png]
- btw. not sure why _closedWindows array is still there in sessionstore after so many FF restarts? Is this a leak? [same screen: https://mixedpixel.pl/bb/152ecfb3-4f7a-4a38-9256-a3159642d2e2/sessionstore_uncompressed_heart_problem.png]
It is my hope that I have provided a new perspective on this issue.
| Reporter | ||
Comment 11•1 year ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #4)
Your test case waits 5 seconds before triggering the hang, so the user will have 5 seconds to close the tab. If the hang happens quickly before the user can do anything, then I think it should trigger the behavior where it stops loading the previous session, but I have not verified this.
On my own MacOS machine, the large emoji title made the browser UI difficult to interact with, but I was able to open a new tab and then close the original tab. I don't know if that's because my machine is very powerful or maybe there's some specific different in the versions of MacOS we're using. I ran the profiler, and it looked like the CPU time was being spent in some kind of system font rendering code. While I could close the tab, I did get a shutdown hang, so maybe the browser wasn't entirely back to normal.
Here is my improved DoS exploit. Let me know how it works for u and please make sure you have enabled "open previous windows and tabs"
https://mixedpixel.pl/bb/152ecfb3-4f7a-4a38-9256-a3159642d2e2/sessionstore_poisoning.html
One more thing. In my opinion "Open" button in Troubleshoot Mode should wipe sessionstore. Maybe this is the beginning of a good change? 🖐
| Reporter | ||
Updated•1 year ago
|
Comment 12•1 year ago
|
||
The Bugbug bot thinks this bug should belong to the 'Firefox::PDF Viewer' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Updated•1 year ago
|
Comment 13•1 year ago
|
||
Either way though: we don't write any information about hanging / CPU use to session restore - only whether we shut down cleanly. Perhaps we want to do something different in the "spontaneous crash despite being responsive previously" and "the user killed us after we were hung for 20s" cases. Perhaps instead of increasing
max_resumed_crashes, we should decrease it so that we always show the page after a crash (but also make the session restore page more user-friendly, and perhaps exclude OS restarts/sleeps which I believe currently mean we show the page way too often) ?
We do have a few bugs on file to improve about:sessionrestore and make that something we can show more often and provide more choices and information to the user. UX has even done some work on this problem - as referenced in bug 1625831. You could even imagine about:sessionrestore being optionally set as a home page so the first thing you are presented with on startup is some UI to pick which windows and tabs you want to resume.
I'm very interested in getting more context into the session backup so we can make smarter decisions on the next startup. If we did skip restoring a possibly-problematic tab, I think we would need to replace it with something (maybe a view into about:sessionrestore that filters for just that tab/URL) that informs the user what happened and give them an explicit choice to restore it or not. Rather than just mysteriously losing a tab.
Updated•1 year ago
|
Comment 14•1 year ago
|
||
This looks to be part of Session Restore; moving it out of the general triage pile.
Description
•