Compare Revisions feature does not work on at least one page

NEW
Unassigned

Status

--
major
4 years ago
4 years ago

People

(Reporter: dave, Unassigned)

Tracking

Details

(Whiteboard: [specification][type:bug], URL)

(Reporter)

Description

4 years ago
What did you do?
================
1. Visit https://developer.mozilla.org/en-US/docs/Profile_Manager
2. Click gear icon, History.
3. Select latest two revisions, click Compare Selected Revisions.

Shortcut:
https://developer.mozilla.org/en-US/docs/Profile_Manager$compare?to=513053&from=512703

What happened?
==============
Red error message: "There was an error generating the content."

What should have happened?
==========================
Display diff between revisions.

Is there anything else we should know?
======================================
(Reporter)

Updated

4 years ago
Component: General → Wiki pages
OS: Other → All

Updated

4 years ago
Severity: normal → major
Status: UNCONFIRMED → NEW
Ever confirmed: true
I can find other pages that do this, but some work fine. I can't tell rhyme or reason. And right this moment, the "Profile Manager" page is the only one I'm finding. But I know I found several on Friday.
Code-wise, this message is the result of the diff library crashing with a recursion-depth error. In this specific case, the error is coming from difflib trying to wrap long lines at a set point (to avoid long lines which scroll off the page). The problem is that some of the lines in the revision being compared to are data: URLs containing raw base64-encoded images, which produce huge lines of text difflib tries, and fails, to recursively split and wrap, until it hits Python's recursion limit and crashes.

I can't really think of anything we can do upstream in the codebase to work around this, other than perhaps disallow base64-encoded data: URLs being used as images.
:ubernostrum and I are wondering - how much, and why, do we use base64-encoded data: urls for images in content?
Flags: needinfo?(eshepherd)
(In reply to Luke Crouch [:groovecoder] from comment #3)
> :ubernostrum and I are wondering - how much, and why, do we use
> base64-encoded data: urls for images in content?

I would assume that if they're used at all, it's just for the purpose of demonstrating how they work. I had no idea we did it anywhere at all though. Certainly I don't like the idea, personally.
Flags: needinfo?(eshepherd)
(Reporter)

Comment 5

4 years ago
James,(In reply to James Bennett [:ubernostrum] from comment #2)
> [...] huge lines of text difflib tries, and fails, to recursively split
> and wrap, until it hits Python's recursion limit and crashes.

Let's fix that.  By updating python, probably.  Is your current version less than 2.4.3?

Would you post the stack trace?  There's more than one old bug in difflib, apparently, and I'm not sure that I've been googlging the right one.
(In reply to dave from comment #5)
> Let's fix that.  By updating python, probably.  Is your current version less
> than 2.4.3?

MDN production is Python 2.6.6.
 
> Would you post the stack trace?  There's more than one old bug in difflib,
> apparently, and I'm not sure that I've been googlging the right one.

Here (taken from walking through the code with the content of the revisions reported in this bug):

/usr/lib64/python2.6/difflib.pyc in _split_line(self, data_list, line_num, text)
   1773
   1774         # use this routine again to wrap the remaining text

-> 1775         self._split_line(data_list,'>',line2)
   1776
   1777     def _line_wrapper(self,diffs):

/usr/lib64/python2.6/difflib.pyc in _split_line(self, data_list, line_num, text)
   1773
   1774         # use this routine again to wrap the remaining text

-> 1775         self._split_line(data_list,'>',line2)
   1776
   1777     def _line_wrapper(self,diffs):

/usr/lib64/python2.6/difflib.pyc in _split_line(self, data_list, line_num, text)
   1747         mark = ''
   1748         while n < max and i < size:
-> 1749             if text[i] == '\0':
   1750                 i += 1
   1751                 mark = text[i]

I find it unlikely that a Python 2.4 bug would be affecting us, btw :)
I have thought today about using data urls instead of an image in the content. 

If I want to demonstrate image manipulation with canvas for example, I will need an image in my live sample and as there are no images on mdn.mozillademos.org and same origin policy prevents me from using e.g. https://developer.cdn.mozilla.net/media/redesign/img/MDNLogo.png, I have to think of another solution. So, data urls came to mind.

We shouldn't use them at all at the moment? Will this bug fix that?
Flags: needinfo?(jbennett)
Not sure what to do with samples, but if we want compare revisions to work we probably need to disallow data URLs. Also is probably a better security practice to disallow them.
Flags: needinfo?(jbennett)
Did we make a decision here.
:ubernostrum - so it sounds like if we (continue to) allow data urls, compare revisions will always break when they are used, and we have increased security risk? Or how much work do you estimate it will take to allow data urls AND mitigate security risks AND fix compare view?
Flags: needinfo?(jbennett)
Let's go ahead and just disallow data URLs then.

This won't apply to live samples, right? We would want to be able to use them there, for the purpose of demonstrating them, for instance.
(In reply to Luke Crouch [:groovecoder] from comment #10)
> :ubernostrum - so it sounds like if we (continue to) allow data urls,
> compare revisions will always break when they are used, and we have
> increased security risk? Or how much work do you estimate it will take to
> allow data urls AND mitigate security risks AND fix compare view?

Well, any time the URL is sufficiently long to make difflib bomb out trying to wrap lines, the compare view will break.

The security risks are:

1. The obvious problem that it's more difficult to analyze the URL to figure out what it is/what it's doing.

2. The more insidious problem is that data URLs enable things like sneaky phishing attacks where a data URL is used to encode the desired phishing content.

Need to think a bit to estimate how much work involved in killing them off.
Flags: needinfo?(jbennett)
(In reply to James Bennett [:ubernostrum] from comment #12)
> (In reply to Luke Crouch [:groovecoder] from comment #10)
> > :ubernostrum - so it sounds like if we (continue to) allow data urls,
> > compare revisions will always break when they are used, and we have
> > increased security risk? Or how much work do you estimate it will take to
> > allow data urls AND mitigate security risks AND fix compare view?
> 
> Well, any time the URL is sufficiently long to make difflib bomb out trying
> to wrap lines, the compare view will break.

Sounds like this could happen for any long line. And disallowing data URLs now doesn't solve the problem reported here. So IMO turning off data URL support should be part of a separate bug.

To avoid the bug here wouldn't it be possible to simply split the string into smaller chunks first, i.e. after a specific number of characters, before running it through difflib, and afterwards join them again into one string?

Sebastian
(In reply to Sebastian Zartner [:sebo] from comment #13)

> Sounds like this could happen for any long line. And disallowing data URLs
> now doesn't solve the problem reported here. So IMO turning off data URL
> support should be part of a separate bug.

Yeah, that's a good point. We do need to actually fix this underlying problem, regardless of the state of data URL support.
You need to log in before you can comment on or make changes to this bug.