Closed
Bug 715061
Opened 14 years ago
Closed 12 years ago
some Unicode characters can break a pad from loading
Categories
(Websites Graveyard :: etherpad.mozilla.org, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: carla, Unassigned)
Details
Attachments
(2 files)
My etherpad: https://etherpad.mozilla.org/obi-documentation hangs on loading. As a workaround, I copied its text out of the etherpad, brought it into Mac's TextEdit, made it into plain text, copied that plain text into another etherpad and it hangs the new pad, too. (This is repeatable, I have a slew of dead etherpads now.) Can this original pad be brought back to life?
Additionally, when you receive a "Disconnected" alert box on an etherpad, it offers a link to "let us know" of the problem, but that link throws an internal server error. (I have a screenshot of it, but this system only seems to allow one screenshot…)
A clarification: using the timeline slider, I copied the text out of the obi-documentation etherpad, brought it into TextEdit, made it into plain text, etc. (as above).
Updated•14 years ago
Assignee: server-ops → nmaul
OS: Mac OS X → All
Hardware: x86 → All
Comment 2•14 years ago
This is caused by an errant Unicode character in the content, which Etherpad can't render properly (I've seen this before). The character in question is U+2028.
I have a clean copy of the content with that character removed... it's just whitespace, so nothing is lost. I'm working on a solution to remove and recreate the existing pad.
In the meantime, attached is the complete data that you can copy/paste into a new etherpad, which will not break.
Comment 3•14 years ago
Moving to the new Websites::etherpad.mozilla.org component. The only solution I have to this is to delete the existing pad and copy the latest contents back into place.
We're working on a newer release of Etherpad which makes it easy for us to delete public pads. When that happens, I can finish this off entirely.... although I suppose by now it's likely not a big issue, since it's been almost a full month.
Assignee: nmaul → nobody
Component: Server Operations → etherpad.mozilla.org
Product: mozilla.org → Websites
QA Contact: cshields → etherpad-mozilla-org
So, there is no way for a user to see and remove the (hidden) errant Unicode character? I'm also not sure what is meant by "moving to the new Websites::etherpad.mozilla.org component." Any explanation you can provide will help. Thank you!
Comment 5•14 years ago
With the right tools, you can sanitize a copy of the data and make a new pad... it's not simple, though. If you're good with a Unix/Linux/OSX command line, you can use 'sed' to remove the character. In some cases you may even be able to see it with 'less' or 'vim', depending on your terminal settings. Apart from a pretty technical approach such as this, I don't have a good way to sanitize a file of this character.
There's no way for an end user to fix an existing pad that has the problem, either, unfortunately... in fact, at present even as the admin I haven't discovered a good solution. Etherpad was not designed to permit editing of pads outside of the normal interface, and the database schema is not such that I can easily edit the pad that way, either.
As far as the "moving to the new component", I'm simply referring to the Bugzilla product/component this bug is now in. Nothing to worry about. :)
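For readers who would rather see where the hidden character sits before removing it, here is a minimal Python sketch. It assumes the pad text has already been exported to a string; `find_separators` is a hypothetical helper name, and checking for U+2029 (PARAGRAPH SEPARATOR) alongside U+2028 is an assumption, since only U+2028 is named in this bug.

```python
# Hypothetical helper: report where U+2028 / U+2029 occur in a block of text.
# Only U+2028 is named in this bug; U+2029 is included as an assumption.
SEPARATORS = {
    "\u2028": "LINE SEPARATOR",
    "\u2029": "PARAGRAPH SEPARATOR",
}


def find_separators(text):
    """Return (index, code point, name) for each separator character found."""
    return [
        (index, f"U+{ord(ch):04X}", SEPARATORS[ch])
        for index, ch in enumerate(text)
        if ch in SEPARATORS
    ]


sample = "first line\u2028second line"
for index, codepoint, name in find_separators(sample):
    print(index, codepoint, name)
```

The index is a character offset into the string, not a byte offset in the exported file.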
Comment 6•13 years ago
The original affected pad has just been fixed (by deleting it and recreating it, with the attachment in comment 2). Please let me know if you still have others that are affected in the same way. Even if you don't need them anymore, they may be helpful from a troubleshooting perspective.
I'm also changing the description of this bug, and CC'ing someone who might be able to investigate the code more thoroughly and find out why this occurs.
Summary: one etherpad hangs on "Loading…" and never loads → some Unicode characters can break a pad from loading
Comment 7•13 years ago
I have to say this bug drives me crazy and has killed a lot of good pads for me :(
Just curious, is there any progress on fixing this?
It seems to be a big limitation to not be able to copy and paste text into a Mozilla etherpad; I'm surprised more people aren't rallying on this one.
From reading these comments, nobody knows a workaround to sanitize a block of text so it can be pasted into a Mozilla etherpad without the deadly U+2028 character in it?
Comment 8•13 years ago
If you go to 'Time Slider' and download as HTML, you can remove bad (non-ASCII, non-printable) characters with:
sed 's/[^[:print:]]//g'
We just had another reported case of this and I can't work out how to delete the bad pad once it's hung :/ I'm guessing it needs to be nuked from the db directly?
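What `[[:print:]]` matches in the sed command above depends on the locale, so it may also strip legitimate accented or non-Latin text. As an alternative, here is a minimal Python sketch that drops only separator and control characters and keeps everything else. `strip_unprintable` is a hypothetical name, and the choice of Unicode categories to remove is an assumption, not something Etherpad documents.

```python
import unicodedata

# Assumed set of categories to drop:
#   Zl = line separator (U+2028), Zp = paragraph separator (U+2029),
#   Cc = control characters (newline and tab are kept explicitly below).
DROP_CATEGORIES = {"Zl", "Zp", "Cc"}
KEEP_ALWAYS = {"\n", "\t"}


def strip_unprintable(text, replacement=""):
    """Remove separator/control characters while keeping other non-ASCII text."""
    cleaned = []
    for ch in text:
        if ch in KEEP_ALWAYS or unicodedata.category(ch) not in DROP_CATEGORIES:
            cleaned.append(ch)
        else:
            cleaned.append(replacement)
    return "".join(cleaned)


print(strip_unprintable("caf\u00e9\u2028ok\n"))
```

Passing `replacement=" "` swaps each removed character for a space instead of deleting it, which may read better when U+2028 was standing in for a line break. Note that carriage returns are in category Cc and are removed as well.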
Comment 9•13 years ago
The problem char(s) were one or both of:
> grep '[^[:print:]]' opennews-fellow-promoz-latest.html | sed 's/[[:print:]]//g' | hexdump
0000000 80e2 0aa8
0000004
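The hexdump output above can be decoded to confirm which characters it contains. By default `hexdump` prints 16-bit words in host byte order, so this sketch assumes the command ran on a little-endian machine; `hexdump_words_to_bytes` is a hypothetical helper name.

```python
def hexdump_words_to_bytes(words):
    """Convert 16-bit little-endian words, as printed by default hexdump, to raw bytes."""
    return b"".join(int(word, 16).to_bytes(2, "little") for word in words.split())


raw = hexdump_words_to_bytes("80e2 0aa8")
text = raw.decode("utf-8")

print(" ".join(f"{byte:02x}" for byte in raw))   # e2 80 a8 0a
print([f"U+{ord(ch):04X}" for ch in text])        # ['U+2028', 'U+000A']
```

Under that assumption, the four bytes are the UTF-8 encoding of U+2028 (`e2 80 a8`) followed by an ordinary newline (`0a`), which matches the character identified in comment 2.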
Comment 10•12 years ago
This should be fixed now!
We modified the database to accept 4-byte UTF-8 characters. The UI doesn't know how to display them (they just show up as "??"), but that's obviously much better than ruining the pad altogether.
I honestly don't know if any already-broken pads will start working again (I suspect they will remain broken), but it should not happen anymore in the future.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 11•12 years ago (Reporter)
This is great news, thank you!!!
Updated•9 years ago
Product: Websites → Websites Graveyard