Closed Bug 431822 Opened 16 years ago Closed 12 years ago

RSS titles with spaces corrupt folder names

Categories

(MailNews Core :: Feed Reader, defect)

1.8 Branch
x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 547543

People

(Reporter: nick.battle, Unassigned)

References

(Blocks 1 open bug)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9b5) Gecko/2008032620 Firefox/3.0b5
Build Identifier: 2.0.0.14 (20080421)

RSS feeds with whitespace round the text in the title tag cause folders to be created with the whitespace and corrupted characters on the end. For example, all CNET blog feeds have spaces and newlines in their titles. The example URL above creates a folder called "             Geek Gestalt                        4b313965".

You can rename such a folder, but then bug 291293 kicks in and you get a duplicate (empty) folder with the corrupted name.

This may relate to bug 316051, though that mentions that the articles cannot be retrieved, whereas the articles are retrieved correctly in my case.


Reproducible: Always

Steps to Reproduce:
1. Create a new feed for http://www.news.com/8300-13772_3-52.xml
2. A folder will be created with spaces at the start and garbage at the end.
3. Rename the folder and restart

Actual Results:  
The renamed folder will be there after restart, but the corrupted folder will re-appear. The renamed folder will contain the articles.

Expected Results:  
The original folder creation should strip the whitespace and create a clean folder name with no corruption.

See bug 291293 and bug 316051.

All CNET blogs have spaces and newlines in their title tags. I've asked whether they can clean this up, but TB should handle it anyway.
Version: unspecified → 2.0
It seems to be slightly worse than I described. After experimenting with various CNET blog subscriptions, I now always get three corrupt folders appearing whenever I restart TB:

"             Beyond Binary                        244d907d"
"             Geek Gestalt                        4b313965"
"             The Open Road                        fb064738"

I do actually have a subscription to The Open Road, but I moved the feed to a subfolder. The subfolder shows the articles correctly (and its name remains uncorrupted), but on every restart the corrupted folder above, together with the two that do not have subscriptions, reappears at the top level. They will delete "successfully", but reappear on restart.

The problem with the corrupt folders re-appearing after deletion and restart can be fixed by attempting to delete the folders in the GUI, exiting TB, then deleting the folders (which still exist) from the profile directly.

For example, I had three folders with the names in the previous comment in <profile>\Mail\News & Blogs. Deleting these by hand prevents them from appearing when TB is restarted.

So it looks like there are two distinct problems: RSS feeds with spaces in the title create corrupt folder names (with spaces and other garbage); and deleting folders in this state does not do a complete job (leaving the folders in the News & Blogs directory).
(Problem-1)
> "             Geek Gestalt                       4b313965"
> corrupted characters on the end

It's not "corrupted characters".

The "character space" usable in a file/directory name is narrower than the "character space" usable for a mail folder name. So hashing is used for illegal file name characters such as "\" (MS Win's path delimiter, escape character). The same hashing is also used for special characters such as "/", "?", "#" (URI delimiters; Mozilla uses URI format internally, so such chars in a file name cause problems). Further, there are some limitations on file names imposed by the OS, for example:
 (a) Starting "."    : Hidden file on Linux/Mac OS X
 (b) Ending "~"      : Backup file on Linux
 (c) Trailing "."    : MS Win creates file without trailing "." on file creation
 (d) Trailing spaces : MS Win removes trailing spaces on file creation.
These are also hashed in order to avoid problems due to restriction on file name.
See Bug 229522, Bug 117840. See also Bug 379101 for compatibility issue due to change for above (c) & (d) on Tb 2.0.
See also Bug 275770 for improvement request of the "hashed file name".

Because the "mail folder name" is set internally to the <title> string of the RSS feed, the above also applies to RSS feeds. This is the current implementation (it can be called a "restriction" in the RSS feed case, I think).
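
To illustrate the kind of hashing described above, here is a minimal Python sketch. It is not Mozilla's actual code: the function names, the CRC32 stand-in hash, the set of illegal characters, and the 55-character limit are all assumptions made for the example. It only shows how a name with offending characters can end up as a kept prefix plus eight hex digits, similar in shape to "Geek Gestalt ... 4b313965".

```python
import zlib

# Assumption for this sketch: characters treated as unusable in a file name.
ILLEGAL_CHARS = '\\/?#'
HASH_LEN = 8  # eight hex digits appended to the kept prefix


def first_offending_index(name):
    """Return the index where the name stops being safe, or None if it is safe."""
    candidates = [i for i, ch in enumerate(name) if ch in ILLEGAL_CHARS][:1]
    if name.startswith("."):              # (a) hidden file on Linux/Mac OS X
        candidates.append(0)
    if name.endswith(("~", ".", " ")):    # (b), (c), (d) problematic endings
        candidates.append(len(name) - 1)
    return min(candidates) if candidates else None


def hashed_file_name(name, max_len=55):
    """Keep the safe prefix and append a hash of the full original name."""
    idx = first_offending_index(name)
    if idx is None and len(name) <= max_len:
        return name                       # nothing to do
    room = max_len - HASH_LEN
    keep = room if idx is None else min(idx, room)
    # CRC32 is a stand-in hash, chosen only because it is in the standard library.
    digest = format(zlib.crc32(name.encode("utf-8")), "08x")
    return name[:keep] + digest
```

With this sketch, hashed_file_name("Geek Gestalt") is returned unchanged, while a name ending in spaces keeps most of its spaces and gains an eight-digit hex suffix, which is why the suffix looks like garbage in the folder pane.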

(Problem-2)
> Steps to Reproduce:
>(snip)
>3. Rename the folder and restart
> Actual Results:  
> The renamed folder will be there after restart, but the corrupted folder will re-appear.

The "mail folder name" is held in the ".msf" file (if treated as a local mail folder), or is held somewhere else (virtualFolders.dat for a saved search folder/virtual folder), or is obtained from the RSS feed data (<title> tag) when the RSS feed is accessed. Hashing to the narrower "character space for file names" occurs when the "mail folder" is opened.
Manual corruption of the file name for an RSS feed is the user's fault, and the result of "manual corruption of file name for RSS by user" is unpredictable.

Problem-1 is basically INVALID, and Problem-2 is apparently INVALID.
However, for Problem-1, I think "removal of leading/trailing spaces (and excess spaces in the middle) from the folder name" is a good workaround for (improvement for) this bug's issue on RSS and for problems like Bug 303729, as seen in Bug 291293 Comment #5.
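
The whitespace cleanup suggested above could look roughly like the following Python sketch. The function name clean_feed_title is hypothetical and this is not code from the feed reader; it only shows the intended normalization of a <title> string before it is used as a folder name.

```python
def clean_feed_title(raw_title):
    """Strip leading/trailing whitespace and collapse inner runs to one space.

    str.split() with no arguments splits on any run of whitespace
    (spaces, tabs, newlines) and discards empty pieces.
    """
    return " ".join(raw_title.split())
```

A title such as the CNET one, with newlines and long runs of spaces around "Geek Gestalt", would come out as plain "Geek Gestalt", so no trailing spaces would remain to trigger hashing of the file name.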
Component: RSS → Feed Reader
Product: Thunderbird → MailNews Core
Version: 2.0 → 1.8 Branch
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE