Closed Bug 337377 Opened 14 years ago Closed 12 years ago

Fails to import Outlook CSV files that have fewer than 13 fields

Categories

(Calendar :: Import and Export, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: j_goldfoot, Assigned: Hb)

Details

Attachments

(6 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060506 Mozilla Sunbird/0.3a2

Outlook 2003 (and possibly other versions) sometimes generate CSV files that have only a limited number of fields.  The current code assumes that Outlook CSV files will have at least 13 fields.  Thus, when asked to import a CSV file with 7 fields, it fails.

Reproducible: Always

Steps to Reproduce:
1.  File | Import
2.  Files of type: Comma Separated Values (from Outlook)
3.  [choose the sample CSV file I've attached; it was generated with Outlook 2003, after I ran a script on it to scramble private details]
4.  Select a calendar to import into if necessary

Actual Results:  
No events are imported.

Expected Results:  
All events should be imported.
Attachment #221536 - Flags: first-review?(mvl)
Status: UNCONFIRMED → NEW
Component: General → Sunbird and Calendar-Extension Front End
Ever confirmed: true
QA Contact: general → sunbird
Version: unspecified → Trunk
The current committed code attempts to parse CSV files by (1) determining how many fields there are per line, and (2) constructing, on the fly, a regular expression object that parses one line of the CSV.  However, this approach assumes that each line has the same number of fields.  If a user attempts to import an Outlook 2003 file, this is not the case.  As a result, an error occurs:  Only those fields with seven lines (i.e. those that have a "description" field) are imported.

This patch adds a new CSV parser that returns an variable-length array for each CSV record.

I've checked this code against every CSV sample file I could find uploaded to the bug database, and each imports correctly.
Attachment #221536 - Attachment is obsolete: true
Attachment #223053 - Flags: first-review?(mvl)
Attachment #221536 - Flags: first-review?(mvl)
Sorry for taking so long to get to this. (problems with isp's etc) I'm not sure when i will have time to look into this patch, i hope soonish.

If there is anybody willing to test the new review system, feel free to give this patch a first look.
Comment on attachment 223053 [details] [diff] [review]
Allows import of records with variable number of fields per line (Outlook 2003 style)

Moving first-review to Daniel, since mvl's queue is pretty big and he can only cope with his ui-reviews and 2nd reviews.
Attachment #223053 - Flags: first-review?(mvl) → first-review?(daniel.boelzle)
Component: Sunbird Only → Import and Export
QA Contact: sunbird → import-export
Comment on attachment 223053 [details] [diff] [review]
Allows import of records with variable number of fields per line (Outlook 2003 style)

Since I am no outlook csv import/export expert, I would like to pass this review back to mvl.
Attachment #223053 - Flags: first-review?(daniel.boelzle) → first-review?(mvl)
We're all busy.  Let me put in another plug for the importance of this bug.  I think lots of people, when they take a look at Sunbird, try to import from Outlook first thing.  When that doesn't work, they give up.  See, for example, the many complaints in bug 173562.  

I think my patch fixes this problem.
Comment on attachment 223053 [details] [diff] [review]
Allows import of records with variable number of fields per line (Outlook 2003 style)

Moving r1 to ctalbert since he actually has Windows and Outlook to test with.
Attachment #223053 - Flags: first-review?(mvl) → first-review?(ctalbert.moz)
Josh, thanks and great job on this patch.  It works for the current test case here in the bug.  The code had been changed since you first made the patch, and it no longer would apply cleanly to the source, so I updated the patch for you to fix this.  While doing that, I also fixed some small mozilla-style issues:
* No empty spaces at ends of lines, or empty lines with only spaces
* That said, a blank line can make code far more readable, and I inserted a few
* We generally always encourage people to add {'s to loop and conditional statements, even when they are only one line.  This makes the code quite a bit more readable.
* The rule on wrapping is to wrap if it leaves the remaining text easily readable.  In a few instances I unwrapped the text because I felt it was more readable unwrapped case in point:
sDate.isDate = ("allDayIndex" in args ? localeEn.valueTrue == eventFields[args.allDayIndex] : true);

--- Some Other Comments To Consider for the next revision of this patch ---
// Assumes that this.spot begins at a quote
+           if (this.text[this.spot] != '"') {
+                throw new Error();
+            }
We need to throw a better error message here--if this is an error. I question your assertion - can the CSV file use single quotes for its strings?  Perhaps we need to better handle the case of single quotes and no quotes. In the seed data Outlook Export that we use internally (at my non-mozilla job) for testing, there are no quotes around the items.

* I completely agree that your cvsReaderObj should be a component.  In fact, I think it should be a toolkit component, ideally.  However, in the calendar codebase we have been declaring prototypes inside the global scope.  I think in the interest of maintainability and a unified style across the code, I would ask you to refactor the cvsReaderObj in the same way. You can find code samples in JS code here: http://lxr.mozilla.org/seamonkey/source/calendar/base/src/  If you have questions on style or architecture, don't hesitate to jump onto #calendar (irc://irc.mozilla.org/calendar) and ask.  My nick is ctalbert.

Thank you very much for this patch.  I will try to scrub my outlook csv files and attach them to this bug so you will have more testcases to use.
Attachment #223053 - Attachment is obsolete: true
Attachment #223053 - Flags: first-review?(ctalbert.moz)
Forgot to say that I am looking forward to reviewing your updated patch, so feel free to request r1 from me.
Outlook 2K "Windows" Export csv.  Looks like the items are quoted. I may have been wrong about us needing to handle cases where there are no quotes in the data. I will try to follow up and see where our evil internal test case came from.
Here is the other CSV option from Outlook 2000, a "DOS" version. Offhand, I don't see the difference.
I did a smaller solution with simply reducing the number of mandatory fields to six. It can't deal with rows which have not the same number of fields as in the header row. Please see bug 359083 for more.
Depends on: 359083
Bug solution:

After applying this patch only six columns are mandatory and Sunbird imports the first three events out of "Sample Outlook 2003 CSV export (with only seven fields)". Then the import stops.

The fourth line as one more field than all others. It has a field without a corresponding header in the first line. I consider such CSV files as misformed. It is correct that this data is not imported.

Please reopen if your current Outlook exports files with a changing amount of headers.


Enhancement:

> -        args.boolStr = localeEn.valueTrue;
> -        args.boolIsTrue = true; 
> -        var dateParseConfirmed = false;

Deleting the setting of this unused variables.
Assignee: nobody → hb
Status: NEW → ASSIGNED
Attachment #304860 - Flags: review?(mvl)
> The fourth line as one more field than all others. It has a field without a
corresponding header in the first line. I consider such CSV files as misformed.
It is correct that this data is not imported.

I might agree with you, but that is the file that Outlook generated (after I anonymized it).  Users will expect that their Outlook CSV files can be imported.  My patch accomplishes this.
(In reply to comment #15)
> I might agree with you, but that is the file that Outlook generated (after I
> anonymized it).  Users will expect that their Outlook CSV files can be
> imported. 

Is the part 
"##@@@@@@@p@@@@@@@@@
[Dulp K-21PC-99JL629B]


"
in your file valid (anonymized) data?

Users can set the number of columns for CSV files in Outlook. This setting is stored for the next import or export process. It is reported that these settings sometimes get invalid because the underlying outlook configuration has changed. Please see http://support.microsoft.com/kb/290859 for this.

The dialog for this setting has a button "Standardzuordnung" = "Default mapping" in OL 98 (de) and OL 2002 (de). Perhaps using this default mapping would generate normal CSV files with 22 columns.

I strictly agree that an easy way to import Outlook data is very important.

It would be great if someone with OL 2003 at hand could reproduce that the number of fields might vary. 

> My patch accomplishes this.

I did not checked it in depth because ctalbert did. Main goal of my patch was to reduce the number of mandatory columns and to get something going before the 0.8 release. 

On the long run want the CSV importer to accept lists like "Subject","Start Date","Location" for easly import event lists to Sunbird. But this discussion is made under bug 359083.
Yes, that string of strange characters, complete with carriage returns, is part is what I found "in the wild" of a CSV export of a calendar I have been maintaining for years (synced to Palm handhelds).  That string of strange characters is put there by DateBk5, a popular Palm calendar program.  If I remember correctly, it is part of the Note field--which, BTW, Outlook will export with unescaped carriage returns, something the old code did not account for at all.

The only anonymization I did was to replace all letters with another (random) letter, and all non-date non-time digits with another (random) digit.  All other bytes are the same.

Every single version of Outlook I encountered exported CSV differently, and I couldn't find any MS documentation explaining what the deal was.  Satisfying all Outlook users will be hard, but I think we can come close.


(In reply to comment #16)

> Is the part 
> "##@@@@@@@p@@@@@@@@@
> [Dulp K-21PC-99JL629B]
> 
> 
> "
> in your file valid (anonymized) data?
> 
> Users can set the number of columns for CSV files in Outlook. This setting is
> stored for the next import or export process. It is reported that these
> settings sometimes get invalid because the underlying outlook configuration has
> changed. Please see http://support.microsoft.com/kb/290859 for this.
> 
> The dialog for this setting has a button "Standardzuordnung" = "Default
> mapping" in OL 98 (de) and OL 2002 (de). Perhaps using this default mapping
> would generate normal CSV files with 22 columns.
> 
> I strictly agree that an easy way to import Outlook data is very important.
> 
> It would be great if someone with OL 2003 at hand could reproduce that the
> number of fields might vary. 
> 
> > My patch accomplishes this.
> 
> I did not checked it in depth because ctalbert did. Main goal of my patch was
> to reduce the number of mandatory columns and to get something going before the
> 0.8 release. 
> 
> On the long run want the CSV importer to accept lists like "Subject","Start
> Date","Location" for easly import event lists to Sunbird. But this discussion
> is made under bug 359083.
> 

Attachment #304860 - Flags: review?(mvl)
With this patch only the two fields "Title" and "Start Date" are mandatory for CSV import. 

Additionally the fault tolerance is greatly increased and the datetimes are checked before placed in the event. If necessary corrections are made according to this table:

eDate     eDate     sDate       Action
------    -------   --------    ---------
false |   -         isDate      Set duration to one day.
false |   -         has time    Set duration from preferences.


true      isDate    isDate |    Failed to recognize start time,
                                so throw end time away too and 
                                check if duration gt one day.
true      has time  isDate |    Check if duration gt one day.


true      has time  has time    Failed to recognize end time,
                                set end time to 23:59 and
                                check if duration is positive.
true      isDate    has time    Check if duration is positive.
------    -------   --------    ---------
("has time" == date has valid time part)

This patch is workable alone. But to recognize dates even if the time field is empty parseDateTime() needs the patch out of bug 301750.

I see main advantage for people how hack an event list in their spreadsheet and want to import it. CSV is such an easy format for data exchange that Sunbird should make it easily usable and not insist on these 22 column monsters from Outlook.
Attachment #305247 - Flags: review?(mvl)
No longer depends on: 359083
Comment on attachment 305247 [details] [diff] [review]
Patch_v2 for calOutlookCSVImportExport, only two fields mandatory

>Index: calendar/import-export/calOutlookCSVImportExport.js

>@@ -207,20 +212,20 @@ function csv_importFromStream(aStream, a
>+            knownIndxs = 13;

I don't think that's needed.

r=mvl with that fixed.
Attachment #305247 - Flags: review?(mvl) → review+
Keywords: checkin-needed
(In reply to comment #19)
> >@@ -207,20 +212,20 @@ function csv_importFromStream(aStream, a
> >+            knownIndxs = 13;
> 
> I don't think that's needed.
> 
> r=mvl with that fixed.

True. That line isn't needed any more at. Please delete before check in. 

OS: Windows XP → All
Hardware: PC → All
Version: Trunk → unspecified
Checked in on HEAD and MOZILLA_1_8_BRANCH (with |knownIndxs = 13;
| deleted)

-> FIXED
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → 0.8
You need to log in before you can comment on or make changes to this bug.