Closed Bug 458690 Opened 12 years ago Closed 4 years ago

Sunbird & Lightning 0.9 XML error and hung process after webDAV event update

Categories

(Calendar :: Provider: ICS/WebDAV, defect, critical)

x86
Windows 2000
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: vinceofarrell, Unassigned)

Details

(Keywords: hang, testcase)

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.8.1.17) Gecko/20080829 Firefox/2.0.0.17
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.18pre) Gecko/20080917 Sunbird/0.9

This bug has similarities to others such as 456978, but it seems very similar to 387559.

Background

The problem affects my use of Lightning on Windows (both 2K and XP) but I have demonstrated a similar problem in Sunbird. I have been using Lightning for some time now and have come to depend on it - while not perfect, it is an invaluable tool. I record all my activities as events - I do not use it for tasks or invitations. I have 5 web/remote DAV .ics calendars on my own in-house Apache 2.2.8 server and have got into the habit with versions 0.7/0.8 of Lightning of doing a reload calendars after every update to work around a bug in those versions which left Lightning in an odd state immediately after the update. The reload calendars action seemed to straighten out Lightning which otherwise would report an error on the next update and make the calendar read-only. The reload action was a bit of a chore but I was in the habit of doing it and had got to accept it.

I installed Lightning 0.9 about a week ago and hoped that either the above bug had been fixed, or at worst would still be present with similar symptoms. However, the bug seems to have changed in 0.9 such that a follow-on update still causes a problem but a calendar reload hangs part way through with a status message of the form 'Checking Calendar n of m'. Closing TB at this point removes the GUI component from the screen but a TB task remains which must be removed using Windows Task Manager. Lightning 0.9 is therefore unusable for me and to add insult to injury, I can't regress to 0.8 because the Add-on manager says the calendars may get corrupted and it therefore disables 0.8.

Specific Sunbird bug details

So, I decided to investigate further with SB 0.9 since if the problem was Lightning-specific, I was happy in the short term to live without Lightning and survive on SB. Here is what happens with SB 0.9:

I uninstalled SB 0.7, physically deleted the program directory and the local SB profile data, then installed a completely fresh copy of SB 0.9. I created three of my calendars in SB, two are static holidays and anniversaries while the third is my own active work calendar. The Error Console was clear at this point (apart from an initial timezone warning).

I added a new event which appeared in the calendar display and I believe the .ics file was updated since the event appears in later runs. The EC then shows:
Error: missing = in XML attribute
Source File:
Line: 1, Column: 36
Source Code:
<parent xmlns='DAV:'><!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
-------------------------------------------------------^
[The above is the result of a copy-and-paste and only approximates to what is visible onscreen. I have manually adding the last line with the column indicator which is not present in the copied data. Onscreen, the arrow points to between the P and U of PUBLIC.]

Reloading remote calendars at this point seems to reload the first two calendars but hang on the third (the one that was updated). The status message shows 'Checking calendar 2 of 3' with all calendar 3's events missing. (I have seen other instances of this message with 'Checking Calendar n of 4294967300'!!) The EC does not show any new messages after the calendar reload.

Closing the EC window and the main SB window leaves a SB process running, apparently with an unchanging memory allocation of around 32MB and using little or no cpu judging from its position in the sorted-by-cpu process list.


Reproducible: Always

Steps to Reproduce:
1. Install a fresh copy of Sunbird 0.9 after removing all remains of any earlier versions.
2. Add a webDAV network .ics calendar.
3. Open the Error Console and clear or note contents.
4. Add a new event which should appear in calendar display.
5. Notice that Error Console now contains the XML syntax error listed in Details above.
5a. Attempts to rectify the situation by reloading calendars for example will cause more oddities - usually a hang while reloading the calendar just updated.
6. Close Sunbird GUI windows.
7. Windows task manager shows a Sunbird process still running which until removal will block full and correct restarts of Sunbird.
Actual Results:  
1. New or updated event appears in calendar display and (when checked later) in .ics file.
2. Error Console reports error.
3. Sunbird is not able to update until restarted.
4. GUI window closures do not close all Sunbird processes.

Expected Results:  
1. Updated the .ics file and the calendar window.
2. Sunbird continues to operate correctly.
You can copy & paste the error message from the error console by right-clicking on the respective error message and choosing 'Copy'. Please post the exact error message here in this bug so that we can appropriately diagnose the problem.
Simon

I *did* copy and paste from the EC, as my report above makes clear. In fact, I used the hopefully equivalent Ctrl/C and Ctrl/V commands to do the business.

However, once again so there can be no confusion, here is the result of copying (but this time using a right click context menu command ;>) and pasting the EC message after the update:

Error: missing = in XML attribute
Source File: 
Line: 1, Column: 36
Source Code:
<parent xmlns='DAV:'><!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">

(though as I explain above, the EC display message is slightly different in that it includes a red octagon warning symbol and another line below the source code line which attempts to indicate how far the parser had got when it reported the error - these objects do not copy)

vof
That error looks like the server is responding with an error page. Could you do a capture of the wire transfer? (i.e using wireshark)
I can confirm that this also happens on Linux - all my webdav calendars are crippled!
Philipp

Thank you for the pointer. You are correct and have helped me discover a
workaround if not the cause of the problem. Wireshark shows the following
packet sequence:

HTTP PUT /davhome/vince.ics HTTP/1.1
<85 continuation packets>
HTTP/1.1 204 No Content
HTTP PROPFIND /davhome/vince.ics HTTP/1.1
<continuation packet>
HTTP/1.1 401 Authorization Required

so the 401 seems to be in response to the PROPFIND.

The Apache log shows the PUT followed by a PROPFIND but no errors in its error
log:

192.168.1.3 - - [06/Oct/2008:22:33:50 +0100] "PUT /davhome/vince.ics HTTP/1.1"
204 - "-" "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.18pre)
Gecko/20080917 Sunbird/0.9"
192.168.1.3 - - [06/Oct/2008:22:33:50 +0100] "PROPFIND /davhome/vince.ics
HTTP/1.1" 401 478 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
rv:1.8.1.18pre) Gecko/20080917 Sunbird/0.9"

My dav_fs.conf file contains:

DAVLockDB /var/lock/apache2/DAVLock
#DAVMinTimeout 600

<Location /davhome/>
Dav On

AuthType Basic
AuthName davuser
AuthUserFile /var/www/davhome/.DAVlogin

<LimitExcept GET PUT OPTIONS>
Require user davuser
</LimitExcept>
</Location>

Note that I have not included PROPFIND in the LimitExcept list. When I do now
include it, the problem goes away.

Now I do not know enough about DAV to say whether the above evidence shows that
SB is behaving correctly in this situation - it certainly would help if it
failed more gracefully, and should it not have prompted for auth credentials at
some point? The AuthUserFile exists and includes the necessary credential
information.

vof
Bruno should know more about DAV to judge what is right.
I'm a little surprized that you don't get an authprompt in response to that PROPFIND, but then I've never used a DAV configuration like this before. Given some of the changes between 0.8 and 0.9 the need to add PROPFIND to the list seems reasonable. The LimitExcept allows passwordless access for GET and PUT; since 0.9 under some circumstances issues PROPFIND as well that will need to be in the list.

I'm curious about the headers you see in wireshark:
1) does the response to the PUT include an Etag header?  For that matter, is there an etag in the XML response body to the PROPFIND?
2) Does the response to GET and PUT include an Authorization header?
Bruno - I've attached 3 text files exported from Wireshark which hopefully show you the packet data you want to see.
I am able to replicate this bug with Apache 2.2.9 on FreeBSD 7.0, using Lightning 0.9 clients on Windows and Linux.

vof's suggested temporary fix of adding the PROPFIND method to the LimitExcept directive appears to work for me as well.  Thanks for the suggestion.
I can confirm this bug on Linux...
Is there something I can do about this if I don't have access to the server? It worked in Lightnig 0.8, so there should be something different in how Lightning (Sunbird) behaves.
Hi,
I have exactly the same problem with the remote calendar after each Update. Lightning Version 0.7 to 0.8 and 0.8 to 0.9. In Sunbird it is the same. Only a new installation of complete Thunderbird with Lightning or Sunbird solves the problem and that is to much work for me every time. Importing the profile is no solution too. Can anyone give me some help and explain the solution by adding PROFIND?
But this can't be the right way, because the problem exists only after updating from an old version to a new and not with a new installation.
Best regards and thanks from FOX, Germany.
Hi folks!

Not sure I have the same situation, since I'm using HTTPS + Dav/svn + Squid.
The problem is no refresh for just modified calendar + rolling dots at the upper-right corner of thunderbird.
From the log it seems like broken SSL?

Other (unchanged calendars) are just getting 304 - not changed, so no SSL errors.

Thunderbird restart does nothing. Sunbird shows all three just fine.

The similarity is with hung connection.

[Tue Nov 04 18:32:08 2008] [info] Initial (No.1) HTTPS request received for child 10 (server svn.....com:443)
[Tue Nov 04 18:32:13 2008] [debug] ssl_engine_io.c(1828): OpenSSL: I/O error, 5 bytes expected to read on BIO#291054c0 [mem: 2919a000]
[Tue Nov 04 18:32:13 2008] [info] [client ....] (70007)The timeout specified has expired: SSL input filter read failed.
[Tue Nov 04 18:32:13 2008] [debug] ssl_engine_kernel.c(1770): OpenSSL: Write: SSL negotiation finished successfully
[Tue Nov 04 18:32:13 2008] [info] [client .....] Connection closed to child 10 with standard shutdown (server svn.....com:443)
[Tue Nov 04 18:32:13 2008] [debug] ssl_engine_io.c(1828): OpenSSL: I/O error, 5 bytes expected to read on BIO#29105300 [mem: 2916a000]
[Tue Nov 04 18:32:13 2008] [info] [client ......] (70007)The timeout specified has expired: SSL input filter read failed.
[Tue Nov 04 18:32:13 2008] [debug] ssl_engine_kernel.c(1770): OpenSSL: Write: SSL negotiation finished successfully
[Tue Nov 04 18:32:13 2008] [info] [client ....] Connection closed to child 8 with standard shutdown (server svn.....com:443)
(In reply to comment #15)

> Other (unchanged calendars) are just getting 304 - not changed, so no SSL
> errors.
> 
> Thunderbird restart does nothing. Sunbird shows all three just fine.
Correction: closing Thunderbird results in hung background process. Restarting just bring the GUI back.
Killing and starting process again solves the problem.
Keywords: hang
I use mydrive.ch. Thus, I don't believe I am able to use the LimitExcept-fix. Are there any other fixes until this bug is fixed?
Does this still happen with the latest 1.0b2pre nightlies? I believe it was fixed by a different bug.
Thanks for the ping, Wayne, but I have no access to the setup anymore. Unable to check, sorry.
No response for a while now, closing INCOMPLETE. Please reopen if you can reproduce or answer any unanswered questions.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
It still happens with TB3.0.4 + Lightning 1.0b1 on both XP and Lucid if the PROPFIND is removed from the LIMITEXCEPT statement though the actual symptoms are slightly different - no XML error reported but still leaves screwed TB and a hung TB process after TB is exited.

On XP, also get an error window:
------------------------------------------------------
An error occurred when writing to the calendar xxxxxx!

Error code: MODIFICATION_FAILED
Description:      (blank)
------------------------------------------------------
and the calendar is marked read-only when TB is tidied and restarted.

Will try with a 1.0b2pre if I get time/opportunity.
Resolution: INCOMPLETE → FIXED
Just tried with /pub/mozilla.org/calendar/lightning/nightly/2010/03/2010-03-05-03-comm-1.9.2/linux-xpi/lightning.xpi (1.0b2pre) with TB (3.0.4 on Lucid).

[I modified the install.rdf minVersion to allow it to install on TB 3.0.4.] 

Unfortunately Philipp, the problem is still present - TB is screwed after the first webDAV calendar update as originally described for 0.9. The displayed calendar grid is only partially populated - the day to which the second update was applied is empty. TB appears to exit when requested but leaves 3 running processes (on Lucid).

Philipp: Suggest bug is reopened. Let me know if you want me to gather more evidence.
Lightning 1.0b2pre test builds require Thunderbird 3.1 test builds for correct functionality. If you retest please use a current Lightning 1.0b2pre test build instead of an old one.
Reopened for now, but waiting for more details from vof.
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
I am using TB 3.0.4 and lightning 1.0b1. also.  The behavior does not happen all the time, but when several events are added or moved eventually it will hang and I get the error occurred writing to calendar.  TB continues to function.  TB does not remain in memory on exit.  On restart, the calendar in question, is locked on restart.
I am using TB 3.0.4 and lightning 1.0b1. also.  The behavior does not happen all the time, but when several events are added or moved eventually it will hang and I get the error occurred writing to calendar.  TB continues to function.  TB does not remain in memory on exit.  On restart, the calendar in question, is locked on restart.
Found time tonight to try on a fresh XP system using:
TB 3.1 RC2
Lightning 1.0b2pre

Problem still occurs if PROPFIND is absent from DAV server LIMITEXCEPT statement.

Action/event sequence is:
- modify calendar (e.g. add event) - calendar view is updated, TB seems to behave normally.
- modify calendar again - calendar view is not updated.
- close TB - UI disappears but a process remains which needs to be manually removed.
- no error messages reported.

I also tried with only one DAV calendar defined rather than three with the same results.
...and have just confirmed problem still exists with released TB 3.1 and Lightning 1.0b2 on XP.
:fallen, is a trace needed?
Keywords: testcase
We should try to reproduce using steps from comment 27, if that doesn't work out we can still request a trace.
In response to a request from Wayne Mery, I have just reproduced problem again on an XP/TB14.0/Lightning1.6 system. Symptoms the same as in comment 27.

Server is apache2 on lucid. I have been running with PROPFIND in LIMITEXCEPT statement in dav_fs.conf since early days of bug in October 2008 so it does not affect me but not everyone can modify the affected apache config.

Please explain how to produce the type of trace wanted.
As a measure to reduce the number of issues to a more manageable amount I am closing off this bug because it has been filed before 2010 and is still in the "unconfirmed" state and it has not recently been changed.

If this is still an issue, please test with the latest version of Thunderbird and Lightning and comment on the bug, I am happy to reopen it.

Thank you for your understanding!
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago4 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.