Unable to open files by tapping the download complete notification

VERIFIED FIXED in Firefox 24

Status

()

defect
VERIFIED FIXED
6 years ago
3 years ago

People

(Reporter: TeoVermesan, Assigned: wesj)

Tracking

({regression})

25 Branch
Firefox 26
ARM
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox23 unaffected, firefox24+ verified, firefox25+ verified, firefox26 verified, relnote-firefox -, fennec24+)

Details

Attachments

(2 attachments, 1 obsolete attachment)

Tested with:
Build: Firefox for Android Nightly 25.0a1 2013-07-21
Device: Samsung Galaxy R (Android 2.3)

Steps to reproduce:
1. Load any article from news.google.com
2. Open the custom menu and choose "Save as PDF"

Expected results:
- After step 2, a download complete notification appears. By tapping it, the file is opened

Actual results:
- There is no action after tapping the download complete notification 
Note: I can reproduce this also with images or mp3 files.
Devices: Samsung Galaxy Nexus (Android 4.1.1), LG Nexus 4(Android 4.2.2)
Blocks: 887655
This is a regression from bug 887655?
Flags: needinfo?(teodora.vermesan)
Keywords: regression
tracking-fennec: --- → ?
Yes. The issue appears in the 07/20 build, where bug 887655 was fixed (in 07/19 build there was no download complete notification).
Flags: needinfo?(teodora.vermesan)
This downloads notification is the world's biggest PITA. Is there an error in the log?
I don't think you can say this is a regression from bug 887655 because that bug just added back the notification, but the notification may have been broken before that.

It seems more likely that bug 887655 caused this.
(In reply to :Margaret Leibovic from comment #4)
> I don't think you can say this is a regression from bug 887655 because that
> bug just added back the notification, but the notification may have been
> broken before that.
> 
> It seems more likely that bug 887655 caused this.

Er, I mean bug 746976.
I just tried adding some logging to investigate, and I found this observer is just  never being called:
http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/downloads.js#70
Assignee: nobody → bnicholson
See comment 4 -- this wasn't caused by bug 887655.
Assignee: bnicholson → fedepaol
No longer blocks: 887655
I placed a lot of logging, and I think the cause *might* be the fix of bug 884792 .

RemoveObserver method switched from being synchronous to asynchronous.

When a download finishes:
http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/downloads.js#176

The current notification and its observer gets removed as effect of progressListener.onCancel() 
and the download complete notification is created:

http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/downloads.js#186

Now, what I think is happening is that the creation of the new observer is synchronous and happens immediately.
showAlertNotification -> 
  mozilla::AndroidBridge::Bridge()->ShowAlertNotification() ->
        nsAppShell::gAppShell->AddObserver(aAlertName, aAlertListener);


whereas the removing of the previous observer is executed asynchronously after the creation of the new observer.
Since both the notifications use the same id, which is used as key of the map that contains the observers, removing the old observer has the effect of removing this brand new one. (Or at least this is my hypothesis)
Update: after backing off bug 884792, the observer is still not getting called :-( . Need to place some more traces to understand what's going on.
tracking-fennec: ? → 24+
Update (2) the observer wasn't getting called because fennec was crashing due to  bug 884792, so I could not check if it was the cause. I think trying to fix moving onCancel or using a different notificationid will prove if I am right or not. 
I hope I will be able to submit a patch in the weekend.
I tried to use different ids for pending and complete notifications but it did not work. Anyway, it helped me to see what is the central point of this bug. 
The observer is not removed (only) by canceling explicitly the notification of pending download because they share the same id.

The problem is that calling showAlertNotification
http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/downloads.js#100

ends up in calling close() and add()

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/alerts/nsAlertsService.cpp#89

Now, close notification also removes the related observer:
http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/GeckoAppShell.java#1293

So, what's happening now every time we create a notification is the removeObserver gets queued, the addObserver is executed synchronously and when the remove is handled it removes the observer we wanted to register.

Playing with the id did not help, I can think about two solutions now:
- removing the close in http://mxr.mozilla.org/mozilla-central/source/toolkit/components/alerts/nsAlertsService.cpp#89 , which may result in leaked observers if we add two notifications with the same id without removing the previous one

or

- making addObserver asynchronous too. In this way we should be able to preserve the order of add / remove execution. 

Both solution may have impacts that I am not seeing, for example making addobserver asynchronous too may result in missed events.
(In reply to Federico Paolinelli from comment #11)
> - making addObserver asynchronous too. In this way we should be able to
> preserve the order of add / remove execution. 

This seems like it will be racy to me. If I'm reading that right, really we add the observers when notifications are created here:

http://mxr.mozilla.org/mozilla-central/source/widget/android/AndroidBridge.cpp#711

but when they're cancelled, we send a message to java that says "Remove this one", the notification fires its pendingIntent to GeckoApp, which sends a message to Gecko and queues the "remove this observer" message here:

http://mxr.mozilla.org/mozilla-central/source/widget/android/nsAppShell.cpp#540

If we add the "add this observer" message to the queue, we have to rely on all that messaging that's going through Android's native libraries to be synchronous (i.e. remove still has to be added to the queue before add) which makes me nervous.

Can we instead change AndroidBridge::ShowAlertNotification to remove any observers already registered, and then register the new one (if it exists). Then we can just do your first option and remove the cancel from the nsIAlertsService. That means developers can still get yourself into these problems if you try quickly removing and readding your notification though.

Also, this code makes my brain hurt. Adding brad in case he's got any good ideas on how to fight concurrency issues here.
Flags: needinfo?(blassey.bugs)
Does this become any easier if we stop using nsIAlertsService and start using some JS wrapper for a JSON API (a la Notifications.jsm)
(In reply to Mark Finkle (:mfinkle) from comment #13)
> Does this become any easier if we stop using nsIAlertsService and start
> using some JS wrapper for a JSON API (a la Notifications.jsm)

This can be associated with bug 815202 I started working on before falling in this mess. 
With Wes, we decided to fix it incrementally, using custom json messages and adding the Notifications.jsm module later. 
As a first step, I can just replace nsIAlertService and add the buttons later. Don't know if this comment should be moved over the other bug.
Could just add an observer on initialization and never removed it until shutdown.
Flags: needinfo?(blassey.bugs)
Posted patch Patch 1/2Splinter Review
This gets rid of a few paths that trigger hiding a notification (namely, showing a new notification with the same ID or having a progress notification hit 100%). Android handles having multiple notifications with the same ID just fine (in fact, that's how you "update" a notification). And our hashtables should also add/release fine (I hope?).

Getting rid of the "hide at 100%" code also makes sense to me. Either callers should be doing something at 100% (download notifications "update" to a non-progress style), or they should be responsible for hiding the notification themselves.

You can still cause races with this. i.e. something like:

cancelAlert(id);
addAlert(id);

will cause the cancel to remove the observer finish after the add. Second patch tries to mitigate that using Federico's other idea, pushing adds to the queue.
Attachment #784092 - Flags: review?(blassey.bugs)
Posted patch Patch 2/2 (obsolete) — Splinter Review
This pushes the AddObserver calls to the queue. In all the tests I ran, that meant we preserved order when calling

cancel(id)
add(id);
Attachment #784095 - Flags: review?(blassey.bugs)
Comment on attachment 784092 [details] [diff] [review]
Patch 1/2

Review of attachment 784092 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/chrome/content/downloads.js
@@ +46,2 @@
>        // in case we are not able to open the file (i.e. there is no app able to handle it)
>        // we just open the browser tab showing it 

clean up this ws while you're here
Attachment #784092 - Flags: review?(blassey.bugs) → review+
Comment on attachment 784095 [details] [diff] [review]
Patch 2/2

Review of attachment 784095 [details] [diff] [review]:
-----------------------------------------------------------------

just want to see another patch with these changes

::: widget/android/nsAppShell.cpp
@@ +757,3 @@
>      return NS_OK;
>  }
>  

I'd rather remove this method, get rid of its one caller (http://mxr.mozilla.org/mozilla-central/source/widget/android/AndroidBridge.cpp#711), and post the event to add the observer from GeckoAppShell.ShowAlertNotification()
Attachment #784095 - Flags: review?(blassey.bugs) → review-
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #19)
> I'd rather remove this method, get rid of its one caller
> (http://mxr.mozilla.org/mozilla-central/source/widget/android/AndroidBridge.
> cpp#711), and post the event to add the observer from
> GeckoAppShell.ShowAlertNotification()

This will be a bit difficult because we need a reference to the nsIObserver when we do it.... I bet we could do some JNI juggling of refcounted pointers, but it feels a bit dangerous.
needinfo on :wesj to help uplift this.
Flags: needinfo?(wjohnston)
Comment on attachment 784092 [details] [diff] [review]
Patch 1/2

This fix is already on Aurora.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 884792
User impact if declined: Tapping on notifications does nothing
Testing completed (on m-c, etc.): This has been on nightly and aurora for a month. No fallout that I've seen.
Risk to taking this patch (and alternatives if risky): Medium. This changes the flow of our notification code. We no longer automatically cancel them at 100%. We no longer cancel them before showing a new one with the same ID. The other patch in here is an alternative fix, but is likely more risky.

String or IDL/UUID changes made by this patch: None.
Attachment #784092 - Flags: approval-mozilla-beta?
Flags: needinfo?(wjohnston)
(In reply to Wesley Johnston (:wesj) from comment #24)
> Comment on attachment 784092 [details] [diff] [review]
> Patch 1/2
> 
> This fix is already on Aurora.
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): bug 884792
> User impact if declined: Tapping on notifications does nothing
> Testing completed (on m-c, etc.): This has been on nightly and aurora for a
> month. No fallout that I've seen.
> Risk to taking this patch (and alternatives if risky): Medium. This changes
> the flow of our notification code. We no longer automatically cancel them at
> 100%. We no longer cancel them before showing a new one with the same ID.
> The other patch in here is an alternative fix, but is likely more risky.
> 
> String or IDL/UUID changes made by this patch: None.

Wesj, could you provide some pointers to QA for testing so the risk is mitigated here.We do have couple more beta's so it should be manageable.Also in hindsight, I think we should uplift this earlier in the cycle.

could you also help understand what is the worst case that may happen here related to notification ? Will the user be in the state that is described in this bug or are we talking about other worst possible situations, in which case we should get more testing as I said earlier to identify it.
Comment on attachment 784092 [details] [diff] [review]
Patch 1/2

Adding verify so QA can help with some addhoc testing related to notifications in addition to verifying this bug.
Attachment #784092 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Keywords: verifyme
The original problem as reported is still an issue on Nightly (08/27), Aurora (08/27) and https://tbpl.mozilla.org/?tree=Mozilla-Beta&rev=34fd3a39ab84 (missed FIREFOX_24_0b6_RELEASE).

(In reply to Teodora Vermesan (:TeoVermesan) from comment #0)
> Steps to reproduce:
> 1. Load any article from news.google.com
> 2. Open the custom menu and choose "Save as PDF"
> 
> Expected results:
> - After step 2, a download complete notification appears. By tapping it, the
> file is opened
> 
> Actual results:
> - There is no action after tapping the download complete notification 

I can still reproduce this on all of today's builds.
Flags: needinfo?(wjohnston)
Comment on attachment 784095 [details] [diff] [review]
Patch 2/2

Yeah, I can too. Adding some logging made the race swing back in my favor, so I wonder if that's what I was testing with before. I think we need this other patch.

I don't think we can do what you suggested, bard, as wel need a handle to the observer. Maybe I misunderstand what you're asking though.
Attachment #784095 - Flags: review- → review?(blassey.bugs)
Comment on attachment 784095 [details] [diff] [review]
Patch 2/2

Review of attachment 784095 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/android/nsAppShell.cpp
@@ +757,3 @@
>      return NS_OK;
>  }
>  

I'm suggesting to get rid of nsAppShell::AddObserver and have the one caller the of it in AndroidBridge call nsAppShell::PostEvent() instead
Posted patch PatchSplinter Review
Oh, that was an easy change! This works well for me.
Attachment #784095 - Attachment is obsolete: true
Attachment #784095 - Flags: review?(blassey.bugs)
Attachment #799159 - Flags: review?(blassey.bugs)
Flags: needinfo?(wjohnston)
Attachment #799159 - Flags: review?(blassey.bugs) → review+
Should this still be open?
Assignee: fedepaol → wjohnston
Target Milestone: --- → Firefox 26
Nope.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment on attachment 799159 [details] [diff] [review]
Patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: Tapping notifications doesn't do anything
Testing completed (on m-c, etc.): Landed on mc today. Looks good
Risk to taking this patch (and alternatives if risky): I've come around, and think the risk here is pretty minimal. We're using code that's already in place. Just moving around how its called.
String or IDL/UUID changes made by this patch: None.
Attachment #799159 - Flags: approval-mozilla-beta?
Attachment #799159 - Flags: approval-mozilla-aurora?
Comment on attachment 799159 [details] [diff] [review]
Patch

Approving, given the risk assessment.

Requesting QA verification to verify this on nightly before we go-to-build for beta.
Attachment #799159 - Flags: approval-mozilla-beta?
Attachment #799159 - Flags: approval-mozilla-beta+
Attachment #799159 - Flags: approval-mozilla-aurora?
Attachment #799159 - Flags: approval-mozilla-aurora+
Flags: needinfo?(aaron.train)
Keywords: checkin-needed
(In reply to bhavana bajaj [:bajaj] from comment #37)
> Comment on attachment 799159 [details] [diff] [review]
> Patch
> 
> Approving, given the risk assessment.
> 
> Requesting QA verification to verify this on nightly before we go-to-build
> for beta.

Verified fixed on:
Build: Firefox for Android 26.0a1(2013-09-08) 
Device: Samsung Galaxy Nexus
OS: Android 4.1.1
Verified fixed on:
Build: Firefox for Android 25.0a2(2013-09-12) and Firefox for Android 24 RC
Device: LG Nexus 4
OS: Android 4.2.2
Since I've verified the bug on all branches, I'll mark it as verified fixed.
Status: RESOLVED → VERIFIED
Flags: needinfo?(aaron.train)
Minusing for relnote as this is fixed on Fx24
You need to log in before you can comment on or make changes to this bug.