Closed Bug 994819 Opened 10 years ago Closed 10 years ago

Sending many SMS messages rapidly causes AbortError

Categories

(Firefox OS Graveyard :: Gaia::Cost Control, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(tracking-b2g:backlog)

RESOLVED FIXED
2.0 S2 (23may)
tracking-b2g backlog

People

(Reporter: cyang, Assigned: vicamo)

References

Details

(Whiteboard: [ft:ril][p=1])

Attachments

(1 file, 1 obsolete file)

This is related to bug 936703.

Running on Moz RIL or reference RIL, I see the following when rapidly sending SMS messages out:

11318:04-08 14:18:40.654   224   224 E GeckoConsole: [JavaScript Error: "AbortError"]
blocking-b2g: --- → 1.4?
blocks a blocker
blocking-b2g: 1.4? → ---
nom'ing for 1.4 as it blocks a blocker.
blocking-b2g: --- → 1.4?
Component: Gaia → Gaia::SMS
Hi Bevis, could you help investigate whether this AbortErrorcomes comes from message db or other place? Thanks.
Component: Gaia::SMS → RIL
Flags: needinfo?(btseng)
Can we have STR and what's the impact on the device after showing this error?
It's hard to know where is error come from with only one line log.

So far, I didn't see "AbortError" is defined in MobileMessage related implementation.
Flags: needinfo?(btseng) → needinfo?(cyang)
I've seen in the past that "AbortError" is sent by the IndexedDB implementation.

Now, I don't see why this should be a blocker: would a user do this? Or even be capable of doing this?
(In reply to Bevis Tseng [:bevistseng] (btseng@mozilla.com) from comment #4)
> Can we have STR and what's the impact on the device after showing this error?
> It's hard to know where is error come from with only one line log.
> 
> So far, I didn't see "AbortError" is defined in MobileMessage related
> implementation.

That's an IndexedDB error.  We can probably try to have a test script that sends multiple SMS in a short time to observe this.  If that doesn't happen, try to send via Gaia SMS app.
See Also: → 936703
> That's an IndexedDB error.  We can probably try to have a test script that
> sends multiple SMS in a short time to observe this.  If that doesn't happen,
> try to send via Gaia SMS app.

I initially started seeing this when sending multiple SMSs out via a test script. But it is rather easy to reproduce even with Gaia SMS app directly. Just send about 5-10 SMS messages to the same number quickly (type and send while the previous message is still being sent).
Flags: needinfo?(cyang)
I just reproduced this locally.
1. send 6 SMS to the same recipient quickly.
2. From the log, I saw the AbortError twice.
3. 6 SMS are sent successfully.
4. 6 SMS are stored in DB successfully even after device reboot.
5. 6 SMS are received in the remote side correctly.

However, I didn't see any impact that causes this as a 1.4 blocker.
04-16 10:23:12.249  5826  5826 E GeckoConsole: [JavaScript Error: "AbortError"]
04-16 10:23:12.249  5826  6186 E GeckoConsole: [JavaScript Error: "IndexedDB UnknownErr: IDBTransaction.cpp:864"]
(In reply to Bevis Tseng [:bevistseng] (btseng@mozilla.com) from comment #9)
> 04-16 10:23:12.249  5826  5826 E GeckoConsole: [JavaScript Error:
> "AbortError"]
> 04-16 10:23:12.249  5826  6186 E GeckoConsole: [JavaScript Error: "IndexedDB
> UnknownErr: IDBTransaction.cpp:864"]

http://hg.mozilla.org/mozilla-central/file/16e9cda44250/dom/indexedDB/IDBTransaction.cpp#l864
(In reply to Bevis Tseng [:bevistseng] (btseng@mozilla.com) from comment #8)
> I just reproduced this locally.
> 1. send 6 SMS to the same recipient quickly.
> 2. From the log, I saw the AbortError twice.
> 3. 6 SMS are sent successfully.
> 4. 6 SMS are stored in DB successfully even after device reboot.
> 5. 6 SMS are received in the remote side correctly.
> 
> However, I didn't see any impact that causes this as a 1.4 blocker.
Hello Bevis, these abort errors cause our test scripts to fail and without them it is very hard for us to ensure the quality of a release on a daily basis. This is not a new issue and has been plaguing us for a long time now. Unless this is marked a blocker I don't see anyone paying attention to it and so I would request to please consider this a blocker.
(In reply to Anshul from comment #11)
> Hello Bevis, these abort errors cause our test scripts to fail and without
> them it is very hard for us to ensure the quality of a release on a daily
> basis. This is not a new issue and has been plaguing us for a long time now.
> Unless this is marked a blocker I don't see anyone paying attention to it
> and so I would request to please consider this a blocker.

Thanks for clarification. We'll look into this and feedback you soon.
Assignee: nobody → btseng
After printing out the database name before rising the error of [JavaScript Error: "IndexedDB UnknownErr: IDBTransaction.cpp:864"],
I found that the database of this transaction |asyncStorage| even the solution in bug 936703 already included.
Need to dig out why there is access to asyncStorage after IndexedDBDatabaseParent::ActorDestroy().
Component: RIL → Gaia
After commenting this line in costcontrol app [1] to updateSMSCount(), I can not reproduce this problem anymore.
[1] https://github.com/mozilla-b2g/gaia/blob/master/apps/costcontrol/js/message_handler.js#L509

In addition, I found that coscontrol app was executed in the preallocated slave process by nuwa.

NI salva for further clarification of cost control app to see if
there is any clue that causes the access to the asyncStorage abnormally.
Component: Gaia → Gaia::Cost Control
Flags: needinfo?(salva)
CS blocker - so blocking+
blocking-b2g: 1.4? → 1.4+
Since this issue is more related to costcontrol app & the shared module of async_storage.js in gaia instead of RIL,
I'd like to ni correct owners(salva, david) who are more familiar with this to follow up.
Assignee: btseng → nobody
Flags: needinfo?(dflanagan)
Hey Fernando, maybe you can help with this FTU issue?
Flags: needinfo?(fernando.campo)
Ben: Do you know what the internal error at IDBTransaction.cpp:864 is about? (See comment #10)  Could there be something going wrong in gecko here?

Salva: How does the cost control app work when an SMS message is received? If lots of messages are received at the same time, is it possible that there could be two instances of the app running (and modifying the async storage db) at the same time?
Flags: needinfo?(dflanagan) → needinfo?(bent.mozilla)
(In reply to David Flanagan [:djf] from comment #18)
> Ben: Do you know what the internal error at IDBTransaction.cpp:864 is about?
> (See comment #10)  Could there be something going wrong in gecko here?

The invalidated state means that the parent process has decided that no further transactions are allowed to succeed for a given database. That usually happens when a transaction is in progress and the window that started the transaction gets closed.
Flags: needinfo?(bent.mozilla)
Hi Anshul,
From user experience, this doesn't look as critical as a blocker. 
Not sure how bad this impacts QC test script. 
It seems the investigation of root cause is ongoing, so is it possible to defer this as 2.0+ ?
Late fix landing also incurs risk, IMO.
Flags: needinfo?(anshulj)
(In reply to Julien Wajsberg [:julienw] from comment #17)
> Hey Fernando, maybe you can help with this FTU issue?
Did you mean to ni? me from another bug? I don't see the relation with FTE in here...
Flags: needinfo?(fernando.campo) → needinfo?(felash)
(In reply to Wesley Huang [:wesley_huang] from comment #20)
> Hi Anshul,
> From user experience, this doesn't look as critical as a blocker. 
> Not sure how bad this impacts QC test script. 
> It seems the investigation of root cause is ongoing, so is it possible to
> defer this as 2.0+ ?
> Late fix landing also incurs risk, IMO.
Wesley, I am fine with this change not landing on 1.4 but I would like to have some one work on it so we have a patch that we can pull in internally to enable our tests.
Flags: needinfo?(anshulj)
(In reply to Fernando Campo (:fcampo) from comment #21)
> (In reply to Julien Wajsberg [:julienw] from comment #17)
> > Hey Fernando, maybe you can help with this FTU issue?
> Did you mean to ni? me from another bug? I don't see the relation with FTE
> in here...

Yeah I really got confused, sorry.
Flags: needinfo?(felash)
(In reply to Anshul from comment #22)
> (In reply to Wesley Huang [:wesley_huang] from comment #20)
> > Hi Anshul,
> > From user experience, this doesn't look as critical as a blocker. 
> > Not sure how bad this impacts QC test script. 
> > It seems the investigation of root cause is ongoing, so is it possible to
> > defer this as 2.0+ ?
> > Late fix landing also incurs risk, IMO.
> Wesley, I am fine with this change not landing on 1.4 but I would like to
> have some one work on it so we have a patch that we can pull in internally
> to enable our tests.

Thanks, Anshul. 

No matter how, we should find someone to work on it of course. 
David, because you're the owner for this component(Cost Control), do you think that you can help to find someone to work on it? Thanks!
blocking-b2g: 1.4+ → 1.4?
Flags: needinfo?(dscravaglieri)
Comment 19 seems like a seriously big hint here about what is going on.  Whoever takes this bug might want to check how the app is exiting and have it wait for the async_storage update to call its success callback.
(In reply to David Flanagan [:djf] from comment #18)
> Ben: Do you know what the internal error at IDBTransaction.cpp:864 is about?
> (See comment #10)  Could there be something going wrong in gecko here?
> 
> Salva: How does the cost control app work when an SMS message is received?
> If lots of messages are received at the same time, is it possible that there
> could be two instances of the app running (and modifying the async storage
> db) at the same time?

Hi David. Nop. In theory, the system message is dispatched to the same window if there is a window already open. It should not open another window or process. If it does, it is not related with CC app but with window manager.

Nevertheless, I'm thinking it is possible the Cost Control application try to closes itself before completing all transactions but I though bug 936703 solved this.
Flags: needinfo?(salva)
Backlog per comment 24 - this isn't required for 1.4 anymore.
blocking-b2g: 1.4? → backlog
ni? reporter to confirm again as bug 936703 is solved
Flags: needinfo?(dscravaglieri) → needinfo?(cyang)
(In reply to Joe Cheng [:jcheng] from comment #28)
> ni? reporter to confirm again as bug 936703 is solved
Joe, as per comment #0, the bug 936703 doesn't solve the problem.
(In reply to Salvador de la Puente González [:salva] from comment #26)
> Nevertheless, I'm thinking it is possible the Cost Control application try
> to closes itself before completing all transactions but I though bug 936703
> solved this.

Hi, like I've pointed out in bug 936703 comment 49, the resolved state of bug 936703 doesn't follow that Cost Control will then be free of similar problems.  All that bug 936703 does is to prevent actions being taken before an IndexedDB transaction really completes.  However, there can still be cascaded transactions, and there can still be multiple concurrent transactions.  That's why you see the STR to replicate this bug is to "rapidly send SMS messages out".  These composite, concurrent transactions themselves become another super-transaction and absolutely need to take extra care, and it's beyond the work in bug 936703 but similar.
There are multiple weaknesses in cost control app:

1) apps/costcontrol/js/config/config_manager.js, function requestSettings, https://github.com/mozilla-b2g/gaia/blob/master/apps/costcontrol/js/config/config_manager.js#L174, callback may be invoked while a companion asyncStorage.setItem() call is on going.

2) apps/costcontrol/js/config/config_manager.js, function setOption, https://github.com/mozilla-b2g/gaia/blob/master/apps/costcontrol/js/config/config_manager.js#L234, callback may be invoked while a companion requestSettings() call is on going.

3) apps/costcontrol/js/config/config_manager.js, function syncOption, https://github.com/mozilla-b2g/gaia/blob/master/apps/costcontrol/js/config/config_manager.js#L250, no way to track complete state because this call doesn't accept a callback.

4) apps/costcontrol/js/message_handler.js, function _onNetworkAlarm, https://github.com/mozilla-b2g/gaia/blob/master/apps/costcontrol/js/message_handler.js#L333, function closeIfProceeds may be invoked while a companion ConfigManager.setOption() call is on going.

Just a rough scan of these two files.
Cancel NI per comment 29, 30, 31.
Flags: needinfo?(cyang)
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #31)
> There are multiple weaknesses in cost control app:

In my opinion, these weaknesses are probably not the root cause of this bug, but they can also become practical problems under some serious stress tests.  The root cause should be the racing between multiple system message handler transactions. I think.
Reproducible with emulator.
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #33)
> In my opinion, these weaknesses are probably not the root cause of this bug,
> but they can also become practical problems under some serious stress tests.
> The root cause should be the racing between multiple system message handler
> transactions. I think.

Oops! It seems 2) alone is sufficient enough to wipes out all the AbortError messages from logcat.
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #35)
> Oops! It seems 2) alone is sufficient enough to wipes out all the AbortError
> messages from logcat.

I don't have a good reason for this.  I thought that requestAll() is not re-entrant safe and might break normal control flow so no error was emitted, but it's not.  Anyway, I'm going to send a patch for this.
Attached file Github PR for Gaia
Assignee: nobody → vyang
Attached patch patch (obsolete) — Splinter Review
This patch is generated from previous git pull request (attachment 8416849 [details] [review]) for review.  It fixes only 1), 2), and 4) I mentioned in comment 31.  However, 2) alone is sufficient for this bug.
Attachment #8416851 - Flags: review?(salva)
Attachment #8416851 - Flags: feedback?(cyang)
Whiteboard: [ft:ril][p=1]
Target Milestone: --- → 2.0 S1 (9may)
Comment on attachment 8416851 [details] [diff] [review]
patch

Review of attachment 8416851 [details] [diff] [review]:
-----------------------------------------------------------------

Just attend the comments I leave you. Thank you very much for spotting these weakness.

::: apps/costcontrol/js/config/config_manager.js
@@ +222,5 @@
>      var currentICCID = Common.dataSimIccId || NO_ICCID;
>      asyncStorage.setItem(currentICCID, JSON.stringify(settings),
>        function _onSet() {
>          requestSettings(Common.dataSimIccId, function _onSettings(settings) {
> +          if (callback) {

I would prefer this call to be at the end of the process, I mean after dispatching the events.

::: apps/costcontrol/js/message_handler.js
@@ +328,5 @@
>          var limitText = Formatting.formatData(Formatting.smartRound(limit));
>          var title = _('data-limit-notification-title2', { limit: limitText });
>          var message = _('data-limit-notification-text2');
>          NotificationHelper.send(title, message, iconURL, goToDataUsage);
> +        ConfigManager.setOption({ 'dataUsageNotified': true }, closeIfProceeds);

Nice, thank you.
Attachment #8416851 - Flags: review?(salva) → review+
Comment on attachment 8416851 [details] [diff] [review]
patch

Addressed review comments in pull request.
Attachment #8416851 - Attachment is obsolete: true
Attachment #8416851 - Flags: feedback?(cyang)
Comment on attachment 8416849 [details] [review]
Github PR for Gaia

This pull request solves the problem in my emulator box, but I still need further confirmation from you.  Thanks.
Attachment #8416849 - Flags: feedback?(cyang)
Attachment #8416849 - Flags: review+
Comment on attachment 8416849 [details] [review]
Github PR for Gaia

Sorry I've been out of the office for the last few weeks. Just pulled this patch in and stopped seeing abort errors while running my test case which rapidly sends SMS out.
Attachment #8416849 - Flags: feedback?(cyang) → feedback+
(In reply to Carol Yang [:cyang] from comment #42)
> Sorry I've been out of the office for the last few weeks. Just pulled this
> patch in and stopped seeing abort errors while running my test case which
> rapidly sends SMS out.

Never mind.  Good to know it works for you as well.  Thank you!
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: 2.0 S1 (9may) → 2.0 S2 (23may)
blocking-b2g: backlog → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: