Closed Bug 936703 Opened 11 years ago Closed 10 years ago

Error in call screen causes tests to fail

Categories

(Firefox OS Graveyard :: Gaia::Cost Control, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:1.4+, b2g-v1.4 fixed, b2g-v2.0 fixed)

RESOLVED FIXED
1.4 S4 (28mar)
blocking-b2g 1.4+
Tracking Status
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: mschwart, Assigned: vicamo)

References

Details

(Keywords: regression)

Attachments

(3 files, 4 obsolete files)

After initiating an MO call, the following error occurs:

E/GeckoConsole(  128): [JavaScript Error: "TypeError: self.hideKeyboardImmediately is not a function" {file: "app://system.gaiamobile.org/js/keyboard_manager.js" line: 410}]

This causes our automated tests for call handling to fail.
Adding koi as it's failing our regression tests.
blocking-b2g: --- → koi?
Looks like it may be a merge error from Bug 907625.
Flags: needinfo?(alive)
Flags: needinfo?(alive) → needinfo?(tzhuang)
Assignee: nobody → tzhuang
Flags: needinfo?(tzhuang)
Component: Gaia::Dialer → Gaia::Keyboard
Hi Michael,

Could you provide your gaia and gecko version of testing environment?
I cannot reproduce it on my ZTE Open.

 Gaia:     cc9fec48957c9d217221d403aac4b5f35a826dd6                           
 Gecko:    http://hg.mozilla.org/mozilla-central/rev/16949049f03d             
 BuildID   20131110040200                                                     
 Version   28.0a1

Steps:
1. Make an MO call from testing device
2. Device display call screen as usual (I didn't see any error message from keyboard_manager.js)

Thanks
Flags: needinfo?(mschwart)
https://github.com/mozilla-b2g/gaia/blob/a32e81bf56a0326dfcadcacf176f48a682a0dce4/apps/system/js/keyboard_manager.js#L399-L421

I don't understand what could possibly go wrong here. It's not possible to overwrite |self| in this function, and |hideKeyboardImmediately| indeed exists.

Please koi? again after figuring out the version used here. Might not be an issue in the open sourced codebase.
blocking-b2g: koi? → ---
Thanks for checking Tim.  It looks like that error was due to a merge issue (must have sync'd only part of a number of patches).

I am still getting AbortError which is an error we used to get before this new one popped up.  Setup a conference call to 3 destinations then end them all at the same time.  Do you have any recommendations on how to hunt down the source of an AbortError?

Coincidentally, the error is most often accompanied by these messages:

E/GeckoConsole(  125): [JavaScript Error: "stack is undefined" {file: "app://system.gaiamobile.org/js/trusted_ui.js" line: 293}]
I/Gecko   (  125): -*- QCContentHelper_QC_B2G: receiveMessage: 'child-process-shutdown' arrived from content process
I/Gecko   (  125): -*- QCContentHelper_QC_B2G: Unregistered icc target: [object ChromeMessageSender]
I/Gecko   (  125): -*- QCContentHelper_QC_B2G: Unregistered mobileconnection target: [object ChromeMessageSender]
E/GeckoConsole(  125): [JavaScript Error: "AbortError"]
E/GeckoConsole(  125): [JavaScript Error: "AbortError"]
E/GeckoConsole(  463): [JavaScript Warning: "Error in parsing value for 'font-size'.  Declaration dropped." {file: "app://communications.gaiamobile.org/dialer/index.html#keyboard-view" line: 0 column: 0 sour
ce: "40"}]
Flags: needinfo?(mschwart)
blocking-b2g: --- → 1.3?
(In reply to Michael Schwartz [:m4] from comment #5)
> Thanks for checking Tim.  It looks like that error was due to a merge issue
> (must have sync'd only part of a number of patches).
> 
> I am still getting AbortError which is an error we used to get before this
> new one popped up.  Setup a conference call to 3 destinations then end them
> all at the same time.  Do you have any recommendations on how to hunt down
> the source of an AbortError?

AbortError is related to IndexedDB; it looks like the automation flicks the UI too fast and caused the in-flight IndexedDB actions to be aborted. Unfortunately no file name is printed here so we cannot tell anything here, let alone figuring out if it's related. But with an educated guess I suspect this is related to call log.

Mozilla developers cannot debug local builds which we have no access too. Debugging the automated tests will be even harder since a certain timing is required to trigger the bug...
Assignee: tzhuang → nobody
Component: Gaia::Keyboard → Gaia::Dialer
Keywords: regression
QAWANTED to confirm this on Mozilla builds
Keywords: qawanted
(In reply to Joe Cheng [:jcheng] from comment #7)
> QAWANTED to confirm this on Mozilla builds

Mozilla QA isn't going to be able to help here - there's nothing even close to a hint of an end-user STR here. We need our partner to give a clearer path to reproduce to move forward here.
Flags: needinfo?(mschwart)
Keywords: qawanted
Error persists.  On a v1.3 build:

# Conference 3 MO calls
# Press End

All calls are ended as expected but AbortError is produced.  Same behaviour using Moz or Ref RIL.
Flags: needinfo?(mschwart)
etienne, do you have suggestions on how to move this forward? Thanks
Flags: needinfo?(etienne)
blocking-b2g: 1.3? → 1.3+
Greg,

Can you please here? There is nothing Moz is able to do here.
Flags: needinfo?(ggrisco)
Not going to block on this.  If we get a patch landed in master we can consider an uplift to v1.3 at that time.
blocking-b2g: 1.3+ → ---
(In reply to Preeti Raghunath(:Preeti) from comment #11)
> Greg,
> 
> Can you please here? There is nothing Moz is able to do here.

Hi Preeti, there is an error being produced on a 1.3 build with Moz RIL and we're looking for assistance on debugging an error.  Why is there nothing that Moz can do?
Flags: needinfo?(praghunath)
I heard that AbortError could be a DB failure which can happen if there are too many accesses at once.  That sound possible in this case since 3 calls are being ended all at once.  Does that sound possible to folks in the know?
Clearing. way out of my league :)
Flags: needinfo?(etienne)
Clearing my ni since m4 has provided some leads.
Flags: needinfo?(ggrisco)
Is there any update on this issue? I am now also seeing the abort error when sending an outgoing SMS.
I have very easy steps with the Mozilla ril.  Steps:

# Initiate an outgoing call
# Before the call is connected, the call is still in the dialing state, end the call

It is reproducible every time on a 1.3 build:

Gecko: ec0b1c0b54d78a052b499ff47e86fa0f0726ad17
Gaia: 77cc5367781d5770b4d94d8746cf4f5ecd80d190

This is really limiting our testing and needs to be addressed.
(In reply to Michael Schwartz [:m4] from comment #18)
> I have very easy steps with the Mozilla ril.  Steps:
> 
> # Initiate an outgoing call
> # Before the call is connected, the call is still in the dialing state, end
> the call
> 
> It is reproducible every time on a 1.3 build:
> 
> Gecko: ec0b1c0b54d78a052b499ff47e86fa0f0726ad17
> Gaia: 77cc5367781d5770b4d94d8746cf4f5ecd80d190
> 
> This is really limiting our testing and needs to be addressed.

QA Wanted - Can someone try to reproduce this on Buri to confirm this?
Keywords: qawanted
Moving to 1.4 since it blocks 1.4 CS
blocking-b2g: --- → 1.4+
Flags: needinfo?(praghunath)
I am not able to reproduce the 'AbortError' using the 02/21/14 1.3 build on a Buri. 

I attempted this using the STR from comment 3 and comment 18.

Device: Buri v1.3 MOZ RIL
BuildID: 20140221004002
Gaia: 8039a5cb7519adfa81677df577f494c6a4de6140
Gecko: e5f25becc0e7
Version: 28.0
Firmware Version: V1.2-device.cfg
Keywords: qawanted
QA Contact: mvaughan
We've tried multiple times here to reproduce this, but we still can't repro this in house. Without any actionable STR here after multiple attempts, I don't think we have a path forward to resolving this bug. 

Clearing the blocking flag here - we've not blocked on this previously on 1.2 & 1.3 for an inability to reproduce this bug in house & the same issue has happened again in 1.4. Please renominate with clear reproducible STR & user impact specified when that's known.
blocking-b2g: 1.4+ → ---
blocking-b2g: --- → 1.4+
m4, can you please attach a script that demonstrates the issue
Flags: needinfo?(mschwart)
Flags: needinfo?(mschwart)
(In reply to Jason Smith [:jsmith] from comment #22)
> We've tried multiple times here to reproduce this, but we still can't repro
> this in house. Without any actionable STR here after multiple attempts, I
> don't think we have a path forward to resolving this bug. 

I can re-produce this with ICS ARM emulator:

  [JavaScript Error: "IndexedDB" UnknownErr: IDBTransaction.cpp:863]

That dialer app always exits automatically after this message.  Although it's not an AbortError, the line number is the same with that in Michael's demonstration.
I accidentally find more similar error messages in running mobilemessage marionette test cases in emulator:

  E/GeckoConsole(   44): [JavaScript Error: "IndexedDB UnknownErr: IDBTransaction.cpp:863"]
  E/GeckoConsole(   44): [JavaScript Error: "AbortError"]
  E/GeckoConsole(   44): [JavaScript Error: "IndexedDB UnknownErr: IDBTransaction.cpp:863"]
Vicamo

Next steps here? Do we need anything from QC?
Flags: needinfo?(vyang)
(In reply to Preeti Raghunath(:Preeti) from comment #26)
> Vicamo
> 
> Next steps here? Do we need anything from QC?

I think not.  I can find similar error messages in every captured emulator log.
Flags: needinfo?(vyang)
Assigned to Vicamo, per active participation. Please fix by 3/24
Assignee: nobody → vyang
After some more investigation, the situation goes back to comment 22.  The process termination I saw is not a crash.  It's actually Usage app.

So basically:

  1) Nuwa creates preallocated slave process as pid 163
  2) pid 163 becomes Dialer app
  3) Nuwa creates another preallocated slave process as pid 343
  4) (dial & disconnect)
  5) pid 343 becomes Usage app
  6) Nuwa creates another preallocated slave process as pid 543
  7) pid 343 ends.  No crash.

In the end, I got pid 163 (Dialer) and 543 (preallocated).  No one crashes ever.

By the way, we have already a test case, test_outgoing_hangup_alerting.js [1], in effect for this scenario. Although it only runs in in-process mode now, it also passes in OOP mode.

[1]: http://mxr.mozilla.org/mozilla-central/source/dom/telephony/test/marionette/test_outgoing_hangup_alerting.js
Hi Michael Schwartz,

Maybe we need you to provide something more now.  Crash dump?  I wonder maybe we saw the same thing that day, because I kind of remember that when call screen vanished, the dialer screen was still there.  Since dialer doesn't create new process, that somehow follows that dialer app didn't crash.  I'll have a test on Unagi as well.  Thank you.
Flags: needinfo?(mschwart)
Unagi has the same result.
Hi Vicamo, I don't think I'm necessarily seeing a crash in any of the cases.  The AbortError just causes Marionette execution to stop.
Flags: needinfo?(mschwart)
Hi Vicamo

Do we need any help here?
Flags: needinfo?(vyang)
Attached file Github PR for Gaia
There is a small chance that WindowManager terminates CostControl app when it's still processing IndexedDB calls. This pull request add an optional |asyncStorage.oncomplete| callback slot so that we have a chance to delay |Common.closeApplication()| call in it.
Attachment #8394068 - Flags: feedback?(mschwart)
Flags: needinfo?(vyang)
Component: Gaia::Dialer → Gaia::Cost Control
m4 -- Can you please verify if these patches help?
Flags: needinfo?(mschwart)
I'm trying not to get too excited but this appears to fix the issue.  I tested the simple str (initiate a MO call and end it before it connects) as well as the more complicated str (establish a conf call with 3 call legs; end all calls).  Next step is to bring the test code back up to par but I can reopen if I find an issue then - looks good now.
Flags: needinfo?(mschwart)
Comment on attachment 8394076 [details] [diff] [review]
part 1/2: add optional asyncStorage.oncomplete callback

I think I'm not the one to review this part of the patch but I did it and you have my f+.
Attachment #8394076 - Flags: review?(salva) → feedback+
Comment on attachment 8394077 [details] [diff] [review]
part 2/2: close costcontrol app after all db transactions are completed

You have my comments on GitHub, please ask for my review once addressed. Thank you very much for these patches.
Attachment #8394077 - Flags: review?(salva)
Attachment #8394068 - Flags: feedback?(mschwart) → feedback+
Carol tested this fix with an outgoing SMS abort error she was seeing and the fix seems to have resolved that error as well. Thank you very much Vicamo.
Comment on attachment 8394076 [details] [diff] [review]
part 1/2: add optional asyncStorage.oncomplete callback

Hi David,

I traced back to commit history of "gaia/shared/js/asyncStorage.js" and found it was created and developed by you in 2012.  So, may I have your review?
Attachment #8394076 - Flags: review?(dflanagan)
Bug 960861 makes emulator always in airplane mode.  Will rebase after that being reverted.
Address review comment.
Attachment #8394077 - Attachment is obsolete: true
Attachment #8396269 - Flags: review?(salva)
Sorry, picked a wrong patch.
Attachment #8396269 - Attachment is obsolete: true
Attachment #8396269 - Flags: review?(salva)
Attachment #8396270 - Flags: review?(salva)
Comment on attachment 8396270 [details] [diff] [review]
part 2/2: close costcontrol app after all db transactions are completed : v3

Thank you Vicamo.
Attachment #8396270 - Flags: review?(salva) → review+
Comment on attachment 8394076 [details] [diff] [review]
part 1/2: add optional asyncStorage.oncomplete callback

Review of attachment 8394076 [details] [diff] [review]:
-----------------------------------------------------------------

It would be better if we didn't have to add a new feature like this.  Is there actually any real concern about data loss here with the Usage app exiting while transactions are pending, or would it be acceptable to just catch the AbortErrors in async_storage and convert them to console.error() messages?  If you can fix the failing tests by just adding a db.onabort handler in async_storage.js that seems like it would be simpler all around.

On the other hand, it is kind of cool to have this new feature in async storage, and I'm willing to give r+ with the issues below addressed.

::: shared/js/async_storage.js
@@ +73,5 @@
>  
> +  function withStore(type, f) {
> +    withDatabase(function() {
> +      var txn = db.transaction(STORENAME, type);
> +      txn.oncomplete = function() {

Doesn't the oncomplete event bubble up to the database itself? Rather than setting oncomplete on each transaction, couldn't you set it on the db?  That would simplify things.

Also, I don't understand why you don't increment pendingTransactions here. From the comments below it looks like there is a reason, but I don't understand it.  

If you can do the increment here, please do it here. If you can't, please add a comment explaining why you can't do it here and explaining where the increment is done.

@@ +76,5 @@
> +      var txn = db.transaction(STORENAME, type);
> +      txn.oncomplete = function() {
> +        --pendingTrasactions;
> +        if (!pendingTrasactions && asyncStorage.oncomplete) {
> +          asyncStorage.oncomplete();

since the asyncStorage object is global, it seems problematic to have only a single oncomplete callback.  Different parts of the same app might both want to use it and overwrite each other's callbacks.

On the other hand, a full addEventListener()/removeEventListener() solution seems like overkill.

What is the general use-case for this?  Will apps want to be notified repeatedly, or is it more likely that they will want a one-time notification.  If just one time, consider adding a method named whenDone() or something. It would take a function argument and add it to an array of functions.  When complete, call all of the functions and clear the array.  If whenDone() is called while nothing is pending, it would just invoke the function right away.  I that would give you cleaner code in the other part of the patch.

Also, I think you should wrap the call to the callback in try/catch and do an console.error() message if you catch an exception.

@@ +102,5 @@
>        req.onerror = function getItemOnError() {
>          console.error('Error in asyncStorage.getItem(): ', req.error.name);
>        };
> +      // Increase pendingTrasactions after the request has been made to prevent
> +      // unexpected increment.

This comment is repeated multiple times below. But I don't know what it means.  Why can't you increment in the utility functions above?  What would be unexpected about incrementing there?

@@ +213,5 @@
>      });
>    }
>  
>    return {
> +    isPending: isPending,

I think hasPendingTransactions() would be a better name for this method because it is more explicit.
Attachment #8394076 - Flags: review?(dflanagan) → review-
(In reply to David Flanagan [:djf] from comment #47)
> Comment on attachment 8394076 [details] [diff] [review]
> part 1/2: add optional asyncStorage.oncomplete callback
> 
> Review of attachment 8394076 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It would be better if we didn't have to add a new feature like this.  Is
> there actually any real concern about data loss here with the Usage app
> exiting while transactions are pending, or would it be acceptable to just
> catch the AbortErrors in async_storage and convert them to console.error()
> messages?  If you can fix the failing tests by just adding a db.onabort
> handler in async_storage.js that seems like it would be simpler all around.
> 
> On the other hand, it is kind of cool to have this new feature in async
> storage, and I'm willing to give r+ with the issues below addressed.
> 
> ::: shared/js/async_storage.js
> @@ +73,5 @@
> >  
> > +  function withStore(type, f) {
> > +    withDatabase(function() {
> > +      var txn = db.transaction(STORENAME, type);
> > +      txn.oncomplete = function() {
> 
> Doesn't the oncomplete event bubble up to the database itself? Rather than
> setting oncomplete on each transaction, couldn't you set it on the db?  That
> would simplify things.

http://www.w3.org/TR/IndexedDB/#database-interface
http://dxr.mozilla.org/mozilla-central/source/dom/webidl/IDBDatabase.webidl
IDBDatabase: onabort, onerror, onversionchange

http://www.w3.org/TR/IndexedDB/#object-store
http://dxr.mozilla.org/mozilla-central/source/dom/webidl/IDBObjectStore.webidl
IDBObjectStore: (none)

> Also, I don't understand why you don't increment pendingTransactions here.
> From the comments below it looks like there is a reason, but I don't
> understand it.  
> 
> If you can do the increment here, please do it here. If you can't, please
> add a comment explaining why you can't do it here and explaining where the
> increment is done.

Considering transaction callback functions may also throw and abort the process, I thought we should only increase that pending count at the end of the callback.  However, after some more thoughts, even they're aborted or partially aborted, the transaction is still valid and its oncomplete event will still be called.  I should really have moved that increment line here.  Thank you.

> @@ +76,5 @@
> > +      var txn = db.transaction(STORENAME, type);
> > +      txn.oncomplete = function() {
> > +        --pendingTrasactions;
> > +        if (!pendingTrasactions && asyncStorage.oncomplete) {
> > +          asyncStorage.oncomplete();
> 
> since the asyncStorage object is global, it seems problematic to have only a
> single oncomplete callback.  Different parts of the same app might both want
> to use it and overwrite each other's callbacks.
> 
> On the other hand, a full addEventListener()/removeEventListener() solution
> seems like overkill.
> 
> What is the general use-case for this?

There is no general use case now.  At lease I don't know.  The one trapped by this problem is our partner, although data loss should absolutely an issue for Gaia in my point of view.

> Will apps want to be notified
> repeatedly, or is it more likely that they will want a one-time
> notification.  If just one time, consider adding a method named whenDone()
> or something. It would take a function argument and add it to an array of
> functions.  When complete, call all of the functions and clear the array. 

I'm not quite sure what does that "one-time" mean.  There can be as many transactions as it can be.  Each transaction can be the last one that blocks app termination, and each oncomplete event of the transactions can trigger calls to whenDone callbacks.  Clearing that array means you have to re-register them.

> If whenDone() is called while nothing is pending, it would just invoke the
> function right away.  I that would give you cleaner code in the other part
> of the patch.

1) App calls whenDone(func) to register a callback.
2) No db operation pending, so let's invoke func directly. func is never ever registered at all.
3) App invoke db operations
4) When transaction complete, nothing happens. This bug remains open.

Somehow the discuss here leads me to a deeper layer.  Maybe I should rewrite that async_storage.js and call callback in transaction oncomplete event, instead of request onsuccess event.  After all, this is suggested in W3C IndexedDB document [1].

[1]: http://www.w3.org/TR/IndexedDB/#widl-IDBTransaction-oncomplete

> Also, I think you should wrap the call to the callback in try/catch and do
> an console.error() message if you catch an exception.

Sure.

> @@ +213,5 @@
> >      });
> >    }
> >  
> >    return {
> > +    isPending: isPending,
> 
> I think hasPendingTransactions() would be a better name for this method
> because it is more explicit.

Nice.
Attached patch patchSplinter Review
Hi David, Salvador, Michael,

This patch is a completely rewrite, so, Michael, I need your feedback again.

The reason we had all these problem is we tried to call |Common.closeApplication()| when IndexedDB transactions are not necessarily completed.  That's really not a recommended way to manipulate IndexedDB.  By invoking callback on transaction complete event, we can ensure all calls to asyncStorage APIs will only get called back after its binding transaction has been completed, so we're certainly free to call |Common.closeApplication()| then.

One problem(?) is cascaded db operations.  For example:

  001  asyncStorage.getItem(key, function(value){
  002    ...
  003    asyncStorage.setItem(key, value, function() {
  004      ...
  005    });
  006  });

We have all transactions completed at line 2 and 4.  So we have an extreme case: if somebody just calls |Common.closeApplication()| in line 2, we may still have the same problem here.  That doesn't make any sense to me, so let's hope nobody will ever commit such lines.

Another more interesting case is racing condition between system message handlers.  For example, we may have:

  1) system message 'sms-received' got
    1.1) clearTimeout
    1.2) ConfigManager.requestAll (callback_1)
  2) system message 'sms-received' got, again
    2.1) clearTimeout
    2.2) ConfigManager.requestAll (callback_2)
  3) callback_1 gets called
    3.1) call to Common.closeApplication
    3.2) invalidate database
  4) AbortError on dispatching requests for callback_2

I'm not quite sure if this deserves a fix at this time.  Salvador, this might be a question for you :)
Attachment #8394076 - Attachment is obsolete: true
Attachment #8396270 - Attachment is obsolete: true
Attachment #8397665 - Flags: review?(dflanagan)
Attachment #8397665 - Flags: feedback?(mschwart)
Flags: needinfo?(salva)
Comment on attachment 8397665 [details] [diff] [review]
patch

Review of attachment 8397665 [details] [diff] [review]:
-----------------------------------------------------------------

I like this! Callbacks on the transaction make more sense than on the requests. Consider doing the onerror handling on the transaction as well.  That way you could catch any AbortErrors that did occur in the corner cases you described.  Then you'd have a withStore() method that took three method arguments: withStore(type, request, success, error).

I am also scared of changing this file in 1.4 without stabilization time.  

Have you run the tests in apps/gallery/test/unit/asyncStorage_test.js? r+ is contingent on the tests passing, obviously.

I've posted to dev-gaia about this change, please check that thread to see if there are any objections before landing.

After landing this change to master, consider waiting a day or two for stabilization before uplifting to 1.4.  Or for 1.4, consider making a private copy of asyncStorage.js and only uplifting it to the one affected app.
Attachment #8397665 - Flags: review?(dflanagan) → review+
Comment on attachment 8397665 [details] [diff] [review]
patch

I tested the simple initiate a call and end case - no AbortError.
Attachment #8397665 - Flags: feedback?(mschwart) → feedback+
Blocks: 989179
(In reply to David Flanagan [:djf] from comment #50)
> Review of attachment 8397665 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I like this! Callbacks on the transaction make more sense than on the
> requests. Consider doing the onerror handling on the transaction as well. 

Like onsuccess/oncomplete issues we have here, "transaction.onerror" event may be fired multiple times depending on the implementation of the event listeners.  For me, a better candidate will be "transaction.onabort", which ensures that transaction must has been aborted.

Anyway, all these event listeners have their own meanings and whether should we take any or both of them remains a question, especially before a total examination of all use scenarios of asyncStorage helper.

> That way you could catch any AbortErrors that did occur in the corner cases
> you described.  Then you'd have a withStore() method that took three method
> arguments: withStore(type, request, success, error).

Filed bug 989179.  I'm not going to include error handling in this bug because it's not already there and I'd better not add new lines in a 1.4+ bug.  Besides, error handling should be done by developers familiar with their own components because there will be a ton of decisions to make.

> I am also scared of changing this file in 1.4 without stabilization time.  
> Have you run the tests in apps/gallery/test/unit/asyncStorage_test.js? r+ is
> contingent on the tests passing, obviously.

It's already included in Travis test https://travis-ci.org/mozilla-b2g/gaia/builds/21654468

> I've posted to dev-gaia about this change, please check that thread to see
> if there are any objections before landing.

For those would like to follow: https://groups.google.com/d/msg/mozilla.dev.gaia/ENmu2JgBX70/D6paZv9qJoIJ
https://github.com/mozilla-b2g/gaia/commit/de6c28247b128fcbd2bc8172fd77ced3eeb5872c
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Hi Vicamo,

Sorry to chime in kind of late but I am seeing a similar issue when sending many MO-SMS messages very quickly in succession. I had pulled in your patches from comment 34 through 36 when you first uploaded them and saw that the problem went away. Since then, I have not revisited this issue as I assumed your latest patch would also fix this but it doesn't seem to be the case. Running on Moz RIL or reference RIL, I see the following when rapidly sending SMS messages out:

11318:04-08 14:18:40.654   224   224 E GeckoConsole: [JavaScript Error: "AbortError"]

I'd imagine the fix would be similar to what you already made. Should I create a separate bugzilla to track this?
Flags: needinfo?(vyang)
Sorry for the delay.

(In reply to Vicamo Yang [:vicamo][:vyang] from comment #49)
> One problem(?) is cascaded db operations.  For example:
> 
>   001  asyncStorage.getItem(key, function(value){
>   002    ...
>   003    asyncStorage.setItem(key, value, function() {
>   004      ...
>   005    });
>   006  });
> We have all transactions completed at line 2 and 4.  So we have an extreme
> case: if somebody just calls |Common.closeApplication()| in line 2, we may
> still have the same problem here.  That doesn't make any sense to me, so
> let's hope nobody will ever commit such lines.

We can avoid this kind of problems by ensuring we only call close or even, we actually close only if there are no pending transactions. In case they are, we should only try to close in a while. Something like:

function closeApplication() {
  if (noMoreTransactions) { window.close(); }
  else { setTimeout(closeApplication, 100); }
}

WDYT?

>
> Another more interesting case is racing condition between system message
> handlers.  For example, we may have:
> 
>   1) system message 'sms-received' got
>     1.1) clearTimeout
>     1.2) ConfigManager.requestAll (callback_1)
>   2) system message 'sms-received' got, again
>     2.1) clearTimeout
>     2.2) ConfigManager.requestAll (callback_2)
>   3) callback_1 gets called
>     3.1) call to Common.closeApplication
>     3.2) invalidate database

I think same solution may solve this as well. Or not?
>   4) AbortError on dispatching requests for callback_2
> 
> I'm not quite sure if this deserves a fix at this time.  Salvador, this
> might be a question for you :)
Flags: needinfo?(salva)
(In reply to Carol Yang [:cyang] from comment #57)
> I'd imagine the fix would be similar to what you already made. Should I
> create a separate bugzilla to track this?

Please file a new one.  Your STR can be really helpful.  Thank you :)

(In reply to Salvador de la Puente González [:salva] from comment #58)
> Sorry for the delay.
> 
> (In reply to Vicamo Yang [:vicamo][:vyang] from comment #49)
> > One problem(?) is cascaded db operations.  For example:
> > 
> >   001  asyncStorage.getItem(key, function(value){
> >   002    ...
> >   003    asyncStorage.setItem(key, value, function() {
> >   004      ...
> >   005    });
> >   006  });
> > We have all transactions completed at line 2 and 4.  So we have an extreme
> > case: if somebody just calls |Common.closeApplication()| in line 2, we may
> > still have the same problem here.  That doesn't make any sense to me, so
> > let's hope nobody will ever commit such lines.
> 
> We can avoid this kind of problems by ensuring we only call close or even,
> we actually close only if there are no pending transactions. In case they
> are, we should only try to close in a while. Something like:
> 
> function closeApplication() {
>   if (noMoreTransactions) { window.close(); }
>   else { setTimeout(closeApplication, 100); }
> }
> 
> WDYT?
> 
> > Another more interesting case is racing condition between system message
> > handlers.  For example, we may have:
> > 
> >   1) system message 'sms-received' got
> >     1.1) clearTimeout
> >     1.2) ConfigManager.requestAll (callback_1)
> >   2) system message 'sms-received' got, again
> >     2.1) clearTimeout
> >     2.2) ConfigManager.requestAll (callback_2)
> >   3) callback_1 gets called
> >     3.1) call to Common.closeApplication
> >     3.2) invalidate database
> 
> I think same solution may solve this as well. Or not?

It's all about how do we design that "noMoreTransactions" flag.  Maybe we can move the discussion to Carol's new bug.
Flags: needinfo?(vyang) → needinfo?(cyang)
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #59)
> (In reply to Carol Yang [:cyang] from comment #57)
> > I'd imagine the fix would be similar to what you already made. Should I
> > create a separate bugzilla to track this?
> 
> Please file a new one.  Your STR can be really helpful.  Thank you :)

Thanks Vicamo! Created bug 994819.
Flags: needinfo?(cyang)
I've identified this commit into master as causing app launches to freeze on the on-device Python testing framework. 
It seems to be related to starting the app rapidly after the b2g process is started. A race condition, in other words.
The test scenario loads the Clock app which freezes on the loading section (black with the app's icon in the middle). We've replicated it while launching other apps too.
Blocks: 990580
See Also: → 994819
(In reply to Zac C (:zac) from comment #61)
> I've identified this commit into master as causing app launches to freeze on
> the on-device Python testing framework. 
> It seems to be related to starting the app rapidly after the b2g process is
> started. A race condition, in other words.
> The test scenario loads the Clock app which freezes on the loading section
> (black with the app's icon in the middle). We've replicated it while
> launching other apps too.

Thanks a million, Zac.  :jsmith, can we get help in the form of traction with this?
Flags: needinfo?(jsmith)
I pinged Vicamo on bug 990580 about this.
Flags: needinfo?(jsmith)
(In reply to Jason Smith [:jsmith] from comment #63)
> I pinged Vicamo on bug 990580 about this.

Thanks! vicamo I recommend we do a immediate backout here to fix the qablocker bug 990580
Just to place a clarification that bug 990580 has been resolved and there is no more action to take in this bug.
Flags: in-moztrap?(smiko)
Test case exists to address this issue: https://moztrap.mozilla.org/manage/case/2175/
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(smiko)
Flags: in-moztrap+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: