Closed Bug 1127550 Opened 9 years ago Closed 9 years ago

[MobileMessaging] Very slow sending while retrieving messages in a thread

Categories

(Firefox OS Graveyard :: RIL, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(tracking-b2g:backlog, b2g-v2.0 unaffected, b2g-v2.1 affected, b2g-v2.2 ?, b2g-master ?)

RESOLVED WORKSFORME
tracking-b2g backlog
Tracking Status
b2g-v2.0 --- unaffected
b2g-v2.1 --- affected
b2g-v2.2 --- ?
b2g-master --- ?

People

(Reporter: julienw, Assigned: bevis)

References

Details

(Whiteboard: [grooming])

[Blocking Requested - why for this release]:

STR:
1. have a thread with a lot of messages. The thread must take ~30sec to load completely. Most of the thread loading is invisible, but this is visible when looking at the CPU usage for example.
2. open this thread.
3. send a message right away.

=> Observe that the message is not added in the thread before some seconds.

I've added some logs and I see that we don't get the "sending" event until we retrieved all messages.

I see this on a v2.1 build on my dogfooding device. This is IMO a regression from 2.0 because I never noticed this issue before upgrading my build to v2.1. That's why I'm asking a blocker status.

QAWanted to test on newer builds. If you can't make it work I can try myself with my own data.

NI Bevis to look closer at this issue.
Flags: needinfo?(btseng)
QA Contact: bzumwalt
Unable to reproduce issue on Flame 2.1, 2.2, or 3.0

Sent message in thread containing over 100 messages (including three with over 1000 words of text) while thread was loading after rebooting device. Message sent in less than 4 seconds with "Sending..." message visible from the time the send button is pressed.

Device: Flame 2.1
Build ID: 20150129003733
Gaia: 9df43e6e696da43b6a1433283dc5b155987747bc
Gecko: 095a46f32cd4
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 34.0 (2.1)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Device: Flame 2.2
Build ID: 20150129003432
Gaia: 6e494f1d2676d231abba7dcc2e2822d1170d2d02
Gecko: 5e6fac01a72f
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 37.0a2 (2.2)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

Device: Flame 3.0 Master
Build ID: 20150129010239
Gaia: 9d2378a9ef092ab1fc15c3a9f7fc4171aab59d57
Gecko: 6bfc0e1c4b29
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 38.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:38.0) Gecko/38.0 Firefox/38.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: qawanted
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
(In reply to Julien Wajsberg [:julienw] from comment #0)
> [Blocking Requested - why for this release]:
> 
> STR:
> 1. have a thread with a lot of messages. The thread must take ~30sec to load
> completely. Most of the thread loading is invisible, but this is visible
> when looking at the CPU usage for example.
Does that means DOMCursor.continue() was requested continuously unexpectedly even when the message is not visible?
> 2. open this thread.
> 3. send a message right away.
> 
> => Observe that the message is not added in the thread before some seconds.
> 
> I've added some logs and I see that we don't get the "sending" event until
> we retrieved all messages.
If DOMCursor.continue() is requested continuously until all the messages are loaded,
then it is possible that the "sending" event is sent until all messages are retrieved,
because the "sending" event will be triggered after the outing message is saved to the DB.
> 
> I see this on a v2.1 build on my dogfooding device. This is IMO a regression
> from 2.0 because I never noticed this issue before upgrading my build to
> v2.1. That's why I'm asking a blocker status.
> 
> QAWanted to test on newer builds. If you can't make it work I can try myself
> with my own data.
> 
> NI Bevis to look closer at this issue.
Flags: needinfo?(btseng) → needinfo?(felash)
Yes we call DOMCursor.continue() continuously (but still in an asynchronous way because this API is asynchronous, right?).

But we do that for a long time, I don't think we changed this between v2.0 and v2.1 :/
Flags: needinfo?(felash) → needinfo?(btseng)
(In reply to Brogan Zumwalt [:BroganZ] from comment #1)
> Unable to reproduce issue on Flame 2.1, 2.2, or 3.0

ok I'll test myself.

> 
> Sent message in thread containing over 100 messages (including three with
> over 1000 words of text) while thread was loading after rebooting device.

100 messages is likely not enough. I think I have more thousands of message in this thread. This is not unlikely, when you use a phone there are some people you communicate a lot more with.

> Message sent in less than 4 seconds with "Sending..." message visible from
> the time the send button is pressed.

4 seconds is a lot, it should be less than 1 sec, and even less than 500ms.
(In reply to Julien Wajsberg [:julienw] from comment #3)
> Yes we call DOMCursor.continue() continuously (but still in an asynchronous
> way because this API is asynchronous, right?).
> 
> But we do that for a long time, I don't think we changed this between v2.0
> and v2.1 :/

Is it possible to have the same messages loaded in v2.0 for further clarification?
This seems not reproducible by QA per comment comment1. :(
It will be helpful to know whether this is a regression or not.

Anyway, I'll try to see if I can reproduce this locally as well.
Flags: needinfo?(btseng)
Actually I used the same message set in v2.0 than in v2.1 (my personal data set).

I'll see if I can reproduce easily with one of the performance workload, NI me.
Flags: needinfo?(felash)
(In reply to Julien Wajsberg [:julienw] from comment #4)
> (In reply to Brogan Zumwalt [:BroganZ] from comment #1)
> > Unable to reproduce issue on Flame 2.1, 2.2, or 3.0
> 
> ok I'll test myself.
> 
> > 
> > Sent message in thread containing over 100 messages (including three with
> > over 1000 words of text) while thread was loading after rebooting device.
> 
> 100 messages is likely not enough. I think I have more thousands of message
> in this thread. This is not unlikely, when you use a phone there are some
> people you communicate a lot more with.
> 
> > Message sent in less than 4 seconds with "Sending..." message visible from
> > the time the send button is pressed.
> 
> 4 seconds is a lot, it should be less than 1 sec, and even less than 500ms.

It could be reasonable to have message "sent" in 4 seconds according to the network/modem condition.
But, for "sending", as you said it should be displayed as soon as possible after the outgoing message is saved.
Set owner to myself and take this change to deeply review if there is any performance issue in current MobileMessageDB implementation. :)
Assignee: nobody → btseng
(In reply to Bevis Tseng[:bevistseng][:btseng] from comment #7)

> > > Message sent in less than 4 seconds with "Sending..." message visible from
> > > the time the send button is pressed.
> > 
> > 4 seconds is a lot, it should be less than 1 sec, and even less than 500ms.
> 
> It could be reasonable to have message "sent" in 4 seconds according to the
> network/modem condition.
> But, for "sending", as you said it should be displayed as soon as possible
> after the outgoing message is saved.

ah yeah right, I misread the sentence. "sending" should be displayed quickly.
I can't really reproduce on my flames, I reproduce only on my Open C. But I didn't use the same dataset, so I'll investigate more on monday.
I am not able to reproduce this in normal use cases with more than 3K messages of a single thread in both v2.0 & m-c.

Hence, it doesn't seems to be a regression, IMO.

However, when enabling MobileMessageDB logs(which cause device performance lower), I found a bottleneck in MobileMessageDB that allows me to reproduce the symptom in comment 0 only if user send a SMS immediately right after entering a message thread. 
(The delay I saw was less than 3 seconds after send button was pressed.)

So far, I'd like to summarize what I have found here for further reference/discussion:
1. In current implementation of |createMessageCursor|, there are 2 phases to be done as followed:
   a) Collect |ALL| message ids of selected threads. 
      This requires a dedicated transaction to sms object store and this transaction will be completed 
      only after all message ids are loaded.
   b) After the invocation of |getMessages| with first |continue| triggered internally,
      whenever a message id is collected and a DOMCursor.continue() has been requested,
      another transaction will be created to retrieve the specified message or more if |readAhead| is 
      configured.
2. The bottleneck is obvious in step a), if there is a WRITE request to the DB like sending a SMS.
3. To remove the bottleneck in a), there are 2 approaches to collect message ids on demand instead of 
   retrieving them all:
   a) Define a threshold to retrieve new ids if the number of |retrieved ids| is less than this threshold:
      - we have to know the number of messages in step a) which seems achievable by |IDBIndex.count|. [1]
      - Instead of getting all message ids, whenever the number of collected ids is reduced than this 
        threshold, we start the retrieve the rest of message until the threshold is met or no more to 
        retrieved.
   b) reuse the |SmsFilter| with start/end timestamp to load the message in the range of requested 
      timestamps.
      - This requires changes in both UX/Gaia/Gecko.
      - In gecko, currently, we don't support compound queries, i.e. define index with multiple keys 
        like |threadId| and |timestamp| which seems achievable by |IDBObjectStore::createIndex| with
        multiple keypathes applied. [2] 
      - In UX/Gaia, we could consider to load the messages in a thread within 1 week, 1 month, etc.

NI Julien to double confirm if this finding is the root cause of the symptom in comment 0:
1. How many messages are stored in the same thread?
2. Is it only reproducible if user send the sms immediately right after selecting a thread?
   (IMO, this is not a normal use case, because normal user always takes time to input something before 
    sending, unless there's already a draft to be sent.)
3. Anything other suspicious points?

If the problem looks different from your side, may I have your help to collect adb logcat with DEBUG = true in MobileMessageDB.jsm for further analysis?
https://dxr.mozilla.org/mozilla-central/source/dom/mobilemessage/gonk/MobileMessageDB.jsm#25

Thanks!

[1] https://developer.mozilla.org/en-US/docs/Web/API/IDBIndex.count
[2] http://www.w3.org/TR/IndexedDB/#widl-IDBObjectStore-createIndex-IDBIndex-DOMString-name-DOMString-sequence-DOMString--keyPath-IDBIndexParameters-optionalParameters
(In reply to Bevis Tseng[:bevistseng][:btseng] from comment #11)
> 
> However, when enabling MobileMessageDB logs(which cause device performance
> lower), I found a bottleneck in MobileMessageDB that allows me to reproduce
> the symptom in comment 0 only if user send a SMS immediately right after
> entering a message thread. 

> 2. Is it only reproducible if user send the sms immediately right after
> selecting a thread?
>    (IMO, this is not a normal use case, because normal user always takes
> time to input something before 
>     sending, unless there's already a draft to be sent.)

yes, that's exactly this.

I have a lot of messages in my thread (I'll try to count them later, so I'm keeping the needinfo) and it looks like it takes time to load. Maybe my inbox is not retrieved yet either (I also have a lot of different conversations), and maybe both loads compete with each other :/



It looks like a regression for me because I haven't seen this ever when I was dogfooding my Flame v2.0, but I see this while dogfooding the Open C on v2.1.
(In reply to Julien Wajsberg [:julienw] from comment #12)
> It looks like a regression for me because I haven't seen this ever when I
> was dogfooding my Flame v2.0, but I see this while dogfooding the Open C on
> v2.1.

This is weird to me because there is no only one minor change from v2.0 to v2.1 in MobileMessageDB and
that is bug 878533 which doesn't cause any performance impact. :-(

Since there is no DB version changed from v2.0 to v2.1,
Is it possible to clarify with the same data in Open C on v2.1 to be loaded on your Flame v2.0?
This is what I want to try today, yes :)
BTW, I am also interested in how long the delay of "sending" displayed is with how many messages you have in the same thread. :)
The reason to ask is that I only saw less than 3 second-delay with 3K messages only if I enable the log in MobileMessageDB.jsm in both v2.0 and m-c.

Thanks!
I have 169 threads and 1869 messages in the thread where I see the issue.

And I still don't reproduce on my Flame (v2.1).

I just flashed a new version of v2.1 on my Open C, I'll see if I reproduce in the next days. I did a first try and I don't, but maybe the STR is not what I thought initially. Clearing nom for now.
blocking-b2g: 2.1? → ---
Flags: needinfo?(felash)
blocking-b2g: --- → backlog
Whiteboard: [grooming]
Bevis, it seems that I don't see this issue anymore now that I use a more recent v2.1 build.

If I see this again I'll reopen but let's close for now.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
(In reply to Julien Wajsberg [:julienw] from comment #17)
> Bevis, it seems that I don't see this issue anymore now that I use a more
> recent v2.1 build.
> 
> If I see this again I'll reopen but let's close for now.

Sure, thanks for your information! :)
blocking-b2g: backlog → ---
You need to log in before you can comment on or make changes to this bug.