B2G SMS: getMessages cursor is slow when SMS database contains large number of messages

RESOLVED FIXED in Firefox 21

Status

()

defect
P1
normal
RESOLVED FIXED
7 years ago
6 years ago

People

(Reporter: borjasalguero, Assigned: vicamo)

Tracking

({dev-doc-needed, perf})

unspecified
B2G C4 (2jan on)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:tef+, blocking-basecamp:-, firefox19 wontfix, firefox20 wontfix, firefox21 fixed, b2g18 fixed, b2g18-v1.0.0 fixed)

Details

Attachments

(8 attachments, 32 obsolete attachments)

3.15 KB, text/plain
Details
12.93 KB, text/plain
Details
14.93 KB, patch
vicamo
: review+
Details | Diff | Splinter Review
130.92 KB, text/plain
Details
23.53 KB, patch
vicamo
: review+
Details | Diff | Splinter Review
8.04 KB, patch
vicamo
: review+
Details | Diff | Splinter Review
7.07 KB, patch
vicamo
: review+
Details | Diff | Splinter Review
12.13 KB, patch
vicamo
: review+
Details | Diff | Splinter Review
(Reporter)

Description

7 years ago
As we have 'filters' in order to get 'all messages' given a phone number, we would need to retrieve a fixed number of items in order to optimize the rendering in future/current SMS Apps.
(Reporter)

Updated

7 years ago
Blocks: 806592
(Reporter)

Updated

7 years ago
blocking-basecamp: --- → ?

Updated

7 years ago
Keywords: perf
Borja: Has anybody measured that this is actually a problem? I.e. can we stick 5000 messages in the database and see how long the API call takes? Note that just because you receive 5000 messages doesn't mean that you have to immediately render them all. So could someone do a measurement of *just* the API call which returns 5000 messages.
Need data here before we make a blocking call.
Flags: needinfo?(fbsc)
We designed SmsCursor in a way that allows the implementation to not load the full list of messages in memory when getMessages() is called. IOW, it shouldn't be significantly slower to get a lot of messages. If it is slow, we should fix the B2G backend of WebSMS.

But as Jonas said, does the SMS app just tries to do pagination?
OS: Gonk (Firefox OS) → All
Hardware: ARM → All
(Reporter)

Comment 4

7 years ago
I've tested in the patch which I was working with Ben in SF, and with a DB with 5000 threads, 50 messages each, getting the messages of a given number takes 2 seconds at least :S! So we have the following:
- In a crowded DB, getting a small portion of Data is sooooooo slow, could we optimize it?
- In a DB with only 10 threads, but 5000 messages each, the request would be fast if instead retrieving 5000 messages I retrieve 25? 
Thanks!
Flags: needinfo?(fbsc)
(Reporter)

Comment 5

7 years ago
And one thing more. I was checking how many messages we send/receive though an App as 'What's up', and the average is about 10000!!! So this case is not an 'edge case', somebody could have this number of messages (overall because some carriers have free sms) :S
(In reply to Borja Salguero [:borjasalguero] from comment #4)
> I've tested in the patch which I was working with Ben in SF, and with a DB
> with 5000 threads, 50 messages each, getting the messages of a given number
> takes 2 seconds at least :S!

I can't understand. Do you mean that getting N messages always takes 2 seconds regardless of the value of N?

> So we have the following:
> - In a crowded DB, getting a small portion of Data is sooooooo slow, could
> we optimize it?

What did you test exactly? If you have thousands of messages and the DB isn't indexed on threads, calling getThread() will be slow even if you get a few messages in return.

> - In a DB with only 10 threads, but 5000 messages each, the request would be
> fast if instead retrieving 5000 messages I retrieve 25? 

I'm not sure I understand this... Did you try to only get the 25 first messages?
Flags: needinfo?(fbsc)
(Reporter)

Comment 7

7 years ago
(In reply to Mounir Lamouri (:mounir) from comment #6)
> (In reply to Borja Salguero [:borjasalguero] from comment #4)
> > I've tested in the patch which I was working with Ben in SF, and with a DB
> > with 5000 threads, 50 messages each, getting the messages of a given number
> > takes 2 seconds at least :S!
> 
> I can't understand. Do you mean that getting N messages always takes 2
> seconds regardless of the value of N?
> 

In this case, as you can read, there are 5000 threads with 50 messages each, so N=50 in every case. If you like we could create a patch with a better approach. 
In this scenario it takes around 2 seconds, measured from the instant when we make the request to the API until retrieving the last message given a phone number.

> > So we have the following:
> > - In a crowded DB, getting a small portion of Data is sooooooo slow, could
> > we optimize it?
> 
> What did you test exactly? If you have thousands of messages and the DB
> isn't indexed on threads, calling getThread() will be slow even if you get a
> few messages in return.
> 

This is not related with 'getThreads', this is related with 'getMessages' given a filter. If you have a DB with a large amount of messages (10000 or 25000) and you make a request to 'getMessages' given a number, the time from requesting until getting the last messages is at the level of seconds :S

> > - In a DB with only 10 threads, but 5000 messages each, the request would be
> > fast if instead retrieving 5000 messages I retrieve 25? 
> 
> I'm not sure I understand this... Did you try to only get the 25 first
> messages?

It's a question. Do you think that we could reduce the time spent retrieving all messages given a phone number with pagination (instead of retrieving N messages -N could be 5000- we could retrieve only the first 25 ones)?.
Flags: needinfo?(fbsc)
(Reporter)

Updated

7 years ago
Flags: needinfo?(mounir)

Updated

7 years ago
Keywords: dev-doc-needed
(In reply to Borja Salguero [:borjasalguero] from comment #7)
> (In reply to Mounir Lamouri (:mounir) from comment #6)
> > (In reply to Borja Salguero [:borjasalguero] from comment #4)
> > > I've tested in the patch which I was working with Ben in SF, and with a DB
> > > with 5000 threads, 50 messages each, getting the messages of a given number
> > > takes 2 seconds at least :S!
> > 
> > I can't understand. Do you mean that getting N messages always takes 2
> > seconds regardless of the value of N?
> > 
> 
> In this case, as you can read, there are 5000 threads with 50 messages each,
> so N=50 in every case.

5000 threads with 50 messages each? Or 50 threads with 5000 messages each?

I guess either of them are interesting scenarios, but my recollection was that the scenario we had talked about measuring before was the time it took to display the beginning of a thread that contains 5000 messages.

> If you like we could create a patch with a better
> approach. 
> In this scenario it takes around 2 seconds, measured from the instant when
> we make the request to the API until retrieving the last message given a
> phone number.

How long does it take to get the first message? I believe that was what we talked about on IRC in addition to how long it takes to get all 5000 messages.

It makes sense that reading all 5000 messages takes a long time and a pagination API wouldn't change that. But you don't have to get all 5000 messages to put something on the screen. As soon as the cursor returns the first message you should be able to start rendering things on screen and so that's the time that matters the most.

Looking at the code though it does do a lot of the IO on the first call, but not all of it. So I'd still be interested in seeing numbers for how long it takes before the first message arrives.
I still believe this is something the implementation should be able to optimize. There is no need to do most of the IO. Can't we put a limit to X messages every time there is a call and just get the next X if it happens that we want to go farther?
Flags: needinfo?(mounir)
Yes, I would expect that if we run into performance issues here we can fix them without changing the API.
(Reporter)

Comment 11

7 years ago
(In reply to Jonas Sicking (:sicking) from comment #8)
> (In reply to Borja Salguero [:borjasalguero] from comment #7)
> > (In reply to Mounir Lamouri (:mounir) from comment #6)
> > > (In reply to Borja Salguero [:borjasalguero] from comment #4)
> > > > I've tested in the patch which I was working with Ben in SF, and with a DB
> > > > with 5000 threads, 50 messages each, getting the messages of a given number
> > > > takes 2 seconds at least :S!
> > > 
> > > I can't understand. Do you mean that getting N messages always takes 2
> > > seconds regardless of the value of N?
> > > 
> > 
> > In this case, as you can read, there are 5000 threads with 50 messages each,
> > so N=50 in every case.
> 
> 5000 threads with 50 messages each? Or 50 threads with 5000 messages each?
> 

Now I was testing 5000 threads with 50 messages each, because first step for performance was applying 'getThreads'. But as I told you, with this number if messages Im really worried about the time that we use for making the request 'getMessages' given a number... Probably we could create a better scenario, but I have some concerns about if we should optimize something there (Indexes, structure...)

> I guess either of them are interesting scenarios, but my recollection was
> that the scenario we had talked about measuring before was the time it took
> to display the beginning of a thread that contains 5000 messages.
> 

If you want we could create a branch with a 'shared' scenario in order to test the same, Wdyt? Could somebody of you create a small patch in Gecko in order to create this 'DB'? I will create the branch for Gaia and we could work on the same code.

> > If you like we could create a patch with a better
> > approach. 
> > In this scenario it takes around 2 seconds, measured from the instant when
> > we make the request to the API until retrieving the last message given a
> > phone number.
> 
> How long does it take to get the first message? I believe that was what we
> talked about on IRC in addition to how long it takes to get all 5000
> messages.
> 
> It makes sense that reading all 5000 messages takes a long time and a
> pagination API wouldn't change that. But you don't have to get all 5000
> messages to put something on the screen. As soon as the cursor returns the
> first message you should be able to start rendering things on screen and so
> that's the time that matters the most.
> 
> Looking at the code though it does do a lot of the IO on the first call, but
> not all of it. So I'd still be interested in seeing numbers for how long it
> takes before the first message arrives.

Hmmmm not really. After talking with UX this 'lazy rendering' it's ok for threads, but not for messages inside a Thread. The reason is that, as iPhone or Android does, once the 'slide' from 'thread list' to 'thread' is over, a 'spinner' appears and all messages are rendered at the same time. The trick here is that all these devices only render 25 messages, and there are a 'infinite screen' in order to load more. It's so useful to have this 'capability' in Gecko, and let the App to use it or not (is usual to have in API when you are requesting a set of data, you could specify how many entries you need).
(Reporter)

Comment 12

7 years ago
(In reply to Mounir Lamouri (:mounir) from comment #9)
> I still believe this is something the implementation should be able to
> optimize. There is no need to do most of the IO. Can't we put a limit to X
> messages every time there is a call and just get the next X if it happens
> that we want to go farther?

It could be nice to specify this limit in the request, as 'give me the latest X messages from/to this phone number'.  If there is no limit, the cursor let me acces to the whole list. We would optimize the IO because if we only need to show 25 messages, we could save the effort of IO in order to retrieve all messages when I only need a reduced set of data.
(In reply to Borja Salguero [:borjasalguero] from comment #11)
> Now I was testing 5000 threads with 50 messages each, because first step for
> performance was applying 'getThreads'. But as I told you, with this number
> if messages Im really worried about the time that we use for making the
> request 'getMessages' given a number... Probably we could create a better
> scenario, but I have some concerns about if we should optimize something
> there (Indexes, structure...)

Sorry, I'm not really following you here. Are you worried about the getThreadList call or the getMessages call? Or both. They are quite different in the backend so it would help a lot if we discuss them separately.

I didn't realize that you were worried about performance for the getThreadList API. It's too bad we didn't discuss that at the SF work week when we created that API. Do you really think it'll be common enough to have 5000 threads that we need to worry about it for v1?

> > I guess either of them are interesting scenarios, but my recollection was
> > that the scenario we had talked about measuring before was the time it took
> > to display the beginning of a thread that contains 5000 messages.
> > 
> 
> If you want we could create a branch with a 'shared' scenario in order to
> test the same, Wdyt? Could somebody of you create a small patch in Gecko in
> order to create this 'DB'? I will create the branch for Gaia and we could
> work on the same code.

I don't know that we need to make a huge deal out of just getting these numbers.

The three numbers we need are:

1. How long does it take to call getThreadList for a "large-but-reasonably-sized" *threadlist*? 5000 sounds larger than reasonable to me, but maybe that's not the case?
2. How long does it take to get the first message when calling getMessages() for a "large-but-reasonably-sized" *thread*.
3. How long does it take to get the last message when calling getMessages() for a "large-but-reasonably-sized" *thread*.

> > It makes sense that reading all 5000 messages takes a long time and a
> > pagination API wouldn't change that. But you don't have to get all 5000
> > messages to put something on the screen. As soon as the cursor returns the
> > first message you should be able to start rendering things on screen and so
> > that's the time that matters the most.
> > 
> > Looking at the code though it does do a lot of the IO on the first call, but
> > not all of it. So I'd still be interested in seeing numbers for how long it
> > takes before the first message arrives.
> 
> Hmmmm not really. After talking with UX this 'lazy rendering' it's ok for
> threads, but not for messages inside a Thread. The reason is that, as iPhone
> or Android does, once the 'slide' from 'thread list' to 'thread' is over, a
> 'spinner' appears and all messages are rendered at the same time. The trick
> here is that all these devices only render 25 messages, and there are a
> 'infinite screen' in order to load more.

This can be built with the API as it currently works.

1. Create a cursor by calling getMessages()
2. Call .next() on the cursor until you have enough messages that you can render the initial view.
3. When the user scolls down, call .next() as much as needed to load further messages.

So if you want to get 25 messages before removing the spinner you can do that. Just render the spinner until you have 25 messages in step 2 and then replace the spinner with those 25 messages. But keep a reference to the cursor so that when the user scolls down you can call .next() as appropriate.

The point of a cursor is that it's *more* flexible than a pagination API and allows more types of UIs. There really is very few things that you can do with a pagination API that you can't do with a cursor.
(Reporter)

Updated

7 years ago
Summary: [SMS API] Needs pagination in request to API → [SMS API] Request to API are really slow with a large amount of data
(Reporter)

Comment 14

7 years ago
I've updated the name of the bug.
Brief summary because there are some misunderstandings here:
- This is NOT related with 'getThreads'. 
- Having a large amount of Data in SMS DB is NOT an edge case. My 'Whatsup app' has 12000 messages... Despite of we are talking about SMS you have to take into account that sending SMS is so popular in countries of Latam, because some carriers offer 'free sms' to their customers. So 10000 messages could be usual.
- Using the cursor for rendering all messages that I need for UI is what Im doing now in my 'performance' branch;)
- What is this bug related?
Im gonna explain in detail the scenario:
+ Create a DB with 10000/15000 messages for 30 phone numbers. Particularly one thread, for the phone number A, should have 50 messages.
+ Make a request to API for retrieving all messages for phone number A ('getMessages' with a 'filter' for phone number A).
--> Expected: We have to retrieve the cursor in a small period of time, and getting the whole list of messages in a reasonable period of time (X seconds is not reasonable, should be milliseconds).

--> Currently: For getting the 50 messages that we requested, we take around 2 seconds.

--> Posible solution: Pagination? Improve performance in API/SMS DB?
Thanks for clarifying which part of the API this bug is about. If you are seeing performance problems with the getThreadList function, please don't hesitate to file a separate bug on that and include performance numbers.

I don't think anyone is arguing that 10000 or 15000 messages in the database is too much. The concern was simply that the bug didn't contain actionable data indicating what was slow.

Pagination won't be a solution here. If returning 50 messages takes two seconds then returning 25 will take no less than 1 second. Actually most likely the problem right now is that the performance is relative to the number of messages in the database, not the number of messages returned. So a pagination API would still take 2 seconds to return the first page of messages.

In any case, two seconds for 50 messages is too slow, so marking this a blocker.

I think what we should do here is to simply change the implementation of the current API such that it makes better use of indexes. If we do that we should be able to start returning results immediately.

That way it's entirely up to the app how many messages to get and when to get them. So you can get the first 25 messages as soon as you want to render the thread, and any number of more messages when the user scrolls or presses a button or whathaveyou.
blocking-basecamp: ? → +
Summary: [SMS API] Request to API are really slow with a large amount of data → [SMS API] getMessages cursor is slow when SMS database contains large number of messages
Jonas, what does getThreadList returns? Is it a cursor or an array?

Also, some performance issues can be solved by the app itself. The stock Android messaging app has a user-configurable limit of messages each thread can have. I think it is a few hundreds by default (<500). It's a nice and simple way to prevent having to deal with 5000 messages per thread, which, I believe it still completely crazy even if doable.
getThreadList returns an array. But the problem here isn't related to getThreadList but rather getMessages. Any getThreadList issues should be filed separately.

The implementation of getMessages is currently very inefficient so we should just fix that to use indexes properly. Currently I don't believe it takes much advantage of being a cursor at all.
> Also, some performance issues can be solved by the app itself. The stock
> Android messaging app has a user-configurable limit of messages each thread
> can have. I think it is a few hundreds by default (<500). It's a nice and
> simple way to prevent having to deal with 5000 messages per thread, which, I
> believe it still completely crazy even if doable.

That wouldn't help here. We're seeing slow performance even when calling getMessages on a thread which just contains 50 messages.

The problem here is simply that getMessages isn't optimized for performance at all. At least not for the filters we're currently using.
Having a max number for threads will help because it will reduce the number of messages in the DB as a consequence. Seems pretty hard to have 50k messages if you only have 500 messages per threads.

Also, I wonder how useful getThreadList() is if we return the full thread in an array: if the app wants to show the first 25 messages and load the next ones, it will have to use getMessages(), right? I thought getThreadList() was there to actually be able to make that use case faster.
We won't ever get good performance as long as we do full table scans. We have to use indexes correctly. And once we use indexes correctly it should be no problems to scale up to 100000 messages.
(Reporter)

Comment 21

7 years ago
(In reply to Mounir Lamouri (:mounir) from comment #19)
> Having a max number for threads will help because it will reduce the number
> of messages in the DB as a consequence. Seems pretty hard to have 50k
> messages if you only have 500 messages per threads.
> 
> Also, I wonder how useful getThreadList() is if we return the full thread in
> an array: if the app wants to show the first 25 messages and load the next
> ones, it will have to use getMessages(), right? I thought getThreadList()
> was there to actually be able to make that use case faster.

Only for clarifying, THIS IS NOT related to 'getThreadList', it's related with 'getMessages' given a filter.
(In reply to Borja Salguero [:borjasalguero] from comment #21)
> Only for clarifying, THIS IS NOT related to 'getThreadList', it's related
> with 'getMessages' given a filter.

Just FYI, We tested getThreadList on an unagi with 1000 *threads* and it was super fast.
(Reporter)

Comment 23

7 years ago
(In reply to ben turner [:bent] from comment #22)
> (In reply to Borja Salguero [:borjasalguero] from comment #21)
> > Only for clarifying, THIS IS NOT related to 'getThreadList', it's related
> > with 'getMessages' given a filter.
> 
> Just FYI, We tested getThreadList on an unagi with 1000 *threads* and it was
> super fast.
+1!
Ben, are you working on this?
Assignee: nobody → bent.mozilla
I wasn't... But then you assigned it to me... Is that a hint or a goof?
I think we should find someone else here.
Assignee: bent.mozilla → nobody
Who else can do this?  Gregor?
Setting priority based on triage discussions.  Feel free to decrease priority if you disagree.
Priority: -- → P1
(In reply to Andrew Overholt [:overholt] from comment #27)
> Who else can do this?  Gregor?

My blockers count says no.
I'm starting doing timings to at least understand where we are slow.
Fernando, would you have time to look at this?

The reason we are slow is because filters are implemented very inefficiently.

There are several things that can/should be fixed. I'll try to list them in decreasing order of importance:

A) When getMessages is called without a date-range filter we create a date-range filter which encompasses all of time. We then iterate through an index to find all messages which matches this filter, which in this case means that we iterate over *all* messages. We then do separate passes for other properties that we want to filter on and do a union operation to find all messages that match all filters.

This is clearly very suboptimal in almost all cases. Iterating over all messages and then doing a union operation on the result of that takes a very long time and doesn't do anything at all.

The only time we need to iterate all messages is for the special case when *no* filters are applied, but we should handle that as a special case.

B) The way getMessages are implemented, before it returns any results at all to the cursor, it figures out the full list of messages that match the filter. This is unneccesary for the case of a paginated UI which wants to just get the first X results.

What we should do is to start returning results as soon as we have them. This is very simple when we're only filtering on a single property, which is the common case in our app. For this case we should simply figure out which index we should iterate over, figure out if we should have a forward or a backward cursor, and then start sending results back to the API caller.

I.e. in this case we shouldn't build a full list and then iterate the list as cursor.next() is called. There are several faster strategies that can be used.

I believe the easiest one to implement would be that as soon as the IDB cursor returns a result, send it back to the SMS-API caller. However still keep iterating the IDB cursor and append results to the end of the list. As the SMS cursor keeps asking for more items we read them from the start of the list until we reach the end and the IDB cursor also reaches the end.


Fixing these two above issues should make the current API blazingly fast.

C) Even for multi-property filters we can make clever use of cursors to ensure that we only read the appropriate data out of the database. I.e. we can in most cases iterate just the needed set of messages and return them, rather than creating multiple lists and performing union operations.

This is however much more complex and I believe isn't needed by the way the SMS app is currently using the SMS API.

However if it turns out to be needed, let me know and I can help out whoever works on this with out to structure IDB indexes for optimal performance.
Assignee: nobody → frsela
Posted patch wip patch (obsolete) — Splinter Review
wip patch, sponsored by Delta Airlines making me waiting 5 hours in an airport.

That works pretty well with a set of 2000 messages, except the first screen since gaia does not use yet getThreadList().

I added a new index (senderOrReceiver) that combines both numbers in a single index. That saves us to do a double scan and a key union.

shell.js has (commented) code at the end to generate a randomized sms database; it's better to create it on desktop since that takes some time.
Attachment #687618 - Flags: feedback?(jonas)
Attachment #687618 - Flags: feedback?(frsela)
Hi Jonas and all,

The Fernando you mean is Fernando Jimenez who wrote this code :p

I am not opposed to review it but I think it is better to ask Fernando Jimenez
Flags: needinfo?(ferjmoreno)
Comment on attachment 687618 [details] [diff] [review]
wip patch

Awesome patch Fabrice! Thanks a lot :)!

As I already mentioned you via IRC, if I am not wrong the 'senderOrReceiver' index would not work properly in the case that we get the MSISDN from the SIM card, which is not the common case but happens with some carriers. Anyway, I can't think about any reason to store the sender in the case of a sent SMS or the receiver for received SMSs... so we could probably ignore the MSISDN [1] and always store 'null'. Unless you can find a reason to store the real MSISDN.

[1] https://mxr.mozilla.org/mozilla-central/source/dom/sms/src/ril/SmsDatabaseService.js#517
Attachment #687618 - Flags: feedback?(frsela)
Attachment #687618 - Flags: feedback?(ferjmoreno)
Attachment #687618 - Flags: feedback+
Flags: needinfo?(ferjmoreno)
Posted patch patch v2 (obsolete) — Splinter Review
Updated according to last comments. I'd love Barja to test with that patch before landing though.
Assignee: frsela → fabrice
Attachment #687618 - Attachment is obsolete: true
Attachment #687618 - Flags: feedback?(jonas)
Attachment #687904 - Flags: review?(ferjmoreno)
(Reporter)

Comment 36

7 years ago
[:fabrice] for sure! Tomorrow I will check with [:ferjm] the patch ;)
Comment on attachment 687904 [details] [diff] [review]
patch v2

Review of attachment 687904 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/sms/src/ril/SmsDatabaseService.js
@@ +1035,5 @@
> +
> +        let currentThread = Services.tm.currentThread;
> +        while (messageId == null && !cursor.done) {
> +          currentThread.processNextEvent(false);
> +          messageId = cursor.ids.shift();

Ugh! Spinning the event loop is evil! Could you instead register a onsuccess handler on the cursor using cursor.addEventListener("success", ...)?

That way you can wait until there's a result from the cursor and check if you got a new result or if the cursor is done. The event listener added using cursor.addEventListener is guaranteed to run after the cursor added using .onsuccess since the .onsuccess listener is registered first.
(Assignee)

Updated

7 years ago
Summary: [SMS API] getMessages cursor is slow when SMS database contains large number of messages → B2G SMS: getMessages cursor is slow when SMS database contains large number of messages
Borja: Please keep in mind that getting the first message should be dramatically faster with this patch, as should getting the first, say, 25 messages.

However getting *all* messages for a thread can still take a non-trivial amount of time if there are a lot of messages in the thread.

But the great thing about this patch is that results should start coming in immediately. And the total time should be much more proportional to the number of messages in the thread, rather than the number of messages in the database.
(Reporter)

Comment 39

7 years ago
[:sicking] Cool. I have the patch installed in my device so tomorrow morning I will try with the Gaia patch. Only one doubt, Why do we need a long time for building the entire DB? Tomorrow I will come back with some feedback! Thanks
Posted patch patch v3 (obsolete) — Splinter Review
Addressing Jonas's comments
Attachment #687904 - Attachment is obsolete: true
Attachment #687904 - Flags: review?(ferjmoreno)
Attachment #688495 - Flags: review?(jonas)
Attachment #688495 - Flags: review?(ferjmoreno)
Posted patch patch v4 (obsolete) — Splinter Review
Rebased on current inbound because bug 811538 landed.
Attachment #688495 - Attachment is obsolete: true
Attachment #688495 - Flags: review?(jonas)
Attachment #688495 - Flags: review?(ferjmoreno)
Attachment #688509 - Flags: review?(ferjmoreno)
Comment on attachment 688509 [details] [diff] [review]
patch v4

Review of attachment 688509 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Fabrice!

r=me with the comments addressed, please.

::: dom/sms/src/ril/SmsDatabaseService.js
@@ +96,5 @@
>      };
>    });
>  
>    this.messageLists = {};
> +  this.cursorReqs = {};

Could you add a comment explaining how is |this.cursorReqs| being populated and how is used later in |getNextMessageInList()|, please? That would help a lot to understand the whole patch :)

@@ +439,5 @@
>        cursor.continue();
>      }
>    },
>  
> +  upgradeSchema5: function upgradeSchema5(objectStore) {

I guess this one would be upgradeSchema6.

@@ +446,5 @@
> +
> +    objectStore.openCursor().onsuccess = function(event) {
> +      let cursor = event.target.result;
> +      if (!cursor) {
> +        if (DEBUG) debug("updgradeSchema5 done");

Ditto.

@@ +451,5 @@
> +        return;
> +      }
> +
> +      let message = cursor.value;
> +      message.senderOrReceiver = message.sender || message.receiver;

Since we are updating a previously existing DB, we might find here the MSISDN issue that I mentioned in comment 34. We should probably  check |message.delivery| before assigning a value to |message.senderOrReceiver|. It should be |message.receiver| for sent SMSs (DELIVERY_SENT) or |message.sender| for received SMSs (DELIVERY_RECEIVED).

@@ +957,5 @@
> +      } else {
> +        indexName = "timestamp";
> +        if (filter.startDate != null && filter.endDate != null) {
> +          keyRange = IDBKeyRange.bound(filter.startDate.getTime(),
> +                                           filter.endDate.getTime());

nit: alignment.

@@ +1036,5 @@
> +      // Retrieve the keys from the 'senderOrReceiver' indexes that
> +      // match the values of filter.numbers
> +      if (filter.numbers) {
> +        for (let i = 0; i < filter.numbers.length; i++) {
> +          filtered = true;

Take the assignment out of the for loop, please.

@@ +1079,2 @@
>  
> +      debug("timeKeyRange: " + timeKeyRange + " filtered: " + filtered);

Prefix the |debug()| call with |if (DEBUG)|, please.

@@ +1151,5 @@
> +        // to provide something.
> +        cursor.cursor.addEventListener("success",
> +          function waitForResult(aEvent) {
> +            cursor.cursor.removeEventListener("success", waitForResult);
> +            debug("onsuccess called");

Remove this debug message, please.

@@ +1179,2 @@
>  
> +    getMessage(messagId);

typo: 'messageId'
Attachment #688509 - Flags: review?(ferjmoreno) → review+
Mass Modify: All un-milestoned, unresolved blocking-basecamp+ bugs are being moved into the C3 milestone. Note that the target milestone does not mean that these bugs can't be resolved prior to 12/10, rather C2 bugs should be prioritized ahead of C3 bugs.
Target Milestone: --- → B2G C3 (12dec-1jan)
(Reporter)

Comment 44

7 years ago
Well, brief summary after testing the patch:
- Create the 'Test DB' takes... minutes :S!
- 'getThreadList' is working fast
- 'getMessages' with filter. Following results:
	+ Phone number '1801385094', 40 Messages retrieved, 452 ms in order to get the first one
	+ Phone number '1918258035', 55 Messages retrieved, 1066 ms in order to get the first one
	+ Phone number '340261238', 50 Messages retrieved, 1410 ms in order to get the first one
	+ Phone number '5221096486', 42 Messages retrieved, 92593 ms in order to get the first one
	+ Phone number '6993585343', 31 Messages retrieved, 132548 ms in order to get the first one

I've made this test linearly, clicking in one thread, checking the time, coming back to the thread-list and choosing other one. I dont know what it's happening there, but the test show a huge impact in the performance! Do you know what is happening? Any idea?
Can you post your sms database somewhere so we use the same one for timings?
(Reporter)

Comment 46

7 years ago
I was using the DB that is in Gecko as a test (I've not created a new one)
Posted patch rebased patch (obsolete) — Splinter Review
Here's a rebased patch. Fernando, can you check the timings with Barja? I'm not seeing that here, but I'm not on the same gaia branch either. I had consistent sub-1 seconds display time for the lists, with no degradation.

I really don't want to land that before we're 100% sure we have good performance.
Attachment #688509 - Attachment is obsolete: true
I've been doing some tests and it seems that the patch is working great, we get pretty good times (< 0.5 seconds) retrieving the messages with the test db that is provided along with the first WIP patch. I am attaching a log with the debug messages (sorry, they are in spanish, but it basically says "Time needed for retrieving the messages" and "Number of messages").

Anyway, I've found that we might have a bottle neck in the |markMessageRead()| function. To get this timings I had to comment the gaia code responsible for calling |markMessageRead()| for each message when entering in a message view.

Without commenting this code, the times are terrible if the transactions for marking the messages as read are not completed once you move from one thread to another. I don't know how IndexedDB work internally, but I guess that all the transaction requests are queued and we don't start the one for getting the messages until all the ones for marking the previously retrieved messages as read are completed. I am wondering if we should fix Bug 771463 to have an unique an atomic transaction for marking a thread as read, instead of opening multiple transactions for each message, that I guess it would be more expensive in terms of performance.
Flags: needinfo?(jonas)
Logcat with the calls to |markMessageRead()| in Gaia
This is why it's important to measure just the API calls and not have any UI logic when measuring API timing :)

The way IndexedDB works is that if you have multiple "readwrite" transactions, they execute one after the other. I.e. the first one has to finish before the second one can start.

There's definitely a lot of overhead involved if we mark a thousand messages as read at the same time. However that doesn't seem like a common usecase. Usually you don't end up with more than 10-20 messages as unread in a single thread. So it doesn't seem to me like something we should worry about for now?

Marking several messages as read sounds like a good idea, but probably not something that matters for v1.
Flags: needinfo?(jonas)
(In reply to Jonas Sicking (:sicking) from comment #50)
> There's definitely a lot of overhead involved if we mark a thousand messages
> as read at the same time. However that doesn't seem like a common usecase.
> Usually you don't end up with more than 10-20 messages as unread in a single
> thread. So it doesn't seem to me like something we should worry about for
> now?
> 
> Marking several messages as read sounds like a good idea, but probably not
> something that matters for v1.

I am afraid that it might not be that edge case for users from carriers that offers free SMS. Also I personally have lots of unread SMS on my android phone, cause I use to read them directly from the notification bar.

Note that the number of messages that we are testing is not that big. Around 50 in some cases. But the times are pretty worst.

I personally would prefer to have this implemented for v1. I could work on a patch for it if needed :)

We could also do some work in Gaia to avoid marking *all* the messages of a thread as read, but only the ones that are shown in the screen.
Please file a separate bug on the markMessageAsRead slowness and cc me. I think think the slowness can be worked around in the UI.
https://hg.mozilla.org/mozilla-central/rev/057218b30dcb
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Something got backed out here, not fixed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This comment stream is confusing - what needs to happen here?
Apparently the whole shebang got backed out due to failing tests and needs to be fixed and relanded.
Posted patch Updated SMS tests (obsolete) — Splinter Review
Corresponding SMS test updates. Will prevent the tests from failing on TBPL when the changes in this issue are submitted.
Attachment #693111 - Flags: review?(fabrice)
Posted patch Updated SMS tests (obsolete) — Splinter Review
Corresponding SMS test updates. Will prevent the tests from failing on TBPL when the changes in this issue are submitted.
Attachment #693117 - Flags: review?(fabrice)
Comment on attachment 693111 [details] [diff] [review]
Updated SMS tests

Firefox crashed in the middle of uploading the patch so the same patch was uploaded twice, please ignore this one.
Attachment #693111 - Flags: review?(fabrice)
Attachment #693111 - Attachment is obsolete: true
Comment on attachment 693117 [details] [diff] [review]
Updated SMS tests

Review of attachment 693117 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks a lot, Rob!

I pushed everything to try, and will land if this turns green:
https://tbpl.mozilla.org/?tree=Try&rev=57a064fe5a4c
Attachment #693117 - Flags: review?(fabrice) → review+
and... it looks we were backed out again in
https://hg.mozilla.org/integration/mozilla-inbound/rev/c8102d94b2fd

Rob, can you take a look at these (new?) test failures?
Blocks: 823076
With the latest patch (including test changes) the SMS tests in question are failing due to a new problem.

If you do an sms.getMessages using a new / default MozSmsFilter, the sms request never returns onsucess or onerror - causing the test to timeout. This used to work fine, and the request would return success whether or not any messages were found.

If you specify a filter attribute (i.e. filter.read = true) then the getMessages request returns onsuccess.

All of the failing SMS tests first do a search for any existing messages (using getMessages and new/default MozSmsFilter) to delete them and start clean, so that is why all of those tests are failing.
Posted patch patch v3 (obsolete) — Splinter Review
Rob,

Can you try this version of the patch. There was indeed a bug where we never notified if we had an empty result set.
Attachment #690165 - Attachment is obsolete: true
How'd the try run go? Is latest patch ready for review?
(In reply to Dietrich Ayala (:dietrich) from comment #69)
> How'd the try run go? Is latest patch ready for review?

Looks like B2G crashes during test_message_classes.js with this patch.
Vicamo, if this is crashing during tests, it's likely due to bugs deeper than in the code changed by this patch. Could you have a look while Fabrice is on vacation?
Assignee: fabrice → vyang
Target Milestone: B2G C3 (12dec-1jan) → B2G C4 (2jan on)
How are things here, Vicamo?
Flags: needinfo?(vyang)
(Assignee)

Comment 73

6 years ago
(In reply to Andrew Overholt [:overholt] from comment #72)
> How are things here, Vicamo?

Sorry, I'll help on this today :)
Flags: needinfo?(vyang)
(Assignee)

Comment 74

6 years ago
Comment on attachment 693117 [details] [diff] [review]
Updated SMS tests

obsoleted by attachment #695071 [details] [diff] [review]
Attachment #693117 - Attachment is obsolete: true
(Assignee)

Comment 75

6 years ago
I got:

> ###!!! [Child][SyncChannel] Error: Channel error: cannot
> send/recv"

Seem to be a Marionette bug?
(Assignee)

Comment 76

6 years ago
Yes, confirmed. It's not related to the patch attached. Current Gecko build suffers from this problem.

> ###!!! [Parent][SyncChannel] Error: Value error: message was
> deserialized, but contained an illegal value
> [Parent 165] ###!!! ABORT: failed to construct LayersChild: file /home/vicamo/WorkSpace/mozilla/b2g/qemu-arm/gecko/widget/xpwidgets/nsBaseWidget.cpp, line 877
> mozalloc_abort: [Parent 165] ###!!! ABORT: failed to construct LayersChild: file /home/vicamo/WorkSpace/mozilla/b2g/qemu-arm/gecko/widget/xpwidgets/nsBaseWidget.cpp, line 877
> Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
> ###!!! [Child][SyncChannel] Error: Channel error: cannot
> send/recv"
Vicamo: Is it the test changes in this patch that causes that problem?

The problem we are having is that without this patch, the tests that are automatically run on tbpl are passing fine, but with the patch checked in the tests are failing.

So clearly *something* in this patch is causing problem.
(Assignee)

Comment 78

6 years ago
(In reply to Jonas Sicking (:sicking) from comment #77)
> Vicamo: Is it the test changes in this patch that causes that
> problem?
> 
> The problem we are having is that without this patch, the
> tests that are automatically run on tbpl are passing fine, but
> with the patch checked in the tests are failing.
> 
> So clearly *something* in this patch is causing problem.

Yes, just found IPC errors I posted yesterday belongs to another problem. The patch does crash Gecko in another way and can be easily re-produced. Checking.
(Assignee)

Comment 79

6 years ago
NS_IMETHODIMP
SmsCursor::Continue()
{
  ...
  request->Reset(); // Crash here.
  ...
}
(Assignee)

Comment 80

6 years ago
Comment on attachment 695071 [details] [diff] [review]
patch v3

Review of attachment 695071 [details] [diff] [review]:
-----------------------------------------------------------------

I fully scan this patch again and found at least two problems in this patch now:

1. newly introduced column 'senderOrReceiver' is assigned to either sender or receiver as its name suggests. However, when filtering with desired numbers, matches on either sender or receiver are expected. For example, sending a message out from "111" to "222" and filtering with number "111" should still match the sent message. But with this change merged, this behavior will be no more.

2. Clean-up for cursorReq is omited after an empty result set filtering. SmsCursor is only instantiated with a valid list id when SmsDatabaseService replies with notifyMessageListCreated, and with only valid list id will the dstr of SmsCursor invoke clearMessageList, and only in the function will SmsDatabaseService removes curserReq. However, a cursorReq item is inserted in every startCursorRequest() call. So every time an empty result set filtering is invoked, a cursorReq is leaked. Even worse, when clearMessageList() is invoked with a valid list id, it clears only message list and leaves dangling cursorReq behind. So, a cursorReq is never removed in either case, valid or invalid list id.
Attachment #695071 - Flags: review-
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #80)
> Comment on attachment 695071 [details] [diff] [review]
> patch v3
> 
> Review of attachment 695071 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I fully scan this patch again and found at least two problems in this patch
> now:
> 
> 1. newly introduced column 'senderOrReceiver' is assigned to either sender
> or receiver as its name suggests. However, when filtering with desired
> numbers, matches on either sender or receiver are expected. For example,
> sending a message out from "111" to "222" and filtering with number "111"
> should still match the sent message. But with this change merged, this
> behavior will be no more.

We use the new index in createMessageList() and the behavior is coherent with what getThreadList() does. Note the we set the sender to null for sent messages and the receiver to null for received ones anyway.

> 2. Clean-up for cursorReq is omited after an empty result set filtering.
> SmsCursor is only instantiated with a valid list id when SmsDatabaseService
> replies with notifyMessageListCreated, and with only valid list id will the
> dstr of SmsCursor invoke clearMessageList, and only in the function will
> SmsDatabaseService removes curserReq. However, a cursorReq item is inserted
> in every startCursorRequest() call. So every time an empty result set
> filtering is invoked, a cursorReq is leaked. Even worse, when
> clearMessageList() is invoked with a valid list id, it clears only message
> list and leaves dangling cursorReq behind. So, a cursorReq is never removed
> in either case, valid or invalid list id.

Good catch for the empty cursor result set leak! Do you feel that it would be safer to use only one incremental id for both lists and cursors? Does the backend expect them to be in any order?
(Assignee)

Comment 82

6 years ago
(In reply to Fabrice Desré [:fabrice] from comment #81)
> Good catch for the empty cursor result set leak! Do you feel
> that it would be safer to use only one incremental id for both
> lists and cursors? Does the backend expect them to be in any
> order?

Yes, we can surely reuse list id as cursors'. But I think the very purpose of this issue is to take advantage of indexedDB cursor feature, try to reply the first entry as soon as possible. The original indexes disallow this because we'll always have to fetch all possible records and sort them by timestamp. I had a discuss with Jonas this morning and he told me that we can have some multi-key indexes, and with that, every filter result set will be automatically sorted by indexedDB itself.
(Assignee)

Comment 83

6 years ago
This is a prove of concept patch of using multi-key indexes. All SMS marionette test cases passed, but mochitests left unchecked for now.
How are things going here, Vicamo?  Is there some way we can get this reviewed today?
Flags: needinfo?(vyang)
(Assignee)

Comment 85

6 years ago
Posted patch Plan B - Part 2: WIP (obsolete) — Splinter Review
Reply message got as soon as possible. Works with one filter condition only.
Flags: needinfo?(vyang)
(Assignee)

Comment 86

6 years ago
Posted patch Plan B - Part 2: WIP2 (obsolete) — Splinter Review
all test cases but test_filter_number_multiple.js passed.
Attachment #700511 - Attachment is obsolete: true
(Assignee)

Comment 88

6 years ago
Comment on attachment 699830 [details] [diff] [review]
Plan B - Part 1: use multi-key indexes

This patch introduces multi-key index to SMS database. No actual sorting improvements made so far and all test cases should pass as before.
Attachment #699830 - Flags: review?(jonas)
Comment on attachment 699830 [details] [diff] [review]
Plan B - Part 1: use multi-key indexes

Review of attachment 699830 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/sms/src/ril/SmsDatabaseService.js
@@ +390,5 @@
> +
> +    // Create new "delivery", "number" and "read" indexes.
> +    objectStore.createIndex("delivery", "deliveryIndex", { unique: false });
> +    objectStore.createIndex("number", "numberIndex", { unique: false, multiEntry: true });
> +    objectStore.createIndex("read", "readIndex", { unique: false });

You don't need the "unique: false" on any of these as that's the default value. Up to you.

@@ +392,5 @@
> +    objectStore.createIndex("delivery", "deliveryIndex", { unique: false });
> +    objectStore.createIndex("number", "numberIndex", { unique: false, multiEntry: true });
> +    objectStore.createIndex("read", "readIndex", { unique: false });
> +
> +    // Populate new "deliverIndex" and "numberIndex" attributes.

and "readIndex"

@@ +403,5 @@
> +      let message = cursor.value;
> +      let timestamp = message.timestamp;
> +      message.deliveryIndex = [message.delivery, timestamp];
> +      message.numberIndex = [
> +        [message.sender,timestamp],

space after ','
Attachment #699830 - Flags: review?(jonas) → review+
(Assignee)

Comment 90

6 years ago
Address Jonas's review comments in comment #89.
Attachment #699830 - Attachment is obsolete: true
Attachment #701034 - Flags: review+
(Assignee)

Comment 91

6 years ago
Posted patch Plan B - Part 2: WIP3 (obsolete) — Splinter Review
Filtering senarios solved:
1) filtering by only one attribute.
2) filtering by mix of any attributes, but having at most one phone number.

TODO:
3) filtering by multiple phone numbers.
Attachment #700875 - Attachment is obsolete: true
(Assignee)

Comment 92

6 years ago
Add more test cases.
Attachment #700876 - Attachment is obsolete: true
(Assignee)

Comment 93

6 years ago
Attachment #701037 - Attachment is obsolete: true
Attachment #701058 - Flags: review?(jonas)
Attachment #701058 - Flags: review?(jonas) → review?(ferjmoreno)
Attachment #701059 - Flags: review?(jonas) → review?(ferjmoreno)
Component: DOM: Device Interfaces → Gaia
Product: Core → Boot2Gecko
Version: Trunk → unspecified
Component: Gaia → DOM: Device Interfaces
Product: Boot2Gecko → Core
Comment on attachment 701034 [details] [diff] [review]
Plan B - Part 1: use multi-key indexes : v2

Review of attachment 701034 [details] [diff] [review]:
-----------------------------------------------------------------

In general, I still don't see the need for storing the sender for a sent message or the receiver for a received message. These values would always be the device's MSISDN and I see no use case where someone search his own MSISDN and, in any case, a search of this kind would end up with the whole list of stored messages.

::: dom/sms/src/ril/SmsDatabaseService.js
@@ +403,5 @@
> +      let message = cursor.value;
> +      let timestamp = message.timestamp;
> +      message.deliveryIndex = [message.delivery, timestamp];
> +      message.numberIndex = [
> +        [message.sender,timestamp],

nit: space after ','
Since we are creating a 'numberIndex', instead of doing it multi-key, why don't we just remove 'sender' and 'receiver' and store a single 'number' field? I think that would be an easier solution.
Trying to test the patch I am getting:

I/Gecko   (  485): SmsDatabaseService: onSingleFilterSuccess
I/Gecko   (  485): SmsDatabaseService: onSingleFilterSuccess - cursor
I/Gecko   (  485): SmsDatabaseService: onNextMessageInListGot - 7
I/Gecko   (  485): SmsDatabaseService: onSingleFilterSuccess
I/Gecko   (  485): SmsDatabaseService: onSingleFilterSuccess - cursor
I/Gecko   (  485): SmsDatabaseService: onNextMessageInListGot - 6
I/Gecko   (  485): SmsDatabaseService: onSingleFilterSuccess
I/Gecko   (  485): SmsDatabaseService: onSingleFilterSuccess - cursor
I/Gecko   (  485): SmsDatabaseService: onNextMessageInListGot - 5
I/Gecko   (  609): [Child 609] WARNING: pipe error (3): Connection reset by peer: file /Volumes/Data/dev/mozilla/mozilla-inbound/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 431
I/Gecko   (  609): 
I/Gecko   (  609): ###!!! [Child][SyncChannel] Error: Channel error: cannot send/recv
I/Gecko   (  609): 
I/GeckoDump(  612): exception: [Exception... "Component returned failure code: 0x80570018 (NS_ERROR_XPC_BAD_IID) [nsIJSCID.getService]"  nsresult: "0x80570018 (NS_ERROR_XPC_BAD_IID)"  location: "JS frame :: chrome://browser/content/shell.js :: shell_start :: line 187"  data: no]

Not sure if this is related
Comment on attachment 701058 [details] [diff] [review]
Plan B - Part 2-1: rewrite filtering by only one condition

I'd like to wait for Vicamo to answer my previous comments before doing a deeper review.
I would also appreciate Philipp's feedback on this patch since he was the original reviewer of this code.
Attachment #701058 - Flags: review?(ferjmoreno) → feedback?(philipp)
Attachment #701059 - Flags: review?(ferjmoreno) → feedback?(philipp)
(Assignee)

Comment 99

6 years ago
(In reply to Fernando Jiménez Moreno [:ferjm] from comment #96)
> Since we are creating a 'numberIndex', instead of doing it
> multi-key, why don't we just remove 'sender' and 'receiver' and
> store a single 'number' field? I think that would be an easier
> solution.

Jonas had told me maybe we don't need all the attributes in SmsFilter, or even a filter interface at all. Removing 'sender' and 'receiver' is the same thing. I only want to shorten the first reporting time in this bug. Other issues, like removing SmsFilter, or sender/receiver attributes, are not really something blocks this refactoring.

IMHO, having a 'number' field only doesn't bring any benefit at all. The problem of SmsDatabase slowness is it waits all required indexes fetched and sorts them based on the results got from timestamp index. As we have the 'numberIndex' introduced in the first patch, the problem can be then solved in following ones.

Besides, keeping a meaningful 'sender' attribute can be also necessary for MultiSIM. In MultiSIM, you want to know the origin of a message so that you can reply it through correct sim card. We still have no perfect solution for this and keeping them until some final conclusion comes out is a better idea for now.
(Assignee)

Comment 100

6 years ago
(In reply to Fernando Jiménez Moreno [:ferjm] from comment #97)
> I/Gecko   (  609): ###!!! [Child][SyncChannel] Error: Channel
> error: cannot send/recv
> 
> Not sure if this is related

It's not. I do have such error shown in the first place, but some how can't reproduce it with gecko tip. The reason that the original patch was bailed out is it crashes gecko in SmsCursor::Continue() with segmentation faults.
The *simplest* thing we could do in this patch is to reduce the feature-set of SMSFilter to the filters that we currently need. And either remove other features from the API, or make them throw.

That is definitely simpler than optimizing the full API feature set and looking at reducing the API in a different bug.

The *only* filter we are currently using right now is a filter which only specifies a single number. And where that number is the sending number for received messages, or the receiving number for sent messages.

Nothing else will be used by Gaia, or could be used by 3rd party apps, in v1.

However, I'm totally ok with leaving it up to Vicamo to decide what to do. But please keep in mind that this bug is supposed to be fixed by tuesday.


We can look at expanding that feature set for v2 to support the features that we think a good SMS API should have. But I do think that for the API to be a good and reliable API, *all* queries supported by the API needs to be backed by fast index searches. This is simply not possible with the current API design without slowing down other operations and using up too much disk space. Exactly how to solve this problem is of course something we should debate elsewhere, but I don't think we should be shy about simplifying the current API because it simply is currently too complex to be implemented well.
(Assignee)

Comment 102

6 years ago
(In reply to Jonas Sicking (:sicking) from comment #101)
> However, I'm totally ok with leaving it up to Vicamo to decide
> what to do. But please keep in mind that this bug is supposed
> to be fixed by tuesday.

I've revised the patches locally and will update them today. About missed part 2-3, I realized I just can't have an absolutely reversed, ordered sequence for all cases. So sad that I just can't optimize all scenarios.
Comment on attachment 701058 [details] [diff] [review]
Plan B - Part 2-1: rewrite filtering by only one condition

Review of attachment 701058 [details] [diff] [review]:
-----------------------------------------------------------------

Very nice! If you haven't already, please ensure that test_smsdatabaseservice.xul still passes.

::: dom/sms/src/ril/SmsDatabaseService.js
@@ +441,2 @@
>  
> +    --aMessageList.waitCount;

Style nit: other parts of this file use the

  aMessageList.waitCount -= 1;

notation.

@@ +830,5 @@
> +
> +      let singleFilterSuccessCb = function onSingleFilterSuccess(event) {
> +        debug("onSingleFilterSuccess");
> +        if (messageList.stop) {
> +          debug("onSingleFilterSuccess - stop");

if (DEBUG) both times please

@@ +859,4 @@
>        };
>  
> +      let direction = reverse ? PREV : NEXT;
> +      if (filter.delivery || filter.numbers || filter.read != undefined) {

How about a comment like this above the `if`:

  // We support filtering by date range only (see `else` block below) or
  // by number/delivery status/read status with an optional date range.

@@ +861,5 @@
> +      let direction = reverse ? PREV : NEXT;
> +      if (filter.delivery || filter.numbers || filter.read != undefined) {
> +        // Numeric 0 is small than any time stamp, and empty string is larger
> +        // than all numeric values.
> +        let startDate = 0, endDate = "";

Very clever!

Grammar nit: s/small/smaller/

@@ +918,5 @@
> +          let request = store.index("read").openKeyCursor(range, direction);
> +          request.onsuccess = singleFilterSuccessCb;
> +          request.onerror = singleFilterErrorCb;
> +        }
> +      } else {

// Filtering by date range only.
Attachment #701058 - Flags: feedback?(philipp) → review+
Comment on attachment 701038 [details] [diff] [review]
Plan B - Part 3: test cases for mixed filter targets. WIP2

Review of attachment 701038 [details] [diff] [review]:
-----------------------------------------------------------------

Using Marionette and the emulator's SMS simulation to test SmsDatabaseService seems a bit overkill. Take a look at test_smsdatabaseservice.xul, it's a Brower Chrome Mochitest that you can run if you build Firefox with --enable-b2g-ril.
(Assignee)

Comment 105

6 years ago
Address review comments in comment #95.
Attachment #695071 - Attachment is obsolete: true
Attachment #701034 - Attachment is obsolete: true
Attachment #701730 - Flags: review+
(Assignee)

Comment 106

6 years ago
1) complete error handling
2) address review comments in comment #103
Attachment #701058 - Attachment is obsolete: true
(Assignee)

Comment 107

6 years ago
Comment on attachment 701732 [details] [diff] [review]
Plan B - Part 2-1: rewrite filtering by only one condition : v2

previously r+ in comment #103
Attachment #701732 - Flags: review+
(Assignee)

Comment 108

6 years ago
Complete error handling & reverse filtering.
Attachment #701059 - Attachment is obsolete: true
Attachment #701059 - Flags: feedback?(philipp)
Attachment #701734 - Flags: review?(philipp)
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #99)
> (In reply to Fernando Jiménez Moreno [:ferjm] from comment #96)
> > Since we are creating a 'numberIndex', instead of doing it
> > multi-key, why don't we just remove 'sender' and 'receiver' and
> > store a single 'number' field? I think that would be an easier
> > solution.
> 
> Jonas had told me maybe we don't need all the attributes in SmsFilter, or
> even a filter interface at all. Removing 'sender' and 'receiver' is the same
> thing. I only want to shorten the first reporting time in this bug. Other
> issues, like removing SmsFilter, or sender/receiver attributes, are not
> really something blocks this refactoring.
> 

Fair enough.
Comment on attachment 701734 [details] [diff] [review]
Plan B - Part 2-2: rewrite filtering by mix of any two conditions : v2

Review of attachment 701734 [details] [diff] [review]:
-----------------------------------------------------------------

Additional review.

::: dom/sms/src/ril/SmsDatabaseService.js
@@ +528,5 @@
> +          found = true;
> +          break;
> +        }
> +        if ((!aMessageList.reverse && (result.timestamp > aTimestamp))
> +            || (aMessageList.reverse && (result.timestamp < aTimestamp))) {

nit: 

if ((!aMessageList.reverse && (result.timestamp > aTimestamp)) ||
    (aMessageList.reverse && (result.timestamp < aTimestamp))) {
  [...]
}

@@ +540,5 @@
> +          // C) Cannot find a match anymore. Drop.
> +          if (results.length) {
> +            let lastResult = results[results.length - 1];
> +            if ((!aMessageList.reverse && (lastResult.timestamp >= aTimestamp))
> +                || (aMessageList.reverse && (lastResult.timestamp <= aTimestamp))) {

nit: same as above.

@@ +959,5 @@
> +          }
> +
> +          let cursor = event.target.result;
> +          if (cursor) {
> +            if (onNextMessageInMultiFiltersGotCb(which, cursor.primaryKey, cursor.key[1])) {

nit: lines shorter than 80 chars, please.

@@ +1004,5 @@
>            if (DEBUG) debug("filter.delivery " + filter.delivery);
>            let range = IDBKeyRange.bound([filter.delivery, startDate],
>                                          [filter.delivery, endDate]);
>            let request = store.index("delivery").openKeyCursor(range, direction);
> +          if (singleFilter) {

nit: this pattern seems to be repeated for delivery, numbers and read filter. I wonder if you could use an aux function for it.
Attachment #701734 - Flags: review+
(Assignee)

Comment 111

6 years ago
Address review comments in comment #110
Attachment #701734 - Attachment is obsolete: true
Attachment #701734 - Flags: review?(philipp)
Attachment #701811 - Flags: review?(philipp)
(Assignee)

Comment 112

6 years ago
Oops! attached wrong file.
Attachment #701811 - Attachment is obsolete: true
Attachment #701811 - Flags: review?(philipp)
Attachment #701816 - Flags: review?(philipp)
(Assignee)

Comment 113

6 years ago
Add r=philikon only
Attachment #701732 - Attachment is obsolete: true
Attachment #701874 - Flags: review+
(Assignee)

Comment 114

6 years ago
Add r=ferjm, fix tabs.
Attachment #701816 - Attachment is obsolete: true
Attachment #701816 - Flags: review?(philipp)
Attachment #701875 - Flags: review+
(Assignee)

Comment 115

6 years ago
This patch uses the same algorithm used before. No actual improvements made. But since we don't do such query in Gaia, I think that's acceptable for now.
Attachment #701883 - Flags: review?(philipp)
Attachment #701883 - Flags: review?(ferjmoreno)
(Assignee)

Comment 116

6 years ago
Add more test cases. Philipp, do you want me to convert it into a mochitest one?
Attachment #701038 - Attachment is obsolete: true
Attachment #701885 - Flags: feedback?(philipp)
Try run for abd60572c6f0 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=abd60572c6f0
Results (out of 19 total builds):
    exception: 13
    failure: 6
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/vyang@mozilla.com-abd60572c6f0
(Assignee)

Comment 118

6 years ago
(In reply to Mozilla RelEng Bot from comment #117)
> Try run for abd60572c6f0 is complete.
> Detailed breakdown of the results available here:
>     https://tbpl.mozilla.org/?tree=Try&rev=abd60572c6f0

I just cancelled it. Another run with exactly the patches attached above is available in https://tbpl.mozilla.org/?tree=Try&rev=0d04aeeffd4e
(Assignee)

Comment 119

6 years ago
(In reply to Philipp von Weitershausen [:philikon] from comment #104)
> Using Marionette and the emulator's SMS simulation to test
> SmsDatabaseService seems a bit overkill. Take a look at
> test_smsdatabaseservice.xul, it's a Brower Chrome Mochitest that
> you can run if you build Firefox with --enable-b2g-ril.

Actually I don't think test_smsdatabaseservice.xul is still valid. It's still using SmsRequestManager, which had been removed long ago in bug 775997!
Comment on attachment 701883 [details] [diff] [review]
Plan B - Part 2-3: rewrite filtering by multiple phone numbers

Review of attachment 701883 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Vicamo! Looks good to me.

::: dom/sms/src/ril/SmsDatabaseService.js
@@ +577,5 @@
> +    }
> +    let queues = aMessageList.filters[aWhich].queues;
> +    let q = queues[aIndex];
> +    if (aMessageId) {
> +      if (!aIndex) { // Timestamp.

nit:
if (!aIndex) {
  // Timestamp.
  ...
} else {
  // Numbers.
  ...
}
Attachment #701883 - Flags: review?(ferjmoreno) → review+
Comment on attachment 701875 [details] [diff] [review]
Plan B - Part 2-2: rewrite filtering by mix of any two conditions : v4

Review of attachment 701875 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/sms/src/ril/SmsDatabaseService.js
@@ +516,5 @@
> +    // we wait for further results; either B) record timestamp is larger
> +    // then current timestamp or C) no more processing for a filter, then we
> +    // drop this id because there can't be a match anymore.
> +    for (let i = 0; i < filters.length; i++) {
> +      if (i == aWhich) continue;

Nit: braces and new line plz :)

@@ +531,5 @@
> +        if ((!aMessageList.reverse && (result.timestamp > aTimestamp)) ||
> +            (aMessageList.reverse && (result.timestamp < aTimestamp))) {
> +          // B) Cannot find a match anymore. Drop.
> +          return true;
> +        }

I think here you're making the implicit assumption that the index's read cursor will return messages in timestamp order. I don't think that assumption is valid.

I *think* that the index's cursor returns results in index value order, but your index values are [data, timestamp]. So you might be iterating through some low `data` values associated with high `timestamp` values and falsely bailing out early here.

Or am I missing something?

@@ +1009,5 @@
> +          if (singleFilter) {
> +            request.onsuccess = singleFilterSuccessCb;
> +            request.onerror = singleFilterErrorCb;
> +          } else {
> +            let me = which++;

`which` and `me` aren't very descriptive variable names. `numberOfContexts` and `currentContext` perhaps?
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #119)
> (In reply to Philipp von Weitershausen [:philikon] from comment #104)
> > Using Marionette and the emulator's SMS simulation to test
> > SmsDatabaseService seems a bit overkill. Take a look at
> > test_smsdatabaseservice.xul, it's a Brower Chrome Mochitest that
> > you can run if you build Firefox with --enable-b2g-ril.
> 
> Actually I don't think test_smsdatabaseservice.xul is still valid. It's
> still using SmsRequestManager, which had been removed long ago in bug 775997!

Ugh yeah. This sucks. This code needs good tests!

(In reply to Vicamo Yang [:vicamo][:vyang] from comment #116)
> Created attachment 701885 [details] [diff] [review]
> Part 3: test cases for mixed filter targets.
> 
> Add more test cases. Philipp, do you want me to convert it into a mochitest
> one?

Meh, at least the Marionette tests are executed right now. It sounds like the browser chrome test (test_smsdatabaseservice.xul) wasn't even executed on our CI.
Attachment #701883 - Flags: review?(philipp) → review+
Comment on attachment 701885 [details] [diff] [review]
Part 3: test cases for mixed filter targets.

Review of attachment 701885 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/sms/tests/marionette/test_filter_mixed.js
@@ +110,5 @@
> +      }
> +    } else if (count < NUM_THREADS) {
> +      sendMessage(count);
> +    } else {
> +      window.setTimeout(testDeliveryAndNumber, 0);

I suggest implementing an array of test function and a helper to run the next test. Something like:

  var tests = [];

  function run_next_test() {
    if (!tests.length) {
      return;
    }
    var f = tests.shift();
    window.setTimeout(f);
  };

And then you can just do

  tests.push(function testDeliverAndNumber() {
    ...
  });

for each test function and the clean up function :). We should really add those helpers to Marionette and Mochitest...

r=me with that.
Attachment #701885 - Flags: feedback?(philipp) → review+
(In reply to Philipp von Weitershausen [:philikon] from comment #121)
> @@ +531,5 @@
> > +        if ((!aMessageList.reverse && (result.timestamp > aTimestamp)) ||
> > +            (aMessageList.reverse && (result.timestamp < aTimestamp))) {
> > +          // B) Cannot find a match anymore. Drop.
> > +          return true;
> > +        }
> 
> I think here you're making the implicit assumption that the index's read
> cursor will return messages in timestamp order. I don't think that
> assumption is valid.
> 
> I *think* that the index's cursor returns results in index value order, but
> your index values are [data, timestamp]. So you might be iterating through
> some low `data` values associated with high `timestamp` values and falsely
> bailing out early here.
> 
> Or am I missing something?

I think I figured out what I'm missing. We support value ranges only for dates. For every other field, `data` will always be the same value, so [data, timestamp] will be sorted by `timestamp`. Nice! That's very clever. Maybe adding a comment there would help the next guy understand this. Thanks! :)
Should we wait for Vicamo to come online (6:27 AM his time) or should we get someone else to land this?
Since ferjm and I only had nits left and the blocker deadline is today, we went ahead and landed this:

https://hg.mozilla.org/releases/mozilla-b2g18/rev/da2dc0be1c71
https://hg.mozilla.org/releases/mozilla-b2g18/rev/415c88d72c1e
https://hg.mozilla.org/releases/mozilla-b2g18/rev/d2bc7e26a789
https://hg.mozilla.org/releases/mozilla-b2g18/rev/d7999c3c2800
https://hg.mozilla.org/releases/mozilla-b2g18/rev/4935100647bb

Inbound is closed atm, will land asap there.

Vicamo, can you please file a bug to address the nits mentioned in comments 120 through 123? Thanks!
I just got a crash opening an SMS thread when trying to re-commission my dogfood phone.  Processing minidump atm but this bug looks suspicious ...
Posted file Minidump
Crash reason:  SIGSEGV
Crash address: 0x10

Thread 0 (crashed)
 0  libxul.so!XPCNativeScriptableCreateInfo::SetCallback [nsCOMPtr.h : 435 + 0x0]
     r4 = 0x00000010    r5 = 0x4b402330    r6 = 0x4b402330    r7 = 0xbe8c2688
     r8 = 0x00000002    r9 = 0x417933b0   r10 = 0xbe8c2990    fp = 0x4b283860
     sp = 0xbe8c2548    lr = 0x40d750f5    pc = 0x409f807c
    Found by: given as instruction pointer in context
 1  libxul.so!nsCOMPtr_base::assign_with_AddRef [nsCOMPtr.cpp : 49 + 0x7]
     r4 = 0x00000010    r5 = 0x4b402330    r6 = 0x4b402330    r7 = 0xbe8c2688
     r8 = 0x00000002    r9 = 0x417933b0   r10 = 0xbe8c2990    fp = 0x4b283860
     sp = 0xbe8c2550    pc = 0x40d750f5
    Found by: call frame info
 2  libxul.so!mozilla::dom::sms::SmsRequest::NotifyNextMessageInListGot [nsCOMPtr.h : 622 + 0x5]
     r4 = 0xbe8c2650    r5 = 0x4b25ebc0    r6 = 0x4b402330    r7 = 0xbe8c2688
     r8 = 0x00000002    r9 = 0x417933b0   r10 = 0xbe8c2990    fp = 0x4b283860
     sp = 0xbe8c2560    pc = 0x40d4942f
    Found by: call frame info

Gonna need to come out ... :(
STR crash
 (1) Open sms app
 (2) Open sms thread from main screen

First thread I opened crashed while opening (never showed conversation view before crashing).  Subsequent attempts to open threads either crashed or didn't do anything at all, with no error messages in logcat.
Backed out:

https://hg.mozilla.org/releases/mozilla-b2g18/rev/e855f4ca39a7
https://hg.mozilla.org/releases/mozilla-b2g18/rev/3c7d637932d3

Inbound is closed again, will back out there later.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Pushing this performance optimization out to tef+ per earlier discussion with gal.
blocking-b2g: --- → tef+
blocking-basecamp: + → -
Most of the time I can tap on a thread and nothing happens. Only sometimes I can get it to crash. The reason for this is probably the following exception:

I/Gecko   (  105): SmsDatabaseService: filter.numbers +1650xxxxxx
I/Gecko   (  105): SmsDatabaseService: onNextMessageInListGot - 5
I/Gecko   (  105): SmsDatabaseService: Fetching message 5
I/Gecko   (  105): SmsDatabaseService: createMessageFromRecord: {"deliveryIndex":["received",1358215186000],"numberIndex":[["+1650xxxx",1358215186000],["+1650xxxxx",1358215186000]],"readIndex":[1,1358215186000],"delivery":"received","deliveryStatus":"success","sender":"+1650xxxxxx","receiver":"+1650xxxxx","body":"Hello","messageClass":"normal","timestamp":1358215186000,"read":1,"id":5}
I/Gecko   (  105): SmsDatabaseService: notifyMessageListCreated 5
I/Gecko   (  105): SmsDatabaseService: onNextMessageInListGot - 4
I/Gecko   (  105): SmsDatabaseService: Cursor.continue() not called yet
I/Gecko   (  105): SmsDatabaseService: Getting next message in list 1
I/Gecko   (  105): SmsDatabaseService: onNextMessageInListGot - 0
I/Gecko   (  105): SmsDatabaseService: Fetching message 4
I/Gecko   (  105): SmsDatabaseService: createMessageFromRecord: undefined
E/GeckoConsole(  105): [JavaScript Error: "record is undefined" {file: "jar:file:///system/b2g/omni.ja!/components/SmsDatabaseService.js" line: 417}]

This problem can be fixed with 

diff --git a/dom/sms/src/ril/SmsDatabaseService.js b/dom/sms/src/ril/SmsDatabaseService.js
--- a/dom/sms/src/ril/SmsDatabaseService.js
+++ b/dom/sms/src/ril/SmsDatabaseService.js
@@ -458,20 +458,20 @@ SmsDatabaseService.prototype = {
 
     if (aMessageList.results[0] < 0) {
       aRequest.notifyReadMessageListFailed(Ci.nsISmsRequest.INTERNAL_ERROR);
       return;
     }
 
     let firstMessageId = aMessageList.results.shift();
     if (DEBUG) debug ("Fetching message " + firstMessageId);
-    let request = aObjectStore.get(aMessageId);
+    let request = aObjectStore.get(firstMessageId);
     let self = this;
     request.onsuccess = function onsuccess(event) {
-      let sms = self.createMessageFromRecord(event.target.result);
+      let sms = self.createMessageFromRecord(request.result);
       if (aMessageList.listId >= 0) {
         if (DEBUG) debug("notifyNextMessageInListGot " + firstMessageId);
         aRequest.notifyNextMessageInListGot(sms);
       } else {
         self.lastMessageListId += 1;
         aMessageList.listId = self.lastMessageListId;
         self.messageLists[self.lastMessageListId] = aMessageList;
         if (DEBUG) debug("notifyMessageListCreated " + firstMessageId);

Now I get a crash every time I have a thread with more than one message in it:

Program received signal SIGSEGV, Segmentation fault.
nsCOMPtr_base::assign_assuming_AddRef (this=0x10, newPtr=0x47813ce0) at ../../dist/include/nsCOMPtr.h:435
435	          nsISupports* oldPtr = mRawPtr;
(gdb) bt
#0  nsCOMPtr_base::assign_assuming_AddRef (this=0x10, newPtr=0x47813ce0) at ../../dist/include/nsCOMPtr.h:435
#1  0x40d6f884 in nsRefPtr<mozilla::dom::indexedDB::IDBIndex>::assign_with_AddRef (this=0x10, rawPtr=0x47813ce0)
    at ../../dist/include/nsAutoPtr.h:846
#2  0x40d43bbe in nsCOMPtr<nsIDOMMozSmsMessage>::operator= (this=0x485656c0, aMessage=<value optimized out>)
    at ../../../dist/include/nsCOMPtr.h:622
#3  mozilla::dom::sms::SmsCursor::SetMessage (this=0x485656c0, aMessage=<value optimized out>)
    at /home/philipp/dev/m-b2g/dom/sms/src/SmsCursor.h:47
#4  mozilla::dom::sms::SmsRequest::NotifyNextMessageInListGot (this=0x485656c0, aMessage=<value optimized out>)
    at /home/philipp/dev/m-b2g/dom/sms/src/SmsRequest.cpp:422
<snip>
I suspect that when createMessageList() and getNextMessageInList() are called, they get different aRequest objects as parameters, even if in the content process it is indeed the exact same request object. So when we hold onto the request that's passed to createMessageList(), like such:

      let onNextMessageInListGotCb =
        self.onNextMessageInListGot.bind(self, aRequest, store, messageList);

and then later call request.notifyNextMessageInListGot() on it when answering to a getNextMessageInList() call, we're writing to a request that has long been cycle-collected.

I suspect we must buffer the `aRequest` objects as well and answer to each one in order.
My suspicions from comment 135 were correct. Here's a fix.
Attachment #702131 - Flags: review?(vyang)
No more crashing, but
 - often (3/4?) times I tap on a thread, the conversation view doesn't load.  No errors in logcat.
 - on a thread with maybe ~100 messages, it still takes ~10s to load the view.  Is there a gaia followup needed for this?
(Assignee)

Comment 138

6 years ago
Comment on attachment 702131 [details] [diff] [review]
The Make-Phone-Not-Crash Addendum

Review of attachment 702131 [details] [diff] [review]:
-----------------------------------------------------------------

l have no computer access now, and review on ff mobile is really not easy :(


I push -1 at the end of the result list for errors, but in this patch we won't have that because you push it only when aMessageId is larger equal than 0. However, corresponding error reporting code is still there, so l assume there's still something has to be fixed.
Attachment #702131 - Flags: review?(vyang)
(Assignee)

Comment 139

6 years ago
is there any way we can turn on oop in marionette?
(In reply to Chris Jones [:cjones] [:warhammer] from comment #137)
> No more crashing, but
>  - often (3/4?) times I tap on a thread, the conversation view doesn't load.
> No errors in logcat.

That's probably an unrelated problem.

>  - on a thread with maybe ~100 messages, it still takes ~10s to load the
> view.  Is there a gaia followup needed for this?

There shouldn't be.

(In reply to Vicamo Yang [:vicamo][:vyang] from comment #138)
> I push -1 at the end of the result list for errors, but in this patch we
> won't have that because you push it only when aMessageId is larger equal
> than 0.

I did not change those semantics. The treatment of -1 and 0 is exactly the same as in your patch.
Comment on attachment 702131 [details] [diff] [review]
The Make-Phone-Not-Crash Addendum

Fernando, can you review this please? It might not be perfect yet, but it's a start :)
Attachment #702131 - Flags: review?(ferjmoreno)
(Assignee)

Comment 142

6 years ago
(In reply to Philipp von Weitershausen [:philikon] from comment #140)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #137)
> > No more crashing, but
> >  - often (3/4?) times I tap on a thread, the conversation view doesn't load.
> > No errors in logcat.
> 
> That's probably an unrelated problem.
> 
> >  - on a thread with maybe ~100 messages, it still takes ~10s to load the
> > view.  Is there a gaia followup needed for this?
> 
> There shouldn't be.

it's possible because these patches do not improve getThreadList().

> (In reply to Vicamo Yang [:vicamo][:vyang] from comment #138)
> > I push -1 at the end of the result list for errors, but in this patch we
> > won't have that because you push it only when aMessageId is larger equal
> > than 0.
> 
> I did not change those semantics. The treatment of -1 and 0 is exactly the
> same as in your patch.

please check again in func onNewMessageInListGot, it's different. You push only positive values while the original code pushed all non zero values
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #142)
> (In reply to Philipp von Weitershausen [:philikon] from comment #140)
> > (In reply to Chris Jones [:cjones] [:warhammer] from comment #137)
> > > No more crashing, but
> > >  - often (3/4?) times I tap on a thread, the conversation view doesn't load.
> > > No errors in logcat.
> > 
> > That's probably an unrelated problem.
> > 
> > >  - on a thread with maybe ~100 messages, it still takes ~10s to load the
> > > view.  Is there a gaia followup needed for this?
> > 
> > There shouldn't be.
> 
> it's possible because these patches do not improve getThreadList().

I think cjones was talking about viewing an individual thread, not the thread list.

> > (In reply to Vicamo Yang [:vicamo][:vyang] from comment #138)
> > > I push -1 at the end of the result list for errors, but in this patch we
> > > won't have that because you push it only when aMessageId is larger equal
> > > than 0.
> > 
> > I did not change those semantics. The treatment of -1 and 0 is exactly the
> > same as in your patch.
> 
> please check again in func onNewMessageInListGot, it's different. You push
> only positive values while the original code pushed all non zero values

I stand corrected. Thanks for pointing that out. These subtle differences should perhaps be documented ;). I'll fix that part in my patch and test again.
philikon is correct.  Showing the thread list on startup is nice and snappy, but loading the conversation view for one thread that has 165 messages takes ~10s.  That's not usable.
I've done some tests and I am also seeing severe perf issues:

E/GeckoConsole(  425): Content JS LOG at app://sms.gaiamobile.org/js/sms.js:234 in onsuccess: First message retrieved in 151
E/GeckoConsole(  425): Content JS LOG at app://sms.gaiamobile.org/js/sms.js:239 in onsuccess: Last message retrieved in 3814
E/GeckoConsole(  425): Content JS LOG at app://sms.gaiamobile.org/js/sms.js:240 in onsuccess: Number of messages retrieved 35

E/GeckoConsole( 2031): Content JS LOG at app://sms.gaiamobile.org/js/sms.js:234 in onsuccess: First message retrieved in 159
E/GeckoConsole( 2031): Content JS LOG at app://sms.gaiamobile.org/js/sms.js:239 in onsuccess: Last message retrieved in 8279
E/GeckoConsole( 2031): Content JS LOG at app://sms.gaiamobile.org/js/sms.js:240 in onsuccess: Number of messages retrieved 61
Comment on attachment 702131 [details] [diff] [review]
The Make-Phone-Not-Crash Addendum

Review of attachment 702131 [details] [diff] [review]:
-----------------------------------------------------------------

The patch looks good. However we still need to deal with this perf issues :|.

::: dom/sms/src/ril/SmsDatabaseService.js
@@ +426,5 @@
>    },
>  
>    /**
>     * Queue up passed message id, reply if necessary. 'aMessageId' = 0 for no
>     * more messages, negtive for errors and valid otherwise.

Typo: 'negative'

@@ +461,5 @@
>        return;
>      }
>  
> +    let firstMessageId = aMessageList.results.shift();
> +    if (DEBUG) debug ("Message to fetch: " + firstMessageId);    

nit: trailing whitespaces.
Attachment #702131 - Flags: review?(ferjmoreno) → review+
(Assignee)

Comment 147

6 years ago
I'm back to Taipei and is working on it.
(Assignee)

Comment 148

6 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #144)
> philikon is correct.  Showing the thread list on startup is nice and snappy,
> but loading the conversation view for one thread that has 165 messages takes
> ~10s.  That's not usable.

https://github.com/mozilla-b2g/gaia/blob/master/apps/sms/js/sms.js#L218
As you can see, Gaia collects all messages and then renders the conversation view. So all works here is still in vain. :(
(Assignee)

Comment 149

6 years ago
1) Include necessary changes from attachment #702131 [details] [diff] [review] by philikon: replace waitCount with requestsWaiting. I've also added a while-loop in onNextMessageInListGot() to serve as many queued requests as possible.
2) Move createRangedRequest() and createSimpleRangedRequest() from Part 2-2 to this patch because they're first used here.
3) Use numberOfContexts instead of numFilterTargets for unified naming in latter patches
4) Add comments after FIXME.
Attachment #701874 - Attachment is obsolete: true
Attachment #702384 - Flags: review+
(Assignee)

Comment 150

6 years ago
Ooops! forget `hg qrefresh` first.
Attachment #702384 - Attachment is obsolete: true
Attachment #702385 - Flags: review+
(Assignee)

Updated

6 years ago
Attachment #702131 - Attachment is obsolete: true
(Assignee)

Comment 151

6 years ago
1) Address review comment #121 and rename 'messageList.filters' back to 'messageList.contexts'
2) Include necessary fixes for SmsRequest crash.
Attachment #701875 - Attachment is obsolete: true
Attachment #702389 - Flags: review+
(Assignee)

Comment 152

6 years ago
1) Address review comment #120, also renamed 'index' to 'queueIndex'.
2) Include necessary fixes for SmsRequest crash.
Attachment #701883 - Attachment is obsolete: true
Attachment #702392 - Flags: review+
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #148)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #144)
> > philikon is correct.  Showing the thread list on startup is nice and snappy,
> > but loading the conversation view for one thread that has 165 messages takes
> > ~10s.  That's not usable.
> 
> https://github.com/mozilla-b2g/gaia/blob/master/apps/sms/js/sms.js#L218
> As you can see, Gaia collects all messages and then renders the conversation
> view. So all works here is still in vain. :(

Is there a bug on file for this? If not, please file one and get it on Borja's radar.
(Assignee)

Comment 154

6 years ago
Address review comment #123, add a test tasks management object.
Attachment #701885 - Attachment is obsolete: true
Attachment #702412 - Flags: review+
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #148)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #144)
> > philikon is correct.  Showing the thread list on startup is nice and snappy,
> > but loading the conversation view for one thread that has 165 messages takes
> > ~10s.  That's not usable.
> 
> https://github.com/mozilla-b2g/gaia/blob/master/apps/sms/js/sms.js#L218
> As you can see, Gaia collects all messages and then renders the conversation
> view. So all works here is still in vain. :(

Indeed :\ That's why I added the debug messages from comment 145, so we can get the time required to get the messages, not only the visual feedback. If I am not wrong, Borja already had some code to address this issue in his github account.
Flags: needinfo?(fbsc)
(Assignee)

Comment 156

6 years ago
https://tbpl.mozilla.org/?tree=Try&rev=9fa6bafacd57 and I've also verify them with emulator, no Unagi at hand right now.
(Reporter)

Comment 157

6 years ago
Yep, this problem in Gaia it's gonna be solved in the following bug https://bugzilla.mozilla.org/show_bug.cgi?id=827815. However, the main purpose of the Gecko side bug it's optimize the time for getting the cursor, because when having a large amount of SMSs from multiple phone numbers this time it's big, and must be decreased.

That's why, for your timing tests, the useful info it's the one explained in https://bugzilla.mozilla.org/show_bug.cgi?id=813978#c145 . In Gaia we will have a patch for getting a better user experience (Im defining some stuff with UX about it, we are going to get some sort of 'pagination'), but the optimization in Gecko it's needed as first step.
Flags: needinfo?(fbsc)
(Assignee)

Comment 158

6 years ago
(In reply to Borja Salguero [:borjasalguero] from comment #157)
> Yep, this problem in Gaia it's gonna be solved in the following bug
> https://bugzilla.mozilla.org/show_bug.cgi?id=827815. However, the main
> purpose of the Gecko side bug it's optimize the time for getting the cursor,
> because when having a large amount of SMSs from multiple phone numbers this
> time it's big, and must be decreased.

I'm not quite sure which Gecko call do you want to optimize? Getting the 'first' cursor or 'all' cursors?

> That's why, for your timing tests, the useful info it's the one explained in
> https://bugzilla.mozilla.org/show_bug.cgi?id=813978#c145 .

I don't know, I can't find any time stamps in comment #145, but I managed to fetch 100 messages out of 2000(20 sender x 100 sms each) in 2 seconds in emulator. Is it fast enough?

> In Gaia we will
> have a patch for getting a better user experience (Im defining some stuff
> with UX about it, we are going to get some sort of 'pagination'), but the
> optimization in Gecko it's needed as first step.

I think we can even land Gaia code first. Is there anything blocks Gaia from using async model?
We need to get the first cursor as soon as possible, then the UI can fetch results lazily.

2 seconds for 100 messages still seems slow, but maybe the performance is better on device than on the emulator.
(Reporter)

Comment 160

6 years ago
+1 to Fabrice comment. In my UI patch I need to get the cursor as fast as possible, and then I will apply pagination and lazy-loading for improve user experience in Gaia.
(Assignee)

Comment 161

6 years ago
(In reply to Fabrice Desré [:fabrice] from comment #159)
> We need to get the first cursor as soon as possible, then the UI can fetch
> results lazily.

Great! Then we have it.

> 2 seconds for 100 messages still seems slow, but maybe the performance is
> better on device than on the emulator.

It means the first cursor arrives in 20ms in average. The whole screen may only contain less than 10 messages. So we can render the screen in one blink and watch the scroll bar going shorter and shorter in rest of the 2 seconds. I would say that's good enough.
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #158)
> > That's why, for your timing tests, the useful info it's the one explained in
> > https://bugzilla.mozilla.org/show_bug.cgi?id=813978#c145 .
> 
> I don't know, I can't find any time stamps in comment #145, but I managed to
> fetch 100 messages out of 2000(20 sender x 100 sms each) in 2 seconds in
> emulator. Is it fast enough?

The only thing that matters is device tests, preferably on Otoro, to get the performance characteristics right.

(In reply to Vicamo Yang [:vicamo][:vyang] from comment #149)
> 1) Include necessary changes from attachment #702131 [details] [diff] [review] [diff]
> [review] by philikon: replace waitCount with requestsWaiting. I've also
> added a while-loop in onNextMessageInListGot() to serve as many queued
> requests as possible.

I just realized that there can only ever be one request in the queue. Because content can only call cursor.continue() once, not multiple times. So I don't think we we need a waitCount or waiting list, just *one* waiting spot.
With the latest patches, I see a bunch of erratic behaviour:

* Sometimes tapping on a message thread won't open the thread. I see this in logcat:

01-15 16:51:18.960   105   105 I Gecko   : SmsDatabaseService: createMessageFromRecord: undefined
01-15 16:51:18.960   105   105 E GeckoConsole: [JavaScript Error: "record is undefined" {file: "jar:file:///system/b2g/omni.ja!/components/SmsDatabaseService.js" line: 417}]

So for some reason, we found a message ID through some index, but getting the corresponding record yields undefined. Weird.

* Some messages are randomly dropped or duplicated. I texted back and forth with a friend today and every time I open the thread, I see a different subset (incl dupes occasionally) of our messages. Examining the debug output of SmsDatabaseService doesn't reveal anything obvious. onNextMessageInListGot and createMessageFromRecord are called for all expected messages in the right order... I don't recall having observed this bug before, so I don't think it's a Gaia bug but has to be related to the patches here.


At this point I'm a bit worried about the size and number of patches here and the obscure bugs its introducing. As far as I can tell, we only need to make very simple queries work. Why can't we work off of Fabrice's much smaller patch and simply not support the other uses cases (throw NOT_IMPLEMENTED_ERROR or something), or fall back to the slow code path.
(Assignee)

Comment 164

6 years ago
(In reply to Philipp von Weitershausen [:philikon] from comment #163)
> With the latest patches, I see a bunch of erratic behaviour:
> 
> * Sometimes tapping on a message thread won't open the thread.
> I see this in logcat:
> 
> 01-15 16:51:18.960   105   105 I Gecko   : SmsDatabaseService:
> createMessageFromRecord: undefined
> 01-15 16:51:18.960   105   105 E GeckoConsole: [JavaScript
> Error: "record is undefined" {file:
> "jar:file:///system/b2g/omni.ja!/components/SmsDatabaseService.js"
> line: 417}]
> 
> So for some reason, we found a message ID through some index,
> but getting the corresponding record yields undefined. Weird.

I have this, too. But that's a indexedDb issue, isn't it? If you just kill the sms app and re-launch it again, it often comes back.

> * Some messages are randomly dropped or duplicated. I texted
> back and forth with a friend today and every time I open the
> thread, I see a different subset (incl dupes occasionally) of
> our messages.

I have this issue, too.

> Why can't we work off of Fabrice's much smaller patch

Because Fabrice's patch assumes "messages sorted by id are also sorted by timestamp", which is actually wrong. The timestamps of received messages are from SMSC UTC time and those of sent messages are from device time, but message IDs are from the order of delivery, no strict relation with message timestamps. Having such assumption creates another message ordering bug. If you want to fix it, you'll find out it can't go without my part 1, using multi-key indexes.

Fabrice's patch shortcuts the most simple cases, 1 or 0 filtering constraints, as my part 2-1 does.
(Assignee)

Comment 165

6 years ago
Comment on attachment 702385 [details] [diff] [review]
Plan B - Part 2-1: rewrite filtering by only one condition : v3

Review of attachment 702385 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/sms/src/ril/SmsDatabaseService.js
@@ +472,5 @@
> +      }
> +
> +      let firstMessageId = aMessageList.results.shift();
> +      if (DEBUG) debug ("Fetching message " + firstMessageId);
> +      let getRequest = aObjectStore.get(aMessageId);

Should be 'firstMessageId' here.
Attachment #702385 - Flags: review+ → review-
(Assignee)

Comment 166

6 years ago
With this revision, I've no longer had issues listed in comment #163. "record is undefined" is gone, no message dropped, duplicated, listed with wrong timestamp order.
Attachment #702385 - Attachment is obsolete: true
Attachment #702735 - Flags: review+
Attachment #702735 - Flags: feedback?(philipp)
(In reply to Vicamo Yang [:vicamo][:vyang] from comment #165)
> > +      let firstMessageId = aMessageList.results.shift();
> > +      if (DEBUG) debug ("Fetching message " + firstMessageId);
> > +      let getRequest = aObjectStore.get(aMessageId);
> 
> Should be 'firstMessageId' here.

Grrr.

(In reply to Vicamo Yang [:vicamo][:vyang] from comment #166)
> Created attachment 702735 [details] [diff] [review]
> Plan B - Part 2-1: rewrite filtering by only one condition : v4
> 
> With this revision, I've no longer had issues listed in comment #163.
> "record is undefined" is gone, no message dropped, duplicated, listed with
> wrong timestamp order.

I can confirm this! Yay.
Comment on attachment 702735 [details] [diff] [review]
Plan B - Part 2-1: rewrite filtering by only one condition : v4

Review of attachment 702735 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/sms/src/ril/SmsDatabaseService.js
@@ +839,5 @@
> +        stop: false,
> +        // Local contexts for multiple filter targets' case.
> +        contexts: [],
> +        // Pending createMessageList or getNextMessageInList SmsRequests.
> +        requestsWaiting: [aRequest],

As pointed out in comment 162, this does not need to be an array because there can only ever be one request waiting. Means we can get rid of the `while` loops, too. I think we should address that before landing. Looks good otherwise!
Attachment #702735 - Flags: feedback?(philipp)
(Assignee)

Comment 169

6 years ago
1) Address review comment #168, remove while-loop in onNextMessageInListGot for we can only have one SmsRequest. Described in comments as well.
2) Move messageList.contexts to latter patch
3) Move multiple numbers filtering support to latter patch
Attachment #702735 - Attachment is obsolete: true
Attachment #703188 - Flags: review+
(Assignee)

Comment 170

6 years ago
1) Add messageList.contexts
2) Move multiple numbers filtering support to latter patch
Attachment #702389 - Attachment is obsolete: true
Attachment #703189 - Flags: review+
(Assignee)

Comment 172

6 years ago
Add r=philikon only
Attachment #702412 - Attachment is obsolete: true
Attachment #703191 - Flags: review+
Comment on attachment 703188 [details] [diff] [review]
Plan B - Part 2-1: rewrite filtering by only one condition : v5

[Triage Comment]
Attachment #703188 - Flags: approval-mozilla-b2g18+
Comment on attachment 703189 [details] [diff] [review]
Plan B - Part 2-2: rewrite filtering by mix of any two conditions : v6

[Triage Comment]
Attachment #703189 - Flags: approval-mozilla-b2g18+
Comment on attachment 703190 [details] [diff] [review]
Plan B - Part 2-3: rewrite filtering by multiple phone numbers : v3

[Triage Comment]
Attachment #703190 - Flags: approval-mozilla-b2g18+
Comment on attachment 703191 [details] [diff] [review]
Part 3: test cases for mixed filter targets : v3

[Triage Comment]
Attachment #703191 - Flags: approval-mozilla-b2g18+
Which bug is the gaia followup to make rendering a conversation with 165 messages take less than 10 seconds?
(Assignee)

Comment 181

6 years ago
(In reply to Chris Jones [:cjones] [:warhammer] from comment #180)
> Which bug is the gaia followup to make rendering a conversation
> with 165 messages take less than 10 seconds?

Bug 827815.
Landed on mozilla-b2g18/gaia master prior to the 1/25 branching to mozilla-b2g18_v1_0_0/v1.0.0, updating status-b2g-v1.0.0 to fixed.
You need to log in before you can comment on or make changes to this bug.