Closed Bug 613379 Opened 15 years ago Closed 14 years ago

Change the default search mode of the main page search box to "contains" instead of "is exactly"

Categories

(Socorro :: Webapp, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ehsan.akhgari, Assigned: adrian)

This happens to me every time I want to search Socorro for a signature. I type a function name in the search box, press Enter, and get redirected to a page like <http://crash-stats.mozilla.com/query/query?do_query=1&product=Firefox&version=Firefox%3A&query_search=signature&query_type=exact&query=XPC_WN_GetterSetter> which tells me that no results were found. Then I have to click Advanced Filters, change the search mode to "contains" and resubmit the search. It would be really helpful if "contains" were the default mode.
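For reference, the manual workaround described above amounts to changing one query parameter. A minimal sketch, assuming the webapp accepts `query_type=contains` on the same endpoint as the URL in the report; `quick_search_url` is a hypothetical helper, not part of Socorro:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint copied from the URL in the report above.
BASE_URL = "http://crash-stats.mozilla.com/query/query"


def quick_search_url(fragment, query_type="contains", product="Firefox"):
    """Build a quick-search URL (hypothetical helper, not Socorro code)."""
    params = {
        "do_query": 1,
        "product": product,
        "version": f"{product}:",
        "query_search": "signature",
        # Assumption: "contains" is the value the webapp accepts here.
        "query_type": query_type,
        "query": fragment,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = quick_search_url("XPC_WN_GetterSetter")
print(url)
print(parse_qs(urlsplit(url).query)["query_type"])
```

Passing `query_type="exact"` reproduces the URL shape that the quick search generates today.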
Thanks for reporting this Ehsan - I'm actually working on that problem right now in Bug 609070. We're hoping to get this pushed to production around Dec 2.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
(In reply to comment #1)
> Thanks for reporting this Ehsan - I'm actually working on that problem right
> now in Bug 609070. We're hoping to get this pushed to production around Dec 2.

Did I mention that you're awesome? Thanks! :-)
Actually, this was never fixed, and it's still hurting everybody who's searching for signatures on crash-stats.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Flags: in-testsuite?
OS: Mac OS X → All
Hardware: x86 → All
Ryan: given that we apparently didn't address this, mind taking it for 1.7.7?
Target Milestone: --- → 1.7.7
Assignee: nobody → ryan
Alright, I'm not sure how this was missed. The only explanation I can think of is possible dyslexia, where I thought search_type was supposed to be "exact" instead of "contains". Quick searches now have search_type set to "contains" by default.

==
Sending        webapp-php/application/controllers/query.php
Transmitting file data .
Committed revision 2909.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Note, this will be available for testing on stage at https://crash-stats.stage.mozilla.com/ until Socorro 1.7.7 is released at the end of February.
If we're going to support "contains" as default, we need to do some engineering on the database end to keep it from saturating query bandwidth. Nothing too extreme, but we'll need to do performance testing before we deploy this in production. i.e. do NOT deploy this to production until we have done performance testing. Thank you.

Hence, I'm reopening the bug to indicate that we need to do the testing.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Here are sample queries that can be used for testing. The following 2 queries are run every time a search is performed using the Contains option for the crash Signature. 1 week is the default query time range for the quick search. 12 weeks is the maximum query time range for logged in users.

==
SELECT COUNT(DISTINCT reports.signature) as count
FROM reports
WHERE (reports.product = 'Firefox')
  AND (reports.os_name = 'Windows NT')
  AND reports.signature LIKE '%UserCallWinProcCheckW%'
  AND reports.date_processed BETWEEN
      CAST('02/11/2011 14:30:30' AS TIMESTAMP WITHOUT TIME ZONE) - CAST('1 weeks' AS INTERVAL)
      AND CAST('02/11/2011 14:30:30' AS TIMESTAMP WITHOUT TIME ZONE)

==
SELECT reports.signature,
       count(reports.id),
       count(CASE WHEN (reports.os_name = 'Windows NT') THEN 1 END) AS is_windows,
       count(CASE WHEN (reports.os_name = 'Mac OS X') THEN 1 END) AS is_mac,
       count(CASE WHEN (reports.os_name = 'Linux') THEN 1 END) AS is_linux,
       count(CASE WHEN (reports.os_name = 'Solaris') THEN 1 END) AS is_solaris,
       SUM (CASE WHEN hangid IS NULL THEN 0 ELSE 1 END) AS numhang,
       SUM (CASE WHEN process_type IS NULL THEN 0 ELSE 1 END) AS numplugin
FROM reports
WHERE (reports.product = 'Firefox')
  AND (reports.os_name = 'Windows NT')
  AND reports.signature LIKE '%UserCallWinProcCheckW%'
  AND reports.date_processed BETWEEN
      CAST('02/11/2011 14:30:30' AS TIMESTAMP WITHOUT TIME ZONE) - CAST('1 weeks' AS INTERVAL)
      AND CAST('02/11/2011 14:30:30' AS TIMESTAMP WITHOUT TIME ZONE)
GROUP BY reports.signature
ORDER BY count(reports.id) DESC
LIMIT 300 OFFSET 0
Ryan,

Does the search always include a product and/or OS as well as a date limit? If so, we don't have an issue here with signature; those parameters restrict the search enough. Although I might want to adjust the indexes. Always helpful to see the actual query.

Also: the second query is a bit screwed up. It's filtering by os_name, and then counting by os_name ... even though os_names which don't match the filter are always going to be zero. Perhaps we could do better constructing this query?
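To illustrate the point about filtering and then counting by os_name, here is a small sketch. It uses an in-memory SQLite table rather than PostgreSQL, a cut-down `reports` schema, and made-up rows, so it only demonstrates the logic, not the real query plan or timings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the reports table (assumption: only these columns).
conn.execute(
    "CREATE TABLE reports (id INTEGER PRIMARY KEY, signature TEXT, os_name TEXT)"
)
conn.executemany(
    "INSERT INTO reports (signature, os_name) VALUES (?, ?)",
    [
        ("UserCallWinProcCheckW", "Windows NT"),
        ("UserCallWinProcCheckW", "Windows NT"),
        ("UserCallWinProcCheckW", "Mac OS X"),
        ("other", "Linux"),
    ],
)


def per_os_counts(os_filter=None):
    """Return (signature, total, is_windows, is_mac) rows, optionally OS-filtered."""
    where = "WHERE signature LIKE ?"
    args = ["%UserCallWinProcCheckW%"]
    if os_filter is not None:
        where += " AND os_name = ?"
        args.append(os_filter)
    sql = f"""
        SELECT signature,
               count(id),
               count(CASE WHEN os_name = 'Windows NT' THEN 1 END) AS is_windows,
               count(CASE WHEN os_name = 'Mac OS X' THEN 1 END) AS is_mac
        FROM reports
        {where}
        GROUP BY signature
    """
    return conn.execute(sql, args).fetchall()


# With the OS filter, every non-matching per-OS column can only be zero.
print(per_os_counts("Windows NT"))
# Without the filter, the per-OS columns carry real information.
print(per_os_counts())
```

With the filter in place, the `is_mac` column is redundant work, which is the inefficiency being pointed out.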
The search always includes a product. By default it is Firefox, or the most recently selected product in the navigation bar. The user can choose to alter the query to search for more than 1 product, but when the user uses the quick search in the upper right the query will only search for 1 product.

Searches do not include an OS by default. The user may choose to specify one or more OSes in the advanced search page.

Searches always use a date limit. Default is 1 week. Max is 12 weeks for logged in users.

I agree that the query could be written better, but that can wait until another milestone. I've opened Bug 634451 for this.
Ryan, Ok, let me test the worst case of this (product but not OS, 12 weeks) and see what it looks like on the DB server.
Ryan,

I need to veto this feature change based on performance.

First, the default search (1 week, no OS, one product = "Firefox") using this change goes from 0.4ms query execution time to over 4000ms. If the user chooses "12 weeks" it's even worse: over 130 seconds.

Making this change would have a significant adverse effect on database performance.
Josh: any way to speed it up? or are we then gated on something like elasticsearch?
Josh, in my experience everyone always uses "contains" no matter what - it just takes two steps instead of one.
Laura,

The only way would be for us to create our own index type for indexing signatures which tokenized on each character. This would be a significant development project; it would take me a couple hours of work to even tell you how difficult it would be.

I've looked at a number of other approaches, including requiring the user to select an OS for the default search, but all of them result in the default search page queries taking at least 1000X as long as they do with an exact match (or a begins-with) on signature.

The problem with making "contains" the default search is that nobody will ever select "matches exactly" once that's done ... as opposed to now, where users use "contains" only if "matches exactly" doesn't turn up what they want. So we could expect the majority of searches on the database to suddenly take 1000X as much CPU time. While the new database server is faster than the old, I don't think that it's 1000X faster.

Is it possible that we can get at the information devs want some other way? Like, why are they searching on signature fragments in the first place? Are there "families" of signatures?

Joe,

That's not what the database logs show. Keep in mind that there are "public" users, and those are the ones more likely to use the default search parameters than Mozilla's internal users.
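For readers wondering what a substring-friendly index could look like, here is a rough in-memory sketch. Note that it swaps in 3-character n-grams (trigrams) instead of the per-character tokenization mentioned above, and it is plain Python rather than a database index type, so treat it as an illustration of the idea only. `TrigramIndex` is a hypothetical name:

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy inverted index: trigram -> ids of signatures containing it."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._signatures = []

    def add(self, signature):
        sig_id = len(self._signatures)
        self._signatures.append(signature)
        for gram in trigrams(signature):
            self._postings[gram].add(sig_id)

    def search(self, fragment):
        grams = trigrams(fragment)
        if grams:
            # Only signatures containing every trigram can contain the fragment.
            candidates = set.intersection(
                *(self._postings.get(gram, set()) for gram in grams)
            )
        else:
            # Fragments shorter than 3 characters fall back to a full scan.
            candidates = range(len(self._signatures))
        # Verify candidates, since sharing trigrams does not guarantee a match.
        return sorted(
            self._signatures[i] for i in candidates if fragment in self._signatures[i]
        )


index = TrigramIndex()
for sig in [
    "js::StackSpace::pushSegmentForInvoke(JSContext*, unsigned int, js::InvokeArgsGuard*)",
    "mozalloc_abort | NS_DebugBreak_P",
    "vksaver.dll@0x3329",
]:
    index.add(sig)

print(index.search("pushSegmentForInvoke"))
print(index.search("vksaver"))
```

The lookup cost depends on how many signatures share the fragment's trigrams rather than on the size of the whole table, which is why this kind of index helps "contains" searches where an ordinary B-tree index cannot.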
(In reply to comment #14)
> Josh, in my experience everyone always uses "contains" no matter what - it just
> takes two steps instead of one.

I second that. *Nobody* would search crashstats for a full signature... That's not how the human mind works. :-)
(In reply to comment #16)
> (In reply to comment #14)
> > Josh, in my experience everyone always uses "contains" no matter what - it just
> > takes two steps instead of one.
>
> I second that. *Nobody* would search crashstats for a full signature...
> That's not how the human mind works. :-)

Really? I assumed you were largely cutting and pasting the signature (out of a bug, for example).

Would autocomplete or similar solve this?
Also: is the change to "contains" something all the devs want, or only a few? Given that this is going to slow down the default search for the majority of users, I think we should have some verification that this feature is a general request. I am currently turning on full query logging to see what current search parameters look like for all searches.
One of my concerns in changing the default to "contains" is that sometimes I want to have an exact match and not have Socorro return all the vksaver crashes, for example (Bug 614966). My workflow is I usually cut and paste the crash signature and get what I want. I use the "contains" search when I want to find a bunch of signatures that are related. Like I might type "vksaver" if I want to get all the associated crashes in one lump. But I think I use "exact" a lot more than I use "contains."
(In reply to comment #17)
> (In reply to comment #16)
> > (In reply to comment #14)
> > > Josh, in my experience everyone always uses "contains" no matter what - it just
> > > takes two steps instead of one.
> >
> > I second that. *Nobody* would search crashstats for a full signature...
> > That's not how the human mind works. :-)
>
> Really? I assumed you were largely cutting and pasting the signature (out of a
> bug, for example).

That doesn't always work in my experience. What would probably work is "begins with," though.

> Would autocomplete or similar solve this?

That would help too!
I'm going to push this out to 1.7.8: not because this isn't important, but because we need to clarify the requirements and performance tradeoffs.
Target Milestone: 1.7.7 → 1.7.8
> That doesn't always work in my experience. What would probably work is "begins
> with," though.

Oh! Well, then maybe we have an escaping issue. Can you provide a few examples?
Target Milestone: 1.7.8 → 1.7.7
It sounds like this is mostly about a way to work around bugs in "is exactly" rather than really needing "includes" as the default. I know there are bugs on "is exactly" on file, especially when copy and paste is involved.

ehsan's comment 16 contradicts this idea though. ehsan, can you expand on your comment 16?

- How do you come up with the search term you are looking for?
- Is it copied and pasted from somewhere?
- When you get the results back from a contains search, do they need further filtering, and/or what's the quality of the results that you see?

Walking through a use case to show a place where contains was useful in understanding a bug or set of bugs might be helpful.
(In reply to comment #17)
> (In reply to comment #16)
> > (In reply to comment #14)
> > > Josh, in my experience everyone always uses "contains" no matter what - it just
> > > takes two steps instead of one.
> >
> > I second that. *Nobody* would search crashstats for a full signature...
> > That's not how the human mind works. :-)
>
> Really? I assumed you were largely cutting and pasting the signature (out of a
> bug, for example).

No. If you're in a bug, it usually contains a link to the related crashstats query. Even if you copy and paste, the idea of doing exact searches will break your workflow if you fail to select the closing bracket or something like that...

> Would autocomplete or similar solve this?

It will help enormously, but it's not related to how searches should work by default, right?
Autocomplete would actually help how searches work by default. If the search box in the upper-right would autocomplete the signature as you typed it in, then it would allow you to do an exact search on that signature. So, we could move back to doing an exact search by default, but allow people to search using the "contains" option through the advanced search page.
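A minimal sketch of the autocomplete idea, assuming the webapp can get hold of a sorted list of known signatures (how that list would be produced and kept fresh is not covered here). `autocomplete` is a hypothetical helper and `vksaver.dll@0x1000` is a made-up signature for the example:

```python
import bisect


def autocomplete(sorted_signatures, prefix, limit=10):
    """Return up to `limit` signatures starting with `prefix`.

    Relies on the list being sorted: all signatures sharing a prefix sit
    next to each other, starting at the first entry >= prefix.
    """
    start = bisect.bisect_left(sorted_signatures, prefix)
    matches = []
    for signature in sorted_signatures[start:start + limit]:
        if not signature.startswith(prefix):
            break
        matches.append(signature)
    return matches


signatures = sorted([
    "vksaver.dll@0x3329",
    "vksaver.dll@0x1000",  # hypothetical signature, for illustration only
    "mozalloc_abort | NS_DebugBreak_P",
])

print(autocomplete(signatures, "vksaver"))
print(autocomplete(signatures, "mozalloc"))
```

Once the user picks one of the suggestions, the search that follows can be an exact match, which keeps the cheap query as the default. This only helps people who know how the signature starts.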
Reverting code from r2909.

==
Sending        application/controllers/query.php
Transmitting file data .
Committed revision 2933.
(In reply to comment #20)
> > Really? I assumed you were largely cutting and pasting the signature (out of a
> > bug, for example).
>
> That doesn't always work in my experience. What would probably work is "begins
> with," though.

That wouldn't work for me, though. Most of the time when I'm using a non-canned search, I want to search for related signatures where the first frame is either prefixed or ignored (so signatures like "objc_msgSend | foo" and "0x0 | foo", which in the latter case shows up in the UI as just "foo").

> > Would autocomplete or similar solve this?

Probably not in my case, but for all the beginsWith people, sure.

What about if we switched logged-in search to "contains" by default and left logged-out search "exact"? That should keep the mass of public/external users from contributing to increased load?
(In reply to comment #23)
> It sounds like this is mostly about a way to work around bugs in "is exactly"
> rather than really needing "includes" as the default. I know there are bugs on
> "is exactly" on file, especially when copy and paste is involved.
>
> ehsan's comment 16 contradicts this idea though. ehsan, can you expand on your
> comment 16?
>
> - How do you come up with the search term you are looking for?
>
> - Is it copied and pasted from somewhere?
>
> - When you get the results back from a contains search, do they need further
> filtering, and/or what's the quality of the results that you see?
>
> Walking through a use case to show a place where contains was useful in
> understanding a bug or set of bugs might be helpful.

Take bug 634387 for example. If I want to adjust to the "exact" search query, I should copy and paste "[@ mozalloc_abort | NS_DebugBreak_P]" in the search box, delete NS_DebugBreak_P and the pipe character, and make sure that all of the spacing is correct, and then press enter. If the default search mode is "contains", I can just type "mozalloc_abort" in the search box and get my results.

Another example is the current "vksaver.dll" crashes. I need to be able to query all of them, instead of doing individual queries such as "vksaver.dll@0x3329".

Another example is this crash: "js::StackSpace::pushSegmentForInvoke(JSContext*, unsigned int, js::InvokeArgsGuard*)". Nobody can expect me to have the exact signature memorized; I should be able to just search for "pushSegmentForInvoke" and get the results that I need.

It's true that sometimes the results with "contains" searches are too broad, but that's usually how you search. You search for something, and then narrow down your search. You usually don't go the other way (searching for the more specific query, and then broadening your search).

Is that convincing?
Target Milestone: 1.7.7 → 1.7.8
Assignee: ryan → nobody
Holding for ES.
Target Milestone: 1.7.8 → 1.9
Adrian will work on this as a first example with ES.
Assignee: nobody → agaudebert
Depends on: 654567
This is actually part of 2.0
Target Milestone: 1.9 → 2.0
This will be solved by the ElasticSearch implementation of the search, though it won't be a real "contains", but the default term-based search ES provides (see http://www.elasticsearch.org/ ). A new search mode will appear, called "default", which will use this term-based search if ES is enabled, or fall back to "starts_with" if PostgreSQL is used. The "contains" mode will still exist, and will still be slow for both implementations.
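A rough sketch of how the mode fallback described above could be resolved on the PostgreSQL side. The function names, the "es_term" and "exact" labels, and the `%s` placeholder style are assumptions for illustration, not the actual Socorro implementation. Signatures often contain "_" (for example "mozalloc_abort"), which is a LIKE wildcard, so the sketch escapes it:

```python
LIKE_ESCAPE = "\\"


def escape_like(term):
    """Escape LIKE wildcards so the term is matched literally."""
    out = term.replace(LIKE_ESCAPE, LIKE_ESCAPE * 2)
    out = out.replace("%", LIKE_ESCAPE + "%")
    return out.replace("_", LIKE_ESCAPE + "_")


def resolve_mode(mode, es_enabled):
    """Map the new "default" mode onto a concrete search mode."""
    if mode == "default":
        # "es_term" is a hypothetical label for the ES term-based search.
        return "es_term" if es_enabled else "starts_with"
    return mode


def postgres_condition(term, mode):
    """Return (sql_fragment, parameter) for a PostgreSQL signature filter."""
    if mode == "exact":
        return "reports.signature = %s", term
    if mode == "starts_with":
        return "reports.signature LIKE %s ESCAPE '\\'", escape_like(term) + "%"
    if mode == "contains":
        # Leading wildcard: an ordinary index cannot help, hence the slowness.
        return "reports.signature LIKE %s ESCAPE '\\'", "%" + escape_like(term) + "%"
    raise ValueError(f"mode {mode!r} is not handled by PostgreSQL")


mode = resolve_mode("default", es_enabled=False)
print(mode)
print(postgres_condition("mozalloc_abort", mode))
```

The "starts_with" pattern has no leading wildcard, so it can use an ordinary index on the signature column, which is presumably why it is the fallback when ES is not available.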
Fixed in 2.0.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 14 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro
Component: General → Webapp