Closed Bug 475754 Opened 15 years ago Closed 10 years ago

Ability to search with full functionality via the WebService

Categories

(Bugzilla :: WebService, enhancement, P1)

enhancement

Tracking

()

RESOLVED DUPLICATE of bug 477601

People

(Reporter: rosie.clarkson, Unassigned)

References

Details

Attachments

(1 file, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.5) Gecko/2008121622 Fedora/3.0.5-1.fc9 Firefox/3.0.5
Build Identifier: 

Need the full functionality of Search.pm in the WebService as well as the Bug.search basic search (Bug 398281). You ought to be able to replicate any search you can do via query.cgi through the WebService.

Reproducible: Always

Steps to Reproduce:
1.
2.
3.
This patch creates a method which takes a hash of search values as key:value pairs and returns a list of bug ids. It uses Search.pm to create the query and then queries the database.

This works but I assume it's a bit too naive. Is it wrong to query the database from the WebService? 

I've seen Bug 398308 dealing with the fact that Search.pm takes CGI objects but in the meantime I've used Bugzilla::CGI to create one

The strange copyright statements are because I've been working on this at work - it explains it a bit more at https://support.planningportal.gov.uk/wiki/index.php/Information_Asset_Register
Attachment #359301 - Attachment is patch: true
Attachment #359301 - Attachment mime type: application/octet-stream → text/plain
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Priority: -- → P1
Hardware: x86 → All
Target Milestone: --- → Bugzilla 4.0
Comment on attachment 359301 [details] [diff] [review]
patch to create new search method

>© Crown copyright 2009 - Rosie Clarkson (development@planningportal.gov.uk) for the Planning Portal
>#
># You may re-use the Crown copyright protected material (not including the Royal Arms
># and other departmental or agency logos) free of charge in any format. The material must
># be acknowledged as Crown copyright and the source given.

  That license is unacceptable. The code must either be licensed purely under the MPL or it may not be included in Bugzilla.

>@@ -209,6 +217,22 @@
>     return undef;
> }
> 
>+sub full_search{
>+	my ($self,$params) = @_;
>+	my $searchparams = new Bugzilla::CGI($params); #convert to cgi object

  Unfortunately, I want a much simpler interface than this allows us. I'd prefer something where individual boolean charts are specified like:

  { field => 'assigned_to', type => 'allwordssubstr', value => ['mkanat', 'bugzilla'] },

  The trick is how to allow OR, AND, NOT, and multiple charts, with that format.
Attachment #359301 - Flags: review-
We're pretty certain we can sort out the licensing issue and will be in touch about that.

In the meantime I can try and propose a suitable interface for the search.
Comment on attachment 359301 [details] [diff] [review]
patch to create new search method

>© Crown copyright 2009 - Rosie Clarkson (development@planningportal.gov.uk) for the Planning Portal
>#
># You may re-use the Crown copyright protected material (not including the Royal Arms
># and other departmental or agency logos) free of charge in any format. The material must
># be acknowledged as Crown copyright and the source given.

  For the record, I had a discussion by email with Chris Eveleigh, Rosie's boss, and we decided that it's OK for this copyright statement to stay at the top of the patch like this, but that the code itself should be licensed under the MPL. This statement is intended merely to allow anybody to relicense the code without contacting the Crown.
I suspect that you will be dependent upon bug 398308 to do this the right way.
Depends on: 398308
Oh, also, I was thinking that we'd call the method search_advanced.
Or even better, we could re-implement the normal Bugzilla.search criteria using Search.pm, and the Boolean Charts could just go into an "advanced =>" parameter.

Since this is a WebService interface, I'd like to be strict and require that people always specify a field, type, and value for each chart in the "advanced" parameter.

You may wish to first file a bug about re-working the existing Bug.search using Search.pm.
The API proposed by the simple search is just key=>value pairs - we'd add to this an 'advanced' key whose value would be an array of boolean charts.
This array would take the form:

advanced=>[chart]
    where chart = {negate, andset}
        where negate= bool and andset=[orset]
            where orset = [query]
                where query = {field, operator, value}


This matches the boolean chart setup on the query page where charts are made up of grouped OR statements separated by AND statements and whole charts can be negated

So now we need to translate that structure into a CGI object like this: 

{
    negate0:0, field0-0-0:product, type0-0-0:equals, value0-0-0:Bugzilla, field0-0-1:component, type0-0-1:substring, value0-0-1:WebService, field0-1-0:assigned_to, type0-1-0:substring, value0-1-0:mkanat
}

Which should be easy because you can see that the x-x-x string corresponds to the position of the value in each array advanced-andset-orset.

By doing this we can use Search.pm without depending on bug #398308. Once #398308 has been resolved this translation can be removed/modified but the interface can remain the same.

The existing basic search can be translated into boolean chart form here and added to the CGI object. But initially we will either treat the search as basic OR advanced not a mix of the two.

There are some examples at https://support.planningportal.gov.uk/wiki/index.php/API

We can start work on this on Tuesday and are happy to do this either under this bug or a new one.
(In reply to comment #9)
> advanced=>[chart]
>     where chart = {negate, andset}

  For simplicity's sake, you should probably also allow people to just specify charts directly in "chart". Perhaps we should start off with just that interface (assume all charts are ANDed), and then later add the and/or/negate interface?

>                 where query = {field, operator, value}

  I think "type" might be clearer than "operator"--I think people tend to think of "operator" as a symbol.

> By doing this we can use Search.pm without depending on bug #398308.

  Well, the problem is that I never want the API to be affected by bug 67036. Once we introduce an API, I like it to behave consistently forever. So I'd like to be able to pass in arrays directly for "value", not just comma-separated strings.

> The existing basic search can be translated into boolean chart form here and
> added to the CGI object. But initially we will either treat the search as basic
> OR advanced not a mix of the two.

  I'd like it to always support both. Perhaps the first item should be translating basic search into Search.pm format.
Thanks for the API examples. Actually, it looks pretty sensible (at least, as sensible as we can get when dealing with something like boolean charts). Perhaps we should just stick with your API.

But I do also want my simple version, which would look like:

advanced => [
  { field, type, value },
  { field, type, value },
]

Do you think that would also be OK?
thanks for another quick response.  that sounds plausible.  i've updated our API wiki page a bit.  the 'advanced' definition now looks like this:

advanced: [chart | query]
    chart := {negate: bool, chart: andset}
    andset := [orset]
    orset := [query]
    query := {field: string, operator: string, value: any}
    any := string | array | integer | bool | datetime | whatever

i'll discuss this with Rosie on Tuesday and we'll post another update.  in the meantime, keep the ideas and pointers coming.  :-)

personally, i strongly prefer 'operator' to 'type'.  the main audience for the API is, presumably, programmers.  for programmers the words "type" and "operator" have quite specific meanings.  in that context this is an operator, not a type.

by the way, do you have a preference for the format of the values of the type/operator key?  e.g. "equals" or "is equal to", "allwordssubstr" or "contains all of the strings"?
(In reply to comment #12)
> personally, i strongly prefer 'operator' to 'type'.  the main audience for the
> API is, presumably, programmers.  for programmers the words "type" and
> "operator" have quite specific meanings.  in that context this is an operator,
> not a type.

  Hmm. True, true. I've heard others use operator in this context, too. So yeah, that sounds best.

> by the way, do you have a preference for the format of the values of the
> type/operator key?  e.g. "equals" or "is equal to", "allwordssubstr" or
> "contains all of the strings"?

  Well, I want it to not have any spaces, but I also want it to be clear, particularly to people who aren't native English speakers. Perhaps we should use symbols for the ones that we can?

  The basic (most common) ones would be:

  = "is equal to"
  != "is not equal to"
  ~ "matches regex"
  !~ "does not match regex"
  contains "contains the string (case insensitive)"
  not_contains "does not contain the string"

  Note that the case-insensitivity of "contains" is actually not guaranteed currently for all fields on PostgreSQL, I believe.
  
  Then advanced ones would be:

  < "is less than"
  > "is greater than"
  contains_case "contains the string (exact case)"
  contains_strings_any "contains any of the strings" (takes an array)
  contains_strings_all "contains all of the strings" (takes an array)
  contains_strings_none "contains none of the strings" (takes an array)
  contains_words_any "contains any of the words" (takes an array)
  contains_words_all "contains all of the words" (takes an array)
  contains_words_none "contains none of the words" (takes an array)
  fulltext "matches" (perhaps should also take a "mode" argument to note whether or not boolean search operators are available, but that's hard to guarantee as an an API across DBs)

  Then all of the "changed" operators are just their display name with an underscore in place of the space.

  We could then translate these into Search.pm format much like we do with FIELD_MAP for Bug.get and Bug.search.
glad i'm not the only one who was thinking about people who don't have english as their first language.

i've got some thoughts about more symbols we can use.

also i think we can kind of 'overload' some operators depending on whether the 'value' is an array or not.  e.g. re-use "==" and "!=" for 'allwords' and 'nowords' and even "=~" and "!~" for 'allwordssubstr' and 'nowordssubstr' ... and for 'substring' and 'notsubstring' too.

as you can see i'm preferring "==" and "=~" to "=" and "~" as the two-character versions are the form used in perl and other languages.

i'll update again tomorrow with more detailed thoughts.
(In reply to comment #14)
> also i think we can kind of 'overload' some operators depending on whether the
> 'value' is an array or not.  e.g. re-use "=="

  '=', not '==', though.

> and "!=" for 'allwords' and 'nowords'

  I think that's too confusing. Would I expect (least surprise) substr or words?

> and even "=~" and "!~" for 'allwordssubstr' and 'nowordssubstr' ...
> and for 'substring' and 'notsubstring' too.

  No, because that changes the meaning from regex to something else, which is confusing.

> as you can see i'm preferring "==" and "=~" to "=" and "~" as the two-character
> versions are the form used in perl and other languages.

  Yes, but I believe that to be a mistake that became popular with C and that we have propagated unnecessarily to other languages. (See Eiffel for a good example of a better way of handling the = problems.)
thanks Max - i'll go have a look at Eiffel now - sounds good.

i've updated our wiki page with a bit of reasoning behind the various symbol ideas.
(In reply to comment #16)
> thanks Max - i'll go have a look at Eiffel now - sounds good.

  It's generally a great programming language, just unfortunately impractical to use for a shipping application. It does almost everything right.

> i've updated our wiki page with a bit of reasoning behind the various symbol
> ideas.

  Hmm. I'm opposed to overloading the symbols for any reason. So let's not overload them, I think it will keep things simpler for our code and for clients.
okay.  here's my first attempt at the complete 'advanced' API definition.

i've used (and attempted to extrapolate) your values for the operator.

the field names are copied from the FIELD_MAP and then from the cgi names for everything else.

oh, except i've changed "description" back to "comment" so that it makes sense alongside "commenter" - and i've changed "longdescs.isprivate" to "comment.isprivate".

what other changes would you like?  and do you think this is sufficient?

advanced: [chart | query]
    chart := {negate: bool, chart: andset}
    andset := [orset]
    orset := [query]
    query := {field: bzfield, operator: bzopr, value: any}
    bzfield := "creation_time" | "comment" | "id" | "last_change_time" |
        "platform" | "severity" | "status" | "summary" | "url" | "whiteboard" |
        "alias" | "assigned_to" | "attachments.submitter" |
        "attach_data.thedata" | "attachments.description" |
        "attachments.filename" | "attachments.isurl" |
        "attachments.isobsolete" | "attachments.ispatch" |
        "attachments.isprivate" | "attachments.mimetype" | "blocked" | "cc" |
        "cclist_accessible" | "classification" | "comment.isprivate" |
        "commenter" | "component" | "content" | "days_elapsed" | "deadline" |
        "dependson" | "estimated_time" | "everconfirmed" | "flagtypes.name" |
        "requestees.login_name" | "setters.login_name" | "bug_group" |
        "work_time" | "keywords" | "op_sys" | "percentage_complete" |
        "priority" | "product" | "qa_contact" | "remaining_time" | "reporter" |
        "reporter_accessible" | "resolution" | "see_also" |
        "target_milestone" | "owner_idle_time" | "version" | "votes" |
        "cf_".string
    bzopr := "=" | "!=" | "<" | ">" | "~" | "!~" |
        "matches" | "is_any" | "contains_case" | "contains" | "not_contains" |
        "contains_words_all" | "contains_words_any" | "contains_words_none" |
        "contains_strings_all" | "contains_strings_any" |
        "contains_strings_none" | "changed_from" | "changed_to" |
        "changed_by" | "changed_before" | "changed_after"
    any := string | array | integer | bool | datetime | whatever
We've nearly finished a method to search boolean charts as well as the basic
search key value pairs using Search.pm and hopefully should be able to submit a
patch tomorrow
(In reply to comment #18)
>     bzfield := "creation_time" | "comment" | "id" | "last_change_time" |
>         "platform" | "severity" | "status" | "summary" | "url" | "whiteboard" |

  Note that we haven't presented URL in some search interfaces because we plan for it to disappear. We can keep it as long as we note in the docs that it will disappear some day, most likely.

>         "alias" | "assigned_to" | "attachments.submitter" |

  Perhaps we should make attachment singular, since everything else is.

>         "attach_data.thedata" 

  attachment.data

>| "attachments.description" |

  attachment.summary I think might be better?

>         "attachments.filename" | "attachments.isurl" |

  For consistency: attachment.file_name, attachment.is_url, etc.

>         "attachments.isprivate" | "attachments.mimetype" | "blocked" | "cc" |

  "blocked" as a field name was a terrible mistake. Should be "blocks".

>        "flagtypes.name" |

  flag.name

>         "requestees.login_name" | "setters.login_name" | "bug_group" |

  flag.requestee, flag.setter, group

>     bzopr := "=" | "!=" | "<" | ">" | "~" | "!~" |
>         "matches" | "is_any" | "contains_case" | "contains" | "not_contains" |

  "fulltext" instead of "matches"

  What's "is_any"?

  Everything else looks pretty good, but I might have some other thoughts as we continue to revise.
thanks Max.  "is_any" is "is equal to any of the strings".
how about comment.is_private, depends_on, ever_confirmed and operating_system?
(In reply to comment #22)
> how about comment.is_private, depends_on, ever_confirmed and operating_system?

  That sounds good, as long as we're completely consistent with what Bug.create and the current Bug.search do. (So we have to keep op_sys unfortunately.)
hmm.  does that mean we also _have_ to use "description" instead of "comment", even though it's actually going to search all the comments? (and all the other comment fields are called "comment.something")

also, we're currently returning a set of bug IDs, whereas the existing search method returns a set of bug hashes - do we have to return hashes?  (not a big problem to implement - i just don't like the feel of it!)

apologies for no patch today - we ran into a couple of issues.
maybe we should add another option - similar to the existing 'limit' and 'offset' - called 'return' which can be "count", "id" or "bug"?

it could default to "id" for compatibility with the current search method.
(In reply to comment #24)
> hmm.  does that mean we also _have_ to use "description" instead of "comment",
> even though it's actually going to search all the comments? (and all the other
> comment fields are called "comment.something")

  No, "description" is unique to Bug.create, that's fine. In fact, we should alias "comment" to "description" for Bug.create so that people can use it consistently if they want.
 
> also, we're currently returning a set of bug IDs, whereas the existing search
> method returns a set of bug hashes - do we have to return hashes?  (not a big
> problem to implement - i just don't like the feel of it!)

  You do have to return identically what Bug.search already returns, yes. But _bug_to_hash makes it very easy.

(In reply to comment #25)
> maybe we should add another option - similar to the existing 'limit' and
> 'offset' - called 'return' which can be "count", "id" or "bug"?
> 
> it could default to "id" for compatibility with the current search method.

  No, just keep it as-is. You can implement the include_fields and exclude_fields parameters (see Bugzilla::WebService on tip) if you want to limit the return value.
thanks for that info Max - i wouldn't have spotted the include/exclude_fields thing.  that sounds like a much better way of doing it.

unfortunately we've been interrupted by snow and poor internet connectivity so there'll be no patch again today.  fingers crossed for tomorrow.  :-)
This patch should retain the functionality and API of the basic search and add the ability to search via boolean charts using the advanced option. It was also falling over finding null dup_ids so have added a check for that.
Attachment #359301 - Attachment is obsolete: true
Rosie, please set the review flag to "?" and put mkanat as reviewer in the requestee field.
Assignee: webservice → rosie.clarkson
Status: NEW → ASSIGNED
Comment on attachment 360766 [details] [diff] [review]
Bug.search with advanced option using Search.pm

Okay, here's what I want to do:

1) Just move to supporting the basic search, first, in a separate bug.

2) Add advanced searching.

That would make the reviews simpler.

Also, we should make Search.pm actually support limit directly instead of hacking it into the SQL (in a separate bug).

You should be calling new_from_list instead of check() in a loop, because as your comment points out, you don't need check().

The FIELD_MAP and the OPERATOR_MAP look great, by the way. :-)

You might want to take bug 474897 into account, also, when translating the basic fields.
Oh and yeah, in reference to LpSolit's comment, make sure to read our Contributor Guidelines:

  http://wiki.mozilla.org/Bugzilla:Developers
(In reply to comment #30)
> 1) Just move to supporting the basic search, first, in a separate bug.
> 
> 2) Add advanced searching.
> 
> That would make the reviews simpler.

we're happy to do that, i think, but please consider that the advanced searching code will have to be reviewed at some point anyway.  essentially, the only additional check that's required beyond that is, "does the 'basic to advanced translation' (8 lines of code) work?" - is it really going to be easier doing this in two stages?  (can you tell that i don't really know what goes into 'reviewing'? ;-) )
> 
> Also, we should make Search.pm actually support limit directly instead of
> hacking it into the SQL (in a separate bug).

true, but not before bug 398308 is done, i suspect.  i suggest we just put in a comment about it for now.  presumably it's just going to be a matter of, approximately, moving these lines into Search.pm?

by the way, can you remove this bug's dependency on bug 398308 please?

> 
> You should be calling new_from_list instead of check() in a loop, because as
> your comment points out, you don't need check().

thanks - we'll sort that out tomorrow too.
> 
> The FIELD_MAP and the OPERATOR_MAP look great, by the way. :-)

yay!

> 
> You might want to take bug 474897 into account, also, when translating the
> basic fields.

that looks like fun!  can i suggest that we just add a comment about that too for now?

sorry about missing the 'review' flag - we realised straight away but thought it'd be okay.  we'll re-read the Guidelines in case there's more stuff we need to be reminded of.  :-)
(In reply to comment #32)
> is it really going to be
> easier doing this in two stages?  (can you tell that i don't really know what
> goes into 'reviewing'? ;-) )

  Yeah. See this:

  https://wiki.mozilla.org/Bugzilla:Simple_Patches
Thanks for the comments - we're working on a simpler patch and should be able to  report the new bug and submit that next week
I've created bug 477601 to adapt Bug.search to use Search.pm and have added a patch.

I also created bug 477593 to fix another issue i found whilst testing this.
Depends on: 477601
Blocks: 637642
Is this still being worked on?
Not as far as I know! The blockers need some love. Once they're implemented, this should be pretty easy.
Assignee: rosie.clarkson → webservice
Status: ASSIGNED → NEW
We are going to branch for Bugzilla 4.4 next week and this bug is either too invasive to be accepted for 4.4 at this point or shows no recent activity. The target milestone is reset and will be set again *only* when a patch is attached and approved.

I ask the assignee to reassign the bug to the default assignee if you don't plan to work on this bug in the near future, to make it clearer which bugs should be fixed by someone else.
Target Milestone: Bugzilla 4.4 → ---
What is the plan for this enhancement?

For Mylyn we want to use the upcoming REST API to replace the old HTML way so this is important.
No longer blocks: 637642
All blockers are done and a patch is submitted.
Any plans to implement a patch into a main branch?
As far as I can tell, the functionality for this bug was added in bug 477601 and thus is a duplicate. Bug 477601 is in 5.0 which should be released in the next few weeks or so.

dkl
Status: NEW → RESOLVED
Closed: 10 years ago
No longer depends on: 477601
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: