Open Bug 1407212 Opened 7 years ago Updated 5 years ago

[Super Search] super search fields need refactoring

Tracking

(Not tracked)

Status:

NEW

People

(Reporter: willkg, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

pr 4947: bug 1407212: remove unused super_search_fields code 5 years ago Will Kahn-Greene [:willkg] ET needinfo? me 53 bytes, text/x-github-pull-request		Details \| Review

Will Kahn-Greene [:willkg] ET needinfo? me

Reporter

Description

•

7 years ago

We moved super search fields from being stored in Elasticsearch to being stored in one massive Python file in socorro/external/es/super_search_fields.py. That file is 4400 lines long and mostly consists of repeated sections defining fields.

Many fields share the bulk of their definition. For example, most of the "keyword" type fields have the following parts in common:

    'data_validation_type': 'str',
    'default_value': None,
    'form_field_choices': [],
    'form_field_type': 'StringField',
    'has_full_version': False,
    'is_exposed': True,
    'is_mandatory': False,
    'is_returned': True,
    'query_type': 'string',
    'storage_mapping': {
        'analyzer': 'keyword',
        'type': 'string'
    }

So now we've got a file that's got a lot of redundancy and copying/pasting.

It'd be nice to have a tiny layer of abstraction here that makes it easier to maintain fields and deal with this file.

Will Kahn-Greene [:willkg] ET needinfo? me

Reporter

Comment 1

•

7 years ago

I think I want the following out of an abstraction layer:

1. when looking at a field, I want to know how it differs from the defaults
2. when looking at a field, it should be easy to determine what "type" it is
3. adding a field with defaults should be easier
4. it'd be nice if the abstraction layer reduced possibility of typos


I'm thinking maybe creating a few functions that return dicts. For example, here's keyword:

PROCESSED_CRASH = 'processed_crash'

VIEW_PII = 'crashstats.view_pii'


def verify(data):
    # validate the data against some schema here and return False if invalid
    return True


def keyword(**overrides):
    defaults = {
        'form_field_type': 'StringField',
        'has_full_version': False,
        'is_exposed': True,
        'is_mandatory': False,
        'is_returned': True,
        ...
    }
    defaults.update(overrides)
    verify(defaults)
    return defaults


Then we could do this:

    'url': keyword(
        description='The website the user ...',
        in_database_name='url',
        name='url',
        namespace=PROCESSED_CRASH,
        permission_needed=[VIEW_PII],
    ),


All the work is done at module-load time. The likelihood of typos is reduced. After the module has been loaded, the value of FIELDS would be the same as it is now, so we don't have to change any other code--just super_search_fields.py. It reduces a lot of redundant defaults and makes it clearer when something is different. We can combine this with comments, too.

This feels easier to write, easier to read, and easier to review changes on.

What do we think?

John Whitlock [:jwhitlock]

Comment 2

•

5 years ago

I like it, this pattern has worked well for me for similar problems.

Another option is moving the definitions to an external file (JSON, YAML, CSV) and loading them at configuration time. It doesn't help with the duplicate data, but gets them out of code, and may make it easier for a non-staff dev to process the structure. I prefer the functions-returning-dicts, however.

Will Kahn-Greene [:willkg] ET needinfo? me

Reporter

Comment 3

•

5 years ago

We originally had a JSON file, but I ditched that for the Python file because JSON sucks for stuff like this where you need to document why things are different. Those were dark days.

Another option is to use attrs data classes which get us a lot of what we're doing above with a lot less code. This will require us to adjust super_search_fields using code, but I think that's probably a good thing all things considered.

I'll spend a couple of hours fiddling with these options to see what's easier to deal with and report back.

I want to do this before embarking on the Elasticsearch migration because simplifying super search fields will make that significantly easier.

Assignee: nobody → willkg

Blocks: 1322630

Status: NEW → ASSIGNED

Will Kahn-Greene [:willkg] ET needinfo? me

Reporter

Comment 4

•

5 years ago

Here's an example using attrs:

@attr.s
class Keyword:
    """Defines a keyword field."""
    name = attr.ib()
    namespace = attr.ib()

    in_database_name = attr.ib(
        default=attr.Factory(
            lambda self: self.name,
            takes_self=True
        )
    )
    default_value = attr.ib(default=None)
    data_validation_type = attr.ib(default='str')
    description = attr.ib(default='Add description')
    has_full_version = attr.ib(default=False)
    is_exposed = attr.ib(default=True)
    is_mandatory = attr.ib(default=False)
    is_returned = attr.ib(default=True)
    query_type = attr.ib(default='string')
    form_field_choices = attr.ib(default=attr.Factory(list))
    permissions_needed = attr.ib(default=attr.Factory(lambda: ['crashstats.view_pii']))
    storage_mapping = attr.ib(
        default=attr.Factory(
            lambda: {'analyzer': 'keyword', 'type': 'string'}
        )
    )


FIELDS = [
    Keyword(
        name='ActiveExperiment',
        namespace='raw_crash',
        description='The telemetry experiment active at the time of the crash, if any.',
    )
]

That's pretty succinct. Using attrs allows us to type-check and validate values, too, though I didn't test that out much here. I think I want to go with this.

Will Kahn-Greene [:willkg] ET needinfo? me

Reporter

Comment 5

•

5 years ago

A while back, I documented the fields in super_search_fields.py and noticed that is_mandatory and default_value have the same value for all the fields. Adrian and I talked about that briefly:

https://github.com/mozilla-services/socorro/pull/4417#discussion_r184945744

We can remove those.

I'll do this before I go further since it reduces the code.

Will Kahn-Greene [:willkg] ET needinfo? me

Reporter

Comment 6

•

5 years ago

Attached file pr 4947: bug 1407212: remove unused super_search_fields code — Details

Will Kahn-Greene [:willkg] ET needinfo? me

Reporter

Comment 7

•

5 years ago

willkg merged PR #4947: "bug 1407212: remove unused super_search_fields code" in 10af200.

That'll make it easier to diff old fields with new fields to make sure they're the same.

Will Kahn-Greene [:willkg] ET needinfo? me

Reporter

Comment 8

•

5 years ago

Unassigning myself since I'm not going to get to this any time soon.

Assignee: willkg → nobody

Status: ASSIGNED → NEW

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

[Super Search] super search fields need refactoring

Categories

(Socorro :: General, task)

Tracking

(Not tracked)

People

(Reporter: willkg, Unassigned)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Attachment

General

Description

File Name

Content Type