Bug 1771804 Comment 0 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Original comment by

Andrew Sutherland [:asuth] (he/him)

on 2022-05-30 12:18:28 PDT

My team recently has been running into the lack of webkit having something comparable to searchfox or chromium's https://source.chromium.org/chromium.  Igalia has generously run https://webkit-search.igalia.com/ for some time but it unfortunately seems to be under-resourced and is frequently not operational and/or it only is able to perform a limited amount of semantic indexing of the tree and, for example, seems to not have any blame history.

To this end I'm planning to add a searchfox indexing job for webkit.  I expect there could potentially be a lot of interest in something like this as I understand webkit to be a frequently-embedded web runtime.  And I think it would be fantastic if us standing up an index of webkit provides positive externalities that can benefit the open source community at large.

That said, I'm very concerned about the potential for confusion about how supported such a webkit tree would be and the resulting burden on searchfox contributors.  Although searchfox is not under-resourced in terms of AWS machine-time (although we try very hard to be responsible with the AWS resources we use, including a major series of indexing optimizations I just landed that cut many indexing jobs' time in half), we operate largely on a volunteer basis and the justification for spending any work-time on searchfox is largely about mozilla-central.

Currently, the only trees we index semantically for C++ are mozilla-central variants and nss (which is something that also gets built as part of mozilla-central, and is extremely stable).  The mozilla-central jobs all run their C++ language-specific analyses as part of the mozilla-central CI and although the jobs are tier 2, sheriffs and developers actively help keep these jobs green (ex: bug 1768996 where :glandium provided a fix to searchfox's indexer) as the mozsearch indexer is part of the tier-1 in-tree builds even if its execution is tier 2.  For webkit, we will be running the C++ analysis on one of our indexers which has a significantly greater chance of breakage for many reasons; for example, the build script will need to install a bunch of dependencies at runtime which will increase the chance of random failures.

My plan for making it clear that the webkit repo indexing is not a tier-1 or supported repo is to dub it "wubkat".  My hope is that this will help set expectations appropriately while also maybe giving people a little chuckle.

While there are obviously other options like adding banners to the generated pages:
- This would be new development work.
- The primary goal here is to provide an identical searchfox experience for gecko developers trying to see what other browsers are doing, and adding a bunch of jarring annoyances is not helpful for that.  Also, it's more likely one-off searchfox users would quickly pre-attentively filter out any such nag UI, whereas regular searchfox users would find the added nag UI very jarring as it would deviate from their visual muscle memory for searchfox.

If we find that adding the webkit repo to searchfox gains additional contributors who help pick up some of the maintenance load and/or can help us decrease the maintenance burden by having webkit's CI generate searchfox analysis upstream, then we can definitely consider renaming the repo "webkit".  Alternately, if there's upstream interest, maybe "webkit" could run its own officially supported mozsearch instance (like a better-resourced version of https://webkit-search.igalia.com/) and we could just redirect wubkat/webkit to that instance and stop running one on searchfox.

For prior art, note that emilio did an initial attempt at webkit support some years ago at https://github.com/emilio/webkit-index-config and I believe the Igalia maintainer built on this with https://github.com/dpino/webkit-index-config which I believe is what powers https://webkit-search.igalia.com/.  My thanks to both for providing this groundwork, as the webkit docs definitely don't seem to treat linux as a particularly supported platform and so it was nice to have the extra context that makes it clear that the GTK port is what should be built.

Revision 1 by

Andrew Sutherland [:asuth] (he/him)

on 2022-05-31 21:50:52 PDT

My team recently has been running into the lack of webkit having something comparable to searchfox or chromium's https://source.chromium.org/chromium.  Igalia has generously run https://webkit-search.igalia.com/ for some time but it unfortunately seems to be under-resourced and is frequently not operational and/or it only is able to perform a limited amount of semantic indexing of the tree.  (edit: I erroneously thought there was not full blame because of the config scripts, but now that I've checked the server when operational, I see the blame bar is fully operational.)

To this end I'm planning to add a searchfox indexing job for webkit.  I expect there could potentially be a lot of interest in something like this as I understand webkit to be a frequently-embedded web runtime.  And I think it would be fantastic if us standing up an index of webkit provides positive externalities that can benefit the open source community at large.

That said, I'm very concerned about the potential for confusion about how supported such a webkit tree would be and the resulting burden on searchfox contributors.  Although searchfox is not under-resourced in terms of AWS machine-time (although we try very hard to be responsible with the AWS resources we use, including a major series of indexing optimizations I just landed that cut many indexing jobs' time in half), we operate largely on a volunteer basis and the justification for spending any work-time on searchfox is largely about mozilla-central.

Currently, the only trees we index semantically for C++ are mozilla-central variants and nss (which is something that also gets built as part of mozilla-central, and is extremely stable).  The mozilla-central jobs all run their C++ language-specific analyses as part of the mozilla-central CI and although the jobs are tier 2, sheriffs and developers actively help keep these jobs green (ex: bug 1768996 where :glandium provided a fix to searchfox's indexer) as the mozsearch indexer is part of the tier-1 in-tree builds even if its execution is tier 2.  For webkit, we will be running the C++ analysis on one of our indexers which has a significantly greater chance of breakage for many reasons; for example, the build script will need to install a bunch of dependencies at runtime which will increase the chance of random failures.

My plan for making it clear that the webkit repo indexing is not a tier-1 or supported repo is to dub it "wubkat".  My hope is that this will help set expectations appropriately while also maybe giving people a little chuckle.

While there are obviously other options like adding banners to the generated pages:
- This would be new development work.
- The primary goal here is to provide an identical searchfox experience for gecko developers trying to see what other browsers are doing, and adding a bunch of jarring annoyances is not helpful for that.  Also, it's more likely one-off searchfox users would quickly pre-attentively filter out any such nag UI, whereas regular searchfox users would find the added nag UI very jarring as it would deviate from their visual muscle memory for searchfox.

If we find that adding the webkit repo to searchfox gains additional contributors who help pick up some of the maintenance load and/or can help us decrease the maintenance burden by having webkit's CI generate searchfox analysis upstream, then we can definitely consider renaming the repo "webkit".  Alternately, if there's upstream interest, maybe "webkit" could run its own officially supported mozsearch instance (like a better-resourced version of https://webkit-search.igalia.com/) and we could just redirect wubkat/webkit to that instance and stop running one on searchfox.

For prior art, note that emilio did an initial attempt at webkit support some years ago at https://github.com/emilio/webkit-index-config and I believe the Igalia maintainer built on this with https://github.com/dpino/webkit-index-config which I believe is what powers https://webkit-search.igalia.com/.  My thanks to both for providing this groundwork, as the webkit docs definitely don't seem to treat linux as a particularly supported platform and so it was nice to have the extra context that makes it clear that the GTK port is what should be built.

Revision 2 by

Andrew Sutherland [:asuth] (he/him)

on 2022-05-31 21:53:39 PDT

My team recently has been running into the lack of webkit having something comparable to searchfox or chromium's https://source.chromium.org/chromium.  Igalia has generously run https://webkit-search.igalia.com/ for some time but it unfortunately seems to be under-resourced and is frequently not operational during Eastern Time work hours.  (edit: I erroneously thought there was limited semantic analysis run based on previous investigations and that there was not full blame because of the config scripts, but now that I've checked the server when operational, I see the blame bar is fully operational and semantic analysis for the branch in use seems to be running on everything, which is very exciting!)

To this end I'm planning to add a searchfox indexing job for webkit.  I expect there could potentially be a lot of interest in something like this as I understand webkit to be a frequently-embedded web runtime.  And I think it would be fantastic if us standing up an index of webkit provides positive externalities that can benefit the open source community at large.

That said, I'm very concerned about the potential for confusion about how supported such a webkit tree would be and the resulting burden on searchfox contributors.  Although searchfox is not under-resourced in terms of AWS machine-time (although we try very hard to be responsible with the AWS resources we use, including a major series of indexing optimizations I just landed that cut many indexing jobs' time in half), we operate largely on a volunteer basis and the justification for spending any work-time on searchfox is largely about mozilla-central.

Currently, the only trees we index semantically for C++ are mozilla-central variants and nss (which is something that also gets built as part of mozilla-central, and is extremely stable).  The mozilla-central jobs all run their C++ language-specific analyses as part of the mozilla-central CI and although the jobs are tier 2, sheriffs and developers actively help keep these jobs green (ex: bug 1768996 where :glandium provided a fix to searchfox's indexer) as the mozsearch indexer is part of the tier-1 in-tree builds even if its execution is tier 2.  For webkit, we will be running the C++ analysis on one of our indexers which has a significantly greater chance of breakage for many reasons; for example, the build script will need to install a bunch of dependencies at runtime which will increase the chance of random failures.

My plan for making it clear that the webkit repo indexing is not a tier-1 or supported repo is to dub it "wubkat".  My hope is that this will help set expectations appropriately while also maybe giving people a little chuckle.

While there are obviously other options like adding banners to the generated pages:
- This would be new development work.
- The primary goal here is to provide an identical searchfox experience for gecko developers trying to see what other browsers are doing, and adding a bunch of jarring annoyances is not helpful for that.  Also, it's more likely one-off searchfox users would quickly pre-attentively filter out any such nag UI, whereas regular searchfox users would find the added nag UI very jarring as it would deviate from their visual muscle memory for searchfox.

If we find that adding the webkit repo to searchfox gains additional contributors who help pick up some of the maintenance load and/or can help us decrease the maintenance burden by having webkit's CI generate searchfox analysis upstream, then we can definitely consider renaming the repo "webkit".  Alternately, if there's upstream interest, maybe "webkit" could run its own officially supported mozsearch instance (like a better-resourced version of https://webkit-search.igalia.com/) and we could just redirect wubkat/webkit to that instance and stop running one on searchfox.

For prior art, note that emilio did an initial attempt at webkit support some years ago at https://github.com/emilio/webkit-index-config and I believe the Igalia maintainer built on this with https://github.com/dpino/webkit-index-config which I believe is what powers https://webkit-search.igalia.com/.  My thanks to both for providing this groundwork, as the webkit docs definitely don't seem to treat linux as a particularly supported platform and so it was nice to have the extra context that makes it clear that the GTK port is what should be built.
