need a server for storing stars/comments for tinderboxpushlog without tinderbox

RESOLVED FIXED

Status

Tree Management Graveyard
TBPL
RESOLVED FIXED
8 years ago
3 years ago

People

(Reporter: dbaron, Assigned: mstange)

Tracking

Dependency tree / graph

Details

Attachments

(2 attachments, 7 obsolete attachments)

To have a tinderboxpushlog without tinderbox, we need a server to store and retrieve the stars and comments.  (The client side needs this server, and it also needs bug 601711.)
I'm working on this.
Assignee: nobody → mstange
Status: NEW → ASSIGNED
Blocks: 608991
We should talk to the guys on the Metrics team, since they're building tools which act on this data.
The automation tools team is working on this using a metrics Elastic Search database.   Currently, when a bug is starred in tbpl, it reports that data into our ES database as well as tinderbox.  It wouldn't take much to get tbpl to query this db as well, and cease depending on tinderbox at all for comment data.

See bug 617755
Depends on: 624507
I've just submitted a patch to allow tbpl to store all star comments in the ElasticSearch database we're using for the War on Orange (bug 624507).  Previously, it was only storing comments that referenced bug numbers.

Once this patch has been running for some time (a couple of weeks?), we could change tbpl to read star comments from this database instead of tinderbox, taking tinderbox out of the loop entirely for star comments.

In order to store tbpl comments in the ElasticSearch db, we send the data through an intermediate script, starcomment.php, which is currently located on brasstacks.mozilla.com.  This is convenient right now for testing, but eventually we should move this to a machine managed by IT.  This script could be part of pushlog itself, if the machine running tbpl can see the MPT network.  Can it?

Updated

8 years ago
Blocks: 630538
Created attachment 511170 [details] [diff] [review]
patch v0.1

We've been storing tbpl comments in the OrangeFactor ElasticSearch database for sometime now (in addition to tinderbox).  This patch updates tbpl to read comments from this datasource as well, completely removing tinderbox from the loop, as far as comments go.

Comment dates aren't displayed for comments that were submitted prior to the time that the fix for bug 632605 goes live.

If we decide to go with this patch, we should first determine where starcomment.php should live, which is a simple php script that acts as a gateway to the ElasticSearch db.  It's currently hosted on brasstacks, but this might not be an ideal permanent solution, as the webserver for this box isn't monitored by IT.  Either we'll need to have it monitored by IT, or we'll need to move starcomment.php to another machine.  Potentially, it could reside on tbpl.mozilla.org, if we can get IT to open a hole in the firewall between that machine and the ES db at elasticsearch1.metrics.sjc1.mozilla.com:9200.
Attachment #511170 - Flags: review?(mstange)
Created attachment 511175 [details] [diff] [review]
patch v0.2

previous patch was missing some data
Attachment #511170 - Attachment is obsolete: true
Attachment #511175 - Flags: review?(mstange)
Attachment #511170 - Flags: review?(mstange)
Comment on attachment 511175 [details] [diff] [review]
patch v0.2

>+      var slave = "";
>+      if (buildScrape) {
>+        for (var i = 0; i < buildScrape.length; i++) {
>+          var matches = /^(\s*)s:(\s*)(.*?)$/.exec(buildScrape[i]);
>+          if (matches) {
>+            slave = matches[3];

Add a break here.
Otherwise it looks good, one further step to drop tinderbox.
Comment on attachment 511175 [details] [diff] [review]
patch v0.2

starcomment.php doesn't send any headers to allow cross-origin reads. You'll need to add a line like
header("Access-Control-Allow-Origin: *");
to allow the TBPL server to read anything.

However, even then I don't think accessing the iframe's DOM is possible. We'll need to use XMLHttpRequest instead, so probably $.ajax instead of NetUtils.crossDomainPost. The iframe/DOM stuff that NetUtils.crossDomainPost was necessary for write-only requests to the Tinderbox starring script which didn't send cross-origin headers.

>+  _loadESNoteData: function Data__loadNoteData(loadTracker, machineResults, updatedPushCallback, data) {

You don't need the "data" parameter. data is the object this method is called on, so just use "this" where you were using "data".

>+    // generate the POST data to use to request note data from ElasticSearch;
>+    // this consists of the tree name and a list of dates encompassing all
>+    // the machineResults

This is a sentence, so please capitalize the first word and end with a full stop. This also applies to the other two comments you're adding.

>+    var noteparams = { 'tree': Config.repoNames[data._treeName],
>+                       'dates': dates };

I think we use this indentation style in most of the other code:

>+    var noteparams = {
>+      'tree': Config.repoNames[data._treeName],
>+      'dates': dates
>+    };

>+        for (var result in machineResults) {

Call result resultID instead.

>+          var startTime = machineResults[result].startTime.getTime() / 1000;
>+          for each (var note in notes) {
>+            if (note.startTime == startTime && note.slave == machineResults[result].slave) {

How about a new function Data._getNotesForMachineResult(machineResult, notes)?
Also, do you know what the cross-browser support for "for each" loops is like?

>+              if (machineResults[result].note)
>+                machineResults[result].note += "<br/><br/>";
>+              machineResults[result].note += "[<b><a href='mailto:" + note.who + "'>" + note.who + "</a></b>";
>+              if (note.timestamp) {
>+                var date = new Date(note.timestamp * 1000);
>+                machineResults[result].note += " - " + UserInterface._getDisplayDate(date);
>+              }
>+              machineResults[result].note += "]<br/>" + note.note;

I'd prefer to constrain this kind of HTML munging to UserInterface. So instead of having a "note" property on the machineResult that contains a string, how about making it a "notes" array that contains note objects? Then you can do the HTML generation inside htmlForNoteInPopup() in UserInterface.js (or even better, in another function that's called from there).

>+      }
>+      catch (e) {

No line break here please.

>-      function tinderboxDataLoadCallback(data) {

Ah, now I see where the confusion comes from. This data is a different kind of data than in the rest of the file :-)
Yeah, renaming the argument to machineResults is a good idea.

>diff --git a/js/TinderboxJSONUser.js b/js/TinderboxJSONUser.js
>--- a/js/TinderboxJSONUser.js
>+++ b/js/TinderboxJSONUser.js
>@@ -21,7 +21,7 @@ var TinderboxJSONUser = {
>             // up-to-date Tinderbox data for this push.
>             push.latestTinderboxData = Math.max(push.latestTinderboxData || push.date, timeRange.endTime);
>           });
>-          loadCallback(self.parseTinderbox(tree, tinderbox_data, data));
>+          loadCallback(self.parseTinderbox(tree, tinderbox_data, data), data);

So this change can obviously go away.

> 
>       var note = build.hasnote ? notes[build.noteid * 1] : "";
> 
>+      var slave = "";
>+      if (buildScrape) {
>+        for (var i = 0; i < buildScrape.length; i++) {
>+          var matches = /^(\s*)s:(\s*)(.*?)$/.exec(buildScrape[i]);
>+          if (matches) {
>+            slave = matches[3];
>+          }
>+        }
>+      }

I'd prefer this to be factored into a separate function so you can do
var slave = self._getSlaveName(buildScrape);
and because "if (matches) return matches[3];" looks better than
"if (matches) { slave = matches[3]; break; }" :-)

>       var result = machineResults[machineRunID] = new MachineResult ({
>         "tree" : tree,
>         "machine": machine,
>+        "slave": slave,

Or actually you can just do "slave": self._getSlaveName(buildScrape), here.

>@@ -140,7 +151,7 @@ var TinderboxJSONUser = {
>         "fullLogURL": "http://tinderbox.mozilla.org/showlog.cgi?log=" + tree + '/' + machineRunID + '&fulltext=1',
>         "summaryURL": Config.baseURL + "php/getSummary.php?tree=" + tree + "&id=" + machineRunID,
>         "revs": revs,
>-        "note": note,
>+        "note": "",

This will become "notes": [] then. And you probably want to add a method "hasNote()" to machineResult.prototype which you can use in UserInterface.

>diff --git a/js/NetUtils.js b/js/NetUtils.js
>--- a/js/NetUtils.js
>+++ b/js/NetUtils.js
>@@ -42,7 +42,7 @@ var NetUtils = {
>     form.get(0).submit();
>     form.remove();
>     iframe.get(0).onload = function crossDomainIframeLoaded() {
>-      callback();
>+      callback($(iframe).contents().find('body').html());
>       setTimeout(function () { iframe.remove(); }, 0);
>     }
>   },

Thinking about this again, loading JSON inside an iframe also poses a security issue: Somebody could add evil HTML and JavaScript in his notes, and it would be executed in the iframe...
Attachment #511175 - Flags: review?(mstange) → review-
Created attachment 514675 [details] [diff] [review]
patch v0.3

Updated patch to address all review comments, except this one:

> And you probably want to add a method
> "hasNote()" to machineResult.prototype which you can use in UserInterface.

This doesn't work, as not all 'machineResult' parameters passed to various functions are instances of MachineResult; those for pending and running results are not.  So instead I added a _machineResultHasNotes() function to UserInterface.
Attachment #511175 - Attachment is obsolete: true
Attachment #514675 - Flags: review?
Attachment #514675 - Flags: review? → review?(mstange)
Comment on attachment 514675 [details] [diff] [review]
patch v0.3

Awesome!
Attachment #514675 - Flags: review?(mstange) → review+
One thing:

+    // Generate the query string data to use to request note data from ElasticSearch;
+    // this consists of the tree name and a list of dates encompassing all
+    // the machineResults.
+    var dates = [];
+    for (var resultID in machineResults) {
+      if (machineResults[resultID].startTime) {
+        var resultDate = UserInterface._ISODateString(machineResults[resultID].startTime);
+        if (dates.indexOf(resultDate) == -1 && resultDate != 'NaN-NaN-NaN') {
+          dates.push(resultDate);
+        }
+      }
+    }

After this, check that the dates array isn't empty, and skip the rest if it is. Otherwise we'll get an empty json and hit the 'note data is invalid' error case below.
Oh, and there's a "!!result.note" in SummaryLoader.js which needs to be fixed.
Created attachment 515957 [details] [diff] [review]
patch v0.4

Updated patch addressing comments 11 and 12.
Attachment #514675 - Attachment is obsolete: true
Created attachment 524258 [details] [diff] [review]
PHP gateway to ElasticSearch

The PHP component of this can now live on the machine that hosts tbpl, so I'm including it here for review.
Attachment #524258 - Flags: review?(mstange)
Created attachment 524260 [details] [diff] [review]
patch v0.5

Updated tbpl patch which fixes bitrot from previous version; no other changes.
Attachment #515957 - Attachment is obsolete: true
Created attachment 524263 [details] [diff] [review]
patch v0.6

Remove some dump statements...
Attachment #524260 - Attachment is obsolete: true
Comment on attachment 524258 [details] [diff] [review]
PHP gateway to ElasticSearch

>diff --git a/php/starcomment.php b/php/starcomment.php
>new file mode 100644
>--- /dev/null
>+++ b/php/starcomment.php
>@@ -0,0 +1,163 @@
>+<?php
>+
>+# some globals
>+$elastic_search_server = "http://elasticsearch1.metrics.sjc1.mozilla.com:9200/";
>+$elastic_search_comment_index = "tbpl";
>+$elastic_search_comment_doctype = "comments";
>+$elastic_search_bug_index = "bugs";
>+$elastic_search_bug_doctype = "bug_info";
>+$log_filename = null;
>+$log_fp = null;
>+
>+function log_msg($msg)
>+{
>+  global $log_fp;
>+
>+  if ($log_fp)
>+    fwrite($log_fp, $msg);
>+}
>+
>+# make a simple HTTP request and return the response
>+function do_http_request($url, $data, $method = 'POST')
>+{
>+  $params = array('http' => array(
>+              'method' => $method,
>+              'content' => json_encode($data),
>+              'header' => "Content-type: text/plain\r\n" .
>+                          "Connection: close\r\n"
>+            ));
>+
>+  $ctx = stream_context_create($params);
>+  $fp = @fopen($url, 'rb', false, $ctx);
>+  if (!$fp) {
>+    log_msg("Problem with $url, $php_errormsg\n");
>+  }
>+
>+  $response = @stream_get_contents($fp);
>+  if ($response === false) {
>+    log_msg("Problem reading data from $url, $php_errormsg\n");
>+  }
>+  if ($fp) {
>+    fclose($fp);
>+  }
>+  return $response;
>+}
>+
>+function get_notes($url, $params)
>+{
>+  log_msg(json_encode($params) . "\n");
>+
>+  # create a request
>+  $request = array('query' => array(
>+                    'bool' => array(
>+                      'must' => array(
>+                        array('field' => array('tree' => $params['tree'])),
>+                        array('field' => array('date' => implode(" ", $params['dates'])))
>+                        )
>+                      )
>+                    ),
>+                    'size' => 5000
>+                  );
>+
>+  # send the request to the _search url with HTTP GET
>+  $result = do_http_request($url . "_search", $request, 'GET');
>+  $hits = json_decode($result)->hits->hits;

should we add a try/catch block if $result is an invalid json?

>+
>+  $notes = Array();
>+  foreach($hits as $hit) {
>+    $note = array('startTime' => $hit->_source->starttime,
>+                  'slave' => $hit->_source->machinename,
>+                  'who'=> $hit->_source->who,
>+                  'note'=> $hit->_source->comment);
>+    if (property_exists($hit->_source, 'timestamp')) {
>+      $note = $note + array('timestamp' => $hit->_source->timestamp);
>+    }
>+    array_push($notes, $note);
>+  }
>+
>+  header("Content-Type: text/plain,charset=utf-8");
>+  header("Access-Control-Allow-Origin: *");
>+  echo json_encode($notes);
>+}
>+
>+# send a bug to ES if it doesn't already exist there

This function checks uniqueness based on bug-id, is it possible to have same bug with different comments/description? We can also index using the bug-id as a "key" which will allow us to have only one instance / bug inside ES.


>+function send_bug($url, $dat, $bug)
>+{
>+  unset($dat['comment']);
>+  $dat['bug'] = $bug;
>+
>+  # create a request body to test if this bug is already in ES
>+  $request = array('query' => array(
>+                    'bool' => array(
>+                      'must' => array(
>+                        array('field' => array('bug' => $bug)),
>+                        array('field' => array('machinename' => $dat['machinename'])),
>+                        array('field' => array('starttime' => $dat['starttime']))
>+                        )
>+                      )
>+                    )
>+                  );
>+  log_msg("request: " . json_encode($request) . "\n");
>+
>+  # send the request to the _search url with HTTP GET
>+  $result = do_http_request($url . "_search", $request, 'GET');
>+  log_msg("response: " . $result . "\n");
>+
>+  # find how many instances of this bug already exist in ES
>+  $hits = json_decode($result)->hits->total;
>+  log_msg("hits: " . json_encode($hits) . "\n");
>+
>+  # if no instances exist, add to ES
>+  if ($hits == 0) {
>+    log_msg("writing bug to ES: " . json_encode($dat) . "\n");
>+    $result = do_http_request($url, $dat);
>+    log_msg("result: " . $result . "\n");
>+  }
>+}
>+
>+$es_comment_url = $elastic_search_server . $elastic_search_comment_index .
>+                  "/" . $elastic_search_comment_doctype . "/";
>+$es_bug_url = $elastic_search_server . $elastic_search_bug_index .
>+              "/" . $elastic_search_bug_doctype . "/";
>+
>+if ($log_filename)
>+  $log_fp = fopen($log_filename, 'a');
>+
>+if (isset($_REQUEST['dates'])) {
>+
>+  get_notes($es_comment_url, $_GET);
>+
>+} else {
>+  try {
>+
>+    # the comment arrives as the data in the $_POST array
>+    $dat = $_POST;
>+
>+    # send the comment to ES, as-is
>+    log_msg("writing comment to ES: " . json_encode($dat) . "\n");
>+    log_msg("response: " . do_http_request($es_comment_url, $dat) . "\n");
>+
>+    # search the comment string and see if any bug numbers are referenced
>+    $hits = preg_match_all('/Bug\s*(?P<bug>\d+)/i', $dat['comment'], $matches);
>+
>+    # for each bug referenced, push the bug data to the 'bugs' doctype separately
>+    if ($hits) {
>+      foreach ($matches['bug'] as $bug) {
>+        log_msg("bug found: " . $bug . "\n");
>+        send_bug($es_bug_url, $dat, $bug);
>+      }
>+    }
>+    else {
>+      log_msg("no bug found\n");
>+    }
>+
>+  }
>+  catch (Exception $e) {
>+    echo 'Caught exception: ',  $e->getMessage(), "\n";
>+    log_msg('Caught exception: ' . $e->getMessage() . "\n");
>+  }
>+}
>+
>+if ($log_fp)
>+  fclose($log_fp);
>+?>
> This function checks uniqueness based on bug-id, is it possible to have same
> bug with different comments/description? We can also index using the bug-id as
> a "key" which will allow us to have only one instance / bug inside ES.

We process comments in a couple of different ways.  First, we log the comment as-is to tbpl/comments in ElasticSearch.  This does no checking for duplicate bug id's, etc.  Then we log the bugs id's found in the comment to bugs/bug_info, ensuring there are no dupes, for use by OrangeFactor, since OrangeFactor only wants to know about unique bugs per failure.

I agree, it would be more elegant to use the bug id as the document's id for the bugs index.

> should we add a try/catch block if $result is an invalid json?

Yes, I'll add that, and a corresponding change in the JS to display the error if the JSON is invalid for some reason.
Comment on attachment 524258 [details] [diff] [review]
PHP gateway to ElasticSearch

>+  $ctx = stream_context_create($params);
>+  $fp = @fopen($url, 'rb', false, $ctx);
>+  if (!$fp) {
>+    log_msg("Problem with $url, $php_errormsg\n");

$php_errormsg is undefined.

>+  }
>+
>+  $response = @stream_get_contents($fp);
>+  if ($response === false) {
>+    log_msg("Problem reading data from $url, $php_errormsg\n");
>+  }

the same

>+    if (property_exists($hit->_source, 'timestamp')) {
>+      $note = $note + array('timestamp' => $hit->_source->timestamp);

Didn't know this was possible. Alternatively just $note[...] = ...
Comment on attachment 524258 [details] [diff] [review]
PHP gateway to ElasticSearch

>+# some globals

The other php files use "//" for comments, let's be consistent.

>+$elastic_search_comment_index = "tbpl";
>+$elastic_search_comment_doctype = "comments";

The terms "index" and "doctype" are unfamiliar to me in this context. Can you add a short comment describing what they mean?

>+  $notes = Array();

I've never seen "Array" with an uppercase A in PHP. Does this work? Is there a difference to lowercase array()?

>+      $note = $note + array('timestamp' => $hit->_source->timestamp);

$note['timestamp'] = $hit->_source->timestamp; would be a lot less confusing and do the same, right? Or am I missing something?

>+    }
>+    array_push($notes, $note);

$notes[] = $note; would be shorter. It might be more confusing, though; I don't know how commonly used it is. Your call.

>+$es_comment_url = $elastic_search_server . $elastic_search_comment_index .
>+                  "/" . $elastic_search_comment_doctype . "/";
>+$es_bug_url = $elastic_search_server . $elastic_search_bug_index .
>+              "/" . $elastic_search_bug_doctype . "/";
>+
>+if ($log_filename)
>+  $log_fp = fopen($log_filename, 'a');
>+
>+if (isset($_REQUEST['dates'])) {
>+
>+  get_notes($es_comment_url, $_GET);

You're only using the data from $_GET, so check $_GET above instead of $_REQUEST.

>+?>

This can be omitted.

Since this is public-facing php code, I think it needs to be security reviewed before it can be put on the server. Please request review from mcoates@mozilla.com on the updated patch.
>>+  $notes = Array();

> I've never seen "Array" with an uppercase A in PHP. Does this work? Is there a
> difference to lowercase array()?

Interestingly, in PHP, function names and language constructs, like Array, are case-insensitive.  However, I've lower-cased this to be consistent.
Created attachment 525724 [details] [diff] [review]
PHP gateway to ElasticSearch, v0.2

Updated per previous review comments.  Flagging for security review by mcoates.
Attachment #524258 - Attachment is obsolete: true
Attachment #525724 - Flags: review?
Attachment #524258 - Flags: review?(mstange)
Attachment #525724 - Flags: review? → review?(mcoates)
Created attachment 532205 [details] [diff] [review]
patch v0.7

Jonathan, are there reasons against landing this patch with the brasstacks.mozilla.com/starcomment.php version, until we get the local starcomment.php version security-approved?

I've updated the patch to trunk and removed the Config.wooBugURL change.
Attachment #524263 - Attachment is obsolete: true
(In reply to comment #23)
> 
> Jonathan, are there reasons against landing this patch with the
> brasstacks.mozilla.com/starcomment.php version, until we get the local
> starcomment.php version security-approved?
> 
> I've updated the patch to trunk and removed the Config.wooBugURL change.

Only that brasstacks is probably not quite as reliable as the tbpl machine; it's sort of a sandbox for a lot of automation tools, and the webserver sometimes gets restarted, etc.
I missed the bugzilla email about the review request. Sorry for the delay.

Is this code running somewhere in stage? The best way to get this on the sec review radar and completed is to file a security review request.  This will provide us all the info we need to understand the system and get it reviewed as quick as possible.

https://wiki.mozilla.org/WebAppSec/Security_Review_Request
Depends on: 658098
I've thrown up a staging version of this code at http://brasstacks.mozilla.com/tbpl/ (connected to a dummy ES instance), and filed the security review request (bug 658098).

Updated

7 years ago
Depends on: 682914
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Comment on attachment 525724 [details] [diff] [review]
PHP gateway to ElasticSearch, v0.2

Bug is already fixed. Removing myself from reviewer
Attachment #525724 - Flags: review?(mcoates)
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.