Indexed logs & efficient search

ASSIGNED
Assigned to

Status

Instantbird
Other
--
enhancement
ASSIGNED
4 years ago
2 years ago

People

(Reporter: clokep, Assigned: nhnt11)

Tracking

(Blocks: 3 bugs)

Dependency tree / graph

Details

Attachments

(3 attachments, 20 obsolete attachments)

27.06 KB, patch
aleth
: review+
Details | Diff | Splinter Review
40.30 KB, patch
florian
: review-
aleth
: review+
florian
: feedback+
Details | Diff | Splinter Review
10.04 KB, patch
aleth
: review+
Details | Diff | Splinter Review
(Reporter)

Description

4 years ago
*** Original post on bio 1584 at 2012-07-11 19:45:00 UTC ***

You should quickly and easily be able to search your previous logs with a contact (or all your logs?) via Instantbird...i.e. without using grep. :)
(Reporter)

Updated

4 years ago
Duplicate of this bug: 955285
(Reporter)

Updated

4 years ago
Duplicate of this bug: 955384
(Reporter)

Updated

4 years ago
Duplicate of this bug: 955434
Created attachment 8354316 [details] [diff] [review]
WIP

*** Original post on bio 1584 as attmnt 2548 at 2013-07-02 14:39:00 UTC ***

This WIP only goes through the log folder asynchronously at startup and parses all the log files (and dumps everything to stdout for now).
(Assignee)

Updated

3 years ago
Depends on: 1025464
(Assignee)

Comment 5

3 years ago
Created attachment 8442200 [details]
Initial proof of concept

This was a proof of concept I wrote a few days ago, much of the code is borrowed from imContacts.js. I'm posting it here just for reference, it's all going to be rewritten using Sqlite.jsm.
Assignee: nobody → nhnt11
Attachment #8354316 - Attachment is obsolete: true
Status: NEW → ASSIGNED
(Assignee)

Comment 6

3 years ago
Created attachment 8445412 [details] [diff] [review]
WIP

- Adds code for indexing logs using Sqlite.jsm
- Pref'd off by default in chat-prefs.js
- Updates Instantbird UI for log searching

This is only partially tested - it was working, then I did some refactoring and didn't have time to test properly before bed.

To-do:
- Test it
- Handle AsyncShutdown
- Perfect the UX (behavior of the findbar, show all logs when search bar is cleared, etc)
Attachment #8442200 - Attachment is obsolete: true
Attachment #8445412 - Flags: feedback?(florian)
(Assignee)

Comment 7

3 years ago
Created attachment 8445442 [details]
Example use for the promise queue

I figured it wasn't clear why there's yet another promise queue in the patch, this example uses it to cache pending logs in a json file, to be indexed at startup.
Comment on attachment 8445412 [details] [diff] [review]
WIP

Review of attachment 8445412 [details] [diff] [review]:
-----------------------------------------------------------------

Great job for a first WIP! :-)

Are you making progress on bug 1025522? (It is a pre-requisite, isn't it?)

::: chat/chat-prefs.js
@@ +105,5 @@
>  pref("purple.logging.log_system", true);
> +
> +// Disable indexing by default.
> +pref("purple.logging.indexLogs", false);
> +pref("purple.logging.reindex-required", false);

How would you feel about naming these prefs chat.*

I know it's inconsistent with the above purple.logging.* prefs, but I don't think we should carry the 'purple' prefix forever here.

::: chat/components/public/imILogger.idl
@@ +74,5 @@
>                                   in AUTF8String aNormalizedName,
>                                   [optional] in boolean aGroupByDay);
> +  jsval searchLogsForAccountAndName(in imIAccount aAccount,
> +                                    in AUTF8String aNormalizedName,
> +                                    in AUTF8String aQuery)

I don't think this interface is what you want.

You'll need to be able to search the logs for a contact, etc... Or maybe even search all logs at once. You'll also want to set constraints by date.

I don't think you want to duplicate all the existing getLogsFor* methods with similar searchLogsFor*.


A vague idea (feel free to improve it!) I have to make this interface less painful would be to create a new LogsSet interface that would have a forEach and a search method.
We would replace all the existing getLogsFor* methods by selectLogsFor* methods returning a log set.

Internally a log set could just be an array of log directory paths.

::: chat/components/src/logger.js
@@ +91,5 @@
>    })
>  );
>  
>  
> +let gDBConn = null;

Should we drop this connection after some period of inactivity? Or on memory pressure?

@@ +92,5 @@
>  );
>  
>  
> +let gDBConn = null;
> +let getDBConn = Task.async(function* () {

nit: function*(
(or at least I think it would be consistent with us usually not putting a space between "function" and "(" for anonymous functions.)

@@ +95,5 @@
> +let gDBConn = null;
> +let getDBConn = Task.async(function* () {
> +  if (gDBConn)
> +    return gDBConn;
> +  let path = OS.Path.join(OS.Constants.Path.profileDir, "logsindex.sqlite");

- This file name should be in a const (I see you are using it a second time later).
- I dislike the "logsindex" name. What about indexedLogs?

@@ +96,5 @@
> +let getDBConn = Task.async(function* () {
> +  if (gDBConn)
> +    return gDBConn;
> +  let path = OS.Path.join(OS.Constants.Path.profileDir, "logsindex.sqlite");
> +  return (gDBConn = yield Sqlite.openConnection({path: path}));

Should that "yield " be removed? If not, isn't it possible for getDBConn to have already been called (but with the Sqlite.openConnection promise not having resolved yet) but gDBConn to still be null?

@@ +123,5 @@
> +  // If there were no results, insert it.
> +  if (!results.length) {
> +    yield conn.executeCached("INSERT INTO pathIds (path) VALUES (:path);",
> +                             {path: path});
> +    results = yield conn.executeCached("SELECT last_insert_rowid();");

Isn't there a possible race condition here, if some other code has queued an INSERT on the same DB connection before your |conn.executeCached("INSERT INTO pathIds...| resolves?

@@ +126,5 @@
> +                             {path: path});
> +    results = yield conn.executeCached("SELECT last_insert_rowid();");
> +  }
> +  let id = results[0].getResultByIndex(0);
> +  yield conn.executeCached("INSERT INTO fileContents (contents, filename, pathid) " +

nit: pathId

@@ +128,5 @@
> +  }
> +  let id = results[0].getResultByIndex(0);
> +  yield conn.executeCached("INSERT INTO fileContents (contents, filename, pathid) " +
> +                           "VALUES (:contents, :filename, :pathid);",
> +                           {contents: text, filename: filename, pathid: id});

You don't want to insert the whole log file content, right?

@@ +134,5 @@
> +
> +// This removes the current index and re-indexes all logs.
> +let indexAllLogs = Task.async(function* () {
> +  let path = OS.Path.join(OS.Constants.Path.profileDir, "logsindex.sqlite");
> +  yield OS.File.remove(path);

Will this fail if the database didn't exist before?

@@ +135,5 @@
> +// This removes the current index and re-indexes all logs.
> +let indexAllLogs = Task.async(function* () {
> +  let path = OS.Path.join(OS.Constants.Path.profileDir, "logsindex.sqlite");
> +  yield OS.File.remove(path);
> +  let conn = yield getDBConn();

Don't you need to close any existing DB connection before removing the file?

If you don't I think you'll still be writing to a file that will be removed by the OS as soon as we close it.

@@ +136,5 @@
> +let indexAllLogs = Task.async(function* () {
> +  let path = OS.Path.join(OS.Constants.Path.profileDir, "logsindex.sqlite");
> +  yield OS.File.remove(path);
> +  let conn = yield getDBConn();
> +  yield conn.execute("CREATE TABLE pathIds (" +

Would it be an improvement to create the tables/view while creating the DB connection?

@@ +139,5 @@
> +  let conn = yield getDBConn();
> +  yield conn.execute("CREATE TABLE pathIds (" +
> +                     "id INTEGER PRIMARY KEY, " +
> +                     "path VARCHAR UNIQUE);");
> +  yield conn.execute("CREATE VIRTUAL TABLE fileContents USING fts3(" +

I don't really remember the fts3 syntax, but isn't this going to index all the 3 columns, rather than just 'contents'?

@@ +149,5 @@
> +    "SELECT fileContents.contents, fileContents.filename, pathIds.path " +
> +    "FROM fileContents INNER JOIN pathIds " +
> +    "ON fileContents.pathid = pathIds.id;");
> +  let decoder = new TextDecoder();
> +  yield Services.logs.forEach(Task.async(function* (aLog) {

aLog is a misleading parameter/variable name if it actually contains a path.

@@ +343,5 @@
> +  let logWriter = gLogWritersById.get(aConversation.id);
> +  if (!logWriter) {
> +    return;
> +  if (logWriter.path)
> +    queueIndexingJob(logWriter.path);

You don't really want to do that during shutdown, right?

@@ -411,5 @@
>      return enumerator;
>    }
>  };
>  
> -

Please avoid unrelated changes.

@@ +821,5 @@
>      return this._getEnumerator(entries, aGroupByDay);
>    }),
> +  searchLogsForAccountAndName: Task.async(function* (aAccount, aNormalizedName, aQuery) {
> +    if (!Services.prefs.getBoolPref("purple.logging.indexLogs"))
> +      return EmptyEnumerator;

Returning an empty set of results doesn't seem correct if we haven't actually searched. What about returning null in the error case?

@@ +828,5 @@
> +    if (!(yield queueFileOperation(path, () => OS.File.exists(path))))
> +      return EmptyEnumerator;
> +
> +    let conn = yield getDBConn();
> +    if (!(yield conn.tableExists("logs")))

We shouldn't create the DB if we are searching rather than indexing, right? And then if the db exists, the logs table is guaranteed to exist, isn't it?

@@ +832,5 @@
> +    if (!(yield conn.tableExists("logs")))
> +      return EmptyEnumerator;
> +    try {
> +      let results = yield conn.execute(
> +        "SELECT * FROM logs WHERE text MATCH :queryStr AND path = :path;",

- You don't want to select * if you are only using the path and filename.
- I think you really want your SELECT query to provide an array of path ids, rather than just one.

::: im/content/conversation.xml
@@ +1614,5 @@
>            Services.logs.getLogsForConversation(this.conv, true).then(aLogs => {
>              if (!aLogs || !aLogs.hasMoreElements())
>                return;
>              window.openDialog("chrome://instantbird/content/viewlog.xul",
> +                              "Logs", "chrome,resizable", {logs: aLogs, conv: this.conv},

This new parameter is unfortunate :(. It would be slightly better to add a method in the logger to return the log path for a conversation, and give the log viewer only that path rather than the conversation object (that would avoid keeping the conversation object alive after the conversation is closed if the viewer is still open).

Forcing the conv object to stay alive could cause 'interesting' issues if the account is deleted.

Or maybe what we really want to give the log viewer is a log set object rather than a log enumerator ;-).
Attachment #8445412 - Flags: feedback?(florian) → feedback+
(Reporter)

Comment 9

3 years ago
(In reply to Florian Quèze [:florian] [:flo] from comment #8)
> ::: chat/chat-prefs.js
> @@ +105,5 @@
> >  pref("purple.logging.log_system", true);
> > +
> > +// Disable indexing by default.
> > +pref("purple.logging.indexLogs", false);
> > +pref("purple.logging.reindex-required", false);
> 
> How would you feel about naming these prefs chat.*
> 
> I know it's inconsistent with the above purple.logging.* prefs, but I don't
> think we should carry the 'purple' prefix forever here.

Most of the preferences in that file use messenger.*, not chat.*. Do we have any preference (haha) to which one to use?
(Assignee)

Comment 10

3 years ago
Created attachment 8454767 [details] [diff] [review]
Patch

This addresses all of the review comments. I've introduced a new LogSet interface for consumers to use. There are new promise queues for db operations and shutdown is properly handled. The one thing I haven't done yet is close the db connection after periods of inactivity.
Attachment #8445412 - Attachment is obsolete: true
Attachment #8454767 - Flags: review?(florian)
Attachment #8454767 - Flags: review?(aleth)
(Assignee)

Comment 11

3 years ago
Created attachment 8454768 [details] [diff] [review]
Update Instantbird UI
Attachment #8454768 - Flags: review?(aleth)
(Assignee)

Updated

3 years ago
Depends on: 1025522
(Assignee)

Comment 12

3 years ago
Created attachment 8454900 [details] [diff] [review]
Patch v2

Did a self review, updated some comments and fixed some nits. Also fixed a bug in LogSet.prototype.search.
Attachment #8445442 - Attachment is obsolete: true
Attachment #8454767 - Attachment is obsolete: true
Attachment #8454767 - Flags: review?(florian)
Attachment #8454767 - Flags: review?(aleth)
Attachment #8454900 - Flags: review?(florian)
Attachment #8454900 - Flags: review?(aleth)
(Assignee)

Updated

3 years ago
Depends on: 1035844

Comment 13

3 years ago
Comment on attachment 8454900 [details] [diff] [review]
Patch v2

Review of attachment 8454900 [details] [diff] [review]:
-----------------------------------------------------------------

Some initial comments on the database part. It looks good to me but I wonder if the code couldn't be simplified.

::: chat/components/src/logger.js
@@ +107,5 @@
> +  return path;
> +}
> +
> +// This is set to true before profile-before-change.
> +let gShuttingDown = false;

Can we do without this boolean by overriding getDBConn and queueIndexingJob on shutdown instead?

Really what you want to be overriding (speaking more abstractly) are database.open and database.queue methods I guess.

@@ +109,5 @@
> +
> +// This is set to true before profile-before-change.
> +let gShuttingDown = false;
> +
> +let gDBConn = null;

Needs a comment (This is a promise that resolves to the db connection when it is open)

@@ +110,5 @@
> +// This is set to true before profile-before-change.
> +let gShuttingDown = false;
> +
> +let gDBConn = null;
> +function getDBConn() {

Does this want to be a (lazy) getter on some kind of DB object? Just a thought, all these separate functions and variables could possibly be tidied up and encapsulated. The current setup seems very complicated to read.

@@ +111,5 @@
> +let gShuttingDown = false;
> +
> +let gDBConn = null;
> +function getDBConn() {
> +  return (gClosePromise || Promise.resolve()).then(() => {

Isn't the most common case captured more readably by
if (!gClosePromise && gDBConn)
  return gDBConn;

No - because gClosePromise is what we chain things on once the DB has been closed at least once?

@@ +112,5 @@
> +
> +let gDBConn = null;
> +function getDBConn() {
> +  return (gClosePromise || Promise.resolve()).then(() => {
> +    if (!gDBConn) {

Turn this around (if (gDBConn) return gDBConn;)

@@ +115,5 @@
> +  return (gClosePromise || Promise.resolve()).then(() => {
> +    if (!gDBConn) {
> +      if (gShuttingDown)
> +        throw "Trying to open a new DB connection at shutdown.";
> +      gDBConn = Task.spawn(function* () {

I'm confused. Isn't this a "database operation" which should therefore be queued?

@@ +139,5 @@
> +    return gDBConn;
> +  });
> +}
> +
> +let gClosePromise = null;

Would an isClosed boolean be simpler?

There appear to be two promises here: gDBConn (which resolves to the connection when open) and gClosePromise (which is resolved when the connection is not currently being closed). It took me far too long to figure this out from the code ;)

@@ +143,5 @@
> +let gClosePromise = null;
> +function closeDBConn() {
> +  if (!gDBConn)
> +    return Promise.resolve();
> +  return gClosePromise = getDBConn().then(aConn => {

So nothing ever sets gClosePromise back to null, it just stays resolved when the DB is reopened? That's very confusing, given the name. And suggests Promise.resolve is the default value, not null, if you do want to do things this way.

@@ +149,5 @@
> +    return aConn.close();
> +  });
> +}
> +
> +// Promise queue for database operations to prevent race conditions.

What are these possible race conditions? please expand the comment.

Aren't opening and closing "database operations" too?

@@ +153,5 @@
> +// Promise queue for database operations to prevent race conditions.
> +let gDatabaseQueue = null;
> +function queueDbOperation(aOperation) {
> +  return gDatabaseQueue =
> +    (gDatabaseQueue || Promise.resolve()).then(aOperation, aOperation);

This looks like an initial value that should replace null.

@@ +171,5 @@
> +    if (gShuttingDown) {
> +      gPendingFiles.add(aLog);
> +      return null;
> +    }
> +    return indexLogFile(aLog, new TextDecoder());

Why the second argument if indexLogFile can create a new TextDecoder anyway? Or did you mean to keep an instance around?

@@ +173,5 @@
> +      return null;
> +    }
> +    return indexLogFile(aLog, new TextDecoder());
> +  }
> +  return gIndexingQueue = (gIndexingQueue || Promise.resolve()).then(job, job);

Can't we just give gIndexingQueue a default value of Promise.resolve() instead of null?

If there are errors, e.g. syntax errors, during job, are you sure that isn't dropped on the floor?
Attachment #8454900 - Flags: feedback+

Comment 14

3 years ago
Comment on attachment 8454768 [details] [diff] [review]
Update Instantbird UI

Review of attachment 8454768 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine to me, but I'll hold off a review until the backend patch is final.

::: im/content/viewlog.js
@@ -75,5 @@
>      let findbar = document.getElementById("findbar");
>      if (log.format == "json") {
>        log.getConversation().then((aConv) => {
>          if (!aConv) {
> -          // Empty or completely broken json log file.

Don't remove this comment.
Attachment #8454768 - Flags: feedback+

Comment 15

3 years ago
Comment on attachment 8454900 [details] [diff] [review]
Patch v2

Review of attachment 8454900 [details] [diff] [review]:
-----------------------------------------------------------------

::: chat/components/public/imILogger.idl
@@ +66,5 @@
> +  // callback on imILog instances constructed from each file. Returns a promise
> +  // that resolves when iteration is complete. If the callback returns a promise,
> +  // iteration pauses until the promise resolves. If the callback throws (or rejects),
> +  // iteration will stop and the returned promise will reject with the same error.
> +  jsval forEach(in imIProcessLogsCallback aCallback);

Blank line after this please for legibility.

::: chat/components/src/logger.js
@@ +102,5 @@
> +  let pathComponents = OS.Path.split(aPath).components;
> +  let logDirIndex = pathComponents.lastIndexOf("logs");
> +  let path = "";
> +  for (let i = logDirIndex + 1; i < pathComponents.length ; ++i)
> +    path = OS.Path.join(path, pathComponents[i]);

Doesn't the string aPath always start with kLogDirPath? Can't we just strip this off? This needs a comment explaining why the function is so complicated.

@@ +249,5 @@
> +  let decoder = new TextDecoder();
> +  let logs = yield Services.logs.getAllLogs();
> +  yield logs.forEach({processLog: Task.async(function* (aLog) {
> +    try {
> +      yield indexLogFile(aLog.path, decoder);

This function will fail for txt logs, so shouldn't we check it's .json?

@@ +802,5 @@
>   */
> +function DailyLogSet(aEntries) {
> +  this._relativePaths = new Set();
> +  this._dayIDsByRelativePath = new Map();
> +  this._entriesByDay = new Map();

Uh. This was already complicated enough...

@@ +807,3 @@
>    for (let entry of aEntries) {
>      let path = entry.path;
> +    let relativePath = getRelativeLogPath(path);

It's super confusing to have this._relativePaths exclude filenames while relativePath includes it.

@@ +842,2 @@
>      }
> +    this._dayIDsByRelativePath.set(relativePath, dayID);

Ditto here. This now includes filenames. How about dayIDForFile or something like that?

Do we need to store this or can we simply put the above filename -> dayID function in a separate method and call it when needed? Or is it too elaborate?

@@ +845,5 @@
>  
> +  this._days = [];
> +  for (let key of this._entriesByDay.keys())
> +    this._days.push(key);
> +  this._days.sort();

this._days = [...this.entriesByDay.keys()].sort();

@@ +850,3 @@
>  }
> +DailyLogSet.prototype = {
> +  // Paths relative to the logs directory, excluding the filename.

Can we make this comment and the var name more helpful?
e.g. something like "all the directories of files included in this log set, needed for search because..."

@@ +851,5 @@
> +DailyLogSet.prototype = {
> +  // Paths relative to the logs directory, excluding the filename.
> +  _relativePaths: new Set(),
> +  get isEmpty() this._relativePaths.size == 0,
> +  // Maps dayID to an array of paths (including filenames) belonging to that day.

an ordered array!

@@ +859,3 @@
>    _days: [],
> +  forEach: Task.async(function* (aCallback) {
> +    for (let i = 0; i < this._days.length; ++i)

for (dayID of this._days)

@@ +865,5 @@
> +    if (!Services.prefs.getBoolPref("messenger.logging.indexLogs"))
> +      return null;
> +    let paths = [];
> +    this._relativePaths.forEach(aPath => paths.push(aPath));
> +    let results = yield searchDatabase(aQuery, paths);

searchDatabase(aQuery, [...this._relativePaths])

@@ +877,5 @@
> +      // that the file is in the same folder as one of the member files of this
> +      // DailyLogSet, but is not a member itself. If multiple files belonging to
> +      // the same day match the query, add this day's files only once.
> +      let dayID = this._dayIDsByRelativePath.get(path);
> +      if (!dayID || dayIDsInResults.has(dayID))

!dayID can be true when?

@@ +879,5 @@
> +      // the same day match the query, add this day's files only once.
> +      let dayID = this._dayIDsByRelativePath.get(path);
> +      if (!dayID || dayIDsInResults.has(dayID))
> +        continue;
> +      entries = entries.concat(this._entriesByDay.get(dayID));

Lets move the construction of entries outside this loop. Then you don't need the if clause and the construction is a simple entries = [ ... ] generator.
Attachment #8454900 - Flags: review?(aleth) → review-
(Assignee)

Comment 16

3 years ago
Created attachment 8457686 [details] [diff] [review]
Patch v3

I've tried to address all of your review comments.

> @@ +879,5 @@
> > +      // the same day match the query, add this day's files only once.
> > +      let dayID = this._dayIDsByRelativePath.get(path);
> > +      if (!dayID || dayIDsInResults.has(dayID))
> > +        continue;
> > +      entries = entries.concat(this._entriesByDay.get(dayID));
> 
> Lets move the construction of entries outside this loop. Then you don't need
> the if clause and the construction is a simple entries = [ ... ] generator.
I ended up with this array comprehension:
>[entry for (entry of [this._entriesByDay.get(dayID) for (dayID of dayIDsInResults.values())])]
I figured what I have currently is far less confusing so I kept it.
Attachment #8454900 - Attachment is obsolete: true
Attachment #8454900 - Flags: review?(florian)
Attachment #8457686 - Flags: review?(aleth)
(Assignee)

Comment 17

3 years ago
Created attachment 8458087 [details] [diff] [review]
Patch v4

Use the spread operator where possible, and update tests to use new API.
Attachment #8457686 - Attachment is obsolete: true
Attachment #8457686 - Flags: review?(aleth)
Attachment #8458087 - Flags: review?(aleth)
(Assignee)

Comment 18

3 years ago
Created attachment 8458088 [details] [diff] [review]
Update Instantbird UI v2

Update stats service log sweeping code.
Attachment #8454768 - Attachment is obsolete: true
Attachment #8454768 - Flags: review?(aleth)
Attachment #8458088 - Flags: review?(aleth)
(Assignee)

Comment 19

3 years ago
Created attachment 8458114 [details] [diff] [review]
Update Thunderbird UI
Attachment #8458114 - Flags: review?(aleth)

Comment 20

3 years ago
Comment on attachment 8458088 [details] [diff] [review]
Update Instantbird UI v2

Review of attachment 8458088 [details] [diff] [review]:
-----------------------------------------------------------------

Shouldn't the searchbox be hidden if indexing is turned off?
(Assignee)

Comment 21

3 years ago
Comment on attachment 8458088 [details] [diff] [review]
Update Instantbird UI v2

Review of attachment 8458088 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to aleth [:aleth] from comment #20)
> Comment on attachment 8458088 [details] [diff] [review]
> Update Instantbird UI v2
> 
> Review of attachment 8458088 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Shouldn't the searchbox be hidden if indexing is turned off?

::: im/content/viewlog.js
@@ +42,5 @@
>      logTree.focus();
> +    logWindow._originalLogSet = window.arguments[0].logs;
> +    logWindow._reloadTree();
> +    if (!Services.prefs.getBoolPref("messenger.logging.indexLogs"))
> +      document.getElementById("viewlog-toolbar").setAttribute("hidden", true);

The toolbar (which includes the searchbox) is hidden if indexing is disabled.

Comment 22

3 years ago
Comment on attachment 8458087 [details] [diff] [review]
Patch v4

Review of attachment 8458087 [details] [diff] [review]:
-----------------------------------------------------------------

This is looking pretty solid, but I'd like flo to take a look at it (in particular the DB structure) before a final review.

::: chat/components/src/logger.js
@@ +94,5 @@
> +const kLogDirPath = OS.Path.join(OS.Constants.Path.profileDir, "logs");
> +const kLogsIndexPath = OS.Path.join(OS.Constants.Path.profileDir,
> +                                    "indexedLogs.sqlite");
> +const kPendingIndexingJobsPath = OS.Path.join(OS.Constants.Path.profileDir,
> +                                              "pendingIndexingJobs.json");

This is not used.

@@ +99,5 @@
> +
> +// Returns the subpath from the logs directory.
> +function getRelativeLogPath(aPath) {
> +  if (!aPath.startsWith(kLogDirPath))
> +    return aPath;

Didn't you decide to throw?

@@ +137,5 @@
> +  }),
> +  close: function() {
> +    if (!this._dbConn)
> +      return Promise.resolve();
> +    return this._closePromise = this.getConn().then(aConn => {

this._dbConn is enough here.

@@ +144,5 @@
> +    });
> +  },
> +  // Promise queue for database operations to prevent race conditions between
> +  // statements that modify the database - particularly INSERTs. If one INSERT
> +  // statement gets executed while yielding another, last_insert_rowid cannot be

"while yielding another"? Is sqlite.jsm off-main-thread?

Can't you do without the queue by chaining any SELECT last_insert_rowid calls directly on the INSERT execute, and yielding on that?

@@ +149,5 @@
> +  // predicted. Use of this queue ensures that such statements get executed one
> +  // after another.
> +  _dbQueue: Promise.resolve(),
> +  queueOperation: function(aOperation) {
> +    return this.dbQueue = this._dbQueue.then(aOperation, aOperation);

Bad typo!

@@ +203,5 @@
> +  // This removes the current index and re-indexes all logs.
> +  indexAllLogs: function() {
> +    if (!Services.prefs.getBoolPref("messenger.logging.indexLogs"))
> +      return;
> +    this._indexingQueue = Task.spawn(function* () {

Don't you want this._indexingQueue.then(...) here just in case?

@@ +245,5 @@
> +    let conn = yield gDatabase.getConn();
> +    return yield conn.execute(
> +      "SELECT path, filename FROM logs WHERE content MATCH ? " +
> +      "AND path in (" + "?, ".repeat(aPathsToMatch.length - 1) + "?);",
> +      [aQuery, ...aPathsToMatch]);

:)

@@ +255,5 @@
> +  queueIndexingJob: function(aLog) {
> +    if (!Services.prefs.getBoolPref("messenger.logging.indexLogs"))
> +      return null;
> +    let job = () => this.indexLogFile(aLog, new TextDecoder());
> +    return this._indexingQueue = this._indexingQueue.then(job, job);

I ask again whether e.g. syntax errors in job() or other rejections are correctly reported.

@@ +274,5 @@
> +    for (let convId of gLogWritersById.keys())
> +      closeLogWriter(convId);
> +    yield gIndexer._indexingQueue;
> +    let filesToBeIndexed = {
> +      paths: []

Just init this to [...pendingFiles]

@@ +277,5 @@
> +    let filesToBeIndexed = {
> +      paths: []
> +    };
> +    for (let file of pendingFiles.values())
> +      filesToBeIndexed.paths.push(file);

then you don't need this.
Attachment #8458087 - Flags: review?(aleth)
Attachment #8458087 - Flags: review-
Attachment #8458087 - Flags: feedback?(florian)
(Assignee)

Comment 23

3 years ago
Yoric, I've got a few questions about Sqlite.jsm:

Currently I open a database connection when needed, but it stays open until shutdown. I've got a couple of issues with closing the connection. Any ongoing transaction is forcibly rolled back when close() is called. This seems to imply that if I want to wait for any ongoing transaction to finish and then close, I'd need to maintain my own promise queue and chain all transactions on this, and yield on this queue before closing. The other option is to |while (conn.transactionInProgress) yield;| which is kind of ugly. Is there another way?

Another idea is to close the database after periods of inactivity (which certainly calls for a promise queue for all transactions). However, I came across |shrinkMemoryOnConnectionIdleMS| which seems a nice way to reduce memory usage, but requires an open connection. Is there an upside/downside to using this over closing the connection? Is there any benefit from closing the connection other than releasing the lock on the db file?
Flags: needinfo?(dteller)
(Assignee)

Comment 24

3 years ago
Created attachment 8458328 [details] [diff] [review]
Patch v5

Address review comments, don't allow database operations while closing it in async shutdown.
Attachment #8458087 - Attachment is obsolete: true
Attachment #8458087 - Flags: feedback?(florian)
Attachment #8458328 - Flags: review?(aleth)
Attachment #8458328 - Flags: feedback?(florian)
(In reply to Nihanth Subramanya [:nhnt11] from comment #23)
> Yoric, I've got a few questions about Sqlite.jsm:
> 
> Currently I open a database connection when needed, but it stays open until
> shutdown. I've got a couple of issues with closing the connection. Any
> ongoing transaction is forcibly rolled back when close() is called. This
> seems to imply that if I want to wait for any ongoing transaction to finish
> and then close, I'd need to maintain my own promise queue and chain all
> transactions on this, and yield on this queue before closing. The other
> option is to |while (conn.transactionInProgress) yield;| which is kind of
> ugly. Is there another way?

I seem to remember that you can execute a bogus request and `yield` on the result.

> Another idea is to close the database after periods of inactivity (which
> certainly calls for a promise queue for all transactions). However, I came
> across |shrinkMemoryOnConnectionIdleMS| which seems a nice way to reduce
> memory usage, but requires an open connection. Is there an upside/downside
> to using this over closing the connection? Is there any benefit from closing
> the connection other than releasing the lock on the db file?

I don't remember specific data, but I seem to remember that closing is better.
Flags: needinfo?(dteller)
(Assignee)

Updated

3 years ago
Depends on: 1041928
(Assignee)

Comment 26

3 years ago
Created attachment 8460050 [details] [diff] [review]
Patch v5.1

I just realized I uploaded the wrong patch last time :(
This mainly fixes a couple of syntax errors. I also took the opportunity to make logWriter.startNewFile queue an indexing job on the previous file.
Attachment #8458328 - Attachment is obsolete: true
Attachment #8458328 - Flags: review?(aleth)
Attachment #8458328 - Flags: feedback?(florian)
Attachment #8460050 - Flags: review?(aleth)
Attachment #8460050 - Flags: feedback?(florian)

Comment 27

3 years ago
Given Yoric's feedback, please add code to automatically close the database after a certain period of inactivity.

Comment 28

3 years ago
For the record, we decided to keep the database promise queue, as a comment in Sqlite.jsm warns: 
* FUTURE IMPROVEMENTS
*
*   Ability to enqueue operations. Currently there can be race conditions,
*   especially as far as transactions are concerned. It would be nice to have
*   an enqueueOperation(func) API that serially executes passed functions.

Updated

3 years ago
Blocks: 1041928
No longer depends on: 1041928
(Assignee)

Comment 29

3 years ago
Created attachment 8461099 [details] [diff] [review]
Patch v6

Close database after 5 minutes of inactivity.
Attachment #8460050 - Attachment is obsolete: true
Attachment #8460050 - Flags: review?(aleth)
Attachment #8460050 - Flags: feedback?(florian)
Attachment #8461099 - Flags: review?(aleth)
Attachment #8461099 - Flags: feedback?(florian)
Comment on attachment 8461099 [details] [diff] [review]
Patch v6

Review of attachment 8461099 [details] [diff] [review]:
-----------------------------------------------------------------

Great progress!

note: I haven't looked in details at the DailyLogSet object, and I haven't looked at the test changes.

::: chat/components/public/imILogger.idl
@@ +85,5 @@
>  
> +  // Below methods return promises that resolve to imILogSet instances.
> +  // If aGroupByDay is true, the LogSet will organize log files by date and
> +  // construct imILog instances from all of the files for a given date. If false,
> +  // imILog instances will be constructed from individual files.

Would it make sense to drop all these aGroupByDay parameters here, and instead have a forEachDay method in the imILogSet interface?

::: chat/components/src/logger.js
@@ +92,5 @@
>  );
>  
> +const kLogDirPath = OS.Path.join(OS.Constants.Path.profileDir, "logs");
> +const kLogsIndexPath = OS.Path.join(OS.Constants.Path.profileDir,
> +                                    "indexedLogs.sqlite");

Is there a good reason for the kLogsIndexPath string to be stored in memory even if indexing is disabled?

@@ +109,5 @@
> +  // should use the conn getter, which will open a new connection if necessary
> +  // before returning this promise.
> +  _dbConn: null,
> +  // This promise is pending when the connection is being closed, and resolved otherwise.
> +  _closePromise: Promise.resolve(),

shouldn't this be initialized to null so that we have a way of knowing if a closing operation is in progress?

@@ +110,5 @@
> +  // before returning this promise.
> +  _dbConn: null,
> +  // This promise is pending when the connection is being closed, and resolved otherwise.
> +  _closePromise: Promise.resolve(),
> +  getConn: function() this._closePromise.then(() => {

I see no reason to use "Conn" rather than "Connection" here, it doesn't seem difficult to make the lines fit 80 chars.

@@ +115,5 @@
> +    if (this._dbConn)
> +      return this._dbConn;
> +    return this._dbConn = this.queueOperation(Task.async(function* () {
> +      let conn = yield Sqlite.openConnection({path: kLogsIndexPath});
> +      yield conn.execute("CREATE TABLE IF NOT EXISTS pathIds (" +

pathIds -> paths ?

@@ +119,5 @@
> +      yield conn.execute("CREATE TABLE IF NOT EXISTS pathIds (" +
> +                         "id INTEGER PRIMARY KEY, " +
> +                         "path VARCHAR UNIQUE);");
> +      yield conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS fileContents USING fts4();");
> +      yield conn.execute("CREATE TABLE IF NOT EXISTS fileNamesAndIds (" +

logFiles?

@@ +138,5 @@
> +    if (!this._dbConn)
> +      return Promise.resolve();
> +    return this._closePromise = this._dbConn.then(aConn => {
> +      this._dbConn = null;
> +      return aConn.close();

Is this returning something? Is it a promise?

@@ +166,5 @@
> +    return this._dbQueue = promise;
> +  }
> +};
> +
> +let gIndexer = {

maybe it would be useful to have an 'enabled' getter here to avoid duplicating the preference check several times.

@@ +184,5 @@
> +      catch (aError) {
> +        // Ignore any invalid JSON in the log file.
> +      }
> +    }
> +

Should we check here that content isn't empty anymore, and return early if it is?

@@ +198,5 @@
> +      // If there were no results, insert it.
> +      if (!results.length) {
> +        yield conn.executeCached("INSERT INTO pathIds (path) VALUES (:path);",
> +                                 {path: path});
> +        results = yield conn.executeCached("SELECT last_insert_rowid();");

Are you sure there's no longer a race condition here?

Would it work to do instead:
yield results = conn.executeCached("INSERT INTO pathIds (path) VALUES (:path); SELECT last_insert_rowid();", {path: path});

@@ +204,5 @@
> +      let pathId = results[0].getResultByIndex(0);
> +
> +      yield conn.executeCached("INSERT INTO fileContents (content) VALUES (:content);",
> +                               {content: content});
> +      results = yield conn.executeCached("SELECT last_insert_rowid();");

Same comment here.

@@ +206,5 @@
> +      yield conn.executeCached("INSERT INTO fileContents (content) VALUES (:content);",
> +                               {content: content});
> +      results = yield conn.executeCached("SELECT last_insert_rowid();");
> +      let contentId = results[0].getResultByIndex(0);
> +      yield conn.executeCached(

Nit: blank line before this.

@@ +233,5 @@
> +        }
> +      })});
> +      Services.prefs.setBoolPref("messenger.logging.reindex-required", false);
> +    });
> +    return this._indexingQueue = this._indexingQueue.then(job, job);

Do we expect the indexingQueue to be non-empty at this point?

@@ +243,5 @@
> +      return;
> +    let pendingFiles;
> +    try {
> +      pendingFiles = JSON.parse(
> +        Services.prefs.getCharPref("messenger.logging.filesToBeIndexed"));

The only place where I see you setting this pref is in the async shutdown code. If you want to index log files that are created but not closed cleanly due to a crash of Instantbird (or power loss, or whatever), I think you need to add the file to this pref as soon as you've started writing to the log file.

@@ +253,5 @@
> +  }),
> +
> +  // Executes a SELECT statement for the query and returns the array of results.
> +  search: Task.async(function* (aQuery, aPathsToMatch) {
> +    if (!aPathsToMatch.length || !(yield OS.File.exists(kLogsIndexPath)))

I don't think this code should look for the existence of the database file.

What about instead making getConn[ection] take an optional aCreate boolean parameter, and making it return null if the database doesn't exist?

@@ +273,5 @@
> +    let job = () => this.indexLogFile(aLogPath, new TextDecoder());
> +    this._indexingQueue = this._indexingQueue.then(job, job);
> +    this._indexingQueue.catch(aError => {
> +      let errorMsg = "Error indexing log file at " + aLogPath + ":\n" +
> +                     aError + "\n";

Drop this final "\n".

@@ +276,5 @@
> +      let errorMsg = "Error indexing log file at " + aLogPath + ":\n" +
> +                     aError + "\n";
> +      if (aError.errors) {
> +        for (let error of aError.errors)
> +          errorMsg += error.message + "\n";

"\n" + error.message

@@ +284,5 @@
> +    return this._indexingQueue;
> +  }
> +};
> +
> +AsyncShutdown.profileBeforeChange.addBlocker(

Is this shutdown blocker needed if indexing isn't enabled? If not, you should likely check the value of the pref before adding it.

@@ +302,5 @@
> +      closeLogWriter(convId);
> +    yield gIndexer._indexingQueue;
> +    let filesToBeIndexed = {
> +      paths: [...pendingFiles]
> +    };

Why is the {paths: } wrapping object needed?

@@ +304,5 @@
> +    let filesToBeIndexed = {
> +      paths: [...pendingFiles]
> +    };
> +    Services.prefs.setCharPref("messenger.logging.filesToBeIndexed",
> +                               JSON.stringify(filesToBeIndexed));

This pref probably shouldn't be written if indexing is disabled.

@@ +541,5 @@
> +function closeLogWriter(aConvId) {
> +  let logWriter = gLogWritersById.get(aConvId);
> +  if (!logWriter)
> +    return;
> +  if (logWriter.path)

I think you should check if indexing is enabled here.

@@ +917,3 @@
>  };
>  
> +function LogSet(aEntries) {

This needs a comment about what aEntries is.

@@ +917,4 @@
>  };
>  
> +function LogSet(aEntries) {
> +  this._paths = aEntries.sort((a, b) => a.name > b.name)

Is this .sort supposed to sort the entries by start time? May be worth a comment.

@@ +919,5 @@
> +function LogSet(aEntries) {
> +  this._paths = aEntries.sort((a, b) => a.name > b.name)
> +                        .map(aEntry => aEntry.path);
> +  this._relativeDirnames = new Set(
> +    [OS.Path.dirname(getRelativeLogPath(path)) for (path of this._paths)]);

Why is this a set rather than just the array?

Why do you need to save both the paths and the relative paths in the constructor?

It seems the relative paths would be shorter strings and so take less memory, so maybe that's what we would want to save. The paths are easy to rebuild from the relative paths, aren't they?

@@ +1124,5 @@
>         "account-connected", "account-disconnected",
>         "account-buddy-status-changed"].forEach(function(aEvent) {
>          Services.obs.addObserver(this, aEvent, false);
>        }, this);
> +      if (Services.prefs.getBoolPref("messenger.logging.reindex-required"))

Wrap this whole new block of code in a test check that indexing is enabled, then remove the pref checks in indexAllLogs and indexPendingLogs.
Attachment #8461099 - Flags: feedback?(florian) → feedback+
(Assignee)

Comment 31

3 years ago
> @@ +233,5 @@
> > +        }
> > +      })});
> > +      Services.prefs.setBoolPref("messenger.logging.reindex-required", false);
> > +    });
> > +    return this._indexingQueue = this._indexingQueue.then(job, job);
> 
> Do we expect the indexingQueue to be non-empty at this point?

I don't understand, are you asking if the promise exists? If so - yes. We initialize it to Promise.resolve() so it is guaranteed to exist since we never set it to null anywhere.

> @@ +919,5 @@
> > +function LogSet(aEntries) {
> > +  this._paths = aEntries.sort((a, b) => a.name > b.name)
> > +                        .map(aEntry => aEntry.path);
> > +  this._relativeDirnames = new Set(
> > +    [OS.Path.dirname(getRelativeLogPath(path)) for (path of this._paths)]);
> 
> Why is this a set rather than just the array?

To remove duplicates.
(In reply to Nihanth Subramanya [:nhnt11] from comment #31)
> > @@ +233,5 @@
> > > +        }
> > > +      })});
> > > +      Services.prefs.setBoolPref("messenger.logging.reindex-required", false);
> > > +    });
> > > +    return this._indexingQueue = this._indexingQueue.then(job, job);
> > 
> > Do we expect the indexingQueue to be non-empty at this point?
> 
> I don't understand, are you asking if the promise exists? If so - yes. We
> initialize it to Promise.resolve() so it is guaranteed to exist since we
> never set it to null anywhere.

I was wondering if it's possible to have already-queued indexing jobs for specific log files at the point where we decide to do a complete re-index. It seems like a complete re-index should clear the queue of all pending tasks related to indexing... but then I'm not sure that case is worth handling, because if we are confident the queue is always empty at that point, that case never happens.

> > @@ +919,5 @@
> > > +function LogSet(aEntries) {
> > > +  this._paths = aEntries.sort((a, b) => a.name > b.name)
> > > +                        .map(aEntry => aEntry.path);
> > > +  this._relativeDirnames = new Set(
> > > +    [OS.Path.dirname(getRelativeLogPath(path)) for (path of this._paths)]);
> > 
> > Why is this a set rather than just the array?
> 
> To remove duplicates.

Add a code comment please :). It wasn't obvious to me that you expected duplicates.
(Assignee)

Comment 33

3 years ago
Created attachment 8466508 [details] [diff] [review]
Patch v7

This addresses your review comments keeping in mind the discussions we had on IRC. Sorry for the delay.
Attachment #8461099 - Attachment is obsolete: true
Attachment #8461099 - Flags: review?(aleth)
Attachment #8466508 - Flags: review?(florian)
(Assignee)

Comment 34

3 years ago
FYI test_logger.js hasn't yet been updated according to the changes in Patch v7.
(Assignee)

Comment 35

3 years ago
Created attachment 8466511 [details] [diff] [review]
Update Instantbird UI v3

Update according to changes in Patch v7
Attachment #8458088 - Attachment is obsolete: true
Attachment #8458088 - Flags: review?(aleth)

Comment 36

3 years ago
Comment on attachment 8466508 [details] [diff] [review]
Patch v7

Review of attachment 8466508 [details] [diff] [review]:
-----------------------------------------------------------------

forEachDay seems like a nice simplification!

How much work would be to add a simple indexing test to the current log write/read test? e.g. turn on indexing beforehand, then after writing the logs, carry out a search.

::: chat/components/public/imILogger.idl
@@ +55,5 @@
>  interface imIProcessLogsCallback: nsISupports {
>    // The callback can return a promise. If it does, then it will not be called
>    // on the next log until this promise resolves. If it throws (or rejects),
>    // iteration will stop.
> +  jsval processLog(in imILog aLog);

Just checking: Are you sure there are no current use cases where we could save some overhead with a processLog(aPath) rather than constructing Log objects?

@@ +91,5 @@
>  
> +  // Below methods return promises that resolve to imILogSet instances.
> +  // If aGroupByDay is true, the LogSet will organize log files by date and
> +  // construct imILog instances from all of the files for a given date. If false,
> +  // imILog instances will be constructed from individual files.

Comment needs updating.

::: chat/components/src/logger.js
@@ +104,5 @@
> +
> +let gDatabase = {
> +  get path() OS.Path.join(OS.Constants.Path.profileDir, "indexedLogs.sqlite"),
> +  // Promise that resolves to a database connection when it is open. Consumers
> +  // should use the conn getter, which will open a new connection if necessary

connection getter

@@ +109,5 @@
> +  // before returning this promise.
> +  _dbConn: null,
> +  // This promise is pending when the connection is being closed, and resolved otherwise.
> +  _closePromise: Promise.resolve(),
> +  getConnection: Task.async(function* (aCreate=true) {

nit: spaces around =

@@ +147,5 @@
> +  // Promise queue for database operations to prevent race conditions between
> +  // statements that modify the database - particularly INSERTs. If one INSERT
> +  // statement gets executed while yielding another, last_insert_rowid cannot be
> +  // predicted. Use of this queue ensures that such statements get executed one
> +  // after another.

I don't think the example given here makes sense as INSERT/SELECT are in the same transaction, a single entry in this queue. Better refer to the warning in Sqlite.jsm.

@@ +151,5 @@
> +  // after another.
> +  _dbQueue: Promise.resolve(),
> +  queueOperation: function(aOperation) {
> +    // If we'd scheduled a close(), cancel it.
> +    if (this._closeTimer) {

I don't think you need this with the changes below.

@@ +159,5 @@
> +    let promise = this._dbQueue.then(aOperation, aOperation);
> +    promise.then(() => {
> +      // If after the operation has completed, no other operation has been queued,
> +      // set a timeout to close the connection after 5 minutes.
> +      if (this._dbQueue == promise) {

This is not the right time to check this. The right time to check it is when the timeout handler fires.

What about a clearTimeout if a closeTimer already exists?

@@ +161,5 @@
> +      // If after the operation has completed, no other operation has been queued,
> +      // set a timeout to close the connection after 5 minutes.
> +      if (this._dbQueue == promise) {
> +        this._closeTimer = setTimeout(() =>
> +          this._dbQueue = this.close(), 5 * 60 * 1000);

The timeout handler must delete this._closeTimer.

@@ +176,5 @@
> +    // case of a crash for example), this ensures it's indexed at next startup.
> +    let pendingFiles;
> +    try {
> +      pendingFiles = new Set(JSON.parse(
> +        Services.prefs.getCharPref("messenger.logging.filesToBeIndexed")));

Under what circumstances is this ever not empty?

@@ +182,5 @@
> +      pendingFiles = new Set();
> +    }
> +    pendingFiles.add(aLogPath);
> +    Services.prefs.setCharPref("messenger.logging.filesToBeIndexed",
> +                               JSON.stringify([...pendingFiles]));

Are you sure this isn't going to cause problems when we index all logs, or if something goes wrong while we index all logs?

@@ +240,5 @@
> +  }),
> +
> +  // This removes the current index and re-indexes all logs. It is assumed that
> +  // this function is called only at startup, and hence that the indexing queue
> +  // is empty.

What exactly happens if the assumption is wrong? Should it throw if it's not true?

@@ +241,5 @@
> +
> +  // This removes the current index and re-indexes all logs. It is assumed that
> +  // this function is called only at startup, and hence that the indexing queue
> +  // is empty.
> +  indexAllLogs: function() {

Should this check enabled()?

@@ +257,5 @@
> +        }
> +      }));
> +      Services.prefs.setBoolPref("messenger.logging.reindex-required", false);
> +    });
> +    return this._indexingQueue = this._indexingQueue.then(job, job);

If you're really confident the queue is empty, why not this._indexingQueue = Task.spawn(...contents of job)

@@ +289,5 @@
> +      [aQuery, ...aPathsToMatch]));
> +  }),
> +
> +  // Promise queue for indexing.
> +  _indexingQueue: Promise.resolve(),

If the indexer is always initialized with either an all-logs index or a pending-logs index, then this should not be the default. Instead an init method should do call one of those two (now private) methods and only there set _indexingQueue to a promise. This ensures errors if something is queued before init.

@@ +291,5 @@
> +
> +  // Promise queue for indexing.
> +  _indexingQueue: Promise.resolve(),
> +  // Returns a promise that resolves after the given file has been indexed.
> +  queueIndexingJob: function(aLogPath) {

Maybe call this queueFile considering the parameter is not a function.

@@ +864,5 @@
> +LogSet.prototype = {
> +  // Full paths of all log files in this LogSet.
> +  _paths: [],
> +  // Log folder paths relative to the logs directory. These are needed for search:
> +  // we query the database for log files whose directory paths belong to this Set.

Maybe "These determine the scope of a database search carried out with this log set".

@@ +889,5 @@
> +      let dayID;
> +      if (logFormat == "json") {
> +        // We clone the date for the log, reset it to the 0th hour/minute/second,
> +        // and use that to construct the day ID.
> +        // Array we'll put the log in.

uh?

@@ +900,5 @@
> +        // For legacy text logs, create Log instances from individual paths.
> +        dayID = dateForID.toISOString() + "txt";
> +      }
> +
> +      if (currentDayID != dayID) {

Please check this works just as the current code does when a particular day contains a mixture of txt and json logs with various timestamps. I think it may break some assumptions implicitly made here, but I'm not sure, as the sort is by filename. At any rate it warrants a comment.

@@ +936,5 @@
>  function Logger() { }
>  Logger.prototype = {
>    // Returned Promise resolves to an array of entries for the
>    // log folder if it exists, otherwise null.
>    _getLogArray: Task.async(function* (aAccount, aNormalizedName) {

Can we give this function a better name? It sounds more general than it is.
Attachment #8466508 - Flags: review?(florian) → review-

Comment 37

3 years ago
Comment on attachment 8466511 [details] [diff] [review]
Update Instantbird UI v3

Review of attachment 8466511 [details] [diff] [review]:
-----------------------------------------------------------------

When you do the corresponding TB UI changes, it may be an idea to consider unforking the chatLogTreeView object which is shared code, and stick it in a JS module. Rather than copypasting all the changes by hand.

::: im/components/ibConvStatsService.js
@@ +66,5 @@
>  
>      let decoder = new TextDecoder();
>  
> +    Services.logs.forEach(Task.async(function* (aLog) {
> +        let path = aLog.path;

How much overhead is it that we're now passing around a log object that we don't use?

@@ +67,5 @@
>      let decoder = new TextDecoder();
>  
> +    Services.logs.forEach(Task.async(function* (aLog) {
> +        let path = aLog.path;
> +        if (!path.endsWith(".json"))

Given that we do have a log object, I guess there is an aLog.format?

Comment 38

3 years ago
Comment on attachment 8466508 [details] [diff] [review]
Patch v7

Review of attachment 8466508 [details] [diff] [review]:
-----------------------------------------------------------------

::: chat/components/public/imILogger.idl
@@ +110,5 @@
>    // invokes the callback on every log file. Returns a promise that resolves when
>    // iteration is complete. If the callback returns a promise, iteration pauses
>    // until the promise resolves. If the callback throws (or rejects), iteration
>    // will stop and the returned promise will reject with the same error.
>    jsval forEach(in imIProcessLogsCallback aCallback);

Regarding the question of passing a path vs an imILog to the callback: maybe this "global" forEach should use a callback that takes paths as an argument, and the logset foreach should use the current one? What are the use cases for each?
(Assignee)

Comment 39

3 years ago
Created attachment 8467242 [details] [diff] [review]
Patch v8

Address review comments.
Attachment #8466508 - Attachment is obsolete: true
Attachment #8467242 - Flags: review?(aleth)

Comment 40

3 years ago
Comment on attachment 8467242 [details] [diff] [review]
Patch v8

Review of attachment 8467242 [details] [diff] [review]:
-----------------------------------------------------------------

r+ from me with nits fixed :)

::: chat/components/src/logger.js
@@ +153,5 @@
> +    promise.then(() => {
> +      // If we'd scheduled a close(), cancel it.
> +      if (this._closeTimer) {
> +        clearTimeout(this._closeTimer);
> +        delete this._closeTimer;

You don't need to delete it here, as you set it again in the next line.

@@ +420,5 @@
>    format: "txt",
>    encoder: new TextEncoder(),
>    startNewFile: function lw_startNewFile(aStartTime, aContinuedSession) {
> +    // Queue the previous file for indexing before we start a new one.
> +    if (this.path && gIndexer.enabled)

You don't need the second check as queueFile contains it too.

::: chat/components/src/test/test_logger.js
@@ +415,5 @@
> +  yield filteredLogs.forEachDay({processLog: Task.async(function* (aLog) {
> +    let conv = yield aLog.getConversation();
> +    let date = reduceTimeToDate(aLog.time);
> +    // One session message - the search should have filtered out the bad files.
> +    testMsgs(conv.getMessages(), messagesByDay.get(date), 1);

I think you also want to check the log returned is the right one (the one that contains "Nothing"). And fail if more than one date is returned. (Or maybe I'm missing something about the test here)
Attachment #8467242 - Flags: review?(aleth) → review+
(Assignee)

Comment 41

3 years ago
Created attachment 8467650 [details] [diff] [review]
Patch v9

Requesting r? again for the test changes.
Attachment #8467242 - Attachment is obsolete: true
Attachment #8467650 - Flags: review?(aleth)
(Assignee)

Comment 42

3 years ago
Created attachment 8467671 [details] [diff] [review]
Patch v9.1

In the searching test, check the date of the log instead of testing the messages.
Attachment #8467650 - Attachment is obsolete: true
Attachment #8467650 - Flags: review?(aleth)
Attachment #8467671 - Flags: review?(aleth)
(Assignee)

Comment 43

3 years ago
Created attachment 8467673 [details] [diff] [review]
Update Instantbird UI v4

Address review comments
Attachment #8466511 - Attachment is obsolete: true
Attachment #8467673 - Flags: review?(aleth)

Updated

3 years ago
Attachment #8467673 - Flags: review?(aleth) → review+

Comment 44

3 years ago
Comment on attachment 8467671 [details] [diff] [review]
Patch v9.1

Review of attachment 8467671 [details] [diff] [review]:
-----------------------------------------------------------------

Based on the discussion http://logs.glob.uno/?c=developers#c1026700 (doing your work for you in the interest of moving this forward) I don't think you can use prefs for your pendingFiles mechanism, at least not for the part in indexLogFile (they are safe for the shutdown part). You'll have to yield on a write to a file of your own if you really want such protection. But before you implement something like that think about the cases in which that too would fail and the cases in which it actually helps, to figure out if it's worthwhile.

This looks r+ otherwise so I'm r?ing flo.
Attachment #8467671 - Flags: review?(florian)
Attachment #8467671 - Flags: review?(aleth)
Attachment #8467671 - Flags: review-
(Assignee)

Comment 45

3 years ago
Created attachment 8467726 [details] [diff] [review]
Patch v10

Add log files to the filesToBeIndexed pref when they are created, rather than just before they are indexed. This simplifies the AsyncShutdown blocker nicely.
Attachment #8467671 - Attachment is obsolete: true
Attachment #8467671 - Flags: review?(florian)
Attachment #8467726 - Flags: review?(florian)
(Assignee)

Comment 46

3 years ago
Created attachment 8467735 [details] [diff] [review]
Update Thunderbird UI v2

Small update for the forEachDay changes.
Attachment #8458114 - Attachment is obsolete: true
Attachment #8458114 - Flags: review?(aleth)
Attachment #8467735 - Flags: review?(florian)

Comment 47

3 years ago
Comment on attachment 8467735 [details] [diff] [review]
Update Thunderbird UI v2

Review of attachment 8467735 [details] [diff] [review]:
-----------------------------------------------------------------

There is a section at the end of chatlogtreeview that does not match the IB version. Is that intentional? (Iirc the TB showLogList also has more parameters as it's not always desired to select the first entry)
Attachment #8467735 - Flags: review?(florian) → review-
(Assignee)

Comment 48

3 years ago
Created attachment 8467803 [details] [diff] [review]
Update Thunderbird UI v3

Updated _showLogList: It now returns a Promise since building the tree view is async.
Attachment #8467735 - Attachment is obsolete: true
Attachment #8467803 - Flags: review?(aleth)

Comment 49

3 years ago
Comment on attachment 8467803 [details] [diff] [review]
Update Thunderbird UI v3

Review of attachment 8467803 [details] [diff] [review]:
-----------------------------------------------------------------

::: mail/components/im/content/chat-messenger-overlay.js
@@ +632,5 @@
>  
>        imServices.logs.getLogsForContact(contact).then(aLogs => {
> +        this._showLogList(aLogs, true).then(aNotEmpty => {
> +          if (aNotEmpty)
> +            return;

Do we really need to call showLogList to find out if there are any logs, or what is this for? Can't this be simplified?

Updated

3 years ago
Attachment #8467803 - Flags: review?(aleth) → review+

Updated

3 years ago
Attachment #8467726 - Flags: review+
Comment on attachment 8467726 [details] [diff] [review]
Patch v10

Review of attachment 8467726 [details] [diff] [review]:
-----------------------------------------------------------------

All looks pretty good. The real reason for r- here is the concern about pointlessly iterating through the logs if someone calls the API to search without displaying a list of logs.

I still haven't looked at the test changes, but I'm fine with this part landing with only aleth's review, so unless you ask me specifically to look at it, I won't.

::: chat/components/src/logger.js
@@ +9,5 @@
>  Cu.import("resource:///modules/imXPCOMUtils.jsm");
>  Cu.import("resource:///modules/jsProtoHelper.jsm");
>  
>  Cu.import("resource://gre/modules/Task.jsm")
> +Cu.import("resource://gre/modules/AsyncShutdown.jsm");

Should this be lazy?

@@ +94,5 @@
> +const kLogDirPath = OS.Path.join(OS.Constants.Path.profileDir, "logs");
> +
> +// Returns the subpath from the logs directory.
> +function getRelativeLogPath(aPath) {
> +  if (!aPath.startsWith(kLogDirPath))

nit: {} if what's after the test is on more than one line (even if it's only one instruction).

@@ +115,5 @@
> +    if (this._dbConn)
> +      return this._dbConn;
> +    if (!aCreate && !(yield OS.File.exists(this.path)))
> +      return null;
> +    return this._dbConn = this.queueOperation(Task.async(function* () {

I think you need to add () around (foo = bar) when using the value from it, otherwise there'll be a JS strict warning.

@@ +135,5 @@
> +        "ON logFiles.pathId = paths.id;");
> +      return conn;
> +    }));
> +  }),
> +  close: function() {

what happens if close is called twice? Should we yield on this._closePromise before nullchecking this._dbConn?

@@ +138,5 @@
> +  }),
> +  close: function() {
> +    if (!this._dbConn)
> +      return Promise.resolve();
> +    return this._closePromise = this._dbConn.then(aConn => {

()

@@ +151,5 @@
> +  queueOperation: function(aOperation) {
> +    let promise = this._dbQueue.then(aOperation, aOperation);
> +    promise.then(() => {
> +      // If we'd scheduled a close(), cancel it.
> +      if (this._closeTimer)

I wonder if https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/DeferredTask.jsm would have been helpful here.

@@ +196,5 @@
> +      this._indexingQueue = this.indexAllLogs();
> +      return;
> +    }
> +    this._indexingQueue = Promise.resolve();
> +    if (Services.prefs.getCharPref("messenger.logging.filesToBeIndexed"))

This test doesn't seem useful. Can the pref just default to a valid empty JSON list?

@@ +270,5 @@
> +    Services.prefs.clearUserPref("messenger.logging.filesToBeIndexed");
> +    yield gDatabase.close();
> +    yield OS.File.remove(gDatabase.path);
> +    let decoder = new TextDecoder();
> +    yield Services.logs.forEach(Task.async(function* (aLogPath) {

What happens if Instantbird is closed before this full index finishes? How cleanly is it failing?

@@ +297,5 @@
> +  }),
> +
> +  // Executes a SELECT statement for the query and returns the array of results.
> +  search: Task.async(function* (aQuery, aPathsToMatch) {
> +    if (!aPathsToMatch.length)

I'm tempted to say the second parameter should be optional, and not specifying any path should mean "search all logs" but I don't think this is useful for our current UI, so let's keep that for a follow-up if we ever need it.

@@ +456,5 @@
> +        pendingFiles = new Set();
> +      }
> +      pendingFiles.add(this.path);
> +      Services.prefs.setCharPref("messenger.logging.filesToBeIndexed",
> +                                 JSON.stringify([...pendingFiles]));

The code reading and saving this pref is duplicated.

@@ +868,5 @@
> +  this._paths = aEntries.sort((a, b) => a.name > b.name)
> +                        .map(aEntry => aEntry.path);
> +  // Create a Set out of the array to remove duplicate entries.
> +  this._relativeDirnames = new Set(
> +    [OS.Path.dirname(getRelativeLogPath(path)) for (path of this._paths)]);

Am I reading correctly that to search logs for a contact we:
1. Get the relative paths for all the buddies of the contact.
2. Iterate through all these directories to get the whole list of paths.
3. Create a log set with this list of paths
4. The log set constructor (here) will then iterate over all paths to guess what the relative directory paths were.
5. Query the DB.
6. Rebuild the full paths from the DB results?

This seems inefficient. I think (2) should only happen the first time .forEach(Day) is called on the log set.

I think you can relatively easily do this by moving the content of _getLogFolderEntries to somewhere within the LogSet implementation. That new internal method would populate _path if it's empty but _relativeDirnames isn't.

(btw, should this be _relativeDir*N*ames? Or maybe just _dirs to simplify.)

@@ +1024,1 @@
>                                                         accountBuddy.normalizedName));

nit: indent

@@ +1033,1 @@
>                                                           accountBuddy.normalizedName));

nit: indent
Attachment #8467726 - Flags: review?(florian)
Attachment #8467726 - Flags: review-
Attachment #8467726 - Flags: feedback+

Updated

3 years ago
Blocks: 1071337

Updated

3 years ago
Blocks: 788145

Updated

3 years ago
No longer blocks: 1071337

Updated

2 years ago
Blocks: 954670
You need to log in before you can comment on or make changes to this bug.