Add $value and $object properties for indexing to ES

NEW
Unassigned

Status

Testing
ActiveData
2 years ago
23 days ago

People

(Reporter: ekyle, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Description

2 years ago
ES cannot store just any document; the document must conform to the existing schema (as determined by previous documents and schema rules). This allows {"a": 1} and {"a": "ok"} to be stored, because integers can be translated to strings; but {"a": "ok"} and {"a": {"not": "ok"}} cannot both be indexed: there is no mapping from strings to objects.

This lack of flexibility can be overcome by adding a `$value` property to all leaf nodes:

> {"a": 1} -> {"a": {"$value": 1}}

This will allow future indexing of object property values.  Technically, I do not expect this to be a performance consideration because ES already only indexes leaf nodes by full name; the names are a bit longer, but there are not more of them.

Adding `$value` also allows the indexing of pure values:  ES is not able to index strings on their own, only objects with string property values.  Demanding an artificial container for a string is annoying, and may interfere with the namespace of future documents.  

> "save me!" -> {"$value": "save me!"}
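
As a rough illustration of the `$value` mapping only, here is a minimal Python sketch. The function name `wrap_values` is hypothetical and not part of ActiveData or Qb, and wrapping arrays element by element is my assumption; the description above does not say how arrays should be handled.

```python
def wrap_values(doc):
    """Return a copy of doc with every leaf replaced by {"$value": leaf}."""
    if isinstance(doc, dict):
        # Objects keep their property names; only the leaves below change
        return {name: wrap_values(child) for name, child in doc.items()}
    if isinstance(doc, list):
        # Assumption: arrays are wrapped element by element
        return [wrap_values(item) for item in doc]
    # Anything else is a leaf, including a pure value at the top level
    return {"$value": doc}


print(wrap_values({"a": 1}))    # {'a': {'$value': 1}}
print(wrap_values("save me!"))  # {'$value': 'save me!'}
```

With this shape, a later document {"a": {"not": "ok"}} no longer conflicts with an earlier {"a": 1}, because the integer lives at `a.$value` rather than at `a`.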

Furthermore, ES has no visibility into the existence of objects (non-leaf nodes). After indexing {"a": {"not": "ok"}}, a subsequent filter query with {"exists":{"field":"a"}} will return nothing. This is misleading, and requires the query writer to know a specific property that the object may contain so its existence can be queried (eg {"exists":{"field":"a.not"}} will match the document).
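
The effect can be pictured with a simplified model in Python. This is not how ES is implemented; it only assumes, as described above, that leaf nodes are indexed by their full dotted name. The helper `leaf_paths` is hypothetical.

```python
def leaf_paths(doc, prefix=""):
    """Yield the full dotted name of every leaf in a nested dict."""
    for name, child in doc.items():
        path = prefix + name
        if isinstance(child, dict):
            # Objects contribute no entry of their own, only their leaves
            yield from leaf_paths(child, path + ".")
        else:
            yield path


indexed = set(leaf_paths({"a": {"not": "ok"}}))
print("a" in indexed)      # False
print("a.not" in indexed)  # True
```

In this model the object `a` leaves no trace of its own, which is why an existence check on `a` finds nothing.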

To solve this, all objects will be given an `$object` property (with a value of "."?), so they can be queried for existence.

> {"a": {"not": "ok"}} -> {"a" : {"$object":".", "not": {"$value": "ok"}}}

Again, the extra computational overhead is expected to be minimal.
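
Putting both mappings together, a minimal Python sketch might look like the following. The name `encode` and the `is_root` parameter are hypothetical. Leaving the top-level document unmarked is an assumption taken from the example above, where only the nested object receives `$object`; wrapping arrays element by element is also an assumption.

```python
def encode(doc, is_root=True):
    """Wrap leaves in $value and mark nested objects with $object."""
    if isinstance(doc, dict):
        # Assumption: the root document is not marked, matching the example
        out = {} if is_root else {"$object": "."}
        for name, child in doc.items():
            out[name] = encode(child, is_root=False)
        return out
    if isinstance(doc, list):
        # Assumption: arrays are encoded element by element
        return [encode(item, is_root=False) for item in doc]
    return {"$value": doc}


print(encode({"a": {"not": "ok"}}))
# {'a': {'$object': '.', 'not': {'$value': 'ok'}}}
```

After this encoding, the existence of `a` can be tested through `a.$object` without knowing any of the object's own properties.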

Both of these mappings are expected to be handled by the Qb query translation layer, so, from the user's point of view, the documents stored and the queries on them are unaffected. This should be applied to the debug log first, since it suffers from both problems in the most extreme form: the diversity of shapes in the debug log makes it unindexable, which is preventing basic monitoring of the programs that use structured logging.
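
To give a sense of what the translation layer would have to do, here is a hedged Python sketch of the reverse mapping and of one query rewrite. The names `decode` and `translate_exists` are hypothetical and are not existing Qb functions. The `bool`/`should` form is one way to express "either field exists" in the ES query DSL; the exact form depends on the ES version in use.

```python
def decode(stored):
    """Strip $object markers and unwrap $value leaves from a stored document."""
    if isinstance(stored, list):
        return [decode(item) for item in stored]
    if isinstance(stored, dict):
        if "$value" in stored:
            return stored["$value"]
        return {k: decode(v) for k, v in stored.items() if k != "$object"}
    return stored


def translate_exists(field):
    """Rewrite an exists check so it matches both values and objects."""
    return {"bool": {
        "should": [
            {"exists": {"field": field + ".$value"}},
            {"exists": {"field": field + ".$object"}},
        ],
        "minimum_should_match": 1,
    }}


print(decode({"a": {"$object": ".", "not": {"$value": "ok"}}}))
# {'a': {'not': 'ok'}}
```

The query writer would still ask for {"exists":{"field":"a"}}; only the request sent to ES would use the longer names.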
(Reporter)

Updated

2 years ago
Assignee: klahnakoski → nobody
(Reporter)

Comment 1

23 days ago
See

https://github.com/klahnakoski/ActiveData/blob/dev/docs/Outreachy%20Proposal.md