Bug 1807010 Comment 4 Edit History
> we did notice that the path name we give to kvstore is actually used as a directory where the actual data file is being stored and so we were wondering what if we would use a separate data file for each of the databases? would that be doable as a way to make the usage of kvstore for multiple database stored in the same store path on the rkv or kvstore side?

Ooh, good question! I think only Extension Storage and Glean take advantage of multiple databases in the same environment, and neither uses cross-database transactions.

That said, I'm wary of using separate files, because:

* It's a departure from LMDB's semantics, where a single transaction is atomic across all databases in an environment.
* We'd need to change the transaction implementation to keep track of which databases were modified, so that we know which files we need to save.
* For transactions that affect multiple databases, we'd need to do extra work to avoid partial writes. (Or we could fail transactions that try to update multiple databases, but we'd still need to add code to track that, and then it's a backward-incompatible change for other consumers.)
* We'd need to decide whether to load the contents of all the files into memory when the environment is opened (that is, at the `Rkv::new` call), or only when each database is opened (at the `open_single` call).
* What happens if file corruption only affects some of the database files in the directory? Do we move the entire directory aside, or do we only surface the corruption error when the caller tries to access the corrupt database? I'm not sure the latter is backward-compatible; Glean (and bug 1645907 😊) currently recover from `FileInvalid` when the environment is created, but not when an individual database in that environment fails to open.
* Files and filesystems are surprisingly hard, even in the happy paths! These two articles by Dan Luu definitely made an impression on me: https://danluu.com/deconstruct-files/, https://danluu.com/file-consistency/

I like Dan Luu's conclusion in his first post:

> I'm not saying you should never use files. There is a tradeoff here. But if you have an application where you'd like to reduce the rate of data corruption, considering using a database to store data instead of using files.

...Which, to bring it back to the beginning, is why we [considered LMDB at first](https://mozilla.github.io/firefox-browser-architecture/text/0015-rkv.html). History doesn't repeat, but it does rhyme! 😅

I think we considered having an SQLite backend for rkv at one time, too, but punted on it because we wanted to use LMDB in the future, and safe mode wasn't meant to be permanent. It's a bit outside the scope of this particular bug, but I wonder if it's worth revisiting that decision now, as an alternative to expanding safe mode...
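
To make the second and third bullets a bit more concrete, here is a rough sketch of what "track which databases were modified" could look like. This is not rkv's safe mode implementation: `Env`, `WriteTxn`, `begin_write`, `put`, and the `allow_multi` flag are all hypothetical names made up for illustration, and the "files" are just in-memory maps.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory environment: one key-value map per named database.
/// In a one-file-per-database design, each inner map would be backed by its own file.
#[derive(Default)]
struct Env {
    dbs: BTreeMap<String, BTreeMap<String, String>>,
}

/// Hypothetical write transaction. It buffers writes and records which
/// databases they touch, so that commit knows which files would need saving.
struct WriteTxn<'a> {
    env: &'a mut Env,
    pending: Vec<(String, String, String)>, // (database, key, value)
    dirty: BTreeSet<String>,
}

impl Env {
    fn begin_write(&mut self) -> WriteTxn<'_> {
        WriteTxn {
            env: self,
            pending: Vec::new(),
            dirty: BTreeSet::new(),
        }
    }
}

impl<'a> WriteTxn<'a> {
    /// Buffers a write; nothing is visible in the environment until commit.
    fn put(&mut self, db: &str, key: &str, value: &str) {
        self.pending
            .push((db.to_owned(), key.to_owned(), value.to_owned()));
        self.dirty.insert(db.to_owned());
    }

    /// Applies the buffered writes and returns the names of the databases
    /// whose files would need to be saved, in sorted order.
    ///
    /// With `allow_multi == false`, a transaction that touches more than one
    /// database is rejected before anything is applied (the "fail the
    /// transaction" option from the list above).
    fn commit(self, allow_multi: bool) -> Result<Vec<String>, String> {
        let WriteTxn { env, pending, dirty } = self;
        if !allow_multi && dirty.len() > 1 {
            return Err(format!("transaction touches {} databases", dirty.len()));
        }
        for (db, key, value) in pending {
            env.dbs.entry(db).or_default().insert(key, value);
        }
        Ok(dirty.into_iter().collect())
    }
}
```

The hard part is what the sketch leaves out: once `commit` returns more than one name, each file has to be written separately, and a crash between two of those writes leaves the directory in a state that LMDB's single-file transaction would never expose.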