Reactive type-safe persistence

Update:

Something similar to this idea now exists: RxDB.

It is difficult to get type-safe persistent storage working in the JavaScript ecosystem.

A common obstacle is that database type systems are incompatible with the types of JavaScript (and TypeScript). Prisma addresses some of this by generating TypeScript types from a schema, but so far I have not found it particularly convenient: it still requires intervention (a separate schema, code generation, migrations) to get working and to maintain.

Instead, I want to explore what would be possible if we stop trying to squeeze our data into a database.

Given that we’re working with JavaScript, we will be storing JS data, and the obvious format to store that data in is JSON.

However, a data store written entirely in JSON has a couple of problems:

  • While SQLite has shown that storing a database in a single file can be reasonable, SQLite is written in C and has lower-level control at its disposal. As far as I know, there isn’t a way to modify a JSON file with current npm tools without parsing the whole thing.
  • Even with an optimal solution, finding data at a specific nesting path would require parsing from either the start or the end of the file up to that location. The longer the file, therefore, the more memory each operation requires.
  • While JSON can encode most JavaScript values, it doesn’t encode everything.
    • undefined, NaN, Infinity, Date and RegExp are examples of values that could reasonably be encoded, but that JSON.stringify drops, turns into null, or converts to a string or an empty object, so they don’t survive a round trip through JSON.parse.
  • Multiple operations cannot be performed on the file simultaneously unless they are batched, which has its own compromises.
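A quick way to see the JSON limitation above, using only the built-in `JSON.stringify` and `JSON.parse`:

```javascript
// Shows which values survive a JSON.stringify / JSON.parse round trip.
const original = {
  missing: undefined,
  notANumber: NaN,
  infinite: Infinity,
  when: new Date(0),
  pattern: /ab+c/i,
};

const roundTripped = JSON.parse(JSON.stringify(original));

console.log("missing" in roundTripped); // false: undefined properties are dropped
console.log(roundTripped.notANumber);   // null
console.log(roundTripped.infinite);     // null
console.log(typeof roundTripped.when);  // string: the Date comes back as an ISO string
console.log(roundTripped.pattern);      // {}: the RegExp has no enumerable own properties
```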

So an alternative solution is required.

The other option is to use the filesystem itself: multiple files, meaningful file names, and nested directories.

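Git uses this method to store objects: each object is named after its hash, and the first two hex characters of that hash become a directory name.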

<project>/.git/objects
drwxr-xr-x   5 lorem ipsum  160 Jun  6 14:38 00
drwxr-xr-x   5 lorem ipsum  160 Jun  6 18:06 01
drwxr-xr-x   6 lorem ipsum  192 Jun  6 14:46 02
drwxr-xr-x   6 lorem ipsum  192 Jun  6 14:46 03
drwxr-xr-x   6 lorem ipsum  192 Aug 22 22:27 04
drwxr-xr-x   6 lorem ipsum  192 Jun  7 14:28 05
drwxr-xr-x   3 lorem ipsum   96 May 30 17:21 06
drwxr-xr-x   7 lorem ipsum  224 Jun  6 10:13 07
drwxr-xr-x   6 lorem ipsum  192 Aug 22 22:33 08
drwxr-xr-x   5 lorem ipsum  160 Jun  7 14:28 09
drwxr-xr-x   3 lorem ipsum   96 May 31 19:05 0a
drwxr-xr-x  10 lorem ipsum  320 Aug 22 22:13 0b
drwxr-xr-x   5 lorem ipsum  160 Aug 22 22:13 0c
drwxr-xr-x   5 lorem ipsum  160 Aug 22 22:27 0d
drwxr-xr-x   6 lorem ipsum  192 Jun  4 10:13 0e
drwxr-xr-x   4 lorem ipsum  128 Jun  2 14:09 0f
drwxr-xr-x   5 lorem ipsum  160 Jun  6 12:09 10
drwxr-xr-x   6 lorem ipsum  192 Jun  7 14:28 11
drwxr-xr-x   4 lorem ipsum  128 Jun  5 22:03 12
drwxr-xr-x   4 lorem ipsum  128 Jun  6 14:46 13
drwxr-xr-x   8 lorem ipsum  256 Aug 22 22:13 14
drwxr-xr-x   4 lorem ipsum  128 Jun  2 16:24 15
  ...
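A minimal sketch of the same layout applied to our own data, assuming each value is stored as a JSON file named after the SHA-1 hash of its serialized contents and bucketed by the first two hex characters. Unlike Git, it doesn’t add an object header or compress the file, and `writeObject`/`readObject` are hypothetical names.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical content-addressed store in the style of .git/objects:
// <root>/<first 2 hex chars of hash>/<remaining 38 hex chars>
function writeObject(root, value) {
  const json = JSON.stringify(value);
  const hash = crypto.createHash("sha1").update(json).digest("hex");
  const dir = path.join(root, hash.slice(0, 2));
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, hash.slice(2)), json);
  return hash; // the hash is the only handle needed to read the value back
}

function readObject(root, hash) {
  // Only this one small file is parsed; unrelated data is never touched.
  const file = path.join(root, hash.slice(0, 2), hash.slice(2));
  return JSON.parse(fs.readFileSync(file, "utf8"));
}
```

Identical values hash to the same file, so they are stored once, and reading one value never requires parsing anything else.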

Proposal

Create a statically typed data structure that stores, modifies and retrieves persistent data.

Interestingly, nothing prevents this from working with actual Arrays and Objects, synchronously, using a JS Proxy, in a manner similar to this:

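// example.js
// PersistentArray is the proposed (hypothetical) API; [] is the initial value
// used when nothing has been stored yet.
const x = PersistentArray([]);
console.log(x);
x[0] = { a: "b" };
x.push({ a: "c" });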
$ node example.js
-> []
$ node example.js
-> [{ a: "b" }, { a: "c" }]
$ node example.js
-> [{ a: "b" }, { a: "c" }, { a: "c" }]
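A naive sketch of how `PersistentArray` could work, assuming a single JSON file (the `file` parameter and its default name are hypothetical) that is rewritten after every top-level assignment or deletion. Requiring it from `example.js` and running that script three times should produce the same sequence of states as above, printed in Node’s own formatting.

```javascript
const fs = require("node:fs");

// Hypothetical sketch: a synchronous, Proxy-backed array persisted to one
// JSON file. The whole file is rewritten after every top-level change.
function PersistentArray(initial, file = "persistent-array.json") {
  // Load what a previous run stored, or start from a copy of `initial`.
  const target = fs.existsSync(file)
    ? JSON.parse(fs.readFileSync(file, "utf8"))
    : [...initial];

  const save = () => fs.writeFileSync(file, JSON.stringify(target));

  return new Proxy(target, {
    // Covers index assignment (x[0] = ...) and the writes push() performs.
    set(obj, prop, value) {
      const ok = Reflect.set(obj, prop, value);
      save();
      return ok;
    },
    // Covers `delete x[0]`.
    deleteProperty(obj, prop) {
      const ok = Reflect.deleteProperty(obj, prop);
      save();
      return ok;
    },
  });
}

module.exports = { PersistentArray };
```

It deliberately keeps the weaknesses listed earlier: the whole file is parsed on load and rewritten on every write (`push` alone triggers two writes, one for the new index and one for `length`), and nested mutations such as `x[0].a = "z"` are not persisted, because only the top-level array is proxied.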

However, I’d probably suggest an API built on Promises and async/await; this would let the data structure perform optimizations (such as batching or deferring writes) and give it room to improve over time in a backwards-compatible way.
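A minimal sketch of the Promise-based alternative, assuming one JSON file per key inside a directory (`createAsyncStore`, `get` and `set` are hypothetical names). Because every call returns a Promise, the implementation is free to change underneath; here it queues writes on a promise chain so that concurrent `set` calls run in order instead of interleaving.

```javascript
const fsp = require("node:fs/promises");
const path = require("node:path");

// Hypothetical async store: one JSON file per key inside `dir`.
function createAsyncStore(dir) {
  let queue = Promise.resolve(); // tail of the pending-write chain
  // encodeURIComponent keeps characters like "/" out of file names.
  const fileFor = (key) => path.join(dir, `${encodeURIComponent(key)}.json`);

  return {
    async get(key, fallback = undefined) {
      try {
        return JSON.parse(await fsp.readFile(fileFor(key), "utf8"));
      } catch (err) {
        if (err.code === "ENOENT") return fallback; // nothing stored yet
        throw err;
      }
    },
    set(key, value) {
      const run = async () => {
        await fsp.mkdir(dir, { recursive: true });
        await fsp.writeFile(fileFor(key), JSON.stringify(value));
      };
      // Run after the previous write settles, whether it succeeded or failed.
      queue = queue.then(run, run);
      return queue;
    },
  };
}
```

Encoding keys with `encodeURIComponent` stops them from creating unexpected subdirectories, but it is a convenience rather than a complete guard against every invalid file name.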

Other suggestions for refinements:

  • Let the user pass in types along with runtime checks, since data on disk can be changed outside the program and static types alone can’t guarantee its shape.
  • Maybe use function calls; not because they’re necessary, but because they’d communicate to the author, linters and minifiers that the code is not static.
    • JS programmers are instinctively distrustful of function calls, assuming them to be inefficient. Effort must go into assuaging this concern; trust the process!
  • Give the user the option of a local cache on fetches (à la GraphQL clients), but don’t require it: some users may want to change data manually, or from another script, without restarting the running program.
  • Allow optional namespacing (defaulting to “default”), and reject namespace names that would make invalid file names.
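A minimal sketch that folds several of these refinements into one hypothetical `openStore` function: a `validate` function as the runtime check, an opt-in read cache (`cache: true`), and a namespace that defaults to "default" and is checked against a conservative allow-list before being used as a directory name. All names and options are assumptions, not an existing API.

```javascript
const fsp = require("node:fs/promises");
const path = require("node:path");

// Hypothetical allow-list: letters, digits, "_" and "-", 1 to 64 characters.
const SAFE_NAMESPACE = /^[A-Za-z0-9_-]{1,64}$/;

function openStore({ root, namespace = "default", validate = () => true, cache = false }) {
  // Reject names that could escape `root` or aren't usable as a directory
  // name, e.g. "..", "a/b" or an empty string.
  if (!SAFE_NAMESPACE.test(namespace)) {
    throw new Error(`Invalid namespace: ${JSON.stringify(namespace)}`);
  }
  const dir = path.join(root, namespace);
  const memo = cache ? new Map() : null; // optional read cache, off by default
  const fileFor = (key) => path.join(dir, `${encodeURIComponent(key)}.json`);

  return {
    async get(key) {
      if (memo && memo.has(key)) return memo.get(key);
      let value;
      try {
        value = JSON.parse(await fsp.readFile(fileFor(key), "utf8"));
      } catch (err) {
        if (err.code !== "ENOENT") throw err;
        return undefined; // nothing stored under this key yet
      }
      // Runtime check: the file may have been edited by hand or by another script.
      if (!validate(value)) throw new TypeError(`Invalid stored value for ${key}`);
      if (memo) memo.set(key, value);
      return value;
    },
    async set(key, value) {
      if (!validate(value)) throw new TypeError(`Invalid value for ${key}`);
      await fsp.mkdir(dir, { recursive: true });
      await fsp.writeFile(fileFor(key), JSON.stringify(value));
      if (memo) memo.set(key, value);
    },
  };
}
```

Access goes through `get`/`set` calls rather than property access, in line with the function-call suggestion. With the cache left off, edits made by hand or by another script are picked up on the next read. The allow-list rejects path separators and `..`, but it does not cover Windows reserved names such as `CON`.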