
Document version fetching #218

Closed
@alecgibson


Problem statement

As a ShareDb client, I want to fetch a read-only snapshot of my document at a given version number or timestamp.

Motivation

Since we already go to the effort of storing a set of deltas (ops) between document versions, it's only natural that a client would wish to leverage this to view a document at any point in its history.

The problem statement mentions fetching a document at a given timestamp because it is far more natural to request a document as it was at a particular time than at a given (arbitrary) version number.

API

The proposed API is to add two methods to the Doc class:

  • Doc.prototype.fetchVersion(version, callback) takes version, which is a number, and recreates the snapshot up to that version number. The result is stored in doc.data.
  • Doc.prototype.fetchAtTime(time, callback) takes time, which is a Date, and recreates the snapshot using ops whose timestamps are up to and including that Date. The result is stored in doc.data.
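As a rough illustration of the replay both methods imply, here is a minimal sketch. It assumes stored ops are sorted by version (and that timestamps increase with version), that an op with `v: n` turns the version-`n` snapshot into version `n + 1` (so "the snapshot at version N" means ops `0..N-1` applied), and that each op carries a millisecond timestamp at `m.ts`. The `counterType` type and the `replayOps`, `snapshotAtVersion` and `snapshotAtTime` functions are hypothetical names, not ShareDb APIs.

```javascript
// Hypothetical toy OT type used only for illustration: the snapshot is a
// number and each op is { add: n }.
const counterType = {
  create: (initial) => (initial === undefined ? 0 : initial),
  apply: (data, op) => data + op.add,
};

// Assumed op shape, loosely modelled on ShareDb's stored ops:
// { v, create?: { type, data }, op?, del?: true, m: { ts } }
// Ops must be sorted by v; the op with v === n turns version n into n + 1.
function replayOps(ops, type, shouldApply) {
  const snapshot = { v: 0, type: null, data: undefined };
  for (const op of ops) {
    if (!shouldApply(op)) break; // ops are sorted, so stop at the first miss
    if (op.create) {
      snapshot.type = op.create.type;
      snapshot.data = type.create(op.create.data);
    } else if (op.del) {
      snapshot.type = null;
      snapshot.data = undefined;
    } else {
      snapshot.data = type.apply(snapshot.data, op.op);
    }
    snapshot.v = op.v + 1;
  }
  return snapshot;
}

// fetchVersion(version): apply every op below the requested version.
function snapshotAtVersion(ops, type, version) {
  return replayOps(ops, type, (op) => op.v < version);
}

// fetchAtTime(time): apply every op whose timestamp is <= the given Date.
function snapshotAtTime(ops, type, time) {
  return replayOps(ops, type, (op) => op.m.ts <= time.getTime());
}

const ops = [
  { v: 0, create: { type: 'counter', data: 10 }, m: { ts: 1000 } },
  { v: 1, op: { add: 5 }, m: { ts: 2000 } },
  { v: 2, op: { add: -3 }, m: { ts: 3000 } },
];
console.log(snapshotAtVersion(ops, counterType, 2).data); // 15
console.log(snapshotAtTime(ops, counterType, new Date(3000)).data); // 12
```

Since the two methods differ only in their stopping condition, they could plausibly share a single replay loop like this one.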

Implementation details

Data flow

The request for the document version will be sent in the same way as the existing fetch function: by sending a message to the server and attaching a callback that runs when the server replies.
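The client half of this data flow boils down to remembering one callback per outstanding request and invoking it when the matching reply arrives. The sketch below shows that pattern only; the `VersionRequester` class, the `'fetchVersion'` action and the message fields (`a`, `id`, `c`, `d`, `v`) are hypothetical and do not reflect ShareDb's actual wire protocol.

```javascript
// Hypothetical client-side sketch: one stored callback per outstanding request.
// The message fields and the 'fetchVersion' action are illustrative only.
class VersionRequester {
  constructor(send) {
    this.send = send;         // delivers a message to the server
    this.nextId = 1;
    this.pending = new Map(); // request id -> callback
  }

  fetchVersion(collection, docId, version, callback) {
    const id = this.nextId++;
    this.pending.set(id, callback);
    this.send({ a: 'fetchVersion', id, c: collection, d: docId, v: version });
  }

  // Route a server reply to the callback registered for its request id.
  handleReply(message) {
    const callback = this.pending.get(message.id);
    if (!callback) return; // unknown or already-answered request
    this.pending.delete(message.id);
    if (message.error) return callback(message.error);
    callback(null, message.snapshot);
  }
}

const sent = [];
const requester = new VersionRequester((message) => sent.push(message));
requester.fetchVersion('docs', 'doc1', 2, (err, snapshot) => {
  console.log(err, snapshot.data); // null 15
});
// Simulate the server's reply to the request that was just sent.
requester.handleReply({ id: sent[0].id, snapshot: { v: 2, data: 15 } });
```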

The message will be picked up by Agent._handleMessage, which will then leverage Backend.getOps to fetch the requested ops.
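On the server, the new message type would need a branch in the agent's message handling that loads the ops and replays them. A hypothetical sketch follows, with an in-memory stub standing in for Backend.getOps. The stub's simplified `(agent, collection, id, from, to, callback)` signature, its exclusive `to` bound, the `'fetchVersion'` action and the counter-type replay are all assumptions for illustration.

```javascript
// Hypothetical server-side sketch. The 'fetchVersion' action, message fields
// and the simplified getOps(agent, collection, id, from, to, callback)
// signature are assumptions for illustration.
function handleFetchVersion(agent, backend, message, applyOps, reply) {
  // Fetch every op from version 0 up to (not including) the requested version.
  backend.getOps(agent, message.c, message.d, 0, message.v, (err, ops) => {
    if (err) return reply({ id: message.id, error: err.message });
    reply({ id: message.id, snapshot: applyOps(ops) });
  });
}

// Dispatch in the style of an agent's message handler.
function handleMessage(agent, backend, message, applyOps, reply) {
  switch (message.a) {
    case 'fetchVersion':
      return handleFetchVersion(agent, backend, message, applyOps, reply);
    default:
      return reply({ id: message.id, error: 'Unknown action: ' + message.a });
  }
}

// In-memory stub backend; `to` is treated as exclusive.
const storedOps = [
  { v: 0, create: { data: 10 } },
  { v: 1, op: { add: 5 } },
  { v: 2, op: { add: -3 } },
];
const stubBackend = {
  getOps(agent, collection, id, from, to, callback) {
    callback(null, storedOps.filter((op) => op.v >= from && op.v < to));
  },
};

// Replay for a hypothetical counter type: create sets the data, ops add to it.
function applyCounterOps(ops) {
  let data;
  for (const op of ops) data = op.create ? op.create.data : data + op.op.add;
  return { v: ops.length ? ops[ops.length - 1].v + 1 : 0, data };
}

handleMessage({}, stubBackend, { a: 'fetchVersion', id: 7, c: 'docs', d: 'doc1', v: 2 },
  applyCounterOps, (res) => console.log(res.snapshot.data)); // 15
```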

We may need to make a small change to Backend.getOps so that we can request op metadata (for example, the op timestamps that fetchAtTime relies on) from the backend using the options object. As discussed in this Pull Request, this will be done in a way that keeps the options object out of the public API (probably by creating an internal Backend._getOps method that takes an options object, and calling it with null from Backend.getOps).
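The public/internal split described above might look something like this. It is a sketch under assumptions: the in-memory `Backend` class, the simplified signatures, and the `metadata` option that controls whether each op's `m` field is kept are all illustrative, not ShareDb's real implementation.

```javascript
// Sketch: keep an options argument out of the public getOps signature.
// Class shape, signatures and the `metadata` option are illustrative assumptions.
class Backend {
  constructor(opsByDoc) {
    this.opsByDoc = opsByDoc; // in-memory stand-in for the database adapter
  }

  // Public API: signature unchanged, no options argument.
  getOps(agent, collection, id, from, to, callback) {
    this._getOps(agent, collection, id, from, to, null, callback);
  }

  // Internal variant: options.metadata keeps each op's `m` field.
  _getOps(agent, collection, id, from, to, options, callback) {
    const stored = this.opsByDoc[collection + '/' + id] || [];
    const ops = stored
      .filter((op) => op.v >= from && (to == null || op.v < to))
      .map((op) => {
        if (options && options.metadata) return op;
        const { m, ...withoutMetadata } = op; // drop metadata by default
        return withoutMetadata;
      });
    callback(null, ops);
  }
}

const backend = new Backend({
  'docs/doc1': [
    { v: 0, create: { data: 10 }, m: { ts: 1000 } },
    { v: 1, op: { add: 5 }, m: { ts: 2000 } },
  ],
});
backend.getOps(null, 'docs', 'doc1', 0, null, (err, ops) => console.log(ops[1].m)); // undefined
backend._getOps(null, 'docs', 'doc1', 0, null, { metadata: true }, (err, ops) => console.log(ops[1].m.ts)); // 2000
```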

Discussion

Read-only snapshots

Using the Doc class is potentially a leaky abstraction, given that Doc also has Doc.prototype.submitOp, which doesn't make sense for a historical document.

A possible alternative could be to expose these functions on the Connection class instead. That way, it should be very clear that the consumer is receiving a read-only snapshot, and not a full-blown Doc.
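To illustrate the Connection-based alternative, the sketch below hands back a frozen plain object instead of a Doc, so there is simply no `submitOp` to call. `SnapshotConnection`, `fetchSnapshot` and the injected `fetchOps`/`replay` functions are hypothetical stand-ins, not ShareDb APIs.

```javascript
// Hypothetical Connection-level API returning a read-only snapshot instead of a Doc.
// `fetchOps(collection, id, version, callback)` and `replay(ops)` are injected
// stand-ins for the network request and the type-specific replay.
class SnapshotConnection {
  constructor(fetchOps, replay) {
    this.fetchOps = fetchOps;
    this.replay = replay;
  }

  fetchSnapshot(collection, id, version, callback) {
    this.fetchOps(collection, id, version, (err, ops) => {
      if (err) return callback(err);
      const { v, type, data } = this.replay(ops);
      // Object.freeze is shallow; nested data would need a deep freeze or copy.
      callback(null, Object.freeze({ id, v, type, data }));
    });
  }
}

const opLog = [
  { v: 0, create: { type: 'counter', data: 10 } },
  { v: 1, op: { add: 5 } },
];
const connection = new SnapshotConnection(
  // Stub "server": return the ops below the requested version.
  (collection, id, version, cb) => cb(null, opLog.filter((op) => op.v < version)),
  // Hypothetical counter-type replay.
  (ops) => ({
    v: ops.length,
    type: ops.length ? ops[0].create.type : null,
    data: ops.reduce((d, op) => (op.create ? op.create.data : d + op.op.add), undefined),
  })
);

connection.fetchSnapshot('docs', 'doc1', 2, (err, snapshot) => {
  console.log(snapshot.data, Object.isFrozen(snapshot)); // 15 true
});
```

A plain snapshot object also keeps the return type honest: consumers get `id`, `v`, `type` and `data`, and nothing that implies the document can be edited.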

Out of scope

The following possibilities are deemed out-of-scope for the initial solution.

Optimising for reversible types

Making a type reversible is optional. As such, any solution must at least be able to construct a document from its initial version and build up from there by applying ops. However, it is highly likely that users will wish to return to recent versions, where it will probably be faster to start from the current version and work backwards by applying inverted ops.

This is deemed out-of-scope.
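Although out of scope, the trade-off can be illustrated briefly. This sketch picks whichever direction replays fewer ops, and only walks backwards when the type provides `invert`. It assumes a hypothetical invertible counter type, a complete op list where `ops[i].v === i` with `ops[0]` being the create op, and a current snapshot at version `ops.length`.

```javascript
// Hypothetical invertible counter type: { add: n } is undone by { add: -n }.
const counterType = {
  apply: (data, op) => data + op.add,
  invert: (op) => ({ add: -op.add }),
};

// Rebuild the snapshot at `target`, either forwards from the create op or
// backwards from the current snapshot, whichever replays fewer ops.
// Assumes ops[i].v === i, ops[0] is the create op, current.v === ops.length.
function snapshotAt(ops, current, target, type) {
  const forwardCost = target;
  const backwardCost = current.v - target;
  if (type.invert && backwardCost < forwardCost) {
    let undone = current.data;
    // Undo ops newest-first; target >= 1 here, so the create op is never undone.
    for (let v = current.v - 1; v >= target; v--) {
      undone = type.apply(undone, type.invert(ops[v].op));
    }
    return { v: target, data: undone, direction: 'backward' };
  }
  let built;
  for (let v = 0; v < target; v++) {
    built = ops[v].create ? ops[v].create.data : type.apply(built, ops[v].op);
  }
  return { v: target, data: built, direction: 'forward' };
}

const ops = [
  { v: 0, create: { data: 10 } },
  { v: 1, op: { add: 5 } },
  { v: 2, op: { add: -3 } },
  { v: 3, op: { add: 4 } },
];
const current = { v: 4, data: 16 };
console.log(snapshotAt(ops, current, 3, counterType)); // { v: 3, data: 12, direction: 'backward' }
```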

Caching ops

Fetching ops can be expensive. Ideally, we would cache the ops for a given document and, so long as the requested version or timestamp is lower than the latest op, simply read the ops back from the cache.

This is deemed out-of-scope.
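For illustration only, a per-document op cache could be as simple as the sketch below: requests within the cached range are served from memory, and only the missing tail is loaded. `OpCache` and the injected `loadOps(id, from, callback)` loader are hypothetical; the cache never evicts and assumes a document's ops are contiguous from version 0.

```javascript
// Illustrative in-memory op cache (hypothetical; no eviction).
// `loadOps(id, from, callback)` stands in for a database call returning the
// ops with v >= from, sorted by version.
class OpCache {
  constructor(loadOps) {
    this.loadOps = loadOps;
    this.cache = new Map(); // doc id -> array of ops, where ops[i].v === i
  }

  // Return ops [0, to). Only ops beyond what is already cached are loaded.
  getOps(id, to, callback) {
    const cached = this.cache.get(id) || [];
    if (to <= cached.length) return callback(null, cached.slice(0, to));
    this.loadOps(id, cached.length, (err, newOps) => {
      if (err) return callback(err);
      const all = cached.concat(newOps);
      this.cache.set(id, all);
      callback(null, all.slice(0, to));
    });
  }
}

let loads = 0;
const opLog = [
  { v: 0, create: { data: 10 } },
  { v: 1, op: { add: 5 } },
  { v: 2, op: { add: -3 } },
];
const cache = new OpCache((id, from, cb) => {
  loads++;
  cb(null, opLog.filter((op) => op.v >= from));
});
cache.getOps('doc1', 2, () => {});
cache.getOps('doc1', 3, (err, ops) => console.log(ops.length, loads)); // 3 1
```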

Other starting snapshot optimisations

It could be possible to fetch the latest create op and start building from there, instead of from the very beginning. It might also (theoretically) be possible to store intermediate snapshots of the document for faster reconstruction, at the cost of extra storage space.

These sorts of optimisations are also deemed out-of-scope.
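The intermediate-snapshot idea can be sketched as checkpoints stored every `interval` versions, with replay starting from the nearest checkpoint at or below the target version. Everything here (the counter ops, `buildCheckpoints`, `snapshotFromCheckpoint`, the in-memory storage) is hypothetical; the returned `applied` count shows how many ops were actually replayed.

```javascript
// Hypothetical counter-type replay step: a create op sets the data, other ops add.
const applyOp = (data, op) => (op.create ? op.create.data : data + op.op.add);

// Store the document data after every `interval`-th version (plus version 0).
function buildCheckpoints(ops, interval) {
  const checkpoints = new Map([[0, undefined]]); // version -> data
  let data;
  for (const op of ops) {
    data = applyOp(data, op);
    const version = op.v + 1;
    if (version % interval === 0) checkpoints.set(version, data);
  }
  return checkpoints;
}

// Start from the nearest checkpoint at or below `target` and replay the rest.
// Assumes ops[i].v === i and 0 <= target <= ops.length.
function snapshotFromCheckpoint(ops, checkpoints, interval, target) {
  const start = Math.floor(target / interval) * interval;
  let data = checkpoints.get(start);
  let applied = 0;
  for (let v = start; v < target; v++) {
    data = applyOp(data, ops[v]);
    applied++;
  }
  return { v: target, data, applied };
}

const ops = [
  { v: 0, create: { data: 10 } },
  { v: 1, op: { add: 5 } },
  { v: 2, op: { add: -3 } },
  { v: 3, op: { add: 4 } },
  { v: 4, op: { add: 1 } },
];
const checkpoints = buildCheckpoints(ops, 2);
console.log(snapshotFromCheckpoint(ops, checkpoints, 2, 5)); // { v: 5, data: 17, applied: 1 }
```

A smaller `interval` means fewer ops to replay per request but more snapshots to store, which is exactly the space trade-off mentioned above.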

Doing the work

Given that we need this functionality, I'm happy to undertake the majority of the work on this. However, I haven't developed in this codebase before, so I may need some assistance (especially because I haven't really worked with all of ShareDb's features, such as projections).
