
Implement an index-based approach #1


Closed
killercup opened this issue Oct 20, 2018 · 1 comment

killercup (Owner) commented Oct 20, 2018

The goal of this crate is to store a lot of small (text) files in a compressed archive and serve them over HTTP without re-compressing them. Currently, we use bincode to serialize a HashMap<FilePath, CompressedFileContent> to a file. This means that reading the archive also requires bincode, and deserializing it currently means reading the whole archive into memory.

This works for a prototype (which is what this crate currently is), but to support larger archives it makes sense to use a different file structure. My initial idea is an index that gets loaded into memory and maps each file path to a slice (offset + length) in a separate "content" file. The content file can be mmap'd for reading. Should we ever want to add data to the archive, we can append to the content file and write a new index.
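A minimal std-only sketch of the index + content-file idea, assuming a `Vec<u8>` as a stand-in for the mmap'd content file and a `HashMap` for the in-memory index. The names `Archive`, `Slice`, `append`, and `get` are hypothetical, not the crate's API. Appending only ever writes past the current end, so slices recorded earlier stay valid.

```rust
use std::collections::HashMap;

/// Where one file's compressed bytes live inside the content file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Slice {
    offset: u64,
    len: u64,
}

/// Hypothetical archive layout: an in-memory index plus an append-only content buffer.
/// The `Vec<u8>` stands in for the content file, which the proposal would mmap instead.
struct Archive {
    index: HashMap<String, Slice>,
    content: Vec<u8>,
}

impl Archive {
    fn new() -> Self {
        Archive { index: HashMap::new(), content: Vec::new() }
    }

    /// Appends already-compressed bytes to the end of the content and records their slice.
    /// Existing bytes are never moved, so earlier slices stay valid.
    fn append(&mut self, path: &str, compressed: &[u8]) -> Slice {
        let slice = Slice {
            offset: self.content.len() as u64,
            len: compressed.len() as u64,
        };
        self.content.extend_from_slice(compressed);
        self.index.insert(path.to_string(), slice);
        slice
    }

    /// Returns a borrowed view of the stored (still compressed) bytes, ready to send as-is.
    fn get(&self, path: &str) -> Option<&[u8]> {
        let slice = self.index.get(path)?;
        let start = slice.offset as usize;
        let end = start + slice.len as usize;
        self.content.get(start..end)
    }
}
```

In the real design the index would then be written to its own file, and serving a request becomes one index lookup plus a borrowed slice of the mapped content.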

The fst crate builds and stores index maps very efficiently by using, as the name suggests, finite state transducers. However, it has some limitations: values can currently only be a u64, and key-value pairs have to be added in lexicographic order.
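To make the two fst constraints concrete, here is a small std-only stand-in (not the fst crate itself): `SortedMapBuilder` and `OutOfOrder` are hypothetical names. It accepts only u64 values and rejects any key that is not strictly greater, in byte-wise lexicographic order, than the previous one, which mirrors the ordering the fst map builder expects.

```rust
/// Error for a key that would break the required insertion order.
#[derive(Debug, PartialEq, Eq)]
struct OutOfOrder {
    previous: Vec<u8>,
    got: Vec<u8>,
}

/// Hypothetical stand-in illustrating the constraints only: values are a single u64,
/// and keys must arrive in strictly increasing lexicographic (byte) order.
struct SortedMapBuilder {
    entries: Vec<(Vec<u8>, u64)>,
}

impl SortedMapBuilder {
    fn new() -> Self {
        SortedMapBuilder { entries: Vec::new() }
    }

    fn insert(&mut self, key: &[u8], value: u64) -> Result<(), OutOfOrder> {
        if let Some((last, _)) = self.entries.last() {
            // Comparing byte slices is lexicographic; equal keys are rejected too.
            if key <= last.as_slice() {
                return Err(OutOfOrder { previous: last.clone(), got: key.to_vec() });
            }
        }
        self.entries.push((key.to_vec(), value));
        Ok(())
    }

    /// Because entries are sorted, lookups can use binary search.
    fn get(&self, key: &[u8]) -> Option<u64> {
        self.entries
            .binary_search_by(|(k, _)| k.as_slice().cmp(key))
            .ok()
            .map(|i| self.entries[i].1)
    }
}
```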

Luckily, we can make this work with the design described above. Our index is an fst::Map that maps each file path to (offset as u32, len as u32), with both packed into a single u64 using bit shifts. It is built by first having the directory walker we currently use collect all paths into a Vec, sorting it, and then adding entries to the index while compressing each file and writing its contents to the content file.
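A sketch of the packing and the build order, assuming the offset goes in the high 32 bits and the length in the low 32 bits (the issue only says "bit shifts"). To stay std-only, a sorted `Vec<(String, u64)>` stands in for the fst::Map; `build`, `lookup`, and the `compress` callback are hypothetical placeholders for the directory walker and the crate's compression. Using u32 for both halves caps the content file and each compressed file at 4 GiB, which the sketch checks explicitly.

```rust
use std::convert::TryFrom;

/// Packs (offset, len) into one u64: offset in the high 32 bits, len in the low 32 bits.
fn pack(offset: u32, len: u32) -> u64 {
    (u64::from(offset) << 32) | u64::from(len)
}

/// Inverse of `pack`.
fn unpack(value: u64) -> (u32, u32) {
    ((value >> 32) as u32, value as u32)
}

/// Hypothetical build step. `files` stands in for what the directory walker collects, and
/// `compress` for the crate's compression. The sorted Vec stands in for an fst::Map.
fn build(
    mut files: Vec<(String, Vec<u8>)>,
    compress: impl Fn(&[u8]) -> Vec<u8>,
) -> (Vec<(String, u64)>, Vec<u8>) {
    // String ordering is byte-wise lexicographic, the order fst requires for insertion.
    files.sort_by(|a, b| a.0.cmp(&b.0));

    let mut index = Vec::with_capacity(files.len());
    let mut content = Vec::new();
    for (path, data) in files {
        let compressed = compress(&data[..]);
        // Both halves must fit in u32, which caps offsets and lengths at 4 GiB.
        let offset = u32::try_from(content.len()).expect("content file larger than 4 GiB");
        let len = u32::try_from(compressed.len()).expect("compressed file larger than 4 GiB");
        content.extend_from_slice(&compressed);
        index.push((path, pack(offset, len)));
    }
    (index, content)
}

/// Looks a path up in the sorted index and slices its bytes out of the content.
fn lookup<'a>(index: &[(String, u64)], content: &'a [u8], path: &str) -> Option<&'a [u8]> {
    let i = index.binary_search_by(|(key, _)| key.as_str().cmp(path)).ok()?;
    let (offset, len) = unpack(index[i].1);
    let start = offset as usize;
    content.get(start..start + len as usize)
}
```

Sorting once up front means each entry can be inserted into the index in the same pass that writes its bytes, so the content file and the index are produced together.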

killercup (Owner, Author) commented

Implemented.

This was referenced Oct 21, 2018