The goal of this crate is to store a lot of small (text) files in a compressed archive and serve them over HTTP (without re-compressing them). Currently, we use bincode to serialize a HashMap<FilePath, CompressedFileContent> to a file. This means that we also have to use bincode to deserialize the file, and for now that requires reading the whole archive into memory.
While this works for a prototype (which is what this crate currently is), larger archives call for a different file structure. My initial idea is to have an index that gets loaded into memory and maps from a file path to a slice (offset + length) in a separate "content" file. The content file can be mmap'd for reading. Should we ever want to add data to the archive, we can append to the content file and write a new index.
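To make the index-plus-content idea concrete, here is a minimal sketch using only the standard library. The `Archive` type and its methods are hypothetical names for illustration; a plain `HashMap` stands in for the real index and a `Vec<u8>` stands in for the mmap'd content file.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of the proposed layout:
/// an index of (offset, length) slices into one content buffer.
struct Archive {
    /// Maps a file path to (offset, length) within `content`.
    index: HashMap<String, (usize, usize)>,
    /// Stand-in for the mmap'd content file.
    content: Vec<u8>,
}

impl Archive {
    fn new() -> Self {
        Archive { index: HashMap::new(), content: Vec::new() }
    }

    /// Append (already compressed) bytes to the content buffer and
    /// record where they live. Existing slices are never moved.
    fn append(&mut self, path: &str, bytes: &[u8]) {
        let offset = self.content.len();
        self.content.extend_from_slice(bytes);
        self.index.insert(path.to_string(), (offset, bytes.len()));
    }

    /// Look up a path and return its slice without copying.
    fn get(&self, path: &str) -> Option<&[u8]> {
        let &(offset, len) = self.index.get(path)?;
        self.content.get(offset..offset + len)
    }
}
```

Because `get` only hands out a borrowed slice, the stored bytes could be served as-is, which matches the goal of not re-compressing on the way out.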
The fst crate supports building and storing index maps in a very efficient way by using, as the name suggests, finite state transducers. However, it has some limitations: it can currently only carry a u64 as value, and the key-value pairs have to be added in lexicographical order.
Luckily, we can make this work with the design described above: our index is a fst::Map that maps from file path to (offset as u32, len as u32) (using bit shifts to store both in a u64). It is built by first having the directory walker we currently use put all paths in a Vec, sorting it, and then adding entries to the index while compressing and writing file contents to another file.
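The packing and build order could look roughly like the sketch below. It uses only the standard library: `pack`, `unpack`, and `build_index` are hypothetical helper names, the input bytes are assumed to be compressed already, and a sorted `Vec<(String, u64)>` stands in for the fst. With the real crate, each pair would be passed to `fst::MapBuilder::insert` in this same order.

```rust
use std::convert::TryFrom;

/// Pack offset (high 32 bits) and length (low 32 bits) into one u64.
fn pack(offset: u32, len: u32) -> u64 {
    (u64::from(offset) << 32) | u64::from(len)
}

/// Inverse of `pack`: returns (offset, len).
fn unpack(value: u64) -> (u32, u32) {
    ((value >> 32) as u32, value as u32)
}

/// Sort the paths, then write contents back to back while recording
/// a packed (offset, len) value per path.
/// Returns None if an offset or length does not fit in a u32.
fn build_index(
    mut files: Vec<(String, Vec<u8>)>,
) -> Option<(Vec<(String, u64)>, Vec<u8>)> {
    // Byte-wise lexicographic order, which is what the fst builder expects.
    files.sort_by(|a, b| a.0.as_bytes().cmp(b.0.as_bytes()));

    let mut content = Vec::new();
    let mut index = Vec::with_capacity(files.len());
    for (path, bytes) in files {
        let offset = u32::try_from(content.len()).ok()?;
        let len = u32::try_from(bytes.len()).ok()?;
        content.extend_from_slice(&bytes);
        index.push((path, pack(offset, len)));
    }
    Some((index, content))
}
```

One consequence of the u32 offset worth keeping in mind: the content file cannot grow past 4 GiB with this encoding, so a different bit split would be needed for bigger archives.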