Renewal of #43223
In the discussion of io/fs and embed, a few people asked for automatic serving of ETag headers for static content, using content hashes.
Here is a proposal which tries to address the concerns raised in #43223.
Accepted proposal
In io/fs, define
// HashFileInfo provides hashes of the file content in constant time.
type HashFileInfo interface {
	FileInfo
	// Hash returns content hashes of the file that uniquely
	// identify the file contents.
	//
	// Hash must NOT compute any hash of the file during the call.
	// That is, it must run in time O(1) not O(length of file).
	// If no content hash is already available, Hash should
	// return nil rather than take the time to compute one.
	//
	// The order of the returned hashes must be constant (preferred hashes first).
	Hash() []Hash
}

// Hash indicates the hash of a given content.
type Hash struct {
	// Algorithm indicates the algorithm used. Implementations are encouraged
	// to use a package-like name for interoperability with other systems
	// (lowercase, without dash: e.g. sha256, sha1, crc32).
	Algorithm string
	// Sum is the result of the hash; it should not be modified by the caller.
	Sum []byte
}
Then, in net/http, serveFile calls Stat, and if the result implements HashFileInfo, it calls info.Hash. If that returns one or more hashes, serveFile uses the first one as the ETag header, formatting it as Algorithm+":"+base64(Sum).
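The formatting step can be sketched as follows. This is only an illustration under stated assumptions: the Hash and HashFileInfo types are redeclared locally rather than taken from io/fs, the helper name etagFor and the fakeInfo type are hypothetical, and the choice of base64.StdEncoding and of a quoted strong ETag is a guess, since the text above only says base64(Sum).

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io/fs"
)

// Hash and HashFileInfo mirror the proposed types. They are declared
// locally because this sketch does not assume they exist in io/fs.
type Hash struct {
	Algorithm string
	Sum       []byte
}

type HashFileInfo interface {
	fs.FileInfo
	Hash() []Hash
}

// etagFor (hypothetical helper) returns a quoted ETag value built from the
// first hash, or "" when info carries no precomputed hash.
// Assumption: standard padded base64 and a strong (unprefixed) ETag.
func etagFor(info fs.FileInfo) string {
	hfi, ok := info.(HashFileInfo)
	if !ok {
		return ""
	}
	hashes := hfi.Hash()
	if len(hashes) == 0 {
		return ""
	}
	h := hashes[0]
	return `"` + h.Algorithm + ":" + base64.StdEncoding.EncodeToString(h.Sum) + `"`
}

// fakeInfo is a test double: it embeds a nil fs.FileInfo and only
// implements Hash, which is all etagFor touches.
type fakeInfo struct {
	fs.FileInfo
	hashes []Hash
}

func (f fakeInfo) Hash() []Hash { return f.hashes }

func main() {
	info := fakeInfo{hashes: []Hash{{Algorithm: "sha256", Sum: []byte{0, 1, 2}}}}
	fmt.Println(etagFor(info))
}
```

A caller would set the header only when the returned string is non-empty, so files without a precomputed hash keep the existing behavior.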
In package embed, the file type would add a Hash method and an assertion that it implements HashFileInfo. It would return a single hash with Algorithm “sha256”.
Original proposal (fs.File)
First, in io/fs, define
// A ContentHashFile is a file that can return hashes of its content in constant time.
type ContentHashFile interface {
	fs.File
	// ContentHash returns content hashes of the file that uniquely
	// identify the file contents.
	// The returned hashes should be of the form algorithm-base64.
	// Implementations are encouraged to use sha256, sha384, or sha512
	// as the algorithms and RawStdEncoding as the base64 encoding,
	// for interoperability with other systems (e.g. Subresource Integrity).
	//
	// ContentHash must NOT compute any hash of the file during the call.
	// That is, it must run in time O(1) not O(length of file).
	// If no content hash is already available, ContentHash should
	// return nil rather than take the time to compute one.
	ContentHash() []string
}
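As an illustration of the algorithm-base64 form described in the comment, a string could be built as in the sketch below. The helper name formatContentHash is hypothetical; the digest is assumed to have been computed ahead of time, outside of ContentHash.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// formatContentHash (hypothetical helper) renders a precomputed digest in
// the algorithm-base64 form, using RawStdEncoding (no '=' padding).
func formatContentHash(algorithm string, sum []byte) string {
	return algorithm + "-" + base64.RawStdEncoding.EncodeToString(sum)
}

func main() {
	// In a real implementation the digest would be computed when the file
	// is built or loaded, not inside ContentHash.
	sum := sha256.Sum256([]byte("hello"))
	fmt.Println(formatContentHash("sha256", sum[:]))
}
```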
Second, in net/http, when serving a File (in serveFile, right before serveContent for instance), if it implements ContentHashFile and the ContentHash method returns a hash that is alphanumeric (no spaces, no Unicode, no symbols, to avoid any kind of header problems), use that hash as the default ETag.
func setEtag(w http.ResponseWriter, file File) {
	if ch, ok := file.(fs.ContentHashFile); ok {
		if w.Header().Get("Etag") != "" {
			return
		}
		for _, h := range ch.ContentHash() {
			// TODO: skip the hash if unsuitable (space, unicode, symbol)
			// TODO: should the etag be weak or strong?
			w.Header().Set("Etag", `W/"`+h+`"`)
			break
		}
	}
}
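The first TODO could be filled in roughly as follows. The text above says "alphanumeric", but the suggested algorithm-base64 form contains '-', and standard base64 can emit '+' and '/', so this sketch assumes a slightly wider allowlist. The helper names isSafeETagValue and firstSafeHash, and the exact set of accepted symbols, are assumptions and not part of the proposal.

```go
package main

import "fmt"

// isSafeETagValue (hypothetical helper) reports whether s is non-empty and
// consists only of ASCII letters, digits, and a few symbols that appear in
// algorithm-base64 strings. Spaces, quotes, and non-ASCII bytes are rejected.
func isSafeETagValue(s string) bool {
	if s == "" {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z', '0' <= c && c <= '9':
			// letters and digits are always accepted
		case c == '-', c == '+', c == '/', c == '=', c == '_':
			// assumed allowlist for base64 and the algorithm separator
		default:
			return false
		}
	}
	return true
}

// firstSafeHash returns the first suitable hash, preserving the
// "preferred hashes first" ordering of the input.
func firstSafeHash(hashes []string) (string, bool) {
	for _, h := range hashes {
		if isSafeETagValue(h) {
			return h, true
		}
	}
	return "", false
}

func main() {
	h, ok := firstSafeHash([]string{"bad value", "sha256-AAEC"})
	fmt.Println(h, ok)
}
```

With such a helper, the loop in setEtag would set the header only for the hash returned by firstSafeHash and leave it unset when none qualifies.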
Third, add the ContentHash method on http.ioFile file (as a proxy to the fs.File ContentHash method).
Fourth (probably out of scope for this proposal), add the ContentHash method on embed.FS files.
This proposal fixes the following objections:
The API as proposed does not let the caller request a particular implementation.
The caller will simply get all available implementations and can filter them out.
The API as proposed does not let the implementation say which hash it used.
The implementers are encouraged to indicate the algorithm used for each hash.
The API as proposed does not let the implementation return multiple hashes.
This one does.
What is expected to happen if ContentHash returns an error before transport?
This implementation cannot return an error (the implementer may choose to panic, although returning nil seems better suited).
Drop this proposal and let third-party code fill this need.
It is currently very cumbersome, since the middleware would need to open the file as well (which means having the exact same logic regarding URL cleanup as the http.FileServer). Here is my attempt: https://git.sr.ht/~oliverpool/exp/tree/main/item/httpetag/fileserver.go (even uglier, since I have to use reflect to retrieve the underlying fs.File from the http.File).
Could a "github-collaborator" post a message in #43223 to notify the people who engaged in the previous proposal of this updated proposal?