Switch from html5ever to lol_html #876

Closed
jyn514 opened this issue Jul 2, 2020 · 4 comments · Fixed by #930
Labels
A-backend Area: Webserver backend E-medium Effort: This requires a fair amount of work P-medium Medium priority

Comments

@jyn514
Member

jyn514 commented Jul 2, 2020

See kuchiki-rs/kuchiki#74 (comment) for discussion. In a nutshell, kuchiki is not intended for low-memory usage: it uses lots of Rc and RefCell (one for each node in the tree!). LOL HTML, on the other hand, is intended for exactly our use case:

Low Output Latency streaming HTML rewriter/parser with CSS-selector based API.

This would allow us to step the size of files rendered way up, possibly removing the limit altogether (#834).
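
To make the memory argument concrete, here is a minimal std-only sketch. It is not kuchiki or lol_html code: `Node`, `build_tree`, `count_nodes` and `count_open_tags` are hypothetical names made up for illustration, and the "tag scanner" is a toy that only looks for `<` followed by a letter. The first half keeps one `Rc<RefCell<..>>` allocation alive per node, so memory grows with the size of the document; the second half walks the input chunk by chunk and keeps only a counter and one flag between chunks.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical DOM-style node: every node lives behind its own Rc<RefCell<..>>,
// so the whole tree stays in memory until the root is dropped.
struct Node {
    children: Vec<Rc<RefCell<Node>>>,
}

// Build a full tree with `fanout` children per node and `depth` levels below the root.
fn build_tree(depth: usize, fanout: usize) -> Rc<RefCell<Node>> {
    let node = Rc::new(RefCell::new(Node { children: Vec::new() }));
    if depth > 0 {
        for _ in 0..fanout {
            let child = build_tree(depth - 1, fanout);
            node.borrow_mut().children.push(child);
        }
    }
    node
}

// Count nodes, which is also the number of Rc<RefCell<Node>> allocations made.
fn count_nodes(node: &Rc<RefCell<Node>>) -> usize {
    1 + node.borrow().children.iter().map(count_nodes).sum::<usize>()
}

// Toy streaming scanner: counts opening tags across chunks while retaining
// only a counter and a one-bit "previous byte was '<'" flag.
fn count_open_tags<'a, I>(chunks: I) -> usize
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut count = 0;
    let mut pending_lt = false;
    for chunk in chunks {
        for &b in chunk {
            if pending_lt {
                pending_lt = false;
                // "<x" opens a tag; "</", "<!" and similar do not.
                if b.is_ascii_alphabetic() {
                    count += 1;
                }
            }
            if b == b'<' {
                pending_lt = true;
            }
        }
    }
    count
}
```

For example, `build_tree(2, 3)` allocates 13 nodes that all stay alive together, while `count_open_tags` handles a tag split across two chunks without buffering either chunk. A real streaming rewriter does far more than this, but the shape of the trade-off is the same.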

@jyn514 jyn514 added E-easy Effort: Should be easy to implement and would make a good first PR P-medium Medium priority labels Jul 2, 2020
@jyn514
Member Author

jyn514 commented Jul 2, 2020

LOL HTML is also developed by Cloudflare, so it has seen a lot of real-world usage.

@jyn514 jyn514 added the A-frontend Area: Web frontend label Jul 2, 2020
@Kixiron
Member

Kixiron commented Jul 2, 2020

I think having a limit, even if it's absurdly high and will "never" be hit, is a good thing, just as a safeguard.

Edit: LOL HTML has MemorySettings, which lets us specify the maximum memory used for parsing as well as the size of the preallocated parsing buffer.

@Kixiron
Member

Kixiron commented Jul 2, 2020

And according to Cloudflare's blog post, LOL HTML is vastly faster than html5ever and scales much better. I believe our performance will be somewhere in the ballpark of the tag scanner, and potentially even better than "normal" parsing, since we only grab two portions of the HTML and LOL HTML doesn't parse innards if you specify a tag, while html5ever parses everything unconditionally.

I'm working on benchmarks, so stand by for those.

@Kixiron
Member

Kixiron commented Jul 3, 2020

Unfortunately, I don't think this can be implemented until lol-html/#40 is upstreamed.

@Kixiron Kixiron added the S-blocked Status: marked as blocked ❌ on something else such as an RFC or other implementation work. label Jul 3, 2020
@jyn514 jyn514 added E-medium Effort: This requires a fair amount of work and removed E-easy Effort: Should be easy to implement and would make a good first PR labels Jul 6, 2020
@jyn514 jyn514 added A-backend Area: Webserver backend and removed A-frontend Area: Web frontend labels Jul 14, 2020
@jyn514 jyn514 removed the S-blocked Status: marked as blocked ❌ on something else such as an RFC or other implementation work. label Aug 2, 2020
2 participants