-
Notifications
You must be signed in to change notification settings - Fork 645
Record readme filename; use to determine whether to render as Markdown or not #1134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Record readme filename; use to determine whether to render as Markdown or not #1134
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this PR @kivikakk !
I just have a few remarks, otherwise this seems great to me :)
src/render.rs
Outdated
@@ -141,11 +142,31 @@ impl<'a> Default for MarkdownRenderer<'a> { | |||
/// let text = "[Rust](https://rust-lang.org/) is an awesome *systems programming* language!"; | |||
/// let rendered = markdown_to_html(text)?; | |||
/// ``` | |||
pub fn markdown_to_html(text: &str) -> CargoResult<String> { | |||
fn markdown_to_html(text: &str) -> CargoResult<String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we drop the pub
modifier on markdown_to_html
, we probably want to drop it for struct MarkdownRenderer
too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
src/render.rs
Outdated
let renderer = MarkdownRenderer::new(); | ||
renderer.to_html(text) | ||
} | ||
|
||
static MARKDOWN_EXTENSIONS: [&'static str; 7] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this is not public & pretty self-explanatory, we might want to add a small doc comment to this constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
src/render.rs
Outdated
".mkdown", | ||
]; | ||
|
||
pub fn readme_to_html(text: &str, filename: &str) -> CargoResult<String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a doc comment on this function ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, forgot!
src/render.rs
Outdated
} | ||
} | ||
|
||
Ok(encode_minimal(text).replace("\n", "<br>\n")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we pulling htmlescape
as a new dependency ? Isn't ammonia
already doing a similar job ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
htmlescape
is replacing characters with their entities in order to basically turn text into HTML that represents/renders as the same text; ammonia
is parsing HTML itself and then sanitizing that DOM. They're pretty different (and unfortunately, I looked and neither ammonia
nor any of its dependencies, like html5ever
, can do the job of htmlescape
simply.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine :)
@@ -65,6 +65,7 @@ pub struct Crate { | |||
pub homepage: Option<String>, | |||
pub documentation: Option<String>, | |||
pub readme: Option<String>, | |||
pub readme_file: Option<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that we're now storing the README filename alongside its content, though it doesn't seem to be used anywhere ? I'm not opposed to storing it but I'm not sure to see how useful it is to store it in the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, we don't strictly need to store it. I was thinking we might need it in render-readmes
, but that pulls the README filename from the manifest directly. However, I feel like storing this as well might be important for things like #1025/#1019?
A more general argument is that, if there's no need to store the README filename in the database, then there's no need to store the README content in the database, because we can't properly make sense of the latter without the former.
That's moreorless the motivation of this PR in a nutshell: we don't know how to interpret the README contents without knowing its filename. So if we store the content, we should store the filename too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're totally right about needing to store the filename alongside the README content, my bad.
However, maybe we can improve a little here and instead of storing the README filename as transmitted, how about storing the normalized README type ? i.e only markdown
for now. This would also allow us to use an ENUM
instead of a VARCHAR
field in the database and simplify its usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of doing this, but I'm worried about the loss of fidelity. e.g. say we do so, and currently store 'markdown'
or NULL
for everything. Later we add AsciiDoc or org-mode support — it'd be great to go back and rerender the newly supported readmes, but to do so we'd have to reach back inside each crate and retrieve the readme filename from the manifest again.
I think storing the filename directly allows us more flexibility later on in how we want to treat certain things, especially in the face of changing/new standards and formats; this is how we approach file classification at GitHub, for example, and it works well for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but to do so we'd have to reach back inside each crate and retrieve the readme filename from the manifest again.
In any case, we probably will have to do it. The crates
table only stores the README of the last published version of a given crate, so if we add org-mode or AsciiDoc support to crates.io we'd still have to reach back inside each crate in order to render it. Anyway, you're right about your approach being more flexible so let's keep it this way :)
I think we should still assume that |
@sgrif That works for me! Change made. |
Looks great and works great with the cargo change, and the current behavior still works with stable cargo! Just merged in upstream to resolve the merge conflicts :) bors: r+ |
1134: Record readme filename; use to determine whether to render as Markdown or not r=carols10cents Fixes #995. Depends on rust-lang/cargo#4633 to have the desired effect, but won't break in the absence of that PR; we'll just assume the readme is called `"README.md"` if the client doesn't report one. Small CSS fix included to wrap extra long lines. Comparison of render of https://crates.io/crates/relm's README: | Before | After | | :-: | :-: | |  |  |
Build succeeded |
Fixes #995. Depends on rust-lang/cargo#4633 to have the desired effect, but won't break in the absence of that PR; we'll just assume the readme is called
"README.md"
if the client doesn't report one.Small CSS fix included to wrap extra long lines.
Comparison of render of https://crates.io/crates/relm's README: