Skip to content

Record readme filename; use to determine whether to render as Markdown or not #1134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Oct 25, 2017
Merged

Record readme filename; use to determine whether to render as Markdown or not #1134

merged 8 commits into from
Oct 25, 2017

Conversation

kivikakk
Copy link
Contributor

Fixes #995. Depends on rust-lang/cargo#4633 to have the desired effect, but won't break in the absence of that PR; we'll just assume the readme is called "README.md" if the client doesn't report one.

Small CSS fix included to wrap extra long lines.

Comparison of render of https://crates.io/crates/relm's README:

Before After
screen shot 2017-10-17 at 5 51 50 pm screen shot 2017-10-17 at 5 51 51 pm

Copy link
Contributor

@kureuil kureuil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this PR @kivikakk !

I just have a few remarks, otherwise this seems great to me :)

src/render.rs Outdated
@@ -141,11 +142,31 @@ impl<'a> Default for MarkdownRenderer<'a> {
/// let text = "[Rust](https://rust-lang.org/) is an awesome *systems programming* language!";
/// let rendered = markdown_to_html(text)?;
/// ```
pub fn markdown_to_html(text: &str) -> CargoResult<String> {
fn markdown_to_html(text: &str) -> CargoResult<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we drop the pub modifier on markdown_to_html, we probably want to drop it for struct MarkdownRenderer too ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

src/render.rs Outdated
let renderer = MarkdownRenderer::new();
renderer.to_html(text)
}

static MARKDOWN_EXTENSIONS: [&'static str; 7] = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this is not public & pretty self-explanatory, we might want to add a small doc comment to this constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

src/render.rs Outdated
".mkdown",
];

pub fn readme_to_html(text: &str, filename: &str) -> CargoResult<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a doc comment on this function ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, forgot!

src/render.rs Outdated
}
}

Ok(encode_minimal(text).replace("\n", "<br>\n"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we pulling htmlescape as a new dependency ? Isn't ammonia already doing a similar job ?

Copy link
Contributor Author

@kivikakk kivikakk Oct 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

htmlescape is replacing characters with their entities in order to basically turn text into HTML that represents/renders as the same text; ammonia is parsing HTML itself and then sanitizing that DOM. They're pretty different (and unfortunately, I looked and neither ammonia nor any of its dependencies, like html5ever, can do the job of htmlescape simply.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine :)

@@ -65,6 +65,7 @@ pub struct Crate {
pub homepage: Option<String>,
pub documentation: Option<String>,
pub readme: Option<String>,
pub readme_file: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that we're now storing the README filename alongside its content, though it doesn't seem to be used anywhere ? I'm not opposed to storing it but I'm not sure to see how useful it is to store it in the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, we don't strictly need to store it. I was thinking we might need it in render-readmes, but that pulls the README filename from the manifest directly. However, I feel like storing this as well might be important for things like #1025/#1019?

A more general argument is that, if there's no need to store the README filename in the database, then there's no need to store the README content in the database, because we can't properly make sense of the latter without the former.

That's moreorless the motivation of this PR in a nutshell: we don't know how to interpret the README contents without knowing its filename. So if we store the content, we should store the filename too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're totally right about needing to store the filename alongside the README content, my bad.

However, maybe we can improve a little here and instead of storing the README filename as transmitted, how about storing the normalized README type ? i.e only markdown for now. This would also allow us to use an ENUM instead of a VARCHAR field in the database and simplify its usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of doing this, but I'm worried about the loss of fidelity. e.g. say we do so, and currently store 'markdown' or NULL for everything. Later we add AsciiDoc or org-mode support — it'd be great to go back and rerender the newly supported readmes, but to do so we'd have to reach back inside each crate and retrieve the readme filename from the manifest again.

I think storing the filename directly allows us more flexibility later on in how we want to treat certain things, especially in the face of changing/new standards and formats; this is how we approach file classification at GitHub, for example, and it works well for us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but to do so we'd have to reach back inside each crate and retrieve the readme filename from the manifest again.

In any case, we probably will have to do it. The crates table only stores the README of the last published version of a given crate, so if we add org-mode or AsciiDoc support to crates.io we'd still have to reach back inside each crate in order to render it. Anyway, you're right about your approach being more flexible so let's keep it this way :)

@sgrif
Copy link
Contributor

sgrif commented Oct 18, 2017

I think we should still assume that README with no extension is markdown here. Other than that this looks fine.

@kivikakk
Copy link
Contributor Author

@sgrif That works for me! Change made.

@carols10cents
Copy link
Member

Looks great and works great with the cargo change, and the current behavior still works with stable cargo! Just merged in upstream to resolve the merge conflicts :)

bors: r+

bors-voyager bot added a commit that referenced this pull request Oct 25, 2017
1134: Record readme filename; use to determine whether to render as Markdown or not r=carols10cents

Fixes #995. Depends on rust-lang/cargo#4633 to have the desired effect, but won't break in the absence of that PR; we'll just assume the readme is called `"README.md"` if the client doesn't report one.

Small CSS fix included to wrap extra long lines.

Comparison of render of https://crates.io/crates/relm's README:

| Before | After |
| :-: | :-: |
| ![screen shot 2017-10-17 at 5 51 50 pm](https://user-images.githubusercontent.com/1915/31650690-42018c7a-b364-11e7-95f8-7e1b8e8d56a0.png) | ![screen shot 2017-10-17 at 5 51 51 pm](https://user-images.githubusercontent.com/1915/31650698-44d16588-b364-11e7-92cf-1c3201c0e60e.png) |
@bors-voyager
Copy link
Contributor

bors-voyager bot commented Oct 25, 2017

Build succeeded

@bors-voyager bors-voyager bot merged commit 931427f into rust-lang:master Oct 25, 2017
@kivikakk kivikakk deleted the record-readme-filename branch October 26, 2017 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Only interpret README files whose filename ends in .md as markdown, render plain text otherwise
4 participants