Skip to content

Record readme filename; use to determine whether to render as Markdown or not #1134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Oct 25, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ oauth2 = "0.3"
log = "0.3"
env_logger = "0.4"
hex = "0.2"
htmlescape = "0.3.1"
license-exprs = "^1.3"
dotenv = "0.10.0"
toml = "0.4"
Expand Down
1 change: 1 addition & 0 deletions app/styles/crate.scss
Original file line number Diff line number Diff line change
Expand Up @@ -305,6 +305,7 @@
}
.crate-readme {
line-height: 1.5;
overflow-wrap: break-word;

img {
max-width: 100%;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
ALTER TABLE crates DROP COLUMN readme_file;
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
ALTER TABLE crates ADD COLUMN readme_file VARCHAR;
6 changes: 3 additions & 3 deletions src/bin/render-readmes.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
// Iterates over every crate versions ever uploaded and (re-)renders their
// readme using the Markdown renderer from the cargo_registry crate.
// readme using the readme renderer from the cargo_registry crate.
//
// Warning: this can take a lot of time.

Expand Down Expand Up @@ -34,7 +34,7 @@ use url::Url;

use cargo_registry::{Config, Version};
use cargo_registry::schema::*;
use cargo_registry::render::markdown_to_html;
use cargo_registry::render::readme_to_html;

const DEFAULT_PAGE_SIZE: usize = 25;
const USAGE: &'static str = "
Expand Down Expand Up @@ -255,7 +255,7 @@ fn get_readme(config: &Config, version: &Version, krate_name: &str) -> Option<St
manifest.package.readme.unwrap()
);
let contents = find_file_by_path(&mut entries, Path::new(&path), &version, &krate_name);
markdown_to_html(&contents).expect(&format!(
readme_to_html(&contents, &path).expect(&format!(
"[{}-{}] Couldn't render README",
krate_name,
version.num
Expand Down
4 changes: 4 additions & 0 deletions src/krate/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ pub struct Crate {
pub homepage: Option<String>,
pub documentation: Option<String>,
pub readme: Option<String>,
pub readme_file: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that we're now storing the README filename alongside its content, though it doesn't seem to be used anywhere ? I'm not opposed to storing it but I'm not sure to see how useful it is to store it in the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, we don't strictly need to store it. I was thinking we might need it in render-readmes, but that pulls the README filename from the manifest directly. However, I feel like storing this as well might be important for things like #1025/#1019?

A more general argument is that, if there's no need to store the README filename in the database, then there's no need to store the README content in the database, because we can't properly make sense of the latter without the former.

That's moreorless the motivation of this PR in a nutshell: we don't know how to interpret the README contents without knowing its filename. So if we store the content, we should store the filename too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're totally right about needing to store the filename alongside the README content, my bad.

However, maybe we can improve a little here and instead of storing the README filename as transmitted, how about storing the normalized README type ? i.e only markdown for now. This would also allow us to use an ENUM instead of a VARCHAR field in the database and simplify its usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of doing this, but I'm worried about the loss of fidelity. e.g. say we do so, and currently store 'markdown' or NULL for everything. Later we add AsciiDoc or org-mode support — it'd be great to go back and rerender the newly supported readmes, but to do so we'd have to reach back inside each crate and retrieve the readme filename from the manifest again.

I think storing the filename directly allows us more flexibility later on in how we want to treat certain things, especially in the face of changing/new standards and formats; this is how we approach file classification at GitHub, for example, and it works well for us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but to do so we'd have to reach back inside each crate and retrieve the readme filename from the manifest again.

In any case, we probably will have to do it. The crates table only stores the README of the last published version of a given crate, so if we add org-mode or AsciiDoc support to crates.io we'd still have to reach back inside each crate in order to render it. Anyway, you're right about your approach being more flexible so let's keep it this way :)

pub license: Option<String>,
pub repository: Option<String>,
pub max_upload_size: Option<i32>,
Expand All @@ -66,6 +67,7 @@ type AllColumns = (
crates::homepage,
crates::documentation,
crates::readme,
crates::readme_file,
crates::license,
crates::repository,
crates::max_upload_size,
Expand All @@ -81,6 +83,7 @@ pub const ALL_COLUMNS: AllColumns = (
crates::homepage,
crates::documentation,
crates::readme,
crates::readme_file,
crates::license,
crates::repository,
crates::max_upload_size,
Expand Down Expand Up @@ -132,6 +135,7 @@ pub struct NewCrate<'a> {
pub homepage: Option<&'a str>,
pub documentation: Option<&'a str>,
pub readme: Option<&'a str>,
pub readme_file: Option<&'a str>,
pub repository: Option<&'a str>,
pub max_upload_size: Option<i32>,
pub license: Option<&'a str>,
Expand Down
6 changes: 5 additions & 1 deletion src/krate/publish.rs
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,7 @@ pub fn publish(req: &mut Request) -> CargoResult<Response> {
homepage: new_crate.homepage.as_ref().map(|s| &**s),
documentation: new_crate.documentation.as_ref().map(|s| &**s),
readme: new_crate.readme.as_ref().map(|s| &**s),
readme_file: new_crate.readme_file.as_ref().map(|s| &**s),
repository: new_crate.repository.as_ref().map(|s| &**s),
license: new_crate.license.as_ref().map(|s| &**s),
max_upload_size: None,
Expand Down Expand Up @@ -124,7 +125,10 @@ pub fn publish(req: &mut Request) -> CargoResult<Response> {

// Render the README for this crate
let readme = match new_crate.readme.as_ref() {
Some(readme) => Some(render::markdown_to_html(&**readme)?),
Some(readme) => Some(render::readme_to_html(
&**readme,
new_crate.readme_file.as_ref().map_or("README.md", |s| &**s),
)?),
None => None,
};

Expand Down
1 change: 1 addition & 0 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ extern crate dotenv;
extern crate flate2;
extern crate git2;
extern crate hex;
extern crate htmlescape;
extern crate lettre;
extern crate license_exprs;
#[macro_use]
Expand Down
63 changes: 54 additions & 9 deletions src/render.rs
Original file line number Diff line number Diff line change
@@ -1,17 +1,18 @@
use ammonia::Builder;
use comrak;
use htmlescape::encode_minimal;

use util::CargoResult;

/// Context for markdown to HTML rendering.
#[allow(missing_debug_implementations)]
pub struct MarkdownRenderer<'a> {
struct MarkdownRenderer<'a> {
html_sanitizer: Builder<'a>,
}

impl<'a> MarkdownRenderer<'a> {
/// Creates a new renderer instance.
pub fn new() -> MarkdownRenderer<'a> {
fn new() -> MarkdownRenderer<'a> {
let tags = [
"a",
"b",
Expand Down Expand Up @@ -106,7 +107,7 @@ impl<'a> MarkdownRenderer<'a> {
}

/// Renders the given markdown to HTML using the current settings.
pub fn to_html(&self, text: &str) -> CargoResult<String> {
fn to_html(&self, text: &str) -> CargoResult<String> {
let options = comrak::ComrakOptions {
ext_autolink: true,
ext_strikethrough: true,
Expand All @@ -127,22 +128,46 @@ impl<'a> Default for MarkdownRenderer<'a> {
}
}

/// Renders a markdown text to sanitized HTML.
/// Renders Markdown text to sanitized HTML.
fn markdown_to_html(text: &str) -> CargoResult<String> {
let renderer = MarkdownRenderer::new();
renderer.to_html(text)
}

/// Any readme with a filename ending in one of these extensions will be rendered as Markdown.
/// Note we also render a readme as Markdown if _no_ extension is on the filename.
static MARKDOWN_EXTENSIONS: [&'static str; 7] = [
".md",
".markdown",
".mdown",
".mdwn",
".mkd",
".mkdn",
".mkdown",
];

/// Renders a readme to sanitized HTML. An appropriate rendering method is chosen depending
/// on the extension of the supplied filename.
///
/// The returned text should not contain any harmful HTML tag or attribute (such as iframe,
/// onclick, onmouseover, etc.).
///
/// # Examples
///
/// ```
/// use render::markdown_to_html;
/// use render::render_to_html;
///
/// let text = "[Rust](https://rust-lang.org/) is an awesome *systems programming* language!";
/// let rendered = markdown_to_html(text)?;
/// let rendered = readme_to_html(text, "README.md")?;
/// ```
pub fn markdown_to_html(text: &str) -> CargoResult<String> {
let renderer = MarkdownRenderer::new();
renderer.to_html(text)
pub fn readme_to_html(text: &str, filename: &str) -> CargoResult<String> {
let filename = filename.to_lowercase();

if !filename.contains('.') || MARKDOWN_EXTENSIONS.iter().any(|e| filename.ends_with(e)) {
return markdown_to_html(text);
}

Ok(encode_minimal(text).replace("\n", "<br>\n"))
}

#[cfg(test)]
Expand Down Expand Up @@ -218,6 +243,26 @@ mod tests {
assert_eq!(result, "<p>Hello World!</p>\n");
}

#[test]
fn readme_to_html_renders_markdown() {
for f in &["README", "readme.md", "README.MARKDOWN", "whatever.mkd"] {
assert_eq!(
readme_to_html("*lobster*", f).unwrap(),
"<p><em>lobster</em></p>\n"
);
}
}

#[test]
fn readme_to_html_renders_other_things() {
for f in &["readme.exe", "readem.org", "blah.adoc"] {
assert_eq!(
readme_to_html("<script>lobster</script>\n\nis my friend\n", f).unwrap(),
"&lt;script&gt;lobster&lt;/script&gt;<br>\n<br>\nis my friend<br>\n"
);
}
}

#[test]
fn header_has_tags() {
let text = "# My crate\n\nHello, world!\n";
Expand Down
6 changes: 6 additions & 0 deletions src/schema.rs
Original file line number Diff line number Diff line change
Expand Up @@ -282,6 +282,12 @@ table! {
///
/// (Automatically generated by Diesel.)
readme -> Nullable<Varchar>,
/// The `readme_file` column of the `crates` table.
///
/// Its SQL type is `Nullable<Varchar>`.
///
/// (Automatically generated by Diesel.)
readme_file -> Nullable<Varchar>,
/// The `textsearchable_index_col` column of the `crates` table.
///
/// Its SQL type is `Tsvector`.
Expand Down
2 changes: 2 additions & 0 deletions src/tests/all.rs
Original file line number Diff line number Diff line change
Expand Up @@ -480,6 +480,7 @@ fn krate(name: &str) -> Crate {
homepage: None,
description: None,
readme: None,
readme_file: None,
license: None,
repository: None,
max_upload_size: None,
Expand Down Expand Up @@ -647,6 +648,7 @@ fn new_req_body(
homepage: krate.homepage,
documentation: krate.documentation,
readme: krate.readme,
readme_file: krate.readme_file,
keywords: Some(u::KeywordList(kws)),
categories: Some(u::CategoryList(cats)),
license: Some("MIT".to_string()),
Expand Down
1 change: 1 addition & 0 deletions src/tests/krate.rs
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,7 @@ fn new_crate(name: &str) -> u::NewCrate {
homepage: None,
documentation: None,
readme: None,
readme_file: None,
keywords: None,
categories: None,
license: Some("MIT".to_string()),
Expand Down
1 change: 1 addition & 0 deletions src/upload.rs
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ pub struct NewCrate {
pub homepage: Option<String>,
pub documentation: Option<String>,
pub readme: Option<String>,
pub readme_file: Option<String>,
pub keywords: Option<KeywordList>,
pub categories: Option<CategoryList>,
pub license: Option<String>,
Expand Down