Rework special link parsing in the post-processing of markup #3354

Merged
merged 13 commits into from
Feb 27, 2018

Conversation

thehowl
Contributor

@thehowl thehowl commented Jan 11, 2018

Fixes #3327. Adds the ability to link emails in markdown in three forms: [email protected], <[email protected]> and [hello](mailto:[email protected]), none of which worked previously. Also removes the autolink js library, as link processing can easily be done on the backend using a simple blackfriday option.

Closes #3364. Closes #652.

In the new mechanism, post-processing is done by passing the input through a given set of functions, which are passed to a new postProcessCtx. This makes it possible to choose which post-processing features to enable, and thus also closes #2968. It also enables many other features in commit messages, such as linking, writing emails, and basically everything PostProcess does except for shortlinks.

Closes #3112 as well.

The only thing that seems to be needed is some unit tests, but this should be on its way for 1.5
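
For readers following along, the function-chain idea described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: only the name postProcessCtx comes from the description, while the `processor` type, the struct fields, both regular expressions and the `linkIssues`/`linkEmails` helpers are assumptions. It also works on raw strings for brevity, which is a simplification.

```go
package main

import (
	"fmt"
	"regexp"
)

// processor is a hypothetical name for one post-processing step.
type processor func(ctx *postProcessCtx, text string) string

// postProcessCtx borrows its name from the PR description; the fields
// are assumptions made for this sketch.
type postProcessCtx struct {
	urlPrefix string
	procs     []processor
}

// run feeds the input through every configured processor, in order.
func (ctx *postProcessCtx) run(text string) string {
	for _, p := range ctx.procs {
		text = p(ctx, text)
	}
	return text
}

// Illustrative patterns only; real-world matching needs more care.
var (
	issueRef = regexp.MustCompile(`#(\d+)\b`)
	emailRe  = regexp.MustCompile(`[\w.+-]+@[\w-]+(\.[\w-]+)+`)
)

// linkIssues turns "#12" into a link below the repository URL prefix.
func linkIssues(ctx *postProcessCtx, text string) string {
	return issueRef.ReplaceAllString(text, `<a href="`+ctx.urlPrefix+`/issues/${1}">#${1}</a>`)
}

// linkEmails wraps bare email addresses in mailto: links.
func linkEmails(_ *postProcessCtx, text string) string {
	return emailRe.ReplaceAllString(text, `<a href="mailto:${0}">${0}</a>`)
}

func main() {
	// A full context enables every step; a restricted one enables a subset.
	full := &postProcessCtx{urlPrefix: "/org/repo", procs: []processor{linkIssues, linkEmails}}
	issuesOnly := &postProcessCtx{urlPrefix: "/org/repo", procs: []processor{linkIssues}}

	msg := "Fixes #12, ping dev@example.com"
	fmt.Println(full.run(msg))
	fmt.Println(issuesOnly.run(msg))
}
```

Choosing the slice of processors per call site is what would let a commit message use a different feature set from a rendered markdown file.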

@thehowl
Contributor Author

thehowl commented Jan 11, 2018

Oh, I guess I kinda forgot about the unit tests... will need to rewrite them

@tboerger tboerger added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Jan 11, 2018
@lafriks
Member

lafriks commented Jan 11, 2018

But this will not render commit with just short sha clickable if I'm not wrong

@thehowl thehowl changed the title Email links in markdown; autolinking from backend [WIP] Email links in markdown; autolinking from backend Jan 11, 2018
@thehowl
Contributor Author

thehowl commented Jan 11, 2018

But this will not render commit with just short sha clickable if I'm not wrong

It should still do AFAIK? What do you mean?

I've placed [WIP] on the header for the moment because I'm too tired now to iron out the remaining issue, but it's mostly done

@lafriks lafriks added type/refactoring Existing code has been cleaned up. There should be no new functionality. type/enhancement An improvement of existing functionality pr/wip This PR is not ready for review labels Jan 11, 2018
@lafriks
Member

lafriks commented Jan 12, 2018

I mean this (taken from tests):
You currently have:
<a href=\"http://localhost:3000/gogits/gogs/commit/b6dd6210eaebc915fd5be5579c58cce4da2e2579\" rel=\"nofollow\">http://localhost:3000/gogits/gogs/commit/b6dd6210eaebc915fd5be5579c58cce4da2e2579</a>
But it needs to be (text between tag):
<a href=\"http://localhost:3000/gogits/gogs/commit/b6dd6210eaebc915fd5be5579c58cce4da2e2579\" rel=\"nofollow\">b6dd6210ea</a>
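
To make the expected behaviour concrete, here is a small sketch of rewriting a full commit URL so that the link text is the shortened SHA. The regular expression, the `linkCommitURLs` name and the fixed length of 10 (taken from the expected output above) are assumptions for illustration, not the PR's implementation.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// commitURL matches a URL ending in /commit/ plus a 40-character SHA-1.
// The pattern is an assumption made for this sketch.
var commitURL = regexp.MustCompile(`https?://\S+/commit/([0-9a-f]{40})`)

// shortSHALen matches the length of the link text in the expected output.
const shortSHALen = 10

// linkCommitURLs replaces each commit URL with an anchor whose href is the
// full URL and whose text is the first shortSHALen characters of the SHA.
func linkCommitURLs(text string) string {
	return commitURL.ReplaceAllStringFunc(text, func(u string) string {
		sha := commitURL.FindStringSubmatch(u)[1]
		return fmt.Sprintf(`<a href="%s" rel="nofollow">%s</a>`,
			html.EscapeString(u), sha[:shortSHALen])
	})
}

func main() {
	in := "http://localhost:3000/gogits/gogs/commit/b6dd6210eaebc915fd5be5579c58cce4da2e2579"
	fmt.Println(linkCommitURLs(in))
}
```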

@thehowl thehowl changed the title [WIP] Email links in markdown; autolinking from backend [WIP] Rework special link parsing in the post-processing of markup Jan 13, 2018
@codecov-io

codecov-io commented Jan 13, 2018

Codecov Report

Merging #3354 into master will increase coverage by 0.35%.
The diff coverage is 89.51%.

@@            Coverage Diff             @@
##           master    #3354      +/-   ##
==========================================
+ Coverage   35.79%   36.15%   +0.35%     
==========================================
  Files         285      285              
  Lines       40868    40905      +37     
==========================================
+ Hits        14630    14790     +160     
+ Misses      24068    23943     -125     
- Partials     2170     2172       +2
| Impacted Files | Coverage Δ | |
|---|---|---|
| modules/templates/helper.go | 47.91% <37.5%> (-1.75%) | ⬇️ |
| modules/markup/markup.go | 67.79% <40%> (+0.52%) | ⬆️ |
| modules/markup/markdown/markdown.go | 66.35% <92.3%> (-2.35%) | ⬇️ |
| modules/markup/html.go | 89.44% <92.43%> (+35.14%) | ⬆️ |
| models/repo_list.go | 65.62% <0%> (-1.57%) | ⬇️ |
| modules/process/manager.go | 81.15% <0%> (+4.34%) | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 769ab1e...ab78354.

@thehowl thehowl changed the title [WIP] Rework special link parsing in the post-processing of markup Rework special link parsing in the post-processing of markup Jan 13, 2018
@thehowl
Contributor Author

thehowl commented Jan 13, 2018

Since I got sick of the state of html.go, I rewrote it entirely making it use an HTML tree instead of a tokenizer + a bunch of sprintfs. I took the chance also to rewrite a bunch of stuff. It just so happened that because this enabled much more modularity than before, it also meant that implementing #2968 was pretty trivial.

image

Note that this does not enable markdown on commits.

Reviews welcome.

@lunny
Member

lunny commented Jan 14, 2018

@thehowl Thanks for your great PR. I think we have several kinds of rendering scenarios: markup files in the tree, wikis, issues (comments) and the notification mail body. Currently, all of them support markdown syntax and Gitea post-processing. Markup files also support other syntaxes along with Gitea post-processing. But there are some differences between markup files and wikis, so a set of post-processors could form a group, and markup files and wikis could use different groups.

@lunny lunny added this to the 1.x.x milestone Jan 14, 2018
@thehowl
Contributor Author

thehowl commented Jan 14, 2018

You mean to make this into a different package?

@ypnos

ypnos commented Jan 15, 2018

In Gitea 1.3.2, a URL like https://github.com/select2/select2/issues/4505#issuecomment-351836912 is not parsed correctly. The dash and the part after it are not seen as part of the URL.

Is this related to the work being done here?

@thehowl
Contributor Author

thehowl commented Jan 15, 2018

@ypnos Slightly related, however the link you posted doesn't look like a case we need to handle, seeing as it is a URL for a GitHub issue and not for the local instance. On the local instance, that is something we already handle; see this: https://try.gitea.io/Howl/aaa/issues/1

@ypnos

ypnos commented Jan 15, 2018

This is just an example of an external link with a dash in it. So external URLs in the format http://… instead of […](http://…) are not part of this PR then?

@thehowl
Contributor Author

thehowl commented Jan 15, 2018

External links in general are a part of this PR, though they were already supported before (using js, that is). They render as a clickable link, e.g. https://google.com, instead of the plain text https://google.com. Is that what you mean?

@ypnos

ypnos commented Jan 15, 2018

Great, so the current behavior is buggy and I wondered if you could check the new code for this. Please have a look at my example comment at https://try.gitea.io/Howl/aaa/issues/1

@thehowl
Contributor Author

thehowl commented Jan 15, 2018

Interesting. Then yes, the bug still exists in my code. Instead of using a regex, this could be solved by using strings.FieldsFunc and picking out of the resulting slice the strings that begin with http:// or https:// and for which url.Parse doesn't return an error. But that seems too much to implement in this PR, which is already pretty big. I suggest you create an issue linking to this conversation.

As a workaround, I recommend you use the standard markdown feature, which is that of wrapping the URL in angle brackets (e.g. <https://google.com>).
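
A rough sketch of the strings.FieldsFunc plus url.Parse idea mentioned above, for anyone who wants to pick it up later. The `findURLs` name and the choice to split on whitespace only are assumptions; nothing like this is part of the PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
	"unicode"
)

// findURLs splits text on whitespace and keeps the fields that begin with
// http:// or https:// and that url.Parse accepts without an error.
func findURLs(text string) []string {
	var found []string
	for _, field := range strings.FieldsFunc(text, unicode.IsSpace) {
		if !strings.HasPrefix(field, "http://") && !strings.HasPrefix(field, "https://") {
			continue
		}
		if _, err := url.Parse(field); err != nil {
			continue
		}
		found = append(found, field)
	}
	return found
}

func main() {
	text := "see https://github.com/select2/select2/issues/4505#issuecomment-351836912 for details"
	for _, u := range findURLs(text) {
		fmt.Println(u)
	}
}
```

Because the split happens on whitespace, the dash and the fragment stay inside the candidate, which is the case that the regex misses.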

@ypnos

ypnos commented Jan 15, 2018

Thank you.

After some of my own research on regular expressions for matching URLs I conclude that it is not worth the trouble for achieving more proper URL detection, hence I will not file another issue on this.

Btw. I also believe the more complicated approach with url.Parse would not be the end of it; for example, it would most probably include a period at the end of the URL, which should in general be excluded.
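
The trailing-period concern can be illustrated with a short sketch: url.Parse accepts a candidate that ends in a period, so any trimming has to happen separately. The `parsesCleanly` and `trimTrailingPunct` helpers and the set of trimmed characters are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parsesCleanly reports whether url.Parse accepts the candidate.
func parsesCleanly(candidate string) bool {
	_, err := url.Parse(candidate)
	return err == nil
}

// trimTrailingPunct drops sentence punctuation from the end of a candidate.
// The character set is an assumption chosen for this sketch.
func trimTrailingPunct(candidate string) string {
	return strings.TrimRight(candidate, ".,;:!?")
}

func main() {
	candidate := "https://example.com/page."
	// The trailing period does not make url.Parse fail.
	fmt.Println(parsesCleanly(candidate))
	fmt.Println(trimTrailingPunct(candidate))
}
```

Trimming like this is a heuristic: it would also strip a period or question mark that genuinely belongs to the URL, which is part of why more exact detection gets complicated quickly.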

@thehowl
Contributor Author

thehowl commented Feb 20, 2018

rebased onto master. Please review.

Member

@lafriks lafriks left a comment

Other than these small issues, it LGTM

}

// remove initial parts - because Render creates a whole HTML page.
const lenInit = len(`<html><head></head><body>`)
Member

@lafriks lafriks Feb 20, 2018

Can this instead find <body> in the string, and the respective closing tag, to be more future-proof?
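
One possible shape for that suggestion, as a sketch only: locate the tags instead of relying on a fixed prefix length. The `extractBody` name and the fallback of returning the input unchanged are assumptions, and the sketch only handles a bare `<body>` tag without attributes.

```go
package main

import (
	"bytes"
	"fmt"
)

// extractBody returns the bytes between the first <body> and the last
// </body>. If either tag is missing, the input is returned unchanged.
func extractBody(page []byte) []byte {
	const openTag, closeTag = "<body>", "</body>"
	start := bytes.Index(page, []byte(openTag))
	end := bytes.LastIndex(page, []byte(closeTag))
	if start < 0 || end < start+len(openTag) {
		return page
	}
	return page[start+len(openTag) : end]
}

func main() {
	page := []byte(`<html><head></head><body><p>hi</p></body></html>`)
	fmt.Printf("%s\n", extractBody(page))
}
```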

@@ -69,7 +69,8 @@ func RenderWiki(filename string, rawBytes []byte, urlPrefix string, metas map[st
 func render(parser Parser, rawBytes []byte, urlPrefix string, metas map[string]string, isWiki bool) []byte {
 	urlPrefix = strings.Replace(urlPrefix, " ", "+", -1)
 	result := parser.Render(rawBytes, urlPrefix, metas, isWiki)
-	result = PostProcess(result, urlPrefix, metas, isWiki)
+	// TODO: one day the error should be checked and returned.
+	result, _ = PostProcess(result, urlPrefix, metas, isWiki)
Member

Error should be at least logged
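
As a sketch of what logging instead of discarding could look like: the `postProcess` stand-in below is hypothetical and only mirrors the two-value return seen in the diff, the standard `log` package stands in for whatever logger the project uses, and falling back to the unprocessed output is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// postProcess is a hypothetical stand-in that returns a result and an
// error, like the call in the diff; it fails on empty input for the demo.
func postProcess(raw []byte) ([]byte, error) {
	if len(raw) == 0 {
		return nil, errors.New("empty input")
	}
	return raw, nil
}

// render logs a post-processing failure rather than dropping it, and
// falls back to the unprocessed bytes (an assumption for this sketch).
func render(raw []byte) []byte {
	result, err := postProcess(raw)
	if err != nil {
		log.Printf("PostProcess: %v", err)
		return raw
	}
	return result
}

func main() {
	fmt.Printf("%s\n", render([]byte("<p>ok</p>")))
	render(nil) // logs the error and returns the input unchanged
}
```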

cleanMsg := template.HTMLEscapeString(msg)
fullMessage := string(markup.RenderIssueIndexPattern([]byte(cleanMsg), opts))
body := strings.Split(strings.TrimSpace(fullMessage), "\n")
fullMessage, _ := markup.RenderCommitMessage([]byte(cleanMsg), urlPrefix, "", metas)
Member

Log error

msgLines := strings.Split(strings.TrimSpace(fullMessage), "\n")
// we can safely assume that it will not return any error, since there
// shouldn't be any special HTML.
fullMessage, _ := markup.RenderCommitMessage([]byte(cleanMsg), urlPrefix, urlDefault, metas)
Member

Same here, log error

@@ -0,0 +1,378 @@
package markup
Member

Missing license comment

@lafriks lafriks removed the pr/wip This PR is not ready for review label Feb 20, 2018
@lafriks lafriks modified the milestones: 1.x.x, 1.5.0 Feb 20, 2018
@tboerger tboerger added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Feb 20, 2018

"code.gitea.io/gitea/modules/setting"
"code.gitea.io/gitea/modules/util"
"github.com/stretchr/testify/assert"
Member

Missing empty line between local and external package.

See https://github.com/go-gitea/gitea/blob/master/CONTRIBUTING.md#styleguide

Contributor Author

Addressed.

@tboerger tboerger added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Feb 27, 2018
@lafriks lafriks merged commit 535445c into go-gitea:master Feb 27, 2018
@thehowl thehowl mentioned this pull request Mar 3, 2018
7 tasks
@thehowl thehowl deleted the fix-3327 branch April 22, 2018 20:02
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/enhancement An improvement of existing functionality type/refactoring Existing code has been cleaned up. There should be no new functionality.
7 participants