
Wiki pages are stored as/converted to CRLF line endings #17541


Open
plgruener opened this issue Nov 3, 2021 · 21 comments

@plgruener

Gitea Version

1.16.0+dev-455-ga5bcf1994

Git Version

No response

Operating System

No response

How are you running Gitea?

Tried in https://try.gitea.io/plgruener/wikitest/wiki/_pages

Database

No response

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Description

Any wiki page that is created in the web editor (wiki/_new) is stored as a file with (Windows-style) CRLF line endings, not (Unix-style) LF.
This also means that every wiki page created locally with LF line endings (e.g. under Linux or macOS) and then pushed is silently converted to CRLF, either on push itself or when another person edits that page in the web editor. Since the LF->CRLF conversion changes every line, it makes a diff essentially useless.

Screenshots

No response

@plgruener plgruener changed the title Wiki pages are stored at/converted to CRLF line endings Wiki pages are stored as/converted to CRLF line endings Nov 3, 2021
@wxiaoguang
Contributor

wxiaoguang commented Nov 4, 2021

In my opinion:

This is browser behavior, and I think Gitea (Web UI) should not touch it: https://github.com/whatwg/html/issues/6647

Nowadays, all modern applications in all OS can handle CRLF and LF correctly, so it won't be a problem.

The diff can be told to ignore whitespace, including line endings: https://stackoverflow.com/questions/40974170/how-can-i-ignore-line-endings-when-comparing-files

If you edit files locally, git respects settings like core.autocrlf or .gitattributes to set EOL.

If these methods are not enough, we may need a plan that covers all cases. Maybe Gitea could set EOL settings for wiki pages in .gitattributes (I am not sure about the details).

@ranvis

ranvis commented Jan 30, 2024

Setting autocrlf = true makes things worse. The *.md files in the working directory are in CRLF, but committed blobs are normalized to LF because the files are now identified as text. Because of this normalization, every page is marked as changed in the local clone, and you can no longer pull --ff-only.

maybe Gitea can set settings in .gitattributes for wiki pages (I am not sure about details).

I think adding the --path option to git hash-object --stdin should be enough as a start. Users can add .gitattributes on their own.

# Windows command prompt
> git config --list | cat
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.symlinks=false
core.ignorecase=true
> dos2unix lf.md
> cp lf.md crlf.md
> unix2dos crlf.md
> git hash-object crlf.md lf.md
f60ee7132b2e348828dfff5a402c5af9cae7be6f
6ad9ec4280d83f69f748d4f599226bd4d13e7cf6
> echo *.md text > .gitattributes
> git hash-object crlf.md
6ad9ec4280d83f69f748d4f599226bd4d13e7cf6  # Git normalizes CRLF to LF
> git hash-object --stdin < crlf.md
f60ee7132b2e348828dfff5a402c5af9cae7be6f  # Git does not normalize, because of unnamed stdin input
> git hash-object --stdin --path crlf.md < crlf.md
6ad9ec4280d83f69f748d4f599226bd4d13e7cf6  # Git normalizes CRLF to LF again, thanks to --path
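To make the hash difference above concrete: a Git blob ID is the SHA-1 of a "blob <size>" header plus a NUL byte, followed by the raw file bytes. The same text with CRLF and with LF endings therefore gets two unrelated IDs, and only normalizing CRLF to LF before hashing makes them match. This Go sketch assumes the default SHA-1 object format, and its normalizeCRLF helper is a simplified, hypothetical stand-in for Git's text conversion, not Git's actual implementation.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// blobID returns the object ID Git assigns to content stored as a blob
// in a SHA-1 repository: sha1("blob <size>\x00" + content).
func blobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

// normalizeCRLF is a simplified (hypothetical) version of what the "text"
// attribute does on check-in: CRLF pairs become LF, lone CRs are untouched.
func normalizeCRLF(content []byte) []byte {
	return bytes.ReplaceAll(content, []byte("\r\n"), []byte("\n"))
}

func main() {
	lf := []byte("line one\nline two\n")
	crlf := []byte("line one\r\nline two\r\n")

	fmt.Println("LF blob:        ", blobID(lf))
	fmt.Println("CRLF blob:      ", blobID(crlf))
	fmt.Println("CRLF normalized:", blobID(normalizeCRLF(crlf)))
}
```

The first and third IDs printed are identical, while the raw CRLF blob differs, which mirrors the hash-object output shown above.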

@wxiaoguang
Contributor

Related to this behavior: browsers always use "CRLF" for newlines when a textarea is submitted, so Gitea needs to convert the content before it actually writes it into the repo.

#28119 (comment)

@ranvis

ranvis commented Jan 31, 2024

@wxiaoguang
I would appreciate it if Gitea's editor views had an EOL option, as most offline editors do.
Additionally, Gitea could have a per-repo core.autocrlf setting for in-server updates.

Still, compared to those, supporting the --path option so that .gitattributes is honored could be simpler.
This fits better with #17496, though.

@plgruener
Author

@wxiaoguang

In my opinion:
It is a browser behavior, I think Gitea (Web UI) should not touch it

Browsers always use "CRLF" for new lines when a textarea is submitted. So, Gitea need to do extra converting before it really writes the content into the repo.

If you edit files locally, git respects settings like core.autocrlf or .gitattributes to set EOL.

Exactly: Gitea should neither touch it nor silently convert my files.

Honestly, I don't care which setting the textarea in the browser uses. If it always uses CRLF, then the web editor should behave like any other Windows user: use the core.autocrlf=true setting to transparently convert the file from the index to CRLF, edit it in the browser with CRLF, and then convert it back to LF endings when checking in/committing.
Git already supports all of this functionality, you just have to use it?

When editing files via the webeditor in a normal (non-wiki-) repo, (almost) everything behaves as it should and LF files are not converted. So I really don't understand the problem: why can it apparently be done for the README.md in my repo, but not the Home.md in the wiki?

(I also checked how GitHub handles this: there, wiki files with LF endings are not converted.)

@wxiaoguang
Contributor

wxiaoguang commented Apr 27, 2024

Let me share more information: according to the HTML standard, the newline/EOL in an HTML textarea is always "CRLF". So the Gitea backend always receives "CRLF" from a textarea.

When editing files via the webeditor in a normal (non-wiki-) repo, (almost) everything behaves as it should and LF files are not converted.

Because "LF" is hard-coded in non-wiki backend code, all CRLF are replaced by LF in backend for these files (well, Windows users might feel unhappy about this behavior ....)


I also agree that Gitea should do its best to get the EOL right. So the solutions could be:

  1. Use an advanced frontend editor, make the editor submit correct LF/CRLF bytes to backend
  2. Make the backend "auto detect" the existing file's EOL: if the existing file uses LF, replace all EOLs with LF, and likewise for CRLF.

Or, as a quick fix: always use "LF" for wiki files, too, just like these non-wiki files.
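Option 2 and the quick fix could look roughly like this on the backend. This is a hypothetical sketch, not Gitea code: detectEOL and normalizeSubmitted are illustrative names, and treating new or mixed-EOL pages as LF is an assumed policy chosen to match the quick fix.

```go
package main

import (
	"bytes"
	"fmt"
)

// detectEOL (hypothetical helper) guesses the EOL of the existing page:
// if every LF is part of a CRLF pair, the page is treated as CRLF;
// new pages (no content) and mixed or LF pages fall back to LF.
func detectEOL(existing []byte) string {
	crlf := bytes.Count(existing, []byte("\r\n"))
	lf := bytes.Count(existing, []byte("\n"))
	if crlf > 0 && crlf == lf {
		return "\r\n"
	}
	return "\n"
}

// normalizeSubmitted (hypothetical helper) converts textarea input, which
// browsers submit with CRLF, to the requested EOL before it is committed.
func normalizeSubmitted(submitted []byte, eol string) []byte {
	lf := bytes.ReplaceAll(submitted, []byte("\r\n"), []byte("\n"))
	if eol == "\n" {
		return lf
	}
	return bytes.ReplaceAll(lf, []byte("\n"), []byte(eol))
}

func main() {
	submitted := []byte("# Home\r\nedited line\r\n")
	existingLF := []byte("# Home\nold line\n")
	existingCRLF := []byte("# Home\r\nold line\r\n")

	fmt.Printf("%q\n", normalizeSubmitted(submitted, detectEOL(existingLF)))
	fmt.Printf("%q\n", normalizeSubmitted(submitted, detectEOL(existingCRLF)))
	fmt.Printf("%q\n", normalizeSubmitted(submitted, detectEOL(nil))) // new page
}
```

For a brand-new page there is no existing content, so detectEOL falls back to LF, which is exactly the quick fix; an existing all-CRLF page keeps CRLF.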

@wxiaoguang
Contributor

ps: the original wiki code wasn't written by me, nor was the editor/upload code; I just happen to know the details 🤣 If I have enough time I could look into the problem and/or try to improve it, but I can't promise anything at the moment.

@plgruener
Author

Thank you for the info.

Because "LF" is hard-coded in non-wiki backend code, all CRLF are replaced by LF in backend for these files (well, Windows users might feel unhappy about this behavior ....)

I hadn't even noticed that yet, but yeah, that's suboptimal as well.

I can't say which solution would be better: both have to do an "auto detect", so it probably doesn't matter whether you do it in the frontend or the backend. Option 2 is closer to what git itself does, and it's more robust if you ever decide to switch frontend editors (or offer multiple editors for the user to choose from).

Or, as a quick fix: always use "LF" for wiki files, too, just like these non-wiki files.

Yes, I agree it should at least be consistent (otherwise I'd always have to configure my wiki repos differently from the normal ones, which is very confusing).

@vn971

vn971 commented May 25, 2025

What is the current recommended approach for this?
Do I understand it correctly that creating

.gitattributes

*.md eol=crlf

Will it make Markdown files CRLF-terminated on local edits, and thus compatible with Gitea's current web edits, preventing whole-page diffs? To me this looks like the sanest approach for now, but I'm open to opinions.

EDIT: made the wording clearer.

@ranvis

ranvis commented May 26, 2025

@vn971
.gitattributes currently does not take effect in this context, because Gitea uses git hash-object --stdin without the --path option when creating blobs. Without the path, Git doesn't know the filename or extension, so it can't apply attributes like eol=crlf.

Relevant code:

func (repo *Repository) hashObject(reader io.Reader, save bool) (string, error) {
	var cmd *Command
	if save {
		cmd = NewCommand("hash-object", "-w", "--stdin")
	} else {
		cmd = NewCommand("hash-object", "--stdin")
	}
	// ... (rest of function elided)
}

func (repo *Repository) HashObject(reader io.Reader) (ObjectID, error) {
	idStr, err := repo.hashObject(reader, true)
	// ... (rest of function elided)
}

// Call site:
objectHash, err := gitRepo.HashObject(strings.NewReader(content))

Adding the --path option when hashing content would allow .gitattributes to work as expected.

@vn971

vn971 commented May 26, 2025

@ranvis thanks for the input! To clarify the .gitattributes usage: it's not meant to fix Gitea, it's meant to force local clients to produce the same content that the web editor + Gitea would produce, with the intention of eliminating whole-page diffs that way. I.e. it's intended as a workaround, not as a fix for this issue.
Does it make sense this way?

What you're writing looks very useful for anyone who'd address this in Gitea, though.

@ranvis

ranvis commented May 26, 2025

@vn971
Thanks for the clarification. I get your intention now.
The key point is that eol=crlf in .gitattributes only affects the working tree (i.e., how files are checked out locally). Local Git still stores blobs with LF endings regardless.

So even if local clients edit files in CRLF (thanks to eol=crlf), the committed content will still be LF, meaning it won't match the CRLF content that Gitea's web editor commits directly from textarea input.

@vn971

vn971 commented May 26, 2025

@ranvis I cannot confirm this in my tests yet. When I mix local edits with web editor edits in a test repo, all the diffs affect a single line only. Here's how I tested (4 commits): https://gitea.com/vas/test/commits/branch/main
Do you have another way to test that would show the problem?

@ranvis

ranvis commented May 26, 2025

@vn971
Thanks for sharing your test case. I cloned your repo and reproduced a similar test using the Gitea wiki. Here's what I did:

  1. Cloned the wiki repo locally.
  2. Made a small edit via the web wiki UI (added one line).
  3. Pulled locally, added another line, committed and pushed.

After that, I observed that the diff shows all lines as changed, not just the edited ones, both on Gitea and locally.

Edit: Step 3 (right side of the image) was affected by my global attributes config (*.md text). See my later comment for details.

Image

@ranvis

ranvis commented May 26, 2025

@vn971 Just to make sure: in the Gitea diff UI, please check if the whitespace option (third icon from the right) is set to "Show all changes."
Otherwise, changes due to EOL might be hidden.

@vn971

vn971 commented May 26, 2025

@ranvis With regards to Gitea diff UI, do you see all of the lines changed in the earlier commits in my or your repo? E.g. this commit https://gitea.com/vas/test/commit/3b2b0f1dd0110c8ad242775bb44f6e113e5f7165

If you see only one changed line in the above commit, the difference is probably in our local git configurations. I'm asking because I already have "Whitespace" > "Show all changes" enabled in Gitea's UI, and nothing in my local git config looks like it would affect the commits. To make sure my local config isn't interfering, I also made another commit with no git config at all: https://gitea.com/vas/test/commit/f0a567367caf64d0ac2d9002b459c3d90cebfa66

Could it be that you have EOL- or CRLF-related settings in your local config? That would be useful to know for future debugging.

@plgruener
Author

plgruener commented May 26, 2025

@vn971 @ranvis

It's true: Git can only do automatic line-ending conversion (or "normalization", as it's called) for [CRLF in worktree]<-->[LF in index], but not for [LF in worktree]<-->[CRLF in index].
Most text editors on Linux and Mac use LF by default when creating new files, so that won't solve the issue.

Also, setting eol=crlf actually forces normalization. It's possible to commit a file with CRLF endings to the repo, but only if the eol attribute is not set and core.autocrlf=false. (You should set the latter explicitly, because it has different defaults on Windows than on Linux/macOS.)

If you want to check the line endings of your local files:

git ls-files --eol

This displays both the line ending in the worktree and in the repo.
(Or use a tool like xxd or hexdump: 0a is LF, 0d0a is CRLF.)
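For a quick check without a hex dump, a small Go program can classify files roughly the way git ls-files --eol labels text content (lf, crlf, mixed, none). This is a simplified approximation, not Git's logic: unlike Git, it does not detect binary files or lone CR characters.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// classifyEOL labels content as "lf", "crlf", "mixed", or "none"
// (no line breaks at all), loosely mirroring git ls-files --eol.
func classifyEOL(content []byte) string {
	crlf := bytes.Count(content, []byte("\r\n"))
	lfTotal := bytes.Count(content, []byte("\n"))
	loneLF := lfTotal - crlf
	switch {
	case lfTotal == 0:
		return "none"
	case crlf == 0:
		return "lf"
	case loneLF == 0:
		return "crlf"
	default:
		return "mixed"
	}
}

func main() {
	// Usage: go run . Home.md Other-Page.md
	for _, name := range os.Args[1:] {
		data, err := os.ReadFile(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%-6s %s\n", classifyEOL(data), name)
	}
}
```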

If you want CRLF normalization (i.e. CRLF committed to the repo, the reverse of what autocrlf does), you have to set up clean and smudge filters manually, e.g. like so:

[filter "eol-conversion"]
    clean  = unix2dos
    smudge = dos2unix

and apply that filter to all .md files (both commands are included in the dos2unix package). Don't forget to set core.autocrlf=false, too.

edit: leave out the smudge-filter if you don't want LF on Windows. Maybe using only clean=unix2dos in combination with an autocrlf setting works, but that's not tested yet.
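If dos2unix is not available, the clean half of the filter above could be replaced by a tiny program. The sketch below is a hypothetical tool (assumed here to be built as a binary named tocrlf): it reads the file content on stdin and writes it to stdout with every lone LF turned into CRLF, leaving existing CRLF pairs untouched so that running it on already-converted content changes nothing. It would be wired up with clean = tocrlf in the [filter "eol-conversion"] section and a line such as *.md filter=eol-conversion in .gitattributes.

```go
package main

import (
	"io"
	"os"
)

// toCRLF turns every lone LF into CRLF and keeps existing CRLF pairs,
// so applying it twice gives the same result as applying it once.
func toCRLF(in []byte) []byte {
	out := make([]byte, 0, len(in)+len(in)/16)
	for i, b := range in {
		if b == '\n' && (i == 0 || in[i-1] != '\r') {
			out = append(out, '\r')
		}
		out = append(out, b)
	}
	return out
}

func main() {
	// A clean filter receives the worktree content on stdin;
	// Git stores whatever the filter writes to stdout.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		os.Exit(1)
	}
	if _, err := os.Stdout.Write(toCRLF(data)); err != nil {
		os.Exit(1)
	}
}
```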

@ranvis

ranvis commented May 26, 2025

@vn971 You're right. In your repo's commits, I do not see whole files changed in Gitea UI with "show all changes" enabled, or local git diff.

It turns out I had this (text) in my global attributes file (C:/Users/~/.config/git/attributes):

*.md text diff=markdown

That likely caused the CRLFs to be normalized on my side. Sorry for the confusion; step 3 in my comment was affected by this config.

I still don’t know why commits created by Gitea differ, and your commits don't contain CR characters either.
I used the following command to check line endings:

git show xxxxxxx:from_local.md | hexdump -C | grep -w --color "0[ad]"

I'll take a look at the filter proposed by @plgruener when I have more time. Thanks.

@vn971

vn971 commented May 26, 2025

EDIT: this got somewhat outdated, better read @plgruener's reply below.

@plgruener to be honest, I don't fully understand which problem you are addressing in your comment.

My current take is simple:

  • The Gitea issue is still valid. Currently, the web editor forces the use of CRLF, which conflicts with .gitattributes and overall works very differently from local clients.
  • I'm claiming that setting CRLF in .gitattributes at least prevents whole-page diffs, which is a reasonable workaround until the issue is solved.

I've used the kate and nano text editors to create and modify the files, which are pretty basic and standard, aren't they? I have no local configuration for *.md files or core.autocrlf. No git hooks or external commands were needed to create this commit history.

Does this take make sense for you?

P.S. For transparency, I'm not claiming superior knowledge here. In fact, when I inspect local objects with e.g. git cat-file --batch-check --batch-all-objects and git cat-file -p 2b6fc1b7d6e0e03a9f8ee4664e9c1e47f1d7f590 | hexdump -C, I see both 0a 0a and single 0a, which I can't fully understand or explain. (0a 0d is nowhere to be seen in the blobs at all, BTW.) But the git history is clean at least, so while some conversions are happening, they lead to the same results locally and in Gitea's current web editor, hence my wish to call it a workaround.

@plgruener
Author

plgruener commented May 26, 2025

@vn971 This issue (as per the title) is specifically about Gitea Wiki pages.
You were testing this in a "normal" repo, but the web editor behaves differently for non-wiki repos and wiki repos (the first forces LF, the second forces CRLF); see this comment above.
Replicate your test in the "Wiki" tab, and you will see different results.
Also, I again advise testing with git ls-files --eol, because it displays both the worktree and the index status at once.

My previous comment with the clean filter details a working workaround with automatic conversion: regardless of whether local files are made with LF or CRLF, they will be converted to CRLF on check-in (and thus not clash with the wiki-webeditor).
Simply setting eol=crlf does not work, you'd have gotten pretty much the same result in your test in the normal non-wiki repo if you had not set eol=… at all.

@vn971

vn971 commented May 26, 2025

( @plgruener confirming you're right. The same steps that I did on a "normal" repo fail for the wiki https://gitea.com/vas/code/wiki/from_webeditor.md.-?action=_revision )
