
Move to using % as an escape in format! #9161

Closed
wants to merge 1 commit into from

alexcrichton
Member

I believe that this is almost undeniably better because now to escape '{' it
looks like "%{" instead of "\\{". Furthermore, in order to insert a '\' you only
have to type "\\" (as one would expect), instead of "\\\\".

At the same time, this also attempts to improve some error messages about
expecting a character and not finding it.
@sanxiyn
Member

sanxiyn commented Sep 13, 2013

How is this handled in other languages/libraries?

@jfager
Contributor

jfager commented Sep 13, 2013

For escaping, .NET doubles the {, so a literal '{' becomes '{{'. I think the only people this would look weird to are those coming from something like Mustache or the Django template language.

In shell, Perl, Groovy, Scala, and I'm sure others, ${} indicates variable substitution, and Ruby uses #{}, so someone coming from those languages who sees %{ might guess that it indicates a substitution rather than an escape, but that doesn't seem like an insurmountable surprise.

C, C++, ObjC, D, Go, Python, Haskell, Java, Clojure, PHP, and R are all printf-style or derived from it (though Haskell has a .NET-like template package and PHP supports interpolation with $), and JavaScript punts to a billion and a half different templating libraries, so they don't offer much guidance, given that we've already put printf syntax behind us.

Ed: for the record, I'm fine w/ %, or whatever, really. I just want something that works and is easy to explain.

@thestinger
Contributor

I still don't really understand why format! can't just pass \\ through as \\, \n through as \n, etc.

@alexcrichton
Copy link
Member Author

Sorry about that, I should have put more information in the commit message. Here's the situation we're in:

Right now we're using \ as an escape character. The main reason for doing this is that every programmer instantly recognizes \ as an escape character. This does have a serious downside, though. The format string parser does not parse the raw source file, but rather Rust's interpretation of a string literal. This means that the characters you type pass through two parsers: first the Rust parser, then the format string parser. So to escape the { character, you've got something like this:

"\\{" => rust parser => "\{" => format parser => "{"

And that's not even the worst part. If you want a literal \ character in your output string, you have to write it with four backslashes: "\\\\" => rust => "\\" => format => "\". For these reasons, I think \ is basically unusable as an escape character. @thestinger and I had a discussion on IRC where we came to the conclusion that passing \\ through as \\ probably wouldn't work.
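To make the two passes concrete, here is a minimal sketch in present-day Rust. `unescape_backslash` is a hypothetical stand-in for how a format parser would treat literal text under the \ escape scheme; it is not the actual `std::fmt` implementation. The assertions trace both examples from the comment.

```rust
/// Hypothetical second-pass parser: unescapes literal text in a format
/// string that uses `\` as its escape character.
fn unescape_backslash(s: &str) -> Result<String, String> {
    let mut out = String::new();
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            // The character after `\` is taken literally.
            match chars.next() {
                Some(escaped) => out.push(escaped),
                None => return Err("trailing `\\` in format string".to_string()),
            }
        } else {
            out.push(c);
        }
    }
    Ok(out)
}

fn main() {
    // Pass 1 (the Rust lexer): the source text "\\{" becomes the two
    // characters `\` and `{`.
    let brace = "\\{";
    assert_eq!(brace.chars().count(), 2);
    // Pass 2 (the format parser): `\{` becomes a literal `{`.
    assert_eq!(unescape_backslash(brace).unwrap(), "{");

    // A literal backslash: four backslashes in the source become two
    // after lexing, and one after format parsing.
    let backslash = "\\\\";
    assert_eq!(backslash.chars().count(), 2);
    assert_eq!(unescape_backslash(backslash).unwrap(), "\\");
}
```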

So now that we've ruled out \ as an escape character, let's pick the next most natural thing. Python has a pretty good way of doing this: {{ becomes {, and likewise }} becomes }. However, as an excellent example from @huonw on IRC shows, this leads to ambiguities. What should this format string become?

{0, select, other{test }} }}

Basically, we support methods where Python does not (to support internationalization). This means that }} will naturally come up in the grammar as simply closing two blocks. Again, sadly, the standard convention is unusable (although this has a better reason for being unusable than \).
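For contrast, here is a sketch of Python-style doubling in a hypothetical format language where literal text never contains nested blocks. Because no `}` in literal text can ever close a block, a single left-to-right scan is unambiguous; the trouble only appears once `}` can also close a select case. `unescape_doubled` is an illustrative name, and the sketch covers literal text only: a lone brace, which would begin or end an argument, is reported as an error.

```rust
/// Hypothetical unescaper for Python-style doubling: `{{` -> `{` and
/// `}}` -> `}`. On a lone brace, returns its byte offset as the error.
fn unescape_doubled(s: &str) -> Result<String, usize> {
    let mut out = String::new();
    let mut chars = s.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if c == '{' || c == '}' {
            // A brace must be immediately followed by the same brace.
            if chars.peek().map(|&(_, next)| next) == Some(c) {
                chars.next();
                out.push(c);
            } else {
                return Err(i);
            }
        } else {
            out.push(c);
        }
    }
    Ok(out)
}

fn main() {
    assert_eq!(unescape_doubled("{{}}").unwrap(), "{}");
    assert_eq!(unescape_doubled("a {{b}} c").unwrap(), "a {b} c");
    // A lone `}` at byte offset 1 is rejected.
    assert_eq!(unescape_doubled("a}b"), Err(1));
}
```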

So now we're back to square one. The syntaxes that @jfager mentioned were mostly for interpolation, whereas this is about escaping. I do agree that #{} may be easy to conflate with %{ at first sight, but since you will never see %{} in practice, I think a little usage will quickly make it clear that % is an escape character. That being said, I'm still not too fond of using % as an escape character; I would much rather use \. In this situation, though, % has fewer drawbacks than \ in my mind.

@jfager, @sanxiyn, are you guys ok with that explanation? I wish there were a better solution, but I'm drawing blanks :(

@jfager
Contributor

jfager commented Sep 14, 2013

I understand the issue. I only brought up ${} interpolation to make the point about visual similarity. I disagree that people won't ever see %{} - if you're writing out a literal '{', the odds that you're going to write a literal '}' (which wouldn't need escaping, right?) soon after are pretty high - but as I said before I agree that it's a pretty minor speed bump.

Also as I said before, I'm fine w/ %. I was just trying to answer the question about other langs.

@alexcrichton
Member Author

With the syntax that I'm proposing, you would always need the % escape, so you would never see %{}, as it would generate an error about a mismatched } character. Instead, if you wanted to emit the string "{}", you would have to write %{%}, which I think is easy enough to see.
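A minimal sketch of the proposed rule for literal text, assuming exactly three escapes (`%{`, `%}`, and `%%`; the `%%` form is an assumption here) and rejecting any other use of `%` as well as any bare brace. In the real parser a bare `{` would start an argument rather than raise an error; `unescape_percent` and `EscapeError` are illustrative names only.

```rust
#[derive(Debug, PartialEq)]
enum EscapeError {
    /// A `%` not followed by `{`, `}`, or `%` (byte offset of the `%`).
    BadEscape(usize),
    /// A `{` or `}` without a preceding `%` (byte offset of the brace).
    UnescapedBrace(usize),
}

/// Hypothetical unescaper for literal text under the proposed `%` scheme.
fn unescape_percent(s: &str) -> Result<String, EscapeError> {
    let mut out = String::new();
    let mut chars = s.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            '%' => match chars.next() {
                // Assumed escape set: `%{`, `%}`, and `%%`.
                Some((_, e @ ('{' | '}' | '%'))) => out.push(e),
                _ => return Err(EscapeError::BadEscape(i)),
            },
            '{' | '}' => return Err(EscapeError::UnescapedBrace(i)),
            _ => out.push(c),
        }
    }
    Ok(out)
}

fn main() {
    // Emitting the string "{}" requires escaping both braces.
    assert_eq!(unescape_percent("%{%}"), Ok("{}".to_string()));
    assert_eq!(unescape_percent("50%%"), Ok("50%".to_string()));
    // `%{}` is rejected: the `}` is unescaped.
    assert_eq!(unescape_percent("%{}"), Err(EscapeError::UnescapedBrace(2)));
    // A lone `%` is an error.
    assert_eq!(unescape_percent("50%"), Err(EscapeError::BadEscape(2)));
}
```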

@vadimcn
Copy link
Contributor

vadimcn commented Sep 17, 2013

@alexcrichton, can you please explain in more detail what the problem is with "{0, select, other{test }} }}"? If you want that to be a literal string, you simply double each { and }: "{{0, select, other{{test }}}} }}}}", no?

@alexcrichton
Member Author

In the case of {0, select, other{test }} }}, you need to close two brackets. Does the first pair of brackets close them, or does the second pair? Which one counts as the doubled escape? It's ambiguous which is which.

For example, modify it slightly to {0, select, other{test}} }}, then we get:

// first pair is the escape sequence
format!("{0, select, other{test}} }}", "a") // => ~"test} "

// second pair is the escape sequence
format!("{0, select, other{test}} }}", "a") // => ~"test }"

Those two strings are different, yet have the same format string, and we don't know which you specified. Also, if we resort to typing }}}} to escape }, then we're no better off than we were with \\\\ sadly :(
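The ambiguity can be checked mechanically. The sketch below uses a deliberately simplified model (an assumption, not the real `format!` grammar): every non-brace character is emitted as-is, and each `}` either closes one of the currently open blocks or, paired with a following `}`, produces a literal `}`. `brace_readings` is a hypothetical helper that enumerates every reading that ends with all blocks closed. Given the tail of the modified example after `other{`, with two blocks still open, it finds exactly the two outputs listed in the comment.

```rust
/// Collect every way to read `input` when `depth` blocks are still open.
fn brace_readings(input: &[char], depth: usize, out: &mut String, results: &mut Vec<String>) {
    match input.first() {
        // End of input: only valid if every block has been closed.
        None => {
            if depth == 0 {
                results.push(out.clone());
            }
        }
        Some(&'}') => {
            // Reading 1: this `}` closes the innermost open block.
            if depth > 0 {
                brace_readings(&input[1..], depth - 1, out, results);
            }
            // Reading 2: `}}` is an escaped literal `}`.
            if input.get(1) == Some(&'}') {
                out.push('}');
                brace_readings(&input[2..], depth, out, results);
                out.pop();
            }
        }
        // Any other character is emitted literally.
        Some(&c) => {
            out.push(c);
            brace_readings(&input[1..], depth, out, results);
            out.pop();
        }
    }
}

fn main() {
    // Tail of `{0, select, other{test}} }}` after `other{`; the argument
    // block and the `other` case are both still open.
    let tail: Vec<char> = "test}} }}".chars().collect();
    let mut results = Vec::new();
    brace_readings(&tail, 2, &mut String::new(), &mut results);
    println!("{:?}", results); // ["test }", "test} "]
}
```

Two valid readings of one input is exactly the ambiguity: a parser has to pick one by fiat, and a reader of the format string cannot tell which.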

@vadimcn
Copy link
Contributor

vadimcn commented Sep 17, 2013

@alexcrichton: do we have a definitive format! syntax spec somewhere? I must be missing something about what's allowed inside a format string.

@alexcrichton
Member Author

We do indeed at http://static.rust-lang.org/doc/std/fmt.html

@vadimcn
Contributor

vadimcn commented Sep 17, 2013

I see. I would have simply used a different bracket type for select cases, just to avoid creating new special characters that must be escaped inside format strings. I suppose that percent itself will now need to be escaped as "%%"? Or is that only inside the argument specifier? E.g. "{0, select, other{test %} %% } }" -> "test } %", but " {1} %" -> "nn %"?

@alexcrichton
Member Author

A lone % would be an error, it has to be matched with another valid escape sequence.

@vadimcn
Contributor

vadimcn commented Sep 17, 2013

I mean, you only need it as an escape between { and }. Could allow single % in the rest of the format string... though, admittedly, this could be confusing.

@lilyball
Contributor

It seems to me that a simpler solution would be to implement "raw" string literals, which don't process escapes. I'm not seeing any proposals for this at the moment, but I know the idea has come up on IRC before. The benefit of this approach is that "raw" string literals have other uses too.

@alexcrichton
Copy link
Member Author

I, too, would like to see a raw form of string literal, but I don't consider this one use case enough impetus for adding an entirely new string literal format to the language.

@lilyball
Contributor

@alexcrichton Regular expressions and Windows file paths are two more places that could benefit from raw string literals.

@alexcrichton
Copy link
Member Author

Good point. I'm also writing rustdoc_web in Rust, and there's a huge HTML string literal in it which would greatly benefit from another form of string literal.
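Raw string literals did later land in Rust, written `r"..."` (or `r#"..."#` when the text itself contains a `"`). A short sketch of how they remove the first parsing pass for the cases mentioned here; the path, regex, and HTML strings are arbitrary examples, and no regex library is involved.

```rust
fn main() {
    // A raw literal skips escape processing, so this is the same
    // two-character string as the ordinary literal "\\{".
    assert_eq!(r"\{", "\\{");

    // Windows paths and regular-expression text read as written.
    let path = r"C:\Users\example\file.txt";
    assert_eq!(path, "C:\\Users\\example\\file.txt");
    let digits = r"\d+";
    assert_eq!(digits.len(), 3);

    // With `#` delimiters, the literal can also contain `"`.
    let html = r#"<a href="/doc">docs</a>"#;
    assert!(html.contains('"'));
    println!("{}", path);
}
```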

@lilyball
Contributor

I've sent an email about raw string literals to the mailing list.

@alexcrichton
Member Author

I'm putting this on the backburner until the raw string discussion is resolved. I would rather not do this, and raw strings would mean that this would not necessarily have to be done.

I'll reopen if raw strings don't become a thing.

flip1995 pushed a commit to flip1995/rust that referenced this pull request Jul 18, 2022
…ring_to_pedantic, r=flip1995

Move format_push_string to restriction

Fixes rust-lang#9077 (kinda) by moving the lint to the restriction group. As I noted in that issue, I think the suggested change is too much and as the OP of the issue points out, the ramifications of the change are not necessarily easily understood. As such I don't think the lint should be enabled by default.

changelog: [`format_push_string`]: moved to restriction (see rust-lang#9077).

6 participants