Commit 141fd0a

Disallow struct literals in ambiguous positions.
Do not identify struct literals by searching for `:`. Instead define a sub-category of expressions which excludes struct literals and re-define `for`, `if`, and other expressions which take an expression followed by a block (or non-terminal which can be replaced by a block) to take this sub-category, instead of all expressions.
1 parent 0fc8bc0 commit 141fd0a

1 file changed: +100 -0 lines changed

0000-struct-grammar.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
- Start Date:
- RFC PR #:
- Rust Issue #:

# Summary

Do not identify struct literals by searching for `:`. Instead define a
sub-category of expressions which excludes struct literals and re-define `for`,
`if`, and other expressions which take an expression followed by a block (or
non-terminal which can be replaced by a block) to take this sub-category,
instead of all expressions.

# Motivation

Parsing by looking ahead is fragile: it could easily be broken if we allow `:`
to appear elsewhere in types (e.g., type ascription) or if we change struct
literals to not require the `:` (e.g., if we allow empty structs to be written
with braces, or if we allow struct literals to unify field names to local
variable names, as has been suggested in the past and which we currently do for
struct literal patterns). We should also be able to give better error messages
today if users make these mistakes. More worryingly, we might come up with some
language feature in the future which is not predictable now and which breaks
with the current system.

Hopefully, it is pretty rare to use struct literals in these positions, so there
should not be much fallout. Any problems can be easily fixed by assigning the
struct literal to a variable. However, this is a backwards-incompatible
change, so it should block 1.0.
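As a minimal sketch of that fix (the `Point` type and `is_origin` function are hypothetical names chosen for illustration, not part of the RFC), the struct literal is bound to a variable before it is used in the `if` condition:

```rust
// Hypothetical example type; any struct with named fields would do.
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn is_origin(p: &Point) -> bool {
    // Writing the literal directly in the condition, as in
    //     if *p == Point { x: 0, y: 0 } { ... }
    // puts a struct literal immediately before a block, which is the
    // position this RFC proposes to disallow.
    // Binding the literal to a variable first avoids the problem.
    let origin = Point { x: 0, y: 0 };
    if *p == origin {
        true
    } else {
        false
    }
}
```

The same rewrite should apply to the iterated expression of a `for` loop or any other expression that is followed directly by a block.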
29+
30+
# Detailed design
31+
32+
Here is a simplified version of a subset of Rust's abstract syntax:
33+
34+
```
35+
e ::= x
36+
| e `.` f
37+
| name `{` (x `:` e)+ `}`
38+
| block
39+
| `for` e `in` e block
40+
| `if` e block (`else` block)?
41+
| `|` pattern* `|` e
42+
| ...
43+
block ::= `{` (e;)* e? `}`
44+
```
45+
46+
Parsing this grammar is ambiguous since `x` cannot be distinguished from `name`,
47+
so `e block` in the for expression is ambiguous with the struct literal
48+
expression. We currently solve this by using lookahead to find a `:` token in
49+
the struct literal.
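The lookahead approach can be sketched roughly as follows. This is an illustrative toy, not the compiler's actual code: the `Tok` type and the `looks_like_struct_literal` function are assumptions made up for this example. Given the tokens that follow a name, it guesses "struct literal" only when it sees `{ ident :`.

```rust
// Hypothetical, heavily simplified token type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    LBrace,
    RBrace,
    Colon,
}

// Lookahead heuristic: `rest` holds the tokens that follow a name.
// Treat them as the start of a struct literal only if they begin
// with `{`, an identifier, and then `:`.
fn looks_like_struct_literal(rest: &[Tok]) -> bool {
    matches!(rest, [Tok::LBrace, Tok::Ident(_), Tok::Colon, ..])
}

// Small helper to build identifier tokens.
fn id(s: &str) -> Tok {
    Tok::Ident(s.to_string())
}
```

With this heuristic, `{ a : b }` after a name is classified as a struct literal and `{ f }` as a block, but so is `{ }`, which shows why a brace-only empty struct literal or a field shorthand without `:` would defeat the lookahead.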

I propose the following adjustment:

```
e      ::= e'
         | name `{` (x `:` e)+ `}`
         | `|` pattern* `|` e
         | ...
e'     ::= x
         | e `.` f
         | block
         | `for` e `in` e' block
         | `if` e' block (`else` block)?
         | `|` pattern* `|` e'
         | ...
block  ::= `{` (e;)* e? `}`
```

`e'` is just `e` without struct literal expressions. We use `e'` instead of `e`
wherever `e` is followed directly by block or any other non-terminal which may
have block as its first terminal (after any possible expansions).

For any expressions where a sub-expression is the final lexical element
(closures in the subset above, but also unary and binary operations), we require
two versions of the meta-expression: the normal one in `e` and a version with
`e'` for the final element in `e'`.

Implementation would be simpler: we just add a flag to `parser::restriction`,
called `RESTRICT_BLOCK` or something similar, which puts the parser into a mode
that reflects `e'`. We would drop into this mode when parsing expressions in
`e'` position and drop out of it for all but the last sub-expression of an
expression.
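A toy recursive-descent parser can illustrate the flag-based approach. Everything in this sketch is an assumption for illustration: the `Tok`, `Expr`, `Restriction`, and `Parser` types are invented here, the grammar is cut down to variables, struct literals, and `if`, and struct literals may have zero fields. It is not the compiler's parser.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok { If, Ident(String), LBrace, RBrace, Colon }

#[derive(Debug, PartialEq)]
enum Expr {
    Var(String),
    Struct(String, Vec<(String, Expr)>),
    If(Box<Expr>, Option<Box<Expr>>), // condition, optional block body
}

// Stand-in for the proposed restriction flag: RestrictBlock means `e'`.
#[derive(Clone, Copy, PartialEq)]
enum Restriction { Unrestricted, RestrictBlock }

struct Parser { toks: Vec<Tok>, pos: usize }

impl Parser {
    fn peek(&self) -> Option<&Tok> { self.toks.get(self.pos) }

    fn bump(&mut self) -> Option<Tok> {
        let t = self.toks.get(self.pos).cloned();
        self.pos += 1;
        t
    }

    fn expect(&mut self, want: Tok) -> Result<(), String> {
        let got = self.bump();
        if got.as_ref() == Some(&want) {
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", want, got))
        }
    }

    fn parse_expr(&mut self, r: Restriction) -> Result<Expr, String> {
        match self.bump() {
            Some(Tok::If) => {
                // The condition is followed directly by a block, so it is
                // parsed in the restricted (`e'`) mode.
                let cond = self.parse_expr(Restriction::RestrictBlock)?;
                let body = self.parse_block()?;
                Ok(Expr::If(Box::new(cond), body))
            }
            Some(Tok::Ident(name)) => {
                // No lookahead for `:`; the mode alone decides whether
                // `name {` starts a struct literal.
                if r == Restriction::Unrestricted && self.peek() == Some(&Tok::LBrace) {
                    self.parse_struct_fields(name)
                } else {
                    Ok(Expr::Var(name))
                }
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }

    fn parse_block(&mut self) -> Result<Option<Box<Expr>>, String> {
        self.expect(Tok::LBrace)?;
        // Inside braces the restriction no longer applies.
        let body = if self.peek() == Some(&Tok::RBrace) {
            None
        } else {
            Some(Box::new(self.parse_expr(Restriction::Unrestricted)?))
        };
        self.expect(Tok::RBrace)?;
        Ok(body)
    }

    fn parse_struct_fields(&mut self, name: String) -> Result<Expr, String> {
        self.expect(Tok::LBrace)?;
        let mut fields = Vec::new();
        while self.peek() != Some(&Tok::RBrace) {
            let field = match self.bump() {
                Some(Tok::Ident(f)) => f,
                other => return Err(format!("expected field name, found {:?}", other)),
            };
            self.expect(Tok::Colon)?;
            fields.push((field, self.parse_expr(Restriction::Unrestricted)?));
        }
        self.expect(Tok::RBrace)?;
        Ok(Expr::Struct(name, fields))
    }
}

// Parse a whole token stream as one unrestricted expression.
fn parse(toks: Vec<Tok>) -> Result<Expr, String> {
    let mut p = Parser { toks, pos: 0 };
    let e = p.parse_expr(Restriction::Unrestricted)?;
    if p.pos == p.toks.len() { Ok(e) } else { Err("trailing tokens".to_string()) }
}

// Small helper to build identifier tokens.
fn id(s: &str) -> Tok { Tok::Ident(s.to_string()) }
```

In this sketch, `if x { y }` parses with `x` as a variable, `S { a: b }` at the top level parses as a struct literal, and `if S { a: b } { c }` is rejected because `S` is read as a variable and `{ a: b }` is not a valid block. The toy grammar has no unary or binary operators, so it does not show the rule about dropping out of the restricted mode for all but the last sub-expression.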

# Drawbacks

It makes the formal grammar and parsing a little more complicated (although it
is simpler in terms of needing less lookahead and avoiding a special case).

# Alternatives

Don't do this.

Allow all expressions but greedily parse non-terminals in these positions, e.g.,
`for N {} {}` would be parsed as `for (N {}) {}`. This seems worse because I
believe it will be much rarer to have structs in these positions than to have an
identifier in the first position, followed by two blocks (i.e., parse as
`(for N {}) {}`).

# Unresolved questions

Do we need to expose this distinction anywhere outside of the parser? E.g.,
macros?