| 1 | +- Start Date: |
| 2 | +- RFC PR #: |
| 3 | +- Rust Issue #: |
| 4 | + |
| 5 | +# Summary |
| 6 | + |
| 7 | +Do not identify struct literals by searching for `:`. Instead define a sub- |
| 8 | +category of expressions which excludes struct literals and re-define `for`, |
| 9 | +`if`, and other expressions which take an expression followed by a block (or |
| 10 | +non-terminal which can be replaced by a block) to take this sub-category, |
| 11 | +instead of all expressions. |
| 12 | + |
| 13 | +# Motivation |
| 14 | + |
| 15 | +Parsing by looking ahead is fragile - it could easily be broken if we allow `:` |
| 16 | +to appear elsewhere in types (e.g., type ascription) or if we change struct |
| 17 | +literals to not require the `:` (e.g., if we allow empty structs to be written |
| 18 | +with braces, or if we allow struct literals to unify field names to local |
| 19 | +variable names, as has been suggested in the past and which we currently do for |
| 20 | +struct literal patterns). We should also be able to give better error messages |
| 21 | +today if users make these mistakes. More worryingly, we might come up with some |
| 22 | +language feature in the future which is not predictable now and which breaks |
| 23 | +with the current system. |
| 24 | + |
| 25 | +Hopefully, it is pretty rare to use struct literals in these positions, so there |
| 26 | +should not be much fallout. Any problems can be easily fixed by assigning the |
| 27 | +struct literal into a variable. However, this is a backwards incompatible |
| 28 | +change, so it should block 1.0. |
| 29 | + |
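As a rough illustration of the fix described above (binding the struct literal to a variable first), here is a minimal sketch. The `Point` type and the `describe` function are hypothetical names invented for this example; they do not come from the RFC.

```rust
// Hypothetical type used only for illustration.
#[derive(PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn describe(p: &Point) -> &'static str {
    // Under the proposal, a struct literal directly in the condition, e.g.
    //     if *p == Point { x: 0, y: 0 } { ... }
    // would no longer parse, because the condition is followed by a block.
    // Binding the literal to a local first keeps the condition free of
    // struct literals.
    let origin = Point { x: 0, y: 0 };
    if *p == origin {
        "origin"
    } else {
        "elsewhere"
    }
}
```

Calling `describe(&Point { x: 0, y: 0 })` is unaffected, since a function argument is not followed by a block.
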
| 30 | +# Detailed design |
| 31 | + |
| 32 | +Here is a simplified version of a subset of Rust's abstract syntax: |
| 33 | + |
| 34 | +``` |
| 35 | +e ::= x |
| 36 | + | e `.` f |
| 37 | + | name `{` (x `:` e)+ `}` |
| 38 | + | block |
| 39 | + | `for` e `in` e block |
| 40 | + | `if` e block (`else` block)? |
| 41 | + | `|` pattern* `|` e |
| 42 | + | ... |
| 43 | +block ::= `{` (e;)* e? `}` |
| 44 | +``` |
| 45 | + |
| 46 | +Parsing this grammar is ambiguous since `x` cannot be distinguished from `name`, |
| 47 | +so `e block` in the for expression is ambiguous with the struct literal |
| 48 | +expression. We currently solve this by using lookahead to find a `:` token in |
| 49 | +the struct literal. |
| 50 | + |
| 51 | +I propose the following adjustment: |
| 52 | + |
| 53 | +``` |
| 54 | +e ::= e' |
| 55 | + | name `{` (x `:` e)+ `}` |
| 56 | + | `|` pattern* `|` e |
| 57 | + | ... |
| 58 | +e' ::= x |
| 59 | + | e `.` f |
| 60 | + | block |
| 61 | + | `for` e `in` e' block |
| 62 | + | `if` e' block (`else` block)? |
| 63 | + | `|` pattern* `|` e' |
| 64 | + | ... |
| 65 | +block ::= `{` (e;)* e? `}` |
| 66 | +``` |
| 67 | + |
| 68 | +`e'` is just `e` without struct literal expressions. We use `e'` instead of `e` |
| 69 | +wherever `e` is followed directly by `block` or any other non-terminal which may |
| 70 | +have `block` as its first terminal (after any possible expansions). |
| 71 | + |
| 72 | +For any expressions where a sub-expression is the final lexical element |
| 73 | +(closures in the subset above, but also unary and binary operations), we require |
| 74 | +two versions of the meta-expression - the normal one in `e` and a version with |
| 75 | +`e'` for the final element in `e'`. |
| 76 | + |
| 77 | +Implementation would be simpler than the grammar suggests: we just add a flag to |
| 78 | +`parser::restriction` called `RESTRICT_BLOCK` or something, which puts us into a |
| 79 | +mode which reflects `e'`. We would drop into this mode when parsing `e'` position |
| 80 | +expressions and drop out of it for all but the last sub-expression of an expression. |
| 81 | + |
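The flag-based approach can be sketched with a toy recursive-descent parser. This is not the rustc parser: the `Parser` type, the `no_struct_literal` field (standing in for the proposed restriction flag), and the whitespace-separated token format are all assumptions made for illustration. It covers only identifiers, struct literals, blocks, and `if`, so the "last sub-expression" rule for closures and operators is not exercised.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Var(String),
    StructLit(String, Vec<(String, Expr)>),
    Block(Option<Box<Expr>>),
    If(Box<Expr>, Box<Expr>),
}

struct Parser<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
    // Hypothetical stand-in for the proposed restriction flag: when set,
    // we are in an e' position and must not parse `name { ... }` as a literal.
    no_struct_literal: bool,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        // Tokens are assumed to be separated by whitespace, to keep this short.
        Parser { tokens: src.split_whitespace().collect(), pos: 0, no_struct_literal: false }
    }

    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    fn expect(&mut self, tok: &str) -> Result<(), String> {
        if self.peek() == Some(tok) {
            self.pos += 1;
            Ok(())
        } else {
            Err(format!("expected `{}`, found {:?}", tok, self.peek()))
        }
    }

    // Parse an expression with the flag set to `restricted`, then restore it.
    fn expr_with(&mut self, restricted: bool) -> Result<Expr, String> {
        let saved = std::mem::replace(&mut self.no_struct_literal, restricted);
        let result = self.expr();
        self.no_struct_literal = saved;
        result
    }

    fn block(&mut self) -> Result<Expr, String> {
        self.expect("{")?;
        // Inside braces the restriction is lifted again.
        let body = if self.peek() == Some("}") {
            None
        } else {
            Some(Box::new(self.expr_with(false)?))
        };
        self.expect("}")?;
        Ok(Expr::Block(body))
    }

    fn expr(&mut self) -> Result<Expr, String> {
        match self.peek() {
            Some("if") => {
                self.pos += 1;
                let cond = self.expr_with(true)?; // e' position: followed by a block
                let body = self.block()?;
                Ok(Expr::If(Box::new(cond), Box::new(body)))
            }
            Some("{") => self.block(),
            Some(name) if name != "}" && name != ":" && name != "," => {
                self.pos += 1;
                if self.peek() == Some("{") && !self.no_struct_literal {
                    self.struct_fields(name)
                } else {
                    Ok(Expr::Var(name.to_string()))
                }
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }

    fn struct_fields(&mut self, name: &str) -> Result<Expr, String> {
        self.expect("{")?;
        let mut fields = Vec::new();
        while self.peek() != Some("}") {
            let field = self.peek().ok_or("unexpected end of input")?.to_string();
            self.pos += 1;
            self.expect(":")?;
            fields.push((field, self.expr_with(false)?));
            if self.peek() == Some(",") {
                self.pos += 1;
            }
        }
        self.expect("}")?;
        Ok(Expr::StructLit(name.to_string(), fields))
    }
}

fn parse(src: &str) -> Result<Expr, String> {
    let mut p = Parser::new(src);
    let e = p.expr()?;
    if p.pos == p.tokens.len() {
        Ok(e)
    } else {
        Err(format!("trailing token {:?}", p.peek()))
    }
}
```

With this sketch, `parse("if N { x }")` treats `N` as a variable and `{ x }` as the block without any lookahead for `:`, while `parse("N { a : b }")` outside an `e'` position still yields a struct literal. Note that this toy accepts an empty `N { }` as a struct literal, which the grammar above does not; that is one of the possible future extensions mentioned in the motivation.
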
| 82 | +# Drawbacks |
| 83 | + |
| 84 | +It makes the formal grammar and parsing a little more complicated (although it |
| 85 | +is simpler in terms of needing less lookahead and avoiding a special case). |
| 86 | + |
| 87 | +# Alternatives |
| 88 | + |
| 89 | +Don't do this. |
| 90 | + |
| 91 | +Allow all expressions but greedily parse non-terminals in these positions, e.g., |
| 92 | +`for N {} {}` would be parsed as `for (N {}) {}`. This seems worse because I |
| 93 | +believe it will be much rarer to have structs in these positions than to have an |
| 94 | +identifier in the first position, followed by two blocks (i.e., parse as `(for N |
| 95 | +{}) {}`). |
| 96 | + |
| 97 | +# Unresolved questions |
| 98 | + |
| 99 | +Do we need to expose this distinction anywhere outside of the parser? E.g., |
| 100 | +macros? |