@@ -90,44 +90,122 @@ Notation

.. index:: BNF, grammar, syntax, notation

- The descriptions of lexical analysis and syntax use a modified
- `Backus–Naur form (BNF) <https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form>`_ grammar
- notation. This uses the following style of definition:
-
- .. productionlist:: notation
-    name: `lc_letter` (`lc_letter` | "_")*
-    lc_letter: "a"..."z"
-
- The first line says that a ``name`` is an ``lc_letter`` followed by a sequence
- of zero or more ``lc_letter``\ s and underscores. An ``lc_letter`` in turn is
- any of the single characters ``'a'`` through ``'z'``. (This rule is actually
- adhered to for the names defined in lexical and grammar rules in this document.)
-
- Each rule begins with a name (which is the name defined by the rule) and
- ``::=``. A vertical bar (``|``) is used to separate alternatives; it is the
- least binding operator in this notation. A star (``*``) means zero or more
- repetitions of the preceding item; likewise, a plus (``+``) means one or more
- repetitions, and a phrase enclosed in square brackets (``[ ]``) means zero or
- one occurrences (in other words, the enclosed phrase is optional). The ``*``
- and ``+`` operators bind as tightly as possible; parentheses are used for
- grouping. Literal strings are enclosed in quotes. White space is only
- meaningful to separate tokens. Rules are normally contained on a single line;
- rules with many alternatives may be formatted alternatively with each line after
- the first beginning with a vertical bar.
-
- .. index:: lexical definitions, ASCII
-
- In lexical definitions (as the example above), two more conventions are used:
- Two literal characters separated by three dots mean a choice of any single
- character in the given (inclusive) range of ASCII characters. A phrase between
- angular brackets (``<...>``) gives an informal description of the symbol
- defined; e.g., this could be used to describe the notion of 'control character'
- if needed.
-
- Even though the notation used is almost the same, there is a big difference
- between the meaning of lexical and syntactic definitions: a lexical definition
- operates on the individual characters of the input source, while a syntax
- definition operates on the stream of tokens generated by the lexical analysis.
- All uses of BNF in the next chapter ("Lexical Analysis") are lexical
- definitions; uses in subsequent chapters are syntactic definitions.
-
+ The descriptions of lexical analysis and syntax use a grammar notation that
+ is a mixture of
+ `EBNF <https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
+ and `PEG <https://en.wikipedia.org/wiki/Parsing_expression_grammar>`_.
+ For example:
+
+ .. grammar-snippet::
+    :group: notation
+
+    name: `letter` (`letter` | `digit` | "_")*
+    letter: "a"..."z" | "A"..."Z"
+    digit: "0"..."9"
+
+ In this example, the first line says that a ``name`` is a ``letter`` followed
+ by a sequence of zero or more ``letter``\ s, ``digit``\ s, and underscores.
+ A ``letter`` in turn is any of the single characters ``'a'`` through
+ ``'z'`` and ``A`` through ``Z``; a ``digit`` is a single character from ``0``
+ to ``9``.
+
+ Each rule begins with a name (which identifies the rule that's being defined)
+ followed by a colon, ``:``.
+ The definition to the right of the colon uses the following syntax elements:
+
+ * ``name``: A name refers to another rule.
+   Where possible, it is a link to the rule's definition.
+
+ * ``TOKEN``: An uppercase name refers to a :term:`token`.
+   For the purposes of grammar definitions, tokens are the same as rules.
+
+ * ``"text"``, ``'text'``: Text in single or double quotes must match literally
+   (without the quotes). The type of quote is chosen according to the meaning
+   of ``text``:
+
+   * ``'if'``: A name in single quotes denotes a :ref:`keyword <keywords>`.
+   * ``"case"``: A name in double quotes denotes a
+     :ref:`soft-keyword <soft-keywords>`.
+   * ``'@'``: A non-letter symbol in single quotes denotes an
+     :py:data:`~token.OP` token, that is, a :ref:`delimiter <delimiters>` or
+     :ref:`operator <operators>`.
+
+ * ``e1 e2``: Items separated only by whitespace denote a sequence.
+   Here, ``e1`` must be followed by ``e2``.
+ * ``e1 | e2``: A vertical bar is used to separate alternatives.
+   It denotes PEG's "ordered choice": if ``e1`` matches, ``e2`` is
+   not considered.
+   In traditional PEG grammars, this is written as a slash, ``/``, rather than
+   a vertical bar.
+   See :pep:`617` for more background and details.
+ * ``e*``: A star means zero or more repetitions of the preceding item.
+ * ``e+``: Likewise, a plus means one or more repetitions.
+ * ``[e]``: A phrase enclosed in square brackets means zero or
+   one occurrences. In other words, the enclosed phrase is optional.
+ * ``e?``: A question mark has exactly the same meaning as square brackets:
+   the preceding item is optional.
+ * ``(e)``: Parentheses are used for grouping.
+ * ``"a"..."z"``: Two literal characters separated by three dots mean a choice
+   of any single character in the given (inclusive) range of ASCII characters.
+   This notation is only used in
+   :ref:`lexical definitions <notation-lexical-vs-syntactic>`.
+ * ``<...>``: A phrase between angular brackets gives an informal description
+   of the matched symbol (for example, ``<any ASCII character except "\">``),
+   or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
+   This notation is only used in
+   :ref:`lexical definitions <notation-lexical-vs-syntactic>`.
+
+ The unary operators (``*``, ``+``, ``?``) bind as tightly as possible;
+ the vertical bar (``|``) binds most loosely.
+
+ White space is only meaningful to separate tokens.
+
+ Rules are normally contained on a single line, but rules that are too long
+ may be wrapped:
+
+ .. grammar-snippet::
+    :group: notation
+
+    literal: stringliteral | bytesliteral
+             | integer | floatnumber | imagnumber
+
+ Alternatively, rules may be formatted with the first line ending at the colon,
+ and each alternative beginning with a vertical bar on a new line.
+ For example:
+
+
+ .. grammar-snippet::
+    :group: notation-alt
+
+    literal:
+       | stringliteral
+       | bytesliteral
+       | integer
+       | floatnumber
+       | imagnumber
+
+ This does *not* mean that there is an empty first alternative.
+
+ .. index:: lexical definitions
+
+ .. _notation-lexical-vs-syntactic:
+
+ Lexical and Syntactic definitions
+ ---------------------------------
+
+ There is some difference between *lexical* and *syntactic* analysis:
+ the :term:`lexical analyzer` operates on the individual characters of the
+ input source, while the *parser* (syntactic analyzer) operates on the stream
+ of :term:`tokens <token>` generated by the lexical analysis.
+ However, in some cases the exact boundary between the two phases is a
+ CPython implementation detail.
+
+ The practical difference between the two is that in *lexical* definitions,
+ all whitespace is significant.
+ The lexical analyzer :ref:`discards <whitespace>` all whitespace that is not
+ converted to tokens like :data:`token.INDENT` or :data:`~token.NEWLINE`.
+ *Syntactic* definitions then use these tokens, rather than source characters.
+
+ This documentation uses the same BNF grammar for both styles of definitions.
+ All uses of BNF in the next chapter (:ref:`lexical`) are lexical definitions;
+ uses in subsequent chapters are syntactic definitions.
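
As a rough illustration of the notation added in this hunk, the example ``name`` rule and the ordered-choice reading of ``|`` might be sketched in Python as follows. This is a hand transcription for illustration only, not how CPython's tokenizer or parser is implemented. The helper names ``is_name`` and ``ordered_choice`` are hypothetical, and the ``name`` rule here is the documentation's toy example, not Python's actual identifier rule.

```python
import re

# Hand transcription of the example rules from the notation section:
#   name:   letter (letter | digit | "_")*
#   letter: "a"..."z" | "A"..."Z"
#   digit:  "0"..."9"
LETTER = r"[a-zA-Z]"   # "a"..."z" | "A"..."Z"
DIGIT = r"[0-9]"       # "0"..."9"
NAME = re.compile(rf"{LETTER}(?:{LETTER}|{DIGIT}|_)*")


def is_name(text):
    """Return True if the whole string matches the example ``name`` rule."""
    return NAME.fullmatch(text) is not None


def ordered_choice(text, alternatives):
    """Sketch of PEG ordered choice over literal alternatives.

    Tries each literal in order and returns the first one that is a
    prefix of ``text``; later alternatives are not considered once an
    earlier one matches. Returns None if nothing matches.
    """
    for literal in alternatives:
        if text.startswith(literal):
            return literal
    return None
```

Under this sketch, ``is_name("spam_1")`` is true, while ``is_name("_spam")`` is false because the example rule requires a leading ``letter``. The order of alternatives matters for ``ordered_choice``: with the input ``"ifx"``, the alternatives ``["i", "if"]`` yield ``"i"``, whereas ``["if", "i"]`` yield ``"if"``.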