Description
TextMate seems to lack a means to indicate that a pattern is to be matched only once, similar to the regex `?` quantifier. Without this, some languages are difficult to scope correctly.
Take PowerShell, for instance, which doesn't really have any reserved keywords. PowerShell does support the `if`, `elseif` and `else` keywords for flow control, and, as in many languages, `else` is only supported once after an `if` statement. However, because keywords are not reserved, `elseif` and `else` can be reused as command names, and the only thing that determines which is meant is context. Within the context of an `if` statement, `elseif` and `else` serve as keywords, with `else`, or anything that is not `elseif`, terminating the `if` context. Additionally, a `()` condition group and a `{}` statement block are required for `if` and `elseif`, and a `{}` statement block is required for `else` (though that matters less, since the `if` context has effectively already been terminated).
```powershell
# general syntax (not including comments and line breaks)
if (condition) {statement} elseif (condition-elseif) {statement-elseif} else {statement-else}

# demonstration
# `else another-command` is actually another command (native executable or user defined function)
if(1){do-something}else{do-something else}else another-command
```
To further complicate things, the state in which this statement is reached is unknown, as there are multiple ways to arrive at it. This rules out a forceful exit strategy, whereby the grammar could force a specific rule to keep progressing until it is clearly safe to back out of the current stack.
Example
```powershell
# hashtable using `if` in assignment
@{
    key = if (cond) {statement}
    key2 = if (cond) {statement} else {statement else}
    else = 'const' # else here is valid hashtable literal key name, because `if` context ended above.
    key3 = if (condition) {statement}
    else {another statement} # `if` context was not closed in previous line so it continued to this line
}
```
(Note that the grammar GitHub uses only differentiates the `else` hash key above based on the presence of an `=`, not on the context of the `if` statement.)
I cannot find a means to formulate a TextMate grammar that can properly describe this situation. I think this is due to the lack of a property on each `begin` or `match` rule, such as `applyPatternOnce`, which would limit the pattern to matching only once in the current stack scope.
- `else` should only be allowed once per `if` context.
- The `()` condition should be required, but only once per `if` or `elseif` subcontext.
- The `{}` statement block should be required, but only once per `if`, `elseif` or `else` subcontext, and only after the `()` condition for `if` and `elseif`.
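
To make the request concrete, here is a rough sketch of how such a property might look inside a rule. `applyPatternOnce` is hypothetical: it is the property proposed in this issue, not an existing TextMate grammar feature, and its exact semantics here are an assumption. The sketch also leaves out whitespace and comment handling, and it does not try to enforce that the `{}` block comes after the `()` group.

```json
{
  "comment": "HYPOTHETICAL: `applyPatternOnce` is the property proposed in this issue and does not exist in TextMate grammars today",
  "begin": "(?i:if)(?=[\\s{(,;&|)}])",
  "beginCaptures": {
    "0": {
      "name": "keyword.control.if.powershell"
    }
  },
  "end": "(?=.|$)",
  "applyEndPatternLast": true,
  "patterns": [
    {
      "comment": "condition group: assumed to match at most once in this stack scope",
      "begin": "\\(",
      "end": "\\)",
      "applyPatternOnce": true,
      "name": "meta.if-condition.powershell"
    },
    {
      "comment": "statement block: assumed to match at most once in this stack scope",
      "begin": "\\{",
      "end": "\\}",
      "applyPatternOnce": true,
      "name": "meta.statements.if-condition.powershell"
    }
  ]
}
```

Under that assumed behavior, once each sub-rule had matched, nothing further could match and the lookahead `end` pattern would close the `if` context, so a second `(` or `{` would be scoped by whatever rule encloses it.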
Grammar constructed so far (partial file only; if testing, use only empty conditions and empty statement blocks, and no comments, since I am not including those subsections because they are not relevant to this issue):
```json
{
  "patterns": [
    {
      "comment": "else,elseif: only after if,elseif",
      "begin": "(?i)(?=if[\\s{(,;&|)}])",
      "end": "(?!\\G)",
      "patterns": [
        {
          "include": "#ifStatement"
        }
      ]
    }
  ],
  "repository": {
    "ifStatement": {
      "comment": "else,elseif: only after if,elseif",
      "begin": "\\G(?i:(if)|(elseif))(?=[\\s{(,;&|)}])",
      "beginCaptures": {
        "1": {
          "name": "keyword.control.if.powershell"
        },
        "2": {
          "name": "keyword.control.if-elseif.powershell"
        }
      },
      "end": "(?=.|$)",
      "applyEndPatternLast": true,
      "patterns": [
        {
          "include": "#advanceToToken"
        },
        {
          "begin": "(?<![)}])(?=\\()",
          "end": "(?=.|$)",
          "applyEndPatternLast": true,
          "patterns": [
            {
              "begin": "\\G\\(",
              "beginCaptures": {
                "0": {
                  "name": "punctuation.section.group.begin.powershell"
                }
              },
              "end": "\\)",
              "endCaptures": {
                "0": {
                  "name": "punctuation.section.group.end.powershell"
                }
              },
              "name": "meta.if-condition.powershell",
              "patterns": [
                {
                  "comment": "`;` not permitted here",
                  "match": ";",
                  "name": "invalid.source.powershell"
                },
                {
                  "include": "#command_mode"
                }
              ]
            },
            {
              "begin": "(?<=\\))(?=[\\s#]|<#|`\\s|{)",
              "end": "(?=.|$)",
              "applyEndPatternLast": true,
              "patterns": [
                {
                  "include": "#advanceToToken"
                },
                {
                  "begin": "(?<!})(?={)",
                  "end": "(?=.|$)",
                  "applyEndPatternLast": true,
                  "patterns": [
                    {
                      "begin": "\\G\\{",
                      "beginCaptures": {
                        "0": {
                          "name": "punctuation.section.braces.begin.powershell"
                        }
                      },
                      "end": "}",
                      "endCaptures": {
                        "0": {
                          "name": "punctuation.section.braces.end.powershell"
                        }
                      },
                      "name": "meta.statements.if-condition.powershell",
                      "patterns": [
                        {
                          "include": "$self"
                        }
                      ]
                    },
                    {
                      "begin": "(?<=})(?=[\\s#]|<#|`\\s)",
                      "end": "(?=.|$)",
                      "applyEndPatternLast": true,
                      "patterns": [
                        {
                          "include": "#advanceToToken"
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        },
        {
          "begin": "(?i:else)(?=[\\s{(,;&|)}])",
          "beginCaptures": {
            "0": {
              "name": "keyword.control.if-else.powershell"
            }
          },
          "end": "(?=.|$)",
          "applyEndPatternLast": true,
          "patterns": [
            {
              "include": "#advanceToToken"
            },
            {
              "begin": "(?<!}){",
              "beginCaptures": {
                "0": {
                  "name": "punctuation.section.braces.begin.powershell"
                }
              },
              "end": "}",
              "endCaptures": {
                "0": {
                  "name": "punctuation.section.braces.end.powershell"
                }
              },
              "name": "meta.statements.if-else-condition.powershell",
              "patterns": [
                {
                  "include": "$self"
                }
              ]
            }
          ]
        },
        {
          "begin": "(?i)(?=elseif[\\s{(,;&|)}])",
          "end": "(?!\\G)",
          "patterns": [
            {
              "include": "#ifStatement"
            }
          ]
        }
      ]
    },
    "advanceToToken": {
      "comment": "consume spaces and comments and line ends until the next token appears",
      "begin": "\\G(?=[\\s#]|<#|`\\s)",
      "end": "(?!\\s)(?!$)",
      "applyEndPatternLast": true,
      "patterns": [
        {
          "comment": "useless escape, and doesn't count as a token",
          "match": "`\\s",
          "name": "invalid.character.escape.powershell"
        },
        {
          "include": "#commentLine"
        },
        {
          "include": "#commentBlock"
        }
      ]
    }
  }
}
```
For reference, the complete grammar is at: https://github.com/msftrncs/PowerShell.tmLanguage/blob/argumentmode_2ndtry/powershell.tmLanguage.json