TextMate lacks means to match a pattern exactly one time #114

Open
@msftrncs

TextMate seems to lack a means to indicate that a pattern may be matched at most one time, similar to the regex ? quantifier. Without this, some languages are difficult to scope correctly.

Take PowerShell, for instance, which doesn't really have any reserved keywords. PowerShell supports the if, elseif and else keywords for flow control and, as in many languages, else is allowed only once after an if statement. However, because keywords are not reserved, elseif and else can be reused as command names, and the only thing that determines which role they play is context. Within the context of an if statement, elseif and else serve as keywords, and else, or anything that is not elseif, terminates the if context. Additionally, a () condition group and a {} statement block are required for if and elseif, and a {} statement block is required for else (though that matters less, since the if context has effectively already been terminated).

# general syntax (not including comments and line breaks)
if (condition) {statement} elseif (condition-elseif) {statement-elseif} else {statement-else}

# demonstration
# `else another-command` is actually another command (native executable or user defined function)
if(1){do-something}else{do-something else}else another-command

To further complicate things, the grammar state in which this statement is reached is unknown, as there are multiple ways to arrive at it. This rules out a forceful exit strategy, whereby the grammar forces a specific rule to progress until it is clear to back out of the current stack.

Example

# hashtable using `if` in assignment
@{
    key = if (cond) {statement}
    key2 = if (cond) {statement} else {statement else}
    else = 'const' # else here is valid hashtable literal key name, because `if` context ended above.
    key3 = if (condition) {statement}
    else {another statement} # `if` context was not closed on the previous line, so it continues onto this line
}

(Note that the grammar GitHub uses differentiates the else hash key above only by the presence of an =, not by the context of the if statement.)

I cannot find a way to formulate a TextMate grammar that can properly describe this situation. I think this is due to the lack of a property on each begin or match rule, such as applyPatternOnce, which would limit the pattern to matching only once in the current stack scope.

  • else should only be allowed once per if context.
  • () condition should be required but only once per if or elseif subcontext.
  • {} statement block should be required, but only once per if, elseif or else subcontext, and for if and elseif only after the () condition.
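
To make the request concrete, here is a rough sketch of how an if rule might read if such a property existed. Everything about applyPatternOnce here is an assumption: it is not an existing TextMate key, current engines would most likely just ignore it, and the semantics assumed are that a child rule carrying it can match at most once per activation of its enclosing rule. The whitespace rule is a simplified stand-in for #advanceToToken, and elseif is left out to keep the fragment short.

```json
{
	"repository": {
		"ifStatement": {
			"comment": "HYPOTHETICAL sketch: applyPatternOnce is the property proposed in this issue, not an existing key",
			"begin": "(?i:if)(?=[\\s{(,;&|)}])",
			"beginCaptures": {
				"0": { "name": "keyword.control.if.powershell" }
			},
			"end": "(?=.|$)",
			"applyEndPatternLast": true,
			"patterns": [
				{
					"comment": "simplified stand-in for #advanceToToken (comments not handled)",
					"match": "\\s+"
				},
				{
					"comment": "() condition: assumed to match at most once in this if context",
					"applyPatternOnce": true,
					"begin": "\\(",
					"end": "\\)",
					"name": "meta.if-condition.powershell",
					"patterns": [ { "include": "#command_mode" } ]
				},
				{
					"comment": "{} statement block: assumed to match at most once in this if context",
					"applyPatternOnce": true,
					"begin": "\\{",
					"end": "\\}",
					"name": "meta.statements.if-condition.powershell",
					"patterns": [ { "include": "$self" } ]
				},
				{
					"comment": "else: once matched, a second else falls through to the end pattern",
					"applyPatternOnce": true,
					"begin": "(?i:else)(?=[\\s{(,;&|)}])",
					"beginCaptures": {
						"0": { "name": "keyword.control.if-else.powershell" }
					},
					"end": "(?=.|$)",
					"applyEndPatternLast": true,
					"patterns": [
						{ "match": "\\s+" },
						{
							"comment": "{} statement block of the else clause, also at most once",
							"applyPatternOnce": true,
							"begin": "\\{",
							"end": "\\}",
							"name": "meta.statements.if-else-condition.powershell",
							"patterns": [ { "include": "$self" } ]
						}
					]
				}
			]
		}
	}
}
```

Under those assumed semantics, in if(1){do-something}else{do-something else}else another-command the second else could no longer match inside the if context, so the end pattern would fire and else another-command would be left to the surrounding grammar. The sketch does not enforce that the {} block follows the () condition, nor that either is actually present; a once-only limit alone does not express ordering or requiredness.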

Grammar constructed so far (partial file only). If testing, use only empty conditions and empty statement blocks, and no comments; I am not including those subsections, as they are not relevant to this issue.

{
	"patterns": [
		{
			"comment": "else,elseif: only after if,elseif",
			"begin": "(?i)(?=if[\\s{(,;&|)}])",
			"end": "(?!\\G)",
			"patterns": [
				{
					"include": "#ifStatement"
				}
			]
		}
	],
	"repository": {
		"ifStatement": {
			"comment": "else,elseif: only after if,elseif",
			"begin": "\\G(?i:(if)|(elseif))(?=[\\s{(,;&|)}])",
			"beginCaptures": {
				"1": {
					"name": "keyword.control.if.powershell"
				},
				"2": {
					"name": "keyword.control.if-elseif.powershell"
				}
			},
			"end": "(?=.|$)",
			"applyEndPatternLast": true,
			"patterns": [
				{
					"include": "#advanceToToken"
				},
				{
					"begin": "(?<![)}])(?=\\()",
					"end": "(?=.|$)",
					"applyEndPatternLast": true,
					"patterns": [
						{
							"begin": "\\G\\(",
							"beginCaptures": {
								"0": {
									"name": "punctuation.section.group.begin.powershell"
								}
							},
							"end": "\\)",
							"endCaptures": {
								"0": {
									"name": "punctuation.section.group.end.powershell"
								}
							},
							"name": "meta.if-condition.powershell",
							"patterns": [
								{
									"comment": "`;` not permitted here",
									"match": ";",
									"name": "invalid.source.powershell"
								},
								{
									"include": "#command_mode"
								}
							]
						},
						{
							"begin": "(?<=\\))(?=[\\s#]|<#|`\\s|{)",
							"end": "(?=.|$)",
							"applyEndPatternLast": true,
							"patterns": [
								{
									"include": "#advanceToToken"
								},
								{
									"begin": "(?<!})(?={)",
									"end": "(?=.|$)",
									"applyEndPatternLast": true,
									"patterns": [
										{
											"begin": "\\G\\{",
											"beginCaptures": {
												"0": {
													"name": "punctuation.section.braces.begin.powershell"
												}
											},
											"end": "}",
											"endCaptures": {
												"0": {
													"name": "punctuation.section.braces.end.powershell"
												}
											},
											"name": "meta.statements.if-condition.powershell",
											"patterns": [
												{
													"include": "$self"
												}
											]
										},
										{
											"begin": "(?<=})(?=[\\s#]|<#|`\\s)",
											"end": "(?=.|$)",
											"applyEndPatternLast": true,
											"patterns": [
												{
													"include": "#advanceToToken"
												}
											]
										}
									]
								}
							]
						}
					]
				},
				{
					"begin": "(?i:else)(?=[\\s{(,;&|)}])",
					"beginCaptures": {
						"0": {
							"name": "keyword.control.if-else.powershell"
						}
					},
					"end": "(?=.|$)",
					"applyEndPatternLast": true,
					"patterns": [
						{
							"include": "#advanceToToken"
						},
						{
							"begin": "(?<!}){",
							"beginCaptures": {
								"0": {
									"name": "punctuation.section.braces.begin.powershell"
								}
							},
							"end": "}",
							"endCaptures": {
								"0": {
									"name": "punctuation.section.braces.end.powershell"
								}
							},
							"name": "meta.statements.if-else-condition.powershell",
							"patterns": [
								{
									"include": "$self"
								}
							]
						}
					]
				},
				{
					"begin": "(?i)(?=elseif[\\s{(,;&|)}])",
					"end": "(?!\\G)",
					"patterns": [
						{
							"include": "#ifStatement"
						}
					]
				}
			]
		},
		"advanceToToken": {
			"comment": "consume spaces and comments and line ends until the next token appears",
			"begin": "\\G(?=[\\s#]|<#|`\\s)",
			"end": "(?!\\s)(?!$)",
			"applyEndPatternLast": true,
			"patterns": [
				{
					"comment": "useless escape, and doesn't count as a token",
					"match": "`\\s",
					"name": "invalid.character.escape.powershell"
				},
				{
					"include": "#commentLine"
				},
				{
					"include": "#commentBlock"
				}
			]
		}
	}
}

For reference, the complete grammar: https://github.com/msftrncs/PowerShell.tmLanguage/blob/argumentmode_2ndtry/powershell.tmLanguage.json

Labels: feature-request (Request for new features or functionality)
