-
Notifications
You must be signed in to change notification settings - Fork 12.9k
Spelling correction fixes should not be case-agnostic for two equally weighed options #39060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…y weighed options occur. fixes microsoft#17219
Co-authored-by: Daniel Rosenwasser <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of problems with this PR.
- I'm not sure the bug occurs in real suggestions. When it was opened, I don't think suggestions were filtered by meaning, but now,
Console
isn't suggested because it's a type. The camelValue/PascalType distinction is standard in JS so the current algorithm will usually get the right answer. - Instead of tweaking the brittle heuristics to make them case-aware, I'd rather start with Levenshtein distance being case-aware. I guess that a good way to start is to make a difference in case between otherwise-identical characters set the substitution distance to 0.1. So for substituting:
a, a
,dist=0
a, A
,dist=0.1
a, B
,dist=2
However, the first step is to address (1). We need examples as common and clearly wrong as the original bug's Console
suggestion for conole
.
As for (2), it's still possible that tweaking heuristics is the way to go, but I'd like to try taking case into account for Levenshtein distance first.
And maybe we can use Damerau–Levenshtein distance so that results are more intuitive.
|
Has it been that long since you used a class @sandersn? 😄 let node = new Node();
nodes I think this is still legitimate and worth fixing. |
I'd also say one of the reasons this is "rarer" is because #40169 was never done. |
|
@Zzzen do you want to keep working on this? |
yeah. I am not sure what to do next though |
|
👌 got it. |
ping @sandersn @sheetalkamat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of suggestions, plus some more questions.
src/compiler/core.ts
Outdated
@@ -1941,41 +1939,28 @@ namespace ts { | |||
*/ | |||
export function getSpellingSuggestion<T>(name: string, candidates: T[], getName: (candidate: T) => string | undefined): T | undefined { | |||
const maximumLengthDifference = Math.min(2, Math.floor(name.length * 0.34)); | |||
let bestDistance = Math.floor(name.length * 0.4) + 1; // If the best result isn't better than this, don't bother. | |||
let bestDistance = name.length * 0.4 + 0.1; // If the best result is worse than this, don't bother. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change this? Reducing the threshold will throw out some otherwise-good suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing Math.floor
is a more accurate implementation of Whose levenshtein distance is more than 0.4 of the length of the name
. No test is broken by this, so I guess it should be acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put this in a followup change just like the Math.min/max one? I want this PR to be the minimal change so that future-me has less to read 2 years from now, next time we change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
src/compiler/core.ts
Outdated
@@ -2022,7 +2011,7 @@ namespace ts { | |||
} | |||
|
|||
const res = previous[s2.length]; | |||
return res > max ? undefined : res; | |||
return res >= max ? undefined : res; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is the reason bestDistance - 1
turned into bestDistance
? Is it also need to support the switch to Math.max and Math.min, or some other reason, like the switch to non-integral values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😏 Actually, it's the result of bestDistance - 1
turned into bestDistance
. bestDistance - 1
can be replaced with bestDistance - 0.1
, but it is very fragile since the minimum of distance could be changed to other values like 0.05
in the future.
|
||
for (let i = 0; i <= s2.length; i++) { | ||
previous[i] = i; | ||
} | ||
|
||
for (let i = 1; i <= s1.length; i++) { | ||
const c1 = s1.charCodeAt(i - 1); | ||
const minJ = i > max ? i - max : 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand the need for Math.ceil and Math.floor, but why change to Math.max and Math.min?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Math.max/min
is more readable for me. Anyway, reverted now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, makes sense. After this PR is in, maybe you can re-submit that change separately where it would be obvious that Math.max/min are equivalent. I thought it was more readable too, but I prefer to reduce code change per PR in complicated functions like this.
current[0] = i; | ||
/** Smallest value of the matrix in the ith column. */ | ||
let colMin = i; | ||
for (let j = 1; j < minJ; j++) { | ||
current[j] = big; | ||
} | ||
for (let j = minJ; j <= maxJ; j++) { | ||
// case difference should be insignificant. | ||
const substitutionDistance = s1[i - 1].toLowerCase() === s2[j-1].toLowerCase() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to cache the lowercase versions of s1
and s2
and index into those instead?
This piece of code needs a microbenchmark, since I have no idea. I'm not sure how to do that off the top of my head though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about extracting all symbols from some mostly used type definitions and then apply getSpellingSuggestion
randomly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? One way to stress the system would be to have a scope that fills in all possible neighbours, so that the algorithm has to run on lots of candidates, all under the maximum distance cutoff.
I'm going to run the perf suite on this too, since all those projects compile with lots of errors now -- maybe enough of them are name resolution errors to notice a slowdown.
Co-authored-by: Nathan Shively-Sanders <[email protected]>
@typescript-bot perf test this If the perf test comes back good, this is ready to go. I'd still like a synthetic benchmark before 4.2 releases in January, but it's not urgent. I'm also going to do a before/after comparison of changes with the scope size limiter turned off, to see what changes show up in suggestions from the global scope. Edit: Disabling the scope size limiter shows no changes with respect to master. |
Heya @sandersn, I've started to run the perf test suite on this PR at 99a5074. You can monitor the build here. Update: The results are in! |
@sandersn Here they are:Comparison Report - master..39060
System
Hosts
Scenarios
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance differences are just noise.
Thanks for your patience and perseverance on this @Zzzen!
…y weighed options occur.
fixes #17219