bpo-38530: Refactor AttributeError suggestions #25776

sweeneyde · 2021-05-01T09:59:30Z

- Make case swaps half the cost of any other edit.
- Refactor the Levenshtein code so that it does not use the memory allocator and bails out early when there is no match.
- Add comments to the Levenshtein distance code.
- Add test cases for the Levenshtein distance behind a debug macro.
- Set the threshold to (name_size + item_size + 3) * MOVE_COST / 6.
  - Reasoning: this is similar to requiring difflib.SequenceMatcher.ratio() >= 2/3:

        "Multiset Jaccard similarity" >= 2/3
        matching letters / total letters >= 2/3
        (name_size - distance + item_size - distance) / (name_size + item_size) >= 2/3
        1 - (2*distance) / (name_size + item_size) >= 2/3
        1/3 >= (2*distance) / (name_size + item_size)
        (name_size + item_size) / 6 >= distance

    With rounding:

        (name_size + item_size + 3) // 6 >= distance
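The threshold and the ratio it approximates can be checked numerically. A minimal Python sketch, assuming MOVE_COST = 2 (so that a case swap can cost 1, half of a normal edit); the function names are hypothetical and this is not the C code from the PR:

```python
MOVE_COST = 2  # assumed: a full edit costs 2 so that a case swap can cost 1


def max_allowed_distance(name_size: int, item_size: int) -> int:
    """Largest accepted distance, in MOVE_COST units.

    Mirrors the C expression (name_size + item_size + 3) * MOVE_COST / 6;
    multiplying before the integer division keeps half-edit precision.
    """
    return (name_size + item_size + 3) * MOVE_COST // 6


def similarity(name_size: int, item_size: int, distance: int) -> float:
    """The ratio from the derivation, 1 - 2*d / (name_size + item_size),
    with d converted from MOVE_COST units back to whole edits."""
    edits = distance / MOVE_COST
    return 1 - 2 * edits / (name_size + item_size)


# A 5-letter name against a 5-letter candidate.
limit = max_allowed_distance(5, 5)
print(limit, similarity(5, 5, limit))
```

For two 5-letter names the limit is 4 units (2 edits), and the corresponding ratio is 0.6, slightly below 2/3: the "+ 3" rounds (5 + 5) / 6 ≈ 1.67 edits up to 2, so for short names the bound is a little more permissive than the exact 2/3 cut-off.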

https://bugs.python.org/issue38530

pablogsal · 2021-05-01T19:20:53Z

Python/suggestions.c

+    assert(d2 == d4);
+}
+
+static void


Please remove these functions, as we don't normally test things this way (even with the macros).

I would recommend either adapting these tests into NameError examples in test_exceptions or exposing them so they can be used from the _testcapi module.

bedevere-bot · 2021-05-01T19:21:16Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

pablogsal · 2021-05-01T19:22:43Z

Python/suggestions.c

 static inline PyObject *
 calculate_suggestions(PyObject *dir,
-                      PyObject *name) {
+                      PyObject *name)
+{


Ditto here: please remove the tests or move them to the test suite

pablogsal · 2021-05-01T19:23:33Z

Python/suggestions.c

@@ -90,38 +197,39 @@ calculate_suggestions(PyObject *dir,
    if (name_str == NULL) {
        return NULL;
    }
+    size_t *work_buffer = PyMem_Calloc(name_size, sizeof(size_t));


I think that, since we work with strings of a bounded maximum length, we can statically allocate a fixed-size array here, which simplifies the code a lot.
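Because the Levenshtein recurrence only needs the previous row, the work buffer has one cell per character of one string, so a length cap makes a fixed-size buffer sufficient. A minimal Python sketch of that idea, assuming a hypothetical MAX_STRING_SIZE of 40 and a uniform MOVE_COST of 2 (no case-swap discount here), plus the early bail-out from the PR description once every cell in a row exceeds `max_cost`:

```python
MAX_STRING_SIZE = 40  # hypothetical length cap
MOVE_COST = 2         # assumed cost of any insertion, deletion, or substitution

# Allocated once and reused, like a static C array; only len(a) cells are used.
_WORK_BUFFER = [0] * MAX_STRING_SIZE


def levenshtein_bounded(a: str, b: str, max_cost: int) -> int:
    """Levenshtein distance in MOVE_COST units, using a single reused row.

    Returns the exact distance, or max_cost + 1 when it bails out early;
    any result greater than max_cost means "no match".
    """
    if a == b:
        return 0
    if len(a) > MAX_STRING_SIZE or len(b) > MAX_STRING_SIZE:
        return max_cost + 1
    buf = _WORK_BUFFER
    # Row for the empty prefix of b: turning a[:i+1] into "" costs i+1 deletions.
    for i in range(len(a)):
        buf[i] = (i + 1) * MOVE_COST
    for j, b_char in enumerate(b):
        diag = j * MOVE_COST        # D(a[:0], b[:j])
        left = (j + 1) * MOVE_COST  # D(a[:0], b[:j+1])
        row_min = left
        for i, a_char in enumerate(a):
            above = buf[i]          # D(a[:i+1], b[:j]) from the previous row
            cell = min(diag + (0 if a_char == b_char else MOVE_COST),
                       above + MOVE_COST,
                       left + MOVE_COST)
            buf[i] = cell           # now D(a[:i+1], b[:j+1])
            diag, left = above, cell
            row_min = min(row_min, cell)
        # Row minima never decrease, so the final distance is at least row_min.
        if row_min > max_cost:
            return max_cost + 1
    return buf[len(a) - 1] if a else len(b) * MOVE_COST
```

The early exit is safe because every cell is built from cells of the previous row plus a non-negative cost, so once a whole row is above the limit the final answer must be too.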

pablogsal · 2021-05-01T19:58:08Z

Python/suggestions.c

+static inline int
+substitution_cost(char a, char b)
+{
+    if ((a & 31) != (b & 31)) {


Please encapsulate the 31 in a macro or a static constant for readability.
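The `(a & 31) != (b & 31)` test works because ASCII letters that differ only in case share their low five bits, so differing low bits rule out both equality and a case swap in one comparison. A minimal Python sketch with the magic number named; `LEAST_FIVE_BITS` is a hypothetical name, and MOVE_COST = 2 / CASE_COST = 1 are assumed values matching "case swaps cost half":

```python
MOVE_COST = 2  # assumed cost of an ordinary substitution
CASE_COST = 1  # assumed cost of a case swap: half of MOVE_COST

# 0b11111 == 31. 'A' is 0x41 and 'a' is 0x61: same low five bits.
LEAST_FIVE_BITS = 0b11111


def _ascii_lower(code: int) -> int:
    """Lowercase an ASCII uppercase letter's code point; leave others alone."""
    return code + 32 if ord("A") <= code <= ord("Z") else code


def substitution_cost(a: str, b: str) -> int:
    ca, cb = ord(a), ord(b)
    if (ca & LEAST_FIVE_BITS) != (cb & LEAST_FIVE_BITS):
        return MOVE_COST  # cannot be equal or a case swap
    if ca == cb:
        return 0
    if _ascii_lower(ca) == _ascii_lower(cb):
        return CASE_COST  # e.g. 'a' vs 'A'
    return MOVE_COST      # same low bits by coincidence, e.g. '!' vs 'A'
```

The final branch matters: characters such as '!' (0x21) and 'A' (0x41) pass the five-bit check but are not case variants of each other.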

pablogsal · 2021-05-01T20:39:05Z

@sweeneyde No rush if you don't have the bandwidth to do it but if we get this ready before Monday we can get it into beta 1 :)

sweeneyde · 2021-05-02T04:20:02Z

I tried modifying MAX_CANDIDATE_ITEMS and MAX_STRING_SIZE to be much larger, and I got these benchmark results (seconds per iteration, as returned by bench() in the script below):

----- Before this PR -----

STRING_LENGTH = 50
DIR_LENGTH = 8000
0.05950444 0.05835115 0.05945749 0.05820514 0.05914853

STRING_LENGTH = 25
DIR_LENGTH = 8000
0.01657816 0.01678682 0.01645611 0.01621476 0.01597858


----- After this PR -----

STRING_LENGTH = 50
DIR_LENGTH = 8000
0.01344847 0.02069782 0.02076828 0.01686316 0.02023981

STRING_LENGTH = 25
DIR_LENGTH = 8000
0.00470112 0.00391163 0.00810734 0.00343608 0.00528902

Benchmark script:

import random
from test.support import captured_stderr
import sys
from time import perf_counter

STRING_LENGTH = 50
DIR_LENGTH = 8000

def rand_string(n):
    return ''.join(random.choices("abc", k=n))

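def bench(repeat=100):
    # DIR_LENGTH random attribute names, each STRING_LENGTH characters long.
    rand_strings = [rand_string(STRING_LENGTH)
                    for _ in range(DIR_LENGTH)]

    # In a class body, locals() is the class namespace, so this defines
    # one class attribute per random string.
    class Class:
        locals().update({h: h for h in rand_strings})

    # Misspell one existing name: replace its 10th character with "x",
    # a letter that never occurs in the generated names.
    attr = rand_strings[DIR_LENGTH//2]
    attr = attr[:9] + "x" + attr[10:]

    t0 = perf_counter()
    for _ in range(repeat):
        try:
            getattr(Class, attr)
        except AttributeError as exc:
            # Render the traceback through the default excepthook, which is
            # where the "Did you mean" suggestion is produced.
            with captured_stderr() as err:
                sys.__excepthook__(*sys.exc_info())
    t1 = perf_counter()

    # Check that the timed path really produced a suggestion.
    assert "Did you mean" in err.getvalue(), err.getvalue()
    assert attr in err.getvalue()
    return (t1 - t0) / repeat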

if __name__ == "__main__":
    print(f"{STRING_LENGTH = }")
    print(f"{DIR_LENGTH = }")
    for _ in range(5):
        print(f"{bench():.8f}", end=" ")
    print()
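To compare the before/after numbers directly, the speedup ratios can be computed from the timings reported above (these are the posted numbers, not new measurements):

```python
from statistics import mean, median

# Timings reported above, in seconds per iteration (five runs each),
# keyed by STRING_LENGTH; DIR_LENGTH was 8000 in every run.
BEFORE = {
    50: [0.05950444, 0.05835115, 0.05945749, 0.05820514, 0.05914853],
    25: [0.01657816, 0.01678682, 0.01645611, 0.01621476, 0.01597858],
}
AFTER = {
    50: [0.01344847, 0.02069782, 0.02076828, 0.01686316, 0.02023981],
    25: [0.00470112, 0.00391163, 0.00810734, 0.00343608, 0.00528902],
}

speedups = {}
for length in BEFORE:
    # Ratio > 1 means the PR is faster; the median is less sensitive
    # to a single outlier run than the mean is.
    speedups[length] = (
        mean(BEFORE[length]) / mean(AFTER[length]),
        median(BEFORE[length]) / median(AFTER[length]),
    )
    print(f"STRING_LENGTH = {length}: "
          f"mean {speedups[length][0]:.2f}x, median {speedups[length][1]:.2f}x")
```

On these numbers both ratios land at roughly 3x to 3.5x for both string lengths.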

pablogsal · 2021-05-02T04:24:31Z

Thanks, @sweeneyde, for the benchmarking. This is great work! Although performance here is not a hard requirement (because the interpreter is going to finish soon), it is still important in some edge cases, like a thread that is constantly raising exceptions (yeah, I know 😉), so I am pleased to see that this PR will make it faster.

I guess that if we remove the call to the memory allocator and use stack allocation, it will be slightly faster as well, although for the average namespace this probably will not be super important. In any case, unless it is super expensive, I prefer to focus our efforts on usability and ensure that the suggestions are the best we can have :)

sweeneyde · 2021-05-02T04:33:22Z

If that is the priority, I can take a look in the next couple of hours at implementing the "transpositions" variant with a rotation of three matrix rows. Would that be good for this PR or should it be a later PR?

pablogsal · 2021-05-02T05:41:53Z

> I can take a look in the next couple of hours at implementing the "transpositions" variant with a rotation of three matrix rows. Would that be good for this PR or should it be a later PR?

You mean the gcc version here https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/spellcheck.c#L46-L142?

I have implemented that and played a bit with it, but I didn't see any cases where it is obviously better than the ones we already cover. I was also a bit wary of performance, and the three calls to the allocators make it a bit messy. I changed them to static arrays of max length and saw a bit of a speedup in general, but I am not sure it was worth it. It would be great to have some "realistic" examples where that approach is better than the one in this PR or the one that we already have, so we can justify the extra complexity.
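For reference, a minimal Python sketch of a "transpositions" variant: the optimal string alignment (OSA) distance, a restricted Damerau-Levenshtein distance that also counts swapping two adjacent characters as one edit. Only three DP rows are kept and rotated, as described above. It uses unit costs and no case-swap discount, and it illustrates the technique rather than transcribing gcc's spellcheck.c or any CPython code:

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent
    transpositions, keeping only three DP rows in memory."""
    n = len(b)
    prev2 = [0] * (n + 1)      # row i-2 (only read once i >= 2)
    prev = list(range(n + 1))  # row i-1, starting with row 0
    curr = [0] * (n + 1)       # row i, being filled
    for i in range(1, len(a) + 1):
        curr[0] = i
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                curr[j] = min(curr[j], prev2[j - 2] + 1)  # transposition
        # Rotate: row i-1 becomes i-2, row i becomes i-1, reuse the old i-2 list.
        prev2, prev, curr = prev, curr, prev2
    return prev[n]
```

For a typo such as `lenght` for `length`, OSA gives 1 where plain Levenshtein gives 2; whether that improves suggestions for real attribute names is the open question raised above.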

sweeneyde · 2021-05-02T05:56:48Z

I increased the range of possible dir()s to check, based on looking at some of the following, particularly numpy:

len(dir(collections)) ~= 36
len(dir(str)) ~= 80
len(dir(__builtins__)) ~= 157 (frequently used here)
len(dir(numpy.ndarray)) ~= 161
len(dir(pandas.DataFrame)) ~= 441
len(dir(scipy)) ~= 601
len(dir(numpy)) ~= 606
len(dir(sympy)) ~= 989
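Measurements like these can be reproduced for any object in a few lines. A minimal sketch, using a hypothetical cap of 750 standing in for MAX_CANDIDATE_ITEMS (an assumed value); exact counts vary between Python and library versions:

```python
import builtins
import collections

# Hypothetical cap standing in for MAX_CANDIDATE_ITEMS (750 is assumed).
MAX_CANDIDATE_ITEMS = 750


def candidate_count(obj) -> int:
    """Number of names a suggestion search over dir(obj) would have to scan."""
    return len(dir(obj))


def would_search(obj, limit: int = MAX_CANDIDATE_ITEMS) -> bool:
    """True if dir(obj) is small enough for the search to run at all."""
    return candidate_count(obj) <= limit


for label, obj in [("collections", collections), ("str", str), ("builtins", builtins)]:
    print(f"len(dir({label})) = {candidate_count(obj)}, searched: {would_search(obj)}")
```

With a cap of 750, a namespace the size of dir(numpy) (about 606 names above) would still be searched, while one the size of dir(sympy) (about 989) would be skipped.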

pablogsal · 2021-05-02T13:33:18Z

As this is still marked as WIP, tell me when it is ready to review and I will do another pass.

sweeneyde · 2021-05-02T20:07:55Z

I think this should be ready to review again.

pablogsal · 2021-05-03T14:53:29Z

Lib/test/test_exceptions.py

@@ -1565,21 +1565,87 @@ def test_name_error_bad_suggestions_do_not_trigger_for_small_names(self):
    def test_name_error_suggestions_do_not_trigger_for_too_many_locals(self):


We must find a way to make this test more bearable, maybe using compile(), as this is getting a bit wild. :(
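One way to follow the compile() suggestion is to generate the function source instead of writing hundreds of locals by hand. A minimal sketch; `make_function_with_locals` and the names `v0`, `v1`, ... and `missing_name` are hypothetical, and this is not the test that exists in test_exceptions:

```python
def make_function_with_locals(n_locals: int):
    """Return a function with n_locals local variables, generated with compile().

    The function assigns v0, v1, ... (n_locals names) and then reads the
    undefined name missing_name, so calling it raises NameError.
    """
    body = "".join(f"    v{i} = None\n" for i in range(n_locals))
    source = f"def func():\n{body}    return missing_name\n"
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["func"]


func = make_function_with_locals(1000)
print(func.__code__.co_nlocals)  # number of local variables in the generated code
try:
    func()
except NameError as exc:
    print(type(exc).__name__, exc)
```

A test can then vary `n_locals` around the candidate limit and assert on the rendered traceback, keeping the test file itself short.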

pablogsal

LGTM

Thanks, @sweeneyde, this is fantastic work 🚀

I left a comment but we can address that later

pablogsal · 2021-05-03T14:56:16Z

Oh, apparently we have some leaks:

❯ ./python -m test test_exceptions test_capi -R :
0:00:00 load avg: 1.41 Run tests sequentially
0:00:00 load avg: 1.41 [1/2] test_exceptions
beginning 9 repetitions
123456789
.........
0:00:10 load avg: 1.51 [2/2] test_capi
beginning 9 repetitions
123456789
.........
test_capi leaked [38, 38, 38, 38] references, sum=152
test_capi leaked [31, 31, 31, 31] memory blocks, sum=124
test_capi failed in 46.7 sec

== Tests result: FAILURE ==

1 test OK.

1 test failed:
    test_capi

Total duration: 57.2 sec
Tests result: FAILURE

pablogsal

We need to find the leaks before we merge

bedevere-bot · 2021-05-03T14:56:34Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

pablogsal · 2021-05-03T15:32:24Z

I fixed the refleaks in ce87e15

pablogsal

Checked for refleaks:

❯ ./python -m test  test_capi -R :
0:00:00 load avg: 1.13 Run tests sequentially
0:00:00 load avg: 1.13 [1/1] test_capi
beginning 9 repetitions
123456789
.........
test_capi passed in 46.1 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 46.1 sec
Tests result: SUCCESS

sweeneyde · 2021-05-03T17:05:02Z

Thanks for the last minute review and fix!

Refactor suggestions

d04b5f4

bedevere-bot added the awaiting review label May 1, 2021

the-knights-who-say-ni added the CLA signed label May 1, 2021

pablogsal requested changes May 1, 2021

bedevere-bot added awaiting changes and removed awaiting review labels May 1, 2021

pablogsal reviewed May 1, 2021

pablogsal self-assigned this May 1, 2021

pablogsal reviewed May 1, 2021

pablogsal added the skip news label May 1, 2021

sweeneyde added 3 commits May 1, 2021 17:42

Apply changes from review

e680014

Add tests using _testinternalcapi

8e120f3

Remove unused buffer

d716094

Allow for suggestions on larger dir()s and names

eff17fb

sweeneyde added 2 commits May 2, 2021 02:12

Correct a comment

e99baf1

Remove one more unneeded computation

3af514a

sweeneyde changed the title ~~bpo-38530: WIP: Refactor AttributeError suggestions~~ bpo-38530: Refactor AttributeError suggestions May 2, 2021

pablogsal reviewed May 3, 2021

pablogsal approved these changes May 3, 2021

bedevere-bot added awaiting merge and removed awaiting changes labels May 3, 2021

pablogsal requested changes May 3, 2021

bedevere-bot removed the awaiting merge label May 3, 2021

bedevere-bot added the awaiting changes label May 3, 2021

Fix refleaks

ce87e15

pablogsal approved these changes May 3, 2021

bedevere-bot added awaiting merge and removed awaiting changes labels May 3, 2021

pablogsal merged commit 80a2a4e into python:master May 3, 2021

bedevere-bot removed the awaiting merge label May 3, 2021

sweeneyde deleted the suggest branch December 19, 2021 05:33

tirkarthi mentioned this pull request Jan 21, 2022

Show suggestions in error messages in Python 3.10 ipython/ipython#13450

Closed

sweeneyde commented May 1, 2021 • edited by pablogsal

pablogsal commented May 2, 2021 • edited
