Description
Most mypy tests use minimal fake stubs instead of the real builtins stubs. If you forget to select the right stubs, a test falls back to very minimal default stubs, which often cause weird test failures (often crashes) that are difficult to figure out, especially for new contributors.
The main benefits of using the fake stubs are performance and debuggability. The latter might not be obvious, but the main idea is that it's often significantly easier to debug type checker issues if a test case doesn't do a lot of work and you only have a small number of things in the symbol table.
I'm not convinced that switching to full stubs everywhere is a good idea, because we would lose both of these benefits. Even with better caching, the tests run via runtests.py unit-test would likely run several times slower than they currently do if we used full stubs everywhere, and I'd rather see the tests run faster than slower.
Here is an idea for how to improve the situation without too many compromises:
- Combine the existing fixtures into a single fake stub that is good enough for all or almost all test cases that use fake stubs. This would still be much smaller than the full stub, so performance and debuggability would be reasonable.
- We could also add some additional features to the stubs to make them easier to use.
- Use incremental mode to avoid reprocessing the stubs on every test case.
- We'll probably want some test cases that don't use incremental mode to test non-incremental type checking as well, but it's probably enough to only have a small subset of tests like this.
- Make the fake stubs more discoverable. For example, print a note with the path of the fake stub file on test failure to make it easy to see where stubs come from.
- Implement a heuristic that detects when a test fails because a built-in name is undefined in the fake stubs, and prints a useful message. This doesn't need to be very fancy. Maybe find all class and function names defined in the full builtins stubs using regexps and see if one of those names is reported as undefined.
- If we made test cases that use full stubs less expensive, it would be easier to justify writing more of them, even if most test cases still used fake stubs. Contributors who don't like the fake stubs could then opt out of using them, or we could even direct new contributors to write full stub test cases by default. Core team members could periodically migrate some full stub test cases to fake stubs if the test suite becomes too slow.
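
The undefined-name heuristic from the list above could be sketched roughly as follows. This is only an illustration, not existing mypy code: the helper names (`top_level_names`, `names_missing_from_fixture`) are hypothetical, and it assumes that the type checker reports undefined names with a message of the form `Name "x" is not defined` (it also accepts single quotes). It compares the top-level class and function names found in the full builtins stub against those in the fake stub, and returns the ones that the test output reports as undefined.

```python
import re
from typing import Iterable, List, Set

# Top-level "class Name" / "def name" declarations only: the pattern is
# anchored at the start of a line, so indented methods are not matched.
_DEF_RE = re.compile(r"^(?:class|def)\s+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE)

# Assumed error message format: Name "x" is not defined (either quote style).
_UNDEFINED_RE = re.compile(r"Name [\"']([A-Za-z_][A-Za-z0-9_]*)[\"'] is not defined")


def top_level_names(stub_source: str) -> Set[str]:
    """Return the class and function names defined at the top level of a stub."""
    return set(_DEF_RE.findall(stub_source))


def names_missing_from_fixture(
    output_lines: Iterable[str], full_stub_source: str, fake_stub_source: str
) -> List[str]:
    """Return names reported as undefined that the full stub defines
    but the fake stub does not."""
    full = top_level_names(full_stub_source)
    fake = top_level_names(fake_stub_source)
    reported: Set[str] = set()
    for line in output_lines:
        reported.update(_UNDEFINED_RE.findall(line))
    return sorted(name for name in reported if name in full and name not in fake)


if __name__ == "__main__":
    full_stub = "class int: ...\nclass dict: ...\ndef len(x: object) -> int: ...\n"
    fake_stub = "class int: ...\n"
    output = ['main:1: error: Name "dict" is not defined']
    for name in names_missing_from_fixture(output, full_stub, fake_stub):
        print(f"note: '{name}' is a builtin that is missing from the fake stub")
```

A test runner could call something like this only when a test case fails, and include the path of the fake stub in the note so that the fix is easy to find. Names that are undefined for other reasons (for example, a genuine typo in the test case) are ignored because they don't appear in the full stub.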