Description
The single slowest task in our test suite is testpythoneval.py, aka run eval-test. In fact, on my desktop with 8-way parallelism, the whole suite takes 1m28s, and with that one task omitted it takes 51s. This is because for much of that time it is the lone straggler still running, even though since commit 56bb4ba we start it at the very beginning. (There are also some quick wins that would bring the suite well below 51s, once we make this task much faster and/or break it into smaller pieces.)
Looking inside the test code, it looks like it does two things for each of a number of test cases:
1. Run the type-checker on the test program
2. Run the test program
and then compares the output to an expected output.
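
To make the question concrete, here is a rough sketch of that per-case flow. It is not the actual code in testpythoneval.py: the `run_case` function, its `type_check` callback, and the `run_program` flag are all hypothetical names made up for illustration, and the real test may order or combine the outputs differently.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, List


def run_case(source: str,
             expected: List[str],
             type_check: Callable[[Path], List[str]],
             run_program: bool = True) -> bool:
    """Hypothetical sketch of one eval test case; not mypy's actual code."""
    with tempfile.TemporaryDirectory() as tmp:
        program = Path(tmp) / "program.py"
        program.write_text(source)

        # First: type-check the program and collect any messages.
        # `type_check` is a stand-in for however the checker gets invoked.
        output = list(type_check(program))

        # Second (the part in question): run the program under CPython
        # and append whatever it prints.
        if run_program:
            proc = subprocess.run([sys.executable, str(program)],
                                  capture_output=True, text=True)
            output += proc.stdout.splitlines()

    # Finally, compare everything collected against the expected output.
    return output == expected
```

With `run_program=False` the interpreter subprocess is skipped entirely, which is roughly what cutting the second step would amount to; the expected outputs would then need to contain only the type-checker messages.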
Is the second step, actually running the test program, still useful? It made a lot of sense when mypy was its own implementation of a Python-like language, but now it seems to serve only as a regression check on CPython. In principle it could be useful to check that our test programs behave the way we think they do, but that's basically a test of the test cases themselves. I'm not sure how often that's useful; I think most of the test cases are very straightforward. So I suggest we cut it out.