-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
Non-informative error message in index() and remove() functions #57558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
For example: >>> (1, 2, 3).index(4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: tuple.index(x): x not in tuple The "x not in tuple" error message should be replaced with "4 is not in tuple". list.index() already does this (see bpo-7252): >>> [1, 2, 3].index(4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 4 is not in list Although in bpo-7252 it was claimed that no other place in stdlib has this error message, I found many occurences: Modules/_collectionsmodule.c: PyErr_SetString(PyExc_ValueError, "deque.remove(x): x not in deque" There's also documentation and tests that depend on this actual error message: Doc/library/doctest.rst: ValueError: list.remove(x): x not in list bpo-7252 was fixed in r76058, and it's quite a lot of code. I think it could be done more easily using PyUnicode_FromFormat() and the %R format. |
The good thing about this is ease of debugging. You can see which is the offending value that was not found. On the other hand, the repr of a value might be very long: >>> [].index(list(range(1000)))
ValueError: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ...
(many lines of numbers)
997, 998, 999] is not in list Also, all values don't have a very informal repr: >>> class Foo: pass
...
>>> [].index(Foo())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: <__main__.Foo object at 0xb736f92c> is not in list |
|
>>> str(someobject)
'output that can change'
>>> 'something I want' in str(someobject) # more robust, but less useful if it fails
True
>>> something.index(spam)
traceback blah:
ValueError: output that can change
>>> try: something.index(spam)
... except ValueError: print('spam not in something') # more robust, but ugly
spam not in something |
doctests by their very nature tend to overspecify things - that's why actual regression tests should be written with unittest, while doctest is kept for its originally intended purpose of testing examples included in docstrings and other documentation. Still, there's also a reason why IGNORE_EXCEPTION_DETAIL is an available doctest option. |
Improving error messages has been a long, slow process as people are irritated enough to make a change. Please go ahead. Guido has explicitly excluded exception detail from the language spec multiple times. Test that depend on details are broken. Doc examples are updated as such details change. (So if you do upgrade the message, change the tests and examples ;-). |
Éric Araujo wrote:
Seems you can't with %R. I'd also like to somehow indicate that |
The test suite has code (functions) that restrict the length on AssertEqual (and other) failure messages. (I do not know the location though.) Perhaps that can be reused. This almost seems like something that should be more easily available. |
I found safe_repr() from Lib/unittest/util.py. We would require a similar function, just implemented in C. What is a good place to define such C helpers that could be used everywhere? |
The functions in Lib/unittest/util.py shouldn't be used outside unittest.
You could start by adding it in the file where you need it. If it starts becoming useful elsewhere too, it can then be moved somewhere else. I would expect such function to be private, so we are free to move it whenever we need it. |
Ezio Melotti wrote:
Well, msg147109 already lists 6 files that would use this function. |
Please don't stress too much about providing an indication that the repr has been truncated - it's an error message, not part of the normal program output. Besides, the lack of a closing ')', ']', '}' or '>' will usually indicate something is amiss in long reprs. More useful would be to raise a separate feature request about the lack of width and precision handling for string formatting in PyUnicode_FromFormatV - the common format code handling means it *accepts* the width and precision information for string codes, but then proceeds to completely ignore it. It should be using them to pad short strings (width) and truncate long ones (precision), just like printf() (only in terms of code points rather than bytes, since those codes work with Unicode text). |
Nick Coghlan wrote:
Ok. This makes things easier.
Created bpo-13428. |
Working on issue as part of Python Bug Day, Oct 2012. |
It seems that this has been fixed. Using simple tests from msg147215: ~/hg/cpython/working $ env-py3.3/bin/python
Python 3.3.0 (default, Nov 3 2012, 15:28:29)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> [].index(list(range(1000)))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: [0, 1, 2, 3, 4, 5, ... 995, 996, 997, 998, 999] is not in list
[60649 refs]
>>> class Foo: pass
...
[60679 refs]
>>> [].index(Foo())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: <__main__.Foo object at 0x7f7fad31bc30> is not in list
[60677 refs] |
The original example was about tuples and displaying 'x' instead of '4'. Have that and *all* the others been fixed? |
After discussing with folks on the #python-dev tonight, I learned that I was testing with a list and I should've been using a set. I'm working on a patch now, and I'm almost ready to have it reviewed. |
Ready for review. |
You should use assertRaises/assertRaisesRegex to check the error message and add more tests to cover all the changes. I also noticed an inconsistence between 'element not in container' and 'element is not in container' (note the extra 'is'). FWIW I prefer the version without 'is'. |
Tests updated for coverage and to use assertRaisesRegex. |
From Taggnostr on #python-dev: 1.) Use assertRaises+assertIn instead of assertRaisesRegex |
Truncating repr strings to 100chars will require the patch from bpo-7330. After applying the patch from that issue, my tests work fine. |
Updated patch after taking into account Ezio's (aka Taggnostr on #python-dev) latest feedback. |
Updates after feedback from Serhiy. |
test_issuewhatever is a bit indescriptive. It'd be easier for someone glancing at the test to see what it claims to be testing if there were a more descriptive name given, like perhaps test_exception_message (or something even more verbose). |
Or test_index()/test_remove(). In this case positive test cases (and may be other exception types) should be added. |
Lib/test/test_array.py Lib/test/test_deque.py Lib/test/test_list.py Lib/test/test_tuple.py Lib/test/test_xml_etree.py |
See also my comments to previous patch about repr() and error message checking. |
Update based on Taggnostr's (Ezio on IRC, if I recall correctly) feedback from 11/12/2012 around 11:57 PST.
|
Confirmed with Ezio that issue bpo-7330 will need to be fixed/approved before this issue can be closed. |
FYI I just fixed this issue. PyUnicode_FromFormat() now supports width and precision in the format string. |
@sean Ochoa, do you want to make this into a PR? The only tweak I would suggest would be to change all error messages to either be:
or:
also, this probably needs to be changed to version 3.7 now. |
I agree with Jim Fasarakis-Hilliard this message may be change in new 3.7 version |
Additional instances of this:
|
>>> [1, 2, 3].index("'")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: ""'"" not in list Aren't too much quotes? |
Yes, that does look like too much. My rationale for adding quotes around the value was in order to make it more clear in cases where the repr exceeds 100 characters. Instead of: Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2 not in list Have it clearly distinguished by using "": Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 2" not in list I'm not sure if there's a trivial way to not display so many quotes in the case you supplied, you're better suited to decide if this can be done somehow. |
Could use |
Error messages should use repr(). For example str() of bytes object emits a warning or error. See bpo-26090 for the issue with truncated reprs. |
See http://bugs.python.org/issue30477 for why I think this shouldn't be done. In short, the IndexError may be part of control flow (looping over successive matches) rather than an error (this patch adds a time/space cost with no benefit in those cases). The repr itself may be large resulting in an unusable error message. The repr may be expensive to compute for some objects. |
A potential compromise would be if ValueError gained a 'value' attribute. That would allow for the exception message to be static but still provide the information for introspection later if desired to figure out exactly what object led to the cause of the ValueError. If I remember correctly previous objections to this idea was worry about memory, though, so it seems attaching more information about the trigger of a ValueError will inevitably lead to more cost somehow. |
Thinking back over my 17 years of using Python, I really don't think this is needed at all. I can't think of a single situation where it would have helped. In some cases, the change would be detrimental to readability, introducing clutter into an otherwise clear error message. The OP's example of "ValueError: 4 is not in list" is atypical. A more typical example would be "ValueError: <main.TensorElementPlaceholder object at 0x106489e48> is not in list". As a Python instructor, I've become acutely aware that tracebacks already have usability issues due to visual clutter and an unpleasant overall appearance. IMO, this proposal would sometimes make it worse by adding more noise for the reader to filter out. Brett's idea is an improvement to the original proposal. By attaching the object to the exception rather than squeezing its representation into the string, you avoid making the error message hard to read and avoid the speed/space costs from computing the __repr__ on arbitrary objects. That said, I don't think there is any programmatic benefit to having the object in the exception. When you call "a.remove(b)", you already have "b" and it would be pointless to re-extract it with "except ValueError as e: b = e.args[-1]". I'm also concerned that the proposed changes sweep across the core affecting code that has been deployed and stable for almost two decades. Given the dubious benefits and potential detriments, I vote for declining this proposal and favoring the stable deployed code that has worked fine for many years. |
I'm fine with not changing this, but I did want to say that I don't think you're guaranteed to have access to the object that triggered the ValueError. If the value that is searched for is in some extension module where it's calculated using C code then you have no way of accessing that value unless the extension module was nice enough to provide the object in the exception, give the repr in the exception message, or document how exactly the transformation from input to what you searched for is. |
Thanks Brett. I'm going to mark this as closed/rejected. After five years, it is time to put it to rest and let it stop consuming our mental clock cycles.
I'm not sure I follow what you're saying. To call index() or remove() you already have to have the object to search for. The ValueError is raised when the searched for object is not found, not by some unknown object inside the container. |
While I agree it's reasonable to keep arbitrary value repr's out of these messages by default, I do think it would be desirable for someone sufficiently motivated to file a separate RFE about adding the value as a structured exception attribute so that custom sys.excepthook implementations and other "unhandled runtime exception" loggers may choose to handle the situation differently. This wouldn't be about the cases where code is handling the remove() or index() call directly, but rather about those where third party library code isn't handling missing values correctly, so the debugging scenario a developer is faced with looks more like this than it does the examples in the OP here:
Getting that kind of traceback in the logs for a production failure in a distributed system is all kinds of frustrating since you *know* the interpreter knew what "x" was when the error happened, it just isn't passing that information along to the developer that's trying to figure out how and why it broke. |
I concur with Raymond. But shouldn't we change error messages that contains the repr of not found value? Affected collections are list, deque and range. >>> [].index('spam')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 'spam' is not in list
>>> import collections
>>> collections.deque().index('spam')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 'spam' is not in deque range.index() raises different error messages depending on the type of the value: >>> range(10).index('spam')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: sequence.index(x): x not in sequence
>>> range(10).index(42)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 42 is not in range |
Thanks, I understand why this isn't the best idea now.
That is what I was thinking too, apart from removing the repr from other messages it will also unify reporting for all these methods. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: