Commit 8fd654c

API: remove compat keyword in Categorical constructor
Before the new Categorical work, the default two-argument constructor expected "codes and levels". This was changed to "values and levels", and a 'compat' kwarg was added that could be used to switch back to the old-style constructor usage. It was then decided to make the new-style constructor usage the default (compat=False) and to implement a 'from_codes(...)' constructor. Since the compat codepath is now never triggered and code should be changed to use the new 'from_codes()' constructor, remove the old codepath. Add warnings for the cases where we are fairly sure that the old-style constructor is meant; unfortunately, this does not catch all cases.
1 parent 206fb97 commit 8fd654c
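
The difference between the two readings of a two-argument call can be sketched in plain Python. This is a toy model under stated assumptions, not pandas code: `decode_codes` and `encode_values` are hypothetical helpers that stand in for the old "codes and levels" reading and the new "values and levels" reading, and -1 marks a value that is not among the levels.

```python
# Toy model of the two interpretations; not the pandas implementation.
# decode_codes and encode_values are hypothetical helper names.

def decode_codes(codes, levels):
    """Old reading: each entry is an integer position into levels (-1 = missing)."""
    return [None if c == -1 else levels[c] for c in codes]


def encode_values(values, levels):
    """New reading: each entry is a value; anything not in levels gets code -1."""
    position = {level: i for i, level in enumerate(levels)}
    return [position.get(v, -1) for v in values]


levels = [1, 2, 3]
first_arg = [1, 2]

# Old reading: positions 1 and 2 of levels.
old_result = decode_codes(first_arg, levels)

# New reading: the values 1 and 2 themselves, encoded and decoded again.
new_codes = encode_values(first_arg, levels)
new_result = decode_codes(new_codes, levels)

print(old_result)  # [2, 3]
print(new_result)  # [1, 2]
```

Both readings accept the same arguments without error but produce different data, which is why the change can introduce subtle bugs in code that was written against the old constructor.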

3 files changed, +45 -50 lines changed

doc/source/v0.15.0.txt

Lines changed: 11 additions & 3 deletions
@@ -40,6 +40,13 @@ users upgrade to this version.
   but instead subclass ``PandasObject``, similarly to the rest of the pandas objects. This change allows very easy sub-classing and creation of new index types. This should be
   a transparent change with only very limited API implications (See the :ref:`Internal Refactoring <whatsnew_0150.refactoring>`)
 
+.. warning::
+
+   The refactorings in :class:`~pandas.Categorical` changed the two argument constructor from
+   "codes/labels and levels" to "values and levels". This can lead to subtle bugs. If you use
+   :class:`~pandas.Categorical` directly, please audit your code before updating to this pandas
+   version and change it to use the :meth:`~pandas.Categorical.from_codes` constructor.
+
 .. _whatsnew_0150.api:
 
 API changes
@@ -575,9 +582,10 @@ For full docs, see the :ref:`Categorical introduction <categorical>` and the
 - ``pandas.core.group_agg`` and ``pandas.core.factor_agg`` were removed. As an alternative, construct
   a dataframe and use ``df.groupby(<group>).agg(<func>)``.
 
-- Supplying "codes/labels and levels" to the :class:`~pandas.Categorical` constructor is deprecated and does
-  not work without supplying ``compat=True``. The default mode now uses "values and levels".
-  Please change your code to use the :meth:`~pandas.Categorical.from_codes` constructor.
+- Supplying "codes/labels and levels" to the :class:`~pandas.Categorical` constructor is not
+  supported anymore. Supplying two arguments to the constructor is now interpreted as
+  "values and levels". Please change your code to use the :meth:`~pandas.Categorical.from_codes`
+  constructor.
 
 - The ``Categorical.labels`` attribute was renamed to ``Categorical.codes`` and is read
   only. If you want to manipulate codes, please use one of the

pandas/core/categorical.py

Lines changed: 19 additions & 24 deletions
@@ -127,8 +127,6 @@ class Categorical(PandasObject):
     name : str, optional
         Name for the Categorical variable. If name is None, will attempt
         to infer from values.
-    compat : boolean, default=False
-        Whether to treat values as codes to the levels (old API, deprecated)
 
     Attributes
     ----------
@@ -197,7 +195,7 @@ class Categorical(PandasObject):
     # For comparisons, so that numpy uses our implementation if the compare ops, which raise
     __array_priority__ = 1000
 
-    def __init__(self, values, levels=None, ordered=None, name=None, fastpath=False, compat=False):
+    def __init__(self, values, levels=None, ordered=None, name=None, fastpath=False):
 
         if fastpath:
             # fast path
@@ -257,32 +255,29 @@ def __init__(self, values, levels=None, ordered=None, name=None, fastpath=False,
                     raise TypeError("'values' is not ordered, please explicitly specify the level "
                                     "order by passing in a level argument.")
         else:
-            # there are two ways if levels are present
-            # the old one, where each value is a int pointer to the levels array
-            # the new one, where each value is also in the level array (or np.nan)
+            # there were two ways if levels are present
+            # - the old one, where each value is a int pointer to the levels array -> not anymore
+            #   possible, but code outside of pandas could call us like that, so make some checks
+            # - the new one, where each value is also in the level array (or np.nan)
 
             # make sure that we always have the same type here, no matter what we get passed in
             levels = self._validate_levels(levels)
 
-            # There can be two ways: the old which passed in codes and levels directly
-            # and values have to be inferred and the new one, which passes in values and levels
-            # and _codes have to be inferred.
-
-            # min and max can be higher and lower if not all levels are in the values
-            if compat and (com.is_integer_dtype(values) and
-                               (np.min(values) >= -1) and (np.max(values) < len(levels))):
-                warn("Using 'values' as codes is deprecated.\n"
-                     "'Categorical(... , compat=True)' is only there for historical reasons and "
-                     "should not be used in new code!\n"
-                     "See https://github.com/pydata/pandas/pull/7217", FutureWarning)
-                codes = values
-            else:
-                codes = _get_codes_for_values(values, levels)
+            codes = _get_codes_for_values(values, levels)
 
-                # if we got levels, we can assume that the order is intended
-                # if ordered is unspecified
-                if ordered is None:
-                    ordered = True
+            # TODO: check for old style usage. These warnings should be removes after 0.18/ in 2016
+            if com.is_integer_dtype(values) and not com.is_integer_dtype(levels):
+                warn("Values and Levels have different dtypes. Did you mean to use\n"
+                     "'Categorical.from_codes(codes, levels)'?", RuntimeWarning)
+
+            if com.is_integer_dtype(values) and (codes == -1).all():
+                warn("None of the levels were found in values. Did you mean to use\n"
+                     "'Categorical.from_codes(codes, levels)'?", RuntimeWarning)
+
+            # if we got levels, we can assume that the order is intended
+            # if ordered is unspecified
+            if ordered is None:
+                ordered = True
 
         self.ordered = False if ordered is None else ordered
         self._codes = codes

pandas/tests/test_categorical.py

Lines changed: 15 additions & 23 deletions
@@ -39,31 +39,8 @@ def test_constructor_unsortable(self):
         self.assertFalse(factor.ordered)
 
     def test_constructor(self):
-        # There are multiple ways to call a constructor
 
-        # old style: two arrays, one a pointer to the labels
-        # old style is now only available with compat=True
         exp_arr = np.array(["a", "b", "c", "a", "b", "c"])
-        with tm.assert_produces_warning(FutureWarning):
-            c_old = Categorical([0,1,2,0,1,2], levels=["a","b","c"], compat=True)
-            self.assert_numpy_array_equal(c_old.__array__(), exp_arr)
-        # the next one are from the old docs
-        with tm.assert_produces_warning(FutureWarning):
-            c_old2 = Categorical([0, 1, 2, 0, 1, 2], [1, 2, 3], compat=True)
-            self.assert_numpy_array_equal(c_old2.__array__(), np.array([1, 2, 3, 1, 2, 3]))
-        with tm.assert_produces_warning(FutureWarning):
-            c_old3 = Categorical([0,1,2,0,1,2], ['a', 'b', 'c'], compat=True)
-            self.assert_numpy_array_equal(c_old3.__array__(), np.array(['a', 'b', 'c', 'a', 'b', 'c']))
-
-        with tm.assert_produces_warning(FutureWarning):
-            cat = pd.Categorical([1,2], levels=[1,2,3], compat=True)
-            self.assert_numpy_array_equal(cat.__array__(), np.array([2,3]))
-
-        with tm.assert_produces_warning(None):
-            cat = pd.Categorical([1,2], levels=[1,2,3], compat=False)
-            self.assert_numpy_array_equal(cat.__array__(), np.array([1,2]))
-
-        # new style
         c1 = Categorical(exp_arr)
         self.assert_numpy_array_equal(c1.__array__(), exp_arr)
         c2 = Categorical(exp_arr, levels=["a","b","c"])
@@ -174,6 +151,21 @@ def f():
         self.assertTrue(len(cat.codes) == 1)
         self.assertTrue(cat.codes[0] == 0)
 
+        # Catch old style constructor useage: two arrays, codes + levels
+        # We can only catch two cases:
+        #  - when the first is an integer dtype and the second is not
+        #  - when the resulting codes are all -1/NaN
+        with tm.assert_produces_warning(RuntimeWarning):
+            c_old = Categorical([0,1,2,0,1,2], levels=["a","b","c"])
+
+        with tm.assert_produces_warning(RuntimeWarning):
+            c_old = Categorical([0,1,2,0,1,2], levels=[3,4,5])
+
+        # the next one are from the old docs, but unfortunately these don't trigger :-(
+        with tm.assert_produces_warning(None):
+            c_old2 = Categorical([0, 1, 2, 0, 1, 2], [1, 2, 3])
+            cat = Categorical([1,2], levels=[1,2,3])
+
     def test_constructor_with_generator(self):
         # This was raising an Error in isnull(single_val).any() because isnull returned a scalar
         # for a generator
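
The two checks added in this commit, and the cases they miss, can be modelled in plain Python. This is a hedged sketch, not the pandas code: `looks_like_old_style` and `_all_ints` are hypothetical names, and a simple `isinstance` test stands in for `com.is_integer_dtype`, which is an assumption made to keep the example free of dependencies.

```python
import warnings

# Toy model of the two heuristics; not the pandas implementation.
# looks_like_old_style and _all_ints are hypothetical helper names.

def _all_ints(seq):
    # Stand-in for an integer-dtype check; bool is excluded on purpose.
    return all(isinstance(x, int) and not isinstance(x, bool) for x in seq)


def looks_like_old_style(values, levels):
    """Warn and return True if the call resembles old-style codes plus levels."""
    if not values or not _all_ints(values):
        return False
    position = {level: i for i, level in enumerate(levels)}
    codes = [position.get(v, -1) for v in values]
    dtype_mismatch = not _all_ints(levels)       # integer values, non-integer levels
    nothing_found = all(c == -1 for c in codes)  # no value occurs in levels
    if dtype_mismatch or nothing_found:
        warnings.warn("Did you mean to use from_codes(codes, levels)?", RuntimeWarning)
        return True
    return False


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the demo output quiet
    caught_a = looks_like_old_style([0, 1, 2, 0, 1, 2], ["a", "b", "c"])  # dtype mismatch
    caught_b = looks_like_old_style([0, 1, 2, 0, 1, 2], [3, 4, 5])        # nothing found
    missed_c = looks_like_old_style([0, 1, 2, 0, 1, 2], [1, 2, 3])        # some values match
    missed_d = looks_like_old_style([1, 2], [1, 2, 3])                    # all values match

print(caught_a, caught_b, missed_c, missed_d)  # True True False False
```

The last two calls mirror the test cases taken from the old docs: when integer codes happen to overlap with integer levels, neither check fires, so such call sites have to be found by auditing the code.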
