iterrows changes dtype of columns #3566

Closed
tebeka opened this issue May 10, 2013 · 10 comments
@tebeka

tebeka commented May 10, 2013

In [105]: df = pd.DataFrame([[1, 1.0]], columns=['x', 'y'])
In [112]: row = next(df.iterrows())[1]
In [113]: row
Out[113]: 
x    1
y    1
Name: 0, dtype: float64
In [114]: row['x'].dtype
Out[114]: dtype('float64')
In [115]: df['x'].dtype
Out[115]: dtype('int64')
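[Editor's sketch, not part of the original report: each row from iterrows() comes back as a Series, which holds a single dtype, so the int64 value is upcast to float64; itertuples() is a common alternative because it preserves per-column values.]

```python
import pandas as pd

# iterrows() yields each row as a Series; a Series has one dtype,
# so mixing an int64 and a float64 column upcasts the row to float64.
df = pd.DataFrame([[1, 1.0]], columns=['x', 'y'])
_, row = next(df.iterrows())
print(row.dtype)       # float64: the int from 'x' was upcast
print(df['x'].dtype)   # int64: the column itself is unchanged

# itertuples() preserves the per-column values without upcasting.
tup = next(df.itertuples())
print(tup.x)           # 1, not 1.0
```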
@jreback
Contributor

jreback commented May 10, 2013

this is by definition correct; dtypes are maintained only in the columns

the row's dtype is the lowest common denominator of the column dtypes, which in this case is float
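[Editor's illustration, not from the comment: the "lowest common denominator" is NumPy's type promotion, which you can query directly.]

```python
import numpy as np

# Promoting int64 with float64 yields float64; two integer types
# promote to the wider integer type.
common = np.result_type(np.int64, np.float64)
print(common)                              # float64
print(np.result_type(np.int32, np.int64))  # int64
```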

@cpcloud
Member

cpcloud commented May 10, 2013

additionally, it's probably very inefficient to keep track of the dtype of individual elements given the way Python's type system works, because at that point you have an object array anyway. you could just transpose the array if your use case allows that. that's not really a solution, more of a workaround

@jreback
Contributor

jreback commented May 10, 2013

@cpcloud actually the issue is that the cross section happens to be Series([1, 1.0]), which is float64

there are methods that used to try to convert this to int64, but in general it's very hard to figure out when to do this without a reference (e.g. in a groupby it's somewhat easier, so we do it there), but here it's actually undeterministic, so it stays
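[Editor's example of the groupby case mentioned above, not from the thread: with the original column dtype as a reference, an aggregation can keep int64.]

```python
import pandas as pd

# A groupby aggregation has the source column as a reference,
# so the int64 dtype survives the operation.
df = pd.DataFrame({'key': ['a', 'a', 'b'], 'x': [1, 2, 3]})
out = df.groupby('key')['x'].sum()
print(out.dtype)   # int64, not float64
```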

@cpcloud
Member

cpcloud commented May 10, 2013

@jreback right, df rows are Series. i know, i was just trying to think (aloud) about how you would even do this without having each element keep track of its own dtype (I can imagine a class called PandasElement or something like that), and the object array seemed the easiest (or just the first thing that came to mind). i see how groupby would be easier, though. what do you mean by undeterministic here?

@jreback
Contributor

jreback commented May 10, 2013

@cpcloud what I mean is that w/o a reference, I don't want to convert Series([1, 1.0]) to int64. It is float64; the fact that it is equivalent to an int64 is true. (I don't really mean undeterministic, it was just a good word at the time :)

And converting like this (w/o user input, say, or a reference frame) means a lot of the time we have to convert back when NaNs are introduced. So the bottom line is: most operations try to keep the same dtype (and dtype flavor, e.g. float32, if at all possible). It is a bug if the dtype is not kept in most cases (but not this one!)
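[Editor's illustration of why a speculative downcast would often be undone: introducing a missing value forces int64 back to float64, since NaN cannot be stored in an integer array.]

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.dtype)                     # int64
widened = s.reindex([0, 1, 2, 3])  # index 3 is missing, so NaN is inserted
print(widened.dtype)               # float64: NaN forces the upcast
```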

@cpcloud
Member

cpcloud commented May 10, 2013

@jreback ah ok, so in the abstract either s.dtype.type is np.int64 or s.dtype.type is np.float64 could be true for this data (not both in practice, obviously), so you must make a choice. i see now why you said non-deterministic.

@tebeka
Author

tebeka commented May 10, 2013

@jreback I know that 1 == 1.0. However, I was populating a database with a TEXT column, and instead of '1' I got '1.0' in there.

I have no problem with this being the correct behavior - just document it. ;)
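[Editor's sketch of one workaround for the TEXT-column case, not suggested verbatim in the thread: convert the columns to strings while each still has its own dtype, before iterating row-wise.]

```python
import pandas as pd

df = pd.DataFrame([[1, 1.0]], columns=['x', 'y'])
as_text = df.astype(str)           # per-column conversion keeps '1' as '1'
_, row = next(as_text.iterrows())  # rows of an all-str frame stay str
print(row['x'])   # '1', not '1.0'
print(row['y'])   # '1.0'
```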

@jreback
Contributor

jreback commented May 10, 2013

@tebeka fair point

  1. Would you like to do a PR to put your example in the docstring, and add a mention in http://pandas.pydata.org/pandas-docs/dev/basics.html#iterrows (I would use the note format, search for '::note'), sort of a 'warning'?

  2. If you are going to write to a text column, I would use s.apply(lambda x: "%.2f" % x) to guarantee formatting....
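[Editor's sketch of the fixed-width formatting approach from point 2: an explicit format string applied per value gives consistent text output regardless of the column's dtype.]

```python
import pandas as pd

s = pd.Series([1, 2.5])
formatted = s.apply(lambda x: "%.2f" % x)  # format each value explicitly
print(formatted.tolist())   # ['1.00', '2.50']
```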

@cpcloud
Member

cpcloud commented May 11, 2013

this can be closed

@jreback
Contributor

jreback commented May 11, 2013

closed by #3569
