iterrows changes dtype of columns #3566

Closed
tebeka opened this issue May 10, 2013 · 10 comments
@tebeka

tebeka commented May 10, 2013

In [105]: df = pd.DataFrame([[1, 1.0]], columns=['x', 'y'])
In [112]: row = next(df.iterrows())[1]
In [113]: row
Out[113]: 
x    1
y    1
Name: 0, dtype: float64
In [114]: row['x'].dtype
Out[114]: dtype('float64')
In [115]: df['x'].dtype
Out[115]: dtype('int64')
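[Editor's sketch, not part of the original report: each row from iterrows() comes back as a Series, which holds a single dtype, so the int64 value is upcast to float64; itertuples() is a common alternative because it preserves per-column values.]

```python
import pandas as pd

# iterrows() yields each row as a Series; a Series has one dtype,
# so mixing an int64 and a float64 column upcasts the row to float64.
df = pd.DataFrame([[1, 1.0]], columns=['x', 'y'])
_, row = next(df.iterrows())
print(row.dtype)       # float64: the int from 'x' was upcast
print(df['x'].dtype)   # int64: the column itself is unchanged

# itertuples() preserves the per-column values without upcasting.
tup = next(df.itertuples())
print(tup.x)           # 1, not 1.0
```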
@jreback
Contributor

jreback commented May 10, 2013

this is by definition correct; dtypes are maintained only in the columns

the row's dtype is the lowest common denominator of the column dtypes, which in this case is float
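[Editor's illustration, not from the comment: the "lowest common denominator" is NumPy's type promotion, which you can query directly.]

```python
import numpy as np

# Promoting int64 with float64 yields float64; two integer types
# promote to the wider integer type.
common = np.result_type(np.int64, np.float64)
print(common)                              # float64
print(np.result_type(np.int32, np.int64))  # int64
```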

@cpcloud
Member

cpcloud commented May 10, 2013

additionally, it's probably very inefficient to keep track of the dtype of individual elements given the way Python's type system works, because at that point you have an object array anyway. you could just transpose the array if your use case allows that. that's not really a solution, more of a workaround

@jreback
Contributor

jreback commented May 10, 2013

@cpcloud actually the issue is that the cross section happens to be Series([1, 1.0]), which is float64

there are methods that used to try to convert this to int64, but in general it's very hard to figure out when to do this without a reference (e.g. in a groupby it's somewhat easier, so we do it there), but here it's actually undeterministic, so it stays
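[Editor's example of the groupby case mentioned above, not from the thread: with the original column dtype as a reference, an aggregation can keep int64.]

```python
import pandas as pd

# A groupby aggregation has the source column as a reference,
# so the int64 dtype survives the operation.
df = pd.DataFrame({'key': ['a', 'a', 'b'], 'x': [1, 2, 3]})
out = df.groupby('key')['x'].sum()
print(out.dtype)   # int64, not float64
```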

@cpcloud
Member

cpcloud commented May 10, 2013

@jreback right, df rows are Series. i know, i was just trying to think (aloud) about how you would even do this without having each element keep track of its own dtype (I can imagine a class called PandasElement or something like that), and the object array seemed the easiest (or just the first thing that came to mind). i see how groupby would be easier, though. what do you mean by undeterministic here?

@jreback
Contributor

jreback commented May 10, 2013

@cpcloud what I mean is that w/o a reference, I don't want to convert Series([1, 1.0]) to int64. It is float64; the fact that it is equivalent to an int64 is true. (I don't really mean undeterministic, it was just a good word at the time :)

And converting like this (w/o user input, say, or a reference frame) means a lot of the time we have to convert back when NaNs are introduced. So the bottom line is: most operations try to keep the same dtype (and dtype flavor, e.g. float32, if at all possible). It is a bug if the dtype is not kept in most cases (but not this one!)
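[Editor's illustration of why a speculative downcast would often be undone: introducing a missing value forces int64 back to float64, since NaN cannot be stored in an integer array.]

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.dtype)                     # int64
widened = s.reindex([0, 1, 2, 3])  # index 3 is missing, so NaN is inserted
print(widened.dtype)               # float64: NaN forces the upcast
```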

@cpcloud
Member

cpcloud commented May 10, 2013

@jreback ah ok, so in the abstract either s.dtype.type is np.int64 or s.dtype.type is np.float64 could be true for this data (not both in practice, obviously), so you must make a choice. i see now why you said non-deterministic.

@tebeka
Author

tebeka commented May 10, 2013

@jreback I know that 1 == 1.0. However, I was populating a database with a TEXT column, and instead of '1' I got '1.0' in there.

I have no problem with this being the correct behavior - just document it. ;)
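[Editor's sketch of one workaround for the TEXT-column case, not suggested verbatim in the thread: convert the columns to strings while each still has its own dtype, before iterating row-wise.]

```python
import pandas as pd

df = pd.DataFrame([[1, 1.0]], columns=['x', 'y'])
as_text = df.astype(str)           # per-column conversion keeps '1' as '1'
_, row = next(as_text.iterrows())  # rows of an all-str frame stay str
print(row['x'])   # '1', not '1.0'
print(row['y'])   # '1.0'
```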

@jreback
Contributor

jreback commented May 10, 2013

@tebeka fair point

  1. Would you like to do a PR to put your example in the docstring, and add a mention in http://pandas.pydata.org/pandas-docs/dev/basics.html#iterrows (I would use the note format, search for '::note'), sort of a 'warning'?

  2. If you are going to write to a text column, I would use s.apply(lambda x: "%.2f" % x) to guarantee formatting....
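[Editor's sketch of the fixed-width formatting approach from point 2: an explicit format string applied per value gives consistent text output regardless of the column's dtype.]

```python
import pandas as pd

s = pd.Series([1, 2.5])
formatted = s.apply(lambda x: "%.2f" % x)  # format each value explicitly
print(formatted.tolist())   # ['1.00', '2.50']
```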

@cpcloud
Member

cpcloud commented May 11, 2013

this can be closed

@jreback
Contributor

jreback commented May 11, 2013

closed by #3569
