
BUG: Fix a bug when using DataFrame.to_records with unicode column names #13462


Closed
wants to merge 11 commits into from

Conversation

AlexisMignon
Contributor

@AlexisMignon AlexisMignon commented Jun 16, 2016

Fix a bug when using DataFrame.to_records with unicode column names in python 2

@jreback
Contributor

jreback commented Jun 16, 2016

tests please

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 16, 2016
def test_to_records_with_unicode_column_names(self):
    # Issue #11879. to_records used to raise an exception when used
    # with column names containing non-ASCII characters in Python 2
    try:
Member


Isn't this being fixed by this PR? If so, remove try-except block and compare the result with the expected result.

Contributor Author


The expected result is that it doesn't raise an exception.

Member


Then, you can remove try-except and compare the result with expected array. We don't use self.fail often because any error results in failure.
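For illustration, here is a minimal sketch of the point made above, using only the standard `unittest` module. `to_records_stub` is a hypothetical stand-in, not pandas code, and the test classes are wrapped in a function only so that test collectors do not pick them up. It shows a test that compares a result with an expected value directly, and a test whose uncaught exception is reported by the runner without any try-except or `self.fail`.

```python
import unittest


def run_demo():
    """Run two tiny test cases and return their unittest results."""

    def to_records_stub(columns):
        # Hypothetical stand-in for the call under test.
        return list(columns)

    class DirectComparison(unittest.TestCase):
        def test_result_matches_expected(self):
            result = to_records_stub(["accented_name_\u00e9"])
            expected = ["accented_name_\u00e9"]
            self.assertEqual(result, expected)

    class UncaughtError(unittest.TestCase):
        def test_raises(self):
            # No try/except and no self.fail: the exception is reported as is.
            raise UnicodeEncodeError(
                "ascii", "\u00e9", 0, 1, "ordinal not in range(128)"
            )

    def run(case):
        suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
        result = unittest.TestResult()
        suite.run(result)
        return result

    return run(DirectComparison), run(UncaughtError)


passed, errored = run_demo()
print(passed.wasSuccessful(), errored.wasSuccessful(), len(errored.errors))
```

The second case is recorded as an error, so the run is not successful even though the test body never calls `self.fail`.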

@jreback
Contributor

jreback commented Sep 9, 2016

can you rebase / update?

@codecov-io

codecov-io commented Sep 26, 2016

Current coverage is 84.75% (diff: 100%)

Merging #13462 into master will decrease coverage by <.01%

@@             master     #13462   diff @@
==========================================
  Files           145        145          
  Lines         51139      51139          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43344      43343     -1   
- Misses         7795       7796     +1   
  Partials          0          0          

Powered by Codecov. Last update 136a6fb...39e8226

@AlexisMignon
Contributor Author

I've resynced my branch but it still fails. Not sure whether it is specifically due to my code though.

@jreback
Contributor

jreback commented Nov 25, 2016

can you rebase / update according to comments

@jreback
Contributor

jreback commented Dec 26, 2016

closing as stale. if you want to update, pls comment.

@jreback jreback closed this Dec 26, 2016
Changed the way dtype is specified in to_records in order to allow unicode field names.
@AlexisMignon
Contributor Author

I changed the test as requested. Note that I also fixed another problem: numpy does not allow a dtype with unicode field names to be specified as a list of tuples, but it does allow it when the dtype is specified as a dictionary.
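As a rough sketch of the two dtype spellings mentioned above (assuming a recent NumPy on Python 3, where both forms accept non-ASCII names; the Python 2 limitation described in this thread is not reproduced here), the field names and formats below are illustrative only:

```python
import numpy as np

# Illustrative field names and formats, not taken from the pandas source.
names = ["accented_name_é", "value"]
formats = ["<f8", "<i8"]

# Dictionary form: names and formats are given as parallel lists.
dict_dtype = np.dtype({"names": names, "formats": formats})

# List-of-tuples form: one (name, format) pair per field.
tuple_dtype = np.dtype(list(zip(names, formats)))

# On Python 3 both spellings describe the same structured dtype.
print(dict_dtype == tuple_dtype, len(dict_dtype.names))
```

The dictionary form is the one the change relies on, because it was the spelling that accepted unicode field names on Python 2 according to the comment above.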

@jreback jreback reopened this Dec 27, 2016
# with column names containing non-ASCII characters in Python 2
result = DataFrame(data={u"accented_name_é": [1.0]}).to_records()
# Note that numpy allows for unicode field names but dtypes need
# to be specified using a dictionary instead of a list of tuples.
Contributor


do you have a reference for this? is it listed as a numpy bug? (if not it should be)

Contributor Author


This is referenced here:
numpy/numpy#2407
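For readers following along, a hedged sketch of how the reworked test could compare the result with an expected record array built from a dictionary dtype. This is an approximation written for Python 3 with current pandas and NumPy, not the exact test from the PR; the `"index"` field name and the `<i8`/`<f8` formats are assumptions about the default output of `to_records`.

```python
import numpy as np
import pandas as pd

# Same one-column frame as in the test excerpt above.
df = pd.DataFrame(data={"accented_name_é": [1.0]})
result = df.to_records()

# Expected record array, with the dtype given in dictionary form.
# The "index" field name and the formats are assumptions about the defaults.
expected = np.rec.array(
    [(0, 1.0)],
    dtype={"names": ["index", "accented_name_é"], "formats": ["<i8", "<f8"]},
)

# Compare field names and values rather than only checking that nothing raised.
assert result.dtype.names == expected.dtype.names
assert result.tolist() == expected.tolist()
```

Comparing names and values keeps the check independent of platform details such as byte order, which a strict dtype equality check would also cover.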

@jreback
Contributor

jreback commented Dec 27, 2016

pls add a whatsnew for 0.20.0. lgtm otherwise.

@jreback jreback added this to the 0.20.0 milestone Dec 30, 2016
@jreback
Contributor

jreback commented Feb 27, 2017

can you rebase / update

@jreback jreback closed this in 25dcff5 Feb 27, 2017
@jreback
Contributor

jreback commented Feb 27, 2017

thanks @AlexisMignon

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017

Successfully merging this pull request may close these issues.

UnicodeEncodeError from DataFrame.to_records
4 participants