Cannot read some columns of sequential integers #35

Closed
poliquin opened this issue Jul 2, 2020 · 2 comments
Labels
enhancement (New feature or request); waiting for librdata changes (the issue needs some fixes to the C library librdata before it can be solved)

Comments

poliquin commented Jul 2, 2020

The README says integer columns are supported, but I get the error "LibrdataError: Invalid file, or file has unsupported features" when trying to read even a simple one-column data frame of sequential integers. This may be related to #30.

To Reproduce
Create the rds file from within R...

library(tidyverse)
df <- tibble(x=1:10)
write_rds(df, 'tmp.rds', compress='gz')

Now read the file from within Python...

import pyreadr
df = pyreadr.read_r('tmp.rds')
# Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#  File "/home/poliquin/anaconda3/envs/py38/lib/python3.8/site-packages/pyreadr/pyreadr.py", line 47, in read_r
#   parser.parse(path)
#  File "pyreadr/librdata.pyx", line 117, in pyreadr.librdata.Parser.parse
#  File "pyreadr/librdata.pyx", line 142, in pyreadr.librdata.Parser.parse
# pyreadr.custom_errors.LibrdataError: Invalid file, or file has unsupported features

Now, for something very strange, create the data frame in a slightly different way...

library(tidyverse)
df <- tibble(x=as.integer(c(1,2,3,4,5,6,7,8,9,10)))
write_rds(df, 'tmp.rds', compress='gz')

and everything works fine! ...

pyreadr.read_r('tmp.rds')
OrderedDict([(None,     x
0   1
1   2
2   3
3   4
4   5
5   6
6   7
7   8
8   9
9  10)])
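
As the output above shows, `read_r` returns an `OrderedDict`, and for this .rds file the data frame sits under the key `None`. A minimal sketch of pulling that single frame out follows. The helper names `single_object` and `read_single_rds` are hypothetical and not part of pyreadr, and the sketch assumes the file holds exactly one object.

```python
def single_object(result):
    """Return the only value from a read_r-style mapping.

    Hypothetical helper: raises if the mapping does not hold exactly one object.
    """
    if len(result) != 1:
        raise ValueError(f"expected exactly one object, found {len(result)}")
    return next(iter(result.values()))


def read_single_rds(path):
    """Read an .rds file with pyreadr and return its one data frame."""
    import pyreadr  # imported lazily so single_object is usable without pyreadr

    return single_object(pyreadr.read_r(path))
```

With a pyreadr version that can parse the file, `read_single_rds('tmp.rds')` would return the pandas data frame directly instead of the one-entry dictionary.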

File example
tmp.rds.zip

Expected behavior
It should not matter how the data frame is constructed in R. Reading a column of integers should work as in my second example.

Setup Information:
How did you install pyreadr? pip
Platform: Ubuntu Linux 20.04
Python Version: 3.8
Python Distribution: Anaconda
Using Virtualenv or condaenv? Yes, conda
R Version:

platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          4
minor          0.1
year           2020
month          06
day            06
svn rev        78648
language       R
version.string R version 4.0.1 (2020-06-06)
nickname       See Things Now

ofajardo (Owner) commented Jul 2, 2020

Thanks for the reproducible example. I have opened an issue with the C library to see if they can fix it.

ofajardo added the labels "enhancement" and "waiting for librdata changes" on Jul 9, 2020.

ofajardo (Owner) commented

Solved in pyreadr 0.3.4.
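
Since the fix is reported for pyreadr 0.3.4, one way to guard against the failure is to check the installed version before reading such files. This is only a sketch: the helper names are hypothetical, and `version_tuple` assumes a purely numeric dotted version string (it would not handle suffixes such as release-candidate tags).

```python
from importlib import metadata

FIXED_IN = (0, 3, 4)  # version named in the maintainer's comment


def version_tuple(text):
    """Parse a simple dotted version such as '0.3.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_fix(installed):
    """Return True if the given version string is at least 0.3.4."""
    return version_tuple(installed) >= FIXED_IN


def installed_pyreadr_has_fix():
    """Check the pyreadr version found in the current environment, if any."""
    try:
        return has_fix(metadata.version("pyreadr"))
    except metadata.PackageNotFoundError:
        return False
```

If `installed_pyreadr_has_fix()` returns `False`, upgrading pyreadr is the likely remedy for files like the first `tmp.rds` above.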
