Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Regarding the pyarrow engine for read_csv():
Currently, pandas does not appear to pass the column_types parameter to ConvertOptions when it calls pyarrow.csv.read_csv(), even though column_types seems directly analogous to the dtype option of pd.read_csv().
If provided, column_types would disable type inference for those columns and should improve performance.
Currently, the dtype parameter passed to pd.read_csv() is only used to convert the column types of the DataFrame after it has been produced by pyarrow.csv.read_csv(...).to_pandas(), so it does not improve the performance of pyarrow.csv.read_csv() itself.
Feature Description
All that would be needed is a mapping from pandas dtypes to PyArrow types (perhaps one already exists).
This mapping could then be used to build column_types from dtype and pass it to ConvertOptions.
Alternative Solutions
Additional Context
No response