ENH: Add engine_kwargs to read_csv #52301

Open
@Finndersen

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Regarding the pyarrow engine for read_csv():

Currently, it appears that pandas does not pass the column_types parameter to the ConvertOptions it gives pyarrow.csv.read_csv(), even though column_types is closely analogous to the dtype option of pd.read_csv().

If provided, column_types would disable type inference for those columns, which should improve parsing performance.

Currently, the dtype argument passed to pd.read_csv() is only used to cast the columns of the DataFrame after it has been produced by pyarrow.csv.read_csv().to_pandas(), so it does not improve the performance of pyarrow.csv.read_csv() itself.

Feature Description

All that should be needed is a mapping from pandas dtypes to PyArrow data types (perhaps one already exists?). This mapping could then be used to build column_types from dtype and pass it to ConvertOptions.

Alternative Solutions

Additional Context

No response
