Support categorical variables with CSVs #10153

esafak · 2015-05-16T16:38:47Z

It would be nice to be able to read CSVs with categorical variables using read_csv's dtype parameter instead of casting the columns after the fact.

TomAugspurger · 2015-05-16T18:38:53Z

I'm not opposed to this in principle, but I think the API will necessarily be clunky. Would we require (or allow) the user to specify all categories in the call to read_csv?

@esafak we do support categoricals in read/write_hdf if that's an option for you (it may not be).

esafak · 2015-05-16T18:42:02Z

Can't we already declare the dtypes of selected columns? I thought the problem was limited to categoricals, but if not, please expand my request to all dtypes.

TomAugspurger · 2015-05-16T18:48:56Z

You can specify the types. I was just thinking

pd.read_csv('file.csv', dtype={'A': np.int64, 'B': pd.CategoricalDtype(['cat1', 'cat2', 'cat3'])})

which means you'd need to know all the categories up front. Or we infer them and you'll need to check that there aren't any surprising categories.
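
For illustration, the two options can be sketched as follows. This is a minimal sketch, not something available when this comment was written: it assumes a recent pandas release in which read_csv accepts a pd.CategoricalDtype or the string 'category' in its dtype mapping. The inline CSV text and the column names are made up for the example.

```python
import io

import pandas as pd

# Hypothetical sample data; "dog" is a value the caller did not anticipate.
csv_text = "A,B\n1,cat1\n2,cat2\n3,cat1\n4,dog\n"

# Option 1: declare every category up front.
# Values outside the declared categories are read as missing (NaN).
declared = pd.CategoricalDtype(categories=["cat1", "cat2", "cat3"])
df_declared = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"A": "int64", "B": declared},
)

# Option 2: let the parser infer the categories from the data.
# The caller then has to check for surprising categories afterwards.
df_inferred = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"A": "int64", "B": "category"},
)

print(df_declared["B"].cat.categories.tolist())
print(sorted(df_inferred["B"].cat.categories))
```

With declared categories the unexpected value silently becomes missing, whereas with inferred categories it shows up as an extra category, which is the trade-off described above.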

sinhrks · 2015-07-11T21:26:17Z

Nice workaround, but I think it would still be good to support a category argument.

As a first step, how about converting the specified columns to Categorical after parsing? Though it is very nice to have optimized IO logic...
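
A rough sketch of that first step, assuming nothing beyond read_csv and Series.astype: the file is parsed as usual and the named columns are cast afterwards. The helper name read_csv_with_categories and its categorical_columns parameter are hypothetical, used only for illustration, and this does not give the optimized IO path mentioned above.

```python
import io

import pandas as pd


def read_csv_with_categories(source, categorical_columns, **kwargs):
    """Hypothetical helper: parse normally, then cast columns to category."""
    frame = pd.read_csv(source, **kwargs)
    for column in categorical_columns:
        # The cast happens after parsing, so the column is first
        # materialized with its ordinary dtype and only then converted.
        frame[column] = frame[column].astype("category")
    return frame


# Made-up inline data standing in for a real file.
df = read_csv_with_categories(io.StringIO("A,B\n1,x\n2,y\n3,x\n"), ["B"])
print(df.dtypes)
```

Because the conversion runs after parsing, the memory savings of a categorical column only apply once the cast has finished.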

TomAugspurger added the API Design and IO CSV (read_csv, to_csv) labels May 16, 2015

sinhrks mentioned this issue Jul 11, 2015

ENH: Allow read_csv dtype to accept category #10551

Closed

jreback added this to the 0.18.2 milestone May 9, 2016

jreback added Difficulty Intermediate labels May 9, 2016

This was referenced Jun 4, 2016

API/ENH: union Categorical #13361

Closed

ENH: parse categoricals in read_csv #13406

Closed

jreback closed this as completed in a292c13 Aug 6, 2016
