ENH: Improved `CategoricalDtype` subtype handling.

### Feature Type

- [X] Adding new functionality to pandas

- [X] Changing existing functionality in pandas

- [ ] Removing existing functionality in pandas


### Problem Description

Internally, categoricals already distinguish between different subtypes. Consider, for example:

```python
import pandas as pd
s = pd.Series(["foo", "bar"], dtype=object)
print(s.astype("category"))
print(s.astype("string").astype("category"))
```

In the first case, the `dtype.categories` of the result is `Index(['bar', 'foo'], dtype='object')`; in the second case it is `Index(['bar', 'foo'], dtype='string')`.

However, handling these subtypes is currently a bit awkward. The proposed features are quality-of-life improvements for working with this kind of data, mainly:

1. Allow direct casting to categories of a specific subtype via `.astype("category[<type>]")`.
2. Ensure that subtypes round-trip when serializing to formats that support categorical types.

The following snippet exercises such a round trip through parquet:

```python
import pandas as pd

df = pd.DataFrame({"col": ["foo", "bar"]}, dtype=object)
df = df.astype("string").astype("category")
df.to_parquet("test.parquet")
print(df["col"].dtype.categories)  # categories before the round trip
df = pd.read_parquet("test.parquet")
print(df["col"].dtype.categories)  # categories after the round trip
```

### Feature Description

- [ ] Make `CategoricalDtype` a `typing.Generic` parametrized by a scalar type. (⇝ relevant for `pandas-stubs`)
- [ ] The fallback should be `category[object]` (cf. https://github.com/python/mypy/issues/4236#issuecomment-521628880)
- [ ] Allow type casting `.astype("category[<type>]")`
  - `series.astype("category[string]")` should behave equivalently to `series.astype("string").astype("category")`
- [ ] Allow usage in constructor methods such as `read_csv(file, dtype=...)` and `DataFrame(..., dtype=...)`
- [ ] Ensure category subtypes are maintained through serialization and loading
   - In particular, when reading parquet/feather format. (⇝ interoperability with `pyarrow`'s dictionary type)
- [ ] Allow type checking `series.dtype == "category[string]"`.
  - Possibly `series.dtype == "string"` and `pd.api.types.is_string_dtype(series)` should evaluate to `True` if the `dtype` is `category[string]`, since `category` acts only as a kind of wrapper and things like `Series.str` accessor are still applicable. **(needs discussion)**
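
The intended semantics of `.astype("category[<type>]")` can be sketched with a small user-level helper. This is only an illustration of the equivalence stated above, not a proposed implementation: the function name `astype_category` and the regular expression used to parse the spec are assumptions made for this example and do not exist in pandas.

```python
import re

import pandas as pd

# Hypothetical spec syntax: "category" or "category[<type>]".
_SPEC = re.compile(r"^category(?:\[(?P<subtype>.+)\])?$")


def astype_category(series: pd.Series, spec: str) -> pd.Series:
    """Cast to categorical, first casting the values to the requested subtype.

    Hypothetical helper: emulates the proposed `category[<type>]` string
    by chaining two existing `astype` calls.
    """
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"not a category spec: {spec!r}")
    subtype = match.group("subtype")
    if subtype is not None:
        # e.g. "category[string]" -> cast the values to "string" first
        series = series.astype(subtype)
    return series.astype("category")


s = pd.Series(["foo", "bar"], dtype=object)
result = astype_category(s, "category[string]")
expected = s.astype("string").astype("category")
print(result.dtype.categories)
```

With native support, the same result would be obtained directly from `s.astype("category[string]")` without the intermediate cast being spelled out by the user.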

### Alternative Solutions

The existing options are to cast manually via `.astype(<type>).astype("category")` whenever necessary, or to construct an instance of `CategoricalDtype` explicitly, which requires a priori knowledge of the categories.
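
For reference, both existing workarounds look roughly as follows. The sample values and variable names are made up for this example; the second variant only works because the categories `"bar"` and `"foo"` are known in advance.

```python
import pandas as pd

s = pd.Series(["foo", "bar", "foo"], dtype=object)

# Workaround 1: cast the values first, then convert to categorical.
chained = s.astype("string").astype("category")

# Workaround 2: build the CategoricalDtype explicitly. The categories
# must be known up front and are passed as an Index of the desired subtype.
known_categories = pd.Index(["bar", "foo"], dtype="string")
explicit = s.astype(pd.CategoricalDtype(categories=known_categories))

print(chained.dtype.categories)
print(explicit.dtype.categories)
```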

### Additional Context

Allowing direct casting to `category[<type>]` when using `read_csv` should bring minor performance benefits.
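
As a point of comparison, obtaining string-typed categories from a CSV file currently takes two steps, sketched below. The in-memory CSV content and the column name `col` are made up for this example; whether a direct `dtype="category[string]"` would actually be faster is the claim above, not something this snippet measures.

```python
import io

import pandas as pd

# Made-up CSV content for illustration.
csv_data = io.StringIO("col\nfoo\nbar\nfoo\n")

# Step 1: read the column with the desired subtype.
df = pd.read_csv(csv_data, dtype={"col": "string"})

# Step 2: convert to categorical in a separate pass over the data.
df["col"] = df["col"].astype("category")

print(df["col"].dtype.categories)
```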

ENH: Improved `CategoricalDtype` subtype handling. #48515

Feature Type

Problem Description

Feature Description

Alternative Solutions

Additional Context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

ENH: Improved CategoricalDtype subtype handling. #48515

Description

Feature Type

Problem Description

Feature Description

Alternative Solutions

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

ENH: Improved `CategoricalDtype` subtype handling. #48515