I find R's expand.grid() function quite useful for quickly creating example datasets. For example:
expand.grid(height = seq(60, 70, 5), weight = seq(100, 180, 40), sex = c("Male","Female"))
   height weight    sex
1      60    100   Male
2      65    100   Male
3      70    100   Male
4      60    140   Male
5      65    140   Male
6      70    140   Male
7      60    180   Male
8      65    180   Male
9      70    180   Male
10     60    100 Female
11     65    100 Female
12     70    100 Female
13     60    140 Female
14     65    140 Female
15     70    140 Female
16     60    180 Female
17     65    180 Female
18     70    180 Female
A simple implementation of this for pandas is easy to put together:
import itertools

import pandas as pd

def expand_grid(dct):
    # Cartesian product of all value sequences, one tuple per output row
    rows = list(itertools.product(*dct.values()))
    return pd.DataFrame.from_records(rows, columns=list(dct))

df = expand_grid(
    {'height': range(60, 71, 5),
     'weight': range(100, 181, 40),
     'sex': ['Male', 'Female']}
)
print(df)
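One difference worth noting: itertools.product varies the last factor fastest, whereas R's expand.grid() varies the first factor fastest, so the rows come out in a different order than in the R output. A minimal sketch of a variant that matches R's ordering follows; the name expand_grid_r_order is hypothetical and only for illustration:

```python
import itertools

import pandas as pd


def expand_grid_r_order(dct):
    # Hypothetical helper: take the product over the factors in reverse
    # order so that the first key ends up varying fastest, as in R.
    keys = list(dct)
    rows = list(itertools.product(*(dct[k] for k in reversed(keys))))
    df = pd.DataFrame.from_records(rows, columns=keys[::-1])
    # Put the columns back in the order the caller supplied.
    return df[keys]


df_r = expand_grid_r_order(
    {'height': range(60, 71, 5),
     'weight': range(100, 181, 40),
     'sex': ['Male', 'Female']}
)
print(df_r.head())
```

Whether matching R's row order matters is a design choice; sorting the result afterwards would be another option.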
Do people think this would be a useful addition?
If so, what features should it have beyond the basics? For example, a dtypes
argument, or a way to specify which column should be the index?
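To make the question concrete, here is a rough sketch of what those two options might look like. The dtypes and index parameters are assumptions for illustration, not an existing pandas API; they simply forward to DataFrame.astype and DataFrame.set_index:

```python
import itertools

import pandas as pd


def expand_grid(dct, dtypes=None, index=None):
    # `dtypes` and `index` are hypothetical parameters, for illustration only.
    rows = list(itertools.product(*dct.values()))
    df = pd.DataFrame.from_records(rows, columns=list(dct))
    if dtypes is not None:
        # Mapping of column name -> dtype, e.g. {'sex': 'category'}
        df = df.astype(dtypes)
    if index is not None:
        # Column name (or list of names) to use as the index
        df = df.set_index(index)
    return df


df = expand_grid(
    {'height': range(60, 71, 5), 'sex': ['Male', 'Female']},
    dtypes={'height': 'float64', 'sex': 'category'},
    index='sex',
)
print(df)
```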
I'm also not sure if expand_grid is the most intuitive name, but given that it duplicates R functionality, maybe it's best to leave it as is.