Commit 695c824

Merge branch 'main' into median-aggregate
2 parents cdd51dd + 03f6bb6 commit 695c824

14 files changed: +318 -428 lines changed

.github/workflows/test.yaml

Lines changed: 16 additions & 2 deletions
@@ -25,25 +25,39 @@ jobs:
       fail-fast: false
       matrix:
         python-version: ["3.9", "3.10", "3.11", "3.12"]
-        environment-file: [ci/environment.yml, ci/environment_released.yml]
+        environment-file: [ci/environment.yml]
 
     steps:
       - uses: actions/checkout@v2
         with:
           fetch-depth: 0 # Needed by codecov.io
 
+      - name: Get current date
+        id: date
+        run: echo "date=$(date +%Y-%m-%d)" >> "${GITHUB_OUTPUT}"
+
       - name: Install Environment
         uses: mamba-org/setup-micromamba@v1
         with:
           environment-file: ${{ matrix.environment-file }}
           create-args: python=${{ matrix.python-version }}
-          cache-environment: true
+          # Wipe cache every 24 hours or whenever environment.yml changes. This means it
+          # may take up to a day before changes to unpinned packages are picked up.
+          # To force a cache refresh, change the hardcoded numerical suffix below.
+          cache-environment-key: environment-${{ steps.date.outputs.date }}-0
 
       - name: Install dask-expr
         run: python -m pip install -e . --no-deps
 
+      - name: Print dask versions
+        # Output of `micromamba list` is buggy for pip-installed packages
+        run: pip list | grep -E 'dask|distributed'
+
       - name: Run tests
         run: py.test -n auto --verbose --cov=dask_expr --cov-report=xml
 
+      - name: Run Dask DataFrame tests
+        run: python -c "import dask.dataframe as dd; dd.test_dataframe()"
+
       - name: Coverage
         uses: codecov/codecov-action@v3
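
The new `cache-environment-key` combines the current date with a hand-edited numeric suffix, so the key changes once per day or whenever the suffix is bumped. The following is a minimal Python sketch of that key composition only. The function name `cache_key` is hypothetical, and any extra hashing of the environment file that the action may do on its own is not modelled here.

```python
from datetime import date


def cache_key(day: date, suffix: int = 0) -> str:
    """Build a key shaped like `environment-<YYYY-MM-DD>-<suffix>`.

    Hypothetical helper: it only mirrors the string template used in the
    workflow, not the behaviour of the setup-micromamba action itself.
    """
    # `date +%Y-%m-%d` in the workflow step yields an ISO-formatted date.
    return f"environment-{day.isoformat()}-{suffix}"


first_day = cache_key(date(2024, 1, 1))
next_day = cache_key(date(2024, 1, 2))
bumped = cache_key(date(2024, 1, 1), suffix=1)

# A new day or a bumped suffix produces a different key, i.e. a cache miss.
print(first_day != next_day)
print(first_day != bumped)
```

Within a single day the key is stable, which is why the workflow comment warns that changes to unpinned packages may take up to a day to be picked up.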

README.md

Lines changed: 3 additions & 345 deletions
@@ -62,349 +62,7 @@ production settings.
 API Coverage
 ------------
 
-**`dask_expr.DataFrame`**
+Dask-Expr covers almost everything of the Dask DataFrame API. The only missing features are:
 
-- `abs`
-- `add`
-- `add_prefix`
-- `add_sufix`
-- `align`
-- `all`
-- `any`
-- `apply`
-- `assign`
-- `astype`
-- `bfill`
-- `clip`
-- `combine_first`
-- `copy`
-- `count`
-- `cummax`
-- `cummin`
-- `cumprod`
-- `cumsum`
-- `dask`
-- `div`
-- `divide`
-- `drop`
-- `drop_duplicates`
-- `dropna`
-- `dtypes`
-- `eval`
-- `explode`
-- `ffill`
-- `fillna`
-- `floordiv`
-- `groupby`
-- `head`
-- `idxmax`
-- `idxmin`
-- `ìloc`
-- `index`
-- `isin`
-- `isna`
-- `join`
-- `map`
-- `map_overlap`
-- `map_partitions`
-- `mask`
-- `max`
-- `mean`
-- `memory_usage`
-- `memory_usage_per_partition`
-- `merge`
-- `min`
-- `min`
-- `mod`
-- `mode`
-- `mul`
-- `nlargest`
-- `nsmallest`
-- `nunique_approx`
-- `partitions`
-- `pivot_table`
-- `pow`
-- `prod`
-- `query`
-- `radd`
-- `rdiv`
-- `rename`
-- `rename_axis`
-- `repartition`
-- `replace`
-- `reset_index`
-- `rfloordiv`
-- `rmod`
-- `rmul`
-- `round`
-- `rpow`
-- `rsub`
-- `rtruediv`
-- `sample`
-- `select_dtypes`
-- `set_index`
-- `shift`
-- `shuffle`
-- `sort_values`
-- `std`
-- `sub`
-- `sum`
-- `tail`
-- `to_parquet`
-- `to_timestamp`
-- `truediv`
-- `var`
-- `visualize`
-- `where`
-
-
-**`dask_expr.Series`**
-
-- `abs`
-- `add`
-- `align`
-- `all`
-- `any`
-- `apply`
-- `astype`
-- `between`
-- `bfill`
-- `clip`
-- `combine_first`
-- `copy`
-- `count`
-- `cummax`
-- `cummin`
-- `cumprod`
-- `cumsum`
-- `dask`
-- `div`
-- `divide`
-- `drop_duplicates`
-- `dropna`
-- `dtype`
-- `explode`
-- `ffill`
-- `fillna`
-- `floordiv`
-- `groupby`
-- `head`
-- `idxmax`
-- `idxmin`
-- `index`
-- `isin`
-- `isna`
-- `map`
-- `map_partitions`
-- `mask`
-- `max`
-- `mean`
-- `memory_usage`
-- `memory_usage_per_partition`
-- `min`
-- `min`
-- `mod`
-- `mode`
-- `mul`
-- `nlargest`
-- `nsmallest`
-- `nunique_approx`
-- `partitions`
-- `pow`
-- `prod`
-- `product`
-- `radd`
-- `rdiv`
-- `rename`
-- `rename_axis`
-- `repartition`
-- `replace`
-- `reset_index`
-- `rfloordiv`
-- `rmod`
-- `rmul`
-- `round`
-- `rpow`
-- `rsub`
-- `rtruediv`
-- `shift`
-- `shuffle`
-- `std`
-- `sub`
-- `sum`
-- `tail`
-- `to_frame`
-- `to_timestamp`
-- `truediv`
-- `unique`
-- `value_counts`
-- `var`
-- `visualize`
-- `where`
-
-
-**`dask_expr.Index`**
-
-- `abs`
-- `align`
-- `all`
-- `any`
-- `apply`
-- `astype`
-- `clip`
-- `combine_first`
-- `copy`
-- `count`
-- `dask`
-- `dtype`
-- `fillna`
-- `groupby`
-- `head`
-- `idxmax`
-- `idxmin`
-- `index`
-- `isin`
-- `isna`
-- `map_partitions`
-- `max`
-- `memory_usage`
-- `min`
-- `min`
-- `mode`
-- `nunique_approx`
-- `partitions`
-- `prod`
-- `rename`
-- `rename_axis`
-- `repartition`
-- `replace`
-- `reset_index`
-- `round`
-- `shuffle`
-- `std`
-- `sum`
-- `tail`
-- `to_frame`
-- `to_timestamp`
-- `var`
-- `visualize`
-
-
-**`dask_expr._groupby.GroupBy`**
-
-- `agg`
-- `aggregate`
-- `apply`
-- `bfill
-- `count`
-- `ffill`
-- `first`
-- `last`
-- `max`
-- `mean`
-- `median`
-- `min`
-- `nunique`
-- `prod`
-- `shift`
-- `size`
-- `std`
-- `sum`
-- `transform`
-- `value_counts`
-- `var`
-
-Support for ``SeriesGroupBy`` and ``DataFrameGroupBy``.
-
-**`dask_expr._resample.Resampler`**
-
-- `agg`
-- `count`
-- `first`
-- `last`
-- `max`
-- `mean`
-- `median`
-- `min`
-- `nunique`
-- `ohlc`
-- `prod`
-- `quantile`
-- `sem`
-- `size`
-- `std`
-- `sum`
-- `var`
-
-
-**`dask_expr._rolling.Rolling`**
-
-- `agg`
-- `apply`
-- `count`
-- `max`
-- `mean`
-- `median`
-- `min`
-- `quantile`
-- `std`
-- `sum`
-- `var`
-- `skew`
-- `kurt`
-
-
-**Binary operators (`DataFrame`, `Series`, and `Index`)**:
-
-- `__add__`
-- `__radd__`
-- `__sub__`
-- `__rsub__`
-- `__mul__`
-- `__pow__`
-- `__rmul__`
-- `__truediv__`
-- `__rtruediv__`
-- `__lt__`
-- `__rlt__`
-- `__gt__`
-- `__rgt__`
-- `__le__`
-- `__rle__`
-- `__ge__`
-- `__rge__`
-- `__eq__`
-- `__ne__`
-- `__and__`
-- `__rand__`
-- `__or__`
-- `__ror__`
-- `__xor__`
-- `__rxor__`
-
-
-**Unary operators (`DataFrame`, `Series`, and `Index`)**:
-
-- `__invert__`
-- `__neg__`
-- `__pos__`
-
-**Accessors**:
-
-- `CategoricalAccessor`
-- `DatetimeAccessor`
-- `StringAccessor`
-
-**Function**
-
-- `concat`
-- `from_pandas`
-- `merge`
-- `pivot_table`
-- `read_csv`
-- `read_parquet`
-- `repartition`
-- `to_datetime`
-- `to_numeric`
-- `to_timedelta`
-- `to_parquet`
+- ``melt``
+- named GroupBy Aggregations
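
For readers unfamiliar with the two features the new README text lists as missing, here is a small sketch of what they look like in plain pandas. It uses pandas only, not Dask-Expr, and the column names and data are made up for illustration.

```python
import pandas as pd

# Illustrative data; the column names are arbitrary.
df = pd.DataFrame(
    {"key": ["a", "a", "b"], "x": [1, 2, 3], "y": [10.0, 20.0, 30.0]}
)

# Named aggregation: each keyword names an output column and maps it to an
# (input column, aggregation function) pair.
named = df.groupby("key").agg(x_total=("x", "sum"), y_mean=("y", "mean"))

# melt: reshape the wide columns `x` and `y` into long (variable, value) rows.
long_form = df.melt(id_vars="key", value_vars=["x", "y"])

print(named)
print(long_form)
```

According to the README change in this commit, these two operations are the parts of the DataFrame API that Dask-Expr did not yet cover.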
