Commit 6ff27ca

toddrjen and max-sixty authored
Add additional str accessor methods for DataArray (#4622)
* add type hints for the str accessor class
* allow str accessors to use regular expression objects for regular expressions
* implement casefold and normalize str accessor functions
* implement one-to-many str accessor functions
* implement cat, join, format, +, *, and %
* support elementwise operations in many str accessor functions
* update whats-new.rst, api.rst, and api-hidden.rst
* test fixes
* implement requested fixes
* more fixes
* typing fixes
* fix docstring
* fix more docstring
* remove encoding header

Co-authored-by: Maximilian Roos <[email protected]>
1 parent c195c26 commit 6ff27ca

5 files changed: +5261 -673 lines changed

doc/api-hidden.rst

Lines changed: 13 additions & 0 deletions
@@ -324,14 +324,21 @@
  core.accessor_dt.TimedeltaAccessor.seconds

  core.accessor_str.StringAccessor.capitalize
+ core.accessor_str.StringAccessor.casefold
+ core.accessor_str.StringAccessor.cat
  core.accessor_str.StringAccessor.center
  core.accessor_str.StringAccessor.contains
  core.accessor_str.StringAccessor.count
  core.accessor_str.StringAccessor.decode
  core.accessor_str.StringAccessor.encode
  core.accessor_str.StringAccessor.endswith
+ core.accessor_str.StringAccessor.extract
+ core.accessor_str.StringAccessor.extractall
  core.accessor_str.StringAccessor.find
+ core.accessor_str.StringAccessor.findall
+ core.accessor_str.StringAccessor.format
  core.accessor_str.StringAccessor.get
+ core.accessor_str.StringAccessor.get_dummies
  core.accessor_str.StringAccessor.index
  core.accessor_str.StringAccessor.isalnum
  core.accessor_str.StringAccessor.isalpha
@@ -342,20 +349,26 @@
  core.accessor_str.StringAccessor.isspace
  core.accessor_str.StringAccessor.istitle
  core.accessor_str.StringAccessor.isupper
+ core.accessor_str.StringAccessor.join
  core.accessor_str.StringAccessor.len
  core.accessor_str.StringAccessor.ljust
  core.accessor_str.StringAccessor.lower
  core.accessor_str.StringAccessor.lstrip
  core.accessor_str.StringAccessor.match
+ core.accessor_str.StringAccessor.normalize
  core.accessor_str.StringAccessor.pad
+ core.accessor_str.StringAccessor.partition
  core.accessor_str.StringAccessor.repeat
  core.accessor_str.StringAccessor.replace
  core.accessor_str.StringAccessor.rfind
  core.accessor_str.StringAccessor.rindex
  core.accessor_str.StringAccessor.rjust
+ core.accessor_str.StringAccessor.rpartition
+ core.accessor_str.StringAccessor.rsplit
  core.accessor_str.StringAccessor.rstrip
  core.accessor_str.StringAccessor.slice
  core.accessor_str.StringAccessor.slice_replace
+ core.accessor_str.StringAccessor.split
  core.accessor_str.StringAccessor.startswith
  core.accessor_str.StringAccessor.strip
  core.accessor_str.StringAccessor.swapcase

doc/api.rst

Lines changed: 20 additions & 0 deletions
@@ -420,38 +420,58 @@ String manipulation
  :toctree: generated/
  :template: autosummary/accessor_method.rst

+ DataArray.str._apply
+ DataArray.str._padder
+ DataArray.str._partitioner
+ DataArray.str._re_compile
+ DataArray.str._splitter
+ DataArray.str._stringify
  DataArray.str.capitalize
+ DataArray.str.casefold
+ DataArray.str.cat
  DataArray.str.center
  DataArray.str.contains
  DataArray.str.count
  DataArray.str.decode
  DataArray.str.encode
  DataArray.str.endswith
+ DataArray.str.extract
+ DataArray.str.extractall
  DataArray.str.find
+ DataArray.str.findall
+ DataArray.str.format
  DataArray.str.get
+ DataArray.str.get_dummies
  DataArray.str.index
  DataArray.str.isalnum
  DataArray.str.isalpha
  DataArray.str.isdecimal
  DataArray.str.isdigit
+ DataArray.str.islower
  DataArray.str.isnumeric
  DataArray.str.isspace
  DataArray.str.istitle
  DataArray.str.isupper
+ DataArray.str.join
  DataArray.str.len
  DataArray.str.ljust
  DataArray.str.lower
  DataArray.str.lstrip
  DataArray.str.match
+ DataArray.str.normalize
  DataArray.str.pad
+ DataArray.str.partition
  DataArray.str.repeat
  DataArray.str.replace
  DataArray.str.rfind
  DataArray.str.rindex
  DataArray.str.rjust
+ DataArray.str.rpartition
+ DataArray.str.rsplit
  DataArray.str.rstrip
  DataArray.str.slice
  DataArray.str.slice_replace
+ DataArray.str.split
  DataArray.str.startswith
  DataArray.str.strip
  DataArray.str.swapcase

doc/whats-new.rst

Lines changed: 17 additions & 0 deletions
@@ -27,6 +27,22 @@ New Features
  - Support for `dask.graph_manipulation
    <https://docs.dask.org/en/latest/graph_manipulation.html>`_ (requires dask >=2021.3)
    By `Guido Imperiale <https://github.com/crusaderky>`_
+ - Many of the arguments for the :py:attr:`DataArray.str` methods now support
+   providing an array-like input. In this case, the array provided to the
+   arguments is broadcast against the original array and applied elementwise.
+ - :py:attr:`DataArray.str` now supports `+`, `*`, and `%` operators. These
+   behave the same as they do for :py:class:`str`, except that they follow
+   array broadcasting rules.
+ - A large number of new :py:attr:`DataArray.str` methods were implemented,
+   :py:meth:`DataArray.str.casefold`, :py:meth:`DataArray.str.cat`,
+   :py:meth:`DataArray.str.extract`, :py:meth:`DataArray.str.extractall`,
+   :py:meth:`DataArray.str.findall`, :py:meth:`DataArray.str.format`,
+   :py:meth:`DataArray.str.get_dummies`, :py:meth:`DataArray.str.islower`,
+   :py:meth:`DataArray.str.join`, :py:meth:`DataArray.str.normalize`,
+   :py:meth:`DataArray.str.partition`, :py:meth:`DataArray.str.rpartition`,
+   :py:meth:`DataArray.str.rsplit`, and :py:meth:`DataArray.str.split`.
+   A number of these methods allow for splitting or joining the strings in an
+   array. (:issue:`4622`)
  - Thanks to the new pluggable backend infrastructure external packages may now
    use the ``xarray.backends`` entry point to register additional engines to be used in
    :py:func:`open_dataset`, see the documentation in :ref:`add_a_backend`
@@ -36,6 +52,7 @@ New Features
    developed by `B-Open <https://www.bopen.eu>`_.
    By `Aureliana Barghini <https://github.com/aurghs>`_ and `Alessandro Amici <https://github.com/alexamici>`_.

+
  Breaking changes
  ~~~~~~~~~~~~~~~~
  - :py:func:`open_dataset` and :py:func:`open_dataarray` now accept only the first argument
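
As a rough illustration of the first two release notes in this diff (array-like arguments that broadcast, and the `+` and `*` operators on `DataArray.str`), here is a minimal sketch. It assumes an xarray release that includes this commit; the sample strings and the dimension names `x` and `y` are invented for the example and do not come from the commit.

```python
import xarray as xr

# Hypothetical sample data; the dimension names "x" and "y" are arbitrary.
names = xr.DataArray(["ab", "cd"], dims="x")
suffixes = xr.DataArray(["_1", "_2", "_3"], dims="y")

# `+` on the accessor concatenates elementwise, like str + str.
added = names.str + "!"

# `*` on the accessor repeats each string, like str * int.
repeated = names.str * 2

# An array-like argument is broadcast against the original array, so
# combining an "x" array with a "y" array gives a 2-D ("x", "y") result.
combined = names.str.cat(suffixes)

print(added.values.tolist())
print(repeated.values.tolist())
print(combined.dims, combined.shape)
```

Note that the operators are applied to the accessor (`names.str + "!"`), not to the `DataArray` itself, which matches the wording of the release note.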
