GH-39010: [Python] Introduce `maps_as_pydicts` parameter for `to_pylist`, `to_pydict`, `as_py` #45471
Conversation
Commits:
- Fix ExampleUuidScalarType
- Add tests for `maps_as_pydicts`
- Add test for duplicate map keys
- Formatting fixes
- Add docstring for 'maps_as_pydicts'
- Formatting fixes
- Call from_arrays from Table
- Fix last hopefully issues
- Correct MapScalar method "as_py" when there are multiple keys present
While this is not a bad idea in itself, it seems like the roundtripping concern could be solved more efficiently by making
Also:
Please note that
Let me clarify what this is about. Map fields are already creatable with
You can use
Yes, but this is part of a very large distributed machine learning setup, where relatively intricate filters are applied on deeply nested list/struct/map columns. The compute of the actual machine learning outweighs the compute one has to do to deserialize Python objects by many orders of magnitude. For pure data queries, we would not use bare Python objects of course.
I see, thanks. Then, do we want to reuse the same parameter signature as in the Pandas-related PR? I.e., allow either
Sure, I actually also stumbled across that when I revisited that original GitHub issue. Before I do that, I'd like to ask whether you're generally fine with adding this new parameter to every
That sounds ok to me. Ideally,
By the way, we probably want to make the new parameter keyword-only?
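For context, Python enforces keyword-only parameters with a bare `*` in the signature, so adding the parameter later never breaks existing positional callsites. A minimal illustrative sketch (the `convert` function below is hypothetical, not pyarrow's API):

```python
def convert(values, *, maps_as_pydicts=None):
    # Parameters after the bare `*` can only be passed by keyword.
    return (values, maps_as_pydicts)

assert convert([1, 2], maps_as_pydicts="lossy") == ([1, 2], "lossy")
try:
    convert([1, 2], "lossy")  # positional use is rejected
except TypeError:
    print("rejected positional argument")
```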
I addressed the remarks :) There is some weird error in the "Docs" job, I don't know what this is about. |
Hmm, it looks like some of the CI failures will need #45500 to be merged first |
I rebased the branch, now the CI tests seem fine again, I think? Could we get a approval/review of this? :) |
Thanks @jonded94 ! This looks good on the principle, here are some assorted comments.
python/pyarrow/array.pxi
Outdated
This can change the ordering of (key, value) pairs, and will
deduplicate multiple keys, resulting in a possible loss of data.
I think the ordering comment is obsolete, as Python dicts are ordered nowadays. Unless the underlying implementation does something weird, ordering should therefore be preserved.
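The ordering point can be demonstrated in plain Python; insertion order is a language guarantee for `dict` since Python 3.7, so a faithful pair-by-pair conversion preserves map ordering:

```python
# dicts preserve insertion order (guaranteed since Python 3.7), so building
# a dict from (key, value) pairs keeps the original pair order.
pairs = [("b", 2), ("a", 1), ("c", 3)]
d = dict(pairs)
assert list(d.items()) == pairs
```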
Removed the ordering part, added some explanation of which value survives on duplicate keys.
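The documented duplicate-key semantics can be sketched in plain Python. The `map_pairs_to_dict` helper below is a hypothetical stand-in for the Cython conversion loop, not the actual pyarrow implementation:

```python
import warnings

def map_pairs_to_dict(pairs, maps_as_pydicts="lossy"):
    # Hypothetical stand-in for MapScalar.as_py's conversion loop.
    result = {}
    for key, value in pairs:
        if key in result:
            if maps_as_pydicts == "strict":
                raise KeyError(f"Duplicate key {key!r} in map")
            # In 'lossy' mode, warn and let the later value win.
            warnings.warn(f"Duplicate key {key!r}; keeping the last value.")
        result[key] = value
    return result

# The value that survives a duplicate key is the last one seen,
# matching the {'a': 2} expectation in the tests below.
assert map_pairs_to_dict([("a", 1), ("a", 2)]) == {"a": 2}
```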
python/pyarrow/table.pxi
Outdated
Arrow Map, as in [(key1, value1), (key2, value2), ...].
If 'lossy' or 'strict', convert Arrow Map arrays to native Python dicts.
This can change the ordering of (key, value) pairs, and will
Same comment re: ordering
Same as above
python/pyarrow/tests/test_scalars.py
Outdated
with pytest.raises(ValueError):
    assert s.as_py(maps_as_pydicts="strict")

assert s.as_py(maps_as_pydicts="lossy") == {'a': 2}
Can we check that a warning is actually emitted? See pytest.warns
Implemented a check for this warning
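For illustration, `pytest.warns` fails unless a matching warning is actually emitted inside the block. The `lossy_as_py` helper here is a stand-in for calling `as_py(maps_as_pydicts="lossy")` on a map scalar with a duplicate key:

```python
import warnings
import pytest

def lossy_as_py():
    # Stand-in for s.as_py(maps_as_pydicts="lossy") on a duplicate-key map.
    warnings.warn("Duplicate key 'a'; keeping the last value.")
    return {"a": 2}

# The assertion only passes if a UserWarning matching the pattern is raised.
with pytest.warns(UserWarning, match="Duplicate key"):
    assert lossy_as_py() == {"a": 2}
```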
python/pyarrow/scalar.pxi
Outdated
raise ValueError(
    "Invalid value for 'maps_as_pydicts': "
    + "valid values are 'lossy', 'strict' or `None` (default). "
    + f"Received '{maps_as_pydicts}'."
Nit: it may be more idiomatic to use the repr here
Suggested change:
+ f"Received '{maps_as_pydicts}'."
+ f"Received {maps_as_pydicts!r}."
Implemented the suggested change
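For illustration, here is the difference the `!r` conversion makes: with manual quotes, `None` is rendered misleadingly as if it were the string `'None'`, while `repr` shows non-string values as-is and adds quotes to strings automatically:

```python
maps_as_pydicts = None
assert f"Received '{maps_as_pydicts}'." == "Received 'None'."  # looks like a string
assert f"Received {maps_as_pydicts!r}." == "Received None."    # clearly the value None

maps_as_pydicts = "lossyy"  # e.g. a typo passed by the caller
assert f"Received {maps_as_pydicts!r}." == "Received 'lossyy'."  # repr quotes strings
```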
python/pyarrow/scalar.pxi
Outdated
for key, value in self:
    if key in result_dict:
        if maps_as_pydicts == "strict":
            raise ValueError(
I would make this a KeyError. Also, the message should perhaps contain the duplicate key?
Made it a KeyError
@github-actions crossbow submit -g python
Revision: 93045c4 Submitted crossbow builds: ursacomputing/crossbow @ actions-9728f80818
+1, will merge if CI is green.
CI failures are unrelated.
Just FYI, this might cause a backward incompatibility issue, because user-defined extension types are not expecting
…o_pylist`, `to_pydict`, `as_py` (apache#45471)

### Rationale for this change

Currently, `MapScalar`/`Array` types are unfortunately not deserialized into proper Python `dict`s, which breaks "roundtrips" from Python -> Arrow -> Python:

```
import pyarrow as pa
schema = pa.schema([pa.field('x', pa.map_(pa.string(), pa.int64()))])
data = [{'x': {'a': 1}}]
pa.RecordBatch.from_pylist(data, schema=schema).to_pylist()
# [{'x': [('a', 1)]}]
```

This is especially bad when storing TiBs of deeply nested data (think of lists in structs in maps...) that were created from Python and serialized into Arrow/Parquet, since they can't be read in again with native `pyarrow` methods without doing extremely ugly and computationally costly workarounds.

### What changes are included in this PR?

A new parameter `maps_as_pydicts` is introduced to `to_pylist`, `to_pydict`, `as_py` which will allow proper roundtrips:

```
import pyarrow as pa
schema = pa.schema([pa.field('x', pa.map_(pa.string(), pa.int64()))])
data = [{'x': {'a': 1}}]
pa.RecordBatch.from_pylist(data, schema=schema).to_pylist(maps_as_pydicts="strict")
# [{'x': {'a': 1}}]
```

### Are these changes tested?

Yes. There are tests for `to_pylist` and `to_pydict` included for `pyarrow.Table`, whilst low-level `MapScalar` and especially a nesting with `ListScalar` and `StructScalar` is tested. Also, duplicate keys now should throw an error, which is also tested for.

### Are there any user-facing changes?

No callsites should be broken, simply a new keyword-only optional parameter is added.

* GitHub Issue: apache#39010

Authored-by: Jonas Dedden <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
We (Ray Data team) are also running into backward compatibility issues like this in our tests against pyarrow nightly with the same error mentioned here:

```
[2025-02-25T06:27:20Z] =================================== FAILURES ===================================
____________ test_convert_to_pyarrow_array_object_ext_type_fallback ____________

def test_convert_to_pyarrow_array_object_ext_type_fallback():
    column_values = create_ragged_ndarray(
        [
            "hi",
            1,
            None,
            [[[[]]]],
            {"a": [[{"b": 2, "c": UserObj(i=123)}]]},
            UserObj(i=456),
        ]
    )
    column_name = "py_object_column"

    # First, assert that straightforward conversion into Arrow native types fails
    with pytest.raises(ArrowConversionError) as exc_info:
        _convert_to_pyarrow_native_array(column_values, column_name)

    assert (
        str(exc_info.value)
        == "Error converting data to Arrow: ['hi' 1 None list([[[[]]]]) {'a': [[{'b': 2, 'c': UserObj(i=123)}]]}\n UserObj(i=456)]"  # noqa: E501
    )

    # Subsequently, assert that fallback to `ArrowObjectExtensionType` succeeds
    pa_array = convert_to_pyarrow_array(column_values, column_name)

>   assert pa_array.to_pylist() == column_values.tolist()

python/ray/air/tests/test_arrow.py:121:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: as_py() got an unexpected keyword argument 'maps_as_pydicts'
```
@Linchin @omatthew98 I think the way around this would be to take a `**kwargs` argument and forward it. For example, turn this:

```
class JSONArrowScalar(pa.ExtensionScalar):
    def as_py(self):
        return JSONArray._deserialize_json(self.value.as_py() if self.value else None)
```

into this:

```
class JSONArrowScalar(pa.ExtensionScalar):
    def as_py(self, **kwargs):
        return JSONArray._deserialize_json(self.value.as_py(**kwargs) if self.value else None)
```
I've updated the PR description, we should remember to call out this potential incompatibility in the release notes for the next version.
(ray-project#51041)

## Why are these changes needed?

Our tests with pyarrow nightly caught a backwards incompatibility bug with a recent pyarrow change (apache/arrow#45471). To fix this we simply need to pass along kwargs in our `as_py` method as suggested by the pyarrow team (apache/arrow#45471 (comment)).

Signed-off-by: Matthew Owen <[email protected]>
Rationale for this change

Currently, `MapScalar`/`Array` types are unfortunately not deserialized into proper Python `dict`s, which breaks "roundtrips" from Python -> Arrow -> Python.

This is especially bad when storing TiBs of deeply nested data (think of lists in structs in maps...) that were created from Python and serialized into Arrow/Parquet, since they can't be read in again with native `pyarrow` methods without doing extremely ugly and computationally costly workarounds.

What changes are included in this PR?

A new parameter `maps_as_pydicts` is introduced to `to_pylist`, `to_pydict`, `as_py` which will allow proper roundtrips.

Are these changes tested?

Yes. There are tests for `to_pylist` and `to_pydict` included for `pyarrow.Table`, whilst low-level `MapScalar` and especially a nesting with `ListScalar` and `StructScalar` is tested.

Also, duplicate keys now should throw an error, which is also tested for.

Are there any user-facing changes?

Yes. The `as_py()` method on Scalar instances can be called with a new keyword argument `maps_as_pydicts`.

As a consequence, if you implement your own Scalar subclass (for example for an extension type), you should change its signature to accept that new argument. For example this definition:

```
class JSONArrowScalar(pa.ExtensionScalar):
    def as_py(self):
        return JSONArray._deserialize_json(self.value.as_py() if self.value else None)
```

could be changed to:

```
class JSONArrowScalar(pa.ExtensionScalar):
    def as_py(self, **kwargs):
        return JSONArray._deserialize_json(self.value.as_py(**kwargs) if self.value else None)
```