Commit c5fa3fb

committed
updates
1 parent 13d30d2 commit c5fa3fb

5 files changed: +54 -35 lines changed

doc/source/user_guide/sparse.rst

Lines changed: 27 additions & 31 deletions
@@ -263,15 +263,11 @@ have no replacement.
 Interaction with scipy.sparse
 -----------------------------

-SparseDataFrame
-~~~~~~~~~~~~~~~
+Use :meth:`DataFrame.sparse.from_coo` to create a ``DataFrame`` with sparse values from a sparse matrix.

-.. versionadded:: 0.20.0
-
-Pandas supports creating sparse dataframes directly from ``scipy.sparse`` matrices.
+.. versionadded:: 0.25.0

 .. ipython:: python
-   :okwarning:

    from scipy.sparse import csr_matrix

@@ -281,25 +277,22 @@ Pandas supports creating sparse dataframes directly from ``scipy.sparse`` matric
    sp_arr = csr_matrix(arr)
    sp_arr

-   sdf = pd.SparseDataFrame(sp_arr)
-   sdf
+   sdf = pd.DataFrame.sparse.from_spmatrix(sp_arr)
+   sdf.head()
+   sdf.dtypes

 All sparse formats are supported, but matrices that are not in :mod:`COOrdinate <scipy.sparse>` format will be converted, copying data as needed.
-To convert a ``SparseDataFrame`` back to sparse SciPy matrix in COO format, you can use the :meth:`SparseDataFrame.to_coo` method:
+To convert back to sparse SciPy matrix in COO format, you can use the :meth:`DataFrame.sparse.to_coo` method:

 .. ipython:: python

-   sdf.to_coo()
+   sdf.sparse.to_coo()

-SparseSeries
-~~~~~~~~~~~~
-
-A :meth:`SparseSeries.to_coo` method is implemented for transforming a ``SparseSeries`` indexed by a ``MultiIndex`` to a ``scipy.sparse.coo_matrix``.
+:meth:`Series.sparse.to_coo` is implemented for transforming a ``Series`` with sparse values indexed by a ``MultiIndex`` to a ``scipy.sparse.coo_matrix``.

 The method requires a ``MultiIndex`` with two or more levels.

 .. ipython:: python
-   :okwarning:

    s = pd.Series([3.0, np.nan, 1.0, 3.0, np.nan, np.nan])
    s.index = pd.MultiIndex.from_tuples([(1, 2, 'a', 0),
@@ -309,19 +302,17 @@ The method requires a ``MultiIndex`` with two or more levels.
                                         (2, 1, 'b', 0),
                                         (2, 1, 'b', 1)],
                                        names=['A', 'B', 'C', 'D'])
-
    s
-   # SparseSeries
-   ss = s.to_sparse()
+   ss = s.astype('Sparse')
    ss

-In the example below, we transform the ``SparseSeries`` to a sparse representation of a 2-d array by specifying that the first and second ``MultiIndex`` levels define labels for the rows and the third and fourth levels define labels for the columns. We also specify that the column and row labels should be sorted in the final sparse representation.
+In the example below, we transform the ``Series`` to a sparse representation of a 2-d array by specifying that the first and second ``MultiIndex`` levels define labels for the rows and the third and fourth levels define labels for the columns. We also specify that the column and row labels should be sorted in the final sparse representation.

 .. ipython:: python

-   A, rows, columns = ss.to_coo(row_levels=['A', 'B'],
-                                column_levels=['C', 'D'],
-                                sort_labels=True)
+   A, rows, columns = ss.sparse.to_coo(row_levels=['A', 'B'],
+                                       column_levels=['C', 'D'],
+                                       sort_labels=True)

    A
    A.todense()
@@ -332,16 +323,16 @@ Specifying different row and column labels (and not sorting them) yields a diffe

 .. ipython:: python

-   A, rows, columns = ss.to_coo(row_levels=['A', 'B', 'C'],
-                                column_levels=['D'],
-                                sort_labels=False)
+   A, rows, columns = ss.sparse.to_coo(row_levels=['A', 'B', 'C'],
+                                       column_levels=['D'],
+                                       sort_labels=False)

    A
    A.todense()
    rows
    columns

-A convenience method :meth:`SparseSeries.from_coo` is implemented for creating a ``SparseSeries`` from a ``scipy.sparse.coo_matrix``.
+A convenience method :meth:`Series.sparse.from_coo` is implemented for creating a ``Series`` with sparse values from a ``scipy.sparse.coo_matrix``.

 .. ipython:: python

@@ -351,23 +342,28 @@ A convenience method :meth:`SparseSeries.from_coo` is implemented for creating a
    A
    A.todense()

-The default behaviour (with ``dense_index=False``) simply returns a ``SparseSeries`` containing
+The default behaviour (with ``dense_index=False``) simply returns a ``Series`` containing
 only the non-null entries.

 .. ipython:: python
-   :okwarning:

-   ss = pd.SparseSeries.from_coo(A)
+   ss = pd.Series.sparse.from_coo(A)
    ss

 Specifying ``dense_index=True`` will result in an index that is the Cartesian product of the
 row and columns coordinates of the matrix. Note that this will consume a significant amount of memory
 (relative to ``dense_index=False``) if the sparse matrix is large (and sparse) enough.

 .. ipython:: python
-   :okwarning:

-   ss_dense = pd.SparseSeries.from_coo(A, dense_index=True)
+   ss_dense = pd.Series.sparse.from_coo(A, dense_index=True)
    ss_dense


+.. _sparse.subclasses:
+
+Sparse Subclasses
+-----------------
+
+The :class:`SparseSeries` and :class:`SparseDataFrame` classes are deprecated. Visit their
+API pages for usage.

pandas/core/arrays/sparse.py

Lines changed: 2 additions & 1 deletion
@@ -2033,7 +2033,8 @@ def from_coo(cls, A, dense_index=False):
         from pandas.core.sparse.scipy_sparse import _coo_to_sparse_series
         from pandas import Series

-        result = _coo_to_sparse_series(A, dense_index=dense_index)
+        result = _coo_to_sparse_series(A, dense_index=dense_index,
+                                       sparse_series=False)
         # SparseSeries -> Series[sparse]
         result = Series(result.values, index=result.index, copy=False)


pandas/core/series.py

Lines changed: 0 additions & 1 deletion
@@ -1589,7 +1589,6 @@ def to_sparse(self, kind='block', fill_value=None):
         SparseSeries
             Sparse representation of the Series.
         """
-        # TODO: deprecate
         from pandas.core.sparse.series import SparseSeries

         values = SparseArray(self, kind=kind, fill_value=fill_value)

pandas/core/sparse/scipy_sparse.py

Lines changed: 7 additions & 2 deletions
@@ -116,14 +116,19 @@ def _sparse_series_to_coo(ss, row_levels=(0, ), column_levels=(1, ),
     return sparse_matrix, rows, columns


-def _coo_to_sparse_series(A, dense_index=False):
+def _coo_to_sparse_series(A, dense_index=False, sparse_series=True):
     """
     Convert a scipy.sparse.coo_matrix to a SparseSeries.
     Use the defaults given in the SparseSeries constructor.
     """
+    from pandas import SparseDtype
+
     s = Series(A.data, MultiIndex.from_arrays((A.row, A.col)))
     s = s.sort_index()
-    s = s.to_sparse()  # TODO: specify kind?
+    if sparse_series:
+        s = s.to_sparse()  # TODO: specify kind?
+    else:
+        s = s.astype(SparseDtype(s.dtype))
     if dense_index:
         # is there a better constructor method to use here?
         i = range(A.shape[0])

pandas/tests/arrays/sparse/test_accessor.py

Lines changed: 18 additions & 0 deletions
@@ -101,3 +101,21 @@ def test_density(self):
         res = df.sparse.density
         expected = 0.75
         assert res == expected
+
+    @pytest.mark.parametrize("dtype", ['int64', 'float64'])
+    @pytest.mark.parametrize("dense_index", [True, False])
+    @td.skip_if_no_scipy
+    def test_series_from_coo(self, dtype, dense_index):
+        import scipy.sparse
+
+        A = scipy.sparse.eye(3, format='coo', dtype=dtype)
+        result = pd.Series.sparse.from_coo(A, dense_index=dense_index)
+        index = pd.MultiIndex.from_tuples([(0, 0), (1, 1), (2, 2)])
+        expected = pd.Series(pd.SparseArray(np.array([1, 1, 1], dtype=dtype)),
+                             index=index)
+        if dense_index:
+            expected = expected.reindex(
+                pd.MultiIndex.from_product(index.levels)
+            )
+
+        tm.assert_series_equal(result, expected)
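
For readers following the diff, here is a minimal sketch of the accessor-based round trip that the updated docs and the new test exercise. It is not part of the commit. It assumes pandas 0.25 or later with scipy installed, and the variable names (`A`, `ss`, `B`) are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Illustrative input (an assumption, mirroring the new test): a 3x3 identity in COO format.
A = sparse.eye(3, format="coo", dtype="float64")

# Series with sparse values, indexed by a two-level (row, col) MultiIndex.
ss = pd.Series.sparse.from_coo(A)

# Round trip: back to a COO matrix plus the row and column labels.
# The defaults use index level 0 for rows and level 1 for columns.
B, rows, columns = ss.sparse.to_coo()
```

The result of `from_coo` is a plain `Series` whose dtype is a `SparseDtype`, which is the behaviour the `sparse_series=False` path in this commit is meant to provide.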
