Commit e796c27

DOC: io/v0.13/release notes
CLN: py3 updates
1 parent 4790b93 commit e796c27

4 files changed: 54 additions and 14 deletions
doc/source/io.rst

Lines changed: 36 additions & 5 deletions
@@ -1230,6 +1230,37 @@ nanoseconds
    import os
    os.remove('test.json')
 
+.. _io.json_normalize:
+
+Normalization
+~~~~~~~~~~~~~
+
+.. versionadded:: 0.13.0
+
+Pandas provides a utility function to take a dict or list of dicts and *normalize* this semi-structured data
+into a flat table.
+
+.. ipython:: python
+
+   from pandas.io.json import json_normalize
+   data = [{'state': 'Florida',
+            'shortname': 'FL',
+            'info': {
+                 'governor': 'Rick Scott'
+            },
+            'counties': [{'name': 'Dade', 'population': 12345},
+                         {'name': 'Broward', 'population': 40000},
+                         {'name': 'Palm Beach', 'population': 60000}]},
+           {'state': 'Ohio',
+            'shortname': 'OH',
+            'info': {
+                 'governor': 'John Kasich'
+            },
+            'counties': [{'name': 'Summit', 'population': 1234},
+                         {'name': 'Cuyahoga', 'population': 1337}]}]
+
+   json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
+
 HTML
 ----
 
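The call added to the docs above takes a record path (`'counties'`) and a list of metadata fields, where a nested field is given as a key path such as `['info', 'governor']`. As a rough illustration of the resulting table shape, here is a pure-Python sketch that produces one row per county with the parent's metadata attached. It is not the pandas implementation: `normalize_records` is a hypothetical helper, the dotted `info.governor` column name is an assumption of the sketch, and the sample data is a shortened version of the data in the diff.

```python
def normalize_records(data, record_path, meta):
    """Return one flat dict per child record (illustrative sketch only)."""
    rows = []
    for parent in data:
        # Resolve each meta spec; a list such as ['info', 'governor']
        # is treated as a path into nested dicts.
        meta_values = {}
        for spec in meta:
            path = spec if isinstance(spec, list) else [spec]
            value = parent
            for key in path:
                value = value[key]
            # Assumption: nested meta fields are named by joining the path with dots.
            meta_values[".".join(path)] = value
        # Emit one row per child record, carrying the parent's metadata along.
        for child in parent[record_path]:
            row = dict(child)
            row.update(meta_values)
            rows.append(row)
    return rows


data = [
    {"state": "Florida", "shortname": "FL",
     "info": {"governor": "Rick Scott"},
     "counties": [{"name": "Dade", "population": 12345},
                  {"name": "Broward", "population": 40000}]},
    {"state": "Ohio", "shortname": "OH",
     "info": {"governor": "John Kasich"},
     "counties": [{"name": "Summit", "population": 1234}]},
]

rows = normalize_records(data, "counties",
                         ["state", "shortname", ["info", "governor"]])
for row in rows:
    print(row)
```

Each printed row combines a county's own fields with `state`, `shortname` and the governor pulled from the nested `info` dict, which is the flat-table shape the new documentation section describes.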
@@ -1244,7 +1275,7 @@ Reading HTML Content
 
 .. _io.read_html:
 
-.. versionadded:: 0.12
+.. versionadded:: 0.12.0
 
 The top-level :func:`~pandas.io.html.read_html` function can accept an HTML
 string/file/url and will parse HTML tables into list of pandas DataFrames.
@@ -1620,7 +1651,7 @@ advanced strategies
 
 .. note::
 
-   The prior method of accessing Excel is now deprecated as of 0.12,
+   The prior method of accessing Excel is now deprecated as of 0.12.0,
    this will work but will be removed in a future version.
 
    .. code-block:: python
@@ -2291,7 +2322,7 @@ The default is 50,000 rows returned in a chunk.
 
 .. note::
 
-   .. versionadded:: 0.12
+   .. versionadded:: 0.12.0
 
    You can also use the iterator with ``read_hdf`` which will open, then
    automatically close the store when finished iterating.
@@ -2580,7 +2611,7 @@ Pass ``min_itemsize`` on the first table creation to a-priori specifiy the minim
 ``min_itemsize`` can be an integer, or a dict mapping a column name to an integer. You can pass ``values`` as a key to
 allow all *indexables* or *data_columns* to have this min_itemsize.
 
-Starting in 0.11, passing a ``min_itemsize`` dict will cause all passed columns to be created as *data_columns* automatically.
+Starting in 0.11.0, passing a ``min_itemsize`` dict will cause all passed columns to be created as *data_columns* automatically.
 
 .. note::
 
@@ -2860,7 +2891,7 @@ Reading from STATA format
 
 .. _io.stata_reader:
 
-.. versionadded:: 0.12
+.. versionadded:: 0.12.0
 
 The top-level function ``read_stata`` will read a dta format file
 and return a DataFrame:

doc/source/release.rst

Lines changed: 2 additions & 0 deletions
@@ -169,6 +169,8 @@ Improvements to existing features
     high-dimensional arrays).
   - :func:`~pandas.read_html` now supports the ``parse_dates``,
     ``tupleize_cols`` and ``thousands`` parameters (:issue:`4770`).
+  - :meth:`~pandas.io.json.json_normalize` is a new method to allow you to create a flat table
+    from semi-structured JSON data. :ref:`See the docs<io.json_normalize>` (:issue:`1067`)
 
 API Changes
 ~~~~~~~~~~~

doc/source/v0.13.0.txt

Lines changed: 2 additions & 0 deletions
@@ -490,6 +490,8 @@ Enhancements
   - ``tz_localize`` can infer a fall daylight savings transition based on the structure
     of the unlocalized data (:issue:`4230`), see :ref:`here<timeseries.timezone>`
   - DatetimeIndex is now in the API documentation, see :ref:`here<api.datetimeindex>`
+  - :meth:`~pandas.io.json.json_normalize` is a new method to allow you to create a flat table
+    from semi-structured JSON data. :ref:`See the docs<io.json_normalize>` (:issue:`1067`)
 
 .. _whatsnew_0130.experimental:

pandas/io/json.py

Lines changed: 14 additions & 9 deletions
@@ -1,6 +1,7 @@
 # pylint: disable-msg=E1101,W0613,W0603
 
 import os
+import copy
 from collections import defaultdict
 import numpy as np
 
@@ -570,8 +571,11 @@ def nested_to_record(ds,prefix="",level=0):
         ds = [ds]
         singleton = True
 
+    new_ds = []
     for d in ds:
-        for k,v in d.items(): # modifying keys inside loop, not lazy
+
+        new_d = copy.deepcopy(d)
+        for k,v in d.items():
             # each key gets renamed with prefix
             if level == 0:
                 newkey = str(k)
@@ -582,16 +586,17 @@ def nested_to_record(ds,prefix="",level=0):
             # only at level>1 do we rename the rest of the keys
             if not isinstance(v,dict):
                 if level!=0: # so we skip copying for top level, common case
-                    v = d.pop(k)
-                    d[newkey]= v
+                    v = new_d.pop(k)
+                    new_d[newkey]= v
                 continue
             else:
-                v = d.pop(k)
-                d.update(nested_to_record(v,newkey,level+1))
+                v = new_d.pop(k)
+                new_d.update(nested_to_record(v,newkey,level+1))
+        new_ds.append(new_d)
 
     if singleton:
-        return ds[0]
-    return ds
+        return new_ds[0]
+    return new_ds
 
 
 def json_normalize(data, record_path=None, meta=None,
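The `nested_to_record` hunks above stop popping and re-inserting keys on the dict being iterated and instead work on a deep copy, so the caller's input is left untouched. A minimal standalone sketch of that pattern follows. `flatten_record` is a hypothetical name, and the `"."` separator used to build prefixed keys is an assumption, since the line that constructs `newkey` for nested levels is not shown in the diff.

```python
import copy


def flatten_record(record, prefix="", level=0):
    """Flatten nested dicts into a single-level dict without mutating the input."""
    # Iterate over the original while editing only the copy, so keys are never
    # added to or removed from the dict that is being looped over.
    new_record = copy.deepcopy(record)
    for key, value in record.items():
        # Assumption: nested keys are joined to their prefix with a dot.
        new_key = str(key) if level == 0 else prefix + "." + str(key)
        if isinstance(value, dict):
            new_record.pop(key)
            new_record.update(flatten_record(value, new_key, level + 1))
        elif level != 0:
            # Below the top level, plain values are renamed with the prefix.
            new_record[new_key] = new_record.pop(key)
    return new_record


original = {
    "state": "Ohio",
    "info": {"governor": "John Kasich", "seat": {"city": "Columbus"}},
}
flat = flatten_record(original)
print(flat)
print(original)  # still nested: the input was not modified
```

The deep copy costs extra memory for large records, which is the trade-off this approach accepts in exchange for not changing the caller's data and for safe iteration on Python 3, where `dict.items()` is a live view.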
@@ -658,7 +663,7 @@ def _pull_field(js, spec):
         data = [data]
 
     if record_path is None:
-        if any([isinstance(x,dict) for x in data[0].itervalues()]):
+        if any([isinstance(x,dict) for x in compat.itervalues(data[0])]):
             # naive normalization, this is idempotent for flat records
             # and potentially will inflate the data considerably for
             # deeply nested structures:
@@ -719,7 +724,7 @@ def _recursive_extract(data, path, seen_meta, level=0):
         result.rename(columns=lambda x: record_prefix + x, inplace=True)
 
     # Data types, a problem
-    for k, v in meta_vals.iteritems():
+    for k, v in compat.iteritems(meta_vals):
         if meta_prefix is not None:
             k = meta_prefix + k
 
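The last two hunks replace the Python 2 only dict methods `.itervalues()` and `.iteritems()` with calls to `compat.itervalues` and `compat.iteritems`, matching the "py3 updates" part of the commit message. The sketch below shows how such helpers can work. These are hypothetical stand-ins written for illustration, not the source of `pandas.compat`.

```python
def iteritems(obj):
    """Iterate over (key, value) pairs on both Python 2 and Python 3 dicts."""
    # Python 2 dicts expose iteritems(); Python 3 dicts only have items().
    method = getattr(obj, "iteritems", None) or obj.items
    return iter(method())


def itervalues(obj):
    """Iterate over values on both Python 2 and Python 3 dicts."""
    method = getattr(obj, "itervalues", None) or obj.values
    return iter(method())


record = {"state": "Ohio", "info": {"governor": "John Kasich"}}

# Same check as in the diff: does any top-level value hold a nested dict?
has_nested = any(isinstance(x, dict) for x in itervalues(record))
print(has_nested)

for key, value in iteritems({"state": "Ohio"}):
    print(key, value)
```

Routing the lookup through a helper keeps call sites identical across interpreter versions, so only the helper needs to know which method exists.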