BUG: very slow when appending a long dictionary to an HDF5 file #41616

Closed
@xkungfu

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

# Create (or open) an HDF5 file in append mode, then append one-row DataFrames to it
import pandas as pd
from pandas import HDFStore, DataFrame

currenttimedata =  {
  "time": "20210101080808",
  "desc": "Ford",
  "status": "success",
  "detail0": "somevalue",
  "detail1": "somevalue",
  "detail2": "somevalue",
  "detail3": "somevalue",
  "detail4": "somevalue",
  "detail5": "somevalue",
  "detail6": "somevalue",
  "detail7": "somevalue",
  "detail8": "somevalue",
  "detail9": "somevalue",
  "detail10": "somevalue",
  "detail11": "somevalue",
  "detail12": "somevalue",
  "detail13": "somevalue",
  "detail14": "somevalue",
  "detail15": "somevalue",
  "detail16": "somevalue",
  "detail17": "somevalue",
  "detail18": "somevalue",
  "detail19": "somevalue",
  "detail20": "somevalue",
  "detail21": "somevalue",
  "detail22": "somevalue",
  "detail23": "somevalue",
  "detail24": "somevalue",
  "detail25": "somevalue",
  "detail26": "somevalue",
  "detail27": "somevalue",
  "detail28": "somevalue",
  "detail29": "somevalue",
  "detail30": "somevalue",
  "detail31": "somevalue",
  "detail32": "somevalue",
  "detail33": "somevalue",
  "detail34": "somevalue",
  "detail35": "somevalue",
  "detail36": "somevalue",
  "detail37": "somevalue",
  "detail38": "somevalue",
  "detail39": "somevalue",
  "detail40": "somevalue",
  "detail41": "somevalue",
  "detail42": "somevalue",
  "detail43": "somevalue",
  "detail44": "somevalue",
  "detail45": "somevalue",
  "detail46": "somevalue",
  "detail47": "somevalue",
  "detail48": "somevalue",
  "detail49": "somevalue",
  "detail50": "somevalue",
}
                  
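# HDFStore opens the file in append mode ('a') by default
hdf = HDFStore('storage.h5')

# Wrap each value in a list so the dict describes a one-row DataFrame
data = {}
for key, value in currenttimedata.items():
    data[key] = [value]

print("data: ", data)

df = DataFrame(data, columns=list(currenttimedata.keys()))
print("df: ", df)

# Write the first row as a table, with every column stored as a data column
hdf.put('d1', df, format='table', data_columns=True)
print("hdf[d1] 1: ", hdf['d1'])

# Append the same one-row DataFrame 100 times; this loop is the slow part
for x in range(100):
    print("x: ", x)
    hdf.append('d1', df, format='table', data_columns=True)

print("hdf[d1] 2: ", hdf['d1'])

hdf.close()  # close the file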
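The report does not include timings. To put a number on the slowdown, each append call could be timed individually. The following is a minimal sketch using only the standard library; the helper names time_calls and summarize are hypothetical and not part of pandas or the original script.

```python
import time


def time_calls(func, repeats):
    """Call func(i) for i in range(repeats) and return per-call durations in seconds."""
    durations = []
    for i in range(repeats):
        start = time.perf_counter()
        func(i)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (total, mean, slowest) for a non-empty list of durations."""
    total = sum(durations)
    return total, total / len(durations), max(durations)
```

In the script above, the loop could be replaced by time_calls(lambda x: hdf.append('d1', df, format='table', data_columns=True), 100). Comparing the first and last durations would then show whether each call gets slower as the table grows or whether the cost per call is roughly constant.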

Problem description

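Appending is very slow. Each hdf.append call in the loop above adds a single-row DataFrame with 54 string columns (time, desc, status, and detail0 through detail50), all stored as data columns via data_columns=True, and every one of the 100 calls is noticeably slow.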
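One way to check whether the cost comes from per-call overhead rather than from the amount of data is to collect all rows into one DataFrame and append it in a single call. The following is a minimal sketch under that assumption, not a confirmed fix for this issue. The helper names build_batch and append_batch are hypothetical, and append_batch requires PyTables to be installed.

```python
import pandas as pd


def build_batch(records):
    """Combine a non-empty list of flat dicts into one DataFrame, one row per dict."""
    # Column order follows the keys of the first record.
    return pd.DataFrame(records, columns=list(records[0].keys()))


def append_batch(path, key, frame):
    """Append a whole DataFrame to an HDF5 table in a single call.

    format='table' and data_columns=True mirror the settings in the report.
    """
    with pd.HDFStore(path) as store:
        store.append(key, frame, format="table", data_columns=True)
```

For example, append_batch('storage_batched.h5', 'd1', build_batch([currenttimedata] * 101)) would write the same 101 rows as the original script with one append call instead of one put and 100 appends. The file name storage_batched.h5 is an arbitrary choice for illustration.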

Expected Output

Output of pd.show_versions()


INSTALLED VERSIONS

commit : 2cb9652
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.4
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 44.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 3.0.0
IPython : None
pandas_datareader: 0.9.0
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Metadata

Labels: Closing Candidate (may be closeable, needs more eyeballs); IO HDF5 (read_hdf, HDFStore); Performance (memory or execution speed performance)
