Note: This article is reproduced from serverfault.com.

python - Alternative to repeatedly printing the shape of a pandas DataFrame after every step

Posted on 2020-11-30 19:55:00

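Hello, pandas users,
I often find myself printing the shape of the DataFrame after every processing step. I do this to monitor how the shape of the data changes and to make sure each step did what I expected. For example: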

print(df.shape)
df = df.dropna()
print(df.shape)
df = df.melt()
print(df.shape)
...

I'd like to know whether there is a better or more elegant way to do this, ideally a shorthand or an automatic one.

Questioner: Ramirez
Ramirez 2020-12-06 08:31:50

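I built on Matthew Cox's answer and registered a custom accessor (an extra attribute, `log`) on the pandas DataFrame itself. This simplifies things considerably.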

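import logging

import numpy as np
import pandas as pd

# Set up logging: basicConfig attaches a stderr handler to the root logger
# and sets its level to INFO, so the shape messages are actually printed.
logging.basicConfig(level=logging.INFO)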

# Call the DataFrame method named `fun` and log the shape before and after
def log_(df, fun, *args, **kwargs):
    logging.info(f'shape changed from {df.shape}')
    df1 = getattr(df, fun)(*args, **kwargs)
    logging.info(f'shape changed to   {df1.shape}')
    return df1

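# Custom DataFrame accessor, available as df.log
@pd.api.extensions.register_dataframe_accessor("log")
class LogAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    # Forward positional and keyword arguments to DataFrame.dropna
    def dropna(self, *args, **kwargs):
        return log_(self._obj, 'dropna', *args, **kwargs)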

# Demo data
df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})

# Trial
df.log.dropna()
# Logged to stderr:
# INFO:root:shape changed from (3, 3)
# INFO:root:shape changed to   (1, 3)
# Returns the DataFrame with rows containing missing values dropped (one row left)
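The accessor above only wraps `dropna`, so every further method (such as `melt` from the question) would need its own wrapper. One way to avoid that, sketched here as an assumption rather than as part of the original answer, is to forward attribute lookups through `__getattr__` and wrap whichever DataFrame method is requested. The accessor name `shapelog` and the class `ShapeLogAccessor` are hypothetical, chosen so they do not clash with the `log` accessor registered above; the demo frame is made up for illustration.

```python
import logging

import numpy as np
import pandas as pd

# basicConfig attaches a stderr handler to the root logger at INFO level.
logging.basicConfig(level=logging.INFO)


@pd.api.extensions.register_dataframe_accessor("shapelog")  # hypothetical name
class ShapeLogAccessor:
    """df.shapelog.<method>(...) runs df.<method>(...) and logs the shape
    before and after the call (a sketch, not a pandas built-in)."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def __getattr__(self, name):
        # __getattr__ only runs for names not found normally; refuse private
        # names so a lookup of _obj before __init__ cannot recurse forever.
        if name.startswith("_"):
            raise AttributeError(name)
        # Raises AttributeError if the DataFrame has no such attribute.
        method = getattr(self._obj, name)
        if not callable(method):
            raise AttributeError(f"{name!r} is not a DataFrame method")

        def wrapper(*args, **kwargs):
            before = self._obj.shape
            result = method(*args, **kwargs)
            # Calls such as dropna(inplace=True) return None, which has no shape.
            after = getattr(result, "shape", None)
            logging.info(f"{name}: shape changed from {before} to {after}")
            return result

        return wrapper


# Made-up demo frame: the second row has a missing value in column "a".
df = pd.DataFrame({"a": [1, np.nan, 3], "b": ["x", "y", "z"]})
out = df.shapelog.dropna().shapelog.melt(id_vars="b")
# Logged to stderr:
# INFO:root:dropna: shape changed from (3, 2) to (2, 2)
# INFO:root:melt: shape changed from (2, 2) to (2, 3)
```

Because the wrapper returns the method's result untouched, `df.shapelog.<method>(...)` can stand in for `df.<method>(...)` anywhere in a chain. For a lighter-weight option without an accessor, `DataFrame.pipe` with a small function that logs `df.shape` and returns `df` unchanged can be slotted between the steps of a method chain.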