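Hello pandas users,
I often find myself printing the shape of a DataFrame after every processing step. I do this to monitor how the shape changes and to make sure each step did what I expected. For example: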
print(df.shape)
df=df.dropna()
print(df.shape)
df=df.melt()
print(df.shape)
...
I would like to know whether there is a better or more elegant way to do this, preferably something shorter or automatic.
I built on Matthew Cox's answer and registered a custom accessor on the pandas DataFrame itself. This simplifies things considerably.
import numpy as np
import pandas as pd
# set logger
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
# log changes in dataframe
def log_(df, fun, *args, **kwargs):
    logging.info(f'shape changed from {df.shape}')
    df1 = getattr(df, fun)(*args, **kwargs)
    logging.info(f'shape changed to {df1.shape}')
    return df1
# custom pandas dataframe
@pd.api.extensions.register_dataframe_accessor("log")
class log:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def dropna(self, **kws):
        return log_(self._obj, fun='dropna', **kws)
# demo data
df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
# trial
df.log.dropna()
# logged to stderr:
# INFO:root:shape changed from (3, 3)
# INFO:root:shape changed to (1, 3)
# returns the DataFrame with the NaN rows dropped
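The accessor above only wraps dropna, so every other method (melt, for example) would need its own wrapper. One possible way around that, as a rough sketch rather than a tested recipe, is to forward any method name through __getattr__. The accessor name "shapelog", the logger name, and the class name ShapeLogAccessor are my own placeholders and are not part of pandas.

```python
import logging

import numpy as np
import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("shapelog")  # hypothetical logger name


# "shapelog" is a made-up accessor name, chosen to avoid clashing with "log" above
@pd.api.extensions.register_dataframe_accessor("shapelog")
class ShapeLogAccessor:
    """Forward any DataFrame method and log the shape before and after."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def __getattr__(self, name):
        # Only reached for names the accessor does not define itself.
        # Refuse private names so a missing _obj cannot cause recursion.
        if name.startswith("_"):
            raise AttributeError(name)
        method = getattr(self._obj, name)
        if not callable(method):
            raise AttributeError(f"{name!r} is not a DataFrame method")

        def wrapper(*args, **kwargs):
            before = self._obj.shape
            result = method(*args, **kwargs)
            # Not every method returns a DataFrame, so shape may be missing
            after = getattr(result, "shape", None)
            logger.info("%s: shape %s -> %s", name, before, after)
            return result

        return wrapper


# same demo data as above
df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})

cleaned = df.shapelog.dropna()               # logs (3, 3) -> (1, 3)
by_toy = df.shapelog.dropna(subset=["toy"])  # keyword arguments pass through
```

Because each call returns an ordinary DataFrame, you have to go through .shapelog again at every step you want logged, e.g. df.shapelog.dropna().shapelog.melt(). That is more typing than a plain chain, but it lets you choose which steps get logged.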