I have raw data like the example below. If a variable has the value x1 at instant t1, it should be recorded at instant t2 if and only if its value is no longer equal to x1. Is there a way in Python to compare each value in a DataFrame with the previous value and delete the row if it is the same? I tried the following code, but it doesn't work. Please help.
df
time Variable Value
2014-07-11 19:50:20 Var1 10
2014-07-11 19:50:30 Var1 20
2014-07-11 19:50:40 Var1 20
2014-07-11 19:50:50 Var1 30
2014-07-11 19:50:60 Var1 20
2014-07-11 19:50:70 Var2 50
2014-07-11 19:50:80 Var2 60
2014-07-11 19:50:90 Var2 70
Coding:
for y in df.time:
    for x in df.Value:
        if y == y:
            if x == x:
                df1 = df.drop_duplicates(subset = ['time', 'Variable', 'Value'], keep=False)
            else:
                df1 = df.drop_duplicates(['time', 'Variable', 'Value'])
Expected output:
df
time Variable Value
2014-07-11 19:50:20 Var1 10
2014-07-11 19:50:30 Var1 20
2014-07-11 19:50:50 Var1 30
2014-07-11 19:50:60 Var1 20
2014-07-11 19:50:70 Var2 50
2014-07-11 19:50:80 Var2 60
2014-07-11 19:50:90 Var2 70
df.drop_duplicates(subset=['Variable','Value'],keep='first')
# time Variable Value
#2014-07-11 19:50:20 Var1 10
#2014-07-11 19:50:30 Var1 20
#2014-07-11 19:50:50 Var2 30
#2014-07-11 19:50:60 Var2 40
#2014-07-11 19:50:70 Var2 50
Thank you. So why did we pass only 2 columns to subset when the DataFrame has 3?
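Because you do not want to keep the same variable with the same value. The time column is different on every row in your data, so including it in subset would mean that no row ever counts as a duplicate.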
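To illustrate the point about subset, here is a minimal sketch that rebuilds the sample data from the question and compares the two calls. The names with_time and without_time are made up for this example, and the time column is kept as plain strings because values such as 19:50:60 would not parse as timestamps.

```python
import pandas as pd

# Sample data copied from the question; "time" is left as strings on purpose.
df = pd.DataFrame({
    "time": [
        "2014-07-11 19:50:20", "2014-07-11 19:50:30", "2014-07-11 19:50:40",
        "2014-07-11 19:50:50", "2014-07-11 19:50:60", "2014-07-11 19:50:70",
        "2014-07-11 19:50:80", "2014-07-11 19:50:90",
    ],
    "Variable": ["Var1", "Var1", "Var1", "Var1", "Var1", "Var2", "Var2", "Var2"],
    "Value": [10, 20, 20, 30, 20, 50, 60, 70],
})

# With "time" in the subset every row is unique, so nothing is dropped.
with_time = df.drop_duplicates(subset=["time", "Variable", "Value"])

# Without "time", rows repeating a (Variable, Value) pair are dropped,
# keeping only the first occurrence of each pair.
without_time = df.drop_duplicates(subset=["Variable", "Value"], keep="first")

print(len(with_time), len(without_time))
```

Note that the second call compares each row against all earlier rows, not just the one directly before it, so a value that comes back later for the same variable is dropped as well.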
Many thanks. Your answer works, but I have one issue with my data: at instant t3, Var1 has the same value as at t1, and I would like to keep that row at t3 because it is not consecutive with t1. I updated my data.
It's a different question and requires a different answer.
Thank you. I will raise the other question.
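For the follow-up case, where only consecutive repeats of the same variable should be removed, one possible approach (not the one given in the answer above) is to compare each value with the previous value of the same variable using groupby and shift. This is a minimal sketch on the sample data from the question; previous and result are names chosen for this example, and it assumes the rows are already sorted by time.

```python
import pandas as pd

# Sample data copied from the question; "time" is left as strings because
# values such as "19:50:60" are not valid timestamps.
df = pd.DataFrame({
    "time": [
        "2014-07-11 19:50:20", "2014-07-11 19:50:30", "2014-07-11 19:50:40",
        "2014-07-11 19:50:50", "2014-07-11 19:50:60", "2014-07-11 19:50:70",
        "2014-07-11 19:50:80", "2014-07-11 19:50:90",
    ],
    "Variable": ["Var1", "Var1", "Var1", "Var1", "Var1", "Var2", "Var2", "Var2"],
    "Value": [10, 20, 20, 30, 20, 50, 60, 70],
})

# Previous value of the same variable (NaN for the first row of each variable).
previous = df.groupby("Variable")["Value"].shift()

# Keep a row only if its value differs from the previous one for that variable.
# A comparison with NaN is always "not equal", so first rows are kept.
result = df[df["Value"] != previous].reset_index(drop=True)

print(result)
```

On this data only the 19:50:40 row is removed, while the later Var1 row with value 20 at 19:50:60 is kept because the value before it was 30.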