I need some help and don't know how to start with it.
Basically, i have two csv input files (coming from 2 different nodes) and would like to sum the values with the result going to elasticsearch..
Any help is greatly appreciated...
CSV1
node link rate-in rate-out
node1 link1 10 20
node1 link2 30 50
node1 link3 40 60
CSV2
node link rate-in rate-out
node2 link1 20 10
node2 link2 50 70
node2 link3 80 40
Result --> create elasticsearch index
node link rate-in rate-out
allnode link1 30 30
allnode link2 80 120
allnode link3 120 100
Thanks!
Pandas has good support for reading and manipulating CSV files.
import pandas as pd
df = pd.concat([
pd.read_csv('csv1.csv'),
pd.read_csv('csv2.csv')
])
result=df.groupby('link', as_index=False).sum()
result['node'] = 'allnode'
result.to_csv('result.csv')
Where csv1.csv is:
node,link,rate-in,rate-out node1,link1,10,20 node1,link2,30,50 node1,link3,40,60
And csv2.csv is:
node,link,rate-in,rate-out node2,link1,20,10 node2,link2,50,70 node2,link3,80,40
The result.csv file will contain:
link,rate-in,rate-out,node link1,30,30,allnode link2,80,120,allnode link3,120,100,allnode
If you want to reorder the columns you can do that by providing an order list
result[[
'node','link','rate-in','rate-out'
]].to_csv('result.csv', index=False)
small question, in the result.csv the node column is put at the end.. is it possible to put this as the first column?