Note: This article is reproduced from serverfault.com.

Can Graphite or Grafana be used to monitor PySpark metrics?

Posted on 2020-12-03 06:03:06

In a PySpark project we call dataframe.foreachPartition(func), and inside that func we make aiohttp calls to transfer data. What kind of monitoring tools can be used to track metrics such as data rate, throughput, and elapsed time? Can we use StatsD with Graphite or Grafana in this case (they're preferred if possible)? Thanks.
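For context, the setup described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the ingest URL is a placeholder, `chunked` and `send_partition` are hypothetical helper names, and the batch size of 100 is arbitrary. The only real constraint it demonstrates is that aiohttp is async, so each partition has to drive its own event loop (e.g. via `asyncio.run`).

```python
import asyncio


def chunked(rows, size):
    """Pure helper: batch an iterable into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_partition(rows, url="http://example.invalid/ingest"):
    """Runs on each executor via df.foreachPartition(send_partition).

    aiohttp is imported lazily so the pure helper above can be used
    without it installed; the URL is a placeholder, not a real endpoint.
    """
    import aiohttp

    async def post_all():
        async with aiohttp.ClientSession() as session:
            for batch in chunked(rows, 100):
                async with session.post(url, json=[str(r) for r in batch]) as resp:
                    resp.raise_for_status()

    # foreachPartition gives us a plain iterator, so we start one
    # event loop per partition to drive the async HTTP calls.
    asyncio.run(post_all())
```

In the job itself this would be invoked as `df.foreachPartition(send_partition)`; the question is how to observe what those calls are doing.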

Questioner: JamesWang
JamesWang 2021-02-22 07:45:07

Here is my solution. I used PySpark accumulators to collect the metrics (number of HTTP calls, payload size per call, etc.) in each partition. At the driver node, I assigned the accumulators' values to StatsD gauges and sent those metrics to the Graphite server, eventually visualizing them in a Grafana dashboard. It has worked well so far.
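The accumulator-then-gauge pattern described above might look like the following minimal sketch. It assumes the `statsd` PyPI client and a StatsD/Graphite instance listening on the default port 8125; `run_job` and `partition_stats` are illustrative names, and the actual aiohttp transfer is elided with a comment.

```python
def partition_stats(rows):
    """Pure helper: count rows and total payload bytes for one partition."""
    calls = 0
    payload_bytes = 0
    for row in rows:
        calls += 1
        payload_bytes += len(str(row).encode("utf-8"))
    return calls, payload_bytes


def run_job(graphite_host="localhost", statsd_port=8125):
    # pyspark and statsd are imported lazily here so the pure helper
    # above stays usable without a Spark installation.
    from pyspark.sql import SparkSession
    import statsd

    spark = SparkSession.builder.appName("metrics-demo").getOrCreate()
    sc = spark.sparkContext

    # Accumulators are write-only on executors and readable on the driver,
    # which is what makes them usable for cross-partition metrics.
    call_count = sc.accumulator(0)
    bytes_sent = sc.accumulator(0)

    df = spark.range(1000)  # stand-in for the real dataframe

    def send_partition(rows):
        calls, payload = partition_stats(rows)
        # ... the real aiohttp data transfer would happen here ...
        call_count.add(calls)
        bytes_sent.add(payload)

    df.foreachPartition(send_partition)

    # Back on the driver: push the aggregated values as StatsD gauges.
    # StatsD forwards them to Graphite, which Grafana can then chart.
    client = statsd.StatsClient(graphite_host, statsd_port, prefix="myjob")
    client.gauge("http_calls", call_count.value)
    client.gauge("payload_bytes", bytes_sent.value)

    spark.stop()
```

One design note: accumulator updates inside actions like `foreachPartition` are aggregated reliably by Spark, but reading them only makes sense on the driver after the action completes, which is why the gauges are sent at the end rather than from inside the partitions.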