Note: This article is reproduced from stackoverflow.com.
elasticsearch kibana

Issue with nested aggregations in Elasticsearch: doing a sum after a max

Posted on 2020-03-29 21:01:33

I know that metric aggregations cannot have sub-aggregations and that Elasticsearch only supports sub-aggregations under bucket aggregations, but I am a bit lost on how to do this.

I want to compute a sum at the end of a chain of nested aggregations, after having aggregated by the max timestamp.

Something like the code below gives me this error: "Aggregator [max_date_aggs] of type [max] cannot accept sub-aggregations", which is expected. Is there a way to make it work?

{
"aggs": {
    "sender_comp_aggs": {
        "terms": {
            "field": "senderComponent"
        },
        "aggs": {
            "activity_mnemo_aggs": {
                "terms": {
                    "field": "activityMnemo"
                },
                "aggs": {
                    "activity_instance_id_aggs": {
                        "terms": {
                            "field": "activityInstanceId"
                        },
                        "aggs": {
                            "business_date_aggs": {
                                "terms": {
                                    "field": "correlationIdSet.businessDate"
                                },
                                "aggs": {
                                    "context_set_id_closing_aggs": {
                                        "terms": {
                                            "field": "contextSetId.closing"
                                        },
                                        "aggs": {
                                            "max_date_aggs": {
                                                "max": {
                                                    "field": "timestamp"
                                                },
                                                "aggs" : {
                                                    "sum_done": {
                                                        "sum": {
                                                            "field": "itemNumberDone"
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
}

Thank you

Questioner
SophiP
Daniel Schneiter 2020-02-01 23:41

I am not 100% sure what you would like to achieve; it would have helped if you had also shared the mapping.

A bucket aggregation is about defining the buckets/groups. As you do in your example, you can wrap/nest bucket aggregations to further break down your buckets into sub-buckets and so on.

By default, Elasticsearch always calculates the count metric (doc_count), but you can specify other metrics to be calculated as well. A metric is calculated per bucket, not for another metric. This is why you cannot nest a metric aggregation under another metric aggregation: it simply does not make sense.

Depending on what your data looks like, the only change you may need is to move the sum_done aggregation out of the aggs clause of max_date_aggs, to the same level as the max_date_aggs aggregation.

Code Snippet

"aggs": {
  "max_date_aggs": { "max": {"field": "timestamp"} },
  "sum_done": { "sum": { "field": "itemNumberDone"} }
}
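To see what the sibling arrangement computes, here is a minimal pure-Python sketch that emulates one terms bucket carrying both metrics. The two trimmed-down documents and the function name sibling_metrics are assumptions made for illustration; nothing here calls Elasticsearch. Note that with sibling metrics the sum covers every document in the bucket, not only the one with the latest timestamp.

```python
from collections import defaultdict

# Hypothetical, trimmed-down documents: only the fields used below.
docs = [
    {"closing": "PARIS", "timestamp": "2020-01-28T02:31:00Z", "itemNumberDone": 9},
    {"closing": "PARIS", "timestamp": "2020-01-28T03:01:00Z", "itemNumberDone": 10},
]


def sibling_metrics(documents):
    """Emulate a terms bucket on 'closing' with two sibling metrics."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc["closing"]].append(doc)
    return {
        key: {
            # ISO-8601 UTC strings in the same format sort chronologically.
            "max_date_aggs": max(d["timestamp"] for d in members),
            # The sum runs over all documents of the bucket.
            "sum_done": sum(d["itemNumberDone"] for d in members),
        }
        for key, members in buckets.items()
    }


result = sibling_metrics(docs)
print(result)
```

If you only want the value carried by the most recent document, this arrangement is not enough on its own, because both metrics are evaluated independently over the same bucket.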

After you refined your question and provided sample documents, I managed to come up with a solution requiring a single request. As previously mentioned, the sum metric aggregation needs to operate on a bucket and not on a metric. The solution is pretty straightforward: rather than calculating the max date, reformulate this aggregation as a terms aggregation sorted by descending timestamp, asking for exactly one bucket.

Solution

GET gos_element/_search
{
  "size": 0, 
  "aggs": {
    "sender_comp_aggs": {
      "terms": {"field": "senderComponent.keyword"},
      "aggs": {
        "activity_mnemo_aggs": {
          "terms": {"field": "activityMnemo.keyword"},
          "aggs": {
            "activity_instance_id_aggs": {
              "terms": {"field": "activityInstanceId.keyword"},
              "aggs": {
                "business_date_aggs": {
                  "terms": {"field": "correlationIdSet.businessDate"},
                  "aggs": {
                    "context_set_id_closing_aggs": {
                      "terms": {"field": "contextSetId.closing.keyword"},
                      "aggs": {
                        "max_date_bucket_aggs": {
                          "terms": {
                            "field": "timestamp",
                            "size": 1, 
                            "order": {"_key": "desc"} 
                          },
                          "aggs": {
                            "sum_done": {
                              "sum": {"field": "itemNumberDone"}
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

As I relied on the default Elasticsearch mapping, I had to refer to the .keyword version of the fields. If your fields are mapped directly as type keyword, you don't need to do that.
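The effect of the innermost terms aggregation (size 1, ordered by _key descending) can be emulated with a small pure-Python sketch. The trimmed-down documents and the helper name sum_for_latest_timestamp are assumptions made for illustration; the sketch only mirrors the logic and does not talk to a cluster.

```python
from collections import defaultdict

# Hypothetical, trimmed-down documents: only the fields used below.
docs = [
    {"timestamp": "2020-01-28T02:31:00Z", "itemNumberDone": 9},
    {"timestamp": "2020-01-28T03:01:00Z", "itemNumberDone": 10},
]


def sum_for_latest_timestamp(documents, value_field="itemNumberDone"):
    """Group by timestamp, keep the greatest key, sum value_field in it."""
    by_timestamp = defaultdict(list)
    for doc in documents:
        by_timestamp[doc["timestamp"]].append(doc)
    if not by_timestamp:
        return None  # no documents, so there is no bucket to report
    # Equivalent of "order": {"_key": "desc"} combined with "size": 1.
    latest = max(by_timestamp)
    total = sum(d[value_field] for d in by_timestamp[latest])
    return latest, total


latest_key, sum_done = sum_for_latest_timestamp(docs)
print(latest_key, sum_done)
```

One consequence worth keeping in mind: if several documents in the same parent bucket share the exact same latest timestamp, they land in the same bucket and their values are added together.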

You can test the request above right after indexing the documents you provided, using the following two commands:

PUT gos_element/_doc/AW_yu3dIa2R_HwqpSz
{
  "senderComponent": "PS",
  "timestamp": "2020-01-28T02:31:00Z",
  "activityMnemo": "PScommand",
  "activityInstanceId": "123466",
  "activityStatus": "Progress",
  "activityStatusNumber": 300,
  "specificActivityStatus": "",
  "itemNumberTotal": 10,
  "itemNumberDone": 9,
  "itemNumberInError": 0,
  "itemNumberNotStarted": 1,
  "itemNumberInProgress": 0,
  "itemUnit": "Command",
  "itemList": [],
  "contextSetId": {
    "PV": "VAR",
    "closing": "PARIS"
  },
  "correlationIdSet": {
    "closing": "PARIS",
    "businessDate": "2020-01-27",
    "correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
  },
  "errorSet": [],
  "kpiSet": "",
  "activitySpecificPayload": "",
  "messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}


PUT gos_element/_doc/AW_yu3dIa2R_HwqpSz8z
{
  "senderComponent": "PS",
  "timestamp": "2020-01-28T03:01:00Z",
  "activityMnemo": "PScommand",
  "activityInstanceId": "123466",
  "activityStatus": "End",
  "activityStatusNumber": 200,
  "specificActivityStatus": "",
  "itemNumberTotal": 10,
  "itemNumberDone": 10,
  "itemNumberInError": 0,
  "itemNumberNotStarted": 0,
  "itemNumberInProgress": 0,
  "itemUnit": "Command",
  "itemList": [],
  "contextSetId": {
    "PV": "VAR",
    "closing": "PARIS"
  },
  "correlationIdSet": {
    "closing": "PARIS",
    "businessDate": "2020-01-27",
    "correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
  },
  "errorSet": [],
  "errorMessages": "",
  "kpiSet": "",
  "activitySpecificPayload": "",
  "messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}

As a result you get back the following response (with value 10 as expected):

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "sender_comp_aggs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "PS",
          "doc_count" : 2,
          "activity_mnemo_aggs" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "PScommand",
                "doc_count" : 2,
                "activity_instance_id_aggs" : {
                  "doc_count_error_upper_bound" : 0,
                  "sum_other_doc_count" : 0,
                  "buckets" : [
                    {
                      "key" : "123466",
                      "doc_count" : 2,
                      "business_date_aggs" : {
                        "doc_count_error_upper_bound" : 0,
                        "sum_other_doc_count" : 0,
                        "buckets" : [
                          {
                            "key" : 1580083200000,
                            "key_as_string" : "2020-01-27T00:00:00.000Z",
                            "doc_count" : 2,
                            "context_set_id_closing_aggs" : {
                              "doc_count_error_upper_bound" : 0,
                              "sum_other_doc_count" : 0,
                              "buckets" : [
                                {
                                  "key" : "PARIS",
                                  "doc_count" : 2,
                                  "max_date_bucket_aggs" : {
                                    "doc_count_error_upper_bound" : 0,
                                    "sum_other_doc_count" : 1,
                                    "buckets" : [
                                      {
                                        "key" : 1580180460000,
                                        "key_as_string" : "2020-01-28T03:01:00.000Z",
                                        "doc_count" : 1,
                                        "sum_done" : {
                                          "value" : 10.0
                                        }
                                      }
                                    ]
                                  }
                                }
                              ]
                            }
                          }
                        ]
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
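A response nested this deeply is tedious to read by hand. As a minimal sketch, assuming the response has already been parsed into a Python dict, the following helper walks the buckets recursively and returns one row per innermost bucket. The function name flatten_buckets and the shortened sample response are hypothetical; only the sum_done key and the bucket layout come from the response above.

```python
def flatten_buckets(agg_result, path=()):
    """Return a list of (key_path, sum_done) rows, one per innermost bucket."""
    rows = []
    for bucket in agg_result.get("buckets", []):
        # Prefer the readable date string when Elasticsearch provides one.
        key = bucket.get("key_as_string", bucket["key"])
        new_path = path + (key,)
        # Any dict value that has its own "buckets" is a nested bucket aggregation.
        children = [
            value for value in bucket.values()
            if isinstance(value, dict) and "buckets" in value
        ]
        if children:
            for child in children:
                rows.extend(flatten_buckets(child, new_path))
        elif "sum_done" in bucket:
            rows.append((new_path, bucket["sum_done"]["value"]))
    return rows


# Shortened, hypothetical response with fewer nesting levels than the real one.
response = {
    "aggregations": {
        "sender_comp_aggs": {"buckets": [{
            "key": "PS", "doc_count": 2,
            "context_set_id_closing_aggs": {"buckets": [{
                "key": "PARIS", "doc_count": 2,
                "max_date_bucket_aggs": {"buckets": [{
                    "key": 1580180460000,
                    "key_as_string": "2020-01-28T03:01:00.000Z",
                    "doc_count": 1,
                    "sum_done": {"value": 10.0},
                }]},
            }]},
        }]},
    },
}

rows = flatten_buckets(response["aggregations"]["sender_comp_aggs"])
for keys, value in rows:
    print(" / ".join(str(k) for k in keys), "->", value)
```

Because the helper does not rely on the aggregation names, it should work unchanged on the full five-level response, as long as the innermost metric is still called sum_done.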