I know sub aggregation isn't possible with metric aggregations and that Elasticsearch supports sub aggregations with buckets. But I am a bit lost on how to do this.
I want to do a sum after nested aggregations and after having aggregated by max timestamp.
Something like the code below, give me this error : "Aggregator [max_date_aggs] of type [max] cannot accept sub-aggregations" which is normal. Is there a way to make it works?
{
"aggs": {
"sender_comp_aggs": {
"terms": {
"field": "senderComponent"
},
"aggs": {
"activity_mnemo_aggs": {
"terms": {
"field": "activityMnemo"
},
"aggs": {
"activity_instance_id_aggs": {
"terms": {
"field": "activityInstanceId"
},
"aggs": {
"business_date_aggs": {
"terms": {
"field": "correlationIdSet.businessDate"
},
"aggs": {
"context_set_id_closing_aggs": {
"terms": {
"field": "contextSetId.closing"
},
"aggs": {
"max_date_aggs": {
"max": {
"field": "timestamp"
},
"aggs" : {
"sum_done": {
"sum": {
"field": "itemNumberDone"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Thank you
I am not 100% sure what you would like to achieve, it helps if you also would have shared the mapping.
A bucket aggregation is about defining the buckets/groups. As you do in your example, you can wrap/nest bucket aggregations to further break down your buckets into sub-buckets and so on.
By default Elasticsearch always calculates the count-metric, but you can specify other metrics to get calculated as well. A metric is calculated per bucket / for a bucket (and not for another metric) this is why you cannot nest a metrics aggregation under a metric aggregation, it simply does not make sense.
Depending how your data looks like the only change you may need to do is, moving the sum_done
aggregation out of the aggs
-clause, to the same level as your max_date_aggs
-aggregation.
Code Snippet
"aggs": {
"max_date_aggs": { "max": {"field": "timestamp"} },
"sum_done": { "sum": { "field": "itemNumberDone"} }
}
After you refined your question and you provided I managed to come up with a solution requiring one single request. As previously mentioned that sum
-metric aggregation needs to operate on a bucket and not a metric. The solution is pretty straight forward: rather than calculating the max
-date, just re-formulate this aggregation to a terms
-aggregation, sorted by descending timestamp, asking for exactly one bucket.
Solution
GET gos_element/_search
{
"size": 0,
"aggs": {
"sender_comp_aggs": {
"terms": {"field": "senderComponent.keyword"},
"aggs": {
"activity_mnemo_aggs": {
"terms": {"field": "activityMnemo.keyword"},
"aggs": {
"activity_instance_id_aggs": {
"terms": {"field": "activityInstanceId.keyword"},
"aggs": {
"business_date_aggs": {
"terms": {"field": "correlationIdSet.businessDate"},
"aggs": {
"context_set_id_closing_aggs": {
"terms": {"field": "contextSetId.closing.keyword"},
"aggs": {
"max_date_bucket_aggs": {
"terms": {
"field": "timestamp",
"size": 1,
"order": {"_key": "desc"}
},
"aggs": {
"sum_done": {
"sum": {"field": "itemNumberDone"}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
As I relied on the default Elasticsearch mapping, I had to refer to the .keyword
-version of the fields. If your fields are directly mapped to a field of type keyword
, you don't need to do that.
You can test the request above right away after indexing the documents provided by you with the following 2 commands:
PUT gos_element/_doc/AW_yu3dIa2R_HwqpSz
{
"senderComponent": "PS",
"timestamp": "2020-01-28T02:31:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "Progress",
"activityStatusNumber": 300,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 9,
"itemNumberInError": 0,
"itemNumberNotStarted": 1,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
PUT gos_element/_doc/AW_yu3dIa2R_HwqpSz8z
{
"senderComponent": "PS",
"timestamp": "2020-01-28T03:01:00Z",
"activityMnemo": "PScommand",
"activityInstanceId": "123466",
"activityStatus": "End",
"activityStatusNumber": 200,
"specificActivityStatus": "",
"itemNumberTotal": 10,
"itemNumberDone": 10,
"itemNumberInError": 0,
"itemNumberNotStarted": 0,
"itemNumberInProgress": 0,
"itemUnit": "Command",
"itemList": [],
"contextSetId": {
"PV": "VAR",
"closing": "PARIS"
},
"correlationIdSet": {
"closing": "PARIS",
"businessDate": "2020-01-27",
"correlationId": "54947df8-0e9e-4471-a2f9-9af509fb5899"
},
"errorSet": [],
"errorMessages": "",
"kpiSet": "",
"activitySpecificPayload": "",
"messageGroupUUID": "54947df8-0e9e-4471-a2f9-9af509fb5899"
}
As a result you get back the following response (with value 10 as expected):
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"sender_comp_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PS",
"doc_count" : 2,
"activity_mnemo_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PScommand",
"doc_count" : 2,
"activity_instance_id_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "123466",
"doc_count" : 2,
"business_date_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1580083200000,
"key_as_string" : "2020-01-27T00:00:00.000Z",
"doc_count" : 2,
"context_set_id_closing_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "PARIS",
"doc_count" : 2,
"max_date_bucket_aggs" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 1,
"buckets" : [
{
"key" : 1580180460000,
"key_as_string" : "2020-01-28T03:01:00.000Z",
"doc_count" : 1,
"sum_done" : {
"value" : 10.0
}
}
]
}
}
]
}
}
]
}
}
]
}
}
]
}
}
]
}
}
}
Hi, first of all thank you for you quick reply! I tried moving the sum_done at the same level as the max_date_aggs. It shows me the sum of all of the items done and not the sum of the items done which have the max of timestamp. I also tried to use buckets but I don't know how to get the sum of the field itemNumberDone. I'll add further details right away.
Oh, what you would like to achieve is something completely different. You would need to have a bucket / scope with for all documents having that max_date as value. Most likely you have to solve that on application side: - 1st request: query Elasticsearch to figure out about the max_date(s) - 2nd request: use a filters aggregation with the individual max_dates as filters to get the buckets created and then you can ask Elasticsearc to for the sum over the documents in this buckets. I probably want to reconsider your data model anyway as you have so many nested aggregations already.
I tried buckets but I don't know how to retrieve and then sum the itemNumberDone. I'll give you the mapping and further details.
When you say "on application side". You mean create a backend that do the resquests? I would like to do that with only one query and no backend -if possible-.
Thank you, it worked! I changed "_key" by "_term" and it worked just fine.