Note: this article is reproduced from stackoverflow.com.

Tags: docker, logstash-grok

Parsing Quoted Strings and DateTime Offset

Published on 2020-03-29 21:01:47

With the Grok Debugger I am trying to parse some custom data:

1 1 "Device 1" 1 "Input 1" 0 "On" "Off" "2020-01-01T00:00:00.1124303+00:00"

So far I have:

%{INT:id} %{INT:device} %{QUOTEDSTRING:device_name} %{INT:input} %{QUOTEDSTRING:input_name} %{INT:state} %{QUOTEDSTRING:on_phrase} %{QUOTEDSTRING:off_phrase} \"%{TIMESTAMP_ISO8601:when}\"

However, I am getting things I don't want: %{QUOTEDSTRING} keeps the surrounding double quotes around the strings, and %{TIMESTAMP_ISO8601:when} produces two sets of hours and minutes:

{
  "id": [
    [
      "1"
    ]
  ],
  "device": [
    [
      "1"
    ]
  ],
  "device_name": [
    [
      ""Device 1""
    ]
  ],
  "input": [
    [
      "1"
    ]
  ],
  "input_name": [
    [
      ""Input 1""
    ]
  ],
  "state": [
    [
      "0"
    ]
  ],
  "on_phrase": [
    [
      ""On""
    ]
  ],
  "off_phrase": [
    [
      ""Off""
    ]
  ],
  "when": [
    [
      "2020-01-01T00:00:00.1124303+00:00"
    ]
  ],
  "YEAR": [
    [
      "2020"
    ]
  ],
  "MONTHNUM": [
    [
      "01"
    ]
  ],
  "MONTHDAY": [
    [
      "01"
    ]
  ],
  "HOUR": [
    [
      "00",
      "00"
    ]
  ],
  "MINUTE": [
    [
      "00",
      "00"
    ]
  ],
  "SECOND": [
    [
      "00.1124303"
    ]
  ],
  "ISO8601_TIMEZONE": [
    [
      "+00:00"
    ]
  ]
}

Also, I am a little stuck on logstash.conf, as I am not sure what to put as the index in the output. The following code is from an earlier example on GitHub:

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    manage_template => false
    index => "sample-%{+YYYY.MM.dd}"
  }
}

I'm guessing mine would look something like this:

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{INT:id} %{INT:device} %{QUOTEDSTRING:device_name} %{INT:input} %{QUOTEDSTRING:input_name} %{INT:state} %{QUOTEDSTRING:on_phrase} %{QUOTEDSTRING:off_phrase} \"%{TIMESTAMP_ISO8601:when}\"" }
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    manage_template => false
    index => "sample-%{????????}"
  }
}

Again I'm unclear as to what I am supposed to do with "sample-%{????????}"

Questioner: user1574598
Viewed: 16

Answer from Mafor, 2020-01-31 19:56:

In regard to the double-double-quotes: just use DATA instead of QUOTEDSTRING:

"%{DATA:device_name}"
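
Grok isn't Python, but the difference is easy to sketch with Python's re module (the regexes below are simplified approximations of the grok patterns, not their exact definitions): QUOTEDSTRING matches the quotes as part of the capture, while putting literal quotes around %{DATA} keeps them out of it.

```python
import re

line = '1 "Device 1"'

# QUOTEDSTRING-style capture: the quotes are part of the match
quoted = re.search(r'("(?:[^"\\]|\\.)*")', line)
print(quoted.group(1))  # "Device 1"

# DATA-style capture with literal quotes in the pattern: quotes excluded
data = re.search(r'"([^"]*)"', line)
print(data.group(1))    # Device 1
```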

Duplicated entries in the hours and minutes come from the timezone: the first entry is the actual hour, the second one is the hour of the timezone offset. The same goes for the minutes.

To get rid of them you would need a custom pattern:

"(?<when>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?(?<ISO8601_TIMEZONE>Z|[+-](?:2[0123]|[01]?[0-9])(?::?(?:[0-5][0-9])))?)"

(if you are not interested in parsing the timestamp at all, just use DATA again).
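
The point of the custom pattern is that a single named group swallows the whole timestamp, so no extra HOUR/MINUTE/ISO8601_TIMEZONE fields leak out. A rough Python-regex equivalent (simplified, not the exact grok definitions):

```python
import re

# One named group captures the entire timestamp; the inner pieces
# (date, time, fractional seconds, offset) are all unnamed groups.
ts_pattern = (r'(?P<when>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?'
              r'(?:Z|[+-]\d{2}:?\d{2})?)')
m = re.search(ts_pattern, '"2020-01-01T00:00:00.1124303+00:00"')
print(m.group('when'))  # 2020-01-01T00:00:00.1124303+00:00
print(m.groupdict())    # only 'when' is produced, no stray HOUR/MINUTE
```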

So, your pattern might look like this:

%{INT:id} %{INT:device} "%{DATA:device_name}" %{INT:input} "%{DATA:input_name}" %{INT:state} "%{DATA:on_phrase}" "%{DATA:off_phrase}" "(?<when>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?(?<ISO8601_TIMEZONE>Z|[+-](?:2[0123]|[01]?[0-9])(?::?(?:[0-5][0-9])))?)"
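
To sanity-check the whole pattern against the sample line, here is a Python re emulation (the grok patterns are approximated with plain regex classes, which is close enough for this input):

```python
import re

line = '1 1 "Device 1" 1 "Input 1" 0 "On" "Off" "2020-01-01T00:00:00.1124303+00:00"'

# Approximation: INT -> \d+, quoted DATA -> "[^"]*", timestamp kept as one field
pattern = (r'(?P<id>\d+) (?P<device>\d+) "(?P<device_name>[^"]*)" '
           r'(?P<input>\d+) "(?P<input_name>[^"]*)" (?P<state>\d+) '
           r'"(?P<on_phrase>[^"]*)" "(?P<off_phrase>[^"]*)" "(?P<when>[^"]*)"')

m = re.match(pattern, line)
print(m.group('device_name'))  # Device 1   (no surrounding quotes)
print(m.group('when'))         # 2020-01-01T00:00:00.1124303+00:00
```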

Regarding the index:

  • you can omit it completely, in which case the default is used: logstash-%{+YYYY.MM.dd}
  • you can use sample-%{+YYYY.MM.dd} if you want to have separate indexes for each day
  • you can use sample- to have just one index
  • you can use any other combination of the fields in your index pattern
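
The %{+YYYY.MM.dd} part is a date format that Logstash applies to each event's @timestamp (in UTC) when building the index name, which is what gives you one index per day. Roughly, in Python terms:

```python
from datetime import datetime, timezone

# Rough sketch of what index => "sample-%{+YYYY.MM.dd}" does:
# format the event's @timestamp into the index name.
event_timestamp = datetime(2020, 1, 1, tzinfo=timezone.utc)
index_name = "sample-" + event_timestamp.strftime("%Y.%m.%d")
print(index_name)  # sample-2020.01.01
```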