Note: this post originates from stackoverflow.com; see the original question there for the source text.

docker - Parsing Quoted Strings and DateTime Offset

Posted 2020-03-29 21:50:55

Using the Grok Debugger, I want to parse some custom data:

1 1 "Device 1" 1 "Input 1" 0 "On" "Off" "2020-01-01T00:00:00.1124303+00:00"

So far I have:

%{INT:id} %{INT:device} %{QUOTEDSTRING:device_name} %{INT:input} %{QUOTEDSTRING:input_name} %{INT:state} %{QUOTEDSTRING:on_phrase} %{QUOTEDSTRING:off_phrase} \"%{TIMESTAMP_ISO8601:when}\"

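However, I get doubled double quotes around the strings captured by %{QUOTEDSTRING}, and two hour entries and two minute entries for the date and time captured by %{TIMESTAMP_ISO8601:when}: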

{
  "id": [
    [
      "1"
    ]
  ],
  "device": [
    [
      "1"
    ]
  ],
  "device_name": [
    [
      ""Device 1""
    ]
  ],
  "input": [
    [
      "1"
    ]
  ],
  "input_name": [
    [
      ""Input 1""
    ]
  ],
  "state": [
    [
      "0"
    ]
  ],
  "on_phrase": [
    [
      ""On""
    ]
  ],
  "off_phrase": [
    [
      ""Off""
    ]
  ],
  "when": [
    [
      "2020-01-01T00:00:00.1124303+00:00"
    ]
  ],
  "YEAR": [
    [
      "2020"
    ]
  ],
  "MONTHNUM": [
    [
      "01"
    ]
  ],
  "MONTHDAY": [
    [
      "01"
    ]
  ],
  "HOUR": [
    [
      "00",
      "00"
    ]
  ],
  "MINUTE": [
    [
      "00",
      "00"
    ]
  ],
  "SECOND": [
    [
      "00.1124303"
    ]
  ],
  "ISO8601_TIMEZONE": [
    [
      "+00:00"
    ]
  ]
}

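Also, I'm a bit confused when it comes to logstash.conf, because I'm not sure what to put in index inside output. The following code is from an earlier example on GitHub: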

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    manage_template => false
    index => "sample-%{+YYYY.MM.dd}"
  }
}

I'm guessing mine would look like this:

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{INT:id} %{INT:device} %{QUOTEDSTRING:device_name} %{INT:input} %{QUOTEDSTRING:input_name} %{INT:state} %{QUOTEDSTRING:on_phrase} %{QUOTEDSTRING:off_phrase} \"%{TIMESTAMP_ISO8601:when}\"" }
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    manage_template => false
    index => "sample-%{????????}"
  }
}

Again, I'm not clear on what I should put in "sample-%{????????}".


Asked by
user1574598
Mafor 2020-01-31 19:56

Regarding the double quotes: just use DATA instead of QUOTEDSTRING, and put the literal quotes outside the capture:

"%{DATA:device_name}"
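
The difference can be checked outside Logstash with a plain regular expression. The following is only a sketch in Python: `re` stands in for Grok, `.*?` is what grok's DATA expands to, and `"[^"]*"` is a simplified assumption standing in for the more elaborate QUOTEDSTRING definition.

```python
import re

line = '1 1 "Device 1" 1 "Input 1" 0 "On" "Off"'

# Capturing the whole quoted token (roughly what QUOTEDSTRING does)
# keeps the surrounding quotes in the field value.
with_quotes = re.search(r'(?P<device_name>"[^"]*")', line)

# Literal quotes outside a lazy capture (the "%{DATA:...}" idea)
# leave only the text between the quotes in the field value.
without_quotes = re.search(r'"(?P<device_name>.*?)"', line)

print(with_quotes.group("device_name"))     # "Device 1"
print(without_quotes.group("device_name"))  # Device 1
```

The Grok Debugger then wraps each value in its own JSON quotes, which is why the first form shows up as ""Device 1"".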

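The duplicate entries for hour and minute come from the time zone: the first entry is the actual hour, and the second is the hour of the time zone offset. The same goes for the minutes.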

To get rid of them, you will need a custom pattern:

"(?<when>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?(?<ISO8601_TIMEZONE>Z|[+-](?:2[0123]|[01]?[0-9])(?::?(?:[0-5][0-9])))?)"

(If you are not interested in parsing the timestamp at all, just use DATA again.)

So your pattern could look like this:

%{INT:id} %{INT:device} "%{DATA:device_name}" %{INT:input} "%{DATA:input_name}" %{INT:state} "%{DATA:on_phrase}" "%{DATA:off_phrase}" "(?<when>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?(?<ISO8601_TIMEZONE>Z|[+-](?:2[0123]|[01]?[0-9])(?::?(?:[0-5][0-9])))?)"
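
As a rough cross-check of the pattern above, here is the same structure written as an ordinary Python regular expression and applied to the sample line from the question. The sub-expressions (`TIMESTAMP`, the integer pieces) are simplified assumptions rather than grok's exact definitions, and unlike the grok version this sketch does not capture the time zone as a separate field.

```python
import re

# Simplified stand-in for the timestamp part (an assumption, not grok's definition).
# Hour, minute and offset are matched but not captured separately,
# so no duplicate hour/minute entries can appear.
TIMESTAMP = (
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:?\d{2}(?::?\d{2}(?:[.,]\d+)?)?"
    r"(?:Z|[+-]\d{2}:?\d{2})?"
)

LINE_RE = re.compile(
    r'(?P<id>[+-]?\d+) (?P<device>[+-]?\d+) "(?P<device_name>.*?)" '
    r'(?P<input>[+-]?\d+) "(?P<input_name>.*?)" (?P<state>[+-]?\d+) '
    r'"(?P<on_phrase>.*?)" "(?P<off_phrase>.*?)" '
    r'"(?P<when>' + TIMESTAMP + r')"'
)

sample = '1 1 "Device 1" 1 "Input 1" 0 "On" "Off" "2020-01-01T00:00:00.1124303+00:00"'

match = LINE_RE.match(sample)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name} = {value}")
```

Every value comes back as a string without surrounding quotes, and `when` holds the full timestamp including the offset.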

Regarding the index:

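  • You can omit it entirely, in which case the default is used: logstash-%{+YYYY.MM.dd}
  • You can use sample-%{+YYYY.MM.dd} if you want a separate index for each day
  • You can use just sample- if you want a single index
  • You can use any other combination of fields in the index pattern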
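
To make the daily-index option concrete, the sketch below approximates in Python what a name such as sample-%{+YYYY.MM.dd} expands to for a single event. `daily_index` is a hypothetical helper written for this illustration, not part of Logstash. It assumes the date comes from the event's @timestamp rendered in UTC, which is the ingestion time unless something like a date filter sets it from a field such as `when`.

```python
from datetime import datetime, timezone


def daily_index(prefix: str, event_time: datetime) -> str:
    """Approximate what "<prefix>-%{+YYYY.MM.dd}" expands to for one event."""
    # Assumption: the date part is taken from the event timestamp in UTC.
    utc = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc:%Y.%m.%d}"


event_time = datetime(2020, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
print(daily_index("sample", event_time))    # sample-2020.01.01
print(daily_index("logstash", event_time))  # logstash-2020.01.01
```

All events whose timestamps fall on the same UTC day therefore land in the same index, and a new index starts with the first event of the next day.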