I have a file with the following pattern of information:
# Query 1: 204.60k QPS, 230.79x concurrency, ID XXXXXXXXXX at byte 19XXX9318
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: 2020-01-29 18:18:59.073995 to 18:18:59.074005
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 7 2
# Exec time 10 2ms 1ms 1ms 1ms 1ms 12us 1ms
# Rows affecte 0 0 0 0 0 0 0 0
# Query size 7 74 37 37 37 37 0 37
# Warning coun 0 0 0 0 0 0 0 0
# String:
# Hosts 10.1.1.5 (1/50%), 10.8.0.2 (1/50%)
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms ################################################################
# 10ms
# 100ms
# 1s
# 10s+
SHOW SESSION STATUS LIKE 'XXXXX'\G
\n break line
repeat
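I want to run a Python script that extracts only some of the information from this file. There will be multiple queries.
Currently I am trying something like this: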
#!/usr/bin/python
file = open("/etc/openvpn/logs/log-2020_01_29_06_20_PM.txt", "r")
read = file.read()
removeChar = read.replace("#", "")
for item in removeChar.split("\n"):
    if "Hosts" and "Time range" in item:
        print item.strip()
The output is:
Time range: 2020-01-29 18:18:59.073995 to 18:18:59.074005
Time range: 2020-01-29 18:18:58.489162 to 18:18:59.188582
Time range: 2020-01-29 18:18:58.666020 to 18:18:58.666028
I would like it to look like this:
['Query 1, 2020-01-29 18:18:59, 10.1.1.5, 10.8.0.2, SHOW SESSION STATUS LIKE 'XXXXX'\G']
['Query 2, 2020-01-29 18:19:59, 10.1.1.5, 10.8.0.2, SHOW FROM BLA * LIKE 'BLA'\G']
I'm worn out from trying to find a way to do this. I'm also still learning Python, because it's a great language to learn! :)
Thanks.
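A likely reason the attempt above prints only the Time range lines: Python evaluates `"Hosts" and "Time range" in item` as `"Hosts" and ("Time range" in item)`, and a non-empty string is always truthy, so the "Hosts" part never filters anything. The sketch below illustrates this on one sample line (with the "#" already removed, as in the script); the names `as_written` and `intended` are illustrative only.

```python
# One line from the log after the "#" characters have been stripped.
item = " Hosts 10.1.1.5 (1/50%), 10.8.0.2 (1/50%)"

# The condition as written: "Hosts" is a non-empty (truthy) string,
# so the whole expression reduces to `"Time range" in item`.
as_written = bool("Hosts" and "Time range" in item)

# Testing each keyword separately matches either kind of line.
intended = any(key in item for key in ("Hosts", "Time range"))

print(as_written, intended)
```

With this change the loop would also print the Hosts lines, although it still would not group the fields per query.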
You can try doing this without regular expressions, using only string operations:
data = r'''# Query 1: 204.60k QPS, 230.79x concurrency, ID XXXXXXXXXX at byte 19XXX9318
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: 2020-01-29 18:18:59.073995 to 18:18:59.074005
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 7 2
# Exec time 10 2ms 1ms 1ms 1ms 1ms 12us 1ms
# Rows affecte 0 0 0 0 0 0 0 0
# Query size 7 74 37 37 37 37 0 37
# Warning coun 0 0 0 0 0 0 0 0
# String:
# Hosts 10.1.1.5 (1/50%), 10.8.0.2 (1/50%)
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms ################################################################
# 10ms
# 100ms
# 1s
# 10s+
SHOW SESSION STATUS LIKE 'XXXXX'\G
\n break line
repeat
'''
data = data.split('\n')
all_results = []
result = []
for row in data:
    # Block header, e.g. '# Query 1: ...' (skip the '# Query size' attribute row)
    if row.startswith('# Query ') and not row.startswith('# Query size'):
        row = row.split(':')[0].split('# ')[1]
        result.append(row)
    elif row.startswith('# Hosts'):
        # Strip the label and spaces, then drop the '(count/percent)' suffix
        row = row.replace('# Hosts', '').replace(' ', '').split(',')
        result.append(row[0].split('(')[0])
        result.append(row[1].split('(')[0])
    elif row.startswith('# Time range:'):
        # Keep the start timestamp, truncated to whole seconds
        row = row.replace('# Time range:', '').split('.')[0].strip()
        result.append(row)
    elif row.startswith('SHOW') and row.endswith('\\G'):
        # The statement closes the block: join the fields and start a new one
        result.append(row)
        result = ', '.join(result)
        all_results.append(result)
        result = []
print(all_results)
# output: ["Query 1, 2020-01-29 18:18:59, 10.1.1.5, 10.8.0.2, SHOW SESSION STATUS LIKE 'XXXXX'\\G"]
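The doubled backslash in that output is only how Python displays a string inside a list; the stored statement still ends in a single `\G`. A minimal sketch of the difference, assuming the same statement text (the variable names are illustrative):

```python
# Raw string literal: the statement ends with one real backslash followed by G.
line = r"SHOW SESSION STATUS LIKE 'XXXXX'\G"

as_list = str([line])   # what print([line]) shows: repr() of each element
as_text = str(line)     # what print(line) shows: the string itself

print(as_list)
print(as_text)
```

Printing the elements one by one, for example with `for entry in all_results: print(entry)`, therefore shows the statement exactly as it appears in the log.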
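Hey! Thanks, mate! When I try to open a file, read it, and use this code on it, I get: IndexError: list index out of range. How can I get around this? I want to open a file and read it.
Which line does the error occur on? Or post one of these files, if you can, so I can test with it.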
I added data = open("/path/to/file.txt", "r") and then split the data with data = data.read().split('\n'), and the error is on line 10: result = [data[0].split(':')[0].split('#')[1]] raises IndexError: list index out of range.
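Does each file contain multiple queries?
Yes. It is a file containing a week's worth of queries..., and I want to format all of them the way you did, because the next step is to store everything in a database to see who did what :)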
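For a file with many query blocks, one possible approach is to walk the lines once and emit a summary each time a statement line closes a block. The sketch below is not the answerer's code; it assumes every block starts with a "# Query N:" header and that the first non-comment line after it is the statement. The function name `parse_digest` and the second sample block are made up for illustration. Indexing a fixed position such as `data[0]` or `row[1]` is a plausible source of the IndexError when a line is empty or lists a single host, so the sketch avoids fixed positions.

```python
def parse_digest(lines):
    """Return one 'Query N, time, hosts..., statement' string per block."""
    all_results = []
    result = []
    for raw in lines:
        row = raw.rstrip('\n')
        if row.startswith('# Query ') and not row.startswith('# Query size'):
            # New block: start over, dropping any incomplete previous block.
            result = [row.split(':')[0][2:]]
        elif row.startswith('# Time range:'):
            # Keep the start timestamp, truncated to whole seconds.
            result.append(row[len('# Time range:'):].split('.')[0].strip())
        elif row.startswith('# Hosts'):
            # Accept any number of hosts instead of exactly two.
            for host in row[len('# Hosts'):].split(','):
                result.append(host.split('(')[0].strip())
        elif result and row.strip() and not row.startswith('#'):
            # First non-comment line inside a block is treated as the statement.
            result.append(row.strip())
            all_results.append(', '.join(result))
            result = []
    return all_results


# Illustrative input: block 1 is abridged from the question, block 2 is invented.
sample = [
    "# Query 1: 204.60k QPS, 230.79x concurrency, ID XXXXXXXXXX at byte 19XXX9318",
    "# Time range: 2020-01-29 18:18:59.073995 to 18:18:59.074005",
    "# Query size 7 74 37 37 37 37 0 37",
    "# Hosts 10.1.1.5 (1/50%), 10.8.0.2 (1/50%)",
    "# Query_time distribution",
    r"SHOW SESSION STATUS LIKE 'XXXXX'\G",
    "",
    "# Query 2: header fields omitted in this illustration",
    "# Time range: 2020-01-29 18:18:58.489162 to 18:18:59.188582",
    "# Hosts 10.1.1.5 (1/100%)",
    r"SHOW FROM BLA * LIKE 'BLA'\G",
]

for entry in parse_digest(sample):
    print(entry)
```

Because the function accepts any iterable of lines, an open file object can be passed directly, for example `with open(path) as fh: rows = parse_digest(fh)`, where `path` stands for the real log location.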