我正在尝试使用python代码使用mapreduce执行hadoop流,但是,它始终会给出相同的错误结果,
File: file:/C:/py-hadoop/map.py is not readable
或者
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
我在Windows 10操作系统上使用hadoop 3.1.1和python 3.8
这是我的 map减少命令行
hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file C:/py-hadoop/map.py,C:/py-hadoop/reduce.py -mapper "python map.py" -reducer "python reduce.py" -input /user/input -output /user/python-output
map.py
import sys
for line in sys.stdin:
line = line.strip()
words = line.split()
for word in words:
print ("%s\t%s" % (word, 1))
reduce.py
from operator import itemgetter
import sys
current_word = None
current_count = 0
word = None
clean = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
for line in sys.stdin:
line = line.strip()
word, count = line.split('\t', 1)
try:
count = int(count)
except ValueError:
continue
word = filter(lambda x: x in clean, word).lower()
if current_word == word:
current_count += count
else:
if current_word:
print ("%s\t%s" % (current_word, current_count))
current_count = count
current_word = word
if current_word == word:
print ("%s\t%s" % (current_word, current_count))
也已经尝试过使用其他命令行,例如
hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file C:/py-hadoop/map.py -mapper "python map.py" -file C:/py-hadoop/reduce.py -reducer "python reduce.py" -input /user/input -output /user/python-output
和
hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file py-hadoop/map.py -mapper "python map.py" -file py-hadoop/reduce.py -reducer "python reduce.py" -input /user/input -output /user/python-output
但仍然给出完全相同的错误结果,
对不起,如果我的英语不好,我不是母语
已经解决了,问题是由reduce.py引起的,这是我的新reduce.py
import sys
import collections
counter = collections.Counter()
for line in sys.stdin:
word, count = line.strip().split("\t", 1)
counter[word] += int(count)
for x in counter.most_common(9999):
print(x[0],"\t",x[1])
这是我以前运行的命令行
hadoop jar C:/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar -file C:/py-hadoop/map.py -file C:/py-hadoop/reduce.py -mapper "python map.py" -reducer "python reduce.py" -input /user/input -output /user/python-output