Word count example
from pydoop.pipes import Mapper, Reducer, Factory, runTask

class WordCountMapper(Mapper):
    def map(self, context):
        # Emit (word, "1") for every whitespace-separated word in the input value.
        words = context.getInputValue().split()
        for w in words:
            context.emit(w, "1")

class WordCountReducer(Reducer):
    def reduce(self, context):
        # Sum all counts received for the current key.
        s = 0
        while context.nextValue():
            s += int(context.getInputValue())
        context.emit(context.getInputKey(), str(s))

# Hand the mapper and reducer classes to the framework and start the task.
runTask(Factory(WordCountMapper, WordCountReducer))
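To see what happens between the map and reduce calls, here is a minimal local sketch that does not use Pydoop or Hadoop at all. FakeContext, map_line, reduce_key and run_local are hypothetical names introduced only for this illustration; FakeContext mimics just the four context methods used in the example above (getInputKey, getInputValue, nextValue, emit) and is not the real Pydoop context. The grouping step is a simplified stand-in for Hadoop's shuffle phase.

```python
from collections import defaultdict


class FakeContext:
    """Hypothetical stand-in for the context object; not part of Pydoop."""

    def __init__(self, key=None, values=()):
        self._key = key
        self._values = iter(values)
        self._current = None
        self.emitted = []

    def getInputKey(self):
        return self._key

    def getInputValue(self):
        return self._current

    def nextValue(self):
        # Advance to the next value; return False once the values run out.
        try:
            self._current = next(self._values)
            return True
        except StopIteration:
            return False

    def emit(self, key, value):
        self.emitted.append((key, value))


def map_line(context):
    # Same logic as WordCountMapper.map above.
    for w in context.getInputValue().split():
        context.emit(w, "1")


def reduce_key(context):
    # Same logic as WordCountReducer.reduce above.
    s = 0
    while context.nextValue():
        s += int(context.getInputValue())
    context.emit(context.getInputKey(), str(s))


def run_local(lines):
    # Map phase: run the mapper on each line and group emitted values by key
    # (a simplified, in-memory version of the shuffle step).
    grouped = defaultdict(list)
    for line in lines:
        ctx = FakeContext(values=[line])
        ctx.nextValue()  # load the line as the current input value
        map_line(ctx)
        for k, v in ctx.emitted:
            grouped[k].append(v)

    # Reduce phase: run the reducer once per key, in sorted key order.
    results = {}
    for k in sorted(grouped):
        ctx = FakeContext(key=k, values=grouped[k])
        reduce_key(ctx)
        for key, value in ctx.emitted:
            results[key] = value
    return results


if __name__ == "__main__":
    print(run_local(["a b a", "b c"]))  # {'a': '2', 'b': '2', 'c': '1'}
```

Counts come back as strings because the reducer emits str(s), matching the example above.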
For simple tasks, you can use the pydoop_script tool:
def mapper(k, text, writer):
    for word in text.split():
        writer.emit(word, 1)

def reducer(word, count, writer):
    writer.emit(word, sum(map(int, count)))

References:
Download: https://sourceforge.net/projects/pydoop/files
Examples: http://pydoop.sourceforge.net/docs/examples/index.html
Latest release download: http://sourceforge.net/projects/pydoop/files/Pydoop-0.5/pydoop-0.5.2-rc2.tar.gz/download
Home page: http://sourceforge.net/apps/mediawiki/pydoop/index.php?title=Main_Page