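Apache Nutch 1.3 Study Notes 10 (Plugin Extension)
1. Writing a simple plugin of your own
Here we extend Nutch with a URLFilter plugin called MyURLFilter.
1.1 Create a package
First create a package structure similar to that of urlfilter-regex,
for example org.apache.nutch.urlfilter.my.
1.2 Create the extension class in this package
Next, create a file MyURLFilter.java with the following content: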
package org.apache.nutch.urlfilter.my;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.net.URLFilter;

// Implements Nutch's URLFilter extension point.
public class MyURLFilter implements URLFilter {

    private Configuration conf;

    public MyURLFilter() {
    }

    // Filters a URL string. This demo only tags the URL with a prefix so the
    // plugin's output is easy to spot; a real filter returns the URL to
    // accept it, or null to reject it.
    @Override
    public String filter(String urlString) {
        return "My Filter:" + urlString;
    }

    @Override
    public Configuration getConf() {
        return this.conf;
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    // Reads URLs from standard input, one per line, and prints the
    // filtered result for each.
    public static void main(String[] args) throws IOException {
        MyURLFilter filter = new MyURLFilter();

        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            String out = filter.filter(line);
            if (out != null) {
                System.out.println(out);
            }
        }
    }
}
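The filter above always returns a modified string, which is enough to confirm the plugin is being called. Nutch's URLFilter contract, however, expects filter to return the URL when it is accepted and null when it is rejected. The following is a minimal sketch of that contract, not part of the original plugin. SimpleUrlFilter is a hypothetical local stand-in for org.apache.nutch.net.URLFilter so that the snippet compiles without the Nutch jars, and the prefix rule and example URLs are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.apache.nutch.net.URLFilter, declared locally
// so this sketch compiles without Nutch on the classpath.
interface SimpleUrlFilter {
    // Returns the URL if it is accepted, or null if it is rejected.
    String filter(String urlString);
}

// Hypothetical filter: accepts only URLs that start with a given prefix.
class AcceptPrefixFilter implements SimpleUrlFilter {
    private final String prefix;

    AcceptPrefixFilter(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public String filter(String urlString) {
        if (urlString == null || !urlString.startsWith(prefix)) {
            return null; // rejected
        }
        return urlString; // accepted, returned unchanged
    }
}

class UrlFilterDemo {
    public static void main(String[] args) {
        SimpleUrlFilter filter = new AcceptPrefixFilter("http://example.com/");
        List<String> urls = Arrays.asList(
            "http://example.com/index.html",
            "http://other.example.org/page");
        for (String url : urls) {
            String out = filter.filter(url);
            System.out.println(url + " -> " + (out == null ? "rejected" : "accepted"));
        }
    }
}
```

To turn this into a real plugin class, the same filter body could be placed in MyURLFilter, keeping the getConf and setConf methods shown earlier.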
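1.3 Package the jar and create the plugin.xml file
The jar can be built with Ivy or with Eclipse. Every plugin also has a descriptor file, plugin.xml, which tells Nutch which jar to load and which extension point the plugin implements.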
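As an illustration, a descriptor for MyURLFilter might look roughly like the sketch below. It is modeled on the plugin.xml that ships with urlfilter-regex. The plugin id urlfilter-my, the jar name urlfilter-my.jar, and the display names are assumptions and should be adjusted to match the actual build output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: id, name and jar name are assumed, not taken from the original post. -->
<plugin
   id="urlfilter-my"
   name="My URL Filter"
   version="1.0.0"
   provider-name="nutch.org">

   <runtime>
      <!-- The jar built in this step (assumed file name). -->
      <library name="urlfilter-my.jar">
         <export name="*"/>
      </library>
   </runtime>

   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>

   <!-- Registers MyURLFilter as an implementation of the URLFilter extension point. -->
   <extension id="org.apache.nutch.urlfilter.my"
              name="My URL Filter"
              point="org.apache.nutch.net.URLFilter">
      <implementation id="MyURLFilter"
                      class="org.apache.nutch.urlfilter.my.MyURLFilter"/>
   </extension>
</plugin>
```

For Nutch to load the plugin, its id would normally also have to match the plugin.includes property in the Nutch configuration.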