Nutch hello world
1. Download and install Ant.
2. Download and install Cygwin.
3. Download HBase. These notes name version 0.94.14, but the link below points to the 0.98.9-hadoop2 build, so check which version your Nutch release expects before downloading:
   http://mirrors.cnnic.cn/apache/hbase/stable/hbase-0.98.9-hadoop2-bin.tar.gz
4. Set JAVA_HOME in .bashrc.
5. Download a Nutch source package from:
   http://mirror.bit.edu.cn/apache/nutch/2.2.1/
6. Change into the source directory: cd apache-nutch-2.2.1
7. Run ant.

After the build finishes, the directory runtime/local contains a ready-to-use Nutch installation.

Customize your crawl properties

Add your agent name in the value field of the http.agent.name property in conf/nutch-site.xml, for example: ...
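The download and build steps above can be sketched as a short shell session. This is only a rough outline under stated assumptions: the archive name apache-nutch-2.2.1-src.tar.gz and the JAVA_HOME path are not given in these notes and may differ on your mirror or machine, and the function name fetch_and_build_nutch is made up for illustration.

```shell
# Sketch only: defines the steps as a function so nothing is downloaded
# until you call it explicitly.

# Assumption: adjust to wherever your JDK actually lives, then put the
# same line in ~/.bashrc.
# export JAVA_HOME=/path/to/your/jdk

NUTCH_VERSION="2.2.1"
NUTCH_MIRROR="http://mirror.bit.edu.cn/apache/nutch/${NUTCH_VERSION}"
NUTCH_ARCHIVE="apache-nutch-${NUTCH_VERSION}-src.tar.gz"  # assumed file name
NUTCH_DIR="apache-nutch-${NUTCH_VERSION}"

fetch_and_build_nutch() {
    # Each step runs only if the previous one succeeded.
    wget "${NUTCH_MIRROR}/${NUTCH_ARCHIVE}" &&
    tar -xzf "${NUTCH_ARCHIVE}" &&
    cd "${NUTCH_DIR}" &&
    ant &&
    ls runtime/local   # the ready-to-use installation should be here
}
```

To run the steps, source the snippet and call fetch_and_build_nutch from the directory where you want the source tree unpacked.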