Example 1 with ParserFactory

Use of org.apache.nutch.parse.ParserFactory in the project nutch by apache.

From the class FeedParser, method setConf.

/**
 * Sets the {@link Configuration} object for this {@link Parser}. This
 * {@link Parser} expects the following configuration properties to be set:
 *
 * <ul>
 * <li>URLNormalizers - properties in the configuration object to set up the
 * default url normalizers.</li>
 * <li>URLFilters - properties in the configuration object to set up the
 * default url filters.</li>
 * </ul>
 *
 * @param conf
 *          The Hadoop {@link Configuration} object to use to configure this
 *          {@link Parser}.
 */
public void setConf(Configuration conf) {
    this.conf = conf;
    this.parserFactory = new ParserFactory(conf);
    this.normalizers = new URLNormalizers(conf, URLNormalizers.SCOPE_OUTLINK);
    this.filters = new URLFilters(conf);
    this.defaultEncoding = conf.get("parser.character.encoding.default", "windows-1252");
}
Also used: ParserFactory (org.apache.nutch.parse.ParserFactory), URLFilters (org.apache.nutch.net.URLFilters), URLNormalizers (org.apache.nutch.net.URLNormalizers)
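
The last line of setConf reads the property parser.character.encoding.default and falls back to "windows-1252" when it is not set. The following is a minimal sketch of that configure-with-fallback pattern only, not the Nutch implementation: it assumes java.util.Properties as a stand-in for the Hadoop Configuration object so that it runs without Nutch or Hadoop on the classpath, and the class name EncodingConfigSketch and its getters are hypothetical.

```java
import java.util.Properties;

// Hypothetical stand-in for a configurable parser; not part of Nutch.
class EncodingConfigSketch {
    // Property key and fallback value as they appear in setConf above.
    static final String ENCODING_KEY = "parser.character.encoding.default";
    static final String ENCODING_FALLBACK = "windows-1252";

    private Properties conf;
    private String defaultEncoding;

    // Same shape as setConf above: store the configuration, then derive
    // dependent fields from it. Properties replaces Hadoop's Configuration
    // here (an assumption made to keep the sketch self-contained).
    public void setConf(Properties conf) {
        this.conf = conf;
        // getProperty(key, default) returns the default when the key is absent.
        this.defaultEncoding = conf.getProperty(ENCODING_KEY, ENCODING_FALLBACK);
    }

    public Properties getConf() {
        return conf;
    }

    public String getDefaultEncoding() {
        return defaultEncoding;
    }

    public static void main(String[] args) {
        EncodingConfigSketch parser = new EncodingConfigSketch();

        // No property set: the fallback encoding is used.
        parser.setConf(new Properties());
        System.out.println(parser.getDefaultEncoding());

        // Property set explicitly: the configured value wins.
        Properties custom = new Properties();
        custom.setProperty(ENCODING_KEY, "utf-8");
        parser.setConf(custom);
        System.out.println(parser.getDefaultEncoding());
    }
}
```

Running main prints windows-1252 and then utf-8. In the original method the same configuration object is also passed to ParserFactory, URLNormalizers and URLFilters, so every collaborator is rebuilt whenever setConf is called.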

Aggregations

URLFilters (org.apache.nutch.net.URLFilters): 1
URLNormalizers (org.apache.nutch.net.URLNormalizers): 1
ParserFactory (org.apache.nutch.parse.ParserFactory): 1