
Example 6 with ResponseData

Use of org.codelibs.fess.crawler.entity.ResponseData in project fess by codelibs.

Class FessXpathTransformerTest, method test_canonicalXpath.

public void test_canonicalXpath() throws Exception {
    final FessXpathTransformer transformer = new FessXpathTransformer();
    transformer.init();
    final Map<String, Object> dataMap = new HashMap<>();
    final ResponseData responseData = new ResponseData();
    responseData.setUrl("http://example.com/");
    String data = "<html><body>aaa</body></html>";
    Document document = getDocument(data);
    try {
        transformer.putAdditionalData(dataMap, responseData, document);
        fail();
    } catch (final ComponentNotFoundException e) {
        // expected: no canonical link, so no ChildUrlsException is raised; ignore
    }
    data = "<html><head><link rel=\"canonical\" href=\"http://example.com/\"></head><body>aaa</body></html>";
    document = getDocument(data);
    try {
        transformer.putAdditionalData(dataMap, responseData, document);
        fail();
    } catch (final ComponentNotFoundException e) {
        // expected: the canonical URL equals the page URL, so no ChildUrlsException is raised; ignore
    }
    data = "<html><head><link rel=\"canonical\" href=\"http://example.com/foo\"></head><body>aaa</body></html>";
    document = getDocument(data);
    try {
        transformer.putAdditionalData(dataMap, responseData, document);
        fail();
    } catch (final ChildUrlsException e) {
        final Set<RequestData> childUrlList = e.getChildUrlList();
        assertEquals(1, childUrlList.size());
        assertEquals("http://example.com/foo", childUrlList.iterator().next().getUrl());
    }
    data = "<html><link rel=\"canonical\" href=\"http://example.com/foo\"><body>aaa</body></html>";
    document = getDocument(data);
    try {
        transformer.putAdditionalData(dataMap, responseData, document);
        fail();
    } catch (final ChildUrlsException e) {
        final Set<RequestData> childUrlList = e.getChildUrlList();
        assertEquals(1, childUrlList.size());
        assertEquals("http://example.com/foo", childUrlList.iterator().next().getUrl());
    }
}
Also used: ChildUrlsException (org.codelibs.fess.crawler.exception.ChildUrlsException), Set (java.util.Set), ComponentNotFoundException (org.lastaflute.di.core.exception.ComponentNotFoundException), HashMap (java.util.HashMap), ResponseData (org.codelibs.fess.crawler.entity.ResponseData), Document (org.w3c.dom.Document)
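
The assertions above suggest a rule: a canonical link that differs from the page URL is reported as a child URL, while a missing or identical one is not. The following is a minimal sketch of that rule using only the JDK's DOM and XPath APIs. It is not the FessXpathTransformer implementation; the class name CanonicalSketch and the method canonicalRedirect are hypothetical, and the input is assumed to be well-formed XHTML (note the self-closed link element), because the JDK parser rejects unclosed tags.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of fess.
class CanonicalSketch {

    // Returns the canonical URL only when it is present and differs from pageUrl.
    static Optional<String> canonicalRedirect(String xhtml, String pageUrl) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // With XPathConstants.STRING the result is the value of the first match,
            // or an empty string when nothing matches.
            String href = (String) XPathFactory.newInstance().newXPath()
                    .evaluate("//link[@rel='canonical']/@href", document, XPathConstants.STRING);
            if (href == null || href.isEmpty() || href.equals(pageUrl)) {
                return Optional.empty();
            }
            return Optional.of(href);
        } catch (Exception e) {
            // Assumption for this sketch: treat unparsable input as a caller error.
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page = "http://example.com/";
        String html = "<html><head><link rel=\"canonical\" href=\"http://example.com/foo\"/></head>"
                + "<body>aaa</body></html>";
        System.out.println(canonicalRedirect(html, page));
    }
}
```

Returning an Optional keeps the sketch free of project-specific exceptions; in the test above the same outcome is signalled by a ChildUrlsException carrying the canonical URL.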

Example 7 with ResponseData

Use of org.codelibs.fess.crawler.entity.ResponseData in project fess by codelibs.

Class DocumentHelperTest, method test_getContent_maxSymbol.

public void test_getContent_maxSymbol() {
    DocumentHelper documentHelper = new DocumentHelper() {
        // Cap each run of consecutive symbol characters at two for this test.
        @Override
        protected int getMaxSymbolTermSize() {
            return 2;
        }
    };
    ResponseData responseData = new ResponseData();
    Map<String, Object> dataMap = new HashMap<>();
    assertEquals("", documentHelper.getContent(responseData, null, dataMap));
    assertEquals("", documentHelper.getContent(responseData, "", dataMap));
    assertEquals("", documentHelper.getContent(responseData, " ", dataMap));
    assertEquals("", documentHelper.getContent(responseData, "  ", dataMap));
    assertEquals("", documentHelper.getContent(responseData, "\t", dataMap));
    assertEquals("", documentHelper.getContent(responseData, "\t\t", dataMap));
    assertEquals("", documentHelper.getContent(responseData, "\t \t", dataMap));
    assertEquals("123 abc", documentHelper.getContent(responseData, " 123 abc ", dataMap));
    assertEquals("123 あいう", documentHelper.getContent(responseData, " 123 あいう ", dataMap));
    assertEquals("123 abc", documentHelper.getContent(responseData, " 123\nabc ", dataMap));
    assertEquals("123abc", documentHelper.getContent(responseData, " 123abc ", dataMap));
    assertEquals("!!", documentHelper.getContent(responseData, "!!!", dataMap));
    assertEquals("//", documentHelper.getContent(responseData, "///", dataMap));
    assertEquals("::", documentHelper.getContent(responseData, ":::", dataMap));
    assertEquals("@@", documentHelper.getContent(responseData, "@@@", dataMap));
    assertEquals("[[", documentHelper.getContent(responseData, "[[[", dataMap));
    assertEquals("``", documentHelper.getContent(responseData, "```", dataMap));
    assertEquals("{{", documentHelper.getContent(responseData, "{{{", dataMap));
    assertEquals("~~", documentHelper.getContent(responseData, "~~~", dataMap));
    assertEquals("!\"", documentHelper.getContent(responseData, "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~", dataMap));
}
Also used: HashMap (java.util.HashMap), ResponseData (org.codelibs.fess.crawler.entity.ResponseData)
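
Read together, the assertions above describe three behaviours: null or blank input yields an empty string, whitespace is trimmed and collapsed to single spaces, and runs of consecutive symbols are cut to the configured maximum. The sketch below reproduces those observable results in plain Java. It is an assumption-laden illustration, not the DocumentHelper implementation: the class ContentNormalizeSketch, the method normalize, and the choice to treat only ASCII punctuation as symbols are all hypothetical.

```java
// Hypothetical illustration; not part of fess.
class ContentNormalizeSketch {

    // Assumption: "symbol" means ASCII punctuation (0x21-0x2F, 0x3A-0x40, 0x5B-0x60, 0x7B-0x7E).
    static boolean isAsciiSymbol(int cp) {
        return (cp >= 0x21 && cp <= 0x2F) || (cp >= 0x3A && cp <= 0x40)
                || (cp >= 0x5B && cp <= 0x60) || (cp >= 0x7B && cp <= 0x7E);
    }

    // Trims, collapses whitespace, and keeps at most maxSymbolRun symbols per run.
    // A negative maxSymbolRun disables the symbol limit.
    static String normalize(String content, int maxSymbolRun) {
        if (content == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        int symbolRun = 0;
        boolean pendingSpace = false;
        for (int i = 0; i < content.length();) {
            int cp = content.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                // Remember a single separator, but never emit a leading one.
                pendingSpace = sb.length() > 0;
                symbolRun = 0;
                continue;
            }
            if (isAsciiSymbol(cp)) {
                symbolRun++;
                if (maxSymbolRun >= 0 && symbolRun > maxSymbolRun) {
                    continue; // drop symbols beyond the limit
                }
            } else {
                symbolRun = 0;
            }
            if (pendingSpace) {
                sb.append(' ');
                pendingSpace = false;
            }
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize(" 123\nabc ", 2));
        System.out.println(normalize("!!!", 2));
    }
}
```

Deferring the separator until the next kept character is what drops trailing whitespace without a second pass over the output.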

Example 8 with ResponseData

Use of org.codelibs.fess.crawler.entity.ResponseData in project fess by codelibs.

Class DocumentHelperTest, method test_getContent.

public void test_getContent() {
    DocumentHelper documentHelper = new DocumentHelper();
    ResponseData responseData = new ResponseData();
    Map<String, Object> dataMap = new HashMap<>();
    assertEquals("", documentHelper.getContent(responseData, null, dataMap));
    assertEquals("", documentHelper.getContent(responseData, "", dataMap));
    assertEquals("", documentHelper.getContent(responseData, " ", dataMap));
    assertEquals("", documentHelper.getContent(responseData, "  ", dataMap));
    assertEquals("", documentHelper.getContent(responseData, "\t", dataMap));
    assertEquals("", documentHelper.getContent(responseData, "\t\t", dataMap));
    assertEquals("", documentHelper.getContent(responseData, "\t \t", dataMap));
    assertEquals("123 abc", documentHelper.getContent(responseData, " 123 abc ", dataMap));
    assertEquals("123 あいう", documentHelper.getContent(responseData, " 123 あいう ", dataMap));
    assertEquals("123 abc", documentHelper.getContent(responseData, " 123\nabc ", dataMap));
}
Also used: HashMap (java.util.HashMap), ResponseData (org.codelibs.fess.crawler.entity.ResponseData)

Example 9 with ResponseData

Use of org.codelibs.fess.crawler.entity.ResponseData in project fess by codelibs.

Class FessXpathTransformerTest, method assertGetThumbnailUrl.

private void assertGetThumbnailUrl(String data, String expected) throws Exception {
    final Document document = getDocument(data);
    final FessXpathTransformer transformer = new FessXpathTransformer();
    transformer.init();
    final ResponseData responseData = new ResponseData();
    responseData.setUrl("http://example.com/");
    assertEquals(expected, transformer.getThumbnailUrl(responseData, document));
}
Also used: ResponseData (org.codelibs.fess.crawler.entity.ResponseData), Document (org.w3c.dom.Document)

Example 10 with ResponseData

Use of org.codelibs.fess.crawler.entity.ResponseData in project fess by codelibs.

Class FessXpathTransformerTest, method test_processMetaRobots_no.

public void test_processMetaRobots_no() throws Exception {
    final String data = "<html><body>foo</body></html>";
    final Document document = getDocument(data);
    final FessXpathTransformer transformer = new FessXpathTransformer();
    final ResponseData responseData = new ResponseData();
    responseData.setUrl("http://example.com/");
    transformer.processMetaRobots(responseData, new ResultData(), document);
    assertFalse(responseData.isNoFollow());
}
Also used: ResultData (org.codelibs.fess.crawler.entity.ResultData), ResponseData (org.codelibs.fess.crawler.entity.ResponseData), Document (org.w3c.dom.Document)

Aggregations

ResponseData (org.codelibs.fess.crawler.entity.ResponseData): 19
ResultData (org.codelibs.fess.crawler.entity.ResultData): 10
HashMap (java.util.HashMap): 8
Document (org.w3c.dom.Document): 8
ChildUrlsException (org.codelibs.fess.crawler.exception.ChildUrlsException): 7
Map (java.util.Map): 5
ArrayList (java.util.ArrayList): 4
Date (java.util.Date): 4
HashSet (java.util.HashSet): 4
List (java.util.List): 4
Set (java.util.Set): 4
StreamUtil.stream (org.codelibs.core.stream.StreamUtil.stream): 4
CrawlerSystemException (org.codelibs.fess.crawler.exception.CrawlerSystemException): 4
FessConfig (org.codelibs.fess.mylasta.direction.FessConfig): 4
ComponentUtil (org.codelibs.fess.util.ComponentUtil): 4
Logger (org.slf4j.Logger): 4
LoggerFactory (org.slf4j.LoggerFactory): 4
SerializeUtil (org.codelibs.core.io.SerializeUtil): 3
StringUtil (org.codelibs.core.lang.StringUtil): 3
ComponentNotFoundException (org.lastaflute.di.core.exception.ComponentNotFoundException): 3