Example 1 with ResponseHandler

Use of com.kyj.fx.voeditor.visual.util.ResponseHandler in the project Gargoyle by callakrsos.

From the class TF_IDF, method findAllLinks.

@Test
public void findAllLinks() throws Exception {
    URL url = new URL("https://search.naver.com/search.naver?where=nexearch&query=%ED%91%9C%EC%B0%BD%EC%9B%90&sm=top_hty&fbm=1&ie=utf8");
    ResponseHandler<Set<String>> responseHandler = new ResponseHandler<Set<String>>() {

        @Override
        public Set<String> apply(InputStream is, Integer code) {
            Set<String> collect = Collections.emptySet();
            try {
                Document parse = Jsoup.parse(is, "UTF-8", "http");
                // Extract only the <a> tags.
                Elements elementsByTag = parse.getElementsByTag("a");
                collect = elementsByTag.stream()
                    .filter(e -> e.hasAttr("href"))
                    .map(e -> e.attr("href").trim())
                    // startsWith("http") already covers "https" links
                    .filter(e -> e.startsWith("http"))
                    .filter(v -> {
                    if ("https://submit.naver.com/".equals(v))
                        return false;
                    else if ("http://www.naver.com".equals(v))
                        return false;
                    else if (v.startsWith("https://nid.naver.com"))
                        return false;
                    else if (v.startsWith("http://searchad.naver.com"))
                        return false;
                    else if (v.contains("namu.wiki"))
                        return false;
                    else if (v.contains("wikipedia.org"))
                        return false;
                    else if (v.startsWith("http://music.naver.com"))
                        return false;
                    else if (v.startsWith("http://m.post.naver.com"))
                        return false;
                    else if (v.startsWith("http://tvcast.naver.com"))
                        return false;
                    else if (v.startsWith("http://shopping.naver.com"))
                        return false;
                    else if (v.startsWith("https://help.naver"))
                        return false;
                    else if (v.startsWith("http://www.navercorp.com"))
                        return false;
                    else if (v.startsWith("http://book.naver.com"))
                        return false;
                    else if (v.startsWith("http://www.cwpyo.com"))
                        return false;
                    else if (v.startsWith("http://navercast.naver.com"))
                        return false;
                    return true;
                }).collect(Collectors.toSet());
            } catch (IOException e) {
                e.printStackTrace();
            }
            return collect;
        }
    };
    Set<String> links = RequestUtil.requestSSL(url, responseHandler);
    // links.forEach(System.out::println);
    getString(links);
}
Also used: URL(java.net.URL) RequestUtil(com.kyj.fx.voeditor.visual.util.RequestUtil) LoggerFactory(org.slf4j.LoggerFactory) HashMap(java.util.HashMap) BoilerpipeSAXInput(com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput) KeyValue(com.kyj.fx.voeditor.visual.framework.KeyValue) ExtractorBase(com.kohlschutter.boilerpipe.extractors.ExtractorBase) URLModel(com.kyj.fx.voeditor.visual.framework.URLModel) Before(org.junit.Before) InputSource(org.xml.sax.InputSource) ProxyInitializable(com.kyj.fx.voeditor.visual.main.initalize.ProxyInitializable) Logger(org.slf4j.Logger) ResponseHandler(com.kyj.fx.voeditor.visual.util.ResponseHandler) MalformedURLException(java.net.MalformedURLException) Collection(java.util.Collection) Set(java.util.Set) IOException(java.io.IOException) Test(org.junit.Test) ValueUtil(com.kyj.fx.voeditor.visual.util.ValueUtil) ArticleSentencesExtractor(com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor) Collectors(java.util.stream.Collectors) List(java.util.List) KeepEverythingExtractor(com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor) StringReader(java.io.StringReader) Document(org.jsoup.nodes.Document) Jsoup(org.jsoup.Jsoup) Elements(org.jsoup.select.Elements) Collections(java.util.Collections) TextDocument(com.kohlschutter.boilerpipe.document.TextDocument) InputStream(java.io.InputStream) ArticleExtractor(com.kohlschutter.boilerpipe.extractors.ArticleExtractor)
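
The long if/else chain in findAllLinks could also be written as a data-driven blocklist. The following is only a minimal sketch under stated assumptions: the class name LinkFilter and its members are hypothetical and not part of Gargoyle, the prefix list is abridged, and the result is collected into a LinkedHashSet (instead of Collectors.toSet()) purely to keep encounter order.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Gargoyle project.
final class LinkFilter {
    // Links rejected only on an exact match (values taken from the test above).
    static final Set<String> EXACT = Set.of(
            "https://submit.naver.com/", "http://www.naver.com");
    // Links rejected when they start with one of these prefixes (abridged).
    static final List<String> PREFIXES = List.of(
            "https://nid.naver.com", "http://searchad.naver.com", "http://music.naver.com");
    // Links rejected when they contain one of these fragments.
    static final List<String> FRAGMENTS = List.of("namu.wiki", "wikipedia.org");

    // True for absolute http(s) links that hit none of the three blocklists.
    static boolean isWanted(String link) {
        if (!link.startsWith("http")) return false; // also covers "https"
        if (EXACT.contains(link)) return false;
        if (PREFIXES.stream().anyMatch(link::startsWith)) return false;
        return FRAGMENTS.stream().noneMatch(link::contains);
    }

    // Trims each href and keeps the wanted ones, preserving encounter order.
    static Set<String> filter(Collection<String> hrefs) {
        return hrefs.stream()
                .map(String::trim)
                .filter(LinkFilter::isWanted)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        System.out.println(filter(List.of(
                "http://example.com/a", "https://nid.naver.com/login", "/relative")));
    }
}
```

Adding or removing a blocked site then means editing one of the three lists rather than the control flow.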

Example 2 with ResponseHandler

Use of com.kyj.fx.voeditor.visual.util.ResponseHandler in the project Gargoyle by callakrsos.

From the class TF_IDF, method getString.

public void getString(Collection<String> links) {
    URLModel[] array = links.parallelStream().map(link -> {
        URLModel model = URLModel.empty();
        try {
            ResponseHandler<URLModel> responseHandler = new ResponseHandler<URLModel>() {

                @Override
                public URLModel apply(InputStream is, Integer code) {
                    if (code == 200) {
                        return new URLModel(link, ValueUtil.toString(is));
                    }
                    return URLModel.empty();
                }
            };
            if (link.startsWith("https")) {
                model = RequestUtil.requestSSL(new URL(link), responseHandler);
            } else {
                model = RequestUtil.request(new URL(link), responseHandler);
            }
        } catch (Exception e) {
            return URLModel.empty();
        }
        return model;
    }).filter(v -> !v.isEmpty()).map(v -> {
        String content = v.getContent();
        ExtractorBase instance = ArticleExtractor.getInstance();
        InputSource source = new InputSource(new StringReader(content));
        source.setEncoding("UTF-8");
        try {
            content = ValueUtil.HTML.getNewsContent(instance, source);
            v.setContent(content);
        } catch (Exception e) {
            v = URLModel.empty();
            e.printStackTrace();
        }
        return v;
    }).filter(v -> !v.isEmpty()).toArray(URLModel[]::new);
    List<KeyValue> tf_IDF = ValueUtil.toTF_IDF(array);
    tf_IDF.forEach(v -> {
        System.out.println(v.toString());
    });
}
Also used: URL(java.net.URL) RequestUtil(com.kyj.fx.voeditor.visual.util.RequestUtil) LoggerFactory(org.slf4j.LoggerFactory) HashMap(java.util.HashMap) BoilerpipeSAXInput(com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput) KeyValue(com.kyj.fx.voeditor.visual.framework.KeyValue) ExtractorBase(com.kohlschutter.boilerpipe.extractors.ExtractorBase) URLModel(com.kyj.fx.voeditor.visual.framework.URLModel) Before(org.junit.Before) InputSource(org.xml.sax.InputSource) ProxyInitializable(com.kyj.fx.voeditor.visual.main.initalize.ProxyInitializable) Logger(org.slf4j.Logger) ResponseHandler(com.kyj.fx.voeditor.visual.util.ResponseHandler) MalformedURLException(java.net.MalformedURLException) Collection(java.util.Collection) Set(java.util.Set) IOException(java.io.IOException) Test(org.junit.Test) ValueUtil(com.kyj.fx.voeditor.visual.util.ValueUtil) ArticleSentencesExtractor(com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor) Collectors(java.util.stream.Collectors) List(java.util.List) KeepEverythingExtractor(com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor) StringReader(java.io.StringReader) Document(org.jsoup.nodes.Document) Jsoup(org.jsoup.Jsoup) Elements(org.jsoup.select.Elements) Collections(java.util.Collections) TextDocument(com.kohlschutter.boilerpipe.document.TextDocument) InputStream(java.io.InputStream) ArticleExtractor(com.kohlschutter.boilerpipe.extractors.ArticleExtractor)
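
Both examples implement ResponseHandler through an apply(InputStream, Integer) method, so the handler appears to behave like a function from (response body, status code) to a result. The sketch below illustrates that pattern with the JDK only. It is an assumption-laden stand-in, not the Gargoyle API: the Handler interface, the HandlerSketch class and the bodyOr method are hypothetical names, and the network call made by RequestUtil is not modeled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.BiFunction;

// Hypothetical sketch, not part of the Gargoyle project.
final class HandlerSketch {
    // Assumed shape of a response handler: (body stream, HTTP status) -> result.
    interface Handler<T> extends BiFunction<InputStream, Integer, T> {}

    // Returns the body decoded as UTF-8 on status 200, otherwise the fallback.
    static Handler<String> bodyOr(String fallback) {
        return (is, code) -> {
            if (code == null || code != 200) {
                return fallback; // the body is not read for non-200 responses
            }
            try {
                return new String(is.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    public static void main(String[] args) {
        InputStream body = new ByteArrayInputStream(
                "<html>ok</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(bodyOr("(empty)").apply(body, 404));
        System.out.println(bodyOr("(empty)").apply(body, 200));
    }
}
```

Returning a fallback value for non-200 responses mirrors the URLModel.empty() convention in getString, which lets the caller filter out failed requests without handling exceptions.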

Aggregations

TextDocument (com.kohlschutter.boilerpipe.document.TextDocument) 2
ArticleExtractor (com.kohlschutter.boilerpipe.extractors.ArticleExtractor) 2
ArticleSentencesExtractor (com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor) 2
ExtractorBase (com.kohlschutter.boilerpipe.extractors.ExtractorBase) 2
KeepEverythingExtractor (com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor) 2
BoilerpipeSAXInput (com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput) 2
KeyValue (com.kyj.fx.voeditor.visual.framework.KeyValue) 2
URLModel (com.kyj.fx.voeditor.visual.framework.URLModel) 2
ProxyInitializable (com.kyj.fx.voeditor.visual.main.initalize.ProxyInitializable) 2
RequestUtil (com.kyj.fx.voeditor.visual.util.RequestUtil) 2
ResponseHandler (com.kyj.fx.voeditor.visual.util.ResponseHandler) 2
ValueUtil (com.kyj.fx.voeditor.visual.util.ValueUtil) 2
IOException (java.io.IOException) 2
InputStream (java.io.InputStream) 2
StringReader (java.io.StringReader) 2
MalformedURLException (java.net.MalformedURLException) 2
URL (java.net.URL) 2
Collection (java.util.Collection) 2
Collections (java.util.Collections) 2
HashMap (java.util.HashMap) 2