Example 6 with URLModel

Use of com.kyj.fx.voeditor.visual.framework.URLModel in project Gargoyle by callakrsos.

The getString method of class TF_IDF.

public void getString(Collection<String> links) {
    // Fetch every link in parallel; a failed or non-200 request yields an empty model.
    URLModel[] array = links.parallelStream().map(link -> {
        URLModel model = URLModel.empty();
        try {
            ResponseHandler<URLModel> responseHandler = new ResponseHandler<URLModel>() {

                @Override
                public URLModel apply(InputStream is, Integer code) {
                    if (code == 200) {
                        return new URLModel(link, ValueUtil.toString(is));
                    }
                    return URLModel.empty();
                }
            };
            if (link.startsWith("https")) {
                model = RequestUtil.requestSSL(new URL(link), responseHandler);
            } else {
                model = RequestUtil.request(new URL(link), responseHandler);
            }
        } catch (Exception e) {
            return URLModel.empty();
        }
        return model;
    }).filter(v -> !v.isEmpty()).map(v -> {
        // Strip page boilerplate with boilerpipe so that only the article text remains.
        String content = v.getContent();
        ExtractorBase instance = ArticleExtractor.getInstance();
        InputSource source = new InputSource(new StringReader(content));
        source.setEncoding("UTF-8");
        try {
            content = ValueUtil.HTML.getNewsContent(instance, source);
            v.setContent(content);
        } catch (Exception e) {
            v = URLModel.empty();
            e.printStackTrace();
        }
        return v;
    }).filter(v -> !v.isEmpty()).toArray(URLModel[]::new);
    // Score the extracted articles with TF-IDF and print each result.
    List<KeyValue> tf_IDF = ValueUtil.toTF_IDF(array);
    tf_IDF.forEach(v -> {
        System.out.println(v.toString());
    });
}
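
The snippet above delegates the scoring to ValueUtil.toTF_IDF, whose implementation is not shown here. As a rough illustration of what a TF-IDF step computes, the following is a minimal sketch and not Gargoyle's actual code. The class TfIdfSketch, its methods, the regex tokenizer, and the weighting tf * ln(N / df) are all assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Gargoyle.
final class TfIdfSketch {

    /** Lower-cases the text and splits it on runs of characters that are neither letters nor digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns one map per document: term -> (count / document length) * ln(N / document frequency). */
    static List<Map<String, Double>> score(List<String> documents) {
        List<List<String>> tokenized = new ArrayList<>();
        Map<String, Integer> documentFrequency = new LinkedHashMap<>();

        // First pass: tokenize and count in how many documents each term occurs.
        for (String doc : documents) {
            List<String> tokens = tokenize(doc);
            tokenized.add(tokens);
            for (String term : new HashSet<>(tokens)) {
                documentFrequency.merge(term, 1, Integer::sum);
            }
        }

        // Second pass: combine per-document term frequency with inverse document frequency.
        int n = documents.size();
        List<Map<String, Double>> result = new ArrayList<>();
        for (List<String> tokens : tokenized) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (String term : tokens) {
                counts.merge(term, 1, Integer::sum);
            }
            Map<String, Double> scores = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                double tf = (double) e.getValue() / tokens.size();
                double idf = Math.log((double) n / documentFrequency.get(e.getKey()));
                scores.put(e.getKey(), tf * idf);
            }
            result.add(scores);
        }
        return result;
    }
}
```

Calling TfIdfSketch.score with the extracted article texts returns one score map per article. With this weighting, a term that occurs in every document scores 0, so only terms that distinguish an article from the others rank highly.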
Also used : URL(java.net.URL) RequestUtil(com.kyj.fx.voeditor.visual.util.RequestUtil) LoggerFactory(org.slf4j.LoggerFactory) HashMap(java.util.HashMap) BoilerpipeSAXInput(com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput) KeyValue(com.kyj.fx.voeditor.visual.framework.KeyValue) ExtractorBase(com.kohlschutter.boilerpipe.extractors.ExtractorBase) URLModel(com.kyj.fx.voeditor.visual.framework.URLModel) Before(org.junit.Before) InputSource(org.xml.sax.InputSource) ProxyInitializable(com.kyj.fx.voeditor.visual.main.initalize.ProxyInitializable) Logger(org.slf4j.Logger) ResponseHandler(com.kyj.fx.voeditor.visual.util.ResponseHandler) MalformedURLException(java.net.MalformedURLException) Collection(java.util.Collection) Set(java.util.Set) IOException(java.io.IOException) Test(org.junit.Test) ValueUtil(com.kyj.fx.voeditor.visual.util.ValueUtil) ArticleSentencesExtractor(com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor) Collectors(java.util.stream.Collectors) List(java.util.List) KeepEverythingExtractor(com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor) StringReader(java.io.StringReader) Document(org.jsoup.nodes.Document) Jsoup(org.jsoup.Jsoup) Elements(org.jsoup.select.Elements) Collections(java.util.Collections) TextDocument(com.kohlschutter.boilerpipe.document.TextDocument) InputStream(java.io.InputStream) ArticleExtractor(com.kohlschutter.boilerpipe.extractors.ArticleExtractor)

Aggregations

URLModel (com.kyj.fx.voeditor.visual.framework.URLModel): 6
IOException (java.io.IOException): 4
MalformedURLException (java.net.MalformedURLException): 4
URL (java.net.URL): 4
ExtractorBase (com.kohlschutter.boilerpipe.extractors.ExtractorBase): 3
TextDocument (com.kohlschutter.boilerpipe.document.TextDocument): 2
ArticleExtractor (com.kohlschutter.boilerpipe.extractors.ArticleExtractor): 2
KeepEverythingExtractor (com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor): 2
BoilerpipeSAXInput (com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput): 2
KeyValue (com.kyj.fx.voeditor.visual.framework.KeyValue): 2
RealtimeSearchItemVO (com.kyj.fx.voeditor.visual.framework.RealtimeSearchItemVO): 2
RequestUtil (com.kyj.fx.voeditor.visual.util.RequestUtil): 2
ResponseHandler (com.kyj.fx.voeditor.visual.util.ResponseHandler): 2
ValueUtil (com.kyj.fx.voeditor.visual.util.ValueUtil): 2
InputStream (java.io.InputStream): 2
StringReader (java.io.StringReader): 2
Collection (java.util.Collection): 2
Collections (java.util.Collections): 2
List (java.util.List): 2
Set (java.util.Set): 2