
Example 21 with Node

Use of org.jsoup.nodes.Node in project structr by structr.

The class MicroformatParser, method extractChildContent.

private Object extractChildContent(final Element element) {
    final List<String> parts = new LinkedList<>();
    element.traverse(new NodeVisitor() {

        @Override
        public void head(Node node, int depth) {
            if (node instanceof Element) {
                // Named "child" so it does not shadow the enclosing method's "element" parameter.
                final Element child = (Element) node;
                final Set<String> classes = child.classNames();
                removeEmpty(classes);
                if (classes.isEmpty()) {
                    parts.add(child.ownText());
                }
            }
        }

        @Override
        public void tail(Node node, int depth) {
        }
    });
    if (parts.isEmpty()) {
        final String ownText = element.ownText();
        if (StringUtils.isNotBlank(ownText)) {
            parts.add(ownText);
        }
    }
    if (parts.isEmpty()) {
        return null;
    }
    if (parts.size() == 1) {
        return parts.get(0);
    }
    return parts;
}
Also used : Set(java.util.Set) LinkedHashSet(java.util.LinkedHashSet) Node(org.jsoup.nodes.Node) Element(org.jsoup.nodes.Element) LinkedList(java.util.LinkedList) NodeVisitor(org.jsoup.select.NodeVisitor)

Example 22 with Node

Use of org.jsoup.nodes.Node in project substitution-schedule-parser by vertretungsplanme.

The class SVPlanParser, method parseSVPlanSchedule.

@NotNull
SubstitutionSchedule parseSVPlanSchedule(List<Document> docs) throws IOException, JSONException {
    SubstitutionSchedule v = SubstitutionSchedule.fromData(scheduleData);
    for (Document doc : docs) {
        if (doc.select(".svp").size() > 0) {
            for (Element svp : doc.select(".svp")) {
                parseSvPlanDay(v, svp, doc);
            }
        } else if (doc.select(".Trennlinie").size() > 0) {
            Element div = new Element(Tag.valueOf("div"), "");
            for (Node node : doc.body().childNodesCopy()) {
                if (node instanceof Element && ((Element) node).hasClass("Trennlinie") && div.select("table").size() > 0) {
                    parseSvPlanDay(v, div, doc);
                    div = new Element(Tag.valueOf("div"), "");
                } else {
                    div.appendChild(node);
                }
            }
            parseSvPlanDay(v, div, doc);
        } else {
            parseSvPlanDay(v, doc, doc);
        }
    }
    v.setClasses(getAllClasses());
    v.setTeachers(getAllTeachers());
    return v;
}
Also used : SubstitutionSchedule(me.vertretungsplan.objects.SubstitutionSchedule) Element(org.jsoup.nodes.Element) TextNode(org.jsoup.nodes.TextNode) Node(org.jsoup.nodes.Node) Document(org.jsoup.nodes.Document) NotNull(org.jetbrains.annotations.NotNull)

Example 23 with Node

Use of org.jsoup.nodes.Node in project opacclient by opacapp.

The class Heidi, method parse_reservations.

protected List<ReservedItem> parse_reservations(String html) {
    Document doc = Jsoup.parse(html);
    List<ReservedItem> reservations = new ArrayList<>();
    DateTimeFormatter fmt = DateTimeFormat.forPattern("dd.MM.yyyy").withLocale(Locale.GERMAN);
    for (Element tr : doc.select("table.kontopos tr")) {
        ReservedItem item = new ReservedItem();
        Element desc = tr.child(1).select("label").first();
        Element pos = tr.child(3);
        if (tr.child(1).select("a").size() > 0) {
            String kk = getQueryParamsFirst(tr.child(1).select("a").first().absUrl("href")).get("katkey");
            item.setId(kk);
        }
        if (tr.child(0).select("input").size() > 0) {
            item.setCancelData(tr.child(0).select("input").first().val());
        }
        int i = 0;
        for (Node node : desc.childNodes()) {
            if (node instanceof TextNode) {
                String text = ((TextNode) node).text().trim();
                if (i == 0) {
                    item.setAuthor(text);
                } else if (i == 1) {
                    item.setTitle(text);
                }
                i++;
            }
        }
        i = 0;
        for (Node node : pos.childNodes()) {
            if (node instanceof TextNode) {
                String text = ((TextNode) node).text().trim();
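                // NOTE: contains("") is always true, so only "i == 0" is effective here;
                // the intended search string may have been lost from this listing.
                if (i == 0 && text.contains("")) {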
                    try {
                        item.setReadyDate(fmt.parseLocalDate(text));
                    } catch (IllegalArgumentException e) {
                        item.setStatus(text);
                    }
                } else if (i == 1) {
                    item.setBranch(text);
                }
                i++;
            }
        }
        reservations.add(item);
    }
    return reservations;
}
Also used : Element(org.jsoup.nodes.Element) TextNode(org.jsoup.nodes.TextNode) Node(org.jsoup.nodes.Node) ArrayList(java.util.ArrayList) ReservedItem(de.geeksfactory.opacclient.objects.ReservedItem) Document(org.jsoup.nodes.Document) DateTimeFormatter(org.joda.time.format.DateTimeFormatter)

Example 24 with Node

Use of org.jsoup.nodes.Node in project opacclient by opacapp.

The class Heidi, method account.

@Override
public AccountData account(Account account) throws IOException, JSONException, OpacErrorException {
    login(account);
    String html;
    Document doc;
    AccountData adata = new AccountData(account.getId());
    DateTimeFormatter fmt = DateTimeFormat.forPattern("dd.MM.yyyy").withLocale(Locale.GERMAN);
    html = httpGet(opac_url + "/konto.cgi?sess=" + sessid, getDefaultEncoding());
    doc = Jsoup.parse(html);
    doc.setBaseUri(opac_url + "/");
    for (Element td : doc.select("table.konto td")) {
        if (td.text().contains("Offene")) {
            String text = td.text().trim().replaceAll("Offene[^0-9]+Geb.+hren:[^0-9]+([0-9.,]+)[^0-9€A-Z]*(€|EUR|CHF|Fr.)", "$1 $2");
            adata.setPendingFees(text);
        }
    }
    List<LentItem> lent = new ArrayList<>();
    for (Element tr : doc.select("table.kontopos tr")) {
        LentItem item = new LentItem();
        Element desc = tr.child(1).select("label").first();
        String dates = tr.child(2).text().trim();
        if (tr.child(1).select("a").size() > 0) {
            String kk = getQueryParamsFirst(tr.child(1).select("a").first().absUrl("href")).get("katkey");
            item.setId(kk);
        }
        int i = 0;
        for (Node node : desc.childNodes()) {
            if (node instanceof TextNode) {
                String text = ((TextNode) node).text().trim();
                if (i == 0) {
                    item.setAuthor(text);
                } else if (i == 1) {
                    item.setTitle(text);
                } else if (text.contains("Mediennummer")) {
                    item.setBarcode(text.replace("Mediennummer: ", ""));
                }
                i++;
            }
        }
        if (tr.child(0).select("input").size() == 1) {
            item.setProlongData(tr.child(0).select("input").first().val());
            item.setRenewable(true);
        } else {
            item.setProlongData("§" + tr.child(0).select("span").first().attr("class"));
            item.setRenewable(false);
        }
        String todate = dates;
        if (todate.contains("-")) {
            String[] datesplit = todate.split("-");
            todate = datesplit[1].trim();
        }
        try {
            item.setDeadline(fmt.parseLocalDate(todate.substring(0, 10)));
        } catch (IllegalArgumentException e) {
            e.printStackTrace();
        }
        lent.add(item);
    }
    adata.setLent(lent);
    List<ReservedItem> reservations = new ArrayList<>();
    html = httpGet(opac_url + "/konto.cgi?konto=v&sess=" + sessid, getDefaultEncoding());
    reservations.addAll(parse_reservations(html));
    html = httpGet(opac_url + "/konto.cgi?konto=b&sess=" + sessid, getDefaultEncoding());
    reservations.addAll(parse_reservations(html));
    adata.setReservations(reservations);
    return adata;
}
Also used : Element(org.jsoup.nodes.Element) TextNode(org.jsoup.nodes.TextNode) Node(org.jsoup.nodes.Node) ArrayList(java.util.ArrayList) Document(org.jsoup.nodes.Document) AccountData(de.geeksfactory.opacclient.objects.AccountData) ReservedItem(de.geeksfactory.opacclient.objects.ReservedItem) LentItem(de.geeksfactory.opacclient.objects.LentItem) DateTimeFormatter(org.joda.time.format.DateTimeFormatter)
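
The deadline handling in the account method takes the text of the date cell, keeps the part after a "-" if the cell holds a range, and parses the first ten characters as dd.MM.yyyy. The following is a minimal standalone sketch of that step, not the project's code. It assumes java.time in place of Joda-Time, and the class name DeadlineSketch and the method parseDeadline are hypothetical. Unlike the original, it returns an empty Optional for strings shorter than ten characters instead of letting substring throw, and it keeps everything after the first "-" rather than the second element of a split.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper class, for illustration only.
class DeadlineSketch {
    // Same pattern as in the example above, but using java.time (an assumption).
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("dd.MM.yyyy");

    /** Extracts the end date from "dd.MM.yyyy" or "dd.MM.yyyy - dd.MM.yyyy". */
    static Optional<LocalDate> parseDeadline(String dates) {
        String toDate = dates.trim();
        int dash = toDate.indexOf('-');
        if (dash >= 0) {
            // A range: keep what follows the first dash.
            toDate = toDate.substring(dash + 1).trim();
        }
        if (toDate.length() < 10) {
            // Too short to hold a full date; avoid StringIndexOutOfBoundsException.
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(toDate.substring(0, 10), FMT));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDeadline("01.02.2020 - 15.02.2020"));
        System.out.println(parseDeadline("n/a"));
    }
}
```

Returning an Optional leaves it to the caller to decide what to do with an unparseable cell, where the original prints a stack trace and leaves the deadline unset.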

Example 25 with Node

Use of org.jsoup.nodes.Node in project opacclient by opacapp.

The class VuFind, method parseCopies.

static void parseCopies(DetailedItem res, Document doc, JSONObject data) throws JSONException {
    if ("doublestacked".equals(data.optString("copystyle"))) {
        // e.g. http://vopac.nlg.gr/Record/393668/Holdings#tabnav
        // for Athens_GreekNationalLibrary
        Element container = doc.select(".tab-container").first();
        String branch = "";
        for (Element child : container.children()) {
            if (child.tagName().equals("h5")) {
                branch = child.text();
            } else if (child.tagName().equals("table")) {
                int i = 0;
                String callNumber = "";
                for (Element row : child.select("tr")) {
                    if (i == 0) {
                        callNumber = row.child(1).text();
                    } else {
                        Copy copy = new Copy();
                        copy.setBranch(branch);
                        copy.setShelfmark(callNumber);
                        copy.setBarcode(row.child(0).text());
                        copy.setStatus(row.child(1).text());
                        res.addCopy(copy);
                    }
                    i++;
                }
            }
        }
    } else if ("stackedtable".equals(data.optString("copystyle"))) {
        // e.g. http://search.lib.auth.gr/Record/376356
        // or https://katalog.ub.uni-leipzig.de/Record/0000196115
        // or https://www.stadt-muenster.de/opac2/Record/0367968
        Element container = doc.select(".recordsubcontent, .tab-container").first();
        // .tab-container is used in Muenster.
        String branch = "";
        JSONObject copytable = data.getJSONObject("copytable");
        for (Element child : container.children()) {
            if (child.tagName().equals("div")) {
                child = child.child(0);
            }
            if (child.tagName().equals("h3")) {
                branch = child.text();
            } else if (child.tagName().equals("table")) {
                if (child.select("caption").size() > 0) {
                    // Leipzig_Uni
                    branch = child.select("caption").first().ownText();
                }
                int i = 0;
                String callNumber = null;
                if ("headrow".equals(copytable.optString("signature"))) {
                    callNumber = child.select("tr").get(0).child(1).text();
                }
                for (Element row : child.select("tr")) {
                    if (i < copytable.optInt("_offset", 0)) {
                        i++;
                        continue;
                    }
                    Copy copy = new Copy();
                    if (callNumber != null) {
                        copy.setShelfmark(callNumber);
                    }
                    copy.setBranch(branch);
                    Iterator<?> keys = copytable.keys();
                    while (keys.hasNext()) {
                        String key = (String) keys.next();
                        if (key.startsWith("_"))
                            continue;
                        if (copytable.optString(key, "").contains("/")) {
                            // Leipzig_Uni
                            String[] splitted = copytable.getString(key).split("/");
                            int col = Integer.parseInt(splitted[0]);
                            int line = Integer.parseInt(splitted[1]);
                            int j = 0;
                            for (Node node : row.child(col).childNodes()) {
                                if (node instanceof Element) {
                                    if (((Element) node).tagName().equals("br")) {
                                        j++;
                                    } else if (j == line) {
                                        copy.set(key, ((Element) node).text());
                                    }
                                } else if (node instanceof TextNode && j == line && !((TextNode) node).text().trim().equals("")) {
                                    copy.set(key, ((TextNode) node).text());
                                }
                            }
                        } else {
                            // Thessaloniki_University
                            if (copytable.optInt(key, -1) == -1)
                                continue;
                            String value = row.child(copytable.getInt(key)).text();
                            copy.set(key, value);
                        }
                    }
                    res.addCopy(copy);
                    i++;
                }
            }
        }
    }
}
Also used : JSONObject(org.json.JSONObject) Copy(de.geeksfactory.opacclient.objects.Copy) Element(org.jsoup.nodes.Element) TextNode(org.jsoup.nodes.TextNode) Node(org.jsoup.nodes.Node)
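
The "column/line" branch of parseCopies (the Leipzig_Uni case) reads a spec such as "2/1", walks the child nodes of the given cell, counts br elements to track the current line, and keeps the text found on the requested line. The sketch below models only that counting logic with plain strings so that it runs without jsoup. The class CellLineSketch, the BR marker and both method names are hypothetical stand-ins, and blank entries are skipped for every child, whereas the original applies the blank check to text nodes only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the br-counting loop; strings stand in for jsoup nodes.
class CellLineSketch {
    /** Stand-in for a br element among a cell's child nodes (an assumption). */
    static final String BR = "<br>";

    /** Splits a "column/line" spec such as "2/1" into {column, line}. */
    static int[] parseSpec(String spec) {
        String[] parts = spec.split("/");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    /** Returns the last non-blank text on the given line, or null if there is none. */
    static String textOnLine(List<String> children, int line) {
        int current = 0;
        String result = null;
        for (String child : children) {
            if (BR.equals(child)) {
                current++; // each br starts a new line
            } else if (current == line && !child.trim().isEmpty()) {
                result = child; // later matches overwrite earlier ones, as in the loop above
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> cell = Arrays.asList("Main library", BR, "Shelf A 12", BR, "available");
        int[] spec = parseSpec("0/1");
        System.out.println(textOnLine(cell, spec[1])); // prints "Shelf A 12"
    }
}
```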

Aggregations

Node (org.jsoup.nodes.Node): 55
Element (org.jsoup.nodes.Element): 39
TextNode (org.jsoup.nodes.TextNode): 39
Document (org.jsoup.nodes.Document): 19
ArrayList (java.util.ArrayList): 17
Elements (org.jsoup.select.Elements): 11
IOException (java.io.IOException): 7
HashMap (java.util.HashMap): 6
Copy (de.geeksfactory.opacclient.objects.Copy): 5
DetailedItem (de.geeksfactory.opacclient.objects.DetailedItem): 5
NameValuePair (org.apache.http.NameValuePair): 5
BasicNameValuePair (org.apache.http.message.BasicNameValuePair): 5
DateTimeFormatter (org.joda.time.format.DateTimeFormatter): 5
JSONException (org.json.JSONException): 5
NotReachableException (de.geeksfactory.opacclient.networking.NotReachableException): 4
Detail (de.geeksfactory.opacclient.objects.Detail): 4
UnsupportedEncodingException (java.io.UnsupportedEncodingException): 4
URI (java.net.URI): 4
Matcher (java.util.regex.Matcher): 4
URISyntaxException (java.net.URISyntaxException): 3