
Example 6 with CSVParser

Use of com.Ostermiller.util.CSVParser in project eol-globi-data by jhpoelen.

From class StudyImporterForMetaTableIT, method importAll.

@Test
public void importAll() throws IOException, StudyImporterException {
    final List<Map<String, String>> links = new ArrayList<Map<String, String>>();
    final InteractionListener interactionListener = properties -> links.add(properties);
    final StudyImporterForMetaTable.TableParserFactory tableFactory = (config, dataset) -> {
        // in-memory stand-in for the remote table: a header row followed by nine data rows
        String firstFewLines = "intertype,obstype,effunit,effort,obsunit,obsquant,germnotes,\"REPLACE(Interaction.notes, ',', ';')\",AnimalNumber,AnimalClass,AnimalOrder,AnimalFamily,AnimalGenus,AnimalSpecies,AnimalSubSpecies,AnimalType,AnimalCommonName,PlantNumber,PlantFamily,PlantGenus,PlantSpecies,PlantSubSpecies,country,region,ProvinceDistrictCity,ProtectedArea,HabitatWhite,HabitatAuthor,author,title,year,journal,volume,number,pages,USER,DEF_timestamp,,,\n"
                + "seed disperser,direct observation,months,4,dung density,,,Article focused on elephant density per habitat type based on seed/plant types identified in dung at the various research locations. All identified plant types are being assumed to be dispersed by the elephants,1441,Mammalia,Proboscidea,Elephantidae,Loxodonta,africana,,NULL,African Bush Elephant,4035,Poaceae,Cynodon,dactylon,NULL,Mozambique,NULL,NULL,yes,forest transitions and mosaics,mangroves dune grass plains forest woodland riverine,\"De Boer, W.F. and Ntumi, C.P. and Correia, A.U. and Mafuca, J.M.\",Diet and distribution of elephant in the Maputo Elephant Reserve; Mozambique,2000,African Journal of Ecology,38,3,188-201,Mary,0000-00-00 00:00:00,,,\n"
                + "seed disperser,direct observation,months,4,dung density,,,Article focused on elephant density per habitat type based on seed/plant types identified in dung at the various research locations. All identified plant types are being assumed to be dispersed by the elephants,1441,Mammalia,Proboscidea,Elephantidae,Loxodonta,africana,,NULL,African Bush Elephant,3639,Poaceae,Aristida,canescens,NULL,Mozambique,NULL,NULL,yes,forest transitions and mosaics,mangroves dune grass plains forest woodland riverine,\"De Boer, W.F. and Ntumi, C.P. and Correia, A.U. and Mafuca, J.M.\",Diet and distribution of elephant in the Maputo Elephant Reserve; Mozambique,2000,African Journal of Ecology,38,3,188-201,Mary,0000-00-00 00:00:00,,,\n"
                + "seed disperser,direct observation,months,4,dung density,,,Article focused on elephant density per habitat type based on seed/plant types identified in dung at the various research locations. All identified plant types are being assumed to be dispersed by the elephants,1441,Mammalia,Proboscidea,Elephantidae,Loxodonta,africana,,NULL,African Bush Elephant,3574,Poaceae,Andropogon,eucomus,NULL,Mozambique,NULL,NULL,yes,forest transitions and mosaics,mangroves dune grass plains forest woodland riverine,\"De Boer, W.F. and Ntumi, C.P. and Correia, A.U. and Mafuca, J.M.\",Diet and distribution of elephant in the Maputo Elephant Reserve; Mozambique,2000,African Journal of Ecology,38,3,188-201,Mary,0000-00-00 00:00:00,,,\n"
                + "seed disperser,direct observation,months,4,dung density,,,Article focused on elephant density per habitat type based on seed/plant types identified in dung at the various research locations. All identified plant types are being assumed to be dispersed by the elephants,1441,Mammalia,Proboscidea,Elephantidae,Loxodonta,africana,,NULL,African Bush Elephant,5125,Phyllanthaceae,Phyllanthus,reticulatus,NULL,Mozambique,NULL,NULL,yes,forest transitions and mosaics,mangroves dune grass plains forest woodland riverine,\"De Boer, W.F. and Ntumi, C.P. and Correia, A.U. and Mafuca, J.M.\",Diet and distribution of elephant in the Maputo Elephant Reserve; Mozambique,2000,African Journal of Ecology,38,3,188-201,Mary,0000-00-00 00:00:00,,,\n"
                + "seed disperser,direct observation,months,4,dung density,,,Article focused on elephant density per habitat type based on seed/plant types identified in dung at the various research locations. All identified plant types are being assumed to be dispersed by the elephants,1441,Mammalia,Proboscidea,Elephantidae,Loxodonta,africana,,NULL,African Bush Elephant,399,Myrtaceae,Syzygium,cordatum,,Mozambique,NULL,NULL,yes,forest transitions and mosaics,mangroves dune grass plains forest woodland riverine,\"De Boer, W.F. and Ntumi, C.P. and Correia, A.U. and Mafuca, J.M.\",Diet and distribution of elephant in the Maputo Elephant Reserve; Mozambique,2000,African Journal of Ecology,38,3,188-201,Mary,0000-00-00 00:00:00,,,\n"
                + "seed disperser,direct observation,months,4,dung density,,,Article focused on elephant density per habitat type based on seed/plant types identified in dung at the various research locations. All identified plant types are being assumed to be dispersed by the elephants,1441,Mammalia,Proboscidea,Elephantidae,Loxodonta,africana,,NULL,African Bush Elephant,374,Moraceae,Ficus,sycomorus,,Mozambique,NULL,NULL,yes,forest transitions and mosaics,mangroves dune grass plains forest woodland riverine,\"De Boer, W.F. and Ntumi, C.P. and Correia, A.U. and Mafuca, J.M.\",Diet and distribution of elephant in the Maputo Elephant Reserve; Mozambique,2000,African Journal of Ecology,38,3,188-201,Mary,0000-00-00 00:00:00,,,\n"
                + "seed disperser,direct observation,months,4,dung density,,,Article focused on elephant density per habitat type based on seed/plant types identified in dung at the various research locations. All identified plant types are being assumed to be dispersed by the elephants,1441,Mammalia,Proboscidea,Elephantidae,Loxodonta,africana,,NULL,African Bush Elephant,4398,Moraceae,Ficus,sp,NULL,Mozambique,NULL,NULL,yes,forest transitions and mosaics,mangroves dune grass plains forest woodland riverine,\"De Boer, W.F. and Ntumi, C.P. and Correia, A.U. and Mafuca, J.M.\",Diet and distribution of elephant in the Maputo Elephant Reserve; Mozambique,2000,African Journal of Ecology,38,3,188-201,Mary,0000-00-00 00:00:00,,,\n"
                + "seed disperser,direct observation,years,4,NULL,NULL,NULL,NULL,3051,Animal,Animal,Animal,Animal,animal,NULL,general animal,NULL,4176,Caesalpinioideae,Distemonanthus,benthamianus,NULL,Cameroon,NULL,NULL,yes,NULL,semideciduous tropical rain forest,\"Hardesty, B.D. and Parker, V.T.\",Community seed rain patterns and a comparison to adult community structure in a West African tropical forest,2003,Plant Ecology,164,1,49-64,Mary,8/15/12 9:35,,,\n"
                + "ingestion,direct observation,years,2,NULL,NULL,NULL,during both summer and winter season,1462,Mammalia,Artiodactyla,Bovidae,Madoqua,kirkii,,NULL,Kirk's Dikdik,6897,Moraceae,Ficus,petersii,NULL,Namibia,South West Africa,NULL,yes,NULL,riverine thicket,\"Tinley, K.\",Dikdik; Madoqua kirkii; in south-west Africa: notes on distribution; ecology; and behaviour,1969,Madoqua,1,NULL,Jul-33,Anna,2/24/14 18:40,,,\n";
        return new LabeledCSVParser(new CSVParser(IOUtils.toInputStream(firstFewLines)));
    };
    final String baseUrl = "https://raw.githubusercontent.com/globalbioticinteractions/AfricaTreeDatabase/master";
    final String resource = baseUrl + "/globi.json";
    importAll(interactionListener, tableFactory, baseUrl, resource);
    assertThat(links.size(), is(9));
}
Also used : URL(java.net.URL) Assert.assertNotNull(org.junit.Assert.assertNotNull) DatasetImpl(org.eol.globi.service.DatasetImpl) Test(org.junit.Test) IOException(java.io.IOException) JsonNode(org.codehaus.jackson.JsonNode) StringContains.containsString(org.junit.internal.matchers.StringContains.containsString) CSVParser(com.Ostermiller.util.CSVParser) ArrayList(java.util.ArrayList) Assert.assertThat(org.junit.Assert.assertThat) IOUtils(org.apache.commons.io.IOUtils) List(java.util.List) ResourceUtil(org.eol.globi.util.ResourceUtil) Assert(junit.framework.Assert) Map(java.util.Map) LabeledCSVParser(com.Ostermiller.util.LabeledCSVParser) Dataset(org.eol.globi.service.Dataset) Is.is(org.hamcrest.core.Is.is) URI(java.net.URI) StringStartsWith.startsWith(org.hamcrest.core.StringStartsWith.startsWith) ObjectMapper(org.codehaus.jackson.map.ObjectMapper) CoreMatchers.nullValue(org.hamcrest.CoreMatchers.nullValue) InputStream(java.io.InputStream)
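Example 6 swaps the remote table for an in-memory CSV string and hands it to LabeledCSVParser, which treats the first row as labels. Note that the quoted author field contains commas. The following is a minimal plain-Java sketch of that labeling idea, not Ostermiller's implementation: the class LabeledCsvSketch and its methods are hypothetical, and it assumes no field contains an embedded newline.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a minimal stand-in for a labeled CSV parser, for illustration only.
final class LabeledCsvSketch {

    // Splits one CSV record into fields. Commas inside double quotes do not split,
    // and a doubled quote ("") inside a quoted field becomes a literal quote.
    static List<String> splitRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < record.length() && record.charAt(i + 1) == '"') {
                    current.append('"');
                    i++; // skip the second quote of the escaped pair
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Uses the first record as labels and maps each following record onto them.
    // Assumes records are separated by '\n' and contain no embedded newlines.
    static List<Map<String, String>> parse(String text) {
        String[] records = text.split("\n");
        List<String> labels = splitRecord(records[0]);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int r = 1; r < records.length; r++) {
            if (records[r].isEmpty()) {
                continue;
            }
            List<String> values = splitRecord(records[r]);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < labels.size() && i < values.size(); i++) {
                row.put(labels.get(i), values.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "intertype,author,year\n"
                + "seed disperser,\"De Boer, W.F. and Ntumi, C.P.\",2000\n";
        List<Map<String, String>> rows = parse(csv);
        System.out.println(rows.size());               // 1
        System.out.println(rows.get(0).get("author")); // De Boer, W.F. and Ntumi, C.P.
    }
}
```

The header in Example 6 ends with empty labels (`,,,`); in this sketch they all collapse onto the key "" and later values overwrite earlier ones.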

Example 7 with CSVParser

Use of com.Ostermiller.util.CSVParser in project eol-globi-data by jhpoelen.

From class StudyImporterForMetaTableIT, method importREEMWithStaticCSV.

@Test
public void importREEMWithStaticCSV() throws IOException, StudyImporterException {
    final List<Map<String, String>> links = new ArrayList<Map<String, String>>();
    final InteractionListener interactionListener = properties -> links.add(properties);
    final StudyImporterForMetaTable.TableParserFactory tableFactory = (config, dataset) -> {
        String firstFewLines = "Hauljoin,\" Pred_nodc\",\" Pred_specn\",\" Prey_nodc\",\" Pred_len\",\" Year\",\" Month\",\" day\",\" region\",\" Pred_name\",\" Prey_Name\",\" Vessel\",\" Cruise\",\" Haul\",\" Rlat\",\" Rlong\",\" Gear_depth\",\" Bottom_depth\",\" Start_hour\",\" Surface_temp\",\" Gear_temp\",\" INPFC_Area\",\" Stationid\",\" Start_date\",\" Prey_sz1\",\" Prey_sex\"\n" + "11012118.0,8791030401.0,5.0,9999999998.0,53.0,1994.0,7.0,11.0,AI,\"Pacific cod Gadus macrocephalus\",\"Rocks \",95.0,199401.0,148.0,51.43,178.81999999999999,222.0,228.0,11.0,0.63,0.41999999999999998,542.0,118-11,\"1994-07-11 00:00:00\",3.0,\n" + "11012118.0,8791030401.0,8.0,9999999998.0,53.0,1994.0,7.0,11.0,AI,\"Pacific cod Gadus macrocephalus\",\"Rocks \",95.0,199401.0,148.0,51.43,178.81999999999999,222.0,228.0,11.0,0.63,0.41999999999999998,542.0,118-11,\"1994-07-11 00:00:00\",3.0,\n" + "11012118.0,8791030401.0,9.0,9999999998.0,58.0,1994.0,7.0,11.0,AI,\"Pacific cod Gadus macrocephalus\",\"Rocks \",95.0,199401.0,148.0,51.43,178.81999999999999,222.0,228.0,11.0,0.63,0.41999999999999998,542.0,118-11,\"1994-07-11 00:00:00\",13.0,\n" + "11012118.0,8791030401.0,9.0,9999999998.0,58.0,1994.0,7.0,11.0,AI,\"Pacific cod Gadus macrocephalus\",\"Rocks \",95.0,199401.0,148.0,51.43,178.81999999999999,222.0,228.0,11.0,0.63,0.41999999999999998,542.0,118-11,\"1994-07-11 00:00:00\",3.0,\n";
        return new LabeledCSVParser(new CSVParser(IOUtils.toInputStream(firstFewLines)));
    };
    final String baseUrl = "https://raw.githubusercontent.com/globalbioticinteractions/noaa-reem/master";
    final String resource = baseUrl + "/globi.json";
    importAll(interactionListener, tableFactory, baseUrl, resource);
    assertThat(links.size(), is(12));
    final Map<String, String> firstLine = links.get(0);
    assertThat(firstLine.get(StudyImporterForTSV.INTERACTION_TYPE_ID), is("http://purl.obolibrary.org/obo/RO_0002470"));
    assertThat(firstLine.get(StudyImporterForTSV.INTERACTION_TYPE_NAME), is("eats"));
    assertThat(firstLine.get(StudyImporterForTSV.TARGET_TAXON_ID), is(nullValue()));
    assertThat(firstLine.get(StudyImporterForTSV.TARGET_TAXON_NAME), is("Rocks"));
    assertThat(firstLine.get(StudyImporterForTSV.SOURCE_TAXON_ID), is("NODC:8791030401"));
    assertThat(firstLine.get(StudyImporterForTSV.SOURCE_TAXON_NAME), is("Pacific cod Gadus macrocephalus"));
    assertThat(firstLine.get(StudyImporterForMetaTable.EVENT_DATE), startsWith("1994-07-11"));
    assertThat(firstLine.get(StudyImporterForMetaTable.LATITUDE), is("51.43"));
    assertThat(firstLine.get(StudyImporterForMetaTable.LONGITUDE), is("178.81999999999999"));
}
Also used : URL(java.net.URL) Assert.assertNotNull(org.junit.Assert.assertNotNull) DatasetImpl(org.eol.globi.service.DatasetImpl) Test(org.junit.Test) IOException(java.io.IOException) JsonNode(org.codehaus.jackson.JsonNode) StringContains.containsString(org.junit.internal.matchers.StringContains.containsString) CSVParser(com.Ostermiller.util.CSVParser) ArrayList(java.util.ArrayList) Assert.assertThat(org.junit.Assert.assertThat) IOUtils(org.apache.commons.io.IOUtils) List(java.util.List) ResourceUtil(org.eol.globi.util.ResourceUtil) Assert(junit.framework.Assert) Map(java.util.Map) LabeledCSVParser(com.Ostermiller.util.LabeledCSVParser) Dataset(org.eol.globi.service.Dataset) Is.is(org.hamcrest.core.Is.is) URI(java.net.URI) StringStartsWith.startsWith(org.hamcrest.core.StringStartsWith.startsWith) ObjectMapper(org.codehaus.jackson.map.ObjectMapper) CoreMatchers.nullValue(org.hamcrest.CoreMatchers.nullValue) InputStream(java.io.InputStream)
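In Example 7 the header labels carry a leading space (`" Pred_nodc"`) and the prey name has a trailing space (`"Rocks "`), while the assertions expect `Rocks` and `NODC:8791030401`. Those expectations suggest that labels and values are trimmed and that the float-formatted code `8791030401.0` is turned into a prefixed identifier somewhere in the import. The sketch below shows one plausible way to do that in plain Java; ReemNormalizeSketch, trimAll and toNodcId are hypothetical names, and this is not necessarily how eol-globi-data does it.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical normalization helpers; not taken from eol-globi-data.
final class ReemNormalizeSketch {

    // Trims every label and value, so " Pred_nodc" becomes "Pred_nodc" and "Rocks " becomes "Rocks".
    static Map<String, String> trimAll(Map<String, String> row) {
        Map<String, String> trimmed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            String value = e.getValue() == null ? null : e.getValue().trim();
            trimmed.put(e.getKey().trim(), value);
        }
        return trimmed;
    }

    // Turns a float-formatted code such as "8791030401.0" into "NODC:8791030401".
    // Assumes the value is a whole number; toBigIntegerExact throws if it has a
    // non-zero fractional part. BigDecimal avoids double rounding on long codes.
    static String toNodcId(String raw) {
        String digits = new BigDecimal(raw.trim()).toBigIntegerExact().toString();
        return "NODC:" + digits;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put(" Pred_nodc", "8791030401.0");
        row.put(" Prey_Name", "Rocks ");
        Map<String, String> clean = trimAll(row);
        System.out.println(clean.get("Prey_Name"));           // Rocks
        System.out.println(toNodcId(clean.get("Pred_nodc"))); // NODC:8791030401
    }
}
```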

Example 8 with CSVParser

Use of com.Ostermiller.util.CSVParser in project eol-globi-data by jhpoelen.

From class CSVTSVUtil, method createTSVParser.

public static CSVParser createTSVParser(Reader reader) {
    final CSVParser parser = new CSVParser(reader);
    parser.changeDelimiter('\t');
    return parser;
}
Also used : CSVParser(com.Ostermiller.util.CSVParser) ExcelCSVParser(com.Ostermiller.util.ExcelCSVParser) LabeledCSVParser(com.Ostermiller.util.LabeledCSVParser)
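createTSVParser reuses CSVParser and only changes the delimiter to a tab, so its quoting rules presumably still apply to tab-separated input. For comparison, a naive plain-Java split has no quote handling at all, and `String.split` drops trailing empty fields unless it is given a negative limit. The class TsvSplitSketch and the sample line are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of CSVTSVUtil.
final class TsvSplitSketch {

    // Naive tab split with no quote handling. The -1 limit keeps trailing empty
    // fields, which String.split("\t") would silently drop.
    static String[] splitTsv(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        String line = "10.1000/xyz123\tSome citation\t";
        System.out.println(Arrays.toString(line.split("\t"))); // [10.1000/xyz123, Some citation]
        System.out.println(Arrays.toString(splitTsv(line)));  // [10.1000/xyz123, Some citation, ]
    }
}
```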

Example 9 with CSVParser

Use of com.Ostermiller.util.CSVParser in project eol-globi-data by jhpoelen.

From class DOIResolverCache, method init.

void init(final Reader reader) throws PropertyEnricherException, IOException {
    DB db = initDb("doiCache");
    StopWatch watch = new StopWatch();
    watch.start();
    final CSVParser parser = CSVTSVUtil.createTSVParser(reader);
    LOG.info("doi cache building...");
    doiCitationMap = db.createTreeMap("doiCache").pumpPresort(300000).pumpIgnoreDuplicates().pumpSource(new Iterator<Fun.Tuple2<String, String>>() {

        private String[] line;

        String getCitation(String[] line) {
            return line[1];
        }

        String getDOI(String[] line) {
            return line[0];
        }

        // Note: each call reads a new row from the parser, so hasNext() must be
        // called exactly once before each next().
        @Override
        public boolean hasNext() {
            try {
                // Skip rows that are too short or lack a DOI or citation,
                // instead of ending the iteration at the first short row.
                do {
                    line = parser.getLine();
                } while (line != null && !isComplete(line));
                boolean hasNext = line != null;
                if (!hasNext) {
                    LOG.info("[no more]");
                }
                return hasNext;
            } catch (IOException e) {
                LOG.error("problem reading", e);
                return false;
            }
        }

        private boolean isComplete(String[] row) {
            return row.length > 1 && StringUtils.isNoneBlank(getCitation(row), getDOI(row));
        }

        @Override
        public Fun.Tuple2<String, String> next() {
            // key: citation, value: DOI (both already checked by isComplete)
            return new Fun.Tuple2<>(getCitation(line), getDOI(line));
        }
    }).make();
    watch.stop();
    LOG.info("doi cache built in [" + watch.getTime() / 1000 + "] s.");
}
Also used : CSVParser(com.Ostermiller.util.CSVParser) IOException(java.io.IOException) DB(org.mapdb.DB) Fun(org.mapdb.Fun) StopWatch(org.apache.commons.lang3.time.StopWatch)
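In DOIResolverCache.init, every call to hasNext() reads a new row from the parser, so the iterator only behaves if each hasNext() is followed by exactly one next(). A common alternative is one-row lookahead: hasNext() fetches and caches the next complete row, and next() hands it out. The sketch below assumes rows arrive as a plain Iterator<String[]> instead of a CSVParser, skips rows that are short or lack a DOI or citation, and uses a TreeMap with putIfAbsent as a stand-in for the MapDB tree map. CompleteRowIterator is a hypothetical name, and keeping the first DOI per citation is an assumption, not a statement about how pumpIgnoreDuplicates resolves duplicates.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical sketch: filters raw rows into (citation, DOI) pairs with one-row lookahead.
final class CompleteRowIterator implements Iterator<Map.Entry<String, String>> {
    private final Iterator<String[]> rows;
    private Map.Entry<String, String> pending; // next valid pair, or null if not fetched yet

    CompleteRowIterator(Iterator<String[]> rows) {
        this.rows = rows;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    @Override
    public boolean hasNext() {
        // Repeated calls do not consume extra rows: a cached pair is reused.
        while (pending == null && rows.hasNext()) {
            String[] line = rows.next();
            // Skip short or incomplete rows instead of stopping at them.
            if (line.length > 1 && !isBlank(line[0]) && !isBlank(line[1])) {
                // key = citation (column 1), value = DOI (column 0), as in init()
                pending = new AbstractMap.SimpleImmutableEntry<>(line[1], line[0]);
            }
        }
        return pending != null;
    }

    @Override
    public Map.Entry<String, String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Map.Entry<String, String> result = pending;
        pending = null;
        return result;
    }

    public static void main(String[] args) {
        List<String[]> input = Arrays.asList(
                new String[]{"10.1/a", "Citation A"},
                new String[]{"only-one-column"},
                new String[]{"", "Citation without DOI"},
                new String[]{"10.1/b", "Citation B"});
        Map<String, String> cache = new TreeMap<>();
        CompleteRowIterator it = new CompleteRowIterator(input.iterator());
        while (it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            cache.putIfAbsent(e.getKey(), e.getValue()); // keep the first DOI seen per citation
        }
        System.out.println(cache); // {Citation A=10.1/a, Citation B=10.1/b}
    }
}
```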

Example 10 with CSVParser

Use of com.Ostermiller.util.CSVParser in project eol-globi-data by jhpoelen.

From class OpenTreeUtil, method readTaxonomy.

public static void readTaxonomy(OpenTreeListener listener, InputStream inputStream) throws IOException {
    LabeledCSVParser parser = CSVTSVUtil.createLabeledCSVParser(new CSVParser(IOUtils.toBufferedInputStream(inputStream), '\t'));
    while (parser.getLine() != null) {
        String taxonId = parser.getValueByLabel("uid");
        String[] externalIds = StringUtils.split(parser.getValueByLabel("sourceinfo"), ",");
        for (String otherTaxonId : externalIds) {
            listener.taxonSameAs(taxonId, otherTaxonId);
        }
    }
}
Also used : LabeledCSVParser(com.Ostermiller.util.LabeledCSVParser) CSVParser(com.Ostermiller.util.CSVParser)
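readTaxonomy reads the uid and sourceinfo columns of each row and reports one taxonSameAs call per comma-separated entry in sourceinfo. A plain-Java sketch of the same flow follows. SameAsSketch and its SameAsListener interface are hypothetical stand-ins (only the taxonSameAs(String, String) call shape comes from the code above), and the input is a simplified, made-up tab-separated table rather than the actual Open Tree taxonomy file format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: emits (uid, external id) pairs from a simplified tab-separated taxonomy.
final class SameAsSketch {

    // Local stand-in mirroring the taxonSameAs(String, String) call in readTaxonomy.
    interface SameAsListener {
        void taxonSameAs(String taxonId, String otherTaxonId);
    }

    static void read(BufferedReader reader, SameAsListener listener) throws IOException {
        String header = reader.readLine();
        if (header == null) {
            return;
        }
        // Look up columns by label, as LabeledCSVParser.getValueByLabel does in the original.
        List<String> labels = Arrays.asList(header.split("\t", -1));
        int uidCol = labels.indexOf("uid");
        int sourceCol = labels.indexOf("sourceinfo");
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split("\t", -1);
            if (uidCol < 0 || sourceCol < 0 || fields.length <= Math.max(uidCol, sourceCol)) {
                continue; // missing columns: nothing to link
            }
            for (String other : fields[sourceCol].split(",")) {
                if (!other.isEmpty()) { // an empty sourceinfo yields no calls
                    listener.taxonSameAs(fields[uidCol], other);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String tsv = "uid\tname\tsourceinfo\n"
                + "1\tExample taxon\tncbi:1,gbif:2\n"
                + "2\tNo sources\t\n";
        List<String> pairs = new ArrayList<>();
        read(new BufferedReader(new StringReader(tsv)),
                (id, other) -> pairs.add(id + "=" + other));
        System.out.println(pairs); // [1=ncbi:1, 1=gbif:2]
    }
}
```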

Aggregations

CSVParser (com.Ostermiller.util.CSVParser)14 LabeledCSVParser (com.Ostermiller.util.LabeledCSVParser)12 Test (org.junit.Test)10 StringReader (java.io.StringReader)5 IOException (java.io.IOException)4 InputStream (java.io.InputStream)4 HashMap (java.util.HashMap)4 URI (java.net.URI)3 URL (java.net.URL)3 ArrayList (java.util.ArrayList)3 List (java.util.List)3 Map (java.util.Map)3 Assert (junit.framework.Assert)3 IOUtils (org.apache.commons.io.IOUtils)3 JsonNode (org.codehaus.jackson.JsonNode)3 ObjectMapper (org.codehaus.jackson.map.ObjectMapper)3 Dataset (org.eol.globi.service.Dataset)3 DatasetImpl (org.eol.globi.service.DatasetImpl)3 ResourceUtil (org.eol.globi.util.ResourceUtil)3 CoreMatchers.nullValue (org.hamcrest.CoreMatchers.nullValue)3