Search in sources:

Example 1 with DocumentStream

Use of org.ojai.DocumentStream in the Apache Drill project.

From the class TestEncodedFieldPaths, method setup_TestEncodedFieldPaths.

@BeforeClass
// One-time suite setup: creates the test table (with a column-family mapping
// for "codes"), builds a secondary index, loads a JSON fixture into the table,
// and waits for the index to sync before any test runs.
public static void setup_TestEncodedFieldPaths() throws Exception {
    try (Table table = DBTests.createOrReplaceTable(TABLE_NAME, ImmutableMap.of("codes", "codes"))) {
        tableCreated = true;
        tablePath = table.getPath().toUri().getPath();
        // Index keyed on "age", with "name.last" and "data.salary" as included (covered) fields.
        DBTests.createIndex(TABLE_NAME, INDEX_NAME, new String[] { "age" }, new String[] { "name.last", "data.salary" });
        DBTests.admin().getTableIndexes(table.getPath(), true);
        // Stream the JSON fixture from the classpath into the table; both the
        // raw stream and the document stream are closed by try-with-resources.
        try (final InputStream in = TestEncodedFieldPaths.class.getResourceAsStream(JSON_FILE_URL);
            final DocumentStream stream = Json.newDocumentStream(in)) {
            table.insertOrReplace(stream);
            table.flush();
        }
        // wait for the indexes to sync
        DBTests.waitForRowCount(table.getPath(), 5, INDEX_FLUSH_TIMEOUT);
        DBTests.waitForIndexFlush(table.getPath(), INDEX_FLUSH_TIMEOUT);
    } finally {
        // Always disable full table scans so the tests below must use the index,
        // even if table setup failed part-way through.
        test("ALTER SESSION SET `planner.disable_full_table_scan` = true");
    }
}
Also used : Table(com.mapr.db.Table) InputStream(java.io.InputStream) DocumentStream(org.ojai.DocumentStream) BeforeClass(org.junit.BeforeClass)

Example 2 with DocumentStream

Use of org.ojai.DocumentStream in the Apache Drill project.

From the class TestScanRanges, method setup_TestSimpleJson.

@BeforeClass
// One-time setup: replicates the JSON fixture documents many times over
// (20 key prefixes x 5000 rows x fixture size) so the table is large enough
// to exercise scan-range planning, then caps the per-node parallelism.
public static void setup_TestSimpleJson() throws Exception {
    // Without intra-tablet partitioning, this test should run with only one minor fragment
    try (Table table = DBTests.createOrReplaceTable(TABLE_NAME, false);
        InputStream jsonInput = MaprDBTestsSuite.getJsonStream(JSON_FILE_URL);
        DocumentStream documents = Json.newDocumentStream(jsonInput)) {
        tableCreated = true;
        tablePath = table.getPath().toUri().getPath();
        // Materialize the fixture once; each document is reused (re-keyed and
        // re-stamped with "documentId") for every generated row.
        final List<Document> docs = Lists.newArrayList(documents);
        final int docCount = docs.size();
        for (char prefix = 'A'; prefix <= 'T'; prefix++) {
            for (int row = 0; row < 5000; row++) {
                for (int docIdx = 0; docIdx < docCount; docIdx++) {
                    final Document doc = docs.get(docIdx);
                    // Row key: prefix letter + zero-padded row number + doc index.
                    final String rowKey = String.format("%c%010d%03d", prefix, row, docIdx);
                    doc.set("documentId", row);
                    table.insertOrReplace(rowKey, doc);
                }
            }
        }
        table.flush();
        DBTests.waitForRowCount(table.getPath(), TOTAL_ROW_COUNT);
        setSessionOption("planner.width.max_per_node", 5);
    }
}
Also used : Table(com.mapr.db.Table) InputStream(java.io.InputStream) Document(org.ojai.Document) DocumentStream(org.ojai.DocumentStream) BeforeClass(org.junit.BeforeClass)

Example 3 with DocumentStream

Use of org.ojai.DocumentStream in the Apache Drill project.

From the class LargeTableGen, method generateTableWithIndex.

/**
 * Generates {@code recordNumber} synthetic person/contact JSON documents,
 * creates (or reuses) the target table with the secondary indexes described
 * by {@code indexDef}, and loads the documents in batches of 2000.
 *
 * @param tablePath    path of the table to create or reuse
 * @param recordNumber total number of documents to generate
 * @param indexDef     index definitions handed to {@link #createIndex}
 * @throws Exception if table creation, insertion, or index flush fails
 */
public void generateTableWithIndex(String tablePath, int recordNumber, String[] indexDef) throws Exception {
    initRandVector(recordNumber);
    initDictionary();
    DBTests.setTableStatsSendInterval(1);
    if (admin.tableExists(tablePath)) {
        // NOTE(review): deliberately a no-op — deletion is disabled and the
        // existing table is reused by createOrGetTable() below; confirm the
        // tableExists() probe itself is still wanted before removing this.
        // admin.deleteTable(tablePath);
    }
    final int BATCH_SIZE = 2000;
    try (Table table = createOrGetTable(tablePath, recordNumber)) {
        // Create the indexes before loading so they are populated as rows arrive.
        createIndex(table, indexDef);
        for (int batch = 0; batch < recordNumber; batch += BATCH_SIZE) {
            final int batchStop = Math.min(recordNumber, batch + BATCH_SIZE);
            // StringBuilder, not StringBuffer: the buffer is method-local, so
            // StringBuffer's synchronization is pure overhead.
            final StringBuilder strBuf = new StringBuilder();
            for (int i = batch; i < batchStop; ++i) {
                strBuf.append(String.format("{\"rowid\": \"%d\", \"reverseid\": \"%d\", \"id\": {\"ssn\": \"%s\"}, \"contact\": {\"phone\": \"%s\", \"email\": \"%s\"}," + "\"address\": {\"city\": \"%s\", \"state\": \"%s\"}, \"name\": { \"fname\": \"%s\", \"lname\": \"%s\" }," + "\"personal\": {\"age\" : %s, \"income\": %s, \"birthdate\": {\"$dateDay\": \"%s\"} }," + "\"activity\": {\"irs\" : { \"firstlogin\":  \"%s\" } }," + "\"driverlicense\":{\"$numberLong\": %s} } \n", i + 1, recordNumber - i, getSSN(i), getPhone(i), getEmail(i), getAddress(i)[2], getAddress(i)[1], getFirstName(i), getLastName(i), getAge(i), getIncome(i), getBirthdate(i), getFirstLogin(i), getSSN(i)));
            }
            // NOTE(review): StringBufferInputStream is deprecated because it drops
            // the high byte of each char; safe here only because the generated JSON
            // is ASCII. A ByteArrayInputStream over UTF-8 bytes is the modern form.
            try (InputStream in = new StringBufferInputStream(strBuf.toString());
                DocumentStream stream = Json.newDocumentStream(in)) {
                try {
                    // insert a batch of documents in stream, keyed by the "rowid" field
                    table.insert(stream, "rowid");
                } catch (Exception e) {
                    System.out.println(stream.toString());
                    throw e;
                }
            }
        }
        table.flush();
        DBTests.waitForIndexFlush(table.getPath(), INDEX_FLUSH_TIMEOUT);
        // NOTE(review): fixed 200 s sleep — presumably waiting for index sync;
        // prefer polling (e.g. a waitForRowCount-style check) over sleeping.
        Thread.sleep(200000);
    }
}
Also used : Table(com.mapr.db.Table) StringBufferInputStream(java.io.StringBufferInputStream) StringBufferInputStream(java.io.StringBufferInputStream) InputStream(java.io.InputStream) DocumentStream(org.ojai.DocumentStream)

Example 4 with DocumentStream

Use of org.ojai.DocumentStream in the Apache Drill project.

From the class TestSimpleJson, method setup_TestSimpleJson.

@BeforeClass
// One-time setup: creates the test table and loads every document from the
// JSON fixture, keyed by each document's "business_id" field.
public static void setup_TestSimpleJson() throws Exception {
    try (Table table = DBTests.createOrReplaceTable(TABLE_NAME);
        InputStream jsonInput = MaprDBTestsSuite.getJsonStream(JSON_FILE_URL);
        DocumentStream documents = Json.newDocumentStream(jsonInput)) {
        tableCreated = true;
        tablePath = table.getPath().toUri().getPath();
        // Insert one row per fixture document; the stream is closed by
        // try-with-resources once the load completes.
        for (final Document doc : documents) {
            table.insert(doc, "business_id");
        }
        table.flush();
    }
}
Also used : Table(com.mapr.db.Table) InputStream(java.io.InputStream) Document(org.ojai.Document) DocumentStream(org.ojai.DocumentStream) BeforeClass(org.junit.BeforeClass)

Example 5 with DocumentStream

Use of org.ojai.DocumentStream in the Apache Drill project.

From the class RestrictedJsonRecordReader, method readToInitSchema.

// Reads a single document from the table to seed the writer's schema, then
// rewinds the writer position so no actual row is emitted to the caller.
public void readToInitSchema() {
    DBDocumentReaderBase reader = null;
    vectorWriter.setPosition(0);
    try (DocumentStream dstream = table.find()) {
        // NOTE(review): iterator().next() throws NoSuchElementException if the
        // table is empty — presumably callers guarantee at least one row; confirm.
        reader = (DBDocumentReaderBase) dstream.iterator().next().asReader();
        documentWriter.writeDBDocument(vectorWriter, reader);
    } catch (UserException e) {
        // Re-wrap with table/document context so the failing row is identifiable;
        // reader may still be null if find()/next() failed before assignment.
        throw UserException.unsupportedError(e).addContext(String.format("Table: %s, document id: '%s'", getTable().getPath(), reader == null ? null : IdCodec.asString(reader.getId()))).build(logger);
    } catch (SchemaChangeException e) {
        if (getIgnoreSchemaChange()) {
            // Tolerated: drop the probe row and keep the previously known schema.
            logger.warn("{}. Dropping the row from result.", e.getMessage());
            logger.debug("Stack trace:", e);
        } else {
            throw dataReadError(logger, e);
        }
    } finally {
        // Rewind regardless of outcome: this method only initializes schema,
        // it must not leave a partially written row in the vector.
        vectorWriter.setPosition(0);
    }
}
Also used : SchemaChangeException(org.apache.drill.exec.exception.SchemaChangeException) DBDocumentReaderBase(com.mapr.db.ojai.DBDocumentReaderBase) UserException(org.apache.drill.common.exceptions.UserException) DocumentStream(org.ojai.DocumentStream)

Aggregations

DocumentStream (org.ojai.DocumentStream)6 Table (com.mapr.db.Table)4 InputStream (java.io.InputStream)4 BeforeClass (org.junit.BeforeClass)3 Document (org.ojai.Document)3 DBDocumentReaderBase (com.mapr.db.ojai.DBDocumentReaderBase)1 StringBufferInputStream (java.io.StringBufferInputStream)1 UserException (org.apache.drill.common.exceptions.UserException)1 SchemaChangeException (org.apache.drill.exec.exception.SchemaChangeException)1 DocumentStore (org.ojai.store.DocumentStore)1 Query (org.ojai.store.Query)1 QueryCondition (org.ojai.store.QueryCondition)1