Example 6 with PageOutput

Use of org.embulk.spi.PageOutput in project embulk by embulk.

In class SamplingParserPlugin, method runFileInputSampling.

public static Buffer runFileInputSampling(final FileInputRunner runner, ConfigSource inputConfig, ConfigSource sampleBufferConfig) {
    final SampleBufferTask sampleBufferTask = sampleBufferConfig.loadConfig(SampleBufferTask.class);
    // override in.parser.type so that FileInputRunner creates SamplingParserPlugin
    ConfigSource samplingInputConfig = inputConfig.deepCopy();
    samplingInputConfig.getNestedOrSetEmpty("parser").set("type", "system_sampling").set("sample_buffer_bytes", sampleBufferTask.getSampleBufferBytes());
    samplingInputConfig.set("decoders", null);
    try {
        runner.transaction(samplingInputConfig, new InputPlugin.Control() {

            @Override
            public List<TaskReport> run(TaskSource taskSource, Schema schema, int taskCount) {
                if (taskCount == 0) {
                    throw new NoSampleException("No input files to read sample data");
                }
                int maxSize = -1;
                int maxSizeTaskIndex = -1;
                for (int taskIndex = 0; taskIndex < taskCount; taskIndex++) {
                    try {
                        runner.run(taskSource, schema, taskIndex, new PageOutput() {

                            @Override
                            public void add(Page page) {
                                // TODO exception class
                                throw new RuntimeException("Input plugin must be a FileInputPlugin to guess parser configuration");
                            }

                            @Override
                            public void finish() {
                            }

                            @Override
                            public void close() {
                            }
                        });
                    } catch (NotEnoughSampleError ex) {
                        // Remember the task that offered the largest (still insufficient) sample.
                        if (maxSize < ex.getSize()) {
                            maxSize = ex.getSize();
                            maxSizeTaskIndex = taskIndex;
                        }
                    }
                }
                if (maxSize <= 0) {
                    throw new NoSampleException("All input files are empty");
                }
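                // Retry the largest task with "force" set; presumably this lets the parser accept an undersized sample.
                taskSource.getNested("ParserTaskSource").set("force", true);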
                try {
                    runner.run(taskSource, schema, maxSizeTaskIndex, new PageOutput() {

                        @Override
                        public void add(Page page) {
                            // TODO exception class
                            throw new RuntimeException("Input plugin must be a FileInputPlugin to guess parser configuration");
                        }

                        @Override
                        public void finish() {
                        }

                        @Override
                        public void close() {
                        }
                    });
                } catch (NotEnoughSampleError ex) {
                    throw new NoSampleException("All input files are smaller than minimum sampling size");
                }
                throw new NoSampleException("All input files are smaller than minimum sampling size");
            }
        });
        throw new AssertionError("SamplingParserPlugin must throw SampledNoticeError");
    } catch (SampledNoticeError error) {
        // The sample is handed back through the exception rather than through a return value.
        return error.getSample();
    }
}
Also used: InputPlugin (org.embulk.spi.InputPlugin), Schema (org.embulk.spi.Schema), Page (org.embulk.spi.Page), ConfigSource (org.embulk.config.ConfigSource), PageOutput (org.embulk.spi.PageOutput), List (java.util.List), TaskSource (org.embulk.config.TaskSource)
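
The control flow above can be hard to follow because the sample leaves the callback through an exception, and because the largest undersized task is retried with "force" set. The following is a minimal, self-contained sketch of that pattern only. It does not use the Embulk API: SamplingSketch, SampledNotice, NotEnoughSample, NoSample, runTask and sample are hypothetical names, and the behavior of runTask (always ending by throwing one of the two signals) is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class SamplingSketch {
    // Hypothetical stand-in for SampledNoticeError: carries the sample out of the callback.
    static final class SampledNotice extends RuntimeException {
        final byte[] sample;
        SampledNotice(byte[] sample) {
            super("sample ready");
            this.sample = sample;
        }
    }

    // Hypothetical stand-in for NotEnoughSampleError: reports how many bytes were available.
    static final class NotEnoughSample extends RuntimeException {
        final int size;
        NotEnoughSample(int size) {
            super("only " + size + " bytes available");
            this.size = size;
        }
    }

    // Hypothetical stand-in for NoSampleException.
    static final class NoSample extends RuntimeException {
        NoSample(String message) {
            super(message);
        }
    }

    // Assumed task behavior: always ends by throwing one of the two signals above.
    static void runTask(byte[] input, int minBytes, boolean force) {
        boolean enough = input.length >= minBytes;
        if (enough || (force && input.length > 0)) {
            throw new SampledNotice(Arrays.copyOf(input, Math.min(input.length, minBytes)));
        }
        throw new NotEnoughSample(input.length);
    }

    public static byte[] sample(List<byte[]> inputs, int minBytes) {
        if (inputs.isEmpty()) {
            throw new NoSample("No input files to read sample data");
        }
        try {
            int maxSize = -1;
            int maxIndex = -1;
            for (int i = 0; i < inputs.size(); i++) {
                try {
                    runTask(inputs.get(i), minBytes, false);
                } catch (NotEnoughSample ex) {
                    // Track the largest input seen so far that was still too small.
                    if (maxSize < ex.size) {
                        maxSize = ex.size;
                        maxIndex = i;
                    }
                }
            }
            if (maxSize <= 0) {
                throw new NoSample("All input files are empty");
            }
            try {
                // Second pass: accept whatever the largest input can provide.
                runTask(inputs.get(maxIndex), minBytes, true);
            } catch (NotEnoughSample ex) {
                // Mirrors the original; in this sketch a forced run on non-empty input always yields a sample.
                throw new NoSample("All input files are smaller than minimum sampling size");
            }
            throw new AssertionError("runTask must throw SampledNotice");
        } catch (SampledNotice notice) {
            // The result arrives here, not through a return value of runTask.
            return notice.sample;
        }
    }

    public static void main(String[] args) {
        List<byte[]> inputs = Arrays.asList(
                "ab".getBytes(StandardCharsets.UTF_8),
                "abcdefgh".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(sample(inputs, 4), StandardCharsets.UTF_8)); // prints "abcd"
    }
}
```

In this sketch, a SampledNotice thrown during the first pass skips the rest of the loop and is caught by the outer try, which matches how the original returns error.getSample() from the outermost catch block.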

Aggregations

PageOutput (org.embulk.spi.PageOutput): 6
List (java.util.List): 3
TaskSource (org.embulk.config.TaskSource): 3
InputPlugin (org.embulk.spi.InputPlugin): 3
Page (org.embulk.spi.Page): 3
Schema (org.embulk.spi.Schema): 3
ArrayList (java.util.ArrayList): 2
ConfigSource (org.embulk.config.ConfigSource): 2
ImmutableList (com.google.common.collect.ImmutableList): 1
ConfigDiff (org.embulk.config.ConfigDiff): 1
TaskReport (org.embulk.config.TaskReport): 1
AbortTransactionResource (org.embulk.spi.AbortTransactionResource): 1
CloseResource (org.embulk.spi.CloseResource): 1
Column (org.embulk.spi.Column): 1
ColumnVisitor (org.embulk.spi.ColumnVisitor): 1
FileInputRunner (org.embulk.spi.FileInputRunner): 1
PageReader (org.embulk.spi.PageReader): 1
TransactionalPageOutput (org.embulk.spi.TransactionalPageOutput): 1
Timestamp (org.embulk.spi.time.Timestamp): 1
TimestampFormatter (org.embulk.spi.time.TimestampFormatter): 1