
Example 16 with TaskSource

Use of org.embulk.config.TaskSource in project embulk by embulk.

Class PreviewExecutor, method doPreview.

@SuppressWarnings("checkstyle:OverloadMethodsDeclarationOrder")
private PreviewResult doPreview(final PreviewTask task, final InputPlugin input, final List<FilterPlugin> filterPlugins) {
    try {
        input.transaction(task.getInputConfig(), new InputPlugin.Control() {

            public List<TaskReport> run(final TaskSource inputTask, Schema inputSchema, final int taskCount) {
                Filters.transaction(filterPlugins, task.getFilterConfigs(), inputSchema, new Filters.Control() {

                    public void run(final List<TaskSource> filterTasks, final List<Schema> filterSchemas) {
                        Schema inputSchema = filterSchemas.get(0);
                        Schema outputSchema = filterSchemas.get(filterSchemas.size() - 1);
                        PageOutput out = new SamplingPageOutput(task.getSampleRows(), outputSchema);
                        try {
                            for (int taskIndex = 0; taskIndex < taskCount; taskIndex++) {
                                try {
                                    out = Filters.open(filterPlugins, filterTasks, filterSchemas, out);
                                    input.run(inputTask, inputSchema, taskIndex, out);
                                } catch (NoSampleException ex) {
                                    if (taskIndex == taskCount - 1) {
                                        throw ex;
                                    }
                                }
                            }
                        } finally {
                            out.close();
                        }
                    }
                });
                // program never reaches here because SamplingPageOutput.finish throws an error.
                throw new NoSampleException("No input records to preview");
            }
        });
        throw new AssertionError("PreviewExecutor executor must throw PreviewedNoticeError");
    } catch (PreviewedNoticeError previewed) {
        return previewed.getPreviewResult();
    }
}
Also used: InputPlugin (org.embulk.spi.InputPlugin), PageOutput (org.embulk.spi.PageOutput), Schema (org.embulk.spi.Schema), ArrayList (java.util.ArrayList), List (java.util.List), TaskSource (org.embulk.config.TaskSource)
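The control flow of doPreview is easier to follow with the Embulk types stripped away. What follows is a minimal sketch, not Embulk code. PreviewedNotice, NoSample, SamplingSink and preview are hypothetical names, and each input task is modeled as a list of strings. The sink's behavior is also an assumption, based on the comment that SamplingPageOutput.finish throws an error: it ends the run once sampleRows rows have arrived, and on finish() it either hands back what it collected or reports that it collected nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the control flow in Example 16.
// All names here are hypothetical stand-ins, not Embulk APIs.
public class PreviewSketch {

    // Control-flow signal that carries the sampled rows up to the caller.
    static final class PreviewedNotice extends RuntimeException {
        final List<String> rows;

        PreviewedNotice(List<String> rows) {
            super("previewed", null, false, false); // no stack trace: not a real failure
            this.rows = rows;
        }
    }

    // Signals that nothing has been sampled so far.
    static final class NoSample extends RuntimeException {
        NoSample(String message) {
            super(message);
        }
    }

    // Assumed behavior: stop as soon as enough rows arrive; on finish(), either
    // hand back what was collected or report that nothing was collected.
    static final class SamplingSink {
        private final int sampleRows;
        private final List<String> rows = new ArrayList<>();

        SamplingSink(int sampleRows) {
            this.sampleRows = sampleRows;
        }

        void add(String row) {
            rows.add(row);
            if (rows.size() >= sampleRows) {
                finish();
            }
        }

        void finish() {
            if (rows.isEmpty()) {
                throw new NoSample("No input records to preview");
            }
            throw new PreviewedNotice(new ArrayList<>(rows));
        }
    }

    // Each inner list plays the role of one input task; one sink is shared by all tasks.
    static List<String> preview(List<List<String>> tasks, int sampleRows) {
        SamplingSink sink = new SamplingSink(sampleRows);
        try {
            for (int taskIndex = 0; taskIndex < tasks.size(); taskIndex++) {
                try {
                    for (String row : tasks.get(taskIndex)) {
                        sink.add(row);
                    }
                    sink.finish(); // a task ends by finishing its output
                } catch (NoSample ex) {
                    // An empty task is tolerated unless it is the last one.
                    if (taskIndex == tasks.size() - 1) {
                        throw ex;
                    }
                }
            }
            throw new NoSample("No input records to preview");
        } catch (PreviewedNotice notice) {
            return notice.rows;
        }
    }

    public static void main(String[] args) {
        List<List<String>> tasks = List.of(List.of(), List.of("a", "b", "c"), List.of("d"));
        System.out.println(preview(tasks, 2)); // prints [a, b]
    }
}
```

In the usage example the empty first task is skipped, and the sample is cut off at two rows inside the second task. Throwing the result as an exception lets the sink leave every enclosing loop and callback at once. In this sketch the stack trace is disabled so that the signal stays cheap.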

Example 17 with TaskSource

Use of org.embulk.config.TaskSource in project embulk by embulk.

Class SamplingParserPlugin, method runFileInputSampling.

public static Buffer runFileInputSampling(final FileInputRunner runner, ConfigSource inputConfig, ConfigSource sampleBufferConfig) {
    final SampleBufferTask sampleBufferTask = sampleBufferConfig.loadConfig(SampleBufferTask.class);
    // override in.parser.type so that FileInputRunner creates SamplingParserPlugin
    ConfigSource samplingInputConfig = inputConfig.deepCopy();
    samplingInputConfig.getNestedOrSetEmpty("parser")
            .set("type", "system_sampling")
            .set("sample_buffer_bytes", sampleBufferTask.getSampleBufferBytes());
    samplingInputConfig.set("decoders", null);
    try {
        runner.transaction(samplingInputConfig, new InputPlugin.Control() {

            public List<TaskReport> run(TaskSource taskSource, Schema schema, int taskCount) {
                if (taskCount == 0) {
                    throw new NoSampleException("No input files to read sample data");
                }
                int maxSize = -1;
                int maxSizeTaskIndex = -1;
                for (int taskIndex = 0; taskIndex < taskCount; taskIndex++) {
                    try {
                        runner.run(taskSource, schema, taskIndex, new PageOutput() {

                            @Override
                            public void add(Page page) {
                                // TODO exception class
                                throw new RuntimeException("Input plugin must be a FileInputPlugin to guess parser configuration");
                            }

                            @Override
                            public void finish() {
                            }

                            @Override
                            public void close() {
                            }
                        });
                    } catch (NotEnoughSampleError ex) {
                        if (maxSize < ex.getSize()) {
                            maxSize = ex.getSize();
                            maxSizeTaskIndex = taskIndex;
                        }
                        continue;
                    }
                }
                if (maxSize <= 0) {
                    throw new NoSampleException("All input files are empty");
                }
                taskSource.getNested("ParserTaskSource").set("force", true);
                try {
                    runner.run(taskSource, schema, maxSizeTaskIndex, new PageOutput() {

                        @Override
                        public void add(Page page) {
                            // TODO exception class
                            throw new RuntimeException("Input plugin must be a FileInputPlugin to guess parser configuration");
                        }

                        @Override
                        public void finish() {
                        }

                        @Override
                        public void close() {
                        }
                    });
                } catch (NotEnoughSampleError ex) {
                    throw new NoSampleException("All input files are smaller than minimum sampling size");
                }
                throw new NoSampleException("All input files are smaller than minimum sampling size");
            }
        });
        throw new AssertionError("SamplingParserPlugin must throw SampledNoticeError");
    } catch (SampledNoticeError error) {
        return error.getSample();
    }
}
Also used: InputPlugin (org.embulk.spi.InputPlugin), Schema (org.embulk.spi.Schema), Page (org.embulk.spi.Page), ConfigSource (org.embulk.config.ConfigSource), PageOutput (org.embulk.spi.PageOutput), List (java.util.List), TaskSource (org.embulk.config.TaskSource)
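The sampling strategy in runFileInputSampling is a two-pass search. It runs every task once and remembers which task read the most bytes before running short. It then reruns only that task with force set. The sketch below models this with byte arrays. NotEnoughSample, Sampled, NoSample, runTask and sample are hypothetical names. Treating "force" as "accept a short but non-empty buffer" is an assumption about what the ParserTaskSource flag does.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch of the task-selection logic in Example 17.
// Every class and method here is a hypothetical stand-in, not an Embulk API.
public class SamplingSketch {

    // A task ran out of input before filling the sample buffer; reports how much it read.
    static final class NotEnoughSample extends RuntimeException {
        final int size;

        NotEnoughSample(int size) {
            super("not enough sample: " + size, null, false, false);
            this.size = size;
        }
    }

    // Control-flow signal carrying the finished sample back to the caller.
    static final class Sampled extends RuntimeException {
        final byte[] sample;

        Sampled(byte[] sample) {
            super("sampled", null, false, false);
            this.sample = sample;
        }
    }

    static final class NoSample extends RuntimeException {
        NoSample(String message) {
            super(message);
        }
    }

    // Simulated sampling run over one task's bytes. It succeeds (by throwing Sampled)
    // when the task holds at least minBytes, or when force is set and the task is non-empty.
    static void runTask(byte[] input, int minBytes, boolean force) {
        if (input.length >= minBytes || (force && input.length > 0)) {
            throw new Sampled(Arrays.copyOf(input, Math.min(input.length, minBytes)));
        }
        throw new NotEnoughSample(input.length);
    }

    static byte[] sample(List<byte[]> tasks, int minBytes) {
        if (tasks.isEmpty()) {
            throw new NoSample("No input files to read sample data");
        }
        try {
            // First pass: any task that fills the buffer ends the search immediately.
            int maxSize = -1;
            int maxSizeTaskIndex = -1;
            for (int taskIndex = 0; taskIndex < tasks.size(); taskIndex++) {
                try {
                    runTask(tasks.get(taskIndex), minBytes, false);
                } catch (NotEnoughSample ex) {
                    // Remember the task that came closest to a full sample.
                    if (maxSize < ex.size) {
                        maxSize = ex.size;
                        maxSizeTaskIndex = taskIndex;
                    }
                }
            }
            if (maxSize <= 0) {
                throw new NoSample("All input files are empty");
            }
            // Second pass: rerun only the largest task and accept a short sample.
            runTask(tasks.get(maxSizeTaskIndex), minBytes, true);
            throw new NoSample("All input files are smaller than minimum sampling size");
        } catch (Sampled done) {
            return done.sample;
        }
    }

    public static void main(String[] args) {
        List<byte[]> tasks = List.of(new byte[] {1, 2}, new byte[0], new byte[] {3, 4, 5});
        System.out.println(Arrays.toString(sample(tasks, 8))); // prints [3, 4, 5]
    }
}
```

With a minimum of 8 bytes, none of the three tasks qualifies on the first pass. The largest one ({3, 4, 5}) is therefore rerun and returned. Rerunning one task, rather than combining partial buffers from several files, keeps the sample drawn from a single file. That is presumably what a parser guess needs.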

Aggregations

- TaskSource (org.embulk.config.TaskSource): 17
- ConfigSource (org.embulk.config.ConfigSource): 12
- Schema (org.embulk.spi.Schema): 12
- List (java.util.List): 9
- Test (org.junit.Test): 9
- ImmutableList (com.google.common.collect.ImmutableList): 8
- ArrayList (java.util.ArrayList): 7
- FilterPlugin (org.embulk.spi.FilterPlugin): 7
- TaskReport (org.embulk.config.TaskReport): 5
- InputPlugin (org.embulk.spi.InputPlugin): 5
- SchemaConfigException (org.embulk.spi.SchemaConfigException): 4
- ConfigDiff (org.embulk.config.ConfigDiff): 3
- ConfigException (org.embulk.config.ConfigException): 3
- PageOutput (org.embulk.spi.PageOutput): 3
- ImmutableMap (com.google.common.collect.ImmutableMap): 2
- LinkedList (java.util.LinkedList): 2
- Column (org.embulk.spi.Column): 2
- ExecutorPlugin (org.embulk.spi.ExecutorPlugin): 2
- FileInputRunner (org.embulk.spi.FileInputRunner): 2
- Page (org.embulk.spi.Page): 2