use of technology.tabula.Table in project drill by apache.
the class PdfUtils method getSpecificTable.
/**
 * Returns a specific table from a PDF document. Returns null if the user requests
 * a table that does not exist. If there is an error with the document, the function
 * throws a UserException.
 * @param document The source PDF document
 * @param tableIndex The zero-based index of the desired table
 * @param algorithm The extraction algorithm to use; if null, DEFAULT_ALGORITHM is used
 * @return The desired Table, or null if the index is not valid or the document has no tables.
 */
public static Table getSpecificTable(PDDocument document, int tableIndex, ExtractionAlgorithm algorithm) {
  NurminenDetectionAlgorithm detectionAlgorithm = new NurminenDetectionAlgorithm();
  // Fall back to the default extraction algorithm when the caller supplies none.
  ExtractionAlgorithm algExtractor = (algorithm == null) ? DEFAULT_ALGORITHM : algorithm;
  ObjectExtractor objectExtractor = new ObjectExtractor(document);
  PageIterator pages = objectExtractor.extract();
  int tableCounter = 0;
  while (pages.hasNext()) {
    Page page = pages.next();
    List<Rectangle> rectanglesOnPage = detectionAlgorithm.detect(page);
    for (Rectangle guessRect : rectanglesOnPage) {
      Page guess = page.getArea(guessRect);
      // Iterate only over the tables found in this region. Accumulating them in a
      // shared list would count earlier tables again, and an empty region must not
      // end the search early.
      for (Table table : algExtractor.extract(guess)) {
        if (tableCounter == tableIndex) {
          return table;
        }
        tableCounter++;
      }
    }
  }
  // No table with the requested index exists: release the extractor and report null.
  try {
    objectExtractor.close();
  } catch (Exception e) {
    throw UserException.parseError(e).message("Error extracting table: " + e.getMessage()).build(logger);
  }
  return null;
}
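The index-counting logic described in the Javadoc above can be sketched on its own, without any PDF library. This is a minimal illustration only: the class NthAcrossGroups and the helper nthAcrossGroups are hypothetical names that do not exist in Drill or Tabula, and plain lists of strings stand in for the tables found in each detected region.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NthAcrossGroups {

  // Hypothetical helper (not part of Drill or Tabula): returns the item at a
  // zero-based index counted across all groups in order, or null when the
  // index is out of range. Empty groups are skipped rather than ending the search.
  static <T> T nthAcrossGroups(List<List<T>> groups, int index) {
    int counter = 0;
    for (List<T> group : groups) {
      for (T item : group) {
        if (counter == index) {
          return item;
        }
        counter++;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Three "regions": two tables, none, then one more table.
    List<List<String>> regions = Arrays.asList(
        Arrays.asList("t0", "t1"),
        Collections.<String>emptyList(),
        Arrays.asList("t2"));
    System.out.println(nthAcrossGroups(regions, 2)); // prints t2
    System.out.println(nthAcrossGroups(regions, 3)); // prints null
  }
}
```

Each group is iterated exactly once, so the counter reflects the number of items seen so far, and a request past the last item falls through to null, which is the behaviour testGetSpecificTableOutSideOfBounds expects.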
use of technology.tabula.Table in project drill by apache.
the class TestPdfUtils method testTableExtractorWithNoBoundingFrame.
@Test
public void testTableExtractorWithNoBoundingFrame() throws Exception {
PDDocument document = getDocument("spreadsheet_no_bounding_frame.pdf");
List<Table> tableList = PdfUtils.extractTablesFromPDF(document);
document.close();
assertEquals(1, tableList.size());
}
use of technology.tabula.Table in project drill by apache.
the class TestPdfUtils method testTableExtractorWitMultipage.
@Test
public void testTableExtractorWitMultipage() throws Exception {
PDDocument document = getDocument("us-020.pdf");
List<Table> tableList = PdfUtils.extractTablesFromPDF(document);
document.close();
assertEquals(4, tableList.size());
}
use of technology.tabula.Table in project drill by apache.
the class TestPdfUtils method testGetSpecificTableOutSideOfBounds.
@Test
public void testGetSpecificTableOutSideOfBounds() throws Exception {
PDDocument document = getDocument("us-020.pdf");
Table table = PdfUtils.getSpecificTable(document, 4, null);
assertNull(table);
}