TesseractOCRParser examples in Java

Programming language: Java

Namespace / package name: org.apache.tika.parser.ocr

Class / type: TesseractOCRParser

Examples on hotexamples.com: 4

TesseractOCRParser in Java: 4 examples found. These are the top-rated real-world examples of org.apache.tika.parser.ocr.TesseractOCRParser in Java, extracted from open source projects.

Frequently used methods

getSupportedTypes(2)

getImageMagickProg(1)

hasTesseract(1)

parseInline(1)

Example 1

File: AbstractPDF2XHTML.java Project: Zarana-Parekh/tika

void doOCROnCurrentPage() throws IOException, TikaException, SAXException {
    if (config.getOCRStrategy().equals(NO_OCR)) {
        return;
    }
    TesseractOCRConfig tesseractConfig =
            context.get(TesseractOCRConfig.class, DEFAULT_TESSERACT_CONFIG);
    TesseractOCRParser tesseractOCRParser = new TesseractOCRParser();
    if (!tesseractOCRParser.hasTesseract(tesseractConfig)) {
        throw new TikaException(
                "Tesseract is not available. "
                        + "Please set the OCR_STRATEGY to NO_OCR or configure Tesseract correctly");
    }
    PDFRenderer renderer = new PDFRenderer(pdDocument);
    TemporaryResources tmp = new TemporaryResources();
    try {
        BufferedImage image = renderer.renderImage(pageIndex, 2.0f, config.getOCRImageType());
        Path tmpFile = tmp.createTempFile();
        try (OutputStream os = Files.newOutputStream(tmpFile)) {
            // TODO: get output format from TesseractConfig
            ImageIOUtil.writeImage(image, config.getOCRImageFormatName(), os, config.getOCRDPI());
        }
        try (InputStream is = TikaInputStream.get(tmpFile)) {
            tesseractOCRParser.parseInline(is, xhtml, tesseractConfig);
        }
    } catch (IOException e) {
        handleCatchableIOE(e);
    } catch (SAXException e) {
        throw new IOExceptionWithCause("error writing OCR content from PDF", e);
    } finally {
        tmp.dispose();
    }
}

Example 2

File: TesseractOCRParserTest.java Project: Zarana-Parekh/tika

@Test
public void testImageMagick() throws Exception {
    InputStream stream =
            TesseractOCRConfig.class.getResourceAsStream("/test-properties/TesseractOCR.properties");
    TesseractOCRConfig config = new TesseractOCRConfig(stream);
    String[] CheckCmd = {config.getImageMagickPath() + TesseractOCRParser.getImageMagickProg()};
    assumeTrue(ExternalParser.check(CheckCmd));
}
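
The test above only proceeds when an external command can actually be run. As a rough illustration of that kind of check, here is a minimal sketch using only the JDK. It is not how Tika's ExternalParser.check or hasTesseract is implemented; the class name ToolCheckSketch, the method isAvailable, and the five-second timeout are assumptions made for this example. It needs Java 9 or later because of ProcessBuilder.Redirect.DISCARD.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class ToolCheckSketch {

    /**
     * Hypothetical helper: tries to start the given command and reports
     * whether it ran and exited with status 0 within five seconds.
     */
    public static boolean isAvailable(String... command) {
        ProcessBuilder builder = new ProcessBuilder(command)
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD); // ignore whatever the tool prints
        try {
            Process process = builder.start();
            if (!process.waitFor(5, TimeUnit.SECONDS)) {
                process.destroyForcibly(); // the tool hung, treat it as unavailable
                return false;
            }
            return process.exitValue() == 0;
        } catch (IOException e) {
            return false; // executable missing or not runnable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        // A path that does not exist is reported as unavailable.
        System.out.println(isAvailable("/made/up/path/tesseract", "--version"));
    }
}
```

Treating a non-zero exit status as "unavailable" is a simplification; some tools return other codes for a version query, so a real check may need to accept a set of exit values.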

Example 3

File: TesseractOCRParserTest.java Project: Zarana-Parekh/tika

/*
 * If Tesseract is found, test we retrieve the proper number of supporting Parsers.
 */
@Test
public void offersTypesIfFound() throws Exception {
    TesseractOCRParser parser = new TesseractOCRParser();
    DefaultParser defaultParser = new DefaultParser();
    ParseContext parseContext = new ParseContext();
    MediaType png = MediaType.image("png");

    // Assuming that Tesseract is on the path, we should find 5 Parsers that support PNG.
    assumeTrue(canRun());
    assertEquals(5, parser.getSupportedTypes(parseContext).size());
    assertTrue(parser.getSupportedTypes(parseContext).contains(png));

    // DefaultParser will now select the TesseractOCRParser.
    assertEquals(
            TesseractOCRParser.class, defaultParser.getParsers(parseContext).get(png).getClass());
}

Example 4

File: TesseractOCRParserTest.java Project: Zarana-Parekh/tika

/*
 * Check that if Tesseract is not found, the TesseractOCRParser claims to not support
 * any file types. So, the standard image parser is called instead.
 */
@Test
public void offersNoTypesIfNotFound() throws Exception {
    TesseractOCRParser parser = new TesseractOCRParser();
    DefaultParser defaultParser = new DefaultParser();
    MediaType png = MediaType.image("png");

    // With an invalid path, will offer no types
    TesseractOCRConfig invalidConfig = new TesseractOCRConfig();
    invalidConfig.setTesseractPath("/made/up/path");
    ParseContext parseContext = new ParseContext();
    parseContext.set(TesseractOCRConfig.class, invalidConfig);

    // No types offered
    assertEquals(0, parser.getSupportedTypes(parseContext).size());

    // And DefaultParser won't use us
    assertEquals(ImageParser.class, defaultParser.getParsers(parseContext).get(png).getClass());
}
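
The two tests above rely on one idea: a parser that depends on an external tool offers no media types when the tool is missing, so a composite parser falls back to another parser for the same type. The following is a minimal, self-contained sketch of that idea only. SketchParser, imageParser, ocrParser and select are hypothetical names, and the "last matching parser wins" rule is an assumption chosen for the illustration, not a description of how Tika's DefaultParser orders its parsers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class ParserSelectionSketch {

    /** Hypothetical stand-in for a parser; not a Tika interface. */
    public interface SketchParser {
        String name();
        Set<String> supportedTypes();
    }

    /** Always offers PNG, as a plain image parser would. */
    public static SketchParser imageParser() {
        return new SketchParser() {
            public String name() { return "image"; }
            public Set<String> supportedTypes() { return Collections.singleton("image/png"); }
        };
    }

    /** Offers PNG only when the external OCR tool is reported as available. */
    public static SketchParser ocrParser(final boolean toolAvailable) {
        return new SketchParser() {
            public String name() { return "ocr"; }
            public Set<String> supportedTypes() {
                return toolAvailable
                        ? Collections.singleton("image/png")
                        : Collections.<String>emptySet();
            }
        };
    }

    /** Returns the name of the last parser in the list that offers the type, or null if none does. */
    public static String select(String type, List<SketchParser> parsers) {
        String chosen = null;
        for (SketchParser parser : parsers) {
            if (parser.supportedTypes().contains(type)) {
                chosen = parser.name(); // later parsers override earlier ones (assumed rule)
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Tool available: the OCR parser takes over PNG. Prints "ocr".
        System.out.println(select("image/png", Arrays.asList(imageParser(), ocrParser(true))));
        // Tool missing: the OCR parser offers nothing, so the image parser is used. Prints "image".
        System.out.println(select("image/png", Arrays.asList(imageParser(), ocrParser(false))));
    }
}
```

Because the fallback comes from the empty set of supported types, the calling code needs no special case for a missing tool.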