Official GitHub repository:

https://github.com/tesseract-ocr/tesseract

Note: the JAI library used by tess4j supports only the following image types:

For details, see the link below:

https://github.com/jai-imageio/jai-imageio-core
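
To see which image formats are actually readable on a given classpath, the registered readers can be listed at runtime. The following is a minimal sketch that uses only the standard javax.imageio API. It assumes tess4j resolves its image readers through ImageIO, which the error message further down suggests but which is worth verifying for your version. The class name SupportedFormats and its helper methods are hypothetical.

```java
import javax.imageio.ImageIO;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class SupportedFormats {

    // Collects the format names of every ImageIO reader plugin that is
    // currently registered, lower-cased and de-duplicated.
    static Set<String> readerFormats() {
        // Re-scan the classpath so optional plugins (for example jai-imageio) are picked up.
        ImageIO.scanForPlugins();
        Set<String> names = new TreeSet<>();
        for (String name : ImageIO.getReaderFormatNames()) {
            names.add(name.toLowerCase(Locale.ROOT));
        }
        return names;
    }

    // True when at least one registered reader claims the given file suffix.
    static boolean canRead(String suffix) {
        return ImageIO.getImageReadersBySuffix(suffix).hasNext();
    }

    public static void main(String[] args) {
        System.out.println("Readable formats: " + readerFormats());
        // Pass the suffix of your own image as the first argument.
        String suffix = args.length > 0 ? args[0] : "tif";
        System.out.println(suffix + " readable: " + canRead(suffix));
    }
}
```

Running this once with and once without the JAI Image I/O dependency on the classpath shows which formats the extension adds.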

Install the system environment (optional)

https://github.com/UB-Mannheim/tesseract/wiki

If it is not installed, the following error appears when running OCR:

java.lang.RuntimeException: Unsupported image format. May need to install JAI Image I/O package.
https://github.com/jai-imageio/jai-imageio-core
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:215) ~[tess4j-4.5.5.jar:4.5.5]
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:195) ~[tess4j-4.5.5.jar:4.5.5]

Download the language data

https://github.com/tesseract-ocr/tessdata

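Go to this link and download the zip package:

If tessdata is not configured before running OCR, the following error occurs:

Of course, you can also download only eng.traineddata.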
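
Because a missing language file only shows up once OCR runs, it can help to check the tessdata folder up front. This is a minimal sketch using only java.nio. The class name TessdataCheck, the method hasLanguage, and the default folder name "tessdata" are assumptions for illustration, not part of tess4j.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TessdataCheck {

    // Returns true if <tessdataDir>/<language>.traineddata exists as a regular file.
    static boolean hasLanguage(Path tessdataDir, String language) {
        return Files.isRegularFile(tessdataDir.resolve(language + ".traineddata"));
    }

    public static void main(String[] args) {
        // Hypothetical default location; pass the real folder as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        if (hasLanguage(dir, "eng")) {
            System.out.println("eng.traineddata found in " + dir.toAbsolutePath());
        } else {
            System.err.println("eng.traineddata is missing from " + dir.toAbsolutePath());
        }
    }
}
```

The same folder is the one you would later pass to tesseract.setDatapath.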

Add the dependencies

// OCR dependency
implementation 'net.sourceforge.tess4j:tess4j:4.5.5'

// JAI Image I/O extension library
implementation group: 'com.github.jai-imageio', name: 'jai-imageio-jpeg2000', version: '1.4.0'

Code example

        File ocrFile = new File("ocr.png");
        // Use OCR to extract the text from the image
        Tesseract tesseract = new Tesseract();
        // Set the path to the Tesseract data files if they are not in the default location
        //tesseract.setDatapath("path_to_your_tessdata_folder");
        try {
            String result = tesseract.doOCR(ocrFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }

Other

java.lang.RuntimeException: Unsupported image format. May need to install JAI Image I/O package.
https://github.com/jai-imageio/jai-imageio-core
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:215) ~[tess4j-4.5.5.jar:4.5.5]
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:195) ~[tess4j-4.5.5.jar:4.5.5]

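If this happens, check whether your image type falls outside the supported range; see the JAI project link above. Make no mistake: PNG and JPG are both unsupported!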
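
One way to find out whether a particular file can be decoded at all is to load it yourself with ImageIO before handing it to OCR. The sketch below uses only the standard library; the class name ImageLoader and the method load are hypothetical. tess4j also appears to offer a doOCR overload that accepts a BufferedImage, which is treated here as an assumption to verify against your version.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

class ImageLoader {

    // Decodes the file with whatever ImageIO readers are registered.
    // ImageIO.read returns null when no reader recognizes the content,
    // so that case is turned into an explicit exception.
    static BufferedImage load(File file) throws IOException {
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            throw new IOException("No ImageIO reader can decode " + file);
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args.length > 0 ? args[0] : "ocr.png");
        BufferedImage image = load(file);
        System.out.println("Decoded " + image.getWidth() + "x" + image.getHeight() + " pixels");
        // Assumption: the decoded image could then be passed on, e.g. tesseract.doOCR(image).
    }
}
```

If load fails for a file, the problem lies with the image or the installed readers rather than with the OCR step itself.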