Official GitHub repository:
https://github.com/tesseract-ocr/tesseract
Note: the JAI library used by tess4j supports only a limited set of image types.
For the full list, see the link below:
https://github.com/jai-imageio/jai-imageio-core
Install the system environment (optional)
https://github.com/UB-Mannheim/tesseract/wiki

If it is not installed, the following error appears when running OCR:
java.lang.RuntimeException: Unsupported image format. May need to install JAI Image I/O package. https://github.com/jai-imageio/jai-imageio-core
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:215) ~[tess4j-4.5.5.jar:4.5.5]
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:195) ~[tess4j-4.5.5.jar:4.5.5]
Download the language data
https://github.com/tesseract-ocr/tessdata
Download the zip package from this link.

If tessdata is not configured before running OCR, the recognition call fails with an error.

Of course, you can also download only eng.traineddata.
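
The tessdata requirement can be checked before calling OCR. The following is a minimal sketch using only the JDK; the class name TessdataCheck, the method hasLanguage, and the default folder name "tessdata" are hypothetical names chosen for illustration and are not part of tess4j. It only tests whether the language file exists in the folder you intend to pass to setDatapath.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of tess4j.
class TessdataCheck {

    // Returns true if dataDir contains "<language>.traineddata" as a regular file.
    static boolean hasLanguage(Path dataDir, String language) {
        return Files.isRegularFile(dataDir.resolve(language + ".traineddata"));
    }

    public static void main(String[] args) {
        // Assumption: the folder is passed as the first argument, else "tessdata" is used.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        if (hasLanguage(dataDir, "eng")) {
            System.out.println("Found eng.traineddata in " + dataDir.toAbsolutePath());
        } else {
            System.out.println("eng.traineddata is missing from " + dataDir.toAbsolutePath());
        }
    }
}
```

If the check reports the file as missing, download eng.traineddata (or the whole zip) into that folder before running OCR.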
Add the dependencies
// OCR dependency
implementation 'net.sourceforge.tess4j:tess4j:4.5.5'
// JAI Image I/O extension library
implementation group: 'com.github.jai-imageio', name: 'jai-imageio-jpeg2000', version: '1.4.0'
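Code example
import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

// Image to recognize (no trailing space in the file name)
File ocrFile = new File("ocr.png");
// Use OCR to extract the text from the image
Tesseract tesseract = new Tesseract();
// Set the path to the Tesseract data files if they are not in the default location
//tesseract.setDatapath("path_to_your_tessdata_folder");
try {
    String result = tesseract.doOCR(ocrFile);
    System.out.println(result);
} catch (TesseractException e) {
    System.err.println(e.getMessage());
}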
Other
java.lang.RuntimeException: Unsupported image format. May need to install JAI Image I/O package. https://github.com/jai-imageio/jai-imageio-core
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:215) ~[tess4j-4.5.5.jar:4.5.5]
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:195) ~[tess4j-4.5.5.jar:4.5.5]
If you run into this, check whether your image type falls outside the supported range; see the JAI project page linked above. Don't doubt it: PNG and JPG are not on that list!
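
To narrow down an "Unsupported image format" error, it may help to ask the running JVM which image readers it actually has. The following is a rough sketch using only the standard javax.imageio API; the class name ImageFormatCheck and the method hasReaderFor are hypothetical, and it assumes the OCR library resolves images through the ImageIO reader registry, which is not confirmed here.

```java
import java.util.Arrays;
import java.util.Locale;
import javax.imageio.ImageIO;

// Hypothetical diagnostic helper, not part of tess4j or JAI.
class ImageFormatCheck {

    // Returns true if an ImageIO reader is registered for the file name's suffix.
    static boolean hasReaderFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no suffix to look up
        }
        String suffix = fileName.substring(dot + 1).trim().toLowerCase(Locale.ROOT);
        return ImageIO.getImageReadersBySuffix(suffix).hasNext();
    }

    public static void main(String[] args) {
        // Format names the current classpath can read, including any plugins found on it.
        System.out.println(Arrays.toString(ImageIO.getReaderFormatNames()));
        String fileName = args.length > 0 ? args[0] : "ocr.png";
        System.out.println(fileName + " has a reader: " + hasReaderFor(fileName));
    }
}
```

Running it with and without the JAI Image I/O dependency on the classpath shows which formats the extra library contributes.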