killoturtle.blogg.se

Text recognition software open source

  • nw-page-editor - Simple app for visual editing of Page XML files.
  • archiscribe - Web application for transcribing OCR ground truth.
  • LAREX - A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
  • PRImA PAGE Viewer - Java-based viewer for PAGE XML files (layout + text content). Also supports ALTO XML, FineReader XML, and hOCR.
  • OCRFeeder - GTK graphical user interface that lets users correct characters or bounding boxes, export to ODT, and more.

  • PoCoTo - Fast interactive batch corrections of complete OCR error series in OCR'ed historical documents.
  • VietOCR - A Java/.NET GUI frontend for the Tesseract OCR engine, including jTessBoxEditor, a graphical Tesseract box data editor.
  • gImageReader - A simple Gtk/Qt front-end to tesseract-ocr.

  • Paperless - Scan, index, and archive all of your paper documents.
  • Paperwork - Using scanners and OCR to grep paper documents the easy way.
  • ocr-gt-tools - Client-Server application for editing OCR ground truth.
  • qt-box-editor - Qt4 editor of tesseract-ocr box files.
  • moz-hocr-editor - Firefox add-on for editing hOCR files (discontinued).
  • tesseract-recognize - Tesseract-based tool that outputs results in Page XML format (Docker image).
  • Ocrocis - Project manager interface for Ocropy, see also external project homepage.
  • Pdf2PdfOCR - A tool to OCR a PDF (or supported images) and add a text "layer" (a "pdf sandwich") in the original file, making it a searchable PDF.
  • OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
  • py-pagexml - Python library for handling PAGE XML and OPF files.
  • omni:us Pages Format (OPF) - XML schema very similar to PAGE XML that has some additional features.
  • PAGE-XML Schema - XML schema of the PAGE XML format along with documentation and examples.
  • GDZ - METS/TEI-based GDZ document format.
  • TEI SIG on Libraries - Best Practices for TEI in Libraries.
  • TEI-OCR - TEI customization for OCR generated layout and content information.
  • AbbyyToAlto - PHP script converting from Abbyy 6 to ALTO XML.
  • alto-tools - Various tools to work with ALTO files, Python.
  • ALTO XML Documentation - Documentation and use cases for ALTO.
  • ALTO XML Schema - XML Schema and development of the ALTO XML format.
  • hOCRTools - hOCR to ALTO conversion XSLT.
  • hocr-parser - hOCR Specification Python Parser.
  • ocr-transform - CLI tool to convert between hOCR and ALTO, MIT.
  • hocr-tools - Tools for doing various useful things with hOCR files, Apache 2.0.
  • hebOCR - Hebrew character recognition library (previously named hocr, see Wikipedia article), GPL.
  • xplab - A GTK 2 tool for pattern matching.
  • OCRchie - Modular Optical Character Recognition Software.
  • kognition - An omnifont OCR software for KDE.
  • Eye - An experimental Java OCR (image-to-text) application.
  • Cuneiform - CuneiForm OCR was developed by Cognitive Technologies.
  • doctr - A seamless & high-performing OCR library powered by Deep Learning.
  • Calamari - OCR Engine based on OCRopy and Kraken.
  • simple-ocr-opencv and its fork - A simple pythonic OCR engine using opencv and numpy.
  • RWTH-OCR - The RWTH Aachen University Optical Character Recognition System.
  • attention-ocr - OCR engine using visual attention mechanisms.
  • SwiftOCR - Fast and simple OCR library written in Swift.

  • ocular - Machine-learning OCR for historic documents.
  • gocr - OCR engine developed under the GNU General Public License; the project is led by Joerg Schulenburg.
  • kraken - Ocropus fork with sane defaults.
  • ocropus 0.4 - Older v0.4 state of Ocropus, with tesseract 2.04 and iulib, C++.
  • ocropus - OCR engine based on LSTM, Apache 2.0.
  • EasyOCR - OCR engine built on PyTorch by JaidedAI, Apache 2.0.
  • tesseract - The definitive Open Source OCR engine, Apache 2.0.
  • Older and possibly abandoned OCR engines.
  • This list contains links to great software tools and libraries and literature. Contributions are welcome, as is feedback.
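
Several of the tools listed here read or write hOCR, an HTML-based format in which each recognized word carries its bounding box in a `title` attribute. As a rough illustration of what such files contain, here is a minimal sketch that pulls words and boxes out of an hOCR fragment using only the Python standard library. The names `HocrWordParser`, `parse_hocr_words`, and `SAMPLE` are made up for this example and do not come from any of the listed projects; the sketch also assumes each word is a plain `span` with class `ocrx_word` and no nested spans.

```python
import re
from html.parser import HTMLParser

# hOCR stores coordinates as "bbox x0 y0 x1 y1" inside the title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs for every element with class ocrx_word.

    Hypothetical helper for illustration; assumes words contain no nested spans.
    """

    def __init__(self):
        super().__init__()
        self.words = []      # finished (text, (x0, y0, x1, y1)) tuples
        self._bbox = None    # bbox of the word currently being read, if any
        self._text = []      # text chunks of the word currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a word element.
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr_words(hocr):
    """Return a list of (word, (x0, y0, x1, y1)) tuples from an hOCR string."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Hand-written sample fragment, not output from any particular engine.
SAMPLE = (
    "<span class='ocr_line' title='bbox 36 92 580 122'>"
    "<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> "
    "<span class='ocrx_word' title='bbox 109 92 190 116; x_wconf 92'>world</span>"
    "</span>"
)

if __name__ == "__main__":
    for word, bbox in parse_hocr_words(SAMPLE):
        print(word, bbox)
```

For real files, a dedicated library such as hocr-tools or hocr-parser from the list above is likely the better choice, since they handle the full format rather than just word boxes.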











