The easiest way to install OCRmyPDF is to follow the steps for your operating system/platform. This version may be out of date, however.
These platforms have one-liner installs:
Debian, Ubuntu | apt install ocrmypdf
Windows Subsystem for Linux | apt install ocrmypdf
Fedora | dnf install ocrmypdf
macOS | brew install ocrmypdf
LinuxBrew | brew install ocrmypdf
FreeBSD | pkg install py37-ocrmypdf
Conda (WSL, macOS, Linux) | conda install ocrmypdf
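The one-liners in this table can be wrapped in a small shell sketch that looks for a package manager on the PATH and prints (but does not run) the matching command. The function names and the search order are illustrative assumptions, not part of OCRmyPDF.

```shell
#!/bin/sh
# Illustrative helper (not part of OCRmyPDF): map a package manager
# to the one-line install command listed in the table.
ocrmypdf_install_cmd() {
  case "$1" in
    apt)   echo "apt install ocrmypdf" ;;
    dnf)   echo "dnf install ocrmypdf" ;;
    brew)  echo "brew install ocrmypdf" ;;
    pkg)   echo "pkg install py37-ocrmypdf" ;;
    conda) echo "conda install ocrmypdf" ;;
    *)     return 1 ;;
  esac
}

# Print the command for the first package manager found on PATH.
# The search order is an arbitrary assumption; nothing is installed.
suggest_ocrmypdf_install() {
  for pm in apt dnf brew pkg conda; do
    if command -v "$pm" >/dev/null 2>&1; then
      ocrmypdf_install_cmd "$pm"
      return 0
    fi
  done
  echo "No listed package manager found; see the detailed procedures." >&2
  return 1
}
```

On an Ubuntu system, suggest_ocrmypdf_install would print apt install ocrmypdf, which you would then run yourself (typically with sudo).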
More detailed procedures are outlined below. If you want to do a manual install, or install a more recent version than your platform provides, read on.
Platform-specific steps
- Installing on Linux
- Installing on macOS
- Installing on Windows
- Installing with Python pip
- Installing HEAD revision from sources
OCRmyPDF versions in Debian & Ubuntu
Users of Debian 9 (“stretch”) or later, or Ubuntu 18.04 or later, including users of Windows Subsystem for Linux, may simply run apt install ocrmypdf.
As indicated in the table above, Debian and Ubuntu releases may lag behind the latest version. If the version available for your platform is out of date, you could opt to install the latest version from source. See Installing HEAD revision from sources. Ubuntu 16.10 to 17.10 inclusive also had ocrmypdf, but these versions are end of life.
For full details on version availability for your platform, check the Debian Package Tracker or Ubuntu launchpad.net.
Note
OCRmyPDF for Debian and Ubuntu currently omits the JBIG2 encoder. OCRmyPDF works fine without it but will produce larger output files. If you build jbig2enc from source, ocrmypdf 7.0.0 and later will automatically detect it (specifically the jbig2 binary) on the PATH. To add JBIG2 encoding, see Installing the JBIG2 encoder.
OCRmyPDF version
Users of Fedora 29 or later may simply run dnf install ocrmypdf.
For full details on version availability, check the Fedora Package Tracker.
If the version available for your platform is out of date, you could opt to install the latest version from source. See Installing HEAD revision from sources.
Note
OCRmyPDF for Fedora currently omits the JBIG2 encoder due to patent issues. OCRmyPDF works fine without it but will produce larger output files. If you build jbig2enc from source, ocrmypdf 7.0.0 and later will automatically detect it on the PATH. To add JBIG2 encoding, see Installing the JBIG2 encoder.
Ubuntu 20.04 includes ocrmypdf 9.6.0; you can install that with apt. To install a more recent version, uninstall the system-provided version of ocrmypdf, and install the following dependencies:
To install ocrmypdf for the system:
To install for the current user only:
Ubuntu 18.04 includes ocrmypdf 6.1.2; you can install that with apt, but it is quite old now. To install a more recent version, uninstall the old version of ocrmypdf, and install the following dependencies:
We will need a newer version of pip than was available for Ubuntu 18.04:
Then install the most recent ocrmypdf for the local user and set the user’s PATH to check for the user’s Python packages.
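The PATH change described above can be sketched as a small POSIX shell function. It assumes that pip's --user installs place scripts in $HOME/.local/bin, the usual location on Linux; ensure_user_bin_on_path is a hypothetical helper name. Prepending the directory means the newer, user-installed ocrmypdf is found before any system copy.

```shell
#!/bin/sh
# Hypothetical helper: put pip's per-user script directory first on PATH.
# Assumes `pip install --user` puts scripts in $HOME/.local/bin (typical on Linux).
ensure_user_bin_on_path() {
  user_bin="$HOME/.local/bin"
  case ":$PATH:" in
    *":$user_bin:"*) ;;                       # already present: leave PATH unchanged
    *) PATH="$user_bin:$PATH"; export PATH ;; # otherwise prepend it
  esac
}
```

Calling the function from a shell startup file such as ~/.bashrc makes the change persist across sessions; calling it twice is harmless.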
To add JBIG2 encoding, see Installing the JBIG2 encoder.
No package is available for Ubuntu 16.04. OCRmyPDF 8.0 and newer require Python 3.6. Ubuntu 16.04 ships Python 3.5, but you can install Python 3.6 on it. Or, you can skip Python 3.6 and install OCRmyPDF 7.x or older; for that procedure, please see the installation documentation for the version of OCRmyPDF you plan to use.
Install system packages for OCRmyPDF
This will install a Python 3.6 binary at /usr/bin/python3.6 alongside the system’s Python 3.5. Do not remove the system Python. This will also install Tesseract 4.0 from a PPA, since the version available in Ubuntu 16.04 is too old for OCRmyPDF.
Now install pip for Python 3.6. This will install the Python 3.6 version of pip at /usr/local/bin/pip.
Install OCRmyPDF
OCRmyPDF requires the locale to be set for UTF-8. On some minimal Ubuntu installations, such as the Ubuntu 16.04 Docker images, it may be necessary to set the locale.
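A minimal sketch of checking and setting the locale from a shell. It assumes the effective character-type locale follows the usual precedence (LC_ALL, then LC_CTYPE, then LANG) and that the C.UTF-8 locale exists, as it does on Debian and Ubuntu; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for the UTF-8 locale requirement.
# The effective character-type locale is LC_ALL, else LC_CTYPE, else LANG.
effective_ctype_locale() {
  printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
}

# Succeed if the effective locale name mentions UTF-8 (any spelling or case).
is_utf8_locale() {
  case "$(effective_ctype_locale)" in
    *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) return 0 ;;
    *) return 1 ;;
  esac
}

# Fall back to C.UTF-8 (assumed to exist, as it does on Debian/Ubuntu).
ensure_utf8_locale() {
  if ! is_utf8_locale; then
    LC_ALL=C.UTF-8
    LANG=C.UTF-8
    export LC_ALL LANG
  fi
}
```

In a minimal container you would call ensure_utf8_locale (or set LC_ALL and LANG to C.UTF-8 in the image's environment) before running ocrmypdf.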
Now install OCRmyPDF for the current user, and ensure that the PATH environment variable contains $HOME/.local/bin.
To add JBIG2 encoding, see Installing the JBIG2 encoder.
There is an Arch User Repository (AUR) package for OCRmyPDF.
Installing AUR packages as root is not allowed, so you must first set up a non-root user and configure sudo. The standard Docker image, archlinux/base:latest, does not have a non-root user configured, so users of that image must follow these guides. If you are using a VM image, such as the official Vagrant image, this work may already be completed for you.
Next you should install the base-devel package group. This includes thestandard tooling needed to build packages, such as a compiler and binary tools.
Now you are ready to install the OCRmyPDF package.
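The manual AUR procedure (clone the package's AUR repository, then build and install it with makepkg as a non-root user) can be sketched as follows. This assumes git and base-devel are installed, that the AUR package is named ocrmypdf, and uses the standard https://aur.archlinux.org/NAME.git clone URL; aur_clone_url and build_aur_package are hypothetical helpers.

```shell
#!/bin/sh
# Sketch of the standard manual AUR procedure, run as a non-root user with sudo.
# Prerequisite (once): sudo pacman -S --needed base-devel git
aur_clone_url() {
  printf 'https://aur.archlinux.org/%s.git\n' "$1"
}

build_aur_package() {
  pkg=$1
  if [ "$(id -u)" -eq 0 ]; then
    echo "makepkg will not run as root; switch to a non-root user" >&2
    return 1
  fi
  git clone "$(aur_clone_url "$pkg")" "$pkg" || return 1
  # --syncdeps installs missing dependencies with pacman, --install installs
  # the built package, --rmdeps removes dependencies makepkg installed for the build.
  (cd "$pkg" && makepkg --syncdeps --install --rmdeps)
}
```

Running build_aur_package ocrmypdf performs the steps for OCRmyPDF; the same helper applies to any other AUR package name.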
At this point you will have a working install of OCRmyPDF, but the Tesseract install won’t include any OCR language data. You can install the tesseract-data package group to add all supported languages, or use that package listing to identify the appropriate package for your desired language.
As an alternative to this manual procedure, consider using an AUR helper. Such a tool will automatically fetch, build and install the AUR package, resolve dependencies (including dependencies on AUR packages), and ease the upgrade procedure.
If you have any difficulties with installation, check the repository package page.
Note
The OCRmyPDF AUR package currently omits the JBIG2 encoder. OCRmyPDF works fine without it but will produce larger output files. The encoder is available from the jbig2enc-git AUR package and may be installed using the same series of steps as for the installation of the OCRmyPDF AUR package. Alternatively, it may be built manually from source following the instructions in Installing the JBIG2 encoder. If JBIG2 is installed, OCRmyPDF 7.0.0 and later will automatically detect it.
To install OCRmyPDF for Alpine Linux:
There is no OS-level packaging available for Mageia, so you must install the dependencies:
To install ocrmypdf for the system:
Or, to install for the current user only:
See the Repology page.
In general, first install the OCRmyPDF package for your system, then optionally use the procedure Installing with Python pip to install a more recent version.
OCRmyPDF is now a standard Homebrew formula. To install on macOS:
This will include only the English language pack. If you need other languages you can optionally install them all:
Note
Users who previously installed OCRmyPDF on macOS using pip install ocrmypdf should remove the pip version (pip3 uninstall ocrmypdf) before switching to the Homebrew version.
Note
Users who previously installed OCRmyPDF from the private tap should switch to the mainline version (brew untap jbarlow83/ocrmypdf) and install from there.
These instructions probably work on all macOS versions supported by Homebrew, and are for installing a more current version of OCRmyPDF than is available from Homebrew. Note that the Homebrew versions usually track the release versions fairly closely.
If it’s not already present, install Homebrew.
Update Homebrew:
Install or upgrade the required Homebrew packages, if any are missing. To do this, use brew edit ocrmypdf to obtain a recent list of Homebrew dependencies. You could also check the azure-pipelines.yml.
This will include the English, French, German and Spanish language packs. If you need other languages you can optionally install them all:
Update the Homebrew pip:
You can then install OCRmyPDF from PyPI, for the current user:
or system-wide:
The command line program should now be available:
Note
Administrator privileges will be required for some of these steps.
You must install the following for Windows:
- Python 3.7 (64-bit) or later
- Tesseract 4.0 or later
- Ghostscript 9.50 or later
Using the Chocolatey package manager, install the following when running in an Administrator command prompt:
- choco install python3
- choco install --pre tesseract
- choco install ghostscript
- choco install pngquant (optional)
The commands above will install Python 3.x (latest version), Tesseract, Ghostscript and pngquant. Chocolatey may also need to install the Windows Visual C++ Runtime DLLs or other Windows patches, and may require a reboot.
You may then use pip to install ocrmypdf. (This can be performed by a user or Administrator.)
pip install ocrmypdf
Chocolatey automatically selects appropriate versions of these applications. If you are installing them manually, please install 64-bit versions of all applications for 64-bit Windows, or 32-bit versions of all applications for 32-bit Windows. Mixing the “bitness” of these programs will lead to errors.
OCRmyPDF will check the Windows Registry and standard locations in your Program Files for third party software it needs (specifically, Tesseract and Ghostscript). To override the versions OCRmyPDF selects, you can modify the PATH environment variable. Follow these directions to change the PATH.
Warning
As of early 2021, users have reported that the Microsoft Store version of Python causes problems with most third-party Python packages, including OCRmyPDF. Please use Python downloaded from Python.org or Chocolatey as recommended here.
- Install Ubuntu 18.04 for Windows Subsystem for Linux, if not already installed.
- Follow the procedure to install OCRmyPDF on Ubuntu 18.04.
- Open the Windows command prompt and create a symlink:
Then confirm that the expected version from PyPI is installed:
You can then run OCRmyPDF in the Windows command prompt or PowerShell by prefixing it with wsl, and call it from Windows programs or batch files.
First install the following prerequisite Cygwin packages using setup-x86_64.exe:
Note
The Cygwin package for Ghostscript in versions 9.52 and 9.52-1 contained a bug that caused an exception to occur when ocrmypdf invoked gs. Make sure you have either 9.50 (or earlier) or 9.52-2 (or later).
Then open a Cygwin terminal (i.e. mintty) and run the following commands. Note that if you are using the version of pip that was installed with the Cygwin Python package, the command name will be pip3. If you have since updated pip (with, for instance, pip3 install --upgrade pip), the command is likely just pip instead of pip3:
The optional dependency “unpaper” is currently not available under Cygwin. Without it, certain options such as --clean will produce an error message. However, the OCR-to-text-layer functionality is available.
You can also install the Docker container on Windows. Ensure that your command prompt can run the docker “hello world” container.
FreeBSD 11.3, 12.0, 12.1-RELEASE and 13.0-CURRENT are supported. Other versions likely work but have not been tested.
To install a more recent version, you could attempt to first install the system version with pkg, then use pip install --user ocrmypdf.
For some users, installing the Docker image will be easier than installing all of OCRmyPDF’s dependencies.
See OCRmyPDF Docker image for more information.
OCRmyPDF is distributed through PyPI because it is a convenient way to install the latest version. However, PyPI and pip cannot address the fact that ocrmypdf depends on certain non-Python system libraries and programs being installed.
For best results, first install your platform’s version of ocrmypdf, using the instructions elsewhere in this document. Then you can use pip to get the latest version if your platform version is out of date. Chances are that this will satisfy most dependencies.
Use ocrmypdf --version to confirm what version was installed.
Then you can install the latest OCRmyPDF from the Python wheels. First try:
You should then be able to run ocrmypdf --version and see that the latest version was located.
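To compare the reported version against a minimum, for example to confirm you now have something newer than a distribution's package, a small shell function can do the comparison. This is a sketch that assumes GNU sort with -V (version sort) is available and that ocrmypdf --version prints a bare version string; version_ge is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on version sort (sort -V), as provided by GNU coreutils.
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}
```

For instance, version_ge "$(ocrmypdf --version)" 9.6.0 succeeds only if the installed version is at least the 9.6.0 shipped with Ubuntu 20.04. Version sort is used because plain lexical sorting would place 10.0.0 before 9.6.0.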
Since pip3 install --user does not work correctly on some platforms, notably Ubuntu 16.04 and older, and the Homebrew version of Python, use this for a system-wide installation instead:
Note
AArch64 (ARM64) users: this process will be difficult because most Python packages are not available as binary wheels for your platform. You’re probably better off using a platform install on Debian, Ubuntu, or Fedora.
OCRmyPDF currently requires these external programs and libraries to be installed, and they must be satisfied using the operating system package manager. pip cannot provide them.
- Python 3.6 or newer
- Ghostscript 9.15 or newer
- qpdf 8.1.0 or newer
- Tesseract 4.0.0-beta or newer
As of ocrmypdf 7.2.1, the following versions are recommended:
- Python 3.7 or 3.8
- Ghostscript 9.23 or newer
- qpdf 8.2.1
- Tesseract 4.0.0 or newer
- jbig2enc 0.29 or newer
- pngquant 2.5 or newer
- unpaper 6.1
jbig2enc, pngquant, and unpaper are optional. If they are missing, certain features are disabled. OCRmyPDF will discover them as soon as they are available.
jbig2enc, if present, will be used to optimize the encoding of monochrome images. This can significantly reduce the file size of the output file. It is not required. jbig2enc is not generally available for Ubuntu or Debian due to lingering concerns about patent issues, but can easily be built from source. To add JBIG2 encoding, see Installing the JBIG2 encoder.
pngquant, if present, is optionally used to optimize the encoding of PNG-style images in PDFs (actually, any that are losslessly encoded) by lossily quantizing to a smaller color palette. It is only activated when the --optimize argument is 2 or 3.
unpaper, if present, enables the --clean and --clean-final command line options.
These are in addition to the Python packaging dependencies, meaning that, unfortunately, the pip install command cannot satisfy all of them.
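Because pip cannot install these programs, it can help to check which of them are visible before installing OCRmyPDF. The sketch below only tests presence on the PATH, not versions; it assumes a Unix-like system where Ghostscript is invoked as gs (Windows builds use a different executable name) and omits qpdf because only command-line programs are checked. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: report which OCRmyPDF dependencies are on PATH.
check_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Exit status 0 only if python3 exists and is version 3.6 or newer.
python_ok() {
  python3 -c 'import sys; sys.exit(sys.version_info < (3, 6))' >/dev/null 2>&1
}

report_ocrmypdf_deps() {
  if python_ok; then
    echo "required python3 (>= 3.6): ok"
  else
    echo "required python3 (>= 3.6): MISSING or too old"
  fi
  for tool in tesseract gs; do
    if check_tool "$tool"; then echo "required $tool: found"; else echo "required $tool: MISSING"; fi
  done
  # Optional tools: when absent, OCRmyPDF disables the related features.
  for tool in jbig2 pngquant unpaper; do
    if check_tool "$tool"; then echo "optional $tool: found"; else echo "optional $tool: not found"; fi
  done
}
```

Running report_ocrmypdf_deps prints one line per dependency; any MISSING entry must be installed with the operating system package manager.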
If you have git and Python 3.6 or newer installed, you can install from source. When the pip installer runs, it will alert you if dependencies are missing.
If you prefer to build everything from source, you will need to build pikepdf from source. First ensure you can build and install pikepdf.
To install the HEAD revision from sources in the current Python 3 environment:
Or, to install in development mode, allowing customization of OCRmyPDF, use the -e flag:
You may find it easiest to install in a virtual environment, rather than system-wide:
However, ocrmypdf will only be accessible on the system PATH when you activate the virtual environment.
To run the program:
If dependencies are not yet installed, the script will notify you about those that need to be installed. The script requires specific versions of the dependencies. Versions older than those mentioned in the release notes are likely not compatible with OCRmyPDF.
To install all of the development and test requirements:
To add JBIG2 encoding, see Installing the JBIG2 encoder.
Completions for bash and fish are available in the project’s misc/completion folder. The bash completions are likely zsh compatible but this has not been confirmed. Package maintainers, please install these at the appropriate locations for your system.
To manually install the bash completion, copy misc/completion/ocrmypdf.bash to /etc/bash_completion.d/ocrmypdf (rename the file).
To manually install the fish completion, copy misc/completion/ocrmypdf.fish to ~/.config/fish/completions/ocrmypdf.fish.
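Both manual copies can be combined into one shell function, sketched here. The source and destination paths follow the text; the destination directories are parameters so the function can be pointed at a staging directory (for example when packaging), and install_ocrmypdf_completions is a hypothetical name. Writing to /etc/bash_completion.d normally requires root.

```shell
#!/bin/sh
# Hypothetical helper: copy OCRmyPDF's shell completions into place.
#   $1  path to an OCRmyPDF source checkout (default: current directory)
#   $2  bash completion directory (default: /etc/bash_completion.d)
#   $3  fish completion directory (default: ~/.config/fish/completions)
install_ocrmypdf_completions() {
  repo=${1:-.}
  bash_dir=${2:-/etc/bash_completion.d}
  fish_dir=${3:-$HOME/.config/fish/completions}
  mkdir -p "$bash_dir" "$fish_dir" || return 1
  # The bash file is renamed to plain "ocrmypdf", as the instructions describe.
  cp "$repo/misc/completion/ocrmypdf.bash" "$bash_dir/ocrmypdf" || return 1
  cp "$repo/misc/completion/ocrmypdf.fish" "$fish_dir/ocrmypdf.fish"
}
```

Run from the root of a checkout, install_ocrmypdf_completions with no arguments uses the default locations.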
You can install the CLI with a curl utility script, brew or by downloading the binary from the releases page. Once installed, you'll get the faas-cli command and faas alias.
Linux or macOS
Utility script with curl:
The sudo flag -E allows any http_proxy environment variables to be passed through to the installation bash script.
Running the curl script as a non-root user downloads the binary into your current directory and then prints installation instructions:
Via brew:
Note
The brew release may not always be the latest minor release, but it is updated regularly.
Windows
In PowerShell:
Environment variable overrides
Several environment variables act as overrides: if one is set and the corresponding command-line flag has not been given, its value is used by default.
- OPENFAAS_TEMPLATE_URL: sets the default URL to pull templates from
- OPENFAAS_PREFIX: for use with faas-cli new; this can act in place of --prefix
- OPENFAAS_URL: overrides the default gateway URL
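The precedence described here (an explicit flag wins, otherwise the environment variable, otherwise the built-in default) can be modelled with a short shell function. This illustrates the rule only and is not faas-cli's own code; resolve_gateway is hypothetical, and the fallback http://127.0.0.1:8080 is an assumption about faas-cli's default gateway that you should verify against faas-cli --help.

```shell
#!/bin/sh
# Illustrative model of the override rule for the gateway URL.
#   $1  value passed with --gateway, or empty if the flag was not given
resolve_gateway() {
  flag_value=$1
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"          # explicit flag takes precedence
  elif [ -n "${OPENFAAS_URL:-}" ]; then
    printf '%s\n' "$OPENFAAS_URL"        # then the environment override
  else
    printf '%s\n' "http://127.0.0.1:8080" # assumed built-in default
  fi
}
```

In practice, exporting OPENFAAS_URL in your shell profile lets every faas-cli command target that gateway without passing --gateway each time.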
Running faas-cli with sudo
If you're running the faas-cli with sudo, we recommend using sudo -E to pass through any environment variables you may have configured, such as an http_proxy, https_proxy or no_proxy entry.
Docker image
The faas-cli is also available as a Docker image, making it convenient for use in CI jobs such as with a Jenkins pipeline or a task in cron.
There is no 'latest' tag, so find the version of the CLI you want to use from the tags page on the Docker Hub. These correspond to the releases on GitHub.
Note: the Docker image cannot be used to perform a build directly, but you can use it to generate a build context which can be used with a container builder such as Docker, buildkit or Kaniko in another part of your build pipeline.
Use-cases for the Docker image:
- Generate the build context without running docker build: faas-cli --shrinkwrap
- Deploy an existing image to a remote server: faas-cli deploy
- Manage secrets with faas-cli secret
- Invoke functions via cron with faas-cli invoke
- Check the health of your remote gateway with faas-cli info
Building from source
The contributing guide has instructions for building from source and for configuring a Golang development environment.