FOSS for Academia: qpdf
Whenever possible, I try to find and use FOSS tools in my academic life. FOSS stands for “free and open-source software,” where “free” refers to freedom rather than price (though the software usually costs nothing, too). While there are various levels of zealousness in the open-source community (as well as various debates about what constitutes freedom), I use it not only for my own ideological reasons but also because I think open development is a better way to build software. Without turning this into a lecture on the merits of FOSS, it is enough to say that I am always looking for ways to incorporate it into my work.
One tool that has recently become indispensable to me is qpdf, “a content-preserving PDF document transformer” (GitHub: qpdf/qpdf).
At one point or another in an academic career, you are likely to struggle to do something with a PDF. While I generally think standardization is useful because it spares us constant battles over formats, the standardization of PDF as the go-to document format is disappointing. Adobe’s software is garbage for a variety of reasons, and many things you might want to do with a PDF require more pain than they ever should. Luckily, qpdf fixes a lot of this pain, assuming you are comfortable with command-line (CLI) software.
The goal of qpdf is to ease “content-preserving transformations” of PDF files. This is a lifesaver when I need to do things that should be simple but are needlessly complicated in Adobe Reader. While qpdf has all kinds of features, here are a few I find myself using all the time:
Simple page extraction
qpdf in.pdf --pages . PAGENUMBERS -- out.pdf
qpdf can parse a variety of inputs for the page numbers, including a single page (e.g., 1), a range of pages (e.g., 1-5), and a non-contiguous group of pages (e.g., 1-5,9,19-35). The . after --pages is shorthand for the primary input file (here, in.pdf). This flexibility also allows pages to be rearranged, as in:
qpdf in.pdf --pages . 15-20,1-14 -- out.pdf
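If you extract pages often, the command above can be wrapped in a small shell function. This is a minimal sketch assuming bash; the name extract_pages and its argument order are my own invention, not part of qpdf. It adds two sanity checks (the input exists, and the output is not the input itself) and then runs exactly the command shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of qpdf): copy selected pages of a PDF into a new file.
# Usage: extract_pages INPUT PAGE_SPEC OUTPUT
extract_pages() {
  local input=$1 spec=$2 output=$3
  if [[ ! -f $input ]]; then
    echo "extract_pages: no such file: $input" >&2
    return 1
  fi
  # Refuse to overwrite the input file in place.
  if [[ $input == "$output" || $input -ef $output ]]; then
    echo "extract_pages: output must differ from input" >&2
    return 2
  fi
  # "." after --pages is shorthand for the primary input file ($input).
  qpdf "$input" --pages . "$spec" -- "$output"
}
```

For example, extract_pages in.pdf 15-20,1-14 out.pdf reproduces the rearrangement above. qpdf's page-range syntax also accepts z for the last page, so a range such as z-1 should list pages in reverse order; check the page-range section of the qpdf manual for your version before relying on it.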
Merge multiple PDFs
qpdf --empty --pages *.pdf -- out.pdf
Run in a directory full of PDF files, the previous command merges all of them, in the order the shell expands *.pdf (usually alphabetical), into a single file, out.pdf.
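Two caveats apply to the *.pdf glob: alphabetical order puts ch10.pdf before ch2.pdf, and if out.pdf already exists from an earlier run, it gets swept into the merge as well. The following is a minimal bash sketch that works around both, assuming GNU sort (for its -V version-sort option) and file names without newlines; merge_pdfs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of qpdf): merge every PDF in DIR into OUTPUT.
# Usage: merge_pdfs DIR OUTPUT
merge_pdfs() {
  local dir=$1 output=$2
  local -a inputs=()
  local f
  # List candidates in version order (GNU sort -V), so ch2.pdf precedes ch10.pdf.
  while IFS= read -r f; do
    # An unmatched glob stays literal; skip it.
    if [[ ! -e $f ]]; then
      continue
    fi
    # Skip the output file so a re-run never merges it into itself.
    if [[ $f -ef $output ]]; then
      continue
    fi
    inputs+=("$f")
  done < <(printf '%s\n' "$dir"/*.pdf | sort -V)

  if (( ${#inputs[@]} == 0 )); then
    echo "merge_pdfs: no input PDFs in $dir" >&2
    return 1
  fi
  # A file listed without a page range contributes all of its pages.
  qpdf --empty --pages "${inputs[@]}" -- "$output"
}
```

Running merge_pdfs . out.pdf in a folder of chapter files would then merge ch1.pdf, ch2.pdf, and ch10.pdf in that order, even on a second run.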
Remove non-page data from a PDF
qpdf --empty --pages in.pdf -- out.pdf
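Because this command starts from an empty PDF (--empty) instead of from in.pdf, only the pages are copied, and document-level data such as bookmarks is left behind. The same command can also extract specific pages, which helps when selecting pages from a PDF with a table of contents or other non-page data that would otherwise be carried over, and likely broken, in the extracted file, for example: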
qpdf --empty --pages in.pdf 1-5 -- out.pdf
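To clean a whole folder this way, the same --empty --pages pattern can run in a loop. This is a sketch under stated assumptions: bash, a hypothetical function name strip_nonpage_dir, and a separate output directory so the originals are never overwritten. The optional third argument is a page range applied to every file, defaulting to 1-z (qpdf's notation for first through last page).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of qpdf): rebuild each PDF in SRC from its pages only,
# writing the results to DEST. Usage: strip_nonpage_dir SRC DEST [PAGE_RANGE]
strip_nonpage_dir() {
  local src=$1 dest=$2 range=${3:-1-z}   # 1-z = first through last page
  local f
  if [[ $src -ef $dest ]]; then
    echo "strip_nonpage_dir: destination must differ from source" >&2
    return 2
  fi
  mkdir -p "$dest" || return 1
  for f in "$src"/*.pdf; do
    [[ -e $f ]] || continue              # the glob matched nothing
    # Start from an empty PDF so only the selected pages are copied across.
    qpdf --empty --pages "$f" "$range" -- "$dest/$(basename "$f")" || return 1
  done
}
```

For example, strip_nonpage_dir scans scans-clean 1-5 would keep the first five pages of every PDF in scans/. If a file is shorter than the requested range, qpdf should reject it, and the loop stops at that file.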
Splitting, merging, and extracting pages from a PDF can be an incredible pain with other software, but qpdf makes it simple. I cannot recommend it enough!