Whenever possible, I try to find and use FOSS tools in my academic life. FOSS stands for “free and open-source software,” where “free” refers to freedom and not simply to price (though FOSS usually costs nothing as well). While there are various levels of zealousness in the open-source community (as well as various debates about what constitutes freedom), I try to make use of FOSS not only for my own ideological reasons but because I think it is a better way to develop software. Without turning this into a lecture about the merits of FOSS, it is enough to say that I am always seeking out ways to incorporate it into my work.

A tool that has recently become indispensable to me is qpdf:

GitHub - qpdf/qpdf: QPDF: A content-preserving PDF document transformer

At one point or another in an academic career, you are likely to struggle to do something with a PDF. While I generally think standardization is useful because it spares us a constant battle over formats, the standardization of PDF as the go-to document format is disappointing. Adobe’s software is garbage for a variety of reasons, and many things you might want to do with a PDF require more pain than they ever should. Luckily, qpdf fixes a lot of this pain, assuming you are comfortable with command-line (CLI) software.

The goal of qpdf is to ease “content-preserving transformations” of PDF files. This becomes a lifesaver when I need to do things that should be simple, but are otherwise complex with Adobe Reader. While qpdf has all kinds of features, here are a few I find myself using all the time:

Simple page extraction

qpdf in.pdf --pages . PAGENUMBERS -- out.pdf

qpdf can parse a variety of inputs for the page numbers, including a single page (e.g., 1), a range of pages (e.g., 1-5), and a noncontiguous group of pages (e.g., 1-5,9,19-35). Because pages are written in the order they are listed, this flexibility also allows pages to be rearranged, as in:

qpdf in.pdf --pages . 15-20,1-14 -- out.pdf
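For repeated use, the extraction command can be wrapped in a small shell function. This is only a sketch: the function name extract_pages and the QPDF override variable are hypothetical names introduced here, not part of qpdf. The example ranges in the comments assume qpdf's range syntax, in which z stands for the last page, rN counts back from the end, and :odd or :even filters positions within a range.

```shell
#!/bin/sh
# Hypothetical helper (not part of qpdf): extract_pages IN.pdf RANGE OUT.pdf
# Set QPDF=echo to preview the command instead of running it.
extract_pages() {
    if [ "$#" -ne 3 ]; then
        echo "usage: extract_pages IN.pdf RANGE OUT.pdf" >&2
        return 2
    fi
    # "." tells qpdf to take the pages from the primary input file.
    "${QPDF:-qpdf}" "$1" --pages . "$2" -- "$3"
}

# Example calls (left commented out so nothing runs on load):
#   extract_pages in.pdf 1-5,9,19-35 out.pdf   # noncontiguous selection
#   extract_pages in.pdf 15-20,1-14 out.pdf    # rearrange pages
#   extract_pages in.pdf z-1 out.pdf           # every page, in reverse order
#   extract_pages in.pdf 1-z:odd out.pdf       # pages 1, 3, 5, ...
#   extract_pages in.pdf r3-r1 out.pdf         # the last three pages
```

Quoting each argument keeps file names with spaces intact, which is the main reason to prefer a function over retyping the command.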

Merge multiple PDFs

qpdf --empty --pages *.pdf -- out.pdf

Assuming you have a directory full of PDF files, the previous command will merge all of them into a single file, out.pdf.
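One thing to watch for: the shell expands *.pdf in sorted (lexical) order, so 10.pdf comes before 2.pdf, and if out.pdf already exists in the directory it will be picked up as an input too. The sketch below is one way to guard against the second problem. The function name merge_pdfs and the QPDF override variable are hypothetical, and the filtering is plain POSIX shell, not a qpdf feature.

```shell
#!/bin/sh
# Hypothetical helper (not part of qpdf): merge_pdfs OUT.pdf INPUT.pdf...
# Set QPDF=echo to preview the command instead of running it.
merge_pdfs() {
    out=$1
    shift
    # Rebuild the argument list, dropping the output file if the glob matched it.
    for f in "$@"; do
        shift
        [ "$f" = "$out" ] || set -- "$@" "$f"
    done
    if [ "$#" -eq 0 ]; then
        echo "merge_pdfs: no input files" >&2
        return 2
    fi
    # --empty starts from a blank document; pages come from the listed inputs in order.
    "${QPDF:-qpdf}" --empty --pages "$@" -- "$out"
}

# Example call (left commented out so nothing runs on load):
#   merge_pdfs out.pdf *.pdf
```

Zero-padding file names (01.pdf, 02.pdf, ...) is the simplest way to make the glob order match the intended page order.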

Remove non-page data from a PDF

qpdf --empty --pages in.pdf -- out.pdf

The same command can be extended to extract specific pages. This is helpful when selecting pages from a PDF with a table of contents or other non-page data that might otherwise be broken in the extracted file. For example:

qpdf --empty --pages in.pdf 1-5 -- out.pdf
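Since --pages accepts several file and range pairs in a row, the merge and extraction commands can be combined into one. A minimal sketch, assuming hypothetical file names and a hypothetical wrapper called combine_ranges (the QPDF override variable is also my own addition):

```shell
#!/bin/sh
# Hypothetical helper (not part of qpdf): combine_ranges OUT.pdf FILE RANGE [FILE RANGE ...]
# Set QPDF=echo to preview the command instead of running it.
combine_ranges() {
    # Require an output name followed by one or more complete FILE RANGE pairs.
    if [ "$#" -lt 3 ] || [ $(( ($# - 1) % 2 )) -ne 0 ]; then
        echo "usage: combine_ranges OUT.pdf FILE RANGE [FILE RANGE ...]" >&2
        return 2
    fi
    out=$1
    shift
    # Starting from --empty means no non-page data is carried over from any input.
    "${QPDF:-qpdf}" --empty --pages "$@" -- "$out"
}

# Example call (left commented out so nothing runs on load):
#   combine_ranges out.pdf a.pdf 1-5 b.pdf 2,4
```

The pages land in out.pdf in the order the pairs are given, so this also works for interleaving chapters from different files.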

Splitting, merging, and extracting pages from a PDF can be an incredible pain with other software, but qpdf makes it simple. Cannot recommend it enough!