Split PDF Files by Page with Python

March 15, 2024

Sometimes you need to split a large PDF document into individual pages. Whether you’re extracting specific pages for sharing or processing pages separately, this Python utility makes it straightforward.

The Solution

Using the PyPDF2 library, we can iterate through a PDF file and save each page as a separate document:

import os

import PyPDF2


def split_pdf(input_pdf_path, output_folder):
    # Create the output folder if it does not exist yet;
    # otherwise opening the output files below would fail
    os.makedirs(output_folder, exist_ok=True)

    # Open the PDF file in binary mode
    with open(input_pdf_path, 'rb') as input_file:
        # Create a PDF reader object
        pdf_reader = PyPDF2.PdfReader(input_file)

        # Iterate through each page in the PDF (page_num is 0-based)
        for page_num in range(len(pdf_reader.pages)):
            # Create a new PDF writer object holding only this page
            pdf_writer = PyPDF2.PdfWriter()
            pdf_writer.add_page(pdf_reader.pages[page_num])

            # Output PDF file name, numbered from 1
            output_pdf_path = os.path.join(output_folder, f"page_{page_num + 1}.pdf")

            # Write the page to a new PDF file
            with open(output_pdf_path, 'wb') as output_file:
                pdf_writer.write(output_file)


if __name__ == '__main__':
    # Example usage
    input_pdf_path = 'input.pdf'  # Path to your input PDF file
    output_folder = 'output_pages'  # Output folder where individual pages will be saved
    split_pdf(input_pdf_path, output_folder)

How It Works

  1. Open the PDF: We open the input file in binary mode and pass it to PyPDF2.PdfReader
  2. Iterate through pages: Loop over the page indices of pdf_reader.pages
  3. Create individual files: For each page, create a new PdfWriter object and add that single page to it
  4. Save separately: Write each page to its own file with a numbered filename

Installation

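Install the required library before running the script. The PdfReader, PdfWriter and add_page names used above are the ones found in PyPDF2 3.x, so an older installed version may need upgrading: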

pip install PyPDF2

Usage

  1. Place your PDF file in the same directory as the script (or provide the full path)
  2. Create an output folder for the split pages
  3. Run the script
  4. Find your individual page files in the output folder

The output files will be named page_1.pdf, page_2.pdf, etc.
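
One thing to keep in mind with this naming: many file listings sort names as text, so page_10.pdf lands before page_2.pdf. If that matters for your workflow, the names can be zero-padded. The snippet below is a minimal sketch of that variation, not part of the script above; the helper name page_output_path and its parameters are made up for illustration.

```python
from pathlib import Path


def page_output_path(output_folder, page_index, total_pages):
    """Return a zero-padded output path for a 0-based page index.

    Hypothetical helper: the padding width is the number of digits
    in the highest page number (e.g. 3 digits for a 120-page PDF).
    """
    width = len(str(total_pages))
    return Path(output_folder) / f"page_{page_index + 1:0{width}d}.pdf"


# First and last file names for a 120-page document
print(page_output_path("output_pages", 0, 120).name)    # page_001.pdf
print(page_output_path("output_pages", 119, 120).name)  # page_120.pdf
```

To use it, the line that builds output_pdf_path in split_pdf could call this helper with len(pdf_reader.pages) as the total.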

Use Cases

This utility is useful for:

  • Extracting specific pages from large documents
  • Preparing individual pages for different recipients
  • Processing pages separately for OCR or analysis
  • Creating page-by-page backups of important documents
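
For the first use case, you may not want every page as its own file. Here is a rough sketch of pulling a chosen set of pages into one new PDF instead. It reuses the same PyPDF2 calls as the script above; the function names parse_page_spec and extract_pages, and the "1-3,5" spec format, are assumptions made for this example.

```python
def parse_page_spec(spec, total_pages):
    """Turn a 1-based spec such as "1-3,5" into sorted 0-based page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(
                f"Invalid page range {part!r} for a {total_pages}-page PDF"
            )
        indices.update(range(start - 1, end))
    return sorted(indices)


def extract_pages(input_pdf_path, output_pdf_path, spec):
    """Copy the pages named by spec into a single new PDF."""
    # Imported here so parse_page_spec can be tried without PyPDF2 installed
    import PyPDF2

    with open(input_pdf_path, "rb") as input_file:
        pdf_reader = PyPDF2.PdfReader(input_file)
        pdf_writer = PyPDF2.PdfWriter()
        for index in parse_page_spec(spec, len(pdf_reader.pages)):
            pdf_writer.add_page(pdf_reader.pages[index])
        # Write while the input file is still open
        with open(output_pdf_path, "wb") as output_file:
            pdf_writer.write(output_file)


print(parse_page_spec("1-3,5", total_pages=10))  # prints [0, 1, 2, 4]
```

A call such as extract_pages('input.pdf', 'selection.pdf', '1-3,5') would then write pages 1, 2, 3 and 5 into selection.pdf (both file names are placeholders).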

Simple, effective, and ready to use!


Written by Mykyta Khmel. I write about things I build and problems I solve - from scaling self-service portals to automating DORA metrics. Sometimes 3D graphics, always pragmatic. Find me on GitHub and Threads.