How to Convert HTML to PDF with xhtml2pdf

Introduction

xhtml2pdf is a Python library that converts HTML and CSS into PDF documents, which makes it a practical choice for Python developers who need to generate PDFs from web content. This article walks through installing and using xhtml2pdf, with short code examples.

Why xhtml2pdf?

Ease of Use: Simple and intuitive API, making it accessible for beginners.
CSS Support: Supports CSS 2.1 and parts of CSS 3 for styling PDF documents.
Flexibility: Works well for both simple and moderately complex PDF generation tasks.

Getting Started with xhtml2pdf

Step 1: Install xhtml2pdf

First, you need to install xhtml2pdf. You can do this easily using pip:

pip install xhtml2pdf
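
To confirm that the installation worked, you can ask Python's standard importlib.metadata module which version is installed. This is a small sketch; the helper name installed_version is illustrative and not part of xhtml2pdf.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("xhtml2pdf")
print(f"xhtml2pdf {found}" if found else "xhtml2pdf is not installed")
```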

Step 2: Convert HTML to PDF

With xhtml2pdf, you can convert a simple HTML string to a PDF file or read from an HTML file. Here’s an example of converting an HTML string to a PDF:

from xhtml2pdf import pisa
from io import BytesIO

# Define your HTML content
html_content = """
<html>
<head>
    <title>Convert HTML to PDF</title>
</head>
<body>
    <h1>Welcome to xhtml2pdf!</h1>
    <p>This is a simple HTML to PDF conversion example.</p>
</body>
</html>
"""

# Create a BytesIO buffer for the PDF
output = BytesIO()

# Convert HTML content to PDF
pdf_status = pisa.CreatePDF(BytesIO(html_content.encode('utf-8')), dest=output)

# Check if there was an error
if pdf_status.err:
    print("An error occurred.")
else:
    # Save the PDF to a file
    with open("output.pdf", "wb") as pdf_file:
        pdf_file.write(output.getvalue())
    print("PDF created successfully.")

In this example, pisa.CreatePDF converts the HTML content into a PDF and writes the result into a BytesIO buffer, which is then saved to a file. The returned status object's err attribute is nonzero when the conversion reported errors.
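
The HTML passed to pisa.CreatePDF can carry its own stylesheet, which is where the CSS support mentioned earlier comes in. The following is a minimal sketch, assuming xhtml2pdf honors an @page rule for paper size and margins along with basic text properties. The names STYLESHEET, build_styled_html, and render_pdf are illustrative helpers, not part of xhtml2pdf.

```python
from html import escape

# Illustrative stylesheet: page size and margins via @page, plus basic text styling.
STYLESHEET = """
@page {
    size: a4 portrait;
    margin: 2cm;
}
h1 { color: #1a4d80; font-size: 20pt; }
p { font-size: 11pt; line-height: 1.4; }
"""


def build_styled_html(title: str, paragraph: str) -> str:
    """Return a complete HTML document with the stylesheet embedded in <head>."""
    return (
        "<html><head>"
        f"<style>{STYLESHEET}</style>"
        "</head><body>"
        f"<h1>{escape(title)}</h1>"
        f"<p>{escape(paragraph)}</p>"
        "</body></html>"
    )


def render_pdf(html: str, output_path: str) -> bool:
    """Write the HTML to output_path as a PDF; return True if no errors were reported."""
    # Imported here so build_styled_html stays usable without xhtml2pdf installed.
    from xhtml2pdf import pisa

    with open(output_path, "wb") as pdf_file:
        status = pisa.CreatePDF(html, dest=pdf_file)
    return not status.err
```

Calling render_pdf(build_styled_html("Quarterly Report", "Revenue & costs"), "styled.pdf") would then write a styled A4 page. Escaping the title and paragraph keeps characters such as & and < from breaking the markup.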

Converting HTML Files to PDF

To convert an HTML file to a PDF, you can modify the above example to read the HTML content from a file:

from xhtml2pdf import pisa

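# Convert an HTML file to a PDF file; return True on success
def convert_html_to_pdf(source_html, output_filename):
    # Read the HTML as UTF-8 (assumed here) so the result does not depend on the platform default encoding
    with open(source_html, 'r', encoding='utf-8') as html_file:
        html_content = html_file.read()

    # Write the PDF in binary mode
    with open(output_filename, 'wb') as pdf_file:
        pisa_status = pisa.CreatePDF(html_content, dest=pdf_file)

    # pisa_status.err is nonzero when the conversion reported errors
    return not pisa_status.err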

# Convert an HTML file to a PDF file
if convert_html_to_pdf('input.html', 'output.pdf'):
    print("PDF created successfully.")
else:
    print("An error occurred.")

This function reads the HTML content from source_html, passes it to pisa.CreatePDF, and writes the resulting PDF to output_filename. It returns True when the conversion reports no errors.
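
The same pattern extends to a whole folder of HTML files. The sketch below assumes the files are UTF-8 encoded and sit directly in one directory; pdf_path_for and convert_directory are hypothetical helper names introduced for illustration.

```python
from pathlib import Path


def pdf_path_for(html_path: Path, output_dir: Path) -> Path:
    """Map e.g. pages/report.html to <output_dir>/report.pdf."""
    return output_dir / html_path.with_suffix(".pdf").name


def convert_directory(source_dir: str, output_dir: str) -> dict:
    """Convert every *.html file in source_dir; return {filename: succeeded}."""
    # Imported here so the path helper above can be used without xhtml2pdf installed.
    from xhtml2pdf import pisa

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    results = {}
    for html_path in sorted(Path(source_dir).glob("*.html")):
        html_content = html_path.read_text(encoding="utf-8")
        with open(pdf_path_for(html_path, out), "wb") as pdf_file:
            status = pisa.CreatePDF(html_content, dest=pdf_file)
        # status.err is nonzero when the conversion reported errors
        results[html_path.name] = not status.err
    return results
```

Returning a per-file result instead of stopping at the first failure lets the caller decide whether one bad document should abort the whole batch.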

Other Python Libraries

Other Python libraries can also convert HTML to PDF. You can find more information about them in the article How To Convert HTML to PDF with Python.

Conclusion

xhtml2pdf provides a Pythonic way to convert HTML to PDF, offering a balance between ease of use and flexibility. Whether you’re generating reports, invoices, or any other PDF documents from HTML content, xhtml2pdf is a capable tool that fits seamlessly into Python projects.