How To Convert HTML to PDF with Python (2025 Update)

Introduction

Creating PDFs programmatically in Python is a common task, and although there’s no single correct method for doing so, certain approaches can be more efficient and faster than others.
This article explores some of the most efficient methods for converting HTML to PDF using Python.

Why create PDF from HTML?

Here are some reasons why generating PDFs from HTML is a good idea:

Market Norm: HTML serves as the internet’s foundation, providing a universally accepted and widespread method for organizing content.
Established Technology: Open standards ensure HTML compatibility across an extensive array of tools and technologies.
Flexibility: The availability of numerous tools eases the process of conversion.
CSS Customization: Employ CSS for intricate styling within PDF documents.
Media Integration: HTML offers a solid foundation for creating documents enriched with multimedia elements.

To sum up, changing HTML into PDFs brings together the best aspects of both: HTML’s flexibility, accessibility, and interactivity, and the portability and standardization of PDFs.

HTML to PDF using Python Libraries

Many Python libraries exist for creating PDFs from HTML content, and some of these are detailed below.

1. Pyppeteer

Pyppeteer is a Python port of Puppeteer, providing a high-level API to control Chrome/Chromium over the DevTools Protocol. It can be used for web scraping, taking screenshots, and generating PDFs from HTML. For a detailed guide on using Pyppeteer for PDF and image generation, see this article.

Let’s explore how we can use pyppeteer to create PDFs from HTML.

Step 1: Install Pyppeteer

First, we need to install pyppeteer with the following command:

pip install pyppeteer

Step 2: Python code to convert HTML to PDF with Pyppeteer

import asyncio
from pyppeteer import launch

async def create_pdf(url, pdf_path):
  browser = await launch()
  page = await browser.newPage()

  await page.goto(url)
  await page.pdf({'path': pdf_path, 'format': 'A4'})
  await browser.close()

# Run create_pdf
asyncio.run(create_pdf('https://google.com', 'google_example.pdf'))

In the create_pdf function, the following steps are taken:

A new browser instance is launched in headless mode.
A new page is opened in the browser, and it waits until the page is ready.
The browser navigates to the URL provided in the url argument and waits for the page to fully load.
A PDF of the webpage is generated and saved to the location specified in pdf_path, with the format (dimensions) set to A4.
The headless browser is then closed.
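One thing the steps above do not cover is failure: if navigation or rendering raises, browser.close() is never reached and the Chromium process can be left running. Below is a minimal sketch of a more defensive variant. The launch_browser parameter is an assumption added here so the function can be exercised with a stand-in browser, and the waitUntil and printBackground settings are optional Pyppeteer options you may not need.

```python
import asyncio

async def create_pdf_safely(url, pdf_path, launch_browser=None):
    # launch_browser is a hypothetical hook for testing; by default the
    # real pyppeteer.launch is imported lazily and used.
    if launch_browser is None:
        from pyppeteer import launch as launch_browser
    browser = await launch_browser()
    try:
        page = await browser.newPage()
        # Wait until the network has gone quiet, which helps with pages
        # that load content through JavaScript.
        await page.goto(url, {'waitUntil': 'networkidle0'})
        # printBackground keeps CSS background colors and images.
        await page.pdf({'path': pdf_path, 'format': 'A4', 'printBackground': True})
    finally:
        # Always close the browser, even if goto() or pdf() raised.
        await browser.close()
```

It is called the same way as before, for example asyncio.run(create_pdf_safely('https://example.com', 'example.pdf')).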

To generate a PDF from custom HTML instead of a URL, set the page content directly with page.setContent:

import asyncio
from pyppeteer import launch

async def create_pdf_from_html(html, pdf_path):
  browser = await launch()
  page = await browser.newPage()

  await page.setContent(html)
  await page.pdf({'path': pdf_path, 'format': 'A4'})
  await browser.close()

# HTML content
html = '''
<html>
  <head>
      <title>PDF Example</title>
  </head>

  <body>
      <h1>Hey, this will turn into a PDF!</h1>
  </body>
</html>
'''

# Run create_pdf_from_html
asyncio.run(create_pdf_from_html(html, 'from_html.pdf'))

The create_pdf_from_html function in the Pyppeteer example above demonstrates how to generate PDFs using custom HTML content. Here’s what happens in this approach:

A new headless browser instance is launched.
A new page is opened in the headless browser, and it waits until it is ready.
The content of the page is explicitly set to our custom HTML content.
A PDF of the webpage is generated and saved to the location specified in pdf_path, with the format set to ‘A4’.
The headless browser is closed.
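In practice the custom HTML is usually assembled from application data rather than hard-coded. The sketch below uses only the standard library; the build_report_html name and the (name, value) row format are illustrative assumptions. Values are escaped because unescaped characters such as < or & could break the markup.

```python
from html import escape

def build_report_html(title, rows):
    """Return a complete HTML document with one table row per (name, value) pair."""
    body_rows = "\n".join(
        f"      <tr><td>{escape(str(name))}</td><td>{escape(str(value))}</td></tr>"
        for name, value in rows
    )
    return f"""<html>
  <head>
    <title>{escape(title)}</title>
  </head>
  <body>
    <h1>{escape(title)}</h1>
    <table>
{body_rows}
    </table>
  </body>
</html>"""

html = build_report_html("Q1 <Sales>", [("Widgets", 120), ("Gadgets & Co", 75)])
```

The resulting string can be passed to create_pdf_from_html in place of the hard-coded html variable.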

2. python-pdfkit

python-pdfkit is a Python wrapper for the wkhtmltopdf utility, which uses Webkit to convert HTML to PDF. For a detailed guide on using python-pdfkit, see this article.

Step 1: Install python-pdfkit

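First, install python-pdfkit with pip. The package is only a wrapper, so the wkhtmltopdf binary must also be installed on your system and available on your PATH: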

pip install pdfkit

Step 2: Python code to convert website URL to PDF with python-pdfkit

import pdfkit

# URL to fetch
url = 'https://engadget.com'

# PDF path to save
pdf_path = 'example.pdf'

pdfkit.from_url(url, pdf_path)

Like Pyppeteer, pdfkit can generate a PDF directly from a website URL. A single call to pdfkit.from_url is all it takes.

Step 3: Generating PDF from custom HTML content

import pdfkit

# HTML content
html = '''
<html>
  <head>
      <title>PDF Example</title>
  </head>

  <body>
      <h1>Hey, this will turn into a PDF!</h1>
  </body>
</html>
'''

# PDF path to save
pdf_path = 'example.pdf'

# Create PDF
pdfkit.from_string(html, pdf_path)

To generate a PDF from custom HTML content using python-pdfkit, you simply need to use pdfkit.from_string and provide the HTML content along with the path for the PDF file.
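wkhtmltopdf accepts command-line options for page size, margins, and encoding, and python-pdfkit forwards them from an options dictionary. The sketch below shows a few commonly used ones. The helper names are made up for illustration, and the values are examples rather than recommendations.

```python
def build_pdfkit_options(page_size="A4", margin="15mm"):
    """Return wkhtmltopdf options in the dict form python-pdfkit expects."""
    return {
        "page-size": page_size,
        "margin-top": margin,
        "margin-right": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "encoding": "UTF-8",
    }

def html_to_pdf_with_options(html, pdf_path, **option_overrides):
    # Imported here so the options helper can be used without pdfkit installed.
    import pdfkit
    options = build_pdfkit_options(**option_overrides)
    pdfkit.from_string(html, pdf_path, options=options)
```

For example, html_to_pdf_with_options(html, 'example.pdf', page_size='Letter') would render the same HTML on US Letter paper.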

3. xhtml2pdf

xhtml2pdf is a pure-Python library that converts HTML to PDF on top of ReportLab, without driving a browser. For a detailed guide on using xhtml2pdf, see this article. Let’s take a look at xhtml2pdf in use.

Step 1: Install xhtml2pdf

To install xhtml2pdf, use the following command:

pip install xhtml2pdf requests

Step 2: Generate PDF from a website URL with xhtml2pdf:

It’s important to note that xhtml2pdf doesn’t have a built-in feature to fetch URLs. However, we can use the requests library in Python to retrieve content from a URL.

from xhtml2pdf import pisa
import requests

def url_to_pdf(url, pdf_path):
  # Retrieve the HTML content from the URL
  response = requests.get(url)
  if response.status_code != 200:
      print(f"Failed to retrieve from URL: {url}")
      return False

  html_content = response.text

  # Create PDF
  with open(pdf_path, "wb") as pdf_file:
      pisa_status = pisa.CreatePDF(html_content, dest=pdf_file)
  return not pisa_status.err

# URL to retrieve
url_to_retrieve = "https://bbc.com"

# PDF path to save
pdf_path = "bbc.pdf"

# Create PDF
url_to_pdf(url_to_retrieve, pdf_path)

In the url_to_pdf method of the code, the following steps are performed:

We use the requests library to fetch the webpage content from the specified URL.
After receiving the content, we extract the text portion using response.text.
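Next, to create the PDF, we call pisa.CreatePDF with the HTML content and the output file object opened in binary write mode. The function returns a status object whose err attribute is non-zero if the conversion failed.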

Step 3: Create a PDF from custom HTML content using xhtml2pdf:

from xhtml2pdf import pisa

def convert_html_to_pdf(html_string, pdf_path):
  with open(pdf_path, "wb") as pdf_file:
    pisa_status = pisa.CreatePDF(html_string, dest=pdf_file)
  return not pisa_status.err

# HTML content
html = '''
<html>
  <head>
      <title>PDF Example</title>
  </head>

  <body>
      <h1>Hey, this will turn into a PDF!</h1>
  </body>
</html>
'''

# Create PDF
pdf_path = "example.pdf"
convert_html_to_pdf(html, pdf_path)

Creating a PDF from custom HTML content is similar to the process for URLs, with just one key difference: instead of passing a URL, we directly provide the actual HTML content to our creation method. The method then uses this custom HTML content to create the PDF.

4. Playwright

Playwright is a modern library for browser automation. It drives Chromium, Firefox, and WebKit across platforms and has bindings for several languages. Note that PDF generation (page.pdf) is only available in Chromium, which is why the examples below launch p.chromium.

To use Python as a converter for HTML to PDF with Playwright, follow these steps:

Step 1: Install Playwright:

pip install playwright
playwright install

Step 2: Generate PDF from Website URL:

import asyncio
from playwright.async_api import async_playwright

async def url_to_pdf(url, output_path):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url)
        await page.pdf(path=output_path)
        await browser.close()

# Example usage
url = 'https://example.com'
output_path = 'example_url.pdf'
asyncio.run(url_to_pdf(url, output_path))

Step 3: Generate PDF from Custom HTML Content:

import asyncio
from playwright.async_api import async_playwright

async def html_to_pdf(html_content, output_path):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.set_content(html_content)
        await page.pdf(path=output_path)
        await browser.close()

html_content = '''
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Sample HTML</title>
</head>
<body>
    <h1>Hello, World!</h1>
    <p>This is a sample HTML content to be converted to PDF.</p>
</body>
</html>
'''
output_path = 'custom_html.pdf'
asyncio.run(html_to_pdf(html_content, output_path))
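A common follow-up requirement is page numbers. Playwright's page.pdf accepts header and footer templates in which Chromium fills in elements with the classes pageNumber and totalPages. The following is a sketch under a few assumptions: the function name, the template styling, and the margin values are illustrative, and an explicit font size is set because the default footer text can be too small to see.

```python
import asyncio

FOOTER_TEMPLATE = (
    '<div style="font-size:9px; width:100%; text-align:center;">'
    'Page <span class="pageNumber"></span> of <span class="totalPages"></span>'
    '</div>'
)

async def html_to_pdf_with_footer(html_content, output_path):
    # Imported lazily so this module loads even where Playwright is absent.
    from playwright.async_api import async_playwright
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.set_content(html_content)
            await page.pdf(
                path=output_path,
                format="A4",
                print_background=True,
                display_header_footer=True,
                header_template="<span></span>",  # empty header
                footer_template=FOOTER_TEMPLATE,
                # Leave room at the bottom so the footer is not clipped.
                margin={"top": "15mm", "bottom": "20mm"},
            )
        finally:
            await browser.close()
```

It is run like the earlier examples: asyncio.run(html_to_pdf_with_footer(html_content, 'with_footer.pdf')).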

5. WeasyPrint

WeasyPrint is a visual rendering engine that follows the W3C specifications for HTML and CSS. It’s known for its excellent CSS support and ability to generate high-quality PDFs without a browser. It does rely on system libraries such as Pango, which is why installation can take extra steps on some platforms.

Step 1: Install WeasyPrint:

pip install weasyprint

Step 2: Generate PDF from Website URL:

from weasyprint import HTML

def url_to_pdf(url, output_path):
    HTML(url=url).write_pdf(output_path)

# Example usage
url = 'https://example.com'
output_path = 'example_url.pdf'
url_to_pdf(url, output_path)

Step 3: Generate PDF from Custom HTML Content:

from weasyprint import HTML

def html_to_pdf(html_content, output_path):
    HTML(string=html_content).write_pdf(output_path)

html_content = '''
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Sample HTML</title>
    <style>
        body { font-family: Arial, sans-serif; }
        h1 { color: navy; }
    </style>
</head>
<body>
    <h1>Hello, World!</h1>
    <p>This is a sample HTML content to be converted to PDF.</p>
</body>
</html>
'''
output_path = 'custom_html.pdf'
html_to_pdf(html_content, output_path)
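Because WeasyPrint implements CSS paged media, page size, margins, and page numbers can be controlled from the stylesheet with an @page rule rather than from Python. A small sketch follows; the add_page_rules and body_to_pdf names are assumptions for illustration, and the size and margin defaults are arbitrary.

```python
def add_page_rules(html_body, size="A4", margin="2cm"):
    """Wrap body markup in a document whose CSS sets page size, margins and page numbers."""
    css = f"""
      @page {{
        size: {size};
        margin: {margin};
        @bottom-center {{ content: "Page " counter(page) " of " counter(pages); }}
      }}
    """
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="UTF-8">'
        f"<style>{css}</style></head><body>{html_body}</body></html>"
    )

def body_to_pdf(html_body, output_path):
    # Imported lazily so add_page_rules can be used on its own.
    from weasyprint import HTML
    HTML(string=add_page_rules(html_body)).write_pdf(output_path)
```

Calling body_to_pdf('<h1>Hello, World!</h1>', 'paged.pdf') would produce an A4 document with a centered page counter in the bottom margin.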

6. Selenium

Selenium is a powerful tool for web automation that can also be used for PDF generation. It’s particularly useful when dealing with dynamic web pages that require JavaScript execution.

Step 1: Install Selenium:

pip install selenium webdriver-manager

Step 2: Generate PDF from Website URL:

import base64
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

def url_to_pdf(url, output_path):
    chrome_options = Options()
    chrome_options.add_argument('--headless')

    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=chrome_options)

    try:
        # get() returns once the page's load event has fired
        driver.get(url)
        # print_page() (Selenium 4+) returns the PDF as a base64-encoded string
        pdf_base64 = driver.print_page()
        with open(output_path, 'wb') as f:
            f.write(base64.b64decode(pdf_base64))
    finally:
        driver.quit()

# Example usage
url = 'https://example.com'
output_path = 'example_url.pdf'
url_to_pdf(url, output_path)

Step 3: Generate PDF from Custom HTML Content:

import base64
import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

def html_to_pdf(html_content, output_path):
    chrome_options = Options()
    chrome_options.add_argument('--headless')

    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=chrome_options)

    try:
        # Create a temporary HTML file
        with open('temp.html', 'w', encoding='utf-8') as f:
            f.write(html_content)

        # Load the temporary file
        driver.get('file://' + os.path.abspath('temp.html'))
        # print_page() (Selenium 4+) returns the PDF as a base64-encoded string
        pdf_base64 = driver.print_page()
        with open(output_path, 'wb') as f:
            f.write(base64.b64decode(pdf_base64))
    finally:
        driver.quit()
        # Clean up temporary file
        if os.path.exists('temp.html'):
            os.remove('temp.html')

html_content = '''
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Sample HTML</title>
    <style>
        body { font-family: Arial, sans-serif; }
        h1 { color: navy; }
    </style>
</head>
<body>
    <h1>Hello, World!</h1>
    <p>This is a sample HTML content to be converted to PDF.</p>
</body>
</html>
'''
output_path = 'custom_html.pdf'
html_to_pdf(html_content, output_path)
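Two details in this approach are easy to get wrong: building a valid file:// URL for the local HTML file, and, if you use Selenium 4's driver.print_page() (which returns the PDF as a base64 string), decoding that string to bytes. The sketch below covers both with the standard library only. The helper names are illustrative, and a uniquely named temporary file is used so that concurrent runs do not overwrite each other's temp.html.

```python
import base64
import tempfile
from pathlib import Path

def write_temp_html(html_content):
    """Write HTML to a uniquely named temp file and return (path, file:// URL)."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as f:
        f.write(html_content)
    path = Path(f.name)
    # as_uri() handles Windows drive letters and percent-encoding.
    return path, path.as_uri()

def save_base64_pdf(pdf_base64, output_path):
    """Decode a base64-encoded PDF (as returned by driver.print_page()) to disk."""
    Path(output_path).write_bytes(base64.b64decode(pdf_base64))
```

The returned URL can be passed to driver.get(), and the temporary file should be deleted with path.unlink() once the PDF has been written.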

Comparing all the options

While each of these tools primarily focuses on converting HTML to PDF, they each offer unique features and methods.

Below, you’ll find a detailed table comparing these tools to help you select the one that best fits your requirements.

Library	Pros	Cons	Best For	Performance	Memory Usage	CSS Support	JavaScript Support	Unicode Support	Docker Support	Cloud Compatible	Headless Mode	WebSocket Support	Mobile Emulation	Price	Active Development
Pyppeteer	Provides a full browser environment with comprehensive JS and CSS support. Easy to use API.	Performance may vary. Requires Chrome/Chromium installation. Higher resource usage.	Handling intricate web content, single-page applications (SPAs), and dynamic JS content.	Medium	High	Excellent	Excellent	Yes	Yes	Yes	Yes	Yes	Yes	Free	Limited (upstream repo marked unmaintained)
xhtml2pdf	Pure Python easy setup. Good for simpler HTML/CSS documents. Low resource usage.	Limited JavaScript and CSS support. May struggle with complex layouts.	Simple HTML/CSS to PDF conversion tasks where JavaScript rendering is not needed.	Good	Low	Basic	No	Yes	Yes	Yes	N/A	No	No	Free	Moderate
python-pdfkit	Fast native conversion. Good CSS support via wkhtmltopdf. Reliable output.	Limited JavaScript support. Requires wkhtmltopdf installation.	Various HTML to PDF tasks requiring good CSS rendering but limited JavaScript execution.	Good	Medium	Good	Limited	Yes	Yes	Yes	Yes	No	No	Free	Active
Playwright	Cross-browser support, powerful automation capabilities, good performance. Modern API.	Requires installation of browser binaries. Higher resource usage.	Comprehensive testing and rendering of complex web pages, including those with dynamic content.	Good	High	Excellent	Excellent	Yes	Yes	Yes	Yes	Yes	Yes	Free	Very Active
WeasyPrint	Excellent CSS support. No external dependencies. Clean Python API.	No JavaScript support. Installation can be complex on some systems.	Static HTML/CSS documents requiring precise CSS rendering and layout control.	Good	Medium	Excellent	No	Yes	Yes	Yes	N/A	No	No	Free	Active
Selenium	Extensive browser support, mature ecosystem, good for complex web pages.	Slower than other options, requires browser installation.	Complex web applications requiring full browser automation and testing.	Medium	High	Excellent	Excellent	Yes	Yes	Yes	Yes	Yes	Yes	Free	Active

In summary, the choice between Pyppeteer, xhtml2pdf, python-pdfkit, Playwright, WeasyPrint, and Selenium depends on your project’s specific requirements.

Pyppeteer shines when dealing with dynamic content, thanks to its complete browser automation features. On the other hand, xhtml2pdf provides a simple Python-focused solution for basic conversions.

python-pdfkit, which builds upon wkhtmltopdf, sits as a flexible option in between. Developers should consider the features, setup intricacies, and performance of each library in relation to their project’s needs to make the right decision.

Playwright stands out for its cross-browser capabilities and powerful automation features, making it ideal for rendering complex and dynamic web pages.

WeasyPrint is excellent for static HTML/CSS documents requiring precise CSS rendering and layout control, but it lacks JavaScript support and can be complex to install on some systems.

Selenium is particularly useful for complex web pages that require JavaScript execution, but it’s slower than other options and requires browser installation.
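The recommendations above can be condensed into a rough rule of thumb. The function below is only one reading of the comparison, not a definitive selector: its name and parameters are invented for illustration, and it leaves out Pyppeteer and python-pdfkit, which overlap with Playwright and WeasyPrint respectively in the table.

```python
def suggest_library(needs_javascript, precise_css=False, uses_selenium_already=False):
    """Suggest a starting point based on the trade-offs summarized above."""
    if needs_javascript:
        # Browser-based tools are the only ones here that execute JavaScript.
        return "Selenium" if uses_selenium_already else "Playwright"
    if precise_css:
        # Static content with demanding layout requirements.
        return "WeasyPrint"
    # Simple static HTML/CSS with minimal setup.
    return "xhtml2pdf"
```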

Why Choose a Managed Template-Based Solution?

While the libraries we’ve discussed are powerful, they come with significant operational overhead:

Infrastructure Management
- Need to maintain browser installations
- Handle system dependencies
- Manage server resources
- Deal with scaling issues
Development Complexity
- Write and maintain error handling
- Implement retry mechanisms
- Handle browser crashes
- Manage memory usage
- Deal with timeouts
Production Challenges
- Ensure consistent rendering across environments
- Handle high concurrency
- Manage file storage
- Implement caching strategies
- Monitor performance
Maintenance Burden
- Keep dependencies updated
- Handle security patches
- Monitor system health
- Deal with browser updates
- Maintain compatibility
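
To give a sense of what "implement retry mechanisms" means in practice, here is a minimal sketch of a retry wrapper with exponential backoff. The with_retries name, its parameters, and the delay values are assumptions for illustration; production code would usually catch narrower exception types and add logging.

```python
import time

def with_retries(convert, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call convert() until it succeeds, waiting longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return convert()
        except Exception as error:  # in real code, catch narrower exception types
            last_error = error
            if attempt < attempts - 1:
                # Delay doubles each time: base_delay, 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Any of the converters shown earlier can be wrapped, for example with_retries(lambda: url_to_pdf(url, output_path)).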

Using Templated for Template-Based Generation

Templated provides a managed solution that eliminates these operational challenges. It’s a visual template editor and API that allows you to create and manage templates for generating images and PDFs. Here’s why it’s a better choice for production environments:

1. Visual Template Editor

Templated offers a powerful visual editor where you can:

Design templates with a drag-and-drop interface
Add text, images, and shapes
Create dynamic layers
Preview changes in real-time
Version control your templates
Collaborate with team members

2. Simple API Integration

import queue
from concurrent.futures import ThreadPoolExecutor

def batch_generate_pdfs(generate_pdf, html_contents, output_paths, max_workers=4):
    # generate_pdf(html, output_path) is whatever conversion function you use;
    # it is passed in because it is not defined in this snippet.
    results = queue.Queue()

    def generate_single_pdf(html, output_path):
        try:
            result = generate_pdf(html, output_path)
            results.put(("success", output_path, result))
        except Exception as e:
            results.put(("error", output_path, str(e)))

    # Leaving the with-block waits for all submitted tasks to finish
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for html, path in zip(html_contents, output_paths):
            executor.submit(generate_single_pdf, html, path)

    return list(results.queue)

3. Key Benefits

No Infrastructure Management
- No need to install or maintain browsers
- No server provisioning required
- Automatic scaling
- Global CDN delivery
Reliable Performance
- 99.9% uptime guarantee
- Automatic retries
- Built-in error handling
- Consistent rendering
Advanced Features
- Visual template editor
- Version control
- Real-time preview
- Batch processing
- Webhook notifications

Getting Started with Templated

Create an Account
- Sign up at app.templated.io/signup
- Get your API key from the dashboard
Design Your Template
- Use the visual editor to create your template
- Add dynamic layers for variable content
- Preview and test your template
- Save and version your template
Integrate the API
- Use the code examples above
- Start with simple image generation
- Gradually add more features
Monitor Usage
- Track API calls
- Monitor performance
- Set up alerts

Conclusion

While Python libraries for PDF generation are powerful, they require significant development and operational effort. Templated provides a managed solution that eliminates these challenges, allowing you to focus on your core business logic.

To get started with Templated:

Sign up for a free account
Explore the Template Gallery
Check out the API Documentation
Start generating images in minutes!

Conclusion

PDF generation is now a standard part of many business applications, and it shouldn’t be a source of stress for developers.

We’ve explored how to use third-party libraries for straightforward PDF generation. However, for more complex scenarios like template management, Templated offers a seamless solution through simple API calls to generate PDFs.

To get started, sign up for a free account and begin automating your PDFs today!