· tutorials · 13 min read

How To Convert HTML to PDF with Python (2025 Update)

Learn different approaches on how to convert HTML to PDF and pick the one best for your needs

Introduction

We often need to create PDFs from dynamic content. While there is no single right way to generate PDFs, some approaches are more efficient and quicker to build than others.

Previously, we had to write all the boilerplate code to generate PDFs in our applications. However, now we have many great libraries and tools that can help us quickly implement this feature.

The most important part of generating PDFs is the input data. The most common and useful approach is to generate PDFs from HTML content or based on a website URL.

In this article, we will look into some approaches that we can take to generate PDFs from HTML using Python.

TL;DR: We provide a robust template-based solution designed for seamless PDF generation with popular programming languages like Python.

Why generate PDF from HTML?

Before we move on to the libraries, first let’s see why we prefer HTML as input data for generating PDFs. Some of the reasons are as follows:

  1. Open and Mature Technology: HTML is an open standard, which ensures that tools and technologies built around it are widely available and well-understood. Its maturity also means that most of the challenges and quirks are well-documented, making troubleshooting easier.
  2. Cost-effective: There are a plethora of tools, libraries, and APIs available (both free and paid) that can convert HTML to PDF, reducing the need for specialized software for PDF creation.
  3. Embed Multimedia: HTML supports the embedding of multimedia such as images, videos, and audio. Although not all of these can be directly translated into a PDF, having a source in HTML provides options for creating rich, multimedia-enhanced documents.
  4. Styling with CSS: Cascading Style Sheets (CSS) provide powerful styling options for HTML content, allowing for branding, theming, and visual consistency. These can then be reflected in the resulting PDF.
  5. Easy to Learn and Use: Learning the basics of HTML can be done quickly, making it accessible for many users to create content.

In summary, generating PDFs from HTML combines the best of both worlds: the flexibility, accessibility, and interactivity of HTML with the portability and standardization of PDFs.

HTML to PDF using Python Libraries

Python has many libraries that can generate PDFs from HTML content. Some of them are covered below.

When converting HTML to PDF in Python, we want libraries that preserve the formatting of the source as closely as possible. The libraries below all aim to do that, although, as the comparison table later in this article shows, they differ in how much CSS and JavaScript they support.

1. Pyppeteer

Pyppeteer is a Python port of the Node library Puppeteer, which provides a high-level API over the Chrome DevTools Protocol. It is like running a browser from your code that can do most of the things your regular browser can do. Pyppeteer can be used to scrape data from websites, take screenshots of a website, and much more. Let’s see how we can use Pyppeteer to generate PDFs from HTML.

First, we need to install pyppeteer with the following command:

pip install pyppeteer

Generate PDF from a website URL

import asyncio
from pyppeteer import launch


async def generate_pdf(url, pdf_path):
    # Start a headless browser instance
    browser = await launch()
    # Open a new tab
    page = await browser.newPage()
    # Load the page at the given URL
    await page.goto(url)
    # Print the page to an A4 PDF saved at pdf_path
    await page.pdf({'path': pdf_path, 'format': 'A4'})
    await browser.close()


# Run the function
asyncio.run(generate_pdf('https://example.com', 'example.pdf'))

In the generate_pdf function above, we do the following:

  1. Launch a new headless browser instance.
  2. Open a new tab (page) in the headless browser.
  3. Navigate to the URL passed in the url argument and wait for the page to load.
  4. Generate a PDF of the page. The PDF is saved at the location given by pdf_path, and the paper format is set to A4.
  5. Close the headless browser.
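
The steps above can be extended in two small ways: passing more print options to page.pdf, and making sure the browser is closed even if navigation fails. The following is a minimal sketch rather than a definitive implementation. The helper names build_pdf_options and generate_pdf_safely are made up for illustration, and the extra option keys (landscape, printBackground, margin) mirror Puppeteer's PDF options, which pyppeteer accepts as a dict.

```python
def build_pdf_options(pdf_path, paper_format='A4', landscape=False, margin='10mm'):
    """Build the options dict passed to page.pdf (hypothetical helper)."""
    return {
        'path': pdf_path,
        'format': paper_format,
        'landscape': landscape,
        # Include CSS backgrounds, which are omitted by default
        'printBackground': True,
        'margin': {'top': margin, 'right': margin, 'bottom': margin, 'left': margin},
    }


async def generate_pdf_safely(url, pdf_path):
    # Imported here so the helper above can be used without pyppeteer installed
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.goto(url)
        await page.pdf(build_pdf_options(pdf_path))
    finally:
        # Runs even if goto() or pdf() raises, so no browser process is left behind
        await browser.close()
```

To run it, pass the coroutine to asyncio.run, for example asyncio.run(generate_pdf_safely('https://example.com', 'example.pdf')).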

Generate PDF from Custom HTML content

import asyncio
from pyppeteer import launch


async def generate_pdf_from_html(html_content, pdf_path):
    browser = await launch()
    page = await browser.newPage()
    # Render the given HTML string instead of navigating to a URL
    await page.setContent(html_content)
    await page.pdf({'path': pdf_path, 'format': 'A4'})
    await browser.close()


# HTML content
html_content = '''
<!DOCTYPE html>
<html>
<head>
<title>PDF Example</title>
</head>
<body>
<h1>Hello, world!</h1>
</body>
</html>
'''

# Run the function
asyncio.run(generate_pdf_from_html(html_content, 'from_html.pdf'))

The example above shows how to use Pyppeteer with our own custom HTML content. Here is what happens in the generate_pdf_from_html function:

  1. Launch a new headless browser instance.
  2. Open a new tab (page) in the headless browser.
  3. Explicitly set the content of the page to our HTML content with page.setContent.
  4. Generate a PDF of the page. The PDF is saved at the location given by pdf_path, and the paper format is set to A4.
  5. Close the headless browser.

2. python-pdfkit

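python-pdfkit is a Python wrapper for the wkhtmltopdf utility, which uses WebKit to convert HTML to PDF. The wkhtmltopdf binary is not installed by pip, so it must be installed separately and be available on your PATH.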

First, let’s install python-pdfkit with pip:

pip install pdfkit

Generate PDF from a website URL

import pdfkit
# URL to fetch
url = 'https://cnn.com'
# PDF path to save
pdf_path = 'example.pdf'
pdfkit.from_url(url, pdf_path)

Like Pyppeteer, pdfkit can generate PDFs from a website URL out of the box.

As the code above shows, a single call to pdfkit.from_url is all you need to generate the PDF.
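
pdfkit.from_url and pdfkit.from_string also accept an options dict whose keys are wkhtmltopdf command-line flags without the leading dashes. The sketch below shows one way to use it, assuming the wkhtmltopdf binary is installed. The helper names are illustrative, and the page size and margin values are arbitrary.

```python
import shutil


def find_wkhtmltopdf():
    """Return the path of the wkhtmltopdf binary on PATH, or None if it is missing."""
    return shutil.which('wkhtmltopdf')


def build_pdfkit_options(page_size='A4', margin='15mm'):
    """Build an options dict for pdfkit (keys are wkhtmltopdf flags without '--')."""
    return {
        'page-size': page_size,
        'margin-top': margin,
        'margin-right': margin,
        'margin-bottom': margin,
        'margin-left': margin,
        'encoding': 'UTF-8',
    }


def url_to_pdf(url, pdf_path):
    # Imported here so the helpers above work even when pdfkit is not installed
    import pdfkit

    binary = find_wkhtmltopdf()
    if binary is None:
        raise RuntimeError('wkhtmltopdf was not found on PATH')
    # Point pdfkit at the binary explicitly instead of relying on auto-detection
    config = pdfkit.configuration(wkhtmltopdf=binary)
    pdfkit.from_url(url, pdf_path, options=build_pdfkit_options(), configuration=config)
```

Checking for the binary up front turns a missing installation into a clear error message instead of a failure deep inside the conversion call.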

Generate PDF from Custom HTML content

import pdfkit
# HTML content
html = '''
<html>
<head>
<title>PDF Example</title>
</head>
<body>
<h1>Hey, this will turn into a PDF!</h1>
</body>
</html>
'''
# PDF path to save
pdf_path = 'example.pdf'
# Create PDF
pdfkit.from_string(html, pdf_path)

To generate a PDF from custom HTML content using python-pdfkit, you simply need to use pdfkit.from_string and provide the HTML content along with the path for the PDF file.
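
In practice the HTML string is rarely hard-coded; it is usually built from data. As a minimal sketch using only the standard library, the example below fills a template with string.Template and escapes the values with html.escape. The invoice fields and the render_invoice_html name are made up for illustration.

```python
from html import escape
from string import Template

# $-placeholders are filled in by Template.substitute
INVOICE_TEMPLATE = Template('''<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Invoice $number</title>
</head>
<body>
<h1>Invoice $number</h1>
<p>Billed to: $customer</p>
<p>Total: $total</p>
</body>
</html>
''')


def render_invoice_html(number, customer, total):
    """Return an HTML document as a string, ready to pass to a converter."""
    return INVOICE_TEMPLATE.substitute(
        number=escape(str(number)),
        # Escaping prevents characters such as < and & from breaking the markup
        customer=escape(customer),
        total=escape(f'{total:.2f}'),
    )


html = render_invoice_html(42, 'Smith & Sons', 199.5)
```

The resulting string can be passed to pdfkit.from_string(html, 'invoice.pdf') in the same way as the hard-coded HTML above.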

3. xhtml2pdf

xhtml2pdf is another Python library that lets you generate PDFs from HTML content. Let’s see xhtml2pdf in action.

Install xhtml2pdf, together with requests (which we use below to fetch pages), with the following command:

pip install xhtml2pdf requests

Generate PDF from a website URL

Note that xhtml2pdf does not fetch pages from a URL itself, but we can use the requests library to download the HTML first.

from xhtml2pdf import pisa
import requests


def convert_url_to_pdf(url, pdf_path):
    # Fetch the HTML content from the URL
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        print(f"Failed to fetch URL: {url}")
        return False
    html_content = response.text
    # Generate PDF
    with open(pdf_path, "wb") as pdf_file:
        pisa_status = pisa.CreatePDF(html_content, dest=pdf_file)
    # pisa_status.err is non-zero when the conversion reported errors
    return not pisa_status.err


# URL to fetch
url_to_fetch = "https://google.com"
# PDF path to save
pdf_path = "google.pdf"
# Generate PDF
if convert_url_to_pdf(url_to_fetch, pdf_path):
    print(f"PDF generated and saved at {pdf_path}")
else:
    print("PDF generation failed")

In the convert_url_to_pdf function above, we do the following:

  1. Use requests to fetch the page from the URL, and stop early if the response status is not 200.
  2. Read the HTML from the response as text using response.text.
  3. Generate the PDF by passing the HTML content and the open output file to pisa.CreatePDF. The function returns True when pisa_status.err reports no errors.
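
One caveat with fetching a page this way is that relative links to images and stylesheets lose their base URL. pisa.CreatePDF accepts a link_callback argument for resolving such references. The sketch below is an illustration under assumptions, not the library's recommended recipe: make_link_callback and convert_fetched_html are hypothetical helpers, and the callback simply joins each URI with the page URL using urllib.parse.urljoin. Whether remote resources are then actually downloaded depends on your xhtml2pdf version and network access.

```python
from urllib.parse import urljoin


def make_link_callback(base_url):
    """Return a callback that turns relative resource URIs into absolute URLs."""
    def link_callback(uri, rel):
        # urljoin leaves absolute URIs unchanged and resolves relative ones
        return urljoin(base_url, uri)
    return link_callback


def convert_fetched_html(html_content, base_url, pdf_path):
    # Imported here so the helper above can be tried without xhtml2pdf installed
    from xhtml2pdf import pisa

    with open(pdf_path, 'wb') as pdf_file:
        status = pisa.CreatePDF(
            html_content,
            dest=pdf_file,
            link_callback=make_link_callback(base_url),
        )
    return not status.err
```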

Generate PDF from Custom HTML content

from xhtml2pdf import pisa


def convert_html_to_pdf(html_string, pdf_path):
    with open(pdf_path, "wb") as pdf_file:
        pisa_status = pisa.CreatePDF(html_string, dest=pdf_file)
    return not pisa_status.err


# HTML content
html = '''
<html>
<head>
<title>PDF Example</title>
</head>
<body>
<h1>Hey, this will turn into a PDF!</h1>
</body>
</html>
'''

# Create PDF
pdf_path = "example.pdf"
convert_html_to_pdf(html, pdf_path)

Creating a PDF from custom HTML content is similar to the process for URLs, with just one key difference: instead of passing a URL, we directly provide the actual HTML content to our creation method. The method then uses this custom HTML content to create the PDF.
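
The boolean returned by convert_html_to_pdf only reflects pisa_status.err. A cheap additional sanity check, sketched below on the assumption that a quick header check is enough for your use case, is to confirm that the output file starts with the %PDF- signature. The looks_like_pdf name is illustrative, and the demonstration uses a hand-written stub file rather than a real conversion.

```python
import os
import tempfile


def looks_like_pdf(path):
    """Return True if the file at path starts with the PDF header bytes."""
    with open(path, 'rb') as f:
        return f.read(5) == b'%PDF-'


# Demonstration with a hand-written stub file rather than a real conversion
with tempfile.TemporaryDirectory() as tmp_dir:
    stub_path = os.path.join(tmp_dir, 'stub.pdf')
    with open(stub_path, 'wb') as f:
        f.write(b'%PDF-1.7\n')
    stub_ok = looks_like_pdf(stub_path)
```

The same check works for the output of any of the libraries in this article, since it only inspects the first bytes of the file.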

4. Playwright

Playwright is a modern library for headless browser automation. It supports Chromium, Firefox, and WebKit across platforms and has bindings for several languages, including Python, making it versatile for tasks like PDF generation. Note that PDF generation with page.pdf is supported only in Chromium, which is why the examples below launch p.chromium.

To convert HTML to PDF with Playwright in Python, follow these steps:

Step 1: Install Playwright:

pip install playwright
playwright install

Step 2: Generate PDF from Website URL:

import asyncio
from playwright.async_api import async_playwright


async def url_to_pdf(url, output_path):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url)
        await page.pdf(path=output_path)
        await browser.close()


# Example usage
url = 'https://example.com'
output_path = 'example_url.pdf'
asyncio.run(url_to_pdf(url, output_path))

Step 3: Generate PDF from Custom HTML Content:

import asyncio
from playwright.async_api import async_playwright


async def html_to_pdf(html_content, output_path):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.set_content(html_content)
        await page.pdf(path=output_path)
        await browser.close()


html_content = '''
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Sample HTML</title>
</head>
<body>
<h1>Hello, World!</h1>
<p>This is a sample HTML content to be converted to PDF.</p>
</body>
</html>
'''
output_path = 'custom_html.pdf'
asyncio.run(html_to_pdf(html_content, output_path))
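
Playwright also ships a synchronous API, which can be simpler in scripts that do not otherwise use asyncio. The sketch below is one possible variant, with illustrative function names and arbitrary paper and margin settings, that waits for network activity to settle before printing.

```python
def uniform_margin(size):
    """Return a margin dict with the same size on all four sides."""
    return {'top': size, 'right': size, 'bottom': size, 'left': size}


def url_to_pdf_sync(url, output_path):
    # Imported here so the helper above can be used without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Wait until network activity has settled so late-loading content is included
        page.goto(url, wait_until='networkidle')
        page.pdf(
            path=output_path,
            format='A4',
            print_background=True,
            margin=uniform_margin('15mm'),
        )
        browser.close()
```

Calling url_to_pdf_sync('https://example.com', 'example_url.pdf') would then write the file without any asyncio.run wrapper.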

5. WeasyPrint

WeasyPrint is a visual rendering engine that follows the W3C specifications for HTML and CSS. It’s known for its excellent CSS support and its ability to generate high-quality PDFs without requiring a browser. It does rely on a few system libraries (such as Pango), which is why installation can take extra steps on some systems.

Step 1: Install WeasyPrint:

pip install weasyprint

Step 2: Generate PDF from Website URL:

from weasyprint import HTML


def url_to_pdf(url, output_path):
    HTML(url=url).write_pdf(output_path)


# Example usage
url = 'https://example.com'
output_path = 'example_url.pdf'
url_to_pdf(url, output_path)

Step 3: Generate PDF from Custom HTML Content:

from weasyprint import HTML


def html_to_pdf(html_content, output_path):
    HTML(string=html_content).write_pdf(output_path)


html_content = '''
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Sample HTML</title>
<style>
body { font-family: Arial, sans-serif; }
h1 { color: navy; }
</style>
</head>
<body>
<h1>Hello, World!</h1>
<p>This is a sample HTML content to be converted to PDF.</p>
</body>
</html>
'''
output_path = 'custom_html.pdf'
html_to_pdf(html_content, output_path)
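
Because WeasyPrint implements CSS paged media, paper size and margins are usually controlled with an @page rule rather than with function arguments. The sketch below shows one way to apply such a rule as an extra stylesheet and to get the PDF back as bytes instead of writing a file. The page_css and html_to_pdf_bytes names are illustrative, and the default size and margin are arbitrary.

```python
def page_css(size='A4', margin='2cm'):
    """Return an @page rule that sets the paper size and margins."""
    return f'@page {{ size: {size}; margin: {margin}; }}'


def html_to_pdf_bytes(html_content, size='A4', margin='2cm'):
    # Imported here so page_css can be tried without WeasyPrint installed
    from weasyprint import CSS, HTML

    # With no target argument, write_pdf returns the PDF as bytes
    return HTML(string=html_content).write_pdf(
        stylesheets=[CSS(string=page_css(size, margin))]
    )
```

Returning bytes is convenient when the PDF is sent straight to a client or uploaded to storage and never needs to touch the local disk.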

6. Selenium

Selenium is a powerful tool for web automation that can also be used for PDF generation. It’s particularly useful when dealing with dynamic web pages that require JavaScript execution.

Step 1: Install Selenium:

pip install selenium webdriver-manager

Step 2: Generate PDF from Website URL:

import base64

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


def url_to_pdf(url, output_path):
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=chrome_options)
    try:
        # get() returns once the page load event has fired
        driver.get(url)
        # print_page() (Selenium 4) returns the PDF as a base64-encoded string
        pdf_base64 = driver.print_page()
        with open(output_path, 'wb') as f:
            f.write(base64.b64decode(pdf_base64))
    finally:
        driver.quit()


# Example usage
url = 'https://example.com'
output_path = 'example_url.pdf'
url_to_pdf(url, output_path)

Step 3: Generate PDF from Custom HTML Content:

import base64
import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


def html_to_pdf(html_content, output_path):
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=chrome_options)
    try:
        # Create a temporary HTML file
        with open('temp.html', 'w', encoding='utf-8') as f:
            f.write(html_content)
        # Load the temporary file
        driver.get('file://' + os.path.abspath('temp.html'))
        # print_page() (Selenium 4) returns the PDF as a base64-encoded string
        with open(output_path, 'wb') as f:
            f.write(base64.b64decode(driver.print_page()))
    finally:
        driver.quit()
        # Clean up temporary file
        if os.path.exists('temp.html'):
            os.remove('temp.html')


html_content = '''
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Sample HTML</title>
<style>
body { font-family: Arial, sans-serif; }
h1 { color: navy; }
</style>
</head>
<body>
<h1>Hello, World!</h1>
<p>This is a sample HTML content to be converted to PDF.</p>
</body>
</html>
'''
output_path = 'custom_html.pdf'
html_to_pdf(html_content, output_path)
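
Writing to a fixed temp.html in the working directory can collide when several conversions run at once, and concatenating 'file://' with a path does not escape special characters. A minimal sketch of an alternative that uses only the standard library follows. The write_temp_html name is an illustrative helper, not part of Selenium.

```python
import os
import tempfile
from pathlib import Path


def write_temp_html(html_content):
    """Write html_content to a uniquely named temporary file and return its Path."""
    fd, name = tempfile.mkstemp(suffix='.html')
    with os.fdopen(fd, 'w', encoding='utf-8') as f:
        f.write(html_content)
    return Path(name)


temp_path = write_temp_html('<h1>Hello, World!</h1>')
try:
    # as_uri() builds a correctly escaped file:// URL on every platform
    file_url = temp_path.as_uri()
    saved_text = temp_path.read_text(encoding='utf-8')
finally:
    # Remove the file even if loading or printing fails
    temp_path.unlink()
```

In the Selenium example, file_url would be passed to driver.get in place of the hand-built URL, with the printing step placed inside the try block.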

Comparing all the options

While each of these tools primarily focuses on converting HTML to PDF, they each offer unique features and methods.

Below, you’ll find a detailed table comparing these tools to help you select the one that best fits your requirements.

| Library | Pros | Cons | Best For | Performance | Memory Usage | CSS Support | JavaScript Support | Unicode Support | Docker Support | Cloud Compatible | Headless Mode | WebSocket Support | Mobile Emulation | Price | Active Development |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pyppeteer | Provides a full browser environment with comprehensive JS and CSS support. Easy to use API. | Performance may vary. Requires Chrome/Chromium installation. Higher resource usage. | Handling intricate web content, single-page applications (SPAs), and dynamic JS content. | Medium | High | Excellent | Excellent | Yes | Yes | Yes | Yes | Yes | Yes | Free | Active |
| xhtml2pdf | Pure Python, easy setup. Good for simpler HTML/CSS documents. Low resource usage. | Limited JavaScript and CSS support. May struggle with complex layouts. | Simple HTML/CSS to PDF conversion tasks where JavaScript rendering is not needed. | Good | Low | Basic | No | Yes | Yes | Yes | N/A | No | No | Free | Moderate |
| python-pdfkit | Fast native conversion. Good CSS support via wkhtmltopdf. Reliable output. | Limited JavaScript support. Requires wkhtmltopdf installation. | Various HTML to PDF tasks requiring good CSS rendering but limited JavaScript execution. | Good | Medium | Good | Limited | Yes | Yes | Yes | Yes | No | No | Free | Active |
| Playwright | Cross-browser support, powerful automation capabilities, good performance. Modern API. | Requires installation of browser binaries. Higher resource usage. | Comprehensive testing and rendering of complex web pages, including those with dynamic content. | Good | High | Excellent | Excellent | Yes | Yes | Yes | Yes | Yes | Yes | Free | Very Active |
| WeasyPrint | Excellent CSS support. No external dependencies. Clean Python API. | No JavaScript support. Installation can be complex on some systems. | Static HTML/CSS documents requiring precise CSS rendering and layout control. | Good | Medium | Excellent | No | Yes | Yes | Yes | N/A | No | No | Free | Active |
| Selenium | Extensive browser support, mature ecosystem, good for complex web pages. | Slower than other options, requires browser installation. | Complex web applications requiring full browser automation and testing. | Medium | High | Excellent | Excellent | Yes | Yes | Yes | Yes | Yes | Yes | Free | Active |

In summary, the choice between Pyppeteer, xhtml2pdf, python-pdfkit, Playwright, WeasyPrint, and Selenium depends on your project’s specific requirements.

Pyppeteer shines when dealing with dynamic content, thanks to its complete browser automation features. On the other hand, xhtml2pdf provides a simple Python-focused solution for basic conversions.

python-pdfkit, which builds upon wkhtmltopdf, sits as a flexible option in between. Developers should consider the features, setup intricacies, and performance of each library in relation to their project’s needs to make the right decision.

Playwright stands out for its cross-browser capabilities and powerful automation features, making it ideal for rendering complex and dynamic web pages.

WeasyPrint is excellent for static HTML/CSS documents requiring precise CSS rendering and layout control, but it lacks JavaScript support and can be complex to install on some systems.

Selenium is particularly useful for complex web pages that require JavaScript execution, but it’s slower than other options and requires browser installation.
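
As a rough illustration of how these trade-offs can be turned into a first filter, the sketch below encodes three columns of the comparison table as data and shortlists libraries by JavaScript support and memory usage. The values are copied from the table, while the shortlist function and its selection rules are assumptions made for this example, not a substitute for testing with your own documents.

```python
# Values copied from the comparison table; the selection logic itself is an assumption
COMPARISON = {
    'Pyppeteer': {'javascript': 'Excellent', 'css': 'Excellent', 'memory': 'High'},
    'xhtml2pdf': {'javascript': 'No', 'css': 'Basic', 'memory': 'Low'},
    'python-pdfkit': {'javascript': 'Limited', 'css': 'Good', 'memory': 'Medium'},
    'Playwright': {'javascript': 'Excellent', 'css': 'Excellent', 'memory': 'High'},
    'WeasyPrint': {'javascript': 'No', 'css': 'Excellent', 'memory': 'Medium'},
    'Selenium': {'javascript': 'Excellent', 'css': 'Excellent', 'memory': 'High'},
}

MEMORY_RANK = {'Low': 0, 'Medium': 1, 'High': 2}


def shortlist(needs_javascript, max_memory='High'):
    """Return libraries that meet a JavaScript requirement and a memory ceiling."""
    matches = []
    for name, row in COMPARISON.items():
        # Only full-browser tools are rated 'Excellent' for JavaScript
        if needs_javascript and row['javascript'] != 'Excellent':
            continue
        if MEMORY_RANK[row['memory']] > MEMORY_RANK[max_memory]:
            continue
        matches.append(name)
    return matches
```

For example, shortlist(True) keeps only the browser-based tools, while shortlist(False, max_memory='Medium') keeps the lighter options.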

Why Choose a Managed Template-Based Solution?

While the libraries we’ve discussed are powerful, they come with significant operational overhead:

  1. Infrastructure Management

    • Need to maintain browser installations
    • Handle system dependencies
    • Manage server resources
    • Deal with scaling issues
  2. Development Complexity

    • Write and maintain error handling
    • Implement retry mechanisms
    • Handle browser crashes
    • Manage memory usage
    • Deal with timeouts
  3. Production Challenges

    • Ensure consistent rendering across environments
    • Handle high concurrency
    • Manage file storage
    • Implement caching strategies
    • Monitor performance
  4. Maintenance Burden

    • Keep dependencies updated
    • Handle security patches
    • Monitor system health
    • Deal with browser updates
    • Maintain compatibility
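
To give a sense of what items such as retry mechanisms and browser crash handling involve, here is a minimal sketch of a retry wrapper with exponential backoff, using only the standard library. The with_retries name, the default attempt count, and the delays are illustrative choices, and flaky_conversion is a stand-in for a real conversion call.

```python
import time


def with_retries(task, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call task() until it succeeds, waiting longer after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                # Out of attempts: let the last error propagate to the caller
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a stand-in task that fails twice before succeeding
calls = {'count': 0}
delays = []


def flaky_conversion():
    calls['count'] += 1
    if calls['count'] < 3:
        raise RuntimeError('simulated browser crash')
    return 'example.pdf'


# delays.append replaces time.sleep so the demonstration does not actually wait
result = with_retries(flaky_conversion, sleep=delays.append)
```

A production version would typically also add timeouts, logging, and a narrower set of retried exception types, which is part of the overhead described above.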

Using Templated for Template-Based Generation

Templated provides a managed solution that eliminates these operational challenges. It’s a visual template editor and API that allows you to create and manage templates for generating images and PDFs. Here’s why it’s a better choice for production environments:

1. Visual Template Editor

Templated offers a powerful visual editor where you can:

  • Design templates with a drag-and-drop interface
  • Add text, images, and shapes
  • Create dynamic layers
  • Preview changes in real-time
  • Version control your templates
  • Collaborate with team members

2. Simple API Integration

import queue
from concurrent.futures import ThreadPoolExecutor


# NOTE: generate_pdf(html, output_path) is not defined in this snippet. It is
# assumed to be a function you supply, for example a wrapper around the API call.
def batch_generate_pdfs(html_contents, output_paths, max_workers=4):
    results = queue.Queue()

    def generate_single_pdf(html, output_path):
        try:
            result = generate_pdf(html, output_path)
            results.put(("success", output_path, result))
        except Exception as e:
            results.put(("error", output_path, str(e)))

    # Leaving the with block waits for all submitted tasks to finish
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for html, path in zip(html_contents, output_paths):
            executor.submit(generate_single_pdf, html, path)
    return list(results.queue)

3. Key Benefits

  • No Infrastructure Management

    • No need to install or maintain browsers
    • No server provisioning required
    • Automatic scaling
    • Global CDN delivery
  • Reliable Performance

    • 99.9% uptime guarantee
    • Automatic retries
    • Built-in error handling
    • Consistent rendering
  • Advanced Features

    • Visual template editor
    • Version control
    • Real-time preview
    • Batch processing
    • Webhook notifications

Getting Started with Templated

  1. Create an Account

  2. Design Your Template

    • Use the visual editor to create your template
    • Add dynamic layers for variable content
    • Preview and test your template
    • Save and version your template
  3. Integrate the API

    • Use the code examples above
    • Start with simple image generation
    • Gradually add more features
  4. Monitor Usage

    • Track API calls
    • Monitor performance
    • Set up alerts

Other languages

If you want to learn how to convert HTML to PDF in other languages, here are other resources for you to explore:

Conclusion

While Python libraries for PDF generation are powerful, they require significant development and operational effort. For production environments, a managed solution like Templated eliminates these challenges, allowing you to focus on your core business logic.

We’ve explored how to use third-party libraries for straightforward PDF generation. However, for more complex scenarios like template management, Templated offers a seamless solution through simple API calls to generate PDFs.

To get started with Templated:

  1. Sign up for a free account
  2. Explore the Template Gallery
  3. Check out the API Documentation
  4. Start generating PDFs in minutes!