· tutorials · 9 min read

Python: How To Convert HTML to PDF

Learn simple methods to convert HTML to PDF using Python and the better approach


Introduction

Generating PDFs from Python code is a common task, and several tools and libraries make it easier. In this article you will learn the most widely used methods for converting HTML to PDF with Python, along with a better approach for template-based documents.

Why HTML to PDF?

  • Market Standard: The web’s backbone is HTML, making it a familiar and ubiquitous choice for content formatting.
  • Mature Technology: With its open standards, HTML is supported by a vast array of tools and technologies.
  • Versatility: The plethora of tools simplifies the conversion process.
  • Styling with CSS: Leverage CSS for sophisticated styling in your PDFs.
  • Multimedia Inclusion: HTML provides a rich starting point for media-rich documents.

HTML to PDF using Python Libraries

1. python-pdfkit

python-pdfkit is a Python wrapper for the wkhtmltopdf utility, which uses Webkit to convert HTML to PDF.

First, install python-pdfkit with pip. pdfkit is only a wrapper, so the wkhtmltopdf binary must also be installed on your system:

pip install pdfkit

To create a PDF from a website URL with python-pdfkit:

import pdfkit
# URL to fetch
url = 'https://engadget.com'
# PDF path to save
pdf_path = 'example.pdf'
pdfkit.from_url(url, pdf_path)

pdfkit supports generating PDFs from website URLs out of the box.

In the code above, a single call to pdfkit.from_url fetches the page and writes the PDF to pdf_path.
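
pdfkit.from_url also accepts optional options and configuration arguments that are handed to wkhtmltopdf. The following is a minimal sketch, assuming wkhtmltopdf is installed; the helper names build_options and url_to_pdf and the chosen option values are illustrative and not part of pdfkit itself.

```python
def build_options(page_size="A4", margin="0.75in"):
    """Build an options dict; keys are wkhtmltopdf flags without the leading dashes."""
    return {
        "page-size": page_size,
        "margin-top": margin,
        "margin-right": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "encoding": "UTF-8",
    }


def url_to_pdf(url, pdf_path, wkhtmltopdf_path=None):
    """Convert a URL to a PDF file; returns True on success, False on failure."""
    import pdfkit  # imported here so build_options works without pdfkit installed

    try:
        # A configuration is only needed when wkhtmltopdf is not on PATH
        # (wkhtmltopdf_path is a hypothetical, user-supplied location)
        config = None
        if wkhtmltopdf_path:
            config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
        pdfkit.from_url(url, pdf_path, options=build_options(), configuration=config)
        return True
    except OSError as exc:
        # pdfkit raises IOError/OSError when wkhtmltopdf is missing or reports an error
        print(f"wkhtmltopdf failed: {exc}")
        return False
```

Calling url_to_pdf('https://engadget.com', 'example.pdf') would then produce an A4 document with uniform margins, provided wkhtmltopdf can reach the page.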

Generating a PDF from custom HTML content

import pdfkit
# HTML content
html = '''
<html>
<head>
<title>PDF Example</title>
</head>
<body>
<h1>Hey, this will turn into a PDF!</h1>
</body>
</html>
'''
# PDF path to save
pdf_path = 'example.pdf'
# Create PDF
pdfkit.from_string(html, pdf_path)

To generate a PDF from custom HTML content using python-pdfkit, you simply need to use pdfkit.from_string and provide the HTML content along with the path for the PDF file.
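
In practice the HTML usually contains dynamic values. As a rough sketch, the standard library's string.Template and html.escape can fill a page before it is passed to pdfkit.from_string. The names PAGE, build_html and html_to_pdf are hypothetical helpers made up for this example.

```python
from html import escape
from string import Template

# A small page template; $title and $heading are placeholders
PAGE = Template("""<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body><h1>$heading</h1></body>
</html>""")


def build_html(title, heading):
    """Fill the template, escaping values so user data cannot break the markup."""
    return PAGE.substitute(title=escape(title), heading=escape(heading))


def html_to_pdf(html, pdf_path):
    """Write the HTML string to a PDF file using pdfkit (requires wkhtmltopdf)."""
    import pdfkit  # imported here so build_html works without pdfkit installed

    return pdfkit.from_string(html, pdf_path)
```

For example, html_to_pdf(build_html('Report', 'Tom & Jerry'), 'example.pdf') renders the ampersand correctly because it was escaped first.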

2. Pyppeteer

Pyppeteer is an unofficial Python port of the Node library Puppeteer. It offers a high-level API over the Chrome DevTools Protocol, which is like having a browser inside your code that can do what a regular browser does. It can be used for tasks such as scraping data from websites, taking screenshots, and more. Let’s explore how we can use Pyppeteer to create PDFs from HTML.


First, we need to install pyppeteer with the following command:

pip install pyppeteer

This is the code to convert a website URL to PDF:

import asyncio
from pyppeteer import launch

async def generate_pdf(url, pdf_path):
    # Launch a headless browser and open a new page
    browser = await launch()
    page = await browser.newPage()
    # Navigate to the URL and wait for the page to load
    await page.goto(url)
    # Save the page as an A4 PDF
    await page.pdf({'path': pdf_path, 'format': 'A4'})
    await browser.close()

# Run generate_pdf
asyncio.run(generate_pdf('https://google.com', 'google_example.pdf'))

The generate_pdf function in the code performs the following steps:

  1. A new browser instance is launched in headless mode.
  2. A new page is opened in the browser, and it waits until the page is ready.
  3. The browser navigates to the URL provided in the url argument and waits for the page to fully load.
  4. A PDF of the webpage is generated and saved to the location specified in pdf_path, with the format (dimensions) set to A4.
  5. The headless browser is then closed.
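
The steps above can be made a little more robust. The sketch below is one possible variant, not the only way to do it: it waits for the network to go idle before printing, keeps CSS backgrounds, and closes the browser even when something fails. The helper names pdf_options and url_to_pdf are made up for this example.

```python
def pdf_options(pdf_path):
    """Options passed to page.pdf(); printBackground keeps CSS backgrounds."""
    return {"path": pdf_path, "format": "A4", "printBackground": True}


async def url_to_pdf(url, pdf_path):
    """Render a URL to an A4 PDF with a headless browser."""
    from pyppeteer import launch  # imported here so pdf_options works on its own

    browser = await launch()
    try:
        page = await browser.newPage()
        # networkidle0 waits until no network connections remain for a short
        # period, which gives JavaScript-rendered content a chance to appear
        await page.goto(url, {"waitUntil": "networkidle0"})
        await page.pdf(pdf_options(pdf_path))
    finally:
        # Always close the browser, even if navigation or rendering fails
        await browser.close()
```

To run it, import asyncio and call asyncio.run(url_to_pdf('https://google.com', 'google_example.pdf')).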

To generate a PDF from custom HTML instead, set the page content directly:

import asyncio
from pyppeteer import launch

async def generate_pdf_from_html(html, pdf_path):
    # Launch a headless browser and open a new page
    browser = await launch()
    page = await browser.newPage()
    # Load the custom HTML instead of navigating to a URL
    await page.setContent(html)
    # Save the page as an A4 PDF
    await page.pdf({'path': pdf_path, 'format': 'A4'})
    await browser.close()

# HTML content
html = '''
<html>
<head>
<title>PDF Example</title>
</head>
<body>
<h1>Hey, this will turn into a PDF!</h1>
</body>
</html>
'''
# Run generate_pdf_from_html
asyncio.run(generate_pdf_from_html(html, 'from_html.pdf'))

The generate_pdf_from_html function in the Pyppeteer example above demonstrates how to generate PDFs from custom HTML content. Here’s what happens in this approach:

  1. A new headless browser instance is launched.
  2. A new page is opened in the headless browser, and it waits until it is ready.
  3. The content of the page is explicitly set to our custom HTML content.
  4. A PDF of the webpage is generated and saved to the location specified in pdf_path, with the format set to ‘A4’.
  5. The headless browser is closed.
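
Launching a browser is the expensive part of this process, so when several documents are needed it may be worth reusing one browser for all of them. The following is a sketch under that assumption; render_many and output_path are hypothetical helper names, and documents is assumed to be a dict that maps a file name to an HTML string.

```python
import os


def output_path(out_dir, name):
    """Build the PDF file path for a document name inside out_dir."""
    return os.path.join(out_dir, f"{name}.pdf")


async def render_many(documents, out_dir):
    """Render a {name: html} mapping to PDF files, reusing a single browser."""
    from pyppeteer import launch  # imported here so output_path works on its own

    os.makedirs(out_dir, exist_ok=True)
    browser = await launch()
    try:
        page = await browser.newPage()
        for name, html in documents.items():
            # Replace the page content and print it, one document at a time
            await page.setContent(html)
            await page.pdf({"path": output_path(out_dir, name), "format": "A4"})
    finally:
        await browser.close()
```

It is run the same way as the earlier examples, for instance asyncio.run(render_many({'hello': html}, 'out')).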

3. xhtml2pdf

xhtml2pdf is a Python library that creates PDFs from HTML content with a slightly different approach: it renders the document itself in pure Python rather than driving a browser engine. Let’s take a look at xhtml2pdf in use.

To install xhtml2pdf, use the following command:

pip install xhtml2pdf requests

It’s important to note that xhtml2pdf doesn’t have a built-in feature to fetch URLs. However, we can use the requests library in Python to retrieve content from a URL.

from xhtml2pdf import pisa
import requests

def url_to_pdf(url, pdf_path):
    # Retrieve the HTML content from the URL
    response = requests.get(url)
    if response.status_code != 200:
        print(f"Failed to retrieve from URL: {url}")
        return False
    html_content = response.text
    # Create PDF
    with open(pdf_path, "wb") as pdf_file:
        pisa_status = pisa.CreatePDF(html_content, dest=pdf_file)
    return not pisa_status.err

# URL to retrieve
url_to_retrieve = "https://bbc.com"
# PDF path to save
pdf_path = "bbc.pdf"
# Create PDF
url_to_pdf(url_to_retrieve, pdf_path)

In the url_to_pdf function of the code, the following steps are performed:

  1. We use the requests library to fetch the webpage content from the specified URL, and stop early if the status code is not 200.
  2. After receiving the content, we extract the HTML as text using response.text.
  3. Next, to create the PDF, we use pisa.CreatePDF. Here, we pass our HTML content and the output file opened in binary write mode. The function returns True when pisa reports no errors.

Now, to create a PDF from custom HTML content using xhtml2pdf:

from xhtml2pdf import pisa

def convert_html_to_pdf(html_string, pdf_path):
    # Write the rendered PDF to the output file
    with open(pdf_path, "wb") as pdf_file:
        pisa_status = pisa.CreatePDF(html_string, dest=pdf_file)
    # True when no errors were reported
    return not pisa_status.err

# HTML content
html = '''
<html>
<head>
<title>PDF Example</title>
</head>
<body>
<h1>Hey, this will turn into a PDF!</h1>
</body>
</html>
'''
# Create PDF
pdf_path = "example.pdf"
convert_html_to_pdf(html, pdf_path)

Creating a PDF from custom HTML content is similar to the process for URLs, with just one key difference: instead of passing a URL, we directly provide the actual HTML content to our creation method. The method then uses this custom HTML content to create the PDF.
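
With xhtml2pdf, page size and margins are typically controlled through an @page CSS rule in the document itself. The sketch below wraps body markup in such a rule and renders it to memory instead of a file. The helper names with_page_style and html_to_pdf_bytes are invented for this example, and the default size and margin are arbitrary choices.

```python
import io


def with_page_style(body_html, size="a4 portrait", margin="2cm"):
    """Wrap body markup in a full document carrying an @page rule."""
    style = f"@page {{ size: {size}; margin: {margin}; }}"
    return (
        f"<html><head><style>{style}</style></head>"
        f"<body>{body_html}</body></html>"
    )


def html_to_pdf_bytes(html):
    """Render HTML to PDF in memory; returns bytes, or None if pisa reports errors."""
    from xhtml2pdf import pisa  # imported here so with_page_style works on its own

    buffer = io.BytesIO()
    status = pisa.CreatePDF(html, dest=buffer)
    return None if status.err else buffer.getvalue()
```

Returning bytes can be convenient in a web application, where the PDF is sent in a response rather than saved to disk.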

Comparing python-pdfkit, Pyppeteer and xhtml2pdf

While each of these tools primarily focuses on converting HTML to PDF, they each offer unique features and methods.

Below, you’ll find a detailed table comparing these tools to help you select the one that best fits your requirements.

| Feature/Aspect | Pyppeteer | xhtml2pdf | python-pdfkit |
|---|---|---|---|
| Nature | Browser automation library | HTML/CSS to PDF converter | Wrapper around wkhtmltopdf |
| Based On | Built upon Puppeteer with Chromium (headless) | Utilizes ReportLab & html5lib | Utilizes wkhtmltopdf |
| Dependencies | Requires Chrome/Chromium | Python libraries | Requires wkhtmltopdf |
| Language | Python | Python | Python |
| JavaScript Support | Provides full browser environment with JS support | No | Limited (via wkhtmltopdf) |
| CSS Support | Offers comprehensive CSS support similar to Chrome | Limited | Good (via wkhtmltopdf) |
| Performance | Performance may vary, especially in full browser mode | Moderate | Fast (native conversion) |
| Ease of Setup | Requires moderate effort, including the installation of Chromium | Easy (pure Python) | Moderate (requires wkhtmltopdf) |
| API Flexibility | Offers extensive API flexibility, particularly for full browser automation | Moderate (focused on PDF) | Moderate (wrapper around tool) |
| Usage | Suitable for handling intricate web content, SPAs, and dynamic JS-based content | Good for simpler HTML/CSS docs | Common for various HTML to PDF tasks |

The choice between Pyppeteer, xhtml2pdf, and python-pdfkit depends on your project’s specific requirements.

Pyppeteer shines when dealing with dynamic content, thanks to its complete browser automation features. On the other hand, xhtml2pdf provides a simple Python-focused solution for basic conversions.

python-pdfkit, which builds upon wkhtmltopdf, sits as a flexible option in between. Developers should consider the features, setup intricacies, and performance of each library in relation to their project’s needs to make the right decision.
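
As a rough rule of thumb, the comparison can be condensed into a small decision helper. This is only a sketch of the reasoning in the table, with a hypothetical function name and deliberately coarse inputs; real projects will weigh setup effort and performance as well.

```python
def pick_library(needs_javascript=False, needs_advanced_css=False):
    """Suggest a library based on the JavaScript and CSS rows of the comparison."""
    if needs_javascript:
        # Only the full browser environment runs JavaScript reliably
        return "pyppeteer"
    if needs_advanced_css:
        # wkhtmltopdf offers good CSS support without a full browser
        return "python-pdfkit"
    # Simple HTML/CSS documents: the pure-Python option is the easiest to set up
    return "xhtml2pdf"
```

For a single-page application, pick_library(needs_javascript=True) points to Pyppeteer, while a plain invoice layout falls through to xhtml2pdf.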

A better approach: HTML to PDF using Templated

The examples above demonstrate how to convert HTML to PDF and web pages to PDF using libraries. However, when it comes to tasks like generating PDFs using templates or keeping track of generated PDFs, there’s a better approach.

For instance, to keep track of generated PDFs, you’ll need to develop your own system for tracking the files created. Similarly, if you want to use custom templates, such as those for invoice or certificates generation, you must create and manage those templates manually.

An alternative solution is to utilize Templated, an API-based platform designed for PDF and Image generation, which is ideal for handling such use cases. Their PDF generation API is powered by a Chromium-based rendering engine that fully supports JavaScript, CSS, and HTML.

This approach simplifies the process and eliminates the need for extensive manual template management and tracking. To get started with PDF generation using Templated, follow the documentation and harness the power of this efficient solution.

1. Create PDFs with templates

Using Templated, you can design your PDF template with a drag-and-drop editor.

After logging in, you will see your Dashboard, where you can manage your templates or create new ones:

How to create a template from the dashboard

From your Dashboard, you can design your own templates or customize an existing one from our Template Gallery. Below is the Certificate of Achievement template you can use.
There are 100+ free templates available that you can pick and customize to your needs.

Shows the certificate template in the Template Editor

To start using the Templated API, you need your API key, which can be found on the API Integration tab of your dashboard.

Shows where to get the API key from the Dashboard

Now that you have your Templated account ready, let’s see how you can integrate your application with the API. In this example we will be using a certificate template to generate PDFs.

import requests

# Replace these placeholders with the values from your Templated dashboard
API_KEY = "YOUR_API_KEY"
TEMPLATE_ID = "YOUR_TEMPLATE_ID"

# Initialize HTTP client
client = requests.Session()
# API URL
url = "https://api.templated.io/v1/render"
# Set headers
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}
# Payload data
payload = {
    "template": TEMPLATE_ID,
    "format": "pdf",
    "layers": {
        "date": {"text": "02/10/2024"},
        "name": {"text": "John Doe"},
        "signature": {"text": "Dr. Mark Brown"},
        "details": {
            "text": "This certificate is awarded to John Doe in recognition of their successful "
                    "completion of a Computer Science degree on 02/10/2024."
        },
    },
}
# Make the POST request; json= serializes the payload to JSON
response = client.post(url, json=payload, headers=headers)
# Print the response body
print(response.text)

If we check the response, we see the following:

{
  "renderUrl": "PDF_URL",
  "status": "success",
  "template": "YOUR_TEMPLATE_ID"
}

As the code above shows, using Templated to generate a PDF is pretty simple. Apart from an HTTP client such as requests, no additional libraries or binaries need to be installed. You only need to make a single API call, providing your data as the request body. That’s all there is to it!

You can use the renderUrl from the response to download or distribute the generated PDF.
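
A minimal sketch of that download step follows. It assumes the response has the shape shown above and that renderUrl can be fetched without extra authentication, which is an assumption rather than something this article confirms. The helper names extract_render_url and download_pdf are made up for this example.

```python
def extract_render_url(response_json):
    """Return renderUrl from a render response, or raise if the render failed."""
    if response_json.get("status") != "success":
        raise ValueError(f"Render failed: {response_json}")
    return response_json["renderUrl"]


def download_pdf(render_url, pdf_path):
    """Stream the generated PDF to disk (assumes the URL needs no auth headers)."""
    import requests  # imported here so extract_render_url works on its own

    with requests.get(render_url, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(pdf_path, "wb") as pdf_file:
            # Write the file in chunks so large PDFs are not held in memory
            for chunk in response.iter_content(chunk_size=8192):
                pdf_file.write(chunk)
```

With the client.post response from the earlier example, this could be used as download_pdf(extract_render_url(response.json()), 'certificate.pdf').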

Conclusion

PDF generation is now a standard part of many business applications, and it shouldn’t be a source of stress for developers.

We’ve explored how to use third-party libraries for straightforward PDF generation. However, for more complex scenarios like template management, Templated offers a seamless solution through simple API calls.

To get started, sign up for a free account and begin automating your PDFs today!
