
How to Convert HTML to PDF using Apache PDFBox

Master the conversion of HTML to PDF in Java with Apache PDFBox through this detailed guide, complete with practical code examples.

Introduction

Converting HTML to PDF is a common requirement in software development, particularly for generating reports, invoices, and other document types from web content. Apache PDFBox, a powerful open-source Java library, provides a robust set of tools for working with PDF documents, including creating, rendering, and manipulating them. This article will guide you through the process of converting HTML to PDF using Apache PDFBox, along with helpful code examples.

Why Apache PDFBox?

  • Open Source: Apache PDFBox is an Apache Software Foundation project released under the Apache License 2.0, with an active community behind it.
  • Comprehensive API: It covers creating and editing documents, merging and splitting them, extracting text, filling forms, and rendering pages to images, which makes it a versatile tool for developers.
  • High Performance: It is a pure-Java library with no native dependencies, so it can be embedded in large-scale server-side applications.

Getting Started with Apache PDFBox

Step 1: Add Apache PDFBox Dependency

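First, you need to include Apache PDFBox in your project. If you are using Maven, add the following dependencies to your pom.xml. The HTML example later in this article also uses the jsoup HTML parser, so it is listed as well:

<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>2.0.24</version>
</dependency>
<dependency>
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.17.2</version>
</dependency>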

Step 2: Understanding PDFBox Capabilities

Before diving into the code, it’s important to understand that Apache PDFBox does not directly convert HTML to PDF. Instead, it provides the tools to create PDF documents programmatically. You’ll need to parse the HTML content and use PDFBox to draw the text and images onto a PDF.
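
Images are the simpler half of that work. PDFBox 2.x loads an image file with `PDImageXObject.createFromFile(path, document)` and draws it with `contentStream.drawImage(image, x, y, width, height)`, where `x` and `y` give the image's bottom-left corner. PDFBox does not size the image for you, though, so an HTML `<img>` that is wider than the page must be scaled down first. The following is a minimal sketch of that sizing step. It assumes you want to keep the aspect ratio, never enlarge small images, and treat one image pixel as one point. The `ImageFit` class and `fit` method are hypothetical helpers and are not part of PDFBox:

```java
public class ImageFit {

    /**
     * Hypothetical helper: returns {width, height} scaled to fit inside
     * maxWidth x maxHeight while keeping the aspect ratio.
     * The scale is capped at 1, so small images are never enlarged.
     * Assumes 1 image pixel == 1 PDF point.
     */
    static float[] fit(float imageWidth, float imageHeight, float maxWidth, float maxHeight) {
        float scale = Math.min(1f, Math.min(maxWidth / imageWidth, maxHeight / imageHeight));
        return new float[] { imageWidth * scale, imageHeight * scale };
    }

    public static void main(String[] args) {
        // Usable area of a US Letter page (612 x 792 pt) with 72 pt margins: 468 x 648 pt
        float[] size = fit(936f, 468f, 468f, 648f);
        System.out.println(size[0] + " x " + size[1]); // 468.0 x 234.0
    }
}
```

With PDFBox, you could pass `image.getWidth()` and `image.getHeight()` from the loaded `PDImageXObject` into `fit`, then use the two returned values as the `width` and `height` arguments of `drawImage`.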

Creating a Simple PDF Document

Here’s a basic example of using Apache PDFBox to create a PDF document:

import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class CreatePdfWithPDFBox {
    public static void main(String[] args) {
        try (PDDocument document = new PDDocument()) {
            // A new PDPage defaults to US Letter size (612 x 792 points)
            PDPage page = new PDPage();
            document.addPage(page);
            // try-with-resources closes the content stream even if drawing fails;
            // it must be closed before the document is saved
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                contentStream.beginText();
                // PDFBox 2.x API; in 3.x use new PDType1Font(Standard14Fonts.FontName.HELVETICA)
                contentStream.setFont(PDType1Font.HELVETICA, 12);
                // The origin is the bottom-left corner, so y = 700 is near the top of the page
                contentStream.newLineAtOffset(100, 700);
                contentStream.showText("Hello, PDFBox!");
                contentStream.endText();
            }
            document.save("HelloPDFBox.pdf");
            System.out.println("PDF created successfully.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
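
PDFBox places content in PDF user space. The origin is the bottom-left corner of the page, and units are points (1/72 inch). A page created with `new PDPage()` uses `PDRectangle.LETTER`, which is 612 x 792 points, so `newLineAtOffset(100, 700)` puts the text near the top-left corner. HTML lays content out from the top down, so any converter has to flip the y-axis. Below is a minimal sketch of that conversion, assuming a fixed top margin. The `PdfCoordinates` class and `toPdfY` method are hypothetical names used only for illustration:

```java
public class PdfCoordinates {
    // US Letter height in points (matches PDRectangle.LETTER, the default for new PDPage())
    static final float LETTER_HEIGHT = 792f;

    /**
     * Hypothetical helper: converts a distance measured down from the top of the page
     * (HTML-style) into a y coordinate measured up from the bottom (PDF-style).
     * showText() places the text baseline at the returned y value.
     */
    static float toPdfY(float pageHeight, float topMargin, float offsetFromTop) {
        return pageHeight - topMargin - offsetFromTop;
    }

    public static void main(String[] args) {
        // First line, one inch (72 pt) below the top edge of a Letter page
        System.out.println(toPdfY(LETTER_HEIGHT, 72f, 0f));  // 720.0
        // Third line with a 15 pt line height (two lines down)
        System.out.println(toPdfY(LETTER_HEIGHT, 72f, 30f)); // 690.0
    }
}
```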

Converting HTML to PDF with PDFBox

To convert HTML content to a PDF, you need to parse the HTML and manually draw its content onto a PDF document using PDFBox. This process can be complex and requires handling the HTML structure, styles, and elements like text and images individually.

Here’s a minimal example that uses jsoup to extract paragraph text from HTML and draws it with PDFBox:

import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HtmlToPdfWithPDFBox {
    public static void main(String[] args) {
        // Load your HTML document
        String html = "<html><body><p>Hello, PDFBox with HTML content!</p></body></html>";
        Document doc = Jsoup.parse(html);

        float margin = 100;     // x position of the text, in points
        float yStart = 750;     // baseline of the first line, measured from the bottom
        float lineHeight = 15;  // distance between baselines

        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage();
            document.addPage(page);
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                contentStream.beginText();
                // Set font and font size (PDFBox 2.x API)
                contentStream.setFont(PDType1Font.HELVETICA, 12);
                contentStream.setLeading(lineHeight);
                // newLineAtOffset() is relative to the start of the current line, so
                // position the first line once and let newLine() advance by the leading
                contentStream.newLineAtOffset(margin, yStart);
                // Parse and draw the text of each <p> element
                for (Element paragraph : doc.select("p")) {
                    // text() collapses whitespace, so the string has no line breaks.
                    // showText() does not wrap long text and rejects characters the
                    // font cannot encode (WinAnsi for the standard 14 fonts).
                    contentStream.showText(paragraph.text());
                    contentStream.newLine();
                }
                contentStream.endText();
            }
            document.save("HtmlToPdf.pdf");
            System.out.println("PDF created successfully from HTML.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

This example only handles plain paragraph text. A fuller converter would need to walk the rest of the HTML tree (headings, lists, tables, images) and map styles such as font size and weight onto PDFBox fonts. It would also need to wrap long lines to the page width and start a new page when the current one is full, and then draw each element onto the PDF with PDFBox.
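
`showText()` does not wrap text, so a long paragraph runs off the right edge of the page. One common approach is to break the text into lines that fit a maximum width and then move to a new page once those lines no longer fit vertically. In PDFBox, the width of a string in points is `font.getStringWidth(text) / 1000 * fontSize`. The sketch below accepts that measurement as a function, so it stays independent of PDFBox. It is a simple greedy word wrap. It assumes words are separated by whitespace and that every line takes up exactly `lineHeight`. A single word wider than the line is placed on its own line and will still overflow. The `TextLayout` class and its methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class TextLayout {

    /** Greedy word wrap: packs as many words onto each line as fit within maxWidth. */
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // happens only for empty input
            String candidate = line.length() == 0 ? word : line + " " + word;
            if (line.length() == 0 || measure.applyAsDouble(candidate) <= maxWidth) {
                // Fits, or it is the first word on the line (overlong words get their own line)
                line.setLength(0);
                line.append(candidate);
            } else {
                // Current line is full: emit it and start a new one with this word
                lines.add(line.toString());
                line.setLength(0);
                line.append(word);
            }
        }
        if (line.length() > 0) lines.add(line.toString());
        return lines;
    }

    /** Splits lines into pages, assuming each line takes lineHeight of the usable height. */
    static List<List<String>> paginate(List<String> lines, float usableHeight, float lineHeight) {
        int perPage = Math.max(1, (int) (usableHeight / lineHeight));
        List<List<String>> pages = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += perPage) {
            pages.add(new ArrayList<>(lines.subList(i, Math.min(i + perPage, lines.size()))));
        }
        return pages;
    }

    public static void main(String[] args) {
        // Stand-in measurement: 6 points per character (a real converter would ask the PDFont)
        ToDoubleFunction<String> measure = s -> s.length() * 6.0;
        List<String> lines = wrap("the quick brown fox jumps over the lazy dog", 60, measure);
        System.out.println(lines); // [the quick, brown fox, jumps over, the lazy, dog]
        System.out.println(paginate(lines, 30, 15).size()); // 3
    }
}
```

To use this with PDFBox, pass a measure such as `s -> { try { return font.getStringWidth(s) / 1000 * 12; } catch (IOException e) { throw new UncheckedIOException(e); } }`. The `try`/`catch` is needed because `getStringWidth` declares `IOException`. Then, for each page that `paginate` returns, add a new `PDPage` and draw its lines with `showText()` and `newLine()`.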

A better approach: HTML to PDF using Templated

The examples above show how to build a PDF from HTML with PDFBox and an HTML parser. However, tasks such as generating PDFs from templates or keeping track of the PDFs you generate require additional work.

For instance, to keep track of generated PDFs, you’ll need to build your own system for recording the files you create. Similarly, if you want to use custom templates, such as those for invoices or certificates, you must create and manage those templates yourself.

An alternative is Templated, an API-based platform for PDF and image generation that is well suited to these use cases. Its PDF generation API is powered by a Chromium-based rendering engine that fully supports HTML, CSS, and JavaScript.

This approach removes the need to manage templates and track generated files yourself. To get started with PDF generation using Templated, follow the Templated documentation.

Create PDFs with templates

With Templated’s PDF generation API, you can design your PDF template in a drag-and-drop editor.

After logging in, you will see your Dashboard, where you can manage your templates or create new ones.


From your Dashboard, you can design your own templates or customize an existing one from our Template Gallery, which offers 100+ free templates that you can adapt to your needs. This example uses the Certificate of Achievement template.


To start using the Templated API, you need your API key, which you can find on the API Integration tab of your Dashboard.


Now that your Templated account is ready, let’s see how to integrate your application with the API. This example uses the certificate template to generate a PDF. It sends the request with the JDK’s HttpURLConnection and builds the JSON body with the org.json library.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.json.JSONObject;

public class TemplatedApiRequest {
    public static void main(String[] args) {
        // Replace with your API key and the ID of your certificate template
        String apiKey = "YOUR_API_KEY";
        String templateId = "YOUR_TEMPLATE_ID";
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("https://api.templated.io/v1/render").toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setRequestProperty("Authorization", "Bearer " + apiKey);
            conn.setDoOutput(true);

            JSONObject payload = new JSONObject();
            payload.put("template", templateId);
            payload.put("format", "pdf");

            // Each key in "layers" names a layer in the template; "text" sets its content
            JSONObject layers = new JSONObject();
            layers.put("date", new JSONObject().put("text", "02/10/2024"));
            layers.put("name", new JSONObject().put("text", "John Doe"));
            layers.put("signature", new JSONObject().put("text", "Dr. Mark Brown"));
            layers.put("details", new JSONObject().put("text",
                    "This certificate is awarded to John Doe in recognition of their successful "
                    + "completion of Computer Science Degree on 02/10/2024."));
            payload.put("layers", layers);

            // Send the JSON body as UTF-8; try-with-resources closes the stream
            try (OutputStream os = conn.getOutputStream()) {
                os.write(payload.toString().getBytes(StandardCharsets.UTF_8));
            }

            int responseCode = conn.getResponseCode();
            System.out.println("Response Code : " + responseCode);

            // Error responses (4xx/5xx) can only be read from the error stream
            InputStream body = responseCode >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (body == null) {
                return; // no response body to read
            }
            StringBuilder response = new StringBuilder();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(body, StandardCharsets.UTF_8))) {
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    response.append(inputLine);
                }
            }
            // Print the response
            System.out.println(response);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

If the request succeeds, the response looks like this:

{
  "renderUrl": "PDF_URL",
  "status": "success",
  "template": "YOUR_TEMPLATE_ID"
}

As the code above shows, generating a PDF with Templated is straightforward. You make a single API call with your data in the request body. You don’t need a PDF library or an HTML parser; apart from org.json for building the JSON, the example uses only the JDK’s built-in HTTP classes.

You can use the renderUrl from the response to download or distribute the generated PDF.
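
Downloading the file from `renderUrl` is a plain HTTP GET. Below is a minimal sketch that uses the JDK’s `java.net.http.HttpClient` (Java 11+). It assumes the render URL can be fetched without the API key. The `downloadPdf` method and the `certificate.pdf` file name are hypothetical:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DownloadRenderedPdf {

    /** Hypothetical helper: fetches renderUrl into target and fails on a non-2xx status. */
    static Path downloadPdf(String renderUrl, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(renderUrl)).GET().build();
        // TRUNCATE_EXISTING prevents leftover bytes when overwriting a larger, older file
        HttpResponse<Path> response = client.send(request, HttpResponse.BodyHandlers.ofFile(
                target, StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING));
        if (response.statusCode() / 100 != 2) {
            Files.deleteIfExists(target); // the file would only contain the error body
            throw new IOException("Download failed with HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Pass the renderUrl value from the API response as the first argument
        Path saved = downloadPdf(args[0], Path.of("certificate.pdf"));
        System.out.println("Saved " + Files.size(saved) + " bytes to " + saved);
    }
}
```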

Other Java libraries

Other Java libraries can also convert HTML to PDF. You can find more information about them in the article How To Convert HTML to PDF with Java.

Conclusion

While Apache PDFBox is a powerful tool for PDF manipulation, using it for HTML to PDF conversion means parsing the HTML yourself and drawing each element as PDF content. For simpler use cases, PDFBox offers straightforward methods to create and manipulate PDF documents. For complex HTML documents, consider using a combination of HTML parsing libraries and PDFBox for rendering, or explore other libraries specifically designed for HTML to PDF conversion.
