PDF generation with Serverless + AWS Lambda and Puppeteer.

Shubham Pandey
4 min read · Jun 14, 2021

In this article, I will walk you through generating PDFs with AWS Lambda, the Serverless Framework, and Puppeteer.

Recently I was working on a requirement to generate PDFs on the server side because the data involved was highly dynamic. Luckily, there are tools like Puppeteer that help us generate PDFs.

Puppeteer Documentation:
Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.

Puppeteer basically launches a headless Chrome browser in which you can take screenshots of HTML pages or generate PDFs of them with the configuration of your choice.

Let’s start doing actual stuff:

Steps to follow:

  • Create a project.
  • Get all required dependencies.
  • Create your serverless.yml configuration.
  • Write a function to create a pdf.
  • Create templates to help organize the HTML better (optional).

Create a project

First, create a folder for the project where we will build this service, and run npm init inside it.

Once you run npm init, you will be asked for some configuration values. Fill them in as you see fit.

Get all required dependencies

Once npm init finishes, you will have a package.json file in your project folder.

Add the dependencies below to it:

"dependencies": {
  "aws-sdk": "^2.719.0",
  "chrome-aws-lambda": "^3.1.1",
  "express": "^4.17.1",
  "puppeteer-core": "^3.1.0",
  "serverless-apigw-binary": "^0.4.4",
  "serverless-http": "^2.7.0",
  "serverless-plugin-include-dependencies": "^4.1.0"
}

chrome-aws-lambda ships a Chromium build that runs in the Lambda environment and lets you drive it through puppeteer-core. If you try to use the full puppeteer package directly, you may face browser version compatibility issues or Node version issues.

Run npm install after adding these dependencies.

Create your serverless.yml configuration:

Now it's time to set up your serverless config, where we bind the handler that will create PDFs for us and, optionally, set up an S3 bucket if you want to store the PDFs in S3.

We need to set up the apigwBinary types (used by the serverless-apigw-binary plugin), otherwise the endpoint will not work.

A basic setup for generating a PDF out of HTML requires only this much configuration, nothing more.
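As a rough illustration, a serverless.yml along the following lines would tie together the pieces mentioned in this article (the nodejs12.x runtime, the handler in index.js, the two plugins from package.json, and the apigwBinary types). This is only a sketch: the service name, region, memorySize, and timeout values are assumptions chosen for illustration, not values taken from the original project.

```yaml
# Hypothetical sketch; service name, region, memorySize and timeout are assumptions
service: pdf-generator

provider:
  name: aws
  runtime: nodejs12.x
  region: us-east-1
  memorySize: 1024   # headless Chromium needs a generous amount of memory
  timeout: 30        # API Gateway requests time out after roughly 30 seconds

plugins:
  - serverless-apigw-binary
  - serverless-plugin-include-dependencies

custom:
  apigwBinary:
    types:
      - '*/*'

functions:
  pdfGenerator:
    handler: index.handler   # module.exports.handler in index.js
    events:
      - http:
          path: pdf-generator
          method: post
```

The http event path matches the /pdf-generator route registered in the Express app, so a POST to that path reaches the handler.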

Write a function to create a pdf:

Once your configuration is done, let's write some code that will create a PDF from HTML content.

We have configured a function that runs in the nodejs12.x environment, and its handler will live in our index file.

Let’s create our index file:

# run this from the terminal
touch index.js

Content of index.js:

const serverless = require('serverless-http');
const express = require('express');
const chromium = require('chrome-aws-lambda');
// template-based HTML generator, explained in the next step
const htmlGenerator = require('./reportTemplate');

const app = express();

async function createPdf(url, res) {
  // options for the pdf
  const options = {
    format: 'A4',
    printBackground: true,
    margin: {
      bottom: 70,
      top: 0,
      left: 0,
      right: 0,
    },
  };
  let browser = null;
  try {
    browser = await chromium.puppeteer.launch({
      headless: true,
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
    });
    // open a new page
    const page = await browser.newPage();
    // navigate to the specified url
    await page.goto(url);
    // generate the pdf (returned as a Buffer) with the options above
    const pdfBuffer = await page.pdf(options);
    // convert the buffer to base64
    const b64 = pdfBuffer.toString('base64');
    res.send({ statusCode: 200, pdfData: b64 });
  } catch (err) {
    // an Error object serializes to {}, so send its message instead
    res.send({ statusCode: 500, userMessage: err.message });
  } finally {
    // always close the browser, even when something fails
    if (browser !== null) {
      await browser.close();
    }
  }
}

app.post('/pdf-generator', async function (req, res) {
  // assumes no body-parsing middleware, so req.body is the raw body Buffer
  const body = JSON.parse(req.body.toString());
  const url = body.url;
  await createPdf(url, res);
});

module.exports.handler = serverless(app);

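Here htmlGenerator is a plain HTML template generator that builds HTML from the JSON sent in a POST API request. The listing above loads the page from a URL; the template-based flow is described below.
I will explain the request payload and templating in the next step.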

Here is what we are trying to do:

  • We create PDFs from JSON templates, which contain data from a database as well as static data.
  • The JSON templates are passed to htmlGenerator, which checks which template to pick (explained in the next step) and generates an HTML string.
  • The returned HTML string is passed to createPdf, a function that uses Puppeteer to launch a headless Chrome browser and sets the page content to the HTML string (for example with page.setContent instead of page.goto).
  • Next, we create a PDF out of the HTML page and convert it to base64, which can be returned to the caller or saved to an S3 bucket to generate links.
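The last two steps of the flow above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the article's actual implementation: renderHtmlToBase64 and buildS3PutParams are hypothetical helper names, page is assumed to be a Puppeteer page obtained from browser.newPage(), and the bucket and key are placeholders supplied by the caller.

```javascript
// Hypothetical helper: render an HTML string to a base64-encoded PDF.
// `page` is assumed to be a Puppeteer Page created with browser.newPage().
async function renderHtmlToBase64(page, html, pdfOptions) {
  // Load the HTML string directly instead of navigating to a URL;
  // 'networkidle0' waits until the page has no open network connections.
  await page.setContent(html, { waitUntil: 'networkidle0' });
  // page.pdf() resolves to a Buffer holding the PDF bytes.
  const pdfBuffer = await page.pdf(pdfOptions);
  return pdfBuffer.toString('base64');
}

// Hypothetical helper: build the parameters for an S3 putObject call.
// Bucket and key are placeholders chosen by the caller.
function buildS3PutParams(bucket, key, b64) {
  return {
    Bucket: bucket,
    Key: key,
    // decode the base64 string back into the raw PDF bytes
    Body: Buffer.from(b64, 'base64'),
    ContentType: 'application/pdf',
  };
}
```

With aws-sdk v2, the resulting parameters could then be passed to new AWS.S3().putObject(params).promise() to store the file. Taking page as an argument keeps the browser setup (chromium.puppeteer.launch) in one place and makes the helper easy to exercise with a stub.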

Create HTML templates to help in organizing stuff better (optional):

Our POST API request:

{
  "data": {
    "pages": [
      {
        "content": [
          {
            "type": 1,
            "title": "I am title",
            "subtitle": "I am subtitle."
          }
        ]
      }
    ]
  }
}

Here, each content block has a type, which is the template code that identifies the template to use. In this example we use type = 1.

We can pass this type to a function that maps the template code to the matching template and fills it with the data given here (title and subtitle).

So we can have a plain HTML string with the title and subtitle as dynamic data (variables) in the template.

For example:

const titleSubtitle = (template) => {
  return `
<div style="display:flex;">
  <div style="font-size:20px;font-weight:bold">
    ${template.title}
  </div>
  <div style="font-size:20px;font-weight:bold">
    ${template.subtitle}
  </div>
</div>`;
};

Once we pass a content block to titleSubtitle, it returns a plain HTML string that can be used to build the HTML content of the page.
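To make the type-to-template mapping concrete, here is one possible shape for htmlGenerator. The original reportTemplate module is not shown in this article, so everything below is an assumption for illustration: the templates registry, the error on unknown types, and the surrounding html/body wrapper are all hypothetical.

```javascript
// Template for type 1: a title and a subtitle
const titleSubtitle = (template) => `
<div style="display:flex;">
  <div style="font-size:20px;font-weight:bold">${template.title}</div>
  <div style="font-size:20px;font-weight:bold">${template.subtitle}</div>
</div>`;

// Hypothetical registry mapping a template code to its template function
const templates = {
  1: titleSubtitle,
};

// Hypothetical htmlGenerator: turns the request payload into one HTML string
function htmlGenerator(payload) {
  const body = payload.data.pages
    .map((page) =>
      page.content
        .map((block) => {
          // pick the template function that matches the block's type
          const template = templates[block.type];
          if (!template) {
            throw new Error(`Unknown template type: ${block.type}`);
          }
          return template(block);
        })
        .join('')
    )
    .join('');
  return `<html><body>${body}</body></html>`;
}
```

Exported from reportTemplate.js with module.exports = htmlGenerator, this would line up with the require in index.js. Adding a new template then only means writing one more function and registering it under a new type code. Note that the values are inserted unescaped, so untrusted input should be HTML-escaped first.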

I hope that by following the steps above (the last one is optional), you will be able to set up your own PDF generation service with Node.js, Express.js, AWS Lambda, Puppeteer, and the Serverless Framework.

Thanks for reading! I am open to any suggestions or improvements from readers. Leave a comment if you get stuck at any step; I am happy to help :).

Shubham Pandey

A traveler and reader working with JS, React, Python, AWS and Serverless