Combining PDF documents using iText7 and C#

Using a list of PDFs, it's easy to merge together all the documents into one.

Photo by Scott Graham / Unsplash

This is a quick and not-so-pretty tree of code: it simply takes any IEnumerable<byte[]>, where each byte[] holds one PDF.

First we'd need our list of PDFs, which could look like this:

var manual = File.ReadAllBytes(@"C:\Users\Niko Uusitalo\Documents\Manual.pdf");
var receipt = File.ReadAllBytes(@"C:\Users\Niko Uusitalo\Documents\receipt.pdf");

var pdfList = new List<byte[]> { manual, receipt };
An example of getting file bytes and poor variable naming.
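If the PDFs already sit together in one folder, the list can also be built from the folder's contents. The sketch below is an illustration, not part of the original post: the PdfSource class and LoadPdfs helper are hypothetical names, and sorting by file name is an assumption made so that the merge order is predictable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PdfSource
{
    // Hypothetical helper: reads every *.pdf file in a folder into memory,
    // ordered by file name (case-insensitive) so the merge order is stable.
    public static List<byte[]> LoadPdfs(string folder)
    {
        return Directory
            .EnumerateFiles(folder, "*.pdf")
            .OrderBy(path => Path.GetFileName(path), StringComparer.OrdinalIgnoreCase)
            .Select(File.ReadAllBytes)
            .ToList();
    }
}
```

The result is a List<byte[]>, so it can be passed straight to a method that accepts IEnumerable<byte[]>, for example `var pdfList = PdfSource.LoadPdfs(@"C:\Users\Niko Uusitalo\Documents");`.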

Then with iText7 we can create this tree of a method:

using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public byte[] Combine(IEnumerable<byte[]> pdfs)
{
    using (var writerMemoryStream = new MemoryStream())
    {
        using (var writer = new PdfWriter(writerMemoryStream))
        {
            using (var mergedDocument = new PdfDocument(writer))
            {
                var merger = new PdfMerger(mergedDocument);

                foreach (var pdfBytes in pdfs)
                {
                    using (var copyFromMemoryStream = new MemoryStream(pdfBytes))
                    {
                        using (var reader = new PdfReader(copyFromMemoryStream))
                        {
                            using (var copyFromDocument = new PdfDocument(reader))
                            {
                                // Append every page (1 through the last) of this input to mergedDocument
                                merger.Merge(copyFromDocument, 1, copyFromDocument.GetNumberOfPages());
                            }
                        }
                    }
                }
            } // mergedDocument is closed here, which writes the finished PDF into writerMemoryStream
        }

        return writerMemoryStream.ToArray();
    }
}
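On C# 8 or later, most of the tree can be flattened with using declarations, which dispose at the end of the enclosing scope. This is a sketch, not the post's original method; the PdfCombiner class name is made up. One catch: a using declaration for the merged PdfDocument would only be disposed after the return statement runs, so ToArray() would see an unfinished PDF. That is why the merged document keeps an explicit block here.

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public static class PdfCombiner
{
    public static byte[] Combine(IEnumerable<byte[]> pdfs)
    {
        using var writerMemoryStream = new MemoryStream();

        // The merged document keeps its own block on purpose: it must be closed
        // before ToArray(), because closing it writes the finished PDF.
        using (var mergedDocument = new PdfDocument(new PdfWriter(writerMemoryStream)))
        {
            var merger = new PdfMerger(mergedDocument);

            foreach (var pdfBytes in pdfs)
            {
                // Using declaration: disposed at the end of each loop iteration.
                // The inner MemoryStream only wraps an existing byte[] and holds
                // no unmanaged resources, so it is not tracked separately.
                using var copyFromDocument = new PdfDocument(new PdfReader(new MemoryStream(pdfBytes)));
                merger.Merge(copyFromDocument, 1, copyFromDocument.GetNumberOfPages());
            }
        }

        // MemoryStream.ToArray() still works after the stream has been closed.
        return writerMemoryStream.ToArray();
    }
}
```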

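As many iText7 objects implement IDisposable, we have to release their resources ourselves, and the order matters: the merged PdfDocument must be closed before we read the bytes out, because closing it is what writes the finished PDF into the stream. That's why writerMemoryStream.ToArray() is only called after the inner using blocks have ended. From the outermost block inward, the objects are:

  1. The MemoryStream which will ultimately become the returned byte[]
  2. The PdfWriter that writes into that MemoryStream
  3. The PdfDocument being written to, forming the object representation of the returned PDF
  4. The crux of this method, PdfMerger, which merges the PDFs into that PdfDocument. It has no using block of its own; it works on mergedDocument, which is disposed for us
  5. copyFromMemoryStream, a MemoryStream over the current item from the input list
  6. The PdfReader that reads from that stream
  7. The PdfDocument (copyFromDocument) built on that reader, whose pages get merged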

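Feel free to swap out the using blocks for .Close() calls if that fits your style, but keep in mind that a using block guarantees disposal even when an exception is thrown, while a plain .Close() call does not.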
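For comparison, here is a rough sketch of the .Close() style. A using block compiles down to try/finally with a Dispose call, so keeping the same guarantee means writing the try/finally by hand. The PdfCombinerWithClose name is hypothetical, and the sketch relies on PdfDocument.Close() also closing the PdfWriter or PdfReader it was built on, which is iText's default behaviour.

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public static class PdfCombinerWithClose
{
    public static byte[] Combine(IEnumerable<byte[]> pdfs)
    {
        var writerMemoryStream = new MemoryStream();
        var mergedDocument = new PdfDocument(new PdfWriter(writerMemoryStream));
        try
        {
            var merger = new PdfMerger(mergedDocument);
            foreach (var pdfBytes in pdfs)
            {
                var copyFromDocument = new PdfDocument(new PdfReader(new MemoryStream(pdfBytes)));
                try
                {
                    merger.Merge(copyFromDocument, 1, copyFromDocument.GetNumberOfPages());
                }
                finally
                {
                    // Closes the source document and its PdfReader.
                    copyFromDocument.Close();
                }
            }
        }
        finally
        {
            // Closing the merged document writes the finished PDF into writerMemoryStream.
            mergedDocument.Close();
        }

        // MemoryStream.ToArray() still works after the stream has been closed.
        return writerMemoryStream.ToArray();
    }
}
```

It is noticeably longer than the using version, which is the usual argument for sticking with using.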

Finally, zooming in on just the Merge call, this overload takes three parameters:

merger.Merge(copyFromDocument, 1, copyFromDocument.GetNumberOfPages());
  1. The document we're copying from - copyFromDocument
  2. The first page to copy - page 1, as iText page numbers start at 1
  3. The last page to copy, inclusive - passing the total page count copies every page
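Because the second and third arguments are a first and last page, changing them selects a different range. As an illustration only, here is a hypothetical variant that drops the first page (say, a cover page) of each input; the class and method names are made up, and the single-page guard is an assumption about how you'd want such documents handled.

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public static class PdfPageRanges
{
    // Hypothetical variant: copies pages 2 through the last page of each input.
    public static byte[] CombineWithoutFirstPages(IEnumerable<byte[]> pdfs)
    {
        using (var writerMemoryStream = new MemoryStream())
        {
            using (var mergedDocument = new PdfDocument(new PdfWriter(writerMemoryStream)))
            {
                var merger = new PdfMerger(mergedDocument);

                foreach (var pdfBytes in pdfs)
                {
                    using (var copyFromDocument = new PdfDocument(new PdfReader(new MemoryStream(pdfBytes))))
                    {
                        var lastPage = copyFromDocument.GetNumberOfPages();

                        // A one-page document has nothing left once its first page is skipped.
                        if (lastPage >= 2)
                        {
                            merger.Merge(copyFromDocument, 2, lastPage);
                        }
                    }
                }
            }

            return writerMemoryStream.ToArray();
        }
    }
}
```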