How to Get All the Metadata for a PDF With iText7 and C#

In this post I'll take you through getting all the metadata for a PDF with iText7 and C#, in a way that lets you understand what's happening and copy the code straight into your codebase.

The following is a plug-and-play block of code that makes no assumptions about your codebase. There is most probably a more efficient way to incorporate it for your needs, such as passing more things around by reference or adding guard clauses. This example also uses the C# 8 using declaration syntax, but feel free to swap that for the usual using statement syntax.
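If the two syntaxes are unfamiliar, here is a minimal sketch of the difference using nothing but a MemoryStream. The class and method names (UsingSyntaxDemo, FirstByteWithDeclaration, FirstByteWithStatement) are made up for illustration and are not part of iText7.

```csharp
using System.IO;

public static class UsingSyntaxDemo
{
    // C# 8 using declaration: the stream is disposed when the
    // enclosing scope (here, the method) ends.
    public static int FirstByteWithDeclaration(byte[] data)
    {
        using var stream = new MemoryStream(data);
        return stream.ReadByte();
    }

    // Classic using statement: the stream is disposed at the
    // closing brace of the using block.
    public static int FirstByteWithStatement(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        {
            return stream.ReadByte();
        }
    }
}
```

Both methods behave the same; the declaration form just saves a level of nesting.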

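The GetAllMetadata method below takes the PDF as a byte[], opens it inside using declarations so everything is disposed afterwards, and copies every entry of the document's Info dictionary (title, author, producer and so on) into a Dictionary<string, string> for you to consume. If you already have the PdfDocument object in your code, then your task is even easier.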

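// These using directives go at the top of your file.
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;

public Dictionary<string, string> GetAllMetadata(byte[] pdf)
{
    var metadataDictionary = new Dictionary<string, string>();
    using var inputStream = new MemoryStream(pdf);
    using var reader = new PdfReader(inputStream);
    using var document = new PdfDocument(reader);

    // The Info dictionary is optional, so a PDF may not have one at all.
    var metadataInfo = document.GetTrailer().GetAsDictionary(PdfName.Info);
    if (metadataInfo == null)
    {
        return metadataDictionary;
    }

    foreach (var key in metadataInfo.KeySet())
    {
        // Most entries are strings, but some (such as Trapped) are names,
        // so fall back to ToString() instead of casting to PdfString.
        var text = metadataInfo.GetAsString(key);
        var value = text != null ? text.ToUnicodeString() : metadataInfo.Get(key)?.ToString();
        metadataDictionary.Add(key.GetValue(), value ?? string.Empty);
    }

    return metadataDictionary;
}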
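For the case where you already have a PdfDocument open, a rough sketch could look like the following. The wrapper class name PdfMetadataReader is hypothetical. The sketch also assumes you want non-string entries returned via ToString() rather than skipped, so adjust that to taste.

```csharp
using System.Collections.Generic;
using iText.Kernel.Pdf;

public static class PdfMetadataReader
{
    // Hypothetical helper: reads the Info dictionary from a document
    // the caller has already opened. The caller stays responsible for
    // disposing the document.
    public static Dictionary<string, string> GetAllMetadata(PdfDocument document)
    {
        var metadata = new Dictionary<string, string>();

        // No Info dictionary means there is nothing to read.
        var info = document.GetTrailer().GetAsDictionary(PdfName.Info);
        if (info == null)
        {
            return metadata;
        }

        foreach (var key in info.KeySet())
        {
            // GetAsString returns null when the entry is not a PdfString.
            var text = info.GetAsString(key);
            var value = text != null ? text.ToUnicodeString() : info.Get(key)?.ToString();
            metadata[key.GetValue()] = value ?? string.Empty;
        }

        return metadata;
    }
}
```

Because the method never opens or closes anything itself, it can be called in the middle of other work on the same document.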

Want to quickly spin this up to play with it? You can get your PDF byte[] via:

var pdf = File.ReadAllBytes(@"C:\document.pdf");
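To see what came back, you could dump the dictionary to the console. This is only a sketch: MetadataPrinter and its Print method are names invented here, and sorting the keys is my own choice to keep the output stable between runs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MetadataPrinter
{
    // Hypothetical helper: writes one "key: value" line per entry,
    // ordered by key using an ordinal comparison.
    public static void Print(IDictionary<string, string> metadata, TextWriter output)
    {
        var sorted = new SortedDictionary<string, string>(metadata, StringComparer.Ordinal);
        foreach (var entry in sorted)
        {
            output.WriteLine($"{entry.Key}: {entry.Value}");
        }
    }
}
```

With the pdf bytes from above, a call such as MetadataPrinter.Print(GetAllMetadata(pdf), Console.Out); would list every entry the method found.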