How to Get the Standard Metadata for a PDF With iText7 and C#
In this post I'll walk you through reading the standard set of metadata from a PDF with iText7 and C#, in a way that lets you understand what's happening and copy the code straight into your codebase.
The following is a plug-and-play block of code that makes no assumptions about your codebase. There is most likely a more efficient way to incorporate it for your needs, such as passing more things around by reference or adding guard clauses. This example also uses the newer C# 8 using declaration syntax, but feel free to swap that out for the usual using statement syntax.
The code below takes in a byte[] of the PDF, safely opens it and extracts the default metadata into a Dictionary<string, string> for you to consume. If you already have the PdfDocument object in your code, then your task is even easier, since you can call GetDocumentInfo() on it directly.
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;

public Dictionary<string, string> GetStandardMetadata(byte[] pdf)
{
    var metadataDictionary = new Dictionary<string, string>();

    // Each using declaration is disposed when the method returns, in reverse order.
    using var inputStream = new MemoryStream(pdf);
    using var reader = new PdfReader(inputStream);
    using var document = new PdfDocument(reader);

    // The document info object holds the standard metadata entries.
    // A value may be null if the PDF does not set that entry.
    var documentInfo = document.GetDocumentInfo();
    metadataDictionary.Add("Title", documentInfo.GetTitle());
    metadataDictionary.Add("Author", documentInfo.GetAuthor());
    metadataDictionary.Add("Subject", documentInfo.GetSubject());
    metadataDictionary.Add("Creator", documentInfo.GetCreator());
    metadataDictionary.Add("Producer", documentInfo.GetProducer());
    metadataDictionary.Add("Keywords", documentInfo.GetKeywords());

    return metadataDictionary;
}
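If you can't use C# 8 using declarations, or you already hold an open PdfDocument, the same idea can be split into two overloads. The following is only a sketch under those assumptions: the class name PdfMetadataReader is made up for illustration and is not part of iText, and the PdfDocument overload deliberately leaves disposal to the caller.

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;

// Hypothetical helper class; the name is illustrative and not part of iText.
public static class PdfMetadataReader
{
    // For callers that only have the raw bytes. Uses classic using
    // statements instead of C# 8 using declarations.
    public static Dictionary<string, string> GetStandardMetadata(byte[] pdf)
    {
        using (var inputStream = new MemoryStream(pdf))
        using (var reader = new PdfReader(inputStream))
        using (var document = new PdfDocument(reader))
        {
            return GetStandardMetadata(document);
        }
    }

    // For callers that already hold an open PdfDocument.
    // The caller keeps ownership, so nothing is disposed here.
    public static Dictionary<string, string> GetStandardMetadata(PdfDocument document)
    {
        var documentInfo = document.GetDocumentInfo();

        // Values may be null when the PDF does not set an entry.
        return new Dictionary<string, string>
        {
            ["Title"] = documentInfo.GetTitle(),
            ["Author"] = documentInfo.GetAuthor(),
            ["Subject"] = documentInfo.GetSubject(),
            ["Creator"] = documentInfo.GetCreator(),
            ["Producer"] = documentInfo.GetProducer(),
            ["Keywords"] = documentInfo.GetKeywords()
        };
    }
}
```

Keeping the extraction in the PdfDocument overload means the byte[] version is just a thin wrapper that opens and closes the document around it.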
Want to quickly spin this up to play with it? You can get your PDF byte[] via:
var pdf = File.ReadAllBytes(@"C:\document.pdf");
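Once you have the dictionary back from GetStandardMetadata, a quick way to inspect it is to write each entry out. Here is a minimal sketch, assuming you want entries that are null or empty shown as a placeholder; the MetadataPrinter class and the "(not set)" text are my own illustrative choices, not anything from iText.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for eyeballing the result; not part of iText.
public static class MetadataPrinter
{
    // Writes one "Key: value" line per entry. Entries whose value is
    // null or empty are shown as "(not set)".
    public static void Print(IReadOnlyDictionary<string, string> metadata, TextWriter output)
    {
        foreach (var entry in metadata)
        {
            var value = string.IsNullOrEmpty(entry.Value) ? "(not set)" : entry.Value;
            output.WriteLine($"{entry.Key}: {value}");
        }
    }
}
```

In a console app you could then call MetadataPrinter.Print(GetStandardMetadata(pdf), Console.Out) to list the entries.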