How to Natively Read .tgz Files With the New C# TarReader Class

In .NET 7 we can now natively decompress, unpack, and read .tgz/.tar.gz archives without third-party libraries or complex code.


We now have the System.Formats.Tar namespace, which lets us work with .tar files natively.

This came from a fantastic API proposal by Carlos Sanchez, and it fits with making .NET cross-platform, since *nix-based systems often use tarballs for archiving.

Since reading plain .tar files is already covered in the documentation, let's combine this new API with the existing GZipStream API to open another common *nix file type: the GZip-compressed .tar, known as .tgz.

using System;
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;

using var gzip = new GZipStream(tgzStream, CompressionMode.Decompress);

// Decompress into a seekable MemoryStream, then rewind it to the start
using var unzippedStream = new MemoryStream();
await gzip.CopyToAsync(unzippedStream);
unzippedStream.Seek(0, SeekOrigin.Begin);

using var reader = new TarReader(unzippedStream);

// The type pattern ends the loop once GetNextEntry() returns null
while (reader.GetNextEntry() is TarEntry entry)
{
    Console.WriteLine($"Entry name: {entry.Name}, entry type: {entry.EntryType}");
    //entry.ExtractToFile(destinationFileName: Path.Join("D:/MyExtractionFolder/", entry.Name), overwrite: false);
}
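The commented-out ExtractToFile line saves one entry at a time. If you just want everything on disk, System.Formats.Tar also has TarFile.ExtractToDirectory, which accepts a Stream, so it can read straight from the GZipStream. Below is a minimal sketch under a few assumptions: the sample archive (a single hypothetical hello.txt) is built in memory with TarWriter purely so the code runs without a real file, and the temp destination folder name is made up for illustration.

```csharp
using System;
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical sample: build a tiny .tgz in memory so the sketch runs without a real file.
// In practice tgzStream would be something like File.OpenRead("package.tgz").
var tgzStream = new MemoryStream();
using (var gzipOut = new GZipStream(tgzStream, CompressionLevel.Optimal, leaveOpen: true))
using (var writer = new TarWriter(gzipOut, TarEntryFormat.Pax, leaveOpen: true))
{
    writer.WriteEntry(new PaxTarEntry(TarEntryType.RegularFile, "hello.txt")
    {
        DataStream = new MemoryStream(Encoding.UTF8.GetBytes("Hello from the tarball"))
    });
}
tgzStream.Seek(0, SeekOrigin.Begin);

// Illustrative destination; TarFile.ExtractToDirectory expects the folder to exist already.
string destination = Path.Join(Path.GetTempPath(), "MyExtractionFolder-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(destination);

// Decompress and unpack every entry into the destination folder in one call.
using (var gzip = new GZipStream(tgzStream, CompressionMode.Decompress))
{
    TarFile.ExtractToDirectory(gzip, destination, overwriteFiles: false);
}

string extractedFile = Path.Join(destination, "hello.txt");
Console.WriteLine(File.ReadAllText(extractedFile));
```

With overwriteFiles: false, extraction should fail rather than silently replace files that already exist in the destination.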

Notes

  • tgzStream is any Stream object that represents the .tgz file.
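  • Why copy the GZipStream into a MemoryStream? A GZipStream can't seek, so buffering the decompressed .tar gives us a seekable stream that we can rewind to the start before handing it to TarReader.
  • The commented-out line is taken straight from the documentation and shows how to save each entry from the archive to disk.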
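On the MemoryStream point: as far as I can tell from the TarReader documentation, TarReader itself accepts a non-seekable stream. The catch is that each entry's DataStream is then only readable until the next GetNextEntry() call (or you pass copyData: true to have it copied into a MemoryStream). Buffering is perfectly fine for small npm packages, but here's a sketch that streams instead and reads each entry's bytes immediately. The two sample entries are hypothetical and built in memory only so the code runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical sample: a tiny .tgz built in memory so the sketch runs on its own.
var tgzStream = new MemoryStream();
using (var gzipOut = new GZipStream(tgzStream, CompressionLevel.Optimal, leaveOpen: true))
using (var writer = new TarWriter(gzipOut, TarEntryFormat.Pax, leaveOpen: true))
{
    foreach (var (name, text) in new[] { ("package/a.txt", "first"), ("package/b.txt", "second") })
    {
        writer.WriteEntry(new PaxTarEntry(TarEntryType.RegularFile, name)
        {
            DataStream = new MemoryStream(Encoding.UTF8.GetBytes(text))
        });
    }
}
tgzStream.Seek(0, SeekOrigin.Begin);

// No intermediate MemoryStream: TarReader reads straight from the unseekable GZipStream.
using var gzip = new GZipStream(tgzStream, CompressionMode.Decompress);
using var reader = new TarReader(gzip);

var contents = new Dictionary<string, string>();
while (reader.GetNextEntry() is TarEntry entry)
{
    Console.WriteLine($"Entry name: {entry.Name}, entry type: {entry.EntryType}");

    // With an unseekable source, DataStream is only readable until the next GetNextEntry() call,
    // so copy the bytes out now (without disposing DataStream).
    if (entry.EntryType == TarEntryType.RegularFile && entry.DataStream is Stream data)
    {
        using var buffer = new MemoryStream();
        data.CopyTo(buffer);
        contents[entry.Name] = Encoding.UTF8.GetString(buffer.ToArray());
    }
}
```

The trade-off is memory versus access pattern: streaming avoids holding the whole .tar in memory, but you lose the ability to go back to an earlier entry's data.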

And finally, the output of the WriteLine() calls above:

[Screenshot: console output showing all the archive entries from a .tgz.]

I also really like how the while loop is written. Originally I had a loose TarEntry? variable above the loop to hold each iteration's value, but thanks to newer pattern matching syntax we can put it all on one line! Inspiration came from Nick Craver.
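To make the difference concrete, here's a sketch of both loop shapes side by side over the same in-memory .tar. The helper names and the two sample entry names are hypothetical; both helpers should return the same list of entry names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;

// Hypothetical sample: a small uncompressed .tar built in memory.
var tar = new MemoryStream();
using (var writer = new TarWriter(tar, TarEntryFormat.Pax, leaveOpen: true))
{
    writer.WriteEntry(new PaxTarEntry(TarEntryType.RegularFile, "a.txt"));
    writer.WriteEntry(new PaxTarEntry(TarEntryType.RegularFile, "b.txt"));
}

tar.Seek(0, SeekOrigin.Begin);
List<string> before = NamesWithLooseVariable(tar);
tar.Seek(0, SeekOrigin.Begin);
List<string> after = NamesWithPattern(tar);
Console.WriteLine(string.Join(", ", after));

// Before: a nullable variable declared outside the loop holds each entry.
static List<string> NamesWithLooseVariable(Stream tarStream)
{
    using var reader = new TarReader(tarStream, leaveOpen: true);
    var names = new List<string>();
    TarEntry? entry;
    while ((entry = reader.GetNextEntry()) != null)
    {
        names.Add(entry.Name);
    }
    return names;
}

// After: the type pattern null-checks and declares `entry` inside the loop condition.
static List<string> NamesWithPattern(Stream tarStream)
{
    using var reader = new TarReader(tarStream, leaveOpen: true);
    var names = new List<string>();
    while (reader.GetNextEntry() is TarEntry entry)
    {
        names.Add(entry.Name);
    }
    return names;
}
```

A side benefit of the pattern version is scoping: `entry` only exists inside the loop, so it can't be accidentally used after the archive is exhausted.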

Troubleshooting

You might run into the following:

System.IO.InvalidDataException: 'Found truncated data while decoding.'

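You may need to seek the tgzStream variable back to the beginning before using it in the code above. For example, if you've just written the downloaded bytes into a MemoryStream, its position will be at the end: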

tgzStream.Seek(0, SeekOrigin.Begin);
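Seek() only works when the stream supports it; on an unseekable stream (a network response stream, for instance) it throws NotSupportedException. Here's a small sketch with a hypothetical RewindToStart helper that checks CanSeek first, using a MemoryStream with a few placeholder bytes to stand in for a freshly downloaded file.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a just-downloaded .tgz: writing leaves Position at the end.
var tgzStream = new MemoryStream();
tgzStream.Write(new byte[] { 0x1f, 0x8b, 0x08 });
Console.WriteLine($"Position after writing: {tgzStream.Position}");

RewindToStart(tgzStream);
Console.WriteLine($"Position after rewinding: {tgzStream.Position}");

// Hypothetical helper: rewind to the start, failing with a clear message when seeking isn't possible.
static void RewindToStart(Stream stream)
{
    if (!stream.CanSeek)
    {
        throw new InvalidOperationException(
            "This stream can't seek; reopen it or copy it into a MemoryStream from the beginning instead.");
    }
    stream.Seek(0, SeekOrigin.Begin);
}
```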

To Conclude

I hope you found this post useful. I had to write it after I realised how easy it was nowadays when opening up NPM .tgz payloads for my project to create Pokémon spritesheets:

GitHub - nikouu/pokesprite-spritesheet: The tilesheets for my needs for my Living Dex project via https://github.com/msikma/pokesprite