Programming Journey Entry 10: File limits and workarounds

All of these Programming Journey posts can be found in the associated category of this blog.


PowerShell, again

As mentioned in the last post and several others, I started a simple-sounding PowerShell script to zip my Steam game library. I mostly finished it, right up until it collapsed like a house of cards (checkmate), at which point I decided to start over and do it more correctly.


It’s always something

A couple of days ago I decided to take a short break from this specific script and explore a very different approach to the same problem.

Through this exploration (which I’ll get to in a second) I discovered that the most important linchpin of the entire script, Compress-Archive, has a 2 GB file size limit.

See, I wanted to zip all of the game folders, including some that are literally 100+ gigabytes.

I discovered this when I decided to compare compression ratios between the zips and this other approach by zipping a 100 gigabyte folder (Dirt Rally 2.0 is ~109 gigabytes, though that doesn’t really matter). I tried to use Compress-Archive for this operation and it errored out.

I learned that a) Compress-Archive errors out at 2 gigabyte zip files, b) the limit of the standard zip format (as created by WinZip etc.) is actually 4 gigabytes, and c) there’s a newer zip standard called ZIP64 that has much higher file size limits (its 64-bit size fields allow a theoretical maximum of about 16 exabytes).

I also learned there may or may not be a newer version of the .NET library that would let Compress-Archive use ZIP64, but the solutions I found online do not seem to work for me.

Oh, and another minor point: I just used the 7zip GUI to try zipping that ~109 gigabyte Dirt Rally 2.0 folder and discovered it only compressed by about 0.01%. Technically less than that. It took my PC ~1 hour 10 minutes (my CPU is kind of old at this point) to create the zip file, and that’s how much storage space I saved. I don’t know what I was expecting, but something more than that for some reason.

So this led me to stop and think about the solutions to this:

    • Switch to writing the script in Python, since Python’s built-in zip functionality does support ZIP64 (C# might also work, I haven’t looked into it).
    • Some combination of PS and Python, where PS is the abstraction layer and calls a Python script that just does the zip operation, and the two scripts talk to each other…
    • I was already going to add options for third-party zip utilities like 7zip anyway. Just move that idea up the timeline so it’s bring-your-own-compression by default.
    • Just blatantly mention “this script is made for games of less than 1.8 gigabytes” and leave it there.
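
For the Python option, here is a minimal sketch of what the zip step might look like using the standard library zipfile module. The function name zip_folder and the folder layout are made up for illustration; this is not code from the actual script, just a rough idea of the approach.

```python
import os
import zipfile


def zip_folder(src_dir, zip_path):
    """Zip src_dir into zip_path and return the number of files added.

    Hypothetical helper for illustration only. zip_path should live
    outside src_dir so the archive does not try to include itself.
    """
    src_dir = os.path.abspath(src_dir)
    parent = os.path.dirname(src_dir)
    count = 0
    # allowZip64=True (already the default in Python 3) lets zipfile use
    # the ZIP64 extensions once the classic 4 GB limits are exceeded.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the parent folder so the game
                # folder name stays as the top-level directory in the zip.
                zf.write(full_path, os.path.relpath(full_path, parent))
                count += 1
    return count
```

Note that this sketch only stores files, so empty sub-folders would not end up in the archive.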

    Exploring the alternative

    My kooky idea as an alternative was to use VHDX files instead of zip files. Let me explain why.

    Zip files have several advantages: the Compress-Archive cmdlet doesn’t require any dependencies or downloads, the folder theoretically gets compressed (though not by much in some cases), and if nothing else a folder gets stuck into a single file so it can be moved and copied with less fuss than an entire folder.

    There are also disadvantages. The most obvious, again when using the Compress-Archive cmdlet, is that files can’t be removed from the archive, only added and updated. Which means if a large game like Dirt Rally 2.0 (or Baldur’s Gate 3, which is even larger) gets a 10 megabyte update, I would have to delete the entire archive and re-create it from scratch. That’s kind of a big disadvantage.

    Which is where the VHDX idea came in.

    For anyone unaware: a VHDX file is basically a disk image file that can be "mounted" as a volume in Windows and then manipulated like a physical device (partitioned, formatted, adding/deleting folders, etc). Usually these files are associated with virtual machines like Hyper-V, but there’s nothing stopping anyone from using them for other things.

    My idea was to dynamically create a VHDX file of appropriate size for the game folder, then do the normal initialize, partition and format steps, and finally Copy-Item or robocopy the game folder into the VHDX volume and dismount it.

    Of course VHDX files don’t have any compression built in. But as it turns out, NTFS does. So my further idea was to use NTFS compression on the files right after the game folder is copied to the VHDX volume.

    Actually, the compression part didn’t work out that well. Even after compressing the files, the VHDX still ended up as large as or larger than the original folder. But it does have an advantage: when there’s an update to a game, I can mount the VHDX file, run a robocopy /mir type operation from the game folder to the VHDX copy, and it’s up to date. No waiting an hour+ to re-create that zip file.
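
To make the mirror step a bit more concrete, here is a rough sketch of driving robocopy /MIR from Python. The helper names and paths are hypothetical, and it assumes the VHDX is already mounted and that robocopy is on the PATH (Windows only). The one real gotcha it shows is that robocopy's exit code is not a plain zero-means-success value: codes below 8 mean nothing failed.

```python
import subprocess


def build_mirror_command(game_dir, vhdx_dir):
    """Return the robocopy argument list that mirrors game_dir into vhdx_dir."""
    # /MIR copies new and changed files and also deletes files from the
    # destination that no longer exist in the source.
    return ["robocopy", game_dir, vhdx_dir, "/MIR"]


def robocopy_succeeded(exit_code):
    """Robocopy exit codes are bit flags; 8 and above mean something failed."""
    return 0 <= exit_code < 8


def mirror(game_dir, vhdx_dir):
    """Run the mirror (Windows only) and report whether it succeeded."""
    result = subprocess.run(build_mirror_command(game_dir, vhdx_dir))
    return robocopy_succeeded(result.returncode)
```

Something like mirror(r"D:\SteamLibrary\SomeGame", r"V:\SomeGame") would then bring the VHDX copy up to date, with both paths being placeholders.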

    It’s always something part 2: The disk partatador

    I spent I don’t know how many hours creating this rough draft of a script to create this VHDX file, prep it, mount it, copy to it, compress it and dismount it and actually got it working. Until I ran into an issue.

    I developed this only on one PC. I went to test it on a second PC and the important New-VHD cmdlet came back as not existing.

    So I did some digging, and it turns out the VHD related cmdlets I had been using require the Hyper-V related Windows features to be enabled.

    Which, ok, require the user to enable Hyper-V…? Well, it could be the user is running a Home edition of Windows where that isn’t an option. Or maybe the user just doesn’t want to bother with Hyper-V, for any reason or no reason. I’m not here to judge.

    So that led to yet another exploration: can I do all of these things with VHDX files without enabling Hyper-V? And can I detect whether Hyper-V is enabled and respond accordingly?

    Well it turns out the answer to both questions is “yes”.

    I can create VHDX files using diskpart, and I can mount and manipulate them using existing non-cmdlet tools already provided by Windows. I think entirely with diskpart.
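
As a rough illustration of the diskpart route, the sketch below builds the text of a diskpart script that creates an expandable VHDX, attaches it, partitions and formats it as NTFS, and assigns a drive letter, plus a second script to detach it. The function names, path, label and drive letter are all placeholders I made up, and this is not the script from the repo. The generated text would be saved to a file and run with diskpart /s from an elevated prompt.

```python
def build_diskpart_script(vhdx_path, size_mb, label, letter):
    """Return diskpart script text that creates, formats and mounts a VHDX."""
    lines = [
        # Dynamically expanding virtual disk; maximum is given in megabytes.
        f'create vdisk file="{vhdx_path}" maximum={size_mb} type=expandable',
        f'select vdisk file="{vhdx_path}"',
        "attach vdisk",
        "create partition primary",
        f'format fs=ntfs label="{label}" quick',
        f"assign letter={letter}",
    ]
    return "\n".join(lines) + "\n"


def build_detach_script(vhdx_path):
    """Return diskpart script text that dismounts the VHDX again."""
    lines = [f'select vdisk file="{vhdx_path}"', "detach vdisk"]
    return "\n".join(lines) + "\n"
```

Keeping the script generation separate from actually running diskpart makes it easy to print and eyeball the commands before letting them touch any disks.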

    Where do I go from here?

    From this point I could continue the script as if nothing has changed. Just modify my existing function that detects when folders contain less than 50 KB to also have an upper limit of ~1.7 gigabytes or whatever (leaving headroom in case there are updates) and continue on until it’s finished.
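
The size check described above might look something like this sketch. The real function lives in the PowerShell script; this is just the same idea written in Python, with made-up names and the 50 KB and ~1.7 gigabyte thresholds taken from the paragraph above.

```python
import os

MIN_BYTES = 50 * 1024            # folders under 50 KB are skipped
MAX_BYTES = int(1.7 * 1024**3)   # ~1.7 GB, leaving headroom under the 2 GB limit


def folder_size(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def classify_size(size_bytes, min_bytes=MIN_BYTES, max_bytes=MAX_BYTES):
    """Label a folder size as 'too_small', 'too_large' or 'ok'."""
    if size_bytes < min_bytes:
        return "too_small"
    if size_bytes > max_bytes:
        return "too_large"
    return "ok"
```

Folders that come back as "too_large" are the ones that would later be handed to 7zip or the VHDX route instead of Compress-Archive.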

    Then, once it’s working, explore options for folders over 2 gigabytes: a third-party utility like 7zip or some method of VHDX creation, depending on user preference.

    Or I can switch entirely to using VHDXs. Or I can switch entirely to using 7zip and not bother with compress-archive. Or re-write it in Python since I’ve barely started the re-write anyway.

    Yep. A lot of decisions to make.


    Game Library Auto Archiver
    GitHub Repo:
    https://github.com/tildesarecool/Game-Library-Auto-Archiver

    For those who may also be trying to learn PowerShell, I wanted to point out the sticky thread on the PowerShell subreddit, What have you done with PowerShell this month?. You can go back through months’ worth of these threads and find a lot of tricks and scripts you would not have otherwise come across. I don’t know if there’s a listing of them someplace. The “beginner resources” page is pretty great, too.

