Python adventure Part 2: Parsing binary files (Atari 2600 project)

If you want to start at the beginning, please see part one about this project. To see a list of the posts about this so far, see the posts under the category for the project. I created a GitHub project, if you would like to see the current version of the Python script.

Note: I wrote the text below in June 2023 and somehow lost motivation to continue. I actually had a little more progress than what’s listed here, but what’s covered seems like a good stopping point. Eventually I’ll make a part 3.


As outlined briefly in the first post, I wanted to figure out a rudimentary way of identifying individual 2600 carts and loading an associated ROM into an emulator. This would be with the aid of an Arduino to do the actual ROM reading.

Since that post I’ve actually made a lot of progress and learned a lot of things.

I decided to use Atari 2600 carts in particular because of how amazingly primitive the technology is. I mean you might think you understand how primitive this 1977 level technology is, but actually it’s even more primitive than that.

I’m just emphasizing this as a lead-up to say these binary ROM files are sometimes only 2KB in size. Two…Kilobytes…

I often format my hard drives at 64K sectors. And these are 2KB. There are larger ones, actually. Some as large as 32KB. Still quite small though.

And there is no standard to the ROM format: no standard lead characters, no FFDD sort of footer to denote the end of the binary data. Nothing.

As mentioned in the last post I saw that binary to hex Ruby script from a blog and it got me thinking: what does that actually mean and do?

Apparently it means taking the blob of binary, which is effectively binary “noise”, and converting it to a hexadecimal equivalent as an ASCII string. If you don’t know what that means, I’ll get to it.
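As a quick illustration (my own toy example, not part of the actual script), here is roughly what that conversion looks like for three made-up byte values:

```python
# Three arbitrary byte values, chosen only for illustration
raw = bytes([0, 10, 255])

# Each byte (0-255) becomes exactly two hex characters
hex_string = "".join(format(b, "02x") for b in raw)

print(hex_string)  # 000aff
```

So three bytes of "noise" turn into six readable characters that can be stored, compared, or pasted anywhere text goes.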

Fortunately, Python already has a function for such things as converting binary to hex strings. I actually got ChatGPT’s help for this but at least it seems quite simple. I just hard code in the name of the ROM file and make sure it exists, for instance.

# rudimentary bin-to-hex.py script
infile = open('atarigame.bin', 'br' ) # pff. ifexist checks are for chumps

line_data = ""
while True: # overly easy way to just-keep-reading the binary
    bytes_read = infile.read(16) # read this many bytes at a time
    if not bytes_read: # dumb way of saying "until end of file"
        break # breaks out of the while True infinite loop
    for byte in bytes_read: # loop through the data captured in the bytes_read variable
        byte_hex = format(byte, '02x') # format it as "02x", which means lowercase hex padded to 2 characters, e.g. ff is the highest
        line_data += byte_hex # append each new hex value to the string, one byte at a time
infile.close() # have to remember to close the file when done

#print(line_data) # debug to check output before I added the write to text functionality

# Write the hex string to a text file
with open('atarigame.txt', 'w') as outfile:
    outfile.write(line_data)

This produces a text file with the entire binary as a big hex string.

To make sure this was really working, I also created a script to convert from that hex string back into a binary. When I loaded that binary into an emulator (a web-based one written in WASM, no less), the game loaded as normal.
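I won't reproduce that reverse script here, but a minimal sketch of the idea could look like the following. The function name and the file paths are placeholders I made up for this example:

```python
def hex_to_bin(hex_path, bin_path):
    """Read a hex string from a text file and write the raw bytes back out."""
    with open(hex_path, "r") as infile:
        hex_string = infile.read().strip()  # drop any trailing newline

    data = bytes.fromhex(hex_string)  # every two hex characters become one byte

    with open(bin_path, "wb") as outfile:
        outfile.write(data)
    return len(data)  # number of bytes written, handy as a sanity check
```

Calling something like hex_to_bin('atarigame.txt', 'atarigame-copy.bin') should give back a file the same size as the original ROM.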

The next steps seemed relatively obvious to take on one at a time:

  • Point my bin-to-hex script at a directory containing many ROM files and have it loop through each one and perform this operation, leaving behind a text file with the name of the ROM file and a .txt extension.
  • Be able to specify the file extension of the ROM files I want to perform the action on (bin or a26 or whatever)
  • Add the ability to provide a text file with one file name per line as input for the script to operate against
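Those three steps could be sketched along these lines. This is a simplified illustration rather than the actual script, and the function names are ones I invented for the example:

```python
from pathlib import Path

def find_roms(directory, extension="bin"):
    """Return the files in `directory` with the given extension, sorted by name."""
    pattern = "*." + extension.lstrip(".")
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())

def read_list_file(list_path):
    """Return the file names from a text file with one name per line."""
    with open(list_path, "r") as f:
        return [Path(line.strip()) for line in f if line.strip()]

def convert_to_hex(rom_path):
    """Write the ROM's contents as a hex string next to it, with a .txt extension."""
    rom_path = Path(rom_path)
    out_path = rom_path.with_suffix(".txt")
    out_path.write_text(rom_path.read_bytes().hex())
    return out_path
```

A loop such as `for rom in find_roms("roms", "a26"): convert_to_hex(rom)` would then cover a whole directory, assuming a directory named roms exists.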

All these features ended up requiring command line arguments, which meant writing usage diagrams and example listings. ChatGPT was almost entirely helpful with this.
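For anyone curious what that looks like, here is a rough sketch using Python's built-in argparse module, which generates the usage text for you. The flag names are hypothetical ones picked for this example and may not match what my script actually uses:

```python
import argparse

def build_parser():
    """Build a command line parser with hypothetical option names."""
    parser = argparse.ArgumentParser(
        description="Convert ROM files to hex strings.")

    # Exactly one input source must be given: a directory or a list file
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-d", "--directory",
                        help="directory containing ROM files")
    source.add_argument("-l", "--list-file",
                        help="text file with one ROM file name per line")

    parser.add_argument("-e", "--extension", default="bin",
                        help="ROM file extension to look for (default: bin)")
    return parser
```

Running the script with -h would then print the usage summary and the help text for each option automatically.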

The next step was to get it to read ONLY the first 32 bytes of the ROM file. I chose 32 bytes arbitrarily because I figured there was little chance that individual games written in assembly would share the same first 32 bytes.

So I had the script loop through the binary files in a directory, reading the first 32 bytes and writing that string to a single log file containing the file name and the hex string separated by a tab. Here’s an example:

Activision Decathlon, The (USA).a26	c1888502852a1001c88506b1bf8507b1b7851c18a5a265d885a2852ba9009002
Adventure (USA).a26	93bd49ff859420a1f2c8b1938584c8b1938585ad82022908f00bbd4aff20d3f2
Adventures of Tron (USA).a26	02a9fd205af3a5b3c9909022a99085b3d078a9f9205af3a5b3c904b0112007fe
Air Raid (USA).a26	f01eb5dcc9d0b018100bd6c2bd00f7d5c2900db009f6c2bdfdf6d5c2b00295c2

This is just sample data, by the way. It’s hard to tell due to wrapping, but there’s a tab separating the data.
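A stripped-down sketch of how a log like that could be produced is below. The function name, the default extension, and the log file name in the usage note are assumptions for the sake of the example:

```python
from pathlib import Path

def write_signature_log(directory, log_path, extension="a26", length=32):
    """Log the first `length` bytes of each ROM as 'file name<TAB>hex string'."""
    roms = sorted(Path(directory).glob("*." + extension))
    with open(log_path, "w") as log:
        for rom in roms:
            with open(rom, "rb") as f:
                head = f.read(length)  # only the first `length` bytes
            log.write(f"{rom.name}\t{head.hex()}\n")
    return len(roms)  # how many ROM files were logged
```

Something like write_signature_log("roms", "signatures.tsv") would write one line per ROM, and 32 bytes always come out as 64 hex characters.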

As a side note, I chose tab as the separator instead of something easy like a comma because tab separated text can be directly pasted into Google Sheets and automatically separate into columns.

As my next step I wrote a separate script to check the log file for duplicate strings. But this script turned out not to be necessary, as several ROM files simply had a string of either FFs or 00s for the first 256 bytes. These were the larger games like Tunnel Runner (coming in at 12K). I don’t know why some seem to have that header but it’s probably related to “memory banking”. Whatever that is.
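For reference, a duplicate check over that tab separated log can be fairly short. This is a sketch of the idea with a made-up function name, not the script I actually wrote:

```python
from collections import defaultdict

def find_duplicates(log_path):
    """Return {hex string: [file names]} for hex strings shared by 2+ ROMs."""
    groups = defaultdict(list)
    with open(log_path, "r") as log:
        for line in log:
            line = line.rstrip("\n")
            if "\t" not in line:
                continue  # skip blank or malformed lines
            # Split on the LAST tab so the hex string is always the final field
            name, hex_string = line.rsplit("\t", 1)
            groups[hex_string].append(name)
    return {h: names for h, names in groups.items() if len(names) > 1}
```

Any ROMs that start with the same run of FFs or 00s would end up grouped under the same key.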

This led me to conclude I needed to skip this possible header string to get my arbitrary UID.

Side note: I had been using this same "conversation" with ChatGPT continually up to now. I had added features and re-defined functionality so many times I think I had reached the end of its abilities. So I started a new chat, pasted in the last working version of the script and very carefully worded my request for the additional functionality I wanted. And it actually gave me back a working script with the new functionality. Just a minor tip for (free) ChatGPT 3.5 (as of June 2023, anyway).

Eventually I did in fact get the functionality of skipping a specified number of bytes into the ROM file, capturing 32 bytes starting at that point, then writing that to the log file with the file name of the ROM. That’s where I am as I write this. I should probably decide on a standard number of bytes into the ROM to start on. I was thinking of 640 bytes as a hilarious homage.
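The skipping part boils down to a seek before the read. Here is a minimal sketch, where the function name and parameter names are placeholders of mine:

```python
def read_signature(rom_path, offset=0, length=32):
    """Return `length` bytes starting `offset` bytes into the file, as hex."""
    with open(rom_path, "rb") as f:
        f.seek(offset)  # jump past the possible FF/00 header
        return f.read(length).hex()
```

With an offset of 640, the read ends at byte 672, so it still fits inside even the smallest 2KB (2048 byte) ROMs. If the offset lands past the end of a file, the read simply returns nothing and the result is an empty string.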

There is one last feature I would like to add to this script, a loose version of the final functionality: give the script a 32 byte hex string and have it look up the ROM and run it. This would somewhat simulate the final functionality with the cartridges. That would likely work better as a separate script.
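I haven't written this yet, but the lookup half could plausibly be a dictionary built from the log file. The sketch below uses hypothetical function names and stops at returning the file name, since how to launch the ROM depends on the emulator:

```python
def load_lookup(log_path):
    """Build a {hex string: file name} dictionary from the tab separated log."""
    lookup = {}
    with open(log_path, "r") as log:
        for line in log:
            line = line.rstrip("\n")
            if "\t" not in line:
                continue  # skip blank or malformed lines
            name, hex_string = line.rsplit("\t", 1)
            lookup[hex_string.lower()] = name
    return lookup

def find_rom(lookup, hex_string):
    """Return the ROM file name for a hex string, or None if it is unknown."""
    return lookup.get(hex_string.strip().lower())
```

Lowercasing both sides means a string typed as C188 still matches one logged as c188.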

I still have yet to actually read a single physical cartridge or program an Arduino. So that will likely be the next step or a step for some future date: hello world Latte Panda v1 and the world of 2016 era Windows 10. Or more likely Linux.


