CRC32 Calculation Under Windows

Discussion in 'Programmer's Corner' started by MrAl, May 2, 2016.

  1. MrAl

    Thread Starter Well-Known Member

    Jun 17, 2014
    2,418
    488
    Hello,


    I am looking for an algorithm to calculate the CRC32 code for a file under Windows. When I say "the CRC32 code" I mean the SAME one that Windows Explorer uses when creating or checking a .zip file. This should be the same as any other .zip file CRC32 calculation, yet none of the algorithms I have found on the web so far actually come up with the same CRC32 code.
    I really need to calculate this myself, not use someone else's program.

    The mismatch could be caused by one of several things.

    First, the readme APPNOTE for PKZip shows a "magic number" of 0xdebb20e3 but does not specify what it actually does. The problem is that the general phrase "magic number" can refer to several things: the file starter signature (perhaps from an older version), the starting CRC32 value, or the ending XOR value. So it can be any general constant that the author chose to call a "magic number". Some enlightenment on what they actually use it for would help in itself.

    Second, they may have changed the starting value that they used a long time ago, but maybe they didn't.

    Third, they may have changed the ending XOR value, or don't use it at all, etc., but maybe they didn't.

    Fourth, the generator polynomial they use could be different from the one published on the web for PKZip.

    Several of the algorithms I have tried from the web produce the same code as each other, but not the same as Windows. I even tried a web-based CRC32 generator; it produced that same code, but still not the same as Windows Explorer.

    I have tried ONE program, called "7-zip", that does produce the same CRC32 code as Windows Explorer (for .zip files), but I do not have the source code, and I need the source code so I can write my own super fast CRC32 generator. I don't think there is any way around this, because I need to read files from within a special program I have already written, where the files are contained inside a .zip file. To check the validity of the files I need to generate the CRC32 code myself, and it must match the one in Windows or 7-Zip.

    Note this has nothing to do with compressing the file itself, just calculating the CRC32 code so the file can be tested for corruption by recalculating the CRC32 code and then comparing to the CRC32 code stored in the .zip file with the file bytes.
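The check described above (recompute the CRC32 of the file bytes and compare it with the value stored in the archive) can be sketched in Python as follows. This is a minimal illustration, not the poster's program: it assumes the standard library's `zlib.crc32` matches the .zip CRC32, and the helper names `verify_zip_members` and `make_demo_zip` are made up for the example.

```python
import io
import zipfile
import zlib


def verify_zip_members(zip_bytes):
    """Return {member name: True/False}, True when the recomputed CRC32
    equals the CRC32 stored in the archive for that member."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # read() returns the decompressed bytes; the zipfile module also
            # checks the CRC itself and raises BadZipFile on a mismatch.
            data = archive.read(info.filename)
            computed = zlib.crc32(data) & 0xFFFFFFFF
            results[info.filename] = (computed == info.CRC)
    return results


def make_demo_zip():
    """Build a one-file archive in memory (hypothetical demo data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("zero.txt", b"0\r\n")
    return buffer.getvalue()


print(verify_zip_members(make_demo_zip()))
```

The stored value is exposed as `ZipInfo.CRC`, so no compression code has to be written to perform the comparison.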

    Hey, thanks for any direct help or any help that could lead to a solution. I spent several hours checking on the web yesterday and could not come up with one. The sad part is that the solution is probably a very simple fix, and sadder yet, this is what happens with the incomplete documentation we see on the web quite often.
    What else could help is information on how to reverse engineer the CRC32 generator used in some program. This would not violate any copyright, because the CRC32 is not part of the encryption algorithms sometimes used in .zip files and the CRC32 technique itself is not copyrighted.
     
  2. nsaspook

    AAC Fanatic!

    Aug 27, 2009
    2,907
    2,163
  3. MrAl

    Thread Starter Well-Known Member

    Jun 17, 2014
    2,418
    488
    Hello there,

    Thanks for the link, I think that will help a lot. I should probably have looked for that (har har) but was getting too tired last night.

    I found out that one step, which is not mentioned in most of the code found on the web, is that the last CRC32 code calculated gets reversed bitwise. Thus the bits 100011 would turn into 110001, and that would be the final CRC32 code (if it were that short). That makes all the difference. I think this came about because the CRC algorithm originated in Ethernet transmissions, where it is used for error checking the received message.
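One way to read the bit reversal described above, offered as a sketch under an assumption: the web snippets that disagreed were shifting bits most-significant-first with the polynomial 0x04C11DB7. In that form, each input byte and the final register both have to be bit-reversed to match the .zip value. The equivalent least-significant-first form with 0xEDB88320 needs no explicit reversal. The function names below are hypothetical.

```python
def reflect(value, width):
    """Reverse the lowest `width` bits of value (100011 -> 110001)."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc32_msb_first(data):
    """CRC32 using 0x04C11DB7, shifting left, with explicit bit reversals."""
    crc = 0xFFFFFFFF                        # starting value
    for byte in data:
        crc ^= reflect(byte, 8) << 24       # feed each byte bit-reversed
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return reflect(crc, 32) ^ 0xFFFFFFFF    # reverse the result, then final XOR


def crc32_lsb_first(data):
    """Same CRC32 using the reversed polynomial 0xEDB88320, shifting right."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


print(hex(crc32_msb_first(b"123456789")))   # 0xcbf43926
print(hex(crc32_lsb_first(b"123456789")))   # 0xcbf43926
```

Both functions return the widely published check value 0xCBF43926 for the ASCII string "123456789", which is a convenient first test for any new implementation.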

    I find it amazing that there are so many sites showing incorrect code snippets. One site I found even goes as far as offering an online CRC32 calculator that lets us upload a file, and the final code comes out totally incorrect, or at least not matching the main standard used in many .zip files :)

    I am still testing, but I found out that the 'magic number' they talk about on some sites is the CRC32 code that results from appending the reversed CRC32 code for the file to the end of the file and then calculating the CRC32 code again for the whole bunch of bytes. If that number appears, then the file is not corrupted, at least to a probability of about 1-1/(2^32), which isn't too bad really. So a way to check the file's validity is to append the bytes, calculate the CRC32, and compare the result to the 'magic number'. If it is the same, the file is considered not corrupted.
     
    Last edited: May 2, 2016
  4. MrAl

    Thread Starter Well-Known Member

    Jun 17, 2014
    2,418
    488
    Hello again,

    I just wanted to add a final note that more clearly explains the magic number.

    The magic number is the complement of the CRC32 code that results from computing the CRC32 of the original file along with the reversed CRC32 code, which is appended to the end of the file before calculating this second CRC32.
    To be clear however, the appended CRC32 code is reversed bytewise. An example makes this much more clear.

    I'll quote the bytes in a tiny file of three bytes here:
    {#30, #0D, #0A}

    which in order of the file in hex we can write as:
    30 0D 0A

    This file is the result of creating a text file with the single character '0' (a zero) and a single carriage return and line feed (crlf).

    The CRC32 code that matches Windows Explorer zip files and most other zip files for that small file is:
    #8E51ABD1

    and separating the bytes we have (all hex):
    8E 51 AB D1

    To test the file, we first reverse that code byte wise and get:
    D1 AB 51 8E

    then add that to the end of the file and get:
    30 0D 0A D1 AB 51 8E

    then compute the CRC32 for that group of bytes, and we get the CRC32 code:
    #2144DF1C

    and taking the complement of that we get the quoted 'magic number' for any file using this test:
    #DEBB20E3

    It's interesting in that if we do this for ANY file, we always get that same number #DEBB20E3 and that is one way to test the file for validity.
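The steps of the example above can be replayed with a short Python sketch. It assumes `zlib.crc32` is the same CRC32 used in .zip files; the function name `residue_after_appending_crc` is invented for this illustration.

```python
import struct
import zlib

MAGIC = 0xDEBB20E3


def residue_after_appending_crc(data):
    """Append the byte-reversed CRC32 to data, take the CRC32 of the whole
    thing, and return the complement of that second CRC32."""
    crc = zlib.crc32(data) & 0xFFFFFFFF
    trailer = struct.pack("<I", crc)        # least significant byte first
    crc_of_both = zlib.crc32(data + trailer) & 0xFFFFFFFF
    return crc_of_both ^ 0xFFFFFFFF         # complement


# The three-byte file from the example, plus two unrelated inputs.
for sample in (b"0\r\n", b"hello world", bytes(range(256))):
    print(hex(residue_after_appending_crc(sample)))   # 0xdebb20e3 each time
```

Packing with `"<I"` writes the least significant byte first, which is the byte-wise reversal shown in the example (8E 51 AB D1 becomes D1 AB 51 8E).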

    Normally, when the file is first obtained, the CRC32 code is computed and stored, either with the file in a zip file or in some other record file. At a later date, when the file is to be tested, the reversed CRC32 code can be appended to the file and the CRC32 code of the result computed; if that gives the magic number, the file is valid. However, since we already have the CRC32 code stored, I just compare it to the new CRC32 code computed at the later date to verify the file. So for me the magic number is more of a curiosity than something I might use in the future, but some web sites talk about it so it's worth knowing about, I guess.


    It's also a little interesting to me that I have been using CRC32 codes for years and years, but have calculated them using a different polynomial, so I originally got results that don't match most internal zip file CRC32 calculations. If the polynomial is considered good enough, though, then as long as we keep a record of how we calculated the code we can always compare it to the file's CRC32 when we want to verify it at a later date. Having a formula that matches many other applications is not a bad idea though, as then any one of them can be used.

    Note:
    The generator polynomial number for this discussion is: 0xEDB88320
    and sometimes the bit-reversed form of that, 0x04C11DB7, is used.
    The magic number is only the magic number when using this generator polynomial. Any other polynomial would have a different magic number, and would also give CRC32 codes that do not match the CRC32 that most .zip files calculate internally.
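For a faster generator, a common approach is a 256-entry lookup table built from the polynomial, so each input byte costs one table lookup instead of eight shift steps. The following is a sketch under that assumption, not code from any particular zip program; the names `make_table`, `crc32_update` and `crc32_of_file` are hypothetical.

```python
POLY = 0xEDB88320


def make_table(poly=POLY):
    """Precompute the CRC of every possible byte value."""
    table = []
    for index in range(256):
        entry = index
        for _ in range(8):
            entry = (entry >> 1) ^ poly if entry & 1 else entry >> 1
        table.append(entry)
    return table


TABLE = make_table()


def crc32_update(crc, chunk):
    """Continue a CRC32 over chunk; pass crc=0 for the first chunk."""
    crc ^= 0xFFFFFFFF                 # apply start value / undo previous final XOR
    for byte in chunk:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF           # final XOR


def crc32_of_file(path, chunk_size=65536):
    """CRC32 of a file, read in chunks so large files fit in memory."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = crc32_update(crc, chunk)
    return crc


print(hex(crc32_update(0, b"123456789")))   # 0xcbf43926
```

Because `crc32_update` accepts the previous result, a file can be processed in pieces and still give the same value as a single pass over all the bytes.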
     
    Last edited: May 4, 2016
  5. MrAl

    Thread Starter Well-Known Member

    Jun 17, 2014
    2,418
    488
    Hello again, another small update...


    I checked the idea of appending bytes to the end of the file for testing for validity and realized that there is a very good use for the so-called "magic number".

    What I did was create a file for another purpose (part of the program I was working on), and it was a C language file with the extension ".h" as usual. That kind of file is an include file for the main program file, but it reads as a text file because it contains only C language source code. That means it can be opened in any text editor.

    But here is the interesting thing...
    First I do a CRC32 calculation on the file (after all the code has been typed into it) and get a number which is four bytes long (eight hex digits).
    Next, I append that number to the end of the file after reversing its bytes, actually modifying the original file in the process. This makes the file 4 bytes longer. The file is therefore now stored with the correct CRC32 code for the raw file, but it's backwards.

    Now to verify the file, all I have to do is calculate the CRC32 code for the ENTIRE file, including the last four bytes (which would automatically get included by Windows Explorer when sticking it into a zip file). The result is the complement of the magic number again: 0x2144DF1C

    So now I don't have to store the CRC32 code in another database file because it is already there in the file itself, and that makes the final file CRC32 code the complemented magic number 0x2144DF1C again. This would mean that if EVERY file was done in this way, all the CRC32 codes would be 0x2144DF1C, so there would be no need to store the codes in a separate database file. If that code does not come out, the file is corrupt.
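The self-checking file idea above can be sketched in a few lines of Python. This is an illustration only, assuming `zlib.crc32` matches the .zip CRC32; `seal` and `is_intact` are hypothetical names, and the sketch works on bytes in memory rather than modifying a file on disk.

```python
import struct
import zlib

SEALED_CRC = 0x2144DF1C   # complement of the magic number 0xDEBB20E3


def seal(data):
    """Return data with its own CRC32 appended, least significant byte first."""
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return data + struct.pack("<I", crc)


def is_intact(sealed):
    """True when the CRC32 of the whole sealed blob is the fixed value."""
    if len(sealed) < 4:
        return False
    return (zlib.crc32(sealed) & 0xFFFFFFFF) == SEALED_CRC


original = b"#define ANSWER 42\r\n"
sealed = seal(original)
print(is_intact(sealed))                      # True

damaged = bytes([sealed[0] ^ 0x01]) + sealed[1:]
print(is_intact(damaged))                     # False
```

To recover the original contents, drop the last four bytes of the sealed data.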

    Pretty cool yes, but there is a catch.
    1. The file can never be modified in any way or else the old CRC32 has to be removed and the new added.
    2. That also means it can no longer be resaved from a text editor, because the appended bytes will probably not be seen as text; some could be control characters below 0x20.
    3. This one is a real problem. If the code is rebuilt using a compiler, the compiler might complain that there is a problem at the end of the file because there is no CRLF at the end (hex 0D 0A). My compiler complains, so this means that if I want to do this I have to remove the four bytes before compiling, then put them back when done. Sort of a big pain.

    I tried putting a comment marker at the end of the file first ("//" in C code) so that the four CRC32 bytes come after it, but the compiler still complained that there was no newline at the end of the file and gave errors because it tried to compile those four bytes as actual typed-in C code. Too bad, that messes things up.

    So I might just go back to the database storage of the CRC32 codes, or else I'll have to put up with removing the codes from lots of files before compiling, which I think is more trouble than it is worth except for maybe really important files.

    LATER:
    Found that removing the last CRLF from the end of the file before appending the backward CRC32 code works better. With a comment at the end of the file to begin with, the CRC32 code just looks like weird text to the compiler so it ignores it.
    It may not work with all CRC32 codes however as they may have low value characters like 0x00 or 0x01 etc., which may bother the compiler.

    Have fun :)
     
    Last edited: May 5, 2016