Is my file encrypted or just a binary?

Thread Starter

strantor

Joined Oct 3, 2010
5,905
I pulled a file out of an embedded device that I'm trying to hack and it's not in a readable format. I'm trying to determine if it's encrypted or just a binary. I wrote a little script to count the occurrences of bytes to see if the distribution matches the typical distribution of letters in written text and what I found is that no, it doesn't, and actually I'm not sure what I found, apart from confusion:
Code:
[in order of # of occurrences]
Byte ... Int ...ASCII..# of occurrences
0x20 ... 032 ...   ... 1507
0x4C ... 076 ... L ... 833
0x30 ... 048 ... 0 ... 824
0x2E ... 046 ... . ... 821
0xA2 ... 162 ... ¢ ... 819
0x77 ... 119 ... w ... 813
0x61 ... 097 ... a ... 812
0xEA ... 234 ... ê ... 812
0x40 ... 064 ... @ ... 810
0x63 ... 099 ... c ... 810
0xA1 ... 161 ... ¡ ... 810
0xAB ... 171 ... « ... 810
0x07 ... 007 ...  ... 808
0x5D ... 093 ... ] ... 808
0x60 ... 096 ... ` ... 806
0xCB ... 203 ... Ë ... 805
0x4D ... 077 ... M ... 803
0x6D ... 109 ... m ... 803
0xB5 ... 181 ... µ ... 803
0xC3 ... 195 ... Ã ... 802
0x3B ... 059 ... ; ... 801
0x98 ... 152 ...  ... 801
0x25 ... 037 ... % ... 800
0x2F ... 047 ... / ... 800
0xE5 ... 229 ... å ... 799
0xF4 ... 244 ... ô ... 799
0x33 ... 051 ... 3 ... 798
0x43 ... 067 ... C ... 798
0xBC ... 188 ... ¼ ... 798
0xE8 ... 232 ... è ... 798
0xAC ... 172 ... ¬ ... 796
0x0B ... 011 ... ... 795
0x0C ... 012 ... ... 795
0xFD ... 253 ... ý ... 795
0x02 ... 002 ...  ... 794
0x17 ... 023 ...  ... 794
0x62 ... 098 ... b ... 794
0xD9 ... 217 ... Ù ... 794
0x69 ... 105 ... i ... 793
0xA0 ... 160 ...   ... 793
0xC5 ... 197 ... Å ... 793
0xEE ... 238 ... î ... 793
0x08 ... 008 ... ... 792
0x36 ... 054 ... 6 ... 792
0xDD ... 221 ... Ý ... 792
0x65 ... 101 ... e ... 791
0x70 ... 112 ... p ... 791
0x99 ... 153 ...  ... 791
0xB7 ... 183 ... · ... 791
0x1A ... 026 ...  ... 790
0xAA ... 170 ... ª ... 790
0x12 ... 018 ...  ... 789
0x42 ... 066 ... B ... 789
0xE0 ... 224 ... à ... 789
0x10 ... 016 ...  ... 788
0x16 ... 022 ...  ... 788
0xC0 ... 192 ... À ... 788
0xF2 ... 242 ... ò ... 788
0x9C ... 156 ...  ... 787
0x53 ... 083 ... S ... 786
0xBE ... 190 ... ¾ ... 786
0x8D ... 141 ...  ... 785
0xFA ... 250 ... ú ... 785
0x54 ... 084 ... T ... 784
0x5A ... 090 ... Z ... 784
0x81 ... 129 ...  ... 784
0xBA ... 186 ... º ... 784
0xA8 ... 168 ... ¨ ... 783
0xCD ... 205 ... Í ... 783
0xF7 ... 247 ... ÷ ... 783
0xFB ... 251 ... û ... 783
0x2C ... 044 ... , ... 782
0x7F ... 127 ...  ... 782
0x8F ... 143 ...  ... 782
0x95 ... 149 ...  ... 782
0xD1 ... 209 ... Ñ ... 782
0x28 ... 040 ... ( ... 781
0x1D ... 029 ...  ... 780
0x26 ... 038 ... & ... 780
0x74 ... 116 ... t ... 780
0x67 ... 103 ... g ... 779
0xC2 ... 194 ... Â ... 779
0x13 ... 019 ...  ... 778
0xA7 ... 167 ... § ... 778
0xB0 ... 176 ... ° ... 778
0xD4 ... 212 ... Ô ... 778
0x7E ... 126 ... ~ ... 777
0x83 ... 131 ...  ... 777
0xBB ... 187 ... » ... 777
0x03 ... 003 ...  ... 776
0x68 ... 104 ... h ... 776
0xC1 ... 193 ... Á ... 776
0xD7 ... 215 ... × ... 776
0x64 ... 100 ... d ... 775
0x7D ... 125 ... } ... 775
0x80 ... 128 ...  ... 775
0xF3 ... 243 ... ó ... 775
0x56 ... 086 ... V ... 774
0x6C ... 108 ... l ... 774
0xDB ... 219 ... Û ... 774
0x46 ... 070 ... F ... 773
0x5E ... 094 ... ^ ... 773
0x0E ... 014 ...  ... 772
0xDF ... 223 ... ß ... 772
0x19 ... 025 ...  ... 771
0x86 ... 134 ...  ... 771
0x88 ... 136 ...  ... 771
0xB3 ... 179 ... ³ ... 771
0xD0 ... 208 ... Ð ... 771
0xDE ... 222 ... Þ ... 771
0x23 ... 035 ... # ... 770
0x37 ... 055 ... 7 ... 770
0x87 ... 135 ...  ... 770
0xD5 ... 213 ... Õ ... 770
0x2D ... 045 ... - ... 769
0x73 ... 115 ... s ... 769
0x90 ... 144 ...  ... 769
0xAE ... 174 ... ® ... 769
0x5C ... 092 ... \ ... 768
0x7C ... 124 ... | ... 768
0xE3 ... 227 ... ã ... 768
0x57 ... 087 ... W ... 767
0x5B ... 091 ... [ ... 767
0x6B ... 107 ... k ... 767
0xBD ... 189 ... ½ ... 767
0xF8 ... 248 ... ø ... 767
0xD6 ... 214 ... Ö ... 766
0x85 ... 133 ...  ... 765
0x91 ... 145 ...  ... 765
0xD3 ... 211 ... Ó ... 765
0x47 ... 071 ... G ... 764
0x51 ... 081 ... Q ... 764
0x9B ... 155 ...  ... 764
0x04 ... 004 ...  ... 763
0x9E ... 158 ...  ... 763
0xAD ... 173 ...  ... 763
0xCC ... 204 ... Ì ... 763
0xF5 ... 245 ... õ ... 763
0x0F ... 015 ...  ... 762
0x1F ... 031 ...  ... 762
0x21 ... 033 ... ! ... 762
0xDA ... 218 ... Ú ... 762
0xE7 ... 231 ... ç ... 762
0x0D ... 013 ... ... 761
0x18 ... 024 ...  ... 761
0x9D ... 157 ...  ... 761
0xDC ... 220 ... Ü ... 761
0x22 ... 034 ... " ... 760
0x45 ... 069 ... E ... 760
0x82 ... 130 ...  ... 760
0x96 ... 150 ...  ... 760
0xE1 ... 225 ... á ... 760
0x3D ... 061 ... = ... 759
0xC8 ... 200 ... È ... 759
0xD2 ... 210 ... Ò ... 759
0xE9 ... 233 ... é ... 759
0xB9 ... 185 ... ¹ ... 758
0x59 ... 089 ... Y ... 757
0x89 ... 137 ...  ... 757
0x48 ... 072 ... H ... 756
0xA9 ... 169 ... © ... 756
0x35 ... 053 ... 5 ... 755
0x4F ... 079 ... O ... 754
0xBF ... 191 ... ¿ ... 754
0xCA ... 202 ... Ê ... 754
0x05 ... 005 ...  ... 753
0x32 ... 050 ... 2 ... 753
0xA6 ... 166 ... ¦ ... 753
0xED ... 237 ... í ... 753
0x27 ... 039 ... ' ... 752
0x41 ... 065 ... A ... 752
0x5F ... 095 ... _ ... 752
0x78 ... 120 ... x ... 752
0x7A ... 122 ... z ... 752
0xF1 ... 241 ... ñ ... 752
0x01 ... 001 ...  ... 751
0x49 ... 073 ... I ... 751
0x4E ... 078 ... N ... 751
0x6E ... 110 ... n ... 751
0x06 ... 006 ...  ... 750
0x58 ... 088 ... X ... 750
0x71 ... 113 ... q ... 750
0xA5 ... 165 ... ¥ ... 750
0xFF ... 255 ... ÿ ... 750
0x79 ... 121 ... y ... 749
0xA3 ... 163 ... £ ... 749
0xB1 ... 177 ... ± ... 749
0x44 ... 068 ... D ... 748
0x4A ... 074 ... J ... 748
0x6A ... 106 ... j ... 748
0xC4 ... 196 ... Ä ... 748
0xC6 ... 198 ... Æ ... 748
0x14 ... 020 ...  ... 747
0x9F ... 159 ...  ... 747
0xF6 ... 246 ... ö ... 747
0x34 ... 052 ... 4 ... 746
0xCF ... 207 ... Ï ... 746
0x09 ... 009 ...      ... 745
0x0A ... 010 ... ... 745
0x31 ... 049 ... 1 ... 745
0x66 ... 102 ... f ... 745
0x8C ... 140 ...  ... 745
0x97 ... 151 ...  ... 745
0xB6 ... 182 ... ¶ ... 745
0xE6 ... 230 ... æ ... 745
0x2A ... 042 ... * ... 744
0xC7 ... 199 ... Ç ... 744
0xC9 ... 201 ... É ... 744
0xEC ... 236 ... ì ... 744
0x38 ... 056 ... 8 ... 743
0x39 ... 057 ... 9 ... 743
0xFE ... 254 ... þ ... 743
0x3C ... 060 ... < ... 742
0x75 ... 117 ... u ... 741
0xCE ... 206 ... Î ... 741
0x1B ... 027 ...  ... 740
0x72 ... 114 ... r ... 740
0xF9 ... 249 ... ù ... 740
0x8B ... 139 ...  ... 739
0x29 ... 041 ... ) ... 738
0x3E ... 062 ... > ... 737
0x7B ... 123 ... { ... 737
0xA4 ... 164 ... ¤ ... 737
0x1E ... 030 ...  ... 736
0x3A ... 058 ... : ... 736
0x92 ... 146 ...  ... 736
0xF0 ... 240 ... ð ... 736
0xE4 ... 228 ... ä ... 735
0x6F ... 111 ... o ... 732
0x8E ... 142 ...  ... 732
0x93 ... 147 ...  ... 731
0x3F ... 063 ... ? ... 730
0x4B ... 075 ... K ... 729
0x9A ... 154 ...  ... 729
0xEB ... 235 ... ë ... 729
0x8A ... 138 ...  ... 728
0x84 ... 132 ...  ... 727
0x15 ... 021 ...  ... 725
0xD8 ... 216 ... Ø ... 725
0xE2 ... 226 ... â ... 725
0x50 ... 080 ... P ... 724
0x1C ... 028 ...  ... 722
0xB2 ... 178 ... ² ... 722
0x24 ... 036 ... $ ... 720
0x2B ... 043 ... + ... 720
0x94 ... 148 ...  ... 719
0x52 ... 082 ... R ... 718
0x11 ... 017 ...  ... 716
0xAF ... 175 ... ¯ ... 714
0xFC ... 252 ... ü ... 711
0x76 ... 118 ... v ... 708
0xEF ... 239 ... ï ... 708
0x55 ... 085 ... U ... 701
0xB8 ... 184 ... ¸ ... 696
0xB4 ... 180 ... ´ ... 693
0x00 ... 000 ...   ... 0
Total # of bytes: 256
Average # of occurrences: 765.28125
Apart from 0x20 with 1507 occurrences (and 0x00, which never appears), every byte is represented pretty close to the same number of times: about 765, give or take roughly 70. It seems to me that, whether encrypted or binary, there would be some bytes that get used a whole lot more than others. What do these findings mean, if anything? Does this look like a plain binary or the work of encryption?
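One way to put a number on "pretty close to the same" is sketched below. This is not the script used for the table above; it is a minimal example that computes the Shannon entropy in bits per byte and a chi-squared statistic against a perfectly flat histogram. The name `byte_stats` is made up for illustration. Random-looking data should give an entropy very near 8.0 and a chi-squared value somewhere around 255 (the degrees of freedom). Going by the counts in the table, the doubled 0x20 and the absent 0x00 would by themselves add roughly 1,480 to chi-squared, so those two values stand out even though the rest looks flat.

```python
import math
from collections import Counter


def byte_stats(data: bytes):
    """Return (entropy in bits per byte, chi-squared vs. a flat histogram)."""
    if not data:
        raise ValueError("no data to analyse")
    counts = Counter(data)
    n = len(data)
    # Shannon entropy: 8.0 means all 256 byte values are equally likely.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Chi-squared against the flat expectation of n/256 hits per byte value.
    expected = n / 256
    chi2 = sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))
    return entropy, chi2
```

A call would look like `byte_stats(open("dump.bin", "rb").read())`, where `dump.bin` is a placeholder path.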
 

MrSalts

Joined Apr 2, 2020
1,501
Embedded devices mostly hold numerical values, commands (binary "words", which you can look up for each microcontroller), register assignments, or bit-sized flags on various bytes. There is very little text in the generated hex file unless you are writing to a display or sending serial messages.

Note that a command may be something like decrement-skip-if-zero, and that can be a 6-bit value on a 32-instruction processor.
 

boostbuck

Joined Oct 5, 2017
179
It is binary, and could perhaps be encrypted.

What is the data stream you are examining? Do you have any reason to expect it to conform to English text?

I assume it is executable code, rather than data. If executable, it is unlikely to be encrypted. If encrypted, it is unlikely to demonstrate simple statistical distribution patterns - most encryption these days has that door blocked.

Your statistical analysis tells you: if it is data then it is not unencrypted English text, and likely not any other unencrypted language. Not much else. If code, your analysis means little.

If it is executable code, you can try a disassembler for the particular processor it executes on to see if you can generate a readable listing. If data, you need more information about how it is used.
 

michael8

Joined Jan 11, 2015
271
Also, you might not know the bit order. If, for example, it was an 8-bit memory with bits 76543210, perhaps it was used with the bits wired as 01234567 or even 10356724.
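A rough sketch of how such a rewiring could be tried in software follows. The function name `permute_bits` and the convention that the order string is written most-significant bit first are assumptions made for this example. Note that a fixed rewiring only relabels byte values, so it shuffles which rows of a byte histogram carry which counts; it cannot by itself flatten a histogram that was not already flat.

```python
def permute_bits(data: bytes, order: str) -> bytes:
    """Rewire every byte; order[i] names the source bit feeding output bit 7-i.

    "76543210" leaves the data unchanged, "01234567" reverses the bit order.
    """
    if sorted(order) != list("01234567"):
        raise ValueError("order must use each of the digits 0-7 exactly once")
    src = [int(ch) for ch in order]
    # Build a 256-entry lookup table once, then translate the whole buffer.
    table = bytearray(256)
    for value in range(256):
        out = 0
        for pos, bit in enumerate(src):
            if value & (1 << bit):
                out |= 1 << (7 - pos)
        table[value] = out
    return data.translate(bytes(table))
```

For instance, `permute_bits(data, "10356724")` applies the last wiring mentioned above, after which the result can be fed to a disassembler or a signature check again.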
 

xox

Joined Sep 8, 2017
674
The fact that there are roughly twice as many occurrences of whitespace (0x20) strongly indicates that there is at least some kind of structure to the data. Perhaps that value is used as a delimiter between each "section"?
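The delimiter idea can be checked with a short sketch like the one below, which tallies the distances between consecutive 0x20 bytes. The name `gap_histogram` is invented for this example. If 0x20 separates fixed-size sections, one gap value should dominate; if it is padding, gaps of 1 (runs of spaces) should be common; if the extra spaces are scattered at random, the gaps should be spread widely with no preferred value.

```python
from collections import Counter


def gap_histogram(data: bytes, marker: int = 0x20) -> Counter:
    """Count the distances between consecutive occurrences of marker."""
    positions = [i for i, b in enumerate(data) if b == marker]
    return Counter(b - a for a, b in zip(positions, positions[1:]))
```

Calling `gap_histogram(data).most_common(10)` lists the ten most frequent spacings.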

First examine the first few bytes of the data in ASCII to see if a "signature" is written there. If so, you are almost definitely dealing with a bona fide file format. (Which is bound to be documented somewhere.) Otherwise you can check to see whether or not there is an opcode there. (A jump instruction for example.) What CPU is this running on?
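A minimal sketch of that first step: dump the leading bytes as hex plus ASCII and compare them against a few well-known magic numbers. The helper names `hexdump` and `identify` and the small `MAGIC` table are illustrative only; the table is nowhere near exhaustive, and a proprietary format will simply come back as "unknown".

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex, and printable-ASCII columns."""
    pad = width * 3 - 1
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(pad)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  {text}")
    return "\n".join(lines)


# A few common signatures (illustrative, not a complete list).
MAGIC = {
    b"\x7fELF": "ELF executable",
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"PK\x03\x04": "zip",
}


def identify(header: bytes) -> str:
    """Return the name of the first matching signature, or 'unknown'."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return "unknown"
```

Printing `hexdump(data[:64])` is usually enough to spot a readable signature at the start of the file.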
 

Thread Starter

strantor

Joined Oct 3, 2010
5,905
It is binary, and could perhaps be encrypted.

What is the data stream you are examining? Do you have any reason to expect it to conform to English text?

I assume it is executable code, rather than data.
This is related to this thread. I retrieved the file from a BeagleBone Black that is embedded in one of my existing rail units. The BeagleBone just runs an HTTP server to host a web page to interface with the real star of the show, another embedded device (M6EM RFID module). When you upload firmware updates through the web page, the BeagleBone decrypts/decompresses the file, keeps the BeagleBone-specific parts and sends the RFID-specific parts to the RFID module. The file is called "m6em_module_image.sim" and I suspect it remained on the BeagleBone through poor housekeeping and contains the secret to decoding ISO10374 RFID tags.

I see you are not intending to show the retrieved "text".
My goal is not to crack it publicly on the internet and torpedo the manufacturer's IP. My goal isn't even to avoid paying for the unlock codes. I currently have an open order for 5 more readers including unlock codes, but they're backordered for several more months and I need a solution sooner. I'm trying to find the specific means by which these RFID tags are decoded. I already purchased the ISO10374 spec and it was laughably vague. A bunch of "shall" instead of "how." I cannot find any documentation on the internet about it as it is a very niche thing, apparently locked up in the minds of a handful of industry experts. So I am reduced to trying to reverse engineer the answers so that I can implement them on a totally different platform. My employer paid for these devices and the software on them, and I don't want to be responsible for pirating their software in a public forum.
 

Thread Starter

strantor

Joined Oct 3, 2010
5,905
The fact that there are roughly twice as many occurrences of whitespace (0x20) strongly indicates that there is at least some kind of structure to the data. Perhaps that value is used as a delimiter between each "section"?

First examine the first few bytes of the data in ASCII to see if a "signature" is written there. If so, you are almost definitely dealing with a bona fide file format. (Which is bound to be documented somewhere.) Otherwise you can check to see whether or not there is an opcode there. (A jump instruction for example.) What CPU is this running on?
Yes, the signature is present: 8 ASCII characters followed by 3 spaces, "TM-Spaik   ".

"TM" probably stands for "ThingMagic" - the manufacturer of the module. I don't know what the other letters might mean.

I don't know what processor it is. It's hidden under a RF can. I'll see in the morning if I can peel it off without destroying the module. I doubt they would have designed their own processor, but maybe.
 

xox

Joined Sep 8, 2017
674
Seems to be some undocumented proprietary format. In that case even if you were able to extract some measure of meaning from the whole mess of it that would still likely leave too many unknowns.

Going that "low-level" with things can be a great challenge, but realistically obtaining the necessary hardware will be a much better guarantee of success. Unless of course you have a lot of time to work on it, but you did mention a rapidly approaching deadline.

Why don't you contact the railroad? Maybe someone there could steer you toward a more reliable supplier. Or (if you can afford it) just pay a visit to the factory itself. It may seem a little unorthodox but my grandfather for one was rather keen on that method of doing business. Walk in with a checkbook and walk out with a bill of sale!
 

Thread Starter

strantor

Joined Oct 3, 2010
5,905
Seems to be some undocumented proprietary format. In that case even if you were able to extract some measure of meaning from the whole mess of it that would still likely leave too many unknowns.

Going that "low-level" with things can be a great challenge, but realistically obtaining the necessary hardware will be a much better guarantee of success. Unless of course you have a lot of time to work on it, but you did mention a rapidly approaching deadline.

Why don't you contact the railroad? Maybe someone there could steer you toward a more reliable supplier. Or (if you can afford it) just pay a visit to the factory itself. It may seem a little unorthodox but my grandfather for one was rather keen on that method of doing business. Walk in with a checkbook and walk out with a bill of sale!
So it isn't the railroad who runs this show, it's a company called TransCore. From what I have been able to put together, they wrote the spec that was adopted by ISO and the American Association of Railroads, and they wrote it based on the equipment they had already designed, and that's why the spec is so worthless. It's intentionally obfuscated to ensure TransCore's market share. TransCore had the market cornered on these devices for years and only in the past few years did they start getting any competition. Railroads like to standardize on things, so they still blindly buy TransCore stuff at 10x the price, while smaller outfits, companies with railyards (like my employer), buy from the competition. When I was told about the shortage of the units I buy, I went out and got quotes for TransCore readers and the lead time was even longer. The whole market is backordered. Chip shortage nonsense. Either I crack this nut or we reschedule basically a whole year's worth of scheduled projects. I gave an honest estimate of my ability to get it done, so there is already a finger hovering over the reschedule button, but I still would like to not fail.
 

Thread Starter

strantor

Joined Oct 3, 2010
5,905
Could it just be compressed with a common algorithm like LZW?
I'll look into LZW; I had never heard of it. I was able to open some of the other files that had odd extensions with tar on Linux and WinZip on Windows, but not this one.
 

BobTPH

Joined Jun 5, 2013
4,735
Could it just be compressed with a common algorithm like LZW?
Good call. A good compression algorithm should produce a nearly uniform distribution of byte values.

I would expect a typical binary data file to have patterns of repeated values unless compressed.

Bob
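Since a byte histogram looks flat for both compressed and encrypted data, one practical way to tell them apart is simply to try decompressing. The sketch below runs the decompressors that ship with Python's standard library and reports which ones succeed. The name `try_decompress` is made up for this example, and note the assumption being swapped in: LZW itself is not available in the standard library, so this only covers the deflate family, bzip2, and LZMA/xz.

```python
import bz2
import lzma
import zlib


def try_decompress(data: bytes) -> dict:
    """Attempt several stdlib decompressors; return the ones that succeed."""
    attempts = {
        "zlib": lambda d: zlib.decompress(d),
        "gzip": lambda d: zlib.decompress(d, wbits=31),
        "raw deflate": lambda d: zlib.decompress(d, wbits=-15),
        "bz2": bz2.decompress,
        "lzma/xz": lzma.decompress,
    }
    results = {}
    for name, func in attempts.items():
        try:
            results[name] = func(data)
        except (zlib.error, lzma.LZMAError, OSError, ValueError, EOFError):
            # This codec rejected the input; move on to the next one.
            continue
    return results
```

If the file starts with a proprietary header, the compressed stream may begin later, so it can be worth repeating the call on slices such as `data[offset:]` for a range of small offsets. An empty result for every offset does not prove encryption, only that none of these particular formats matched.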
 

Thread Starter

strantor

Joined Oct 3, 2010
5,905
I ended up not going back into the office until today. I am still researching a few things (LZW is one) but for now I can at least answer this:
What CPU is this running on?
Under the RF can, I found an Atmel AT91SAM7S256 and an Impinj R2000M2A. I don't know which of those two the "m6em_module_image.sim" file belongs to. I went over lots of Atmel/Microchip documentation trying to figure out if "*.sim" is a file format that they use, but didn't find anything other than the fact that there are a lot of options and it's possible that it is for the Atmel chip. But I'm leaning more towards the Impinj R2000 RFID chip. I am waiting on Impinj to grant me access to their developer portal so I can access the documents that will confirm/refute.
 