OK, I am trying to write a disassembler for the PIC18.
The instruction set and registers are listed below.
I am confused about how to go about translating the hex file back into assembly. Every 4 hex characters = 2 bytes = 1 instruction word (movff, call, goto and lfsr are the exception: they take two words).
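Here is a minimal sketch of reading the instruction words out of the file. It assumes a standard Intel HEX file, the format the MPLAB tools emit:

- Each `:` line is a record.
- Type 00 records carry data bytes at a byte address.
- PIC18 program memory is little-endian, so the file bytes `00 27` become the instruction word 0x2700.

The function name `hex_to_words` is hypothetical, and checksums are not verified.

```python
def hex_to_words(lines):
    """Map each even byte address to its 16-bit PIC18 instruction word.

    Handles data (00), end-of-file (01) and extended linear address (04)
    records; checksums are not verified in this sketch.
    """
    memory = {}   # byte address -> byte value
    upper = 0     # upper 16 address bits set by type-04 records
    for line in lines:
        line = line.strip()
        if not line.startswith(":"):
            continue
        raw = bytes.fromhex(line[1:])
        count, addr, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:                        # data record
            base = (upper << 16) | addr
            for i, b in enumerate(data):
                memory[base + i] = b
        elif rtype == 0x04:                      # extended linear address
            upper = (data[0] << 8) | data[1]
        elif rtype == 0x01:                      # end of file
            break
    words = {}
    for a in sorted({x & ~1 for x in memory}):
        lo = memory.get(a, 0xFF)                 # unprogrammed bytes read as 0xFF
        hi = memory.get(a + 1, 0xFF)
        words[a] = (hi << 8) | lo                # low byte comes first in the file
    return words

hex_file = [":0400000000271200C3", ":00000001FF"]
for address, word in hex_to_words(hex_file).items():
    print(f"{address:06X}: {word:04X}")          # 000000: 2700, then 000002: 0012
```

A PIC18 hex file typically also contains configuration bytes (at 0x300000) and EEPROM data (at 0xF00000). Filter those addresses out before decoding program memory.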
Should I convert each hexadecimal instruction word to binary and start comparing it with the bits of each instruction in the set until I get a match?
For instance, say I have 2700 = 0010 0111 0000 0000 in binary,
and I load an array of strings holding the binary pattern of each opcode.
For addwf F,D,A (0010 01da ffff ffff), for example, I would just have the array entry
"0010 01da ffff ffff".
Then is it just a matter of comparing 0010 0111 0000 0000 bit by bit until I weed out the only one that fits?
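That bit-by-bit comparison can be collapsed into one AND and one compare per pattern:

- Build a mask with a 1 wherever the pattern has a fixed 0 or 1.
- Build a value holding those fixed bits.
- The word matches when `word & mask == value`.

Operand letters such as d, a and f are simply left out of the comparison. This is a minimal sketch; `compile_pattern` and `matches` are hypothetical helper names.

```python
def compile_pattern(pattern):
    """Turn e.g. "0010 01da ffff ffff" into (mask, value) for 16-bit matching."""
    bits = pattern.replace(" ", "")
    assert len(bits) == 16, "PIC18 instruction words are 16 bits"
    mask = value = 0
    for ch in bits:
        mask <<= 1
        value <<= 1
        if ch in "01":          # fixed opcode bit: compare it
            mask |= 1
            value |= int(ch)
        # any letter (d, a, f, k, n, ...) is an operand bit: mask and value stay 0
    return mask, value

def matches(word, pattern):
    mask, value = compile_pattern(pattern)
    return (word & mask) == value

print(matches(0x2700, "0010 01da ffff ffff"))  # addwf  -> True
print(matches(0x2700, "0010 00da ffff ffff"))  # addwfc -> False
```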
I am worried that some opcodes could share the same starting byte. If so, how many bits do I have to check before I know the correct opcode? It seems to me that would vary, which makes this all the harder.
It is easy when an opcode has only one binary value. My problem is that I haven't found a way to distinguish between opcodes that have many binary values associated with them.
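On the overlap worry: each pattern's mask already says exactly which bits are fixed. So you never have to decide "how many bits" to check; the number varies per instruction and the mask encodes it.

For the PIC18 table in this post, no two patterns' fixed bits agree on the same 16-bit word, so at most one entry can match. The sketch below checks that by brute force over all 65,536 words for a subset of the table. The subset and the names `PATTERNS` and `candidates` are just for illustration. Feeding it the full table is a cheap way to catch typos in the table itself.

```python
# A subset of the opcode table (mnemonic, pattern); extend with the remaining rows.
PATTERNS = [
    ("nop",    "0000 0000 0000 0000"),
    ("tblrd*", "0000 0000 0000 1000"),
    ("retfie", "0000 0000 0001 000s"),
    ("return", "0000 0000 0001 001s"),
    ("movlb",  "0000 0001 0000 kkkk"),
    ("mulwf",  "0000 001a ffff ffff"),
    ("decf",   "0000 01da ffff ffff"),
    ("addwfc", "0010 00da ffff ffff"),
    ("addwf",  "0010 01da ffff ffff"),
    ("bra",    "1101 0nnn nnnn nnnn"),
    ("rcall",  "1101 1nnn nnnn nnnn"),
    ("bc",     "1110 0010 nnnn nnnn"),
    ("call",   "1110 110s kkkk kkkk"),
    ("lfsr",   "1110 1110 00ff kkkk"),
    ("goto",   "1110 1111 kkkk kkkk"),
]

def compile_pattern(pattern):
    """(mask, value): mask has 1s at fixed bits, value holds those fixed bits."""
    bits = pattern.replace(" ", "")
    mask = int("".join("1" if c in "01" else "0" for c in bits), 2)
    value = int("".join(c if c in "01" else "0" for c in bits), 2)
    return mask, value

COMPILED = [(name, *compile_pattern(p)) for name, p in PATTERNS]

def candidates(word):
    """All mnemonics whose fixed bits agree with `word`."""
    return [name for name, mask, value in COMPILED if (word & mask) == value]

# Brute force: no 16-bit word should match more than one entry.
worst = max(len(candidates(w)) for w in range(0x10000))
print("max matches per word:", worst)   # 1 for this subset
print(candidates(0x2700))                # ['addwf']
```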
Any help or suggestions would be great.
0010 0111 0000 0000 => addwf 0, right? But what if da = 00 instead of 11? How would I represent that differently?
I guess I don't understand.
So for addwf, what would I write when da = 00 as opposed to 11?
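The d and a bits don't pick a different instruction. Like f, they are operands that get extracted and printed. So 0x2700 (da = 11) and 0x2400 (da = 00) are both addwf, just with different operands.

This is a minimal sketch, assuming MPASM-style output in which d prints as W (0) or F (1) and a prints as ACCESS (0) or BANKED (1). `extract_fields` and `format_addwf` are hypothetical helpers.

```python
def extract_fields(pattern, word):
    """Collect the bits under each operand letter, e.g. {'d': 1, 'a': 1, 'f': 0}."""
    bits = pattern.replace(" ", "")
    fields = {}
    for i, ch in enumerate(bits):
        if ch in "01":
            continue                          # fixed opcode bit, not an operand
        bit = (word >> (15 - i)) & 1          # bit i, counted from the MSB
        fields[ch] = (fields.get(ch, 0) << 1) | bit
    return fields

def format_addwf(word):
    f = extract_fields("0010 01da ffff ffff", word)
    dest = "F" if f["d"] else "W"             # d=1: result back into f, d=0: into WREG
    bank = "BANKED" if f["a"] else "ACCESS"   # a=1: bank from BSR, a=0: Access Bank
    return f"addwf 0x{f['f']:02X}, {dest}, {bank}"

print(format_addwf(0x2700))  # addwf 0x00, F, BANKED   (da = 11)
print(format_addwf(0x2400))  # addwf 0x00, W, ACCESS   (da = 00)
```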
I now get that f means a register and k a constant, but I am confused about how to display the difference in d, a, b and s when disassembling.
I figured out how I am going to disassemble: basically, look for full matches first and then go to smaller and smaller sizes.
For example, it will first try to match the fully fixed opcodes such as tblrd* through tblwt+*, where all 16 bits are fixed,
then go to the next biggest: something like matching the first 10 bits (as in lfsr, 1110 1110 00ff kkkk), then the first 8 bits, and so on.
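Sorting the table by the number of fixed bits in each pattern gives that "full matches first, then shorter and shorter" order without hand-picking prefix lengths. With the PIC18 table as listed, at most one pattern matches any word, so the order mainly matters if you later add broader fallback entries.

This is a minimal sketch. `build_decoder` is a hypothetical name, and the `dw` fallback for unmatched words is an assumption about how you want to print them.

```python
def compile_pattern(pattern):
    bits = pattern.replace(" ", "")
    mask = int("".join("1" if c in "01" else "0" for c in bits), 2)
    value = int("".join(c if c in "01" else "0" for c in bits), 2)
    return mask, value

def build_decoder(table):
    """Return (decode, order): decode(word) tries the most specific patterns first."""
    compiled = [(name, *compile_pattern(p)) for name, p in table]
    # More fixed bits (more 1s in the mask) means a more specific pattern.
    compiled.sort(key=lambda e: bin(e[1]).count("1"), reverse=True)

    def decode(word):
        for name, mask, value in compiled:
            if (word & mask) == value:
                return name
        return f"dw 0x{word:04X}"   # nothing matched: emit the raw word
    return decode, [name for name, _, _ in compiled]

decode, order = build_decoder([
    ("addwf",   "0010 01da ffff ffff"),
    ("lfsr",    "1110 1110 00ff kkkk"),
    ("movlb",   "0000 0001 0000 kkkk"),
    ("tblrd*+", "0000 0000 0000 1001"),
])
print(order)            # ['tblrd*+', 'movlb', 'lfsr', 'addwf']
print(decode(0x0009))   # tblrd*+
print(decode(0xEE12))   # lfsr
```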
But what do I do with d, a, b and s?
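One way to handle d, a, b and s generally is to give each operand letter its own printing rule and let the pattern decide which letters are present.

This is a sketch under these assumptions:

- MPASM-style spellings (W/F, ACCESS/BANKED, FAST).
- f and k printed as raw hex.
- Hypothetical names `FORMATTERS`, `extract_fields` and `render`.

```python
# How each operand letter is printed (MPASM-style spellings assumed).
FORMATTERS = {
    "f": lambda v: f"0x{v:02X}",                 # file register address
    "b": lambda v: str(v),                       # bit number 0-7
    "d": lambda v: "F" if v else "W",            # 1 -> back into f, 0 -> WREG
    "a": lambda v: "BANKED" if v else "ACCESS",  # 1 -> bank from BSR, 0 -> Access Bank
    "k": lambda v: f"0x{v:02X}",                 # literal constant
}

def extract_fields(pattern, word):
    """Gather the bits under each operand letter, most significant bit first."""
    bits = pattern.replace(" ", "")
    fields = {}
    for i, ch in enumerate(bits):
        if ch not in "01":
            fields[ch] = (fields.get(ch, 0) << 1) | ((word >> (15 - i)) & 1)
    return fields

def render(mnemonic, pattern, word):
    fields = extract_fields(pattern, word)
    # Operands in assembler order: f, b, d, a (or k for literal instructions).
    ops = [FORMATTERS[c](fields[c]) for c in "fbdak" if c in fields]
    if fields.get("s"):                          # call/return/retfie with s = 1
        ops.append("FAST")
    return mnemonic + (" " + ", ".join(ops) if ops else "")

print(render("bsf", "1000 bbba ffff ffff", 0x8681))     # bsf 0x81, 3, ACCESS
print(render("return", "0000 0000 0001 001s", 0x0013))  # return FAST
print(render("movlw", "0000 1110 kkkk kkkk", 0x0E55))   # movlw 0x55
```

Some fields need their own rules and are left out here:

- Branch offsets (n).
- The ff field of lfsr.
- The second word of call, goto and movff.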
Instruction set (mnemonic, operands, 16-bit encoding):
nop 0000 0000 0000 0000
;nil 1111 pppp pppp pppp
addwf F,D,A 0010 01da ffff ffff
addwfc F,D,A 0010 00da ffff ffff
andwf F,D,A 0001 01da ffff ffff
clrf F,A 0110 101a ffff ffff
comf F,D,A 0001 11da ffff ffff
cpfseq F,A 0110 001a ffff ffff
cpfsgt F,A 0110 010a ffff ffff
cpfslt F,A 0110 000a ffff ffff
decf F,D,A 0000 01da ffff ffff
decfsz F,D,A 0010 11da ffff ffff
dcfsnz F,D,A 0100 11da ffff ffff
incf F,D,A 0010 10da ffff ffff
incfsz F,D,A 0011 11da ffff ffff
infsnz F,D,A 0100 10da ffff ffff
iorwf F,D,A 0001 00da ffff ffff
movf F,D,A 0101 00da ffff ffff
movff Y 1100 ffff ffff ffff
movwf F,A 0110 111a ffff ffff
mulwf F,A 0000 001a ffff ffff
negf F,A 0110 110a ffff ffff
rlcf F,D,A 0011 01da ffff ffff
rlncf F,D,A 0100 01da ffff ffff
rrcf F,D,A 0011 00da ffff ffff
rrncf F,D,A 0100 00da ffff ffff
setf F,A 0110 100a ffff ffff
subfwb F,D,A 0101 01da ffff ffff
subwf F,D,A 0101 11da ffff ffff
subwfb F,D,A 0101 10da ffff ffff
swapf F,D,A 0011 10da ffff ffff
tstfsz F,A 0110 011a ffff ffff
xorwf F,D,A 0001 10da ffff ffff
bcf F,B,A 1001 bbba ffff ffff
bsf F,B,A 1000 bbba ffff ffff
btfsc F,B,A 1011 bbba ffff ffff
btfss F,B,A 1010 bbba ffff ffff
btg F,B,A 0111 bbba ffff ffff
bc N 1110 0010 nnnn nnnn
bn N 1110 0110 nnnn nnnn
bnc N 1110 0011 nnnn nnnn
bnn N 1110 0111 nnnn nnnn
bnov N 1110 0101 nnnn nnnn
bnz N 1110 0001 nnnn nnnn
bov N 1110 0100 nnnn nnnn
bra M 1101 0nnn nnnn nnnn
bz N 1110 0000 nnnn nnnn
call W 1110 110s kkkk kkkk
clrwdt 0000 0000 0000 0100
daw 0000 0000 0000 0111
goto W 1110 1111 kkkk kkkk
pop 0000 0000 0000 0110
push 0000 0000 0000 0101
rcall M 1101 1nnn nnnn nnnn
reset 0000 0000 1111 1111
retfie S 0000 0000 0001 000s
retlw K 0000 1100 kkkk kkkk
return S 0000 0000 0001 001s
sleep 0000 0000 0000 0011
addlw K 0000 1111 kkkk kkkk
andlw K 0000 1011 kkkk kkkk
iorlw K 0000 1001 kkkk kkkk
lfsr Z 1110 1110 00ff kkkk
movlb C 0000 0001 0000 kkkk
movlw K 0000 1110 kkkk kkkk
mullw K 0000 1101 kkkk kkkk
sublw K 0000 1000 kkkk kkkk
xorlw K 0000 1010 kkkk kkkk
tblrd* 0000 0000 0000 1000
tblrd*+ 0000 0000 0000 1001
tblrd*- 0000 0000 0000 1010
tblrd+* 0000 0000 0000 1011
tblwt* 0000 0000 0000 1100
tblwt*+ 0000 0000 0000 1101
tblwt*- 0000 0000 0000 1110
tblwt+* 0000 0000 0000 1111
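Four entries in this table are two-word instructions: movff, call, goto and lfsr. The word after them is the ;nil 1111 ... entry and carries the rest of the operand, so the disassembler has to consume both words.

This sketch assumes the Microchip PIC18 encodings:

- **movff:** 12-bit source in word 1, 12-bit destination in word 2.
- **call and goto:** a 20-bit word address k, with the low 8 bits in word 1 and the high 12 bits in word 2; byte address = 2k.
- **lfsr:** FSR number ff and the top 4 literal bits in word 1, the low 8 literal bits in word 2.

`decode_two_word` is a hypothetical name.

```python
def decode_two_word(w1, w2):
    """Decode movff/call/goto/lfsr from their two words; None if w1 is none of them."""
    if w2 >> 12 != 0xF:
        return None                        # second word must be 1111 xxxx xxxx xxxx
    low12 = w2 & 0x0FFF
    if w1 >> 12 == 0xC:                    # movff: 1100 ffff ffff ffff
        return f"movff 0x{w1 & 0xFFF:03X}, 0x{low12:03X}"
    if w1 >> 8 == 0xEF:                    # goto: 1110 1111 kkkk kkkk
        k = (low12 << 8) | (w1 & 0xFF)     # 20-bit word address
        return f"goto 0x{2 * k:06X}"       # byte address = 2 * word address
    if w1 >> 9 == 0b1110110:               # call: 1110 110s kkkk kkkk
        k = (low12 << 8) | (w1 & 0xFF)
        fast = ", FAST" if (w1 >> 8) & 1 else ""
        return f"call 0x{2 * k:06X}{fast}"
    if w1 >> 6 == 0b1110111000:            # lfsr: 1110 1110 00ff kkkk
        fsr = (w1 >> 4) & 0x3
        k = ((w1 & 0xF) << 8) | (w2 & 0xFF)   # word 2 is 1111 0000 kkkk kkkk
        return f"lfsr {fsr}, 0x{k:03X}"
    return None

print(decode_two_word(0xEF10, 0xF000))   # goto 0x000020
print(decode_two_word(0xED08, 0xF001))   # call 0x000210, FAST
print(decode_two_word(0xEE12, 0xF034))   # lfsr 1, 0x234
print(decode_two_word(0xCFE8, 0xFF81))   # movff 0xFE8, 0xF81
```

When one of these matches, advance by 4 bytes instead of 2. A 1111 word reached on its own (for example as the target of a skip) executes as a nop, which is what the ;nil entry describes.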
Special function registers (address, name):
0FFF TOSU
0FFE TOSH
0FFD TOSL
0FFC STKPTR
0FFB PCLATU
0FFA PCLATH
0FF9 PCL
0FF8 TBLPTRU
0FF7 TBLPTRH
0FF6 TBLPTRL
0FF5 TABLAT
0FF4 PRODH
0FF3 PRODL
0FF2 INTCON
0FF1 INTCON2
0FF0 INTCON3
0FEF INDF0
0FEE POSTINC0
0FED POSTDEC0
0FEC PREINC0
0FEB PLUSW0
0FEA FSR0H
0FE9 FSR0L
0FE8 WREG
0FE7 INDF1
0FE6 POSTINC1
0FE5 POSTDEC1
0FE4 PREINC1
0FE3 PLUSW1
0FE2 FSR1H
0FE1 FSR1L
0FE0 BSR
0FDF INDF2
0FDE POSTINC2
0FDD POSTDEC2
0FDC PREINC2
0FDB PLUSW2
0FDA FSR2H
0FD9 FSR2L
0FD8 STATUS
0FD7 TMR0H
0FD6 TMR0L
0FD5 T0CON
0FD4 0FD4h
0FD3 OSCCON
0FD2 LVDCON
0FD1 WDTCON
0FD0 RCON
0FCF TMR1H
0FCE TMR1L
0FCD T1CON
0FCC TMR2
0FCB PR2
0FCA T2CON
0FC9 SSPBUF
0FC8 SSPADD
0FC7 SSPSTAT
0FC6 SSPCON1
0FC5 SSPCON2
0FC4 ADRESH
0FC3 ADRESL
0FC2 ADCON0
0FC1 ADCON1
0FC0 ADCON2
0FBF CCPR1H
0FBE CCPR1L
0FBD CCP1CON
0FBC CCPR2H
0FBB CCPR2L
0FBA CCP2CON
0FB9 CCPR3H
0FB8 CCPR3L
0FB7 CCP3CON
0FB6 0FB6h
0FB5 CVRCON
0FB4 CMCON
0FB3 TMR3H
0FB2 TMR3L
0FB1 T3CON
0FB0 PSPCON
0FAF SPBRG1
0FAE RCREG1
0FAD TXREG1
0FAC TXSTA1
0FAB RCSTA1
0FAA EEADRH
0FA9 EEADR
0FA8 EEDATA
0FA7 EECON2
0FA6 EECON1
0FA5 IPR3
0FA4 PIR3
0FA3 PIE3
0FA2 IPR2
0FA1 PIR2
0FA0 PIE2
0F9F IPR1
0F9E PIR1
0F9D PIE1
0F9C MEMCON
0F9B 0F9Bh
0F9A TRISJ
0F99 TRISH
0F98 TRISG
0F97 TRISF
0F96 TRISE
0F95 TRISD
0F94 TRISC
0F93 TRISB
0F92 TRISA
0F91 LATJ
0F90 LATH
0F8F LATG
0F8E LATF
0F8D LATE
0F8C LATD
0F8B LATC
0F8A LATB
0F89 LATA
0F88 PORTJ
0F87 PORTH
0F86 PORTG
0F85 PORTF
0F84 PORTE
0F83 PORTD
0F82 PORTC
0F81 PORTB
0F80 PORTA
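To print register names instead of raw f values, combine f with the a bit. This sketch assumes the Access Bank split for this device family is at 0x80: offsets 0x00-0x7F map to GPRs, and 0x80-0xFF map to the SFRs at 0xF80-0xFFF, which matches this table's lowest entry, 0F80. Some PIC18 parts split at 0x60 instead, so check the datasheet.

When a = 1, the full address depends on the BSR value at run time, which a linear disassembler generally does not know, so the sketch prints the raw offset. `SFR_NAMES` holds only a few entries from the table above, and `register_name` is a hypothetical helper.

```python
# A few entries from the register table above; load the rest the same way.
SFR_NAMES = {0xFE8: "WREG", 0xFD8: "STATUS", 0xF81: "PORTB", 0xF93: "TRISB"}

ACCESS_SPLIT = 0x80   # assumption: Access Bank SFR window starts at offset 0x80

def register_name(f, a):
    """Name for the 8-bit file operand f given the access bit a."""
    if a == 0 and f >= ACCESS_SPLIT:
        addr = 0xF00 | f                  # upper half of Access Bank -> SFR 0xF80-0xFFF
        return SFR_NAMES.get(addr, f"0x{addr:03X}")
    return f"0x{f:02X}"                   # GPR, or banked (depends on BSR at run time)

print(register_name(0x81, 0))   # PORTB
print(register_name(0x81, 1))   # 0x81
```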
Operand legend:
where: f register file address (8 bits)
d destination select:
(0, -> W), (1, -> f)
the letters w or f may be used
to select the destination
a RAM access select:
(0, -> Access Bank, BSR ignored)
(1, -> bank selected by BSR)
s fast call/return select (call, return, retfie):
(0, -> normal)
(1, -> fast: WREG, STATUS and BSR saved to or restored from the shadow registers)
b bit address within an 8-bit file register (0-7)
n signed relative branch offset, in instruction words
p second word of a two-word instruction (the ;nil entry)
k literal constant
label label name
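The n fields are signed, two's-complement offsets counted in instruction words and relative to the next instruction. So the branch target is PC + 2 + 2n, where PC is the byte address of the branch itself.

n is 8 bits for bc/bn/bnc/bnn/bnov/bnz/bov/bz and 11 bits for bra/rcall. This is a minimal sketch; `sign_extend` and `branch_target` are hypothetical helpers.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def branch_target(pc, word):
    """Byte address targeted by a PIC18 relative branch located at byte address pc."""
    if word >> 11 in (0b11010, 0b11011):     # bra / rcall: 1101 xnnn nnnn nnnn
        n = sign_extend(word & 0x7FF, 11)
    elif word >> 11 == 0b11100:              # bc ... bz: 1110 0ccc nnnn nnnn
        n = sign_extend(word & 0xFF, 8)
    else:
        raise ValueError(f"0x{word:04X} is not a relative branch")
    return pc + 2 + 2 * n

print(hex(branch_target(0x100, 0xE2FE)))   # 0xfe   (bc, n = -2)
print(hex(branch_target(0x100, 0xD7FF)))   # 0x100  (bra to itself, n = -1)
```

Printing the computed target as a label, rather than printing n itself, makes the output easier to read and to reassemble.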