XC8: .asm to C (i.e. Lowering One's Expectations)

nsaspook

Joined Aug 27, 2009
16,336


Well....talk about bloat!

So, I need a state machine in an interrupt so as to quickly execute some action based upon the current state. I encoded it the same as before in the main line code, and lo and behold:

Code:
/opt/microchip/xc8/v2.40/pic/sources/c99/common/Umul16.c:15:: advisory: (1510) non-reentrant function "___wmul" appears in multiple call graphs and has been duplicated by the compiler
To select the proper state function from the function array, 16-bit multiplier code must be linked into the app. This code is non-reentrant, so the compiler must import at least two copies of the same multiplier function. I assume that if I include state machines in other interrupt handlers, each will require its own 16-bit multiplier routine.

This for something that I can do in .asm with a handful of instructions.

Just FYI -- as the project gets bigger, the C to .asm code size ratio is getting worse. I'm at about 100:1 right now -- mostly because I needed sprintf to do some byte to ASCII conversion. Obviously, I'll have to hand-roll some conversion code. I cannot depend on the stock libs to produce reasonably sized or reasonably fast code.
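As a rough illustration of the hand-rolled conversion mentioned above, here is a minimal sketch of a byte to decimal ASCII routine that uses only repeated subtraction, so it needs neither sprintf nor any divide or multiply helper. The name u8_to_ascii and the buffer convention are hypothetical, not from the thread, and the resulting code size under XC8 has not been measured here.

```c
#include <stdint.h>

/* Hypothetical helper: writes v as decimal ASCII into out (at least 4 bytes,
 * including the terminating NUL) and returns out. Leading zeros are dropped. */
char *u8_to_ascii(uint8_t v, char *out)
{
    uint8_t n = 0;
    uint8_t hundreds = 0;
    uint8_t tens = 0;

    /* Repeated subtraction avoids pulling in division routines. */
    while (v >= 100U) { v -= 100U; hundreds++; }
    while (v >= 10U)  { v -= 10U;  tens++; }

    if (hundreds != 0U) {
        out[n++] = (char)('0' + hundreds);
    }
    if (hundreds != 0U || tens != 0U) {
        out[n++] = (char)('0' + tens);
    }
    out[n++] = (char)('0' + v);
    out[n] = '\0';
    return out;
}
```

Called as u8_to_ascii(205, buf), this leaves the string "205" in buf. The worst case is two subtractions for the hundreds digit and nine for the tens digit.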

And...the fact that I have to perform a multiplication in an interrupt handler! Absolutely atrocious. It'll probably be quicker to decode the 16 states with a stock if-then-else structure.
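One alternative to indexing a function array is a switch on an 8-bit state variable, sketched below. The state names and the run_ticks counter are hypothetical placeholders, since the thread does not list the real 16 states. Whether a given XC8 version and optimization level compiles this without calling the multiply helper is an assumption that would need to be confirmed in the generated listing.

```c
#include <stdint.h>

/* Hypothetical state names; the real 16 states are not given in the thread. */
enum { ST_IDLE, ST_ARMED, ST_RUN, ST_DONE };

volatile uint8_t isr_state = ST_IDLE;   /* 8-bit state variable */
volatile uint8_t run_ticks;             /* illustrative per-state data */

/* Intended to be called from the interrupt handler: a single switch,
 * with no function-pointer array to index. */
void isr_state_step(void)
{
    switch (isr_state) {
    case ST_IDLE:
        isr_state = ST_ARMED;
        break;
    case ST_ARMED:
        run_ticks = 0;
        isr_state = ST_RUN;
        break;
    case ST_RUN:
        if (++run_ticks >= 3U) {
            isr_state = ST_DONE;
        }
        break;
    default:                            /* ST_DONE and any invalid value */
        isr_state = ST_IDLE;
        break;
    }
}
```

Keeping the state in a uint8_t rather than an int is deliberate: it gives the compiler the chance to decode the state with byte-wide compares or a computed jump.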

I hate C. At least for 8 bit micros. For all the reasons I've been bitching about the last 10 years. I was right.
MPLAB® XC8 C Compiler User’s Guide
5.9.7.1 DISABLING DUPLICATION

The automatic duplication of the function can be inhibited by the use of a special pragma. This should only be done if the source code guarantees that an interrupt cannot occur while the function is being called from any main-line code. Typically this would be achieved by disabling interrupts before calling the function. It is not sufficient to disable the interrupts inside the function after it has been called; if an interrupt occurs when executing the function, the code can fail. See Section 5.9.5 “Enabling Interrupts”, for more information on how interrupts can be disabled.

The pragma is:

#pragma interrupt_level 1

The pragma should be placed before the definition of the function that is not to be duplicated. The pragma will only affect the first function whose definition follows. For example, if the function read is only ever called from main-line code when the interrupts are disabled, then duplication of the function can be prevented if it is also called from an interrupt function as follows.

#pragma interrupt_level 1
int read(char device)
{
    // ...
}

In main-line code, this function would typically be called as follows:

di(); // turn off interrupts
read(IN_CH1);
ei(); // re-enable interrupts

The level value specified indicates for which interrupt the function will not be duplicated. For mid-range devices, the level should always be 1; for PIC18 devices it can be 1 or 2 for the low- or high-priority interrupt functions, respectively. To disable duplication for both interrupt priorities, use the pragma twice to specify both levels 1 and 2. The following function will not be duplicated if it is also called from the low- and high-priority interrupt functions.

#pragma interrupt_level 1
#pragma interrupt_level 2
int timestwo(int a)
{
    return a * 2;
}
You can also check the XC8 stack model.
https://microchipdeveloper.com/xc8:duplicated-functions

multiplication in an interrupt handler!

Sure, if the software is interrupt bound (packet processing I/O in an ISR) and we have vectored high/low level interrupts on an 8-bit processor with shadow registers.
 

Thread Starter

joeyd999

Joined Jun 6, 2011
6,333
multiplication in an interrupt handler!

Sure, if the software is interrupt bound (packet processing I/O in an ISR) and we have vectored high/low level interrupts on an 8-bit processor with shadow registers.
I just want to write an efficient state decoder. It shouldn't take more than 8 or so instruction cycles!
 

ApacheKid

Joined Jan 12, 2015
1,762
I just want to write an efficient state decoder. It shouldn't take more than 8 or so instruction cycles!
Isn't it time to move on? Surely there are alternatives to these primitive 8-bit devices in 2022? I mean, I was programming 8-bit MPUs without even an assembler in the late 70s, almost fifty years ago now!

[Two attached images: 1668717516703.png and 1668717546896.png]

Are you mass producing this gadget? What exactly are the parameters you're working within? What are your constraints? Really, I'm very curious.
 

nsaspook

Joined Aug 27, 2009
16,336
Those primitive 8-bit devices are a huge market. Some people, unfamiliar with today's 8-bit processors in embedded products, seem unable to understand why 8-bit controllers even exist.

https://www.imarcgroup.com/8-bit-microcontroller-market#:~:text=The global 8-bit microcontroller,6.70% during 2022-2027.
The global 8-bit microcontroller market reached a value of US$ 7.14 Billion in 2021. Looking forward, IMARC Group expects the market to reach a value of US$ 10.73 Billion by 2027, exhibiting a CAGR of 6.70% during 2022-2027.
https://www.embedded.com/why-wont-the-8-bit-microcontroller-die/
Why Won’t the 8-bit Microcontroller Die?
Why won’t 8-bit microcontrollers die? More than a decade ago, Jack Ganssle published a rant, 8-bits is dead, where he concluded, “as high-end processors drop in price, those at the bottom get cheaper too, which opens up new markets that could never have afforded semiconductor intelligence.” While many teams focus on the cutting edge of the industry, there are just as many, if not more, opportunities where simple and cheap microcontrollers fit the need!

While many of us who work at the cutting-edge wonder why the 8-bit microcontroller won’t die, the truth is that 8-bit microcontrollers probably dominate our industry. We’re just too busy chasing the latest and greatest flashy, shiny microcontroller or buying into the marketing hype to recognize the capabilities of these industry workhorses.
 

ApacheKid

Joined Jan 12, 2015
1,762
Those primitive 8-bit devices are a huge market. Some people, unfamiliar with today's 8-bit processors in embedded products, seem unable to understand why 8-bit controllers even exist.

https://www.imarcgroup.com/8-bit-microcontroller-market#:~:text=The global 8-bit microcontroller,6.70% during 2022-2027.

https://www.embedded.com/why-wont-the-8-bit-microcontroller-die/
Why Won’t the 8-bit Microcontroller Die?
Well, I never argued that there wasn't a market; I was just suggesting that this might be the time for a change, given the hassle he's clearly experiencing trying to get anything acceptable done in C on that device.

In fact, with his clear expertise in these devices and his obvious skill with assembler, he should consider writing a C compiler that can do the kinds of things he'd like; there might be a market for such a thing...
 

Thread Starter

joeyd999

Joined Jun 6, 2011
6,333
We’re just too busy chasing the latest and greatest flashy, shiny microcontroller or buying into the marketing hype to recognize the capabilities of these industry workhorses.
Yup. And I now understand why many believe these devices are so underpowered: C cuts the native performance by at least a factor of 10 in all of code size, execution speed, and power consumption.
 

WBahn

Joined Mar 31, 2012
32,919
Yup. And I now understand why many believe these devices are so underpowered: C cuts the native performance by at least a factor of 10 in all of code size, execution speed, and power consumption.
Which is probably less than it would get cut by if nearly any other high-level language were used. That's the trade off in wanting to use a high-level language -- you get to accept the overhead associated with the virtual machine model it implements.
 

Thread Starter

joeyd999

Joined Jun 6, 2011
6,333
Which is probably less than it would get cut by if nearly any other high-level language were used. That's the trade off in wanting to use a high-level language -- you get to accept the overhead associated with the virtual machine model it implements.
I don't want to use a high-level language.
 

ApacheKid

Joined Jan 12, 2015
1,762
Yup. And I now understand why many believe these devices are so underpowered: C cuts the native performance by at least a factor of 10 in all of code size, execution speed, and power consumption.
Sorry, but this just does not sound right. I've never seen that kind of disparity before on any platform; it's as if the machine is spending 90% of its time doing unnecessary work. I've never seen this in mainframes, minicomputers, or 8-, 16-, 32- or 64-bit microprocessors.

Are you possibly, unwittingly, building this in debug mode? If so, many (even rudimentary) optimizations are skipped so that the generated code can easily be made to correspond with the source code.

Contact Microchip and tell them that their compiler generates code that consumes ten times the CPU resources that get consumed if the code is written in assembler. It sounds outlandish.

What is the problem? Let me ask you, please: what is the shortest C code you can write that proves the ten-times-slower claim? Let me see the code, the generated assembler, and your assembler equivalent, please.

And as for "many believe these devices are so underpowered": they are! An 8-bit device manipulates 8 bits at a time, whereas a 32-bit device moves four times as many bytes at a time. Of course 8 bits can't compare with 32; why do you think they even make 32- and 64-bit processors?
 

ApacheKid

Joined Jan 12, 2015
1,762
I just stumbled upon this:

Many years ago I was teaching someone to program in C. The exercise was to rotate a graphic through 90 degrees. He came back with a solution that took several minutes to complete, mainly because he was using multiplies and divides etc.

I showed him how to recast the problem using bit shifts, and the time to process came down to about 30 seconds on the non-optimizing compiler he had.

I had just got an optimizing compiler and the same code rotated the graphic in < 5 seconds. I looked at the assembly code that the compiler was generating, and from what I saw decided there and then that my days of writing assembler were over.
From here.

So, something to at least bear in mind is that the way someone expresses something in C might be inefficient; that is, they might be able to express it very well in assembler but express it, comparatively, poorly in C.
 

Thread Starter

joeyd999

Joined Jun 6, 2011
6,333
So, something to at least bear in mind is that the way someone expresses something in C might be inefficient; that is, they might be able to express it very well in assembler but express it, comparatively, poorly in C.
Why (TF!) do you think I am spending sooooooo much time trying to discover constructs in C that are comparable in code size and execution speed to .asm? I know the C compiler is stupid, and that I have to babysit it to get the smallest, most efficient code. And, even then, I am not getting close on either measure to what I can do in .asm, for all but the most trivial constructs.
 

MrSalts

Joined Apr 2, 2020
2,767
I can't imagine someone being so willing to waste time over a $39 license unlock of the PRO compiler, or declining to move to a 16- or 32-bit chip to save $1 or $2. I can only conclude that the perceived learning curve is too steep to take another step, and that the margins are too slim to deviate from the plan. But I'll just keep watching this slow-speed accident. Post 174 175
 

Thread Starter

joeyd999

Joined Jun 6, 2011
6,333
A challenge, @ApacheKid:

Here's a routine I use for 24-bit binary to packed BCD conversion in PIC18 MPASM. It uses 31 words of program space (62 bytes), and requires 571 instruction cycles including the call and the return. This is equivalent to 35.6875 µs execution time with a 64 MHz clock (typically the fastest clock available on a PIC18).
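The timing figure above follows from the PIC18 executing one instruction cycle per four oscillator clocks, so a 64 MHz clock gives 16 MHz instruction cycles of 62.5 ns each. A minimal sketch of that arithmetic, with the hypothetical helper name cycles_to_ns:

```c
#include <stdint.h>

/* Hypothetical helper: converts PIC18 instruction cycles to whole
 * nanoseconds, assuming one instruction cycle = 4 oscillator clocks.
 * The result is truncated, not rounded. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t fosc_hz)
{
    uint32_t fcy_hz = fosc_hz / 4U;     /* instruction-cycle frequency */
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL) / fcy_hz);
}
```

For 571 cycles at 64,000,000 Hz this returns 35687, i.e. 35.6875 µs truncated to whole nanoseconds.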

Write me your best code in C. You and/or I can compile it in any way you wish. And we'll compare results.

Code:
;****************************************************
;** BIN2BCD -- Convert temp2:0 to BCD in bcd[3:0]    **
;****************************************************

bin2bcd  clrf      bcd         ;preclear BCD result
         clrf      bcd+1
         clrf      bcd+2
         clrf      bcd+3
     
         movlw     24          ;24 bits to convert
         movwf     bitcnt

         bra       nodbl       ;no doubling required first pass
     
;double current bcd value

b2blp    movf      bcd,w
         addwf     bcd,w
         daw                   ;don't forget to decimal adjust!
         movwf     bcd

         movf      bcd+1,w
         addwfc    bcd+1,w
         daw
         movwf     bcd+1

         movf      bcd+2,w
         addwfc    bcd+2,w
         daw
         movwf     bcd+2

         movf      bcd+3,w
         addwfc    bcd+3,w
         daw
         movwf     bcd+3

nodbl    rlcf      temp,f       ;rotate out the high bit
         rlcf      temp+1,f
         rlcf      temp+2,f
         btfsc     status,c
         incf      bcd,f        ;if 1, add it in to the bcd value
     
         decfsz    bitcnt,f
         bra       b2blp
 
         return

;**********  End BIN2BCD  ***************
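For comparison, here is a minimal C sketch of the same conversion using the textbook shift-and-add-3 (Double Dabble) form. It is not the routine above translated instruction for instruction: C has no access to DAW, so each BCD nibble is adjusted in an explicit inner loop. The function name and the choice to return the packed result in a uint32_t are assumptions for illustration, and no XC8 code size or cycle count is claimed for it.

```c
#include <stdint.h>

/* Hypothetical C counterpart: converts the low 24 bits of bin to
 * 8 packed BCD digits (most significant digit in the top nibble). */
uint32_t bin24_to_bcd(uint32_t bin)
{
    uint32_t bcd = 0;

    bin &= 0x00FFFFFFUL;                /* only 24 bits are converted */

    for (uint8_t bit = 0; bit < 24U; bit++) {
        /* Decimal adjust: any digit >= 5 gets +3 before doubling. */
        for (uint8_t shift = 0; shift < 32U; shift += 4U) {
            if (((bcd >> shift) & 0xFU) >= 5U) {
                bcd += 3UL << shift;
            }
        }
        /* Double the BCD value and shift in the next binary bit, MSB first. */
        bcd = (bcd << 1) | ((bin >> 23) & 1U);
        bin <<= 1;
    }
    return bcd;
}
```

For example, bin24_to_bcd(1234567) yields 0x01234567. The inner loop runs 8 times per bit, which is where most of the extra work relative to the DAW version comes from.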
 

MrSalts

Joined Apr 2, 2020
2,767
I think (A) you are missing the point of a higher-level programming language, and (B) you have discovered why people still program in asm. You should stick with ASM if your list of ASM-related benefits outweighs the benefits you get from C. C isn't the right decision for everyone.

This thread reminds me of a contractor coming home from the worksite and saying, "A Rolls Royce is a terrible car. I spent the last two hours trying to get my lumber and portable table saw into the trunk. It fit perfectly well in my F150."
 

ApacheKid

Joined Jan 12, 2015
1,762
Are you now challenging me, sir? I previously asked you to show me the C source code that, you allege, produces code (for the 8-bit PIC you use) that consumes 10 times the CPU time that a hand-written assembler version of the same problem consumes.

So before I spend any time looking at your routine, I'd appreciate an answer to my question. Please note I have not said you are wrong; I have only expressed skepticism. It is what I do.

I'd like to see the C source code you used to evaluate the performance costs that you cited. In order to say code X reduces performance compared to Y by ten times, I really do need to see X if you seek my support.

Now, I'm sure that what you said was probably partially anecdotal, not literal, but you do need some objective, scientific basis for that claim. So can you show us the C code that proves the compiler you are using yields code that is ten times slower than what you can write by hand? Either do that or retract the claim, sir.
 

Thread Starter

joeyd999

Joined Jun 6, 2011
6,333
I'd like to see the C source code you used to evaluate the performance costs that you cited. In order to say code X reduces performance compared to Y by ten times, I really do need to see X if you seek my support.
First, I never sought your support. You jumped into my thread to offer your general solution of "use a bigger part".

Second, I am not ignoring your request: I am busy writing code, figuring out optimizations, compiling, and testing -- trying to identify where code and instruction cycles are being unnecessarily consumed -- all the while posting updates here because (I may be wrong) some may find my efforts interesting. I also noticed some find it annoying. Why they just don't ignore the thread I will never understand.

Third, let's be clear: My current effort is mostly an evaluation of effectiveness of coding C on PIC18F silicon with respect to the type of work I do. Yes, it is a real project with a real commercial potential at the end, so the code I am writing I am writing as if I intend to ship it some day. The only reason I am considering C is that the newer Microchip PIC18 parts are not supported under the old MPLAB, and MPLABX support for MPASM ended like 7 years ago. Other than C, my only other option is PIC_AS, which -- best I can tell so far -- is atrocious.

Fourth, I will show code here to the extent that I can (much is secret) to illustrate the inefficiencies I am discovering, in the hopes that you and others will show more efficient ways to do it. I suggested the BIN2BCD code because -- in that particular case -- efficiencies are gained by having direct access to the DAW instruction which -- AFAICT -- C does not allow. I was hoping someone would show me how to avoid the inner decimal adjust loop that is required by the C implementation of the Double Dabble algorithm.
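One possible way to drop the per-nibble inner loop, not taken from the thread, is to adjust all eight BCD digits at once with word-wide bit tricks: adding 3 to every nibble sets bit 3 exactly in the nibbles that held 5 or more, and that mask can then be turned back into a +3 for just those nibbles. The sketch below assumes the hypothetical name bin24_to_bcd_swar. On an 8-bit PIC each 32-bit operation still expands to several byte operations, so whether this beats a per-byte loop under XC8 is untested here.

```c
#include <stdint.h>

/* Hypothetical loop-free decimal adjust: converts the low 24 bits of bin
 * to 8 packed BCD digits, adjusting all nibbles in parallel. */
uint32_t bin24_to_bcd_swar(uint32_t bin)
{
    uint32_t bcd = 0;

    /* Left-align the 24-bit value so the next bit is always bit 31. */
    bin = (bin & 0x00FFFFFFUL) << 8;

    for (uint8_t bit = 0; bit < 24U; bit++) {
        /* Digits are 0..9, so digit+3 <= 12 never carries into the next
         * nibble, and bit 3 of digit+3 is set exactly when digit >= 5. */
        uint32_t t = (bcd + 0x33333333UL) & 0x88888888UL;

        /* Turn each flagged 8 into a 3 (2 | 1) in the same nibble. */
        bcd += (t >> 2) | (t >> 3);

        /* Double the BCD value and shift in the next binary bit. */
        bcd = (bcd << 1) | (bin >> 31);
        bin <<= 1;
    }
    return bcd;
}
```

As with the nibble-by-nibble form, bin24_to_bcd_swar(1234567) yields 0x01234567.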

Finally, my project -- at this moment -- stands at 12,294 bytes unoptimized, 10,052 bytes at O level 1, and 8,725 bytes at O level 2. This for functionality that I can encode in about 1K bytes in .asm.

I know where most of these extra bytes are coming from -- and I don't have time to explain now (gotta get my kid to the doc) -- but I will explain in my next post.
 

ApacheKid

Joined Jan 12, 2015
1,762
I see. So you have no actual example of C code we can look at that, when compiled using the compiler you are working with, runs ten times slower than a purportedly equivalent algorithm coded in assembler. Very well, that answers that question: you have no evidence to share.
 