I'm not sure we need to go down any roads... I'm pretty sure we are on the same page here... though the fact that V-USB is a software implementation means it can also work on just about any microcontroller. There's a reason no one has ported the assembly of V-USB over to others.

Most of the programs I write don't really need optimization. Also, C code can be optimized. I mean, download some of the zip files from that site and examine them.
Would you like to go down that road?
Professional software is normally not optimized that heavily these days.
Instead, either a more powerful chip is used, or parallel computing.
It is OK to give it a try and see if the C code can be written differently. It is OK to use some small assembler constructs. But the days of programming from instruction cycle timing charts seem to me to be over.
The solutions are a CPLD, a more powerful chip, or parallel computing.
That holds even though it is possible (on Atmel controllers) to do USB in software; I have looked at such source code.