# Composite (complex) Video 100% Straight C on a One-Dollar MCU!

#### MMcLaren

Joined Feb 14, 2010
853
Just to confirm my suspicions, that ASM could do close to double res -- how many asm instructions required to do these
Rich (BB code):
PORTB = cache[0][segmentColumn] & 128;
C instructions in assembly?

Bit mask / compare a bit to a byte and write to port.
You don't necessarily have to resort to assembly language to improve performance. For example, if you use absolute indexes instead of variable indexes then getting a byte from the array takes only one cycle (a simple "movf" instruction will be generated by the compiler)... Another example, if I may? Get rid of the OR gates and connect your output to RB7. Use PORTB as a variable, that is, write the data byte from the array to PORTB and then simply add PORTB to PORTB to sequentially send all eight bits out on RB7. And so instead of spending 7 or more cycles to shift each bit out, as you're doing in your program now, you spend only two cycles per bit;

Rich (BB code):
//
// send cache[line 0][byte 0..4]
//
portb = cache[0][0];          // line 0, byte 0    (2 cycles)
portb += portb;               // send pixel 01     (2 cycles)
portb += portb;               // send pixel 02     (2 cycles)
portb += portb;               // send pixel 03     (2 cycles)
portb += portb;               // send pixel 04     (2 cycles)
portb += portb;               // send pixel 05     (2 cycles)
portb += portb;               // send pixel 06     (2 cycles)
portb += portb;               // send pixel 07     (2 cycles)
portb = cache[0][1];          // line 0, byte 1    (2 cycles)
portb += portb;               // send pixel 09     (2 cycles)
~~~~
As you can see, using assembler is not essential, but knowing assembler can be a big help (you can look at the compiler output and tweak the C instructions to get the desired assembler code). In this case, the compiler should generate the same code that I would write in assembler. So you can triple your bandwidth and get rid of the silly OR gates at the cost of lengthier but faster program code.
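For a rough sense of what those cycle counts mean in time, here is a small hosted-C sketch (ordinary desktop C, not PIC code). It assumes the 20 MHz clock mentioned elsewhere in this thread and the PIC16 rule that one instruction cycle is four oscillator periods. The function names are made up for illustration.

```c
#include <stdint.h>

/* One PIC16 instruction cycle is four oscillator periods (Fosc/4).
 * fosc_mhz is the oscillator frequency in MHz; result is in nanoseconds. */
uint32_t cycle_ns(uint32_t fosc_mhz)
{
    return 4000u / fosc_mhz;
}

/* Time spent on one pixel when each pixel costs cycles_per_pixel cycles. */
uint32_t pixel_ns(uint32_t fosc_mhz, uint32_t cycles_per_pixel)
{
    return cycle_ns(fosc_mhz) * cycles_per_pixel;
}

/* Upper bound on pixels that fit in a time window of window_ns nanoseconds. */
uint32_t pixels_per_window(uint32_t window_ns, uint32_t fosc_mhz,
                           uint32_t cycles_per_pixel)
{
    return window_ns / pixel_ns(fosc_mhz, cycles_per_pixel);
}
```

Under these assumptions, 2 cycles per pixel works out to 400 ns and 7 cycles to 1400 ns, a ratio of 3.5, which is in line with the "triple your bandwidth" estimate. Sync and blanking use part of the 64 µs scan line, so the usable pixel count is lower than `pixels_per_window(64000, 20, 2)` suggests.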

Cheerful regards, Mike


#### T.Jackson

Joined Nov 22, 2011
328
You're right.

Rich (BB code):
PORTB = cache[0][segmentColumn]; PORTB += PORTB;
PORTB = cache[1][segmentColumn]; PORTB += PORTB;
PORTB = cache[2][segmentColumn]; PORTB += PORTB;
PORTB = cache[3][segmentColumn]; PORTB += PORTB;

Rich (BB code):
PORTB = cache[0][segmentColumn] & 128;
PORTB = cache[1][segmentColumn] & 64;
PORTB = cache[2][segmentColumn] & 32;
PORTB = cache[3][segmentColumn] & 16;

Your approach shaves 1 µs off: 250 ns per draw.


#### T.Jackson

Joined Nov 22, 2011
328
What a waste of OR gates then.

Gotta give me credit for being original though.


#### SgtWookie

Joined Jul 17, 2007
22,221
You could also use a logical shift ' portb = portb >> 1 ' effecting a *2 or portb += portb. I don't know offhand if it would take the same number of machine cycles for mikroe C

portb += portb should translate to an ADDWF; 1 word and 1 machine cycle.
portb = portb >> 1 should translate to an RRF; 1 word and 1 machine cycle.

Might be worthwhile to see if there is a difference.

#### T.Jackson

Joined Nov 22, 2011
328
This:
Rich (BB code):
PORTB = cache[0][segmentColumn]; PORTB += PORTB;
PORTB = cache[1][segmentColumn]; PORTB += PORTB;
PORTB = cache[2][segmentColumn]; PORTB += PORTB;
PORTB = cache[3][segmentColumn]; PORTB += PORTB;
Always seems to write 254 (decimal) to PORTB though.

I must be missing something


#### T.Jackson

Joined Nov 22, 2011
328
I have a feeling that the OR gates will be beneficial regardless of what road is taken.

#### T.Jackson

Joined Nov 22, 2011
328
@MMcLaren:

You're thinking in terms of bytes I think is the problem. We are actually dealing with 'bits' here.

#### SgtWookie

Joined Jul 17, 2007
22,221
Rich (BB code):
unsigned short cache[5][16];
...
PORTB = cache[0][segmentColumn]; PORTB += PORTB;
PORTB = cache[1][segmentColumn]; PORTB += PORTB;
PORTB = cache[2][segmentColumn]; PORTB += PORTB;
PORTB = cache[3][segmentColumn]; PORTB += PORTB;
Ahhh - how do you think that these are equivalent?
You're forgetting the adds or shifts in between.
Rich (BB code):
PORTB = cache[0][segmentColumn];  // load a byte into portb
portb += portb;          // send pixel 01  by adding; shifts bits 1 pos    (2 cycles)
portb += portb;          // send pixel 02  by adding; shifts bits 1 pos   (2 cycles)
portb += portb;          // send pixel 03  by adding; shifts bits 1 pos   (2 cycles)
...
portb = portb >> 1;      // send pixel 06  by shifting (RR)  (2 cycles)
portb = portb >> 1;      // send pixel 07  by shifting (RR)  (2 cycles)
PORTB = cache[1][segmentColumn];  // load the next byte
portb += portb;          // send pixel 01  by adding; shifts bits 1 pos    (2 cycles)
The length of an unsigned short is dependent on the machine, but you're probably getting 16 bits.

Unless you really need unsigned short, why not use char? That's always 8 bits.
Or did you really need more than 8 bits?


#### T.Jackson

Joined Nov 22, 2011
328
You could use char. All I know is that what I have done works. That's fact. I don't claim to be a C expert but what I have done simply works, and what you're telling me else to do simply doesn't.

#### T.Jackson

Joined Nov 22, 2011
328
Rich (BB code):
PORTB = cache[0][segmentColumn];  // load a byte into portb
portb += portb;          // send pixel 01  by adding; shifts bits 1 pos    (2 cycles)
portb += portb;          // send pixel 02  by adding; shifts bits 1 pos   (2 cycles)
portb += portb;          // send pixel 03  by adding; shifts bits 1 pos   (2 cycles)
...
portb = portb >> 1;      // send pixel 06  by shifting (RR)  (2 cycles)
portb = portb >> 1;      // send pixel 07  by shifting (RR)  (2 cycles)
PORTB = cache[1][segmentColumn];  // load the next byte
portb += portb;          // send pixel 01  by adding; shifts bits 1 pos    (2 cycles)
So we would need 2 loads + a shift?

#### MMcLaren

Joined Feb 14, 2010
853
@MMcLaren:

You're thinking in terms of bytes I think is the problem. We are actually dealing with 'bits' here.
I am dealing with bits. Adding a value to itself is the same as using a "clrc" (clear carry) and "rlf" (rotate left file) instruction sequence (the same as what you get in C with the << 1 operator). If you're doing it right, the simulator should show a pattern like the one below (remember, output is on the RB7 pin):

Rich (BB code):
       //    cache[0][0] = 0b11010101

portb = cache[0][0];         // output bit 7
portb += portb;              // output bit 6
portb += portb;              // output bit 5
portb += portb;              // output bit 4
portb += portb;              // output bit 3
portb += portb;              // output bit 2
portb += portb;              // output bit 1
portb += portb;              // output bit 0

portb
11010101        output the '1' on RB7
10101010        output the '1' on RB7
01010100        output the '0' on RB7
10101000        output the '1' on RB7
01010000        output the '0' on RB7
10100000        output the '1' on RB7
01000000        output the '0' on RB7
10000000        output the '1' on RB7
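The bit pattern above can be checked without a PIC or a simulator. The following is a minimal hosted-C sketch in which a plain `uint8_t` variable stands in for PORTB and bit 7 stands in for the RB7 pin. The function name `shift_out_msb_first` is hypothetical, and the real code would be unrolled rather than looped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the add-to-self shift-out. `port` stands in for PORTB.
 * pins[0] receives bit 7 right after the load; pins[1..7] receive bit 7
 * after each of the seven `port += port` steps. */
void shift_out_msb_first(uint8_t data, uint8_t pins[8])
{
    assert(pins != NULL);

    uint8_t port = data;                  /* load: bit 7 appears on "RB7" */
    pins[0] = (uint8_t)((port >> 7) & 1u);

    for (int i = 1; i < 8; i++) {
        port = (uint8_t)(port + port);    /* doubling moves every bit up by one */
        pins[i] = (uint8_t)((port >> 7) & 1u);
    }
}
```

Calling it with `0xD5` (binary 11010101) fills `pins` with 1, 1, 0, 1, 0, 1, 0, 1, which is the same sequence as the RB7 column in the listing above.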


#### SgtWookie

Joined Jul 17, 2007
22,221
First you need to load a byte into portb.
Then you need to crank the 8 bits out, one bit at a time.

Just by loading the byte into portb by "portb = cache[0][0];" you've sent the first bit, right?
Then do
portb += portb; // 7 more times, or
portb = portb >> 1; // 7 more times.

#### T.Jackson

Joined Nov 22, 2011
328
I am dealing with bits. Simulator should show pattern like that below;
I don't use simulators. I use a compiler IDE at best. I did all of my University Java assignments in notepad.

#### SgtWookie

Joined Jul 17, 2007
22,221
They should actually require the same number of instructions and execute in the same amount of time (adding vs shifting). However, it would be interesting to see if there IS a difference.

#### THE_RB

Joined Feb 11, 2008
5,438
I know 2% of composite video and nothing about VGA.
25uS scan line is it?
Generally.
VGA is easier to interface in the way it has separate hsync and vsync pins. And of course colour is a breeze, with three pins for RGB.

...
I would actually like to see this project done in ASM, using THIS one-dollar part. If done in ASM, I feel that the resolution could be doubled, and it would produce a technically more accurate video signal. ...
That PIC 16F628 is REALLY not a good choice of micro, it is limited to 5MIPs and you can get a 18F series PIC that will do 16MIPs for not much more in price. And even 16MIPs will struggle to do anything that is useful...

MrChips said:
What a great idea! I never thought of it. I am working on an ARM Cortex-M4 development with huge amount of processing capabilities. It can generate the video stream and still do the other processing at the same time.

RB, I will contact you to get the VGA details. The possibilities are endless.
From memory I just downloaded some VGA info (timing diagrams etc) that were easy to find on the net. The sync is easy to generate then you just make the RGB pins go hi/lo as needed to make the pixels light up as the scan line goes across the screen.

A high power micro would be a good choice, there is a gap in the market for a product that accepts serial data, and sends VGA out to any old monitor. Ideally it would be a single chip, just connect power and xtal. And be about $15 for the chip, if it was DIP you could sell a lot of chips. So instead of hobby people using cheap LCDs with their micro projects, they would buy the VGA chip and send text or pixel data as serial, and could use an old monitor as a nice big colour display. Probably specs similar to early VGA (like a DOS screen), i.e. text 80 char x 25 lines and pixels 640x480. 16 colour would be fine. From memory there is a commercial module for about $50 but that's just too much. But a $15 chip you could add to any project would be a killer product.

#### T.Jackson

Joined Nov 22, 2011
328
They should actually require the same number of instructions and execute in the same amount of time (adding vs shifting). However, it would be interesting to see if there IS a difference.
Rich (BB code):
PORTB = cache[0][segmentColumn];
PORTB = PORTB >> 1;
PORTB = PORTB >> 1;
PORTB = PORTB >> 1;
PORTB = PORTB >> 1;
5.8 µs @ 20 MHz

I agree that shift out is workable.

#### John P

Joined Oct 14, 2008
1,910
With my compiler,

portb += portb;

would be a single instruction, whereas

portb = portb >> 1;

(shifting either way) would be two instructions. That's because on a PIC processor, a shift is a shift "through the carry bit", which comes in as a new top or bottom bit. The compiler always inserts an instruction to clear the carry bit before doing the shift, and I don't think you can tell it not to. So for this purpose, the addition is better.
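To illustrate why the compiler clears the carry first, here is a small hosted-C model of the two operations. The `pic_state` struct and both function names are hypothetical; they only mimic a file register and the STATUS carry flag, and say nothing about what any particular compiler emits.

```c
#include <stdint.h>

/* Hypothetical model: one 8-bit file register plus the carry flag (0 or 1). */
typedef struct {
    uint8_t reg;
    uint8_t carry;
} pic_state;

/* Rotate left through carry: the old carry enters bit 0,
 * and the old bit 7 becomes the new carry. */
void model_rlf(pic_state *s)
{
    uint8_t new_carry = (uint8_t)((s->reg >> 7) & 1u);
    s->reg = (uint8_t)((s->reg << 1) | s->carry);
    s->carry = new_carry;
}

/* Add the register to itself: a plain doubling. Any carry left over
 * from earlier instructions is ignored on the way in. */
void model_add_self(pic_state *s)
{
    unsigned sum = (unsigned)s->reg + (unsigned)s->reg;
    s->carry = (uint8_t)(sum > 0xFFu);
    s->reg = (uint8_t)sum;
}
```

Starting from `reg = 0x55` with a stale carry of 1, `model_rlf` gives `0xAB` (the stale carry lands in bit 0) while `model_add_self` gives `0xAA`. With the carry cleared beforehand, both give `0xAA`, which is why the shift costs one extra instruction.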

Of course you could always put in a single line of assembly code, but the objective here is to do it all in C.

#### MMcLaren

Joined Feb 14, 2010
853
You could also use a logical shift ' portb = portb >> 1 ' effecting a *2 or portb += portb. I don't know offhand if it would take the same number of machine cycles for mikroe C

portb += portb should translate to an ADDWF; 1 word and 1 machine cycle.
portb = portb >> 1 should translate to an RRF; 1 word and 1 machine cycle.

Might be worthwhile to see if there is a difference.
Hi Sarge,

The portb = portb >> 1 instruction should actually be portb = portb << 1 or portb <<= 1 to multiply by 2. All the compilers I've seen will generate a clrc or bcf STATUS,C (clear carry) instruction followed by an rlf (rotate left file) instruction (2 cycles).
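The left-versus-right point can be checked exhaustively in ordinary hosted C. This is only a sketch with made-up function names, using `uint8_t` to stand in for an 8-bit port register.

```c
#include <stdint.h>

/* Counts the byte values for which (x << 1) and (x + x) differ once
 * truncated to 8 bits. A result of 0 means the two are interchangeable. */
int left_shift_vs_add_mismatches(void)
{
    int mismatches = 0;
    for (unsigned x = 0; x <= 0xFFu; x++) {
        uint8_t shifted = (uint8_t)(x << 1);
        uint8_t doubled = (uint8_t)(x + x);
        if (shifted != doubled) {
            mismatches++;
        }
    }
    return mismatches;
}

/* Bit 7 of an unsigned byte after one right shift. A logical right shift
 * feeds a 0 into bit 7, so nothing useful would ever reach an RB7 output. */
uint8_t bit7_after_right_shift(uint8_t x)
{
    uint8_t shifted = (uint8_t)(x >> 1);
    return (uint8_t)((shifted >> 7) & 1u);
}
```

`left_shift_vs_add_mismatches()` returns 0, and `bit7_after_right_shift()` returns 0 for every input, which is why the right shift cannot be used when the pixel output is on RB7.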

#### T.Jackson

Joined Nov 22, 2011
328
Generally.
VGA is easier to interface in the way it has separate hsync and vsync pins. And of course colour is a breeze, with three pins for RGB.
Yeah I bet. You're joking right?

Here we are dealing with a 64uS scan line, and you're suggesting a 25uS one with good res and color?

#### MrChips

Joined Oct 2, 2009
24,409
RB, shucks, you've let the cat out of the bag!
Ok, I'm taking orders. Just send me your specifications, i.e. how do you want to interface with this chip. I think I can do anything with this baby. Maybe we have to write our own graphics protocol.