It is not clear to me what you want. Does the picture you included in post #1 represent all of your requirements? What about the ability to do a parallel load?
Do you want a logic level design or something more detailed?
If I were going to do it, I would use JK flip-flops.
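For reference, here is a minimal behavioral sketch of a single JK flip-flop in Python. The function name `jk_next` is made up for illustration. It models only the next-state logic on the active clock edge, using the standard characteristic equation Q+ = J·Q' + K'·Q. It is not a design for the circuit in post #1.

```python
def jk_next(q: int, j: int, k: int) -> int:
    """Return the next state of a JK flip-flop on the active clock edge.

    Characteristic equation: Q+ = J*Q' + K'*Q (all values are 0 or 1).
    """
    return (j & (1 - q)) | ((1 - k) & q)


if __name__ == "__main__":
    # Truth table: J=0,K=0 hold; J=0,K=1 reset; J=1,K=0 set; J=1,K=1 toggle
    for j in (0, 1):
        for k in (0, 1):
            print(f"J={j} K={k}:", [jk_next(q, j, k) for q in (0, 1)])
```

The J=K=1 toggle case is part of why JK flip-flops are a convenient building block for counters.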