A 4 x 4 multiply will have 256 possible results with 8-bit result.
All you need is a look-up table of 256 bytes, that is, 8-bit address and 8-bit word size.
There are many 256-byte PROMS available and execution speed would be very fast (10ns or faster).
For fun, attached is an 8x8 hardware multiplier written in Abel for a PLD. I wrote this back in the mid '90s. Don't ask me how it works. I don't remember. IIRC, it computes partial products and sums them together.