This isn't quite true. Cores sharing a die can have much higher bandwidth communicating with each other than cores connected package-to-package. For parallelized processes this can mean considerably more throughput.

My point is really that there is no need for greater single-package processing power except to reduce package size, which is purely a marketing goal.
Making the package smaller raises the power density and increases the internal temperature, reducing the reliability. It also makes any system more difficult to service, thus increasing the amount of electronic waste materials.
So really, the results are simply not worth the effort.
A much greater gain in performance would come from more efficient code. But creating more efficient code will require levels of both skill and talent presently not available.
Reduction in package size also has practical implications for portable, implantable, wearable, and otherwise embedded devices. When you are trying to make things like smart watches, smart glasses, medical implants, and the like, package size is highly relevant. Tiny packages with low-voltage, high-efficiency processors are an enabling technology.