On second thought, it seems inevitable that 3.52 W has to be wasted in either Q1 or R1 if a constant 40 mA must be delivered to the load while there is a differential voltage of 88 V between HV and LV. There is no way I can "escape" this: the 40 mA has to pass through that 88 V drop no matter what, which amounts to 88 V × 0.04 A = 3.52 W of wastage.
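As a quick sanity check of that arithmetic, here is a minimal sketch. It only assumes that the whole 88 V between HV and LV is dropped across whichever series element (Q1 or R1) carries the load current; the function name is hypothetical.

```python
def pass_dissipation(v_diff_volts: float, load_current_amps: float) -> float:
    """Power dropped across the series element (Q1 or R1): P = V_diff * I."""
    return v_diff_volts * load_current_amps

# Figures from the post: 88 V between HV and LV, 40 mA to the load.
p = pass_dissipation(88.0, 0.040)
print(f"{p:.2f} W")  # 3.52 W
```

Whatever topology is used, the product of the drop and the current is fixed, so only the location of the heat (transistor or resistor) can change.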
The so-called better efficiency of the series method only shows up when the load changes. With a shunt regulator, when the load is off (disconnected), the full 40 mA has to be dumped into the Zener diode as useless energy. With the series method, when the load is switched off (or gradually lightened), the power dissipated by the NPN drops as well, since the current through C-E is nothing but the load current, assuming negligible base current.
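To make the light-load comparison concrete, the sketch below tabulates wasted power for both approaches at a few load currents. Assumptions: the shunt resistor R1 is sized to carry the full 40 mA at all times, the Zener sits at the output voltage LV, and LV = 12 V is a purely hypothetical value (the post does not state it). Base drive and Zener bias current in the series circuit are neglected, as in the post.

```python
V_DIFF = 88.0   # HV - LV, from the post
I_FULL = 0.040  # design load current, from the post
LV = 12.0       # ASSUMED output/Zener voltage, hypothetical, for illustration only

def shunt_waste(i_load: float) -> float:
    # R1 always carries I_FULL, so it always drops V_DIFF * I_FULL.
    p_resistor = V_DIFF * I_FULL
    # Whatever current the load does not take is dumped into the Zener at LV.
    p_zener = LV * (I_FULL - i_load)
    return p_resistor + p_zener

def series_waste(i_load: float) -> float:
    # The NPN's collector current is (approximately) the load current;
    # base drive and Zener bias current are neglected.
    return V_DIFF * i_load

for i_ma in (40, 20, 0):
    i = i_ma / 1000
    print(f"{i_ma:>2} mA load: shunt {shunt_waste(i):.2f} W, series {series_waste(i):.2f} W")
```

At full load both waste the same 3.52 W; as the load lightens, the series loss falls towards zero while the shunt loss stays at 3.52 W or more, because the Zener picks up the current the load no longer draws.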