True, good points... Actually, that "magic" part is reasonably well explained by optical physics.
Light waves radiate in all directions from a single point source. When they encounter a medium with a different refractive index, such as a lens, they refract, changing direction in such a way that they converge to a focal point on the retina of the eye or the sensor of the camera.
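To put a number on that convergence, here is a rough sketch using the standard thin-lens equation, 1/f = 1/s_o + 1/s_i. It assumes an idealized thin lens; the function name and the 50 mm focal length are just illustrative choices of mine, not anything from this thread.

```python
def image_distance(focal_length, object_distance):
    """Solve the thin-lens equation 1/f = 1/s_o + 1/s_i for s_i.

    Assumes an ideal thin lens; both distances are in the same unit.
    """
    if object_distance == focal_length:
        # Rays from an object at the focal point leave the lens parallel.
        raise ValueError("object at the focal point: no finite image distance")
    return 1.0 / (1.0 / focal_length - 1.0 / object_distance)


f = 0.05  # hypothetical 50 mm lens, in metres
for s_o in (0.5, 2.0, 100.0):
    # Each object distance gives a different image distance, which is
    # why a lens has to be focused for a given subject distance.
    print(f"object at {s_o} m -> sharp image at {image_distance(f, s_o):.5f} m")
```

The point is that, with a lens, only one sensor position gives a sharp image for a given object distance.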
In a pin-hole camera, where no lens is required, light travels along straight lines. An image appears on the image plane wherever that plane is placed; there is no unique focal plane or focal distance.
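A minimal sketch of that straight-line geometry, assuming an ideal pin-hole (a point-sized aperture at the origin, diffraction ignored). The function and variable names are made up for illustration.

```python
def pinhole_project(point, plane_distance):
    """Project a 3D point through an ideal pin-hole at the origin.

    point: (x, y, z) with z > 0 measured in front of the pin-hole.
    plane_distance: distance of the image plane behind the pin-hole.
    Returns the (x, y) position on the image plane; the sign flip
    reflects the inverted image.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the pin-hole (z > 0)")
    scale = -plane_distance / z
    return (scale * x, scale * y)


scene_point = (1.0, 2.0, 4.0)
for d in (1.0, 2.0, 4.0):
    # Every plane distance yields a valid image point; only the scale changes.
    print(d, pinhole_project(scene_point, d))
```

Moving the image plane only rescales the image; no particular distance is singled out as the one that is "in focus".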
Hence, for the purpose of your inquiry, you can eliminate the lens and consider the case of a pin-hole camera, which has a tiny aperture.
But in the case of multiple CCDs in close proximity, there are enough photons to activate all n sensors, each with essentially the same image, which is baffling...