I can only speak for my phone/GPU (Nexus S/PowerVR SGX 540), and on that device, using shaders for everything IS a lot slower than using the fixed function pipeline. On the other hand, you have to keep in mind that An3DBenchXL really stresses polygon count in some tests. When running the less demanding original An3DBench, the difference is much smaller. I'm still looking for ways to improve this, but I don't expect much from what I've read so far.
Concerning shadows: As said, this requires rendering the depth map into a texture, which isn't supported ATM. I think I'm going to see what the adoption rate of 2.0 really is once I release it and decide based on that. The problem with ES 2.0 ATM is (IMHO) that the API is ahead of the hardware. Even multi-texturing is MUCH slower when using shaders than it is when using the fixed function pipeline. At least on the SGX 540...I'm still looking for information on how many shader pipelines this sucker really has...
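On the shadows point: to show why the depth map has to end up in a texture, here is a minimal CPU-side sketch of the shadow-map depth test in plain Java. It is not engine or OpenGL ES code. The class ShadowMapSketch, its methods, and the bias value are all hypothetical names picked for illustration, and the float array only stands in for the depth texture that the second pass would sample.

```java
import java.util.Arrays;

// Illustrative only: a CPU stand-in for the two passes of shadow mapping.
// All names here are hypothetical and not part of any engine API.
public final class ShadowMapSketch {
    // Stand-in for the depth texture: nearest depth seen from the light,
    // one value per texel, in the range [0, 1].
    private final float[] depth;
    private final int size;

    public ShadowMapSketch(int size) {
        this.size = size;
        this.depth = new float[size * size];
        Arrays.fill(depth, 1.0f); // 1.0 = far plane, nothing rendered yet
    }

    // Pass 1: "render" an occluder from the light's point of view,
    // keeping only the nearest depth per texel.
    public void writeDepth(int x, int y, float d) {
        int i = y * size + x;
        if (d < depth[i]) {
            depth[i] = d;
        }
    }

    // Pass 2: a fragment is in shadow if something nearer to the light
    // covers the same texel. The bias avoids self-shadowing artifacts.
    public boolean inShadow(int x, int y, float fragmentDepth, float bias) {
        return fragmentDepth - bias > depth[y * size + x];
    }

    public static void main(String[] args) {
        ShadowMapSketch map = new ShadowMapSketch(4);
        map.writeDepth(1, 1, 0.3f); // occluder close to the light

        System.out.println(map.inShadow(1, 1, 0.8f, 0.005f)); // behind the occluder
        System.out.println(map.inShadow(2, 2, 0.8f, 0.005f)); // nothing in the way
        System.out.println(map.inShadow(1, 1, 0.3f, 0.005f)); // the occluder itself
    }
}
```

On the GPU, the second pass has to read the result of the first one per fragment, which is why the depth values need to live in a texture rather than only in the regular depth buffer.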