It restructures the Object3D for optimized rendering and, depending on the compile-mode, puts the result into vertex arrays or display lists.
Humm... I thought that was done already...
Then it's obvious that caching the geometry on the GPU (or using DLs, which also store more data and optimize it) is faster than before, as the API was sending the commands in real time.

I've coded in OpenGL before and I can tell (from personal experience, using nVidia drivers) that DLs are faster than compiled vertex arrays and vertex buffer objects (VBOs), but that depends on the driver. As you know, a DL is created once and can't be changed afterwards, so the driver can optimize its data and commands as it sees fit. In the extreme, it can even issue the GL commands in a totally different order.
As long as you are not artificially limiting the frame rate with a sleep, or at least Vsync, any engine will consume 100% of at least one CPU because it will pump out data as fast as possible. It's a myth that the CPU sits there doing nothing while the GPU does all the work.
I forgot that detail... OOPS
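
For anyone who wants to try the "limit the frame rate with a sleep" idea mentioned above, here is a rough sketch in plain Java. FrameLimiter and its methods are made-up names for illustration, not part of any engine API, and the 60 fps target is just an assumption. The render call is left as a comment because it depends on your setup.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper, not part of any engine: caps a render loop at a
// target frame rate by sleeping away whatever is left of each frame's
// time budget, so the loop stops burning 100% of a CPU.
final class FrameLimiter {
    private final long frameNanos;  // time budget per frame
    private long nextFrameAt;       // deadline for the current frame

    FrameLimiter(int targetFps) {
        this.frameNanos = 1_000_000_000L / targetFps;
        this.nextFrameAt = System.nanoTime() + frameNanos;
    }

    // Time left until the deadline, or 0 if the deadline has passed.
    static long remainingNanos(long now, long deadline) {
        return Math.max(0L, deadline - now);
    }

    // Call once per frame, after rendering.
    void sync() {
        long wait = remainingNanos(System.nanoTime(), nextFrameAt);
        if (wait > 0) {
            // parkNanos may return early, so the cap is approximate.
            LockSupport.parkNanos(wait);
        }
        long now = System.nanoTime();
        nextFrameAt += frameNanos;
        if (nextFrameAt - now < 0) {
            // More than a full frame late: restart the schedule from now
            // instead of rushing through frames to catch up.
            nextFrameAt = now + frameNanos;
        }
    }

    public static void main(String[] args) {
        FrameLimiter limiter = new FrameLimiter(60); // assumed target
        long start = System.nanoTime();
        for (int frame = 0; frame < 30; frame++) {
            // Your engine's draw call would go here.
            limiter.sync();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println("30 frames took about " + elapsedMs + " ms");
    }
}
```

Without the sync() call the loop would spin as fast as it can, which is exactly the 100% CPU case described above. Vsync gets a similar effect by blocking on the buffer swap instead of sleeping.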
