In reply to raft's question about octrees, here are some performance statistics taken on my main machine (Core2 Quad @ 3 GHz, Radeon HD 4870, Vista, Java 6) while displaying a typical first-person view of a Quake3 level. With other geometry, another view or another machine, the results can be completely different. Anyway:
Default pipeline:
no octree: 305fps
coarse octree: 300fps
finer octree: 340fps
Compiled pipeline:
no octree: 950fps
coarse octree: 450fps
finer octree: 305fps
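To make the numbers easier to compare, here is a small Java sketch that turns them into frame times and into a percentage change against the "no octree" baseline of each pipeline. The fps values are the ones listed above; the class and method names are made up for this example.

```java
import java.util.Locale;

// Hypothetical helper, only meant to put the fps figures above into perspective.
class OctreeFpsStats {

    /** Relative change of fps versus a baseline, in percent (negative means slower). */
    static double percentChange(double baselineFps, double fps) {
        return (fps - baselineFps) / baselineFps * 100.0;
    }

    /** Average time spent on one frame, in milliseconds. */
    static double frameTimeMs(double fps) {
        return 1000.0 / fps;
    }

    public static void main(String[] args) {
        String[] pipelines = {"Default", "Compiled"};
        String[] setups = {"no octree", "coarse octree", "finer octree"};
        // Measured fps from the post: rows are pipelines, columns are setups.
        double[][] fps = {{305, 300, 340}, {950, 450, 305}};

        for (int p = 0; p < pipelines.length; p++) {
            System.out.println(pipelines[p] + " pipeline:");
            double baseline = fps[p][0]; // "no octree" is the reference
            for (int s = 0; s < setups.length; s++) {
                System.out.printf(Locale.ROOT, "  %-14s %4.0f fps  %.2f ms/frame  %+.1f%%%n",
                        setups[s], fps[p][s],
                        frameTimeMs(fps[p][s]),
                        percentChange(baseline, fps[p][s]));
            }
        }
    }
}
```

Run as is, it reports the finer octree as roughly 11% faster than no octree on the default pipeline and roughly 68% slower on the compiled one.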
You can see that the default pipeline benefits from the finer octree, because it takes geometry load off the pipeline very early. Using the octree on the default pipeline adds overhead for processing the tree, but doesn't require more state changes in the pipeline than rendering without it. The coarse tree doesn't help, because it can't discard enough geometry to make up for the additional processing.
The compiled pipeline, on the other hand, behaves in exactly the opposite way. The hardware has no problem processing the complete geometry of a Quake3 level each frame, but using the octree causes more render calls and more state changes (because of smaller render batches), which slows it down.
Keep in mind that these results only hold for a fairly low-poly object rendered on hardware with high polygon throughput. When rendering a landscape or something similar on an Intel onboard chipset, it may be a different story.
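The trade-off between discarded geometry, traversal overhead and batch count can be illustrated with a toy model in Java. This is only a sketch under loud assumptions and not the engine's actual octree: the world is a unit cube, polygons are spread evenly over the leaves, the view volume is an axis-aligned box instead of a real frustum, and every visible leaf counts as one render call. All names are hypothetical.

```java
// Toy model: counts traversal work, render batches and submitted polygons.
class OctreeCullingSketch {

    /** Counters collected during one traversal. */
    static final class Stats {
        int nodesVisited; // CPU-side work spent walking the tree
        int batches;      // assumed: one render call per visible leaf
        long polygons;    // polygons actually handed to the pipeline
    }

    /** Visits the cube with corner (x, y, z) and edge length size, recursing down to depth 0. */
    static void collect(double x, double y, double z, double size, int depth,
                        long polysInNode, double[] viewMin, double[] viewMax, Stats stats) {
        stats.nodesVisited++;
        // Discard this node and its whole subtree if it does not overlap the view box.
        if (x >= viewMax[0] || x + size <= viewMin[0]
                || y >= viewMax[1] || y + size <= viewMin[1]
                || z >= viewMax[2] || z + size <= viewMin[2]) {
            return;
        }
        if (depth == 0) { // leaf: render its geometry as one batch
            stats.batches++;
            stats.polygons += polysInNode;
            return;
        }
        double half = size / 2;
        for (int i = 0; i < 8; i++) { // the bits of i select the octant
            collect(x + ((i & 1) != 0 ? half : 0),
                    y + ((i & 2) != 0 ? half : 0),
                    z + ((i & 4) != 0 ? half : 0),
                    half, depth - 1, polysInNode / 8, viewMin, viewMax, stats);
        }
    }

    /** Runs one traversal; depth 0 means "no octree" (everything in a single batch). */
    static Stats run(int depth, long totalPolys, double[] viewMin, double[] viewMax) {
        Stats stats = new Stats();
        collect(0, 0, 0, 1.0, depth, totalPolys, viewMin, viewMax, stats);
        return stats;
    }
}
```

With 32768 polygons and a view box covering the whole cube, `run(2, ...)` submits the same 32768 polygons as `run(0, ...)`, but in 64 batches instead of one and after visiting 73 nodes. With a view box limited to the corner from (0, 0, 0) to (0.25, 0.25, 0.25), a depth-1 tree still submits 4096 polygons while a depth-2 tree gets it down to 512. Which effect wins depends on the pipeline and the hardware, as the measurements show.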