I'm still doing the calculations in floating point, but I now convert the results to fixed point before sending them to the GPU. This makes no difference when values are sent one by one, but it improves performance considerably when they are sent in batches, i.e. instead of transferring float[]s into native memory, I'm now using int[]s. I really don't understand why this is the case. I would have expected the floats to be copied unchanged from VM memory to native memory, but that doesn't seem to be what happens.
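To make the batch conversion concrete, here is a minimal sketch of the idea, not the actual engine code. It assumes a 16.16 fixed-point format (the layout OpenGL ES 1.x uses for GL_FIXED); the class and method names (FixedPointBatch, toFixed, toFixedArray, toNativeBuffer) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class FixedPointBatch {
    // Assumption: 16.16 fixed point, so 1.0f maps to 65536.
    private static final float ONE = 65536f;

    // Convert a single float to 16.16 fixed point (truncates toward zero).
    static int toFixed(float value) {
        return (int) (value * ONE);
    }

    // Convert a whole array; the math itself still happens in floating point.
    static int[] toFixedArray(float[] values) {
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = toFixed(values[i]);
        }
        return out;
    }

    // Copy the converted values into a direct (native) buffer in one bulk put.
    static IntBuffer toNativeBuffer(int[] fixed) {
        IntBuffer buffer = ByteBuffer.allocateDirect(fixed.length * 4)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        buffer.put(fixed);
        buffer.position(0); // rewind so a consumer reads from the start
        return buffer;
    }

    public static void main(String[] args) {
        float[] vertices = {0f, 1f, -0.5f, 2.25f};
        IntBuffer buffer = toNativeBuffer(toFixedArray(vertices));
        for (int i = 0; i < buffer.capacity(); i++) {
            System.out.println(vertices[i] + " -> " + buffer.get(i));
        }
    }
}
```

On a real device, the resulting IntBuffer would be handed to the GL pointer calls with a fixed-point type instead of a float type. That part is left out here so the sketch runs on a plain JVM.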
One ninja now animates at 18 fps, two at 10 fps, and three at 6 fps on my phone.
Edit: The increase doesn't come from rendering the scene itself, because performance with static geometry doesn't improve much (not even 10%), so it has to be the data transfer. My MD2 test case now runs at 54 fps instead of 26, and my blitting test at 44 instead of 12. Considering that I started with 4 fps for that MD2 test in the initial release, this is not too bad...