getWorldTransformation() gives you the combined transformation matrix that transforms the object into world space, i.e. the space in which you move the camera.

The camera's transformation is actually the transformation of the world into camera space (the space in which the camera is always located at (0,0,0) and looks down the z-axis).

When you apply the inverse world transformation to the camera, you are actually reverting the object's world transformation. It's a bit twisted and difficult to explain, so here is an example:

Imagine the object's center at (0,0,0). You now transform it (for example, rotate it around some pivot point) so that it ends up at (2,1,0). Assume that this already is its final transformation into world space. For the camera, you now need a transformation that brings this point in world space to (0,0,0), which is the camera's location in camera space. But that's exactly what we already did, only the other way round. So we just have to take the inverse of what mapped (0,0,0)->(2,1,0), and we get the desired (2,1,0)->(0,0,0).
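The example above can be checked numerically. The following plain-Java sketch is a hypothetical helper, not part of the engine's API. It assumes column vectors and models a world transformation as a rotation R plus a translation t (world = R*p + t). The rotation (90 degrees around the y-axis) and the translation (2,1,0) are arbitrary choices, so (0,0,0) ends up at (2,1,0); applying the inverse, R^T*(w - t), brings it back to (0,0,0).

```java
// Hypothetical sketch, not the engine's API: a rigid transform is a 3x3
// rotation R plus a translation t, applied to column vectors as R*p + t.
public class RigidTransformDemo {

    // Rotation by 90 degrees around the y-axis (arbitrary example choice).
    static final double[][] ROT_Y_90 = {
        { 0, 0, 1},
        { 0, 1, 0},
        {-1, 0, 0}
    };
    // Translation chosen so that the origin lands at (2,1,0).
    static final double[] TRANSLATION = {2, 1, 0};

    // Forward transform into world space: w = R*p + t.
    static double[] transform(double[][] r, double[] t, double[] p) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = r[i][0] * p[0] + r[i][1] * p[1] + r[i][2] * p[2] + t[i];
        }
        return out;
    }

    // Inverse transform: p = R^T * (w - t).
    // For a pure rotation, the inverse of R is simply its transpose.
    static double[] inverseTransform(double[][] r, double[] t, double[] w) {
        double[] d = {w[0] - t[0], w[1] - t[1], w[2] - t[2]};
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            // Row i of R^T is column i of R.
            out[i] = r[0][i] * d[0] + r[1][i] * d[1] + r[2][i] * d[2];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] origin = {0, 0, 0};
        double[] world = transform(ROT_Y_90, TRANSLATION, origin);
        double[] back = inverseTransform(ROT_Y_90, TRANSLATION, world);
        System.out.printf("origin -> (%.1f, %.1f, %.1f)%n", world[0], world[1], world[2]);
        System.out.printf("back   -> (%.1f, %.1f, %.1f)%n", back[0], back[1], back[2]);
    }
}
```

The same round trip works for any point, not just the object's center, because the inverse undoes the rotation and the translation in reverse order.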

Because the camera has no combined transformation matrix, you have to use only the rotational part of the resulting matrix and do the rest with setPosition(). You could also derive the value for setPosition() from the inverted matrix (by using invert() instead of invert3x3()) if you want to, but it isn't needed and is most likely slower, because inverting a full 4x4 matrix is more work than inverting its 3x3 part.
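To illustrate the two routes, here is another hypothetical plain-Java sketch. It assumes column vectors and a rigid world transform [R | t]; it does not reproduce the engine's own matrix layout or the exact behaviour of invert3x3(), invert() or setPosition(). Route one transposes the rotation (the inverse of a pure rotation) and takes the camera position directly from t. Route two uses the translation of the full inverse, -R^T*t, and has to multiply it by -R again to recover the same position.

```java
// Hypothetical sketch: two ways to get a camera pose from an object's rigid
// world transform (rotation R, translation t), using column vectors.
public class CameraFromWorldTransform {

    static double[][] transpose(double[][] m) {
        double[][] out = new double[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                out[i][j] = m[j][i];
        return out;
    }

    static double[] mul(double[][] m, double[] v) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++)
            out[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
        return out;
    }

    // Negate a vector; subtracting from 0.0 avoids producing -0.0.
    static double[] neg(double[] v) {
        return new double[] {0.0 - v[0], 0.0 - v[1], 0.0 - v[2]};
    }

    // Route 1 (rotation-only inverse): the camera rotation is R^T and the
    // camera position is read straight from t.
    static double[] positionFromTranslation(double[] t) {
        return t.clone();
    }

    // Route 2 (full inverse): the inverse of [R | t] is [R^T | -R^T * t].
    static double[] inverseTranslation(double[][] r, double[] t) {
        return neg(mul(transpose(r), t));
    }

    // Recover the position from the full inverse's translation: t = -R * tInv.
    static double[] positionFromFullInverse(double[][] r, double[] tInv) {
        return neg(mul(r, tInv));
    }

    public static void main(String[] args) {
        double[][] r = {{0, 0, 1}, {0, 1, 0}, {-1, 0, 0}}; // 90 degrees around y
        double[] t = {2, 1, 0};

        double[][] camRotation = transpose(r); // rotation part for the camera
        double[] p1 = positionFromTranslation(t);
        double[] p2 = positionFromFullInverse(r, inverseTranslation(r, t));

        System.out.printf("camera rotation row 0: (%.1f, %.1f, %.1f)%n",
                camRotation[0][0], camRotation[0][1], camRotation[0][2]);
        System.out.printf("route 1 position: (%.1f, %.1f, %.1f)%n", p1[0], p1[1], p1[2]);
        System.out.printf("route 2 position: (%.1f, %.1f, %.1f)%n", p2[0], p2[1], p2[2]);
    }
}
```

Both routes end up with the same position; the second one simply does extra arithmetic to get back to a value that was already available in t.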