Another idea: Render the window object with walls around the window, and write a shader that sets the wall fragments to transparent but still writes their depth values into the depth buffer. Then clear the color buffer and render the scene.
One question on this: wouldn't that also prevent the background image (i.e. the camera stream) from drawing? Since the walls have to be closer than the video stream, writing them into the depth buffer would prevent the image from being seen as well, wouldn't it?
Also, I can understand how to make a shader that paints the walls transparent, but how do I still draw them into the depth buffer?
Wouldn't all this require several rendering passes?