Speeding up rendering an Object3D multiple times: "Instance-Batch-Rendering"

Redman · July 07, 2016, 05:08:42 PM

I'm currently working on an RTS project for Android using jPCT-AE. There could possibly be up to 300+ objects on screen at one time, and I found I needed to lighten the CPU processing and memory load. I am working on something I'm dubbing "Instance-Batch-Rendering" and I wanted to get thoughts and suggestions.

This method is for speeding up the process of rendering an Object3D multiple times on screen.

How it works:

Instead of cloning an Object3D, reusing the Mesh data, and sharing compiled data, this method creates and adds only one Object3D to the world for each unique mesh. The Object3D must always be visible to the camera for this method to work, but it doesn't matter where it's positioned. It just matters that jPCT adds it to the visList for rendering.
I have created a lighter-weight Instance3D object, whose main job is maintaining the world transform matrix (rotations, translations, scales) as well as an integer ID of the model instance type it represents. It's basically an Object3D, but it has no mesh data and takes up very little memory. On each UI render frame, I manually create my own visList of Instance3D's based on the camera. Because my project is an RTS with a top-down camera, it's extremely easy to calculate which objects are roughly on screen just from their x-y(z) coordinates, as the map is a grid. The visList for each Instance type (one per Mesh) is recalculated and stored on every frame. If there are no visible Instances for an Object3D, its visibility is set to false so it doesn't get picked up by the jPCT render pipeline.
This Object3D has an IRenderHook attached to it. On render, IRenderHook.beforeRendering gets called, which fetches the first item in the visList. Uniforms are passed into a Shader for the camera back matrix and the camera position, along with a uniform for the instance's world transform matrix. Then IRenderHook.repeatRendering gets called; it increments the render index and fetches the next instance if there is one. In that case it sets the transform matrix uniform for that instance and returns true. If there are no more instances, it returns false.
A custom shader is added to the Object3D which uses the passed in Uniforms to calculate the instance's ModelViewMatrix and ModelViewProjectionMatrix on the GPU-side (offloading matrix calculations to the GPU).
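
The steps above can be sketched in plain Java. This is only an illustration of the control flow, not the IBR source: Instance, InstanceHook and IbrLoopDemo are hypothetical names, the hook class merely imitates the beforeRendering/repeatRendering contract instead of implementing jPCT's IRenderHook, the camera footprint is assumed to be an axis-aligned rectangle on the grid, and "uploading a uniform" is replaced by recording the matrix in a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the lightweight Instance3D: a transform and a type ID, no mesh. */
final class Instance {
    final int typeId;
    final float x, y;          // position on the map grid
    final float[] transform;   // 4x4 world matrix, translation stored in the last row

    Instance(int typeId, float x, float y) {
        this.typeId = typeId;
        this.x = x;
        this.y = y;
        // Translation-only matrix; rotation and scale are left out for brevity.
        this.transform = new float[] {
            1, 0, 0, 0,
            0, 1, 0, 0,
            0, 0, 1, 0,
            x, y, 0, 1
        };
    }
}

/** Imitates the before/repeat contract of a render hook; NOT the real jPCT interface. */
final class InstanceHook {
    private final List<Instance> visList = new ArrayList<>();
    private int index;
    /** Records what a real hook would send to the shader as a uniform. */
    final List<float[]> uploaded = new ArrayList<>();

    /** Rebuilds the visible list with a rectangle test against the camera footprint. */
    void cull(List<Instance> all, int typeId, float minX, float minY, float maxX, float maxY) {
        visList.clear();
        for (Instance i : all) {
            if (i.typeId == typeId && i.x >= minX && i.x <= maxX && i.y >= minY && i.y <= maxY) {
                visList.add(i);
            }
        }
    }

    /** False means the carrier object should be hidden so the pipeline skips it. */
    boolean hasVisibleInstances() {
        return !visList.isEmpty();
    }

    /** Called once before the first draw: start at the first visible instance. */
    void beforeRendering() {
        index = 0;
        uploaded.add(visList.get(index).transform);
    }

    /** Called after each draw: advance, and return true to request another draw. */
    boolean repeatRendering() {
        index++;
        if (index >= visList.size()) {
            return false;
        }
        uploaded.add(visList.get(index).transform);
        return true;
    }
}

final class IbrLoopDemo {
    /** Drives the hook the way a renderer might and returns the number of draws issued. */
    static int renderFrame(InstanceHook hook) {
        if (!hook.hasVisibleInstances()) {
            return 0;
        }
        int draws = 0;
        hook.beforeRendering();
        do {
            draws++; // a real renderer would draw the shared mesh here
        } while (hook.repeatRendering());
        return draws;
    }

    public static void main(String[] args) {
        List<Instance> all = new ArrayList<>();
        all.add(new Instance(0, 1, 1));
        all.add(new Instance(0, 5, 5));
        all.add(new Instance(0, 50, 50)); // outside the camera rectangle
        all.add(new Instance(1, 2, 2));   // a different mesh type
        InstanceHook hook = new InstanceHook();
        hook.cull(all, 0, 0, 0, 10, 10);
        System.out.println("draw calls: " + renderFrame(hook));
    }
}
```

The point of the pattern is that the mesh is submitted to the world once, and the per-instance work shrinks to one uniform update plus one draw.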

So far, I have seen pretty good performance improvements using this method. I have not tested everything out, and this is certainly not for everybody. I have a bit more work to do on this methodology, but I wanted to gather opinions on it. If anybody is interested, I will publish the work in this Thread, but it will be a use at your own risk as it will not support everything that jPCT-AE does and you will definitely need to build your own logic to fit your project.

I will be adding support for my hybrid-GPU-bones-animated objects for this methodology so it can support animations as well as static objects.

This is only for GLES 2.0+ as it uses Shaders at its core.

EgonOlsen · July 07, 2016, 08:44:37 PM

The actual idea behind adding the repeatRendering()-method was exactly this. However, nobody has done it before (including myself)...

Good job. I'm pretty sure that it would make an interesting addition to the Wiki...

Redman · July 08, 2016, 02:38:35 PM

Sure. I also thought of another good use of repeatRendering() that I'm going to need... a particle emitter. I'll release a GLSL particle emitter next (no plan for collision detection particles).

Redman · July 11, 2016, 05:12:44 PM

I don't get much free time, but here's an update:

Refactored the code for easy integration.
Support for translation and scaling.
Tested adding a 15x15 grid of the same 64-polygon object (225 objects totaling 14,400 polys). On my device it held at the 60 fps cap.

Todo:

add rotation support
add animation support
move the ModelViewMatrix calculation from the vertex shader (hardware) to software. Calculating it on the GPU is not the proper approach, as it gets recalculated for every vertex. Currently, speed will vary greatly with the poly count of the model.
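
To put a rough number on the last todo item, here is a small sketch. MvmCost and its methods are hypothetical names, and the 192 vertices per object is an assumption (64 triangles with no shared vertices); the actual vertex count of the test model isn't given in this thread. It compares how many 4x4 matrix products a frame needs when the ModelViewMatrix is built per vertex versus once per instance.

```java
/** Hypothetical helper for comparing per-vertex and per-instance matrix work. */
final class MvmCost {
    /** Product a*b of two 4x4 matrices stored row-major in 16-element arrays. */
    static float[] multiply(float[] a, float[] b) {
        float[] r = new float[16];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                float sum = 0f;
                for (int k = 0; k < 4; k++) {
                    sum += a[row * 4 + k] * b[k * 4 + col];
                }
                r[row * 4 + col] = sum;
            }
        }
        return r;
    }

    /** Number of world*view products per frame for the two strategies. */
    static long productsPerFrame(int instances, int verticesPerInstance, boolean perVertex) {
        return perVertex ? (long) instances * verticesPerInstance : instances;
    }

    public static void main(String[] args) {
        // 225 instances as in the 15x15 test; 192 vertices each is an assumption.
        System.out.println("per vertex:   " + productsPerFrame(225, 192, true));
        System.out.println("per instance: " + productsPerFrame(225, 192, false));
    }
}
```

Under those assumptions the per-vertex route does 43,200 products per frame against 225 for the per-instance route, which is why the cost of the shader version scales with poly count.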

More to follow.

Redman · July 12, 2016, 10:08:16 PM

I have finished my todo list: rotation and animation support are in, and the MVM calculation has moved from the vertex shader to software. I will add something to the Wiki soon. More to follow.

Redman · July 15, 2016, 04:46:20 AM

Anybody want to try this out to see how my documentation is:
http://www.jpct.net/wiki/index.php?title=Instance_Batch_Rendering

I'm a little tired writing it

AeroShark333 · July 20, 2016, 12:39:33 AM

Hmm looks interesting but how can one detect when an InstanceObject3D is touched?

EDIT:
Another question, does the order in which the InstanceObject3D's are drawn matter?

Redman · July 21, 2016, 05:06:50 PM

Quote: Hmm looks interesting but how can one detect when an InstanceObject3D is touched?

By default the mesh of an InstanceObject3D is not touched by IBR. If you do alter the mesh data, you already have access to the Mesh / Object3D as you've altered the vertices / uvs. If you do alter the mesh, note that it will affect all instances as it uses a shared mesh. If you need to get the Object3D of an InstanceObject3D type, use the InstanceManager.getObject3DOfInstanceType().

As for the transform matrices of the InstanceObject3D, by default it uses a sort of lazy transformation. Any time you do a scale/translation/rotation, it sets a protected boolean, changed, to true. On a call to getTransformMatrix(), it re-creates the transform matrix if changed==true; otherwise it just returns the previously calculated transform matrix. getTransformMatrix() gets called when rendering each instance (every frame). I have added a method to the code called hasChanged(), which returns true if there has been a translation, scale or rotation since the last getTransformMatrix(). I hope this answers your question.
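
The lazy approach described above is a standard dirty-flag cache. A minimal sketch follows, assuming translation and uniform scale only (rotation is omitted) and a translation-in-last-row layout. LazyTransform is a hypothetical class, not the actual InstanceObject3D source, and the rebuilds counter exists only to make the caching visible.

```java
import java.util.Arrays;

/** Hypothetical dirty-flag transform cache; not the real InstanceObject3D. */
final class LazyTransform {
    private float tx, ty, tz;
    private float scale = 1f;
    private boolean changed = true;               // forces the first build
    private final float[] matrix = new float[16];
    int rebuilds;                                 // for illustration only

    void translate(float dx, float dy, float dz) {
        tx += dx;
        ty += dy;
        tz += dz;
        changed = true;
    }

    void setScale(float s) {
        scale = s;
        changed = true;
    }

    /** True if a translate or scale happened since the last getTransformMatrix(). */
    boolean hasChanged() {
        return changed;
    }

    /** Rebuilds the matrix only when something changed; otherwise returns the cached one. */
    float[] getTransformMatrix() {
        if (changed) {
            Arrays.fill(matrix, 0f);
            matrix[0] = scale;
            matrix[5] = scale;
            matrix[10] = scale;
            matrix[15] = 1f;
            matrix[12] = tx;
            matrix[13] = ty;
            matrix[14] = tz;
            rebuilds++;
            changed = false;
        }
        return matrix;
    }
}
```

With this shape, an instance that never moves pays for one matrix build in total, however many frames it is rendered in.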

Quote: Another question, does the order in which the InstanceObject3D's are drawn matter?

It can, and Egon may be able to explain some of this better. It should only matter if your Object3D uses transparency or you disable the OpenGL depth test, in which case the instances render sequentially in list order. To put it simply, I don't believe transparent InstanceObject3D's are going to play nicely with other transparent Object3D's, other transparent InstanceObject3D types, or overlaps of multiple InstanceObject3D's of the same type. I believe jPCT has software-side ordering (based off the Object3D's origin distance from the camera?) when it comes to transparent objects, and the transparent objects are rendered last, as transparency requires the solid pixels behind it for the different pixel write modes (add, modulate, blend, etc.). Because all Object3D's for this method are stacked a certain distance from the camera, they will be picked up by the jPCT render pipeline according to that distance. All InstanceObject3D's that are rendered would be drawn one after another, on top of one another, in the order they were added. I could add simple sorting, but it won't solve all problems, as transparency is a complex subject that requires workarounds and tweaks for speed. It will never work great with IBR, though, as the distance from the camera will mess up jPCT's sorting order.
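
For reference, the "simple sorting" mentioned above could look roughly like the sketch below: order the instances of one type back to front by squared distance from the camera before they are drawn. SortableInstance and TransparencySort are hypothetical names, this is not jPCT's sorter, and it only orders instances within one type, so it does nothing for the interaction with other transparent objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical instance with just a name and a world position. */
final class SortableInstance {
    final String name;
    final float x, y, z;

    SortableInstance(String name, float x, float y, float z) {
        this.name = name;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Squared distance is enough for ordering and avoids a square root. */
    float distanceSquaredTo(float cx, float cy, float cz) {
        float dx = x - cx, dy = y - cy, dz = z - cz;
        return dx * dx + dy * dy + dz * dz;
    }
}

final class TransparencySort {
    /** Sorts in place so the farthest instance comes first (back to front). */
    static void sortBackToFront(List<SortableInstance> list, float cx, float cy, float cz) {
        list.sort(Comparator
                .comparingDouble((SortableInstance i) -> i.distanceSquaredTo(cx, cy, cz))
                .reversed());
    }

    public static void main(String[] args) {
        List<SortableInstance> list = new ArrayList<>();
        list.add(new SortableInstance("near", 0, 0, 1));
        list.add(new SortableInstance("far", 0, 0, 10));
        list.add(new SortableInstance("mid", 0, 0, 5));
        sortBackToFront(list, 0, 0, 0);
        for (SortableInstance i : list) {
            System.out.println(i.name);
        }
    }
}
```

Sorting by origin distance like this still breaks down when transparent instances intersect or are large relative to their spacing.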

Egon, did I miss anything or is anything inaccurate?

EgonOlsen · July 22, 2016, 10:19:36 AM

Sounds fine. Sorting of transparent objects is done per object based on the distance from the camera. All other objects are sorted by state by default.