GPU-based particle system

Introduction

I initially didn't know what I wanted to do as my specialization project. I considered improving our shadows, making them soft through some well-known implementation. But then I thought back to what I enjoyed the most during the making of Aurora Engine and realized that I had a lot of fun making a particle system and gradually adding new features to it, seeing it used and seeing how it performed and thinking about how I could improve it.


I wanted something that would challenge me and force me to learn more about DirectX. I had previously worked with Visual Effect Graph to make cool visual effects. Particle systems are an almost too perfect case for using the GPU rather than the CPU, so I wanted to make a new GPU particle system to use in our projects. I knew it would not only look really good, but would also force me to delve deeper into some of the more obscure (for me) parts of the graphics pipeline.

I had grand plans at first: I wanted a GPU particle system, an editor where some settings could be set, and the ability to spawn particles in the shape of a mesh. My stretch goals included force fields, as I had found some good source material to work from, and depth buffer collision for particles. I was unsure whether I would reach all of my goals, but I was okay with cutting a lot. I already had plans to cut everything past the mesh spawn shape, and even the editor if need be. As it turns out, because debugging the GPU pipeline was so difficult, I didn't get nearly as far as I wanted.


Another thing I failed to take into account was how much time it would take me to create a website.


Starting out

First off, I had to find some sources explaining GPU particle systems: how they work and how to implement them. I found quite a few, but they used slightly different implementations. Next I wanted to learn about the new features I would have to use, mainly structured buffers and compute shaders. I started off by just reading about how they worked and building an understanding of what unordered access views are and how they work. This also gave me a better understanding of shader resource views, which I had previously only used for textures.


After this I needed to add support for structured buffers and compute shaders to our engine. A bit later I also realized I needed support for append/consume buffers, staging buffers and a way to initialize a buffer with data. This turned out to be a little tricky: our render pipeline state objects weren't made with compute shaders in mind, so there needed to be an entirely new structure with compute render pipeline state objects instead.

After some work together with Isak Morand-Holmqvist, we had implemented very basic structured buffers and a test compute shader that was just supposed to tint everything with a color and swap between colors, which were sent through a structured buffer. This first test didn't exercise any append/consume buffers, as there was no support for them yet. But at least it was a partial success, even though it took much longer than I had anticipated or wanted.


Below is a .gif from a test level in Atlantean Ascent where we tried out the test compute shader and a structured buffer. Please note the seizure warning.

Seizure warning

Working with buffers and unordered access views

When the most basic structure was in place, I started writing the new particle system component and added it to our inspector so I could attach it to an object at runtime. Then I had to decide what sort of implementation I wanted to use. It took quite a while to research different implementations and find one that I felt comfortable with and could understand. As a result I didn't progress very far during the first phase.


Somewhere in the middle of all of this I discovered indirect drawing and dispatching. Indirect drawing is helpful partly because it makes the GPU more independent: the arguments for the draw call are read from a buffer on the GPU, so the GPU itself can decide how much is drawn. The particle system already generates the particles, so I can generate the vertex information from the particle data and skip setting the vertex and index buffers entirely.
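
To make the argument buffers a bit more concrete, here is a minimal CPU-side C++ sketch of what they contain. The two struct layouts follow what Direct3D 11 reads for DrawInstancedIndirect (four 32-bit values) and DispatchIndirect (three 32-bit values). Everything else is an assumption for illustration: the names makeDrawArgs and makeSimulateArgs, six vertices per particle (a quad built from two triangles in the vertex shader) and 256 threads per group are hypothetical choices, not values taken from my engine. In a real system a compute shader would write these values on the GPU; the functions below only mirror that arithmetic.

```cpp
#include <cstdint>

// Layout read by DrawInstancedIndirect: four 32-bit values, in this order.
struct DrawInstancedIndirectArgs {
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};

// Layout read by DispatchIndirect: three thread-group counts.
struct DispatchIndirectArgs {
    std::uint32_t groupCountX;
    std::uint32_t groupCountY;
    std::uint32_t groupCountZ;
};

// Assumption: each particle becomes a quad made of two triangles, generated
// in the vertex shader from the vertex ID, so no vertex buffer is bound.
constexpr std::uint32_t kVerticesPerParticle = 6;

// Assumption: the simulate shader runs 256 threads per group.
constexpr std::uint32_t kThreadsPerGroup = 256;

// Draw arguments for a given number of alive particles (hypothetical helper).
DrawInstancedIndirectArgs makeDrawArgs(std::uint32_t aliveCount) {
    return {aliveCount * kVerticesPerParticle, 1, 0, 0};
}

// Dispatch arguments: one thread per alive particle, rounded up to whole groups.
DispatchIndirectArgs makeSimulateArgs(std::uint32_t aliveCount) {
    const std::uint32_t groups =
        (aliveCount + kThreadsPerGroup - 1) / kThreadsPerGroup;
    return {groups, 1, 1};
}
```

With 257 alive particles, for example, makeSimulateArgs rounds up to two thread groups, and the shader then has to ignore the threads that fall past the alive count.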


After finally deciding to use a particle pool, a dead list buffer and two alive lists that I swap between, and having also decided to learn indirect drawing and dispatching along the way, I could finally start implementing it all.
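
The bookkeeping behind that layout can be sketched on the CPU. The C++ below is not my GPU implementation, only a simplified model of the idea: a fixed pool of particles, a dead list of free pool indices, and two alive lists that are swapped after each simulation step. The class name ParticlePool, its methods and the Particle fields are hypothetical, and std::vector stands in for the append/consume buffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumption: a minimal particle with only an age and a lifetime.
struct Particle {
    float age = 0.0f;
    float lifetime = 0.0f;
};

class ParticlePool {
public:
    explicit ParticlePool(std::uint32_t capacity) : particles_(capacity) {
        // The dead list starts out full: every slot in the pool is free.
        deadList_.reserve(capacity);
        for (std::uint32_t i = 0; i < capacity; ++i) {
            deadList_.push_back(i);
        }
    }

    // Emit: consume free indices from the dead list and append them to the
    // current alive list. Returns how many particles were actually spawned.
    std::uint32_t emit(std::uint32_t requested, float lifetime) {
        std::uint32_t spawned = 0;
        while (spawned < requested && !deadList_.empty()) {
            const std::uint32_t index = deadList_.back();
            deadList_.pop_back();
            particles_[index] = Particle{0.0f, lifetime};
            aliveCurrent_.push_back(index);
            ++spawned;
        }
        return spawned;
    }

    // Simulate: survivors are appended to the next alive list, expired
    // particles return their index to the dead list, then the lists swap.
    void simulate(float deltaTime) {
        aliveNext_.clear();
        for (const std::uint32_t index : aliveCurrent_) {
            Particle& particle = particles_[index];
            particle.age += deltaTime;
            if (particle.age < particle.lifetime) {
                aliveNext_.push_back(index);
            } else {
                deadList_.push_back(index);
            }
        }
        std::swap(aliveCurrent_, aliveNext_);
    }

    std::size_t aliveCount() const { return aliveCurrent_.size(); }
    std::size_t deadCount() const { return deadList_.size(); }

private:
    std::vector<Particle> particles_;
    std::vector<std::uint32_t> deadList_;
    std::vector<std::uint32_t> aliveCurrent_;
    std::vector<std::uint32_t> aliveNext_;
};
```

At every point the alive count plus the dead count equals the pool capacity, which is a useful invariant to check when debugging the GPU version.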


Problems with getting it all to work

I ran into lots of issues once I started to try and render my system. I got really stuck trying to figure out what was going wrong and understanding what to do when. I felt a bit overwhelmed.


I kept running into problems: the structure I had set up earlier for structured buffers didn't work, and the Render Hardware Interface class we used wasn't ready. At first I wanted to avoid indirect dispatching because I didn't feel I understood it well enough, but eventually I decided to try using it anyway.


I found myself rewriting and restructuring a lot of the compute shaders. I had run into a wall. I considered trying to enable GPU disassembly in Visual Studio, because I remember having it working at some point. It would certainly have helped with debugging the emit and simulate shaders, but it seemed like it might take too much time. Eventually I found that the best way for me to test my system was to make a graphics command that emits, simulates and renders one emitter in a single go.


I noticed that the staging buffer didn't work properly when I used it for debugging by reading from a counter. I also realized that I needed to manually initialize the deadBuffer with data in an init-particles compute shader. By the time I had figured this out, time was running short. Right now I'm stuck, not knowing exactly why the compute shaders aren't properly appending particles to the alive lists, which is what I'm trying to figure out.


One particularly tricky part is ensuring that your compute shaders don't append to or consume from the buffers more than they can handle. Appends to a buffer that is already at its maximum size are discarded, while consuming from a buffer that is already empty results in undefined behavior. It took me some time to understand this.
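
One way to stay inside those limits is to clamp the number of particles emitted per frame before the emit shader runs. The C++ below is a minimal sketch of that arithmetic under my own assumptions: the function name clampEmitCount and its parameters are hypothetical, and it presumes the current dead and alive counts are available, for instance by copying the hidden buffer counters with CopyStructureCount.

```cpp
#include <algorithm>
#include <cstdint>

// Returns how many particles can safely be emitted this frame
// (hypothetical helper). The result never exceeds:
//  - the requested amount,
//  - the number of free indices in the dead list (so consume never
//    runs on an empty buffer),
//  - the free space left in the alive list (so no append is discarded).
std::uint32_t clampEmitCount(std::uint32_t requested,
                             std::uint32_t deadCount,
                             std::uint32_t aliveCount,
                             std::uint32_t aliveCapacity) {
    const std::uint32_t freeAliveSlots =
        aliveCapacity > aliveCount ? aliveCapacity - aliveCount : 0;
    return std::min({requested, deadCount, freeAliveSlots});
}
```

The emit shader would then return early for every thread whose ID is at or above the clamped count, so the extra threads in the last group never touch the buffers.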


The frustrating thing is that with these systems it's really all or nothing. Until I can get the particle system to fill the lists and render on screen I don't really have much to show, which is incredibly disappointing, disheartening and frustrating.


Unfortunately this is simply the nature of graphics programming, but I feel like I'm close to the peak of the frustration curve. With a little more work I will get it working, and once everything is in place I think implementing additional features will go much faster.


Moving forward

So this is where I am right now, with mixed feelings of disappointment, frustration and anger. Yet at the stage I am at now, I know that I am close to a breakthrough, as long as I don't melt down first. My plan for moving forward is to start with the alive lists already populated with living particles, to make debugging easier. I will also check all constant buffers and double-check all the counters for my particle-related buffers. I might try to forcibly spawn one particle every frame in order to find where the problem currently lies. Another debugging option is RenderDoc, although I have found RenderDoc to be less useful for compute shaders, but that might just be my inexperience talking. The plan is still to have this working in a week or two so it can be used in the eighth and final project! I will not give up, and hopefully I will be able to update this page with some good news.



Retrospective

Looking back, I wish I had selected a particle system implementation that used fewer buffers, because that would have meant fewer points of failure and much easier debugging. More realistic planning would probably have helped keep my morale up. Some previous knowledge of, or a course about, compute shaders, buffers (structured, append/consume) and indirect rendering would certainly have helped, but one reason I wanted to try my hand at this was to really challenge myself.



Here are the buffers used by the simulation shader (left) and the emit shader (right). As I said, it would probably have been easier to choose an implementation with fewer points of failure.


Despite the disappointing result thus far, I am not at all regretful of having chosen this as my specialization, because when I get it working I know I will have a lot of fun with expanding it, adding features, restructuring, optimizing and building an editor around it that perhaps my technical artists can use.


I think constructing a simpler system from scratch, even one that didn't fit very well into the structure of the engine, would have been better for my morale, because I would have seen something actually working. The idea would be a system hard-coded to always render at the origin, which would have cut down the time I spent making sure I somewhat followed the standard we use for the engine.

Make it work, make it right, make it fast!


When I get some time, I will work to complete this project.