Procedural Levels Generation — Puzzle Game

Pudding Entertainment
10 min readAug 26, 2020

If you have ever worked on any kind of games with levels (think of Candy Crush) then you know that manual level creation might take an enormous amount of development time. Eventually, you will want to ease this process and create an algorithm to do the job for you.

After releasing 3 games out of which 2 involved levels design I made up my mind — no more handcrafting, it doesn’t justify the time spent. And here it is, my latest released game — Hashi Flow — where all the levels were generated procedurally.

In this article I’d like to share this challenging thorny path to eventual success.

Interlude

First thing first — the problem statement. To understand what I was trying to solve let me briefly explain the basics of the game. The main concept is based on two famous puzzle types — Numberlink and Hashi. The former one contributes the mechanic of board fill whilst the latter one introduces a definition of nodes and their interconnections. A player’s goal is to fill the entire board using all the nodes respecting their counts. To grasp the idea better have a look at this 30 seconds intro:

With that said, the problem statement was identified as:

Create an algorithm that will generate unique levels with as few as possible solutions.

Few words about myself — I’m a professional self-taught Enterprise Java developer with 6+ years of experience. There are two very important keywords in the last sentence — self-taught and Enterprise. You can read this combination as Zero algorithms background. Yes, that’s right, I haven’t mastered Bubble sort or Breadth-first search at University, neither used them in practice. Nevertheless, with due diligence and good searching skills everything is possible.

Enough of introduction, let’s jump right into the topic.

Part 1. Unity and C#. First try and disappointment

I started by breaking down the problem into multiple actionable tasks:

  1. Randomly generate the game board
  2. Gather all interconnections of nodes
  3. Multiply those combinations with each other to get all possible combinations
  4. Loop through those combinations to find out only the winning ones
  5. If there are too many (imperative measurement) of winning combinations — drop the level
  6. Run till all the levels are generated

There are also a few constraints:

  • Board size ranges from 4x4 to 7x7
  • Maximum count of unoccupied cells per level = 0.75 * board_size ^2
  • Minimum count of unoccupied cells per level = 0.625 * board_size ^2
  • Node can have 1 to 4 outgoing connections depending on its position
  • Each cell has an index

As an example, for a 4x4 board the amount of unoccupied cells would be 10–12 and amount of nodes 4–6. And a 6x6 board would have 22–26 unoccupied cells with 10–14 nodes. I’ve come up with those formulas by designing a couple of levels by hand trying out different values.

Additionally, to help with development, I took a notebook and drew a 4x4 board with all the cells enumerated. A piece of paper is always my good companion in gamedev.

My notebook

If at this point you are already thinking “Brute-force computation is never going to work, there are billions of combinations to process!” — congratulations, you have a good algorithmic background.

First task on the list looked like an easy but important one, as it was supposed to build a good foundation for the rest of the project. To put it in simple words, a board is basically a specified amount of cells with some of them being occupied by nodes. Hence, to generate a game board a random generation of nodes is needed. An obvious approach with System.Random was immediately complicated by newly identified constraint — I couldn’t rely on randomness utterly because without extra checks such boards would simply end up being unsolvable. A separate handling of corners, side and inner positions was needed. Additionally, the node count itself cannot be blindly randomized as it has a dependency on the node position. To illustrate the problem in a better way take a look at the picture below:

These are added constraints:

  • If a node position is corner — its count can be 1 or 2 depending on a neighbor nodes
  • If a node position is side — its count can be 1 to 3 depending on a neighbor nodes
  • If a node position is inner — its count can be 1 to 4 depending on a neighbor nodes

At this point in order to be able to work with all those corner, side and inner positions an actual representation of a board was needed. Initially I tried to organize it in some sort of a matrix structure, but that failed fast. Searching here and there for a while, finally, I arrived at geeksforgeeks.org where the Depth First Traversal (DFT) algorithm was described. And even a perfectly fitting solution to get all the paths from one node to all the other nodes was already prepared. With a couple of adjustments my LevelGenerator script printed out something meaningful at last.

Warning! To spice things up I’m going to show some code here. Those scripts were never cleaned up and should be treated with care. Don’t judge me!

The basic idea was to extend DFT to also fill the Dictionary of all connections from the given node. You can see a pinch of Object-oriented mindset, as there is a separate POJO inner class Connection created. This code worked perfectly and quite fast — in less than a second for the 4x4 board it produced ~80–170 values.

With happiness in heart and confidence in mind I continued walking this way forward into inevitable failure.

The next task was to take all those generated paths and “multiply” them with each other to gather a set of all possible combinations of inter-connections. With the lack of algorithmic background I had nothing to do but search for an answer, which was right around the corner — Cartesian product.

For those who never worked with it, here is a brief explanation. Cartesian product is a way to find all combinations between given data sets. Let’s say you have two sets: [A, B, C] and [1, 2]. Applying Cartesian product to those two sets will result in { [A1], [B1], [C1], [A2], [B2], [C2] }. It is fair to say that with N amount of incoming sets m the resulting set would be of size m0.size * m1.size * … * mN.size. Which is A LOT OF data. Even for such a small level setup as I had for a 4x4 board, the number of entries to be processed easily crossed 10 millions.

I ended up with the following implementation of Cartesian product. Note that this script was shorten and parts you have seen in the previous version of the script were deliberately removed to make it shorter :

After the script was finished and attached to a GameObject on a scene in Unity it took around five minutes for it to actually generate a single level… But it worked! At that point I’ve already spent around one week figuring out all the bits and pieces, that was indeed a great progress. And who cares if that would take approximately 20 hours to generate 100 levels needed for 4x4 game mode? That’s how I was thinking and, man, was I wrong!

You can probably predict what happened next. I decided to start the generation in the background while doing some house duties. When I came back to check on it… my laptop was already frozen. Something wasn’t right. I restarted the system and started the process all over again. The system froze one more time. Repeating the previous steps, this time I opened a Task manager to monitor OS vitals and finally realized the problem — my laptop doesn’t have enough RAM to process this amount of data! At first I was shocked and devastated as apparently all the work was in vain. But then suddenly there was a light at the end of a tunnel — what if the whole level generation was written in Java, my primary language? After all, Unity has its own memory management that a developer has a little control of, plus the whole game engine UI. Whilst in Java one can create UI-less console application applying a couple of memory optimization techniques. We are all aware of that common developer pitfall: there is obviously nothing wrong with the taken approach, just the tool is not good enough, so let’s use another one!

Read on to find out if Java saved my little project from a complete disaster (spoiler alert: it did, but not from the first try).

Part 2. Java. New hopes… and disappointment again

Working in your preferred language always feels like being home after a long business trip. Upon opening Intellij IDEA all my hopes arose again. Before anything else I rewrote the code from C# to Java, which didn’t take long. On the first launch immediately and expectedly it threw OutOfMemoryError. That was the time to roll my sleeves up and apply various optimization techniques I could come up with, namely:

  • Own implementation of Cartesian product has been replaced with Lists.cartesianProduct from Guava library. It works significantly faster.
  • Multithreading with Callable
  • System.gc() call
  • Optimization of collections usage

You can look up the final version here:

At this point you might be asking about the code for the level generation itself, as till now I haven’t shown that part yet. Truth to be told, I didn’t retain a C# version, only the Java one (which was basically the copy of it). Check it out here:

Essentially, it is the most straight-forward implementation of those constraints from the previous section.

After all of the optimizations applied my little LevelGenerator was able to produce 100 levels for the 4x4 board in under one and a half hours. That was it, I thought, success has finally come and the rest would be easy — “just” run the same code for boards of sizes 5x5 to 7x7. I started with a 5x5 board to face nothing but another big failure — there was not even a single level generated! Looking into the class you will find this piece of code there:

private Set<List<Connection>> ProcessResults() {
BigInteger expectedCombinations = BigInteger.ONE;
for (List<Connection> value : connectionsMapping.values()) {
expectedCombinations = expectedCombinations.multiply(BigInteger.valueOf(value.size()));
}
System.out.println("expected combinations count " + expectedCombinations);
if (expectedCombinations.compareTo(BigInteger.valueOf(Integer.MAX_VALUE)) > 0) {
return new HashSet<>();
}
...
}

This check — comparison to Integer.MAX_VALUE — was added because Lists.cartesianProduct obviously cannot handle List of such big sizes. That rarely happened to the 4x4 board, but with 5x5 that was the only case. There was nothing else I could do. At this point I had just two thoughts — either drop the whole game or generate all the remaining levels manually (initial version was planned to have 600 levels). But then a miracle happened…

Part 3. Algorithms, way to success

A miracle that is also known as experience. A bit late in a process, but realization came that I actually never searched for the solution to a problem similar to mine. It turned out there is tons of useful material on the topic (like this Github project or this article). But the real eye opener was one particular answer from StackOverflow with the simple line that completely blew off all my efforts:

The number of possible combinations is pretty high to start with, but also grows ridiculously fast once you start making the board larger.

Realistically speaking, unless I magically get a supercomputer that might be able to process such huge amounts of data, the whole approach should be reworked. That was probably the harshest programming lesson I’ve learned since the beginning of my professional career — algorithms are useful outside of math-related development fields.

After reading some of the newly discovered materials about cracking the Numberlink algorithm, I ended up with the following solution:

Instead of first generating the level skeleton and then figuring out if it can be solved, in this approach a level is being generated node by node. Algorithm takes all the cells, shuffles them and one by one walks through the resulting list applying the following rules:

  • Every next node has to have a valid adjusted node. For example, if a given node is to be located on a corner position it should not have two other nodes next to it, otherwise it is unreachable. That part was described already previously.
  • Path size from one node to another cannot be longer than the length of one corner to the opposite one. Unless it is the last node, then it will be the size of remaining unoccupied cells.
  • Every next node should have a connection with the existing one. The same code for the DFT algorithm is reused to find a matching connection. All occupied cells are excluded from the data set.
  • Once the board is filled, count for each node is calculated as the amount of incoming connections.
  • Generated level is checked for uniqueness

The resulting algorithm works satisfactory fast — 100 levels for 6x6 board are generated in 30 seconds, for 7x7 board in around 5 minutes. The quality of generated levels is relatively good, even though mostly it is not a single solution per level. From the timing It is rather predictable that the bigger boards generation will take hours, but that’s the future problem.

Summary

Procedural level generation has been in games since forever, but still remains a difficult task to implement correctly on the first try. On the flip side, the gained advantages will outweigh all the development efforts as that will allow to create interesting content without extra hassle. Studying algorithms will undoubtedly help to come up with the proper solution faster, but it is possible to have it even without a strong algorithmic background.

Thanks for reading, I hope you got some good learning out of my painful yet successful journey.

You can check the game out in Google Play Hashi Flow.

Support

If you like the content you read and want to support the author — thank you very much!
Here is my Ethereum wallet for tips:
0xB34C2BcE674104a7ca1ECEbF76d21fE1099132F0

--

--

Pudding Entertainment

Serious software engineer with everlasting passion for GameDev. Dreaming of next big project. https://pudding.pro