[Translation] TDD in game dev or "rabbit hell"



TDD is rarely used in game development. It is usually easier to hire a tester than to assign a developer to write tests, which saves both money and time. That makes every successful use of TDD in games all the more interesting. Below is a translation of an article in which this technique was used to build character movement in the game ElemenTerra.



Test-driven development (TDD) is a software development technique in which work proceeds in many short cycles: first unit tests are written, then the code needed to pass them, and then the code is refactored. Then the cycle repeats.

TDD Basics


Suppose we are writing a function that adds two numbers. In an ordinary workflow, we would just write it. With TDD, however, we start by creating a placeholder function and unit tests:

```cpp
#include <stdexcept>

// Placeholder function that gives incorrect results:
int add(int a, int b) {
    return -1;
}

// Unit tests that fail if add does not give correct results:
void runTests() {
    if (add(1, 1) != 2)
        throw std::runtime_error("add(1, 1) should return 2");
    if (add(2, 2) != 4)
        throw std::runtime_error("add(2, 2) should return 4");
}
```

At first, our unit tests fail, because the placeholder returns -1 for every input. Now we can implement add correctly so that it returns a + b, and the tests pass. This may seem like a roundabout approach, but it has several advantages:

  • If we mistakenly write add as a - b, the tests fail and we immediately know the function needs fixing. Without tests, we might miss the error and later run into strange behavior that takes time to debug.
  • We can keep the tests and rerun them at any time while writing code. If another programmer accidentally breaks add, they find out right away, because the tests fail again.
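Both advantages can be seen in a small sketch of the "green" step. Here add has been implemented correctly, and a hypothetical `addBuggy` stands in for the accidental a - b regression. `passesAddTests` (also a name introduced only for this illustration) runs the same two checks as runTests, but returns a result instead of throwing and takes the function under test as a parameter, so both versions can be checked in one program.

```cpp
#include <functional>

// Correct implementation, written after the failing tests existed.
int add(int a, int b) {
    return a + b;
}

// Hypothetical regression: someone accidentally turns + into -.
int addBuggy(int a, int b) {
    return a - b;
}

// The same two checks as runTests(), returning true only if both hold.
bool passesAddTests(const std::function<int(int, int)>& f) {
    return f(1, 1) == 2 && f(2, 2) == 4;
}
```

`passesAddTests(add)` is true, while `passesAddTests(addBuggy)` is false because `addBuggy(1, 1)` returns 0, so the regression is caught the moment the tests run.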

TDD in game dev


Applying TDD to game development runs into two problems. First, many game features have subjective goals that are hard to measure. Second, it is hard to write tests that cover every possibility in a game world full of complex, interacting objects. Developers who want characters to “look good” or physics simulations to “not look jerky” will find it difficult to express these goals as deterministic pass/fail conditions.

Nevertheless, TDD can be applied to complex and subjective features such as character movement, and that is exactly what we did in ElemenTerra.

Unit tests vs. debug levels


Before getting to the practice, I want to distinguish between an automated unit test and a traditional “debug level”. Building hidden levels with artificial conditions is common in game development. They let programmers and QA observe individual behaviors in isolation.


Secret Debug Level in The Legend of Zelda: The Wind Waker

ElemenTerra has many such levels: one full of problematic geometry for the player character, levels with special UIs that trigger particular game states, and others.

Like unit tests, debug levels can be used to reproduce and diagnose bugs, but they differ in a few ways:

  • Unit tests break systems into parts and evaluate each one individually, while debug levels test in a more holistic way. After spotting a bug in a debug level, developers may still need to track down its source by hand.
  • Unit tests are automated and must give deterministic results every time, while many debug levels are driven manually by a player, so results can vary from session to session.

This does not mean that unit tests are better than debug levels; the latter are often more practical. But unit testing can be applied even to systems where it has not traditionally been used.

Welcome to Rabbit Hell


In ElemenTerra, players use the mystical forces of nature to rescue creatures affected by a cosmic storm. One of these powers is the ability to create paths that lead creatures to food and shelter. Because these paths are dynamic meshes created by the player, creature movement has to cope with unusual geometric edge cases and arbitrarily complex terrain.

Character movement is one of those complex systems where “everything affects everything else.” If you have ever worked on one, you know how easy it is to break existing functionality when writing new code. Need rabbits to climb small ledges? Fine, but now they jitter when walking up slopes. Want lizards' paths not to cross? That works, but now their normal behavior is broken.

As the person responsible for AI and most of the gameplay code, I knew I had no time for surprises. I wanted to notice regressions immediately, so TDD seemed like a good fit.

The next step was to build a system in which I could easily define each movement case as a simulated pass/fail test:



This “rabbit hell” consists of 18 isolated corridors, each with a creature and a route designed so that the creature can traverse it only if a specific movement feature works. A test passes if the rabbit keeps moving indefinitely without getting stuck, and fails otherwise. Note that we test only the creatures' bodies (the “pawn” in Unreal terms), not their AI. In ElemenTerra, creatures can eat, sleep, and react to the world, but in “rabbit hell” their only instruction is to run back and forth between two points.
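A minimal sketch of what one such pass/fail corridor could look like in code, assuming a simplified 2D world rather than a real engine. `CorridorTest`, `runCorridor`, `walkStraight`, and every numeric threshold are hypothetical. “Moves indefinitely without getting stuck” is approximated here by a finite simulated duration plus a no-progress timeout; that is an assumption made for this sketch, not a mechanism described in the article.

```cpp
#include <cmath>
#include <functional>
#include <string>

// Minimal 2D position; a real engine would use its own vector type.
struct Vec2 {
    double x, y;
};

double distance(Vec2 a, Vec2 b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Stand-in for the movement code under test: walk straight toward the
// target at a fixed speed (3 m/s, an arbitrary illustrative value).
Vec2 walkStraight(Vec2 pos, Vec2 target, double dt) {
    const double maxStep = 3.0 * dt;
    const double d = distance(pos, target);
    if (d <= maxStep) return target;
    return {pos.x + (target.x - pos.x) / d * maxStep,
            pos.y + (target.y - pos.y) / d * maxStep};
}

// One hypothetical corridor: a creature shuttles between two points
// using the supplied movement step.
struct CorridorTest {
    std::string name;
    Vec2 pointA, pointB;
    std::function<Vec2(Vec2, Vec2, double)> step;
};

// Passes if, over `duration` simulated seconds, the creature never goes
// `stuckTimeout` seconds without moving at least `minProgress` meters.
bool runCorridor(const CorridorTest& t, double duration,
                 double dt = 1.0 / 60.0, double stuckTimeout = 2.0,
                 double minProgress = 0.05) {
    Vec2 pos = t.pointA;
    Vec2 checkpoint = pos;
    bool headingToB = true;
    double sinceProgress = 0.0;
    for (double time = 0.0; time < duration; time += dt) {
        const Vec2 target = headingToB ? t.pointB : t.pointA;
        pos = t.step(pos, target, dt);
        if (distance(pos, target) < 0.1) {
            headingToB = !headingToB;  // reached one end: turn around
        }
        if (distance(pos, checkpoint) >= minProgress) {
            checkpoint = pos;          // meaningful movement: reset the timer
            sinceProgress = 0.0;
        } else {
            sinceProgress += dt;
            if (sinceProgress > stuckTimeout) return false;  // stuck
        }
    }
    return true;
}
```

With `walkStraight` in an open corridor the creature shuttles back and forth and `runCorridor` returns true; wrapping the step so it cannot pass a wall makes the creature stall, and the test fails once `stuckTimeout` elapses.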

Here are some examples of such tests:


1, 2, 3: Free movement, static obstacles and dynamic obstacles


8 and 9: Uniform slopes and uneven terrain


10: Vanishing Floor


13: Reproducing a bug in which creatures endlessly circled nearby targets


14 and 15: Navigating flat and complex ledges

Let's look at how my implementation resembled “pure” TDD and how it differed.

My system resembled TDD in that:

  • I started work on each feature by creating tests, and then wrote the code needed to pass them.
  • I kept running the old tests while adding new features.
  • Each test measured exactly one part of the system, which allowed me to quickly find problems.
  • Tests were automated and did not require player input.
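Keeping every old test running and pinpointing the broken feature can be sketched as a tiny regression runner. `MovementTest` and `runSuite` are hypothetical names; in “rabbit hell” each `run` would wrap one corridor, while here any function returning pass/fail will do.

```cpp
#include <functional>
#include <string>
#include <vector>

// A named pass/fail check; in "rabbit hell" each would wrap one corridor.
struct MovementTest {
    std::string name;
    std::function<bool()> run;
};

// Runs every test, old and new, and returns the names of the failures,
// so a regression points straight at the feature it broke.
std::vector<std::string> runSuite(const std::vector<MovementTest>& tests) {
    std::vector<std::string> failures;
    for (const auto& test : tests) {
        if (!test.run()) failures.push_back(test.name);
    }
    return failures;
}
```

Because each test covers exactly one movement feature, the list of failing names is already most of the diagnosis.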

It differed in that:

  • Evaluating the tests involved some subjectivity. Outright movement failures (the character did not get from A to B) could be detected programmatically, but problems such as skewed poses, animation sync issues, and jittery movement required human judgment.
  • The tests were not fully deterministic. Random factors, such as frame rate fluctuations, caused small deviations. But in general, creatures followed the same paths and produced the same pass/fail results from session to session.
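The second point, small deviations but stable verdicts, can be illustrated with a toy model that is not ElemenTerra code. `travelTime` and `arrivesInTime` are hypothetical, and the ±20% frame-time jitter is an arbitrary assumption standing in for frame rate fluctuations. The exact arrival time varies with the random seed, but a verdict based on a generous time limit does not.

```cpp
#include <random>

// A creature walks `length` meters at `speed` m/s, but each frame lasts
// 1/60 s scaled by a random factor in [0.8, 1.2). Returns arrival time.
double travelTime(double length, double speed, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> jitter(0.8, 1.2);
    double pos = 0.0;
    double time = 0.0;
    while (pos < length) {
        const double dt = jitter(rng) / 60.0;
        pos += speed * dt;
        time += dt;
    }
    return time;
}

// The verdict uses a time limit rather than an exact arrival frame, so
// small run-to-run deviations do not flip the result.
bool arrivesInTime(double length, double speed, double limit, unsigned seed) {
    return travelTime(length, speed, seed) <= limit;
}
```

For a 10 m walk at 3 m/s, arrival always lands just past 3.33 s and at most one frame later, so a 4 s limit passes for every seed even though the exact times differ.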

Restrictions


Using TDD for ElemenTerra's creature movement was a huge win, but my approach had a few limitations:

  • The unit tests evaluated each movement feature separately, so bugs arising from combinations of several features were not covered. Sometimes I had to supplement the unit tests with traditional debug levels.
  • ElemenTerra has four creature species, but the tests contain only rabbits. This was an artifact of our production schedule (the other three species were added much later in development). Fortunately, all four share the same movement abilities, but the Mossmork's large body caused a few problems. Next time, I would have the tests dynamically spawn a chosen species instead of using pre-placed rabbits.
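The “spawn a chosen species” idea can be sketched as running every corridor once per species. Everything here is hypothetical: `Species`, `Corridor`, `runCorridorWith`, and especially the body radii and gap widths, which are illustrative values and not measurements from the game. The real check would spawn the creature and run the movement test; this placeholder only asks whether the body fits through the narrowest gap.

```cpp
#include <string>
#include <vector>

// Hypothetical species descriptor; only body size matters here.
struct Species {
    std::string name;
    double bodyRadius;  // meters (illustrative values only)
};

// Hypothetical corridor reduced to its narrowest gap.
struct Corridor {
    std::string name;
    double narrowestGap;  // meters
};

// Placeholder for "spawn species s in corridor c and run the test":
// here the creature passes if its body fits through the gap.
bool runCorridorWith(const Corridor& c, const Species& s) {
    return 2.0 * s.bodyRadius <= c.narrowestGap;
}

// Every corridor runs once per species, so size-specific failures show up
// as "corridor / species" pairs instead of being hidden by rabbit-only tests.
std::vector<std::string> runAllSpecies(const std::vector<Corridor>& corridors,
                                       const std::vector<Species>& species) {
    std::vector<std::string> failures;
    for (const auto& c : corridors)
        for (const auto& s : species)
            if (!runCorridorWith(c, s)) failures.push_back(c.name + " / " + s.name);
    return failures;
}
```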


This Mossmork needs a bit more room than a bunny

Is TDD right for you?


It is easy for developers to pour too much effort into test levels that players will never see. I admit it: I had a lot of fun building “rabbit hell” myself. Internal tools like this can eat up time and jeopardize more important milestones. To avoid that, think carefully about where and when unit tests are worth it. Below are several criteria that justified TDD for ElemenTerra's creature movement.

1. Would testing it manually take a lot of time?

Before spending time on automated tests, check whether you can evaluate the feature using ordinary game controls. If you want to make sure your keys unlock doors, pick up a key and open a door with it. Writing unit tests for this feature would be a waste of time, since testing it by hand takes only a few seconds.

2. Are the test cases hard to set up manually?

Automated unit tests are justified when there are known edge cases that are hard to reproduce. Test 7 of “rabbit hell” checks walking along ledges, something the AI normally tries hard to avoid. Such a situation can be difficult or impossible to reproduce with game controls, but it is easy to set up in a test.

3. Are you confident the desired results will not change?

Game design is fundamentally iterative, so feature goals can change as the game evolves. Even small changes can invalidate the metrics you use to evaluate features, and with them any unit tests. While the creatures' eating, sleeping, and player-interaction behaviors changed several times, the task of getting from point A to point B never did. As a result, the movement code and its unit tests stayed relevant throughout development.

4. Is a regression likely to go unnoticed?

Have you ever been finishing one of the last tasks before shipping a game, only to discover a game-breaking bug in a feature you finished ages ago? Games are huge interconnected systems, so it is natural that adding new feature B can break old feature A.

This is not so bad when the broken feature is used everywhere (a jump, for example), because you will notice the breakage immediately. But a regression in a rarely exercised feature can go unnoticed for a long time. Bugs discovered late in development can disrupt the schedule, and bugs found after launch can hurt the gameplay.

5. What is the worst that can happen, with and without tests?

Writing tests is a form of risk management. Imagine you are deciding whether to buy car insurance. You need to answer three questions:

  • How much are the monthly premiums?
  • How likely is the car to be damaged?
  • How expensive would the worst-case scenario be if you were not insured?

For TDD, the monthly premiums correspond to the production cost of maintaining our unit tests, the likelihood of car damage corresponds to the likelihood of a bug, and the cost of replacing the car outright corresponds to the worst-case cost of a regression.

If a feature's tests take a lot of time to create, and the feature is simple and unlikely to change (or could be dealt with if it breaks late in development), unit tests may cause more trouble than they are worth. If the tests are easy to create, and the feature is volatile and interconnected (or its bugs would be costly), then tests will help.
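The insurance analogy boils down to comparing the premium with the expected loss. A minimal sketch, with hypothetical names and inputs that are rough estimates in one shared unit (developer-days, say). It simplifies by assuming the tests would actually catch the regression, and it ignores everything the analogy leaves out.

```cpp
// Inputs to the insurance-style comparison; all are rough estimates in the
// same unit (e.g. developer-days). Names are illustrative only.
struct RiskEstimate {
    double testCost;        // "premium": cost to build and maintain the tests
    double bugProbability;  // chance that a regression slips in (0..1)
    double worstCaseCost;   // cost if it is found late or after launch
};

// Tests pay off when they cost less than the expected loss they prevent.
// Simplification: assumes the tests would catch the regression.
bool testsWorthIt(const RiskEstimate& r) {
    return r.testCost < r.bugProbability * r.worstCaseCost;
}
```

With made-up numbers: a simple key-and-door feature (cost 5, probability 0.05, worst case 2) gives an expected loss of 0.1, so tests do not pay off; volatile movement code (cost 3, probability 0.5, worst case 20) gives 10, so they do.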

Limits of Automation


Unit tests can be a great aid in finding and fixing bugs, but they do not replace professional quality assurance on large-scale games. QA is an art that requires creativity, subjective judgment, and excellent technical communication.
