A Treatise on Tuning (Part One)
10/03/2007 | Devlog | | Discuss
Recently, Aether approached me and told me that he wants my soul – or a devlog on tuning. After days of soul searching, I decided to write the devlog.
I’ll tell you a little about the current tuning situation before I get into the mechanics and philosophy behind our tuning. Contrary to my expectations, it’s actually been quite a while since I did any intensive tuning. My priorities changed when we delayed our launch a while back, and I spent the past few months working on lots of additions and various issues instead of tuning. That’s because tuning can happen right up until launch (and beyond), but there’s a limitation on how far into development we can add stuff. So, I created a lot more ship variants and worked on raising issues to address problems in the game.
There’s a degree of conflict in the tuning schedule. While it’s possible to make tuning changes later than other changes, it’s also important to get tuning changes in the hands of the players. It takes time and experience to test tuning changes, which means it’s best for that to take place in beta. I periodically take a break from other work items in order to push important tuning changes to beta.
Tuning Philosophy
While I work on tuning in a variety of areas, ship tuning is my primary responsibility. Ship tuning covers ships, outfitting and ship combat in general. Skills have a major impact in ship combat, but I don’t work on those directly. Isildur does the skill tuning, and I’ll often give recommendations for things that should change.
There are two competing goals that drive ship tuning: 1) All ships should be useful, and small ships should be able to contribute to fights, 2) Players should always be looking to improve or replace ships and feel a strong sense of progression when they upgrade. In making ships, I have to be careful that I don’t make the small ships worthless. However, that does not mean that all the ships are supposed to be equal. Big ships are supposed to be better. With how our mechanics work, I can’t simply make it so that small ships are too hard to hit for big ships to deal with them effectively. Actually, I wouldn’t do that even if I could – I want it to be a challenge to take on a frigate in a cutter. I find battles that involve a mixture of ships to be the most fun, although most societies will field as many big ships as they can acquire.

Validation
Tuning is one of the hardest parts of a game to test, especially when you get into the fine details. It’s hard to test something for proper tuning in-house because it’s very time-intensive and takes extensive knowledge of the systems involved. It tends to require lots of general play instead of focused testing. So, it’s important to validate tuning changes. The validation process is not just used after the fact: the areas below help instigate change in addition to determining whether a change works as intended. Here are the five main areas for validation:
My own experiences playing the game
I spend quite a bit of time playing our game, because it’s essential to have a good handle on things in order to tune properly. I compare my experiences to my expectations, and I also bounce things off the other designers.
Beta tester feedback
This is a tough one. Player feedback is essential, but it’s also jumbled and difficult to digest. It’s rare to see a general consensus on issues, and players who are satisfied don’t tend to provide any feedback. Oftentimes, the reports I get from players vary wildly from my own experiences. I always set up tests to see if I can witness the symptoms they describe, and those tests often come up empty.
Usually, there are three causes: 1) situational bugs that don’t show up in my controlled variable tests, 2) incomplete understanding of the systems on the tester’s part, 3) system changes require gameplay adaptation. It’s easiest to describe the overall problem with a simple analogy: players are the patients and developers are the doctors. The players are the best people at discovering problems and identifying the symptoms, but it’s all too common for them to misdiagnose themselves.
Just because there’s a seemingly obvious connection between the symptoms and an apparent source of the problem doesn’t mean that’s the root cause. In many cases, they aren’t even remotely related. For the developers, it’s important to focus on the symptoms and test those against a wide range of variables (potential causes). That helps us track down and eliminate the problem. For example, it’s common to see complaints “X is happening because of Y, and it’s terrible!” Since I know more about the backend systems than a player possibly can, I read those statements and know that Y is impossible unless it’s caused by a strange bug. People don’t often respond well when they’re told that their claims are impossible, especially when the person telling them is perceived as being responsible for Y.
This doesn’t invalidate the tester feedback; the entire process would be impossible without testers. Rather, it makes it more important to temper the feedback and evaluate it from a dispassionate analytical position. If a patient goes to a doctor complaining about symptoms and insisting it’s a horrible disease, the doctor is going to want to backtrack and trace it down. The doctor will attempt to rule out the disease. It’d be wrong and dangerous to treat the patient based on the patient’s diagnosis, even if it turns out that the patient is right.
The biggest issue we’ve had with ship combat lately turned out to be a situational bug that wasn’t at all related to any of the work I had done. It’s pretty common to discover that the symptoms are caused by a bug. This makes things difficult, because it is easy for the testers to see the symptoms and it’s hard for us to see them. This is especially true when I’m playing the game and not running into the same problems. That means we have to blindly explore potential causes until we hit the right one, trusting that there’s truth in the feedback. Players often exaggerate, overreact or simply misunderstand what happened to cause the problem (which is another problem all on its own).
Mathematical proof
I try to do everything mathematically. Are warships underpowered? If the answer is yes, then there’s something wrong with my formulas to calculate the value of warships. Maintaining formulas is one of the hardest parts of ship tuning because things are always changing. For months, the majority of my time went into overhauling all the ships. A lot of that time went into creating the system to rate ships. Then I went and replaced the speed system with one that relied on a more complex relationship between speed, acceleration and turning. That completely invalidated a large portion of the level calcs. I reworked the calcs to get them to account for the changes in the values of the speed-related stats. Then I got code support for new stats. Then I discovered I was using a tool to calculate DPS that was way off.
So I made my own DPS calcs and referenced those in the formulas. Then I changed the scale for some other stats. Each step means the formulas need to be revamped, but with all of these changes there isn’t always time to rework the formulas. Changing a single stat’s formula is simple. Setting new values on ships and changing the scale of a stat aren’t difficult tasks. But the giant level calc formula? Incorporating those changes into a formula that spits out the levels for over 100 ships and getting it to properly weigh all of them is very, very difficult. Since the total output values must remain the same, a single change cascades through the entire system and requires numerous adjustments.
It creates a lot of work, but I trust math. If there’s something wrong in the math, the flaw is almost certainly going to float to the surface. There’s a definite process to tracking down math problems.
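To make the cascade concrete, here is a minimal sketch of the idea, not the actual level calc. It assumes a ship's level is a plain weighted sum of its stats; the stat names, weights, ship values and the helpers `ship_level` and `rescale_stat` are all hypothetical. It shows why changing the scale of one stat forces a matching change in the formula if the output levels are to stay the same.

```python
# Hypothetical stats and weights; none of these numbers come from the game.
WEIGHTS = {"dps": 0.5, "armor": 0.3, "speed": 0.2}

SHIPS = {
    "cutter": {"dps": 10.0, "armor": 20.0, "speed": 12.0},
    "frigate": {"dps": 40.0, "armor": 60.0, "speed": 9.0},
}


def ship_level(stats, weights):
    """Weighted sum of stats: an assumed stand-in for a real level formula."""
    return sum(weights[name] * value for name, value in stats.items())


def rescale_stat(ships, weights, stat, factor):
    """Multiply one stat by `factor` on every ship and divide its weight by
    the same factor, so every ship's computed level stays unchanged."""
    new_ships = {
        ship: {**stats, stat: stats[stat] * factor}
        for ship, stats in ships.items()
    }
    new_weights = {**weights, stat: weights[stat] / factor}
    return new_ships, new_weights
```

With a single linear weight the fix is one division. A real formula that weighs many interacting stats across 100+ ships would need far more adjustment than this, which is the difficulty described above. Note that reusing the old `WEIGHTS` with the rescaled ships would inflate every level, which is the kind of flaw that floats to the surface.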
Metrics
Metrics data on what people are doing in the beta aren’t part of my normal tuning process because they aren’t easily visible in fine tuning. However, metrics are essential for large-scale tuning tasks. Depending on the metric, it can either tell you the symptom or the cause. For example, a metric that shows what ships people are using can tell me a symptom: players are not using a specific ship. Let’s assume you know that some players have way too much money and other players don’t have enough. In this case, a metric might be able to identify the cause: certain activities produce a disproportionate amount of income, thus the existence of the symptom is based on the player’s participation level in those activities.
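As a rough illustration of a symptom-style metric, the sketch below flags ships that almost nobody uses. The data shape (one ship name per recorded battle), the function name `underused_ships` and the 5% threshold are all assumptions made for the example; they do not describe our actual metrics tools.

```python
from collections import Counter


def underused_ships(battle_log, all_ships, min_share=0.05):
    """Return ships whose share of recorded battles is below `min_share`.

    `battle_log` is an assumed format: one ship name per battle entry.
    """
    total = len(battle_log)
    if total == 0:
        # No data at all: every ship counts as unused.
        return sorted(all_ships)
    counts = Counter(battle_log)
    return sorted(
        ship for ship in all_ships if counts[ship] / total < min_share
    )
```

A result like this only reports the symptom (players are not using a ship). Finding out why still takes the other validation areas.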
Gut feeling
Sometimes, I just feel like something is wrong and should be changed. Maybe I can’t show it mathematically and no one is complaining about it, but it’s still not right. When this happens, the first step is to do some research and development into the existing system and potential alternatives. This happened with our mechanic for making AIs strike colors (surrender). It just felt wrong even though it worked, so I started poking it… and it didn’t take long to discover how to break the mechanic and force AIs to surrender quickly.
While the five areas I’ve described all contribute to tuning, it’s incredibly rare to get all five of these to agree. Usually, there just isn’t information available from all five of these sources.
Fun vs Balance
Dealing with fun versus balance is a common problem in beta. During beta, we can freely experiment with more drastic changes than we can after launch, assuming those changes are not too time-intensive. We can completely change how things work in different areas. This creates some problems. First off, testers are human, and humans don’t react well to change. That means that the feedback we get for the first week after a big change usually isn’t as valuable as the feedback in the second week. However, it’s important to evaluate and test the feedback to make sure something isn’t completely broken.
New testers are a great asset in this context. Combining the reactions of new testers with the ones from existing testers provides a lot of insight into a situation. Recently, we changed the requirements for how players initiate boarding to make it much harder than it was before. In earlier builds, boarding was the golden path – it was way faster than defeating ships. When the change hit beta, there was a generally mixed reaction. Quite a few players said it was broken and nearly impossible. However, feedback from new testers was almost always the same: boarding is too easy. There are several implications here:
1) The older, broken grappling mechanics taught players the wrong way to play the game. Applying those lessons to the new environment makes it appear non-functional.
2) There are certain cases where it is too hard to board because low-level NPCs were using too many skills. This is not a global issue.
3) Boarding is still too easy when you learn the system, and there are several ways to force a boarding action.
4) Overall, the system works much better than it did before.
This illustrates how we sometimes make balancing changes that aren’t fun in the short run but are better for game balance and fun post-launch. Also, the disparity in experiences is crucial: the changes were affected by a different system (NPC skills), which means that there was no universally applicable answer.
Several members of the FLS staff attended the Austin Game Developers Conference in early September. Lead Developer Joe gave a presentation on our metric tools. At one point in the presentation, he talked about how designers don’t always use the metrics data for good – and he showed a picture of kids burning ants with a magnifying glass. I was in the audience and realized, “Hey, that’s me!” While Joe presented it as a joke (I hope!), sometimes players feel it’s true, especially in beta. Game development is about providing entertainment and that’s always the end goal with tuning. That’s why nerfs exist, and the more of them that can take place in beta, the better. If I get ships well-balanced enough in beta, then hopefully I won’t need to nerf many players’ favorite ships after launch.
We’re willing to let our testers suffer more than we would if we were running a live server with paying customers. Beta testers often have to deal with shell shock on top of long-standing issues. In beta, changes tend to happen in a certain sequence, and there are gaps when things don’t work properly until future changes fix them. Whenever possible, we’ll take steps to fix them sooner rather than later, but sometimes it is necessary to wait. For example, there were several problems with engaging in PvP and keeping people in combat (when you exit combat, you can escape the battle). The in-combat timer was too short, which made it easy to escape. On the other hand, there were skills that could permanently keep you in combat at long range. We had to wait for a series of changes to happen before we could fix these related problems, so it stayed that way for quite a while.
Ultimately, the goal is to make everything fun. Balanced games are more fun, but that’s not always true of the process. Post-launch, there will be a lot more fine-tuning than sweeping changes. The beta testers who stick through it all are ultimately contributing to the success of the game, as MMO beta programs have a notoriously high turnover rate and long-term testers are crucial for testing balance. When it comes to ship combat, the beta testers are the best resource I have because they spend so much time fighting.
Stay tuned for part two.

