Dev Tools for a Server Cluster

Posted by brendanw  |  09/24/2007  |  Devlog  |  Discuss

Monitoring tools for the server cluster

One of the initiatives being pushed by the engineering department (DevCo) these days is improving how we monitor the beta and deal with critical beta issues such as crashes, character data loss, and exploits. Keeping on top of these issues is immensely important, and in the past we (the devs) have been a little slow to respond to them. A major part of our slow reaction time has been poor visibility into any given problem. I wanted to share with you what sort of tools we’ve been working on to help expose beta issues more quickly, and also give you a picture of some of the inner gristle of the beta server cluster.

Anatomy of a Server Cluster

Before I start talking about the tools DevCo has been working on, I should explain how our beta cluster is set up. Currently our beta cluster consists of 4 super beefy machines sitting at our co-location facility in downtown Seattle. Each machine runs a number of server apps that control different aspects of our MMO. This diagram shows the current configuration of the four machines. All of the dynamic game data needed by the various cluster apps (character info, landmark state, etc.) is stored on our database server, 10k-sql01. Each cluster machine runs one instance of a special app called “BigBrother” whose only job is to start up other server apps upon request. One of the BigBrothers elects itself the master BigBrother and tells the others what to do.
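As a rough illustration of the "one BigBrother becomes the master" idea, here is a minimal Python sketch. The post does not describe how the real election works, so the rule used here (the live BigBrother with the lowest machine name wins) is purely an assumption, and the machine names and the `BigBrother` class are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BigBrother:
    """One launcher process per cluster machine (names are hypothetical)."""
    machine: str
    alive: bool = True


def elect_master(brothers):
    """Pick a master among the live BigBrothers.

    Assumption: the lowest machine name wins. Any deterministic rule
    that every BigBrother can evaluate the same way would do here.
    """
    live = [b for b in brothers if b.alive]
    if not live:
        raise RuntimeError("no live BigBrother to elect")
    return min(live, key=lambda b: b.machine)


cluster = [
    BigBrother("machine-1"),
    BigBrother("machine-2"),
    BigBrother("machine-3"),
    BigBrother("machine-4"),
]
master = elect_master(cluster)
```

If the current master's machine goes away, running `elect_master` again over the remaining live entries yields a new master without any extra coordination, which is the main appeal of a deterministic rule like this one.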

There are two main classes of server apps. The first class handles cluster-wide services like missions and chat. These typically run on the first machine, but BigBrother can choose to run a server app on any machine depending on server load. These special services are:

  • MissionServer – Tracks all the missions each player has
  • LoginServer – Handles character creation and player login
  • ChatServer – Handles Chat, Mail, and Group/Society management
  • ConnectionServer – Gatekeeper from the Internet to the server cluster, through which your game client connects
  • DispatchServer – Handles all delayed message sending (i.e. messages to offline characters) like auction listings being sold and in-game e-mail messages
  • CacheServer – Gatekeeper to the database. Makes game data lookups from the database faster.
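The load-based placement mentioned above (BigBrother choosing a machine depending on server load) could look something like the following Python sketch. It assumes a simple "least-loaded machine wins" policy and an integer load unit, neither of which comes from the post, and it ignores the stated preference for running these services on the first machine. The function names and machine names are hypothetical.

```python
def pick_machine(loads):
    """Return the name of the machine with the lowest load.

    loads maps machine name -> load units (a hypothetical integer metric).
    Ties go to whichever machine appears first in the mapping.
    """
    if not loads:
        raise ValueError("no machines available")
    return min(loads, key=loads.get)


def start_app(app_name, loads, placements, cost=1):
    """Place app_name on the least-loaded machine and record the placement."""
    machine = pick_machine(loads)
    placements.setdefault(machine, []).append(app_name)
    loads[machine] += cost  # assume every app adds the same load
    return machine


loads = {"machine-1": 3, "machine-2": 1, "machine-3": 2, "machine-4": 2}
placements = {}
for app in ["MissionServer", "ChatServer", "LoginServer"]:
    start_app(app, loads, placements)
```

A real scheduler would presumably weigh CPU, memory, and player counts differently per app; the uniform `cost` is only there to keep the sketch short.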

The other class of server apps is the ZoneServer. There are hundreds of these running on a given server cluster. Each zone server controls an instance that players and AIs currently reside in. Towns, mission encounters, ad-hoc battles, and the Open Sea are all zone servers. Whenever you go from a town to the Open Sea, your character is actually being saved to the database, removed from the town zone server, and then reloaded in the Open Sea nav zone. Those of you who were unfortunate enough to get the character data loss bug we had a while ago (i.e. went to one zone, gained XP, left the zone, and the new XP was missing) were hitting a bug in this zoning process, which unfortunately is VERY complicated. If you want to know more about our server architecture, check out the chapter Joe wrote, MMP Server Cluster Architecture, in Massively Multiplayer Game Development 2.
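The zoning sequence described above (save to the database, remove from the old zone, reload in the new zone) can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Database`, `ZoneServer`, and `transfer` names are hypothetical, the database is an in-memory dict, and none of the real process's complexity (networking, failure handling, concurrency) is represented.

```python
class Database:
    """Stand-in for the real database server; stores copies of character data."""

    def __init__(self):
        self._rows = {}

    def save(self, name, data):
        self._rows[name] = dict(data)

    def load(self, name):
        return dict(self._rows[name])


class ZoneServer:
    """Holds the characters currently residing in one zone instance."""

    def __init__(self, zone_name):
        self.zone_name = zone_name
        self.characters = {}


def transfer(name, source, dest, db):
    """Move a character between zones in the order the post describes."""
    db.save(name, source.characters[name])  # 1. persist current state
    del source.characters[name]             # 2. remove from the old zone
    dest.characters[name] = db.load(name)   # 3. reload in the new zone


db = Database()
town = ZoneServer("town")
open_sea = ZoneServer("open_sea")

town.characters["Anne"] = {"xp": 100}
town.characters["Anne"]["xp"] += 50  # XP gained while in the town zone
transfer("Anne", town, open_sea, db)
```

Because the new zone only ever sees what the database returns in step 3, anything that fails to reach the database in step 1 is silently lost. That is consistent with the "new XP was missing" symptom, though the post does not say where in the process the actual bug was.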

A View From Afar

Remember how I said that the server cluster isn’t located here at FlyingLab HQ but rather at a co-location facility downtown? The next question you might have is, “How do you keep an eye on what the cluster is doing?” Well, I’m glad you asked! For some time we’ve been using Remote Desktop Connection (or RDC for short) to do everything with the remote cluster machines. This was a serious drag because RDC over our DSL line is a bit on the slow side. It quickly became apparent that we needed some web-based tools to manage the cluster from afar, and thus Web OpsViewer was born.

NOTE: All links you see here are pictures, not links to the actual tools.

This first picture is the cluster overview page, where we can take the cluster up or down, see how many server apps are running, or access more specific cluster info. Clicking on the Process List link gives a list of all the core server apps running, which machine they are running on, when they started, what their load is, etc. Clicking on the Zone List link gives a list of all of the zones currently running. The image here is just a snippet of the total listings, as there are usually hundreds of these entries. We also keep track of all the zones that have shut down, which is particularly useful if a server goes down abnormally and we need to find out why (more on that in a bit).
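To make the value of the shutdown history concrete, here is a small Python sketch of filtering zone records for abnormal exits. The record layout is an assumption: the post does not say what Web OpsViewer actually stores, so the `ZoneRecord` fields and the `"normal"` / `"crash"` reason strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ZoneRecord:
    """One row of a hypothetical zone list."""
    zone: str
    machine: str
    exit_reason: Optional[str] = None  # None means the zone is still running


def abnormal_shutdowns(records):
    """Return records for zones that stopped for any reason other than 'normal'."""
    return [r for r in records if r.exit_reason not in (None, "normal")]


records = [
    ZoneRecord("town-01", "machine-2"),
    ZoneRecord("open-sea-07", "machine-3", exit_reason="normal"),
    ZoneRecord("mission-112", "machine-4", exit_reason="crash"),
]
suspects = abnormal_shutdowns(records)
```

With hundreds of zones in the list, a filter like this is what turns the history from a log into something a developer can act on.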

Memoirs of a ZoneServer

You might have noticed the

