Some of you may know that we have had issues over the past couple of years with the TWC servers where the server just gets sort of maxed out and stops functioning. Suddenly it will take 10-15 minutes to enter the server, any communication to the server might be quite slow, and so on.
The strange thing is, it usually happens when I make some rather small changes that don't seem like they would cause a dramatic effect at all.
I've been gradually getting my head around what must be the root cause of this, and another incident this weekend has completely confirmed it in my mind.
So, briefly, here are the conclusions:
- It's not the number of aircraft per se: neither breathers nor AI nor the combination of the two. It is certainly possible to get overloaded with aircraft, but the symptoms are different and the failure is gradual and graceful rather than sudden. FYI, a powerful server can handle 150-200 aircraft in-game quite well, 200-250 maybe, though not optimally, and 250+ is where things get a lot more dicey. But if you run into the problem I outline below, you can remove 50 aircraft with no noticeable effect, and even 100 with barely any noticeable effect.
- Ground stationaries in general don't seem to have much effect, particularly if they're "just scenery" that doesn't interact with players (for example, by shooting at them). It may take a little longer to load these on entering the server, but the penalty for, say, tens of thousands of statics versus just a couple of hundred is remarkably small. When I've encountered this problem, I've tried removing many thousands of ground stationaries with no noticeable effect at all.
- When you have the problem I'm describing, you can monitor it on the server simply by watching the server's CPU load. The problem occurs when the server CPU is maxed out. With a multi-core CPU you won't see 100% overall, of course, but you'll see one core maxed out by the main server thread, and that is when you get into trouble. The server can do some limited multitasking, but if, say, you have a quad-core CPU, start your server, and find launcher64.exe continually taking 25% or more CPU (before any players have even entered), then you're likely in trouble. At 18% rather than 25-30% you might have enough breathing room, but at 25%+ while "idle" you probably don't. The main point is that you can monitor things rather well just by keeping an eye on that CPU percentage on the server.
[[Note that when I say "25%" what I actually mean is that one core of the CPU is at 100%. This is what matters for the server, as it essentially runs on a single core. It's just that typical CPU monitoring software, like Windows Task Manager or Performance Monitor, defines 100% CPU as full usage of all cores. So if you have 4 cores, full usage of one core shows up as 25%; with 8 cores it's 12.5%; with 12 or 16 cores it's lower still, and so on. That is just a quirk of the reporting software. The issue here is whether that one core is maxed out or not.]]
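To make the percentage arithmetic in that note concrete, here is a minimal Python sketch. It assumes the monitoring tool divides by the number of logical processors the OS reports (with Hyper-Threading, a 4-core CPU typically shows up as 8), and the function names and the 90%-of-one-core warning level are hypothetical, chosen only for illustration.

```python
def single_core_ceiling(logical_cpus: int) -> float:
    """Whole-CPU % at which one core is fully busy (e.g. 25.0 on 4 CPUs)."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return 100.0 / logical_cpus


def core_equivalent_percent(reported_percent: float, logical_cpus: int) -> float:
    """Convert a whole-CPU % (100% = all cores busy) into % of a single core."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return reported_percent * logical_cpus


def main_thread_near_limit(reported_percent: float, logical_cpus: int,
                           warn_at: float = 90.0) -> bool:
    """Hypothetical rule of thumb: warn once one core is at least warn_at % busy."""
    return core_equivalent_percent(reported_percent, logical_cpus) >= warn_at


if __name__ == "__main__":
    for cpus in (4, 8, 16):
        print(f"{cpus} logical CPUs: one full core shows as {single_core_ceiling(cpus):g}%")
    # The quad-core "idle" examples from the post: 18% versus 25%
    print(core_equivalent_percent(18, 4), main_thread_near_limit(18, 4))
    print(core_equivalent_percent(25, 4), main_thread_near_limit(25, 4))
```

On a quad-core, 18% works out to about 72% of one core (some headroom), while 25% is the whole core, which matches the "likely in trouble" point.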
For example, one symptom of this problem is very long load times for players joining the server. When the CPU doesn't look maxed out, players join the server in 2-3 minutes. When the server is maxed out or a bit beyond, suddenly players take 10, 15, or 20 minutes to join, and the "join" bar sits at 99% for a long, long, long time.
Players complain that "you have too many things in your server". But that is not the root cause at all. You can add 20 or 30K ground stationaries and then remove them without making a noticeable difference. You can add 150 aircraft and remove them and again, no noticeable difference.
It's not "things in the server" per se, but one particular type of thing that causes this particular problem.
- Related: I noticed there were clear thresholds where a certain version of the server wouldn't run at all on Computer A, would barely run on Computer B, and would run just fine on Computer C. The difference in every case was simply raw CPU power. Computer C had enough CPU cycles to handle this particular type of load (see below), which seems to be fairly steady-state regardless of how many players are in the server, with just enough cycles left over to run everything else, too. (All the other things that go on in the server, like AI aircraft, player aircraft, etc., clearly take a far lower percentage of the CPU, or perhaps they scale more gracefully as max load is approached.) The CPUs on Computers A and B couldn't even keep up with this load and had nothing left over for the rest, so what ran just fine on Computer C would barely run, or not run at all, on the less capable CPUs.
- The lock-up situation seems to have a rather distinct threshold effect. If you keep adding more aircraft to the server, you'll see a gradual drop-off in responsiveness. But with this problem element (see what it is below), you can have some hundreds to thousands of them in-game and all is well, then add just a few more (10? 20? maybe 100? whatever it is, a small amount relative to what you already have) and suddenly you go from "perfectly functional working server" to "completely disastrous, non-responsive server".
- I have long suspected that AA/artillery is the ultimate cause of this problem. You can test this by gradually adding more & more AA/Artillery to a server and sure enough you find behavior like I outline above - up to a point everything works fine and then you reach a definite threshold and suddenly your server is completely non-functional. You can get to a point where you can't even join the server even though there is nothing going on in it.
- So AA & artillery is culprit #1.
- But what I hadn't thought through is that a large number of other game elements act just like AA/artillery (and must share much of the same underlying code), so they add to the problem exactly as though they were AA/artillery. These include:
- Ships - anything that has a gun or shoots (some kinds of barges & transport ships etc) but even more so, MILITARY ships. Those things are like 10 batteries of artillery. Some of them can fire at a tremendous rate and have a bunch of different guns all on one single platform. So, one military ship counts for maybe a dozen or a few dozen individual AA/artillery - depending on the exact ship and armament.
- Ground vehicles. Again, anything that fires or has a gun. For example, say you have a ground convoy of six or eight vehicles. Each vehicle can pack a lot of firepower (depending on exactly which vehicle you choose), and some fire at a high rate. So again, half a dozen convoy vehicles might be worth a dozen or two artillery or AA pieces.
- Finally, trains seem to punch above their weight in terms of CPU load. I don't know that they are exactly in the same class as the artillery/AA and similar shoot-ey things, but they are more problematic than you would think. At one point we had like 3 trains running on each side (so 6 total) and found that you could either have those 6 trains or like 100 AI aircraft - that was about the tradeoff.
There might be a few more things on the list, but the point is, anything that aims, shoots, and fires is going to use a percentage of the available CPU power. AA/artillery, ships whether stationary or moving (particularly military ships), and convoy type vehicles that shoot and defend themselves - all will add to the problem and at the point you have "too many" of these your server will go from working nicely to broken in about 1 second flat.
You have a clearly limited amount of "artillery power" available on the server, depending on your server's CPU capability, and if you exceed the workable amount you'll see a rather quick transition from "functions well" to "doesn't function at all".
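One way to reason about that limited "artillery power" before editing a mission is to tally shooters in AA/artillery-piece equivalents. The Python sketch below is hypothetical throughout: the category names, the weights (24 per military ship, 3 per convoy vehicle, mid-range picks from the rough estimates in this post, which itself says convoy vehicles may be worth more), and the example budget are illustrative assumptions, not measurements. Trains are left out because the post compares them with AI aircraft rather than with AA pieces.

```python
from typing import Mapping

# Weights in "AA/artillery-piece equivalents". These are rough guesses taken
# from the estimates in this post, NOT measurements; tune them per server.
DEFAULT_WEIGHTS: Mapping[str, float] = {
    "aa_artillery": 1.0,     # baseline: one AA gun or artillery piece
    "military_ship": 24.0,   # assumed mid-range of "a dozen or a few dozen"
    "convoy_vehicle": 3.0,   # assumed: ~6 vehicles worth "a dozen or two" pieces
}


def shooter_load(counts: Mapping[str, int],
                 weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Sum the weighted 'shooter' units in a mission."""
    total = 0.0
    for kind, n in counts.items():
        if kind not in weights:
            raise KeyError(f"no weight defined for {kind!r}")
        total += n * weights[kind]
    return total


def over_budget(counts: Mapping[str, int], budget: float,
                weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> bool:
    """True if the weighted load exceeds a budget measured on your own server."""
    return shooter_load(counts, weights) > budget


if __name__ == "__main__":
    mission = {"aa_artillery": 300, "military_ship": 5, "convoy_vehicle": 28}
    budget = 500.0  # hypothetical; find yours by adding units while watching one core
    print(shooter_load(mission), over_budget(mission, budget))
```

Because the failure is a cliff rather than a slope, the useful number is how close the total sits to your own measured budget before you add "just a few" more ships or convoys.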
At any rate, that's my observation. If you have a server that was running fine, you made a few changes and suddenly it's laggy and takes forever for players to log in, look at these "shooter" type elements first.
Just for example, I added what I thought of as "just a few" ships and suddenly our server went from functional to dysfunctional. (Of course, some of the ships were military, so count them as like 4 dozen AA/artillery pieces.)
Another time I added "a few" convoys, with similar effect: about 4 convoys of 6 or 8 vehicles each. That seems like a small amount to me. (Also, if you added 10 dozen of these items as static "stationaries", you wouldn't see any noticeable change in server functionality. The live moving/shooting version looks similar but has a far, far different CPU requirement.) And once again, it's not just a matter of thinking "aha, another artillery/AA piece, only mobile." On the contrary, these items seem to punch somewhat above their weight in CPU usage compared with AA or artillery pieces. One of these is worth several AA/artillery pieces; whether 2 or 5 or 10 or 20 I couldn't say, but definitely more than just one.
FYI!