f13.net  |  f13.net General Forums  |  The Gaming Graveyard  |  MMOG Discussion  |  Eve Online  |  Topic: CCP Fixing lag - system player limits implemented!
Author Topic: CCP Fixing lag - system player limits implemented!  (Read 10333 times)
Der Helm
Terracotta Army
Posts: 4025


Reply #35 on: September 30, 2008, 01:38:01 PM

We've got a pile of bugs: people moving while cloaked in battleships without cloaks fitted.  People in fleet appearing as members of NPC corps.  Drones travelling into pos shields.  And so on...
Well, good time to train recon ships in the safety of a station then.

"I've been done enough around here..."- Signe
lac
Terracotta Army
Posts: 1657


Reply #36 on: October 03, 2008, 07:57:55 AM

Mo' speed.

Quote
We recently deployed a new technology to the EVE universe, StacklessIO, which is a new, robust, network technology for both the EVE server and clients. The server version was released 16 September and the client version was released 30 September with the Empyrean Age 1.1.1 patch. We have received great feedback and we hope you are enjoying it. In my dev blog on StacklessIO I mentioned that there would be a follow-up dev blog on related topics.

And here it is: 2^EVE = EVE64.

StacklessIO, after years of development, has been a big success. We measured the improved performance, and you've told us on the forums and in the local chat in Jita that we have made a significant advancement in our goal of eliminating all lag from EVE Online.

Normally Jita reaches a maximum of about 800-900 pilots on any given Sunday. On the Friday following the deployment of StacklessIO, 19 September, there were close to 1,000 concurrent pilots in Jita and on the Saturday, 20 September, the maximum number reached 1,400. This is more than have ever been in Jita at the same time. Under our old network technology Jita could become rather unresponsive at 800-900 pilots but on the Sunday, 21 September, it was quite playable and very responsive with 800 pilots, thanks to StacklessIO.

Alas, there were teething problems. At 1,400 pilots the node hosting Jita ran out of memory and crashed. As crazy as it may sound, this was very exciting, since we had never before been in a position to have that problem: Jita would lag out before reaching that point under our old network technology. We immediately turned our attention to solving the challenge of giving the EVE server more memory to access.

CCP porkbelly wrote a dev blog three years ago entitled "64 Bits" where he described our first attempts at compiling the EVE server as a 64-bit program and the main reason for doing so: Access to more memory. At that time we were not able to complete the 64-bit migration since the old network technology did not work correctly as a 64-bit program. Having replaced the old network technology with StacklessIO we were in the position to continue that work.

And we started it, completed it and deployed EVE64 last week! Yes, we pulled it off in a single week! That might almost sound recklessly fast to some but this was achieved with a strike team that stepped up to the challenge. There is a lot of enthusiasm within CCP today to tackle the lag monster now that we have this new platform to build on.

The EVE server runs on a cluster of blades and is divided into proxy nodes and server nodes. The EVE clients connect to the proxy nodes, which act as dispatchers and are also an outer layer of defense for the server nodes that run the solar systems simulation.

The proxies are now all running EVE64. We are planning to reduce the number of proxy nodes, which in turn will lead to overall increased performance of the EVE server, as the total number of proxy servers in our system affects the scalability of our application layer. Now that the proxy nodes can address more memory they can service more client connections, since their performance is mostly a function of IO capacity (StacklessIO) and memory. The proxies are essentially proprietary software routers that just became vastly more powerful under this new paradigm.
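The proxy/server split the blog describes - proxies accept client connections and route traffic to the server node hosting the relevant solar system - can be sketched roughly as follows. This is an illustrative sketch only; the mapping, node names and function are hypothetical, not CCP's actual code:

```python
# Hypothetical illustration of a proxy layer's dispatch step: each
# solar system is assigned to one server node, and the proxy routes
# a client's request to whichever node hosts the target system.
SYSTEM_TO_NODE = {
    "Jita": "node-07",   # high-use system on a dedicated node
    "Motsu": "node-07",
    "Amarr": "node-12",
}

def dispatch(solar_system: str) -> str:
    """Return the server node responsible for a solar system."""
    return SYSTEM_TO_NODE.get(solar_system, "node-default")

print(dispatch("Jita"))
```

Because the proxies only look up and forward, their load scales mostly with connection count and I/O rather than simulation work, which is why the blog treats them as "software routers".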

The server nodes will run a mix of 32- and 64-bit nodes, since most nodes in the cluster don't have memory requirements that call for EVE64. Replacing 32-bit code with 64-bit code immediately requires more memory since, e.g., all memory pointers double in size. The need has to be clear, as there is no gain in all cases from running EVE64, but where there is a need we are now able to respond to it. Our network protocols that run on top of StacklessIO ensure that this mixed-mode cluster configuration of EVE32 and EVE64 is completely transparent to all code within the system.
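The pointer-size doubling mentioned above is easy to observe from Python itself via the standard struct module. A quick sketch (illustrative only, not CCP's code) of how much a pointer-heavy structure grows between a 32-bit and a 64-bit build:

```python
import struct

# struct.calcsize("P") reports the size in bytes of a native
# pointer (void *) for the running interpreter's own build:
# 4 on a 32-bit build, 8 on a 64-bit build.
pointer_size = struct.calcsize("P")

# A structure holding a million pointers roughly doubles its
# footprint when moving from 32-bit to 64-bit.
n_pointers = 1_000_000
footprint_mb = n_pointers * pointer_size / 1024 / 1024
print(f"{pointer_size}-byte pointers: {footprint_mb:.1f} MB for {n_pointers:,} pointers")
```

This is why the blog stresses that 64-bit is only worth it where the extra address space is actually needed.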

The normal setup in the cluster for the server nodes is that each blade has two 64-bit processors, 4 GB of memory and runs Windows Server 2003 x64. Each blade runs two nodes and each node then hosts a number of solar systems. There are also dedicated nodes for the market, dedicated nodes for corporation services, a dedicated head node for the cluster, etc.

Finally there is a pool of dedicated dual-CPU, dual-core machines that only run a single EVE64 node per machine. Jita and four other high-use solar systems are assigned to that pool. That pool is now running all native 64-bit code and the blades have been upgraded to 16 GB of memory. These blades also have more powerful CPUs, which has helped as well. We are currently working with our vendors on testing out even more powerful hardware options now that we can utilise the hardware much better.

This Monday, 29 September, we saw a fleet battle with over 1100 pilots reported in local. Field reports indicate that the fight was quite responsive for the first 10 minutes, but then the node "missed its heartbeat", as we call it, and was removed from the cluster by our cluster integrity watchdog routines. This again is an exciting problem, as we can address it as well in our StacklessIO world, and that will be the subject of the next blog.
eldaec
Terracotta Army
Posts: 11843


Reply #37 on: October 06, 2008, 11:56:20 PM

Magic Patch!  this guy looks legit

"People will not assume that what they read on the internet is trustworthy or that it carries any particular ­assurance or accuracy" - Lord Leveson
"Hyperbole is a cancer" - Lakov Sanite
Endie
Terracotta Army
Posts: 6436


WWW
Reply #38 on: October 07, 2008, 03:05:53 AM

Magic Patch!  this guy looks legit

Well, some of the stuff that has been happening in Vale this week (dying but keeping your ship, getting insurance and losing a few modules; dead ships staying on grid but becoming untargetable beside their pods; drones bouncing in and out of hostile POS shields; 700-person lag in 350-person systems and many, many more) are probably best chalked up to supernatural forces, it is true.

My blog: http://endie.net

Twitter - Endieposts

"What else would one expect of Scottish sociopaths sipping their single malt Glenlivit [sic]?" Jack Thompson
lac
Terracotta Army
Posts: 1657


Reply #39 on: October 08, 2008, 01:50:33 AM

More 'under the hood' talk.
Quote
Gentlemen!

I thought I would add to the recent dev blogs we have had over the last couple of weeks and talk about what has been going on with the Tranquility cluster itself - in relation to the StacklessIO and 64-bit EVE enhancements and where we are heading into the future.

1 x EVE Server Cluster

The EVE Cluster is broken into 3 distinct layers, and a bit of the terminology that is thrown around from time to time (including later in this blog) can be explained quite simply here.

* Proxy Blades - These are the public-facing segment of the EVE Cluster - they are responsible for taking player connections and establishing player communication with the rest of the cluster.
* SOL Blades - These are the workhorses of Tranquility and are the primary focus of our ongoing work. The cluster is divided across 90-100 SOL blades which run 2 nodes each.
  o Node - a single EVE server process. This is the lowest level of granularity within the cluster.
  o Dedicated SOL blade - These are SOL blades that we dedicate to one system only. Systems such as Jita, Motsu and Saila reside on these. They run two nodes like any other SOL blade; however, the second node is idle and does not load any solar systems.
* Database Cluster - This is the persistence layer of EVE Online. The running nodes interact heavily with the database, and of course pretty much everything to do with the game lives here. Thanks to our RamSans, our database is able to keep up with the enormous I/O load that Tranquility generates.
  o At peak hours, our database processes over 2,000 transactions per second, which generates around 38,000 IOPS (input/output operations per second).
  o To keep up with this load, we currently have two RamSans.
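The ratio in those database numbers can be checked with back-of-the-envelope arithmetic: 2,000 transactions per second generating roughly 38,000 IOPS works out to about 19 I/O operations per transaction (my own calculation from the figures above, not an official CCP metric):

```python
# Figures quoted in the dev blog.
transactions_per_second = 2_000
iops = 38_000

# Average I/O operations the storage layer performs per transaction.
io_per_transaction = iops / transactions_per_second
print(io_per_transaction)  # 19.0
```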

1 x SOL Blade

The EVE Server application itself (also known as a node) is primarily a CPU intensive process. Due to the nature of the Stackless Python programming methodology chosen for EVE Online, the python component of each node is a single thread, which means it can only ever utilize 1 CPU core at a time.
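Stackless Python's tasklets cooperate inside a single OS thread, which is why one node can never use more than one core. Plain Python generators can sketch the same idea (a hypothetical illustration of cooperative scheduling, not EVE's code): many logical tasks interleave, but only one thread is ever running.

```python
from collections import deque

def tasklet(name, steps):
    """A cooperative task: yields control back to the scheduler each step."""
    for i in range(steps):
        yield f"{name} step {i}"

def run(tasklets):
    """Round-robin scheduler: one thread, one core, many tasklets."""
    queue = deque(tasklets)
    trace = []
    while queue:
        t = queue.popleft()
        try:
            trace.append(next(t))
            queue.append(t)          # re-schedule at the back of the queue
        except StopIteration:
            pass                     # tasklet finished, drop it
    return trace

trace = run([tasklet("A", 2), tasklet("B", 2)])
print(trace)  # ['A step 0', 'B step 0', 'A step 1', 'B step 1']
```

Since the scheduler itself is ordinary single-threaded code, faster per-core clock speed (the Woodcrest and Wolfdale upgrades discussed below) helps far more than adding cores.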

Our SOLs are IBM blades, and until quite recently almost all were running AMD Opteron 2.8 GHz dual-core processors with 4 GB of DDR1 RAM. Over the last 6 months or so, we have been investigating options for replacing these Opteron processors with something more powerful. We selected some dual-socket, dual-core Intel Xeon 3.0 GHz Woodcrest blades for testing purposes, and have been using them as an integral part of our StacklessIO testing (as blogged about here by CCP Explorer). Now that StacklessIO has been released we are able to use these blades to their fullest, and as a first step looked at ways we could use these test blades on Tranquility.

1 x Rapid Deployment

When we hit 1400 players in Jita and then had the unfortunate incident where the SOL blade powering Jita ran out of memory, we looked to our Intel test blades for help. We shuffled some RAM around and were able to get 5 new Intel SOL blades with 16 GB of DDR2 RAM each ready for use. We did a staggered test deployment of these to Tranquility last week. On Friday, confident of their stability and anticipating performance increases, we set them up as dedicated SOL blades. That evening, Jita, Saila and Motsu were performing better than ever, and there was much rejoicing. Over the last weekend, the GMs did not receive a single "Stuck Character" petition from Jita!

3 x Epic Fleet Fights

That Saturday, out of the blue we saw one of the nodes supporting 0.0 go to Critical status and shortly afterwards it shut down. This happened a few more times in quick succession, and it became apparent that there was a new issue where extremely loaded nodes were simply not able to keep up with their heartbeat. This issue in itself is fixable and we are working hard to get it resolved.
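The "missed heartbeat" failure mode described above can be sketched as a simple watchdog that drops a node from the cluster once its last heartbeat is older than some timeout. This is a hypothetical illustration; the names and the 10-second timeout are my assumptions, not CCP's actual values:

```python
HEARTBEAT_TIMEOUT = 10.0  # seconds; assumed value, for illustration only

def alive_nodes(last_heartbeat, now):
    """Return the nodes whose last heartbeat falls within the timeout window.

    last_heartbeat maps node name -> timestamp of its last heartbeat.
    Nodes outside the window would be removed from the cluster.
    """
    return {node for node, ts in last_heartbeat.items()
            if now - ts <= HEARTBEAT_TIMEOUT}

# sol-42 is overloaded by a fleet fight and has fallen behind.
heartbeats = {"sol-41": 100.0, "sol-42": 88.0}
print(alive_nodes(heartbeats, now=105.0))  # {'sol-41'}
```

The subtlety the blog hints at is that a heavily loaded but otherwise healthy node looks identical to a dead one under this scheme, which is why fixing the heartbeat issue is separate from fixing raw performance.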

At this point, it was apparent that with 700+ players trying to "pew pew", the AMD node they were on was not going to do anything other than keep crashing. We re-mapped the system in question to one of our dedicated Intel blades, just to see what it was capable of. Jita had performed so well the night before, that we thought these nodes would handle a fleet fight quite nicely. The system held, and the rest, as they say, is history.

On Sunday night, the M-OEE8 System was the hotspot and it had been placed on an Intel 64 bit dedicated SOL blade in anticipation. It held fine with a peak of around 450 players.

On Monday night, over 1000 players tried to start a fight in this system. As with Sunday, we had anticipated there would be fighting there, so it had been placed on a dedicated node. Unfortunately, whatever had caused node crashes at 700 players on our AMD blades caused our Intel blade to miss its heartbeat after going a bit over 1200 players. Interestingly, despite missing its heartbeat, many players reported that the performance of this blade with 1000 players was very good in the 10-15 minutes prior to its shutdown.

I would like to stress that we at CCP are very excited by this, and we are very hopeful that once the issue causing these node deaths is solved that we will start to see this impressive performance much more often. A lot of people have put in a lot of hard work towards new technologies and it is starting to pay off for you, the players.

So where do we go from here?

We are by no means finished with these upgrades, and there is still a lot of work to be done. Over the last two-week period we have proved the readiness of some of our new technology, and we now need to work on the best way to ensure everyone can benefit...

Newer, Faster Blades

Our 3.0 GHz Intel Woodcrest blades are nice, but that processor architecture has been replaced by Wolfdale, which is even more powerful. So we have put a fast-track order in for some Intel Xeon 3.3 GHz Wolfdale blades. We expect to have these in the cluster very soon and we anticipate these will give us an even bigger performance boost than we have seen so far, paving the way for a new Tranquility Cluster. It is worth noting that the hardware we are beginning to purchase now is the hardware that will see us all the way into the HPC era. There will be a detailed presentation about the status of this project at Fanfest in November.

Help us to help you

It's nice that we have this new hardware, but there is going to be an interim period while we work to upgrade the existing hardware to this HPC-ready specification. During this period, we will be proactively working to place fleet-fight systems onto a dedicated node at downtime. We often can't predict where our players are planning to unleash hell, so we need to know which systems are going to have fleet fights! We are working on a way to allow players to directly contact the Virtual World Operations team with this information; in the meantime, corporation directors are invited to petition any planned operations (use the Stuck Character category at least 24 hours in advance, and please include estimated attack/defense numbers), and we will take note of this when we assign systems to dedicated SOL blades during downtime.

With that, I will leave you with my final - subtle - personal thoughts on the matter.

This is EVE Online!

Impossible is nothing!

HPC?
"High-performance computing" - a supercomputer that CCP is producing in partnership with just about everyone. Most supercomputers are designed around running multiple threads at a stately pace across distributed clusters. TQ needs to run single-threaded apps as fast as fast can be, so they're researching areas that have never been touched before.
« Last Edit: October 08, 2008, 01:56:36 AM by lac »
Powered by SMF 1.1.10 | SMF © 2006-2009, Simple Machines LLC