f13.net General Forums => Eve Online => Topic started by: bhodi on September 21, 2008, 07:47:14 AM



Title: CCP Fixing lag - system player limits implemented!
Post by: bhodi on September 21, 2008, 07:47:14 AM
Quote
During the downtime today, Sunday 21st September, we deployed a new feature to Tranquility. The Stargate will now check how many players there are in the system and deny access if you attempt to jump into it when too many players are present. In addition to this you will not be able to log in on a character if the system has too many players. We have also increased the number of GMs this evening and if you are unable to log in, please press the ESC button and submit a petition in the Stuck category.

This is a temporary solution to this issue; we are working on a permanent feature which has the highest priority and will be implemented as soon as possible.

Yeah... we'll see.

No word on the actual system limit.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: NiX on September 21, 2008, 09:08:44 AM
If they actually say what the limit is, that would be horribly stupid. All it takes is a bunch of bored people to clog up a vital jump gate.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Der Helm on September 21, 2008, 09:15:08 AM
Jita had 600 people online, right after downtime.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Jayce on September 21, 2008, 10:38:11 AM
Sooner or later they will figure out the number, then warfare will devolve into:

1 - get enough people to lock the system.  If < 5% are enemy, that's ok too.
2 - reinforce their POSes
3 - do not logoff for the stront timing
4 - profit.

I really don't see how this is not a ridiculously stupid idea.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: eldaec on September 21, 2008, 11:03:59 AM
Jita had 600 people online, right after downtime.

My guess is that system limits will vary, and they could presumably exclude docked pilots from the count. Though 'you cannot undock because the system is full' wouldn't be much fun.


In the short term they could also exclude high sec from the limit, since I don't think anyone ever had 5 minute module lag in high sec.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Sparky on September 21, 2008, 11:27:11 AM
Sooner or later they will figure out the number, then warfare will devolve into:

1 - get enough people to lock the system.  If < 5% are enemy, that's ok too.
2 - reinforce their POSes
3 - do not logoff for the stront timing
4 - profit.

I really don't see how this is not a ridiculously stupid idea.

Even if you don't have enough pilots with all their alts, make trial accounts!  This will be horribly, horribly abused.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Vedi on September 21, 2008, 11:28:07 AM
There are actually lots of ways of getting into a system apart from gates or login. Off the top of my head:
  • Jumping in (capital or black-ops) by cyno.
  • Jumpclones.
  • Normal clones (when you die you reappear in a station where your clone is registered).
  • Gating in via a Titan.
  • Jump bridges.

If this is a temporary thing, it wouldn't surprise me if they missed a couple of ways to circumvent the measure. Will be interesting to see what that devolves into, strategy wise.

For instance, say you've got a system with only one stargate in it (a dead end). If you can get a sizable enemy fleet in there that is confident that they occupy the majority of the slots, you can then move a large fleet into the neighbouring system, filling that up and trapping the enemy fleet. You could then jump-clone a few hundred into the system where the enemy fleet is trapped, undock and engage. Or just go ahead and reinforce every POS in every enemy system.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Endie on September 21, 2008, 11:50:58 AM
This will make taking space even harder.  Want to be sure of a system defence?  Get 2/3 or 3/4 of the limit in (even if it varies by cluster people will quickly work out likely values by experimentation) and let your enemy trickle in, hitting each on the head then waiting for the replacement to be allowed in.  And anyone who has flown in F-T or the like will know that you can pop caps all day long like that.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: eldaec on September 21, 2008, 01:42:56 PM
System capping will work just as well for the attacker as the defender.

If for instance you were attacking a Vanguard system, you'd be able to get in hours before, safespot up and lock defenders out whilst they hide in a station 58 jumps away.

What it will do is swap the lag battles for long-term wars of attrition, with alliances logging out huge numbers of ships in a system days earlier, and then continually attempting to log in till the population shifts two to one in their favour.

Basically, much more theory-craft-faggotry driving strategy.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Jayce on September 21, 2008, 03:22:04 PM
If this had been in place when we took FAT and 25S, we'd probably still be there.  1000 strong alliance, logged in, in station.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: JoeTF on September 21, 2008, 03:23:08 PM
The cap is supposed to be around 1k or something, so it shouldn't be an issue.

However, in the course of implementing this CCP screwed up the servers ROYALLY. Killmails are not generating (not new, but yeah), people are getting double payouts from insurance, people who died are ending up in a brand new ship after relogging, people who got jammed, tackled by three guys and attacked by 15 fighters (hi! :awesome_for_real:) are ending up 1 mln km from the gate in a brand new ship after relogging, people are getting a new ship but with only half of the fittings...

I've played this game for quite a few years, but I haven't seen a mess like this even on the test server.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Miasma on September 21, 2008, 05:25:03 PM
Is this because of the EvE-O thread you guys linked about the factional warfare having the same problems as 0.0?


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Pax on September 22, 2008, 01:25:26 AM
The next thing we know is, CCP releases "Premium Tickets" with "Preferred System Access" to enter capped out clusters. At a nominal fee, of course.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Drugstore Space Cowboy on September 22, 2008, 05:21:15 AM
The next thing we know is, CCP releases "Premium Tickets" with "Preferred System Access" to enter capped out clusters. At a nominal fee, of course.

Join the New World Order!


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Pax on September 22, 2008, 05:54:39 AM
The next thing we know is, CCP releases "Premium Tickets" with "Preferred System Access" to enter capped out clusters. At a nominal fee, of course.

Join the New World Order!

It's not a conspiracy, it has already happened in Silkroad Online - they capped servers, figured paying customers could no longer log into the game, so they released Premium Tickets in different flavours (and with different price tags). The terms in ".." aren't just possible catchphrases, but actual item descriptions from SRO.
True story.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: TheDreamr on September 22, 2008, 07:06:51 AM
Honestly, it's just Jita and this time we mean it!

Quote
Server hotfix applied Monday 22nd September regarding system player limit.
reported by: CCP Navigator | 2008.09.22 12:17:07

During downtime today, Monday 22nd September, a hotfix was deployed to Tranquility. The system cap applied to the server yesterday has now been removed from all systems with the exception of Jita. From today, Jita will still maintain a maximum player capacity which has been raised significantly.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Mirochai on September 22, 2008, 11:47:04 AM
Get 1000 farmers to log in to Jita when you don't want people trading. Then do it for any other system when it shifts.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: bosoa on September 22, 2008, 03:53:06 PM
Sooner or later they will figure out the number, then warfare will devolve into:

1 - get enough people to lock the system.  If < 5% are enemy, that's ok too.
2 - reinforce their POSes
3 - do not logoff for the stront timing
4 - profit.

I really don't see how this is not a ridiculously stupid idea.

i would be willing to bet real cash that CCP didn't even think about this. give it a few months (if it can be done) and after all the petitions come in some proggie will get the idea and then the hamsters will get back on their wheel so he can undo the crap he did >.>


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: lac on September 25, 2008, 06:04:17 AM
Devblog - system avoidance and jita cleanup.
Quote
A Tale of Two Cities

Iceland's capital of Reykjavík is an interesting experience for me. It's big, it's busy and, as a result of being a convenient place to get most anything one needs, it can be a bit expensive; much like any other major metropolis that has over a hundred thousand inhabitants. It's a place with a little too much hustle and bustle for me.

Often times though, I do have to travel there. For example, if I find myself needing a new bicycle, I know that the stores in Hafnarfjörður simply don't have the same stock as the ones in Reykjavík. Therefore I hop on the bus and go through the pain of entangling myself in the fast-paced lifestyle and traffic congestion of the capital (we do have some bus priority lanes, which is cool).

I can't really complain since it's my own decision and I'm going back to Hafnarfjörður with a spanking new, albeit overpriced due to run amok inflation, bike. *ring ring*

What really grinds my gears to a halt is that I can't go visit my sister in Mosfellsbær without first going through the traffic-congested Reykjavík. If only there was a way to go around Reykjavík...

Where was I? Oh yes.

We have a similar problem in a little space game called EVE Online. The Jita system is a lot like Reykjavík. The biggest playability issue with Jita (if you don't count the lag and spam) is that people sometimes have no choice but to go there for various reasons:

* When navigating through this region of space the autopilot sometimes wants to send you through Jita.
* There is a constellation you cannot reach without going through Jita.
* There are some agents in Jita that new players sometimes have the misfortune of working for.
* Agents sometimes send you on missions into Jita.

We are (finally, I hear you scream) fixing all of these issues.


Autopilot Avoidance

We have a new feature in place that allows players to set a list of systems, constellations and regions that the autopilot should never send them to. This list is preconfigured to avoid the Jita system.

This works the same as the current "avoid pod kill" feature; the autopilot will act as if the system isn't even there and will rather return no path at all than a path that takes you through the undesirable systems.
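The "act as if the system isn't even there" behaviour described above can be sketched as a shortest-path search that simply skips avoided systems. This is only an illustration, not CCP's autopilot code: the function name `find_route`, the toy gate map and the system labels A to E are all made up, and the sketch assumes routes are ranked by jump count alone.

```python
from collections import deque


def find_route(gates, start, goal, avoid=frozenset()):
    """Fewest-jump route from start to goal, treating avoided systems as absent.

    Returns a list of system names, or None when no route exists without
    passing through an avoided system (hypothetical helper, not CCP's code).
    """
    if goal in avoid:
        return None
    previous = {start: None}      # system -> system we arrived from
    queue = deque([start])
    while queue:
        system = queue.popleft()
        if system == goal:
            route = []
            while system is not None:
                route.append(system)
                system = previous[system]
            return route[::-1]
        for neighbour in gates.get(system, ()):
            if neighbour in avoid or neighbour in previous:
                continue          # avoided or already visited
            previous[neighbour] = system
            queue.append(neighbour)
    return None


# Toy map: A..E are invented labels, not the real layout around Jita.
gates = {
    "A": ["Jita", "B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "Jita"],
    "E": ["Jita"],
    "Jita": ["A", "D", "E"],
}

short_route = find_route(gates, "A", "D")                  # goes via Jita
bypass_route = find_route(gates, "A", "D", avoid={"Jita"})  # takes the long way round
cut_off = find_route(gates, "A", "E", avoid={"Jita"})       # no path at all
```

In this toy map E can only be reached through Jita, so with Jita on the avoidance list the search returns no path rather than a path through it, which is the situation the new bypass jumps are meant to remove.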

What this means is that you can happily navigate all around the Forge region on autopilot without having to worry about getting stuck in the Jita blackhole for the better part of a Sunday evening.


Figure 1: Autopilot avoidance turned off
http://staff.ccpgames.com/codemonkey/avoidance6.png

Figure 2: Autopilot avoidance turned on, with Jita on the avoidance list
http://staff.ccpgames.com/codemonkey/avoidance5.png

Figure 3: A new map mode showing your avoidance systems
http://staff.ccpgames.com/codemonkey/avoidance4.png

This is a completely general system to avoid any number of star systems, constellations or regions. You will therefore be able to use this to avoid that annoying faction that keeps shooting you or that constellation that has those pirates in it.


World Shaping

There is a constellation called Ruomo which is famous for being a place where you cannot go unless you go through Jita. Citizens of the constellation have finally rebelled and gotten themselves a spanking new hyperspace bypass, courtesy of the Interstellar Bottleneck Bypass Initiative (IBBI for short).

Soon you will never have to go through Jita to reach any other system.


Agents & Missions

Agents in Jita have finally had enough and are hauling butt out of there. Jita will be completely agent-less soon.

Also, agents outside Jita will never send you into Jita to complete a mission for them.


When will we get it?

This system will be implemented incrementally but you should see this start to trickle in soon. Not the CCP soon(tm) mind you, but very very soon.


Conclusion

This is being implemented at the same time as we're aggressively optimizing our software backbone to provide a smoother playing experience in Jita. That is, however, outside the scope of this devblog.

Our goal is to create an environment where players don't have to go to Jita unless they want to. This is very important to the overall player experience.

Together, these two avenues of attacking the problem should provide the EVE denizens with a much smoother and more consistent playing experience. It's not a silver bullet, but every little bit helps.

Now, if only the Reykjavík Transit Authority would do the same...

By the way, "A Tale of Two Cities" refers to Reykjavík and Jita. Clever, ain't it? Jita isn't even a city... Hilarious!
- Nonni


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Simond on September 25, 2008, 11:03:51 AM
Conclusion: People will find a new hub.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Pax on September 25, 2008, 11:07:17 AM
This... and united. will need a new spot.
The second thing after setting IBBI to avoid Jita will be setting IBBI to avoid Rancer.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Thrawn on September 25, 2008, 11:26:49 AM
Conclusion: People will find a new hub.

Why?    :uhrr:


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Moosehands on September 25, 2008, 12:01:48 PM
Conclusion: People will find a new hub.

Why?    :uhrr:

Because people are social, because the convenience of a centralized market translates to profit on the part of merchants, and because the contract system blows goats and the best way to get stuff sold through it is to find a huge group of players and spam local.

They removed all belts in Jita, the population continued to climb.  Prior to that, they rearranged the whole universe.  People just moved from Yulai to Jita.

Every decently populated MMO is always going to have at least one completely player driven gathering spot.  Trying to find ways to reduce the reasons people will want to go there is counterproductive.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Thrawn on September 25, 2008, 12:07:54 PM
From what I understood most of what they are doing is just trying to keep people that have no reason to go to Jita from having to pass through it, not trying to force people out of it.  The only change that seems to be trying to keep people from actually using Jita is the removal of agents, but I can't imagine any sane person would try to run missions out of Jita to begin with.  I don't see why this would make people look for a new trade hub.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Jayce on September 25, 2008, 12:44:53 PM
I think this has a chance of working, certainly better than arbitrarily capping the number of people.

The agent system has that ridiculous penalty for turning down a mission, so people are torn between contributing to the Jita problem when they get a mission there, or losing standing.  IMO the mechanic is sort of busted, but that's another topic.

What they are doing is trying to make it a trade hub ONLY, no other activities, even passing through, encouraged.  I think that will help some.  Next they should consider some incentives to move to Amarr or Oursalaert or Rens.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: eldaec on September 25, 2008, 03:03:21 PM
Conclusion: People will find a new hub.

Why?    :uhrr:

Because people are social, because the convenience of a centralized market translates to profit on the part of merchants, and because the contract system blows goats and the best way to get stuff sold through it is to find a huge group of players and spam local.

No, he means why would this mean they have to find a new hub that isn't Jita?

Nothing in this post makes Jita a less desirable place to trade.

Quite the reverse.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Endie on September 25, 2008, 04:30:32 PM
Indeed.  This will remove a few score people who didn't need to be in Jita, and weren't trading.  It'll improve lag a little.  Making Jita a nicer place to trade.  Marginally.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: MahrinSkel on September 25, 2008, 11:45:38 PM
Yeah, they already tried the "break the hub" method, in Yulai.  It's why everything in Genesis region is so many more jumps from everything than normal, they cut so many of the links to and inside that region to get people out of Yulai.  For a while it pushed people to Oursalaert, but eventually things stabilized on the Rens/Amarr/Jita triangle, with Jita by far the highest volume. I used to go to Ours and Yulai for cheap T2: people who set up their production lines there early and didn't move away when the markets did would often price war each other to 20% discounts off the Jita price fighting over a trivial local market base, and I'd swoop in every few weeks and buy the market out.

--Dave


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: lac on September 30, 2008, 07:33:43 AM
More lag fixing dev blog goodness.
Quote
StacklessIO

For the past two years we have been developing new technology, called StacklessIO, to increase the performance of our network communication infrastructure in EVE. This new network layer reduces network latency and improves performance in high-volume situations, e.g., in fleet-fights and market hubs such as Jita.

On 16 September we successfully deployed StacklessIO to Tranquility. We noticed an astounding, yet expected, measurable difference.

Normally Jita reaches a maximum of about 800-900 pilots on Sundays. On the Friday following the deployment of StacklessIO there were close to 1,000 concurrent pilots in Jita and on Saturday the maximum number reached 1,400. This is more than have ever been in Jita at the same time. Jita could become rather unresponsive at 800-900 pilots but on Sunday it was quite playable and very responsive with 800 pilots. It should continue to be snappier and more responsive in the future.

http://staff.ccpgames.com/explorer/devblogs/images/playerpopinjita.png

The Measurements

This spring we saw the fruits of our R&D work when we deployed StacklessIO to Singularity and began measuring the difference.

Confirming a suspicion we had had for a long time, the Core Server Group team, led by CCP porkbelly, proved that StacklessIO vastly outperformed the old network technology. They also demonstrated that the old technology could sometimes, under extreme lab conditions, delay network packets in an arbitrary manner for a significant amount of time.

Later CCP Atlas of the EVE Software Group showed that those symptoms also happened in the wild with the old technology, although on a smaller scale: the network response to client requests could in some cases be delayed for a few minutes on highly loaded nodes in the cluster. In particular we measured client network communication to the node that hosts Jita.

Since the client and server clocks are synchronised, we called a remote service on the server from the client, the server responded with the global time and we measured the server and client deltas. We also called that same service directly on the server node to measure the service call's processing time, which turned out to be negligible.

What we discovered in our tests is that the server delta was almost identical to the client's received delta so the delay was due to the remote service call taking a long time to reach the server-side service, most likely somewhere in the network layer on the server. The values on the graphs below are seconds.
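The measurement described above can be illustrated with a small sketch. It assumes, as the blog states, that client and server clocks are synchronised; the `Sample` type, the field names and the numbers are invented for illustration and are not CCP's tooling or data.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sent: float         # client clock when the request left the client
    server_time: float  # server clock reported by the remote service
    received: float     # client clock when the reply arrived


def deltas(sample):
    """Return (server_delta, client_delta) in seconds for one call."""
    server_delta = sample.server_time - sample.sent  # time to reach the service
    client_delta = sample.received - sample.sent     # full round trip
    return server_delta, client_delta


def inbound_share(sample):
    """Fraction of the round trip spent before the service ran."""
    server_delta, client_delta = deltas(sample)
    return server_delta / client_delta if client_delta > 0 else 0.0


# Made-up sample: the request takes 90 s to reach the service,
# and the reply comes back in half a second.
slow_call = Sample(sent=100.0, server_time=190.0, received=190.5)
server_delta, client_delta = deltas(slow_call)
share = inbound_share(slow_call)
```

When the two deltas are nearly equal, as in the made-up sample, almost all of the delay happened before the service ran, which is the pattern the blog reports for the old network layer.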

http://staff.ccpgames.com/explorer/devblogs/images/delta_jetbyte.png

This is a Sunday profile and is very specific to Jita. This was one of the primary reasons why Jita could sometimes become fairly unresponsive on Sunday evenings. It was not uncommon for client requests to take up to 1-2 minutes to reach the service layer on the server, and the requests would be delayed seemingly at random: of two requests in succession, the first could be delayed for minutes while the second would get a response almost immediately. From a player's perspective this would manifest itself in lag and strange client behaviour as requests were delayed and completed by the server far out of order.

By comparison, here is Jita with approximately the same number of players, around 800 pilots in local, after the deployment of StacklessIO.

http://staff.ccpgames.com/explorer/devblogs/images/delta_stacklessIO.png

It's very apparent that StacklessIO does not demonstrate any of the earlier issues. There is only one small spike and two small bumps but we must keep in mind that such isolated occurrences could be caused by general network issues on the internet. Since the client/server network communication has to travel through the internet, some delays would be expected depending on general internet health and the particular ISP.

There are no systemic issues anymore as with the old network technology and StacklessIO provides all-around superior performance.

One of the other measurements we did was to ping all nodes in the cluster from a single node to measure network latency within the server cluster. The values in the tables below are seconds.

Ping Pre-StacklessIO
Time    Minimum   Maximum   Average   Stddev
16:00   0.00065   3.22      0.042     0.032
21:00   0.00064   4.36      0.068     0.056
22:00   0.00065   1.21      0.027     0.027
23:00   0.00064   4.36      0.027     0.028
00:00   0.00065   1.01      0.020     0.017

Ping StacklessIO
Time    Minimum   Maximum   Average   Stddev
16:00   0.00064   2.00      0.014     0.021
21:00   0.00064   1.02      0.014     0.018
22:00   0.00064   0.25      0.009     0.011
23:00   0.00064   1.93      0.014     0.021
00:00   0.00064   1.06      0.010     0.014

From the table we notice that the minimum values are the same before and after. The lowest maximum is approximately the same but overall the maximum values are lower with StacklessIO by approximately a factor of 2 and they are more consistent.

The average values are lower overall with StacklessIO by a factor of 3 and the standard deviation is lower by a factor of 2. Below is a visual representation of the average values.
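The quoted factors can be checked against the two ping tables with a few lines of arithmetic. The sketch below assumes the factors are ratios of the mean of each column across the five sampled hours; the blog does not say exactly how they were derived, so that choice is a guess.

```python
from statistics import mean

# Columns copied from the two ping tables above (seconds).
pre = {
    "maximum": [3.22, 4.36, 1.21, 4.36, 1.01],
    "average": [0.042, 0.068, 0.027, 0.027, 0.020],
    "stddev": [0.032, 0.056, 0.027, 0.028, 0.017],
}
post = {
    "maximum": [2.00, 1.02, 0.25, 1.93, 1.06],
    "average": [0.014, 0.014, 0.009, 0.014, 0.010],
    "stddev": [0.021, 0.018, 0.011, 0.021, 0.014],
}

# Improvement factor per column: mean before divided by mean after.
factors = {column: mean(pre[column]) / mean(post[column]) for column in pre}

for column, factor in factors.items():
    print(f"{column}: {factor:.2f}x lower with StacklessIO")
```

Under that assumption the ratios come out at roughly 3 for the averages and roughly 2 for the maximum and standard deviation columns, in line with the factors stated in the blog.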

http://staff.ccpgames.com/explorer/devblogs/images/ping_avg.png

At 1,400 pilots on Saturday the node hosting Jita ran out of memory and crashed. As crazy as it may sound this was very exciting since we had not been in the position before to be able to have that problem. We immediately turned our attention to solving that challenge and are making significant progress. I will provide information on that specific effort in a dev blog later.

But we have already made good progress on memory optimisation as a part of the StacklessIO technology effort, e.g., memory usage on the proxy servers in the cluster was reduced significantly.

http://staff.ccpgames.com/explorer/devblogs/images/proxy.png

The two tall peaks are memory issues we encountered in the first days after deploying StacklessIO. A task force was put into action and it reduced the memory usage by 50% compared to pre-StacklessIO values.

The graphs and measurements above show primarily statistics for Jita but the benefits of StacklessIO apply everywhere. We measured Jita in particular because we could rely on activity and regular load in Jita for measurements. StacklessIO should have a positive impact on your playing experience, no matter where you are in the EVE universe and no matter what you are doing.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Amarr HM on September 30, 2008, 08:22:19 AM
Actually CCP seem to have done it right this time, can't see anything but positive from these changes. Though if they do decide to move the system cap limit elsewhere then it might be exploitable. The AP feature is interesting, Jita was always easily avoidable if you setup waypoints but there's lots of dumbass or lazy pilots out there so I suppose it's circumventing them more than Jita  :drillf:

CRITICAL

    * A new autopilot feature has been added: Avoid System. This will allow you to set systems, constellations or even entire regions to be ignored by your autopilot. By default, all autopilots will avoid Jita unless you configure it otherwise.
    * Dynamic System Cap: We have implemented a system that allows us to set a maximum number of players that are allowed into a given solar system. At this time only Jita has this cap set. The cap can be changed during runtime and we will be adjusting it in the coming days as we make software improvements and add more hardware to Jita.
    * An Auto-Move feature has been added to the login process. If you are trying to log into a system that currently has more people in it than the dynamic cap (see above), you will be given the option of moving yourself to a neighbouring system.
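A rough sketch of how a runtime-adjustable cap with a move-to-neighbour fallback might fit together. Every name here (`SolarSystem`, `jump_in`, `log_in`, the "Neighbour" system) is hypothetical, the cap value is invented, and the sketch moves the pilot automatically where the real feature offers it as an option.

```python
class SolarSystem:
    """Toy solar system with an optional player cap (None means uncapped)."""

    def __init__(self, name, cap=None, neighbours=()):
        self.name = name
        self.cap = cap                    # can be changed while running
        self.neighbours = list(neighbours)
        self.pilots = set()

    def is_full(self):
        return self.cap is not None and len(self.pilots) >= self.cap


def jump_in(system, pilot):
    """Stargate check: refuse the jump when the destination is full."""
    if system.is_full():
        return False
    system.pilots.add(pilot)
    return True


def log_in(system, pilot, universe):
    """Log in where the pilot logged off, else in the first neighbour with room.

    Returns the name of the system the pilot ended up in, or None.
    """
    if jump_in(system, pilot):
        return system.name
    for name in system.neighbours:
        if jump_in(universe[name], pilot):
            return name
    return None


universe = {
    "Jita": SolarSystem("Jita", cap=2, neighbours=["Neighbour"]),
    "Neighbour": SolarSystem("Neighbour"),
}
jita = universe["Jita"]

first = jump_in(jita, "pilot-1")             # admitted
second = jump_in(jita, "pilot-2")            # admitted, system now full
third = jump_in(jita, "pilot-3")             # refused at the gate
placed = log_in(jita, "pilot-3", universe)   # moved next door instead

jita.cap = 3                                 # cap raised at runtime
fourth = jump_in(jita, "pilot-4")            # admitted after the raise
```

Keeping the cap as a plain attribute is what makes the "changed during runtime" part trivial in this sketch; the abuse scenarios discussed earlier in the thread apply to any fixed value of it.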

FEATURES

World Shaping

    * 3 new jumps have been added around Jita:

    * Muvolailen - Maurasi
    * Maurasi - Perimeter
    * Veisto - Sarekuwa


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Predator Irl on September 30, 2008, 08:33:07 AM
This sounds very promising, the idea of having no mission runners in Jita is long overdue. Bypassing Jita completely, well I think this will take a decent amount of traffic out of the way also. Who knows, Jita of the future could be lag-free. I've already noticed improvements in the past few weeks.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Amarr HM on September 30, 2008, 09:30:48 AM
They really took their time but this is a welcome change to everyone I'm sure.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Moosehands on September 30, 2008, 10:59:18 AM
From what I understood most of what they are doing is just trying to keep people that have no reason to go to Jita from having to pass through it, not trying to force people out of it.  The only change that seems to be trying to keep people from actually using Jita is the removal of agents, but I can't imagine any sane person would try to run missions out of Jita to begin with.  I don't see why this would make people look for a new trade hub.

Looks like I completely misunderstood your point then.  Sorry about that.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: bhodi on September 30, 2008, 12:52:08 PM
You know what else broke today?

Outposts (stations) no longer show up in the right click menu.

That's pretty rad.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Endie on September 30, 2008, 12:59:24 PM
We've got a pile of bugs: people moving while cloaked in battleships without cloaks fitted.  People in fleet appearing as members of NPC corps.  Drones travelling into pos shields.  And so on...


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Der Helm on September 30, 2008, 01:38:01 PM
We've got a pile of bugs: people moving while cloaked in battleships without cloaks fitted.  People in fleet appearing as members of NPC corps.  Drones travelling into pos shields.  And so on...

Well, good time to train recon ships in the safety of a station then.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: lac on October 03, 2008, 07:57:55 AM
Mo' speed.

Quote
We recently deployed a new technology to the EVE universe, StacklessIO, which is a new, robust, network technology for both the EVE server and clients. The server version was released 16 September and the client version was released 30 September with the Empyrean Age 1.1.1 patch. We have received great feedback and we hope you are enjoying it. In my dev blog on StacklessIO I mentioned that there would be a follow-up dev blog on related topics.

And here it is: 2^EVE = EVE64.

StacklessIO, after years of development, has been a big success. We measured the improved performance, and you've told us on the forums and in the local chat in Jita that we have made a significant advancement in our goal of eliminating all lag from EVE Online.

Normally Jita reaches a maximum of about 800-900 pilots on any given Sunday. On the Friday following the deployment of StacklessIO, 19 September, there were close to 1,000 concurrent pilots in Jita and on the Saturday, 20 September, the maximum number reached 1,400. This is more than have ever been in Jita at the same time. Under our old network technology Jita could become rather unresponsive at 800-900 pilots but on the Sunday, 21 September, it was quite playable and very responsive with 800 pilots, thanks to StacklessIO.

Alas, there were teething problems. At 1,400 pilots the node hosting Jita ran out of memory and crashed. As crazy as it may sound this was very exciting since we had not been in the position before to be able to have that problem, as Jita would lag out before reaching that point under our old network technology. We immediately turned our attention to solving the challenge of giving the EVE server more memory to access.

CCP porkbelly wrote a dev blog three years ago entitled "64 Bits" where he described our first attempts at compiling the EVE server as a 64-bit program and the main reason for doing so: Access to more memory. At that time we were not able to complete the 64-bit migration since the old network technology did not work correctly as a 64-bit program. Having replaced the old network technology with StacklessIO we were in the position to continue that work.

And we started it, completed it and deployed EVE64 last week! Yes, we pulled it off in a single week! That might almost sound recklessly fast to some but this was achieved with a strike team that stepped up to the challenge. There is a lot of enthusiasm within CCP today to tackle the lag monster now that we have this new platform to build on.

The EVE server runs on a cluster of blades and is divided into proxy nodes and server nodes. The EVE clients connect to the proxy nodes, which act as dispatchers and are also an outer layer of defense for the server nodes that run the solar systems simulation.

The proxies are now all running EVE64. We are planning to reduce the number of proxy nodes, which in return will lead to overall increased performance of the EVE server as the total number of proxy servers in our system affects scalability of our application layer. Now that the proxy nodes can address more memory they have the ability to service more client connections, as their performance is mostly a function of IO capacity (StacklessIO) and memory. The proxies are essentially proprietary software routers that just became vastly more powerful under this new paradigm.

The server nodes will run a mix of 32- and 64-bit nodes since most nodes in the cluster don't have memory requirements requiring EVE64. By replacing 32-bit code with 64-bit code more memory is immediately required since, e.g., all memory pointers double in size. The need has to be clear as there is no gain in all cases from running EVE64, but where there is need we are now able to respond to it. Our network protocols that run on top of StacklessIO make sure that this mixed mode cluster configuration of EVE32 and EVE64 runs completely transparently to all code within the system.
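The trade-off in the paragraph above is simple arithmetic: a 64-bit build can address far more memory, but every pointer costs 8 bytes instead of 4. The sketch below only illustrates that arithmetic; the helper names and the pointer count are invented and say nothing about how much memory an actual EVE node uses.

```python
import struct

POINTER_BYTES_32 = 4
POINTER_BYTES_64 = 8


def pointer_overhead(pointer_count):
    """Extra bytes needed when every pointer grows from 4 to 8 bytes."""
    return pointer_count * (POINTER_BYTES_64 - POINTER_BYTES_32)


def addressable_bytes(bits):
    """Upper bound on the address space for a given pointer width."""
    return 2 ** bits


# Pointer size of the interpreter running this sketch (4 or 8 on common builds).
native_pointer_bytes = struct.calcsize("P")

# Hypothetical process holding ten million pointers.
extra = pointer_overhead(10_000_000)
limit_32 = addressable_bytes(32)

print(f"native pointer size: {native_pointer_bytes} bytes")
print(f"extra memory for 10M pointers: {extra / 2**20:.1f} MiB")
print(f"32-bit address space: {limit_32 / 2**30:.0f} GiB")
```

The 4 GiB figure is a theoretical ceiling for a 32-bit process; the usable amount is normally lower, which is why a node that needs more memory has to move to the 64-bit build despite the pointer overhead.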

The normal setup in the cluster for the server nodes is that each blade has two 64-bit processors, 4 GB of memory and runs Windows Server 2003 x64. Each blade runs two nodes and each node then hosts a number of solar systems. There are also dedicated nodes for the market, dedicated nodes for corporation services, a dedicated head node for the cluster, etc.

Finally there is a pool of dedicated dual-CPU, dual-core, machines that only run a single EVE64 node per machine. Jita and four other high use solar systems are assigned to that pool. That pool is now running all native 64-bit code and the blades have been upgraded to 16 GB of memory. These blades also have more powerful CPUs which has helped as well. We are currently working with our vendors on testing out even more powerful hardware options now that we can utilise the hardware much better.

This Monday, 29 September, we saw a fleet battle with over 1100 pilots reported in local. Field reports indicate that the fight was quite responsive for the first 10 minutes but then the node "missed its heart beat" as we call it and was removed from the cluster by our cluster integrity watchdog routines. This again is another exciting problem as we can address that as well under our StacklessIO world and that will be the subject of the next blog.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: eldaec on October 06, 2008, 11:56:20 PM
Magic Patch!  :pedobear:


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: Endie on October 07, 2008, 03:05:53 AM
Magic Patch!  :pedobear:

Well, some of the stuff that has been happening in Vale this week (dying but keeping your ship, getting insurance and losing a few modules; dead ships staying on grid but becoming untargetable beside their pods; drones bouncing in and out of hostile POS shields; 700-person lag in 350-person systems and many, many more) are probably best chalked up to supernatural forces, it is true.


Title: Re: CCP Fixing lag - system player limits implemented!
Post by: lac on October 08, 2008, 01:50:33 AM
More 'under the hood' talk.
Quote
Gentlemen!

I thought I would add to the recent dev blogs we have had over the last couple of weeks and talk about what has been going on with the Tranquility cluster itself - in relation to the StacklessIO and 64-bit EVE enhancements and where we are heading into the future.

1 x EVE Server Cluster

The EVE Cluster is broken into 3 distinct layers, and a bit of the terminology that is thrown around from time to time (including later in this blog) can be explained quite simply here.

* Proxy Blades - These are the public facing segment of the EVE Cluster - they are responsible for taking player connections and establishing player communication within the rest of the cluster.
* SOL Blades - These are the workhorses of Tranquility and are the primary focus of our ongoing work. The cluster is divided across 90 - 100 SOL blades which run 2 nodes each.
    o Node - a single EVE server process. This is the lowest level of granularity within the cluster.
    o Dedicated SOL blade - These are SOL blades that we dedicate to one system only. Systems such as Jita, Motsu and Saila reside on these. They run two nodes like any other SOL blade, however the second node is idle and does not load any solar systems.
* Database Cluster - This is the persistence layer of EVE Online. The running nodes interact heavily with the Database, and of course pretty much everything to do with the game lives here. Thanks to our RamSans, our database is able to keep up with the enormous I/O load that Tranquility generates.
    o At peak hours, our database is processing over 2,000 transactions per second, which generates around 38,000 IOPS (input output operations per second)
    o To keep up with this load, we currently have two RamSans.
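As a quick worked figure from the peak numbers quoted above, the ratio of I/O operations to transactions can be computed directly. The `estimated_iops` helper assumes the ratio stays constant as load grows, which is an assumption of this sketch and not something the blog states.

```python
# Peak figures as quoted in the blog (both are approximate).
PEAK_TRANSACTIONS_PER_SEC = 2_000
PEAK_IOPS = 38_000

# Average I/O operations generated by one database transaction.
io_per_transaction = PEAK_IOPS / PEAK_TRANSACTIONS_PER_SEC


def estimated_iops(transactions_per_sec):
    """Linear extrapolation; assumes the per-transaction ratio does not change."""
    return transactions_per_sec * io_per_transaction


print(io_per_transaction)  # 19.0
```

So each transaction costs roughly 19 I/O operations on average at peak.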

1 x SOL Blade

The EVE Server application itself (also known as a node) is primarily a CPU intensive process. Due to the nature of the Stackless Python programming methodology chosen for EVE Online, the Python component of each node is a single thread, which means it can only ever utilize 1 CPU core at a time.
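To illustrate why many concurrent tasks can still end up on one core, here is a minimal sketch of cooperative scheduling on a single thread. It uses plain Python generators as a stand-in and is not the Stackless Python API; `solar_system` and `run_round_robin` are hypothetical names invented for this example.

```python
from collections import deque


def solar_system(name, ticks):
    """Hypothetical simulation task: does one unit of work, then yields."""
    for tick in range(ticks):
        yield f"{name} tick {tick}"


def run_round_robin(tasks):
    """Run generator tasks one step at a time on the calling thread."""
    queue = deque(tasks)
    log = []
    while queue:
        task = queue.popleft()
        try:
            log.append(next(task))
        except StopIteration:
            continue  # task finished; do not requeue it
        queue.append(task)
    return log


log = run_round_robin([solar_system("Jita", 2), solar_system("Motsu", 1)])
print(log)
```

Every task shares the one thread, so a task that takes a long time between yields delays all the others, and adding cores to the blade does not help that process.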

Our SOLs are IBM blades, and up until quite recently were almost all running AMD Opteron 2.8 GHz Dual Core processors with 4 GB of DDR1 RAM. Over the last 6 months or so, we have been investigating options for replacing these Opteron processors with something more powerful. We selected some dual socket, dual core Intel Xeon 3.0 GHz Woodcrest blades for testing purposes, and have been using them as an integral part of our StacklessIO testing (as blogged about here by CCP Explorer). Now that StacklessIO has been released we are able to use these blades to their fullest, and as a first step looked at ways we could use these test blades on Tranquility.

1 x Rapid Deployment

When we hit 1400 players in Jita and then had the unfortunate incident where the SOL blade powering Jita ran out of memory, we looked to our Intel test blades for help. We shuffled some RAM around and were able to get 5 new Intel SOL blades with 16 GB of DDR2 RAM each ready for use. We did a staggered test deployment of these to Tranquility last week. On Friday, confident of their stability and anticipating performance increases, we set them up as dedicated SOL blades. That evening, Jita, Saila and Motsu were performing better than ever, and there was much rejoicing. Over the last weekend, the GMs did not receive a single "Stuck Character" petition from Jita!

3 x Epic Fleet Fights

That Saturday, out of the blue we saw one of the nodes supporting 0.0 go to Critical status and shortly afterwards it shut down. This happened a few more times in quick succession, and it became apparent that there was a new issue where extremely loaded nodes were simply not able to keep up with their heartbeat. This issue in itself is fixable and we are working hard to get it resolved.
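One plausible reading of the heartbeat problem can be sketched as follows. This is an illustration only: the class, the 30-second timeout and the node names are all hypothetical, and CCP's actual watchdog is not described in the blog.

```python
class HeartbeatWatchdog:
    """Tracks the last heartbeat per node and evicts nodes that go quiet."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node id -> timestamp of most recent heartbeat

    def beat(self, node_id, now):
        """Record a heartbeat from `node_id` at time `now` (seconds)."""
        self.last_seen[node_id] = now

    def evict_stale(self, now):
        """Remove and return nodes whose last heartbeat is older than the timeout."""
        stale = sorted(
            node_id
            for node_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
        for node_id in stale:
            del self.last_seen[node_id]
        return stale


watchdog = HeartbeatWatchdog(timeout_seconds=30.0)
watchdog.beat("node-a", now=0.0)
watchdog.beat("node-b", now=0.0)
watchdog.beat("node-a", now=25.0)  # node-b is too busy to report in
evicted = watchdog.evict_stale(now=40.0)
print(evicted)
```

From the watchdog's side, a node that is alive but too loaded to send its heartbeat looks the same as a dead one, which fits the description of extremely loaded nodes being shut down.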

At this point, it was apparent that with 700+ players trying to "pew pew", the AMD node they were on was not going to do anything other than keep crashing. We re-mapped the system in question to one of our dedicated Intel blades, just to see what it was capable of. Jita had performed so well the night before, that we thought these nodes would handle a fleet fight quite nicely. The system held, and the rest, as they say, is history.

On Sunday night, the M-OEE8 System was the hotspot and it had been placed on an Intel 64 bit dedicated SOL blade in anticipation. It held fine with a peak of around 450 players.

On Monday night, over 1000 players tried to start a fight in this system. As with Sunday, we had anticipated there would be fighting there, so it had been placed on a dedicated node. Unfortunately, what had caused node crashes at 700 players on our AMD blades caused our Intel blade to miss its heartbeat after going a bit over 1200 players. Interestingly enough, despite missing its heartbeat, many players have reported that the performance of this blade with 1000 players was very good in the 10 - 15 minutes prior to its shutdown.

I would like to stress that we at CCP are very excited by this, and we are very hopeful that once the issue causing these node deaths is solved that we will start to see this impressive performance much more often. A lot of people have put in a lot of hard work towards new technologies and it is starting to pay off for you, the players.

So where do we go from here?

We are by no means finished with these upgrades, and there is still a lot of work to be done. During this last two-week period, we have proved the readiness of some of our new technology, and we now need to work on the best way to ensure everyone can benefit...

Newer, Faster Blades

Our 3.0 GHz Intel Woodcrest blades are nice, but that processor architecture has been replaced by Wolfdale, which is even more powerful. So we have put a fast-track order in for some Intel Xeon 3.3 GHz Wolfdale blades. We expect to have these in the cluster very soon and we anticipate these will give us an even bigger performance boost than we have seen so far, paving the way for a new Tranquility Cluster. It is worth noting that the hardware we are beginning to purchase now is the hardware that will see us all the way into the HPC era. There will be a detailed presentation about the status of this project at Fanfest in November.

Help us to help you

It's nice that we have this new hardware, but there is going to be an interim period while we work to upgrade the existing hardware to this HPC-ready specification. During this period, we will be proactively working to place fleet fight systems onto a dedicated node at downtime. We often can't predict where our players are planning to unleash hell, so we need to know which systems are going to have fleet fights! We are working on a way to allow players to directly contact the Virtual World Operations team with this information, however in the meantime corporation directors are invited to petition any planned operations (use the Stuck Character category at least 24 hours in advance, and please include estimated attack / defense numbers), and we will take note of this when we assign systems to dedicated SOL blades during downtime.
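The downtime assignment described above could look something like the sketch below. Everything here is hypothetical: the function, the node names, the quiet systems and the pilot estimates are invented for illustration, and the blog does not say how CCP actually decides the mapping.

```python
def plan_downtime_mapping(expected_pilots, dedicated_nodes, shared_nodes):
    """Return {system: node}, giving the busiest systems a dedicated node each.

    `expected_pilots` maps system name to an estimated peak pilot count,
    for example taken from petitions filed by corporation directors.
    """
    # Busiest first; ties broken by name so the result is deterministic.
    by_load = sorted(expected_pilots, key=lambda s: (-expected_pilots[s], s))
    mapping = dict(zip(by_load, dedicated_nodes))

    remaining = by_load[len(dedicated_nodes):]
    if remaining and not shared_nodes:
        raise ValueError("no shared nodes left for the remaining systems")
    # Spread everything else across the shared nodes in turn.
    for index, system in enumerate(remaining):
        mapping[system] = shared_nodes[index % len(shared_nodes)]
    return mapping


estimates = {"Jita": 1400, "M-OEE8": 1000, "Quiet-A": 20, "Quiet-B": 5}
mapping = plan_downtime_mapping(
    estimates,
    dedicated_nodes=["dedicated-1", "dedicated-2"],
    shared_nodes=["shared-1"],
)
print(mapping)
```

Because the mapping is only applied at downtime, estimates that arrive late cannot be acted on until the next restart, which is presumably why the 24-hour notice is requested.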

With that, I will leave you with my final - subtle - personal thoughts on the matter.

This is EVE Online!

Impossible is nothing!

HPC?
"High performance computing" - A supercomputer that ccp is producing in partnership with just about everyone. Most supercomputers are designed around running multiple threads at a stately pace across distributed clusters. TQ needs to run single threaded apps as fast as fast can be, so they're researching areas that have never been touched before.