Author Topic: Cloud computing services  (Read 10007 times)
K9
Terracotta Army
Posts: 7441


on: July 23, 2013, 01:37:43 PM

I'm just wondering if anyone here has any experience with any of the available cloud computing services that now exist.

I'm looking at some CPU-intensive jobs for some research I'm doing which could easily require a few months of runtime each, although I have the option to do more shorter runs, or fewer longer runs, and I'm not sure where the balance is. I have read through bits of the Amazon EC2 site, but I was wondering if someone here could explain this to me like I'm five. My main confusion is over the difference between things like light vs medium vs heavy utilisation and the size of the cluster. I assume more $$ means more speed, but is there a sweet spot I should be aiming for?

I'm guessing that because I don't want to buy instances for a whole year (just for as long as the job takes) I'm probably better off going for the on-demand services rather than reserving a whole instance for myself?

I love the smell of facepalm in the morning
Ghambit
Terracotta Army
Posts: 5576


Reply #1 on: July 23, 2013, 02:00:26 PM

The only bit I know of is the free 750hrs (which is about a month straight) they were hammering on academia to use.  I believe there MIGHT be more free time if you're affiliated as well (a school, IEEE, ACM, whatever).  If I were you, I'd just try it out for the free time and see how it feels.

There's a lot of damned hype around EC2 so it can't be all that bad.

"See, the beauty of webgames is that I can play them on my phone while I'm plowing your mom."  -Samwise
Ironwood
Terracotta Army
Posts: 28240


Reply #2 on: July 23, 2013, 02:06:57 PM

What is it you're actually trying to accomplish and what were you going to use ?

"Mr Soft Owl has Seen Some Shit." - Sun Tzu
K9
Terracotta Army
Posts: 7441


Reply #3 on: July 23, 2013, 02:09:25 PM

Yea, I was looking at the free 750 hours; as I read it, that's limited to a single Micro instance anyway, so a lot of the confusion over what to use goes away. I just wonder where the value-for-money comes in once you start wanting or needing more CPU power.

Cheers.

I love the smell of facepalm in the morning
K9
Terracotta Army
Posts: 7441


Reply #4 on: July 23, 2013, 02:11:08 PM

What is it you're actually trying to accomplish and what were you going to use ?

Phylogenetic analysis; I need to build a lot of trees. Hundreds of millions of them in fact, and some of them are awkwardly large. I have access to a HPC cluster here, but it has a tendency to get switched off at weekends sometimes, and I may need to run things for weeks or months to get enough of a sample.

I love the smell of facepalm in the morning
Lantyssa
Terracotta Army
Posts: 20848


Reply #5 on: July 23, 2013, 03:42:10 PM

You're academia, aren't you?  Can't you get a grant to run on several of the large clusters such as PNL?  Are those restricted to US groups only?

Hahahaha!  I'm really good at this!
Trippy
Administrator
Posts: 23628


Reply #6 on: July 23, 2013, 03:52:46 PM

Yes I can go into all sorts of gory details about EC2 as I used it daily for over 2 years.

Can you be more specific about what this phylogenetic analysis task entails? E.g. is it mostly integer-based arithmetic or floating-point? Is the task highly parallelizable (uses multiple threads and/or processes)? How much memory and disk I/O are involved?
« Last Edit: July 23, 2013, 03:54:25 PM by Trippy »
Salamok
Terracotta Army
Posts: 2803


Reply #7 on: July 23, 2013, 09:07:31 PM

The shitty part of EC2 is that it is confusing as hell to figure out what an elastic compute unit is and how many of them you need; the upside is that if you go with no prepayment or contract, you can pull the plug at any time w/o being bound by any sort of commitment.  So you can set up some sort of test run and keep a close eye on what it is costing you.

Amazon seems to be ridiculously far ahead of what everyone else is offering in this space (VPC, ElastiCache, DynamoDB and the shitload of other service types they offer above and beyond a simple server or database), but it sounds like you may only need raw compute time; if that's the case you might find a cheaper alternative elsewhere.  The people I meet that are heavily cloud-based are using Amazon to spin up large Hadoop clusters to process real-time analytics for extremely high traffic sites (they also bitch about this shit going down all the time).  Hadoop also seems like it would work really well for what you are describing.

That all said, from what I can tell Amazon isn't in the business of building your solution so much as providing you a platform; while there is good information available on how to do stuff, and Amazon is willing to discuss some details on how to go about it (and point you to more documentation), it is still going to be up to you to configure it all.


P.S. - IIRC the trial developer micro instance includes a concurrent database instance as well (and a few other things), so you get 750 hours of EC2 + 750 hours of RDS.
« Last Edit: July 23, 2013, 09:10:19 PM by Salamok »
Ghambit
Terracotta Army
Posts: 5576


Reply #8 on: July 23, 2013, 10:58:15 PM

They've got a big conference in Vegas comin up in Nov.; boot camps and all.  Even a gamedev section.  'Twer me I'd expense it out and go.
https://reinvent.awsevents.com/index.html


"See, the beauty of webgames is that I can play them on my phone while I'm plowing your mom."  -Samwise
Kageru
Terracotta Army
Posts: 4549


Reply #9 on: July 23, 2013, 11:08:13 PM


I look forward to hearing the executive summary. Is buying computing power that way cheaper than getting some second-hand PCs and letting them grind 24/7?

Is a man not entitled to the hurf of his durf?
- Simond
Trippy
Administrator
Posts: 23628


Reply #10 on: July 23, 2013, 11:17:53 PM

No it isn't.
Quinton
Terracotta Army
Posts: 3332

is saving up his raid points for a fancy board title


Reply #11 on: July 23, 2013, 11:39:57 PM

Operating your own hardware does have additional costs, but yeah, in general dollar-per-unit-of-compute is going to be less if you do so.

The advantage of cloud compute stuff is being on-demand, being able to pay for 800 cores for 1 hour instead of running your own machine for a month (assuming your work is parallelizable), being able to easily scale to meet demand if you need it, not having to deal with your own physical machine deployment, upgrades, replacements, etc.

If you're doing computationally demanding stuff or just work that requires continuous CPU load while running, you won't want Amazon's "micro" instances -- they're designed for low cost, low utilization, small burst stuff:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts_micro_instances.html
K9
Terracotta Army
Posts: 7441


Reply #12 on: July 24, 2013, 03:18:48 AM

You're academia, aren't you?  Can't you get a grant to run on several of the large clusters such as PNL?  Are those restricted to US groups only?

I'm not sure what PNL is, but I looked at the academic section of Amazon and their quarterly grant application window closed about a week ago. I'm not really keen to wait around for months for capacity, although I'll probably put some sort of request in on the off chance that it goes through.

Yes I can go into all sorts of gory details about EC2 as I used it daily for over 2 years.

Can you be more specific about what this phylogenetic analysis task entails? E.g. is it mostly integer-based arithmetic or floating-point? Is the task highly parallelizable (uses multiple threads and/or processes)? How much memory and disk I/O are involved?


Cheers Trippy. The software is called BEAST, and it's one of the most widely used tools for phylogenetic analysis in research. I'm not great at the low-level side of things, so I'm not sure what sort of arithmetic it uses, but I do know it has options for running on GPUs, so I'm guessing it is optimised for whatever those use. In terms of memory, as far as I'm aware it can run out of memory with large trees, but otherwise it isn't particularly memory intensive.

The executive summary of the software:

Quote
BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability.

In terms of disk I/O, it has to write to a logfile, and these can get quite large (100s of MB to 10s of GB). The software itself is not inherently parallelizable (as I understand the meaning of the word); however, you can run multiple independent instances and combine the output. This is done quite commonly to improve sample sizes within a shorter timeframe at the cost of CPU power.

Also, in my experience you don't want more than one instance of BEAST running on the same processor, because they can interfere with each other. As such, the idea of being able to partition runs either across or within instances has appeal. The other advantage (as I see it) is that Amazon guarantees uptime and instant (or near-instant) access to the processor power. So I could kick my runs off and then leave them ticking for a month or so.
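For what it's worth, the "several independent chains with different seeds, combine the logs afterwards" pattern is easy to script. Below is a minimal Python sketch that starts one BEAST process per seed (so one per core) and waits for them all; the beast command name, the -seed flag and the run.xml file are assumptions here, not taken from the thread, so check your own BEAST install for the real invocation.

Code:
import subprocess

# One independent chain per seed; run no more chains than you have cores.
SEEDS = [101, 202, 303, 404]
procs = []
for seed in SEEDS:
    log = open("run_seed%d.log" % seed, "w")
    procs.append(subprocess.Popen(
        ["beast", "-seed", str(seed), "run.xml"],   # hypothetical invocation
        stdout=log, stderr=subprocess.STDOUT))

for p in procs:
    p.wait()   # afterwards, combine the per-seed logs with your usual tooling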

I love the smell of facepalm in the morning
Ironwood
Terracotta Army
Posts: 28240


Reply #13 on: July 24, 2013, 03:24:17 AM

This is all fantastically doable, though Trippy will be much better than me at the headhurting technical ins and outs, but I have to warn you straight off the bat that what you're thinking about doing is looking to end up as a costly proposition.  Do you have your budget in order ?

"Mr Soft Owl has Seen Some Shit." - Sun Tzu
Yegolev
Moderator
Posts: 24440

2/10 WOULD NOT INGEST


Reply #14 on: July 24, 2013, 07:24:02 AM

It's definitely doable, but I'm interested to see what happens when regular people need on-demand computing.

Note that the term "cloud" is far too vague, which is why everyone is asking what you really need done.  Sounds like you need a computing system that will remain online for a month or three, and really you only need it to remain powered and operational for the duration, which is doable via your own hardware.  I suppose the question is: how much CPU/RAM/storage do you need?  This capacity planning can be done best by engaging people experienced with BEAST implementations.

Why am I homeless?  Why do all you motherfuckers need homes is the real question.
They called it The Prayer, its answer was law
Mommy come back 'cause the water's all gone
K9
Terracotta Army
Posts: 7441


Reply #15 on: July 24, 2013, 08:52:49 AM

Thanks for the comments; I agree that the nomenclature is frustratingly vague. Essentially my problem is that I have some computational work that I could run on my desktop if time weren't an issue, but it is, and the HPC cluster I have access to is a finite resource. I think part of my problem is that because AWS is trying to offer every possible cloud solution, it's not entirely clear to someone like me which one I want. Buying extra capacity is an alternative, but there's definitely an appeal to renting 100 cores to get the runs done in two weeks rather than buying an 8-core machine and getting it done in months.

It may prove to be cost-prohibitive, although I do have some grant money to cover stuff like this.

I've been through the BEAST mailing list, and I can't find any discussion of this. I'll probably ask there once I've actually figured out how to get into an AWS instance, but I felt people here would probably have more experience with the fundamental 'how the fuck do I even use this' end of cloud computing.

I love the smell of facepalm in the morning
Salamok
Terracotta Army
Posts: 2803


Reply #16 on: July 24, 2013, 09:06:10 AM

The fact that BEAST can be GPU-based (you can do GPU computing on Amazon but their graphics cards may not be the best for what you are doing) leads me to think that building your own machine(s) might be reasonable.  Find some bitcoin farmer's cost/performance chart and build a few of those.

I was looking at this how-to guide for bitcoin farming using AWS, and the xlarge instance the guy was using doesn't "mine" nearly as well as a reasonably priced DIY local machine.  Not that what you are doing is probably as simple a calculation as churning through bitcoin hashes, but still, if someone took the time to optimize BEAST for GPUs then it is probably because that is a more cost-effective way to run it.  Although one of the advantages of the cloud is the 1 click to spin up a somewhat tuned image of the OS of your choice in a few minutes.
« Last Edit: July 24, 2013, 09:08:42 AM by Salamok »
Yegolev
Moderator
Posts: 24440

2/10 WOULD NOT INGEST


Reply #17 on: July 24, 2013, 10:59:53 AM

Since he says it is possible to combine outputs of separate BEAST runs, I'm not sure that piling all the cores into one image is needed at all, if you can just get a few dozen shitboxes going it might be very cost effective.  Of course if you want 100 cores you'll need at least 50 shitboxes, so now we're looking at power requirements.

Curious about how BEAST handles interrupted workloads?  Can it resume a workload or is it lost when a machine goes offline?

Why am I homeless?  Why do all you motherfuckers need homes is the real question.
They called it The Prayer, its answer was law
Mommy come back 'cause the water's all gone
Salamok
Terracotta Army
Posts: 2803


Reply #18 on: July 24, 2013, 11:15:42 AM

Since he says it is possible to combine outputs of separate BEAST runs, I'm not sure that piling all the cores into one image is needed at all, if you can just get a few dozen shitboxes going it might be very cost effective.  Of course if you want 100 cores you'll need at least 50 shitboxes, so now we're looking at power requirements.

Curious about how BEAST handles interrupted workloads?  Can it resume a workload or is it lost when a machine goes offline?

From the bitcoin mining stuff I have read, a $150 graphics card outperforms a shitbox by at least an order of magnitude, so a single machine with 4x graphics cards would probably outperform 50 shitboxes significantly.

edit - actually the how-to guide I posted has a breakdown of work done by the CPU vs. GPU: the GPU calculated ~75,000 khash/s, while each CPU core did approx 1,400 khash/s (roughly a 50x difference per core)

edit2 - This is all operating under the huge assumption that BEAST's workload per second is in some way similar to bitcoin hashes per second.  There are tons of ways that could be a faulty assumption, but the fact that the makers of BEAST decided to add GPU support leads me to believe that at the very least GPUs are better for the task than CPUs in some way.
« Last Edit: July 24, 2013, 11:22:17 AM by Salamok »
Trippy
Administrator
Posts: 23628


Reply #19 on: July 24, 2013, 11:20:18 AM

Okay, I did some Googling on this BEAST thingy, and it doesn't seem like anybody has published any benchmarks on the different instance types, so you are likely going to need to do this benchmarking yourself or find somebody who has the UNIX and AWS skills to set this up for you[1].

I did find a couple of benchmark pages here:

http://beast.bio.ed.ac.uk/Benchmarks
http://www.phylo.org/tools/beast_how_fast.html

so I'll use those later to make some educated guesses on what you'll want to get.

First, the basics of choosing an EC2 instance for CPU-intensive tasks. I'll ignore the cluster instance types (the ones with dedicated GPUs) for now. Amazon uses the term "EC2 Compute Unit" (ECU) to measure the CPU performance of their instance types. An ECU is defined as:

Quote
EC2 Compute Unit (ECU) – One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

As a simplification you can think of ECUs as GHz. So you can think of a single virtual CPU (vCPU) that's rated at, say, 2.5 ECUs as running at 2.5 GHz on a Xeon or Opteron CPU.

This table here shows you the vCPU and ECU ratings for the different instance types:

https://aws.amazon.com/ec2/instance-types/#measuring-performance

You have to divide the ECUs by the vCPUs to get the ECUs per vCPU. E.g. on a m1.large there are 2 vCPUs and 4 ECUs so each vCPU is running at 2 ECUs (2 GHz). On a c1.medium there are 2 vCPUs and 5 ECUs so each vCPU is running at 2.5 ECUs (2.5 GHz).

So the m1.large has less CPU performance than the c1.medium but it costs more to run[2] because it has a lot more memory available (7.5 GB vs. 1.7 GB).

To optimize your budget, what you would want to do in theory is run one or more benchmarks on a sampling of instance types, figure out the cost per "throughput unit" of the different instance types, and then pick the instance type that gives you the most bang for the buck.

I did something similar at the company I work at when we were still using EC2. One of the things we do is transcode and mix audio files (e.g. mix two WAV files together and convert the resulting WAV into an AAC/M4A file). So I set up a benchmark with a large fixed set of jobs to render (mix and transcode), ran a varying number of "worker" renderer processes (to figure out the optimum number of workers to run per instance type) on the main instance types, got the time it took for each instance and worker-count combination, and calculated a "cost per render" to pick the most cost-effective instance type to use.
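A minimal sketch of that kind of cost-per-unit-of-work comparison in Python, with placeholder timings (the hourly rates are the 2013 US East on-demand prices quoted later in this post, the run times are made up just to show the arithmetic):

Code:
# Placeholder benchmark results: hours to finish the same fixed set of jobs on
# each instance type, and the on-demand hourly rate for that type.
benchmarks = {
    "m1.large":  (10.0, 0.240),   # (hours to finish, $/hour) -- made-up timing
    "c1.medium": ( 8.0, 0.145),
}

for itype, (hours, rate) in sorted(benchmarks.items()):
    print("%-10s  %5.1f h  $%.2f per benchmark run" % (itype, hours, hours * rate))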


Amazon also now offers instances with NVIDIA GPUs, which presumably would work with BEAST + BEAGLE. Looking at the first benchmark page linked above, on benchmark2 the run using the GTX 285 GPU ran 5.6 times faster than the run using beagle_sse (0.84 minutes vs. 4.74 minutes).

The Tesla M2050 looks to be about the same performance as the GTX 285 in terms of GFLOPs[3], though it's hard to say for sure since most GFLOPs numbers I've seen for the GTX 285 don't specify single or double precision (I'll assume they are the same performance for now).

Doing a rough back of the envelope calculation:

GPU instance has 16 vCPU / 33.5 ECU = 2.1 ECU per vCPU
GPU on-demand instance cost: $2.100 per Hour

c1.medium has 2 vCPU / 5 ECU = 2.5 ECU per vCPU
c1.medium on-demand instance cost: $0.145 per Hour

7x c1.medium = 14 vCPU / 35 ECU = $1.015 per Hour

So 7x c1.medium has roughly equivalent CPU power to the GPU instance (not including using the GPUs) and is about half as expensive. However, benchmark2 runs 5.6 times faster on a GTX 285, so it would be cheaper to run benchmark2 on the GPU instance using the GPUs (2x more expensive but ~5x as fast)[4]. This assumes you can split your work onto multiple CPUs / instances and still get linear (with a slope of one) scalability, and that your work benefits as much from using the GPU as benchmark2 does.
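The same back-of-the-envelope numbers as Python, with the GPU-vs-CPU runtime ratio treated as an assumption carried over from benchmark2 rather than a measured fact:

Code:
# Figures quoted above: GPU instance 16 vCPU / 33.5 ECU / $2.100 per hour,
# c1.medium 2 vCPU / 5 ECU / $0.145 per hour.
gpu_vcpu, gpu_ecu, gpu_rate = 16, 33.5, 2.100
c1m_vcpu, c1m_ecu, c1m_rate = 2, 5.0, 0.145

print("GPU instance: %.1f ECU per vCPU" % (gpu_ecu / gpu_vcpu))   # ~2.1
print("c1.medium:    %.1f ECU per vCPU" % (c1m_ecu / c1m_vcpu))   # 2.5

n = 7                                             # 7x c1.medium ~= 35 ECU
print("7x c1.medium: $%.3f per hour" % (n * c1m_rate))            # about half the GPU rate

# Assume (as benchmark2 suggests) the GPU route finishes ~5.6x faster:
hours_cpu, hours_gpu = 5.6, 1.0
print("cost per job, CPU route: $%.2f" % (hours_cpu * n * c1m_rate))
print("cost per job, GPU route: $%.2f" % (hours_gpu * gpu_rate))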


Some other EC2 things to know about:

EC2 instances are virtualized computing environments running on top of the Xen hypervisor. As these are virtualized resources, performance will not be as good as running on the equivalent hardware directly. On top of the overhead of running things on top of Xen, there's the issue that your instance(s) are usually "competing" for hardware resources with the other instances running on that same server box, which reduces performance even more.

EC2 instances can fail. They don't magically "move" themselves if the underlying hardware fails, so you will need some sort of backup/recovery plan. We had EC2 instances die on us, though their failure rate seems lower than that of the typical hardware you would build for yourself to run in a data center. Also, the local disk storage on instances that aren't EBS-backed will go poof if your instance dies or you terminate it explicitly. This means you'll need somewhere more durable to store the files/results that you can't recreate. Usually this means either sending the files you want to preserve to S3 or mounting an EBS volume on the instance and writing/copying the files there.
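For the "send the files you want to preserve to S3" option, a minimal sketch using the boto library (the Python AWS SDK of this era); the bucket and key names are made up and the bucket is assumed to already exist:

Code:
import boto
from boto.s3.key import Key

conn = boto.connect_s3()                        # picks up your AWS credentials
bucket = conn.get_bucket("my-beast-results")    # pre-existing bucket (made-up name)
key = Key(bucket, "run_seed101/beast.log")
key.set_contents_from_filename("run_seed101.log")   # upload the local log file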

The performance of the same instance type can vary from instance to instance. Not only can the underlying hardware vary somewhat, changing the performance, but where the underlying server is racked and set up, and what other people's instances on that hardware are doing, can affect performance (as mentioned above). Some people will launch an instance, run some I/O tests on it, kill it if it doesn't meet some minimum standard for that instance type, and repeat until they get a set of instances they are satisfied with.

Disk and network I/O on EC2 is notoriously "flaky", in that the latency and throughput can vary dramatically at any given time. Amazon has been trying to mitigate some of that by offering instance types with SSDs and better network performance, but it's still not as good as using your own hardware. Your stuff is probably not going to bottleneck on this sort of thing, though, so you probably don't need to worry about it.


[1] This seems like a good opportunity to publish a paper and get a free conference trip if you are willing to do the work.

[2] An m1.large costs $0.240 per hour as an On-Demand US East instance, compared to $0.145 per hour for a c1.medium.

[3] http://www.nvidia.com/docs/IO/105880/DS-Tesla-M-Class-Aug11.pdf

Edit: [4] Actually it may be ~10x as fast as there are 2 M2050 in that instance
« Last Edit: July 24, 2013, 11:25:43 AM by Trippy »
K9
Terracotta Army
Posts: 7441


Reply #20 on: July 24, 2013, 11:59:49 AM

Trippy, that's a phenomenal breakdown, thanks so much for taking the time. It's getting towards the end of the day here, so I'll probably need to re-read it all again tomorrow and see that it makes sense.

If I get one of the free micro instances running, I should be able to load BEAST and my runfile into it fairly simply and set it running? I think this would be the next step, while considering the power:cost voodoo. Is it a matter of effectively opening a remote desktop and then dragging and dropping the files and software in, or do I need some (semi-)permanent Amazon skydrive folder for the instance to read and write from?

Since he says it is possible to combine outputs of separate BEAST runs, I'm not sure that piling all the cores into one image is needed at all, if you can just get a few dozen shitboxes going it might be very cost effective.  Of course if you want 100 cores you'll need at least 50 shitboxes, so now we're looking at power requirements.

Curious about how BEAST handles interrupted workloads?  Can it resume a workload or is it lost when a machine goes offline?

I think if the program dies, it dies. It writes to a logfile that you can combine with other logfiles, though. Essentially it is searching through a space of increasingly likely tree topologies, and you can smush the trees it writes out together, after a fashion, to improve your search. Ideally, though, the longer the run the better.

Some sort of GPU box is probably plan B, or plan A depending on how my supervisor sees things.

I love the smell of facepalm in the morning
Yegolev
Moderator
Posts: 24440

2/10 WOULD NOT INGEST


Reply #21 on: July 24, 2013, 12:15:56 PM

This Amazon stuff is obviously tailored for non-commercial applications, and I'll be very interested to find out how this plays out.  I work in a space where random slowdowns and colliding workloads are not generally tolerated.  I'm also curious about how much performance Xen saps, as well as how it manages workload, just from an academic perspective.

Why am I homeless?  Why do all you motherfuckers need homes is the real question.
They called it The Prayer, its answer was law
Mommy come back 'cause the water's all gone
Trippy
Administrator
Posts: 23628


Reply #22 on: July 24, 2013, 12:28:20 PM

If I get one of the free micro instances running, I should be able to load BEAST and my runfile into it fairly simply and set it running? I think this would be the next step, while considering the power:cost voodoo. Is it a matter of effectively opening a remote desktop and then dragging and dropping the files and software in, or do I need some (semi-)permanent Amazon skydrive folder for the instance to read and write from?
You are a Windows-user aren't you? Ohhhhh, I see. awesome, for real

Running Windows on EC2 is ridiculously expensive -- it adds 50% to the cost of running an instance. Do not run Windows on EC2. Learn to love Unix/Linux. If you have to run Windows you might want to check prices and performance on Azure.

On Unix the normal way this works is to use ssh (scp) to copy the files from your local machine to the EC2 instance running Linux. The instance is set up with an ssh key when you launch it.
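A minimal sketch of that launch-then-copy workflow using the boto library (the Python AWS SDK of this era); the AMI id, key pair name, and login user are placeholders, not real values:

Code:
import time
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")
res = conn.run_instances("ami-xxxxxxxx",          # placeholder Linux AMI id
                         key_name="my-keypair",   # key pair created beforehand
                         instance_type="c1.medium",
                         security_groups=["default"])
inst = res.instances[0]
while inst.state != "running":                    # wait for the instance to boot
    time.sleep(10)
    inst.update()
print(inst.public_dns_name)

# Then from your local machine (the login user depends on the AMI, e.g. ec2-user):
#   scp -i my-keypair.pem run.xml ec2-user@<public-dns>:
#   ssh -i my-keypair.pem ec2-user@<public-dns>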
Trippy
Administrator
Posts: 23628


Reply #23 on: July 24, 2013, 12:31:17 PM

This Amazon stuff is obviously tailored for non-commercial applications, and I'll be very interested to find out how this plays out.  I work in a space where random slowdowns and colliding workloads are not generally tolerated.  I'm also curious about how much performance Xen saps, as well as how it manages workload, just from an academic perspective.
Huh? Netflix and a large number of other commercial companies run stuff on EC2 and AWS. This is why when AWS goes down a large swath of the Web goes down with it.
K9
Terracotta Army
Posts: 7441


Reply #24 on: July 24, 2013, 12:38:32 PM

If I get one of the free micro instances running, I should be able to load BEAST and my runfile into it fairly simply and set it running? I think this would be the next step, while considering the power:cost voodoo. Is it a matter of effectively opening a remote desktop and then dragging and dropping the files and software in, or do I need some (semi-)permanent Amazon skydrive folder for the instance to read and write from?
You are a Windows-user aren't you? Ohhhhh, I see. awesome, for real

Running Windows on EC2 is ridiculously expensive -- it adds 50% to the cost of running an instance. Do not run Windows on EC2. Learn to love Unix/Linux. If you have to run Windows you might want to check prices and performance on Azure.

On Unix the normal way this works is to use ssh (scp) to copy the files from your local machine to the EC2 instance running Linux. The instance is set up with an ssh key when you launch it.


I have to use what I'm given ok!  Oh ho ho ho. Reallllly? Although frankly, while I have a Linux machine at home, Windows is by and large so much easier day-to-day for me as a user that I'm not complaining. Linux does have a slight tendency to be dick-punchy at times in my experience.

Is there any particular difference with the Linux instance? I assume it doesn't matter what machine you use to remote desktop or SSH into the instance with?

This Amazon stuff is obviously tailored for non-commercial applications, and I'll be very interested to find out how this plays out.  I work in a space where random slowdowns and colliding workloads are not generally tolerated.  I'm also curious about how much performance Xen saps, as well as how it manages workload, just from an academic perspective.

I'll try and give some feedback when I have some, from the perspective of a non-technical user.

I love the smell of facepalm in the morning
Trippy
Administrator
Posts: 23628


Reply #25 on: July 24, 2013, 12:42:46 PM

Generally you don't run a windowing environment on a Linux EC2 instance cause that would be painfully slow and a resource hog. It also wouldn't solve your file transfer problem (would need to run something like Samba for that). You can ssh to your Linux EC2 instance from any environment that supports ssh including Windows.
Ironwood
Terracotta Army
Posts: 28240


Reply #26 on: July 24, 2013, 12:44:24 PM

By the looks of it, BEAST will Unix up :  You really ought to, since as Trippy points out, it will bring the cost tumbling down.

We go mostly Windows Instances, except for Zabbix, and the costs can get mighty.

EDIT :  and apparently more discussion while typing.  Ah well.
« Last Edit: July 24, 2013, 12:55:57 PM by Ironwood »

"Mr Soft Owl has Seen Some Shit." - Sun Tzu
K9
Terracotta Army
Posts: 7441


Reply #27 on: July 24, 2013, 12:55:06 PM

Thanks for clearing that up. To clarify though, once the instance terminates, everything in it goes, so I'd need to SSH in to put everything in there, and then again to get everything out, is that right? Can the instance write the activity log to a command prompt or something too?

I love the smell of facepalm in the morning
Ironwood
Terracotta Army
Posts: 28240


Reply #28 on: July 24, 2013, 01:26:01 PM

You can use either snapshots or EBS to keep them if you want.  That way you can spin them up again from base.
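For the snapshot route, a minimal sketch with boto (the Python AWS SDK of this era) that bakes a configured, EBS-backed instance into an AMI you can relaunch later; the instance id and names are placeholders:

Code:
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")
ami_id = conn.create_image("i-0123abcd",              # placeholder instance id
                           "beast-worker-base",
                           description="BEAST + BEAGLE preinstalled")
print("relaunch later with run_instances('%s', ...)" % ami_id)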

"Mr Soft Owl has Seen Some Shit." - Sun Tzu
K9
Terracotta Army
Posts: 7441


Reply #29 on: July 24, 2013, 01:30:17 PM

Is that part of the EC2, or a separate program I should use? Please excuse my ignorance.

I love the smell of facepalm in the morning
Ironwood
Terracotta Army
Posts: 28240


Reply #30 on: July 24, 2013, 01:33:37 PM

Part of it.

"Mr Soft Owl has Seen Some Shit." - Sun Tzu
K9
Terracotta Army
Posts: 7441


Reply #31 on: July 24, 2013, 01:54:20 PM

Grand, thanks a lot!

I love the smell of facepalm in the morning
K9
Terracotta Army
Posts: 7441


Reply #32 on: July 24, 2013, 01:54:56 PM

Maybe I'll write this up as F13's most boring radicalthon ever.

I love the smell of facepalm in the morning
Ironwood
Terracotta Army
Posts: 28240


Reply #33 on: July 24, 2013, 02:16:23 PM

Just let us know when you get to the End-Boss.  He's hard.

 Ohhhhh, I see.

"Mr Soft Owl has Seen Some Shit." - Sun Tzu
Trippy
Administrator
Posts: 23628


Reply #34 on: July 24, 2013, 02:20:01 PM

EBS is effectively a network storage drive. You can create an EBS volume of any size you need and then "attach" it to an instance, mounting it as a partition on that instance (analogous to mapping a network share to a drive letter on Windows). If the instance an EBS volume is attached to goes away the EBS volume still persists and you can attach it to another instance to get at the contents again (it has to be attached somewhere, though, you can't access it independently from an EC2 instance). EBS volumes are very very reliable (much more reliable than a network storage array most people can build for themselves). We never had one lose data or "go bad" when we were using them. They also have a built-in backup feature where you can take a "snapshot" of an EBS volume which gets stored on S3. They are, however, not very performant unless you pay more and even then you are still not getting the performance of a "local" drive.
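A minimal sketch of that EBS workflow with boto (the Python AWS SDK of this era): create a volume in the instance's availability zone, attach it, snapshot it later. The ids, size, and device name are placeholders:

Code:
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")
vol = conn.create_volume(100, "us-east-1a")             # 100 GB, same AZ as the instance
conn.attach_volume(vol.id, "i-0123abcd", "/dev/sdf")    # then mkfs + mount on the instance
# ... run BEAST with its log directory on the mounted volume ...
conn.create_snapshot(vol.id, "beast logs, run 1")       # snapshot is stored in S3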

In your case it's probably easiest for you to write your activity log directly to an EBS volume. This might actually slow down performance somewhat (assuming BEAST log writing is synchronous, blocking I/O), but it's easier than rolling your own solution to back up the activity log somewhere else like S3.

If you had the requisite Unix skills, a more clever way would be to send your activity log through something like syslog, which forwards the log entries to a central place, or to collect them into a DB running somewhere. That way when you have multiple instances working you can see all the log entries in one place rather than having to check each instance individually. But that takes work.
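A minimal sketch of that idea in Python: tail the BEAST log on each instance and forward new lines to a central syslog host with the standard logging module. The hostname and file name are placeholders:

Code:
import logging
import logging.handlers
import time

logger = logging.getLogger("beast")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.SysLogHandler(
    address=("loghost.example.com", 514)))       # placeholder central syslog host

with open("run_seed101.log") as f:
    f.seek(0, 2)                                 # start at the end of the file
    while True:
        line = f.readline()
        if line:
            logger.info(line.rstrip())           # forward each new log line
        else:
            time.sleep(5)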