f13.net

f13.net General Forums => General Discussion => Topic started by: K9 on July 23, 2013, 01:37:43 PM



Title: Cloud computing services
Post by: K9 on July 23, 2013, 01:37:43 PM
I'm just wondering if anyone here has any experience with any of the available cloud computing services that now exist.

I'm looking at some CPU-intensive jobs for some research I'm doing which could easily require a few months of runtime each, although I have the option to do more shorter runs, or fewer longer runs, and I'm not sure where the balance is. I have read through bits of the Amazon EC2 (http://aws.amazon.com/ec2/) site, but I was wondering if someone here could explain this to me like I'm five. My main confusion is over the difference between things like light vs medium vs heavy utilisation and the size of the cluster. I assume more $$ means more speed, but is there a sweet spot I should be aiming for?

I'm guessing that because I don't want to buy instances for a whole year (just for as long as the job takes) I'm probably better off going for the on-demand services rather than reserving a whole instance for myself?


Title: Re: Cloud computing services
Post by: Ghambit on July 23, 2013, 02:00:26 PM
The only bit I know of is the free 750hrs (which is about a month straight) they were hammering on academia to use.  I believe there MIGHT be more free time if you're affiliated as well (a school, IEEE, ACM, whatever).  If I were you, I'd just try it out for the free time and see how it feels.

There's a lot of damned hype around EC2 so it can't be all that bad.


Title: Re: Cloud computing services
Post by: Ironwood on July 23, 2013, 02:06:57 PM
What is it you're actually trying to accomplish and what were you going to use ?


Title: Re: Cloud computing services
Post by: K9 on July 23, 2013, 02:09:25 PM
Yea, I was looking at the free 750 hours; as I read it, that's limited to a single Micro instance anyway, so a lot of the confusion over what to use goes away. I just wonder where the value-for-money comes in once you start wanting or needing more CPU power.

Cheers.


Title: Re: Cloud computing services
Post by: K9 on July 23, 2013, 02:11:08 PM
What is it you're actually trying to accomplish and what were you going to use ?

Phylogenetic analysis; I need to build a lot of trees. Hundreds of millions of them in fact, and some of them are awkwardly large. I have access to a HPC cluster here, but it has a tendency to get switched off at weekends sometimes, and I may need to run things for weeks or months to get enough of a sample.


Title: Re: Cloud computing services
Post by: Lantyssa on July 23, 2013, 03:42:10 PM
You're academia, aren't you?  Can't you get a grant to run on several of the large clusters such as PNL?  Are those restricted to US groups only?


Title: Re: Cloud computing services
Post by: Trippy on July 23, 2013, 03:52:46 PM
Yes I can go into all sorts of gory details about EC2 as I used it daily for over 2 years.

Can you be more specific about what this phylogenetic analysis task entails? E.g. is it mostly integer-based arithmetic or floating-point? Is the task highly parallelizable (uses multiple threads and/or processes)? How much memory and disk I/O are involved?


Title: Re: Cloud computing services
Post by: Salamok on July 23, 2013, 09:07:31 PM
The shitty part of EC2 is that it is confusing as hell to figure out what an elastic compute unit is and how many of them you need; the upside is that if you go with no prepaid plan or contract you can pull the plug at any time w/o being bound by any sort of commitment.  So you can set up some sort of test run and keep a close eye on what it is costing you.

Amazon seems to be ridiculously far ahead of what everyone else is offering in this space (VPC, ElastiCache, DynamoDB and the shitload of other service types they offer above and beyond a simple server or database), but it sounds like you may only need raw compute time; if that's the case you might find a cheaper alternative elsewhere.  The people I meet that are heavily cloud-based are using Amazon to spin up large Hadoop clusters to process real-time analytics for extremely high-traffic sites (they also bitch about this shit going down all the time).  Hadoop also seems like it would work really well for what you are describing.

That all said from what I can tell Amazon doesn't seem to be in the business of building your solution as much as they are in the business of providing you a platform so while there is good information available on how to do stuff and Amazon is willing to discuss some details on how to go about it (and point you to more documentation) it is still going to be up to you to configure it all.


P.S. - IIRC the trial developer micro instance includes a concurrent database instance as well (and a few other things) so you get 750 hours ec2 + 750 hours of rds..


Title: Re: Cloud computing services
Post by: Ghambit on July 23, 2013, 10:58:15 PM
They've got a big conference in Vegas comin up in Nov.; boot camps and all.  Even a gamedev section.  'Twer me I'd expense it out and go.
https://reinvent.awsevents.com/index.html



Title: Re: Cloud computing services
Post by: Kageru on July 23, 2013, 11:08:13 PM

I look forward to hearing the executive summary. Is buying computing power that way cheaper than getting some second-hand PCs and letting them grind 24/7?


Title: Re: Cloud computing services
Post by: Trippy on July 23, 2013, 11:17:53 PM
No it isn't.


Title: Re: Cloud computing services
Post by: Quinton on July 23, 2013, 11:39:57 PM
Operating your own hardware does have additional costs, but yeah, in general dollar-per-unit-of-compute is going to be less if you do so.

The advantage of cloud compute stuff is being on-demand, being able to pay for 800 cores for 1 hour instead of running your own machine for a month (assuming your work is parallelizable), being able to easily scale to meet demand if you need it, not having to deal with your own physical machine deployment, upgrades, replacements, etc.

If you're doing computationally demanding stuff or just work that requires continuous CPU load while running, you won't want Amazon's "micro" instances -- they're designed for low cost, low utilization, small burst stuff:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts_micro_instances.html


Title: Re: Cloud computing services
Post by: K9 on July 24, 2013, 03:18:48 AM
You're academia, aren't you?  Can't you get a grant to run on several of the large clusters such as PNL?  Are those restricted to US groups only?

I'm not sure what PNL is, but I looked at the academic section of Amazon and their quarterly grant application window closed about a week ago, and I'm not really keen to wait around for months for capacity, although I'll probably put some sort of request in on the off chance that it goes through.

Yes I can go into all sorts of gory details about EC2 as I used it daily for over 2 years.

Can you be more specific about what this phylogenetic analysis task entails? E.g. is it mostly integer-based arithmetic or floating-point? Is the task highly parallelizable (uses multiple threads and/or processes)? How much memory and disk I/O are involved?


Cheers Trippy. The software is called BEAST (http://beast.bio.ed.ac.uk/Main_Page), and it's one of the most widely used tools for phylogenetic analysis in research. I'm not great at the low-level side of things, so I'm not sure what sort of arithmetic it uses, but I do know it has options for running on GPUs, so I'm guessing it is optimised for whatever those use. In terms of memory, as best I'm aware it can run out of memory with large trees, but otherwise it isn't particularly memory-intensive.

The executive summary of the software:

Quote
BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability.

In terms of disk I/O, it has to write to a logfile, and these can get quite large (100s of MB to 10s of GB). The software itself is not inherently parallelizable (as I understand the meaning of the word), however you can run multiple independent instances and combine the output. This is done quite commonly to improve sample sizes within a shorter timeframe at the cost of CPU power.

Also, in my experience you don't want more than one instance of BEAST running on the same processor, because they can interfere with each other. As such, the idea of being able to partition runs either across or within instances has appeal. The other advantage (as I see it) is that Amazon guarantees uptime and instant (or near-instant) access to the processor power. So I could kick my runs off and then leave them ticking for a month or so.
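The "multiple independent runs, combine the output afterwards" pattern described above is easy to sketch in shell; in this toy version `echo` stands in for a real BEAST invocation and the filenames are made up, purely to show the shape:

```shell
# Launch three independent "chains" in the background, each writing its own log.
# The echo subshell is a stand-in for a real BEAST run.
for i in 1 2 3; do
  ( echo "chain $i: tree sample 1"; echo "chain $i: tree sample 2" ) > "run$i.log" &
done
wait                                            # let all chains finish
cat run1.log run2.log run3.log > combined.log   # pool the samples
wc -l < combined.log                            # prints 6: all samples, all chains
```

On EC2 the same shape works across instances instead of background jobs: one run per instance, then copy the logs to one place and combine them.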


Title: Re: Cloud computing services
Post by: Ironwood on July 24, 2013, 03:24:17 AM
This is all fantastically doable, though Trippy will be much better than me at the headhurting technical ins and outs, but I have to warn you straight off the bat that what you're thinking about doing is looking to end up as a costly proposition.  Do you have your budget in order ?


Title: Re: Cloud computing services
Post by: Yegolev on July 24, 2013, 07:24:02 AM
It's definitely doable, but I'm interested to see what happens when regular people need on-demand computing.

Note that the term "cloud" is far too vague, which is why everyone is asking what you really need done.  Sounds like you need a computing system that will remain online for a month or three, and really you only need it to remain powered and operational for the duration, which is doable via your own hardware.  I suppose the question is: how much CPU/RAM/storage do you need?  This capacity planning can be done best by engaging people experienced with BEAST implementations.


Title: Re: Cloud computing services
Post by: K9 on July 24, 2013, 08:52:49 AM
Thanks for the comments; I agree that the nomenclature is frustratingly vague. Essentially my problem is that I have some computational work that I could run on my desktop if time weren't an issue, but it is, and the HPC cluster I have access to is a finite resource. I think part of my problem is that because AWS is trying to offer every possible cloud solution, it's not entirely clear to someone like me which one I want. Buying extra capacity is an alternative, but there's definitely an appeal to renting 100 cores to get the runs done in two weeks rather than buying an 8-core machine and getting it done in months.

It may prove to be cost-prohibitive, although I do have some grant money to cover stuff like this.

I've been through the BEAST mailing list, and I can't find any discussion of this. I'll probably ask there once I have actually worked out how to get into an AWS instance, but I felt people here would probably have more experience with the fundamental 'how the fuck do I even use this' end of cloud computing.


Title: Re: Cloud computing services
Post by: Salamok on July 24, 2013, 09:06:10 AM
The fact that BEAST can be GPU based (you can do GPU computing on Amazon but their graphics cards may not be the best for what you are doing) leads me to think that building your own machine(s) might be reasonable.  Find some bitcoin farmers cost/performance chart and build a few of those.

I was looking at this how-to guide for bitcoin farming using AWS (https://bitcointalk.org/index.php?PHPSESSID=lb0gn937doaj32fannrr7mmtb0&topic=8405.0) and the xlarge instance the guy was using doesn't "mine" nearly as well as a reasonably priced DIY local machine.  Not that what you are doing is probably as simple a calculation as churning through bitcoin hashes, but still, if someone took the time to optimize BEAST for GPUs then it is probably because that is a more cost-effective way to run it.  Although one of the advantages of the cloud is the one click to spin up a somewhat tuned image of the OS of your choice in a few minutes.


Title: Re: Cloud computing services
Post by: Yegolev on July 24, 2013, 10:59:53 AM
Since he says it is possible to combine outputs of separate BEAST runs, I'm not sure that piling all the cores into one image is needed at all; if you can just get a few dozen shitboxes going it might be very cost-effective.  Of course if you want 100 cores you'll need at least 50 shitboxes, so now we're looking at power requirements.

Curious about how BEAST handles interrupted workloads?  Can it resume a workload or is it lost when a machine goes offline?


Title: Re: Cloud computing services
Post by: Salamok on July 24, 2013, 11:15:42 AM
Since he says it is possible to combine outputs of separate BEAST runs, I'm not sure that piling all the cores into one image is needed at all, if you can just get a few dozen shitboxes going it might be very cost effective.  Of course if you want 100 cores you'll need at least 50 shitboxes, so now we're looking at power requirements.

Curious about how BEAST handles interrupted workloads?  Can it resume a workload or is it lost when a machine goes offline?

From the bitcoin mining stuff I have read, a $150 graphics card outperforms a shitbox by at least a single order of magnitude, so a single machine with 4x graphics cards would probably outperform 50 shitboxes significantly.

edit - actually the how to guide I posted has a breakdown of work done by the CPU vs. GPU: GPU calculated ~75,000khash/s, while each CPU core did approx 1,400khash/s

edit2 - This is all operating under the huge assumption that BEAST calculations per second are in some way similar to bitcoin hashes per second.  There are tons of ways that could be a faulty assumption, but the fact that the makers of BEAST decided to add GPU support leads me to believe that at the very least GPUs are better for the task than CPUs in some way.


Title: Re: Cloud computing services
Post by: Trippy on July 24, 2013, 11:20:18 AM
Okay I did some Googling on this BEAST thingy and it doesn't seem like anybody has published any benchmarks on the different instance types so you are likely going to need to do this benchmarking yourself or find somebody who has the UNIX and AWS skills to set this up for you[1].

I did find a couple of benchmark pages here:

http://beast.bio.ed.ac.uk/Benchmarks
http://www.phylo.org/tools/beast_how_fast.html

so I'll use those later to make some educated guesses on what you'll want to get.

First the basics of choosing an EC2 instance for CPU-intensive tasks. I'll ignore the cluster instance types (the ones with dedicated GPUs) for now. Amazon uses the term "EC2 Compute Unit" (ECU) to measure the CPU performance of their instance types. An ECU is defined as:

Quote
EC2 Compute Unit (ECU) – One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

As a simplification you can think of ECUs as GHz. So a single virtual CPU (vCPU) that's rated at, say, 2.5 ECUs you can think of as running at 2.5 GHz on a Xeon or Opteron CPU.

This table here shows you the vCPU and ECU ratings for the different instance types:

https://aws.amazon.com/ec2/instance-types/#measuring-performance

You have to divide the ECUs by the vCPUs to get the ECUs per vCPU. E.g. on a m1.large there are 2 vCPUs and 4 ECUs so each vCPU is running at 2 ECUs (2 GHz). On a c1.medium there are 2 vCPUs and 5 ECUs so each vCPU is running at 2.5 ECUs (2.5 GHz).

So the m1.large has less CPU performance than the c1.medium but it costs more to run[2] because it has a lot more memory available (7.5 GB vs. 1.7 GB).

To optimize your budget, what you would want to do in theory is run one or more benchmarks on a sampling of instance types, figure out the cost per "throughput unit" of the different instance types, and then pick the instance type that gives you the most bang for the buck.

I did something similar at the company I work at when we were still using EC2. One of the things we do is transcode and mix audio files (e.g. mix two WAV files together and convert the resulting WAV into an AAC/M4A file). So I set up a benchmark with a large fixed set of jobs to render (mix and transcode), ran a varying number of "worker" renderer processes (to figure out the optimum number of workers to run per instance type) on the main instance types, got the time it took on each instance and worker-count combination, and calculated a "cost per render" to pick the most cost-effective instance type to use.


Amazon also now offers instances with NVIDIA GPUs which presumably would work with BEAST + BEAGLE. Looking at the first benchmark page linked above on benchmark2 the run using the GTX 285 GPU ran 5.6 times faster than the run using beagle_sse (0.84 minutes vs. 4.74 minutes).

The Tesla M2050 looks to be about same performance as the GTX 285 in terms of GFLOPs[3] though it's hard to say for sure since most GFLOPs numbers I've seen for the GTX 285 don't specify single precision or double precision (I'll assume they are the same performance for now).

Doing a rough back of the envelope calculation:

GPU instance has 16 vCPU / 33.5 ECU = 2.1 ECU per vCPU
GPU on-demand instance cost: $2.100 per Hour

c1.medium has 2 vCPU / 5 ECU = 2.5 ECU per vCPU
c1.medium on-demand instance cost: $0.145 per Hour

7x c1.medium = 14 vCPU / 35 ECU = $1.015 per Hour

So 7x c1.medium has roughly equivalent CPU power to the GPU instance (not including using the GPUs) and is about half as expensive. However, benchmark2 runs 5.6 times faster on a GTX 285, so it would be cheaper to run benchmark2 on the GPU instance using the GPUs (2x more expensive but ~5x as fast)[4]. This assumes you can split your work onto multiple CPUs / instances and still get linear (with a slope of one) scalability, and that your work benefits as much from using the GPU as benchmark2 does.
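To sanity-check the arithmetic above, here's the cost-per-ECU-hour comparison as a one-liner (prices and ECU counts are the 2013 on-demand figures quoted in this post, so treat them as illustrative):

```shell
# Dollars per ECU-hour for each option, using the prices quoted above.
awk 'BEGIN {
  printf "GPU instance   %.4f\n", 2.100 / 33.5        # 33.5 ECU at $2.100/hr
  printf "c1.medium      %.4f\n", 0.145 / 5.0         # 5 ECU at $0.145/hr
  printf "7x c1.medium   %.4f\n", (7 * 0.145) / 35.0  # 35 ECU at $1.015/hr
}'
```

Per ECU-hour the c1.medium route costs about half as much (0.0290 vs 0.0627), matching the "about half as expensive" figure; the GPU instance only wins if the workload actually gets the ~5x GPU speedup.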


Some other EC2 things to know about:

EC2 instances are virtualized computing environments running on top of the Xen hypervisor. As these are virtualized resources, performance will not be as good as running on the equivalent hardware directly. On top of the overhead of running things on top of Xen, there's the issue that your instance(s) are usually "competing" for hardware resources with the other instances running on that same server box, which reduces performance even more.

EC2 instances can fail. They don't magically "move" themselves if the underlying hardware fails, so you will need some sort of backup/recovery plan. We had EC2 instances die on us, though their failure rate seems lower than the typical hardware you would build for yourself to run in a data center. Also, the local disk storage on instances that aren't EBS-backed will go poof if your instance dies or you terminate it explicitly. This means you'll need somewhere more durable to store the files / results that you can't recreate. Usually this means either sending the files you want to preserve to S3 or mounting an EBS volume on the instance and writing/copying the files there.

The performance of the same instance type can vary from instance to instance. Not only can the underlying hardware vary somewhat, changing the performance, but where the underlying server is racked and set up, and what other people's instances on that hardware are doing, can affect performance (as mentioned above). Some people will launch an instance, run some I/O tests on it, and then kill it if it doesn't meet some minimum standard for that instance type, repeating until they get a set of instances they are satisfied with.

Disk and network I/O on EC2 is notoriously "flaky" as in the latency and throughput can vary dramatically at any given time. Amazon has been trying to mitigate some of that by offering instance types with SSDs and better network performance but it's still not as good as using your own hardware. Your stuff is probably not going to bottleneck on this sort of thing, though, so you probably don't need to worry about this stuff.


[1] This seems like a good opportunity to publish a paper and get a free conference trip if you are willing to do the work.

[2] An m1.large costs $0.240 per Hour with an On-Demand US East instance compared to $0.145 per Hour for a c1.medium.

[3] http://www.nvidia.com/docs/IO/105880/DS-Tesla-M-Class-Aug11.pdf

Edit: [4] Actually it may be ~10x as fast as there are 2 M2050 in that instance


Title: Re: Cloud computing services
Post by: K9 on July 24, 2013, 11:59:49 AM
Trippy, that's a phenomenal breakdown, thanks so much for taking the time. It's getting towards the end of the day here, so I'll probably need to re-read it all again tomorrow and see that it makes sense.

If I get one of the free micro instances running, I should be able to load BEAST and my runfile into it fairly simply and set it running? I think this would be the next step, while considering the power:cost voodoo. Is it a matter of effectively opening a remote desktop and then dragging and dropping the files and software in, or do I need some (semi-)permanent Amazon skydrive folder for the instance to read and write from?

Since he says it is possible to combine outputs of separate BEAST runs, I'm not sure that piling all the cores into one image is needed at all, if you can just get a few dozen shitboxes going it might be very cost effective.  Of course if you want 100 cores you'll need at least 50 shitboxes, so now we're looking at power requirements.

Curious about how BEAST handles interrupted workloads?  Can it resume a workload or is it lost when a machine goes offline?

I think if the program dies it dies. It writes to a logfile that you can combine with other logfiles, though. Essentially it is searching through a space of increasingly likely tree topologies, and you can smush the trees it writes out together after a fashion to improve your search. Ideally, though, the longer the run the better.

Some sort of GPU box is probably plan B, or plan A depending on how my supervisor sees things.


Title: Re: Cloud computing services
Post by: Yegolev on July 24, 2013, 12:15:56 PM
This Amazon stuff is obviously tailored for non-commercial applications, and I'll be very interested to find out how this plays out.  I work in a space where random slowdowns and colliding workloads are not generally tolerated.  I'm also curious about how much performance Xen saps, as well as how it manages workload, just from an academic perspective.


Title: Re: Cloud computing services
Post by: Trippy on July 24, 2013, 12:28:20 PM
If I get one of the free micro instances running, I should be able to load BEAST and my runfile into it fairly simply and set it running? I think this would be the next step, while considering the power:cost voodoo. Is it a matter of effectively opening a remote desktop and then dragging and dropping the files and software in, or do I need some (semi-)permanent Amazon skydrive folder for the instance to read and write from?
You are a Windows-user aren't you? :oh_i_see: :awesome_for_real:

Running Windows on EC2 is ridiculously expensive -- it adds 50% to the cost of running an instance. Do not run Windows on EC2. Learn to love Unix/Linux. If you have to run Windows you might want to check prices and performance on Azure.

On Unix the normal way this works is to use ssh (scp) to copy the files from your local machine to the EC2 instance running Linux. The instance is set up with an ssh key when you launch it.
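For the record, the round trip looks roughly like this. The key filename and hostname below are placeholders (the real ones come from the EC2 console when you launch the instance), so the commands are echoed rather than executed:

```shell
KEY=~/.ssh/my-ec2-key.pem                                # placeholder: your key pair file
HOST=ec2-user@ec2-203-0-113-10.compute-1.amazonaws.com   # placeholder: instance public DNS
# Echoed, not run, since the host above doesn't exist:
echo scp -i "$KEY" beast-input.xml "$HOST":~/            # copy the run file up
echo ssh -i "$KEY" "$HOST"                               # log in to kick off the run
echo scp -i "$KEY" "$HOST":~/beast.log ./                # copy the results back down
```

Once logged in it's an ordinary Linux session; starting the job under nohup or screen keeps it running after you disconnect.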


Title: Re: Cloud computing services
Post by: Trippy on July 24, 2013, 12:31:17 PM
This Amazon stuff is obviously tailored for non-commercial applications, and I'll be very interested to find out how this plays out.  I work in a space where random slowdowns and colliding workloads are not generally tolerated.  I'm also curious about how much performance Xen saps, as well as how it manages workload, just from an academic perspective.
Huh? Netflix and a large number of other commercial companies run stuff on EC2 and AWS. This is why when AWS goes down a large swath of the Web goes down with it.


Title: Re: Cloud computing services
Post by: K9 on July 24, 2013, 12:38:32 PM
If I get one of the free micro instances running, I should be able to load BEAST and my runfile into it fairly simply and set it running? I think this would be the next step, while considering the power:cost voodoo. Is it a matter of effectively opening a remote desktop and then dragging and dropping the files and software in, or do I need some (semi-)permanent Amazon skydrive folder for the instance to read and write from?
You are a Windows-user aren't you? :oh_i_see: :awesome_for_real:

Running Windows on EC2 is ridiculously expensive -- it adds 50% to the cost of running an instance. Do not run Windows on EC2. Learn to love Unix/Linux. If you have to run Windows you might want to check prices and performance on Azure.

On Unix the normal way this works is to use ssh (scp) to copy the files from your local machine to the EC2 instance running Linux. The instance is set up with an ssh key when you launch it.


I have to use what I'm given ok!  :grin: Although frankly, while I have a linux machine at home, windows is by and large so much easier day-to-day for me as a user I'm not complaining. Linux does have a slight tendency to be dick-punchy at times in my experience.

Is there any particular difference with the linux instance? I assume it doesn't matter what machine you use to remote desktop or SSH into the instance with?

This Amazon stuff is obviously tailored for non-commercial applications, and I'll be very interested to find out how this plays out.  I work in a space where random slowdowns and colliding workloads are not generally tolerated.  I'm also curious about how much performance Xen saps, as well as how it manages workload, just from an academic perspective.

I'll try and give some feedback when I have some, from the perspective of a non-technical user.


Title: Re: Cloud computing services
Post by: Trippy on July 24, 2013, 12:42:46 PM
Generally you don't run a windowing environment on a Linux EC2 instance, 'cause that would be painfully slow and a resource hog. It also wouldn't solve your file transfer problem (you'd need to run something like Samba for that). You can ssh to your Linux EC2 instance from any environment that supports ssh, including Windows.


Title: Re: Cloud computing services
Post by: Ironwood on July 24, 2013, 12:44:24 PM
By the looks of it, BEAST will Unix up: you really ought to, since as Trippy points out, it will bring the cost tumbling down.

We go mostly Windows Instances, except for Zabbix, and the costs can get mighty.

EDIT :  and apparently more discussion while typing.  Ah well.


Title: Re: Cloud computing services
Post by: K9 on July 24, 2013, 12:55:06 PM
Thanks for clearing that up. To clarify though, once the instance terminates, everything in it goes, so I'd need to SSH in to put everything in there, and then again to get everything out, is that right? Can the instance write the activity log to a command prompt or something too?


Title: Re: Cloud computing services
Post by: Ironwood on July 24, 2013, 01:26:01 PM
You can use either snapshots or EBS to keep them if you want.  That way you can spin them up again from base.


Title: Re: Cloud computing services
Post by: K9 on July 24, 2013, 01:30:17 PM
Is that part of the EC2, or a separate program I should use? Please excuse my ignorance.


Title: Re: Cloud computing services
Post by: Ironwood on July 24, 2013, 01:33:37 PM
Part of it.


Title: Re: Cloud computing services
Post by: K9 on July 24, 2013, 01:54:20 PM
Grand, thanks a lot!


Title: Re: Cloud computing services
Post by: K9 on July 24, 2013, 01:54:56 PM
Maybe I'll write this up as F13's most boring radicalthon ever.


Title: Re: Cloud computing services
Post by: Ironwood on July 24, 2013, 02:16:23 PM
Just let us know when you get to the End-Boss.  He's hard.

 :oh_i_see:


Title: Re: Cloud computing services
Post by: Trippy on July 24, 2013, 02:20:01 PM
EBS is effectively a network storage drive. You can create an EBS volume of any size you need and then "attach" it to an instance, mounting it as a partition on that instance (analogous to mapping a network share to a drive letter on Windows). If the instance an EBS volume is attached to goes away the EBS volume still persists and you can attach it to another instance to get at the contents again (it has to be attached somewhere, though, you can't access it independently from an EC2 instance). EBS volumes are very very reliable (much more reliable than a network storage array most people can build for themselves). We never had one lose data or "go bad" when we were using them. They also have a built-in backup feature where you can take a "snapshot" of an EBS volume which gets stored on S3. They are, however, not very performant unless you pay more and even then you are still not getting the performance of a "local" drive.

In your case it's probably easiest for you to write your activity log directly to an EBS volume. This might actually slow down the performance somewhat (assuming BEAST log writing is synchronous blocking I/O) but it's easier than rolling your own solution to backup the activity log somewhere else like S3.

If you had the requisite Unix skills a more clever way would be to send your activity log through something like syslog which forwards the log entries to a central place or collect them into a DB running somewhere. That way when you have multiple instances working you can see all the log entries in one place rather than having to check each instance individually. But that takes work.


Title: Re: Cloud computing services
Post by: K9 on July 24, 2013, 02:25:29 PM
Thanks Trippy, this is really helpful. I'll have a good dig through this tomorrow and see what I can make work. I really appreciate it.

Just let us know when you get to the End-Boss.  He's hard.

 :oh_i_see:

At the moment I feel like I'm wiping on Lord Marrowgar all over again, only this time I can't blame it on inept, semi-afk people  :awesome_for_real:


Title: Re: Cloud computing services
Post by: Samwise on July 24, 2013, 02:31:58 PM
I'm disappointed that this thread has yet to produce any gems via the cloud-to-butt translator plugin.


Title: Re: Cloud computing services
Post by: Ironwood on July 24, 2013, 02:33:55 PM
I would like you to know that it was considered.



Title: Re: Cloud computing services
Post by: Ironwood on July 24, 2013, 02:35:08 PM
Thanks Trippy, this is really helpful. I'll have a good dig through this tomorrow and see what I can make work. I really appreciate it.

Just let us know when you get to the End-Boss.  He's hard.

 :oh_i_see:

At the moment I feel like I'm wiping on Lord Marrowgar all over again, only this time I can't blame it on inept, semi-afk people  :awesome_for_real:

Well, the good news is you can make some IAM accounts for Trippy and I and he can do the clever bits and I can do the charging you for it.



Title: Re: Cloud computing services
Post by: Quinton on July 24, 2013, 07:31:47 PM
If I get one of the free micro instances running, I should be able to load BEAST and my runfile into it fairly simply and set it running? I think this would be the next step, while considering the power:cost voodoo. Is it a matter of effectively opening a remote desktop and then dragging and dropping the files and software in, or do I need some (semi-)permanent Amazon skydrive folder for the instance to read and write from?

The micro instances are designed for bursty use and will throttle your CPU usage if you try to run continuously.  They are not going to be good for trying to figure out how the larger instances will perform.  They should be fine for figuring out how to setup and configure your environment though.


Title: Re: Cloud computing services
Post by: Ingmar on July 24, 2013, 08:35:01 PM
At the moment I feel like I'm wiping on Lord Marrowgar all over again, only this time I can't blame it on inept, semi-afk people  :awesome_for_real:

CLOOOUDSTOOOORM


Title: Re: Cloud computing services
Post by: Viin on July 24, 2013, 08:59:49 PM
... They also have a built-in backup feature where you can take a "snapshot" of an EBS volume which gets stored on S3. They are, however, not very performant unless you pay more and even then you are still not getting the performance of a "local" drive. ...

Do you have to use an EBS volume to move data to an S3 bucket? Or is there another way to move data to an S3 bucket? (in the case of BEAST, maybe during a log roll?)


Title: Re: Cloud computing services
Post by: Trippy on July 24, 2013, 09:11:50 PM
No, you don't have to use an EBS volume to move something to S3. S3 is independent of EC2 and will work with anything that can "speak" HTTP.
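To illustrate the "anything that can speak HTTP" point: every S3 object is addressable at a plain URL, so any HTTP client can fetch a public object with no EC2 (or EBS) in sight. The bucket and key names below are invented for the example.

```python
# S3 is a key/value store spoken over HTTP: each object has a plain URL.
# Bucket and key names here are made up for illustration.

def s3_object_url(bucket, key):
    """Virtual-hosted-style URL for an S3 object; public objects are
    fetchable at this address with any HTTP client (curl, wget, urllib...)."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"

print(s3_object_url("my-beast-results", "run-01/trees.log"))
# https://my-beast-results.s3.amazonaws.com/run-01/trees.log
```

Private objects work the same way, except the request also needs signed authentication headers, which is what the various S3 tools and SDKs handle for you.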


Title: Re: Cloud computing services
Post by: Quinton on July 24, 2013, 11:31:05 PM
Yeah, there are a ton of tools out there for working with S3, including things like http://code.google.com/p/s3fs/, which uses FUSE to let you mount S3 buckets as remote filesystems, etc.


Title: Re: Cloud computing services
Post by: K9 on July 25, 2013, 05:17:39 AM
Thanks Trippy, this is really helpful. I'll have a good dig through this tomorrow and see what I can make work. I really appreciate it.

Just let us know when you get to the End-Boss.  He's hard.

 :oh_i_see:

At the moment I feel like I'm wiping on Lord Marrowgar all over again, only this time I can't blame it on inept, semi-afk people  :awesome_for_real:

Well, the good news is you can make some IAM accounts for Trippy and me, and he can do the clever bits and I can do the charging you for it.



Do you accept DKP?


Title: Re: Cloud computing services
Post by: Yegolev on July 25, 2013, 06:33:10 AM
Huh? Netflix and a large number of other commercial companies run stuff on EC2 and AWS. This is why when AWS goes down a large swath of the Web goes down with it.

Hey, I'm just going by your post which didn't paint the most rosy picture.  Load issues and service interruptions were prominent, at least in my mind.

I do imagine, though, that if you're Netflix then you're paying extra for better performance and uptime.  The EBS description is closer to what I expect from a SAN, though.  Small-time instances would get low-tier storage, high-dollar instances would get tier 1, and everybody getting snapshots is great.  I had to configure clone devices for anything I wanted to take a snapshot of, meaning I was grabbing double the storage and getting frowned upon by the SAN people, aka the Scrooge McDucks of Disks.

I reckon in my "spare time" I will look to see what sort of playtime I can have with this stuff.


Title: Re: Cloud computing services
Post by: Salamok on July 25, 2013, 06:59:45 AM
... They also have a built-in backup feature where you can take a "snapshot" of an EBS volume which gets stored on S3. They are, however, not very performant unless you pay more and even then you are still not getting the performance of a "local" drive. ...

Do you have to use an EBS volume to move data to an S3 bucket? Or is there another way to move data to an S3 bucket? (in the case of BEAST, maybe during a log roll?)

Based on my limited knowledge (talking to people who use it, reading AWS documentation and going to a few AWS workshops):
S3 is not a normal file system. It very quickly writes data to two locations before firing a callback giving the thumbs-up that your data is safe (then proceeds to copy your data to one or more additional locations for performance). The process for getting stuff to and from it does not use normal file system commands, so programs that are written to write files that way (aka everything) will not just work without help.  EBS behaves exactly like a normal file system; when you create an EBS-backed EC2 instance, it is running on an EBS volume.

If it were me, in the case of BEAST I wouldn't even bother with S3 (unless it has a version that directly supports it). I would just have it write its results to the local (i.e. root) file system, periodically pull the results down to a local machine, and clear out the logs.  Use the EBS snapshot tool to back up the setup without the data.
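That "pull results down and clear out the logs" loop might look something like the sketch below. In reality you'd fetch the files off the instance with scp/rsync first; the directory names here are local placeholders so the logic is easy to see.

```python
# Sketch of the periodic harvest step: archive completed *.log files,
# then remove the originals so the instance's disk doesn't fill up.
# Paths are placeholders; in practice you'd rsync/scp from the instance.
import shutil
from pathlib import Path

def harvest_logs(results_dir, archive_dir):
    """Copy every *.log file into the archive, then delete the original.
    Returns the names of the files harvested."""
    src, dst = Path(results_dir), Path(archive_dir)
    dst.mkdir(parents=True, exist_ok=True)
    harvested = []
    for log in sorted(src.glob("*.log")):
        shutil.copy2(log, dst / log.name)  # preserves timestamps
        log.unlink()
        harvested.append(log.name)
    return harvested
```

Run it from cron (or by hand) between BEAST checkpoints and the instance's root volume stays small, which also keeps the EBS snapshots cheap.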

Actually, if it were me, I would just build a machine designed to crunch this type of data.  The only three upsides to using AWS for this that I can see are:
  • Scalability - IF this proves to be even moderately cost-effective, AWS lets you scale very, very easily, which would allow you to turn around and sell unlimited amounts of BEAST time to other people.
  • Grant Money - It is probably much, much easier to write a successful grant proposal filled with cloudy buzzwords than it is to write up a proposal for what looks to be a series of really kick-ass gaming rigs.
  • Awesome Learning Experience - You get to learn a few things about AWS, and if you wrote your grant properly you will be getting paid to learn AWS (and hopefully attend cool conferences and workshops).

Huh? Netflix and a large number of other commercial companies run stuff on EC2 and AWS. This is why when AWS goes down a large swath of the Web goes down with it.

Hey, I'm just going by your post which didn't paint the most rosy picture.  Load issues and service interruptions were prominent, at least in my mind.

I do imagine, though, that if you're Netflix then you're paying extra for better performance and uptime.  The EBS description is closer to what I expect from a SAN, though.  Small-time instances would get low-tier storage, high-dollar instances would get tier 1, and everybody getting snapshots is great.  I had to configure clone devices for anything I wanted to take a snapshot of, meaning I was grabbing double the storage and getting frowned upon by the SAN people, aka the Scrooge McDucks of Disks.

I reckon in my "spare time" I will look to see what sort of playtime I can have with this stuff.
Banks and credit card companies are also heavy cloud users. I was talking to a guy who said he helped an Australian commonwealth bank set up an entire system in the cloud to service a south-east Asian country that they didn't feel was stable/safe enough to justify a large physical presence or heavy infrastructure investment in.  Cloud scalability works well for robustness as well: shit crashing regularly isn't as inconvenient in a large-scale, self-aware cluster that is designed to spin up more nodes to deal with demand.

I suppose it comes down to what you define as a commercial application. AWS certainly is not designed to run off-the-shelf (shrink-wrap) software, but it is definitely aimed at enterprise software.  I consider Amazon (the website) to be one of the most amazing examples of a website on the internet (it is complex, fast and very robust), and AWS was written first and foremost to create that "commercial application".
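The "spin up more nodes to deal with demand" bit boils down to a simple control rule: pick a node count that brings average per-node utilisation back toward a target. This is a toy version of that rule, not any particular AWS API, and the target/limit numbers are made up.

```python
# Toy autoscaling rule: no AWS API, just the control logic a monitoring
# loop might apply. Target utilisation and node limits are invented.
import math

def desired_nodes(current, per_node_load, target=0.5, lo=1, hi=20):
    """Node count that brings average utilisation back near `target`,
    clamped between `lo` and `hi`."""
    needed = math.ceil(current * per_node_load / target)
    return max(lo, min(hi, needed))

print(desired_nodes(4, 0.75))  # 6 -> scale out: the cluster is running hot
print(desired_nodes(4, 0.25))  # 2 -> scale in: demand has dropped off
```

A real autoscaler adds damping (cooldown periods, hysteresis) so the cluster doesn't thrash, but the core decision is this one line of arithmetic.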


Title: Re: Cloud computing services
Post by: Yegolev on July 25, 2013, 08:30:16 AM
I agree with all of that stuff, it's mostly the term "cloud" that confuses things.  There are many, many ways to define and deploy virtual computers; my domain is not web services but rather what is traditionally called "enterprise computing", which has different architectural requirements than something like AWS and web stores and so on.  So it's really different from what I'm used to, while probably using tech that I'm somewhat-to-very familiar with.

So a bank "in the cloud" can mean lots of things.  What probably is meant here is that the main computing systems are not in Unstable Asian Location but there are leveraged endpoints for performance reasons.  Again that might mean "web interface for end users" or it might be old-style branch-office shit.  Or maybe something else, not sure.

My experiences in this bank story would be back in HQ, as it were, dealing with the largest systems and their concerns for performance, reliability, DR, etc.  I think that's not "cloud" because it's not abstracted enough (via some slick management front-end, which is how Cloud is sold to IT directors: reduce the number of FTE required to manage the systems) or because it doesn't provide a slick interface to end users.

I suppose this also connects to my advice to learn virtualization or get out of IT.  The fact is that major efforts are underway to automate away old-fashioned admin tasks, under the banner of Cloud Computing.  Amazon's thingy (I read a bit this morning) that lets you enter basic requirements and then builds out a partition for you?  That's a huge item in corp IT that will replace lots of frontline people sooner or later.  If you want to stay employed, it's advisable to be one of the people who know how to work on the cloud management systems when they break or need upgrading (and they will).


Title: Re: Cloud computing services
Post by: Salamok on July 25, 2013, 09:09:51 AM
Amazon's thingy (I read a bit this morning) that lets you enter basic requirements and then builds out a partition for you?  That's a huge item in corp IT that will replace lots of frontline people sooner or later.  If you want to stay employed, it's advisable to be one of the people who know how to work on the cloud management systems when they break or need upgrading (and they will).

The thing that amazed me (even more amazing since I don't have a heavy bash/PowerShell background) was the amount of stuff that could be done from the CLI. We look at "cloud services" and think that's pretty cool, then look a little deeper and see how people are managing cloud services (the AWS CLI tool, Puppet, Chef, Vagrant, etc.) and think holy fuck.  I don't have any recent exposure to large scalable environments (like, since 1998), so the thought that people are out there spinning up complex infrastructures almost as quickly as I start my little toy VirtualBox VM is pretty amazing to me. On top of that, they are writing programs that monitor these infrastructures and spin resources up and down based on demand, which is just sick...

It actually becomes the cheapest option for many large-scale scenarios where peak-hour needs would be very expensive to build out as your baseline.
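A back-of-envelope comparison shows why peaky loads favour the cloud: owning hardware means paying for the peak all month, while renting means paying for the baseline all month and the peak only during peak hours. Every rate below is invented for illustration.

```python
# Back-of-envelope for "building for peak vs bursting in the cloud".
# All costs and rates are made-up illustrative numbers.

def owned_cost(peak_servers, monthly_cost_per_server):
    """Own enough hardware for the peak; pay for all of it all month."""
    return peak_servers * monthly_cost_per_server

def cloud_cost(baseline, peak, hourly_rate, peak_hours, month_hours=720):
    """Run the baseline all month; burst to peak only during peak hours."""
    server_hours = baseline * month_hours + (peak - baseline) * peak_hours
    return server_hours * hourly_rate

# 20-server peak needed only 60 hours/month, above a 4-server baseline:
print(owned_cost(20, 200))          # 4000
print(cloud_cost(4, 20, 0.25, 60))  # 960.0
```

Flip the assumptions (peak close to baseline, peak hours most of the month) and owning wins instead, which is exactly the sweet-spot question the thread started with.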


Title: Re: Cloud computing services
Post by: Yegolev on July 25, 2013, 10:42:32 AM
Since you've managed to make a left turn into my IT arena, here's a link to an IBM product for this:
http://www-03.ibm.com/systems/software/director/

Systems Director, or ISD as I heard it from Nigel Griffiths yesterday, is the Borg of system management tools.  I mean suite, not tool.  It works with many other lesser tools, such as all that Tivoli shit, NIM, etc., and it at least wants to give a GUI interface to doing just that: spin up on-demand partitions, rearrange workload on demand and automatically, and mostly take the neckbeard out of provisioning as well as general management of an IT infrastructure.  Of course, we're not there yet but it's coming.  Eventually, executives will be able to press the EASY button and get a fully-formed IT infrastructure as if it sprang from the forehead of Zeus (probably not really) and get a bill in the mail the following week (probably really).  I expect that Amazon's press-button-get-computer system will be how everyone does things in the future, to some degree.

I believe HP's parallel product is Operations Orchestration (HPOO, nyuk nyuk) but I'll know soon enough once I take this training class.

The adoption is a bit slow, like virtualization, but the advantages of the required subcomponents are very real.  Now that virtualization is mostly in the wild, we can do newer things like IBM's Live Partition Mobility (if you have the latest hardware frames) which lets us move LPARs from one physical host to another with a subsecond network interruption.  This is even better than older shit like partitions within partitions and OS-based workload management.  That of course paves the way for enhanced load balancers that can move shit around based on need.  Automating this stuff is already possible.

So what will I be doing?  It's a fact that people will always need custom solutions and so I'm going to be the guy that designs those solutions, or architects them if I get moving on the career fast enough before I retire or die.  That will require knowledge of all these systems and the directors on top of them.  The admins that currently do ground-level things either manually or via larger toolset will need to know how to use and abuse the overarching suite that will be doing what they are doing now.


Title: Re: Cloud computing services
Post by: Viin on July 25, 2013, 09:24:55 PM
I think this is all very cool stuff. I peruse the AWS job openings every once in a while and daydream, but the wife would never move to Seattle.


Title: Re: Cloud computing services
Post by: Yegolev on July 26, 2013, 07:01:00 AM
Heh, why does Amazon require you to be on site?  I know you're in management but still.  Employees are going to the Cloud as well.


Title: Re: Cloud computing services
Post by: Ironwood on July 26, 2013, 07:33:29 AM
It is rather hard to get in the door though. 


Title: Re: Cloud computing services
Post by: Hammond on July 26, 2013, 08:18:43 AM
Interesting stuff, Yegolev. I have worked on a few smaller virtual clusters and never had anything big enough to justify the neato whizzbang stuff that IBM Director / HPOO brings to the table. I am finally getting to the point that I am building out the bones of a new cluster here with an eye to retiring/migrating most of the physical boxes.

Oh, on the subject of Amazon: one of my old bosses who currently works there is going to be moving on. He is in the network side of the house, and I guess the hours are just brutal and he is tired of putting in 60 to 80 hour weeks.


Title: Re: Cloud computing services
Post by: Ard on July 26, 2013, 09:13:54 AM
Yeah, that jibes with what more than a few people have told me about Amazon since I moved here: "Do not work there, unless you're only doing it to get experience and move on".


Title: Re: Cloud computing services
Post by: Yegolev on July 26, 2013, 11:27:24 AM
It is rather hard to get in the door though. 

This has been very true for quite a while, and more so now with all this push to have the machines fix themselves, plus every company embracing LEAN and downsizing teams.  There are portholes in India and Manila, though I'm not suggesting that's the best way to get into large-corp IT.

There's still an opening to backfill my old spot that will go external on August 6, but I believe an old coworker with experience in the environment will be getting it.  Besides knowing someone, of course, I suppose what first-world job-seekers need to do is go contract and get converted.  Places to look include those which don't allow offshore resources, such as gubment and financial.


Title: Re: Cloud computing services
Post by: Ironwood on July 26, 2013, 11:39:50 AM
It's not my goal anymore.  Ironically, it's much easier to be a God amongst insects than kill yourself working in an environment where everyone's a clever dick and you don't stand out.

 :grin:


Title: Re: Cloud computing services
Post by: Yegolev on July 26, 2013, 12:23:35 PM
Not sure that's ironic. :awesome_for_real:

I've done a great job keeping my hours at 40/week, and from a recent meeting about timekeeping, it seems others in my group are also doing fine with that.  I imagine it depends greatly on how awful your organization is.  The smaller orgs I have been part of have all been shittier than the two massive global corps that I have worked for.

Also interesting is that since I switched over to another account, one that really seems to be at least as ornery and probably unquestionably more important to my employers, my stress and frustration are lower because I'm no longer personally invested in any shenanigans or personalities.