Friday, July 12, 2013

Cloud Computing to Astronomy: Study of Performance at Circuit Level

Praveen Kr. Vishnoi1, Rahul Yadav2, Sohit Teotia3, Ravi Kant Vyas4
1, 2Dept. of IT, SITE, Nathdwara, Rajasthan
3Dept. of MCA, IET, Alwar, Rajasthan
4Dept. of CS, ITM, Bhilwara, Rajasthan
1erpv89@gmail.com, 2yadav.rahul@live.com, 3sohitt@gmail.com, 4vyasravikant@gmail.com

Abstract: Cloud computing is a powerful new technology that has recently been investigated for the benefits it offers to scientific computing. We have used three workflow applications to compare the performance of processing data on the Amazon EC2 cloud with the performance on the Abe high-performance cluster at the National Center for Supercomputing Applications (NCSA). We show that the Amazon EC2 cloud offers better performance and value for processor- and memory-limited applications than for I/O-bound applications. We provide an example of how the cloud is well suited to the generation of a science product: an atlas of periodograms for the 210,000 light curves released by the NASA Kepler Mission. This atlas is intended to support the identification of periodic signals, including those due to transiting exoplanets, in the Kepler data sets.

Keywords – Cloud computing, Workflow Application, Cost Analysis, I/O Bound, Memory Bound, Amazon EC2.

I.                    Introduction

Vast quantities of data are being made available to scientists at an ever-accelerating rate, and sophisticated approaches to data mining, data discovery, and analysis are being developed to extract the full scientific content of this data tsunami. The e-Science paradigm is enabling the synthesis of new data products through the reprocessing and re-sampling of existing data products. In this paper, we investigate the applicability of cloud computing to scientific applications. Cloud computing in this context refers to pay-as-you-go, on-demand compute resources made available by a third-party provider.
We study the cost and performance of one cloud service provider, Amazon EC2, in running workflow applications. We investigate the performance of three workflow applications with different I/O, memory and CPU requirements on Amazon EC2, and compare the performance of the cloud with that of a typical high-performance cluster (HPC). Our goal is to identify which applications give best performance on the cloud at the lowest cost.
We describe the application of cloud computing to the generation of a new data product: an atlas of periodograms for the 210,000 light curves publicly released to date by the Kepler Mission. Kepler is designed to search for Earth-like exoplanets by observing their transits across their host star. The atlas of periodograms will support the identification of candidate exoplanets through the periodicities caused by the transits, as well as supporting studies of general variability in the Kepler data sets. 
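The periodogram computation at the heart of that atlas measures the strength of periodic signals in a light curve as a function of trial frequency. The sketch below is purely illustrative and is not the code used for the Kepler atlas: it computes a Lomb-Scargle periodogram of a synthetic, irregularly sampled light curve with SciPy, and the injected period, sampling, noise level, and frequency grid are arbitrary choices for the example.

```python
# Illustrative only: Lomb-Scargle periodogram of a synthetic light curve.
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0.0, 90.0, 2000))       # observation times (days), unevenly sampled
period = 3.5                                    # period of the injected sinusoidal signal (days)
flux = 1.0 + 0.01 * np.sin(2.0 * np.pi * t / period) + 0.002 * rng.normal(size=t.size)

freqs = np.linspace(0.01, 2.0, 5000)            # trial frequencies (cycles per day)
# lombscargle expects angular frequencies and a (roughly) zero-mean signal
power = lombscargle(t, flux - flux.mean(), 2.0 * np.pi * freqs)

best_period = 1.0 / freqs[np.argmax(power)]
print(f"strongest periodicity near {best_period:.2f} days")
```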

II.                  Evaluating Applications On The Amazon EC2 Cloud

A)       Goals Of This Study
Our goal is to determine which types of scientific workflow applications can be run cheaply and efficiently on the Amazon EC2 cloud (hereafter, AmEC2). Workflows are loosely coupled parallel applications that consist of a set of computational tasks linked by data- and control-flow dependencies. Unlike tightly coupled applications, in which tasks communicate directly through the network, workflow tasks typically communicate using the file system: the output files written by one task become input files to be read by dependent tasks later in the workflow.
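To make the file-based coupling concrete, here is a small, hypothetical sketch (not the Pegasus representation used in this study): each task declares the files it reads and writes, and a valid execution order is any ordering in which every file is produced before it is consumed.

```python
# Hypothetical sketch of a file-coupled workflow and a simple topological ordering.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    inputs: set      # files the task reads
    outputs: set     # files the task writes

def topological_order(tasks):
    """Order tasks so that every file is written before any task reads it."""
    # files with no producer inside the workflow are treated as external inputs
    produced = {f for t in tasks for f in t.inputs} - {f for t in tasks for f in t.outputs}
    remaining, ordered = list(tasks), []
    while remaining:
        ready = [t for t in remaining if t.inputs <= produced]
        if not ready:
            raise ValueError("cyclic or unsatisfiable file dependencies")
        for t in ready:
            ordered.append(t)
            produced |= t.outputs
            remaining.remove(t)
    return ordered

# Toy Montage-like fragment: reproject two images, then co-add them into a mosaic.
tasks = [
    Task("add",      {"p1.fits", "p2.fits"}, {"mosaic.fits"}),
    Task("project1", {"raw1.fits"},          {"p1.fits"}),
    Task("project2", {"raw2.fits"},          {"p2.fits"}),
]
print([t.name for t in topological_order(tasks)])   # ['project1', 'project2', 'add']
```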
Given that AmEC2 uses only commodity hardware and given that applications make very different demands on resources, it is likely that cost and performance will vary dramatically by application. It was therefore important to study workflow applications that make different demands on resources. Thus the goals of this study are:
1.       Understand the performance of three workflow applications with different I/O, memory and CPU requirements on a commercial cloud.
2.       Compare the performance of the cloud with that of a high-performance cluster (HPC) equipped with a high-performance network and parallel file system, and
3.       Analyze the various costs associated with running workflows on a commercial cloud.

B)       Choice of Workflow Applications
We have chosen three workflow applications because their usage of computational resources is very different: Montage from astronomy, Broadband from seismology, and Epigenome from biochemistry.
Montage [1] is a toolkit for aggregating astronomical images in Flexible Image Transport System (FITS) format into mosaics. Broadband generates and compares intensity measures of seismograms from several high- and low-frequency earthquake simulation codes. Epigenome maps short DNA segments collected using high-throughput gene sequencing machines to a previously constructed reference genome. Table I summarizes the relative resource usage of these three applications. The following three paragraphs give the technical specifications of the specific workflows used in this study.
Montage was used to generate an 8-degree mosaic of M16 composed of images from the Two Micron All Sky Survey.  The resulting workflow contained 10,429 tasks, read 4.2 GB of input data, and produced 7.9 GB of output data. Montage is considered I/O-bound because it spends more than 95% of its time waiting on I/O operations.

App.         I/O      Memory   CPU
Montage      High     Low      Low
Broadband    Medium   High     Medium
Epigenome    Low      Medium   High

TABLE I: SUMMARY OF RESOURCE USAGE BY THE WORKFLOW APPLICATIONS

Broadband used four earthquake source descriptions and five sites to generate a workflow containing 320 tasks that read 6 GB of input data and wrote 160 MB of output data. Broadband is considered to be memory-limited because more than 75% of its runtime is consumed by tasks requiring more than 1 GB of physical memory.
The Epigenome workflow maps human DNA sequences to a reference copy of chromosome 21. The workflow contained 81 tasks, read 1.8 GB of input data, and produced 300 MB of output data. Epigenome is considered to be CPU-bound because it spends 99% of its runtime in the CPU and only 1% on I/O and other activities.
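One minimal way to express this three-way characterization in code is sketched below; the thresholds are illustrative values chosen for this example rather than criteria taken from the study.

```python
# Illustrative classifier based on where a workflow spends its runtime.
def classify(io_frac, large_memory_frac, cpu_frac):
    """Fractions of total runtime spent waiting on I/O, running tasks that
    need more than 1 GB of memory, and executing on the CPU."""
    if io_frac > 0.90:
        return "I/O-bound"        # e.g. Montage: > 95% of its time waiting on I/O
    if large_memory_frac > 0.75:
        return "memory-bound"     # e.g. Broadband: > 75% of runtime in > 1 GB tasks
    if cpu_frac > 0.90:
        return "CPU-bound"        # e.g. Epigenome: ~99% of runtime on the CPU
    return "mixed"

print(classify(io_frac=0.95, large_memory_frac=0.10, cpu_frac=0.05))   # I/O-bound
print(classify(io_frac=0.10, large_memory_frac=0.80, cpu_frac=0.40))   # memory-bound
print(classify(io_frac=0.01, large_memory_frac=0.20, cpu_frac=0.99))   # CPU-bound
```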

C)      Experimental Set-Up
In this section we summarize the experimental set-up. For a complete description, see [2] and [3]. We compared the performance of AmEC2 with that of the Abe High Performance Cluster (hereafter, Abe) at the National Center for Supercomputing Applications, which is equipped with a high speed network and parallel file system to provide high-performance I/O.
To provide an unbiased comparison of the performance of workflows on AmEC2 and Abe, the experiments presented here were all run on single nodes, using the local disk on both AmEC2 and Abe. For comparison we also ran experiments using the parallel file system on Abe. Intuitively, the parallel file system would be expected to significantly improve the runtime of I/O-intensive applications like Montage, but would be less of an advantage for CPU-intensive applications like Epigenome.
The two Abe nodes use the same resource type, 64-bit Xeon machines, but differ in the I/O devices used: abe.local uses a local partition for I/O, and abe.lustre uses a shared Lustre parallel file system. Both instances use a 10 Gbps InfiniBand network. The computational capacity of abe.lustre is roughly equivalent to that of c1.xlarge, which is useful when comparing the performance of Abe and AmEC2 and in estimating the virtualization overhead of AmEC2.
On AmEC2, executables were pre-installed in a Virtual Machine image, which is deployed on the node. The input data was stored in the Amazon Elastic Block Store (EBS) (a SAN-like storage service), while the output and intermediate files, as well as the application executables, were stored on local disks. For Abe, all application executables and input files were stored in the Lustre file system. For abe.local experiments, the input data were copied to a local disk before running the workflow, and all intermediate and output data were written to the same local disk. For abe.lustre, all intermediate and output data were written to the Lustre file system.
All jobs on both platforms were managed and executed through a job submission host at the Information Sciences Institute (ISI) using the Pegasus Workflow Management System (Pegasus WMS), which includes Pegasus [4] and Condor [5]. On AmEC2 we configured our VM image to start Condor worker processes when each node boots.
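For illustration only, the sketch below shows one way such worker nodes could be provisioned programmatically today, using the boto3 library and a user-data script that starts the HTCondor daemons at boot so the node joins the pool managed by the submit host. This is not the tooling used in the study; the AMI ID, key pair, and service name are placeholders.

```python
# Hypothetical provisioning sketch: launch an EC2 worker whose boot script
# starts HTCondor so the node registers with the workflow's central manager.
import boto3

USER_DATA = """#!/bin/bash
# start the HTCondor daemons pre-installed in the image (service name assumed)
systemctl start condor
"""

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-00000000",     # placeholder: image with Condor and executables pre-installed
    InstanceType="c1.xlarge",   # previous-generation type used in the study
    MinCount=1,
    MaxCount=1,
    KeyName="workflow-key",     # placeholder key pair
    UserData=USER_DATA,
)
print("launched", response["Instances"][0]["InstanceId"])
```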

III.         PERFORMANCE OF THE WORKFLOW APPLICATIONS ON AMAZON EC2 AND ABE

A)       Performance Comparison between Amazon EC2 and the Abe High Performance Cluster

Fig. 1 compares the runtimes of the Montage, Broadband and Epigenome workflows on the Amazon EC2 and Abe platforms described above. Runtime in this context refers to the total amount of wall clock time, in seconds, from the moment the first workflow task is submitted until the last task completes. These runtimes exclude the following:

·         The time required to install and boot the VM, which typically averages between 70 and 90 seconds (AmEC2 only)
·         The latency in provisioning resources from Abe using the pilot jobs, which is highly dependent on the current system load.
·         The time to transfer input and output data, which varies with the load on the Wide Area Network (WAN).

This definition of runtime (also known as makespan) enables a one-to-one comparison of the performances of AmEC2 and Abe. The similarities between the specifications of c1.xlarge and abe.local allow us to estimate the virtualization overhead for each application on AmEC2.
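A short sketch of this makespan measure, computed from per-task submit and finish timestamps, is given below; the task records are hypothetical.

```python
# Makespan: wall-clock time from the first task submission to the last completion,
# excluding VM boot, provisioning latency, and wide-area data transfer.
def makespan(task_records):
    first_submit = min(r["submit"] for r in task_records)
    last_finish = max(r["finish"] for r in task_records)
    return last_finish - first_submit

tasks = [
    {"name": "mProject_1", "submit": 0.0,   "finish": 310.0},
    {"name": "mProject_2", "submit": 1.5,   "finish": 290.0},
    {"name": "mAdd",       "submit": 320.0, "finish": 845.0},
]
print(f"makespan = {makespan(tasks):.0f} s")   # 845 s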

i. Montage (I/O-bound)

The best performance was achieved on the m1.xlarge resource type. It has double the memory of the other types, and the extra memory is used by the Linux kernel for the file system buffer cache to reduce the amount of time the application spends waiting for I/O. This is particularly beneficial for an I/O-intensive application like Montage. Reasonably good performance was achieved on all resource types except m1.small, which is much less powerful than the other types. The AmEC2 c1.xlarge type is nearly equivalent to abe.local and delivered nearly equivalent performance (within 8%), indicating the virtualization overhead does not seriously degrade performance for this application.

ii.  Broadband (Memory-bound)

For Broadband the processing advantage of the parallel file system largely disappears: abe.lustre offers only slightly better performance than abe.local, and abe.local's performance is only 1% better than c1.xlarge's, so virtualization overhead is essentially negligible. For a memory-intensive application like Broadband, AmEC2 can achieve nearly the same performance as Abe as long as there is more than 1 GB of memory per core. If there is less, then some cores must sit idle to prevent the system from running out of memory or swapping.
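The memory constraint can be captured with a short, illustrative calculation; the node figures used below are examples rather than exact instance specifications.

```python
# How many cores can be kept busy when each task needs a fixed amount of memory.
def usable_cores(node_cores, node_memory_gb, per_task_memory_gb):
    limited_by_memory = int(node_memory_gb // per_task_memory_gb)
    return min(node_cores, limited_by_memory)

# e.g. an 8-core node with 7 GB of RAM running 1 GB tasks keeps 7 cores busy
print(usable_cores(node_cores=8, node_memory_gb=7.0, per_task_memory_gb=1.0))   # 7
# the same node running 2 GB tasks can keep only 3 cores busy
print(usable_cores(node_cores=8, node_memory_gb=7.0, per_task_memory_gb=2.0))   # 3
```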

FIGURE 1: THE PROCESSING TIMES FOR THE MONTAGE, BROADBAND AND EPIGENOME WORKFLOWS ON THE AMAZON EC2 CLOUD AND THE ABE HIGH-PERFORMANCE CLUSTER. THE LEGEND IDENTIFIES THE PROCESSORS.

iii. Epigenome (CPU-bound)

As with Broadband, the parallel file system in Abe provides no processing advantage for Epigenome: processing times on abe.lustre were only 2% faster than on abe.local. Epigenome performance suggests that virtualization overhead may be more significant for a CPU-bound application: the processing time for c1.xlarge was some 10% larger than for abe.local. The machines with the most cores gave the best performance for Epigenome, as would be expected for a CPU-bound application.

IV.                COST-ANALYSIS OF RUNNING WORKFLOW APPLICATIONS ON AMAZON EC2

AmEC2 itemizes charges for the use of all of its resources, including charges for:
·         Resources, including the use of VM instances and processing,
·         Data storage, including the cost of virtual images in S3 and input data in S3,
·         Data transfer, including charges for transferring input data into the cloud, and
·         Transferring output data and log files between the submit host and AmEC2.

i. Resource Cost

Fig. 2 clearly shows the trade-off between performance and cost for Montage. The most powerful processor, c1.xlarge, offers a three-fold performance advantage over the least powerful, m1.small, but at five times the cost. The most cost-effective solution is c1.medium, which offers performance only 20% less than m1.xlarge but at five times lower cost.
For Broadband, the picture is quite different. Processing costs do not vary widely with machine, so there is no reason to choose less-powerful machines. Similar results apply to Epigenome: the machine offering the best performance, c1.xlarge, is also the second-cheapest machine.
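These trade-offs follow from a simple cost model: at the time of the study AmEC2 billed per instance-hour, rounded up, so the cost of a run is the number of billed hours multiplied by the hourly rate. The rates and runtimes in the sketch below are illustrative placeholders, not the measured values behind Fig. 2.

```python
# Illustrative cost model: billed hours (rounded up) times an assumed hourly rate.
import math

HOURLY_RATE = {"m1.small": 0.10, "c1.medium": 0.20, "c1.xlarge": 0.80}   # $/hour, assumed

def run_cost(runtime_seconds, instance_type):
    billed_hours = math.ceil(runtime_seconds / 3600.0)
    return billed_hours * HOURLY_RATE[instance_type]

# hypothetical single-node runtimes for one workflow (seconds)
runtimes = {"m1.small": 15000, "c1.medium": 6000, "c1.xlarge": 5000}
for itype, seconds in runtimes.items():
    print(f"{itype:10s} runtime {seconds:6d} s  cost ${run_cost(seconds, itype):.2f}")
```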
FIGURE 2. THE PROCESSING COSTS FOR THE MONTAGE, BROADBAND AND EPIGENOME WORKFLOWS FOR THE AMAZON EC2 PROCESSORS GIVEN IN THE LEGEND.

ii. Storage Cost

Storage cost is made up of the cost to store VM images in the Simple Storage Service (S3, an object-based storage system), and the cost of storing input data in the Elastic Block Store (EBS, a SAN-like block-based storage system). Both S3 and EBS levy fixed monthly charges for the storage of data, plus charges for accessing the data, which can vary according to the application. The rates for fixed charges are $0.15 per GB/month for S3 and $0.10 per GB/month for EBS. The main difference in cost is that EBS is charged based on the amount of disk storage requested, whereas S3 only charges for what is used. Additionally, EBS can be attached to only one computing instance, whereas S3 can be accessed concurrently by any number of instances. The variable charges for data storage are $0.01 per 1,000 PUT operations and $0.01 per 10,000 GET operations for S3, and $0.10 per million I/O operations for EBS.
The 32-bit image used for the experiments in this paper was 773 MB, compressed, and the 64-bit image was 729 MB, compressed, for a total fixed cost of $0.22 per month. The fixed monthly cost of storing input data for the three applications is shown in Table II. For the experiments described in this study, there were 4,616 S3 GET operations and 2,560 S3 PUT operations, for a total variable cost of approximately $0.03. In addition, there were 3.18 million I/O operations on EBS, for a total variable cost of $0.30.
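The fixed and variable storage charges quoted above can be recomputed directly from the listed rates, as in the sketch below; the small difference in the EBS figure is due to rounding in the text.

```python
# Recomputing the storage charges from the rates quoted in the text.
S3_PER_GB_MONTH = 0.15
S3_PER_1000_PUT = 0.01
S3_PER_10000_GET = 0.01
EBS_PER_MILLION_IO = 0.10

# fixed monthly cost of the two VM images stored in S3 (MB converted to GB)
image_gb = (773 + 729) / 1024.0
print(f"image storage: ${image_gb * S3_PER_GB_MONTH:.2f}/month")          # ~$0.22

# variable S3 request charges for the experiments
s3_requests = (2560 / 1000.0) * S3_PER_1000_PUT + (4616 / 10000.0) * S3_PER_10000_GET
print(f"S3 requests:   ${s3_requests:.2f}")                               # ~$0.03

# variable EBS I/O charges (3.18 million operations)
print(f"EBS I/O:       ${(3.18e6 / 1e6) * EBS_PER_MILLION_IO:.2f}")       # ~$0.32 (quoted as ~$0.30)
```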

Application   Input Volume   Monthly Cost
Montage       4.3 GB         $0.66
Broadband     4.1 GB         $0.66
Epigenome     1.8 GB         $0.26

TABLE II: MONTHLY STORAGE COST

iii. Transfer Cost

In addition to resource and storage charges, AmEC2 charges $0.10 per GB for transfer into the cloud, and $0.17 per GB for transfer out of the cloud. For each workflow, the transfer cost depends on three quantities: input, the amount of input data to the workflow; output, the amount of output data; and logs, the amount of logging data recorded for workflow tasks and transferred back to the submit host. The cost of the protocol used by Condor to communicate between the submit host and the workers is not included, but it is estimated to be much less than $0.01 per workflow.
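A sketch of this transfer-cost model follows, using the published per-GB rates and the input and output volumes quoted for each workflow in Section II; log volumes are neglected here.

```python
# Transfer cost: $0.10/GB into the cloud, $0.17/GB out of the cloud.
TRANSFER_IN_PER_GB = 0.10
TRANSFER_OUT_PER_GB = 0.17

def transfer_cost(input_gb, output_gb, logs_gb=0.0):
    return input_gb * TRANSFER_IN_PER_GB + (output_gb + logs_gb) * TRANSFER_OUT_PER_GB

# input/output volumes from the workflow descriptions in Section II
for app, in_gb, out_gb in [("Montage", 4.2, 7.9), ("Broadband", 6.0, 0.16), ("Epigenome", 1.8, 0.3)]:
    print(f"{app:10s} ${transfer_cost(in_gb, out_gb):.2f}")
```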

iv. Sample Cost-Effectiveness Study


We provide here a simple example of a cost-effectiveness study to answer the question: is it cheaper to host an on-demand image mosaic service locally or on AmEC2? The costs described here are current as of October 2010. The calculations presented assume that the two services process requests for 36,000 mosaics of 2MASS images (total size 10 TB), each 4 square degrees in size, over a period of three years. This workload is typical of the requests made to an existing image mosaic service hosted at the Infrared Processing and Analysis Center (IPAC). Table III summarizes the costs of the local service, using hardware choices typical of those used at IPAC; the power, cooling and administration figures are estimates provided by IPAC system management. Table IV gives similar calculations for AmEC2, where the costs include data transfer, I/O and so on (a short recomputation of the bottom line of each table follows Table IV). Clearly, the local service is the less expensive choice. The high cost of data storage in AmEC2, and the high cost of data transfer and I/O for an I/O-bound application like Montage, make AmEC2 much less attractive than a local service. An example of a much more cost-effective astronomy application, the Kepler periodogram atlas, is discussed in the conclusions (Section VI).

Item                                                     Cost ($)
12 TB RAID 5 disk farm and enclosure (3-yr support)        12,000
Dell 2650 Xeon quad-core processor, 1 TB staging area       5,000
Power, cooling and administration                           6,000
Total 3-year cost                                          23,000
Cost per mosaic                                              0.64

TABLE III: COST PER MOSAIC OF A LOCALLY HOSTED IMAGE MOSAIC SERVICE

Item                                      Cost ($)
Network transfer in                          1,000
Data storage on Elastic Block Store         36,000
Processor cost (c1.medium)                   4,500
I/O operations                               7,000
Network transfer out                         4,200
Total 3-year cost                           52,700
Cost per mosaic                               1.46

TABLE IV: COST PER MOSAIC OF A MOSAIC SERVICE HOSTED ON THE AMAZON EC2 CLOUD
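For completeness, the bottom lines of Tables III and IV follow directly from the itemized figures and the assumed 36,000 mosaic requests.

```python
# Cost per mosaic for the local service and for AmEC2 over the three-year period.
N_MOSAICS = 36_000

local_total = 12_000 + 5_000 + 6_000                  # disk farm + server + power/cooling/admin
ec2_total = 1_000 + 36_000 + 4_500 + 7_000 + 4_200    # transfer in + EBS + CPU + I/O + transfer out

print(f"local: ${local_total:,} total, ${local_total / N_MOSAICS:.2f} per mosaic")   # $23,000, $0.64
print(f"EC2:   ${ec2_total:,} total, ${ec2_total / N_MOSAICS:.2f} per mosaic")       # $52,700, $1.46
```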

V.                 Summary Of The Comparative Study: When To Use The Cloud

·         Virtualization overhead on AmEC2 is generally small, but most evident for CPU-bound applications.
·         The resources offered by AmEC2 are generally less powerful than those available in high-performance clusters and generally do not offer the same performance. This is particularly the case for I/O-bound applications, whose performance benefits greatly from the availability of parallel file systems. This advantage essentially disappears for CPU- and memory-bound applications.
·         End-users should understand the resource usage of their applications and undertake a cost-benefit study of the resources offered to establish a processing and storage strategy. They should take into account factors such as:
·         Amazon EC2 itemizes charges for resource usage, data transfer and storage, and the impact of these costs should be evaluated.
·         For I/O-bound applications, the most expensive resources are not necessarily the most cost- effective.
·         Data transfer costs can exceed the processing costs for data-intensive applications.
·         Amazon EC2 offers no cost benefit over locally hosted storage, and is generally more expensive, but it does eliminate local maintenance and energy costs, and it does offer high-quality, reliable storage.

VI.                   Conclusions

Our study has shown that cloud computing offers a powerful and cost-effective new resource for scientists, especially for compute- and memory-intensive applications. For I/O-bound applications, however, high-performance clusters equipped with parallel file systems and high-performance networks do offer superior performance. End-users should perform a cost-benefit study of cloud resources as part of their usage strategy.

We have used the calculation of an atlas of periodograms of light curves measured by the Kepler mission as an example of how the Amazon cloud can be used to generate a new science product. Although the monetary costs presented here were small, these costs can grow significantly as the number of light curves grows, or as the search parameters are adjusted. As a result, commercial clouds may not be best suited for large-scale computations. On the other hand, there is now a movement towards providing academic clouds, such as those being built by FutureGrid or the National Energy Research Scientific Computing Center (NERSC), which will provide virtual-environment capabilities to the scientific community. What remains to be seen is whether the level of service provided by academia can be on a par with that delivered by commercial entities.

REFERENCES
[1]     J. C. Jacob, D. S. Katz, G. B. Berriman, J. Good, A. C. Laity, E. Deelman, C. Kesselman, G. Singh, M.-H. Su, T. A. Prince, and R. Williams, "Montage: a grid portal and software toolkit for science-grade astronomical image mosaics," Computational Science and Engineering, vol. 4, no. 2, 2010.
[2]     G. B. Berriman, E. Deelman, P. Groth, and G. Juve, "The Application of Cloud Computing to the Creation of Image Mosaics and Management of Their Provenance," SPIE Conference 7740: Software and Cyberinfrastructure for Astronomy, 2010.
[3]     G. Juve, E. Deelman, K. Vahi, G. Mehta, G. B. Berriman, B. P. Berman, and P. Maechling, "Scientific Workflow Applications on Amazon EC2," Cloud Computing Workshop in Conjunction with e-Science, Oxford, UK: IEEE, 2009.
[4]     E. Deelman et al., "Pegasus: A framework for mapping complex scientific workflows onto distributed systems," Scientific Programming, 2005.
[5]     D. Thain, T. Tannenbaum, and M. Livny, "Distributed Computing in Practice: The Condor Experience," Concurrency and Computation: Practice and Experience, vol. 17, 2005.


