
DEALING WITH UNCONCEALED KEYS IN RSA CRYPTOSYSTEM

Rahul Yadav1, Praveen Kumar Vishnoi2, Sohit Kumar Ch.3
1,2Department of IT, SITE Nathdwara, Rajasthan
3Department of MCA, IET Alwar, Rajasthan
1yadav.rahul@live.com, 2erpv89@gmail.com,
3sohitt@gmail.com



Abstract: We discuss approaches to the performance and security of public key cryptosystems, in particular the RSA public key cryptosystem. We begin with an introduction to the theory of the RSA cryptosystem and describe the main results used throughout the rest of the work. Next we discuss textbook versions of the existing algorithms suggested and used for real-world implementations of RSA in the literature. We then examine the assumptions made during RSA key generation and introduce the concept of unconcealed key values. We go on to analyse the security implications of unconcealed key values generated during the key generation steps of RSA, and finally we show how the generation of unconcealed key values can be avoided.
Key Words and Phrases: Public-key cryptosystems, security, cryptography, unconcealed key(s), Bézout's identity.

1. Introduction

Network security has two main branches: cryptography, which concerns writing messages in cipher and designing cryptographic algorithms, and cryptanalysis, which concerns reading ciphers by breaking secret keys. There are many examples of cryptography in ancient times. Around a thousand years BC, the Egyptians used hieroglyphic transformations to secure messages. Mesopotamian and Babylonian scribes applied similar techniques to render cuneiform tablets unreadable to the uninitiated. The Greeks were among the first to employ cryptography (closely related to steganography, which conceals the existence of a communication rather than its content) for military and secret purposes.

With the advancement of computing and communication networks in the 1970s, securing communication and data in transit over networks became an important concern in the private sector as well, no longer limited to the military. The needs of banking in particular made secure digital communication urgent. Development at IBM resulted in the design of Lucifer, one of the first ciphers for digital communications. The National Security Agency (NSA) offered several improvements to Lucifer, and in late 1975 the National Bureau of Standards presented the modified version as the Data Encryption Standard (DES). This first encryption standard gained widespread use as a private key cryptographic method, followed by the Diffie-Hellman key exchange, RSA, digital signatures, and later schemes such as signcryption.
In this paper our main focus is RSA key generation. We begin with an introduction to the theory of the RSA public key cryptosystem and describe the main results used throughout the rest of the work. Next we discuss textbook versions of the existing algorithms suggested and used for real-world implementations of RSA in the literature. We then examine the assumptions made during RSA key generation, and in doing so introduce the concept of unconcealed key values.

2. RSA cryptosystem

RSA was the first practical public key cryptosystem and remains one of the most widely used in the modern era of communication. To encrypt a message M with the RSA method using a public key pair (e, n), proceed as follows (here e and n are a pair of positive integers).

  First, represent the message as an integer between 0 and n - 1. (Break a long message into a series of blocks, and represent each block as such an integer.) Any standard representation may be used; the purpose here is not to encrypt the message but only to get it into the numeric form necessary for encryption. Then encrypt the message by raising it to the eth power modulo n. That is, the result (the ciphertext C) is the remainder when M^e is divided by n.

    To decrypt the ciphertext, raise it to another power d, again modulo n. The encryption and decryption algorithms E and D are thus:
C = E(M) = M^e mod n, for a message M,
M = D(C) = C^d mod n, for a ciphertext C.
     Note that encryption does not increase the size of a message; both the message and the ciphertext are integers in the range 0 to n - 1.
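
As a concrete illustration (a sketch added here, not part of the original algorithm description), the following Python fragment applies the two formulas above. It uses the textbook primes p = 17, q = 11 that appear later in this paper; the specific exponents e = 7 and d = 23 are an illustrative choice, and pow(base, exp, mod) performs the modular exponentiation.

# Minimal sketch of RSA encryption/decryption as defined above.
# Key values (e, n, d) = (7, 187, 23) come from p = 17, q = 11, phi(n) = 160
# and are for illustration only.

def rsa_encrypt(M, e, n):
    # C = M^e mod n
    return pow(M, e, n)

def rsa_decrypt(C, d, n):
    # M = C^d mod n
    return pow(C, d, n)

e, d, n = 7, 23, 187
M = 88                          # message block, 0 <= M < n
C = rsa_encrypt(M, e, n)        # 88^7 mod 187 = 11
print(C, rsa_decrypt(C, d, n))  # 11 88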
     To choose encryption and decryption keys for the RSA method, first compute n as the product of two large, distinct prime numbers p and q:
n = p * q.
    These primes are very large, "random" primes. Although n is made public, the factors p and q are effectively hidden from everyone else by the enormous difficulty of factoring n. This conceals the way the decryption exponent d can be derived from the encryption exponent e. Then pick the integer d to be a large, random integer that is coprime to (p - 1) * (q - 1); that is, check that d satisfies
gcd(d, (p - 1) * (q - 1)) = 1
("gcd" means "greatest common divisor").
    
The integer e is finally computed from p, q, and d to be the "multiplicative inverse" of d, modulo (p - 1) * (q - 1), using the extended Euclidean algorithm. Thus we have
d ≡ e^-1 (mod (p - 1) * (q - 1)),
or simply
e * d ≡ 1 (mod (p - 1) * (q - 1)).
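
A minimal end-to-end sketch of these key generation steps in Python (illustrative only, with toy primes; pow(d, -1, phi), available from Python 3.8, computes the modular inverse):

from math import gcd
import random

def generate_keys(p, q):
    # Toy RSA key generation following the textbook steps above.
    # p and q must be distinct primes; real keys use very large primes.
    n = p * q
    phi = (p - 1) * (q - 1)
    while True:                       # pick d coprime to phi
        d = random.randrange(2, phi)
        if gcd(d, phi) == 1:
            break
    e = pow(d, -1, phi)               # multiplicative inverse of d modulo phi
    return (e, n), (d, n)

public, private = generate_keys(17, 11)
print(public, private)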
     The RSA algorithm should not be confused with the "exponentiation" technique presented by Diffie and Hellman to solve the key distribution problem. Diffie and Hellman's technique permits two users to establish a common key to be used in a conventional cryptographic system; it is not based on a trap-door one-way permutation.

3. Working through the foundational algorithms of RSA

Fermat's and Euler's theorems are the two theorems that play an important role in the RSA public key cryptosystem. RSA cannot be implemented without the extended Euclidean algorithm, because the key exponents e and d are related through Bézout's identity
a*x + b*y = gcd(a, b),
which, instantiated with the pair (e, φ(n)), becomes
e*d + φ(n)*k = 1 for some integer k.
When calculating the decryption exponent d we already know e and the gcd (which is 1, since e and φ(n) are coprime), so Bézout's identity together with the extended Euclidean algorithm yields d. Fermat's theorem plays its role during enciphering and deciphering, while a special case of Euler's theorem appears in key generation as the totient function
φ(n) = (p - 1)(q - 1),
which is used in the generation of the key exponents e and d.

Fermat's Theorem: Fermat's theorem states the following: if p is prime and a is a positive integer not divisible by p, then
a^(p-1) ≡ 1 (mod p).
An alternative form of the theorem is also useful: if p is prime and a is any positive integer, then
a^p ≡ a (mod p).
This is the form used in Rich Schroeppel's proof of the mathematics underlying RSA, namely that
M^(e*d) ≡ M (mod n) (since n is a product of distinct primes).
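
These congruences can be spot-checked numerically for small values (an illustrative check, not from the original paper):

# Spot-check Fermat's little theorem and the RSA identity M^(e*d) = M (mod n)
# using the small textbook values p = 17, q = 11, e = 7, d = 23.
p, q = 17, 11
n, e, d = p * q, 7, 23

assert all(pow(a, p - 1, p) == 1 for a in range(1, p))        # a^(p-1) ≡ 1 (mod p)
assert all(pow(a, p, p) == a % p for a in range(0, 2 * p))    # a^p ≡ a (mod p)
assert all(pow(M, e * d, n) == M for M in range(n))           # M^(ed) ≡ M (mod n)
print("all congruences hold")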

Euler's Totient Function: Before presenting Euler's theorem, we need to introduce an important quantity in number theory, referred to as Euler's totient function and written φ(n), where φ(n) is the number of positive integers less than n that are relatively prime to n. For a prime number p,
φ(p) = p - 1.

Now suppose that we have two prime numbers p and q with p ≠ q. Then, for
n = p*q,
φ(n) = φ(pq) = φ(p) * φ(q) = (p - 1)(q - 1).
More generally, the relation
φ(n) = φ(pq) = φ(p) * φ(q)
holds whenever p and q are relatively prime, and for p prime
φ(p^e) = p^e - p^(e-1).
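
A brute-force check of these totient identities for small arguments (illustrative only):

from math import gcd

def phi_bruteforce(n):
    # Count positive integers k < n with gcd(k, n) == 1.
    return sum(1 for k in range(1, n) if gcd(k, n) == 1)

p, q = 17, 11
assert phi_bruteforce(p) == p - 1                     # phi(p) = p - 1
assert phi_bruteforce(p * q) == (p - 1) * (q - 1)     # phi(pq) = (p-1)(q-1)
assert phi_bruteforce(p**3) == p**3 - p**2            # phi(p^e) = p^e - p^(e-1)
print("totient identities check out")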

Extended Euclidean Algorithm: The extended Euclidean algorithm is an extension to the Euclidean algorithm for finding the greatest common divisor (GCD) of integers a and b: it also finds the integers x and y in Bézout's identity.
ax + by = gcd (a, b)
(Typically either x or y is negative).
    
The extended Euclidean algorithm is particularly useful when a and b are coprime, since x is the modular multiplicative inverse of a modulo b.

Here we present the iterative textbook version of the extended Euclidean algorithm for finding the value
d = b^-1 mod m

Extended Euclid(m, b)   // requires m > b
1.      (A1, A2, A3) = (1, 0, m); (B1, B2, B3) = (0, 1, b)
2.      if B3 = 0 return A3 = gcd(m, b); no multiplicative inverse exists
3.      if B3 = 1 return B3 = gcd(m, b); B2 = b^-1 mod m
4.      Q = floor(A3 / B3)
5.      (T1, T2, T3) = (A1 - Q*B1, A2 - Q*B2, A3 - Q*B3)
6.      (A1, A2, A3) = (B1, B2, B3)
7.      (B1, B2, B3) = (T1, T2, T3)
8.      go to 2
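
A runnable Python transcription of this textbook routine (a sketch, not the authors' implementation), returning gcd(m, b) and, when it exists, b^-1 mod m:

def extended_euclid(m, b):
    # Iterative extended Euclidean algorithm (textbook form above).
    # Returns (gcd(m, b), inverse) where inverse = b^-1 mod m,
    # or (gcd, None) when no multiplicative inverse exists.
    A1, A2, A3 = 1, 0, m
    B1, B2, B3 = 0, 1, b
    while True:
        if B3 == 0:
            return A3, None          # gcd(m, b); no inverse
        if B3 == 1:
            return B3, B2 % m        # gcd = 1; B2 is b^-1 mod m
        Q = A3 // B3
        T1, T2, T3 = A1 - Q * B1, A2 - Q * B2, A3 - Q * B3
        A1, A2, A3 = B1, B2, B3
        B1, B2, B3 = T1, T2, T3

# Example: the inverse of 7 modulo 160 is 23 (the textbook d).
print(extended_euclid(160, 7))   # (1, 23)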

4. Key generation of the RSA algorithm in depth

RSA: Unconcealed Key Values: The simplest textbook example of RSA keys uses
p=17, q=11
n = p*q = 17*11 = 187
φ(n) = (p-1)*(q-1) = 16*10 = 160
We calculated all of the key values and encountered some special cases. Observe that for the encryption keys e = (31, 49, 79, 81, 111, 129, 159) with respect to n = 187, the decryption keys d obtained from the relation d = (k*φ(n) + 1)/e are respectively identical, i.e.
e = (31, 49, 79, 81, 111, 129, 159).
d = (31, 49, 79, 81, 111, 129, 159).

"In such cases a value of e maps to the same value as d; these key values can be referred to as unconcealed keys."
Now three of these numbers deserve particular attention, since their distance from Euler's totient of n is fixed:
159 = 160 - 1 = φ(n) - 1
79 = (160/2) - 1 = φ(n)/2 - 1 and 81 = (160/2) + 1 = φ(n)/2 + 1
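
The complete set of such values for φ(n) = 160 can be recovered by brute force; an unconcealed key is simply a residue that is its own multiplicative inverse modulo φ(n) (an illustrative check, not part of the original paper):

phi = 160  # (17 - 1) * (11 - 1)

# e is unconcealed when d == e, i.e. e * e ≡ 1 (mod phi).
unconcealed = [e for e in range(2, phi) if (e * e) % phi == 1]
print(unconcealed)   # [31, 49, 79, 81, 111, 129, 159]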
Now notice these values for larger primes:

    n = p * q                 φ(n)        e = φ(n) - 1, φ(n)/2 - 1, φ(n)/2 + 1    d (identical)
    1517    = 37 * 41         1440        1439, 719, 721                          1439, 719, 721
    1641179 = 1999 * 821      1638360     1638359, 819179, 819181                 1638359, 819179, 819181
    201703  = 401 * 503       200800      200799, 100399, 100401                  200799, 100399, 100401

In general, for ease of notation, put
φ(n) = g.
The congruence relation between e and d can be simplified using Bézout's identity,
ax + by = gcd(a, b)
(typically either x or y is negative). Bézout's identity is what guarantees that e has a multiplicative inverse modulo φ(n) whenever gcd(e, φ(n)) = 1. We can transform the modular multiplicative inverse relation used in RSA into this form, starting from:
d = e^-1 mod φ(n)
Implications:
·         d = (1/e) mod φ(n)
·         d*e ≡ 1 mod φ(n)
·         d*e mod φ(n) = 1
·         (d*e - 1) mod φ(n) = 0
·         (d*e - 1) / φ(n) = k
// using Bézout's properties, for some integer constant k
which yields
·         d = (k*φ(n) + 1) / e

For example: (1438 * 1440 + 1) / 1439 = 1439
(359 * 1440 + 1) / 719 = 719
(361 * 1440 + 1) / 721 = 721
and (1638358 * 1638360 + 1) / 1638359 = 1638359
(409589 * 1638360 + 1) / 819179 = 819179
(409591 * 1638360 + 1) / 819181 = 819181
and (200798 * 200800 + 1) / 200799 = 200799
(50199 * 200800 + 1) / 100399 = 100399
(50201 * 200800 + 1) / 100401 = 100401

In general, if φ(n) = g,
((g - 2)*g + 1) / (g - 1) = g - 1                 ... (A)
((g/4 - 1)*g + 1) / (g/2 - 1) = g/2 - 1           ... (B)
((g/4 + 1)*g + 1) / (g/2 + 1) = g/2 + 1           ... (C)

So we can say that the key values e = φ(n) - 1, φ(n)/2 - 1 and φ(n)/2 + 1 will always be unconcealed. This relation cannot be generalised to g/2^w for larger w, because a term of the form g/2^w does not arise as the middle term of such an identity. There are four more unconcealed key values in this kind of structure, the ring of residues under multiplication modulo g, for which the relation (e^2 - 1) mod g = 0 given in Section 5.2 always holds; including the three above, such a ring therefore has seven unconcealed entities (excluding the trivial values 0, 1, and the modulus φ(n) itself).

5. Unconcealed Keys
In the ring of positive integers less than φ(n) = (p - 1)(q - 1), for primes p and q, under multiplication modulo φ(n), consider the multiplicative inverse relation between two integers (e, d):
d ≡ e^-1 mod φ(n)
     We find the mapping d == e for exactly seven values of e; sorted, these values lie in the ring in the following sequence:
1.               φ(n)/4 - v
2.               φ(n)/4 + v
3.               φ(n)/2 - 1
4.               φ(n)/2 + 1
5.               3*φ(n)/4 - v
6.               3*φ(n)/4 + v
7.               φ(n) - 1
(for a fixed offset v that depends on φ(n)).

     If we square any of these values and then decrement the result by 1, the resulting value (e^2 - 1) is a multiple of φ(n).
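
For the textbook modulus φ(n) = 160 this characterisation can be verified directly (an illustrative check; the offset v is recovered from the smallest unconcealed value):

phi = 160
keys = [e for e in range(2, phi) if (e * e) % phi == 1]
v = phi // 4 - keys[0]          # offset v, here 40 - 31 = 9

expected = [phi//4 - v, phi//4 + v, phi//2 - 1, phi//2 + 1,
            3*phi//4 - v, 3*phi//4 + v, phi - 1]
print(keys == expected)                              # True
print(all((e*e - 1) % phi == 0 for e in keys))       # True: (e^2 - 1) is a multiple of phi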

5.1 Security violation due to use of unconcealed keys:

If unconcealed keys are used in a public key cryptosystem, disclosing the public key is equivalent to disclosing the private key, so the use of such values should be strictly avoided. For a cryptanalyst it is therefore good practice to first try to decrypt a message with the known entity, the public key itself.

5.2. Resolving unconcealed key values

We found that every unconcealed key value satisfies one definite relation:

(e^2 - 1) mod g = 0

     There are two ways to avoid such values (a sketch of the second check follows the list):
1.      Modify the structure of the extended Euclidean algorithm, or
2.      Place a check in the key selection step of RSA key generation (step 4): if (e^2 - 1) mod g = 0, reject the key.
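
A minimal sketch of option 2 (assumed helper code, not the authors' implementation): the randomly chosen exponent is rejected whenever its square minus one is a multiple of φ(n), which is exactly the condition under which e would equal d.

from math import gcd
import random

def pick_private_exponent(phi):
    # Pick an RSA exponent d coprime to phi, rejecting unconcealed values,
    # i.e. any d with (d^2 - 1) mod phi == 0 (for which e would equal d).
    while True:
        d = random.randrange(3, phi)
        if gcd(d, phi) != 1:
            continue
        if (d * d - 1) % phi == 0:   # unconcealed: e would equal d
            continue
        return d

phi = 160                      # toy value, (17 - 1) * (11 - 1)
d = pick_private_exponent(phi)
e = pow(d, -1, phi)            # Python 3.8+: modular inverse
assert e != d
print(e, d)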

6. Conclusions

The main results drawn from this work are the following:
1.      The security of the RSA cryptosystem in a public key setting can be enhanced by avoiding unconcealed keys.
2.      The RSA scheme can also be used in a symmetric manner by deliberately using unconcealed keys; the most easily calculated symmetric keys are φ(n) - 1 and φ(n)/2 + 1.

References

Dissertations:
[1] G. Durfee; Cryptanalysis of RSA using algebraic and lattice methods Introduction notes; Stanford University, June 2002; Pages 1-3;

[2] R. Yadav; Review: RSA and ABESARM Methods for Performance Enhancement Chapters 3 and 6; Academy of Business & Engineering Sciences, Ghaziabad, Apr 2009; Pages 15-17, 24-28;

Papers:
[3] R.L. Rivest, A. Shamir, and L. Adleman; A Method for Obtaining Digital Signatures and Public-Key Cryptosystems Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge; 1978; original RSA papers @ http://people.csail.mit.edu/rivest/Rsapaper.pdf; p 6-8;

[4] R. Schroeppel; contributing author of the underlying mathematics of the original RSA paper, as per [2]; Massachusetts Institute of Technology, Cambridge; 1978;

[5] R. Yadav, A. Perti, and M. Tausif: Review RSA and ABESARM Methods for Performance Enhancement Proceedings of Conference on Information Technology for Business Transformation, Ajay Kumar Garg Engineering College, Ghaziabad; Apr 2009;

[6] R. Yadav, H. Gupta, and D. Yadav: RSA Operations with Performance Tuning Proceedings of Conference on Issues & Challenges in Networking Intelligence and Computing Technologies, Krishna Institute of Engineering & Technology, Ghaziabad; Sep 2011;

Books:
[7] William Stallings; Cryptography and Network Security: Principles and Practice, Third edition; Pearson Education (Singapore), Delhi; 2003;

[8] Singh, S. The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography. New York: Anchor Books, 1999.

Others:

[9] Wikipedia, the free encyclopedia; Bézout's identity, Modular multiplicative inverse, extended Euclidean algorithm; @ http://en.wikipedia.org/wiki/; Apr 2009, and revised on Nov 2011;

Cloud Computing to Astronomy: Study of Performance at Circuit Level

Praveen Kr. Vishnoi1, Rahul Yadav2, Sohit Teotia3, Ravi Kant Vyas4
1, 2Dept. of IT, SITE, Nathdwara, Rajasthan
3Dept. of MCA, IET, Alwar, Rajasthan
4Dept. of CS, ITM, Bhilwara, Rajasthan
1erpv89@gmail.com, 2yadav.rahul@live.com, 3sohitt@gmail.com, 4vyasravikant@gmail.com

Abstract: Cloud computing is a powerful new technology whose benefits for scientific computing are now being investigated. We used three workflow applications to compare the performance of processing data on Amazon's EC2 cloud with the performance on the Abe high-performance cluster at the National Center for Supercomputing Applications (NCSA). We show that the Amazon EC2 cloud offers better performance and value for processor- and memory-limited applications than for I/O-bound applications. We also provide an example of how the cloud is well suited to the generation of a science product: an atlas of periodograms for the 210,000 light curves released by the NASA Kepler Mission. This atlas is intended to support the identification of periodic signals, including those due to transiting exoplanets, in the Kepler data sets.

Keywords – Cloud computing, Workflow Application, Cost Analysis, I/O Bound, Memory Bound, Amazon EC2.

I.                    Introduction

Vast quantities of data are being made available to scientists at an ever-accelerating rate, and approaches to data mining, data discovery, and analysis are being developed to extract the full scientific content of this data tsunami. The e-Science paradigm is enabling the synthesis of new data products through the reprocessing and re-sampling of existing data products. In this paper, we investigate the applicability of cloud computing to scientific applications. Cloud computing in this context refers to pay-as-you-go, on-demand compute resources made available by a third-party provider.
We study the cost and performance of one cloud service provider, Amazon EC2, in running workflow applications. We investigate the performance of three workflow applications with different I/O, memory and CPU requirements on Amazon EC2, and compare the performance of the cloud with that of a typical high-performance cluster (HPC). Our goal is to identify which applications give best performance on the cloud at the lowest cost.
We describe the application of cloud computing to the generation of a new data product: an atlas of periodograms for the 210,000 light curves publicly released to date by the Kepler Mission. Kepler is designed to search for Earth-like exoplanets by observing their transits across their host star. The atlas of periodograms will support the identification of candidate exoplanets through the periodicities caused by the transits, as well as supporting studies of general variability in the Kepler data sets. 

II.                  Evaluating Applications on the Amazon EC2 Cloud

A)       Goals Of This Study
Our goal is to determine which types of scientific workflow applications are cheaply and efficiently run on the Amazon EC2 cloud (hereafter, AmEC2). Workflows are loosely coupled parallel applications that consist of a set of computational tasks linked by data- and control-flow dependencies. Unlike tightly coupled applications, in which tasks communicate directly through the network, workflow tasks typically communicate using the file system: the output files written by one task become input files to be read by dependent tasks later in the workflow.
Given that AmEC2 uses only commodity hardware and given that applications make very different demands on resources, it is likely that cost and performance will vary dramatically by application. It was therefore important to study workflow applications that make different demands on resources. Thus the goals of this study are:
1.       Understand the performance of three workflow applications with different I/O, memory and CPU requirements on a commercial cloud.
2.       Compare the performance of the cloud with that of a high-performance cluster (HPC) equipped with a high-performance network and parallel file system, and
3.       Analyze the various costs associated with running workflows on a commercial cloud.

B)       Choice of Workflow Applications
We have chosen three workflow applications whose usage of computational resources is very different: Montage from astronomy, Broadband from seismology, and Epigenome from biochemistry.
Montage [1] is a toolkit for aggregating astronomical images in Flexible Image Transport System (FITS) format into mosaics. Broadband generates and compares intensity measures of seismograms from several high- and low-frequency earthquake simulation codes. Epigenome maps short DNA segments collected using high-throughput gene sequencing machines to a previously constructed reference genome. Table I summarizes the relative resource usage of these three applications. The following three paragraphs give the technical specifications of the specific workflows used in this study.
Montage was used to generate an 8-degree mosaic of M16 composed of images from the Two Micron All Sky Survey.  The resulting workflow contained 10,429 tasks, read 4.2 GB of input data, and produced 7.9 GB of output data. Montage is considered I/O-bound because it spends more than 95% of its time waiting on I/O operations.

App.          I/O       Memory    CPU
Montage       High      Low       Low
Broadband     Medium    High      Medium
Epigenome     Low       Medium    High

TABLE I. SUMMARY OF RESOURCE USAGE BY THE WORKFLOW APPLICATIONS

Broadband used four earthquake source descriptions and five sites to generate a workflow containing 320 tasks that read 6 GB of input data and wrote 160 MB of output data.
Broadband is considered to be memory-limited because more than 75% of its runtime is consumed by tasks requiring more than 1 GB of physical memory.
The Epigenome workflow maps human DNA sequences to a reference copy of chromosome 21. The workflow contained 81 tasks, read 1.8 GB of input data, and produced 300 MB of output data. Epigenome is considered to be CPU-bound because it spends 99% of its runtime in the CPU and only 1% on I/O and other activities.

C)      Experimental Set-Up
In this section we summarize the experimental set-up. For a complete description, see [2] and [3]. We compared the performance of AmEC2 with that of the Abe High Performance Cluster (hereafter, Abe) at the National Center for Supercomputing Applications, which is equipped with a high speed network and parallel file system to provide high-performance I/O.
To provide an unbiased comparison of the performance of workflows on AmEC2 and Abe, the experiments presented here were all run on single nodes, using the local disk on both AmEC2 and Abe. For comparison we also ran experiments using the parallel file system on Abe. Intuitively, the parallel file system would be expected to significantly improve the runtime of I/O-intensive applications like Montage, but to be less of an advantage for CPU-intensive applications like Epigenome.
The two Abe nodes use the same resource type, 64-bit Xeon machines, but differ in the I/O devices used: abe.local uses a local partition for I/O, and abe.lustre uses a shared Lustre parallel file system. Both instances use a 10 Gbps InfiniBand network. The computational capacity of abe.lustre is roughly equivalent to that of c1.xlarge, which is useful when comparing the performance of Abe and AmEC2 and in estimating the virtualization overhead of AmEC2.
On AmEC2, executables were pre-installed in a virtual machine image, which is deployed on the node. The input data were stored in the Amazon Elastic Block Store (EBS, a SAN-like storage service), while the output and intermediate files, as well as the application executables, were stored on local disks. For Abe, all application executables and input files were stored in the Lustre file system. For abe.local experiments, the input data were copied to a local disk before running the workflow, and all intermediate and output data were written to the same local disk. For abe.lustre, all intermediate and output data were written to the Lustre file system.
All jobs on both platforms were managed and executed through a job submission host at the Information Sciences Institute (ISI) using the Pegasus Workflow Management System (Pegasus WMS), which includes Pegasus [4] and Condor [5]. On AmEC2 we configured our VM image to start Condor worker processes when each node boots.

III.         PERFORMANCE COMPARISON OF AMAZON EC2 AND THE ABE HIGH-PERFORMANCE CLUSTER

A)       Performance Comparison between Amazon EC2 and the Abe High Performance Cluster

Fig. 1 compares the runtimes of the Montage, Broadband and Epigenome workflows on all the Amazon EC2 and Abe platforms listed in Table II.  Runtime in this context refers to the total amount of wall clock time, in seconds, from the moment the first workflow task is submitted until the last task completes. These runtimes exclude the following:

·         The time required to install and boot the VM, which typically averages between 70 and 90 seconds (AmEC2 only)
·         The latency in provisioning resources from Abe using the pilot jobs, which is highly dependent on the current system load.
·         The time to transfer input and output data, which varies with the load on the Wide Area Network (WAN).

This definition of runtime (also known as makespan) enables a one-to-one comparison of the performance of AmEC2 and Abe. The similarities between the specifications of c1.xlarge and abe.local allow us to estimate the virtualization overhead for each application on AmEC2.

i.      Montage (I/O-bound)

The best performance was achieved on the m1.xlarge resource type. It has double the memory of the other types, and the extra memory is used by the Linux kernel for the file system buffer cache to reduce the amount of time the application spends waiting for I/O. This is particularly beneficial for an I/O-intensive application like Montage. Reasonably good performance was achieved on all resource types except m1.small, which is much less powerful than the other types. The AmEC2 c1.xlarge type is nearly equivalent to abe.local and delivered nearly equivalent performance (within 8%), indicating the virtualization overhead does not seriously degrade performance for this application.

ii.  Broadband (Memory-bound)

For Broadband the processing advantage of the parallel file system largely disappears: abe.lustre offers only slightly better performance than abe.local, and abe.local's performance is only 1% better than c1.xlarge, so the virtualization overhead is essentially negligible. For a memory-intensive application like Broadband, AmEC2 can achieve nearly the same performance as Abe as long as there is more than 1 GB of memory per core. If there is less, then some cores must sit idle to prevent the system from running out of memory or swapping.

  FIGURE 1.THE PROCESSING TIMES FOR THE MONTAGE, BROADBAND AND EPIGENOME WORKFLOWS ON THE AMAZON EC2 CLOUD AND THE HIGH PERFORMANCE CLUSTER. THE LEGEND IDENTIFIES THE PROCESSORS.

iii.      Epigenome (CPU-bound)

As with Broadband, the parallel file system in Abe provides no processing advantage for Epigenome: processing times on abe.lustre were only 2% faster than on abe.local. Epigenome performance suggests that virtualization overhead may be more significant for a CPU-bound application: the processing time for c1.xlarge was some 10% larger than for abe.local. The machines with the most cores gave the best performance for Epigenome, as would be expected for a CPU-bound application.

IV.                COST-ANALYSIS OF RUNNING WORKFLOW APPLICATIONS ON AMAZON EC2

AmEC2 itemizes charges for the use of all of its resources, including charges for:
·         Resources, including the use of VM instances and processing,
·         Data storage, including the cost of virtual images in S3 and input data in S3,
·         Data transfer, including charges for transferring input data into the cloud, and
·         Transferring output data and log files between the submit host and AmEC2.

i. Resource Cost

Fig. 2 clearly shows the trade-off between performance and cost for Montage. The most powerful processor, c1.xlarge, offers a three-fold performance advantage over the least powerful, m1.small, but at five times the cost. The most cost-effective solution is c1.medium, which offers performance only 20% lower than m1.xlarge but at five times lower cost.
For Broadband, the picture is quite different. Processing costs do not vary widely with machine, so there is no reason to choose less-powerful machines. Similar results apply to Epigenome: the machine offering the best performance, c1.xlarge, is also the second-cheapest machine.
FIGURE 2. THE PROCESSING COSTS FOR THE MONTAGE, BROADBAND AND EPIGENOME WORKFLOWS FOR THE AMAZON EC2 PROCESSORS GIVEN IN THE LEGEND.

ii. Storage Cost

Storage cost is made up of the cost to store VM images in the Simple Storage Service (S3, an object-based storage system) and the cost of storing input data in the Elastic Block Store (EBS, a SAN-like block-based storage system). Both S3 and EBS apply fixed monthly charges for the storage of data, plus charges for accessing the data, which vary according to the application. The fixed rates are $0.15 per GB-month for S3 and $0.10 per GB-month for EBS. The main difference in cost is that EBS is charged based on the amount of disk storage requested, whereas S3 charges only for what is used. Additionally, an EBS volume can be attached to only one computing instance, whereas S3 can be accessed concurrently by any number of instances. The variable charges are $0.01 per 1,000 PUT operations and $0.01 per 10,000 GET operations for S3, and $0.10 per million I/O operations for EBS.
The 32-bit image used for the experiments in this paper was 773 MB compressed, and the 64-bit image was 729 MB compressed, for a total fixed cost of $0.22 per month. The fixed monthly cost of storing input data for the three applications is shown in Table III. For the experiments described in this study, there were 4,616 S3 GET operations and 2,560 S3 PUT operations, for a total variable cost of approximately $0.03. In addition, there were 3.18 million I/O operations on EBS, for a total variable cost of approximately $0.30.
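
The arithmetic behind these storage figures can be reproduced with a short sketch (rates as quoted above; values are approximate):

# Reproduce the storage-cost arithmetic quoted above (rates as of the study).
S3_PER_GB_MONTH    = 0.15
S3_PER_1K_PUT      = 0.01
S3_PER_10K_GET     = 0.01
EBS_PER_MILLION_IO = 0.10

image_gb = (773 + 729) / 1024                           # two VM images, MB -> GB
fixed_image_cost = image_gb * S3_PER_GB_MONTH           # ~0.22 per month

variable_s3  = 2560 / 1000 * S3_PER_1K_PUT + 4616 / 10000 * S3_PER_10K_GET   # ~0.03
variable_ebs = 3.18 * EBS_PER_MILLION_IO                                     # ~0.32 (~$0.30 as quoted)

print(round(fixed_image_cost, 2), round(variable_s3, 2), round(variable_ebs, 2))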

Application    Input Volume    Monthly Cost
Montage        4.3 GB          $0.66
Broadband      4.1 GB          $0.66
Epigenome      1.8 GB          $0.26

TABLE III. MONTHLY STORAGE COST

iii. Transfer Cost

In addition to resource and storage charges, AmEC2 charges $0.10 per GB for transfer into the cloud, and $0.17 per GB for transfer out of the cloud. Tables IV and V show the transfer sizes and costs for the three workflows.
Input is the amount of input data to the workflow, output is the amount of output data, and logs refers to the amount of logging data recorded for workflow tasks and transferred back to the submit host. The cost of the protocol used by Condor to communicate between the submit host and the workers is not included, but it is estimated to be much less than $0.01 per workflow.
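
As a rough illustration only (Tables IV and V themselves are not reproduced here), per-run transfer charges can be estimated from the data volumes quoted in Section II-B and the rates above:

RATE_IN, RATE_OUT = 0.10, 0.17       # $ per GB into / out of the cloud

workflows = {                        # (input GB, output GB) from Section II-B
    "Montage":   (4.2, 7.9),
    "Broadband": (6.0, 0.16),
    "Epigenome": (1.8, 0.3),
}
for name, (gb_in, gb_out) in workflows.items():
    cost = gb_in * RATE_IN + gb_out * RATE_OUT
    print(f"{name}: ~${cost:.2f} per run (excluding logs)")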

iv. Sample Cost Effectiveness Study


We provide here a simple example of a cost-effectiveness study to answer the question: is it cheaper to host an on-demand image mosaic service locally or on AmEC2? The costs described here are current as of October 2010. The calculations presented assume that the two services process requests for 36,000 mosaics of 2MASS images (total size 10 TB), each 4 square degrees, over a period of three years. This workload is typical of the requests made to an existing image mosaic service hosted at the Infrared Processing and Analysis Center (IPAC). Table VI summarizes the costs of the local service, using hardware choices typical of those used at IPAC; the roll-up of power, cooling and administration is an estimate provided by IPAC system management. Table VII gives similar calculations for AmEC2, where the costs include data transfer, I/O, etc. Clearly, the local service is the less expensive choice. The high cost of data storage in AmEC2, and the high cost of data transfer and I/O for an I/O-bound application like Montage, make AmEC2 much less attractive than a local service. An example of a much more cost-effective astronomy application will be given in Section V.

Item                                                    Cost ($)
12 TB RAID 5 disk farm and enclosure (3-yr support)     12,000
Dell 2650 Xeon quad-core processor, 1 TB staging area    5,000
Power, cooling and administration                        6,000
Total 3-year cost                                       23,000
Cost per mosaic                                           0.64

TABLE VI. COST PER MOSAIC OF A LOCALLY HOSTED IMAGE MOSAIC SERVICE

Item                                        Cost ($)
Network transfer in                           1,000
Data storage on Elastic Block Store          36,000
Processor cost (c1.medium)                    4,500
I/O operations                                7,000
Network transfer out                          4,200
Total 3-year cost                            52,700
Cost per mosaic                                1.46

TABLE VII. COST PER MOSAIC OF A MOSAIC SERVICE HOSTED IN THE AMAZON EC2 CLOUD
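
The per-mosaic figures in Tables VI and VII follow directly from the three-year totals (a simple arithmetic check using the numbers quoted above):

mosaics = 36_000                                         # requests over three years

local_total = 12_000 + 5_000 + 6_000                     # Table VI line items
cloud_total = 1_000 + 36_000 + 4_500 + 7_000 + 4_200     # Table VII line items

print(local_total, round(local_total / mosaics, 2))      # 23000 0.64
print(cloud_total, round(cloud_total / mosaics, 2))      # 52700 1.46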

v.                 Summary of the Comparative Study: When to Use the Cloud

·         Virtualization overhead on AmEC2 is generally small, but most evident for CPU-bound applications.
·         The resources offered by AmEC2 are generally less powerful than those available in high-performance clusters and generally do not offer the same performance. This is particularly the case for I/O-bound applications, whose performance benefits greatly from the availability of parallel file systems. This advantage essentially disappears for CPU- and memory-bound applications.
·         End-users should understand the resource usage of their applications and undertake a cost-benefit study of the resources offered to establish a processing and storage strategy. They should take into account factors such as:
·         Amazon EC2 itemizes charges for resource usage, data transfer and storage, and the impact of these costs should be evaluated.
·         For I/O-bound applications, the most expensive resources are not necessarily the most cost-effective.
·         Data transfer costs can exceed the processing costs for data-intensive applications.
·         Amazon EC2 offers no cost benefit over locally hosted storage, and is generally more expensive, but it does eliminate local maintenance and energy costs and does offer high-quality, reliable storage.

V.                   Conclusions

Our study has shown that cloud computing offers a powerful and cost-effective new resource for scientists, especially for compute- and memory-intensive applications. For I/O-bound applications, however, high-performance clusters equipped with parallel file systems and high-performance networks do offer superior performance. End-users should perform a cost-benefit study of cloud resources as part of their usage strategy.

We have used the calculation of an atlas of periodograms of light curves measured by the Kepler mission as an example of how the Amazon cloud can be used to generate a new science product. Although the monetary costs presented here were small, these costs can grow significantly as the number of light curves grows or as the search parameters are adjusted. As a result, commercial clouds may not be best suited for large-scale computations. On the other hand, there is now a movement towards providing academic clouds, such as those being built by FutureGrid or the National Energy Research Scientific Computing Center (NERSC), that will provide virtual-environment capabilities to the scientific community. What remains to be seen is whether the level of service provided by academia can be on a par with that delivered by commercial entities.

REFERENCES
[1]     J. C. Jacob, D. S. Katz, G. B. Berriman, J. Good, A. C. Laity, E. Deelman, C. Kesselman, G. Singh, M.-H. Su, T. A. Prince, and R. Williams, "Montage: a grid portal and software toolkit for science-grade astronomical image mosaics", Computational Science and Engineering, vol. 4, no. 2, 2010.
[2]     G. B. Berriman, E. Deelman, P. Groth, and G. Juve, "The Application of Cloud Computing to the Creation of Image Mosaics and Management of Their Provenance", SPIE Conference 7740: Software and Cyberinfrastructure for Astronomy, 2010.
[3]     G. Juve, E. Deelman, K. Vahi, G. Mehta, G. B. Berriman, B. P. Berman, and P. Maechling, "Scientific Workflow Applications on Amazon EC2", Cloud Computing Workshop in Conjunction with e-Science, Oxford, UK: IEEE, 2009.
[4]     E. Deelman et al., "Pegasus: A framework for mapping complex scientific workflows onto distributed systems", Scientific Programming, 2005.