E-Commerce Service
Amazon E-Commerce Service (ECS) exposes Amazon's product data and e-commerce functionality.

Elastic Compute Cloud
Amazon Elastic Compute Cloud is a web service that provides resizable compute capacity in the cloud.

Historical Pricing
The Amazon Historical Pricing web service gives developers programmatic access to over three years of actual sales data for books, music, videos, and DVDs.

Mechanical Turk
One of the best ways to understand Amazon Mechanical Turk is to complete a HIT and see what the experience is like.

Simple Storage Service
Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers.

Simple Queue Service
Amazon Simple Queue Service offers a reliable, highly scalable hosted queue for storing messages as they travel between computers.

Alexa Thumbnails
All thumbnail images are accessible via web services, using SOAP or REST.

Alexa Top Sites
The Alexa Top Sites web service provides ranked lists of the top sites on the Internet.

Alexa Web Information Service
The Alexa Web Information Service makes Alexa's vast repository of information about the traffic and structure of the web available to developers.

Alexa Web Search
The Alexa Web Search web service offers programmatic access to Alexa's web search engine.

WalkScore.com - Another Web Hosting Success Story

Every day, we hear new stories about a cool new startup and its success.

Today, it was WalkScore.com. The website offers some great information about which neighborhoods and cities are more walkable than the rest (San Francisco was #1 and Seattle was #6). Walk Score calculates the walkability of an address by locating nearby stores, restaurants, schools, parks, etc. I think it is a great site for those who are considering moving to a new neighborhood or those who simply like the car-free lifestyle, especially because gas prices are setting new records every day.

This innovative Seattle-based startup hit almost all the major newspapers, blogs, and websites last week, from the SF Chronicle to the Washington Post, from the Los Angeles Times Blog to ABC News, and from USA Today to MSNBC.

In an email, Matt Lerner from WalkScore says:

We would never have weathered being the #1 story on Yahoo! yesterday if it weren't for Amazon!  THANK YOU!

We're a hybrid-philanthropy business which means we prioritize social good over profit--and therefore we're on a pretty tight infrastructure budget :-)  What's so great about Amazon cloud computing is that it was very cheap from an infrastructure and dev standpoint for us to scale up quickly.  In a nutshell we have only one physical web server and didn't want to deal with the expense of a hardware upgrade so:

  • We set up 4 EC2 instances to serve the walkability heat map tiles you see overlaid on top of the Google maps. Here is Seattle for example.
  • We moved all of our images, CSS, and JS files to Amazon S3 which took a big load off of our one web server.
  • We were able to accommodate a spike of about 80K unique visitors during a three hour period thanks to Amazon

One great article which I would like to highlight is How We Built a Web Hosting Infrastructure on EC2. It's a nice read if you are trying to host your website on Amazon EC2.

-- Jinesh

Can Scanning as a Service Clean Your Desk Off?

Does your desk look like this photo? No comment on where the photo was taken, of course... There's hope!

Pixily just launched, with a business model that could be described as "NetFlix in reverse". They offer a plan that allows you to send them one envelope per month (envelopes can contain up to 50 items) filled with documents that you want scanned and made searchable. This base plan costs $14.95 per month, and of course higher volume plans are available.

Prasad Thammineni, CEO of Pixily, came to our AWS Startup Event last fall in Boston, where I had the opportunity to meet him. Pixily is based in Waltham, MA and is a big user of AWS--in fact, Prasad says "We use EC2, S3 and SQS. AWS has helped us democratize expensive technology and make it accessible to consumers and small businesses. This technology until now was available to only large enterprises."

You can read more about Pixily in this Boston Globe article. The article included this gem:

"Pixily has economized by building the entire website atop Amazon's Web services infrastructure, which allows a company to rent servers and storage space as needed. "That gives us the flexibility to add more servers based on our demand, as traffic increases, instead of paying for them at the outset," says chief technology officer Vikram Kumar"

 -- Mike

Amazon Web Services "Office Hours"

We thought we would try out a new idea. Instead of working from our Amazon offices, for a change, we will work for a few hours on the last Tuesday of every month from an offsite location.

We like to call it AWS “Office Hours”.

The offsite will be at the StartPad co-working office space in Pioneer Square in Seattle. This will be your chance to chat with an AWS technical evangelist and a technical support engineer and get your questions answered. Plus, there is free internet and desk space if you want to camp out for the afternoon.

Feel free to let us know if you are planning to stop by.

When: July 22nd, 2pm-6pm (last Tuesday of each month)

Where: StartPad, 811 First Ave, Suite 480, Seattle - (206) 388-3466

-- Jinesh

White Paper on 'Cloud Architectures' and Best Practices of Amazon S3, EC2, SimpleDB, SQS

I am very happy to announce my white paper on Cloud Architectures is now ready. This is one incarnation of the Emerging Cloud Service Architectures that Jeff wrote about a few weeks ago.

If you are new to the cloud, the first section of the paper will help you understand the benefits of building applications in-the-cloud. If you are using the cloud already, the second section of the paper will help you to use the cloud more effectively by utilizing some of the best practices.

In this paper, I discuss a new way to design architectures. Cloud Architectures are Services-Oriented Architectures that are designed to use On-demand infrastructure more effectively. Applications built on Cloud Architectures are such that the underlying computing infrastructure is used only when it is needed (for example to process a user request), draw the necessary resources on-demand (like compute servers or storage), perform a specific job, then relinquish the unneeded resources after the job is done. While in operation the application scales up or down elastically based on actual need for resources. Everything is automated and operates without any human intervention.


As an example of a Cloud Architecture, I discuss the GrepTheWeb application. This application runs a regular expression against millions of documents from the web and returns the filtered results which match the query. The architecture is interesting because it runs completely on-demand in an automated fashion. Triggered by a regex request, hundreds of Amazon EC2 instances are launched, a Hadoop cluster is started on them, transient messages are stored in Amazon SQS queues, statuses are stored in Amazon SimpleDB, and all Map/Reduce jobs are run in parallel. Each Map task fetches a file from Amazon S3 and runs the regular expression against it. The results are aggregated in the Reduce/Combine phase, and once the Hadoop job has been processed, all of the infrastructure is released back into the cloud.
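To make the Map/Reduce flow a little more concrete, here is a minimal single-process sketch in Python. It is not the GrepTheWeb code: the `DOCUMENTS` dictionary is a hypothetical stand-in for files stored in Amazon S3, the function names are made up for illustration, and the loop runs sequentially where a Hadoop cluster would run the Map tasks in parallel.

```python
import re
from collections import defaultdict

# Hypothetical in-memory stand-in for documents that would live in Amazon S3.
DOCUMENTS = {
    "doc-1": "Amazon EC2 provides resizable compute capacity.",
    "doc-2": "Hadoop runs Map and Reduce tasks in parallel.",
    "doc-3": "Amazon S3 is storage for the Internet.",
}


def map_task(doc_id, text, pattern):
    """Map phase: emit a (doc_id, match) pair for every regex match in one document."""
    return [(doc_id, m.group(0)) for m in pattern.finditer(text)]


def reduce_task(pairs):
    """Reduce phase: group the emitted matches by document."""
    grouped = defaultdict(list)
    for doc_id, match in pairs:
        grouped[doc_id].append(match)
    return dict(grouped)


def grep_documents(documents, regex):
    """Run the regex over every document and aggregate the matches."""
    pattern = re.compile(regex)
    emitted = []
    # Sequential here; on a real cluster each document would be a separate Map task.
    for doc_id, text in documents.items():
        emitted.extend(map_task(doc_id, text, pattern))
    return reduce_task(emitted)


results = grep_documents(DOCUMENTS, r"Amazon \w+")
print(results)
```

Documents without a match simply do not appear in the result, which keeps the Reduce output proportional to the number of hits and not to the size of the corpus.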

GrepTheWeb is one of many applications built by Amazon that uses all our services (Amazon EC2, Amazon SimpleDB, Amazon SQS, Amazon S3) together.


A wide variety of applications can be built using this design approach, from nightly batch processing systems to media processing pipelines.

An excerpt:

Cloud Architectures address key difficulties surrounding large-scale data processing. In traditional data processing it is difficult to get as many machines as an application needs. Second, it is difficult to get the machines when one needs them.  Third, it is difficult to distribute and co-ordinate a large-scale job on different machines, run processes on them, and provision another machine to recover if one machine fails. Fourth, it is difficult to auto-scale up and down based on dynamic workloads.  Fifth, it is difficult to get rid of all those machines when the job is done. Cloud Architectures solve such difficulties.

Applications built on Cloud Architectures run in-the-cloud where the physical location of the infrastructure is determined by the provider. They take advantage of simple APIs of Internet-accessible services that scale on-demand, that are industrial-strength, where the complex reliability and scalability logic of the underlying services remains implemented and hidden inside-the-cloud. The usage of resources in Cloud Architectures is as needed, sometimes ephemeral or seasonal, thereby providing the highest utilization and optimum bang for the buck.

In the first section I discuss the advantages and business benefits of Cloud Architectures and how each service was used. In the second section, I discuss best practices for the various Amazon Web Services.

You can download the PDF version or access it on the AWS Resource Center.

I talked about this briefly at the Hadoop Summit 2008 and QCon 2007. I got some good reviews after the talks, so I decided to put all my thoughts in this paper along with some Best Practices for the use of Amazon Web Services (Amazon EC2, Amazon SQS, Amazon S3 and Amazon SimpleDB together). Many developers from our community have been asking for a real-world example of a complex, large-scale application. I will be presenting this paper at the 2008 NSF Data-Intensive Scalable Computing Workshop at UW and the 9th IEEE/NATEA Conference on Cloud Computing later this week.

I believe this new and emerging way of building applications that run in-the-cloud is going to change the way we do business.

-- Jinesh

Jollat - Cross-Platform AWS Manager Client

Andras wrote to tell me about Jollat, a new graphical cross-platform (Windows, Mac, and Linux) management client for Amazon EC2 and S3. Available for free download (with a purchase option), the client includes a number of interesting features.

On the S3 side, Jollat handles bucket creation in both the US and EU zones, upload and download of multiple files, log file configuration and management, and an access control list (ACL) editor.

On the EC2 side, Jollat's image manager makes it easy to find and launch any AMI (Amazon Machine Image). Once launched, instances can be accessed using an embedded SSH client. The tool also manages availability zones, IP addresses, and key pairs.

You can see Jollat in action by watching the video.

-- Jeff;

Friday Wrapup...

It is finally summer here in Seattle and I'm trying to get out of the office as early as possible today. Here are a few cool things that have recently landed in my inbox:

Don MacAskill wrote to tell me about his new product, SmugVault. This new service extends the existing image storage capabilities provided by SmugMug, allowing users to upload a wide variety of image files, as well as files of other sorts, for safekeeping. Don used Amazon's DevPay system to implement a usage-based fee structure -- $1 per month and 22 cents per GB of storage, along with fees to transfer data in and out. The entire contents of a 2 GB memory card can be stored for just 44 cents per month. SmugVault can create finished products from raw video and image data, and it can also bundle together alternate formats of an image (GIF, RAW, and so forth).
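The 44-cent figure is just the storage rate times the card size. Here is a small Python sketch of that arithmetic. The two rates come from the paragraph above; the function names are made up, and the way the $1 monthly fee is added on top of storage is my assumption about how the two charges combine. Transfer fees are left out because the post does not give them.

```python
# Rates quoted above, kept in whole cents to avoid floating-point rounding.
BASE_FEE_CENTS = 100        # $1 per month
STORAGE_CENTS_PER_GB = 22   # 22 cents per GB of storage per month


def storage_cost_cents(gigabytes):
    """Monthly storage charge only, in cents."""
    return STORAGE_CENTS_PER_GB * gigabytes


def monthly_total_cents(gigabytes):
    """Assumed total: flat monthly fee plus storage. Transfer fees are excluded."""
    return BASE_FEE_CENTS + storage_cost_cents(gigabytes)


print(storage_cost_cents(2))   # 44, the 2 GB memory card example
print(monthly_total_cents(2))  # 144 under the assumption above
```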

Animoto For Business adds professional features to Animoto's existing music video creation service. This version of the service can be used by businesses of all sorts -- sports teams, real estate agents, vacation resorts, and trade shows -- to produce DVD-quality videos in MP4 and ISO formats. Business users have access to a library of music that has been pre-licensed for commercial use. The product is brand-new but there are already some good success stories.

There's a new release of Cloud Studio. It now incorporates an S3 browser!

ElasticFox now includes support for Firefox 3.0, as does the S3 Firefox Organizer.

Jungle Dave wrote to tell me that he's released version 2.0 of Jungle Disk Desktop. The new version includes a revised user interface with a setup wizard, a configuration dialog, and a backup preview dialog; renaming and encryption of files and directories; support for European S3 buckets; bandwidth limiting, and much more. There's also a beta version of the workgroup version (details on it can be found here). This version allows multiple users to share a single Amazon S3 account without sharing account credentials, and also allows for per-user control of access to S3 buckets.

And with that, I am out of here!

-- Jeff;

 

Wanted: AWS Architecture Blog Posts & Diagrams

From time to time, potential users of AWS ask me about the best way to set up a highly scalable architecture using Amazon EC2, S3, SimpleDB, and SQS. I'd like to challenge readers of this blog to document their AWS-powered architectures in a blog post, preferably with a diagram, and to leave comments with a link back to their posts. I'll collect them all up in a future post.

Here are a few that I have already:

Doug Kaye described the architecture behind GigaVox Audio Lite in his post, Amazon for Infrastructure-on-Demand. Doug used EC2, S3, and SQS to build the highly scalable podcast processing system behind The Conversations Network.

Doug's implementation regulates the number of EC2 instances in use by tracking the amount of time it takes to process each work item in the queues which drive the Transcoding and Assembly processes.
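One way to picture this kind of regulation is a small decision function that looks at recent processing times and nudges the instance count up or down. The Python sketch below is not Doug's code; the target time, the "half the target" scale-down rule, and the instance limits are all hypothetical values chosen for illustration.

```python
from statistics import mean


def desired_instance_count(current, recent_seconds, target_seconds=60.0,
                           min_instances=1, max_instances=20):
    """Suggest an instance count from recent per-item processing times.

    All thresholds are illustrative assumptions:
    - slower than the target on average: add one instance
    - faster than half the target on average: remove one instance
    - otherwise: leave the count alone
    """
    if not recent_seconds:
        return current  # no data yet, so make no change
    average = mean(recent_seconds)
    if average > target_seconds:
        wanted = current + 1
    elif average < target_seconds / 2:
        wanted = current - 1
    else:
        wanted = current
    # Clamp to the allowed range so the controller can never run away.
    return max(min_instances, min(max_instances, wanted))


print(desired_instance_count(3, [90, 120]))  # work items are slow: scale up
print(desired_instance_count(3, [10, 20]))   # work items are fast: scale down
```

Changing the count by one step at a time and clamping it to a fixed range is a deliberately conservative choice; it trades reaction speed for protection against launching far more instances than intended.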

 

Don MacAskill described SmugMug's master controller (SkyNet) in SkyNet Lives! (aka EC2 @ SmugMug). Don's post doesn't include a block diagram, but it does include a cool usage graph (included at right).

Don's master controller watches 30 to 50 factors in order to make high quality scaling decisions. It was called RubberBand until it became sentient and attempted to take over the world (that is, to launch several hundred Extra-Large EC2 instances simultaneously). It was then renamed SkyNet.

 

The architecture behind online social media dating site Zoosk is described in Elastic Computing with Amazon Web Service, written by Zoosk CEO Shayan Zadeh.

Per the blog post, they use SQS to maintain a queue of uploaded photos. The photos are processed on EC2 and then uploaded to S3. The graph in the blog post indicates that they are adding approximately 4 TB of new photos every month.
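The queue-then-process-then-store flow described above can be sketched in a few lines of Python. This is not Zoosk's code: a local `queue.Queue` stands in for SQS, a plain dictionary stands in for an S3 bucket, and `process_photo` is a made-up placeholder for whatever resizing or conversion the real workers perform on EC2.

```python
import queue


def process_photo(data):
    """Placeholder for the real image processing step (hypothetical)."""
    return b"processed:" + data


def drain_uploads(upload_queue, storage):
    """Pull every pending upload off the queue, process it, and store the result.

    upload_queue stands in for SQS and storage stands in for an S3 bucket.
    Returns the number of photos handled.
    """
    handled = 0
    while True:
        try:
            name, data = upload_queue.get_nowait()
        except queue.Empty:
            return handled
        storage[name] = process_photo(data)
        handled += 1


uploads = queue.Queue()
uploads.put(("a.jpg", b"raw-a"))
uploads.put(("b.jpg", b"raw-b"))

bucket = {}
count = drain_uploads(uploads, bucket)
print(count, sorted(bucket))
```

Because the workers only talk to the queue and the store, more of them can be started during an upload spike without changing the code that accepts the uploads.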

 

The AWS Developer Connection has some worthwhile how-to articles as well. In Monster Muck Mashup - Mass Video Conversion Using AWS, Mitch Garnaat shows how to use SQS, EC2, and S3 to do video conversion in a scalable way.

The article Auto-scaling Amazon EC2 with Amazon SQS also has a whole lot of really good information.

 

Once again, I invite you to write an architecture post of your own and to leave a link to it in the comments. I would also like to see posts which make reference to load management tools such as Scalr, RightScale, and Elastra.

Updates (before I write the next big post):

-- Jeff;


GigaSpaces XAP - Now on Amazon EC2

The GigaSpaces XAP (eXtreme Application Platform) is now available as an Amazon EC2 AMI (Amazon Machine Image).

At the core, XAP implements a scalable, in-memory database which can be used as a data grid, a messaging grid, or as a parallel processing framework.

XAP makes it easy to scale the entire middleware layer (data, messaging, and services) of an application. It does this using an architecture which provides for just-in-time provisioning of processing resources, making it an ideal match for EC2.  You can build and test an application on your laptop, and then migrate it to your own data center or to Amazon EC2 without any code changes.

The entire system runs under the control of an SLA-driven container. The container hosts applications, scales out to additional instances as needed, and manages partitioning, replication, and failover.

Applications can be built using C++, any .Net language, or Java via Spring, Hibernate, Tomcat, Mule, or J2EE. These applications can easily store plain (native) objects into the core storage facility provided by GigaSpaces.

It is easy to launch a GigaSpaces cluster on Amazon EC2 by following the directions in the tutorial. The platform is available on a per-hour basis, charged through Amazon DevPay, per their pricing schedule. You can also see the whole EC2 launch and management process in action in the screencast.

GigaSpaces will be presenting a pair of screencasts next month. On July 1st, they will talk about The GigaSpaces-RightScale One-Click-Cluster. On July 22nd they will talk about Scaling Applications on Amazon EC2 (I'll be participating in that one).

-- Jeff;

JBoss Releases on Amazon EC2

By now many of you are aware that Red Hat Enterprise Linux is fully supported by Red Hat on Amazon EC2. You can read more about the offering at http://www.redhat.com/solutions/cloud/. Jeff Barr blogged about this in November, 2007 (aws.typepad.com/aws/2007/11/red-hat-enterpr.html).

I’m posting this from Boston, where I am attending the Red Hat Global Summit -- more specifically helping with a hands-on lab that teaches developers and IT staff how to deploy Red Hat Enterprise Linux (RHEL) on Amazon EC2. (It's really easy.) It’s been fun to meet enterprise developers from all over the world, and surprising to find out that, no matter what country the developer is in, awareness about Cloud Computing is high.

Perhaps you already saw the posts in other blogs… Red Hat announced that their JBoss Enterprise Application Platform is available in beta form as a service within the Amazon Elastic Compute Cloud (Amazon EC2).

Traditionally we think of Java application servers as building blocks that live in a hallowed enterprise data center; however with this announcement yet another one of those essential technologies is running fully supported by the vendor in the Cloud. In mission-critical applications support is essential--and for Red Hat products that means 24x7 operational support plus developer support. See www.redhat.com/support/policy/sla/production for a menu of offerings to choose from.

This is all quite amazing. Just over two years ago Amazon Simple Storage Service launched, followed in August of 2006 by Amazon Elastic Compute Cloud. In the short span of time since 2006 we’ve seen Cloud Computing grow from an idea to “of course we use it” for many organizations. With the advent of powerhouse enterprise infrastructure and applications, it seems inevitable that line-of-business applications in the cloud will become commonplace.

Getting started is easy, with just three steps:

  1. Sign up for Amazon EC2
  2. Purchase a subscription to Red Hat Enterprise Linux (RHEL) on Amazon EC2 or purchase a subscription to JBoss on Amazon EC2
  3. Deploy your applications on the newly-minted application server; then optionally make a custom AMI from this image and save it as your own private version in Amazon S3.

You can learn more at aws.amazon.com/partners/redhat.

-- Mike

Splunk Ninja & Processing Distributed Logs

Early this morning, Ilya Grigorik, founder of AideRSS, sent me a short note via Twitter to tell me about his latest blog post.

In the post, he described his use of a single instance of Splunk to process application log files from several dozen Amazon EC2 instances. He also included a bit of Ruby code which illustrates the process of logging data to Splunk over a socket connection.
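For readers who have not seen this pattern, here is a rough Python sketch of writing log lines to a remote collector over a plain TCP socket. It is not Ilya's Ruby code. The line format and the function names are made up for illustration, and the sketch assumes that the receiving side (for example, a Splunk instance) has been configured to accept newline-delimited text on whatever host and port you pass in.

```python
import socket
from datetime import datetime, timezone


def format_event(app, level, message, when=None):
    """Build one newline-terminated log line (the format is an illustrative choice)."""
    when = when or datetime.now(timezone.utc)
    return f"{when.isoformat()} app={app} level={level} {message}\n"


def send_events(host, port, lines, timeout=5.0):
    """Open one TCP connection and write each log line to it.

    host and port are assumptions: they must point at a listener that
    accepts raw text over TCP.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        for line in lines:
            conn.sendall(line.encode("utf-8"))


stamp = datetime(2008, 7, 1, 12, 0, 0, tzinfo=timezone.utc)
print(format_event("web", "INFO", "started", stamp), end="")
```

Sending every instance's lines to one collector is what makes it possible to search the logs of several dozen machines in a single place.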

Splunk is a very cool analysis tool for system and application log files. It indexes the logs, makes it easy to search them, lets you create alerts, and even generates some spiffy-looking reports, among other things.

Minutes later, one of my colleagues sent me another blog post related to Splunk. In that post, the Splunk Ninja (motto: "All batbelt. No tights.") shows (in video form) how he uses EC2 and S3 to demonstrate Splunk and its log processing tools. The Ninja likes the fact that EC2 offers quick provisioning and scaling, and that he doesn't have to buy anything or wait for it to be delivered. He does complain that there's no pretty GUI for EC2, so I'll have to tell him about ElasticFox.

Update: The Splunk Ninja has posted a new and longer video! This one covers ElasticFox, RightScale, CloudStatus, and two very cool Splunk add-ins: Splunk Replay and Splunk Globe.

-- Jeff;
