This is the third part of a blog series introducing some of the exciting new features of Windows Azure SDK 2.0. In this series I'll take you on a tour of the new Windows Azure SDK 2.0 features so you can experience them firsthand. You can find the SDK 2.0 announcement here. You can acquire SDK 2.0 through Web PI, or you can download it from the Windows Azure .NET Developer Center.
Previous parts:
Windows Azure SDK 2.0 Features (1) – An Even Better Server Explorer
Windows Azure SDK 2.0 Features (2) – An Even Better Web Sites Experience
High Memory Machines
With Windows Azure IaaS now generally available (see the announcements on MSDN and Scott Gu's blog), developers and ISVs can take advantage of more VM image templates, larger VM sizes, and reduced VM prices. And because IaaS is now in GA status, VMs running on Windows Azure are formally backed by one of the industry's highest monthly SLAs.
Windows Azure IaaS now offers two high memory machine sizes: A6 (28 GB / 4 cores) and A7 (56 GB / 8 cores). A machine with more cores and more memory enables you to scale up your application to improve performance and increase throughput. More importantly, it enables you to run your existing applications against bigger loads instead of having to change your application architecture to scale out. Of course, scaling out is definitely the preferred way of scaling on Windows Azure (and any other cloud platform). However, changing application architecture is not an easy task, if it is possible at all (for instance, when you use third-party applications that are designed for single machines). Now, with the high memory machines, not only can you deploy your existing memory-hungry applications directly on IaaS, you can also design your Cloud Services to take advantage of these machines.
Sample Scenario
In this sample scenario, we'll create a Cloud Service that allows users to navigate through large data sets. The data sets I chose in this case are Sea Surface Temperature, Salinity, and Density data sets from NASA. They consist of 10,800 1920x1080 .tiff images, which occupy about 7.3 GB of disk space. The idea is to provide a web site that allows users to navigate through the data sets and switch freely among the three image series. To improve loading speed, I'm using a co-located cache cluster to cache image frames. Obviously, the bigger the cache, the more pictures I'll be able to preload into the cache to provide a better user experience.
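As a rough back-of-envelope estimate (my own arithmetic, not a figure from the data set documentation): 7.3 GB spread over 10,800 frames is roughly 0.7 MB per image, so a cache of a few tens of gigabytes, which the 56 GB A7 size makes possible, can hold most or even all of the frames at once.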
I've pre-loaded all images to my Windows Azure storage account. When the web role starts, it begins to pre-load as many images as possible into the cache. When a user requests an image, the system checks the cache first; if the image is not in the cache, it queues a new loading request.
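To give you a feel for the pre-loading part, here is a minimal sketch (not the actual project code; the "images" container name, the "StorageConnectionString" setting, and the assumption that blob names match the "{series}-{frame:0000}" cache keys are all mine), using the storage client library that ships with SDK 2.0:

// Simplified background pre-loader sketch: walks the blobs in an assumed "images"
// container and pushes each frame into the co-located cache.
using System.IO;
using System.Linq;
using Microsoft.ApplicationServer.Caching;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class ImagePreloader
{
    public static void PreloadImages()
    {
        // Assumed setting name; the real project may read its connection string differently.
        var account = CloudStorageAccount.Parse(
            RoleEnvironment.GetConfigurationSettingValue("StorageConnectionString"));
        var container = account.CreateCloudBlobClient().GetContainerReference("images");
        var cache = new DataCache();

        foreach (var blob in container.ListBlobs().OfType<CloudBlockBlob>())
        {
            using (var stream = new MemoryStream())
            {
                blob.DownloadToStream(stream);
                // Assumes blob names follow the same "{series}-{frame:0000}" pattern as the cache keys.
                cache.Put(blob.Name, stream.ToArray());
            }
        }
    }
}

A loop like this would typically be kicked off on a background thread from the web role's startup code so that it doesn't block the role from serving requests.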
In future versions, more analytical features will be added, such as looking for sudden changes in image series, comparing different images, etc.
I picked this scenario out of dozens of candidates, many of which were compute-intensive scenarios. I settled on this one in the end because it exercises every aspect of the powerful machine offering: more cores, abundant memory, and higher network bandwidth allocations. We often hear people say "that's a Big Data problem" when they see a large amount of data. However, in my opinion, Big Data is not a problem; it's a system. It's a system for gaining BI from various data sources and analyses. A Big Data system includes 1) data discovery; 2) data collection; 3) data transformation; 4) data storage; 5) analysis; and 6) result presentation and sharing. Running a complex simulation only covers the fifth part. This scenario, on the other hand, shows how to collect, store, transform, and share a large amount of data.
Implementation
The project is a Cloud Service with an ASP.NET MVC 4 Web Role (using the Internet Application template). I enabled co-located caching with 50% of memory allocated to the cache cluster and cache entities that live forever. Images are served to clients via an API controller. The following is the gist of this part of the code:
DataCache cache = new DataCache();
var index = string.Format("{0}-{1:0000}", series, frame);
HttpResponseMessage response = new HttpResponseMessage();
var data = cache.Get(index);
if (data != null)
{
    response.Content = new StreamContent(new MemoryStream((byte[])data));
    response.Content.Headers.ContentType = new MediaTypeHeaderValue("image/tiff");
}
else
{
    // Queue another request to get the image from blob storage and then put it in cache.
}
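The "queue another request" branch isn't shown in the gist. One simple way it could work (a sketch of my own, not the project's actual code; the class, queue, and blob naming below are made up) is an in-memory work queue drained by a background thread:

// Hypothetical background loader: requests for missing frames go into a queue,
// and a worker thread fetches them from blob storage and puts them in the cache.
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using Microsoft.ApplicationServer.Caching;
using Microsoft.WindowsAzure.Storage.Blob;

public static class FrameLoader
{
    private static readonly BlockingCollection<string> Requests = new BlockingCollection<string>();

    // Called from the API controller's else branch; cheap and returns immediately.
    public static void QueueLoad(string index)
    {
        Requests.Add(index);
    }

    // Started once at application startup with the container holding the frames.
    public static void StartWorker(CloudBlobContainer container)
    {
        var worker = new Thread(() =>
        {
            var cache = new DataCache();
            foreach (var index in Requests.GetConsumingEnumerable())
            {
                if (cache.Get(index) != null) continue; // already loaded by the pre-loader
                // Assumes blob names are the cache key plus a .tiff extension.
                var blob = container.GetBlockBlobReference(index + ".tiff");
                using (var stream = new MemoryStream())
                {
                    blob.DownloadToStream(stream);
                    cache.Put(index, stream.ToArray());
                }
            }
        });
        worker.IsBackground = true;
        worker.Start();
    }
}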
Result
I've published an early version of the app on one of the A7 machines (http://oceandatasample.cloudapp.net/). The publishing process is exactly the same as before. The only thing I did was to change my service definition to use a larger VM size:
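The change boils down to the vmsize attribute on the role in ServiceDefinition.csdef (the role name here is a placeholder, not necessarily the one used in the project):

<!-- ServiceDefinition.csdef: vmsize can be set to A6 or A7 for the high memory sizes -->
<WebRole name="WebRole1" vmsize="A7">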
The application largely works. You can drag the slider to navigate through the 3,600 frames of HD images in each data series, and you can switch among the data series at any time by clicking the corresponding water drops at the top of the screen. The play buttons don't work quite right yet.