Archive for the ‘Azure Marketplace’ Category

How to install Spark 1.2 on Azure HDInsight clusters

Friday, March 20th, 2015

How to install Spark 1.2 on Azure HDInsight clusters by Maxim Lukiyanov.

From the post:

Today we are pleased to announce the refresh of the Apache Spark support on Azure HDInsight clusters. Spark is available on HDInsight through custom script action and today we are updating it to support the latest version of Spark 1.2. The previous version supported version 1.0. This update also adds Spark SQL support to the package.

Spark 1.2 script action requires latest version of HDInsight clusters 3.2. Older HDInsight clusters will get previous version of Spark 1.0 when customized with Spark script action.

Follow the below steps to create Spark cluster using Azure Portal:

The only remaining questions are: How good are you with Spark? And how big a Spark cluster do you need (or can you afford)?


Jump-Start Big Data with Hortonworks Sandbox on Azure

Thursday, March 19th, 2015

Jump-Start Big Data with Hortonworks Sandbox on Azure by Saptak Sen.

From the post:

We’re excited to announce the general availability of Hortonworks Sandbox for Hortonworks Data Platform 2.2 on Azure.

Hortonworks Sandbox is already a very popular environment in which developers, data scientists, and administrators can learn and experiment with the latest innovations in the Hortonworks Data Platform.

The hundreds of innovations span Hadoop, Kafka, Storm, Hive, Pig, YARN, Ambari, Falcon, Ranger, and other components of which HDP is composed. Now you can deploy this environment for your learning and experimentation in a few clicks on Microsoft Azure.

Follow the guide to Getting Started with Hortonworks Sandbox with HDP 2.2 on Azure to set up your own dev-ops environment on the cloud in a few clicks.

We also provide step by step tutorials to help you get a jump-start on how to use HDP to implement a Modern Data Architecture at your organization.

The Hadoop Sandbox is an excellent way to explore the Hadoop ecosystem. If you trash the setup, just open another sandbox.

Add Hortonworks tutorials to the sandbox and you are less likely to do something really dumb. Or at least you will understand what happened and how to avoid it before you go into production. Always nice to keep the dumb mistakes on your desktop.

Now the Hortonworks Sandbox is on Azure. Same safe learning environment but with the power to scale when you are ready to go live!

Quick start guide to R for Azure Machine Learning

Friday, March 13th, 2015

Quick start guide to R for Azure Machine Learning by Larry Franks.

From the post:

Microsoft Azure Machine Learning contains many powerful machine learning and data manipulation modules. The powerful R language has been described as the lingua franca of analytics. Happily, analytics and data manipulation in Azure Machine Learning can be extended by using R. This combination provides the scalability and ease of deployment of Azure Machine Learning with the flexibility and deep analytics of R.

This document will help you quickly start extending Azure Machine Learning by using the R language. This guide contains the information you will need to create, test and execute R code within Azure Machine Learning. As you work through this quick start guide, you will create a complete forecasting solution by using the R language in Azure Machine Learning.

BTW, I deleted an ad in the middle of the pasted text that said you can try Azure learning free. No credit card required. Check the site for details because terms can and do change.

I don’t know who suggested “quick” be in the title but it wasn’t anyone who read the post. 😉

Seriously, despite being long it is a great onboarding to using RStudio with Azure Machine Learning that ends with lots of good R resources.

Combining the strength of cloud based machine learning with a language that is standard in data science is a winning combination.

People will differ in their preferences for cloud based machine learning environments but this guide sets a high mark for guides concerning the same.


I first saw this in a tweet by Ashish Bhatia.

Azure Machine Learning Videos: February 2015

Monday, March 2nd, 2015

Azure Machine Learning Videos: February 2015 by Mark Tabladillo.

From the post:

With the general availability of Azure Machine Learning, Microsoft released a collection of eighteen new videos which accurately summarize what the product does and how to use it. Most of the videos are short, and some of the material overlaps: I don’t have a recommended order, but you could play the shorter ones first. In all cases, you can download a copy of each video for your own library or offline use.

Eighteen new videos of varying lengths, the shortest and longest are:

Getting Started with Azure Machine Learning – Step3 35 seconds.

Preprocessing Data in Azure Machine Learning Studio 10 minutes 52 seconds.

Believe it or not, it is possible to say something meaningful in 35 seconds. Not a lot but enough to suggest an experiment based on information from a previous module.

For those of you on the MS side of the house or anyone who likes a range of employment options.


Drag-n-Drop Machine Learning?

Wednesday, June 18th, 2014

Microsoft to provide drag-and-drop machine learning on Azure by Derrick Harris.

From the post:

Microsoft is stepping up its cloud computing game with a new service called Azure Machine Learning that lets users visually build machine learning models, and then publish APIs to insert those models into applications. The service, which will be available for public preview in July, is one of the first of its kind and the latest demonstration of Microsoft’s heavy investment in machine learning.

Azure Machine Learning will include numerous prebuilt model types and packages, including recommendation engines, decision trees, R packages and even deep neural networks (aka deep learning models), explained Joseph Sirosh, corporate vice president at Microsoft. The data that the models train on and analyze can reside in Azure or locally, and users are charged based on the number of API calls to their models and the amount of computing resources consumed running them.

The reason why there are so few data scientists today, Sirosh theorized, is that they need to know so many software tools and so much math and computer science just to experiment and build models. Actually deploying those models into production, especially at scale, opens up a whole new set of engineering challenges. Sirosh said Microsoft hopes Azure Machine Learning will open up advanced machine learning to anyone who understands the R programming language or, really, anyone with a respectable understanding of statistics.

“It’s also very simple. My high school son can build machine learning models and publish APIs,” he said.

Reducing the technical barriers to use machine learning is a great thing. However, if that also results in reducing the understanding of machine learning, its perils and pitfalls, that is also a very bad thing.

One of the strengths of the Weka courses taught by Prof. Ian H. Witten is that students learn that choices are made in machine learning algorithms that aren’t apparent to the casual user. And that data choices can make as much difference in outcomes as the algorithms used to process that data.

Use of software with no real understanding of its limitations isn’t new but with Azure Machine Learning any challenge to analysis will be met with the suggestion you “…run the analysis yourself.” Where the speaker does not understand that a replicated bad result is still a bad result.

Be prepared to challenge data and means of analysis used in drag-n-drop machine learning drive-bys.

Mahout on Windows Azure…

Tuesday, January 22nd, 2013

Mahout on Windows Azure – Machine Learning Using Microsoft HDInsight by Istvan Szegedi.

From the post:

Our last post was about Microsoft and Hortonworks joint effort to deliver Hadoop on Microsoft Windows Azure dubbed HDInsight. One of the key Microsoft HDInsight components is Mahout, a scalable machine learning library that provides a number of algorithms relying on the Hadoop platform. Machine learning supports a wide range of use cases from email spam filtering to fraud detection to recommending books or movies, similar to features. These algorithms can be divided into three main categories: recommenders/collaborative filtering, categorization and clustering. More details about these algorithms can be read on Apache Mahout wiki.
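To make the first of those categories concrete, here is a toy sketch of collaborative filtering in plain Python. It is not Mahout's API and does not run on Hadoop; the `recommend` function, the user names and the item names are all made up for illustration. The idea is only that items liked by users with overlapping tastes get recommended, weighted by how much the tastes overlap.

```python
from collections import Counter


def recommend(likes, user, top_n=3):
    """Suggest items liked by users who share at least one liked item with `user`.

    `likes` maps a user name to the set of items that user liked.
    (Hypothetical toy data model, not Mahout's.)
    """
    liked = likes[user]
    scores = Counter()
    for other, other_liked in likes.items():
        if other == user:
            continue
        overlap = len(liked & other_liked)
        if overlap == 0:
            continue  # no shared taste, so this user contributes nothing
        for item in other_liked - liked:
            # Weight each unseen item by how similar the other user is.
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical example data.
likes = {
    "ann": {"book_a", "book_b"},
    "bob": {"book_a", "book_b", "book_c"},
    "cy": {"book_b", "book_d"},
    "dee": {"book_e"},
}
suggestions = recommend(likes, "ann")
```

With the sample data, bob shares two items with ann and cy shares one, so bob's extra item ranks ahead of cy's, and dee's item is never suggested. A library like Mahout earns its keep when the same computation has to be spread across a cluster.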

Are you hearing Hadoop, Mahout, HBase, Hive, etc., as often as I am?

Does it make you wonder about Apache becoming the locus of transferable IT skills?

Something to think about as you are developing topic map ecosystems.

You can hand roll your own solutions.

Or build upon solutions that have widespread vendor support.

PS: Another great post from Istvan.

Redis on Windows Azure

Monday, January 21st, 2013

One step closer to full support for Redis on Windows, MS Open Tech releases 64-bit and Azure installer by Claudio Caldato.

From the post:

I’m happy to report new updates today for Redis on Windows Azure: the open-source, networked, in-memory, key-value data store. We’ve released a new 64-bit version that gives developers access to the full benefits of an extended address space. This was an important step in our journey toward full Windows support. You can download it from the Microsoft Open Technologies github repository.

Last April we announced the release of an important update for Redis on Windows: the ability to mimic the Linux Copy On Write feature, which enables your code to serve requests while simultaneously saving data on disk.

Along with 64-bit support, we are also releasing a Windows Azure installer that enables deployment of Redis on Windows Azure as a PaaS solution using a single command line tool. Instructions on using the tool are available on this page and you can find a step-by-step tutorial here. This is another important milestone in making Redis work great on the Windows and Windows Azure platforms.

We are happy to communicate that we are using now the Microsoft Open Technologies public github repository as our main go-to SCM so the community will be able to follow what is happening more closely and get involved in our project.

Is it just me or does it seem like technology is getting easier to deploy?

Perhaps my view is jaded by doing Linux installs with raw write 1.44 MB floppies and editing boot sectors at the command line. 😉

If you like Redis or Azure, either way this is welcome news!

Drupal + Azure = OData Repository

Monday, January 21st, 2013

Using Drupal on Windows Azure to create an OData repository by Brian Benz.

From the post:

OData is an easy to use protocol that provides access to any data defined as an OData service provider. Microsoft Open Technologies, Inc., is collaborating with several other organizations and individuals in development of the OData standard in the OASIS OData Technical Committee, and the growing OData ecosystem is enabling a variety of new scenarios to deliver open data for the open web via standardized URI query syntax and semantics. To learn more about OData, including the ecosystem, developer tools, and how you can get involved, see this blog post.

In this post I’ll take you through the steps to set up Drupal on Windows Azure as an OData provider. As you’ll see, this is a great way to get started using both Drupal and OData, as there is no coding required to set this up.

It also won’t cost you any money – currently you can sign up for a 90 day free trial of Windows Azure and install a free Web development tool (Web Matrix) and a free source control tool (Git) on your local machine to make this happen, but that’s all that’s required from a client point of view. We’ll also be using a free tier for the Drupal instance, so you may not need to pay even after the 90 day trial, depending on your needs for bandwidth or storage.

So let’s get started!

Definitely worthwhile to spend some time getting to know the OData specification. It is currently under active development at OASIS.

Doesn’t do everything you might want but tries to do the things everyone needs as a basis for other services.
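As a taste of the "standardized URI query syntax" mentioned in the post, here is a minimal sketch in Python that assembles an OData query URL from three of the standard system query options ($filter, $orderby, $top). The helper name `odata_query_url`, the service root and the entity set name are assumptions made up for illustration; nothing here comes from the Drupal setup in Brian's post.

```python
from urllib.parse import quote


def odata_query_url(service_root, entity_set, filter_expr=None, orderby=None, top=None):
    """Build an OData query URL from a few standard system query options.

    The service root and entity set are placeholders supplied by the caller.
    """
    options = []
    if filter_expr is not None:
        options.append(("$filter", filter_expr))
    if orderby is not None:
        options.append(("$orderby", orderby))
    if top is not None:
        options.append(("$top", str(top)))

    base = service_root.rstrip("/") + "/" + quote(entity_set)
    if not options:
        return base
    # Percent-encode each option value; the "$" in option names is left as-is.
    query = "&".join(name + "=" + quote(value, safe="") for name, value in options)
    return base + "?" + query


# Hypothetical service and entity set.
url = odata_query_url(
    "http://example.org/odata",
    "Articles",
    filter_expr="Title eq 'Azure'",
    top=5,
)
```

The point is that any OData consumer can form the same kind of request against any OData provider, which is what makes the protocol a reasonable base for other services.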

Thoughts on how to represent “merged” entities in OData subject to the conditions:

  1. Entities and their unique identifiers are not re-written, and
  2. Solution is consistent with the base OData data model?

Thinking back to the original text of ISO/IEC 13250 which required presentation of topics as merged, whether bits moved about to create a “merged” representation or not.
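One possible answer, sketched in Python rather than in OData terms: keep the entities and their identifiers exactly as they are, and record the merges in a separate structure that is consulted only at presentation time. Everything below (the `MergeView` class, the `merged_presentation` function, the sample identifiers) is a hypothetical illustration, not part of the OData data model or of any TC proposal.

```python
class MergeView:
    """Records which entity identifiers denote the same subject.

    The entities themselves are never touched; only this side table changes.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, entity_id):
        # Follow parent links until reaching the representative of the group.
        self._parent.setdefault(entity_id, entity_id)
        while self._parent[entity_id] != entity_id:
            entity_id = self._parent[entity_id]
        return entity_id

    def merge(self, id_a, id_b):
        root_a, root_b = self._find(id_a), self._find(id_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def merged_with(self, entity_id):
        root = self._find(entity_id)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


def merged_presentation(entities, view, entity_id):
    """Combine the properties of every entity merged with `entity_id`."""
    combined = {}
    for member in view.merged_with(entity_id):
        for key, value in entities[member].items():
            combined.setdefault(key, set()).add(value)
    return combined


# Hypothetical entities keyed by their original, unchanged identifiers.
entities = {
    "Authors(1)": {"name": "I. Witten"},
    "People(42)": {"name": "Ian H. Witten", "field": "machine learning"},
}
view = MergeView()
view.merge("Authors(1)", "People(42)")
presented = merged_presentation(entities, view, "People(42)")
```

The entities dictionary is identical before and after the merge, which satisfies the first condition. Whether such a side table can be expressed consistently with the base OData data model is the open part of the question.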

(Disclosure: I am a member of the OData TC.)

Getting Started with VM Depot

Friday, January 11th, 2013

Getting Started with VM Depot by Doug Mahugh.

From the post:

Do you need to deploy a popular OSS package on a Windows Azure virtual machine, but don’t know where to start? Or do you have a favorite OSS configuration that you’d like to make available for others to deploy easily? If so, the new VM Depot community portal from Microsoft Open Technologies is just what you need. VM Depot is a community-driven catalog of preconfigured operating systems, applications, and development stacks that can easily be deployed on Windows Azure.

You can learn more about VM Depot in the announcement from Gianugo Rabellino over on Port 25 today. In this post, we’re going to cover the basics of how to use VM Depot, so that you can get started right away.

Doug outlines simple steps to get you rolling with the VM Depot.

Sounds a lot easier than trying to walk casual computer users through installation and configuration of software. I assume you could even load data onto the VMs.

Users just need to fire up the VM and they have the interface and data they want.

Sounds like a nice way to distribute topic map based information systems.

Installing Neo4j in an Azure Linux VM

Saturday, December 29th, 2012

Installing Neo4j in an Azure Linux VM by Howard Dierking.

From the post:

I’ve been playing with Neo4j a lot recently. I’ll be writing a lot more about that later, but at a very very high level, Neo4j is a graph database that in addition to some language-specific bindings has a slick HTTP interface. You can install it on Windows, Linux, and Mac OSX, so if you’re more comfortable on Windows, don’t read this post and think that you can’t play with this awesome database unless you forget everything you know, replace your wardrobe with black turtlenecks, and write all your code in vi (though that is an option). For me, though, I hate installers and want the power of a package manager such as homebrew (OSX) or apt-get (Linux). So I’m going to take you through the steps that I went through to get neo4j running on Linux. And just to have a little more fun with things, I’ll host neo4j on a Linux VM hosted in Azure.

Azure, Neo4j, a Linux VM and CLI tools, what more could you want?

Definitely a must read post for an easy Neo4j launch on an Azure Linux VM.

Howard promises more posts on Neo4j to follow.

Microsoft Open Technologies releases Windows Azure support for Solr 4.0

Monday, December 24th, 2012

Microsoft Open Technologies releases Windows Azure support for Solr 4.0 by Brian Benz.

From the post:

Microsoft Open Technologies is pleased to share the latest update to the Windows Azure self-deployment option for Apache Solr 4.0.

Solr 4.0 is the first release to use the shared 4.x branch for Lucene & Solr and includes support for SolrCloud functionality. SolrCloud allows you to scale a single index via replication over multiple Solr instances running multiple SolrCores for massive scaling and redundancy.

To learn more about Solr 4.0, have a look at this 40 minute video covering Solr 4 Highlights, by Mark Miller of LucidWorks from Apache Lucene Eurocon 2011.

To download and install Solr on Windows Azure visit our GitHub page to learn more and download the SDK.

Another alternative for implementing the best of Lucene/Solr on Windows Azure is provided by our partner LucidWorks. LucidWorks Search on Windows Azure delivers a high-performance search solution that enables quick and easy provisioning of Lucene/Solr search functionality without any need to install, manage or operate Lucene/Solr servers, and it supports pre-built connectors for various types of enterprise data, structured data, unstructured data and web sites.

Beyond the positive impact for Solr and Azure in general, this means your Solr skills will be useful in new places.

Hadoop on Azure : Introduction

Tuesday, November 20th, 2012

Hadoop on Azure : Introduction by BrunoTerkaly.

From the post:

I am in complete awe on how this technology is resonating with today’s developers. If I invite developers for an evening event, Big Data is always a sellout.

This particular post is about getting everyone up to speed about what Hadoop is at a high level.

Big data is a technology that manages voluminous amount of unstructured and semi-structured data.

Due to its size and semi-structured nature, it is inappropriate for relational databases for analysis.

Big data is generally in the petabytes and exabytes of data.

A very high level view but a series to watch as the details emerge on using Hadoop on Azure.

Enabling Big Data Insight for Millions of Windows Developers [Your Target Audience?]

Thursday, October 25th, 2012

Enabling Big Data Insight for Millions of Windows Developers by Shaun Connolly.

From the post:

At Hortonworks, we fundamentally believe that, in the not-so-distant future, Apache Hadoop will process over half the world’s data flowing through businesses. We realize this is a BOLD vision that will take a lot of hard work by not only Hortonworks and the open source community, but also software, hardware, and solution vendors focused on the Hadoop ecosystem, as well as end users deploying platforms powered by Hadoop.

If the vision is to be achieved, we need to accelerate the process of enabling the masses to benefit from the power and value of Apache Hadoop in ways where they are virtually oblivious to the fact that Hadoop is under the hood. Doing so will help ensure time and energy is spent on enabling insights to be derived from big data, rather than on the IT infrastructure details required to capture, process, exchange, and manage this multi-structured data.

So how can we accelerate the path to this vision? Simply put, we focus on enabling the largest communities of users interested in deriving value from big data.

You don’t have to wonder long what Shaun is reacting to:

Today Microsoft unveiled previews of Microsoft HDInsight Server and Windows Azure HDInsight Service, big data solutions that are built on Hortonworks Data Platform (HDP) for Windows Server and Windows Azure respectively. These new offerings aim to provide a simplified and consistent experience across on-premise and cloud deployment that is fully compatible with Apache Hadoop.

Enabling big data insight isn’t the same as capturing those insights for later use or re-use.

May just be me, but that sounds like a great opportunity for topic maps.

Bringing semantics to millions of Windows developers that is.

MongoDB Installer for Windows Azure

Tuesday, July 10th, 2012

MongoDB Installer for Windows Azure by Doug Mahugh.

From the post:

Do you need to build a high-availability web application or service? One that can scale out quickly in response to fluctuating demand? Need to do complex queries against schema-free collections of rich objects? If you answer yes to any of those questions, MongoDB on Windows Azure is an approach you’ll want to look at closely.

People have been using MongoDB on Windows Azure for some time (for example), but recently the setup, deployment, and development experience has been streamlined by the release of the MongoDB Installer for Windows Azure. It’s now easier than ever to get started with MongoDB on Windows Azure!

If you are developing or considering developing with MongoDB, this is definitely worth a look. In part because it frees you to concentrate on software development and not running (or trying to run) a server farm. Different skill sets.

Another reason is that it levels the playing field with big IT firms with server farms. You get the advantages of a server farm without the capital investment in one.

And as Microsoft becomes a bigger and bigger tent for diverse platforms and technologies, you have more choices. Choices for the changing requirements of your clients.

Not that I expect to see an Apple hanging from the Microsoft tree anytime soon but you can’t ever tell. Enough consumer demand and it could happen.

In the meantime, while we wait for better games and commercials, consider how you would power semantic integration in the cloud.

SQL Azure Labs Posts

Tuesday, May 22nd, 2012

Roger Jennings writes in Recent Articles about SQL Azure Labs and Other Value-Added Windows Azure SaaS Previews: A Bibliography:

I’ve been concentrating my original articles for the past six months or so on SQL Azure Labs, Apache Hadoop on Windows Azure and SQL Azure Federations previews, which I call value-added offerings. I use the term value-added because Microsoft doesn’t charge for their use, other than Windows Azure compute, storage and bandwidth costs or SQL Azure monthly charges and bandwidth costs for some of the applications, such as Codename “Cloud Numerics” and SQL Azure Federations.

As of 22 May 2012, there are forty-four (44) posts in the following categories:

  • Windows Azure Marketplace DataMarket plus Codenames “Data Hub” and “Data Transfer” from SQL Azure Labs
  • Apache Hadoop on Windows Azure from the SQL Server Team
  • Codename “Cloud Numerics” from SQL Azure Labs
  • Codename “Social Analytics” from SQL Azure Labs
  • Codename “Data Explorer” from SQL Azure Labs
  • SQL Azure Federations from the SQL Azure Team

If you need quick guides and/or incentives to use Windows Azure, try these on for size.

Importing UK Weather Data from Azure Marketplace into PowerPivot

Sunday, April 15th, 2012

Importing UK Weather Data from Azure Marketplace into PowerPivot by Chris Webb.

From the post:

I don’t always agree with everything Rob Collie says, much as I respect him, but his recent post on the Windows Azure Marketplace (part of which used to be known as the Azure Datamarket) had me nodding my head. The WAM has been around for a while now and up until recently I didn’t find anything much there that I could use in my day job; I had the distinct feeling it was going to be yet another Microsoft white elephant. The appearance of the DateStream date dimension table (see here for more details) was for me a turning point, and a month ago I saw something really interesting: detailed weather data for the UK from the Met Office (the UK’s national weather service) is now available there too. OK, it’s not going to be very useful for anyone outside the UK, but the UK is my home market and for some of my customers the ability to do things like use weather forecasts to predict footfall in shops will be very useful. It’s exactly the kind of data that analysts want to find in a data market, and if the WAM guys can add other equally useful data sets they should soon reach the point where WAM is a regular destination for all PowerPivot users.

Importing this weather data into PowerPivot isn’t completely straightforward though – the data itself is quite complex. The Datamarket guys are working on some documentation for it but in the meantime I thought I’d blog about my experiences; I need to thank Max Uritsky and Ziv Kaspersky for helping me out on this.

I don’t live in the UK nor do I use PowerPivot but I suspect readers of this blog may fall into either category or both. In any event, learning more about data sources, import and even software is always a useful thing.

All of those are likely to be sources you will need or encounter when authoring a topic map.

Interesting that while Amazon is striving to bring “big data” processing skills to everyone, the importing of data remains a roadblock for some users. Standard exports for particular data sets may become a commodity.