Archive for the ‘Microsoft’ Category
Tuesday, May 21st, 2013
Hadoop, Hadoop, Hurrah! HDP for Windows is Now GA! by John Kreisa.
From the post:
Today we are very excited to announce that Hortonworks Data Platform for Windows (HDP for Windows) is now generally available and ready to support the most demanding production workloads.
We have been blown away with the number and size of organizations who have downloaded the beta bits of this 100% open source, and native to Windows distribution of Hadoop and engaged Hortonworks and Microsoft around evolving their data architecture to respond to the challenges of enterprise big data.
With this key milestone HDP for Windows offers the millions of customers running their business on Microsoft technologies an ecosystem-friendly Hadoop-based solution that is built for the enterprise and purpose built for Windows. This release cements Apache Hadoop’s role as a key component of the next generation enterprise data architecture, across the broadest set of datacenter configurations as HDP becomes the first production-ready Apache Hadoop distribution to run on both Windows and Linux.
Additionally, customers now also have complete portability of their Hadoop applications between on-premise and cloud deployments via HDP for Windows and Microsoft’s HDInsight Service.
Two lessons here:
First, Hadoop is a very popular way to address enterprise big data.
Second, going where users are, not where they ought to be, is a smart business move.
Posted in Hadoop, Hortonworks, Microsoft | No Comments »
Friday, May 17th, 2013
Hadoop SDK and Tutorials for Microsoft .NET Developers by Marc Holmes.
From the post:
Microsoft has begun to treat its developer community to a number of Hadoop-y releases related to its HDInsight (Hadoop in the cloud) service, and it’s worth rounding up the material. It’s all Alpha and Preview so YMMV but looks like fun:
- Microsoft .NET SDK for Hadoop. This kit provides .NET API access to aspects of HDInsight including HDFS, HCatalog, Oozie and Ambari, and also some PowerShell scripts for cluster management. There are also libraries for MapReduce and LINQ to Hive. The latter is really interesting as it builds on the established technology for .NET developers to access most data sources to deliver the capabilities of the de facto standard for Hadoop data query.
- HDInsight Labs Preview. Up on Github, there is a series of 5 labs covering C#, JavaScript and F# coding for MapReduce jobs, using Hive, and then bringing that data into Excel. It also covers some Mahout use to build a recommendation engine.
- Microsoft Hive ODBC Driver. The examples above use this preview driver to enable the connection from Hive to Excel.
If all of the above excites you, our Hadoop on Windows for Developers training course also covers similar content in a lot of depth.
Hadoop is coming to an office/data center near you.
Will you be ready?
Posted in .Net, Hadoop, MapReduce, Microsoft | No Comments »
Tuesday, April 16th, 2013
You’re invited to help us celebrate an unlikely pairing in open source by Gianugo Rabellino.
From the post:
We are just days away from reaching a significant milestone for our team and the open source and open standards communities: the first anniversary of Microsoft Open Technologies, Inc. (MS Open Tech) — a wholly owned subsidiary of Microsoft.
We can’t think of anyone better to celebrate with than YOU, the members of the open source and open standards community and technology industry who have helped us along on our adventure over the past year.
We’d like to extend an open (pun intended!) invitation to celebrate with us on April 25, and share your burning questions on the future of the subsidiary, open source at-large and how MS Open Tech can better connect with the developer community to present even more choice and freedom.
I’ll be proud to share the stage with our amazing MS Open Tech leadership team: Jean Paoli, President; Kamaljit Bath, Engineering team leader; and Paul Cotton, Standards team leader and Co-Chair of the W3C HTML Working Group.
You have three choices:
- You can be a hard ass and stay home to “punish” MS for real and imagined slights and sins over the years. (You won’t be missed.)
- You can be obnoxious and attend, doing your best to not have a good time and trying to keep others from having a good time. (Better to stay home.)
- You can attend, have a good time, ask good questions, encourage more innovation and support by Microsoft for the open source and open standards communities.
Microsoft is going to be a major player in whatever solution to semantic interoperability catches on.
If that is topic maps, then Microsoft will be into topic maps.
I would prefer that be under the open source/open standards banner.
Distance prevents me from attending but I will be there in spirit!
Happy Anniversary to Microsoft Open Technologies, Inc.!
Posted in Microsoft, Open Source | No Comments »
Thursday, April 11th, 2013
MS Machine Learning Summit
From the post:
The live broadcast of the Microsoft Research Machine Learning Summit will include keynotes from machine learning experts and enlightening discussions with leading scientific and academic researchers about approaches to challenges that are raised by the new era in machine learning. Watch it streamed live from Paris on April 23, 2013, 13:30–17:00 Greenwich Mean Time (09:30–13:00 Eastern Time, 06:30–10:00 Pacific Time) at http://MicrosoftMLS.com.
I would rather be in Paris but watching the live stream will be a lot cheaper!
Posted in Machine Learning, Microsoft | No Comments »
Saturday, March 16th, 2013
Finding Shakespeare’s Favourite Words With Data Explorer by Chris Webb.
From the post:
The more I play with Data Explorer, the more I think my initial assessment of it as a self-service ETL tool was wrong. As Jamie pointed out recently, it’s really the M language with a GUI on top of it and the GUI itself, while good, doesn’t begin to expose the power of the underlying language: I’d urge you to take a look at the Formula Language Specification and Library Specification documents which can be downloaded from here to see for yourself. So while it can certainly be used for self-service ETL it can do much, much more than that…
In this post I’ll show you an example of what Data Explorer can do once you go beyond the UI. Starting off with a text file containing the complete works of William Shakespeare (which can be downloaded from here – it’s strange to think that it’s just a 5.3 MB text file) I’m going to find the top 100 most frequently used words and display them in a table in Excel.
If Data Explorer is a GUI on top of M (outdated but a point of origin), it goes up in importance.
From the M link:
The Microsoft code name “M” Modeling Language, hereinafter referred to as M, is a language for modeling domains using text. A domain is any collection of related concepts or objects. Modeling a domain consists of selecting certain characteristics to include in the model and implicitly excluding others deemed irrelevant. Modeling using text has some advantages and disadvantages over modeling using other media such as diagrams or clay. A goal of the M language is to exploit these advantages and mitigate the disadvantages.
A key advantage of modeling in text is the ease with which both computers and humans can store and process text. Text is often the most natural way to represent information for presentation and editing by people. However, the ability to extract that information for use by software has been an arcane art practiced only by the most advanced developers. The language feature of M enables information to be represented in a textual form that is tuned for both the problem domain and the target audience. The M language provides simple constructs for describing the shape of a textual language – that shape includes the input syntax as well as the structure and contents of the underlying information. To that end, M acts as both a schema language that can validate that textual input conforms to a given language as well as a transformation language that projects textual input into data structures that are amenable to further processing or storage.
I try to not run examples using Shakespeare. I get distracted by the elegance of the text, which isn’t the point of the exercise.
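For readers who want to try the word-count exercise without Data Explorer, here is a minimal Python sketch of the same idea: split a text into words and list the most frequent ones. It is not Chris's M code; the function name `top_words` and the rule for what counts as a word are my own assumptions.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 100) -> list:
    """Return the n most frequent words in text as (word, count) pairs."""
    # Assumption: a "word" is a run of ASCII letters, optionally with
    # internal apostrophes (e.g. "don't"); case is ignored.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words).most_common(n)


sample = "To be, or not to be, that is the question."
print(top_words(sample, 3))
```

To run it against the complete works, read the downloaded text file into a string and pass it to `top_words` with the default `n` of 100.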
Posted in Data Explorer, Data Mining, Excel, Microsoft, Text Mining | No Comments »
Thursday, February 28th, 2013
Public Preview of Data Explorer by Chris Webb.
From the post:
In a nutshell, Data Explorer is self-service ETL for the Excel power user – it is to SSIS what PowerPivot is to SSAS. In my opinion it is just as important as PowerPivot for Microsoft’s self-service BI strategy.
I’ll be blogging about it in detail over the coming days (and also giving a quick demo in my PASS Business Analytics Virtual Chapter session tomorrow), but for now here’s a brief list of things it gives you over Excel’s native functionality for importing data:
- It supports a much wider range of data sources, including Active Directory, Facebook, Wikipedia, Hive, and tables already in Excel
- It has better functionality for data sources that are currently supported, such as the Azure Marketplace and web pages
- It can merge data from multiple files that have the same structure in the same folder
- It supports different types of authentication and the storing of credentials
- It has a user-friendly, step-by-step approach to transforming, aggregating and filtering data until it’s in the form you want
- It can load data into the worksheet or direct into the Excel model
There’s a lot to it, so download it and have a play! It’s supported on Excel 2013 and Excel 2010 SP1.
Download: Microsoft “Data Explorer” Preview for Excel
Chris has collected a number of links to Data Explorer resources so look to his post for more details.
It looks like a local install is required for the preview. I have been meaning to add Windows 7 to a VM and MS Office with that.
Guess it may be time to take the plunge.
(I have XP/Office on a separate box that uses the same monitors/keyboard but sharing data is problematic.)
Posted in Data Explorer, Data Mining, Microsoft | No Comments »
Wednesday, February 27th, 2013
Putting the Elephant in the Window by John Kreisa.
From the post:
For several years now Apache Hadoop has been fueling the fast growing big data market and has become the de facto platform for Big Data deployments and the technology foundation for an explosion of new analytic applications. Many organizations turn to Hadoop to help tame the vast amounts of new data they are collecting but in order to do so with Hadoop they have had to use servers running the Linux operating system. That left a large number of organizations who standardize on Windows (According to IDC, Windows Server owned 73 percent of the market in 2012 – IDC, Worldwide and Regional Server 2012–2016 Forecast, Doc # 234339, May 2012) without the ability to run Hadoop natively, until today.
We are very pleased to announce the availability of Hortonworks Data Platform for Windows providing organizations with an enterprise-grade, production-tested platform for big data deployments on Windows. HDP is the first and only Hadoop-based platform available on both Windows and Linux and provides interoperability across Windows, Linux and Windows Azure. With this release we are enabling a massive expansion of the Hadoop ecosystem. New participants in the community of developers, data scientists, data management professionals and Hadoop fans to build and run applications for Apache Hadoop natively on Windows. This is great news for Windows focused enterprises, service providers, software vendors and developers and in particular they can get going today with Hadoop simply by visiting our download page.
This release would not be possible without a strong partnership and close collaboration with Microsoft. Through the process of creating this release, we have remained true to our approach of community-driven enterprise Apache Hadoop by collecting enterprise requirements, developing them in open source and applying enterprise rigor to produce a 100-percent open source enterprise-grade Hadoop platform.
Now there is a very smart marketing move!
A smaller share of a larger market is always better than a large share of a small market.
(You need to be writing down these quips.)
Seriously, take note of how Hortonworks used the open source model.
They did not build Hadoop in their image and try to sell it to the world.
Hortonworks gathered requirements from others and built Hadoop to meet their needs.
Open source model in both cases, very different outcomes.
* I didn’t remember the rhyme beyond the opening line. Consulting the oracle (Wikipedia), I discovered Playground song.
Posted in Hadoop, Hortonworks, MapReduce, Microsoft | No Comments »
Monday, February 11th, 2013
Microsoft Reveals Rapid Big Data Adoption
From the post:
More than 75 percent of midsize to large businesses are implementing big-data-related solutions within the next 12 months — with customer care, marketing and sales departments increasingly driving demand, according to new Microsoft Corp. research released today.
According to Microsoft’s “Global Enterprise Big Data Trends: 2013” study of more than 280 IT decision-makers, the following trends emerged:
- Although the IT department (52 percent) is currently driving most of the demand for big data, customer care (41 percent), sales (26 percent), finance (23 percent) and marketing (23 percent) departments are increasingly driving demand.
- Seventeen percent of customers surveyed are in the early stages of researching big data solutions, whereas 13 percent have fully deployed them; nearly 90 percent of customers surveyed have a dedicated budget for addressing big data.
- Nearly half of customers (49 percent) reported that growth in the volume of data is the greatest challenge driving big data solution adoption, followed by having to integrate disparate business intelligence tools (41 percent) and having tools able to glean the insight (40 percent).
After hunting around the MS News Center I found: Customers Rapidly Adopting Big Data Solutions — Driven By Marketing, Sales and More — Reports New Microsoft Research.
Links from the MS News Center take me back to that infographic so that may be the “publication” they are talking about.
It’s all right, but the same material could have been a one-page report suitable for printing and sharing.
In any event, it is encouraging news because the greater the adoption of “big data,” the more semantic impedance is going to cause real pain.
No pain, no change.
When it hurts bad enough, take two topic maps and call me in the morning.
Posted in BigData, Microsoft | No Comments »
Friday, February 8th, 2013
Netflix: Solving Big Problems with Reactive Extensions (Rx) by Claudio Caldato.
From the post:
More good news for Reactive Extensions (Rx).
Just yesterday, we told you about improvements we’ve made to two Microsoft Open Technologies, Inc., releases: Rx and ActorFx, and mentioned that Netflix was already reaping the benefits of Rx.
To top it off, on the same day, Netflix announced a Java implementation of Rx, RxJava, was now available in the Netflix Github repository. That’s great news to hear, especially given how Ben Christensen and Jafar Husain outlined on the Netflix Tech blog that their goal is to “stay close to the original Rx.NET implementation” and that “all contracts of Rx should be the same.”
Netflix also contributed a great series of interactive exercises for learning Microsoft’s Reactive Extensions (Rx) Library for JavaScript as well as some fundamentals for functional programming techniques.
Rx as implemented in RxJava is part of the solution Netflix has developed for improving the processing of 2+ billion incoming requests a day for millions of customers around the world.
Do you have 2+ billion requests coming into your topic map every day?
Assuming the lesser includes the greater, you may want to take a look at Rx or RxJava.
Be sure to visit the interactive exercises!
Posted in ActorFx, JavaRx, Microsoft, Rx | No Comments »
Sunday, February 3rd, 2013
Need to discover, access, analyze and visualize big and broad data? Try F#. by Olivier Bloch.
From the post:
Microsoft Research just released a new iteration of Try F#, a set of tools designed to make it easy for anyone – not just developers – to learn F# and take advantage of its big data, cross-platform capabilities.
F# is the open-source, cross-platform programming language invented by Don Syme and his team at Microsoft Research to help reduce the time-to-deployment for analytical software components in the modern enterprise.
Big data definitively is big these days and we are excited about this new iteration of Try F#. Regardless of your favorite language, or if you’re on a Mac, a Windows PC, Linux or Android, if you need to deal with complex problems, you will want to take a look at F#!
Kerry Godes from Microsoft’s Openness Initiative connected with Evelyne Viegas, Director of Semantic Computing at Microsoft Research, to find out more about how you can use “Try F# to seamlessly discover, access, analyze and visualize big and broad data.” For the complete interview, go to the Openness blog or check out www.tryfsharp.org to get started “writing simple code for complex problems”.
Are you an F# user?
Curious how F# compares to other languages for “complexity?”
Visualization gurus: Does the complexity of languages go up or down with the complexity of licensing terms?
Inquiring minds want to know.
Posted in Data Analysis, Data Mining, F#, Microsoft | No Comments »
Saturday, February 2nd, 2013
Office 2013, Office 365 Editions and BI Features by Chris Webb.
From the post:
By now you’re probably aware that Office 2013 is in the process of being officially released, and that Office 365 is a very hot topic. You’ve probably also read lots of blog posts by me and other writers talking about the cool new BI functionality in Office 2013 and Office 365. But which editions of Office 2013 and Office 365 include the BI functionality, and how does Office 365 match up to plain old non-subscription Office 2013 for BI? It’s surprisingly hard to find out the answers…
For regular, non-subscription, Office 2013 on the desktop you need Office Professional Plus to use the PowerPivot addin or to use Power View in Excel. However there’s an important distinction to make: the xVelocity engine is now natively integrated into Excel 2013, and this functionality is called the Excel Data Model and is available in all desktop editions of Excel. You only need the PowerPivot addin, and therefore Professional Plus, if you want to use the PowerPivot Window to modify and extend your model (for example by adding calculated columns or KPIs). So even if you’re not using Professional Plus you can still do some quite impressive BI stuff with PivotTables etc. On the server, the only edition of SharePoint 2013 that has any BI functionality is Enterprise Edition; there’s no BI functionality in Foundation or Standard Editions.
No matter what OS you are running, you are likely to be using some version of MS Office and if you are reading this blog, probably for BI purposes.
Chris does a great job at pointing to resources and generating resources to guide you through the feature/license thicket that surrounds MS Office in its various incarnations.
Complex licensing/feature matrices contribute to the size of department budgets that create such complexity. They don’t contribute to the bottom line at Microsoft. There is a deep and profound difference.
Posted in BI, Microsoft | No Comments »
Friday, January 11th, 2013
Getting Started with VM Depot by Doug Mahugh.
From the post:
Do you need to deploy a popular OSS package on a Windows Azure virtual machine, but don’t know where to start? Or do you have a favorite OSS configuration that you’d like to make available for others to deploy easily? If so, the new VM Depot community portal from Microsoft Open Technologies is just what you need. VM Depot is a community-driven catalog of preconfigured operating systems, applications, and development stacks that can easily be deployed on Windows Azure.
You can learn more about VM Depot in the announcement from Gianugo Rabellino over on Port 25 today. In this post, we’re going to cover the basics of how to use VM Depot, so that you can get started right away.
Doug outlines simple steps to get you rolling with the VM Depot.
Sounds a lot easier than trying to walk casual computer users through installation and configuration of software. I assume you could even load data onto the VMs.
Users just need to fire up the VM and they have the interface and data they want.
Sounds like a nice way to distribute topic map based information systems.
Posted in Azure Marketplace, Cloud Computing, Linux OS, Microsoft, Virtual Machines | No Comments »
Monday, December 24th, 2012
Microsoft Open Technologies releases Windows Azure support for Solr 4.0 by Brian Benz.
From the post:
Microsoft Open Technologies is pleased to share the latest update to the Windows Azure self-deployment option for Apache Solr 4.0.
Solr 4.0 is the first release to use the shared 4.x branch for Lucene & Solr and includes support for SolrCloud functionality. SolrCloud allows you to scale a single index via replication over multiple Solr instances running multiple SolrCores for massive scaling and redundancy.
To learn more about Solr 4.0, have a look at this 40 minute video covering Solr 4 Highlights, by Mark Miller of LucidWorks from Apache Lucene Eurocon 2011.
To download and install Solr on Windows Azure visit our GitHub page to learn more and download the SDK.
Another alternative for implementing the best of Lucene/Solr on Windows Azure is provided by our partner LucidWorks. LucidWorks Search on Windows Azure delivers a high-performance search solution that enables quick and easy provisioning of Lucene/Solr search functionality without any need to install, manage or operate Lucene/Solr servers, and it supports pre-built connectors for various types of enterprise data, structured data, unstructured data and web sites.
Beyond the positive impact for Solr and Azure in general, this means your Solr skills will be useful in new places.
Posted in Azure Marketplace, Microsoft, Solr | No Comments »
Wednesday, November 7th, 2012
Claudio Caldato wrote: MS Open Tech Open Sources Rx (Reactive Extensions) – a Cure for Asynchronous Data Streams in Cloud Programming.
I was tired by the time I got to the end of the title! His is more descriptive than mine but if you know the context, you don’t need the description.
From the post:
If you are a developer that writes asynchronous code for composite applications in the cloud, you know what we are talking about, for everybody else Rx Extensions is a set of libraries that makes asynchronous programming a lot easier. As Dave Sexton describes it, “If asynchronous spaghetti code were a disease, Rx is the cure.”
Reactive Extensions (Rx) is a programming model that allows developers to glue together asynchronous data streams. This is particularly useful in cloud programming because it helps create a common interface for writing applications that come from diverse data sources, e.g., stock quotes, Tweets, computer events, Web service requests.
Today, Microsoft Open Technologies, Inc., is open sourcing Rx. Its source code is now hosted on CodePlex to increase the community of developers seeking a more consistent interface to program against, and one that works across several development languages. The goal is to expand the number of frameworks and applications that use Rx in order to achieve better interoperability across devices and the cloud.
Rx was developed by Microsoft Corp. architect Erik Meijer and his team, and is currently used on products in various divisions at Microsoft. Microsoft decided to transfer the project to MS Open Tech in order to capitalize on MS Open Tech’s best practices with open development.
There are applications that you probably touch every day that are using Rx under the hood. A great example is GitHub for Windows.
According to Paul Betts at GitHub, “GitHub for Windows uses the Reactive Extensions for almost everything it does, including network requests, UI events, managing child processes (git.exe). Using Rx and ReactiveUI, we’ve written a fast, nearly 100% asynchronous, responsive application, while still having 100% deterministic, reliable unit tests. The desktop developers at GitHub loved Rx so much, that the Mac team created their own version of Rx and ReactiveUI, called ReactiveCocoa, and are now using it on the Mac to obtain similar benefits.”
What if the major cloud players started competing on the basis of interoperability? So your app here will work there.
Reducing the impedance for developers enables more competition between developers, resulting in better services/products for consumers.
Cloud owners get more options to offer their customers.
Topic map applications have an easier time mining, identifying and recombining subjects across diverse sources and even clouds.
Does anyone see a downside here?
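To make “glue together asynchronous data streams” a little more concrete, here is a toy Python sketch of the idea: streams that push values to subscribers, with `map`, `filter` and `merge` to combine them behind one interface. This is not the Rx API. The `Stream` class and everything in it are my own illustration, and it runs synchronously, leaving out the scheduling, error and completion handling that a real Rx implementation provides.

```python
from typing import Any, Callable, List


class Stream:
    """A toy push-based stream: each subscriber is called for every emitted value."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Any], None]] = []

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        self._subscribers.append(callback)

    def emit(self, value: Any) -> None:
        for callback in list(self._subscribers):
            callback(value)

    def map(self, fn: Callable[[Any], Any]) -> "Stream":
        # New stream carrying fn(value) for every value of this stream.
        out = Stream()
        self.subscribe(lambda value: out.emit(fn(value)))
        return out

    def filter(self, predicate: Callable[[Any], bool]) -> "Stream":
        # New stream carrying only the values that satisfy the predicate.
        out = Stream()
        self.subscribe(lambda value: out.emit(value) if predicate(value) else None)
        return out


def merge(*streams: Stream) -> Stream:
    """Combine several streams into one that forwards values from all of them."""
    out = Stream()
    for stream in streams:
        stream.subscribe(out.emit)
    return out


# Two unrelated sources (hypothetical examples): stock quotes and tweets.
quotes = Stream()
tweets = Stream()

events = merge(
    quotes.filter(lambda price: price > 100).map(lambda price: f"quote above 100: {price}"),
    tweets.map(lambda text: f"tweet: {text}"),
)

received: List[str] = []
events.subscribe(received.append)

quotes.emit(99.5)      # filtered out
tweets.emit("hello")
quotes.emit(101.25)
print(received)
```

The consumer subscribes once to `events` and never needs to know how many sources feed it, which is the “common interface” the post is talking about.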
Posted in Cloud Computing, Data Streams, Microsoft, Rx | No Comments »
Thursday, November 1st, 2012
KitaroDB
From the “What is KitaroDB” page:
KitaroDB is a fast, efficient and scalable NoSQL database that operates natively in the WinRT (Windows 8 UI), the Win32 (x86 and x64) and .NET environments. It features:
- Easy-to-use interfaces for C#, VB, C++, C and HTML5/JavaScript developers;
- A proven commercial database system;
- Support of large-sector drives;
- Minimal overhead, consuming less than a megabyte of memory resources;
- Durability as on-disk data store;
- High performance, capable of handling tens of thousands of operations per second;
- Asynchronous and synchronous operations;
- Storage limit of 100 TB;
- Flexibility as either a key/value data store or an intrusive key database with segmented key support.
The phrase “intrusive-keyed database” was unfamiliar.
Keys can be segmented into up to 255 segments with what appears to be a fairly limited range of data types. Some of the KitaroDB documentation on “intrusive-keyed database” is found here: Creating a KitaroDB database.
Segmented keys aren’t unique to KitaroDB and they offer some interesting possibilities. More to follow on that score.
Storage being limited to 100 TB should not be an issue for “middling” data sized applications.
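As a rough illustration of why segmented keys are interesting, here is a small Python sketch of a sorted key/value store whose keys are tuples, one element per segment, so that every record sharing the leading segments can be found with one range scan. This is not KitaroDB's API or storage format. The `SegmentedStore` class and its methods are hypothetical names of my own.

```python
import bisect
from typing import Any, List, Optional, Tuple

Key = Tuple[Any, ...]  # one tuple element per key segment


class SegmentedStore:
    """Toy in-memory store that keeps segmented keys in sorted order."""

    def __init__(self) -> None:
        self._keys: List[Key] = []
        self._values: List[Any] = []

    def put(self, key: Key, value: Any) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite an existing key
        else:
            self._keys.insert(i, key)        # keep the key list sorted
            self._values.insert(i, value)

    def get(self, key: Key) -> Optional[Any]:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan_prefix(self, prefix: Key) -> List[Tuple[Key, Any]]:
        """Return all entries whose leading segments equal prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        found = []
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            found.append((self._keys[i], self._values[i]))
            i += 1
        return found


store = SegmentedStore()
store.put(("smith", "anna"), "record 1")
store.put(("jones", "bob"), "record 2")
store.put(("smith", "carl"), "record 3")
print(store.scan_prefix(("smith",)))
```

The sketch assumes that segments in the same position are of mutually comparable types (all strings, say), since the ordering relies on ordinary tuple comparison.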
Posted in KitaroDB, Microsoft, NoSQL | 1 Comment »
Thursday, October 25th, 2012
DINOSAURS ARE REAL: Microsoft WOWs audience with HDInsight at Strata NYC (Hortonworks Inside) by Russell Jurney.
From the post:
You don’t see many demos like the one given by Shawn Bice (Microsoft) today in the Regent Parlor of the New York Hilton, at Strata NYC. “Drive Smarter Decisions with Microsoft Big Data,” was different.
For starters – everything worked like clockwork. Live demos of new products are notorious for failing on-stage, even if they work in production. And although Microsoft was presenting about a Java-based platform at a largely open-source event… it was standing room only, with the crowd overflowing out the doors.
Shawn demonstrated working with Apache Hadoop from Excel, through Power Pivot, to Hive (with sampling-driven early results!?) and out to import third party data-sets. To get the full effect of what he did, you’re going to have to view a screencast or try it out but to give you the idea of what the first proper interface on Hadoop feels like…
My thoughts on reading Russell’s post:
- A live product demo that did not fail? Really?
- Is that tattoo copyrighted?
- Oh, yes, +1!, big data has become real for millions of users.
How’s that for a big data book, tutorial, consulting, semantic market explosion?
Posted in HDInsight, Hadoop, Hortonworks, Microsoft | No Comments »
Thursday, October 25th, 2012
Why Microsoft is committed to Hadoop and Hortonworks (a guest post at Hortonworks by Microsoft’s Dave Campbell).
From the post:
Last February at Strata Conference in Santa Clara we shared Microsoft’s progress on Big Data, specifically working to broaden the adoption of Hadoop with the simplicity and manageability of Windows and enabling customers to easily derive insights from their structured and unstructured data through familiar tools like Excel.
Hortonworks is a recognized pioneer in the Hadoop Community and a leading contributor to the Apache Hadoop project, and that’s why we’re excited to announce our expanded partnership with Hortonworks to give customers access to an enterprise-ready distribution of Hadoop that is 100 percent compatible with Windows Server and Windows Azure. To provide customers with access to this Hadoop compatibility, yesterday we also released new previews of Microsoft HDInsight Server for Windows and Windows Azure HDInsight Service, our Hadoop-based solutions for Windows Server and Windows Azure.
With this expanded partnership, the Hadoop community will reap the following benefits of Hadoop on Windows:
- Insights to all users from all data:….
- Enterprise-ready Hadoop with HDInsight:….
- Simplicity of Windows for Hadoop:….
- Extend your data warehouse with Hadoop:….
- Seamless Scale and Elasticity of the Cloud:….
This is a very exciting milestone, and we hope you’ll join us for the ride as we continue partnering with Hortonworks to democratize big data. Download HDInsight today at Microsoft.com/BigData.
See Dave’s post for the details on “benefits of Hadoop on Windows” and then like the man says:
Download HDInsight today at Microsoft.com/BigData.
Posted in BigData, Hadoop, Hortonworks, Microsoft | No Comments »
Thursday, October 25th, 2012
Enabling Big Data Insight for Millions of Windows Developers by Shaun Connolly.
From the post:
At Hortonworks, we fundamentally believe that, in the not-so-distant future, Apache Hadoop will process over half the world’s data flowing through businesses. We realize this is a BOLD vision that will take a lot of hard work by not only Hortonworks and the open source community, but also software, hardware, and solution vendors focused on the Hadoop ecosystem, as well as end users deploying platforms powered by Hadoop.
If the vision is to be achieved, we need to accelerate the process of enabling the masses to benefit from the power and value of Apache Hadoop in ways where they are virtually oblivious to the fact that Hadoop is under the hood. Doing so will help ensure time and energy is spent on enabling insights to be derived from big data, rather than on the IT infrastructure details required to capture, process, exchange, and manage this multi-structured data.
So how can we accelerate the path to this vision? Simply put, we focus on enabling the largest communities of users interested in deriving value from big data.
You don’t have to wonder long what Shaun is reacting to:
Today Microsoft unveiled previews of Microsoft HDInsight Server and Windows Azure HDInsight Service, big data solutions that are built on Hortonworks Data Platform (HDP) for Windows Server and Windows Azure respectively. These new offerings aim to provide a simplified and consistent experience across on-premise and cloud deployment that is fully compatible with Apache Hadoop.
Enabling big data insight isn’t the same as capturing those insights for later use or re-use.
May just be me, but that sounds like a great opportunity for topic maps.
Bringing semantics to millions of Windows developers, that is.
Posted in Azure Marketplace, BigData, Hadoop, Hortonworks, Microsoft | No Comments »
Friday, August 17th, 2012
Marching Hadoop to Windows
From the post:
Bringing Hadoop to Windows and the two-year development of Hadoop 2.0 are two of the more exciting developments brought up by Hortonworks’s Cofounder and CTO, Eric Baldeschwieler, in a talk before a panel at the Cloud 2012 Conference in Honolulu.
(video omitted)
The panel, which was also attended by Baldeschwieler’s Cloudera counterpart Amr Awadallah, focused on insights into the big data world, a subject Baldeschwieler tackled almost entirely with Hadoop. The eighteen-minute discussion also featured a brief history of Hadoop’s rise to prominence, improvements to be made to Hadoop, and a few tips to enterprising researchers wishing to contribute to Hadoop.
“Bringing Hadoop to Windows,” says Baldeschwieler “turns out to be a very exciting initiative because there are a huge number of users in Windows operating system.” In particular, the Excel spreadsheet program is a popular one for business analysts, something analysts would like to see integrated with Hadoop’s database. That will not be possible until, as Baldeschwieler notes, Windows is integrated into Hadoop later this year, a move that will also considerably expand Hadoop’s reach.
However, that announcement pales in comparison to the possibilities provided by the impending Hadoop 2.0. “Hadoop 2.0 is a pretty major re-write of Hadoop that’s been in the works for two years. It’s now in usable alpha form…The real focus in Hadoop 2.0 is scale and opening it up for more innovation.” Baldeschwieler notes that Hadoop’s rise has been the result of what he calls “a happy accident” where it was being developed by his Yahoo team for a specific use case: classifying, sorting, and indexing each of the URLs that were under Yahoo’s scope.
Integration of Excel and Hadoop?
Is that going to be echoes of Unix – The Hole Hawg?
Posted in Excel, Hadoop, Microsoft | No Comments »
Sunday, July 22nd, 2012
Windows Azure Active Directory Graph
Pre-release documentation, subject to change before release, blah, blah, but very interesting nonetheless.
When I look at the application scenario, Creating Enterprise Applications by Using Windows Azure AD Graph, which is described as:
In this scenario you have purchased an Office 365 subscription. As part of the subscription you have purchased the capability to manage users using Windows Azure AD, which is part of Windows Azure. You want to build an application that can access users’ information such as user names and group membership.
OK, so I can access “user names and group membership,” a good thing, but a better (read: more useful) thing would be to manage other user identifications for access to enterprise applications.
Or to put that differently, to map user identifications together for any single user, so the appropriate identification is used for any particular system. (Thinking of long term legacy systems and applications. Almost everyone has them.)
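A minimal sketch of that kind of mapping, in Python. None of this is the Windows Azure AD Graph API; the class names, system names and identifiers below are all hypothetical, chosen only to show one user carrying a different identification per system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    """One user, plus the identifier each system knows that user by."""
    name: str
    ids: dict[str, str] = field(default_factory=dict)  # system -> identifier


class IdentityMap:
    """Hypothetical lookup table: find a user by any one of their identifiers."""

    def __init__(self) -> None:
        # (system, identifier) -> Person
        self._index: dict[tuple[str, str], Person] = {}

    def add(self, person: Person) -> None:
        for system, identifier in person.ids.items():
            self._index[(system, identifier)] = person

    def translate(self, system: str, identifier: str, target: str) -> str | None:
        """Return the same user's identifier on `target`, or None if unknown."""
        person = self._index.get((system, identifier))
        return person.ids.get(target) if person else None


# Made-up example data: one user, known differently to two systems.
imap = IdentityMap()
imap.add(Person("Pat Example", {"azure_ad": "pat@example.com", "mainframe": "PEX0042"}))
mainframe_id = imap.translate("azure_ad", "pat@example.com", "mainframe")
```

Given the directory identification, the lookup hands back the identification the legacy system expects, which is the whole point of mapping them together.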
Certainly worth your attention as this develops towards release.
Posted in Graphs, Microsoft | No Comments »
Friday, July 20th, 2012
I had to re-write a line from “Santa Claus is coming to town” just a bit to fit the story about SkyDrive I read today: Watch what you store on SkyDrive–you may lose your Microsoft life.
I don’t find the terms of service surprising. Everybody has to say that sort of thing to avoid liability in case you store, transfer, etc., something illegal using their service.
The rules used to require notice and a refusal to remove content before you had any liability.
Has that changed?
Curious for a number of reasons, not the least of which is providing topic map data products and topic map appliances online.
Posted in Marketing, Microsoft | No Comments »
Thursday, July 19th, 2012
More of Microsoft’s App Development Tools Goes Open Source by Gianugo Rabellino.
From the post:
Today marks a milestone since we launched Microsoft Open Technologies, Inc. (MS Open Tech) as we undertake some important open source projects. We’re excited to share the news that MS Open Tech will be open sourcing the Entity Framework (EF), a database mapping tool useful for application development in the .NET Framework. EF will join the other open source components of Microsoft’s dev tools – MVC, Web API, and Web Pages with Razor Syntax – on CodePlex to help increase the development transparency of this project.
MS Open Tech will serve as an accelerator for these projects by working with the open source communities through our new MS Open Tech CodePlex landing page. Together, we will help build out its source code until shipment of the next product version.
This will enable everyone in the community to monitor and provide feedback on code check-ins, bug-fixes, new feature development, and build and test the products on a daily basis using the most up-to-date version of the source code.
The newly opened EF will, for the first time, allow developers outside Microsoft to submit patches and code contributions that the MS Open Tech development team will review for potential inclusion in the products.
We need more MS “native” topic map engines and applications.
Or topic map capabilities in the core of MS Office™.
Lots of people could start writing topic maps.
Which would be a good thing. A lot of people write documents using MS Word™, yet they still reach for professional typesetters for publication.
Same will be true for topic maps.
Posted in Microsoft, Open Source | No Comments »
Monday, July 16th, 2012
Data Mining In Excel: Lecture Notes and Cases (2005) by Galit Shmueli, Nitin R. Patel, and Peter C. Bruce.
From the introduction:
This book arose out of a data mining course at MIT’s Sloan School of Management. Preparation for the course revealed that there are a number of excellent books on the business context of data mining, but their coverage of the statistical and machine-learning algorithms that underlie data mining is not sufficiently detailed to provide a practical guide if the instructor’s goal is to equip students with the skills and tools to implement those algorithms. On the other hand, there are also a number of more technical books about data mining algorithms, but these are aimed at the statistical researcher, or more advanced graduate student, and do not provide the case-oriented business focus that is successful in teaching business students.
Hence, this book is intended for the business student (and practitioner) of data mining techniques, and its goal is threefold:
- To provide both a theoretical and practical understanding of the key methods of classification, prediction, reduction and exploration that are at the heart of data mining;
- To provide a business decision-making context for these methods;
- Using real business cases, to illustrate the application and interpretation of these methods.
An important feature of this book is the use of Excel, an environment familiar to business analysts. All required data mining algorithms (plus illustrative datasets) are provided in an Excel add-in, XLMiner. XLMiner offers a variety of data mining tools: neural nets, classification and regression trees, k-nearest neighbor classification, naive Bayes, logistic regression, multiple linear regression, and discriminant analysis, all for predictive modeling. It provides for automatic partitioning of data into training, validation and test samples, and for the deployment of the model to new data. It also offers association rules, principal components analysis, k-means clustering and hierarchical clustering, as well as visualization tools, and data handling utilities. With its short learning curve, affordable price, and reliance on the familiar Excel platform, it is an ideal companion to a book on data mining for the business student.
Somewhat dated, but remember there are lots of older copies of MS Office around. Not an inconsiderable market if you start to write something on using Excel to produce topic maps. Write for the latest version but I would have a version keyed to earlier versions of Excel as well.
I first saw this at KDNuggets.
Posted in Data Mining, Excel, Microsoft | No Comments »
Tuesday, July 10th, 2012
MongoDB Installer for Windows Azure by Doug Mahugh.
From the post:
Do you need to build a high-availability web application or service? One that can scale out quickly in response to fluctuating demand? Need to do complex queries against schema-free collections of rich objects? If you answer yes to any of those questions, MongoDB on Windows Azure is an approach you’ll want to look at closely.
People have been using MongoDB on Windows Azure for some time (for example), but recently the setup, deployment, and development experience has been streamlined by the release of the MongoDB Installer for Windows Azure. It’s now easier than ever to get started with MongoDB on Windows Azure!
If you are developing or considering developing with MongoDB, this is definitely worth a look. In part because it frees you to concentrate on software development and not running (or trying to run) a server farm. Different skill sets.
Another reason is that it levels the playing field with big IT firms with server farms. You get the advantages of a server farm without the capital investment in one.
And as Microsoft becomes a bigger and bigger tent for diverse platforms and technologies, you have more choices. Choices for the changing requirements of your clients.
Not that I expect to see an Apple hanging from the Microsoft tree anytime soon but you can’t ever tell. Enough consumer demand and it could happen.
In the meantime, while we wait for better games and commercials, consider how you would power semantic integration in the cloud.
Posted in Azure Marketplace, Microsoft, MongoDB | No Comments »
Sunday, June 17th, 2012
Data Mining with Microsoft SQL Server 2008
Sandro Saitta writes:
If you are using Microsoft data mining tools, this book is a must have. Written by MacLennan, Tang and Crivat, it describes how to perform data mining using SQL Server 2008. The book is huge – more than 630 pages – but it is normal since authors give detailed explanation for each data mining function. The book covers topics such as general data mining concepts, DMX, Excel add-ins, OLAP cubes, data mining architecture and many more. The seven data mining algorithms included in the tool are described in separate chapters.
The book is well written, so it can be read from A to Z or by selecting specific chapters. Each theoretical concept is explained through examples. Using screenshots, each step of a given method is presented in details. It is thus more a user manual than a book explaining data mining concepts. Don’t expect to read any detailed algorithms or equations. A good surprise of the book are the case studies. They are present in most chapters and show real examples and how to solve them. It really shows the experience of the authors in the field.
I haven’t seen the book, yet, but that can be corrected.
Posted in Data Mining, Microsoft, SQL Server | No Comments »
Saturday, June 9th, 2012
Working with NoSQL Databases
From Microsoft’s TechNet, an outline listing of NoSQL links and resources.
Has the advantage (over similar resources) of being in English, Deutsch, Italian and Português.
Posted in Microsoft, NoSQL | No Comments »
Saturday, May 26th, 2012
Doug Mahugh spotted on MongoDB Replica Sets.
The video also teaches you about MongoDB replica sets on Windows. Replica sets are the means MongoDB uses for high reliability and read performance. An expert from 10gen, Sridhar Nanjundeswaran, covers the MongoDB stuff.
PS: Kudos to Doug on his new role at MS on reaching out to open source projects!
Posted in Microsoft, MongoDB, Replica Sets | No Comments »
Tuesday, May 22nd, 2012
Roger Jennings writes in Recent Articles about SQL Azure Labs and Other Value-Added Windows Azure SaaS Previews: A Bibliography:
I’ve been concentrating my original articles for the past six months or so on SQL Azure Labs, Apache Hadoop on Windows Azure and SQL Azure Federations previews, which I call value-added offerings. I use the term value-added because Microsoft doesn’t charge for their use, other than Windows Azure compute, storage and bandwidth costs or SQL Azure monthly charges and bandwidth costs for some of the applications, such as Codename “Cloud Numerics” and SQL Azure Federations.
As of 22 May 2012, there are forty-four (44) posts in the following categories:
- Windows Azure Marketplace DataMarket plus Codenames “Data Hub” and “Data Transfer” from SQL Azure Labs
- Apache Hadoop on Windows Azure from the SQL Server Team
- Codename “Cloud Numerics” from SQL Azure Labs
- Codename “Social Analytics” from SQL Azure Labs
- Codename “Data Explorer” from SQL Azure Labs
- SQL Azure Federations from the SQL Azure Team
If you need quick guides and/or incentives to use Windows Azure, try these on for size.
Posted in Azure Marketplace, Microsoft, SQL, Windows Azure, Windows Azure Marketplace | No Comments »
Saturday, April 28th, 2012
First Light – MS Open Tech: Redis on Windows
Claudio Caldato writes:
The past few weeks have been very busy in our offices as we announced the creation of Microsoft Open Technologies, Inc. Now that the dust has settled it’s time for us to resume our regular cadence in releasing code, and we are happy to share with you the very first deliverable from our new company: a new and significant iteration of our work on Redis on Windows, the open-source, networked, in-memory, key-value data store.
The major improvements in this latest version involve the process of saving data on disk. Redis on Linux uses an OS feature called Fork/Copy On Write. This feature is not available on Windows, so we had to find a way to be able to mimic the same behavior without changing completely the save on disk process so as to avoid any future integration issues with the Redis code.
Excellent news!
BTW, Microsoft Open Technologies has a presence on Github. Just the one project (Redis on Windows) but I am sure more will follow.
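For anyone who has not met fork/copy-on-write, here is a minimal sketch of the general technique in Python. It is not Redis code and it is not the MS Open Tech approach for Windows; the `snapshot` function and the JSON dump file are assumptions made purely for illustration. It runs only on systems that provide `os.fork`, which is exactly what Windows lacks.

```python
import json
import os
import tempfile


def snapshot(store: dict, path: str) -> int:
    """Save `store` to `path` from a forked child; return the child's pid."""
    pid = os.fork()  # POSIX only; not available on Windows
    if pid == 0:
        # Child: sees the store exactly as it was at the moment of the fork.
        try:
            with open(path, "w") as f:
                json.dump(store, f)
        finally:
            os._exit(0)  # exit at once, skipping the parent's cleanup handlers
    return pid


store = {"counter": 1}
path = os.path.join(tempfile.mkdtemp(), "dump.json")

pid = snapshot(store, path)
store["counter"] = 2      # the parent keeps taking writes while the child saves
os.waitpid(pid, 0)        # wait for the child to finish writing

with open(path) as f:
    saved = json.load(f)  # the saved copy reflects the store as of the fork
```

The parent never pauses to save. The operating system shares memory pages between the two processes and copies a page only when one of them writes to it, so the child gets a consistent view at little cost. Without that OS feature, the same behavior has to be mimicked some other way.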
Posted in Microsoft, Redis | No Comments »
Thursday, March 22nd, 2012
Tracking Microsoft Buzz with Blogs, Twitter, Bitly and Videos
Matthew Hurst writes:
Microsoft is an incredibly diverse company. I’ve just celebrated 5 years here and still don’t have a full appreciation of the breadth and depth of products and innovation that the corporation generates. After BlogPulse was unplugged, I felt something of a hankering to continue to follow the buzz around Microsoft, partly as a way to better follow what the company is doing and how it is perceived in the online world.
I’m a big fan of TechMeme, but it has some challenges when it comes to tracking news and trends around a specific company. Firstly, I don’t know the sources that are used and the ranking mechanisms in place, so it is hard to really understand quantitatively what it represents. Secondly, with limited real estate, while a big story may be happening for a company of interest, it can be crowded out by other events. Thirdly, I can’t help but think it has a strong valley culture bias. Fourthly, it hasn’t evolved much in the years that I’ve been visiting it.
So I’ve put together an experimental site called track // microsoft which follows a few blogs, clusters posts that are related and uses Bitly and Twitter data to rank the articles and clusters of stories. In doing this, I observed that many posts in the blogosphere about Microsoft would contain videos (be they of Windows 8 demos or the latest research leveraging the Kinect platform).
A great illustration that not every useful search application crawls the entire WWW.
It should crawl only as much as you need. The rest is just noise.
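A rough sketch of the idea in Python, under loud assumptions: Matthew does not say how track // microsoft clusters or ranks, so grouping posts by a shared link and scoring clusters by summed click and tweet counts are stand-ins of my own, and the sample posts are invented.

```python
from collections import defaultdict


def cluster_by_link(posts):
    """Group posts that point at the same URL (a crude stand-in for 'related')."""
    clusters = defaultdict(list)
    for post in posts:
        clusters[post["link"]].append(post)
    return list(clusters.values())


def score(cluster):
    """Hypothetical buzz score: total clicks plus total tweets in the cluster."""
    return sum(post["clicks"] + post["tweets"] for post in cluster)


def rank(posts):
    """Return clusters of related posts, most buzz first."""
    return sorted(cluster_by_link(posts), key=score, reverse=True)


# Invented sample data from a handful of followed blogs.
posts = [
    {"title": "Windows 8 demo", "link": "http://example.com/win8", "clicks": 120, "tweets": 30},
    {"title": "More on Windows 8", "link": "http://example.com/win8", "clicks": 40, "tweets": 5},
    {"title": "Kinect research", "link": "http://example.com/kinect", "clicks": 150, "tweets": 20},
]
ranked = rank(posts)
```

Three posts from a few chosen sources are enough to produce a ranking. No crawl of the wider web is involved.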
Posted in Microsoft, Searching | No Comments »