Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 16, 2013

Anniversary! Microsoft Open Technologies, Inc. (MS Open Tech)

Filed under: Microsoft,Open Source — Patrick Durusau @ 6:58 pm

You’re invited to help us celebrate an unlikely pairing in open source by Gianugo Rabellino.

From the post:

We are just days away from reaching a significant milestone for our team and the open source and open standards communities: the first anniversary of Microsoft Open Technologies, Inc. (MS Open Tech) — a wholly owned subsidiary of Microsoft.

We can’t think of anyone better to celebrate with than YOU, the members of the open source and open standards community and technology industry who have helped us along on our adventure over the past year.

We’d like to extend an open (pun intended!) invitation to celebrate with us on April 25, and share your burning questions on the future of the subsidiary, open source at-large and how MS Open Tech can better connect with the developer community to present even more choice and freedom.

I’ll be proud to share the stage with our amazing MS Open Tech leadership team: Jean Paoli, President; Kamaljit Bath, Engineering team leader; and Paul Cotton, Standards team leader and Co-Chair of the W3C HTML Working Group.

You have three choices:

  1. You can be a hard ass and stay home to “punish” MS for real and imagined slights and sins over the years. (You won’t be missed.)
  2. You can be obnoxious and attend, doing your best to not have a good time and trying to keep others from having a good time. (Better to stay home.)
  3. You can attend, have a good time, ask good questions, encourage more innovation and support by Microsoft for the open source and open standards communities.

Microsoft is going to be a major player in whatever solution to semantic interoperability catches on.

If that is topic maps, then Microsoft will be into topic maps.

I would prefer that be under the open source/open standards banner.

Distance prevents me from attending but I will be there in spirit!

Happy Anniversary to Microsoft Open Technologies, Inc.!

April 11, 2013

MS Machine Learning Summit [23 April 2013]

Filed under: Machine Learning,Microsoft — Patrick Durusau @ 2:29 pm

MS Machine Learning Summit

From the post:

The live broadcast of the Microsoft Research Machine Learning Summit will include keynotes from machine learning experts and enlightening discussions with leading scientific and academic researchers about approaches to challenges that are raised by the new era in machine learning. Watch it streamed live from Paris on April 23, 2013, 13:30–17:00 Greenwich Mean Time (09:30–13:00 Eastern Time, 06:30–10:00 Pacific Time) at http://MicrosoftMLS.com.

I would rather be in Paris but watching the live stream will be a lot cheaper!

March 16, 2013

Finding Shakespeare’s Favourite Words With Data Explorer

Filed under: Data Explorer,Data Mining,Excel,Microsoft,Text Mining — Patrick Durusau @ 2:07 pm

Finding Shakespeare’s Favourite Words With Data Explorer by Chris Webb.

From the post:

The more I play with Data Explorer, the more I think my initial assessment of it as a self-service ETL tool was wrong. As Jamie pointed out recently, it’s really the M language with a GUI on top of it and the GUI itself, while good, doesn’t begin to expose the power of the underlying language: I’d urge you to take a look at the Formula Language Specification and Library Specification documents which can be downloaded from here to see for yourself. So while it can certainly be used for self-service ETL it can do much, much more than that…

In this post I’ll show you an example of what Data Explorer can do once you go beyond the UI. Starting off with a text file containing the complete works of William Shakespeare (which can be downloaded from here – it’s strange to think that it’s just a 5.3 MB text file) I’m going to find the top 100 most frequently used words and display them in a table in Excel.
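
For readers without Excel handy, the word-count half of the exercise is a few lines in any language. A deliberately crude Python sketch of the same top-N task (Chris's M version is the real subject here):

```python
import re
from collections import Counter

def top_words(text, n=100):
    """Lower-case the text, pull out runs of letters, and rank by frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

# e.g. top_words(open("shakespeare.txt").read()) against the 5.3 MB file
```

Stop words will dominate any such list, so as in the Data Explorer version you would likely filter "the," "and," and friends before ranking.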

If Data Explorer really is a GUI on top of M (the specification is outdated, but it is a point of origin), Data Explorer goes up in importance.

From the M link:

The Microsoft code name “M” Modeling Language, hereinafter referred to as M, is a language for modeling domains using text. A domain is any collection of related concepts or objects. Modeling a domain consists of selecting certain characteristics to include in the model and implicitly excluding others deemed irrelevant. Modeling using text has some advantages and disadvantages over modeling using other media such as diagrams or clay. A goal of the M language is to exploit these advantages and mitigate the disadvantages.

A key advantage of modeling in text is the ease with which both computers and humans can store and process text. Text is often the most natural way to represent information for presentation and editing by people. However, the ability to extract that information for use by software has been an arcane art practiced only by the most advanced developers. The language features of M enable information to be represented in a textual form that is tuned for both the problem domain and the target audience. The M language provides simple constructs for describing the shape of a textual language – that shape includes the input syntax as well as the structure and contents of the underlying information. To that end, M acts as both a schema language that can validate that textual input conforms to a given language as well as a transformation language that projects textual input into data structures that are amenable to further processing or storage.
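
The schema-plus-transformation pairing the spec describes is easy to illustrate without M itself: validate that textual input conforms to a tiny language, and project conforming input into data structures. A Python sketch, with an invented one-line-per-record grammar:

```python
import re

# A toy textual language: each line is "name: age".
LINE = re.compile(r"^(?P<name>[A-Za-z]+)\s*:\s*(?P<age>\d+)$")

def parse(text):
    """Schema role: reject non-conforming input.
    Transformation role: project each conforming line into a dict."""
    records = []
    for lineno, line in enumerate(text.strip().splitlines(), 1):
        match = LINE.match(line.strip())
        if match is None:
            raise ValueError(f"line {lineno} does not conform: {line!r}")
        records.append({"name": match["name"], "age": int(match["age"])})
    return records
```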

I try to not run examples using Shakespeare. I get distracted by the elegance of the text, which isn’t the point of the exercise. 😉

February 28, 2013

Public Preview of Data Explorer

Filed under: Data Explorer,Data Mining,Microsoft — Patrick Durusau @ 5:26 pm

Public Preview of Data Explorer by Chris Webb.

From the post:

In a nutshell, Data Explorer is self-service ETL for the Excel power user – it is to SSIS what PowerPivot is to SSAS. In my opinion it is just as important as PowerPivot for Microsoft’s self-service BI strategy.

I’ll be blogging about it in detail over the coming days (and also giving a quick demo in my PASS Business Analytics Virtual Chapter session tomorrow), but for now here’s a brief list of things it gives you over Excel’s native functionality for importing data:

  • It supports a much wider range of data sources, including Active Directory, Facebook, Wikipedia, Hive, and tables already in Excel
  • It has better functionality for data sources that are currently supported, such as the Azure Marketplace and web pages
  • It can merge data from multiple files that have the same structure in the same folder
  • It supports different types of authentication and the storing of credentials
  • It has a user-friendly, step-by-step approach to transforming, aggregating and filtering data until it’s in the form you want
  • It can load data into the worksheet or direct into the Excel model

There’s a lot to it, so download it and have a play! It’s supported on Excel 2013 and Excel 2010 SP1.

Download: Microsoft “Data Explorer” Preview for Excel

Chris has collected a number of links to Data Explorer resources so look to his post for more details.

It looks like a local install is required for the preview. I have been meaning to set up Windows 7 in a VM, along with MS Office.

Guess it may be time to take the plunge. 😉 (I have XP/Office on a separate box that uses the same monitors/keyboard but sharing data is problematic.)

February 27, 2013

Microsoft and Hadoop, Sitting in a Tree…*

Filed under: Hadoop,Hortonworks,MapReduce,Microsoft — Patrick Durusau @ 2:55 pm

Putting the Elephant in the Window by John Kreisa.

From the post:

For several years now Apache Hadoop has been fueling the fast growing big data market and has become the defacto platform for Big Data deployments and the technology foundation for an explosion of new analytic applications. Many organizations turn to Hadoop to help tame the vast amounts of new data they are collecting but in order to do so with Hadoop they have had to use servers running the Linux operating system. That left a large number of organizations who standardize on Windows (According to IDC, Windows Server owned 73 percent of the market in 2012 – IDC, Worldwide and Regional Server 2012–2016 Forecast, Doc # 234339, May 2012) without the ability to run Hadoop natively, until today.

We are very pleased to announce the availability of Hortonworks Data Platform for Windows providing organizations with an enterprise-grade, production-tested platform for big data deployments on Windows. HDP is the first and only Hadoop-based platform available on both Windows and Linux and provides interoperability across Windows, Linux and Windows Azure. With this release we are enabling a massive expansion of the Hadoop ecosystem: new participants in the community of developers, data scientists, data management professionals and Hadoop fans can now build and run applications for Apache Hadoop natively on Windows. This is great news for Windows focused enterprises, service providers, software vendors and developers, and in particular they can get going today with Hadoop simply by visiting our download page.

This release would not be possible without a strong partnership and close collaboration with Microsoft. Through the process of creating this release, we have remained true to our approach of community-driven enterprise Apache Hadoop by collecting enterprise requirements, developing them in open source and applying enterprise rigor to produce a 100-percent open source enterprise-grade Hadoop platform.

Now there is a very smart marketing move!

A smaller share of a larger market is always better than a large share of a small market.

(You need to be writing down these quips.) 😉

Seriously, take note of how Hortonworks used the open source model.

They did not build Hadoop in their image and try to sell it to the world.

Hortonworks gathered requirements from others and built Hadoop to meet their needs.

Open source model in both cases, very different outcomes.

* I didn’t remember the rhyme beyond the opening line. Consulting the oracle (Wikipedia), I discovered Playground song. 😉

February 11, 2013

Microsoft Reveals Rapid Big Data Adoption [No Pain, No Change]

Filed under: BigData,Microsoft — Patrick Durusau @ 3:05 pm

Microsoft Reveals Rapid Big Data Adoption

From the post:

More than 75 percent of midsize to large businesses are implementing big-data-related solutions within the next 12 months — with customer care, marketing and sales departments increasingly driving demand, according to new Microsoft Corp. research released today.

According to Microsoft’s “Global Enterprise Big Data Trends: 2013” study of more than 280 IT decision-makers, the following trends emerged:

  • Although the IT department (52 percent) is currently driving most of the demand for big data, customer care (41 percent), sales (26 percent), finance (23 percent) and marketing (23 percent) departments are increasingly driving demand.
  • Seventeen percent of customers surveyed are in the early stages of researching big data solutions, whereas 13 percent have fully deployed them; nearly 90 percent of customers surveyed have a dedicated budget for addressing big data.
  • Nearly half of customers (49 percent) reported that growth in the volume of data is the greatest challenge driving big data solution adoption, followed by having to integrate disparate business intelligence tools (41 percent) and having tools able to glean the insight (40 percent).

After hunting around the MS News Center I found: Customers Rapidly Adopting Big Data Solutions — Driven By Marketing, Sales and More — Reports New Microsoft Research.

Links from the MS News Center take me back to an infographic, so that may be the “publication” they are talking about.

It’s alright, but the same thing could have been a one-page report, suitable for printing and sharing.

In any event, it is encouraging news because the greater the adoption of “big data,” the more semantic impedance is going to cause real pain.

No pain, no change.

😉

When it hurts bad enough, take two topic maps and call me in the morning.

February 8, 2013

Netflix: Solving Big Problems with Reactive Extensions (Rx)

Filed under: ActorFx,JavaRx,Microsoft,Rx — Patrick Durusau @ 5:15 pm

Netflix: Solving Big Problems with Reactive Extensions (Rx) by Claudio Caldato.

From the post:

More good news for Reactive Extensions (Rx).

Just yesterday, we told you about improvements we’ve made to two Microsoft Open Technologies, Inc., releases: Rx and ActorFx, and mentioned that Netflix was already reaping the benefits of Rx.

To top it off, on the same day, Netflix announced a Java implementation of Rx, RxJava, was now available in the Netflix Github repository. That’s great news to hear, especially given how Ben Christensen and Jafar Husain outlined on the Netflix Tech blog that their goal is to “stay close to the original Rx.NET implementation” and that “all contracts of Rx should be the same.”

Netflix also contributed a great series of interactive exercises for learning Microsoft’s Reactive Extensions (Rx) Library for JavaScript as well as some fundamentals for functional programming techniques.

Rx as implemented in RxJava is part of the solution Netflix has developed for improving the processing of 2+ billion incoming requests a day for millions of customers around the world.

Do you have 2+ billion requests coming into your topic map every day?

Assuming the lesser includes the greater, you may want to take a look at Rx or RxJava.
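
If Rx is new to you, the core idea is small: producers push values to subscribers instead of subscribers polling for them, and streams compose. A bare-bones Python sketch of that pattern (not the Rx or RxJava API):

```python
class Observable:
    """Minimal push-based stream: subscribers register callbacks, the source emits."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, on_next):
        self._subscribers.append(on_next)

    def emit(self, value):
        for on_next in self._subscribers:
            on_next(value)

    def map(self, fn):
        # Composition is the point: derived streams transform values as they flow.
        out = Observable()
        self.subscribe(lambda value: out.emit(fn(value)))
        return out
```

Real Rx adds error and completion signals, schedulers, and dozens of operators on top of this skeleton.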

Be sure to visit the interactive exercises!

February 3, 2013

Need to discover, access, analyze and visualize big and broad data? Try F#.

Filed under: Data Analysis,Data Mining,F#,Microsoft — Patrick Durusau @ 6:58 pm

Need to discover, access, analyze and visualize big and broad data? Try F#. by Oliver Bloch.

From the post:

Microsoft Research just released a new iteration of Try F#, a set of tools designed to make it easy for anyone – not just developers – to learn F# and take advantage of its big data, cross-platform capabilities.

F# is the open-source, cross-platform programming language invented by Don Syme and his team at Microsoft Research to help reduce the time-to-deployment for analytical software components in the modern enterprise.

Big data definitely is big these days and we are excited about this new iteration of Try F#. Regardless of your favorite language, or if you’re on a Mac, a Windows PC, Linux or Android, if you need to deal with complex problems, you will want to take a look at F#!

Kerry Godes from Microsoft’s Openness Initiative connected with Evelyne Viegas, Director of Semantic Computing at Microsoft Research, to find out more about how you can use “Try F# to seamlessly discover, access, analyze and visualize big and broad data.” For the complete interview, go to the Openness blog or check out www.tryfsharp.org to get started “writing simple code for complex problems”.

Are you an F# user?

Curious how F# compares to other languages for “complexity?”

Visualization gurus: Does the complexity of languages go up or down with the complexity of licensing terms?

Inquiring minds want to know. 😉

February 2, 2013

Office 2013, Office 365 Editions and BI Features

Filed under: BI,Microsoft — Patrick Durusau @ 3:09 pm

Office 2013, Office 365 Editions and BI Features by Chris Webb.

From the post:

By now you’re probably aware that Office 2013 is in the process of being officially released, and that Office 365 is a very hot topic. You’ve probably also read lots of blog posts by me and other writers talking about the cool new BI functionality in Office 2013 and Office 365. But which editions of Office 2013 and Office 365 include the BI functionality, and how does Office 365 match up to plain old non-subscription Office 2013 for BI? It’s surprisingly hard to find out the answers…

For regular, non-subscription, Office 2013 on the desktop you need Office Professional Plus to use the PowerPivot addin or to use Power View in Excel. However there’s an important distinction to make: the xVelocity engine is now natively integrated into Excel 2013, and this functionality is called the Excel Data Model and is available in all desktop editions of Excel. You only need the PowerPivot addin, and therefore Professional Plus, if you want to use the PowerPivot Window to modify and extend your model (for example by adding calculated columns or KPIs). So even if you’re not using Professional Plus you can still do some quite impressive BI stuff with PivotTables etc. On the server, the only edition of Sharepoint 2013 that has any BI functionality is Enterprise Edition; there’s no BI functionality in Foundation or Standard Editions.

No matter what OS you are running, you are likely to be using some version of MS Office and if you are reading this blog, probably for BI purposes.

Chris does a great job at pointing to resources and generating resources to guide you through the feature/license thicket that surrounds MS Office in its various incarnations.

Complex licensing/feature matrices pad the budgets of the departments that create such complexity. They don’t contribute to the bottom line at Microsoft. There is a deep and profound difference.

January 11, 2013

Getting Started with VM Depot

Filed under: Azure Marketplace,Cloud Computing,Linux OS,Microsoft,Virtual Machines — Patrick Durusau @ 7:35 pm

Getting Started with VM Depot by Doug Mahugh.

From the post:

Do you need to deploy a popular OSS package on a Windows Azure virtual machine, but don’t know where to start? Or do you have a favorite OSS configuration that you’d like to make available for others to deploy easily? If so, the new VM Depot community portal from Microsoft Open Technologies is just what you need. VM Depot is a community-driven catalog of preconfigured operating systems, applications, and development stacks that can easily be deployed on Windows Azure.

You can learn more about VM Depot in the announcement from Gianugo Rabellino over on Port 25 today. In this post, we’re going to cover the basics of how to use VM Depot, so that you can get started right away.

Doug outlines simple steps to get you rolling with the VM Depot.

Sounds a lot easier than trying to walk casual computer users through installation and configuration of software. I assume you could even load data onto the VMs.

Users just need to fire up the VM and they have the interface and data they want.

Sounds like a nice way to distribute topic map based information systems.

December 24, 2012

Microsoft Open Technologies releases Windows Azure support for Solr 4.0

Filed under: Azure Marketplace,Microsoft,Solr — Patrick Durusau @ 4:08 pm

Microsoft Open Technologies releases Windows Azure support for Solr 4.0 by Brian Benz.

From the post:

Microsoft Open Technologies is pleased to share the latest update to the Windows Azure self-deployment option for Apache Solr 4.0.

Solr 4.0 is the first release to use the shared 4.x branch for Lucene & Solr and includes support for SolrCloud functionality. SolrCloud allows you to scale a single index via replication over multiple Solr instances running multiple SolrCores for massive scaling and redundancy.

To learn more about Solr 4.0, have a look at this 40 minute video covering Solr 4 Highlights, by Mark Miller of LucidWorks from Apache Lucene Eurocon 2011.

To download and install Solr on Windows Azure visit our GitHub page to learn more and download the SDK.

Another alternative for implementing the best of Lucene/Solr on Windows Azure is provided by our partner LucidWorks. LucidWorks Search on Windows Azure delivers a high-performance search solution that enables quick and easy provisioning of Lucene/Solr search functionality without any need to install, manage or operate Lucene/Solr servers, and it supports pre-built connectors for various types of enterprise data, structured data, unstructured data and web sites.

Beyond the positive impact for Solr and Azure in general, this means your Solr skills will be useful in new places.
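
For the curious, spinning up a replicated index in SolrCloud goes through the Collections API. A hedged sketch of building that call (the host and collection names are invented; 8983 is Solr's default port):

```python
from urllib.parse import urlencode

def create_collection_url(host, name, num_shards, replication_factor):
    """Build the Collections API CREATE call that spreads one logical index
    over several shards, each replicated across Solr instances."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    })
    return f"http://{host}:8983/solr/admin/collections?{params}"

# e.g. create_collection_url("my-azure-vm", "docs", 2, 2), sent with any HTTP client
```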

November 7, 2012

Rx for Asychronous Data Streams in the Clouds

Filed under: Cloud Computing,Data Streams,Microsoft,Rx — Patrick Durusau @ 4:29 pm

Claudio Caldato wrote: MS Open Tech Open Sources Rx (Reactive Extensions) – a Cure for Asynchronous Data Streams in Cloud Programming.

I was tired by the time I got to the end of the title! His is more descriptive than mine but if you know the context, you don’t need the description.

From the post:

If you are a developer that writes asynchronous code for composite applications in the cloud, you know what we are talking about, for everybody else Rx Extensions is a set of libraries that makes asynchronous programming a lot easier. As Dave Sexton describes it, “If asynchronous spaghetti code were a disease, Rx is the cure.”

Reactive Extensions (Rx) is a programming model that allows developers to glue together asynchronous data streams. This is particularly useful in cloud programming because it helps create a common interface for writing applications that come from diverse data sources, e.g., stock quotes, Tweets, computer events, Web service requests.

Today, Microsoft Open Technologies, Inc., is open sourcing Rx. Its source code is now hosted on CodePlex to increase the community of developers seeking a more consistent interface to program against, and one that works across several development languages. The goal is to expand the number of frameworks and applications that use Rx in order to achieve better interoperability across devices and the cloud.

Rx was developed by Microsoft Corp. architect Erik Meijer and his team, and is currently used on products in various divisions at Microsoft. Microsoft decided to transfer the project to MS Open Tech in order to capitalize on MS Open Tech’s best practices with open development.

There are applications that you probably touch every day that are using Rx under the hood. A great example is GitHub for Windows.

According to Paul Betts at GitHub, “GitHub for Windows uses the Reactive Extensions for almost everything it does, including network requests, UI events, managing child processes (git.exe). Using Rx and ReactiveUI, we’ve written a fast, nearly 100% asynchronous, responsive application, while still having 100% deterministic, reliable unit tests. The desktop developers at GitHub loved Rx so much, that the Mac team created their own version of Rx and ReactiveUI, called ReactiveCocoa, and are now using it on the Mac to obtain similar benefits.”
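
That "common interface" over diverse sources can be sketched in a few lines: fan several push-based sources into one stream so downstream code subscribes once, whether the values are stock quotes, tweets, or events. (Plain Python illustrating the idea, not Rx's actual API.)

```python
class Source:
    """Push-based data source: callers subscribe a callback, the source emits."""
    def __init__(self):
        self._callbacks = []

    def subscribe(self, callback):
        self._callbacks.append(callback)

    def emit(self, value):
        for callback in self._callbacks:
            callback(value)

def merge(**named_sources):
    """Fan diverse sources into one stream of (name, value) pairs."""
    merged = Source()
    for name, source in named_sources.items():
        source.subscribe(lambda value, name=name: merged.emit((name, value)))
    return merged
```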

What if the major cloud players started competing on the basis of interoperability? So your app here will work there.

Reducing the impedance for developers enables more competition between developers. Resulting in better services/product for consumers.

Cloud owners get more options to offer their customers.

Topic map applications have an easier time mining, identifying and recombining subjects across diverse sources and even clouds.

Does anyone see a downside here?

November 1, 2012

KitaroDB [intrusive-keyed database]

Filed under: KitaroDB,Microsoft,NoSQL — Patrick Durusau @ 5:30 pm

KitaroDB

From the “What is KitaroDB” page:

KitaroDB is a fast, efficient and scalable NoSQL database that operates natively in the WinRT (Windows 8 UI), the Win32 (x86 and x64) and .NET environments. It features:

  • Easy-to-use interfaces for C#, VB, C++, C and HTML5/JavaScript developers;
  • A proven commercial database system;
  • Support of large-sector drives;
  • Minimal overhead, consuming less than a megabyte of memory resources;
  • Durability as on-disk data store;
  • High performance, capable of handling tens of thousands of operations per second;
  • Asynchronous and synchronous operations;
  • Storage limit of 100 TB;
  • Flexibility as either a key/value data store or an intrusive key database with segmented key support.

The phrase “intrusive-keyed database” was unfamiliar.

Keys can be segmented into up to 255 segments with what appears to be a fairly limited range of data types. Some of the KitaroDB documentation on “intrusive-keyed databases” is found here: Creating a KitaroDB database.

Segmented keys aren’t unique to KitaroDB and they offer some interesting possibilities. More to follow on that score.
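
One of those possibilities is easy to show now: model segmented keys as tuples in an ordered map, and a key prefix becomes a range scan over everything sharing those leading segments. A sketch of the idea in Python (nothing here is KitaroDB's API):

```python
import bisect

class SegmentedKeyStore:
    """Ordered key/value store whose keys are tuples of segments."""
    def __init__(self):
        self._keys = []      # kept sorted, so shared prefixes are contiguous
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, *prefix):
        """Yield (key, value) for every key whose leading segments match."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            yield key, self._values[key]
```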

Storage being limited to 100 TB should not be an issue for “middling” data sized applications. 😉

October 25, 2012

DINOSAURS ARE REAL: Microsoft WOWs audience with HDInsight…(Hortonworks Inside)

Filed under: Hadoop,HDInsight,Hortonworks,Microsoft — Patrick Durusau @ 4:02 pm

DINOSAURS ARE REAL: Microsoft WOWs audience with HDInsight at Strata NYC (Hortonworks Inside) by Russell Jurney.

From the post:

You don’t see many demos like the one given by Shawn Bice (Microsoft) today in the Regent Parlor of the New York Hilton, at Strata NYC. “Drive Smarter Decisions with Microsoft Big Data,” was different.

For starters – everything worked like clockwork. Live demos of new products are notorious for failing on-stage, even if they work in production. And although Microsoft was presenting about a Java-based platform at a largely open-source event… it was standing room only, with the crowd overflowing out the doors.

Shawn demonstrated working with Apache Hadoop from Excel, through Power Pivot, to Hive (with sampling-driven early results!?) and out to import third party data-sets. To get the full effect of what he did, you’re going to have to view a screencast or try it out but to give you the idea of what the first proper interface on Hadoop feels like…

My thoughts on reading Russell’s post:

  • A live product demo that did not fail? Really?
  • Is that tattoo copyrighted?
  • Oh, yes, +1!, big data has become real for millions of users.

How’s that for a big data book, tutorial, consulting, semantic market explosion?

Why Microsoft is committed to Hadoop and Hortonworks

Filed under: BigData,Hadoop,Hortonworks,Microsoft — Patrick Durusau @ 2:53 pm

Why Microsoft is committed to Hadoop and Hortonworks (a guest post at Hortonworks by Microsoft’s Dave Campbell).

From the post:

Last February at Strata Conference in Santa Clara we shared Microsoft’s progress on Big Data, specifically working to broaden the adoption of Hadoop with the simplicity and manageability of Windows and enabling customers to easily derive insights from their structured and unstructured data through familiar tools like Excel.

Hortonworks is a recognized pioneer in the Hadoop Community and a leading contributor to the Apache Hadoop project, and that’s why we’re excited to announce our expanded partnership with Hortonworks to give customers access to an enterprise-ready distribution of Hadoop that is 100 percent compatible with Windows Server and Windows Azure. To provide customers with access to this Hadoop compatibility, yesterday we also released new previews of Microsoft HDInsight Server for Windows and Windows Azure HDInsight Service, our Hadoop-based solutions for Windows Server and Windows Azure.

With this expanded partnership, the Hadoop community will reap the following benefits of Hadoop on Windows:

  • Insights to all users from all data:….
  • Enterprise-ready Hadoop with HDInsight:….
  • Simplicity of Windows for Hadoop:….
  • Extend your data warehouse with Hadoop:….
  • Seamless Scale and Elasticity of the Cloud:….

This is a very exciting milestone, and we hope you’ll join us for the ride as we continue partnering with Hortonworks to democratize big data. Download HDInsight today at Microsoft.com/BigData.

See Dave’s post for the details on “benefits of Hadoop on Windows” and then like the man says:

Download HDInsight today at Microsoft.com/BigData.

Enabling Big Data Insight for Millions of Windows Developers [Your Target Audience?]

Filed under: Azure Marketplace,BigData,Hadoop,Hortonworks,Microsoft — Patrick Durusau @ 2:39 pm

Enabling Big Data Insight for Millions of Windows Developers by Shaun Connolly.

From the post:

At Hortonworks, we fundamentally believe that, in the not-so-distant future, Apache Hadoop will process over half the world’s data flowing through businesses. We realize this is a BOLD vision that will take a lot of hard work by not only Hortonworks and the open source community, but also software, hardware, and solution vendors focused on the Hadoop ecosystem, as well as end users deploying platforms powered by Hadoop.

If the vision is to be achieved, we need to accelerate the process of enabling the masses to benefit from the power and value of Apache Hadoop in ways where they are virtually oblivious to the fact that Hadoop is under the hood. Doing so will help ensure time and energy is spent on enabling insights to be derived from big data, rather than on the IT infrastructure details required to capture, process, exchange, and manage this multi-structured data.

So how can we accelerate the path to this vision? Simply put, we focus on enabling the largest communities of users interested in deriving value from big data.

You don’t have to wonder long what Shaun is reacting to:

Today Microsoft unveiled previews of Microsoft HDInsight Server and Windows Azure HDInsight Service, big data solutions that are built on Hortonworks Data Platform (HDP) for Windows Server and Windows Azure respectively. These new offerings aim to provide a simplified and consistent experience across on-premise and cloud deployment that is fully compatible with Apache Hadoop.

Enabling big data insight isn’t the same as capturing those insights for later use or re-use.

May just be me, but that sounds like a great opportunity for topic maps.

Bringing semantics to millions of Windows developers that is.

August 17, 2012

Marching Hadoop to Windows

Filed under: Excel,Hadoop,Microsoft — Patrick Durusau @ 3:59 pm

Marching Hadoop to Windows

From the post:

Bringing Hadoop to Windows and the two-year development of Hadoop 2.0 are two of the more exciting developments brought up by Hortonworks’s Cofounder and CTO, Eric Baldeschwieler, in a talk before a panel at the Cloud 2012 Conference in Honolulu.

(video omitted)

The panel, which was also attended by Baldeschwieler’s Cloudera counterpart Amr Awadallah, focused on insights into the big data world, a subject Baldeschwieler tackled almost entirely with Hadoop. The eighteen-minute discussion also featured a brief history of Hadoop’s rise to prominence, improvements to be made to Hadoop, and a few tips to enterprising researchers wishing to contribute to Hadoop.

“Bringing Hadoop to Windows,” says Baldeschwieler “turns out to be a very exciting initiative because there are a huge number of users in the Windows operating system.” In particular, the Excel spreadsheet program is a popular one for business analysts, something analysts would like to see integrated with Hadoop’s database. That will not be possible until, as Baldeschwieler notes, Windows is integrated into Hadoop later this year, a move that will also considerably expand Hadoop’s reach.

However, that announcement pales in comparison to the possibilities provided by the impending Hadoop 2.0. “Hadoop 2.0 is a pretty major re-write of Hadoop that’s been in the works for two years. It’s now in usable alpha form…The real focus in Hadoop 2.0 is scale and opening it up for more innovation.” Baldeschwieler notes that Hadoop’s rise has been the result of what he calls “a happy accident” where it was being developed by his Yahoo team for a specific use case: classifying, sorting, and indexing each of the URLs that were under Yahoo’s scope.

Integration of Excel and Hadoop?

Is that going to be echoes of Unix – The Hole Hawg?

July 22, 2012

Windows Azure Active Directory Graph

Filed under: Graphs,Microsoft — Patrick Durusau @ 5:23 am

Windows Azure Active Directory Graph

Pre-release documentation, subject to change before release, blah, blah, but very interesting nonetheless.

When I look at the application scenario, Creating Enterprise Applications by Using Windows Azure AD Graph, which is described as:

In this scenario you have purchased an Office 365 subscription. As part of the subscription you have purchased the capability to manage users using Windows Azure AD, which is part of Windows Azure. You want to build an application that can access users’ information such as user names and group membership.

OK, so I can access “user names and group membership.” A good thing, but a better (read: more useful) thing would be managing the other user identifications a person needs for access to enterprise applications.

Or to put that differently, to map user identifications together for any single user, so the appropriate identification is used for any particular system. (Thinking of long term legacy systems and applications. Almost everyone has them.)
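A minimal sketch of the kind of mapping I have in mind. Everything here is hypothetical — none of these class names or identifiers come from the Windows Azure AD Graph API; the point is only the shape of the problem: one person, many per-system identifiers.

```python
# Hypothetical sketch of per-user identity mapping across systems.
# The system names and identifiers below are invented for illustration;
# they are not part of Windows Azure AD Graph.

class IdentityMap:
    """Maps a canonical user to the identifier each system expects."""

    def __init__(self):
        self._ids = {}  # canonical user -> {system name: local identifier}

    def register(self, user, system, local_id):
        self._ids.setdefault(user, {})[system] = local_id

    def resolve(self, user, system):
        """Return the identifier to present to a particular system."""
        return self._ids[user][system]

idmap = IdentityMap()
idmap.register("pdurusau", "azure_ad", "pdurusau@contoso.onmicrosoft.com")
idmap.register("pdurusau", "mainframe_hr", "PD4412")
print(idmap.resolve("pdurusau", "mainframe_hr"))  # PD4412
```

The legacy-system case in the post is exactly the `resolve` call: the same user, presented under whichever identification the target application understands.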

Certainly worth your attention as this develops towards release.

July 20, 2012

[It] knows if you’ve been bad or good so be good for [your own sake]

Filed under: Marketing,Microsoft — Patrick Durusau @ 1:49 pm

I had to re-write a line from “Santa Claus Is Coming to Town” just a bit to fit the story about SkyDrive I read today: Watch what you store on SkyDrive–you may lose your Microsoft life.

I don’t find the terms of service surprising. Everybody has to say that sort of thing to avoid liability in case you store, transfer, etc., something illegal using their service.

The rules used to require notice and refusal to remove content before you have any liability.

Has that changed?

Curious for a number of reasons, not the least of which is providing topic map data products and topic map appliances online.

July 19, 2012

More of Microsoft’s App Development Tools Goes Open Source

Filed under: Microsoft,Open Source — Patrick Durusau @ 2:38 pm

More of Microsoft’s App Development Tools Goes Open Source by Gianugo Rabellino.

From the post:

Today marks a milestone since we launched Microsoft Open Technologies, Inc. (MS Open Tech) as we undertake some important open source projects. We’re excited to share the news that MS Open Tech will be open sourcing the Entity Framework (EF), a database mapping tool useful for application development in the .NET Framework. EF will join the other open source components of Microsoft’s dev tools – MVC, Web API, and Web Pages with Razor Syntax – on CodePlex to help increase the development transparency of this project.

MS Open Tech will serve as an accelerator for these projects by working with the open source communities through our new MS Open Tech CodePlex landing page. Together, we will help build out its source code until shipment of the next product version.

This will enable everyone in the community to monitor and provide feedback on code check-ins, bug-fixes, new feature development, and build and test the products on a daily basis using the most up-to-date version of the source code.

The newly opened EF will, for the first time, allow developers outside Microsoft to submit patches and code contributions that the MS Open Tech development team will review for potential inclusion in the products.

We need more MS “native” topic map engines and applications.

Or topic map capabilities in the core of MS Office™.

Lots of people could start writing topic maps.

Which would be a good thing. A lot of people write documents using MS Word™, yet they still turn to professional typesetters for publication.

Same will be true for topic maps.

July 16, 2012

Data Mining In Excel: Lecture Notes and Cases (2005)

Filed under: Data Mining,Excel,Microsoft — Patrick Durusau @ 3:03 pm

Data Mining In Excel: Lecture Notes and Cases (2005) by Galit Shmueli, Nitin R. Patel, and Peter C. Bruce.

From the introduction:

This book arose out of a data mining course at MIT’s Sloan School of Management. Preparation for the course revealed that there are a number of excellent books on the business context of data mining, but their coverage of the statistical and machine-learning algorithms that underlie data mining is not sufficiently detailed to provide a practical guide if the instructor’s goal is to equip students with the skills and tools to implement those algorithms. On the other hand, there are also a number of more technical books about data mining algorithms, but these are aimed at the statistical researcher, or more advanced graduate student, and do not provide the case-oriented business focus that is successful in teaching business students.

Hence, this book is intended for the business student (and practitioner) of data mining techniques, and its goal is threefold:

  1. To provide both a theoretical and practical understanding of the key methods of classification, prediction, reduction and exploration that are at the heart of data mining;
  2. To provide a business decision-making context for these methods;
  3. Using real business cases, to illustrate the application and interpretation of these methods.

An important feature of this book is the use of Excel, an environment familiar to business analysts. All required data mining algorithms (plus illustrative datasets) are provided in an Excel add-in, XLMiner. XLMiner offers a variety of data mining tools: neural nets, classification and regression trees, k-nearest neighbor classification, naive Bayes, logistic regression, multiple linear regression, and discriminant analysis, all for predictive modeling. It provides for automatic partitioning of data into training, validation and test samples, and for the deployment of the model to new data. It also offers association rules, principal components analysis, k-means clustering and hierarchical clustering, as well as visualization tools, and data handling utilities. With its short learning curve, affordable price, and reliance on the familiar Excel platform, it is an ideal companion to a book on data mining for the business student.
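The automatic partitioning XLMiner provides is the same training/validation/test split every data mining toolkit performs before modeling. A plain-Python sketch of that one step (this is an illustration of the general technique, not XLMiner's implementation):

```python
import random

def partition(rows, train=0.6, valid=0.2, seed=42):
    """Shuffle rows, then split into training, validation, and test samples."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed for a reproducible split
    n = len(rows)
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (rows[:n_train],
            rows[n_train:n_train + n_valid],
            rows[n_train + n_valid:])

train_set, valid_set, test_set = partition(range(100))
print(len(train_set), len(valid_set), len(test_set))  # 60 20 20
```

The model is fit on the training sample, tuned against validation, and scored once on the held-out test sample — the workflow the book walks business students through inside Excel.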

Somewhat dated, but remember there are lots of older copies of MS Office around. Not an inconsiderable market if you start writing something on using Excel to produce topic maps. Write for the latest version, but I would keep a version keyed to earlier releases of Excel as well.

I first saw this at KDNuggets.

July 10, 2012

MongoDB Installer for Windows Azure

Filed under: Azure Marketplace,Microsoft,MongoDB — Patrick Durusau @ 7:17 am

MongoDB Installer for Windows Azure by Doug Mahugh.

From the post:

Do you need to build a high-availability web application or service? One that can scale out quickly in response to fluctuating demand? Need to do complex queries against schema-free collections of rich objects? If you answer yes to any of those questions, MongoDB on Windows Azure is an approach you’ll want to look at closely.

People have been using MongoDB on Windows Azure for some time (for example), but recently the setup, deployment, and development experience has been streamlined by the release of the MongoDB Installer for Windows Azure. It’s now easier than ever to get started with MongoDB on Windows Azure!

If you are developing or considering developing with MongoDB, this is definitely worth a look. In part because it frees you to concentrate on software development and not running (or trying to run) a server farm. Different skill sets.

Another reason is that it levels the playing field with big IT firms that have server farms. You get the advantages of a server farm without the capital investment in one.

And as Microsoft becomes a bigger and bigger tent for diverse platforms and technologies, you have more choices. Choices for the changing requirements of your clients.

Not that I expect to see an Apple hanging from the Microsoft tree anytime soon but you can’t ever tell. Enough consumer demand and it could happen.

In the meantime, while we wait for better games and commercials, consider how you would power semantic integration in the cloud.

June 17, 2012

Data Mining with Microsoft SQL Server 2008 [Book Review]

Filed under: Data Mining,Microsoft,SQL Server — Patrick Durusau @ 3:10 pm

Data Mining with Microsoft SQL Server 2008

Sandro Saitta writes:

If you are using Microsoft data mining tools, this book is a must have. Written by MacLennan, Tang and Crivat, it describes how to perform data mining using SQL Server 2008. The book is huge – more than 630 pages – but it is normal since authors give detailed explanation for each data mining function. The book covers topics such as general data mining concepts, DMX, Excel add-ins, OLAP cubes, data mining architecture and many more. The seven data mining algorithms included in the tool are described in separate chapters.

The book is well written, so it can be read from A to Z or by selecting specific chapters. Each theoretical concept is explained through examples. Using screenshots, each step of a given method is presented in details. It is thus more a user manual than a book explaining data mining concepts. Don’t expect to read any detailed algorithms or equations. A good surprise of the book are the case studies. They are present in most chapters and show real examples and how to solve them. It really shows the experience of the authors in the field.

I haven’t seen the book, yet, but that can be corrected. 😉

June 9, 2012

Working with NoSQL Databases [MS TechNet]

Filed under: Microsoft,NoSQL — Patrick Durusau @ 7:16 pm

Working with NoSQL Databases

From Microsoft’s TechNet, an outline listing of NoSQL links and resources.

Has the advantage (over similar resources) of being in English, Deutsch, Italian and Português.

May 26, 2012

Doug Mahugh Live! was: MongoDB Replica Sets

Filed under: Microsoft,MongoDB,Replica Sets — Patrick Durusau @ 4:14 pm

Doug Mahugh was spotted presenting on MongoDB Replica Sets.

The video also teaches you about MongoDB replica sets on Windows. Replica sets are the means MongoDB uses for high reliability and read performance. An expert from 10gen, Sridhar Nanjundeswaran, covers the MongoDB material.

PS: Kudos to Doug on his new role at MS on reaching out to open source projects!

May 22, 2012

SQL Azure Labs Posts

Filed under: Azure Marketplace,Microsoft,SQL,Windows Azure,Windows Azure Marketplace — Patrick Durusau @ 10:36 am

Roger Jennings writes in Recent Articles about SQL Azure Labs and Other Value-Added Windows Azure SaaS Previews: A Bibliography:

I’ve been concentrating my original articles for the past six months or so on SQL Azure Labs, Apache Hadoop on Windows Azure and SQL Azure Federations previews, which I call value-added offerings. I use the term value-added because Microsoft doesn’t charge for their use, other than Windows Azure compute, storage and bandwidth costs or SQL Azure monthly charges and bandwidth costs for some of the applications, such as Codename “Cloud Numerics” and SQL Azure Federations.

As of 22 May 2012, there are forty-four (44) posts in the following categories:

  • Windows Azure Marketplace DataMarket plus Codenames “Data Hub” and “Data Transfer” from SQL Azure Labs
  • Apache Hadoop on Windows Azure from the SQL Server Team
  • Codename “Cloud Numerics” from SQL Azure Labs
  • Codename “Social Analytics” from SQL Azure Labs
  • Codename “Data Explorer” from SQL Azure Labs
  • SQL Azure Federations from the SQL Azure Team

If you need quick guides and/or incentives to use Windows Azure, try these on for size.

April 28, 2012

First Light – MS Open Tech: Redis on Windows

Filed under: Microsoft,Redis — Patrick Durusau @ 6:05 pm

First Light – MS Open Tech: Redis on Windows

Claudio Caldato writes:

The past few weeks have been very busy in our offices as we announced the creation of Microsoft Open Technologies, Inc. Now that the dust has settled it’s time for us to resume our regular cadence in releasing code, and we are happy to share with you the very first deliverable from our new company: a new and significant iteration of our work on Redis on Windows, the open-source, networked, in-memory, key-value data store.

The major improvements in this latest version involve the process of saving data on disk. Redis on Linux uses an OS feature called Fork/Copy On Write. This feature is not available on Windows, so we had to find a way to be able to mimic the same behavior without changing completely the save on disk process so as to avoid any future integration issues with the Redis code.
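The fork/copy-on-write trick Caldato describes is easy to see in miniature: after `fork()`, the child's view of memory is frozen at the moment of the fork, so it can write a consistent snapshot while the parent keeps accepting writes. A minimal POSIX sketch of the OS mechanism (not Redis's actual persistence code — and, fittingly, it will not run on Windows, which is exactly the gap MS Open Tech had to work around):

```python
import json
import os
import tempfile

def save_snapshot(store, path):
    """Fork so the child sees a copy-on-write snapshot of the store."""
    pid = os.fork()  # POSIX only; this call does not exist on Windows
    if pid == 0:
        # Child: pages are shared copy-on-write, so this view of `store`
        # is frozen at fork time even as the parent keeps mutating it.
        with open(path, "w") as f:
            json.dump(store, f)
        os._exit(0)
    # Parent: continue serving writes immediately, then reap the child.
    store["written-after-fork"] = True
    os.waitpid(pid, 0)

store = {"k1": "v1"}
path = os.path.join(tempfile.mkdtemp(), "dump.json")
save_snapshot(store, path)
with open(path) as f:
    print(json.load(f))  # {'k1': 'v1'} — the post-fork write is absent
```

Mimicking that "frozen view while writes continue" behavior without `fork()` is the design problem the Redis-on-Windows port had to solve.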

Excellent news!

BTW, Microsoft Open Technologies has a presence on Github. Just the one project (Redis on Windows) but I am sure more will follow.

March 22, 2012

Tracking Microsoft Buzz with Blogs, Twitter, Bitly and Videos

Filed under: Microsoft,Searching — Patrick Durusau @ 7:43 pm

Tracking Microsoft Buzz with Blogs, Twitter, Bitly and Videos

Matthew Hurst writes:

Microsoft is an incredibly diverse company. I’ve just celebrated 5 years here and still don’t have a full appreciation of the breadth and depth of products and innovation that the corporation generates. After BlogPulse was unplugged, I felt something of a hankering to continue to follow the buzz around Microsoft, partly as a way to better follow what the company is doing and how it is perceived in the online world.

I’m a big fan of TechMeme, but it has some challenges when it comes to tracking news and trends around a specific company. Firstly, I don’t know the sources that are used and the ranking mechanisms in place, so it is hard to really understand quantitatively what it represents. Secondly, with limited real estate, while a big story may be happening for a company of interest, it can be crowded out by other events. Thirdly, I can’t help but think it has a strong valley culture bias. Fourthly, it hasn’t evolved much in the years that I’ve been visiting it.

So I’ve put together an experimental site called track // microsoft which follows a few blogs, clusters posts that are related and uses Bitly and Twitter data to rank the articles and clusters of stories. In doing this, I observed that many posts in the blogosphere about Microsoft would contain videos (be they of Windows 8 demos or the latest research leveraging the Kinect platform).
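What Hurst describes — cluster related posts, then rank clusters by Bitly/Twitter signals — reduces to: measure title similarity, group greedily, sum a popularity count per group. A toy sketch of that pipeline (word-overlap similarity and invented click counts stand in for whatever track // microsoft actually uses):

```python
def jaccard(a, b):
    """Word-set overlap between two titles, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def cluster(posts, threshold=0.3):
    """Greedy single-link clustering of (title, clicks) pairs,
    ranked by total clicks per cluster."""
    clusters = []
    for title, clicks in posts:
        for c in clusters:
            if any(jaccard(title, t) >= threshold for t, _ in c):
                c.append((title, clicks))
                break
        else:
            clusters.append([(title, clicks)])
    return sorted(clusters, key=lambda c: sum(n for _, n in c), reverse=True)

posts = [("Windows 8 demo video", 120),
         ("New Windows 8 demo leaked", 300),
         ("Kinect research project", 50)]
top = cluster(posts)[0]
print(len(top))  # the two Windows 8 posts form the top-ranked cluster
```

A real system would use better features than raw word overlap, but the shape — similarity, grouping, a popularity signal for ranking — is the same.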

A great illustration that not every useful search application crawls the entire WWW.

It should crawl only as much as you need. The rest is just noise.

March 17, 2012

Lifebrowser: Data mining gets (really) personal at Microsoft

Filed under: Data Mining,Microsoft,Privacy — Patrick Durusau @ 8:20 pm

Lifebrowser: Data mining gets (really) personal at Microsoft

Nancy Owano writes:

Microsoft Research is doing research on software that could bring you your own personal data mining center with a touch of Proust for returns. In a recent video, Microsoft scientist Eric Horvitz demonstrated the Lifebrowser, which is prototype software that helps put your digital life in meaningful shape. The software uses machine learning to help a user place life events, which may span months or years, to be expanded or contracted selectively, in better context.

Navigating the large stores of personal information on a user’s computer, the program goes through the piles of personal data, including photos, emails and calendar dates. A search feature can pull up landmark events on a certain topic. Filtering the data, the software calls up memory landmarks and provides a timeline interface. Lifebrowser’s timeline shows items that the user can associate with “landmark” events with the use of artificial intelligence algorithms.

A calendar crawler, working with Microsoft Outlook extracts various properties from calendar events, such as location, organizer, and relationships between participants. The system then applies Bayesian machine learning and reasoning to derive atypical features from events that make them memorable. Images help human memory, and an image crawler analyzes a photo library. By associating an email with a relevant calendar date with a relevant document and photos, significance is gleaned from personal life events. With a timeline in place, a user can zoom in on details of the timeline around landmarks with a “volume control” or search across the full body of information.
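The core idea — atypical features make events memorable — can be reduced to a toy surprise score: the rarer an event's attribute, the higher its −log probability. This is only a sketch of the general principle, with invented calendar data; it is not Lifebrowser's actual model:

```python
import math
from collections import Counter

def surprise_scores(events, key):
    """Score each event by -log p(attribute): rarer = more memorable."""
    counts = Counter(e[key] for e in events)
    total = sum(counts.values())
    return [(e, -math.log(counts[e[key]] / total)) for e in events]

# Nine routine self-organized meetings and one rare event (invented data).
calendar = (
    [{"organizer": "self", "location": "office"}] * 9
    + [{"organizer": "ceo", "location": "paris"}]
)
scored = sorted(surprise_scores(calendar, "organizer"),
                key=lambda pair: pair[1], reverse=True)
print(scored[0][0]["organizer"])  # ceo — the rare meeting is the landmark
```

Lifebrowser's Bayesian machinery is far richer (locations, participants, relationships, photos), but the ranking intuition is the same: routine events score low, atypical ones surface as landmarks on the timeline.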

Sounds like the start towards a “personal” topic map authoring application.

One important detail: With MS Lifebrowser the user is gathering information on themselves.

Not the same as having Google or Facebook gathering information on you. Is it?

January 31, 2012

The Heat in SharePoint Semantics: January 20 – January 27

Filed under: Findability,Microsoft,Searching,Semantics,SharePoint — Patrick Durusau @ 4:37 pm

The Heat in SharePoint Semantics: January 20 – January 27

Stephen Arnold writes:

As always, SharePoint Semantics has delivered many posts that are vitally important to both SharePoint end users and search enthusiasts alike.

Read Stephen’s post and then see: SharePoint Semantics for yourself.

From the tone of the posts I would say there are at least two very large issues that topic maps can address:

First, there is the issue of working with SharePoint itself. From these posts and other reports, it would be very generous to say that SharePoint has “documentation.” True there are materials that come with it, but either it doesn’t answer the questions users have and/or it doesn’t answer any questions at all. Opinions differ.

Using a topic map to provide a portal with useful and findable information about SharePoint itself seems like an immediate commercial opportunity. I suspect that, like most technical-advice sites, you would have to rely on ad revenue, but from the numbers it looks like the number of people needing SharePoint help is only going to increase.

Second, it is readily apparent that it is one thing to create data and store it in SharePoint. It is quite another to make that information findable by others.

I don’t think that is entirely a matter of poor design or coding on the part of MS. I have never seen a useful SharePoint site but site design is left up to users. Even MS can’t increase the information management capabilities of the average user. Or at least I have never seen MS software have that result. 😉

The findability inside a SharePoint installation is an issue that topic maps can address. Like SharePoint, topic maps won’t make users more capable but they can put better tools at their disposal to assist in finding data. That isn’t speculation on my part, there is at least one topic map vendor that provides that sort of service for SharePoint installations.

At the risk of sounding repetitive, I think offering better findability with topic maps isn’t going to be sufficient to drive market adoption. On the other hand, enhancing findability within contexts and applications that users are already using, may be the sweet spot we have been looking for.
