Hadoop SDK and Tutorials for Microsoft .NET Developers

Friday, May 17th, 2013

Hadoop SDK and Tutorials for Microsoft .NET Developers by Marc Holmes.

Microsoft has begun to treat its developer community to a number of Hadoop-y releases related to its HDInsight (Hadoop in the cloud) service, and it’s worth rounding up the material. It’s all Alpha and Preview so YMMV but looks like fun:

• Microsoft .NET SDK for Hadoop. This kit provides .NET API access to aspects of HDInsight including HDFS, HCatalog, Oozie and Ambari, and also some PowerShell scripts for cluster management. There are also libraries for MapReduce and LINQ to Hive. The latter is really interesting as it builds on the established technology for .NET developers to access most data sources to deliver the capabilities of the de facto standard for Hadoop data query.
• HDInsight Labs Preview. Up on GitHub, there is a series of 5 labs covering C#, JavaScript and F# coding for MapReduce jobs, using Hive, and then bringing that data into Excel. It also covers some Mahout use to build a recommendation engine.
• Microsoft Hive ODBC Driver. The examples above use this preview driver to enable the connection from Hive to Excel.
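
For readers who have not written a MapReduce job before, here is a minimal sketch of the pattern the labs build on. It is plain Python running in one process, not the .NET SDK or HDInsight API, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["Hadoop on Windows", "hadoop in the cloud"])
```

On a real cluster the mapper and reducer run on different machines and the shuffle is handled by Hadoop; only the two small functions are yours to write.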

If all of the above excites you, our Hadoop on Windows for Developers training course also covers similar content in a lot of depth.

Hadoop is coming to an office/data center near you.

How to set up Semantic Logging…

Friday, February 15th, 2013

While we are on the topic of semantic logging:

Logging today is mostly too unstructured: each application developer has his own syntax for the logs, optimized for his personal requirements. When it is time to deploy, ops consider themselves lucky if there is any logging in the application at all, and luckier still if that logging can be used to find problems as they occur by adjusting verbosity where needed.

I’ve come to the point where I want a really awesome piece of logging from the get-go – something I can pick up and install in a couple of minutes when I come to a new customer site without proper operations support.

I want to be able to search, drill down into and filter out patterns, and have good tooling that lets logging be an obvious support as the application is brought through its life cycle, from development to production. And I don’t want to write my own log parsers, thank you very much!

That’s where semantic logging comes in – my applications should be broadcasting log data in a manner that allows code to route, filter and index it. That’s why I’ve spent a lot of time researching how logging is done in a bloody good manner – this post and upcoming ones will teach you how to make your logs talk!

It’s worth noting that you can read this post no matter your programming language. In fact, the tooling that I’m about to discuss spans multiple operating systems (Linux, Windows) and multiple programming languages: Erlang, Java, Puppet, Ruby, PHP, JavaScript and C#. I will demo logging from C#/Win initially and continue with Python, Haskell and Scala in upcoming posts.

I didn’t see any posts following this one. But it is complete enough to get you started on semantic logging.

Embracing Semantic Logging

Friday, February 15th, 2013

Embracing Semantic Logging by Grigori Melnik.

In the world of software engineering, every system needs to log. Logging helps to diagnose and troubleshoot problems with your system both in development and in production. This requires proper, well-designed instrumentation. All too often, however, developers instrument their code to do logging without having a clear strategy and without thinking through the ways the logs are going to be consumed, parsed, and interpreted. Valuable contextual information about events frequently gets lost, or is buried inside the log messages. Furthermore, in some cases logging is done simply for the sake of logging, more like a checkmark on the list. This situation is analogous to people fallaciously believing their backup system is properly implemented by enabling the backup but never, actually, trying to restore from those backups.

This lack of a thought-through logging strategy results in systems producing huge amounts of log data which is less useful or entirely useless for problem resolution.

Many logging frameworks exist today (including our own Logging Application Block and log4net). In a nutshell, they provide high-level APIs to help with formatting log messages, grouping (by means of categories or hierarchies) and writing them to various destinations. They provide you with an entry point – some sort of a logger object through which you call log writing methods (conceptually, not very different from Console.WriteLine(message)). While supporting dynamic reconfiguration of certain knobs, they require the developer to decide upfront on the template of the logging message itself. Even when this can be changed, the message is usually intertwined with the application code, including metadata about the entry such as the severity and entry id.

As ever in all discussions, even those of semantics, there is some impedance:

Imagine another world, where the events get logged and their semantic meaning is preserved. You don’t lose any fidelity in your data. Welcome to the world of semantic logging. Note, some people refer to semantic logging as “structured logging”, “strongly-typed logging” or “schematized logging”.
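
To make the contrast concrete, here is a small sketch of the same event logged both ways. It is illustrative Python, not the Logging Application Block or ETW; the event name, field names and both helper functions are assumptions invented for the example.

```python
import json
import time


def format_text_event(order_id, amount):
    """Traditional logging: the values are baked into a free-form string."""
    return f"Order {order_id} failed for amount {amount}"


def format_semantic_event(event_name, level, **payload):
    """Semantic logging: keep each value as a named, typed field."""
    event = {"event": event_name, "level": level, "timestamp": time.time()}
    event.update(payload)
    return json.dumps(event)


text_line = format_text_event(42, 19.99)
line = format_semantic_event("OrderFailed", "error", order_id=42, amount=19.99)

# A consumer recovers the fields directly, without writing a log parser.
record = json.loads(line)
```

With the text form, a consumer has to know the message template to get order_id back out. With the structured form, routing and filtering code can work on the fields themselves.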

Whatever you want to call it:

The technology to enable semantic logging in Windows has been around for a while (since Windows 2000). It’s called ETW – Event Tracing for Windows. It is a fast, scalable logging mechanism built into the Windows operating system itself. As Vance Morrison explains, “it is powerful because of three reasons:

1. The operating system comes pre-wired with a bunch of useful events.
2. It can capture stack traces along with the event, which is INCREDIBLY USEFUL.
3. It is extensible, which means that you can add your own information that is relevant to your code.”

ETW has been improved in .NET Framework 4.5 but I will leave you to Grigori’s post to ferret out those details.

Semantic logging is important for all the reasons mentioned in Grigori’s post and because captured semantics provide grist for semantic mapping mills.

Index your blog using tags and lucene.net

Sunday, August 26th, 2012

Index your blog using tags and lucene.net by Ricci Gian Maria.

In the last part of my series on Lucene I showed how simple it is to add tags to a document for a simple tag-based categorization. Now it is time to explain how you can automate this process and how to use some advanced characteristics of Lucene. First of all, I wrote a specialized analyzer called TagSnowballAnalyzer, based on the standard SnowballAnalyzer plus a series of keywords associated with various tags. Here is how I construct it.

Plenty of code around the net shows how to add synonyms with weights, as described in this stackoverflow question, and the standard Java Lucene codebase has a SynonymTokenFilter. This example shows how simple it is to write a filter that adds tags as synonyms of related words. First of all, the filter is initialized with a dictionary of keywords and Tags, where Tag is a simple helper class that stores the tag string and its relative weight. It also has a ConvertToToken() method that returns the tag enclosed by the | (pipe) character. The pipe character explicitly marks tags in the token stream: any word enclosed by pipes is by convention a tag.
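
The idea can be sketched outside Lucene in a few lines. This is plain Python, not the author’s C# filter or any Lucene.Net API: tag_filter and keyword_tags are hypothetical names, and the weight is stored on the tag but not used, since how the original filter applies it is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    """Helper holding a tag string and its relative weight."""
    name: str
    weight: float

    def convert_to_token(self):
        """Wrap the tag in pipes so it is recognisable in the token stream."""
        return f"|{self.name}|"


def tag_filter(tokens, keyword_tags):
    """Yield each token, followed by the tag token of any matching keyword."""
    for token in tokens:
        yield token
        tag = keyword_tags.get(token)
        if tag is not None:
            yield tag.convert_to_token()


# Hypothetical keyword-to-tag dictionary.
keyword_tags = {"lucene": Tag("search", 1.0), "nhibernate": Tag("orm", 0.5)}
stream = list(tag_filter(["index", "lucene", "documents"], keyword_tags))
```

Anything in the resulting stream that is wrapped in pipes is a tag, so a search for |search| finds every document that mentioned one of its keywords.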

Not the answer for every situation involving synonymy (as in “same subject,” i.e., topic maps) but certainly a useful one.

Lucene.Net becomes top-level project at Apache

Friday, August 17th, 2012

Lucene.Net becomes top-level project at Apache

Lucene.Net, the port of the Lucene search engine library to C# and .NET, has left the Apache incubator and is now a top-level project. The announcement on the project’s blog says that the Apache board voted unanimously to accept the graduation resolution. The vote confirms that Lucene.Net is healthy and that the development and governance of the project follows the tenets of the “Apache way”. The developers will now be moving the project’s resources from the current incubator site to the main apache.org site.

Various flavors of MS Windows account for 80% of all operating systems.

What is the target for your next topic map app? (With or without Lucene.Net.)

Mono integrates Entity Framework

Tuesday, August 14th, 2012

Mono integrates Entity Framework

The fourth preview release of version 2.11 of Mono, the open source implementation of Microsoft’s C# and .NET platform, is now available. Version 2.11.3 integrates Microsoft’s ADO.NET Entity Framework which was released as open source, under the Apache 2.0 licence, at the end of July. The Entity Framework is the company’s object-relational mapper (ORM) for the .NET Framework. This latest alpha version of Mono 2.11 has also been updated in order to match async support in .NET 4.5.

Just in case you are not familiar with the MS ADO.Net Entity Framework:

The ADO.NET Entity Framework enables developers to create data access applications by programming against a conceptual application model instead of programming directly against a relational storage schema. The goal is to decrease the amount of code and maintenance required for data-oriented applications. Entity Framework applications provide the following benefits:

• Applications can work in terms of a more application-centric conceptual model, including types with inheritance, complex members, and relationships.
• Applications are freed from hard-coded dependencies on a particular data engine or storage schema.
• Mappings between the conceptual model and the storage-specific schema can change without changing the application code.
• Developers can work with a consistent application object model that can be mapped to various storage schemas, possibly implemented in different database management systems.
• Multiple conceptual models can be mapped to a single storage schema.
• Language-integrated query (LINQ) support provides compile-time syntax validation for queries against a conceptual model.

Does the source code at Entity Framework at CodePlex need extension to:

• Discover when multiple conceptual models are mapped against a single storage schema?
• Discover when parts of conceptual models vary in name only? (to avoid duplication of models)
• Compare/contrast types with inheritance, complex members, and relationships?

If those sound like topic map type questions, they are.

There are always going to be subjects that need mappings to work with newer systems or different understandings of old ones.

Let’s stop pretending we are going to reach the promised land and keep our compasses close at hand.

SharePoint Module 3.2 HotFix 3 Now Available [Javascript bug]

Saturday, June 16th, 2012

SharePoint Module 3.2 HotFix 3 Now Available

A new hotfix package is available for version 3.2 of the TMCore SharePoint Module.

Systems Affected

This hotfix should be applied to any installation of the TMCore SharePoint Module 3.2 downloaded before 15th June 2012. If you downloaded your copy of the software from our site on or after this date, the hotfix is included in the package and you do not need to apply it again.

To determine if your system is affected, check the File Version property of the assembly NetworkedPlanet.SharePoint in the GAC (browse to C:\Windows\ASSEMBLY, locate the NetworkedPlanet.SharePoint assembly, right-click and choose Properties. The File Version can be found on the Version tab above Description and Copyright). This hotfix updates the File Version of the NetworkedPlanet.SharePoint assembly to 2.2.3.0 – if the file version shown is greater than or equal to 2.2.3.0, then you do not need to apply this hotfix.
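
If you have more than one server to check, the comparison is easy to script. A minimal sketch, assuming you have already read the File Version as a dotted string; parse_version and needs_hotfix are illustrative names, not part of the TMCore module.

```python
def parse_version(text):
    """Turn a dotted version such as '2.2.3.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def needs_hotfix(file_version, fixed_version="2.2.3.0"):
    """True when the installed File Version is older than the hotfix version."""
    return parse_version(file_version) < parse_version(fixed_version)
```

Comparing integer tuples rather than raw strings matters: as strings, "2.2.10.0" sorts before "2.2.3.0", which would wrongly flag a newer assembly.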

The change log reports:

BUGFIX: Hierarchy topic selector was experiencing a javascript error when topic names contained apostrophes

Starcounter To Be Fastest ACID Adherent NewSQL Database

Saturday, May 26th, 2012

Starcounter To Be Fastest ACID Adherent NewSQL Database by Sudheer Vatsavaya.

Starcounter said last week that its newly launched in-memory database can process millions of transactions per second on a single server. The database is built on its patent-pending VMDBMS technology, which combines the power of a virtual machine (VM) and a database management system (DBMS) to process data at the required volumes and speeds.

The company claims Starcounter is more than 100 times faster than traditional databases and 10 times faster than high-performance databases, making the new in-memory database ideal for highly transactional, large-scale and real-time applications. It can handle millions of users, integrate with applications to increase performance, and guarantee consistency by processing millions of ACID-compliant database transactions per second while managing up to a terabyte of updatable data on a single server.

A few things come out clearly in the design and ambition of the company. One is the belief that the way ahead is neither SQL nor NoSQL but NewSQL, which adheres to the ACID properties and at the same time scales to today’s data needs. This cannot be achieved in either of the former types of databases. While SQL databases cannot scale up to those needs, NoSQL databases are built around the CAP theorem, which says that one of three properties (availability, consistency or partition tolerance) has to be compromised.

Sounds interesting but runs on .Net.

I will have to rely on reports from others.

DensoDB Is Out

Sunday, April 22nd, 2012

DensoDB Is Out

DensoDB is a new NoSQL document database, written in C# for the .NET environment.

It’s simple, fast and reliable. More details on github https://github.com/teamdev/DensoDB

You can use it in three different ways:

1. InProcess: No need of service installation and communication protocol. The fastest way to use it. You have direct access to the DataBase memory and you can manipulate objects and data in a very fast way.
2. As a Service: Installed as a Windows Service, it can be used as a network document store. You can use a REST service or a WCF service to access it. It’s no different from the previous way of using it, but there is a networking protocol in between, so it’s not as fast as the previous one.
3. On a Mesh: mixing the previous two usage modes with a P2P mesh network, it can be easily synchronized with other mesh nodes. It gives you the power of a distributed, scalable, fast database, in a server or server-less environment.

You can use it as a database for a stand-alone application or in a mesh to share data in a social application. The P2P protocol for your application and the synchronization rules will be transparent to you, and you’ll be able to develop your whole application as if it were stand-alone and connected only to a local DB.

I don’t work in a .Net environment but am interested in experiences with .Net based P2P mesh networks and topic maps.

At some point I should set up a smallish Windows network with commodity boxes. Perhaps I could make all of them dual (or triple) boot so I could switch between distributed networks. If you have software or a box you would like to donate to the “cause” as it were, please be in touch.

Neo4jClient

Sunday, February 26th, 2012

Neo4jClient

A .NET client for the neo4j REST API. neo4j is an open sourced, Java based transactional graph database. It’s pretty awesome.

Neo4j in a .Net World

Sunday, February 26th, 2012

Neo4j in a .Net World

This month, Tatham Oddie will be coming from Australia to present at the Neo4j User Group on Neo4j with .NET, and will cover:

• the Neo4j client we have built for .NET
• hosting it all in Azure
• why our queries were 200ms slower in the cloud, and how we fixed it

Tatham will present a case study, explaining:

• what our project is
• why we chose a graph db
• how our first attempts at modelling were wrong
• what we’re doing now

Neo4jD–.NET client for Neo4j Graph DB

Sunday, February 5th, 2012

Neo4jD–.NET client for Neo4j Graph DB

Sony Arouje writes:

For the last couple of days I have been working on a small, lightweight .NET client for Neo4j. The client framework is still in progress. This post shows some of the existing APIs in Neo4jD for performing basic graph operations. In Neo4j the two main entities are Nodes and Relationships, so my initial focus for the client library is to deal with Node and Relationship. The client communicates with the Neo4j server through REST APIs, and the response from the server is in JSON format.

Let’s go through some of the Neo4j REST APIs and the equivalent APIs in Neo4jD. You can see more details of the Neo4j REST APIs here.

The table below shows how to call the Neo4j REST API directly from an application, and the right-hand side shows how to do the same operation using the Neo4jD client.
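
Since the table itself is not reproduced here, the following sketch shows roughly what calling the REST API directly looks like. It assumes the legacy Neo4j REST endpoints of that era (POST to /db/data/node and to /db/data/node/{id}/relationships) and a server on localhost:7474; check both against your server’s documentation. The helper names are hypothetical and none of this is Neo4jD code. The requests are only built, not sent.

```python
import json
from urllib.request import Request

# Assumption: a local Neo4j server exposing the legacy REST API.
BASE_URL = "http://localhost:7474/db/data"
JSON_HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}


def create_node_request(properties):
    """Build the POST that creates a node with the given properties."""
    body = json.dumps(properties).encode("utf-8")
    return Request(f"{BASE_URL}/node", data=body, method="POST", headers=JSON_HEADERS)


def create_relationship_request(from_id, to_id, rel_type):
    """Build the POST that links two existing nodes by id."""
    payload = {"to": f"{BASE_URL}/node/{to_id}", "type": rel_type}
    body = json.dumps(payload).encode("utf-8")
    url = f"{BASE_URL}/node/{from_id}/relationships"
    return Request(url, data=body, method="POST", headers=JSON_HEADERS)


node_req = create_node_request({"name": "Sony"})
rel_req = create_relationship_request(1, 2, "KNOWS")
```

Passing either request to urllib.request.urlopen would send it. A client library such as Neo4jD exists to hide exactly this URL and JSON bookkeeping behind Node and Relationship objects.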

Traversal support is next, and is said to use Gremlin at first.

If you are interested in promoting Neo4j in the .NET world, consider lending a hand.

How to create and search a Lucene.Net index…

Sunday, October 23rd, 2011

How to create and search a Lucene.Net index in 4 simple steps using C#, Step 1

As mentioned in a previous blog, using Lucene.Net to create and search an index was quick and easy. Here I will show you in these 4 steps how to do it.

• Create an index
• Build the query
• Perform the search
• Display the results
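
The four steps can be seen end to end in a toy in-memory inverted index. To be clear, this is not Lucene.Net and uses none of its classes; it is a Python sketch with made-up function names, meant only to show what each step is responsible for.

```python
from collections import defaultdict


def create_index(documents):
    """Step 1: map every lower-cased term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def build_query(text):
    """Step 2: normalise the user's input into a list of terms."""
    return text.lower().split()


def search(index, query_terms):
    """Step 3: return the ids of documents that contain every query term."""
    if not query_terms:
        return set()
    hits = [index.get(term, set()) for term in query_terms]
    return set.intersection(*hits)


def display(documents, doc_ids):
    """Step 4: render one line per hit, in a stable order."""
    return [f"{doc_id}: {documents[doc_id]}" for doc_id in sorted(doc_ids)]


documents = {
    1: "Lucene.Net index tutorial",
    2: "NHibernate search tutorial",
    3: "Topic maps",
}
index = create_index(documents)
results = display(documents, search(index, build_query("tutorial index")))
```

A real search library adds analysis (stemming, stop words), scoring and on-disk storage on top of this skeleton, but the division of labour between the four steps is the same.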

Before we get started I wanted to mention that Lucene.Net was originally designed for Java. Because of this I think the creators used some classes in Lucene that already exist in the .Net framework. Therefore, we need to use the entire path to the classes and methods instead of using a directive to shorten it for us.

Useful for anyone exploring topic maps as a native MS Windows application.

Nhibernate Search Tutorial with Lucene.Net and NHibernate 3.0

Monday, March 7th, 2011

Nhibernate Search Tutorial with Lucene.Net and NHibernate 3.0

Here’s another quickstart tutorial on NHibernate Search for NHibernate 3.0 using Lucene.Net. We’re going to be using Fluent NHibernate for NHibernate but attributes for NHibernate Search.

Uses NHibernate:

NHibernate is a mature, open source object-relational mapper for the .NET framework. It’s actively developed, fully featured and used in thousands of successful projects.

For those of you who are more comfortable in a .Net environment.

Machine Learning for .Net

Thursday, February 24th, 2011

Machine Learning for .Net

This library is designed to assist in the use of common Machine Learning Algorithms in conjunction with the .NET platform. It is designed to include the most popular supervised and unsupervised learning algorithms while minimizing the friction involved with creating the predictive models.

Supervised Learning

Supervised learning is an approach in machine learning where the system is provided with labeled examples of a problem and the computer creates a model to predict future unlabeled examples. These classifiers are further divided into the following sets:

• Binary Classification – Predicting a Yes/No type value
• Multi-Class Classification – Predicting a value from a finite set (e.g. {A, B, C, D} or {1, 2, 3, 4})
• Regression – Predicting a continuous value (e.g. a number)

Unsupervised Learning

Unsupervised learning is an approach which involves learning about the shape of unlabeled data. This library currently contains:

1. KMeans – Performs automatic grouping of data into K groups (specified a priori)

Labeling data is the same as for the supervised learning algorithms with the exception that these algorithms ignore the [Label] attribute:

    var kmeans = new KMeans();
    var grouping = kmeans.Generate(ListOfStudents, 2);

Here the KMeans algorithm is grouping the ListOfStudents into two groups returning an array corresponding to the appropriate group for each student (in this case group 0 or group 1)
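
For anyone curious what happens behind a call like that, here is a minimal k-means sketch. It is plain Python rather than the library’s implementation: the function names, the choice to seed the centroids with the first k points, and the fixed iteration count are all assumptions made to keep the example short and deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Group points into k clusters; return the cluster index of each point."""
    # Assumption: seed with the first k points (real code usually randomises).
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k), key=lambda c: squared_distance(p, centroids[c])
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


# Hypothetical student features, e.g. (hours studied, grade).
students = [(1.0, 1.0), (1.2, 0.8), (9.0, 9.0), (8.8, 9.2)]
grouping = kmeans(students, 2)
```

As with the library call, the result is a list with one group index per student. Which group ends up being called 0 and which 1 depends on the starting centroids, so only the grouping itself is meaningful.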

2. Hierarchical Clustering – In progress!
Planning

Currently planning/hoping to do the following:

1. Boosting/Bagging
2. Hierarchical Clustering
3. Naïve Bayes Classifier
4. Collaborative filtering algorithms (suggest a product, friend etc.)
5. Latent Semantic Analysis (for better searching of text etc.)
6. Support Vector Machines (more powerful classifier)
7. Principal Component Analysis – Aids in dimensionality reduction which should allow/facilitate learning from images
8. *Maybe* – Common AI algorithms such as A*, Beam Search, Minimax etc.

So, if you are working in a .Net context, here is a chance to get in on the ground floor of a machine learning project.

Encog Java and DotNet Neural Network Framework

Thursday, February 17th, 2011

Encog Java and DotNet Neural Network Framework

Encog is an advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks. Encog has been in active development since 2008.
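
Resilient propagation is worth a short illustration because it differs from plain gradient descent: it uses only the sign of the gradient and keeps a separate, adaptive step size per weight. The sketch below is a single-weight version of one common variant (often called iRprop-), written in Python with typical default constants. It is not Encog’s code, and the multithreading and GPU support are out of scope here.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rprop_minimize(gradient, w, iterations=300, step=0.1,
                   grow=1.2, shrink=0.5, step_min=1e-6, step_max=50.0):
    """Minimise a one-dimensional function given its gradient, Rprop-style."""
    prev_grad = 0.0
    for _ in range(iterations):
        grad = gradient(w)
        if grad * prev_grad > 0:
            # Same direction as last time: take a bolder step.
            step = min(step * grow, step_max)
        elif grad * prev_grad < 0:
            # Sign flipped, so we overshot: shrink the step and skip this update.
            step = max(step * shrink, step_min)
            grad = 0.0
        # Only the sign of the gradient is used, never its magnitude.
        w -= sign(grad) * step
        prev_grad = grad
    return w


# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
best_w = rprop_minimize(lambda w: 2 * (w - 3), 0.0)
```

In a neural network the same rule is applied independently to every weight, which is part of why the method parallelises well.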

Encog is available for Java, .Net and Silverlight.

An important project for at least two reasons.

First, the obvious applicability to the creation of topic maps using machine learning techniques.

Second, it demonstrates that supporting Java, .Net and Silverlight, isn’t, you know, all that weird.

The world is changing and becoming somewhat more interoperable.

Topic maps have a role to play in that process, both in terms of semantic interoperability of the infrastructure as well as the data it contains.