Archive for the ‘Business Intelligence’ Category

Contextifier: Automatic Generation of Annotated Stock Visualizations

Sunday, May 12th, 2013

Contextifier: Automatic Generation of Annotated Stock Visualizations by Jessica Hullman, Nicholas Diakopoulos and Eytan Adar.

Abstract:

Online news tools—for aggregation, summarization and automatic generation—are an area of fruitful development as reading news online becomes increasingly commonplace. While textual tools have dominated these developments, annotated information visualizations are a promising way to complement articles based on their ability to add context. But the manual effort required for professional designers to create thoughtful annotations for contextualizing news visualizations is difficult to scale. We describe the design of Contextifier, a novel system that automatically produces custom, annotated visualizations of stock behavior given a news article about a company. Contextifier’s algorithms for choosing annotations is informed by a study of professionally created visualizations and takes into account visual salience, contextual relevance, and a detection of key events in the company’s history. In evaluating our system we find that Contextifier better balances graphical salience and relevance than the baseline.

The authors use a stock graph as the primary context in which to link in other news about a publicly traded company.

Other aspects of Contextifier were focused on enhancement of that primary context.

The lesson here is that a tool with a purpose is easier to hone than a tool that could be anything for just about anybody.

I first saw this at Visualization Papers at CHI 2013 by Enrico Bertini.

Spreadsheet is Still the King of all Business Intelligence Tools

Thursday, April 11th, 2013

Spreadsheet is Still the King of all Business Intelligence Tools by Jim King.

From the post:

The technology consulting firm Gartner Group Inc. once precisely predicated that BI would be the hottest technology in 2012. The year of 2012 witnesses the sharp and substantial increase of BI. Unexpectedly, spreadsheet turns up to be the one developed and welcomed most, instead of the SAP BusinessObjects, IBM Cognos, QlikTech Qlikview, MicroStrateg, or TIBCO Spotfire. In facts, no matter it is in the aspect of total sales, customer base, or the increment, the spreadsheet is straight the top one.

Why the spreadsheet is still ruling the BI world?

See Jim’s post for the details but the bottom line was:

It is the low technical requirement, intuitive and flexible calculation capability, and business-expert-oriented easy solution to the 80% BI problems that makes the spreadsheet still rule the BI world.

Question:

How do you translate:

  • low technical requirement
  • intuitive and flexible calculation capacity (or its semantic equivalent)
  • business-expert-oriented solution to the 80% of BI problems

into a topic map application?

Should Business Data Have An Audit Trail?

Thursday, March 21st, 2013

The “second slide” I would lead with from Stuart Halloway’s Datomic, and How We Built It would be:

Should Business Data Have An Audit Trail?

Actually Stuart’s slide #65 but who’s counting? ;-)

Stuart points out the irony of git, saying:

developer data is important enough to have an audit trail, but business data is not

Whether business data should always have an audit trail would attract shouts of yes and no, depending on the audience.

Regulators, prosecutors, good government types, etc., mostly shouting yes.

Regulated businesses, security brokers, elected officials, etc., mostly shouting no.

Some in between.

Datomic, which has some common characteristics with topic maps, gives you the ability to answer these questions:

  • Do you want auditable business data or not?
  • If yes to auditable business data, to what degree?

Rather different than just assuming it isn’t possible.

Abstract:

Datomic is a database of flexible, time-based facts, supporting queries and joins, with elastic scalability and ACID transactions. Datomic queries run in your application process, giving you both declarative and navigational access to your data. Datomic facts (“datoms”) are time-aware and distributed to all system peers, enabling OLTP, analytics, and detailed auditing in real time from a single system.

In this talk, I will begin with an overview of Datomic, covering the problems that it is intended to solve and how its data model, transaction model, query model, and deployment model work together to solve those problems. I will then use Datomic to illustrate more general points about designing and implementing production software, and where I believe our industry is headed. Key points include:

  • the pragmatic adoption of functional programming
  • how dynamic languages fare in mission- and performance- critical settings
  • the importance of data, and the perils of OO
  • the irony of git, or why developers give themselves better databases than they give their customers
  • perception, coordination, and reducing the barriers to scale
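The idea of time-aware facts with an audit trail can be illustrated with a small append-only log. This is only a minimal sketch in Python, not Datomic's API or storage model: the names `Fact`, `FactLog`, `transact`, `as_of` and `history` are hypothetical, and the transaction counter stands in for real time.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Fact:
    """One immutable statement about an entity (hypothetical structure)."""
    entity: str
    attribute: str
    value: Any
    tx: int       # transaction number, used here as logical time
    added: bool   # True = assertion, False = retraction


class FactLog:
    """Append-only log: facts are never updated in place or deleted."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []
        self._tx = 0

    def transact(self, entity: str, attribute: str, value: Any,
                 added: bool = True) -> int:
        """Record a new fact and return its transaction number."""
        self._tx += 1
        self._facts.append(Fact(entity, attribute, value, self._tx, added))
        return self._tx

    def as_of(self, entity: str, attribute: str, tx: int) -> Optional[Any]:
        """Return the value that was current at transaction `tx`."""
        current = None
        for fact in self._facts:
            if fact.tx > tx:
                break
            if fact.entity == entity and fact.attribute == attribute:
                current = fact.value if fact.added else None
        return current

    def history(self, entity: str, attribute: str) -> List[Fact]:
        """Return every fact ever recorded for this attribute: the audit trail."""
        return [f for f in self._facts
                if f.entity == entity and f.attribute == attribute]


log = FactLog()
t1 = log.transact("invoice-17", "amount", 100)
t2 = log.transact("invoice-17", "amount", 250)  # a correction, not an overwrite
```

Because the correction is appended rather than written over the old value, `log.as_of("invoice-17", "amount", t1)` still answers with the earlier amount, and `log.history(...)` shows both entries. How far back to keep such history is the "to what degree" question.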

Resources

  • Video from CME Group Technology Conference 2012
  • Slides from CME Group Technology Conference 2012

REMOTE: Office Not Required

Sunday, February 17th, 2013

REMOTE: Office Not Required

From the post:

As an employer, restricting your hiring to a small geographic region means you’re not getting the best people you can. As an employee, restricting your job search to companies within a reasonable commute means you’re not working for the best company you can. REMOTE, the new book by 37signals, shows both employers and employees how they can work together, remotely, from any desk, in any space, in any place, anytime, anywhere.

REMOTE will be published in the fall of 2013 by Crown (Random House).

I was so impressed by Rework (see: Emulate Drug Dealers [Marketing Topic Maps]) that I am recommending REMOTE ahead of its publication.

Whether the lessons in REMOTE will be heard by most employers, or shall we say their managers, remains to be seen.

Perhaps performance in revenue and the stock market will be important clues. ;-)

How to Implement Lean BI

Tuesday, February 12th, 2013

How to Implement Lean BI by Steve Dine.

A followup to his Why Most BI Programs Under-Deliver Value.

General considerations:

Many people hear the word “Lean” and it conjures up images of featureless tools, limited budgets, reduced development and the elimination of jobs. Dispelling those myths out of the gate is crucial in order to garner support for implementing Lean BI from the organization and the BI team. If team members feel that by becoming lean they are working themselves out of a job then they will not support your efforts. If your customers feel that they will receive less service or be relegated to using suboptimal tools then they may not support your efforts as well.

So, what is Lean BI? Lean BI is about focusing on customer value and generating additional value by accomplishing more with existing resources by eliminating waste….

Some highlights:

  1. Focus on Customer Value

    Value is defined as meeting or exceeding the customer needs at a specific cost at a specific time and, as mentioned in my last article, can only be defined by the customer. Anything that consumes resources that does not deliver customer value is considered waste….

  2. See the Whole Picture

    Learn to see beyond each individual architectural decision, organizational issue or technical problem by considering how they relate in a wider context. When business users make decisions and solve problems, they often only consider the immediate symptom rather than the root cause issue….

  3. Iterate Quickly

    It is often the case that by the time a project is implemented, the requirements have changed and part of what is implemented is not required anymore or is no longer a priority. When features, reports and data elements are implemented that aren’t utilized, it is considered waste….

  4. Reduce Variation

    Variation in BI is caused by a lack of standardization in processes, design, procedures, development and practices. Variation is introduced when work is initiated and implemented both inside and outside of the BI group. It causes waste in a number of ways including the added time to reverse engineer what others have developed, recovering ETL jobs caused by maintenance overlap, the extra time searching for scripts and reports, and the duplication of development caused by two developers working on the same file….

  5. Pursue Perfection

    Perfection is a critical component of Lean BI even though the key to successfully pursuing it is the understanding that you will never get there. The key to pursuing perfection is to focus on continuous improvement in an increment fashion….

Read Steve’s post for more analysis and his suggestions on possible solutions to these issues.

From a topic map perspective:

  1. Focus on Customer Value: A topic map solution can focus on specifics that return ROI to the customer. If you don’t need or want particular forms of inferencing, they can be ignored.
  2. See the Whole Picture: A topic map can capture and preserve relationships between business processes. Particularly ones discovered in earlier projects. Enabling teams to make new mistakes, not simply repeat old ones.
  3. Iterate Quickly: With topic maps you aren’t bound to decisions made by projects such as SUMO or Cyc. Your changes and models are just that, yours. You don’t need anyone’s permission to make changes.
  4. Reduce Variation: Some variation can be reduced but other variation, between departments or locations may successfully resist change. Topic maps can document variation and provide mappings to get around resistance to eliminating variation.
  5. Pursue Perfection: Topic maps support incremental change by allowing you to choose how much change you can manage. Not to mention that systems can still appear to other users as though they are unchanged. Unseen change is the most acceptable form of change.

Highly recommend you read both of Steve’s posts.

Why Most BI Programs Under-Deliver Value

Sunday, February 10th, 2013

Why Most BI Programs Under-Deliver Value by Steve Dine.

From the post:

Business intelligence initiatives have been undertaken by organizations across the globe for more than 25 years, yet according to industry experts between 60 and 65 percent of BI projects and programs fail to deliver on the requirements of their customers.

This impact of this failure reaches far beyond the project investment, from unrealized revenue to increased operating costs. While the exact reasons for failure are often debated, most agree that a lack of business involvement, long delivery cycles and poor data quality lead the list. After all this time, why do organizations continue to struggle with delivering successful BI? The answer lies in the fact that they do a poor job at defining value to the customer and how that value will be delivered given the resource constraints and political complexities in nearly all organizations.

BI is widely considered an umbrella term for data integration, data warehousing, performance management, reporting and analytics. For the vast majority of BI projects, the road to value definition starts with a program or project charter, which is a document that defines the high level requirements and capital justification for the endeavor. In most cases, the capital justification centers on cost savings rather than value generation. This is due to the level of effort required to gather and integrate data across disparate source systems and user developed data stores.

As organizations mature, the number of applications that collect and store data increase. These systems usually contain few common unique identifiers to help identify related records and are often referred to as data silos. They also can capture overlapping data attributes for common organizational entities, such as product and customer. In addition, the data models of these systems are usually highly normalized, which can make them challenging to understand and difficult for data extraction. These factors make cost savings, in the form of reduced labor for data collection, easy targets. Unfortunately, most organizations don’t eliminate employees when a BI solution is implemented; they simply work on different, hopefully more value added, activities. From the start, the road to value is based on a flawed assumption and is destined to under deliver on its proposition.

This post merits a close read, several times.

In particular I like the focus on delivery of value to the customer.

Err, that would be the person paying you to do the work.

Steve promises a follow-up on “lean BI” that focuses on delivering more value than it costs to deliver.

I am inherently suspicious of “lean” or “agile” approaches. I sat on a committee that was assured by three programmers they had improved upon IBM’s programming methodology but declined to share the details.

Their requirements document for a content management system, to be constructed on top of subversion, was a paragraph in an email.

Fortunately the committee prevailed upon management to tank the project. The programmers persist, management being unable or unwilling to correct past mistakes.

I am sure there are many agile/lean programming projects that deliver well documented, high quality results.

But I don’t start with the assumption that agile/lean or other methodology projects are well documented.

That is a question of fact. One that can be answered.

Refusal to answer due to time or resource constraints is a very bad sign.

I first saw this in a top ten tweets list from KDNuggets.

“The treacherous are ever distrustful…” (Gandalf to Saruman at Orthanc)

Tuesday, October 9th, 2012

Andrew Gelman’s post: Ethical standards in different data communities reminded me of this quote from The Two Towers (The Lord of the Rings, Volume II, J.R.R. Tolkien).

Andrew reports on a widely repeated claim by a former associate of a habitual criminal offender enterprise that recent government statistics were “cooked” to help President Obama in his re-election campaign.

After examining motives for “cooking” data and actual instances of data being “cooked” (by the habitual criminal offender enterprise), Andrew remarks:

One reason this interests me is the connection to ethics in the scientific literature. Jack Welch has experience in data manipulation and so, when he sees a number he doesn’t like, he suspects it’s been manipulated.

The problem is that anyone searching for this accusation, or for further information about the former associate or the habitual criminal offender enterprise, is unlikely to encounter GE: Decades of Misdeeds and Wrongdoing.

Everywhere the GE stock ticker appears, there should be a link to: GE Corporate Criminal History. With links to the original documents, including pleas, fines, individuals, etc. Under whatever name or guise the activity was conducted.

This isn’t an anti-corruption rant. People in other criminal offender enterprises should be able to judge for themselves the trustworthiness of their individual counterparts in other enterprises.

Although, someone willing to cheat the government is certainly ready to cheat you.

Topic maps can deliver that level of transparency.

Or not, if you are the sort with a “cheating heart.”

First Party Fraud (In Four Parts)

Friday, September 14th, 2012

Mike Betron has written a four-part series on first party fraud that merits your attention:

First Party Fraud [Part 1]

What is First Party Fraud?

First-party fraud (FPF) is defined as when somebody enters into a relationship with a bank using either their own identity or a fictitious identity with the intent to defraud. First-party fraud is different from third-party fraud (also known as “identity fraud”) because in third-party fraud, the perpetrator uses another person’s identifying information (such as a social security number, address, phone number, etc.). FPF is often referred to as a “victimless” crime, because no consumers or individuals are directly affected. The real victim in FPF is the bank, which has to eat all of the financial losses.

First-Party Fraud: How Do We Assess and Stop the Damage? [Part 2]

Mike covers the cost of first party fraud and then why it is so hard to combat.

Why is it so hard to detect FPF?

Given the amount of financial pain incurred by bust-out fraud, you might wonder why banks haven’t developed a solution and process for detecting and stopping it.

There are three primary reasons why first-party fraud is so hard to identify and block:

1) The fraudsters look like normal customers

2) The crime festers in multiple departments

3) The speed of execution is very fast

Fighting First Party Fraud With Social Link Analysis (3 of 4)

And you know, those pesky criminals won’t use their universally assigned identifiers for financial transactions. (Any security system that relies on good faith isn’t a security system, it’s an opportunity.)

A Trail of Clues Left by Criminals

Although organized fraudsters are sophisticated, they often leave behind evidence that can be used to uncover networks of organized crime. Fraudsters know that due to Know Your Customer (KYC) and Customer Due Diligence (CDD) regulations, their identification will be verified when they open an account with a financial institution. To pass these checks, the criminals will either modify their own identity slightly or else create a synthetic identity, which consists of combining real identity information (e.g., a social security number) with fake identity information (names, addresses, phone numbers, etc.).

Fortunately for banks, false identity information can be expensive and inconvenient to acquire and maintain. For example, apartments must be rented out to maintain a valid address. Additionally, there are only so many cell phones a person can carry at one time and only so many aliases that can be remembered. Because of this, fraudsters recycle bits and pieces of these valuable assets.

The reuse of identity information has inspired Infoglide to begin to create new technology on top of its IRE platform called Social Link Analysis (SLA). SLA works by examining the “linkages” between the recycled identities, therefore identifying potential fraud networks. Once the networks are detected, Infoglide SLA applies advanced analytics to determine the risk level for both the network and for every individual associated with that network.
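The linking step described above, grouping identities that recycle the same phone number or address, can be sketched as finding connected components. This is a minimal Python illustration using union-find, not Infoglide's SLA implementation: the function `link_networks`, the field names and the sample records are all hypothetical, and no risk scoring is attempted.

```python
from collections import defaultdict


def link_networks(records, keys=("phone", "address", "ssn")):
    """Group record ids into networks when they share any attribute value."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent pointers to the network representative,
        # shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first record seen with each (attribute, value) pair;
    # any later record with the same pair joins that record's network.
    first_seen = {}
    for r in records:
        for key in keys:
            value = r.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(first_seen[token], r["id"])
            else:
                first_seen[token] = r["id"]

    networks = defaultdict(set)
    for r in records:
        networks[find(r["id"])].add(r["id"])
    # Largest networks first, since those are the ones worth a closer look.
    return sorted((sorted(members) for members in networks.values()),
                  key=lambda members: (-len(members), members))


# Hypothetical sample data: A and B share a phone, B and C share an address.
records = [
    {"id": "A", "phone": "555-0100", "address": "1 Elm St"},
    {"id": "B", "phone": "555-0100", "address": "9 Oak Ave"},
    {"id": "C", "phone": "555-0199", "address": "9 Oak Ave"},
    {"id": "D", "phone": "555-0142", "address": "4 Pine Rd"},
]
networks = link_networks(records)
```

Here A, B and C end up in one network even though A and C share nothing directly. Note that the sketch only reports shared attributes; a shared phone number is a reason to look more closely, not a finding of fraud.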

First Party Fraud (post 4 of 4) – A Use Case

As discussed in our previous blog in this series, Social Link Analysis works by identifying linkages between individuals to create a social network. Social Link Analysis can then analyze the network to identify organized crime, such as bust-out fraud and internal collusion.

During the Social Link Analysis process, every individual is connected to a single network. An analysis at a large tier 1 bank will turn up millions of networks, but the majority of individuals only belong to very small networks (such as a husband and wife, and possibly a child). However, the social linking process will certainly turn up a small percentage of larger networks of interconnected individuals. It is in these larger networks where participants of bust-out fraud are hiding.

Due to the massive number of networks within a system, the analysis is performed mathematically (e.g. without user interface) and scores and alerts are generated. However, any network can be “visualized” using the software to create a graphic display of information and connections. In this example, we’ll look at a visualization of a small network that the social link analysis tool has alerted as a possible fraud ring.

A word of caution.

To leap from the example individuals being related to each other to:

As a result, Social Link Analysis has detected four members of a network, each with various amounts of charged-off fraud.

Is quite a leap.

Having charged off loans, with re-use of telephone numbers and a mobile population, doesn’t necessarily mean anyone is guilty of “charged-off fraud.”

Could be, but you should tread carefully and with legal advice before jumping to conclusions of fraud.

For good customer relations, if not avoiding bad PR and legal liability.

PS: Topic maps can help with this type of data. Including mapping in the bank locations or even personnel who accepted particular loans.

Are You An IT Hostage?

Monday, August 13th, 2012

As I promised last week in From Overload to Impact: An Industry Scorecard on Big Data Business Challenges [Oracle Report], the key finding that is missing from Oracle’s summary:

Executives’ Biggest Data Management Gripes:*

#1 Don’t have the right systems in place to gather the information we need (38%)

#2 Can’t give our business managers access to the information they need; need to rely on IT (36%)

Ask your business managers: Do they feel like IT hostages?

You are likely to be surprised at the answers you get.

IT’s vocabulary acts as an information clog.

A clog that impedes the flow of information in your organization.

Information that can improve the speed and quality of business decision making.

The critical point is: Information clogs are bad for business.

Do you want to borrow my plunger?

From Overload to Impact: An Industry Scorecard on Big Data Business Challenges [Oracle Report]

Friday, August 10th, 2012

From Overload to Impact: An Industry Scorecard on Big Data Business Challenges [Oracle Report]

Summary:

IT powers today’s enterprises, which is particularly true for the world’s most data-intensive industries. Organizations in these highly specialized industries increasingly require focused IT solutions, including those developed specifically for their industry, to meet their most pressing business challenges, manage and extract insight from ever-growing data volumes, improve customer service, and, most importantly, capitalize on new business opportunities.

The need for better data management is all too acute, but how are enterprises doing? Oracle surveyed 333 C-level executives from U.S. and Canadian enterprises spanning 11 industries to determine the pain points they face regarding managing the deluge of data coming into their organizations and how well they are able to use information to drive profit and growth.

Key Findings:

  • 94% of C-level executives say their organization is collecting and managing more business information today than two years ago, by an average of 86% more
  • 29% of executives give their organization a “D” or “F” in preparedness to manage the data deluge
  • 93% of executives believe their organization is losing revenue – on average, 14% annually – as a result of not being able to fully leverage the information they collect
  • Nearly all surveyed (97%) say their organization must make a change to improve information optimization over the next two years
  • Industry-specific applications are an important part of the mix; 77% of organizations surveyed use them today to run their enterprise—and they are looking for more tailored options

What key finding did they miss?

They cover it in the forty-two (42) page report but it doesn’t appear here.

Care to guess what it is?

Forgotten key finding post coming Monday, 13 August 2012. Watch for it!

I first saw this at Beyond Search.

Building a Simple BI Solution in Excel 2013 (Part 1 & 2)

Wednesday, July 18th, 2012

Chris Webb writes up a quick BI solution in Excel 2013:

Building a Simple BI Solution in Excel 2013, Part 1

and

Building a Simple BI Solution in Excel 2013, Part 2

In the process Chris uncovers some bugs and disappointments, but on the whole the application works.

I mention it for a couple of reasons.

If you recall, something like 75% of the BI market is held by Excel. I don’t expect that to change any time soon.

What do you think happens when “self-service” BI applications are created by users? Other than becoming the default applications for offices and groups in organizations?

Will different users make different choices with their Excel BI applications?

Will users with different Excel BI applications resort to knives, if not guns, to avoid changing their Excel BI applications?

Excel in its many versions leads to varying and inconsistent “self-service” applications in 75% of the BI marketplace.

Is it just me or does that sound like an opportunity for topic maps to you?

Subverting Ossified Departments [Moving beyond name calling]

Saturday, July 7th, 2012

Brian Sommer has written on why analytics will not lead to new revenue streams, improved customer service, better stock options or other signs of salvation:

The Ossified Organization Won’t ‘Get’ Analytics (part 1 of 3)

How Tough Will Analytics Be in Ossified Firms? (Part 2 of 3)

Analytics and the Nimble Organization (part 3 of 3)

Why most firms won’t profit from analytics:

… Every day, companies already get thousands of ideas for new products, process innovations, customer interaction improvements, etc. and they fail to act on them. The rationale for this lack of movement can be:

- That’s not the way we do things here

- It’s a good idea but it’s just not us

- It’s too big of an idea

- It will be too disruptive

- We’d have to change so many things

- I don’t know who would be responsible for such a change

And, of course,

- It’s not my job

So if companies don’t act on the numerous, free suggestions from current customers and suppliers, why are they so deluded into thinking that IT-generated, analytic insights will actually fare better? They’re kidding themselves.

[part 1]

What Brian describes in amusing and great detail are all failures that no amount of IT, analytics or otherwise, can address. Not a technology problem. Not even an organization (as in form) issue.

It is a personnel issue. You can either retrain (which I find unlikely to succeed) or you can get new personnel. It really is that simple. And with a glutted IT market, now would be the time to recruit an IT department not wedded to current practices. But you would need to do the same in accounting, marketing, management, etc.

But calling a department “ossified” is just name calling. You have to move beyond name calling to establish a bottom line reason for change.

Assuming you have access, topic maps can help you integrate data across departments that don’t usually interchange data. So you can make the case for particular changes in terms of bottom line expenses.

Here is a true story with the names omitted and the context changed a bit:

Assume you are a publisher of journals, with both institutional and personal subscriptions. One of the things that all periodical publishers have to address is claims for “missing” issues. It happens: mail room mistakes, postal system errors, issues simply lost in transit, etc. Subscribers send in claims for those missing issues.

Some publishers maintain records of all subscriptions, including any correspondence and records, which are consulted by some full time staffer who answers all “claim” requests. One argument being there is a moral obligation to make sure non-subscribers don’t get an issue to which they are not entitled. Seriously, I have heard that argument made.

Analytics and topic maps could combine the subscription records with claim records and expenses for running the claims operation to show the expense of detailed claim service. Versus the cost of having the mail room toss another copy back to the requester. (Our printing cost was $3.00/copy so the math wasn’t the hard part.)
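The comparison can be laid out as simple arithmetic. In this Python sketch only the $3.00 printing cost comes from the story; the claim volume, the number of rejected claims and the staff cost are hypothetical figures chosen for illustration.

```python
PRINT_COST_PER_COPY = 3.00  # from the story above


def cost_detailed_service(claims: int, rejected_claims: int,
                          staff_cost_per_year: float) -> float:
    """Yearly cost when a staffer verifies each claim before a copy is sent."""
    honored = claims - rejected_claims
    return staff_cost_per_year + honored * PRINT_COST_PER_COPY


def cost_resend_everything(claims: int) -> float:
    """Yearly cost when the mail room simply sends another copy for every claim."""
    return claims * PRINT_COST_PER_COPY


# Hypothetical inputs, not figures from the original story.
claims_per_year = 2_000
rejected_per_year = 100
staff_cost_per_year = 40_000.00

detailed = cost_detailed_service(claims_per_year, rejected_per_year,
                                 staff_cost_per_year)
resend = cost_resend_everything(claims_per_year)
savings = detailed - resend

print(f"Detailed claim service: ${detailed:,.2f}")
print(f"Resend every claim:     ${resend:,.2f}")
print(f"Difference:             ${savings:,.2f}")
```

Under these assumptions, verification only pays for itself if the staff cost is less than the printing cost of the copies it refuses to send, which here is 100 copies at $3.00 each.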

Topic maps help integrate the data you “obtain” from other departments. Just enough to make your point. Don’t have to integrate all the data, just enough to win the argument. Until the next argument comes along and you take a bit bigger bite of the apple.

Agile organizations are run by people agile enough to take control of them.

You can wait for permission from an ossified organization or you can use topic maps to take the first “bite.”

Your move.

PS: If you have investments in journal publishing you might want to check on claims handling.

50 Open Source Replacements for Proprietary Business Intelligence Software

Saturday, June 30th, 2012

50 Open Source Replacements for Proprietary Business Intelligence Software by Cynthia Harvey.

From the post:

In a recent Gartner survey, CIOs picked business intelligence and analytics as their top technology priority for 2012. The market research firm predicts that enterprises will spend more than $12 billion on business intelligence (BI), analytics and performance management software this year alone.

As the market for business intelligence solutions continues to grow, the open source community is responding with a growing number of applications designed to help companies store and analyze key business data. In fact, many of the best tools in the field are available under an open source license. And enterprises that need commercial support or other services will find many options available.

This month, we’ve put together a list of 50 of the top open source business intelligence tools that can replace proprietary solutions. It includes complete business intelligence platforms, data warehouses and databases, data mining and reporting tools, ERP suites with built-in BI capabilities and even spreadsheets. If we’ve overlooked any tools that you feel should be on the list, please feel free to note them in the comments section below.

A very useful listing of “replacements” for proprietary software in part because it includes links to the software to be replaced.

You will find it helpful in identifying software packages with common goals but diverse outputs, grist for topic map mills.

I tried to find a one-page display (print usually works) but you will have to endure the advertising clutter to see the listing.

PS: Remember that MS Excel holds seventy-five percent (75%) of the BI market. Improve upon/use an MS Excel result, and you are closer to a commercially viable product. (BI’s Dirty Secrets – Why Business People are Addicted to Spreadsheets)

Business Intelligence and Reporting Tools (BIRT)

Friday, June 22nd, 2012

Business Intelligence and Reporting Tools (BIRT)

From the homepage:

BIRT is an open source Eclipse-based reporting system that integrates with your Java/Java EE application to produce compelling reports.

Being reminded by the introduction that reports can consist of lists, charts, crosstabs, letters & documents, compound reports, I was encouraged to see:

BIRT reports consist of four main parts: data, data transforms, business logic and presentation.

  • Data – Databases, web services, Java objects all can supply data to your BIRT report. BIRT provides JDBC, XML, Web Services, and Flat File support, as well as support for using code to get at other sources of data. BIRT’s use of the Open Data Access (ODA) framework allows anyone to build new UI and runtime support for any kind of tabular data. Further, a single report can include data from any number of data sources. BIRT also supplies a feature that allows disparate data sources to be combined using inner and outer joins.
  • Data Transforms – Reports present data sorted, summarized, filtered and grouped to fit the user’s needs. While databases can do some of this work, BIRT must do it for “simple” data sources such as flat files or Java objects. BIRT allows sophisticated operations such as grouping on sums, percentages of overall totals and more.
  • Business Logic – Real-world data is seldom structured exactly as you’d like for a report. Many reports require business-specific logic to convert raw data into information useful for the user. If the logic is just for the report, you can script it using BIRT’s JavaScript support. If your application already contains the logic, you can call into your existing Java code.
  • Presentation – Once the data is ready, you have a wide range of options for presenting it to the user. Tables, charts, text and more. A single data set can appear in multiple ways, and a single report can present data from multiple data sets.
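The four parts listed above can be seen in miniature in a few lines of Python. This is only a sketch of the data, transform, business logic and presentation stages, not BIRT's API or its report design format: the sample rows, the `build_report` and `render` functions, and the 50% "key region" rule are hypothetical.

```python
from collections import defaultdict

# Data: two hypothetical sources, e.g. a flat file of orders and a customer table.
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 200.0},
]
customers = {
    1: {"name": "Acme", "region": "East"},
    2: {"name": "Globex", "region": "West"},
}


def build_report(orders, customers):
    # Data: inner join of the two sources on customer_id.
    joined = [{**order, **customers[order["customer_id"]]}
              for order in orders if order["customer_id"] in customers]

    # Data transforms: group by region, sum, and compute share of overall total.
    totals = defaultdict(float)
    for row in joined:
        totals[row["region"]] += row["amount"]
    grand_total = sum(totals.values())

    rows = []
    for region, total in sorted(totals.items()):
        share = total / grand_total if grand_total else 0.0
        # Business logic: a hypothetical rule flagging dominant regions.
        rows.append({"region": region, "total": total,
                     "share": share, "key_region": share >= 0.5})
    return rows


def render(rows):
    # Presentation: one of many possible views of the same rows.
    lines = [f"{'Region':<8}{'Total':>10}{'Share':>8}"]
    for r in rows:
        lines.append(f"{r['region']:<8}{r['total']:>10.2f}{r['share']:>8.1%}")
    return "\n".join(lines)


rows = build_report(orders, customers)
print(render(rows))
```

Keeping the stages separate is what lets the same `rows` feed a table, a chart or another report without redoing the join or the grouping.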

I was clued into BIRT by Actuate, so you might want to pay them a visit as well.

Anytime you are manipulating data, for analysis or reporting, you are working with subjects.

Topic maps are a natural for planning or documenting your transformations or reports.

Or let me put it this way: Do you really want to hunt down what you think you did six months ago for the last report? And then spend a day or two in frantic activity correcting what you mis-remember? There are other options. Your choice.

Working with Graph Data from Neo4j in QlikView

Thursday, June 21st, 2012

Working with Graph Data from Neo4j in QlikView

From the post:

There are numerous examples of problems which can be handled efficiently by graph databases. A graph is made up of nodes and relationships between nodes (or vertices and edges): http://en.wikipedia.org/wiki/Graph_database

Now we can use graph data in a business intelligence / business discovery solution like QlikView to do some more business related analytics.

Neo4j is a high-performance, NoSQL graph database with all the features of a mature and robust database. It is an open source project supported by Neo Technology, implemented in Java. You can read more about it here: http://neo4j.org

Here are some slides about graph problems and use cases for Neo4j: http://www.slideshare.net/peterneubauer/neo4j-5-cool-graph-examples-4473985

Since the Neo4j JDBC driver is available we can use the QlikView JDBC Connector from TIQ Solutions and Cypher – a declarative graph query language – for expressive and efficient querying of the graph data. Take a look into the Cypher documentation to understand the syntax of this human query language, because it is totally different from SQL: http://docs.neo4j.org/chunked/1.7/cypher-query-lang.html

A very interesting presentation and it only requires your time to see the potential for benefits (or not) from using Neo4j and QlikView. (Some people who try the software may not benefit at all. It is as important to identify them quickly as it is those who will greatly benefit from it.)

It fails to make the case for business analytics, mostly because it doesn’t frame a business analytics problem and then solve it. Sorting movies by the average age of actors could be an answer to a BI question, but which question isn’t readily apparent.
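For readers who want to see what "sorting movies by the average age of actors" amounts to, here is a minimal sketch over a tiny in-memory graph. It uses neither Neo4j, Cypher nor QlikView; the actors, ages, movie titles and function names are invented for illustration.

```python
from statistics import mean

# Hypothetical graph: actor nodes with an age property, plus ACTS_IN edges.
actors = {"alice": 34, "bob": 58, "carol": 41}
acts_in = [
    ("alice", "Movie A"),
    ("bob", "Movie A"),
    ("bob", "Movie B"),
    ("carol", "Movie B"),
    ("alice", "Movie C"),
]


def average_cast_age(actors, acts_in):
    """Follow each ACTS_IN edge and average the actor ages per movie."""
    ages_by_movie = {}
    for actor, movie in acts_in:
        ages_by_movie.setdefault(movie, []).append(actors[actor])
    return {movie: mean(ages) for movie, ages in ages_by_movie.items()}


def movies_by_average_age(actors, acts_in):
    """Return (movie, average age) pairs, youngest cast first."""
    averages = average_cast_age(actors, acts_in)
    return sorted(averages.items(), key=lambda item: item[1])


ranking = movies_by_average_age(actors, acts_in)
```

The computation is a few lines. What the sketch cannot supply, any more than the presentation does, is the business question the ranking is supposed to answer.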

Forecasting: principles and practice

Wednesday, May 23rd, 2012

Forecasting: principles and practice: An online textbook by Rob J Hyndman and George Athanasopoulos.

From the preface:

Welcome to our new online textbook on forecasting. This book is intended as a replacement for Makridakis, Wheelwright and Hyndman (Wiley 1998).

The entire book is available online and free-of-charge. Of course, we won’t make much money doing this, but textbooks never make much money anyway — the publishers make all the money. We’d rather create something that is widely used and useful, than have large publishers profit from our efforts.

Eventually a print version of the book will be available to purchase on Amazon, but not until a few more chapters are written.

This textbook is intended to provide a comprehensive introduction to forecasting methods and present enough information about each method for readers to use them sensibly. We don’t attempt to give a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter will fill in many of those details.

The book is written for three audiences: (1) people finding themselves doing forecasting in business when they may not have had any formal training in the area; (2) undergraduate students studying business; (3) MBA students doing a forecasting elective. We use it ourselves for a second-year subject for students undertaking a Bachelor of Commerce degree at Monash University, Australia.

Should be a useful resource for learning the forecasting “lingo” in a business context. Or for learning forecasting for that matter.

The middle chapters on regression, as the authors point out, are unfinished, but they hope to have the book complete by the end of 2012.

It could be a really nice gesture on our part if we all read a chapter or so and suggested corrections or improvements to the prose.

A Look At Google BigQuery

Monday, May 21st, 2012

A Look At Google BigQuery

Chris Webb writes:

Over the years I’ve written quite a few posts about Google’s BI capabilities. Google never seems to get mentioned much as a BI tools vendor but to me it’s clear that it’s doing a lot in this area and is consciously building up its capabilities; you only need to look at things like Fusion Tables (check out these recently-added features), Google Refine and of course Google Docs to see that it’s pursuing a self-service, information-worker-led vision of BI that’s very similar to the one that Microsoft is pursuing with PowerPivot and Data Explorer.

Earlier this month Google announced the launch of BigQuery and I decided to take a look. Why would a Microsoft BI loyalist like me want to do this, you ask? Well, there are a number of reasons:

Looks like an even handed report to me.

See what you think about it and BigQuery.

Self-Service BI Mapping with Microsoft Research’s Layerscape–Part 1

Tuesday, April 17th, 2012

Self-Service BI Mapping with Microsoft Research’s Layerscape–Part 1 by Chris Webb.

From the post:

Sometimes you find a tool that is so cool, you can’t believe no-one else has picked up on it before. This is one of those times: a few month or so ago I came across a new tool called Layerscape (http://www.layerscape.org) from Microsoft Research which allows you to overlay data from Excel onto maps in Microsoft WorldWide Telescope (http://www.worldwidetelescope.org). “What is WorldWide Telescope?” I hear you ask – well, it’s basically Microsoft Research’s answer to Google Earth, although it’s not limited to the Earth in that it also contains images of the universe from a wide range of ground and space-based telescopes. It’s a pretty cool toy in its own right, but Layerscape – which seems to be aimed at academics, despite the obvious business uses – turns it into a pretty amazing BI visualisation tool.

Layerscape is very easy to use: it’s an Excel addin, and once you have it and WWT installed all you need to do is select a range of data in Excel to be able to visualise it in WWT. For some cool examples of what it can do, take a look at the videos posted on the Layerscape website like this one (Silverlight required): http://www.layerscape.org/Content/Index/384

Looks like I am going to be putting the latest version of Windows and Office on my Linux box.

Will applications like Layerscape raise the bar for BI products generally? Products for the intelligence community?


Update: Self-Service BI Mapping with Microsoft Research’s Layerscape–Part 2

Chris plots the weather data from the earlier post onto a map. I think it looks pretty good. I am curious whether the 150,000 row limit Chris mentions is in Layerscape or in his hardware.

I may have to beef up the RAM in my Ubuntu box for the Windows/Office combination.

Combining Heterogeneous Classifiers for Relational Databases (Of Relational Prisons and such)

Sunday, January 22nd, 2012

Combining Heterogeneous Classifiers for Relational Databases by Geetha Manjunatha, M Narasimha Murty and Dinkar Sitaram.

Abstract:

Most enterprise data is distributed in multiple relational databases with expert-designed schema. Using traditional single-table machine learning techniques over such data not only incur a computational penalty for converting to a ‘flat’ form (mega-join), even the human-specified semantic information present in the relations is lost. In this paper, we present a practical, two-phase hierarchical meta-classification algorithm for relational databases with a semantic divide and conquer approach. We propose a recursive, prediction aggregation technique over heterogeneous classifiers applied on individual database tables. The proposed algorithm was evaluated on three diverse datasets, namely TPCH, PKDD and UCI benchmarks and showed considerable reduction in classification time without any loss of prediction accuracy.

When I read:

So, a typical enterprise dataset resides in such expert-designed multiple relational database tables. On the other hand, as known, most traditional classification algorithms still assume that the input dataset is available in a single table – a flat representation of data attributes. So, for applying these state-of-art single-table data mining techniques to enterprise data, one needs to convert the distributed relational data into a flat form.

a couple of things dropped into place.

First, the problem being described, the production of a flat form for analysis, reminds me of the problem of record linkage in the late 1950s (predating relational databases). There, records were regularized to enable very similar analysis.

Second, as the authors state in a paragraph or so, conversion to such a format is not possible in most cases. Interesting that the choice of relational database table design has the impact of limiting the type of analysis that can be performed on the data.

Therefore, knowledge mining over real enterprise data using machine learning techniques is very valuable for what is called an intelligent enterprise. However, application of state-of-art pattern recognition techniques in the mainstream BI has not yet taken off [Gartner report] due to lack of in-memory analytics among others. The key hurdle to make this possible is the incompatibility between the input data formats used by most machine learning techniques and the formats used by real enterprises.
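The flattening step the authors describe (the "mega-join") can be sketched in a few lines. This is only an illustration of the conversion to a single table, not the paper's algorithm; the two-table schema, the values and the function name are hypothetical.

```python
# Hypothetical two-table schema; all names and values are invented.
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "wholesale"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 120.0},
    {"order_id": 11, "customer_id": 1, "amount": 80.0},
    {"order_id": 12, "customer_id": 2, "amount": 50.0},
]


def mega_join(customers, orders):
    """Flatten both tables into one row per order, repeating the customer columns."""
    by_id = {c["customer_id"]: c for c in customers}
    flat = []
    for order in orders:
        row = dict(by_id[order["customer_id"]])  # copy the customer columns
        row.update(order)                        # add the order columns
        flat.append(row)
    return flat


flat = mega_join(customers, orders)
```

Customer 1 now appears once per order, and nothing in the flat rows says that "customer" was ever an entity of its own. That duplication, and the loss of the schema's meaning, is the cost the abstract points to.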

If freeing data from its relational prison is a key aspect to empowering business intelligence (BI), what would you suggest as a solution?

BI’s Dirty Secrets – Why Business People are Addicted to Spreadsheets

Wednesday, January 18th, 2012

BI’s Dirty Secrets – Why Business People are Addicted to Spreadsheets by Rick Sherman.

Microsoft Excel spreadsheets are the top BI tool of choice. That choking sound you hear is vendors and IT people reacting viscerally when they confront this fact. Their responses include:

  • Business people are averse to change; they don’t want to invest time in learning a new tool
  • Business people don’t understand that BI tools such as dashboards are more powerful than spreadsheets; they’re foolish not to use them
  • Spreadsheets are filled with errors
  • Spreadsheets are from hell

IDC estimated that the worldwide spend on business analytics in 2011 was $90 billion. Studies have found that many firms have more than one BI tool in use, and often more than six BI tools. Yet a recent study found that enterprises have been “stuck” at about a 25% adoption rate of BI tools by business people for a few years.

So why have adoption rates flatlined in enterprises that have had these tools for a while? Are the pundits correct in saying that business people are averse to change, lazy or just ignorant of how wonderful BI tools are?

The answers are very different if you put yourself in the business person’s position.

Read Rick’s blog to see what business people think about changing from spreadsheets.

Have you ever heard the saying: If you can’t lick ‘em, join ‘em?

There have been a number of presentations/papers on going from spreadsheets to XTM topic maps.

I don’t recall any papers that address adding topic map capabilities to spreadsheets. Do you?

Seems to me the question is:

Should topic maps try for a percentage of the 25% slice of the BI pie (against other competing tools) or try for a percentage of the 75% of the BI pie owned by spreadsheets?

To avoid the dreaded pie chart, I made images of the respective market shares, one three times the size of the other:

BI Market Shares

Question: If you could only have 3% of a market, which market would you pick?*

See, you are on your way to being a topic map maven and a successful entrepreneur.


* Any resemblance to a question on any MBA exam is purely coincidental.
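The arithmetic behind the question, as a minimal sketch. It assumes, as the post does, that the 25% adoption figure can be read as a 25/75 split of one market; the variable and function names are made up for the example.

```python
from fractions import Fraction

# Shares taken from the post: BI tools at about 25% adoption,
# spreadsheets holding the remaining 75%.
bi_tools_share = Fraction(25, 100)
spreadsheet_share = Fraction(75, 100)
slice_won = Fraction(3, 100)  # the 3% from the question


def share_of_whole(segment_share, slice_of_segment):
    """Fraction of the whole market captured by winning a slice of one segment."""
    return segment_share * slice_of_segment


from_bi_tools = share_of_whole(bi_tools_share, slice_won)        # 3/400, i.e. 0.75%
from_spreadsheets = share_of_whole(spreadsheet_share, slice_won)  # 9/400, i.e. 2.25%
ratio = from_spreadsheets / from_bi_tools                         # 3
```

Under that assumption, 3% of the spreadsheet side is worth three times as much as 3% of the BI tool side.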

B2B Blog Strategy | Ten Be’s of The Best B2B Blogs

Wednesday, November 9th, 2011

B2B Blog Strategy | Ten Be’s of The Best B2B Blogs

Joel York writes:

Blogging is one of the easiest, cheapest and most effective ways to engage the New Breed of B2B Buyer, yet so many B2B blogs miss the mark. Here are ten “be’s” of the best b2b blogs. It isn’t the first top ten list of best B2B blog secrets, and no doubt it will not be the last. But, it is mine and it’s what I personally strive for Chaotic Flow to be.

Joel’s advice will work for topic map blogs as well.

People are not going to find out about topic maps unless we continue to push information about topic maps out into the infosphere. Blogging is one aspect of pushing information. Tweeting is another. Publication of white papers, software and other materials is another.

The need for auditable, repeatable, reliable consolidation (if you don’t like the merging word) of information from different sources is only growing with the availability of more data on the Internet. I think topic maps have a role to play there. Do you?

Connecting the Dots: An Introduction

Wednesday, November 9th, 2011

Connecting the Dots: An Introduction

A new series of posts by Rick Sherman who writes:

In the real world the situations I discuss or encounter in enterprise BI, data warehousing and MDM implementations lead me to the conclusion that many enterprises simply do not connect the dots. These implementations potentially involve various disciplines such as data modeling, business and data requirements gathering, data profiling, data integration, data architecture, technical architecture, BI design, data governance, master data management (MDM) and predictive analytics. Although many BI project teams have experience in each of these disciplines they’re not applying the knowledge from one discipline to another.

The result is knowledge silos where the the best practices and experience from one discipline is not applied in the other disciplines.

The impact is a loss in productivity for all, higher long-term costs and poorly constructed solutions. This often results in solutions that are difficult to change as the business changes, don’t scale as the data volumes or numbers of uses increase, or is costly to maintain and operate.

Imagine that, knowledge silos in the practice of eliminating knowledge silos.

I suspect that reflects the reality that each of us is a model of a knowledge silo. There are areas we like better than others, areas we know better than others, areas where we simply don’t have the time to learn. But when asked for an answer to our part of a project, we have to have some answer, so we give the one we know. Hard to imagine us doing otherwise.

We can try to offset that natural tendency by reading broadly, looking for new areas or opportunities to learn new techniques, or at least having team members or consultants who make a practice of surveying knowledge techniques broadly.

Rick promises to show how data modeling is disconnected from the other BI disciplines in the next Connecting the Dots post.

Tiny Trilogy

Wednesday, November 9th, 2011

Tiny Trilogy

Peter Thomas writes:

Although tinyurl.com was a pioneer in URL shortening, it seems to have been overtaken by a host of competing services. For example I tend to use bit.ly most of the time. However I still rather like the tinyurl.com option to create your own bespoke shortened URLs.

This feature rather came into its own recently when I was looking for a concise way to share my recent trilogy focusing on the use of historical data to justify BI/DW investments in Insurance.

Good series of posts on historical data and business intelligence. I suspect many of these lessons could be applied fairly directly to using historical data to justify semantic integration projects.

Such as showing what sharing would have meant as far as information on terrorists prior to 9/11.

Microsoft Business Intelligence (BI) Resources

Wednesday, November 9th, 2011

Microsoft Business Intelligence (BI) Resources

Dan English posts a number of MS BI resources from a PowerView session.

I don’t have access to an MS Server environment so you will have to evaluate these resources on your own.

For historical reasons I have mostly worked in *nix server environments. I have never really been tempted to experiment with MS server products, although I must confess I have had my share of laptops/desktops that ran Windows software. (I have *nix and Windows boxes sharing monitors/keyboard even now.)

With hardware prices where they are, perhaps I should set up a Windows server box (behind my firewall, etc.) so I can test some of these applications.

Wondering what it would take to put subject identity tests, a semantic shim as it were, on top of these products to offer some enhanced value to their users? True enough, if popular, MS would absorb it in a future release, but isn’t that what progress is about?

The future of information workers according to Microsoft, and BI plays a big part

Monday, November 7th, 2011

The future of information workers according to Microsoft, and BI plays a big part by Kasper de Jonge.

New video from Microsoft about a possible IT future.

Does it have a Futurama (New York World’s Fair) sense to you?

Futurama was an exhibition at the 1939 World’s Fair. To get a sense of the exhibit, view: To New Horizons (1940)

Running time is almost 23 minutes and it takes almost a third of that to get to the Futurama part. A vision of what 1960 will look like.

Watch the MS video and then the other.

Discussion questions:

  • How near/far from the future is the MS video?
  • What semantic impedances need to be reduced for such a future? (people to people, people to data (searching), machine to machine)
  • Which ones first?

SmartData Collective

Tuesday, September 6th, 2011

SmartData Collective

From the about page:

SmartData Collective, an online community moderated by Social Media Today, provides enterprise leaders access to the latest trends in Business Intelligence and Data Management. Our innovative model serves as a platform for recognized, global experts to share their insights through peer contributions, custom content publishing and alignment with industry leaders. SmartData Collective is a key resource for executives who need to make informed data management decisions.

Maybe a bit more mainstream than what you are accustomed to but think of it as a cross-cultural experience. ;-)

Seriously, effective promotion of topic maps means pitching them as solving problems as seen by others, not ourselves.