Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 21, 2013

Document visualization: an overview of current research

Filed under: Data Explorer,Graphics,Text Mining,Visualization — Patrick Durusau @ 3:13 pm

Document visualization: an overview of current research by Qihong Gan, Min Zhu, Mingzhao Li, Ting Liang, Yu Cao, Baoyao Zhou.

Abstract:

As the number of sources and quantity of document information explodes, efficient and intuitive visualization tools are desperately needed to assist users in understanding the contents and features of a document, while discovering hidden information. This overview introduces fundamental concepts of and designs for document visualization, a number of representative methods in the field, and challenges as well as promising directions of future development. The focus is on explaining the rationale and characteristics of representative document visualization methods for each category. A discussion of the limitations of our classification and a comparison of reviewed methods are presented at the end. This overview also aims to point out theoretical and practical challenges in document visualization.

The authors evaluate document visualization methods against the following goals:

  • Overview. Gain an overview of the entire collection.
  • Zoom. Zoom in on items of interest.
  • Filter. Filter out uninteresting items.
  • Details-on-demand. Select an item or group and get details when needed.
  • Relate. View relationships among items.
  • History. Keep a history of actions to support undo, replay, and progressive refinement.
  • Extract. Allow extraction of sub-collections and of the query parameters.

A useful review of tools for exploring texts!

April 28, 2013

A Partly Successful Attempt To Create Life With Data Explorer

Filed under: Data Explorer,Game of Life — Patrick Durusau @ 3:54 pm

A Partly Successful Attempt To Create Life With Data Explorer by Chris Webb.

From the post:

I’ll apologise for the title right away: this post isn’t about a Frankenstein-like attempt at creating a living being in Excel, I’m afraid. Instead, it’s about my attempt to implement John Conway’s famous game ‘Life’ using Data Explorer, how it didn’t fully succeed and some of the interesting things I learned along the way…

When I’m learning a new technology I like to set myself mini-projects that are more fun than practically useful, and for some reason a few weeks ago I remembered ‘Life’ (which I’m sure almost anyone who has learned programming has had to write a version of at some stage), so I began to wonder if I could write a version of it in Data Explorer. This wasn’t because I thought Data Explorer was an appropriate tool to do this – there are certainly better ways to implement Life in Excel – but I thought doing this would help me in my attempts to learn Data Explorer’s formula language and might also result in an interesting blog post.

Here’s a suggestion on learning new software.

Have you ever thought about playing the game of life with topic maps?
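For anyone who has not written a version of Life yet, the rules are compact enough to sketch. What follows is a minimal Python sketch, not Chris's Data Explorer implementation. It assumes an unbounded grid with the live cells stored as a set of (x, y) pairs, and the names `step` and `blinker` are mine, chosen for illustration.

```python
from collections import Counter


def step(live):
    """Advance one generation; `live` is a set of (x, y) live cells."""
    # Count the live neighbours of every cell adjacent to a live cell.
    neighbours = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next turn if it has exactly 3 live neighbours,
    # or exactly 2 and it is already alive.
    return {
        cell
        for cell, count in neighbours.items()
        if count == 3 or (count == 2 and cell in live)
    }


# A "blinker": three cells in a row that flip between horizontal and vertical.
blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(step(blinker)))
```

Calling `step` twice on the blinker should give back the starting pattern, which makes a handy sanity check for any implementation, whatever the tool.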

March 16, 2013

Finding Shakespeare’s Favourite Words With Data Explorer

Filed under: Data Explorer,Data Mining,Excel,Microsoft,Text Mining — Patrick Durusau @ 2:07 pm

Finding Shakespeare’s Favourite Words With Data Explorer by Chris Webb.

From the post:

The more I play with Data Explorer, the more I think my initial assessment of it as a self-service ETL tool was wrong. As Jamie pointed out recently, it’s really the M language with a GUI on top of it and the GUI itself, while good, doesn’t begin to expose the power of the underlying language: I’d urge you to take a look at the Formula Language Specification and Library Specification documents which can be downloaded from here to see for yourself. So while it can certainly be used for self-service ETL it can do much, much more than that…

In this post I’ll show you an example of what Data Explorer can do once you go beyond the UI. Starting off with a text file containing the complete works of William Shakespeare (which can be downloaded from here – it’s strange to think that it’s just a 5.3 MB text file) I’m going to find the top 100 most frequently used words and display them in a table in Excel.
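For comparison outside Data Explorer, the same kind of word count can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the M code from Chris's post: the tokenizer is a crude regular expression, `top_words` is a name I made up, and the sample string stands in for the text file.

```python
import re
from collections import Counter


def top_words(text, n=100):
    """Return the n most frequent words in `text` as (word, count) pairs."""
    # Lowercase the text, then keep runs of letters with optional inner
    # apostrophes (so "don't" stays one word). A crude tokenizer.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words).most_common(n)


# A small stand-in string; for a real file, read its contents with open()
# and pass the resulting string to top_words().
sample = "the cat sat on the mat, and the dog sat too"
for word, count in top_words(sample, n=2):
    print(word, count)
```

Loading the pairs into a spreadsheet is then a matter of writing them out as tab-separated or CSV lines.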

If Data Explorer is really a GUI on top of M (the M documentation linked here is outdated, but it is a point of origin), Data Explorer goes up in importance.

From the M link:

The Microsoft code name “M” Modeling Language, hereinafter referred to as M, is a language for modeling domains using text. A domain is any collection of related concepts or objects. Modeling domain consists of selecting certain characteristics to include in the model and implicitly excluding others deemed irrelevant. Modeling using text has some advantages and disadvantages over modeling using other media such as diagrams or clay. A goal of the M language is to exploit these advantages and mitigate the disadvantages.

A key advantage of modeling in text is ease with which both computers and humans can store and process text. Text is often the most natural way to represent information for presentation and editing by people. However, the ability to extract that information for use by software has been an arcane art practiced only by the most advanced developers. The language feature of M enables information to be represented in a textual form that is tuned for both the problem domain and the target audience. The M language provides simple constructs for describing the shape of a textual language – that shape includes the input syntax as well as the structure and contents of the underlying information. To that end, M acts as both a schema language that can validate that textual input conforms to a given language as well as a transformation language that projects textual input into data structures that are amenable to further processing or storage.

I try to not run examples using Shakespeare. I get distracted by the elegance of the text, which isn’t the point of the exercise. 😉

February 28, 2013

Public Preview of Data Explorer

Filed under: Data Explorer,Data Mining,Microsoft — Patrick Durusau @ 5:26 pm

Public Preview of Data Explorer by Chris Webb.

From the post:

In a nutshell, Data Explorer is self-service ETL for the Excel power user – it is to SSIS what PowerPivot is to SSAS. In my opinion it is just as important as PowerPivot for Microsoft’s self-service BI strategy.

I’ll be blogging about it in detail over the coming days (and also giving a quick demo in my PASS Business Analytics Virtual Chapter session tomorrow), but for now here’s a brief list of things it gives you over Excel’s native functionality for importing data:

  • It supports a much wider range of data sources, including Active Directory, Facebook, Wikipedia, Hive, and tables already in Excel
  • It has better functionality for data sources that are currently supported, such as the Azure Marketplace and web pages
  • It can merge data from multiple files that have the same structure in the same folder
  • It supports different types of authentication and the storing of credentials
  • It has a user-friendly, step-by-step approach to transforming, aggregating and filtering data until it’s in the form you want
  • It can load data into the worksheet or direct into the Excel model

There’s a lot to it, so download it and have a play! It’s supported on Excel 2013 and Excel 2010 SP1.

Download: Microsoft “Data Explorer” Preview for Excel

Chris has collected a number of links to Data Explorer resources so look to his post for more details.

It looks like a local install is required for the preview. I have been meaning to set up a VM with Windows 7 and MS Office.

Guess it may be time to take the plunge. 😉 (I have XP/Office on a separate box that uses the same monitors/keyboard but sharing data is problematic.)
