Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 25, 2015

Malicious Microsoft Office versions are in the wild

Filed under: Cybersecurity,Microsoft — Patrick Durusau @ 10:58 am

Malicious Microsoft Office versions are in the wild

At first I wondered why this was news? 😉

After reading the post I realized they meant hacked versions of Microsoft Office, which, in addition to the standard bugs and vulnerabilities, come with additional vulnerabilities installed by the people who hacked the official version.

I am untroubled by additional vulnerabilities in hacked versions of Microsoft Office. As the saying goes, “…you get what you pay for.”

If you want Microsoft Office, then buy a copy of Microsoft Office. You won’t get much sympathy for security problems created while trying to cheat others. At least not from me.

If you want or need alternatives to Microsoft Office, try Apache OpenOffice or LibreOffice.

Even with “free” software, you should always use official or reputable distribution sites. A little bit of caution on your part will present attackers with a much smaller attack surface. Staff that don’t exercise such caution should be recommended to your competitors.
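One concrete form of that caution applies when a project publishes checksums for its downloads (Apache OpenOffice and LibreOffice have offered them, as far as I know): compare the SHA-256 digest of the file you downloaded with the published value. Below is a minimal sketch using only Python's standard library. The function names are illustrative, and a matching digest only helps if the published value itself came from the official site.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare a file's digest with a hash copied from the project's site."""
    expected = published_hex.strip().lower()  # tolerate case and stray whitespace
    return hmac.compare_digest(sha256_of(path), expected)
```

Typical use would be `matches_published("installer.msi", "<hash from the official download page>")`, where the file name is hypothetical.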

May 7, 2015

GQL and SharePoint Online Search REST APIs

Filed under: Graphs,Microsoft — Patrick Durusau @ 7:57 pm

Query the Office graph using GQL and SharePoint Online Search REST APIs

From the post:

Graph Query Language (GQL) is a preliminary query language designed to query the Office graph via the SharePoint Online Search REST API. By using GQL, you can query the Office graph to get items for an actor that satisfies a particular filter.

Note: The features and APIs documented in this article are in preview and are subject to change. The current additions to the Search REST API are a preliminary solution to make it possible to query the Office graph, mainly intended for the Office Delve experience. Feel free to experiment with querying the Office graph but do not use these features, or other features and APIs documented in this article, in production. Your feedback about these features and APIs is important. Let us know what you think. Connect with us on Stack Overflow. Tag your questions with [office365].
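Here is a minimal sketch of what such a request looked like. It assumes the preview-era layout, in which the GQL expression is passed in the `Properties` parameter as `GraphQuery:ACTOR(ME)`. The tenant name `contoso` and the helper `build_gql_search_url` are hypothetical. Because the API was explicitly preview-only, the syntax may have changed or been withdrawn since. The sketch only builds the URL and sends nothing.

```python
from urllib.parse import quote


def build_gql_search_url(tenant: str, actor: str = "ME", query_text: str = "*") -> str:
    """Build (but do not send) a Search REST URL carrying a GQL expression.

    Assumption: the preview layout, with GQL passed as
    Properties='GraphQuery:ACTOR(...)'. The preview API may have changed.
    """
    base = f"https://{tenant}.sharepoint.com/_api/search/query"
    gql = f"GraphQuery:ACTOR({actor})"
    # Search REST string parameters are wrapped in single quotes.
    params = {"QueryText": f"'{query_text}'", "Properties": f"'{gql}'"}
    # Percent-encode each value, including quotes, colons and parentheses.
    query = "&".join(f"{name}={quote(value, safe='')}" for name, value in params.items())
    return f"{base}?{query}"


print(build_gql_search_url("contoso"))  # "contoso" is a hypothetical tenant
```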

An interesting development from Microsoft!

It’s early days, so there is a long way to go before we are declaring relationships between entities inside objects and assigning properties to those entities and their relationships.

Still, a promising development.

March 21, 2015

Turning the MS Battleship

Filed under: Interoperability,Microsoft,WWW,XML,XPath — Patrick Durusau @ 8:46 am

Improving interoperability with DOM L3 XPath by Thomas Moore.

From the post:

As part of our ongoing focus on interoperability with the modern Web, we’ve been working on addressing an interoperability gap by writing an implementation of DOM L3 XPath in the Windows 10 Web platform. Today we’d like to share how we are closing this gap in Project Spartan’s new rendering engine with data from the modern Web.

Some History

Prior to IE’s support for DOM L3 Core and native XML documents in IE9, MSXML provided any XML handling and functionality to the Web as an ActiveX object. In addition to XMLHttpRequest, MSXML supported the XPath language through its own APIs, selectSingleNode and selectNodes. For applications based on and XML documents originating from MSXML, this works just fine. However, this doesn’t follow the W3C standards for interacting with XML documents or exposing XPath.

To accommodate a diversity of browsers, sites and libraries wrap XPath calls to switch to the right implementation. If you search for XPath examples or tutorials, you’ll immediately find results that check for IE-specific code to use MSXML for evaluating the query in a non-interoperable way:

It seems like a long time ago that a relatively senior Microsoft staffer told me that turning a battleship like MS takes time. No change, however important, is going to happen quickly. Just the way things are in a large organization.

The important thing to remember is that once change starts, that too takes on a certain momentum and so is more likely to continue, even though it was hard to get started.

Yes, I am sure the present steps towards greater interoperability could have gone further, in another direction, etc., but they didn’t. Rather than complain about the present change for the better, why not use it as a wedge to push for greater support for more recent XML standards?

For my part, I guess I need to get a copy of Windows 10 on a VM so I can volunteer as a beta tester for full XPath (XQuery?/XSLT?) support in a future web browser. MS as a full XML competitor and possible source of open source software would generate some excitement in the XML community!

March 20, 2015

How to install Spark 1.2 on Azure HDInsight clusters

Filed under: Azure Marketplace,Microsoft,Spark — Patrick Durusau @ 4:28 pm

How to install Spark 1.2 on Azure HDInsight clusters by Maxim Lukiyanov.

From the post:

Today we are pleased to announce the refresh of the Apache Spark support on Azure HDInsight clusters. Spark is available on HDInsight through custom script action and today we are updating it to support the latest version of Spark 1.2. The previous version supported version 1.0. This update also adds Spark SQL support to the package.

Spark 1.2 script action requires latest version of HDInsight clusters 3.2. Older HDInsight clusters will get previous version of Spark 1.0 when customized with Spark script action.

Follow the below steps to create Spark cluster using Azure Portal:

The only remaining questions are: How good are you with Spark? And how big a Spark cluster do you need (or can you afford)?

Enjoy!

February 5, 2015

Cross Site Scripting zero-day bug [Or Feature?]

Filed under: Cybersecurity,Microsoft,Security — Patrick Durusau @ 1:36 pm

Internet Explorer has a Cross Site Scripting zero-day bug by Paul Ducklin.

From the post:

Another day, another zero-day.

This time, Microsoft Internet Explorer is attracting the sort of publicity a browser doesn’t want, following the public disclosure of what’s known as a Cross-Site Scripting, or XSS, bug.

With Microsoft apparently now investigating and looking at a patch, the timing of the disclosure certainly looks to be irresponsible.

There’s no suggestion that Microsoft failed to meet any sort of deadline to get a patch out, or even that the company was contacted in advance.

Nevertheless, details of the bug have been revealed, including some proof-of-concept JavaScript showing how to abuse the hole.

So, what is XSS, and what does this mean for security?

The bug violates the same origin policy (SOP) which Wikipedia describes as:

This mechanism bears a particular significance for modern web applications that extensively depend on HTTP cookies to maintain authenticated user sessions, as servers act based on the HTTP cookie information to reveal sensitive information or take state-changing actions. A strict separation between content provided by unrelated sites must be maintained on the client side to prevent the loss of data confidentiality or integrity.

While phrased in terms of “security,” take note that this includes content from other sites as well. One post I read on the topic suggested that content can be intermingled, but that isn’t the same as manipulating content from another source.

If you think of SOP as preventing programmatic, creative and imaginative re-use of content from other sites, suddenly it sounds a lot less like a feature, doesn’t it?

Only if you follow the “cookie, cookie, me want cookie” philosophy of browser interaction is SOP even necessary. Once I authenticate to a remote site, state, if it is maintained at all, could be maintained on the server side. That renders SOP, how did Eve in The Diaries of Adam and Eve put it? Ah, superfluous.
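For reference, the comparison a browser makes under SOP is simple. Two URLs share an origin only if scheme, host and port all match, which is the model written up in RFC 6454. Below is a minimal Python sketch of that comparison. It ignores browser-specific wrinkles such as `document.domain` and knows the default ports only for http and https.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports for the schemes this sketch handles.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that SOP compares."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # hostnames are case-insensitive
    # An omitted port means the scheme's default, so ":443" and no port match.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def same_origin(a: str, b: str) -> bool:
    """True only if scheme, host and port are all identical."""
    return origin(a) == origin(b)
```

Under this rule, `https://example.com/a` and `https://example.com:443/b` are the same origin. Switching to `http://` or to a subdomain such as `api.example.com` makes them different origins.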

Curious how security became intertwined with the desire of content owners to prevent re-use of content. That doesn’t sound like a neutral choice to me. Perhaps we should make another choice and evolve a different security model for web browsers.

A different security model that puts security in the hands of those best able to maintain it, that is, on the server side. And at the same time, empower users, script writers and others to re-use any content they can load into their browsers. Imagine the range of services and capabilities that would add!

Better security, better access to content from any site. Sounds like a win-win to me. You?

In the meantime, things with IE may not be as grim as reported. Sean Michael Kerner reports, in Researcher Discloses Potential Internet Explorer XSS Zero-Day Flaw, that Microsoft has known about the bug since October 13, 2014 and doesn’t seem to be all that excited about it.

I make that to be 115 days, including February 4, 2015, so zero-day + 115 days. Rather long in the tooth for a zero-day bug I would say. 😉 You do know that “zero-day” doesn’t mean the day you read about it. Yes?
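The arithmetic is easy to check with Python's `datetime`. The plain difference between the two dates is 114 days, and the 115 figure counts both endpoint days. The helper name below is illustrative.

```python
from datetime import date


def days_between(start: date, end: date, inclusive: bool = False) -> int:
    """Days from start to end; inclusive=True counts both endpoint days."""
    delta = (end - start).days
    return delta + 1 if inclusive else delta


reported = date(2014, 10, 13)  # when Microsoft knew of the bug, per Kerner
end_date = date(2015, 2, 4)    # the end date used in the count above
print(days_between(reported, end_date))                  # 114
print(days_between(reported, end_date, inclusive=True))  # 115
```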

The bug was reported on the Full Disclosure list, for which neither of the posts cited gave a URL.

PS: Is anyone working on a fork of JavaScript that enables cross site scripting by design? The advantages for content re-use would be enormous. Users in charge of content on their own screens. What a concept.

February 4, 2015

Creating Excel files with Python and XlsxWriter

Filed under: Excel,Microsoft,Python,Spreadsheets — Patrick Durusau @ 4:53 pm

Creating Excel files with Python and XlsxWriter

From the post:

XlsxWriter is a Python module for creating Excel XLSX files.

(Image: a demo spreadsheet created with XlsxWriter.)

(Sample code to create the above spreadsheet.)

XlsxWriter

XlsxWriter is a Python module that can be used to write text, numbers, formulas and hyperlinks to multiple worksheets in an Excel 2007+ XLSX file. It supports features such as formatting and many more, including:

  • 100% compatible Excel XLSX files.
  • Full formatting.
  • Merged cells.
  • Defined names.
  • Charts.
  • Autofilters.
  • Data validation and drop down lists.
  • Conditional formatting.
  • Worksheet PNG/JPEG images.
  • Rich multi-format strings.
  • Cell comments.
  • Memory optimisation mode for writing large files.
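Here is a minimal sketch of the basic XlsxWriter calls: `Workbook`, `add_worksheet`, `add_format`, `write`, `write_formula`, `set_column` and `close`. The sheet name and data are illustrative, and the sketch assumes `xlsxwriter` is installed (`pip install XlsxWriter`).

```python
import xlsxwriter


def write_demo(path: str) -> None:
    """Write a small workbook: a bold header row, text, numbers and a formula."""
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet("Data")

    bold = workbook.add_format({"bold": True})
    worksheet.set_column("A:A", 20)  # widen column A for the labels

    # write() accepts A1-style references...
    worksheet.write("A1", "Item", bold)
    worksheet.write("B1", "Count", bold)

    rows = [("apples", 10), ("pears", 7), ("plums", 3)]  # illustrative data
    for i, (label, count) in enumerate(rows, start=1):
        # ...or zero-indexed (row, col), so i = 1..3 lands in Excel rows 2..4.
        worksheet.write(i, 0, label)
        worksheet.write(i, 1, count)

    # XlsxWriter stores the formula; Excel calculates it when the file is opened.
    worksheet.write_formula("B5", "=SUM(B2:B4)")
    workbook.close()  # nothing is written to disk until close()
```

Calling `write_demo("demo.xlsx")` produces a file that managers can open directly in Excel.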

I know what you are thinking. If you are processing the data with Python, why the hell would you want to write data to XLS or XLSX?

Good question! But it also has an equally good answer.

Attend a workshop for mid-level managers and introduce one of the speakers saying:

We are going to give away copies of the data used in this presentation. By show of hands, how many people want it in R format? Now, how many people want it in Excel format?

Or you can reverse the questions so the glazed look from the audience on the R question doesn’t blind you. 😉

If your data need to transition to management, at least most management, spreadsheet formats are your friend.

If you don’t believe me, see any number of remarkable presentations by Felienne Hermans on the use of spreadsheets or check out my spreadsheets category.

Don’t get me wrong, I prefer being closer to the metal, but on the other hand, delivering data that users can use is more profitable than the alternatives.

I first saw this in a tweet by Scientific Python.

January 21, 2015

The next generation of Windows: Windows 10

Filed under: Microsoft — Patrick Durusau @ 6:00 pm

The next generation of Windows: Windows 10 by Terry Myerson.

From the post:

Today I had the honor of sharing new information about Windows 10, the new generation of Windows.

Our team shared more Windows 10 experiences and how Windows 10 will inspire new scenarios across the broadest range of devices, from big screens to small screens to no screens at all. You can catch the video on-demand presentation here.

Windows 10 is the first step to an era of more personal computing. This vision framed our work on Windows 10, where we are moving Windows from its heritage of enabling a single device – the PC – to a world that is more mobile, natural and grounded in trust. We believe your experiences should be mobile – not just your devices. Technology should be out of the way and your apps, services and content should move with you across devices, seamlessly and easily. In our connected and transparent world, we know that people care deeply about privacy – and so do we. That’s why everything we do puts you in control – because you are our customer, not our product. We also believe that interacting with technology should be as natural as interacting with people – using voice, pen, gestures and even gaze for the right interaction, in the right way, at the right time. These concepts led our development and you saw them come to life today.

I had to find a text equivalent to the video. I was looking for specific information I saw mentioned in an email and watching the entire presentation (2+ hours) just wasn’t in the cards.

I will be watching the comment lists on Windows 10 for the answers to two questions:

First, will I be able to run Windows 10 within a VM on Ubuntu?

Second, for “sharing” of annotations to documents, is the “sharing” protocol open so that annotations can be shared by users not using Windows 10?

Actually, I did see some of the video, and assuming you have the skills of a graphic artist, you are going to be producing some rocking content with Windows 10. People who struggle to doodle, not so much.

The devil will be in the details but I can say this is the first version of Windows that has ever made me consider upgrading from Windows XP. Haven’t decided and may have to run it on a separate box (share monitors with Ubuntu) but I can definitely say I am interested.

December 21, 2014

New Open XML PowerTool Cmdlet simplifies retrieval of document metrics

Filed under: Microsoft,XML — Patrick Durusau @ 8:43 pm

New Open XML PowerTool Cmdlet simplifies retrieval of document metrics by Doug Mahugh.

From the post:

It’s been a good year for Open XML developers. The release of the Open XML SDK as an open source project back in June was well-received by the community, and enabled contributions such as the makefile to automate use of the SDK on Mono and a Visual Studio project for the SDK. Project leader Eric White has worked to refine and improve the testing process, and here at MS Open Tech we’ve been working with our China team to get the word out, starting with mirroring the repo to GitCafe for ease of access in China.

Today there’s another piece of good news for Open XML developers: Eric White has added a new Get-DocxMetrics Cmdlet to the Open XML PowerTools, the latest step in a developer-focused reengineering of the PowerTools to make them even more flexible and useful to Open XML developers. As Eric explains in his blog post on the Open XML Developer site:

My latest foray is a new Cmdlet, Get-DocxMetrics, which returns a lot of useful information about a WordprocessingML document. A summary of the information it returns for a document:

  • The style hierarchy – styles can inherit from other styles, and it is helpful to know what styles are defined in a document.
  • The content control hierarchy. We can examine the hierarchy, and design an XSD schema to validate them.
  • The list of languages used in a document, such as en-US, fr-FR, and so on.
  • Whether a document contains tracked revisions, text boxes, complex fields, simple fields, altChunk content, tables, hyperlinks, legacy frames, ActiveX controls, sub documents, references to null images, embedded spreadsheets, document protection, multi-font runs, the list of numbering formats used, and more.
  • Metrics on how large the document is, including element counts, average paragraph lengths, run count, zero length text elements, ASCII character counts, complex script character counts, East Asia character counts, and the count of runs of each of the variety of characters.

Get-DocxMetrics sounds like a viable way to generate statistics on a collection of OpenXML files to determine what features of OpenXML are actually in use by an enterprise or government. That would make creation of specialized tools for such entities a far more certain proposition.

Output from such analysis would be a nice input into a topic map for purposes of mapping usage to other formats. What maps? What misses? Etc.
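Get-DocxMetrics itself is a PowerShell Cmdlet. As a rough, swapped-in analogue of the corpus-level idea, the Python sketch below counts a handful of WordprocessingML elements in each file's `word/document.xml` and then tallies how many documents use each feature. The element-to-feature mapping is my own simplification. Headers, footnotes and other document parts are ignored, and none of the names below come from the Open XML PowerTools.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Iterable

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Local element name -> feature label (a simplified, illustrative mapping).
FEATURES = {
    "p": "paragraphs",
    "r": "runs",
    "tbl": "tables",
    "ins": "tracked_insertions",
    "del": "tracked_deletions",
    "hyperlink": "hyperlinks",
    "sdt": "content_controls",
}


def docx_metrics(path: str) -> Dict[str, int]:
    """Count selected elements in the main document part of one .docx file."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    counts: Counter = Counter()
    for el in root.iter():
        if el.tag.startswith("{" + W_NS + "}"):
            local = el.tag.split("}", 1)[1]
            if local in FEATURES:
                counts[FEATURES[local]] += 1
    return dict(counts)


def corpus_metrics(paths: Iterable[str]) -> Dict[str, int]:
    """How many documents in a collection use each feature at least once."""
    usage: Counter = Counter()
    for path in paths:
        for feature, n in docx_metrics(path).items():
            if n:
                usage[feature] += 1
    return dict(usage)
```

A feature that shows up in no documents across an enterprise collection is a candidate to leave out of specialized tooling.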

Looking forward to hearing more about this tool in the new year!

December 17, 2014

Orleans Goes Open Source

Filed under: .Net,Actor-Based,Cloud Computing,HyTime,Microsoft,Open Source — Patrick Durusau @ 7:03 pm

Orleans Goes Open Source

From the post:

Since the release of the Project “Orleans” Public Preview at //build/ 2014 we have received a lot of positive feedback from the community. We took your suggestions and fixed a number of issues that you reported in the Refresh release in September.

Now we decided to take the next logical step, and do the thing many of you have been asking for – to open-source “Orleans”. The preparation work has already commenced, and we expect to be ready in early 2015. The code will be released by Microsoft Research under an MIT license and published on GitHub. We hope this will enable direct contribution by the community to the project. We thought we would share the decision to open-source “Orleans” ahead of the actual availability of the code, so that you can plan accordingly.

The real excitement for me comes from a post just below this announcement, A Framework for Cloud Computing:


To avoid these complexities, we built the Orleans programming model and runtime, which raises the level of the actor abstraction. Orleans targets developers who are not distributed system experts, although our expert customers have found it attractive too. It is actor-based, but differs from existing actor-based platforms by treating actors as virtual entities, not as physical ones. First, an Orleans actor always exists, virtually. It cannot be explicitly created or destroyed. Its existence transcends the lifetime of any of its in-memory instantiations, and thus transcends the lifetime of any particular server. Second, Orleans actors are automatically instantiated: if there is no in-memory instance of an actor, a message sent to the actor causes a new instance to be created on an available server. An unused actor instance is automatically reclaimed as part of runtime resource management. An actor never fails: if a server S crashes, the next message sent to an actor A that was running on S causes Orleans to automatically re-instantiate A on another server, eliminating the need for applications to supervise and explicitly re-create failed actors. Third, the location of the actor instance is transparent to the application code, which greatly simplifies programming. And fourth, Orleans can automatically create multiple instances of the same stateless actor, seamlessly scaling out hot actors.

Overall, Orleans gives developers a virtual “actor space” that, analogous to virtual memory, allows them to invoke any actor in the system, whether or not it is present in memory. Virtualization relies on indirection that maps from virtual actors to their physical instantiations that are currently running. This level of indirection provides the runtime with the opportunity to solve many hard distributed systems problems that must otherwise be addressed by the developer, such as actor placement and load balancing, deactivation of unused actors, and actor recovery after server failures, which are notoriously difficult for them to get right. Thus, the virtual actor approach significantly simplifies the programming model while allowing the runtime to balance load and recover from failures transparently. (emphasis added)
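Here is a toy, single-process sketch of the virtual actor idea described above. It does not use Orleans' API, since Orleans is a .NET framework. Callers address an actor by id and never create or destroy it. The runtime activates an instance on the first message, can reclaim it at any time, and reactivates it on the next message, with state surviving through a store. The class names and the in-memory `store` dictionary are hypothetical stand-ins for Orleans' placement and persistence machinery.

```python
from typing import Callable, Dict, Hashable


class CounterActor:
    """Example actor whose state lives in an external store, not in the instance."""

    def __init__(self, actor_id: Hashable, store: Dict[Hashable, int]):
        self.actor_id = actor_id
        self.store = store
        self.count = store.get(actor_id, 0)  # load persisted state on activation

    def increment(self) -> int:
        self.count += 1
        self.store[self.actor_id] = self.count  # write-through persistence
        return self.count


class VirtualActorRuntime:
    """Callers address actors by id; the runtime activates them on demand."""

    def __init__(self, factory: Callable[[Hashable], object]):
        self._factory = factory
        self._active: Dict[Hashable, object] = {}  # in-memory instances only
        self.activations = 0

    def send(self, actor_id: Hashable, method: str, *args):
        actor = self._active.get(actor_id)
        if actor is None:
            # No in-memory instance: a message is enough to create one.
            actor = self._factory(actor_id)
            self._active[actor_id] = actor
            self.activations += 1
        return getattr(actor, method)(*args)

    def deactivate(self, actor_id: Hashable) -> None:
        # Reclaim the instance (idle timeout, or a lost server); the actor still
        # "exists" and the next message will reactivate it.
        self._active.pop(actor_id, None)


store: Dict[Hashable, int] = {}
runtime = VirtualActorRuntime(lambda actor_id: CounterActor(actor_id, store))
print(runtime.send("user-42", "increment"))  # 1
print(runtime.send("user-42", "increment"))  # 2
runtime.deactivate("user-42")
print(runtime.send("user-42", "increment"))  # 3, state reloaded from the store
```

Deactivating and then sending again is the single-process analogue of a server losing the instance. The caller's code does not change.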

Not in a distributed computing context, but the “look and it’s there” model is something I recall from HyTime. So nice to see good ideas resurface!

Just imagine doing that with topic maps, including having properties of a topic, should you choose to look for them. If you don’t need a topic, why carry the overhead around? Wait for someone to ask for it.

This week alone, Microsoft continues its fight for users and announces an open source project that will at least make me read about .Net ;-). I think Microsoft merits a lot of kudos and good wishes for the holiday season!

I first saw this at: Microsoft open sources cloud framework that powers Halo by Jonathan Vanian.

November 13, 2014

The Battleship Moves

Filed under: Microsoft,Open Source — Patrick Durusau @ 2:45 pm

A milestone moment for Microsoft: .NET is now an open-source project by Jonathan Vanian.

During the acrimonious debate about OOXML, a friend said that Microsoft was like a very large battleship: it could turn, but movement was never sudden.

From what I read in Jonathan’s post, MS is in the process of making yet another turn, this time to make .NET an open source project.

A move that gives credence to the proposition that being open source isn’t inconsistent with being a commercial enterprise and a profitable one.

But commercial open source software is just as important as a bulwark against government surveillance. Consumers will be able to choose between buying binary software, possibly infected with government surveillance, or using open source software and the services of traditional vendors such as MS, IBM, HP, etc. to compile specific software packages for their use.

Opening up such a large package isn’t an overnight lark so I encourage everyone to be patient as MS eases .NET into the waters of open source. Continued good experiences with an open source .NET will further the open source agenda at Microsoft.

The more open source software in use, the fewer dark places for government surveillance to hide.

“Fewer dark places for government surveillance to hide.” Yet another benefit from open source software!

November 12, 2014

Potentially catastrophic bug bites all versions of Windows. Patch now

Filed under: Cybersecurity,Microsoft — Patrick Durusau @ 10:41 am

Potentially catastrophic bug bites all versions of Windows. Patch now by Dan Goodin.

From the post:

Microsoft has disclosed a potentially catastrophic vulnerability in virtually all versions of Windows. People operating Windows systems, particularly those who run websites, should immediately install a patch Microsoft released Tuesday morning.

The vulnerability resides in the Microsoft secure channel (schannel) security component that implements the secure sockets layer and transport layer security (TLS) protocols, according to a Microsoft advisory. A failure to properly filter specially formed packets makes it possible for attackers to execute attack code of their choosing by sending malicious traffic to a Windows-based server.

While the advisory makes reference to vulnerabilities targeting Windows servers, the vulnerability is rated critical for client and server versions of Windows alike, an indication the remote-code bug may threaten Windows desktops and laptop users as well. Amol Sarwate, director of engineering at Qualys, told Ars the flaw leaves client machines open if users run software that monitors Internet ports and accepts encrypted connections.

This sort of security announcement makes you nostalgic for the Black Screen and Blue Screen of Death, doesn’t it? While looking up the Blue Screen of Death, I discovered that Windows still has that feature. I was thinking of the Blue Screen of Death from Windows NT days. I haven’t seen a blue screen on Windows XP, so I assumed those issues had been fixed. My bad.

Danger, Danger!

This security update is rated Critical for all supported releases of Microsoft Windows. (emphasis added)

The earliest versions of Windows listed are Vista and Windows Server 2003.

Which excludes Windows XP, whose security support ended on April 8, 2014.

I mention that because of Jose Pagliery’s report, 95% of bank ATMs face end of security support.

Yes, 95% of bank ATMs were running Windows XP (est.). Some banks were reported to have made arrangements with MS for continued support but who and for how long isn’t known.

The support bulletin doesn’t say if the vulnerability exists in Windows XP but you could start looking with: Vulnerability in the Windows Schannel Security Package Could Allow Remote Code Execution (935840) Published: June 12, 2007. A different security issue with Schannel.

If you confirm the MS14-066 issue on Windows XP, please post a comment. Thanks!

PS: Better organization of the Windows documentation would help security researchers. Being able to navigate from a release to the specific files involved in a particular problem, and from there back to other versions and their corresponding files, would be quite helpful, even if packages are needed for updates due to dependencies between files.


Update: November 16, 2014.

On November 14, 2014, Sara Peters posted: Microsoft Fixes Critical SChannel & OLE Bugs, But No Patches For XP and writes in part:

Joe Barrett, senior security consultant of Foreground Security says that Winshock “will most likely be the first true ‘forever-day’ vulnerability for Windows NT, Windows 2000, and Windows XP. As Microsoft has ceased all support and publicly stated they will no longer release security patches, enterprises who still have Windows 2000 and Windows XP machines will find themselves in the uncomfortable situation of having an exploitable-but-unpatchable system on their network,” he says.

“Security researchers and blackhats alike are most likely racing to get the first workable exploit against this vulnerability, and the bad guys will begin immediately using it to compromise as much as they can,” he says. “As a result, enterprises need to immediately deploy the patch to every system they can and also begin isolating and removing the unpatchable systems to prevent serious compromise of their networks.”

I guess that removes all doubt about XP based ATMs being vulnerable.

October 29, 2014

Microsoft Garage

Filed under: Microsoft,Software — Patrick Durusau @ 1:43 pm

Microsoft Garage

From the webpage:

Hackers, makers, artists, tinkerers, musicians, inventors — on any given day you’ll find them in The Microsoft Garage.

We are a community of interns, employees, and teams from everywhere in the company who come together to turn our wild ideas into real projects. This site gives you early access to projects as they come to life.

Tell us what rocks, and what doesn’t.

Welcome to The Microsoft Garage.

Two projects (out of several) that I thought were interesting:

Collaborate

Host or join collaboration sessions on canvases that hold text cards and images. Ink on the canvas to organize your content, or manipulate the text and images using pinch, drag, and rotate gestures.

Floatz

Floatz, a Microsoft Garage project, lets you float an idea out to the people around you, and see what they think. Join in on any nearby Floatz conversation, or start a new one with a question, idea, or image that you share anonymously with people nearby.

Share your team spirit at a sporting event, or your awesome picture of the band at a rock concert. Ask the locals where to get a good meal when visiting an unfamiliar neighborhood. Speak your mind, express your feelings, and find out if there are others around you who feel the same way—all from the safety of an anonymous screen name in Floatz.

I understand the theory of asking for advice anonymously, but I assume that also means the person answering is anonymous as well. Yes? I don’t have a cellphone so I can’t test that theory. Comments?

On the other hand, if you are sharing data with known and unknown others, and you know which “anonymous” screen names to trust (for example, don’t trust names with FBI, CIA or NSA preceded or followed by hyphens), then Floatz could be very useful.

I first saw this in Nat Torkington’s Four short links: 23 October 2014.

October 11, 2014

Microsoft’s Quantum Mechanics

Filed under: Microsoft,Quantum,Semantics — Patrick Durusau @ 11:52 am

Microsoft’s Quantum Mechanics by Tom Simonite.

From the post:

In 2012, physicists in the Netherlands announced a discovery in particle physics that started chatter about a Nobel Prize. Inside a tiny rod of semiconductor crystal chilled cooler than outer space, they had caught the first glimpse of a strange particle called the Majorana fermion, finally confirming a prediction made in 1937. It was an advance seemingly unrelated to the challenges of selling office productivity software or competing with Amazon in cloud computing, but Craig Mundie, then heading Microsoft’s technology and research strategy, was delighted. The abstruse discovery—partly underwritten by Microsoft—was crucial to a project at the company aimed at making it possible to build immensely powerful computers that crunch data using quantum physics. “It was a pivotal moment,” says Mundie. “This research was guiding us toward a way of realizing one of these systems.”

Microsoft is now almost a decade into that project and has just begun to talk publicly about it. If it succeeds, the world could change dramatically. Since the physicist Richard Feynman first suggested the idea of a quantum computer in 1982, theorists have proved that such a machine could solve problems that would take the fastest conventional computers hundreds of millions of years or longer. Quantum computers might, for example, give researchers better tools to design novel medicines or super-efficient solar cells. They could revolutionize artificial intelligence.

Fairly upbeat review of current efforts to build a quantum computer.

You may want to offset it by reading Scott Aaronson’s blog, Shtetl-Optimized, which has the following header note:

If you take just one piece of information from this blog:
Quantum computers would not solve hard search problems
instantaneously by simply trying all the possible solutions at once. (emphasis added)

See in particular: Speaking Truth to Parallelism at Cornell
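A worked comparison makes the header note concrete. These are standard textbook results, not taken from Aaronson's post. For unstructured search over $N = 2^n$ candidates with one marked item, Grover's algorithm needs on the order of $\sqrt{N}$ queries, and no quantum algorithm can do better than $\Omega(\sqrt{N})$ in this black-box setting. The speedup is quadratic, not a collapse of the search to a single step.

```latex
\begin{align*}
  \text{classical, expected queries} &\approx \frac{N}{2} = 2^{\,n-1} \\
  \text{Grover iterations} &\approx \frac{\pi}{4}\sqrt{N} = \frac{\pi}{4}\, 2^{\,n/2} \\
  \text{any quantum algorithm (black-box)} &= \Omega\!\left(\sqrt{N}\right)
\end{align*}
```

For $n = 80$, that is about $6.0 \times 10^{23}$ expected classical queries versus roughly $8.6 \times 10^{11}$ Grover iterations. That is a huge saving, yet the cost is still exponential in $n$.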

Whatever speedups are possible with quantum computers, getting a semantically incorrect answer faster isn’t an advantage.

Assumptions about faster computing platforms include an assumption of correct semantics. There have been no proofs of default correct handling of semantics by present day or future computing platforms.

I first saw this in a tweet by Peter Lee.

PS: I saw the reference to Scott Aaronson’s blog in a comment to Tom’s post.

August 5, 2014

Bioinformatics Data and Microsoft Word

Filed under: Bioinformatics,Microsoft — Patrick Durusau @ 4:25 pm

Is there ever a valid reason for storing bioinformatics data in a Microsoft Word document? by Keith Bradnam.

You already know the answer from the title so I will skip to the conclusion:

This is not an acceptable practice! Use of Microsoft Word to store bioinformatics data will only ever result in unhappiness, frustration, and anger.

I think Keith, myself and many others who make the same or similar points are missing one critical issue:

Why is MS Word (or Excel) so much easier to use than other applications for bioinformatics?

Or perhaps even more to the point:

Why hasn’t bioinformatics lobbied for extensions to MS Word or Excel to work with their workflow?

For the most part, users aren’t really interested in a personal relationship with their computer or a religious experience with their software. They want to get some non-hardware/non-software task done. (full stop)

Rather than trying to fix users, why don’t we try to fix their tools?

Shouldn’t I be able to create a new MS Word or OpenOffice document, indicate that it contains gene names and simply type them in? And have them intelligently extracted for use with genome databases?
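Below is a naive sketch of the extraction half of that wish, under loud assumptions. The document is a .docx, and "intelligent" extraction is reduced to matching tokens against a list of known gene symbols that you supply (the symbols in the usage line are just examples). The sketch reads only the main body part, `word/document.xml`. It joins the text of all runs in a paragraph before matching, because Word often splits a single word across several runs.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraph_text(path: str) -> List[str]:
    """Return the text of each paragraph in the main body of a .docx file."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join every w:t inside it.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs


def extract_gene_names(paragraphs: Iterable[str], known_symbols: Set[str]) -> List[str]:
    """Return known gene symbols in order of first appearance (case-sensitive)."""
    found: List[str] = []
    for text in paragraphs:
        for token in re.findall(r"[A-Za-z0-9-]+", text):
            if token in known_symbols and token not in found:
                found.append(token)
    return found
```

For example, `extract_gene_names(docx_paragraph_text("notes.docx"), {"BRCA1", "TP53"})` uses a hypothetical file name. A real tool would plug in a curated symbol list and hand the results to a genome database lookup.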

“Fixing” users isn’t a winning strategy. Let’s try fixing their tools. No promises, but we know the other approach fails.

June 18, 2014

Drag-n-Drop Machine Learning?

Filed under: Azure Marketplace,Machine Learning,Microsoft — Patrick Durusau @ 9:34 am

Microsoft to provide drag-and-drop machine learning on Azure by Derrick Harris.

From the post:

Microsoft is stepping up its cloud computing game with a new service called Azure Machine Learning that lets users visually build machine learning models, and then publish APIs to insert those models into applications. The service, which will be available for public preview in July, is one of the first of its kind and the latest demonstration of Microsoft’s heavy investment in machine learning.

Azure Machine Learning will include numerous prebuilt model types and packages, including recommendation engines, decision trees, R packages and even deep neural networks (aka deep learning models), explained Joseph Sirosh, corporate vice president at Microsoft. The data that the models train on and analyze can reside in Azure or locally, and users are charged based on the number of API calls to their models and the amount of computing resources consumed running them.

The reason why there are so few data scientists today, Sirosh theorized, is that they need to know so many software tools and so much math and computer science just to experiment and build models. Actually deploying those models into production, especially at scale, opens up a whole new set of engineering challenges. Sirosh said Microsoft hopes Azure Machine Learning will open up advanced machine learning to anyone who understands the R programming language or, really, anyone with a respectable understanding of statistics.

“It’s also very simple. My high school son can build machine learning models and publish APIs,” he said.

Reducing the technical barriers to using machine learning is a great thing. However, if that also reduces understanding of machine learning, its perils and pitfalls, that is a very bad thing.

One of the strengths of the Weka courses taught by Prof. Ian H. Witten is that students learn that machine learning algorithms involve choices that aren’t apparent to the casual user. They also learn that data choices can make as much difference in outcomes as the algorithms used to process that data.
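Here is a toy illustration of that point. It is my own example, not one from the Weka courses. The same 1-nearest-neighbour rule gives a different answer for the same query, and the only change is whether the features are rescaled. The two training points and the age/income features are made up for the illustration. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_label(train: List[Tuple[Point, str]], query: Point) -> str:
    """1-nearest-neighbour: label of the closest training point (Euclidean)."""
    return min(train, key=lambda row: math.dist(row[0], query))[1]


def min_max_scale(points: Sequence[Point]) -> List[Point]:
    """Rescale each feature to [0, 1] using its min and max over `points`."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(p, lows, highs))
            for p in points]


def classify(train, query, scale=False):
    """Same algorithm either way; only the data preparation differs."""
    if scale:
        scaled = min_max_scale([p for p, _ in train] + [query])
        train = [(p, label) for p, (_, label) in zip(scaled, train)]
        query = scaled[-1]
    return nearest_label(train, query)


if __name__ == "__main__":
    # Made-up data: (age in years, income in dollars) -> class label.
    train = [((25, 40000), "A"), ((60, 42000), "B")]
    print(classify(train, (58, 40500)))              # A: income differences dominate
    print(classify(train, (58, 40500), scale=True))  # B: age now counts as much
```

Neither answer is "right" by default. The point is that a drag-and-drop user who never sees the scaling step never sees the choice either.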

Use of software with no real understanding of its limitations isn’t new, but with Azure Machine Learning any challenge to an analysis will be met with the suggestion that you “…run the analysis yourself.” The speaker may not understand that a replicated bad result is still a bad result.

Be prepared to challenge data and means of analysis used in drag-n-drop machine learning drive-bys.

May 28, 2014

Microsoft Research’s Naiad Project

Filed under: BigData,Microsoft,Naiad — Patrick Durusau @ 3:28 pm

Solve the Big Data Problems of the Future: Join Microsoft Research’s Naiad Project by Tara Grumm.

From the post:

Over the past decade, general-purpose big data platforms like Hadoop have brought distributed computing into the mainstream. As people have become accustomed to processing their data in the cloud, they have become more ambitious, wanting to do things like graph analysis, machine learning, and real-time stream processing on their huge data sources.

Naiad is designed to solve this more challenging class of problems: it adds support for a few key primitives – maintaining state, executing loops, and reacting to incoming data – and provides high-performance infrastructure for running them in a scalable distributed system.

The result is the best of both worlds. Naiad runs simple programs just as fast as existing general-purpose platforms, and complex programs as fast as specialized systems for graph analysis, machine learning, and stream processing. Moreover, as a general-purpose system, Naiad lets you compose these different applications together, enabling mashups (such as computing a graph algorithm over a real-time sliding window of a social media firehose) that weren’t possible before.

Who should use Naiad?

We’ve designed Naiad to be accessible to a variety of different users. You can get started right away with Naiad by writing programs using familiar declarative operators based on SQL and LINQ.

For power users, we’ve created low-level interfaces to make it possible to extend Naiad without sacrificing any performance. You can plug in optimized data structures and algorithms, and build new domain-specific languages on top of Naiad. For example, we wrote a graph processing layer on top of Naiad that has performance comparable with (and often better than) specialized systems designed only to process graphs.
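To make the “graph algorithm over a real-time sliding window” mashup concrete, here is a single-machine Python sketch of the simplest case. It keeps vertex degrees current over the edges seen in the last `window` seconds. It does not use Naiad’s API. It only mirrors the three primitives named above: state kept between inputs, a reaction to each incoming edge, and a loop that expires stale edges. It assumes edges arrive with non-decreasing timestamps.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class SlidingWindowDegrees:
    """Vertex degrees over edges whose timestamp lies in (now - window, now]."""

    def __init__(self, window: float):
        self.window = window
        self.edges: Deque[Tuple[float, str, str]] = deque()  # state kept between inputs
        self.degree: Counter = Counter()

    def add_edge(self, timestamp: float, src: str, dst: str) -> None:
        # React to incoming data: count the new edge, then expire old ones.
        self.edges.append((timestamp, src, dst))
        self.degree[src] += 1
        self.degree[dst] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Loop until the oldest remaining edge is inside the window.
        while self.edges and self.edges[0][0] <= now - self.window:
            _, src, dst = self.edges.popleft()
            for vertex in (src, dst):
                self.degree[vertex] -= 1
                if self.degree[vertex] == 0:
                    del self.degree[vertex]

    def top(self, n: int = 3) -> List[Tuple[str, int]]:
        return self.degree.most_common(n)


if __name__ == "__main__":
    window = SlidingWindowDegrees(window=60.0)
    window.add_edge(0, "alice", "bob")
    window.add_edge(30, "alice", "carol")
    window.add_edge(70, "bob", "carol")  # the t=0 edge falls out of the window
    print(window.top(1))  # [('carol', 2)]
```

The hard part that systems like Naiad address is doing this incrementally, across machines, for algorithms much less forgiving than degree counting.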

Big data geeks and open source supporters should take a serious look at the Naiad Project.

It will take a while but the real question in the future will be how well you can build upon a continuous data substrate.

Or as Harvey Logan says in Butch Cassidy and the Sundance Kid,

Rules? In a knife fight? No rules!

I would prepare accordingly.

May 2, 2014

IE Patched!

Filed under: Cybersecurity,Microsoft,Security — Patrick Durusau @ 7:49 pm

Microsoft patches major Internet Explorer security flaw, even for Windows XP by Kif Leswing.

From the post:

Microsoft has patched a major Internet Explorer browser security flaw, the company announced in a blog post Thursday. Notably, the patch will be pushed out to Windows XP machines, which Microsoft had said it would stop supporting on April 8.

Here is the sequence of events as I understand them:

  1. Microsoft announces the bug. No fixes for XP.
  2. Department of Homeland Security says: “Don’t Use Internet Explorer!”
  3. Microsoft announces a patch for the bug, including XP.

All in less than a week.

Should we be reporting security bugs to DHS and not US-CERT?

Seems like DHS has found a way to get bugs fixed.

Yes?

March 25, 2014

Microsoft Outlook Users Face Zero-Day Attack

Filed under: Cybersecurity,Microsoft,NSA,Security — Patrick Durusau @ 6:50 pm

Microsoft Outlook Users Face Zero-Day Attack by Mathew J. Schwartz.

From the post:

Simply previewing maliciously crafted RTF documents in Outlook triggers exploit of bug present in Windows and Mac versions of Word, Microsoft warns

There is a new zero-day attack campaign that’s using malicious RTF documents to exploit vulnerable Outlook users on Windows and Mac OS X systems, even if the emailed documents are only previewed.

That warning was sounded Monday by Microsoft, which said that it’s seen “limited, targeted attacks” in the wild that exploit a newly discovered Microsoft Word RTF file format parser flaw, which can be used to corrupt system memory and execute arbitrary attack code.

“An attacker who successfully exploited this vulnerability could gain the same user rights as the current user,” said Microsoft’s security advisory. “If the current user is logged on with administrative user rights, an attacker who successfully exploited this vulnerability could take complete control of an affected system. An attacker could then install programs; view, change, or delete data; or create new accounts with full user rights.”

It’s only Snowden Year One (SY1) and with every new zero-day attack that makes the news I wonder: “Did this escape from the NSA?”

The other lesson: Only by building securely can there be any realistic computer security.

One good place to start would be building software that reads (if not also writes) popular office formats securely.

March 15, 2014

SharePoint Conference 2014

Filed under: Microsoft,SharePoint — Patrick Durusau @ 9:21 pm

The Ultimate Script to download SharePoint Conference 2014 Videos AND slides! by Vlad Catrinescu.

From the post:

After everyone posted about 10 script versions to download the SharePoint Conference 2014 videos I decided to add some extra value before releasing mine! This is what my script does:

  • Downloads all the SPC14 Sessions and Slides
  • Groups them by folders
  • Makes sure no errors come up due to Illegal File names.
  • If you stop the script and restart in the middle, it will start where it left off and not from beginning.
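The two less obvious items in that list, safe file names and resuming where a run left off, can be sketched in Python. This is not Vlad’s script. The `fetch` callable (anything that returns a file’s bytes for a URL) and the `.part` temporary-file convention are assumptions for illustration.

```python
import os
import re
import tempfile
from typing import Callable, Iterable, List, Tuple

# Characters Windows does not allow in file names, plus ASCII control characters.
ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(title: str, max_len: int = 120) -> str:
    """Replace illegal characters, truncate, and trim trailing dots/spaces."""
    cleaned = ILLEGAL.sub("_", title)[:max_len].strip().rstrip(". ")
    return cleaned or "untitled"


def download_all(items: Iterable[Tuple[str, str, str]],
                 root: str,
                 fetch: Callable[[str], bytes]) -> List[str]:
    """Save each (folder, title, url) item under root/folder/title.

    Files that already exist are skipped, so rerunning after an interruption
    resumes where the previous run stopped. Data is written to a .part file
    and renamed only once complete, so a half-written file never looks
    finished. Returns the paths downloaded during this run.
    """
    fetched = []
    for folder, title, url in items:
        directory = os.path.join(root, safe_name(folder))
        os.makedirs(directory, exist_ok=True)
        target = os.path.join(directory, safe_name(title))
        if os.path.exists(target):
            continue  # finished in an earlier run
        partial = target + ".part"
        with open(partial, "wb") as fh:
            fh.write(fetch(url))
        os.replace(partial, target)
        fetched.append(target)
    return fetched


if __name__ == "__main__":
    items = [("Keynotes", "SPC14: Opening?", "u1"), ("Dev", "Apps/Add-ins", "u2")]
    with tempfile.TemporaryDirectory() as root:
        print(len(download_all(items, root, lambda url: url.encode())))  # 2
        print(len(download_all(items, root, lambda url: url.encode())))  # 0
```

With roughly 70GB to fetch, "skip what is already on disk" is the cheapest form of resumption. It does not resume a single interrupted file, which simply starts over from its `.part` copy.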

The Total size will be a bit under 70GB. (emphasis added)

I’m always looking for scripts that will help you collect data and this sounded interesting.

Well, until I read it’s about 70GB of presentations/videos on SharePoint! 😉

Still, I suppose it will be useful for data mining about SharePoint.

And it should give you a good idea of what the baseline is for SharePoint-like services.

(All teasing to one side, what SharePoint attempts to address is a hard problem. Poor project design and what I interpret as a desire to prevent data access are not the fault of SharePoint. Not that I am a SharePoint fan but fair is fair.)

March 10, 2014

Data Science 101: Deep Learning Methods and Applications

Filed under: Data Science,Deep Learning,Machine Learning,Microsoft — Patrick Durusau @ 7:56 pm

Data Science 101: Deep Learning Methods and Applications by Daniel Gutierrez.

From the post:

Microsoft Research, the research arm of the software giant, is a hotbed of data science and machine learning research. Microsoft has the resources to hire the best and brightest researchers from around the globe. A recent publication is available for download (PDF): “Deep Learning: Methods and Applications” by Li Deng and Dong Yu, two prominent researchers in the field.

Deep sledding with twenty (20) pages of bibliography and pointers to frequently updated lists of resources (at page 8).

You did say you were interested in deep learning. Yes? 😉

Enjoy!

February 24, 2014

[Browsing] the .Net Reference Source

Filed under: .Net,Microsoft — Patrick Durusau @ 5:08 pm

How to browse the .NET Reference Source by Immo Landwerth.

A 2.5-minute introduction to browsing the .NET Reference Source.

When you see the user experience, I think you are going to be way under-impressed.

Much better than what they had, but whether it is up to par for today is a different question.

Imbuing source code with semantics and enabling browsing/searching on the basis of those semantics would produce much more attractive results.

Preview the beta release at: http://referencesource-beta.microsoft.com/

How would you improve the source code?

Even minor comments have the potential to impact 90+% of the operating systems in existence.

Enjoy!

January 31, 2014

DOCX -> HTML/CSS

Filed under: Conversion,Microsoft,XML — Patrick Durusau @ 2:04 pm

Transform DOCX to HTML/CSS with High-Fidelity using PowerTools for Open XML by Eric White.

From the post:

Today I am happy to announce the release of HtmlConverter version 2.06.00, which is a high fidelity conversion from DOCX to HTML/CSS. HtmlConverter is a module in the PowerTools for Open XML project.

….
HtmlConverter.cs 2.06.00 supports:

  • Paragraph styles, character styles, and table styles, including styles that are based on other styles.
  • Table styles include support for conditional table style options (header row, total row, banded rows, first column, last column, and banded columns).
  • Fonts, including font styles such as bold, italic, underline, strikethrough, foreground and background colors, shading, sub-script, super-script, and more.  HtmlConverter is, in effect, guidance on how to correctly determine the font and formatting for each paragraph and text run in a document.
  • Numbered and bulleted lists.  Current support is only for en-US and fr-FR; however, HtmlConverter is factored and parameterized so that you can support other languages without altering the source code.  In the near future, I’ll be publishing guidance and instructions on how to support additional languages, and I’ll be asking for volunteers to write and contribute the bits of code to generate cardinal (one, two, three) and ordinal (first, second, third) implementations for your native language, as well as the various Asian and RTL numbering systems.
  • Tabs, including left tabs, right tabs, centered tabs, and decimal tabs.  HtmlConverter takes the approach of using font metrics to calculate the exact width of the various pieces of text in a line, and inserts <span> elements with precisely calculated widths.
  • High fidelity support for vertical white space and horizontal white space, including indented text, hanging indents, centered text, right justified text, and justified text.
  • Borders around paragraphs, and high fidelity for borders of tables.
  • Horizontally and vertically merged cells in tables.
  • External hyperlinks, and internal hyperlinks to bookmarks within the document.
  • You have much more control over the conversion when compared to other approaches to converting to HTML.  There are already a number of parameters that enable you to control the transformation, and in the future I’ll be adding many more knobs and levers to fine tune the conversion.  And of course, you have the source code, so you can customize the conversion for your scenario.
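The tab handling described above can be illustrated roughly. Given a way to measure text width, compute how wide a spacer `<span>` must be so the next piece of text starts at the following tab stop. This is a simplified Python sketch, not HtmlConverter’s C# code. It handles left tabs only. The `measure` function is hypothetical, standing in for real font metrics, and the 36 pt (half-inch) interval for default stops is an assumption.

```python
import html
from typing import Callable, List


def tab_spacer_widths(segments: List[str],
                      tab_stops: List[float],
                      measure: Callable[[str], float],
                      default_interval: float = 36.0) -> List[float]:
    """Width in points of the spacer after each segment but the last, so the
    next segment starts at the following left tab stop. Past the explicit
    stops, stops are assumed every `default_interval` points."""
    stops = sorted(tab_stops)
    widths = []
    x = 0.0
    for segment in segments[:-1]:
        x += measure(segment)
        later = [s for s in stops if s > x]
        stop = later[0] if later else (x // default_interval + 1) * default_interval
        widths.append(stop - x)
        x = stop
    return widths


def to_html(segments: List[str], widths: List[float]) -> str:
    """Emit text spans separated by fixed-width inline-block spacer spans."""
    parts = []
    for i, segment in enumerate(segments):
        parts.append(f"<span>{html.escape(segment)}</span>")
        if i < len(widths):
            parts.append(f'<span style="display:inline-block;width:{widths[i]:.1f}pt"></span>')
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical metric: every character is 6 pt wide (a monospaced font).
    measure = lambda text: 6.0 * len(text)
    cells = "Name\tQty\tPrice".split("\t")
    widths = tab_spacer_widths(cells, [72.0, 144.0], measure)
    print(widths)  # [48.0, 54.0]
    print(to_html(cells, widths))
```

Right, centered, and decimal tabs need the width of the text that follows the tab as well, which is why accurate font metrics matter so much here.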

See Eric’s post for his questions about which desired features should have priority for addition to HtmlConverter.

BTW:

PowerTools for Open XML is licensed under the Microsoft Public License (Ms-PL), which gives you wide latitude in how you use the code, including its use in commercial products and open source projects.

It won’t be long until “not open source” software will be worthy of comment.

I first saw this in a tweet by Open Microsoft.

January 25, 2014

WorldWide Telescope Upgrade!

Filed under: Astroinformatics,Microsoft — Patrick Durusau @ 4:56 pm

A notice about the latest version was forwarded to me and it read in part:

WorldWide Telescope is celebrating its 5th anniversary with a new release that has a completely re-written rendering engine that supports DirectX11 and runs in 64bit to give you a wealth of new features including cinematic quality rendering and new timeline tours that allow channel by channel key frames for precise control, loads of new overlays and much more.

We also have a completely new website for this release with a responsive design for our modern mix of devices. Please use it and give us feedback. We will be adding lots of new content, including many new web interactive pages using our HTML5 control so that people with any device can enjoy our data even without the full Windows Client.

All of which sounds great and kudos to Microsoft.

Unfortunately I can’t view the upgraded site because I am running (on a VM) a version of Windows prior to Windows 7 and Windows 8. My, where does the time go. 😉

I have plenty of room for another VM so I guess it is time to spin another one up.

If you are already on Windows 7 or 8, check out the new site. If not, look for the legacy version until you can upgrade!

January 22, 2014

Empowering Half a Billion Users For Free – Would You?

Filed under: Excel,Hadoop YARN,Hortonworks,Microsoft — Patrick Durusau @ 5:24 pm

How To Use Microsoft Excel to Visualize Hadoop Data by Saptak Sen.

From the post:

Microsoft and Hortonworks have been working together for over two years now with the goal of bringing the power of Big Data to a billion people. As a result of that work, today we announced the General Availability of HDP 2.0 for Windows with the full power of YARN.

There are already over half a billion Excel users on this planet.

So, we have put together a short tutorial on the Hortonworks Sandbox where we walk through the end-to-end data pipeline using HDP and Microsoft Excel in the shoes of a data analyst at a financial services firm where she:

  • Cleans and aggregates 10 years of raw stock tick data from NYSE
  • Enriches the data model by looking up additional attributes from Wikipedia
  • Creates an interactive visualization on the model

You can find the tutorial here.

As part of this process you will experience how simple it is to integrate HDP with the Microsoft Power BI platform.

This integration is made possible by the community work to design and implement WebHDFS, an open REST API in Apache Hadoop. Microsoft used the API from Power Query for Excel to make the integration to Microsoft Business Intelligence platform seamless.
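WebHDFS is plain HTTP. Each operation is a request against a path under `/webhdfs/v1`, and the operation is named in an `op` query parameter. Below is a minimal Python sketch of building those URLs and calling `LISTSTATUS`, which returns JSON listing entries under `FileStatuses.FileStatus`. The host name is hypothetical. Port 50070 was the usual NameNode HTTP port in Hadoop 1.x/2.x; Hadoop 3 moved it to 9870, so check your own cluster. Clusters using simple authentication may also expect a `user.name` parameter, which `params` can carry.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host: str, path: str, op: str, port: int = 50070,
                params: Optional[Dict[str, object]] = None) -> str:
    """Build http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&... ."""
    if not path.startswith("/"):
        raise ValueError("HDFS paths must be absolute")
    query = urlencode({"op": op.upper(), **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def list_status(host: str, path: str, port: int = 50070) -> List[str]:
    """Names in an HDFS directory via op=LISTSTATUS (needs a live cluster)."""
    with urlopen(webhdfs_url(host, path, "LISTSTATUS", port)) as resp:
        body = json.load(resp)
    return [entry["pathSuffix"] for entry in body["FileStatuses"]["FileStatus"]]


if __name__ == "__main__":
    # Hypothetical sandbox host name; no request is sent here.
    print(webhdfs_url("sandbox.example", "/user/analyst/ticks", "liststatus"))
    # http://sandbox.example:50070/webhdfs/v1/user/analyst/ticks?op=LISTSTATUS
```

Any client that speaks HTTP can use the API, and that is what lets something like Power Query talk to Hadoop without Hadoop-specific libraries.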

Happy Hadooping!!!

Opening up Hadoop to half a billion users can only drive the development of the Hadoop ecosystem.

Which will in turn return more benefits to the Excel user community, which will drive usage of Excel.

That’s what I call a smart business strategy.

You?

PS: Where are similar strategies possible for subject identity?

January 20, 2014

Microsoft Research adopts Open Access… [Write to MS]

Filed under: Microsoft,Open Access — Patrick Durusau @ 3:43 pm

Microsoft Research adopts Open Access policy for publications

From the post:

In a recent interview with Scientific American, Peter Lee, head of Microsoft Research, discussed three main motivations for basic research at Microsoft. The first relates to an aspiration to advance human knowledge, the second derives from a culture that relies deeply on the ambitions of individual researchers, and the last concerns “promoting open publication of all research results and encouraging deep collaborations with academic researchers.”

It is in keeping with this third motivation that Microsoft Research recently committed to an Open Access policy for our researchers’ publications.

As evidenced by a long-running series of blog posts by Tony Hey, vice president of Microsoft Research Connections, Microsoft Research has carefully deliberated our role in the growing movement toward open publications and open data.

This is great news. When Microsoft steps, it’s a big step. Heard near and far.

Take the time to write to anyone you know at Microsoft just to say you appreciate the decision.

We all write to them to complain about MS products, so why not write a nice note about open access?

It won’t take five (5) minutes if you open up your email client right now. (I wrote one before I posted this entry.)

November 9, 2013

Analyzing Social Media Networks using NodeXL [D.C., Nov. 13th]

Filed under: Graphs,Microsoft,Networks,NodeXL,Visualization — Patrick Durusau @ 8:22 pm

Analyzing Social Media Networks using NodeXL by Marc Smith.

From the post:

I am excited to have the opportunity to present a NodeXL workshop with Data Community DC on November 13th at 6pm in Washington, D.C.

In this session I will describe the ways NodeXL can simplify the process of collecting, storing, analyzing, visualizing and publishing reports about connected structures. NodeXL supports the exploration of social media with import features that pull data from personal email indexes on the desktop, Twitter, Flickr, Youtube, Facebook and WWW hyperlinks.

NodeXL allows non-programmers to quickly generate useful network statistics and metrics and create visualizations of network graphs. Filtering and display attributes can be used to highlight important structures in the network. Innovative automated layouts make creating quality network visualizations simple and quick.

Apologies for the short notice but I just saw the workshop announcement today.

If you are in the D.C. area and have any interest in graphs or visualization at all, you need to catch this presentation.

If you don’t believe me, take a look at the NodeXL gallery that Marc mentions in his post:

http://nodexlgraphgallery.org/Pages/Default.aspx

Putting graph visualization into the hands of users?

August 15, 2013

Video Tutorials on Hadoop for Microsoft Developers

Filed under: Hadoop,Microsoft — Patrick Durusau @ 7:05 pm

Video Tutorials on Hadoop for Microsoft Developers by Marc Holmes.

From the post:

If you’re a Microsoft developer and stepping into Hadoop for the first time with HDP for Windows, then we thought we’d highlight this fantastic resource from Rob Kerr, Chris Campbell and Garrett Edmondson : the MSBIAcademy.

They’ve produced a high quality, practical series of videos covering anything from essential MapReduce concepts, to using .NET (in this case C#) to submit MapReduce jobs to HDInsight, to using Apache Pig for Web Log Analysis. As you may know, HDInsight is based on Hortonworks HDP platform.

More resources on Hadoop by Microsoft! (see: Microsoft as Hadoop Leader)

The more big data, the greater the need for accurate and repeatable semantics.

Go big data!

August 12, 2013

Microsoft as Hadoop Leader

Filed under: Hadoop,Microsoft,REEF — Patrick Durusau @ 3:03 pm

Microsoft to open source a big data framework called REEF by Derrick Harris.

From the post:

Microsoft has developed a big data framework called REEF (a graciously simple acronym for Retainable Evaluator Execution Framework) that the company intends to open source in about a month. REEF is designed to run on top of YARN, the next-generation resource manager for Hadoop, and is particularly well suited for building machine learning jobs.

Microsoft Technical Fellow and CTO of Information Services Raghu Ramakrishnan explained REEF and Microsoft’s plans to open source it during a Monday morning keynote at the International Conference on Knowledge Discovery and Data Mining, taking place in Chicago.

YARN is a resource manager developed as part of the Apache Hadoop project that lets users run and manage multiple types of jobs (e.g., batch MapReduce, stream processing with Storm and/or a graph-processing package) atop the same cluster of physical machines. This makes it possible not only to consolidate the number of systems that an organization has to manage, but also to run different types of analysis on top of the same data from the same place. In some cases, the entire data workflow can be carried out on just one cluster of machines.

This is very good news!

In part because it furthers the development of the Hadoop ecosystem.

But also because it reinforces the Microsoft commitment to the Hadoop ecosystem.

If you think of TCP/IP as a roadway, consider the value of goods and services moving along it.

Now think of the Hadoop ecosystem as another roadway.

An interoperable and high-speed roadway for data and data analysis.

Who has user facing applications that rely on data and data analysis? 😉

Here’s to hoping that MS doubles down on the Hadoop ecosystem!

May 21, 2013

Hadoop, Hadoop, Hurrah! HDP for Windows is Now GA!

Filed under: Hadoop,Hortonworks,Microsoft — Patrick Durusau @ 4:54 pm

Hadoop, Hadoop, Hurrah! HDP for Windows is Now GA! by John Kreisa.

From the post:

Today we are very excited to announce that Hortonworks Data Platform for Windows (HDP for Windows) is now generally available and ready to support the most demanding production workloads.

We have been blown away with the number and size of organizations who have downloaded the beta bits of this 100% open source, and native to Windows distribution of Hadoop and engaged Hortonworks and Microsoft around evolving their data architecture to respond to the challenges of enterprise big data.

With this key milestone HDP for Windows offers the millions of customers running their business on Microsoft technologies an ecosystem-friendly Hadoop-based solution that is built for the enterprise and purpose built for Windows. This release cements Apache Hadoop’s role as a key component of the next generation enterprise data architecture, across the broadest set of datacenter configurations as HDP becomes the first production-ready Apache Hadoop distribution to run on both Windows and Linux.

Additionally, customers now also have complete portability of their Hadoop applications between on-premise and cloud deployments via HDP for Windows and Microsoft’s HDInsight Service.

Two lessons here:

First, Hadoop is a very popular way to address enterprise big data.

Second, going where users are, not where they ought to be, is a smart business move.

May 17, 2013

Hadoop SDK and Tutorials for Microsoft .NET Developers

Filed under: .Net,Hadoop,MapReduce,Microsoft — Patrick Durusau @ 3:39 pm

Hadoop SDK and Tutorials for Microsoft .NET Developers by Marc Holmes.

From the post:

Microsoft has begun to treat its developer community to a number of Hadoop-y releases related to its HDInsight (Hadoop in the cloud) service, and it’s worth rounding up the material. It’s all Alpha and Preview so YMMV but looks like fun:

  • Microsoft .NET SDK for Hadoop. This kit provides .NET API access to aspects of HDInsight including HDFS, HCatalog, Oozie and Ambari, and also some PowerShell scripts for cluster management. There are also libraries for MapReduce and LINQ to Hive. The latter is really interesting as it builds on the established technology for .NET developers to access most data sources to deliver the capabilities of the de facto standard for Hadoop data query.
  • HDInsight Labs Preview. Up on Github, there is a series of 5 labs covering C#, JavaScript and F# coding for MapReduce jobs, using Hive, and then bringing that data into Excel. It also covers some Mahout use to build a recommendation engine.
  • Microsoft Hive ODBC Driver. The examples above use this preview driver to enable the connection from Hive to Excel.

If all of the above excites you, our Hadoop on Windows for Developers training course also covers similar content in a lot of depth.

Hadoop is coming to an office/data center near you.

Will you be ready?
