Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

June 3, 2014

A first-person engine in 265 lines

Filed under: Games,Graphics,Interface Research/Design,Visualization — Patrick Durusau @ 6:18 pm

A first-person engine in 265 lines

From the post:

Today, let’s drop into a world you can reach out and touch. In this article, we’ll compose a first-person exploration from scratch, quickly and without difficult math, using a technique called raycasting. You may have seen it before in games like Daggerfall and Duke Nukem 3D, or more recently in Notch Persson’s ludum dare entries. If it’s good enough for Notch, it’s good enough for me!
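To make the technique concrete, here is a toy raycaster in Python (the article's engine is JavaScript; the map, step size, and height scaling below are illustrative assumptions, not the article's code). One ray is marched per screen column, and nearer wall hits produce taller wall slices:

```python
import math

# Toy 8x8 map: "1" cells are walls, "0" cells are open floor.
# The solid border keeps every ray in bounds.
MAP = [
    "11111111",
    "10000001",
    "10010001",
    "10000001",
    "10000101",
    "10000001",
    "10000001",
    "11111111",
]

def cast_ray(px, py, angle, step=0.01, max_dist=16.0):
    """March along the ray from (px, py) until it enters a wall cell."""
    dx, dy = math.cos(angle), math.sin(angle)
    dist = 0.0
    while dist < max_dist:
        dist += step
        if MAP[int(py + dy * dist)][int(px + dx * dist)] == "1":
            break
    return dist

def wall_heights(px, py, facing, fov=math.pi / 3, columns=32):
    """Compute one wall-slice height per screen column."""
    heights = []
    for col in range(columns):
        angle = facing - fov / 2 + fov * (col / columns)
        dist = cast_ray(px, py, angle)
        dist *= math.cos(angle - facing)  # undo fish-eye distortion
        heights.append(int(8 / dist))     # nearer walls draw taller slices
    return heights

print(wall_heights(px=3.5, py=3.5, facing=0.0))
```

Rendering is then just drawing each column as a vertical bar of its computed height, which is why the whole engine fits in a few hundred lines.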

Not a short exercise, but I like the idea of quick-to-develop interfaces.

Do you know if in practice it makes it easier to change/discard interfaces?

Thanks!

I first saw this in a tweet by Hunter Loftis.

GitBook: Write Books using Markdown on OpenShift

Filed under: Books,Publishing,Typography — Patrick Durusau @ 4:39 pm

GitBook: Write Books using Markdown on OpenShift by Marek Jelen.

From the post:

GitBook is a tool for using Markdown to write books, which are converted to dynamic websites or exported to static formats like PDF. GitBook also integrates with Git and GitHub, adding a social element to the book creation process.

If you are exporting your book into an HTML page, interactive aspects are also embeddable. At the time of this writing, the system provides support for quizzes and JavaScript exercises. However, the tool is fully open source and written using Node.js, so you are free to extend the functionality to meet your needs.
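If you want a feel for the mechanics: a GitBook project is an ordinary Git repository whose table of contents lives in a SUMMARY.md file, with chapters as plain Markdown files. A minimal sketch (the file layout follows GitBook's convention; the chapter names are made up):

```markdown
# Summary

* [Introduction](README.md)
* [Getting Started](getting-started.md)
* [Advanced Topics](advanced/README.md)
    * [Exercises](advanced/exercises.md)
```

Point the gitbook command-line tool at a repository laid out this way and it builds the website or the static exports.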

The GitBook Learn JavaScript is used as an example of a book produced with GitBook.

It’s readable, but in terms of the publishing craft of the Mikraot Gedolot or The Art of Computer Programming (TAOCP), it’s not in the same league.

Still, it may be useful for one-off exports from topic maps and other data sources.

Talend 5.5 (DIY Data Integration)

Filed under: Data Integration,Integration,Talend — Patrick Durusau @ 3:09 pm

Talend Increases Big Data Integration Performance and Scalability by 45 Percent

From the post:

Only Talend 5.5 allows developers to generate high performance Hadoop code without needing to be an expert in MapReduce or Pig

(BUSINESS WIRE)–Hadoop Summit — Talend, the global big data integration software leader, today announced the availability of Talend version 5.5, the latest release of the only integration platform optimized to deliver the highest performance on all leading Hadoop distributions.

Talend 5.5 enhances Talend’s performance and scalability on Hadoop by an average of 45 percent. Adoption of Hadoop is skyrocketing and companies large and small are struggling to find enough knowledgeable Hadoop developers to meet this growing demand. Only Talend 5.5 allows any data integration developer to use a visual development environment to generate native, high performance and highly scalable Hadoop code. This unlocks a large pool of development resources that can now contribute to big data projects. In addition, Talend is staying on the cutting edge of new developments in Hadoop that allow big data analytics projects to power real-time customer interactions.

….

Version 5.5 of all Talend open source products is available for immediate download from Talend’s website, www.talend.com. Experimental support for Spark code generation is also available immediately and can be downloaded from the Talend Exchange on Talendforge.org. Version 5.5 of the commercial subscription products will be available within 3 weeks and will be provided to all existing Talend customers as part of their subscription agreement. Products can also be procured through the usual Talend representatives and partners.

To learn more about Talend 5.5 with 45 percent faster Big Data integration performance, register here for our June 10 webinar.
….

When you think of the centuries it took to go from the movable type press to modern word processing and near-professional printing/binding capabilities, the enabling of users to perform data processing/integration is nothing short of amazing.

Data scientists need not fear DIY data processing/integration any more than your local bar association fears “How to Avoid Probate” books on the newsstand.

I don’t doubt people will be able to get some answer out of data crunching software but did they get a useful answer? Or an answer sufficient to set company policy? Or an answer that will increase their bottom line?

Encourage the use of open source software. Non-clients who use it poorly will fail. Make sure they can’t say the same about your clients.

BTW, the webinar appears to be scheduled for thirty (30) minutes. Thirty minutes on Talend 5.5? You will be better off spending that thirty minutes with Talend 5.5.

Google Spreadsheets -> R

Filed under: R,Spreadsheets — Patrick Durusau @ 2:01 pm

Reading data from the new version of Google Spreadsheets by Andrie de Vries.

From the post:

Spreadsheets remain an important way for people to share and work with data. Among other providers, Google has provided the ability to create online spreadsheets and other documents.

Back in 2009, David Smith posted a blog entry on how to use R, and specifically the XML package to import data from a Google Spreadsheet. Once you marked your Google sheet as exported, it took about two lines of code to import your data into a data frame.

But things have changed

More recently, it seems that Google changed and improved the Spreadsheet product. Google's own overview of changes lists some changes, but one change isn't on this list. In the previous version, it was possible to publish a sheet as a csv file. In the new version it is still possible to publish a sheet, but the ability to do this as csv is no longer there.

On April 5, 2014 somebody asked a question on StackOverflow on how to deal with this.

Because I had the same need to import data from a spreadsheet shared in our team, I set out to find an answer.
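The post works the problem in R; as a rough sketch of the same idea in Python, assuming the sheet is shared publicly and that the new-style export URL pattern below holds (both assumptions worth verifying against your own sheet):

```python
import csv
import io
import urllib.request

# Hypothetical sheet key; substitute the key from your sheet's URL.
SHEET_KEY = "your-sheet-key"
URL = f"https://docs.google.com/spreadsheets/d/{SHEET_KEY}/export?format=csv"

with urllib.request.urlopen(URL) as response:
    text = response.read().decode("utf-8")

rows = list(csv.reader(io.StringIO(text)))
header, data = rows[0], rows[1:]
print(header)
print(f"{len(data)} data rows imported")
```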

Deep problems require a lot of time to solve but you feel productive after having solved them.

Solving shallow problems that eat up nearly as much time as deep ones, not so much.

Posts like this one can save you from re-inventing a solution or scouring the web for one, if not both.

File this under Google spreadsheets, extraction.

The Rise of Gremlitron

Filed under: Graphs,Gremlin — Patrick Durusau @ 1:48 pm

TinkerPop3 RFC Release — The Rise of Gremlitron by Marko A. Rodriguez.

From the post:

TinkerPop3’s SNAPSHOT release is now ready for review, comments, and brave souls wishing to do implementations.
CODE: https://github.com/tinkerpop/tinkerpop3
DOCS: http://tinkerpop.com/docs/current/

There are lots of new things about TinkerPop3 and I would like to take the time to review some of the best parts here:

1. Blueprints, Frames, Pipes, Furnace, and Rexster are no longer terms…
– Blueprints => Gremlin Structure
– Blueprints/Pipes => Gremlin Process
– Frames => Gremlin DSLs
– Furnace => Gremlin OLAP (GraphComputer)
– Rexster => Gremlin Server

…..

[Image: Gremlitron]

Marko has always had a way with images!

In order to appreciate all the changes in this release of Gremlin, you will need to take the test drive. Reading the short descriptions or kicking the tires is no substitute for trying it against your existing or anticipated graphs.

I would call out the obvious topic map issue, that of changing the traditional names to “Gremlin + (some string).”

I rather doubt anyone is going to hunt down existing email, documentation, notes, presentations, etc. and clean up all the references to Blueprints, Frames, Pipes, Furnace and Rexster. How important is that? Hard to say right now but it is the sort of issue that topic maps were designed to solve.

Could be important in terms of researching prior art, assuming that U.S. patent law continues to deteriorate. I’m thinking about patenting numerical order. Oops! Should not have said that! 😉

Enjoy!

June 2, 2014

Cassandra 2.1 (1st RC)

Filed under: Cassandra — Patrick Durusau @ 7:09 pm

Since we were just talking about Cassandra in connection with Titan, I thought you would be interested in the newest release candidate for Cassandra 2.1.

Download here (Under Development Cassandra Server Releases (not production ready)).

Changes.

Test at your own risk but I am sure useful bug reports will be deeply appreciated.

Follow: How to File a Good Bug Report or similar documents.

Powers of Ten – Part II

Filed under: Faunus,Graphs,Gremlin,Titan — Patrick Durusau @ 6:54 pm

Powers of Ten – Part II by Stephen Mallette.

From the post:

“‘Curiouser and curiouser!’ cried Alice (she was so much surprised, that for the moment she quite forgot how to speak good English); ‘now I’m opening out like the largest telescope that ever was!’”
    — Lewis Carroll, Alice’s Adventures in Wonderland

It is sometimes surprising to see just how much data is available. Much like Alice and her sudden increase in height, in Lewis Carroll’s famous story, the upward growth of data can happen quite quickly and the opportunity to produce a multi-billion edge graph becomes immediately present. Luckily, Titan is capable of scaling to accommodate such size and with the right strategies for loading this data, the development efforts can more rapidly shift to the rewards of massive scale graph analytics.

This article represents the second installment in the two part Powers of Ten series that discusses bulk loading data into Titan at varying scales. For purposes of this series, the “scale” is determined by the number of edges to be loaded. As it so happens, the strategies for bulk loading tend to change as the scale increases over powers of ten, which creates a memorable way to categorize different strategies. “Part I” of this series looked at strategies for loading millions and tens of millions of edges and focused on usage of Gremlin to do so. This part of the series will focus on hundreds of millions and billions of edges and on the usage of Faunus as the loading tool.

Note: By Titan 0.5.0, Faunus will be pulled into the Titan project under the name Titan/Hadoop.

Scaling graph processing to hundreds of millions and billions of edges.

Deeply interesting work but I am left with multiple questions:

  • Hundreds of millions and billions of edges to load. Any other graph metrics? Traversal, for example?
  • Does loading performance scale with more servers? Instead of m2.4xlarge EC2 instances, what is the performance with 8x?
  • What kind of knob tuning was useful with a social network dataset?

I am sure there are other questions but those are the first ones that came to mind.

Light Table 0.6.6

Filed under: Clojure,Functional Programming — Patrick Durusau @ 4:36 pm

Light Table 0.6.6 by Chris Granger.

From the post:

Happy to announce a new, and fairly big, release of Light Table today! The highlight of this release comes from moving LT to CodeMirror 4, which gives us multiple cursors, tons of performance improvements, and a few other little editing niceties. Here's a list of the new multiple cursors commands:

  • Editor: Set selection to top most cursor
  • Editor: Clear multiple cursors
  • Editor: Insert line after
  • Editor: Insert line before
  • Editor: Select next occurrence of word
  • Editor: Select between brackets
  • Editor: Select scope
  • Editor: Go to bracket
  • Editor: Swap line up
  • Editor: Swap line down
  • Editor: Join lines
  • Editor: Duplicate line
  • Editor: Sort lines
  • Editor: Sort lines insensitive
  • Editor: Select lines upward with multiple cursors
  • Editor: Select lines downward with multiple cursors
  • Editor: Split selection into cursors per line

If you aren’t already wedded to an editor or IDE, now would be a good time to take a look at Light Table.

openFDA

Filed under: Government,Government Data,Medical Informatics,Open Access,Open Data — Patrick Durusau @ 4:30 pm

openFDA

Not all the news out of government is bad.

Consider openFDA, which is putting

More than 3 million adverse drug event reports at your fingertips.

From the “about” page:

OpenFDA is an exciting new initiative in the Food and Drug Administration’s Office of Informatics and Technology Innovation spearheaded by FDA’s Chief Health Informatics Officer. OpenFDA offers easy access to FDA public data and highlights projects using these data in both the public and private sector to further regulatory or scientific missions, educate the public, and save lives.

What does it do?

OpenFDA provides API and raw download access to a number of high-value structured datasets. The platform is currently in public beta with one featured dataset, FDA’s publicly available drug adverse event reports.

In the future, openFDA will provide a platform for public challenges issued by the FDA and a place for the community to interact with each other and FDA domain experts with the goal of spurring innovation around FDA data.

We’re currently focused on working on datasets in the following areas:

  • Adverse Events: FDA’s publicly available drug adverse event reports, a database that contains millions of adverse event and medication error reports submitted to FDA covering all regulated drugs.
  • Recalls (coming soon): Enforcement Report and Product Recalls Data, containing information gathered from public notices about certain recalls of FDA-regulated products
  • Documentation (coming soon): Structured Product Labeling Data, containing detailed product label information on many FDA-regulated products

We’ll be releasing a number of updates and additional datasets throughout the upcoming months.
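To give a sense of the API, here is a minimal sketch against the drug adverse event endpoint (the endpoint path comes from openFDA's launch materials; the search field and example term are assumptions to check against the current documentation):

```python
import json
import urllib.request

# Ask openFDA for adverse event reports that mention a drug name.
url = (
    "https://api.fda.gov/drug/event.json"
    "?search=patient.drug.medicinalproduct:aspirin&limit=1"
)

with urllib.request.urlopen(url) as response:
    payload = json.load(response)

print(payload["meta"]["results"])            # paging info: skip, limit, total
print(payload["results"][0]["receivedate"])  # date the first report was received
```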

OK, I’m Twitter follower #522 @openFDA.

What’s your @openFDA number?

A good experience, i.e., people making good use of released data, asking for more data, etc., is what will drive more open data. Make every useful government data project count.

Congress Glowers At NSA(?)

Filed under: Cybersecurity,NSA — Patrick Durusau @ 4:12 pm

The NSA Is Put on Notice Over Encryption Standards by Justin Elliott.

I was pretty excited until I read:

The amendment adopted last week by the House Committee on Science, Space, and Technology would remove an existing requirement in the law that NIST consult with the NSA on encryption standards.

In case you want to be uber-precise, the amendment reads as follows:

AMENDMENT OFFERED BY MR. GRAYSON OF FLORIDA TO THE AMENDMENT IN THE NATURE OF A SUBSTITUTE

Page 101, after line 9, insert the following new section:

SEC. 411. INFORMATION SYSTEMS STANDARDS CONSULTATION

Section 20(c)(1) of the National Institute of Standards and Technology Act (15 U.S.C. 278g-3(c)(1)) is amended by striking “the National Security Agency,”.

You can imagine that the NSA wonks are rolling around on the floor after reading this news. Not out of frustration over congressional interference, but out of gut-busting laughter that even members of Congress could be this dumb.

The section in question presently reads:

(c) Development of standards and guidelines

In developing standards and guidelines required by subsections (a) and (b) of this section, the Institute shall–

(1) consult with other agencies and offices (including, but not limited to, the Director of the Office of Management and Budget, the Departments of Defense and Energy, the National Security Agency, the Government Accounting Office, and the Secretary of Homeland Security) to assure–

The amendment takes out the mandatory requirement that NIST consult with the NSA. Or does it?

The really funny part comes when you read “…subsection (b) of the section…”

(b) Minimum requirements for standards and guidelines

The standards and guidelines required by subsection (a) of this section shall include, at a minimum—

…..

(3) guidelines developed in coordination with the National Security Agency for identifying an information system as a national security system consistent with applicable requirements for national security systems, issued in accordance with law and as directed by the President.

Assuming you would credit an agency with the NSA’s record with any intent to obey laws passed by Congress, note that the NSA will still be around to slap NIST around on “national security systems.”

I don’t doubt the good faith of the folks at NIST, but when talking about encryption with the NSA, they are simply out of their league. As are members of Congress.

There are any number of possible solutions to government surveillance issues, but administrative slights aren’t one of them.

HTML5 vs. XML: War?

Filed under: HTML5,XML — Patrick Durusau @ 3:11 pm

I stole part of the title from a tweet by Deborah A. Lapeyre that reads:

HTML5 and XML: War? Snub fest? Harmony? How should they interact? pre-Balisage 1-day Symposium. Come be heard! https://www.balisage.net/HTML5-XML/index.html

As you will gather from the tweet, Balisage is having a one day pre-conference meeting on HTML5 and XML. From the Symposium page:

Despite a decade of efforts dedicated to making XML the markup language of the Web, today it is HTML5 that has taken on that role. While HTML5 can in part be made to work with an XML syntax, reliance on that feature is rare compared to use of HTML5’s own syntax.

Over the years, the competition between these two approaches has led to animosity and frustration. But both XML and HTML5 are now clearly here to stay, and with the upcoming standardisation of HTML5 in 2014 it is now time to take stock and see how both technologies — and both communities — can coöperate constructively.

There are many environments in which these two markup languages are brought to interact. Additionally, there is much that they can learn from one another. We are looking forward to sharing experiences and ideas that bring the two together.

Does HTML5 have the role of markup language of the Web?

As far as layout engines go, you would have to say “partial support” for HTML5 at best.

And the number I was hearing last year was 10% of the Web using HTML5. Have you heard differently?

I’m sure the W3C is absolutely certain that HTML5 is the very thing for the Web but remember it wasn’t all that long ago that they abandoned their little red RDF wagon to its own fate. With enough money you can promote anything for more than a decade. Adoption, well, that’s something else entirely.

For me the obvious tip-off about HTML5 came from its description in the Wikipedia article on HTML5:

It includes detailed processing models to encourage more interoperable implementations;

Anyone who needs “detailed processing models” for “interoperability” doesn’t understand the nature of markup languages.

Markup languages capture the structure of documents in order for documents to be interchanged between applications. So long as an application can parse the document into its internal model and deliver the expected result to its user, then the document is “interoperable” between the applications.

What the W3C is attempting to hide behind its processing models is forcing users to view materials as defined by others. That is, they want to take away your right to view and/or process a document as you want, such as avoiding advertising or reformatting a document after removing the advertising.

Do you remember The Rocky Horror Picture Show? Janet’s comment about Rocky was, “Well, I don’t like men with too many muscles.”

And Dr. Frank N. Furter’s response?

I didn’t make him for you!

Same can be said for HTML5. They didn’t make it for you.

If you think differently, bring your false gods to the HTML5 and XML: Mending Fences: A Balisage pre-conference symposium. Stay for the conference. It will give you time to find a new set of false gods.

June 1, 2014

OrientDB 1.7 is out!

Filed under: Graphs,OrientDB — Patrick Durusau @ 7:05 pm

OrientDB 1.7 is out!

From the post:

Breaking news: OrientDB 1.7 is out! We made OrientDB faster than before and with new exciting features like Distributed Sharding, the support for Lucene indexes (Full-Text and GEO-Spatial), SSL connections, Parallel queries and more.

To download OrientDB 1.7 go to: http://www.orientechnologies.com/download/

What’s new?

Core

  • New “minimumclusters” to auto-create X clusters per class
  • New cluster strategy to pick the cluster. Available round-robin, default and balanced
  • Added record locking via API
  • Removed rw/locks on schema and index manager
  • Cached most used users and roles in RAM (configurable)

See the full listing of new features at the announcement or, better yet, download OrientDB 1.7 and try the new features out!

More Science for Computer Science

Filed under: Computer Science,Design,Science,UML — Patrick Durusau @ 6:44 pm

In Debunking Linus’s Law with Science I pointed you to a presentation by Felienne Hermans outlining why the adage:

given enough eyeballs, all bugs are shallow

is not only false but that the exact opposite is in fact true: the more people who participate in the development of software, the more bugs it will contain.

Remarkably, I have found another instance of the scientific method being applied to computer science.

The abstract for On the use of software design models in software development practice: an empirical investigation by Tony Gorschek, Ewan Tempero, and Lefteris Angelis reads as follows:

Research into software design models in general, and into the UML in particular, focuses on answering the question how design models are used, completely ignoring the question if they are used. There is an assumption in the literature that the UML is the de facto standard, and that use of design models has had a profound and substantial effect on how software is designed by virtue of models giving the ability to do model-checking, code generation, or automated test generation. However for this assumption to be true, there has to be significant use of design models in practice by developers.

This paper presents the results of a survey summarizing the answers of 3785 developers answering the simple question on the extent to which design models are used before coding. We relate their use of models with (i) total years of programming experience, (ii) open or closed development, (iii) educational level, (iv) programming language used, and (v) development type.

The answer to our question was that design models are not used very extensively in industry, and where they are used, the use is informal and without tool support, and the notation is often not UML. The use of models decreased with an increase in experience and increased with higher level of qualification. Overall we found that models are used primarily as a communication and collaboration mechanism where there is a need to solve problems and/or get a joint understanding of the overall design in a group. We also conclude that models are seldom updated after initially created and are usually drawn on a whiteboard or on paper.

I plan on citing this paper the next time someone claims that UML diagrams will be useful for readers of a standard.

If you are interested in fact correction issues at Wikipedia, you might want to take a look at this statement in the article on UML:

UML has been found useful in many design contexts,[5] so much so that is has become ubiquitous in its field.

At least the second half of it, “so much so that is has become ubiquitous in its field,” appears to be false.

Do you know of any other uses of science with regard to computer science?

I first saw this in a tweet by Erik Meijer.

Pen vs. Keyboard: Choose Wisely

Filed under: Education,Interface Research/Design — Patrick Durusau @ 6:12 pm

Students retain information better with pens than laptops by Laura Sanders.

From the post:

When it comes to taking notes, the old-fashioned way might be best. Students who jotted down notes by hand remembered lecture material better than their laptop-wielding peers did, researchers report April 23 in Psychological Science.

People taking notes on laptops have a shallower grasp of a subject than people writing with their hands, and not just because laptops distract users with other activities such as web surfing, the new study suggests.
….

The study in question: P.A. Mueller and D.M. Oppenheimer. The pen is mightier than the keyboard: advantages of longhand over laptop note taking. Psychological Science. Published online April 23, 2014. doi: 10.1177/0956797614524581.

Laura lists some resources for further reading.

What do you think this study means for the design of UIs?

I ask because some topic map UIs will be for information retrieval, where conceptual understanding isn’t at issue, and others will be for imparting conceptual understandings.

What would you do differently in UI terms for those cases and just as importantly, why?

I first saw this in a tweet by Carl Anderson.
