Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

February 5, 2014

Speeding Up Big Data

Filed under: BigData,Flash Storage — Patrick Durusau @ 1:41 pm

Novel Storage Technique Speeds Big Data Processing by Tiffany Trader.

From the post:

Between the data deluge and the proliferation of uber-connected devices, the amount of data that must be stored and processed has exploded to a mind-boggling degree. One commonly cited statistic from Google Chairman Eric Schmidt holds that every two days humankind creates as much information as it did from the dawn of civilization up until 2003.

“Big data” technologies have evolved to get a handle on this information overload, but in order to be useful, the data must be stored in such a way that it is easily retrieved when needed. Until now, high-capacity, low-latency storage architectures have only been available on very high-end systems, but recently a group of MIT scientists have proposed an alternative approach, a novel high-performance storage architecture they call BlueDB (Blue Database Machine) that aims to accelerate the processing of very large datasets.

The researchers from MIT’s Department of Electrical Engineering and Computer Science have written about their work in a paper titled Scalable Multi-Access Flash Store for Big Data Analytics.
….

See the paper for a low-level view and Tiffany’s post for a high-level one.

BTW, the result of this research, BlueDB, will be demonstrated at the International Symposium on Field-Programmable Gate Arrays in Monterey, California.

A good time to start thinking about how data structures have been influenced by storage speed.

Is normalization a useful optimization with < 1 billion records? Maybe today, but what about six months from now?
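If it helps to make the question concrete, here is a toy Clojure sketch (made-up data, nothing from the paper) of the trade-off: a normalized layout pays for a join on every read, a denormalized one pays in duplicated data on every write. Faster storage changes which of those costs you notice.

;; Toy data, purely illustrative.
(def authors {1 {:name "Tiffany Trader"}})

(def posts-normalized
  [{:id 100 :title "Novel Storage Technique Speeds Big Data Processing" :author-id 1}])

(defn post-with-author
  "Join a post to its author at read time."
  [post]
  (assoc post :author (get-in authors [(:author-id post) :name])))

(def posts-denormalized
  [{:id 100
    :title "Novel Storage Technique Speeds Big Data Processing"
    :author "Tiffany Trader"}])

;; (map post-with-author posts-normalized) and posts-denormalized give the same
;; view of the data; the difference is whether you pay on every read or on every write.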

I first saw this in a tweet by Stefano Bertolo.

Proof Theory Foundations

Filed under: CS Lectures,Proof Theory — Patrick Durusau @ 1:02 pm

Frank Pfenning’s lectures from the Oregon Programming Languages School 2012, University of Oregon.

Lecture 1

Lecture 2

Lecture 3

Lecture 4

Unlike the astronomer in Rasselas (Chapter 41), in serious CS discussions you cannot get by on simply “knowing” you are correct. 😉

Taught along with Category Theory Foundations and Type Theory Foundations.

Type Theory Foundations

Filed under: CS Lectures,Types — Patrick Durusau @ 11:54 am

Robert Harper’s lectures from the Oregon Programming Languages School 2012, University of Oregon.

Lecture 1

Lecture 2

Lecture 3

Lecture 4

Lecture 5

Lecture 6

If you are going to follow Walter Bright in writing a new computer language, you will need to study types.

Taught along with Category Theory Foundations and Proof Theory Foundations.

Category Theory Foundations

Filed under: Category Theory,CS Lectures — Patrick Durusau @ 11:46 am

Steve Awodey’s lectures from the Oregon Programming Languages School 2012, University of Oregon.

Homework assignments

Lecture 1

Lecture 2

Lecture 3

Lecture 4

I first saw this in a tweet by Jim Duey.

More cold weather is coming and football (U.S.) season is over. 😉

Taught along with Proof Theory Foundations and Type Theory Foundations.

February 4, 2014

Data Structures in Clojure:…

Filed under: Clojure,Data Structures,Lisp,Programming — Patrick Durusau @ 9:23 pm

Data Structures in Clojure: Singly-Linked Lists by Max Countryman.

From the post:

This series is about the implementation of common data structures. Throughout the series we will be implementing these data structures ourselves, exploring how they work. Our implementations will be done in Clojure. Consequently this tutorial is also about Lisp and Clojure. It is important to note that, unlike the standard collection of data structures found in Clojure, which are persistent, ours will be mutable.

To start with, we will explore a linked list implementation using deftype (more on this later) to define a JVM class from Clojure-land. This implementation will be expanded to include in-place reversal. Finally we will utilize Clojure’s built-in interfaces to give our linked list access to some of the methods Clojure provides.

If you aren’t going to invent your own computer language, why not learn an existing one better?
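To give you a taste of where the series starts, here is a minimal sketch of a mutable node type in the spirit of the post. The names and API are mine, not Max's, so treat it as an outline rather than his code.

(definterface INode
  (getCar [])
  (getCdr [])
  (setCar [x])
  (setCdr [n]))

;; Mutable fields let us change links in place, unlike Clojure's persistent lists.
(deftype Node [^:volatile-mutable car ^:volatile-mutable cdr]
  INode
  (getCar [_] car)
  (getCdr [_] cdr)
  (setCar [_ x] (set! car x))
  (setCdr [_ n] (set! cdr n)))

(defn ->linked-list
  "Build a singly-linked list of Nodes from a Clojure seq."
  [xs]
  (reduce (fn [tail x] (Node. x tail)) nil (reverse xs)))

(defn reverse!
  "In-place reversal: walk the list, pointing each node's cdr at the previous node."
  [head]
  (loop [prev nil, curr head]
    (if curr
      (let [nxt (.getCdr curr)]
        (.setCdr curr prev)
        (recur curr nxt))
      prev)))

;; (reverse! (->linked-list [1 2 3])) returns the node holding 3, now the new head.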

The next post is on hash tables.

Enjoy!

Multi-Dimensional Images / Data Cubes

Filed under: Astroinformatics,Data Cubes — Patrick Durusau @ 9:09 pm

Accessing Multi-Dimensional Images and Data Cubes In the Virtual Observatory by Bruce Berriman.

From the post:

New instruments and missions are routinely producing multi-dimensional datasets, such as Doppler velocity cubes and time-resolved movies. Observatories such as ALMA and new integral field spectrographs on ground-based telescopes are generating data cubes, and future missions such as LSST and JWST will generate ever larger volumes of them. Thus the VO, via its standards body the International Virtual Observatory Alliance (IVOA), has made it a priority by September 2014 of developing a protocol for discovering data cubes and a reference service for accessing and downloading data cubes.

Bruce includes a poster with a summary of the Simple Image Access Protocol (SIAP, v2).

For more details, consider the SIAP version 2.0 working draft.

The experience with SIAP will be useful when other domains scale up to the current astronomy data requirements.

So You Want To Write Your Own Language?

Filed under: Language,Language Design,Programming — Patrick Durusau @ 8:54 pm

So You Want To Write Your Own Language? by Walter Bright.

From the post:

The naked truth about the joys, frustrations, and hard work of writing your own programming language

My career has been all about designing programming languages and writing compilers for them. This has been a great joy and source of satisfaction to me, and perhaps I can offer some observations about what you’re in for if you decide to design and implement a professional programming language. This is actually a book-length topic, so I’ll just hit on a few highlights here and avoid topics well covered elsewhere.

In case you are wondering if Walter is a good source for language writing advice, I pulled this bio from the Dr. Dobb’s site:

Walter Bright is a computer programmer known for being the designer of the D programming language. He was also the main developer of the first native C++ compiler, Zortech C++ (later to become Symantec C++, now Digital Mars C++). Before the C++ compiler he developed the Datalight C compiler, also sold as Zorland C and later Zortech C.

I am sure writing a language is an enormous amount of work, but Walter makes it sound quite attractive.

Sex and Big Data

Filed under: BigData,Porn — Patrick Durusau @ 8:31 pm

Sex and Big Data

A project to bring big data techniques to sexuality.

Datasets:

XHamster – approximately 800,000 entries.

Xnxx – approximately 1,200,000 entries.

I may have just missed it, but you would expect a set of records for the porn videos on YouTube and Reddit, to say nothing of Usenet’s alt.sex.* groups.

Maybe I should post a note to the NSA. I am sure they have already cleaned and reconciled the data. Maybe they will post it as a public service. 😉

Middle Earth Maps

Filed under: Maps — Patrick Durusau @ 8:21 pm

If Middle-Earth Were Real, These Exquisite Shots Would Be Its Vacation Brochure by Peter Rubin.

From the post:

While Westeros is making a run for it, JRR Tolkien’s Middle-earth is still the undisputed champ of fantasy worlds beloved by cartographers. Several projects over the years have tried to map the land of Lord of the Rings—some great, some unrealized. Now, however, the most ambitious among them has joined forces with a videogame middleware company to transcend simple drawings and create the most stunning shots thus far.

Peter covers the results of a collaboration between Outerra and the ME-DEM (Middle-earth Digital Elevation Model) Project. Quite stunning.

To see the demo at ME-DEM, I need to install a newer version of Windows on a VM. Windows 8.1 Pro? Need to see if it is compatible with an NVIDIA video card.

I’m as confused as you are about Peter saying “…were real…” with regard to Middle Earth.

The stories of Middle Earth have influenced more people and events than many things thought to be “more real.”

Semantics of Business Vocabulary and Business Rules

Filed under: Business Intelligence,Semantics,Vocabularies — Patrick Durusau @ 4:52 pm

Semantics of Business Vocabulary and Business Rules

From 1.2 Applicability:

The SBVR specification is applicable to the domain of business vocabularies and business rules of all kinds of business activities in all kinds of organizations. It provides an unambiguous, meaning-centric, multilingual, and semantically rich capability for defining meanings of the language used by people in an industry, profession, discipline, field of study, or organization.

This specification is conceptualized optimally for business people rather than automated processing. It is designed to be used for business purposes, independent of information systems designs to serve these business purposes:

  • Unambiguous definition of the meaning of business concepts and business rules, consistently across all the terms, names and other representations used to express them, and across the natural languages in which those representations are expressed, so that they are not easily misunderstood either by “ordinary business people” or by lawyers.
  • Expression of the meanings of concepts and business rules in the wordings used by business people, who may belong to different communities, so that each expression wording is uniquely associated with one meaning in a given context.
  • Transformation of the meanings of concepts and business rules as expressed by humans into forms that are suitable to be processed by tools, and vice versa.
  • Interpretation of the meanings of concepts and business rules in order to discover inconsistencies and gaps within an SBVR Content Model (see 2.4) using logic-based techniques.
  • Application of the meanings of concepts and business rules to real-world business situations in order to enable reproducible decisions and to identify conformant and non-conformant business behavior.
  • Exchange of the meanings of concepts and business rules between humans and tools as well as between tools without losing information about the essence of those meanings.

I do need to repeat their warning from 6.2 How to Read this Specification:

This specification describes a vocabulary, or actually a set of vocabularies, using terminological entries. Each entry includes a definition, along with other specifications such as notes and examples. Often, the entries include rules (necessities) about the particular item being defined.

The sequencing of the clauses in this specification reflects the inherent logical order of the subject matter itself. Later clauses build semantically on the earlier ones. The initial clauses are therefore rather ‘deep’ in terms of SBVR’s grounding in formal logics and linguistics. Only after these clauses are presented do clauses more relevant to day-to-day business communication and business rules emerge.

This overall form of presentation, essential for a vocabulary standard, unfortunately means the material is rather difficult to approach. A figure presented for each sub-vocabulary does help illustrate its structure; however, no continuous ‘narrative’ or explanation is appropriate.

😉

OK, so you aren’t going to read it for giggles. But you will encounter it in the wild world of data, so at least mark the reference.

I first saw this in a tweet by Stian Danenbarger.

DARPA Open Catalog

Filed under: Open Source,Programming — Patrick Durusau @ 2:22 pm

DARPA Open Catalog

From the webpage:

Welcome to the DARPA Open Catalog, which contains a curated list of DARPA-sponsored software and peer-reviewed publications. DARPA funds fundamental and applied research in a variety of areas including data science, cyber, anomaly detection, etc., which may lead to experimental results and reusable technology designed to benefit multiple government domains.

The DARPA Open Catalog organizes publically releasable material from DARPA programs, beginning with the XDATA program in the Information Innovation Office (I2O). XDATA is developing an open source software library for big data. DARPA has an open source strategy through XDATA and other I2O programs to help increase the impact of government investments.

DARPA is interested in building communities around government-funded software and research. If the R&D community shows sufficient interest, DARPA will continue to make available information generated by DARPA programs, including software, publications, data and experimental results. Future updates are scheduled to include components from other I2O programs such as Broad Operational Language Translation (BOLT) and Visual Media Reasoning (VMR).

I don’t know if I would use binaries from DARPA but with open source you get to choose your own comfort level. 😉

Maybe I should ask:

How does it feel for DARPA to be more open source than your favorite vendor?

What do you think? More backdoors in open source* or binary software?

I first saw this in a tweet by Tim O’Reilly.

* Remember that open source doesn’t mean non-commercial. You can always copyright open source code. Copyright has protected Mickey Mouse longer than binary has protected COBOL programs.

Besides, open source with copyright makes it easier for you to search for infringing code, doesn’t it? Enables you to ask what your competitors must be hiding. Yes?

The Data Avalanche in Astrophysics

Filed under: Astroinformatics,Graphs,HPC — Patrick Durusau @ 1:58 pm

The Data Avalanche in Astrophysics (podcast)

From the post:

On today’s edition of Soundbite, we’ll be talking with Dr. Kirk Borne, Professor of Astrophysics and Computational Science at George Mason University about managing the data avalanche in astronomy.

Borne has been involved in a number of data-intensive astrophysics projects, including data mining on the Galaxy Zoo database of galaxy classifications. We’ll talk about some of his experiences and what challenges lie ahead for astronomy as well what some established and emerging tools, including Graph databases, languages like Python and R, and approaches will lend to his field and big data research in general.

During the podcast, Dr. Borne talks about the rising use of graphs over the last several years on supercomputers to analyze astronomical data.

You’ll get the impression that graphs are not a recent item in high-performance computing. Which just happens to be a correct impression.

February 3, 2014

FISA Court Subpoena Data Released!

Filed under: Cybersecurity,NSA,Security — Patrick Durusau @ 9:42 pm

FISA Court Subpoena Data (Google) from Ed Chi.

From the post:

Todd Underwood originally shared:

This is huge. Google is finally able to publish information about the number and scope of the FISA (secret intelligence court) subpoenas received. The takeaway: it’s massively fewer subpoenas and accounts involved than many people suspected.

There are caveats. Google is required to delay reporting by six months and required to only report information in bands of 1,000. But it’s massively better than nothing.

The world, but especially US citizens, have a right to know what kind of surveillance their government is authorizing. It has been cool to see companies like Google push for this kind of openness. Reform is a long time coming and this is only the beginning. Baby steps.

The Google post with data: http://googleblog.blogspot.com/2014/02/shedding-some-light-on-foreign.html

It’s a good thing I posted about How to Lie with Statistics today!

The real danger from these numbers was voiced by Todd Underwood, when he said:

The takeaway: it’s massively fewer subpoenas and accounts involved than many people suspected.

That’s the key problem.

They aren’t violating everybody’s rights, just those nasty people hiding behind that tree over there.

How did that poem go?

First they came for the Socialists, and I did not speak out– Because I was not a Socialist.

Then they came for the Trade Unionists, and I did not speak out– Because I was not a Trade Unionist.

Then they came for the Jews, and I did not speak out– Because I was not a Jew.

Then they came for me–and there was no one left to speak for me.

First they came …

Features vs. Benefits

Filed under: Marketing — Patrick Durusau @ 9:22 pm

Features vs. Benefits

This is an excerpt from a book on “User Onboarding”:

[Image: summary graphic from the “User Onboarding” excerpt]

That has to be the best summary of sales strategy I have ever seen.

Visit the webpage and sign up for more information. This has a lot of promise.

BTW, substitute elected official for product and see what you think of your candidate’s rhetoric. 😉

How to Lie with Statistics

Filed under: Statistics — Patrick Durusau @ 8:55 pm

How to Lie with Statistics by Darrell Huff.

From the introduction:

With prospects of an end to the hallowed old British measures of inches and feet and pounds, the Gallup poll people wondered how well known its metric alternative might be. They asked in the usual way, and learned that even among men and women who had been to a university 33 per cent had never heard of the metric system.

Then a Sunday newspaper conducted a poll of its own – and announced that 98 per cent of its readers knew about the metric system. This, the newspaper boasted, showed ‘how much more knowledgeable’ its readers were than people generally.

How can two polls differ so remarkably?

Gallup interviewers had chosen, and talked to, a carefully selected cross-section of the public. The newspaper had naively, and economically, relied upon coupons clipped, filled in, and mailed in by readers.

It isn’t hard to guess that most of those readers who were unaware of the metric system had little interest in it or the coupon; and they selected themselves out of the poll by not bothering to clip and participate. This self-selection produced, in statistical terms, a biased or unrepresentative sample of just the sort that has led, over the years, to an enormous number of misleading conclusions.

Machiavelli it’s not, but as the tweet from Stat Fact says, #classic.
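Huff’s coupon poll is easy to reproduce in a few lines. A toy Clojure simulation, with numbers I made up (67% of readers know the metric system, and knowing it makes a reader ten times more likely to mail the coupon back):

(defn mails-coupon?
  "Self-selection: readers who know the metric system respond far more often."
  [knows-metric?]
  (< (rand) (if knows-metric? 0.20 0.02)))

(defn coupon-poll
  "Fraction of respondents who know the metric system."
  [n]
  (let [readers     (repeatedly n #(< (rand) 0.67))
        respondents (filter mails-coupon? readers)]
    (double (/ (count (filter true? respondents))
               (max 1 (count respondents))))))

;; (coupon-poll 100000) comes out around 0.95, a "98 per cent" style headline
;; drawn from a population where only 67% actually know the system.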

Parallel implementation of 3D protein structure similarity searches…

Filed under: Bioinformatics,GPU,Similarity,Similarity Retrieval — Patrick Durusau @ 8:41 pm

Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA by Dariusz Mrozek, Milosz Brozek, and Bozena Malysiak-Mrozek.

Abstract:

Searching for similar 3D protein structures is one of the primary processes employed in the field of structural bioinformatics. However, the computational complexity of this process means that it is constantly necessary to search for new methods that can perform such a process faster and more efficiently. Finding molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) and general purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. This algorithm is based on the two-phase alignment of protein structures when matching fragments of the compared proteins. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT (“GPU-CASSERT”) parallelizes both alignment phases and yields an average 180-fold increase in speed over its CPU-based, single-core implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). In this paper, we show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce the execution time of the process, allowing it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm.

Seventeen pages of heavy sledding, but an average 180-fold increase in speed? That’s worth the effort.

Sorry, I got distracted. How difficult did you say your subject similarity/identity problem was? 😉

Path Language to Topic Maps Mapping Syntax

Filed under: Mapping,Topic Maps — Patrick Durusau @ 5:37 pm

Path Language to Topic Maps Mapping Syntax by Johannes Schmidt.

From the webpage:

The “Path Language to Topic Maps Mapping Syntax (PLTTM)” defines a language to map serialized domain specific subject representations to Topic Maps equivalents. Therefore, PLTTM enables (semantic) data federation.

PLTTM uses path language statements e.g. in XPath or JSONPath to address subject representations in e.g. XML or JSON serializations. The path language statements are furthermore used to extract the relevant portions of data. […]

Cool!

Give this a close read!
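To make the idea concrete, here is a toy Clojure sketch of the general pattern, not PLTTM’s actual syntax: pair a path into a parsed document (plain key vectors standing in for JSONPath) with the topic map construct the value at that path should become.

(def doc
  {:person {:name     "Johannes Schmidt"
            :homepage "http://example.org/js"}})   ; stand-in for parsed JSON

(def mapping
  [{:path [:person :name]     :construct :topic-name}
   {:path [:person :homepage] :construct :subject-locator}])

(defn apply-mapping
  "Extract the value at each path and tag it with its topic map construct."
  [doc mapping]
  (for [{:keys [path construct]} mapping
        :let [value (get-in doc path)]
        :when value]
    {:construct construct :value value}))

;; (apply-mapping doc mapping)
;; => ({:construct :topic-name, :value "Johannes Schmidt"}
;;     {:construct :subject-locator, :value "http://example.org/js"})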

Big Data’s Dangerous New Era of Discrimination

Filed under: BigData,Data Analysis — Patrick Durusau @ 3:16 pm

Big Data’s Dangerous New Era of Discrimination by Michael Schrage.

From the post:

Congratulations. You bought into Big Data and it’s paying off Big Time. You slice, dice, parse and process every screen-stroke, clickstream, Like, tweet and touch point that matters to your enterprise. You now know exactly who your best — and worst — customers, clients, employees and partners are. Knowledge is power. But what kind of power does all that knowledge buy?

Big Data creates Big Dilemmas. Greater knowledge of customers creates new potential and power to discriminate. Big Data — and its associated analytics — dramatically increase both the dimensionality and degrees of freedom for detailed discrimination. So where, in your corporate culture and strategy, does value-added personalization and segmentation end and harmful discrimination begin?

If you credit Robert Jackall’s Moral Mazes: The World of Corporate Managers (Oxford, 1988), moral issues are bracketed in favor of pragmatism and group loyalty.

There was no shortage of government and corporate scandals in the run-up to 1988, and there has been no shortage since then, that fit well into Jackall’s framework.

An evildoer may start a wrongful act, but a mass scandal requires non-objection, if not active assistance, from a multitude that knows wrongdoing is afoot.

Unlike Michael, I don’t think management will be interested in “fairly transparent” and/or “transparently fair” algorithms and analytics. Unless that serves some other goal or purpose of the organization.

Dear America, I Saw You Naked

Filed under: NSA,Security — Patrick Durusau @ 2:35 pm

Dear America, I Saw You Naked: And yes, we were laughing. Confessions of an ex-TSA agent by Jason Edward Harrington.

WARNING: If news about the growing security state in the United States depresses you, skip this post and the article.

Just to get you interested:

I hated it from the beginning. It was a job that had me patting down the crotches of children, the elderly and even infants as part of the post-9/11 airport security show. I confiscated jars of homemade apple butter on the pretense that they could pose threats to national security. I was even required to confiscate nail clippers from airline pilots—the implied logic being that pilots could use the nail clippers to hijack the very planes they were flying.

Once, in 2008, I had to confiscate a bottle of alcohol from a group of Marines coming home from Afghanistan. It was celebration champagne intended for one of the men in the group—a young, decorated soldier. He was in a wheelchair, both legs lost to an I.E.D., and it fell to me to tell this kid who would never walk again that his homecoming champagne had to be taken away in the name of national security.

There I was, an aspiring satire writer, earnestly acting on orders straight out of Catch-22.

I quickly discovered I was working for an agency whose morale was among the lowest in the U.S. government. In private, most TSA officers I talked to told me they felt the agency’s day-to-day operations represented an abuse of public trust and funds.

I learned new details from Jason’s article but nothing all that surprising.

What troubles me about Jason’s account is that we as travelers have tolerated the abuse he details, despite there being no evidence that the TSA has ever stopped a single terrorist.

That’s right, there has never even been a false claim by the TSA of having caught a terrorist.

Come this 9/11, it will have been thirteen (13) years, and the TSA will not have captured a single terrorist.

Not to mention that, as of 2011, there had been 25,000 breaches of airport security since 9/11.

True, the TSA captured 1,813 guns at airport checkpoints in 2013 (TSA seizes record number of guns in 2013). But since security testers can get past scans and a pat-down search, you wonder how many guns make it onto airplanes (Undercover agent with mock bomb breaches airport security: report).

The results of “tests” of TSA security are not published. The alleged reason for non-publication of security testing results is to prevent use of that information by potential hijackers. A more obvious reason is to protect the contracts and jobs of those associated with the farce known as “airport security.”

In fact, there is no evidence that the current security procedures would stop hijackers armed the same way as the 9/11 hijackers. I say no evidence; more precisely, there is no published evidence. With the known failures of the TSA on weapons and explosives, I would venture to say the 9/11 hijackers would have nothing to fear from today’s airport security.

Understanding a security problem is a lot like understanding an information problem. It isn’t sufficient to pick information that is easy to collect (like phone records) and decide that is the solution to your information problem. Yes, the 9/11 hijackers went through airports, all of them, but that doesn’t make an airport the appropriate place for a solution.

The 9/11 Commission said as much when it found:

The final layer, security on board commercial aircraft, was not designed to counter suicide hijackings. The FAA-approved “Common Strategy” had been elaborated over decades of experience with scores of hijackings, beginning in the 1960s. It taught flight crews that the best way to deal with hijackers was to accommodate their demands, get the plane to land safely, and then let law enforcement or the military handle the situation. According to the FAA, the record had shown that the longer a hijacking persisted, the more likely it was to end peacefully. The strategy operated on the fundamental assumption that hijackers issue negotiable demands (most often for asylum or the release of prisoners) and that, as one FAA official put it, “suicide wasn’t in the game plan” of hijackers. FAA training material provided no guidance for flight crews should violence occur.

This prevailing Common Strategy of cooperation and nonconfrontation meant that even a hardened cockpit door would have made little difference in a hijacking. As the chairman of the Security Committee of the Air Line Pilots Association observed when proposals were made in early 2001 to install reinforced cockpit doors in commercial aircraft, “Even if you make a vault out of the door, if they have a noose around my flight attendant’s neck, I’m going to open the door.” Prior to 9/11, FAA regulations mandated that cockpit doors permit ready access into and out of the cockpit in the event of an emergency. Even so, rules implemented in the 1960s required air crews to keep the cockpit door closed and locked in flight. This requirement was not always observed or vigorously enforced. (footnotes omitted) (The 9/11 Commission Report, page 85)

The solution to suicide hijackings is now well known:

Don’t open the cockpit door and/or allow anyone to take control of the airplane.

The 9/11 hijackers exploited flaws in airport security that persist to this day and a known flaw in U.S. hijacking policy.

Now fear of hijacking is being exploited by those who are providing no more security than existed on 9/11, at a much higher cost.

Topic maps can help connect those dots if you are interested in reducing the terrorism ROI from 9/11.

(Terrorism ROI: Security/Terrorism expenditures of the U.S. since 9/11 divided by the estimated $250,000 invested in 9/11 by terrorists)

Topincs – New Release

Filed under: Topic Map Software,Topic Maps,Topincs — Patrick Durusau @ 1:49 pm

Topincs 7.1.0

Now I see why Robert Cerny has been making all those screencast videos!

If you haven’t visited the Topincs homepage in a while, you really should take a look!

Sheer marketing genius when compared to most (all?) topic map marketing efforts!

I particularly liked this:

Slim down!!!

No more version control

No more build tool

No more IDE

Just a web browser and Topincs

(See the Topincs homepage for the best viewing results.)

The most important new feature is the “study Topincs online” option.

You submit your email address and in a few moments, a login ID with password appears in your inbox.

When you login, a Topincs instance has been created for you!

How slick is that?

No download, no install, no standing on one foot while whistling out of the opposite ear, etc.

Is there a saying: “Nothing but web?” 😉

Enjoy!

I am sure any and all feedback will be greatly appreciated!

UX Crash Course: 31 Fundamentals

Filed under: Interface Research/Design,Library,Library software,Usability,UX — Patrick Durusau @ 10:10 am

UX Crash Course: 31 Fundamentals by Joel Marsh.

From the post:

Basic UX Principles: How to get started

The following list isn’t everything you can learn in UX. It’s a quick overview, so you can go from zero-to-hero as quickly as possible. You will get a practical taste of all the big parts of UX, and a sense of where you need to learn more. The order of the lessons follows a real-life UX process (more or less) so you can apply these ideas as-you-go. Each lesson also stands alone, so feel free to bookmark them as a reference!

Main topics:

Introduction & Key Ideas

How to Understand Users

Information Architecture

Visual Design Principles

Functional Layout Design

User Psychology

Designing with Data

Users who interact with designers (librarians and library students come to mind) would do well to review these posts. If nothing else, it will give users better questions to ask vendors about their web interface design process.

February 2, 2014

Catalog of the Snowden Revelations

Filed under: Cybersecurity,NSA,Security — Patrick Durusau @ 5:29 pm

Catalog of the Snowden Revelations

From the post:

This page catalogs various revelations by Edward Snowden, regarding the United States’ surveillance activities.

Each disclosure is assigned to one of the following categories: tools and methods, overseas USG locations from which operations are undertaken, foreign officials and systems that NSA has targeted, encryption that NSA has broken, ISPs or platforms that NSA has penetrated or attempted to penetrate, and identities of cooperating companies and governments.

The page will be updated from time to time and is intended as a resource regarding Snowden and the debate over U.S. surveillance. Comments and suggestions thus are welcomed, and should be sent to staff.lawfare@gmail.com.

LawFare has produced this useful, if somewhat high-level, catalog of Edward Snowden’s revelations.

Very useful for other governments when visitors from Washington start the finger-waving lecture on political corruption. With a little data mining, they may be able to trace a visitor back to specific incidents.

Now that would make an interesting data set.

Violation of Pakistan’s sovereignty comes to mind. Surely that is a crime under Pakistani law.

Thoughts?

Finding pi…

Filed under: Marketing,Topic Maps — Patrick Durusau @ 5:05 pm

Finding pi: Enterprises must dump their legacy ideas and search for radical innovation by Nirav Shah.

From the post:

Radical innovation has historically overcome barriers to scientific progress. For example, the discovery of pi as a numerical concept found application in mathematics, physics, signal and image processing, genomics and across domains. Similarly, the internet unleashed innovation across industries. Today, the computing world stands at a point where “pi”-like innovations can unlock quantum value.

The disproportionate dichotomy

Enterprises spend $2.7 trillion on technology related products. More than 95 percent of that spend is driven by desktop or laptop related applications, services, networking and data center infrastructure for employees, partners and customers.

Amongst enterprises, there is an installed base of 700 million personal computers, while smartphones and tablets form an installed base of 400 million mobile computing units. While mobile computing units constitute 36 percent of devices, less than 5 percent of enterprise dollars are focused on the mobile device base highlighting a disproportionate dichotomy.

Nirav’s article is an interesting read but I’m not sure we should be seeking a “pi” moment.

There is evidence of π being known from approximately 1900 to 1600 BCE, which means it has taken 3,600+ years for π to embed itself in society. I suspect investors would like a somewhat faster return on their investment.

But we don’t need a π moment to make that happen. Consider this observation from Nirav’s post:

A survey of CIOs indicate that more than two thirds of North American and European insurers will increase investment in mobile applications, however Gartner predicts that lack of alignment with customer interests and poor technical execution will lead to low adoption rates. In fact, Gartner expects that by 2016 more than 50 percent of the mobile insurance customer apps will be discontinued.

Does the Gartner stat surprise you?

How often have you sat at a basketball game wishing you could check on your automobile insurance policy?

Software apps that are born of or sustained by management echo chambers are going to fail.

There is nothing surprising or alarming about their fate.

What is alarming, at least to a degree, is that successful apps are identified after the fact of their success. Having a better model for what successful apps share in common might increase the odds of having future successful apps.

Pointers anyone?

PS: Of course I am thinking of this in terms of topic map apps.

Assuming that a topic map can semantically integrate across languages to return 300 papers for a search instead of 200, where is the bang for me in that result? The original result was too high to be useful to me. How does having more results help?

Data Workflows for Machine Learning:

Filed under: Machine Learning,Workflow — Patrick Durusau @ 4:32 pm

Data Workflows for Machine Learning: by Paco Nathan.

Excellent presentation on data workflows, at least if you think of them as being primarily from one machine or process to another. Hence the closing emphasis on PMML – Predictive Model Markup Language.

Although Paco alludes to the organizational/social side of data flow, that gets lost in the thicket of technical options.

For example, at slide 25, Paco talks about using Cascading to combine the workflow from multiple departments into an integrated app.

Which I am certain is within the capabilities of Cascading, but that does not address the social or organizational difficulties of getting that to happen.

One of the main problems in the recent U.S. health care exchange debacle was the interchange of data between two of the vendors.

I suppose in recent management lingo, no one took “ownership” of that problem. 😉

Data interchange isn’t new technical territory but failure to cooperate is as deadly to a data processing project as a melting CPU.

The technical side of data workflows is necessary for success, but so is avoiding any beaver dams across the data stream.

Dealt with any beavers lately?

Category Theory Using String Diagrams

Filed under: Category Theory,Mathematics — Patrick Durusau @ 4:07 pm

Category Theory Using String Diagrams by Dan Marsden.

Abstract:

In work of Fokkinga and Meertens a calculational approach to category theory is developed. The scheme has many merits, but sacrifices useful type information in the move to an equational style of reasoning. By contrast, traditional proofs by diagram pasting retain the vital type information, but poorly express the reasoning and development of categorical proofs. In order to combine the strengths of these two perspectives, we propose the use of string diagrams, common folklore in the category theory community, allowing us to retain the type information whilst pursuing a calculational form of proof. These graphical representations provide a topological perspective on categorical proofs, and silently handle functoriality and naturality conditions that require awkward bookkeeping in more traditional notation.

Our approach is to proceed primarily by example, systematically applying graphical techniques to many aspects of category theory. We develop string diagrammatic formulations of many common notions, including adjunctions, monads, Kan extensions, limits and colimits. We describe representable functors graphically, and exploit these as a uniform source of graphical calculation rules for many category theoretic concepts. We then use these graphical tools to explicitly prove many standard results in our proposed string diagram based style of proof.

This form of visualization does seem to be easier on the eyes. 😉

Whether it is sufficient or not for some particular purpose, remains to be seen.

Clojure for the Brave and True Update

Filed under: Clojure,Functional Programming,Programming — Patrick Durusau @ 11:01 am

There is now a Clojure for the Brave and True Update list.

You can either subscribe there or wait for me to post about the latest chapters. 😉

BTW, Daniel has a post with the title: Book Cover Concept, Revisions & Free Books. As you can guess from the title, there is a book cover “doodle,” plans for the coming weeks, and a short listing of public resources on Lisp.

Enjoy!

February 1, 2014

Introduction to Computational Linguistics (Scala too!)

Filed under: Computational Linguistics,Scala,Text Mining — Patrick Durusau @ 9:07 pm

Introduction to Computational Linguistics by Jason Baldridge.

From the webpage:

Advances in computational linguistics have not only led to industrial applications of language technology; they can also provide useful tools for linguistic investigations of large online collections of text and speech, or for the validation of linguistic theories.

Introduction to Computational Linguistics introduces the most important data structures and algorithmic techniques underlying computational linguistics: regular expressions and finite-state methods, categorial grammars and parsing, feature structures and unification, meaning representations and compositional semantics. The linguistic levels covered are morphology, syntax, and semantics. While the focus is on the symbolic basis underlying computational linguistics, a high-level overview of statistical techniques in computational linguistics will also be given. We will apply the techniques in actual programming exercises, using the programming language Scala. Practical programming techniques, tips and tricks, including version control systems, will also be discussed.

Jason has created a page of links, which includes a twelve-part tutorial on Scala.

If you want to walk through the course on your own, see the schedule.
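If you want a small taste of the symbolic side before committing, tokenization with a regular expression is about as simple as it gets. A tiny example of my own, in Clojure rather than the course’s Scala, and not taken from the course materials:

(defn tokenize
  "Split text into word, number and punctuation tokens."
  [text]
  (re-seq #"[A-Za-z]+(?:'[A-Za-z]+)?|[0-9]+|[^\sA-Za-z0-9]" text))

;; (tokenize "Computational linguistics isn't just counting, it's 90% data.")
;; => ("Computational" "linguistics" "isn't" "just" "counting" ","
;;     "it's" "90" "%" "data" ".")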

Enjoy!

Neo4j Spatial Part 1

Filed under: Graphs,Neo4j — Patrick Durusau @ 8:58 pm

Neo4j Spatial Part 1 by Max De Marzi.

[Image: Neo4j Spatial layers diagram from Max’s post]

One of my new year resolutions is to do a project with Neo4j Spatial, so we’ll kick off my first blog post of the year with a gentle introduction to this awesome plugin. I advise you to watch this very short 15 minute video by Neo4j Spatial creator Craig Taverner. The man is a genius level developer, you’ll gain IQ points just listening, I swear.

Max’s layers image is hauntingly familiar to old-time topic map hands.

This is the first of what promises to be an excellent series of posts on Neo4j Spatial.

In case you are not familiar with Neo4j Spatial:

Neo4j Spatial is a library of utilities for Neo4j that facilitates the enabling of spatial operations on data. In particular you can add spatial indexes to already located data, and perform spatial operations on the data like searching for data within specified regions or within a specified distance of a point of interest.

While you are reading the examples, recall that “spatial” in the sense of Google Maps or Open Street Map is only one sense of “spatial.”
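The plugin does the indexing for you, but the question it answers (what lies within a given distance of a point?) is worth seeing in the raw. A naive, index-free Clojure sketch using the haversine formula, with toy data of my own rather than anything from Max’s post:

(defn haversine-km
  "Great-circle distance in kilometres between two [lat lon] points in degrees."
  [[lat1 lon1] [lat2 lon2]]
  (let [r    6371.0
        rad  #(Math/toRadians %)
        dlat (rad (- lat2 lat1))
        dlon (rad (- lon2 lon1))
        a    (+ (Math/pow (Math/sin (/ dlat 2)) 2)
                (* (Math/cos (rad lat1)) (Math/cos (rad lat2))
                   (Math/pow (Math/sin (/ dlon 2)) 2)))]
    (* 2 r (Math/asin (Math/sqrt a)))))

(defn within-km
  "Points of interest within max-km of centre. A spatial index answers this
  without scanning every node; this sketch scans them all."
  [centre max-km pois]
  (filter #(<= (haversine-km centre (:location %)) max-km) pois))

;; (within-km [52.52 13.405] 5
;;            [{:name "A" :location [52.53 13.40]}
;;             {:name "B" :location [48.85 2.35]}])
;; => ({:name "A", :location [52.53 13.40]})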

Advertising RDF and Linked Data:… [Where’s the beef?]

Filed under: EU,Linked Data,RDF — Patrick Durusau @ 5:23 pm

Advertising RDF and Linked Data: SPARQL Queries on EU Data

From the webpage:

This is a collection of SPARQL queries on EU data that shows benefits of converting it to RDF and linking it, i.e. queries that reveal non-trivial information that would have been hard to reconstruct by hunting it down over separate/unlinked data sources.

At first I thought this would be a cool demonstration of the use of SPARQL, with the queries as links and more fully set forth below.

Nada. The non-working hyperlinks in the list of queries were, I suspect, meant to be internal links to the fuller exposition of the queries.

Then, when I get to the queries, the only one that promises a result link:

Link to query result: http://www4.wiwiss.fu-berlin.de/eures/sparql

Returns a 404.

The other links appear to be links to webpages that give a SPARQL query which, if I had a SPARQL client, I could paste in to see the result.

I would mirror the question:

Effort of obtaining those results without RDFizing and linking:

with:

Effort to see “…benefits of converting [EU data] to RDF and linking it” without a SPARQL client, very high/impossible.

That’s not just a criticism of RDF. Topic maps made a different mistake but it had the same impact.

The question for any user is “where’s the beef?” What am I gaining? Now, not some unknown number of tomorrows from now. Today!

PS: The EU data cloud has dropped the “Linked Open Data Around-the-Clock” moniker I reported in September of 2011. Same place, different branding. I suspect that is why governments like the web so much. Implementing newspeak policy is just a save away.

