Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

July 12, 2014

Awesome Awesomeness

Filed under: Computer Science,Programming — Patrick Durusau @ 7:10 pm

Awesome Awesomeness by Alexander Bayandin.

From Nat Torkington’s description in Four short links: 11 July 2014:

…list of curated collections of frameworks and libraries in various languages that do not suck. They solve the problem of “so, I’m new to (language) and don’t want to kiss a lot of frogs before I find the right tool for a particular task”.

I count fifteen (15) languages and six (6) general areas.

Definitely a site to bookmark and share!

Thirty-three Greek Biblical manuscripts added to Digitised Manuscripts

Filed under: Bible,Manuscripts — Patrick Durusau @ 7:01 pm

Thirty-three Greek Biblical manuscripts added to Digitised Manuscripts by Cillian O’Hogan.

From the post:

The third phase of the British Library's Greek Manuscripts Digitisation Project is now well underway. So far, the following items, all Greek biblical items, have been added to Digitised Manuscripts. We will continue to update the blog with new additions over the course of the year, and will also look at some individual manuscripts in more detail in later posts. We are extremely grateful to the foundations and individuals who have funded this project, especially the Stavros Niarchos Foundation, the A. G. Leventis Foundation, Sam Fogg, the Sylvia Ioannou Foundation and the Thriplow Charitable Trust.

Add MS 24112, Four Gospels in Greek (Gregory-Aland 694; Scrivener evan. 598; von Soden ε 502), written throughout with space for a Latin translation, which has been added for a small number of verses. 15th century, possibly Italy.

Add MS 24373, Four Gospels (Gregory-Aland 695; Scrivener evan. 599; von Soden ε 327), with illuminated Evangelist portraits. 13th century. Also online is an old 19th-century binding for this manuscript.

Add MS 24374, Fragments from a Gospel Lectionary with ekphonetic notation (Gregory-Aland l 325; Scrivener evst. 273). 13th century.

Add MS 24376, Four Gospels (Gregory-Aland 696; Scrivener evan. 600; von Soden ε 328), with illuminated Evangelist portraits (St Mark illustrated above). 14th century (illuminations added in the 16th century), Constantinople.

Add MS 24377, Gospel Lectionary (Gregory-Aland l 326; Scrivener evst. 274), with ekphonetic notation, imperfect. 2nd half of the 12th century, possibly from the Monastery of Patir in southern Italy.

Add MS 24378, Menaion for September, October, November, December, January and February (Gregory-Aland l 927; Scrivener evst. 275). 13th/14th century.

Add MS 24379, Gospel Lectionary (Gregory-Aland l 327; Scrivener evst. 276), imperfect. 14th century.

Add MS 24380, Gospel Lectionary (Gregory-Aland l 328; Scrivener evst. 277), with ekphonetic notation, imperfect. 14th century.

Add MS 27860, Gospel Lectionary (Gregory-Aland l 329; Scrivener evst. 278), imperfect at the beginning, with marginal decorations throughout. Late 10th/early 11th century, Southern Italy (possibly Capua). Also online is an old 17th-century binding for this manuscript.

Add MS 27861, Gospels (Gregory-Aland e 698; Scrivener evan 602; von Soden ε 436), imperfect (lacking Matthew). 14th century.

Add MS 28815, New Testament, imperfect (Gregory-Aland 699; Scrivener evst. 603; von Soden δ 104), with Evangelist portraits and a silver-gilt plated cover. Mid-10th century, Constantinople. The subject of a recent blog post along with Egerton 3145.

Add MS 28816, New Testament, from Acts onwards (Gregory-Aland 203; Scrivener act. 232; von Soden α 203), with Euthalian apparatus, and other works. Written between 1108 and 1111 by the monk Andreas in March 1111, in the cell of the monk Meletius in the monastery of the Saviour.

Add MS 28818, Gospel Lectionary (Gregory-Aland l 331; Scrivener evst. 280). 1272, written by the monk Metaxares.

Add MS 29713, Gospel Lectionary (Gregory-Aland l 332; Scrivener evst. 62), imperfect at the beginning. 14th century.

Add MS 31208, Gospel Lectionary with ekphonetic notation (Gregory-Aland l 333; Scrivener evst *281), imperfect. 13th century, possibly Constantinople.

Add MS 31920, Gospel Lectionary (Gregory-Aland l 335; Scrivener evst 283), imperfect and mutilated. 12th century, South Italy (possibly Reggio).

Add MS 32051, Lectionary of the Acts and Epistles, imperfect, with ekphonetic notation (Gregory-Aland l 169; Scrivener apost. 52). 13th century.

Add MS 32341, Four Gospels (Gregory-Aland 494; Scrivener evan. 325; von Soden ε 437), imperfect. 14th century.

Add MS 33214, New Testament: Acts and Epistles (Gregory-Aland 1765; von Soden α 486). 14th century.

Add MS 33277, Four Gospels (Gregory-Aland 892; von Soden ε 1016; Scrivener evan. 892). 9th century, with replacement leaves added in the 13th and 16th centuries.

Add MS 34108, Four Gospels (Gregory-Aland 1280; Scrivener evan. 322; von Soden ε 1319). 12th century, with some replacement leaves added in the 15th century.

Add MS 34602, Fragments from two Psalters (Rahlfs-Fraenkel 2017, 1217) (illustrated above). 7th century and 10th century, Egypt.

Add MS 36751, Gospel Lectionary with ekphonetic neumes, called ἐκλογάδι(ον) (Gregory-Aland l 1491). Completed in 1008 at the Holy Monastery of Iviron, Mount Athos, by the scribe Theophanes.

Add MS 36752, Four Gospels (Gregory-Aland 2280). 12th century.

Add MS 37005, Gospel Lectionary (Gregory-Aland l 1493). 11th century.

Add MS 37006, Gospel Lectionary with ekphonetic neumes (Gregory-Aland l 1494 [=l 460]). 12th century, with late 13th-century replacements, including a full-page miniature of Christ and a figure identified as Andronicus II Palaeologus (Byzantine emperor 1282-1328) (illustrated above).

Add MS 38538, New Testament, Acts and Epistles (Gregory-Aland 2484), with Euthalian apparatus. Written by the scribe John in 1312.

Add MS 39589, Psalter (Rahlfs 1092) with introduction and commentary based on that of Euthymius Zigabenus (PG 128), attributed in the manuscript to Nicephorus Blemmydes, imperfect, with ornamental headpieces and the remains of a miniature of the Psalmist. 2nd half of the 12th century.

Add MS 39590, New Testament, without the book of Revelation (Gregory-Aland 547; Scrivener evan. 534; von Soden δ 157). 11th century.

Add MS 39593, Four Gospels (Gregory-Aland 550; Scrivener evan. 537; von Soden ε 250), with prefaces taken from the commentary of Theophylact, and synaxaria. 12th century.

Add MS 39612, Revelation (Gregory-Aland 2041; Scrivener apoc. 96; von Soden α1475). The quire-numbers on ff 1v and 10v show the manuscript formed part of a larger volume, possibly Athos, Karakallou 121 (268) (Gregory-Aland 1040). 14th century, possibly Mount Athos.

Add MS 39623, Fragments from a Gospel Lectionary (Gregory-Aland l 1742). Late 14th century, possibly Mount Athos.

Egerton MS 3145, Epistles and Revelation (Gregory-Aland 699; Scrivener paul. 266; von Soden δ 104), concluding portion of the manuscript of the entire New Testament of which Add. MS 28815 is the earlier portion. Mid-10th century, Constantinople. Also online is an old (18th century?) binding for this manuscript.

I know a number of scholars who will be happy to learn of this latest batch of NT manuscripts. (I am awaiting similar projects with 3-D imaging of cuneiform.)

Cillian O’Hogan is fortunate to work at an institution that fosters scholarship, biblical and otherwise.

InterActive Terminology for Europe (IATE)

Filed under: EU — Patrick Durusau @ 6:39 pm

InterActive Terminology for Europe (IATE)

From the about page:

IATE (= “Inter-Active Terminology for Europe”) is the EU’s inter-institutional terminology database. IATE has been used in the EU institutions and agencies since summer 2004 for the collection, dissemination and shared management of EU-specific terminology. The project partners are:

  • European Commission
  • Parliament
  • Council
  • Court of Justice
  • Court of Auditors
  • Economic & Social Committee
  • Committee of the Regions
  • European Central Bank
  • European Investment Bank
  • Translation Centre for the Bodies of the EU

The project was launched in 1999 with the objective of providing a web-based infrastructure for all EU terminology resources, enhancing the availability and standardisation of the information.

IATE incorporates all of the existing terminology databases of the EU’s translation services into a single new, highly interactive and accessible interinstitutional database. The following legacy databases have been imported into IATE, which now contains approximately 1.4 million multilingual entries:

  • Eurodicautom (Commission),
  • TIS (Council),
  • Euterpe (EP),
  • Euroterms (Translation Centre),
  • CDCTERM (Court of Auditors),

For more information, please download the IATE brochure.

I first saw this at: IATE terminology database available for download; contains very large number of legal terms in multiple languages.

I am sure IATE “…contains very large number of legal terms in multiple languages” but I would not trust the mapping for legal purposes until it has been verified.

Download at: http://iate.europa.eu/tbxPageDownload.do
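
The download is a TBX (TermBase eXchange) file, which is plain XML. As a rough illustration, here is a minimal Python sketch of pulling term pairs out of such an export; the element names assume the usual TBX layout (termEntry, langSet, term) and should be checked against the actual IATE file before use.

```python
# Minimal sketch: extract English/French term pairs from a TBX export.
# Element names assume the common TBX layout (termEntry > langSet > ... > term);
# verify against the actual IATE download before relying on this.
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def term_pairs(tbx_path, source="en", target="fr"):
    pairs = []
    for _, elem in ET.iterparse(tbx_path, events=("end",)):
        if elem.tag != "termEntry":
            continue
        terms = {}
        for lang_set in elem.findall("langSet"):
            lang = lang_set.get(XML_LANG)
            term = lang_set.find(".//term")
            if lang and term is not None:
                terms.setdefault(lang, term.text)
        if source in terms and target in terms:
            pairs.append((terms[source], terms[target]))
        elem.clear()  # keep memory bounded on a large export
    return pairs

if __name__ == "__main__":
    for en, fr in term_pairs("IATE_export.tbx")[:10]:
        print(en, "->", fr)
```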

Astropy Tutorials:…

Filed under: Astroinformatics,Python — Patrick Durusau @ 6:23 pm

Astropy Tutorials: Learn how to do common astro tasks with astropy and Python by Adrian Price-Whelan.

From the post:

Astropy is a community-developed Python package intended to provide much of the core functionality and common tools needed for astronomy and astrophysics research (c.f., IRAF, idlastro). In order to provide demonstrations of the package and subpackage features and how they interact, we are announcing Astropy tutorials. These tutorials are aimed to be accessible by folks with little-to-no python experience and we hope they will be useful exercises for those just getting started with programming, Python, and/or the Astropy package. (The tutorials complement the Astropy documentation, which provides more detailed and complete information about the contents of the package along with short examples of code usage.)

The Astropy tutorials work through software tasks common in astronomical data manipulation and analysis. For example, the “Read and plot catalog information from a text file” tutorial demonstrates using astropy.io.ascii for reading and writing ASCII data, astropy.coordinates and astropy.units for converting RA (as a sexagesimal angle) to decimal degrees, and then uses matplotlib for making a color-magnitude diagram and an all-sky projection of the source positions.
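
If you want a taste of the coordinate-conversion step described above before working through the tutorial, here is a minimal Python sketch using astropy. The coordinate values are made up for illustration.

```python
# Minimal sketch: turn sexagesimal RA/Dec strings into decimal degrees.
# The coordinates below are illustrative values, not tutorial data.
import astropy.units as u
from astropy.coordinates import SkyCoord

coords = SkyCoord(["00h42m44.3s", "13h29m52.7s"],
                  ["+41d16m09s", "+47d11m43s"],
                  frame="icrs")

print(coords.ra.degree)   # RA as decimal degrees
print(coords.dec.degree)  # Dec as decimal degrees

# astropy.units handles unit conversion generally, e.g.:
print((1 * u.arcmin).to(u.degree))
```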

The more data processing you do in any domain, the better your data processing skills overall.

If you already know Python, take this opportunity to learn some astronomy.

If you already like astronomy, take this opportunity to learn some Python and data processing.

Either way, you can’t lose!

Enjoy!

SEC Bite?

Filed under: Government,SEC,XBRL — Patrick Durusau @ 2:12 pm

SEC Starts Enforcing XBRL Mandate by Michael Cohn.

From the post:

The Securities and Exchange Commission has begun sending letters to public companies that fail to file their financials with all the necessary data using Extensible Business Reporting Language technology.

The SEC began requiring the largest public companies to file their financial statements using XBRL in 2009 and phased in the requirements for smaller issuers over the next two years. XBRL technology uses a data-tagging format that is supposed to make it easier for investors and analysts to compare financial information across companies and industries. Problems with the technology and the inconsistency of the tags used by companies have limited XBRL’s usefulness to many investors, however.

Nevertheless, the SEC has continued to work with vendors and companies on refining the tags and with the Financial Accounting Standards Board on regularly updating the XBRL taxonomy to adjust for any changes in U.S. GAAP. This month, the SEC’s Division of Corporation Finance started sending letters to corporations that have not been meeting the mandate.

As they say, “…it’s an ill wind indeed that doesn’t blow anyone some good.” 😉

Assuming that the SEC is persistent with its letters, and presumably follows them with some threats of enforcement action, those late to the XBRL party will be adapting their information systems to produce the required reports.

As I pointed out in 2012, topic maps are relevant to the transition to XBRL because:

  1. Some organizations will have legacy accounting systems that require mapping to XBRL.
  2. Even organizations that have transitioned to XBRL will have legacy data that has not.
  3. Transitions to XBRL by different organizations may not reflect the same underlying semantics.

See XBRL.org for details on XBRL.
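
To make point 3 above concrete, here is a small hypothetical Python sketch: two organizations map the same legacy account codes to different XBRL concepts, and a trivial rule surfaces the divergence. All account codes and tag choices are invented for illustration.

```python
# Sketch of the mapping problem: two organizations map the same legacy
# account to different XBRL concepts. Codes and tag names are hypothetical.
legacy_to_xbrl_org_a = {
    "4000-SALES": "us-gaap:Revenues",
    "5000-COGS":  "us-gaap:CostOfGoodsSold",
}
legacy_to_xbrl_org_b = {
    "4000-SALES": "us-gaap:SalesRevenueNet",   # same account, different concept
    "5000-COGS":  "us-gaap:CostOfGoodsSold",
}

def divergent_mappings(a, b):
    """Accounts both organizations map, but to different XBRL concepts."""
    return {code: (a[code], b[code])
            for code in a.keys() & b.keys() if a[code] != b[code]}

print(divergent_mappings(legacy_to_xbrl_org_a, legacy_to_xbrl_org_b))
# {'4000-SALES': ('us-gaap:Revenues', 'us-gaap:SalesRevenueNet')}
```

A topic map can record both choices as names for the same (or explicitly different) subjects, rather than leaving the reconciliation to one-off scripts.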

I first saw this in a tweet by Open Data 500.

Train online with EMBL-EBI

Filed under: Bioinformatics,Information Retrieval — Patrick Durusau @ 1:54 pm

Train online with EMBL-EBI

From the webpage:

Train online provides free courses on Europe’s most widely used data resources, created by experts at EMBL-EBI and collaborating institutes. You do not need to have any previous experience of bioinformatics to benefit from this training. We want to help you to be a highly competent user of our data resources; we are not trying to train you to become a bioinformatician.

You can use Train online to learn in your own time and at your own pace. You can repeat the courses as many times as you like, or just complete part of a course if you want to brush up on how to perform a specific task.

An interesting collection of training materials on bioinformatics resources.

As the webpage says, it won’t train you to be a bioinformatician but it can make you a more effective user of the resource covered.

Keep it in mind if you are working on a bioinformatics project or are interested in how other domains organize their information.

I first saw this in a tweet by Neil Saunders which pointed to: Scaling up bioinformatics training online by Ewan Birney.

Dilemmas in a General Theory of Planning [“Wicked” Problems]

Filed under: Problem Solving,Semantics — Patrick Durusau @ 1:39 pm

Dilemmas in a General Theory of Planning by Horst W. J. Rittel and Melvin M. Webber.

Abstract:

The search for scientific bases for confronting problems of social policy is bound to fail, because of the nature of these problems. They are “wicked” problems, whereas science has developed to deal with “tame” problems. Policy problems cannot be definitively described. Moreover, in a pluralistic society there is nothing like the undisputable public good; there is no objective definition of equity; policies that respond to social problems cannot be meaningfully correct or false; and it makes no sense to talk about “optimal solutions” to social problems unless severe qualifications are imposed first. Even worse, there are no “solutions” in the sense of definitive and objective answers.

If you have heard the phrase, “wicked” problems, here is your chance to read the paper that originated that phrase.

Rittel and Webber identify ten (10) properties of wicked problems, allowing for more to exist:

  1. There is no definite formulation of a wicked problem
  2. Wicked problems have no stopping rule
  3. Solutions to wicked problems are not true-or-false, but good-or-bad
  4. There is no immediate and no ultimate test of a solution to a wicked problem
  5. Every solution to a wicked problem is a “one-shot operation”; because there is no opportunity to learn by trial-and-error, every attempt counts significantly
  6. Wicked problems do not have an enumerable (or an exhaustively describable) set of potential solutions, nor is there a well-described set of permissible operations that may be incorporated into the plan
  7. Every wicked problem is essentially unique
  8. Every wicked problem can be considered to be a symptom of another problem
  9. The existence of a discrepancy representing a wicked problem can be explained in numerous ways. The choice of explanation determines the nature of the problem’s resolution.
  10. The planner has no right to be wrong

Important paper to read. It will help you spot “tame” solutions and their assumptions when posed as answers to “wicked” problems.

I first saw this in a tweet by Chris Diehl.

July 11, 2014

Neo4j’s Cypher vs Clojure – Group by and Sorting

Filed under: Clojure,Cypher,Neo4j — Patrick Durusau @ 6:46 pm

Neo4j’s Cypher vs Clojure – Group by and Sorting by Mark Needham.

From the post:

One of the points that I emphasised during my talk on building Neo4j backed applications using Clojure last week is understanding when to use Cypher to solve a problem and when to use the programming language.

A good example of this is in the meetup application I’ve been working on. I have a collection of events and want to display past events in descending order and future events in ascending order.

Mark falls back on Clojure to cure the lack of sorting within a collection in Cypher.
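
The general pattern (partition the events into past and future, then sort each differently) is small enough to sketch. Here it is in Python for illustration; Mark's post does the same with Clojure sequence functions on the rows returned from Neo4j.

```python
# Sketch of the split-and-sort pattern: past events newest-first,
# future events soonest-first. Event structure is hypothetical.
from datetime import datetime

def arrange_events(events, now=None):
    """Return (past events, newest first) and (future events, soonest first)."""
    now = now or datetime.now()
    past = sorted((e for e in events if e["time"] < now),
                  key=lambda e: e["time"], reverse=True)
    future = sorted((e for e in events if e["time"] >= now),
                    key=lambda e: e["time"])
    return past, future
```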

Oleg Kiselyov’s Homepage

Filed under: Functional Programming,Haskell,Programming,Scheme — Patrick Durusau @ 6:27 pm

Oleg Kiselyov’s Homepage

A very nearly plain text homepage that covers:

  • Algorithms and Data Structures
  • Computation
  • Continuations
  • Essays
  • Haskell
  • Image Processing
  • Lambda-calculus
  • Linguistics
  • Logic
  • Meta-programming
  • ML
  • Non-determinism
  • Numerical Math
  • OS
  • Probabilistic Programming
  • Programming Languages
  • Scheme
  • Types
  • XML

Impressive!

Should be near the top of any seed list for searching on functional programming, Haskell, Scheme, etc.

Generic Zipper: the context of a traversal

Filed under: Functional Programming,Programming — Patrick Durusau @ 4:43 pm

Generic Zipper: the context of a traversal by Oleg Kiselyov.

From the webpage:

Zipper: a derivative of a data structure or of its mapping

Zipper is a functional cursor into a data structure. It lets us navigate to and change, without mutation, an item deeply buried in a tree or a nested record. The result is a new data structure, sharing much of its components with the old one. The latter remains available permitting the instant rollback of the changes. Zippers thus implement copy-on-write updates to data structures.

Zipper, the functional cursor into a data structure, is itself a data structure, derived from the original one. The derived data type D’ for the recursive data type D is D with exactly one hole. Filling-in the hole — integrating over all positions of the hole — gives the data type D back. Zipper is the derivative, pretty much in the calculus sense, of a data type. The zipper-as-datatype view was the original presentation, described by Huet (JFP, 1997) and Hinze and Jeuring (JFP 2001); the data structure derivative was expounded by McBride.

We advocate a different view, emphasizing not the result of navigating through a data structure to a desired item and extracting it, but the process of navigation. Each data structure comes with a method to enumerate its components, representing the data structure as a stream of the nodes visited during the enumeration. To let the user focus on an item and submit an update, the enumeration process should yield control to the user once it reaches an item. Co-routines provide exactly the right yielding mechanism. As Chung-chieh Shan aptly put it, “zipper is a suspended walk.”

A page of short abstracts (part of the first one I quote above) followed by references and often code on zippers.
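
For a taste of the cursor idea, here is a toy list zipper sketched in Python. It only hints at the flavour; Oleg's page develops the generic, traversal-based version.

```python
# A minimal list zipper: the "cursor" is (items to the left, reversed),
# a focus, and (items to the right). Updates build a new zipper; the old
# one remains usable, so rollback is free.
def make_zipper(items):
    return ([], items[0], items[1:])           # left (reversed), focus, right

def right(z):
    left, focus, rest = z
    return ([focus] + left, rest[0], rest[1:])

def replace(z, value):
    left, _, rest = z
    return (left, value, rest)                 # no mutation

def to_list(z):
    left, focus, rest = z
    return list(reversed(left)) + [focus] + rest

z = make_zipper([1, 2, 3, 4])
z = replace(right(z), 99)
print(to_list(z))   # [1, 99, 3, 4]
```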

LDBC benchmarks reach Public Draft

Filed under: Benchmarks,Linked Data — Patrick Durusau @ 4:12 pm

LDBC benchmarks reach Public Draft

From the post:

The Linked Data Benchmark Council (LDBC) is reaching a milestone today, June 23 2014, in announcing that two of the benchmarks that it has been developing since 1.5 years have now reached the status of Public Draft. This concerns the Semantic Publishing Benchmark (SPB) and the interactive workload of the Social Network Benchmark (SNB). In case of LDBC, the release is staged: now the benchmark software just runs read-only queries. This will be expanded in a few weeks with a mix of read- and insert-queries. Also, query validation will be added later. Watch this blog for the announcements to come, as this will be a matter of weeks to add.

The Public Draft stage means that the initial software (data generator, query driver) work and an initial technical specification and documentation has been written. In other words, there is a testable version of the benchmark available for anyone who is interested. Public Draft status does not mean that the benchmark has been adopted yet, it rather means that LDBC has come closer to adopting them, but is now soliciting feedback from the users. The benchmarks will remain in this stage at least until October 6. On that date, LDBC is organizing its fifth Technical User Community meeting. One of the themes for that meeting is collecting user feedback on the Public Drafts; which input will be used to either further evolve the benchmarks, or adopt them.

You can also see that we created this new website and a new logo. This website is different from http://ldbc.eu that describes the EU project which kick-starts LDBC. The ldbcouncil.org is a website maintained by the Linked Data Benchmark Council legal entity, which will live on after the EU project stops (in less than a year). The Linked Data Benchmark Council is an independent, impartial, member-sustained organization dedicated to the creation of RDF and graph data management benchmarks and benchmark practices.

What do you expect with an announcement of a public review draft?

A link to the public review draft?

If so, you are out of luck with the new Linked Data Benchmark Council website. Nice looking website, poor on content.

Let me help out:

The Social Network Benchmark 0.1 draft and supplemental materials.

The Semantic Publishing Benchmark 0.1 draft and supplemental materials.

Pointing readers to drafts makes it easier for them to submit comments. These drafts will remain open for comments “at least until October 6” according to the post.

At which time they will be further evolved or adopted. I suggest you review the drafts and get your comments in early.

My Top Clojure Articles

Filed under: Clojure,Functional Programming,Programming — Patrick Durusau @ 10:03 am

My Top Clojure Articles by Adam Bard.

From the post:

For the past few years, most of my posts have been beginner-intermediate essays on various clojure features and coding techniques. Since a lot of people have told me that they like my blog as a genre piece, I decided to pull some of my favorites into one place, and order them by difficulty, from Clojure beginner on up so that folks don’t have to root around.

I really hope to see Clojure become a widely-used general-purpose language, because, although much has been made of its general elegance and its propensity to be written by extremely clever people, I think it has a lot to offer mediocre programmers (like yours truly) with its practical feature-set, strong encouragement of good practices (immutability, pure functions) and useful tools like Leiningen and the excellent lisp REPL.

With that in mind, and because I’m really not past the intermediate level yet, I try to write articles targeted at people who are new to Clojure. And now I have enough such articles that I think it’s worthwhile to assemble them in one place. I’ll try to keep them up to date as I write more in the future.

Just a few of the titles to tempt you into reading the full post:

Clojure in 15 minutes

Five Mistakes Clojure Newbies Make

Acceptable Error Handling in Clojure

There are more where those came from!

Enjoy!

July 10, 2014

FVEY + N

Filed under: NSA,Security — Patrick Durusau @ 8:03 pm

Google Drive security hole leaks users’ files by Lisa Vaas.

From the post:

We often repeat this advice from former Naked Security writer Graham Cluley: for a better understanding of how you should approach security in the cloud, simply replace all instances of the words in the cloud with the words on somebody else’s computer.

Google just handed us another opportunity to do just that.

It turns out that Google Drive has been incontinent, dribbling out private data courtesy of a security hole concerning files with embedded URLs.

When someone clicks an embedded hyperlink, they get sent to the website of a third-party website owner.

Unfortunately, the flaw was also letting the website owner – an unauthorized party – view header information, potentially including the original document that included the URL.

Personally I would replace on somebody else’s computer with:

FVEY + N, where N = you + people who share your data/document.

FVEY:

The “Five Eyes”, often abbreviated as “FVEY”, refer to an anglophonic alliance comprising Australia, Canada, New Zealand, the United Kingdom and the United States. [Five Eyes]

Accidental leaks are nothing compared to the legal/illegal flood gates used by FVEY.

Ask yourself, “Do I feel lucky?”

Stupid Tag Tricks [Overview]

Filed under: News,Reporting — Patrick Durusau @ 7:15 pm

Stupid Tag Tricks by Jonathan Stray.

From the post:

Overview’s tags are very powerful, but it may not be obvious how to use them best. Here’s a collection of tagging tricks that have been helpful to our users, from Overview developer Jonas Karlsson.

The “tricks” include:

  • Tracking documents for review
  • Grouping Tags
  • Create a visualization from your tags
  • Tag all documents that do not contain tag “abc”
  • Tag all documents that have tags “a” OR “b” OR “c”
  • Tag all documents that have tags “a” AND “b” AND “c”

But, there is no “trick” for discovering when two or more different tags mean the same thing.

If we are annotating a collection of documents separately, we might use different tags to mean the same thing.

The last “tag trick” can collect all those documents together, but how do we find out which different tags meant the same thing?

If tags had properties, that is key/value pairs that identify the subject they represent, we could search those properties and discover different tags that meant the same thing.

In fact, we could write rules for when different tags represent the same subject.
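
A toy sketch of such a rule in Python follows; the tag names and properties are invented for illustration.

```python
# Tags carry key/value properties that identify their subject. A simple rule
# then spots different tags that represent the same subject.
tags = {
    "potus":     {"type": "person", "wikidata": "Q76"},
    "obama":     {"type": "person", "wikidata": "Q76"},
    "president": {"type": "office", "country": "US"},
}

def same_subject(a, b):
    """Rule: two tags name the same subject if they share a wikidata id."""
    return a.get("wikidata") and a.get("wikidata") == b.get("wikidata")

merged = [(x, y) for x in tags for y in tags
          if x < y and same_subject(tags[x], tags[y])]
print(merged)   # [('obama', 'potus')]
```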

That would lead to better sharing of tagged documents.

And to enhanced results from searching tagged documents.

Interested?

Light Table 0.6.7

Filed under: Clojure,Editor,Programming — Patrick Durusau @ 6:57 pm

Light Table 0.6.7 by Chris Granger.

From the post:

I just pushed 0.6.7, which while a small release has a few important bug fixes. There’s been a weird seemingly random issue with saving and sometimes evaling that forced you to switch tabs to get things correct again. That has finally been tracked down and fixed. Proxy support was added, initial load of workspace behaviors was fixed, and creating a clojure client by selecting a project.clj has been cleared up too. Thanks to everyone who contributed!

Time to download a new version of Light Table.

Better for errors to be from you than from the IDE. 😉

Data Journalism: Overpromised/Underdelivered?

Filed under: Journalism,News,Reporting — Patrick Durusau @ 3:02 pm

Alberto Cairo: Data journalism needs to up its own standards by Alberto Cairo.

From the post:

Did you know that wearing a helmet when riding a bike may be bad for you? Or that it’s possible to infer the rise of kidnappings in Nigeria from news reports? Or that we can predict the year when a majority of Americans will reject the death penalty (hint: 2044)? Or that it’s possible to see that healthcare prices in the U.S. are “insane” in 15 simple charts? Or that the 2015 El Niño event may increase the percentage of Americans who accept climate change as a reality?

But I have to confess my disappointment with the new wave of data journalism — at least for now. All the questions in the first paragraph are malarkey. Those stories may not be representative of everything that FiveThirtyEight, Vox, or The Upshot are publishing — I haven’t gathered a proper sample — but they suggest that, when you pay close attention at what they do, it’s possible to notice worrying cracks that may undermine their own core principles.

In my present interpretation of his examples, Alberto has good reason to complain.

But that doesn’t mean re-casting any of the stories would bring them closer to some “truth.” Rather, they would be closer to my norms for such stories. Which isn’t the same thing.

Or as Nietzsche would say: There are no facts, only interpretations.

People from presidents on down lay claim to “facts.” Your opponents can be pilloried for ignoring “facts.” Current mis-adventures in domestic and foreign security of the United States are predicated on emotional insecurities packaged as “facts.”

Acknowledging Nietzsche puts all “facts” on an even footing.

Enough diverse “facts” and it is harder to agree to spend $Trillions pursuing a security that is pushed further away with every dollar spent.

Visual Journalism Training Resources

Filed under: Journalism,News,Reporting — Patrick Durusau @ 10:48 am

BBC Opens Up Internal Visual Journalism Training Resources to the Public by Gannon Burgett.

From the post:

Last week, the BBC College of Journalism opened up their training website to the public. Full of educational resources created by and for the internal BBC team, these professional videos and guides run through a number of circumstances and suggestions for approaching visual journalism.

Set to be open for a 12 month trial run, the videos and podcasts cover topics that range from safety when harmed in the field, to iPhone photojournalism, to basic three-point lighting techniques and even videos that show you how to properly use satellite phones when capturing stories in unconventional areas.

A rather extraordinary set of resources!

Should give you a window into how the BBC views news reporting as well as the tools for news reporting on your own.

To see all the resources, see the BBC Academy page.

I first saw this in a tweet by Michael Peter Edson.

Peer Review Ring

Filed under: Peer Review,Transparency — Patrick Durusau @ 10:25 am

Scholarly journal retracts 60 articles, smashes ‘peer review ring’ by Fred Barbash.

From the post:

Every now and then a scholarly journal retracts an article because of errors or outright fraud. In academic circles, and sometimes beyond, each retraction is a big deal.

Now comes word of a journal retracting 60 articles at once.

The reason for the mass retraction is mind-blowing: A “peer review and citation ring” was apparently rigging the review process to get articles published.

You’ve heard of prostitution rings, gambling rings and extortion rings. Now there’s a “peer review ring.”

Favorable reviews were entered using fake identities as part of an open peer review process. The favorable reviews resulted in publication of those articles.

This was a peer review ring that depended upon false identities.

If peer review were more transparent, publications could explore the relationships between peer reviewers and those who reviewed their own papers, grants, and proposals, or their prior reviews of other authors and projects, looking for interesting patterns.

I first saw this in a tweet by Steven Strogatz.

Ontology-Based Interpretation of Natural Language

Filed under: Language,Ontology,RDF,SPARQL — Patrick Durusau @ 9:46 am

Ontology-Based Interpretation of Natural Language by Philipp Cimiano, Christina Unger, John McCrae.

Authors’ description:

For humans, understanding a natural language sentence or discourse is so effortless that we hardly ever think about it. For machines, however, the task of interpreting natural language, especially grasping meaning beyond the literal content, has proven extremely difficult and requires a large amount of background knowledge.

The book Ontology-based interpretation of natural language presents an approach to the interpretation of natural language with respect to specific domain knowledge captured in ontologies. It puts ontologies at the center of the interpretation process, meaning that ontologies not only provide a formalization of domain knowledge necessary for interpretation but also support and guide the construction of meaning representations.

The links under Resources for Ontologies, Lexica and Grammars, as of today return “coming soon.”

Implementations fare a bit better, returning information on various aspects of lemon.

lemon is a proposed meta-model for describing ontology lexica with RDF. It is declarative, thus abstracts from specific syntactic and semantic theories, and clearly separates lexicon and ontology. It follows the principle of semantics by reference, which means that the meaning of lexical entries is specified by pointing to elements in the ontology.

lemon-core

It may just be me but the Lemon model seems more complicated than asking users what identifies their subjects and distinguishes them from other subjects.

Lemon is said to be compatible with RDF, OWL, SPARQL, etc.

But, accurate (to a user) identification of subjects and their relationships to other subjects is more important to me than compatibility with RDF, SPARQL, etc.

You?

I first saw this in a tweet by Stefano Bertolo.

July 9, 2014

MuckRock

Filed under: Government,Government Data,News,Reporting — Patrick Durusau @ 4:53 pm

MuckRock

From the webpage:

MuckRock is an open news tool powered by state and federal Freedom of Information laws and you: Requests are based on your questions, concerns and passions, and you are free to embed, share and write about any of the verified government documents hosted here. Want to learn more? Check out our about page. MuckRock has been funded in part by grants from the Sunlight Foundation, the Freedom of the Press Foundation and the Knight Foundation.

Join Our Mailing List »

An amazing site.

I found MuckRock while looking for documents released by mistake by DHS. DHS Releases Trove of Documents Related to Wrong “Aurora” in Response to Freedom of Information Act (FOIA) Request (Maybe the DHS needs a topic map?)

I’ve signed up for their mailing list. Thinking about what government lies I want to read. 😉

Looks like a great place to use your data mining/analysis skills.

Enjoy!

Publication Quality Graphs With LaTeX

Filed under: Graphs,TeX/LaTeX — Patrick Durusau @ 3:50 pm

The question was asked on the Tex-LaTex Stack Exchange: How to draw graphs in LaTeX?

If you want professional quality graphs, see the answer and resources cited.

Enjoy!

Overview can now read most file formats directly

Filed under: News,Reporting — Patrick Durusau @ 3:38 pm

Overview can now read most file formats directly by Jonathan Stray.

From the post:

Previously, Overview could only read PDF files. (You can also import all documents in a single CSV file, or import a project from DocumentCloud.)

Starting today, you can directly upload documents in a wide variety of file formats. Simply add the files — or entire folders — using the usual file upload page.

Overview will automatically detect the file type and extract the text. Your document will be displayed as a PDF in your browser when you view it. Overview supports a wide variety of formats, including:

  • PDF
  • HTML
  • Microsoft Word (.doc and .docx)
  • Microsoft PowerPoint (.ppt and .pptx)
  • plain text, and also rich text (.rtf)

For a full list, see the file formats that LibreOffice can read.

This is good news!

Pass it on!

Quote for a Terrorism Topic Map?

Filed under: Authoring Topic Maps,Security — Patrick Durusau @ 1:55 pm

Reading:

British Airways has warned that passengers travelling to the US will be banned from their flight if they are unable to turn on their electronic devices when asked.

The airline said passengers will still be banned from travelling and need to reschedule even if they offer to abandon the item. British Airways says US-bound passengers will be BANNED if they can’t turn on mobile phone.

reminded me of another quotation about someone slavishly following the lead of another.

But I need your help finding it.

It was forty-odd years ago that I was reading a young adult account of Benito Mussolini and ran across an alleged direct quote from Mussolini in the early 1930s:

If I started hopping on one leg, that idiot in Munich [Hitler] would start bouncing on his head…

Not really my time period so I am unfamiliar with possible sources to track the alleged quote down. The usual suspects on the WWW have provided no answer.

Pointers?

PS: After no one followed their sycophantic excesses, British Airways backed off banning flyers who abandon non-working devices. UK follows US in banning uncharged devices from flights — and BA floated even tougher rules.

SAMUELS [English Historical Semantic Tagger]

Filed under: Humanities,Linguistics,Semantics,Tagging — Patrick Durusau @ 1:13 pm

SAMUELS (Semantic Annotation and Mark-Up for Enhancing Lexical Searches)

From the webpage:

The SAMUELS project (Semantic Annotation and Mark-Up for Enhancing Lexical Searches) is funded by the Arts and Humanities Research Council in conjunction with the Economic and Social Research Council (grant reference AH/L010062/1) from January 2014 to April 2015. It will deliver a system for automatically annotating words in texts with their precise meanings, disambiguating between possible meanings of the same word, ultimately enabling a step-change in the way we deal with large textual data. It uses the Historical Thesaurus of English as its core dataset, and will provide for each word in a text the Historical Thesaurus reference code for that concept. Textual data tagged in this way can then be accurately searched and precisely investigated, producing results which can be automatically aggregated at a range of levels of precision. The project also draws on a series of research sub-projects which will employ the software thus developed, testing and validating the utility of the SAMUELS tagger as a tool for wide-ranging further research.
….

To really appreciate this project, visit SAMUELS English Semantic Tagger Test Site.

There you can enter up to 2000 English words and select lower/upper year boundaries!

Just picking a text at random, ;-), I chose:

Greenpeace flew its 135-foot-long thermal airship over the Bluffdale, UT, data center early Friday morning, carrying the message: “NSA Illegal Spying Below” along with a link steering people to a new web site, StandAgainstSpying.org, which the three groups launched with the support of a separate, diverse coalition of over 20 grassroots advocacy groups and Internet companies. The site grades members of Congress on what they have done, or often not done, to rein in the NSA.

Some terms and their Semtag3 values by time period:

1500-1600:

  • congress: C09d01 [Sexual intercourse]; E07e16 [Inclination]; E08e12 [Movement towards a thing/person/position]
  • data: 04.10[Unrecognised]
  • thermal: 04.10[Unrecognised]
  • UT: 04.10[Unrecognised]
  • web: B06a07 [Disorders of eye/vision]; B22h08 [Class Arachnida (spiders, scorpions)]; B10 [Biological Substance];

1700-1800

  • congress: S06k17a [Diplomacy]; C09d01 [Sexual intercourse]; E07e16 [Inclination];
  • data: 04.10[Unrecognised]
  • thermal: 04.10[Unrecognised]
  • UT: 04.10[Unrecognised]
  • web: B06a07 [Disorders of eye/vision]; B22h08 [Class Arachnida (spiders, scorpions)]; B10 [Biological Substance];

1800-1900

  • congress: S06k17a [Diplomacy]; C09d01 [Sexual intercourse]; O07 [Conversation];
  • data: H55a [Attestation, witness, evidence];
  • thermal: A04b02 [Spring]; C09a [Sexual desire]; D03c02 [Heat];
  • UT: 04.10[Unrecognised]
  • web: B06a07 [Disorders of eye/vision]; B06d01 [Deformities of specific parts]; B25d [Tools and implements];

1900-2000

  • congress: S06k17a [Diplomacy]; C09d01 [Sexual intercourse]; O07 [Conversation];
  • data: F04v04 [Data]; H55a [Attestation, witness, evidence]; W05 [Information];
  • thermal: A04b02 [Spring]; B28b [Types/styles of clothing]; D03c02 [Heat];
  • UT: 04.10[Unrecognised]
  • web: B06d01 [Deformities of specific parts]; B22h08 [Class Arachnida (spiders, scorpions)]; B10 [Biological Substance];

2000-2014

  • congress: 04.10[Unrecognised]
  • data: 04.10[Unrecognised]
  • thermal: 04.10[Unrecognised]
  • UT: 04.10[Unrecognised]
  • web: 04.10[Unrecognised]

I am assuming that the “04.10[unrecognized]” for all terms in 2000-2014 means there is no usage data for that time period.

I have never heard anyone deny that meanings of words change over time and domain.

What remains a mystery is why the value-add of documenting the meanings of words isn’t obvious.

I say “words,” but I should be saying “data.” Remember the loss of the $125 million Mars Climate Orbiter: one system read a value as “pounds of force” and another read the same data as “newtons.” In that scenario, ET doesn’t get to call home.
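
Documenting the semantics of data can be as simple as carrying the unit with the number. A toy Python sketch, with illustrative values only:

```python
# Carry the unit with the value instead of assuming it.
UNIT_TO_NEWTONS = {"N": 1.0, "lbf": 4.4482216}   # pound-force to newtons

def thrust_in_newtons(value, unit):
    if unit not in UNIT_TO_NEWTONS:
        raise ValueError(f"unknown unit: {unit}")
    return value * UNIT_TO_NEWTONS[unit]

# One system emits pound-force, another newtons; both are now unambiguous.
print(thrust_in_newtons(100.0, "lbf"))   # about 444.8
print(thrust_in_newtons(100.0, "N"))     # 100.0
```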

So let’s rephrase the question to: Why isn’t the value-add of documenting the semantics of data obvious?

Suggestions?

July 8, 2014

The EC Brain

Filed under: Funding,Government — Patrick Durusau @ 7:30 pm

Scientists threaten to boycott €1.2bn Human Brain Project by Ian Sample.

From the post:

The world’s largest project to unravel the mysteries of the human brain has been thrown into crisis with more than 100 leading researchers threatening to boycott the effort amid accusations of mismanagement and fears that it is doomed to failure.

The European commission launched the €1.2bn (£950m) Human Brain Project (HBP) last year with the ambitious goal of turning the latest knowledge in neuroscience into a supercomputer simulation of the human brain. More than 80 European and international research institutions signed up to the 10-year project.

But it proved controversial from the start. Many researchers refused to join on the grounds that it was far too premature to attempt a simulation of the entire human brain in a computer. Now some claim the project is taking the wrong approach, wastes money and risks a backlash against neuroscience if it fails to deliver.

In an open letter to the European commission on Monday, more than 130 leaders of scientific groups around the world, including researchers at Oxford, Cambridge, Edinburgh and UCL, warn they will boycott the project and urge others to join them unless major changes are made to the initiative.

If you read Ian’s post and background material he cites, I think you will come away with the impression that all the concerns and charges are valid.

However, the question remains open whether a successful project was ever the goal of the EC. Or was there some other goal, such as funding particular people and groups, for which the project was a convenient vehicle?

If the project succeeded, all well and good but ten years from now, there will have been a decade of other grants with as little chance of success so who would remember this one in particular?

I don’t mean to single out EC projects or even governmental projects for that criticism.

If you remember Moral Mazes: The World of Corporate Managers by Robert Jackall, one of the lessons was that the goal of projects in a corporation isn’t improving the bottom line, success of the project, etc., but rather the allocation of project resources among competing groups.

As more evidence of that mentality, consider the laundry list of failed IT projects undertaken by the U.S. government. From the FBI’s Virtual Case Management System to the now famous, Greenpeace-monitored “secret” melting NSA data storage facility in Utah.

[image omitted: Greenpeace airship]

The purpose of the melting NSA data center wasn’t to store data (important steps were skipped in the design stage) but to transfer funds to NSA contractors for building and then repairing the data center. Which may or may not actually go into use.

If there was actual intent to use the data center, where are the complaints about failure to follow the design? Use of sub-standard materials?

Both the EC Brain and the US Government need a new project strategy: Success isn’t defined by the appropriation and spending of funds. Success is defined by the end results of the project when compared to its original goals.

Imagine having a topic map that traced EC and US funded projects and compared results to original goals.

Anyone interested in a funding investigation that specifies who was paid, who approved, etc?

Unlike Google, the voters should never forget who obtained the benefit of their tax dollars with no appreciable return.

Advanced Time-Series Pipelines

Filed under: Apache Crunch,Time Series — Patrick Durusau @ 6:52 pm

How-to: Build Advanced Time-Series Pipelines in Apache Crunch by Mirko Kämpf.

From the post:

Learn how creating dataflow pipelines for time-series analysis is a lot easier with Apache Crunch.

In a previous blog post, I described a data-driven market study based on Wikipedia access data and content. I explained how useful it is to combine several public data sources, and how this approach sheds light onto the hidden correlations across Wikipedia pages.

One major task in the above was to apply structural analysis to networks reconstructed by time-series analysis techniques. In this post, I will describe a different method: the use of Apache Crunch time-series processing pipelines for large-scale correlation and dependency analysis. The results of these operations will be a matrix or network representation of the underlying data set, which can be further processed in an Apache Hadoop (CDH) cluster via GraphLab, GraphX, or even Apache Giraph.

This article assumes that you know a bit about Crunch. If not, read the Crunch user guide first. Furthermore, this short how-to explains how to extract and re-organize data that is already stored in Apache Avro files, using a Crunch pipeline. All source code for the article is available in the crunch.TS project.

Initial Situation and Goal

In our example dataset, for each measurement period, one SequenceFile was generated. Such a file is called a “time-series bucket” and contains a key-value pair of types: Text (from Hadoop) and VectorWritable (from Apache Mahout). We use data types of projects, which do not guarantee stability over time, and we are dependent on Java as a programming language because others cannot read SequenceFiles.

The dependency on external libraries, such as the VectorWritable class from Mahout, should be removed from our data representation and storage layer, so it is a good idea to store the data in an Avro file. Such files can also be organized in a directory hierarchy that fits to the concept of Apache Hive partitions. Data processing will be done in Crunch, but for fast delivery of pre-calculated results, Impala will be used.

A more general approach will be possible later on if we use Kite SDK and Apache HCatalog, as well. In order to achieve interoperability between multiple analysis tools or frameworks — and I think this is a crucial aspect in data management, even in the case of an enterprise data hub — you have to think about access patterns early.

Worth your attention as an incentive to learn more about Apache Crunch. Aside from the benefit of learning more about processing time-series data.
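
For readers who just want the shape of the result, here is a small NumPy sketch of the correlation-matrix-to-network step the post describes. It is not Crunch code, only an illustration of the idea, and the series names and values are made up.

```python
# Pairwise correlation over time series, thresholded into network edges.
import numpy as np

series = {
    "page_a": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
    "page_b": np.array([2.0, 4.1, 5.9, 8.2, 9.8]),
    "page_c": np.array([5.0, 1.0, 4.0, 2.0, 3.0]),
}

names = list(series)
corr = np.corrcoef([series[n] for n in names])   # rows are variables

threshold = 0.9
edges = [(names[i], names[j], corr[i, j])
         for i in range(len(names)) for j in range(i + 1, len(names))
         if abs(corr[i, j]) >= threshold]
print(edges)   # only the strongly correlated pair survives, e.g. page_a-page_b
```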

Setting up your own Data Refinery

Filed under: Data Mining — Patrick Durusau @ 4:00 pm

Setting up your own Data Refinery by Shawn Graham.

From the post:

I’ve been playing with a Mac. I’ve been a windows person for a long time, so bear with me.

I’m setting up a number of platforms locally for data mining. But since what I’m *really* doing is smelting the ore of data scraped using things like Outwit Hub or Import.io (the ‘mining operation’, in this tortured analogy), what I’m setting up is a data refinery. Web based services are awesome, but if you’re dealing with sensitive data (like oral history interviews, for example) you need something local – this will also help with your ethics board review too. Onwards!

Shawn provides basic Mac setup instructions.

The same software is available for Windows and *nix platforms.

Enjoy!

Crowdscraping – You Game?

Filed under: Corporate Data,Crowd Sourcing,Open Data,Web Scrapers — Patrick Durusau @ 1:12 pm

Launching #FlashHacks: a crowdscraping movement to release 10 million data points in 10 days. Are you in? by Hera.

From the post:

The success story that is OpenCorporates is very much a team effort – not just the tiny OpenCorporates core team, but the whole open data community, who from the beginning have been helping us in so many ways, from writing scrapers for company registers, to alerting us when new data is available, to helping with language or data questions.

But one of the most common questions has been, “How can I get data into OpenCorporates“. Given that OpenCorporates‘ goal is not just every company in the world but also all the public data that relates to those companies, that’s something we’ve wanted to allow, as we would not achieve that alone, and it’s something that will make OpenCorporates not just the biggest open database of company data in the world, but the biggest database of company data, open or proprietary.

To launch this new era in corporate data, we are launching a #FlashHacks campaign.

Flash What? #FlashHacks.

We are inviting all Ruby and Python botwriters to help us crowdscrape 10 million data points into OpenCorporates in 10 days.

How you can join the crowdscraping movement

  • Join missions.opencorporates.com and sign up!
  • Have a look at the datasets we have listed on the Campaign page as inspiration. You can either write bots for these or even choose your own!
  • Sign up to a mission! Send a tweet pledge to say you have taken on a mission.
  • Write the bot and submit on the platform.
  • Tweet your success with the #FlashHacks tag! Don’t forget to upload the FlashHack design as your twitter cover photo and facebook cover photo to get more people involved.

Join us on our Google Group, share problems and solutions, and help build the open corporate data community.

If you are interested in covering this story, you can view the press release here.

Also of interest: Ruby and Python coders – can you help us?

To join this crowdscrape, sign up at: missions.opencorporates.com.
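
If you have never written a scraper bot, the core pattern is small. Here is a hedged Python sketch; the URL, table layout, and field names are hypothetical, and the missions site documents the actual submission format.

```python
# Minimal scraper-bot sketch: fetch a register page, pull rows out of a table.
# The URL and CSS selectors below are placeholders, not a real register.
import requests
from bs4 import BeautifulSoup

def scrape_register(url="https://example-register.gov/companies"):
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.select("table.companies tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 2:
            yield {"company_number": cells[0], "name": cells[1]}

if __name__ == "__main__":
    for record in scrape_register():
        print(record)
```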

Tweet, email, post, etc.

Could be the start of a new social activity, the episodic crowdscrape.

Are crowdscrapes an answer to massive data dumps from corporate interests?

I first saw this in a tweet by Martin Tisne.

Introduction to R for Life Scientists:…

Filed under: R,Science — Patrick Durusau @ 12:42 pm

Introduction to R for Life Scientists: Course Materials by Stephen Turner.

From the post:

Last week I taught a three-hour introduction to R workshop for life scientists at UVA’s Health Sciences Library.

[image omitted]

I broke the workshop into three sections:

In the first half hour or so I presented slides giving an overview of R and why R is so awesome. During this session I emphasized reproducible research and gave a demonstration of using knitr + rmarkdown in RStudio to produce a PDF that can easily be recompiled when data updates.

In the second (longest) section, participants had their laptops out with RStudio open coding along with me as I gave an introduction to R data types, functions, getting help, data frames, subsetting, and plotting. Participants were challenged with an exercise requiring them to create a scatter plot using a subset of the built-in mtcars dataset.

We concluded with an analysis of RNA-seq data using the DESeq2 package. We started with a count matrix and a metadata file (the modENCODE pasilla knockout data packaged with DESeq2), imported the data into a DESeqDataSet object, ran the DESeq pipeline, extracted results, and did some basic visualization (MA-plots, PCA, volcano plots, etc). A future day-long course will cover RNA-seq in more detail (intro UNIX, alignment, & quantitation in the morning; intro R, QC, and differential expression analysis in the afternoon).

Pass it along to any life scientists you meet and/or review it yourself to pick up life science terminology and expectations.

I first saw this in a tweet by Christophe Lalanne.

