Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

September 10, 2015

Spark Release 1.5.0

Filed under: Data Frames,GraphX,Machine Learning,R,Spark,Streams — Patrick Durusau @ 1:42 pm

Spark Release 1.5.0

From the post:

Spark 1.5.0 is the sixth release on the 1.x line. This release represents 1400+ patches from 230+ contributors and 80+ institutions. To download Spark 1.5.0 visit the downloads page.

You can consult JIRA for the detailed changes. We have curated a list of high level changes here:

Time for your Fall Spark Upgrade!

Enjoy!

The Bogus Bogeyman of the Brainiac Robot Overlord

Filed under: Artificial Intelligence — Patrick Durusau @ 10:55 am

The Bogus Bogeyman of the Brainiac Robot Overlord by James Kobielus.

From the post:


One of the most overused science-fiction tropes is that of the super-intelligent “robot overlord” that, through human negligence or malice, has enslaved us all. Any of us can name at least one of these off the tops of our heads (e.g., “The Matrix” series). Fear of this Hollywood-fueled cultural bogeyman has stirred up anxiety about the role of machine learning, cognitive computing, and artificial intelligence (AI) in our lives, as I discussed in this recent IBM Big Data & Analytics Hub blog. It’s even fostering uneasiness about the supposedly sinister potential for our smart devices to become “smarter” than us and thereby invisibly monitor and manipulate our every action. I discussed that matter in this separate blog.

This issue will be with us forever, much the way that UFO conspiracy theorists have kept their article of faith alive in the popular mind since the early Cold War era. In the Hollywood-stoked popular mindset that surrounds this issue, the supposed algorithmic overlords represent the evil puppets dangled among us by “Big Brother,” diabolical “technocrats,” and other villains for whom there’s no Superman who might come to our rescue.

Highly entertaining take on the breathless reports that we have to stop some forms of research now or we will be enslaved and then eradicated by our machines.

You could construct a very large bulldozer and instruct it to flatten Los Angeles, but that’s not an AI problem, that’s an HI (human intelligence) issue.

I first saw this because Bob DuCharme tweeted:

“The Bogus Bogeyman of the Brainiac Robot Overlord”: best article title I’ve seen in a long time

+1! to that!

50 Spies Say ISIS Intelligence Was Cooked

Filed under: Government,Intelligence,Security — Patrick Durusau @ 10:34 am

50 Spies Say ISIS Intelligence Was Cooked by Shane Harris and Nancy A. Youssef.

From the post:

More than 50 intelligence analysts working out of the U.S. military’s Central Command have formally complained that their reports on ISIS and al Qaeda’s branch in Syria were being inappropriately altered by senior officials, The Daily Beast has learned.

The complaints spurred the Pentagon’s inspector general to open an investigation into the alleged manipulation of intelligence. The fact that so many people complained suggests there are deep-rooted, systemic problems in how the U.S. military command charged with the war against the self-proclaimed Islamic State assesses intelligence.

“The cancer was within the senior level of the intelligence command,” one defense official said.

Two other examples of “cooked” intelligence come to mind:

S. Rept. 108-301 – REPORT OF THE SELECT COMMITTEE ON INTELLIGENCE on the U.S. INTELLIGENCE COMMUNITY’S PREWAR INTELLIGENCE ASSESSMENTS ON IRAQ together with ADDITIONAL VIEWS

Some of the results from that “cooked” intelligence include a costly war with Iraq and further destabilization of the Middle East.

The Pentagon Papers (Vietnam).

The “cooked” intelligence in Vietnam resulted in human and environmental costs that have never been adequately tallied.

Anyone, inside or outside the intelligence community, who acts “shocked” that intelligence is “cooked” for political ends is either demented or extraterrestrial.

Cooked intelligence is used by the intelligence community to justify its existence and by government departments to further their own budgets and agendas. Why would anyone be surprised that politicians cook intelligence for their own ends?

The cult of secrecy around intelligence is what enables the cooking of intelligence. If the information collected by the NSA, CIA and others was dumped onto GitHub on a regular basis, the ability of anyone to “cook” intelligence would be greatly diminished.

Or perhaps better, if intelligence data were available to everyone, then there would be a variety of dishes of “cooked” intelligence to choose from.

For all the frothing cries of “Danger! Danger!” that follow every leak of classified data, have you ever seen reports of anyone being called to account based on those leaks?

Of course not! The danger to others from TS/SCI classified data serves to enhance the status of those with clearance and avoids principled disagreement because “they know something you don’t.”

And that’s true, they do know something you don’t. What is often omitted is that what they know is often of no interest to anyone.

NLP for Security: Malicious Language Processing

Filed under: Cybersecurity,Security — Patrick Durusau @ 9:00 am

NLP for Security: Malicious Language Processing by Bobby Filar

From the post:

Natural Language Processing (NLP) is a diverse field in computer science dedicated to automatically parsing and processing human language. NLP has been used to perform authorship attribution and sentiment analysis, as well as being a core function of IBM’s Watson and Apple’s Siri. NLP research is thriving due to the massive amounts of diverse text sources (e.g., Twitter and Wikipedia) and multiple disciplines using text analytics to derive insights. However, NLP can be used for more than human language processing and can be applied to any written text. Data scientists at Endgame apply NLP to security by building upon advanced NLP techniques to better identify and understand malicious code, moving toward an NLP methodology specifically designed for malware analysis—a Malicious Language Processing framework. The goal of this Malicious Language Processing framework is to operationalize NLP to address one of the security domain’s most challenging big data problems by automating and expediting the identification of malicious code hidden within benign code.

Bobby provides pointers to NLP being used for identifying malicious domains, source code analysis, phishing identification and malware family analysis before discussing traditional NLP tasks in a code analysis setting.

For example, how to perform stemming and lemmatization on source code? Or for that matter, what is the equivalent of POS tagging for source code?
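
There’s no settled answer, but as a toy illustration (mine, not Endgame’s) of what “stemming” might mean for code, you could reduce identifiers to their constituent word roots by splitting snake_case and camelCase:

import re

def split_identifier(name):
    """Split a source-code identifier into lowercase word parts.

    A crude analog of stemming for code: 'getUserName' and
    'get_user_name' both reduce to ['get', 'user', 'name'].
    """
    words = []
    for part in name.split("_"):
        # Break camelCase runs: 'HTTPResponseCode' -> 'HTTP', 'Response', 'Code'.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words if w]

print(split_identifier("getUserName"))       # ['get', 'user', 'name']
print(split_identifier("HTTPResponseCode"))  # ['http', 'response', 'code']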

More questions than answers but new tools all start that way.

I first saw this in a tweet by Alyona Medelyan.

Copyrighted Inkspots?

Filed under: Intellectual Property (IP) — Patrick Durusau @ 5:57 am

Hacker mag 2600 laughs off Getty Images inkspots copyright claim by Richard Chirgwin.

From the post:

Venerable hacker publication 2600 is fighting off what looks like an early candidate for the most egregious copyright infringement accusation of 2015.

On a 2012 cover, 2600 used an ink-splatter effect. A group naming itself the Trunk Archive – ultimately owned by Getty Images – is now playing the pay-up game because it’s got an image that also has an ink-splatter effect.

“We thought it was a joke for almost an entire day until one of us figured out that they were actually claiming our use of a small bit of ink splatter that was on one of their images was actionable”, the 2600 team wrote on Tuesday.

Richard discloses the source of the 2600 inkspots (not owned by Getty Images) and resources should you receive an “extortion” letter from Trunk Archive.

Copyright enforcement is a non-creative activity and distracts others from being creative. Odd outcome for a policy that alleges it encourages creativity.

September 9, 2015

Anatomy of a malicious email: Crooks exploiting recent Word hole

Filed under: Cybersecurity,Security — Patrick Durusau @ 8:50 pm

Anatomy of a malicious email: Crooks exploiting recent Word hole by Paul Ducklin.

From the post:

SophosLabs has drawn our attention to a new wave of malware attacks using a recent security bug in Microsoft Word.

The bug, known as CVE-2015-1641, was patched by Microsoft back in April 2015 in security bulletin MS15-033.

The vulnerability was declared to be “publicly disclosed,” meaning that its use wasn’t limited only to the sort of crooks who hang out in underground exploit forums.

Of course, turning a potential Remote Code Execution (RCE) vulnerability into a reliably-working exploit isn’t always as easy as it sounds, but that has happened here.

Here’s how the new attacks go down.

Paul does a great job of covering the details of this attack and about Word attachment attacks in general. Highly recommended reading.

He closes with security suggestions, and one in particular I want to call to your attention:

Avoid opening unexpected or unsolicited attachments.

Write that down!

I don’t care if the president of the enterprise allegedly wrote to you (why would he?).

If it is unexpected/unsolicited, don’t open it.

If you think it is important, call the sender to verify they actually sent it.

Not a perfect defense because a legitimate sender may be infected but it will get you past one entire category of vulnerabilities.

Praise For Conservative Bible Translations

Filed under: Bible,Translation — Patrick Durusau @ 8:20 pm

I don’t often read praise for conservative Bible translations, but they can have unexpected uses:

Linguists use the Bible to develop language technology for small languages reports:


Anders Søgaard and his colleagues from the project LOWLANDS: Parsing Low-Resource Languages and Domains are utilising the texts which were annotated for big languages to develop language technology for smaller languages, the key to which is to find translated texts so that the researchers can transfer knowledge of one language’s grammar onto another language:

“The Bible has been translated into more than 1,500 languages, even the smallest and most ‘exotic’ ones, and the translations are extremely conservative; the verses have a completely uniform structure across the many different languages which means that we can make suitable computer models of even very small languages where we only have a couple of hundred pages of biblical text,” Anders Søgaard says and elaborates:

“We teach the machines to register what is translated with what in the different translations of biblical texts, which makes it possible to find so many similarities between the annotated and unannotated texts that we can produce exact computer models of 100 different languages — languages such as Swahili, Wolof and Xhosa that are spoken in Nigeria. And we have made these models available for other developers and researchers. This means that we will be able to develop language technology resources for these languages similar to those which speakers of languages such as English and French have.”

Anders Søgaard and his colleagues have recently presented their results in the article “If all you have is a bit of the Bible” at the Annual Meeting of the Association for Computational Linguistics.

The abstract for the paper: If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages reads:

We present a simple method for learning part-of-speech taggers for languages like Akawaio, Aukan, or Cakchiquel – languages for which nothing but a translation of parts of the Bible exists. By aggregating over the tags from a few annotated languages and spreading them via word-alignment on the verses, we learn POS taggers for 100 languages, using the languages to bootstrap each other. We evaluate our cross-lingual models on the 25 languages where test sets exist, as well as on another 10 for which we have tag dictionaries. Our approach performs much better (20-30%) than state-of-the-art unsupervised POS taggers induced from Bible translations, and is often competitive with weakly supervised approaches that assume high-quality parallel corpora, representative monolingual corpora with perfect tokenization, and/or tag dictionaries. We make models for all 100 languages available.

All of the resources used in this project, along with their models, can be found at: https://bitbucket.org/lowlands/
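
To make the projection idea concrete, here is a toy sketch of tag projection for a single verse pair. The tags, alignment pairs, and majority vote are my simplification; the paper aggregates over many annotated languages:

from collections import Counter, defaultdict

# Hypothetical annotated verse and word alignments (source_idx, target_idx).
source_tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
alignments = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (1, 4)]

# Each aligned source word "votes" its POS tag onto the target word.
votes = defaultdict(Counter)
for s, t in alignments:
    votes[t][source_tags[s]] += 1

# A target word takes the majority tag among its aligned source words.
target_tags = {t: c.most_common(1)[0][0] for t, c in votes.items()}
print(target_tags)  # {0: 'DET', 1: 'NOUN', 2: 'VERB', 3: 'DET', 4: 'NOUN'}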

Don’t forget conservative Bible translations if you are doing linguistic models.

How to Stop a Smart Car

Filed under: Cybersecurity,Security — Patrick Durusau @ 7:11 pm

Advances in technology render traditional methods of stopping cars obsolete!

No more police roadblocks:

[image: police roadblock]

No more spike strips:

[image: spike strip]

No more PIT maneuvers:

[image: PIT maneuver diagram]

Now you only need a laser pointer!

[image: laser pointer]

John Zorabedian reports in Self-driving cars can be stopped with a laser pointer:


Jonathan Petit was able to launch a denial-of-service attack against a self-driving car by overwhelming the car’s sensors with images of fake vehicles and other objects.

As Petit describes in a paper he will present at Black Hat Europe, he recorded the pulses emitted by objects with a commercial lidar (light detection and ranging) system that self-driving cars use to detect objects.

By beaming the pulses back at a lidar on a self-driving car with a laser pointer, he could force the car into slowing down or stopping to avoid hitting phantom objects.

In an interview with IEEE Spectrum, Petit explained that spoofing objects like cars, pedestrians, or walls was fairly simple; and his attack could be replicated with a kit costing just $60.

Compare the $60 to stop a “smart” car versus $313.94 for a set of MS10 spikes, $45K for a Ford Taurus for the PIT maneuver, or right at $225,000 in cars for the full police roadblock (five cars x $45K).

What lies at the root of the smart car vulnerability?

Unchecked input. The same cause of buffer overflows, SQL injection attacks, etc. A leading cause of computer vulnerabilities is spreading to cars.
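
A toy illustration of the point (the scan format, numbers, and persistence rule are all hypothetical; real systems fuse several sensors): compare a controller that brakes on any single lidar return with one that demands the “obstacle” persist across scans.

# Hypothetical lidar scans: each scan is a list of (angle_deg, distance_m).
def braking_naive(scan, threshold_m=10.0):
    """Unchecked input: any one return inside the threshold triggers braking."""
    return any(dist < threshold_m for _, dist in scan)

def braking_checked(scans, threshold_m=10.0, min_persistence=3):
    """Require the obstacle to appear in several consecutive scans
    before treating it as real -- one crude plausibility check."""
    hits = 0
    for scan in scans:
        hits = hits + 1 if any(d < threshold_m for _, d in scan) else 0
        if hits >= min_persistence:
            return True
    return False

spoofed = [[(12.0, 4.2)], [], [], []]   # one injected phantom return
print(braking_naive(spoofed[0]))        # True: the naive car slams the brakes
print(braking_checked(spoofed))         # False: the blip never persists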

Waiting for the day that failure to check input data = strict liability + punitive damages.

Ford got a taste of liability with the exploding Pintos.

Are smart car manufacturers lining up for another taste?

September 8, 2015

Langton’s ant

Filed under: Cellular Automata — Patrick Durusau @ 7:30 pm

Langton’s ant (Wikipedia)

If two-dimensional Turing machines seem too simple, despite their complex emergent behavior, perhaps you would like to push them into three (3) or more dimensions?

The Wikipedia page lists Langton’s Ant in 3D, which has source code, so you have a leg up on 3D ants.
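
For orientation before going 3D, the classic 2D ant fits in a dozen lines; a minimal sketch:

def langtons_ant(steps):
    """Run Langton's ant on an unbounded grid, all cells initially white.

    Rules: on a white cell turn right, on a black cell turn left,
    then flip the cell's color and move forward one square.
    """
    black = set()                  # track only the black cells
    x, y, dx, dy = 0, 0, 0, 1     # start at the origin, facing "up"
    for _ in range(steps):
        if (x, y) in black:
            dx, dy = -dy, dx       # turn left (counterclockwise)
            black.remove((x, y))
        else:
            dx, dy = dy, -dx       # turn right (clockwise)
            black.add((x, y))
        x, y = x + dx, y + dy
    return black

# After roughly 10,000 steps the chaos resolves into the famous "highway."
print(len(langtons_ant(11000)))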

Greater than 3D, anyone? Understanding, of course, that you would encounter all the usual visualization problems of greater-than-three-dimensional objects. Comparison/analysis of such objects?

I saw this in a tweet by Computer Science.

Investigation of Sound

Filed under: Infographics,Visualization — Patrick Durusau @ 6:52 pm

Investigation of Sound by Dorthy Lei.

From the post:

The word “infographics” has become a cliché nowadays. Whether you are a company trying to present marketing data or innovations to clients; charity that needs to effectively show the way you will spend donations; or, a lecturer sharing information to your peers and students. The question is still the same. How do you show your information in a simple and interesting way to your audience?

The answer is – Infographics

The definition of infographics according to the Design Handbook by Jenn and Ken Visocky O’Grady is: “Information design is about the clear and effective presentation of information. It involves a multi and interdisciplinary approach to communication, combining skills from graphic design, technical and non-technical authoring, psychology, communication theory and cultural studies.” [Thissen, 2004]

We need infographics in many different situations. Presenting survey data, simplifying a complicated idea, explaining how something works and comparing information. This is especially true in today’s world where information is becoming increasing prominent in our daily lives. We can use infographics to make this clearer.

Looking back to six years ago, I was unsure what the word meant. I remember my tutors’ guidance in helping me to explore the concept for myself. I thoroughly enjoyed discovering the many ways in which information can be presented visually.

In one exercise, we were asked to work with a partner and choose a space approximately three meters square. It was here that we would spend time on two occasions, and three hours on each occasion a few weeks apart.

The closing paragraph crystallizes why you should read this post:

An easy-to-read infographic makes information presentable and digestible to its audience. We have different types of infographics. Some are static, while other are interactive, allowing the user to explore and filter information as they please. I am glad I am able to tell the story of things which cannot be seen or touched. I believe this will help us to understand our lives for the better.

What “…things which cannot be seen or touched…” do you want to tell stories about?

Leaping the chasm from proprietary to open: …

Filed under: Open Source — Patrick Durusau @ 4:29 pm

by Bryan Cantrill.

Slides: http://www.slideshare.net/bcantrill/leaping-the-chasm-from-proprietary-to-open-a-survivors-guide.

Full illumos history mentioned in talk: https://www.youtube.com/watch?v=-zRN7XLCRhc

Corporate open source anti-patterns: https://www.youtube.com/watch?v=Pm8P4oCIY3g

Very high energy presentation starting with the early history of software. Great coverage of the history of Solaris.

My favorite quip:

Everything you think about Oracle is true; in fact, it is truer than you think it could be.

You will greatly enjoy the disclaimer story.

Against natural law: the “…assertion that APIs can be copyrighted!”

Open source projects by Joyent:

SmartDataCenter: https://github.com/joyent/sdc

Manta: https://github.com/joyent/manta

My takeaway? Despite all the amusing stories and tales, I would have to pick “use a weak copy-left license.”

Alda: A Manifesto and Gentle Introduction

Filed under: Clojure,Music — Patrick Durusau @ 11:02 am

Alda: A Manifesto and Gentle Introduction by Dave Yarwood.

From the webpage:

What is Alda?

2015-08-18-alda

Alda’s ambition is to be a powerful and flexible music programming language that can be used to create music in a variety of genres by typing some code into a text editor and running a program that compiles the code and turns it into sound. I’ve put a lot of thought into making the syntax as intuitive and beginner-friendly as possible. In fact, one of the goals of Alda is to be simple for someone with little-to-no programming experience to pick up and start using. Alda’s tagline, a music programming language for musicians, conveys its goal of being useful to non-programmers.

But while its syntax aims to be as simple as possible, Alda will also be extensive in scope, offering composers a canvas with creative possibilities as close to unlimited as it can muster. I’m about to ramble a little about the inspiring creative potential that audio programming languages can bring to the table; it is my hope that Alda will embody much of this potential.

At the time of writing, Alda can be used to create MIDI scores, using any instrument available in the General MIDI sound set. In the near future, Alda’s scope will be expanded to include sounds synthesized from basic waveforms, samples loaded from sound files, and perhaps other forms of synthesis. I’m envisioning a world where programmers and non-programmers alike can create all sorts of music, from classical to chiptune to experimental soundscapes, using only a text editor and the Alda executable.

In this blog post, I will walk you through the steps of setting up Alda and writing some basic scores.

Whether you want to create new compositions or be able to “hear” what you can read in manuscript form, this looks like an exciting project.

A couple of resources if you are interested in historical material:

The Morgan Library &amp: Museum’s Music Manuscripts Online, which as works by J. S. Bach, Beethoven, Brahms, Chopin, Debussy, Fauré, Haydn, Liszt, Mahler, Massenet, Mendelssohn, Mozart, Puccini, Schubert, and Schumann, and others.

Digital Image Archive of Medieval Music (DIAMM).

The sources archived include all the currently known fragmentary sources of polyphony up to 1550 in the UK (almost all of these are available for study through this website); all the ‘complete’ manuscripts in the UK; a small number of important representative manuscripts from continental Europe; a significant portion of fragments from 1300-1450 from Belgium, France, Italy, Switzerland and Spain. Such a collection of images, created under strict protocols to ensure parity across such a varied collection, has never before been possible, and represents an extraordinary resource for study of the repertory as a whole. Although these manuscripts have been widely studied since their gradual discovery by scholars at various times over the past century, dealing with the repertory as a whole has been hampered by the very wide geographical spread of the manuscripts and the limitations of microfilm or older b/w photography. Fragments are far more numerous than complete sources, but most of them are the sole remaining representatives of lost manuscripts. Some are barely legible and hard to place and interpret. They amount to a rich but widely scattered resource that has been relatively neglected, partly because of difficulty of access, legibility and comparison of materials that are vulnerable to damage and loss.

Being aware, of course, that music notation has evolved over the years and capturing medieval works will require mastery of their notations.

A mapping from any form of medieval notation to Alda would, I am sure, be of great interest.

A Formerly Secret Backdoor for Hackers (Seagate) [Auto Recall Analogy]

Filed under: Cybersecurity,Security — Patrick Durusau @ 10:08 am

Warning! Seagate Wireless Hard Drives Have a Secret Backdoor for Hackers by Khyati Jain.

From the post:

Several of Seagate’s 3rd generation Wireless Hard drives have a secret backdoor for hackers that puts users’ data at risk.

A Recent study done by the security researchers at Tangible Security firm disclosed an “undocumented Telnet services” with a hard-coded password in Seagate Wireless Hard Drives.

The secret Telnet Vulnerability (CVE-2015-2874) with an inbuilt user account (default username and password — “root”) allows an attacker to access the device remotely, left users data vulnerable to theft.

But wait! There is an easy fix!

Fortunately, there’s an easy fix. Seagate recommended its affected customers to update the device firmware to version 3.4.1.105 to address these issues.

Oh, yeah, but what about all those Seagate Wireless Hard Drives that are already in the supply chain?

FYI: It need not say “Seagate” on the outside to be a vulnerable Seagate product.

Imagine if Ford brake recalls (so far in 2015) offered you a free brake repair kit you could order online, with the cost of installation placed on you.

I wonder how well that would go over?

Shifting repair costs and obligations to end users has proven to be a highly ineffectual way of maintaining software security.

I don’t have a magic solution but continuing the current model and expecting different results is madness.

September 7, 2015

DataGraft: Initial Public Release

Filed under: Cloud Computing,Data Conversion,Data Integration,Data Mining — Patrick Durusau @ 3:21 pm

DataGraft: Initial Public Release

As a former resident of Louisiana and given my views on the endemic corruption in government contracts, putting “graft” in the title of anything is like waving a red flag at a bull!

From the webpage:

We are pleased to announce the initial public release of DataGraft – a cloud-based service for data transformation and data access. DataGraft is aimed at data workers and data developers interested in simplified and cost-effective solutions for managing their data. This initial release provides capabilities to:

  • Transform tabular data and share transformations: Interactively edit, host, execute, and share data transformations
  • Publish, share, and access RDF data: Data hosting and reliable RDF data access / data querying

Sign up for an account and try DataGraft now!

You may want to check out our FAQ, documentation, and the APIs. We’d be glad to hear from you – don’t hesitate to get in touch with us!

I followed a tweet from Kirk Borne recently to a demo of Pentaho on data integration. I mention that because Pentaho is a good representative of the commercial end of data integration products.

Oh, the demo was impressive, a visual interface selecting nicely styled icons from different data sources, integration, visualization, etc.

But the one characteristic it shares with DataGraft is that I would be hard pressed to follow or verify the reasoning behind integrating any particular data.

If it happens that both files have customerID and they both have the same semantic, by some chance, then you can glibly talk about integrating data from diverse resources. If not, well, then your mileage will vary a great deal.

The important point that is dropped by both Pentaho and DataGraft is that data integration isn’t just an issue for today; that same data integration must remain robust long after I have moved on to another position.

Like spreadsheets, the next person in my position could just run the process blindly and hope that no one ever asks for a substantive change, but that sounds terribly inefficient.

Why not provide users with the ability to disclose the properties they “see” in the data sources and to indicate why they made the mappings they did?

That is, make the mapping process more transparent.
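
A minimal sketch of what that disclosure could look like (the field names and rationale text are invented for illustration): record the evidence for each mapping, and for each mapping you refused, alongside the mapping itself.

# Each entry records not just source -> target, but why the mapper believed it.
column_mappings = [
    {
        "source": "crm.customerID",
        "target": "billing.cust_id",
        "basis": "Both hold the CRM-issued customer number; confirmed "
                 "against the CRM data dictionary, v4.2.",
    },
    {
        "source": "legacy.customerID",
        "target": None,  # deliberately unmapped
        "basis": "Same column name, different semantic: legacy IDs were "
                 "reissued in the 2009 migration and collide with CRM numbers.",
    },
]

for m in column_mappings:
    target = m["target"] or "(unmapped)"
    print(f"{m['source']} -> {target}: {m['basis']}")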

Emacs for developers

Filed under: Editor,Programming — Patrick Durusau @ 2:44 pm

Emacs for developers by Pierre Lecocq.

Speaking of developers:

This document will (hopefully) help you to use Emacs as a developer.

Disclaimer

  • Work in progress, so stay tuned.
  • Exports are handled by the author for the moment and do not reflect the very last changes (but are recent anyway). A scripted solution will be provided soon for contributors.

Contributions or requests

Do not hesitate to send a pull request or open an issue to fix, add, discuss, … etc.

If you are already a developer using Emacs, this document won’t be very interesting. If on the other hand you are a beginning developer or looking for the ultimate in configurability, this may hit the spot.

Even if you are using Emacs for development, scanning the documentation won’t hurt.

I discovered company-mode, “…a text completion framework for Emacs.” Company-mode accepts custom backends. I can think of several potential backends that would be real time savers.

The Language of Developers (humor)

Filed under: Humor — Patrick Durusau @ 2:14 pm

Dare Obasanjo posted this list on Twitter:

developer-language

“Self-documenting” is my personal favorite.

We are all guilty of it, albeit in different contexts.

Specialized terminologies are time savers and improve accuracy, between experts using the same terminology.

They also serve as guild barriers and promote job security.

Yes?

Do you think obscurity is one of the reasons you remain employed?

That’s sad.

September 5, 2015

Topic Map Fodder – Standards For DATA Act

Filed under: DATA Act,Government,Topic Maps — Patrick Durusau @ 8:40 pm

OMB, Treasury finalize standards for DATA Act by Greg Otto.

From the post:

The White House’s Office of Management and Budget announced Monday that after more than a year of discussion, all 57 data standards related to the Digital Accountability and Transparency Act have been finalized.

In a White House blog post, OMB Controller and acting Deputy Director for Management David Mader and Commissioner of the Treasury Department’s Financial Management Service David Lebryk called the standards decree a “key milestone” in making sure the public eventually has a transparent way to track government spending.

Twenty-seven standards were already agreed upon as of July 10, with another 30 open for comment on the act’s GitHub page over the past few weeks. These data points will be part of the law that requires agencies to make their financial, budget, payment, grant and contract data interoperable when published to USASpending.gov, the federal government’s hub of publicly available financial data, by May 9, 2017.

The Data Transparency Coalition, a technology-based nonprofit, released a statement Monday applauding the government’s overall work, yet took exception to the fact the DUNS number is the favored, governmentwide identifier for recipients of federal funds. DUNS numbers are nine-digit identifiers privately owned by the company Dun & Bradstreet Inc. that users must pay for to view corresponding business information.

“Standards” doesn’t mean what Greg thinks it means.

What has been posted by the government are twenty-seven (27) terms agreed on as of July 10th and another thirty (30) terms open for comment.

Terms, not standards.

I suppose that Legal Entity Congressional District is clear enough but that is a long way from being able to track the expenditure of funds in a transparent manner.

As far as the DUNS number complaint, a DUNS number is an accepted international business identifier. Widely accepted. Creating an alternative government identifier to snub Dun & Bradstreet, Inc., is a waste of government funds.

Bear in mind that the DUNS number for any organization is a public fact. Just as street addresses, stock ticker symbols, etc. are public facts. You can collect data about companies and include their DUNS number.

By issuing DUNS numbers, Dun and Bradstreet, Inc. is actually performing a public service by creating international identifiers for businesses. They charge for access to information collected on those entities but so will anyone with a sustainable information trade about businesses.

Refining the DATA Act terms across agencies and adding additional information to make them useful looks like a good use case for topic maps.

Warrant Required For Cell Phone Tracker!

Filed under: Cybersecurity,Security — Patrick Durusau @ 7:56 pm

Does the headline make you feel safer?

It shouldn’t.

Mohit Kumar highlights a new policy from the Justice Department in New Rules Require FBI to Get Warrant for Spying With ‘Stingrays’ Cell Phone Trackers that requires:

  • warrants for use of “Stingrays” or “IMSI catchers”
  • destruction of collected data when target found or once a day
  • disclose annually the number of times stingrays were used

Mohit notes the policy has some truck-sized holes in it but, the policy is praised by some as a “step in the right direction.”

In order to feel safer, you must assume that federal agents are going to follow the new policy. For the odds of that, take a clue from Clapper openly lying to Congress and remaining unpunished.

Some will, if you believe that Fox Mulder is an actual human being and not a character in a television series.

If being tracked by cellphone is a serious issue for you, search for “Faraday cage” or “Faraday bag.” For the background principles, see: Faraday cage.

If having a working cellphone is an absolute requirement, buy cheap phones in bulk and make them single-use burner phones. One call in or out and it’s recycled. Ensure that the accounts for the phones have no common characteristics, such as purchaser, means of payment, place of purchase, sequence of numbers, etc.

Inconvenient but real security is by definition always inconvenient.

Otherwise, welcome to being tracked by:

  • federal agents who don’t follow the new policy
  • federal agents who find the loopholes in the policy
  • state and local police who have no such policy
  • private contractors
  • DVs – digital vacuums who sweep up digital debris in high traffic areas to find something worth selling

Speaking of digital vacuums, when was the last time you called your mistress from an airport?

PS: I have never used Burner, which is an app that promises phone numbers you can discard and:

Once a number expires or is burned, the number is permanently removed from your account. Any unused voice minutes or text messages on an expired or burned number cannot be transferred to any new or existing Burners.

Please note: All call, SMS and voicemail history will also be removed from your account once a number expires.

One concern: if served with a national security letter, would Burner capture your data prior to a number being burned?

Perhaps best for avoiding crazy ex-lovers, marketers, etc., and not more serious opponents.

September 4, 2015

The Enemies of Books

Filed under: Books,Censorship — Patrick Durusau @ 8:11 pm

The Enemies of Books by William Blades.

Published in 1888, The Enemies of Books reflects the biases and prejudices of its time, much as our literature transparently carries forward our biases and prejudices.

A valuable reminder in these censorship-happy times that knowledge has long been deemed dangerous.

See in particular Chapter 5 Ignorance and Bigotry.

The suppression of “terrorist” literature, from tweets to websites, certainly falls under bigotry and possibly ignorance as well.

Extremist literature of all kinds is heavily repetitive and while it may be exciting to look at what has been forbidden, the thrill wears off fairly quickly. Al Goldstein, the publisher of Screw, once admitted in an interview that after about a year of Screw, if you were paying attention, you would notice the same story lines starting to circle back around.

If that’s a problem with sex, it isn’t hard to imagine that political issues discussed with no nuance, no depth of analysis, no sense of history, but simply “I’m right and X must die!” gets old pretty quickly.

If you believe U.S. reports on Osama bin Laden, even bin Laden wasn’t on a steady diet of hate literature but had Western materials as well as soft porn.

If the would-be-censors would stop wasting funds on trying to censor social media and the Internet, perhaps they could find the time for historical, nuanced and deep analysis of current issues to publish in an attractive manner.

Censors don’t think and they don’t want you to either.

Let’s disappoint them together!

Apache VXQuery: A Scalable XQuery Implementation

Filed under: XML,XQuery — Patrick Durusau @ 1:34 pm

Apache VXQuery: A Scalable XQuery Implementation by E. Preston Carman Jr., Till Westmann, Vinayak R. Borkar, Michael J. Carey, Vassilis J. Tsotras.

Abstract:

The wide use of XML for document management and data exchange has created the need to query large repositories of XML data. To efficiently query such large data collections and take advantage of parallelism, we have implemented Apache VXQuery, an open-source scalable XQuery processor. The system builds upon two other open-source frameworks — Hyracks, a parallel execution engine, and Algebricks, a language agnostic compiler toolbox. Apache VXQuery extends these two frameworks and provides an implementation of the XQuery specifics (data model, data-model dependent functions and optimizations, and a parser). We describe the architecture of Apache VXQuery, its integration with Hyracks and Algebricks, and the XQuery optimization rules applied to the query plan to improve path expression efficiency and to enable query parallelism. An experimental evaluation using a real 500GB dataset with various selection, aggregation and join XML queries shows that Apache VXQuery performs well both in terms of scale-up and speed-up. Our experiments show that it is about 3x faster than Saxon (an open-source and commercial XQuery processor) on a 4-core, single node implementation, and around 2.5x faster than Apache MRQL (a MapReduce-based parallel query processor) on an eight (4-core) node cluster.

Are you looking for more “pop” in your XQueries? Apache VXQuery may be the answer.

Suitable for the Edgar dataset and the OpenStreetMap dataset.

This may be what finally pushes me over the edge in choosing between a local cluster or pursuing one of the online options. It just looks too interesting not to play with.

Asleep at the Wheel

Filed under: Library — Patrick Durusau @ 7:09 am

Asleep at the Wheel by Bob Berring.

From the post:

In 1987, those roseate times before social media and Google searches, Dr. James Billington was appointed the United States’ Librarian of Congress. The appointment did not bode well. My voice was part of the outcry over the fact that at a crucial juncture for the role of libraries in the world, a person was taking the helm who was neither a librarian nor an information professional. The New York Times, which I had always viewed as the sage voice of national reason, opined that the job was too big for a librarian. It called for a scholar like Dr. Billington. So it goes.

Berring mentions The Enemies of Books (1880) as a history of the struggles of libraries for centuries.

Let’s hope that Billington’s replacement is a militant librarian who recognizes the need to preserve our existing cultural legacy while embracing what will be the future’s cultural legacy now.

I can’t repeat the one story I know of the dealings of the Library of Congress and an institution in another country but suffice it to say the Library of Congress was more concerned with its status than with finding a way to obtain access to fairly rare biblical materials. To be fair, so were the people I was working for.

I had mistakenly thought that furthering access to rare materials would be a goal of anyone who wanted to “foster biblical scholarship.”

Being assured by each other that they were in fact fostering biblical scholarship was more important than any actual deeds to foster biblical scholarship. As Nietzsche once said, they “told the correct time and made a modest noise while doing so.”

Better Beer Through GPUs:…

Filed under: GPU — Patrick Durusau @ 6:34 am

Better Beer Through GPUs: How GPUs and Deep Learning Help Brewers Improve Their Suds By Brian Caulfield.

From the post:

Jason Cohen isn’t the first man to look for the solution to his problems at the bottom of a beer glass. But the 24-year-old entrepreneur might be the first to have found it.

Cohen’s tale would make a great episode of HBO’s “Silicon Valley” if only his epiphany had taken place in sun-dappled Palo Alto, Calif., rather than blustery State College, Pa. That Cohen has involved GPUs in this sudsy story should surprise no one.

This is the tale of a man who didn’t master marketing to sell his product — quality control software for beer makers. He had to master it to make his product. The answer, of course, turned out to be free beer. And that’s put Cohen right in the middle of the fizzy business of craft brewing, a business that moves so fast he’s enlisted GPUs to help his software keep up.

If you like micro-brew beer, suggest they contact Jason Cohen at Analytical Flavor Systems.

They will have a more consistent brew and you will too. A win-win situation.

I first saw this in a tweet by Lars Marius Garshol. (of course)

September 3, 2015

Scheduling Tasks and Drawing Graphs…

Filed under: Graphs,Visualization — Patrick Durusau @ 8:47 pm

Scheduling Tasks and Drawing Graphs — The Coffman-Graham Algorithm by Borislav Iordanov.

From the post:

When an algorithm developed for one problem domain applies almost verbatim to another, completely unrelated domain, that is the type of insight, beauty and depth that makes computer science a science on its own, and not a branch of something else, namely mathematics, like many professionals educated in the field mistakenly believe. For example, one of the common algorithmic problems during the 60s was the scheduling of tasks on multiprocessor machines. The problem is, you are given a large set of tasks, some of which depend on others, that have to be scheduled for processing on N number of processors in such a way as to maximize processor use. A well-known algorithm for this problem is the Coffman-Graham algorithm. It assumes that there are no circular dependencies between the tasks, as is usually the case when it comes to real world tasks, except in catch 22 situations at some bureaucracies run amok! To do that, the tasks and their dependencies are modeled as a DAG (a directed acyclic graph). In mathematics, this is also known as a partial order: if a tasks T1 depends on T2, we say that T2 preceeds T1, and we write T2 < T1. The ordering is called partial because not all tasks are related in this precedence relation, some are simply independent of each other and can be safely carried out in parallel.
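
If you want to experiment before reading, here is a greedy level-by-level sketch of the scheduling half of the problem. The full Coffman-Graham algorithm adds a careful labeling step to choose among ready tasks; this sketch just takes them in name order:

def schedule(tasks, deps, n_processors):
    """Greedy unit-time scheduling of a DAG of tasks onto n processors.

    tasks: iterable of task names.
    deps:  dict mapping a task to the set of tasks it depends on.
    Returns a list of time slots, each a list of tasks run in that slot.
    """
    done, slots, remaining = set(), [], set(tasks)
    while remaining:
        ready = [t for t in remaining if deps.get(t, set()) <= done]
        if not ready:
            raise ValueError("dependency cycle detected")
        slot = sorted(ready)[:n_processors]  # Coffman-Graham uses labels here
        slots.append(slot)
        done |= set(slot)
        remaining -= set(slot)
    return slots

deps = {"T3": {"T1"}, "T4": {"T1", "T2"}, "T5": {"T3", "T4"}}
print(schedule(["T1", "T2", "T3", "T4", "T5"], deps, 2))
# [['T1', 'T2'], ['T3', 'T4'], ['T5']]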

The post is a 5 minute read and ends beautifully. I promise.

OPM Blows $133m on Post-Breach ID Monitoring

Filed under: Cybersecurity,Government — Patrick Durusau @ 8:36 pm

OPM Blows $133m on Post-Breach ID Monitoring by Phil Muncaster.

From the post:

The US government is set to spend $133m on identity theft protection services for over 21 million victims of the Office of Personnel Management (OPM) breach, despite having failed thus far to inform those affected.

In a statement on Tuesday, the OPM jointly announced with the Department of Defense the award of a $133,263,550 contract to Identity Theft Guard Solutions (ID Experts) for “credit monitoring, identity monitoring, identity theft insurance, and identity restoration services.”

Those affected will get the service free of charge for a period of three years following one of the largest and most damaging data breaches in the US government’s history.

“Millions of individuals, through no fault of their own, had their personal information stolen and we’re committed to standing by them, supporting them, and protecting them against further victimization,” said acting OPM director, Beth Cobert, in a statement.

“And as someone whose own information was stolen, I completely understand the concern and frustration people are feeling.”

Yet the OPM has so far failed to inform those 21.5 million former and current government employees and their families affected by the breach, nearly three months after it first discovered the intrusion.

Since Muncaster is from the UK, I’m not sure he understands that identifying the 21.5 million victims of the OPM hack will be yet another contract, and then attempts to notify the 21.5 million victims will be a separate contract, with sub-contracts to measure the effectiveness of the attempts to identify the victims, the quality of the notification efforts, and the environmental impact of the paper generated by the contracts, separately and collectively.

The OPM money-hole has only just opened.

Within a couple of years it will be a multi-$billion sized hole and growing.

The XKCD Survey

Filed under: Humor — Patrick Durusau @ 8:22 pm

Everybody likes XKCD so here’s your chance to give something back:

XKCD Survey

Enjoy!

Poor Fantasy Adulterers [Ashley Madison]

Filed under: Cybersecurity,Privacy,Security — Patrick Durusau @ 1:59 pm

Farhad Manjoo writes in Hacking Victims Deserve Empathy, Not Ridicule:


But the theft and disclosure of more than 30 million accounts from Ashley Madison, a site that advertises itself as a place for married people to discreetly set up extramarital affairs, is different. After the hacking, many victims have been plunged into the depths of despair. In addition to those contemplating suicide, dozens have told Mr. Hunt that they feared losing their jobs and families, and they expected to be humiliated among friends and co-workers.

But the victims of the Ashley Madison hacking deserve our sympathy and aid because, with slightly different luck, you or I could just as easily find ourselves in a similarly sorry situation. This breach stands as a monument to the blind trust many of us have placed in our computers — and how powerless we all are to evade the disasters that may befall us when the trust turns out to be misplaced.

Being seen at a high-end restaurant when you are “working late” by your spouse, or your spouse finding condoms (which you don’t use at home) in your jacket, or your boss seeing you exiting a co-worker’s hotel room in a state of undress, differs from a cyberhack outing in what way?

All of those cases would induce fear of losing family, job, and humiliation among friends and co-workers. Yes?

We know now that almost no women used the Ashley Madison site, so truth in advertising leads to: “Life’s short. Have a fantasy affair.”

The Ashley Madison data should be made publicly available to everyone.

None of the people verified as having given Ashley Madison credit card data and a profile should ever be given access to any IT system. Ever. (full stop)

Anyone giving information that could be used for blackmail purposes to an online adultery site is a security risk. Best to weed them out of your IT system post-haste.

Victims of a VISA, Mastercard or OPM hack are different. They supplied information for legitimate purposes and the act of submission carries no potential for blackmail.

Ashley Madison customers supplied personal data, knowing their membership could be used for blackmail purposes.

Perhaps that is too subtle a distinction for the New York Times or the Ashley Madison data has an abundance of yet undisclosed email addresses.

Generating Sequences of Primes in Conway’s Game of Life

Filed under: Cellular Automata,Game of Life — Patrick Durusau @ 11:17 am

Generating Sequences of Primes in Conway’s Game of Life by Nathaniel Johnston.

From the post:

One of the most interesting patterns that has ever been constructed in Conway’s Game of Life is primer, a gun that fires lightweight spaceships that represent exactly the prime numbers. It was constructed by Dean Hickerson way back in 1991, yet arguably no pattern since then has been constructed that’s as interesting. It seems somewhat counter-intuitive at first that the prime numbers, which seem somehow “random” or “unpredictable”, can be generated by this (relatively simple) pattern in the completely deterministic Game of Life.

You may not have a favorite Bible verse but surely you have an opinion on whether prime numbers are ‘somehow “random” or “unpredictable,” yes?

Take a break from the drivel that makes up most news feeds and get some real mental exercise.

The link to Conway’s Game of Life in the original post is broken. I have repaired it in the quote.

There is much to explore at: ConwayLife.com.
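
If you would rather poke at Life in code before chasing down primer, a minimal step function over a set of live cells (the prime-generating pattern itself is far too large to inline here):

from collections import Counter

def life_step(live):
    """One generation of Conway's Life; `live` is a set of (x, y) cells."""
    counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):
    glider = life_step(glider)
print(sorted(glider))  # the glider, shifted one cell diagonally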

Enjoy!

September 2, 2015

Hand-Coloured Bomb Damage Maps of London

Filed under: Government,Mapping,Maps — Patrick Durusau @ 7:23 pm

Hand-Coloured Bomb Damage Maps of London

From the webpage:

The devastation wrought on the capital by the blitz was documented by the architect’s department of London County Council. The impact of the destruction from air raids and V-bombs can still be seen in London today

Bomb Damage Maps 1939-1945 by archivist Laurence Ward was published this week by Thames & Hudson to mark the 75th anniversary of the blitz

Photos of maps for:

  • Bethnal Green, Tower Hamlets and Stepney
  • Waterloo and Elephant & Castle
  • Marylebone, Mayfair and Piccadilly
  • London Bridge, Bermondsey and Wapping
  • King’s Cross, Angel and Barbican
  • Regent’s Park, Euston and Somer’s Town
  • Hampstead Heath, Dartmouth Park and Tufnell Park
  • Deptford and Rotherhithe

The photos are impressive but not of large enough scale to make out details. For that, you will need a copy of Bomb Damage Maps 1939-1945. The current price is £48.00 (without shipping).

As you review this important historical resource, realize that nothing similar will be produced for the U.S.-led wars in Afghanistan, Iraq, Syria, etc.

Rather than confirming and reporting on “allied” bombing strikes, Western news media bases its reports on accounts supplied by the U.S. military and its familiars.

It is certainly possible to have interactive maps that show images of civilian casualties and damages within a matter of days at the outside, but current U.S. military adventures will be some of the least documented on record.

Or should I say least independently documented on record?

Is anyone collating cellphone videos of the results of U.S. airstrikes?

XQuery Exercise

Filed under: XQuery — Patrick Durusau @ 6:54 pm

I was reading a collection of resources on XQuery by Joe Wicentowski (more on that below) when I stumbled on one in particular:

XQuery Questions on Stack Overflow.

Although the examples in slides, books and blog posts are going to be helpful (most of them anyway), they are after all, examples. There are no false steps, solutions work, etc.

Reading XQuery questions on Stack Overflow will be like real life. You will encounter unexpected questions to which you don’t know the answer. (OK, for some people that’s rare but it does happen.)

Even if you can’t answer the questions, now, spend a little time trying to puzzle out how it could be answered. Then check your suspicions or answer against those that are posted.

Using the XQuery questions on Stack Overflow that way will do several important things:

  1. You will become familiar with a variety of XQuery resources.
  2. You will see how experienced XQuery users answer XQuery questions.
  3. Over time you will become a contributing member of the XQuery community (if you aren’t already).

Joe has lists of other XQuery resources:

Resources built on or with XQuery

Resources for learning XQuery

GitHub repositories with XQuery
