Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 23, 2015

NPR and The “American Bias”

Filed under: Government,Politics,Security — Patrick Durusau @ 2:47 pm

Can you spot the “American bias” both in this story and the reporting by NPR?

U.S. Operations Killed Two Hostages Held By Al-Qaida, Including An American by Krishnadev Calamur:

President Obama offered his “grief and condolences” to the families of the American and Italian aid workers killed in a U.S. counterterrorism operation in January. Both men were held hostage by al-Qaida.

“I take full responsibility for a U.S. government counterterrorism operation that killed two innocent hostages held by al-Qaida,” Obama said.

The president said both Warren Weinstein, an American held by the group since 2011, and Giovanni Lo Porto, an Italian national held since 2012, were “devoted to improving the lives of the Pakistani people.”

Earlier Thursday, the White House in a statement announced the two deaths, along with the killings of two American al-Qaida members.

“Analysis of all available information has led the Intelligence Community to judge with high confidence that the operation accidentally killed both hostages,” the White House statement said. “The operation targeted an al-Qa’ida-associated compound, where we had no reason to believe either hostage was present, located in the border region of Afghanistan and Pakistan. No words can fully express our regret over this terrible tragedy.”

Exact numbers of casualties from American drone strikes are hard to come by, but current estimates suggest that more people have died from drone attacks than died in the 9/11 attacks. A large number of those killed were not the intended targets but civilians, including hundreds of children. A Bureau of Investigative Journalism report has spreadsheets you can download to find the specifics of drone strikes in particular countries.

Let’s pause to hear the Obama Administration’s “grief and condolences” over the deaths of civilians and children in each of those strikes:

 
 
 
 
 

That’s right, the Obama Administration has trouble admitting any civilians or children have died as a result of its drone war. Perhaps trying to avoid criminal responsibility for their actions. But it certainly has not expressed any “grief and condolences” over those deaths.

Jeff Bachman, of American University, estimates that between twenty-eight (28) and thirty-five (35) civilians die for every one (1) person killed on the Obama “kill” list in Pakistan alone. Drone Strikes: Are They Obama’s Enhanced Interrogation Techniques?

You will notice that NPR’s reporting does not contrast Obama’s “grief and condolences” for the deaths of two hostages (one of whom was American) with his lack of any remorse over the deaths of civilians and children in other drone attacks.

Obama’s lack of remorse over the deaths of innocents in other drone attacks reportedly isn’t unusual for war criminals. War criminals see their crimes as justified by the pursuit of a goal worth more than innocent human lives. Or in this case, worth more than non-American innocent lives.

April 22, 2015

A Scary Earthquake Map – Oklahoma

Filed under: Environment,Government,Government Data — Patrick Durusau @ 8:15 pm

Earthquakes in Oklahoma – Earthquake Map

[Screenshot: interactive map of Oklahoma earthquakes]

Great example of how visualization can make the case that “standard” industry practices are in fact damaging the public.

The map is interactive and the screen shot above is only one example.

The main site is located at: http://earthquakes.ok.gov/.

From the homepage:

Oklahoma experienced 585 magnitude 3+ earthquakes in 2014 compared to 109 events recorded in 2013. This rise in seismic events has the attention of scientists, citizens, policymakers, media and industry. See what information and research state officials and regulators are relying on as the situation progresses.

That is more than a fivefold increase in a single year. The next stage of data mapping should be identifying the owners of, or those who profited from, the wastewater disposal wells and their relationships to existing oil and gas interests, as well as their connections to members of the Oklahoma legislature.

What is it that Republicans call for? Ah, accountability, as in holding teachers and public agencies “accountable.” Looks to me like it is time to hold some oil and gas interests and their owners “accountable.”

PS: Said to not be a “direct” result of fracking but of the disposal of water used for fracking. Close enough for my money. You?

Gathering, Extracting, Analyzing Chemistry Datasets

Filed under: Cheminformatics,Chemistry,Curation,Data Aggregation,Data Collection — Patrick Durusau @ 7:38 pm

Activities at the Royal Society of Chemistry to gather, extract and analyze big datasets in chemistry by Antony Williams.

If you are looking for a quick summary of efforts to combine existing knowledge resources in chemistry, you can do far worse than Antony’s 118 slides on the subject (2015).

I want to call special attention to Slide 107 in his slide deck:

[Slide 107 from Antony Williams’s slide deck]

True enough, extraction is problematic, expensive, inaccurate, etc., all the things Antony describes. And I would strongly second all of what he implies is the better practice.

However, extraction isn’t just a necessity for today or for a few years; extraction is going to be necessary so long as we keep records about chemistry or any other subject.

Think about all the legacy materials on chemistry that exist in hard copy format just for the past two centuries, to say nothing of all the still older materials. It is more than unfortunate to abandon all that information simply because “modern” digital formats are easier to manipulate.

That wasn’t what Antony meant to imply, but even after all materials have been extracted and exist in some form of digital format, that doesn’t mean the era of “extraction” will have ended.

You may not remember when atomic chemistry used “punch cards” to record isotopes:

[Image: isotope file on punched cards]

An isotope file on punched cards. George M. Murphy, J. Chem. Educ., 1947, 24(11), p. 556. DOI: 10.1021/ed024p556. Published November 1947.

Today we would represent that record in…NoSQL?

Are you confident that in another sixty-eight (68) years we will still be using NoSQL?

We have to choose from the choices available to us today, but we should not deceive ourselves into thinking our solution will be seen as the “best” solution in the future. New data will be discovered, new processes invented, new requirements will emerge, all of which will be clamoring for a “new” solution.

Extraction will persist as long as we keep recording information in the face of changing formats and requirements. We can improve that process but I don’t think we will ever completely avoid it.

QUANTUM-type packet injection attacks [From NSA to Homework]

Filed under: Cybersecurity,NSA,Security — Patrick Durusau @ 4:12 pm

QUANTUM-type packet injection attacks

From the homework assignment:

CSE508: Network Security (PhD Section), Spring 2015

Homework 4: Man-on-the-Side Attacks

Part 1:

The MotS injector you are going to develop, named ‘quantuminject’, will capture the traffic from a network interface in promiscuous mode, and attempt to inject spoofed responses to selected client requests towards TCP services, in a way similar to the Airpwn tool.

Part 2:

The MotS attack detector you are going to develop, named ‘quantumdetect’, will capture the traffic from a network interface in promiscuous mode, and detect MotS attack attempts. Detection will be based on identifying duplicate packets towards the same destination that contain different TCP payloads, i.e., the observation of the attacker’s spoofed response followed by the server’s actual response. You should make every effort to avoid false positives, e.g., due to TCP retransmissions.

See the homework details for further requirements and resources.
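
If you want a concrete starting point for Part 2, here is a minimal sketch (mine, not the assignment’s reference solution) of the duplicate-payload heuristic, using scapy. It handles retransmissions only in the crudest way, which is exactly where the real work of avoiding false positives lies:

    # A minimal sketch of the 'quantumdetect' heuristic: flag packets that
    # reuse a TCP stream position with a different payload. Requires scapy
    # and capture privileges on the interface.
    from scapy.all import IP, TCP, sniff

    seen = {}  # (src, dst, sport, dport, seq) -> first payload observed

    def inspect(pkt):
        if IP not in pkt or TCP not in pkt:
            return
        payload = bytes(pkt[TCP].payload)
        if not payload:
            return
        key = (pkt[IP].src, pkt[IP].dst, pkt[TCP].sport,
               pkt[TCP].dport, pkt[TCP].seq)
        if key not in seen:
            seen[key] = payload  # a real tool must evict old entries
        elif seen[key] != payload:
            # A true TCP retransmission repeats the same payload for the
            # same sequence number; a mismatch suggests a spoofed response
            # racing the legitimate server.
            print("possible MotS injection:", key)

    sniff(prn=inspect, store=False)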

If you need a starting point for “Man-on-the-Side Attacks,” I saw Bruce Schneier recommend: Our Government Has Weaponized the Internet. Here’s How They Did It by Nicholas Weaver.

You may also want to read: Attacking Tor: how the NSA targets users’ online anonymity by Bruce Schneier, but with caveats.

For example, Bruce says:

To trick targets into visiting a FoxAcid server, the NSA relies on its secret partnerships with US telecoms companies. As part of the Turmoil system, the NSA places secret servers, codenamed Quantum, at key places on the internet backbone. This placement ensures that they can react faster than other websites can. By exploiting that speed difference, these servers can impersonate a visited website to the target before the legitimate website can respond, thereby tricking the target’s browser to visit a Foxacid server.

In the academic literature, these are called “man-in-the-middle” attacks, and have been known to the commercial and academic security communities. More specifically, they are examples of “man-on-the-side” attacks.

They are hard for any organization other than the NSA to reliably execute, because they require the attacker to have a privileged position on the internet backbone, and exploit a “race condition” between the NSA server and the legitimate website. This top-secret NSA diagram, made public last month, shows a Quantum server impersonating Google in this type of attack.

Have you heard the story of the mountain hiker who explained he was wearing sneakers instead of boots in case he and his companion were chased by a bear? The companion pointed out that no one can outrun a bear, to which the mountain hiker replied, “I don’t have to outrun the bear, I just have to outrun you.”

A man-in-the-middle attack can be made from a privileged place on the Internet backbone, but that’s not a requirement. The only requirement is that my “FoxAcid” server has to respond more quickly than the website a user is attempting to contact. That hardly requires a presence on the Internet backbone. I just need to outrun the packets from the responding site.

Assume I want to initiate a man-on-the-side attack against a user or organization at a local university. All I need do is obtain access to the university’s connection to the Internet, on the university side of that connection, and by definition I am going to be faster than any site remote to the university.

So I would disagree with Bruce’s statement:

They are hard for any organization other than the NSA to reliably execute, because they require the attacker to have a privileged position on the internet backbone, and exploit a “race condition” between the NSA server and the legitimate website.

Anyone can perform man-on-the-side attacks; the only requirement is being faster than the responding computer.

The NSA wanted to screw everyone on the Internet, hence the need to be on the backbone. If you are less ambitious, you can make do with far cheaper and more common resources.

TikZ & PGF

Filed under: Graphics,TeX/LaTeX — Patrick Durusau @ 3:08 pm

TikZ & PGF by Till Tantau.

[Image: pgf 3.0]

From the introduction:

Welcome to the documentation of TikZ and the underlying pgf system. What began as a small LaTeX style for creating the graphics in my (Till Tantau’s) PhD thesis directly with pdfLaTeX has now grown to become a full-flung graphics language with a manual of over a thousand pages. The wealth of options offered by TikZ is often daunting to beginners; but fortunately this documentation comes with a number of slowly-paced tutorials that will teach you almost all you should know about TikZ without your having to read the rest….

The examples will make you want to install the package just to see if you can duplicate them. Some of the graphics I am unlikely to ever use. On the other hand, going over this manual in detail will enable you to recognize what is possible, graphically speaking.

This is truly going to be a lot of fun!

Enjoy!

April 21, 2015

Liability as an Incentive for Secure Software?

Filed under: Government,Law,Security — Patrick Durusau @ 7:54 pm

Calls Arise to Make Developers Liable for Insecure Software by Sean Doherty.

The usual suspects show up in Sean’s post:


Dan Geer, chief information security officer at the CIA’s venture capital arm, In-Q-Tel, is often in the news arguing for legal measures to make companies accountable for developing vulnerable code. In his keynote address at the Black Hat USA conference in Las Vegas in August 2014, Geer said he would place the onus of security onto software developers.

In a recent Financial Times story, Dave Merkel, chief technology officer at IT security vendor FireEye, said, “Attackers are specifically looking for the things that code was not designed to do. As a software creator, you can test definitively for all the things that your software should do. But testing it for all things it shouldn’t do is an infinite, impossible challenge.”

But Sean adds an alternative to liability versus no-liability:


In today’s software development environment, there is no effective legal framework for liability. But perhaps lawyers are looking for the wrong framework.

The FT story also quoted Wolfgang Kandek, CTO at IT security vendor Qualys: “Building software isn’t like building a house or a bridge or a ship, where accepted engineering principles apply across whole industries.”

Like Geer, there are people in the software industry saying code development should become like the building industry, with standards. An organization of computing professionals, the IEEE Computer Society, founded a working group to address the lack of software design standards: the Center for Secure Design (CSD).

Liability is coming; it’s up to the software community to decide how to take that “hit.”

Relying on the courts to work out what “negligence” means for software development will take decades and lead to a minefield of mixed results. States will vary from each other and the feds will no doubt have different standards by circuit, at least for a while.

Standards for software development? Self-imposed standards that set a high but attainable bar and demonstrate improved results to users are definitely preferable to erratic and costly litigation.

Your call.

Imagery Processing Pipeline Launches!

Filed under: Geographic Data,Geography,Geophysical,Image Processing,Maps — Patrick Durusau @ 7:37 pm

Imagery Processing Pipeline Launches!

From the post:

Our imagery processing pipeline is live! You can search the Landsat 8 imagery catalog, filter by date and cloud coverage, then select any image. The image is instantly processed, assembling bands and correcting colors, and loaded into our API. Within minutes you will have an email with a link to the API end point that can be loaded into any web or mobile application.

Our goal is to make it fast for anyone to find imagery for a news story after a disaster, easy for any planner to get the most recent view of their city, and any developer to pull in thousands of square KM of processed imagery for their precision agriculture app. All directly using our API.

There are two ways to get started: via the imagery browser fetch.astrodigital.com, or directly via the Search and Publish APIs. All API documentation is on astrodigital.com/api. You can either use the API to programmatically pull imagery through the pipeline or build your own UI on top of the API, just like we did.

The API provides direct access to more than 300TB of satellite imagery from Landsat 8. Early next year we’ll make our own imagery available once our own Landmapper constellation is fully commissioned.

Hit us up @astrodigitalgeo or sign up at astrodigital.com to follow as we build. Huge thanks to our partners at Development Seed who are leading our development and for the infinitely scalable API from Mapbox.

If you are interested in Earth images, you really need to check this out!

I haven’t tried the API but did get a link to an image of my city and surrounding area.
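
If I do try it, I would expect the search call to look something like the following. Fair warning: the endpoint and parameter names below are my guesses from the post, not documented values, so check astrodigital.com/api before relying on them:

    # Hypothetical sketch of a Landsat 8 scene search against the Astro
    # Digital API; the endpoint and parameter names are guesses, not
    # documented values.
    import requests

    resp = requests.get(
        "https://api.astrodigital.com/v1/search",  # assumed endpoint
        params={
            "date_from": "2015-01-01",
            "date_to": "2015-04-20",
            "cloud_max": 20,  # maximum percent cloud coverage
            "limit": 10,
        },
    )
    resp.raise_for_status()
    for scene in resp.json().get("results", []):
        print(scene.get("sceneID"), scene.get("cloudCover"))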

Definitely worth a long look!

Why nobody knows what’s really going into your food

Filed under: Government,Transparency — Patrick Durusau @ 4:14 pm

Why nobody knows what’s really going into your food by Phillip Allen, et al.

From the webpage:

Why doesn’t the government know what’s in your food? Because industry can declare on their own that added ingredients are safe. It’s all thanks to a loophole in a 57-year-old law that allows food manufacturers to circumvent the approval process by regulators. This means companies can add substances to their food without ever consulting the Food and Drug Administration about potential health risks.

The animation is quite good and worth your time to watch.

If you think the animation is disheartening, you could spend some time at the Generally Recognized as Safe (GRAS) page over at the FDA.

From the webpage:

“GRAS” is an acronym for the phrase Generally Recognized As Safe. Under sections 201(s) and 409 of the Federal Food, Drug, and Cosmetic Act (the Act), any substance that is intentionally added to food is a food additive, that is subject to premarket review and approval by FDA, unless the substance is generally recognized, among qualified experts, as having been adequately shown to be safe under the conditions of its intended use, or unless the use of the substance is otherwise excluded from the definition of a food additive.

Links to legislation, regulations, applications, and other sources of information.

Leaving the question of regulation to one side, every product should be required to list all of its ingredients on the package and to post a full chemical analysis online.

Disclosure would not reach everyone but at least careful consumers would have a sporting chance to discover what they are eating.

IPew Attack Map

Filed under: Cybersecurity,Security — Patrick Durusau @ 3:40 pm

IPew Attack Map

From the webpage:

[Screenshot: IPew attack map]

(a collaborative effort by @alexcpsec & @hrbrmstr)

Why should security vendors be the only ones allowed to use silly, animated visualizations to “compensate”? Now, you can have your very own IP attack map that’s just as useful as everyone else’s.

IPew is a feature-rich, customizable D3 / javascript visualization, needing nothing more than a web server capable of serving static content and a sense of humor to operate. It’s got all the standard features that are expected including:

  • Scary dark background!
  • Source & destination country actor/victim attribution!
  • Inane attack names!

BUT, it has one critical element that is missing from the others: SOUND EFFECTS! What good is a global cyberbattle without some cool sounds.

In all seriousness, IPew provides a simple framework – based on Datamaps – for displaying cartographic attack data in a (mostly) responsive way and shows how to use dynamic data via javascript event timers and data queues (in case you’re here to learn vs have fun – or both!).

One important feature: if you work inside the Beltway in DC, you can set all attacks as originating from North Korea or China.

Instructive and fun!

Enjoy!

The Vocabulary of Cyber War

Filed under: Cybersecurity,Government,Security,Vocabularies — Patrick Durusau @ 3:15 pm

The Vocabulary of Cyber War

From the post:

At the 39th Joint Doctrine Planning Conference, a semiannual meeting on topics related to military doctrine and planning held in May 2007, a contractor for Booz Allen Hamilton named Paul Schuh gave a short presentation discussing doctrinal issues related to “cyberspace” and the military’s increasing effort to define its operations involving computer networks. Schuh, who would later become chief of the Doctrine Branch at U.S. Cyber Command, argued that military terminology related to cyberspace operations was inadequate and failed to address the expansive nature of cyberspace. According to Schuh, the existing definition of cyberspace as “the notional environment in which digitized information is communicated over computer networks” was imprecise. Instead, he proposed that cyberspace be defined as “a domain characterized by the use of electronics and the electromagnetic spectrum to store, modify, and exchange data via networked systems and associated physical infrastructures.”

Amid the disagreements about “notional environments” and “operational domains,” Schuh informed the conference that “experience gleaned from recent cyberspace operations” had revealed “the necessity for development of a lexicon to accommodate cyberspace operations, cyber warfare and various related terms” such as “weapons consequence” or “target vulnerability.” The lexicon needed to explain how the “four D’s” (deny, degrade, disrupt, destroy) and other core terms in military terminology could be applied to cyber weapons. The document that would later be produced to fill this void is The Cyber Warfare Lexicon, a relatively short compendium designed to “consolidate the core terminology of cyberspace operations.” Produced by the U.S. Strategic Command’s Joint Functional Command Component – Network Warfare, a predecessor to the current U.S. Cyber Command, the lexicon documents early attempts by the U.S. military to define its own cyber operations and place them within the larger context of traditional warfighting. A version of the lexicon from January 2009 obtained by Public Intelligence includes a complete listing of terms related to the process of creating, classifying and analyzing the effects of cyber weapons. An attachment to the lexicon includes a series of discussions on the evolution of military commanders’ conceptual understanding of cyber warfare and its accompanying terminology, attempting to align the actions of software with the outcomes of traditional weaponry.

A bit dated (2009), particularly in its understanding of cyber war, but possibly useful for reading leaked documents from that time period and as a starting point for studying the evolution of terminology in the area.

To the extent this crosses over with cybersecurity, you may find A Glossary of Common Cybersecurity Terminology (NICCS) or the Glossary of Information Security Terms useful. There is overlap between the two.

There are several information sharing efforts under development or in place, which will no doubt lead to the creation of more terminology.

Syrian Travel Guide, Courtesy of the FBI

Filed under: Government,Politics,Security — Patrick Durusau @ 2:28 pm

More Arrests of Americans Attempting to Fight for ISIL in Syria by Bobby Chesney.

From the post:

Six Somali-American men from the Minneapolis area have been arrested on material support charges, based on allegations that they were attempting to travel to Syria to join ISIL. The complaint and corresponding FBI affidavit are posted here. Note that the complaint is a handy case study in the variety of investigative techniques that FBI might employ in a case of this kind, with examples including open-source review of a suspect’s Twitter and Facebook accounts, use of a CHS (“Confidential Human Source”) who previously had been part of this same material support conspiracy, review of call records to establish connections among the defendants, review of bank records, use of video footage recorded in public places, and review of instant messages exchanged via Kik (a footnote on p. 9 of the affidavit notes that Kik “does not maintain records of user conversations”).

Take special note of:

Note that the complaint is a handy case study in the variety of investigative techniques that FBI might employ in a case of this kind, with examples including open-source review of a suspect’s Twitter and Facebook accounts, use of a CHS (“Confidential Human Source”) who previously had been part of this same material support conspiracy, review of call records to establish connections among the defendants, review of bank records, use of video footage recorded in public places, and review of instant messages exchanged via Kik (a footnote on p. 9 of the affidavit notes that Kik “does not maintain records of user conversations”).

If you seriously want to travel to Syria, for reasons that seem sufficient to you, print out the FBI complaint in this case and avoid each and every one of the activities and statements (or statements of that kind) detailed in the complaint.

If you engage in any of those activities or make statements of that sort, your legitimate travel plans to Syria may be disrupted.

Any aid these six defendants could have provided to ISIL would have been more accidental than on purpose. If being nearly overwhelmed by the difficulty of traveling overseas isn’t enough of a clue to the defendants’ competence, their travel arrangements would have been made more bizarre only by wearing a full Ronald McDonald costume to the airport. One day in a foreign country before returning?

I understand. Idealistic young people have always wanted to join causes larger than themselves. Just taking recent history into account, there were the Freedom Riders in the 1960s, along with the Anti-War Movement of the same era. And they want to join those causes despite the orthodoxy being preached and enforced by secular governments.

Personally, I don’t see anything wrong with opposition to corrupt, U.S.-supported Arab governments. To the extent ISIL does exactly that, its designation as a “terrorist” organization is ill-founded. Terrorist designations are more political than moral.

Here’s a suggestion:

IS/ISIL seems to be short on governance expertise, however well it has been doing in terms of acquiring territory. Territory is OK, but effective governance gives a greater reason to be invited to the bargaining table.

Under 18 U.S.C. 2339B (j), there is an exception:

No person may be prosecuted under this section in connection with the term “personnel”, “training”, or “expert advice or assistance” if the provision of that material support or resources to a foreign terrorist organization was approved by the Secretary of State with the concurrence of the Attorney General. The Secretary of State may not approve the provision of any material support that may be used to carry out terrorist activity (as defined in section 212(a)(3)(B)(iii) of the Immigration and Nationality Act).

I’m not saying it is likely, but one could ask the State Department for permission to supply governance expertise, medical expertise, civil engineers, etc., all things IS/ISIL needs just as much as fighters.

Yes, I know, doing the administrative work of government isn’t as romantic as riding into battle on a “technical” but it is just as necessary.

PS: If anyone is seriously interested, I can collate the FBI complaint with similar complaints and create a “So You Want to Travel to Syria?” document that lists all the statements and activities to avoid.

Aside to the FBI: Syria is going to need civil engineers, etc., no matter who “wins.” Putting people on productive paths is far more useful than feeding and feeding off of desires to make an immediate difference.

Security Mom (Violence In Your Own Backyard)

Filed under: Security — Patrick Durusau @ 1:03 pm

Security Mom by Juliette Kayyem.

Juliette describes this new podcast series:

My goal with every guest on this podcast, whether it’s a sneak peek into the war room, a debate between friends, or a revealing conversation from the front lines of homeland security, is to bring it home for you. We’re going to unpack how this strange and secretive world works, and give you a new perspective on the challenges, successes, and failures we all confront to keep our nation and our families safe.

What do you want to hear from me? What security issues are on your mind? Email me at securitymom@wgbh.org, or find me on Twitter: @JulietteKayyem.

The first episode: Inside Command And Control During The Boston Marathon Bombings by WGBH News & Juliette Kayyem.

Former Boston Police Commissioner Ed Davis was in command and control during the week of the Boston Marathon bombings in April 2013. On the eve of the second anniversary of the bombing, he details incredible behind-the-scenes decisions during the 100 hours spent in pursuit of Tamerlan and Dzhokhar Tsarnaev.

Not deeply technical but promises to be an interesting window on how security advocates view the world.

Juliette’s reaction to violence in her “backyard” wasn’t unexpected but was still interesting.

Transpose her reaction to individuals and families who have experienced U.S. drone strikes in “their” backyards.

Do you think their reactions are any different?

“Explanations” of violence, including drone strikes, only “work” for the perpetrators of such violence. Something to keep in mind as every act of violence makes security more and more elusive.

I first saw this in a blog post by Jack Goldsmith.

April 20, 2015

Sony at Wikileaks! (MPAA Privacy versus Your Privacy)

Filed under: Cybersecurity,Privacy,Security,Wikileaks — Patrick Durusau @ 6:23 pm

Sony at Wikileaks!

From the press release:

Today, 16 April 2015, WikiLeaks publishes an analysis and search system for The Sony Archives: 30,287 documents from Sony Pictures Entertainment (SPE) and 173,132 emails, to and from more than 2,200 SPE email addresses. SPE is a US subsidiary of the Japanese multinational technology and media corporation Sony, handling their film and TV production and distribution operations. It is a multi-billion dollar US business running many popular networks, TV shows and film franchises such as Spider-Man, Men in Black and Resident Evil.

In November 2014 the White House alleged that North Korea’s intelligence services had obtained and distributed a version of the archive in revenge for SPE’s pending release of The Interview, a film depicting a future overthrow of the North Korean government and the assassination of its leader, Kim Jong-un. Whilst some stories came out at the time, the original archives, which were not searchable, were removed before the public and journalists were able to do more than scratch the surface.

Now published in a fully searchable format The Sony Archives offer a rare insight into the inner workings of a large, secretive multinational corporation. The work publicly known from Sony is to produce entertainment; however, The Sony Archives show that behind the scenes this is an influential corporation, with ties to the White House (there are almost 100 US government email addresses in the archive), with an ability to impact laws and policies, and with connections to the US military-industrial complex.

WikiLeaks editor-in-chief Julian Assange said: “This archive shows the inner workings of an influential multinational corporation. It is newsworthy and at the centre of a geo-political conflict. It belongs in the public domain. WikiLeaks will ensure it stays there.”

Lee Munson writes in WikiLeaks publishes massive searchable archive of hacked Sony documents,


According to the Guardian, former senator Chris Dodd, chairman of the MPAA, wrote how the republication of this information signifies a further attack on the privacy of those involved:

This information was stolen from Sony Pictures as part of an illegal and unprecedented cyberattack. Wikileaks is not performing a public service by making this information easily searchable. Instead, with this despicable act, Wikileaks is further violating the privacy of every person involved.

Hacked Sony documents soon began appearing online and were available for download from a number of different sites but interested parties had to wade through vast volumes of data to find what they were looking for.

WikiLeaks’ new searchable archive will, sadly, make it far easier to discover the information they require.

I don’t see anything sad about the posting of the Sony documents in searchable form by Wikileaks.

If anything, I regret there aren’t more leaks, breaches, etc., of both corporate and governmental document archives. Leaks and breaches that should be posted “as is” with no deletions by Wikileaks, the Guardian or anyone else.

Chris Dodd’s privacy concerns aren’t your privacy concerns. Not even close.

Your privacy concerns (some of them):

  • personal finances
  • medical records
  • phone calls (sorry, already SOL on that one)
  • personal history and relationships
  • more normal sort of stuff

The MPAA, Sony and such, have much different privacy concerns:

  • concealment of meetings with and donations to members of government
  • concealment of hiring practices and work conditions
  • concealment of agreements with other businesses
  • concealment of offenses against the public
  • concealment of the exercise of privilege

Not really the same are they?

Your privacy centers on you, the MPAA/Sony privacy centers on what they have done to others.

New terms? You have a privacy interest; MPAA/Sony has an interest in concealing information.

That sets a better tone for the discussion.

Same Sex Marriage Resources (Another Brown?)

Filed under: Government,Law,Politics — Patrick Durusau @ 4:38 pm

You may be aware that the right of same-sex couples to marry is coming up for oral argument before the Supreme Court of the United States on 28 April 2015.

The case, Obergefell v. Hodges, has been consolidated by the Court with Tanco v. Haslam (Tennessee), DeBoer v. Snyder (Michigan), Bourke v. Beshear (Kentucky), and the Court has posed two questions:

  1. Does the Fourteenth Amendment require a state to license a marriage between two people of the same sex?
  2. Does the Fourteenth Amendment require a state to recognize a marriage between two people of the same sex when their marriage was lawfully licensed and performed out-of-state?

What you may not know is that SCOTUSblog has extensive commentary and primary documents collected at: Obergefell v. Hodges. In addition to blog commentary covering all the positions of the parties and others who have filed briefs in this proceeding, there are links to the briefs by the parties and one hundred and fifty-one (151) briefs filed by others.

There will be a lot of loose talk about a decision favoring gay marriage being another Brown v. Board of Education. A favorable decision would legally end another form of narrow-mindedness, as it should. However, I don’t think the two cases are comparable in terms of magnitude.

Perhaps because I was born the year Brown was decided and due to the practice of “…all deliberate speed…” in the South, I attended segregated schools until I was in the ninth grade. I won’t bore you with distorted recollections from so long ago, but suffice it to say that interest on the debt of Jim Crow and de jure segregation is still being paid by children of all races in the South.

Same-sex couples have been discriminated against and that should end, but they are adults, not children. Brown recognized a sin against children and started the nation on a long road to recognizing that as well.

Twitter cuts off ‘firehose’ access…

Filed under: Data,Twitter — Patrick Durusau @ 3:11 pm

Twitter cuts off ‘firehose’ access, eyes Big Data bonanza by Mike Wheatley.

From the post:

Twitter upset the applecart on Friday when it announced it would no longer license its stream of half a billion daily tweets to third-party resellers.

The social media site said it had decided to terminate all current agreements with third parties to resell its ‘firehose’ data – an unfiltered, full stream of tweets and all of the metadata that comes with them. For companies that still wish to access the firehose, they’ll still be able to do so, but only by licensing the data directly from Twitter itself.

Twitter’s new plan is to use its own Big Data analytics team, which came about as a result of its acquisition of Gnip in 2014, to build direct relationships with data companies and brands that rely on Twitter data to measure market trends, consumer sentiment and other metrics that can be best understood by keeping track of what people are saying online. The company hopes to complete the transition by August this year.

Not that I had any foreknowledge of Twitter’s plans but I can’t say this latest move is all that surprising.

What I hope also emerges from the “new plan” is a fixed pricing structure for smaller users of Twitter content. I’m really not interested in an airline pricing model where the price you pay has no rational relationship to the value of the product. If it’s the day before the end of a sales quarter I get a very different price for a Twitter feed than mid-way through the quarter. That sort of thing.

Along with being able to specify users to follow, searches, and tweet streams in daily increments of 250,000, 500,000, 750,000, or 1,000,000, spooled for daily pickup over high-speed connections (to put less stress on infrastructure).

I suppose renewable contracts would be too much to ask? 😉

Unannotated Listicle of Public Data Sets

Filed under: Data,Dataset — Patrick Durusau @ 2:50 pm

Great Github list of public data sets by Mirko Krivanek.

A large list of public data sets, previously published on GitHub, but with no annotations to guide you to particular datasets.

Just in case you know of any legitimate aircraft wiring sites, i.e., ones that existed prior to the GAO report on hacking aircraft networks, ping me with the links. Thanks!

@alt_text_bot

Filed under: Access Points,Interface Research/Design — Patrick Durusau @ 2:33 pm

@alt_text_bot automatic text descriptions of images on Twitter by Cameron Cundiff

From the post:

Twitter is an important part of public discourse. As it becomes more and more image heavy, people who are blind are left out of the conversation. That’s where Alt-Bot comes in. Alt-Bot fills the gaps in image content using an image recognition API to add text descriptions.

The inspiration for the format of the message is a tweet by @stevefaulkner, in which he adds alt text to a retweet.
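
As a rough illustration of the mechanics (not Cundiff’s actual code), here is a sketch of such a bot using tweepy, with describe_image() standing in for whatever image recognition API you choose:

    # A toy sketch of the Alt-Bot idea using tweepy. describe_image() is a
    # stand-in for a real image-recognition API; keys are placeholders.
    import tweepy

    def describe_image(url):
        # call your image-recognition service here (assumption)
        return "an image I could not describe"

    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth)

    # reply to recent mentions that carry an image
    for tweet in api.mentions_timeline(count=10):
        media = tweet.entities.get("media", [])
        if media:
            alt = describe_image(media[0]["media_url"])
            api.update_status(
                status="@%s alt text: %s" % (tweet.user.screen_name, alt),
                in_reply_to_status_id=tweet.id,
            )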

If accessibility isn’t high on your radar, imagine an adaptation of the same technique that recognizes sexual images and warns managers and diplomats of possible phishing scams.

Spread the word!

I first saw this in a tweet by Steve Faulkner.

MS Windows As A Security Flaw? Plebnet?

Filed under: Cybersecurity,Security — Patrick Durusau @ 2:06 pm

Russian cyber attackers used two unknown flaws: security company by Joseph Menn.

From the post:

(Reuters) – A widely reported Russian cyber-spying campaign against diplomatic targets in the United States and elsewhere has been using two previously unknown flaws in software to penetrate target machines, a security company investigating the matter said on Saturday.

FireEye Inc (FEYE.O), a prominent U.S. security company, said the espionage effort took advantage of holes in Adobe Systems Inc’s (ADBE.O) Flash software for viewing active content and Microsoft Corp’s (MSFT.O) ubiquitous Windows operating system.

The campaign has been tied by other firms to a serious breach at U.S. State Department computers. The same hackers are also believed to have broken into White House machines containing unclassified but sensitive information such as the president’s travel schedule.

Perhaps I was just tired last night but when I first read this story, I could not tell if Joseph was being sarcastic about “two unknown flaws” in Adobe Flash and MS Windows or if he was saying Adobe Flash and MS Windows were the security flaws being exploited by a “reported Russian cyber-spying campaign….”

Having slept since then, I am still not entirely sure which one Joseph meant. 😉

If State and the White House are running MS Windows with Adobe Flash on public networks, I can reliably isolate two major flaws in their security: Adobe Flash and MS Windows. Not to knock MS Windows as an OS, but with approximately fifty (50) million lines of code, it’s known to be insecure and will continue to be insecure. No surprises there.

You don’t need to abandon Windows as an OS but accept that it isn’t secure. (full stop) If you use Windows OS on a public network, you are by definition not secure. If you want greater security and to use MS Windows as an OS, move to a secure network.

Perhaps we should borrow (steal?) a term from Margaret Atwood: plebnet, to describe the Internet. The plebnet being rife with hazards, dangers, evil-deed doers, viruses, fraud, spam, advertising, and built upon, traveled and sustained by insecure software.

For a wide range of motivations, most of us would not have the plebnet be any other way. It’s also called freedom.

If the White House and State want a presence on the plebnet, it’s called assumption of risk, legally speaking.

April 19, 2015

DARPA: New and Updated Projects

Filed under: Cybersecurity,DARPA,Security — Patrick Durusau @ 1:38 pm

The DARPA Open Catalog lists two (2) updated projects and one (1) new project in addition to MEMEX.

New:

PLAN X: Plan X is a foundational cyberwarfare program to develop platforms for the Department of Defense to plan for, conduct, and assess cyberwarfare in a manner similar to kinetic warfare. Toward this end the program will bridge cyber communities of interest from academe, to the defense industrial base, to the commercial tech industry, to user-experience experts.

Plan X has three (3) sub-projects:

Mistral Compiler: Mistral is an experimental language and compiler for highly concurrent, distributed programming, with an emphasis on the specification and maintenance of a virtual overlay network and communication over that network. The current Mistral compiler supports the first generation of Mistral, so not all features we architected for the language are supported at present. This open source package includes our compiler and an interpreter. Use of Mistral for running programs on distributed systems requires a run-time system not included in this package. Thus this Mistral package allows only for experimentation with the language.

Lua Native Big Number Library: The PolarBN library is a big number library for use in cryptographic applications. PolarBN is a Lua wrapper for the bignum library in PolarSSL. The wrapper is written in C, using the standard Lua headers. Compared to the two standard Lua big number libraries, PolarBN is substantially faster than lbc, and does not require openssl-dev, as does lbn.

avro-golang-compiler: This repository contains a modification of the Avro Java compiler to generate golang code that uses the Avro C bindings to actually parse serialized Avro containers. (Java, C, Golang) (no link for this project)

Due to my lack of background in this area, I found the Plan X project description, such as: “…assess cyberwarfare in a manner similar to kinetic warfare,” rather opaque. Do they mean like physical “war games?” Please clue me in if you can.

In trying to find an answer, I did read the Mistral documentation, such as it was, and ran across:

One challenge in programming at Internet scale is the development of languages in which to do this programming. For example, concurrency and control of it is one aspect where current languages fall short. A typical highly concurrent language such as Erlang can handle at most a few thousand concurrent processes in a computation, and requires substantial assumptions about reliable interconnection of all hosts involved in such computation. In contrast, languages for programming at Internet-scale should scale to handle millions of processes, yet be tolerant of highly dynamic network environments where both hosts and communication paths may come and go frequently during the lifetime of an application.

Any Erlangers care to comment?

Another source of puzzlement is how one would simulate a network with all its attendant vulnerabilities. In addition to varying versions and updates of software, there is the near-constant interaction of users with remote resources, email, etc. Your “defense” may be perfect except for when “lite” colonels fall for phishing email scams. Unless they intend to simulate user behavior as well. Just curious.

Updated:

I say updated because DARPA says updated. I was unable to discover an easy way to tell which sub-parts were updated and I don’t have a screen shot of an earlier listing. But, for what it’s worth:

Active Authentication (AA): The Active Authentication (AA) program seeks to develop novel ways of validating the identity of computer users by focusing on the unique aspects of individuals through software-based biometrics. Biometrics are defined as the characteristics used to recognize individuals based on one or more intrinsic physical or behavioral traits. This program is focused on behavioral biometrics. [Seven (7) projects.]

XDATA: XDATA is developing an open source software library for big data to help overcome the challenges of effectively scaling to modern data volume and characteristics. The program is developing the tools and techniques to process and analyze large sets of imperfect, incomplete data. Its programs and publications focus on the areas of analytics, visualization, and infrastructure to efficiently fuse, analyze and disseminate these large volumes of data. [Eighty-three (83) projects so you can understand the difficulty in spotting the update.]

DARPA: MEMEX (Domain-Specific Search) Drops!

Filed under: DARPA,Search Engines,Searching,Tor — Patrick Durusau @ 12:49 pm

The DARPA MEMEX project is now listed on its Open Catalog page!

Forty (40) separate components listed by team, project, category, link to code, description and license. Each sortable of course.

No doubt DARPA has held back some of its best work but looking over the descriptions, there are no boojums or quantum leaps beyond current discussions in search technology. How far you can push the released work beyond its current state is an exercise for the reader.

Machine learning is mentioned in the descriptions for DeepDive, Formasaurus and SourcePin. No explicit mention of deep learning, at least in the descriptions.

If you prefer to not visit the DARPA site, I have gleaned the essential information (project, link to code, description) into the following list:

  • ACHE: ACHE is a focused crawler. Users can customize the crawler to search for different topics or objects on the Web. (Java)
  • Aperture Tile-Based Visual Analytics: New tools for raw data characterization of ‘big data’ are required to suggest initial hypotheses for testing. The widespread use and adoption of web-based maps has provided a familiar set of interactions for exploring abstract large data spaces. Building on these techniques, we developed tile based visual analytics that provide browser-based interactive visualization of billions of data points. (JavaScript/Java)
  • ArrayFire: ArrayFire is a high performance software library for parallel computing with an easy-to-use API. Its array-based function set makes parallel programming simple. ArrayFire’s multiple backends (CUDA, OpenCL, and native CPU) make it platform independent and highly portable. A few lines of code in ArrayFire can replace dozens of lines of parallel computing code, saving users valuable time and lowering development costs. (C, C++, Python, Fortran, Java)
  • Autologin: AutoLogin is a utility that allows a web crawler to start from any given page of a website (for example the home page) and attempt to find the login page, where the spider can then log in with a set of valid, user-provided credentials to conduct a deep crawl of a site to which the user already has legitimate access. AutoLogin can be used as a library or as a service. (Python)
  • CubeTest: Official evaluation metric used for the TREC Dynamic Domain Track. It is a multi-dimensional metric that measures the effectiveness of completing a complex, task-based search process. (Perl)
  • Data Microscopes: Data Microscopes is a collection of robust, validated Bayesian nonparametric models for discovering structure in data. Models for tabular, relational, text, and time-series data can accommodate multiple data types, including categorical, real-valued, binary, and spatial data. Inference and visualization of results respects the underlying uncertainty in the data, allowing domain experts to feel confident in the quality of the answers they receive. (Python, C++)
  • DataWake: The Datawake project consists of various server and database technologies that aggregate user browsing data via a plug-in using domain-specific searches. This captured, or extracted, data is organized into browse paths and elements of interest. This information can be shared or expanded amongst teams of individuals. Elements of interest which are extracted either automatically, or manually by the user, are given weighted values. (Python/Java/Scala/Clojure/JavaScript)
  • DeepDive: DeepDive is a new type of knowledge base construction system that enables developers to analyze data on a deeper level than ever before. Many applications have been built using DeepDive to extract data from millions of documents, Web pages, PDFs, tables, and figures. DeepDive is a trained system, which means that it uses machine-learning techniques to incorporate domain-specific knowledge and user feedback to improve the quality of its analysis. DeepDive can deal with noisy and imprecise data by producing calibrated probabilities for every assertion it makes. DeepDive offers a scalable, high-performance learning engine. (SQL, Python, C++)
  • DIG: DIG is a visual analysis tool based on a faceted search engine that enables rapid, interactive exploration of large data sets. Users refine their queries by entering search terms or selecting values from lists of aggregated attributes. DIG can be quickly configured for a new domain through simple configuration. (JavaScript)
  • Dossier Stack: Dossier Stack provides a framework of library components for building active search applications that learn what users want by capturing their actions as truth data. The framework’s web services and JavaScript client libraries enable applications to efficiently capture user actions, such as organizing content into folders, and allow back-end algorithms to train classifiers and ranking algorithms to recommend content based on those user actions. (Python/JavaScript/Java)
  • Dumpling: Dumpling implements a novel dynamic search engine which refines search results on the fly. Dumpling utilizes the Winwin algorithm and the Query Change retrieval Model (QCM) to infer the user’s state and tailor search results accordingly. Dumpling provides a friendly user interface for users to compare the static and dynamic results. (Java, JavaScript, HTML, CSS)
  • FacetSpace: FacetSpace allows the investigation of large data sets based on the extraction and manipulation of relevant facets. These facets may be almost any consistent piece of information that can be extracted from the dataset: names, locations, prices, etc… (JavaScript)
  • Formasaurus: Formasaurus is a Python package that tells users the type of an HTML form: is it a login, search, registration, password recovery, join mailing list, contact form or something else. Under the hood it uses machine learning. (Python)
  • Frontera: Frontera (formerly Crawl Frontier) is used as part of a web crawler, it can store URLs and prioritize what to visit next. (Python)
  • HG Profiler: HG Profiler is a tool that allows users to take a list of entities from a particular source and look for those same entities across a pre-defined list of other sources. (Python)
  • Hidden Service Forum Spider: An interactive web forum analysis tool that operates over Tor hidden services. This tool is capable of passive forum data capture and posting dialog at random or user-specifiable intervals. (Python)
  • HSProbe (The Tor Hidden Service Prober): HSProbe is a Python multi-threaded STEM-based application designed to interrogate the status of Tor hidden services (HSs) and extract hidden service content. It is an HS-protocol-savvy crawler that uses protocol error codes to decide what to do when a hidden service is not reached. HSProbe tests whether specified Tor hidden services (.onion addresses) are listening on one of a range of pre-specified ports, and optionally, whether they are speaking over other specified protocols. As of this version, support for HTTP and HTTPS is implemented. HSProbe takes as input a list of hidden services to be probed and generates as output a similar list of the results of each hidden service probed. (Python)
  • ImageCat: ImageCat analyses images and extracts their EXIF metadata and any text contained in the image via OCR. It can handle millions of images. (Python, Java)
  • ImageSpace: ImageSpace provides the ability to analyze and search through large numbers of images. These images may be text searched based on associated metadata and OCR text or a new image may be uploaded as a foundation for a search. (Python)
  • Karma: Karma is an information integration tool that enables users to quickly and easily integrate data from a variety of data sources including databases, spreadsheets, delimited text files, XML, JSON, KML and Web APIs. Users integrate information by modelling it according to an ontology of their choice using a graphical user interface that automates much of the process. (Java, JavaScript)
  • LegisGATE: Demonstration application for running General Architecture Text Engineering over legislative resources. (Java)
  • Memex Explorer: Memex Explorer is a pluggable framework for domain specific crawls, search, and unified interface for Memex Tools. It includes the capability to add links to other web-based apps (not just Memex) and the capability to start, stop, and analyze web crawls using 2 different crawlers – ACHE and Nutch. (Python)
  • MITIE: Trainable named entity extractor (NER) and relation extractor. (C)
  • Omakase: Omakase provides a simple and flexible interface to share data, computations, and visualizations between a variety of user roles in both local and cloud environments. (Python, Clojure)
  • pykafka: pykafka is a Python driver for the Apache Kafka messaging system. It enables Python programmers to publish data to Kafka topics and subscribe to existing Kafka topics. It includes a pure-Python implementation as well as an optional C driver for increased performance. It is the only Python driver to have feature parity with the official Scala driver, supporting both high-level and low-level APIs, including balanced consumer groups for high-scale uses. (Python) (A short usage sketch appears after this list.)
  • Scrapy Cluster: Scrapy Cluster is a scalable, distributed web crawling cluster based on Scrapy and coordinated via Kafka and Redis. It provides a framework for intelligent distributed throttling as well as the ability to conduct time-limited web crawls. (Python)
  • Scrapy-Dockerhub: Scrapy-Dockerhub is a deployment setup for Scrapy spiders that packages the spider and all dependencies into a Docker container, which is then managed by a Fabric command line utility. With this setup, users can run spiders seamlessly on any server, without the need for Scrapyd which typically handles the spider management. With Scrapy-Dockerhub, users issue one command to deploy spider with all dependencies to the server and second command to run it. There are also commands for viewing jobs, logs, etc. (Python)
  • Shadow: Shadow is an open-source network simulator/emulator hybrid that runs real applications like Tor and Bitcoin over a simulated Internet topology. It is light-weight, efficient, scalable, parallelized, controllable, deterministic, accurate, and modular. (C)
  • SMQTK: Kitware’s Social Multimedia Query Toolkit (SMQTK) is an open-source service for ingesting images and video from social media (e.g. YouTube, Twitter), computing content-based features, indexing the media based on the content descriptors, querying for similar content, and building user-defined searches via an interactive query refinement (IQR) process. (Python)
  • SourcePin: SourcePin is a tool to assist users in discovering websites that contain content they are interested in for a particular topic, or domain. Unlike a search engine, SourcePin allows a non-technical user to leverage the power of an advanced automated smart web crawling system to generate significantly more results than the manual process typically does, in significantly less time. The User Interface of SourcePin allows users to quickly scan hundreds or thousands of representative images to find the websites they are most interested in. SourcePin also has a scoring system which takes feedback from the user on which websites are interesting and, using machine learning, assigns a score to the other crawl results based on how interesting they are likely to be for the user. The roadmap for SourcePin includes integration with other tools and a capability for users to actually extract relevant information from the crawl results. (Python, JavaScript)
  • Splash: Lightweight, scriptable browser as a service with an HTTP API. (Python)
  • streamparse: streamparse runs Python code against real-time streams of data. It allows users to spin up small clusters of stream processing machines locally during development. It also allows remote management of stream processing clusters that are running Apache Storm. It includes a Python module implementing the Storm multi-lang protocol; a command-line tool for managing local development, projects, and clusters; and an API for writing data processing topologies easily. (Python, Clojure)
  • TellFinder: TellFinder provides efficient visual analytics to automatically characterize and organize publicly available Internet data. Compared to standard web search engines, TellFinder enables users to research case-related data in significantly less time. Reviewing TellFinder’s automatically characterized groups also allows users to understand temporal patterns, relationships and aggregate behavior. The techniques are applicable to various domains. (JavaScript, Java)
  • Text.jl: Text.jl provides numerous tools for text processing optimized for the Julia language. Supported functionality includes algorithms for feature extraction, text classification, and language identification. (Julia)
  • TJBatchExtractor: Regex-based information extractor for online advertisements. (Java)
  • Topic: This tool takes a set of text documents, filters by a given language, and then produces documents clustered by topic. The method used is Probabilistic Latent Semantic Analysis (PLSA). (Python)
  • Topic Space: Tool for visualizing topics in document collections. (Python)
  • Tor: The core software for using and participating in the Tor network. (C)
  • The Tor Path Simulator (TorPS): TorPS quickly simulates path selection in the Tor traffic-secure communications network. It is useful for experimental analysis of alternative route selection algorithms or changes to route selection parameters. (C++, Python, Bash)
  • TREC-DD Annotation: This Annotation Tool supports the annotation task in creating ground truth data for the TREC Dynamic Domain Track. It adopts a drag-and-drop approach for assessors to annotate passage-level relevance judgments. It also supports multiple ways of browsing and searching the various domain corpora used in TREC DD. (Python, JavaScript, HTML, CSS)
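
As the usage sketch promised for pykafka above: a minimal publish-and-consume round trip. The broker address (127.0.0.1:9092) and the topic name ("test") are assumptions for illustration; the calls follow pykafka's documented high-level API.

    # Minimal pykafka sketch: publish to and consume from a Kafka topic.
    # Assumes a broker at 127.0.0.1:9092 and a topic named "test".
    from pykafka import KafkaClient

    client = KafkaClient(hosts="127.0.0.1:9092")
    topic = client.topics[b"test"]

    # Publish a few messages with the synchronous producer.
    with topic.get_sync_producer() as producer:
        for i in range(3):
            producer.produce(("message %d" % i).encode("utf-8"))

    # Read them back; the timeout stops iteration once the topic is drained.
    consumer = topic.get_simple_consumer(consumer_timeout_ms=1000)
    for message in consumer:
        print(message.offset, message.value)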

Beyond whatever use you find for the software, the catalog also signals which capabilities are of interest to DARPA and, by extension, to those interested in militarized IT.

April 18, 2015

Deep Space Navigation With Deep Learning

Filed under: Astroinformatics,Deep Learning — Patrick Durusau @ 7:04 pm

Well, that’s not exactly the title, but the paper does describe better than 99% accuracy when compared with human recognition of galaxy images by type. I assume galaxy type is going to be a question on deep space navigation exams in the distant future. 😉

Rotation-invariant convolutional neural networks for galaxy morphology prediction by Sander Dieleman, Kyle W. Willett, Joni Dambre.

Abstract:

Measuring the morphological parameters of galaxies is a key requirement for studying their formation and evolution. Surveys such as the Sloan Digital Sky Survey (SDSS) have resulted in the availability of very large collections of images, which have permitted population-wide analyses of galaxy morphology. Morphological analysis has traditionally been carried out mostly via visual inspection by trained experts, which is time-consuming and does not scale to large (≳10^4) numbers of images.

Although attempts have been made to build automated classification systems, these have not been able to achieve the desired level of accuracy. The Galaxy Zoo project successfully applied a crowdsourcing strategy, inviting online users to classify images by answering a series of questions. Unfortunately, even this approach does not scale well enough to keep up with the increasing availability of galaxy images.

We present a deep neural network model for galaxy morphology classification which exploits translational and rotational symmetry. It was developed in the context of the Galaxy Challenge, an international competition to build the best model for morphology classification based on annotated images from the Galaxy Zoo project.

For images with high agreement among the Galaxy Zoo participants, our model is able to reproduce their consensus with near-perfect accuracy (>99%) for most questions. Confident model predictions are highly accurate, which makes the model suitable for filtering large collections of images and forwarding challenging images to experts for manual annotation. This approach greatly reduces the experts’ workload without affecting accuracy. The application of these algorithms to larger sets of training data will be critical for analysing results from future surveys such as the LSST.

I particularly like the line:

Confident model predictions are highly accurate, which makes the model suitable for filtering large collections of images and forwarding challenging images to experts for manual annotation.

It reminds me of a suggestion I made for doing something quite similar, where the uncertainty of crowd classifiers on a particular letter (as in a manuscript) would trigger forwarding that portion to an expert for a “definitive” read. You would be surprised at the resistance you can encounter to the suggestion that no special skills are needed to read Greek manuscripts, many of which are as clear as when they were written in the early Christian era. Some aren’t, and some aspects of them require expertise, but that isn’t to say they all do.
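
That triage pattern is easy to sketch. The following is not the authors’ code; the classifier (model.predict), the rotation count, and the 0.99 threshold are all assumptions for illustration. It averages predictions over rotated copies of each image, keeps the confident calls, and forwards the rest for manual annotation.

    import numpy as np
    from scipy.ndimage import rotate

    def rotation_averaged_probs(model, image, n_rotations=8):
        # Average class probabilities over rotated copies of a 2-D image,
        # a cheap prediction-time approximation of rotation invariance.
        angles = np.linspace(0, 360, n_rotations, endpoint=False)
        probs = [model.predict(rotate(image, a, reshape=False)) for a in angles]
        return np.mean(probs, axis=0)

    def triage(model, images, threshold=0.99):
        # Keep confident predictions; forward the rest to experts.
        confident, for_experts = [], []
        for img in images:
            p = rotation_averaged_probs(model, img)
            (confident if p.max() >= threshold else for_experts).append(img)
        return confident, for_experts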

Of course, if successful, such a venture could quite possibly result in papers that cite the images of all extant biblical witnesses and all of the variant texts, as opposed to papers that cite only a fragment entrusted to their authors for publication. The difference is whether you want to engage in scholarship, the act of interpreting witnesses, or whether you merely wish to tell the proper time and make a modest noise while doing so.

Removing blank lines in a buffer (Emacs)

Filed under: Editor,Regexes — Patrick Durusau @ 6:46 pm

Removing blank lines in a buffer by Mickey Petersen.

I was mining Twitter addresses from a list embedded in HTML markup in Emacs (a great way to practice regexes) and, as a result, had lots of blank lines. Before running sort or uniq, I wanted to remove the blank lines.
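
If you need the same cleanup outside Emacs, here is a minimal Python sketch; treating whitespace-only lines as blank is my choice, so adjust the test to taste.

    import sys

    # Copy stdin to stdout, dropping blank or whitespace-only lines.
    for line in sys.stdin:
        if line.strip():
            sys.stdout.write(line)

Saved as strip_blanks.py, it slots straight into a pipeline: python strip_blanks.py < handles.txt | sort | uniq.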

All of Mickey’s posts are great resources but I found this one particularly helpful.

April 17, 2015

Data Elixir

Filed under: Data Science — Patrick Durusau @ 6:30 pm

Data Elixir

From the webpage:

Data Elixir is a weekly collection of the best data science news, resources, and inspirations from around the web.

Subscribe now for free and never miss an issue.

Resources like this one help with winnowing the chaff in IT.

I first saw this in a tweet by Lon Riesberg.

Hacking a Passenger Jet (No Fooling) (+ FBI Watches Fox News)

Filed under: Cybersecurity,Security — Patrick Durusau @ 3:56 pm

The screaming headlines about the potential for hacking into the control systems of a passenger jet are true. No reported proof-of-concept demonstration, yet, but consider these sources:

GAO, not part of the tin-hat crowd, publishes: Air Traffic Control: FAA Needs a More Comprehensive Approach to Address Cybersecurity As Agency Transitions to NextGen. GAO-15-370.

Fifty-six (56) pages of interesting stuff but the summary clues you in:


Modern aircraft are increasingly connected to the Internet. This interconnectedness can potentially provide unauthorized remote access to aircraft avionics systems. As part of the aircraft certification process, FAA’s Office of Safety (AVS) currently certifies new interconnected systems through rules for specific aircraft and has started reviewing rules for certifying the cybersecurity of all new aircraft systems.

How “…potentially provide unauthorized remote access to aircraft avionics systems?” On page 19, the GAO diagrams the separation between avionics and the Wi-Fi network:

gao-plane-network

One Ethernet router between you and the avionics network.

The avionics software between you and control over the flight surfaces of the airplane.

Maybe it’s just me, but Ethernet routers aren’t typically all that difficult to hack. The avionics software and its security aren’t known to me, so that may be the harder of the two tasks. Certainly not a script kiddie attack, at least not the first time.

The first person I saw pointing out the GAO diagrams with the Ethernet router was Paul Ducklin in Could a hacker *really* bring down a plane from a mobile phone in seat 12C?. So much for the “separation” of the avionics and the Wi-Fi. It’s there, but it’s hardly a robust solution.

Just another factoid: On Hacking A Passenger Airliner (GAO report) by Herb Lin appeared yesterday, but today the page is missing. You can search for it and find the link, but following the link returns a page-not-found error.

I would not mention that except that Malia Zimmerman reports in: Security expert pulled off flight by FBI after exposing airline tech vulnerabilities that:

One of the world’s foremost experts on counter-threat intelligence within the cybersecurity industry, who blew the whistle on vulnerabilities in airplane technology systems in a series of recent Fox News reports, has become the target of an FBI investigation himself.

Chris Roberts of the Colorado-based One World Labs, a security intelligence firm that identifies risks before they’re exploited, said two FBI agents and two uniformed police officers pulled him off a United Airlines Boeing 737-800 commercial flight Wednesday night just after it landed in Syracuse, and spent the next four hours questioning him about cyberhacking of planes.

The FBI interrogation came just hours after Fox News published a report on Roberts’ research, in which he said: “We can still take planes out of the sky thanks to the flaws in the in-flight entertainment systems. Quite simply put, we can theorize on how to turn the engines off at 35,000 feet and not have any of those damn flashing lights go off in the cockpit.”

His findings, along with those of another security expert quoted in the Fox News reports, were backed up by a GAO report released Tuesday.

Which leads me to conclude that:

The FBI watches Fox News!

That explains so much about the state of domestic security.

Let’s hope Chris Roberts bills the FBI for his time. Consultants have nothing to sell but their time and his seizure by the FBI is likely an expropriation of property without due process of law. Chris is under no obligation to help extract the airlines or law enforcement from their current dilemmas for free.

PS:

Thomas Fox-Brewster (Forbes), Pilot: US Government Claims Of Plane Wi-Fi Hacking Wrong And Irresponsible, managed to locate a pilot to disagree with the GAO report.

Ironically, the SANS Institute was quick to jump on the Forbes report as demonstrating the incompetence of the GAO. I say ironically because the GAO diagram on page 19 fits the facts where Fox-Brewster says:

There have been some cases, however, where networks have not been properly segmented, potentially leaving open vulnerabilities. Seven years ago, it emerged the flight control and infotainment networks on Boeing 787 aircraft were connected, with only a firewall blocking malicious traffic between the two. If that’s still the case, then there’s at least something to worry about.

Err, that is the case being illustrated on page 19. I expect better from SANS than scanning Forbes for agreement and dissing the GAO on that basis.

What would be helpful are wiring diagrams of airline networking and avionics systems. Pointers, anyone? The alternative is months of “it’s true,” “it’s not true” debates.

Recommending music on Spotify with deep learning

Filed under: Deep Learning,Music,Recommendation — Patrick Durusau @ 2:37 pm

Recommending music on Spotify with deep learning by Sander Dieleman.

From the post:

This summer, I’m interning at Spotify in New York City, where I’m working on content-based music recommendation using convolutional neural networks. In this post, I’ll explain my approach and show some preliminary results.

Overview

This is going to be a long post, so here’s an overview of the different sections. If you want to skip ahead, just click the section title to go there.

If you are interested in the details of deep learning and recommendation for music, you have arrived at the right place!

Walking through Sander’s post will take some time but it will repay your efforts handsomely.
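
The core move in the post is predicting collaborative-filtering latent factors from audio with a convolutional network, then recommending by similarity in that factor space. Here is a minimal sketch of the recommendation step, assuming you already have a factor matrix; the array shapes and track IDs are stand-ins, not Spotify’s data.

    import numpy as np

    def recommend(factors, track_ids, query_idx, k=5):
        # Cosine similarity between the query track and every other track
        # in latent-factor space; return the k closest tracks.
        norms = np.linalg.norm(factors, axis=1)
        sims = factors @ factors[query_idx] / (norms * norms[query_idx] + 1e-9)
        sims[query_idx] = -np.inf  # never recommend the query itself
        best = np.argsort(sims)[::-1][:k]
        return [track_ids[i] for i in best]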

Not to mention Spotify having the potential to broaden your musical horizons!

I first saw this in a tweet by Mica McPeeters.

Join the Letter Hunt from Space with Aerial Bold

Filed under: MapBox,Mapping,Maps — Patrick Durusau @ 2:17 pm

Join the Letter Hunt from Space with Aerial Bold by Alex Barth.

From the post:

Imagine you could write text entirely made up of satellite imagery. Each letter would be a real world feature from a bird’s eye view. A house in the shape of an “A”, a lake in the shape of a “B”, a parking lot in the shape of a “C” and so on. This is the idea behind the nascent kickstarter funded project Aerial Bold. Its inventors Benedikt Groß and Joey Lee are right now collecting font shapes in satellite imagery for Aerial Bold and you can join the letter hunt from space.

mapbox-earth-letters

Letters are recognized from space based on their letter forms. But more letter forms are needed!

Read the post, join the hunt:

Letter Finder App.

Enjoy!

Prof. Harold Koh Pads Resume With Innocent Lives

Filed under: Politics,Security — Patrick Durusau @ 10:55 am

For reasons best known to themselves, Sarah Cleveland and Michael Posner have posted Open Letter In Support of Harold Hongju Koh.

In the briefest of terms, Prof. Koh added to his resume a stint as Legal Adviser to the U.S. Department of State in the Obama Administration and is now seeking to teach international human rights law at NYU. His invitation to that position is being opposed.

The letter of support is deeply misguided.

For example, Cleveland and Posner claim:

Professor Koh has been a leading scholar of, and advocate for, human rights for decades. While some may disagree with him on particular issues of law or policy, he is widely known for his unquestionable personal commitment to human rights and his eminent professional qualifications to teach and write on the subject. Any number of reports confirm that Professor Koh was a leading advocate for preservation of the rule of law, human rights and transparency within the Obama Administration, including on the drones issue.

We will only ever have unsubstantiated rumors about Koh’s positions within the Obama Administration, since presidential advice is, by its very nature, secret.

Moreover, even assuming that Koh did oppose the excesses of the Obama Administration, his very presence gave legitimacy to their illegal activities. He participated in giving a “color of law” protection to those excesses, excesses that, unlike the general public, he knew or should have known were ongoing.

Cleveland and Posner conclude:

The world needs more human rights professionals who are willing to commit themselves to government service on behalf of their nation.

Really? The world needs more human rights professionals who pad their resumes with government service, at the expense of innocent lives, and then escape moral accountability by claiming they argued against criminal activity?

That’s a very strange moral calculus.

It isn’t the moral calculus that was followed by special prosecutor Archibald Cox, Attorney General Elliot Richardson and Deputy Attorney General William Ruckelshaus.

Moral people make moral decisions with real world consequences, for themselves. They don’t go along to get along to further their careers.

April 16, 2015

An Inside Look at the Components of a Recommendation Engine

Filed under: ElasticSearch,Mahout,MapR,Recommendation — Patrick Durusau @ 7:01 pm

An Inside Look at the Components of a Recommendation Engine by Carol McDonald.

From the post:

Recommendation engines help narrow your choices to those that best meet your particular needs. In this post, we’re going to take a closer look at how all the different components of a recommendation engine work together. We’re going to use collaborative filtering on movie ratings data to recommend movies. The key components are a collaborative filtering algorithm in Apache Mahout to build and train a machine learning model, and search technology from Elasticsearch to simplify deployment of the recommender.
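
The deployment payoff is that the trained indicators live in an ordinary Elasticsearch field, so recommendation becomes a search query. Here is a hedged sketch of the query side, assuming an index named movies, a field named indicators, and hypothetical item IDs (none of which come from the post):

    import json
    import requests

    # Items a user has interacted with (hypothetical IDs).
    user_history = ["movie123", "movie456"]

    # Match the user's history against each item's indicator field;
    # Elasticsearch's relevance score becomes the recommendation score.
    query = {"query": {"match": {"indicators": " ".join(user_history)}},
             "size": 10}
    resp = requests.post("http://localhost:9200/movies/_search",
                         data=json.dumps(query),
                         headers={"Content-Type": "application/json"})
    for hit in resp.json()["hits"]["hits"]:
        print(hit["_id"], hit["_score"])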

There are two reasons to read this post:

First, you really don’t know how recommendation engines work. Well, better late than never.

Second, you want an example of how to write an excellent explanation of recommendation engines, hopefully to replicate it for other software.

This is an example of an excellent explanation of recommendation engines but whether you can replicate it for other software remains to be seen. 😉

Still, reading excellent explanations is a first step towards authoring excellent explanations.

Good luck!

Hunting Changes in Complex Networks (Changes in Networks)

Filed under: Astroinformatics,Graphs,Networks — Patrick Durusau @ 6:54 pm

While writing the Methods for visualizing dynamic networks post, I remembered a technique that the authors didn’t discuss.

What if only one node in a complex network were different? That is, all of the other nodes and edges remain fixed while one node and its edges change. How easy would that be to visualize?

If that sounds like an odd use case, it’s not. In fact, the discovery of Pluto in 1930 was made using a blink comparator for exactly that purpose.

blink

This is Clyde Tombaugh using a blink comparator, which shows the viewer two images, quickly alternating between them. The images are of the same part of the night sky, and anything that has changed will be quickly noticed by the human eye.

plutodisc_noarrows

Select the star field image to get a larger view and the gif will animate as though seen through a blink comparator. Do you see Pluto? (These are images of the original discovery plates.)

If not, see these plates with Pluto marked by a large arrow in each one: http://upload.wikimedia.org/wikipedia/en/c/c6/Pluto_discovery_plates.png

This wonderful material on Pluto came from: Beyond the Planets – the discovery of Pluto

All of that was to interest you in reading: GrepNova: A tool for amateur supernova hunting by Dominic Ford.

From the article:

This paper presents GrepNova, a software package which assists amateur supernova hunters by allowing new observations of galaxies to be compared against historical library images in a highly automated fashion. As each new observation is imported, GrepNova automatically identifies a suitable comparison image and rotates it into a common orientation with the new image. The pair can then be blinked on the computer’s display to allow a rapid visual search to be made for stars in outburst. GrepNova has been in use by Tom Boles at his observatory in Coddenham, Suffolk since 2005 August, where it has assisted in the discovery of 50 supernovae up to 2011 October.

That’s right, these folks are searching for supernovas in other galaxies, each of which consists of millions of stars, far denser than most contact networks.

The download information for GrepNova has changed since the article was published: https://in-the-sky.org/software.php.

I don’t have any phone metadata to try the experiment on, but with a graph of contacts, the usual contacts simply become background and new contacts jump off the screen at you.
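
A minimal sketch of that graph “blink,” assuming snapshots arrive as edge lists of node pairs (the names here are invented): take the symmetric difference of the two edge sets and highlight only the nodes those edges touch.

    def changed_nodes(edges_before, edges_after):
        # Treat edges as unordered pairs; the symmetric difference is
        # exactly what a blink comparator would show as "moving."
        a = {frozenset(e) for e in edges_before}
        b = {frozenset(e) for e in edges_after}
        return {node for edge in a ^ b for node in edge}

    before = [("alice", "bob"), ("bob", "carol")]
    after = [("alice", "bob"), ("bob", "carol"), ("carol", "mallory")]
    print(changed_nodes(before, after))  # {'carol', 'mallory'} (order varies)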

A great illustration of why prior searching techniques remain relevant to modern “information busy” visualizations.

Methods for visualizing dynamic networks (Parts 1 and 2)

Filed under: Dynamic Graphs,Networks,Visualization — Patrick Durusau @ 6:16 pm

Methods for visualizing dynamic networks Part 1

Methods for visualizing dynamic networks Part 2

From part 1:

The challenge of visualizing the evolution of connected data through time has kept academics and data scientists busy for years. Finding a way to convey the added complexity of a temporal element without overwhelming the end user with it is not easy.

Whilst building the KeyLines Time Bar – our component for visualizing dynamic networks – we spent a great deal of time appraising the existing temporal visualization options available.

In this blog post, we’ve collated some of the most popular ways of visualizing dynamic graphs through time. Next week, we’ll share some of the more creative and unusual options.

Not a comprehensive survey but eight (8) ways to visualize dynamic networks that you will find interesting.

Others that you would add to this list?
