Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 28, 2018

Deep Learning … Wireless Jamming Attacks

Filed under: Cybersecurity,Government,Government Data,Hacking — Patrick Durusau @ 8:25 pm

Deep Learning for Launching and Mitigating Wireless Jamming Attacks by Tugba Erpek, Yalin E. Sagduyu, Yi Shi.

Abstract:

An adversarial machine learning approach is introduced to launch jamming attacks on wireless communications and a defense strategy is provided. A cognitive transmitter uses a pre-trained classifier to predict current channel status based on recent sensing results and decides whether to transmit or not, whereas a jammer collects channel status and ACKs to build a deep learning classifier that reliably predicts whether there will be a successful transmission next and effectively jams these transmissions. This jamming approach is shown to reduce the performance of the transmitter much more severely compared with randomized or sensing-based jamming. Next, a generative adversarial network (GAN) is developed for the jammer to reduce the time to collect the training dataset by augmenting it with synthetic samples. Then, a defense scheme is introduced for the transmitter that prevents the jammer from building a reliable classifier by deliberately taking a small number of wrong actions (in form of a causative attack launched against the jammer) when it accesses the spectrum. The transmitter systematically selects when to take wrong actions and adapts the level of defense to machine learning-based or conventional jamming behavior in order to mislead the jammer into making prediction errors and consequently increase its throughput.

As you know, convenience is going to triumph over security, even (especially?) in the context of military contractors. A deep learning approach may be overkill for low-bid contractor targets but it’s good practice for the occasionally more skilled opponent.

Enjoy!

July 22, 2018

Universal Feminine Hygiene

Filed under: Feminism,Government,Politics — Patrick Durusau @ 6:30 pm

It’s Not Just the Tampon Tax: Why Periods Are Political by Karen Zraick reminded me to post a “progressive” proposal on feminine hygiene products.

Removing taxes on feminine hygiene products is a step in the right direction but why not go all the way and make those products universally available, at no cost?

The existing distribution chain for feminine hygiene products needs only a few minor tweaks to make that possible. Here’s my solution in three steps:

  1. Retailers provide feminine hygiene products to any customer, free of charge.
  2. Customers are free to choose any brand or type of feminine hygiene product.
  3. Retailers have a tax credit equal to feminine hygiene products distributed, at their retail “price.”

Charging customers for feminine hygiene products, directly or indirectly, becomes illegal, and states/localities are forbidden from limiting or regulating such sales in any way.

A direct benefit to all women that preserves their freedom of choice of products. It re-uses existing distribution systems, without any additional forms or paperwork.

Share this with progressives seeking public office.

July 19, 2018

Printed Guns – Security Warning for Protesters

Filed under: FOIA,Free Speech,Government — Patrick Durusau @ 12:13 pm

DOJ Settles With Cody Wilson, Defense Distributed on 3D-Printed Guns

From the post:

The three-year legal battle over the future of 3D-printed guns is officially over, with the Department of Justice agreeing to allow the general public to “access, discuss, use, reproduce or otherwise benefit from” 3D gun files which had previously been prohibited, Reason.com reported.

DEFCAD will permit downloading and uploading of 3D gun files 1 August 2018.

Teasers on the site include:

AR-15

VZ. 58

Printable guns raise two major security concerns for protest groups in general but especially those who oppose pipelines, mining and other environmental crimes.

Traceability: Prior to 3-D printable guns, oppressors risked tracing of bullets fired to particular weapons, weapons which have relatively permanent serial numbers and at least some records of purchase/transfer. Not 100% and certainly rarely pursued but now even that remote possibility has been removed.

Untraceable Throw Down Guns: Putting “throw down” guns on protesters has always carried the risk of the true origin of a gun being discovered. Printable guns lower the cost of “throw down” guns, and their lack of traceability removes the risk of tracking a gun back to its point of origin.

The cheap “throw down” gun is the most likely use of 3-D printable guns by oppressors.

A partial solution for specific protest sites: Have a friendly police officer search you and document your lack of weapons. It’s not much but a law enforcement officer testifying on your behalf could be the saving touch.

PS: FOIA requests to police and other government departments should include purchases of 3-D printers and supplies for the same.

June 23, 2018

Got Bots? Canadians to Monitor Online Chatter for Threats

Filed under: Bots,Cybersecurity,Government — Patrick Durusau @ 7:58 pm

NEB seeks contractor to monitor ‘vast amounts’ of online chatter for potential security threats.

From the post:

The federal regulator responsible for pipelines is seeking an outside company to monitor online chatter en masse and aggregate the data in an effort to detect security risks ahead of time.

The National Energy Board has issued a request for information (RFI) from companies qualified to provide “real-time capability to algorithmically process vast amounts of traditional media, open source and public social media data.”

It is asking applicants to provide a “short demo session” of their security threat monitoring services in early July.

“This RFI is part of our processes to ensure we are getting the services we require to proactively manage security threats, risks and incidents to help protect its personnel, critical assets, information and services,” NEB communications officer Karen Ryhorchuk said in an email.

“It is not specific to any project, application or issue.”

The National Energy Board website is loaded with details on human mistakes (read pipelines) in varying degrees of detail. First stop if you are looking to oppose, interfere with, or degrade a pipeline located in Canada.

It’s interesting to note that despite the RFI being reported, you won’t find it on the News Releases page for the National Energy Board. It’s not on their Twitter feed, NEBCanada, either.

Someone in Canada should know the Yogi Berra line:

“It’s tough to make predictions, especially about the future.”

Well, perhaps not.

Still, if the Canadians are going to spend money on it, whoever they hire needs to earn their pay.

It would be trivial to create bots that randomly compose “alert” level posts, but the challenge would be to create an interlocking network of bots that “appear” to be interacting and responding to each other’s posts.

Thoughts on models of observed network communities that would be useful in training such a system?

There’s nothing guaranteed to stop governments from monitoring social media (if you believe government avowals of non-collection, well, that’s your bad), so the smart money is on generating too many credible signals for them to separate wheat from the chaff.

June 16, 2018

Thumbprint Loans @ Post Offices?

Filed under: Government,Politics,Privacy — Patrick Durusau @ 12:32 pm

In case you haven’t heard, payday loans are the bane of the poor. Aboutpayday.com

I created a graphic that captures the essential facts of a thumbprint loan proposal, which I suggest locating at US Post offices.

The essence of the proposal is to eliminate all the paperwork for government sponsored payday loans at prime plus 1% simple interest.

To do that, all that is required for a loan is a thumbprint. That’s it. No name, location, where your job is located, etc.

When paid, users can choose to create a credit history for their thumbprint, or, have it deleted from the system. Users who create a credit history can build up a record in order to borrow larger than base amounts, or to create a credit history for export to more conventional lenders.

When I first started thinking about this proposal, I envisioned interactions with Post Office personnel but even that is unnecessary. Thumbprint loans could be wholly automated, up to and including disbursement of cash. That has the added feature of not being limited to post office hours of operation.

A rough sketch to be sure but reducing the APR of payday loans by 791% to 532% for 24 million Americans is worth being on the national agenda.
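For a sense of scale, here is a minimal Python sketch of the arithmetic. The 5% prime rate, the $300 principal, and the 14-day term are my assumptions for illustration, not part of the proposal, and I am reading the 532% and 791% figures above as payday loan APRs applied as simple interest.

```python
def simple_interest(principal: float, annual_rate: float, days: int) -> float:
    """Interest owed on a loan under simple (non-compounding) interest."""
    return principal * annual_rate * days / 365


# Assumptions for illustration only (not from the proposal):
ASSUMED_PRIME = 0.05   # hypothetical prime rate of 5%
PRINCIPAL = 300.0      # hypothetical loan amount in dollars
TERM_DAYS = 14         # hypothetical two-week term

# Thumbprint loan rate as described above: prime plus 1%.
thumbprint_rate = ASSUMED_PRIME + 0.01

thumbprint_cost = simple_interest(PRINCIPAL, thumbprint_rate, TERM_DAYS)
# Payday APRs of 532% and 791%, expressed as fractions.
payday_low = simple_interest(PRINCIPAL, 5.32, TERM_DAYS)
payday_high = simple_interest(PRINCIPAL, 7.91, TERM_DAYS)

print(f"Thumbprint loan interest: ${thumbprint_cost:.2f}")
print(f"Payday loan interest:     ${payday_low:.2f} to ${payday_high:.2f}")
```

Change the three assumed constants to match whatever prime rate and loan size you have in mind; the gap between the two kinds of loan stays large either way.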

June 11, 2018

Weaponize Information

Filed under: Data Science,Government,Military — Patrick Durusau @ 4:39 pm

Military Seeks New Tech to Weaponize Information by Aaron Boyd.

Knowledge is power, and the Defense Department wants to ensure it can outpower any enemy in any domain. But first, it needs to know what is technically possible and how industry can support those efforts.

Information warfare—controlling the flow of information in and out of a battlespace to gain a tactical edge—is one of the oldest military tactics in existence. But with the rise of the internet and other advanced communications technologies, it is fast becoming a core tool in every military’s playbook.

In February 2017, Russian military leaders announced the existence of an information warfare branch, replete with troops trained in propaganda and other information operations. In the U.S., these duties are performed by troops in the Joint Information Operations Warfare Center.

The U.S. Army and JIOWC are hosting an industry event on June 26-28 in McLean, Virginia, to identify potential industry and academic partners, find out what new technologies are available to support information operations and determine what kind of products and services the military might want to contract for in the future. While the Army is hosting the event, representatives from the entire Defense Department have been invited to attend.

The information gathered during the event will help JIOWC develop requirements for future procurements to “support the emerging domain of operations in the information environment,” according to a notice on FedBizOpps. Those requirements will likely fall under one of four capability areas:

Only nine (9) days left to file a request to attend and presentation abstracts (June 20th at 3:00pm EST), http://www.cvent.com/d/mgqsvs.

Further information: Elizabeth Bowman, (410) 278-5924, E-Mail: Elizabeth.k.bowman.civ@mail.mil.

Lacking a pet retired colonel and/or any interest in acquiring one, this event is of little interest to me.

If after reviewing the vaguely worded descriptions, you would like to discuss breaching present and future information silos, please feel free to contact me with your semantic integration requirements. patrick@durusau.net.

June 5, 2018

Installing Google Earth – Ubuntu 16.04 (Tooling to help “Preserve The New River Valley”)

Filed under: Environment,Google Earth,Google Maps,Government — Patrick Durusau @ 7:20 pm

I encountered a remarkable resource, Proposed Route Using Google Earth and Geographic Information Systems (GIS), that details the route to be taken by the Mountain Valley Pipeline. (It’s hard to get in the way if you don’t know the route.)

But it requires use of Google Earth, an application I have only used on Windows. Time to change that!

Download Google Earth Pro for PC, Mac, or Linux then run:

sudo dpkg -i google-earth-stable_current_amd64.deb

in the same directory as the downloaded file.

It’s working!

Now back to the nifty resource I was talking about:

Dr. Stockton Maxwell, Assistant Professor of Geospatial Science at Radford University contacted EQT to request a Geographic Information Systems (GIS) request for the proposed pipeline route. A GIS is, in layman’s terms, a computer system that allows geographical data to be captured, stored, analyzed, and presented. Many counties and municipalities now use GIS to map all parcels and provide pertinent information on each parcel. Not surprising to those of us familiar with the EQT/NextEra Alliance, Mountain Valley Pipeline. LLC, they did not respond to Dr. Maxwell’s request. However, the company did have to share this information with the Virginia Department of Conservation and Recreation (DCR) as they are one of the stakeholders that must be notified during the pre-filing process. Dr. Maxwell contacted DCR and they shared the GIS information with him.

To make it easier for us to interpret the GIS information, Dr. Maxwell was kind enough to create a Google Earth file of the entire route (from Wetzel County, WV to Pittsylvania County, VA. He then created a buffer of 150 feet on each side of the route as this is the planned right-of-way during the construction phase of the project. To view this file, you will first need to download and install Google Earth — you can download it for free at: http://www.google.com/earth/download/ge/agree.html. Once you have downloaded Google Earth and installed it (or if you already have Google Earth installed), you will need to download the following file: http://preservethenrv.com/docs/MVP_Route.kmz. The file will download onto your computer. You can then Open the file (you will see the file name show at the bottom of your screen; click on the arrow to the right and select open) and it should open in Google Earth.

We are so lucky to have someone with Dr. Maxwell’s talents on our side — this resource will help landowners and citizens investigate the route better than the company’s maps.
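If you would rather peek inside the file before opening it in Google Earth: a KMZ file is a zip archive holding one or more KML (XML) documents. The sketch below, in Python using only the standard library, lists the names of the placemarks it finds. The function name list_placemarks and the local file name MVP_Route.kmz are my own choices, and I have not checked how the MVP file is actually organized, so treat this as a starting point.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip any XML namespace, e.g. '{ns}Placemark' -> 'Placemark'."""
    return tag.rsplit("}", 1)[-1]


def list_placemarks(kmz_bytes: bytes) -> List[str]:
    """Return the names of all Placemark elements in the first .kml
    document found inside a KMZ (zip) archive."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as archive:
        kml_files = [n for n in archive.namelist() if n.lower().endswith(".kml")]
        if not kml_files:
            return []
        root = ET.fromstring(archive.read(kml_files[0]))

    names = []
    for element in root.iter():
        if local_name(element.tag) != "Placemark":
            continue
        # A Placemark's display name, when present, is a direct <name> child.
        name = next(
            (child.text for child in element if local_name(child.tag) == "name"),
            None,
        )
        names.append(name if name else "(unnamed)")
    return names


# Example use, assuming the download was saved as MVP_Route.kmz:
# with open("MVP_Route.kmz", "rb") as f:
#     for name in list_placemarks(f.read()):
#         print(name)
```

Matching on the local tag name rather than a fixed namespace is deliberate, since KML files in the wild declare more than one namespace version.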

A portion of the default view:

I’m going to be looking for places where accidental traffic congestion will delay construction. Critical sites that need re-routing will require boots on the ground to find and document those places. Got boots?

May 22, 2018

ACLU Flyer for Amazon’s Rekognition

Filed under: Government,Privacy — Patrick Durusau @ 7:22 pm

Did you see the ACLU flyer for Amazon’s Rekognition program?

If there was a police department in the United States that was unaware of Rekognition, that is no longer the case. Way to go ACLU!

Part of the ACLU flyer reads as follows:

Marketing materials and documents obtained by ACLU affiliates in three states reveal a product that can be readily used to violate civil liberties and civil rights. Powered by artificial intelligence, Rekognition can identify, track, and analyze people in real time and recognize up to 100 people in a single image. It can quickly scan information it collects against databases featuring tens of millions of faces, according to Amazon.

Amazon is marketing Rekognition for government surveillance. According to its marketing materials, it views deployment by law enforcement agencies as a “common use case” for this technology. Among other features, the company’s materials describe “person tracking” as an “easy and accurate” way to investigate and monitor people. Amazon says Rekognition can be used to identify “people of interest” raising the possibility that those labeled suspicious by governments — such as undocumented immigrants or Black activists — will be seen as fair game for Rekognition surveillance. It also says Rekognition can monitor “all faces in group photos, crowded events, and public places such as airports” — at a time when Americans are joining public protests at unprecedented levels.

Amazon’s Rekognition raises profound civil liberties and civil rights concerns. Today, the ACLU and a coalition of civil rights organizations demanded that Amazon stop allowing governments to use Rekognition.

My first impression was this is yet another fund raising effort by the ACLU. That impression grew stronger when I saw:

right under the “…demanded that Amazon stop allowing governments to use Rekognition.”

That takes you to:

ACLU address and permission harvesting!

The ACLU’s faux concern about Rekognition obtains your contact data and permission to contact.

Why do I say “faux concern?” Petitioning a vendor to withdraw a product offered by others. Name five similar campaigns that were successful. Name three. Still nothing? How about one?

I’ve got nothing, how about you?

On the other hand, despite surveillance of US citizens being illegal, the NSA engaged in, concealed and continued that surveillance. Explosive Revelation of Obama Administration Illegal Surveillance of Americans (National Review), NSA surveillance exposed (CBS News), NSA Surveillance (ACLU).

Based on experience with the NSA and others, would you guess that ACLU address and permission harvesting is going to be less than effective at stopping Rekognition? The only possible success of this ACLU effort will be a larger solicitation list for the ACLU. Not what I’m interested in signing up for. You?

Options for defeating facial recognition software range from the purely physical to tricking the underlying software. A bit old (2016) but 6 Ways to Defeat Facial Recognition Cameras has some amusing ways to defeat facial recognition software, but most of them tag you as avoiding facial recognition. Unless and until avoiding facial recognition becomes commonplace, obvious avoidance isn’t the best plan.

More recent and promising efforts include Google researchers create universal adversarial image patches to defeat AI object recognition (2018), an effort to hijack an AI system’s attention. That’s only one of many efforts to defeat facial/image recognition software.

Bottom line: Amazon is going to successfully market its Rekognition software, especially with name recognition assistance from the ACLU.

Forfeiting your contact data and permission to the ACLU accomplishes exactly that: it gives the ACLU your contact data and permission to contact.

Using, developing, and promoting technology to defeat facial recognition software without permission or agreement is our only hope.

May 8, 2018

Extracting Data From FBI Reports – No Waterboarding Required!

Filed under: FBI,Government,Government Data,R — Patrick Durusau @ 1:01 pm

Wrangling Data Table Out Of the FBI 2017 IC3 Crime Report

From the post:

The U.S. FBI Internet Crime Complaint Center was established in 2000 to receive complaints of Internet crime. They produce an annual report, just released 2017’s edition, and I need the data from it. Since I have to wrangle it out, I thought some folks might like to play long at home, especially since it turns out I had to use both tabulizer and pdftools to accomplish my goal.

Concepts presented:

  • PDF scraping (with both tabulizer and pdftools)
  • asciiruler
  • general string manipulation
  • case_when() vs ifelse() for text cleanup
  • reformatting data for ggraph treemaps

Let’s get started! (NOTE: you can click/tap on any image for a larger version)

Freeing FBI data from a PDF prison is a public spirited act.

Demonstrating how to free FBI data from PDF prisons is a virtuous act!

Enjoy!

May 4, 2018

Propaganda For Our Own Good

Filed under: Fake News,Government,Journalism,News — Patrick Durusau @ 10:43 pm

US and Western government propaganda has been plentiful for decades but Caitlin Johnstone uncovers why a prominent think tank is calling for more Western propaganda.

Atlantic Council Explains Why We Need To Be Propagandized For Our Own Good

From the post:

I sometimes try to get establishment loyalists to explain to me exactly why we’re all meant to be terrified of this “Russian propaganda” thing they keep carrying on about. What is the threat, specifically? That it makes the public less willing to go to war with Russia and its allies? That it makes us less trusting of lying, torturing, coup-staging intelligence agencies? Does accidentally catching a glimpse of that green RT logo turn you to stone like Medusa, or melt your face like in Raiders of the Lost Ark?

“Well, it makes us lose trust in our institutions,” is the most common reply.

Okay. So? Where’s the threat there? We know for a fact that we’ve been lied to by those institutions. Iraq isn’t just something we imagined. We should be skeptical of claims made by western governments, intelligence agencies and mass media. How specifically is that skepticism dangerous?

A great read as always but I depart from Johnstone when she concludes:


If our dear leaders are so worried about our losing faith in our institutions, they shouldn’t be concerning themselves with manipulating us into trusting them, they should be making those institutions more trustworthy.

Don’t manipulate better, be better. The fact that an influential think tank is now openly advocating the former over the latter should concern us all.

I tweeted to George Lakoff quite recently, asking for more explicit treatment of how to use persuasion techniques.

Being able to recognize persuasion used against you, in propaganda for example, is good. Being able to construct such techniques to use in propaganda against others is great! Sadly, no response from Lakoff. Perhaps he was busy.

The “other side,” your pick, isn’t going to stop using propaganda. Hoping, wishing, praying they will, are exercises in being ineffectual.

If you seek to counter decades of finely honed war-mongering, exploitive Western narrative, be prepared to use propaganda and to use it well.

Win MuckRock requests and swag!

Filed under: Government,MuckRock,Transparency — Patrick Durusau @ 4:23 pm

Help analyze Donald Rumsfeld’s memos and win MuckRock requests and swag by Michael Morisy.

From the post:

In January, thanks to a five-year fight by the National Security Archive, the Pentagon began releasing massive troves of former Secretary of Defense Donald Rumsfeld’s memos. The memos were so copious that they developed their own legendary status within the Armed Forces.

Rumsfeld himself describes them:

When I returned to the Pentagon in 2001, I continued writing the short memos that had been nicknamed “snowflakes” some years ago. They quickly became a system of communication with the many employees of DoD, as I would initiate a topic with a short memo to the relevant person, who would in turn provide research, background, or a course of action as necessary. In the digital age it was much easier to keep the originals on file so I could track their progress. They quickly grew in number from mere flurries to a veritable blizzard.

The term “snowflake” covers a range of communications, from notes to myself on topics I found interesting, to extended instructions to my associates, to simple requests for a haircut. There was no set template; some are several pages and some just a few words. They were all conceived individually and I had never considered them as a set until I started work on the memoir. I then found that when reviewed together, they give a remarkable sense of the variety of topics that are confronted by a secretary of defense.

Now you can explore the early days of the War on Terror – and potentially earn free MuckRock requests and even swag – by helping analyze what was in them, surfacing the most interesting and historically important memos and sharing the results with everyone.

MuckRock is offering prizes so jump to Morisy’s post and get started.

Enjoy!

April 5, 2018

The EFF’s BFF? – Government

Filed under: Electronic Frontier Foundation,Government — Patrick Durusau @ 7:42 pm

DHS Confirms Presence of Cell-site Simulators in U.S. Capital by Cooper Quintin.

The present situation:

The Department of Homeland Security has finally confirmed what many security specialists have suspected for years: cell-phone tracking technology known as cell-site simulators (CSS) are being operated by potentially malicious actors in our nation’s capital.

Anyone with the skill level of a hobbyist can now build their own passive IMSI catcher for as little as $7 or an active cell-site simulator for around $1000. Moreover, mobile surveillance vendors have displayed a willingness to sell their goods to countries who can afford their technology, regardless of their human rights records.

The EFF’s solution:


Law enforcement and the intelligence community would surely agree that these technologies are dangerous in the wrong hands, but there is no way to stop criminals and terrorists from using these technologies without also closing the same security flaws that law enforcement uses. Unlike criminals however, law enforcement can still obtain search warrants and work directly with the phone companies to get subscribers’ location, so they would not lose any capabilities if the vulnerabilities CSSs rely on were fixed.

Why the EFF trusts a government that has spied on the American people for decades is a question you need to put to the EFF. I can’t think of any sensible explanation for their position.

I’ve been meaning to ask: How does it feel to be lumped in with “…criminals and terrorists…?”

You may be an average citizen who is curious about who your member of Congress or state/local government is sleeping with, being paid off by, or other normal and customary functions of government.

A CSS device can contribute towards meaningful government transparency. Perhaps that’s why the EFF resists CSS devices being in the hands of citizens.

We’ll lose our dependence on the EFF for what minimal transparency does exist.

I can live with that.

February 27, 2018

Kiddie Hack – OPM

Filed under: Cybersecurity,Government,Security — Patrick Durusau @ 9:24 pm

Is it fair to point out the Office of Personnel Management (OPM) continues to fail to plan upgrades to its security?

That’s right: not that OPM security upgrades are failing, but that OPM is failing to plan for security upgrades. This is three years after 21.5 million data records of current and former federal employees were stolen from the OPM.

The inspector general report reads in part:


While we believe that the Plan is a step in the right direction toward modernizing OPM’s IT environment, it falls short of the requirements outlined in the Appropriations Act. The Plan identifies several modernization-related initiatives and allocates the $11 million amongst these areas, but the Plan does not identify the full scope of OPM’s modernization effort or contain cost estimates for the individual initiatives or the effort as a whole. All of the other capital budgeting, project planning, and IT security requirements are similarly missing.

At this rate, hackers are stockpiling gear slow enough to work with OPM systems.

Be careful on eBay and other online sources. No doubt the FBI is monitoring purchases of older computer gear.

February 26, 2018

Guide to Searching CIA’s Declassified Archives

Filed under: CIA,Government — Patrick Durusau @ 5:08 pm

The ultimate guide to searching CIA’s declassified archives: Looking to dig into the Agency’s 70 year history? Here’s where to start by Emma Best.

From the webpage:

While the Agency deserves credit for compiling a basic guide to searching their FOIA reading room, it still omits information or leaves it spread out across the Agency’s website. In one egregious example, the CIA guide to searching the records lists only three content types that users can search for, a review of the metadata compiled by Data.World reveals an addition ninety content types. This guide will tell you everything you need to know to dive into CREST and start searching like a pro.

Great guide for anyone interested in the declassified CIA archives.

Enjoy!

February 2, 2018

Discrediting the FBI?

Filed under: FBI,Government — Patrick Durusau @ 2:27 pm

Whatever your opinion of the accidental U.S. president (that’s a dead giveaway), what does it mean to “discredit” the FBI?

Just hitting the high points:

The FBI has a long history of lying and abuse, these being only some of the more recent examples.

So my question remains: What does it mean to “discredit” the FBI?

The FBI and its agents are unworthy of any belief by anyone. Their own records and admissions are a story of staggering from one lie to the next.

I’ll grant the FBI is large enough that honorable, hard working, honest agents must exist. But not enough of them to prevent the repeated fails at the FBI.

Anyone who credits any FBI investigation has motivations other than the factual record of the FBI.

PS: The Nunes memo confirms what many have long suspected about the FISA court: It exercises no more meaningful oversight over FISA warrants than a physical rubber stamp would in their place.

January 22, 2018

EFF Investigates Dark Caracal (But Why?)

Filed under: Cybersecurity,Electronic Frontier Foundation,Government,Privacy,Security — Patrick Durusau @ 9:19 pm

Someone is touting a mobile, PC spyware platform called Dark Caracal to governments by Iain Thomson.

From the post:

An investigation by the Electronic Frontier Foundation and security biz Lookout has uncovered Dark Caracal, a surveillance-toolkit-for-hire that has been used to suck huge amounts of data from Android mobiles and Windows desktop PCs around the world.

Dark Caracal [PDF] appears to be controlled from the Lebanon General Directorate of General Security in Beirut – an intelligence agency – and has slurped hundreds of gigabytes of information from devices. It shares its backend infrastructure with another state-sponsored surveillance campaign, Operation Manul, which the EFF claims was operated by the Kazakhstan government last year.

Crucially, it appears someone is renting out the Dark Caracal spyware platform to nation-state snoops.

The EFF could be spending its time and resources duplicating Dark Caracal for the average citizen.

Instead the EFF continues its quixotic pursuit of governmental wrong-doers. I say “quixotic” because those pilloried by the EFF, such as the NSA, never change their behavior. Unlawful conduct, including surveillance, continues.

But don’t take my word for it, the NSA admits that it deletes data it promised under court order to preserve: NSA deleted surveillance data it pledged to preserve. No consequences. Just like there were no consequences when Snowden revealed widespread and illegal surveillance by the NSA.

So you have to wonder, if investigating and suing governmental intelligence organizations produces no tangible results, why is the EFF pursuing them?

If the average citizen had the equivalent of Dark Caracal at their disposal, say as desktop software, the ability of governments like Lebanon, Kazakhstan, and others, to hide their crimes, would be greatly reduced.

Exposure is no guarantee of accountability and/or punishment, but the whack-a-mole strategy of the EFF hasn’t produced transparency or consequences.

January 21, 2018

Are You Smarter Than A 15 Year Old?

Filed under: Cybersecurity,Government,Hacking,Politics,Security — Patrick Durusau @ 1:27 pm

15-Year-Old Schoolboy Posed as CIA Chief to Hack Highly Sensitive Information by Mohit Kumar.

From the post:

A notorious pro-Palestinian hacking group behind a series of embarrassing hacks against United States intelligence officials and leaked the personal details of 20,000 FBI agents, 9,000 Department of Homeland Security officers, and some number of DoJ staffers in 2015.

Believe or not, the leader of this hacking group was just 15-years-old when he used “social engineering” to impersonate CIA director and unauthorisedly access highly sensitive information from his Leicestershire home, revealed during a court hearing on Tuesday.

Kane Gamble, now 18-year-old, the British teenager hacker targeted then CIA director John Brennan, Director of National Intelligence James Clapper, Secretary of Homeland Security Jeh Johnson, FBI deputy director Mark Giuliano, as well as other senior FBI figures.

Between June 2015 and February 2016, Gamble posed as Brennan and tricked call centre and helpline staff into giving away broadband and cable passwords, using which the team also gained access to plans for intelligence operations in Afghanistan and Iran.

Gamble said he targeted the US government because he was “getting more and more annoyed about how corrupt and cold-blooded the US Government” was and “decided to do something about it.”

Your questions:

1. Are You Smarter Than A 15 Year Old?

2. Are You Annoyed by a Corrupt and Cold-blooded Government?

3. Have You Decided to do Something about It?

Yeses for #1 and #2 number in the hundreds of millions.

The lack of governments hemorrhaging data worldwide is silent proof that #3 is a very small number.

What’s your answer to #3? (Don’t post it in the comments.)

January 18, 2018

Launch of DECLASSIFIED

Filed under: Government,Intelligence,Politics — Patrick Durusau @ 11:48 am

Launch of DECLASSIFIED by Mark Curtis.

From the post:

I am about to publish on this site hundreds of UK declassified documents and articles on British foreign policy towards various countries. This will be the first time such a collection has been brought together online.

The declassified documents, mainly from the UK’s National Archives, reveal British policy-makers actual concerns and priorities from the 1940s until the present day, from the ‘horse’s mouth’, as it were: these files are often revelatory and provide an antidote to the often misleading and false mainstream media (and academic) coverage of Britain’s past and present foreign policies.

The documents include my collections of files, accumulated over many years and used as a basis for several books, on episodes such as the UK’s covert war in Yemen in the 1960s, the UK’s support for the Pinochet coup in Chile, the UK’s ‘constitutional coup’ in Guyana, the covert wars in Indonesia in the 1950s, the UK’s backing for wars against the Iraqi Kurds in the 1960s, the coup in Oman in 1970, support for the Idi Amin takeover in Uganda and many others policies since 1945.

But the collection also brings together many other declassified documents by listing dozens of media articles that have been written on the release of declassified files over the years. It also points to some US document releases from the US National Security Archive.

A new resource for those of you tracking the antics of the small and the silly through the 20th and into the 21st century.

I say the “small and the silly” because there’s no doubt that similar machinations have been part and parcel of government toady lives for as long as there have been governments. Despite their exaggerated sense of their own importance and the history-making importance of their efforts, almost none of their names survive in the ancient historical record.

With the progress of time, the same fate awaits the most recent and current crop of government familiars. While we wait for them to pass into obscurity, you can amuse yourself by outing them and tracking their activities.

This new archive may assist you in your efforts.

Be sure to keep topic maps in mind for mapping between disjoint vocabularies and collections of documents as well as accounts of events.

January 10, 2018

Email Spam from Congress

Filed under: Government,Journalism,News — Patrick Durusau @ 10:40 am

Receive an Email when a Member of Congress has a New Remark Printed in the Congressional Record by Robert Brammer.

From the post:

Congress.gov alerts are emails sent to you when a measure (bill or resolution), nomination, or member profile has been updated with new information. You can also receive an email after a Member has new remarks printed in the Congressional Record. Here are instructions on how to get an email after a Member has new remarks printed in the Congressional Record….

My blog title is unfair to Brammer, who isn’t responsible for the lack of meaningful content in Member remarks printed in the Congressional Record.

Local news outlets reprint such remarks, as does the national media, whether those remarks are grounded in any shared reality or not. Secondary education classes on current events, reporting, government, where such remarks are considered meaningful, are likely to find this useful.

Another use, assuming mining of prior remarks from the Congressional Record, would be in teaching NLP techniques. Highly unlikely you will discover anything new but it will be “new to you” and the result of your own efforts.

January 8, 2018

Bait Avoidance, Congress, Kaspersky Lab

Filed under: Cybersecurity,Government,Politics,Security — Patrick Durusau @ 2:56 pm

Should you use that USB key you found? by Jeffrey Esposito.

Here is a scenario for you: You are walking around, catching Pokémon, getting fresh air, people-watching, taking Fido out to do his business, when something catches your eye. It’s a USB stick, and it’s just sitting there in the middle of the sidewalk.

Jackpot! Christmas morning! (A very small) lottery win! So, now the question is, what is on the device? Spring Break photos? Evil plans to rule the world? Some college kid’s homework? You can’t know unless…

Esposito details an experiment in which USB keys were left around the University of Illinois, and 48% of them ended up plugged into computers.

Reports like this from Kaspersky Lab, given the interest in Kaspersky by Congress, could lead to what the pest control industry calls “bait avoidance.”

Imagine members of Congress or their staffs not stuffing random USB keys into their computers. This warning from Kaspersky could poison the well for everyone.

For what it’s worth, salting the halls and offices of Congress with new-release music and movies on USB keys may help develop and maintain insecure USB practices. Countering bait avoidance is everyone’s responsibility.

December 27, 2017

From the Valley of Disinformation Rode the 770 – Opportunity Knocks

Filed under: Cybersecurity,Environment,Government,Government Data,Journalism,Reporting — Patrick Durusau @ 10:32 am

More than 700 employees have left the EPA since Scott Pruitt took over by Natasha Geiling.

From the post:

Since Environmental Protection Agency Administrator Scott Pruitt took over the top job at the agency in March, more than 700 employees have either retired, taken voluntary buyouts, or quit, signaling the second-highest exodus of employees from the agency in nearly a decade.

According to agency documents and federal employment statistics, 770 EPA employees departed the agency between April and December, leaving employment levels close to Reagan-era levels of staffing. According to the EPA’s contingency shutdown plan for December, the agency currently has 14,449 employees on board — a marked change from the April contingency plan, which showed a staff of 15,219.

These departures offer journalists a rare opportunity to bleed the government like a stuck pig. From untimely remission of login credentials to acceptance of spear phishing emails, opportunities abound.

Not for “reach it to me” journalists, who use sources as shields from potential criminal liability while their colleagues are imprisoned for the simple act of publication or murdered (42 as of today in 2017).

Governments have not, are not and will not act in the public interest. Laws that criminalize acquisition of data or documents are a continuation of their failure to act in the public interest.

Journalists who serve the public interest, by exposing the government’s failure to do so, should use any means at their disposal to obtain data and documents that evidence government failure and misconduct.

Are you a journalist serving the public interest or a “reach it to me” journalist, serving the public interest when there’s no threat to you?

December 16, 2017

Russians? Nation State? Dorm Room? Mirai Botnet Facts

Filed under: Cybersecurity,Government,Journalism,News,Reporting — Patrick Durusau @ 3:40 pm

How a Dorm Room Minecraft Scam Brought Down the Internet by Garett M. Graff.

From the post:

The most dramatic cybersecurity story of 2016 came to a quiet conclusion Friday in an Anchorage courtroom, as three young American computer savants pleaded guilty to masterminding an unprecedented botnet—powered by unsecured internet-of-things devices like security cameras and wireless routers—that unleashed sweeping attacks on key internet services around the globe last fall. What drove them wasn’t anarchist politics or shadowy ties to a nation-state. It was Minecraft.

Graff’s account is mandatory reading for:

  • Hackers who want to avoid discovery by the FBI
  • Journalists who want to avoid false and/or misleading claims about cyberattacks
  • Manufacturers who want to avoid producing insecure devices (a very small number)
  • Readers who are interested in how the Mirai botnet hype played out

Enjoy!

December 14, 2017

98% Fail Rate on Privileged Accounts – Transparency in 2018

Filed under: Cybersecurity,Government,Government Data,Security,Transparency — Patrick Durusau @ 9:55 am

Half of companies fail to tell customers about data breaches, claims study by Nicholas Fearn.

From the post:

Half of organisations don’t bother telling customers when their personal information might have been compromised following a cyber attack, according to a new study.

The latest survey from security firm CyberArk comes with the full implementation of the European Union General Data Protection Regulation (GDPR) just months away.

Organisations that fail to notify the relevant data protection authorities of a breach within 72 hours of finding it can expect to face crippling fines of up to four per cent of turnover – with companies trying to hide breaches likely to be hit with the biggest punishments.

The findings have been published in the second iteration the CyberArk Global Advanced Threat Landscape Report 2018, which explores business leaders’ attitudes towards IT security and data protection.

The survey found that, overall, security “does not translate into accountability”. Some 46 per cent of organisations struggle to stop every attempt to breach their IT infrastructure.

And 63 per cent of business leaders acknowledge that their companies are vulnerable to attacks, such as phishing. Despite this concern, 49 per cent of organisations don’t have the right knowledge about security policies.

You can download the report cited in Fearn’s post at: Cyberark Global Advanced Threat Landscape Report 2018: The Business View of Security.

If you think that report has implications for involuntary/inadvertent transparency, Cyberark Global Advanced Threat Landscape Report 2018: Focus on DevOps, reports this gem:


It’s not just that businesses underestimate threats. As noted above, they also do not seem to fully understand where privileged accounts and secrets exist. When asked which IT environments and devices contain privileged accounts and secrets, responses (IT decision maker and DevOps/app developer respondents) were at odds with the claim that most businesses have implemented a privileged account security solution. A massive 98% did not select at least one of the ‘containers’, ‘microservices’, ‘CI/CD tools’, ‘cloud environments’ or ‘source code repositories’ options. At the risk of repetition, privileged accounts and secrets are stored in all of these entities.

A fail rate of 98% on identifying “privileged accounts and secrets?”

Reports like this make you wonder about the clamor for transparency of organizations and governments. Why bother?

Information in 2018 is kept secure by a lack of interest in collecting it.

Remember that for your next transparency discussion.

December 12, 2017

AI-Assisted Fake Porn Is Here… [Endless Possibilities]

Filed under: Artificial Intelligence,Government,Politics,Porn — Patrick Durusau @ 5:06 pm

AI-Assisted Fake Porn Is Here and We’re All Fucked by Samantha Cole.

From the post:

Someone used an algorithm to paste the face of ‘Wonder Woman’ star Gal Gadot onto a porn video, and the implications are terrifying.

There’s a video of Gal Gadot having sex with her stepbrother on the internet. But it’s not really Gadot’s body, and it’s barely her own face. It’s an approximation, face-swapped to look like she’s performing in an existing incest-themed porn video.

The video was created with a machine learning algorithm, using easily accessible materials and open-source code that anyone with a working knowledge of deep learning algorithms could put together.

It’s not going to fool anyone who looks closely. Sometimes the face doesn’t track correctly and there’s an uncanny valley effect at play, but at a glance it seems believable. It’s especially striking considering that it’s allegedly the work of one person—a Redditor who goes by the name ‘deepfakes’—not a big special effects studio that can digitally recreate a young Princess Leia in Rogue One using CGI. Instead, deepfakes uses open-source machine learning tools like TensorFlow, which Google makes freely available to researchers, graduate students, and anyone with an interest in machine learning.
… (emphasis in original)

Posts and tweets lamenting “fake porn” abound but where others see terrifying implications, I see boundless potential.

Spoiler: The nay-sayers are on the wrong side of history – The Erotic Engine: How Pornography has Powered Mass Communication, from Gutenberg to Google by Patchen Barss.

or,


“The industry has convincingly demonstrated that consumers are willing to shop online and are willing to use credit cards to make purchases,” said Frederick Lane in “Obscene Profits: The Entrepreneurs of Pornography in the Cyber Age.” “In the process, the porn industry has served as a model for a variety of online sales mechanisms, including monthly site fees, the provision of extensive free material as a lure to site visitors, and the concept of upselling (selling related services to people once they have joined a site). In myriad ways, large and small, the porn industry has blazed a commercial path that other industries are hastening to follow.”
… (PORN: The Hidden Engine That Drives Innovation In Tech)

Enough time remains before the 2018 mid-terms for you to learn the technology used by ‘deepfakes’ to produce campaign imagery.

Paul Ryan, current Speaker of the House, isn’t going to (voluntarily) participate in a video where he steals food from children or steps on their hands as they grab for bread crusts in the street.

The same techniques that produce fake porn could be used to produce viral videos of those very scenes and more.

Some people, well-intentioned no doubt, will protest that isn’t informing the electorate and debating the issues. For them I have only one question: Why do you like losing so much?

I would wager one good viral video against 100,000 pages of position papers, unread by anyone other than the tiresome drones who produce them.

If you insist on total authenticity, then take Ryan film clips on why medical care can’t be provided for children and run them split-screen with close-up death rattles of dying children. 100% truthful. See how that plays in your local TV market.

Follow ‘deepfakes’ on Reddit and start experimenting today!

December 10, 2017

Releasing Failed Code to Distract from Accountability

Filed under: Government,Open Source,Programming,Project Management — Patrick Durusau @ 11:16 am

Dutch government publishes large project as Free Software by Carmen Bianca Bakker.

From the post:

The Dutch Ministry of the Interior and Kingdom Relations released the source code and documentation of Basisregistratie Personen (BRP), a 100M€ IT system that registers information about inhabitants within the Netherlands. This comes as a great success for Public Code, and the FSFE applauds the Dutch government’s shift to Free Software.

Operation BRP is an IT project by the Dutch government that has been in the works since 2004. It has cost Dutch taxpayers upwards of 100 million Euros and has endured three failed attempts at revival, without anything to show for it. From the outside, it was unclear what exactly was costing taxpayers so much money with very little information to go on. After the plug had been pulled from the project earlier this year in July, the former interior minister agreed to publish the source code under pressure of Parliament, to offer transparency about the failed project. Secretary of state Knops has now gone beyond that promise and released the source code as Free Software (a.k.a. Open Source Software) to the public.

In 2013, when the first smoke signals showed, the former interior minister initially wanted to address concerns about the project by providing limited parts of the source code to a limited amount of people under certain restrictive conditions. The ministry has since made a complete about-face, releasing a snapshot of the (allegedly) full source code and documentation under the terms of the GNU Affero General Public License, with the development history soon to follow.

As far as the “…complete about-face…,” the American expression is: “You’ve been had.”

By appearing to agonize over the release of the source code, the “former interior minister” has made it appear the public has won a great victory for transparency.

Actually not.

Does the “transparency” offered by the source code show who authorized the expenditure of each part of the 100M€ total and who was paid that 100M€? Does source code “transparency” disclose project management decisions and who, in terms of government officials, approved those project decisions? For that matter, does source code “transparency” disclose discussions of project choices at all and who was present at those discussions?

It’s not hard to see that source code “transparency” is a deliberate failure on the part of the Dutch Ministry of the Interior and Kingdom Relations to be transparent. It has withheld, quite deliberately, any information that would enable Dutch citizens, programmers or otherwise, to have informed opinions about the failure of this project. Or to hold anyone accountable for its failure.

This may be:

…an unprecedented move of transparency by the Dutch government….

but only if the Dutch government is a black hole in terms of meaningful accountability for its software projects.

Which appears to be the case.

PS: Assuming Dutch citizens can pry project documentation out of the secretive Dutch Ministry of the Interior and Kingdom Relations, I know some Dutch topic mappers could assist with establishing transparency. If that’s what you want.

December 9, 2017

Apache Kafka: Online Talk Series [Non-registration for 5 out of 6]

Filed under: Cybersecurity,ETL,Government,Kafka,Streams — Patrick Durusau @ 2:35 pm

Apache Kafka: Online Talk Series

From the webpage:

Watch this six-part series of online talks presented by Kafka experts. You will learn the key considerations in building a scalable platform for real-time stream data processing, with Apache Kafka at its core.

This series is targeted to those who want to understand all the foundational concepts behind Apache Kafka, streaming data, and real-time processing on streams. The sequence begins with an introduction to Kafka, the popular streaming engine used by many large scale data environments, and continues all the way through to key production planning, architectural and operational methods to consider.

Whether you’re just getting started or have already built stream processing applications for critical business functions, you will find actionable tips and deep insights that will help your enterprise further derive important business value from your data systems.

Video titles:

1. Introduction To Streaming Data and Stream Processing with Apache Kafka, Jay Kreps, Confluent CEO and Co-founder, Apache Kafka Co-creator.

2. Deep Dive into Apache Kafka by Jun Rao, Confluent Co-founder, Apache Kafka Co-creator.

3. Data Integration with Apache Kafka by David Tucker, Director, Partner Engineering and Alliances.

4. Demystifying Stream Processing with Apache Kafka, Neha Narkhede, Confluent CTO and Co-Founder, Apache Kafka Co-creator.

5. A Practical Guide to Selecting a Stream Processing Technology by Michael Noll, Product Manager, Confluent.

6. Streaming in Practice: Putting Kafka in Production by Roger Hoover, Engineer, Confluent. (Registration required. Anyone know a non-registration version of Hoover’s presentation?)

I was able to find versions of the first five videos that don’t require you to register to view them.

I make it a practice to dodge marketing department registrations whenever possible.

You?

Zero Days, Thousands of Nights [Zero-day – 6.9 Year Average Life Expectancy]

Filed under: Cybersecurity,Government,Security,Transparency — Patrick Durusau @ 11:41 am

Zero Days, Thousands of Nights – The Life and Times of Zero-Day Vulnerabilities and Their Exploits by Lillian Ablon, Timothy Bogart.

From the post:

Zero-day vulnerabilities — software vulnerabilities for which no patch or fix has been publicly released — and their exploits are useful in cyber operations — whether by criminals, militaries, or governments — as well as in defensive and academic settings.

This report provides findings from real-world zero-day vulnerability and exploit data that could augment conventional proxy examples and expert opinion, complement current efforts to create a framework for deciding whether to disclose or retain a cache of zero-day vulnerabilities and exploits, inform ongoing policy debates regarding stockpiling and vulnerability disclosure, and add extra context for those examining the implications and resulting liability of attacks and data breaches for U.S. consumers, companies, insurers, and for the civil justice system broadly.

The authors provide insights about the zero-day vulnerability research and exploit development industry; give information on what proportion of zero-day vulnerabilities are alive (undisclosed), dead (known), or somewhere in between; and establish some baseline metrics regarding the average lifespan of zero-day vulnerabilities, the likelihood of another party discovering a vulnerability within a given time period, and the time and costs involved in developing an exploit for a zero-day vulnerability.

Longevity and Discovery by Others

  • Zero-day exploits and their underlying vulnerabilities have a rather long average life expectancy (6.9 years). Only 25 percent of vulnerabilities do not survive to 1.51 years, and only 25 percent live more than 9.5 years.
  • No vulnerability characteristics indicated a long or short life; however, future analyses may want to examine Linux versus other platform types, the similarity of open and closed source code, and exploit class type.
  • For a given stockpile of zero-day vulnerabilities, after a year, approximately 5.7 percent have been publicly discovered and disclosed by another entity.

RAND researchers Ablon and Bogart attempt to interject facts into the debate over stockpiling zero-day vulnerabilities. It’s a great read, even though I doubt policy decisions over zero-day stockpiling will be fact-driven.

As an advocate of inadvertent or involuntary transparency (is there any other honest kind?), I take heart from the 6.9 year average life expectancy of zero-day exploits.

Researchers should take encouragement from the finding that within a given year, only 5.7 percent of all zero-day vulnerability discoveries overlap. That is, 94.3% of zero-day discoveries are unique. That indicates to me vulnerabilities are left undiscovered every year.
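For anyone who wants to play with that arithmetic, here is a minimal Python sketch. The 5.7 percent figure is the one-year number from the RAND findings quoted above. Applying the same independent rate year after year is purely my assumption for illustration, not something the report claims, and the function name is hypothetical.

```python
# One-year rediscovery rate taken from the RAND finding quoted above.
ANNUAL_REDISCOVERY = 0.057


def undisclosed_fraction(years, rate=ANNUAL_REDISCOVERY):
    """Fraction of a zero-day stockpile not yet found by another party.

    Assumption (hypothetical): the same rate applies independently
    every year, so the surviving fraction simply compounds.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 - rate) ** years


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, round(undisclosed_fraction(n), 3))
```

Under that assumption, one year leaves 94.3% of a stockpile unique to its holder. The longer horizons are only as good as the constant-rate guess.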

Voluntary transparency, like presidential press conferences, is an opportunity to shape and manipulate your opinions. Zero-day vulnerabilities, on the other hand, can empower honest/involuntary transparency.

Won’t you help?

Shopping for the Intelligence Community (IC) [Needl]

Filed under: Government,Intelligence — Patrick Durusau @ 10:54 am

The holiday season in various traditions has arrived for 2018!

With it returns the vexing question: What to get for the Intelligence Community (IC)?

They have spent all year violating your privacy, undermining legitimate government institutions, supporting illegitimate governments, mocking any notion of human rights and siphoning government resources that could benefit the public for themselves and their contractors.

The excesses of your government’s intelligence agencies will be special to you but in truth, they are all equally loathsome and merit some acknowledgement at this special time of the year.

Needl is a gift for the intelligence community this holiday season and one that can keep on giving all year long.

Take back your privacy. Lose yourself in the haystack.

Your ISP is most likely tracking your browsing habits and selling them to marketing agencies (albeit anonymised). Or worse, making your browsing history available to law enforcement at the hint of a Subpoena. Needl will generate random Internet traffic in an attempt to conceal your legitimate traffic, essentially making your data the Needle in the haystack and thus harder to find. The goal is to make it harder for your ISP, government, etc to track your browsing history and habits.

…(graphic omitted)

Implemented modules:

  • Google: generates a random search string, searches Google and clicks on a random result.
  • Alexa: visits a website from the Alexa Top 1 Million list. (warning: contains a lot of porn websites)
  • Twitter: generates a popular English name and visits their profile; performs random keyword searches
  • DNS: produces random DNS queries from the Alexa Top 1 Million list.
  • Spotify: random searches for Spotify artists

Module ideas:

  • WhatsApp
  • Facebook Messenger

… (emphasis in original)

Not for people with metered access but otherwise, a must for home PCs and enterprise PC farms.
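To make the DNS module idea concrete, here is a rough Python sketch of the scheduling half of such a tool. It is not Needl’s code. The function name, the `max_gap_seconds` parameter and the tiny domain list are all hypothetical, and the sketch only plans lookups so that it runs offline.

```python
import random


def plan_decoy_lookups(domains, count, max_gap_seconds=30.0, seed=None):
    """Return `count` (delay_seconds, domain) pairs picked at random.

    Hypothetical helper: Needl reads from the Alexa Top 1 Million list;
    here `domains` is whatever list the caller supplies.
    """
    if not domains:
        raise ValueError("domains must not be empty")
    rng = random.Random(seed)  # a seed makes a plan repeatable for testing
    return [
        (rng.uniform(0.0, max_gap_seconds), rng.choice(domains))
        for _ in range(count)
    ]


if __name__ == "__main__":
    sample = ["example.com", "example.org", "example.net"]
    for delay, domain in plan_decoy_lookups(sample, 5, seed=1):
        print(f"wait {delay:5.1f}s, then look up {domain}")
```

Each planned entry could then be issued with a sleep followed by `socket.gethostbyname(domain)` from the standard library. I left that step out so nothing touches the network until you decide it should.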

No doubt annoying but running Needl through Tor, with a list of trigger words/phrases, searches for explosives, viruses, CBW topics with locations, etc. would create festive blinking red lights for the intelligence community.

December 6, 2017

Champing at the Cyberbit [Shouldn’t that be: Chomping on Cyberbit?]

Filed under: Cybersecurity,Government,Politics — Patrick Durusau @ 5:10 pm

Champing at the Cyberbit: Ethiopian Dissidents Targeted with New Commercial Spyware by Bill Marczak, Geoffrey Alexander, Sarah McKune, John Scott-Railton, and Ron Deibert.

From the post:

Key Findings

  • This report describes how Ethiopian dissidents in the US, UK, and other countries were targeted with emails containing sophisticated commercial spyware posing as Adobe Flash updates and PDF plugins. Targets include a US-based Ethiopian diaspora media outlet, the Oromia Media Network (OMN), a PhD student, and a lawyer. During the course of our investigation, one of the authors of this report was also targeted.
  • We found a public logfile on the spyware’s command and control server and monitored this logfile over the course of more than a year. We saw the spyware’s operators connecting from Ethiopia, and infected computers connecting from IP addresses in 20 countries, including IP addresses we traced to Eritrean companies and government agencies.
  • Our analysis of the spyware indicates it is a product known as PC Surveillance System (PSS), a commercial spyware product with a novel exploit-free architecture. PSS is offered by Cyberbit — an Israel-based cyber security company that is a wholly-owned subsidiary of Elbit Systems — and marketed to intelligence and law enforcement agencies.
  • We conducted Internet scanning to find other servers associated with PSS and found several servers that appear to be operated by Cyberbit themselves. The public logfiles on these servers seem to have tracked Cyberbit employees as they carried infected laptops around the world, apparently providing demonstrations of PSS to the Royal Thai Army, Uzbekistan’s National Security Service, Zambia’s Financial Intelligence Centre, the Philippine President’s Malacañang Palace, ISS World Europe 2017 in Prague, and Milipol 2017 in Paris. Cyberbit also appears to have provided other demos of PSS in France, Vietnam, Kazakhstan, Rwanda, Serbia, and Nigeria.

Detailed research and reporting, the likes of which are absent from reporting about election-year “hacks” in the United States.

Despite the excellence of reporting in this post, I find it disappointing that Citizen Lab sees this as an occasion for raising legal and regulatory issues. Especially in light of the last substantive paragraph noting:

As we explore in a separate analysis, while lawful access and intercept tools have legitimate uses, the significant insecurities and illegitimate targeting we have documented that arise from their abuse cannot be ignored. In the absence of stronger norms and incentives to induce state restraint, as well as more robust regulation of spyware companies, we expect that authoritarian and other politically corrupt leaders will continue to obtain and use spyware to covertly surveil and invisibly sabotage the individuals and institutions that hold them to account.

Exposing the abuse of peaceful citizens by their governments is a powerful tool but for me, it falls far short of holding them to account. I have always thought being “held to account” meant there were negative consequences associated with undesirable behavior.

Do you know of any examples of governments holding Cyberbit or similar entities accountable?

I am aware that the U.S. Congress has from time to time passed legislation “regulating the CIA” and other agencies, all of which was ignored by the regulated agencies. That doesn’t sound like accountability to me.

You?

PS: Despite my disagreement on the call for action, this is a great example of how to provide credible details about malicious cyberactivity. Would that members of the IC would read it and take it to heart.

December 5, 2017

Tabula: Extracting A Hit (sorry) Security List From PDF Report

Filed under: Cybersecurity,Extraction,Government,PDF,Security — Patrick Durusau @ 11:44 am

Benchmarking U.S. Government Websites by Daniel Castro, Galia Nurko, and Alan McQuinn, provides a quick assessment of 468 of the most popular federal websites for “…page-load speed, mobile friendliness, security, and accessibility.”

Unfortunately, it has an ugly table layout:

Double column listings with the same headers?

There are 476 results on Stackoverflow this morning for extracting tables from PDF.

However, I need a one-cup-of-coffee, maybe two-cups-of-coffee, answer to extracting data from these tables.

Enter Tabula.

If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there’s no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux.

Tabula is easy to use: download, extract, start it, point your web browser to http://localhost:8080 (or http://127.0.0.1:8080), load your PDF file, select the table, and export the content.

I tried selecting the columns separately (one page at a time) but then used table recognition and selected the entirety of Table 6 (security evaluation). I don’t think it made any difference in the errors I was seeing in the result (dropping the first letter of site domains), but check with your data.

Warning: For some unknown reason, possibly a defect in the PDF and/or Tabula, the leading character from the second domain field was dropped on some entries. Not all, not consistently, but it was dropped. Not to mention missing the last line of entries on a couple of pages. Proofing is required!

Not to mention there were other recognition issues.

Capture wasn’t perfect due to underlying differences in the PDF:

cancer.gov,100,901,fdic.gov,100,"3,284"
weather.gov,100,904,blm.gov,100,"3,307"
transportation.gov,,,100,,,"3,340",,,ecreation.gov,,,100,,,"9,012",
"regulations.gov1003,390data.gov1009,103",,,,,,,,,,,,,,,,
nga.gov,,,100,,,"3,462",,,irstgov.gov,,,100,,,"9,112",
"nrel.gov1003,623nationalservice.gov1009,127",,,,,,,,,,,,,,,,
hrsa.gov,,,100,,,"3,635",,,topbullying.gov,,,100,,,"9,285",
"consumerfinance.gov1004,144section508.gov1009,391",,,,,,,,,,,,,,,,

With proofing, we are way beyond two cups of coffee but once proofed, I tossed it into Calc and produced a single column CSV file: 2017-Benchmarking-US-Government-Websites-Security-Table-6.csv.
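Part of that proofing could be scripted. What follows is a minimal Python sketch, not what I actually did, that folds the three row shapes seen in the capture above into one record per site. The function name and the regular expression are my own assumptions. I also treat the second and third fields as opaque strings, since the capture alone doesn’t say what they mean.

```python
import csv
import io
import re

# Fused cells look like "regulations.gov1003,390data.gov1009,103".
# Heuristic (assumption): the middle number is 1-3 digits and the last
# number has no leading zero and uses a thousands separator. When the
# middle number is not 100 the split can be ambiguous, so still proof it.
FUSED = re.compile(r"([a-z0-9.-]+?\.gov)(\d{1,3})([1-9]\d{0,2}(?:,\d{3})+)")

SAMPLE = '''cancer.gov,100,901,fdic.gov,100,"3,284"
transportation.gov,,,100,,,"3,340",,,ecreation.gov,,,100,,,"9,012",
"regulations.gov1003,390data.gov1009,103",,,,,,,,,,,,,,,,
'''


def normalize_rows(raw_csv):
    """Turn double-column Tabula rows into (domain, second, third) tuples."""
    records = []
    for row in csv.reader(io.StringIO(raw_csv)):
        cells = [c.strip() for c in row if c.strip()]  # drop padding cells
        if len(cells) == 1:
            # Whole row collapsed into one cell: split it with the regex.
            records.extend(FUSED.findall(cells[0]))
        elif len(cells) % 3 == 0:
            # Clean or padded row: regroup the values three at a time.
            records.extend(tuple(cells[i:i + 3]) for i in range(0, len(cells), 3))
        else:
            # Anything else is left for manual proofing.
            records.append(("UNPARSED", ",".join(row), ""))
    return records


if __name__ == "__main__":
    for record in normalize_rows(SAMPLE):
        print(record)
```

Note what it does not fix: a domain that lost its leading character (ecreation.gov) comes through exactly as captured, so checking against the PDF is still required.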

Enjoy!

PS: I discovered a LibreOffice Calc “gotcha” in this exercise. If you select a column from the top and attempt to paste it under an existing column (same or different spreadsheet), you get the error message: “There is not enough room on the sheet to insert here.”

When you select a column from the top, it copies all the blank cells in that column so there truly isn’t sufficient space to paste it under another column. Tip: Always copy columns in Calc from the bottom of the column up.
