Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 13, 2017

Designing a Business Card in LaTeX (For Your New Alt-Identities)

Filed under: Cybersecurity,Security,TeX/LaTeX — Patrick Durusau @ 9:19 pm

Designing a Business Card in LaTeX by Olivier Pieters

From the post:

In 2017, I will graduate from Ghent University. This means starting a professional career, either in academia or in industry. One of the first things that came to mind was that I needed a good curriculum vitæ, and a business card. I already have the former, but I still needed a business card. Consequently, I looked a bit online and was not all that impressed by the tools people used to design them. I did not want to change some template everybody’s using, but do my own thing. And suddenly, I realised: what better tool than LaTeX to make it!

I know, I already hear some saying “why not use the online tools?” or “Photoshop?”. I picked LaTeX because I want to have a platform independent implementation and because why not? I really like making LaTeX documents, so this seemed like something other than creating long documents.

So, how are we going to create it? First, we’ll make a template for the front and back sides. Then, we will modify this to our needs and have a perfectly formatted and aligned business card.

One of the few fun tasks in the creation of an alternative identity should be the creation of a new business card.

Olivier’s post gets you started on the LaTeX side, although an eye-catching design is on you.

It’s too late for some of us to establish convincing alternative identities.

On the other hand, alternative identities should be established for children before they are twelve or so, complete with interlocking financial, social, digital, and other records for each one.

It doesn’t make you a bad parent if you haven’t done so, but a verifiable alternative identity could be priceless in an uncertain world.

Do You Feel Chilled? W3C and DRM

Filed under: DRM,Intellectual Property (IP),W3C — Patrick Durusau @ 8:56 pm

Indefensible: the W3C says companies should get to decide when and how security researchers reveal defects in browsers by Cory Doctorow.

From the post:

The World Wide Web Consortium has just signaled its intention to deliberately create legal jeopardy for security researchers who reveal defects in its members’ products, unless the security researchers get the approval of its members prior to revealing the embarrassing mistakes those members have made in creating their products. It’s a move that will put literally billions of people at risk as researchers are chilled from investigating and publishing on browsers that follow W3C standards.

It is indefensible.

I enjoy Cory’s postings and fiction, but I had to read this one more than once to grasp the nature of his complaint.

As I understand it, the argument runs something like this:

1. The W3C is creating a “…standardized DRM system for video on the World Wide Web….”

2. Participants in the W3C process must “…surrender the right to invoke their patents in lawsuits as a condition of participating in the W3C process….” (The keyword here is participants. No non-participant waives their patent rights as a result of W3C policy.)

3. The W3C isn’t requiring waiver of DMCA 1201 rights as a condition for participating in the video DRM work.

All true, but I don’t see how Cory gets to the conclusion:

…deliberately create legal jeopardy for security researchers who reveal defects in its members’ products, unless the security researchers get the approval of its members prior to revealing the embarrassing mistakes those members have made in creating their products.

Whether or not the W3C requires participants in the DRM system for video to waive DMCA 1201 rights, the W3C process has no impact on non-participants in that process.

Secondly, security researchers are in jeopardy if and only if they incriminate themselves when publishing defects in DRM products. As security researchers, they are capable of anonymously publishing any security defects they find.

Third, legal liability flows from statutory law and not the presence or absence of consensual agreement among a group of vendors. Private agreements can only protect you from those agreeing.

I don’t support DRM and never have. Personally, I think it is a scam and a tax on content creators. It’s unfortunate that the fear that someone, somewhere might not be paying full rate is enough for content creators to tax themselves with DRM schemes and software, none of which are free.

Rather than arguing about W3C policy, why not point to the years of wasted effort and expense by content creators on DRM? With no measurable return. That’s a plain ROI question.

DRM software vendors know the pot of gold content creators are chasing is at the end of an ever receding rainbow. In fact, they’re counting on it.

February 12, 2017

Oxford Dictionaries Thesaurus Data – XQuery

Filed under: XML,XQuery — Patrick Durusau @ 8:50 pm

Retrieve Oxford Dictionaries API Thesaurus Data as XML with XQuery and BaseX by Adam Steffanick.

From the post:

We retrieved thesaurus data from the Oxford Dictionaries application programming interface (API) and returned Extensible Markup Language (XML) with XQuery, an XML query language, and BaseX, an XML database engine and XQuery processor. This tutorial illustrates how to retrieve thesaurus data—synonyms and antonyms—as XML from the Oxford Dictionaries API with XQuery and BaseX.

The Oxford Dictionaries API returns JavaScript Object Notation (JSON) responses that yield undesired XML structures when converted automatically with BaseX. Fortunately, we’re able to use XQuery to fill in some blanks after converting JSON to XML. My GitHub repository od-api-xquery contains XQuery code for this tutorial.
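Not part of Adam’s tutorial, but for a rough sense of the same round trip outside BaseX, here is a minimal Python sketch. The endpoint pattern and header names are my assumptions from the API documentation of the time, and the credentials are placeholders, so treat it as an outline rather than working code against the current API:

    import requests
    import xml.etree.ElementTree as ET

    APP_ID = "your-app-id"    # placeholder credentials (assumption)
    APP_KEY = "your-app-key"

    def fetch_thesaurus(word, lang="en"):
        # Assumed v1 endpoint for synonyms and antonyms; verify against the docs.
        url = (f"https://od-api.oxforddictionaries.com/api/v1/entries/"
               f"{lang}/{word}/synonyms;antonyms")
        resp = requests.get(url, headers={"app_id": APP_ID, "app_key": APP_KEY})
        resp.raise_for_status()
        return resp.json()

    def json_to_xml(name, value):
        # Recursively build an XML element from parsed JSON. A real converter
        # must also sanitize JSON keys into legal XML names -- exactly the kind
        # of "blank" the tutorial fills in with XQuery.
        elem = ET.Element(name)
        if isinstance(value, dict):
            for key, val in value.items():
                elem.append(json_to_xml(key, val))
        elif isinstance(value, list):
            for item in value:
                elem.append(json_to_xml("item", item))
        else:
            elem.text = str(value)
        return elem

    if __name__ == "__main__":
        data = fetch_thesaurus("ace")
        print(ET.tostring(json_to_xml("thesaurus", data), encoding="unicode"))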

If you are having trouble staying at your computer during this unreasonably warm spring, this XQuery/Oxford Dictionary tutorial may help!

Ok, maybe that is an exaggeration but only a slight one. 😉

Enjoy!

February 10, 2017

Objective-See – OS X Malware Research

Filed under: Cybersecurity,OS X — Patrick Durusau @ 11:56 am

Objective-See

Patrick Wardle‘s OS X security site described as:

As Macs become more prevelant, so does OS X malware. Unfortunately, current Mac security and anti-virus software is fairly trivial to generically bypass.

Objective-See was created to provide simple, yet effective OS X security tools. Always free of charge – no strings attached!

I don’t see news about OS X malware very often, but following @patrickwardle and the authors featured there will cure that problem.

Macs may be popular in the current regime in Washington, at least among those not using:

[Image: Etch A Sketch]

😉

BTW, since an Etch-a-Sketch uses aluminum powder, has anyone checked the concealment properties of an Etch-a-Sketch?

That is, would the aluminum powder block scanners and, if so, how well?

Asking for a friend of course! 😉

PS: In case you need an Etch-A-Sketch for research purposes, http://etchasketch.com/.

As the aluminum powder is removed by the stylus, blocking of EMF would go down, making me wonder about online drawing games for the Etch-A-Sketch that would have the user removing the EMF barrier in close proximity to the computer.

Extracting any information would be a challenge but then releasing viruses in the wild to attack secure nuclear facilities relies on luck as well.

Macs Gaining Market Share? – First Mac Word Macro Malware Spotted In Wild

Filed under: Cybersecurity — Patrick Durusau @ 11:23 am

Watch Out! First-Ever Word Macro Malware for Apple Mac OS Discovered in the Wild by Swati Khandelwal.

From the post:


Denying permission can save you, but if enabled ignoring warnings, the embedded macro executes a function, coded in Python, that downloads the malware payload to infect the Mac PCs, allowing hackers to monitor webcams, access browser history logs, and steal password and encryption keys.

According to a blog post published this week by Patrick Wardle, director of research at security firm Synack, the Python function is virtually identical to EmPyre – an open source Mac and Linux post-exploitation agent.

“It’s kind of a low-tech solution, but on one hand it’s abusing legitimate functionality so it’s not going to crash like a memory corruption or overflow might, and it’s not going to be patched out,” said Wardle.

Wardle tracked the IP address from which the malicious Word documents were spread to Russia and that IP has previously been associated with malicious activities like phishing attacks.

Granted, this isn’t on the same level of technology as the in-memory viruses I mentioned yesterday, but an attack vector that exploits human error and isn’t going to be ‘patched’ out is a good find.

With the present Republican regime in the United States, human error may be all that is necessary to peel government IT like an orange.

Besides, it isn’t the sophistication of the attack that counts (outside of BlackHat conferences) but the results you obtain without getting caught.

Yes?

February 9, 2017

Fast and Flexible Query Analysis at MapD with Apache Calcite [Merging Data?]

Filed under: Apache Calcite,MapD,Query Rewriting,SQL — Patrick Durusau @ 8:30 pm

Fast and Flexible Query Analysis at MapD with Apache Calcite by Alex Şuhan.

From the post:

After evaluating a few other options, we decided for Apache Calcite, an incubation stage project at the time. It takes SQL queries and generates extended relational algebra, using a highly configurable cost-based optimizer. Several projects use Calcite already for SQL parsing and query optimization.

One of the main strengths of Calcite is its highly modular structure, which allows for multiple integration points and creative uses. It offers a relational algebra builder, which makes moving to a different SQL parser (or adding a non-SQL frontend) feasible.

In our product, we need runtime functions which are not recognized by Calcite by default. For example, trigonometric functions are necessary for on-the-fly geo projections used for point map rendering. Fortunately, Calcite allows specifying such functions and they become first-class citizens, with proper type checking in place.

Calcite also includes a highly capable and flexible cost-based optimizer, which can apply high-level transformations to the relational algebra based on query patterns and statistics. For example, it can push part of a filter through a join in order to reduce the size of the input, like the following figure shows:

[Image: pushing a filter through a join]

You can find this example and more about the cost-based optimizer in Calcite in this presentation on using it in the Apache Phoenix project. Such optimizations complement the low-level optimizations we do ourselves to achieve great speed improvements.

Relational algebra example
Let’s take a simple query: SELECT A.x, COUNT(*) FROM A JOIN B ON A.x = B.x WHERE A.y > 41 GROUP BY A.x; and analyze the relational algebra generated for it.

In Calcite relational algebra, there are a few main node types, corresponding to the theoretical extended relational algebra model: Scan, Filter, Project, Aggregate and Join. Each type of node, except Scan, has one or more (in the case of Join) inputs and its output can become the input of another node. The graph of nodes connected by data flow relationships is a
directed acyclic graph (abbreviated as “DAG”). For our query, Calcite outputs the following DAG:

[Image: relational algebra DAG for the example query]

The Scan nodes have no inputs and output all the rows and the columns in tables A and B, respectively. The Join node specifies the join condition (in our case A.x = B.x) and its output contains the columns in A and B concatenated. The Filter node only allows the rows which pass the specified condition and its output preserves all columns of input. The Project node only preserves the specified expressions as columns in the output. Finally, the Aggregate specifies the group by expressions and aggregates.

The physical implementation of the nodes is up to the system using Calcite as a frontend. Nothing in the Join node mandates a certain implementation of the join operation (equijoin in our case). Indeed, using a condition which can’t be implemented as a hash join, like A.x < B.x, would only be reflected by the condition in the Filter node.
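To make that node structure concrete, here is a toy rendering in plain Python. Calcite itself is a Java library, so this is only an illustration of the DAG described above, not Calcite’s API:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        kind: str                      # Scan, Join, Filter, Project or Aggregate
        detail: str = ""
        inputs: List["Node"] = field(default_factory=list)

    scan_a = Node("Scan", "table A")
    scan_b = Node("Scan", "table B")
    join = Node("Join", "A.x = B.x", [scan_a, scan_b])
    filt = Node("Filter", "A.y > 41", [join])
    proj = Node("Project", "A.x", [filt])
    agg = Node("Aggregate", "GROUP BY A.x, COUNT(*)", [proj])

    def show(node, depth=0):
        # Print the DAG from the Aggregate root down to the Scan leaves.
        print("  " * depth + f"{node.kind}({node.detail})")
        for child in node.inputs:
            show(child, depth + 1)

    show(agg)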

You may not be MapD today, but that’s no excuse for poor query performance.

Besides, learning Apache Calcite will increase your attractiveness as data and queries on it become more complex.

I haven’t read all the documentation but the “metadata” in Apache Calcite is as flat as any you will find.

Which means integration of different data sources is either luck of the draw or a matter of having asked someone the “meaning” of the metadata.

The tutorial has this example:

[Image: Apache Calcite tutorial screenshot showing column headers, including GENDER and NAME]

The column header “GENDER” for example appears to presume the common male/female distinction. But without further exploration of the data set, there could be other genders encoded in that field as well.

If “GENDER” seems too easy, what would you say about “NAME,” bearing in mind that Japanese family names are written first and given names written second. How would those appear under “NAME?”

Apologies! My screen shot missed field “S.”

I have utterly no idea what “S” may or may not represent as a field header. Do you?

If the obviousness of field headers fails with “GENDER” and “NAME,” what do you suspect will happen with less “obvious” field headers?

How successful will merging of data be?

Where would you add subject identity information and how would you associate it with data processed by Apache Calcite?

Opening Secure Channels for Confidential Tips [Allocating Risk for Leaks]

Filed under: Cybersecurity,Journalism,News,Reporting,Security — Patrick Durusau @ 5:45 pm

Opening Secure Channels for Confidential Tips by Martin Shelton.

From the post:

In Shields Up, security user researcher Martin Shelton writes about security threats and defenses for journalists. Below, his first installment. —eds

To make it easier for tipsters to share sensitive information, a growing number of news organizations are launching resources for confidential tips. While there is some overlap between the communication channels that each news organization supports, it’s not always clear which channels are the most practical for routine use. This short guide will describe some basics around how to think about security on behalf of your sources before thinking about tools and practices. I’ll also describe common communication channels for accepting sensitive tips and tradeoffs when using each channel. When thinking about tradeoffs, consider which channels are right for you.
… (emphasis in original)

Martin does a great job of surveying your current security options but doesn’t address the allocation of risk between leakers and news organizations that I covered in U.S. Leaking Law: You Go To Jail – I Win A Pulitzer and/or the option of leaking access rather than the risk of leaking data/documents, How-To: Leaking In Two Steps.

Here’s the comment I’m posting to his post and I will report back on his response, probably in a separate post:

Martin, great job on covering the security options for tips and their tradeoffs!

I do have a question though about the current model of leaking, which puts all of the risk on the leaker. A leaker undertakes the burden of liberating data and/or documents, takes the risk of copying/removing them and then the risk of getting them securely to a news organization.

All of which requires technical skills that aren’t common.

As an alternative, why shouldn’t leakers leak access to such networks/servers and enable news organizations, who have greater technical resources, to undertake the risks of retrieval of such documents?

I mentioned this to another news person and they quickly pointed out the dangers of the Computer Fraud and Abuse Act (CFAA) for a news organization but the same holds true for the leaker. Who very likely has fewer technical skills than any news organization.

My thinking is that news organizations can decide to serve the interests of government (follow the CFAA) or they can decide to serve the public interest. In my view, those are not synonymous.

I am still refining ways that leakers could securely leak access but at present, using standard subscription forms with access information instead of identifying properties, offers both a trustworthy target (the news organization) and a multiplicity of places to leak, which prevents effective monitoring of them. I have written more than once about this topic but two of particular interest: U.S. Leaking Law: You Go To Jail – I Win A Pulitzer, and, How-To: Leaking In Two Steps.

Before anyone protests the “ethics” of breaking laws such as the CFAA, recall governments broke faith with their citizens first. Laws like the CFAA are monuments to that breach of faith. Nothing more.

Fileless attacks against enterprise networks

Filed under: Cybersecurity,Security — Patrick Durusau @ 4:42 pm

Kaspersky Lab reports in Fileless attacks against enterprise networks the discovery of malware that hides in memory to avoid detection.

Its summary:

During incident response, a team of security specialists needs to follow the artefacts that attackers have left in the network. Artefacts are stored in logs, memories and hard drives. Unfortunately, each of these storage media has a limited timeframe when the required data is available. One reboot of an attacked computer will make memory acquisition useless. Several months after an attack the analysis of logs becomes a gamble because they are rotated over time. Hard drives store a lot of needed data and, depending on its activity, forensic specialists may extract data up to a year after an incident. That’s why attackers are using anti-forensic techniques (or simply SDELETE) and memory-based malware to hide their activity during data acquisition. A good example of the implementation of such techniques is Duqu2. After dropping on the hard drive and starting its malicious MSI package it removes the package from the hard drive with file renaming and leaves part of itself in the memory with a payload. That’s why memory forensics is critical to the analysis of malware and its functions. Another important part of an attack are the tunnels that are going to be installed in the network by attackers. Cybercriminals (like Carbanak or GCMAN) may use PLINK for that. Duqu2 used a special driver for that. Now you may understand why we were very excited and impressed when, during an incident response, we found that memory-based malware and tunnelling were implemented by attackers using Windows standard utilities like “SC” and “NETSH“.

Kaspersky reports 140 enterprises in 40 countries have been affected by the malware:

[Image: map of the 140 affected enterprises across 40 countries]

The reported focus has been on banking/financial targets, which implies to me that political targets are not preparing for this type of attack.

If you are going to “play in the street,” an American expression meaning to go in harm’s way, be sure to read the attribution section carefully and repeatedly. Your skills aren’t useful to anyone if you are in prison.

Republican Regime Creates New Cyber Market – Burner Twitter/Facebook Accounts

Filed under: Facebook,Government,Security,Twitter — Patrick Durusau @ 4:17 pm

The current Republican regime has embarked upon creating a new cyber market, less than a month after taking office.

Samatha Dean (Tech Times) reports:

Planning a visit to the U.S.? Your passport is not the only thing you may have to turn in at the immigration counter, be prepared to relinquish your social media account passwords as well to the border security agents.

That’s right! According to a new protocol from the Homeland Security that is under consideration, visitors to the U.S. may have to give their Twitter and Facebook passwords to the border security agents.

The news comes close on the heels of the Trump administration issuing the immigration ban, which resulted in a massive state of confusion at airports, where several people were debarred from entering the country.

John F. Kelly, the Homeland Security Secretary, shared with the Congress on Feb. 7 that the Trump administration was considering this option. The measure was being weighed as a means to sieve visa applications and sift through refugees from the Muslim majority countries that are under the 90-day immigration ban.

I say burner Twitter/Facebook accounts because, if you plan on making a second trip to the US, you will need to have the burner accounts maintained over the years.

The need for burner Twitter/Facebook accounts, ones you can freely disclose to border security agents, presents a wide range of data science issues.

In no particular order:

  • Defeating Twitter/Facebook security on a large scale. Not trivial but not the hard part either
  • Creating accounts with the most common names
  • Automated posting to accounts in their native language
  • Posts must be indistinguishable from human user postings, i.e., no auto-retweets of Sean Spicer
  • Profile of tweets/posts shows consistent usage

I haven’t thought about burner bank account details before but that certainly should be doable. Especially if you have a set of banks on the Net that don’t have much overhead but exist to keep records one to the other.

Burner bank accounts could be useful to more than just travelers to the United States.

Kudos to the new Republican regime and their market creation efforts!

State of Washington & State of Minnesota v. Trump [Press Resource]

Filed under: Government,Law,Law - Sources — Patrick Durusau @ 1:49 pm

State of Washington & State of Minnesota v. Trump 9th Circuit Court of Appeals webpage on case: 17-35105.

The clerk of the Ninth Circuit has created a listing of all the pleadings, hearings, etc., in date order (most recent at the top of the list) for your research and reading pleasure.

I won’t repeat the listing here as it would be quickly out of date.

Please include: State of Washington & State of Minnesota v. Trump, https://www.ca9.uscourts.gov/content/view.php?pk_id=0000000860 as a hyperlink in all your postings on this case.

Your readers deserve the opportunity to read, hear and see the arguments and briefs in this case for themselves.

PS: It appears to be updated after the close of business for the clerk’s office so filings today aren’t reflected on the page.

Turning Pixelated Faces Back Into Real Ones

Filed under: Image Processing,Image Recognition,Neural Networks — Patrick Durusau @ 1:32 pm

Google’s neural networks turn pixelated faces back into real ones by John E. Dunn.

From the post:

Researchers at Google Brain have come up with a way to turn heavily pixelated images of human faces into something that bears a usable resemblance to the original subject.

In a new paper, the company’s researchers describe using neural networks put to work at two different ends of what should, on the face of it, be an incredibly difficult problem to solve: how to resolve a blocky 8 x 8 pixel images of faces or indoor scenes containing almost no information?

It’s something scientists in the field of super resolution (SR) have been working on for years, using techniques such as de-blurring and interpolation that are often not successful for this type of image. As the researchers put it:

When some details do not exist in the source image, the challenge lies not only in “deblurring” an image, but also in generating new image details that appear plausible to a human observer.

Their method involves getting the first “conditioning” neural network to resize 32 x 32 pixel images down to 8 x 8 pixels to see if that process can find a point at which they start to match the test image.

John raises a practical objection:


The obvious practical application of this would be enhancing blurry CCTV images of suspects. But getting to grips with real faces at awkward angles depends on numerous small details. Emphasise the wrong ones and police could end up looking for the wrong person.

True but John presumes the “suspects” are unknown. That’s true for the typical convenience store robbery on the 10 PM news but not so for “suspects” under intentional surveillance.

In those cases, multiple ground truth images from a variety of angles are likely to be available.
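For intuition about the “resize and match” step described above, here is a crude sketch of the matching idea only. It is nothing like Google Brain’s networks, just a downscale-and-compare nearest-neighbor search over hypothetical 32 x 32 grayscale candidates:

    import numpy as np

    def downscale_to_8x8(img32):
        # Average-pool a 32x32 grayscale image down to 8x8.
        return img32.reshape(8, 4, 8, 4).mean(axis=(1, 3))

    def rank_candidates(probe8, candidates32):
        # Return candidate indices ordered by L2 distance to the 8x8 probe.
        dists = [np.linalg.norm(downscale_to_8x8(c) - probe8) for c in candidates32]
        return np.argsort(dists)

    # Toy data: 100 random "faces" plus one planted match.
    rng = np.random.default_rng(1)
    candidates = rng.random((100, 32, 32))
    probe = downscale_to_8x8(candidates[42]) + rng.normal(0, 0.01, (8, 8))

    print(rank_candidates(probe, candidates)[:3])   # candidate 42 should rank first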

February 8, 2017

Functional Programming in Erlang – MOOC – 20 Feb. 2017

Filed under: Elixir,Erlang,Functional Programming,XQuery — Patrick Durusau @ 9:06 pm

Functional Programming in Erlang with Simon Thompson (co-author of Erlang Programming)

From the webpage:

Functional programming is increasingly important in providing global-scale applications on the internet. For example, it’s the basis of the WhatsApp messaging system, which has over a billion users worldwide.

This free online course is designed to teach the principles of functional programming to anyone who’s already able to program, but wants to find out more about the novel approach of Erlang.

Learn the theory of functional programming and apply it in Erlang

The course combines the theory of functional programming and the practice of how that works in Erlang. You’ll get the opportunity to reinforce what you learn through practical exercises and more substantial, optional practical projects.

Over three weeks, you’ll:

  • learn why Erlang was developed, how its design was shaped by the context in which it was used, and how Erlang can be used in practice today;
  • write programs using the concepts of functional programming, including, in particular, recursion, pattern matching and immutable data;
  • apply your knowledge of lists and other Erlang data types in your programs;
  • and implement higher-order functions using generic patterns.

The course will also help you if you are interested in Elixir, which is based on the same virtual machine as Erlang, and shares its fundamental approach as well as its libraries, and indeed will help you to get going with any functional language, and any message-passing concurrency language – for example, Google Go and the Akka library for Scala/Java.

If you are not excited already, remember that XQuery is a functional programming language. What if your documents were “immutable data?”

Use #FLerlangfunc to see Twitter discussions on the course.

That looks like a committee drafted hashtag. 😉

Predicting Police Cellphone Locations – Weaponizing Open Data

Filed under: Ggplot2,R — Patrick Durusau @ 5:40 pm

Predicting And Mapping Arrest Types in San Francisco with LightGBM, R, ggplot2 by Max Woolf.

Max does a great job of using open data from SF OpenData to predict arrest types in San Francisco.

It takes only a small step to realize that Max is also predicting the locations of police officers and their cellphones.

Without police officers, you aren’t going to have many arrests. 😉

Anyone operating a cellphone surveillance device can use Max’s predictions to gather data from police cellphones and other electronic gear. For particular police officers, for particular types of arrests, or at particular times of day, etc.

From the post:

The new hotness in the world of data science is neural networks, which form the basis of deep learning. But while everyone is obsessing about neural networks and how deep learning is magic and can solve any problem if you just stack enough layers, there have been many recent developments in the relatively nonmagical world of machine learning with boring CPUs.

Years before neural networks were the Swiss army knife of data science, there were gradient-boosted machines/gradient-boosted trees. GBMs/GBTs are machine learning methods which are effective on many types of data, and do not require the traditional model assumptions of linear/logistic regression models. Wikipedia has a good article on the advantages of decision tree learning, and visual diagrams of the architecture:

GBMs, as implemented in the Python package scikit-learn, are extremely popular in Kaggle machine learning competitions. But scikit-learn is relatively old, and new technologies have emerged which implement GBMs/GBTs on large datasets with massive parallelization and in-memory computation. A popular big data machine learning library, H2O, has a famous GBM implementation which, per benchmarks, is over 10x faster than scikit-learn and is optimized for datasets with millions of records. But even faster than H2O is xgboost, which can hit 5x-10x speed-ups relative to H2O, depending on the dataset size.

Enter LightGBM, a new (October 2016) open-source machine learning framework by Microsoft which, per benchmarks on release, was up to 4x faster than xgboost! (xgboost very recently implemented a technique also used in LightGBM, which reduced the relative speedup to just ~2x). As a result, LightGBM allows for very efficient model building on large datasets without requiring cloud computing or nVidia CUDA GPUs.

A year ago, I wrote an analysis of the types of police arrests in San Francisco, using data from the SF OpenData initiative, with a followup article analyzing the locations of these arrests. Months later, the same source dataset was used for a Kaggle competition. Why not give the dataset another look and test LightGBM out?
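If you want to kick the tires on LightGBM for a similar classification task, a minimal sketch with its Python API looks something like this. Max’s analysis is in R, and the features below are synthetic stand-ins for the SF OpenData incident fields (hour, day of week, district), not the real dataset:

    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(0)
    n = 10_000
    X = np.column_stack([
        rng.integers(0, 24, n),    # hour of day
        rng.integers(0, 7, n),     # day of week
        rng.integers(0, 10, n),    # police district code
    ])
    y = rng.integers(0, 4, n)      # arrest-type class label (4 toy classes)

    train = lgb.Dataset(X[:8000], label=y[:8000])
    valid = lgb.Dataset(X[8000:], label=y[8000:], reference=train)

    params = {
        "objective": "multiclass",
        "num_class": 4,
        "learning_rate": 0.05,
        "num_leaves": 31,
    }
    model = lgb.train(params, train, num_boost_round=200, valid_sets=[valid])

    # Probability of each arrest type for a Friday at 22:00 in district 3.
    print(model.predict(np.array([[22, 4, 3]])))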

Cellphone data gathered as a result of Max’s predictions can be tested against arrest and other police records to establish the presence and/or absence of particular police officers at a crime scene.

After a police officer corroborates the presence of a gun in a suspect’s hand, cellphone evidence that they were blocks away, in the presence of other police officers, could prove to be inconvenient.

Latest Data on Cellphone Spy Tool Flood

Filed under: Cybersecurity,Government,Security — Patrick Durusau @ 4:26 pm

Cellphone Spy Tools Have Flooded Local Police Departments by George Joseph.

From the post:


In December 2015, The Intercept released a catalogue of military surveillance tools, leaked by an intelligence community source concerned by this perceived militarization of domestic law enforcement. The catalogue included tools that could track thousands of people’s cellphones at once, extract deleted text messages from captured phones, and monitor ongoing calls and text messages. Following this news, last April, CityLab began sending public records requests to the top fifty largest police departments across the country asking for purchasing orders and invoices over 2012 to 2016 related to any of the devices listed in the catalogue. (Note: The fifty largest list is based on data released in 2010 from the Police Pay Journal, and thus does not include some departments now among the top fifty largest).

Of the fifty departments sent public records requests, only eight claimed not to have acquired any spy tools leaked by The Intercept’s intelligence source. At least twelve have admitted to having cellphone interception devices, and nineteen have admitted to having cellphone extraction devices. The responses, security-based rejections, and outstanding requests still being processed for CityLab suggest that, at a minimum, thirty-nine of the fifty departments have acquired at least some of these military-grade surveillance tools over the last four years. (Click here to see the original cache of documents, or scroll down to the bottom of this article)
… (emphasis in original)

George details the results of their investigation by class of software/hardware and provides the original documents supporting his analysis.

Later in the post:


As these military-grade spy tools pour down into local police departments across the country, legal experts are concerned that their use isn’t in keeping with individuals’ due process rights. Law enforcement practices vary dramatically across the country. In 2014, the U.S. Supreme Court unanimously ruled that police could not extract data from an arrested individual’s cellphone without ob­tain­ing a war­rant. But the ruling itself did not give clear guidance on how broad police warrant requests could be designed, and such decisions are still left up to law enforcement discretion in many cases.

I puzzle over the “lack of rules for digital surveillance” discussions.

The police/government has:

  • Lied and/or concealed its use of digital surveillance software/hardware
  • Has evaded/resisted any meaningful oversight of its surveillance activities
  • Collects data indiscriminately
  • etc.,

Yet, fashioning rules for the use of digital surveillance is all the rage.

Why will government agencies fear to break digital surveillance rules when they have systematically broken the law in the past?

Personal privacy depends on defeating military-grade surveillance tools.

Not military grade, but an item for testing your surveillance-defeating work:

Build Your Own GSM Base Station For Fun And Profit.

I don’t keep up on the hardware side of things so please comment with more recent hardware/software for surveillance or defeating the same.

Court: Posting Standards Online Violates Copyright Law [+ solution]

Filed under: Government,Intellectual Property (IP),Law,Law - Sources — Patrick Durusau @ 3:22 pm

Court: Posting Standards Online Violates Copyright Law by Trey Barrineau.

From the post:

Last week, the U.S. District Court for the District of Columbia ruled that public-records activist Carl Malamud’s organization, Public.Resource.Org, violated copyright law by publicly sharing standards that are used in laws such as building codes. It also said organizations that develop these standards, including those used in the fenestration industry, have the right to charge reasonable fees to access them. Malamud told DWM in an e-mail that he’ll appeal the ruling.
… (emphasis in original)

I was working on a colorful rant, invoking Mr. Bumble in Charles Dickens’s Oliver Twist:

“If the law supposes that,” said Mr. Bumble, squeezing his hat emphatically in both hands, “the law is a ass - a idiot.”

based on the report of the decision when I ran across the full court opinion:

AMERICAN SOCIETY FOR TESTING AND MATERIALS, et al., Plaintiffs, v. PUBLIC.RESOURCE.ORG, INC., Defendant. Case No. 13-cv-1215 (TSC)

The preservation of copyright despite being referenced in a law and/or regulation (pages 19-24) is one of the stronger parts of the decision.

In part it reads:


Congress was well aware of the potential copyright issue posed by materials incorporated by reference when it crafted Section 105 in 1976. Ten years earlier, Congress had extended to federal agencies the authority to incorporate private works by reference into federal regulations. See Pub. L. No. 90-23, § 552, 81 Stat. 54 (1967) (codified at 5 U.S.C. § 552) (providing that “matter reasonably available to the class of persons affected thereby is deemed published in the Federal Register when incorporated by reference therein with the approval of the Director of the Federal Register”). However, in the Copyright Act of 1976, Congress made no mention of these incorporated works in § 105 (no copyright for “any work of the United States Government”) or any other section. As the House Report quoted above indicates, Congress already carefully weighed the competing policy goals of making incorporated works publicly available while also preserving the incentives and protections granted by copyright, and it weighed in favor of preserving the copyright system. See H.R. Rep. No. 94-1476, at 60 (1976) (stating that under § 105 “use by the Government of a private work would not affect its copyright protection in any way”); see also M.B. Schnapper v. Foley, 667 F.2d 102, 109 (D.C. Cir. 1981) (analyzing Copyright Act and holding that “we are reluctant to cabin the discretion of government agencies to arrange ownership and publication rights with private contractors absent some reasonable showing of a congressional desire to do so”).

However, recognizing the importance of public access to works incorporated by reference into federal regulations, Congress still requires that such works be “reasonably available.” 5 U.S.C. § 552(a)(1). Under current federal regulations issued by the Office of the Federal Register in 1982, a privately authored work may be incorporated by reference into an agency’s regulation if it is “reasonably available,” including availability in hard copy at the OFR and/or the incorporating agency. 1 C.F.R. § 51.7(a)(3). Thirteen years later, Congress passed the National Technology Transfer and Advancement Act of 1995 (“NTTAA”) which directed all federal agencies to use privately developed technical voluntary consensus standards. See Pub. L. No. 104-113, 110 Stat. 775 (1996). Thus, Congress initially authorized agencies to incorporate works by reference, then excluded these incorporated works from § 105 of the Copyright Act, and, nearly twenty years later, specifically directed agencies to incorporate private works by reference. From 1966 through the present, Congress has remained silent on the question of whether privately authored standards and other works would lose copyright protection upon incorporation by reference. If Congress intended to revoke the copyrights of such standards when it passed the NTTAA, or any time before or since, it surely would have done so expressly. See Whitman v. Am. Trucking Ass’ns, Inc., 531 U.S. 457, 468 (2001) (“Congress . . . does not alter the fundamental details of a regulatory scheme in vague terms or ancillary provisions—it does not . . . hide elephants in mouseholes.”); United States v. Fausto, 484 U.S. 439, 453 (1988) (“[It] can be strongly presumed that Congress will specifically address language on the statute books that it wishes to change.”). Instead, Congress has chosen to maintain the scheme it created in 1966: that such standards must simply be made reasonably available. See 5 U.S.C. § 552(a)(1).
… (emphasis in original, pages 21-23)

A finding to the contrary, that is, treating the referencing of a privately authored standard as terminating the rights of its copyright holder, creates obvious due process problems.

Some copyright holders, ASTM for example, report sales as a substantial portion of their yearly income. ASTM International 2015 Annual Report gives an annual operating income of $72,543,549, of which, $48,659,345 was from publications. (page 24)

Congress could improve both the “reasonable access” for citizens and the lot of standard developers by requiring:

  • for works incorporated by reference into federal regulations, agencies must secure a license renewable without time limit for unlimited digital reproduction of that work by anyone
  • digital reproductions of such works, whether by the licensing agency or others, must reference the work’s publisher for obtaining a print copy

That gives standard developing organizations a new source of revenue, increases the “reasonable access” of citizens, and if past experience is any guide, digital copies may drive print sales.

Any takers?

February 7, 2017

“Don’t do the crime if you can’t do the time” – Scary Talk, Check The Facts

Filed under: Cybersecurity,Government — Patrick Durusau @ 5:21 pm

Steve Morgan reprises that old adage in: Teenage hackers beware: Don’t do the cybercrime if you can’t do the jail time.

From the post:

The latest Hack Blotter features a garden variety of cyber perps who’ve been investigated, apprehended, arrested, and/or convicted.

Local U.S. law enforcement agencies are devoting more resources to cybercrime in an effort to prosecute cybercriminals. Atlanta and New York are the latest cities to invest into new cybercrime units and labs.

International authorities are also stepping up arrests and convictions of hackers.

Some teenagers are learning the hard way that cybercrime doesn’t pay. The Hack Blotter features the following children who’ve paid the price for hacking over the past few months:

Scary talk from Morgan, but if you follow the link to Hack Blotter you will find:

  • forty-six (46) arrests/prosecutions around the world, Oct. – Dec. 2016
  • 7 billion, 482 million odd people, current world population

I can’t say those are bad odds. You?

More than improved cybersecurity or cybercops, the principal danger to your freedom is you.

Should you decide to hack, the exercise of good operational security (opsec) is no guarantee you won’t get caught, but it goes a long way in that direction.

Financing the Revolution – Hacking Slot Machines

Filed under: Algorithms,Cybersecurity — Patrick Durusau @ 4:28 pm

Russians Engineer A Brilliant Slot Machine Cheat-And Casinos Have No Fix by Brendan I. Koerner.

From the post:

IN EARLY JUNE 2014, accountants at the Lumiere Place Casino in St. Louis noticed that several of their slot machines had—just for a couple of days—gone haywire. The government-approved software that powers such machines gives the house a fixed mathematical edge, so that casinos can be certain of how much they’ll earn over the long haul—say, 7.129 cents for every dollar played. But on June 2 and 3, a number of Lumiere’s machines had spit out far more money than they’d consumed, despite not awarding any major jackpots, an aberration known in industry parlance as a negative hold. Since code isn’t prone to sudden fits of madness, the only plausible explanation was that someone was cheating.

Casino security pulled up the surveillance tapes and eventually spotted the culprit, a black-haired man in his thirties who wore a Polo zip-up and carried a square brown purse. Unlike most slots cheats, he didn’t appear to tinker with any of the machines he targeted, all of which were older models manufactured by Aristocrat Leisure of Australia. Instead he’d simply play, pushing the buttons on a game like Star Drifter or Pelican Pete while furtively holding his iPhone close to the screen.

He’d walk away after a few minutes, then return a bit later to give the game a second chance. That’s when he’d get lucky. The man would parlay a $20 to $60 investment into as much as $1,300 before cashing out and moving on to another machine, where he’d start the cycle anew. Over the course of two days, his winnings tallied just over $21,000. The only odd thing about his behavior during his streaks was the way he’d hover his finger above the Spin button for long stretches before finally jabbing it in haste; typical slots players don’t pause between spins like that.

On June 9, Lumiere Place shared its findings with the Missouri Gaming Commission, which in turn issued a statewide alert. Several casinos soon discovered that they had been cheated the same way, though often by different men than the one who’d bilked Lumiere Place. In each instance, the perpetrator held a cell phone close to an Aristocrat Mark VI model slot machine shortly before a run of good fortune.

… (emphasis in original)

A very cool story of recording a slot machine’s “pseudo-random” behavior, predicting its future “pseudo-random” behavior, and acting on those predictions, all while in full view of security cameras and human observers.

Now, there’s a hacker’s challenge for you!

Legislation is pending to bring casino gambling to Georgia (USA). I assume they will have all new slot machines so newer algorithms are going to be in demand.

Assuming you don’t have access to the source code, how much video of a particular machine would you need to predict the “pseudo-random” behavior?
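Surprisingly little, if the generator is weak. As a toy illustration (assuming, purely for the sake of the sketch, a plain linear congruential generator with a known modulus; real Aristocrat PRNGs are more complex), three consecutive observed outputs are enough to recover the parameters and predict every later spin:

    # Recover a toy linear congruential generator (LCG) from observed outputs
    # and predict the next one. Purely illustrative. Requires Python 3.8+ for
    # pow(x, -1, m) modular inverses.

    M = 2**31 - 1  # assumed, known modulus (hypothetical)

    def lcg(seed, a, c, m=M):
        # Yield successive LCG states: x_{n+1} = (a * x_n + c) mod m.
        x = seed
        while True:
            x = (a * x + c) % m
            yield x

    def recover_params(x0, x1, x2, m=M):
        # Solve x1 = a*x0 + c and x2 = a*x1 + c (mod m) for a and c.
        a = ((x2 - x1) * pow(x1 - x0, -1, m)) % m  # needs gcd(x1 - x0, m) == 1
        c = (x1 - a * x0) % m
        return a, c

    # "Record" three outputs, as the player with the iPhone effectively did.
    gen = lcg(seed=123456789, a=48271, c=11)
    x0, x1, x2 = next(gen), next(gen), next(gen)

    a, c = recover_params(x0, x1, x2)
    predicted = (a * x2 + c) % M
    assert predicted == next(gen)  # the "future" spin is now predictable
    print(a, c, predicted)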

I’m thinking that people do have “favorite” machines and there would be nothing odd about the same person playing the same machine for several hours. Or even returning over a period of days. Think of that person as the information forager.

The goal of the information forager is to record as much activity of the slot machine as possible, not to play quickly.

One or more players can return to the casino and play the machines for which predictions are available. In that respect, the hackers in the article were smart in trying to keep winnings low, so as to avoid attracting attention.

Casinos crunch data like everyone else, so don’t expect unusual winnings to go unnoticed. Still, in college towns, like Atlanta, it should not be too difficult to recruit players with some equitable split of the winnings.

PS: You can buy a slot machine once you know the model being used in the target casino. Hell, install it in a frat house so it pays for itself and avoids a direct connection to you.

Leakers As Lighthouses In A Sea Of Data

Filed under: Journalism,News,Reporting — Patrick Durusau @ 3:50 pm

I was extolling my How-To: Leaking In Two Steps yesterday when a very practical problem suggested itself.

In sneakernet leaking (hard copy/digital), the leaker has selected/filtered the leaked content prior to delivery to a news organization.

Leaked access puts the burden on reporters to explore to find relevant data. Without some guidance from a leaker, reporters won’t know if a “big story” is in the next directory, file or spreadsheet.

Concern was also voiced that traditional news organizations might run afoul of the Computer Fraud and Abuse Act (CFAA).

Governments enact laws like the CFAA in order to protect their own criminal activity and those of criminals like them.

Think of the last time your local, state or national government did something that would be universally admired and acclaimed but kept it secret.

Coming up empty? So am I.

Good acts are never kept secret, and that is a commentary on acts that are kept secret.

I won’t suggest you violate the CFAA, if you are subject to it, but do consider if you are serving the government’s interest in obeying such laws or the public’s.

I’m re-factoring How-To: Leaking In Two Steps to keep the ease of leaking for leakers, preserve their role as lighthouses, and, perhaps even more importantly, to reduce if not eliminate CFAA liability for news organizations.

February 6, 2017

Eight Days in March: [Bias by Omission]

Filed under: Bias,Journalism,News,Reporting — Patrick Durusau @ 9:19 pm

Eight Days in March: How the World Searched for Terror Attacks by Google Trends.

[Image: Eight Days in March search-interest graphic]

Cities that searched for these attacks:

[Image: cities that searched for these attacks]

See the original for full impact but do you notice a bias by omission?

What about the terrorist bombings by the United States and its allies in Syria and Iraq, which happened on every day mentioned in this graphic?

Operation Inherent Resolve reports:

Between Aug. 8, 2014 and Jan. 30, 2017, U.S. and partner-nation aircraft have flown an estimated 136,069 sorties in support of operations in Iraq and Syria.

That’s 906 days or 150 sorties on average per day.

Or for eight days in March, 1200 acts of terrorism in Iraq and Syria.

Readers who are unaware of the crimes against the people of Iraq and Syria won’t notice the bias in this graphic.

Every biased graphic is an opportunity to broaden a reader’s awareness.

Take advantage of them.

Open Science: Too Much Talk, Too Little Action [Lessons For Political Opposition]

Filed under: Open Access,Open Data,Open Science,Politics — Patrick Durusau @ 12:49 pm

Open Science: Too Much Talk, Too Little Action by Björn Brembs.

From the post:

Starting this year, I will stop traveling to any speaking engagements on open science (or, more generally, infrastructure reform), as long as these events do not entail a clear goal for action. I have several reasons for this decision, most of them boil down to a cost/benefit estimate. The time spent traveling does not seem worth the hardly noticeable benefits any more.

I got involved in Open Science more than 10 years ago. Trying to document the point when it all started for me, I found posts about funding all over my blog, but the first blog posts on publishing were from 2005/2006, the announcement of me joining the editorial board of newly founded PLoS ONE late 2006 and my first post on the impact factor in 2007. That year also saw my first post on how our funding and publishing system may contribute to scientific misconduct.

In an interview on the occasion of PLoS ONE’s ten-year anniversary, PLoS mentioned that they thought the publishing landscape had changed a lot in these ten years. I replied that, looking back ten years, not a whole lot had actually changed:

  • Publishing is still dominated by the main publishers which keep increasing their profit margins, sucking the public teat dry
  • Most of our work is still behind paywalls
  • You won’t get a job unless you publish in high-ranking journals.
  • Higher ranking journals still publish less reliable science, contributing to potential replication issues
  • The increase in number of journals is still exponential
  • Libraries are still told by their faculty that subscriptions are important
  • The digital functionality of our literature is still laughable
  • There are no institutional solutions to sustainably archive and make accessible our narratives other than text, or our code or our data

The only difference in the last few years really lies in the fraction of available articles, but that remains a small minority, less than 30% total.

So the work that still needs to be done is exactly the same as it was at the time Stevan Harnad published his “Subversive Proposal” , 23 years ago: getting rid of paywalls. This goal won’t be reached until all institutions have stopped renewing their subscriptions. As I don’t know of a single institution without any subscriptions, that task remains just as big now as it was 23 years ago. Noticeable progress has only been on the margins and potentially in people’s heads. Indeed, now only few scholars haven’t heard of “Open Access”, yet, but apparently without grasping the issues, as my librarian colleagues keep reminding me that their faculty believe open access has already been achieved because they can access everything from the computer in their institute.

What needs to be said about our infrastructure has been said, both in person, and online, and in print, and on audio, and on video. Those competent individuals at our institutions who make infrastructure decisions hence know enough to be able to make their rational choices. Obviously, if after 23 years of talking about infrastructure reform, this is the state we’re in, our approach wasn’t very effective and my contribution is clearly completely negligible, if at all existent. There is absolutely no loss if I stop trying to tell people what they already should know. After all, the main content of my talks has barely changed in the last eight or so years. Only more recent evidence has been added and my conclusions have become more radical, i.e., trying to tackle the radix (Latin: root) of the problem, rather than palliatively care for some tangential symptoms.

The line:

What needs to be said about our infrastructure has been said, both in person, and online, and in print, and on audio, and on video.

is especially relevant in light of the 2016 presidential election and the fund raising efforts of organizations that form the “political opposition.”

You have seen the ads in email, on Facebook, Twitter, etc., all pleading for funding to oppose the current US President.

I agree the current US President should be opposed.

But the organizations seeking funding failed to stop his rise to power.

Whether their failure was due to organizational defects or poor strategies is really beside the point. They failed.

Why should I enable them to fail again?

One data point: the Women’s March on Washington was NOT organized by organizations with permanent staff and offices in Washington or elsewhere.

Is your contribution supporting staffs and offices of the self-righteous (the primary function of old line organizations) or investigation, research, reporting and support of boots on the ground?

Government excesses are not stopped by bewailing our losses but by making government agents bewail theirs.

February 5, 2017

Learn A Language, 140 Characters At A Time (GIJN)

Filed under: Journalism,News,Reporting — Patrick Durusau @ 5:39 pm

Global Investigative Journalism Network (GIJN) announced two Twitter feeds today:

@gijnArabic

@gijnRu

To complement:

@gijn

@gijnAfrica

@gijnCh

@gijnEs

The GIJN Twitter feed expansion prompted me to think of its feeds as part of a language learning effort.

I doubt you would find classic literature quoted in tweets but that’s rare these days outside of classrooms.

Certainly not common in tweets from political leaders. 😉

From their about page:

The Global Investigative Journalism Network (GIJN) is an international association of nonprofit organizations that support, promote, and produce investigative journalism. GIJN holds conferences, conducts trainings, provides resources and consulting, and encourages the creation of similar nonprofit groups. It was founded in 2003 when more than 300 journalists from around the world gathered for the second Global Investigative Journalism Conference in Copenhagen. Since then it has grown to 145 member organizations in 62 countries.

If you don’t know the Global Investigative Journalism Network (GIJN), pay them a visit. You will like what you find.

February 4, 2017

Tooling Up: Adding Windows 10 to Ubuntu

Filed under: Microsoft,Software — Patrick Durusau @ 8:58 pm

In preparation for an exciting year, I have installed/upgraded several programs on Ubuntu but need to:

  • Generate OOXML files with MS Office
  • Run GIS software not otherwise available
  • Test IE/Office/Windows vulnerabilities
  • Use WebEx

That means a copy of Windows 10 to enable access to Office 365.

Abhishek Prakash’s How to Install Windows 10 in VirtualBox in Linux did the trick for me.

One caveat: my VirtualBox created an optical drive by default, so when I added the Windows ISO image as a second optical drive, starting the install reported no bootable media. Deleting the default optical drive, leaving only the Windows ISO image, fixed the problem.

The subscription/install of Office 365 went smoothly.

By default it stores files on OneDrive (1 TB).

Provocative name suggestions for encrypted core dumps?

Other than the glitch with the extra optical drive, it all went smoothly, albeit in Windows fashion, somewhat slowly at times.

Some traditions never change.

😉

February 3, 2017

Burner Phone Guide – Caution on Burner App

Filed under: Cybersecurity,Security — Patrick Durusau @ 4:08 pm

Now’s Probably The Time To Consider One Of These Burner Phones by Paul Sarconi.

From the post:

WE’RE LIVING IN a new era of political unpredictability. Who knows what race, religious group, or professional sector will be scrutinized tomorrow? If you’re concerned that your devices will be targeted for confiscation and search, heed caution now. Start carrying a burner phone—a handset you can wipe clean and destroy without much thought. We’ve rounded up some good options.

One note: The point of using a burner is to avoid leaving a trace of your phone activity. Our list of recommended phones (and one app!) comes with links to online retailers so you can read more about the devices, but if you’re trying to stay private, you should buy both the phone and a pre-paid data allotment with cash. Most of these handsets (and the prepaid cards) are available at big-box stores here and abroad.
… (emphasis in original)

If your privacy matters, burner phones are in your present and future.

Quite recently I was creating an account at a hacker site that required, required mind you, a cellphone number for authentication.

That’s crazy. Why would I want to label myself with my cellphone number in a hacker forum? Not today, not tomorrow, not any day.

So, Paul’s list comes at an opportune time.

A word of caution about the Burner App.

It’s true you can delete the Burner App temporary phone number from your phone but Burner App maintains a copy of that number with your account. In case you want to “reactivate” the number.

Trusting a third party is a poor opening move in learning to protect your privacy.

Buy a debit card for cash and use a fake identity with Burner App.

How-To: Leaking In Two Steps

Filed under: Cybersecurity,Journalism,News,Reporting,Security — Patrick Durusau @ 3:22 pm

In Lowering the Bar for Leakers I proposed this method for leaking login credentials:

  1. Write login credentials (not your own), login URL, on paper
  2. Mail to (news address) – no return address
  3. News Media: Destroys all leaked credentials upon receipt

Easier than the convolutions you will find at: How easy is it to securely leak information to some of America’s top news organizations? This easy or Attention Federal Employees: If You See Something, Leak Something, but we can do better.

A Universal (nearly) and Secure Leaking Point

Can you think of one characteristic shared by almost all websites? Aside from being on the Web?

The ability to create an account for news and updates!

Like this page from the New York Times:

[Image: New York Times account creation form]

Warning: Leak login credentials to sites using the https protocol only.

Leaking access to a publicly accessible server

To leak your sysadmin’s, boss’s, or co-worker’s credentials, you enter:

[Screenshot: the sign-up form with leaked login credentials entered as the account details]

Leaking access to a server on a restricted network

For servers or resources requiring more than one set of credentials, say on a secure network, again using your sysadmin’s, boss’s, or co-worker’s credentials, you enter:

[Screenshot: the sign-up form with multiple sets of credentials entered as the account details]

Leaking In Two Steps

Leaking login credentials (not your own) takes two steps:

  1. Create account from non-work computer
  2. Enter login credentials as account details

You are protected by:

  1. SSL encryption
  2. Safety in numbers – Study finds that 97% of large companies have had credentials leaked online
  3. Credential duplication is a well-known fact – 17% of passwords are “123456”
  4. Not facing the risks a sneakernet thief runs to steal, transport and deliver data in hard copy or digital format

This technique will work with agencies, banks, corporations, courts, governments, legislatures, PACs, anywhere that requires digital login credentials.

I used the email and password fields here, but that is just an artifact of the New York Times form. Other form fields and other separators are certainly possible.
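For the curious, step 2 amounts to nothing more exotic than an ordinary HTTPS form submission. A minimal sketch in Python follows; the sign-up URL, field names and values are hypothetical placeholders, so inspect the actual form you are using.

```python
# A minimal sketch of step 2 only: credentials submitted as "account
# details" over HTTPS. The URL, field names and values are placeholders.
import requests

SIGNUP_URL = "https://news.example.com/register"   # hypothetical endpoint

payload = {
    "email": "sysadmin@target.example",                            # leaked username
    "password": "leaked-password https://intranet.example/login",  # credential plus login URL
}

# requests verifies the TLS certificate by default, so the form contents are
# encrypted in transit, which is why https-only sites matter here.
resp = requests.post(SIGNUP_URL, data=payload, timeout=30)
print(resp.status_code)
```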

PS: Don’t leak credentials to me because my site doesn’t have SSL (right now) and I’m not in full control of the server.

Personally, if I were to accept leaked credentials, I would store that data on a RAM disk.

February 2, 2017

The Power of Big Data and Psychographics [Fact Checking]

Filed under: Government,Personalization,Persuasion,Politics — Patrick Durusau @ 8:42 pm

From the description:

In a 10 minute presentation at the 2016 Concordia Summit, Mr. Alexander Nix discusses the power of big data in global elections. Cambridge Analytica’s revolutionary approach to audience targeting, data modeling, and psychographic profiling has made them a leader in behavioral microtargeting for election processes around the world.

A highly entertaining but deceptive presentation on the state of the art for marketing political candidates.

Nix claims that most marketing companies base their advertising on demographics and geographics, sending the same message to all women, all African-Americans, etc.

Worse than a “straw man,” that’s simply false. If you know the work Selling Blue Elephants by Howard Moskowitz and Alex Gofman, then you know that marketers tweak their pitches to very small market slices.

But you don’t need to find a copy of Selling Blue Elephants or take my word for that. On your next visit to the grocery store see for yourself how many variations of a popular shampoo or spaghetti sauce are offered. Each one is calculated to attract a particular niche of the overall market.

Nix goes on to describe advertising in the 1960s as “top down,” “hope messages resonate,” etc.

Not only is that another false claim, but the application described by Nix was pioneered for the 1960 presidential campaign.


Ithiel de Sola Pool, with others, developed the Simulmatics program for the computation of a great variety of factors thought to influence voting, for specific use in the 1960 presidential election. A multitude of influences can be introduced into the program, together with modifications of a strategic nature, and the results bear on both prediction and choice of strategy, much in the manner that elaborate market research influences business decision on manufacture and sale of a new product. The Simulmatics project assembled a basic matrix of voter types and “issue clusters” (480 of the former and 52 of the latter, making a total of 24,960 cells), consolidating as values the accumulated archives of polling on all kinds of questions. The records of the Roper Public Opinion Research Center at Williamstown were used as source material. With no data later than 1958, the simulation achieved a correlation by states of .82 with the actual Kennedy vote.

(“The Mathematical Approach to Political Science” by Oliver Benson, in Contemporary Political Analysis, edited by James C. Charlesworth, The Free Press, 1967, at pp. 129-130)
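To make the scale of that matrix concrete, here is a toy sketch (my illustration, not Pool’s code) of 480 voter types crossed with 52 issue clusters:

```python
# A toy illustration of the Simulmatics matrix dimensions quoted above.
import numpy as np

VOTER_TYPES = 480
ISSUE_CLUSTERS = 52

# Each cell would hold the accumulated polling values for one
# voter-type / issue-cluster combination.
matrix = np.zeros((VOTER_TYPES, ISSUE_CLUSTERS))

print(matrix.size)   # 24960 cells, matching the figure in the quotation
```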

I’ll grant that Nix has more data at his disposal and techniques have changed in the last fifty-seven (57) years, but there’s no legitimate reason to not credit prior researchers in the field.

PS: If you find a hard (or scanned) copy of The Simulmatics Project by Ithiel de Sola Pool, let me know.

Data Journalism Manual

Filed under: Journalism,News,Reporting — Patrick Durusau @ 4:57 pm

Data Journalism Manual by ODECA.

There are five data journalism modules, plus Labs.

The page footer reads:

This data journalism manual has been adapted for UNDP Istanbul Regional Hub by Eva Constantaras, and in Russian by Anastasia Valeeva, from an original work produced for The World Bank’s Sudan Evidence Base Programme, supported by the United Kingdom Department for International Development and found at https://www.sudandata.org/learning/2

(see Data Journalism Manual for the modules in Russian.)

About ODECA:

Open Data in Europe and Central Asia (ODECA) is a platform to support government representatives, civil society activists, tech activists and citizens that care about and work with open data.

The network covers 18 countries in the region and aims to stimulate innovation, knowledge sharing and learning among practitioners and aficionados of open data regionally and globally.

Our goal is to use the potential of open data to transform societies by empowering citizens and supporting governments to meet the UN Sustainable Development Goals. While we are still exploring all the ways that data will contribute to the SDGs, it is undeniable that it will play an important role in reaching and measuring them.

The network brings in the knowledge and experience of global and regional leaders in open data.
(emphasis in the original)

Enjoy!

Neil M. Gorsuch (Library of Congress, Bibliography)

Filed under: Government,Law — Patrick Durusau @ 4:16 pm

This bibliography on Neil M. Gorsuch, created by the Library of Congress, covers articles, books, and cases written by Judge Gorsuch and others.

One of the few sane resource collections you will find on Judge Gorsuch.

Share it widely.

U.S. Leaking Law: You Go To Jail – I Win A Pulitzer

Filed under: Journalism,News,Reporting — Patrick Durusau @ 3:44 pm

While researching Challenging Anti-Whistleblowing Provision (Germany) [Republication of „stolen“ Data] in a US context, I encountered: The Legality of Publishing Hacked E-mails by Diana Dellamere.

It was published in 2009 and, as with all legal issues, you should consult a lawyer, but it summarizes the rule on “illegal” content as:


Bartnicki v. Vopper is the most protective of journalists and sets out the primary “test,” holding that a broadcaster could not be held civilly liable for publishing documents or tapes illegally procured by a third party. The court set out three criteria for legitimate first amendment protection: (1) the media outlet played no role in the illegal interception; (2) media received the information lawfully; (3) the issue was a matter of public concern.

If my title sounds harsh towards the press, remember that the Washington Post won a Pulitzer Prize based on Snowden’s leaks and yet called for him to not be pardoned.

I suspect that first requirement:

(1) the media outlet played no role in the illegal interception;

is part of the reason why the bar for leakers remains high; that is, media outlets don’t accept leaked login credentials for the recovery of material of public interest.

Media outlets need to realize the “no role in the illegal interception” condition of Bartnicki v. Vopper is a bargain with the devil. From which both media outlets and the public suffer.

Media outlets suffer because, despite the brave rhetoric of “speaking truth to power,” what they say in fact is:

speaking such truth as breaks through the wall of fear and punishment maintained by power

In honoring the condition of “no role in the illegal interception” media outlets have chosen a side. It isn’t the side of transparency, public interest or government accountability.

If that weren’t bad enough, the public suffers by being deprived of facts that skilled data miners could recover, facts that lie beyond the skill of leakers who could nevertheless leak access credentials.

Everyone gets to make choices and certainly media outlets, we could all name a few, can choose to be government toadies.

As far as “legality” is concerned, I call your attention to: Tweeter And The Monkey Man by Traveling Wilburys:

Jan had told him many times, “It was you to me who taught
In Jersey anything’s legal, as long as you don’t get caught”

The law is codified caprice that favors the powerful.

Whether to break it or not comes down to a single question: how much is the truth really worth to you?

February 1, 2017

Digital Humanities / Studies: U.Pitt.Greenberg

Filed under: Digital Research,Humanities,Literature,Social Sciences,XML,XQuery — Patrick Durusau @ 9:13 pm

Digital Humanities / Studies: U.Pitt.Greenberg maintained by Elisa E. Beshero-Bondar.

I discovered this syllabus and course materials by accident when one of its modules on XQuery turned up in a search. Backing out of that module I discovered this gem of a digital humanities course.

The course description:

Our course in “digital humanities” and “digital studies” is designed to be interdisciplinary and practical, with an emphasis on learning through “hands-on” experience. It is a computer course, but not a course in which you learn programming for the sake of learning a programming language. It’s a course that will involve programming, and working with coding languages, and “putting things online,” but it’s not a course designed to make you, in fifteen weeks, a professional website designer. Instead, this is a course in which we prioritize what we can investigate in the Humanities and related Social Sciences fields about cultural, historical, and literary research questions through applications in computer coding and programming, which you will be learning and applying as you go in order to make new discoveries and transform cultural objects—what we call “texts” in their complex and multiple dimensions. We think of “texts” as the transmittable, sharable forms of human creativity (mainly through language), and we interface with a particular text in multiple ways through print and electronic “documents.” When we refer to a “document,” we mean a specific instance of a text, and much of our work will be in experimenting with the structures of texts in digital document formats, accessing them through scripts we write in computer code—scripts that in themselves are a kind of text, readable both by humans and machines.

Your professors are scholars and teachers of humanities, not computer programmers by trade, and we teach this course from our backgrounds (in literature and anthropology, respectively). We teach this course to share coding methods that are highly useful to us in our fields, with an emphasis on working with texts as artifacts of human culture shaped primarily with words and letters—the forms of “written” language transferable to many media (including image and sound) that we can study with computer modelling tools that we design for ourselves based on the questions we ask. We work with computers in this course as precision instruments that help us to read and process great quantities of information, and that lead us to make significant connections, ask new kinds of questions, and build models and interfaces to change our reading and thinking experience as people curious about human history, culture, and creativity.

Our focus in this course is primarily analytical: to apply computer technologies to represent and investigate cultural materials. As we design projects together, you will gain practical experience in editing and you will certainly fine-tune your precision in writing and thinking. We will be working primarily with eXtensible Markup Language (XML) because it is a powerful tool for modelling texts that we can adapt creatively to our interests and questions. XML represents a standard in adaptability and human-readability in digital code, and it works together with related technologies with which you will gain working experience: You’ll learn how to write XPath expressions: a formal language for searching and extracting information from XML code which serves as the basis for transforming XML into many publishable forms, using XSLT and XQuery. You’ll learn to write XSLT: a programming “stylesheet” transforming language designed to convert XML to publishable formats, as well as XQuery, a query (or search) language for extracting information from XML files bundled collectively. You will learn how to design your own systematic coding methods to work on projects, and how to write your own rules in schema languages (like Schematron and Relax-NG) to keep your projects organized and prevent errors. You’ll gain experience with an international XML language called TEI (after the Text Encoding Initiative) which serves as the international standard for coding digital archives of cultural materials. Since one of the best and most widely accessible ways to publish XML is on the worldwide web, you’ll gain working experience with HTML code (a markup language that is a kind of XML) and styling HTML with Cascading Stylesheets (CSS). We will do all of this with an eye to your understanding how coding works—and no longer relying without question on expensive commercial software as the “only” available solution, because such software is usually not designed with our research questions in mind.

We think you’ll gain enough experience at least to become a little dangerous, and at the very least more independent as investigators and makers who wield computers as fit instruments for your own tasks. Your success will require patience, dedication, and regular communication and interaction with us, working through assignments on a daily basis. Your success will NOT require perfection, but rather your regular efforts throughout the course, your documenting of problems when your coding doesn’t yield the results you want. Homework exercises are a back-and-forth, intensive dialogue between you and your instructors, and we plan to spend a great deal of time with you individually over these as we work together. Our guiding principle in developing assignments and working with you is that the best way for you to learn and succeed is through regular practice as you hone your skills. Our goal is not to make you expert programmers (as we are far from that ourselves)! Rather, we want you to learn how to manipulate coding technologies for your own purposes, how to track down answers to questions, how to think your way algorithmically through problems and find good solutions.

Skimming the syllabus rekindles an awareness of the distinction between the “hard” sciences and the “difficult” ones.
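If you have never written an XPath expression, here is a toy sketch (mine, not taken from the course) of the kind of query the syllabus builds toward, run with Python’s lxml over a small TEI-flavored fragment:

```python
# A toy sketch: an XPath query over a tiny TEI-flavored XML fragment.
from lxml import etree

xml = b"""
<text>
  <body>
    <p>The <persName>Kennedy</persName> campaign simulated voter types.</p>
    <p><persName>Nixon</persName> is mentioned here as well.</p>
  </body>
</text>
"""

doc = etree.fromstring(xml)

# Select the text of every persName element, wherever it occurs.
names = doc.xpath("//persName/text()")
print(names)   # ['Kennedy', 'Nixon']
```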

Enjoy!

Update:

After yesterday’s post, Elisa Beshero-Bondar tweeted that the one course is now two:

At a new homepage: newtFire {dh|ds}!

Enjoy!

How Is GIS Being Used To Map Resistance And Political Protests?

Filed under: GIS,Government,Maps,Protests — Patrick Durusau @ 8:52 pm

How Is GIS Being Used To Map Resistance And Political Protests? by Sarah Bond.

From the post:

In the days since Donald Trump became president on January 20, 2017, millions of protestors have gathered in cities both big and small across the globe. And while presidential counselor Kellyanne Conway told Chuck Todd on NBC’s “Meet The Press” that, “There’s really no way to quantify crowd numbers”–digital humanists, data scientists, librarians and geographers beg to differ.

Let’s check in on some projects attempting to use GIS to visualize the recent political protests, preserve data and keep activists informed.

[Screenshot: map of crowd-sourced Women’s March attendance estimates]

Despite Conway’s remarks, a Google Doc started by Jeremy Pressman at the University of Connecticut and Erica Chenoweth of the University of Denver soon began to collect crowd-sourced estimates from the Women’s Marches on January 20, 2017 organized by city, state and country. As they say on the public spreadsheet, “We are not collecting this data as part of a research project. We are doing this in the public interest. We are not affiliated with any other efforts to collect data on the demonstrations.” Over at Vox, graphics reporter Sarah Frostenson turned their data into a static map. Other researchers also weighed in. Doug Duffy, a PhD candidate at the University of Toronto, made an interactive map of Pressman and Chenoweth’s data here and posted the visualization to his GitHub page. He even cleaned the data for easy download and reuse (with attribution) by others.

The post has links to a number of other projects that are mapping data related to resistance and political protests.
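If you want to experiment with this kind of data yourself, here is a minimal sketch, assuming you export the crowd-sourced spreadsheet to CSV and add latitude/longitude columns. The filename and column names below are my own assumptions; rename them to match whatever you actually download.

```python
# A minimal sketch: plot crowd estimates on an interactive map with folium.
import pandas as pd
import folium

df = pd.read_csv("womens_march_estimates.csv")   # hypothetical file

m = folium.Map(location=[20, 0], zoom_start=2)

for row in df.itertuples():
    folium.CircleMarker(
        location=[row.lat, row.lon],
        radius=max(3, row.estimate ** 0.5 / 100),   # crude scaling by crowd size
        popup=f"{row.city}: {row.estimate:,}",
        fill=True,
    ).add_to(m)

m.save("womens_march_map.html")   # open the result in a browser
```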

If that wasn’t encouraging enough, Sarah’s post appeared in Forbes, which isn’t known for being a hotbed of criminal syndicalism.

😉

Can using GIS to plan resistance and political protests be very far away?
