Another Word For It – Patrick Durusau on Topic Maps and Semantic Diversity

September 12, 2016

Inside the fight to reveal the CIA’s torture secrets [Support The Guardian]

Filed under: Government,Government Data,Journalism,News,Politics,Reporting,Transparency — Patrick Durusau @ 3:19 pm

Inside the fight to reveal the CIA’s torture secrets by Spencer Ackerman.

Part one: Crossing the bridge

Part two: A constitutional crisis

Part three: The aftermath

Ackerman captures the drama of a failed attempt by the United States Senate to exercise oversight on the Central Intelligence Agency (CIA) in this series.

I say “failed attempt” because even if the full 6,200+ page report is ever released, the lead Senate investigator, Daniel Jones, obscured the identities of all the responsible CIA personnel and sources of information in the report.

Even if the full report is serialized in your local newspaper, the CIA contractors and staff guilty of multiple felonies will not be one step closer to being brought to justice.

To that extent, the “full” report is itself a disservice to the American people, who elect their congressional leaders and expect them to oversee agencies such as the CIA.

From Ackerman’s account you will learn that the CIA can dictate to its overseers the location and conditions under which they view documents, decide which documents they are allowed to see, and, in cases of conflict, spy on the Senate Select Committee on Intelligence.

Does that sound like effective oversight to you?

BTW, you will also learn that members of the “most transparent administration in history” aided and abetted the CIA in preventing an effective investigation into the CIA and its torture program. I use “aided and abetted” deliberately and in their legal sense.

I mention in my header that you should support The Guardian.

This story by Spencer Ackerman is one reason.

Another reason is that given the plethora of names and transfers recited in Ackerman’s story, we need The Guardian to cover future breaks in this story.

Despite the tales of superhuman security, nobody is that good.

I leave you with the thought that if more than one person knows a secret, then it can be discovered.

Check Ackerman’s story for a starting list of those who know secrets about the CIA torture program.

Good hunting!

September 11, 2016

United States Treaties [Library of Congress] – Incomplete – Missing Native American Treaties

Filed under: Government,Law,Law - Sources — Patrick Durusau @ 8:58 pm

United States Treaties Added to the Law Library Website by Jennifer González.

From the webpage:

We have added the United States Treaty Series, compiled by Charles I. Bevans, to our online digital collection. This collection includes treaties that the United States signed with other countries from 1776 to 1949. The collection consists of 13 volumes: four volumes of multilateral treaties, eight volumes of bilateral treaties and one volume of an index.

Multilateral Treaties

Bilateral Treaties

Charles I. Bevans did not include the treaties with Native Americans listed at Treaties Between the United States and Native Americans, part of the Avalon project at Yale Law School, Lillian Goldman Law Library.

The Avalon project lists thirty treaties from 1778 to 1868, along with links to their full texts.

For your reading convenience, the list follows:


1778
  • Treaty With the Delawares
1782
  • Chickasaw Peace Treaty Feeler
1784
  • Treaty With the Six Nations
1785
  • Treaty With the Wyandot, etc.
  • Treaty With The Cherokee
1786
  • Treaty With the Chocktaw
  • Treaty With the Chickasaw
  • Treaty With the Shawnee
1789
  • Treaty With the Wyandot, etc.
  • Treaty With the Six Nations
1790
  • Treaty With the Creeks
1791
  • Treaty With the Cherokee
1794
  • Treaty With the Cherokee
  • Treaty With the Six Nations
  • Treaty With the Oneida, etc.
1795
  • Treaty of Greenville
1805
  • Chickasaw Treaty
1816
  • Treaty With the Chickasaw
1818
  • “Secret” Journal on Negotiations of the Chickasaw Treaty of 1818
  • Treaty With the Chickasaw : 1818
1826
  • Refusal of the Chickasaws and Choctaws to Cede Their Lands in Mississippi : 1826
1828
  • Treaty With The Potawatami, 1828.
1830
  • Treaty With the Chickasaw : 1830, Unratified
1832
  • Treaty With the Potawatami, 1832.
1852
  • Treaty with the Apache, July 1, 1852.
1853
  • Treaty with the Comanche, Kiowa, and Apache; July 27, 1853
1865
  • Treaty with the Cheyenne and Arapaho; October 14, 1865
  • Treaty with the Apache, Cheyenne, and Arapaho; October 17, 1865.
1867
  • Treaty With the Kiowa, Comanche, and Apache; October 21, 1867.
1868
  • Fort Laramie Treaty : 1868

You should draw your own conclusions about why these treaties were omitted from the Bevans edition. Their omission isn’t mentioned or explained in its preface.

    projectSlam [Public self-protection. Think Trojans.]

    Filed under: Cybersecurity,Security — Patrick Durusau @ 7:39 pm

    projectSlam by Michael Banks.

    From the webpage:

    Project Slam is an initiative to utilize open source programs, operating systems and tools to aid in defending against nefarious adversaries. The overall focus is to research adversary’s behavior and utilize the data that can be captured to generate wordlists, blacklists, and expose methodologies of various threat actors that can be provided back to the public in a meaningful and useful way…

    Partial data for 2016 includes:

    A medium interaction honeypot was deployed with a focus on usernames and passwords. While attackers were attacking the honeypot, projectSlam was sucking up the attempts to generate a wordlist of what NOT to make your passwords.

    Imagine that! Instead of hoarding information from a vulnerable public, or revealing only the top 10/20 worst passwords, Michael is posting online the passwords hackers are looking for!

    Looking forward to more results from projectSlam and cybersecurity projects that enable the public to protect themselves!
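    Not having projectSlam’s actual output format in front of me, here is a minimal sketch of the wordlist idea: given a log of passwords attackers threw at a honeypot (one attempt per line; the file names and format are my assumptions, not projectSlam’s), rank them by frequency into a “do not use” list.

```python
# Minimal sketch: turn captured honeypot password attempts into a ranked
# "do not use" wordlist. File names and format are assumptions for
# illustration, not projectSlam's actual output.
from collections import Counter

attempts = Counter()
with open("honeypot_password_attempts.log", encoding="utf-8", errors="replace") as f:
    for line in f:
        pw = line.rstrip("\n")
        if pw:
            attempts[pw] += 1

# Most-tried passwords first: exactly the list of what NOT to use.
with open("do_not_use_these_passwords.txt", "w", encoding="utf-8") as out:
    for pw, count in attempts.most_common():
        out.write(f"{pw}\t{count}\n")

print(f"{len(attempts)} unique passwords attempted; top 5: {attempts.most_common(5)}")
```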

    Contrast a national network of Trojan dispensers with Trojan representatives catching couples in need of a condom.

    Which one is more effective?

    Promote cyberself-protection today!

    Watch your Python script with strace

    Filed under: Profiling,Programming,Python — Patrick Durusau @ 7:21 pm

    Description:

    Modern operating systems sandbox each process inside of a virtual memory map from which direct I/O operations are generally impossible. Instead, a process has to ask the operating system every time it wants to modify a file or communicate bytes over the network. By using operating system specific tools to watch the system calls a Python script is making — using “strace” under Linux or “truss” under Mac OS X — you can study how a program is behaving and address several different kinds of bugs.

    Brandon Rhodes does a delightful presentation on using strace with Python.

    Slides for Tracing Python with strace or truss.

    I deeply enjoyed this presentation, which I discovered while looking at a Python regex issue.

    I anticipate running strace on a Python script this week and will report back on any results, or the failure to obtain them! (Unlike in academic publishing, experiments and investigations do fail.)
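    In the meantime, here is a minimal sketch of the technique, assuming a Linux box with strace installed (the script and the syscall filter are mine, not examples from Brandon’s talk):

```python
# watchme.py -- a tiny script whose system calls we can watch.
# Run it under strace on Linux, e.g.:
#
#   strace -f -e trace=openat,read,write,connect python3 watchme.py
#
# The -e filter keeps the output readable; drop it to see every call.
import urllib.request

# A plain file write shows up as openat() + write() + close().
with open("hello.txt", "w") as f:
    f.write("hello, strace\n")

# A network request shows up as socket() + connect() + send/recv calls.
with urllib.request.urlopen("http://example.com", timeout=10) as resp:
    print(resp.status, len(resp.read()), "bytes")
```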

    September 10, 2016

    Weapons of Math Destruction:… [Constructive Knowledge of Discriminatory Impact?]

    Filed under: Bias,Mathematics,Modeling — Patrick Durusau @ 8:03 pm

    Weapons of Math Destruction: invisible, ubiquitous algorithms are ruining millions of lives by Cory Doctorow.

    From the post:

    I’ve been writing about the work of Cathy “Mathbabe” O’Neil for years: she’s a radical data-scientist with a Harvard PhD in mathematics, who coined the term “Weapons of Math Destruction” to describe the ways that sloppy statistical modeling is punishing millions of people every day, and in more and more cases, destroying lives. Today, O’Neil brings her argument to print, with a fantastic, plainspoken, call to arms called (what else?) Weapons of Math Destruction.

    [Image: Weapons of Math Destruction]

    I’ve followed Cathy’s posts long enough to recommend Weapons of Math Destruction sight unseen. (Publication date September 6, 2016.)

    Warning: If you read Weapons of Math Destruction, unlike executives who choose models based on their “gut” or “instinct,” you may be charged with constructive knowledge of how your model discriminates against group X or Y.

    If, like a typical Excel user, you can honestly say “I type in the numbers here and the output comes out there,” it’s going to be hard to prove any intent to discriminate.

    You are no more responsible for a result than a pump handle is responsible for cholera.

    Doctorow’s conclusion:


    O’Neil’s book is a vital crash-course in the specialized kind of statistical knowledge we all need to interrogate the systems around us and demand better.

    That depends upon your definition of “better.”

    “Better” depends on your goals or those of a client.

    Yes?

    PS: It is important to understand models/statistics/data so you can shape results to fit your definition of “better,” while acknowledging that all results are shaped. The critical question is “What shape do you want?”
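    If you want to move past the pump handle defense, a disparate-impact check is about the cheapest place to start. A minimal sketch, assuming you have your model’s yes/no decisions grouped by a protected attribute (the toy data and the 0.8 “four-fifths” threshold are illustrative assumptions, not legal advice):

```python
# Minimal disparate-impact check on a model's approve/deny decisions.
# Toy data; the 0.8 threshold follows the common "four-fifths rule" heuristic.

def selection_rate(decisions):
    """Fraction of applicants the model approved (1 = approved, 0 = denied)."""
    return sum(decisions) / len(decisions)

by_group = {
    "group_x": [1, 0, 1, 1, 0, 1, 1, 0],
    "group_y": [0, 0, 1, 0, 0, 1, 0, 0],
}

rates = {group: selection_rate(d) for group, d in by_group.items()}
ratio = min(rates.values()) / max(rates.values())

print(rates)
print(f"disparate impact ratio: {ratio:.2f}"
      + (" -- below 0.8, worth a closer look" if ratio < 0.8 else ""))
```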

    Self-Destruct Smart Phone Feature

    Filed under: Cybersecurity,Humor — Patrick Durusau @ 3:15 pm

    The Samsung Galaxy Note 7 offers a self-destruct feature that may defeat even quantum computers: it melts itself.

    Like most new features, it’s erratic and difficult to invoke reliably. The 35 known cases don’t establish a pattern of how to make the Galaxy Note 7 explode on demand, an essential characteristic for a self-destruct feature.

    Having discovered this feature accidentally, Samsung can be expected to offer the self-destruct feature as standard on the Galaxy 8. Pricing has yet to be determined.

    😉

    From the post:

    [Image: Samsung Galaxy Note 7 fire damage, front view]

    PS: The self-destruct UI should be two buttons, say on/off plus phone. Something easy to remember and perform as you are being seized.

    September 9, 2016

    Let’s Offend Mark Zuckerberg! Napalm-Girl – Please Repost Image

    Filed under: Censorship,Free Speech — Patrick Durusau @ 11:01 am

    Facebook deletes Norwegian PM’s post as ‘napalm girl’ row escalates by Alice Ross and Julia Carrie Wong.

    [Image: Nick Ut’s “napalm girl” photograph]

    From the post:

    Facebook has deleted a post by the Norwegian prime minister in an escalating row over the website’s decision to remove content featuring the Pulitzer-prize winning “napalm girl” photograph from the Vietnam war.

    Erna Solberg, the Conservative prime minister, called on Facebook to “review its editing policy” after it deleted her post voicing support for a Norwegian newspaper that had fallen foul of the social media giant’s guidelines.

    Solberg was one of a string of Norwegian politicians who shared the iconic image after Facebook deleted a post from Tom Egeland, a writer who had included the Nick Ut picture as one of seven photographs he said had “changed the history of warfare”.

    I remember when I first saw that image during the Vietnam War. As if the suffering of the young girl wasn’t enough, the photo captures the seeming indifference of the soldiers in the background.

    This photo certainly changed the approach of the U.S. military to press coverage of wars. From TV cameras recording live footage of battles and the wounded in Vietnam, present day coverage has become highly sanitized and “safe” for any viewing audience.

    There are the obligatory shots of the aftermath of “terrorist” bombings but where is the live reporting on allied bombing of hospitals, weddings, schools and the like? Where are the shrieking wounded and death rattles?

    Too much of that and American voters might get the idea that war has real consequences, for real people. Well, war always does, but it is the profit consequences that concern military leadership and their future employers. Can’t have military spending without a war and a supposed enemy.

    Zuckerberg should not shield us, and especially not children, from the nasty side of war.

    Sanitized and “safe” reporting of wars is a recipe for the continuation of the same.

    Read more about the photo and the photographer who took it: Nick Ut’s Napalm Girl Helped End the Vietnam War. Today in L.A., He’s Still Shooting

    You can’t really tell from the photo, but the skin of the girl (Kim Phuc) was melting off in strips. That’s the reality of war that needs to be brought home to everyone who supports war to achieve abstract policy goals and objectives.

    September 8, 2016

    No Properties/No Structure – But, Subject Identity

    Filed under: Category Theory,Subject Identity,Topic Maps — Patrick Durusau @ 8:08 pm

    Jack Park has prodded me into following some category theory and data integration papers. More on that to follow but as part of that, I have been watching Bartosz Milewski’s lectures on category theory, reading his blog, etc.

    In Category Theory 1.2, Milewski goes to great lengths to emphasize:

    Objects are primitives with no properties/structure – a point

    Morphisms are primitives with no properties/structure, but do have a start and end point

    Late in that lecture, Milewski says categories are the “ultimate in data hiding” (read abstraction).

    Despite their lack of properties and structure, both objects and morphisms have subject identity.

    Yes?

    I think that is more than clever use of language and here’s why:

    If I want to talk about objects in category theory as a group subject, what can I say about them? (assuming a scope of category theory)

    1. Objects have no properties
    2. Objects have no structure
    3. Objects mark the start and end of morphisms (which distinguishes them from morphisms)
    4. Every object has an identity morphism
    5. Every pair of objects may have 0, 1, or many morphisms between them
    6. Morphisms may go in both directions, between a pair of objects
    7. An object can have multiple morphisms that start and end at it

    Incomplete and yet a lot of things to say about something that has no properties and no structure. 😉

    Bearing in mind, that’s just objects in general.

    I can also talk about a specific object at a particular time point in the lecture and at a particular screen location, which is itself a subject.

    Or an object in a paper or monograph.

    We can declare primitives, like objects and morphisms, but we should always bear in mind they are declared to be primitives.

    For other purposes, we can declare them to be otherwise.
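    To see how quickly declared primitives pick up identity, here is a toy sketch in Python (my construction, not Milewski’s): objects carry nothing but their own identity, morphisms carry only a start and an end, and yet you can already distinguish, compose and talk about them.

```python
# Toy rendering of "objects and morphisms as primitives": we record only what
# category theory lets us say -- where a morphism starts and where it ends.

class Obj:
    """An object: no properties, no structure -- identity only."""
    pass  # two Obj() instances differ solely by being distinct instances

class Morphism:
    def __init__(self, source: Obj, target: Obj):
        self.source = source   # start point
        self.target = target   # end point

def identity(obj: Obj) -> Morphism:
    """Every object has an identity morphism."""
    return Morphism(obj, obj)

def compose(g: Morphism, f: Morphism) -> Morphism:
    """g after f, defined only when f ends where g starts."""
    assert f.target is g.source, "morphisms do not line up"
    return Morphism(f.source, g.target)

a, b = Obj(), Obj()     # no properties, yet a is not b
f = Morphism(a, b)
g = Morphism(b, a)      # morphisms may run both ways between a pair of objects
h = compose(g, f)
print(h.source is a, h.target is a, a is not b)  # True True True
```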

    September 7, 2016

    New Plea: Charges Don’t Reflect Who I Am Today

    Filed under: Cybersecurity,Government,Government Data,Security — Patrick Durusau @ 3:20 pm

    Traditionally, pleas have been guilty, not guilty, not guilty by reason of insanity and nolo contendere (no contest).

    Beth Cobert, acting director at the OPM, has added a fifth plea:

    Charges Don’t Reflect Who I Am Today

    Greg Masters captures the new plea in Congressional report faults OPM over breach preparedness and response:


    While welcoming the committee’s acknowledgement of the OPM’s progress, Beth Cobert, acting director at the OPM, disagreed with the committee’s findings in a blog post published on the OPM site on Wednesday, responding that the report does “not fully reflect where this agency stands today.”
    … (emphasis added)

    Any claims about “…where this agency stands today…” are a distraction from the question of responsibility for a system-wide failure of security.

    If you know any criminal defense lawyers, suggest they quote Beth Cobert as setting a precedent for responding to allegations of prior misconduct with:

    Charges Don’t Reflect Who I Am Today

    Please forward links to news reports of successful use of that plea to my attention.

    Audio/Video Conferencing – Apache OpenMeetings

    Filed under: Education,Telecommunications,Video,Video Conferencing — Patrick Durusau @ 2:47 pm

    Apache OpenMeetings

    Ignorance of Apache OpenMeetings is the only explanation I can offer for non-Apache-OpenMeetings webinars with one presenter, listeners and a chat channel.

    Proprietary solutions limit your audience’s choice of platforms, while offering no, repeat no advantages over Apache OpenMeetings.

    It may be that your IT department is too busy creating SQLi weaknesses to install and configure Apache OpenMeetings, but even so that’s a fairly poor excuse for not using it.

    If you just have to spend money to “trust” software, there are commercial services that offer hosting and other services for Apache OpenMeetings.

    Apologies, sort of, for the Wednesday rant, but I tire of limited but “popular logo” commercial services used in place of robust open source solutions.

    September 6, 2016

    Data Provenance: A Short Bibliography

    Filed under: Data Aggregation,Data Provenance,Merging,Topic Maps,XQuery — Patrick Durusau @ 7:45 pm

    The video Provenance for Database Transformations by Val Tannen ends with a short bibliography.

    Links and abstracts for the items in Val’s bibliography:

    Provenance Semirings by Todd J. Green, Grigoris Karvounarakis, Val Tannen. (2007)

    We show that relational algebra calculations for incomplete databases, probabilistic databases, bag semantics and why-provenance are particular cases of the same general algorithms involving semirings. This further suggests a comprehensive provenance representation that uses semirings of polynomials. We extend these considerations to datalog and semirings of formal power series. We give algorithms for datalog provenance calculation as well as datalog evaluation for incomplete and probabilistic databases. Finally, we show that for some semirings containment of conjunctive queries is the same as for standard set semantics.

    Update Exchange with Mappings and Provenance by Todd J. Green, Grigoris Karvounarakis, Zachary G. Ives, Val Tannen. (2007)

    We consider systems for data sharing among heterogeneous peers related by a network of schema mappings. Each peer has a locally controlled and edited database instance, but wants to ask queries over related data from other peers as well. To achieve this, every peer’s updates propagate along the mappings to the other peers. However, this update exchange is filtered by trust conditions — expressing what data and sources a peer judges to be authoritative — which may cause a peer to reject another’s updates. In order to support such filtering, updates carry provenance information. These systems target scientific data sharing applications, and their general principles and architecture have been described in [20].

    In this paper we present methods for realizing such systems. Specifically, we extend techniques from data integration, data exchange, and incremental view maintenance to propagate updates along mappings; we integrate a novel model for tracking data provenance, such that curators may filter updates based on trust conditions over this provenance; we discuss strategies for implementing our techniques in conjunction with an RDBMS; and we experimentally demonstrate the viability of our techniques in the ORCHESTRA prototype system.

    Annotated XML: Queries and Provenance by J. Nathan Foster, Todd J. Green, Val Tannen. (2008)

    We present a formal framework for capturing the provenance of data appearing in XQuery views of XML. Building on previous work on relations and their (positive) query languages, we decorate unordered XML with annotations from commutative semirings and show that these annotations suffice for a large positive fragment of XQuery applied to this data. In addition to tracking provenance metadata, the framework can be used to represent and process XML with repetitions, incomplete XML, and probabilistic XML, and provides a basis for enforcing access control policies in security applications.

    Each of these applications builds on our semantics for XQuery, which we present in several steps: we generalize the semantics of the Nested Relational Calculus (NRC) to handle semiring-annotated complex values, we extend it with a recursive type and structural recursion operator for trees, and we define a semantics for XQuery on annotated XML by translation into this calculus.

    Containment of Conjunctive Queries on Annotated Relations by Todd J. Green. (2009)

    We study containment and equivalence of (unions of) conjunctive queries on relations annotated with elements of a commutative semiring. Such relations and the semantics of positive relational queries on them were introduced in a recent paper as a generalization of set semantics, bag semantics, incomplete databases, and databases annotated with various kinds of provenance information. We obtain positive decidability results and complexity characterizations for databases with lineage, why-provenance, and provenance polynomial annotations, for both conjunctive queries and unions of conjunctive queries. At least one of these results is surprising given that provenance polynomial annotations seem “more expressive” than bag semantics and under the latter, containment of unions of conjunctive queries is known to be undecidable. The decision procedures rely on interesting variations on the notion of containment mappings. We also show that for any positive semiring (a very large class) and conjunctive queries without self-joins, equivalence is the same as isomorphism.

    Collaborative Data Sharing with Mappings and Provenance by Todd J. Green, dissertation. (2009)

    A key challenge in science today involves integrating data from databases managed by different collaborating scientists. In this dissertation, we develop the foundations and applications of collaborative data sharing systems (CDSSs), which address this challenge. A CDSS allows collaborators to define loose confederations of heterogeneous databases, relating them through schema mappings that establish how data should flow from one site to the next. In addition to simply propagating data along the mappings, it is critical to record data provenance (annotations describing where and how data originated) and to support policies allowing scientists to specify whose data they trust, and when. Since a large data sharing confederation is certain to evolve over time, the CDSS must also efficiently handle incremental changes to data, schemas, and mappings.

    We focus in this dissertation on the formal foundations of CDSSs, as well as practical issues of its implementation in a prototype CDSS called Orchestra. We propose a novel model of data provenance appropriate for CDSSs, based on a framework of semiring-annotated relations. This framework elegantly generalizes a number of other important database semantics involving annotated relations, including ranked results, prior provenance models, and probabilistic databases. We describe the design and implementation of the Orchestra prototype, which supports update propagation across schema mappings while maintaining data provenance and filtering data according to trust policies. We investigate fundamental questions of query containment and equivalence in the context of provenance information. We use the results of these investigations to develop novel approaches to efficiently propagating changes to data and mappings in a CDSS. Our approaches highlight unexpected connections between the two problems and with the problem of optimizing queries using materialized views. Finally, we show that semiring annotations also make sense for XML and nested relational data, paving the way towards a future extension of CDSS to these richer data models.

    Provenance in Collaborative Data Sharing by Grigoris Karvounarakis, dissertation. (2009)

    This dissertation focuses on recording, maintaining and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely-coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support the capability of update exchange — which publishes a participant’s updates and then translates others’ updates to the participant’s local schema and imports them — while tolerating disagreement between them and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in their propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange, to answer a variety of user queries, about the quality, uncertainty or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases.

    To address these challenges, in this dissertation we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the ORCHESTRA prototype system. We define ProQL, a query language for provenance graphs that can be used by CDSS users to combine data querying with provenance testing as well as to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation ProQL over an RDBMS and indexing techniques to speed up provenance querying, evaluate experimentally the performance of provenance querying and the benefits of our indexing techniques.

    Provenance for Aggregate Queries by Yael Amsterdamer, Daniel Deutch, Val Tannen. (2011)

    We study in this paper provenance information for queries with aggregation. Provenance information was studied in the context of various query languages that do not allow for aggregation, and recent work has suggested to capture provenance by annotating the different database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggregate queries pose novel challenges rendering this approach inapplicable. Consequently, we propose a new approach, where we annotate with provenance information not just tuples but also the individual values within tuples, using provenance to describe the values computation. We realize this approach in a concrete construction, first for “simple” queries where the aggregation operator is the last one applied, and then for arbitrary (positive) relational algebra queries with aggregation; the latter queries are shown to be more challenging in this context. Finally, we use aggregation to encode queries with difference, and study the semantics obtained for such queries on provenance annotated databases.

    Circuits for Datalog Provenance by Daniel Deutch, Tova Milo, Sudeepa Roy, Val Tannen. (2014)

    The annotation of the results of database queries with provenance information has many applications. This paper studies provenance for datalog queries. We start by considering provenance representation by (positive) Boolean expressions, as pioneered in the theories of incomplete and probabilistic databases. We show that even for linear datalog programs the representation of provenance using Boolean expressions incurs a super-polynomial size blowup in data complexity. We address this with an approach that is novel in provenance studies, showing that we can construct in PTIME poly-size (data complexity) provenance representations as Boolean circuits. Then we present optimization techniques that embed the construction of circuits into seminaive datalog evaluation, and further reduce the size of the circuits. We also illustrate the usefulness of our approach in multiple application domains such as query evaluation in probabilistic databases, and in deletion propagation. Next, we study the possibility of extending the circuit approach to the more general framework of semiring annotations introduced in earlier work. We show that for a large and useful class of provenance semirings, we can construct in PTIME poly-size circuits that capture the provenance.

    Incomplete but a substantial starting point for exploring data provenance and its relationship/use with topic map merging.

    To get a feel for “data provenance” just prior to the earliest reference here (2007), consider A Survey of Data Provenance Techniques by Yogesh L. Simmhan, Beth Plale, Dennis Gannon, published in 2005.

    Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources.

    The provenance of data products generated by complex transformations such as workflows is of considerable value to scientists. From it, one can ascertain the quality of the data based on its ancestral data and derivations, track back sources of errors, allow automated re-enactment of derivations to update a data, and provide attribution of data sources. Provenance is also essential to the business domain where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes.

    In this paper we create a taxonomy of data provenance techniques, and apply the classification to current research efforts in the field. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. Our synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs. The survey culminates with an identification of open research problems in the field.

    Another rich source of reading material!

    Why OrientDB?

    Filed under: Marketing,OrientDB — Patrick Durusau @ 4:45 pm

    Why OrientDB?

    From the webpage:

    Understanding the strengths, limitations and trade-offs among the leading DBMS options can be DIS-ORIENTING. Developers have grown tired of making compromises in speed and flexibility or supporting several DBMS products to satisfy their use case requirements.

    Thus, OrientDB was born: the first Multi-Model Open Source NoSQL DBMS that combines the power of graphs and the flexibility of documents into one scalable, high-performance operational database.

    In addition to great software, OrientDB also has a clever marketing department:

    [Image: OrientDB tweet]

    That’s an image from an OrientDB tweet that sends you to the Why OrientDB? page.

    What’s your great image to gain attention?

    PS: I remember one from an IT zine in the 1990s where employees were racing around the office on fire. Does that ring a bell with anyone? Seems like it was one of the large format, Computer Shopper size zines.

    Why No Wild Wild West? Parity Between Large/Small Governments? Citizens?

    Filed under: Cybersecurity,Government,Politics,Security — Patrick Durusau @ 3:01 pm

    Jordyn Phelps reports in Obama Tells Putin Hackers Shouldn’t Create Cyber ‘Wild Wild West’:


    “What we cannot do is have a situation where this becomes the wild, wild West, where countries that have significant cyber capacity start engaging in unhealthy competition or conflict through these means,” the president said. He added that nations have enough to worry about in the realm of cyber attacks from non-state actors without nation-states engaging in hacking against one another.

    Interesting that weapons that don’t require a major industrial base, like poison gas, biological agents, and computer hacking, are such a pressing concern.

    Weapons that small governments, small groups of people or even single individuals can produce and use effectively, well, those need to be severely policed if not prohibited outright.

    If anything, there is too much hacking of private email accounts, celebrity nude pics, and ransomware, with too little hacking of government emails, databases and document troves.

    For example, there was a coup in Egypt (the most recent one 2013) but did you see vast quantities of diplomatic correspondence being leaked?

    I am always disappointed when governments change and a bright spotlight isn’t shone on their predecessors. Especially if those predecessors had dealings with the United States and its minions. It’s not possible to tell what might be unearthed.

    Hacking may be the great leveler between governments and between governments and their peoples.

    What’s there not to like about that?

    PS: Unless, like Obama, you are loath to share any of the wealth and power in the world.

    September 5, 2016

    Merge 5 Proxies, Take Away 1 Proxy = ? [Data Provenance]

    Filed under: Annotation,Data Provenance,Merging,Topic Maps — Patrick Durusau @ 6:45 pm

    Provenance for Database Transformations by Val Tannen. (video)

    Description:

    Database transformations (queries, views, mappings) take apart, filter, and recombine source data in order to populate warehouses, materialize views, and provide inputs to analysis tools. As they do so, applications often need to track the relationship between parts and pieces of the sources and parts and pieces of the transformations’ output. This relationship is what we call database provenance.

    This talk presents an approach to database provenance that relies on two observations. First, provenance is a kind of annotation, and we can develop a general approach to annotation propagation that also covers other applications, for example to uncertainty and access control. In fact, provenance turns out to be the most general kind of such annotation, in a precise and practically useful sense. Second, the propagation of annotation through a broad class of transformations relies on just two operations: one when annotations are jointly used and one when they are used alternatively. This leads to annotations forming a specific algebraic structure, a commutative semiring.

    The semiring approach works for annotating tuples, field values and attributes in standard relations, in nested relations (complex values), and for annotating nodes in (unordered) XML. It works for transformations expressed in the positive fragment of relational algebra, nested relational calculus, unordered XQuery, as well as for Datalog, GLAV schema mappings, and tgd constraints. Finally, when properly extended to semimodules it works for queries with aggregates. Specific semirings correspond to earlier approaches to provenance, while others correspond to forms of uncertainty, trust, cost, and access control.

    What does happen when you subtract from a merge? (Referenced here as an “aggregation.”)

    Although possible to paw through logs to puzzle out a result, Val suggests there are more robust methods at our disposal.
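    One of those methods is the semiring annotation described above: tag each source tuple, multiply tags when tuples are used jointly, add them when they are alternatives. A minimal sketch (the relations, tags, join and projection are my toy examples, not Val’s):

```python
# Toy provenance-polynomial tracking: joint use multiplies annotations,
# alternative derivations add them. "Subtracting" a source later amounts to
# setting its tag to zero in the resulting polynomial.

R = {("alice", "nyc"): "x1", ("bob", "sfo"): "x2"}                    # tuple -> tag
S = {("nyc", "us"): "y1", ("sfo", "us"): "y2", ("nyc", "eu"): "y3"}

def join(r, s):
    """Natural join on the shared column; annotations multiply,
    and alternative ways to derive the same tuple add."""
    out = {}
    for (a, b), p in r.items():
        for (b2, c), q in s.items():
            if b == b2:
                term = f"{p}*{q}"
                out[(a, c)] = f"{out[(a, c)]} + {term}" if (a, c) in out else term
    return out

def project_first(r):
    """Project onto the first column; alternative derivations add."""
    out = {}
    for (a, _c), p in r.items():
        out[(a,)] = f"{out[(a,)]} + {p}" if (a,) in out else p
    return out

joined = join(R, S)
print(joined)                 # each output tuple carries a product of tags
print(project_first(joined))  # ('alice',) now has two alternative derivations
```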

    I watched this over the weekend; be forewarned, heavy sledding ahead!

    This is an active area of research and I have only begun to scratch the surface for references.

    I may discover differently, but the “aggregation” I have seen thus far relies on opaque strings.

    Not that all uses of opaque strings are inappropriate, but imagine the power of treating a token as an opaque string for one use case and exploding that same token into key/value pairs for another.

    Enjoy!

    September 4, 2016

    Keystroke Recognition Using WiFi Signals [Identifying Users With WiFi?]

    Filed under: Cybersecurity,Security — Patrick Durusau @ 8:42 pm

    Keystroke Recognition Using WiFi Signals by Kamran Ali, Alex X. Liu, Wei Wang, and Muhammad Shahzad.

    Abstract:

    Keystroke privacy is critical for ensuring the security of computer systems and the privacy of human users as what being typed could be passwords or privacy sensitive information. In this paper, we show for the first time that WiFi signals can also be exploited to recognize keystrokes. The intuition is that while typing a certain key, the hands and fingers of a user move in a unique formation and direction and thus generate a unique pattern in the time-series of Channel State Information (CSI) values, which we call CSI-waveform for that key. In this paper, we propose a WiFi signal based keystroke recognition system called WiKey. WiKey consists of two Commercial Off-The-Shelf (COTS) WiFi devices, a sender (such as a router) and a receiver (such as a laptop). The sender continuously emits signals and the receiver continuously receives signals. When a human subject types on a keyboard, WiKey recognizes the typed keys based on how the CSI values at the WiFi signal receiver end change. We implemented the WiKey system using a TP-Link TL-WR1043ND WiFi router and a Lenovo X200 laptop. WiKey achieves more than 97.5% detection rate for detecting the keystroke and 96.4% recognition accuracy for classifying single keys. In real-world experiments, WiKey can recognize keystrokes in a continuously typed sentence with an accuracy of 93.5%.

    In discussing the limitations of their technique the authors mention:


    User Specific Training. In our current implementation of WiKey, we train the classifiers using one user and test the classifier using the test samples from the same user. However, we hypothesize that if we train our classifier using a large number of users, the trained classifier will be able to capture commonalities between users and will then be able to recognize the keystrokes of any unknown user. At the same time, we also acknowledge that it is extremely challenging to build such a universal classifier that works for almost every user because WiFi signals are susceptible to various factors such as finger length/width, typing styles, and environmental noise.

    The more interesting case would be identifying users in surveillance mode by their keystrokes, assuming persistent digital capture of their keystrokes wasn’t possible.

    Subject (as in human) identification by WiFi signals?
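    For intuition only, here is a toy nearest-waveform classifier in the spirit of WiKey, with synthetic per-key “waveforms” standing in for real CSI time series. Everything here is invented for illustration; the paper’s actual pipeline extracts real CSI and uses far better classifiers.

```python
# Toy keystroke classifier: each key gets a characteristic "waveform";
# noisy samples of it are classified by nearest training sample.
import random

random.seed(1)

def template(key):
    """A made-up shape per key -- stand-in for a real CSI waveform."""
    return [ord(key) % 7 / 10, ord(key) % 5 / 10, ord(key) % 3 / 10, ord(key) % 11 / 10]

def sample(key, noise=0.02):
    return [v + random.gauss(0, noise) for v in template(key)]

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

keys = "asdf"
training = [(k, sample(k)) for k in keys for _ in range(20)]

def classify(waveform):
    return min(training, key=lambda kw: distance(kw[1], waveform))[0]

tests = [(k, sample(k)) for k in keys for _ in range(5)]
accuracy = sum(classify(w) == k for k, w in tests) / len(tests)
print(f"toy accuracy on synthetic waveforms: {accuracy:.0%}")
```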

    Data Science Series [Starts 9 September 2016 but not for *nix users]

    Filed under: Data Science — Patrick Durusau @ 8:20 pm

    The BD2K Guide to the Fundamentals of Data Science Series

    From the webpage:


    Every Friday beginning September 9, 2016
    9am – 10am Pacific Time

    Working jointly with the BD2K Centers-Coordination Center (BD2KCCC) and the NIH Office of Data Science, the BD2K Training Coordinating Center (TCC) is spearheading this virtual lecture series on the data science underlying modern biomedical research. Beginning in September 2016, the seminar series will consist of regularly scheduled weekly webinar presentations covering the basics of data management, representation, computation, statistical inference, data modeling, and other topics relevant to “big data” biomedicine. The seminar series will provide essential training suitable for individuals at all levels of the biomedical community. All video presentations from the seminar series will be streamed for live viewing, recorded, and posted online for future viewing and reference. These videos will also be indexed as part of TCC’s Educational Resource Discovery Index (ERuDIte), shared/mirrored with the BD2KCCC, and with other BD2K resources.

    View all archived videos on our YouTube channel:
    https://www.youtube.com/channel/UCKIDQOa0JcUd3K9C1TS7FLQ


    Please join our weekly meetings from your computer, tablet or smartphone.
    https://global.gotomeeting.com/join/786506213
    You can also dial in using your phone.
    United States +1 (872) 240-3311
    Access Code: 786-506-213
    First GoToMeeting? Try a test session: http://help.citrix.com/getready

    Of course, running Ubuntu, when I follow the “First GoToMeeting? Try a test session,” I get this result:


    OS not supported

    Long-Term Fix: Upgrade your computer.

    You or your IT Admin will need to upgrade your computer’s operating system in order to install our desktop software at a later date.

    Since this is most likely a lecture format, they could just stream the video and use WebConf as a Q/A channel.

    Of course, that would mean losing the various technical difficulties, licensing fees, etc., all of which are distractions from the primary goal of the project.

    But who wants that?

    PS: Most *nix users won’t be interested except to refer others but still, over-engineered solutions to simple issues should not be encouraged.

    Plugins for Newsgathering and Verification

    Filed under: Journalism,News,Reporting — Patrick Durusau @ 7:55 pm

    7 vital browser plugins for newsgathering and verification by Alastair Reid.

    From the post:

    When breaking news can travel the world in seconds, it is important for journalists to have the tools at their disposal to get to work fast. When searching the web, what quicker way is there to have those tools available than directly in the browser window?

    Most browsers have a catalogue of programs and software to make your browsing experience more powerful, like a smartphone app store. At First Draft we find Google’s Chrome browser is the most effective but there are obviously other options available.

    Text says “five” but this has been updated to include “seven” plugins.

    One of the updates is Frame by Frame for YouTube, which, as the name says, enables frame-by-frame viewing and is touted for verification.

    I can think of a number of uses for frame-by-frame viewing. You?

    See Alastair’s post for the rest and follow @firstdraftnews to stay current on digital tools for journalists.

    Running a Tor Exit Node for fun and e-mails

    Filed under: Dark Web,Tor — Patrick Durusau @ 7:34 pm

    Running a Tor Exit Node for fun and e-mails by Antonios A. Chariton.

    From the post:


    To understand the logistics behind running a Tor Exit Node, I will tell you how I got to run my Tor Exit Node for over 8 months. Hopefully, during the process, some of your questions will be answered, and you’ll also learn some new things. Please note that this is my personal experience and I cannot guarantee it will be the same for you. Also, I must state that I have run other exit nodes in the past, as well as multiple non-exit relays and bridges.
    …

    A great first person account on running a Tor Exit Node.

    Some stats after 8 months:

    • It has been running for almost 8 months
    • It costs 4,90 EUR / month. In comparison, the same server in AWS would cost $1,122, or 992€ as of today
    • The total cost to date is 40€. In comparison, the same server in AWS would cost about 8,000€.
    • It is pushing up to 50 Mb/s, every second
    • It relayed over 70 TB of Tor traffic
    • It generated 2,729 Abuse E-Mails
    • It is only blocking port 25, and this to prevent spam
    • It helped hundreds or thousands of people to reach an uncensored Internet
    • It helped even more people browse the Internet anonymously and with privacy

    If you’re not quite up to running an exit node, consider running a Tor relay node: Add Tor Nodes For 2 White Chocolate Mochas (Venti) Per Month.

    Considering the bandwidth used by governments for immoral purposes, the observation:


    Finally, just like with everything else, we have malicious users. Not necessarily highly skilled criminals, but people in general who (ab)use the anonymity that Tor provides to commit things they otherwise wouldn’t.

    doesn’t trouble me.

    As a general rule, highly skilled or not, criminals don’t carry out air strikes against hospitals and such.

    September 3, 2016

    Predicting American Politics

    Filed under: Government,Politics,R,Statistics — Patrick Durusau @ 4:04 pm

    Presidential Election Predictions 2016 (an ASA competition) by Jo Hardin.

    From the post:

    In this election year, the American Statistical Association (ASA) has put together a competition for students to predict the exact percentages for the winner of the 2016 presidential election. They are offering cash prizes for the entry that gets closest to the national vote percentage and that best predicts the winners for each state and the District of Columbia. For more details see:

    http://thisisstatistics.org/electionprediction2016/

    To get you started, I’ve written an analysis of data scraped from fivethirtyeight.com. The analysis uses weighted means and a formula for the standard error (SE) of a weighted mean. For your analysis, you might consider a similar analysis on the state data (what assumptions would you make for a new weight function?). Or you might try some kind of model – either a generalized linear model or a Bayesian analysis with an informed prior. The world is your oyster!

    Interesting contest but it is limited to high school and college students. Separate prizes, one for high school and one for college, $200.00 each. Oh, plus ASA memberships and a 2016 Election Prediction t-shirt.

    For adults in the audience, strike up a prediction pool by state and/or for the nation.
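    If you want numbers behind your pool picks, here is a minimal sketch of the weighted-mean-plus-SE idea Jo describes, with invented poll figures and weights (the SE formula below is one common choice for a weighted mean, not necessarily the one in her analysis):

```python
# Weighted mean of poll results plus a standard error for that mean.
# Poll values and weights are invented; weights might encode sample size,
# recency, or pollster rating.
from math import sqrt

polls   = [46.0, 48.5, 44.2, 47.1]   # candidate's share in four polls (%)
weights = [1.0, 0.8, 0.5, 1.2]       # assumed reliability weights

wsum  = sum(weights)
wmean = sum(w * x for w, x in zip(weights, polls)) / wsum

# SE of the weighted mean: sqrt( sum (w_i * (x_i - wmean))^2 ) / sum(w_i)
se = sqrt(sum((w * (x - wmean)) ** 2 for w, x in zip(weights, polls))) / wsum

print(f"weighted mean: {wmean:.2f}%  standard error: {se:.2f}")
```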

    September 2, 2016

    Best and Worst Journalism of August 2016 [An Exercise]

    Filed under: Journalism,News,Reporting — Patrick Durusau @ 9:18 pm

    The best and worst journalism of August 2016 by David Uberti.

    Before you read Uberti’s post:

    Take a few minutes to find stories you recall from August and sort them into best and worst, along with your reasons.

    It’s one thing to passively go along with the judgment of others, it takes real effort to form a judgment of your own.

    Now, compare your stories to Uberti’s.

    Same, different? Were your reasons different?

    What stories did Uberti “miss?”

    PS: The boosterism of the New York Times for Iraqi militias merits a “worst” place, at least to me.

    Drunk, Distracted w/ Clearance

    Filed under: Cybersecurity — Patrick Durusau @ 4:52 pm

    As promised in Drunk, Distracted + WiFi – NFL Builds Hacker’s Wet Dream, my answer to the question:

    Where are you most likely to find distracted drunks with “clearance?”

    Air Force: Air Force has WiFi at its stadium. Distracted drunks with clearance likely at every home game.

    DATE OPPONENT
    09/03/16 Abilene Christian
    09/10/16 Georgia State
    09/24/16 @ Utah State – WiFi
    10/01/16 Navy
    10/08/16 @ Wyoming
    10/15/16 vs. New Mexico
    10/22/16 Hawaii
    10/28/16 @ Fresno State
    11/05/16 @ Army West Point
    11/12/16 Colorado State
    11/19/16 @ San Jose State – WiFi
    11/25/16 Boise State

    Army: Army has no WiFi at its stadium. Distracted drunks with clearance likely at every game but only at Duke and Air Force games will they be hackable on WiFi.

    DATE OPPONENT
    09/02/16 @ Temple
    09/10/16 Rice
    09/17/16 @ UTEP
    09/24/16 @ Buffalo
    10/08/16 @ Duke – WiFi
    10/15/16 Lafayette
    10/22/16 North Texas
    10/29/16 @ Wake Forest
    11/05/16 Air Force – WiFi
    11/12/16 vs. Notre Dame*
    11/19/16 Morgan State
    12/10/16 vs. Navy*

    Navy: Navy has no WiFi at its stadium. Distracted drunks with clearance likely at every game but only at Tulane, Air Force, East Carolina, and South Florida games will they be hackable on WiFi.

    DATE OPPONENT
    09/03/16 Fordham
    09/10/16 Connecticut
    09/17/16 @ Tulane – WiFi
    10/01/16 @ Air Force – WiFi
    10/08/16 Houston
    10/13/16 @ East Carolina – WiFi
    10/22/16 Memphis
    10/28/16 @ South Florida – WiFi
    11/05/16 vs. Notre Dame*
    11/12/16 Tulsa
    11/26/16 @ SMU
    12/10/16 vs. Army West Point*

    Notes:

    Open or “free” wireless networks can be found at or near college stadiums so the absence of “official” WiFi may not be a reliable indicator of WiFi access.

    Hacks against cellphones are available at all games, whether WiFi is present or not.

    Enjoy the game!

    PS: I don’t know where you find collections of State staff with WiFi. Polo games? Suggestions welcome.

    Journalism Drone Operations Manual

    Filed under: Journalism,News,Reporting — Patrick Durusau @ 3:12 pm

    CoJMC’s Drone Journalism Lab launches drone operations manual

    From the webpage:

    To help newsrooms get started using drones for journalism, the Drone Journalism Lab at the University of Nebraska-Lincoln is releasing the “The Drone Journalism Lab Operations Manual,” a guide that covers everything from pre-flight checklists to ethical considerations.

    A first of its kind, the manual is free, Creative Commons licensed and provided as an open source document online. The Drone Journalism Lab created it with support from the John S. and James L. Knight Foundation.

    “As journalists look to become more relevant and responsive to community needs, this manual is an important step towards experimenting with new ways of gathering and presenting news and information. It is a resource for best practices and an exciting invitation to explore a fresh, emerging area of the field,” said Shazna Nessa, Knight Foundation director for journalism.

    Dr. Maria Marron, dean of the College of Journalism and Mass Communications, praised Professor Matt Waite for producing the operations manual.

    “Matt is a key innovator in journalism,” she said. “It was his prescience about the potential for drones in journalism that made UNL’s Drone Journalism Lab the leader in the field. The operations manual will be the go-to resource for anyone interested in using drones for journalistic purposes.”

    Link for the manual: https://www.dropbox.com/sh/32pi2e2gv6huyzg/AAAwGq7b1mO5ekikCn-7JFiMa?dl=0.

    What a great resource!

    A great template for how to describe your use of drones for journalism.

    September 1, 2016

    Cop Stuff Catalog (dated, from 2014)

    Filed under: Cybersecurity,Privacy — Patrick Durusau @ 9:15 pm

    Introduction to Cobham Tactical Communications and Surveillance (PDF)

    As a world-leader in its field, providing products and integrated surveillance solutions to law enforcement, military, national security and border patrol agencies, Cobham Tactical Communications & Surveillance offers innovative video, audio, tracking, locating, sensor, and covert surveillance solutions for government and civil agencies. (from page 2 of the PDF)

    This catalog, described as “confidential” in Leaked Catalogue Reveals a Vast Array of Military Spy Gear Offered to U.S. Police, started circulating on Twitter on 1 September 2016.

    The catalog is a hoot to read but if you follow the URL at the bottom of each page, www.cobham.com/tcs, you will be taken to later, public information on the same products.

    More recent information, I might add, as the catalog does not list the High Bandwidth Mesh – P5 (PDF), which is listed on the website.

    I did not see online video concealment suggestions:

    [Image: Cobham catalog page showing video concealment options]

    So, perhaps the catalog is more useful than its date might indicate.

    I understand the emphasis on U.S. police but this type of equipment is used by governments worldwide.

    Countermeasures and/or duplicating these capabilities so the watchers can be watched are always a good idea.

    PS: The outdoor trash can looks way too clean to be plausible. Besides, there are ways to create surprises with outdoor trash cans.

    Dark Web OSINT With Python Part Three: Visualization

    Filed under: Dark Web,Open Source Intelligence,Python,Tor — Patrick Durusau @ 4:40 pm

    Dark Web OSINT With Python Part Three: Visualization by Justin.

    From the post:

    Welcome back! In this series of blog posts we are wrapping the awesome OnionScan tool and then analyzing the data that falls out of it. If you haven’t read parts one and two in this series then you should go do that first. In this post we are going to analyze our data in a new light by visualizing how hidden services are linked together as well as how hidden services are linked to clearnet sites.

    One of the awesome things that OnionScan does is look for links between hidden services and clearnet sites and makes these links available to us in the JSON output. Additionally it looks for IP address leaks or references to IP addresses that could be used for deanonymization.

    We are going to extract these connections and create visualizations that will assist us in looking at interesting connections, popular hidden services with a high number of links and along the way learn some Python and how to use Gephi, a visualization tool. Let’s get started!

    Justin tops off this great series on OnionScan by teaching the rudiments of using Gephi to visualize and explore the resulting data.
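    If you want to follow along without the full series in front of you, here is a minimal sketch of the extraction step, assuming a directory of OnionScan JSON results. The field names (“hiddenService”, “linkedSites”) and the directory name are my guesses and should be checked against what your OnionScan version actually emits.

```python
# Build a Gephi-readable graph from OnionScan JSON output using networkx.
# Field names and the results directory are assumptions for illustration.
import glob
import json

import networkx as nx

graph = nx.DiGraph()

for path in glob.glob("onionscan_results/*.json"):
    with open(path) as f:
        result = json.load(f)

    onion = result.get("hiddenService")
    if not onion:
        continue
    graph.add_node(onion, kind="hidden_service")

    for site in result.get("linkedSites") or []:
        graph.add_node(site, kind="clearnet")
        graph.add_edge(onion, site)

# GEXF is one of the formats Gephi opens directly (File > Open).
nx.write_gexf(graph, "onionscan_links.gexf")
print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")
```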

    Can you map yourself from the Dark Web to a visible site?

    If so, you aren’t hidden well enough.

    Drunk, Distracted + WiFi – NFL Builds Hacker’s Wet Dream

    Filed under: Cybersecurity — Patrick Durusau @ 2:57 pm

    With the coming of Fall in the U.S., football news is bleeding into the most technical of feeds.

    Consider How the NFL and its stadiums became leaders in Wi-Fi, monetizing apps, and customer experience by Teena Maddox.

    From the post:

    In the past two years, fan expectations have changed dramatically when it comes to connectivity and Wi-Fi in stadiums. Fans are consuming Wi-Fi bandwidth as fast as the stadiums can provide it, and their appetites seem insatiable.

    TechRepublic last covered this topic in-depth in April 2014, when we heard from industry sources that in order to keep millennials coming to live events, that generation expected fast Wi-Fi connectivity at stadiums—while others outside of that generation appreciated it, but didn’t demand it. Two years later, everyone, regardless of age, expects seamless connectivity at a game, concert, or other entertainment event.

    Out of 32 NFL teams, only two NFL stadiums, Qualcomm in San Diego and O.co Coliseum in Oakland, Calif., don’t have WiFi. (Teena says three but the latest news is that NRG in Houston now has WiFi.)

    To save you the trouble of looking them up, the following stadiums, home to thirty (30) teams, host drunk and distracted fans on WiFi:

    Arrowhead Stadium Kansas City Chiefs 76,416
    AT&T Stadium Dallas Cowboys 80,000
    Bank of America Stadium Carolina Panthers 73,778
    CenturyLink Field Seattle Seahawks 67,000
    Edward Jones Dome St. Louis Rams 66,000
    EverBank Field Jacksonville Jaguars 67,264
    FedEx Field Washington Redskins 79,000
    FirstEnergy Stadium Cleveland Browns 68,000
    Ford Field Detroit Lions 65,000
    Georgia Dome Atlanta Falcons 71,250
    Gillette Stadium New England Patriots 68,756
    Heinz Field Pittsburgh Steelers 65,500
    Lambeau Field Green Bay Packers 80,735
    Levi’s Stadium San Francisco 49ers 68,500
    Lincoln Financial Field Philadelphia Eagles 69,176
    Nissan Stadium Tennessee Titans 69,143
    Lucas Oil Stadium Indianapolis Colts 63,000
    M&T Bank Stadium Baltimore Ravens 71,008
    MetLife Stadium New York Giants/Jets 82,500
    TCF Bank Stadium* Minnesota Vikings 52,525
    NRG Stadium Houston Texans 71,500
    Paul Brown Stadium Cincinnati Bengals 65,515
    Ralph Wilson Stadium Buffalo Bills 73,967
    Raymond James Stadium Tampa Bay Buccaneers 65,890
    Soldier Field Chicago Bears 61,500
    Sports Authority Field Denver Broncos 76,125
    Sun Life Stadium Miami Dolphins 75,540
    Superdome New Orleans Saints 76,468
    University of Phoenix Stadium Arizona Cardinals 63,400

    Game day attendance may vary from the capacity figures listed.

    Remember that the rich and those with “clearance” are most likely found in box seats.

    Question (My answer tomorrow): Where are you most likely to find distracted drunks with “clearance?”
