Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 5, 2018

#ColorOurCollections

Filed under: Art,FBI,Library — Patrick Durusau @ 5:12 pm

#ColorOurCollections

From the webpage:

From February 5-9, 2018, libraries, archives, and other cultural institutions around the world are sharing free coloring sheets and books based on materials in their collections.

Something fun to start the week!

In addition to more than one hundred participating institutions, you can also find instructions for creating your own coloring pages.

Any of the images you find at Mardi Gras New Orleans will make great coloring pages (modulo non-commercial use and/or permissions as appropriate).

The same instructions will help you make “adult” coloring pages as well.

I wasn’t able to get attractive results for Pedro Berruguete’s Saint Dominic Presiding over an Auto-da-fé (1495) using the simple instructions, but will continue to play with it.

I have high hopes for an Auto-da-fé coloring page, with FBI leaders who violate the privacy of American citizens as the focal point. (There are honest, decent and valuable FBI agents, but like other groups, only the bad apples get the press.)

February 3, 2018

Mapping Militant Selfies: …Generating Battlefield Data

Filed under: Entity Extraction,Entity Resolution,Mapping,Maps — Patrick Durusau @ 4:22 pm

Mapping Militant Selfies – Application of Entity Recognition/Extraction Methods to Generate Battlefield Data in Northern Syria (video) – presentation by Akin Unver.

From the seminar description:

As the Middle East goes through one of its most historic, yet painful episodes, the fate of the region’s Kurds has drawn substantial interest. Transnational Kurdish awakening—both political and armed—has attracted unprecedented global interest as individual Kurdish minorities across four countries, Turkey, Iraq, Iran, and Syria, have begun to shake their respective political status quo in various ways. In order to analyse this trend in a region in flux, this paper introduces a new methodology in generating computerised geopolitical data. Selfies of militants from three main warring non-state actors, ISIS, YPG and FSA, through February 2014 – February 2016, were sorted and operationalized through a dedicated repository of geopolitical events, extracted from a comprehensive open source archive of Turkish, Kurdish, Arabic, and Farsi sources, and constructed using entity extraction and recognition algorithms. These selfies were cross-checked against conflict-related events, such as unrest, attacks, sabotage and bombings, and then filtered based on human-curated lists of actors and locations. The result is a focused data set of more than 2000 events (or activity nodes) with a high level of geographical and temporal granularity. This data is then used to generate a series of four heat maps based on six-month intervals. They highlight the intensity of armed group events and the evolution of multiple fronts in the border regions of Turkey, Syria, Iraq and Iran.

Great presentation that includes the goal of:

With no reliance on ‘official’ (censored) data

Unfortunately, the technical infrastructure isn’t touched upon nor were any links given. I have written to Professor Unver asking for further information.

Although Unver focuses on the Kurds, these techniques support ad-hoc battlefield data systems, bringing irregular forces to information parity with better funded adversaries.

Replace selfies with time-stamped, geo-located images of government forces, add image recognition and a little discipline, and you have the start of a highly effective force, even if badly outnumbered.
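The curation step described in the abstract, filtering extracted events against human-curated actor and event-type lists, reduces to a simple predicate. A minimal sketch (the field names and lists are my own illustration, not Unver’s pipeline):

```python
# Hypothetical curated lists and event records from entity extraction.
ACTORS = {"ISIS", "YPG", "FSA"}
EVENT_TYPES = {"unrest", "attack", "sabotage", "bombing"}

def filter_events(events):
    """Keep only events whose actor and type appear on the curated lists."""
    return [e for e in events
            if e["actor"] in ACTORS and e["type"] in EVENT_TYPES]

events = [
    {"actor": "YPG", "type": "attack", "lat": 36.9, "lon": 40.2},
    {"actor": "unknown", "type": "protest", "lat": 36.0, "lon": 39.0},
]
print(len(filter_events(events)))  # 1
```

Binning the surviving events by location and six-month window is all that remains to produce the heat maps.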

If you are interested in a more academic application of this technology, see:

Schrödinger’s Kurds: Transnational Kurdish Geopolitics In The Age Of Shifting Borders

Abstract:

As the Middle East goes through one of its most historic, yet painful episodes, the fate of the region’s Kurds have drawn substantial interest. Transnational Kurdish awakening—both political and armed—has attracted unprecedented global interest as individual Kurdish minorities across four countries, Turkey, Iraq, Iran, and Syria, have begun to shake their respective political status quo in various ways. It is in Syria that the Kurds have made perhaps their largest impact, largely owing to the intensification of the civil war and the breakdown of state authority along Kurdish-dominated northern borderlands. However, in Turkey, Iraq, and Iran too, Kurds are searching for a new status quo, using multiple and sometimes mutually defeating methods. This article looks at the future of the Kurds in the Middle East through a geopolitical approach. It begins with an exposition of the Kurds’ geographical history and politics, emphasizing the natural anchor provided by the Taurus and Zagros mountains. That anchor, history tells us, has both rendered the Kurds extremely resilient to systemic changes to larger states in their environment, and also provided hindrance to the materialization of a unified Kurdish political will. Then, the article assesses the theoretical relationship between weak states and strong non-states, and examines why the weakening of state authority in Syria has created a spillover effect on all Kurds in its neighborhood. In addition to discussing classical geopolitics, the article also reflects upon demography, tribalism, Islam, and socialism as additional variables that add and expand the debate of Kurdish geopolitics. The article also takes a big-data approach to Kurdish geopolitics by introducing a new geopolitical research methodology, using large-volume and rapid-processed entity extraction and recognition algorithms to convert data into heat maps that reveal the general pattern of Kurdish geopolitics in transition across four host countries.

A basic app should run on Tails, in memory, such that if your coordinating position is compromised, powering down (jerking out the power cord) destroys all the data.

Hmmm, encrypted delivery of processed data from a web service to the coordinator, such that their computer is only displaying data.

Other requirements?

Where Are Topic Mappers Today? Lars Marius Garshol

Filed under: Games,PageRank — Patrick Durusau @ 11:37 am

Some are creating new children’s games:

If you’re interested, Ian Rogers has a complete explanation with examples at: The Google Pagerank Algorithm and How It Works or a different take with a table of approximate results at: RITE Wiki: Page Rank.
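For reference, the standard simplified PageRank iteration that Rogers walks through can be sketched in a few lines (damping factor 0.85 is the usual default):

```python
def pagerank(links, d=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Each page receives a share of the rank of every page linking to it.
        pr = {p: (1 - d) + d * sum(pr[q] / len(links[q])
                                   for q in pages if p in links[q])
              for p in pages}
    return pr

# Two pages linking only to each other settle at rank 1.0 each.
print(pagerank({"A": ["B"], "B": ["A"]}))
```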

Unfortunately, both Garshol and Wikipedia’s PageRank page state the Google PageRank algorithm incorrectly.

The correct formulation reads:

The results of the reported algorithm are divided by U.S. Government Interference, an unknown quantity.

Perhaps that is why Google keeps its pagerank calculation secret. If I were an allegedly sovereign nation, I would keep Google’s lapdog relationship to the U.S. government firmly in mind.

IDA v7.0 Released as Freeware – Comparison to The IDA Pro Book?

Filed under: Cybersecurity,Hacking,Programming — Patrick Durusau @ 9:04 am

IDA v7.0 Released as Freeware

From the download page:

The freeware version of IDA v7.0 has the following limitations:

  • no commercial use is allowed
  • lacks all features introduced in IDA > v7.0
  • lacks support for many processors, file formats, debugging etc…
  • comes without technical support

Copious amounts of documentation are online.

I haven’t seen The IDA Pro Book by Chris Eagle, but it was last published in 2011. Do you know anyone who has compared The IDA Pro Book to version 7.0?

Two promising pages: IDA Support Overview and IDA Support: Links (external).

February 2, 2018

PubMed Commons to be Discontinued

Filed under: Bioinformatics,Medical Informatics,PubMed,Social Media — Patrick Durusau @ 5:10 pm

PubMed Commons to be Discontinued

From the post:

PubMed Commons has been a valuable experiment in supporting discussion of published scientific literature. The service was first introduced as a pilot project in the fall of 2013 and was reviewed in 2015. Despite low levels of use at that time, NIH decided to extend the effort for another year or two in hopes that participation would increase. Unfortunately, usage has remained minimal, with comments submitted on only 6,000 of the 28 million articles indexed in PubMed.

While many worthwhile comments were made through the service during its 4 years of operation, NIH has decided that the low level of participation does not warrant continued investment in the project, particularly given the availability of other commenting venues.

Comments will still be available, see the post for details.

Good time for the reminder that even negative results from an experiment are valuable.

Even more so in this case, because discussion/comment facilities are non-trivial components of a content delivery system. Time and resources not spent on comment facilities can be directed elsewhere.

Where do discussions of medical articles take place and can they be used to automatically annotate published articles?

The Unix Workbench

Filed under: Linux OS — Patrick Durusau @ 2:47 pm

By Sean Kross. Unlikely to help you, but a great resource to pass along to new Unix users.

Some day, Microsoft will complete the long transition to Unix. Start today and you will arrive years ahead of it. 😉

Discrediting the FBI?

Filed under: FBI,Government — Patrick Durusau @ 2:27 pm

Whatever your opinion of the accidental U.S. president (that’s a dead giveaway), what does it mean to “discredit” the FBI?

Just hitting the high points:

The FBI has a long history of lying and abuse, these being only some of the more recent examples.

So my question remains: What does it mean to “discredit” the FBI?

The FBI and its agents are unworthy of any belief by anyone. Their own records and admissions are a story of staggering from one lie to the next.

I’ll grant the FBI is large enough that honorable, hard-working, honest agents must exist. But not enough of them to prevent the repeated failures at the FBI.

Anyone who credits any FBI investigation has motivations other than the factual record of the FBI.

PS: The Nunes memo confirms what many have long suspected about the FISA court: It exercises no more meaningful oversight over FISA warrants than a physical rubber stamp would in their place.

How To Secure Sex Toys – End to End (so to speak)

Filed under: Cybersecurity,Hacking,Security — Patrick Durusau @ 1:40 pm

Thursday began innocently enough and then I encountered:

The tumult of articles started (I think) with: Internet of Dildos: A Long Way to a Vibrant Future – From IoT to IoD, covering security flaws in Vibratissimo PantyBuster, MagicMotion Flamingo, and Realov Lydia, which reads in part:


The results are the foundations for a Master thesis written by Werner Schober in cooperation with SEC Consult and the University of Applied Sciences St. Pölten. The first available results can be found in the following chapters of this blog post.

The sex toys of the “Vibratissimo” product line and their cloud platform, both manufactured and operated by the German company Amor Gummiwaren GmbH, were affected by severe security vulnerabilities. The information we present is not only relevant from a technological perspective, but also from a data protection and privacy perspective. The database containing all the customer data (explicit images, chat logs, sexual orientation, email addresses, passwords in clear text, etc.) was basically readable for everyone on the internet. Moreover, an attacker was able to remotely pleasure individuals without their consent. This could be possible if an attacker is nearby a victim (within Bluetooth range), or even over the internet. Furthermore, the enumeration of explicit images of all users is possible because of predictable numbers and missing authorization checks.

Other coverage of the vulnerability includes:

Vibratissimo product line (includes the PantyBuster).

The cited coverage doesn’t answer the question of how to incentivize end-to-end encrypted sex toys.

Here’s one suggestion: Buy the PantyBuster or other “smart” sex toys in bulk. Re-ship these sex toys, after duly noting their serial numbers and other access information, to your government representatives, sports or TV figures, judges, military officers, etc. People whose privacy matters to the government.

If someone were to post a list of such devices, well, you can imagine the speed with which sex toys will be required to be encrypted in your market.

Some people see vulnerabilities and see problems.

I see the same vulnerabilities and see endless possibilities.

Weird Machines, exploitability, and proven unexploitability – Video

Filed under: Cybersecurity,Hacking,Security — Patrick Durusau @ 10:32 am

Thomas Dullien/Halvar Flake’s presentation Weird Machines, exploitability, and proven unexploitability won’t embed but you can watch it on Vimeo.

Great presentation of the paper I mentioned at: Weird machines, exploitability, and provable unexploitability.

Includes this image of a “MitiGator:”

Views “software as an emulator for the finite state machine I would like to have.” (rough paraphrase)
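That view can be made concrete: enumerate the transitions you intended, and treat everything reachable outside them as a “weird” state. A toy sketch (the turnstile states and events are my illustration, not Dullien’s example):

```python
# Transitions the designer intended: a coin-operated turnstile.
INTENDED = {
    ("locked", "coin"): "unlocked",
    ("unlocked", "push"): "locked",
}

def step(state, event):
    # Any (state, event) pair outside the intended table lands in a
    # "weird" state: reachable by the real machine, never designed for.
    return INTENDED.get((state, event), "weird")

print(step("locked", "coin"))    # unlocked
print(step("unlocked", "coin"))  # weird
```

Exploitation, in this framing, is programming the weird states; proving unexploitability means exhausting them.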

Another gem: attackers don’t distinguish between data and programming:

OK, one more gem and you have to go watch the video:

Proof of unexploitability:

Mostly rote exhaustion of the possible weird state transitions.

The example used is “several orders of magnitude” less complicated than most software. Possible to prove but difficult even with simple examples.

Definitely a “watch this space” field of computer science.

Appendices with code: http://www.dullien.net/thomas/weird-machines-exploitability.pdf

February 1, 2018

NSA Exploits – Mining Malware – Ethics Question

Filed under: Cybersecurity,Ethics,Hacking,NSA,Security — Patrick Durusau @ 9:24 pm

New Monero mining malware infected 500K PCs by using 2 NSA exploits

From the post:

It looks like the craze of cryptocurrency mining is taking over the world by storm as every new day there is a new malware targeting unsuspecting users to use their computing power to mine cryptocurrency. Recently, the IT security researchers at Proofpoint have discovered a Monero mining malware that uses leaked NSA (National Security Agency) EternalBlue exploit to spread itself.

The post also mentions use of the NSA exploit, EsteemAudit.

A fair number of leads and worth your time to read in detail.

I suspect most of the data science ethics crowd will down vote the use of NSA exploits (EternalBlue, EsteemAudit) for cryptocurrency mining.

Here’s a somewhat harder data science ethics question:

Is it ethical to infect 500,000+ Windows computers belonging to a government for the purpose of obtaining internal documents?

Does your answer depend upon which government and what documents?

Governments don’t take your rights into consideration. Should you take their laws into consideration?

George “Machine Gun” Kelly (Bank Commissioner), DJ Patil (Data Science Ethics)

Filed under: Data Science,Ethics — Patrick Durusau @ 9:04 pm

A Code of Ethics for Data Science by DJ Patil. (Former U.S. Chief Data Scientist)

From the post:


With the old adage that with great power comes great responsibility, it’s time for the data science community to take a leadership role in defining right from wrong. Much like the Hippocratic Oath defines Do No Harm for the medical profession, the data science community must have a set of principles to guide and hold each other accountable as data science professionals. To collectively understand the difference between helpful and harmful. To guide and push each other in putting responsible behaviors into practice. And to help empower the masses rather than to disenfranchise them. Data is such an incredible lever arm for change, we need to make sure that the change that is coming, is the one we all want to see.

So how do we do it? First, there is no single voice that determines these choices. This MUST be community effort. Data Science is a team sport and we’ve got to decide what kind of team we want to be.

Consider the specifics of Patil’s regime (2015-2017), when government data scientists:

  • Mined information on U.S. citizens. (check)
  • Mined information on non-U.S. citizens. (check)
  • Hacked computer systems of both citizens and non-citizens. (check)
  • Spread disinformation both domestically and abroad. (check)

Unless you want to resurrect George “Machine Gun” Kelly to be your banking commissioner, Patil is a poor choice to lead a charge on ethics.

Despite violations of U.S. law during his tenure as U.S. Chief Data Scientist, Patil was responsible for NO prosecutions, investigations or even whistle-blowing on a single government data scientist.

Patil’s lemming traits come to the fore when he says:


And finally, our democratic systems have been under attack using our very own data to incite hate and sow discord.

Patil ignores two very critical aspects of that claim:

  1. There has been no, repeat no, forensic evidence released to support that claim. All that supports it are claims by people who say they have seen something, but can’t say what.
  2. The United States (that would be us), has tried to overthrow governments seventy-two times during the Cold War. Sometimes the U.S. has succeeded. Posts on Twitter and Facebook pale by comparison.

Don’t mistake Patil’s use of the term “ethics” as meaning what you mean by “ethics.” Based on his prior record and his post, you can guess that Patil’s “ethics” gives a wide berth to abusive governments and corporations.

January 31, 2018

Python’s One Hundred and Thirty-Nine Week Lectionary Cycle

Filed under: Programming,Python — Patrick Durusau @ 7:41 pm

Python 3 Module of the Week by Doug Hellmann

From the webpage:

PyMOTW-3 is a series of articles written by Doug Hellmann to demonstrate how to use the modules of the Python 3 standard library….

Hellmann documents one hundred and thirty-nine (139) modules in the Python standard library.

How many of them can you name?

To improve your score, use Hellmann’s list as a one hundred and thirty-nine (139) week lectionary cycle on Python.

Some modules may take less than a week, but some, re — Regular Expressions, will take more than a week.
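A taste of why re earns the extra time: even a tiny pattern bundles several ideas (character classes, quantifiers, anchored repetition):

```python
import re

# Pull ISO dates out of free text: \d{4} is four digits, etc.
text = "Released 2018-01-31, updated 2018-02-05."
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(dates)  # ['2018-01-31', '2018-02-05']
```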

Even if you don’t finish a longer module, push on after two weeks so you can keep that feeling of progress and encountering new material.

GraphDBLP [“dblp computer science bibliography” as a graph]

Filed under: Computer Science,Graphs,Neo4j,Networks — Patrick Durusau @ 3:30 pm

GraphDBLP: a system for analysing networks of computer scientists through graph databases by Mario Mezzanzanica, et al.

Abstract:

This paper presents GraphDBLP, a system that models the DBLP bibliography as a graph database for performing graph-based queries and social network analyses. GraphDBLP also enriches the DBLP data through semantic keyword similarities computed via word-embedding. In this paper, we discuss how the system was formalized as a multi-graph, and how similarity relations were identified through word2vec. We also provide three meaningful queries for exploring the DBLP community to (i) investigate author profiles by analysing their publication records; (ii) identify the most prolific authors on a given topic, and (iii) perform social network analyses over the whole community. To date, GraphDBLP contains 5+ million nodes and 24+ million relationships, enabling users to explore the DBLP data by referencing more than 3.3 million publications, 1.7 million authors, and more than 5 thousand publication venues. Through the use of word-embedding, more than 7.5 thousand keywords and related similarity values were collected. GraphDBLP was implemented on top of the Neo4j graph database. The whole dataset and the source code are publicly available to foster the improvement of GraphDBLP in the whole computer science community.

Although the article is behind a paywall, GraphDBLP as a tool is not! https://github.com/fabiomercorio/GraphDBLP.

From the webpage:

GraphDBLP is a tool that models the DBLP bibliography as a graph database for performing graph-based queries and social network analyses.

GraphDBLP also enriches the DBLP data through semantic keyword similarities computed via word-embedding.

GraphDBLP provides to users three meaningful queries for exploring the DBLP community:

  1. investigate author profiles by analysing their publication records;
  2. identify the most prolific authors on a given topic;
  3. perform social network analyses over the whole community;
  4. perform shortest-paths over DBLP (e.g., the shortest-path between authors, the analysis of co-author networks, etc.)

… (emphasis in original)
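The word-embedding enrichment mentioned above comes down to cosine similarity between keyword vectors. A minimal sketch (the toy 2-d vectors are my own; real word2vec embeddings have hundreds of dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Identical directions score 1.0, orthogonal directions 0.0.
print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

In GraphDBLP, pairs of keywords whose vectors score above a threshold get a similarity relationship in the graph.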

Sorry to see author, title, venue, publication and keyword all as flat strings. Disappointing, but not uncommon.

Treating these flat strings as parts of structured representatives would have to be layered on top of this default.

Not to minimize the importance of improving the usefulness of the dblp, but imagine integrating the GraphDBLP into your local library system. Without a massive data mapping project. That’s what lies just beyond the reach of this data project.

AutoSploit

Filed under: Cybersecurity,Hacking,Security — Patrick Durusau @ 11:41 am

AutoSploit

From the webpage:

As the name might suggest AutoSploit attempts to automate the exploitation of remote hosts. Targets are collected automatically as well by employing the Shodan.io API. The program allows the user to enter their platform specific search query such as; Apache, IIS, etc, upon which a list of candidates will be retrieved.

After this operation has been completed the ‘Exploit’ component of the program will go about the business of attempting to exploit these targets by running a series of Metasploit modules against them. Which Metasploit modules will be employed in this manner is determined by programmatically comparing the name of the module to the initial search query. However, I have added functionality to run all available modules against the targets in a ‘Hail Mary’ type of attack as well.

The available Metasploit modules have been selected to facilitate Remote Code Execution and to attempt to gain Reverse TCP Shells and/or Meterpreter sessions. Workspace, local host and local port for MSF facilitated back connections are configured through the dialog that comes up before the ‘Exploit’ component is started.

Operational Security Consideration

Receiving back connections on your local machine might not be the best idea from an OPSEC standpoint. Instead consider running this tool from a VPS that has all the dependencies required, available.
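The module-selection logic the quote describes, matching module names against the search query with a “Hail Mary” override, can be sketched as follows (the module names are invented; this is not AutoSploit’s actual code):

```python
def select_modules(modules, query, hail_mary=False):
    """Pick modules whose names mention the search query,
    or everything in 'Hail Mary' mode."""
    if hail_mary:
        return list(modules)
    q = query.lower()
    return [m for m in modules if q in m.lower()]

modules = ["exploit/windows/iis/foo", "exploit/linux/apache/bar"]
print(select_modules(modules, "Apache"))  # ['exploit/linux/apache/bar']
```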

What a great day to be alive!

“Security experts,” such as Richard Bejtlich, @taosecurity, are already crying:

There is no need to release this. The tie to Shodan puts it over the edge. There is no legitimate reason to put mass exploitation of public systems within the reach of script kiddies. Just because you can do something doesn’t make it wise to do so. This will end in tears.

The same “security experts” who never complain about script kiddies who work for the CIA, for example.

Script kiddies at the CIA? Sure! Who do you think uses the tools described in: Vault7: CIA Hacking Tools Revealed, Vault 7: ExpressLane, Vault 7: Angelfire, Vault 7: Protego, Vault 8: Hive?

You didn’t think CIA staff only use tools they develop themselves from scratch did you? Neither do “security experts,” even ones capable of replicating well known tools and exploits.

So why the complaints, present and forthcoming, from “security experts”?

Well, for one thing, they are no longer special guardians of secret knowledge.

Ok, in practical economic terms, AutoSploit means any business, corporation or individual can run a robust penetration test against their own systems.

You don’t need a “security expert,” with all the hoarded knowledge and expertise, for the task.

Considering “security experts” as a class (with notable exceptions) have sided with governments and corporations for decades, any downside for them is just an added bonus.

Email Address Vacuuming – Infoga

Filed under: Email,Hacking — Patrick Durusau @ 11:06 am

Infoga – Email Information Gathering

From the post:

Infoga is a tool for gathering e-mail accounts information (ip,hostname,country,…) from different public sources (search engines, pgp key servers). Is a really simple tool, but very effective for the early stages of a penetration test or just to know the visibility of your company in the Internet.

It’s not COMINT:

COMINT or communications intelligence is intelligence gained through the interception of foreign communications, excluding open radio and television broadcasts. It is a subset of signals intelligence, or SIGINT, with the latter being understood as comprising COMINT and ELINT, electronic intelligence derived from non-communication electronic signals such as radar. (COMINT (Communications Intelligence))

as practiced by the NSA, but that doesn’t keep it from being useful.

Not gathering useless data means a smaller haystack and a greater chance of finding needles.
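The first stage of such a tool, pulling address-shaped strings out of fetched pages, is a few lines of Python (a rough sketch of the idea, not Infoga’s implementation):

```python
import re

# Loose email pattern: local part, @, domain with a TLD.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(pages):
    """Collect unique email addresses from raw page text."""
    found = set()
    for html in pages:
        found.update(EMAIL.findall(html))
    return found

pages = ["<a href='mailto:alice@example.com'>mail</a>",
         "Contact bob@example.org for details."]
print(sorted(harvest(pages)))  # ['alice@example.com', 'bob@example.org']
```

Infoga’s value-add is in where it fetches from (search engines, PGP key servers) and the enrichment (IP, hostname, country) applied afterwards.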

Other focused information mining tools you would recommend?

Don’t Mix Public and Dark Web Use of A Bitcoin Address

Filed under: Cybersecurity,Dark Web,Privacy,Security — Patrick Durusau @ 10:30 am

Bitcoin payments used to unmask dark web users by John E Dunn.

From the post:

Researchers have discovered a way of identifying those who bought or sold goods on the dark web, by forensically connecting them to Bitcoin transactions.

It sounds counter-intuitive. The dark web comprises thousands of hidden services accessed through an anonymity-protecting system, usually Tor.

Bitcoin transactions, meanwhile, are supposed to be pseudonymous, which is to say visible to everyone but not in a way that can easily be connected to someone’s identity.

If you believe that putting these two technologies together should result in perfect anonymity, you might want to read When A Small Leak Sinks A Great Ship to hear some bad news:

Researchers matched Bitcoin addresses found on the dark web with those found on the public web. Depending on the amount of information on the public web, the matches identified named individuals.
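The matching step itself is trivial to sketch: extract address-shaped strings from both corpora and intersect (a simplified Base58 pattern for legacy addresses; the researchers’ actual pipeline is more involved):

```python
import re

# Simplified pattern for legacy (P2PKH/P2SH) Bitcoin addresses.
ADDR = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")

def addresses(text):
    return set(ADDR.findall(text))

public_page = "Donate: 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2"
onion_page = "Send 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 to the vendor"

# Any address appearing in both corpora links the two personas.
linked = addresses(public_page) & addresses(onion_page)
print(linked)
```

One reused address is enough to collapse the separation between a public identity and a dark web persona.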

Black Letter Rule: Maintain separate Bitcoin accounts for each online persona.

Black Letter Rule: Never use a public persona on the dark web or a dark web persona on the public web.

Black Letter Rule: Never make Bitcoin transactions between public and dark web personas.

Remind yourself of basic OpSec rules every day.

January 30, 2018

Better OpSec – Black Hat Webcast – Thursday, February 15, 2018 – 2:00 PM EST

Filed under: Cybersecurity,Security — Patrick Durusau @ 8:38 pm

How the Feds Caught Russian Mega-Carder Roman Seleznev by Norman Barbosa and Harold Chun.

From the webpage:

How did the Feds catch the notorious Russian computer hacker Roman Seleznev – the person responsible for over 400 point of sale hacks and at least $169 million in credit card fraud? What challenges did the government face piecing together the international trail of electronic evidence that he left? How was Seleznev located and ultimately arrested?

This presentation will review the investigation that will include a summary of the electronic evidence that was collected and the methods used to collect that evidence. The team that convicted Seleznev will show how that evidence of user attribution was used to finger Seleznev as the hacker and infamous credit card broker behind the online nicks nCuX, Track2, Bulba and 2Pac.

The presentation will also discuss efforts to locate Seleznev, a Russian national, and apprehend him while he vacationed in the Maldives. The presentation will also cover the August 2016 federal jury trial with a focus on computer forensic issues, including how prosecutors used Microsoft Windows artifacts to successfully combat Seleznev’s trial defense.

If you want to improve your opsec, study hackers who have been caught.

Formally, it’s called avoiding survivorship bias. See Survivorship bias – lessons from World War Two aircraft by Nick Ingram.

Abraham Wald was tasked with deciding where to add extra armour to improve the survival of airplanes in combat. See Abraham Wald and the Missing Bullet Holes (an excerpt from How Not To Be Wrong by Jordan Ellenberg).

It’s a great story and one you should remember.

Combating State of the Uniom Brain Damage – Malware Reversing – Burpsuite Keygen

Filed under: Cybersecurity,Hacking,Malware,Reverse Engineering — Patrick Durusau @ 5:43 pm

Malware Reversing – Burpsuite Keygen by @lkw.

From the post:

Some random new “user” called @the_heat_man posted some files on the forums multiple times (after being deleted by mods) claiming it was a keygen for burpsuite. Many members of these forums were suspicious of it being malware. I, along with @Leeky, @dtm, @Cry0l1t3 and @L0k1 (please let me know if I missed anyone) decided to reverse engineer it to see if it is. Surprisingly, as well as containing a remote access trojan (RAT), it actually contains a working keygen. As such, for legal reasons I have not included a link to the original file.

The following is a writeup of the analysis of the RAT.

In the event you, a friend, or a family member is accidentally exposed to the State of the Uniom speech, permanent brain damage can be avoided by repeated exposure to intellectually challenging material over an extended time period.

With that in mind, I mention Malware Reversing – Burpsuite Keygen.

Especially challenging if you aren’t familiar with reverse engineering, but the extra work of understanding each step will exercise your brain that much harder.

How serious can the brain damage be?

A few tweets from POTUS, and multiple sources report Democratic Senators and Representatives extolling the FBI as a bulwark of democracy.

Really? The same FBI that infiltrated civil rights groups, anti-war protesters, 9/11 defense, the Black Panthers, the SCLC, etc. That FBI? The same FBI that continues such activities to this very day?

A few tweets produce that level of brain dysfunction. Imagine the impact of 20 to 30 continuous minutes of exposure.

State of the Uniom is scheduled for 9 PM EST on 30 January 2018.

Readers are strongly advised to turn off all TVs and radios, to minimize the chances of accidental exposure to the State of the Uniom or repetition of the same. The New York Times will be streaming it live on its website. I have omitted that URL for your safety.

Safe activities include reading a book, consensual sex, knitting, baking, board games and crossword puzzles, to name only a few. Best of luck to us all.

January 29, 2018

Have You Been Drafted by Data Science Ethics?

Filed under: Data Science,Ethics — Patrick Durusau @ 8:25 pm

I ask because Strava‘s recent heatmap release (Fitness tracking app Strava gives away location of secret US army bases) is being used as a platform to urge unpaid consideration of government and military interests by data scientists.

Consider Ray Crowell‘s Strava Heatmaps: Why Ethics in Design Matters which presumes data scientists have an unpaid obligation to consider the interests of the military:

From the post:


These organizations have been warned for years (including by myself) of the information/operational security (specifically with pattern of life, that is, the data collected and analyzed establish an individual’s past behavior, determine their current behavior, and predict their future behavior) implications associated with social platforms and advanced analytical technology. I spent my career stabilizing this intersection between national security and progress — having a deep understanding of the protection of lives, billion-dollar weapon systems, and geopolitical assurances and on the other side, the power of many of these technological advancements in enabling access to health and wellness for all.

Getting at this balance requires us to not get enamored by the idea or implications of ethically sound solutions, but rather exposing our design practices to ethical scrutiny.

These tools are not only beneficial for the designer, but for the user as well. I mention these specifically for institutions like the Defense Department, impacted from the Strava heatmap and frankly many other technologies being employed both sanctioned and unsanctioned by military members and on military installations. These tools are beneficial the institution’s leadership to “reverse engineer” what technologies on the market can do by way of harm … in balance with the good. I learned a long time ago, from wiser mentors than myself, that you don’t know what you’re missing, if you’re not looking to begin with.

Crowell imposes on any unsuspecting reader/data scientist an unpaid ethical obligation to consider their impact on government or military organizations.

In that effort, Crowell is certainly not alone:

If you contract to work for a government or military group, you owe them an ethical obligation of your best efforts. Just as for any other client.

However, volunteering unpaid assistance for military or government organizations, damages the market for data scientists.

Now that’s unethical!

PS: I agree there are ethical obligations to consider the impact of your work on disenfranchised, oppressed or abused populations. Governments and military organizations don’t qualify as any of those.

January 24, 2018

‘Learning to Rank’ (No Unique Feature Name Fail – Update)

Filed under: Artificial Intelligence,ElasticSearch,Ranking,Searching — Patrick Durusau @ 8:02 pm

Elasticsearch ‘Learning to Rank’ Released, Bringing Open Source AI to Search Teams

From the post:

Search experts at OpenSource Connections, the Wikimedia Foundation, and Snagajob, deliver open source cognitive search capabilities to the Elasticsearch community. The open source Learning to Rank plugin allows organizations to control search relevance ranking with machine learning. The plugin is currently delivering search results at Wikipedia and Snagajob, providing significant search quality improvements over legacy solutions.

Learning to Rank lets organizations:

  • Directly optimize sales, conversions and user satisfaction in search
  • Personalize search for users
  • Drive deeper insights from a knowledge base
  • Customize ranking down for complex nuance
  • Avoid the sticker shock & lock-in of a proprietary "cognitive search" product

“Our mission is to empower search teams. This plugin gives teams deep control of ranking, allowing machine learning models to be directly deployed to the search engine for relevance ranking” said Doug Turnbull, author of Relevant Search and CTO, OpenSource Connections.

I need to work through all the documentation and examples but:

Feature Names are Unique

Because some model training libraries refer to features by name, Elasticsearch LTR enforces unique names for each features. In the example above, we could not add a new user_rating feature without creating an error.

is a warning of what you (and I) are likely to find.

Really? Someone involved in the design thought globally unique feature names were a good idea? Or at a minimum didn’t realize they are a very bad idea?

Scope anyone? Either in the programming or topic map sense?

Despite the unique feature name fail, I’m sure ‘Learning to Rank’ will be useful. But not as useful as it could have been.

Doug Turnbull (https://twitter.com/softwaredoug) advises that features are scoped by feature stores, so the correct prose would read: “…LTR enforces unique names for each feature within a feature store.”

No fail, just bad writing.
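The scoping Turnbull describes can be sketched with a minimal in-memory model. This is illustrative only, not the plugin’s actual API: two stores can each hold their own `user_rating` feature, while a duplicate within one store is rejected.

```python
# Sketch (not the real Elasticsearch LTR API): feature names are unique
# only within a feature store, so separate stores may reuse a name.

class FeatureStore:
    def __init__(self, name):
        self.name = name
        self.features = {}

    def add_feature(self, feature_name, template):
        # Enforce uniqueness per store, mirroring the plugin's rule.
        if feature_name in self.features:
            raise ValueError(
                f"feature {feature_name!r} already exists in store {self.name!r}")
        self.features[feature_name] = template

movies = FeatureStore("movies")
products = FeatureStore("products")

movies.add_feature("user_rating", {"match": {"rating_field": "{{query}}"}})
products.add_feature("user_rating", {"match": {"stars": "{{query}}"}})  # no clash

try:
    movies.add_feature("user_rating", {"match": {"other": "{{query}}"}})
except ValueError as e:
    print(e)  # duplicate within the same store is rejected
```

The store names and query templates above are invented for the example; only the uniqueness rule comes from the plugin’s documentation.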

Eset’s Guide to DeObfuscating and DeVirtualizing FinFisher

Filed under: Cybersecurity,Hacking — Patrick Durusau @ 5:38 pm

Eset’s Guide to DeObfuscating and DeVirtualizing FinFisher

From the introduction:

Thanks to its strong anti-analysis measures, the FinFisher spyware has gone largely unexplored. Despite being a prominent surveillance tool, only partial analyses have been published on its more recent samples.

Things were put in motion in the summer of 2017 with ESET’s analysis of FinFisher surveillance campaigns that ESET had discovered in several countries. In the course of our research, we have identified campaigns where internet service providers most probably played the key role in compromising the victims with FinFisher.

When we started thoroughly analyzing this malware, the main part of our effort was overcoming FinFisher’s anti-analysis measures in its Windows versions. The combination of advanced obfuscation techniques and proprietary virtualization makes FinFisher very hard to de-cloak.

To share what we learnt in de-cloaking this malware, we have created this guide to help others take a peek inside FinFisher and analyze it. Apart from offering practical insight into analyzing FinFisher’s virtual machine, the guide can also help readers to understand virtual machine protection in general – that is, proprietary virtual machines found inside a binary and used for software protection. We will not be discussing virtual machines used in interpreted programming languages to provide compatibility across various platforms, such as the Java VM.

We have also analyzed Android versions of FinFisher, whose protection mechanism is based on an open source LLVM obfuscator. It is not as sophisticated or interesting as the protection mechanism used in the Windows versions, thus we will not be discussing it in this guide.

Hopefully, experts from security researchers to malware analysts will make use of this guide to better understand FinFisher’s tools and tactics, and to protect their customers against this omnipotent security and privacy threat.

Beyond me at the moment but one should always try to learn from the very best. Making note of what can’t be understood/used today in hopes of revisiting it in the future.
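The “proprietary virtual machine” idea in the quote above can be illustrated with a toy interpreter. This is nothing like FinFisher’s actual design: the point is only that protected logic is recompiled into a custom bytecode, so a disassembler sees the interpreter loop rather than the logic itself.

```python
# Toy illustration of VM-based software protection: the "real" program
# lives in a custom bytecode that only this embedded interpreter runs.

PUSH, ADD, XOR_KEY, HALT = range(4)

def run(bytecode, key):
    stack, pc = [], 0
    while True:
        op = bytecode[pc]; pc += 1
        if op == PUSH:
            stack.append(bytecode[pc]); pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == XOR_KEY:  # opcode semantics invisible to static analysis
            stack.append(stack.pop() ^ key)
        elif op == HALT:
            return stack.pop()

# "(2 + 3) xor key" expressed in the custom bytecode:
program = [PUSH, 2, PUSH, 3, ADD, XOR_KEY, HALT]
print(run(program, key=0x5A))  # -> 95
```

De-virtualizing a real protector means recovering the opcode table and translating programs like this back into native semantics, which is the work Eset’s guide walks through.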

Numerous reports describe FinFisher as spyware sold exclusively to governments and their agencies. Perhaps less “exclusively” than previously thought.

In any event, FinFisher is reported to be in the wild, so perhaps governments that bought FinFisher will be uncovered by FinFisher.

A more deserving group of people is hard to imagine.

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Filed under: Adversarial Learning,Speech Recognition — Patrick Durusau @ 4:56 pm

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text by Nicholas Carlini and David Wagner.

Abstract:

We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (at a rate of up to 50 characters per second). We apply our iterative optimization-based attack to Mozilla’s implementation DeepSpeech end-to-end, and show it has a 100% success rate. The feasibility of this attack introduce a new domain to study adversarial examples.

You can consult the data used and code at: http://nicholas.carlini.com/code/audio_adversarial_examples.

Important not only for defeating automatic speech recognition but also for establishing that the properties of audio recognition differ from those of visual recognition.

A hint that automatic recognition properties cannot be assumed for unexplored domains.
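The shape of an iterative optimization-based attack can be sketched on a toy model. This is pure Python against a made-up linear “transcriber,” not DeepSpeech: nudge the input until the model outputs a chosen target, while a distance penalty keeps the perturbation small.

```python
# Toy sketch of an optimization-based adversarial attack: minimize
# (score(x') - target)^2 + c * ||x' - x||^2 by gradient descent.

def attack(x, w, target, c=0.1, lr=0.01, steps=2000):
    x_adv = list(x)
    for _ in range(steps):
        score = sum(wi * xi for wi, xi in zip(w, x_adv))
        err = score - target
        for i in range(len(x_adv)):
            grad = 2 * err * w[i] + 2 * c * (x_adv[i] - x[i])
            x_adv[i] -= lr * grad
    return x_adv

x = [0.5, -0.2, 0.1]   # original "waveform" (toy)
w = [1.0, 2.0, -1.0]   # toy model weights
x_adv = attack(x, w, target=3.0)

score = sum(wi * xi for wi, xi in zip(w, x_adv))
print(round(score, 2))  # close to the chosen target of 3.0
```

Carlini and Wagner’s attack works on the same trade-off, with a CTC loss over DeepSpeech’s network in place of this toy squared error.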

Visualizing trigrams with the Tidyverse (Who Reads Jane Austen?)

Filed under: Literature,R,Visualization — Patrick Durusau @ 4:41 pm

Visualizing trigrams with the Tidyverse by Emil Hvitfeldt.

From the post:

In this post I’ll go through how I created the data visualization I posted yesterday on twitter:

Great post and R code, but who reads Jane Austen? 😉

I have a serious weakness for academic and ancient texts so the Jane Austen question is meant in jest.

The more direct question is to what other texts would you apply this trigram/visualization technique?

Suggestions?

I have some texts in mind but defer mentioning them while I prepare a demonstration of Hvitfeldt’s technique to them.

PS: I ran across an odd comment in the janeaustenr package:

Each text is in a character vector with elements of about 70 characters.

You have to hunt for a bit but 70 characters is the default plain text line length at Gutenberg. Some poor decisions are going to be with us for a very long time.
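For texts outside the tidyverse, the same trigram counting is easy to mimic in Python. A rough analogue (the original post uses R’s tidytext; the sample lines below are illustrative), including rejoining the ~70-character Gutenberg lines first:

```python
# Rough Python analogue of a tidytext trigram count.
from collections import Counter
import re

# Gutenberg-style text arrives as ~70-character lines; rejoin first.
lines = ["It is a truth universally acknowledged, that a single man in",
         "possession of a good fortune, must be in want of a wife."]
text = " ".join(lines)

words = re.findall(r"[a-z']+", text.lower())
trigrams = Counter(zip(words, words[1:], words[2:]))
top, count = trigrams.most_common(1)[0]
print(" ".join(top), count)
```

From counts like these, the visualization step is a matter of plotting the most frequent trigrams per grouping variable, as Hvitfeldt does with ggplot2.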

Data Science at the Command Line (update, now online for free)

Filed under: Data Science — Patrick Durusau @ 3:14 pm

Data Science at the Command Line by Jeroen Janssens.

From the webpage:

This is the website for Data Science at the Command Line, published by O’Reilly October 2014 First Edition. This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.

To get you started—whether you’re on Windows, macOS, or Linux—author Jeroen Janssens has developed a Docker image packed with over 80 command-line tools.

Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.

I posted about Data Science at the Command Line in August of 2014 and it remains as relevant today as when originally published.

Impress your friends, perhaps your manager, but most importantly, yourself.

Enjoy!

Games = Geeks, Geeks = People with Access (New Paths To Transparency)

Filed under: Cybersecurity,Hacking,Security — Patrick Durusau @ 3:01 pm

Critical Flaw in All Blizzard Games Could Let Hackers Hijack Millions of PCs by Mohit Kumar.

From the post:

A Google security researcher has discovered a severe vulnerability in Blizzard games that could allow remote attackers to run malicious code on gamers’ computers.

Played every month by half a billion users—World of Warcraft, Overwatch, Diablo III, Hearthstone and Starcraft II are popular online games created by Blizzard Entertainment.

To play Blizzard games online using web browsers, users need to install a game client application, called ‘Blizzard Update Agent,’ onto their systems that run JSON-RPC server over HTTP protocol on port 1120, and “accepts commands to install, uninstall, change settings, update and other maintenance related options.”
… (emphasis in original)

See Kumar’s post for the details on “DNS Rebinding.”
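To see why DNS rebinding matters here, consider what a JSON-RPC-over-HTTP call to the local agent looks like. The method name below is hypothetical and no request is sent; the point is that it is an ordinary POST the browser could issue:

```python
# Sketch of a JSON-RPC request body a page might POST to the agent on
# port 1120 (method and params are illustrative, not confirmed names).
import json

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "install",
    "params": {"game": "example"},
}
body = json.dumps(payload)

# Same-origin policy normally blocks evil.example from POSTing to
# http://localhost:1120. DNS rebinding sidesteps it: the attacker's DNS
# first answers with a public IP (so the page loads), then re-answers
# with 127.0.0.1 -- the origin string never changes, but subsequent
# requests now reach the local agent.
print(body)
```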

Unless you are running a botnet, why would anyone want to hijack millions of PCs?

If you wanted to rob for cash, would you rob people buying subway tokens or would you rob a bank? (That’s not a trick question. Bank is the correct answer.)

The same is true with creating government or corporate transparency. You could subvert every computer at a location but the smart money says to breach the server and collect all the documents from that central location.

How to breach servers? Target sysadmins, i.e., the people who play computer games.

PS: I would not be overly concerned with Blizzard’s reported development of patches. No doubt other holes exist or will be created by their patches.

January 23, 2018

The vector algebra war: a historical perspective [Semantic Confusion in Engineering and Physics]

The vector algebra war: a historical perspective by James M. Chappell, Azhar Iqbal, John G. Hartnett, Derek Abbott.

Abstract:

There are a wide variety of different vector formalisms currently utilized in engineering and physics. For example, Gibbs’ three-vectors, Minkowski four-vectors, complex spinors in quantum mechanics, quaternions used to describe rigid body rotations and vectors defined in Clifford geometric algebra. With such a range of vector formalisms in use, it thus appears that there is as yet no general agreement on a vector formalism suitable for science as a whole. This is surprising, in that, one of the primary goals of nineteenth century science was to suitably describe vectors in three-dimensional space. This situation has also had the unfortunate consequence of fragmenting knowledge across many disciplines, and requiring a significant amount of time and effort in learning the various formalisms. We thus historically review the development of our various vector systems and conclude that Clifford’s multivectors best fulfills the goal of describing vectorial quantities in three dimensions and providing a unified vector system for science.

An image from the paper captures the “descent of the various vector systems:”

The authors contend for use of Clifford’s multivectors over the other vector formalisms described.
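The unification the authors have in mind rests on the geometric product, which can be sketched in two lines (a summary of the standard construction, not a quotation from the paper):

```latex
% The geometric product of vectors combines the symmetric (inner) and
% antisymmetric (outer) parts:
\[
  ab = a \cdot b + a \wedge b .
\]
% The familiar systems then embed as special cases: complex numbers as
% the even subalgebra of Cl(2), quaternions as the even subalgebra of
% Cl(3), and the Gibbs cross product as the dual of the outer product:
\[
  a \times b = -I\,(a \wedge b), \qquad I = e_1 e_2 e_3 .
\]
```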

Assuming Clifford’s multivectors displace all other systems in use, the authors fail to explain how readers will access the present and past legacy of materials in other formalisms.

If the goal is to eliminate “fragmenting knowledge across many disciplines, and requiring a significant amount of time and effort in learning the various formalisms,” that goal fails in the absence of a mechanism to access existing materials using Clifford’s multivector formalism.

Topic maps anyone?

Stop, Stop, Stop All the Patching, Give Intel Time to Breathe

Filed under: Cybersecurity,Security — Patrick Durusau @ 7:37 am

Root Cause of Reboot Issue Identified; Updated Guidance for Customers and Partners by Navin Shenoy.

From the post:

As we start the week, I want to provide an update on the reboot issues we reported Jan. 11. We have now identified the root cause for Broadwell and Haswell platforms, and made good progress in developing a solution to address it. Over the weekend, we began rolling out an early version of the updated solution to industry partners for testing, and we will make a final release available once that testing has been completed.

Based on this, we are updating our guidance for customers and partners:

  • We recommend that OEMs, cloud service providers, system manufacturers, software vendors and end users stop deployment of current versions, as they may introduce higher than expected reboots and other unpredictable system behavior. For the full list of platforms, see the Intel.com Security Center site.
  • We ask that our industry partners focus efforts on testing early versions of the updated solution so we can accelerate its release. We expect to share more details on timing later this week.
  • We continue to urge all customers to vigilantly maintain security best practice and for consumers to keep systems up-to-date.

I apologize for any disruption this change in guidance may cause. The security of our products is critical for Intel, our customers and partners, and for me, personally. I assure you we are working around the clock to ensure we are addressing these issues.

I will keep you updated as we learn more and thank you for your patience.

Essence of Shenoy’s advice:

…OEMs, cloud service providers, system manufacturers, software vendors and end users stop deployment of current versions, as they may introduce higher than expected reboots and other unpredictable system behavior.

Or better:

Patching an Intel machine makes it worse.

That’s hardly news.

Unverifiable firmware/code + unverifiable patch = unverifiable firmware/code + patch. What part of that seems unclear?

January 22, 2018

WebGoat (Advantage over OPM)

Filed under: Cybersecurity,Hacking,Security — Patrick Durusau @ 9:41 pm

Deliberately Insecure Web Application: OWASP WebGoat

From the webpage:

WebGoat is a deliberately insecure web application maintained by OWASP designed to teach web application security lessons. You can install and practice with WebGoat in either J2EE or WebGoat for .Net in ASP.NET. In each lesson, users must demonstrate their understanding of a security issue by exploiting a real vulnerability in the WebGoat applications.

WebGoat for J2EE is written in Java and therefore installs on any platform with a Java virtual machine. Once deployed, the user can go through the lessons and track their progress with the scorecard.

WebGoat’s scorecards are a feature not found when hacking the Office of Personnel Management (OPM). Hacks of the OPM are reported by its inspector general and, more generally, in the computer security press.

EFF Investigates Dark Caracal (But Why?)

Filed under: Cybersecurity,Electronic Frontier Foundation,Government,Privacy,Security — Patrick Durusau @ 9:19 pm

Someone is touting a mobile, PC spyware platform called Dark Caracal to governments by Iain Thomson.

From the post:

An investigation by the Electronic Frontier Foundation and security biz Lookout has uncovered Dark Caracal, a surveillance-toolkit-for-hire that has been used to suck huge amounts of data from Android mobiles and Windows desktop PCs around the world.

Dark Caracal [PDF] appears to be controlled from the Lebanon General Directorate of General Security in Beirut – an intelligence agency – and has slurped hundreds of gigabytes of information from devices. It shares its backend infrastructure with another state-sponsored surveillance campaign, Operation Manul, which the EFF claims was operated by the Kazakhstan government last year.

Crucially, it appears someone is renting out the Dark Caracal spyware platform to nation-state snoops.

The EFF could be spending its time and resources duplicating Dark Caracal for the average citizen.

Instead the EFF continues its quixotic pursuit of governmental wrong-doers. I say “quixotic” because those pilloried by the EFF, such as the NSA, never change their behavior. Unlawful conduct, including surveillance, continues.

But don’t take my word for it, the NSA admits that it deletes data it promised under court order to preserve: NSA deleted surveillance data it pledged to preserve. No consequences. Just like there were no consequences when Snowden revealed widespread and illegal surveillance by the NSA.

So you have to wonder, if investigating and suing governmental intelligence organizations produces no tangible results, why is the EFF pursuing them?

If the average citizen had the equivalent of Dark Caracal at their disposal, say as desktop software, the ability of governments like Lebanon, Kazakhstan, and others, to hide their crimes, would be greatly reduced.

Exposure is no guarantee of accountability and/or punishment, but the whack-a-mole strategy of the EFF hasn’t produced transparency or consequences.

Don Knuth Needs Your Help

Filed under: Computer Science,Programming — Patrick Durusau @ 9:04 pm

Donald Knuth Turns 80, Seeks Problem-Solvers For TAOCP

From the post:

An anonymous reader writes:

When 24-year-old Donald Knuth began writing The Art of Computer Programming, he had no idea that he’d still be working on it 56 years later. This month he also celebrated his 80th birthday in Sweden with the world premier of Knuth’s Fantasia Apocalyptica, a multimedia work for pipe organ and video based on the bible’s Book of Revelations, which Knuth describes as “50 years in the making.”

But Knuth also points to the recent publication of “one of the most important sections of The Art of Computer Programming” in preliminary paperback form: Volume 4, Fascicle 6: Satisfiability. (“Given a Boolean function, can its variables be set to at least one pattern of 0s and 1s that will make the function true?”)

Here’s an excerpt from its back cover:

Revolutionary methods for solving such problems emerged at the beginning of the twenty-first century, and they’ve led to game-changing applications in industry. These so-called “SAT solvers” can now routinely find solutions to practical problems that involve millions of variables and were thought until very recently to be hopelessly difficult.
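Knuth’s parenthetical definition of SAT can be made concrete with a brute-force checker. Real SAT solvers use vastly cleverer techniques (CDCL and friends); this sketch merely enumerates every assignment:

```python
# Brute-force SAT: is there at least one pattern of 0s and 1s that
# makes a CNF formula true? Positive int = variable, negative = its
# negation; variables are numbered from 1.
from itertools import product

def sat(clauses, n):
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return bits
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
print(sat(clauses, 3))  # -> (False, True, False)
```

Enumeration takes 2^n tries in the worst case; the “game-changing” point of the fascicle is that modern solvers routinely handle millions of variables anyway.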

“in several noteworthy cases, nobody has yet pointed out any errors…” Knuth writes on his site, adding “I fear that the most probable hypothesis is that nobody has been sufficiently motivated to check these things out carefully as yet.” He’s uncomfortable printing a hardcover edition that hasn’t been fully vetted, and “I would like to enter here a plea for some readers to tell me explicitly, ‘Dear Don, I have read exercise N and its answer very carefully, and I believe that it is 100% correct,'” where N is one of the exercises listed on his web site.

Elsewhere he writes that two “pre-fascicles” — 5a and 5B — are also available for alpha-testing. “I’ve put them online primarily so that experts in the field can check the contents before I inflict them on a wider audience. But if you want to help debug them, please go right ahead.”

Do you have some other leisure project for 2018 that is more important?

😉
