Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

February 10, 2018

The Complexity of Neurons Is Beyond Our Current Imagination

Filed under: Artificial Intelligence — Patrick Durusau @ 9:28 pm

The Complexity of Neurons Is Beyond Our Current Imagination by Carlos E. Perez.

From the post:

One of the biggest misconceptions around is the idea that Deep Learning or Artificial Neural Networks (ANN) mimic biological neurons. At best, ANN mimic a cartoonish version of a 1957 model of a neuron. Neurons in Deep Learning are essentially mathematical functions that perform a similarity function of its inputs against internal weights. The closer a match is made, the more likely an action is performed (i.e. not sending a signal to zero). There are exceptions to this model (see: Autoregressive networks) however it is general enough to include the perceptron, convolution networks and RNNs.

Jeff Hawkins of Numenta has always lamented that a more biologically inspired approach is needed. So, in his research on building cognitive machinery, he has architected systems that more closely mimic the structure of the neocortex. Numenta’s model of a neuron is considerably more elaborate than the Deep Learning model of a neuron:

I rather like the line “ANN mimic a cartoonish version of a 1957 model of a neuron.”
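For the curious, that 1957-style neuron fits in a few lines of code. A minimal sketch in Python/NumPy (the weights, input, and bias are invented for illustration):

```python
import numpy as np

def neuron(x, w, b):
    """A 1957-style artificial neuron: score the inputs against
    internal weights (a dot product), then threshold the result."""
    similarity = np.dot(w, x) + b      # how well the input matches the weights
    return 1 if similarity > 0 else 0  # fire (1) or stay silent (0)

w = np.array([0.5, -0.2, 0.8])  # made-up weights
x = np.array([1.0, 0.3, 0.7])   # made-up input
print(neuron(x, w, b=-0.4))     # -> 1, the "neuron" fires
```

Swap the hard threshold for a smooth activation like a sigmoid or ReLU and you have the Deep Learning neuron Perez describes. Everything a biological neuron does beyond that dot product is what the cartoon leaves out.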

You need not worry about the MIT Intelligence Quest replicating neurons anytime soon.

In part because no one really knows how neurons work or how much more we need to learn to replicate them.

The AI crowd could train a neural network to recognize people and to fire weapons at them. That qualifies as destruction of humanity by an AI, but if we are really that stupid, perhaps it’s time to make space for others.

JanusGraph + YugaByte (Does Cloud-Native Mean I Call Langley For Backup Support?)

Filed under: Graphs,JanusGraph — Patrick Durusau @ 8:59 pm

JanusGraph + YugaByte

Short tutorial on setting up JanusGraph to work with YugaByte DB.
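I haven’t worked through the tutorial, but the general shape is: point JanusGraph’s storage backend at YugaByte’s Cassandra-compatible (YCQL) API, then query the graph as usual. A minimal sketch in Python with gremlinpython (the server address, port, and properties values are my assumptions, not the tutorial’s):

```python
# Assumes a running JanusGraph server whose janusgraph.properties points
# storage at YugaByte's Cassandra-compatible API, along the lines of:
#   storage.backend=cql
#   storage.hostname=127.0.0.1   # a YugaByte node
from gremlin_python.driver import client

c = client.Client("ws://localhost:8182/gremlin", "g")  # default Gremlin server port
print(c.submit("g.V().count()").all().result())        # count vertices
c.close()
```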

I know JanusGraph, so I looked for more on YugaByte DB and found this overview:


Purpose-built for mission-critical applications

Mission-critical applications have a strong need for data correctness and high availability. They are typically composed of microservices with diverse workloads such as key/value, flexible schema, graph or relational. The access patterns vary as well. SaaS services or mobile/web applications keeping customer records, order history or messages need zero-data loss, geo-replication, low-latency reads/writes and a consistent customer experience. Fast data infrastructure use cases (such as IoT, finance, timeseries data) need near real-time & high-volume ingest, low-latency reads, and native integration with analytics frameworks like Apache Spark.

YugaByte DB offers polyglot persistence to power these diverse workloads and access patterns in a unified database, while providing strong correctness guarantees and high availability. You are no longer forced to create infrastructure silos for each workload or choose between different flavors of SQL and NoSQL databases. YugaByte breaks down the barrier between SQL and NoSQL by offering both.

Cloud-native agility

Another theme common across these microservices is the move to a cloud-native architecture, be it on the public cloud, on-premises or hybrid environment. The primary driver is to make infrastructure agile. Agile infrastructure is linearly scalable, fault-tolerant, geo-distributed, re-configurable with zero downtime and portable across clouds. While the container ecosystem led by Docker & Kubernetes has enabled enterprises to realize this vision for the stateless tier, the data tier has remained a big challenge. YugaByte DB is purpose-built to address these challenges, but for the data tier, and serves as the stateful complement to containers.

Only partially joking about “cloud-native” meaning you call Langley (CIA) for backup support.

Anything that isn’t air-gapped in a secure facility has been compromised. Note the use of past tense.

Disclosures about government spying, to say nothing of your competitors and, lastly, hackers, make any other assumption untenable.

MIT Intelligence Quest

Filed under: Artificial Intelligence,Machine Learning — Patrick Durusau @ 8:36 pm

MIT Intelligence Quest

From the webpage:

The MIT Intelligence Quest will advance the science and engineering of both human and machine intelligence. Launched on February 1, 2018, MIT IQ seeks to discover the foundations of human intelligence and drive the development of technological tools that can positively influence virtually every aspect of society.

The Institute’s culture of collaboration will encourage life scientists, computer scientists, social scientists, and engineers to join forces to investigate the societal implications of their work as they pursue hard problems lying beyond the current horizon of intelligence research. By uniting diverse fields and capitalizing on what they can teach each other, we seek to answer the deepest questions about intelligence.

We are setting out to answer two big questions: How does human intelligence work, in engineering terms? And how can we use that deep grasp of human intelligence to build wiser and more useful machines, to the benefit of society?

Drawing on MIT’s deep strengths and signature values, culture, and history, MIT IQ promises to make important contributions to understanding the nature of intelligence, and to harnessing it to make a better world.

The most refreshing aspect of the MIT Intelligence Quest page is that it ends with a contact form.

That’s right, a contact form.

Unlike the ill-fated EU brain project, which had pre-chosen approaches and a roadmap for replicating a human brain. Is that project still consuming funds with meetings, hotel rooms, etc.?

You know my misgivings about creating intelligence in the absence of understanding our own.

On the other hand, mimicking how human intelligence works in bounded situations is a far more tractable problem.

Not entirely tractable, but tractable enough to yield useful results.

XML periodic table

Filed under: XML — Patrick Durusau @ 8:09 pm

XML periodic table

It’s a visual thing and my narrow blog format won’t do it justice. Follow the link.

XML grouped by Business language, QA, Document format, Internet format, Graphic format, Metadata standard, Transformation.

What a cool listing!

Lots of old friends but some potential new ones as well!

Enjoy!

February 9, 2018

XML Prague 2018 Conference Proceedings – Weekend Reading!

Filed under: Conferences,XML,XML Database,XPath,XQuery,XSLT — Patrick Durusau @ 9:13 pm

XML Prague 2018 Conference Proceedings

Two Hundred and Sixty (260) pages of high quality content on XML!

From the table of contents:

  • Assisted Structured Authoring using Conditional Random Fields – Bert Willems
  • XML Success Story: Creating and Integrating Collaboration Solutions to Improve the Documentation Process – Steven Higgs
  • xqerl: XQuery 3.1 Implementation in Erlang – Zachary N. Dean
  • XML Tree Models for Efficient Copy Operations – Michael Kay
  • Using Maven with XML development projects – Christophe Marchand and Matthieu Ricaud-Dussarget
  • Varieties of XML Merge: Concurrent versus Sequential – Tejas Pradip Barhate and Nigel Whitaker
  • Including XML Markup in the Automated Collation of Literary Text – Elli Bleeker, Bram Buitendijk, Ronald Haentjens Dekker, and Astrid Kulsdom
  • Multi-Layer Content Modelling to the Rescue – Erik Siegel
  • Combining graph and tree – Hans-Juergen Rennau
  • SML – A simpler and shorter representation of XML – Jean-François Larvoire
  • Can we create a real world rich Internet application using Saxon-JS? – Pieter Masereeuw
  • Implementing XForms using interactive XSLT 3.0 – O’Neil Delpratt and Debbie Lockett
  • Life, the Universe, and CSS Tests – Tony Graham
  • Form, and Content – Steven Pemberton
  • tokenized-to-tree – Gerrit Imsieke

I just got a refurbished laptop for reading in bed. Now I have to load XML parsers, etc. on it to use along with reading these proceedings!

Enjoy!

PS: Be sure to thank Jirka Kosek for his tireless efforts promoting XML and XML Prague!

Alexandra Elbakyan (Sci-Hub) As Freedom Fighter

Filed under: Intellectual Property (IP),Open Access,Open Data — Patrick Durusau @ 3:33 pm

Recognizing Alexandra Elbakyan:

Alexandra Elbakyan is the freedom fighter behind Sci-Hub, a repository of 64.5 million papers, or “two-thirds of all published research, and it [is] available to anyone.”

Ian Graber-Stiehl, in Science’s Pirate Queen, misses an opportunity to ditch the misframing of Elbakyan as a “pirate” and to properly frame her as a freedom fighter.

To set the background for why you too should see Elbakyan as a freedom fighter, it’s necessary to review, briefly, the notion of “sale” and your intellectual freedom prior to widespread use of electronic texts.

When I started using libraries in the ’60s, you had to physically visit the library to use its books or journals. The library would purchase those items, what is known as first sale, and then either lend them or allow patrons to read them. There was no separate charge or income for the publisher upon reading. And once purchased, the item remained in the library for use by others.

With the advent of electronic texts, plus oppressive contracts and manipulation of the law, publishers began charging libraries even more than when libraries purchased and maintained access to materials for their patrons. Think of it as a form of recurrent extortion: you can’t have access to materials already purchased, save by paying to maintain that access.

Which of course means that both libraries and individuals have lost their right to pay for an item and to maintain it separate and apart from the publisher. That’s a serious theft and it took place in full public view.

There are pirates in this story: people who stole the right of libraries and individuals to purchase items for their own storage and use. Some of the better known ones include the American Chemical Society, Reed-Elsevier (a/k/a RELX Group), Sage Publishing, Springer, Taylor & Francis, and Wiley-Blackwell.

Elbakyan is trying to recover access for everyone, access that was stolen.

That doesn’t sound like the act of a pirate. Pirates steal for their own benefit. That sounds like the pirates I listed above.

Now that you know Elbakyan is fighting to recover a right taken from you, does that make you view her fight differently?

BTW, when publishers float the canard of their professional staff/editors/reviewers, remember their retraction rates are silent witnesses refuting their claims of competence.

Read any recent retraction for the listed publishers. Use RetractionWatch for current or past retractions. “Unread” is the best explanation for how most of them got past “staff/editors/reviewers.”

Do you support freedom fighters or publisher/pirates?

If you want to support publisher/pirates, no further action needed.

If you want to support freedom fighters, including Alexandra Elbakyan, the Sci-Hub site has a donate link; contact Elbakyan if you have extra cutting-edge equipment to offer; promote Sci-Hub on social media; etc.

For making the lives of publisher/pirates more difficult, use your imagination.

To follow Elbakyan, see her blog and Facebook page.

Fear Keeps People in Line (And Ignorant of Apple Source Code)

Filed under: Cybersecurity,Hacking,Security — Patrick Durusau @ 11:05 am

Apple’s top-secret iBoot firmware source code spills onto GitHub for some insane reason by Chris Williams.

From the post:

The confidential source code to Apple’s iBoot firmware in iPhones, iPads and other iOS devices has leaked into a public GitHub repo.

The closed-source code is top-secret, proprietary, copyright Apple, and yet has been quietly doing the rounds between security researchers and device jailbreakers on Reddit for four or so months, if not longer.

We’re not going to link to it. Also, downloading it is not recommended. Just remember what happened when people shared or sold copies of the stolen Microsoft Windows 2000 source code back in the day.

Notice that Williams cites scary language about the prior Windows source code but not a single example of an actual prosecution for downloading or sharing that source code. I have strong suspicions about why no examples were cited.*

You?

The other thing to notice is that “security researchers” have been sharing it for months, but if the great unwashed public gets to see it, well, that’s a five-alarm fire.

Williams has sided with access only for the privileged, although I would be hard pressed to say why.

BTW, if you want to search GitHub for source code that claims to originate from Apple, use the search term iBoot.

No direct link because, in the DMCA cat and mouse game, any link will be quickly broken, and I have no way to verify whether a repository is or isn’t Apple source code.

Don’t let fear keep you ignorant.

*My suspicions are that anyone reading Microsoft Windows 2000 source code became a poorer programmer and that was viewed as penalty enough.

February 8, 2018

OpenStreetMap, R + Revival of Cold War Parades

Filed under: Mapping,OpenStreetMap,R — Patrick Durusau @ 5:26 pm

Cartographic Explorations of the OpenStreetMap Database with R by Timothée Giraud.

From the post:

This post exposes some cartographic explorations of the OpenStreetMap (OSM) database with R.

These explorations begin with the downloading and cleaning of OSM data. Then I propose a set of map visualizations of the spatial distributions of bars and restaurants in Paris. Of course, these examples could be adapted to other spatial contexts and themes (e.g. pharmacies in Roma, bike parking in Dublin…).

This reproducible analysis is hosted on GitHub (code + data + walk-through).
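If you would rather poke at the same data from Python than R, here is a minimal sketch against the Overpass API (the bounding box and tags are my guesses at Giraud’s bars-in-Paris example, not his code):

```python
import requests

# Overpass QL: all nodes tagged amenity=bar inside a rough Paris bounding box.
query = """
[out:json][timeout:25];
node["amenity"="bar"](48.80,2.25,48.91,2.42);
out;
"""
resp = requests.post("https://overpass-api.de/api/interpreter", data=query)
resp.raise_for_status()
bars = resp.json()["elements"]
if bars:
    print(len(bars), "bars; first at", bars[0]["lat"], bars[0]["lon"])
```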

What a timely post! The accidental president of the United States hungers for legitimacy and views a military parade, Cold War style, as a way to achieve that end.

If it weren’t for all those pesky cable news channels, the military could station the reviewing stand in a curve and run the same tanks, same missiles, same troops past the review stand until the president gets bored.

A sensible plan won’t suggest itself to them, so expect a more traditional and expensive parade.

Just in case you want to plan other “festivities” at or to intersect with those planned for the president, the data at OpenStreetMap will prove helpful.

Once the city and parade route become known, what questions would you ask of OpenStreetMap data?

Porn, AI and Open Source Ethics

Filed under: Artificial Intelligence,Deep Learning,Open Source,Porn,TensorFlow — Patrick Durusau @ 4:18 pm

Google Gave the World Powerful AI Tools, and the World Made Porn With Them by Dave Gershgorn.

From the post:

In 2015, Google announced it would release its internal tool for developing artificial intelligence algorithms, TensorFlow, a move that would change the tone of how AI research and development would be conducted around the world. The means to build technology that could have an impact as profound as electricity, to borrow phrasing from Google’s CEO, would be open, accessible, and free to use. The barrier to entry was lowered from a Ph.D to a laptop.

But that also meant TensorFlow’s undeniable power was now out of Google’s control. For a little over two years, academia and Silicon Valley were still the ones making the biggest splashes with the software, but now that equation is changing. The catalyst is deepfakes, an anonymous Reddit user who built AI software that automatically stitches any image of a face (nearly) seamlessly into a video. And you can probably imagine where this is going: As first reported by Motherboard, the software was being used to put anyone’s face, such as a famous woman or friend on Facebook, on the bodies of porn actresses.

After the first Motherboard story, the user created their own subreddit, which amassed more than 91,000 subscribers. Another Reddit user called deepfakeapp has also released a tool called FakeApp, which allows anyone to download the AI software and use it themselves, given the correct hardware. As of today, Reddit has banned the community, saying it violated the website’s policy on involuntary pornography.

According to FakeApp’s user guide, the software is built on top of TensorFlow. Google employees have pioneered similar work using TensorFlow with slightly different setups and subject matter, training algorithms to generate images from scratch. And there are plenty of potentially fun (if not inane) uses for deepfakes, like putting Nicolas Cage in a bunch of different movies. But let’s be real: 91,000 people were subscribed to deepfakes’ subreddit for the porn.

While much good has come from TensorFlow being open source, like potential cancer detection algorithms, FakeApp represents the dark side of open source. Google (and Microsoft and Amazon and Facebook) have loosed immense technological power on the world with absolutely no recourse. Anyone can download AI software and use it for anything they have the data to create. That means everything from faking political speeches (with help from the cadre of available voice-imitating AI) to generating fake revenge porn. All digital media is a series of ones and zeroes, and artificial intelligence is proving itself proficient at artfully arranging them to generate things that never happened.

You can imagine the rest or read the rest of Gershgorn’s (deep voice): “dark side of open source.”

While you do, remember that Gershgorn would have made the same claims about:

  1. Telephones
  2. Photography
  3. Cable television
  4. Internet
  5. etc.

The simplest rejoinder is that the world did not create porn with AI. A tiny subset of the world signed up to see porn created by an even smaller subset of the world.

The next simplest rejoinder is the realization that Gershgorn wants a system that dictates ethics to users of open source software. Gershgorn should empower an agency to enforce ethics on journalists and check back in a couple of years to report on their experience.

I’m willing to bet ahead of time it won’t be a happy report.

Bottom line: leave the ethics of open source software to the people using such software. That may not always produce a happy outcome, but it will always be better than the alternatives.

Introducing HacSpec (“specification language for cryptographic primitives”)

Filed under: Cryptography,Cybersecurity,Security — Patrick Durusau @ 2:58 pm

Introducing HacSpec by Franziskus Kiefer.

From the post:

HacSpec is a proposal for a new specification language for cryptographic primitives that is succinct, that is easy to read and implement, and that lends itself to formal verification. It aims to formalise the pseudocode used in cryptographic standards by proposing a formal syntax that can be checked for simple errors. HacSpec specifications are further executable to test against test vectors specified in a common syntax.

The main focus of HacSpec is to allow specifications to be compiled to formal languages such as cryptol, coq, F*, and easycrypt and thus make it easier to formally verify implementations. This allows a specification using HacSpec to be the basis not only for implementations but also for formal proofs of functional correctness, cryptographic security, and side-channel resistance.

The idea of having a language like HacSpec stems from discussions at the recent HACS workshop in Zurich. The High-Assurance-Cryptographic-Software workshop (HACS) is an invite-only workshop co-located with the Real World Crypto symposium.

Anyone interested in moving this project forward should subscribe to the mailing list or file issues and pull requests against the Github repository.

Cryptography projects should be monitored the way the NSA monitors NIST cryptography standards. If you see an error or weakness, you’re under no obligation to help. The NSA won’t.

Given security fails from software, users, etc., end-to-end encryption resembles transporting people from one homeless camp to another in an armored car.

Secure in transit but not secure at either end.

Running a Tor Relay (New Guide)

Filed under: Privacy,Security,Tor — Patrick Durusau @ 10:45 am

The New Guide to Running a Tor Relay

Have we told you lately how much we love our relay operators? Relays are the backbone of the Tor network, providing strength and bandwidth for our millions of users worldwide. Without the thousands of fast, reliable relays in the network, Tor wouldn’t exist.

Have you considered running a relay, but didn’t know where to start? Perhaps you’re just looking for a way to help Tor, but you’ve always thought that running a relay was too complicated or technical for you and the documentation seemed daunting.

We’re here to tell you that you can become one of the many thousands of relay operators powering the Tor network, if you have some basic command-line experience.
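The guide covers the details, but to show how small the core of it is, here’s a sketch of the torrc lines for a non-exit relay (the values are placeholders; follow the guide for the real thing):

```
# Minimal non-exit relay settings for /etc/tor/torrc (placeholder values).
ORPort 9001
Nickname MyNiceRelay
ContactInfo you@example.com
ExitRelay 0
```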

If you can’t help support the Tor network by running a relay, don’t despair! There are always ways to volunteer and, of course, to donate.

Your support helps everyone who uses Tor and sometimes results in really cool graphics, like this one for running a Tor relay:

If you want something a bit closer to the edge, try creating a graphic where spy rays from corporations and governments bounce off of secure autos, computers, homes, phones.

February 7, 2018

Kali Linux 2018.1 Release

Filed under: Cybersecurity,Security — Patrick Durusau @ 9:52 pm

Kali Linux 2018.1 Release

From the post:

Welcome to our first release of 2018, Kali Linux 2018.1. This fine release contains all updated packages and bug fixes since our 2017.3 release last November. This release wasn’t without its challenges–from the Meltdown and Spectre excitement (patches will be in the 4.15 kernel) to a couple of other nasty bugs, we had our work cut out for us but we prevailed in time to deliver this latest and greatest version for your installation pleasure.

Churn, especially in security practices and software, is the best state imaginable for generating vulnerabilities.

New software means new bugs, unfamiliar setup requirements, and newbie user mistakes, in addition to the 33% or more of users who fall for phishing emails.

2018 looks like a great year for security churn.

How stable is your security? (Don’t answer over a clear channel.)

The Matrix Calculus You Need For Deep Learning

Filed under: Deep Learning,Machine Learning,Mathematics — Patrick Durusau @ 9:22 pm

The Matrix Calculus You Need For Deep Learning by Terence Parr, Jeremy Howard.

Abstract:

This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed. Note that you do not need to understand this material before you start learning to train and use deep learning in practice; rather, this material is for those who are already familiar with the basics of neural networks, and wish to deepen their understanding of the underlying math. Don’t worry if you get stuck at some point along the way—just go back and reread the previous section, and try writing down and working through some examples. And if you’re still stuck, we’re happy to answer your questions in the Theory category at forums.fast.ai. Note: There is a reference section at the end of the paper summarizing all the key matrix calculus rules and terminology discussed here.

A bit about the authors:

(We teach in University of San Francisco’s MS in Data Science program and have other nefarious projects underway. You might know Terence as the creator of the ANTLR parser generator. For more material, see Jeremy’s fast.ai courses and University of San Francisco’s Data Institute in-person version of the deep learning course.)

Apologies to Jeremy but I recognize ANTLR more quickly than I do Jeremy’s fast.ai courses. (Need to fix that.)

The paper runs thirty-three pages and, as the authors say, most of it is unnecessary unless you want to understand what’s happening under the hood with deep learning.
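As a taste of what the paper teaches, here is the kind of computation it walks you through, the gradient of a single ReLU neuron with respect to its weights and bias (my worked example, not lifted from the paper):

```latex
y = \max(0,\ \mathbf{w} \cdot \mathbf{x} + b), \qquad
\frac{\partial y}{\partial w_i} =
\begin{cases} x_i & \text{if } \mathbf{w} \cdot \mathbf{x} + b > 0 \\ 0 & \text{otherwise} \end{cases}, \qquad
\frac{\partial y}{\partial b} =
\begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b > 0 \\ 0 & \text{otherwise} \end{cases}
```

Scale that up across layers via the chain rule and you have backpropagation.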

Think of it as the difference between knowing how to drive a sports car and being able to work on a sports car.

With the latter set of skills, you can:

  • tweak your sports car for maximum performance
  • tweak someone else’s sports car for less performance
  • detect someone tweaking your sports car

Read the paper, master the paper.

No test, just real world consequences that separate the prepared from the unprepared.

Were You Pwned by the “Human Cat” Story?

Filed under: Facebook,Fake News — Patrick Durusau @ 5:55 pm

Overseas Fake News Publishers Use Facebook’s Instant Articles To Bring In More Cash by Jane Lytvynenko


From the post:

While some mainstream publishers are abandoning Facebook’s Instant Articles, fake news sites based overseas are taking advantage of the format — and in some cases Facebook itself is earning revenue from their false stories.

BuzzFeed News found 29 Facebook pages, and associated websites, that are using Instant Articles to help their completely false stories load faster on Facebook. At least 24 of these pages are also signed up with Facebook Audience Network, meaning Facebook itself earns a share of revenue from the fake news being read on its platform.

Launched in 2015, Instant Articles offer a way for publishers to have their articles load quickly and natively within the Facebook mobile app. Publishers can insert their own ads or use Facebook’s ad network, Audience Network, to automatically place advertisements into their articles. Facebook takes a cut of the revenue when sites monetize with Audience Network.

“We’re against false news and want no part of it on our platform; including in Instant Articles,” said an email statement from a Facebook spokesperson. “We’ve launched a comprehensive effort across all products to take on these scammers, and we’re currently hosting third-party fact checkers from around the world to understand how we can more effectively solve the problem.”

The spokesperson did not respond to questions about the use of Instant Articles by spammers and fake news publishers, or about the fact that Facebook’s ad network was also being used for monetization. The articles sent to Facebook by BuzzFeed News were later removed from the platform. The company also removes publishers from Instant Articles if they’ve been flagged by third-party fact-checkers.

Really? You could be pwned by a “human cat” story?

Why should I be morally outraged and/or willing to devote attention to stopping that type of fake news?

Or ask anyone else to devote their resources to it?

Would you seek out Flat Earthers to dispel their delusions? If not, leave the “fake news” to people who seem to enjoy it. It’s their dime.

February 6, 2018

What the f*ck Python! 🐍

Filed under: Programming,Python — Patrick Durusau @ 8:32 pm

What the f*ck Python! 🐍

From the post:

Python, being a beautifully designed high-level and interpreter-based programming language, provides us with many features for the programmer’s comfort. But sometimes, the outcomes of a Python snippet may not seem obvious to a regular user at first sight.

Here is a fun project to collect such tricky & counter-intuitive examples and lesser-known features in Python, attempting to discuss what exactly is happening under the hood!

While some of the examples you see below may not be WTFs in the truest sense, they’ll reveal some of the interesting parts of Python that you might be unaware of. I find it a nice way to learn the internals of a programming language, and I think you’ll find them interesting as well!

If you’re an experienced Python programmer, you can take it as a challenge to get most of them right on the first attempt. You may be already familiar with some of these examples, and I might be able to revive sweet old memories of yours being bitten by these gotchas 😅

If you’re a returning reader, you can learn about the new modifications here.

So, here we go…

What better way to learn than being really pissed off that your code isn’t working? Or isn’t working as expected.
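A taste of the kind of gotcha the project collects, a classic that bites nearly everyone once (I haven’t checked whether this exact one is on their list):

```python
def append_to(item, bucket=[]):   # the default list is created ONCE,
    bucket.append(item)           # at function definition time...
    return bucket

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2]  ...so every call shares the same list!
```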

😉

This looks like a real hoot! Too late today to do much with it but I’ll be returning to it.

Enjoy!

Dive into BPF: a list of reading material

Filed under: Cybersecurity,Networks — Patrick Durusau @ 8:22 pm

Dive into BPF: a list of reading material by Quentin Monnet.

From the post:

BPF, as in Berkeley Packet Filter, was initially conceived in 1992 so as to provide a way to filter packets and to avoid useless packet copies from kernel to userspace. It initially consisted of a simple bytecode that is injected from userspace into the kernel, where it is checked by a verifier—to prevent kernel crashes or security issues—and attached to a socket, then run on each received packet. It was ported to Linux a couple of years later, and used for a small number of applications (tcpdump for example). The simplicity of the language as well as the existence of an in-kernel Just-In-Time (JIT) compiling machine for BPF were factors for the excellent performance of this tool.

Then in 2013, Alexei Starovoitov completely reshaped it, started to add new functionalities and to improve the performance of BPF. This new version is designated as eBPF (for “extended BPF”), while the former becomes cBPF (“classic” BPF). New features such as maps and tail calls appeared. The JIT machines were rewritten. The new language is even closer to native machine language than cBPF was. And also, new attach points in the kernel have been created.

Thanks to those new hooks, eBPF programs can be designed for a variety of use cases, which divide into two fields of application. One of them is the domain of kernel tracing and event monitoring. BPF programs can be attached to kprobes and they compare well with other tracing methods, with many advantages (and sometimes some drawbacks).

The other application domain remains network programming. In addition to socket filter, eBPF programs can be attached to tc (Linux traffic control tool) ingress or egress interfaces and perform a variety of packet processing tasks, in an efficient way. This opens new perspectives in the domain.

And eBPF performances are further leveraged through the technologies developed for the IO Visor project: new hooks have also been added for XDP (“eXpress Data Path”), a new fast path recently added to the kernel. XDP works in conjunction with the Linux stack, and relies on BPF to perform very fast packet processing.

Even some projects such as P4, Open vSwitch, consider or started to approach BPF. Some others, such as CETH, Cilium, are entirely based on it. BPF is buzzing, so we can expect a lot of tools and projects to orbit around it soon…

I haven’t even thought about the Berkeley Packet Filter in more than a decade.

But such a wonderful reading list merits mention in its own right. What a great model for reading lists on other topics!

And one or more members of your team may want to get closer to the metal on packet traffic.
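To see how low the barrier to entry has become, here’s a minimal sketch using the bcc Python front-end from the IO Visor project mentioned above (requires root, a recent kernel, and the bcc package; treat the details as assumptions and check the bcc docs):

```python
from bcc import BPF

# Tiny eBPF program: print a trace line every time clone() is called.
prog = """
int hello(void *ctx) {
    bpf_trace_printk("clone() called\\n");
    return 0;
}
"""
b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
b.trace_print()  # stream the kernel trace pipe until interrupted
```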

PS: I don’t subscribe to the idea that only governments can build nation-state level tooling for hacks. Loose confederations of people built the Internet. Something to keep in mind while sharing code and hacks.

Finally! A Mainstream Use for Deep Learning!

Filed under: Deep Learning,Humor,Machine Learning — Patrick Durusau @ 7:45 pm

Using deep learning to generate offensive license plates by Jonathan Nolis.

From the post:

If you’ve been on the internet for long enough you’ve seen quality content generated by deep learning algorithms. This includes algorithms trained on band names, video game titles, and Pokémon. As a data scientist who wants to keep up with modern trends in the field, I figured there would be no better way to learn how to use deep learning myself than to find a fun topic to generate text for. After having the desire to do this, I waited for a year before I found just the right data set to do it…

I happened to stumble on a list of banned license plates in Arizona. This list contains all of the personalized license plates that people requested but were denied by the Arizona Motor Vehicle Division. This dataset contained over 30,000 license plates which makes a great set of text for a deep learning algorithm. I included the data as text in my GitHub repository so other people can use it if they so choose. Unfortunately the data is from 2012, but I have an active Public Records Request to the state of Arizona for an updated list. I highly recommend you look through it, it’s very funny.

What a great idea! Not only are you learning deep learning but you are being offensive at the same time. A double-dipper!
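If you want the gist without reading Nolis’s code, the usual recipe is a character-level model: slice the plate list into fixed-length sequences, learn to predict the next character, then sample. A minimal sketch in Python/Keras (the two “plates” are placeholders, not the Arizona data, and this is my reconstruction, not Nolis’s script):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

plates = ["NOPLATE", "BADWORD"]        # stand-ins for the 30,000 denied plates
text = "\n".join(plates)
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}

seq_len = 5
X, y = [], []
for i in range(len(text) - seq_len):   # sliding window: 5 chars in, next char out
    X.append([idx[c] for c in text[i:i + seq_len]])
    y.append(idx[text[i + seq_len]])
X = np.eye(len(chars))[np.array(X)]    # one-hot: (samples, seq_len, vocab)
y = np.eye(len(chars))[np.array(y)]

model = Sequential([
    LSTM(128, input_shape=(seq_len, len(chars))),
    Dense(len(chars), activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=20, verbose=0)  # then sample from the model to "generate"
```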

A script for banging against your state license registration is left as an exercise for the reader.

A password generator using phonetics to spell offensive phrases for c-suite users would be nice.

February 5, 2018

Balisage: The Markup Conference 2018 – 77 Days To Paper Submission Deadline!

Filed under: Conferences,XML,XML Schema,XPath,XQuery — Patrick Durusau @ 8:46 pm

Call for Participation

Submission dates/instructions have dropped!

When:

  • 22 March 2018 — Peer review applications due
  • 22 April 2018 — Paper submissions due
  • 21 May 2018 — Speakers notified
  • 8 June 2018 — Late-breaking News submissions due
  • 15 June 2018 — Late-breaking News speakers notified
  • 6 July 2018 — Final papers due from presenters of peer reviewed papers
  • 6 July 2018 — Short paper or slide summary due from presenters of late-breaking news
  • 30 July 2018 — Pre-conference Symposium
  • 31 July – 3 August 2018 — Balisage: The Markup Conference
How:
Submit full papers in XML to info@balisage.net.
See the Instructions for Authors and Tag Set and Submission Guidelines pages for details.
Apply to the Peer Review panel.

I’ve heard that inability to submit valid markup counts against a paper in judging. That may just be rumor or it may be true. Either way, I suggest validating your submission.

You should be on the fourth or fifth draft of your paper by now, but be aware the paper submission deadline is April 22, 2018, or 77 days from today!

Looking forward to seeing exceptionally strong papers in the review process and being presented at Balisage!

New Draft Morphological Tags for MorphGNT

Filed under: Bible,Greek,Language — Patrick Durusau @ 8:22 pm

New Draft Morphological Tags for MorphGNT by James Tauber.

From the post:

At least going back to my initial collaboration with Ulrik Sandborg-Petersen in 2005, I’ve been thinking about how I would do morphological tags in MorphGNT if I were starting from scratch.

Much later, in 2014, I had some discussions with Mike Aubrey at my first SBL conference and put together a straw proposal. There was a rethinking of some parts-of-speech, handling of tense/aspect, handling of voice, handling of syncretism and underspecification.

Even though some of the ideas were more drastic than others, a few things have remained consistent in my thinking:

  • there is value in a purely morphological analysis that doesn’t disambiguate on syntactic or semantic grounds
  • this analysis does not need the notion of parts-of-speech beyond purely Morphological Parts of Speech
  • this analysis should not attempt to distinguish middles and passives in the present or perfect system

As part of the handling of syncretism and underspecification, I had originally suggested a need for a value for the case property that didn’t distinguish nominative and accusative and a need for a value for the gender property like “non-neuter”.
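To make “underspecification” concrete, here is one way such a tag could be represented, a hypothetical sketch in Python and emphatically not Tauber’s actual scheme:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MorphTag:
    pos: str                      # morphological part of speech only
    case: Optional[str] = None    # e.g. "nom", "acc", or syncretic "nom/acc"
    gender: Optional[str] = None  # e.g. "masc", "fem", "neut", or "non-neut"
    voice: Optional[str] = None   # no middle/passive split in present/perfect
    # None means underspecified: the form simply doesn't commit.

# A form whose case could be nominative or accusative, gender non-neuter:
print(MorphTag(pos="noun", case="nom/acc", gender="non-neut"))
```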

If you are interested in language encoding, Biblical Greek, or morphology, Tauber has a project for you!

Be forewarned: what you tag has a great deal to do with what you can and/or will see.

Enjoy!

Unfairness By Algorithm

Filed under: Bias,Computer Science — Patrick Durusau @ 5:40 pm

Unfairness By Algorithm: Distilling the Harms of Automated Decision-Making by Lauren Smith.

From the post:

Analysis of personal data can be used to improve services, advance research, and combat discrimination. However, such analysis can also create valid concerns about differential treatment of individuals or harmful impacts on vulnerable communities. These concerns can be amplified when automated decision-making uses sensitive data (such as race, gender, or familial status), impacts protected classes, or affects individuals’ eligibility for housing, employment, or other core services. When seeking to identify harms, it is important to appreciate the context of interactions between individuals, companies, and governments—including the benefits provided by automated decision-making frameworks, and the fallibility of human decision-making.

Recent discussions have highlighted legal and ethical issues raised by the use of sensitive data for hiring, policing, benefits determinations, marketing, and other purposes. These conversations can become mired in definitional challenges that make progress towards solutions difficult. There are few easy ways to navigate these issues, but if stakeholders hold frank discussions, we can do more to promote fairness, encourage responsible data use, and combat discrimination.

To facilitate these discussions, the Future of Privacy Forum (FPF) attempted to identify, articulate, and categorize the types of harm that may result from automated decision-making. To inform this effort, FPF reviewed leading books, articles, and advocacy pieces on the topic of algorithmic discrimination. We distilled both the harms and potential mitigation strategies identified in the literature into two charts. We hope you will suggest revisions, identify challenges, and help improve the document by contacting lsmith@fpf.org. In addition to presenting this document for consideration for the FTC Informational Injury workshop, we anticipate it will be useful in assessing fairness, transparency and accountability for artificial intelligence, as well as methodologies to assess impacts on rights and freedoms under the EU General Data Protection Regulation.

The primary attraction is a pair of tables: Potential Harms from Automated Decision-Making and Potential Mitigation Sets.

Take the tables as a starting point for analysis.

Some “unfair” practices, such as increased auto insurance prices for night-shift workers (which result in differential access to insurance), are actuarial questions. Insurers are not public charities and can legally discriminate based on perceived risk.

#ColorOurCollections

Filed under: Art,FBI,Library — Patrick Durusau @ 5:12 pm

#ColorOurCollections

From the webpage:

From February 5-9, 2018, libraries, archives, and other cultural institutions around the world are sharing free coloring sheets and books based on materials in their collections.

Something fun to start the week!

In addition to more than one hundred participating institutions, you can also find instructions for creating your own coloring pages.

Any of the images you find at Mardi Gras New Orleans will make great coloring pages (modulo non-commercial use and/or permissions as appropriate).

The same instructions will help you make “adult” coloring pages as well.

I wasn’t able to get attractive results for Pedro Berruguete’s Saint Dominic Presiding over an Auto-da-fé (1495) using the simple instructions but will continue to play with it.

I have high hopes for an Auto-da-fé coloring page, with FBI leaders who violate the privacy of American citizens as the focal point. (There are honest, decent and valuable FBI agents, but like other groups, only the bad apples get the press.)

February 3, 2018

Mapping Militant Selfies: …Generating Battlefield Data

Filed under: Entity Extraction,Entity Resolution,Mapping,Maps — Patrick Durusau @ 4:22 pm

Mapping Militant Selfies – Application of Entity Recognition/Extraction Methods to Generate Battlefield Data in Northern Syria (video) – presentation by Akin Unver.

From the seminar description:

As the Middle East goes through one of its most historic, yet painful episodes, the fate of the region’s Kurds has drawn substantial interest. Transnational Kurdish awakening—both political and armed—has attracted unprecedented global interest as individual Kurdish minorities across four countries, Turkey, Iraq, Iran, and Syria, have begun to shake their respective political status quo in various ways. In order to analyse this trend in a region in flux, this paper introduces a new methodology for generating computerised geopolitical data. Selfies of militants from three main warring non-state actors, ISIS, YPG and FSA, from February 2014 to February 2016, were sorted and operationalized through a dedicated repository of geopolitical events, extracted from a comprehensive open source archive of Turkish, Kurdish, Arabic, and Farsi sources, and constructed using entity extraction and recognition algorithms. These selfies were crosschecked against events related to conflict, such as unrest, attacks, sabotage and bombings, which were then filtered based on human-curated lists of actors and locations. The result is a focused data set of more than 2000 events (or activity nodes) with a high level of geographical and temporal granularity. This data is then used to generate a series of four heat maps based on six-month intervals. They highlight the intensity of armed group events and the evolution of multiple fronts in the border regions of Turkey, Syria, Iraq and Iran.

Great presentation that includes the goal of:

With no reliance on ‘official’ (censored) data

Unfortunately, the technical infrastructure isn’t touched upon, nor were any links given. I have written to Professor Unver asking for further information.

Although Unver focuses on the Kurds, these techniques support ad-hoc battlefield data systems, putting irregular forces at information parity with better-funded adversaries.

Replace selfies with time-stamped, geo-located images of government forces, add image recognition, and with a little discipline you have the start of a highly effective force, even if badly outnumbered.
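The entity extraction building blocks are off the shelf these days. A minimal sketch in Python with spaCy (the sentence is invented, and Unver’s pipeline ran over Turkish, Kurdish, Arabic, and Farsi sources with models of its own):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small English model, for illustration only
doc = nlp("Clashes were reported near Kobani on Tuesday after an attack on a checkpoint.")

# Locations (GPE) and dates are the raw material for geolocated event nodes.
for ent in doc.ents:
    print(ent.text, ent.label_)     # e.g. "Kobani GPE", "Tuesday DATE"
```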

If you are interested in more academic application of this technology, see:

Schrödinger’s Kurds: Transnational Kurdish Geopolitics In The Age Of Shifting Borders

Abstract:

As the Middle East goes through one of its most historic, yet painful episodes, the fate of the region’s Kurds has drawn substantial interest. Transnational Kurdish awakening—both political and armed—has attracted unprecedented global interest as individual Kurdish minorities across four countries, Turkey, Iraq, Iran, and Syria, have begun to shake their respective political status quo in various ways. It is in Syria that the Kurds have made perhaps their largest impact, largely owing to the intensification of the civil war and the breakdown of state authority along Kurdish-dominated northern borderlands. However, in Turkey, Iraq, and Iran too, Kurds are searching for a new status quo, using multiple and sometimes mutually defeating methods. This article looks at the future of the Kurds in the Middle East through a geopolitical approach. It begins with an exposition of the Kurds’ geographical history and politics, emphasizing the natural anchor provided by the Taurus and Zagros mountains. That anchor, history tells us, has both rendered the Kurds extremely resilient to systemic changes to larger states in their environment, and also provided hindrance to the materialization of a unified Kurdish political will. Then, the article assesses the theoretical relationship between weak states and strong non-states, and examines why the weakening of state authority in Syria has created a spillover effect on all Kurds in its neighborhood. In addition to discussing classical geopolitics, the article also reflects upon demography, tribalism, Islam, and socialism as additional variables that add and expand the debate of Kurdish geopolitics. The article also takes a big-data approach to Kurdish geopolitics by introducing a new geopolitical research methodology, using large-volume and rapid-processed entity extraction and recognition algorithms to convert data into heat maps that reveal the general pattern of Kurdish geopolitics in transition across four host countries.

A basic app should run on Tails, in memory, such that if your coordinating position is compromised, powering down (jerking out the power cord) destroys all the data.

Hmmm, encrypted delivery of processed data from a web service to the coordinator, such that their computer is only displaying data.

Other requirements?

Where Are Topic Mappers Today? Lars Marius Garshol

Filed under: Games,PageRank — Patrick Durusau @ 11:37 am

Some are creating new children’s games:

If you’re interested, Ian Rogers has a complete explanation with examples at: The Google Pagerank Algorithm and How It Works or a different take with a table of approximate results at: RITE Wiki: Page Rank.
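For reference, the textbook formulation, before any, ahem, interference, in a short Python power-iteration sketch (the toy link graph is made up):

```python
def pagerank(links, d=0.85, iters=50):
    """Power iteration over PR(p) = (1 - d)/N + d * sum(PR(q)/out(q))."""
    pages = list(links)
    pr = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / len(pages) for p in pages}
        for q, outs in links.items():
            for p in outs:
                new[p] += d * pr[q] / len(outs)  # q passes rank to its targets
        pr = new
    return pr

# Toy graph: A links to B and C, B links to C, C links back to A.
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
```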

Unfortunately, both Garshol and Wikipedia’s PageRank page get the Google PageRank algorithm incorrect.

The correct formulation reads:

The results of the reported algorithm are divided by U.S. Government Interference, an unknown quantity.

Perhaps that is why Google keeps its pagerank calculation secret. If I were an allegedly sovereign nation, I would keep Google’s lapdog relationship to the U.S. government firmly in mind.

IDA v7.0 Released as Freeware – Comparison to The IDA Pro Book?

Filed under: Cybersecurity,Hacking,Programming — Patrick Durusau @ 9:04 am

IDA v7.0 Released as Freeware

From the download page:

The freeware version of IDA v7.0 has the following limitations:

  • no commercial use is allowed
  • lacks all features introduced in IDA > v7.0
  • lacks support for many processors, file formats, debugging etc…
  • comes without technical support

Copious amounts of documentation are online.

I haven’t seen The IDA Pro Book by Chris Eagle, but it was published in 2011. Do you know anyone who has compared The IDA Pro Book to version 7.0?

Two promising pages: IDA Support Overview and IDA Support: Links (external).

February 2, 2018

PubMed Commons to be Discontinued

Filed under: Bioinformatics,Medical Informatics,PubMed,Social Media — Patrick Durusau @ 5:10 pm

PubMed Commons to be Discontinued

From the post:

PubMed Commons has been a valuable experiment in supporting discussion of published scientific literature. The service was first introduced as a pilot project in the fall of 2013 and was reviewed in 2015. Despite low levels of use at that time, NIH decided to extend the effort for another year or two in hopes that participation would increase. Unfortunately, usage has remained minimal, with comments submitted on only 6,000 of the 28 million articles indexed in PubMed.

While many worthwhile comments were made through the service during its 4 years of operation, NIH has decided that the low level of participation does not warrant continued investment in the project, particularly given the availability of other commenting venues.

Comments will still be available, see the post for details.

Good time for the reminder that even negative results from an experiment are valuable.

Even more so in this case because discussion/comment facilities are non-trivial components of a content delivery system. Time and resources not spent on comment facilities could be put in other directions.

Where do discussions of medical articles take place and can they be used to automatically annotate published articles?

The Unix Workbench

Filed under: Linux OS — Patrick Durusau @ 2:47 pm

The Unix Workbench by Sean Kross. Unlikely to help you, but a great resource to pass along to new Unix users.

Some day, Microsoft will complete the long transition to Unix. Start today and you will arrive years ahead of it. 😉

Discrediting the FBI?

Filed under: FBI,Government — Patrick Durusau @ 2:27 pm

Whatever your opinion of the accidental U.S. president (that’s a dead giveaway), what does it mean to “discredit” the FBI?

Just hitting the high points: the FBI has a long history of lying and abuse, and the recent examples are only the latest installments.

So my question remains: What does it mean to “discredit” the FBI?

The FBI and its agents are unworthy of any belief by anyone. Their own records and admissions are a story of staggering from one lie to the next.

I’ll grant the FBI is large enough that honorable, hard-working, honest agents must exist. But not enough of them to prevent the repeated failures at the FBI.

Anyone who credits any FBI investigation has motivations other than the factual record of the FBI.

PS: The Nunes memo confirms what many have long suspected about the FISA court: It exercises no more meaningful oversight over FISA warrants than a physical rubber stamp would in their place.

How To Secure Sex Toys – End to End (so to speak)

Filed under: Cybersecurity,Hacking,Security — Patrick Durusau @ 1:40 pm

Thursday began innocently enough and then I encountered:

The tumult of articles started (I think) with Internet of Dildos: A Long Way to a Vibrant Future – From IoT to IoD, which covers security flaws in the Vibratissimo PantyBuster, MagicMotion Flamingo, and Realov Lydia, and reads in part:


The results are the foundation for a Master’s thesis written by Werner Schober in cooperation with SEC Consult and the University of Applied Sciences St. Pölten. The first available results can be found in the following chapters of this blog post.

The sex toys of the “Vibratissimo” product line and their cloud platform, both manufactured and operated by the German company Amor Gummiwaren GmbH, were affected by severe security vulnerabilities. The information we present is not only relevant from a technological perspective, but also from a data protection and privacy perspective. The database containing all the customer data (explicit images, chat logs, sexual orientation, email addresses, passwords in clear text, etc.) was basically readable for everyone on the internet. Moreover, an attacker was able to remotely pleasure individuals without their consent. This could be possible if an attacker is nearby a victim (within Bluetooth range), or even over the internet. Furthermore, the enumeration of explicit images of all users is possible because of predictable numbers and missing authorization checks.

Other coverage of the vulnerability includes:

Vibratissimo product line (includes the PantyBuster).

The cited coverage doesn’t answer the question: how do you incentivize end-to-end encrypted sex toys?

Here’s one suggestion: Buy the PantyBuster or other “smart” sex toys in bulk. Re-ship these sex toys, after duly noting their serial numbers and other access information, to your government representatives, sports or TV figures, judges, military officers, etc. People whose privacy matters to the government.

If someone were to post a list of such devices, well, you can imagine the speed with which sex toys will be required to be encrypted in your market.

Some people see vulnerabilities and see problems.

I see the same vulnerabilities and see endless possibilities.

Weird Machines, exploitability, and proven unexploitability – Video

Filed under: Cybersecurity,Hacking,Security — Patrick Durusau @ 10:32 am

Thomas Dullien/Halvar Flake’s presentation Weird Machines, exploitability, and proven unexploitability won’t embed but you can watch it on Vimeo.

Great presentation of the paper I mentioned at: Weird machines, exploitability, and provable unexploitability.

Includes an image of a “MitiGator” (see the video).

Views “software as an emulator for the finite state machine I would like to have.” (rough paraphrase)

Another gem: attackers don’t distinguish between data and programming.

OK, one more gem and you have to go watch the video:

Proof of unexploitability:

Mostly rote exhaustion of the possible weird state transitions.

The example used is “several orders of magnitude” less complicated than most software. Possible to prove but difficult even with simple examples.

Definitely a “watch this space” field of computer science.

Appendices with code: http://www.dullien.net/thomas/weird-machines-exploitability.pdf

February 1, 2018

NSA Exploits – Mining Malware – Ethics Question

Filed under: Cybersecurity,Ethics,Hacking,NSA,Security — Patrick Durusau @ 9:24 pm

New Monero mining malware infected 500K PCs by using 2 NSA exploits

From the post:

It looks like the craze of cryptocurrency mining is taking over the world by storm, as every new day there is new malware targeting unsuspecting users to use their computing power to mine cryptocurrency. Recently, the IT security researchers at Proofpoint discovered a Monero mining malware that uses the leaked NSA (National Security Agency) EternalBlue exploit to spread itself.

The post also mentions use of another NSA exploit, EsteemAudit.

A fair number of leads and worth your time to read in detail.

I suspect most of the data science ethics crowd will downvote the use of NSA exploits (EternalBlue, EsteemAudit) for cryptocurrency mining.

Here’s a somewhat harder data science ethics question:

Is it ethical to infect 500,000+ Windows computers belonging to a government for the purpose of obtaining internal documents?

Does your answer depend upon which government and what documents?

Governments don’t take your rights into consideration. Should you take their laws into consideration?
