Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 30, 2019

‘Diversity in Faces’ Dataset – Are You Being Treated Unfairly? As A Matter of Fact, Yes.

IBM Research Releases ‘Diversity in Faces’ Dataset to Advance Study of Fairness in Facial Recognition Systems by John R. Smith.

From the post:

Have you ever been treated unfairly? How did it make you feel? Probably not too good. Most people generally agree that a fairer world is a better world, and our AI researchers couldn’t agree more. That’s why we are harnessing the power of science to create AI systems that are more fair and accurate.

Many of our recent advances in AI have produced remarkable capabilities for computers to accomplish increasingly sophisticated and important tasks, like translating speech across languages to bridge communications across cultures, improving complex interactions between people and machines, and automatically recognizing contents of video to assist in safety applications.

Much of the power of AI today comes from the use of data-driven deep learning to train increasingly accurate models by using growing amounts of data. However, the strength of these techniques can also be a weakness. The AI systems learn what they’re taught, and if they are not taught with robust and diverse datasets, accuracy and fairness could be at risk. For that reason, IBM, along with AI developers and the research community, need to be thoughtful about what data we use for training. IBM remains committed to developing AI systems to make the world more fair.

To request access to the DiF dataset, visit our webpage. To learn more about DiF, read our paper, “Diversity in Faces.”

Nice of Smith to ask if we have “ever been treated unfairly?”

Because if not before, certainly now with the limitations on access to the “Diversity in Faces” Dataset.

Step 1

Review the DiF Terms of Use and Privacy Notice.

DOCUMENTS

Terms of use

DiF Privacy Notice

Step 2

Download and complete the questionnaire.

DOCUMENT

DiF Questionnaire (PDF)

Step 3

Email completed questionnaire to IBM Research.

APPLICATION CONTACT

Michele Merler | mimerler@us.ibm.com

Step 4

Further instructions will be provided from IBM Research via email once application is approved.

Check out Terms of Use, 3. IP Rights, 3.2 #5:


Licensee grants to IBM a non-exclusive, irrevocable, unrestricted, worldwide and paid-up right, license and sublicense to: a) include in any product or service any idea, know-how, feedback, concept, technique, invention, discovery or improvement, whether or not patentable, that Licensee provides to IBM, b) use, manufacture and market any such product or service, and c) allow others to do any of the foregoing. (emphasis added)

Treated unfairly? There’s the grasping claw of IBM so familiar across the decades. I suppose we should be thankful it doesn’t include any ideas, concepts, patents, etc., that you develop while in possession of the dataset. From that perspective, the terms of use are downright liberal.

Cyber Threats, The Modern Maginot Line … Worldwide Threat Assessment

Filed under: Cybersecurity,Hacking,Intelligence — Patrick Durusau @ 8:30 pm

Worldwide Threat Assessment of the US Intelligence Community

From the report:


China has the ability to launch cyber attacks that cause localized, temporary disruptive effects on critical infrastructure — such as disruption of a natural gas pipeline for days to weeks — in the United States.

I won’t shame the alleged author of this report by naming them.

This is a document making the case for a bigger budget, not a report to be taken seriously.

For example, I would re-write this item to read:


Any country with a budget large enough to rent earth moving equipment has the ability to cause disruptive effects on critical infrastructure — such as disruption of a natural gas pipeline for months — in the United States.

Think about the last time you heard of a contractor disrupting a gas or water main. Now improve upon that memory with the pipe being one that transports oil, natural gas or other petroleum products across state lines.

If you were planning on disrupting critical infrastructure in the US, would you fund years of iffy research and development for a cyber attack, or spend several thousand dollars on travel and equipment rental?

Cyber defense of utility infrastructure is a modern Maginot Line. It’s true someone, a very stupid someone, could attack that way, but why would they in light of easier and surer methods of disruption?

No one associated with the report asked that question because it’s a collaborative budget increase document.

PS: The techniques overlooked in the Worldwide Threat Assessment are applicable to other countries as well. (Inquire for details.)

January 25, 2019

Setting Up A Hardware Hacking Lab (How Do You Hide An Oscilloscope?)

Filed under: Cybersecurity,Hacking,IoT - Internet of Things — Patrick Durusau @ 9:24 pm

Setting Up A Hardware Hacking Lab

From the post:

One of the questions I receive more than any other is “What tools do you use for hardware hacking?” or “What tools should I buy to get started with hardware hacking?”. Rather than wasting a bunch of time answering this every time someone asks, I’ve decided to write a blog post on the subject! It’s worth noting that YOU DON’T NEED EVERYTHING on this list in order to get started. The general idea of this post is that you would pick one tool from each category and by the time you’re done you’ll have a planned out and versatile setup. Also, I’m going to try my best to add tools that fit all different budget levels.

Before you get to the oscilloscope section, you are outfitted for less than $100: enough tooling to start developing your skill set, so you can take full advantage of an oscilloscope in your hardware hacking future.

Not to mention law enforcement visitors will key on an oscilloscope, having only seen them in very bad sci-fi adventures. You might be a space alien or something. Creative ways to conceal an oscilloscope?

January 24, 2019

What The Hell Happened (2016) – Data Questions

Filed under: Data,Politics,Survey — Patrick Durusau @ 9:12 pm

What The Hell Happened (WTHH)

From the homepage:

Every progressive remembers waking up on November 9th, 2016. The question on everyone’s mind was… “What the hell happened?”

Pundits were quick to blame “identity politics” for Clinton’s loss. Recent research suggests this framing may have led voters to be less supportive of women candidates and candidates of color.

That’s why we’re introducing the What The Hell Happened Project, where we will work with academics, practitioners and advocates to explain the 2018 election from beginning to end.

Let’s cut to the data:

This survey is based on 3,215 interviews of registered voters conducted by YouGov. The sample was weighted according to age, sex, race, education, urban/rural status, partisanship, marital status, and Census region to be nationally representative of 2018 voters according to Catalist, and to a post-election correction consisting of the national two-party vote share. Respondents were selected from YouGov and other opt-in panels to be representative of registered voters. The weights range from 0.28 to 4.6, with a mean of 1 and a standard deviation of 0.53.

The survey dataset includes measures of political participation such as activism, group consciousness, and vote choice. It also includes measures of interest including items from a hostile sexism battery, racial resentment, fear of demographic change, fear of cultural change, and a variety of policy positions. It includes a rich demographic battery of items like age, race, ethnicity, sex, party identification, income, education, and US state. Please see the attached codebook for a full description and coding of the variables in this survey, as well as the toplines for breakdowns of some of the key variables.

The dataset also includes recodes to scale the hostile sexism items to a 0-1 scale of hostile sexism, the racial animus items to a 0-1 scale of racial animus, and the demographic change items to a 0-1 scale of fear of demographic change. See the codebook for more details. We created a two-way vote choice variable to capture Democrat/Republican voting by imputing the vote choice of undecided respondents based on a Catalist partisanship model for those respondents, who comprised about 5% of the sample.

To explore the data we have embedded a Crunchbox, which you can use to easily make crosstabs and charts of the data. Here, you can click around many of the political and demographic items and look around for interesting trends to explore.

If you want a winning candidate in 2020, repeat every morning: Focus on 2020, Focus on 2020.

Your candidate is not running in 2016 or even 2018.

And, your candidate needs better voter data than WTHH offers here.

First, how was the data gathered?

Respondents were selected from YouGov and other opt-in panels to be representative of registered voters.

Yikes! That’s not how professional pollsters do surveys. It may be ok for learning analysis tools but not for serious political forecasting.

Second, what manipulation, if any, of the data, has been performed?

The sample was weighted according to age, sex, race, education, urban/rural status, partisanship, marital status, and Census region to be nationally representative of 2018 voters according to Catalist, and to a post-election correction consisting of the national two-party vote share.

Oh. So we don’t know what biases or faults the weighting process may have introduced to the data. Great.
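One thing the published numbers do let you estimate is how much statistical power the weighting costs. The sketch below applies Kish's approximation for effective sample size to the figures quoted above (3,215 interviews, mean weight 1, standard deviation 0.53). It is a rough check under the assumption that those summary statistics are all you have, not a reconstruction of how WTHH or YouGov evaluated the weights; the function names are mine.

```python
# Rough Kish approximation of effective sample size from summary statistics.
# Assumption: only the published n, mean weight, and standard deviation are
# available; the per-respondent weights would give an exact figure.

def kish_effective_n(n, weight_mean, weight_sd):
    """Approximate effective sample size: n / (1 + cv^2), cv = sd / mean."""
    cv_squared = (weight_sd / weight_mean) ** 2
    return n / (1.0 + cv_squared)


def kish_effective_n_from_weights(weights):
    """Kish formula when individual weights are known: (sum w)^2 / sum w^2."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


n_eff = kish_effective_n(3215, 1.0, 0.53)
design_effect = 3215 / n_eff
print(f"effective n ~ {n_eff:.0f}, design effect ~ {design_effect:.2f}")
```

By this estimate the 3,215 interviews carry about the information of 2,510 unweighted ones (a design effect near 1.28). That says nothing about bias introduced by the weighting targets themselves, which remains the open question.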

How were the questions constructed and tested?

Don’t know. (Without this step we don’t know what the question may or may not be measuring.)

How many questions were asked? (56)

Fifty-six questions. Really?

In the 1960 presidential campaign, John F. Kennedy’s staff had a matrix of 480 voter types and 52 issue clusters.

Do you see such a matrix coming out of 56 questions? Neither do I.

The WTHH data is interesting in an amateurish sort of way but winning in 2020 requires the latest data gathering and modeling techniques. Not to mention getting voters to the polling places (modeling a solution for registered but non-voting voters would be a real plus). Your Secretary of State should have prior voting behavior records.

Weather Data – 100K stations – Hourly

Filed under: Data,Weather Data — Patrick Durusau @ 8:09 pm

Weather Directory Contents

From the webpage:

This directory contains hourly weather dumps. The files are compressed using Zstandard compression (.zst). Each file is a collection of JSON objects (ndjson) and can easily be parsed by any utility that has a JSON decode library (including Python, Java, Perl, PHP, etc.) Please contact me if you have any questions about the file format or the fields within the JSON objects. The field “retrieved_utc” is a field that I added that gives the time of when the data was retrieved. The format of the files is WEATHER_YYYY-MM-DD-HH (UTC time format).

Please consider making a donation (https://pushshift.io/donations) if you download a lot of data. This helps offset the costs of my time collecting data and providing bandwidth to make these files available to the public. Thank you!

If you have any questions about the data formats of the files or any other questions, please feel free to contact me at jason@pushshift.io

A project of pushshift.io, the homepage of which is a collection of statistics on Reddit posts.

Looking at the compressed files for today (24 January 2019), the earliest file is dated Jan 24 2019 AM and tips the scales at 35,067,516 bytes. The hourly files run between 65,989,336 and 72,272,568 bytes. Remember these files are compressed, so you need a lot of space or must work with them compressed.
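Working with the files compressed is not hard. Here is a minimal sketch of streaming one Zstandard-compressed ndjson dump record by record, so nothing is expanded to disk. It assumes the third-party `zstandard` package is installed; the function names and the sample records are mine, and the only field name taken from the directory description is `retrieved_utc`.

```python
import io
import json


def iter_ndjson(binary_stream):
    """Yield one dict per non-empty line of a binary ndjson stream."""
    text = io.TextIOWrapper(binary_stream, encoding="utf-8")
    for line in text:
        line = line.strip()
        if line:
            yield json.loads(line)


def iter_weather_dump(path):
    """Stream records from a .zst dump without decompressing it to disk.

    Assumes the third-party `zstandard` package (pip install zstandard).
    """
    import zstandard  # imported lazily so iter_ndjson works without it

    with open(path, "rb") as fh:
        reader = zstandard.ZstdDecompressor().stream_reader(fh)
        yield from iter_ndjson(reader)


if __name__ == "__main__":
    # Made-up sample records; real dumps carry many more fields.
    sample = io.BytesIO(
        b'{"retrieved_utc": 1548288000}\n\n{"retrieved_utc": 1548291600}\n'
    )
    for record in iter_ndjson(sample):
        print(record["retrieved_utc"])
```

For a real file you would loop over `iter_weather_dump(path)` with a path following the WEATHER_YYYY-MM-DD-HH naming described above, and filter or aggregate as you go rather than collecting everything in memory.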

The perfect data if your boss is a weather freak. Be sure to mention the donation link to them.

Enjoy!

January 23, 2019

[Tails] USB images instead of ISO images – Testing Needed – Release January 29th

Filed under: Privacy,Tails — Patrick Durusau @ 3:20 pm

[Tails] USB images instead of ISO images – Testing Needed

From the webpage:

We need your help to test the simplified installation methods of Tails that we will release with 3.12 on January 29.

The method will be much simpler and faster, especially for macOS users, but for Windows users as well. Debian and Ubuntu users won’t have to install a specific program anymore and the process will also be faster for other Linux users.

In short, instead of downloading an ISO image (a format originally designed for CDs) you will download a USB image that is already an image of the data as written to your USB stick by Tails Installer. So no need for Tails Installer anymore and no need for an intermediary Tails nor a second USB stick when installing from Windows or macOS.

You should be able to create a persistent volume right away.

The methods for upgrading Tails will remain the same.
… (emphasis in original)

Got a few minutes?

The privacy you protect may be your own!

January 19, 2019

Targeting Government Contractors/Subcontractors (U.S.)

Filed under: Cybersecurity,Government,Hacking — Patrick Durusau @ 8:18 pm

You may have seen: China’s been hacking Navy contractors for 18 months, new report reveals, which among other things says:


“It’s extremely hard for the Defense Department to secure its own systems,” Bossert said. “It’s a matter of trust and hope to secure the systems of their contractors and subcontractors.”

Subcontractors of all branches are frequently attacked by hackers due to inadequate cybersecurity measures. Officials say subcontractors are not being held accountable for those inadequacies.

Sadly, that article and the WSJ report it summarizes, Chinese Hackers Breach U.S. Navy Contractors, fail to provide any actionable details, like which Navy subcontractors?

If you knew which subcontractors, you could target advertising of your services to strengthen their defenses or not be outdone by alleged Chinese hackers. I say “alleged Chinese hackers” because attribution of hacking seems to follow a “villain of the week” pattern. Last year it was super-human North Koreans, or was that the year before? Then it has been the Russians and Chinese off and on. Now it’s the Chinese again.

To correct the lack of actionable data in those reports, I have a somewhat dated (2014) RAND report, Findings from Existing Data on the Department of Defense Industrial Base by Nancy Young Moore, Clifford A. Grammich, Judith D. Mele, that gives you several starting places for finding government subcontractors.

I need to extract the specific resources they list and update/supplement them with others but for weekend reading you could do far worse.

Think of this as one example of weaponizing public data. There are others. If gathered in book form, would you be interested?

January 18, 2019

Data Mining Relevance Practice – Iraq War Report

Filed under: Data Mining,Relevance,Search Data,Text Mining — Patrick Durusau @ 9:47 pm

By now you realize how useless relevancy at the “document” level can be, considering documents can be ten, twenty, hundreds or even thousands of pages long.

Highly relevant “hits” are great, but are you going to read every page of every document?

The main report on the Iraq War, The U.S. Army in the Iraq War – Volume 1: Invasion – Insurgency – Civil War, 2003-2006 and The U.S. Army in the Iraq War — Volume 2: Surge and Withdrawal, 2007-2011, totals more than 1,400 pages.

Along with the report, nearly 30,000 unclassified documents used in the writing of the report are also available.

Other than being timely, the advantage for data miners is the report, while a bit long, is readable and you know in advance the ~30,000 documents are relevant to that report. Ignoring footnotes (that’s cheating), which documents go with which pages of the report? You can check your answers against the footnotes.

For bonus points, what pages of the ~30,000 documents should go with which pages of the report? They weren’t citing the entire document like some search engines, but particular pages.
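If you want a baseline to beat, here is a minimal sketch of one way to start: score every document against a single report page with TF-IDF weighted cosine similarity, then compare the top hits to that page's footnotes. This is my choice of baseline, not something the report or its authors prescribe; the sample page and documents are invented stand-ins, and the tokenizer is deliberately crude.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict) per text, using smoothed IDF."""
    token_lists = [tokenize(t) for t in texts]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    n = len(texts)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(report_page, documents):
    """Rank document ids by similarity to one report page, best first."""
    ids = list(documents)
    vectors = tfidf_vectors([report_page] + [documents[i] for i in ids])
    page_vec = vectors[0]
    scored = [(cosine(page_vec, vec), doc_id)
              for doc_id, vec in zip(ids, vectors[1:])]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical stand-ins for a report page and two source documents.
    page = "Coalition forces planned the surge of brigades into Baghdad."
    docs = {
        "doc-a": "Memo on surge planning and brigade deployment to Baghdad.",
        "doc-b": "Logistics summary for fuel convoys in Kuwait.",
    }
    for score, doc_id in rank_documents(page, docs):
        print(f"{doc_id}: {score:.3f}")
```

For the bonus-points version, split each document into pages and feed the pages in as the `documents` mapping; the footnotes then tell you how often the cited page lands near the top.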

And no, I haven’t loaded these documents but hope to this weekend.

PS: The Army War College Publications office has a remarkable range of very high quality publications.

January 17, 2019

Pirate Radio Historic Texts – Where To Go From Here

Filed under: Cybersecurity,Hacking,Radio — Patrick Durusau @ 9:07 pm

Pirate Radio: two downloadable manuals

From the webpage:

Two terrific manuals on Pirate Radio available for free download: The Complete Manual of Pirate Radio, by Zeke Teflon, has technical information on building a radio – including wiring diagrams, mobile operations, parts, testing and getting away with it. Seizing the Airwaves from AK Press, edited by Ron Sakolsky and Stephen Dunifer, provides some great context for Pirate Radio, including historic pirate radio stations, the fable of free speech, community radio, what to do when the FCC come knocking, and a lot more (209 pages of it!).

“Seizing the Airwaves” (219 pages) was published in 1998 and I suspect “The Complete Manual of Pirate Radio” is the older of the two because it mentions tubes in transmitters, cassette tapes and the ARRL Handbook costing $20. (It’s now $49.95.)

These two works are interesting historical artifacts in the Internet Age but a new copy of the ARRL Handbook (2019) is an entirely different story.

It was just a day or two ago that I wrote about wirelessly seizing control of construction equipment in Who Needs a Hellfire™ Missile When You Have a Crane?

The airwaves are full of unseen but hackable data streams. How do emergency and government services communicate? What do monitors emit? WiFi is just one channel waiting for your arrival. Not to mention that the ability to access those streams means you can also interfere with or mimic messages on them.

Check out the ARRL’s What’s New page for products to expand or support your hacking skills beyond cable.

January 16, 2019

Finding Bias in Data Mining/Science: An Exercise

Filed under: Bias,Data Mining — Patrick Durusau @ 9:34 pm

I follow the MIT Technology Review on Twitter: @techreview and was amazed to see this AM:

Period of relative peace?

Really? Smells like someone has been cooking the numbers! (Another way to say bias in data mining/science.)

Unlike annual reports from corporations and foundations, you can read the MIT post in full, Data mining adds evidence that war is baked into the structure of society, and the original paper, Pattern Analysis of World Conflicts over the past 600 years by Gianluca Martelloni, Francesca Di Patti, and Ugo Bardi, plus you can access the data used for the paper: Conflict by Dr. Peter Brecke.

From general news awareness, the claim of “…period of relative peace…” should trigger skepticism on your part. I take it as a generally accepted fact that the United States was bombing somewhere in the world every day of the Obama administration and that has continued under the current U.S. president. It’s relatively peaceful in New York, London, and Berlin, but other places in the world, not so much.

I skimmed the original article and encountered this remark: “…a relatively peaceful period during the 18th century is noticeable.” I don’t remember 18th century history all that well but that strikes me as inconsistent with what I do remember. I have to wonder who was peaceful and who was not in the data set. Not saying it is wrong from one view of the data, but what underlies that statement?

Take this article, along with its data set as an exercise in finding bias in data mining/science. Bias doesn’t mean the paper or its conclusions are necessarily wrong, but choices were made with regard to the data and that shaped their paper and its conclusions.

PS: A cursory glance at the paper also finds the data used ends with the year 2000. Small comfort to the estimated 32 million Muslims who have died since 9/11, during this “period of relative peace.” You need to ask peace for whom and at what price?

January 15, 2019

Who Needs a Hellfire™ Missile When You Have a Crane?

Filed under: Cybersecurity,Hacking,IoT - Internet of Things — Patrick Durusau @ 11:06 pm

The Forbes exclusive story, Hackers Take Control Of Giant Construction Cranes by Thomas Brewster, made me follow @Forbes, @ForbesTech, and @iblametom.

Their politics really suck but stories like this one amplify the impact of IoT hacks by several orders of magnitude. Even if there was no hack. You can readily imagine the next big crane accident will be blamed on “IoT hackers.” You can even create a hacking handle to discuss industrial IoT hacking and take credit for accidents with no readily apparent cause.

Hackers will benefit more from the 82-page paper: A Security Analysis of Radio Remote Controllers for Industrial Applications by Jonathan Andersson, et al. that forms the basis for the Forbes story. (I have a copy of the pdf, just in case it disappears.) For a quick overview, see: Attacks Against Industrial Machines via Vulnerable Radio Remote Controllers: Security Analysis and Recommendations.

Just so you know, Hellfire missiles run $65K to $111K, each. Plus the delivery platform, support services, etc. A weapon limited to formal military forces.

Contrast that with IoT-enabled construction equipment that is, and is likely to remain, vulnerable to hackers. Location is opportunistic but your cost pales when compared to the investment required for a Hellfire missile.

Beyond the cost advantage, hacking construction equipment makes the familiar suddenly unfamiliar, unfriendly, and perhaps even dangerous.

Construction hacking in your area? Tip Thomas Brewster Signal: +447837496820.

January 14, 2019

Metasploit Unleashed

Filed under: Cybersecurity,Hacking,Metasploit — Patrick Durusau @ 8:22 pm

Metasploit Unleashed – Free Ethical Hacking Course

From the webpage:

The Metasploit Unleashed (MSFU) course is provided free of charge by Offensive Security in order to raise awareness for underprivileged children in East Africa. If you enjoy this free ethical hacking course, we ask that you make a donation to the Hackers For Charity non-profit 501(c)(3) organization. A sum of $9.00 will feed a child for a month, so any contribution makes a difference.

We are proud to present the most complete and in-depth Metasploit guide available, with contributions from the authors of the No Starch Press Metasploit Book. This course is a perfect starting point for Information Security Professionals who want to learn penetration testing and ethical hacking, but are not yet ready to commit to a paid course. We will teach you how to use Metasploit, in a structured and intuitive manner. Additionally, this free online ethical hacking course makes a wonderful quick reference for penetration testers, red teams, and other security professionals.

We hope you enjoy the Metasploit Unleashed course as much as we did making it!

You should start with the Requirements for the course. Seriously, read the directions first!

For example, I was anticipating using VirtualBox VMs, only to discover that the Metasploitable VM is for VMware only. So I have to install VMware, convert Metasploitable to OVF and then import into VirtualBox. That sounds like a job for tomorrow! Along with a post about my experience.

January 13, 2019

Exciting new features in XSLT 3 for book publishers

Filed under: Publishing,XSLT — Patrick Durusau @ 3:28 pm

Exciting new features in XSLT 3 for book publishers by Liam Quin.

From the post:


For e-publishers, the ability of XSLT 3 engines to read from and write to zip archives means you can generate EPUB files directly, or even extract files from ebooks. You can also process binary files, so that it’s possible to work out the size of a bitmap image in pixels, which is useful when embedding graphics into web pages or ebooks. And you can process text files a line at a time with fn:unparsed-text-lines().

Probably the single feature that’s the biggest game-changer for most people in publishing, the most fun, and that gives the largest reduction in costs, is the ability to call XSLT from within XSLT using the new fn:transform() function. This means you can easily build a collection of documents, such as making an EPUB 3 zip file, even if it involves running a separate transformation to create some or all of the components such as the spine or table of contents or index, without resorting to complex batch scripts or other programming languages. This reduces the number of programming or scripting languages you need in a project, reduces the number of components, controls the way the components interlock, and results in something easier to understand and maintain by the same person who works with the underlying XSLT transformations.

Part of a tease for Quin’s presentation at: EBOOKCRAFT March 18-19, 2019 | MaRS Discovery District. Videos from 2018 are available.

I like to think of XQuery and XSLT as ways to liberate and transform data, but I have to concede they have legitimate purposes as well. 😉

If you are an ebook publisher, Quin’s presentation at EBookCraft should be on your must-attend calendar.

Buffer Overflow Explained in Detail

Filed under: Cybersecurity,Hacking — Patrick Durusau @ 3:13 pm

Binary Exploitation – Buffer Overflow Explained in Detail by Ahmed Hesham.

From the post:

So first of all I know that there are many tutorials published about buffer overflow and binary exploitation but I decided to write this article because most of these tutorials and articles don’t really talk about the basic fund[a]mentals needed to understand what a buffer overflow really is. They just go explaining what’s a buffer overflow without explaining what is a buffer, what is a stack or what are memory addresses etc. And I just wanted to make it easier for someone who wants to learn about it to find an article that covers the basics. So what I’m going to talk about in this article is what is a buffer , what is a stack and what are the memory addresses and we will take a look at the application memory structure , what is a buffer overflow and why does it happen then I’ll show a really basic and simple example for exploiting a buffer overflow (protostar stack0)

Too basic for most readers but not all. If you are looking for more advanced materials, try the blog at: https://0xrick.github.io/, which has five “Hack the Box” walk-throughs.

Later this week I will be posting about a subject identity approach to malware identification. Any suggestions on use of a subject identity approach to identify vulnerabilities?

January 12, 2019

Reversing C code … Radare2 part I

Filed under: Radare2,Reverse Engineering — Patrick Durusau @ 9:42 pm

Reversing C code in x64 systems with Radare2 part I by Pau Muñoz.

Starting with a very basic C program, Muñoz walks you through compiling the C program and then analyzing it with Radare2.

Interested to see where this series goes.

January 11, 2019

Metasploit Framework 5.0 Released!

Filed under: Cybersecurity,Hacking,Metasploit — Patrick Durusau @ 4:52 pm

Metasploit Framework 5.0 Released!

From the post:

We are happy to announce the release of Metasploit 5.0, the culmination of work by the Metasploit team over the past year. As the first major Metasploit release since 2011, Metasploit 5.0 brings many new features, as well as a fresh release cadence. Metasploit’s new database and automation APIs, evasion modules and libraries, expanded language support, improved performance, and ease-of-use lay the groundwork for better teamwork capabilities, tool integration, and exploitation at scale.

Get it (and improve it)

As of today, you can get MSF 5 by checking out the 5.0.0 tag in the Metasploit Github project. We’re in the process of reaching out to third-party software developers to let them know that Metasploit 5 is stable and ready to ship; for information on when MSF 5 will be packaged and integrated into your favorite distribution, keep an eye on threads like this one. As always, if you find a bug, you can report it to us on Github. Friendly reminder: Your issue is a lot more likely to get attention from us and the rest of the community if you include all the information we ask for in the issue form.

Contributions from the open source community are the soul of Metasploit. Want to join the many hackers, researchers, bug hunters, and docs writers who have helped make Metasploit awesome over the years? Start here. Not into Ruby development? Help us add to our Python or Go module counts.

A beginning set of release notes for Metasploit 5.0 is here. We’ll be adding to these over the next few months. As always, community PRs are welcome! Need a primer on Framework architecture and usage? Take a look at our wiki here, and feel free to reach out to the broader community on Slack. There are also myriad public and user-generated resources on Metasploit tips, tricks, and content, so if you can’t find something you want in our wiki, ask Google or the community what they recommend.

See all the ways to stay informed and get involved at https://metasploit.com.

Before rushing off to put Metasploit Framework 5.0 to use, take a moment to consider contributing back to the Metasploit community.

The near panic for new cybersecurity hires and code to protect against attacks can only result in new security fails and vulnerabilities. Metasploit needs your help to keep up with self-inflicted security issues across government and business entities.

With your help, the CIA and NSA will be using Metasploit Framework 5.0 as their default desktop hacking app! Of course, neither the CIA nor the NSA can endorse or acknowledge their use of Metasploit but one can dream!

January 10, 2019

“…avoid[ing] data monopolies and misuse” The other purpose of data collection being?

Filed under: Privacy — Patrick Durusau @ 8:55 pm

Sorry, your data can still be identified even if it’s anonymized by Kelsey Campbell-Dollaghan.

From the post:

Thanks to the near-complete saturation of the city with sensors and smartphones, we humans are now walking, talking data factories. Passing through a subway turnstile, sending a text, even just carrying a phone in your pocket: we generate location-tagged data on an hourly basis. All that data can be a boon for urban planners and designers who want to understand cities–and, of course, for tech companies and advertisers who want to understand the people in them. Questions about data privacy are frequently met with a chorus of, It’s anonymized! Any identifying features are scrubbed from the data!

The reality, a group of MIT scientists and urban planners show in a new study, is that it’s fairly simple to figure out who is who anyway. In other words, anonymized data can be deanonymized pretty quickly when you’re working with multiple datasets within a city.

“As researchers, we believe that working with large-scale datasets can allow discovering unprecedented insights about human society and mobility, allowing us to plan cities better,” observed Daniel Kondor of MIT’s Future Urban Mobility Group in the release. “Nevertheless, it is important to show if identification is possible, so people can be aware of potential risks of sharing mobility data,” adding, “currently much of this wealth of information is held by just a few companies and public institutions that know a lot about us, while we know so little about them. We need to take care to avoid data monopolies and misuse.”

In other words, as urban planners, tech companies, and governments collect and share data, we now know that “it’s anonymized” is never a guarantee of privacy. And as they dig deep into the data we generate, cities and citizens need to demand that this data can never be reidentified.
(emphasis in original)

I’m sorely puzzled by the “…avoid data monopolies and misuse.” We already have data monopolies and misuse of data (Facebook, for example).

Do you think they mean break up data monopolies and regulate the use of data?

Both of those seem very unlikely.

A solution may lie in “…just a few companies and public institutions that know a lot about us, while we know so little about them.”

While freeing data from “just a few companies and public institutions,” you could learn and share a great deal about them.

Something to keep in mind!

January 9, 2019

Summer is Coming! Balisage is Coming! Papers Due April 12, 2019!

Filed under: Conferences,XML,XML Database,XML Query Rewriting,XML Schema,XPath,XProc,XQuery,XSLT — Patrick Durusau @ 7:52 pm

From a recent email about Balisage 2019:

Some “Balisage: The Markup Conference 2019” dates are coming soon:

March 29, 2019 — Peer-review applications due
April 12, 2019 — Paper submissions due
July 30 — August 2, 2019 — Balisage: The Markup Conference
July 29, 2019 — Pre-conference Symposium – Topic to be announced https://www.balisage.net/

Balisage: where serious markup practitioners and theoreticians meet every August.

A colleague recently asked me to share the program for Balisage 2019 to help support a request to attend. What, I was asked, will we talk about at Balisage 2019. I replied “It will be a variety of topics relating to markup, but we won’t know the specifics until May.” “Why? It seems like you should know that now.” was the response. “Why don’t you just decide who you want to talk about what and assign topics?” “Because that would not be a contributed paper conference, it would be some other sort of event!”

Balisage *is* a contributed paper conference, and the submissions from people who want to speak drive the program, the hallway conversations, and the whole tone of Balisage!

If you want to speak at Balisage 2019, if you want to help shape the conversation, if you have an idea, experience, opinion, or question relating to markup, please submit a paper to Balisage 2019!

We solicit papers on any aspect of markup and its uses; topics include but ARE NOT LIMITED TO:

• Cutting-edge applications of XML and related technologies
• Integration of XML with other technologies (e.g., content management, XSLT, XQuery)
• Performance issues in parsing, XML database retrieval, or XSLT processing
• Development of angle-bracket-free user interfaces for non-technical users
• Deployment of XML systems for enterprise data
• Design and implementation of XML vocabularies
• Case studies of the use of XML for publishing, interchange, or archiving
• Alternatives to XML/JSON/whatever
• Expressive power and application adequacy of XSD, Relax NG, DTDs, Schematron, and other schema languages
• Invisible XML

Detailed Call for Participation: https://www.balisage.net/Call4Participation.html
Call for Peer Reviewers: https://www.balisage.net/peer/ReviewAppForm.html
About Balisage: https://www.balisage.net/

For more information: info@balisage.net or +1 301 315 9631

Papers are due for Balisage in a little more than 90 days.

Anyone doing a topic map paper this year?

“If you can point to it, we can identify it. If we can identify it, we can map it. If we can map it, …,” well, you know how the rest of it goes.

Data silos continue to exist because they are armor. Armor that protects some stakeholders from prying eyes. Up for a little peeping?

January 8, 2019

Zerodium Bounties 2019

Filed under: Cybersecurity,Hacking — Patrick Durusau @ 8:13 pm

The power of competition for exploits?

Jan. 7, 2019 – Payouts for the majority of Desktops/Servers and Mobile exploits have been increased. Major changes are highlighted below:

Increased Payouts (Mobiles)
$2,000,000 – Apple iOS remote jailbreak (Zero Click) with persistence (previously: $1,500,000)
$1,500,000 – Apple iOS remote jailbreak (One Click) with persistence (previously: $1,000,000)
$1,000,000 – WhatsApp, iMessage, or SMS/MMS remote code execution (previously: $500,000)
   $500,000 – Chrome RCE + LPE (Android) including a sandbox escape (previously: $200,000)
   $500,000 – Safari + LPE (iOS) including a sandbox escape (previously: $200,000)
   $200,000 – Local privilege escalation to either kernel or root for Android or iOS (previously: $100,000)
   $100,000 – Local pin/passcode or Touch ID bypass for Android or iOS (previously: $15,000)

NOTE: Payouts were also increased for other products including: RCE via documents/medias, RCE via MitM, ASLR or kASLR bypass, information disclosure, etc.

Increased Payouts (Servers/Desktops)
$1,000,000 – Windows RCE (Zero Click) e.g. via SMB or RDP packets (previously: $500,000)
   $500,000 – Chrome RCE + SBX (Windows) including a sandbox escape (previously: $250,000)
   $500,000 – Apache or MS IIS RCE i.e. remote exploits via HTTP(S) requests (previously: $250,000)
   $250,000 – Outlook RCE i.e. remote exploits via a malicious email (previously: $150,000)
   $250,000 – PHP or OpenSSL RCE (previously: $150,000)
   $250,000 – MS Exchange Server RCE (previously: $150,000)
   $200,000 – VMWare ESXi VM Escape i.e. guest-to-host escape (previously: $100,000)
     $80,000 – Windows local privilege escalation or sandbox escape (previously: $50,000)

NOTE: Payouts were also increased for other products including: Thunderbird, VMWare Workstation, Plesk, cPanel, Webmin, WordPress, 7-Zip, WinRAR, etc.

Not quite in the star athlete range but getting there.

The higher the bounties, the more people will be hunting. Not unlike the lottery. Some of them will win based on skill, others will stumble on exploits.

What we really need is a competitive market for data, however it is obtained.

January 7, 2019

200 Black Women in Tech … On Twitter

Filed under: Diversity,Feminism — Patrick Durusau @ 3:32 pm

The 2018 List of 200 Black Women in Tech to Follow On Twitter by Jay Jay Ghatt.

I assume most people reading my blog are likely technical experts in one or more areas. To become such experts, you have worked, read, practiced and talked to others about your area of expertise. Becoming an expert, or even getting close to being one, is hard work.

Notice that you didn’t say, for example, you saw an XML book in a bookstore or a friend of yours had a book on XQuery or you remember other people like you, who didn’t know anything about XML, discussing it. That’s not the path to becoming an expert in XML.

Surprise, surprise, surprise, that’s also not the path to being an expert in any other field.

It’s also not the path to learning what diversity and feminism can bring to tech. If you don’t think tech needs help, remember there are legacy flaws in chip architecture more than 20 years old.

We can’t know if a design environment enriched by diversity and feminism would have avoided those flaws. What we do know is those flaws and many others were produced in monochrome and non-diverse environments. So, should we continue with design process conditions known to fail or should we try for more complex interactions?

For the impatient:

FOLLOW ALL: If you are on Twitter, you can follow everyone on this list at once by following this Twitter List! (emphasis in original)

For the non-impatient, Ghatt has all 200 members with their Twitter bios listed, should you want to pick and choose.

The first step towards the advantages of diversity is to start listening to diverse voices. That requires effort on your part. Just like becoming an expert.

January 6, 2019

Supporting Black Women/Girls Survivors – @ErynnBrook

Filed under: Feminism — Patrick Durusau @ 10:45 pm

Erynn Brook @ErynnBrook created a Twitter thread on supporting Black Women/Girls survivors.

I started to extract all the resources she lists but then thought: Part of educating yourself about Black Women/Girls and their issues is taking the time to ferret out stories, resources and to spend time listening to them.

I may yet create a collated version from Brook’s thread but for the moment, take her advice to heart:

Spend some time with the stories you see and really listen to black women

January 5, 2019

Papers With Code [Machine Learning]

Filed under: Machine Learning,Programming — Patrick Durusau @ 10:00 pm

Papers With Code by Zaur Fataliyev

From the webpage:

This work is in continuous progress and update. We are adding new PWC everyday! Tweet me @fvzaur.

Use this thread to request us your favorite conference to be added to our watchlist and to PWC list.

A truly remarkable collection of papers with code for machine learning.

Is this one of the first sites you hit in the morning?

January 4, 2019

Crypto-Cash for Crypto-Cache : The Dark Overlord

Filed under: Government,Government Data,Hacking,Intelligence — Patrick Durusau @ 8:24 pm
Crypto-Cash for Crypto-Cache

This is the thedarkoverlord here to deliver a message.


Our Official Bitcoin Wallet Address: 192ZobzfZxAkacLGmg9oY4M9y8MVTPxh7U


As the world is aware, we released our first decryption key for the ‘Preview_Documents.container’ Veracrypt container that contained a small sample of documents to continue to verify the authenticity of our claims. The decryption key for this container is: *CZ4=I{YZ456zGecgg9/cCz|zNP5bZ,nCvJqDZKrq@v?O5V$FezCNs26CD;e:%N^

There’s five layers to go. Layer 1, 2, 3, 4, and fine finally Layer 5. Each layer contains more secrets, more damaging materials, more SSI, more SCI, more government investigation materials, and generally just more truth. Consider our motivations (money, specifically Bitcoin), we’re not inclined to leak the juiciest items until we’re paid in full. However, in the interest of public awareness and transparency, we’re officially announcing our tiered compensation plan. …

This press release is reviewed at: Hacker group releases ‘9/11 Papers’, says future leaks will ‘burn down’ US deep state.

Nothing explosive in the initial documents but you have to wonder why they were scrubbed from Reddit, Pastebin, and Twitter, “immediately.”

I don’t see any ethical issue with The Dark Overlord charging for these documents. We are held hostage by utility, cable, ISP, mortgage and other hostiles. It’s a proven money-making model so why the tension over it being used here?

For further details, see the press release by The Dark Overlord. Please consider contributing to fund the release of these documents.

P.S. I rather doubt any document or report is going to bring down the “deep state.” Remember that it employs hundreds of thousands of people and numerous contractors and vendors. Shutting it down would cripple local economies in a number of places. It likely exists because it is needed.

January 3, 2019

Getting Started with… Middle Egyptian [Middle Egyptian Code Talker?]

Filed under: Cybersecurity,Hacking,Hieroglyphics — Patrick Durusau @ 9:29 pm

Getting Started with… Middle Egyptian by Patrick J. Burns.

Middle Egyptian, sometimes referred to as Classical Egyptian, refers to the language spoken at Egypt from the beginning of the second millennium BCE to roughly 1300 BCE, or midway through the New Kingdom. It is also the written, hieroglyphic language of this period and so the medium in which the classical Egyptian literature of this period is transmitted. Funerary inscriptions, wisdom texts, heroic narratives like the “Tale of Sinuhe” or the “Shipwrecked Sailor,” and religious hymns have all come down to us in Middle Egyptian hieroglyphic. We also have papyri from this period written in a cursive script known as hieratic. The “middle” separates this phase of the Egyptian language from that of the previous millennium, or Old Egyptian (for example, the “pyramid” texts), and Late Egyptian, which begins in the second half of the New Kingdom and lasts until roughly 700 BCE with the emergence of Demotic. …

It’s been years since I seriously looked at a Middle Egyptian grammar or text but as a hobby, you could do far worse.

For hackers it offers the potential to keep records only you can read.

I don’t mean illegible, we can all do that, but written in a meaningful script that is decodable only by you.

Even better, you can use quotations from known religious texts for your notes. Various law enforcement agencies can hire experts (hope they charge top dollar) to translate your notes, only to find standard Middle Egyptian religious texts. Maybe that’s your thing. No way to prove otherwise.

The other upside is your support for the publishing of Middle Egyptian grammars, readers, and payments to Middle Egyptian experts by authorities for translation of standard texts. Bes will see the humor in such payments.

Enjoy!

January 2, 2019

The Soviet Threat [American View]

Filed under: Cybersecurity,Hacking,News,Reporting — Patrick Durusau @ 2:50 pm
John Klossner at Dark Reading.

Klossner’s cartoon illustrates the nature of American reporting on international cybersecurity. Foes of America, in this case Russians, are depicted as criminals who routinely attack American businesses.

Shrugs. For all I know, the “routinely attack American businesses” may be true, for the Russians as well as others. What distorts American reporting is its failure to remind readers that America uses illegal cyber means, illegal activity in general and brute force to do the same.

America is not a besieged group of innocents crowded into a nunnery surrounded by child molesters and rapists, forced to defend itself with any means that come to hand.

No, America is more like the largest pimp at a poker game, where wagers are in human flesh and America raises its oil-engorged face from time to time to question the morals of other players.

I enjoy IT cartoons but prefer satirical ones telling truths the mainstream press can’t stomach.

The Worst [Best?] Hacks of 2018

Filed under: Uncategorized — Patrick Durusau @ 10:09 am

The Worst Hacks of 2018 by Casey Chin.

After years of targeted hacks, epic heists, and run of the mill data breaches you might think that institutions would be getting wise to the importance of strong cybersecurity. But it seems 2018 was not the year. Here’s WIRED’s look back at the biggest breaches, data exposures, ransomware attacks, state-sponsored campaigns, and general hacks of the year. Stay safe in 2019.

Worst or Best hack depends on your point of view. I have no principled reason to disagree with WIRED’s choices but note, with more than a little disappointment, the lack of government breaches in 2018.

Yes, the Atlanta ransomware incident is listed, but ransomware is hardly a step towards government transparency, is it?

Perhaps that’s it. When I think of government breaches I think of widespread dissemination of information a government would prefer to keep secret. Taking it and hiding it again (supply the name of your favorite drip, drip, drip journalism project) doesn’t strike me as transparency.

Granted, Wikileaks demonstrated that the drip, drip, drip release of emails in a presidential campaign can keep the media in a frenzy over nonsensical information. I suppose I am confessing that I prefer transparency over media frenzies or media outlets using hacked information for their own financial benefit.

Will 2019 be another desert of major government document/data breaches? You have 363 days to influence next year’s best/worst data hack report.


Constructing Stoplists for Historical Languages [Hackers?]

Filed under: Classics,Cybersecurity,Hacking,Natural Language Processing — Patrick Durusau @ 9:50 am

Constructing Stoplists for Historical Languages by Patrick J. Burns.

Abstract

Stoplists are lists of words that have been filtered from documents prior to text analysis tasks, usually words that are either high frequency or that have low semantic value. This paper describes the development of a generalizable method for building stoplists in the Classical Language Toolkit (CLTK), an open-source Python platform for natural language processing research on historical languages. Stoplists are not readily available for many historical languages, and those that are available often offer little documentation about their sources or method of construction. The development of a generalizable method for building historical-language stoplists offers the following benefits: 1. better support for well-documented, data-driven, and replicable results in the use of CLTK resources; 2. reduction of arbitrary decision-making in building stoplists; 3. increased consistency in how stopwords are extracted from documents across multiple languages; and 4. clearer guidelines and standards for CLTK developers and contributors, a helpful step forward in managing the complexity of a multi-language open-source project.

I post this in part to spread the word about these stoplists for humanists.

At the same time, I’m curious about the use of stoplists by hackers to filter cruft from disassembled files. Disassembled files are “texts” of a sort and it seems to me that many of the tools used by humanists could, emphasis on could, be relevant.
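As a starting point for discussion, here is a minimal sketch of the idea, assuming the simplest possible approach: treat each disassembled function as a "document" of instruction mnemonics and put the most frequent mnemonics on the stoplist. This is plain frequency counting with the standard library, not the CLTK method from the paper, and the function names, the sample listings, and the cutoff of four are all hypothetical.

```python
from collections import Counter


def build_stoplist(documents, size=4):
    """Return the `size` most frequent tokens across all documents."""
    counts = Counter(token for doc in documents for token in doc)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:size]]


def filter_tokens(tokens, stoplist):
    """Drop every token that appears on the stoplist, keeping order."""
    stop = set(stoplist)
    return [t for t in tokens if t not in stop]


# Hypothetical "documents": mnemonic sequences from three disassembled functions.
listings = [
    ["push", "mov", "mov", "call", "mov", "pop", "ret"],
    ["push", "mov", "xor", "mov", "syscall", "pop", "ret"],
    ["push", "mov", "cmp", "jne", "mov", "pop", "ret"],
]

stoplist = build_stoplist(listings, size=4)
filtered = [filter_tokens(doc, stoplist) for doc in listings]

print("stoplist:", stoplist)
for doc in filtered:
    print("remaining:", doc)
```

With the prologue and epilogue boilerplate gone, what remains is the part of each function that distinguishes it from the others. Whether raw frequency is the right criterion for disassembly is exactly the open question.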

Suggestions/pointers?

January 1, 2019

Sherlock – 94 Social Networks

Filed under: Cybersecurity,Hacking — Patrick Durusau @ 8:37 pm

Sherlock

Sherlock self-describes as: “Find usernames across social networks”

What caught my eye was a tweet saying Sherlock searches across 94 social networks.

Are users likely to use the same password across multiple social media sites? That alone could make Sherlock quite useful.

Do password repeaters use the same password in more secure settings?
