Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

December 13, 2017

Deep Learning: Practice and Trends [NIPS 2017]

Filed under: Deep Learning — Patrick Durusau @ 9:03 pm

Deep Learning: Practice and Trends by Scott Reed, Nando de Freitas, Oriol Vinyals.

NIPS 2017 Tutorial, Long Beach, CA.

The image is easier to read as the first slide of the deck, but the dark blue line represents registrations over time for the NIPS 2017 conference.

The hyperlinks for the authors are to their Twitter accounts. Need I say more?

Trivia question (before you review the slides): Can you name two early computer scientists who rejected the use of logic as the key to intelligence?

No prize, just curious if you know without the slides.

Game Theory (Open Access textbook with 165 solved exercises)

Filed under: Game Theory — Patrick Durusau @ 5:33 pm

Game Theory (Open Access textbook with 165 solved exercises) by Giacomo Bonanno.

Not the video Bonanno references in Chapter 1 but close enough.

Game theory provides you with the tools necessary to analyze this game as well as more complex ones.

Enjoy!

A Guide To Kernel Exploitation: Attacking the Core (source files)

Filed under: Cybersecurity,Security — Patrick Durusau @ 4:07 pm

If you know or are interested in A Guide To Kernel Exploitation: Attacking the Core by Enrico Perla and Massimiliano Oldani, the source files are now available at: https://github.com/yrp604/atc-sources.

The website that accompanied the book is now reported to be defunct. Thanks to yrp604 for preserving these files.

Enjoy!

Making an Onion List and Checking It Twice (or more)

Filed under: Privacy,Tor — Patrick Durusau @ 3:51 pm

Bash script to check if .onions and other urls are alive or not

From the post:

The basic idea of this bash script is to feed it a list of .onion urls and use torsocks and wget to check if the url is active or not; surely there are many other alternatives, but it is always nice to have another option.

Useful script and daily reminder:

Privacy is a privilege you work for; it doesn’t happen by accident.
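The original is a bash script built on torsocks and wget. Here is a minimal Python sketch of the same idea, assuming a local Tor SOCKS proxy on 127.0.0.1:9050 and the requests library installed with SOCKS support (requests[socks]):

```python
import sys
import requests

# Route HTTP(S) through the local Tor SOCKS proxy; 'socks5h' makes Tor
# resolve .onion hostnames instead of your local DNS resolver.
TOR_PROXY = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def is_alive(url, timeout=30):
    """Return True if the URL answers with any non-5xx HTTP status."""
    try:
        r = requests.get(url, proxies=TOR_PROXY, timeout=timeout)
        return r.status_code < 500
    except requests.RequestException:
        return False

if __name__ == "__main__":
    # Usage: python onion_check.py urls.txt   (one URL per line)
    with open(sys.argv[1]) as f:
        for url in (line.strip() for line in f if line.strip()):
            print(("ALIVE " if is_alive(url) else "DEAD  ") + url)
```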

December 12, 2017

SIGINT for Anyone

Filed under: Intelligence,Signal/Collect — Patrick Durusau @ 9:11 pm

SIGINT for Anyone – The Growing Availability of Signals Intelligence in the Public Domain by Cortney Weinbaum, Steven Berner, Bruce McClintock.

From the webpage:

This Perspective examines and challenges the assumption that signals intelligence (SIGINT) is an inherently governmental function by revealing nongovernmental approaches and technologies that allow private citizens to conduct SIGINT activities. RAND researchers relied on publicly available information to identify SIGINT capabilities in the open market and to describe the intelligence value each capability provides to users. They explore the implications each capability might provide to the United States and allied governments.

The team explored four technology areas where nongovernmental SIGINT is flourishing: maritime domain awareness; radio frequency (RF) spectrum mapping; eavesdropping, jamming, and hijacking of satellite systems; and cyber surveillance. They then identified areas where further research and debate are needed to create legal, regulatory, policy, process, and human capital solutions to the challenges these new capabilities provide to government.

This was an exploratory effort, rather than a comprehensive research endeavor. The team relied on unclassified and publicly available materials to find examples of capabilities that challenge the government-only paradigm. They identified ways these capabilities and trends may affect the U.S. government in terms of emerging threats, policy implications, technology repercussions, human capital considerations, and financial effects. Finally, they identified areas for future study for U.S. and allied government leaders to respond to these changes.

More enticing than practical as a guide to SIGINT, this report should encourage NGOs to consider SIGINT.

I say “consider” SIGINT because small organizations can’t measure intelligence success by the quantity of under-used/unexplored data on hand. Some large governments do, cf. 9/11.

Where SIGINT offers a useful addition to other intelligence sources, it should be among the data feeds into an intelligence topic map.

IJCAI – Proceedings 1969-2016 Treasure Trove of AI Papers

Filed under: Artificial Intelligence,Machine Learning — Patrick Durusau @ 8:44 pm

IJCAI – Proceedings 1969-2016

From the about page:

International Joint Conferences on Artificial Intelligence is a non-profit corporation founded in California, in 1969 for scientific and educational purposes, including dissemination of information on Artificial Intelligence at conferences in which cutting-edge scientific results are presented and through dissemination of materials presented at these meetings in form of Proceedings, books, video recordings, and other educational materials. IJCAI conferences present premier international gatherings of AI researchers and practitioners. IJCAI conferences were held biennially in odd-numbered years since 1969. They are sponsored jointly by International Joint Conferences on Artificial Intelligence Organization (IJCAI), and the national AI societie(s) of the host nation(s).

While looking for a paper on automatic concept formulation for Jack Park, I found this archive of prior International Joint Conferences on Artificial Intelligence proceedings.

The latest proceedings, 2016, runs six volumes and approximately 4276 pages.

Enjoy!

A Little Story About the `yes` Unix Command

Filed under: Linux OS,Programming — Patrick Durusau @ 8:02 pm

A Little Story About the `yes` Unix Command by Matthias Endler.

From the post:

What’s the simplest Unix command you know?

There’s echo, which prints a string to stdout, and true, which always terminates with an exit code of 0.

Among the rows of simple Unix commands, there’s also yes. If you run it without arguments, you get an infinite stream of y’s, separated by a newline:

Ever installed a program, which required you to type “y” and hit enter to keep going? yes to the rescue!

Endler sets out to re-implement the yes command in Rust.

Why re-implement Unix tools?

The trivial program yes turns out not to be so trivial after all. It uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun and makes me appreciate the nifty tricks, which make our computers fast.

Endler’s story is unlikely to replace any of your holiday favorites but unlike those, it has the potential to make you a better programmer.
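Endler’s punchline is the buffering trick: instead of writing “y\n” once per loop iteration, fill a large buffer and hand the kernel thousands of lines per write. A rough Python sketch of the same idea (an illustration, not a replacement for yes):

```python
import sys

def yes(expletive="y", bufsize=64 * 1024):
    """Write an endless stream of `expletive` lines, one syscall per buffer."""
    line = (expletive + "\n").encode()
    # Pre-fill a buffer with as many whole lines as fit in ~64 KiB,
    # so each write() pushes thousands of lines at once.
    buf = line * (bufsize // len(line) or 1)
    out = sys.stdout.buffer
    try:
        while True:
            out.write(buf)
    except BrokenPipeError:
        pass  # reader (e.g. `head`) went away; exit quietly

if __name__ == "__main__":
    yes(sys.argv[1] if len(sys.argv) > 1 else "y")
```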

Connecting R to Keras and TensorFlow

Filed under: Deep Learning,R,TensorFlow — Patrick Durusau @ 7:42 pm

Connecting R to Keras and TensorFlow by Joseph Rickert.

From the post:

It has always been the mission of R developers to connect R to the “good stuff”. As John Chambers puts it in his book Extending R:

One of the attractions of R has always been the ability to compute an interesting result quickly. A key motivation for the original S remains as important now: to give easy access to the best computations for understanding data.

From the day it was announced a little over two years ago, it was clear that Google’s TensorFlow platform for Deep Learning is good stuff. This September (see announcement), J.J. Allaire, François Chollet, and the other authors of the keras package delivered on R’s “easy access to the best” mission in a big way. Data scientists can now build very sophisticated Deep Learning models from an R session while maintaining the flow that R users expect. The strategy that made this happen seems to have been straightforward. But, the smooth experience of using the Keras API indicates inspired programming all the way along the chain from TensorFlow to R.

The Redditor deepfakes, of AI-Assisted Fake Porn fame, mentions Keras as one of his tools. Is that an endorsement?

Rickert’s post is a quick start to Keras and TensorFlow, but he does mention:

the MEAP from the forthcoming Manning Book, Deep Learning with R by François Chollet, the creator of Keras, and J.J. Allaire.

I’ve had good luck with Manning books in general so am looking forward to this one as well.
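I don’t have an R example handy, but for a sense of what the R package wraps, here is a minimal model written against the underlying Keras Python API (the R interface mirrors these calls almost one-to-one). The data and shapes are placeholders of my own, not anything from Rickert’s post:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Toy data: 1000 samples with 20 features and a binary label (placeholder only).
x_train = np.random.random((1000, 20))
y_train = np.random.randint(2, size=(1000, 1))

# Two-layer feed-forward network for binary classification.
model = Sequential([
    Dense(32, activation="relu", input_shape=(20,)),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)
```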

AI-Assisted Fake Porn Is Here… [Endless Possibilities]

Filed under: Artificial Intelligence,Government,Politics,Porn — Patrick Durusau @ 5:06 pm

AI-Assisted Fake Porn Is Here and We’re All Fucked by Samantha Cole.

From the post:

Someone used an algorithm to paste the face of ‘Wonder Woman’ star Gal Gadot onto a porn video, and the implications are terrifying.

There’s a video of Gal Gadot having sex with her stepbrother on the internet. But it’s not really Gadot’s body, and it’s barely her own face. It’s an approximation, face-swapped to look like she’s performing in an existing incest-themed porn video.

The video was created with a machine learning algorithm, using easily accessible materials and open-source code that anyone with a working knowledge of deep learning algorithms could put together.

It’s not going to fool anyone who looks closely. Sometimes the face doesn’t track correctly and there’s an uncanny valley effect at play, but at a glance it seems believable. It’s especially striking considering that it’s allegedly the work of one person—a Redditor who goes by the name ‘deepfakes’—not a big special effects studio that can digitally recreate a young Princess Leia in Rogue One using CGI. Instead, deepfakes uses open-source machine learning tools like TensorFlow, which Google makes freely available to researchers, graduate students, and anyone with an interest in machine learning.
… (emphasis in original)

Posts and tweets lamenting “fake porn” abound but where others see terrifying implications, I see boundless potential.

Spoiler: The nay-sayers are on the wrong side of history – The Erotic Engine: How Pornography has Powered Mass Communication, from Gutenberg to Google by Patchen Barss.

or,


“The industry has convincingly demonstrated that consumers are willing to shop online and are willing to use credit cards to make purchases,” said Frederick Lane in “Obscene Profits: The Entrepreneurs of Pornography in the Cyber Age.” “In the process, the porn industry has served as a model for a variety of online sales mechanisms, including monthly site fees, the provision of extensive free material as a lure to site visitors, and the concept of upselling (selling related services to people once they have joined a site). In myriad ways, large and small, the porn industry has blazed a commercial path that other industries are hastening to follow.”
… (PORN: The Hidden Engine That Drives Innovation In Tech)

Enough time remains before the 2018 mid-terms for you to learn the technology used by ‘deepfakes’ to produce campaign imagery.

Paul Ryan, current Speaker of the House, isn’t going to (voluntarily) participate in a video where he steals food from children or steps on their hands as they grab for bread crusts in the street.

The same techniques that produce fake porn could be used to produce viral videos of those very scenes and more.

Some people, well-intentioned no doubt, will protest that isn’t informing the electorate and debating the issues. For them I have only one question: Why do you like losing so much?

I would wager one good viral video against 100,000 pages of position papers, unread by anyone other than the tiresome drones who produce them.

If you insist on total authenticity, then take film clips of Ryan explaining why medical care can’t be provided for children and run them split-screen with close-up death rattles of dying children. 100% truthful. See how that plays in your local TV market.

Follow ‘deepfakes’ on Reddit and start experimenting today!

December 11, 2017

Mathwashing:…

Filed under: Algorithms,Bias,Mathematics — Patrick Durusau @ 8:32 pm

Mathwashing: How Algorithms Can Hide Gender and Racial Biases by Kimberley Mok.

From the post:

Scholars have long pointed out that the way languages are structured and used can say a lot about the worldview of their speakers: what they believe, what they hold sacred, and what their biases are. We know humans have their biases, but in contrast, many of us might have the impression that machines are somehow inherently objective. But does that assumption apply to a new generation of intelligent, algorithmically driven machines that are learning our languages and training from human-generated datasets? By virtue of being designed by humans, and by learning natural human languages, might these artificially intelligent machines also pick up on some of those same human biases too?

It seems that machines can and do indeed assimilate human prejudices, whether they are based on race, gender, age or aesthetics. Experts are now finding more evidence that supports this phenomenon of algorithmic bias. As sets of instructions that help machines to learn, reason, recognize patterns and perform tasks on their own, algorithms increasingly pervade our lives. And in a world where algorithms already underlie many of those big decisions that can change lives forever, researchers are finding that many of these algorithms aren’t as objective as we assume them to be.

If you have ever suffered from the delusion that any algorithm is “objective,” this post is a must read. Or re-read it to remind yourself that “objectivity” is a claim used to put your position beyond question, for self-interest. Nothing more.

For my part, I’m not sure what’s unclear about data collection, choice of algorithm, and interpretation of results all being products of bias.

There may be acceptable biases, or degrees of bias, but the goal of any measurement is a result, which automatically biases a measurer in favor of phenomena that can be measured by a convenient technique. Phenomena that cannot be easily measured, no matter how important, won’t be included.

By the same token, “bias correction” is the introduction of an acceptable bias, and/or the limiting of bias to what the person judging it considers an acceptable level.

Bias is omnipresent and while evaluating algorithms is important, always bear in mind you are choosing acceptable bias over unacceptable bias.

Or to mis-quote the Princess Bride: “Bias is everywhere. Anyone who says differently is selling something.”

December 10, 2017

“Smart” Cock Ring Medical Hazard

Filed under: Humor,IoT - Internet of Things — Patrick Durusau @ 8:42 pm

World’s first ‘smart condom’ collects intimate data during sex and tells men whether their performance is red-hot or a total flop.

From the post:


The smart condom is a small band which fits around the bottom of a man’s willy, which means wearers will still need to strap on a normal condom to get full protection.

It is waterproof and features a band that’s ‘extraordinarily flexible to ensure maximum comfort for all sizes’.

Bizarrely, it even lights up to provide illumination for both partners’ nether regions.

Or better, a picture:

With a hand so you can judge its size:

It’s either the world’s shortest condom or it’s a cock ring. Calling it a condom doesn’t make it one.

The distinction between a condom vs. cock ring is non-trivial. Improperly used, a cock ring can lead to serious injury.

Refer any friends you are asking for to: Post coital penile ring entrapment: A report of a non-surgical extrication method.

Catalin Cimpanu @campuscodi tweeted this as: “Security disaster waiting to happen…” but competing against others poses a health risk as well.

Incomplete Reporting – How to Verify A Dark Web Discovery?

Filed under: Cybersecurity,Dark Web,Security — Patrick Durusau @ 4:50 pm

1.4 Billion Clear Text Credentials Discovered in a Single Database by Julio Casal.

From the post:

Now even unsophisticated and newbie hackers can access the largest trove ever of sensitive credentials in an underground community forum. Is the cyber crime epidemic about to become exponentially worse?

While scanning the deep and dark web for stolen, leaked or lost data, 4iQ discovered a single file with a database of 1.4 billion clear text credentials — the largest aggregate database found in the dark web to date.

None of the passwords are encrypted, and what’s scary is that we’ve tested a subset of these passwords and most of them have been verified to be true.

The breach is almost two times larger than the previous largest credential exposure, the Exploit.in combo list that exposed 797 million records. This dump aggregates 252 previous breaches, including known credential lists such as Anti Public and Exploit.in, decrypted passwords of known breaches like LinkedIn as well as smaller breaches like Bitcoin and Pastebin sites.

This is not just a list. It is an aggregated, interactive database that allows for fast (one second response) searches and new breach imports. Given the fact that people reuse passwords across their email, social media, e-commerce, banking and work accounts, hackers can automate account hijacking or account takeover.

This database makes finding passwords faster and easier than ever before. As an example searching for “admin,” “administrator” and “root” returned 226,631 passwords of admin users in a few seconds.

The data is organized alphabetically, offering examples of trends in how people set passwords, reuse them and create repetitive patterns over time. The breach offers concrete insights into password trends, cementing the need for recommendations, such as the NIST Cybersecurity Framework.
… (emphasis in original)

The full post goes onto discuss sources of the data, details of the dump file, freshness and password reuse. See Casal’s post for those details.

But no links were provided to the:

“…largest trove ever of sensitive credentials in an underground community forum.”

How would you go about verifying such a discovery?

The post offers the following hints:

  1. “…single file … 1.4 billion clear text credentials”
  2. dump contains file “imported.log”
  3. list shown from “imported.log” has 55 unique file names

With #1, clear text credentials, I should be able to search for #2, “imported.log,” plus one of the fifty-five (55) unique file names from #3 to come up with a fairly narrow set of search results. Not perfect, but not a lot of manual browsing either.

All onion search engines have .onion addresses.

Ahmia: Never got to try one of the file names; “imported.log” returns 0 results.

Caronte: I entered “imported.log,” but Caronte searches for “imported log.” Sigh, I really tire of corrective search interfaces. You? No useful results.

Haystack: 0 results for “imported.log.”

Not Evil: 3973 “hits” for “imported.log.” With search refinement, still no joy.

Bottom line: No verification of the reported credentials discovery.

Possible explanations:

  • Files have been moved or renamed
  • Forum is password protected
  • Used the wrong Dark Web search engines

Verification is all the rage in mainstream media.

How do you verify reports of content on the Dark Web? Or do you?
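If you want to script the drudgery instead of pasting queries by hand, here is a rough Python sketch. The search endpoint pattern is my assumption (Ahmia’s clearnet form is the only one I would count on), the hit test is nothing more than “did the term appear in the response,” and Tor-only engines would need the socks5h proxy settings from the onion-checker sketch above:

```python
import requests

# Assumed search endpoints (verify before relying on them).
ENGINES = {
    "ahmia": "https://ahmia.fi/search/?q={query}",
}

# File names pulled from the dump's "imported.log" listing; extend as needed.
CANDIDATE_FILES = ["imported.log"]

def rough_hits(engine_url, term, timeout=30):
    """Fetch a results page for `term` and report whether it appears in the HTML."""
    url = engine_url.format(query=requests.utils.quote(term))
    r = requests.get(url, timeout=timeout)
    return term in r.text

for name, url in ENGINES.items():
    for filename in CANDIDATE_FILES:
        try:
            found = rough_hits(url, filename)
        except requests.RequestException:
            found = False
        print(f"{name}: {filename} -> {'possible hits' if found else 'nothing obvious'}")
```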

Releasing Failed Code to Distract from Accountability

Filed under: Government,Open Source,Programming,Project Management — Patrick Durusau @ 11:16 am

Dutch government publishes large project as Free Software by Carmen Bianca Bakker.

From the post:

The Dutch Ministry of the Interior and Kingdom Relations released the source code and documentation of Basisregistratie Personen (BRP), a 100M€ IT system that registers information about inhabitants within the Netherlands. This comes as a great success for Public Code, and the FSFE applauds the Dutch government’s shift to Free Software.

Operation BRP is an IT project by the Dutch government that has been in the works since 2004. It has cost Dutch taxpayers upwards of 100 million Euros and has endured three failed attempts at revival, without anything to show for it. From the outside, it was unclear what exactly was costing taxpayers so much money with very little information to go on. After the plug had been pulled from the project earlier this year in July, the former interior minister agreed to publish the source code under pressure of Parliament, to offer transparency about the failed project. Secretary of state Knops has now gone beyond that promise and released the source code as Free Software (a.k.a. Open Source Software) to the public.

In 2013, when the first smoke signals showed, the former interior minister initially wanted to address concerns about the project by providing limited parts of the source code to a limited amount of people under certain restrictive conditions. The ministry has since made a complete about-face, releasing a snapshot of the (allegedly) full source code and documentation under the terms of the GNU Affero General Public License, with the development history soon to follow.

As far as the “…complete about-face…” goes, the American expression is: “You’ve been had.”

By appearing to agonize over the release of the source code, the “former interior minister” has made it appear the public has won a great victory for transparency.

Actually not.

Does the “transparency” offered by the source code show who authorized the expenditure of each part of the 100M€ total and who was paid that 100M€? Does source code “transparency” disclose project management decisions and which government officials approved those decisions? For that matter, does source code “transparency” disclose discussions of project choices at all, and who was present at those discussions?

It’s not hard to see that source code “transparency” is a deliberate failure on the part of the Dutch Ministry of the Interior and Kingdom Relations to be transparent. It has withheld, quite deliberately, any information that would enable Dutch citizens, programmers or otherwise, to have informed opinions about the failure of this project. Or to hold anyone accountable for its failure.

This may be:

…an unprecedented move of transparency by the Dutch government….

but only if the Dutch government is a black hole in terms of meaningful accountability for its software projects.

Which appears to be the case.

PS: Assuming Dutch citizens can pry project documentation out of the secretive Dutch Ministry of the Interior and Kingdom Relations, I know some Dutch topic mappers could assist with establishing transparency. If that’s what you want.

December 9, 2017

Introducing Data360R — data to the power of R [On Having an Agenda]

Filed under: Open Data,R — Patrick Durusau @ 9:06 pm

Introducing Data360R — data to the power of R

From the post:

Last January 2017, the World Bank launched TCdata360 (tcdata360.worldbank.org/), a new open data platform that features more than 2,000 trade and competitiveness indicators from 40+ data sources inside and outside the World Bank Group. Users of the website can compare countries, download raw data, create and share data visualizations on social media, get country snapshots and thematic reports, read data stories, connect through an application programming interface (API), and more.

The response to the site has been overwhelmingly enthusiastic, and this growing user base continually inspires us to develop better tools to increase data accessibility and usability. After all, open data isn’t useful unless it’s accessed and used for actionable insights.

One such tool we recently developed is data360r, an R package that allows users to interact with the TCdata360 API and query TCdata360 data, metadata, and more using easy, single-line functions.

So long as you remember the World Bank has an agenda and all the data it releases serves that agenda, you should suffer no permanent harm.

Don’t take that as meaning other sources of data have less of an agenda, although you may find their agendas differ from that of the World Bank.

The recent “discovery” that machine learning algorithms can conceal social or racial bias was long overdue.

Anyone who took survey work in social science methodology in the last half of the 20th century would report that data collection itself, much less its processing, is fraught with unavoidable bias.

It is certainly possible, in the physical sense, to give students standardized tests, but what test results mean for any given question, such as teacher competence, is far from clear.

Or to put it differently, just because something can be measured is no guarantee the measurement is meaningful. The same applies to the data that results from any measurement process.

Take advantage of data360r certainly, but keep a wary eye on data from any source.

Clojure 1.9 Hits the Streets!

Filed under: Clojure,Functional Programming,Merging,Topic Maps — Patrick Durusau @ 4:31 pm

Clojure 1.9 by Alex Miller.

From the post:

Clojure 1.9 is now available!

Clojure 1.9 introduces two major new features: integration with spec and command line tools.

spec (rationale, guide) is a library for describing the structure of data and functions with support for:

  • Validation
  • Error reporting
  • Destructuring
  • Instrumentation
  • Test-data generation
  • Generative test generation
  • Documentation

Clojure integrates spec via two new libraries (still in alpha):

This modularization facilitates refinement of spec separate from the Clojure release cycle.

The command line tools (getting started, guide, reference) provide:

  • Quick and easy install
  • Clojure REPL and runner
  • Use of Maven and local dependencies
  • A functional API for classpath management (tools.deps.alpha)

The installer is available for Mac developers in brew, for Linux users in a script, and for more platforms in the future.

Being interested in documentation, I followed the link to spec rationale and found:


Map specs should be of keysets only

Most systems for specifying structures conflate the specification of the key set (e.g. of keys in a map, fields in an object) with the specification of the values designated by those keys. I.e. in such approaches the schema for a map might say :a-key’s type is x-type and :b-key’s type is y-type. This is a major source of rigidity and redundancy.

In Clojure we gain power by dynamically composing, merging and building up maps. We routinely deal with optional and partial data, data produced by unreliable external sources, dynamic queries etc. These maps represent various sets, subsets, intersections and unions of the same keys, and in general ought to have the same semantic for the same key wherever it is used. Defining specifications of every subset/union/intersection, and then redundantly stating the semantic of each key is both an antipattern and unworkable in the most dynamic cases.

Decomplect maps/keys/values

Keep map (keyset) specs separate from attribute (key→value) specs. Encourage and support attribute-granularity specs of namespaced keyword to value-spec. Combining keys into sets (to specify maps) becomes orthogonal, and checking becomes possible in the fully-dynamic case, i.e. even when no map spec is present, attributes (key-values) can be checked.

Sets (maps) are about membership, that’s it

As per above, maps defining the details of the values at their keys is a fundamental complecting of concerns that will not be supported. Map specs detail required/optional keys (i.e. set membership things) and keyword/attr/value semantics are independent. Map checking is two-phase, required key presence then key/value conformance. The latter can be done even when the (namespace-qualified) keys present at runtime are not in the map spec. This is vital for composition and dynamicity.

The idea of checking keys separately from their values strikes me as valuable for processing topic maps.

Keys not allowed in a topic or proxy could signal an error (as in authoring), could be silently discarded depending upon your processing goals, or could be retained while not being considered or processed for merging purposes.
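Here is a minimal Python sketch of that decomplected checking applied to a topic/proxy held as a plain dict. The key names and validators are invented for illustration, not drawn from any topic map standard:

```python
# Phase 1: membership, i.e. which keys must / may appear in a topic proxy.
REQUIRED_KEYS = {"id", "subject-identifier"}
OPTIONAL_KEYS = {"name", "occurrence"}

# Phase 2: per-key value conformance, defined independently of any keyset.
VALUE_SPECS = {
    "id": lambda v: isinstance(v, str) and v != "",
    "subject-identifier": lambda v: isinstance(v, str) and v.startswith("http"),
    "name": lambda v: isinstance(v, str),
    "occurrence": lambda v: isinstance(v, dict),
}

def check_proxy(proxy):
    """Return (errors, unknown_keys); unknown keys are reported, not rejected."""
    errors = [f"missing required key: {k}" for k in REQUIRED_KEYS - set(proxy)]
    unknown = set(proxy) - REQUIRED_KEYS - OPTIONAL_KEYS
    # Value checks run for every key we have a spec for, known or not,
    # mirroring spec's "check attributes even without a map spec" stance.
    for key, value in proxy.items():
        spec = VALUE_SPECS.get(key)
        if spec and not spec(value):
            errors.append(f"bad value for {key!r}: {value!r}")
    return errors, unknown

print(check_proxy({"id": "t1", "subject-identifier": "http://example.org/x",
                   "color": "blue"}))
```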

Thoughts?

Apache Kafka: Online Talk Series [Non-registration for 5 out of 6]

Filed under: Cybersecurity,ETL,Government,Kafka,Streams — Patrick Durusau @ 2:35 pm

Apache Kafka: Online Talk Series

From the webpage:

Watch this six-part series of online talks presented by Kafka experts. You will learn the key considerations in building a scalable platform for real-time stream data processing, with Apache Kafka at its core.

This series is targeted to those who want to understand all the foundational concepts behind Apache Kafka, streaming data, and real-time processing on streams. The sequence begins with an introduction to Kafka, the popular streaming engine used by many large scale data environments, and continues all the way through to key production planning, architectural and operational methods to consider.

Whether you’re just getting started or have already built stream processing applications for critical business functions, you will find actionable tips and deep insights that will help your enterprise further derive important business value from your data systems.

Video titles:

1. Introduction To Streaming Data and Stream Processing with Apache Kafka, Jay Kreps, Confluent CEO and Co-founder, Apache Kafka Co-creator.

2. Deep Dive into Apache Kafka by Jun Rao, Confluent Co-founder, Apache Kafka Co-creator.

3. Data Integration with Apache Kafka by David Tucker, Director, Partner Engineering and Alliances.

4. Demystifying Stream Processing with Apache Kafka, Neha Narkhede, Confluent CTO and Co-Founder, Apache Kafka Co-creator.

5. A Practical Guide to Selecting a Stream Processing Technology by Michael Noll, Product Manager, Confluent.

6. Streaming in Practice: Putting Kafka in Production by Roger Hoover, Engineer, Confluent. (Registration required. Anyone know a non-registration version of Hoover’s presentation?)

I was able to find versions of the first five videos that don’t require you to register to view them.

I make it a practice to dodge marketing department registrations whenever possible.

You?

Zero Days, Thousands of Nights [Zero-day – 6.9 Year Average Life Expectancy]

Filed under: Cybersecurity,Government,Security,Transparency — Patrick Durusau @ 11:41 am

Zero Days, Thousands of Nights – The Life and Times of Zero-Day Vulnerabilities and Their Exploits by Lillian Ablon, Timothy Bogart.

From the post:

Zero-day vulnerabilities — software vulnerabilities for which no patch or fix has been publicly released — and their exploits are useful in cyber operations — whether by criminals, militaries, or governments — as well as in defensive and academic settings.

This report provides findings from real-world zero-day vulnerability and exploit data that could augment conventional proxy examples and expert opinion, complement current efforts to create a framework for deciding whether to disclose or retain a cache of zero-day vulnerabilities and exploits, inform ongoing policy debates regarding stockpiling and vulnerability disclosure, and add extra context for those examining the implications and resulting liability of attacks and data breaches for U.S. consumers, companies, insurers, and for the civil justice system broadly.

The authors provide insights about the zero-day vulnerability research and exploit development industry; give information on what proportion of zero-day vulnerabilities are alive (undisclosed), dead (known), or somewhere in between; and establish some baseline metrics regarding the average lifespan of zero-day vulnerabilities, the likelihood of another party discovering a vulnerability within a given time period, and the time and costs involved in developing an exploit for a zero-day vulnerability.

Longevity and Discovery by Others

  • Zero-day exploits and their underlying vulnerabilities have a rather long average life expectancy (6.9 years). Only 25 percent of vulnerabilities do not survive to 1.51 years, and only 25 percent live more than 9.5 years.
  • No vulnerability characteristics indicated a long or short life; however, future analyses may want to examine Linux versus other platform types, the similarity of open and closed source code, and exploit class type.
  • For a given stockpile of zero-day vulnerabilities, after a year, approximately 5.7 percent have been publicly discovered and disclosed by another entity.

Rand researchers Ablon and Bogart attempt to interject facts into the debate over stockpiling zero-day vulnerabilities. It’s a great read, even though I doubt policy decisions over zero-day stockpiling will be fact-driven.

As an advocate of inadvertent or involuntary transparency (is there any other honest kind?), I take heart from the 6.9 year average life expectancy of zero-day exploits.

Researchers should take encouragement from the finding that within a given year, only 5.7 percent of zero-day vulnerability discoveries overlap. That is, 94.3% of zero-day discoveries are unique. That indicates to me that vulnerabilities are left undiscovered every year.

Voluntary transparency, like presidential press conferences, is an opportunity to shape and manipulate your opinions. Zero-day vulnerabilities, on the other hand, can empower honest/involuntary transparency.

Won’t you help?

Shopping for the Intelligence Community (IC) [Needl]

Filed under: Government,Intelligence — Patrick Durusau @ 10:54 am

The holiday season in various traditions has arrived for 2017!

With it returns the vexing question: What to get for the Intelligence Community (IC)?

They have spent all year violating your privacy, undermining legitimate government institutions, supporting illegitimate governments, mocking any notion of human rights and siphoning government resources that could benefit the public for themselves and their contractors.

The excesses of your government’s intelligence agencies will be special to you but in truth, they are all equally loathsome and merit some acknowledgement at this special time of the year.

Needl is a gift for the intelligence community this holiday season and one that can keep on giving all year long.

Take back your privacy. Lose yourself in the haystack.

Your ISP is most likely tracking your browsing habits and selling them to marketing agencies (albeit anonymised). Or worse, making your browsing history available to law enforcement at the hint of a Subpoena. Needl will generate random Internet traffic in an attempt to conceal your legitimate traffic, essentially making your data the Needle in the haystack and thus harder to find. The goal is to make it harder for your ISP, government, etc to track your browsing history and habits.

…(graphic omitted)

Implemented modules:

  • Google: generates a random search string, searches Google and clicks on a random result.
  • Alexa: visits a website from the Alexa Top 1 Million list. (warning: contains a lot of porn websites)
  • Twitter: generates a popular English name and visits their profile; performs random keyword searches
  • DNS: produces random DNS queries from the Alexa Top 1 Million list.
  • Spotify: random searches for Spotify artists

Module ideas:

  • WhatsApp
  • Facebook Messenger

… (emphasis in original)

Not for people with metered access but otherwise, a must for home PCs and enterprise PC farms.

No doubt annoying, but running Needl through Tor with a list of trigger words/phrases (searches for explosives, viruses, CBW topics with locations, etc.) would create festive blinking red lights for the intelligence community.
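For a feel of what Needl does under the hood, here is a stripped-down Python sketch of the decoy-traffic idea. The word list and site list are placeholders of mine, and Needl itself does considerably more (module scheduling, result clicking, DNS noise):

```python
import random
import time
import requests

# Placeholder noise sources; swap in a real dictionary file and site list.
WORDS = ["winter", "recipe", "antenna", "harbor", "violin", "sandstone"]
SITES = ["https://example.org", "https://example.com"]

HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

def random_search():
    """Issue a throwaway search with two random dictionary words."""
    query = "+".join(random.sample(WORDS, 2))
    requests.get(f"https://www.google.com/search?q={query}",
                 headers=HEADERS, timeout=15)

def random_visit():
    """Fetch a random site from the list to muddy the browsing history."""
    requests.get(random.choice(SITES), headers=HEADERS, timeout=15)

if __name__ == "__main__":
    while True:
        random.choice([random_search, random_visit])()
        time.sleep(random.uniform(30, 300))  # irregular pacing looks more human
```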

Lisp at the Frontier of Computation

Filed under: Computer Science,Lisp,Quantum — Patrick Durusau @ 10:18 am

Abstract:

Since the 1950s, Lisp has been used to describe and calculate in cutting-edge fields like artificial intelligence, robotics, symbolic mathematics, and advanced optimizing compilers. It is no surprise that Lisp has also found relevance in quantum computation, both in academia and industry. Hosted at Rigetti Computing, a quantum computing startup in Berkeley, Robert Smith will provide a pragmatic view of the technical, sociological, and psychological aspects of working with an interdisciplinary team, writing Lisp, to build the next generation of technology resource: the quantum computer.

ABOUT THE SPEAKER: Robert has been using Lisp for over a decade, and has been fortunate to work with and manage expert teams of Lisp programmers to build embedded fingerprint analysis systems, machine learning-based product recommendation software, metamaterial phased-array antennas, discrete differential geometric computer graphics software, and now quantum computers. As Director of Software Engineering, Robert is responsible for building the publicly available Rigetti Forest platform, powered by both a real quantum computer and one of the fastest single-node quantum computer simulators in the world.

Video notes mention “poor audio quality.” Not the best but clear and audible to me.

The coverage of the quantum computer work is great, but the talk is mostly a general promotion of Lisp.

Important links:

Forest (beta): Forest provides development access to our 30-qubit simulator, the Quantum Virtual Machine™, and limited access to our quantum hardware systems for select partners. Workshop video plus numerous other resources.

A Practical Quantum Instruction Set Architecture by Robert S. Smith, Michael J. Curtis, William J. Zeng. (speaker plus two of his colleagues)
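If you want to poke at Forest yourself, here is a Bell-state sketch against the pyQuil API as I understand it stood at the time (QVMConnection and a configured Forest API key are my assumptions; check the current Forest docs before copying):

```python
from pyquil.quil import Program
from pyquil.gates import H, CNOT
from pyquil.api import QVMConnection

# Build a two-qubit Bell state: Hadamard on qubit 0, then CNOT 0 -> 1.
p = Program(H(0), CNOT(0, 1))

qvm = QVMConnection()        # talks to Rigetti's hosted Quantum Virtual Machine
print(qvm.wavefunction(p))   # amplitudes ~ 0.707|00> + 0.707|11>
```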

December 8, 2017

Google About to Publicly Drop iPhone Exploit (More Holiday News!)

Filed under: Cybersecurity,FBI,Security — Patrick Durusau @ 5:41 pm

The Jailbreaking Community Is Bracing for Google to Publicly Drop an iPhone Exploit by Lorenzo Franceschi-Bicchierai.

From the post:


Because exploits are so valuable, it’s been a long time since we’ve seen a publicly accessible iPhone jailbreak even for older versions of iOS (let alone one in the wild for an up to date iPhone.) But a tweet sent by a Google researcher Wednesday has got the security and jailbreaking communities in a frenzy. The tweet suggests that Google is about to drop an exploit that is a major step toward an iPhone jailbreak, and other researchers say they will be able to take that exploit and turn it into a full jailbreak.

It might seem surprising that an iPhone exploit would be released by Google, Apple’s closest competitor, but the company has a history of doing so, albeit with less hype than this one is garnering.

Ian Beer is a Google Project Zero security researcher, and one of the most prolific iOS bug hunters. Wednesday, he told his followers to keep their “research-only” devices on iOS 11.1.2 because he was about to release “tfp0” soon. (tfp0 stands for “task for pid 0,” or the kernel task port, which gives you control of the core of the operating system.) He also hinted that this is just the first part of more releases to come. iOS 11.1.2 was just patched and updated last week by Apple; it is extremely rare for exploits for recent versions of iOS to be made public.

Another surprise in the offing for the holiday season! See Franceschi-Bicchierai’s post for much speculation and possibilities.

Benefits from a current iPhone Exploit

  • Security researchers obtain better access to research iPhone security issues
  • FBI told by courts to hire local hackers instead of badgering Apple
  • Who carries iPhones? (security clueless public officials)

From improving the lot of security researchers, to local employment for hackers, to greater exposure of public officials, what’s not to like?

Looking forward to the drop and security researchers jumping on it like a terrier pack on a rat.

Haystack: The Search Relevance Conference! (Proposals by Jan. 19, 2018) Updated

Filed under: Conferences,Relevance,Search Algorithms,Search Analytics,Searching — Patrick Durusau @ 5:16 pm

Haystack: The Search Relevance Conference!

From the webpage:

Haystack is the conference for improving search relevance. If you’re like us, you work to understand the shiny new tools or dense academic papers out there that promise the moon. Then you puzzle how to apply those insights to your search problem, in your search stack. But the path isn’t always easy, and the promised gains don’t always materialize.

Haystack is the no-holds-barred conference for organizations where search, matching, and relevance really matters to the bottom line. For search managers, developers & data scientists finding ways to innovate, see past the silver bullets, and share what actually has worked well for their unique problems. Please come share and learn!

… (inline form for submission proposals)

Welcome topics include

  • Information Retrieval
  • Learning to Rank
  • Query Understanding
  • Semantic Search
  • Applying NLP to search
  • Personalized Search
  • Search UX Strategy: Perceived relevance, smart snippeting
  • Measuring and testing search against business objectives
  • Nuts & bolts: plugins for Solr, Elasticsearch, Vespa, etc
  • Adjacent topics: recommendation systems, entity matching via search, and other topics

… (emphasis in original)

The first link for the conference I saw was http://mailchi.mp/e609fba68dc6/announcing-haystack-the-search-relevance-conference, which promised topics including:

  • Intent detection

The modest price of $75 covers our costs….

To see a solution to the problem of other minds and to discover their intent, all for $75, is quite a bargain. Especially since the $75 covers breakfast and lunch both days, plus dinner the first day in a beer hall. 😉

Even without solving philosophical problems, sponsorship by OpenSource Connections is enough to recommend this conference without reservation.

My expectation is this conference is going to rock for hard core search geeks!

PS: Ask if videos will be posted. Thanks!

Follow Manuel Uberti’s Excellent Adventure – Learning Haskell

Filed under: Functional Programming,Haskell — Patrick Durusau @ 4:38 pm

Learning Haskell

From Manuel Uberti’s post:

Since my first baby steps in the world of Functional Programming, Haskell has been there. Like the enchanting music of a Siren, it has been luring me with promises of a new set of skills and a better understanding of the lambda calculus.

I refused to oblige at first. A bit of Scheme and my eventual move to Clojure occupied my mind and my daily activities. Truth be told, the odious warfare between dynamic types troopers and static types zealots didn’t help steering my enthusiasm towards Haskell.

Still, my curiosity is stoic and hard to kill and the Haskell Siren was becoming too tempting to resist any further. The Pragmatic Programmer in me knew it was the right thing to do. My knowledge portfolio is always reaching out for something new.

My journey began with the much praised Programming in Haskell. I kept track of the exercises only to soon discover this wasn’t the right book for me. A bit too terse and schematic, I needed something that could ease me in in a different way. I needed more focus on the basics, the roots of the language.

As I usually do, I sought help online. I don’t know many Haskell developers, but I know there are crazy guys in the Emacs community. Steve Purcell was kind and patient enough to introduce me to Haskell Programming From First Principles.

This is a huge book (nearly 1300 pages), but it just took the authors’ prefaces to hook me. Julie Moronuki’s words in particular resonated heavily with me. Unlike Julie I have experience in programming, but I felt exactly like her when it comes to approaching Haskell teaching materials.

So here I am, armed with Stack and Intero and ready to abandon myself to the depths and wonders of static typing and pure functional programming. I will track my progress and maybe report back here. I already have a project in mind, but my Haskell needs to get really good before starting any serious work.

May the lambda be with me.

Uberti’s post was short enough to quote in full and offers something to offset the grimness that our experience of 2017 promises for 2018.

We will all take to Twitter, Facebook, etc. in 2018 to vent our opinions, but at the end of the year, finger exercise is all we will have to show for it.

Following Uberti’s plan, with Haskell, Clojure, category theory, ARM exploitation, or whatever best fits your interest, will see 2018 end with you possessing an expanded skill set.

Your call, finger exercise or an expanded skill set (skills you can use for your cause).

Journocode Data Journalism Dictionary

Filed under: Journalism,News,Reporting — Patrick Durusau @ 1:47 pm

Journocode Data Journalism Dictionary

From the webpage:

Navigating the field of data journalism, a field that borrows methods and terms from so many disciplines, can be hard – especially in the beginning. You need to speak the language in order to collaborate with others and knowing which words to type into a search engine is the first step to learning new things.

That’s why we started the Journocode Data Journalism Dictionary. It aims to explain technical terms from fields like programming, web development, statistics and graphics design in a way that every journalist and beginner can understand them.

Fifty-one (51) definitions as of today, 8 December 2017, and none will be unfamiliar to data scientists.

But, a useful resource for data scientists to gauge the terms already known to data journalists and perhaps a place to contribute other terms with definitions.

Don’t miss their DDJ Tools resource page while you’re visiting.

Contra Censors: Tor Bridges and Pluggable Transports [Please Donate to Tor]

Filed under: Censorship,Tor — Patrick Durusau @ 1:08 pm

Tor at the Heart: Bridges and Pluggable Transports by ssteele.

From the post:


Censors block Tor in two ways: they can block connections to the IP addresses of known Tor relays, and they can analyze network traffic to find use of the Tor protocol. Bridges are secret Tor relays—they don’t appear in any public list, so the censor doesn’t know which addresses to block. Pluggable transports disguise the Tor protocol by making it look like something else—for example like HTTP or completely random.

Ssteele points out that censorship, even censorship of Tor, is getting worse, so the time to learn these tools is now. Don’t wait until Tor has gone dark for you to respond.
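For the impatient, the client side boils down to a few torrc lines. The bridge address, fingerprint and cert below are placeholders (get real ones from bridges.torproject.org), and the obfs4proxy path varies by distribution:

```
# /etc/tor/torrc -- use bridges with the obfs4 pluggable transport
UseBridges 1
ClientTransportPlugin obfs4 exec /usr/bin/obfs4proxy

# Placeholder bridge line; replace with one issued by bridges.torproject.org
Bridge obfs4 192.0.2.10:443 0123456789ABCDEF0123456789ABCDEF01234567 cert=EXAMPLECERTSTRING iat-mode=0
```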

December seems to be when all the begging bowls come out from a number of worthwhile projects.

I should be pitching my cause at this point but instead, please donate to support the Tor project.

Another Windows Critical Vulnerability (and I forgot to get MS anything)

Filed under: Cybersecurity,Microsoft,Security — Patrick Durusau @ 11:58 am

Microsoft Issues Emergency Windows Security Update For A Critical Vulnerability by Swati Khandelwal.

From the post:

If your computer is running Microsoft’s Windows operating system, then you need to apply this emergency patch immediately. By immediately, I mean now!

Microsoft has just released an emergency security patch to address a critical remote code execution (RCE) vulnerability in its Malware Protection Engine (MPE) that could allow an attacker to take full control of a victim’s PC.

Enabled by default, Microsoft Malware Protection Engine offers the core cybersecurity capabilities, like scanning, detection, and cleaning, for the company’s antivirus and antimalware programs in all of its products.

According to Microsoft, the vulnerability affects a large number of Microsoft security products, including Windows Defender and Microsoft Security Essentials along with Endpoint Protection, Forefront Endpoint Protection, and Exchange Server 2013 and 2016, impacting Windows 7, Windows 8.1, Windows 10, Windows RT 8.1, and Windows Server.

Tracked as CVE-2017-11937, the vulnerability is a memory corruption issue which is triggered when the Malware Protection Engine scans a specially crafted file to check for any potential threat.
… (emphasis in original)

I always feel bad when I read about newly discovered vulnerabilities in Microsoft Windows. Despite MS opening up computers around the world to the idly curious if not the malicious, I haven’t gotten them anything.

I’m sure Munich must be celebrating its plan to switch to Windows 10 for €50m. You wouldn’t think unintended governmental transparency would be that expensive. Munich could save everyone time and trouble by backing up all its files/data to an open S3 bucket on AWS. Thoughts?

Khandelwal also reports that Microsoft says this vulnerability isn’t being used in the wild. Bear in mind that claim comes from the originator of the vulnerability. If it couldn’t or didn’t recognize the vulnerability in its own code, what are the odds it will recognize exploitation by others? Your call.

See Khandelwal’s post for more details.

December 7, 2017

Malpedia

Filed under: Cybersecurity,Malware — Patrick Durusau @ 8:55 pm

Malpedia

From the webpage:

Malpedia is a free service offered by Fraunhofer FKIE.

The primary goal of Malpedia is to provide a resource for rapid identification and actionable context when investigating malware. Openness to curated contributions shall ensure an accountable level of quality in order to foster meaningful and reproducible research.

Also, please be aware that not all content on Malpedia is publicly available.

More specifically, you will need an account to access all data (malware samples, non-public YARA rules, …).

In this regard, Malpedia is operated as an invite-only trust group.
…(emphasis in original)

You are probably already aware of Malpedia but I wasn’t.

Enjoy!

A Guide to Reproducible Code in Ecology and Evolution

Filed under: Bioinformatics,Biology,Replication,Research Methods,Science — Patrick Durusau @ 3:33 pm

A Guide to Reproducible Code in Ecology and Evolution by British Ecological Society.

Natalie Cooper, Natural History Museum, UK and Pen-Yuan Hsing, Durham University, UK, write in the introduction:

The way we do science is changing — data are getting bigger, analyses are getting more complex, and governments, funding agencies and the scientific method itself demand more transparency and accountability in research. One way to deal with these changes is to make our research more reproducible, especially our code.

Although most of us now write code to perform our analyses, it is often not very reproducible. We have all come back to a piece of work we have not looked at for a while and had no idea what our code was doing or which of the many “final_analysis” scripts truly was the final analysis! Unfortunately, the number of tools for reproducibility and all the jargon can leave new users feeling overwhelmed, with no idea how to start making their code more reproducible. So, we have put together this guide to help.

A Guide to Reproducible Code covers all the basic tools and information you will need to start making your code more reproducible. We focus on R and Python, but many of the tips apply to any programming language. Anna Krystalli introduces some ways to organise files on your computer and to document your workflows. Laura Graham writes about how to make your code more reproducible and readable. François Michonneau explains how to write reproducible reports. Tamora James breaks down the basics of version control. Finally, Mike Croucher describes how to archive your code. We have also included a selection of helpful tips from other scientists.

True reproducibility is really hard. But do not let this put you off. We would not expect anyone to follow all of the advice in this booklet at once. Instead, challenge yourself to add one more aspect to each of your projects. Remember, partially reproducible research is much better than completely non-reproducible research.

Good luck!
… (emphasis in original)

Not counting front and back matter, 39 pages total. A lot to grasp in one reading but if you don’t already have reproducible research habits, keep a copy of this publication on top of your desk. Yes, on top of the incoming mail, today’s newspaper, forms and chart requests from administrators, etc. On top means just that, on top.

At some future date, when the pages are too worn, creased, folded, dog eared and annotated to be read easily, reprint it and transfer your annotations to a clean copy.

I first saw this in David Smith’s The British Ecological Society’s Guide to Reproducible Science.

PS: The same rules apply to data science.

CatBoost: Yandex’s machine learning algorithm (here be Russians)

Filed under: CERN,Machine Learning — Patrick Durusau @ 3:08 pm

CatBoost: Yandex’s machine learning algorithm is available free of charge by Victoria Zavyalova.

From the post:

Russia’s Internet giant Yandex has launched CatBoost, an open source machine learning service. The algorithm has already been integrated by the European Organization for Nuclear Research to analyze data from the Large Hadron Collider, the world’s most sophisticated experimental facility.

Machine learning helps make decisions by analyzing data and can be used in many different areas, including music choice and facial recognition. Yandex, one of Russia’s leading tech companies, has made its advanced machine learning algorithm, CatBoost, available free of charge for developers around the globe.

“This is the first Russian machine learning technology that’s an open source,” said Mikhail Bilenko, Yandex’s head of machine intelligence and research.

I called out the Russian origin of the CatBoost algorithm not because I have any nationalistic tendencies, but because you can find frothing paranoids in U.S. government agencies and their familiars who do. In those cases, avoid CatBoost.

If you work in saner environments, or need to use categorical data (read not converted to numbers), give CatBoost a close look!
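A minimal Python sketch of the selling point, categorical columns passed in as-is via cat_features rather than one-hot encoded first (the toy data is made up):

```python
from catboost import CatBoostClassifier

# Toy training data: two categorical columns and one numeric column (made up).
X = [
    ["red",   "DE", 3.2],
    ["blue",  "RU", 1.5],
    ["red",   "US", 2.7],
    ["green", "RU", 0.9],
]
y = [1, 0, 1, 0]

# Columns 0 and 1 are categorical; CatBoost handles their encoding internally.
model = CatBoostClassifier(iterations=50, depth=3, verbose=False)
model.fit(X, y, cat_features=[0, 1])

print(model.predict([["blue", "US", 2.0]]))
```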

Enjoy!

The Top-100 rated Devoxx Belgium 2017 talks (or the full 207)

Filed under: Conferences,Programming — Patrick Durusau @ 1:47 pm

The Top-100 rated Devoxx Belgium 2017 talks

The top-100 list has Devoxx Belgium 2017 talks sorted in voting order, with hyperlinks to the top 50.

If you are looking for more comprehensive coverage of Devoxx Belgium 2017, try the Devoxx Belgium 2017 YouTube Playlist, with 207 videos!

Kudos to Devoxx for putting conference content online to spread the word about technology.

The Computer Science behind a modern distributed data store

Filed under: ArangoDB,Computer Science,Distributed Computing,Distributed Consistency — Patrick Durusau @ 1:34 pm

From the description:

What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are at least three necessary ingredients which are anything but trivial to combine, and of course even more challenging when heading for an acceptable performance.

Over the past years there has been significant progress in both the science and the practical implementations of such data stores. In his talk Max Neunhöffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay and show four modern approaches of distributed open-source data stores.

Topics are:

  • Challenges in developing a distributed, resilient data store
  • Consensus, distributed transactions, distributed query optimization and execution
  • The inner workings of ArangoDB, Cassandra, Cockroach and RethinkDB

The talk will touch complex and difficult computer science, but will at the same time be accessible to and enjoyable by a wide range of developers.

I haven’t found the slides for this presentation but did stumble across ArangoDB Tech Talks and Slides.

Neunhöffer’s presentation will make you look at ArangoDB more closely.
