Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 14, 2018

Don’t Delete Evil Data [But Remember the Downside of “Evidence”]

Filed under: Archives,Preservation,Social Media — Patrick Durusau @ 8:56 pm

Don’t Delete Evil Data by Lam Thuy Vo.

From the post:

The web needs to be a friendlier place. It needs to be more truthful, less fake. It definitely needs to be less hateful. Most people agree with these notions.

There have been a number of efforts recently to enforce this idea: the Facebook groups and pages operated by Russian actors during the 2016 election have been deleted. None of the Twitter accounts listed in connection to the investigation of the Russian interference with the last presidential election are online anymore. Reddit announced late last fall that it was banning Nazi, white supremacist, and other hate groups.

But even though much harm has been done on these platforms, is the right course of action to erase all these interactions without a trace? So much of what constitutes our information universe is captured online—if foreign actors are manipulating political information we receive and if trolls turn our online existence into hell, there is a case to be made for us to be able to trace back malicious information to its source, rather than simply removing it from public view.

In other words, there is a case to be made to preserve some of this information, to archive it, structure it, and make it accessible to the public. It’s unreasonable to expect social media companies to sidestep consumer privacy protections and to release data attached to online misconduct willy-nilly. But to stop abuse, we need to understand it. We should consider archiving malicious content and related data in responsible ways that allow for researchers, sociologists, and journalists to understand its mechanisms better and, potentially, to demand more accountability from trolls whose actions may forever be deleted without a trace.

By some as-yet-unspecified mechanism, I would support preserving all social media, and making it publicly available if it was publicly posted originally. Any restriction or permission to see/use the data will lead to the same abuses we see now.

Twitter, among others, talks about abuse but no one can prove or disprove whatever Twitter cares to say.

There is a downside to preserving social media. You have probably seen the NBC News story on 200,000 tweets billed as the smoking gun on Russian interference with the 2016 elections.

Well, except that if you look at the tweets, that’s about as far from a smoking gun on Russian interference as anything you can imagine.

By analogy, that’s why intelligence analysts always say they have evidence and give you their conclusions, but not the evidence itself. There is too much danger you will discover their report is completely fictional.

Or, when not wholly fictional, that it serves their or their agency’s interests.

Keeping evidence is risky business. Just so you are aware.

Wikileaks Has Sprung A Leak

Filed under: Journalism,News,Reporting,Wikileaks — Patrick Durusau @ 5:03 pm

In Leaked Chats, WikiLeaks Discusses Preference for GOP over Clinton, Russia, Trolling, and Feminists They Don’t Like by Micah Lee, Cora Currier.

From the post:

On a Thursday afternoon in November 2015, a light snow was falling outside the windows of the Ecuadorian embassy in London, despite the relatively warm weather, and Julian Assange was inside, sitting at his computer and pondering the upcoming 2016 presidential election in the United States.

In little more than a year, WikiLeaks would be engulfed in a scandal over how it came to publish internal emails that damaged Hillary Clinton’s presidential campaign, and the extent to which it worked with Russian hackers or Donald Trump’s campaign to do so. But in the fall of 2015, Trump was polling at less than 30 percent among Republican voters, neck-and-neck with neurosurgeon Ben Carson, and Assange spoke freely about why WikiLeaks wanted Clinton and the Democrats to lose the election.

“We believe it would be much better for GOP to win,” he typed into a private Twitter direct message group to an assortment of WikiLeaks’ most loyal supporters on Twitter. “Dems+Media+liberals woudl then form a block to reign in their worst qualities,” he wrote. “With Hillary in charge, GOP will be pushing for her worst qualities., dems+media+neoliberals will be mute.” He paused for two minutes before adding, “She’s a bright, well connected, sadistic sociopath.”

Like Wikileaks, the Intercept treats the public like rude children, publishing only what it considers to be newsworthy content:


The archive spans from May 2015 through November 2017 and includes over 11,000 messages, more than 10 percent of them written from the WikiLeaks account. With this article, The Intercept is publishing newsworthy excerpts from the leaked messages.

My criticism of selective publication of leaks isn’t unique to the Intercept. I have voiced similar concerns about the ICIJ and about Wikileaks itself.

I want to believe the Intercept, ICIJ and Wikileaks when they proclaim others have been lying, unfaithful, dishonest, etc.

But that wanting/desire makes it even more important that I critically assess the evidence they advance for their claims.

Selective release of evidence reduces their credibility to no more than that of those they accuse.

BTW, if anyone has a journalism 101 guide to writing headlines, send a copy to the Intercept. They need it.

PS: I don’t have an opinion one way or the other on the substance of the Lee/Currier account. I’ve never been threatened with a government missile so can’t say how I would react. Badly I would assume.

Russian Influence! Russian Influence! Get Your Russian Influence Here!

Filed under: Journalism,News,Politics,Reporting,Twitter — Patrick Durusau @ 3:54 pm

Twitter deleted 200,000 Russian troll tweets. Read them here. by Ben Popken (NBC News)

From the post:

NBC News is publishing its database of more than 200,000 tweets that Twitter has tied to “malicious activity” from Russia-linked accounts during the 2016 U.S. presidential election.

These accounts, working in concert as part of large networks, pushed hundreds of thousands of inflammatory tweets, from fictitious tales of Democrats practicing witchcraft to hardline posts from users masquerading as Black Lives Matter activists. Investigators have traced the accounts to a Kremlin-linked propaganda outfit founded in 2013 known as the Internet Research Agency (IRA). The organization has been assessed by the U.S. Intelligence Community to be part of a Russian state-run effort to influence the outcome of the 2016 U.S. presidential race. And they’re not done.

“There should be no doubt that Russia perceives its past efforts as successful and views the 2018 US midterm elections as a potential target for Russian influence operations,” Director of National Intelligence Dan Coats told the Senate Intelligence Committee Tuesday.

Wow!

What’s really amazing is that NBC keeps up the narrative of “Russian influence” while publishing data to the contrary!

No, I confess I haven’t read all 200K tweets but then neither has NBC, if they read any of them at all.

Download tweets.csv. (NBC link) (Don’t worry, I’ve stored a copy elsewhere should that one disappear.)

On Unix, try this: head -100 tweets.csv | awk -F "," '{ print $8 }' > 100-tweets.txt

The eighth field of the CSV file contains the text of each tweet. (Note that a naive comma split like this truncates any tweet whose text itself contains a comma, which likely explains why some of the tweets below appear cut short or empty.)
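If you want the text field extracted correctly, Python’s csv module honors the quoting. A sketch, assuming (as the awk command does) that the tweet text is in the eighth column:

```python
import csv

def tweet_texts(path, column=7, limit=100):
    """Yield the text column of the first `limit` rows, honoring
    quoted fields (unlike a naive comma split)."""
    with open(path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.reader(f)):
            if i >= limit:
                break
            if len(row) > column:
                yield row[column]

# Rough equivalent of: head -100 tweets.csv | awk -F "," '{ print $8 }'
# for text in tweet_texts("tweets.csv"):
#     print(text)
```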

Walk with me through the shadow of Russian influence and see how you feel:

  1. “RT @LibertyBritt: He’s the brilliant guy who shoots himself in the foot to spite his face. And tries to convince us to do it too. https:/…”
  2. “RT @K1erry: The Marco Rubio knockdown of Elizabeth Warren no liberal media outlet will cover https://t.co/Rh391fEXe3”
  3. “Obama on Trump winning: ‘Anything’s possible’ https://t.co/MjVMZ5TR8Y #politics”
  4. “RT @bgg2wl: Walmart
  5. “it’s impossible! #TexasJihad”
  6. “RT @LibsNoFun: Who will wave the flag? #DayWithoutImmigrants https://t.co/Cn6JKqzE6X”
  7. “Bewaffnete attackieren Bus mit koptischen Christen #Islamisten #ISIS
  8. “”
  9. “The bright example of our failing education https://t.co/DgboGgkgVj”
  10. “@sendavidperdue How are they gonna protect us if they just let a bunch of terrorist walk the cities of our city? #StopIslam #IslamKills”

Only ten “Russian influence” tweets and I’m already thinking about vodka. You?

Let’s try another ten:

  1. “FC Barcelonas youth academy! La Masia doin work! Double tap for these little guys! https://t.co/eo1qIvLjgS”
  2. “When I remember it’s #Friyay https://t.co/yjBTsaFaR2”
  3. “RT @Ladydiann2: Remove these Anti Americans from America enough is enough abuse American freedoms how dare you low lives https://t.co/G44E6…”
  4. “RT @BreitbartNews: This week’s “”Sweden incident.”” https://t.co/EINMeA9R2T”
  5. “RT @alisajoy331: Prayer sent Never stop fighting💔 https://t.co/B9Tno5REjm”
  6. “RT @RossMoorhouse: #ItsRiskyTo
  7. “”
  8. “RT @RedState: The KKK Says A&E Producers Tried to Stage Fake Scenes for Cancelled Documentary https://t.co/HwaebG2rdI”
  9. “RT @hldb73: Bryan or Ryan Adams #whenthestarsgoblue #RejectedDebateTopics @WorldOfHashtags @TheRyanAdams @bryanadams https://t.co/wFBdne8K…”
  10. “RT @WorldTruthTV: #mutual #respect https://t.co/auIjJ2RdBU”

Well comrade. Do you feel any different about the motherland? I don’t. Let’s read some more of her tweets!

  1. “tired of kids how to get rid #SearchesGoogleIsAshamedOf”
  2. “RT @crookedwren: “”Praise be to the Lord
  3. “RT @deepscreenshots: https://t.co/1IuHuiAIJB”
  4. “Kareem Abdul Jabber #OneLetterOffSports @midnight #HashtagWars”
  5. “#God can be realized through all paths. All #religions…”
  6. “RT @RawStory: ‘Star Wars’ Han Solo movie to begin production in January https://t.co/bkZq7F7IkD”
  7. “RT @KStreetHipster: Hamner-Brown is already on its way here. It’s been on it’s way for billions of years. #KSHBC https://t.co/TQh86xN3pJ”
  8. “RT @TrumpSuperPAC: Obama’s a Muslim & this video from @FoxNews proves it! Even @CNN admits Obama’s training protesters/jihadists! #MAGA htt…”
  9. “RT @schotziejlk: .@greta Who is your #SuperBowl favorite?”
  10. “RT @LefLaneLivin: @trueblackpower As Black People we need to Support

I’m going to change my middle name to Putin out of respect for our glorious leader!

Is it respectful to get a Putin tattoo on your hiney?

(Recovers from Russian influence)

This is NBC’s damning proof of Russian influence. Like I said at the beginning, Wow!

As in Wow! how dumb.

OK, to be fair, any tweet set will have a lot of trash in it. Grepping for Clinton/clinton and Trump/trump returns 20,893 matches for Clinton and 49,669 for Trump.

I haven’t checked, but pre-election talk about Clinton/Trump by liberals probably also ran about 2 1/2 times more mentions of Trump than of Clinton. (An odd way to run a campaign.)
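Counts like these can be reproduced from the text field alone (grep matches anywhere in a row, including URLs and usernames). A sketch, again assuming the eighth column holds the tweet text:

```python
import csv
from collections import Counter

def mention_counts(path, terms=("clinton", "trump"), column=7):
    """Count, per term, how many tweets mention it (case-insensitive),
    looking only at the text column."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) > column:
                text = row[column].lower()
                for term in terms:
                    if term in text:
                        counts[term] += 1
    return counts

# mention_counts("tweets.csv") -> Counter with per-term tweet counts
```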

So, the usual grep/head, etc. and the first ten “Clinton” tweets are:

  1. “Clinton: Trump should’ve apologized more
  2. “RT @thomassfl: Wikileaks E-Mails:  Hillary Clinton Blackmailed Bernie Sanders https://t.co/l9X32FegV6.”
  3. “Clinton’s VP Choice: More Harm Than Good https://t.co/iGnLChFHeP”
  4. “Hillary Clinton vows to fight
  5. “RT @Rammer_Jammer84: I don’t know about Hilary Clinton having a body double but it’s super weird that she came out by herself considering s…”
  6. “RT @Darren32895836: After Hillary Clinton Caught 4attempting 2take advantage of Americans hardships &tears changes Strat #PrayForFlorida ht…”
  7. “RT @steph93065: Hillary Clinton: Donald Trump’s Veterans Press Conference ‘Disgraceful’ – Breitbart https://t.co/CVvBOrTJBX”
  8. “RT @DianeRainie1: Hey @HillaryClinton this message is for you. Pack it up & go home Hillary
  9. “”
  10. “”RejectedDebateTopics””

and the first ten “Trump” tweets are:

  1. “Clinton: Trump should’ve apologized more
  2. “RT @AriaWilsonGOP: 3 Women Face Charges After Being Caught Stealing Dozens Of Trump Signs https://t.co/JjlZxaW3JN https://t.co/qW2Ok9ROxH”
  3. “RT @America_1st_: CW: “”The thing that impressed me was that Trump is always comfortable in own skin
  4. “Dave Chappelle: “”Black Lives Matter”” is the worst slogan I’ve ever heard! How about “”enough is enough””? VotingTrump! https://t.co/5okvmoQhcj”
  5. “Obama on Trump winning: ‘Anything’s possible’ https://t.co/MjVMZ5TR8Y #politics”
  6. “RT @TrumpSuperPAC: Obama’s a Muslim & this video from @FoxNews proves it! Even @CNN admits Obama’s training protesters/jihadists! #MAGA htt…”
  7. “Deceitful Media caught on act when trying to drive the “”Donald Trump is racist”” rhetoric.
  8. “”
  9. “RT @Veteran4Trump: A picture you will never see on @CNN or @MSNBC #BlacksForTrump Thumbs up for Trump 👍#MakeAmericaGreatAgain #Blacks4Trump…”
  10. “RT @steph93065: Hillary Clinton: Donald Trump’s Veterans Press Conference ‘Disgraceful’ – Breitbart https://t.co/CVvBOrTJBX”

That’s a small part of NBC’s smoking gun on Russian influence?

Does it stand to reason that the CIA, NSA, etc., have similar cap-gun evidence?

Several options present themselves:

  • Intelligence operatives and their leaders have been caught lying, again. That is, spinning tales that no reasonable reading of the evidence supports.
  • Intelligence operatives are believing one more impossible thing before breakfast and ignoring the evidence.
  • Journalists have chosen to not investigate whether intelligence operatives are lying or believing impossible things and report/defend intelligence conclusions.

Perhaps all three?

In any event, before crediting any “Russian influence” story, do take the time to review at least some of the 200,000 pieces of “evidence” NBC has collected on that topic.

You will be left amazed that you ever believed NBC News on any topic.

Phaser (Game/Training Framework)

Filed under: Education,Games — Patrick Durusau @ 11:13 am

Their graphic, certainly not mine!

From the webpage:

Desktop and Mobile HTML5 game framework. A fast, free and fun open source framework for Canvas and WebGL powered browser games.

Details: Phaser

Do you use games for learning?

For example, almost everyone recognizes the moral lepers in Congress face-on, with a TV caption.

But how many of us could perform the same feat in a busy airport or in poor light?

Enter game learning/training!

Photos are easy enough to find and with Gimp you can create partially obscured faces.

Of course, points should be deducted for “recognizing” the wrong face or failing to recognize a “correct” face.

Game action after the point of recognition is up to you. Make it enjoyable if not addictive.

Ping me with your political action games, patrick@durusau.net. No prizes but if I see a particularly clever or enjoyable one, I’ll give a shout out to it.

Evolving a Decompiler

Filed under: C/C++,Compilers,Cybersecurity,Programming,Subject Identity — Patrick Durusau @ 8:36 am

Evolving a Decompiler by Matt Noonan.

From the post:

Back in 2016, Eric Schulte, Jason Ruchti, myself, Alexey Loginov, and David Ciarletta (all of the research arm of GrammaTech) spent some time diving into a new approach to decompilation. We made some progress but were eventually all pulled away to other projects, leaving a very interesting work-in-progress prototype behind.

Being a promising but incomplete research prototype, it was quite difficult to find a venue to publish our research. But I am very excited to announce that I will be presenting this work at the NDSS binary analysis research (BAR) workshop next week in San Diego, CA! BAR is a workshop on the state-of-the-art in binary analysis research, including talks about working systems as well as novel prototypes and works-in-progress; I’m really happy that the program committee decided to include discussion of these prototypes, because there are a lot of cool ideas out there that aren’t production-ready, but may flourish once the community gets a chance to start tinkering with them.

How wickedly cool!

Did I mention all the major components are open-source?


GrammaTech recently open-sourced all of the major components of BED, including:

  • SEL, the Software Evolution Library. This is a Common Lisp library for program synthesis and repair, and is quite nice to work with interactively. All of the C-specific mutations used in BED are available as part of SEL; the only missing component is the big code database; just bring your own!
  • clang-mutate, a command-line tool for performing low-level mutations on C and C++ code. All of the actual edits are performed using clang-mutate; it also includes a REPL-like interface for interactively manipulating C and C++ code to quickly produce variants.
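The evolutionary loop at the heart of this kind of work can be illustrated in miniature: mutate a candidate, keep mutants that pass at least as many tests. A toy sketch (nothing here is SEL’s actual API; the “program” is just two integer constants standing in for real code):

```python
import random

# Toy "program": two integer constants (a, b) in y = a*x + b.
# The test suite plays the role of a regression suite; the
# mutate/keep loop plays the role of the evolutionary search.
TESTS = [(0, 1), (1, 3), (2, 5)]  # satisfied only by a=2, b=1

def fitness(prog):
    """Number of tests the candidate program passes."""
    a, b = prog
    return sum(1 for x, y in TESTS if a * x + b == y)

def mutate(prog, rng):
    """Nudge one constant by +/-1, clamped to a small search space."""
    a, b = prog
    if rng.random() < 0.5:
        a = max(-10, min(10, a + rng.choice([-1, 1])))
    else:
        b = max(-10, min(10, b + rng.choice([-1, 1])))
    return (a, b)

def evolve(start=(0, 0), seed=0):
    """Search until a candidate passes every test."""
    rng = random.Random(seed)
    best = start
    while fitness(best) < len(TESTS):
        cand = mutate(best, rng)
        # keep mutants that pass at least as many tests; occasionally
        # accept a worse one so the search cannot get stuck
        if fitness(cand) >= fitness(best) or rng.random() < 0.1:
            best = cand
    return best
```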

The building of the “big code database” sounds like an exercise in subject identity, doesn’t it?

Topic maps anyone?

February 13, 2018

Do You Have An ORCID identifier?

Filed under: Identifiers — Patrick Durusau @ 8:18 pm

ORCID: The number that every academic needs by Debbie Currie.

From the post:

Do you have your ORCID identifier yet? You might not even know what that is. But if you’re a researcher or academic, or planning to become one, you’re going to need one.

The Open Researcher and Contributor identifier—or ORCID—easily connects a researcher to his or her research output and allows others to access and share that body of work. ORCID streamlines publication submission and enhances discoverability. And, increasingly, granting bodies are requiring the ORCID as part of their application process.

“I tell my students it’s the social security number for a scientist,” says Denis Fourches, an assistant professor in the Department of Chemistry and a resident member of the Bioinformatics Research Center. “Then I show them an example of it that not only facilitates your life, but also the compilation of all the papers you reviewed, the compilation of all the papers you published, the compilation of all the presentations you gave at conferences.”

“‘Want that done automatically?’ I ask. And they say ‘Yeah, I like that.’”

The ORCID is a unique, 16-digit, ISO-compatible number. For instance, NCSU Libraries Chief Strategist for Research Collaboration Christopher Erdmann’s ID is 0000-0003-2554-180X. Once you register for free, you can then add information to your ORCID record (some of which will be automatically populated), and link your record to other identifier systems and profiles you might already have such as Scopus, ResearcherID, DataCite, or LinkedIn.

In lieu of the NSA sharing its global identifier for you, ORCID is your next best option. 😉

One of the advantages over your NSA global identifier is that people besides the NSA and its streams of careless contractors use your ORCID identifier.

Take the plunge, at least for your public persona.

I did. There’s not much there (at present), but I’m now identified by: 0000-0003-3057-4833.

It doesn’t roll off the tongue but identifiers rarely do.
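They don’t roll off the tongue, but they do carry a built-in sanity check: the final character is an ISO/IEC 7064 MOD 11-2 check digit computed over the first 15 digits, with a value of 10 rendered as “X”. A quick sketch:

```python
def orcid_check_digit(base_digits):
    """ISO/IEC 7064 MOD 11-2 check digit over the first 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """Validate a hyphenated 16-character ORCID identifier."""
    digits = orcid.replace("-", "")
    return len(digits) == 16 and orcid_check_digit(digits[:15]) == digits[15]

# Both identifiers mentioned above validate:
# is_valid_orcid("0000-0003-2554-180X")  -> True
# is_valid_orcid("0000-0003-3057-4833")  -> True
```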

Register and start using your ORCID!

PS: Of course you can create an ORCID for your non-public personas as well. Bear in mind the risk of identity disclosing mistakes as you switch from one to the other.

Responsible Disclosure: You Lost 5 Months of Pwning Corporate/Government Computers

Filed under: Cybersecurity,Security — Patrick Durusau @ 7:34 pm

Skype can’t fix a nasty security bug without a massive code rewrite by Zack Whittaker.

From the post:

A security flaw in Skype’s updater process can allow an attacker to gain system-level privileges to a vulnerable computer.

The bug, if exploited, can escalate a local unprivileged user to the full “system” level rights — granting them access to every corner of the operating system.

But Microsoft, which owns the voice- and video-calling service, said it won’t immediately fix the flaw, because the bug would require too much work.

Security researcher Stefan Kanthak found that the Skype update installer could be exploited with a DLL hijacking technique, which allows an attacker to trick an application into drawing malicious code instead of the correct library. An attacker can download a malicious DLL into a user-accessible temporary folder and rename it to an existing DLL that can be modified by an unprivileged user, like UXTheme.dll. The bug works because the malicious DLL is found first when the app searches for the DLL it needs.

Once installed, Skype uses its own built-in updater to keep the software up to date. When that updater runs, it uses another executable file to run the update, which is vulnerable to the hijacking.
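The flaw is easy to simulate in miniature. A toy sketch (directory names are illustrative; nothing here is Windows-specific): a loader that resolves a library by bare name takes the first match on its search path, so an attacker-writable directory searched early wins:

```python
import os

def resolve_library(name, search_path):
    """Return the first file called `name` found on `search_path`,
    mimicking a naive DLL search order: earlier directories win."""
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None

# If a user-writable temp folder is searched before the system folder,
# a planted UXTheme.dll there shadows the legitimate copy.
```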

Impact of responsible disclosure?

Microsoft sat on its ass for over five months, five months during which you could have been pwning corporate and government computers, only to say (paraphrasing): “It’s too hard.”

It wasn’t too hard for them to completely break Skype for Ubuntu and possibly other flavors of Linux. But fixing a large bug? No, let us introduce some new ones and then we’ll think about the existing ones.

Most corporations and governments maintain secrets only by lack of effort on the part of the public.

Give that some thought when deciding how to spend your leisure time.

February 12, 2018

Improving Your Phishing Game

Filed under: Cybersecurity,Ethics,Phishing for Leaks,Security — Patrick Durusau @ 7:52 pm

Did you know that KnowBe4 publishes a quarterly phishing test analysis? It ranks the subject lines that most often get the links in phishing emails followed.

The entire KnowBe4 site is a reference source if you don’t want to fall for phishing emails, or to look like a Nigerian spammer when sending them.

Their definition of phishing:

Phishing is the process of attempting to acquire sensitive information such as usernames, passwords and credit card details by masquerading as a trustworthy entity using bulk email which tries to evade spam filters.

Emails claiming to be from popular social web sites, banks, auction sites, or IT administrators are commonly used to lure the unsuspecting public. It’s a form of criminally fraudulent social engineering.

I think:

It’s a form of criminally fraudulent social engineering.

sounds a bit harsh and not nuanced at all.

For example, these aren’t criminally fraudulent cases of phishing:

  • CIA sends phishing emails to foreign diplomats
  • FBI sends phishing emails to anti-war and social reform groups
  • NSA sends phishing emails to government officials (ours, theirs, etc.)

Phishing is an amoral weapon, just like any other weapon.

If you use phishing to uncover child sex traffickers, is that a criminally fraudulent use of phishing? Not to me.

If you hear a different conclusion in a windy discussion of ethics, don’t bother to write. I’ll just treat it as spam.

Don’t let other people make broad ethical pronouncements on your behalf. They have an agenda and it’s not likely to be one in your interest.

Meanwhile, improve your phishing game!

Establishment is Gaslighting Us [Begging Bowl/Reduced Rates Ahead]

Filed under: Journalism,News,Reporting — Patrick Durusau @ 7:19 pm

How Establishment Propaganda Gaslights Us Into Submission by Caitlin Johnstone.

From the post:

The dynamics of the establishment Syria narrative are hilarious if you take a step back and think about them. I mean, the Western empire is now openly admitting to having funded actual, literal terrorist groups in that country, and yet they’re still cranking out propaganda pieces about what is happening there and sincerely expecting us to believe them. It’s adorable, really; like a little kid covered in chocolate telling his mom he doesn’t know what happened to all the cake frosting.

Or at least it would be adorable if it weren’t directly facilitating the slaughter of hundreds of thousands of people.

I recently had a pleasant and professional exchange with the Atlantic Council’s neoconservative propagandist Eliot Higgins, in which he referred to independent investigative journalist Vanessa Beeley as “bonkers” and myself as “crazy,” and I called him a despicable bloodsucking ghoul. I am not especially fond of Mr. Higgins.

You see this theme repeated again and again and again in Higgins’ work; the U.S.-centralized power establishment which facilitated terrorist factions in Syria is the infallible heroic Good Guy on the scene, and anyone who doesn’t agree is a mentally deranged lunatic.

If you want to see more journalism that you forward to others, post to Facebook, etc., then donate to Consortiumnews.com.

I should be begging for money for myself, blah, blah, blah, but considering the ongoing fail of the complicit mainstream media, donation to Consortiumnews.com will do more good than donating to me.

If you hire me for research, standards editing or semantic/topic maps work, discount rates are available for donors to Consortiumnews.com.

Reducing the Emotional Toll of Debating Bigots, Fascists and Misogynists

Filed under: Keras,Politics,Python,TensorFlow — Patrick Durusau @ 5:08 pm

Victims of bigots, fascists and misogynists on social media can (and many have) recounted the emotional toll of engaging with them.

How would you like to reduce your emotional toll and consume minutes if not hours of their time?

I thought you might be interested. 😉

Follow the link to DeepPavlov. (Ignore the irony of the name considering the use case I’m outlining.)

From the webpage:

An open source library for building end-to-end dialog systems and training chatbots.

We are in a really early Alfa release. You have to be ready for hard adventures.

An open-source conversational AI library, built on TensorFlow and Keras, and designed for

  • NLP and dialog systems research
  • implementation and evaluation of complex conversational systems

Our goal is to provide researchers with:

  • a framework for implementing and testing their own dialog models with subsequent sharing of that models
  • set of predefined NLP models / dialog system components (ML/DL/Rule-based) and pipeline templates
  • benchmarking environment for conversational models and systematized access to relevant datasets

and AI-application developers with:

  • framework for building conversational software
  • tools for application integration with adjacent infrastructure (messengers, helpdesk software etc.)

… (emphasis in the original)

Only one component for a social media engagement bot to debate bigots, fascists and misogynists but a very important one. A trained AI can take the emotional strain off of victims/users and at least in some cases, inflict that toll on your opponents.
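To make the idea concrete, here is a toy sketch (deliberately nothing DeepPavlov-specific): even a canned-response bot that always answers with a follow-up question costs its operator nothing and its target time:

```python
import random

# Canned follow-up questions: cheap for the bot to send, expensive
# for the recipient to answer.
FOLLOW_UPS = [
    "Interesting. What's your source for that?",
    "Can you walk me through your reasoning, step by step?",
    "How would you answer someone who disagrees with you?",
]

def respond(message, seed=None):
    """Reply to any message with a time-consuming follow-up question."""
    return random.Random(seed).choice(FOLLOW_UPS)
```

A real deployment would swap the canned list for a trained dialog model; the economics (machine time versus human time) stay the same.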

For OpSec reasons, don’t announce the accounts used by such an AI backed system.

PS: AI ethics debaters. This use of an AI isn’t a meaningful interchange of ideas online. My goals are: reduce the emotional toll on victims, waste the time of their attackers. Disclosing you aren’t hurting someone on the other side (the bot) isn’t a requirement in my view.

February 10, 2018

The Complexity of Neurons are Beyond Our Current Imagination

Filed under: Artificial Intelligence — Patrick Durusau @ 9:28 pm

The Complexity of Neurons are Beyond Our Current Imagination by Carlos E. Perez.

From the post:

One of the biggest misconceptions around is the idea that Deep Learning or Artificial Neural Networks (ANN) mimic biological neurons. At best, ANN mimic a cartoonish version of a 1957 model of a neuron. Neurons in Deep Learning are essentially mathematical functions that perform a similarity function of its inputs against internal weights. The closer a match is made, the more likely an action is performed (i.e. not sending a signal to zero). There are exceptions to this model (see: Autoregressive networks) however it is general enough to include the perceptron, convolution networks and RNNs.

Jeff Hawkins of Numenta has always lamented that a more biologically-inspired approach is needed. So, in his research on building cognitive machinery, he has architected systems that more closely mimic the structure of the neo-cortex. Numenta’s model of a neuron is considerably more elaborate than the Deep Learning model of a neuron:

I rather like the line “ANN mimic a cartoonish version of a 1957 model of a neuron.”
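For reference, the 1957 model is Rosenblatt’s perceptron: a weighted sum of inputs pushed through a step function. The whole cartoon fits in a few lines:

```python
def perceptron(inputs, weights, bias):
    """The 1957 cartoon neuron: fire (1) iff the weighted sum of
    inputs plus a bias clears zero."""
    activation = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if activation > 0 else 0

# A "neuron" computing logical AND:
# perceptron([1, 1], [1, 1], -1.5) -> 1
# perceptron([1, 0], [1, 1], -1.5) -> 0
```

Compare that to the dendritic segments, synapse growth and temporal context in Numenta’s model and the “cartoonish” label seems generous.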

You need not worry about the MIT Intelligence Quest replicating neurons anytime soon.

In part because no one really knows how neurons work or how much more we need to learn to replicate them.

The AI crowd could train a neural network to recognize people and to fire weapons at them. That qualifies as destruction of humanity by an AI, but if we are really that stupid, perhaps it’s time to make space for others.

JanusGraph + YugaByte (Does Cloud-Native Mean I Call Langley For Backup Support?)

Filed under: Graphs,JanusGraph — Patrick Durusau @ 8:59 pm

JanusGraph + YugaByte

Short tutorial on setting up JanusGraph to work with YugaByte DB.
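I won’t reproduce the tutorial, but the gist, as I understand it, is that YugaByte DB exposes a Cassandra-compatible API (YCQL), so JanusGraph can target it through its standard cql storage backend. A hypothetical janusgraph.properties sketch (host, port and keyspace are placeholders; follow the tutorial for the real values):

```properties
# JanusGraph over YugaByte DB's Cassandra-compatible (YCQL) interface.
# storage.backend=cql is JanusGraph's stock Cassandra driver; the
# host/port/keyspace values below are placeholders.
storage.backend=cql
storage.hostname=127.0.0.1
storage.port=9042
storage.cql.keyspace=janusgraph
```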

I know JanusGraph so looked for more on YugaByte DB and found (overview):


Purpose-built for mission-critical applications

Mission-critical applications have a strong need for data correctness and high availability. They are typically composed of microservices with diverse workloads such as key/value, flexible schema, graph or relational. The access patterns vary as well. SaaS services or mobile/web applications keeping customer records, order history or messages need zero-data loss, geo-replication, low-latency reads/writes and a consistent customer experience. Fast data infrastructure use cases (such as IoT, finance, timeseries data) need near real-time & high-volume ingest, low-latency reads, and native integration with analytics frameworks like Apache Spark.

YugaByte DB offers polyglot persistence to power these diverse workloads and access patterns in a unified database, while providing strong correctness guarantees and high availability. You are no longer forced to create infrastructure silos for each workload or choose between different flavors of SQL and NoSQL databases. YugaByte breaks down the barrier between SQL and NoSQL by offering both.

Cloud-native agility

Another theme common across these microservices is the move to a cloud-native architecture, be it on the public cloud, on-premises or hybrid environment. The primary driver is to make infrastructure agile. Agile infrastructure is linearly scalable, fault-tolerant, geo-distributed, re-configurable with zero downtime and portable across clouds. While the container ecosystem led by Docker & Kubernetes has enabled enterprises to realize this vision for the stateless tier, the data tier has remained a big challenge. YugaByte DB is purpose-built to address these challenges, but for the data tier, and serves as the stateful complement to containers.

Only partially joking about “cloud-native” meaning you call Langley (CIA) for backup support.

Anything that isn’t air-gapped in a secure facility has been compromised. Note the use of past tense.

Disclosures about government spying, to say nothing of your competitors and lastly hackers, makes any other assumption untenable.

MIT Intelligence Quest

Filed under: Artificial Intelligence,Machine Learning — Patrick Durusau @ 8:36 pm

MIT Intelligence Quest

From the webpage:

The MIT Intelligence Quest will advance the science and engineering of both human and machine intelligence. Launched on February 1, 2018, MIT IQ seeks to discover the foundations of human intelligence and drive the development of technological tools that can positively influence virtually every aspect of society.

The Institute’s culture of collaboration will encourage life scientists, computer scientists, social scientists, and engineers to join forces to investigate the societal implications of their work as they pursue hard problems lying beyond the current horizon of intelligence research. By uniting diverse fields and capitalizing on what they can teach each other, we seek to answer the deepest questions about intelligence.

We are setting out to answer two big questions: How does human intelligence work, in engineering terms? And how can we use that deep grasp of human intelligence to build wiser and more useful machines, to the benefit of society?

Drawing on MIT’s deep strengths and signature values, culture, and history, MIT IQ promises to make important contributions to understanding the nature of intelligence, and to harnessing it to make a better world.

The most refreshing aspect of the MIT Intelligence Quest page is that it ends with a contact form.

That’s right, a contact form.

Unlike the ill-fated EU brain project, which had pre-chosen approaches and a roadmap for replicating a human brain. Are they still consuming funds with meetings, hotel rooms, etc.?

You know my misgivings about creating intelligence in the absence of understanding our own.

On the other hand, mimicking how human intelligence works in bounded situations is a far more tractable problem.

Not fully tractable, but tractable enough to yield useful results.

XML periodic table

Filed under: XML — Patrick Durusau @ 8:09 pm

XML periodic table

It’s a visual thing and my small blog format style won’t do it justice. Follow the link.

The XML standards are grouped by: Business language, QA, Document format, Internet format, Graphic format, Metadata standard, and Transformation.

What a cool listing!

Lots of old friends but some potential new ones as well!

Enjoy!

February 9, 2018

XML Prague 2018 Conference Proceedings – Weekend Reading!

Filed under: Conferences,XML,XML Database,XPath,XQuery,XSLT — Patrick Durusau @ 9:13 pm

XML Prague 2018 Conference Proceedings

Two Hundred and Sixty (260) pages of high quality content on XML!

From the table of contents:

  • Assisted Structured Authoring using Conditional Random Fields – Bert Willems
  • XML Success Story: Creating and Integrating Collaboration Solutions to Improve the Documentation Process – Steven Higgs
  • xqerl: XQuery 3.1 Implementation in Erlang – Zachary N. Dean
  • XML Tree Models for Efficient Copy Operations – Michael Kay
  • Using Maven with XML development projects – Christophe Marchand and Matthieu Ricaud-Dussarget
  • Varieties of XML Merge: Concurrent versus Sequential – Tejas Pradip Barhate and Nigel Whitaker
  • Including XML Markup in the Automated Collation of Literary Text – Elli Bleeker, Bram Buitendijk, Ronald Haentjens Dekker, and Astrid Kulsdom
  • Multi-Layer Content Modelling to the Rescue – Erik Siegel
  • Combining graph and tree – Hans-Juergen Rennau
  • SML – A simpler and shorter representation of XML – Jean-François Larvoire
  • Can we create a real world rich Internet application using Saxon-JS? – Pieter Masereeuw
  • Implementing XForms using interactive XSLT 3.0 – O’Neil Delpratt and Debbie Lockett
  • Life, the Universe, and CSS Tests – Tony Graham
  • Form, and Content – Steven Pemberton
  • tokenized-to-tree – Gerrit Imsieke

I just got a refurbished laptop for reading in bed. Now I have to load XML parsers, etc. on it to use along with reading these proceedings!

Enjoy!

PS: Be sure to thank Jirka Kosek for his tireless efforts promoting XML and XML Prague!

Alexandra Elbakyan (Sci-Hub) As Freedom Fighter

Filed under: Intellectual Property (IP),Open Access,Open Data — Patrick Durusau @ 3:33 pm

Recognizing Alexandra Elbakyan:

Alexandra Elbakyan is the freedom fighter behind Sci-Hub, a repository of 64.5 million papers, or “two-thirds of all published research, and it [is] available to anyone.”

Ian Graber-Stiehl, in Science’s Pirate Queen, misses an opportunity to ditch the mis-framing of Elbakyan as a “pirate,” and to properly frame her as a freedom fighter.

To set the background for why you too should see Elbakyan as a freedom fighter, it’s necessary to review, briefly, the notion of “sale” and your intellectual freedom prior to widespread use of electronic texts.

When I started using libraries in the ’60s, you had to physically visit the library to use its books or journals. The library would purchase those items, what is known as first sale, and then either lend them or allow patrons to read them. No separate charge or income for the publisher upon reading. And once purchased, the item remained in the library for use by others.

With the advent of electronic texts, plus oppressive contracts and manipulation of the law, publishers began charging libraries even more than when libraries purchased and maintained access to material for their patrons. Think of it as a form of recurrent extortion: you can’t have access to materials already purchased, save by paying to maintain that access.

Which of course means that both libraries and individuals have lost their right to pay for an item and to maintain it separate and apart from the publisher. That’s a serious theft and it took place in full public view.

There are pirates in this story, people who stole the right of libraries and individuals to purchase items for their own storage and use. Some of the better known ones include: the American Chemical Society, Reed-Elsevier (a/k/a RELX Group), Sage Publishing, Springer, Taylor & Francis, and Wiley-Blackwell.

Elbakyan is trying to recover access for everyone, access that was stolen.

That doesn’t sound like the act of a pirate. Pirates steal for their own benefit. That sounds like the pirates I listed above.

Now that you know Elbakyan is fighting to recover a right taken from you, does that make you view her fight differently?

BTW, when publishers float the canard of their professional staff/editors/reviewers, remember that their retraction rates are silent witnesses refuting their claims of competence.

Read any recent retraction for the listed publishers. Use RetractionWatch for current or past retractions. “Unread” is the best explanation for how most of them got past “staff/editors/reviewers.”

Do you support freedom fighters or publisher/pirates?

If you want to support publisher/pirates, no further action needed.

If you want to support freedom fighters, including Alexandra Elbakyan, the Sci-Hub site has a donate link, contact Elbakyan if you have extra cutting edge equipment to offer, promote Sci-Hub on social media, etc.

For making the lives of publisher/pirates more difficult, use your imagination.

To follow Elbakyan, see her blog and Facebook page.

Fear Keeps People in Line (And Ignorant of Apple Source Code)

Filed under: Cybersecurity,Hacking,Security — Patrick Durusau @ 11:05 am

Apple’s top-secret iBoot firmware source code spills onto GitHub for some insane reason by Chris Williams.

From the post:

The confidential source code to Apple’s iBoot firmware in iPhones, iPads and other iOS devices has leaked into a public GitHub repo.

The closed-source code is top-secret, proprietary, copyright Apple, and yet has been quietly doing the rounds between security researchers and device jailbreakers on Reddit for four or so months, if not longer.

We’re not going to link to it. Also, downloading it is not recommended. Just remember what happened when people shared or sold copies of the stolen Microsoft Windows 2000 source code back in the day.

Notice that Williams cites scary language about the prior Windows source code but not a single example of an actual prosecution for downloading or sharing that source code. I have strong suspicions why no examples were cited.*

You?

The other thing to notice is “security researchers” have been sharing it for months, but if the great unwashed public gets to see it, well, that’s a five alarm fire.

Williams has sided with access only for the privileged, although I would be hard pressed to say why.

BTW, if you want to search GitHub for source code that claims to originate from Apple, use the search term iBoot.

No direct link because in the DMCA cat and mouse game, any link will be quickly broken, and I have no way to verify whether a repository is or isn’t Apple source code.
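For the curious, that search can also be scripted against GitHub’s public search API. A sketch that only constructs the documented v3 request URL (fetching and interpreting the results, and deciding what is or isn’t genuine, is left to you):

```python
from urllib.parse import urlencode

def github_repo_search_url(term, per_page=10):
    """Construct a GitHub v3 repository-search request URL for a term."""
    base = "https://api.github.com/search/repositories"
    return base + "?" + urlencode({"q": term, "per_page": per_page})

url = github_repo_search_url("iBoot")
```

GET that URL (no authentication needed for light use) and you get JSON results, subject to the same DMCA churn noted above.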

Don’t let fear keep you ignorant.

*My suspicions are that anyone reading Microsoft Windows 2000 source code became a poorer programmer and that was viewed as penalty enough.

February 8, 2018

OpenStreetMap, R + Revival of Cold War Parades

Filed under: Mapping,OpenStreetMap,R — Patrick Durusau @ 5:26 pm

Cartographic Explorations of the OpenStreetMap Database with R by Timothée Giraud.

From the post:

This post exposes some cartographic explorations of the OpenStreetMap (OSM) database with R.

These explorations begin with the downloading and the cleaning of OSM data. Then I propose a set of map visualizations of the spatial distributions of bars and restaurants in Paris. Of course, these examples could be adapted to other spatial contexts and thematics (e.g. pharmacies in Roma, bike parkings in Dublin…).

This reproducible analysis is hosted on GitHub (code + data + walk-through).

What a timely post! The accidental president of the United States hungers for legitimacy and views a military parade, Cold War style, as a way to achieve that end.

If it weren’t for all those pesky cable news channels, the military could station the reviewing stand in a curve and run the same tanks, same missiles, same troops past the review stand until the president gets bored.

A sensible plan won’t suggest itself to them, so expect a more traditional and expensive parade.

Just in case you want to plan other “festivities” at, or to intersect with, those planned for the president, the data at OpenStreetMap will prove helpful.

Once the city and parade route become known, what questions would you ask of the OpenStreetMap data?
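Giraud’s walk-through is in R, but the same OSM data is reachable from any language. A hedged Python sketch that only builds the standard Overpass QL query string (the area name and amenity below are placeholders, not anything from the post):

```python
def overpass_amenity_query(area_name, amenity):
    """Build an Overpass QL query for all nodes with a given amenity
    inside a named area (e.g. bars in Paris)."""
    return (
        "[out:json][timeout:60];\n"
        f'area[name="{area_name}"]->.searchArea;\n'
        f'node["amenity"="{amenity}"](area.searchArea);\n'
        "out center;"
    )

# POST the string to https://overpass-api.de/api/interpreter to run it.
query = overpass_amenity_query("Paris", "bar")
```

Swap in the parade city and tags of interest once they are known.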

Porn, AI and Open Source Ethics

Filed under: Artificial Intelligence,Deep Learning,Open Source,Porn,TensorFlow — Patrick Durusau @ 4:18 pm

Google Gave the World Powerful AI Tools, and the World Made Porn With Them by Dave Gershgorn.

From the post:

In 2015, Google announced it would release its internal tool for developing artificial intelligence algorithms, TensorFlow, a move that would change the tone of how AI research and development would be conducted around the world. The means to build technology that could have an impact as profound as electricity, to borrow phrasing from Google’s CEO, would be open, accessible, and free to use. The barrier to entry was lowered from a Ph.D to a laptop.

But that also meant TensorFlow’s undeniable power was now out of Google’s control. For a little over two years, academia and Silicon Valley were still the ones making the biggest splashes with the software, but now that equation is changing. The catalyst is deepfakes, an anonymous Reddit user who built AI software that automatically stitches any image of a face (nearly) seamlessly into a video. And you can probably imagine where this is going: As first reported by Motherboard, the software was being used to put anyone’s face, such as a famous woman or friend on Facebook, on the bodies of porn actresses.

After the first Motherboard story, the user created their own subreddit, which amassed more than 91,000 subscribers. Another Reddit user called deepfakeapp has also released a tool called FakeApp, which allows anyone to download the AI software and use it themselves, given the correct hardware. As of today, Reddit has banned the community, saying it violated the website’s policy on involuntary pornography.

According to FakeApp’s user guide, the software is built on top of TensorFlow. Google employees have pioneered similar work using TensorFlow with slightly different setups and subject matter, training algorithms to generate images from scratch. And there are plenty of potentially fun (if not inane) uses for deepfakes, like putting Nicolas Cage in a bunch of different movies. But let’s be real: 91,000 people were subscribed to deepfakes’ subreddit for the porn.

While much good has come from TensorFlow being open source, like potential cancer detection algorithms, FakeApp represents the dark side of open source. Google (and Microsoft and Amazon and Facebook) have loosed immense technological power on the world with absolutely no recourse. Anyone can download AI software and use it for anything they have the data to create. That means everything from faking political speeches (with help from the cadre of available voice-imitating AI) to generating fake revenge porn. All digital media is a series of ones and zeroes, and artificial intelligence is proving itself proficient at artfully arranging them to generate things that never happened.

You can imagine the rest, or read the rest of Gershgorn’s (deep voice) “dark side of open source.”

While you do, remember that Gershgorn would have made the same claims about:

  1. Telephones
  2. Photography
  3. Cable television
  4. Internet
  5. etc.

The simplest rejoinder is that the world did not create porn with AI. A tiny subset of the world signed up to see porn created by an even smaller subset of the world.

The next simplest rejoinder is the realization that Gershgorn wants a system that dictates ethics to users of open source software. Gershgorn should empower an agency to enforce ethics on journalists and check back in a couple of years to report on the experience.

I’m willing to bet ahead of time that it won’t be a happy report.

Bottom line: Leave the ethics of open source software to the people using such software. That may not always have a happy outcome, but it will always be better than the alternatives.

Introducing HacSpec (“specification language for cryptographic primitives”)

Filed under: Cryptography,Cybersecurity,Security — Patrick Durusau @ 2:58 pm

Introducing HacSpec by Franziskus Kiefer.

From the post:

HacSpec is a proposal for a new specification language for cryptographic primitives that is succinct, that is easy to read and implement, and that lends itself to formal verification. It aims to formalise the pseudocode used in cryptographic standards by proposing a formal syntax that can be checked for simple errors. HacSpec specifications are further executable to test against test vectors specified in a common syntax.

The main focus of HacSpec is to allow specifications to be compiled to formal languages such as cryptol, coq, F*, and easycrypt and thus make it easier to formally verify implementations. This allows a specification using HacSpec to be the basis not only for implementations but also for formal proofs of functional correctness, cryptographic security, and side-channel resistance.

The idea of having a language like HacSpec stems from discussions at the recent HACS workshop in Zurich. The High-Assurance-Cryptographic-Software workshop (HACS) is an invite-only workshop co-located with the Real World Crypto symposium.

Anyone interested in moving this project forward should subscribe to the mailing list or file issues and pull requests against the Github repository.
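To see why a formal spec language is attractive, consider the pseudocode it would replace. Here is the ChaCha20 quarter-round from RFC 8439 as a hand-written Python sketch (ordinary Python, not HacSpec syntax), checked against the RFC’s own test vector:

```python
MASK = 0xffffffff  # all arithmetic is modulo 2**32

def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def quarter_round(a, b, c, d):
    """ChaCha20 quarter-round, RFC 8439, section 2.1."""
    a = (a + b) & MASK; d = rotl(d ^ a, 16)
    c = (c + d) & MASK; b = rotl(b ^ c, 12)
    a = (a + b) & MASK; d = rotl(d ^ a, 8)
    c = (c + d) & MASK; b = rotl(b ^ c, 7)
    return a, b, c, d

# RFC 8439, section 2.1.1 test vector:
# expect (0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb)
out = quarter_round(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)
```

Every `& MASK` above is exactly the kind of detail that prose pseudocode leaves implicit and a checked specification language pins down.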

Cryptography projects should be monitored the way the NSA monitors NIST cryptography standards. If you see an error or weakness, you’re under no obligation to help. The NSA won’t.

Given security failures from software, users, etc., end-to-end encryption resembles transporting people from one homeless camp to another in an armored car.

Secure in transit but not secure at either end.

Running a Tor Relay (New Guide)

Filed under: Privacy,Security,Tor — Patrick Durusau @ 10:45 am

The New Guide to Running a Tor Relay

Have we told you lately how much we love our relay operators? Relays are the backbone of the Tor network, providing strength and bandwidth for our millions of users worldwide. Without the thousands of fast, reliable relays in the network, Tor wouldn’t exist.

Have you considered running a relay, but didn’t know where to start? Perhaps you’re just looking for a way to help Tor, but you’ve always thought that running a relay was too complicated or technical for you and the documentation seemed daunting.

We’re here to tell you that you can become one of the many thousands of relay operators powering the Tor network, if you have some basic command-line experience.

If you can’t help support the Tor network by running a relay, don’t despair! There are always ways to volunteer and, of course, to donate.

Your support helps everyone who uses Tor and sometimes results in really cool graphics, like the Tor relay graphic accompanying the post.

If you want something a bit closer to the edge, try creating a graphic where spy rays from corporations and governments bounce off of secure autos, computers, homes, phones.

February 7, 2018

Kali Linux 2018.1 Release

Filed under: Cybersecurity,Security — Patrick Durusau @ 9:52 pm

Kali Linux 2018.1 Release

From the post:

Welcome to our first release of 2018, Kali Linux 2018.1. This fine release contains all updated packages and bug fixes since our 2017.3 release last November. This release wasn’t without its challenges–from the Meltdown and Spectre excitement (patches will be in the 4.15 kernel) to a couple of other nasty bugs, we had our work cut out for us but we prevailed in time to deliver this latest and greatest version for your installation pleasure.

Churn, especially in security practices and software, is the best state imaginable for generating vulnerabilities.

New software means new bugs, unfamiliar setup requirements, and newbie user mistakes, in addition to the 33% or more of users who open phishing emails.

2018 looks like a great year for security churn.

How stable is your security? (Don’t answer over a clear channel.)

The Matrix Calculus You Need For Deep Learning

Filed under: Deep Learning,Machine Learning,Mathematics — Patrick Durusau @ 9:22 pm

The Matrix Calculus You Need For Deep Learning by Terence Parr, Jeremy Howard.

Abstract:

This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed. Note that you do not need to understand this material before you start learning to train and use deep learning in practice; rather, this material is for those who are already familiar with the basics of neural networks, and wish to deepen their understanding of the underlying math. Don’t worry if you get stuck at some point along the way—just go back and reread the previous section, and try writing down and working through some examples. And if you’re still stuck, we’re happy to answer your questions in the Theory category at forums.fast.ai. Note: There is a reference section at the end of the paper summarizing all the key matrix calculus rules and terminology discussed here.

Here’s a recommendation for reading the paper:

(We teach in University of San Francisco’s MS in Data Science program and have other nefarious projects underway. You might know Terence as the creator of the ANTLR parser generator. For more material, see Jeremy’s fast.ai courses and University of San Francisco’s Data Institute in-person version of the deep learning course.)

Apologies to Jeremy but I recognize ANTLR more quickly than I do Jeremy’s fast.ai courses. (Need to fix that.)

The paper runs thirty-three pages and as the authors say, most of it is unnecessary unless you want to understand what’s happening under the hood with deep learning.

Think of it as the difference between knowing how to drive a sports car and being able to work on a sports car.

With the latter set of skills, you can:

  • tweak your sports car for maximum performance
  • tweak someone else’s sports car for less performance
  • detect someone tweaking your sports car

Read the paper, master the paper.

No test, just real world consequences that separate the prepared from the unprepared.
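One habit worth pairing with the paper: after deriving a gradient by hand, check it numerically with central differences. A minimal sketch (the example function is mine, not from the paper):

```python
def numeric_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of scalar f at point x."""
    grads = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grads.append((f(xp) - f(xm)) / (2 * eps))
    return grads

# f(x, y) = x*y + y**2; by hand: df/dx = y, df/dy = x + 2y
f = lambda v: v[0] * v[1] + v[1] ** 2
g = numeric_gradient(f, [2.0, 3.0])  # expect roughly [3.0, 8.0]
```

If the numeric estimate and your hand derivation disagree, one of them is wrong, and it is usually the hand derivation.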

Were You Pwned by the “Human Cat” Story?

Filed under: Facebook,Fake News — Patrick Durusau @ 5:55 pm

Overseas Fake News Publishers Use Facebook’s Instant Articles To Bring In More Cash by Jane Lytvynenko.

From the post:

While some mainstream publishers are abandoning Facebook’s Instant Articles, fake news sites based overseas are taking advantage of the format — and in some cases Facebook itself is earning revenue from their false stories.

BuzzFeed News found 29 Facebook pages, and associated websites, that are using Instant Articles to help their completely false stories load faster on Facebook. At least 24 of these pages are also signed up with Facebook Audience Network, meaning Facebook itself earns a share of revenue from the fake news being read on its platform.

Launched in 2015, Instant Articles offer a way for publishers to have their articles load quickly and natively within the Facebook mobile app. Publishers can insert their own ads or use Facebook’s ad network, Audience Network, to automatically place advertisements into their articles. Facebook takes a cut of the revenue when sites monetize with Audience Network.

“We’re against false news and want no part of it on our platform; including in Instant Articles,” said an email statement from a Facebook spokesperson. “We’ve launched a comprehensive effort across all products to take on these scammers, and we’re currently hosting third-party fact checkers from around the world to understand how we can more effectively solve the problem.”

The spokesperson did not respond to questions about the use of Instant Articles by spammers and fake news publishers, or about the fact that Facebook’s ad network was also being used for monetization. The articles sent to Facebook by BuzzFeed News were later removed from the platform. The company also removes publishers from Instant Articles if they’ve been flagged by third-party fact-checkers.

Really? You could be pwned by a “human cat” story?

Why should I be morally outraged and/or willing to devote attention to stopping that type of fake news?

Or ask anyone else to devote their resources to it?

Would you seek out Flat Earthers to dispel their delusions? If not, leave the “fake news” to people who seem to enjoy it. It’s their dime.

February 6, 2018

What the f*ck Python! 🐍

Filed under: Programming,Python — Patrick Durusau @ 8:32 pm

What the f*ck Python! 🐍

From the post:

Python, being a beautifully designed high-level and interpreter-based programming language, provides us with many features for the programmer’s comfort. But sometimes, the outcomes of a Python snippet may not seem obvious to a regular user at first sight.

Here is a fun project to collect such tricky & counter-intuitive examples and lesser-known features in Python, attempting to discuss what exactly is happening under the hood!

While some of the examples you see below may not be WTFs in the truest sense, but they’ll reveal some of the interesting parts of Python that you might be unaware of. I find it a nice way to learn the internals of a programming language, and I think you’ll find them interesting as well!

If you’re an experienced Python programmer, you can take it as a challenge to get most of them right in first attempt. You may be already familiar with some of these examples, and I might be able to revive sweet old memories of yours being bitten by these gotchas 😅

If you’re a returning reader, you can learn about the new modifications here.

So, here we go…

What better way to learn than being really pissed off that your code isn’t working? Or isn’t working as expected.

😉

This looks like a real hoot! Too late today to do much with it but I’ll be returning to it.
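Two classics of the genre, for flavor (standard CPython behavior; not necessarily examples from the project’s own list):

```python
# Gotcha 1: a mutable default argument is created once, at function
# definition time, and shared across every call.
def append_to(item, bucket=[]):
    bucket.append(item)
    return bucket

first = append_to(1)   # [1]
second = append_to(2)  # [1, 2] -- same list as before!

# Gotcha 2: closures capture variables, not values, so every lambda
# sees the final value of i.
funcs = [lambda: i for i in range(3)]
results = [fn() for fn in funcs]  # [2, 2, 2], not [0, 1, 2]
```

Both behaviors are documented and deliberate, which is what makes them such effective traps.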

Enjoy!

Dive into BPF: a list of reading material

Filed under: Cybersecurity,Networks — Patrick Durusau @ 8:22 pm

Dive into BPF: a list of reading material by Quentin Monnet.

From the post:

BPF, as in Berkeley Packet Filter, was initially conceived in 1992 so as to provide a way to filter packets and to avoid useless packet copies from kernel to userspace. It initially consisted in a simple bytecode that is injected from userspace into the kernel, where it is checked by a verifier—to prevent kernel crashes or security issues—and attached to a socket, then run on each received packet. It was ported to Linux a couple of years later, and used for a small number of applications (tcpdump for example). The simplicity of the language as well as the existence of an in-kernel Just-In-Time (JIT) compiling machine for BPF were factors for the excellent performances of this tool.

Then in 2013, Alexei Starovoitov completely reshaped it, started to add new functionalities and to improve the performances of BPF. This new version is designated as eBPF (for “extended BPF”), while the former becomes cBPF (“classic” BPF). New features such as maps and tail calls appeared. The JIT machines were rewritten. The new language is even closer to native machine language than cBPF was. And also, new attach points in the kernel have been created.

Thanks to those new hooks, eBPF programs can be designed for a variety of use cases, that divide into two fields of applications. One of them is the domain of kernel tracing and event monitoring. BPF programs can be attached to kprobes and they compare with other tracing methods, with many advantages (and sometimes some drawbacks).

The other application domain remains network programming. In addition to socket filter, eBPF programs can be attached to tc (Linux traffic control tool) ingress or egress interfaces and perform a variety of packet processing tasks, in an efficient way. This opens new perspectives in the domain.

And eBPF performances are further leveraged through the technologies developed for the IO Visor project: new hooks have also been added for XDP (“eXpress Data Path”), a new fast path recently added to the kernel. XDP works in conjunction with the Linux stack, and relies on BPF to perform very fast packet processing.

Even some projects such as P4, Open vSwitch, consider or started to approach BPF. Some others, such as CETH, Cilium, are entirely based on it. BPF is buzzing, so we can expect a lot of tools and projects to orbit around it soon…

I haven’t even thought about the Berkeley Packet Filter in more than a decade.

But such a wonderful reading list merits mention in its own right. What a great model for reading lists on other topics!

And one or more members of your team may want to get closer to the metal on packet traffic.

PS: I don’t subscribe to the notion that only governments can build nation-state-level tooling for hacks. Loose confederations of people built the Internet. Something to keep in mind while sharing code and hacks.

Finally! A Main Stream Use for Deep Learning!

Filed under: Deep Learning,Humor,Machine Learning — Patrick Durusau @ 7:45 pm

Using deep learning to generate offensive license plates by Jonathan Nolis.

From the post:

If you’ve been on the internet for long enough you’ve seen quality content generated by deep learning algorithms. This includes algorithms trained on band names, video game titles, and Pokémon. As a data scientist who wants to keep up with modern trends in the field, I figured there would be no better way to learn how to use deep learning myself than to find a fun topic to generate text for. After having the desire to do this, I waited for a year before I found just the right data set to do it.

I happened to stumble on a list of banned license plates in Arizona. This list contains all of the personalized license plates that people requested but were denied by the Arizona Motor Vehicle Division. This dataset contained over 30,000 license plates which makes a great set of text for a deep learning algorithm. I included the data as text in my GitHub repository so other people can use it if they so choose. Unfortunately the data is from 2012, but I have an active Public Records Request to the state of Arizona for an updated list. I highly recommend you look through it, it’s very funny.

What a great idea! Not only are you learning deep learning but you are being offensive at the same time. A double-dipper!
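If a full deep-learning setup feels heavy, you can get the flavor of character-level generation with a few lines of pure Python. A Markov-chain stand-in, far cruder than the char-RNN approach in the post (the training plates below are invented, not from the Arizona list):

```python
import random

def train(strings, order=2):
    """Character-level transition table: context -> possible next chars."""
    model = {}
    for s in strings:
        padded = "^" * order + s + "$"
        for i in range(len(padded) - order):
            model.setdefault(padded[i:i + order], []).append(padded[i + order])
    return model

def generate(model, order=2, seed=None, max_len=20):
    """Sample one string from the model; '$' ends generation."""
    rng = random.Random(seed)
    ctx, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[ctx])
        if nxt == "$":
            break
        out.append(nxt)
        ctx = ctx[1:] + nxt
    return "".join(out)

plates = ["NOPLATE", "NOTTODAY", "GOTCHA", "NOCOPS"]
model = train(plates)
sample = generate(model, seed=42)
```

Train it on the banned-plate list from the post instead and the output gets offensive fast, which is the point.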

A script for testing generated plates against your state’s license registration site is left as an exercise for the reader.

A password generator using phonetics to spell offensive phrases for c-suite users would be nice.

February 5, 2018

Balisage: The Markup Conference 2018 – 77 Days To Paper Submission Deadline!

Filed under: Conferences,XML,XML Schema,XPath,XQuery — Patrick Durusau @ 8:46 pm

Call for Participation

Submission dates/instructions have dropped!

Dates:

  • 22 March 2018 — Peer review applications due
  • 22 April 2018 — Paper submissions due
  • 21 May 2018 — Speakers notified
  • 8 June 2018 — Late-breaking News submissions due
  • 15 June 2018 — Late-breaking News speakers notified
  • 6 July 2018 — Final papers due from presenters of peer reviewed papers
  • 6 July 2018 — Short paper or slide summary due from presenters of late-breaking news
  • 30 July 2018 — Pre-conference Symposium
  • 31 July –3 August 2018 — Balisage: The Markup Conference
How:
Submit full papers in XML to info@balisage.net.
See the Instructions for Authors and Tag Set and Submission Guidelines pages for details.
Apply to the Peer Review panel.

I’ve heard that the inability to submit valid markup counts against a paper in the judging. That may just be rumor or it may be true. I suggest validating your submission.

You should be on the fourth or fifth draft of your paper by now, but be aware the paper submission deadline is April 22, 2018, or 77 days from today!

Looking forward to seeing exceptionally strong papers in the review process and being presented at Balisage!

New Draft Morphological Tags for MorphGNT

Filed under: Bible,Greek,Language — Patrick Durusau @ 8:22 pm

New Draft Morphological Tags for MorphGNT by James Tauber.

From the post:

At least going back to my initial collaboration with Ulrik Sandborg-Petersen in 2005, I’ve been thinking about how I would do morphological tags in MorphGNT if I were starting from scratch.

Much later, in 2014, I had some discussions with Mike Aubrey at my first SBL conference and put together a straw proposal. There was a rethinking of some parts-of-speech, handling of tense/aspect, handling of voice, handling of syncretism and underspecification.

Even though some of the ideas were more drastic than others, a few things have remained consistent in my thinking:

  • there is value in a purely morphological analysis that doesn’t disambiguate on syntactic or semantic grounds
  • this analysis does not need the notion of parts-of-speech beyond purely Morphological Parts of Speech
  • this analysis should not attempt to distinguish middles and passives in the present or perfect system

As part of the handling of syncretism and underspecification, I had originally suggested a need for a value for the case property that didn’t distinguish nominative and accusative and a need for a value for the gender property like “non-neuter”.

If you are interested in language encoding, Biblical Greek, or morphology, Tauber has a project for you!

Be forewarned that what you tag has a great deal to do with what you can and/or will see. You have been warned.
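Underspecification is easy to make concrete in code. Suppose a toy positional tag scheme where each slot is a property and “-” means “unspecified” (an invented illustration, not Tauber’s proposal or the actual MorphGNT format):

```python
FIELDS = ["pos", "case", "number", "gender"]

def parse_tag(tag):
    """Parse a fixed-width morphological tag; '-' slots stay unspecified."""
    assert len(tag) == len(FIELDS), "tag must have one character per field"
    return {field: ch for field, ch in zip(FIELDS, tag) if ch != "-"}

ambiguous = parse_tag("N-SF")  # noun, singular, feminine; case unspecified
full = parse_tag("NASM")       # every property specified
```

A form syncretic between nominative and accusative simply never gets a case key, rather than being forced into a disambiguation the morphology doesn’t support.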

Enjoy!

Unfairness By Algorithm

Filed under: Bias,Computer Science — Patrick Durusau @ 5:40 pm

Unfairness By Algorithm: Distilling the Harms of Automated Decision-Making by Lauren Smith.

From the post:

Analysis of personal data can be used to improve services, advance research, and combat discrimination. However, such analysis can also create valid concerns about differential treatment of individuals or harmful impacts on vulnerable communities. These concerns can be amplified when automated decision-making uses sensitive data (such as race, gender, or familial status), impacts protected classes, or affects individuals’ eligibility for housing, employment, or other core services. When seeking to identify harms, it is important to appreciate the context of interactions between individuals, companies, and governments—including the benefits provided by automated decision-making frameworks, and the fallibility of human decision-making.

Recent discussions have highlighted legal and ethical issues raised by the use of sensitive data for hiring, policing, benefits determinations, marketing, and other purposes. These conversations can become mired in definitional challenges that make progress towards solutions difficult. There are few easy ways to navigate these issues, but if stakeholders hold frank discussions, we can do more to promote fairness, encourage responsible data use, and combat discrimination.

To facilitate these discussions, the Future of Privacy Forum (FPF) attempted to identify, articulate, and categorize the types of harm that may result from automated decision-making. To inform this effort, FPF reviewed leading books, articles, and advocacy pieces on the topic of algorithmic discrimination. We distilled both the harms and potential mitigation strategies identified in the literature into two charts. We hope you will suggest revisions, identify challenges, and help improve the document by contacting lsmith@fpf.org. In addition to presenting this document for consideration for the FTC Informational Injury workshop, we anticipate it will be useful in assessing fairness, transparency and accountability for artificial intelligence, as well as methodologies to assess impacts on rights and freedoms under the EU General Data Protection Regulation.

The primary attractions are two tables: Potential Harms from Automated Decision-Making and Potential Mitigation Sets.

Take the tables as a starting point for analysis.

Some “unfair” practices, such as increased auto insurance prices for night-shift workers that result in differential access to insurance, are actuarial questions. Insurers are not public charities and can legally discriminate based on perceived risk.
