Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 25, 2015

Regular Expression Crossword Puzzle

Filed under: Education,Humor,Puzzles,Regex,Regexes — Patrick Durusau @ 3:23 pm

Regular Expression Crossword Puzzle by Greg Grothaus.

From the post:

If you know regular expressions, you might find this to be geek fun. A friend of mine posted this, without a solution, but once I started working it, it seemed put together well enough it was likely solvable. Eventually I did solve it, but not before coding up a web interface for verifying my solution and rotating the puzzle in the browser, which I recommend using if you are going to try this out. Or just print it out.

It’s actually quite an impressive puzzle in its own right. It must have taken a lot of work to create.

[Image: regexpuzzle]

The image is a link to the interactive version with the rules.

Other regex crossword puzzle resources:

RegHex – An alternative web interface to help solve the MIT hexagonal regular expression puzzle.

Regex Crossword – Starting with a tutorial, the site offers 9 levels/types of games, concluding with five (5) hexagonal ones (only a few blocks on the first one and increasingly complex).

Regex Crosswords by Nikola Terziev – Generates regex crosswords, only squares at the moment.

In case you need help with some of the regex puzzles, you can try: Awesome Regex – A collection of regex resources.

If you are really adventuresome, try Constraint Reasoning Over Strings (2003) by Keith Golden and Wanlin Pang.

Abstract:

This paper discusses an approach to representing and reasoning about constraints over strings. We discuss how many string domains can often be concisely represented using regular languages, and how constraints over strings, and domain operations on sets of strings, can be carried out using this representation.

Each regex clue is a constraint on all the cells it crosses. It’s tempting to say the first clue is unconstrained and every later clue is constrained, but that’s not quite right: constraints only bite where cells governed by different regexes intersect.
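
To make the constraint idea concrete, here is a minimal sketch in Python. The puzzle itself is hexagonal; this toy uses a 2×2 square grid with clues borrowed from a well-known beginner regex crossword, and simply checks a candidate fill against every regex that crosses each cell:

```python
import re

# Toy 2x2 grid: rows read left-to-right, columns top-to-bottom.
# Every cell is constrained by the two regexes that cross it.
row_clues = [r"HE|LL|O+", r"[PLEASE]+"]
col_clues = [r"[^SPEAK]+", r"EP|IP|EF"]

def satisfies(grid, row_clues, col_clues):
    """True if every row and every column fully matches its clue."""
    rows = ["".join(r) for r in grid]
    cols = ["".join(c) for c in zip(*grid)]
    return (all(re.fullmatch(p, s) for p, s in zip(row_clues, rows)) and
            all(re.fullmatch(p, s) for p, s in zip(col_clues, cols)))

print(satisfies([["H", "E"], ["L", "P"]], row_clues, col_clues))  # True
print(satisfies([["H", "E"], ["L", "L"]], row_clues, col_clues))  # False: column 2 "EL" fails every alternative
```

A solver is just a search over fills that keeps every such check satisfied, which is exactly the constraint-over-strings framing of the Golden and Pang paper.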

Anyone interested in going beyond hexagons and/or 2 dimensions?

I first saw this in a tweet by Alexis Lloyd.

Santa Claus is Real

Filed under: Myth,Ontology,Philosophy — Patrick Durusau @ 2:13 pm

Santa Claus is Real by Johnathan Korman.

I won’t try to summarize Korman’s post but will quote a snippet to entice you to read it in full:


Santa Claus is as real as I am.

Santa is, in truth, more real than I am. He has a bigger effect on the world.

After all, how many people know Santa Claus? If I walk down Market Street in San Francisco, there’s a good chance that a few people will recognize me; I happen to be a distinctive-looking guy. There’s a chance that one or two of those people will even know my name and a few things about me, but the odds are greatly against it. But if Santa takes the same walk, everybody (or nearly everybody) will recognize him, know his name, know a number of things about him, even have personal stories about him. So who is more real?

Enjoy!

5 free tools for newsgathering on Instagram

Filed under: Journalism,News,Reporting — Patrick Durusau @ 1:51 pm

5 free tools for newsgathering on Instagram by Alastair Reid.

From the post:

Instagram yesterday announced that it now has more than 400 million users around the world — that’s a 100 million growth in the last nine months and almost 90 million more than Twitter.

Over half of these new additions live in Europe and Asia, according to a release from the Facebook-owned social network for photos and videos, with the most new users coming from Brazil, Japan and Indonesia.

So what does this mean for journalists? In short, there are even more sources for stories. The first images to appear online from the attack on a Tunisian beach resort in June were posted by an Instagram user, and were a valuable contribution to understanding the facts as news broke.

Instagram only launched a (somewhat limited) search function for its website in July and, despite announcing new search tool Signal last week, Facebook has yet to widely release the platform.
So in the meantime, here are some useful tools for finding newsworthy material on Instagram.

And remember, any newsworthy material found online should always be verified and used responsibly, subjects we will continue to cover here at First Draft.

I haven’t thought of Instagram as a source of technical information, but if you were building a topic map of current events, it could well be a great source of visual information.

Try out the tools that Alastair mentions to see which ones suit your information gathering needs and style.

Coal in your stocking: Cybersecurity Act of 2015

Filed under: Cybersecurity,Security — Patrick Durusau @ 1:37 pm

How does the Cybersecurity Act of 2015 change the Internet surveillance laws? by Orin Kerr.

From the post:

The Omnibus Appropriations Act that President Obama signed into law last week has a provision called the Cybersecurity Act of 2015. The Cyber Act, as I’ll call it, includes sections about Internet monitoring that modify the Internet surveillance laws. This post details those changes, focusing on how the act broadens powers of network operators to conduct surveillance for cybersecurity purposes. The upshot: The Cyber Act expands those powers in significant ways, although how far isn’t entirely clear.

Orin covers the present state of provider monitoring which sets a good background for the changes introduced by the Cybersecurity Act of 2015. He breaks the new authorizations into: monitoring, defensive measures and the ability to share data. If you are a policy wonk, definitely worth a read with an eye towards uncertainty and ambiguity in the new authorizations.

It isn’t clear how relevant Orin’s post is for law enforcement and intelligence agencies, since they have amply demonstrated their willingness to disobey the law and the lack of consequences for the same.

Service providers should be on notice from Orin’s post about the ambiguous parts of the act. On the other hand, Congress will grant retroactive immunity for law breaking at the instigation of law enforcement, so that ambiguity may or may not impact corporate policy.

Users: The Cybersecurity Act of 2015 is confirmation that the only person who can be trusted with your security is you. (full stop)

Facets for Christmas!

Filed under: XML,XPath,XQuery — Patrick Durusau @ 11:47 am

Facet Module

From the introduction:

Faceted search has proven to be enormously popular in real-world applications. Faceted search allows users to navigate and access information via a structured facet classification system. Combined with full text search, it provides users with enormous power and flexibility to discover information.

This proposal defines a standardized approach to supporting faceted search in XQuery. It has been designed to be compatible with XQuery 3.0, and is intended to be used in conjunction with XQuery and XPath Full Text 3.0.
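
The spec itself is XQuery, but the core idea of a facet, counting how matching items distribute along a classification dimension and then drilling down, is easy to sketch. A minimal, hypothetical illustration in Python (not the EXPath API):

```python
from collections import Counter

# Hypothetical documents with a couple of facetable fields.
docs = [
    {"title": "Intro to XQuery",   "format": "book",    "year": 2014},
    {"title": "XPath in Practice", "format": "article", "year": 2015},
    {"title": "Faceted Search",    "format": "book",    "year": 2015},
]

def facet_counts(docs, field):
    """Count how many documents fall under each value of a facet field."""
    return Counter(d[field] for d in docs)

def drill_down(docs, field, value):
    """Narrow the result set to documents matching one facet value."""
    return [d for d in docs if d[field] == value]

print(facet_counts(docs, "format"))                            # Counter({'book': 2, 'article': 1})
print(facet_counts(drill_down(docs, "year", 2015), "format"))  # Counter({'article': 1, 'book': 1})
```

The EXPath Facet module does the same kind of counting and narrowing, but over XML documents and inside XQuery itself.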

Imagine my surprise, after opening Christmas presents with family, to see a tweet by XQuery announcing yet another Christmas present:

“Facets”: A new EXPath spec w/extension functions & data models to enable faceted navigation & search in XQuery http://expath.org/spec/facet

The EXPath homepage says:

XPath is great. XPath-based languages like XQuery, XSLT, and XProc, are great. The XPath recommendation provides a foundation for writing expressions that evaluate the same way in a lot of processors, written in different languages, running in different environments, in XML databases, in in-memory processors, in servers or in clients.

Supporting so many different kinds of processor is a wonderful thing. But this also constrains which features are feasible at the XPath level and which are not. In the years since the release of XPath 2.0, experience has gradually revealed some missing features.

EXPath exists to provide specifications for such missing features in a collaborative- and implementation-independent way. EXPath also provides facilities to help and deliver implementations to as many processors as possible, via extensibility mechanisms from the XPath 2.0 Recommendation itself.

Other projects exist to define extensions for XPath-based languages or languages using XPath, such as the famous EXSLT, and the more recent EXQuery and EXProc projects. We think that those projects are really useful and fill a gap in the XML core technologies landscape. Nevertheless, working at the XPath level allows common solutions when there is no sense in reinventing the wheel over and over again. This is just following the brilliant idea of the W3C’s XSLT and XQuery working groups, which joined forces to define XPath 2.0 together. EXPath’s purpose is not to compete with other projects, but to collaborate with them.

Be sure to visit the resources page. It has a manageable listing of processors that handle extensions.

What would you like to see added to XPath?

Enjoy!

December 24, 2015

Everything You Know About Latency Is Wrong

Filed under: Computer Science,Design,Statistics — Patrick Durusau @ 9:15 pm

Everything You Know About Latency Is Wrong by Tyler Treat.

From the post:

Okay, maybe not everything you know about latency is wrong. But now that I have your attention, we can talk about why the tools and methodologies you use to measure and reason about latency are likely horribly flawed. In fact, they’re not just flawed, they’re probably lying to your face.

When I went to Strange Loop in September, I attended a workshop called “Understanding Latency and Application Responsiveness” by Gil Tene. Gil is the CTO of Azul Systems, which is most renowned for its C4 pauseless garbage collector and associated Zing Java runtime. While the workshop was four and a half hours long, Gil also gave a 40-minute talk called “How NOT to Measure Latency” which was basically an abbreviated, less interactive version of the workshop. If you ever get the opportunity to see Gil speak or attend his workshop, I recommend you do. At the very least, do yourself a favor and watch one of his recorded talks or find his slide decks online.

The remainder of this post is primarily a summarization of that talk. You may not get anything out of it that you wouldn’t get out of the talk, but I think it can be helpful to absorb some of these ideas in written form. Plus, for my own benefit, writing about them helps solidify it in my head.

Great post, not only for the discussion of latency but for two extensions to the admonition from The Moon is a Harsh Mistress, “Always cut cards”:

  • Always understand the nature of your data.
  • Always understand the nature of your methodology.

If you fail at either of those, the results presented to you, or that you present to others, may be true, false or irrelevant; you won’t know which.

Treat’s post is just one example in a vast sea of data and methodologies which are just as misleading if not more so.
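
One of Gil Tene’s core points is that averages hide the tail your users actually experience. A toy sketch, with made-up numbers, of how a mean can look healthy while the high percentiles do not:

```python
import random

random.seed(42)
# Made-up latencies: 98% of requests are fast, 2% stall behind a GC pause or a retry.
latencies_ms = ([random.uniform(1, 5) for _ in range(9800)] +
                [random.uniform(200, 800) for _ in range(200)])

def percentile(samples, p):
    """Nearest-rank percentile: the value below which roughly p% of samples fall."""
    ordered = sorted(samples)
    k = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[k]

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean  = {mean:.1f} ms")                          # low double digits: looks fine
print(f"p50   = {percentile(latencies_ms, 50):.1f} ms")  # a few ms
print(f"p99   = {percentile(latencies_ms, 99):.1f} ms")  # hundreds of ms
print(f"p99.9 = {percentile(latencies_ms, 99.9):.1f} ms")
```

Report the mean alone and the 2% of users who waited half a second simply vanish; that is before you even get to coordinated omission, which Tene covers in the talk.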

If you need motivation to put in the work, how’s your comfort level with being embarrassed in public? Like someone demonstrating your numbers are BS.

76 Viral Images From 2015 That Were Totally Fake

Filed under: Humor — Patrick Durusau @ 8:37 pm

76 Viral Images From 2015 That Were Totally Fake by Matt Novak.

From the post:

We debunked dozens of fake photos this year, covering everything from Charles Manson’s baby photos to John Lennon’s skateboarding skills, and everything in between. It was another busy year for anyone spreading fake images on the internet.

Below, we have 76 photos that you may have seen floating around the internet in 2015. Some are deliberate photoshops created by people who want to deceive. Others are just images that got mixed up in this big, weird game of Telephone we call the internet.

If you get to the bottom of the list and you’re hungry for even more fakes, check out last year’s round-up.

Very amusing when viewed on a large screen TV! And, you may be helping family and friends avoid being taken in by the same or similar images in 2016.

Did you fall for any of these images in 2015?

Neural Networks, Recognizing Friendlies, $Billions; Friendlies as Enemies, $Priceless

Filed under: Ethics,Image Recognition,Machine Learning,Neural Networks — Patrick Durusau @ 5:21 pm

Elon Musk merits many kudos for the recent SpaceX success.

At the same time, Elon has been nominated for Luddite of the Year, along with Bill Gates and Stephen Hawking, for fanning fears of artificial intelligence.

One favorite target for such fears is autonomous weapons systems. Hannah Junkerman annotated a list of 18 posts, articles and books on such systems for Just Security.

While moralists are wringing their hands, military forces have not let grass grow under their feet with regard to autonomous weapon systems. As Michael Carl Haas reports in Autonomous Weapon Systems: The Military’s Smartest Toys?:

Military forces that rely on armed robots to select and destroy certain types of targets without human intervention are no longer the stuff of science fiction. In fact, swarming anti-ship missiles that acquire and attack targets based on pre-launch input, but without any direct human involvement—such as the Soviet Union’s P-700 Granit—have been in service for decades. Offensive weapons that have been described as acting autonomously—such as the UK’s Brimstone anti-tank missile and Norway’s Joint Strike Missile—are also being fielded by the armed forces of Western nations. And while governments deny that they are working on armed platforms that will apply force without direct human oversight, sophisticated strike systems that incorporate significant features of autonomy are, in fact, being developed in several countries.

In the United States, the X-47B unmanned combat air system (UCAS) has been a definite step in this direction, even though the Navy is dodging the issue of autonomous deep strike for the time being. The UK’s Taranis is now said to be “merely” semi-autonomous, while the nEUROn developed by France, Greece, Italy, Spain, Sweden and Switzerland is explicitly designed to demonstrate an autonomous air-to-ground capability, as appears to be case with Russia’s MiG Skat. While little is known about China’s Sharp Sword, it is unlikely to be far behind its competitors in conceptual terms.

The reasoning of military planners in favor of autonomous weapons systems isn’t hard to find, especially when one article describes air-to-air combat between tactically autonomous, machine-piloted aircraft and piloted aircraft this way:


This article claims that a tactically autonomous, machine-piloted aircraft whose design capitalizes on John Boyd’s observe, orient, decide, act (OODA) loop and energy-maneuverability constructs will bring new and unmatched lethality to air-to-air combat. It submits that the machine’s combined advantages applied to the nature of the tasks would make the idea of human-inhabited platforms that challenge it resemble the mismatch depicted in The Charge of the Light Brigade.

Here’s the author’s mock-up of a sixth-generation approach:

[Image: fighter-six-generation]

(Select the image to see an undistorted view of both aircraft.)

Given the strides being made in the use of neural networks, I would be surprised if they are not at the core of present and future autonomous weapons systems.

You can join the debate about the ethics of autonomous weapons but the more practical approach is to read How to trick a neural network into thinking a panda is a vulture by Julia Evans.
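
Julia’s post works through the gradient trick in detail; the core move, nudging the input in whatever direction raises the model’s error, fits in a few lines. A toy sketch of the same fast-gradient-sign idea, using numpy and a made-up logistic “classifier” rather than an actual image network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: logistic regression over 100 features.
w = rng.normal(size=100)   # made-up weights
b = 0.0
x = rng.normal(size=100)   # the "panda": an input the model currently classifies one way

def prob_class1(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# For this model the gradient of the score w.r.t. x is just w, so step every
# feature a small amount against the current decision (fast gradient sign method).
eps = 0.5
direction = -np.sign(w) if prob_class1(x) > 0.5 else np.sign(w)
x_adv = x + eps * direction

print("original score :", round(float(prob_class1(x)), 3))
print("perturbed score:", round(float(prob_class1(x_adv)), 3))          # lands on the other side of 0.5
print("largest per-feature change:", float(np.max(np.abs(x_adv - x))))  # bounded by eps
```

Real attacks do the same thing against a deep network’s gradients, and the perturbation can be small enough that a human never notices it.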

Autonomous weapon systems will be developed by a limited handful of major military powers, at least at first, which means counter-measures, such as turning such weapons against their masters, will command a premium price, far more than work on the offensive side. Not to mention that the market for counter-measures will also be far larger.

Deception, one means of turning weapons against their users, has a long history, not the earliest of which is the tale of Esau and Jacob (Genesis, chapter 27):

11 And Jacob said to Rebekah his mother, Behold, Esau my brother is a hairy man, and I am a smooth man:

12 My father peradventure will feel me, and I shall seem to him as a deceiver; and I shall bring a curse upon me, and not a blessing.

13 And his mother said unto him, Upon me be thy curse, my son: only obey my voice, and go fetch me them.

14 And he went, and fetched, and brought them to his mother: and his mother made savoury meat, such as his father loved.

15 And Rebekah took goodly raiment of her eldest son Esau, which were with her in the house, and put them upon Jacob her younger son:

16 And she put the skins of the kids of the goats upon his hands, and upon the smooth of his neck:

17 And she gave the savoury meat and the bread, which she had prepared, into the hand of her son Jacob.

Julia’s post doesn’t cover the hard case of seeing Jacob as Esau up close, but in a battlefield environment, the equivalent of mistaking a panda for a vulture may be good enough.

The primary distinction that any autonomous weapons system must make is the friendly/enemy distinction. The term “friendly fire” was coined to cover cases where human directed weapons systems fail to make that distinction correctly.

The historical rate of “friendly fire” or fratricide is 2%, but Mark Thompson reports in The Curse of Friendly Fire that the actual fratricide rate in the 1991 Gulf war was 24%.

#Juniper, just to name one recent federal government software failure, is evidence that robustness isn’t an enforced requirement for government software.

Apply that lack of requirements to neural networks in autonomous weapons platforms and you have the potential for both developing and defeating autonomous weapons systems.

Julia’s post leaves you a long way from defeating an autonomous weapons platform but it is a good starting place.

PS: Defeating military grade neural networks will be good training for defeating more sophisticated ones used by commercial entities.

December 23, 2015

24 Pull Requests: The journalism edition

Filed under: Journalism,News,Reporting — Patrick Durusau @ 11:43 am

24 Pull Requests: The journalism edition by Melody Kramer.

From the post:

Over 10,000 people around the world are currently taking part in an annual event called 24 Pull Requests.

24 Pull Requests, which is held each December, asks developers, designers, content creators, and others to give thanks for open source software by “giving back little gifts of code for Christmas.” The idea is simple: over the first 24 days of December, participants are asked to improve code quality and documentation, fix issues and bugs or add missing features to existing projects.

I looked at the list of suggested projects for people to work on and didn’t see any journalism projects listed (though I did see that journalists from the Financial Times have signed up to contribute to other projects.)

It would be worthwhile for journalism organizations to specifically point out ways that people can contribute meaningfully through open source submissions. It’s a way to deepen connections with existing audiences, build connections with new audiences and ask for help more broadly than content submissions. (I recommend that you tag GitHub issues that are appropriate for outside contributors with a “help wanted” tag or a “good for new contributors” tag so that they can be easily surfaced.)

To help out participants in this year’s 24 Pull Requests, I decided to make a list of some of the most interesting open source journalism-related projects from 2015 that they can learn from, improve, use and/or contribute to. I’ll admit: I’ve made the list, but haven’t checked it twice. If there are other projects that fit the bill, please add them in the comments. (And if one of your resolutions is to learn more about this kind of stuff, you can start with my guide to learning more about GitHub and open source projects.)

Melody has a great list of projects and places where you can find journalism projects.

Useful for finding places to contribute as well as finding communities building new tools.

I’ll try to give you the full 24 days next year to contribute! (Nothing prevents you from contributing at other times of the year as well.)

Enjoy!

PS: Reaching out to unknown others involves risk. But the rewards, community, contribution to a common cause, broadening your view of the world, are orders of magnitude greater than the risk. Take a chance, reach out to a new project this year.

Calendar of Inquisitions Post Mortem, Volume 15, Richard II

Filed under: History,Library — Patrick Durusau @ 10:59 am

Calendar of Inquisitions Post Mortem, Volume 15, Richard II by M. C. B. Dawes, A. C. Wood and D. H. Gifford. (Covers years 1 to 7 of the reign of Richard II.)

From the homepage for the series:

An inquisition post mortem is a local enquiry into the lands held by a deceased individual, in order to discover any income and rights due to the crown. Such inquisitions were only held when people were thought or known to have held lands of the crown. The records in this series relate to the City of London for the periods 1485-1561 and 1577-1603.

I admit that some of my posts have broader audiences than others but only British History Online could send this tweet:

BHO at the IHR (@bho_history):
One final new publication to keep you busy over the holiday: Calendar of Inquisitions Post Mortem vol 15. Enjoy! http://www.british-history.ac.uk/inquis-post-mortem/vol15

Be sure to explore the British History Online (BHO). With a goal of creating access to printed primary and secondary sources from 1300 to 1800, the BHO site promises to be a rich source of historical data.

An XQuery Module For Simplifying Semantic Namespaces

Filed under: MarkLogic,Namespace,Semantics,XQuery — Patrick Durusau @ 10:26 am

An XQuery Module For Simplifying Semantic Namespaces by Kurt Cagle.

From the post:

While I enjoy working with the MarkLogic 8 server, there are a number of features about the semantics library there that I still find a bit problematic. Declaring namespaces for semantics in particular is a pain—I normally have trouble remembering the namespaces for RDF or RDFS or OWL, even after working with them for several years, and once you start talking about namespaces that are specific to your own application domain, managing this list can get onerous pretty quickly.

I should point out however, that namespaces within semantics can be very useful in helping to organize and design an ontology, even a non-semantic ontology, and as such, my applications tend to be namespace rich. However, when working with Turtle, Sparql, RDFa, and other formats of namespaces, the need to incorporate these namespaces can be a real showstopper for any developer. Thus, like any good developer, I decided to automate my pain points and create a library that would allow me to simplify this process.

The code given here is in turtle and xquery, but I hope to build out similar libraries for use in JavaScript shortly. When I do, I’ll update this article to reflect those changes.
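
Kurt’s module is XQuery against MarkLogic’s semantics library, but the underlying convenience, keeping one prefix-to-IRI registry and expanding prefixed names from it, is easy to picture. A hypothetical Python sketch of that idea (not Kurt’s API):

```python
# One registry of the namespaces an application cares about.
PREFIXES = {
    "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl":  "http://www.w3.org/2002/07/owl#",
    "ex":   "http://example.com/ontology#",   # hypothetical application namespace
}

def expand(curie: str) -> str:
    """Expand a prefixed name like 'rdf:type' into a full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown namespace prefix: {prefix}")
    return PREFIXES[prefix] + local

def sparql_prologue() -> str:
    """Emit PREFIX declarations so queries never repeat the long IRIs by hand."""
    return "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in PREFIXES.items())

print(expand("rdf:type"))     # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
print(sparql_prologue())
```

Declare the map once, generate prologues and expansions from it, and the “which namespace was that again?” pain point largely disappears.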

If you are forced to use a MarkLogic 8 server, great post on managing semantic namespaces.

If you have a choice of tools, something to consider before you willingly choose to use a MarkLogic 8 server.

I first saw this in a tweet by XQuery.

10 Best Data Visualization Projects of 2015 [p-hacking]

Filed under: Graphics,Visualization — Patrick Durusau @ 10:11 am

10 Best Data Visualization Projects of 2015 by Nathan Yau.

From the post:

Fine visualization work was alive and well in 2015, and I’m sure we’re in for good stuff next year too. Projects sprouted up across many topics and applications, but if I had to choose one theme for the year, it’d have to be teaching, whether it be through explaining, simulations, or depth. At times it felt like visualization creators dared readers to understand data and statistics beyond what they were used to. I liked it.

These are my picks for the best of 2015. As usual, they could easily appear in a different order on a different day, and there are projects not on the list that were also excellent (that you can easily find in the archive).

Here we go.

A great selection, but I would call your attention to Nathan’s Lessons in statistical significance, uncertainty, and their role in science.

It is a review of work on p-hacking, that is, the manipulation of variables to get a low enough p-value to merit publication in a journal.

A fine counter to the notion that “truth” lies in data.

Nothing of the sort is the case. Data reports results based on the analysis applied to it. Nothing more or less.

What questions we ask of data, what data we choose as containing answers to those questions, what analysis we apply, how we interpret the results of our analysis, are all wide avenues for the introduction of unmeasured bias.
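
A quick way to see p-hacking at work is to simulate it: test enough unrelated variables against pure noise and something will clear p < 0.05. A toy sketch using only the Python standard library and a normal-approximation two-sample test:

```python
import random, math, statistics

random.seed(1)

def p_value(a, b):
    """Two-sided p-value for a two-sample z-test (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Twenty "outcome variables", all pure noise: there is nothing real to discover.
trials = [([random.gauss(0, 1) for _ in range(30)],
           [random.gauss(0, 1) for _ in range(30)]) for _ in range(20)]

results = [(i, p_value(a, b)) for i, (a, b) in enumerate(trials)]
hits = [(i, round(p, 3)) for i, p in results if p < 0.05]
print(hits)   # with 20 tries at alpha = 0.05, roughly one spurious "finding" is expected
```

Report only the hits, drop the other nineteen tries from the paper, and you have a publishable p-value built from nothing.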

December 22, 2015

Not Our Backdoor? Gasp!

Filed under: Cybersecurity,Government,Security — Patrick Durusau @ 3:47 pm

US Gov’t Agencies Freak Out Over Juniper Backdoor; Perhaps They’ll Now Realize Why Backdoors Are A Mistake by Mike Masnick.

From the post:

Last week, we wrote about how Juniper Networks had uncovered some unauthorized code in its firewall operating system, allowing knowledgeable attackers to get in and decrypt VPN traffic. While the leading suspect still remains the NSA, it’s been interesting to watch various US government agencies totally freak out over their own networks now being exposed:


The FBI is investigating the breach, which involved hackers installing a back door on computer equipment, U.S. officials told CNN. Juniper disclosed the issue Thursday along with an emergency security patch that it urged customers to use to update their systems “with the highest priority.”

The concern, U.S. officials said, is that sophisticated hackers who compromised the equipment could use their access to get into any company or government agency that used it.

One U.S. official described it as akin to “stealing a master key to get into any government building.”

And, yes, this equipment is used all throughout the US government:


Juniper sells computer network equipment and routers to big companies and to U.S. government clients such as the Defense Department, Justice Department, FBI and Treasury Department. On its website, the company boasts of providing networks that “US intelligence agencies require.”

Its routers and network equipment are widely used by corporations, including for secure communications. Homeland Security officials are now trying to determine how many such systems are in use for U.S. government networks.

As regular readers know, disclosure disrupts zero-day markets, but this is a case where I would favor short-term non-disclosure.

Non-disclosure to allow an informal network of hackers to drain as much information from government sources as their encrypted AWS storage could hold. Not bothering to check the data, just sucking down whatever is within reach. Any government, any network.

That happy state of affairs didn’t happen, so you will have to fall back on poor patch maintenance, and after all, it is the holidays. The least senior staffers will be in charge, if even they are; their holiday rights come before patch maintenance.

Just guessing, I would say you have until March before most of the holes close up, possibly longer. BTW, that’s March of 2017, given historical patch behavior.

What stories are you going to find because of this backdoor? Make them costly to the government in question. Might disabuse them of favoring backdoors.

Investigative Reporting in 2015:…

Filed under: Journalism,News,Reporting — Patrick Durusau @ 3:23 pm

Investigative Reporting in 2015: GIJN’s Top 12 Stories.

From the webpage:

As 2015 nears an end, we’d like to share our top 12 stories of the year — the stories that you, our dear readers, found most compelling. The list ranges from free data tools and crowdfunding to the secrets of the Wayback Machine. Please join us in taking a look at The Best of GIJN.org this year:

If you are not a regular follower of the Global Investigative Journalism Network (GIJN), perhaps these top 12 stories will change your reading habits in 2016.

As their name implies, the emphasis is on tools and techniques, not all of them digital, that are useful in uncovering, collecting, preparing and delivering information some would prefer to keep secret.

In a phrase, investigative reporting.

Investigative reporting seems like a natural for the use of topic maps because concealed information is rarely accompanied by other data that gives it context and meaning.

Any number of major information leaks occurred in 2015, but how many of those were integrated with existing archives of information?

Or mapped in such a way that future researchers could put those leaks together with future releases of information?

The leaks themselves in 2015 have been titillating but hardly body blows to the intelligence community and its members.

As much as I admire investigative reporting, an “ooh, look at that…” reaction of the public is insufficient.

I want to see consequences, programs verified to be terminated, records destroyed, defunding, successful criminal prosecutions, contracts/political careers ended, blood on the street.

Anything less is another channel of the infotainment that passes for news in a media rich society.

Are you ready to take up the challenge of investigative reporting?

Investigative reporting that has consequences?

Consider adding topic maps to your arsenal of information weaponry for 2016.

Billion PC Hacking Target (Java SE Class Action Lawsuit?)

Filed under: Cybersecurity — Patrick Durusau @ 11:04 am

The Federal Trade Commission (FTC) has tagged Oracle for misleading consumers about the security of Java SE.

Brian Fung writes in Nearly a billion PCs run this notoriously insecure software. Now Oracle has to clean it up:

Oracle, one of the nation’s largest tech companies, is settling federal charges that it misled consumers about the security of its software, which is installed on roughly 850 million computers around the world.

The company won’t be paying a fine, and it isn’t admitting to any wrongdoing or fault in its settlement with the Federal Trade Commission. But Oracle will be required to tell consumers explicitly if they have outdated, insecure copies of the software — and to help them remove it.

The software, known as Java SE, helps power many of the features consumers expect to see when they browse the Web, from browser-based games to online chatrooms. But security experts say Java is notoriously vulnerable to attack. It has been linked to a staggering array of security flaws that can enable hackers to steal personal information from users, including the login information for people’s financial accounts, the FTC said.

When Oracle bought Java in 2010, it knew that Java was insecure, the FTC alleged in its initial complaint. Internal corporate records seized by the FTC noted that the “Java update mechanism is not aggressive enough or simply not working.”

Although the company issued updates to fix the vulnerabilities as they were discovered, the updates didn’t uninstall the older, problematic versions of Java, leaving them on the customer’s computer. Oracle never informed users of the fact, the FTC alleged, enabling hackers to take advantage of those unpatched flaws.

Even though the FTC settlement does not carry any admission of wrongdoing or fault, there’s all that discovery already done by the FTC; it would be a shame to see it go to waste.

Do you see a common law negligence claim against Oracle for knowing Java SE was insecure and taking no steps to cure the insecurity or even warn consumers of the security defect?

Using Federal Rule 23 as an example (most states follow Rule 23):

(a) Prerequisites. One or more members of a class may sue or be sued as representative parties on behalf of all members only if:

(1) the class is so numerous that joinder of all members is impracticable;

(2) there are questions of law or fact common to the class;

(3) the claims or defenses of the representative parties are typical of the claims or defenses of the class; and

(4) the representative parties will fairly and adequately protect the interests of the class.

Looks like we have:

  1. Too numerous to join: I’d say almost 1 billion fits that requirement
  2. Common questions of law and fact, common law liability and common facts
  3. Typical claims and defenses (Oracle’s defense will be: “We have friends in government.”)
  4. Parties will fairly represent the class (pick your class reps carefully)

There are other class action suit requirements but on the surface of it, Oracle could be on its way to a very bad Christmas by 2016.

PS: One additional factor in favor of using Oracle as a software liability target is its personification in Larry Ellison. Ellison is no Scrooge but he isn’t a very sympathetic character. Having an arrogant defendant always helps in liability cases.

December 21, 2015

Maybe Corporations Aren’t Sovereign States After All

Filed under: EU,Government,Law — Patrick Durusau @ 8:11 pm

Revealed: how Google enlisted members of US Congress it bankrolled to fight $6bn EU antitrust case by Harry Davies.

From the post:

Google enlisted members of the US congress, whose election campaigns it had funded, to pressure the European Union to drop a €6bn antitrust case which threatens to decimate the US tech firm’s business in Europe.

The coordinated effort by senators and members of the House of Representatives, as well as by a congressional committee, formed part of a sophisticated, multimillion-pound lobbying drive in Brussels, which Google has significantly ramped up as it fends off challenges to its dominance in Europe.

An investigation by the Guardian into Google’s multifaceted lobbying campaign in Europe has uncovered fresh details of its activities and methods. Based on documents obtained under a freedom of information request and a series of interviews with EU officials, MEPs and Brussels lobbyists, the investigation has also found:

If you appreciate a tale of how a major corporation attempts to bully a sovereign government by buying up the support of another sovereign government, then this post by Harry Davies will be a great joy.

For the most part I’m not sympathetic to the EU’s complaints because it is attempting to create safe harbors for EU search companies to replicate what Google already offers. Why anyone would want more page-rank search engines is unknown. Been there, done that.

The EU could fund innovative research into next-generation search technology and draw customers away from Google with better search results and the ad cash that goes with them.

Instead, the EU wants to hold Google back while inefficient and higher priced competitors bilk EU consumers. That hardly seems like a winning model for technological development.

Seat warmers in the EU will prattle on about privacy and other EU fictions in the actions against Google.

Anyone who thinks removing search results from Google, and only Google, increases privacy is on par with Americans who fear terrorism. It’s some as-yet-undiagnosed mental disorder.

How people that ignorant reliably travel back and forth to work every day is a tribute to modern transportation systems.

Google should start doing rolling one-week Google blackouts across the EU. Paying penalties under SAAs and/or with lost revenue would be a small price to pay for rationality on the part of the EU.

The best defense against a monopoly is a better product than the monopoly, not the same product at a higher price from smaller EU vendors.

PS: You might want to notice the EU is trying to favor EU search vendors, not EU citizens, whatever they may claim to the contrary. Another commonality between governments.

EBCDIC to ASCII Conversion (Holiday Puzzler)

Filed under: Humor — Patrick Durusau @ 7:39 pm

EBCDIC to ASCII Conversion

From the webpage:

Last month, Gene Amdahl, an IBM fellow who was the chief architect of the legendary IBM 360 system, died at age 92.

In memory of his work, this month’s challenge focuses on the IBM-360 character set (EBCDIC):

Find a formula to convert the 52 EBCDIC letters into ASCII using no more than 4 operators.

See IBM Knowledge Center for the ASCII and EBCDIC character sets.

Supply your answer as a mathematical expression. For example, one can switch from lower-case ASCII to uppercase ASCII (and vice versa) using a single operation: f(x)=x xor 32.


Update (09/12):
You can use any reasonable operation (even trigonometric functions).

Update (13/12):
Use at most 4 operations, not 4 operation types. For example, the function x-floor((x>>4)*7.65-29), which correctly converts the upper case letters, uses five operations (2 subtractions, shift, multiplication, floor).


We will post the names of those who submit a correct, original solution! If you don’t want your name posted then please include such a statement in your submission!

We invite visitors to our website to submit an elegant solution. Send your submission to ponder@il.ibm.com.

After you have exhausted arguing about religion, politics and sports teams, consider debating the best way to convert from EBCDIC to ASCII.
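
If you would rather check candidate formulas than argue about them, Python ships an EBCDIC codec (cp037) you can test against. A small sketch of a verification harness; the lambda is just the five-operation example quoted above, included to show the harness in use:

```python
import math

# EBCDIC code points for the 52 letters, via Python's built-in cp037 codec.
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
ebcdic = {c: c.encode("cp037")[0] for c in letters}   # e.g. 'A' -> 0xC1, 'a' -> 0x81

def check(candidate):
    """Report how many of the 52 letters a candidate EBCDIC-to-ASCII formula gets right."""
    wrong = [c for c in letters if candidate(ebcdic[c]) != ord(c)]
    print(f"{len(letters) - len(wrong)}/52 correct; wrong: {''.join(wrong) or 'none'}")

check(lambda x: x - math.floor((x >> 4) * 7.65 - 29))
```

Swap in your own lambda and the harness tells you immediately which letters your formula still gets wrong.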

That should get your blood pumping!

Enjoy!

Natural England opens-up seabed datasets

Filed under: Environment,Oceanography,Open Data — Patrick Durusau @ 7:10 pm

Natural England opens-up seabed datasets by Hannah Ross.

From the post:

Following the Secretary of State’s announcement in June 2015 that Defra would become an open, data driven organisation we have been working hard at Natural England to start unlocking our rich collection of data. We have opened up 71 data sets, our first contribution to the #OpenDefra challenge to release 8000 sets of data by June 2016.

What is the data?

The data is primarily marine data which we commissioned to help identify marine protected areas (MPAs) and monitor their condition.

We hope that the publication of these data sets will help many people get a better understanding of:

  • marine nature and its conservation and monitoring
  • the location of habitats sensitive to human activities such as oil spills
  • the environmental impact of a range of activities from fishing to the creation of large marinas

The data is available for download on the EMODnet Seabed Habitats website under the Open Government Licence and more information about the data can be found at DATA.GOV.UK.

This is just the start…

Throughout 2016 we will be opening up lots more of our data, from species records to data from aerial surveys.

We’d like to know what you think of our data; please take a look and let us know what you think at OpenData@naturalengland.org.uk.

Image: Sea anemone (sunset cup-coral), Copyright (CC by-nc-nd 2.0) Natural England/Roger Mitchell 1978.

Great new data source and looking forward to more.

A welcome layer on this data would be, where possible, identification of activities and people responsible for degradation of sea anemone habitats.

Sea anemones are quite beautiful but lack the ability to defend against human disruption of their environment.

Preventing disruption of sea anemone habitats is a step forward.

Discouraging those who practice disruption of sea anemone habitats is another.

Encapsulation and Clojure – Part I

Filed under: Clojure,Programming — Patrick Durusau @ 6:48 pm

Encapsulation and Clojure – Part I by James Reeves.

From the post:

Encapsulation is a mainstay of object orientated programming, but in Clojure it’s often avoided. Why does Clojure steer clear of a concept that many programming languages consider to be best practice?

Err, because “best practices” may be required to “fix” problems baked into a language?

That would be my best guess.

D3 Maps without the Dirty Work

Filed under: D3,GIS,Mapping,Maps — Patrick Durusau @ 6:37 pm

D3 Maps without the Dirty Work by

From the post:

For those like me who aren’t approaching mapping in D3 with a GIS background in tow, you may find the proprietary geo data structures hard to handle. Thankfully, Scott Murray lays out a simple process in his most recent course through JournalismCourses.org. By the time you are through reading this post you’ll have the guideposts needed for mapping any of the data sets found on Natural Earth’s website in D3.

First in a series of posts on D3 rendering for maps. Layers of D3 renderings is coming up next.

Enjoy!

ggplot2 2.0.0

Filed under: Ggplot2,Graphics,R,Visualization — Patrick Durusau @ 6:25 pm

ggplot2 2.0.0 by Hadley Wickham.

From the post:

I’m very pleased to announce the release of ggplot2 2.0.0. I know I promised that there wouldn’t be any more updates, but while working on the 2nd edition of the ggplot2 book, I just couldn’t stop myself from fixing some long standing problems.

On the scale of ggplot2 releases, this one is huge with over one hundred fixes and improvements. This might break some of your existing code (although I’ve tried to minimise breakage as much as possible), but I hope the new features make up for any short term hassle. This blog post documents the most important changes:

  • ggplot2 now has an official extension mechanism.
  • There are a handful of new geoms, and updates to existing geoms.
  • The default appearance has been thoroughly tweaked so most plots should look better.
  • Facets have a much richer set of labelling options.
  • The documentation has been overhauled to be more helpful, and require less integration across multiple pages.
  • A number of older and less used features have been deprecated.

These are described in more detail below. See the release notes for a complete list of all changes.

It’s one thing to find an error in the statistics of a research paper.

It is quite another to visualize the error in a captivating way.

No guarantees for any particular error, but ggplot2 2.0.0 is one of the right tools for such a job.

December 20, 2015

YC’s 2015 Reading List

Filed under: Books — Patrick Durusau @ 8:38 pm

YC’s 2015 Reading List

From the post:

Here is a roundup of some of the best books we at Y Combinator read in 2015 – some of them happened to be published this year, but many of them were not. A big hat-tip to Bill Gates, whose legendary reading lists inspired us to make one of our own.

Be not afraid!

There is no ordering by importance, topic or other metric.

Just a list of twenty (20) books that were enjoyed by the folks at Y Combinator.

I read recently that diverse inputs and opinions will make you smarter.

While I run that to ground, check your local library or bookstore for one or more of these volumes.

Data Science Ethics: Who’s Lying to Hillary Clinton?

Filed under: Data Science,Ethics — Patrick Durusau @ 8:19 pm

The usual ethics example for data science involves discrimination against some protected class: discrimination based on race, religion, ethnicity, etc., most if not all of which is already illegal.

That’s not a question of ethics, that’s a question of staying out of jail.

A better ethics example is to ask: Who’s lying to Hillary Clinton about back doors for encryption?

I ask because in the debate on December 19, 2015, Hillary said:

Secretary Clinton, I want to talk about a new terrorist tool used in the Paris attacks, encryption. FBI Director James Comey says terrorists can hold secret communications which law enforcement cannot get to, even with a court order.

You’ve talked a lot about bringing tech leaders and government officials together, but Apple CEO Tim Cook said removing encryption tools from our products altogether would only hurt law-abiding citizens who rely on us to protect their data. So would you force him to give law enforcement a key to encrypted technology by making it law?

CLINTON: I would not want to go to that point. I would hope that, given the extraordinary capacities that the tech community has and the legitimate needs and questions from law enforcement, that there could be a Manhattan-like project, something that would bring the government and the tech communities together to see they’re not adversaries, they’ve got to be partners.

It doesn’t do anybody any good if terrorists can move toward encrypted communication that no law enforcement agency can break into before or after. There must be some way. I don’t know enough about the technology, Martha, to be able to say what it is, but I have a lot of confidence in our tech experts.

And maybe the back door is the wrong door, and I understand what Apple and others are saying about that. But I also understand, when a law enforcement official charged with the responsibility of preventing attacks — to go back to our early questions, how do we prevent attacks — well, if we can’t know what someone is planning, we are going to have to rely on the neighbor or, you know, the member of the mosque or the teacher, somebody to see something.

CLINTON: I just think there’s got to be a way, and I would hope that our tech companies would work with government to figure that out. Otherwise, law enforcement is blind — blind before, blind during, and, unfortunately, in many instances, blind after.

So we always have to balance liberty and security, privacy and safety, but I know that law enforcement needs the tools to keep us safe. And that’s what I hope, there can be some understanding and cooperation to achieve.

Who do you think has told Secretary Clinton there is a way to have secure encryption and at the same time enable law enforcement access to encrypted data?

That would be a data scientist or someone posing as a data scientist. Yes?

I assume you have read: Keys Under Doormats: Mandating Insecurity by Requiring Government Access to All Data and Communications by H. Abelson, R. Anderson, S. M. Bellovin, J. Benaloh, M. Blaze, W. Diffie, J. Gilmore, M. Green, S. Landau, P. G. Neumann, R. L. Rivest, J. I. Schiller, B. Schneier, M. Specter, D. J. Weitzner.

Abstract:

Twenty years ago, law enforcement organizations lobbied to require data and communication services to engineer their products to guarantee law enforcement access to all data. After lengthy debate and vigorous predictions of enforcement channels “going dark,” these attempts to regulate security technologies on the emerging Internet were abandoned. In the intervening years, innovation on the Internet flourished, and law enforcement agencies found new and more effective means of accessing vastly larger quantities of data. Today, there are again calls for regulation to mandate the provision of exceptional access mechanisms. In this article, a group of computer scientists and security experts, many of whom participated in a 1997 study of these same topics, has convened to explore the likely effects of imposing extraordinary access mandates.

We have found that the damage that could be caused by law enforcement exceptional access requirements would be even greater today than it would have been 20 years ago. In the wake of the growing economic and social cost of the fundamental insecurity of today’s Internet environment, any proposals that alter the security dynamics online should be approached with caution. Exceptional access would force Internet system developers to reverse “forward secrecy” design practices that seek to minimize the impact on user privacy when systems are breached. The complexity of today’s Internet environment, with millions of apps and globally connected services, means that new law enforcement requirements are likely to introduce unanticipated, hard to detect security flaws. Beyond these and other technical vulnerabilities, the prospect of globally deployed exceptional access systems raises difficult problems about how such an environment would be governed and how to ensure that such systems would respect human rights and the rule of law.

Whether you agree on policy grounds about back doors to encryption or not, is there any factual doubt that back doors to encryption leave users insecure?

That’s an important point because Hillary’s data science advisers should have clued her in that her position is factually false. With or without a “Manhattan Project.”

Here are the ethical questions with regard to Hillary’s position on back doors for encryption:

  1. Did Hillary’s data scientist(s) tell her that access by the government to encrypted data means no security for users?
  2. What ethical obligations do data scientists have to advise public office holders or candidates that their positions are at variance with known facts?
  3. What ethical obligations do data scientists have to caution their clients when they persist in spreading mis-information, in this case about encryption?
  4. What ethical obligations do data scientists have to expose their reports to a client outlining why the client’s public position is factually false?

Many people will differ on the policy question of access to encrypted data but that access to encrypted data weakens the protection for all users is beyond reasonable doubt.

If data scientists want to debate ethics, at least make it about an issue with consequences. Especially for the data scientists.

Questions with no risk aren’t ethics questions, they are parlor entertainment games.

PS: Is there an ethical data scientist in the Hillary Clinton campaign?

Science Bowl [Different from the Quiche Bowl?]

Filed under: Contest,Machine Learning — Patrick Durusau @ 4:47 pm

Even basic cable has an overwhelming number of “bowl” (American football) games. Mostly corporate sponsor names, although the “Cure Bowl” was sponsored by AutoNation. It’s for a worthy cause (breast cancer research) but that isn’t obvious from a TV listing.

If you aren’t interested in encouraging physical injuries, including concussions, you have to look elsewhere for bowl game excitement.

Have you considered the Second Annual Data Science Bowl?

From the web page:

We all have a heart. Although we often take it for granted, it’s our heart that gives us the moments in life to imagine, create, and discover. Yet cardiovascular disease threatens to take away these moments. Each day, 1,500 people in the U.S. alone are diagnosed with heart failure—but together, we can help. We can use data science to transform how we diagnose heart disease. By putting data science to work in the cardiology field, we can empower doctors to help more people live longer lives and spend more time with those that they love.

Declining cardiac function is a key indicator of heart disease. Doctors determine cardiac function by measuring end-systolic and end-diastolic volumes (i.e., the size of one chamber of the heart at the beginning and middle of each heartbeat), which are then used to derive the ejection fraction (EF). EF is the percentage of blood ejected from the left ventricle with each heartbeat. Both the volumes and the ejection fraction are predictive of heart disease. While a number of technologies can measure volumes or EF, Magnetic Resonance Imaging (MRI) is considered the gold standard test to accurately assess the heart’s squeezing ability.


The challenge with using MRI to measure cardiac volumes and derive ejection fraction, however, is that the process is manual and slow. A skilled cardiologist must analyze MRI scans to determine EF. The process can take up to 20 minutes to complete—time the cardiologist could be spending with his or her patients. Making this measurement process more efficient will enhance doctors’ ability to diagnose heart conditions early, and carries broad implications for advancing the science of heart disease treatment.

The 2015 Data Science Bowl challenges you to create an algorithm to automatically measure end-systolic and end-diastolic volumes in cardiac MRIs. You will examine MRI images from more than 1,000 patients. This data set was compiled by the National Institutes of Health and Children’s National Medical Center and is an order of magnitude larger than any cardiac MRI data set released previously. With it comes the opportunity for the data science community to take action to transform how we diagnose heart disease.

This is not an easy task, but together we can push the limits of what’s possible. We can give people the opportunity to spend more time with the ones they love, for longer than ever before.
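
The derivation the competition asks you to automate is simple once the two volumes are in hand: ejection fraction is the share of the end-diastolic volume expelled during the beat. A quick sketch with made-up volumes:

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """EF = (end-diastolic volume - end-systolic volume) / end-diastolic volume."""
    return (edv_ml - esv_ml) / edv_ml

# Made-up volumes for illustration; a healthy EF is typically somewhere around 55-70%.
edv, esv = 120.0, 50.0
print(f"EF = {ejection_fraction(edv, esv):.0%}")   # EF = 58%
```

The hard part of the challenge is not this arithmetic; it is segmenting the left ventricle in the MRI slices accurately enough to measure the two volumes in the first place.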

Timeline:

  • February 29, 2016 – First submission and team merger deadline. Your team must make its first submission by this deadline. This is also the last day you may merge with another team.
  • March 7, 2016 – Stage one deadline and stage two data release. Your model must be finalized and uploaded to Kaggle by this deadline. After this deadline, the test set is released, the answers to the validation set are released, and participants make predictions on the test set.
  • March 14, 2016 – Final submission deadline.

Motivations (in no particular order):

  • Bragging rights!
  • Experience with a complex data modeling problem.
  • Prizes:
    • 1st place – $125,000
    • 2nd place – $50,000
    • 3rd place – $25,000
  • Substantial contribution to bioinformatics/heart research

I first saw this in a tweet by Kirk Borne.

‘*Star Wars Spoiler*’ Security

Filed under: Humor,Politics,Security — Patrick Durusau @ 10:57 am

ISIS Secures Comms By Putting ‘*Star Wars Spoiler*’ Before Every Message.

From the post:

The Islamic State has developed a new, incredibly effective way to safeguard their communications, according to intelligence sources. By putting the phrase “Star Wars Spoiler” in message headers, the group has essentially eliminated any chance of their messages being read by United States intelligence services even if they are intercepted.

“It’s been three days since any of us have had any intelligence at all on ISIS maneuvers and plans,” Capt. Mark Newman, Army intelligence officer, said in an interview. “We’re trying to put people who have seen the movie on the rotator out to the sandbox, but that’s pretty much making everyone lie about whether or not they’d seen Episode VII.”

Reporters have been unable to see any of the classified intelligence reports, not because Edward Snowden didn’t leak them, but because much of the staff has not seen Episode VII yet either. The ISIS Twitter account, however, was more difficult to avoid looking at.

More effective than former Queen Hillary’s position that wishes trump (sorry) known principles of cryptography:

It doesn’t do anybody any good if terrorists can move toward encrypted communication that no law enforcement agency can break into before or after. There must be some way. I don’t know enough about the technology, Martha, to be able to say what it is, but I have a lot of confidence in our tech experts. (Last Democrat sham debate of 2015)

At a minimum, that’s dishonest and at maximum, delusional. Stalin was the same way about genetics, as you may recall.

If Hillary can lie to herself and the American public about encryption, ask yourself what else is she willing to lie about?

December 19, 2015

Commenting on PubMed: A Successful Pilot [What’s Different For Comments On News?]

Filed under: Journalism,News,Reporting — Patrick Durusau @ 9:54 pm

Commenting on PubMed: A Successful Pilot

From the post:

We are pleased to announce that PubMed Commons is here to stay! After developing and piloting the core commenting system for PubMed, a pilot of journal clubs was added. And we have completed a major internal evaluation of the use of the Commons. We aim to publish that soon, so stay tuned to this blog or Twitter for news on that.

PubMed Commons provides a forum for scientific discourse that is integrated with PubMed, a major database of citations to the biomedical literature. Any author of a publication in PubMed is eligible to join and post comments to any citation.

More than 9,500 authors have joined PubMed Commons – and they have posted over 4,000 comments to more than 3,300 publications, mostly on recent publications. Commenting has plateaued, so the volume is low. But the value of comments has remained high. And comments often attract a lot of attention.

Completely contrary to the behavior that media outlets have found with comments. See Time To Rebrand Comments [The Rise of Editors?].

From Andrew Losowsky’s post:

It’s time to stop using the c-word. “The comment section” has moved in people’s minds from being an empty box on a website into a viper-filled pit of hell. We need to start again. We need to do better.

This change is necessary because most publishers haven’t understood the value of their communities and so have starved them of resources. We all know what happened next: Trolls and abusers delighted in placing the worst of their words beneath the mastheads of respectable journalism, and overwhelmed the conversation. “Don’t read the comments” became a mantra.

A “viper-filled pit of hell” isn’t what the PubMed Commons Team encountered.

What’s the difference?

I haven’t even thought out an A/B test, but some differences on the surface are:

  1. In order to comment, you have to be an author listed in PubMed or be invited by an author in PubMed.
  2. You need a My NCBI account in addition to the invitation.
  3. Comments can be moderated.

Some news outlets may reject requiring qualifications to comment, giving up anonymity, and moderation, even in exchange for high-quality comments and communities built around subject areas.

But then, not everyone wants to be Fox News. Yes?

December 18, 2015

Cybersecurity Act of 2015 – Text

Filed under: Cybersecurity,Government,Law,Law - Sources,Privacy — Patrick Durusau @ 8:41 pm

Coverage of the “omnibus” bill and the Cybersecurity Act of 2015 has been everywhere on the Web but nary a pointer to the text passed by Congress.

Wouldn’t you rather read the text for yourself than have it summarized?

At this point, the only text I can point you to is in the Congressional Record for December 17, 2015.

The Cybersecurity Act of 2015 is division N, which begins on page H9631, in the last column on your right, and continues to the top of the right-hand column on page H9645.

Please ask media outlets, bloggers and others to include pointers to court decisions, legislation, etc. with their stories.

It’s a small thing but a big step towards an interconnected web of information, as opposed to the current disconnected web.

Clojure Design Patterns

Filed under: Clojure,Design Patterns — Patrick Durusau @ 4:42 pm

Clojure Design Patterns by Mykhailo Kozik (Misha).

From the webpage:

Quick overview of the classic Design Patterns in Clojure.

Disclaimer: Most patterns are easy to implement because we use dynamic typing, functional programming and, of course, Clojure. Some of them look wrong and ugly. It’s okay. All characters are fake, coincidences are accidental.
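The disclaimer’s point — that dynamic typing and first-class functions shrink many of the classic patterns — is easy to see with the Strategy pattern. Here is a minimal sketch, in Python rather than Clojure and not taken from Kozik’s overview; the names and numbers are purely illustrative:

```python
# Hypothetical sketch (Python rather than Clojure): with first-class
# functions, the classic Strategy pattern is just "pass the behavior in".
def total_price(items, pricing_strategy):
    # No Strategy interface, no concrete strategy classes -- the strategy
    # is whatever function the caller hands us.
    return sum(pricing_strategy(item) for item in items)

def regular(item):
    return item["price"]

def holiday_sale(item):
    return item["price"] * 0.5  # everything half off

cart = [{"name": "book", "price": 10.0}, {"name": "pen", "price": 2.0}]
print(total_price(cart, regular))       # 12.0
print(total_price(cart, holiday_sale))  # 6.0
```

The same collapse happens for Command, Template Method and friends, which is why Kozik can cover so many patterns so briefly.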

In many places this is the last weekend before Christmas, which means both men and women will have lots of down time waiting on others in shopping malls.

That is the one time when I could see a good-sized mobile device being useful, assuming the mall has good Wi-Fi.

In case yours does and you have earplugs for the background music/noise, you may enjoy pulling this up to read.

I first saw this in a tweet by Atabey Kaygun.

Time To Rebrand Comments [The Rise of Editors?]

Filed under: Journalism,News,Reporting — Patrick Durusau @ 2:38 pm

Time To Rebrand Comments by Andrew Losowsky.

From the post:

It’s time to stop using the c-word. “The comment section” has moved in people’s minds from being an empty box on a website into a viper-filled pit of hell. We need to start again. We need to do better.

This change is necessary because most publishers haven’t understood the value of their communities and so have starved them of resources. We all know what happened next: Trolls and abusers delighted in placing the worst of their words beneath the mastheads of respectable journalism, and overwhelmed the conversation. “Don’t read the comments” became a mantra.

Little surprise that some publishers have chosen to close down, or highly restrict, their comment spaces.

In 2016, publishers are going to make a mental shift away from “comments” and towards “contributions.” They’re going to do this because engaging their communities towards contributions is the best way to surface exclusive content, to get closer to the audience and their needs, to make people feel more connected to the brand, to correct errors, to add new voices, and to get ahead of stories. The business, the journalism, and the ethics of the newsroom all depend on it.

Andrew has several questions that will confront publishers in the transition from comments to contributions.

In addition to those, I would pose this one:

Will editors arise to extract and shape contributions from users?

The inability of comments, much like email discussion lists, to organize themselves into useful threads is well known. Whatever name is given to user submissions, I don’t know of any evidence that this will change in the future.

We have all seen “me too” comments, along with comments that repeat content found earlier in the thread, not to mention asides between readers that have little if any relevancy to the original story.

Imagine an editor who dedupes facts submitted in comments, eliminates “me too” and “me against” comments and asides within threads, and creates annotations to the original story, crediting readers as appropriate.
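The dedupe step, at least, is not science fiction. Here is a minimal sketch, assuming scikit-learn is available, that flags comments which largely repeat an earlier one using TF-IDF cosine similarity; the function name, threshold, and sample comments are my own illustrative choices, not anything a newsroom actually runs:

```python
# Hypothetical sketch: flag comments that mostly repeat an earlier one,
# so an editor can concentrate on genuinely new contributions.
# Assumes scikit-learn is installed; names and threshold are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def flag_repeats(comments, threshold=0.8):
    """Return (index, earlier_index) pairs where a comment closely
    duplicates an earlier comment, measured by TF-IDF cosine similarity."""
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(comments)
    sims = cosine_similarity(tfidf)
    repeats = []
    for i in range(1, len(comments)):
        for j in range(i):
            if sims[i, j] >= threshold:
                repeats.append((i, j))
                break  # one earlier match is enough to flag it
    return repeats

comments = [
    "The report misstates the city's budget figures.",
    "Great article, thanks for sharing.",
    "The report misstates the city's budget figures, in my view.",
]
print(flag_repeats(comments))  # likely flags comment 2 as a repeat of comment 0
```

A human editor would still decide which flagged comments add anything; the point is only that the mechanical part of the job is cheap.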

Andrew is the project lead at The Coral Project, which describes itself this way:

We are creating open-source tools and resources for publishers of all sizes to build better communities around their journalism.

We also collect, support, and share practices, tools, and studies to improve communities on the web.

Editors can play a critical role in cultivating and building communities around journalistic content. Contributors will be distinguished by having their analysis and/or content incorporated into stories and credited to them as sources.

Anticipating push back from the “I want to say whatever I want” crowd, be mindful that:

The right to speak does not imply an obligation to listen.

Communities, just like individuals, can choose what is worthy of their attention and what is not.

There’s More Than One Kind Of Reddit Comment?

Filed under: Natural Language Processing,Sentiment Analysis — Patrick Durusau @ 11:52 am

‘Sarcasm detection on Reddit comments’

Contest ends: 15th of February, 2016.

From the webpage:

Sentiment analysis is a fairly well-developed field, but on the Internet, people often don’t say exactly what they mean. One of the toughest modes of communication for both people and machines to identify is sarcasm. Sarcastic statements often sound positive if interpreted literally, but through context and other cues the speaker indicates that they mean the opposite of what they say. In English, sarcasm is primarily communicated through verbal cues, meaning that it is difficult, even for native speakers, to determine it in text.

Sarcasm detection is a subtask of opinion mining. It aims at correctly identifying the user opinions expressed in the written text. Sarcasm detection plays a critical role in sentiment analysis by correctly identifying sarcastic sentences which can incorrectly flip the polarity of the sentence otherwise. Understanding sarcasm, which is often a difficult task even for humans, is a challenging task for machines. Common approaches for sarcasm detection are based on machine learning classifiers trained on simple lexical or dictionary based features. To date, some research in sarcasm detection has been done on collections of tweets from Twitter, and reviews on Amazon.com. For this task, we are interested in looking at a more conversational medium—comments on Reddit—in order to develop an algorithm that can use the context of the surrounding text to help determine whether a specific comment is sarcastic or not.

The premise of this competition is that there is more than one kind of comment on Reddit, aside from sarcasm.

A surprising assumption, I know, but there you have it.

I wonder if participants will have to separate sarcastic + sexist, sarcastic + misogynistic, sarcastic + racist, and sarcastic + abusive into separate categories, or will all sarcastic comments simply be classified as sarcasm?

I suppose the default case would be to assume all Reddit comments are some form of sarcasm and see how accurate that model proves to be when judged against the results of the competition.

Training data for sarcasm? Pointers anyone?
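Absent pointers, a baseline is at least easy to sketch. The quoted description mentions “machine learning classifiers trained on simple lexical or dictionary based features”; below is a minimal bag-of-words version of that idea in Python, assuming scikit-learn is installed. The handful of training comments and their labels are invented for illustration — real labeled Reddit comments are exactly what’s missing:

```python
# Minimal sketch of a lexical-feature baseline for sarcasm detection.
# Assumes scikit-learn is installed; the toy comments and labels below are
# invented stand-ins for real, labeled Reddit data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_comments = [
    "Oh great, another meeting that could have been an email.",
    "Wow, what a shock, the servers are down again.",
    "Thanks for the detailed write-up, this really helped.",
    "The patch fixed the login bug for me.",
]
train_labels = [1, 1, 0, 0]  # 1 = sarcastic, 0 = not sarcastic

# Word and bigram weights are the "simple lexical features"; the classifier
# only ever sees the comment text, never the surrounding thread.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(train_comments, train_labels)

print(model.predict(["Oh great, the build is broken again."]))
```

As the contest description notes, a serious entry would also use the surrounding thread for context, which a comment-only model like this never sees.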
