Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 12, 2017

Stanford CoreNLP – a suite of core NLP tools (3.7.0)

Filed under: Natural Language Processing,Stanford NLP,Text Analytics,Text Mining — Patrick Durusau @ 9:16 pm

Stanford CoreNLP – a suite of core NLP tools

The beta is over and Stanford CoreNLP 3.7.0 is on the street!

From the webpage:

Stanford CoreNLP provides a set of natural language analysis tools. It can give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, extract particular or open-class relations between entity mentions, get quotes people said, etc.

Choose Stanford CoreNLP if you need:

  • An integrated toolkit with a good range of grammatical analysis tools
  • Fast, reliable analysis of arbitrary texts
  • The overall highest quality text analytics
  • Support for a number of major (human) languages
  • Available interfaces for most major modern programming languages
  • Ability to run as a simple web service

Stanford CoreNLP’s goal is to make it very easy to apply a bunch of linguistic analysis tools to a piece of text. A tool pipeline can be run on a piece of plain text with just two lines of code. CoreNLP is designed to be highly flexible and extensible. With a single option you can change which tools should be enabled and which should be disabled. Stanford CoreNLP integrates many of Stanford’s NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system, sentiment analysis, bootstrapped pattern learning, and the open information extraction tools. Moreover, an annotator pipeline can include additional custom or third-party annotators. CoreNLP’s analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.

What stream of noise, sorry, news are you going to pipe into the Stanford CoreNLP framework?


Imagine a web service that offers levels of analysis alongside news text.

Or does the same with leaked emails and/or documents?
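The "simple web service" bullet above can be sketched from the client side. This is a minimal sketch, not the project's own client. It assumes a CoreNLP server is already running locally on port 9000 and accepts the text as the POST body, with a `properties` query parameter holding JSON. The function names `build_request` and `annotate` are my own, and the annotator list is only an example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a CoreNLP server is already running locally on port 9000.
SERVER_URL = "http://localhost:9000/"


def build_request(text, annotators="tokenize,ssplit,pos,ner"):
    """Build (but do not send) a POST request for the CoreNLP server."""
    # The server reads its settings from a JSON "properties" query parameter.
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        SERVER_URL + "?" + query,
        data=text.encode("utf-8"),  # the raw text travels as the POST body
        method="POST",
    )


def annotate(text, annotators="tokenize,ssplit,pos,ner"):
    """Send text to the server and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text, annotators)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the request, so this runs without a server.
    request = build_request("Stanford CoreNLP 3.7.0 is on the street!")
    print(request.full_url)
```

Calling `annotate()` on each incoming news item would be the starting point for the "levels of analysis alongside news text" idea; changing the `annotators` string switches tools on and off.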

Interactive Color Wheel

Filed under: Graphics,Visualization — Patrick Durusau @ 9:05 pm

Interactive Color Wheel


You will need to visit this interactive color wheel to really appreciate its capabilities.

What I find most helpful is the display of hex codes for the colors. I can distinguish colors but getting the codes right can be a real challenge.
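If you would rather compute a hex code than read it off a wheel, the conversion is short. A minimal sketch, assuming the usual six-digit `#rrggbb` form with each channel in the range 0 to 255; the function names are mine, not anything from the color wheel site.

```python
def rgb_to_hex(red, green, blue):
    """Convert three 0-255 channel values to a #rrggbb hex code."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be between 0 and 255")
    # Two lowercase hex digits per channel, zero padded.
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)


def hex_to_rgb(code):
    """Convert a #rrggbb hex code back to a (red, green, blue) tuple."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a six-digit hex code such as #ff8800")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


if __name__ == "__main__":
    print(rgb_to_hex(255, 136, 0))
    print(hex_to_rgb("#ff8800"))
```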


Inaugural Ball Cancellation!

Filed under: Politics,Protests — Patrick Durusau @ 5:26 pm

Your antipathy towards upcoming inaugural balls and work on possible blockades is having an impact!

The Arkansas Inaugural Ball has been cancelled due to “low demand.”

The listing of inauguration balls I pointed to yesterday has plenty of other places for blockades and other mischief.

Looking forward to the least attended inauguration in history, longest and largest traffic snarl in history and complete social disasters at the inauguration balls.

January 11, 2017

Flashing/Mooning For Inauguration Forecast

Filed under: Politics,Protests — Patrick Durusau @ 9:55 pm

You can find an updated weather forecast for January 20, 2017, updating my speculations in Blockading Washington – #DisruptJ20 – Unusual Tactic – Nudity, in Angela Fritz’s Here’s the first of what will surely be many inauguration weather forecasts.

Angela isn’t reporting sun-bathing weather but warm enough that a heavy coat over your birthday suit may be sufficient.

Of course, you could always build a fire in a trash barrel, something we are likely to see a lot of during the Trump presidency.

I’m sure other protesters, in the buff or not, will appreciate the extra warmth.

Missing The Beltway Blockade? Considering Blockading A Ball?

Filed under: Data Science,Politics,Protests — Patrick Durusau @ 7:18 pm

For one reason or another, you may not be able to participate in a Beltway Blockade January 20, 2017, see:

Don’t Panic!

You can still enjoy a non-permitted protest and contribute to the least attended inauguration in history!

2017 Presidential Inaugural Balls

The list is short on location information for many of the scheduled balls but the Commander in Chief’s Ball, Presidential Inaugural Ball, Mid-Atlantic Inauguration Ball, Midwest Inaugural Ball, Western Inaugural Ball, and the Neighborhood Inaugural Ball are all being held at the Walter E. Washington Convention Center.

Apologies, I haven’t looked up prior attendance records, but based on known scheduling alone, disruption in the area of the Walter E. Washington Convention Center looks like it will pay the highest returns.

For the balls with location information, or whose locations I can discover, I will post a fuller list with Google Map links tomorrow.

Oh, for inside protesting, here are floor plans of the Walter E. Washington Convention Center.

Those are the official, posted floor plans.

Should that link go dark, let me know. I have a backup copy of them. 😉

January 10, 2017

Overcoming Congressional Provincialism

Filed under: Government,Politics — Patrick Durusau @ 10:25 pm

While doing hard core data collection on members of Congress, I kept encountering:

Regrettably, I am unable to reply to any email from individuals residing outside of my congressional district.

The problem, of course, is that you may have an opinion on national intelligence but your representative, for example, isn’t on the intelligence committee.

What if you could identify and reach across congressional boundaries?

More on that tomorrow, along with news of the data set that has distracted me for several days!

January 9, 2017

ANSWER Secures More “Permitted” Protest Space

Filed under: Government,Politics,Protests — Patrick Durusau @ 9:01 pm


The ANSWER Coalition has:

…secured another permitted assembly area for an even larger gathering site on the parade route, the Navy Memorial (8th St. and Pennsylvania Ave. NW).

See: ANSWER Coalition for details and ways to support.

If you’re not interested in “permitted” protesting, it could be the case that attendees and protesters alike find it difficult if not impossible to attend the inauguration. See protests for an ongoing series of speculations in that direction.

PS: I remember the Constitution reading:

Congress shall make no law … abridging…the right of the people peaceably to assemble…

Not:

Congress shall make no law … abridging…the right of the people to be permitted to assemble…

Do you?

January 7, 2017

> 3000 for 2017? – Defining Blockade Success

Filed under: Government,Politics,Protests — Patrick Durusau @ 4:41 pm

Trump Inauguration Planners Unveil Tickets, Map says:

About 3,000 people holding purple tickets got stuck on foot in the Interstate 395 tunnel when trying to attend President Barack Obama’s first inauguration, causing many of them to miss the ceremony. The tunnel was later nicknamed the Purple Tunnel of Doom.

Data science projects should define criteria for success, before the project starts.

That helps prevent management from moving the goal posts to claim victory where none exists and protects you from “but it doesn’t ….” when that feature wasn’t included in the criteria for success.

In efforts to #DisruptJ20 the Trump inauguration, it appears that at least 3K people must be prevented from reaching the inauguration.

For your planning purposes, the 2017 SWEARING IN CEREMONY INFORMATION (FAQ) advises:

What time should I get to the US Capitol for the ceremony?

The gates to the mall will most likely open at 5:00am, and the ticketed areas are usually filled by 8:00am. The ceremony will begin around 11:30am with a musical performance prior to that time.

Blockaders are in for a long night!

The rate of removal of cars that intentionally or unintentionally run out of gas, disabled vehicles (think flat tires), etc., is unknown.

As a guesstimate, I would say gridlock conditions starting around 3 AM and persisting until noon, EST, would result in an inauguration that a majority of the ticket holders did not attend.

I wonder if the news channels will focus more on protesters or empty bleachers? Guesses?

Implementing Indivisible – Early Difficulties

Filed under: Politics,Protests — Patrick Durusau @ 3:59 pm

Indivisible (Indivisible: A Practical Guide for Resisting the Trump Agenda) recommends appearing at every public appearance of your representative.

The same logic applies to other representatives, since their committees vote on issues that impact you.

Question: How do you find all the offices of members of the US House or Senate?

Answer: Not easily.

Take Senator Dianne Feinstein for example.

First source gives:


Second Source

GPO Congressional Directory provides:


Third Source lists:


Local office information appears at Senator Dianne Feinstein, but its listing varies from page to page, making automated extraction an iffy proposition.

Empowering Indivisible

Congress is sorely in need of a topic map for its members, that much is obvious.

What does the lack of an easy way to find local office information suggest to you?

Would local office information improve your odds of contacting your own representative and others?

January 6, 2017

Online Database of “Verified” Twitter Accounts (Right On!)

Filed under: Twitter,Wikileaks — Patrick Durusau @ 4:12 pm

The WikiLeaks Task Force tweeted on 6 Jan. 2017:

We are thinking of making an online database with all “verified” twitter accounts & their family/job/financial/housing relationships.

There are a number of comments on this tweet. The ones containing “dox,” “doxx,” “doxing,” “creepy,” “evil,” etc. should be ignored.

Ignored because intelligence agencies, news organizations, merchants, banks, etc. are all collecting and organizing that data and more.

Ignored because the public should not preemptively disarm itself.

If anything, the WikiLeaks Task Force should start with “verified” Twitter accounts and expand outwards, rapidly.

The public should be able to rapidly find relationships of individuals nominated for office, who contribute money to candidates, who profit from contracts, who launder public money. The public should have the same advantages intelligence agencies enjoy today.

To the nay-sayers to the WikiLeaks Task Force proposal:

Why do you seek to prevent putting the public on a better footing vis-a-vis government?

Question to my readers: What do the nay-sayers gain from a disarmed public?

Three More Reasons To Learn R

Filed under: Facebook,Programming,R,Statistics,Twitter — Patrick Durusau @ 3:31 pm

Three reasons to learn R today by David Smith.

From the post:

If you're just getting started with data science, the Sharp Sight Labs blog argues that R is the best data science language to learn today.

The blog post gives several detailed reasons, but the main arguments are:

  1. R is an extremely popular (arguably the most popular) data programming language, and ranks highly in several popularity surveys.
  2. Learning R is a great way of learning data science, with many R-based books and resources for probability, frequentist and Bayesian statistics, data visualization, machine learning and more.
  3. Python is another excellent language for data science, but with R it's easier to learn the foundations.

Once you've learned the basics, Sharp Sight also argues that R is also a great data science language to master, even though it's an old language compared to some of the newer alternatives. Every tool has a shelf life, but R isn't going anywhere and learning R gives you a foundation beyond the language itself.

If you want to get started with R, Sharp Sight Labs offers a data science crash course. You might also want to check out the Introduction to R for Data Science course on EdX.

Sharp Sight Labs: Why R is the best data science language to learn today, and Why you should master R (even if it might eventually become obsolete)

If you need more reasons to learn R:

  • Unlike Facebook, R isn’t a sinkhole of non-testable propositions.
  • Unlike Instagram, R is rarely NSFW.
  • Unlike Twitter, R is a marketable skill.

Glad to hear you are learning R!

January 5, 2017

ANSWER Protest Permit – Least Attended Inauguration in History

Filed under: Government,Politics,Protests — Patrick Durusau @ 8:46 pm

Permits secured for Jan. 20 Mass Protest at the Inauguration!

From the post:

The ANSWER Coalition has a permit for 14th street and Pennsylvania Ave NW and a portion of Freedom Plaza, beginning at 7:00am on Jan. 20 for a protest that will continue throughout the day.

We are also continuing our long legal battle for additional permitted space along Pennsylvania Ave. on Inauguration Day. The National Park Service has stonewalled the issuance of additional permits in an attempt to sanitize the most visible and primary locations along the route from dissent and free speech activity. Additionally, we are continuing to challenge the illegal, unconstitutional system whereby NPS reserves large portions of the route and other parts of D.C. on Inauguration Day, and in the days and weeks prior to and after, on behalf of Trump’s Presidential Inaugural Committee.

We think it’s critically important for the people to not be intimidated, to not be silent and to use all public spaces to express themselves.

(emphasis in original)

All of that is true, but attending the ANSWER protest means you will be counted as “attending the inauguration.”

What if you took ANSWER’s later advice:

use all public spaces to express themselves.

With tailgate parties on the Beltway (Tailgating @DisruptJ20), and blockades of the same (Low Risk Blockading of the DC Beltway).

Secure President-elect Trump’s place in the history books, with the least attended inauguration in history.

You can be part of that historical event, while sitting on the Beltway out of gas.

Which is it? Do you want to be “in attendance” or “truant” for Trump’s inauguration?

Beall’s List of Predatory Publishers 2017 [Avoiding “fake” scholarship, journalists take note]

Filed under: Journalism,News,Peer Review,Publishing — Patrick Durusau @ 11:42 am

Beall’s List of Predatory Publishers 2017 by Jeffrey Beall.

From the webpage:

Each year at this time I formally announce my updated list of predatory publishers. Because the publisher list is now very large, and because I now publish four, continuously-updated lists, the annual releases do not include the actual lists but instead include statistical and explanatory data about the lists and links to them.

Jeffrey maintains four lists of highly questionable publishers/publications:

Beall’s list should be your first stop when an article arrives from an unrecognized publication.

Not that being published in Nature and/or Science is a guarantee of quality scholarship, but publication on Beall’s list should raise publication-stopping red flags.

Such a publication could be true, but bears the burden of proving itself to be so.

January 4, 2017

BaseX – Bad News, Good News

Filed under: BaseX,XML,XQuery — Patrick Durusau @ 5:39 pm

Good news and bad news from BaseX. Christian Grün posted to the BaseX discussion list today:

Dear all,

This year, there will be no BaseX Preconference Meeting in XML Prague. Sorry for that! The reason is simple: We were too busy in order to prepare ourselves for the event, look for speakers, etc.

At least we are happy to announce that BaseX 8.6 will finally be released this month. Stay tuned!

All the best,


That’s disappointing but understandable. Console yourself with watching presentations from 2013 – 2016 and reviewing issues for 8.6.

Just a guess on my part, ;-), but I suspect more testing of BaseX builds would not go unappreciated.

Something to keep yourself busy while waiting for BaseX 8.6 to drop.

Eight Years of the Republican Weekly Address

Filed under: Government,Politics,Prediction,Social Media — Patrick Durusau @ 5:23 pm

We looked at eight years of the Republican Weekly Address by Jesse Rifkin.

From the post:

Every week since Ronald Reagan started the tradition in 1982, the president delivers a weekly address. And every week, the opposition party delivers an address as well.

What can the Weekly Republican Addresses during the Obama era reveal about how the GOP has attempted to portray themselves to the American public, by the public policy topics they discussed and the speakers they featured? To find out, GovTrack Insider analyzed all 407 Weekly Republican Addresses for which we could find data during the Obama era, the first such analysis of the weekly addresses as best we can tell. (See the full list of weekly addresses here.)

Sometimes they discuss the same topic as the president’s weekly address — particularly common if a noteworthy event occurs in the news that week — although other times it’s on an unrelated topic of the party’s choosing. It also features a rotating cast of Republicans delivering the speech, most of them congressional, unlike the White House which has almost always featured President Obama, with Vice President Joe Biden occasionally subbing in.

On the issues, we found that Republicans have almost entirely refrained from discussing such inflammatory social issues as abortion, guns, or same-sex marriage in their weekly addresses, despite how animating such issues are to their base. They also were remarkably silent on Donald Trump until the week before the election.

We also find that while Republicans often get slammed on women’s rights and minority issues, Republican congressional women and African Americans are at least proportionally represented in the weekly addresses, compared to their proportions in Congress, if not slightly over-represented — but Hispanics are notably under-represented.

You have seen credible claims, such as On Predicting Social Unrest Using Social Media by Rostyslav Korolov, et al., and less credible claims from others, such as CIA claims it can predict some social unrest up to 5 days ahead.

Rumor has it that the CIA has a Word template named, appropriately enough: theRussiansDidIt. I can neither confirm nor deny that rumor.

Taking credible actors at their word, are you aware of any parallel research on weekly addresses by Congress and following congressional action?

A very light skimming of the literature on predicting Supreme Court decisions comes up with: Competing Approaches to Predicting Supreme Court Decision Making by Andrew D. Martin, Kevin M. Quinn, Theodore W. Ruger, and Pauline T. Kim (2004), Algorithm predicts US Supreme Court decisions 70% of time by David Kravets (2014), Fantasy Scotus (a Supreme Court fantasy league with cash prizes).

Congressional voting has been studied as well, for instance, Predicting Congressional Voting – Social Identification Trumps Party. (Now there’s an unfortunate headline for searchers.)

Congressional votes are important but so is the progress of bills, the order in which issues are addressed, etc., and it is the reflection of those less formal aspects in weekly addresses from Congress that could be interesting.

The weekly speeches may be as divorced from any shared reality as comments inserted in the Congressional Record. On the other hand, a partially successful model, other than the timing of donations, may be possible.

Q&A Cathy O’Neil…

Filed under: BigData,Ethics,Mathematics,Modeling,Statistics — Patrick Durusau @ 2:30 pm

Q&A Cathy O’Neil, author of ‘Weapons of Math Destruction,’ on the dark side of big data by Christine Zhang.

From the post:

Cathy O’Neil calls herself a data skeptic. A former hedge fund analyst with a PhD in mathematics from Harvard University, the Occupy Wall Street activist left finance after witnessing the damage wrought by faulty math in the wake of the housing crash.

In her latest book, “Weapons of Math Destruction,” O’Neil warns that the statistical models hailed by big data evangelists as the solution to today’s societal problems, like which teachers to fire or which criminals to give longer prison terms, can codify biases and exacerbate inequalities. “Models are opinions embedded in mathematics,” she writes.

Great interview that hits enough high points to leave you wanting to learn more about Cathy and her analysis.

On that score, try:

Read her mathbabe blog.

Follow @mathbabedotorg.

Read Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.

Try her new business: ORCAA [O’Neil Risk Consulting and Algorithmic Auditing].

From the ORCAA homepage:

ORCAA’s mission is two-fold. First, it is to help companies and organizations that rely on time and cost-saving algorithms to get ahead of this wave, to understand and plan for their litigation and reputation risk, and most importantly to use algorithms fairly.

The second half of ORCAA’s mission is this: to develop rigorous methodology and tools, and to set rigorous standards for the new field of algorithmic auditing.

There are bright-line cases, such as sentencing, housing, and hiring discrimination, where “fair” has a binding legal meaning. And legal liability for not being “fair.”

Outside such areas, the search for “fairness” seems quixotic. Clients are entitled to their definitions of “fair” in those areas.

FEMA – HOW-TO Demonize Your Opponents

Filed under: Government,Politics,Protests — Patrick Durusau @ 11:55 am

Beryl Lipton writes in FEMA Field Force manual offers protesters insights into the future of crowd control:

Though construction on the Dakota Access Pipeline has halted for now, the lessons for law enforcement and protesters are still percolating. For the former, they’ll likely find themselves one day studying the event as they prepare for future mass gatherings, maybe in a guide just like the one distributed by the Federal Emergency Management Agency (FEMA) to North Dakota law enforcement in September.

… (graphic of DHS omitted)

Obtained by Unicorn Riot via a request to the North Dakota Department of Corrections, an agency with far fewer individuals in its custody than attended the protest at Standing Rock, the manual is a Field Force Operations training program for students, a crash course in eight parts on how to deal with a mixed crowd of lawful and unlawful dissenters.

I extracted the Field Force Operations PER-200 manual from the zip file posted at MuckRock for your reading/access convenience.

As a government training document, allegedly “our” government, the manual fails in a number of respects.

Consider its efforts to demonize protesters:

b. Protesters. Not every protester is the same nor should be viewed the same by law enforcement. By better understanding protesters, law enforcement officers can make better choices on how to respond. A small group of unruly protesters can stand out from the peaceful majority—often comprised of others who just want to be there along with innocent bystanders accidentally caught in the melee.

(1) Everyday citizens. Most protests include everyday citizens gathering through their First Amendment right to peaceably make their voices heard (Driscoll, 2003).

(2) Professional protesters. These people train or are trained in protester tactics often by direct action organizations that promote two universal messages: First, intervention demands responsibility. Second, a smaller harm is acceptable if it prevents a greater harm. One interpretation of this second message is that it is acceptable for protesters to break laws they consider less important like vandalism to prevent a greater harm like environmental damage. Some activism organizations may produce booklets that demonstrate use or construction of devices, including the infamous Road Raging – Top Tips for Wrecking Roadbuilding (Road Alert!, 1997).

(3) Anarchists. These people aim to disrupt, often seeking to challenge authority and capitalism at any cost. They are frequently young college students who express themselves through the destruction of property. Anarchists may mix into peaceful protests despite the efforts of the nonviolent protesters to limit destructive activities—leading to fighting sometimes between protesters. One common anarchist technique is the black bloc (violent, destructive activity), demonstrated at the Occupy Seattle protests.
… (at page 106)

If you think that lacks a charitable attitude towards ordinary people outraged at some government misconduct, consider this listing of the “types” of individual protesters:

(1) Impulsive. These short-tempered people are the kind who are always spoiling for a fight and only need a fancied insult or a slight provocation to excite them to violence or incite others to violence.

(2) Suggestible. People who get into the action early and are easily influenced to follow the lead of the more violent.

(3) Cautious. Individuals who wait for the cloak of anonymity to give them courage by hiding their identity.

(4) Yielders. Those who do not join the action until a large number of participants give the impression of universality. In other words, “Everyone is taking part, so why shouldn’t I?”

(5) Supportive. People who do not actively join the mob but who enjoy the show and even shout encouragement.

(6) Resisters. Persons whose standards of judgment are not swayed by the emotional frenzy of the mob but who maintain level heads. They can disagree with the actions of the majority.

(7) Psychopathic. Individuals with a pathological personality structure are angry at the world and seek to use a riotous situation as a means of getting even with society (FBI, 1967, p. 21).
… (page 108)

How’s that for a rhetorical move?

In three pages the reader is drawn from “everyday citizens” to a catalog of personality disorders ranging up to the “psychopathic.”

Any reader instinctively feels a gathering of protesters is a boiling pot of crazy ready to explode.

A false worldview but one promoted by the FEMA manual.

Imagine you are a local law enforcement officer, with little or no personal experience with civil disobedience, being told by FEMA that protesters are the harbingers of chaos. What’s your reaction going to be?

It’s only one example but Julia Carrie Wong and Sam Levin report in: Standing Rock protesters hold out against extraordinary police violence:

Harkening back to an earlier era, when police in Birmingham, Alabama, attacked African American schoolchildren with dogs and high-pressure water hoses, North Dakota officers trained water cannons on hundreds of Dakota Access pipeline protesters.

On the night of 20 November, though, the temperature was below freezing and the protesters, who call themselves “water protectors”, were camping outdoors for the evening.

Water is just one of many “less-than-lethal” munitions that have been trained against the activists.

“They seem to have almost an infinite arsenal of different types of weapons,” said Rachel Lederman, attorney for the National Lawyers Guild (NLG). “I don’t think local law enforcement understands how dangerous they are.”

Police have acknowledged using sponge rounds, bean bag rounds, stinger rounds, teargas grenades, pepper spray, Mace, Tasers and a sound weapon. The explosive teargas grenades in use at Standing Rock have been banned by some US law enforcement agencies because they indiscriminately spray people, Lederman said.

More than two dozen people were hospitalized and 300 injured during the conflict, according to the medic and healer council. One woman’s arm was nearly blown off, according to her father, and the complaint alleges that another woman was shot in the eye, resulting in the detachment of her retina and possible permanent blindness.

Question: Should “everyday citizens” be sprayed with water cannon in sub-zero weather and assaulted with sponge rounds, bean bag rounds, stinger rounds, teargas grenades, pepper spray, Mace, Tasers and a sound weapon?

That’s not a hard question, is it?

I suspect every non-psychotic law enforcement officer at Standing Rock would answer no, just like you.

But Morton County sheriff Kyle Kirchmeier confirms the FEMA-schooled view of law enforcement:

On Thanksgiving, Morton County sheriff Kyle Kirchmeier released a statement condemning the actions of “paid agitators and protesters” without offering any evidence that people were being paid to fight the pipeline. The department has not responded to requests to substantiate the claim.

In another statement that week, the sheriff said activists were not engaged in “civil disobedience” but were acting like “evil agitators”. The Mandan, North Dakota, police chief, Jason Ziegler, has asserted that law enforcement agencies “can use whatever force necessary to maintain peace”.

To be fair, numerous law enforcement agencies have declined to subscribe to this FEMA-inspired madness; see Sheriffs Across US Refusing to Send Police and Equipment to DAPL as Outrage and Costs Grow by Claire Bernish.

At least in this instance. The real test of law enforcement avoiding the FEMA “…evil agitators….” psychosis comes when protests come closer to home.

Government training manuals that humanize protesters are less likely to result in protests being used as proving grounds for “less lethal” weapons.

Teaching police officers to see protesters as their kith and kin will make major strides in the humane treatment of protesters.

Police officers may realize they have more in common with protesters than with players far removed from consequences on the streets. (Is that what FEMA is trying to avoid?)

January 3, 2017

Sharpening Your Hacking Skills!

Filed under: Cybersecurity,Government,Security — Patrick Durusau @ 8:31 pm

40+ Intentionally Vulnerable Websites To Practice Your Hacking Skills.

From the post:

Attack is definitely the best form of defense and this also applies to Cyber Security.

Companies are now hacking their own websites and even hiring ethical hackers in an attempt to find vulnerabilities before the bad guys do. As such ethical hacking is now a much sought after skill but hacking websites without permission can get you on the wrong side of the law, even if you’re just practising.

So how do you practice your hacking skills whilst staying on the right side of the law? Well there are a number of deliberately vulnerable websites out there designed to allow you to practise and hone your hacking skills, without fear of prosecution. So we’ve decided to compile a list of over forty of them, each with a short description.

Once you feel comfortable finding vulnerabilities, the next step could be a job as a penetration tester or participation in one of the bug bounty programmes where companies reward you based on the severity of the bugs that you find, which could be very lucrative. Facebook is one such company offering a bug bounty programme and has paid out more than a million dollars to date.

So without further ado, here’s the list. If you know of a good hacking website that’s not on this list, let me know and I’ll add it. Oh, and don’t forget to bookmark this page! 🙂

Yes! Not only bookmark this page but visit the sites it lists!

My only disappointment was that the Office of Personnel Management wasn’t listed. I guess the OPM site is requiring permission for hacking now. 😉

The GRU-Ukraine Artillery Hack That May Never Have Happened

Filed under: Cybersecurity,Government,Politics — Patrick Durusau @ 8:15 pm

The GRU-Ukraine Artillery Hack That May Never Have Happened by Jeffrey Carr.

From the post:

Crowdstrike’s latest report regarding Fancy Bear contains its most dramatic and controversial claim to date; that GRU-written mobile malware used by Ukrainian artillery soldiers contributed to massive artillery losses by the Ukrainian military. “It’s pretty high confidence that Fancy Bear had to be in touch with the Russian military,” Dmitri Alperovitch told Forbes. “This is exactly what the mission is of the GRU.”

Crowdstrike’s core argument has three premises:

  1. Fancy Bear (APT28) is the exclusive developer and user of X-Agent [1]
  2. Fancy Bear developed an X-Agent Android variant specifically to compromise an Android ballistic computing application called Попр-Д30.apk for the purpose of geolocating Ukrainian D-30 Howitzer artillery sites[2]
  3. The D-30 Howitzers suffered 80% losses since the start of the war.[3]

If all of these premises were true, then Crowdstrike’s prior claim that Fancy Bear must be affiliated with the GRU [4] would be substantially supported by this new finding. Dmitri referred to it in the PBS interview as “DNA evidence”.

In fact, none of those premises are supported by the facts. This article is a summary of the evidence that I’ve gathered during hours of interviews and background research with Ukrainian hackers, soldiers, and an independent analysis of the malware by CrySys Lab. My complete findings will be presented in Washington D.C. next week on January 12th at Suits and Spooks.

Sadly I won’t be in attendance but am looking forward to reports of Carr’s details on the alleged GRU-Ukraine hack.

Not that I am expecting the New York Times to admit the Russian hacking of the 2016 election is a tissue of self-serving lies.

Disappointing but not expected.

Achieving a 300% speedup in ETL with Apache Spark

Filed under: Cloudera,ETL,Spark — Patrick Durusau @ 7:57 pm

Achieving a 300% speedup in ETL with Apache Spark by Eric Maynard.

From the post:

A common design pattern often emerges when teams begin to stitch together existing systems and an EDH cluster: file dumps, typically in a format like CSV, are regularly uploaded to EDH, where they are then unpacked, transformed into optimal query format, and tucked away in HDFS where various EDH components can use them. When these file dumps are large or happen very often, these simple steps can significantly slow down an ingest pipeline. Part of this delay is inevitable; moving large files across the network is time-consuming because of physical limitations and can’t be readily sped up. However, the rest of the basic ingest workflow described above can often be improved.

Campaign finance data suffers more from complexity and obscurity than volume.

However, there are data problems where volume and not deceit is the issue. In those cases, you may find Eric’s advice quite helpful.

Expiring Patents

Filed under: Intellectual Property (IP),Law — Patrick Durusau @ 7:37 pm

Expatents returns a list of patents expiring that day and you can sign up for a weekly digest of expiring patents.

The site claims that over 80% of patents are never commercially exploited.

Are patents that expire without commercial exploitation like articles that are never cited by anyone?

Potential shareholder litigation over the not-so-trivial cost of patents that never resulted in commercial exploitation?

Was it inside or outside counsel that handled the patent filings?

There’s an interesting area for tracing relationships (associations) and expenses.

A 3-Second Blockading Proposal

Filed under: Government,Protests — Patrick Durusau @ 7:20 pm

Nearly everyone I know has read Steal This Book by Abbie Hoffman at some point but despite its being on the internet, younger readers may have missed it.

To set the background for the 3-second blockading proposal, consider what Abbie has to say about anti-tire weapons:

Don’t believe all those bullshit tire ads that make tires seem like the Superman of the streets. Roofing nails spread out on the street are effective in stopping a patrol car. A nail sticking out from a strong piece of wood wedged under a rear tire will work as effectively as a bazooka. An ice pick will do the trick repeatedly but you’ve got to have a strong arm to strike home…. (page 122 of the pdf Steal This Book I can’t say how that corresponds to other copies.)

Everything Abbie says is true, but I see problems with each of his suggestions:

  1. Roofing nails: Roofing nails work, are easy to purchase and not expensive. At the same time, they are an indiscriminate weapon, not unlike carpet bombing when the objective is intersecting a single road.
  2. Nail in wood: The comparison of a “strong piece of wood” and “bazooka” makes me think of nails in the end of a 2 x 4 board. Works but even TSA agents trained to spot bottled water can spot someone sporting a 2 x 4 on one shoulder. (Not what Abbie meant but a humorous image.)
  3. Ice Pick: Like the man says, requires “a strong arm to strike home.” If reduced to using an ice pick, you do know to go for the thinner sideways. Yes?

Other tire weapon methods include: flattening tires with bayonets, shooting out tires, and the current fad with spike strips:


Pictured is the Stinger Spike System, which is advertised online for $889.20 (not including shipping and tax).

Blockading with tire weapons sounds indiscriminate (roofing nails), obvious (2 x 4 with nails), difficult (ice pick), unlikely (bayonets/guns), and/or expensive (police spikes).

But that’s not necessarily so.

What if you had the opportunity to use this truck as part of a highway blockade:


Impressive. Yes?

Look at all those tires! That just seems way too difficult. But, perhaps not.

How many of those tires would have to be disabled to make that semi-train part of a road blockade?

Here’s an image to help with that question:


Out of all those tires, only one of the font two steering tires must be disabled. Disable either one and the truck becomes a fixture unless and until someone can clear enough traffic from around it and repair the tire.

BTW, the same lesson applies to school buses, tour buses, garbage trucks, dump trucks, Metro buses (includes links to schedules in case you want to wait for one), in short, anything that is big and difficult to move until repaired.

A 3-Second Blockading Proposal

Large, difficult to repair vehicles make great elements of a roadway blockade. If they lose either one of their front tires, there they sit until repaired.

So how do either one of the front tires get flattened?

What fact about tires did Abbie Hoffman overlook in Steal This Book?

You’re ahead of me. Yes, valve stems.

Valve stems are nearly obscured on tractor trailer rigs by the wheel housing:


Valve stems vary depending on the type of vehicle and by design are not easy to cut.

The ideal (and unproven) scenario would be:

  1. Spot blockade target’s valve stem
  2. Cut valve stem
  3. Be on your way

all in 3 seconds or less.

But see the next section:

Lack of Practical Experience – Variety Intervenes

My first impulse was to recommend using robust cutters:


for severing valve stems but your success with those will depend upon your arm strength and the tires you encounter.

Quite frankly, the variety of wheels and tires is too large to make a judgment about tools until reconnaissance on the tires you are likely to encounter.

Add to that my lack of tractor trailer tires immediately available for trials, and further research is indicated.

Any research/experience you can point to and/or contribute concerning cutting valve stems, specifying tire model(s) and the tool(s) used, would be greatly appreciated.

Steal This Book is still a great read but is sorely in need of an update. It does have my favorite paragraph from all counter-culture literature:

If you are around a military base, you will find it relatively easy to get your hands on an M-79 grenade launcher, which is like a giant shotgun and is probably the best self-defense weapon of all time.

It’s not clear what experience Abbie had with the M-79, but you have to admit it is one hell of an image:


I understand that ammunition for the M-79 is hard to find. You?

January 2, 2017

News Bubble Bursting – World Newspapers and Magazines Online

Filed under: Journalism,News,Reporting — Patrick Durusau @ 9:27 pm

World Newspapers and Magazines Online

Newspaper and magazine listings for one hundred and ninety-nine (199) countries.

At the rate of one country per week, it would take 3.8 years to work your way through this listing.
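The 3.8-year figure is simple division. A quick check in Python, assuming 52 weeks per year (the variable names are mine, not the site's):

```python
# Back-of-the-envelope check of the reading schedule described above.
COUNTRIES = 199        # country listings reported for the site
WEEKS_PER_YEAR = 52    # assumption: one country per week, 52 weeks a year

years_needed = COUNTRIES / WEEKS_PER_YEAR
print(f"{years_needed:.1f} years")
```

Doubling the pace to two countries per week would halve the figure.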

Considering the depth of government and corporate deception, don’t you owe it to yourself, if not your readers, to sample that deception widely?

In an age of automatic, if not always smooth and correct, translation, do you have a good excuse for doing any less?

Russian Hackers – Repeating History?

Filed under: Journalism,News,Reporting — Patrick Durusau @ 7:35 pm

Maybe there is something to reading accounts of recent history. (A fascination with markup/computer and ANE languages doesn’t lead to much recent reading in “recent” history.)

But I was reading Manufacturing Consent by Edward S. Herman and Noam Chomsky (2002), when I encountered a repetition of the currently popular meme, “Russian hackers hacked the DNC.” (Despite the Podesta emails being obtained due to user carelessness that is hard to characterize as a “hack.”)

History Repeating (Not for the first time)

Set your wayback machine for 1981, another time when Russia (then the USSR) was an “evil empire.” (Or so claimed by people with particular agendas.)

A Turkish fascist and member of a violent anti-left party in Turkey, one Mehmet Ali Agca, attempted to assassinate Pope John Paul II in May 1981. After being interrogated for 17 months, Agca “confessed” that he was an agent of the KGB and Bulgarians.

Herman and Chomsky walk through the unraveling of this fantasy of the Reagan era political elites (pages xxvii-xxix), only to conclude:

The New York Times, which had been consistently supportive of the connection in both news and editorials, not only failed to report Weinstein’s negative findings from the search of the Bulgarian files, it also excluded Goodman’s statements on the CIA penetration of the Bulgarian secret services from their excerpts of his testimony. The Times had long maintained that the CIA and the Reagan administration “recoiled from the devastating implication that Bulgarian agents were bound to have acted only on a signal from Moscow.” 58 But Goodman’s and Ford’s testimony show that this was the reverse of the truth, and that CIA heads William Casey and Robert Gates overrode the views of CIA professionals and falsified evidence to support a Soviet linkage. The Times was not alone in following a misleading party line, but it is notable that this paper of record has yet to acknowledge its exceptional gullibility and propaganda service.


recoiled from the devastating implication that Bulgarian agents were bound to have acted only on a signal from Moscow

Does that sound similar to anything you have read recently or have heard repeated by the out-going US president?

December 11, 2016

Jump forward now to December 11, 2016 and you can read the New York Times reporting:

“This is why I hate the term ‘we speak truth to power,’” said Mark M. Lowenthal, a former senior C.I.A. analyst. “We don’t have truth. We have really good ideas.”

Mr. Lowenthal said that determining the motives of foreign leaders — in this case, what drove President Vladimir V. Putin of Russia to order the hacking — was one of the most important missions for C.I.A. analysts. In 2002, one of the critical failures of American spy agencies was their inability to understand Saddam Hussein’s goals and motives.

A simple search reveals the internet is replete with such trash talking by the CIA, DHS, FBI and an assortment of agencies that rearrange conclusions but offer no facts in support of those conclusions.

A Final Blow as 2016 Closes

With the same credibility I would accord the now discredited NYTimes fable about Russian backing for the attempt on the life of Pope John Paul II, or the hacking of the Democratic National Committee at the direction of Vladimir Putin, comes this final shot from Russian hackers:


Since the Islamic State hasn’t claimed credit, it must be those damned Russian hackers! (Caution: That is “fake news.” Carey may have been sabotaged by someone but it wasn’t Russian hackers.)

A Case For Topic Maps & Subject Identity Anyone?

I haven’t worked out the details but these repeated charades by the US government, among others, offer an opportunity to put subject identity as defined by topic maps to work for true journalists.

The particulars of any particular subject vary but they all have:

  1. Accusations sans evidence by one or more agencies of the US government
  2. Chest-thumping by the New York Times (and others) in both reporting (sic) and editorial columns
  3. Articles/editorials rely on unnamed government sources or financially interested contractors
  4. Months without any evidence but more chest-thumping by US government agencies and their familiars

When all four of those properties are found, you are at least part way to identifying yet another repetition of the attempted assassination of Pope John Paul II fable.

Although, quite honestly, it needs a catchier moniker than that one.
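As a rough illustration of the four-property test above, here is a minimal Python sketch. It is not the Topic Maps data model; the `Story` class, its field names, and the sample titles are all hypothetical placeholders of mine.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """Hypothetical record kept for one news story (all fields are assumptions)."""
    title: str
    accusation_without_evidence: bool    # property 1
    press_chest_thumping: bool           # property 2
    unnamed_or_interested_sources: bool  # property 3
    months_without_evidence: bool        # property 4


def matches_pattern(story: Story) -> bool:
    """True only when all four properties from the list are present."""
    return all((
        story.accusation_without_evidence,
        story.press_chest_thumping,
        story.unnamed_or_interested_sources,
        story.months_without_evidence,
    ))


def titles_matching(stories):
    """Titles of the stories that share the four-property identity."""
    return [s.title for s in stories if matches_pattern(s)]


# Placeholder data, invented purely to exercise the functions.
sample = [
    Story("story A", True, True, True, True),
    Story("story B", True, True, False, True),
]
print(titles_matching(sample))
```

Treating the four properties as a joint test means a story missing any one of them is not merged with the others, which is the conservative choice for a first pass.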


January 1, 2017

Historic American Newspapers (Bulk OCR Data Find!)

Filed under: Uncategorized — Patrick Durusau @ 9:41 pm

Historic American Newspapers

From the webpage:

Search America’s historic newspaper pages from 1789-1924 or use the U.S. Newspaper Directory to find information about American newspapers published between 1690-present.

A total of 2,134 newspapers, digitized (images) and searchable. Some 11,520,159 pages for searching and review.

Quite a treasure trove for genealogy types, writers of primary/secondary research papers, people trying to escape the smoothing influence of history books over historical events, and others.

Did I mention the site has an API?

Or that it offers access to all of its OCR data in bulk?

It’s not “big data” in the sense of the astronomy community but creating sub-sets for local communities of “their papers” would have a certain cachet.


The Best And Worst Data Stories Of 2016

Filed under: Data Science,Humor — Patrick Durusau @ 9:13 pm

The Best And Worst Data Stories Of 2016 by Walt Hickey.

From the post:

It’s time once again to dole out FiveThirtyEight’s Data Awards, our annual (OK, we’ve done it once before) chance to honor those who did remarkably good stuff with data, to shame those who did remarkably bad stuff with data, and to acknowledge the key numbers that help describe what went down over the past year. As always, these are based on the considered analysis of an esteemed panel of judges, by which I mean that I pestered people around the FiveThirtyEight offices until they gave me some suggestions.

I had to list this under both data science and humor. 😉

What “…bad stuff with data…” stories do you know and how will you avoid being listed in 2017? (Assuming there is another listing.)

I suspect we learn more from data fail stories than ones that report success.



OpenTOC (ACM SIG Proceedings – Free)

Filed under: Computer Science,Open Access — Patrick Durusau @ 8:59 pm


From the webpage:

ACM OpenTOC is a unique service that enables Special Interest Groups to generate and post Tables of Contents for proceedings of their conferences enabling visitors to download the definitive version of the contents from the ACM Digital Library at no charge.

Downloads of these articles are captured in official ACM statistics, improving the accuracy of usage and impact measurements. Consistently linking to definitive versions of ACM articles should reduce user confusion over article versioning.

Conferences are listed by year, 2014 – 2016 and by event.

A step in the right direction.

Do you know if the digital library allows bulk downloading of search result metadata?

It didn’t the last time I had a digital library subscription. Contacting the secret ACM committee that decides on web features was verboten.

Enjoy this improvement in access while waiting for ACM access bottlenecks to wither and die.

Hoaxy (beta)

Filed under: Journalism,News,Reporting — Patrick Durusau @ 7:44 pm

Hoaxy (beta)

From the faq:

Q: What is Hoaxy?
A: Hoaxy visualizes the spread of claims and related fact checking online. A claim may be a fake news article, hoax, rumor, conspiracy theory, satire, or even an accurate report. Anyone can use Hoaxy to explore how claims spread across social media. You can select any matching fact-checking articles to observe how those spread as well.
Q: How does it work?
A: We track the social sharing of links to stories published by two types of websites: (1) Independent fact-checking organizations that routinely fact check unverified claims. (2) Sources that often publish inaccurate, unverified, or satirical claims according to lists compiled and published by reputable news and fact-checking organizations.
Q: What does the visualization show?
A: Hoaxy visualizes two aspects of the spread of claims and fact checking: temporal trends and diffusion networks. Temporal trends plot the cumulative number of Twitter shares over time. The user can zoom in on any time interval. Diffusion networks display how claims spread from person to person. Each node is a Twitter account and two nodes are connected if a meme (link to a story) is passed between those two accounts via retweets, replies, quotes, or mentions. The color of a connection indicates the type of information: claims and fact checks. Clicking on an edge reveals the tweet(s) and the link to the shared story; clicking on a node reveals claims shared by the corresponding user. The network may be pruned for performance.
Q: Who decides what is true or not?
A: We do not decide what is true or false. Not all claims you can visualize on Hoaxy are false, nor can we track all false stories. We aren’t even saying that the fact checkers are 100% correct all the time. You can use the Hoaxy tool to observe how unverified stories and the fact checking of those stories spread on public social media. We welcome users to click on links to fact-checking sites to see what they’ve found in their research, but it’s up to you to evaluate the evidence about a claim and its rebuttal.
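The two views the FAQ describes, temporal trends and diffusion networks, can be sketched in a few lines of Python. This is not Hoaxy's code: the record layout, the account names, and the choice of undirected edges are assumptions of mine for illustration only.

```python
from collections import Counter, defaultdict
from itertools import accumulate

# Hypothetical share records: (day, from_account, to_account, kind).
shares = [
    (1, "alice", "bob", "claim"),
    (1, "alice", "carol", "claim"),
    (2, "bob", "dave", "claim"),
    (2, "erin", "carol", "fact_check"),
    (4, "carol", "dave", "fact_check"),
]


def cumulative_shares(records, kind):
    """Temporal trend: running total of shares of one kind, per day seen."""
    per_day = Counter(day for day, _, _, k in records if k == kind)
    days = sorted(per_day)
    totals = accumulate(per_day[d] for d in days)
    return list(zip(days, totals))


def diffusion_network(records):
    """Diffusion network: one edge per pair of accounts, labeled by kind."""
    edges = defaultdict(set)
    for _, src, dst, kind in records:
        # frozenset makes the edge undirected (an assumption, see above).
        edges[frozenset((src, dst))].add(kind)
    return edges


print(cumulative_shares(shares, "claim"))
print(len(diffusion_network(shares)), "edges")
```

The edge labels play the role of the colors in the visualization: each connection carries the kinds of information (claims, fact checks) passed along it.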


My only difficulty was in thinking of a “false story” that would be of interest to me in my day to day reading.

Who publishes false stories about XQuery or software vulnerabilities?

OK, I concede that I take all government statements, findings, etc., as false until confirmed by someone I do trust, and that vendors fall into the same camp as governments.

Those false stories aside, which rarely see contradiction in public, I don’t know what other false stories to ask about.

Can you help me?

Which false stories return the best propagation graphs in Hoaxy?


Low Risk Blockading of the DC Beltway

Filed under: Protests — Patrick Durusau @ 3:57 pm

The pre-conversion Ebenezer Scrooge must have designed the DMV regulations for the District of Columbia.

In particular the part that reads:

Vehicle suddenly experienced mechanical problems that prevented you from moving it.

Running out of gas is not a valid defense to a ticket.

In addition to a ticket (depending on where you run out of gas), NBC Washington reports towing fees are $100, plus $20 per day storage fees.

Add to that potential damage to your car by the towing company, theft/damage in the towing lot, hassle, plus needing to get to work, it’s a big commitment.

That is, if you use your car to run out of gas to blockade the DC Beltway.

Data science and a little searching to the rescue!

There are many car and commercial/moving truck (hint, hint) locations in the Washington, D.C. area. You can pull up Google Maps for Washington, D.C. and then search for auto rentals:


This is representative only and you will find more locations and a wider selection searching or consulting local resources.

You will owe money on the rental but a rental removes the danger to your car, the hassle of getting your car back and any worries about getting to work (assuming you aren’t arrested of course).

Of course, there are unemployed and under-employed youth who might welcome a chance for a day’s employment driving a car until it runs out of gas on the Beltway. Mean spirited people could construe that as a conspiracy so use your own judgement on the risks.

For a rental deposit, walking around money, plus whatever is due under the rental agreement, at no risk to your vehicle or you, you could have a major impact. #DisruptJ20

I’m still working on the data science aspect of this problem:

The I-495 Loop is four (4) lanes wide, not counting the emergency lanes on the inside and outside.

A naive answer to the question of how many vehicles it takes to blockade all of I-495 (one side) is six (the four travel lanes plus the emergency lanes to insure full stoppage).

The Beltway could be blockaded by six vehicles at any point around its 64-mile length but is that only tactic available to potential blockaders?

Moreover, among the various tactics available to blockaders, what locations/strategies should they consider?

Those and other questions can be explored using public data and data science tools.

Defeating “Fake News” Without Mark Zuckerberg

Filed under: Censorship,Facebook,Free Speech,Politics — Patrick Durusau @ 1:53 pm

Despite a lack of proof that “fake news” is a problem, Mark Zuckerberg and others have taken up the banner of public censors on behalf of us all, whether any of us are interested in their assistance or not.

In countering calls for and toleration of censorship, you may find it helpful to point out that “fake news” isn’t new.

There are any number of spot instances of fake news. Michael J. Socolow reports in: Reporting and punditry that escaped infamy:

As the day wore on, real reporting receded, giving way to more speculation. Right-wing commentator Fulton Lewis Jr. told an audience five hours after the attack that he shared the doubts of many American authorities that the Japanese were truly responsible. He “reported” that US military officials weren’t convinced Japanese pilots had the skills to carry out such an impressive raid. The War Department, he said, is “concerned to find out who the pilots of these planes are—whether they are Japanese pilots. There is some doubt as to that, some skepticism whether they may be pilots of some other nationality, perhaps Germans, perhaps Italians,” he explained. The rumor that Germans bombed Pearl Harbor lingered on the airwaves, with NBC reporting, on December 8, that eyewitnesses claimed to have seen Nazi swastikas painted on some of the bombers.

You may object that there was much confusion, that the pundits weren’t trying to deceive, or offer any number of other excuses. And you can repeat those for other individual instances of “fake news.” They simply don’t compare to the flood of intentionally “fake” publications available today.

I disagree but point taken. Let’s look back to an event that, like the internet, enabled a comparative flood of information to be available to readers, the invention of the printing press.

Elizabeth Eisenstein in The Printing Revolution in Early Modern Europe characterizes the output of the first fifty years of printing presses saying:

…it seems necessary to qualify the assertion that the first half-century of printing gave “a great impetus to wide dissemination of accurate knowledge of the sources of Western thought, both classical and Christian.” The duplication of the hermetic writings, the sibylline prophecies, the hieroglyphics of “Horapollo” and many other seemingly authoritative, actually fraudulent esoteric writings worked in the opposite direction, spreading inaccurate knowledge even while paving the way for purification of Christian sources later on.
…(emphasis added) (page 48)

I take Eisenstein to mean that knowingly fraudulent materials were being published, which seems to be the essence of the charge against the authors of “fake news” today.

As far as the quantity of the printing press equivalent to “fake news,” she remarks:

Compared to the large output of unscholarly vernacular materials, the number of trilingual dictionaries and Greek or even Latin editions seems so small that one wonders whether the term “wide dissemination” ought to be applied to the latter case at all.
… (page 48)

To be fair, “unscholarly vernacular materials” includes texts intended to be accurate as well as “fake” texts.

The Printing Revolution in Early Modern Europe is the abridged version of Eisenstein’s The printing press as an agent of change : communications and cultural transformations in early modern Europe, which has the footnotes and references to enable more precision on early production figures.

Suffice it to say, however, that no 15th-century equivalent to Mark Zuckerberg arrived upon the scene to save everyone from “…actually fraudulent esoteric writings … spreading inaccurate knowledge….”

The world didn’t need Mark Zuckerberg’s censoring equivalent in the 15th century and it doesn’t need him now.
