Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 14, 2016

Yahoo News Feed dataset, version 1.0 (1.5TB) – Sorry, No Open Data At Yahoo!

Filed under: Data,Dataset,Yahoo! — Patrick Durusau @ 9:10 pm

R10 – Yahoo News Feed dataset, version 1.0 (1.5TB)

From the webpage:

The Yahoo News Feed dataset is a collection based on a sample of anonymized user interactions on the news feeds of several Yahoo properties, including the Yahoo homepage, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Movies, and Yahoo Real Estate. The dataset stands at a massive ~110B lines (1.5TB bzipped) of user-news item interaction data, collected by recording the user-news item interaction of about 20M users from February 2015 to May 2015. In addition to the interaction data, we are providing the demographic information (age segment and gender) and the city in which the user is based for a subset of the anonymized users. On the item side, we are releasing the title, summary, and key-phrases of the pertinent news article. The interaction data is timestamped with the user’s local time and also contains partial information of the device on which the user accessed the news feeds, which allows for interesting work in contextual recommendation and temporal data mining.

The dataset may be used by researchers to validate recommender systems, collaborative filtering methods, context-aware learning, large-scale learning algorithms, transfer learning, user behavior modeling, content enrichment and unsupervised learning methods.

The readme file for this dataset is located in part 1 of the download. Please refer to the readme file for a detailed overview of the dataset.

A great data set but one you aren’t going to see unless you have a university email account.

I thought that when it took my regular Yahoo! login and I accepted the license agreement, I was in. Not a chance!

No open data at Yahoo!

Why Yahoo! would have such a restriction, particularly in light of the progress towards open data, is a complete mystery.

To be honest, even if I heard Yahoo!’s “reasons,” I doubt I would find them convincing.

If you have a university email address, good for you, download and use the data.

If you don’t have a university email address, can you ping me with the email of a decision maker at Yahoo! who can void this no-open-data policy?

Thanks!

Successful Cyber War OPS As Of 2016.01.05 – (But Fear Based Marketing Works)

Filed under: Cybersecurity,Government,Humor,Marketing,Security — Patrick Durusau @ 5:49 pm

From the text just below the interactive map:

This map lists all unclassified Cyber Squirrel Operations that have been released to the public that we have been able to confirm. There are many more executed ops than displayed on this map however, those ops remain classified.

You can select by squirrel or other animal, by year, even by month, and the map shows successful cyber operations.

Squirrels are in the lead with 623 successes, versus one success by the United States (Stuxnet).

Be careful who you show this map to.

Any sane person will laugh and agree that squirrels are a greater danger to the U.S. power grid than any fantasized terrorist.

On the other hand, non-laughing people are making money from speaking engagements, consultations, government contracts, etc., all premised on fear of terrorists attacking the U.S. power grid.

People who laugh at the Cyber Squirrel 1 map, not so much.

They say it is the lizard part of your brain that controls “…fight, flight, feeding, fear, freezing-up, and fornication.”

That accords with my view that if we aren’t talking about fear, greed or sex, then we aren’t talking about marketing.

Are you willing to promote world views and uses of technology (think big data) that you know are in fact false or useless? At least in the current fear-of-terrorists mode, it’s nearly a guaranteed payday.

Or are you looking for work from employers who realize that if you are willing to lie in order to gain a contract or consulting gig, you are just as willing to lie to them?

Your call.

PS: You can get CyberSquirrel1 Unit Patches, 5 for $5.00, but if you put them on your laptop, you may have to leave it at home, depending upon the client.

Can You Help With Important But Non-Visual Story? – The Blue People

Filed under: Intellectual Property (IP),Law — Patrick Durusau @ 4:34 pm

In Accelerate Your Newsgathering and Verification I reported on a post in which 3 out of 5 newsgathering tools were focused on images. But as I mention there, important but non-visual stories also need improved tools for newsgathering and verification.

The copyright struggle between the Blue People and Carl Malamud is an important, but thus far, non-visual story.

Here’s the story in a nutshell:

Laws, court decisions, agency rulings, etc., that govern our daily lives, are found in complex document stores. They have complex citation systems to enable anyone to find a particular law, decision, or rule.

Those systems are like the Dewey Decimal system or the Library of Congress classification, except several orders of magnitude more complex. And the systems vary from state to state, etc.

It’s important to get citations right; well, let’s let the BlueBook speak for itself:

The primary purpose of a citation is to facilitate finding and identifying the authority cited…. (A Uniform System of Citation, Tenth Edition, page iv.)

If you are going to quote a law, or even find it, you must have the correct citation.

In order to compel people to obey the law, they must have fair notice of it. And it stands to reason that if you can’t find the law, having no access to a citation guide, you are SOL as far as access to the law goes.

The courts come into the picture, being as lazy as if not lazier than programmers, by referring to the “BlueBook” as the standard for citations. Courts could have written out their citation practices but as I said, courts are lazy.

Over time, the courts enshrined their references to the “BlueBook” in court rules, which grants the “BlueBook” an informal monopoly on legal citations and access to the law.

As you have guessed by now, the Blue People, with their government-created, unregulated monopoly, charge for the privilege of knowing how to find the law.

The Blue People are quite fond of their monopoly and are loath to relinquish it. Even though a compilation of how statutes, regulations and court decisions are cited in fact is “sweat of the brow” work and not eligible for copyright protection.

A Possible Solution, Based on Capturing Public Facts

The answer to claims of copyright by the Blue People is to collect evidence of the citation practices in all fifty states and federal practice and publish such evidence along with advisory comments on usage.

Fifty law student/librarians could accomplish the task in parallel using modern search technologies and legal databases. Their findings would need to be collated but once done, every state plus federal practice, including nuances, would be easily accessible to anyone.

The courts, as practitioners of precedent,* will continue to support their self-created BlueBook monopoly.

But most judges will have difficulty distinguishing Holder, Attorney General, et al. v. Humanitarian Law Project et al. 561 U. S. 1 (2010) (following the BlueBook) and Holder, Attorney General, et al. v. Humanitarian Law Project et al. 561 U. S. 1 (2010) (following the U.S. Supreme Court and/or some recording of how cases are cited by the US Supreme Court).

If you are in the legal profession or aspire to be, don’t forget Jonathan Swift’s observation in Gulliver’s Travels:

It is a maxim among these lawyers that whatever has been done before, may legally be done again: and therefore they take special care to record all the decisions formerly made against common justice, and the general reason of mankind. These, under the name of precedents, they produce as authorities to justify the most iniquitous opinions; and the judges never fail of directing accordingly.

The inability of courts to distinguish between “BlueBook” and “non-BlueBook” citations will over time render their observance of precedent a nullity.

Not as satisfying as riding them and the Blue People down with war horns blowing but just as effective.

The Need For Visuals

If you have read this far, you obviously don’t need visuals to keep your interest in a story. Particularly a story about access to law and similarly exciting topics. It is an important topic, just not one that really gets your blood pumping.

How would you create visuals to promote public access to the laws that govern our day-to-day lives?

I’m no artist but one thought would be to show people trying to consult law books that are chained shut by their citations. Or perhaps one or two of the identifiable Blue People as Jacob Marley type figures with bound law books and heavy chains about them?

The “…could have shared…might have shared…” lines would work well with access to legal materials.

Ping me with suggested images. Thanks!

Accelerate Your Newsgathering and Verification

Filed under: Journalism,News,Reporting,Verification — Patrick Durusau @ 11:28 am

5 vital browser plugins for newsgathering and verification by Alastair Reid.

From the post:

When breaking news can travel the world in seconds, it is important for journalists to have the tools at their disposal to get to work fast. When searching the web, what quicker way is there to have those tools available than directly in the browser window?

Most browsers have a catalogue of programs and software to make your browsing experience more powerful, like a smartphone app store. At First Draft we find Google’s Chrome browser is the most effective but there are obviously other options available.

Here are five of the most useful browser extensions for finding and checking newsworthy material online.

Alastair details five browser plugins to accelerate your search for information on breaking news stories.

Three of the five are focused on images, which can be very powerful but are useful only for a limited range of stories.

Accounts of the skulduggery of government agencies, of standards organizations such as ANSI, or of copyright antics by the Blue People, as Carl Malamud calls them, are rarely accompanied by gripping images.

That’s not to denigrate stories with a strong visual element but to say that tools are needed to improve newsgathering and verification of not-terribly-visual stories.

January 13, 2016

Congressional Roll Call Vote – The Documents – Part 2 (XQuery)

Filed under: Government,XML,XQuery — Patrick Durusau @ 11:54 pm

In Congressional Roll Call Vote – The Documents (XQuery) we looked at the initial elements found in FINAL VOTE RESULTS FOR ROLL CALL 705. Today we continue our examination of those elements, starting with <vote-data>.

As before, use ctrl-u in your browser to display the XML source for that page. Look for </vote-metadata>; the next element is <vote-data>, which contains all the votes cast by members of Congress, as follows:

<recorded-vote>
<legislator name-id="A000374" sort-field="Abraham" unaccented-name="Abraham" party="R" state="LA" role="legislator">Abraham</legislator>
<vote>Nay</vote>
</recorded-vote>
<recorded-vote>
<legislator name-id="A000370" sort-field="Adams" unaccented-name="Adams" party="D" state="NC" role="legislator">Adams</legislator>
<vote>Yea</vote>
</recorded-vote>

These are only the first two (2) records, but the other <recorded-vote> elements vary from these only in their content.

I have introduced line returns to make it clear that <recorded-vote> … </recorded-vote> begin and end each record. Also note that <legislator> and <vote> are siblings.

What you didn’t see in the upper part of this document were the attributes that appear inside the <legislator> element.

Some of the attributes are: name-id="A000374", state="LA", role="legislator".

In an XQuery, we address attributes by writing out the path to the element containing the attributes and then appending the attribute.

For example, for name-id="A000374", we could write:

rollcall-vote/vote-data/recorded-vote/legislator[@name-id = "A000374"]

That path selects the <legislator> element bearing that attribute and value, and gives us access to the attribute value itself.

Recalling that:

rollcall-vote – Root element of the document.

vote-data – Direct child of the root element.

recorded-vote – Direct child of the vote-data element (with many siblings).

legislator – Direct child of recorded-vote.

@name-id – One of the attributes of legislator.

As I mentioned in our last post, there are other ways to access elements and attributes but many useful things can be done with direct descendant XPaths.
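To make the path concrete, here is a minimal sketch of a query using it. The local file name roll705.xml is my assumption; you could equally pass http://clerk.house.gov/evs/2015/roll705.xml to doc():

(: look up one legislator's vote by the name-id attribute :)
for $rec in doc("roll705.xml")/rollcall-vote/vote-data/recorded-vote
where $rec/legislator/@name-id = "A000374"
return concat($rec/legislator, " voted ", $rec/vote)

Saved as a .xq file and run with your XQuery engine, it should print something like: Abraham voted Nay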

In preparation for our next post, try searching for “A000374”, limiting your search to the domain congress.gov.

It is a good practice to search on unfamiliar attribute values. You never know what you may find!

Until next time!

‘Something is rotten in the state of Denmark’

Filed under: Government,Politics — Patrick Durusau @ 9:27 pm

You probably recognize the title as a line from Shakespeare’s Hamlet.

An unfortunately apt phrase to describe the current state of politics in Denmark, once you read: Danish MPs debate plan to seize refugees’ valuables.

From the story:

Denmark’s parliament began debating Wednesday a controversial plan to seize refugees’ valuables, with the bill widely expected to be passed in a January 26 vote after being backed by a majority of lawmakers.

The bill has been criticised by UN refugee agency UNHCR which fears it will “fuel fear” and “xenophobia”.

The proposal would allow Danish authorities to seize asylum seekers’ cash exceeding 10,000 kroner ($1,450), as well as any individual items valued at more than 10,000 kroner.

Wedding rings would be exempt, along with other items of sentimental value, such as engagement rings, family portraits and medals.

Prime Minister Lars Lokke Rasmussen’s right-wing government has faced a wave of criticism over its proposal, which had initially put the limit for refugees at 3,000 kroner ($437).

I thought valuables were looted from refugees just before they were herded into gas chambers.

Instead, the Danish government seeks to impoverish them and make life itself an affliction.

Is the government hopeful for a high suicide rate among the refugees? To save the expense of building death camps?

President Sanders should build a wall around Denmark, declare a no-fly zone and blockade its ports.

It may be the only way to prevent the spread of such heartlessness across Europe.

Self-Learn Yourself Apache Spark in 21 Blogs – #1

Filed under: Spark — Patrick Durusau @ 9:05 pm

Self-Learn Yourself Apache Spark in 21 Blogs – #1 by Kumar Chinnakali.

From the post:

We have received many requests from friends who are constantly reading our blogs to provide them a complete guide to sparkle in Apache Spark. So here we have come up with learning initiative called “Self-Learn Yourself Apache Spark in 21 Blogs”.

We have drilled down various sources and archives to provide a perfect learning path for you to understand and excel in Apache Spark. These 21 blogs which will be written over a course of time will be a complete guide for you to understand and work on Apache Spark quickly and efficiently.

We wish you all a Happy New Year 2016 and start the year with rich knowledge. From dataottam we wish you good luck to “ROCK Apache Spark & the New Year 2016”

I’m not sure what to say about this series of posts. The title is promising enough but it takes until post #4 before you get any substantive content and not much then. Perhaps it will pick up as time goes by.

Worth a look but too soon to be excited about it.

I first saw this in a tweet by Kirk Borne.

You Up To Improving Traversals/DSLs/OLAP in TinkerPop 3.2.0?

Filed under: DSL,Graphs,OLAP,TinkerPop,Traversal — Patrick Durusau @ 8:52 pm

Big ideas for Traversals/DSLs/OLAP in TinkerPop 3.2.0 by Marko A. Rodriguez.

Marko posted a note earlier today that reads in part:

There is currently no active development on TinkerPop 3.2.0, however, in my spare time I’ve been developing (on paper) some new ideas that should make traversals, DSLs, and OLAP even better.

Problem #1: The Builder pattern for TraversalSources is lame. [https://issues.apache.org/jira/browse/TINKERPOP-971]

Problem #2: It is not natural going from OLTP to OLAP to OLTP to OLAP. [https://issues.apache.org/jira/browse/TINKERPOP-570]

I mention this because it has been almost seven (7) hours since Marko posted this note and it’s not like he is covered up with responses!

Myself included, but I’m not qualified to comment on his new ideas. One or more of you are. Take up the challenge!

TinkerPop, the community and you will be better for it.

Enjoy!

Automatically Finding Weapons…

Filed under: Image Processing,Image Recognition,Intelligence,Open Source Intelligence — Patrick Durusau @ 8:35 pm

Automatically Finding Weapons in Social Media Images Part 1 by Justin Seitz.

From the post:

As part of my previous post on gangs in Detroit, one thing had struck me: there are an awful lot of guns being waved around on social media. Shocker, I know. More importantly I began to wonder if there wasn’t a way to automatically identify when a social media post has guns or other weapons contained in them. This post will cover how to use a couple of techniques to send images to the Imagga API that will automatically tag pictures with keywords that it feels accurately describe some of the objects contained within the picture. As well, I will teach you how to use some slicing and dicing techniques in Python to help increase the accuracy of the tagging. Keep in mind that I am specifically looking for guns or firearm-related keywords, but you can easily just change the list of keywords you are interested in and try to find other things of interest like tanks, or rockets.

This blog post will cover how to handle the image tagging portion of this task. In a follow up post I will cover how to pull down all Tweets from an account and extract all the images that the user has posted (something my students do all the time!).

This rocks!

Whether you are trying to make contact with a weapon owner who isn’t in the “business” of selling guns or are looking for like-minded individuals, this is a great post.

Would make an interesting way to broadly tag images for inclusion in group subjects in a topic map, awaiting further refinement by algorithm or humans.

This is a great blog to follow: Automating OSINT.

Using ‘R’ for betting analysis [Data Science For The Rest Of Us]

Filed under: Data Analysis,R — Patrick Durusau @ 4:45 pm

Using ‘R’ for betting analysis by Mirio Mella.

From the post:

Gaining an edge in betting often boils down to intelligent data analysis, but faced with daunting amounts of data it can be hard to know where to start. If this sounds familiar, R – an increasingly popular statistical programming language widely used for data analysis – could be just what you’re looking for.

What is R?

R is a statistical programming language that is used to visualize and analyse data. Okay, this sounds a little intimidating but actually it isn’t as scary as it may appear. Its creators – two professors from New Zealand – wanted an intuitive statistical platform that their students could use to slice and dice data and create interesting visual representation like 3D graphs.

Given its relative simplicity but endless scope for applications (packages) R has steadily gained momentum amongst the world’s brightest statisticians and data scientists. Facebook use R for statistical analysis of status updates and many of the complex word clouds you might see online are powered by R.

There are now thousands of user created libraries to enhance R functionality and given how much successful betting boils down to effective data analysis, packages are being created to perform betting related analysis and strategies.

On a day when the PowerBall lottery has a jackpot of $1.5 billion, a post on betting analysis is appropriate.

Especially since most data science articles are about sentiment analysis and recommendations, all of which is great if you are marketing videos in a streaming environment across multiple media channels.

At home? Not so much.

Mirio’s introduction to R walks you through getting R installed along with a library for Pinnacle Sports for odds conversion.

No guarantees on your betting performance but having a subject you are interested in, betting, makes it much more likely you will learn R.

Enjoy!

Cracka Bags DNI James Clapper! Kudos!

Filed under: Cybersecurity,Privacy — Patrick Durusau @ 3:53 pm

US Intelligence director’s personal e-mail, phone hacked by Sean Gallagher.

From the post:

The same individual or group claiming to be behind a recent breach of the personal e-mail account of CIA Director John Brennan now claims to be behind the hijacking of the accounts of Director of National Intelligence James Clapper. The Office of the Director of National Intelligence confirmed to Motherboard that Clapper was targeted and that the case has been forwarded to law enforcement.

Someone going by the moniker “Cracka,” claiming to be with a group of “teenage hackers” called “Crackas With Attitude,” told Motherboard’s Lorenzo Franceschi-Bicchiarai that he had gained access to Clapper’s Verizon FiOS account and changed the settings for his phone service to forward all calls to the Free Palestine Movement. Cracka also claimed to have gained access to Clapper’s personal e-mail account and his wife’s Yahoo account.

See the rest of Sean’s post for the details but really, good show!

If Cracka is not an NSA operative, this hack was:

  1. Without national security letters to phone providers
  2. Without the melting NSA data center in Utah
  3. Without highly paid consultants and contractors
  4. Without government-only grade hardware
  5. Without secret information about phone networks
  6. etc.

Sounds like all our data is already easily available to government agents, just not the ones who think of Excel as data processing. 😉

As I said before, Cracka/s needs to dump all the data they can access and then announce the hack.

Unless and until the government joins its citizens in the same goldfish bowl, its attitude towards privacy will never change. Perhaps not even then but it’s worth a shot.

PS: For the automatic response: “But people will get hurt by open data dumping!” And your point? People are being hurt now by government secrecy and invasions of their privacy.

I wonder what your basis is for choosing who it is acceptable to hurt across an entire nation? I forgot, that’s a secret, isn’t it?

January 12, 2016

Who Is Lying About Encryption?

Filed under: Cybersecurity,Government,Security — Patrick Durusau @ 10:39 pm

Canadian Cops Can Decrypt PGP BlackBerrys Too by Joseph Cox.

From the post:

On Monday, Motherboard reported that leading Dutch forensics investigators say they are able to read encrypted messages sent on PGP BlackBerry phones—custom devices which are advertised as more suited for secure communication than off-the-shelf models.

A myriad of other law enforcement agencies would not comment on whether they have this capability, but court documents reviewed by Motherboard show that the Royal Canadian Mounted Police (RCMP) can also decrypt messages from PGP BlackBerrys.

“This encryption was previously thought to be undefeatable,” one 2015 court document in a drug trafficking case reads, referring to the PGP encryption used to secure messages on a BlackBerry device. “The RCMP technological laboratory destroyed this illusion and extracted from this phone 406 e-mails, 25 address book entries and other information all of which had been protected.”

In another case from 2015, centering around charges of kidnap and assault, three out of four BlackBerrys seized by the RCMP were analysed by the “Technical Assistance Team in Ottawa and the contents were decrypted and reports prepared.”

Reports such as this one make you wonder: who is lying about encryption?

This report makes current encryption sound like a cheap bicycle lock that can be defeated by anyone.

On the other hand, there are known luddites like FBI Director James Comey, who claim that government must be able to read encrypted files.

Is the “we can’t read the files” simply a ploy for more funding?

Or is current encryption really as good as the “rhythm” method of birth control?

Complicating matters, encryption is a tough subject, one where even honest experts disagree about techniques and their safety.

Even with your best encryption, remember two rules for transmitting digital data:

  1. Send as little data as possible.
  2. What data you send should have as short a life span as possible.

For example, “Meet at location N in 20 minutes,” has an operational lifespan of about 25 minutes. Beyond that, even if broken, it’s useless.

BTW, don’t save on burner phones by using the same phone day after day. Why do you think they call them “burner” phones?

Note the Canadian case with 406 e-mails. That’s just irresponsible.

[Don’t] …Join the National Security State

Filed under: Free Speech,Government,Privacy,Security — Patrick Durusau @ 10:13 pm

Social Media Companies Should Decline the Government’s Invitation to Join the National Security State by Hugh Handeyside.

The pressure on social media companies to limit or take down content in the name of national security has never been greater. Resolving any ambiguity about how much the Obama administration values the companies’ cooperation, the White House on Friday dispatched the highest echelon of its national security team — including the Attorney General, the FBI Director, the Director of National Intelligence, and the NSA Director — to Silicon Valley for a meeting with technology executives chaired by the White House Chief of Staff himself. The agenda for the meeting tried to convey a locked-arms sense of camaraderie, asking, “How can we make it harder for terrorists to leveraging [sic] the internet to recruit, radicalize, and mobilize followers to violence?”

Congress, too, has been turning up the heat. On December 16, the House passed the Combat Terrorist Use of Social Media Act, which would require the President to submit a report on “United States strategy to combat terrorists’ and terrorist organizations’ use of social media.” The Senate is considering a far more aggressive measure which would require providers of Internet communications services to report to government authorities when they have “actual knowledge” of “apparent” terrorist activity (a requirement that, because of its vagueness and breadth, would likely harm user privacy and lead to over-reporting).

The government is of course right that terrorists use social media, including to recruit others to their cause. Indeed, social media companies already have systems in place for catching real threats, incitement, or actual terrorism. But the notion that social media companies can or should scrub their platforms of all potentially terrorism-related content is both unrealistic and misguided. In fact, mandating affirmative monitoring beyond existing practices would sweep in protected speech and turn the social media companies into a wing of the national security state.

The reasons not to take that route are both practical and principled. On a technical level, it would be extremely difficult, if not entirely infeasible, to screen for actual terrorism-related content in the 500 million tweets that are generated each day, or the more than 400 hours of video uploaded to YouTube each minute, or the 300 million daily photo uploads on Facebook. Nor is it clear what terms or keywords any automated screening tools would use — or how using such terms could possibly exclude beliefs and expressive activity that are perfectly legal and non-violent, but that would be deeply chilled if monitored for potential links to terrorism.

Hugh makes a great case why social media companies should resist becoming arms of the national security state.

You should read his essay in full and I would add only one additional point:

Do you and/or your company want to be remembered for resisting the security state or as collaborators? The choice is that simple.

Law as Pay-to-Play – ASTM International vs. Public.Resource.org, Inc.

Filed under: Government,Intellectual Property (IP),Law — Patrick Durusau @ 8:10 pm

Carl Malamud has been hitting Twitter hard today as he posts links to new materials in ASTM International vs. Public.Resource.org, Inc. (case docket).

The crux of the case is whether a legal authority, like the United States, can pass a law that requires citizens to buy materials from private organizations, in order to know what the law says.

That is, a law will cite a standard, say one by ASTM, and you are bound by the terms of that law, which aren’t clear unless you have a copy of the standard from ASTM. ASTM will be more than happy to sell you a copy.

It’s interesting that ASTM, which has reasonable membership fees of $75 a year, would be the lead plaintiff in this case.

There are technical committees associated with ANSI that have membership fees of $1,200 or more per year. And that is the lowest membership category.

I deeply enjoyed Carl’s tweet that described the ANSI amicus brief as “the sky is falling.”

No doubt from ANSI’s perspective, if Public.Resource.org, Inc. prevails, which it should under any sensible notice of the law reasoning, the sky will be falling.

ANSI and its kin profit by creating a closed club of well-heeled vendors who can pay for early access and participate in development of standards.

You have heard the term “white privilege?” In the briefs for ASTM and its friends, you will realize how deeply entrenched “corporate privilege” is in the United States. The ANSI brief is basically “this is how we do it and it works for us, go away.” No sense of other at all.

There is a running implication that standards development organizations (SDOs) have to sell copies of standards to support standards activity. At least on a quick skim, I haven’t seen any documentation on that point. In fact, the W3C, which makes a large number of standards, seems to do ok giving standards away for free.

I can’t help but wonder how the presiding judge will react should a data leak from one of the plaintiffs prove that the “sale of standards” is entirely specious from a financial perspective. That is membership, the “pay-to-play,” is really the deciding factor.

That doesn’t strengthen or weaken the public notice of the law but I do think it is a good indication of the character of the plaintiffs and the lengths they are willing to go to preserve corporate privilege.

In case you are still guessing, I’m on the side of Public.Resource.org.

January 11, 2016

Congressional Roll Call Vote – The Documents (XQuery)

Filed under: Government,XML,XQuery — Patrick Durusau @ 10:41 pm

I assume you have read my new starter post for this series: Congressional Roll Call Vote and XQuery (A Do Over). If you haven’t and aren’t already familiar with XQuery, take a few minutes to go read it now. I’ll wait.

The first XML document we need to look at is FINAL VOTE RESULTS FOR ROLL CALL 705. If you press ctrl-u in your browser, the XML source of that document will be displayed.

The top portion of that document, before you see <vote-data> reads:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rollcall-vote PUBLIC "-//US Congress//DTDs/vote
v1.0 20031119 //EN" "http://clerk.house.gov/evs/vote.dtd">
<?xml-stylesheet type="text/xsl" href="http://clerk.house.gov/evs/vote.xsl"?>
<rollcall-vote>
<vote-metadata>
<majority>R</majority>
<congress>114</congress>
<session>1st</session>
<chamber>U.S. House of Representatives</chamber>
<rollcall-num>705</rollcall-num>
<legis-num>H R 2029</legis-num>
<vote-question>On Concurring in Senate Amdt with
Amdt Specified in Section 3(a) of H.Res. 566</vote-question>
<vote-type>YEA-AND-NAY</vote-type>
<vote-result>Passed</vote-result>
<action-date>18-Dec-2015</action-date>
<action-time time-etz="09:49">9:49 AM</action-time>
<vote-desc>Making appropriations for military construction, the
Department of Veterans Affairs, and related agencies for the fiscal
year ending September 30, 2016, and for other purposes</vote-desc>
<vote-totals>
<totals-by-party-header>
<party-header>Party</party-header>
<yea-header>Yeas</yea-header>
<nay-header>Nays</nay-header>
<present-header>Answered “Present”</present-header>
<not-voting-header>Not Voting</not-voting-header>
</totals-by-party-header>
<totals-by-party>
<party>Republican</party>
<yea-total>150</yea-total>
<nay-total>95</nay-total>
<present-total>0</present-total>
<not-voting-total>1</not-voting-total>
</totals-by-party>
<totals-by-party>
<party>Democratic</party>
<yea-total>166</yea-total>
<nay-total>18</nay-total>
<present-total>0</present-total>
<not-voting-total>4</not-voting-total>
</totals-by-party>
<totals-by-party>
<party>Independent</party>
<yea-total>0</yea-total>
<nay-total>0</nay-total>
<present-total>0</present-total>
<not-voting-total>0</not-voting-total>
</totals-by-party>
<totals-by-vote>
<total-stub>Totals</total-stub>
<yea-total>316</yea-total>
<nay-total>113</nay-total>
<present-total>0</present-total>
<not-voting-total>5</not-voting-total>
</totals-by-vote>
</vote-totals>
</vote-metadata>

One of the first skills you need to learn to make effective use of XQuery is how to recognize paths in an XML document.

I’ll do the first several and leave some of the others for you.

<rollcall-vote> – the root element – aka “parent” element

<vote-metadata> – first child element in this document
XPath rollcall-vote/vote-metadata

<majority>R</majority> – first child of <vote-metadata>
XPath rollcall-vote/vote-metadata/majority

<congress>114</congress>

What do you think? It looks like the same level as <majority>R</majority> and it is; it is called a sibling of <majority>R</majority>
XPath rollcall-vote/vote-metadata/congress

Caveat: There are ways to go back up the XPath and to reach siblings and attributes. For the moment, let’s get good at spotting direct XPaths.

Let’s skip down in the markup until we come to <totals-by-party-header>. It’s not followed, at least not immediately, by </totals-by-party-header>. That’s a signal that the previous siblings have stopped and we have another step in the XPath.

<totals-by-party-header>
XPath: rollcall-vote/vote-metadata/vote-totals/totals-by-party-header (note that <vote-totals>, which opens just above it, adds a step along the way)

<party-header>Party</party-header>
XPath: rollcall-vote/vote-metadata/vote-totals/totals-by-party-header/party-header

As you may suspect, the next four elements are siblings of <party-header>Party</party-header>

<yea-header>Yeas</yea-header>
<nay-header>Nays</nay-header>
<present-header>Answered “Present”</present-header>
<not-voting-header>Not Voting</not-voting-header>

The closing tag, shown by the “/”, signals the end of the <totals-by-party-header> element.

</totals-by-party-header>

See how you do mapping out the remaining XPaths from the top of the document.

<totals-by-party>
<party>Republican</party>
<yea-total>150</yea-total>
<nay-total>95</nay-total>
<present-total>0</present-total>
<not-voting-total>1</not-voting-total>
</totals-by-party>
<totals-by-party>
<party>Democratic</party>
<yea-total>166</yea-total>
<nay-total>18</nay-total>
<present-total>0</present-total>
<not-voting-total>4</not-voting-total>
</totals-by-party>
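To check your path mapping, here is a minimal XQuery sketch; the local file name roll705.xml is my assumption (save the XML source to disk first, or pass the URL above to doc()):

(: walk one of the mapped paths: party names and yea totals :)
for $party in doc("roll705.xml")/rollcall-vote/vote-metadata/vote-totals/totals-by-party
return concat($party/party, ": ", $party/yea-total, " yeas")

It should print three lines, one each for the Republican, Democratic and Independent totals.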

Tomorrow we are going to dive into the structure of the <vote-data> and how to address the attributes therein and their values.

Enjoy!

Intuition, deliberation, and the evolution of cooperation [hackathons for example?]

Filed under: Cooperation,Evoluntionary,Marketing — Patrick Durusau @ 5:51 pm

Intuition, deliberation, and the evolution of cooperation by Adam Bear and David G. Rand.

Significance:

The role of intuition versus deliberation in human cooperation has received widespread attention from experimentalists across the behavioral sciences in recent years. Yet a formal theoretical framework for addressing this question has been absent. Here, we introduce an evolutionary game-theoretic model of dual-process agents playing prisoner’s dilemma games. We find that, across many types of environments, evolution only ever favors agents who (i) always intuitively defect, or (ii) are intuitively predisposed to cooperate but who, when deliberating, switch to defection if it is in their self-interest to do so. Our model offers a clear explanation for why we should expect deliberation to promote selfishness rather than cooperation and unifies apparently contradictory empirical results regarding intuition and cooperation.

Abstract:

Humans often cooperate with strangers, despite the costs involved. A long tradition of theoretical modeling has sought ultimate evolutionary explanations for this seemingly altruistic behavior. More recently, an entirely separate body of experimental work has begun to investigate cooperation’s proximate cognitive underpinnings using a dual-process framework: Is deliberative self-control necessary to reign in selfish impulses, or does self-interested deliberation restrain an intuitive desire to cooperate? Integrating these ultimate and proximate approaches, we introduce dual-process cognition into a formal game-theoretic model of the evolution of cooperation. Agents play prisoner’s dilemma games, some of which are one-shot and others of which involve reciprocity. They can either respond by using a generalized intuition, which is not sensitive to whether the game is one-shot or reciprocal, or pay a (stochastically varying) cost to deliberate and tailor their strategy to the type of game they are facing. We find that, depending on the level of reciprocity and assortment, selection favors one of two strategies: intuitive defectors who never deliberate, or dual-process agents who intuitively cooperate but sometimes use deliberation to defect in one-shot games. Critically, selection never favors agents who use deliberation to override selfish impulses: Deliberation only serves to undermine cooperation with strangers. Thus, by introducing a formal theoretical framework for exploring cooperation through a dual-process lens, we provide a clear answer regarding the role of deliberation in cooperation based on evolutionary modeling, help to organize a growing body of sometimes-conflicting empirical results, and shed light on the nature of human cognition and social decision making.

Guidance for the formation of new communities, i.e., between strangers?

Critically, selection never favors agents who use deliberation to override selfish impulses: Deliberation only serves to undermine cooperation with strangers.

How would you motivate the non-deliberative formation of an online community for creating a topic map?

It just occurred to me, is the non-deliberative principle in play at hackathons? Where there are strangers but not sufficient time or circumstances to deliberate on your contribution and return on that contribution?

Hackathons, the ones I have read about, tend to be physical, summer camp type events. Is physical presence and support a key?

If you were going to do a topic map hackathon, physical or online, what would be its focus?

I first saw this in a tweet by Steve Strogatz.

…[N]ew “GraphStore” core – Gephi

Filed under: Gephi,Graphs,Visualization — Patrick Durusau @ 5:29 pm

Gephi boosts its performance with new “GraphStore” core by Mathieu Bastian.

From the post:

Gephi is a graph visualization and analysis platform – the entire tool revolves around the graph the user is manipulating. All modules (e.g. filter, ranking, layout etc.) touch the graph in some way or another and everything happens in real-time, reflected in the visualization. It’s therefore extremely important to rely on a robust and fast underlying graph structure. As explained in this article we decided in 2013 to rewrite the graph structure and started the GraphStore project. Today, this project is mostly complete and it’s time to look at some of the benefits GraphStore is bringing into Gephi (whose 0.9 release is approaching).

Performance is critical when analyzing graphs. A lot can be done to optimize how graphs are represented and accessed in the code but it remains a hard problem. The first versions of Gephi didn’t always shine in that area as the graphs were using a lot of memory and some operations such as filter were slow on large networks. A lot was learnt though and when the time came to start from scratch we knew what would move the needle. Compared to the previous implementation, GraphStore uses simpler data structures (e.g. more arrays, less maps) and cache-friendly collections to make common graph operations faster. Along the way, we relied on many micro-benchmarks to understand what was expensive and what was not. As often with Java, this can lead to surprises but it’s a necessary process to build a world-class graph library.

What shall we say about the performance numbers?

IMPRESSIVE!

The tests were against “two different classic graphs, one small (1.5K nodes, 19K edges) and one medium (83K nodes, 68K edges).”

Less than big data size graphs but isn’t the goal of big data analysis to extract the small portion of relevant data from the big data?

Yes?

Maybe there should be an axiom about gathering of irrelevant data into a big data pile, only to be excluded again.

Or premature graphification of largely irrelevant data.

Something to think about as you contribute to the further development of this high-performing graph library.

Enjoy!

50 Predictions for the Internet of Things in 2016 (Ebola Moment for Software Development?)

Filed under: Cybersecurity,IoT - Internet of Things,Security — Patrick Durusau @ 5:03 pm

50 Predictions for the Internet of Things in 2016 by David Oro.

From the post:

Earlier this year I wrote a piece asking “Do you believe the hype?” It called out an unlikely source of hype: the McKinsey Global Institute. The predictions for IoT in the years to come are massive. Gartner believes IoT is a central tenet of top strategic technology trends in 2016. Major technology players are also taking Big Swings. Louis Columbus, writing for Forbes, gathered all the 2015 market forecasts and estimates here.

So what better way to end the year and look into the future than by asking the industry for their predictions for the IoT in 2016. We asked for predictions aimed at the industrial side of the IoT. What new technologies will appear? Which companies will succeed or fail? What platforms will take off? What security challenges will the industry face? Will enterprises finally realize the benefits of IoT? We heard from dozens of startups, big players and industry soothsayers. In no particular order, here are the Internet of Things Predictions for 2016.

I count nine (9) statements from various industry leaders on IoT and you have to register to see the other forty-one (41).

I don’t have a prediction but do have a question:

Will an insecure IoT in 2016 cause enough damage to motivate better hardware/software engineering and testing practices?

I ask because 2015 was a banner year for data breaches: Data Breach Reports (ITRC), December 31, 2015, reports 169,068,506 records exposed in 2015.

Yet, where is the widespread discussion about better software engineering? (silence)

Yes, yes, let’s have more penalties for hackers, which have yet to be shown to improve cybersecurity.

Yes, yes, let’s all be more aware of security threats, except that most can’t be mitigated by those aware of them.

Apparently the exposure of 169,068,506 records in 2015 wasn’t enough to get anyone’s attention. Or at least anyone who could influence the software development process.

Odd because just the rumor of Ebola was enough to change medical intake procedures, from hospitals to general practices to dentists.

When is the Ebola moment coming for software engineering?

Webinar: Image Similarity: Deep Learning and Beyond (January 12th/Register for Recording)

Filed under: Deep Learning,Graphs,Similarity,Similarity Retrieval — Patrick Durusau @ 4:22 pm

Webinar: Image Similarity: Deep Learning and Beyond by Dato.

From the webpage:

In this talk, we will extract features from the convolutional networks applied to real estate images to build a similarity graph and then do label propagation on the images to label different images in our dataset.

Recommended for:

  • Data scientists and engineers
  • Developers and technical team managers
  • Technical product managers

What you’ll learn:

  • How to extract features from a convolutional network using GraphLab Create
  • How to build similarity graphs using nearest neighbors
  • How to implement graph algorithms such as PageRank using GraphLab Create

What we’ll cover:

  • Extracting features from convolutional networks
  • Building similarity graphs using nearest neighbors
  • Clustering: kmeans and beyond
  • Graph algorithms: PageRank and label propagation

I had mixed results with webinars in 2015.

Looking forward to this one because of the coverage of similarity graphs.

From a subject identity perspective, how much similarity do you need to be the “same” subject?

If I have two books, one printed within the copyright period and another copy printed after the work came into the public domain, are they the same subject?

For some purposes yes and for other purposes not.

The strings we give web browsers, usually starting with “https://” these days, are crude measures of subject identity, don’t you think?

I say “the strings we give web browsers” because TBL and his cronies, using popularity as a measure of success, continue their efforts to conflate URI, IRI, and URL into only URL. https://url.spec.whatwg.org/ The simplification doesn’t bother me as much as the attempts to conceal it.

It’s one way to bolster a claim to have always been right: just re-write the records that anyone is likely to remember. I prefer my history with warts and all.

JATS: Journal Article Tag Suite, Navigation Update!

Filed under: Publishing,XML — Patrick Durusau @ 8:28 am

I posted about the appearance of JATS: Journal Article Tag Suite, version 1.1 and then began to lazily browse the pdf.

I forget what I was looking for now but I noticed the table of contents jumped from page 42 to page 235, and again from 272 to 405. I’m thinking by this point “this is going to be a bear to find elements/attributes in.” I looked for an index only to find none. 🙁

But, there’s hope!

If you look at Chapter 7, “TAG Suite Components,” where elements start on page 7 and attributes on page 28, you will find:

[Image: JATS-nav – the element/attribute navigation table from Chapter 7]

Each ✔ is a navigation link to that element (or attribute if you are in the attribute section) under each of those divisions, Archiving, Publishing, Authoring.

Very cool but falls under “non-obvious” for me.

Pass it on so others can safely and quickly navigate JATS 1.1!

PS: It was Tommie Usdin of Balisage fame who pointed out the table in chapter 7 to me. Thanks Tommie!

January 10, 2016

Congressional Roll Call Vote and XQuery (A Do Over)

Filed under: Government,XML,XQuery — Patrick Durusau @ 10:11 pm

Once words are written, as an author I consider them to be fixed. Even typos should be acknowledged as being corrected, not silently “improved” in the original text. Rather than editing what has been said, more words can cover the same ground with the hope of doing so more completely or usefully.

I am starting my XQuery series of posts over with a view to being more systematic, including references to at least one popular XQuery book, along with my progress through a series of uses of XQuery.

You are going to need an XQuery engine for all but this first post to be meaningful so let’s cover getting that setup first.

There are any number of GUI interface tools that I will mention over time but for now, let’s start with Saxon.

Download Saxon, unzip the file and you can choose to put saxon9he.jar in your Java classpath (if set) or you can invoke it with the -cp (path to saxon9he.jar), as in java -cp (path to saxon9he.jar) net.sf.saxon.Query -q:query-file.

Classpaths are a mixed blessing at best but who wants to keep typing -cp (your path to saxon9he.jar) net.sf.saxon.Query -q: all the time?

What I have found very useful (Ubuntu system) is to create a short shell script that I can invoke from the command line, thus:

#!/bin/bash
java -cp /home/patrick/saxon/saxon9he.jar net.sf.saxon.Query -q:$1

After creating that file, which I very imaginatively named “runsaxon.sh,” I used chmod 755 to make it executable.

When I want to run Saxon at the command line, in the same directory with “runsaxon.sh” I type:

./runsaxon.sh ex-5.4.xq > ex-5.4.html

It is a lot easier and not subject to my fat-fingering of the keyboard.

The “>” sign is a redirection operator in Linux that sends the output to a file, in this case, ex-5.4.html.
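If you want to confirm the setup before running full examples, a one-line test query is enough. The file name test.xq is my choice, not from any book:

(: test.xq - a minimal query to confirm Saxon is working :)
concat("Hello from XQuery, 2 + 2 = ", 2 + 2)

Running ./runsaxon.sh test.xq should print: Hello from XQuery, 2 + 2 = 4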

The source of ex-5.4.xq (and its data file) is: XQuery, 2nd Edition by Priscilla Walmsley. Highly recommended.

Priscilla has put all of her examples online, XQuery Examples. Please pass that along with a link to her book if you use her examples.

If you have ten minutes, take a look at: Learn XQuery in 10 Minutes: An XQuery Tutorial *UPDATED* by Dr. Michael Kay. Michael Kay is also the author of Saxon.

By this point you should be well on your way to having a working XQuery engine and tomorrow we will start exploring the structure of the congressional roll call vote documents.

January 9, 2016

Congressional Roll Call and XQuery – (Week 1 of XQuery)

Filed under: Government,XML,XQuery — Patrick Durusau @ 9:49 pm

Truthfully a little more than a week of daily XQuery posts, I started a day or so before January 1, 2016.

I haven’t been flooded with suggestions or comments, ;-), so I read back over my XQuery posts and I see lots of room for improvement.

Most of my posts are on fairly technical topics and are meant to alert other researchers of interesting software or techniques. Most of them are not “how-to” or step by step guides, but some of them are.

The posts on congressional roll call documents made sense to me but then I wrote them. Part of what I sensed was that either you know enough to follow my jumps, in which case you are looking for specific details, like the correspondence across documents for attribute values, and not so much for my XQuery expressions.

On the other hand, if you weren’t already comfortable with XQuery, the correspondence of values between documents was the least of your concerns. Where the hell was all this terminology coming from?

I’m no stranger to long explanations, one of the standards I edit crosses the line at over 1,500 pages. But it hasn’t been my habit to write really long posts on this blog.

I’m going to spend the next week, starting tomorrow, re-working and expanding the congressional roll call vote posts to be more detailed for those getting into XQuery, with very terse, short expert tips at the end of each post if needed.

The expert part will have observations such as the correspondences in attribute values and other oddities that either you know or you don’t.

Will have the first longer style post up tomorrow, January 10, 2016 and we will see how the week develops from there.

Intuitionism and Constructive Mathematics 80-518/818 — Spring 2016

Filed under: Mathematical Reasoning,Mathematics,Topic Maps — Patrick Durusau @ 9:23 pm

Intuitionism and Constructive Mathematics 80-518/818 — Spring 2016

From the course description:

In this seminar we shall read primary and secondary sources on the origins and developments of intuitionism and constructive mathematics from Brouwer and the Russian constructivists, Bishop, Martin-Löf, up to and including modern developments such as homotopy type theory. We shall focus both on philosophical and metamathematical aspects. Topics could include the Brouwer-Heyting-Kolmogorov (BHK) interpretation, Kripke semantics, topological semantics, the Curry-Howard correspondence with constructive type theories, constructive set theory, realizability, relations to topos theory, formal topology, meaning explanations, homotopy type theory, and/or additional topics according to the interests of participants.

Texts

  • Jean van Heijenoort (1967), From Frege to Gödel: A Source Book in Mathematical Logic 1879–1931, Cambridge, MA: Harvard University Press.
  • Michael Dummett (1977/2000), Elements of Intuitionism (Oxford Logic Guides, 39), Oxford: Clarendon Press, 1977; 2nd edition, 2000.
  • Michael Beeson (1985), Foundations of Constructive Mathematics, Heidelberg: Springer Verlag.
  • Anne Sjerp Troelstra and Dirk van Dalen (1988), Constructivism in Mathematics: An Introduction (two volumes), Amsterdam: North Holland.

Additional resources

Not online but a Spring course at Carnegie Mellon with a reading list that should exercise your mental engines!

Any subject with a two-volume “introduction” (Anne Sjerp Troelstra and Dirk van Dalen) is likely to be heavy sledding. 😉

But the immediate relevance to topic maps is evident in this statement from Rosalie Iemhoff:

Intuitionism is a philosophy of mathematics that was introduced by the Dutch mathematician L.E.J. Brouwer (1881–1966). Intuitionism is based on the idea that mathematics is a creation of the mind. The truth of a mathematical statement can only be conceived via a mental construction that proves it to be true, and the communication between mathematicians only serves as a means to create the same mental process in different minds.

I would recast that to say:

Language is a creation of the mind. The truth of a language statement can only be conceived via a mental construction that proves it to be true, and the communication between people only serves as a means to create the same mental process in different minds.

There are those who claim there is some correspondence between language and something they call “reality.” Since no one has experienced “reality” in the absence of language, I prefer to ask: Is X useful for purpose Y? rather than the doubtful metaphysics of “Is X true?”

Think of it as helping get down to what’s really important, what’s in this for you?

BTW, don’t be troubled by anyone who suggests this position removes all limits on discussion. What motivations do you think caused people to adopt the varying positions they have now?

It certainly wasn’t a detached and disinterested search for the truth, whatever people may pretend once they have found the “truth” they are presently defending. The same constraints will persist even if we are truthful with ourselves.

January 8, 2016

Congressional Roll Call Vote – Join/Merge Remote XML Files (XQuery)

Filed under: Government,XML,XQuery — Patrick Durusau @ 10:59 pm

One of the things that yesterday’s output lacked was the full names of the Georgia representatives, which aren’t reported in the roll call documents.

But, what the roll call documents do have, is the following:

<recorded-vote>
<legislator name-id="J000288" sort-field="Johnson (GA)" unaccented-name="Johnson (GA)"
party="D" state="GA" role="legislator">Johnson (GA)</legislator>
<vote>Nay</vote>
</recorded-vote>

With emphasis on name-id="J000288"

I call that attribute out because there is a sample data file, just for the House of Representatives, that has:

<bioguideID>J000288</bioguideID>

And yes, the “name-id” attribute and the <bioguideID> share the same value for Henry C. “Hank” Johnson, Jr. of Georgia.

As far as I can find, that relationship between the “name-id” value in roll call result files and the House Member Data File is undocumented. You have to be paying attention to the data values in the various XML files at Congress.gov.

The result of the XQuery script today has the usual header but for members of the Georgia delegation, the following:

[Image: congress-ga-phone – Georgia delegation members with their votes and phone numbers]

That is the result of joining/merging two XML files hosted at congress.gov in real time. You can substitute any roll call vote and your state as appropriate and generate a similar webpage for that roll call vote.

The roll call vote file I used for this example is: http://clerk.house.gov/evs/2015/roll705.xml and the House Member Data File was: http://xml.house.gov/MemberData/MemberData.xml. The MemberData.xml file dates from April of 2015 so it may not have the latest data on any given member. Documentation for House Member Data in XML (pdf).

The main XQuery expression for merging the two XML files:

{
for $voter in doc("http://clerk.house.gov/evs/2015/roll705.xml")//recorded-vote,
    $mem in doc("http://xml.house.gov/MemberData/MemberData.xml")//member/member-info
where $voter/legislator[@state = 'GA'] and $voter/legislator/@name-id = $mem//bioguideID
return <li>{string($mem//official-name)} — {string($voter/vote)} — {string($mem//phone)}</li>
}
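That snippet is the expression as it appears in the script; a complete query built around it might look like the sketch below. The <html>/<ul> wrapper is my assumption about the surrounding markup, which this post doesn’t show; only the FLWOR expression comes from the script above:

(: sketch: the join wrapped in an assumed HTML result document :)
<html>
<body>
<h2>Roll Call 705 - Georgia Delegation</h2>
<ul>{
for $voter in doc("http://clerk.house.gov/evs/2015/roll705.xml")//recorded-vote,
    $mem in doc("http://xml.house.gov/MemberData/MemberData.xml")//member/member-info
where $voter/legislator[@state = 'GA'] and $voter/legislator/@name-id = $mem//bioguideID
return <li>{string($mem//official-name)} — {string($voter/vote)} — {string($mem//phone)}</li>
}</ul>
</body>
</html>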

At a minimum, you can auto-generate a listing of your state’s representatives and their votes on any roll call vote, along with the phone numbers readers can call to register their opinions.
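
If you want to reuse the script without editing the query body, here is a sketch using external variables. This assumes an XQuery 3.0 processor that supports default values for external variables (BaseX, for example):

declare variable $state external := "GA";
declare variable $roll external := "http://clerk.house.gov/evs/2015/roll705.xml";

<ul>
{for $voter in doc($roll)//recorded-vote,
$mem in doc("http://xml.house.gov/MemberData/MemberData.xml")//member/member-info
where $voter/legislator[@state = $state] and $voter/legislator/@name-id = $mem//bioguideID
return <li> {string($mem//official-name)} — {string($voter/vote)} — {string($mem//phone)}</li>
}</ul>

Bind $state and $roll when you invoke the script and the same query covers any state and any roll call vote.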

This is a crude example of what you can do with XML, XQuery and online data from Congress.gov.

BTW, if you work in a “sharing” environment at a media outlet or library, you can also join/merge data that you hold internally, say the private phone number of a congressional aide.

We are not nearly done with the congressional roll call vote, but you can already see the potential that XQuery offers for very little effort. Not to mention that XQuery scripts can be rapidly adapted to your library or newsroom.

Try out today’s XQuery roll705-join-merge.xq.txt for yourself. (Apologies for the “.txt” extension but my ISP host has ideas about “safe” files to upload.)

I realize this first week has been kinda haphazard in its presentation. Suggestions welcome on improvements as this series goes forward.

The government and others are cranking out barely useful XML by the boatload. XQuery is your ticket to creating personalized presentations dynamically from that XML and other data.

Enjoy!

PS: For display of XML and XQuery, should I be using a different Word template? Suggestions?

JATS: Journal Article Tag Suite, version 1.1

Filed under: Publishing,XML — Patrick Durusau @ 5:42 pm

JATS: Journal Article Tag Suite, version 1.1

Abstract:

The Journal Article Tag Suite provides a common XML format in which publishers and archives can exchange journal content. The JATS provides a set of XML elements and attributes for describing the textual and graphical content of journal articles as well as some non-article material such as letters, editorials, and book and product reviews.

Documentation and help files: Journal Article Tag Suite.

Tommie Usdin (of Balisage fame) posted to Facebook:

JATS has added capabilities to encode:
– NISO Access License and Indicators
– additional support for multiple language documents and for Japanese documents (including Ruby)
– citation of datasets
and some other things users of version 1.0 have requested.

Another XML vocabulary that provides grist for your XQuery adventures!
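
For instance, here is a minimal sketch that pulls the title and author surnames out of a JATS-tagged article. The file name article.xml is an assumption; the paths are the standard JATS front-matter elements:

let $meta := doc("article.xml")/article/front/article-meta
return <record>
<title>{string($meta/title-group/article-title)}</title>
<authors>{string-join($meta/contrib-group/contrib/name/surname, ", ")}</authors>
</record>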

Helmification of XML Unicode

Filed under: Emacs,Unicode — Patrick Durusau @ 4:22 pm

XML Unicode by Norman Walsh.

From the webpage:

XML Unicode provides some convenience methods for inserting Unicode characters. When it started, the focus was on characters that were traditionally inserted with named character entities, things like é.

In practice, and in the age of UTF-8, the “insert unicode character” function, especially the Helm-enabled version, is much more broadly useful.

You’re most likely going to want to bind some or all of them to keys.

Complete with suggested key bindings!

Oh, the image from Norman’s tweet:

[Image: helm-xml-unicode: Helm interface for inserting Unicode characters]

FYI, the earliest use of helm-ification (note the hyphen) I can find was on November 24, 2015 by Christian Romney. Citation authorities remain split on whether Christian’s helm-ification or Norman’s helmification is the correct usage. 😉

The Truth About Change (Management, Social, Technical)

Filed under: Marketing — Patrick Durusau @ 4:01 pm

Open Mind tweeted the comment “Accurate” along with this image:

[Image: change-want: two frames, “Who wants change?” (all hands raised) vs. “Who wants to change?” (no hands)]

To make this a true triple, a third frame should read:

Who wants someone else to change?

Then you would see all the hands in the air again.

Taking resistance to change as given, how do you adapt to that for marketing purposes?

Image Error Level Analyser [Read: Detects Fake Photos]

Filed under: Image Processing,News,Verification — Patrick Durusau @ 11:49 am

Image Error Level Analyser by Jonas Wagner.

From the webpage:

I created a new, better tool to analyze digital images. It’s also free and web based. It features error level analysis, clone detection and more. You should try it right now.

Image error level analysis is a technique that can help to identify manipulations to compressed (JPEG) images by detecting the distribution of error introduced after resaving the image at a specific compression rate. You can find some more information about this technique in my blog post about this experiment and in this presentation by Neal Krawetz which served as the inspiration for this project. He also has a nice tutorial on how to interpret the results. Please do not take the results of this tool too seriously. It’s more of a toy than anything else.

Doug Mahugh pointed me to this resource in response to a post on detecting fake photos.

Now you don’t have to wait for the National Enquirer to post a photo of the current president shaking hands with aliens. With a minimum of effort you can, and people do, flood the Internet with fake photos.

Some fakes you can spot without assistance, Donald Trump being polite for instance, but other images will be more challenging. That’s where tools such as this one will save you the embarrassment of passing on images everyone but you knows are fakes.

Enjoy!

January 7, 2016

Localizing A Congressional Roll Call Vote (XQuery)

Filed under: Government,XML,XQuery — Patrick Durusau @ 10:07 pm

I made some progress today on localizing a congressional roll call vote.

As you might expect, I chose to localize to the representatives from Georgia. 😉

I used a FLWOR expression to select legislators where the attribute state = GA.

Here is that expression:

<ul>
{for $voter in doc("http://clerk.house.gov/evs/2015/roll705.xml")//recorded-vote
where $voter/legislator[@state = 'GA']
return <li> {string($voter/legislator)} — {string($voter/vote)}</li>
}</ul>

That makes our localized display a bit better for local readers, but only just.

See roll705-local.html.
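
While we are here, the O in FLWOR (order by) is an easy win: each legislator element carries a sort-field attribute, so sorting the delegation alphabetically takes one extra line. A minimal sketch:

<ul>
{for $voter in doc("http://clerk.house.gov/evs/2015/roll705.xml")//recorded-vote
where $voter/legislator[@state = 'GA']
order by $voter/legislator/@sort-field
return <li> {string($voter/legislator)} — {string($voter/vote)}</li>
}</ul>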

What we need is more information than can be found at http://clerk.house.gov/evs/2015/roll705.xml.

More on that tomorrow!

PostgreSQL 9.5: UPSERT, Row Level Security, and Big Data

Filed under: BigData,PostgreSQL,SQL,XML,XQuery — Patrick Durusau @ 5:26 pm

PostgreSQL 9.5: UPSERT, Row Level Security, and Big Data

Let’s reverse the order of the announcement so it reads in reader-friendly order:

Downloads

Press kit

Release Notes

What’s New in 9.5

Edit: I moved my comments above the fold, as it were:

Just so you know, the PostgreSQL 9.5 documentation, 9.14.2.2 XMLEXISTS, says:

Also note that the SQL standard specifies the xmlexists construct to take an XQuery expression as first argument, but PostgreSQL currently only supports XPath, which is a subset of XQuery.

Apologies, you will have to scroll for the subsection; there was no anchor at 9.14.2.2.

If you are looking to make a major contribution to PostgreSQL, note that XQuery is on the todo list.

Now for all the stuff that you will skip reading anyway. 😉

(I would save the prose for use in reports to management about using or transitioning to PostgreSQL 9.5.)

7 JANUARY 2016: The PostgreSQL Global Development Group announces the release of PostgreSQL 9.5. This release adds UPSERT capability, Row Level Security, and multiple Big Data features, which will broaden the user base for the world’s most advanced database. With these new capabilities, PostgreSQL will be the best choice for even more applications for startups, large corporations, and government agencies.

Annie Prévot, CIO of the CNAF, the French Child Benefits Office, said, “The CNAF is providing services for 11 million persons and distributing 73 billion Euros every year, through 26 types of social benefit schemes. This service is essential to the population and it relies on an information system that must be absolutely efficient and reliable. The CNAF’s information system is satisfyingly based on the PostgreSQL database management system.”

UPSERT

A most-requested feature by application developers for several years, “UPSERT” is shorthand for “INSERT, ON CONFLICT UPDATE”, allowing new and updated rows to be treated the same. UPSERT simplifies web and mobile application development by enabling the database to handle conflicts between concurrent data changes. This feature also removes the last significant barrier to migrating legacy MySQL applications to PostgreSQL.

Developed over the last two years by Heroku programmer Peter Geoghegan, PostgreSQL’s implementation of UPSERT is significantly more flexible and powerful than those offered by other relational databases. The new ON CONFLICT clause permits ignoring the new data, or updating different columns or relations in ways which will support complex ETL (Extract, Transform, Load) toolchains for bulk data loading. And, like all of PostgreSQL, it is designed to be absolutely concurrency-safe and to integrate with all other PostgreSQL features, including Logical Replication.

Row Level Security

PostgreSQL continues to expand database security capabilities with its new Row Level Security (RLS) feature. RLS implements true per-row and per-column data access control which integrates with external label-based security stacks such as SE Linux. PostgreSQL is already known as “the most secure by default.” RLS cements its position as the best choice for applications with strong data security requirements, such as compliance with PCI, the European Data Protection Directive, and healthcare data protection standards.

RLS is the culmination of five years of security features added to PostgreSQL, including extensive work by KaiGai Kohei of NEC, Stephen Frost of Crunchy Data, and Dean Rasheed. Through it, database administrators can set security “policies” which filter which rows particular users are allowed to update or view. Data security implemented this way is resistant to SQL injection exploits and other application-level security holes.

Big Data Features

PostgreSQL 9.5 includes multiple new features for bigger databases, and for integrating with other Big Data systems. These features ensure that PostgreSQL continues to have a strong role in the rapidly growing open source Big Data marketplace. Among them are:

BRIN Indexing: This new type of index supports creating tiny, but effective indexes for very large, “naturally ordered” tables. For example, tables containing logging data with billions of rows could be indexed and searched in 5% of the time required by standard BTree indexes.

Faster Sorts: PostgreSQL now sorts text and NUMERIC data faster, using an algorithm called “abbreviated keys”. This makes some queries which need to sort large amounts of data 2X to 12X faster, and can speed up index creation by 20X.

CUBE, ROLLUP and GROUPING SETS: These new standard SQL clauses let users produce reports with multiple levels of summarization in one query instead of requiring several. CUBE will also enable tightly integrating PostgreSQL with more Online Analytic Processing (OLAP) reporting tools such as Tableau.

Foreign Data Wrappers (FDWs): These already allow using PostgreSQL as a query engine for other Big Data systems such as Hadoop and Cassandra. Version 9.5 adds IMPORT FOREIGN SCHEMA and JOIN pushdown making query connections to external databases both easier to set up and more efficient.

TABLESAMPLE: This SQL clause allows grabbing a quick statistical sample of huge tables, without the need for expensive sorting.

“The new BRIN index in PostgreSQL 9.5 is a powerful new feature which enables PostgreSQL to manage and index volumes of data that were impractical or impossible in the past. It allows scalability of data and performance beyond what was considered previously attainable with traditional relational databases and makes PostgreSQL a perfect solution for Big Data analytics,” said Boyan Botev, Lead Database Administrator, Premier, Inc.
