Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 12, 2015

Python Week 2015 (Packt Publishing)

Filed under: Books,Python — Patrick Durusau @ 7:48 pm

Python Week 2015 (Packt Publishing)

Packt Publishing is giving away free ebooks and offering 20% off their top selling Python books and videos.

The free book for today (good for approximately 22 hours from this posting):

Building Machine Learning Systems with Python

Expand your Python knowledge and learn all about machine-learning libraries in this user-friendly manual. ML is the next big breakthrough in technology and this book will give you the head-start you need.

  • Master Machine Learning using a broad set of Python libraries and start building your own Python-based ML systems
  • Covers classification, regression, feature engineering, and much more guided by practical examples
  • A scenario-based tutorial to get into the right mind-set of a machine learner (data exploration) and successfully implement this in your new or existing projects

I didn’t know this was Python week! 😉

BTW, there is a website devoted to awareness days, weeks, months: http://www.national-awareness-days.com/

They seem to take the idea quite seriously but they didn’t have Python week on their calendar.

October 11, 2015

Learning FP the hard way:…

Filed under: Elm,Functional Programming,Programming — Patrick Durusau @ 2:44 pm

Learning FP the hard way: Experiences on the Elm language by Ossi Hanhinen.

From the webpage:

Foreword

A good friend of mine convinced me about the benefits of Reactive Programming not that long ago. It almost feels wrong not to write Functional Reactive Programming — apparently the functional methodology lends itself magnificently to complement reactive programming. How, I did not know. So I decided to learn this stuff.

Knowing myself, I soon realized I would only ever get in the mindset if I had to solve some actual problems using the techniques. Having written JavaScript for the past few years, I could have just gone for RxJS. But again, I knew myself and felt it would give me way too much room for “going against the grain”. I needed something that would enforce me to solve everything in a functional manner. This is where Elm comes in.

Elm? What’s that?

Elm is a programming language that compiles to HTML5: HTML, CSS and JavaScript. Depending on how you display your output, it may be a <canvas> with objects inside, or a more traditional web page. Let me repeat. Elm is a language, which compiles into the three languages used to build web apps. What’s more, it is a functional reactive programming language with strong types and immutable data structures.

Okay, so you may have gathered I am no expert in this field, but in case you’re lost, here are my short explanations on the terminology: Appendix: Glossary.

I can think of three reasons why you should read Ossi’s post:

#3 – You like functional programming languages.

#2 – Elm sounds like a great way to avoid writing HTML5: HTML, CSS and JavaScript.

And the #1 reason people will read Ossi’s post:

#1 – It provides a base for writing a Space Invaders type game! 😉

I liked Space Invaders but my personal favorite was Missile Command. My habit was so bad, once upon a time, that I kept a jar into which I had to put quarters for each game. Just to keep time spent playing to a reasonable amount.

Enjoy!

October 10, 2015

Journalism Books

Filed under: Journalism,News,Reporting — Patrick Durusau @ 8:24 pm

Journalism Books

A collection of books on journalism, said to:

…inform & inspire the future of journalism.

Some twenty (20) books, so it isn’t overwhelming like some of the cybersecurity sites that push current and outdated material with equal enthusiasm.

Journalists find and report information that some people would prefer they didn’t.

That buys them a lot of street cred in my book.

Despite having relatives who are journalists, I have read only some of these books. Makes a nice reading list for long winter nights! (Well, aside from XQuery archives and that sort of thing.) 😉

Titan Graph DB Performance Tips

Filed under: Graphs,TinkerPop,Titan — Patrick Durusau @ 2:33 pm

Titan Graph DB Performance Tips

From the post:

In Hawkular Inventory, we use the Tinkerpop API (version 2 for the time being) to store our inventory model in a graph database. We chose Titan as the storage engine configured to store the data in the Cassandra cluster that is also backing Hawkular Metrics and Alerts. This blog post will guide you through some performance-related lessons with Titan that we learned so far.

Inventory is under heavy development with a lot of redesign and refactoring going on between releases so we took quite a naive approach to storing and querying data from the graph database. That is, we store entities from our model as vertices in the graph and the relationships between the entities as edges in the graph. Quite simple and a school book example of how it should look like.

We did declare a couple of indices in the database on the read-only aspects of the vertices (i.e. a “type” of the entity the vertex corresponds to) but we actually didn’t pay too much attention to the performance. We wanted to have the model right first.

Fast forward a couple of months and of course, the performance started to be a real problem. The Hawkular agent for Wildfly is inserting a non-trivial amount of entities and not only inserting them but also querying them has seen a huge performance degradation compared to the simple examples we were unit testing with (due to number of vertices and edges stored).

The time has come to think about how to squeeze some performance out of Titan as well as how to store the data and query it more intelligently.

Several performance tips but the one that caught my eye and resulted in an order of magnitude performance gain:

3. Mirror Properties on The Edges

This is the single most important optimization we’ve done so far. The rationale is this. To jump from a vertex to another vertex over an edge is a fairly expensive operation. Titan uses the adjacency lists to store the vertices and their edges in wide rows in Cassandra. It uses another adjacency list for edges and their target vertices.

So to go from vertex to vertex, Titan actually has to do 2 queries. It would be much easier if we could avoid that at least in some cases.

The solution here is to copy the values of some (frequently used and, in our case, immutable) properties from the “source” and “target” vertices directly to the edges. This helps especially in the cases where you do some kind of filtering on the target vertices that you instead can do directly on the edges. If there is a high number of edges to go through, this helps tremendously because you greatly reduce the number of times you have to do the “second hop” to the target vertex.
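The mechanics are easy to see even outside Titan. Below is a minimal, dependency-free Python sketch (a toy data model of my own, not the Tinkerpop API) contrasting the two approaches: filtering on a property mirrored onto the edge skips loading the target vertex, which is the lookup that costs an extra backend read in Titan.

    # Toy property graph; the vertex/edge shapes are illustrative, not Titan's.
    vertices = {
        1: {"type": "feed"},
        2: {"type": "metric", "name": "cpu"},
        3: {"type": "metric", "name": "mem"},
        4: {"type": "resource", "name": "wildfly"},
    }

    # Each edge mirrors the immutable "type" property of its target vertex.
    edges = [
        {"source": 1, "target": 2, "label": "contains", "target_type": "metric"},
        {"source": 1, "target": 3, "label": "contains", "target_type": "metric"},
        {"source": 1, "target": 4, "label": "contains", "target_type": "resource"},
    ]

    def targets_two_hop(source_id, wanted_type):
        """Classic traversal: follow each edge, then fetch the target vertex
        just to inspect its type -- one extra lookup per edge."""
        return [e["target"] for e in edges
                if e["source"] == source_id
                and vertices[e["target"]]["type"] == wanted_type]  # second hop

    def targets_mirrored(source_id, wanted_type):
        """Filter on the mirrored edge property; edges that fail the filter
        never trigger a target vertex fetch at all."""
        return [e["target"] for e in edges
                if e["source"] == source_id
                and e["target_type"] == wanted_type]  # no second hop

    print(targets_two_hop(1, "metric"))   # [2, 3]
    print(targets_mirrored(1, "metric"))  # [2, 3]

In memory both versions are cheap; the payoff comes when each vertex access is a round trip to Cassandra.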

I am curious: what is being stored on the vertex that requires a second query to jump to the target vertex?

That is, if you have moved “popular” vertex properties to the edge, why not move the node’s other properties there as well?

Suggestions?

Request to Order Apple to Disable Security of Apple Device

Filed under: Cybersecurity,Government,Law,Security — Patrick Durusau @ 2:06 pm

From In Re Order Requiring Apple, Inc. To Assist in the Execution of a Search Warrant Issued by this Court (United States District Court, Eastern District of New York)

James Orenstein, Magistrate Judge:

In a sealed application filed on October 8, 2015, the government asks the court to issue an order pursuant to the All Writs Act, 28 U.S.C. § 1651, directing Apple, Inc. (“Apple”) to assist in the execution of a federal search warrant by disabling the security of an Apple device that the government has lawfully seized pursuant to a warrant issued by this court. Law enforcement agents have discovered the device to be locked, and have tried and failed to bypass that lock. As a result, they cannot gain access to any data stored on the device notwithstanding the authority to do so conferred by this court’s warrant. Application at 1. For the reasons that follow, I defer ruling on the application and respectfully direct Apple to submit its views in writing, not later than October 15, 2015, as to whether the assistance the government seeks is technically feasible and, if so, whether compliance with the proposed order would be unduly burdensome. If either the government or Apple wishes to present oral arguments on the matter, I will hear such argument on October 22, 2015, at 12:00 noon.

Non-lawyers may find the analysis of the All Writs Act a bit tedious but the opinion picks up speed in dealing with the government’s contention that the pen register decision (the recording of phone numbers dialed from a phone) in United States v. New York Tel. Co., 434 U.S. 159 (1977), supports their request.

To summarize the differences found by Judge Orenstein:

  1. Apple manufactured the device but unlike New York Tel. Co. (Telephone Company), Apple doesn’t own it.
  2. Apple is not a regulated utility with a duty to serve the public. It can make a deliberate decision to favor its customers over the needs of law enforcement (in the absence of statutes to the contrary).
  3. In the Telephone Company case, there was no practical alternative means of securing the information. Here the government can attempt to coerce the owner of the phone, for instance.
  4. In the Telephone Company case, Congress had legislated to require telephone companies to provide the assistance sought; comparable legislation for unlocking secure devices is absent, and has even been opposed in Congress.

If Apple has done its encryption properly, then even intimate knowledge of the encryption program should not enable Apple to unlock the device in question.

One hopes Apple will prove to the court’s satisfaction that once locked, even Apple cannot assist in the unlocking of such a device.

The government’s request is one borne of ignorance of basic encryption technology.
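For readers who want to see the point rather than take it on faith, here is a toy sketch of the principle (my illustration, not Apple’s actual design, which entangles the passcode with a hardware key inside a secure processor):

    import hashlib, os

    # Illustrative only: a real device does this inside dedicated hardware,
    # mixing in a per-device UID that never leaves the chip.
    def derive_key(passcode: str, device_salt: bytes) -> bytes:
        # Deliberately slow derivation: brute-force cost scales with iterations.
        return hashlib.pbkdf2_hmac("sha256", passcode.encode(), device_salt, 200_000)

    device_salt = os.urandom(16)   # stored on the device, useless by itself
    key = derive_key("4921", device_salt)
    print(key.hex())

    # The vendor never sees the passcode, so it cannot re-derive `key`.
    # Knowing the algorithm (this whole file!) does not help without "4921".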

I first saw this in a tweet by Morgan Marquis-Boire.

PS: Should at some point the court’s opinion “go away,” write and ask for “apple-unlock-gov.uscourts.nyed.376325.2.0.pdf.”

20 Cognitive Biases That Screw Up Your Decisions

Filed under: Bias,Decision Making — Patrick Durusau @ 12:55 pm

Samantha Lee and Shana Lebowitz created an infographic (Business Insider) of common cognitive biases.

Entertaining, informative, but what key insight is missing from this infographic?

[Infographic: 20 cognitive biases that screw up your decisions]

The original at Business Insider is easier to read.

What’s missing is the question: Where do I stand to see my own cognitive bias?

If I were already aware of it, I would avoid it in decision making. Yes?

So if I am not aware of it, how do I get outside of myself to spot such a bias?

One possible solution, with the emphasis on possible, is to consult with others who may not share your cognitive biases. They may have other ones, ones that are apparent to you but not to them.

No guarantees on that solution because most people don’t appreciate having their cognitive biases pointed out. Particularly if they are central to their sense of identity and self-worth.

Take the management at the Office of Personnel Management (OPM), who have repeatedly been shown to be incompetent not only in matters of cybersecurity but in management in general.

Among other biases, Office of Personnel Management suffers from 7. Confirmation bias, 8. Conservatism bias, 10. Ostrich effect, 17. Selective perception, and 20. Zero-risk bias.

The current infestation of incompetents at the Office of Personnel Management is absolutely convinced, judging from their responses to their Inspector General reports urging modern project management practices, that no change is necessary.

Personally I would fire everyone from the elevator operator (I’m sure they probably still have one) to the top and terminate all retirement and health benefits. That would not cure the technology problems at OPM but it would provide the opportunity for a fresh start at addressing them.

Cognitive biases, self-interest, and support of other incompetents doom reform at the OPM. You may as well wish upon a star.

I first saw this in a tweet by Christophe Lalanne.

AI vs. Taxpayer (so far, taxpayer wins)

Filed under: Evolutionary,Genetic Algorithms,Machine Learning — Patrick Durusau @ 7:19 am

Computer Scientists Wield Artificial Intelligence to Battle Tax Evasion by Lynnley Browning.

From the post:

When federal authorities want to ferret out abusive tax shelters, they send an army of forensic accountants, auditors and lawyers to burrow into suspicious tax returns.

Analyzing mountains of filings and tracing money flows through far-flung subsidiaries is notoriously difficult; even if the Internal Revenue Service manages to unravel a major scheme, it typically does so only years after its emergence, by which point a fresh dodge has often already replaced it.

But what if that needle-in-a-haystack quest could be done routinely, and quickly, by a computer? Could the federal tax laws — 74,608 pages of legal gray areas and welters of credits, deductions and exemptions — be accurately rendered in an algorithm?

“We see the tax code as a calculator,” said Jacob Rosen, a researcher at the Massachusetts Institute of Technology who focuses on the abstract representation of financial transactions and artificial intelligence techniques. “There are lots of extraordinarily smart people who take individual parts of the tax code and recombine them in complex transactions to construct something not intended by the law.”

A recent paper by Mr. Rosen and four other computer scientists — two others from M.I.T. and two at the Mitre Corporation, a nonprofit technology research and development organization — demonstrated how an algorithm could detect a certain type of known tax shelter used by partnerships.

I had to chuckle when I read:

“There are lots of extraordinarily smart people who take individual parts of the tax code and recombine them in complex transactions to construct something not intended by the law.”

It would be more accurate to say: “…something not intended by the tax policy wonks at the IRS.”

Or as Justice Sutherland said in Gregory v. Helvering (1935):

The legal right of a taxpayer to decrease the amount of what otherwise would be his taxes, or altogether to avoid them, by means which the law permits, cannot be doubted.

Gregory v. Helvering isn’t much comfort because Sutherland also found against the taxpayer in that case on a “not intended by the law” basis.

Still, if you read the paper you will realize taxpayers are still well ahead vis-a-vis any AI:

Drawbacks are that currently SCOTE has a very simplified view of transactions, audit points and law.

Should we revisit the Turing test?

Perhaps a series of tax code tests, 1040A, 1040 long form, corporate reorganization, each one more complex than the one before.

Pitch the latest AIs against tax professionals?

Skybox: A Tool to Help Investigate Environmental Crime

Filed under: Environment,Image Recognition — Patrick Durusau @ 6:47 am

Skybox: A Tool to Help Investigate Environmental Crime by Kristine M. Gutterød & Emilie Gjengedal Vatnøy.

From the post:

Today public companies have to provide reports with data, while many private companies do not have to provide anything. Most companies within the oil, gas and mining sector are private, and to get information can be both expensive and time-consuming.

Skybox is a new developing tool used to extract information from an otherwise private industry. Using moving pictures on ground level—captured by satellites—you can monitor different areas up close.

“You can dig into the details and get more valuable and action-filled information for people both in the public and private sector,” explained Patrick Dunagan, strategic partnerships manager at Google, who worked in developing Skybox.

The satellite images can be useful when investigating environmental crime because you can monitor different companies, for example the change in the number of vehicles approaching or leaving a property, as well as environmental changes in the world.

Excellent news!

Hopefully Skybox will include an option to link in ground level photographs that can identify license plates and take photos of drivers.

Using GPS coordinates with time data, activists will have a means of detecting illegal and/or new dumping sites for surveillance.

Couple that with license plate data and the noose starts to tighten on environmental violators.

You will still need to pierce the shell corporations and follow links to state and local authorities but catching the physical dumpers is a first step.

Information theory and Coding

Filed under: Cryptography,Encryption,Information Theory — Patrick Durusau @ 5:57 am

Information theory and Coding by Mathematicalmonk.

From the introduction video:

Overview of central topics in Information theory and Coding.

Compression (source coding) theory: Source coding theorem, Kraft-McMillan inequality, Rate-distortion theorem

Error-correction (channel coding) theory: Channel coding theorem, Channel capacity, Typicality and the AEP

Compression algorithms: Huffman codes, Arithmetic coding, Lempel-Ziv

Error-correction algorithms: Hamming codes, Reed-Solomon codes, Turbo codes, Gallager (LDPC) codes

There is a great deal of cross-over between information theory and coding, cryptography, statistics, machine learning and other topics. A grounding in information theory and coding will enable you to spot and capitalize on those commonalities.
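For a concrete taste of the coding side before you start the videos, here is a minimal Huffman coder in Python (my sketch, not course material): frequent symbols get short codewords, which is the source coding theorem at toy scale.

    import heapq
    from collections import Counter

    def huffman_codes(text):
        """Build a Huffman code for the symbols in text."""
        # Heap entries: (frequency, tiebreaker, {symbol: code-so-far}).
        heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(Counter(text).items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + code for s, code in c1.items()}
            merged.update({s: "1" + code for s, code in c2.items()})
            heapq.heappush(heap, (f1 + f2, tiebreak, merged))  # merge them
            tiebreak += 1
        return heap[0][2]

    codes = huffman_codes("abracadabra")
    print(codes)  # 'a' (5 of 11 symbols) gets the shortest codeword
    print("".join(codes[ch] for ch in "abracadabra"))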

October 9, 2015

Computational Legal Studies Blog

Filed under: Government,Law,Legal Informatics — Patrick Durusau @ 8:46 pm

Computational Legal Studies Blog by Daniel Katz, Mike Bommarito & Jon Zelner.

From the about page:

The Computational Legal Studies Blog was founded on March 17, 2009. The CLS Blog is an attempt to disseminate legal or law related studies that employ a computational or complex systems component. We hope this venue will serve as a coordinating device for those interested in using such techniques to consider the development of legal systems and/or implementation of more reasoned public policy.

It isn’t important that you believe in “…reasoned public policy” but that you realize a number of people do.

This site collects information and analysis that may be persuasive to “…reasoned public policy” types.

There are a large number of resources and if even a quarter of them are as good as this site, the time spent mining them will be well worth it.

Ping me if you see something extraordinary.

Thanks!

Using Graph Structure Record Linkage on Irish Census

Filed under: Census Data,Graphs,Neo4j — Patrick Durusau @ 8:26 pm

Using Graph Structure Record Linkage on Irish Census Data with Neo4j by Brian Underwood.

From the post:

For just over a year I’ve been obsessed on-and-off with a project ever since I stayed in the town of Skibbereen, Ireland. Taking data from the 1901 and 1911 Irish censuses, I hoped I would be able to find a way to reliably link resident records from the two together to identify the same residents.

Since then I’ve learned a bit about master data management and record linkage and so I thought I would give it another stab.

Here I’d like to talk about how I’ve been matching records based on the local data space around objects to improve my record linkage scoring.

An interesting issue that has currency with intelligence agencies slurping up digital debris at every opportunity. So you have trillions of records. Which ones have you been able to reliably match up?

From a topic map perspective, I could not help but notice that in the 1901 census, the categories for Marriage were:

  • Married
  • Widower
  • Widow
  • Not Married

Whereas the 1911 census records:

  • Married
  • Widower
  • Widow
  • Single

As you know, one of the steps in record linkage is normalization of the data headings and values before you apply the standard techniques to link records together.

In traditional record linkage, the shift from “not married” to “single” is lost in the normalization.

May not be meaningful for your use case but could be important for someone studying shifts in marital relationship language. Or shifts in religious, ethnic, or racist language.

Or for that matter, shifts in the names of database column headers and/or tables. (Like anyone thinks those are stable.)
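Keeping both is cheap. A small sketch of my own (not from Brian’s post): normalize to a shared key for matching, but carry the source vocabulary along as provenance.

    # Map each census's marital-status terms onto one matching key, but keep
    # the original wording so the 1911 shift from "not married" to "single"
    # survives normalization.
    VOCAB = {
        ("1901", "not married"): "unmarried",
        ("1911", "single"): "unmarried",
        ("1901", "widower"): "widowed",
        ("1911", "widower"): "widowed",
        ("1901", "widow"): "widowed",
        ("1911", "widow"): "widowed",
        ("1901", "married"): "married",
        ("1911", "married"): "married",
    }

    def normalize(census_year, raw_value):
        value = raw_value.strip().lower()
        return {
            "match_key": VOCAB.get((census_year, value), value),   # for linkage
            "source": {"census": census_year, "term": raw_value},  # for the historian
        }

    print(normalize("1901", "Not Married"))  # match_key 'unmarried', term kept
    print(normalize("1911", "Single"))       # same match_key, different term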

Pay close attention to how Brian models similarity candidates.

Once you move beyond string equivalent identifiers (TMDM), you are going to be facing the same issues.

Practical Fractals in Space

Filed under: Fractals,Programming — Patrick Durusau @ 4:40 pm

Highly entertaining presentation on fractal curves and how to determine when things are both close in a search and close in terms of distance.
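The trick in play is a space-filling curve: map 2-D coordinates onto one dimension so that points near each other on the curve are (usually) near each other in space, letting an ordinary range scan stand in for a spatial query. Here is a minimal Z-order (Morton) example in Python — one of the simpler such fractal curves, my sketch rather than anything from the presentation:

    def morton(x, y, bits=8):
        """Interleave the bits of x and y into a single Z-order key."""
        key = 0
        for i in range(bits):
            key |= ((x >> i) & 1) << (2 * i)      # x bits to even positions
            key |= ((y >> i) & 1) << (2 * i + 1)  # y bits to odd positions
        return key

    # Nearby points tend to get nearby keys, so sorting by key groups neighbors.
    for point in [(3, 4), (3, 5), (4, 4), (200, 7)]:
        print(point, morton(*point))  # 37, 39, 48, then a distant 20586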

Confirmation of TPP = Death of Public Domain

Filed under: Government,Intellectual Property (IP) — Patrick Durusau @ 3:45 pm

Wikileaks has leaked TPP Treaty: Intellectual Property Rights Chapter – 5 October 2015.

View the leaked “TPP Treaty: Intellectual Property Rights Chapter, Consolidated Text” (PDF, HTML).

Much is objectionable in the “Intellectual Property Rights Chapter” of the Trans-Pacific Partnership (TPP), but nothing so pernicious as its attempt to destroy the public domain.

Extraordinary claim you say?

Consider the following:

Article QQ.H.2: {Presumptions}

1. In civil, criminal, and if applicable, administrative proceedings involving copyright or related rights, each Party shall provide:

(a) for a presumption 109 that, in the absence of proof to the contrary, the person whose name is indicated in the usual manner 110 as the author, performer, producer of the work, performance, or phonogram, or as applicable, the publisher is the designated right holder in such work, performance, or phonogram; and

(b) for a presumption that, in the absence of proof to the contrary, the copyright or related right subsists in such subject matter.

The public domain is made up of works that have been contributed to the public or on which copyright has expired. Anyone claiming copyright on such a work has the burden of proof.

If the TPP is confirmed, all those works in the public domain, the ones with the name of an author, performer, producer or publisher, are presumed to be under copyright.

If you are sued for quoting or distributing such a work, you have the burden of proving the work isn’t subject to copyright. That burden of proof will be at your expense.

The public domain is destroyed by a presumption hidden in section double Q, subsection H, subsection 2.

That’s not just my reading, check out: Copyright Presumptions and the Trans‐Pacific Partnership Agreement.

I haven’t seen an assault against the very notion of the public domain since the OASIS Rights Language TC.

The goal of the Rights Language TC was to create a content management language that required all content, free or not, to carry its header. And since the language wasn’t going to be free, you would be paying a tax to say your content was free or public domain. By default in the language.

Telcos could man the routers and prevent transmission of unlicensed content, i.e., content without the header. The public domain was collateral damage in an effort to regulate transmission of content.

The Rights Language TC assault on the public domain failed.

Time to make the TPP assault on the public domain fail as well.

PS: Reach out to old friends, make new friends, activate your social networks. The problem is the Trans-Pacific Partnership, the solution is NO!

Machine Learning for Developers (xyclade.ml)

Filed under: Machine Learning,Programming — Patrick Durusau @ 10:51 am

Machine Learning for Developers (xyclade.ml) by Mike de Waard.

From the webpage:

Most developers these days have heard of machine learning, but when trying to find an ‘easy’ way into this technique, most people find themselves getting scared off by the abstractness of the concept of Machine Learning and terms as regression, unsupervised learning, Probability Density Function and many other definitions. If one switches to books there are books such as An Introduction to Statistical Learning with Applications in R and Machine Learning for Hackers who use programming language R for their examples.

However R is not really a programming language in which one writes programs for everyday use such as is done with for example Java, C#, Scala etc. This is why in this blog machine learning will be introduced using Smile, a machine learning library that can be used both in Java and Scala. These are languages that most developers have seen at least once during their study or career.

The first section ‘The global idea of machine learning’ contains all important concepts and notions you need to know about to get started with the practical examples that are described in the section ‘Practical Examples’. The section practical examples is inspired by the examples from the book Machine Learning for Hackers. Additionally the book Machine Learning in Action was used for validation purposes.

The second section Practical examples contains examples for various machine learning (ML) applications, with Smile as ML library.

Note that in this blog, ‘new’ definitions are hyperlinked such that if you want, you can read more regarding that specific topic, but you are not obliged to do this in order to be able to work through the examples.

A great resource for developers who need an introduction to machine learning.

But an “introduction only.” The practical examples are quite useful but there are only seven (7) of them.

If you like this, look at the resources Grant Ingersoll has collected at: Getting started with open source machine learning and Andrew Ng’s Machine Learning online course in particular.

The nuances of data that can “fool” or lead to unexpected results from machine learning algorithms appear to be largely unexplored, or at least not widely discussed.

As machine learning becomes more prevalent, assisting users in obtaining expected answers is going to be a very marketable skill.

Filter [Impersonating You]

Filed under: Filters,News,Twitter — Patrick Durusau @ 9:55 am

Filter

From the webpage:

Filter shows you the top stories from communities of Twitter users across a range of topics like climate change, bitcoin, and U.S. foreign policy.

With Filter, the only way you’ll miss something is if the entire community misses it too.

Following entire Twitter communities is a good idea but signing in with Twitter enables Filter to impersonate you.

This application will be able to:

  • Read Tweets from your timeline.
  • See who you follow, and follow new people.
  • Update your profile.
  • Post Tweets for you.

(emphasis added)

My complaint is general to all Sign in with Twitter applications and Filter is just an example I encountered this morning.

I can’t explore and report to you the features or shortcomings of Filter because I am happy with my current following list and have no desire to allow some unknown (read: untrusted) third party to post on my Twitter account.

If you encounter a review of Filter by someone who isn’t bothered by being randomly impersonated, send me a link. I would like to know more about the site.

Thanks!

October 8, 2015

Processing 3

Filed under: Processing,Programming — Patrick Durusau @ 6:43 pm

Processing 3

From the webpage:

Processing is a flexible software sketchbook and a language for learning how to code within the context of the visual arts. Since 2001, Processing has promoted software literacy within the visual arts and visual literacy within technology. There are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning and prototyping.

  • Free to download and open source
  • Interactive programs with 2D, 3D or PDF output
  • OpenGL integration for accelerated 2D and 3D
  • For GNU/Linux, Mac OS X, and Windows
  • Over 100 libraries extend the core software
  • Well documented, with many books available

Links from the Vimeo page:

Processing website: processing.org
Hello Processing video tutorial: hello.processing.org
Debugger tutorial video: vimeo.com/140134398
Changes in 3.0: github.com/processing/processing/wiki/Changes-in-3.0
Tweets: twitter.com/processingOrg/

Enjoy!

SICP Distilled Now Public

Filed under: Clojure,Functional Programming,Lisp,Scheme — Patrick Durusau @ 6:21 pm

SICP Distilled Now Public by Tom Hall.

From the post:

I have made the SICP Distilled site public.

It is now a year since the SICP Distilled Kickstarter was funded, I originally thought it would take 5 months to complete and had a slightly different vision for what it would look like, massive thanks to all of the supporters for their patience and understanding.

I originally thought an accompaniment to the original book but it has wound up being more like a rewrite, fortunately the original text is under a Creative Commons Attribution-ShareAlike 4.0 License so I can ‘remix’ it without getting into copyright issues if I have the same licence.

It is not complete yet, but I think I now have good explanations of:

Which really are (for me at least) the highlights of the book.

As a very minor supporter in the Kickstarter campaign I deeply appreciate all the effort that Tom has devoted to this effort.

Comment early and often!

Tipsheets & Links from GIJC15

Filed under: Journalism,News,Reporting — Patrick Durusau @ 11:00 am

Tipsheets & Links from GIJC15

Program listing from the recent Investigative Reporters and Editors (IRE) conference, annotated with tipsheets and links.

No response to my question about whether anyone is creating a subject index to tip sheets.

That’s unfortunate. The information trapped in some of these tip sheets merits wider dispersal and use, to say nothing of maintenance.

Kemoge: Latest Android Malware that Can Root Your Smartphone

Filed under: Cybersecurity,Security — Patrick Durusau @ 10:51 am

Kemoge: Latest Android Malware that Can Root Your Smartphone by Khyati Jain.

From the post:

Google Android has been a primary concern of the attackers. Counting from a simple text message that could hack an Android phone remotely to the Stagefright bug making Billion users vulnerable.

Now, the latest is the ‘Kemoge Malware’ that has made its debut as an Adware on the Android mobile phones, allowing third-party app stores to fetch your device’s information and take full control of it.

Security researchers from FireEye Labs have discovered that Kemoge malicious adware family is spreading in 20 countries around the globe. Also, the origin of the Adware’s attack is suspected from China.

See Khyati’s post for the full details but this is another illustration of why claims of security for the Internet of Things (IoT) should be viewed with suspicion.

What will your exposure be when someone roots your television, refrigerator, freezer, A/C?

Convenient Emacs Setup for Social Scientists

Filed under: Editor,Social Sciences — Patrick Durusau @ 9:48 am

Convenient Emacs Setup for Social Scientists Available, Thanks to RTC Team Member

From the post:

QSS consultant Ista Zahn has made work with Emacs a lot easier for social scientists with a package that is now available for users.

Ista Zahn, a member of the Institute’s Research Technology Consulting (RTC) team, became an Emacs user about 10 years ago, because it offered a convenient environment for literate programming and reproducible data analysis. “I quickly discovered,” he says, “as all Emacs users do, that Emacs is a strange creature.” Through nearly 40 years of continuous development, Emacs has accumulated a great many added features, which a user must comb through in order to choose which they need for their own work. Zahn explains how he came about the Emacs setup that is now available:

In the summer of 2014 Gary King asked for an Emacs configuration with a specific set of features, and I realized that my personal Emacs configuration already provided a lot of the features he was looking for. Since that time we’ve worked together to turn my personal Emacs configuration into something that can be useful to other Emacs users. The result is a well-documented Emacs initialization that focuses on configuring Emacs tools commonly used by social scientists, including LaTeX, git, and R.

Ista Zahn’s Emacs package for social scientists is available for download at https://github.com/izahn/dotemacs.

I stumbled over the word “convenient” in the title and not without cause.

Ista concedes as much when he says:

What the world needs now…

As of August 5th 2014 there are 2,960 github repositories named or mentioning ‘.emacs.d’, and another 627 named or mentioning “dotemacs”. Some of these are just personal emacs configurations, but many take pains to provide documentation and instruction for adopting them as your very own emacs configuration. And that’s not to mention the starter-kits, preludes and oh my emacs of the world! With all these options, does the world really need yet another emacs configuration?

No, the world does not need another emacs starter kit. Indeed the guy who started the original emacs starter-kit has concluded that the whole idea is unworkable, and that if you want to use emacs you’re better off configuring it yourself. I agree, and it’s not that hard, even if you don’t know emacs-lisp at all. You can copy code fragments from others’ configuration on github, from the emacs wiki, or from stackoverflow and build up your very own emacs configuration. And eventually it will be so perfect you will think “gee I could save people the trouble of configuring emacs, if they would just clone my configuration”. So you will put it on github, like everyone else (including me). Sigh.

On the other hand it may be that this emacs configuration is what you want after all. It turns on many nice features of emacs, and adds many more. Anyway it does not hurt to give it a try.

As he says, it won’t hurt to give it a try (but be sure to not step on your current Emacs installation/configuration).

How would you customize Emacs for authoring topic maps? What external programs would you call?

I first saw this in a tweet by Christophe Lalanne.

“Big data are not about data,” Djorgovski says. “It’s all about discovery.” [Not re-discovery]

Filed under: Astroinformatics,BigData,Science — Patrick Durusau @ 9:14 am

I first saw this quote in a tweet by Kirk Borne. It is the concluding line from George Djorgovski looks for knowledge hidden in data by Rebecca Fairley Raney.

From the post:

When you sit down to talk with an astronomer, you might expect to learn about galaxies, gravity, quasars or spectroscopy. George Djorgovski could certainly talk about all those topics.

But Djorgovski, a professor of astronomy at the California Institute of Technology, would prefer to talk about data.

The AAAS Fellow has spent more than three decades watching scientists struggle to find needles in massive digital haystacks. Now, he is director of the Center for Data-Driven Discovery at Caltech, where staff scientists are developing advanced data analysis techniques and applying them to fields as disparate as plant biology, disaster response, genetics and neurobiology.

The descriptions of the projects at the center are filled with esoteric phrases like “hyper-dimensional data spaces” and “datascape geometry.”

Astronomy was “always advanced as a digital field,” Djorgovski says, and in recent decades, important discoveries in the field have been driven by novel uses of data.

Take the discovery of quasars.

In the early 20th century, astronomers using radio telescopes thought quasars were stars. But by merging data from different types of observations, they discovered that quasars were rare objects that are powered by gas that spirals into black holes in the center of galaxies.

Quasars were discovered not by a single observation, but by a fusion of data.

It is assumed by Djorgovski and his readers that future researchers won’t have to start from scratch when researching quasars. They can but don’t have to re-mine all the data that supported their original discovery or their association with black holes.

Can you say the same for discoveries you make in your data? Are those discoveries preserved for others or just tossed back into the sea of big data?

Contemporary searching is a form of catch-and-release. You start with your question and, whether it takes a few minutes or an hour, you find something resembling an answer.

The data is then tossed back to await the next searcher who has the same or similar question.

How are you capturing your search results to benefit the next searcher?

95 tools for investigative journalists

Filed under: Journalism,News,Reporting — Patrick Durusau @ 8:57 am

95 tools for investigative journalists from @Journalism2ls

95 Resources that cover:

  • Alerts
  • Analytics
  • Collect Data
  • Data Stories
  • Interactive Video
  • Location
  • Map Stories
  • Monitor News
  • Multimedia
  • People & Paper Trail
  • Privacy
  • Production
  • Reporting
  • Snowfalling
  • Social Media
  • Verification
  • Wikipedia

I re-ordered the categories into alphabetical order. In the original post, both the categories and their contents appear in no order that is apparent to me. (If you see an ordering principle in the post, please give a shout.)

Impressive collection of tools!

October 7, 2015

Updates from Complexity Explorer

Filed under: Complexity,Fractals — Patrick Durusau @ 9:49 pm

Updates from Complexity Explorer

Among other news:

Sid Redner on his Introduction to Random Walks tutorial [interview]

From the post:

Three tutorials coming soon: We are in the process of developing three new tutorials for you. Matrix and Vector Algebra, Information Theory, and Computation Theory. Stay tuned! And in the meantime, have you taken our latest tutorials, Maximum Entropy Methods and Random Walks?

Current courses: Fractals and Scaling and Nonlinear Dynamics are happening now! You can still join in these two fantastic courses if you haven’t already. Fractals and Scaling will end October 23rd, and Nonlinear Dynamics is set to end December 1st.

Agent-based Modeling: The Agent-based Modeling course has been delayed and will now be launched in 2016. We will let you know as soon as we have a clearer idea of the timeframe. You just can’t rush a good thing!

If you haven’t visited Complexity Explorer recently then it is time to catch up.

It is clear that none of the likely candidates for U.S. President in 2016 have ever heard of complexity! At least to judge from their overly simple and deterministic claims and policies.

Avoid their mistake, take a tutorial or course at the Complexity Explorer soon!

NLP and Scala Resources

Filed under: Functional Programming,Natural Language Processing,Scala,ScalaNLP — Patrick Durusau @ 9:33 pm

Natural Language Processing and Scala Tutorials by Jason Baldridge.

An impressive collection of resources, in particular the seventeen (17) Scala tutorials.

Unfortunately, given the state of search and indexing it isn’t possible to easily dedupe the content of these materials against others you may have already found.
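Rough deduping is not out of reach, though. A minimal sketch of my own using word shingles and Jaccard similarity — crude next to real search infrastructure, but enough to flag tutorials that heavily overlap ones you already have:

    def shingles(text, k=5):
        """Set of k-word shingles, the usual near-duplicate fingerprint."""
        words = text.lower().split()
        return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    doc1 = "Scala is a statically typed language that fuses object-oriented and functional programming"
    doc2 = "Scala is a statically typed language fusing functional and object-oriented programming"

    score = jaccard(shingles(doc1), shingles(doc2))
    print(round(score, 2))  # anything above ~0.5 is worth a manual look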

Search Correctness? [Right to be Forgotten, Hiding in Plain Sight]

Filed under: Search Engines,Searching — Patrick Durusau @ 9:32 pm

Everyone has an opinion about “political correctness,” but did you know that Google has “search correctness?”

This morning I was attempting, unsuccessfully, to search for:

fetch:xml

Here is a screen shot of my results:

[Screenshot: Google search results for “fetch:xml”]

I happen to know that “fetch:xml” is an actual string in Google indexed documents because it was a Google search that found the page (using other search terms) where that string exists!

I wanted to search beyond that page for other instances of “fetch:xml.” Surprise, surprise.

Prior to posting this, as a sanity check, I searched for “dc:title.” And got hits!? What’s going on?

If you look at Metatags.org, you will find them using:

DC.title, DC.creator, etc. Replacing the “:” (colon) with a “.” (period).

That solution is a non-starter for me because I have no control over how other people report fetch:xml. One assumes they will use the same string as appears in the documentation.
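My guess — and it is only a guess, since Google does not publish its tokenizer — is that punctuation is treated as a term separator at indexing time, so “fetch:xml”, “fetch xml” and “DC.title” all collapse into bare terms. A toy tokenizer shows the effect:

    import re

    def tokenize(text):
        # Keep only letter/digit runs; punctuation becomes a separator.
        # (An assumption about indexer behavior, not Google's actual code.)
        return re.findall(r"[a-z0-9]+", text.lower())

    print(tokenize("fetch:xml"))  # ['fetch', 'xml']
    print(tokenize("fetch xml"))  # ['fetch', 'xml'] -- indistinguishable
    print(tokenize("DC.title"))   # ['dc', 'title'] -- why "dc:title" gets hits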

As annoying as this is, perhaps it is the solution to the EU’s right to be forgotten problem.

People who want to be forgotten can change their names to punctuation that Google does not index.

Once their names are entirely punctuation, these shy individuals will never be found in a Google search.

Works across the globe with no burden on search engine operators.

Treasure Trove of R Scripts…

Filed under: Open Source,R — Patrick Durusau @ 8:30 pm

Treasure Trove of R Scripts for Auto Classification, Chart Generation, Solr, Mongo, MySQL and Ton More by Jitender Aswani.

From the post:

In this repository hosted at github, the datadolph.in team is sharing all of the R codebase that it developed to analyze large quantities of data.

datadolph.in team has benefited tremendously from fellow R bloggers and other open source communities and is proud to contribute all of its codebase into the community.

The codebase includes ETL and integration scripts on –

  • R-Solr Integration
  • R-Mongo Interaction
  • R-MySQL Interaction
  • Fetching, cleansing and transforming data
  • Classification (identify column types)
  • Default chart generation (based on simple heuristics and matching a dimension with a measure)

Github Source: https://github.com/datadolphyn/R

I count twenty-two (22) R scripts in this generous donation back to the R community!

Enjoy!

Some key Win-Vector serial data science articles

Filed under: Data Science,R,Statistics — Patrick Durusau @ 8:20 pm

Some key Win-Vector serial data science articles by John Mount.

From the post:

As readers have surely noticed the Win-Vector LLC blog isn’t a stream of short notes, but instead a collection of long technical articles. It is the only way we can properly treat topics of consequence.

  • Statistics to English translation.

    This series tries to find vibrant applications and explanations of standard good statistical practices, to make them more approachable to the non statistician.

  • Statistics as it should be.

    This series tries to cover cutting edge machine learning techniques, and then adapt and explain them in traditional statistical terms.

  • R as it is.

    This series tries to teach the statistical programming language R “warts and all” so we can see it as the versatile and powerful data science tool that it is.

More than enough reasons to start haunting the Win-Vector LLC blog on a regular basis.

Perhaps an inspiration to do more long-form posts as well.

Computational Literary Analysis

Filed under: Clojure,Computational Literary Analysis,Lisp — Patrick Durusau @ 8:07 pm

Computational Literary Analysis by Atabey Kaygun.

From the post:

Description of the problem

One of the brilliant ideas that Kurt Vonnegut came up with was that one can track the plot of a literary work using graphical methods. He did that intuitively. Today’s question is “Can we track the plot changes in a text using computational or algorithmic methods?”

Overlaps between units of texts

The basic idea is to split a text into fundamental units (whether this is a sentence, or a paragraph depends on the document) and then convert each unit into a hash table where the keys are stemmed words within the unit, and the values are the number of times these (stemmed) words appear in each unit.

My hypothesis is (and I will test that in this experiment below) that the amount of overlap (the number of common words) between two consecutive units tells us how the plot is advancing.

I will take the fundamental unit as a sentence below.

Clojure, Lisp, computational literary analysis, what’s there not to like? 😉

Given the hypothesis:

the amount of overlap (the number of common words) between two consecutive units tells us how the plot is advancing

Atabey doesn’t say what his criterion is for “the plot advancing.” I mention that because both of the plots he offers trail off from their highs.

If there is a plot advance, shouldn’t the respective speeches build until they peak at the end?

Or is there some more complex “plot advancement” at play?

One of the things that makes this and similar analyses useful, particularly of well known speeches/works, is that we all “know” the high points. We have been conditioned to hear those as distinct, when the original hearers/readers were encountering the work for the first time.

Such tools can pry us out of the rut of prior analysis. Sometimes.
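Atabey works in Clojure/Lisp; for anyone who wants to poke at the hypothesis directly, here is the core overlap measure in a few lines of Python (a bare-bones rendering of his description — real stemming, which he uses, would tighten the counts):

    import re
    from collections import Counter

    def units(text):
        """Split into sentences -- the fundamental unit in Atabey's experiment."""
        return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

    def bag(sentence):
        """Word-frequency table for one unit (no stemming in this sketch)."""
        return Counter(re.findall(r"[a-z']+", sentence.lower()))

    def overlaps(text):
        """Common-word counts between consecutive units: the plot signal."""
        bags = [bag(s) for s in units(text)]
        return [sum((a & b).values()) for a, b in zip(bags, bags[1:])]

    sample = ("We shall fight on the beaches. We shall fight on the landing "
              "grounds. We shall never surrender.")
    print(overlaps(sample))  # [5, 2] -- high overlap, then a drop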

Now over 1,000,000 Items to Search on Congress.gov [Cause to Celebrate?]

Filed under: Government,Government Data,Law,Law - Sources,Library — Patrick Durusau @ 4:08 pm

Now over 1,000,000 Items to Search on Congress.gov: Communications and More Added by Andrew Weber.

From the post:

This has been a great year as we continue our push to develop and refine Congress.gov.  There were email alerts added in February, treaties and better default text in March, the Federalist Papers and more browse options in May, and accessibility and user requested features in July.  With this October update, Senate Executive Communications from THOMAS have migrated to Congress.gov.  There is an About Executive Communications page that provides more detail about the scope of coverage, searching, viewing, and obtaining copies.

Not to mention a new video “help” series, Legislative Subject Terms and Popular and Short Titles.

All good and from one of the few government institutions that merits respect, the Library of Congress.

Why the “Cause to Celebrate?”

This is an excellent start and certainly Congress.gov has shown itself to be far more responsive to user requests than vendors are to reports of software vulnerabilities.

But we are still at the higher level of data, legislation, regulations, etc.

What needs to follow is a dive downward to identify who obtains the benefits of legislation/regulations. Who obtains permits, for what and at what market value? Who obtains benefits, credits, allowances? Who wins contracts and where does that money go as it tracks down the prime contractor -> sub-prime contractor -> etc. pipeline?

It is ironic that when candidates for president talk about tax reform they tend to focus on the tax tables. Which are two (2) pages out of the current 6,455 pages of the IRC (in pdf, http://uscode.house.gov/download/releasepoints/us/pl/114/51/pdf_usc26@114-51.zip).

Knowing who benefits and by how much for the rest of the pages of the IRC isn’t going to make government any cleaner.

But, when paired with campaign contributions, it will give everyone an even footing on buying favors from the government.

Just as public disclosure enables a relatively fair stock exchange, in the case of government it will enable relative fairness in corruption.

Internet of Things (IoT) and More $Free Porn

Filed under: Cybersecurity,IoT - Internet of Things,Security — Patrick Durusau @ 2:52 pm

Every day brings new reports of digital data breaches. Security for the Internet of Things (IoT) is being discussed, but in light of the drum roll of breaches, there is very little confidence the IoT will be any more secure than present IT systems.

That being the case and by way of forewarning, unplug your webcam when you are not using it.

Insecurity in the Internet of Things (IoT) will geometrically increase the amount of $free porn on the Internet.

Amateur porn to be sure, but instead of featuring people you are unlikely to meet, it could star the couple next door or down the block, your doctor or pharmacist, perhaps even your spouse.

If you don’t believe me, check out: Cyber hacker hijacked webcams to spy on people having sex by David Wells.

From the story:

A cyber criminal hijacked computers to spy on people having sex through their webcams, the National Crime Agency (NCA) has said.

Stefan Rigo, 33, used malware called Blackshades to give him control over strangers’ cameras and spent five to 12 hours a day watching what they were doing in front of their computers.

The NCA said he was addicted to monitoring his victims, some of whom he knew and some who were complete strangers.

Rigo was given a 40-week suspended prison sentence, placed on the Sex Offenders Register for seven years and ordered to do 200 hours of unpaid work by magistrates in Leeds after he admitted voyeurism at a previous hearing, the agency confirmed.

Well, there’s a deterrent, “200 hours of unpaid work.” 😉

Looking forward to cellphone apps for finding vulnerable webcams, streaming them live to public or private accounts, just a tap away from $free porn.

Of course, you may also see people doing things that are illegal in your jurisdiction and not just sexually illegal things.

Wondering how the police will react to major drug deals being caught via an “ISpy” app for a cellphone and streamed to the Internet?

For those of you who have never deliberately disconnected anything from the Internet, I include this illustration:

[Illustration: unplugging a cable]

Yep, that’s how it’s done.

You do have to remember to “reconnect” (another new word) it.

The upside is that you will be safe from strangers watching you have sex and/or commit crimes or indiscretions in the privacy of your own home.

They may be able to hear or monitor you through one or more other IoT devices but they won’t have video. If that makes you feel any better.
