Archive for December, 2014

U.S. Appropriations by Fiscal Year

Wednesday, December 31st, 2014

U.S. Appropriations by Fiscal Year

Congressdotgov tweeted about this resource earlier today.

It’s a great starting place for research on U.S. appropriations, but it is more of a bulk resource than a granular one.

You will have to wade through this resource and many others to piece together some of the details on any particular line item in the budget. Not surprisingly, anyone interested in the same line item will have to repeat that mechanical process. For every line in the budget.

There are collected resources on different aspects of the budget process, hearing documents, campaign donation records, etc. but they are for the most part all separated and not easily collated. Perhaps that is due to lack of foresight. Perhaps.

In any event, it is a starting place if you have a particular line item in mind. Think about creating a result that can be re-used and shared if at all possible.

Awesome Common Lisp

Wednesday, December 31st, 2014

Awesome Common Lisp by Koz Ross.

From the webpage:

A curated list of Common Lisp good stuff. I give preference to free software for code, and sellers who aren’t evil for physical resources.

This is released under the GNU Free Documentation License – its text is provided in the LICENSE file.

All libraries listed here are available from Quicklisp unless stated otherwise.

A deeply impressive list of Common Lisp resources. As “big data” becomes even bigger, curated lists (or more sophisticated collections) of resources will become quite popular.

Yahoo tried that at the inception of the WWW but its buckets were too random and too large. Drowning in a bucket is as bad as drowning in the ocean. Perhaps more so because you jumped into the bucket to avoid drowning at all.

Curated collections of resources will have to be more focused and sophisticated than the Yahoo bucket model.

Specific suggestions?

31C3: a new dawn

Wednesday, December 31st, 2014

31C3: a new dawn (archives)

Live-Stream

Chaos Communication Congress conference in Hamburg.

I took a break to watch Higher-Dimensional Geometry and Fractals. If you haven’t experienced a fast-moving presentation, this one will give you that experience. The challenge would be to stop at each slide and fully understand it before moving to the next. Very cool!

C++ sources for demo segments: https://github.com/ef-gy/topologic

Blog series: https://ef.gy/linear-algebra

Scott Draves, Erik Reckase, The Fractal Flame Algorithm: http://flam3.com/flame_draves.pdf

The range of talks is really amazing.

I first saw this in a post by Violet Blue, Invasive phone tracking: New SS7 research blows the lid off mobile security, where Violet covers three of the presentations at 31C3 on cellphone scanning technology. (Summary: You are even less secure than you imagine.)

Fingerprints can be reproduced from publicly available photos

Wednesday, December 31st, 2014

Fingerprints can be reproduced from publicly available photos by Kif Leswing.

From the post:

At a conference in Hamburg, Germany this weekend, biometrics researcher Jan Krissler demonstrated how he spoofed a politician’s fingerprint using photos taken by a “standard photo camera.”

Krissler speculated that politicians might even want to “wear gloves when talking in public.”

Krissler claims he isolated German Defense Minister Ursula von der Leyen’s fingerprint from high-resolution photos taken during a public appearance in October using commercially available software called VeriFinger.

I’m not sure that politicians have enough access to critical infrastructure for them to bother wearing gloves in public. 😉 Or at least they shouldn’t have that kind of access.

This isn’t an issue in Texas because the driver’s license bureau collects fingerprints from everyone who applies for or renews a driver’s license. (Watchdog: Driver’s license centers snatch your fingerprints)

Fingerprints of bank tellers, network security experts, nuclear power plant operators, sysadmins and similar folks who have access to critical infrastructure are already on file. Much easier to call a friend at the DMV than to scrape a print from a photograph. Where does your state collect fingerprints?

The more worrisome aspect is that children aren’t reluctant to show their hands and touch things constantly in public. Anyone gathering children’s fingerprints under a variety of guises will eventually have a database of people who do have access to critical infrastructure.

That may sound as implausible as monitoring all the Web traffic in the world, something we would have laughed about a decade ago. No one is laughing about that now.

PS: What combination of factors will you require for identification?

Google’s Secretive DeepMind Startup Unveils a “Neural Turing Machine”

Wednesday, December 31st, 2014

Google’s Secretive DeepMind Startup Unveils a “Neural Turing Machine”

From the post:

One of the great challenges of neuroscience is to understand the short-term working memory in the human brain. At the same time, computer scientists would dearly love to reproduce the same kind of memory in silico.

Today, Google’s secretive DeepMind startup, which it bought for $400 million earlier this year, unveils a prototype computer that attempts to mimic some of the properties of the human brain’s short-term working memory. The new computer is a type of neural network that has been adapted to work with an external memory. The result is a computer that learns as it stores memories and can later retrieve them to perform logical tasks beyond those it has been trained to do.

Of particular interest to topic mappers and folks looking for realistic semantic solutions for big data. In particular the concept of “recoding,” which is how the human brain collapses multiple chunks of data into one chunk for easier access/processing.

It sounds close to referential transparency to me, but where the transparency is optional. That is, you don’t have to look unless you need the details.
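The mechanism the paper uses for those memory reads is attentional, content-based addressing: compare a key against every memory row, turn the similarities into attention weights, and read back a weighted blend. A minimal sketch of that read step (not DeepMind’s code; the memory contents and dimensions below are invented for illustration):

```python
import numpy as np

def content_read(memory, key, beta=1.0):
    """Content-based read from an external memory matrix:
    cosine similarity -> sharpened softmax weights -> weighted sum of rows."""
    # cosine similarity between the key and each memory row
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    sim = memory @ key / norms
    # softmax over memory locations; beta sharpens the focus
    w = np.exp(beta * sim)
    w /= w.sum()
    # the read vector is the attention-weighted combination of rows
    return w @ memory

# Toy memory with three 2-dimensional "chunks"
memory = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
r = content_read(memory, key=np.array([1.0, 0.0]), beta=5.0)
```

With a high `beta` the read concentrates on the row most similar to the key, which is the “look only when you need the details” behavior described above.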

The full article will repay the time spent reading it, and then some:

Neural Turing Machines by Alex Graves, Greg Wayne, Ivo Danihelka.

Abstract:

We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.

The paper was revised on 10 December 2014 so if you read an earlier version, you may want to read it again. Whether Google cracks this aspect of the problem of intelligence or not, it sounds like an intriguing technique with applications in topic map/semantic processing.

Developing a D3.js Edge

Wednesday, December 31st, 2014

Developing a D3.js Edge by Chris Viau, Andrew Thornton, Ger Hobbelt, and Roland Dunn. (book)

From the description:

D3 is a powerful framework for producing interactive data visualizations. Many examples created in the real world with D3, however, can best be described as “spaghetti code.” So, if you are interested in using D3 in a reusable and modular way, which is of course in line with modern development practices, then this book is for you!

This book is aimed at intermediate developers, so to get the most from this book you need to know some JavaScript, and you should have experience creating graphics using D3. You will also want to have a good debugger handy (Chrome Developer panel or the Firefox/Firebug combo), to help you step through the many real world examples that you’ll find in this book. You should also be somewhat comfortable with any of these concepts:

If you read Kurt Cagle’s Ten Trends in Data Science 2015, you will recall him saying that in 2014 “…demand for good data visualizers went from tepid to white hot,” with the anticipation that the same will be true for 2015.

Do note the qualifier “good.” That implies to me more than being able to use the stock charts and tools that you find in many low-end data tools.

Unlike the graphic capabilities of low-end data tools, D3 is limited more by your imagination than by any practical limit of the tool.

So, dust off your imagination and add D3 to your tool belt for data visualization.

PS: All the source code is here: https://github.com/backstopmedia/D3Edge

Return on Terrorism (ROT) (9/11) Update

Wednesday, December 31st, 2014

On December 8, 2014, the Congressional Research Service (CRS) issued: The Cost of Iraq, Afghanistan, and Other Global War on Terror Operations Since 9/11 which details U.S. spending on some anti-terror operations post 9/11, as of 2014.

From the summary:

With enactment of the FY2014 Consolidated Appropriations Act on January 1, 2014 (H.R. 3547/P.L. 113-73), Congress has approved appropriations for the past 13 years of war that total $1.6 trillion for military operations, base support, weapons maintenance, training of Afghan and Iraq security forces, reconstruction, foreign aid, embassy costs, and veterans’ health care for the war operations initiated since the 9/11 attacks.

Contrast that with the 9/11 Commission report’s finding on the cost of 9/11:

As noted above, the 9/11 plotters spent somewhere between $400,000 and $500,000 to plan and conduct their attack.

Visualizing the difference between the 9/11 plotters and this one area of U.S. spending:

rot-9-11-a

Spending by the 9/11 plotters is in blue and here is represented by the flat blue square on the left.

That is a 3,200,000-fold return on the 9/11 plotters’ investment in terrorism.

But even that is incomplete. The CRS report leaves out spending on national intelligence (which has doubled since 9/11 and currently exceeds $70 billion per year) and some $791 billion on “homeland security” (as of early 2013). Even allowing for duplication and the vagaries of government budgets and accounting, that appears to easily exceed another $1 trillion. (BTW, the total cost of the New Deal was only $500 billion after allowing for inflation.)

If we include the spending on “homeland security” and increased intelligence work post 9/11, the amount spent (non-productively) to fight terrorism is $2.6 trillion.

Visualizing the difference between $500,000 and $2.6 trillion:

rot-9-11-c

Spending by the 9/11 plotters is in blue but doesn’t even get a full blue square on the left.

That is a 5,200,000-fold return on the 9/11 plotters’ investment in terrorism.
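The arithmetic behind both figures is worth making explicit (amounts taken from the CRS report and the 9/11 Commission estimate quoted above):

```python
# Return multiples on the plotters' investment, from the figures above.
plot_cost = 500_000          # 9/11 Commission upper estimate (dollars)
war_approps = 1.6e12         # CRS total war appropriations since 9/11
with_homeland = 2.6e12       # adding intelligence and homeland security

fold_crs = war_approps / plot_cost      # 3,200,000-fold
fold_total = with_homeland / plot_cost  # 5,200,000-fold
```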

That $2.6 trillion does not include the loss of the right to free speech:

Police arrest local teen for tweeting terroristic threat

Police arrested a 17-year-old for posting a photo on social media of a rifle’s sights pointed at a marked Fort Worth police car with a threatening message.

Montrae Toliver was arrested for making a terroristic threat, according to Fort Worth police spokesperson Tamara Pena. Toliver allegedly posted the photo on Twitter with the caption, “Should I do it? They don’t care for a black male anyways [sic].”

Authorities said the rifle turned out to be a toy, but they said that doesn’t matter.

“It’s considered a threat,” said Officer Tamara Pena during a news conference Monday afternoon.

On December 22, the Department of Homeland Security became aware of the tweet and the Fort Worth police, among other jurisdictions, launched an investigation into the threat of violence against police. Further investigation revealed Toliver was responsible for posting the tweet, Pena said.

I could continue the litany of what we have lost in the war on terrorism, such as the right to prevent fondling of our children by strangers in airports and a host of other indignities.

But I want to end with a question:

Considering our respective investments and losses, who do you think is winning the war on terrorism?

Retaliation Hacking

Wednesday, December 31st, 2014

FBI doesn’t want companies to hack in retaliation by Jonathan Vanian.

From the post:

Major banks, retailers, manufacturers and other companies are fed up with the increasing amount of cyber attacks and are exploring hacking in revenge, something the FBI doesn’t seem too keen on, according to a Bloomberg report.

Based on the perception that the U.S. government is not doing enough to stop data breaches, some companies are looking to hack into criminal networks and take back their goods as well as stop future breaches. To help with the retaliation hacks, these companies are supposedly working with security firms.

Do you think the FBI is also unsympathetic to retaliation hacking in response to the issuance of National Security Letters and FISA court orders?

I ask because in many ways U.S. citizens and companies are in more danger from their own government than from any foreign or domestic hackers. True, Sony took an enormous hit recently, but sixth graders are often victimized by eighth graders. What can I say?

Corporations that take cybersecurity less seriously than they do employee theft or accurate accounting records are going to have cybersecurity issues. (full stop) No other position will enable corporations to begin moving towards a partial solution to hacking.

I say a partial solution because there will always be the potential for highly imaginative hacks, but the vast majority (think Sony) could be avoided by known and routine security measures. Simply because management doesn’t know how to maintain cybersecurity does not mean no one knows how. Or that hackers are rogue superminds competing with each other in an electronic ether. (Sounds great but that’s not reality. Attn: C-Suite – Tron was a movie, i.e., fiction.)

Corporations and individuals need to start taking cybersecurity seriously. It will take time to strip the U.S. government of its overreaching powers (NSLs, FISA) but you can make its illegal surveillance more difficult.

Every minute wasted on your innocent but encrypted data stream is a minute government can’t spend on some other innocent data stream. Together, we can protect each other.

The New Chess World Champion

Tuesday, December 30th, 2014

The New Chess World Champion by K W Regan.

From the post:

Larry Kaufman is a Grandmaster of chess, and has teamed in the development of two champion computer chess programs, Rybka and Komodo. I have known him from chess tournaments since the 1970s. He earned the title of International Master (IM) from the World Chess Federation in 1980, a year before I did. He earned his GM title in 2008 by dint of winning the World Senior Chess Championship, equal with GM Mihai Suba.

Today we salute Komodo for winning the 7th Thoresen Chess Engines Competition (TCEC), which some regard as the de-facto world computer chess championship.

Partly computer chess history, partly its present state, with asides on Shogi, dots-and-boxes, Arimaa, and Go.

Regan forgets to mention that, thus far, computers don’t compete at all in the game of thrones. No one has succeeded in teaching a computer to lie. That is, knowing the correct answer and, for motives of its own, concealing that answer and offering another.

PS:

Komodo (commercial, $59.96)

Stockfish (open source)

Ten Trends in Data Science 2015

Tuesday, December 30th, 2014

Ten Trends in Data Science 2015 by Kurt Cagle.

From the post:

There is a certain irony talking about trends in data science, as much of data science is geared primarily to detecting and extrapolating trends from disparate data patterns. In this case, this is part of a series of analyses I’ve written for over a decade, looking at what I see as the key areas that most heavily impact the area of technology I’m focusing on at the top. For the last few years, this has been a set of technologies which have increasingly been subsumed under the rubric of Data Science.

I tend to use the term to embrace an understanding of four key areas – Data Acquisition (how you get data into a usable form and set of stores or services), Data Awareness (how you provide context to this data so that it can work more effectively across or between enterprises), Data Analysis (turning this aware data into usable information for decision makers and data consumers) and Data Governance (establishing the business structures, provenance maintenance and continuity for that data). These I collectively call the Data Cycle, and it seems to be the broad arc that most data (whether Big Data or Small Data) follows in its life cycle. I’ll cover this cycle in more detail later, but for now, it provides a reasonably good scope for what I see as the trends that are emerging in this field.

This has been a remarkably good year in the field of data science – the Big Data field both matured and spawned a few additional areas of study, semantics went from being an obscure term to getting attention in the C-Suite and the demand for good data visualizers went from tepid to white hot.

A great overview of what is likely to be “hot” in 2015.

I disagree with Kurt when he says:


Over the course of the next year, this major upgrade to the SPARQL standard will become the de facto mechanism for communicating with triple stores, which will in turn drive the utilization of new semantics-based applications.

Semantics already figure pretty heavily in recommendation engines and similar applications, since these kinds of applications deal more heavily with searching and making connections between types of resources, and it plays fairly heavily in areas such as machine learning and NLP.

Not that I disagree with semantics being the area where large strides could be made and large profits as well. I disagree that SPARQL and triple-stores are going to play a meaningful role with regard to semantics, especially with recommendation engines, machine learning and NLP.

The “semantics” that recommendation engines mine are entirely unknown to the recommendation engine. Such an engine ingests a large amount of data and, without making an explicit semantic choice, recommends a product to a user based on previous choices by that user and others. It is an entirely mechanical operation that has no sense of “semantics” at all. Semantic “understanding” isn’t required for Netflix or Amazon to do a pretty good job of recommending items to customers.

In terms of a recommendation, I seriously doubt a recommendation engine relies upon two items having a part-whole or class-subclass relationship. It is relying upon observed shopping/consumption behavior, which may or may not have any internal coherence at all. What matters to a vendor is that a sale is made, semantics be damned.

Other than that quibble, Kurt is predicting what most people anticipate seeing next year. Now for the fun part, seeing how the future develops in interesting and unpredictable ways.

Twitter and CS Departments (Part 1)

Tuesday, December 30th, 2014

I don’t spend all my time as Dylan says:

I’m on the pavement. Thinking about the government.

😉

Over the weekend I was looking at: The 50 Most Innovative Computer Science Departments in the U.S. in terms of how to gather information from those departments together.

One of the things that I haven’t seen is a curated list of faculty who have twitter accounts.

What follows are the top two CS departments as a proof-of-concept only and to seek your advice on a format for a complete set.

Massachusetts Institute of Technology:

Stanford:

The names and locations, where available, are from the user profiles maintained by Twitter. As you can see, there is no common location that would suffice to capture all the faculty for either of these departments. In fact, some of these were identified only by pursuing links on Twitter profiles that identified the individuals as faculty at non-Twitter sites.

Building out the data set, such as following, followers, etc., will be a matter of querying Twitter once I have a curated set of faculty members for the top fifty (50) institutions.

On the curated set of faculty members, any preference for format? I was thinking of something simple, like a CSV file with TwitterHandle, Full Name (as appears in Twitter profile), URI of department. Does that work for everyone? (Faculty as listed by the CS department)
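A sketch of that format as a starting point for discussion (the handles, names and URIs below are invented placeholders, not actual faculty):

```python
import csv

# Proposed columns: TwitterHandle, Full Name (as it appears in the
# Twitter profile), and the URI of the CS department.
rows = [
    ("@example_prof", "Example Professor", "https://www.csail.mit.edu/"),
    ("@another_prof", "Another Professor", "https://cs.stanford.edu/"),
]

with open("cs-faculty-twitter.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["TwitterHandle", "FullName", "DepartmentURI"])
    writer.writerows(rows)
```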

Suggestions? Comments?

NSA IOB Dump Finally Complete!

Tuesday, December 30th, 2014

The “Christmas Eve” NSA file dump that you will see reported at NSA Waited Until Christmas Eve To Release Details Of Its Illegal Surveillance On Americans, What you need to know about the NSA document dump, and U.S. Spy Agency Reports Improper Surveillance of Americans, repeated by various other sources (none of which mentioned the dump being incomplete), is now complete.

In Merry Christmas From the NSA! Missing Files I reported about 15 missing files; by NSA IOB Report Dump – Still Missing Files that number had fallen to 3; and when I checked today, the NSA file dump was complete, all corrections having been made silently.

You will notice that the final three files (3Q FY10, 3Q FY09, 4Q FY09) are named differently from the other files:

nsa-iob-30Dec2014

as text:

IOB/FY2010_1Q_IOB_Report.pdf
IOB/FY2010_2Q_IOB_Report.pdf
IOB/3Q_FY2010.pdf
IOB/FY2010_4Q_IOB_Report.pdf

IOB/FY2009_1Q_IOB_Report.pdf
IOB/FY2009_2Q_IOB_Report.pdf
IOB/3Q_FY2009.pdf
IOB/4Q_FY2009.pdf
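The three oddly named files fall out of a simple check against the dominant naming convention (file names as listed above):

```python
import re

files = [
    "IOB/FY2010_1Q_IOB_Report.pdf",
    "IOB/FY2010_2Q_IOB_Report.pdf",
    "IOB/3Q_FY2010.pdf",
    "IOB/FY2010_4Q_IOB_Report.pdf",
    "IOB/FY2009_1Q_IOB_Report.pdf",
    "IOB/FY2009_2Q_IOB_Report.pdf",
    "IOB/3Q_FY2009.pdf",
    "IOB/4Q_FY2009.pdf",
]

# Dominant pattern: IOB/FY<year>_<quarter>Q_IOB_Report.pdf
pattern = re.compile(r"^IOB/FY\d{4}_[1-4]Q_IOB_Report\.pdf$")

# Anything that doesn't match is one of the late, silently added files.
outliers = [f for f in files if not pattern.match(f)]
```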

Data analysis resources should be focused on the 3rd quarter report for 2010 and 3rd quarter and 4th quarter reports for 2009, especially as compared to other materials (Snowden?) for those time frames.

My heuristic being that people don’t delay without a reason. It isn’t necessary to know the reason, just to observe the delay. Could be entirely due to incompetence but if you count:

  1. Christmas Eve as happenstance
  2. Second incomplete dump as coincidence
  3. File renaming issue as the third: enemy action.

I have local copies of the files as they exist as of 17:13 on 30 December 2014 and I will be tarring those up for upload to my site later this evening. Please replicate them elsewhere as you see fit.
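For anyone mirroring the files, a sketch of the tar-and-checksum step (the paths are placeholders, adjust to your local copies) so that replicas can be verified against each other:

```python
import hashlib
import tarfile
from pathlib import Path

def tar_with_checksum(src_dir, archive_name):
    """Tar up a directory and record a SHA-256 of the archive so
    mirrors can verify their copy matches the original."""
    with tarfile.open(archive_name, "w:gz") as tar:
        tar.add(src_dir, arcname=Path(src_dir).name)
    digest = hashlib.sha256(Path(archive_name).read_bytes()).hexdigest()
    # Write the digest alongside the archive for easy verification.
    Path(archive_name + ".sha256").write_text(digest + "\n")
    return digest
```

Anyone replicating the archive can then recompute the SHA-256 of their copy and compare it to the published digest.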

Suggestions on tooling, collaborations, analysis, etc. welcome!

The Semantics of Victory

Tuesday, December 30th, 2014

NATO holds ceremony closing Afghan mission

From the post:

NATO has held a ceremony in Kabul formally ending its war in Afghanistan, officials said, after 13 years of conflict and gradual troop withdrawals that have left the country in the grip of worsening conflicts with armed groups.

The event was carried out on Sunday in secret due to the threat of Taliban strikes in the Afghan capital, which has been hit by repeated suicide bombings and gun attacks over recent years.

Compare that description to the AP story that appeared in the New York Times under: U.S. and NATO Formally End War in Afghanistan:

The war in Afghanistan, fought for 13 bloody years and still raging, came to a formal end Sunday with a quiet flag-lowering ceremony in Kabul that marked the transition of the fighting from U.S.-led combat troops to the country’s own security forces.

In front of a small, hand-picked audience at the headquarters of the NATO mission, the green-and-white flag of the International Security Assistance Force was ceremonially rolled up and sheathed, and the flag of the new international mission called Resolute Support was hoisted.

I assume from the dates and locations described these two accounts are describing the same event. Yes?

Does “…hand-picked audience…” translate to “…carried out … in secret due to the threat of Taliban strikes…?”

Bias isn’t unique to the United States, press or other sources but it is easier for me to spot. Examples from other sources are welcome.

The inevitable loss in Afghanistan is another example of failing to understand the semantics and culture of an opponent. (See my comments about Vietnam in Rare Find: Honest General Speaks Publicly About IS (ISIL, ISIS))

Let me summarize that lesson this way: An opponent cannot be “defeated” until you understand what “defeat” means to that opponent. And, you are capable of inflicting your opponent’s definition of “defeat” upon them.

It’s a two part requirement: 1) Opponent’s understanding of “defeat,” and 2) Inflicting opponent’s understanding of defeat. Fail on either requirement and your opponent has not been defeated.

Semantics are as important in war as in peace, if not more so.

Rare Find: Honest General Speaks Publicly About IS (ISIL, ISIS)

Monday, December 29th, 2014

In Battle to Defang ISIS, U.S. Targets Its Psychology by Eric Schmitt.

From the post:

Maj. Gen. Michael K. Nagata, commander of American Special Operations forces in the Middle East, sought help this summer in solving an urgent problem for the American military: What makes the Islamic State so dangerous?

Trying to decipher this complex enemy — a hybrid terrorist organization and a conventional army — is such a conundrum that General Nagata assembled an unofficial brain trust outside the traditional realms of expertise within the Pentagon, State Department and intelligence agencies, in search of fresh ideas and inspiration. Business professors, for example, are examining the Islamic State’s marketing and branding strategies.

“We do not understand the movement, and until we do, we are not going to defeat it,” he said, according to the confidential minutes of a conference call he held with the experts. “We have not defeated the idea. We do not even understand the idea.” (emphasis added)

An honest member of any administration in Washington is so unusual that I wanted to draw your attention to Maj. General Michael K. Nagata.

His problem, as you will quickly recognize, is one of a diversity of semantics. What is heard one way by a Western audience is heard completely differently by an audience with a different tradition.

The general may not think of it as “progress,” but getting Washington policy makers to acknowledge that there is a legitimate semantic gap between Western policy makers and IS is a huge first step. It can’t be grudging or half-hearted. Western policy makers have to acknowledge that there are honest views of the world that are different from their own. IS isn’t practicing dishonesty, deception, or perversely refusing to acknowledge the truth of Western statements. Members of IS have an honest but different semantic view of the world.

If the good general can get policy makers to take that step, then and only then can the discussion begin of what that “other” semantic is and how to map it into terms comprehensible to Western policy makers. If that step isn’t taken, then the resources necessary to explore and map that “other” semantic are never going to be allocated. And even if allocated, the results will never figure into policy making with regard to IS.

Failing on any of those three points: failing to concede the legitimacy of the IS semantic, failing to allocate resources to explore and understand the IS semantic, failing to incorporate an understanding of the IS semantic into policy making, is going to result in a failure to “defeat” IS, if that remains a goal after understanding its semantic.

Need an example? Consider the Vietnam War, in which approximately 58,220 Americans died and millions of Vietnamese, Laotians and Cambodians died, not counting long-term injuries among all of the aforementioned. In case you have not heard, the United States lost the Vietnam War.

The reasons for that loss are wide and varied but let me suggest two semantic differences that may have played a role in that defeat. First, the Vietnamese take a long-term view of repelling foreign invaders. Consider that Vietnam was occupied by the Chinese from 111 BCE until 938 CE, a period of more than one thousand (1,000) years. American war planners had a war semantic of planning for the next presidential election, not a winning strategy against a foe whose time horizon was two hundred and fifty (250) times longer.

The other semantic difference (among many others) was the understanding of “democracy,” which is usually heralded by American policy makers as a grand prize resulting from American involvement. In Vietnam, however, the villages and hamlets already had what some would consider democracy for centuries. (Beyond Hanoi: Local Government in Vietnam) Different semantic for “democracy” to be sure but one that was left unexplored in the haste to import a U.S. semantic of the concept.

Fighting a war where you don’t understand the semantics in play for the “other” side is risky business.

General Nagata has taken the first step towards such an understanding by admitting that he and his advisors don’t understand the semantics of IS. The next step should be to find someone who does. May I suggest talking to members of IS under informal meeting arrangements? Such that diplomatic protocols and news reporting don’t interfere with honest conversations? I suspect IS members are as ignorant of U.S. semantics as U.S. planners are of IS semantics, so there would be some benefit for all concerned.

Such meetings would yield more accurate understandings than U.S.-born analysts who live in upper middle-class Western enclaves and attempt to project themselves into foreign cultures. The understanding derived from such meetings could well contradict current U.S. policy assessments and objectives. Whether any administration has the political will to act upon assessments that aren’t the product of a shared post-Enlightenment semantic remains to be seen. But such assessments must be obtained first to answer that question.

Would topic maps help in such an endeavor? Perhaps, perhaps not. The most critical aspect of such a project would be conceding for all purposes, the legitimacy of the “other” semantic, where “other” depends on what side you are on. That is a topic map “state of mind” as it were, where all semantics are treated equally and not any one as more legitimate than any other.


PS: A litmus test for Major General Michael K. Nagata to use in assembling a team to attempt to understand IS semantics: Have each applicant write their description of the 9/11 hijackers in thirty (30) words or less. Any applicant who uses any variant of coward, extremist, terrorist, fanatic, etc. should be wished well and sent on their way. Not a judgement on their fitness for other tasks but they are not going to be able to bridge the semantic gap between current U.S. thinking and that of IS.

The CIA has a report on some of the gaps but I don’t know if it will be easier for General Nagata to ask the CIA for a copy or to just find a copy on the Internet. It illustrates, for example, why the American strategy of killing IS leadership is non-productive if not counter-productive.

If you have the means, please forward this post to General Nagata’s attention. I wasn’t able to easily find a direct means of contacting him.

Bioclojure: a functional library for the manipulation of biological sequences

Monday, December 29th, 2014

Bioclojure: a functional library for the manipulation of biological sequences by Jordan Plieskatt, Gabriel Rinaldi, Paul J. Brindley, Xinying Jia, Jeremy Potriquet, Jeffrey Bethony, and Jason Mulvenna.

Abstract:

Motivation: BioClojure is an open-source library for the manipulation of biological sequence data written in the language Clojure. BioClojure aims to provide a functional framework for the processing of biological sequence data that provides simple mechanisms for concurrency and lazy evaluation of large datasets.

Results: BioClojure provides parsers and accessors for a range of biological sequence formats, including UniProtXML, Genbank XML, FASTA and FASTQ. In addition, it provides wrappers for key analysis programs, including BLAST, SignalP, TMHMM and InterProScan, and parsers for analyzing their output. All interfaces leverage Clojure’s functional style and emphasize laziness and composability, so that BioClojure, and user-defined, functions can be chained into simple pipelines that are thread-safe and seamlessly integrate lazy evaluation.

Availability and implementation: BioClojure is distributed under the Lesser GPL, and the source code is freely available from GitHub (https://github.com/s312569/clj-biosequence).

Contact: jason.mulvenna@qimberghofer.edu.au or jason.mulvenna@qimr.edu.au

The introduction to this article is a great cut-n-paste “case for Clojure in bioinformatics.”

Functional programming is a programming style that treats computation as the evaluation of mathematical functions (Hudak, 1989). In its purest form, functional programming removes the need for variable assignment by using immutable data structures that eliminate the use of state and side effects (Backus, 1978). This ensures that functions will always return the same value given the same input. This greatly simplifies debugging and testing, as individual functions can be assessed in isolation regardless of a global state. Immutability also greatly simplifies concurrency and facilitates leveraging of multi-core computing facilities with little or no modifications to functionally written code. Accordingly, as a programming style, functional programming offers advantages for software development, including (i) brevity, (ii) simple handling of concurrency and (iii) seamless integration of lazy evaluation, simplifying the handling of large datasets.

Clojure is a Lisp variant that encourages a functional style of programming by providing immutable data structures, functions as first-class objects and uses recursive iteration as opposed to state-based looping (Hickey, 2008). Clojure is built on the Java virtual machine (JVM), and thus, applications developed using BioClojure can be compiled into Java byte code and ran on any platform that runs the JVM. Moreover, libraries constructed using Clojure can be called in Java programs and, conversely, Java classes and methods can be called from Clojure programs, making available a large number of third-party Java libraries.

BioClojure aims to leverage the tools provided by Clojure to provide a functional interface with biological sequence data and associated programs.
BioClojure is similar in intent to other bioinformatics packages such as BioPerl (Stajich et al., 2002), BioPython (Cock et al., 2009), Bio++ (Dutheil et al., 2006) and BioJava (Prlić et al., 2012) but differs from these bioinformatics software libraries in its embrace of the functional style. With the decreasing cost of biological analyses, for example, next-generation sequencing, biologists are dealing with greater amounts of data, and BioClojure is an attempt to provide tools, emphasizing concurrency and lazy evaluation, for manipulating these data.
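The laziness and composability the authors emphasize are not unique to Clojure. As a rough analogue (a sketch of my own in Python, using hypothetical FASTA-style records rather than BioClojure's actual API), a generator pipeline chains transformations without ever materializing the whole dataset:

```python
from itertools import islice

def records(lines):
    """Lazily group FASTA-style lines into (header, sequence) pairs."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def gc_content(seq):
    """Fraction of G/C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

# The pipeline below is lazy end to end: nothing is parsed or
# computed until islice pulls the first two results.
lines = iter([">a", "GGCC", ">b", "ATAT", ">c", "GCAT"])
rich = ((h, gc_content(s)) for h, s in records(lines) if gc_content(s) >= 0.5)
print(list(islice(rich, 2)))  # → [('a', 1.0), ('c', 0.5)]
```

The same shape, written with Clojure's immutable lazy sequences, also gets thread safety for free, which is the paper's selling point.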

I like the introduction as a form of evangelism, but using Clojure and BioClojure in bioinformatics to demonstrate their advantages is the better form of promotion.

Evangelism works best when results are untestable, not so well when results can be counted and measured.

Seeding the cloud…

Sunday, December 28th, 2014

Seeding the cloud — AWS gives credits with select edX certs by Barb Darrow.

From the post:

Amazon definitely wants enterprises to adopt its cloud, but it’s still wooing little startups too. This week, it said it will issue $1,000 in Amazon Web Services credit to any student who completes qualifying edX certifications in entrepreneurship. EdX is the online education platform backed by MIT, Harvard, and a raft of other universities.

Barb mentions at least six (6) other special cloud offers in a very short article. No doubt more are going to show up in 2015.

Are you going to be in the cloud in 2015?

I searched for “minimum fee” information for the entrepreneurship courses but got caught in a loop of HTML pages, none of which offered an actual answer.

Looking at some of the series courses, I would guess the “minimum fee” would be at or less than $100 per course. Check when you enroll for the actual “minimum fee.” Why the site admins want to be cagey about such a reasonable fee I cannot say.

Prying Eyes: Inside the NSA’s War on Internet Security

Sunday, December 28th, 2014

Prying Eyes: Inside the NSA’s War on Internet Security

Summary:

US and British intelligence agencies undertake every effort imaginable to crack all types of encrypted Internet communication. The cloud, it seems, is full of holes. The good news: New Snowden documents show that some forms of encryption still cause problems for the NSA.

A very long and comprehensive article from the SPIEGEL on encryption that may cause issues for the NSA. It is too complete to easily summarize so I suggest you read it in full and then take the following actions:

  • If you are not a cryptographer or child of a cryptographer, donate to one or more of the open source encryption projects you will find in the SPIEGEL article. Monthly if at all possible. Perhaps you can’t write encryption code but you can support those who do.
  • Use and consistently update your encryption technology and support those who work to make encryption easier to use. We need to create a tsunami of highly encrypted data everyday. From phone calls and IMs to emails and documents.
  • Politically resist all laws or regulations that make interception and/or decryption of communications legal and/or easier. You may not think you are committing a crime, but when government officials declare crimes and execute the guilty in private, how do you know?
  • Should you encounter any documents or data that expose government surveillance programs, there are existing examples of what you should do.

Once upon a time, privacy was a matter of the difficulty of tracking down physical copies of public records and asking neighbors what you liked to talk about. Those difficulties no longer exist and the electronic debris of our lives tells more than you might know.

The only privacy you have today is the privacy that you stake out and protect on your own. There are no guarantees that you will be successful in protecting your privacy but I can guarantee you won’t have any privacy if you don’t try.

A Beginners Guide to Content Creation

Sunday, December 28th, 2014

A Beginners Guide to Content Creation by Kristina Cisnero.

From the post:

From Songza to reddit, content curation is a huge part of the social web as we know it. We’re all on the same mission to find the absolute best material to enjoy and to share with our followers. This is especially true for businesses, whose customers and broader online audience follow them based on an expectation of quality content in return.

What is content curation?

In simple terms, the process of content curation is the act of sorting through large amounts of content on the web and presenting the best posts in a meaningful and organized way. The process can include sifting, sorting, arranging, and placing found content into specific themes, and then publishing that information.

In other words, content curation is very different from content marketing. Content curation doesn’t include creating new content; it’s the act of discovering, compiling, and sharing existing content with your online followers. Content curation is becoming an important tactic for any marketing department to maintain a successful online presence. Not only that, but content curation allows you to provide extra value to your brand’s audience and customers, which is key to building those lasting relationships with loyal fans.

It had not occurred to me that “content curation” might need definition. Kristina not only defines “content curation” but also illustrates why it is a value-add.

Being written in a web context, curation is defined relative to web content but curation can include (particularly with a topic map), any content of any form at any location. Some content may be more accessible than other content but web accessibility isn’t a requirement for curation. (Unless that is one of your requirements.)

Curated content can save your staff time and provide accurate results. Not to mention enabling informal knowledge to persist despite personnel changes. (Corporate memory)

Categories Great and Small

Saturday, December 27th, 2014

Categories Great and Small by Bartosz Milewski.

From the post:

You can get real appreciation for categories by studying a variety of examples. Categories come in all shapes and sizes and often pop up in unexpected places. We’ll start with something really simple.
No Objects

The most trivial category is one with zero objects and, consequently, zero morphisms. It’s a very sad category by itself, but it may be important in the context of other categories, for instance, in the category of all categories (yes, there is one). If you think that an empty set makes sense, then why not an empty category?
Simple Graphs

You can build categories just by connecting objects with arrows. You can imagine starting with any directed graph and making it into a category by simply adding more arrows. First, add an identity arrow at each node. Then, for any two arrows such that the end of one coincides with the beginning of the other (in other words, any two composable arrows), add a new arrow to serve as their composition. Every time you add a new arrow, you have to also consider its composition with any other arrow (except for the identity arrows) and itself. You usually end up with infinitely many arrows, but that’s okay.

Another way of looking at this process is that you’re creating a category, which has an object for every node in the graph, and all possible chains of composable graph edges as morphisms. (You may even consider identity morphisms as special cases of chains of length zero.)

Such a category is called a free category generated by a given graph. It’s an example of a free construction, a process of completing a given structure by extending it with a minimum number of items to satisfy its laws (here, the laws of a category). We’ll see more examples of it in the future.
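The free-category construction can be made concrete with a small sketch (my own illustration, not code from the post; `morphisms` is a name I made up): morphisms are exactly the composable chains of edges, plus an identity at each node. Since cycles make the set infinite, this sketch enumerates chains only up to a length bound.

```python
def morphisms(nodes, edges, max_len):
    """Morphisms of the free category on a directed graph, represented
    as composable chains of edges. Identities are the zero-length
    chains (one per node) and composition is chain concatenation.
    Cycles make the full morphism set infinite, so enumeration stops
    at chains of max_len edges."""
    result = [("id", n) for n in nodes]           # identity at each node
    chains = [(e,) for e in edges]                # chains of one edge
    for _ in range(max_len):
        result.extend(chains)
        # extend each chain by every edge whose start matches its end
        chains = [c + (e,) for c in chains for e in edges if e[0] == c[-1][1]]
    return result

# A -> B -> C: three identities, two edges, one composite = 6 morphisms
print(len(morphisms(["A", "B", "C"], [("A", "B"), ("B", "C")], max_len=2)))  # → 6
```

Note how adding one more edge from C back to A would make the chain set grow without bound, which is the "infinitely many arrows" point in the quote.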

The latest installment in literate explanation of category theory in this series.

Challenges await you at the end of this post.

Enjoy!

List of hacked government agencies grows:…

Saturday, December 27th, 2014

List of hacked government agencies grows: State Department, White House, NOAA & USPS by Darlene Storm.

Shaming the government isn’t an effective strategy to promote cyber security.

In part because improving governmental cybersecurity must be accomplished without:

  1. Changing any current personnel
  2. Changing any current practices
  3. Changing any current software
  4. Increasing burdens on users or programmers
  5. Increasing burdens on contractors

What if we were to remove those limitations and gave agency personnel some “skin in the game” so to speak?

What if an agency (subject to verification by the GAO) went unhacked for a fiscal year and its staff, below the appointed leadership level, not only got an annual bonus of 10% but also received a 10% raise for the next two fiscal years?

Plus favorable PR for being an unhacked federal agency.

How much effort do you think an agency’s staff would put into contracting for secure software and enforcing security practices?

For very large agencies, like the Department of Defense, it might be necessary to break security down on a chain of command basis. To keep slackers from pulling down other commands.

As the situation stands now, no amount of security failures or breaches has any impact on anyone. Has Booz Allen Hamilton suffered any penalty for Edward Snowden? Are sysadmins at the White House feeling uneasy? When there are no consequences for failure and no rewards for success, mediocrity is a certainty.

Mediocrity in cybersecurity = cyberinsecurity.*

* To anticipate the objection “…that’s just not how government agencies are run…” I would append “now” and observe there is always a first time.

A Common Logic to Seeing Cats and Cosmos

Saturday, December 27th, 2014

A Common Logic to Seeing Cats and Cosmos by Natalie Wolchover.

From the post:


There may be a universal logic to how physicists, computers and brains tease out important features from among other irrelevant bits of data.

When in 2012 a computer learned to recognize cats in YouTube videos and just last month another correctly captioned a photo of “a group of young people playing a game of Frisbee,” artificial intelligence researchers hailed yet more triumphs in “deep learning,” the wildly successful set of algorithms loosely modeled on the way brains grow sensitive to features of the real world simply through exposure.

Using the latest deep-learning protocols, computer models consisting of networks of artificial neurons are becoming increasingly adept at image, speech and pattern recognition — core technologies in robotic personal assistants, complex data analysis and self-driving cars. But for all their progress training computers to pick out salient features from other, irrelevant bits of data, researchers have never fully understood why the algorithms or biological learning work.

Now, two physicists have shown that one form of deep learning works exactly like one of the most important and ubiquitous mathematical techniques in physics, a procedure for calculating the large-scale behavior of physical systems such as elementary particles, fluids and the cosmos.

The new work, completed by Pankaj Mehta of Boston University and David Schwab of Northwestern University, demonstrates that a statistical technique called “renormalization,” which allows physicists to accurately describe systems without knowing the exact state of all their component parts, also enables the artificial neural networks to categorize data as, say, “a cat” regardless of its color, size or posture in a given video.

“They actually wrote down on paper, with exact proofs, something that people only dreamed existed,” said Ilya Nemenman, a biophysicist at Emory University. “Extracting relevant features in the context of statistical physics and extracting relevant features in the context of deep learning are not just similar words, they are one and the same.”

As for our own remarkable knack for spotting a cat in the bushes, a familiar face in a crowd or indeed any object amid the swirl of color, texture and sound that surrounds us, strong similarities between deep learning and biological learning suggest that the brain may also employ a form of renormalization to make sense of the world.

“Maybe there is some universal logic to how you can pick out relevant features from data,” said Mehta. “I would say this is a hint that maybe something like that exists.”

The finding formalizes what Schwab, Mehta and others saw as a philosophical similarity between physicists’ techniques and the learning procedure behind object or speech recognition. Renormalization is “taking a really complicated system and distilling it down to the fundamental parts,” Schwab said. “And that’s what deep neural networks are trying to do as well. And what brains are trying to do.”

If you weren’t already planning on learning/catching up on deep learning in 2015, this article should tip the balance towards deep learning. Not simply because it appears to be “the” idea for 2015 but because you are likely to be called upon to respond to analysis/conclusions based upon deep learning techniques.

Unlike Stephen Hawking, I don’t fear the rise of artificial intelligence. What I fear is the uncritical acceptance of machine learning results, whether artificial intelligence ever arrives or not.

Critical discussion of deep learning results and techniques is going to require people as informed as the advocates of deep learning on all sides. How can you oppose a policy that is justified by an algorithm considering far more factors than any person and that has no racial prejudice? How could it be? It is simply an algorithm.

Saying that a result or algorithm is racist isn’t very scientific. What opposition to the policies of tomorrow will require is detailed analysis of both data and algorithms so as to leave little or no doubt that a racist outcome was an intentional one.

Here’s a concrete example of where greater knowledge allows someone to deceive the general public while claiming to be completely open. In the Michael Brown case, Prosecutor McCulloch claims to have allowed everyone who claimed to have knowledge of the case to testify. Which is true, as far as it went. What he failed to say was that every witness that supported a theory that Darren Wilson was guilty of murdering Michael Brown, had their prior statements presented to the grand jury and were heavily cross-examined by the prosecutors. On the surface fair, just beneath, extremely unfair. But you have to know the domain to see the unfairness.

The same is going to be the case when results of deep learning are presented. How much do you trust the person presenting the results? And the people they trusted with the data and analysis?

Software Foundations

Saturday, December 27th, 2014

Software Foundations by Benjamin Pierce and others.

From the preface:

This electronic book is a course on Software Foundations, the mathematical underpinnings of reliable software. Topics include basic concepts of logic, computer-assisted theorem proving and the Coq proof assistant, functional programming, operational semantics, Hoare logic, and static type systems. The exposition is intended for a broad range of readers, from advanced undergraduates to PhD students and researchers. No specific background in logic or programming languages is assumed, though a degree of mathematical maturity will be helpful.

One novelty of the course is that it is one hundred per cent formalized and machine-checked: the entire text is literally a script for Coq. It is intended to be read alongside an interactive session with Coq. All the details in the text are fully formalized in Coq, and the exercises are designed to be worked using Coq.

The files are organized into a sequence of core chapters, covering about one semester’s worth of material and organized into a coherent linear narrative, plus a number of “appendices” covering additional topics. All the core chapters are suitable for both graduate and upper-level undergraduate students.

This looks like a real treat!

Imagine security in a world where buggy software (by error and design) wasn’t patched by more buggy software (by error and design) and protected by security software, which is also buggy (by error and design). Would that change the complexion of current security issues?

I first saw this in a tweet by onepaperperday.

PS: Sony got hacked, again. Rumor is that this latest Sony hack was an extra credit exercise for a 6th grade programming class.

NSA IOB Report Dump – Still Missing Files

Saturday, December 27th, 2014

In Merry Christmas From the NSA! Missing Files, I reported a blog entry on the Christmas Eve posting of Intelligence Oversight Board reports covering a span of years. On closer inspection, I found a number of second quarter reports were missing.

The very next day, 26 December 2014, another blog post reported on the NSA posting and following the link there, most of the missing files had been supplied by the NSA, with nary a peep about the previously missing files.

Today is 27 December 2014 and the following files continue to be missing:

3Q FY10, 3Q FY09, 4Q FY09

I realize that I am being unreasonable. The NSA posting was only missing 15 files out of 48. The current NSA posting continues to miss 3 files out of 48. The stories got NSA correct, some files were released, and the URL was right. What more could you ask for?

I will have to let you answer that last question for yourselves. I will say something untowards if I continue this post.


I have created a compressed tar ball with all the NSA IOB reports that are currently on the NSA website. Avoiding being on the NSA web logs doesn’t count for much because I am sure they are tracking all web traffic anyway. But, as a sign of annoyance with the NSA, please obtain the incomplete file set here (92 MB approximately).

The Inductive Biases of Various Machine Learning Algorithms

Saturday, December 27th, 2014

The Inductive Biases of Various Machine Learning Algorithms by Laura Diane Hamilton.

From the post:

Every machine learning algorithm with any ability to generalize beyond the training data that it sees has, by definition, some type of inductive bias.

That is, there is some fundamental assumption or set of assumptions that the learner makes about the target function that enables it to generalize beyond the training data.

Below is a chart that shows the inductive biases for various machine learning algorithms:

Inductive reasoning has a checkered history (Hume) but is widely relied upon in machine learning.

Consider this a starter set of biases for classes of machine learning algorithms.
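The chart itself is not reproduced here, but the idea can be made concrete with a toy sketch of my own (not from the post): two learners that both fit the training data perfectly yet extrapolate differently, because they make different assumptions about the target function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (bias: target is linear)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def nearest_neighbor(xs, ys):
    """1-NN regression (bias: target is constant near training points)."""
    return lambda x: ys[min(range(len(xs)), key=lambda i: abs(xs[i] - x))]

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # training data: exactly y = 2x
line, knn = fit_line(xs, ys), nearest_neighbor(xs, ys)
# Both are perfect on the training data; their biases diverge off it:
print(line(10.0))  # → 20.0  (linear bias extends the trend)
print(knn(10.0))   # → 6.0   (1-NN repeats the nearest seen label)
```

Which answer is "right" depends entirely on whether the true target really is linear, which is the sense in which every generalizing learner must assume something.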

There may be entire monographs on the subject but I haven’t seen a treatment at length on how to manipulate data sets so they take advantage of known biases in the better known machine learning algorithms.

You could take the position that misleading data sets test the robustness of machine learning algorithms and so the principles of their generation and use have the potential to improve machine learning.

That may well be the case, but I would be interested in such a treatment so that such manipulation of data could be detected.

Either way, it would be an interesting effort, assuming it doesn’t exist already.

Pointers anyone?

I first saw this in a tweet by Alex Hall.

Accidental vs Deliberate Context

Saturday, December 27th, 2014

Accidental vs Deliberate Context by Jessica Kerr.

From the post:

In all decisions, we bring our context with us. Layers of context, from what we read about that morning to who our heroes were growing up. We don’t realize how much context we assume in our communications, and in our code.

One time I taught someone how to make the Baby Vampire face. It involves poking out both corners of my lower lip, so they stick up like poky gums. Very silly. To my surprise, the person couldn’t do it. They could only poke one side of the lower lip out at a time.


Turns out, few outside my family can make this face. My mom can do it, my sister can do it, my daughters can do it – so it came as a complete surprise to me when someone couldn’t. There is a lip-flexibility that’s part of my context, always has been, and I didn’t even realize it.

Jessica goes on to illustrate that communication depends upon the existence of some degree of shared context and that additional context can be explained to others, as on a team.

She distinguishes between “incidental” shared contexts and “deliberate” shared contexts. Incidental contexts arise from family or from long association with friends; common, shared experiences form an incidental context.

Deliberate contexts, on the other hand, are the intentional melding of a variety of contexts, in her examples, the contexts of biologists and programmers. Who at the outset, lacked a common context in which to communicate.

Forming teams with diverse backgrounds is a way to create a “deliberate” context, but my question would be how to preserve that “deliberate” context for others? It becomes an “incidental” context if others must join the team in order to absorb the previously “deliberate” context. If that is a requirement, then others will not be able to benefit from deliberately created contexts in which they did not participate.

If the process and decisions made in forming a “deliberate” context were captured by a topic map, then others could apply this “new” deliberate context to develop other “deliberate” contexts. Perhaps some of the decisions or mappings made would not suit another “deliberate” context but perhaps some would. And perhaps other “deliberate” contexts would evolve beyond the end of their inputs.

The point being that unless these “deliberate” contexts are captured, to whatever degree of granularity is desired, every “deliberate” context for say biologists and programmers is starting off at ground zero. Have you ever heard of a chemistry experiment starting off by recreating the periodic table? I haven’t. Perhaps we should abandon that model in the building of “deliberate” contexts as well.

Not to mention that re-usable “deliberate” contexts might enable greater diversity in teams.

Topic maps anyone?

PS: I suggest topic maps to capture “deliberate” context because topic maps are not constrained by logic. You can capture any subject and any relationship between subjects, logical or not. For example, a user of a modern dictionary, which lists words in alphabetical order, would be quite surprised if given a dictionary of Biblical Hebrew and asked to find a word (assuming they know the alphabet). The most common dictionaries of Biblical Hebrew list words by their roots and not as they appear to the common reader. There are arguments to be made for each arrangement but neither one is a “logical” answer.

The arrangement of dictionaries is another example of differing contexts. With a topic map I can offer a reader whichever Biblical Hebrew dictionary is desired, with only one text underlying both displays. As opposed to the printed version which can offer only one context or another.
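The one-text, two-dictionaries idea can be sketched as a single store with two indexes built over it (a toy model of my own, with illustrative roots and glosses, not an actual topic map engine):

```python
# One underlying set of entries; two presentation orders over it.
# (Toy data; the roots and glosses are illustrative only.)
entries = [
    {"word": "dabar", "root": "d-b-r", "gloss": "word, thing"},
    {"word": "melek", "root": "m-l-k", "gloss": "king"},
    {"word": "midbar", "root": "d-b-r", "gloss": "wilderness"},
]

# View 1: the modern reader's dictionary, alphabetical by word.
alphabetical = sorted(entries, key=lambda e: e["word"])

# View 2: the traditional dictionary, words grouped under their roots.
by_root = {}
for e in sorted(entries, key=lambda e: (e["root"], e["word"])):
    by_root.setdefault(e["root"], []).append(e["word"])

print([e["word"] for e in alphabetical])  # → ['dabar', 'melek', 'midbar']
print(by_root)  # → {'d-b-r': ['dabar', 'midbar'], 'm-l-k': ['melek']}
```

Neither view changes the underlying entries; each is just a different traversal, which is the point about serving both contexts from one text.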

U&lc (volumes 1-24)

Saturday, December 27th, 2014

U&lc (volumes 1-24)

A remarkable collection of back issues of U&lc.

To best explain the almost twenty-seven year run of U&lc, I offer this editorial from volume 1, issue 1:

Why U&lc?

The world of graphic arts is alive today with new technological advances, so vast and difficult to comprehend, that they strain the imagination of even the most knowledgeable and creatively gifted among us. New materials, new tools, new ways to plan work are becoming mandatory for efficiency, quality, economy—presenting problems for all—printers, typesetters, artists, writers, advertisers, publishers—all the creative people who have anything to do with preparation of the visual word.

How to keep up? How to stay in touch with what is current? How to plan for tomorrow? To envision a future essential to decision making today?

Vital questions for the interested professional. Yet where can he find the most recent information on trends, styles, fashions? Where can he read about all and everything that is happening in the graphic arts and sciences?

To help make this broad body of knowledge and information available—and, hopefully, to provide some answers—International Typeface Corporation introduces this first issue of “U&lc,” the International Journal of Typo/Graphics, designed by Herb Lubalin and distributed worldwide.

“U&lc” will have broad general appeal, covering important graphic events and presenting original articles by world leaders in the typographic arts, as well as reprints of articles of importance that have appeared in other publications.

“U&lc” will feature outstanding examples of typographic design in all fields of visual communication, from the best-known creators to the undiscovered shops.

“U&lc” will offer in-depth analysis of the material presented and study the direction of current work and developments in typographic technology.

In brief, “U&lc” will provide a panoramic window, a showcase for the world of graphic arts—a clearinghouse for the international exchange of ideas and information.

It is the intent of the editorial staff and the directors of ITC that “U&lc” will come to serve as the international journal for all who want to have their finger on “what is new,” “what is happening,” and “what to look for” in the world of typographics.

The Editors

Fonts, graphics, and page layout are as important to communication as language, grammar, style, content, experience, context, and a host of other known and unknown factors. Life may or may not be a miracle. But that communication happens at all is certainly a miracle.

Fonts and graphic layout are two known factors that impact communication and you would do well to appreciate them, even as you seek the advice of experts for screen or print communication with users.

Enjoy!

PS: This remarkable collection is hosted at Fonts.com. A remarkable collection of information on typography in its own right.

24 Data Science Resources to Keep Your Finger on the Pulse

Friday, December 26th, 2014

24 Data Science Resources to Keep Your Finger on the Pulse by Cheng Han Lee.

From the post:

There are lots of resources out there to learn about, or to build upon what you already know about, data science. But where do you start? What are some of the best or most authoritative sources? Here are some websites, books, and other resources that we think are outstanding.

All of these resources are worth following.

If you aspire to be a data scientist, do more than nod along with each posting. Download/install the tools, work through the presented problem and then explore beyond it. In three months, your data science skills will have improved more than you can imagine. Think of where you will be next year!

The 50 Most Innovative Computer Science Departments in the U.S.

Friday, December 26th, 2014

The 50 Most Innovative Computer Science Departments in the U.S. by Yusuf Laher.

A great resource but the additional links were written as text and not hyperlinks. Thus, for University of Texas at Austin:

[screenshot: the original listing shows the URLs as plain text, not hyperlinks]

Compare my listing (which includes the links that were buried in prose):

11. Department of Computer Science, University of Texas at Austin – Austin, Texas

Department of Computer Science

https://www.cs.utexas.edu/faculty

https://www.cs.utexas.edu/about-us

I deleted the prose descriptions, changed all the links into HTML links, swept for vanity links, such as prior rankings, etc. What’s left should be links to the departments, major projects and faculty. It’s a rough cut but suitable for spidering, etc.
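As a first step toward the spidering mentioned above, a minimal stdlib sketch of my own (the `extract_urls` helper is hypothetical) can harvest the URLs from the cleaned listing:

```python
import re

def extract_urls(text):
    """Pull http(s) URLs out of plain text, stripping trailing
    punctuation that often clings to links in prose."""
    urls = re.findall(r"https?://\S+", text)
    return [u.rstrip(".,;)") for u in urls]

listing = """
11. Department of Computer Science, University of Texas at Austin
https://www.cs.utexas.edu/faculty
https://www.cs.utexas.edu/about-us
"""
print(extract_urls(listing))
# → ['https://www.cs.utexas.edu/faculty', 'https://www.cs.utexas.edu/about-us']
```

From there, each URL could be fetched and parsed in turn; the listing below is already clean enough for such a pass.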

50. Department of Computer Science, University of Arkansas at Little Rock – Little Rock, Arkansas

Department of Computer Science

http://ualr.edu/eit/

http://ualr.edu/computerscience/prospective-students/facilities/

49. School of Informatics and Computing, Indiana University – Bloomington, Indiana

School of Informatics and Computing

Student Technology Centers

http://www.soic.indiana.edu/faculty-research/

http://www.soic.indiana.edu/about/

48. Department of Computer Science & Engineering, Texas A&M University – College Station, Texas

Department of Computer Science & Engineering

the High Performance Computing Laboratory

http://engineering.tamu.edu/cse/

47. School of Computing, Informatics and Decision Systems Engineering, Arizona State University – Tempe, Arizona

School of Computing, Informatics and Decision Systems Engineering

Center for Excellence in Logistics and Distribution

http://cidse.engineering.asu.edu/facultyandresearc/research-centers/

46. Department of Computer Science, The University of North Carolina at Chapel Hill – Chapel Hill, North Carolina

Department of Computer Science

https://www.cs.unc.edu/cms/research/research-laboratories

45. Department of Computer Science, Rutgers University – Piscataway, New Jersey

Department of Computer Science

Hack R Space

44. Department of Computer Science, Stony Brook University – Stony Brook, New York

Stony Brook University

Center of Excellence in Wireless and Information Technology

https://www.cs.stonybrook.edu/research

43. Department of Computer Science & Engineering, Washington University in St. Louis – St. Louis, Missouri

Department of Computer Science & Engineering

Cyber-Physical Systems Lab

Stream-based Supercomputing Lab

http://cse.wustl.edu/aboutthedepartment/Pages/history.aspx

http://cse.wustl.edu/Research/

42. Department of Computer Science, Purdue University – West Lafayette, Indiana

Department of Computer Science

Center for Integrated Systems in Aerospace

https://www.cs.purdue.edu/research/centers.html

http://www.purdue.edu/discoverypark/cri/research/projects.php

41. Computer Science Department, New York University – New York City, New York

Computer Science Department

40. Electrical Engineering and Computer Science, Northwestern University – Evanston, Illinois

Electrical Engineering and Computer Science department

http://eecs.northwestern.edu/2013-09-03-20-01-56/researchgroupsandlabs/

http://www.eecs.northwestern.edu/graduate-study

39. School of Computing, The University of Utah – Salt Lake City, Utah

School of Computing

Scientific Computing and Imaging Institute

http://www.cs.utah.edu/research/

http://www.cs.utah.edu/about/history/

38. Department of Computer Science, University of California, Santa Barbara – Santa Barbara, California

Department of Computer Science

Four Eyes Lab

https://www.cs.ucsb.edu/research

37. Department of Computer Science and Engineering, University of Minnesota – Minneapolis, Minnesota

Department of Computer Science and Engineering

Laboratory for Computational Science and Engineering

http://www.cs.umn.edu/department/excellence.php

http://www.cs.umn.edu/research/

36. Department of Computer Science, Dartmouth College – Hanover, New Hampshire

Department of Computer Science

Visual Learning Group

computational biology-focused Grigoryan Lab

http://web.cs.dartmouth.edu/research/projects

35. Department of Computer Science and Engineering, The Ohio State University – Columbus, Ohio

computer science programs

https://cse.osu.edu/research

34. Department of Computer Science, North Carolina State University – Raleigh, North Carolina

Department of Computer Science

Visual Experiences Lab

http://www.csc.ncsu.edu/news/

33. Computer Science Department, Boston University – Boston, Massachusetts

Computer Science Department

http://www.bu.edu/hic/about-hic/

32. Department of Computer Science, University of Pittsburgh – Pittsburgh, Pennsylvania

Department of Computer Science

Pittsburgh Supercomputing Center

http://www.cs.pitt.edu/research/

31. Department of Computer Science, Virginia Polytechnic Institute and State University – Blacksburg, Virginia

computer science programs

30. Department of Computer Science, University of California, Davis – Davis, California

Department of Computer Science

http://www.cs.ucdavis.edu/index.html

http://www.cs.ucdavis.edu/research/index.html

http://www.cs.ucdavis.edu/iap/index.html

29. Department of Computer Science and Engineering, Pennsylvania State University – University Park, Pennsylvania

Department of Computer Science and Engineering

Institute for CyberScience

https://www.cse.psu.edu/research

http://ics.psu.edu/what-we-do/

28. Department of Computer Science, Johns Hopkins University – Baltimore, Maryland

Department of Computer Science

boundary-crossing areas

Center for Encrypted Functionalities

http://www.cs.jhu.edu/research/

27. School of Computer Science, University of Massachusetts Amherst – Amherst, Massachusetts

School of Computer Science

https://www.cs.umass.edu/faculty/

26. Department of Computer Science, University of Illinois at Chicago – Chicago, Illinois

Department of Computer Science

25. Rice University Computer Science, Rice University – Houston, Texas

computer science department

24. Department of Computer Science, Donald Bren School of Information and Computer Sciences, University of California, Irvine – Irvine, California

Donald Bren School of Information and Computer Sciences

Center for Emergency Response Technologies

Center for Machine Learning and Intelligent Systems

Institute for Virtual Environments and Computer Games

http://uci.edu/academics/ics.php

http://www.ics.uci.edu/faculty/

23. Department of Computer Science, University of Maryland – College Park, Maryland

Department of Computer Science

Institute for Advanced Computer Studies

https://www.cs.umd.edu/research

22. Brown Computer Science, Brown University – Providence, Rhode Island

computer science department

Center for Computational Molecular Biology

Center for Vision Research

http://cs.brown.edu/research/

21. Department of Computer Science, University of Chicago – Chicago, Illinois

Department of Computer Science

http://www.cs.uchicago.edu/research/labs

20. Department of Computer and Information Science, University of Pennsylvania – Philadelphia, Pennsylvania

Department of Computer and Information Science

Electronic Numerical Integrator and Computer

http://www.seas.upenn.edu/about-seas/eniac/

19. Department of Computer Science and Engineering, University of California, San Diego – La Jolla, California

Department of Computer Science and Engineering

18. Department of Computer Science, University of Southern California – Los Angeles, California

Department of Computer Science

http://www.cs.usc.edu/research/centers-and-institutes.htm

17. Department of Electrical Engineering and Computer Science, University of Michigan – Ann Arbor, Michigan

Department of Electrical Engineering and Computer Science

http://www.eecs.umich.edu/eecs/research/reslabs.html

16. Department of Computer Sciences, University of Wisconsin-Madison – Madison, Wisconsin

Department of Computer Sciences

https://www.cs.wisc.edu/people/emeritus-faculty

15. College of Computing, Georgia Institute of Technology – Atlanta, Georgia

College of Computing

Center for Robotics and Intelligent Machines

Georgia Tech Information Security Center

http://www.cse.gatech.edu/research

14. Department of Computer Science, Yale University – New Haven, Connecticut

Department of Computer Science

http://cpsc.yale.edu/our-research

13. Department of Computing + Mathematical Sciences, California Institute of Technology – Pasadena, California

Department of Computing + Mathematical Sciences

Annenberg Center

http://www.cms.caltech.edu/research/

12. Department of Computer Science, University of Illinois at Urbana-Champaign – Urbana, Illinois

Department of Computer Science

http://cs.illinois.edu/research/research-centers

11. Department of Computer Science, University of Texas at Austin – Austin, Texas

Department of Computer Science

https://www.cs.utexas.edu/faculty

10. Department of Computer Science, Cornell University – Ithaca, New York

Department of Computer Science

Juris Hartmanis

http://www.cs.cornell.edu/research

http://www.cs.cornell.edu/people/faculty

9. UCLA Computer Science Department, University of California, Los Angeles – Los Angeles, California

Computer Science Department

http://www.cs.ucla.edu/people/faculty

http://www.cs.ucla.edu/research/research-labs

8. Department of Computer Science, Princeton University – Princeton, New Jersey

computer science department

http://www.cs.princeton.edu/research/areas

7. Computer Science Division, University of California, Berkeley – Berkeley, California

Computer Science Division

6. Harvard School of Engineering and Applied Sciences, Harvard University – Cambridge, Massachusetts

School of Engineering and Applied Sciences

http://www.seas.harvard.edu/faculty-research/

5. Carnegie Mellon School of Computer Science, Carnegie Mellon University – Pittsburgh, Pennsylvania

School of Computer Science

http://www.cs.cmu.edu/directory/

4. Computer Science & Engineering, University of Washington – Seattle, Washington

Computer Science & Engineering

Paul G. Allen Center for Computer Science & Engineering

https://www.cs.washington.edu/research/

3. Department of Computer Science, Columbia University – New York City, New York

The Fu Foundation School of Engineering and Applied Science

http://engineering.columbia.edu/graduateprograms/

http://www.cs.columbia.edu/people/faculty

2. Computer Science Department, Stanford University – Stanford, California

Computer Science Department

http://www-cs.stanford.edu/research/

1. MIT Electrical Engineering & Computer Science, Massachusetts Institute of Technology – Cambridge, Massachusetts

Electrical Engineering & Computer Science

Computer Science and Artificial Intelligence Laboratory

Suggestions for further uses of this listing are welcome! (Or grab a copy and use it yourself. Please send a link to what you make of it. Thanks!)

How to Win at Rock-Paper-Scissors

Friday, December 26th, 2014

How to Win at Rock-Paper-Scissors

From the post:

The first large-scale measurements of the way humans play Rock-Paper-Scissors reveal a hidden pattern of play that opponents can exploit to gain a vital edge.


If you’ve ever played Rock-Paper-Scissors, you’ll have wondered about the strategy that is most likely to beat your opponent. And you’re not alone. Game theorists have long puzzled over this and other similar games in the hope of finding the ultimate approach.

It turns out that the best strategy is to choose your weapon at random. Over the long run, that makes it equally likely that you will win, tie, or lose. This is known as the mixed strategy Nash equilibrium in which every player chooses the three actions with equal probability in each round.

And that’s how the game is usually played. Various small-scale experiments that record the way real people play Rock-Paper-Scissors show that this is indeed the strategy that eventually evolves.

Or so game theorists had thought… (emphasis added)
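As an aside, the "mixed strategy Nash equilibrium" in the excerpt is easy to see for yourself. A quick simulation sketch (mine, not from the article): when both players choose uniformly at random, wins, ties, and losses each settle near one third.

```python
import random

def play_round():
    """One round of Rock-Paper-Scissors; both players choose uniformly at random."""
    moves = ["rock", "paper", "scissors"]
    beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
    a, b = random.choice(moves), random.choice(moves)
    if a == b:
        return "tie"
    return "win" if beats[a] == b else "loss"

def simulate(rounds=300_000):
    """Tally player A's outcomes over many rounds; each frequency approaches 1/3."""
    counts = {"win": 0, "tie": 0, "loss": 0}
    for _ in range(rounds):
        counts[play_round()] += 1
    return {k: v / rounds for k, v in counts.items()}

print(simulate())  # each frequency close to 1/3
```

Which is exactly why the hidden pattern the researchers found is interesting: real players apparently do not play this way.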

No, I’m not going to give away the answer!

I will only say the answer isn’t what has been previously thought.

Why the different answer? The authors speculate, with some justification, that the small scale of prior experiments kept a pattern from emerging that became quite obvious when the game was played at a much larger scale.

Given that N < 100 in so many sociology, psychology, and other social science experiments, the existing literature offers a vast number of opportunities where repeating small experiments at large scale could produce different results. If you have any friends in a local social science department, you might want to suggest this to them as a way to be on the front end of big data in social science.

PS: If you have access to a social science index, please search and post a rough count of studies with fewer than 100 participants in some subset of social science journals, say since 1970. Thanks!
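The point about small samples hiding patterns is worth making concrete. A toy simulation (my illustration, with made-up numbers): suppose a real but small effect exists, such as a "coin" biased at 52%. Experiments with fewer than 100 participants almost never detect it, while a large-scale experiment detects it nearly every time.

```python
import random

def detect_bias(n, p=0.52, trials=300):
    """Fraction of simulated experiments (each with n participants flipping a
    coin of true bias p) whose sample mean exceeds the null value 0.5 by two
    standard errors -- a rough stand-in for statistical significance."""
    se = (0.25 / n) ** 0.5  # standard error of the mean under the null p = 0.5
    hits = 0
    for _ in range(trials):
        mean = sum(random.random() < p for _ in range(n)) / n
        if mean > 0.5 + 2 * se:
            hits += 1
    return hits / trials

print(detect_bias(80))      # small study: the bias is rarely detected
print(detect_bias(20_000))  # large study: the bias is detected almost every time
```

The effect never changed; only the sample size did. That is the mechanism by which repeating small experiments at large scale can overturn the published result.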

Big Data – The New Science of Complexity

Friday, December 26th, 2014

Big Data – The New Science of Complexity by Wolfgang Pietsch.

Abstract:

Data-intensive techniques, now widely referred to as ‘big data’, allow for novel ways to address complexity in science. I assess their impact on the scientific method. First, big-data science is distinguished from other scientific uses of information technologies, in particular from computer simulations. Then, I sketch the complex and contextual nature of the laws established by data-intensive methods and relate them to a specific concept of causality, thereby dispelling the popular myth that big data is only concerned with correlations. The modeling in data-intensive science is characterized as ‘horizontal’—lacking the hierarchical, nested structure familiar from more conventional approaches. The significance of the transition from hierarchical to horizontal modeling is underlined by a concurrent paradigm shift in statistics from parametric to non-parametric methods.

A serious investigation of the “science” of big data, which I noted was needed in: Underhyped – Big Data as an Advance in the Scientific Method.

From the conclusion:

The knowledge established by big-data methods will consist in a large number of causal laws that generally involve numerous parameters and that are highly context-specific, i.e. instantiated only in a small number of cases. The complexity of these laws and the lack of a hierarchy into which they could be integrated prevent a deeper understanding, while allowing for predictions and interventions. Almost certainly, we will experience the rise of entire sciences that cannot leave the computers and do not fit into textbooks.

This essay and the references therein are a good vantage point from which to observe the development of a new science and its philosophy of science.
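The "paradigm shift in statistics from parametric to non-parametric methods" mentioned in the abstract can be illustrated with a toy sketch (mine, not from the paper): a straight line commits in advance to a fixed two-parameter form, while a nearest-neighbor model makes no such commitment and lets the data speak locally.

```python
import math

# Toy data: a nonlinear relationship both model families will try to capture.
xs = [i / 10 for i in range(100)]  # 0.0 .. 9.9
ys = [math.sin(x) for x in xs]     # noiseless, for clarity

def fit_line(xs, ys):
    """Parametric: least-squares straight line y = a + b*x (two fixed parameters)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=5):
    """Non-parametric: predict the mean of the k nearest training points;
    model complexity grows with the data rather than being fixed in advance."""
    def predict(x):
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return sum(ys[i] for i in nearest) / k
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line, knn = fit_line(xs, ys), fit_knn(xs, ys)
print(mse(line, xs, ys))  # large: a straight line cannot follow the sine curve
print(mse(knn, xs, ys))   # small: the local model adapts to the data
```

The non-parametric fit is "horizontal" in Pietsch's sense: it yields predictions without a compact, hierarchical law you could write in a textbook, which is exactly the trade-off the conclusion describes.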