### Got Balls?

Sunday, May 19th, 2013

IED Trends: Turning Tennis Balls Into Bombs

From the post:

Terrorists are relentlessly evolving tactics and techniques for IEDs (Improvised Explosive Devices), and analyzing reporting on IEDs can provide insight complementary to HUMINT on emerging militant methods. Preparing for an upcoming webcast with our friends at Terrogence, we found incidents using sports balls, particularly tennis balls and cricket balls, more frequently appearing as a delivery vehicle for explosives.

When we break these incidents from the last four months down by location, the city of Karachi in southern Pakistan stands out as a hotbed. There is also evidence that this tactic is being embraced around the globe as you can see sports balls fashioned into bombs found from Longview, Washington in the United States to Varanasi in India.

We can use Recorded Future’s Web Intelligence platform to plot out the locations where incidents have recently occurred as well as the frequency and timing.

Interesting but the military, by their stated doctrines, should be providing this information in theater specific IED briefings.

See for example: FMI 3-34.119/MCIP 3-17.01 IMPROVISED EXPLOSIVE DEVICE DEFEAT

On boobytraps (the old name) in general, see: FM 5-31 Boobytraps (1965), which includes pressure cookers (pp. 73-74) and rubber balls (p. 87).

Topic maps offer rapid dissemination of “new” forms and checklists for where they may be found. (As opposed to static publications.)

Interesting that FM 5-31 reports an electric iron as a boobytrap, but an electric iron is more likely to show up on Antiques Roadshow than as an IED.

At least in the United States.

### Office of the Director of National Intelligence: Data Mining 2012

Saturday, April 13th, 2013

Office of the Director of National Intelligence: Data Mining 2012

Office of the Director of National Intelligence = ODNI

To cut directly to the chase:

II. ODNI Data Mining Activities

The ODNI did not engage in any activities to use or develop data mining functionality during the reporting period.

My source, KDNuggets, provides the legal loophole analysis.

Who watches the watchers?

Looks like it’s going to be you and me.

Every citizen who recognizes a government employee, agent, official, tweet the name you know them by with your location.

Just that.

If enough of us do that, patterns will begin to appear in the data stream.

If enough patterns appear in the data stream, the identities of government employees, agents, officials, will slowly become known.

Transparency won’t happen overnight or easily.

But if you are waiting for the watchers to watch themselves, you are going to be severely disappointed.

### FLOPS Fall Flat for Intelligence Agency

Friday, March 29th, 2013

FLOPS Fall Flat for Intelligence Agency by Nicole Hemsoth.

From the post:

The Intelligence Advanced Research Projects Activity (IARPA) is putting out some RFI feelers in hopes of pushing new boundaries with an HPC program. However, at the core of their evaluation process is an overt dismissal of current popular benchmarks, including floating operations per second (FLOPS).

To uncover some missing pieces for their growing computational needs, IARPA is soliciting for “responses that illuminate the breadth of technologies” under the HPC umbrella, particularly the tech that “isn’t already well-represented in today’s HPC benchmarks.”

The RFI points to the general value of benchmarks (Linpack, for instance) as necessary metrics to push research and development, but argues that HPC benchmarks have “constrained the technology and architecture options for HPC system designers.” More specifically, in this case, floating point benchmarks are not quite as valuable to the agency as data-intensive system measurements, particularly as they relate to some of the graph and other so-called big data problems the agency is hoping to tackle using HPC systems.

Responses are due by Apr 05, 2013 4:00 pm Eastern.

Not that I expect most of you to respond to this RFI but I mention it as a step in the right direction for the processing of semantics.

Semantics are not native to vector fields and so every encoding of semantics in a vector field is a mapping.

Likewise, every extraction of semantics from a vector field is the reverse of that mapping process.

The impact of this mapping/unmapping of semantics to and from a vector field on interpretation is unclear.

As mapping and unmapping decisions are interpretative, it seems reasonable to conclude there is some impact. How much isn’t known.

Vector fields are easy for high FLOPS systems to process but do you want a fast inaccurate answer or one that bears some resemblance to reality as experienced by others?

Graph databases, to name one alternative, are the current rage, at least according to graph database vendors.

But saying “graph database” isn’t the same as usefully capturing semantics with a graph database.

Or processing semantics once captured.

What we need is an alternative to FLOPS that represents effective processing of semantics.

Suggestions?

### Worldwide Threat Assessment…

Thursday, March 14th, 2013

Worldwide Threat Assessment of the US Intelligence Community, Senate Select Committee on Intelligence, James R. Clapper, Director of National Intelligence, March 12, 2013.

Thought you might be interested in the cybersecurity parts: raw material for marketing literature if your interests lie towards security issues.

It has tidbits like this one:

Foreign intelligence and security services have penetrated numerous computer networks of US Government, business, academic, and private sector entities. Most detected activity has targeted unclassified networks connected to the Internet, but foreign cyber actors are also targeting classified networks. Importantly, much of the nation’s critical proprietary data are on sensitive but unclassified networks; the same is true for most of our closest allies. (emphasis added)

Just curious, if you discovered your retirement funds were in your mail box, would you move them to a more secure location?

Depending on the products or services you are selling, the report may have other marketing information.

I first saw this in a tweet by Jeffrey Carr.

### Hiding in Plain Sight/Being Secure From The NSA

Wednesday, March 13th, 2013

I presume that if a message can be “overheard,” electronically or otherwise, it is likely the NSA and other “fictional” groups are capturing it.

The use of encryption marks you as a possible source of interest.

You can use image-based steganography to conceal messages but that requires large file sizes and is subject to other attacks.

Professor Abdelrahman Desoky of the University of Maryland in Baltimore County, USA, suggests that messages can be hidden in plain sight, by changing the wording of jokes to carry a secret message.

Desoky suggests that instead of using a humdrum text document and modifying it in a codified way to embed a secret message, correspondents could use a joke to hide their true meaning. As such, he has developed an Automatic Joke Generation Based Steganography Methodology (Jokestega) that takes advantage of recent software that can automatically write pun-type jokes using large dictionary databases. Among the automatic joke generators available are: The MIT Project, Chuck Norris Joke Generator, Jokes2000, The Joke Generator dot Com and the Online Joke Generator System (pickuplinegen).

A simple example might be to hide the code word “shaking” in the following auto-joke. The original question and answer joke is “Where do milk shakes come from?” and the correct answer would be “From nervous cows.” So far, so funny. But, the system can substitute the word “shaking” for “nervous” and still retain the humor so that the answer becomes “From shaking cows.” It loses some of its wit, but still makes sense and we are not all Bob Hopes, after all. [Hiding Secret Messages in Email Jokes]

Or if you prefer the original article abstract:

This paper presents a novel steganography methodology, namely Automatic Joke Generation Based Steganography Methodology (Jokestega), that pursues textual jokes in order to hide messages. Basically, Jokestega methodology takes advantage of recent advances in Automatic Jokes Generation (AJG) techniques to automate the generation of textual steganographic cover. In a corpus of jokes, one may judge a number of documents to be the same joke although letters, locations, and other details are different. Generally, joke and puns could be retold with totally different vocabulary, while still retaining their identities. Therefore, Jokestega pursues the common variations among jokes to conceal data. Furthermore, when someone is joking, anything may be said which legitimises the use of joke-based steganography. This makes employing textual jokes very attractive as steganographic carrier for camouflaging data. It is worth noting that Jokestega follows Nostega paradigm, which implies that joke-cover is noiseless. The validation results demonstrate the effectiveness of Jokestega. [Jokestega: automatic joke generation-based steganography methodology by Abdelrahman Desoky. International Journal of Security and Networks (IJSN), Vol. 7, No. 3, 2012]
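
The substitution idea in the two quotes above can be sketched in a few lines of Python. This is not Desoky's Jokestega implementation: the second word pair ("cows"/"cattle"), the template, and the function names `embed` and `extract` are assumptions made for illustration. Each slot in the joke offers two interchangeable words, and the choice between them carries one hidden bit.

```python
# Hypothetical joke template with substitution slots; each slot has two
# interchangeable words, so each slot can carry one hidden bit.
TEMPLATE = "Where do milk shakes come from? From {0} {1}."
SLOTS = [
    ("nervous", "shaking"),   # bit 0 -> "nervous", bit 1 -> "shaking"
    ("cows", "cattle"),       # bit 0 -> "cows",    bit 1 -> "cattle"
]


def embed(bits):
    """Pick one word per slot according to the bits to hide."""
    if len(bits) != len(SLOTS):
        raise ValueError("need exactly one bit per slot")
    words = [options[bit] for options, bit in zip(SLOTS, bits)]
    return TEMPLATE.format(*words)


def extract(joke):
    """Recover the hidden bits by checking which word fills each slot."""
    prefix, _ = TEMPLATE.split("{0}")
    body = joke[len(prefix):].rstrip(".")
    words = body.split()
    return [options.index(word) for options, word in zip(SLOTS, words)]
```

Calling `embed([1, 0])` produces the "From shaking cows." variant from the quote, and `extract` on that string returns `[1, 0]`. Both parties need the same template and word pairs, which is the shared secret in this toy version.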

If you are interested, other publications by Professor Desoky are listed here.

Occurs to me that topic maps offer the means to create steganography chains over public channels. The sender may know its meaning but there can be several links in the chain of transmission that change the message but have no knowledge of its meaning. And/or that don’t represent traceable links in the chain.

With every “hop” and/or mapping of the terms to another vocabulary, the task of statistical analysis grows more difficult.

Not the equivalent of highly secure communication networks, the contents of which can be copied onto a Lady Gaga DVD, but then not everyone needs that level of security.

Some people need cheaper but more secure systems for communication.

Will devote some more thought to the outline of a topic map system for hiding content in plain sight.

### Fast Data Gets A Jump On Big Data

Tuesday, March 12th, 2013

Fast Data Gets A Jump On Big Data by Hasan Rizvi.

The title reminded me of a post by Sam Hunting that asked: “How come we’ve got Big Data and not Good Data?”

Now “big data” is to give way to “fast data.”

From the post:

Today, both IT and business users alike are facing business scenarios where they need better information to differentiate, innovate, and radically transform their business.

In many cases, that transformation is being enabled by a move to “Big Data.” Organizations are increasingly collecting vast quantities of real-time data from a variety of sources, from online social media data to highly-granular transactional data to data from embedded sensors. Once collected, users or businesses are mining the data for meaningful patterns that can be used to drive business decisions or actions.

Big Data uses specialized technologies (like Hadoop and NoSQL) to process vast amounts of information in bulk. But most of the focus on Big Data so far has been on situations where the data being managed is basically fixed—it’s already been collected and stored in a Big Data database.

This is where Fast Data comes in. Fast Data is a complimentary approach to Big Data for managing large quantities of “in-flight” data that helps organizations get a jump on those business-critical decisions. Fast Data is the continuous access and processing of events and data in real-time for the purposes of gaining instant awareness and instant action. Fast Data can leverage Big Data sources, but it also adds a real-time component of being able to take action on events and information before they even enter a Big Data system.

Sorry Sam, “good data” misses out again.

Data isn’t the deciding factor in human decision making, instant or otherwise; see Thinking, Fast and Slow by Daniel Kahneman.

Supplying decision makers with good data and sufficient time to consider it is the route to better decision making.

Of course, that leaves time to discover the poor quality of data provided by fast/big data delivery mechanisms.

### In-Q-Tel (IQT)

Sunday, February 24th, 2013

In-Q-Tel (IQT)

THE IQT MISSION

Launched in 1999 as an independent, not-for-profit organization, IQT was created to bridge the gap between the technology needs of the U.S. Intelligence Community (IC) and new advances in commercial technology. With limited insight into fast-moving private sector innovation, the IC needed a way to find emerging companies, and, more importantly, to work with them. As a private company with deep ties to the commercial world, we attract and build relationships with technology startups outside the reach of the Intelligence Community. In fact, more than 70 percent of the companies that IQT partners with have never before done business with the government.

As a strategic investor, our model is unique. We make investments in startup companies that have developed commercially-focused technologies that will provide strong, near-term advantages (within 36 months) to the IC mission. We design our strategic investments to accelerate product development and delivery for this ready-soon innovation, and specifically to help companies add capabilities needed by our customers in the Intelligence Community. Additionally, IQT effectively leverages its direct investments by attracting a significant amount of private sector funds, often from top-tier venture capital firms, to co-invest in our portfolio companies. On average, for every dollar that IQT invests in a company, the venture capital community has invested over nine dollars, helping to deliver crucial new capabilities at a lower cost to the government.

Topic maps could offer advantages to an intelligence community, either vis-à-vis other intelligence communities and/or vis-à-vis competitors in the same intelligence community.

A funding source to consider for topic maps in intelligence work.

I first saw this at Beyond Search.

### Building the Library of Twitter

Saturday, January 19th, 2013

Building the Library of Twitter by Ian Armas Foster.

From the post:

On an average day people around the globe contribute 500 million messages to Twitter. Collecting and storing every single tweet and its resulting metadata from a single day would be a daunting task in and of itself.

The Library of Congress is trying something slightly more ambitious than that: storing and indexing every tweet ever posted.

With the help of social media facilitator Gnip, the Library of Congress aims to create an archive where researchers can access any tweet recorded since Twitter’s inception in 2006.

According to this update on the progress of the seemingly herculean project, the LOC has already archived 170 billion tweets and their respective metadata. That total includes the posts from 2006-2010, which Gnip compressed and sent to the LOC over three different files of 2.3 terabytes each. When the LOC uncompressed the files, they filled 20 terabytes’ worth of server space representing 21 billion tweets and its supplementary 50 metadata fields.

It is often said that 90% of the world’s data has accrued over the last two years. That is remarkably close to the truth for Twitter, as an additional 150 billion tweets (88% of the total) poured into the LOC archive in 2011 and 2012. Further, Gnip delivers hourly updates to the tune of half a billion tweets a day. That means 42 days’ worth of 2012-2013 tweets equal the total amount from 2006-2010. In all, they are dealing with 133.2 terabytes of information.
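
The arithmetic in the quoted passage is easy to check. A short Python sketch using only the figures quoted above (the variable names are mine):

```python
# Figures taken from the quoted post.
total_tweets = 170e9          # tweets archived so far
tweets_2006_2010 = 21e9       # tweets in the 2006-2010 batch
tweets_2011_2012 = 150e9      # tweets added in 2011 and 2012
tweets_per_day = 0.5e9        # current daily volume

share_recent = tweets_2011_2012 / total_tweets
days_to_match = tweets_2006_2010 / tweets_per_day

print(f"share from 2011-2012: {share_recent:.0%}")
print(f"days to equal 2006-2010: {days_to_match:.0f}")
```

Both quoted figures hold up: 88% and 42 days. The two batches add up to 171 billion rather than 170 billion, so the totals in the post are presumably rounded.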

Now there’s a big data problem for you! Not to mention a resource problem for the Library of Congress.

You might want to make a contribution to help fund their work on this project.

Obviously of incredible value for researchers at all levels, smaller sub-sets of the Twitter stream may be valuable as well.

If I were designing a Twitter based lexicon for covert communication for example, I would want to use frequent terms from particular geographic locations.

And/or create patterns of tweets from particular accounts so that they don’t stand out from others.

Not to mention trying to crunch the Twitter stream for content I know must be present.

### Federal Big Data Forum

Saturday, January 19th, 2013

From the post:

Friends at Cloudera are lead sponsors and coordinators of a new Big Data Forum focused on Apache Hadoop. The first, which will be held 30 January 2013 in Columbia Maryland, will be focused on lessons learned of use to the national security community. This is primarily for practitioners and leaders fielding real working Big Data solutions on Apache Hadoop and related technologies. I’ve seen a draft agenda, it includes a lineup of the nation’s greatest Big Data technologists, including the chairman of the Apache Software foundation and creator of Hadoop, Lucene and Nutch Doug Cutting.

This event is intentionally being focused on real practitioners and others who can benefit from lessons learned by those who have created/fielded real enterprise solutions. This will fill up fast. Please mark you calendar now and register right away. To register see: http://info.cloudera.com/federal-big-data-hadoop-forum.html

Bob’s post also has the invite.

I won’t be able to attend but would love to hear from anyone who does. Thanks!

### Geospatial Intelligence Forum

Monday, December 24th, 2012

Geospatial Intelligence Forum: The Magazine of the National Intelligence Community

Apologies but I could not afford a magazine subscription for every reader of this blog.

The next best thing is a free magazine that may be useful in your data integration/topic map practice.

Defense intelligence has been a hot topic for the last decade and there are no signs that is going to change any time soon.

I was browsing through Geospatial Intelligence Forum (GIF) when I encountered:

Closing the Interoperability Gap by Cheryl Gerber.

From the article:

The current technology gaps can be frustrating for soldiers to grapple with, particularly in the middle of battlefield engagements. “This is due, in part, to stovepiped databases forcing soldiers who are working in tactical operations centers to perform many work-arounds or data translations to present the best common operating picture to the commander,” said Dr. Joseph Fontanella, AGC director and Army geospatial information officer.

Now there is a use case for interoperability, being “…in the middle of battlefield engagements.”

Cheryl goes on to identify five (5) gaps in interoperability.

GIF looks like a good place to pick up riffs, memes, terminology and even possible contacts.

Enjoy!

### INSA Highlights Increasing Importance of Open Source

Tuesday, December 4th, 2012

INSA Highlights Increasing Importance of Open Source

From Recorded Future*:

The Intelligence and National Security Alliance (INSA) Rebalance Task Force recently released its new white paper “Expectations of Intelligence in the Information Age”.

We’re obviously big fans of open source analysis, so some of the lead observations reported by the task force really hit home. Here they are, as written by INSA:

• The heightened expectations of decision makers for timely strategic warning and current intelligence can be addressed in significant ways by the IC through “open sourcing” of information.
• “Open sourcing” will not replace traditional intelligence; decision makers will continue to expect the IC to extract those secrets others are determined to keep from the United States.
• However, because decision makers will access open sources as readily as the IC, they will expect the IC to rapidly validate open source information and quickly meld it with that derived from espionage and traditional sources of collection to provide them with the knowledge desired to confidently address national security issues and events.

You can check out an interactive version of the full report here, and take a moment to visit Recorded Future to see how we’re embracing this synthesis of open source and confidential intelligence.

I have confidence that the IC will find ways to make their collection, recording, analysis and synthesis of information with traditional intelligence sources incompatible with each other.

After all, we are less than five (5) years away from some unknown level of sharing of traditional intelligence data: Read’em and Weep.

Let’s say there is some sort of intelligence sharing by 2017 (2012 + 5). That’s sixteen (16) years after 9/11.

Being mindful that sharing doesn’t mean integrated into the information flow of the respective agencies.

How does that saying go?

Once is happenstance.

Twice is coincidence.

Three times is enemy action?

Where does the continuing failure to share intelligence fall on that list?

(Topic maps can’t provide the incentives to make sharing happen, but they do make sharing possible for people with incentives to share.)

* I listed the entry as originating from Recorded Future. Why some blog authors find it difficult to identify themselves I cannot say.

### How Google’s Dremel Makes Quick Work of Massive Data

Saturday, October 20th, 2012

How Google’s Dremel Makes Quick Work of Massive Data by Ian Armas Foster.

From the post:

The ability to process more data and the ability to process data faster are usually mutually exclusive. According to Armando Fox, professor of computer science at University of California at Berkeley, “the more you do one, the more you have to give up on the other.”

Hadoop, an open-source, batch processing platform that runs on MapReduce, is one of the main vehicles organizations are driving in the big data race.

However, Mike Olson, CEO of Cloudera, an important Hadoop-based vendor, is looking past Hadoop and toward today’s research projects. That includes one named Dremel, possibly Google’s next big innovation that combines the scale of Hadoop with the ever-increasing speed demands of the business intelligence world.

“People have done Big Data systems before,” Fox said “but before Dremel, no one had really done a system that was that big and that fast.”

On Dremel, see: Dremel: Interactive Analysis of Web-Scale Datasets, as well.

Are you looking (or considering looking) beyond Hadoop?

Accuracy and timeliness beyond the average daily intelligence briefing will drive demand for your information product.

Your edge is agility. Use it.

### Sneak Peek into Skybox Imaging’s Cloudera-powered Satellite System [InaaS?]

Saturday, October 20th, 2012

Sneak Peek into Skybox Imaging’s Cloudera-powered Satellite System by Justin Kestelyn (@kestelyn)

This is a guest post by Oliver Guinan, VP Ground Software, at Skybox Imaging. Oliver is a 15-year veteran of the internet industry and is responsible for all ground system design, architecture and implementation at Skybox.

One of the great promises of the big data movement is using networks of ubiquitous sensors to deliver insights about the world around us. Skybox Imaging is attempting to do just that for millions of locations across our planet.

Skybox is developing a low cost imaging satellite system and web-accessible big data processing platform that will capture video or images of any location on Earth within a couple of days. The low cost nature of the satellite opens the possibility of deploying tens of satellites which, when integrated together, have the potential to image any spot on Earth within an hour.

Skybox satellites are designed to capture light in the harsh environment of outer space. Each satellite captures multiple images of a given spot on Earth. Once the images are transferred from the satellite to the ground, the data needs to be processed and combined to form a single image, similar to those seen within online mapping portals.

With any sensor network, capturing raw data is only the beginning of the story. We at Skybox are building a system to ingest and process the raw data, allowing data scientists and end users to ask arbitrary questions of the data, then publish the answers in an accessible way and at a scale that grows with the number of satellites in orbit. We selected Cloudera to support this deployment.

Now is the time to start planning topic map based products that can incorporate this type of data.

There are lots of folks who are “curious” about what is happening next door, in the next block, a few “klicks” away, across the border, etc.

Not all of them have the funds for private “keyhole” satellites and vacuum data feeds. But they may have money to pay you for efficient and effective collation of intelligence data.

Topic maps empowering “Intelligence as a Service (InaaS)”?

### News Reporting, Not Just DHS Fusion Centers, Ineffectual

Wednesday, October 3rd, 2012

A report by the United States Senate, PERMANENT SUBCOMMITTEE ON INVESTIGATIONS, Committee on Homeland Security and Governmental Affairs, FEDERAL SUPPORT FOR AND INVOLVEMENT IN STATE AND LOCAL FUSION CENTERS (link to page with actual report), was described this way in the New York Times coverage:

One of the nation’s biggest domestic counterterrorism programs has failed to provide virtually any useful intelligence, according to Congressional investigators.

Their scathing report, to be released Wednesday, looked at problems in regional intelligence-gathering offices known as “fusion centers” that are financed by the Department of Homeland Security and created jointly with state and local law enforcement agencies.

The report found that the centers “forwarded intelligence of uneven quality — oftentimes shoddy, rarely timely, sometimes endangering citizens’ civil liberties and Privacy Act protections, occasionally taken from already published public sources, and more often than not unrelated to terrorism.”

The investigators reviewed 610 reports produced by the centers over 13 months in 2009 and 2010. Of these, the report said, 188 were never published for use within the Homeland Security Department or other intelligence agencies. Hundreds of draft reports sat for months, awaiting review by homeland security officials, making much of their information obsolete. And some of the reports appeared to be based on previously published information or facts that had long since been reported through the Federal Bureau of Investigation.

What is remarkable about a link to a page with the actual report?

After reading the New York Times article, I looked for a link in the article to the report. Nada. Zip. The null string. No link.

Searching over news reports from other major news outlets, same result.

Searching the US Senate, PERMANENT SUBCOMMITTEE ON INVESTIGATIONS website, at least as of 5:00 AM Eastern Standard time on October 3, 2012, fails to produce the report.

We aren’t lacking the “semantic web.”

There is a lack of linking to information sources. Links empower the reader to make their own judgements.

I expect “shoddy reporting” from the Department of Homeland Security. I don’t expect it from the New York Times. Or other major news outlets.

The report will be a “brief flash in the pan.” The news cycle will move on to the latest political gaffe or fraud, just as DHS folk move on to other ineffectual activities.

Would be nice to link up names, events, etc., from the report, to past and future mentions of the same people and events.

Imagine Senator Levin asking: “This is your fifth appearance on questionable spending of government funds, in four separate agencies, under two different administrations?”
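
A minimal Python sketch of that kind of linking, assuming each mention has already been resolved to a shared subject identifier (the hard part, which this does not address). The identifiers, sources, and years below are placeholders, not data from the report.

```python
from collections import defaultdict

# Placeholder mentions: (subject identifier, source, year).
mentions = [
    ("person:official-a", "fusion center report", 2012),
    ("person:official-a", "earlier hearing", 2009),
    ("person:official-b", "fusion center report", 2012),
    ("person:official-a", "agency audit", 2006),
]

# Collate every mention under its subject identifier.
by_subject = defaultdict(list)
for subject, source, year in mentions:
    by_subject[subject].append((year, source))

# Subjects that turn up in more than one source, oldest mention first.
repeat_appearances = {
    subject: sorted(items)
    for subject, items in by_subject.items()
    if len(items) > 1
}
```

Here `repeat_appearances` holds only the placeholder "official-a", with three mentions in date order. That is the raw material for the Senator's question.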

Accountability and transparency, a topic maps double shot.

### The Man Behind the Curtain

Tuesday, September 25th, 2012

The Man Behind the Curtain

From the post:

Without any lead-in whatsoever, we just ask that you watch the video above.

And we ask that you hang on for a few moments—this goes far beyond the hocus pocus you’re thinking the clip contains.

You really need to see this video.

Should watchers watch themselves?

Should people watch the watchers?

### Building a “Data Eye in the Sky”

Saturday, September 22nd, 2012

Building a “Data Eye in the Sky” by Erwin Gianchandani.

From the post:

Nearly a year ago, tech writer John Markoff published a story in The New York Times about Open Source Indicators (OSI), a new program by the Federal government’s Intelligence Advanced Research Projects Activity (IARPA) seeking to automatically collect publicly available data, including Web search queries, blog entries, Internet traffic flows, financial market indicators, traffic webcams, changes in Wikipedia entries, etc., to understand patterns of human communication, consumption, and movement. According to Markoff:

It is intended to be an entirely automated system, a “data eye in the sky” without human intervention, according to the program proposal. The research would not be limited to political and economic events, but would also explore the ability to predict pandemics and other types of widespread contagion, something that has been pursued independently by civilian researchers and by companies like Google.

This past April, IARPA issued contracts to three research teams, providing funding potentially for up to three years, with continuation beyond the first year contingent upon satisfactory progress. At least two of these contracts are now public (following the link):

Erwin reviews what is known about programs at Virginia Tech and BBN Technologies.

And concludes with:

Each OSI research team is being required to make a number of warnings/alerts that will be judged on the basis of lead time, or how early the alert was made; the accuracy of the warning, such as the where/when/what of the alert; and the probability associated with the alert, that is, high vs. very high.

To learn more about the OSI program, check out the IARPA website or a press release issued by Virginia Tech.

Given the complexities of semantics, what has my curiosity up is how “warnings/alerts” are going to be judged?

Recalling that “all the lights were blinking red” before 9/11.

If all the traffic lights in the U.S. flashed three (3) times at the same time, without more, it could mean anything from the end of the Mayan calendar to free beer. One just never knows.

Do you have the stats on the oracle at Delphi?

Might be a good baseline for comparison.
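
On the question of judging alerts: one standard way to score probabilistic forecasts is the Brier score, the mean squared difference between the stated probability and what actually happened (lower is better). Nothing in the post says IARPA uses it. The Python sketch below is an assumption for illustration, and the alert probabilities and outcomes are made up. A forecaster who always says 50% scores 0.25, which gives one possible baseline.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Made-up example: four alerts with stated probabilities, and whether the
# warned-of event occurred (1) or not (0).
alert_probabilities = [0.9, 0.8, 0.7, 0.9]
event_occurred = [1, 0, 1, 1]

team_score = brier_score(alert_probabilities, event_occurred)
baseline_score = brier_score([0.5] * 4, event_occurred)  # always says 50/50

print(f"team: {team_score:.3f}  baseline: {baseline_score:.3f}")
```

This covers only the probability criterion. Lead time and the accuracy of the where/when/what would need their own measures.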

### New Army Guide to Open-Source Intelligence

Sunday, September 16th, 2012

New Army Guide to Open-Source Intelligence

If you don’t know Full Text Reports, you should.

A top-tier research professional’s hand-picked selection of documents from academe, corporations, government agencies, interest groups, NGOs, professional societies, research institutes, think tanks, trade associations, and more.

You will winnow some chaff but also find jewels like Open Source Intelligence (PDF).

From the post:

• Provides fundamental principles and terminology for Army units that conduct OSINT exploitation.
• Discusses tactics, techniques, and procedures (TTP) for Army units that conduct OSINT exploitation.
• Provides a catalyst for renewing and emphasizing Army awareness of the value of publicly available information and open sources.
• Establishes a common understanding of OSINT.
• Develops systematic approaches to plan, prepare, collect, and produce intelligence from publicly available information from open sources.

Impressive intelligence overview materials.

Would be nice to re-work into a topic map intelligence approach document with the ability to insert a client’s name and industry specific examples. Has that militaristic tone that is hard to capture with civilian writers.

### Experts vs. Crowds (How to Distinguish, CIA and Drones)

Monday, August 27th, 2012

Reporting on the intelligence community’s view of crowd-sourcing, Ken Dilanian reports:

“I don’t believe in the wisdom of crowds,” said Mark Lowenthal, a former senior CIA and State Department analyst (and 1988 “Jeopardy!” champion) who now teaches classified courses about intelligence. “Crowds produce riots. Experts produce wisdom.”

I would modify Lowenthal’s assessment to read:

Crowds produce diverse judgements. Experts produce highly similar judgements.

Or to put it another way, the smaller the group, over time, the less variation you will find in opinion. And the further group opinion diverges from reality as experienced by non-group members.

No real surprise Beltway denizens failed to predict the Arab Spring. None of the concerns that led to the Arab Spring are part of the “experts” concerns. Not just on a conscious level but as a social experience.

The more diverse the opinion/experience pool, the less likely a crowd judgement is to be completely alien to reality as experienced by others.

Which is how I would explain the performance of the crowd thus far in the experiment.

Dilanian’s speculation:

Crowd-sourcing would mean, in theory, polling large groups across the 200,000-person intelligence community, or outside experts with security clearances, to aggregate their views about the strength of the Taliban, say, or the likelihood that Iran is secretly building a nuclear weapon.

reflects a failure to appreciate the nature of crowd-sourced judgements.

First, crowd-sourcing will be more effective if the “intelligence community” is only a small part of the crowd. To choose people only with security clearances I suspect automatically excludes many Taliban sympathizers. Not going to get good results if the crowd is poorly chosen.

Think of it as trying to re-create the “dance” that bees do as a means of communicating the location of pollen. I would trust the CIA to build a bee hive with only drones. And then complain that crowd behavior didn’t work.

Second, crowd-sourcing can do factual questions, like guessing the weight of an animal, but only if everyone has the same information. Otherwise, use crowd-sourcing to gauge the likely impact of policies, changes in policies, etc. Pulse of the “public” as it were.
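As a rough illustration of the animal-weight case, here is a minimal sketch of aggregating independent guesses by taking their median. The guesses and the "true" weight are invented for illustration; they are not data from the experiment Dilanian describes.

```python
from statistics import median

# Hypothetical guesses (in pounds) from ten people looking at the same animal.
guesses = [1050, 1320, 1190, 980, 1275, 1210, 1150, 1400, 1100, 1230]
true_weight = 1198  # assumed value, for illustration only

# The crowd's judgement is the median of the individual guesses.
crowd_estimate = median(guesses)
crowd_error = abs(crowd_estimate - true_weight)

# Compare against the single best guesser in the crowd.
best_individual_error = min(abs(g - true_weight) for g in guesses)

print(f"crowd estimate: {crowd_estimate}, off by {crowd_error}")
print(f"best single guess off by {best_individual_error}")
```

With these made-up numbers the median lands closer than any single guesser. Hand some of the guessers different information and that need not hold.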

The “likelihood that Iran is secretly building a nuclear weapon” isn’t a crowd-source question. No lack of information can counter the effort being “secret.” There is no information because, yes, Iran is keeping it secret.

Properly used, crowd-sourcing can be a very valuable tool.

The ad agencies call it public opinion polling.

Imagine appropriate polling activities on the ground in the Middle East. Asking ordinary people about their hopes, desires, and dreams. If credited over summarized and sanitized results of experts, could lead to policies that benefit the people, not to say the governments, of the Middle East. (Another reason some prefer experts. Experts support current governments.)

### World Leaders Comment on Attack in Bulgaria

Thursday, July 19th, 2012

World Leaders Comment on Attack in Bulgaria

From the post:

Following the terror attack in Bulgaria killing a number of Israeli tourists on an airport bus, we can see the statements from world leaders around the globe including Israel Prime Minister Benjamin Netanyahu openly pinning the blame on Iran and threatening retaliation

If you haven’t seen one of the visualizations by Recorded Future you will be impressed by this one. Mousing over people and locations invokes what we would call scoping in a topic map context and limits the number of connections you see. And each node can lead to additional information.

While this works like a topic map, I can’t say it is a topic map application because how it works isn’t disclosed. You can read How Recorded Future Works, but you won’t be any better informed than before you read it.

Impressive work, but it isn’t clear how I would integrate their matching of sources with, say, an internal mapping of sources. Or how I would augment their mapping with additional mappings by internal subject experts.

Or how I would map this incident to prior incidents which led to disproportionate responses?

Or map “terrorist” attacks by the world leaders now decrying other “terrorist” attacks?

That last mapping could be an interesting one for the application of the term “terrorist.” My anecdotal experience is that it depends on the sponsor.

Would be interesting to know if systematic analysis supports that observation.

Perhaps the news media could then evenly identify the probable sponsors of “terrorist” attacks.

### Detecting Emergent Conflicts with Recorded Future + Ushahidi

Friday, June 29th, 2012

Detecting Emergent Conflicts with Recorded Future + Ushahidi by Ninja Shoes. (?)

From the post:

An ocean of data is available on the web. From this ocean of data, information can in theory be extracted and used by analysts for detecting emergent trends (trend spotting). However, to do this manually is a daunting and nearly impossible task. In this study we describe a semi-automatic system in which data is automatically collected from selected sources, and to which linguistic analysis is applied to extract e.g., entities and events. After combining the extracted information with human intelligence reports, the results are visualized to the user of the system who can interact with it in order to obtain a better awareness of historic as well as emergent trends. A prototype of the proposed system has been implemented and some initial results are presented in the paper.

A fairly remarkable bit of work that illustrates the current capabilities for mining the web and also its limitations.

The processing of news feeds for protest reports is interesting, but mistakes the result of years of activity for an “emergent” conflict.

If you were going to capture the data that would enable a human analyst to “predict” the Arab Spring, you would have to begin in union organizing activities. Not the sort of thing that is going to make news reports on the WWW.

For that you would need traditional human intelligence. From people who don’t spend their days debating traffic or reports with other non-native staffers. Or meeting with managers from Washington or Stockholm.

Or let me put it this way:

Mining the web doesn’t equal useful results. Just as mining for gold doesn’t mean you will find any.

### Korematsu Maps

Saturday, June 23rd, 2012

Just in case you need Korematsu maps of the United States, that is, maps of where immigrant populations are located, take a look at: Maps of the Foreign Born in the US.

For those of you unfamiliar with the Korematsu case, it involved the mass detention of people of Japanese ancestry during WW II. With no showing that any of the detainees were dangerous, disloyal, etc.

The thinking at the time, including when the Supreme Court decided the case in 1944, was that the fears of the many outweighed the rights of the few.

You may be called upon to create maps to assist in mass detentions by racial, ethnic or religious background. Just wanted you to know what to call them.

### The Central Intelligence Agency’s 9/11 File

Saturday, June 23rd, 2012

The Central Intelligence Agency’s 9/11 File

From the post:

The National Security Archive today is posting over 100 recently released CIA documents relating to September 11, Osama bin Laden, and U.S. counterterrorism operations. The newly-declassified records, which the Archive obtained under the Freedom of Information Act, are referred to in footnotes to the 9/11 Commission Report and present an unprecedented public resource for information about September 11.

The collection includes rarely released CIA emails, raw intelligence cables, analytical summaries, high-level briefing materials, and comprehensive counterterrorism reports that are usually withheld from the public because of their sensitivity. Today’s posting covers a variety of topics of major public interest, including background to al-Qaeda’s planning for the attacks; the origins of the Predator program now in heavy use over Afghanistan, Pakistan and Iran; al-Qaeda’s relationship with Pakistan; CIA attempts to warn about the impending threat; and the impact of budget constraints on the U.S. government’s hunt for bin Laden.

Today’s posting is the result of a series of FOIA requests by National Security Archive staff based on a painstaking review of references in the 9/11 Commission Report.

Possibly interesting material for topic map practice.

What has been redacted from the CIA documents? Based upon your mapping of other documents available on 9/11?

For extra points, include a summary of “why” you think the material was redacted?

Won’t be able to verify or check your answers but will be good practice at putting information in context to discover what may be missing.

### Semantic Technology For Intelligence, Defense, and Security STIDS 2012

Saturday, June 16th, 2012

SEMANTIC TECHNOLOGY FOR INTELLIGENCE, DEFENSE, AND SECURITY STIDS 2012

Paper submissions due: July 24, 2012
Notification of acceptance: August 28, 2012
Camera-ready papers due: September 18, 2012
Presentations due: October 17, 2012

Tutorials October 23
Main Conference October 24-26
Early Bird Registration rates until September 25

From the call for papers:

The conference is an opportunity for collaboration and cross-fertilization between researchers and practitioners of semantic-based technologies with particular experience in the problems facing the Intelligence, Defense, and Security communities. It will feature invited talks from prominent ontologists and recognized leaders from the target application domains.

To facilitate interchange among communities with a clear commonality of interest but little history of interaction, STIDS will host two separate tracks. The Research Track will showcase original, significant research on semantic technologies applicable to problems in intelligence, defense or security. Submissions to the research track are expected to clearly present their contribution, demonstrate its significance, and show the applicability to problems in the target applications domain. The Applications Track provides a forum for presenting implemented semantic-based applications to intelligence, defense, or security, as well as to discuss and evaluate the use of semantic techniques in these areas. Of particular interest are comparisons between different technologies or approaches and lessons learned from applications. By capitalizing on this opportunity, STIDS could spark dramatic progress toward transitioning semantic technologies from research to the field.

A hidden area where it will be difficult to cut IT budgets. Mostly because it is “hidden.”

Not the only reason you should participate but perhaps an extra incentive to do well!

### WikiLeaks as Wakeup Call?

Thursday, May 31st, 2012

Must be a slow news week. Federal Computer Week is recycling Wikileaks as a “wake up” call.

In case you have forgotten (or is that why the story is coming back up?), Robert Gates (Sec. of Defense) found that Wikileaks did not disclose sensitive intelligence sources or methods.

Hardly “…a security breach of epic proportions…” as claimed by the State Department.

If you want to claim Wikileaks was a “wakeup call,” make it a wake up call about “data dumpster” techniques for sharing intelligence data.

“Here are all our reports. Good luck finding something, anything.”

Security breach written all over it. Useless other than as material for a security breach. Easy to copy in bulk, etc.

Best methods for sharing intelligence vary depending on the data, security requirements and a host of other factors. Take Wikileaks as motivation (if lacking before) to strive for useful intelligence sharing.

Not sharing for the sake of saying you are sharing.

### From the Bin Laden Letters: Reactions in the Islamist Blogosphere

Saturday, May 19th, 2012

From the Bin Laden Letters: Reactions in the Islamist Blogosphere

From the post:

Following our initial analysis of the Osama bin Laden letters released by the Combating Terrorism Center (CTC) at West Point, we’ll more closely examine interesting moments from the letters and size them up against what was publicly reported as happening in the world in order to gain a deeper perspective on what was known or unknown at the time.

There was a frenzy of summarization and highlight reel reporting in the wake of the Abbottabad documents being publicly released. Some focused on the idea that Osama bin Laden was ostracized, some pointed to the seeming obsession with image in the media, and others simply took a chance to jab at Joe Biden for the suggestions made about his lack of preparedness for the presidency.

What we’ll do in this post is take a different approach, and rather than focus on analyst viewpoints we’ll compare reactions to the Abbottabad documents from a unique source – Islamist discussion forums.

There we find rebukes over the veracity of the documents released, support for the efforts of operatives such as Faisal Shahzad, and a little interest in the Arab Spring.

Interesting visualizations as always.

The question I would ask as a consumer of such information services is: How do I integrate this analysis with in-house analysis tools?

Or perhaps better: How do I evaluate non-direct references to particular persons or places? That is, a person or place is implied but not named. What do I know about the basis for such an identification?

### From the Bin Laden Letters: Mapping OBL’s Reach into Yemen

Wednesday, May 16th, 2012

From the Bin Laden Letters: Mapping OBL’s Reach into Yemen

I puzzled over this headline. A close friend refers to President Obama as “OB1” so I had a moment of confusion when reading the headline. Didn’t make sense for Bin Laden’s letters to map President Obama’s reach into Yemen.

With some diplomatic cables and White House internal documents, that would be an interesting visualization as well.

The mining of a larger corpus of 70,000+ public sources for individuals mentioned in the Bin Laden letters is responsible for the visualizations.

What we don’t know is what means of analysis produced the visualizations in question.

Some process was used to reduce redundant references to the same actors, events and relationships. Just by way of example.

That isn’t a complaint, simply an observation. It isn’t possible to evaluate the techniques used to obtain the results.

It would be interesting to see Recorded Future in one of the TREC competitions. At least then the results would be against a shared data set.

Do be aware that when the text says “open source,” what is meant is “open source intelligence.”

The better practice would be to say “open source intelligence (OSINT)” and not “open source,” the latter having a well recognized meaning in the software community.

Friday, May 11th, 2012

My summary: We are less than five years away from some unknown level of functioning for an Information Sharing Environment (ISE) that facilitates the sharing of terrorism-related information.

Less than 20 years after 9/11, we will have some capacity to share information that may enable the potential disruption of terrorist plots.

The patience of terrorists and their organizations is appreciated. (I added that part. The report doesn’t say that.)

The official summary.

A breakdown in information sharing was a major factor contributing to the failure to prevent the September 11, 2001, terrorist attacks. Since then, federal, state, and local governments have taken steps to improve sharing. This statement focuses on government efforts to (1) establish the Information Sharing Environment (ISE), a government-wide approach that facilitates the sharing of terrorism-related information; (2) support fusion centers, where states collaborate with federal agencies to improve sharing; (3) provide other support to state and local agencies to enhance sharing; and (4) strengthen use of the terrorist watchlist. GAO’s comments are based on products issued from September 2010 through July 2011 and selected updates in September 2011. For the updates, GAO reviewed reports on the status of Department of Homeland Security (DHS) efforts to support fusion centers, and interviewed DHS officials regarding these efforts. This statement also includes preliminary observations based on GAO’s ongoing watchlist work. For this work, GAO is analyzing the guidance used by agencies to nominate individuals to the watchlist and agency procedures for screening individuals against the list, and is interviewing relevant officials from law enforcement and intelligence agencies, among other things.

Let me share with you the other GAO reports cited in this report:

Do you see semantic mapping opportunities in all those reports?

### CIA/NSA Diff Utility?

Thursday, May 10th, 2012

How much of the data sold to the CIA/NSA is from public resources?

Of the sort you find at Knoema?

Albeit some of it isn’t easy to find but it is public data.

A topic map of public data resources would be a good CIA/NSA Diff Utility so they could avoid paying for data that is freely available on the WWW.

I suppose the fall back position of suppliers would be their “value add.”

With public data sets, the CIA/NSA could put that “value add” to the test. Along the lines of the Netflix competition.

Even if the results weren’t the goal, it would be a good way to discover new techniques and/or analysts.

How would you “diff” public data from that being supplied by a contractor?
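One way to start on that question, as a minimal sketch only: treat both data sets as records, reduce each record to a normalized key, and split the contractor's records into those already public and those that might be genuine “value add.” The field names (`indicator`, `country`, `year`), the sample records, and the `normalize_key` helper are all hypothetical. The lowercase-and-strip normalization is a crude stand-in for real subject identity matching, which is the hard part.

```python
Record = dict[str, str]

KEY_FIELDS = ("indicator", "country", "year")  # hypothetical identifying fields

def normalize_key(record: Record) -> tuple[str, ...]:
    """Reduce a record to a comparison key (crude: strip and lowercase)."""
    return tuple(record[field].strip().lower() for field in KEY_FIELDS)

def diff_against_public(contractor: list[Record], public: list[Record]):
    """Split contractor records into (already_public, value_add)."""
    public_keys = {normalize_key(r) for r in public}
    already_public, value_add = [], []
    for record in contractor:
        if normalize_key(record) in public_keys:
            already_public.append(record)
        else:
            value_add.append(record)
    return already_public, value_add

# Invented sample data, for illustration only.
public = [
    {"indicator": "GDP", "country": "Pakistan", "year": "2011"},
    {"indicator": "Population", "country": "Yemen", "year": "2011"},
]
contractor = [
    {"indicator": " gdp ", "country": "PAKISTAN", "year": "2011"},
    {"indicator": "Port traffic", "country": "Yemen", "year": "2011"},
]

already_public, value_add = diff_against_public(contractor, public)
print(len(already_public), "record(s) freely available;", len(value_add), "possible value add")
```

Anything that lands in `already_public` is a candidate for not paying twice. Anything in `value_add` still needs a human to decide whether it is new data or the same subject under a different name.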

### Making Intelligence Systems Smarter (or Dumber)

Wednesday, May 9th, 2012

Picking the Brains of Strangers….[$507 Billion Dollar Prize (at least)] had three keys to its success:

• Use of human analysts
• Common access to data and prior efforts
• Reuse of prior efforts by human analysts

Intelligence analysts spend their days with snippets and bits of data, trying to wring sense out of it, only to pigeonhole their results in silos.

Other analysts have to know about data to even request it. Or analysts with information must understand their information will help others with their own sensemaking.

All contrary to the results in Picking the Brains of Strangers….

What information will result in sensemaking for one or more analysts is unknown. And cannot be known.

Every firewall, every silo, every compartment, every clearance level, makes every intelligence agency and the overall intelligence community dumber.

Until now, the intelligence community has chosen to be dumber and more secure.

In a time of budget cuts and calls for efficiency in government, it is time for more effective intelligence work, even if less secure.

Take the leak of the diplomatic cables. The only people unaware of the general nature of the cables were the public and perhaps the intelligence agency of Zambia. All other intelligence agencies probably had them or their own version, pigeonholed in their own systems.

With robust intelligence sharing, the NSA could do all the signal capture and expense it out to other agencies. Rather than having duplicate systems by various agencies.

And perhaps a public data flow of analysis for foreign news sources in their original languages. They may not have clearance but they may have insights into cultures and languages that are rare in intelligence agencies.

But that presumes an interest in smarter intelligence systems, not dumber ones by design.

### Picking the Brains of Strangers….[$507 Billion Dollar Prize (at least)]

Wednesday, May 9th, 2012

Picking the Brains of Strangers Helps Make Sense of Online Information

Science Daily carried this summary (the official abstract and link are below):

People who have already sifted through online information to make sense of a subject can help strangers facing similar tasks without ever directly communicating with them, researchers at Carnegie Mellon University and Microsoft Research have demonstrated.

This process of distributed sensemaking, they say, could save time and result in a better understanding of the information needed for whatever goal users might have, whether it is planning a vacation, gathering information about a serious disease or trying to decide what product to buy.

The researchers explored the use of digital knowledge maps — a means of representing the thought processes used to make sense of information gathered from the Web. When participants in the study used a knowledge map that had been created and improved upon by several previous users, they reported that the quality of their own work was better than when they started from scratch or used a newly created knowledge map.

“Collectively, people spend more than 70 billion hours a year trying to make sense of information they have gathered online,” said Aniket Kittur, assistant professor in Carnegie Mellon’s Human-Computer Interaction Institute. “Yet in most cases, when someone finishes a project, that work is essentially lost, benefitting no one else and perhaps even being forgotten by that person. If we could somehow share those efforts, however, all of us might learn faster.”

Three take away points:

• “people spend more than 70 billion hours a year trying to make sense of information they have gathered online”
• “when someone finishes a project, that work is essentially lost, benefitting no one else and perhaps even being forgotten by that person”
• using knowledge maps created and improved upon by others improved the quality of their own work

At the current minimum wage in the US of $7.25, that’s roughly $507,500,000,000. Some of us make more than minimum wage so that figure should be adjusted upwards.
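The arithmetic behind that figure, as a quick check. The only inputs are the two numbers already given: the 70 billion hours from the quote and the $7.25 wage. The function name is mine.

```python
HOURS_PER_YEAR = 70_000_000_000  # "more than 70 billion hours a year", from the quote
US_MINIMUM_WAGE = 7.25           # dollars per hour, as stated in the post

def sensemaking_cost(hours: int, hourly_wage: float) -> float:
    """Dollar value of the given hours at the given hourly wage."""
    return hours * hourly_wage

cost = sensemaking_cost(HOURS_PER_YEAR, US_MINIMUM_WAGE)
print(f"${cost:,.0f}")
```

Swap in a higher hourly rate to see how quickly the figure climbs above the minimum wage estimate.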

The key to success was improvement upon efforts already improved upon by others.

Based on a small sample set (21 people), so there is an entire research field waiting to be explored. Whether this holds true with different types of data, what group dynamics make it work best, individual characteristics that influence outcomes, interfaces (that help or hinder), processing models, software, hardware, integrating the results from different interfaces, etc.

Start here:

Distributed sensemaking: improving sensemaking by leveraging the efforts of previous users
by Kristie Fisher, Scott Counts, and Aniket Kittur.

Abstract:

We examine the possibility of distributed sensemaking: improving a user’s sensemaking by leveraging previous users’ work without those users directly collaborating or even knowing one another. We asked users to engage in sensemaking by organizing and annotating web search results into “knowledge maps,” either with or without previous users’ maps to work from. We also recorded gaze patterns as users examined others’ knowledge maps. Our findings show the conditions under which distributed sensemaking can improve sensemaking quality; that a user’s sensemaking process is readily apparent to a subsequent user via a knowledge map; and that the organization of content was more useful to subsequent users than the content itself, especially when those users had differing goals. We discuss the role distributed sensemaking can play in schema induction by helping users make a mental model of an information space and make recommendations for new tool and system development.