Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 25, 2012

The Man Behind the Curtain

Filed under: Intelligence,Privacy — Patrick Durusau @ 2:33 pm

The Man Behind the Curtain

From the post:

Without any lead-in whatsoever, we just ask that you watch the video above.

And we ask that you hang on for a few moments—this goes far beyond the hocus pocus you’re thinking the clip contains.

You really need to see this video.

Then answer:

Should watchers watch themselves?

Should people watch the watchers?

September 22, 2012

Building a “Data Eye in the Sky”

Filed under: Intelligence,Prediction — Patrick Durusau @ 2:50 pm

Building a “Data Eye in the Sky” by Erwin Gianchandani.

From the post:

Nearly a year ago, tech writer John Markoff published a story in The New York Times about Open Source Indicators (OSI), a new program by the Federal government’s Intelligence Advanced Research Projects Activity (IARPA) seeking to automatically collect publicly available data, including Web search queries, blog entries, Internet traffic flows, financial market indicators, traffic webcams, changes in Wikipedia entries, etc., to understand patterns of human communication, consumption, and movement. According to Markoff:

It is intended to be an entirely automated system, a “data eye in the sky” without human intervention, according to the program proposal. The research would not be limited to political and economic events, but would also explore the ability to predict pandemics and other types of widespread contagion, something that has been pursued independently by civilian researchers and by companies like Google.

This past April, IARPA issued contracts to three research teams, providing funding potentially for up to three years, with continuation beyond the first year contingent upon satisfactory progress. At least two of these contracts are now public (following the link):

Erwin reviews what is known about programs at Virginia Tech and BBN Technologies.

And concludes with:

Each OSI research team is being required to make a number of warnings/alerts that will be judged on the basis of lead time, or how early the alert was made; the accuracy of the warning, such as the where/when/what of the alert; and the probability associated with the alert, that is, high vs. very high.

To learn more about the OSI program, check out the IARPA website or a press release issued by Virginia Tech.

Given the complexities of semantics, what has my curiosity up is how the “warnings/alerts” are going to be judged.

Recalling that “all the lights were blinking red” before 9/11.

If all the traffic lights in the U.S. flashed three (3) times at the same time, without more, it could mean anything from the end of the Mayan calendar to free beer. One just never knows.

Do you have the stats on the oracle at Delphi?

Might be a good baseline for comparison.
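
To make the judging question concrete, here is a minimal sketch (Python; the field names, weights, and matching rules are my inventions, not anything IARPA has published) of how the quoted criteria, lead time, where/when/what accuracy, and stated probability, might be rolled into a single score:

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Alert:
        where: str
        what: str
        when: date          # predicted date of the event
        issued_on: date
        probability: float  # e.g. 0.7 for "high", 0.9 for "very high"

    def score(alert, event_where, event_what, event_when,
              w_lead=0.4, w_acc=0.4, w_prob=0.2):
        # Lead time: days of warning, credit capped at 30 days.
        lead = min(max((event_when - alert.issued_on).days, 0) / 30.0, 1.0)
        # Accuracy: crude where/when/what matching.
        acc = ((alert.where == event_where)
               + (alert.what == event_what)
               + (abs((alert.when - event_when).days) <= 3)) / 3.0
        return w_lead * lead + w_acc * acc + w_prob * alert.probability

    a = Alert("Cairo", "protest", date(2011, 1, 28), date(2011, 1, 10), 0.7)
    print(round(score(a, "Cairo", "protest", date(2011, 1, 25)), 3))  # 0.74

All of the semantic difficulty hides inside those equality tests: deciding whether an alert and an event name the same where and the same what.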

September 16, 2012

New Army Guide to Open-Source Intelligence

Filed under: Intelligence,Open Data,Open Source,Public Data — Patrick Durusau @ 4:06 pm

New Army Guide to Open-Source Intelligence

If you don’t know Full Text Reports, you should.

A top-tier research professional’s hand-picked selection of documents from academe, corporations, government agencies, interest groups, NGOs, professional societies, research institutes, think tanks, trade associations, and more.

You will winnow some chaff but also find jewels like Open Source Intelligence (PDF).

From the post:

  • Provides fundamental principles and terminology for Army units that conduct OSINT exploitation.
  • Discusses tactics, techniques, and procedures (TTP) for Army units that conduct OSINT exploitation.
  • Provides a catalyst for renewing and emphasizing Army awareness of the value of publicly available information and open sources.
  • Establishes a common understanding of OSINT.
  • Develops systematic approaches to plan, prepare, collect, and produce intelligence from publicly available information from open sources.

Impressive intelligence overview materials.

Would be nice to re-work into a topic map intelligence approach document, with the ability to insert a client’s name and industry-specific examples. It has that militaristic tone that is hard to capture with civilian writers.

August 27, 2012

Experts vs. Crowds (How to Distinguish, CIA and Drones)

Filed under: Crowd Sourcing,Intelligence — Patrick Durusau @ 9:22 am

Reporting on the intelligence community’s view of crowd-sourcing, Ken Dilanian writes:

“I don’t believe in the wisdom of crowds,” said Mark Lowenthal, a former senior CIA and State Department analyst (and 1988 “Jeopardy!” champion) who now teaches classified courses about intelligence. “Crowds produce riots. Experts produce wisdom.”

I would modify Lowenthal’s assessment to read:

Crowds produce diverse judgements. Experts produce highly similar judgements.

Or to put it another way: the smaller the group, the less variation in opinion you will find over time, and the further group opinion diverges from reality as experienced by non-group members.

No real surprise that Beltway denizens failed to predict the Arab Spring. None of the concerns that led to the Arab Spring figure among the concerns of the “experts,” not just at a conscious level but as a social experience.

The more diverse the opinion/experience pool, the less likely a crowd judgement is to be completely alien to reality as experienced by others.

Which is how I would explain the performance of the crowd thus far in the experiment.
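
A toy simulation (Python; the distributions are invented for illustration) of the diversity point: a small expert pool sharing a systematic bias converges tightly on the wrong answer, while a larger, more diverse crowd scatters widely but averages closer to the truth.

    import random

    random.seed(42)
    TRUTH = 100.0

    # Experts: few, low variance, shared systematic bias.
    experts = [random.gauss(TRUTH + 20, 5) for _ in range(10)]

    # Crowd: many, individually noisy, but unbiased on average.
    crowd = [random.gauss(TRUTH, 30) for _ in range(1000)]

    mean = lambda xs: sum(xs) / len(xs)
    print(f"expert mean: {mean(experts):.1f}")  # tight, but off by the shared bias
    print(f"crowd mean:  {mean(crowd):.1f}")    # noisy individuals, accurate aggregate

Everything turns on the crowd actually being diverse.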

Dilanian’s speculation:

Crowd-sourcing would mean, in theory, polling large groups across the 200,000-person intelligence community, or outside experts with security clearances, to aggregate their views about the strength of the Taliban, say, or the likelihood that Iran is secretly building a nuclear weapon.

reflects a failure to appreciate the nature of crowd-sourced judgements.

First, crowd-sourcing will be more effective if the “intelligence community” is only a small part of the crowd. Choosing only people with security clearances, I suspect, automatically excludes many Taliban sympathizers. You are not going to get good results if the crowd is poorly chosen.

Think of it as trying to re-create the “dance” that bees do as a means of communicating the location of pollen. I would trust the CIA to build a bee hive with only drones. And then complain that crowd behavior didn’t work.

Second, crowd-sourcing can do factual questions, like guessing the weight of an animal, but only if everyone has the same information. Otherwise, use crowd-sourcing to gauge the likely impact of policies, changes in policies, etc. Pulse of the “public” as it were.

The “likelihood that Iran is secretly building a nuclear weapon” isn’t a crowd-source question. No amount of crowd wisdom can compensate for the effort being “secret.” There is no public information because, yes, Iran is keeping it secret.

Properly used, crowd-sourcing can be a very valuable tool.

The ad agencies call it public opinion polling.

Imagine appropriate polling activities on the ground in the Middle East, asking ordinary people about their hopes, desires, and dreams. If credited over the summarized and sanitized results of experts, such polling could lead to policies that benefit the people, as opposed to the governments, of the Middle East. (Another reason some prefer experts: experts support current governments.)

Dilanian’s article appears in the Los Angeles Times as: U.S. intelligence tests crowd-sourcing against its experts.

July 19, 2012

World Leaders Comment on Attack in Bulgaria

Filed under: Data Mining,Intelligence,Social Media — Patrick Durusau @ 4:53 am

World Leaders Comment on Attack in Bulgaria

From the post:

Following the terror attack in Bulgaria killing a number of Israeli tourists on an airport bus, we can see the statements from world leaders around the globe including Israel Prime Minister Benjamin Netanyahu openly pinning the blame on Iran and threatening retaliation

If you haven’t seen one of the visualizations by Recorded Future you will be impressed by this one. Mousing over people and locations invokes what we would call scoping in a topic map context and limits the number of connections you see. And each node can lead to additional information.
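
For readers who haven’t met scoping, a minimal sketch (Python; the nodes, edges, and scope labels are invented, not taken from Recorded Future) of the behavior just described: selecting a node amounts to filtering the edge set, optionally within a single scope.

    # Each edge: (source, target, scope). Data invented for illustration.
    edges = [
        ("Netanyahu", "Iran", "blame"),
        ("Netanyahu", "Bulgaria", "statement"),
        ("Iran", "Bulgaria", "alleged sponsor"),
        ("Merkel", "Bulgaria", "condolence"),
    ]

    def in_scope(edges, node, scope=None):
        """Edges touching `node`, optionally restricted to one scope."""
        return [e for e in edges
                if node in (e[0], e[1]) and (scope is None or e[2] == scope)]

    print(in_scope(edges, "Bulgaria"))               # every connection
    print(in_scope(edges, "Bulgaria", "statement"))  # scoped view: fewer edges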

While this works like a topic map, I can’t say it is a topic map application because how it works isn’t disclosed. You can read How Recorded Future Works, but you won’t be any better informed than before you read it.

Impressive work, but it isn’t clear how I would integrate their matching of sources with, say, an internal mapping of sources. Or how I would augment their mapping with additional mappings by internal subject experts?

Or how I would map this incident to prior incidents which led to disproportionate responses?

Or map “terrorist” attacks by the world leaders now decrying other “terrorist” attacks?

That last mapping could be an interesting one for the application of the term “terrorist.” My anecdotal experience is that it depends on the sponsor.

Would be interesting to know if systematic analysis supports that observation.

Perhaps the news media could then evenly identify the probable sponsors of “terrorist” attacks.

June 29, 2012

Detecting Emergent Conflicts with Recorded Future + Ushahidi

Filed under: Data Mining,Intelligence — Patrick Durusau @ 3:16 pm

Detecting Emergent Conflicts with Recorded Future + Ushahidi by Ninja Shoes. (?)

From the post:

An ocean of data is available on the web. From this ocean of data, information can in theory be extracted and used by analysts for detecting emergent trends (trend spotting). However, to do this manually is a daunting and nearly impossible task. In this study we describe a semi-automatic system in which data is automatically collected from selected sources, and to which linguistic analysis is applied to extract, e.g., entities and events. After combining the extracted information with human intelligence reports, the results are visualized to the user of the system, who can interact with it in order to obtain a better awareness of historic as well as emergent trends. A prototype of the proposed system has been implemented and some initial results are presented in the paper.

The paper in question.

A fairly remarkable bit of work that illustrates the current capabilities for mining the web and also its limitations.

The processing of news feeds for protest reports is interesting, but it mistakes the result of years of activity for an “emergent” conflict.

If you were going to capture the data that would enable a human analyst to “predict” the Arab Spring, you would have to begin with union organizing activities. Not the sort of thing that makes news reports on the WWW.

For that you would need traditional human intelligence. From people who don’t spend their days debating traffic or reports with other non-native staffers. Or meeting with managers from Washington or Stockholm.

Or let me put it this way:

Mining the web doesn’t equal useful results. Just as mining for gold doesn’t mean you will find any.

June 23, 2012

Korematsu Maps

Filed under: Intelligence,Security — Patrick Durusau @ 6:37 pm

Just in case you need Korematsu maps of the United States, that is, maps of where immigrant populations are located, take a look at: Maps of the Foreign Born in the US.

For those of you unfamiliar with the Korematsu case, it involved the mass detention of people of Japanese ancestry during WW II. With no showing that any of the detainees were dangerous, disloyal, etc.

The thinking at the time, and when the Supreme Court upheld the detentions in 1944, was that the fears of the many outweighed the rights of the few.

You may be called upon to create maps to assist in mass detentions by racial, ethnic, or religious background. Just wanted you to know what to call them.

The Central Intelligence Agency’s 9/11 File

Filed under: Intelligence,Security — Patrick Durusau @ 6:24 pm

The Central Intelligence Agency’s 9/11 File

From the post:

The National Security Archive today is posting over 100 recently released CIA documents relating to September 11, Osama bin Laden, and U.S. counterterrorism operations. The newly-declassified records, which the Archive obtained under the Freedom of Information Act, are referred to in footnotes to the 9/11 Commission Report and present an unprecedented public resource for information about September 11.

The collection includes rarely released CIA emails, raw intelligence cables, analytical summaries, high-level briefing materials, and comprehensive counterterrorism reports that are usually withheld from the public because of their sensitivity. Today’s posting covers a variety of topics of major public interest, including background to al-Qaeda’s planning for the attacks; the origins of the Predator program now in heavy use over Afghanistan, Pakistan and Iran; al-Qaeda’s relationship with Pakistan; CIA attempts to warn about the impending threat; and the impact of budget constraints on the U.S. government’s hunt for bin Laden.

Today’s posting is the result of a series of FOIA requests by National Security Archive staff based on a painstaking review of references in the 9/11 Commission Report.

Possibly interesting material for topic map practice.

What has been redacted from the CIA documents, based upon your mapping of other documents available on 9/11?

For extra points, include a summary of “why” you think the material was redacted.

You won’t be able to verify or check your answers, but it will be good practice at putting information in context to discover what may be missing.

June 16, 2012

Semantic Technology For Intelligence, Defense, and Security STIDS 2012

Filed under: Conferences,Defense,Intelligence — Patrick Durusau @ 1:37 pm

SEMANTIC TECHNOLOGY FOR INTELLIGENCE, DEFENSE, AND SECURITY STIDS 2012

Paper submissions due: July 24, 2012
Notification of acceptance: August 28, 2012
Camera-ready papers due: September 18, 2012
Presentations due: October 17, 2012

Tutorials October 23
Main Conference October 24-26
Early Bird Registration rates until September 25

From the call for papers:

The conference is an opportunity for collaboration and cross-fertilization between researchers and practitioners of semantic-based technologies with particular experience in the problems facing the Intelligence, Defense, and Security communities. It will feature invited talks from prominent ontologists and recognized leaders from the target application domains.

To facilitate interchange among communities with a clear commonality of interest but little history of interaction, STIDS will host two separate tracks. The Research Track will showcase original, significant research on semantic technologies applicable to problems in intelligence, defense or security. Submissions to the research track are expected to clearly present their contribution, demonstrate its significance, and show the applicability to problems in the target applications domain. The Applications Track provides a forum for presenting implemented semantic-based applications to intelligence, defense, or security, as well as to discuss and evaluate the use of semantic techniques in these areas. Of particular interest are comparisons between different technologies or approaches and lessons learned from applications. By capitalizing on this opportunity, STIDS could spark dramatic progress toward transitioning semantic technologies from research to the field.

A hidden area where it will be difficult to cut IT budgets. Mostly because it is “hidden.” 😉

Not the only reason you should participate but perhaps an extra incentive to do well!

May 31, 2012

WikiLeaks as Wakeup Call?

Filed under: Intelligence,Wikileaks — Patrick Durusau @ 1:21 pm

Must be a slow news week. Federal Computer Week is recycling Wikileaks as a “wake up” call.

In case you have forgotten (or is that why the story is coming back up?), Robert Gates (Sec. of Defense) found that Wikileaks did not disclose sensitive intelligence sources or methods.

Hardly “…a security breach of epic proportions…” as claimed by the State Department.

If you want to claim Wikileaks was a “wakeup call,” make it a wakeup call about “data dumpster” techniques for sharing intelligence data.

“Here are all our reports. Good luck finding something, anything.”

That approach has “security breach” written all over it: useless for anything other than as breach material, and easy to copy in bulk.

What about this says “potential security breach” to you?

Best methods for sharing intelligence vary depending on the data, security requirements and a host of other factors. Take Wikileaks as motivation (if lacking before) to strive for useful intelligence sharing.

Not sharing for the sake of saying you are sharing.

May 19, 2012

From the Bin Laden Letters: Reactions in the Islamist Blogosphere

Filed under: Intelligence,Text Analytics — Patrick Durusau @ 4:41 pm

From the Bin Laden Letters: Reactions in the Islamist Blogosphere

From the post:

Following our initial analysis of the Osama bin Laden letters released by the Combating Terrorism Center (CTC) at West Point, we’ll more closely examine interesting moments from the letters and size them up against what was publicly reported as happening in the world in order to gain a deeper perspective on what was known or unknown at the time.

There was a frenzy of summarization and highlight reel reporting in the wake of the Abbottabad documents being publicly released. Some focused on the idea that Osama bin Laden was ostracized, some pointed to the seeming obsession with image in the media, and others simply took a chance to jab at Joe Biden for the suggestions made about his lack of preparedness for the presidency.

What we’ll do in this post is take a different approach, and rather than focus on analyst viewpoints we’ll compare reactions to the Abbottabad documents from a unique source – Islamist discussion forums.

There we find rebukes over the veracity of the documents released, support for the efforts of operatives such as Faisal Shahzad, and a little interest in the Arab Spring.

Interesting visualizations as always.

The question I would ask as a consumer of such information services is: How do I integrate this analysis with in-house analysis tools?

Or perhaps better: How do I evaluate non-direct references to particular persons or places? That is, a person or place is implied but not named. What do I know about the basis for such an identification?
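
On that last question, a sketch (Python; the table contents are illustrative) of what an auditable answer could look like: resolve an implied reference through a recorded mapping, so the basis for the identification travels with the result.

    from datetime import date

    # Role holders over time; the table itself is the basis for identification.
    ROLE_HOLDERS = {
        ("president", "Pakistan"): [
            (date(2008, 9, 9), date(2013, 9, 8), "Asif Ali Zardari"),
        ],
    }

    def resolve(role, country, as_of):
        """Resolve an implied reference like 'the Pakistani president'."""
        for start, end, person in ROLE_HOLDERS.get((role, country), []):
            if start <= as_of <= end:
                basis = f"{role} of {country} on {as_of} per ROLE_HOLDERS table"
                return person, basis
        return None, "no holder on record for that date"

    print(resolve("president", "Pakistan", date(2011, 5, 2)))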

May 16, 2012

From the Bin Laden Letters: Mapping OBL’s Reach into Yemen

Filed under: Intelligence — Patrick Durusau @ 3:25 pm

From the Bin Laden Letters: Mapping OBL’s Reach into Yemen

I puzzled over this headline. A close friend refers to President Obama as “OB1” so I had a moment of confusion when reading the headline. Didn’t make sense for Bin Laden’s letters to map President Obama’s reach into Yemen.

With some diplomatic cables and White House internal documents, that would be an interesting visualization as well.

The visualizations come from mining a larger corpus of 70,000+ public sources for individuals mentioned in the Bin Laden letters.

What we don’t know is what means of analysis produced the visualizations in question.

Some process was used to reduce redundant references to the same actors, events and relationships. Just by way of example.
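
Whatever Recorded Future does internally, the generic shape of such a step is easy to sketch (Python; the alias table is invented): map variant spellings to a canonical key and merge the references that collide.

    from collections import defaultdict

    ALIASES = {  # invented normalization table
        "osama bin laden": "Osama bin Laden",
        "usama bin ladin": "Osama bin Laden",
        "obl": "Osama bin Laden",
    }

    def canonical(name):
        return ALIASES.get(name.strip().lower(), name.strip())

    mentions = ["Usama bin Ladin", "OBL", "Osama bin Laden", "Ayman al-Zawahiri"]
    merged = defaultdict(list)
    for m in mentions:
        merged[canonical(m)].append(m)

    for actor, refs in merged.items():
        print(actor, "<-", refs)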

That isn’t a complaint, simply an observation. It isn’t possible to evaluate the techniques used to obtain the results.

It would be interesting to see Recorded Future in one of the TREC competitions. At least then the results would be against a shared data set.

Do be aware that when the text says “open source,” what is meant is “open source intelligence.”

The better practice would be to say “open source intelligence (OSINT)” and not “open source,” the latter having a well recognized meaning in the software community.

May 11, 2012

Read’em and Weep

Filed under: Government,Government Data,Intelligence — Patrick Durusau @ 2:14 pm

I read Progress Made and Challenges Remaining in Sharing Terrorism-Related Information today.

My summary: We are less than five years away from some unknown level of functioning for an Information Sharing Environment (ISE) that facilitates the sharing of terrorism-related information.

Less than 20 years after 9/11, we will have some capacity to share information that may enable the potential disruption of terrorist plots.

The patience of terrorists and their organizations is appreciated. (I added that part. The report doesn’t say that.)

The official summary.

A breakdown in information sharing was a major factor contributing to the failure to prevent the September 11, 2001, terrorist attacks. Since then, federal, state, and local governments have taken steps to improve sharing. This statement focuses on government efforts to (1) establish the Information Sharing Environment (ISE), a government-wide approach that facilitates the sharing of terrorism-related information; (2) support fusion centers, where states collaborate with federal agencies to improve sharing; (3) provide other support to state and local agencies to enhance sharing; and (4) strengthen use of the terrorist watchlist. GAO’s comments are based on products issued from September 2010 through July 2011 and selected updates in September 2011. For the updates, GAO reviewed reports on the status of Department of Homeland Security (DHS) efforts to support fusion centers, and interviewed DHS officials regarding these efforts. This statement also includes preliminary observations based on GAO’s ongoing watchlist work. For this work, GAO is analyzing the guidance used by agencies to nominate individuals to the watchlist and agency procedures for screening individuals against the list, and is interviewing relevant officials from law enforcement and intelligence agencies, among other things.

The government continues to make progress in sharing terrorism-related information among its many security partners, but does not yet have a fully-functioning ISE in place. In prior reports, GAO recommended that agencies take steps to develop an overall plan or roadmap to guide ISE implementation and establish measures to help gauge progress. These measures would help determine what information sharing capabilities have been accomplished and are left to develop, as well as what difference these capabilities have made to improve sharing and homeland security. Accomplishing these steps, as well as ensuring agencies have the necessary resources and leadership commitment, should help strengthen sharing and address issues GAO has identified that make information sharing a high-risk area.

Federal agencies are helping fusion centers build analytical and operational capabilities, but have more work to complete to help these centers sustain their operations and measure their homeland security value. For example, DHS has provided resources, including personnel and grant funding, to develop a national network of centers. However, centers are concerned about their ability to sustain and expand their operations over the long term, negatively impacting their ability to function as part of the network. Federal agencies have provided guidance to centers and plan to conduct annual assessments of centers’ capabilities and develop performance metrics by the end of 2011 to determine centers’ value to the ISE. DHS and the Department of Justice are providing technical assistance and training to help centers develop privacy and civil liberties policies and protections, but continuous assessment and monitoring policy implementation will be important to help ensure the policies provide effective protections.

In response to its mission to share information with state and local partners, DHS’s Office of Intelligence and Analysis (I&A) has taken steps to identify these partners’ information needs, develop related intelligence products, and obtain more feedback on its products. I&A also provides a number of services to its state and local partners that were generally well received by the state and local officials we contacted. However, I&A has not yet defined how it plans to meet its state and local mission by identifying and documenting the specific programs and activities that are most important for executing this mission. The office also has not developed performance measures that would allow I&A to demonstrate the expected outcomes and effectiveness of state and local programs and activities. In December 2010, GAO recommended that I&A address these issues, which could help it make resource decisions and provide accountability over its efforts.

GAO’s preliminary observations indicate that federal agencies have made progress in implementing corrective actions to address problems in watchlist-related processes that were exposed by the December 25, 2009, attempted airline bombing. These actions are intended to address problems in the way agencies share and use information to nominate individuals to the watchlist, and use the list to prevent persons of concern from boarding planes to the United States or entering the country, among other things. These actions can also have impacts on agency resources and the public, such as traveler delays and other inconvenience. GAO plans to report the results of this work later this year.
GAO is not making new recommendations, but has made recommendations in prior reports to federal agencies to enhance information sharing. The agencies generally agreed and are making progress, but full implementation of these recommendations is needed.

Full Report: Progress Made and Challenges Remaining in Sharing Terrorism-Related Information

Let me share with you the other GAO reports cited in this report:

Do you see semantic mapping opportunities in all those reports?

May 10, 2012

CIA/NSA Diff Utility?

Filed under: Intelligence,Marketing,Topic Maps — Patrick Durusau @ 2:40 pm

How much of the data sold to the CIA/NSA is from public resources?

Of the sort you find at Knoema?

Some of it isn’t easy to find, but it is public data nonetheless.

A topic map of public data resources would be a good CIA/NSA Diff Utility so they could avoid paying for data that is freely available on the WWW.

I suppose the fallback position of suppliers would be their “value add.”

With public data sets, the CIA/NSA could put that “value add” to the test. Along the lines of the Netflix competition.

Even if the results weren’t the goal, it would be a good way to discover new techniques and/or analysts.

How would you “diff” public data from that being supplied by a contractor?
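
One minimal answer (Python; the record fields are invented): normalize records from both feeds into canonical strings, hash them, and report only what the contractor supplied that is not already public.

    import hashlib

    def fingerprint(record):
        """Hash a record after normalizing field order, case, and whitespace."""
        canon = "|".join(f"{k}={str(record[k]).strip().lower()}"
                         for k in sorted(record))
        return hashlib.sha256(canon.encode()).hexdigest()

    public = [{"country": "Egypt", "indicator": "GDP", "year": 2010}]
    contractor = [
        {"country": "Egypt", "indicator": "GDP", "year": 2010},
        {"country": "Egypt", "indicator": "GDP", "year": 2011},
    ]

    public_hashes = {fingerprint(r) for r in public}
    value_added = [r for r in contractor if fingerprint(r) not in public_hashes]
    print(value_added)  # only the rows you might be justified in paying for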

May 9, 2012

Making Intelligence Systems Smarter (or Dumber)

Filed under: Data Silos,Distributed Sensemaking,Intelligence,Sensemaking — Patrick Durusau @ 10:02 am

Picking the Brains of Strangers….[$507 Billion Dollar Prize (at least)] had three keys to its success:

  • Use of human analysts
  • Common access to data and prior efforts
  • Reuse of prior efforts by human analysts

Intelligence analysts spend their days with snippets and bits of data, trying to wring sense out of it, only to pigeonhole their results in silos.

Other analysts have to know data exists before they can even request it. And analysts holding information must understand that their information could help others with their own sensemaking.

All contrary to the results in Picking the Brains of Strangers….

What information will result in sensemaking for one or more analysts is unknown. And cannot be known.

Every firewall, every silo, every compartment, every clearance level, makes every intelligence agency and the overall intelligence community dumber.

Until now, the intelligence community has chosen to be dumber and more secure.

In a time of budget cuts and calls for efficiency in government, it is time for more effective intelligence work, even if less secure.

Take the leak of the diplomatic cables. The only people unaware of the general nature of the cables were the public and perhaps the intelligence agency of Zambia. All other intelligence agencies probably had them or their own version, pigeonholed in their own systems.

With robust intelligence sharing, the NSA could do all the signal capture and expense it out to other agencies, rather than having duplicate systems at various agencies.

And perhaps a public data flow of analysis of foreign news sources in their original languages. Public contributors may not have clearances, but they may have insights into cultures and languages that are rare in intelligence agencies.

But that presumes an interest in smarter intelligence systems, not dumber ones by design.

Picking the Brains of Strangers….[$507 Billion Dollar Prize (at least)]

Filed under: BigData,Distributed Sensemaking,Intelligence,Sensemaking — Patrick Durusau @ 9:17 am

Picking the Brains of Strangers Helps Make Sense of Online Information

Science Daily carried this summary (the official abstract and link are below):

People who have already sifted through online information to make sense of a subject can help strangers facing similar tasks without ever directly communicating with them, researchers at Carnegie Mellon University and Microsoft Research have demonstrated.

This process of distributed sensemaking, they say, could save time and result in a better understanding of the information needed for whatever goal users might have, whether it is planning a vacation, gathering information about a serious disease or trying to decide what product to buy.

The researchers explored the use of digital knowledge maps — a means of representing the thought processes used to make sense of information gathered from the Web. When participants in the study used a knowledge map that had been created and improved upon by several previous users, they reported that the quality of their own work was better than when they started from scratch or used a newly created knowledge map.

“Collectively, people spend more than 70 billion hours a year trying to make sense of information they have gathered online,” said Aniket Kittur, assistant professor in Carnegie Mellon’s Human-Computer Interaction Institute. “Yet in most cases, when someone finishes a project, that work is essentially lost, benefitting no one else and perhaps even being forgotten by that person. If we could somehow share those efforts, however, all of us might learn faster.”

Three take away points:

  • “people spend more than 70 billion hours a year trying to make sense of information they have gathered online”
  • “when someone finishes a project, that work is essentially lost, benefitting no one else and perhaps even being forgotten by that person”
  • using knowledge maps created and improved upon by others — improved the quality of their own work

At the current minimum wage in the US of $7.25, that’s roughly $507,500,000,000. Some of us make more than minimum wage so that figure should be adjusted upwards.

The key to success was improvement upon efforts already improved upon by others.

The study is based on a small sample (21 people), so there is an entire research field waiting to be explored: whether this holds true with different types of data, what group dynamics make it work best, individual characteristics that influence outcomes, interfaces (that help or hinder), processing models, software, hardware, integrating the results from different interfaces, etc.

Start here:

Distributed sensemaking: improving sensemaking by leveraging the efforts of previous users
by Kristie Fisher, Scott Counts, and Aniket Kittur.

Abstract:

We examine the possibility of distributed sensemaking: improving a user’s sensemaking by leveraging previous users’ work without those users directly collaborating or even knowing one another. We asked users to engage in sensemaking by organizing and annotating web search results into “knowledge maps,” either with or without previous users’ maps to work from. We also recorded gaze patterns as users examined others’ knowledge maps. Our findings show the conditions under which distributed sensemaking can improve sensemaking quality; that a user’s sensemaking process is readily apparent to a subsequent user via a knowledge map; and that the organization of content was more useful to subsequent users than the content itself, especially when those users had differing goals. We discuss the role distributed sensemaking can play in schema induction by helping users make a mental model of an information space and make recommendations for new tool and system development.

May 8, 2012

Reading Other People’s Mail For Fun and Profit

Filed under: Analytics,Data Analysis,Intelligence — Patrick Durusau @ 6:16 pm

Bob Gourley writes much better content than he does titles: Osama Bin Laden Letters Analyzed: A rapid assessment using Recorded Future’s temporal analytic technologies and intelligence analysis tools. (Sorry Bob.)

Bob writes:

The Analysis Intelligence site provides open source analysis and information on a variety of topics based on the temporal analytic technology and intelligence analysis tools of Recorded Future. Shortly after the release of 175 pages of documents from the Combating Terrorism Center (CTC) a very interesting assessment was posted on the site. This assessment sheds light on the nature of these documents and also highlights some of the important context that the powerful capabilities of Recorded Future can provide.

The analysis by Recorded Future is succinct and well done so I cite most of it below. I’ll conclude with some of my own thoughts as an experienced intelligence professional and technologist on some of the “So What” of this assessment.

If you are interested in analytics, particularly visual analytics, you will really appreciate this piece.

Recorded Future has a post on the US Presidential Election. Just to be on the safe side, I would “fuzz” the data when it got close to the election. 😉

April 12, 2012

Is There A Dictionary In The House? (Savanna – Think Software)

Filed under: Integration,Intelligence,OWL,Semantic Web — Patrick Durusau @ 7:04 pm

Reading a white paper on an integration solution from Thetus Corporation (on its Savanna product line) when I encountered:

Savanna supports the core architectural premise that the integration of external services and components is an essential element of any enterprise platform by providing out-of-the-box integrations with many of the technologies and programs already in use in the DI2E framework. These investments include existing programs, such as: the Intelligence Community Data Layer (ICDL), OPTIC (force protection application), WATCHDOG (Terrorist Watchlist 2.0), SERENGETI (AFRICOM socio-cultural analysis), SCAN-R (EUCOM deep futures analysis); and, in the future: TAC (tripwire search and analysis), and HSCB-funded modeling capabilities, including Signature Analyst and others. To further make use of existing external services and components, the proposed solution includes integration points for commercial and opensource software, including: SOLR (indexing), Open Sextant (geotagging), Apache OpenNLP (entity extraction), R (statistical analysis), ESRI (geo-processing), OpenSGI GeoCache (geospatial data), i2 Analyst’s Notebook (charting and analysis) and a variety of structured and unstructured data repositories.

I have to plead ignorance of the “existing program” alphabet soup but I am familiar with several of the open source packages.

I am not sure what an “integration point” for an unknown future use of any of those packages would look like. Do you? Their output can be used by any program but that hardly qualifies the other program as having an “integration point.”

I am sensitive to the use of “integration” because to me it means there is some basis for integration, so that a user, having integrated data once, can re-use and possibly enhance the basis for integrating that data with other data. (We call that “merging” in topic map land.)

Integration and even reuse is mentioned: “The Savanna architecture prevents creating a set of comparable reuse issues at the enterprise scale by providing a set of interconnected and flexible models that articulate how analysis assets are sourced and created and how they are used by the community.” (page 16)

But not in enough detail to really evaluate the basis for re-use of data, data structures, enrichment of the same, etc.

Looked around for an SDK or such but came up empty.

Point of amusement:

It’s official, we’re debuting our newest release of Savanna at DoDIIS (March 21, 2012) (Department of Defense Intelligence Information Systems Worldwide Conference (DoDIIS))

The next blog entry by date?

Happy Peaceful Birthday to the Peace Corps (March 1, 2012)

I would appreciate hearing from anyone with information or stories to tell about how Savanna works in practice.

In particular I am interested in whether two distinct Savanna installations can share information in a blind interchange. That should be the test of re-use of information by another installation.

Moreover, do I have to convert data between formats or can data structures themselves be entities with properties?

PS: I am not overly impressed with the use of OWL for modeling in Savanna. The experience with “big data” has shown that starting with data first leads to different, perhaps more useful models than the other way around.

Premature modeling with OWL will result in models that are “useful” in meeting the expectations of the creating analyst. That may not be the criterion of “usefulness” that is required.

March 19, 2012

Intelligence Community (U.S.)

Filed under: Government,Intelligence — Patrick Durusau @ 6:53 pm

At the Intelligence and National Security Alliance (INSA) I ran across: Cloud Computing: Risks, Benefits, and Mission Enhancement for the Intelligence Community, which I thought might be of interest.

The document is useful for learning the “lingo” being used to describe cloud computing in the intelligence community.

And to understand the grave misunderstandings of cloud computing in the intelligence community.

At page 7 you will find:

Within the IC, information is often the decisive discriminator. Studies of recent mission failures found that many were caused by:

  • The compartmentalization of information in data silos;
  • Weaknesses of the human-based document exploitation process; and
  • A reliance on “operationally proven” processes and filters typically used to address the lack of computational power or decision time.

In most of these cases, the critical piece of information necessary for mission success was already possessed. The failure was not in obtaining the information but in locating and applying it to the mission. Cloud computing can address such issues, as well as enabling multi-use intelligence. Cloud solutions can now be used to work on all of the data, all of the time. With the ability to leverage the power of a supercomputer at will, critical decision timelines can now be more easily met. (Emphasis added)

Hard to make that many mistakes in one passage, short of misspelling one’s own name.

Cloud computing cannot address the sharing of intelligence, or as the document says: “…work on all of the data, all of the time.” That is an utter and complete falsehood.

Intelligence sharing is possible with cloud computing, just as it is with file folders with sticky labels. But the mechanism of sharing has not enabled, cannot enable, and will not enable the sharing of intelligence or data in the intelligence community.

To say otherwise is to ignore the realities that produced the current culture of not sharing intelligence and data.

Sharing data and intelligence can only be accomplished by creating cultures, habits, social mechanisms that reward and promote the sharing of data and intelligence. Some of those can be represented or facilitated in information systems but it will be people who authorize, create and reward the use of those mechanisms.

So long as the NSA views the CIA (to just pick two agencies at random) as a leaky sieve, its staff are not going to take responsibility for initiating the sharing of information. Or even responding favorably to requests for information. You can pick any other pairing and get the same result.

Developing incentives, and ridding the relevant agencies of people who aren’t incentivized to share, will go much further to promote the sharing of intelligence than any particular technology solution.

If you start to pitch a topic map solution in the intelligence community, I would mention sharing but also that without incentives they won’t be making the highest and best use of your topic map solution.

December 9, 2011

Maltego

Filed under: Intelligence,Maltego — Patrick Durusau @ 8:20 pm

Maltego

From the website:

What is Maltego?

With the continued growth of your organization, the people and hardware deployed to ensure that it remains in working order is essential, yet the threat picture of your “environment” is not always clear or complete. In fact, most often it’s not what we know that is harmful – it’s what we don’t know that causes the most damage. This being stated, how do you develop a clear profile of what the current deployment of your infrastructure resembles? What are the cutting edge tool platforms designed to offer the granularity essential to understand the complexity of your network, both physical and resource based?

Maltego is a unique platform developed to deliver a clear threat picture to the environment that an organization owns and operates. Maltego’s unique advantage is to demonstrate the complexity and severity of single points of failure as well as trust relationships that exist currently within the scope of your infrastructure.

The unique perspective that Maltego offers to both network and resource based entities is the aggregation of information posted all over the internet – whether it’s the current configuration of a router poised on the edge of your network or the current whereabouts of your Vice President on his international visits, Maltego can locate, aggregate and visualize this information.

Maltego offers the user with unprecedented information. Information is leverage. Information is power. Information is Maltego.

What does Maltego do?

  • Maltego is a program that can be used to determine the relationships and real world links between:
    • People
    • Groups of people (social networks)
    • Companies
    • Organizations
    • Web sites
    • Internet infrastructure such as:
      • Domains
      • DNS names
      • Netblocks
      • IP addresses
    • Phrases
    • Affiliations
    • Documents and files
  • These entities are linked using open source intelligence.
  • Maltego is easy and quick to install – it uses Java, so it runs on Windows, Mac and Linux.
  • Maltego provides you with a graphical interface that makes seeing these relationships instant and accurate – making it possible to see hidden connections.
  • Using the graphical user interface (GUI) you can see relationships easily – even if they are three or four degrees of separation away.
  • Maltego is unique because it uses a powerful, flexible framework that makes customizing possible. As such, Maltego can be adapted to your own, unique requirements.

I just encountered this today and have downloaded the community edition client. Have also registered for an account for the client.

More news as it develops.

December 5, 2011

US intelligence group seeks Machine Learning breakthroughs

Filed under: Funding,Intelligence — Patrick Durusau @ 7:50 pm

US intelligence group seeks Machine Learning breakthroughs

From the post:

Machine Learning technology is found in everything from spam detection programs to intelligent thermostats, but can the technology make a huge leap to handle the exponentially larger amounts of information and advanced applications of the future?

Researchers from the government’s cutting edge research group, the Intelligence Advanced Research Projects Activity (IARPA), certainly hope so and this week announced that they are looking to the industry for new ideas that may become the basis for cutting edge Machine Learning projects.


From IARPA: The focus of our request for information is on recent advances toward automatic machine learning, including automation of architecture and algorithm selection and combination, feature engineering, and training data scheduling for usability by non-experts, as well as scalability for handling large volumes of data. Machine Learning is used extensively in application areas of interest, including speech, language, vision, and sensor processing, and in the ability to meld that data into a single system, what IARPA calls a multi-modal system.

“In many application areas, the amount of data to be analyzed has been increasing exponentially (sensors, audio and video, social network data, web information) stressing even the most efficient procedures and most powerful processors. Most of these data are unorganized and unlabeled and human effort is needed for annotation and to focus attention on those data that are significant,” IARPA stated.

This could be interesting, depending on how you developed the interface. What if the system actually learned from its users while it was being used? So that not only did it provide faster access to more accurate information, it “learned” how to better do its job from the analysts using the software.

Especially if part of that “learning” was on what basis to merge information from disparate sources.
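
A toy sketch of that idea (Python; the features and update rule are my invention, not anything IARPA has specified): treat each proposed merge as a feature vector and nudge the weights, perceptron-style, whenever an analyst accepts or rejects the suggestion.

    # Features of a candidate merge: (same_name, same_location, same_date).
    weights = [0.0, 0.0, 0.0]
    LEARNING_RATE = 0.1

    def score(features):
        return sum(w * f for w, f in zip(weights, features))

    def analyst_feedback(features, accepted):
        """Perceptron-style update from an analyst's accept/reject decision."""
        target = 1.0 if accepted else -1.0
        for i, f in enumerate(features):
            weights[i] += LEARNING_RATE * target * f

    # Analysts keep accepting merges that agree on name and date...
    for _ in range(20):
        analyst_feedback([1, 0, 1], accepted=True)
    # ...and rejecting merges that agree only on location.
    for _ in range(20):
        analyst_feedback([0, 1, 0], accepted=False)

    print(weights)                             # what analysts have "taught" the system
    print(score([1, 0, 1]), score([0, 1, 0]))  # 4.0 vs -2.0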

Note: Responses to the RFI are due by 27 January 2012.

September 8, 2011

Summing up Properties with subjectIdentifiers/URLs?

Filed under: Identification,Identifiers,Intelligence,Subject Identifiers,Subject Identity — Patrick Durusau @ 6:06 pm

I was picking tomatoes in the garden when I thought about telling Carol (my wife) the plants are about to stop producing.

Those plants are at a particular address, in the backyard, in the middle garden bed of three, and are of three different varieties, but I am going to sum up those properties by saying: “The tomatoes are about to stop producing.”

It occurred to me that a subjectIdentifier could be assigned to a topic element on the basis of summing up properties of the topic.* That would have the advantage of enabling merging on the basis of subjectIdentifiers as opposed to more complex tests upon properties of a topic.
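
A minimal sketch of the idea (Python; the property set and URI scheme are invented): serialize the summed-up properties in canonical order, hash them, and mint a subjectIdentifier, so merging reduces to comparing one string.

    import hashlib

    def subject_identifier(props, authority="http://example.org/sid/"):
        """Mint a subjectIdentifier from a canonical serialization of properties."""
        canon = ";".join(f"{k}={str(props[k]).strip().lower()}"
                         for k in sorted(props))
        return authority + hashlib.sha256(canon.encode()).hexdigest()[:16]

    bed_a = {"location": "backyard", "bed": "middle", "crop": "tomatoes"}
    bed_b = {"crop": "Tomatoes", "bed": "middle", "location": "Backyard"}

    # Different key order and case, same subject: identifiers match, topics merge.
    print(subject_identifier(bed_a) == subject_identifier(bed_b))  # True

The identifier itself discloses nothing about the tests behind it.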

Disclosure of the basis for assignment of a subjectIdentifier is an interesting question.

It could be that a service wishes to produce subjectIdentifiers and index information based upon complex property measures, producing, for consumption, the subjectIdentifiers and merge-capable indexes on one or more information sets, the basis for merging being the competitive edge offered by the service.

A vendor promoting merging with its own process or format, seeking to become the TCP/IP of some area, will supply the basis for merging and tools to assist with it.

Or if you are an intelligence agency and you want an inward and outward facing interface that promotes merging of information but does not disclose your internal basis for identification, variants of this technique may be of interest.

*The notion of summing up imposes no prior constraints on the tests used or the location of the information subjected to those tests.

July 22, 2011

Hadoop for Intelligence Analysis???

Filed under: Hadoop,Intelligence — Patrick Durusau @ 6:05 pm

Hadoop for Intelligence Analysis

From the webpage:

CTOlabs.com, a subsidiary of the technology research, consulting and services firm Crucial Point LLC and a peer site of CTOvision.com, has just published a white paper providing context, tips and strategies around Hadoop titled “Hadoop for Intelligence Analysis.” This paper focuses on use cases selected to be informative to any organization thinking through ways to make sense out of large quantities of information.

I’m curious. How little would you have to know about Hadoop or intelligence analysis to get something from the “white paper?”

Or is having “Hadoop” in a title these days enough to gain a certain number of readers?

Unless you want to answer my first question, I suggest you avoid this “white paper” as “white noise.”

Your time can be better spent, doing almost anything.

April 21, 2011

IC Bias: If it’s measurable, it’s meaningful

Filed under: Data Models,Intelligence,Marketing — Patrick Durusau @ 12:37 pm

Drew Conway writes in Data Science in the U.S. Intelligence Community [1] about modeling assumptions:

For example, it is common for an intelligence analyst to measure the relationship between two data sets as they pertain to some ongoing global event. Consider, therefore, in the recent case of the democratic revolution in Egypt that an analyst had been asked to determine the relationship between the volume of Twitter traffic related to the protests and the size of the crowds in Tahrir Square. Assuming the analyst had the data hacking skills to acquire the Twitter data, and some measure of crowd density in the square, the next step would be to decide how to model the relationship statistically.

One approach would be to use a simple linear regression to estimate how Tweets affect the number of protests, but would this be reasonable? Linear regression assumes an independent distribution of observations, which is violated by the nature of mining Twitter. Also, these events happen in both time (over the course of several hours) and space (the square), meaning there would be considerable time- and spatial-dependent bias in the sample. Understanding how modeling assumptions impact the interpretations of analytical results is critical to data science, and this is particularly true in the IC.

His central point, that understanding how modeling assumptions impact the interpretation of analytical results is critical to data science, particularly in the IC, cannot be overemphasized.
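
The point about violated assumptions is easy to demonstrate (Python; purely synthetic data): correlate pairs of independent random walks, the kind of trending series that both Twitter volume and crowd size are, and a naive analysis finds “relationships” that do not exist.

    import random

    def random_walk(n, rng):
        x, out = 0.0, []
        for _ in range(n):
            x += rng.gauss(0, 1)
            out.append(x)
        return out

    def corr(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
        vx = sum((x - mx) ** 2 for x in xs) / n
        vy = sum((y - my) ** 2 for y in ys) / n
        return cov / (vx * vy) ** 0.5

    # Two *independent* trending series per trial; count spurious correlations.
    trials, big = 200, 0
    for seed in range(trials):
        rng = random.Random(seed)
        if abs(corr(random_walk(500, rng), random_walk(500, rng))) > 0.5:
            big += 1

    # For i.i.d. data of this length, |r| > 0.5 would be vanishingly rare.
    print(f"{big}/{trials} unrelated random-walk pairs show |r| > 0.5")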

The example of Twitter traffic reveals a deeper bias in the intelligence community: if it’s measurable, it’s meaningful.

No doubt Twitter facilitated communication within communities that already existed but that does not make it an enabling technology.

The revolution was made possible by community organizers working over decades (http://english.aljazeera.net/news/middleeast/2011/02/2011212152337359115.html) and trade unions (http://www.guardian.co.uk/commentisfree/2011/feb/10/trade-unions-egypt-tunisia).

And the revolution continued after Twitter and then cell phones were turned off.

Understanding such events requires investment in human intelligence and analysis, not overreliance on SIGINT. [2]


[1] Spring (2011) issue of I-Q-Tel’s quarterly journal, IQT Quarterly

[2] That a source is technical or has lights and bells does not make it reliable or even useful.

PS: The Twitter traffic, such as it was, may have primarily come from news media: “Twitter, I think, is being used by news media people with computer connections, through those kind of means.” See Facebook, Twitter, and the Middle East, IEEE Spectrum, Steve Cherry interviews Ben Zhao, expert on social networking performance.

Are we really interested in how news people use Twitter, even in a social movement context?
