Archive for the ‘Social Media’ Category

Twitter As Investment Tool

Thursday, May 21st, 2015

Social Media, Financial Algorithms and the Hack Crash by Tero Karppi and Kate Crawford.


‘@AP: Breaking: Two Explosions in the White House and Barack Obama is injured’. So read a tweet sent from a hacked Associated Press Twitter account @AP, which affected financial markets, wiping out $136.5 billion of the Standard & Poor’s 500 Index’s value. While the speed of the Associated Press hack crash event and the proprietary nature of the algorithms involved make it difficult to make causal claims about the relationship between social media and trading algorithms, we argue that it helps us to critically examine the volatile connections between social media, financial markets, and third parties offering human and algorithmic analysis. By analyzing the commentaries of this event, we highlight two particular currents: one formed by computational processes that mine and analyze Twitter data, and the other being financial algorithms that make automated trades and steer the stock market. We build on sociology of finance together with media theory and focus on the work of Christian Marazzi, Gabriel Tarde and Tony Sampson to analyze the relationship between social media and financial markets. We argue that Twitter and social media are becoming more powerful forces, not just because they connect people or generate new modes of participation, but because they are connecting human communicative spaces to automated computational spaces in ways that are affectively contagious and highly volatile.

Social sciences lag behind the computer sciences in making their publications publicly accessible, and they still publish behind paywalls, so all I can report on is the abstract.

On the other hand, I’m not sure how much practical advice you could gain from the article as opposed to the volumes of commentary following the incident itself.

The research reminds me of Malcolm Gladwell, author of The Tipping Point and similar works.

While I have greatly enjoyed several of Gladwell’s books, including The Tipping Point, it is one thing to look back and say: “Look, there was a tipping point.” It is quite another to be in the present and successfully say: “Look, there is a tipping point and we can make it tip this way or that.”

In retrospect, we all credit ourselves with near omniscience when our plans succeed and we invent fanciful explanations about what we knew or realized at the time. Others, equally skilled, dedicated and competent, who started at the same time, did not succeed. Of course, the conservative media (and we ourselves, if we are honest) invent narratives to explain those outcomes as well.

Of course, deliberate manipulation of the market with false information, via Twitter or not, is illegal. The best you can do is look for a pattern of news and/or tweets that results in a downward change in a particular stock, which then recovers, and then apply that pattern more broadly. You won’t make $millions off of any one transaction, but making $millions off of one transaction is the sort of thing that draws regulatory attention.
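For looking back over past data only, the dip-and-recover pattern could be sketched roughly as follows. This is a minimal sketch, not a trading method: the function name `find_dip_and_recovery`, the thresholds (`min_drop`, `window`, `recover_ratio`) and the sample prices are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DipRecovery:
    event_index: int     # position of the news item or tweet in the price series
    trough_index: int    # position of the lowest price after the event
    recovery_index: int  # first position where the price has mostly recovered
    drop_fraction: float # size of the dip relative to the price at the event


def find_dip_and_recovery(
    prices: List[float],
    event_index: int,
    min_drop: float = 0.005,     # hypothetical: ignore dips smaller than 0.5%
    window: int = 10,            # hypothetical: look at most 10 ticks ahead
    recover_ratio: float = 0.9,  # hypothetical: "recovered" = 90% of the dip regained
) -> Optional[DipRecovery]:
    """Return the dip and recovery that follow an event, or None if there is none."""
    base = prices[event_index]
    end = min(len(prices), event_index + 1 + window)
    segment = prices[event_index + 1:end]
    if not segment:
        return None

    # Lowest price inside the look-ahead window.
    trough_offset = min(range(len(segment)), key=segment.__getitem__)
    trough_index = event_index + 1 + trough_offset
    trough = prices[trough_index]

    drop_fraction = (base - trough) / base
    if drop_fraction < min_drop:
        return None  # no meaningful downward change

    # Price level that counts as "recovered".
    target = trough + recover_ratio * (base - trough)
    for i in range(trough_index + 1, end):
        if prices[i] >= target:
            return DipRecovery(event_index, trough_index, i, drop_fraction)
    return None  # dipped, but did not recover inside the window


# Made-up prices: an event at index 1, a dip, then a recovery.
sample_prices = [100.0, 100.0, 99.0, 98.6, 99.5, 100.0, 100.0]
result = find_dip_and_recovery(sample_prices, event_index=1)
print(result)
```

Running the same check over many past events, and counting how often a dip is followed by a recovery, is one way to judge whether there is a pattern at all rather than a story told in retrospect.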

Exposure to Diverse Information on Facebook [Skepticism]

Saturday, May 9th, 2015

Exposure to Diverse Information on Facebook by Eytan Bakshy, Solomon Messing, Lada Adamic.

From the post:

As people increasingly turn to social networks for news and civic information, questions have been raised about whether this practice leads to the creation of “echo chambers,” in which people are exposed only to information from like-minded individuals [2]. Other speculation has focused on whether algorithms used to rank search results and social media posts could create “filter bubbles,” in which only ideologically appealing content is surfaced [3].

Research we have conducted to date, however, runs counter to this picture. A previous 2012 research paper concluded that much of the information we are exposed to and share comes from weak ties: those friends we interact with less often and are more likely to be dissimilar to us than our close friends [4]. Separate research suggests that individuals are more likely to engage with content contrary to their own views when it is presented along with social information [5].

Our latest research, released today in Science, quantifies, for the first time, exactly how much individuals could be and are exposed to ideologically diverse news and information in social media [1].

We found that people have friends who claim an opposing political ideology, and that the content in peoples’ News Feeds reflect those diverse views. While News Feed surfaces content that is slightly more aligned with an individual’s own ideology (based on that person’s actions on Facebook), who they friend and what content they click on are more consequential than the News Feed ranking in terms of how much diverse content they encounter.

The Science paper: Exposure to Ideologically Diverse News and Opinion

The definition of an “echo chamber” is implied in the authors’ conclusion:

By showing that people are exposed to a substantial amount of content from friends with opposing viewpoints, our findings contrast concerns that people might “listen and speak only to the like-minded” while online [2].

The racism of the Deep South existed in spite of interaction between whites and blacks. So “echo chamber” should not be defined as association of like with like, at least not entirely. The Deep South was an echo chamber of racism but not for a lack of diversity in social networks.

Besides lacking a useful definition of “echo chamber,” the authors ignore the role of confirmation bias (and the related “backfire effect”) when people are confronted with contrary thoughts or evidence. For some readers, seeing a New York Times editorial disagreeing with their position can make them feel better about being on the “right side.”

That people are exposed to diverse information on Facebook is interesting, but until there is a meaningful definition of “echo chambers,” the role Facebook plays in the maintenance of “echo chambers” remains unknown.

Bias? What Bias?

Monday, March 16th, 2015

Scientists Warn About Bias In The Facebook And Twitter Data Used In Millions Of Studies by Brid-Aine Parnell.

From the post:

Social media like Facebook and Twitter are far too biased to be used blindly by social science researchers, two computer scientists have warned.

Writing in today’s issue of Science, Carnegie Mellon’s Juergen Pfeffer and McGill’s Derek Ruths have warned that scientists are treating the wealth of data gathered by social networks as a goldmine of what people are thinking – but frequently they aren’t correcting for inherent biases in the dataset.

If folks didn’t already know that scientists were turning to social media for easy access to the pat statistics on thousands of people, they found out about it when Facebook allowed researchers to adjust users’ news feeds to manipulate their emotions.

Both Facebook and Twitter are such rich sources for heart-pounding headlines that I’m shocked, shocked that anyone would suggest there is bias in the data! 😉

Not surprisingly, people participate in social media for reasons entirely of their own and quite unrelated to the interests or needs of researchers. Particular types of social media attract different demographics than other types. I’m not sure how you could “correct” for those biases, unless you wanted to collect better data for yourself.

Not that there are any bias-free data sets, but some biases are so obvious that they hardly warrant mentioning. Except that institutions like the Brookings Institution bump and grind on Twitter data until they can prove the significance of terrorist social media. Brookings knows better but terrorism is a popular topic.

Not to make data carry all the blame, the test most often applied to data is:

Will this data produce a result that merits more funding and/or will please my supervisor?

I first saw this in a tweet by Persontyle.

The ISIS Twitter Census

Saturday, March 7th, 2015

The ISIS Twitter Census: Defining and describing the population of ISIS supporters on Twitter by J.M. Berger and Jonathon Morgan.

This is the Brookings Institution report that I said was forthcoming in: Losing Your Right To Decide, Needlessly.

From the Executive Summary:

The Islamic State, known as ISIS or ISIL, has exploited social media, most notoriously Twitter, to send its propaganda and messaging out to the world and to draw in people vulnerable to radicalization.

By virtue of its large number of supporters and highly organized tactics, ISIS has been able to exert an outsized impact on how the world perceives it, by disseminating images of graphic violence (including the beheading of Western journalists and aid workers and more recently, the immolation of a Jordanian air force pilot), while using social media to attract new recruits and inspire lone actor attacks.

Although much ink has been spilled on the topic of ISIS activity on Twitter, very basic questions remain unanswered, including such fundamental issues as how many Twitter users support ISIS, who they are, and how many of those supporters take part in its highly organized online activities.

Previous efforts to answer these questions have relied on very small segments of the overall ISIS social network. Because of the small, cellular nature of that network, the examination of particular subsets such as foreign fighters in relatively small numbers, may create misleading conclusions.

My suggestion is that you skim the “group think” sections on ISIS and move quickly to Section 3, Methodology. That will put you into a position to evaluate the various and sundry claims about ISIS and what may or may not be supported by their methodology.

I am still looking for a metric for “successful” use of social media. So far, no luck.

SocioViz (Danger?)

Tuesday, February 24th, 2015

SocioViz (Danger?)

From the website:

SocioViz is a social media analytics platform powered by Social Network Analysis metrics

Are you a Social Media Marketer, Digital Journalist or Social Researcher? Have a try and jump on board!

After you login, you give SocioViz access to your Twitter account and it generates a visual graph of your connections.

But there is no “about us” link. The TOS (terms of service) and privacy links just reload the login page. The only other links are to share SocioViz on a variety of social media sites. A quick search did not find any other significant information.


Sort of like Luke in the trash compactor, I have a very bad feeling about this. 😉

Anyone know more about this site?

I don’t like opaque social sites seeking access to my accounts. Maybe nothing but poor design but it is so far beyond the pale that I suspect a less generous explanation.

If you are feeling really risky, search for SocioViz; the site will turn up in the first few hits. I am reluctant to even repeat its address online.

How Do Others See You Online?

Thursday, January 1st, 2015

The question isn’t “How do you see yourself online?” but “How do others see you online?”

Allowing for the vagaries of memory, selective unconscious editing, self-justification, etc., I am quite confident that how others see us online isn’t the same thing as how we see ourselves.

The saying “know thyself” is often repeated and for practical purposes, is about as effective as a poke with a sharp stick. It hurts but there’s not much other benefit to be had.

Farhad Manjoo writes in ThinkUp Helps the Social Network User See the Online Self about the startup, which offers an analytical service of your participation in social networks.

Unlike your “selective” memory, ThinkUp gives you a report based on all your tweets, posts, etc., and breaks them down in ways you probably would not anticipate. The service creates enough distance between you and the report that you get a glimpse of yourself as others may be seeing you.

Beyond whatever value self-knowledge has for you, ThinkUp, as Farhad learns from experience, can make you a more effective user of social media. You are already spending time on social media, so why not spend it more effectively?

Everything You Need To Know About Social Media Search

Sunday, December 14th, 2014

Everything You Need To Know About Social Media Search by Olsy Sorokina.

From the post:

For the past decade, social networks have been the most universally consistent way for us to document our lives. We travel, build relationships, accomplish new goals, discuss current events and welcome new lives—and all of these events can be traced on social media. We have created hashtags like #ThrowbackThursday and apps like Timehop to reminisce on all the past moments forever etched in the social web in form of status updates, photos, and 140-character phrases.

Major networks demonstrate their awareness of the role they play in their users’ lives by creating year-end summaries such as Facebook’s Year in Review, and Twitter’s #YearOnTwitter. However, much of the emphasis on social media has been traditionally placed on real-time interactions, which often made it difficult to browse for past posts without scrolling down for hours on end.

The bias towards real-time messaging has changed in a matter of a few days. Over the past month, three major social networks announced changes to their search functions, which made finding old posts as easy as a Google search. If you missed out on the news or need a refresher, here’s everything you need to know.

I suppose Olsy means in addition to search in general sucking.

Interesting tidbit on Facebook:

This isn’t Facebook’s first attempt at building a search engine. The earlier version of Graph Search gave users search results in response to longer-form queries, such as “my friends who like Game of Thrones.” However, the semantic search never made it to the mobile platforms; many supposed that using complex phrases as search queries was too confusing for an average user.

Does anyone have any user research on the ability of users to use complex phrases as search queries?

I ask because if users have difficulty authoring “complex” semantics and difficulty querying with “complex” semantics, it stands to reason they may have difficulty interpreting “complex” semantic results. Yes?

If all three of those are the case, then how do we impart the value-add of “complex” semantics without tripping over one of those limitations?

Olsy also covers Instagram and Twitter. Twitter’s advanced search looks like the standard include/exclude, etc. type of “advanced” search. “Advanced” maybe forty years ago in the early OPACs but not really “advanced” now.
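To show how little is going on in an include/exclude style of search, here is a minimal sketch. It is not Twitter’s implementation; the function names `matches` and `search` and the sample posts are invented for illustration, and matching is on whole lowercase words only.

```python
from typing import Iterable, List


def matches(text: str, include: Iterable[str] = (), exclude: Iterable[str] = ()) -> bool:
    """True if text contains every include term and none of the exclude terms."""
    words = set(text.lower().split())
    has_all = all(term.lower() in words for term in include)
    has_none = not any(term.lower() in words for term in exclude)
    return has_all and has_none


def search(posts: Iterable[str], include: Iterable[str] = (), exclude: Iterable[str] = ()) -> List[str]:
    """Filter posts with the include/exclude test above."""
    include = list(include)
    exclude = list(exclude)
    return [post for post in posts if matches(post, include, exclude)]


# Invented sample posts.
posts = [
    "gamma-ray burst detected by swift",
    "swift response to the burst pipe",
    "new search features on twitter",
]
hits = search(posts, include=["burst"], exclude=["pipe"])
print(hits)
```

Nothing here knows what a “burst” is. The filter only tests whether strings are present or absent, which is why it does not count as “complex” semantics.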

Catch up on these new search features. They will provide at least a minimum of grist for your topic map mill.

The 2014 Social Media Glossary: 154 Essential Definitions

Saturday, October 25th, 2014

The 2014 Social Media Glossary: 154 Essential Definitions by Matt Foulger.

From the post:

Welcome to the 2014 edition of the Hootsuite Social Media Glossary. This is a living document that will continue to grow as we add more terms and expand our definitions. If there’s a term you would like to see added, let us know in the comments!

I searched but did not find an earlier version of this glossary on the Hootsuite blog. I have posted a comment asking for pointers to the earlier version(s).

In the meantime, you may want to compare: The Ultimate Glossary: 120 Social Media Marketing Terms Explained by Kipp Bodnar. From 2011 but if you don’t know the terms, even a 2011 posting may be helpful.

We all accept the notion that language evolves, but within domains that evolution is gradual and follows shifts in thinking in that domain, making it harder for domain members to see it.

The evolution of a rapidly changing vocabulary, such as the one used in social media, might be more apparent.

Web Apps in the Cloud: Even Astronomers Can Write Them!

Wednesday, October 22nd, 2014

Web Apps in the Cloud: Even Astronomers Can Write Them!

From the post:

Philip Cowperthwaite and Peter K. G. Williams work in time-domain astronomy at Harvard. Philip is a graduate student working on the detection of electromagnetic counterparts to gravitational wave events, and Peter studies magnetic activity in low-mass stars, brown dwarfs, and planets.

Astronomers that study GRBs are well-known for racing to follow up bursts immediately after they occur — thanks to services like the Gamma-ray Coordinates Network (GCN), you can receive an email with an event position less than 30 seconds after it hits a satellite like Swift. It’s pretty cool that we professionals can get real-time notification of stars exploding across the universe, but it also seems like a great opportunity to convey some of the excitement of cutting-edge science to the broader public. To that end, we decided to try to expand the reach of GCN alerts by bringing them on to social media. Join us for a surprisingly short and painless tale about the development of YOITSAGRB, a tiny piece of Python code on the Google App Engine that distributes GCN alerts through the social media app Yo.

If you’re not familiar with Yo, there’s not much to know. Yo was conceived as a minimalist social media experience: users can register a unique username and send each other a message consisting of “Yo,” and only “Yo.” You can think of it as being like Twitter, but instead of 140 characters, you have zero. (They’ve since added more features such as including links with your “Yo,” but we’re Yo purists so we’ll just be using the base functionality.) A nice consequence of this design is that the Yo API is incredibly straightforward, which is convenient for a “my first web app” kind of project.

While “Yo” has been expanded to include more content, the origin remains an illustration of the many meanings that can be signaled by the same term. In this case, it signals the detection of a gamma-ray burst in the known universe.

Or “Yo” could mean it is time to start some other activity when received from a particular sender. Or even be a message composed entirely of “Yo’s” where different senders had some significance. Or “Yo’s” sent at particular times to compose a message. Or “Yo’s” sent to leave the impression that messages were being sent. 😉

So, does a “Yo” have any semantics separate and apart from that read into it by a “Yo” recipient?

Twitter and the Arab Spring

Sunday, September 7th, 2014

You may remember that “effective use of social media” was claimed as a hallmark of the Arab Spring. (The Arab Spring and the impact of social media and Opening Closed Regimes: What Was the Role of Social Media During the Arab Spring?)

When evaluating such claims remember that your experience with social media may or may not represent the experience with social media elsewhere.

For example, Citizen Engagement and Public Services in the Arab World: The Potential of Social Media from Mohammed Bin Rashid School of Government (2014) reports:

Figure 23: Egypt 22.4% Facebook User Penetration

Figure 34: Egypt 1.26% Twitter user penetration rate.

Those figures are as of 2014. Figures for prior years are smaller.

That doesn’t sound like a level of social media use sufficient to create and then drive a social movement like the Arab Spring.

You can find additional datasets and additional information at: Registration is free.

And check out: Mohammed Bin Rashid School of Government

I first saw this in a tweet by Peter W. Singer.

Conference on Weblogs and Social Media (Proceedings)

Saturday, May 31st, 2014

Proceedings of the Eighth International Conference on Weblogs and Social Media

A great collection of fifty-eight papers and thirty-one posters on weblogs and social media.

Not directly applicable to topic maps but social media messages are as confused, ambiguous, etc., as any area could be. Perhaps more so but there isn’t a reliable measure for semantic confusion that I am aware of to compare different media.

These papers may give you some insight into social media and useful ways for processing its messages.

I first saw this in a tweet by Ben Hachey.

Social Media Mining: An Introduction

Saturday, April 26th, 2014

Social Media Mining: An Introduction by Reza Zafarani, Mohammad Ali Abbasi, and Huan Liu.

From the webpage:

The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. Understanding and processing this new type of data to glean actionable patterns presents challenges and opportunities for interdisciplinary research, novel algorithms, and tool development. Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining. It introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining. Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts, principles, and methods in various scenarios of social media mining.

Another Cambridge University Press title that is available in pre-publication PDF format.

If you are contemplating writing a textbook, Cambridge University Press access policies should be one of your considerations in seeking a publisher.

You can download the entire book, chapters, and slides from Social Media Mining: An Introduction

Do remember that only 14% of the U.S. adult population uses Twitter. Whatever “trends” you extract from Twitter may or may not reflect “trends” in the larger population.
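A quick worst-case calculation shows how wide the gap can be. This is only a sketch: the function name `population_bounds` and the 60% figure are made up for illustration, and the only assumption is that nothing at all is known about the people who are not on Twitter.

```python
from typing import Tuple


def population_bounds(sample_rate: float, coverage: float) -> Tuple[float, float]:
    """Worst-case bounds on a population share, given the share seen in a subgroup.

    sample_rate: share of the subgroup (e.g. Twitter users) showing the "trend"
    coverage:    share of the whole population that belongs to the subgroup
    """
    # Lower bound: nobody outside the subgroup shows the trend.
    lower = sample_rate * coverage
    # Upper bound: everybody outside the subgroup shows the trend.
    upper = lower + (1.0 - coverage)
    return lower, upper


# Hypothetical: 60% of Twitter users favor something; 14% of adults use Twitter.
low, high = population_bounds(sample_rate=0.60, coverage=0.14)
print(f"Population share is somewhere between {low:.1%} and {high:.1%}")
```

With those numbers the share in the larger population could be anywhere from roughly 8.4% to 94.4%. Narrowing that range takes information about the other 86%, which Twitter data alone cannot supply.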

I first saw this in a tweet by Stat Fact.

Are You A Facebook Slacker? (Or, “Don’t “Like” Me, Support Me!”)

Sunday, November 10th, 2013

Their title reads: The Nature of Slacktivism: How the Social Observability of an Initial Act of Token Support Affects Subsequent Prosocial Action by Kirk Kristofferson, Katherine White, John Peloza. (Kirk Kristofferson, Katherine White, John Peloza. The Nature of Slacktivism: How the Social Observability of an Initial Act of Token Support Affects Subsequent Prosocial Action. Journal of Consumer Research, 2013; : 000 DOI: 10.1086/674137)


Prior research offers competing predictions regarding whether an initial token display of support for a cause (such as wearing a ribbon, signing a petition, or joining a Facebook group) subsequently leads to increased and otherwise more meaningful contributions to the cause. The present research proposes a conceptual framework elucidating two primary motivations that underlie subsequent helping behavior: a desire to present a positive image to others and a desire to be consistent with one’s own values. Importantly, the socially observable nature (public vs. private) of initial token support is identified as a key moderator that influences when and why token support does or does not lead to meaningful support for the cause. Consumers exhibit greater helping on a subsequent, more meaningful task after providing an initial private (vs. public) display of token support for a cause. Finally, the authors demonstrate how value alignment and connection to the cause moderate the observed effects.

From the introduction:

We define slacktivism as a willingness to perform a relatively costless, token display of support for a social cause, with an accompanying lack of willingness to devote significant effort to enact meaningful change (Davis 2011; Morozov 2009a).

From the section: The Moderating Role of Social Observability: The Public versus Private Nature of Support:

…we anticipate that consumers who make an initial act of token support in public will be no more likely to provide meaningful support than those who engaged in no initial act of support.

Four (4) detailed studies and an extensive review of the literature are offered to support the authors’ conclusions.

The only source that I noticed missing was:

10 Two men went up into the temple to pray; the one a Pharisee, and the other a publican.

11 The Pharisee stood and prayed thus with himself, God, I thank thee, that I am not as other men are, extortioners, unjust, adulterers, or even as this publican.

12 I fast twice in the week, I give tithes of all that I possess.

13 And the publican, standing afar off, would not lift up so much as his eyes unto heaven, but smote upon his breast, saying, God be merciful to me a sinner.

14 I tell you, this man went down to his house justified rather than the other: for every one that exalteth himself shall be abased; and he that humbleth himself shall be exalted.

King James Version, Luke 18: 10-14.

The authors would reverse the roles of the Pharisee and the publican, to find that the Pharisee contributes “meaningful support” and the publican does not.

We contrast token support with meaningful support, which we define as consumer contributions that require a significant cost, effort, or behavior change in ways that make tangible contributions to the cause. Examples of meaningful support include donating money and volunteering time and skills.

If you are trying to attract “meaningful support” for your cause or organization, i.e., avoid slackers, there is much to learn here.

If you are trying to move beyond the “cheap grace” (Bonhoeffer)* of “meaningful support” and towards “meaningful change,” there is much to be learned here as well.

Governments, corporations, ad agencies and even your competitors are manipulating the public understanding of “meaningful support” and “meaningful change.” And acceptable means for both.

You can play on their terms and lose, or you can define your own terms and roll the dice.


* I know the phrase “cheap grace” from Bonhoeffer but in running a reference to ground, I saw a statement in Wikipedia that Bonhoeffer learned that phrase from Adam Clayton Powell, Sr. Homiletics have never been a strong interest of mine but I will try to run down some sources on sermons by Adam Clayton Powell, Sr.

Twitter Data Analytics

Wednesday, September 11th, 2013

Twitter Data Analytics by Shamanth Kumar, Fred Morstatter, and Huan Liu.

From the webpage:

Social media has become a major platform for information sharing. Due to its openness in sharing data, Twitter is a prime example of social media in which researchers can verify their hypotheses, and practitioners can mine interesting patterns and build realworld applications. This book takes a reader through the process of harnessing Twitter data to find answers to intriguing questions. We begin with an introduction to the process of collecting data through Twitter’s APIs and proceed to discuss strategies for curating large datasets. We then guide the reader through the process of visualizing Twitter data with realworld examples, present challenges and complexities of building visual analytic tools, and provide strategies to address these issues. We show by example how some powerful measures can be computed using various Twitter data sources. This book is designed to provide researchers, practitioners, project managers, and graduate students new to the field with an entry point to jump start their endeavors. It also serves as a convenient reference for readers seasoned in Twitter data analysis.

Preprint with data set on analyzing Twitter data.

Although running a scant seventy-nine (79) pages, including an index, Twitter Data Analytics (TDA) covers:

Each chapter ends with suggestions for further reading and references.

In addition to learning more about Twitter and its APIs, the reader will be introduced to MongoDB, JUNG and D3.

No mean accomplishment for seventy-nine (79) pages!

Social Remains Isolated From ‘Business-Critical’ Data

Wednesday, August 14th, 2013

Social Remains Isolated From ‘Business-Critical’ Data by Aarti Shah.

From the post:

Social data — including posts, comments and reviews — are still largely isolated from business-critical enterprise data, according to a new report from the Altimeter Group.

The study considered 35 organizations — including Caesar’s Entertainment and Symantec — that use social data in context with enterprise data, defined as information collected from CRM, business intelligence, market research and email marketing, among other sources. It found that the average enterprise-class company owns 178 social accounts and 13 departments — including marketing, human resources, field sales and legal — are actively engaged on social platforms.

“Organizations have invested in social media and tools are consolidating but it’s all happening in a silo,” said Susan Etlinger, the report’s author. “Tools tend to be organized around departments because that’s where budgets live…and the silos continue because organizations are designed for departments to work fairly autonomously.”

Somewhat surprisingly, the report finds social data is often difficult to integrate because it is touched by so many organizational departments, all with varying perspectives on the information. The report also notes the numerous nuances within social data make it problematic to apply general metrics across the board and, in many organizations, social data doesn’t carry the same credibility as its enterprise counterpart. (emphasis added)

Isn’t the definition of a silo the organization of data from a certain perspective?

If so, why would it be surprising that different views on data make it difficult to integrate?

Viewing data from one perspective isn’t the same as viewing it from another perspective.

Not really a question of integration but of how easy/hard it is to view data from a variety of equally legitimate perspectives.

Rather than a quest for “the” view shouldn’t we be asking users: “What view serves you best?”

AAAI – Weblogs and Social Media

Tuesday, July 9th, 2013

Seventh International AAAI Conference on Weblogs and Social Media

Abstracts and papers from the Seventh International AAAI Conference on Weblogs and Social Media.

Much to consider:

Frontmatter: Six (6) entries.

Full Papers: Sixty-nine (69) entries.

Poster Papers: Eighteen (18) entries.

Demonstration Papers: Five (5) entries.

Computational Personality Recognition: Ten (10) entries.

Social Computing for Workforce 2.0: Seven (7) entries.

Social Media Visualization: Four (4) entries.

When the City Meets the Citizen: Nine (9) entries.

Be aware that the links for tutorials and workshops only give you the abstracts describing the tutorials and workshops.

There is the obligatory “blind men and the elephant” paper:

Blind Men and the Elephant: Detecting Evolving Groups in Social News


We propose an automated and unsupervised methodology for a novel summarization of group behavior based on content preference. We show that graph theoretical community evolution (based on similarity of user preference for content) is effective in indexing these dynamics. Combined with text analysis that targets automatically-identified representative content for each community, our method produces a novel multi-layered representation of evolving group behavior. We demonstrate this methodology in the context of political discourse on a social news site with data that spans more than four years and find coexisting political leanings over extended periods and a disruptive external event that lead to a significant reorganization of existing patterns. Finally, where there exists no ground truth, we propose a new evaluation approach by using entropy measures as evidence of coherence along the evolution path of these groups. This methodology is valuable to designers and managers of online forums in need of granular analytics of user activity, as well as to researchers in social and political sciences who wish to extend their inquiries to large-scale data available on the web.

It is a great paper but commits a common error when it notes:

Like the parable of Blind Men and the Elephant, these techniques provide us with disjoint, specific pieces of information.

Yes, the parable is oft told to make a point about partial knowledge, but the careful observer will ask:

How are we different from the blind men trying to determine the nature of an elephant?

Aren’t we also blind men trying to determine the nature of blind men who are examining an elephant?

And so on?

Not that being blind men should keep us from having opinions, but it should make us wary of how deeply we are attached to them.

Not only are there elephants all the way down, there are blind men before, with (including ourselves) and around us.

Data Socializing

Tuesday, April 23rd, 2013

If you need more opportunities for data socializing, KDNuggets has compiled: Top 30 LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science.

Here’s an interesting test:

Write down your LinkedIn groups and compare your list to this one.


ViralSearch: How Viral Content Spreads over Twitter

Wednesday, March 6th, 2013

ViralSearch: How Viral Content Spreads over Twitter by Andrew Vande Moere.

From the post:

ViralSearch, developed by Jake Hofman and others of Microsoft Research, visualizes how content spreads over social media, and Twitter in particular.

ViralSearch is based on hundred thousands of stories that are spread through billions of mentions of these stories, over many generations. In particular, it reveals the typical, hidden structures behind the sharing of viral videos, photos and posts as an hierarchical generation tree or as an animated bubble graph. The interface contains an interactive timeline of events, as well as a search field to explore specific phrases, stories, or Twitter users to provide an overview of how the independent actions of many individuals make content go viral.

As this tool seems only to be available within Microsoft, you can only enjoy it by watching the documentary video below.

See also NYTLabs Cascade: How Information Propagates through Social Media for a visualization of a very similar concept.

Impressive graphics!

Question: If and when you have an insight while viewing a social networking graphic, where do you capture that insight?

That is, how do you link your insight to a particular point in the graphic?

The Swipp API: Creating the World’s Social Intelligence

Monday, February 4th, 2013

The Swipp API: Creating the World’s Social Intelligence by Greg Bates.

From the post:

The Swipp API allows developers to integrate Swipp’s “Social Intelligence” into their sites and applications. Public information is not available on the API; interested parties are asked to email Once available the APIs will “make it possible for people to interact around any topic imaginable.”

[graphic omitted]

Having operated in stealth mode for 2 years, Swipp founders Don Thorson and Charlie Costantini decided to go public after Facebook’s release of it’s somewhat different competitor, the social graph. The idea is to let users rate any topic they can comment on or anything they can photograph. Others can chime in, providing an average rating by users. One cool difference: you can dislike something as well as like it, giving a rating from -5 to +5. According to Darrell Etherington at Techcrunch, the company has a three-pronged strategy of a consumer app just described, a business component tailored around specific events like the Superbowl, that will help businesses target specific segments.

A fact that seems to be lost in most discussions of social media/sites is that social intelligence already exists.

Social media/sites may assist in the capturing/recording of social intelligence but that isn’t the same thing as creating social intelligence.

It is an important distinction because understanding the capture/recording role lets us focus on what we want to capture and in what way.

What we decide to capture or record greatly influences the utility of the social intelligence we gather.

Such as capturing how users choose to identify particular subjects or relationships between subjects, for example.

PS: The goal of Swipp is to create a social network and ratings system (like Facebook) that is open for re-use elsewhere on the web. Adding semantic integration to that social network and ratings system would be a plus, I would imagine.

REVIEW: Crawling social media and depicting social networks with NodeXL [in 3 parts]

Friday, February 1st, 2013

REVIEW: Crawling social media and depicting social networks with NodeXL by Eruditio Loginquitas appears in three parts: Part 1 of 3, Part 2 of 3 and Part 3 of 3.

From part 1:

Surprisingly, given the complexity of the subject matter and the various potential uses by researchers from a range of fields, “Analyzing…” is a very coherent and highly readable text. The ideas are well illustrated throughout with full-color screenshots.

In the introduction, the authors explain that this is a spatially organized book—in the form of an organic tree. The early chapters are the roots which lay the groundwork of social media and social network analysis. Then, there is a mid-section that deals with how to use the NodeXL add-on to Excel. Finally, there are chapters that address particular social media platforms and how data is extracted and analyzed from each type. These descriptors include email, thread networks, Twitter, Facebook, WWW hyperlink networks, Flickr, YouTube, and wiki networks. The work is surprisingly succinct, clear, and practical.

Further, it is written with such range that it can serve as an introductory text for newcomers to social network analysis (me included) as well as those who have been using this approach for a while (but may need to review the social media and data crawling aspects). Taken in total, this work is highly informative, with clear depictions of the social and technical sides of social media platforms.

From part 2:

One of the strengths of “Analyzing Social Media Networks with NodeXL” is that it introduces a powerful research method and a tool that helps tap electronic media and non-electronic social network information intelligently, in a way that does not over-state what is knowable. The authors, Derek Hansen, Ben Shneiderman, and Marc A. Smith, are no strangers to research or academic publishing, and theirs is a fairly conservative approach in terms of what may be asserted.

To frame what may be researched, the authors use a range of resources: some generalized research questions, examples from real-world research, and step-by-step techniques for data extraction, analysis, visualization, and then further analysis.

From part 3:

What is most memorable about “Analyzing Social Media Networks with NodeXL” is the depth of information about the various social network sites that may be crawled using NodeXL. With so many evolving social network platforms, and each capturing and storing information differently, it helps to know what an actual data extractions mean.

I haven’t seen the book personally, but from this review it sounds like a good model for technical writing for a lay audience.

For that matter, a good model for writing about topic maps for a lay audience. (Many of the issues being similar.)

1 Billion Videos = No Reruns

Monday, January 14th, 2013

Viki Video: 1 Billion Videos in 150 languages Means Never Having to Say Rerun by Greg Bates.

From the post:

Tired of American TV? Tired of TV in English? Escape to Viki, the leading global TV and movie network, which provides videos with crowd sourced translations in 150 languages. The Viki API allows your users to browse more than 1 billion videos by genre, country, and language, plus search across the entire database. The API uses OAuth2.0 authentication, REST, with responses in either JSON or XML.

The Viki Platform Google Group.

Now this looks like a promising data set!

A couple of use cases for topic maps come to mind:

  • Entry in OPAC points patron from catalog to videos in this database.
  • Entry returned from database maps to book in local library collection (via WorldCat) (more likely to appeal to me).

What use cases do you see?

Windows into Relational Events: Data Structures for Contiguous Subsequences of Edges

Friday, September 28th, 2012

Windows into Relational Events: Data Structures for Contiguous Subsequences of Edges by Michael J. Bannister, Christopher DuBois, David Eppstein, Padhraic Smyth.


We consider the problem of analyzing social network data sets in which the edges of the network have timestamps, and we wish to analyze the subgraphs formed from edges in contiguous subintervals of these timestamps. We provide data structures for these problems that use near-linear preprocessing time, linear space, and sublogarithmic query time to handle queries that ask for the number of connected components, number of components that contain cycles, number of vertices whose degree equals or is at most some predetermined value, number of vertices that can be reached from a starting set of vertices by time-increasing paths, and related queries.
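To make the kind of query in the abstract concrete, here is a naive baseline in Python. It is my own sketch, not the authors' data structure: it rescans the edge list and runs union-find for every query, so it has none of the sublogarithmic query time the paper claims. The names `count_components` and `events`, and the choice to count only vertices touched by an edge inside the window, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (timestamp, vertex, vertex); the layout is an assumption for this sketch
Edge = Tuple[int, str, str]


def count_components(edges: List[Edge], start: int, end: int) -> int:
    """Count connected components formed by edges with start <= timestamp <= end.

    Only vertices that appear on at least one edge inside the window are counted.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = 0
    for timestamp, u, v in edges:
        if not (start <= timestamp <= end):
            continue
        for w in (u, v):
            if w not in parent:
                parent[w] = w      # first sighting: a new singleton component
                components += 1
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # the edge merges two components
            components -= 1
    return components


# Made-up relational events, purely for illustration.
events = [
    (1, "ann", "bob"),
    (2, "cat", "dan"),
    (3, "bob", "cat"),
    (4, "eve", "fay"),
]

for window in [(1, 2), (1, 3), (1, 4)]:
    print(window, count_components(events, *window))
```

Widening the window from (1, 2) to (1, 3) merges two groups into one, which is the point of the "what time span constitutes a network" question: the same events yield different networks depending on the interval you pick.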

Among other interesting questions, the paper raises the issue of what time span of connections constitutes a network of interest. That is more than a question of being “dynamic.” It is a definitional issue for the social network in question.

If you are working with social networks, a must read.

PS: You probably need to read: Relational events vs graphs, a posting by David Eppstein.

David details several different terms for “relational event data,” and says there are probably others they did not find. (Topic maps anyone?)

The Art of Social Media Analysis with Twitter and Python

Friday, July 20th, 2012

The Art of Social Media Analysis with Twitter and Python by Krishna Sankar.

All that social media data in your topic map has to come from somewhere. 😉

Covers both the basics of the Twitter API and social graph analysis. With code of course.

I first saw this at KDNuggets.

World Leaders Comment on Attack in Bulgaria

Thursday, July 19th, 2012

World Leaders Comment on Attack in Bulgaria

From the post:

Following the terror attack in Bulgaria killing a number of Israeli tourists on an airport bus, we can see the statements from world leaders around the globe including Israel Prime Minister Benjamin Netanyahu openly pinning the blame on Iran and threatening retaliation

If you haven’t seen one of the visualizations by Recorded Future you will be impressed by this one. Mousing over people and locations invokes what we would call scoping in a topic map context and limits the number of connections you see. And each node can lead to additional information.

While this works like a topic map, I can’t say it is a topic map application because how it works isn’t disclosed. You can read How Recorded Future Works, but you won’t be any better informed than before you read it.

Impressive work, but it isn’t clear how I would integrate their matching of sources with, say, an internal mapping of sources. Or how I would augment their mapping with additional mappings by internal subject experts?

Or how I would map this incident to prior incidents which led to disproportionate responses?

Or map “terrorist” attacks by the world leaders now decrying other “terrorist” attacks?

That last mapping could be an interesting one for the application of the term “terrorist.” My anecdotal experience is that it depends on the sponsor.

Would be interesting to know if systematic analysis supports that observation.

Perhaps the news media could then evenly identify the probable sponsors of “terrorist” attacks.

Social Meets Search with the Latest Version of Bing…

Saturday, June 2nd, 2012

Social Meets Search with the Latest Version of Bing…

Two things are obvious:

  • I am running a day behind.
  • Bing isn’t my default search engine. (Or I would have noticed this yesterday.)

From the post:

A few weeks ago, we introduced you to the most significant update to Bing since our launch three years ago, combining the best of search with relevant people from your social networks, including Facebook and Twitter. After the positive response to the preview, the new version of Bing is available today in the US at You can now access Bing’s new three column design , including the snapshot feature and social features.

According to a recent internal survey, nearly 75 % of people spend more time than they would like searching for information online. With Bing’s new design, you can access information from the Web including friends you do know and relevant experts that you may not know letting you spend less time searching and more time doing.

(screenshot omitted)

Today, we’re also unveiling a new advertising campaign to support the introduction of search plus social and announcing the Bing Summer of Doing, in celebration of the new features and designed to inspire people to do amazing things this summer.

BTW, correcting the HTML code in the post for Bing,

When I arrived, the “top” searches were:

  • Nazi parents
  • Hosni Mubarak

“Popular” searches ranging from the inane to the irrelevant.

I need something a bit more focused on subjects of interest to me.

Perhaps automated queries that are filtered, then processed into a topic map?

Something to think about over the summer. More posts to follow on that theme.

Knowledge Extraction and Consolidation from Social Media

Thursday, May 31st, 2012

Knowledge Extraction and Consolidation from Social Media KECSM2012 – November 11 – 12, Boston, USA.

Important dates

  • Jul 31, 2012: submission deadline full & short papers
  • Aug 21, 2012: notifications for research papers
  • Sep 10, 2012: camera-ready papers due
  • Oct 05, 2012: submission deadline poster & demo abstracts
  • Oct 10, 2012: notifications posters & demos

From the website:

The workshop aims to become a highly interactive research forum for exploring innovative approaches for extracting and correlating knowledge from degraded social media by exploiting the Web of Data. While the workshop’s general focus is on the creation of well-formed and well-interlinked structured data from highly unstructured Web content, its interdisciplinary scope will bring together researchers and practitioners from areas such as the semantic and social Web, text mining and NLP, multimedia analysis, data extraction and integration, and ontology and data mapping. The workshop will also look into innovative applications that exploit extracted knowledge in order to produce solutions to domain-specific needs.

We will welcome high-quality papers about current trends in the areas listed in the following, non-exhaustive list of topics. We will seek application-oriented, as well as more theoretical papers and position papers.

Knowledge detection and extraction (content perspective)

  • Knowledge extraction from text (NLP, text mining)
  • Dealing with scalability and performance issues with regard to large amounts of heterogeneous content
  • Multilinguality issues
  • Knowledge extraction from multimedia (image and video analysis)
  • Sentiment detection and opinion mining from text and audiovisual content
  • Detection and consideration of temporal and dynamics aspects
  • Dealing with degraded Web content

Knowledge enrichment, aggregation and correlation (data perspective)

  • Modelling of events and entities such as locations, organisations, topics, opinions
  • Representation of temporal and dynamics-related aspects
  • Data clustering and consolidation
  • Data enrichment based on linked data/semantic web
  • Using reference datasets to structure, cluster and correlate extracted knowledge
  • Evaluation of automatically extracted data

Exploitation of automatically extracted knowledge/data (application perspective)

  • Innovative applications which make use of automatically extracted data (e.g. for recommendation or personalisation of Web content)
  • Semantic search in annotated Web content
  • Entity-driven navigation of user-generated content
  • Novel navigation and visualisation of extracted knowledge/graphs and associated Web resources

I like the sound of “consolidation.” An unspoken or tacit goal of any knowledge gathering. Not much use in scattered pieces on the shop floor.

Collocated with the 11th International Semantic Web Conference (ISWC2012)


Thursday, May 10th, 2012


I remember my childhood neighborhood just before the advent of air conditioning and the omnipresence of TV. A walk down the block gave you a good idea of what your neighbors were up to. Or not. 😉

Comparing then to now, the neighborhood where I now live is strangely silent. Walk down my block and you hear no TVs, conversations, radios, loud discussions or the like.

We have become increasingly isolated from others by our means of transportation, entertainment and climate control.

EveryBlock offers the promise of restoring some of the random contact with our neighbors to our lives.

EveryBlock says it solves two problems:

First, there’s no good place to keep track of everything happening in your neighborhood, from news coverage to events to photography. We try to collect all of the news and civic goings-on that have happened recently in your city, and make it simple for you to keep track of news in particular areas.

Second, there’s no good way to post messages to your neighbors online. Facebook lets you post messages to your friends, Twitter lets you post messages to your followers, but no well-used service lets you post a message to people in a given neighborhood.

EveryBlock addresses the problem of geographic blocks, but how do you get information on your professional block?

Do you hear anything unexpected or different? Or do you hear the customary and expected?

Maybe your professional block has gotten too silent.

Suggestions for how to change that?

HotSocial 2012

Saturday, March 31st, 2012

HotSocial 2012: First ACM International Workshop on Hot Topics on Interdisciplinary Social Networks Research August 12, 2012, Beijing, China (in conjunction with ACM KDD 2012, August 12-16, 2012)

Important Dates:

Deadline for submissions: May 9, 2012 (11:59 PM, EST)
Notification of acceptance: June 1, 2012
Camera-ready version: June 12, 2012
HotSocial Workshop Day: Aug 12, 2012

From the post:

Among the fundamental open questions are:

  • How to access social networks data? Different communities have different means, each with pros and cons. Experience exchanges from different communities will be beneficial.
  • How to protect these data? Privacy and data protection techniques considering social and legal aspects are required.
  • How the complex systems and graph theory algorithms can be used for understanding social networks? Interdisciplinary collaboration are necessary.
  • Can social network features be exploited for a better computing and social network system design?
  • How do online social networks play a role in real-life (offline) community forming and evolution?
  • How does the human mobility and human interaction influence human behaviors and thus public health? How can we develop methodologies to investigate the public health and their correlates in the context of the social networks?

Topics of Interest:

Main topics of this workshop include (but are not limited to) the following:

  • methods for accessing social networks (e.g., sensor nets, mobile apps, crawlers) and bias correction for use in different communities (e.g., sociology, behavior studies, epidemiology)
  • privacy and ethic issues of data collection and management of large social graphs, leveraging social network properties as well as legal and social constraints
  • application of data mining and machine learning in the context of specific social networks
  • information spread models and campaign detection
  • trust and reputation and community evolution in the online and offline interacted social networks, including the presence and evolution of social identities and social capital in OSNs
  • understanding complex systems and scale-free networks from an interdisciplinary angle
  • interdisciplinary experiences and intermediate results on social network research

Sounds relevant to the “big data” stuff of interest to the White House.

PS: Have you noticed how some blogging software really sucks when you do “view source” on pages? Markup and data should be present. It makes content reuse easier. WordPress does it. How about your blogging software?

Social Media Application (FBI RFI)

Monday, February 20th, 2012

Social Media Application (FBI RFI)

Current Due Date: 11:00 AM, March 13, 2012

You have to read the Social Media Application.pdf document to prepare a response.

Be aware that as of 20 February 2012, that document has a blank page every other page. I suspect it is the complete document but have written to confirm and to request a corrected document be posted.

Out-Hoover Hoover: FBI wants massive data-mining capability for social media does mention:

Nowhere in this detailed RFI, however, does the FBI ask industry to comment on the privacy implications of such massive data collection and storage of social media sites. Nor does the FBI say how it would define the “bad actors” who would be subjected this type of scrutiny.

I take that to mean that the FBI is not seeking your comments on privacy implications or possible definitions of “bad actors.”

I won’t be able to prepare an official response because I don’t meet the contractor suitability requirements, which include a cost estimate for an offsite server as a solution to the requirements.

I will be going over the requirements and publishing my response here as though I meet the contractor suitability requirements. Could be an interesting exercise.

Social Media Monitoring with CEP, pt. 2: Context As Important As Sentiment

Sunday, February 5th, 2012

Social Media Monitoring with CEP, pt. 2: Context As Important As Sentiment by Chris Carlson.

From the post:

When I last wrote about social media monitoring, I made a case for using a technology like Complex Event Processing (“CEP”) to detect rapidly growing and geospatially-oriented social media mentions that can provide early warning detection for the public good (Social Media Monitoring for Early Warning of Public Safety Issues, Oct. 27, 2011).

A recent article by Chris Matyszczyk of CNET highlights the often conflicting and confusing nature of monitoring social media. A 26-year old British citizen, Leigh Van Bryan, gearing up for a holiday of partying in Los Angeles, California (USA), tweeted in British slang his intention to have a good time: “Free this week, for quick gossip/prep before I go and destroy America.” Since I’m not too far removed the culture of youth, I did take this to mean partying, cutting loose, having a good time (and other not-so-current definitions.)

This story does not end happily, as Van Bryan and his friend Emily Bunting were arrested and then sent back to Blighty.

This post will not increase American confidence in the TSA but does illustrate how context can influence the identification of a subject (or “person of interest”) or the exclusion of the same.

Context is captured in topic maps using associations. In this particular case, a view of the information on the young man in question would reveal a lack of associations with any known terror suspects, people on the no-fly list, suspicious travel patterns, etc.
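A rough sketch of what such a view might check, in Python. This is not a topic map engine or any real screening system: the class `AssociationStore`, the function `flagged_context`, the association type names and the subject identifiers are all hypothetical, and real topic map associations carry typed roles that I have left out for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class AssociationStore:
    """Toy store of typed, symmetric associations between subjects (hypothetical)."""

    def __init__(self) -> None:
        self._by_subject: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def associate(self, subject: str, assoc_type: str, other: str) -> None:
        # Record the association from both ends so either subject can be queried.
        self._by_subject[subject].add((assoc_type, other))
        self._by_subject[other].add((assoc_type, subject))

    def associations(self, subject: str) -> Set[Tuple[str, str]]:
        return set(self._by_subject.get(subject, set()))


def flagged_context(store: AssociationStore, subject: str,
                    watched_types: Iterable[str]) -> List[str]:
    """Return the watched association types in which the subject takes part."""
    present = {assoc_type for assoc_type, _ in store.associations(subject)}
    return sorted(present & set(watched_types))


# Hypothetical association types a reviewer might care about.
WATCHED = ["associate-of-terror-suspect", "listed-on-no-fly-list"]

store = AssociationStore()
store.associate("traveler-1", "travels-with", "traveler-2")
store.associate("traveler-1", "holiday-destination", "Los Angeles")
store.associate("person-x", "listed-on-no-fly-list", "no-fly-list")

print(flagged_context(store, "traveler-1", WATCHED))  # nothing watched is present
print(flagged_context(store, "person-x", WATCHED))
```

The tweet alone is one data point. The empty result for the traveler is the context that the tweet-only reading leaves out.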

Not to imply that having good information leads to good decisions; technology can’t correct that particular disconnect.