Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 2, 2012

Predicting what topics will trend on Twitter [Predicting Merging?]

Filed under: Merging,Prediction,Time Series,Tweets — Patrick Durusau @ 1:40 pm

Predicting what topics will trend on Twitter

From the post:

Twitter’s home page features a regularly updated list of topics that are “trending,” meaning that tweets about them have suddenly exploded in volume. A position on the list is highly coveted as a source of free publicity, but the selection of topics is automatic, based on a proprietary algorithm that factors in both the number of tweets and recent increases in that number.

At the Interdisciplinary Workshop on Information and Decision in Social Networks at MIT in November, Associate Professor Devavrat Shah and his student, Stanislav Nikolov, will present a new algorithm that can, with 95 percent accuracy, predict which topics will trend an average of an hour and a half before Twitter’s algorithm puts them on the list — and sometimes as much as four or five hours before.

If you can’t attend the Interdisciplinary Workshop on Information and Decision in Social Networks, which has an exciting final program, try Stanislav Nikolov’s thesis, Trend or No Trend: A Novel Nonparametric Method for Classifying Time Series.

Abstract:

In supervised classification, one attempts to learn a model of how objects map to labels by selecting the best model from some model space. The choice of model space encodes assumptions about the problem. We propose a setting for model specification and selection in supervised learning based on a latent source model. In this setting, we specify the model by a small collection of unknown latent sources and posit that there is a stochastic model relating latent sources and observations. With this setting in mind, we propose a nonparametric classification method that is entirely unaware of the structure of these latent sources. Instead, our method relies on the data as a proxy for the unknown latent sources. We perform classification by computing the conditional class probabilities for an observation based on our stochastic model. This approach has an appealing and natural interpretation — that an observation belongs to a certain class if it sufficiently resembles other examples of that class.

We extend this approach to the problem of online time series classification. In the binary case, we derive an estimator for online signal detection and an associated implementation that is simple, efficient, and scalable. We demonstrate the merit of our approach by applying it to the task of detecting trending topics on Twitter. Using a small sample of Tweets, our method can detect trends before Twitter does 79% of the time, with a mean early advantage of 1.43 hours, while maintaining a 95% true positive rate and a 4% false positive rate. In addition, our method provides the flexibility to perform well under a variety of tradeoffs between types of error and relative detection time.
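The classification rule in the abstract has a simple nearest-neighbor flavor: weight each labeled reference series by its similarity to the observation, and the class with more total weight wins. A minimal sketch, with toy data and parameters of my own, not the authors’ implementation:

```python
import math

def classify(observed, trending_refs, normal_refs, gamma=1.0):
    """Weight each labeled reference series by exp(-gamma * distance)
    to the observation; the class with more total weight wins."""
    def dist(a, b):
        # Euclidean distance over the overlapping prefix.
        n = min(len(a), len(b))
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(n)))

    w_trend = sum(math.exp(-gamma * dist(observed, r)) for r in trending_refs)
    w_normal = sum(math.exp(-gamma * dist(observed, r)) for r in normal_refs)
    return "trend" if w_trend > w_normal else "no trend"

# Toy references: trending topics ramp up sharply, normal ones stay flat.
trending = [[1, 2, 4, 8], [1, 3, 6, 12]]
normal = [[1, 1, 1, 1], [2, 2, 1, 2]]

print(classify([1, 2, 5, 9], trending, normal))  # resembles the ramps
print(classify([1, 1, 2, 1], trending, normal))  # resembles the flat series
```

The real method works online over streaming activity counts; this only shows the “resembles other examples of its class” idea.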

This will be interesting in many classification contexts.

Particularly predicting what topics a user will say represent the same subject.

November 1, 2012

Design a Twitter Like Application with Nati Shalom

Filed under: Analytics,Cassandra,Stream Analytics,Tweets — Patrick Durusau @ 6:32 pm

Design a Twitter Like Application with Nati Shalom

From the description:

Design a large scale NoSQL/DataGrid application similar to Twitter with Nati Shalom.

The use case is solved with Gigaspaces and Cassandra but other NoSQL and DataGrids solutions could be used.

Slides : xebia-video.s3-website-eu-west-1.amazonaws.com/2012-02/realtime-analytics-for-big-data-a-twitter-case-study-v2-ipad.pdf

If you enjoyed the posts I pointed to at: Building your own Facebook Realtime Analytics System, you will enjoy the video. (Same author.)

Not to mention Nati teaches patterns, the specific software being important but incidental.

October 30, 2012

The one million tweet map

Filed under: Geography,Mapping,Maps,Tweets — Patrick Durusau @ 2:43 pm

The one million tweet map

Displays the last one million tweets by geographic location, plus the top five (5) hashtags.

So tweets are not just strings of 140 characters or fewer; they are locations as well. Wondering how far you can take the re-purposing of a tweet?

Powered by Maptimize.

I first saw this at Mashable.com.

BTW, I don’t find the Adobe Social ad (part of the video at Mashable) all that convincing.

You?

October 26, 2012

Information Diffusion on Twitter by @snikolov

Filed under: Gephi,Graphs,Networks,Pig,Tweets — Patrick Durusau @ 6:33 pm

Information Diffusion on Twitter by @snikolov by Marti Hearst.

From the post:

Today Stan Nikolov, who just finished his masters at MIT in studying information diffusion networks, walked us through one particular theoretical model of information diffusion which tries to predict under what conditions an idea stops spreading based on a network’s structure (from the popular Easley and Kleinberg Network book). Stan also gathered a huge amount of Twitter data, processed it using Pig scripts, and graphed the results using Gephi. The video lecture below shows you some great visualizations of the spreading behavior of the data!

(video omitted)

The slides in his Lecture Notes let you see the Pig scripts in more detail.

Another deeply awesome lecture from Marti’s class on Twitter and big data.

Also an example of the level of analysis that a Twitter stream will need to withstand to avoid “imperial entanglements.”

October 24, 2012

Kurt Thomas on Security at Twitter and Everywhere

Filed under: BigData,Security,Tweets — Patrick Durusau @ 3:32 pm

Kurt Thomas on Security at Twitter and Everywhere by Marti Hearst.

From the post:

Kurt Thomas is a former Twitter engineer and a current PhD student at UC Berkeley who studies how the criminal underground conspires to make money via unintended uses of computer systems.

Lecture notes.

Focus is on underground economies that depend upon theft of data or compromise of access to data.

I suspect that if you started making money over a free service, that would be an “unintended use” as well.

The Data Science Community on Twitter

Filed under: Data Science,Graphs,Networks,Social Networks,Tweets,Visualization — Patrick Durusau @ 2:07 pm

The Data Science Community on Twitter

From the webpage:

659 Twitter accounts linked to data science, May 2012.

Linkage of Twitter accounts to display followers and following nodes.

That sounds so inadequate (and is).

You need to go see the page, play with it and then come back.

How was that? Impressive yes?

OK, how would that experience be different if you were using a topic map?

More/less information? Other display options?

It is an impressive piece of eye candy but I have a sense it could be so much more.

You?

October 22, 2012

Whisper: Tracing the Propagation of Twitter Messages in Time and Space

Filed under: Graphics,Tweets,Visualization — Patrick Durusau @ 6:25 pm

Whisper: Tracing the Propagation of Twitter Messages in Time and Space by Andrew Vande Moere.

From the post:

Whisper [whisperseer.com] is a new data visualization technique that traces how Twitter messages propagate, in particular in terms of its temporal trends, its social and spatial extent, and its community response.

Subject of a paper at: IEEE Infovis/Visweek 2012.

Where I found:

Whisper: Tracing the Spatiotemporal Process of Information Diffusion in Real Time by Nan Cao, Yu-Ru Lin, Xiaohua Sun, David Lazer, Shixia Liu, Huamin Qu.

Abstract:

When and where is an idea dispersed? Social media, like Twitter, has been increasingly used for exchanging information, opinions and emotions about events that are happening across the world. Here we propose a novel visualization design, Whisper, for tracing the process of information diffusion in social media in real time. Our design highlights three major characteristics of diffusion processes in social media: the temporal trend, social-spatial extent, and community response of a topic of interest. Such social, spatiotemporal processes are conveyed based on a sunflower metaphor whose seeds are often dispersed far away. In Whisper, we summarize the collective responses of communities on a given topic based on how tweets were retweeted by groups of users, through representing the sentiments extracted from the tweets, and tracing the pathways of retweets on a spatial hierarchical layout. We use an efficient flux line-drawing algorithm to trace multiple pathways so the temporal and spatial patterns can be identified even for a bursty event. A focused diffusion series highlights key roles such as opinion leaders in the diffusion process. We demonstrate how our design facilitates the understanding of when and where a piece of information is dispersed and what are the social responses of the crowd, for large-scale events including political campaigns and natural disasters. Initial feedback from domain experts suggests promising use for today’s information consumption and dispersion in the wild.

The videos at Andrew’s post are particularly impressive.

Monitoring tweets and their content appears to be a growing trend. Governments are especially interested in such techniques.

October 16, 2012

Analyzing Twitter Data with Hadoop, Part 2: Gathering Data with Flume

Filed under: Cloudera,Flume,Hadoop,Tweets — Patrick Durusau @ 9:15 am

Analyzing Twitter Data with Hadoop, Part 2: Gathering Data with Flume by Jon Natkins.

From the post:

This is the second article in a series about analyzing Twitter data using some of the components of the Hadoop ecosystem available in CDH, Cloudera’s open-source distribution of Hadoop and related projects. In the first article, you learned how to pull CDH components together into a single cohesive application, but to really appreciate the flexibility of each of these components, we need to dive deeper.

Every story has a beginning, and every data pipeline has a source. So, to build Hadoop applications, we need to get data from a source into HDFS.

Apache Flume is one way to bring data into HDFS using CDH. The Apache Flume website describes Flume as “a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.” At the most basic level, Flume enables applications to collect data from its origin and send it to a resting location, such as HDFS. At a slightly more detailed level, Flume achieves this goal by defining dataflows consisting of three primary structures: sources, channels and sinks. The pieces of data that flow through Flume are called events, and the processes that run the dataflow are called agents.
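The source/channel/sink dataflow described above can be illustrated with a toy Python pipeline. This is only a sketch of the pattern, not Flume itself (real Flume agents are configured via properties files, not coded this way):

```python
from collections import deque

class Source:
    """Produces events (here, canned tweet-like strings)."""
    def __init__(self, events):
        self.events = events
    def emit(self, channel):
        for e in self.events:
            channel.put(e)

class Channel:
    """Buffers events between source and sink."""
    def __init__(self):
        self.queue = deque()
    def put(self, event):
        self.queue.append(event)
    def take(self):
        return self.queue.popleft() if self.queue else None

class Sink:
    """Drains the channel to a resting place (a list standing in for HDFS)."""
    def __init__(self):
        self.store = []
    def drain(self, channel):
        while True:
            event = channel.take()
            if event is None:
                break
            self.store.append(event)

# An "agent" wires the three together and runs the dataflow.
channel = Channel()
sink = Sink()
Source(["tweet 1", "tweet 2", "tweet 3"]).emit(channel)
sink.drain(channel)
print(sink.store)  # ['tweet 1', 'tweet 2', 'tweet 3']
```

The decoupling is the point: the source never touches the sink, so either end can be swapped out without disturbing the other.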

A very good introduction to the use of Flume!

Does it seem to you that the number of examples using Twitter, not just for “big data” but in general, is on the rise?

Just a personal observation, subject to all the flaws of such (“all the buses were going the other way”).

Judging from the state of my inbox, some people are still writing more than 140 characters at a time.

Will it make a difference in our tools/thinking if we focus on shorter strings as opposed to longer ones?

October 12, 2012

43 Big Data experts to follow on Twitter

Filed under: BigData,Tweets — Patrick Durusau @ 2:36 pm

43 Big Data experts to follow on Twitter by David Smith.

David points to a list of forty-three big data experts to follow on Twitter.

Who do you follow on Twitter for:

  • ElasticSearch
  • Graphs
  • Indexing
  • Lucene
  • Neo4j
  • Search Engines
  • Semantics
  • Solr

?

What else should I have listed? Suggest experts for those topics as well. Thanks!

October 10, 2012

Twitter Recommendations by @alpa

Filed under: Algorithms,Recommendation,Tweets — Patrick Durusau @ 4:18 pm

Twitter Recommendations by @alpa by Marti Hearst.

From the post:

Alpa Jain has great experience teaching from her time as a graduate student at Columbia University, and it shows in the clarity of her descriptions of SVD and other recommendation algorithms in today’s lecture:

Would you incorporate recommendation algorithms into a topic map authoring solution?

Fighting Spam at Twitter [Spam means non-licensed by service provider?]

Filed under: Ad Targeting,Spam,Tweets — Patrick Durusau @ 4:18 pm

Fighting Spam at Twitter by Marti Hearst.

From the post:

On Thursday, Delip Rao electrified the class with a lecture on how Twitter combats the pervasive threat of tweet spam:

The video failed but lecture notes are available.

Spam defined:

An unintended use of a service by an adversary to potentially cause harm or degrade user experience while maximizing benefit for the adversary.

On the slides, “Rate-limit avoidance” appears under “unintended use.”

Does licensing by the service provider mean material that “degrade[s] user experience while maximizing benefit for the adversary” isn’t spam?

My experience with licensed spam on television (including cable) and online is that it all degrades my experience in hope of maximizing their gain.

We need a pull model for advertising instead of a push one.

Banning all push spam would be a step in the right direction.

October 2, 2012

Twitter Results Recipe with Gephi Garnish

Filed under: Gephi,Google Refine,Graphics,Tweets — Patrick Durusau @ 7:23 pm

Grabbing Twitter Search Results into Google Refine And Exporting Conversations into Gephi by Tony Hirst.

From the post:

How can we get a quick snapshot of who’s talking to whom on Twitter in the context of a particular hashtag?

What follows is a detailed recipe with the answer to that question.

September 30, 2012

Twitter Semantics Opportunity?

Filed under: Semantics,Tweets — Patrick Durusau @ 8:25 pm

Carl Bialik (Wall Street Journal) writes in Timing Twitter about the dangers of reading too much into tweet statistics and then says:

She [Twitter spokeswoman Elaine Filadelfo] noted that the company is being conservative in its counting, and that the true counts likely are higher than the ones reported by Twitter. For instance, the company didn’t include “Ryan” in its search terms for the Republican convention, to avoid picking up tweets about, say, Ryan Gosling rather than those about Republican vice-presidential candidate Paul Ryan. And it has no way to catch tweets such as “beautiful dress” that are referring to presenters’ outfits during the Emmy Awards telecast. “You follow me during the Emmys, and you know I’m talking about the Emmys,” Filadelfo said of the hypothetical “beautiful dress” tweet. But Twitter doesn’t know that and doesn’t count that tweet.

Twitter may not “know” about the Emmys (they need to get out more) but certainly followers on Twitter did.

Followers probably bright enough to know which presenter was being identified in the tweet.

Imagine a crowd sourced twitter application where you follow particular people and add semantics to their tweets.

It might not return big bucks for the people adding semantics, but if they were donating their time to an organization or group, it could reach commercial mass.

We can keep waiting for computers to become dumb, at least, or we can pitch in to cover the semantic gap.

What do you think?

September 29, 2012

Twitter Social Network by @aneeshs (video lecture)

Filed under: Graphs,Networks,Social Networks,Tweets — Patrick Durusau @ 3:37 pm

Video Lecture: Twitter Social Network by @aneeshs by Marti Hearst.

From the post:

Learn about weak ties, triadic closures, and personal pagerank, and how they all relate to the Twitter social graph from Aneesh Sharma:
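Triadic closure (if A knows B and A knows C, a B-C link is likely to form) is easy to sketch over a follower graph. A toy Python illustration, not Twitter’s actual code:

```python
def triadic_closure_candidates(graph):
    """For an undirected follower graph (dict of node -> set of neighbors),
    suggest edges that would close a triangle: pairs of nodes with a
    common neighbor that are not yet connected to each other."""
    candidates = set()
    for node, neighbors in graph.items():
        ns = sorted(neighbors)
        for i in range(len(ns)):
            for j in range(i + 1, len(ns)):
                a, b = ns[i], ns[j]
                if b not in graph.get(a, set()):
                    candidates.add((a, b))
    return candidates

# Toy graph: alice knows bob and carol, so bob-carol is a closure candidate.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice"},
}
print(triadic_closure_candidates(graph))  # {('bob', 'carol')}
```

“Who to follow” suggestions can be read as exactly this kind of candidate generation, weighted by tie strength.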

Just when you think the weekend can’t get any better!

Enjoy!

September 27, 2012

Mining Twitter Data with Ruby – Visualizing User Mentions

Filed under: Graphs,Ruby,Tweets,Visualization — Patrick Durusau @ 3:11 pm

Mining Twitter Data with Ruby – Visualizing User Mentions by Greg Moreno.

From the post:

In my previous post on mining twitter data with ruby, we laid our foundation for collecting and analyzing Twitter updates. We stored these updates in MongoDB and used map-reduce to implement a simple counting of tweets. In this post, we’ll show relationships between users based on mentions inside the tweet. Fortunately for us, there is no need to parse each tweet just to get a list of users mentioned in the tweet because Twitter provides the “entities.mentions” field that contains what we need. After we collected the “who mentions who”, we then construct a directed graph to represent these relationships and convert them to an image so we can actually see it.

Good lesson in paying attention to your data stream.

You can impress your clients with an elaborate system for parsing tweets for mentions, or you can just use the “entities.mentions” field.

I would rather use the “entities.mentions” field’s content to create linkage to more content. Possibly searched/parsed content.
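A minimal Python sketch of reading mentions from the metadata rather than parsing text. The payload below is hypothetical but shaped like the Twitter API’s JSON, where the field appears as entities.user_mentions:

```python
def mention_edges(tweet):
    """Return (author -> mentioned user) pairs straight from the
    entities metadata; no text parsing needed."""
    author = tweet["user"]["screen_name"]
    mentions = tweet.get("entities", {}).get("user_mentions", [])
    return [(author, m["screen_name"]) for m in mentions]

# A hypothetical tweet payload, trimmed to the fields we use.
tweet = {
    "text": "great talk @marti, thanks @snikolov!",
    "user": {"screen_name": "pdurusau"},
    "entities": {
        "user_mentions": [
            {"screen_name": "marti"},
            {"screen_name": "snikolov"},
        ]
    },
}
print(mention_edges(tweet))  # [('pdurusau', 'marti'), ('pdurusau', 'snikolov')]
```

Accumulate these pairs over a collection of tweets and you have the edge list for the directed mention graph.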

Question of where you are going to devote your resources.

September 26, 2012

Splunk’s Software Architecture and GUI for Analyzing Twitter Data

Filed under: CS Lectures,Splunk,Tweets — Patrick Durusau @ 1:24 pm

Splunk’s Software Architecture and GUI for Analyzing Twitter Data by Marti Hearst.

From the post:

Today we learned about an alternative software architecture for processing large data, getting the technical details from Splunk’s VP of Engineering, Stephen Sorkin. Splunk also has a really amazing GUI for analyzing Twitter and other data sources in real time; be sure to watch the last 15 minutes of the video to see the demo:

Someone needs to organize a “big data tool of the month” club!

Or at the rate of current development, would that be a “big data tool of the week” club?

September 22, 2012

Real-Time Twitter Search by @larsonite

Filed under: Indexing,Java,Relevance,Searching,Tweets — Patrick Durusau @ 1:18 pm

Real-Time Twitter Search by @larsonite by Marti Hearst.

From the post:

Brian Larson gives a brilliant technical talk about how real-time search works at Twitter. He really knows what he’s talking about, given that he’s the tech lead for search and relevance at Twitter!

The coverage of the real-time indexing, Java memory model, safe publication were particularly good.

As a bonus, also discusses relevance near the end of the presentation.

You may want to watch this more than once!

Brian recommends Java Concurrency in Practice by Brian Goetz as having good coverage of the Java memory model.

Gnip Introduces Historical PowerTrack for Twitter [Gnip Feed Misses What?]

Filed under: Semantics,Tweets — Patrick Durusau @ 4:31 am

Gnip Introduces Historical PowerTrack for Twitter

From the post:

Gnip, the largest provider of social data to the world, is launching Historical PowerTrack for Twitter, which makes available every public Tweet since the launch of Twitter in March of 2006.

People use Twitter to connect with and share information on the things they care about. To date, analysts have had incomplete access to historical Tweets. Starting today, companies can now analyze a full six years of discussion around their brands and product launches to better understand the impact of these conversations. Political reporters can compare Tweets around the 2008 Election to the activity we are seeing around this year’s Election. Financial firms can backtest their trading algorithms to model how incorporating Twitter data generates additional signal. Business Intelligence companies can incorporate six years of Tweets into their data offerings so their customers can identify correlation with key business metrics like inventory and revenue.

“We’ve been developing Historical PowerTrack for Twitter for more than a year,” said Chris Moody, President and COO of Gnip. “During our early access phase, we’ve given companies like Esri, Brandwatch, Networked Insights, Union Metrics, Waggener Edstrom Worldwide and others the opportunity to take advantage of this amazing new data. With today’s announcement, we’re making this data fully available to the entire data ecosystem.” (emphasis added)

Can you name one thing that Gnip’s “PowerTrack for Twitter” is not capturing?

Think about it for a minute. I am sure they have all the “text” of tweets, along with whatever metadata was in the stream.

So what is Gnip missing and cannot deliver to you?

In a word, semantics.

The one thing that makes one message valuable and another irrelevant.

Example: In a 1950’s episode of “I Love Lucy,” Lucy says to Ricky over the phone, “There’s a man here making passionate love to me.” Didn’t have the same meaning in the 1950’s as it does now (and Ricky was in on the joke).

A firehose of tweets may be impressive, but so is an open fire plug in the summer.

Without direction (read semantics), the water just runs off into the sewer.

September 19, 2012

Analyzing Twitter Data with Hadoop [Hiding in a Public Data Stream]

Filed under: Cloudera,Flume,Hadoop,HDFS,Hive,Oozie,Tweets — Patrick Durusau @ 10:46 am

Analyzing Twitter Data with Hadoop by Jon Natkins

From the post:

Social media has gained immense popularity with marketing teams, and Twitter is an effective tool for a company to get people excited about its products. Twitter makes it easy to engage users and communicate directly with them, and in turn, users can provide word-of-mouth marketing for companies by discussing the products. Given limited resources, and knowing we may not be able to talk to everyone we want to target directly, marketing departments can be more efficient by being selective about whom we reach out to.

In this post, we’ll learn how we can use Apache Flume, Apache HDFS, Apache Oozie, and Apache Hive to design an end-to-end data pipeline that will enable us to analyze Twitter data. This will be the first post in a series. The posts to follow will describe, in more depth, how each component is involved and how the custom code operates. All the code and instructions necessary to reproduce this pipeline are available on the Cloudera Github.

Looking forward to more posts in this series!

Social media is a focus for marketing teams for obvious reasons.

Analysis of snaps, crackles and pops en masse.

What if you wanted to communicate securely with others using social media?

Thinking of something more robust and larger than two (or three) lovers agreeing on code words.

How would you hide in a public data stream?

Or the converse, how would you hunt for someone in a public data stream?

How would you use topic maps to manage the semantic side of such a process?

September 14, 2012

Tweet Feeds For Topic Maps?

Filed under: Topic Map Software,Topic Maps,Tweets — Patrick Durusau @ 9:42 am

The Twitter Trend lecture will leave you with a number of ideas about tracking tweets.

It occurred to me watching the video that a Twitter stream could be used as a feed into a topic map.

Not the same as converting a tweet feed into a topic map, where you accept all tweets on some specified condition.

No, more along the lines that the topic map application watches for tweets from particular users or from particular users with specified hash tags, and when observed, adds information to a topic map.

I’m thinking such a feed mechanism could have templates, invoked based upon hash tags, for the treatment of tweet content or to marshal other information to be included in the map.

For example, I tweet: doi:10.3789/isqv24n2-3.2012 #tmbib .

A TM application recognizes the #tmbib, invokes a topic map bibliography template, uses the DOI to harvest the title, author, and abstract, and creates appropriate topics. (Or whatever your template is designed to do.)

Advantage: I don’t have to create and evangelize a new protocol for communication with my topic maps.

Advantage: Someone else is maintaining the pipe. (Not to be underestimated.)

Advantage: Tweet software is nearly ubiquitous.
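The dispatch side of such a mechanism can be sketched quickly. Everything below is hypothetical: the template registry, the tweet shape, and what a template returns all stand in for a real topic map engine.

```python
# Sketch of hashtag-triggered topic map templates. The registry and the
# return values are illustrations, not a real topic map application.

def bibliography_template(tweet_text):
    """Pull the DOI out of the tweet; a real template would then fetch
    title/author/abstract and create the corresponding topics."""
    doi = next(w for w in tweet_text.split() if w.startswith("doi:"))
    return {"type": "bibliography", "doi": doi[len("doi:"):]}

TEMPLATES = {"#tmbib": bibliography_template}

def dispatch(tweet_text):
    """Invoke the first registered template whose hashtag appears
    in the tweet; ignore tweets with no registered hashtag."""
    for tag, template in TEMPLATES.items():
        if tag in tweet_text.split():
            return template(tweet_text)
    return None

print(dispatch("doi:10.3789/isqv24n2-3.2012 #tmbib"))
# {'type': 'bibliography', 'doi': '10.3789/isqv24n2-3.2012'}
```

New behaviors then cost one registry entry each, which is the appeal of riding on hashtags instead of inventing a protocol.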

Do you see a downside to this approach?

Kostas T. on How To Detect Twitter Trends

Filed under: Machine Learning,Tweets — Patrick Durusau @ 9:14 am

Kostas T. on How To Detect Twitter Trends by Marti Hearst.

From the post:

Have you ever wondered how Twitter computes its Trending Topics? Kostas T. is one of the wizards behind that, and today he shared some of the secrets with our class:

Be prepared to watch this more than once!

Sparks a number of ideas about how to track and analyze tweets.

September 12, 2012

Coding to the Twitter API

Filed under: CS Lectures,Tweets — Patrick Durusau @ 3:50 pm

Coding to the Twitter API by Marti Hearst.

From the post:

Today Rion Snow saved us huge amounts of time by giving us a primo introduction to the Twitter API. We learned about both the RESTful API and the streaming API for both Java and Python.

A very cool set of slides!

Just the right amount of detail and amusement. Clearly an experienced presenter!

September 11, 2012

GraphChi parsers toolkit

Filed under: GraphChi,GraphLab,Graphs,Latent Dirichlet Allocation (LDA),Parsers,Tweets — Patrick Durusau @ 9:53 am

GraphChi parsers toolkit by Danny Bickson.

From the post:

To the request of Baoqiang Cao, I have started a parsers toolkit in GraphChi, to be used for preparing data for GraphLab/GraphChi. The parsers should be used as templates which can be easily customized to user-specific needs.

Danny starts us off with an LDA parser (with worked example of its use) and then adds a Twitter parser that creates a graph of retweets.

Enjoy!

September 5, 2012

Exploring Twitter Data

Filed under: Splunk,Tweets — Patrick Durusau @ 4:32 pm

Exploring Twitter Data

From the post:

Want to explore popular content on Twitter with Splunk queries? The new Twitter App for Splunk 4.3 provides a scripted input that automatically extracts data from Twitter’s public 1% sample stream.

What could be better? Watching a twitter stream and calling it work. 😉

August 18, 2012

Clockwork Raven uses humans to crunch your Big Data

Filed under: BigData,Mechanical Turk,Tweets — Patrick Durusau @ 4:21 pm

Clockwork Raven uses humans to crunch your Big Data (Powered by an army of twits) by Elliot Bentley.

Twitter, the folks with the friendly API ;-), have open sourced Clockwork Raven, a Twitter-based means to upload small tasks to Mechanical Turk.

You can give users a full topic map editing/ontology creation tool (and train them in its use) or, you can ask very precise questions and crunch the output.

Not appropriate for every task but I suspect good enough for a number of them.

July 28, 2012

Twitter Words Association Analysis

Filed under: Tweets,Visualization,Word Association,Word Cloud — Patrick Durusau @ 7:41 pm

Twitter Words Association Analysis by Gunjan Amit.

From the post:

Recently I came across the Twitter Spectrum tool from Jeff Clark. This tool is a modified version of the News Spectrum tool.

Here you can enter two topics and then analyse the associated words based on Twitter data. Blue and red represent the associated words of those two topics, whereas purple represents the common words.

You can click on any word to see the related tweets. The visualization is really awesome and you can easily analyze the data.

For example, I have taken “icici” and “hdfc” as two topics. Below is the twitter spectrum based on these two topics:

Looks interesting as a “rough cut” or exploratory tool.
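The blue/red/purple split is essentially set arithmetic over the words seen with each topic. A toy Python sketch with hypothetical tweets, not Jeff Clark’s implementation:

```python
def spectrum(tweets_a, tweets_b):
    """Split words into topic-A-only (blue), topic-B-only (red),
    and shared (purple), mirroring the Twitter Spectrum coloring."""
    words_a = {w for t in tweets_a for w in t.lower().split()}
    words_b = {w for t in tweets_b for w in t.lower().split()}
    return {
        "blue": words_a - words_b,
        "red": words_b - words_a,
        "purple": words_a & words_b,
    }

# Hypothetical tweets about the two bank topics from the post.
a = ["icici launches new card", "icici mobile banking"]
b = ["hdfc mobile app", "hdfc card offers"]
print(spectrum(a, b)["purple"])  # the words the two topics share
```

A real version would also weight words by frequency and drop stopwords before displaying anything.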

July 20, 2012

The Art of Social Media Analysis with Twitter and Python

Filed under: Python,Social Graphs,Social Media,Tweets — Patrick Durusau @ 4:59 am

The Art of Social Media Analysis with Twitter and Python by Krishna Sankar.

All that social media data in your topic map has to come from somewhere. 😉

Covers both the basics of the Twitter API and social graph analysis. With code of course.

I first saw this at KDNuggets.

July 12, 2012

Real-time Twitter heat map with MongoDB

Filed under: Mapping,Maps,MongoDB,Tweets — Patrick Durusau @ 1:54 pm

Real-time Twitter heat map with MongoDB

From the post:

Over the last few weeks I got in touch with the fascinating field of data visualisation which offers great ways to play around with the perception of information.

In a more formal approach, data visualisation denotes “The representation and presentation of data that exploits our visual perception abilities in order to amplify cognition”.

Nowadays there is a huge flood of information that hits us every day. Enormous amounts of data collected from various sources are freely available on the internet. One of these data gargoyles is Twitter, producing around 400 million (400 000 000!) tweets per day!

Tweets basically offer two “layers” of information. The obvious direct information within the text of the Tweet itself and also a second layer that is not directly perceived which is the Tweets’ metadata. In this case Twitter offers a large number of additional information like user data, retweet count, hashtags, etc. This metadata can be leveraged to experience data from Twitter in a lot of exciting new ways!

So as a little weekend project I have decided to build a small piece of software that generates real-time heat maps of certain keywords from Twitter data.
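The core of such a heat map is binning geotagged tweets into grid cells. A toy sketch with made-up coordinates; a real version would stream from the Twitter API into MongoDB and render the counts with a mapping library:

```python
from collections import Counter

def heat_bins(points, cell=1.0):
    """Bin (lat, lon) pairs into a grid of cell x cell degrees;
    each cell's count is its 'heat'."""
    counts = Counter()
    for lat, lon in points:
        counts[(int(lat // cell), int(lon // cell))] += 1
    return counts

# Hypothetical geotagged-tweet coordinates (lat, lon).
tweets = [(51.5, -0.1), (51.6, -0.2), (40.7, -74.0), (51.9, -0.4)]
bins = heat_bins(tweets, cell=1.0)
print(bins)  # the London-area cell (51, -1) is hottest
```

Keeping the counts in a database keyed by cell makes the “real-time” part cheap: each incoming tweet is a single increment.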

Yes, “…in a lot of exciting new ways!” +1!

What about maintenance issues on such a heat map? The capture of terms to the map is fairly obvious, but a subsequent user may be left in the dark as to why this term was chosen and not some other term. Or some then-current synonym for a term that is being captured.

Or imposing semantics on tweets or terms that are unexpected or non-obvious to a casual or not so casual observer.

You and I can agree red means go and green means stop in a tweet. That’s difficult to maintain as the number of participants and terms go up.

A great starting place to experiment with topic maps to address such issues.

I first saw this in the NoSQL Weekly Newsletter.

July 11, 2012

Twitter Languages of London

Filed under: Tweets,Visualization — Patrick Durusau @ 2:25 pm

Twitter Languages of London by James Cheshire.

From the post:

Last year Eric Fischer produced a great map (see below) visualising the language communities of Twitter. The map, perhaps unsurprisingly, closely matches the geographic extents of the world’s major linguistic groups. On seeing these broad patterns I wondered how well they applied to the international communities living in London. The graphic above shows the spatial distribution of about 470,000 geo-located tweets (collected and georeferenced by Steven Gray) grouped by the language stated in their user’s profile information*. Unsurprisingly, English is by far the most popular. More surprising, perhaps, is the very similar distributions of most of the other languages- with higher densities in central areas and a gradual spreading to the outskirts (I expected greater concentrations in particular areas of the city). Arabic (and Farsi) tweets are much more concentrated around the Hyde Park, Marble Arch and Edgware Road areas whilst the Russian tweeters tend to stick to the West End. Polish and Hungarian tweets appear the most evenly spread throughout London.

Interesting visualization of tweet locations in London and the languages of the same.

Ties in with something I need to push out this week.

On using Twitter as a public but secure intelligence channel. More on that either later today or tomorrow.

July 6, 2012

Apache Camel at 5 [2.10 release]

Filed under: Apache Camel,Integration,Tweets — Patrick Durusau @ 4:54 pm

Apache Camel celebrates 5 years in development with 2.10 release by Chris Mayer.

Chris writes:

Off the back of celebrating its fifth birthday at CamelOne 2012, the Apache Camel team have put the finishing touches to their next release, Apache Camel 2.10, adding in an array of new components to the Apache enterprise application integration platform.

No less than 483 issues have been resolved this time round, but the real draw is the 18 components added to the package, including Websocket and Twitter, allowing for deeper cohesive messaging for users. With the Twitter component, based on the Twitter4J library, users may obtain direct, polling, or event-driven consumption of timelines, users, trends, and direct messages. An example of combining the two can be found here.

Other additions to the component catalogue include support for HBase, CDI, MongoDB, Apache Avro, DynamoDB on AWS, Google GSON and Guava. Java 7 support is much more thorough now, as is support for Spring 3.1.x and Netty. A full list of all resolved issues can be found here.

The Twitter Websocket example reminds me of something I have been meaning to write about Twitter, topic maps and public data streams.

But more on that next week.

