Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

September 25, 2013

Easier than Excel:…

Filed under: Excel,Gephi,Graphs,Networks,Social Networks — Patrick Durusau @ 4:59 pm

Easier than Excel: Social Network Analysis of DocGraph with Gephi by Janos G. Hajagos and Fred Trotter. (PDF)

From the session description:

The DocGraph dataset was released at Strata RX 2012. The dataset is the result of FOI request to CMS by healthcare data activist Fred Trotter (co-presenter). The dataset is minimal where each row consists of just three numbers: 2 healthcare provider identifiers and a weighting factor. By combining these three numbers with other publicly available information sources novel conclusions can be made about delivery of healthcare to Medicare members. As an example of this approach see: http://tripleweeds.tumblr.com/post/42989348374/visualizing-the-docgraph-for-wyoming-medicare-providers

The DocGraph dataset consists of over 49,685,810 relationships between 940,492 different Medicare providers. The complete dataset is too big for traditional tools, but useful subsets of the larger dataset can be analyzed with Gephi. Gephi is an open-source tool to visually explore and analyze graphs. This tutorial will teach participants how to use Gephi for social network analysis on the DocGraph dataset.

Outline of the tutorial:

Part 1: DocGraph and the network data model (30% of the time)

  • The DocGraph dataset
  • The raw data
  • Helper data (NPI associated data)
  • The graph / network data model
  • Nodes versus edges
  • How graph models are integral to social networking
  • Other Healthcare graph data sets

Part 2: Using Gephi to perform analysis (70% of the time)

  • Basic usage of Gephi
  • Saving and reading the GraphML format
  • Laying out edges and nodes of a graph
  • Navigating and exploring the graph
  • Generating graph metrics on the network
  • Filtering a subset of the graph
  • Producing the final output of the graph
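Since Part 2 starts from a GraphML file, here is a minimal sketch of the preprocessing step implied above: turning a subset of the raw three-column DocGraph rows into GraphML that Gephi can open. The file name and column order are assumptions; check the dataset documentation for the actual layout.

```python
import csv
import networkx as nx

# Assumed layout: each row is "referring_npi,referred_npi,weight".
# The real DocGraph file and column order may differ -- check the docs.
G = nx.DiGraph()
with open("docgraph_subset.csv", newline="") as f:
    for referring_npi, referred_npi, weight in csv.reader(f):
        G.add_edge(referring_npi, referred_npi, weight=float(weight))

# Gephi reads GraphML directly (File > Open).
nx.write_graphml(G, "docgraph_subset.graphml")
print(G.number_of_nodes(), "providers,", G.number_of_edges(), "relationships")
```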

Links from the last slide:

http://strata.oreilly.com/2012/11/docgraph-open-social-doctor-data.html (information)

https://github.com/jhajagos/DocGraph (code)

http://notonlydev.com/docgraph-data (open source $1 covers bandwidth fees)

https://groups.google.com/forum/#!forum/docgraph (mailing list)

Just in case you don’t have it bookmarked already: Gephi.

The type of workshop that makes an entire conference seem like lagniappe.

Just sorry I will have to appreciate it from afar.

Work through this one carefully. You will acquire useful skills doing so.

May 19, 2013

Visualizing your LinkedIn graph using Gephi (Parts 1 & 2)

Filed under: Gephi,Graphics,Networks,Social Networks,Visualization — Patrick Durusau @ 1:41 pm

Visualizing your LinkedIn graph using Gephi – Part 1

&

Visualizing your LinkedIn graph using Gephi – Part 2

by Thomas Cabrol.

From part 1:

Graph analysis becomes a key component of data science. A lot of things can be modeled as graphs, but social networks are really one of the most obvious examples.

In this post, I am going to show how one could visualize its own LinkedIn graph, using the LinkedIn API and Gephi, a very nice software for working on this type of data. If you don’t have it yet, just go to http://gephi.org/ and download it now !

My objective is to simply look at my connections (the “nodes” or “vertices” of the graph), see how they relate to each other (the “edges”) and find clusters of strongly connected users (“communities”). This is somewhat emulating what is available already in the InMaps data product, but, hey, this is cool to do it by ourselves, no ?

The first thing to do for running this graph analysis is to be able to query LinkedIn via its API. You really don’t want to get the data by hand… The API uses the OAuth authentication protocol, which will let an application make queries on behalf of a user. So go to https://www.linkedin.com/secure/developer and register a new application. Fill the form as required, and in the OAuth part, use this redirect URL for instance:

Great introduction to Gephi!

As a bonus, reinforces the lesson that ETL isn’t required to re-use data.

ETL may be required in some cases but in a world of data APIs those are getting fewer and fewer.

Think of it this way: Non-ETL data access means someone else is paying for maintenance, backups, hardware, etc.

How much of your IT budget is supporting duplicated data?
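If you want to try the graph-building step from part 1 without working through the OAuth setup, here is a minimal sketch that writes a GraphML file Gephi can open, assuming you have already exported your connections and which of them know each other (the names below are placeholders):

```python
import networkx as nx

# Hypothetical connection data: for each of my connections, the subset of my
# other connections they are also connected to (what the posts pull from the
# LinkedIn API).
shared = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice"],
    "Carol": ["Alice", "Dave"],
    "Dave": ["Carol"],
}

G = nx.Graph()
G.add_nodes_from(shared)
for person, others in shared.items():
    G.add_edges_from((person, other) for other in others)

# Hand the result to Gephi for layout and community detection.
nx.write_graphml(G, "linkedin_graph.graphml")
```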

March 13, 2013

Inferring Social Rank in…

Filed under: Networks,Probabilistic Models,Social Networks — Patrick Durusau @ 4:05 am

Inferring Social Rank in an Old Assyrian Trade Network by David Bamman, Adam Anderson, Noah A. Smith.

Abstract:

We present work in jointly inferring the unique individuals as well as their social rank within a collection of letters from an Old Assyrian trade colony in Kültepe, Turkey, settled by merchants from the ancient city of Assur for approximately 200 years between 1950-1750 BCE, the height of the Middle Bronze Age. Using a probabilistic latent-variable model, we leverage pairwise social differences between names in cuneiform tablets to infer a single underlying social order that best explains the data we observe. Evaluating our output with published judgments by domain experts suggests that our method may be used for building informed hypotheses that are driven by data, and that may offer promising avenues for directed research by Assyriologists.

An example of how digitization of ancient texts enables research other than text searching.

Inferring identity and social rank may be instructive for creation of topic maps from both ancient and modern data sources.

I first saw this in a tweet by Stefano Bertolo.

March 6, 2013

Social Graphs and Applied NoSQL Solutions [Merging Graphs?]

Filed under: Graphs,Networks,Social Graphs,Social Networks — Patrick Durusau @ 11:20 am

Social Graphs and Applied NoSQL Solutions by John L. Myers.

From the post:

Recent postings have been more about the “theory” behind the wonderful world of NoSQL and less about how to implement a solution with a NoSQL platform. Well it’s about time that I changed that. This posting will be about how the graph structure and graph databases in particular can be an excellent “applied solution” of NoSQL technologies.

When Facebook released its Graph Search, the general public finally got a look at what the “backend” of Facebook looked like or its possible uses … For many the consumer to consumer (c2c) version of Facebook’s long available business-to-business and business-to-consumer offerings was a bit more of the “creepy” vs. the “cool” of the social media content. However, I think it will have the impact of opening people’s eyes on how their content can and probably should be used for search and other analytical purposes.

With graph structures, unlike tabular structures such as row and column data schemas, you look at the relationships between the nodes (i.e. customers, products, locations, etc.) as opposed to looking at the attributes of a particular object. For someone like me, who has long advocated that we should look at how people, places and things interact with each other versus how their “demographics” (i.e. size, shape, income, etc.) make us “guess” how they interact with each other. In my opinion, demographics and now firmographics have been used as “substitutes” for how people and organizations behave. While this can be effective in the aggregate, as we move toward a “bucket of one” treatment model for customers or clients, for instance, we need to move away from using demographics/firmographics as a primary analysis tool.

Let’s say that graph databases become as popular as SQL databases. You can’t scratch an enterprise without finding a graph database.

And they are all as different from each other as the typical SQL database is today.

How do you go about merging graph databases?

Can you merge a graph database and retain the characteristics of the graph databases separately?

If graph databases become as popular as they should, those are going to be real questions in the not too distant future.
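For a sense of why those questions are hard, here is a minimal sketch of a naive merge with networkx (a stand-in for the idea, not any particular graph database): nodes are merged only when their identifiers match exactly, so the same subject identified differently stays split.

```python
import networkx as nx

# Two graphs describing overlapping subjects, identified differently.
g1 = nx.Graph()
g1.add_edge("Acme Corp.", "J. Smith", relation="employs")

g2 = nx.Graph()
g2.add_edge("ACME Corporation", "Jane Smith", relation="employs")

# Naive merge: nodes are merged only if their identifiers match exactly,
# so "Acme Corp." and "ACME Corporation" remain two separate nodes.
merged = nx.compose(g1, g2)
print(sorted(merged.nodes()))
# ['ACME Corporation', 'Acme Corp.', 'J. Smith', 'Jane Smith']
```

Deciding that “Acme Corp.” and “ACME Corporation” are the same subject, and retaining both identifiers after the merge, is exactly the part the naive merge leaves out.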

February 23, 2013

Social Network Analysis [Coursera – March 4, 2013]

Filed under: Networks,Social Networks — Patrick Durusau @ 5:40 am

Social Network Analysis by Lada Adamic (University of Michigan)

Description:

Everything is connected: people, information, events and places, all the more so with the advent of online social media. A practical way of making sense of the tangle of connections is to analyze them as networks. In this course you will learn about the structure and evolution of networks, drawing on knowledge from disciplines as diverse as sociology, mathematics, computer science, economics, and physics. Online interactive demonstrations and hands-on analysis of real-world data sets will focus on a range of tasks: from identifying important nodes in the network, to detecting communities, to tracing information diffusion and opinion formation.

The item on the syllabus that caught my eye:

Ahn et al., and Teng et al.: Learning about cooking from ingredient and flavor networks

On which see:

Flavor network and the principles of food pairing, Yong-Yeol Ahn, Sebastian E. Ahnert, James P. Bagrow & Albert-László Barabási.

or

Recipe recommendation using ingredient networks, Chun-Yuen Teng, Yu-Ru Lin, Lada A. Adamic,

Heavier on the technical side than Julia Child reruns but enjoyable none the less.

February 9, 2013

The Perfect Case for Social Network Analysis [Maybe yes. Maybe no.]

Filed under: Graphs,Networks,Security,Social Networks — Patrick Durusau @ 8:21 pm

New Jersey-based Fraud Ring Charged this Week: The Perfect Case for Social Network Analysis by Mike Betron.

When I first saw the headline, I thought the New Jersey legislature had gotten busted. 😉

No such luck, although with real transparency on contributions, relationships and state contracts, prison building would become a growth industry in New Jersey and elsewhere.

From the post:

As reported by MSN Money this week, eighteen members of a fraud ring have just been charged in what may be one of the largest international credit card scams in history. The New Jersey-based fraud ring is reported to have stolen at least $200 million, fooling credit card agencies by creating thousands of fake identities to create accounts.

What They Did

The FBI claims the members of the ring began their activity as early as 2007, and over time, used more than 7,000 fake identities to get more than 25,000 credit cards, using more than 1,800 addresses. Once they obtained credit cards, ring members started out by making small purchases and paying them off quickly to build up good credit scores. The next step was to send false reports to credit agencies to show that the account holders had paid off debts – and soon, their fake account holders had glowing credit ratings and high spending limits. Once the limits were raised, the fraudsters would “bust out,” taking out cash loans or maxing out the cards with no intention of paying them back.

But here’s the catch: The criminals in this case created synthetic identities with fake identity information (social security numbers, names, addresses, phone numbers, etc.). Addresses for the account holders were used multiple times on multiple accounts, and the members created at least 80 fake businesses which accepted credit card payments from the ring members.

This is exactly the kind of situation that would be caught by Social Network Analysis (SNA) software. Unfortunately, the credit card companies in this case didn’t have it.

Well, yes and no.

Yes, if Social Network Analysis (SNA) software were looking for the right relationships, then it could catch the fraud in question.

No, if Social Network Analysis (SNA) software were looking at the wrong relationships, then it would not catch the fraud in question.

Analysis isn’t a question of technology.

For example, what one policy change would do more to prevent future 9/11 type incidents than all the $billions spent since 9/11/2001?

Would you believe: Don’t open the cockpit door for hijackers. (full stop)

The 9/11 hijackers took advantage of the “Common Strategy” flaw in U.S. hijacking protocols.

One of the FAA officials most involved with the Common Strategy in the period leading up to 9/11 described it as an approach dating back to the early 1980s, developed in consultation with the industry and the FBI, and based on the historical record of hijackings. The point of the strategy was to “optimize actions taken by a flight crew to resolve hijackings peacefully” through systematic delay and, if necessary, accommodation of the hijackers. The record had shown that the longer a hijacking persisted, the more likely it was to have a peaceful resolution. The strategy operated on the fundamental assumptions that hijackers issue negotiable demands, most often for asylum or the release of prisoners, and that “suicide wasn’t in the game plan” of hijackers.

Hijackers may blow up a plane, kill or torture passengers, but not opening the cockpit door prevents a future 9/11 type event.

But before 9/11, there was no historical experience with hijacking a plane to use as a weapon.

Historical experience is just as important for detecting fraud.

Once a pattern is identified for fraud, SNA or topic maps or several other technologies can spot it.

But it has to be identified that first time.
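To make “looking for the right relationships” concrete, here is a minimal sketch that flags addresses shared by an unusual number of accounts, the recycled-identity pattern described in the article; the account records and threshold are made up.

```python
from collections import defaultdict

# Hypothetical account records: (account_id, address).
accounts = [
    ("A1", "12 Main St"), ("A2", "12 Main St"), ("A3", "12 Main St"),
    ("A4", "7 Oak Ave"), ("A5", "12 Main St"), ("A6", "99 Pine Rd"),
]

by_address = defaultdict(list)
for account_id, address in accounts:
    by_address[address].append(account_id)

# Only suspicious if you thought to ask the question: which addresses
# appear on more accounts than a household plausibly would?
THRESHOLD = 3
for address, ids in by_address.items():
    if len(ids) >= THRESHOLD:
        print(address, "shared by", ids)
```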

February 4, 2013

The Swipp API: Creating the World’s Social Intelligence

Filed under: Social Graphs,Social Media,Social Networks — Patrick Durusau @ 11:29 am

The Swipp API: Creating the World’s Social Intelligence by Greg Bates.

From the post:

The Swipp API allows developers to integrate Swipp’s “Social Intelligence” into their sites and applications. Public information is not available on the API; interested parties are asked to email info@swipp.com. Once available the APIs will “make it possible for people to interact around any topic imaginable.”

[graphic omitted]

Having operated in stealth mode for 2 years, Swipp founders Don Thorson and Charlie Costantini decided to go public after Facebook’s release of its somewhat different competitor, the social graph. The idea is to let users rate any topic they can comment on or anything they can photograph. Others can chime in, providing an average rating by users. One cool difference: you can dislike something as well as like it, giving a rating from -5 to +5. According to Darrell Etherington at Techcrunch, the company has a three-pronged strategy of a consumer app just described, a business component tailored around specific events like the Superbowl, that will help businesses target specific segments.

A fact that seems to be lost in most discussions of social media/sites is that social intelligence already exists.

Social media/sites may assist in the capturing/recording of social intelligence but that isn’t the same thing as creating social intelligence.

It is an important distinction because understanding the capture/recording role enables us to focus on what we want to capture and in what way.

What we decide to capture or record greatly influences the utility of the social intelligence we gather.

Such as capturing how users choose to identify particular subjects or relationships between subjects, for example.

PS: The goal of Swipp is to create a social network and ratings system (like Facebook) that is open for re-use elsewhere on the web. Adding semantic integration to that social network and ratings system would be a plus, I would imagine.

February 2, 2013

Neo4j – Social Networking – QA – Scientific Communication

Filed under: Graphs,Neo4j,Social Networks — Patrick Durusau @ 3:10 pm

René Pickhardt’s blog post title was: Slides of Related work application presented in the Graphdevroom at FOSDEM, which is unlikely to catch your eye. The paper title is: A neo4j powered social networking and Question & Answer application to enhance scientific communication.

I took the liberty of crafting a shorter title for this post. 😉

The problems René addresses are shared by all academics:

  1. Finding new relevant publications
  2. Connecting people interested in the same topic

This project is the result of the merger of the Open Citation and Related Work project, on which see: Open Citations and Related Work projects merge.

The terminology for the project components:

  • Open Citations Corpus: data corpus
  • Open Citations Corpus Datastore (OCCD): infrastructure of the data corpus
  • Related Work: user-oriented services built on top of the citation data

Resources:

You need to take a long look at the project in general but the data in particular.

From the data webpage:

We downloaded the source files of all arxiv articles published until 2012-09-31, extracted the references and matched them against the metadata using these python scripts. The result is a 2.0Gb sized *.txt file with more than 16m lines representing the citation graph in the following format:

This is document-level linking, so there is still topic map work to be done merging the same subjects identified differently, but this data set is certainly a “leg up” on that task.

We should all encourage if not actively contribute to the Related Work project.
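A minimal sketch of streaming an edge list of that size into a graph, assuming one citing/cited pair per line, whitespace separated; check the format actually documented on the data page before relying on it.

```python
import networkx as nx

def load_citation_graph(path):
    """Stream a large citation edge list into a directed graph.

    Assumes each line is 'citing_id cited_id'; adjust the split if the
    published format differs.
    """
    G = nx.DiGraph()
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                G.add_edge(parts[0], parts[1])
    return G

# G = load_citation_graph("arxiv_citations.txt")
```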

February 1, 2013

REVIEW: Crawling social media and depicting social networks with NodeXL [in 3 parts]

Filed under: NodeXL,Social Graphs,Social Media,Social Networks — Patrick Durusau @ 8:08 pm

REVIEW: Crawling social media and depicting social networks with NodeXL by Eruditio Loginquitas appears in three parts: Part 1 of 3, Part 2 of 3 and Part 3 of 3.

From part 1:

Surprisingly, given the complexity of the subject matter and the various potential uses by researchers from a range of fields, “Analyzing…” is a very coherent and highly readable text. The ideas are well illustrated throughout with full-color screenshots.

In the introduction, the authors explain that this is a spatially organized book—in the form of an organic tree. The early chapters are the roots which lay the groundwork of social media and social network analysis. Then, there is a mid-section that deals with how to use the NodeXL add-on to Excel. Finally, there are chapters that address particular social media platforms and how data is extracted and analyzed from each type. These descriptors include email, thread networks, Twitter, Facebook, WWW hyperlink networks, Flickr, YouTube, and wiki networks. The work is surprisingly succinct, clear, and practical.

Further, it is written with such range that it can serve as an introductory text for newcomers to social network analysis (me included) as well as those who have been using this approach for a while (but may need to review the social media and data crawling aspects). Taken in total, this work is highly informative, with clear depictions of the social and technical sides of social media platforms.

From part 2:

One of the strengths of “Analyzing Social Media Networks with NodeXL” is that it introduces a powerful research method and a tool that helps tap electronic media and non-electronic social network information intelligently, in a way that does not over-state what is knowable. The authors, Derek Hansen, Ben Shneiderman, and Marc A. Smith, are no strangers to research or academic publishing, and theirs is a fairly conservative approach in terms of what may be asserted.

To frame what may be researched, the authors use a range of resources: some generalized research questions, examples from real-world research, and step-by-step techniques for data extraction, analysis, visualization, and then further analysis.

From part 3:

What is most memorable about “Analyzing Social Media Networks with NodeXL” is the depth of information about the various social network sites that may be crawled using NodeXL. With so many evolving social network platforms, and each capturing and storing information differently, it helps to know what actual data extractions mean.

I haven’t seen the book personally, but from this review it sounds like a good model for technical writing for a lay audience.

For that matter, a good model for writing about topic maps for a lay audience. (Many of the issues being similar.)

January 25, 2013

Billionaires of the world ranked and charted [Distraction?]

Filed under: Graphics,Social Networks,Visualization — Patrick Durusau @ 8:16 pm

Billionaires of the world ranked and charted by Nathan Yau.

Nathan reviews an interactive tool by Bloomberg that plots the wealth of the richest people in the world, compares them to each other and charts how their net worth changes.

Question: Would charting Taylor Swift’s relationships be more or less useful?

Or like tracking the world’s richest, is it just a distraction?

Who are you more likely to encounter?

  • One of the world’s richest people, or
  • Taylor Swift, or
  • A local judge, prosecutor, elected official or bank officer?

A social graph of which one is most relevant for you?

Don’t be distracted by “infotainment” that has no relevance for your daily life.

What if you had a social graph of those with the most impact on your life? And other people had similar graphs with what they know about some of the same people.

If enough small social graphs are put together, the authors of those graphs, not the infotainment industry or elected officials, are empowered.

Question: Who do you want to start tracking with a social graph?

January 23, 2013

Complex Adaptive Systems Modeling

Filed under: Adaptive Networks,Networks,Social Networks — Patrick Durusau @ 7:42 pm

Complex Adaptive Systems Modeling, Editor-in-Chief: Muaz A. Niazi, ISSN: 2194-3206 (electronic version)

From the webpage:

Complex Adaptive Systems Modeling is a peer-reviewed open access journal published under the brand SpringerOpen.

Complex Adaptive Systems Modeling (CASM) is a highly multidisciplinary modeling and simulation journal that serves as a unique forum for original, high-quality peer-reviewed papers with a specific interest and scope limited to agent-based and complex network-based modeling paradigms for Complex Adaptive Systems (CAS). The highly multidisciplinary scope of CASM spans any domain of CAS. Possible areas of interest range from the Life Sciences (E.g. Biological Networks and agent-based models), Ecology (E.g. Agent-based/Individual-based models), Social Sciences (Agent-based simulation, Social Network Analysis), Scientometrics (E.g. Citation Networks) to large-scale Complex Adaptive COmmunicatiOn Networks and environmentS (CACOONS) such as Wireless Sensor Networks (WSN), Body Sensor Networks, Peer-to-Peer (P2P) networks, pervasive mobile networks, service oriented architecture, smart grid and the Internet of Things.

In general, submitted papers should have the following key elements:

  • A clear focus on a specific area of CAS (e.g. ecology, social sciences, large scale communication networks, biological sciences, etc.)
  • Either focus on an agent-based simulation model or else a complex network model based on data from CAS (e.g. Citation networks, Gene regulatory Networks, Social networks, Ecological Networks etc.).

A new open access journal from Springer with a focus on complex adaptive systems.

January 14, 2013

1 Billion Videos = No Reruns

Filed under: Data,Entertainment,Social Media,Social Networks — Patrick Durusau @ 8:38 pm

Viki Video: 1 Billion Videos in 150 languages Means Never Having to Say Rerun by Greg Bates.

From the post:

Tired of American TV? Tired of TV in English? Escape to Viki, the leading global TV and movie network, which provides videos with crowd sourced translations in 150 languages. The Viki API allows your users to browse more than 1 billion videos by genre, country, and language, plus search across the entire database. The API uses OAuth2.0 authentication, REST, with responses in either JSON or XML.

The Viki Platform Google Group.

Now this looks like a promising data set!

A couple of use cases for topic maps come to mind:

  • Entry in OPAC points patron mapping from catalog to videos from this database.
  • Entry returned from database maps to book in local library collection (via WorldCat) (more likely to appeal to me).

What use cases do you see?

December 14, 2012

Structure and Dynamics of Information Pathways in Online Media

Filed under: Information Flow,Information Theory,Networks,News,Social Networks — Patrick Durusau @ 6:16 am

Structure and Dynamics of Information Pathways in Online Media by Manuel Gomez Rodriguez, Jure Leskovec, Bernhard Schölkopf.

Abstract:

Diffusion of information, spread of rumors and infectious diseases are all instances of stochastic processes that occur over the edges of an underlying network. Many times networks over which contagions spread are unobserved, and such networks are often dynamic and change over time. In this paper, we investigate the problem of inferring dynamic networks based on information diffusion data. We assume there is an unobserved dynamic network that changes over time, while we observe the results of a dynamic process spreading over the edges of the network. The task then is to infer the edges and the dynamics of the underlying network.

We develop an on-line algorithm that relies on stochastic convex optimization to efficiently solve the dynamic network inference problem. We apply our algorithm to information diffusion among 3.3 million mainstream media and blog sites and experiment with more than 179 million different pieces of information spreading over the network in a one year period. We study the evolution of information pathways in the online media space and find interesting insights. Information pathways for general recurrent topics are more stable across time than for on-going news events. Clusters of news media sites and blogs often emerge and vanish in matter of days for on-going news events. Major social movements and events involving civil population, such as the Libyan’s civil war or Syria’s uprise, lead to an increased amount of information pathways among blogs as well as in the overall increase in the network centrality of blogs and social media sites.

A close reading of this paper will have to wait for the holidays but it will be very near the top of the stack!

Transient subjects anyone?

November 29, 2012

Social Network Analysis (Mathematica 9)

Filed under: Mathematica,Networks,Social Networks — Patrick Durusau @ 7:05 pm

Social Network Analysis (Mathematica 9)

From the webpage:

Drawing on Mathematica’s strong graph and network capabilities, Mathematica 9 introduces a complete and rich set of state-of-the-art social network analysis functions. Access to social networks from a variety of sources, including directly from social media sites, and high level functions for community detection, cohesive groups, centrality, and similarity measures make performing network analysis tasks easier and more flexible than ever before.

Too many features on networks to list.

I now have one item on my Christmas wish list. 😉

How about you?

I first saw this in a tweet by Julian Bilcke.

Detecting Communities in Social Graph [Communities of Representatives?]

Filed under: Graphs,Social Graphs,Social Networks,Subject Identity — Patrick Durusau @ 6:49 pm

Detecting Communities in Social Graph by Ricky Ho.

From the post:

In analyzing social networks, one common problem is how to detect communities, such as groups of people who know or interact frequently with each other. A community is a subgraph of a graph where the connectivity is unusually dense.

In this blog, I will enumerate some common algorithms on finding communities.

First of all, community detection can be thought of as a graph partitioning problem. In this case, a single node will belong to no more than one community. In other words, communities do not overlap with each other.

When you read:

community detection can be thought of as a graph partitioning problem. In this case, a single node will belong to no more than one community.

What does that remind you of?

Does it stand to reason that representatives of the same subject, some with more, some with less information about a subject, would exhibit the same “connectivity” that Ricky calls “unusually dense?”

The TMDM defines a basis for “unusually dense” connectivity but what if we are exploring other representatives of subjects? And trying to detect likely representatives of the same subject?

How would you use graph partitioning to explore such representatives?

That could make a fairly interesting research project for anyone wanting to merge diverse intelligence about some subject or person together.
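As a starting point for such a project, here is a minimal sketch of the partitioning idea using networkx's greedy modularity method; reading each dense community as candidate representatives of one subject is the framing suggested above, not something the algorithm itself knows about.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# A toy graph with two unusually dense neighborhoods joined by one edge.
G = nx.Graph()
G.add_edges_from([
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),   # cluster A
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),   # cluster B
    ("a3", "b1"),                                # weak bridge
])

# Each community is a candidate "unusually dense" subgraph; whether its
# members are representatives of one subject is a separate judgment.
for community in greedy_modularity_communities(G):
    print(sorted(community))
```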

November 11, 2012

Analysis of the statistics blogosphere

Filed under: Blogs,Data Mining,Python,Social Networks — Patrick Durusau @ 8:11 pm

Analysis of the statistics blogosphere by John Johnson.

From the post:

My analysis of the statistics blogosphere for the Coursera Social Networking Analysis class is up. The Python code and the data are up at my github repository. Enjoy!

Included are most of the Python code I used to obtain blog content, some of my attempts to automate the building of the network (I ended up using a manual process in the end), and my analysis. I also included the data. (You can probably see some of your own content.)

Excellent post on mining blog content.

A rich source of data for a topic map on the subject of your dreams.
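If you want a lighter-weight version of the content-gathering step, here is a minimal sketch that pulls a blog feed and extracts the outbound links from each post; the feed URL is a placeholder and the feedparser and BeautifulSoup packages are assumptions, since John's own pipeline lives in his repository.

```python
import feedparser
from bs4 import BeautifulSoup

FEED_URL = "https://example-stats-blog.com/feed"  # placeholder

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    html = entry.get("summary", "")
    links = [a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)]
    print(entry.get("title", "untitled"), "->", links)
```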

November 10, 2012

Algorithmic Economics

Filed under: Algorithms,Game Theory,Networks,Social Networks — Patrick Durusau @ 1:28 pm

Algorithmic Economics, August 6-10, 2012, Carnegie Mellon University.

You will find slides and videos for:

Another view of social dynamics. Which is everywhere when you think about it. Not just consumers but sellers, manufacturers, R&D.

There isn’t any human activity separate and apart from social dynamics or uninfluenced by them.

That includes the design, authoring and marketing of topic maps.

I first saw this in a tweet from Stefano Bertolo, mentioning the general link and also the lecture on game theory.

November 4, 2012

Towards Social Discovery…

Filed under: Common Crawl,Data,Social Networks — Patrick Durusau @ 4:14 pm

Towards Social Discovery – New Content Models; New Data; New Toolsets by Matthew Berk, Founder of Lucky Oyster.

From the post:

When I first came across the field of information retrieval in the ’80s and early ’90s (back when TREC began), vectors were all the rage, and the key units were terms, texts, and corpora. Through the ’90s and with the advent of hypertext and later the explosion of the Web, that metaphor shifted to pages, sites, and links, and approaches like HITS and Page Rank leveraged hyperlinking between documents and sites as key proxies for authority and relevance.

Today we’re at a crossroads, as the nature of the content we seek to leverage through search and discovery has shifted once again, with a specific gravity now defined by entities, structured metadata, and (social) connections. In particular, and based on my work with Common Crawl data specifically, content has shifted in three critical ways:

No, I won’t even summarize his three points. It’s short and quite well written.

Read his post and then consider: Where do topic maps fit into his “crossroads?”

October 24, 2012

The Data Science Community on Twitter

Filed under: Data Science,Graphs,Networks,Social Networks,Tweets,Visualization — Patrick Durusau @ 2:07 pm

The Data Science Community on Twitter

From the webpage:

659 Twitter accounts linked to data science, May 2012.

Linkage of Twitter accounts to display followers and following nodes.

That sounds so inadequate (and is).

You need to go see the page, play with it and then come back.

How was that? Impressive yes?

OK, how would that experience be different if you were using a topic map?

More/less information? Other display options?

It is an impressive piece of eye candy but I have a sense it could be so much more.

You?

September 29, 2012

Twitter Social Network by @aneeshs (video lecture)

Filed under: Graphs,Networks,Social Networks,Tweets — Patrick Durusau @ 3:37 pm

Video Lecture: Twitter Social Network by @aneeshs by Marti Hearst.

From the post:

Learn about weak ties, triadic closures, and personal pagerank, and how they all relate to the Twitter social graph from Aneesh Sharma:

Just when you think the weekend can’t get any better!

Enjoy!
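If you want to experiment with the personalized PageRank idea from the lecture, networkx exposes it through the personalization argument to pagerank(); the follow graph below is hypothetical.

```python
import networkx as nx

# A tiny hypothetical follow graph: an edge u -> v means u follows v.
G = nx.DiGraph([
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "carol"), ("carol", "dave"), ("dave", "alice"),
])

# Ordinary PageRank: global importance.
print(nx.pagerank(G))

# Personalized PageRank: random jumps return to "alice", so scores reflect
# importance from her point of view (the basis of follow recommendations).
print(nx.pagerank(G, personalization={"alice": 1.0}))
```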

September 28, 2012

Windows into Relational Events: Data Structures for Contiguous Subsequences of Edges

Filed under: Graphs,Networks,Social Media,Social Networks,Time,Time Series — Patrick Durusau @ 12:59 pm

Windows into Relational Events: Data Structures for Contiguous Subsequences of Edges by Michael J. Bannister, Christopher DuBois, David Eppstein, Padhraic Smyth.

Abstract:

We consider the problem of analyzing social network data sets in which the edges of the network have timestamps, and we wish to analyze the subgraphs formed from edges in contiguous subintervals of these timestamps. We provide data structures for these problems that use near-linear preprocessing time, linear space, and sublogarithmic query time to handle queries that ask for the number of connected components, number of components that contain cycles, number of vertices whose degree equals or is at most some predetermined value, number of vertices that can be reached from a starting set of vertices by time-increasing paths, and related queries.

Among other interesting questions, this raises the issue of what time span of connections constitutes a network of interest. More than being “dynamic,” it is a definitional issue for the social network in question.

If you are working with social networks, a must read.

PS: You probably need to read: Relational events vs graphs, a posting by David Eppstein.

David details several different terms for “relational event data,” and says there are probably others they did not find. (Topic maps anyone?)
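As a baseline for what the paper's data structures speed up, here is the naive version of one such query, with hypothetical timestamped edges: filter the edges to a time window, then count connected components.

```python
import networkx as nx

# Hypothetical relational events: (sender, receiver, timestamp).
events = [
    ("a", "b", 1), ("b", "c", 2), ("c", "a", 3),
    ("d", "e", 5), ("e", "f", 6), ("a", "d", 9),
]

def components_in_window(events, start, end):
    """Naive per-query scan; the paper's structures answer this in
    sublogarithmic time after near-linear preprocessing."""
    G = nx.Graph((u, v) for u, v, t in events if start <= t <= end)
    return nx.number_connected_components(G)

print(components_in_window(events, 1, 3))  # one triangle -> 1 component
print(components_in_window(events, 1, 6))  # triangle + chain -> 2 components
```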

September 14, 2012

First Party Fraud (In Four Parts)

Filed under: Business Intelligence,Graphs,Networks,Social Graphs,Social Networks — Patrick Durusau @ 1:00 pm

Mike Betron has written a four-part series on first party fraud that merits your attention:

First Party Fraud [Part 1]

What is First Party Fraud?

First-party fraud (FPF) is defined as when somebody enters into a relationship with a bank using either their own identity or a fictitious identity with the intent to defraud. First-party fraud is different from third-party fraud (also known as “identity fraud”) because in third-party fraud, the perpetrator uses another person’s identifying information (such as a social security number, address, phone number, etc.). FPF is often referred to as a “victimless” crime, because no consumers or individuals are directly affected. The real victim in FPF is the bank, which has to eat all of the financial losses.

First-Party Fraud: How Do We Assess and Stop the Damage? [Part 2]

Mike covers the cost of first party fraud and then why it is so hard to combat.

Why is it so hard to detect FPF?

Given the amount of financial pain incurred by bust-out fraud, you might wonder why banks haven’t developed a solution and process for detecting and stopping it.

There are three primary reasons why first-party fraud is so hard to identify and block:

1) The fraudsters look like normal customers

2) The crime festers in multiple departments

3) The speed of execution is very fast

Fighting First Party Fraud With Social Link Analysis (3 of 4)

And you know, those pesky criminals won’t use their universally assigned identifiers for financial transactions. (Any security system that relies on good faith isn’t a security system, it’s an opportunity.)

A Trail of Clues Left by Criminals

Although organized fraudsters are sophisticated, they often leave behind evidence that can be used to uncover networks of organized crime. Fraudsters know that due to Know Your Customer (KYC) and Customer Due Diligence (CDD) regulations, their identification will be verified when they open an account with a financial institution. To pass these checks, the criminals will either modify their own identity slightly or else create a synthetic identity, which consists of combining real identity information (e.g., a social security number) with fake identity information (names, addresses, phone numbers, etc.).

Fortunately for banks, false identity information can be expensive and inconvenient to acquire and maintain. For example, apartments must be rented out to maintain a valid address. Additionally, there are only so many cell phones a person can carry at one time and only so many aliases that can be remembered. Because of this, fraudsters recycle bits and pieces of these valuable assets.

The reuse of identity information has inspired Infoglide to begin to create new technology on top of its IRE platform called Social Link Analysis (SLA). SLA works by examining the “linkages” between the recycled identities, therefore identifying potential fraud networks. Once the networks are detected, Infoglide SLA applies advanced analytics to determine the risk level for both the network and for every individual associated with that network.

First Party Fraud (post 4 of 4) – A Use Case

As discussed in our previous blog in this series, Social Link Analysis works by identifying linkages between individuals to create a social network. Social Link Analysis can then analyze the network to identify organized crime, such as bust-out fraud and internal collusion.

During the Social Link Analysis process, every individual is connected to a single network. An analysis at a large tier 1 bank will turn up millions of networks, but the majority of individuals only belong to very small networks (such as a husband and wife, and possibly a child). However, the social linking process will certainly turn up a small percentage of larger networks of interconnected individuals. It is in these larger networks where participants of bust-out fraud are hiding.

Due to the massive number of networks within a system, the analysis is performed mathematically (e.g. without user interface) and scores and alerts are generated. However, any network can be “visualized” using the software to create a graphic display of information and connections. In this example, we’ll look at a visualization of a small network that the social link analysis tool has alerted as a possible fraud ring.

A word of caution.

To leap from the example individuals being related to each other to:

As a result, Social Link Analysis has detected four members of a network, each with various amounts of charged-off fraud.

Is quite a leap.

Having charged off loans, with re-use of telephone numbers and a mobile population, doesn’t necessarily mean anyone is guilty of “charged-off fraud.”

Could be, but you should tread carefully and with legal advice before jumping to conclusions of fraud.

For good customer relations, if not avoiding bad PR and legal liability.

PS: Topic maps can help with this type of data. Including mapping in the bank locations or even personnel who accepted particular loans.
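A minimal sketch of the linking step described in part 3, using hypothetical application records: connect applications that share any identity attribute and surface the larger connected components for review, not as proof of fraud (the caution above still applies).

```python
import networkx as nx

# Hypothetical applications with identity attributes.
applications = {
    "app1": {"ssn": "111-11-1111", "phone": "555-0100", "address": "12 Main St"},
    "app2": {"ssn": "222-22-2222", "phone": "555-0100", "address": "12 Main St"},
    "app3": {"ssn": "111-11-1111", "phone": "555-0199", "address": "7 Oak Ave"},
    "app4": {"ssn": "333-33-3333", "phone": "555-0123", "address": "9 Elm St"},
}

# Bipartite-style linking: application -> each attribute value it reuses.
G = nx.Graph()
for app_id, attrs in applications.items():
    for field, value in attrs.items():
        G.add_edge(app_id, (field, value))

# Connected components group applications that recycle identity information.
for component in nx.connected_components(G):
    apps = sorted(n for n in component if isinstance(n, str))
    if len(apps) > 1:
        print("Review together:", apps)
```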

August 17, 2012

Character social networks in movies

Filed under: Graphs,Networks,Social Graphs,Social Networks — Patrick Durusau @ 4:28 pm

Character social networks in movies

Nathan Yau started the weekend early with:

We’ve seen a lot of network charts for Twitter, Facebook, and real people. Screw that. I want to see social networks for movie characters. That’s where Movie Galaxies comes in.

That’s really cool but what if you combined a topic map with a social graph?

So that everyone contributes their vision of the social graph at their office, homeroom, class, etc.

Then the social graphs get merged and can be viewed from different perspectives?

NeoSocial: Connecting to Facebook with Neo4j

Filed under: Facebook,Neo4j,Social Graphs,Social Networks — Patrick Durusau @ 12:29 pm

NeoSocial: Connecting to Facebook with Neo4j by Max De Marzi.

From the post:

(Really cool graphic omitted – see below)

Social applications and Graph Databases go together like peanut butter and jelly. I’m going to walk you through the steps of building an application that connects to Facebook, pulls your friends and likes data and visualizes it. I plan on making a video of me coding it one line at a time, but for now let’s just focus on the main elements.

The application will have two major components:

  1. A web service that handles authentication and displaying of friends, likes, and so-on.
  2. A background job service that imports data from Facebook.

We will be deploying this application on Heroku and making use of the RedisToGo and Neo4j Add-ons.

A very good weekend project for Facebook and Neo4j.
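A minimal sketch of the import job in the second component, written against the official Neo4j Python driver rather than the Heroku add-on stack described in the post; the connection details and friend pairs are placeholders.

```python
from neo4j import GraphDatabase

# Placeholder connection details and friend data pulled from the Facebook API.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
friends = [("me", "Alice"), ("me", "Bob"), ("Alice", "Bob")]

def import_friendships(tx, pairs):
    for a, b in pairs:
        # MERGE keeps the import idempotent if the background job re-runs.
        tx.run(
            "MERGE (p:Person {name: $a}) "
            "MERGE (q:Person {name: $b}) "
            "MERGE (p)-[:FRIEND]->(q)",
            a=a, b=b,
        )

with driver.session() as session:
    session.execute_write(import_friendships, friends)
driver.close()
```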

I have a different solution when you have too many friends to make a nice graphic (> 50):

Get a bigger monitor. 😉

August 2, 2012

Universal properties of mythological networks

Filed under: Networks,Social Networks — Patrick Durusau @ 2:30 pm

Universal properties of mythological networks by Pádraig Mac Carron and Ralph Kenna (2012 EPL 99 28002 doi:10.1209/0295-5075/99/28002)

Abstract:

As in statistical physics, the concept of universality plays an important, albeit qualitative, role in the field of comparative mythology. Here we apply statistical mechanical tools to analyse the networks underlying three iconic mythological narratives with a view to identifying common and distinguishing quantitative features. Of the three narratives, an Anglo-Saxon and a Greek text are mostly believed by antiquarians to be partly historically based while the third, an Irish epic, is often considered to be fictional. Here we use network analysis in an attempt to discriminate real from imaginary social networks and place mythological narratives on the spectrum between them. This suggests that the perceived artificiality of the Irish narrative can be traced back to anomalous features associated with six characters. Speculating that these are amalgams of several entities or proxies, renders the plausibility of the Irish text comparable to the others from a network-theoretic point of view.

A study that suggests there is more to be learned about networks, social, mythological and otherwise. But three (3) examples out of extant accounts, mythological and otherwise, aren’t enough for definitive conclusions.

BTW, if you are interested in the use of social networks with literature, see: Extracting Social Networks from Literary Fiction by David K. Elson, Nicholas Dames, and Kathleen R. McKeown for one approach. (If you know of a recent survey on extraction of social networks, please forward it and I will cite you in a post.)

July 27, 2012

Computational Aspects of Social Networks (CASoN) [Conference]

Filed under: Conferences,Networks,Social Networks — Patrick Durusau @ 7:51 am

Computational Aspects of Social Networks (CASoN)

Important Dates:

Paper submission due: Aug. 15, 2012
Notification of paper acceptance: Sep. 15, 2012
Final manuscript due: Sep. 30, 2012
Registration and full payment due: Sep. 30, 2012
Conference date: Nov. 21-23, 2012

Conference venue: São Carlos, Brazil.

From the Call for Papers:

The International Conference on Computational Aspects of Social Networks (CASoN 2012) brings together an interdisciplinary venue for social scientists, mathematicians, computer scientists, engineers, computer users, and students to exchange and share their experiences, new ideas, and research results about all aspects (theory, applications and tools) of intelligent methods applied to Social Networks, and to discuss the practical challenges encountered and the solutions adopted.

Social networks provide a powerful abstraction of the structure and dynamics of diverse kinds of people or people-to-technology interaction. These social network systems are usually characterized by the complex network structures and rich accompanying contextual information. Recent trends also indicate the usage of complex network as a key feature for next generation usage and exploitation of the Web. This international conference on Computational Aspect of Networks is focused on the foundations of social networks as well as case studies, empirical, and other methodological works related to the computational tools for the automatic discovery of Web-based social networks. This conference provides an opportunity to compare and contrast the ethological approach to social behavior in animals (including the study of animal tracks and learning by members of the same species) with web-based evidence of social interaction, perceptual learning, information granulation, the behavior of humans and affinities between web-based social networks. The main topics cover the design and use of various computational intelligence tools and software, simulations of social networks, representation and analysis of social networks, use of semantic networks in the design and community-based research issues such as knowledge discovery, privacy and protection, and visualization.

We solicit original research and technical papers not published elsewhere. The papers can be theoretical, practical and application, and cover a broad set of intelligent methods, with particular emphasis on Social Network computing.

One of the more interesting aspects of social network study, at least to me, is the existence of social networks of researchers who are studying social networks. Implies, to me at least, that “subjects” of discussion have their origins in social networks.

Some approaches, I won’t name names, take “subjects” as given and never question their origins. That leads directly to fragile systems/ontologies because change isn’t taken into account.

Clearly saying “stop” is insufficient, else the many attempts to fix some standardized language would have succeeded long ago.

If you know approaches that attempt to allow for change, would appreciate a note.

May 21, 2012

How do things go viral? Information diffusion in social networks.

Filed under: Social Networks,Viral — Patrick Durusau @ 10:53 am

How do things go viral? Information diffusion in social networks by Maksim Tsvetovat.

May 22, 2012 – 10 AM Pacific Time, Webcast

From the post:

“Going viral” is a holy grail of internet marketing — but beside the well-known memes and viral campaigns, there is a slower and quieter process of information diffusion. In fact, information diffusion is at the root of “viral nature” of some information. In this webcast, we will talk about the viral nature of information, adoption of attitudes and memes, and the way social networks evolve at the same time as people’s attitudes and desires. We will demonstrate some of these principles using models built in Python.

Admit it or not, we all want other people to use software or ideas that we like. If nothing else (like sales income) it provides validation.

Having a bunch of people like our stuff, is even more validation.

Watch the video. Maybe it will work for you!
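For a taste of the kind of model the webcast mentions building in Python, here is a minimal independent-cascade sketch over a random graph; the graph size and adoption probability are made-up knobs to play with.

```python
import random
import networkx as nx

random.seed(42)
G = nx.erdos_renyi_graph(100, 0.05, seed=42)

def independent_cascade(G, seeds, p=0.1):
    """Each newly adopting node gets one chance to convert each neighbor."""
    adopted = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for node in frontier:
            for neighbor in G.neighbors(node):
                if neighbor not in adopted and random.random() < p:
                    adopted.add(neighbor)
                    nxt.append(neighbor)
        frontier = nxt
    return adopted

print("adopters:", len(independent_cascade(G, seeds=[0], p=0.1)))
```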

May 20, 2012

An Example of Social Network Analysis with R using Package igraph

Filed under: igraph,Networks,R,Social Networks — Patrick Durusau @ 10:33 am

An Example of Social Network Analysis with R using Package igraph by Yanchang Zhao.

From the post:

This post presents an example of social network analysis with R using package igraph.

The data to analyze is Twitter text data of @RDataMining used in the example of Text Mining, and it can be downloaded as file “termDocMatrix.rdata” at the Data webpage. Putting it in a general scenario of social networks, the terms can be taken as people and the tweets as groups on LinkedIn, and the term-document matrix can then be taken as the group membership of people. We will build a network of terms based on their co-occurrence in the same tweets, which is similar with a network of people based on their group memberships.

I like the re-use of traditional social network analysis with tweets.

And the building of a network of terms based on co-occurrence.

May or may not serve your purposes but:

If you don’t look, you won’t see.
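The same construction sketched in Python (the post itself uses R and igraph): a binary term-document matrix turned into a term co-occurrence network. The matrix below is a toy stand-in for termDocMatrix.rdata.

```python
import numpy as np
import networkx as nx

terms = ["data", "mining", "r", "graph"]
# Toy binary term-document matrix: rows = terms, columns = tweets.
M = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
])

# M @ M.T counts, for each pair of terms, the tweets they co-occur in.
cooccur = M @ M.T
G = nx.Graph()
for i, t1 in enumerate(terms):
    for j, t2 in enumerate(terms):
        if i < j and cooccur[i, j] > 0:
            G.add_edge(t1, t2, weight=int(cooccur[i, j]))

print(G.edges(data=True))
```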

May 10, 2012

Visual Complexity

Filed under: Complex Networks,Graphics,Social Networks,Visualization — Patrick Durusau @ 5:54 pm

Visual Complexity

Described by Manuel Lima (its creator) as:

VisualComplexity.com intends to be a unified resource space for anyone interested in the visualization of complex networks. The project’s main goal is to leverage a critical understanding of different visualization methods, across a series of disciplines, as diverse as Biology, Social Networks or the World Wide Web. I truly hope this space can inspire, motivate and enlighten any person doing research on this field.

Not all projects shown here are genuine complex networks, in the sense that they aren’t necessarily at the edge of chaos, or show an irregular and systematic degree of connectivity. However, the projects that apparently skip this class were chosen for two important reasons. They either provide advancement in terms of visual depiction techniques/methods or show conceptual uniqueness and originality in the choice of a subject. Nevertheless, all projects have one trait in common: the whole is always more than the sum of its parts.

The homepage is simply stunning.

BTW, Manuel is also the author of: Visual Complexity: Mapping Patterns of Information.

EveryBlock

Filed under: Social Media,Social Networks — Patrick Durusau @ 5:43 pm

EveryBlock

I remember my childhood neighborhood just before the advent of air conditioning and the omnipresence of TV. A walk down the block gave you a good idea of what your neighbors were up to. Or not. 😉

Comparing then to now, the neighborhood where I now live is strangely silent. Walk down my block and you hear no TVs, conversations, radios, loud discussions or the like.

We have become increasingly isolated from others by our means of transportation, entertainment and climate control.

EveryBlock offers the promise of restoring some of the random contact with our neighbors to our lives.

EveryBlock says it solves two problems:

First, there’s no good place to keep track of everything happening in your neighborhood, from news coverage to events to photography. We try to collect all of the news and civic goings-on that have happened recently in your city, and make it simple for you to keep track of news in particular areas.

Second, there’s no good way to post messages to your neighbors online. Facebook lets you post messages to your friends, Twitter lets you post messages to your followers, but no well-used service lets you post a message to people in a given neighborhood.

EveryBlock addresses the problem of geographic blocks, but how do you get information on your professional block?

Do you hear anything unexpected or different? Or do you hear the customary and expected?

Maybe your professional block has gotten too silent.

Suggestions for how to change that?
