Social Graphs « Another Word For It

September 18, 2017

Game of Thrones, Murder Network Analysis

Filed under: Games,Graphs,Networks,Social Graphs,Social Networks,Visualization — Patrick Durusau @ 1:03 pm

Game of Thrones, Murder Network Analysis by George McIntire.

From the post:

Everybody’s favorite show about bloody power struggles and dragons, Game of Thrones, is back for its seventh season. And since we’re such big GoT fans here, we just had to do a project on analyzing data from the hit HBO show. You might not expect it, but the show is rife with data and has been the subject of various data projects from data scientists, who we all know love to combine their data powers with the hobbies and interests.

Milan Janosov of the Central European University devised a machine learning algorithm to predict the death of certain characters. A handy tool, for any fan tired of being surprised by the shock murders of the show. Dr. Allen Downey, author of the popular ThinkStats textbooks conducted a Bayesian analysis of the characters’ survival rate in the show. Data Scientist and biologist Shirin Glander applied social network analysis tools to analyze and visualize the family and house relationships of the characters.

The project we did is quite similar to that of Glander’s, we’ll be playing around with network analysis, but with data on the murderers and their victims. We constructed a giant network that maps out every murder of characters with minor, recurring, and major roles.

The data comes courtesy of Ændrew Rininsland of The Financial Times, who’s done a great [job] of collecting, cleaning, and formatting the data. For the purposes of this project, I had to do a whole lot of wrangling and cleaning of my own and in addition to my subjective decisions about which characters to include as well and what constitutes a murder. My finalized dataset produced a total of 240 murders from 79 killers. For my network graph, the data produced a total of 225 nodes and 173 edges.
…

I prefer the Game of Thrones (GoT) books over the TV series. The text exercises a reader’s imagination in ways that aren’t matched by visual media.

That said, the TV series murder data set (Ændrew Rininsland of The Financial Times) is a great resource to demonstrate the power of network analysis.
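The excerpt above doesn't show how the network was built. As a minimal sketch, assuming the data arrives as (killer, victim) pairs, a murder network treats every character as a node and every killer-to-victim relationship as a directed edge. The function name and the placeholder rows below are hypothetical, not taken from McIntire's code or Rininsland's dataset.

```python
from collections import Counter

def build_murder_network(murders):
    """Return (nodes, edges, kill_counts) for a killer -> victim network.

    murders: iterable of (killer, victim) pairs, one per murder.
    """
    nodes, edges = set(), set()
    kill_counts = Counter()
    for killer, victim in murders:
        nodes.update((killer, victim))  # every character is a node
        edges.add((killer, victim))     # a repeated pair becomes one directed edge
        kill_counts[killer] += 1        # but each murder still counts
    return nodes, edges, kill_counts

# Placeholder rows (hypothetical), not taken from the FT dataset.
sample = [
    ("Killer A", "Victim 1"),
    ("Killer A", "Victim 2"),
    ("Killer B", "Killer A"),  # a killer can also be a victim
    ("Killer C", "Victim 3"),
]
nodes, edges, kills = build_murder_network(sample)
print(len(nodes), len(edges), kills.most_common(1))  # 6 4 [('Killer A', 2)]
```

Because edges here are distinct killer-victim pairs rather than individual murders, murder counts and edge counts need not match. Whether that is how the post arrived at its 240 murders and 173 edges, the excerpt doesn't say.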

After some searching, it appears that sometime in 2018 is the earliest date for the next volume in the GoT series. Sorry.


April 14, 2015

₳ustral Blog

Filed under: Sentiment Analysis,Social Graphs,Social Networks,Topic Models (LDA) — Patrick Durusau @ 4:14 pm

₳ustral Blog

From the post:

We’re software developers and entrepreneurs who wondered what Reddit might be able to tell us about our society.

Social network data have revolutionized advertising, brand management, political campaigns, and more. They have also enabled and inspired vast new areas of research in the social and natural sciences.

Traditional social networks like Facebook focus on mostly-private interactions between personal acquaintances, family members, and friends. Broadcast-style social networks like Twitter enable users at “hubs” in the social graph (those with many followers) to disseminate their ideas widely and interact directly with their “followers”. Both traditional and broadcast networks result in explicit social networks as users choose to associate themselves with other users.

Reddit and similar services such as Hacker News are a bit different. On Reddit, users vote for, and comment on, content. The social network that evolves as a result is implied based on interactions rather than explicit.

Another important difference is that, on Reddit, communication between users largely revolves around external topics or issues such as world news, sports teams, or local events. Instead of discussing their own lives, or topics randomly selected by the community, Redditors discuss specific topics (as determined by community voting) in a structured manner.

This is what we’re trying to harness with Project Austral. By combining Reddit stories, comments, and users with technologies like sentiment analysis and topic identification (more to come soon!) we’re hoping to reveal interesting trends and patterns that would otherwise remain hidden.

Please, check it out and let us know what you think!
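The "implied" network in that quote can be made concrete. A minimal sketch, assuming comments are available as a mapping from a comment id to (author, parent comment id): every reply draws an edge between the replier and the author of the parent comment, weighted by how often the two interact. The data layout and function name are hypothetical; the post doesn't describe how ₳ustral actually loads Reddit into Neo4j.

```python
from collections import Counter

def implied_network(comments):
    """Build a weighted, undirected user-user network from reply structure.

    comments: dict mapping comment_id -> (author, parent_comment_id or None).
    Returns a Counter mapping (user_a, user_b) -> number of replies between them.
    """
    weights = Counter()
    for author, parent in comments.values():
        if parent is None or parent not in comments:
            continue  # top-level comment, or parent not in this sample
        parent_author = comments[parent][0]
        if parent_author == author:
            continue  # ignore replies to oneself
        weights[tuple(sorted((author, parent_author)))] += 1
    return weights

# Hypothetical thread.
thread = {
    "c1": ("alice", None),
    "c2": ("bob", "c1"),
    "c3": ("alice", "c2"),
    "c4": ("carol", "c1"),
    "c5": ("alice", "c3"),  # self-reply, skipped
}
print(implied_network(thread))
```

No user ever "friends" anyone here; the edges exist only because of who replied to whom, which is the difference from Facebook or Twitter the quote describes.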

Bad assumption on my part! Since ₳ustral uses Neo4j to store the Reddit graph, I was expecting a graph-type visualization. If one was intended, I didn’t find it.

Most of my searching is content oriented and not so much concerned with trends or patterns. An upsurge in hypergraph queries could happen on Reddit, but aside from references to publications and projects, the upsurge itself would be a curiosity to me.

Nothing against trends, patterns, etc., but they just aren’t my use case. They may be yours.


October 1, 2014

Uncovering Community Structures with Initialized Bayesian Nonnegative Matrix Factorization

Filed under: Bayesian Data Analysis,Matrix,Social Graphs,Social Networks,Subgraphs — Patrick Durusau @ 3:28 pm

Uncovering Community Structures with Initialized Bayesian Nonnegative Matrix Factorization by Xianchao Tang, Tao Xu, Xia Feng, and Guoqing Yang.

Abstract:

Uncovering community structures is important for understanding networks. Currently, several nonnegative matrix factorization algorithms have been proposed for discovering community structure in complex networks. However, these algorithms exhibit some drawbacks, such as unstable results and inefficient running times. In view of the problems, a novel approach that utilizes an initialized Bayesian nonnegative matrix factorization model for determining community membership is proposed. First, based on singular value decomposition, we obtain simple initialized matrix factorizations from approximate decompositions of the complex network’s adjacency matrix. Then, within a few iterations, the final matrix factorizations are achieved by the Bayesian nonnegative matrix factorization method with the initialized matrix factorizations. Thus, the network’s community structure can be determined by judging the classification of nodes with a final matrix factor. Experimental results show that the proposed method is highly accurate and offers competitive performance to that of the state-of-the-art methods even though it is not designed for the purpose of modularity maximization.

Some titles grab you by the lapels and say, “READ ME!,” don’t they?

I found the first paragraph a much friendlier summary of why you should read this paper (footnotes omitted):

Many complex systems in the real world have the form of networks whose edges are linked by nodes or vertices. Examples include social systems such as personal relationships, collaborative networks of scientists, and networks that model the spread of epidemics; ecosystems such as neuron networks, genetic regulatory networks, and protein-protein interactions; and technology systems such as telephone networks, the Internet and the World Wide Web [1]. In these networks, there are many sub-graphs, called communities or modules, which have a high density of internal links. In contrast, the links between these sub-graphs have a fairly lower density [2]. In community networks, sub-graphs have their own functions and social roles. Furthermore, a community can be thought of as a general description of the whole network to gain more facile visualization and a better understanding of the complex systems. In some cases, a community can reveal the real world network’s properties without releasing the group membership or compromising the members’ privacy. Therefore, community detection has become a fundamental and important research topic in complex networks.

If you think of “the real world network’s properties” as potential properties for identification of a network as a subject or as properties of the network as a subject, the importance of this article becomes clearer.

Being able to speak of sub-graphs as subjects with properties can only improve our ability to compare sub-graphs across complex networks.

BTW, all the data used in this article is available for downloading: http://dx.doi.org/10.6084/m9.figshare.1149965

I first saw this in a tweet by Brian Keegan.
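For readers who want to see the moving parts, here is a rough sketch of the pipeline the abstract describes, with one plainly named substitution: the paper uses a Bayesian NMF model, while this sketch uses ordinary Frobenius-norm NMF with Lee-Seung multiplicative updates, initialized from the SVD of the adjacency matrix in the style of NNDSVD. The number of communities k is fixed by hand (an assumption), and each node is assigned to the community whose column of W carries its largest weight. Function names are hypothetical.

```python
import numpy as np

def svd_init(A, k, eps=1e-9):
    """Nonnegative starting factors from the SVD of A (NNDSVD-style)."""
    U, S, Vt = np.linalg.svd(A)
    W = np.zeros((A.shape[0], k))
    H = np.zeros((k, A.shape[1]))
    # The leading singular pair of a nonnegative matrix can be taken nonnegative.
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(S[0]) * np.abs(Vt[0, :])
    for j in range(1, k):
        u, v = U[:, j], Vt[j, :]
        up, un = np.maximum(u, 0), np.maximum(-u, 0)
        vp, vn = np.maximum(v, 0), np.maximum(-v, 0)
        # Keep whichever sign pattern carries more of the singular pair's mass.
        mp = np.linalg.norm(up) * np.linalg.norm(vp)
        mn = np.linalg.norm(un) * np.linalg.norm(vn)
        x, y, mass = (up, vp, mp) if mp >= mn else (un, vn, mn)
        scale = np.sqrt(S[j] * mass)
        W[:, j] = scale * x / (np.linalg.norm(x) + eps)
        H[j, :] = scale * y / (np.linalg.norm(y) + eps)
    # Small offset so multiplicative updates can move entries away from zero.
    return W + eps, H + eps

def nmf_communities(A, k, iters=500, eps=1e-9):
    """Factor A ~ W @ H and label each node by its dominant column of W."""
    W, H = svd_init(A, k, eps)
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W.argmax(axis=1)

# Demo: two 4-cliques joined by a single bridge edge.
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0
print(nmf_communities(A, k=2))
```

The two cliques should come out with two different labels. Which clique gets label 0 depends on the sign pattern the SVD happens to return, so only the grouping is meaningful.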


March 6, 2013

Social Graphs and Applied NoSQL Solutions [Merging Graphs?]

Filed under: Graphs,Networks,Social Graphs,Social Networks — Patrick Durusau @ 11:20 am

Social Graphs and Applied NoSQL Solutions by John L. Myers.

From the post:

Recent postings have been more about the “theory” behind the wonderful world of NoSQL and less about how to implement a solution with a NoSQL platform. Well it’s about time that I changed that. This posting will be about how the graph structure and graph databases in particular can be an excellent “applied solution” of NoSQL technologies.

When Facebook released its Graph Search, the general public finally got a look at what the “backend” of Facebook looked like or its possible uses … For many the consumer to consumer (c2c) version of Facebook’s long available business-to-business and business-to-consumer offerings was a bit more of the “creepy” vs. the “cool” of the social media content. However, I think it will have the impact of opening people’s eyes on how their content can and probably should be used for search and other analytical purposes.

With graph structures, unlike tabular structures such as row and column data schemas, you look at the relationships between the nodes (i.e. customers, products, locations, etc.) as opposed to looking at the attributes of a particular object. For someone like me, who has long advocated that we should look at how people, places and things interact with each other versus how their “demographics” (i.e. size, shape, income, etc.) make us “guess” how they interact with each other. In my opinion, demographics and now firmographics have been used as “substitutes” for how people and organizations behave. While this can be effective in the aggregate, as we move toward a “bucket of one” treatment model for customers or clients, for instance, we need to move away from using demographics/firmographics as a primary analysis tool.

Let’s say that graph databases become as popular as SQL databases. You can’t scratch an enterprise without finding a graph database.

And they are all as different from each other as the typical SQL database is today.

How do you go about merging graph databases?

Can you merge graph databases and still retain the characteristics of each one?

If graph databases become as popular as they should be, those are going to be real questions in the not too distant future.
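There is no single answer to that question, but here is one way to picture "merging while retaining the characteristics of each graph." It is a sketch under loud assumptions: each graph is a plain dictionary of nodes and labeled edges, nodes in different graphs are the same entity exactly when they share a value for a chosen key property (an email address in the hypothetical example), and nothing is overwritten. Properties are kept per source and every edge remembers which graph it came from, so each original graph can be projected back out. None of this is any graph database's actual API.

```python
def merge_graphs(graphs, key):
    """Merge property graphs, matching nodes on a shared key property.

    graphs: dict source_name -> {"nodes": {node_id: props}, "edges": [(src, dst, label)]}
    Assumes every node has a value for `key`.
    """
    merged_nodes = {}     # key value -> {source_name: props from that source}
    merged_edges = set()  # (key_src, key_dst, label, source_name)
    for source, graph in graphs.items():
        local_to_key = {}
        for node_id, props in graph["nodes"].items():
            k = props[key]
            local_to_key[node_id] = k
            # Keep each source's properties side by side instead of overwriting.
            merged_nodes.setdefault(k, {})[source] = dict(props)
        for src, dst, label in graph["edges"]:
            merged_edges.add((local_to_key[src], local_to_key[dst], label, source))
    return merged_nodes, merged_edges

def project(merged_nodes, merged_edges, source):
    """Recover one source graph's view, re-keyed by the shared key."""
    nodes = {k: by_src[source] for k, by_src in merged_nodes.items() if source in by_src}
    edges = {(s, d, label) for s, d, label, src in merged_edges if src == source}
    return nodes, edges

# Hypothetical CRM and sales graphs that share one customer.
graphs = {
    "crm": {
        "nodes": {1: {"email": "ann@example.com", "name": "Ann"},
                  2: {"email": "bob@example.com", "name": "Bob"}},
        "edges": [(1, 2, "knows")],
    },
    "sales": {
        "nodes": {"n9": {"email": "ann@example.com", "tier": "gold"},
                  "n7": {"email": "cy@example.com", "tier": "basic"}},
        "edges": [("n9", "n7", "referred")],
    },
}
nodes, edges = merge_graphs(graphs, key="email")
print(nodes["ann@example.com"])  # Ann's CRM and sales properties, kept apart
```

The hard part, of course, is the assumption buried in `key`: real graph databases rarely agree on what makes two nodes the same.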


February 4, 2013

The Swipp API: Creating the World’s Social Intelligence

Filed under: Social Graphs,Social Media,Social Networks — Patrick Durusau @ 11:29 am

The Swipp API: Creating the World’s Social Intelligence by Greg Bates.

From the post:

The Swipp API allows developers to integrate Swipp’s “Social Intelligence” into their sites and applications. Public information is not available on the API; interested parties are asked to email info@swipp.com. Once available the APIs will “make it possible for people to interact around any topic imaginable.”

[graphic omitted]

Having operated in stealth mode for 2 years, Swipp founders Don Thorson and Charlie Costantini decided to go public after Facebook’s release of its somewhat different competitor, the social graph. The idea is to let users rate any topic they can comment on or anything they can photograph. Others can chime in, providing an average rating by users. One cool difference: you can dislike something as well as like it, giving a rating from -5 to +5. According to Darrell Etherington at Techcrunch, the company has a three-pronged strategy of a consumer app just described, a business component tailored around specific events like the Superbowl, that will help businesses target specific segments.

A fact that seems to be lost in most discussions of social media/sites is that social intelligence already exists.

Social media/sites may assist in the capturing/recording of social intelligence but that isn’t the same thing as creating social intelligence.

It is an important distinction because understanding the capture/recording role enables us to focus on what we want to capture and in what way.

What we decide to capture or record greatly influences the utility of the social intelligence we gather.

For example, we might capture how users choose to identify particular subjects, or the relationships between subjects.

PS: The goal of Swipp is to create a social network and ratings system (like Facebook) that is open for re-use elsewhere on the web. Adding semantic integration to that social network and ratings system would be a plus, I would imagine.


February 1, 2013

REVIEW: Crawling social media and depicting social networks with NodeXL [in 3 parts]

Filed under: NodeXL,Social Graphs,Social Media,Social Networks — Patrick Durusau @ 8:08 pm

REVIEW: Crawling social media and depicting social networks with NodeXL by Eruditio Loginquitas appears in three parts: Part 1 of 3, Part 2 of 3, and Part 3 of 3.

From part 1:

Surprisingly, given the complexity of the subject matter and the various potential uses by researchers from a range of fields, “Analyzing…” is a very coherent and highly readable text. The ideas are well illustrated throughout with full-color screenshots.

In the introduction, the authors explain that this is a spatially organized book—in the form of an organic tree. The early chapters are the roots which lay the groundwork of social media and social network analysis. Then, there is a mid-section that deals with how to use the NodeXL add-on to Excel. Finally, there are chapters that address particular social media platforms and how data is extracted and analyzed from each type. These descriptors include email, thread networks, Twitter, Facebook, WWW hyperlink networks, Flickr, YouTube, and wiki networks. The work is surprisingly succinct, clear, and practical.

Further, it is written with such range that it can serve as an introductory text for newcomers to social network analysis (me included) as well as those who have been using this approach for a while (but may need to review the social media and data crawling aspects). Taken in total, this work is highly informative, with clear depictions of the social and technical sides of social media platforms.

From part 2:

One of the strengths of “Analyzing Social Media Networks with NodeXL” is that it introduces a powerful research method and a tool that helps tap electronic media and non-electronic social network information intelligently, in a way that does not over-state what is knowable. The authors, Derek Hansen, Ben Shneiderman, and Marc A. Smith, are no strangers to research or academic publishing, and theirs is a fairly conservative approach in terms of what may be asserted.

To frame what may be researched, the authors use a range of resources: some generalized research questions, examples from real-world research, and step-by-step techniques for data extraction, analysis, visualization, and then further analysis.

From part 3:

What is most memorable about “Analyzing Social Media Networks with NodeXL” is the depth of information about the various social network sites that may be crawled using NodeXL. With so many evolving social network platforms, and each capturing and storing information differently, it helps to know what actual data extractions mean.

I haven’t seen the book personally, but from this review it sounds like a good model for technical writing for a lay audience.

For that matter, it sounds like a good model for writing about topic maps for a lay audience. (Many of the issues are similar.)

September 14, 2012

First Party Fraud (In Four Parts)

Filed under: Business Intelligence,Graphs,Networks,Social Graphs,Social Networks — Patrick Durusau @ 1:00 pm

Mike Betron has written a four-part series on first party fraud that merits your attention:

First Party Fraud [Part 1]

What is First Party Fraud?

First-party fraud (FPF) is defined as when somebody enters into a relationship with a bank using either their own identity or a fictitious identity with the intent to defraud. First-party fraud is different from third-party fraud (also known as “identity fraud”) because in third-party fraud, the perpetrator uses another person’s identifying information (such as a social security number, address, phone number, etc.). FPF is often referred to as a “victimless” crime, because no consumers or individuals are directly affected. The real victim in FPF is the bank, which has to eat all of the financial losses.

First-Party Fraud: How Do We Assess and Stop the Damage? [Part 2]

Mike covers the cost of first party fraud and then why it is so hard to combat.

Why is it so hard to detect FPF?

Given the amount of financial pain incurred by bust-out fraud, you might wonder why banks haven’t developed a solution and process for detecting and stopping it.

There are three primary reasons why first-party fraud is so hard to identify and block:

1) The fraudsters look like normal customers

…

2) The crime festers in multiple departments

…

3) The speed of execution is very fast

Fighting First Party Fraud With Social Link Analysis (3 of 4)

And you know, those pesky criminals won’t use their universally assigned identifiers for financial transactions. (Any security system that relies on good faith isn’t a security system, it’s an opportunity.)

A Trail of Clues Left by Criminals

Although organized fraudsters are sophisticated, they often leave behind evidence that can be used to uncover networks of organized crime. Fraudsters know that due to Know Your Customer (KYC) and Customer Due Diligence (CDD) regulations, their identification will be verified when they open an account with a financial institution. To pass these checks, the criminals will either modify their own identity slightly or else create a synthetic identity, which consists of combining real identity information (e.g., a social security number) with fake identity information (names, addresses, phone numbers, etc.).

Fortunately for banks, false identity information can be expensive and inconvenient to acquire and maintain. For example, apartments must be rented out to maintain a valid address. Additionally, there are only so many cell phones a person can carry at one time and only so many aliases that can be remembered. Because of this, fraudsters recycle bits and pieces of these valuable assets.

The reuse of identity information has inspired Infoglide to begin to create new technology on top of its IRE platform called Social Link Analysis (SLA). SLA works by examining the “linkages” between the recycled identities, therefore identifying potential fraud networks. Once the networks are detected, Infoglide SLA applies advanced analytics to determine the risk level for both the network and for every individual associated with that network.

First Party Fraud (post 4 of 4) – A Use Case

As discussed in our previous blog in this series, Social Link Analysis works by identifying linkages between individuals to create a social network. Social Link Analysis can then analyze the network to identify organized crime, such as bust-out fraud and internal collusion.

During the Social Link Analysis process, every individual is connected to a single network. An analysis at a large tier 1 bank will turn up millions of networks, but the majority of individuals only belong to very small networks (such as a husband and wife, and possibly a child). However, the social linking process will certainly turn up a small percentage of larger networks of interconnected individuals. It is in these larger networks where participants of bust-out fraud are hiding.

Due to the massive number of networks within a system, the analysis is performed mathematically (e.g. without user interface) and scores and alerts are generated. However, any network can be “visualized” using the software to create a graphic display of information and connections. In this example, we’ll look at a visualization of a small network that the social link analysis tool has alerted as a possible fraud ring.
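The linking step described in parts 3 and 4 can be sketched with a union-find structure. This is a hypothetical illustration, not Infoglide's SLA: records are linked whenever they share an attribute value (same phone number, same address), each connected group is a "network," and the size threshold for flagging is made up. A large network is a reason to look closer, not evidence of fraud.

```python
from collections import defaultdict

def link_identities(records):
    """Group records into networks: records sharing any attribute value are linked.

    records: dict record_id -> dict of attribute -> value (phone, address, ...).
    """
    parent = {r: r for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_holder = {}  # (attribute, value) -> first record seen with it
    for rid, attrs in records.items():
        for item in attrs.items():
            if item in first_holder:
                parent[find(rid)] = find(first_holder[item])  # union
            else:
                first_holder[item] = rid

    networks = defaultdict(set)
    for rid in records:
        networks[find(rid)].add(rid)
    return list(networks.values())

def flag_large(networks, min_size=3):
    """Hypothetical review rule: surface networks at or above min_size."""
    return [n for n in networks if len(n) >= min_size]

# Hypothetical records: r1-r3 chain together through a phone and an address.
records = {
    "r1": {"phone": "555-0100", "address": "1 Elm St"},
    "r2": {"phone": "555-0100", "address": "9 Oak Ave"},
    "r3": {"phone": "555-0199", "address": "9 Oak Ave"},
    "r4": {"phone": "555-0142", "address": "4 Pine Rd"},
    "r5": {"phone": "555-0150", "address": "7 Ash Ln"},
    "r6": {"phone": "555-0150", "address": "8 Fir Ct"},
}
networks = link_identities(records)
print(sorted(len(n) for n in networks))  # [1, 2, 3]
print(flag_large(networks))
```

Note that r1 and r3 share nothing directly; they land in the same network only through r2. That transitivity is what surfaces rings, and also what can sweep in people who merely once shared an apartment or a phone.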

A word of caution.

Going from the example individuals being related to each other to:

As a result, Social Link Analysis has detected four members of a network, each with various amounts of charged-off fraud.

is quite a leap.

Having charged-off loans, reused telephone numbers, and a mobile population doesn’t necessarily mean anyone is guilty of “charged-off fraud.”

It could be fraud, but you should tread carefully, and with legal advice, before jumping to conclusions of fraud, for the sake of good customer relations, if not to avoid bad PR and legal liability.

PS: Topic maps can help with this type of data, including mapping in the bank locations or even the personnel who accepted particular loans.


September 8, 2012

Who’s the Most Influential in a Social Graph?

Filed under: Graphs,Social Graphs — Patrick Durusau @ 3:11 pm

Who’s the Most Influential in a Social Graph? New Software Recognizes Key Influencers Faster Than Ever

At an airport, many people are essential for planes to take off. Gate staffs, refueling crews, flight attendants and pilots are in constant communication with each other as they perform required tasks. But it’s the air traffic controller who talks with every plane, coordinating departures and runways. Communication must run through her in order for an airport to run smoothly and safely.

In computational terms, the air traffic controller is the “betweenness centrality,” the most connected person in the system. In this example, finding the key influencer is easy because each departure process is nearly the same.

Determining the most influential person on a social media network (or, in computer terms, a graph) is more complex. Thousands of users are interacting about a single subject at the same time. New people (known computationally as edges) are constantly joining the streaming conversation.

Georgia Tech has developed a new algorithm that quickly determines betweenness centrality for streaming graphs. The algorithm can identify influencers as information changes within a network. The first-of-its-kind streaming tool was presented this week by Computational Science and Engineering Ph.D. candidate Oded Green at the Social Computing Conference in Amsterdam.

“Unlike existing algorithms, our system doesn’t restart the computational process from scratch each time a new edge is inserted into a graph,” said College of Computing Professor David Bader, the project’s leader. “Rather than starting over, our algorithm stores the graph’s prior centrality data and only does the bare minimal computations affected by the inserted edges.”
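A note on terms: in graph terms people are nodes and their connections are edges, and betweenness centrality is a score rather than a person. It measures how often a node sits on shortest paths between other pairs of nodes, which is why the air traffic controller scores high. The sketch below is the standard static computation (Brandes' algorithm, for unweighted, undirected graphs), plus the naive from-scratch recomputation after an edge insertion that the Georgia Tech work is described as avoiding. It does not reproduce Green, McColl, and Bader's incremental algorithm; function names are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs.

    adj: dict node -> set of neighbours. Returns unnormalized scores.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}

def insert_edge_and_recompute(adj, u, v):
    """The from-scratch baseline: add an edge, then recompute everything."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
    return betweenness(adj)

# The airport as a star: every other pair's shortest path runs through "atc".
star = {"atc": {"a", "b", "c", "d"},
        "a": {"atc"}, "b": {"atc"}, "c": {"atc"}, "d": {"atc"}}
print(betweenness(star)["atc"])                          # 6.0
print(insert_edge_and_recompute(star, "a", "b")["atc"])  # 5.0
```

One new edge (a talks to b directly) changes only one pair's shortest path, yet the baseline redoes every breadth-first search. On large streaming graphs that waste is exactly what an incremental algorithm targets.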

No pointers to the paper, yet, but the software is said to be open source.

I will make a new post when the article appears, to make sure it gets on your radar.

One obvious use of “influence” in a topic map is finding which topics have the most impact on the subject identities represented by other topics.

For example, if I remove person R, do we still think persons W through Z are members of a terrorist group?
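For a plain graph, one crude way to ask the person R question is to delete R and see whether the remaining nodes still hang together. The sketch below, with hypothetical names and links, counts connected components with and without a removed node. Connectivity is only a proxy; it says nothing about actual group membership.

```python
def components(adj, removed=frozenset()):
    """Connected components of an undirected graph, ignoring removed nodes."""
    seen = set(removed)
    comps = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = set(), [start]
        while stack:  # iterative depth-first search
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

# Hypothetical: R knows everyone; of W through Z, only W and X know each other.
adj = {
    "R": {"W", "X", "Y", "Z"},
    "W": {"R", "X"},
    "X": {"R", "W"},
    "Y": {"R"},
    "Z": {"R"},
}
print(len(components(adj)))                            # 1
print(sorted(len(c) for c in components(adj, {"R"})))  # [1, 1, 2]
```

With R present there is one "group"; without R, W through Z fall apart into three pieces. If the only evidence tying them together ran through R, that is worth knowing before anyone acts on it.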

Bonus question: I wonder what influence Jack Menzel, Product Management Director at Google, has in social graphs now?

PS: Just in case you want to watch for this paper to appear:

O. Green, R. McColl, and D.A. Bader, “A Fast Algorithm for Incremental Betweenness Centrality,” ASE/IEEE International Conference on Social Computing (SocialCom), Amsterdam, The Netherlands, September 3-5, 2012.

(From Prof. David A. Bader’s CV page.)

January 22, 2012

The Role of Social Networks in Information Diffusion

Filed under: Networks,Social Graphs,Social Media,Social Networks — Patrick Durusau @ 7:35 pm

The Role of Social Networks in Information Diffusion by Eytan Bakshy, Itamar Rosenn, Cameron Marlow and Lada Adamic.

Abstract:

Online social networking technologies enable individuals to simultaneously share information with any number of peers. Quantifying the causal effect of these technologies on the dissemination of information requires not only identification of who influences whom, but also of whether individuals would still propagate information in the absence of social signals about that information. We examine the role of social networks in online information diffusion with a large-scale field experiment that randomizes exposure to signals about friends’ information sharing among 253 million subjects in situ. Those who are exposed are significantly more likely to spread information, and do so sooner than those who are not exposed. We further examine the relative role of strong and weak ties in information propagation. We show that, although stronger ties are individually more influential, it is the more abundant weak ties who are responsible for the propagation of novel information. This suggests that weak ties may play a more dominant role in the dissemination of information online than currently believed.

Sample size: 253 million Facebook users.

Pay attention to the line:

We show that, although stronger ties are individually more influential, it is the more abundant weak ties who are responsible for the propagation of novel information.

If you have a “Web scale” (whatever that means) information delivery issue, you should not only target CNN and Drudge with press releases but also consider targeting actors with abundant weak ties.
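To find actors with abundant weak ties in your own data, one minimal sketch is to count each person's weak ties. It assumes tie strength can be proxied by how many times two people interacted, with an arbitrary cutoff. This is not how Bakshy et al. measured tie strength; the paper defines its own measures. The function name and threshold are hypothetical.

```python
from collections import Counter

def weak_tie_counts(interactions, threshold=3):
    """Count each person's weak ties.

    interactions: iterable of (a, b) pairs, one entry per interaction.
    A tie counts as weak if the pair interacted fewer than `threshold` times
    (an arbitrary, hypothetical cutoff).
    """
    strength = Counter(tuple(sorted(pair)) for pair in interactions)
    weak = Counter()
    for (a, b), n in strength.items():
        if n < threshold:
            weak[a] += 1
            weak[b] += 1
    return weak

# Hypothetical log: ann and bob talk constantly; everyone else rarely.
interactions = [("ann", "bob")] * 5 + [
    ("ann", "cat"), ("dan", "ann"), ("dan", "eve"), ("eve", "fay"),
]
weak = weak_tie_counts(interactions)
print(weak["ann"], weak["bob"], weak["dan"])  # 2 0 2
```

Bob's only tie is strong, so by this measure he is a poor channel for novel information, however close he is to Ann.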

I think this could be important in topic map driven applications that “push” novel information into the social network of a large, distributed company. You know how few of us actually read the tiresome broadcasts from HR, etc., so what if the important parts were “reported” piecemeal by others?

It is great to have a large functioning topic map but it doesn’t become useful until people make the information it delivers their own and take action based upon it.


January 11, 2012

Social Networks and Archival Context Project (SNAC)

Filed under: Archives,Networks,Social Graphs,Social Networks — Patrick Durusau @ 8:03 pm

Social Networks and Archival Context Project (SNAC)

From the homepage:

The Social Networks and Archival Context Project (SNAC) will address the ongoing challenge of transforming description of and improving access to primary humanities resources through the use of advanced technologies. The project will test the feasibility of using existing archival descriptions in new ways, in order to enhance access and understanding of cultural resources in archives, libraries, and museums.

Archivists have a long history of describing the people who—acting individually, in families, or in formally organized groups—create and collect primary sources. They research and describe the people who create and are represented in the materials comprising our shared cultural legacy. However, because archivists have traditionally described records and their creators together, this information is tied to specific resources and institutions. Currently there is no system in place that aggregates and interrelates those descriptions.

Leveraging the new standard Encoded Archival Context-Corporate Bodies, Persons, and Families (EAC-CPF), the SNAC Project will use digital technology to “unlock” descriptions of people from finding aids and link them together in exciting new ways.

On the Prototype page you will find the following description:

While many of the names found in finding aids have been carefully constructed, frequently in consultation with LCNAF, many other names present extraction and matching challenges. For example, many personal names are in direct rather than indirect (or catalog entry) order. Life dates, if present, some times appear in parentheses or brackets. Numerous names some times appear in the same <persname>, <corpname>, or <famname>. Many names are incorrectly tagged, for example, a personal name tagged as a .

We will continue to refine the extraction and matching algorithms over the course of the project, but it is anticipated that it will only be possible to address some problems through manual editing, perhaps using “professional crowd sourcing.”
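The extraction problems in that quote (direct versus inverted order, life dates in parentheses or brackets) are the kind of thing a first-pass normalizer can catch. A minimal sketch with hypothetical heuristics, not SNAC's rules: trailing dates of the form YYYY- or YYYY-YYYY, optionally bracketed, are split off, and a name without a comma is assumed to be in direct "Given Family" order. Multiple names in one element and mis-tagged names are left alone, which is where the manual editing comes in.

```python
import re

# Trailing life dates: "1890-1974" or "1890-", optionally in () or [].
LIFE_DATES = re.compile(r"[\(\[]?\s*(\d{4})\s*-\s*(\d{4})?\s*[\)\]]?\s*$")

def normalize_name(raw):
    """Return (inverted_name, birth_year, death_year) for a personal name.

    Hypothetical heuristics: a name with a comma is taken as already
    inverted ("Family, Given"); one without a comma is taken as direct
    order ("Given Family").
    """
    s = raw.strip()
    birth = death = None
    m = LIFE_DATES.search(s)
    if m:
        birth = int(m.group(1))
        death = int(m.group(2)) if m.group(2) else None
        s = s[:m.start()].rstrip(" ,")
    if not s:
        return "", birth, death
    if "," in s:
        family, given = (part.strip() for part in s.split(",", 1))
    else:
        *given_parts, family = s.split()
        given = " ".join(given_parts)
    name = f"{family}, {given}" if given else family
    return name, birth, death

for raw in ["Bush, Vannevar, 1890-1974",
            "Vannevar Bush (1890-1974)",
            "Bush, Vannevar, 1890-"]:
    print(normalize_name(raw))
```

All three variants come back as "Bush, Vannevar", which makes them candidates for matching; the differing death dates are left for a human to reconcile.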

While the project is still a prototype, it occurs to me that it would make a handy source of identifiers.

Try:

Or one of the many others you will find at: Find Corporate, Personal, and Family Archival Context Records.

OK, now I have a question for you: All of the foregoing also appear in Wikipedia.

For your comparison:

If you could choose only one identifier for a subject, would you choose the SNAC or the Wikipedia links?

I ask because some semantic approaches take a “one ring” approach to identification, ignoring the existence of multiple identifiers, even multiple URL identifiers, for the same subject.

Of course, you already know that with topic maps you can have multiple identifiers for any subject.

In CTM syntax:

bush-vannevar
  <http://socialarchive.iath.virginia.edu/xtf/view?docId=bush-vannevar-1890-1974-cr.xml> ;
  <http://en.wikipedia.org/wiki/Vannevar_Bush> ;
  - "Vannevar Bush" ;
  - varname: "Bush, Vannevar, 1890-1974" ;
  - varname: "Bush, Vannevar, 1890-" .

Which of course means that if I want to make a statement about the webpage for Vannevar Bush at Wikipedia, I can do so without any confusion:

wikipedia-vannevar-bush
  = <http://en.wikipedia.org/wiki/Vannevar_Bush> ;
  descr: "URL as subject locator." .

Or I can comment on a page at SNAC and map additional information to it. And you will always know whether I am using a URL as a subject identifier, to point you towards a subject, or as a subject locator, where the resource at that URL is itself the subject.
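The difference between a subject identifier and a subject locator matters most when topics merge. Here is a simplified sketch, not the full merging rules of the Topic Maps data model: topics merge when they share a subject identifier or share a subject locator, and identifiers are never compared against locators. So a second topic that identifies Vannevar Bush by his Wikipedia URL merges with bush-vannevar, while a topic that uses the same URL as a locator stays a separate topic about the page. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    names: set = field(default_factory=set)
    identifiers: set = field(default_factory=set)  # subject identifiers: point at a subject
    locators: set = field(default_factory=set)     # subject locators: the resource is the subject

def merge_topics(topics):
    """Merge topics sharing a subject identifier or a subject locator.

    Simplified: the full Topic Maps merging rules also consider item
    identifiers, which are ignored here. Identifiers are never compared
    with locators, so a person and a web page about that person stay apart.
    """
    merged = []
    for t in topics:
        t = Topic(set(t.names), set(t.identifiers), set(t.locators))  # don't mutate input
        keep = []
        for m in merged:
            if m.identifiers & t.identifiers or m.locators & t.locators:
                t.names |= m.names
                t.identifiers |= m.identifiers
                t.locators |= m.locators
            else:
                keep.append(m)
        merged = keep + [t]
    return merged

SNAC = "http://socialarchive.iath.virginia.edu/xtf/view?docId=bush-vannevar-1890-1974-cr.xml"
WIKI = "http://en.wikipedia.org/wiki/Vannevar_Bush"

person = Topic({"Vannevar Bush"}, {SNAC, WIKI})
same_person = Topic({"Bush, Vannevar, 1890-1974"}, {WIKI})  # e.g. from another map
page = Topic({"Wikipedia article on Vannevar Bush"}, locators={WIKI})

result = merge_topics([person, same_person, page])
print(len(result))  # 2
```

The same URL string appears in all three topics, yet only the two that use it as an identifier collapse into one. That is the confusion a "one ring" identifier scheme cannot express.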

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity
