Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

September 25, 2013

Easier than Excel:…

Filed under: Excel,Gephi,Graphs,Networks,Social Networks — Patrick Durusau @ 4:59 pm

Easier than Excel: Social Network Analysis of DocGraph with Gephi by Janos G. Hajagos and Fred Trotter. (PDF)

From the session description:

The DocGraph dataset was released at Strata RX 2012. The dataset is the result of FOI request to CMS by healthcare data activist Fred Trotter (co-presenter). The dataset is minimal where each row consists of just three numbers: 2 healthcare provider identifiers and a weighting factor. By combining these three numbers with other publicly available information sources novel conclusions can be made about delivery of healthcare to Medicare members. As an example of this approach see: http://tripleweeds.tumblr.com/post/42989348374/visualizing-the-docgraph-for-wyoming-medicare-providers

The DocGraph dataset consists of over 49,685,810 relationships between 940,492 different Medicare providers. The complete dataset is too big for traditional tools, but useful subsets of the larger dataset can be analyzed with Gephi. Gephi is an open-source tool to visually explore and analyze graphs. This tutorial will teach participants how to use Gephi for social network analysis on the DocGraph dataset.

Outline of the tutorial:

Part 1: DocGraph and the network data model (30% of the time)

  • The DocGraph dataset
  • The raw data
  • Helper data (NPI associated data)
  • The graph / network data model
  • Nodes versus edges
  • How graph models are integral to social networking
  • Other Healthcare graph data sets

Part 2: Using Gephi to perform analysis (70% of the time)

  • Basic usage of Gephi
  • Saving and reading the GraphML format
  • Laying out edges and nodes of a graph
  • Navigating and exploring the graph
  • Generating graph metrics on the network
  • Filtering a subset of the graph
  • Producing the final output of the graph
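Since Part 2 starts from a GraphML file, here is a minimal sketch of the preprocessing step implied above: turning a subset of the raw three-column DocGraph rows into GraphML that Gephi can open. The file name and column order are assumptions; check the dataset documentation for the actual layout.

```python
import csv
import networkx as nx

# Assumed layout: each row is "referring_npi,referred_npi,weight".
# The real DocGraph file and column order may differ -- check the docs.
G = nx.DiGraph()
with open("docgraph_subset.csv", newline="") as f:
    for referring_npi, referred_npi, weight in csv.reader(f):
        G.add_edge(referring_npi, referred_npi, weight=float(weight))

# Gephi reads GraphML directly (File > Open).
nx.write_graphml(G, "docgraph_subset.graphml")
print(G.number_of_nodes(), "providers,", G.number_of_edges(), "relationships")
```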

Links from the last slide:

http://strata.oreilly.com/2012/11/docgraph-open-social-doctor-data.html (information)

https://github.com/jhajagos/DocGraph (code)

http://notonlydev.com/docgraph-data (open source $1 covers bandwidth fees)

https://groups.google.com/forum/#!forum/docgraph (mailing list)

Just in case you don’t have it bookmarked already: Gephi.

The type of workshop that makes an entire conference seem like lagniappe.

Just sorry I will have to appreciate it from afar.

Work through this one carefully. You will acquire useful skills doing so.

May 19, 2013

Visualizing your LinkedIn graph using Gephi (Parts 1 & 2)

Filed under: Gephi,Graphics,Networks,Social Networks,Visualization — Patrick Durusau @ 1:41 pm

Visualizing your LinkedIn graph using Gephi – Part 1

&

Visualizing your LinkedIn graph using Gephi – Part 2

by Thomas Cabrol.

From part 1:

Graph analysis becomes a key component of data science. A lot of things can be modeled as graphs, but social networks are really one of the most obvious examples.

In this post, I am going to show how one could visualize its own LinkedIn graph, using the LinkedIn API and Gephi, a very nice software for working on this type of data. If you don’t have it yet, just go to http://gephi.org/ and download it now !

My objective is to simply look at my connections (the “nodes” or “vertices” of the graph), see how they relate to each other (the “edges”) and find clusters of strongly connected users (“communities”). This is somewhat emulating what is available already in the InMaps data product, but, hey, this is cool to do it by ourselves, no ?

The first thing to do for running this graph analysis is to be able to query LinkedIn via its API. You really don’t want to get the data by hand… The API uses the OAuth authentication protocol, which will let an application make queries on behalf of a user. So go to https://www.linkedin.com/secure/developer and register a new application. Fill the form as required, and in the OAuth part, use this redirect URL for instance:

Great introduction to Gephi!

As a bonus, reinforces the lesson that ETL isn’t required to re-use data.

ETL may be required in some cases but in a world of data APIs those are getting fewer and fewer.

Think of it this way: Non-ETL data access means someone else is paying for maintenance, backups, hardware, etc.

How much of your IT budget is supporting duplicated data?
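If you want to try the graph-building step from part 1 without working through the OAuth setup, here is a minimal sketch that writes a GraphML file Gephi can open, assuming you have already exported your connections and which of them know each other (the names below are placeholders):

```python
import networkx as nx

# Hypothetical connection data: for each of my connections, the subset of my
# other connections they are also connected to (what the posts pull from the
# LinkedIn API).
shared = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice"],
    "Carol": ["Alice", "Dave"],
    "Dave": ["Carol"],
}

G = nx.Graph()
G.add_nodes_from(shared)
for person, others in shared.items():
    G.add_edges_from((person, other) for other in others)

# Hand the result to Gephi for layout and community detection.
nx.write_graphml(G, "linkedin_graph.graphml")
```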

March 13, 2013

Inferring Social Rank in…

Filed under: Networks,Probabilistic Models,Social Networks — Patrick Durusau @ 4:05 am

Inferring Social Rank in an Old Assyrian Trade Network by David Bamman, Adam Anderson, Noah A. Smith.

Abstract:

We present work in jointly inferring the unique individuals as well as their social rank within a collection of letters from an Old Assyrian trade colony in Kültepe, Turkey, settled by merchants from the ancient city of Assur for approximately 200 years between 1950-1750 BCE, the height of the Middle Bronze Age. Using a probabilistic latent-variable model, we leverage pairwise social differences between names in cuneiform tablets to infer a single underlying social order that best explains the data we observe. Evaluating our output with published judgments by domain experts suggests that our method may be used for building informed hypotheses that are driven by data, and that may offer promising avenues for directed research by Assyriologists.

An example of how digitization of ancient texts enables research other than text searching.

Inferring identity and social rank may be instructive for creation of topic maps from both ancient and modern data sources.

I first saw this in a tweet by Stefano Bertolo.

March 6, 2013

Social Graphs and Applied NoSQL Solutions [Merging Graphs?]

Filed under: Graphs,Networks,Social Graphs,Social Networks — Patrick Durusau @ 11:20 am

Social Graphs and Applied NoSQL Solutions by John L. Myers.

From the post:

Recent postings have been more about the “theory” behind the wonderful world of NoSQL and less about how to implement a solution with a NoSQL platform. Well it’s about time that I changed that. This posting will be about how the graph structure and graph databases in particular can be an excellent “applied solution” of NoSQL technologies.

When Facebook released its Graph Search, the general public finally got a look at what the “backend” of Facebook looked like or its possible uses … For many the consumer to consumer (c2c) version of Facebook’s long available business-to-business and business-to-consumer offerings was a bit more of the “creepy” vs. the “cool” of the social media content. However, I think it will have the impact of opening people’s eyes on how their content can and probably should be used for search and other analytical purposes.

With graph structures, unlike tabular structures such as row and column data schemas, you look at the relationships between the nodes (i.e. customers, products, locations, etc.) as opposed to looking at the attributes of a particular object. For someone like me, who has long advocated that we should look at how people, places and things interact with each other versus how their “demographics” (i.e. size, shape, income, etc.) make us “guess” how they interact with each other. In my opinion, demographics and now firmographics have been used as “substitutes” for how people and organizations behave. While this can be effective in the aggregate, as we move toward a “bucket of one” treatment model for customers or clients, for instance, we need to move away from using demographics/firmographics as a primary analysis tool.

Let’s say that graph databases become as popular as SQL databases. You can’t scratch an enterprise without finding a graph database.

And they are all as different from each other as the typical SQL database is today.

How do you go about merging graph databases?

Can you merge a graph database and retain the characteristics of the graph databases separately?

If graph databases become as popular as they should, those are going to be real questions in the not too distant future.
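For a sense of why those questions are hard, here is a minimal sketch of a naive merge with networkx (a stand-in for the idea, not any particular graph database): nodes are merged only when their identifiers match exactly, so the same subject identified differently stays split.

```python
import networkx as nx

# Two graphs describing overlapping subjects, identified differently.
g1 = nx.Graph()
g1.add_edge("Acme Corp.", "J. Smith", relation="employs")

g2 = nx.Graph()
g2.add_edge("ACME Corporation", "Jane Smith", relation="employs")

# Naive merge: nodes are merged only if their identifiers match exactly,
# so "Acme Corp." and "ACME Corporation" remain two separate nodes.
merged = nx.compose(g1, g2)
print(sorted(merged.nodes()))
# ['ACME Corporation', 'Acme Corp.', 'J. Smith', 'Jane Smith']
```

Deciding that “Acme Corp.” and “ACME Corporation” are the same subject, and retaining both identifiers after the merge, is exactly the part the naive merge leaves out.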

February 23, 2013

Social Network Analysis [Coursera – March 4, 2013]

Filed under: Networks,Social Networks — Patrick Durusau @ 5:40 am

Social Network Analysis by Lada Adamic (University of Michigan)

Description:

Everything is connected: people, information, events and places, all the more so with the advent of online social media. A practical way of making sense of the tangle of connections is to analyze them as networks. In this course you will learn about the structure and evolution of networks, drawing on knowledge from disciplines as diverse as sociology, mathematics, computer science, economics, and physics. Online interactive demonstrations and hands-on analysis of real-world data sets will focus on a range of tasks: from identifying important nodes in the network, to detecting communities, to tracing information diffusion and opinion formation.

The item on the syllabus that caught my eye:

Ahn et al., and Teng et al.: Learning about cooking from ingredient and flavor networks

On which see:

Flavor network and the principles of food pairing, Yong-Yeol Ahn, Sebastian E. Ahnert, James P. Bagrow & Albert-László Barabási.

or

Recipe recommendation using ingredient networks, Chun-Yuen Teng, Yu-Ru Lin, Lada A. Adamic,

Heavier on the technical side than Julia Child reruns but enjoyable none the less.

February 9, 2013

The Perfect Case for Social Network Analysis [Maybe yes. Maybe no.]

Filed under: Graphs,Networks,Security,Social Networks — Patrick Durusau @ 8:21 pm

New Jersey-based Fraud Ring Charged this Week: The Perfect Case for Social Network Analysis by Mike Betron.

When I first saw the headline, I thought the New Jersey legislature had gotten busted. 😉

No such luck, although with real transparency on contributions, relationships and state contracts, prison building would become a growth industry in New Jersey and elsewhere.

From the post:

As reported by MSN Money this week, eighteen members of a fraud ring have just been charged in what may be one of the largest international credit card scams in history. The New Jersey-based fraud ring is reported to have stolen at least $200 million, fooling credit card agencies by creating thousands of fake identities to create accounts.

What They Did

The FBI claims the members of the ring began their activity as early as 2007, and over time, used more than 7,000 fake identities to get more than 25,000 credit cards, using more than 1,800 addresses. Once they obtained credit cards, ring members started out by making small purchases and paying them off quickly to build up good credit scores. The next step was to send false reports to credit agencies to show that the account holders had paid off debts – and soon, their fake account holders had glowing credit ratings and high spending limits. Once the limits were raised, the fraudsters would “bust out,” taking out cash loans or maxing out the cards with no intention of paying them back.

But here’s the catch: The criminals in this case created synthetic identities with fake identity information (social security numbers, names, addresses, phone numbers, etc.). Addresses for the account holders were used multiple times on multiple accounts, and the members created at least 80 fake businesses which accepted credit card payments from the ring members.

This is exactly the kind of situation that would be caught by Social Network Analysis (SNA) software. Unfortunately, the credit card companies in this case didn’t have it.

Well, yes and no.

Yes, if Social Network Analysis (SNA) software were looking for the right relationships, then it could catch the fraud in question.

No, if Social Network Analysis (SNA) software were looking at the wrong relationships, then it would not catch the fraud in question.

Analysis isn’t a question of technology.

For example, what one policy change would do more to prevent future 9/11 type incidents than all the $billions spent since 9/11/2001?

Would you believe: Don’t open the cockpit door for hijackers. (full stop)

The 9/11 hijackers took advantage of the “Common Strategy” flaw in U.S. hijacking protocols.

One of the FAA officials most involved with the Common Strategy in the period leading up to 9/11 described it as an approach dating back to the early 1980s, developed in consultation with the industry and the FBI, and based on the historical record of hijackings. The point of the strategy was to “optimize actions taken by a flight crew to resolve hijackings peacefully” through systematic delay and, if necessary, accommodation of the hijackers. The record had shown that the longer a hijacking persisted, the more likely it was to have a peaceful resolution. The strategy operated on the fundamental assumptions that hijackers issue negotiable demands, most often for asylum or the release of prisoners, and that “suicide wasn’t in the game plan” of hijackers.

Hijackers may blow up a plane, kill or torture passengers, but not opening the cockpit door prevents a future 9/11 type event.

But before 9/11, there was no historical experience with hijacking a plane to use as a weapon.

Historical experience is just as important for detecting fraud.

Once a pattern is identified for fraud, SNA or topic maps or several other technologies can spot it.

But it has to be identified that first time.
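To make “looking for the right relationships” concrete, here is a minimal sketch that flags addresses shared by an unusual number of accounts, the recycled-identity pattern described in the article; the account records and threshold are made up.

```python
from collections import defaultdict

# Hypothetical account records: (account_id, address).
accounts = [
    ("A1", "12 Main St"), ("A2", "12 Main St"), ("A3", "12 Main St"),
    ("A4", "7 Oak Ave"), ("A5", "12 Main St"), ("A6", "99 Pine Rd"),
]

by_address = defaultdict(list)
for account_id, address in accounts:
    by_address[address].append(account_id)

# Only suspicious if you thought to ask the question: which addresses
# appear on more accounts than a household plausibly would?
THRESHOLD = 3
for address, ids in by_address.items():
    if len(ids) >= THRESHOLD:
        print(address, "shared by", ids)
```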

February 4, 2013

The Swipp API: Creating the World’s Social Intelligence

Filed under: Social Graphs,Social Media,Social Networks — Patrick Durusau @ 11:29 am

The Swipp API: Creating the World’s Social Intelligence by Greg Bates.

From the post:

The Swipp API allows developers to integrate Swipp’s “Social Intelligence” into their sites and applications. Public information is not available on the API; interested parties are asked to email info@swipp.com. Once available the APIs will “make it possible for people to interact around any topic imaginable.”

[graphic omitted]

Having operated in stealth mode for 2 years, Swipp founders Don Thorson and Charlie Costantini decided to go public after Facebook’s release of its somewhat different competitor, the social graph. The idea is to let users rate any topic they can comment on or anything they can photograph. Others can chime in, providing an average rating by users. One cool difference: you can dislike something as well as like it, giving a rating from -5 to +5. According to Darrell Etherington at Techcrunch, the company has a three-pronged strategy of a consumer app just described, a business component tailored around specific events like the Superbowl, that will help businesses target specific segments.

A fact that seems to be lost in most discussions of social media/sites is that social intelligence already exists.

Social media/sites may assist in the capturing/recording of social intelligence but that isn’t the same thing as creating social intelligence.

It is an important distinction because understanding the capture/recording role enables us to focus on what we want to capture and in what way.

What we decide to capture or record greatly influences the utility of the social intelligence we gather.

Such as capturing how users choose to identify particular subjects or relationships between subjects, for example.

PS: The goal of Swipp is to create a social network and ratings system (like Facebook) that is open for re-use elsewhere on the web. Adding semantic integration to that social network and ratings system would be a plus, I would imagine.

February 2, 2013

Neo4j – Social Networking – QA – Scientific Communication

Filed under: Graphs,Neo4j,Social Networks — Patrick Durusau @ 3:10 pm

René Pickhardt’s blog post title was: Slides of Related work application presented in the Graphdevroom at FOSDEM, which is unlikely to catch your eye. The paper title is: A neo4j powered social networking and Question & Answer application to enhance scientific communication.

I took the liberty of crafting a shorter title for this post. 😉

The problems René addresses are shared by all academics:

  1. Finding new relevant publications
  2. Connecting people interested in the same topic

This project is the result of the merger of the Open Citation and Related Work project, on which see: Open Citations and Related Work projects merge.

The terminology for the project components:

  • Open Citations Corpus: data corpus
  • Open Citations Corpus Datastore (OCCD): infrastructure of the data corpus
  • Related Work: user-oriented services built on top of the citation data

Resources:

You need to take a long look at the project in general but the data in particular.

From the data webpage:

We downloaded the source files of all arxiv articles published until 2012-09-31, extracted the references and matched them against the metadata using these python scripts. The result is a 2.0Gb sized *.txt file with more than 16m lines representing the citation graph in the following format:

This is document-level linking, so there is still topic map work to be done merging the same subjects identified differently, but this data set is certainly a “leg up” on that task.

We should all encourage if not actively contribute to the Related Work project.
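A minimal sketch of streaming an edge list of that size into a graph, assuming one citing/cited pair per line, whitespace separated; check the format actually documented on the data page before relying on it.

```python
import networkx as nx

def load_citation_graph(path):
    """Stream a large citation edge list into a directed graph.

    Assumes each line is 'citing_id cited_id'; adjust the split if the
    published format differs.
    """
    G = nx.DiGraph()
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                G.add_edge(parts[0], parts[1])
    return G

# G = load_citation_graph("arxiv_citations.txt")
```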

February 1, 2013

REVIEW: Crawling social media and depicting social networks with NodeXL [in 3 parts]

Filed under: NodeXL,Social Graphs,Social Media,Social Networks — Patrick Durusau @ 8:08 pm

REVIEW: Crawling social media and depicting social networks with NodeXL by Eruditio Loginquitas appears in three parts: Part 1 of 3, Part 2 of 3 and Part 3 of 3.

From part 1:

Surprisingly, given the complexity of the subject matter and the various potential uses by researchers from a range of fields, “Analyzing…” is a very coherent and highly readable text. The ideas are well illustrated throughout with full-color screenshots.

In the introduction, the authors explain that this is a spatially organized book—in the form of an organic tree. The early chapters are the roots which lay the groundwork of social media and social network analysis. Then, there is a mid-section that deals with how to use the NodeXL add-on to Excel. Finally, there are chapters that address particular social media platforms and how data is extracted and analyzed from each type. These descriptors include email, thread networks, Twitter, Facebook, WWW hyperlink networks, Flickr, YouTube, and wiki networks. The work is surprisingly succinct, clear, and practical.

Further, it is written with such range that it can serve as an introductory text for newcomers to social network analysis (me included) as well as those who have been using this approach for a while (but may need to review the social media and data crawling aspects). Taken in total, this work is highly informative, with clear depictions of the social and technical sides of social media platforms.

From part 2:

One of the strengths of “Analyzing Social Media Networks with NodeXL” is that it introduces a powerful research method and a tool that helps tap electronic media and non-electronic social network information intelligently, in a way that does not over-state what is knowable. The authors, Derek Hansen, Ben Shneiderman, and Marc A. Smith, are no strangers to research or academic publishing, and theirs is a fairly conservative approach in terms of what may be asserted.

To frame what may be researched, the authors use a range of resources: some generalized research questions, examples from real-world research, and step-by-step techniques for data extraction, analysis, visualization, and then further analysis.

From part 3:

What is most memorable about “Analyzing Social Media Networks with NodeXL” is the depth of information about the various social network sites that may be crawled using NodeXL. With so many evolving social network platforms, and each capturing and storing information differently, it helps to know what actual data extractions mean.

I haven’t seen the book personally, but from this review it sounds like a good model for technical writing for a lay audience.

For that matter, a good model for writing about topic maps for a lay audience. (Many of the issues being similar.)

January 25, 2013

Billionaires of the world ranked and charted [Distraction?]

Filed under: Graphics,Social Networks,Visualization — Patrick Durusau @ 8:16 pm

Billionaires of the world ranked and charted by Nathan Yau.

Nathan reviews an interactive tool by Bloomberg that plots the wealth of the richest people in the world, compares them to each other and charts how their net worth changes.

Question: Would charting Taylor Swift’s relationships be more or less useful?

Or like tracking the world’s richest, is it just a distraction?

Who are you more likely to encounter?

  • One of the world’s richest people, or
  • Taylor Swift, or
  • A local judge, prosecutor, elected official or bank officer?

A social graph of which one is most relevant for you?

Don’t be distracted by “infotainment” that has no relevance for your daily life.

What if you had a social graph of those with the most impact on your life? And other people had similar graphs with what they know about some of the same people.

If enough small social graphs are put together, the authors of those graphs, not the infotainment industry or elected officials, are empowered.

Question: Who do you want to start tracking with a social graph?

January 23, 2013

Complex Adaptive Systems Modeling

Filed under: Adaptive Networks,Networks,Social Networks — Patrick Durusau @ 7:42 pm

Complex Adaptive Systems Modeling, Editor-in-Chief: Muaz A. Niazi, ISSN: 2194-3206 (electronic version)

From the webpage:

Complex Adaptive Systems Modeling is a peer-reviewed open access journal published under the brand SpringerOpen.

Complex Adaptive Systems Modeling (CASM) is a highly multidisciplinary modeling and simulation journal that serves as a unique forum for original, high-quality peer-reviewed papers with a specific interest and scope limited to agent-based and complex network-based modeling paradigms for Complex Adaptive Systems (CAS). The highly multidisciplinary scope of CASM spans any domain of CAS. Possible areas of interest range from the Life Sciences (E.g. Biological Networks and agent-based models), Ecology (E.g. Agent-based/Individual-based models), Social Sciences (Agent-based simulation, Social Network Analysis), Scientometrics (E.g. Citation Networks) to large-scale Complex Adaptive COmmunicatiOn Networks and environmentS (CACOONS) such as Wireless Sensor Networks (WSN), Body Sensor Networks, Peer-to-Peer (P2P) networks, pervasive mobile networks, service oriented architecture, smart grid and the Internet of Things.

In general, submitted papers should have the following key elements:

  • A clear focus on a specific area of CAS (e.g. ecology, social sciences, large scale communication networks, biological sciences, etc.)
  • Either focus on an agent-based simulation model or else a complex network model based on data from CAS (e.g. Citation networks, Gene regulatory Networks, Social networks, Ecological Networks etc.).

A new open access journal from Springer with a focus on complex adaptive systems.

January 14, 2013

1 Billion Videos = No Reruns

Filed under: Data,Entertainment,Social Media,Social Networks — Patrick Durusau @ 8:38 pm

Viki Video: 1 Billion Videos in 150 languages Means Never Having to Say Rerun by Greg Bates.

From the post:

Tired of American TV? Tired of TV in English? Escape to Viki, the leading global TV and movie network, which provides videos with crowd sourced translations in 150 languages. The Viki API allows your users to browse more than 1 billion videos by genre, country, and language, plus search across the entire database. The API uses OAuth2.0 authentication, REST, with responses in either JSON or XML.

The Viki Platform Google Group.

Now this looks like a promising data set!

A couple of use cases for topic maps come to mind:

  • Entry in OPAC points patron mapping from catalog to videos from this database.
  • Entry returned from database maps to book in local library collection (via WorldCat) (more likely to appeal to me).

What use cases do you see?

December 14, 2012

Structure and Dynamics of Information Pathways in Online Media

Filed under: Information Flow,Information Theory,Networks,News,Social Networks — Patrick Durusau @ 6:16 am

Structure and Dynamics of Information Pathways in Online Media by Manuel Gomez Rodriguez, Jure Leskovec, Bernhard Schölkopf.

Abstract:

Diffusion of information, spread of rumors and infectious diseases are all instances of stochastic processes that occur over the edges of an underlying network. Many times networks over which contagions spread are unobserved, and such networks are often dynamic and change over time. In this paper, we investigate the problem of inferring dynamic networks based on information diffusion data. We assume there is an unobserved dynamic network that changes over time, while we observe the results of a dynamic process spreading over the edges of the network. The task then is to infer the edges and the dynamics of the underlying network.

We develop an on-line algorithm that relies on stochastic convex optimization to efficiently solve the dynamic network inference problem. We apply our algorithm to information diffusion among 3.3 million mainstream media and blog sites and experiment with more than 179 million different pieces of information spreading over the network in a one year period. We study the evolution of information pathways in the online media space and find interesting insights. Information pathways for general recurrent topics are more stable across time than for on-going news events. Clusters of news media sites and blogs often emerge and vanish in matter of days for on-going news events. Major social movements and events involving civil population, such as the Libyan’s civil war or Syria’s uprise, lead to an increased amount of information pathways among blogs as well as in the overall increase in the network centrality of blogs and social media sites.

A close reading of this paper will have to wait for the holidays but it will be very near the top of the stack!

Transient subjects anyone?

November 29, 2012

Social Network Analysis (Mathematica 9)

Filed under: Mathematica,Networks,Social Networks — Patrick Durusau @ 7:05 pm

Social Network Analysis (Mathematica 9)

From the webpage:

Drawing on Mathematica’s strong graph and network capabilities, Mathematica 9 introduces a complete and rich set of state-of-the-art social network analysis functions. Access to social networks from a variety of sources, including directly from social media sites, and high level functions for community detection, cohesive groups, centrality, and similarity measures make performing network analysis tasks easier and more flexible than ever before.

Too many features on networks to list.

I now have one item on my Christmas wish list. 😉

How about you?

I first saw this in a tweet by Julian Bilcke.

Detecting Communities in Social Graph [Communities of Representatives?]

Filed under: Graphs,Social Graphs,Social Networks,Subject Identity — Patrick Durusau @ 6:49 pm

Detecting Communities in Social Graph by Ricky Ho.

From the post:

In analyzing social networks, one common problem is how to detect communities, such as groups of people who know or interact frequently with each other. A community is a subgraph of a graph where the connectivity is unusually dense.

In this blog, I will enumerate some common algorithms on finding communities.

First of all, community detection can be thought of as a graph partitioning problem. In this case, a single node will belong to no more than one community. In other words, communities do not overlap with each other.

When you read:

community detection can be thought of as a graph partitioning problem. In this case, a single node will belong to no more than one community.

What does that remind you of?

Does it stand to reason that representatives of the same subject, some with more, some with less information about a subject, would exhibit the same “connectivity” that Ricky calls “unusually dense?”

The TMDM defines a basis for “unusually dense” connectivity but what if we are exploring other representatives of subjects? And trying to detect likely representatives of the same subject?

How would you use graph partitioning to explore such representatives?

That could make a fairly interesting research project for anyone wanting to merge diverse intelligence about some subject or person together.
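As a starting point for such a project, here is a minimal sketch of the partitioning idea using networkx's greedy modularity method; reading each dense community as candidate representatives of one subject is the framing suggested above, not something the algorithm itself knows about.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# A toy graph with two unusually dense neighborhoods joined by one edge.
G = nx.Graph()
G.add_edges_from([
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),   # cluster A
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),   # cluster B
    ("a3", "b1"),                                # weak bridge
])

# Each community is a candidate "unusually dense" subgraph; whether its
# members are representatives of one subject is a separate judgment.
for community in greedy_modularity_communities(G):
    print(sorted(community))
```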

November 11, 2012

Analysis of the statistics blogosphere

Filed under: Blogs,Data Mining,Python,Social Networks — Patrick Durusau @ 8:11 pm

Analysis of the statistics blogosphere by John Johnson.

From the post:

My analysis of the statistics blogosphere for the Coursera Social Networking Analysis class is up. The Python code and the data are up at my github repository. Enjoy!

Included are most of the Python code I used to obtain blog content, some of my attempts to automate the building of the network (I ended up using a manual process in the end), and my analysis. I also included the data. (You can probably see some of your own content.)

Excellent post on mining blog content.

A rich source of data for a topic map on the subject of your dreams.
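If you want a lighter-weight version of the content-gathering step, here is a minimal sketch that pulls a blog feed and extracts the outbound links from each post; the feed URL is a placeholder and the feedparser and BeautifulSoup packages are assumptions, since John's own pipeline lives in his repository.

```python
import feedparser
from bs4 import BeautifulSoup

FEED_URL = "https://example-stats-blog.com/feed"  # placeholder

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    html = entry.get("summary", "")
    links = [a["href"] for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)]
    print(entry.get("title", "untitled"), "->", links)
```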

November 10, 2012

Algorithmic Economics

Filed under: Algorithms,Game Theory,Networks,Social Networks — Patrick Durusau @ 1:28 pm

Algorithmic Economics, August 6-10, 2012, Carnegie Mellon University.

You will find slides and videos for:

Another view of social dynamics. Which is everywhere when you think about it. Not just consumers but sellers, manufacturers, R&D.

There isn’t any human activity separate and apart from social dynamics or uninfluenced by them.

That includes the design, authoring and marketing of topic maps.

I first saw this in a tweet from Stefano Bertolo, mentioning the general link and also the lecture on game theory.

November 4, 2012

Towards Social Discovery…

Filed under: Common Crawl,Data,Social Networks — Patrick Durusau @ 4:14 pm

Towards Social Discovery – New Content Models; New Data; New Toolsets by Matthew Berk, Founder of Lucky Oyster.

From the post:

When I first came across the field of information retrieval in the ’80s and early ’90s (back when TREC began), vectors were all the rage, and the key units were terms, texts, and corpora. Through the ’90s and with the advent of hypertext and later the explosion of the Web, that metaphor shifted to pages, sites, and links, and approaches like HITS and Page Rank leveraged hyperlinking between documents and sites as key proxies for authority and relevance.

Today we’re at a crossroads, as the nature of the content we seek to leverage through search and discovery has shifted once again, with a specific gravity now defined by entities, structured metadata, and (social) connections. In particular, and based on my work with Common Crawl data specifically, content has shifted in three critical ways:

No, I won’t even summarize his three points. It’s short and quite well written.

Read his post and then consider: Where do topic maps fit into his “crossroads?”

October 24, 2012

The Data Science Community on Twitter

Filed under: Data Science,Graphs,Networks,Social Networks,Tweets,Visualization — Patrick Durusau @ 2:07 pm

The Data Science Community on Twitter

From the webpage:

659 Twitter accounts linked to data science, May 2012.

Linkage of Twitter accounts to display followers and following nodes.

That sounds so inadequate (and is).

You need to go see the page, play with it and then come back.

How was that? Impressive yes?

OK, how would that experience be different if you were using a topic map?

More/less information? Other display options?

It is an impressive piece of eye candy but I have a sense it could be so much more.

You?

September 29, 2012

Twitter Social Network by @aneeshs (video lecture)

Filed under: Graphs,Networks,Social Networks,Tweets — Patrick Durusau @ 3:37 pm

Video Lecture: Twitter Social Network by @aneeshs by Marti Hearst.

From the post:

Learn about weak ties, triadic closures, and personal pagerank, and how they all relate to the Twitter social graph from Aneesh Sharma:

Just when you think the weekend can’t get any better!

Enjoy!
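If you want to experiment with the personalized PageRank idea from the lecture, networkx exposes it through the personalization argument to pagerank(); the follow graph below is hypothetical.

```python
import networkx as nx

# A tiny hypothetical follow graph: an edge u -> v means u follows v.
G = nx.DiGraph([
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "carol"), ("carol", "dave"), ("dave", "alice"),
])

# Ordinary PageRank: global importance.
print(nx.pagerank(G))

# Personalized PageRank: random jumps return to "alice", so scores reflect
# importance from her point of view (the basis of follow recommendations).
print(nx.pagerank(G, personalization={"alice": 1.0}))
```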

September 28, 2012

Windows into Relational Events: Data Structures for Contiguous Subsequences of Edges

Filed under: Graphs,Networks,Social Media,Social Networks,Time,Time Series — Patrick Durusau @ 12:59 pm

Windows into Relational Events: Data Structures for Contiguous Subsequences of Edges by Michael J. Bannister, Christopher DuBois, David Eppstein, Padhraic Smyth.

Abstract:

We consider the problem of analyzing social network data sets in which the edges of the network have timestamps, and we wish to analyze the subgraphs formed from edges in contiguous subintervals of these timestamps. We provide data structures for these problems that use near-linear preprocessing time, linear space, and sublogarithmic query time to handle queries that ask for the number of connected components, number of components that contain cycles, number of vertices whose degree equals or is at most some predetermined value, number of vertices that can be reached from a starting set of vertices by time-increasing paths, and related queries.

Among other interesting questions, this raises the issue of what time span of connections constitutes a network of interest. More than being “dynamic,” it is a definitional issue for the social network in question.

If you are working with social networks, a must read.

PS: You probably need to read: Relational events vs graphs, a posting by David Eppstein.

David details several different terms for “relational event data,” and says there are probably others they did not find. (Topic maps anyone?)
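As a baseline for what the paper's data structures speed up, here is the naive version of one such query, with hypothetical timestamped edges: filter the edges to a time window, then count connected components.

```python
import networkx as nx

# Hypothetical relational events: (sender, receiver, timestamp).
events = [
    ("a", "b", 1), ("b", "c", 2), ("c", "a", 3),
    ("d", "e", 5), ("e", "f", 6), ("a", "d", 9),
]

def components_in_window(events, start, end):
    """Naive per-query scan; the paper's structures answer this in
    sublogarithmic time after near-linear preprocessing."""
    G = nx.Graph((u, v) for u, v, t in events if start <= t <= end)
    return nx.number_connected_components(G)

print(components_in_window(events, 1, 3))  # one triangle -> 1 component
print(components_in_window(events, 1, 6))  # triangle + chain -> 2 components
```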

September 14, 2012

First Party Fraud (In Four Parts)

Filed under: Business Intelligence,Graphs,Networks,Social Graphs,Social Networks — Patrick Durusau @ 1:00 pm

Mike Betron has written a four-part series on first party fraud that merits your attention:

First Party Fraud [Part 1]

What is First Party Fraud?

First-party fraud (FPF) is defined as when somebody enters into a relationship with a bank using either their own identity or a fictitious identity with the intent to defraud. First-party fraud is different from third-party fraud (also known as “identity fraud”) because in third-party fraud, the perpetrator uses another person’s identifying information (such as a social security number, address, phone number, etc.). FPF is often referred to as a “victimless” crime, because no consumers or individuals are directly affected. The real victim in FPF is the bank, which has to eat all of the financial losses.

First-Party Fraud: How Do We Assess and Stop the Damage? [Part 2]

Mike covers the cost of first party fraud and then why it is so hard to combat.

Why is it so hard to detect FPF?

Given the amount of financial pain incurred by bust-out fraud, you might wonder why banks haven’t developed a solution and process for detecting and stopping it.

There are three primary reasons why first-party fraud is so hard to identify and block:

1) The fraudsters look like normal customers

2) The crime festers in multiple departments

3) The speed of execution is very fast

Fighting First Party Fraud With Social Link Analysis (3 of 4)

And you know, those pesky criminals won’t use their universally assigned identifiers for financial transactions. (Any security system that relies on good faith isn’t a security system, it’s an opportunity.)

A Trail of Clues Left by Criminals

Although organized fraudsters are sophisticated, they often leave behind evidence that can be used to uncover networks of organized crime. Fraudsters know that due to Know Your Customer (KYC) and Customer Due Diligence (CDD) regulations, their identification will be verified when they open an account with a financial institution. To pass these checks, the criminals will either modify their own identity slightly or else create a synthetic identity, which consists of combining real identity information (e.g., a social security number) with fake identity information (names, addresses, phone numbers, etc.).

Fortunately for banks, false identity information can be expensive and inconvenient to acquire and maintain. For example, apartments must be rented out to maintain a valid address. Additionally, there are only so many cell phones a person can carry at one time and only so many aliases that can be remembered. Because of this, fraudsters recycle bits and pieces of these valuable assets.

The reuse of identity information has inspired Infoglide to begin to create new technology on top of its IRE platform called Social Link Analysis (SLA). SLA works by examining the “linkages” between the recycled identities, therefore identifying potential fraud networks. Once the networks are detected, Infoglide SLA applies advanced analytics to determine the risk level for both the network and for every individual associated with that network.

First Party Fraud (post 4 of 4) – A Use Case

As discussed in our previous blog in this series, Social Link Analysis works by identifying linkages between individuals to create a social network. Social Link Analysis can then analyze the network to identify organized crime, such as bust-out fraud and internal collusion.

During the Social Link Analysis process, every individual is connected to a single network. An analysis at a large tier 1 bank will turn up millions of networks, but the majority of individuals only belong to very small networks (such as a husband and wife, and possibly a child). However, the social linking process will certainly turn up a small percentage of larger networks of interconnected individuals. It is in these larger networks where participants of bust-out fraud are hiding.

Due to the massive number of networks within a system, the analysis is performed mathematically (e.g. without user interface) and scores and alerts are generated. However, any network can be “visualized” using the software to create a graphic display of information and connections. In this example, we’ll look at a visualization of a small network that the social link analysis tool has alerted as a possible fraud ring.

A word of caution.

To leap from the example individuals being related to each other to:

As a result, Social Link Analysis has detected four members of a network, each with various amounts of charged-off fraud.

Is quite a leap.

Having charged off loans, with re-use of telephone numbers and a mobile population, doesn’t necessarily mean anyone is guilty of “charged-off fraud.”

Could be, but you should tread carefully and with legal advice before jumping to conclusions of fraud.

For good customer relations, if not avoiding bad PR and legal liability.

PS: Topic maps can help with this type of data. Including mapping in the bank locations or even personnel who accepted particular loans.
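A minimal sketch of the linking step described in part 3, using hypothetical application records: connect applications that share any identity attribute and surface the larger connected components for review, not as proof of fraud (the caution above still applies).

```python
import networkx as nx

# Hypothetical applications with identity attributes.
applications = {
    "app1": {"ssn": "111-11-1111", "phone": "555-0100", "address": "12 Main St"},
    "app2": {"ssn": "222-22-2222", "phone": "555-0100", "address": "12 Main St"},
    "app3": {"ssn": "111-11-1111", "phone": "555-0199", "address": "7 Oak Ave"},
    "app4": {"ssn": "333-33-3333", "phone": "555-0123", "address": "9 Elm St"},
}

# Bipartite-style linking: application -> each attribute value it reuses.
G = nx.Graph()
for app_id, attrs in applications.items():
    for field, value in attrs.items():
        G.add_edge(app_id, (field, value))

# Connected components group applications that recycle identity information.
for component in nx.connected_components(G):
    apps = sorted(n for n in component if isinstance(n, str))
    if len(apps) > 1:
        print("Review together:", apps)
```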

August 17, 2012

Character social networks in movies

Filed under: Graphs,Networks,Social Graphs,Social Networks — Patrick Durusau @ 4:28 pm

Character social networks in movies

Nathan Yau started the weekend early with:

We’ve seen a lot of network charts for Twitter, Facebook, and real people. Screw that. I want to see social networks for movie characters. That’s where Movie Galaxies comes in.

That’s really cool but what if you combined a topic map with a social graph?

So that everyone contributes their vision of the social graph at their office, homeroom, class, etc.

Then the social graphs get merged and can be viewed from different perspectives?

NeoSocial: Connecting to Facebook with Neo4j

Filed under: Facebook,Neo4j,Social Graphs,Social Networks — Patrick Durusau @ 12:29 pm

NeoSocial: Connecting to Facebook with Neo4j by Max De Marzi.

From the post:

(Really cool graphic omitted – see below)

Social applications and Graph Databases go together like peanut butter and jelly. I’m going to walk you through the steps of building an application that connects to Facebook, pulls your friends and likes data and visualizes it. I plan on making a video of me coding it one line at a time, but for now let’s just focus on the main elements.

The application will have two major components:

  1. A web service that handles authentication and displaying of friends, likes, and so-on.
  2. A background job service that imports data from Facebook.

We will be deploying this application on Heroku and making use of the RedisToGo and Neo4j Add-ons.

A very good weekend project for Facebook and Neo4j.
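A minimal sketch of the import job in the second component, written against the official Neo4j Python driver rather than the Heroku add-on stack described in the post; the connection details and friend pairs are placeholders.

```python
from neo4j import GraphDatabase

# Placeholder connection details and friend data pulled from the Facebook API.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
friends = [("me", "Alice"), ("me", "Bob"), ("Alice", "Bob")]

def import_friendships(tx, pairs):
    for a, b in pairs:
        # MERGE keeps the import idempotent if the background job re-runs.
        tx.run(
            "MERGE (p:Person {name: $a}) "
            "MERGE (q:Person {name: $b}) "
            "MERGE (p)-[:FRIEND]->(q)",
            a=a, b=b,
        )

with driver.session() as session:
    session.execute_write(import_friendships, friends)
driver.close()
```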

I have a different solution when you have too many friends to make a nice graphic (> 50):

Get a bigger monitor. 😉

August 2, 2012

Universal properties of mythological networks

Filed under: Networks,Social Networks — Patrick Durusau @ 2:30 pm

Universal properties of mythological networks by Pádraig Mac Carron and Ralph Kenna (2012 EPL 99 28002 doi:10.1209/0295-5075/99/28002)

Abstract:

As in statistical physics, the concept of universality plays an important, albeit qualitative, role in the field of comparative mythology. Here we apply statistical mechanical tools to analyse the networks underlying three iconic mythological narratives with a view to identifying common and distinguishing quantitative features. Of the three narratives, an Anglo-Saxon and a Greek text are mostly believed by antiquarians to be partly historically based while the third, an Irish epic, is often considered to be fictional. Here we use network analysis in an attempt to discriminate real from imaginary social networks and place mythological narratives on the spectrum between them. This suggests that the perceived artificiality of the Irish narrative can be traced back to anomalous features associated with six characters. Speculating that these are amalgams of several entities or proxies, renders the plausibility of the Irish text comparable to the others from a network-theoretic point of view.

A study that suggests there is more to be learned about networks, social, mythological and otherwise. But three (3) examples out of extant accounts, mythological and otherwise, aren’t enough for definitive conclusions.

BTW, if you are interested in the use of social networks with literature, see: Extracting Social Networks from Literary Fiction by David K. Elson, Nicholas Dames, and Kathleen R. McKeown for one approach. (If you know of a recent survey on extraction of social networks, please forward it and I will cite you in a post.)

July 27, 2012

Computational Aspects of Social Networks (CASoN) [Conference]

Filed under: Conferences,Networks,Social Networks — Patrick Durusau @ 7:51 am

Computational Aspects of Social Networks (CASoN)

Important Dates:

Paper submission due: Aug. 15, 2012
Notification of paper acceptance: Sep. 15, 2012
Final manuscript due: Sep. 30, 2012
Registration and full payment due: Sep. 30, 2012
Conference date: Nov. 21-23, 2012

Conference venue: São Carlos, Brazil.

From the Call for Papers:

The International Conference on Computational Aspects of Social Networks (CASoN 2012) brings together an interdisciplinary venue for social scientists, mathematicians, computer scientists, engineers, computer users, and students to exchange and share their experiences, new ideas, and research results about all aspects (theory, applications and tools) of intelligent methods applied to Social Networks, and to discuss the practical challenges encountered and the solutions adopted.

Social networks provide a powerful abstraction of the structure and dynamics of diverse kinds of people or people-to-technology interaction. These social network systems are usually characterized by the complex network structures and rich accompanying contextual information. Recent trends also indicate the usage of complex network as a key feature for next generation usage and exploitation of the Web. This international conference on Computational Aspect of Networks is focused on the foundations of social networks as well as case studies, empirical, and other methodological works related to the computational tools for the automatic discovery of Web-based social networks. This conference provides an opportunity to compare and contrast the ethological approach to social behavior in animals (including the study of animal tracks and learning by members of the same species) with web-based evidence of social interaction, perceptual learning, information granulation, the behavior of humans and affinities between web-based social networks. The main topics cover the design and use of various computational intelligence tools and software, simulations of social networks, representation and analysis of social networks, use of semantic networks in the design and community-based research issues such as knowledge discovery, privacy and protection, and visualization.

We solicit original research and technical papers not published elsewhere. The papers can be theoretical, practical and application, and cover a broad set of intelligent methods, with particular emphasis on Social Network computing.

One of the more interesting aspects of social network study, at least to me, is the existence of social networks of researchers who are studying social networks. Implies, to me at least, that “subjects” of discussion have their origins in social networks.

Some approaches, I won’t name names, take “subjects” as given and never question their origins. That leads directly to fragile systems/ontologies because change isn’t taken into account.

Clearly saying “stop” is insufficient, else the many attempts to fix some standardized language would have succeeded long ago.

If you know approaches that attempt to allow for change, would appreciate a note.

May 21, 2012

How do things go viral? Information diffusion in social networks.

Filed under: Social Networks,Viral — Patrick Durusau @ 10:53 am

How do things go viral? Information diffusion in social networks by Maksim Tsvetovat.

May 22, 2012 – 10 AM Pacific Time, Webcast

From the post:

“Going viral” is a holy grail of internet marketing — but beside the well-known memes and viral campaigns, there is a slower and quieter process of information diffusion. In fact, information diffusion is at the root of “viral nature” of some information. In this webcast, we will talk about the viral nature of information, adoption of attitudes and memes, and the way social networks evolve at the same time as people’s attitudes and desires. We will demonstrate some of these principles using models built in Python.

Admit it or not, we all want other people to use software or ideas that we like. If nothing else (like sales income) it provides validation.

Having a bunch of people like our stuff, is even more validation.

Watch the video. Maybe it will work for you!
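For a taste of the kind of model the webcast mentions building in Python, here is a minimal independent-cascade sketch over a random graph; the graph size and adoption probability are made-up knobs to play with.

```python
import random
import networkx as nx

random.seed(42)
G = nx.erdos_renyi_graph(100, 0.05, seed=42)

def independent_cascade(G, seeds, p=0.1):
    """Each newly adopting node gets one chance to convert each neighbor."""
    adopted = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for node in frontier:
            for neighbor in G.neighbors(node):
                if neighbor not in adopted and random.random() < p:
                    adopted.add(neighbor)
                    nxt.append(neighbor)
        frontier = nxt
    return adopted

print("adopters:", len(independent_cascade(G, seeds=[0], p=0.1)))
```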

May 20, 2012

An Example of Social Network Analysis with R using Package igraph

Filed under: igraph,Networks,R,Social Networks — Patrick Durusau @ 10:33 am

An Example of Social Network Analysis with R using Package igraph by Yanchang Zhao.

From the post:

This post presents an example of social network analysis with R using package igraph.

The data to analyze is Twitter text data of @RDataMining used in the example of Text Mining, and it can be downloaded as file “termDocMatrix.rdata” at the Data webpage. Putting it in a general scenario of social networks, the terms can be taken as people and the tweets as groups on LinkedIn, and the term-document matrix can then be taken as the group membership of people. We will build a network of terms based on their co-occurrence in the same tweets, which is similar with a network of people based on their group memberships.

I like the re-use of traditional social network analysis with tweets.

And the building of a network of terms based on co-occurrence.

May or may not serve your purposes but:

If you don’t look, you won’t see.
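The same construction sketched in Python (the post itself uses R and igraph): a binary term-document matrix turned into a term co-occurrence network. The matrix below is a toy stand-in for termDocMatrix.rdata.

```python
import numpy as np
import networkx as nx

terms = ["data", "mining", "r", "graph"]
# Toy binary term-document matrix: rows = terms, columns = tweets.
M = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
])

# M @ M.T counts, for each pair of terms, the tweets they co-occur in.
cooccur = M @ M.T
G = nx.Graph()
for i, t1 in enumerate(terms):
    for j, t2 in enumerate(terms):
        if i < j and cooccur[i, j] > 0:
            G.add_edge(t1, t2, weight=int(cooccur[i, j]))

print(G.edges(data=True))
```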

May 10, 2012

Visual Complexity

Filed under: Complex Networks,Graphics,Social Networks,Visualization — Patrick Durusau @ 5:54 pm

Visual Complexity

Described by Manuel Lima (its creator) as:

VisualComplexity.com intends to be a unified resource space for anyone interested in the visualization of complex networks. The project’s main goal is to leverage a critical understanding of different visualization methods, across a series of disciplines, as diverse as Biology, Social Networks or the World Wide Web. I truly hope this space can inspire, motivate and enlighten any person doing research on this field.

Not all projects shown here are genuine complex networks, in the sense that they aren’t necessarily at the edge of chaos, or show an irregular and systematic degree of connectivity. However, the projects that apparently skip this class were chosen for two important reasons. They either provide advancement in terms of visual depiction techniques/methods or show conceptual uniqueness and originality in the choice of a subject. Nevertheless, all projects have one trait in common: the whole is always more than the sum of its parts.

The homepage is simply stunning.

BTW, Manuel is also the author of: Visual Complexity: Mapping Patterns of Information.

EveryBlock

Filed under: Social Media,Social Networks — Patrick Durusau @ 5:43 pm

EveryBlock

I remember my childhood neighborhood just before the advent of air conditioning and the omnipresence of TV. A walk down the block gave you a good idea of what your neighbors were up to. Or not. 😉

Comparing then to now, the neighborhood where I now live is strangely silent. Walk down my block and you hear no TVs, conversations, radios, loud discussions or the like.

We have become increasingly isolated from others by our means of transportation, entertainment and climate control.

EveryBlock offers the promise of restoring some of the random contact with our neighbors to our lives.

EveryBlock says it solves two problems:

First, there’s no good place to keep track of everything happening in your neighborhood, from news coverage to events to photography. We try to collect all of the news and civic goings-on that have happened recently in your city, and make it simple for you to keep track of news in particular areas.

Second, there’s no good way to post messages to your neighbors online. Facebook lets you post messages to your friends, Twitter lets you post messages to your followers, but no well-used service lets you post a message to people in a given neighborhood.

EveryBlock addresses the problem of geographic blocks, but how do you get information on your professional block?

Do you hear anything unexpected or different? Or do you hear the customary and expected?

Maybe your professional block has gotten too silent.

Suggestions for how to change that?
