Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 13, 2015

Bytes that Rock! Software Awards 2015 (Nominations Open Now – Close 16th November 2015)

Filed under: Blogs,Contest,Games,Software — Patrick Durusau @ 2:38 pm


An awards program for excellence in software and blogs!

The only limitation I could find is:

Bytes that Rock recognizes the best software and blogs for their excellence in the past 12 months.

Your game/software/blog may have been excellent three (3) years ago but that doesn’t count. 😉

Subject to that mild limitation, step up and:

Submit a blog, software or game clicking on the categories below!

Software blogs
VideoGame blogs
Security blogs

PC Software
Software UI
Innovative Software
Protection Software
Open Source Software

PC Games
Indie Games
Mods for games

This is not a next-week, after-I-ask-X, or when-I-get-home task.

This is a hit-a-submit-link-now task!

You will feel better after having made a nomination. Promise. 😉


January 20, 2015

Flask and Neo4j

Filed under: Blogs,Graphs,Neo4j,Python — Patrick Durusau @ 5:03 pm

Flask and Neo4j – An example blogging application powered by Flask and Neo4j. by Nicole White.

From the post:

I recommend that you read through Flask’s quickstart guide before reading this tutorial. The following is drawn from Flask’s tutorial on building a microblog application. This tutorial expands the microblog example to include social features, such as tagging posts and recommending similar users, by using Neo4j instead of SQLite as the backend database.
(14 parts follow here)

The fourteen parts take you all the way through deployment on Heroku.

I don’t think you will abandon your current blogging platform, but you will gain insight into Neo4j and Flask. A non-trivial outcome.
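The social features the tutorial adds, such as recommending similar users, rest on a simple graph idea: users are similar when their posts share tags. The tutorial expresses that in Cypher against Neo4j. What follows is not the tutorial's code, only a plain-Python sketch of the same idea, with hypothetical sample data held in memory so it runs without a Neo4j server.

```python
from collections import Counter, defaultdict


def tags_by_user(posts):
    """Collect the set of tags each user has used across all of their posts."""
    tags = defaultdict(set)
    for user, post_tags in posts:
        tags[user].update(post_tags)
    return tags


def similar_users(posts, me, limit=3):
    """Rank other users by how many distinct tags they share with `me`."""
    tags = tags_by_user(posts)
    mine = tags.get(me, set())
    scores = Counter()
    for user, theirs in tags.items():
        if user == me:
            continue
        shared = len(mine & theirs)
        if shared:
            scores[user] = shared
    # Highest overlap first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


if __name__ == "__main__":
    # Hypothetical sample data: (username, tags on one post).
    posts = [
        ("dana", {"neo4j", "flask", "python"}),
        ("alice", {"python", "flask"}),
        ("bob", {"neo4j"}),
        ("carol", {"haskell"}),
    ]
    print(similar_users(posts, "dana"))
```

Counting shared distinct tags is only one possible similarity choice. The case for a graph database grows as the query stretches over several hops (users, posts, tags, other users' posts) on data too large to keep in a dict.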

May 31, 2014

Conference on Weblogs and Social Media (Proceedings)

Filed under: Blogs,Social Media,Social Networks,Text Mining — Patrick Durusau @ 1:53 pm

Proceedings of the Eighth International Conference on Weblogs and Social Media

A great collection of fifty-eight papers and thirty-one posters on weblogs and social media.

Not directly applicable to topic maps, but social media messages are as confused, ambiguous, etc., as any area could be. Perhaps more so, but I am not aware of a reliable measure of semantic confusion that would let us compare different media.

These papers may give you some insight into social media and useful ways for processing its messages.

I first saw this in a tweet by Ben Hachey.

July 9, 2013

AAAI – Weblogs and Social Media

Filed under: Artificial Intelligence,Blogs,Social Media,Tweets — Patrick Durusau @ 12:34 pm

Seventh International AAAI Conference on Weblogs and Social Media

Abstracts and papers from the Seventh International AAAI Conference on Weblogs and Social Media.

Much to consider:

Frontmatter: Six (6) entries.

Full Papers: Sixty-nine (69) entries.

Poster Papers: Eighteen (18) entries.

Demonstration Papers: Five (5) entries.

Computational Personality Recognition: Ten (10) entries.

Social Computing for Workforce 2.0: Seven (7) entries.

Social Media Visualization: Four (4) entries.

When the City Meets the Citizen: Nine (9) entries.

Be aware that the links for tutorials and workshops only give you the abstracts describing the tutorials and workshops.

There is the obligatory “blind men and the elephant” paper:

Blind Men and the Elephant: Detecting Evolving Groups in Social News

Abstract:

We propose an automated and unsupervised methodology for a novel summarization of group behavior based on content preference. We show that graph theoretical community evolution (based on similarity of user preference for content) is effective in indexing these dynamics. Combined with text analysis that targets automatically-identified representative content for each community, our method produces a novel multi-layered representation of evolving group behavior. We demonstrate this methodology in the context of political discourse on a social news site with data that spans more than four years and find coexisting political leanings over extended periods and a disruptive external event that lead to a significant reorganization of existing patterns. Finally, where there exists no ground truth, we propose a new evaluation approach by using entropy measures as evidence of coherence along the evolution path of these groups. This methodology is valuable to designers and managers of online forums in need of granular analytics of user activity, as well as to researchers in social and political sciences who wish to extend their inquiries to large-scale data available on the web.
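The abstract names two technical pieces: tracking how communities evolve by similarity, and using entropy as evidence of coherence. The paper's own definitions may well differ. This sketch only assumes that communities are sets of user ids in successive snapshots, links each one to its most similar predecessor by Jaccard similarity (with a hypothetical 0.3 threshold), and computes the Shannon entropy of the content labels a community prefers.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two member sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_communities(prev, curr, threshold=0.3):
    """Link each community in `curr` to its most similar predecessor in `prev`.

    `prev` and `curr` map community ids to sets of user ids. Returns
    {curr_id: prev_id or None}; None means no predecessor reached
    `threshold`, so the community is treated as newly formed.
    """
    links = {}
    for cid, members in curr.items():
        best_id, best_sim = None, 0.0
        for pid, old_members in prev.items():
            sim = jaccard(members, old_members)
            if sim > best_sim:
                best_id, best_sim = pid, sim
        links[cid] = best_id if best_sim >= threshold else None
    return links


def content_entropy(labels):
    """Shannon entropy, in bits, of a community's content labels.

    Lower entropy means the community's preferences are concentrated on
    fewer kinds of content, one way to read "coherence".
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```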

It is a great paper but commits a common error when it notes:

Like the parable of Blind Men and the Elephant[2], these techniques provide us with disjoint, specific pieces of information.

Yes, the parable is oft told to make a point about partial knowledge, but the careful observer will ask:

How are we different from the blind men trying to determine the nature of an elephant?

Aren’t we also blind men trying to determine the nature of blind men who are examining an elephant?

And so on?

Not that being blind men should keep us from having opinions, but it should make us wary of how deeply we are attached to them.

Not only are there elephants all the way down, there are blind men before us, alongside us (ourselves included), and around us.

December 9, 2012

Dissing Disqus

Filed under: Blogs — Patrick Durusau @ 2:55 pm

I wanted to advise a blogger of a URL error I found. I was going to simply leave a comment with the correct information.

Imagine my surprise when I tried to authenticate using Twitter and was told that Disqus could:

  • Read Tweets from your timeline.
  • See who you follow, and follow new people.
  • Update your profile.
  • Post Tweets for you.

The “Read Tweets from your timeline.” doesn’t bother me. Mostly because I don’t write anything down I would mind being public. 😉

But, following new people, updating my profile and posting tweets, all for me, what about that doesn’t suck?

If you use Disqus, don’t expect any comments from me.

Abandoning Disqus could make such over-reaching less common.

November 11, 2012

Analysis of the statistics blogosphere

Filed under: Blogs,Data Mining,Python,Social Networks — Patrick Durusau @ 8:11 pm

Analysis of the statistics blogosphere by John Johnson.

From the post:

My analysis of the statistics blogosphere for the Coursera Social Networking Analysis class is up. The Python code and the data are up at my github repository. Enjoy!

Included are most of the Python code I used to obtain blog content, some of my attempts to automate the building of the network (I ended up using a manual process in the end), and my analysis. I also included the data. (You can probably see some of your own content.)
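John Johnson's own code is in his GitHub repository; the sketch below is not that code. It shows, under assumptions of the sketch (a hand-picked set of `known_blogs` hosts and one HTML page per blog), how outbound links could be turned into a blog-to-blog network automatically, the step he ended up doing by hand.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outbound_blogs(html, known_blogs, own_host):
    """Return the known blog hosts a page links to, excluding its own host."""
    parser = LinkCollector()
    parser.feed(html)
    # Relative links have an empty netloc and so never match a known host.
    hosts = {urlparse(h).netloc.lower() for h in parser.hrefs}
    return {h for h in hosts if h in known_blogs and h != own_host}


def build_network(pages, known_blogs):
    """`pages` maps a blog host to the HTML of one of its pages.

    Returns (edges, in_degree): the set of (source, target) links between
    known blogs, and a Counter of how many known blogs link to each blog.
    """
    edges = set()
    for host, html in pages.items():
        for target in outbound_blogs(html, known_blogs, host):
            edges.add((host, target))
    in_degree = Counter(target for _, target in edges)
    return edges, in_degree
```

In-degree is the crudest centrality measure; once the edge set exists, it can be handed to any graph library for the kinds of analysis a social network analysis course covers.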

Excellent post on mining blog content.

A rich source of data for a topic map on the subject of your dreams.

June 3, 2012

Discussion of scholarly information in research blogs

Filed under: Bibliometrics,Blogs,Citation Analysis,Citation Indexing — Patrick Durusau @ 3:13 pm

Discussion of scholarly information in research blogs by Hadas Shema.

From the post:

As some of you know, Mike Thelwall, Judit Bar-Ilan (both are my dissertation advisors) and myself published an article called “Research Blogs and the Discussion of Scholarly Information” in PLoS One. Many people showed interest in the article, and I thought I’d write a “director’s commentary” post. Naturally, I’m saving all your tweets and blog posts for later research.

The Sample

We characterized 126 blogs with 135 authors from Researchblogging.Org (RB), an aggregator of blog posts dealing with peer-review research. Two over-achievers had two blogs each, and 11 blogs had two authors.

While our interest in research blogs started before we ever heard of RB, it was reading an article using RB that really kick-started the project. Groth & Gurney (2010) wrote an article titled “Studying scientific discourse on the Web using bibliometrics: A chemistry blogging case study.” The article made for a fascinating read, because it applied bibliometric methods to blogs. Just like it says in the title, Groth & Gurney took the references from 295 blog posts about Chemistry and analyzed them the way one would analyze citations from peer-reviewed articles. They managed that because they used RB, which aggregates only posts by bloggers who take the time to formally cite their sources. Major drooling ensued at that point. People citing in a scholarly manner out of their free will? It’s Christmas!
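Analyzing blog references "the way one would analyze citations from peer-reviewed articles" is, at its simplest, counting. A minimal sketch, assuming each post's formal reference list has already been extracted into (journal, DOI) pairs; it is not Groth & Gurney's procedure in detail, and counting a repeated DOI only once per post is a choice made for this sketch.

```python
from collections import Counter


def citation_counts(posts):
    """Count citations per DOI and per journal across a set of blog posts.

    `posts` is a list of posts; each post is a list of (journal, doi) pairs
    from its reference list. Duplicate references within one post are
    counted once for that post.
    """
    by_doi = Counter()
    by_journal = Counter()
    for refs in posts:
        for journal, doi in set(refs):
            by_doi[doi] += 1
            by_journal[journal] += 1
    return by_doi, by_journal


if __name__ == "__main__":
    # Placeholder references, not real DOIs.
    posts = [
        [("J Chem", "10.0000/a"), ("Nature", "10.0000/b")],
        [("J Chem", "10.0000/a")],
    ]
    by_doi, by_journal = citation_counts(posts)
    print(by_doi.most_common(1), by_journal.most_common(1))
```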

Questions that stand out for me on blogs:

Will our indexing/searching of blogs have the same all-or-nothing granularity as that of scholarly articles?

If not, why not?

April 22, 2012

Finding New Story Links Through Blog Clustering

Filed under: Blogs,Clustering,Searching — Patrick Durusau @ 7:08 pm


Matthew Hurst writes:

The basic mechanism used in track // microsoft to cluster articles is similar to that used by Techmeme. A fixed set of blogs are crawled and clustered based on specific features such as link structure and content (and in the case of Techmeme, additional human input). However, what about blogs that aren't known to the system?

I recently added a feature to track // microsoft which analyses clusters for popular urls and adds those to the bottom of the cluster. The title of the web page is used as a simple description of the popular page.

In the recent story about Nuno Silva's mistaken comment regarding the future of Windows Phone devices, there were many links to Nuno's own blog post. In addition to the large cluster of known blogs that were determined to be talking about the story, track // microsoft also surfaced Nuno's post through analysing the popular links discovered within the cluster.
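Hurst doesn't give implementation details, but the idea of surfacing popular URLs from a cluster can be sketched as a per-cluster link count. These are assumptions of the sketch, not of track // microsoft: each post is reduced to its outbound URLs, a URL counts once per post, URLs on already-crawled hosts are skipped (the point being to find blogs the system doesn't know), and `min_posts=2` is an arbitrary cutoff.

```python
from collections import Counter
from urllib.parse import urlparse


def popular_urls(cluster, known_hosts, min_posts=2):
    """Find URLs linked from many posts in one cluster.

    `cluster` is a list of posts, each given as a list of outbound URLs.
    A URL counts once per post; URLs whose host is in `known_hosts` are
    skipped. Returns (url, post_count) pairs, most linked first.
    """
    counts = Counter()
    for links in cluster:
        for url in set(links):
            if urlparse(url).netloc.lower() not in known_hosts:
                counts[url] += 1
    popular = [(u, n) for u, n in counts.items() if n >= min_posts]
    return sorted(popular, key=lambda un: (-un[1], un[0]))
```

Labeling each surfaced URL with its page title, as Hurst does, would need one extra fetch per URL and is left out here.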

Interesting blog discovery method.

April 10, 2012

The Trend Point

Filed under: Analytics,Blogs,Open Source — Patrick Durusau @ 6:45 pm


Described by a “sister” publication as:

ArnoldIT has rolled out The Trend Point information service. Published Monday through Friday, the information services focuses on the intersection of open source software and next-generation analytics. The approach will be for the editors and researchers to identify high-value source documents and then encapsulate these documents into easily-digested articles and stories. In addition, critical commentary, supplementary links, and important facts from the source document are provided. Unlike a news aggregation service run by automated agents, librarians and researchers use the ArnoldIT Overflight tools to track companies, concepts, and products. The combination of human-intermediated research with Overflight provide an executive or business professional with a quick, easy, and free way to keep track of important developments in open source analytics. There is no charge for the service.

I was looking for something different to say other than just reporting a new data stream and found this under the “about” link:

I write for fee columns for Enterprise Technology Management, Information Today, Online Magazine, and KMWorld plus a few occasional items. My content reaches somewhere between one and three people each month.

I started to monetize Beyond Search in 2008. I have expanded our content services to white papers about a search, content processing or analytics. These reports are prepared for a client. The approach is objective and we include information that makes these documents suitable for the client’s marketing and sales efforts. Clients work closely with the Beyond Search professional to help ensure that the message is on target and clear. Rates are set to be within reach of organizations regardless of their size.

You can get coverage in this or one of our other information services, but we charge for our time. Stated another way: If you want a story about you, your company, or your product, you will be expected to write a check or pay via PayPal. We do not do news. We do this. (emphasis added to the first paragraph)

For some reason, I would have expected Stephen E. Arnold to reach more than …between one and three people each month. That sounds low to me. 😉

The line “We do not do news.” makes me wonder what the University of Southampton paid to have a four-page document described as a “dissertation.” See: New Paper: Linked Data Strategy for Global Identity. Or, for that matter, what will it cost to get into “The Trend Point”?

Thoughts?

April 3, 2012

Tracking Video Game Buzz

Filed under: Blogs,Clustering,Data Mining,Tweets — Patrick Durusau @ 4:17 pm


Matthew Hurst writes:

Briefly, I pushed out an experimental version of track // games to track topics in the blogosphere relating to video games. As with track // microsoft it gathers posts from blogs, clusters them and uses an attention metric based on Bitly and Twitter to rank the clusters, new posts and videos.

Currently at the top of the stack is Bungie Waves Goodbye To Halo.
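The post doesn't give the attention formula. As a rough illustration only, this sketch assumes tweet and click counts have already been fetched for each post (no Twitter or Bitly API calls are shown), combines them with hypothetical weights, and decays the result with a hypothetical 24-hour half-life so fresh stories rise to the top.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """One blog post with (hypothetical) attention signals attached."""
    url: str
    tweets: int        # tweets linking to the post
    clicks: int        # clicks on a shortened link to the post
    age_hours: float   # hours since publication


def attention(post, tweet_weight=1.0, click_weight=0.1, half_life_hours=24.0):
    """Weighted attention for one post, halved every `half_life_hours`."""
    raw = tweet_weight * post.tweets + click_weight * post.clicks
    return raw * 0.5 ** (post.age_hours / half_life_hours)


def rank_clusters(clusters):
    """Rank cluster names (name -> list of Post) by total decayed attention."""
    scores = {name: sum(attention(p) for p in posts) for name, posts in clusters.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)
```

Summing over posts rewards clusters with many moderately shared posts; taking the maximum instead would favor a single breakout story. Either is a design choice, not something the post specifies.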

Wonder if Matthew could be persuaded to do the same for the elections this Fall in the United States? 😉

January 15, 2012

Pbm: A new dataset for blog mining

Filed under: Blogs,Dataset — Patrick Durusau @ 9:15 pm

Pbm: A new dataset for blog mining by Mehwish Aziz and Muhammad Rafi.

Abstract:

Text mining is becoming vital as Web 2.0 offers collaborative content creation and sharing. Now Researchers have growing interest in text mining methods for discovering knowledge. Text mining researchers come from variety of areas like: Natural Language Processing, Computational Linguistic, Machine Learning, and Statistics. A typical text mining application involves preprocessing of text, stemming and lemmatization, tagging and annotation, deriving knowledge patterns, evaluating and interpreting the results. There are numerous approaches for performing text mining tasks, like: clustering, categorization, sentimental analysis, and summarization. There is a growing need to standardize the evaluation of these tasks. One major component of establishing standardization is to provide standard datasets for these tasks. Although there are various standard datasets available for traditional text mining tasks, but there are very few and expensive datasets for blog-mining task. Blogs, a new genre in web 2.0 is a digital diary of web user, which has chronological entries and contains a lot of useful knowledge, thus offers a lot of challenges and opportunities for text mining. In this paper, we report a new indigenous dataset for Pakistani Political Blogosphere. The paper describes the process of data collection, organization, and standardization. We have used this dataset for carrying out various text mining tasks for blogosphere, like: blog-search, political sentiments analysis and tracking, identification of influential blogger, and clustering of the blog-posts. We wish to offer this dataset free for others who aspire to pursue further in this domain.

This paper details the construction of the blog dataset used in “Sentence based semantic similarity measure for blog-posts.”
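The preprocessing steps listed in the abstract, and the idea of comparing blog posts for similarity, can be illustrated with a bag-of-words baseline. This is a minimal sketch, not the sentence-based semantic measure of the cited paper: a toy stopword list, a crude suffix rule standing in for a real stemmer, and cosine similarity over term frequencies.

```python
import math
import re
from collections import Counter

# Toy stopword list for illustration; real pipelines use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "on"}


def preprocess(text):
    """Lowercase, tokenize on letters, drop stopwords, strip a trailing 's'.

    The suffix rule is a crude stand-in for a real stemmer such as Porter's.
    """
    out = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOPWORDS:
            continue
        if len(token) > 3 and token.endswith("s"):
            token = token[:-1]
        out.append(token)
    return out


def cosine(text_a, text_b):
    """Cosine similarity of two texts' term-frequency vectors (0.0 if either is empty)."""
    a, b = Counter(preprocess(text_a)), Counter(preprocess(text_b))
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

A bag-of-words baseline like this is what a semantic measure has to beat: it scores "polls" and "election" as unrelated, which is exactly the gap synonym-aware methods try to close.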

The aspect I found most interesting was the restriction of the data set to a particular domain. When I was using physical research tools (books) in libraries, there was no “index to everything” available. Nor would I have used it had it been available.

If I had a social science question (political science major) or later a law question (law school), I would pick a physical research tool (PRT) that was appropriate to the search request. Why? Because specialized publications were curated to facilitate research in a particular area, including identification of synonyms and cross-referencing of information you might otherwise not notice.

Is this blogging dataset a clue that, if we created subsets of the entire WWW, we could create indexing/analysis routines specific to those datasets? And hence give users a measurably better search experience?
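One way to picture what a domain-specific index buys: a curated synonym map for one subset of the Web, applied at both indexing and query time, much like the synonym and cross-reference work in a specialized print index. The domain, the synonym entries, and the function names below are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical curated synonym map for one domain (political blogs).
# Each variant points to a preferred heading, like a "see" reference.
SYNONYMS = {
    "polls": "election",
    "vote": "election",
    "voting": "election",
    "ballot": "election",
    "parliament": "legislature",
    "assembly": "legislature",
}


def normalize(term, synonyms=SYNONYMS):
    """Fold a term to its preferred heading, if the domain map has one."""
    term = term.lower()
    return synonyms.get(term, term)


def build_index(docs, synonyms=SYNONYMS):
    """Map each normalized term to the set of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            index[normalize(token, synonyms)].add(doc_id)
    return index


def search(index, query, synonyms=SYNONYMS):
    """Return doc ids matching every query term after synonym folding."""
    terms = [normalize(t, synonyms) for t in re.findall(r"[a-z]+", query.lower())]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

The synonym map is the curated part: it only works because it is scoped to one domain, where "assembly" reliably means a legislature rather than, say, manufacturing.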

Powered by WordPress