Archive for the ‘R’ Category

“Practical Data Science with R” MEAP (ordered)

Tuesday, May 21st, 2013

Big News! “Practical Data Science with R” MEAP launched! by John Mount.

From the post:

Nina Zumel and I ( John Mount ) have been working very hard on producing an exciting new book called “Practical Data Science with R.” The book has now entered Manning Early Access Program (MEAP) which allows you to subscribe to chapters as they become available and give us feedback before the book goes into print.


Deal of the Day May 21 2013: Half off Practical Data Science with R. Use code dotd0521au.

I ordered the “Practical Data Science with R” MEAP today, based on my other Manning MEAP experiences.

You?

How to Build a Text Mining, Machine Learning….

Monday, May 13th, 2013

How to Build a Text Mining, Machine Learning Document Classification System in R! by Timothy D’Auria.

From the description:

We show how to build a machine learning document classification system from scratch in less than 30 minutes using R. We use a text mining approach to identify the speaker of unmarked presidential campaign speeches. Applications in brand management, auditing, fraud detection, electronic medical records, and more.

A well-made video introduction to R and text mining.

A Crash Course in R

Friday, May 3rd, 2013

A Crash Course in R

From the post:

This code has been kindly contributed by Robin Edwards (from UCL CASA).

There are many useful introductory guides out there to R, but below is the kind of thing I now wish I’d been given when I first started using it – something with simple logically-progressive examples and minimal explanatory text. Copy the text below into a new script in R and run line-by-line to give a quick intro to many of R’s most basic principles and functionality. You can also download a text file with it here. It is by no means comprehensive, even at the most basic level, but still I hope someone finds it useful. You may want to look at RStudio as it is more user-friendly.

The sort of thing you can run through to develop “finger” memory as well. ;-)

Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices

Wednesday, April 24th, 2013

Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices by Shivaram Venkataraman, Erik Bodzsar, Indrajit Roy, Alvin AuYoung, and Robert S. Schreiber.

Abstract:

It is cumbersome to write machine learning and graph algorithms in data-parallel models such as MapReduce and Dryad. We observe that these algorithms are based on matrix computations and, hence, are inefficient to implement with the restrictive programming and communication interface of such frameworks.

In this paper we show that array-based languages such as R [3] are suitable for implementing complex algorithms and can outperform current data parallel solutions. Since R is single-threaded and does not scale to large datasets, we have built Presto, a distributed system that extends R and addresses many of its limitations. Presto efficiently shares sparse structured data, can leverage multi-cores, and dynamically partitions data to mitigate load imbalance. Our results show the promise of this approach: many important machine learning and graph algorithms can be expressed in a single framework and are substantially faster than those in Hadoop and Spark.

Your mileage may vary, but the paper reports that for PageRank, Presto is 40X faster than Hadoop and 15X faster than Spark.

Unfortunately I can’t point you to any binary or source code for Presto.

Still, the description is an interesting one at a time of rapid development of computing power.
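
To make the abstract's point that these algorithms are matrix computations a little more concrete, here is a minimal sketch of PageRank as repeated sparse matrix-vector multiplication. It is plain Python rather than R and has nothing to do with Presto's own code. The function name `pagerank`, the damping factor of 0.85, the iteration count and the four-page toy graph are all my assumptions, chosen only for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration over a sparse link structure.

    `links` maps each page to the list of pages it links to, which is
    the sparse (adjacency list) form of the link matrix.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}

        # One sparse matrix-vector product: each page passes an equal
        # share of its damped rank to every page it links to.
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share

        rank = new_rank
    return rank


# Hypothetical toy graph: page "d" links out but nothing links to it.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
```

Each pass touches only the links that exist, which is why sparse storage matters once the graph is large. Distributing that inner loop across machines is the part Presto claims to handle.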

Deducer: R Graphic Interface For Everyone

Sunday, April 21st, 2013

Deducer: R Graphic Interface For Everyone

From the webpage:

Deducer is designed to be a free easy to use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an excel-like spreadsheet in which to view and edit data frames. The goal of the project is two fold.

  1. Provide an intuitive graphical user interface (GUI) for R, encouraging non-technical users to learn and perform analyses without programming getting in their way.
  2. Increase the efficiency of expert R users when performing common tasks by replacing hundreds of keystrokes with a few mouse clicks. Also, as much as possible the GUI should not get in their way if they just want to do some programming.

Deducer is designed to be used with the Java based R console JGR, though it supports a number of other R environments (e.g. Windows RGUI and RTerm).

You may also be interested in: How to install R, JGR and Deducer in Ubuntu. If so, add: Solving OpenJDK install errors in Ubuntu.

As you might imagine, data analysis has multiple languages (sound familiar?). Which one you choose is largely a matter of personal preference.

Data Computation Fundamentals [Promoting Data Literacy]

Saturday, April 20th, 2013

Data Computation Fundamentals by Daniel Kaplan and Libby Shoop.

From the first lesson:

Teaching the Grammar of Data

Twenty years ago, science students could get by with a working knowledge of a spreadsheet program. Those days are long gone, says Danny Kaplan, DeWitt Wallace Professor of Mathematics and Computer Science. “Excel isn’t going to cut it,” he says. “In today’s world, students can’t escape big data. Though it won’t be easy to teach it, it will only get harder as they move into their professional training.”

To that end, Kaplan and computer science professor Libby Shoop have developed a one-credit class called Data Computation Fundamentals, which is being offered beginning this semester. Though Kaplan doesn’t pretend the course can address all the complexities of specific software packages, he does hope it will provide a framework that students can apply when they come across databases or data-reliant programs in biology, chemistry, and physics. “We believe we can give students that grammar of data that they need to use these modern capabilities,” he says.

Not quite “have developed.” Should say, “are developing, in conjunction with a group of about 20 students.”

Data literacy impacts the acceptance and use of data and tools for using data.

Teaching people to read and write is not a threat to commercial authors.

By the same token, teaching people to use data is not a threat to competent data analysts.

Help the authors and yourself by reviewing the course and offering comments for its improvement.

I first saw this at: A Course in Data and Computing Fundamentals.

Nozzle R Package

Sunday, April 14th, 2013

Nozzle R Package

From the webpage:

Nozzle is an R package for generation of reports in high-throughput data analysis pipelines. Nozzle reports are implemented in HTML, JavaScript, and Cascading Style Sheets (CSS), but developers do not need any knowledge of these technologies to work with Nozzle. Instead they can use a simple R API to design and implement powerful reports with advanced features such as foldable sections, zoomable figures, sortable tables, and supplementary information. Please cite our Bioinformatics paper if you are using Nozzle in your work.

I have only looked at the demo reports but this looks quite handy.

It doesn’t hurt to have extensive documentation to justify a conclusion that took you only moments to reach.

R Cheatsheets

Wednesday, April 10th, 2013

R Cheatsheets

I ran across this collection of cheatsheets for R today.

The R Reference Card for Data Mining is the most interesting to me, but you may want to look at some of the others.

Enjoy!

Spring Cleaning Data: 1 of 6… [Federal Reserve]

Tuesday, April 9th, 2013

Spring Cleaning Data: 1 of 6 – Downloading the Data & Opening Excel Files

From the post:

With spring in the air, I thought it would be fun to do a series on (spring) cleaning data. The posts will follow my efforts to download the data, import it into R, clean it up, merge the different files, add columns of information created, and then export a master file. During the process I will at times offer different ways to do things; this is an attempt to show that there is no one way of doing something, but several. When appropriate I will demonstrate as many as I can think of, given the data.

This series of posts will be focusing on the Discount Window of the Federal Reserve. I know I seem to be picking on the Feds, but I am genuinely interested in what they have. The fact that there is data on the discount window at all, to be blunt, took legislation from Congress. The first step in this project was to find the data. The data and additional information can be downloaded here.

I don’t have much faith in government data but if you are going to debate on the “data,” such as it is, you will need to clean it up and combine it with other data.

This is a good start in that direction for data from the Federal Reserve.

If you are interested in data from other government agencies, publishing the steps needed to clean/combine their data would move everyone forward.

A topic map of cleaning directions for government data could be a useful tool.

Not that clean data = government transparency but it might make it easier to spot the shadows.

High-Performance and Parallel Computing with R

Tuesday, April 9th, 2013

High-Performance and Parallel Computing with R by Dirk Eddelbuettel.

From the webpage:

This CRAN task view contains a list of packages, grouped by topic, that are useful for high-performance computing (HPC) with R. In this context, we are defining ‘high-performance computing’ rather loosely as just about anything related to pushing R a little further: using compiled code, parallel computing (in both explicit and implicit modes), working with large objects as well as profiling.

Here you will find R packages for:

  • Explicit parallelism
  • Implicit parallelism
  • Grid computing
  • Hadoop
  • Random numbers
  • Resource managers and batch schedulers
  • Applications
  • GPUs
  • Large memory and out-of-memory data
  • Easier interfaces for Compiled code
  • Profiling tools

Despite HPC advances over the last decade, semantics remain an unsolved problem.

Perhaps raw computational capacity isn’t the key to semantics.

If not, some different approach awaits discovery.

I first saw this in a tweet by One R Tip a Day.

R 3.0 Launched

Thursday, April 4th, 2013

R 3.0 Launched by Ajay Ohri.

Ajay picks some highlights from the R 3.0 release and points you to the full news for a complete list.

The Ubuntu update servers don’t have R 3.0, yet. If you are in a hurry, see cran.r-project.org for a source list.

Using R For Statistical Analysis – Two Useful Videos

Saturday, March 30th, 2013

Using R For Statistical Analysis – Two Useful Videos by Bruce Berriman.

Bruce has uncovered two interesting videos on using R:

Introduction to R – A Brief Tutorial for R (Software for Statistical Analysis), and,

An Introduction to R for Data Mining by Joseph Rickert. (Recording of the webinar by the same name.)

Bruce has additional links that will be useful with the videos.

Enjoy!

Massive online data stream mining with R

Tuesday, March 26th, 2013

Massive online data stream mining with R

From the post:

A few weeks ago, the stream package was released on CRAN. It allows you to do real time analytics on data streams. This can be very useful if you are working with large datasets which are already hard to put in RAM completely, let alone to build some statistical model on it without getting into RAM problems.

The stream package is currently focussed on clustering algorithms available in MOA (http://moa.cms.waikato.ac.nz/details/stream-clustering/) and also eases interfacing with some clustering algorithms already available in R which are suited for data stream clustering. Classification algorithms based on MOA are on the todo list. Currently available clustering algorithms are BIRCH, CluStream, ClusTree, DBSCAN, DenStream, Hierarchical, Kmeans and Threshold Nearest Neighbor.

What if data were always encountered as a stream?

You could request a “re-streaming” of the data, but it is best to do the analysis in a single pass.

How would that impact your notion of subject identity?

How would you compensate for information learned later in the stream?
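
As a small illustration of what single-pass analysis looks like, the sketch below keeps a running mean and variance with Welford's update. It sees each value exactly once and never stores the stream. It is plain Python, not the stream package or MOA, and the class name `RunningStats` and the sample values are made up for the example.

```python
class RunningStats:
    """Running mean and sample variance over a stream (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new observation into the summary, then forget it."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance of everything seen so far (0.0 if fewer than 2)."""
        if self.count < 2:
            return 0.0
        return self._m2 / (self.count - 1)


stats = RunningStats()
# Hypothetical stream; in practice values would arrive one at a time.
for observation in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(observation)
```

The summary can only ever be revised going forward. Anything learned later in the stream changes the running figures but cannot revisit values that have already gone by, which is the constraint behind the questions above.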

A map of worldwide email traffic, created with R

Wednesday, March 13th, 2013

A map of worldwide email traffic, created with R by David Smith.

The Washington Post reports that by analyzing more than 10 million emails sent through the Yahoo! Mail service in 2012, a team of researchers used the R language to create a map of countries whose citizens email each other most frequently:

Worldwide Email traffic

Some discussion of Huntington’s Clash of Civilizations, but I have a different question:

If a map is a snapshot of a territory, can’t a later snapshot show changes to the same territory?

Rather than debating Huntington and his money-making but shallow view of the world and its history, why not intentionally broaden the communication network you see above?

A map, even a topic map, isn’t destiny; it’s a guide to finding a path to a new location or new information.

Complex Graphics (lattice) [Division of Labor?]

Thursday, March 7th, 2013

Complex Graphics (lattice) by Dr. Tom Philippi.

From the webpage:

Clear communication of pattern via graphing data is no accident; a number of people spend their careers developing approaches based on human perceptions and cognitive science. While Edward Tufte and his “The Visual Display of Quantitative Information” is more widely known, William Cleveland’s “The Elements of Graphing Data” is perhaps more influential: reduced clutter, lowess or other smoothed curves through data, banking to 45° to emphasize variation in slopes, emphasizing variability as well as trends, paired plots to illustrate multiple components of the data such as fits and residuals, and dot plots all come from his work.

It should come as no surprise that tools to implement principles from the field of graphical communication have been developed in R. The trellis package was originally developed to implement many of William Cleveland’s methods in S plus. Deepayan Sarkar wrote the lattice package as a port and extension of trellis graphs to R.

There is a second major package for advanced graphics in R: ggplot (now ggplot2), based on the Grammar of Graphics. Hadley Wickham wrote most of the ggplot2 package, as well as the book in the Use R! series on ggplot. My limited understanding of the Grammar of Graphics is that layers are specifications of data, mapping or transformations, geoms (geometric objects such as scatterplots or histograms), statistics (bin, smooth, density, etc.), and positions. Graphs are composed of defaults, one or more layers, scales, and coordinate systems. Again, each component has a default, so informative graphs may be produced by simple calls, but every detail may be tweaked if desired.

I do not recommend one package over the other. I started with lattice, and it may be a bit more complementary to analyses because of ease of recasting formulas from analytical functions to formulas for lattice graphs. ggplot2 may be more familiar to folks used to Photoshop and other graphics and image processing tools, and it may be a better foundation for development over the next years. Both lattice and ggplot2 are built upon the grid graphics primitives, so a real wizard could compose graphics objects via a combination of both tools. This web page presents lattice graphics solely because I have more experience with them and thus understand them better.

Very thorough coverage of the lattice package for R from Dr. Tom Philippi of the National Park Service.

Includes examples and further resources.

Visualization of data is a useful division of labor.

Machines label, sort, and display data based on our instructions, but users/viewers determine the significance, if any, of the display.

I first saw this in a tweet from One R Tip a Day.

R and Hadoop Data Analysis – RHadoop

Wednesday, February 27th, 2013

R and Hadoop Data Analysis – RHadoop by Istvan Szegedi.

From the post:

R is a programming language and a software suite used for data analysis, statistical computing and data visualization. It is highly extensible and has object oriented features and strong graphical capabilities. At its heart R is an interpreted language and comes with a command line interpreter – available for Linux, Windows and Mac machines – but there are IDEs as well to support development like RStudio or JGR.

R and Hadoop can complement each other very well, they are a natural match in big data analytics and visualization. One of the most well-known R packages to support Hadoop functionalities is RHadoop that was developed by RevolutionAnalytics.

Nice introduction that walks you through installation and illustrates the use of RHadoop for analysis.

The ability to analyze “big data” is becoming commonplace.

The more that becomes a reality, the greater the burden on the user to critically evaluate the analysis that produced the “answers.”

Yes, repeatable analysis yielded answer X, but that just means applying the same assumptions to the same data gave the same result.

The same could be said about division by zero, although no one would write home about it.

R Bootcamp Materials!

Monday, February 25th, 2013

R Bootcamp Materials! by Jared Knowles.

From the post:

To train new employees at the Wisconsin Department of Public Instruction, I have developed a 2-3 day series of training modules on how to get work done in R. These modules cover everything from setting up and installing R and RStudio to creating reproducible analyses using the knitr package. There are also some experimental modules for introductions to basic computer programming, and a refresher course on statistics. I hope to improve both of these over time. 

I am happy to announce that all of these materials are available online, for free.

The bootcamp covers the following topics:

  1. Introduction to R: History of R, R as a programming language, and features of R.
  2. Getting Data In: How to import data into R, manipulate, and manage multiple data objects.
  3. Sorting and Reshaping Data: Long to wide, wide to long, and everything in between!
  4. Cleaning Education Data: Includes material from the Strategic Data Project about how to implement common business rules in processing administrative data.
  5. Regression and Basic Analytics in R: Using school mean test scores to do OLS regression and regression diagnostics, a real world example.
  6. Visualizing Data: Harness the power of R’s data visualization packages to make compelling and informative visualizations.
  7. Exporting Your Work: Learn the knitr package, and how to export graphics, and create PDF reports.
  8. Advanced Topics: A potpourri of advanced features in R (by request)
  9. A Statistics Refresher: With interactive examples using shiny
  10. Programming Principles: Tips and pointers about writing code. (Needs work)

The best part is, all of the materials are available online and free of charge! (Check out the R Bootcamp page). They are constantly evolving. We have done two R Bootcamps so far, and hope to do more. Each time the materials get a little better.

The R Bootcamp page enables you to download all the materials or view the modules separately.

If you already know R, pass it on.

Video: Data Mining with R

Sunday, February 17th, 2013

Video: Data Mining with R by David Smith.

From the post:

Yesterday's Introduction to R for Data Mining webinar was a record setter, with more than 2000 registrants and more than 700 attending the live session presented by Joe Rickert. If you missed it, I've embedded the video replay below, and Joe's slides (with links to many useful resources) are also available.

During the webinar, Joe demoed several examples of data mining with R packages, including rattle, caret, and RevoScaleR from Revolution R Enterprise. If you want to adapt Joe's demos for your own data mining ends, Joe has made his scripts and data files available for download on github.

Glad this showed up! I accidentally missed the webinar.

Enjoy!

Mapping the census…

Sunday, February 10th, 2013

Mapping the census: how one man produced a library for all by Simon Rogers.

From the post:

The census is an amazing resource – so full of data it’s hard to know where to begin. And increasingly where to begin is by putting together web-based interactives – like this one on language and this on transport patterns that we produced this month.

But one academic is taking everything back to basics – using some pretty sophisticated techniques. Alex Singleton, a lecturer in geographic information science (GIS) at Liverpool University has used R to create the open atlas project.

Singleton has basically produced a detailed mapping report – as a PDF and vectored images – on every one of the local authorities of England & Wales. He automated the process and has provided the code for readers to correct and do something with. In each report there are 391 pages, each with a map. That means, for the 354 local authorities in England & Wales, he has produced 127,466 maps.

Check out Simon’s post to see why Singleton has undertaken such a task.

Question: Was the 2011 census more “transparent,” or “useful” after Singleton’s work or before?

I would say more “transparent” after Singleton’s work.

You?

RDSTK: An R wrapper for the Data Science Toolkit API

Saturday, February 9th, 2013

RDSTK: An R wrapper for the Data Science Toolkit API

From the webpage:

This package provides an R interface to Pete Warden’s Data Science Toolkit. See www.datasciencetoolkit.org for more information. The source code for this package can be found at github.com/rtelmore/RDSTK. Happy hacking!

If you don’t know the Data Science Toolkit, you should.

I first saw this at Pete Warden’s Five short links, February 8, 2013.

Introduction To R For Data Mining

Wednesday, February 6th, 2013

Introduction To R For Data Mining

Date: Thursday, February 14, 2013
Time: 10:00am – 11:00am Pacific Time
Presenter: Joseph Rickert, Technical Marketing Manager, Revolution Analytics

From the post:

We at Revolution Analytics are often asked “What is the best way to learn R?” While acknowledging that there may be as many effective learning styles as there are people we have identified three factors that greatly facilitate learning R. For a quick start:

  • Find a way of orienting yourself in the open source R world
  • Have a definite application area in mind
  • Set an initial goal of doing something useful and then build on it

In this webinar, we focus on data mining as the application area and show how anyone with just a basic knowledge of elementary data mining techniques can become immediately productive in R. We will:

  • Provide an orientation to R’s data mining resources
  • Show how to use the "point and click" open source data mining GUI, rattle, to perform the basic data mining functions of exploring and visualizing data, building classification models on training data sets, and using these models to classify new data.
  • Show the simple R commands to accomplish these same tasks without the GUI
  • Demonstrate how to build on these fundamental skills to gain further competence in R
  • Move away from using small test data sets and show how, with the same level of skill, one could analyze some fairly large data sets with RevoScaleR

Data scientists and analysts using other statistical software as well as students who are new to data mining should come away with a plan for getting started with R.

You have to do something while waiting for your significant other to get off work on Valentine’s Day. ;-)

So long as you don’t try to watch the webinar on a smart phone at the restaurant, you should be ok.


Update: Video of the webinar: An Introduction to R for Data Mining.

Creating beautiful maps with R

Sunday, January 27th, 2013

Creating beautiful maps with R by David Smith.

From the post:

Spanish R user and solar energy lecturer Oscar Perpiñán Lamigueiro has written a detailed three-part guide to creating beautiful maps and choropleths (maps color-coded with regional data) using the R language. Motivated by the desire to recreate this graphic from the New York Times, Oscar describes how he creates similar high-quality maps using R.

David summarizes the three-part series by Oscar Perpiñán Lamigueiro, with links to the parts, software and data.

No guarantees you will produce maps as good as the New York Times but it won’t be from a lack of instruction. ;-)

Maps in R: choropleth maps

Sunday, January 27th, 2013

Maps in R: choropleth maps by Max Marchi.

From the post:

This is the third article of the Maps in R series. After having shown how to draw a map without placing data on it and how to plot point data on a map, in this installment the creation of a choropleth map will be presented.

A choropleth map is a thematic map featuring regions colored or shaded according to the value assumed by the variable of interest in that particular region.

Another step towards becoming a map maker with R!
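
The definition quoted above comes down to one step: turn each region's value into a shade. Here is a minimal sketch of that step using equal-interval classes. It is plain Python rather than the R code in Max's post, and the function name `equal_interval_shades`, the region names, the values and the three-color palette are all hypothetical.

```python
def equal_interval_shades(values_by_region, palette):
    """Assign each region a shade by splitting the value range evenly.

    `palette` is ordered from lightest (lowest values) to darkest.
    """
    low = min(values_by_region.values())
    high = max(values_by_region.values())
    # Width of one class; fall back to 1.0 when every value is identical.
    width = (high - low) / len(palette) or 1.0

    shades = {}
    for region, value in values_by_region.items():
        # The maximum value would land one past the end, so clamp it.
        index = min(int((value - low) / width), len(palette) - 1)
        shades[region] = palette[index]
    return shades


# Hypothetical data: some rate per region, and a light-to-dark palette.
rates = {"North": 12.0, "South": 48.0, "East": 30.0, "West": 21.0}
palette = ["#fee8c8", "#fdbb84", "#e34a33"]
shades = equal_interval_shades(rates, palette)
```

Equal intervals are only one choice. Quantile or hand-picked breaks can make the same data tell a rather different story, which is worth remembering when reading anyone's choropleth.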

SPARQL with R in less than 5 minutes [Fire Data]

Saturday, January 26th, 2013

SPARQL with R in less than 5 minutes

From the post:

In this article we’ll get up and running on the Semantic Web in less than 5 minutes using SPARQL with R. We’ll begin with a brief introduction to the Semantic Web then cover some simple steps for downloading and analyzing government data via a SPARQL query with the SPARQL R package.

What is the Semantic Web?

To newcomers, the Semantic Web can sound mysterious and ominous. By most accounts, it’s the wave of the future, but it’s hard to pin down exactly what it is. This is in part because the Semantic Web has been evolving for some time but is just now beginning to take a recognizable shape (DuCharme 2011). Detailed definitions of the Semantic Web abound, but simply put, it is an attempt to structure the unstructured data on the Web and to formalize the standards that make that structure possible. In other words, it’s an attempt to create a data definition for the Web.

I will have to re-read Bob DuCharme’s “Learning SPARQL.” I didn’t realize the “Semantic Web” was beginning to “…take a recognizable shape.” After a decade of attempting to find an achievable agenda, it’s about time.

The varying interpretations of Semantic Web origin tales are quite amusing. In the first creation account, independent agents were going to schedule medical appointments and tennis matches for us. In the second account, our machines were going to reason across structured data to produce new insights. More recently, the vision is of a web of CMU Coke machines connected to the WWW, along with other devices. (The Internet of Things.)

I suppose the next version will be computers that can exchange information using the TCP/IP protocol and various standards, like HTML, for formatting documents. Plus some declaration that semantics will be handled in a future version, sufficiently far off to keep grant managers from fearing an end to the project.

The post is a good example of running SPARQL queries from R, and since you will encounter data at SPARQL endpoints, it is a useful exercise.

The example data set is one of wildfires and acres burned per year, 1960-2008.

More interesting fire data sets can be found at: Fire Detection GIS Data.

Mapping that data by date, weather conditions/trends, and known impact would require coordination between diverse data sets.

R Is Not So Hard! A Tutorial

Tuesday, January 15th, 2013

David Lillis is writing a tutorial on R under the title: R Is Not So Hard! A Tutorial.

So far:

Part 1: Basic steps with R.

Part 2: Creation of two variables and what can be done with them.

Part 3: Covers using a regression model.

Edd Dumbill calls out R by name in The future of programming.

Maps in R: Plotting data points on a map

Tuesday, January 15th, 2013

Maps in R: Plotting data points on a map by Max Marchi.

From the post:

In the introductory post of this series I showed how to plot empty maps in R.

Today I’ll begin to show how to add data to R maps. The topic of this post is the visualization of data points on a map.

Max continues this series with datasets from airports in Europe and demonstrates how to map the airports to geographic locations. He also represents the airports with icons that correspond to their traffic statistics.

Useful principles for any data set with events that can be plotted against geographic locations.

Parades, patrols, convoys, that sort of thing.

Using R with Hadoop [Webinar]

Monday, January 14th, 2013

Using R with Hadoop by David Smith.

From the post:

In two weeks (on January 24), Think Big Analytics' Jeffrey Breen will present a new webinar on using R with Hadoop. Here's the webinar description:

R and Hadoop are changing the way organizations manage and utilize big data. Think Big Analytics and Revolution Analytics are helping clients plan, build, test and implement innovative solutions based on the two technologies that allow clients to analyze data in new ways; exposing new insights for the business. Join us as Jeffrey Breen explains the core technology concepts and illustrates how to utilize R and Revolution Analytics’ RevoR in Hadoop environments.

Topics include:

  • How to use R and Hadoop
  • Hadoop streaming
  • Various R packages and RHadoop
  • Hive via JDBC/ODBC
  • Using Revolution’s RHadoop
  • Big data warehousing with R and Hive

You can register for the webinar at the link below. If you do plan to attend the live session (where you can ask Jeffrey questions), be sure to sign in early — we're limited to 1000 participants and there are already more than 1000 registrants. If you can't join the live session (or it's just not at a convenient time for you), signing up will also get you a link to the recorded replay and a download link for the slides as soon as they're available after the webinar.

Definitely one for the calendar!

R and Data Mining: Examples and Case Studies (Update)

Thursday, January 3rd, 2013

R and Data Mining: Examples and Case Studies by Yanchang Zhao.

The PDF version now includes chapters 7 and 9 (on which see: Book “R and Data Mining: Examples and Case Studies” on CRAN [blank chapters]), and only the case study chapters are omitted.

You will also find the R code for the book and an “R Reference Card for Data Mining.”

Enjoy!

100 most read R posts for 2012 [No Data = No Topic Maps]

Wednesday, January 2nd, 2013

Tal Galili writes in 100 most read R posts for 2012 (stats from R-bloggers) – big data, visualization, data manipulation, and other languages:

R-bloggers.com is now three years young. The site is an (unofficial) online journal of the R statistical programming environment, written by bloggers who agreed to contribute their R articles to the site.

Last year, I posted on the top 24 R posts of 2011. In this post I wish to celebrate R-bloggers’ third birthmonth by sharing with you:

  1. Links to the top 100 most read R posts of 2012
  2. Statistics on “how well” R-bloggers did this year
  3. My wishlist for the R community for 2013 (blogging about R, guest posts, and sponsors)

A number of posts on R that may be useful in data mining to create topic maps.

I retain my interest in the theory/cutting edge side of things. But discovering more than $1/2 trillion in untraceable payments in a government report is a thrill. See: The 560+ $Billion Shell Game.

It’s untraceable for members of the public. I am certain insiders at the OMB can trace it quite easily.

Which makes you wonder why they are hoarding that information?

Will try to season the blog with more data into topic maps type posts in 2013.

Suggestions and comments on potential data sets for topic maps most welcome!

Why Do the New Orleans Saints Lose?…

Thursday, December 27th, 2012

Why Do the New Orleans Saints Lose? Data Visualization II by Nathan Lemoine.

I’m not a nationalist, apparatchik, school, state, profession, class, religion, language or development approach booster.

I must confess, however, I am a New Orleans Saints fan. Diversity (read: other teams) is a necessary evil to give the Saints someone to beat. ;-)

An exercise you can repeat/expand with other teams (shudder), in other sports (shudder, shudder), to explore R and visualization of data.

What other stats/information would you want to incorporate/visualize?