Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

August 17, 2016

R Markdown

Filed under: R,R Markdown — Patrick Durusau @ 8:41 pm

R Markdown

From the webpage:

R Markdown provides an authoring framework for data science. You can use a single R Markdown file to both

  • save and execute code
  • generate high quality reports that can be shared with an audience

R Markdown documents are fully reproducible and support dozens of static and dynamic output formats. A 1-minute video on the linked page provides a quick tour of what’s possible with R Markdown.

I almost omitted this posting, reasoning that with LaTeX and XML, what other languages for composing documents are really necessary?

😉

I don’t suppose it will hurt to have a third language option for your authoring needs.
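
For the curious, here is a minimal sketch of an R Markdown file (the file name, chunk name, and prose are my own invention, not from the R Markdown site):

    ---
    title: "A minimal example"
    output: html_document
    ---

    Fuel economy falls as weight rises:

    ```{r mpg-plot}
    plot(mtcars$wt, mtcars$mpg)
    ```

Save it as example.Rmd and run rmarkdown::render("example.Rmd") to execute the chunk and generate the report in one step.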

Enjoy!

Text [R, Scraping, Text]

Filed under: Data,R,Web Scrapers — Patrick Durusau @ 8:31 pm

Text by Amelia McNamara.

Covers “scraping, text, and timelines.”

Using R, it focuses on scraping and works through some of “…Scott, Karthik, and Garrett’s useR tutorial.”

In case you don’t know the useR tutorial:

Also known as (AKA) “Extracting data from the web APIs and beyond”:

No matter what your domain of interest or expertise, the internet is a treasure trove of useful data that comes in many shapes, forms, and sizes, from beautifully documented fast APIs to data that need to be scraped from deep inside of 1990s html pages. In this 3 hour tutorial you will learn how to programmatically read in various types of web data from experts in the field (Founders of the rOpenSci project and the training lead of RStudio). By the end of the tutorial you will have a basic idea of how to wrap an R package around a standard API, extract common non-standard data formats, and scrape data into tidy data frames from web pages.

Covers other resources and materials.
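
To give you a taste before you dive in, here is a minimal scraping sketch using rvest (my choice of package for illustration; the tutorial covers its own toolset):

    library(rvest)
    # pull the section headings from a Wikipedia page
    page <- read_html("https://en.wikipedia.org/wiki/R_(programming_language)")
    page %>% html_nodes("h2") %>% html_text()

A handful of lines takes you from a URL to a tidy character vector, which is the tutorial’s whole point.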

Enjoy!

July 21, 2016

An analysis of Pokémon Go types, created with R

Filed under: Programming,R — Patrick Durusau @ 3:35 pm

An analysis of Pokémon Go types, created with R by David Smith.

From the post:

As anyone who has tried Pokémon Go recently is probably aware, Pokémon come in different types. A Pokémon’s type affects where and when it appears, and the types of attacks it is vulnerable to. Some types, like Normal, Water and Grass are common; others, like Fairy and Dragon are rare. Many Pokémon have two or more types.

To get a sense of the distribution of Pokémon types, Joshua Kunst used R to download data from the Pokémon API and created a treemap of all the Pokémon types (and for those with more than 1 type, the secondary type). Joshua’s original used the 800+ Pokémon from the modern universe, but I used his R code to recreate the map for the 151 original Pokémon used in Pokémon Go.

If you or your dog need a break from Pokémon Go, check out this post!

You will get some much needed rest, polish up your R skills and perhaps learn something about the Pokémon API.
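
If you want to poke at the API directly, here is a hedged sketch (the endpoint and field names are my reading of the public API, not Joshua’s code):

    library(httr)
    library(jsonlite)
    resp <- GET("https://pokeapi.co/api/v2/pokemon/pikachu")
    poke <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
    poke$types$type$name   # "electric"

From there it is a short walk to tabulating types across Pokémon and feeding the counts to a treemap.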

The Pokémon Go craze brings to mind the potential for alternative location-based games, ones where reaching a location calls for steady nerves and social engineering skills. That definitely has potential.

Say a spy-vs-spy character at a location near a “secret” military base? 😉

July 2, 2016

Developing Expert p-Hacking Skills

Filed under: Peer Review,Psychology,Publishing,R,Statistics — Patrick Durusau @ 4:00 pm

Introducing the p-hacker app: Train your expert p-hacking skills by Ned Bicare.

Ned’s p-hacker app will be welcomed by everyone who publishes where p-values are accepted.

Publishers should require authors and reviewers to submit six p-hacker app results along with any draft that contains, or is a review of, p-values.

The p-hacker app results won’t improve a draft or review, but comparing them to the draft will improve the publication in which that draft might otherwise have appeared.

From the post:

My dear fellow scientists!

“If you torture the data long enough, it will confess.”

This aphorism, attributed to Ronald Coase, has sometimes been used in a disrespectful manner, as if it were wrong to do creative data analysis.

In fact, the art of creative data analysis has experienced despicable attacks over the last few years. A small but annoyingly persistent group of second-stringers tries to denigrate our scientific achievements. They drag psychological science through the mire.

These people propagate stupid method repetitions; and what was once one of the supreme disciplines of scientific investigation – a creative data analysis of a data set – has been crippled to conducting an empty-headed step-by-step pre-registered analysis plan. (Come on: If I lay out the full analysis plan in a pre-registration, even an undergrad student can do the final analysis, right? Is that really the high-level scientific work we were trained for so hard?).

They broadcast at an annoying frequency that p-hacking leads to more significant results, and that researchers who use p-hacking have higher chances of getting things published.

What are the consequences of these findings? The answer is clear. Everybody should be equipped with these powerful tools of research enhancement!

The art of creative data analysis

Some researchers describe a performance-oriented data analysis as “data-dependent analysis”. We go one step further, and call this technique data-optimal analysis (DOA), as our goal is to produce the optimal, most significant outcome from a data set.

I developed an online app that allows you to practice creative data analysis and polish your p-values. It’s primarily aimed at young researchers who do not yet have our level of expertise, but I guess even old hands might learn one or two new tricks! It’s called “The p-hacker” (please note that ‘hacker’ is meant in a very positive way here. You should think of the cool hackers who fight for world peace). You can use the app in teaching, or to practice p-hacking yourself.

Please test the app, and give me feedback! You can also send it to colleagues: http://shinyapps.org/apps/p-hacker.
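
In the same satirical spirit, here is a minimal sketch of one classic p-hacking move, optional stopping. It is not Ned’s app code, just base R showing why the “tool” works:

    # keep adding subjects until the t-test crosses p < .05 (or we give up)
    set.seed(42)
    p_hack <- function(n_start = 10, n_max = 100) {
      x <- rnorm(n_start); y <- rnorm(n_start)   # no true effect at all
      while (t.test(x, y)$p.value >= 0.05 && length(x) < n_max) {
        x <- c(x, rnorm(1)); y <- c(y, rnorm(1))
      }
      t.test(x, y)$p.value
    }
    mean(replicate(1000, p_hack()) < 0.05)   # false positive rate well above 5%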

Enjoy!

June 29, 2016

Computerworld’s advanced beginner’s guide to R

Filed under: Programming,R — Patrick Durusau @ 7:15 pm

Computerworld’s advanced beginner’s guide to R by David Smith.

From the post:

Many newcomers to R got their start learning the language with Computerworld’s Beginner’s Guide to R, a 6-part introduction to the basics of the language. Now, budding R users who want to take their skills to the next level have a new guide to help them: Computerworld’s Advanced Beginner’s Guide to R. Written by Sharon Machlis, author of the prior Beginner’s Guide and regular reporter of R news at Computerworld, this new 72-page guide dives into some trickier topics related to R: extracting data via APIs, data wrangling, and data visualization.

Well, what are you waiting for?

Either read it or pass it along!

Enjoy!

June 28, 2016

Integrated R labs for high school students

Filed under: Programming,R,Statistics,Teaching — Patrick Durusau @ 6:56 pm

Integrated R labs for high school students by Amelia McNamara.

From the webpage:

Amelia McNamara, James Molyneux, Terri Johnson

This looks like a very promising approach for capturing the interests of high school students in statistics and R.

From the larger project, Mobilize, curriculum page:

Mobilize centers its curricula around participatory sensing campaigns in which students use their mobile devices to collect and share data about their communities and their lives, and to analyze these data to gain a greater understanding about their world. Mobilize breaks barriers by teaching students to apply concepts and practices from computer science and statistics in order to learn science and mathematics. Mobilize is dynamic: each class collects its own data, and each class has the opportunity to make unique discoveries. We use mobile devices not as gimmicks to capture students’ attention, but as legitimate tools that bring scientific enquiry into our everyday lives.

Mobilize comprises four key curricula: Introduction to Data Science (IDS), Algebra I, Biology, and Mobilize Prime, all focused on preparing students to live in a data-driven world. The Mobilize curricula are a unique blend of computational and statistical thinking subject matter content that teaches students to think critically about and with data. The Mobilize curricula utilize innovative mobile technology to enhance math and science classroom learning. Mobilize brings “Big Data” into the classroom in the form of participatory sensing, a hands-on method in which students use mobile devices to collect data about their lives and community, then use Mobilize Visualization tools to analyze and interpret the data.

I like the approach of having students collect and process their own data. If they learn to question their own data and processes, hopefully they will ask questions about data processing results presented as “facts.” (Since 2016 is a presidential election year in the United States, questioning claimed data results is especially important.)

Enjoy!

June 9, 2016

ggplot2 – Elegant Graphics for Data Analysis – At Last Call

Filed under: Ggplot2,Graphics,R — Patrick Durusau @ 4:53 pm

ggplot2 – Elegant Graphics for Data Analysis by Hadley Wickham.

Hadley tweeted today that the online version of “ggplot2” is still up but will be removed after publication.

If you want/need a digital copy, now would be a good time to acquire one.

May 22, 2016

Modeling data with functional programming – State based systems

Filed under: Functional Programming,R — Patrick Durusau @ 9:06 pm

Modeling data with functional programming – State based systems by Brian Lee Yung Rowe.

Brian has just released chapter 8 of his Modeling data with functional programming in R, State based systems.

BTW, Brian mentions that his editor is looking for more proof reviewers.

Enjoy!

May 5, 2016

Efficient R programming

Filed under: Programming,R — Patrick Durusau @ 9:24 am

Efficient R programming by Colin Gillespie and Robin Lovelace.

From the present text of Chapter 2:

An efficient computer set-up is analogous to a well-tuned vehicle: its components work in harmony, it is well-serviced, and it is fast. This chapter describes the software decisions that will enable a productive workflow. Starting with the basics and moving to progressively more advanced topics, we explore how the operating system, R version, startup files and IDE can make your R work faster (though IDE could be seen as basic need for efficient programming). Ensuring correct configuration of these elements will have knock-on benefits in many aspects of your R workflow. That’s why we cover them at this early stage (hardware, the other fundamental consideration, is covered in the next chapter). By the end of this chapter you should understand how to set-up your computer and R installation (skip to section 2.3 if R is not already installed on your computer) for optimal computational and programmer efficiency. It covers the following topics:

  • R and the operating systems: system monitoring on Linux, Mac and Windows
  • R version: how to keep your base R installation and packages up-to-date
  • R start-up: how and why to adjust your .Rprofile and .Renviron files
  • RStudio: an integrated development environment (IDE) to boost your programming productivity
  • BLAS and alternative R interpreters: looks at ways to make R faster

For lazy readers, and to provide a taster of what’s to come, we begin with our ‘top 5’ tips for an efficient R set-up. It is important to understand that efficient programming is not simply the result of following a recipe of tips: understanding is vital for knowing when to use a memorised solution to a problem and when to go back to first principles. Thinking about and understanding R in depth, e.g. by reading this chapter carefully, will make efficiency second nature in your R workflow.

Nope, go see Chapter 2 if you want the top 5 tips for efficient R set-up.
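
Since the chapter covers start-up files, here is a sketch of the kind of .Rprofile adjustments it has in mind (the particular settings below are my illustrations, not the authors’ recommendations):

    # ~/.Rprofile runs at the start of every R session
    options(repos = c(CRAN = "https://cran.rstudio.com/"))   # skip the mirror prompt
    options(digits = 5, prompt = "R> ")
    .First <- function() message("Loaded .Rprofile at ", date())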

The text and code are being developed at the website and the authors welcome “pull requests and general comments.”

Don’t be shy!

April 10, 2016

NSA Grade – Network Visualization with Gephi

Filed under: Gephi,Networks,R,Visualization — Patrick Durusau @ 5:07 pm

Network Visualization with Gephi by Katya Ognyanova.

It’s not possible to cover Gephi in sixteen (16) pages, but you will wear out more than one printed copy of those pages as you become experienced with Gephi.

This version is from a Gephi workshop at Sunbelt 2016.

Katya’s homepage offers a wealth of network visualization posts and extensive use of R.

Follow her at @Ognyanova.

PS: Gephi equals or exceeds visualization capabilities in use by the NSA, depending upon your skill as an analyst and the quality of the available data.

April 6, 2016

Advanced Data Mining with Weka – Starts 25 April 2016

Filed under: Machine Learning,Python,R,Spark,Weka — Patrick Durusau @ 4:43 pm

Advanced Data Mining with Weka by Ian Witten.

From the webpage:

This course follows on from Data Mining with Weka and More Data Mining with Weka. It provides a deeper account of specialized data mining tools and techniques. Again the emphasis is on principles and practical data mining using Weka, rather than mathematical theory or advanced details of particular algorithms. Students will analyse time series data, mine data streams, use Weka to access other data mining packages including the popular R statistical computing language, script Weka in Python, and deploy it within a cluster computing framework. The course also includes case studies of applications such as classifying tweets, functional MRI data, image classification, and signal peptide prediction.

The syllabus: https://weka.waikato.ac.nz/advanceddataminingwithweka/assets/pdf/syllabus.pdf.

Advanced Data Mining with Weka is open for enrollment and starts 25 April 2016.

Five very intense weeks await!

Will you be there?

I first saw this in a tweet by Alyona Medelyan.

March 14, 2016

APL in R: “The past isn’t dead. It isn’t even past.”*

Filed under: Arrays,Programming,R — Patrick Durusau @ 8:13 pm

APL in R by Jan de Leeuw and Masanao Yajima.

From the introduction:

APL was introduced by Iverson (1962). It is an array language, with many functions to manipulate multidimensional arrays. R also has multidimensional arrays, but not as many functions to work with them.

In R there are no scalars, there are vectors of length one. For a vector x in R we have dim(x) equal to NULL and length(x) > 0. For an array, including a matrix, we have length(dim(x)) > 0. APL is an array language, which means everything is an array. For each array both the shape ⍴A and the rank ⍴⍴A are defined. Scalars are arrays with shape equal to one, vectors are arrays with rank equal to one.

If you want to evaluate APL expressions using a traditional APL virtual keyboard, we recommend the nice webpage at ngn.github.io/apl/web/index.html. EliStudio at fastarray.appspot.com/default.html is essentially an APL interpreter running in a Qt GUI, using ascii symbols and symbol-pairs to replace traditional APL symbols (Chen and Ching (2013)). Eli does not have nested arrays. It does have ecc, which compiles eli to C.

In 1994 one of us coded most APL array operations in XLISP-STAT. The code is still available at gifi.stat.ucla.edu/apl.
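
The shape/rank distinction is easy to verify at the R console; the lines below just restate the introduction’s claims:

    x <- 1:5
    dim(x)           # NULL: vectors carry no dim attribute
    length(x)        # 5
    A <- array(1:24, dim = c(2, 3, 4))
    dim(A)           # 2 3 4  -- the shape, APL's ⍴A
    length(dim(A))   # 3      -- the rank, APL's ⍴⍴A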

Certainly this will be useful for R programmers, but more generally I am curious: is there a genealogy of functions across programming languages?

Enjoy!

*Apologies to William Faulkner.

February 24, 2016

Visualizing the Clinton Email Network in R

Filed under: Networks,R,Visualization — Patrick Durusau @ 5:04 pm

Visualizing the Clinton Email Network in R by Bob Rudis.

From the post:

This isn’t a post about politics. I do have opinions about the now infamous e-mail server (which will no doubt come out here), but when the WSJ folks made it possible to search the Clinton email releases I thought it would be fun to get the data into R to show how well the igraph and ggnetwork packages could work together, and also show how to use svgPanZoom to make it a bit easier to poke around the resulting hairball network.

NOTE: There are a couple “Assignment” blocks in here. My Elements of Data Science students are no doubt following the blog by now so those are meant for you 🙂 Other intrepid readers can ignore them.

A great walk-through on importing, analyzing, and visualizing any email archive, not just Hillary’s.
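
The core igraph + ggnetwork combination Bob uses looks roughly like this (a toy edge list of my own, not the WSJ data; recent ggnetwork versions accept igraph objects directly):

    library(igraph)
    library(ggplot2)
    library(ggnetwork)
    edges <- data.frame(from = c("Clinton", "Clinton", "Abedin"),
                        to   = c("Abedin", "Mills", "Mills"))
    g <- graph_from_data_frame(edges)
    ggplot(ggnetwork(g), aes(x, y, xend = xend, yend = yend)) +
      geom_edges(color = "grey50") +
      geom_nodes(size = 4) +
      geom_nodetext(aes(label = name), vjust = -1) +
      theme_blank()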

You will quickly find that “…connecting the dots…” isn’t as useful as the intelligence community would have you believe.

Yes, yes! There is a call to Papa John’s! Oh, that’s not a code name, that’s a pizza place. (Even suspected terrorists have to eat.)

Great to have the dots. Great to have connections. Not so great if that is all that you have.

I found a number of other interesting posts at Bob’s blog: http://rud.is/b/.

Including: Dairy-free Parker House Rolls! I bake fairly often so am susceptible to this sort of posting. Looks very good!

February 19, 2016

How I build up a ggplot2 figure [Class Response To ggplot2 criticism]

Filed under: Ggplot2,R,Visualization — Patrick Durusau @ 8:50 pm

How I build up a ggplot2 figure by John Muschelli.

From the post:

Recently, Jeff Leek at Simply Statistics discussed why he does not use ggplot2. He notes “The bottom line is for production graphics, any system requires work.” and describes a default plot that needs some work.

John responds to perceived issues with using ggplot2 by walking through each issue and providing you with examples of how to solve it.

That doesn’t mean that you will switch to ggplot2, but it does mean you will be better informed of your options.
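
The flavor of the approach, reduced to a toy (my example, not John’s figure): start from the default plot and fix one complaint per layer.

    library(ggplot2)
    ggplot(mtcars, aes(wt, mpg)) +
      geom_point() +                                             # the default plot
      labs(x = "Weight (1,000 lbs)", y = "Miles per gallon") +   # readable labels
      theme_minimal(base_size = 14) +                            # lighter theme
      theme(panel.grid.minor = element_blank())                  # less visual noise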

An example to be copied!

February 17, 2016

Rectal and Other Data

Filed under: R,Visualization — Patrick Durusau @ 3:35 pm

Hadley Wickham has posted neiss:

The neiss package provides access to all data (2009-2014) from the National Electronic Injury Surveillance System, which is a sample of all accidents reported to emergency rooms in the US.

You will recall this is the data set used by Nathan Yau in NSFW: Million to One Shot, Doc, an analysis of rectal injuries.

A lack of features in the data prevents some types of analysis, for example plotting the type of object involved as a function of patient weight.

I’m sure there are other patterns (seasonal?) that you can derive from the data.
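
A sketch to get you started, assuming the package exposes an injuries table with a body_part column as described in its documentation:

    # devtools::install_github("hadley/neiss")
    library(neiss)
    library(dplyr)
    injuries %>%
      count(body_part, sort = TRUE) %>%   # most frequently injured body parts
      head()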

Enjoy!

PS: R library.

February 16, 2016

Katia – rape screening in R

Filed under: Image Processing,Image Recognition,R — Patrick Durusau @ 8:54 pm

Katia – rape screening in R

From the webpage:

It’s Not Enough to Condemn Violence Against Women. We Need to End It.

All 12 innocent female victims above were atrociously killed, sexually assaulted, or registered missing after meeting strangers on mainstream dating, personals, classifieds, or social networking services.

INTRODUCTION TO THE KATIA RAPE SCREEN

Those 12 beautiful faces in the gallery above, are our sisters and daughters. Looking at their pictures is like looking through a tiny pinhole onto an unprecedented rape and domestic violence crisis that is destroying the American family unit.

Verified by science, the KATIA rape screen, coded in the computer programming language, R, can provably stop a woman from ever meeting her attacker.

The technology is named after a RAINN-counseled first degree aggravated rape survivor named Katia.

It is based on the work of a Google engineer from the Reverse Image Search project and a RAINN (Rape, Abuse & Incest National Network) counselor, with a clinical background in mathematical statistics, who has over a period of 15 years compiled a linguistic pattern analysis of the messages that rapists use to lure women online.

Learn more about the science behind Katia.

This project is taking concrete steps to reduce violence against women.

What more is there to say?

February 15, 2016

networkD3: D3 JavaScript Network Graphs from R

Filed under: D3,Graphs,Javascript,Networks,R — Patrick Durusau @ 5:41 pm

networkD3: D3 JavaScript Network Graphs from R by Christopher Gandrud, JJ Allaire, & Kent Russell.

From the post:

This is a port of Christopher Gandrud’s R package d3Network for creating D3 network graphs to the htmlwidgets framework. The htmlwidgets framework greatly simplifies the package’s syntax for exporting the graphs and improves integration with RStudio’s Viewer Pane, RMarkdown, and Shiny web apps. See below for examples.

It currently supports three types of network graphs:

I haven’t compared this to GraphViz but the Sankey diagram option is impressive!
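
The simplest of the three takes little more than a two-column edge list (toy data of mine):

    library(networkD3)
    edges <- data.frame(src    = c("A", "A", "B"),
                        target = c("B", "C", "C"))
    simpleNetwork(edges)   # interactive force-directed graph in the viewer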

February 12, 2016

Tufte in R

Filed under: R,Visualization — Patrick Durusau @ 7:41 pm

Tufte in R by Lukasz Piwek.

From the post:

The idea behind Tufte in R is to use R – the most powerful open-source statistical programming language – to replicate excellent visualisation practices developed by Edward Tufte. It’s not a novel approach – there are plenty of excellent R functions and related packages written by people who have much more expertise in programming than myself. I simply collect those resources in one place in an accessible and replicable format, adding a few bits of my own coding discoveries.

Piwek says his idea isn’t novel but I am sure this will be of interest to both R and Tufte fans!

Is anyone else working through the Tufte volumes in R or Processing?

Those would be great projects to have bookmarked.
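
If you want a two-minute taste while deciding, here is a sketch assuming the ggthemes package (one of the usual resources in this genre, not necessarily one Piwek uses):

    library(ggplot2)
    library(ggthemes)
    ggplot(mtcars, aes(wt, mpg)) +
      geom_point() +
      geom_rangeframe() +   # Tufte's range-frame axes
      theme_tufte()         # minimal-ink theme, serif typeface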

February 10, 2016

Build your own neural network classifier in R

Filed under: Classifier,Neural Networks,R — Patrick Durusau @ 5:14 pm

Build your own neural network classifier in R by Jun Ma.

From the post:

Image classification is one important field in Computer Vision, not only because so many applications are associated with it, but also because a lot of Computer Vision problems can be effectively reduced to image classification. The state-of-the-art tool in image classification is the Convolutional Neural Network (CNN). In this article, I am going to write a simple Neural Network with 2 layers (fully connected). First, I will train it to classify a set of 4-class 2D data and visualize the decision boundary. Second, I am going to train my NN with the famous MNIST data (you can download it here: https://www.kaggle.com/c/digit-recognizer/download/train.csv) and see its performance. The first part is inspired by CS 231n course offered by Stanford: http://cs231n.github.io/, which is taught in Python.

One suggestion, based on some unrelated reading: don’t copy-n-paste the code.

Key in the code so you will get accustomed to your typical typing mistakes, which are no doubt different from mine!

Plus you will develop muscle memory in your fingers and code will either “look right” or not.
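
Once you have keyed in Jun’s network, here is a baseline to check it against, a sketch using the nnet package (a single hidden layer, my choice, not his from-scratch code):

    library(nnet)
    set.seed(1)
    idx  <- sample(nrow(iris), 100)
    fit  <- nnet(Species ~ ., data = iris[idx, ], size = 5,
                 decay = 1e-3, maxit = 200, trace = FALSE)
    pred <- predict(fit, iris[-idx, ], type = "class")
    mean(pred == iris$Species[-idx])   # held-out accuracy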

Enjoy!

PS: For R, Jun’s blog looks like one you need to start following!

February 8, 2016

Data from the World Health Organization API

Filed under: Medical Informatics,R,Visualization — Patrick Durusau @ 11:28 am

Data from the World Health Organization API by Peter’s stats stuff – R.

From the post:

Eric Persson released yesterday a new WHO R package which allows easy access to the World Health Organization’s data API. He’s also done a nice vignette introducing its use.

I had a play and found it was easy access to some interesting data. Some time down the track I might do a comparison of this with other sources, the most obvious being the World Bank’s World Development Indicators, to identify relative advantages – there’s a lot of duplication of course. It’s a nice problem to have, too much data that’s too easy to get hold of. I wish we’d had that problem when I studied aid and development last century – I vividly remember re-keying numbers from almanac-like hard copy publications, and pleased we were to have them too!

Here’s a plot showing country-level relationships between the latest data of three indicators – access to contraception, adolescent fertility, and infant mortality – that help track the Millennium Development Goals.

With visualizations and R code!
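
Getting started is about as short as API access gets; a sketch assuming the package’s get_codes()/get_data() interface described in the vignette:

    library(WHO)
    codes <- get_codes()                 # catalogue of available indicators
    life  <- get_data("WHOSIS_000001")   # life expectancy at birth
    head(life)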

A nice way to start off your data mining week!

Enjoy!

I first saw this in a tweet by Christophe Lalanne.

January 13, 2016

Using ‘R’ for betting analysis [Data Science For The Rest Of Us]

Filed under: Data Analysis,R — Patrick Durusau @ 4:45 pm

Using ‘R’ for betting analysis by Mirio Mella.

From the post:

Gaining an edge in betting often boils down to intelligent data analysis, but faced with daunting amounts of data it can be hard to know where to start. If this sounds familiar, R – an increasingly popular statistical programming language widely used for data analysis – could be just what you’re looking for.

What is R?

R is a statistical programming language that is used to visualize and analyse data. Okay, this sounds a little intimidating but actually it isn’t as scary as it may appear. Its creators – two professors from New Zealand – wanted an intuitive statistical platform that their students could use to slice and dice data and create interesting visual representations like 3D graphs.

Given its relative simplicity but endless scope for applications (packages) R has steadily gained momentum amongst the world’s brightest statisticians and data scientists. Facebook use R for statistical analysis of status updates and many of the complex word clouds you might see online are powered by R.

There are now thousands of user created libraries to enhance R functionality and given how much successful betting boils down to effective data analysis, packages are being created to perform betting related analysis and strategies.

On a day when the PowerBall lottery has a jackpot of $1.5 billion, a post on betting analysis is appropriate.

Especially since most data science articles are about sentiment analysis and recommendations, all of which is great if you are marketing videos in a streaming environment across multiple media channels.

At home? Not so much.

Mirio’s introduction to R walks you through getting R installed along with a library for Pinnacle Sports for odds conversion.

No guarantees on your betting performance but having a subject you are interested in, betting, makes it much more likely you will learn R.
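
Even before touching a library, the core arithmetic of odds conversion fits in a few lines of base R (the numbers are made up for illustration):

    odds <- c(home = 2.50, draw = 3.40, away = 2.90)   # decimal odds
    raw  <- 1 / odds      # implied probabilities; they sum to more than 1
    sum(raw)              # the bookmaker's overround, here about 1.04
    raw / sum(raw)        # normalized "fair" probabilities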

Enjoy!

January 6, 2016

Ggplot2 Quickref

Filed under: Charts,Ggplot2,R — Patrick Durusau @ 5:04 pm

Ggplot2 Quickref by Selva Prabhakaran.

If you use ggplot2, map this to a “hot” key on your keyboard.

Enjoy!

January 5, 2016

Jane, John … Leslie? A Historical Method for Algorithmic Gender Prediction [Gatekeeping]

Filed under: History,R,Text Mining — Patrick Durusau @ 7:43 pm

Jane, John … Leslie? A Historical Method for Algorithmic Gender Prediction by Cameron Blevins and Lincoln Mullen.

Abstract:

This article describes a new method for inferring the gender of personal names using large historical datasets. In contrast to existing methods of gender prediction that treat names as if they are timelessly associated with one gender, this method uses a historical approach that takes into account how naming practices change over time. It uses historical data to measure the likelihood that a name was associated with a particular gender based on the time or place under study. This approach generates more accurate results for sources that encompass changing periods of time, providing digital humanities scholars with a tool to estimate the gender of names across large textual collections. The article first describes the methodology as implemented in the gender package for the R programming language. It goes on to apply the method to a case study in which we examine gender and gatekeeping in the American historical profession over the past half-century. The gender package illustrates the importance of incorporating historical approaches into computer science and related fields.

An excellent introduction to the gender package for R, historical grounding of the detection of gender by name, with the highlight of the article being the application of this technique to professional literature in American history.
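
The package wears the article’s thesis on its sleeve; here is a quick check of the titular name (years and method per the package’s SSA option):

    library(gender)
    gender("leslie", years = c(1930, 1931), method = "ssa")
    gender("leslie", years = c(2000, 2001), method = "ssa")
    # the same name shifts from predominantly male to predominantly female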

It isn’t uncommon to find statistical techniques applied to texts whose authors and editors are beyond the reach of any critic or criticism.

It is less than common to find statistical techniques applied to extant members of a profession.

Kudos to both Blevins and Mullen for refining the detection of gender and for applying that refinement to publishing in American history.

January 4, 2016

rOpenSci (updated tutorials) [Learn Something, Write Something]

Filed under: Open Data,Open Science,R — Patrick Durusau @ 9:47 pm

rOpenSci has updated 16 of its tutorials!

More are on the way!

Need a detailed walk through of what our packages allow you to do? Click on a package below, quickly install it and follow along. We’re in the process of updating existing package tutorials and adding several more in the coming weeks. If you find any bugs or have comments, drop a note in the comments section or send us an email. If a tutorial is available in multiple languages we indicate that with badges, e.g., (English) (Português).

  • alm    Article-level metrics
  • antweb    AntWeb data
  • aRxiv    Access to arXiv text
  • bold    Barcode data
  • ecoengine    Biodiversity data
  • ecoretriever    Retrieve ecological datasets
  • elastic    Elasticsearch R client
  • fulltext    Text mining client
  • geojsonio    GeoJSON/TopoJSON I/O
  • gistr    Work w/ GitHub Gists
  • internetarchive    Internet Archive client
  • lawn    Geospatial Analysis
  • musemeta    Scrape museum metadata
  • rAltmetric    Altmetric.com client
  • rbison    Biodiversity data from USGS
  • rcrossref    Crossref client
  • rebird    eBird client
  • rentrez    Entrez client
  • rerddap    ERDDAP client
  • rfisheries    OpenFisheries.org client
  • rgbif    GBIF biodiversity data
  • rinat    Inaturalist data
  • RNeXML    Create/consume NeXML
  • rnoaa    Client for many NOAA datasets
  • rplos    PLOS text mining
  • rsnps    SNP data access
  • rvertnet    VertNet.org biodiversity data
  • rWBclimate    World Bank Climate data
  • solr    SOLR database client
  • spocc    Biodiversity data one stop shop
  • taxize    Taxonomic toolbelt
  • traits    Trait data
  • treebase    Treebase data
  • wellknown    Well-known text <-> GeoJSON
  • More tutorials on the way.

Good documentation is hard to come by and good tutorials even more so.

Yet, here at rOpenSci you will find thirty-four (34) tutorials and more on the way.

Let’s answer that moronic security saying: See Something, Say Something, with:

Learn Something, Write Something.

December 29, 2015

Great R packages for data import, wrangling & visualization [+ XQuery]

Filed under: Data Mining,R,Visualization,XQuery — Patrick Durusau @ 5:37 pm

Great R packages for data import, wrangling & visualization by Sharon Machlis.

From the post:

One of the great things about R is the thousands of packages users have written to solve specific problems in various disciplines — analyzing everything from weather or financial data to the human genome — not to mention analyzing computer security-breach data.

Some tasks are common to almost all users, though, regardless of subject area: data import, data wrangling and data visualization. The table below shows my favorite go-to packages for each of these three tasks (plus a few miscellaneous ones tossed in). The package names in the table are clickable if you want more information. To find out more about a package once you’ve installed it, type help(package = "packagename") in your R console (of course substituting the actual package name).

Forty-seven (47) “favorites” sounds a bit on the high side but some people have more than one “favorite” ice cream, or obsession. 😉

You know how I feel about sort-order and I could not detect an obvious one in Sharon’s listing.

So, I extracted the package links/name plus the short description into a new table:

car data wrangling
choroplethr mapping
data.table data wrangling, data analysis
devtools package development, package installation
downloader data acquisition
dplyr data wrangling, data analysis
DT data display
dygraphs data visualization
editR data display
fitbitScraper misc
foreach data wrangling
ggplot2 data visualization
gmodels data wrangling, data analysis
googlesheets data import, data export
googleVis data visualization
installr misc
jsonlite data import, data wrangling
knitr data display
leaflet mapping
listviewer data display, data wrangling
lubridate data wrangling
metricsgraphics data visualization
openxlsx misc
plotly data visualization
plyr data wrangling
psych data analysis
quantmod data import, data visualization, data analysis
rcdimple data visualization
RColorBrewer data visualization
readr data import
readxl data import
reshape2 data wrangling
rga Web analytics
rio data import, data export
RMySQL data import
roxygen2 package development
RSiteCatalyst Web analytics
rvest data import, web scraping
scales data wrangling
shiny data visualization
sqldf data wrangling, data analysis
stringr data wrangling
tidyr data wrangling
tmap mapping
XML data import, data wrangling
zoo data wrangling, data analysis

Enjoy!


I want to use XQuery at least once a day in 2016 on my blog. To keep myself honest, I will be posting any XQuery I use.

To sort and extract two of the columns from Sharon’s table, I copied the table to a separate file and ran this XQuery:

  1. xquery version "1.0";
  2. <html>
  3. <table>{
  4. for $row in doc("/home/patrick/working/favorite-R-packages.xml")/table/tr
  5. order by lower-case(string($row/td[1]/a))
  6. return <tr>{$row/td[1]} {$row/td[2]}</tr>
  7. }</table>
  8. </html>

One of the nifty aspects of XQuery is that you can sort, as on line 5, in all lower-case on the first <td> element, while returning the same element as written in the original table. Which gives better (IMHO) sort order than UPPERCASE followed by lowercase.

This same technique should make you the master of any simple tables you encounter on the web.

PS: You should always acknowledge the source of your data and the original author.

I first saw Sharon’s list in a tweet by Christophe Lalanne.

December 26, 2015

Fun with facets in ggplot2 2.0

Filed under: Facets,Ggplot2,R — Patrick Durusau @ 1:22 pm

Fun with facets in ggplot2 2.0 by Bob Rudis.

From the post:

ggplot2 2.0 provides new facet labeling options that make it possible to create beautiful small multiple plots or panel charts without resorting to icky grob manipulation.

Very appropriate for this year in Georgia (US) at any rate. Facets are used to display temperature by year and temperature versus kWh by year.
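
A taste of the labelling options with data that ships with ggplot2 (my toy, not Bob’s weather data):

    library(ggplot2)
    ggplot(economics_long, aes(date, value)) +
      geom_line() +
      facet_wrap(~variable, scales = "free_y", labeller = label_both)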

The high today, 26th of December, 2015, is projected to be 77°F.

Sigh, that’s just not December weather.

December 21, 2015

ggplot 2.0.0

Filed under: Ggplot2,Graphics,R,Visualization — Patrick Durusau @ 6:25 pm

ggplot 2.0.0 by Hadley Wickham.

From the post:

I’m very pleased to announce the release of ggplot2 2.0.0. I know I promised that there wouldn’t be any more updates, but while working on the 2nd edition of the ggplot2 book, I just couldn’t stop myself from fixing some long standing problems.

On the scale of ggplot2 releases, this one is huge with over one hundred fixes and improvements. This might break some of your existing code (although I’ve tried to minimise breakage as much as possible), but I hope the new features make up for any short term hassle. This blog post documents the most important changes:

  • ggplot2 now has an official extension mechanism.
  • There are a handful of new geoms, and updates to existing geoms.
  • The default appearance has been thoroughly tweaked so most plots should look better.
  • Facets have a much richer set of labelling options.
  • The documentation has been overhauled to be more helpful, and require less navigation across multiple pages.
  • A number of older and less used features have been deprecated.

These are described in more detail below. See the release notes for a complete list of all changes.

It’s one thing to find an error in the statistics of a research paper.

It is quite another to visualize the error in a captivating way.

No guarantees for some random error but ggplot 2.0.0 is one of the right tools for such a job.

December 18, 2015

Buzzfeed uses R for Data Journalism

Filed under: Journalism,News,R,Reporting — Patrick Durusau @ 11:36 am

Buzzfeed uses R for Data Journalism by David Smith.

From the post:

Buzzfeed isn't just listicles and cat videos these days. Science journalist Peter Aldhous recently joined Buzzfeed's editorial team, after stints at Nature, Science and New Scientist magazines. He brings with him his data journalism expertise and R programming skills to tell compelling stories with data on the site. His stories, like this one on the rates of terrorism incidents in the USA, often include animated maps or interactive charts created with R. 

Data journalists and would-be data journalists should be following the use of R and Python at Buzzfeed.

You don’t have to read Buzzfeed (I have difficulty with its concept of “news”), as David points out a way to follow all the Buzzfeed projects that make it to GitHub.

See David’s post for other great links.

Enjoy!

December 12, 2015

Fun with ddR: Using Distributed Data Structures in R [Your Holiday Quiet Spot]

Filed under: Distributed Computing,Distributed Systems,R — Patrick Durusau @ 5:52 pm

Fun with ddR: Using Distributed Data Structures in R by Edward Ma and Vishrut Gupta (Hewlett Packard Enterprise).

From the post:

A few weeks ago, we revealed ddR (Distributed Data-structures in R), an exciting new project started by R-Core, Hewlett Packard Enterprise, and others that provides a fresh new set of computational primitives for distributed and parallel computing in R. The package sets the seed for what may become a standardized and easy way to write parallel algorithms in R, regardless of the computational engine of choice.

In designing ddR, we wanted to keep things simple and familiar. We expose only a small number of new user functions that are very close in semantics and API to their R counterparts. You can read the introductory material about the package here. In this post, we show how to use ddR functions.
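
A hedged sketch distilled from the introductory examples, with dmapply as the core primitive and collect() pulling results back to the master:

    library(ddR)   # CRAN, or github.com/vertica/ddR for the development version
    a <- dmapply(function(x) x + 1, 1:5)   # computed as a distributed list
    collect(a)                             # list(2, 3, 4, 5, 6)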

Imagine that you are trapped after an indeterminate holiday meal in the TV room where A Christmas Story is playing for the fourth time that day.

You are at the point of saying/doing something that will offend the living members of your spouse’s family and generations to come.

What can you do?

Surely your powers of concentration exceed those of bridge players who claim to not see naked people cavorting about during bridge games.

Pull up the ddR post on your smartphone, read it and jump to the documentation and/or example programs.

You will have to be woken out of your reverie and handed your coat when it is time to go.

Well, maybe not exactly but it beats the hell out of biting one of your smaller relatives.

December 4, 2015

3 ways to win “Practical Data Science with R”! (Contest ends December 12, 2015 at 11:59pm EST)

Filed under: Contest,Data Science,R — Patrick Durusau @ 5:25 pm

3 ways to win “Practical Data Science with R”!

Renee is running a contest to give away three copies of “Practical Data Science with R” by Nina Zumel and John Mount!

You must enter on or before December 12, 2015 at 11:59pm EST.

Three ways to win, see Renee’s post for the details!
