Archive for the ‘Programming’ Category

MIT License Wins Converts (some anyway)

Friday, September 22nd, 2017

Relicensing React, Jest, Flow, and Immutable.js by Adam Wolff.

From the post:

Next week, we are going to relicense our open source projects React, Jest, Flow, and Immutable.js under the MIT license. We’re relicensing these projects because React is the foundation of a broad ecosystem of open source software for the web, and we don’t want to hold back forward progress for nontechnical reasons.

This decision comes after several weeks of disappointment and uncertainty for our community. Although we still believe our BSD + Patents license provides some benefits to users of our projects, we acknowledge that we failed to decisively convince this community.

In the wake of uncertainty about our license, we know that many teams went through the process of selecting an alternative library to React. We’re sorry for the churn. We don’t expect to win these teams back by making this change, but we do want to leave the door open. Friendly cooperation and competition in this space pushes us all forward, and we want to participate fully.

This shift naturally raises questions about the rest of Facebook’s open source projects. Many of our popular projects will keep the BSD + Patents license for now. We’re evaluating those projects’ licenses too, but each project is different and alternative licensing options will depend on a variety of factors.

We’ll include the license updates with React 16’s release next week. We’ve been working on React 16 for over a year, and we’ve completely rewritten its internals in order to unlock powerful features that will benefit everyone building user interfaces at scale. We’ll share more soon about how we rewrote React, and we hope that our work will inspire developers everywhere, whether they use React or not. We’re looking forward to putting this license discussion behind us and getting back to what we care about most: shipping great products.

Since I bang on about Facebook’s 24×7 censorship and shaping of your worldview, it’s only fair to mention when they make a good choice.

It in no way excuses or justifies their ongoing offenses against the public but it’s some evidence that decent people remain employed at Facebook.

With any luck, the decent insiders will wrest control of Facebook away from its government toadies and collaborators.

RStartHere

Monday, September 18th, 2017

RStartHere by Garrett Grolemund.

R packages organized by their role in data science:

This is very cool! Use and share!

@rstudio Cheatsheets Now B&W Printer Friendly

Saturday, September 9th, 2017

Mara Averick, @dataandme, tweets:

All the @rstudio Cheatsheets have been B&W printer-friendlier-ized

It’s a small thing but appreciated when documentation is B&W friendly.

PS: The @rstudio cheatsheets are also good examples of layout and clarity.

The International Conference on Functional Programming – 2017

Tuesday, September 5th, 2017

The International Conference on Functional Programming – 2017 – Papers

If you are on the Gulf or East coast of the United States, take this opportunity to download papers to read following landfall of Irma.

You may not have Internet service but if you have printed several papers out as emergency preparedness, you won’t be at a loss for reading materials.

I’ve been in the impact zone of several hurricanes and while reading materials don’t make repairs go any faster, they do help pass the time.

Reinventing Wheels with No Wheel Experience

Friday, June 30th, 2017

Rob Graham, @ErrataRob, captured an essential truth when he tweeted:

Wheel re-invention is inherent in every new programming language, every new library, and no doubt, nearly every new program.

How much “wheel experience” does every programmer have across the breadth of software vulnerabilities?

Hard to imagine meaningful numbers on the “wheel experience” of programmers in general but vulnerability reports make it clear either “wheel experience” is lacking or the lesson didn’t stick. Your call.

Vulnerabilities may occur in any release so standard practice is to check every release, however small. Have your results independently verified by trusted others.

PS: For the details on systemd, see: Sergey Bratus and the systemd thread.

You Are Not Google (Blasphemy I Know, But He Said It, Not Me)

Thursday, June 8th, 2017

You Are Not Google by Ozan Onay.

From the post:

Software engineers go crazy for the most ridiculous things. We like to think that we’re hyper-rational, but when we have to choose a technology, we end up in a kind of frenzy — bouncing from one person’s Hacker News comment to another’s blog post until, in a stupor, we float helplessly toward the brightest light and lay prone in front of it, oblivious to what we were looking for in the first place.

This is not how rational people make decisions, but it is how software engineers decide to use MapReduce.

Spoiler: Onay will also say you are not Amazon or LinkedIn.

Just so you know and can prepare for the ego shock.

Great read that invokes Polya’s First Principle:


Understand the Problem

This seems so obvious that it is often not even mentioned, yet students are often stymied in their efforts to solve problems simply because they don’t understand it fully, or even in part. Polya taught teachers to ask students questions such as:

  • Do you understand all the words used in stating the problem?
  • What are you asked to find or show?
  • Can you restate the problem in your own words?
  • Can you think of a picture or a diagram that might help you understand the problem?
  • Is there enough information to enable you to find a solution?

Onay coins a mnemonic for you to apply and points to additional reading.

Enjoy!

PS: Caution: Understanding a problem can cast doubt on otherwise successful proposals for funding. Your call.

Copy-n-Paste Security Alert!

Wednesday, June 7th, 2017

Security: The Dangers Of Copying And Pasting R Code.

From the post:

Most of the time when we stumble across a code snippet online, we often blindly copy and paste it into the R console. I suspect almost everyone does this. After all, what’s the harm?

The post illustrates how innocent-looking R code can conceal unhappy surprises!

Concealment isn’t limited to R code.

Any CSS controlled display is capable of concealing code for you to copy-n-paste into a console, terminal window, script or program.

Endless possibilities for HTML pages/emails with code + a “little something extra.”
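To make the concealment concrete, here is a minimal Python sketch that flags text hidden by one well-known trick: an inline style that positions an element off-screen, where you never see it but a copy may still pick it up. It only inspects inline style attributes, so stylesheets and other hiding tricks slip past it. The names looks_offscreen, HiddenTextFinder and find_hidden_text are mine, not from the post.

```python
import re
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Heuristic (an assumption for this sketch): a negative left/top offset.
OFFSCREEN = re.compile(r"(?:left|top):-\d")


def looks_offscreen(style: str) -> bool:
    """Guess whether an inline style moves the element out of view."""
    compact = style.replace(" ", "").lower()
    return "position:absolute" in compact and OFFSCREEN.search(compact) is not None


class HiddenTextFinder(HTMLParser):
    """Collects text that sits inside off-screen elements."""

    def __init__(self):
        super().__init__()
        self.stack = []        # one flag per open element: hidden or not
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        parent_hidden = self.stack[-1] if self.stack else False
        self.stack.append(parent_hidden or looks_offscreen(style))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html: str) -> list:
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text


page = ('<pre>ls -l<span style="position: absolute; left: -100px; '
        'top: -100px">; echo surprise</span></pre>')
hidden = find_hidden_text(page)
print(hidden)
```

A cheaper habit that needs no code: paste into a plain text editor first and read what actually arrived.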

What are your copy-n-paste practices?

C Reference Manual (D.M. Ritchie, 1974)

Tuesday, May 23rd, 2017

C Reference Manual (D.M. Ritchie, 1974)

I mention the C Reference Manual, now forty-three (43) years old, as encouragement to write good documentation.

It may have a longer life than you ever expected!

For example, in 1974 Ritchie writes:

2.2 Identifier (Names)

An identifier is a sequence of letters and digits: the first character must be alphabetic.

Which we find replicated years later in ISO/IEC 8879 : 1986 (SGML):

4.198 name: A name token whose first character is a name start character.

4.201 name start character: A character that can begin a name: letters and others designated by the concrete syntax.

And in production [53]:


name start character =
LC Letter \
UC Letter \
LCNMSTRT \
UCNMSTRT

Where Figure 1 of 9.2.1 SGML Character defines LC Letter as a-z, UC Letter as A-Z, LCNMSTRT as (none), UCNMSTRT as (none), in the concrete syntax.

And in 1997, the letter vs. digit distinction finds its way into Extensible Markup Language (XML) 1.0.


[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5] Name ::= (Letter | ‘_’ | ‘:’) (NameChar)*

“Letter” is a link to a production referencing all the qualifying Unicode characters which is too long to include here.

What started off as an arbitrary choice, “alphabetic” characters as name start characters in 1974, is picked up some 12 years later (1986) in ISO/IEC 8879 (SGML), both of which were bound by a restricted character set.

When the opportunity came to abandon the letter versus digit distinction in name start characters (XML 1.0), the result is a larger character repertoire for name start characters, but digits continue as second-class citizens.
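For anyone who wants to poke at the two rules side by side, here is a rough Python sketch. The 1974 rule is implemented only as quoted above and limited to ASCII, which is my assumption. The XML check approximates productions [4] and [5]: str.isalpha() and str.isdecimal() stand in for the much longer Letter and Digit productions, and CombiningChar and Extender are left out entirely. The function names are mine.

```python
import re

# 1974 rule as quoted: letters and digits, first character alphabetic.
# ASCII only here; that restriction is an assumption of this sketch.
C_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*\Z")


def is_c_identifier(name: str) -> bool:
    return C_IDENTIFIER.match(name) is not None


def is_xml_name(name: str) -> bool:
    """Approximation of XML 1.0 productions [4] NameChar and [5] Name."""
    if not name:
        return False

    def is_letter(ch: str) -> bool:
        return ch.isalpha()          # stand-in for the Letter production

    def is_name_char(ch: str) -> bool:
        # CombiningChar and Extender are deliberately omitted.
        return is_letter(ch) or ch.isdecimal() or ch in ".-_:"

    first, rest = name[0], name[1:]
    return (is_letter(first) or first in "_:") and all(is_name_char(c) for c in rest)


for candidate in ["x1", "1x", "_a.b-1", "émile"]:
    print(candidate, is_c_identifier(candidate), is_xml_name(candidate))
```

Both checks turn away a leading digit, which is the point of the comparison: the repertoire of letters grew, the status of digits did not.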

Can you point to an explanation why Ritchie preferred alphabetic characters over digits for name start characters?

ARM Releases Machine Readable Architecture Specification (Intel?)

Saturday, April 22nd, 2017

ARM Releases Machine Readable Architecture Specification by Alastair Reid.

From the post:

Today ARM released version 8.2 of the ARM v8-A processor specification in machine readable form. This specification describes almost all of the architecture: instructions, page table walks, taking interrupts, taking synchronous exceptions such as page faults, taking asynchronous exceptions such as bus faults, user mode, system mode, hypervisor mode, secure mode, debug mode. It details all the instruction formats and system register formats. The semantics is written in ARM’s ASL Specification Language so it is all executable and has been tested very thoroughly using the same architecture conformance tests that ARM uses to test its processors (See my paper “Trustworthy Specifications of ARM v8-A and v8-M System Level Architecture”.)

The specification is being released in three sets of XML files:

  • The System Register Specification consists of an XML file for each system register in the architecture. For each register, the XML details all the fields within the register, how to access the register and which privilege levels can access the register.
  • The AArch64 Specification consists of an XML file for each instruction in the 64-bit architecture. For each instruction, there is the encoding diagram for the instruction, ASL code for decoding the instruction, ASL code for executing the instruction and any supporting code needed to execute the instruction and the decode tree for finding the instruction corresponding to a given bit-pattern. This also contains the ASL code for the system architecture: page table walks, exceptions, debug, etc.
  • The AArch32 Specification is similar to the AArch64 specification: it contains encoding diagrams, decode trees, decode/execute ASL code and supporting ASL code.

Alastair provides starting points for use of this material by outlining his prior uses of the same.

Which raises the question: why isn’t an equivalent machine readable data set available for Intel® 64 and IA-32 Architectures? (PDF manuals)

The data is there, but not in a machine readable format.

Anyone know why Intel doesn’t provide the same convenience?

Build Your Own Text Editor (“make changes, see the results”)

Thursday, April 6th, 2017

Build Your Own Text Editor by Jeremy Ruten.

From the webpage:

Welcome! This is an instruction booklet that shows you how to build a text editor in C.

The text editor is antirez’s kilo, with some changes. It’s about 1000 lines of C in a single file with no dependencies, and it implements all the basic features you expect in a minimal editor, as well as syntax highlighting and a search feature.

This booklet walks you through building the editor in 184 steps. Each step, you’ll add, change, or remove a few lines of code. Most steps, you’ll be able to observe the changes you made by compiling and running the program immediately afterwards.

I explain each step along the way, sometimes in a lot of detail. Feel free to skim or skip the prose, as the main point of this is that you are going to build a text editor from scratch! Anything you learn along the way is bonus, and there’s plenty to learn just from typing in the changes to the code and observing the results.

See the appendices for more information on the tutorial itself (including what to do if you get stuck, and where to get help).

If you’re ready to begin, then go to chapter 1!
… (emphasis in original)

I mention this tutorial because:

  • It’s an opportunity to see editor issues “from the other side.”
  • Practice reading and understanding C
  • I like the “make changes, see the results” approach

Of the three, the “make changes, see the results” approach is probably the most important.

Examples that “just work” are great and I look for them all the time. 😉

But imagine examples that take you down the false leads and traps, allowing you to observe the cryptic error messages from XQuery for example. You do work your way to a solution but are not given one out of the box.

“Cryptic” is probably overly generous with regard to XQuery error messages. Suggestions for a better one-word term for them, usable in mixed company?

Eroding the Presumption of Innocence in USA

Saturday, April 1st, 2017

You may be laboring under the false impression that people charged with crimes in the USA are presumed innocent until proven guilty beyond a reasonable doubt in a court of law.

I regret to inform you that presumption is being eroded away.

Kevin Poulsen has a compelling read in FBI Arrests Hacker Who Hacked No One about the case of Taylor Huddleston, who was arraigned on March 31, 2017 in the Federal District Court for the Eastern District of Virginia, docket number: 1:2017 cr 34.

Taylor’s crime? He wrote a piece of software that has legitimate uses, such as sysadmins troubleshooting a user’s computer remotely. That tool was pirated by others and put to criminal use. Now the government wants to take his freedom and his home.

Compare Kevin’s post to the indictment, which I have uploaded for your reading pleasure. There is a serious disconnect between Poulsen’s post and the indictment, as the government makes much out of a lot of hand waving and very few specifics.

Taylor did obtain a Release on Personal Recognizance or Unsecured Bond, which makes you think the judge isn’t overly impressed with the government’s case.

I would have jumped at such a release as well but I find it disturbing, from a presumption of innocence perspective, that the judge also required:

My transcription:

No access to internet through any computer or other data capable device including smart phones

Remember that Taylor Huddleston is presumed innocent, so how is that consistent with prohibiting him from a lawful activity, such as access to the internet?

Simple response: It’s not.

As I said, I would have jumped at the chance for a release on personal recognizance too. Judges are eroding the presumption of innocence with the promise of temporary freedom.

Wishing Huddleston the best of luck and hoping that this erosion of the presumption of innocence won’t go unnoticed/unchallenged.

Notes to (NUS) Computer Science Freshmen…

Monday, March 13th, 2017

Notes to (NUS) Computer Science Freshmen, From The Future

From the intro:

Early into the AY12/13 academic year, Prof Tay Yong Chiang organized a supper for Computer Science freshmen at Tembusu College. The bunch of seniors who were gathered there put together a document for NUS computing freshmen. This is that document.

Feel free to create a pull request to edit or add to it, and share it with other freshmen you know.

There is one sad note:


The Art of Computer Programming (a review of everything in Computer Science; pretty much nobody, save Knuth, has finished reading this)

When you think about the amount of time Knuth has spent researching, writing and editing The Art of Computer Programming (TAOCP), it doesn’t sound unreasonable to expect others, a significant number of others, to have read it.

Any online reading groups focused on TAOCP?

Software Is Politics [Proudhon’s Response]

Sunday, February 19th, 2017

Software Is Politics by Richard Pope.

From the post:

If you work in software or design in 2016, you also work in politics. The inability of Facebook’s user interface, until recently, to distinguish between real and fake news is the most blatant example. But there are subtler examples all around us, from connected devices that threaten our privacy to ads targeting men for high-paying jobs.

Digital services wield power. They can’t be designed simply for ease of use—the goal at most companies and organizations. Digital services must be understandable, accountable, and trusted. It is now a commercial as well as a moral imperative.

DESIGN IS POLITICAL

Power and politics are not easy topics for many designers to chew on, but they’re foundational to my career. I worked for the U.K.’s Government Digital Service for five years, part of the team that delivered Gov.uk. I set up the labs team at Consumer Focus, the U.K.’s statutory consumer rights organization, building tools to empower consumers. In 2007, I cofounded the Rewired State series of hackdays that aimed to get developers and designers interested in making government better. I’ve also worked at various commercial startups including moo.com and ScraperWiki.

The last piece of work I did in government was on a conceptual framework for the idea of government as a platform. “Government as a platform” is the idea of treating government like a software stack to make it possible to build well-designed services for people. The work involved sketching some ideas out in code, not to try and solve them upfront, but to try and identify where some of the hard design problems were going to be. Things like: What might be required to enable an end-to-end commercial service for buying a house? Or what would it take for local authorities to be able to quickly spin up a new service for providing parking permits?

With this kind of thinking, you rapidly get into questions of power: What should the structure of government be? Should there be a minister responsible for online payment? Secretary of state for open standards? What does it do to people’s understanding of their government?

Which cuts to the heart of the problem in software design today: How do we build stuff that people can understand and trust, and is accountable when things go wrong? How do we design for recourse?
… (emphasis in original)

The flaw in Pope’s desire for applications that are “…accountable, understandable, and trusted…” by all is that it conceals the choosing of sides.

Or as Craig Gurian in Equally free to sleep under the bridge illustrates by quoting Anatole France:

“In its majestic equality, the law forbids rich and poor alike to sleep under bridges, beg in the streets and steal loaves of bread.”

Applications that are “…accountable, understandable, and trusted…” will have silently chosen sides just as the law does now.

Better to admit to and make explicit the choices of who serves and who eats in the design of applications. At least then disparities are not smothered by the pretense of equality.

Or as Proudhon would say:

What is equality before the law without equality of fortunes? A balance with false weights.

Speak not of “…accountable, understandable, and trusted…” applications in the abstract but for and against whom?

Fundamentals of Functional Programming (email lessons)

Tuesday, February 14th, 2017

Learn the fundamentals of functional programming — for free, in your inbox by Preethi Kasireddy.

From the post:

If you’re a software developer, you’ve probably noticed a growing trend: software applications keep getting more complicated.

It falls on our shoulders as developers to build, test, maintain, and scale these complex systems. To do so, we have to create well-structured code that is easy to understand, write, debug, reuse, and maintain.

But actually writing programs like this requires much more than just practice and patience.

In my upcoming course, Learning Functional JavaScript the Right Way, I’ll teach you how to use functional programming to create well-structured code.

But before jumping into that course (and I hope you will!), there’s an important prerequisite: building a strong foundation in the underlying principles of functional programming.

So I’ve created a new free email course that will take you on a fun and exploratory journey into understanding some of these core principles.

Let’s take a look at what the email course will cover, so you can decide how it fits into your programming education.
…(emphasis in original)

I haven’t taken an email oriented course in quite some time so I’m interested to see how this contrasts with video lectures, etc.

Enjoy!

A Data Driven Exploration of Kung Fu Films

Tuesday, January 24th, 2017

A Data Driven Exploration of Kung Fu Films by Jim Vallandingham.

From the post:

Recently, I’ve been a bit caught up in old Kung Fu movies. Shorting any technical explorations, I have instead been diving head-first into any and all Netflix accessible martial arts masterpieces from the 70’s and 80’s.

While I’ve definitely been enjoying the films, I realized recently that I had little context for the movies I was watching. I wondered if some films, like our latest favorite, Executioners from Shaolin, could be enjoyed even more, with better understanding of the context in which these films exist in the Kung Fu universe.

So, I began a data driven quest for truth and understanding (or at least a semi-interesting dataset to explore) of all Shaw Brothers Kung Fu movies ever made!

If you’re not familiar with the genre, here is a three-minute final fight collage from YouTube:

When I saw the title, I was hopeful that Jim had captured the choreography of the movies for comparison.

No such luck! 😉

That would be an extremely difficult and labor intensive task.

Just in case you are curious, there is a Dance Notation Bureau with extensive resources should you decide to capture one or more Kung Fu films in notation.

Or try Notation Reloaded: eXtensible Dance Scripting Notation by Matthew Gough.

A search using “xml dance notation” produces a number of interesting resources.

Three More Reasons To Learn R

Friday, January 6th, 2017

Three reasons to learn R today by David Smith.

From the post:

If you're just getting started with data science, the Sharp Sight Labs blog argues that R is the best data science language to learn today.

The blog post gives several detailed reasons, but the main arguments are:

  1. R is an extremely popular (arguably the most popular) data programming language, and ranks highly in several popularity surveys.
  2. Learning R is a great way of learning data science, with many R-based books and resources for probability, frequentist and Bayesian statistics, data visualization, machine learning and more.
  3. Python is another excellent language for data science, but with R it's easier to learn the foundations.

Once you've learned the basics, Sharp Sight also argues that R is also a great data science language to master, even though it's an old language compared to some of the newer alternatives. Every tool has a shelf life, but R isn't going anywhere and learning R gives you a foundation beyond the language itself.

If you want to get started with R, Sharp Sight Labs offers a data science crash course. You might also want to check out the Introduction to R for Data Science course on EdX.

Sharp Sight Labs: Why R is the best data science language to learn today, and Why you should master R (even if it might eventually become obsolete)

If you need more reasons to learn R:

  • Unlike Facebook, R isn’t a sinkhole of non-testable propositions.
  • Unlike Instagram, R is rarely NSFW.
  • Unlike Twitter, R is a marketable skill.

Glad to hear you are learning R!

Pattern Overloading

Tuesday, December 6th, 2016

Pattern Overloading by Ramsey Nasser.

From the post:

C-like languages have a problem of overloaded syntax that I noticed while teaching high school students. Consider the following snippets in such a language:

foo(45)

function foo(int x) {

for(int i=0;i < 10; i++) {

if(x > 10) {

case(x) {

A programmer experienced with this family would see

  1. Function invocation
  2. Function definition
  3. Control flow examples

In my experience, new programmers see these constructs as instances of the same idea: name(some-stuff) more-stuff. This is not an unreasonable conclusion to reach. The syntax for each construct is shockingly similar given that their semantics are wildly different.
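Nasser’s point is easy to check for yourself. The sketch below is mine, not from the post: a single regular expression for the beginner’s mental model, name(some-stuff) more-stuff, applied to the five snippets above. The names SURFACE and surface_shape are made up for the illustration.

```python
import re

# The beginner's model of all five constructs: name(some-stuff) more-stuff
SURFACE = re.compile(r"(\w+)\((.*)\)\s*(.*)$")

snippets = [
    "foo(45)",
    "function foo(int x) {",
    "for(int i=0;i < 10; i++) {",
    "if(x > 10) {",
    "case(x) {",
]


def surface_shape(line: str):
    """Return (name, some_stuff, more_stuff) if the line fits the model."""
    match = SURFACE.search(line)
    return match.groups() if match else None


for snippet in snippets:
    print(snippet, "->", surface_shape(snippet))
```

One pattern swallows invocation, definition and three kinds of control flow, which is exactly the overloading the post complains about.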

You won’t be called upon to re-design C but Nasser’s advice:

Syntactic similarity should mirror semantic similarity

Or, to take a quote from the UX world

Similar things should look similar and dissimilar things should look dissimilar

is equally applicable to any syntax that you design.

Clojure/conj 2016 – Videos – Sorted

Monday, December 5th, 2016

Clojure/conj 2016 has posted videos of all presentations (thanks!) to YouTube, which displays them in no particular order.

To help with my viewing and perhaps yours, here are the videos in title order:

  1. Adventures in Understanding Documents – Scott Tuddenham
  2. Audyx.com 40k locs to build the first web-based sonogram – Asher Coren
  3. Barliman: trying the halting problem backwards, blindfolded – William Byrd, Greg Rosenblatt
  4. Becoming Omniscient with Sayid – Bill Piel
  5. Building a powerful Double Entry Accounting system – Lucas Cavalcanti
  6. Building composable abstractions – Eric Normand
  7. Charting the English Language…in pure Clojure – Alexander Mann
  8. Clarifying Rules Engines with Clara Rules – Mike Rodriguez
  9. Clojure at DataStax: The Long Road From Python to Clojure – Nick Bailey
  10. A Clojure DSL for defining CI/CD orchestrations at scale – Rohit Kumar, Viraj Purang
  11. Composing music with clojure.spec – Wojciech Franke
  12. In situ model-based learning in PAMELA – Paul Robertson, Tom Marble
  13. Juggling Patterns and Programs – Steve Miner
  14. Overcoming the Challenges of Mentoring – Kim Crayton
  15. A Peek Inside SAT Solvers – Jon Smock
  16. Powderkeg: teaching Clojure to Spark – Igor Ges, Christophe Grand
  17. Production Rules on Databases – Paula Gearon
  18. Programming What Cannot Be Programmed: Aesthetics and Narrative – D. Schmüdde
  19. Proto REPL, a New Clojure Development and Visualization Tool – Jason Gilman
  20. Simplifying ETL with Clojure and Datomic – Stuart Halloway
  21. Spec-ulation Keynote – Rich Hickey
  22. Spectrum, a library for statically "typing" clojure.spec – Allen Rohner
  23. Using Clojure with C APIs for crypto and more – lvh
  24. WormBase database migration to Datomic on AWS: A case Study – Adam Wright
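If you want to re-sort a list like this yourself, here is a small Python sketch of the ordering used above, where a leading “A” is skipped so that “A Peek Inside SAT Solvers” files under P. The function name title_key is mine, and skipping “An” and “The” as well is my assumption; only “A” occurs in this list.

```python
import re


def title_key(title: str) -> str:
    """Sort key that ignores a leading article and letter case."""
    stripped = re.sub(r"^(a|an|the)\s+", "", title.strip(), flags=re.IGNORECASE)
    return stripped.casefold()


titles = [
    "Production Rules on Databases",
    "A Peek Inside SAT Solvers",
    "Powderkeg: teaching Clojure to Spark",
    "A Clojure DSL for defining CI/CD orchestrations at scale",
    "Composing music with clojure.spec",
]

ordered = sorted(titles, key=title_key)
for number, title in enumerate(ordered, start=1):
    print(f"{number}. {title}")
```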

Enjoy!

OSS-Fuzz: Continuous fuzzing for open source software

Thursday, December 1st, 2016

Announcing OSS-Fuzz: Continuous fuzzing for open source software

From the post:

We are happy to announce OSS-Fuzz, a new Beta program developed over the past years with the Core Infrastructure Initiative community. This program will provide continuous fuzzing for select core open source software.

Open source software is the backbone of the many apps, sites, services, and networked things that make up “the internet.” It is important that the open source foundation be stable, secure, and reliable, as cracks and weaknesses impact all who build on it.

Recent security stories confirm that errors like buffer overflow and use-after-free can have serious, widespread consequences when they occur in critical open source software. These errors are not only serious, but notoriously difficult to find via routine code audits, even for experienced developers. That’s where fuzz testing comes in. By generating random inputs to a given program, fuzzing triggers and helps uncover errors quickly and thoroughly.

In recent years, several efficient general purpose fuzzing engines have been implemented (e.g. AFL and libFuzzer), and we use them to fuzz various components of the Chrome browser. These fuzzers, when combined with Sanitizers, can help find security vulnerabilities (e.g. buffer overflows, use-after-free, bad casts, integer overflows, etc), stability bugs (e.g. null dereferences, memory leaks, out-of-memory, assertion failures, etc) and sometimes even logical bugs.

OSS-Fuzz’s goal is to make common software infrastructure more secure and stable by combining modern fuzzing techniques with scalable distributed execution. OSS-Fuzz combines various fuzzing engines (initially, libFuzzer) with Sanitizers (initially, AddressSanitizer) and provides a massive distributed execution environment powered by ClusterFuzz.
… (emphasis in original)
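If “generating random inputs to a given program” sounds abstract, here is a toy Python sketch of the idea. It is nothing like AFL or libFuzzer, which are coverage-guided and far smarter; it only throws random bytes at a deliberately buggy function and records what blows up. The target parse_record and the helper fuzz are invented for the illustration.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy target: the first byte is a length, the rest is the payload."""
    if not data:
        return b""
    length, payload = data[0], data[1:]
    # Bug: the length byte is trusted without checking it against the payload,
    # so indexing past the end raises IndexError.
    return bytes(payload[i] for i in range(length))


def fuzz(target, trials: int = 500, seed: int = 0, max_len: int = 8):
    """Feed random byte strings to target and collect the inputs that crash it."""
    rng = random.Random(seed)      # fixed seed so a run can be repeated
    crashes = []
    for _ in range(trials):
        size = rng.randint(0, max_len)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except Exception as exc:   # a real fuzzer would also watch for hangs
            crashes.append((data, type(exc).__name__))
    return crashes


crashes = fuzz(parse_record)
print(len(crashes), "crashing inputs, first:", crashes[0])
```

Each recorded input is a ready-made regression test: fix the length check, replay the input, and it should stop crashing.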

Another similarity between open and closed source software.

Closed source software is continuously being fuzzed.

By volunteers.

Yes? 😉

One starting place for more information: Effective file format fuzzing by Mateusz “j00ru” Jurczyk (Black Hat Europe 2016, London) and his website: http://j00ru.vexillium.org/.

Programming has Ethical Consequences?

Friday, November 25th, 2016

Has anyone tracked down the blinding flash that programming has ethical consequences?

Programmers are charged to point out ethical dimensions and issues not noticed by muggles.

This may come as a surprise but programmers in the broader sense have been aware of ethical dimensions to programming for decades.

Perhaps the best known example of a road to Damascus type event is the Trinity atomic bomb test in New Mexico. Oppenheimer recalling a line from the Bhagavad Gita:

“Now I am become Death, the destroyer of worlds.”

To say nothing of the programmers who labored for years to guarantee world wide delivery of nuclear warheads in 30 minutes or less.

But it isn’t necessary to invoke a nuclear Armageddon to find ethical issues that have faced programmers prior to the current ethics frenzy.

Any guesses as to how red line maps were created?

Do you think “red line” maps just sprang up on their own? Or was someone collecting, collating and analyzing the data, much as we would do now but more slowly?

Every act of collecting, collating and analyzing data, now with computers, can and probably does have ethical dimensions and issues.

Programmers can and should raise ethical issues, especially when they may be obscured or clouded by programming techniques or practices.

However, programmers announcing ethical issues to their less fortunate colleagues isn’t likely to lead to a fruitful discussion.

Learning R programming by reading books: A book list

Thursday, November 24th, 2016

Learning R programming by reading books: A book list by Liang-Cheng Zhang.

From the post:

Despite R’s popularity, it is still very daunting to learn R as R has no click-and-point feature like SPSS and learning R usually takes lots of time. No worries! As self-R learner like us, we constantly receive the requests about how to learn R. Besides hiring someone to teach you or paying tuition fees for online courses, our suggestion is that you can also pick up some books that fit your current R programming level. Therefore, in this post, we would like to share some good books that teach you how to learn programming in R based on three levels: elementary, intermediate, and advanced levels. Each level focuses on one task so you will know whether these books fit your needs. While the following books do not necessarily focus on the task we define, you should focus the task when you reading these books so you are not lost in contexts.

Books and reading form the core of my most basic prejudice: Literacy is the doorway to unlimited universes.

A prejudice so strong that I have to work hard at realizing non-literates live in and sense worlds not open to literates. Not less complex, not poorer, just different.

But book lists in particular appeal to that prejudice and since my blog is read by literates, I’m indulging that prejudice now.

I do have a title to add to the list: Practical Data Science with R by Nina Zumel and John Mount.

Judging from the other titles listed, Practical Data Science with R falls in the intermediate range. Should not be your first R book but certainly high on the list for your second R book.

Avoid the rush! Start working on your Amazon wish list today! 😉

How to get started with Data Science using R

Sunday, November 20th, 2016

How to get started with Data Science using R by Karthik Bharadwaj.

From the post:

R being the lingua franca of data science and is one of the popular language choices to learn data science. Once the choice is made, often beginners find themselves lost in finding out the learning path and end up with a signboard as below.

In this blog post I would like to lay out a clear structural approach to learning R for data science. This will help you to quickly get started in your data science journey with R.

You won’t find anything you don’t already know, but this is a great short post to pass on to others.

Point out that R skills will help them expose and/or conceal government corruption.

Python Data Science Handbook

Saturday, November 19th, 2016

Python Data Science Handbook (Github)

From the webpage:

Jupyter notebook content for my O’Reilly book, the Python Data Science Handbook.

See also the free companion project, A Whirlwind Tour of Python: a fast-paced introduction to the Python language aimed at researchers and scientists.

This repository will contain the full listing of IPython notebooks used to create the book, including all text and code. I am currently editing these, and will post them as I make my way through. See the content here:

Enjoy!

Useful Listicle: The 5 most downloaded R packages

Tuesday, November 15th, 2016

The 5 most downloaded R packages

From the post:

Curious which R packages your colleagues and the rest of the R community are using? Thanks to Rdocumentation.org you can now see for yourself! Rdocumentation.org aggregates R documentation and download information from popular repositories like CRAN, BioConductor and GitHub. In this post, we’ll take a look at the top 5 R packages with the most direct downloads!

Sorry! No spoiler!

Do check out:

Rdocumentation.org aggregates help documentation for R packages from CRAN, BioConductor, and GitHub – the three most common sources of current R documentation. RDocumentation.org goes beyond simply aggregating this information, however, by bringing all of this documentation to your fingertips via the RDocumentation package. The RDocumentation package overwrites the basic help functions from the utils package and gives you access to RDocumentation.org from the comfort of your RStudio IDE. Look up the newest and most popular R packages, search through documentation and post community examples.

As they say:

Create an RDocumentation account today!

I’m always sympathetic to documentation but more so today because I have wasted hours over the past two or three days on issues that could have been trivially documented.

I will be posting “corrected” documentation later this week.

PS: If you have or suspect you have poorly written documentation, I have some time available for paid improvement of the same.

None/Some/All … Are Suicide Bombers & Probabilistic Programming Languages

Tuesday, November 8th, 2016

The Design and Implementation of Probabilistic Programming Languages by Noah D. Goodman and Andreas Stuhlmüller.

Abstract:

Probabilistic programming languages (PPLs) unify techniques for the formal description of computation and for the representation and use of uncertain knowledge. PPLs have seen recent interest from the artificial intelligence, programming languages, cognitive science, and natural languages communities. This book explains how to implement PPLs by lightweight embedding into a host language. We illustrate this by designing and implementing WebPPL, a small PPL embedded in Javascript. We show how to implement several algorithms for universal probabilistic inference, including priority-based enumeration with caching, particle filtering, and Markov chain Monte Carlo. We use program transformations to expose the information required by these algorithms, including continuations and stack addresses. We illustrate these ideas with examples drawn from semantic parsing, natural language pragmatics, and procedural graphics.

If you want to sharpen the discussion of probabilistic programming languages, substitute the following into the pragmatics example:

‘none/some/all of the children are suicide bombers’.

The substitution raises the issue of how “certainty” can/should vary depending upon the gravity of results.

“Who is a nice person?” has low stakes.

“Who is a suicide bomber?” has high stakes.
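
To make the stakes point concrete, here is a minimal sketch in plain Python (not WebPPL, and not code from the book). It enumerates a literal listener for "none/some/all" and then applies an expected-loss rule. The group size of three, the uniform prior, the function names and the cost figures are all my assumptions, chosen only for illustration.

```python
from fractions import Fraction

N = 3  # assumed group size, purely illustrative


def meaning(utterance, k):
    """Literal truth of an utterance when k of the N individuals have the property."""
    if utterance == "none":
        return k == 0
    if utterance == "some":
        return k >= 1
    if utterance == "all":
        return k == N
    raise ValueError(f"unknown utterance: {utterance}")


def literal_listener(utterance):
    """Posterior over k by exhaustive enumeration, assuming a uniform prior."""
    prior = {k: Fraction(1, N + 1) for k in range(N + 1)}
    weights = {k: p for k, p in prior.items() if meaning(utterance, k)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def should_act(p_true, cost_false_alarm, cost_miss):
    """Act when the expected loss of acting is below that of not acting."""
    return (1 - p_true) * cost_false_alarm < p_true * cost_miss


posterior = literal_listener("some")
p_all = posterior[N]  # belief that "all" holds after hearing "some"

# Same belief, different (hypothetical) costs, different decisions.
print(should_act(p_all, cost_false_alarm=1, cost_miss=1))
print(should_act(p_all, cost_false_alarm=1, cost_miss=10))
```

The belief produced by inference is identical in both calls. Only the assumed costs change, which is where the gravity of the result enters.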

Resource: Malware analysis – …

Tuesday, October 4th, 2016

Resource: Malware analysis – learning How To Reverse Malware: A collection of guides and tools by Claus Cramon Houmann.

This resource will provide you theory around learning malware analysis and reverse engineering malware. We keep the links up to date as the infosec community creates new and interesting tools and tips.

Some technical reading to enjoy instead of political debates!

Enjoy!

The Simpsons by the Data [South Park as well]

Thursday, September 29th, 2016

The Simpsons by the Data by Todd Schneider.

From the post:

The Simpsons needs no introduction. At 27 seasons and counting, it’s the longest-running scripted series in the history of American primetime television.

The show’s longevity, and the fact that it’s animated, provides a vast and relatively unchanging universe of characters to study. It’s easier for an animated show to scale to hundreds of recurring characters; without live-action actors to grow old or move on to other projects, the denizens of Springfield remain mostly unchanged from year to year.

As a fan of the show, I present a few short analyses about Springfield, from the show’s dialogue to its TV ratings. All code used for this post is available on GitHub.

Alert! You must run Flash in order to access Simpsons World, the source of Todd’s data.

Advice: Treat Flash as malware and run in a VM.

Todd covers the number of words spoken per character, gender imbalance, focus on characters, viewership, and episode summaries (tf-idf).
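
For readers who have not met tf-idf, a minimal sketch in Python follows. It is not Todd’s code (see his GitHub repository for that). The summaries are made up, and the weighting used here (raw term frequency times the log of inverse document frequency) is one common variant, which I am assuming rather than taking from his post.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Count each term once per document it appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up episode summaries, for illustration only.
summaries = [
    "homer eats donuts at the plant",
    "bart skips school and homer finds out",
    "lisa plays saxophone at school",
]

for summary, scored in zip(summaries, tf_idf(summaries)):
    top_term = max(scored, key=scored.get)
    print(summary, "->", top_term)
```

Terms that appear in every summary score zero, so the terms that distinguish one episode from the others rise to the top.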

Other analysis awaits your imagination and interest.

BTW, if you want comedy data a bit closer to the edge, try Text Mining South Park by Kaylin Walker. Kaylin uses R for her analysis as well.

Other TV programs with R-powered analysis?

Hacker-Proof Code Confirmed [Can Liability Be Far Behind?]

Thursday, September 22nd, 2016

Hacker-Proof Code Confirmed by Kevin Hartnett.

From the post:

In the summer of 2015 a team of hackers attempted to take control of an unmanned military helicopter known as Little Bird. The helicopter, which is similar to the piloted version long-favored for U.S. special operations missions, was stationed at a Boeing facility in Arizona. The hackers had a head start: At the time they began the operation, they already had access to one part of the drone’s computer system. From there, all they needed to do was hack into Little Bird’s onboard flight-control computer, and the drone was theirs.

When the project started, a “Red Team” of hackers could have taken over the helicopter almost as easily as it could break into your home Wi-Fi. But in the intervening months, engineers from the Defense Advanced Research Projects Agency (DARPA) had implemented a new kind of security mechanism — a software system that couldn’t be commandeered. Key parts of Little Bird’s computer system were unhackable with existing technology, its code as trustworthy as a mathematical proof. Even though the Red Team was given six weeks with the drone and more access to its computing network than genuine bad actors could ever expect to attain, they failed to crack Little Bird’s defenses.

“They were not able to break out and disrupt the operation in any way,” said Kathleen Fisher, a professor of computer science at Tufts University and the founding program manager of the High-Assurance Cyber Military Systems (HACMS) project. “That result made all of DARPA stand up and say, oh my goodness, we can actually use this technology in systems we care about.”

Reducing the verification requirement to a manageable size appears to be the key to DARPA’s success.

That is, rather than verifying the entire program, only the critical parts, such as those that keep hackers out, need to be verified.

If this spreads, failure to formally verify critical parts of software would be a natural place to begin imposing liability for poorly written code.

PS: Would formal proof of data integration be a value-add?

R Weekly

Monday, September 12th, 2016

R Weekly

A new weekly publication of R resources that began on 21 May 2016 with Issue 0.

Mostly titles of posts and news articles, which is useful, but not as useful as short summaries that include the author’s name.

Watch your Python script with strace

Sunday, September 11th, 2016

Description:

Modern operating systems sandbox each process inside of a virtual memory map from which direct I/O operations are generally impossible. Instead, a process has to ask the operating system every time it wants to modify a file or communicate bytes over the network. By using operating system specific tools to watch the system calls a Python script is making — using “strace” under Linux or “truss” under Mac OS X — you can study how a program is behaving and address several different kinds of bugs.

Brandon Rhodes does a delightful presentation on using strace with Python.

Slides for Tracing Python with strace or truss.

I deeply enjoyed this presentation, which I discovered while looking at a Python regex issue.

I anticipate running strace on that Python script this week and will report back on the results, or on my failure to obtain any! (Unlike in academic publishing, experiments and investigations do fail.)
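
Once a trace has been saved to a file, a few lines of Python can summarize which system calls dominate. The sketch below is mine, not from Brandon’s talk. The names `count_syscalls` and `SYSCALL_RE` are hypothetical, and the sample lines are hand-written in the general shape of strace output rather than copied from a real run.

```python
import re
from collections import Counter

# A syscall line looks like: name(args...) = result
# An optional leading process ID is allowed, as seen when following child processes.
SYSCALL_RE = re.compile(r"^(?:\d+\s+)?(\w+)\(")


def count_syscalls(trace_lines):
    """Count system calls by name, skipping signal and exit lines."""
    counts = Counter()
    for line in trace_lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hand-written sample lines, for illustration only.
sample = [
    'openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3',
    'read(3, "127.0.0.1 localhost", 4096) = 20',
    'read(3, "", 4096) = 0',
    'close(3) = 0',
    '+++ exited with 0 +++',
]

print(count_syscalls(sample).most_common())
```

On Linux, a trace file can be produced with something like `strace -o trace.txt python script.py` and then passed to `count_syscalls` as `open("trace.txt")`.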