Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 9, 2013

Homology Theory — A Primer

Filed under: Homology,Mathematics — Patrick Durusau @ 9:48 am

Homology Theory — A Primer by Jeremy Kun.

From the post:

This series on topology has been long and hard, but we're quickly approaching the topics where we can actually write programs. For this and the next post on homology, the most important background we will need is a solid foundation in linear algebra, specifically in row-reducing matrices (and the interpretation of row-reduction as a change of basis of a linear operator).

Last time we engaged in a whirlwind tour of the fundamental group and homotopy theory. And we mean “whirlwind” as it sounds; it was all over the place in terms of organization. The most important fact that one should take away from that discussion is the idea that we can compute, algebraically, some qualitative features about a topological space related to “n-dimensional holes.” For one-dimensional things, a hole would look like a circle, and for two dimensional things, it would look like a hollow sphere, etc. More importantly, we saw that this algebraic data, which we called the fundamental group, is a topological invariant. That is, if two topological spaces have different fundamental groups, then they are “fundamentally” different under the topological lens (they are not homeomorphic, and not even homotopy equivalent).

Unfortunately the main difficulty of homotopy theory (and part of what makes it so interesting) is that these “holes” interact with each other in elusive and convoluted ways, and the algebra reflects it almost too well. Part of the problem with the fundamental group is that it deftly eludes our domain of interest: we don’t know a general method to compute the damn things!
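The row-reduction connection Jeremy mentions can be made concrete with a small sketch of my own (not from his post): the Betti numbers of a hollow triangle — topologically a circle — fall out of the rank of its boundary matrix.

```python
import numpy as np

# Hollow triangle (topologically a circle): 3 vertices, 3 edges, no faces.
# The boundary matrix d1 maps edges to vertices: d1[v, e] = +/-1 when
# vertex v is the head/tail of edge e.
d1 = np.array([
    [-1,  0, -1],   # vertex 0
    [ 1, -1,  0],   # vertex 1
    [ 0,  1,  1],   # vertex 2
])

rank_d1 = np.linalg.matrix_rank(d1)
betti_0 = d1.shape[0] - rank_d1       # connected components
betti_1 = d1.shape[1] - rank_d1 - 0   # independent 1-cycles (no 2-cells, so rank d2 = 0)

print(betti_0, betti_1)  # 1 1
```

One component and one "circular hole," exactly the qualitative data homology is designed to extract — and it came from nothing more than the rank of a matrix.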

Jeremy continues his series on topology and promises programs are not far ahead!

April 5, 2013

Probability and Statistics Cookbook

Filed under: Mathematics,Probability,Statistics — Patrick Durusau @ 3:02 pm

Probability and Statistics Cookbook by Matthias Vallentin.

From the webpage:

The cookbook contains a succinct representation of various topics in probability theory and statistics. It provides a comprehensive reference reduced to the mathematical essence, rather than aiming for elaborate explanations.

When Matthias says “succinct,” he is quite serious:

Probability Screenshot

But by the time you master the twenty-seven pages of this “cookbook,” you will have a very good grounding in probability and statistics.

April 3, 2013

100 Savvy Sites on Statistics and Quantitative Analysis

Filed under: Mathematics,Quantitative Analysis,Statistics — Patrick Durusau @ 4:21 am

100 Savvy Sites on Statistics and Quantitative Analysis

From the post:

Nate Silver’s unprecedented accurate prediction of state-by-state election results in the most recent presidential race was a watershed moment for the public awareness of statistics. While data gathering and analysis has become a massive industry in the past decade, it hasn’t always been as well covered in the press or publicly accessible as it is now. With more and more of our daily interactions being mediated through computers and the internet, it is easier than ever to gather detailed quantitative data and do statistical analysis on that data to derive valuable information and predictions from it.

Knowledge of statistics and quantitative analysis techniques is more valuable than ever. From biostatisticians to politicians and economists, people in every field are using statistics to further their careers and knowledge. These sites are some of the most useful, informative, and comprehensive on the web covering stats and quantitative analysis.

Covers everything from Comprehensive Statistics Sites and Big Data to Data Visualization and Sports Stats.

Fire up your alternative to Google Reader!

I first saw this at 100 Savvy Sites on Statistics and Quantitative Analysis by Vincent Granville.

April 2, 2013

An Applet for the Investigation of Simpson’s Paradox

Filed under: BigData,Mathematics,Statistics — Patrick Durusau @ 6:17 am

An Applet for the Investigation of Simpson’s Paradox by Kady Schneiter and Jürgen Symanzik. (Journal of Statistics Education, Volume 21, Number 1 (2013))

Simpson’s paradox is best illustrated by the University of California, Berkeley sex discrimination case. Taken in the aggregate, admissions to the graduate school appeared to greatly favor men. Taken by department, no department discriminated against women and most favored admission of women. Same data, different level of examination. That is Simpson’s paradox.
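The reversal is easy to reproduce with a few lines of code. The numbers below are hypothetical, chosen to mimic the Berkeley pattern rather than reproduce its actual figures: women are admitted at a higher rate in every department, yet a lower rate overall, because they mostly applied to the harder department.

```python
# Hypothetical admissions data illustrating Simpson's paradox.
depts = {
    "A": {"men": (80, 100), "women": (18, 20)},    # easy department
    "B": {"men": (2, 20),   "women": (20, 100)},   # hard department
}

def rate(admitted, applied):
    return admitted / applied

for name, d in depts.items():
    m, w = rate(*d["men"]), rate(*d["women"])
    print(f"Dept {name}: men {m:.0%}, women {w:.0%}")   # women higher in both

# Aggregate over departments: the trend reverses.
men_total = [sum(x) for x in zip(*(d["men"] for d in depts.values()))]
women_total = [sum(x) for x in zip(*(d["women"] for d in depts.values()))]
print(f"Overall: men {rate(*men_total):.0%}, women {rate(*women_total):.0%}")
```

Per department: men 80% and 10%, women 90% and 20%. Overall: men 68%, women 32%. Same data, different level of aggregation.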

Abstract:

This article describes an applet that facilitates investigation of Simpson’s Paradox in the context of a number of real and hypothetical data sets. The applet builds on the Baker-Kramer graphical representation for Simpson’s Paradox. The implementation and use of the applet are explained. This is followed by a description of how the applet has been used in an introductory statistics class and a discussion of student responses to the applet.

From Wikipedia on Simpson’s Paradox:

In probability and statistics, Simpson’s paradox, or the Yule–Simpson effect, is a paradox in which a trend that appears in different groups of data disappears when these groups are combined, and the reverse trend appears for the aggregate data. This result is often encountered in social-science and medical-science statistics,[1] and is particularly confounding when frequency data are unduly given causal interpretations.[2] Simpson’s Paradox disappears when causal relations are brought into consideration.

A cautionary tale about the need to understand data sets and how combining them may impact outcomes of statistical analysis.

Journal of Statistics Education

Filed under: BigData,Mathematics,Statistics — Patrick Durusau @ 5:56 am

Journal of Statistics Education

From the mission statement:

The Journal of Statistics Education (JSE) disseminates knowledge for the improvement of statistics education at all levels, including elementary, secondary, post-secondary, post-graduate, continuing, and workplace education. It is distributed electronically and, in accord with its broad focus, publishes articles that enhance the exchange of a diversity of interesting and useful information among educators, practitioners, and researchers around the world. The intended audience includes anyone who teaches statistics, as well as those interested in research on statistical and probabilistic reasoning. All submissions are rigorously refereed using a double-blind peer review process.

Manuscripts submitted to the journal should be relevant to the mission of JSE. Possible topics for manuscripts include, but are not restricted to: curricular reform in statistics, the use of cooperative learning and projects, innovative methods of instruction, assessment, and research (including case studies) on students’ understanding of probability and statistics, research on the teaching of statistics, attitudes and beliefs about statistics, creative and tested ideas (including experiments and demonstrations) for teaching probability and statistics topics, the use of computers and other media in teaching, statistical literacy, and distance education. Articles that provide a scholarly overview of the literature on a particular topic are also of interest. Reviews of software, books, and other teaching materials will also be considered, provided these reviews describe actual experiences using the materials.

In addition JSE also features departments called “Teaching Bits: A Resource for Teachers of Statistics” and “Datasets and Stories.” “Teaching Bits” summarizes interesting current events and research that can be used as examples in the statistics classroom, as well as pertinent items from the education literature. The “Datasets and Stories” department not only identifies interesting datasets and describes their useful pedagogical features, but enables instructors to download the datasets for further analysis or dissemination to students.

Associated with the Journal of Statistics Education is the JSE Information Service. The JSE Information Service provides a source of information for teachers of statistics that includes the archives of EDSTAT-L (an electronic discussion list on statistics education), information about the International Association for Statistical Education, and links to many other statistics education sources.

If you are going to talk about big data, of necessity you are also going to talk about statistics.

A very good free online resource on statistics.

April 1, 2013

New Book Explores the P-NP Problem [Explaining Topic Maps]

Filed under: Marketing,Mathematical Reasoning,Mathematics — Patrick Durusau @ 5:24 pm

New Book Explores the P-NP Problem by Shar Steed.

From the post:

The Golden Ticket: P, NP, and the Search for the Impossible, written by CCC Council and CRA board member, Lance Fortnow is now available. The inspiration for the book came in 2009 when Fortnow published an article on the P-NP problem for Communications of the ACM. With more than 200,000 downloads, the article is one of the website’s most popular, which signals that this is an issue that people are interested in exploring. The P-NP problem is the most important open problem in computer science because it attempts to measure the limits of computation.

The book is written to appeal to readers outside of computer science and shed light on the fact that there are deep computational challenges that computer scientists face. To make it relatable, Fortnow developed the “Golden Ticket” analogy, comparing the P-NP problem to the search for the golden ticket in Charlie and the Chocolate Factory, a story many people can relate to. Fortnow avoids mathematical and technical terminology and even the formal definition of the P-NP problem, and instead uses examples to explain concepts.

“My goal was to make the book relatable by telling stories. It is a broad based book that does not require a math or computer science background to understand it.”

Fortnow also credits CRA and CCC for giving him inspiration to write the book.

Fortnow has explained the P-NP problem without using “…mathematical and technical commentary and even the formal definition of the P-NP problem….”

Now, we were talking about how difficult it is to explain topic maps?

Suggest we all read this as a source of inspiration for better (more accessible) explanations and tutorials on topic maps.

(I just downloaded it to the Kindle reader on a VM running on my Ubuntu box. This promises to be a great read!)

March 29, 2013

Mathematics Cannot Be Patented. Case Dismissed.

Filed under: Law,Mathematics,Patents — Patrick Durusau @ 4:48 am

Mathematics Cannot Be Patented. Case Dismissed. by Alan Schoenbaum.

From the post:

Score one for the good guys. Rackspace and Red Hat just defeated Uniloc, a notorious patent troll. This case never should have been filed. The patent never should have been issued. The ruling is historic because, apparently, it was the first time that a patent suit in the Eastern District of Texas has been dismissed prior to filing an answer in the case, on the grounds that the subject matter of the patent was found to be unpatentable. And was it ever unpatentable.

Red Hat indemnified Rackspace in the case. This is something that Red Hat does well, and kudos to them. They stand up for their customers and defend these Linux suits. The lawyers who defended us deserve a ton of credit. Bill Lee and Cynthia Vreeland of Wilmer Hale were creative and persuasive, and their strategy to bring the early motion to dismiss was brilliant.

The patent at issue is a joke. Uniloc alleged that a floating point numerical calculation by the Linux operating system violated U.S. Patent 5,892,697 – an absurd assertion. This is the sort of low quality patent that never should have been granted in the first place and which patent trolls buy up by the bushel full, hoping for fast and cheap settlements. This time, with Red Hat’s strong backing, we chose to fight.

The outcome was just what we had in mind. Chief Judge Leonard Davis found that the subject matter of the software patent was unpatentable under Supreme Court case law and, ruling from the bench, granted our motion for an early dismissal. The written order, which was released yesterday, is excellent and well-reasoned. It’s refreshing to see that the judiciary recognizes that many of the fundamental operations of a computer are pure mathematics and are not patentable subject matter. We expect, and hope, that many more of these spurious software patent lawsuits are dismissed on similar grounds.

A potential use case for a public topic map on patents?

At least on software patents?

Thinking that a topic map could be constructed of all the current patents that address mathematical operations, enabling academics and researchers to focus on factual analysis of the processes claimed by those patents.

From the factual analysis, other researchers, primarily lawyers and law students, could outline legal arguments, tailored for each patent, as to its invalidity.

A community resource, not unlike a patent bank, that would strengthen the community’s hand when dealing with patent trolls.

PS: I guess this means I need to stop working on my patent for addition. 😉

March 26, 2013

MapEquation.org

Filed under: Graphics,Graphs,Mapping,Mathematics,Networks,Visualization — Patrick Durusau @ 1:30 pm

MapEquation.org by Daniel Edler and Martin Rosvall.

From the “about” page:

What do we do?

We develop mathematics, algorithms and software to simplify and highlight important structures in complex systems.

What are our goals?

To navigate and understand big data like we navigate and understand the real world by maps.

Suggest you start with the Apps.

Very impressive and has data available for loading.

You can also upload your own data.

Spend some time with Code and Publications as well.

I first saw this in a tweet by Chris@SocialTexture.

March 23, 2013

Tensors and Their Applications…

Filed under: Linked Data,Machine Learning,Mathematics,RDF,Tensors — Patrick Durusau @ 6:36 pm

Tensors and Their Applications in Graph-Structured Domains by Maximilian Nickel and Volker Tresp. (Slides.)

Along with the slides, you will like abstract and bibliography found at: Machine Learning on Linked Data: Tensors and their Applications in Graph-Structured Domains.

Abstract:

Machine learning has become increasingly important in the context of Linked Data as it is an enabling technology for many important tasks such as link prediction, information retrieval or group detection. The fundamental data structure of Linked Data is a graph. Graphs are also ubiquitous in many other fields of application, such as social networks, bioinformatics or the World Wide Web. Recently, tensor factorizations have emerged as a highly promising approach to machine learning on graph-structured data, showing both scalability and excellent results on benchmark data sets, while matching perfectly to the triple structure of RDF. This tutorial will provide an introduction to tensor factorizations and their applications for machine learning on graphs. By means of concrete tasks such as link prediction we will discuss several factorization methods in-depth and also provide necessary theoretical background on tensors in general. Emphasis is put on tensor models that are of interest to Linked Data, which will include models that are able to factorize large-scale graphs with millions of entities and known facts or models that can handle the open-world assumption of Linked Data. Furthermore, we will discuss tensor models for temporal and sequential graph data, e.g. to analyze social networks over time.
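To make the triple-tensor idea concrete, here is a minimal sketch of my own — not the RESCAL model from the slides. The entities, relations and triples are invented, and a plain truncated SVD of one unfolding stands in for the more sophisticated shared-factor models the tutorial covers; reconstructed entries can then be read as link-prediction scores.

```python
import numpy as np

# A tiny "RDF-style" tensor: X[i, j, k] = 1 iff the triple
# (entity_i, relation_k, entity_j) is a known fact.
entities = ["alice", "bob", "carol"]
relations = ["knows", "worksWith"]
triples = [("alice", "knows", "bob"), ("bob", "knows", "alice"),
           ("alice", "worksWith", "carol"), ("bob", "worksWith", "carol")]

e = {n: i for i, n in enumerate(entities)}
r = {n: k for k, n in enumerate(relations)}
X = np.zeros((3, 3, 2))
for s, p, o in triples:
    X[e[s], e[o], r[p]] = 1.0

# Mode-1 unfolding + truncated SVD: the simplest low-rank factorization.
# RESCAL-style models refine this with an entity factor matrix shared
# across all relation slices.
X1 = X.reshape(3, -1)                      # 3 x 6 unfolding
U, s, Vt = np.linalg.svd(X1, full_matrices=False)
rank = 2
X1_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]

scores = X1_hat.reshape(X.shape)           # reconstruction = link scores
print(round(float(scores[e["alice"], e["bob"], r["knows"]]), 2))
```

On real data the tensor is far from exactly low rank, and the interesting predictions are the *nonzero* scores for triples that were not in the input.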

Devising a system to deal with the heterogeneous nature of linked data.

Just skimming the slides I could see, this looks very promising.

I first saw this in a tweet by Stefano Bertolo.


Update: I just got an email from Maximilian Nickel and he has altered the transition between slides. Working now!

From slide 53 forward is pure gold for topic map purposes.

Heavy sledding but let me give you one statement from the slides that should capture your interest:

Instance matching: Ranking of entities by their similarity in the entity-latent-component space.

Although written about linked data, not limited to linked data.

What is more, Maximilian offers proof that the technique scales!

Complex, configurable, scalable determination of subject identity!

[Update: deleted note about issues with slides, which read: (Slides for ISWC 2012 tutorial, Chrome is your best bet. Even better bet, Chrome on Windows. Chrome on Ubuntu crashed every time I tried to go to slide #15. Windows gets to slide #46 before failing to respond. I have written to inquire about the slides.)]

Methods of Proof — Induction

Filed under: Mathematical Reasoning,Mathematics,Programming — Patrick Durusau @ 1:20 pm

Methods of Proof — Induction by Jeremy Kun.

Jeremy covers proof by induction in the final post for his “proof” series.

Induction is used to prove statements about natural numbers (positive integers).
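Induction proves a statement for *every* natural number; code can only spot-check instances, but that makes a useful sanity test. A hedged sketch of my own, using the classic closed form for 1 + 2 + … + n:

```python
# Claim (proved by induction): 1 + 2 + ... + n == n(n+1)/2.
def gauss_sum(n):
    return n * (n + 1) // 2

# Base case n = 1.
assert gauss_sum(1) == 1

# The inductive step, checked numerically: if the formula holds at n,
# adding n+1 must give the formula's value at n+1.
for n in range(1, 100):
    assert gauss_sum(n) + (n + 1) == gauss_sum(n + 1)

print(gauss_sum(100))  # 5050
```

The two assertions mirror exactly the two obligations of an induction proof: the base case and the inductive step.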

Lars Marius Garshol recently concluded slides on big data with:

  • Vast potential
    • to both big data and machine learning
  • Very difficult to realize that potential
    • requires mathematics, which nobody knows
  • We need to wake up!

Big Data 101 by Lars Marius Garshol.

If you want to step up your game with big data, you will need to master mathematics.

Excel and other software can do mathematics but can’t choose the mathematics to apply.

That requires you.

March 22, 2013

The Shape of Data

Filed under: Data,Mathematics,Topology — Patrick Durusau @ 1:17 pm

The Shape of Data by Jesse Johnson.

From the “about” page:

Whether your goal is to write data intensive software, use existing software to analyze large, high dimensional data sets, or to better understand and interact with the experts who do these things, you will need a strong understanding of the structure of data and how one can try to understand it. On this blog, I plan to explore and explain the basic ideas that underlie modern data analysis from a very intuitive and minimally technical perspective: by thinking of data sets as geometric objects.

When I began learning about machine learning and data mining, I found that the intuition I had formed while studying geometry was extremely valuable in understanding the basic concepts and algorithms. My main obstacle has been to figure out what types of problems others are interested in solving, and what types of solutions would make the most difference. I hope that by sharing what I know (and what I continue to learn) from my own perspective, others will help me to figure out what are the major questions that drive this field.

A new blog that addresses the topology of data, in an accessible manner.

March 12, 2013

A Concise Course in Algebraic Topology

Filed under: Algebra,Mathematics,Topology — Patrick Durusau @ 10:05 am

A Concise Course in Algebraic Topology by J.P. May. (PDF)

From the introduction:

The first year graduate program in mathematics at the University of Chicago consists of three three-quarter courses, in analysis, algebra, and topology. The first two quarters of the topology sequence focus on manifold theory and differential geometry, including differential forms and, usually, a glimpse of de Rham cohomology. The third quarter focuses on algebraic topology. I have been teaching the third quarter off and on since around 1970. Before that, the topologists, including me, thought that it would be impossible to squeeze a serious introduction to algebraic topology into a one quarter course, but we were overruled by the analysts and algebraists, who felt that it was unacceptable for graduate students to obtain their PhDs without having some contact with algebraic topology.

This raises a conundrum. A large number of students at Chicago go into topology, algebraic and geometric. The introductory course should lay the foundations for their later work, but it should also be viable as an introduction to the subject suitable for those going into other branches of mathematics. These notes reflect my efforts to organize the foundations of algebraic topology in a way that caters to both pedagogical goals. There are evident defects from both points of view. A treatment more closely attuned to the needs of algebraic geometers and analysts would include Čech cohomology on the one hand and de Rham cohomology and perhaps Morse homology on the other. A treatment more closely attuned to the needs of algebraic topologists would include spectral sequences and an array of calculations with them. In the end, the overriding pedagogical goal has been the introduction of basic ideas and methods of thought.

Tough sledding but having insights, like those found in the GraphLab project, require a deeper than usual understanding of the issues at hand.

I first saw this in a tweet by Topology Fact.

March 5, 2013

A Simple, Combinatorial Algorithm for Solving…

Filed under: Algorithms,Data Structures,Mathematics — Patrick Durusau @ 2:55 pm

A Simple, Combinatorial Algorithm for Solving SDD Systems in Nearly-Linear Time by Jonathan A. Kelner, Lorenzo Orecchia, Aaron Sidford, Zeyuan Allen Zhu.

Abstract:

In this paper, we present a simple combinatorial algorithm that solves symmetric diagonally dominant (SDD) linear systems in nearly-linear time. It uses very little of the machinery that previously appeared to be necessary for such an algorithm. It does not require recursive preconditioning, spectral sparsification, or even the Chebyshev Method or Conjugate Gradient. After constructing a “nice” spanning tree of a graph associated with the linear system, the entire algorithm consists of the repeated application of a simple (non-recursive) update rule, which it implements using a lightweight data structure. The algorithm is numerically stable and can be implemented without the increased bit-precision required by previous solvers. As such, the algorithm has the fastest known running time under the standard unit-cost RAM model. We hope that the simplicity of the algorithm and the insights yielded by its analysis will be useful in both theory and practice.
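If "symmetric diagonally dominant" is unfamiliar: a matrix qualifies when it equals its transpose and each diagonal entry is at least the sum of the absolute values of the other entries in its row. Graph Laplacians — the systems this line of work targets — are the canonical example. A small checker of my own (the function name is mine, not from the paper):

```python
import numpy as np

def is_sdd(A, tol=1e-12):
    """True iff A is symmetric and (weakly) diagonally dominant:
    |A[i,i]| >= sum of |off-diagonal entries| in row i, for every i."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        return False
    off = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    return bool(np.all(np.abs(np.diag(A)) + tol >= off))

# The Laplacian L = D - A of a path graph on 3 nodes is SDD.
L = np.array([[ 1, -1,  0],
              [-1,  2, -1],
              [ 0, -1,  1]])
print(is_sdd(L))                      # True
print(is_sdd([[1, 5], [5, 1]]))       # False: off-diagonal dominates
```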

In one popular account, the importance of the discovery was described this way:

The real value of the MIT paper, Spielman says, is in its innovative theoretical approach. “My work and the work of the folks at Carnegie Mellon, we’re solving a problem in numeric linear algebra using techniques from the field of numerical linear algebra,” he says. “Jon’s paper is completely ignoring all of those techniques and really solving this problem using ideas from data structures and algorithm design. It’s substituting one whole set of ideas for another set of ideas, and I think that’s going to be a bit of a game-changer for the field. Because people will see there’s this set of ideas out there that might have application no one had ever imagined.”

Thirty-two pages of tough sledding but if the commentaries are correct, this paper may have a major impact on graph processing.

March 1, 2013

Methods of Proof — Contradiction

Filed under: Mathematical Reasoning,Mathematics — Patrick Durusau @ 5:30 pm

Methods of Proof — Contradiction by Jeremy Kun.

From the post:

In this post we’ll expand our toolbox of proof techniques by adding the proof by contradiction. We’ll also expand on our knowledge of functions on sets, and tackle our first nontrivial theorem: that there is more than one kind of infinity.

Impossibility and an Example Proof by Contradiction

Many of the most impressive results in all of mathematics are proofs of impossibility. We see these in lots of different fields. In number theory, plenty of numbers cannot be expressed as fractions. In geometry, certain geometric constructions are impossible with a straight-edge and compass. In computing theory, certain programs cannot be written. And in logic even certain mathematical statements can’t be proven or disproven.

In some sense proofs of impossibility are the hardest proofs, because it’s unclear to the layman how anyone could prove it’s not possible to do something. Perhaps this is part of human nature, that nothing is too impossible to escape the realm of possibility. But perhaps it’s more surprising that the main line of attack to prove something is impossible is to assume it’s possible, and see what follows as a result. This is precisely the method of proof by contradiction:

Assume the claim you want to prove is false, and deduce that something obviously impossible must happen.

There is a simple and very elegant example that I use to explain this concept to high school students in my guest lectures.

I hope you are following this series of posts but if not, at least read the example Jeremy has for proof by contradiction.

It’s a real treat.
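The "more than one kind of infinity" theorem Jeremy tackles rests on Cantor's diagonal argument, which is contradiction in its purest form: assume a complete list of binary sequences exists, then build a sequence the list cannot contain. A finite truncation of that construction, as a sketch of my own:

```python
# Cantor's diagonal argument, finitely truncated: given any list of
# binary sequences, flipping the diagonal yields a sequence that
# differs from the n-th entry at position n -- so no list is complete.
def diagonal_escape(listed):
    """Return a sequence differing from listed[n] at index n."""
    return [1 - listed[n][n] for n in range(len(listed))]

listed = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]
d = diagonal_escape(listed)
print(d)  # [1, 0, 1, 0]
assert all(d[n] != listed[n][n] for n in range(4))
```

The contradiction: we assumed the list was complete, yet here is a sequence it misses. The assumption must be false.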

Methods of Proof — Contrapositive

Filed under: Mathematical Reasoning,Mathematics — Patrick Durusau @ 5:30 pm

Methods of Proof — Contrapositive by Jeremy Kun.

From the post:

In this post we’ll cover the second of the “basic four” methods of proof: the contrapositive implication. We will build off our material from last time and start by defining functions on sets.

Functions as Sets

So far we have become comfortable with the definition of a set, but the most common way to use sets is to construct functions between them. As programmers we readily understand the nature of a function, but how can we define one mathematically? It turns out we can do it in terms of sets, but let us recall the desired properties of a function:

  • Every input must have an output.
  • Every input can only correspond to one output (the functions must be deterministic).
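The two properties above translate directly into code. A small checker of my own (the function name and sample sets are mine, not Jeremy's):

```python
# A function f: A -> B represented as a set of (input, output) pairs,
# with the two defining properties checked directly.
def is_function(pairs, domain):
    inputs = [a for a, _ in pairs]
    total = set(inputs) == set(domain)                # every input has an output
    deterministic = len(inputs) == len(set(inputs))   # at most one output per input
    return total and deterministic

A = {1, 2, 3}
print(is_function({(1, "x"), (2, "y"), (3, "x")}, A))            # True
print(is_function({(1, "x"), (2, "y")}, A))                      # False: 3 unmapped
print(is_function({(1, "x"), (1, "y"), (2, "y"), (3, "z")}, A))  # False: 1 has two outputs
```

Note the third example: a set of pairs can cover the whole domain and still fail to be a function if some input appears twice.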

Jeremy continues his series on proof techniques.

Required knowledge for reading formal CS papers.

February 25, 2013

Discrete Structures (University of Washington 2008)

Filed under: Discrete Structures,Mathematics — Patrick Durusau @ 1:38 pm

I found the following material following a link in Christophe Lalanne’s A bag of tweets / February 2013 on “Common Mistakes in Discrete Mathematics.”

Clearly organized around a text but it wasn’t clear which text was being used.

Backing up the URI, I found the homepage for: CSE 321: Discrete Structures 2008, which listed the textbook as Rosen, Discrete Mathematics and Its Applications, McGraw-Hill, 6th Edition. (BTW, there is a 7th edition, Discrete Mathematics and Its Applications).

I also found links for:

Homework

Lecture Slides

Recorded Lectures

Post-Section Notes (note and a problem correction)

and, the origin of my inquiry:

Common Mistakes in Discrete Mathematics

In this section of the Guide we list many common mistakes that people studying discrete mathematics sometimes make. The list is organized chapter by chapter, based on when they first occur, but sometimes mistakes made early in the course perpetuate in later chapters. Also, some of these mistakes are remnants of misconceptions from high school mathematics (such as the impulse to assume that every operation distributes over every other operation).

In most cases we describe the mistake, give a concrete example, and then offer advice about how to avoid it. Note that additional advice about common mistakes is given, implicitly or explicitly, in the solutions to the odd-numbered exercises, which constitute the bulk of this Guide.
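The "every operation distributes over every other operation" impulse the Guide mentions is the kind of mistake a single counterexample kills. A trivial check of my own:

```python
# Squaring does not distribute over addition: (a+b)^2 != a^2 + b^2
# in general (the cross term 2ab is missing).
a, b = 2, 3
print((a + b) ** 2)        # 25
print(a ** 2 + b ** 2)     # 13
assert (a + b) ** 2 != a ** 2 + b ** 2
```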

If 2008 sounds a bit old, you’re right. There is an update that requires a separate post. See: UW Courses in Computer Science and Engineering.

February 16, 2013

Methods of Proof — Direct Implication

Filed under: Mathematical Reasoning,Mathematics — Patrick Durusau @ 4:49 pm

Methods of Proof — Direct Implication by Jeremy Kun.

From the post:

I recently posted an exploratory piece on why programmers who are genuinely interested in improving their mathematical skills can quickly lose stamina or be deterred. My argument was essentially that they don’t focus enough on mastering the basic methods of proof before attempting to read research papers that assume such knowledge. Also, there are a number of confusing (but in the end helpful) idiosyncrasies in mathematical culture that are often unexplained. Together these can cause enough confusion to stymie even the most dedicated reader. I have certainly experienced it enough to call the feeling familiar.

Now I’m certainly not trying to assert that all programmers need to learn mathematics to improve their craft, nor that learning mathematics will be helpful to any given programmer. All I claim is that someone who wants to understand why theorems are true, or how to tweak mathematical work to suit their own needs, cannot succeed without a thorough understanding of how these results are developed in the first place. Function definitions and variable declarations may form the scaffolding of a C program while the heart of the program may only be contained in a few critical lines of code. In the same way, the heart of a proof is usually quite small and the rest is scaffolding. One surely cannot understand or tweak a program without understanding the scaffolding, and the same goes for mathematical proofs.

And so we begin this series focusing on methods of proof, and we begin in this post with the simplest such methods. I call them the “basic four,” and they are:

  • Proof by direct implication
  • Proof by contradiction
  • Proof by contrapositive, and
  • Proof by induction.

This post will focus on the first one, while introducing some basic notation we will use in the future posts. Mastering these proof techniques does take some practice, and it helps to have some basic mathematical content with which to practice on. We will choose the content of set theory because it’s the easiest field in terms of definitions, and its syntax is the most widely used in all but the most pure areas of mathematics. Part of the point of this primer is to spend time demystifying notation as well, so we will cover the material at a leisurely (for an experienced mathematician: aggravatingly slow) pace.
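Direct implication in set theory means: assume the hypothesis, unwind the definitions, arrive at the conclusion. A proof covers all cases at once; code can at least check a claim exhaustively on concrete sets. A sketch of my own for the claim "if x ∈ A ∩ B, then x ∈ A ∪ B":

```python
# Direct implication, spot-checked on finite sets: every witness of
# the hypothesis (x in the intersection) must satisfy the conclusion
# (x in the union).
A = {1, 2, 3, 4}
B = {3, 4, 5}

for x in A & B:            # hypothesis: x is in A intersect B
    assert x in A | B      # conclusion: x is in A union B

print(sorted(A & B))       # [3, 4]
```

The proof itself is the same unwinding done symbolically: x ∈ A ∩ B means x ∈ A and x ∈ B; in particular x ∈ A, hence x ∈ A ∪ B.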

Parallel processing, multi-core memory architectures, graphs and the like are a long way from the cookbook stage of programming.

If you want to be on the leading edge, some mathematics are going to be required.

This series can bring you one step closer to mathematical literacy.

I say “can” because whether it will or not, depends upon you.

February 9, 2013

…no Hitchhiker’s Guide…

Filed under: Computation,Computer Science,Mathematical Reasoning,Mathematics — Patrick Durusau @ 8:22 pm

Why there is no Hitchhiker’s Guide to Mathematics for Programmers by Jeremy Kun.

From the post:

Do you really want to get better at mathematics?

Remember when you first learned how to program? I do. I spent two years experimenting with Java programs on my own in high school. Those two years collectively contain the worst and most embarrassing code I have ever written. My programs absolutely reeked of programming no-nos. Hundred-line functions and even thousand-line classes, magic numbers, unreachable blocks of code, ridiculous code comments, a complete disregard for sensible object orientation, negligence of nearly all logic, and type-coercion that would make your skin crawl. I committed every naive mistake in the book, and for all my obvious shortcomings I considered myself a hot-shot programmer! At least I was learning a lot, and I was a hot-shot programmer in a crowd of high-school students interested in game programming.

Even after my first exposure and my commitment to get a programming degree in college, it was another year before I knew what a stack frame or a register was, two more before I was anywhere near competent with a terminal, three more before I fully appreciated functional programming, and to this day I still have an irrational fear of networking and systems programming (the first time I manually edited the call stack I couldn’t stop shivering with apprehension and disgust at what I was doing).

A must read post if you want to be on the cutting edge of programming.

February 6, 2013

The Evolution of Regression Modeling… [Webinar]

Filed under: Mathematics,Modeling,Regression — Patrick Durusau @ 12:46 pm

The Evolution of Regression Modeling: From Classical Linear Regression to Modern Ensembles by Mikhail Golovnya and Illia Polosukhin.

Dates/Times:

Part 1: Friday, March 1, 10 am, PST

Part 2: Friday, March 15, 10 am, PST

Part 3: Friday, March 29, 10 am, PST

Part 4: Friday, April 12, 10 am, PST

From the webpage:

Class Description: Regression is one of the most popular modeling methods, but the classical approach has significant problems. This webinar series addresses these problems. Are you working with larger datasets? Is your data challenging? Does your data include missing values, nonlinear relationships, local patterns and interactions? This webinar series is for you! We will cover improvements to conventional and logistic regression, and will include a discussion of classical, regularized, and nonlinear regression, as well as modern ensemble and data mining approaches. This series will be of value to any classically trained statistician or modeler.

Details:

Part 1: March 1 – Regression methods discussed

  •     Classical Regression
  •     Logistic Regression
  •     Regularized Regression: GPS Generalized Path Seeker
  •     Nonlinear Regression: MARS Regression Splines

Part 2: March 15 – Hands-on demonstration of concepts discussed in Part 1

  •     Step-by-step demonstration
  •     Datasets and software available for download
  •     Instructions for reproducing demo at your leisure
  •     For the dedicated student: apply these methods to your own data (optional)

Part 3: March 29 – Regression methods discussed
*Part 1 is a recommended pre-requisite

  •     Nonlinear Ensemble Approaches: TreeNet Gradient Boosting; Random Forests; Gradient Boosting incorporating RF
  •     Ensemble Post-Processing: ISLE; RuleLearner

Part 4: April 12 – Hands-on demonstration of concepts discussed in part 3

  •     Step-by-step demonstration
  •     Datasets and software available for download
  •     Instructions for reproducing demo at your leisure
  •     For the dedicated student: apply these methods to your own data (optional)

Salford Systems offers other introductory videos, webinars and tutorial and case studies.

Regression modeling is a tool you will encounter in data analysis and is likely to be an important part of your exploration toolkit.
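The jump from classical to regularized regression covered in Part 1 can be sketched in a few lines. This is plain NumPy, not Salford's proprietary GPS or MARS tools — the closed-form ridge solution below is just my illustration of the shrinkage idea:

```python
# Ordinary least squares vs. ridge regression: the ridge penalty lam
# shrinks coefficients toward zero, trading a little bias for stability.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# only the first two features carry signal
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

def ridge(X, y, lam):
    # closed form: w = (X^T X + lam * I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ols = ridge(X, y, lam=0.0)     # lam = 0 recovers least squares
w_reg = ridge(X, y, lam=100.0)   # heavier penalty shrinks every weight

# total coefficient magnitude drops under regularization
print(np.abs(w_reg).sum() < np.abs(w_ols).sum())
```

Swapping the squared penalty for an absolute-value one (the lasso) is what makes coefficients go exactly to zero, which is closer in spirit to the "path seeker" tools the webinar covers.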

I first saw this at KDNuggets.

January 27, 2013

Information field theory

Filed under: Data Analysis,Information Field Theory,Mathematics,Uncertainty — Patrick Durusau @ 5:41 pm

Information field theory

From the webpage:

Information field theory (IFT) is information theory, the logic of reasoning under uncertainty, applied to fields. A field can be any quantity defined over some space, e.g. the air temperature over Europe, the magnetic field strength in the Milky Way, or the matter density in the Universe. IFT describes how data and knowledge can be used to infer field properties. Mathematically it is a statistical field theory and exploits many of the tools developed for such. Practically, it is a framework for signal processing and image reconstruction.

IFT is fully Bayesian. How else can infinitely many field degrees of freedom be constrained by finite data?

It can be used without the knowledge of Feynman diagrams. There is a full toolbox of methods.

It reproduces many known well working algorithms. This should be reassuring.

And, there were certainly previous works in a similar spirit. See below for IFT publications and previous works.

Anyhow, in many cases IFT provides novel rigorous ways to extract information from data.

Please, have a look! The specific literature is listed below and more general highlight articles on the right hand side.

Just in case you want to be on the cutting edge of information extraction. 😉

And you might note that Feynman diagrams are graphic representations (maps) of complex mathematical equations.
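The simplest algorithm IFT reproduces is the classical Wiener filter: for data d = R s + n with Gaussian signal prior S and noise covariance N, the posterior mean field is m = (S⁻¹ + Rᵀ N⁻¹ R)⁻¹ Rᵀ N⁻¹ d. A toy NumPy version of my own (not the IFT group's code, and inverting dense matrices like this only makes sense for small examples):

```python
# Toy Wiener filter: infer a smooth field from noisy pixel measurements.
import numpy as np

rng = np.random.default_rng(1)
npix = 50
R = np.eye(npix)                 # trivial response: measure every pixel
N = 0.5 * np.eye(npix)           # noise covariance
# smooth signal prior: correlations fall off with pixel distance
i = np.arange(npix)
S = np.exp(-np.abs(i[:, None] - i[None, :]) / 5.0)

s = rng.multivariate_normal(np.zeros(npix), S)        # "true" signal
d = R @ s + rng.multivariate_normal(np.zeros(npix), N)  # noisy data

D = np.linalg.inv(np.linalg.inv(S) + R.T @ np.linalg.inv(N) @ R)
m = D @ R.T @ np.linalg.inv(N) @ d                    # posterior mean

# the reconstruction should sit closer to the true field than raw data
print(np.mean((m - s) ** 2), np.mean((d - s) ** 2))
```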

Computational Information Geometry

Filed under: Geometry,Information Geometry,Mathematics — Patrick Durusau @ 5:40 pm

Computational Information Geometry by Frank Nielsen.

From the homepage:

Computational information geometry deals with the study and design of efficient algorithms in information spaces using the language of geometry (such as invariance, distance, projection, ball, etc). Historically, the field was pioneered by C.R. Rao in 1945 who proposed to use the Fisher information metric as the Riemannian metric. This seminal work gave birth to the geometrization of statistics (eg, statistical curvature and second-order efficiency). In statistics, invariance (by non-singular 1-to-1 reparametrization and sufficient statistics) yield the class of f-divergences, including the celebrated Kullback-Leibler divergence. The differential geometry of f-divergences can be analyzed using dual alpha-connections. Common algorithms in machine learning (such as clustering, expectation-maximization, statistical estimating, regression, independent component analysis, boosting, etc) can be revisited and further explored using those concepts. Nowadays, the framework of computational information geometry opens up novel horizons in music, multimedia, radar, and finance/economy.

Numerous resources including publications, links to conference proceedings (some with videos), software and other materials, including a tri-lingual dictionary, Japanese, English, French, of terms in information geometry.
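The Kullback-Leibler divergence mentioned above is simple to compute for discrete distributions: KL(p ‖ q) = Σᵢ pᵢ log(pᵢ/qᵢ). A small sketch of my own (not Nielsen's code) that also shows why it is a divergence rather than a distance — it is not symmetric:

```python
# KL divergence between two discrete distributions.
import math

def kl(p, q):
    # terms with p_i = 0 contribute 0 by convention
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]

print(round(kl(p, q), 4))    # 0.0589 -- divergence from uniform
print(kl(p, p))              # 0.0 -- zero against itself
assert kl(p, q) != kl(q, p)  # not symmetric: a divergence, not a metric
```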

Dictionary of computational information geometry

Filed under: Geometry,Information Geometry,Mathematics — Patrick Durusau @ 5:40 pm

Dictionary of computational information geometry (PDF) by Frank Nielsen. (Compiled January 23, 2013)

The title is a bit misleading.

It should read: “[Tri-Lingual] Dictionary of computational information geometry.”

Terms are defined in:

Japanese-English

English-Japanese

Japanese-French

An excellent resource in a linguistically diverse world!

January 15, 2013

Symbolab

Filed under: Mathematics,Mathematics Indexing,Search Engines,Searching — Patrick Durusau @ 8:31 pm

Symbolab

Described as:

Symbolab is a search engine for students, mathematicians, scientists and anyone else looking for answers in the mathematical and scientific realm. Other search engines that do equation search use LaTeX, the document markup language for mathematical symbols, treating it the same as keywords, which unfortunately gives poor results.

Symbolab uses proprietary machine learning algorithms to provide the most relevant search results that are theoretically and semantically similar, rather than visually similar. In other words, it does a semantic search, understanding the query behind the symbols, to get results.

The nice thing about math and science is that it’s universal – there’s no need for translation in order to understand an equation. This means scale can come much quicker than other search engines that are limited by language.

From: The guys from The Big Bang Theory will love mathematical search engine Symbolab by Shira Abel. (Includes an interview with Michael Avny, the CEO of Symbolab.)

Limited to web content at the moment but a “scholar” option is in the works. I assume that will extend into academic journals.

Focused now on mathematics, physics and chemistry, but in principle should be extensible to related areas. I am particularly eager to hear when they begin indexing CS publications!

Would be really nice if Springer, Elsevier, the AMS and others would permit indexing of their equations.

That presumes publishers would realize that shutting out users not at institutions is a bad marketing plan. With a marginal delivery cost of near zero and sunk costs from publication already fixed, every user a publisher gains at $200/year for their entire collection is $200 they did not have before.

Not to mention the citation and use of their publication, which just drives more people to publish there. A virtuous circle if you will.

The only concern I have is the comment:

The nice thing about math and science is that it’s universal – there’s no need for translation in order to understand an equation.

Which is directly contrary to what Michael is quoted as saying in the interview:

You say “Each symbol can mean different things within and across disciplines, order and position of elements matter, priority of features, etc.” Can you give an example of this?

The authors of the Foundations of Rule Learning spent five years attempting to reconcile notations used in rule learning. Some symbols had different meanings. They resorted to inventing yet another notation as a solution.

Why the popular press perpetuates the myth of a universal language isn’t clear.

It isn’t useful and in some cases, such as national security, it leads to waste of time and resources on attempts to invent a universal language.

The phrase “myth of a universal language” should be a clue. Universal languages don’t exist. They are myths, by definition.

Anyone who says differently is trying to sell you something, Something that is in their interest and perhaps not yours.

I first saw this at Introducing Symbolab: Search for Equations by Angela Guess.

January 11, 2013

Probability Theory — A Primer

Filed under: Mathematics,Probability — Patrick Durusau @ 7:36 pm

Probability Theory — A Primer by Jeremy Kun.

From the post:

It is a wonder that we have yet to officially write about probability theory on this blog. Probability theory underlies a huge portion of artificial intelligence, machine learning, and statistics, and a number of our future posts will rely on the ideas and terminology we lay out in this post. Our first formal theory of machine learning will be deeply ingrained in probability theory, we will derive and analyze probabilistic learning algorithms, and our entire treatment of mathematical finance will be framed in terms of random variables.

And so it’s about time we got to the bottom of probability theory. In this post, we will begin with a naive version of probability theory. That is, everything will be finite and framed in terms of naive set theory without the aid of measure theory. This has the benefit of making the analysis and definitions simple. The downside is that we are restricted in what kinds of probability we are allowed to speak of. For instance, we aren’t allowed to work with probabilities defined on all real numbers. But for the majority of our purposes on this blog, this treatment will be enough. Indeed, most programming applications restrict infinite problems to finite subproblems or approximations (although in their analysis we often appeal to the infinite).

We should make a quick disclaimer before we get into the thick of things: this primer is not meant to connect probability theory to the real world. Indeed, to do so would be decidedly unmathematical. We are primarily concerned with the mathematical formalisms involved in the theory of probability, and we will leave the philosophical concerns and applications to future posts. The point of this primer is simply to lay down the terminology and basic results needed to discuss such topics to begin with.

So let us begin with probability spaces and random variables.

Jeremy’s “primer” posts make good background reading. (A primers listing.)

Work through them carefully for best results.
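Jeremy's naive, finite setting is easy to play with directly: a probability space is just a finite set of outcomes with weights summing to one, and a random variable is a function on it. A small sketch of my own (not code from the primer), using exact rational arithmetic:

```python
# A finite probability space: two fair dice, each of the 36 outcomes
# weighted 1/36, and a random variable X = sum of the two dice.
# E[X] = sum over outcomes omega of X(omega) * P(omega).
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
P = {w: Fraction(1, 36) for w in omega}

def expectation(X):
    return sum(X(w) * P[w] for w in omega)

X = lambda w: w[0] + w[1]
print(expectation(X))                          # 7
print(sum(P[w] for w in omega if X(w) == 7))   # P(X = 7) = 1/6
```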

December 14, 2012

mathURL

Filed under: Mathematics,TeX/LaTeX — Patrick Durusau @ 3:07 pm

mathURL live equation editing · permanent short links · LaTeX+AMS input

Try: http://mathurl.com/5euwuy

Includes layout, letters and symbols, operators and relations, punctuation and accents, functions, formatting and common forms as selectable items that generate LaTeX code in the editing window.

Interesting to think about use of such a link as a subject identifier.

I first saw this in a tweet from Tex tips.

December 6, 2012

Advanced Data Analysis from an Elementary Point of View

Filed under: Data Analysis,Mathematics,Statistics — Patrick Durusau @ 11:35 am

Advanced Data Analysis from an Elementary Point of View by Cosma Rohilla Shalizi. (UPDATE: 2014 draft)

From the Introduction:

These are the notes for 36-402, Advanced Data Analysis, at Carnegie Mellon. If you are not enrolled in the class, you should know that it’s the methodological capstone of the core statistics sequence taken by our undergraduate majors (usually in their third year), and by students from a range of other departments. By this point, they have taken classes in introductory statistics and data analysis, probability theory, mathematical statistics, and modern linear regression (“401”). This class does not presume that you have learned but forgotten the material from the pre-requisites; it presumes that you know that material and can go beyond it. The class also presumes a firm grasp on linear algebra and multivariable calculus, and that you can read and write simple functions in R. If you are lacking in any of these areas, now would be an excellent time to leave.

36-402 is a class in statistical methodology: its aim is to get students to understand something of the range of modern1 methods of data analysis, and of the considerations which go into choosing the right method for the job at hand (rather than distorting the problem to fit the methods the student happens to know). Statistical theory is kept to a minimum, and largely introduced as needed.

[Footnote 1] Just as an undergraduate “modern physics” course aims to bring the student up to about 1930 (more specifically, to 1926), this class aims to bring the student up to about 1990.

Very recent introduction to data analysis. Shalizi includes a list of concepts in the introduction that are best mastered before tackling this material.

According to footnote 1, when you have mastered this material, you have another twenty-two years to make up in general and on your problem in particular.

Still, knowing it cold will put you ahead of a lot of data analysis you are going to encounter.

I first saw this in a tweet by Gene Golovchinsky.

December 5, 2012

Mathematical Writing

Filed under: Mathematics,Writing — Patrick Durusau @ 7:09 am

Mathematical Writing by Don Knuth, Tracy Larrabee, and Paul M. Roberts.

From the course catalog:

CS 209. Mathematical Writing—Issues of technical writing and the effective presentation of mathematics and computer science. Preparation of theses, papers, books, and “literate” computer programs. A term paper on a topic of your choice; this paper may be used for credit in another course.

Stanford University, Fall, 1987.

An admixture of notes, citations, examples, and war stories by several legends in the world of CS.

While making a number of serious points about writing, the materials are also highly entertaining.

I first saw this in Christophe Lalanne’s A bag of tweets / November 2012.

The Elements of Statistical Learning (2nd ed.)

Filed under: Machine Learning,Mathematics,Statistical Learning,Statistics — Patrick Durusau @ 6:50 am

The Elements of Statistical Learning (2nd ed.) by Trevor Hastie, Robert Tibshirani and Jerome Friedman. (PDF)

The authors note in the preface to the first edition:

The field of Statistics is constantly challenged by the problems that science and industry brings to its door. In the early days, these problems often came from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded both in size and complexity. Challenges in the areas of data storage, organization and searching have led to the new field of “data mining”; statistical and computational problems in biology and medicine have created “bioinformatics.” Vast amounts of data are being generated in many fields, and the statistician’s job is to make sense of it all: to extract important patterns and trends, and understand “what the data says.” We call this learning from data.

I’m sympathetic to that sentiment but with the caveat that it is our semantic expectations of the data that give it any meaning to be “learned.”

Data isn’t lurking outside our door with “meaning” captured separate and apart from us. Our fancy otherwise obscures our role in the origin of “meaning” that we attach to data. In part to bolster the claim that the “facts/data say….”

It is us who take up the gauge for our mute friends, facts/data, and make claims on their behalf.

If we recognized those as our claims, perhaps we would be more willing to listen to the claims of others. Perhaps.

I first saw this in a tweet by Michael Conover.

November 21, 2012

ALGEBRA, Chapter 0

Filed under: Algebra,Category Theory,Mathematics — Patrick Durusau @ 11:27 am

ALGEBRA, Chapter 0 by Paolo Aluffi. (PDF)

From the introduction:

This text presents an introduction to algebra suitable for upper-level undergraduate or beginning graduate courses. While there is a very extensive offering of textbooks at this level, in my experience teaching this material I have invariably felt the need for a self-contained text that would start ‘from zero’ (in the sense of not assuming that the reader has had substantial previous exposure to the subject), but impart from the very beginning a rather modern, categorically-minded viewpoint, and aim at reaching a good level of depth. Many textbooks in algebra satisfy brilliantly some, but not all of these requirements. This book is my attempt at providing a working alternative.

There is a widespread perception that categories should be avoided at first blush, that the abstract language of categories should not be introduced until a student has toiled for a few semesters through example-driven illustrations of the nature of a subject like algebra. According to this viewpoint, categories are only tangentially relevant to the main topics covered in a beginning course, so they can simply be mentioned occasionally for the general edification of a reader, who will in time learn about them (by osmosis?). Paraphrasing a reviewer of a draft of the present text, ‘Discussions of categories at this level are the reason why God created appendices’.

It will be clear from a cursory glance at the table of contents that I think otherwise. In this text, categories are introduced around p. 20, after a scant reminder of the basic language of naive set theory, for the main purpose of providing a context for universal properties. These are in turn evoked constantly as basic definitions are introduced. The word ‘universal’ appears at least 100 times in the first three chapters.

If you are interested in a category theory based introduction to algebra, this may be the text for you. Suitable (according to the author) for use in a classroom or for self-study.

The ability to reason carefully, about what we imagine is known, should not be underestimated.
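Universal properties of the sort Aluffi foregrounds have concrete shadows even in plain code. For the product, the property says: given maps f: Z → A and g: Z → B, there is exactly one map h: Z → A × B that commutes with the projections, namely h(z) = (f(z), g(z)). A sketch of my own (the function names are illustrative, not Aluffi's):

```python
# The universal property of the product, concretely: pair(f, g) is the
# unique map into A x B whose composites with the projections recover
# f and g.
def pair(f, g):
    return lambda z: (f(z), g(z))

proj_a = lambda p: p[0]
proj_b = lambda p: p[1]

f = lambda z: z * z   # Z -> A
g = lambda z: z + 1   # Z -> B
h = pair(f, g)

# both triangles commute:
for z in range(5):
    assert proj_a(h(z)) == f(z)
    assert proj_b(h(z)) == g(z)
print(h(3))  # (9, 4)
```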

I first saw this in a tweet from Algebra Fact.

October 31, 2012

Make your own buckyball

Filed under: Geometry,Graphs,Mathematics,Modeling,Visualization — Patrick Durusau @ 1:05 pm

Make your own buckyball by John D. Cook.

From the post:

This weekend a couple of my daughters and I put together a buckyball from a Zometool kit. The shape is named for Buckminster Fuller of geodesic dome fame. Two years after Fuller’s death, scientists discovered that the shape appears naturally in the form of a C60 molecule, named Buckminsterfullerene in his honor. In geometric lingo, the shape is a truncated icosahedron. It’s also the shape of many soccer balls.

Don’t be embarrassed to use these at the office.

According to the PR, Roger Penrose does.
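The buckyball's counts make a nice sanity check on Euler's formula V − E + F = 2, which holds for any polyhedron homeomorphic to a sphere. From just the face counts (12 pentagons, 20 hexagons) everything else follows:

```python
# Truncated icosahedron: 12 pentagonal and 20 hexagonal faces.
# Each edge borders 2 faces and each vertex meets 3 faces, so the
# edge and vertex counts follow from the face counts.
pentagons, hexagons = 12, 20
F = pentagons + hexagons
E = (5 * pentagons + 6 * hexagons) // 2   # each edge shared by 2 faces
V = (5 * pentagons + 6 * hexagons) // 3   # each vertex shared by 3 faces

print(V, E, F)     # 60 90 32
print(V - E + F)   # Euler characteristic: 2
```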
