Monday, June 26, 2017
Sunday, June 25, 2017
In this article, I present a few modern techniques that have been used in various business contexts, and compare their performance with that of traditional methods. The advanced techniques in question are math-free and innovative, process large amounts of unstructured data efficiently, and are robust and scalable. Implementations in Python, R, Julia and Perl are provided, but here we focus on an Excel version that requires no macros, coding, or plug-ins: nothing beyond the most basic version of Excel. The method is also easy to implement in standard, basic SQL, and we invite readers to work on an SQL version.
Who should use the spreadsheet?
First, the spreadsheet (like the Python, R, Perl and Julia versions) is free to use and modify in any context, even commercial; you may even build a product from it and sell it. It is part of my concept of open patent, in which I share all my intellectual property publicly and for free.
The spreadsheet is designed as a tutorial, though it processes the same data set as the one used for the Python version. It is aimed at people who are not professional coders: people who manage data scientists, BI experts, MBA professionals, and people from other fields who want to understand the mechanics of some state-of-the-art machine learning techniques without spending months or years learning mathematics, programming, and computer science. A few hours are enough to understand the details. This spreadsheet can be a first step toward a new, more analytical career path, or a way to better understand the data scientists that you manage or interact with. It can also spark a career in data science, or even serve to teach machine learning concepts to high school students.
The spreadsheet also features a traditional technique (linear regression) for comparison purposes.
Click here to read this article, download the spreadsheet, and start using it.
Thursday, June 22, 2017
We propose a simple model-free solution to compute any confidence interval and to extrapolate these intervals beyond the observations avai...
These articles are between 3 and 5 years old, but are still valuable today. The methodology used in these articles is modern, and still stat...
The list below is a (non-comprehensive) selection of what I believe should be taught first, in data science classes, based on 30 years of b...
Full title: Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems. Published June 2, 2018. Aut...