Friday, November 29, 2019

Variance, Attractors and Behavior of Chaotic Statistical Systems

We study the properties of a typical chaotic system to derive general insights that apply to a large class of unusual statistical distributions. The purpose is to create a unified theory of these systems. These systems can be deterministic or random, yet due to their gentle chaotic nature, they exhibit the same behavior in both cases. They lead to new models with numerous applications in Fintech, cryptography, simulation and benchmarking tests of statistical hypotheses. They are also related to numeration systems. One of the highlights in this article is the discovery of a simple variance formula for an infinite sum of highly correlated random variables. We also try to find and characterize attractor distributions: these are the limiting distributions for the systems in question, just like the Gaussian attractor is the universal attractor with finite variance in the central limit theorem framework. Each of these systems is governed by a specific functional equation, typically a stochastic integral equation whose solutions are the attractors. This equation helps establish many of their properties. The material discussed here is state-of-the-art and original, yet presented in a format accessible to professionals with limited exposure to statistical science. Physicists, statisticians, data scientists and people interested in signal processing, chaos modeling, or dynamical systems will find this article particularly interesting. Connection to other similar chaotic systems is also discussed.
Read the full article here.
Content of this article:
1. The Geometric System: Definition and Properties
  • A test for independence
  • Connection to the Fixed-Point Theorem
2. Geometric and Uniform Attractors
  • General formula
  • The geometric attractor
  • Not any distribution can be an attractor
  • The uniform attractor
3. Discrete X Resulting in a Gaussian-looking Attractor
  • Towards a numerical solution
4. Special Cases with Continuous Distribution for X
  • An almost perfect equality
  • Is the log-normal distribution an attractor?
5. Connection to Binary Digits and Singular Distributions
  • Numbers made up of random digits
  • Singular distributions
  • Connection to Infinite Random Products
6. A General Classification of Chaotic Statistical Distributions

Thursday, November 28, 2019

New Family of Generalized Gaussian or Cauchy Distributions

In this article, we explore a new type of generalized univariate normal distribution that satisfies useful statistical properties and has interesting applications, discussed in the last section. This new class of distributions is defined by its characteristic function. These distributions are semi-stable (we define what this means below). In short, this is a much wider class than the stable distributions (the only stable distribution with a finite variance being the Gaussian one), and it encompasses all stable distributions as a subset. It is a sub-class of the divisible distributions.
Content of this article:
  • New two-parameter distribution G(a, b): introduction, properties
  • Generalized central limit theorem
  • Characteristic function
  • Density: special cases, moments, mathematical conjecture
  • Simulations
  • Weakly semi-stable distributions
  • Counter-example
  • Applications and conclusions
Read the full article here

Saturday, October 26, 2019

More Weird Statistical Distributions

Some original and very interesting material is presented here, with possible applications in Fintech. No need for a PhD in math to understand this article: I tried to make the presentation as simple as possible, focusing on high-level results rather than technicalities. Yet, professional statisticians and mathematicians, even academic researchers, will find some deep and fascinating results worth further exploring.
Can you identify patterns in this chart? (see section 2.2. in the article for an answer)
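Let's start with a random variable Z defined by a formula in the X(k)'s (the formula is an image in the full article and is not reproduced here).
Here the X(k)'s are independent and identically distributed random variables, commonly referred to as X. We are trying to find the distribution of Z.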
Contents
1. Using a Simple Discrete Distribution for X
2. Towards a Better Model
  • Approximate Solution
  • The Fractal, Brownian-like Error Term
3. Finding X and Z Using Characteristic Functions
  • Test with Log-normal Distribution for X
  • Playing with the Characteristic Functions
  • Generalization to Continued Fractions and Nested Cubic Roots
4. Exercises
Read this article here

Wednesday, October 2, 2019

Surprising Uses of Synthetic Random Data Sets

I have used synthetic data sets many times for simulation purposes, most recently in my articles Six Degrees of Separation Between Any Two Data Sets and How to Lie with p-values. Many applications (including the data sets themselves) can be found in my books Applied Stochastic Processes and New Foundations of Statistical Science. For instance, these data sets can be used to benchmark statistical hypothesis tests (with the null hypothesis known in advance to be true or false) and to assess the power of such tests or confidence intervals. In other cases, they are used to simulate clusters and test cluster detection / pattern detection algorithms, see here. I also used such data sets to discover two new deep conjectures in number theory (see here), to design new Fintech models such as bounded Brownian motions, and to find new families of statistical distributions (see here).
Goldbach's comet 
In this article, I focus on peculiar random data sets to prove -- heuristically -- two of the most famous math conjectures in number theory, related to prime numbers: the Twin Prime conjecture, and the Goldbach conjecture. The methodology is at the intersection of probability theory, experimental math, and probabilistic number theory. It involves working with infinite data sets, dwarfing any data set found in any business context.
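For readers who want to reproduce the data behind a Goldbach's comet plot, here is a minimal Python sketch. It counts, for each even number n, the ways to write n as a sum of two primes p + q with p ≤ q. The function names `primes_up_to` and `goldbach_counts` are illustrative choices, not taken from the article, and this is plain enumeration rather than the probabilistic methodology the article describes.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: returns a bytearray where sieve[k] == 1 iff k is prime."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Cross out every multiple of i starting at i*i
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return sieve


def goldbach_counts(limit):
    """For each even n in [4, limit], count unordered pairs of primes (p, q) with p + q = n."""
    is_prime = primes_up_to(limit)
    counts = {}
    for n in range(4, limit + 1, 2):
        counts[n] = sum(
            1 for p in range(2, n // 2 + 1) if is_prime[p] and is_prime[n - p]
        )
    return counts


counts = goldbach_counts(100)
print(counts[10], counts[100])  # 10 = 3+7 = 5+5; 100 has six such decompositions
```

Plotting the values of `counts` against n gives the comet shape. The Goldbach conjecture is the statement that every count is at least 1.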
Read full article here.

Monday, September 9, 2019

Six Degrees of Separation Between Any Two Data Sets

This is an interesting data science conjecture, inspired by the well-known six degrees of separation problem, which states that there is a link involving no more than 6 connections between any two people on Earth, for instance between you and anyone living in North Korea.
Here the link is between any two univariate data sets of the same size, say Data A and Data B. The claim is that there is a chain involving no more than 6 intermediary data sets, each highly correlated to the previous one (with a correlation above 0.8), between Data A and Data B. The concept is illustrated in the example below, where only 4 intermediary data sets (labeled Degree 1, Degree 2, Degree 3, and Degree 4) are actually needed. 
Correlation table for the 6 data sets
To view the (random) data sets, understand how the chain of intermediary data sets was built, and access the spreadsheets to reproduce the results or test on different data, follow this link. It makes for an interesting theoretical data science research project, for people with too much free time on their hands.
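One simple way to build such a chain is sketched below in Python. This is an assumption for illustration, not necessarily the construction used in the spreadsheets. After standardizing Data A and Data B, each intermediary data set is the mix cos(t)·A + sin(t)·B, with t moving in equal steps from 0 to π/2. When A and B are roughly uncorrelated, successive sets have correlation close to cos(π/14) ≈ 0.97. The names `standardize`, `build_chain` and `successive_correlations` are hypothetical.

```python
import numpy as np


def standardize(x):
    """Center to mean 0 and scale to standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def build_chain(a, b, n_intermediate=6):
    """Return [A, D1, ..., Dk, B] where each D is a rotation-style mix of A and B."""
    a, b = standardize(a), standardize(b)
    angles = np.linspace(0.0, np.pi / 2, n_intermediate + 2)
    return [np.cos(t) * a + np.sin(t) * b for t in angles]


def successive_correlations(chain):
    """Pearson correlation between each data set and the next one in the chain."""
    return [float(np.corrcoef(u, v)[0, 1]) for u, v in zip(chain, chain[1:])]


rng = np.random.default_rng(0)
data_a = rng.normal(size=1000)       # synthetic Data A
data_b = rng.exponential(size=1000)  # synthetic Data B, generated independently of A
chain = build_chain(data_a, data_b)
corrs = successive_correlations(chain)
print(len(chain) - 2, "intermediary data sets, minimum successive correlation:", round(min(corrs), 3))
```

This mixing scheme degrades when A and B are strongly negatively correlated, because the intermediate mixes then have very little variance. That case needs a different construction.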

Sunday, September 8, 2019

Two New Deep Conjectures in Probabilistic Number Theory

The material discussed here is also of interest to machine learning, AI, big data, and data science practitioners, as much of the work is based on heavy data processing, algorithms, efficient coding, testing, and experimentation. Also, it's not just two new conjectures, but paths and suggestions to solve these problems. The last section contains a few new, original exercises, some with solutions, and may be useful to students, researchers, and instructors offering math and statistics classes at the college level: they range from easy to very difficult. Some great probability theorems are also discussed, in layman's terms: see section 1.2. 
The two deep conjectures highlighted in this article (conjectures B and C) are related to the digit distribution of well-known math constants such as Pi or log 2, with an emphasis on the binary digits of SQRT(2). This is an old problem, one of the most famous in mathematics, and still unsolved today.
Content of this article
A Strange Recursive Formula
  • Conjecture A
  • A deeper result
  • Conjecture B
  • Connection to the Berry-Esseen theorem
  • Potential path to solving this problem
Potential Solution Based on Special Rational Number Sequences
  • Interesting statistical result
  • Conjecture C
  • Another curious statistical result
Exercises
Read the full article here

Friday, August 30, 2019

A Strange Family of Statistical Distributions

I introduce here a family of very peculiar statistical distributions governed by two parameters: p, a real number in [0, 1], and b, an integer > 1. 
Potential applications are found in cryptography, Fintech (stock market modeling), Bitcoin, number theory, random number generation, benchmarking statistical tests (see here) and even gaming (see here). However, the most interesting application is probably to gain insights into what non-normal numbers look like, especially their chaotic nature. It is a fundamental tool to help solve one of the most intriguing mathematical conjectures of all time (still unsolved): are the digits of standard constants such as Pi or SQRT(2) uniformly distributed or not? For instance, when b = 2, any departure from p = 0.5 (a normal seed) results in a strong discontinuity for f(x) at x = 0.5. If you look at the above chart, f(0) = f(1/2) = f(1) regardless of p, but discontinuities are masking this fact.
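To get a feel for the role of p when b = 2, the Python sketch below simulates numbers in [0, 1] whose binary digits are independent and equal to 1 with probability p. This digit model is an assumption made for illustration. It is not the definition of f(x) used in the article, and the function name `random_digit_numbers` is hypothetical. Under this model the numbers have mean p and variance p(1 - p)/3, and they are uniformly distributed only when p = 0.5.

```python
import numpy as np


def random_digit_numbers(p, n_samples=100_000, n_digits=40, seed=0):
    """Simulate x = sum_k d_k / 2^k with i.i.d. binary digits d_k, P(d_k = 1) = p."""
    rng = np.random.default_rng(seed)
    digits = (rng.random((n_samples, n_digits)) < p).astype(float)
    weights = 0.5 ** np.arange(1, n_digits + 1)  # 1/2, 1/4, 1/8, ...
    return digits @ weights


for p in (0.5, 0.3):
    x = random_digit_numbers(p)
    print(
        f"p={p}: mean={x.mean():.4f} (theory {p:.4f}), "
        f"var={x.var():.4f} (theory {p * (1 - p) / 3:.4f}), "
        f"share below 0.5 = {(x < 0.5).mean():.4f} (theory {1 - p:.4f})"
    )
```

With p = 0.3, about 70% of the simulated numbers fall below 0.5, since that happens exactly when the first digit is 0. The same imbalance repeats at every scale, one way to see how a biased digit proportion produces an irregular distribution.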
