Friday, March 16, 2018

A Simple Introduction to Complex Stochastic Processes - Part 2

In my first article on this topic (see here) I introduced some of the complex stochastic processes used by Wall Street data scientists, using a simple approach that can be understood by people with no statistics background other than a first course such as stats 101. I defined and illustrated the continuous Brownian motion (the mother of all these stochastic processes) using approximations by discrete random walks, simply re-scaling the X-axis and the Y-axis appropriately, and making time increments (the X-axis) smaller and smaller, so that the limiting process is a time-continuous one. This was done without using any complicated mathematics such as measure theory or filtrations.
Here I am going one step further, introducing the integral and derivative of such processes, using rudimentary mathematics. All the articles that I've found on this subject are full of complicated equations and formulas. That is not the case here. Not only do I explain this material in simple English, but I also provide pictures to show what an integrated Brownian motion looks like (I could not find such illustrations in the literature) and how to compute its variance, and I focus on applications, especially to number theory, Fintech and cryptography problems. Along the way, I discuss moving averages in a theoretical but basic framework (again with pictures), including what the optimal window should be for these (time-continuous or discrete) time series.
You can read the full article here.

Monday, February 5, 2018

Are the Digits of Pi Truly Random? - Must Read for Math and Data Geeks

This article covers far more than the title suggests. It is written in simple English and accessible to quantitative professionals from a variety of backgrounds. Deep mathematical and data science research (including a result about the randomness of Pi, which is just a particular case) is presented here, without using arcane terminology or complicated equations.
The topic discussed here, under a unified framework, is at the intersection of mathematics, probability theory, chaotic systems, stochastic processes, data and computer science. Many exotic objects are investigated, such as an unusual version of the logistic map, nested square roots, and representation of a number in a fractional or irrational base system. 
The article is also useful to anyone interested in learning these topics, whether they have any interest in the randomness of Pi or not, because of the numerous potential applications. I hope the style is refreshing, and I believe that you will find plenty of material rarely if ever discussed in textbooks or in the classroom. The requirements to understand this material are minimal, as I went to great lengths (over a period of years) to make it accessible to a large audience.
The randomness of the digits of Pi is one of the most fascinating unsolved mathematical problems of all time, having been investigated by many millions of people over several hundred years. The scope of this article encompasses this particular problem as part of a far more general framework. More questions are asked than answered, making this document a stepping stone for future research.
This article is structured as follows:
1. General Framework
  • Questions, Properties and Notations about Chaotic Sequences Investigated Here
  • Potential Applications, Including Random Number Generation
2. Examples of Chaotic Sequences Representing Numbers
  • Data Science Step
  • Mathematical Step
  • Numbers in Base 2, 10, 3/2 or Pi 
  • Nested Square Roots
  • Logistic Map
3. About the Randomness of the Digits of Pi
  • The Digits of Pi are Random in the Logistic Map System
  • Paths to Proving Randomness in the Decimal System
  • Connection with Brownian Motions
4. Curious Facts
  • Randomness and The Bad Seeds Paradox
  • Application to Cryptography, Financial Markets, and HPC
  • Exercises
  • Digits of Pi in Base Pi

Wednesday, January 31, 2018

Four Interesting Math Problems

The level in this article is for college students familiar with calculus. This material will also be of interest to college professors looking for new material to teach, or for original exam questions, as well as to business data scientists with some spare time who are interested in refreshing their math skills. The problems cover real analysis, mathematical algorithms and numerical precision, correct visualizations, as well as geometry. The third problem is the most interesting one in my opinion, and could become a subject of active mathematical research, with one new great, unsolved conjecture of a probabilistic nature being proposed. The last problem has many applications in engineering science.
This article is structured as follows:
1. The Simplest Function Defined by an Infinite Product
  • Exercise
2. Surprising Series for Powers of Number 2
3. From Continuous Fractions to Nested Square Roots and More
  • Algorithm to compute the coefficients
  • Problems
  • Example: Nested Square Root for the Number Pi
  • Conjecture
4. Geometry: Shape Rearrangements and Coverage Problems

Thursday, January 11, 2018

Beautiful Number Theory Problem and Sandbox for Data Scientists

The Waring conjecture - actually a problem associated with a number of conjectures, many now being solved - is one of the most fascinating mathematical problems. This article covers new aspects of this problem, with a generalization and new conjectures, some with a tentative solution, and a new framework to tackle the problem. Yet it is written in simple English and accessible to the layman.
I also review a number of famous related mathematical conjectures, including one with a $1 million award still waiting for a solution, as well as Goldbach's conjecture, still unproved as of today. Many curious properties of the Floor function are also listed, and the emphasis is on machine learning and efficient computer-intensive algorithms to try to find surprising results, which then need to be formally proved or disproved.
Content of this article:
1. General Framework
  • Spectacular Result
  • New Generalization of Goldbach's Conjecture
  • New Generalization of Fermat's Conjecture
2. Generalized Waring Problem
  • Definitions
  • Main Results
  • Open Problems
  • Fun Facts (Actually, Conjectures!)
3. Algorithms and Source Code
  • Case n = 2: Sums of Two Terms
  • Case n = 4: Sums of Four Terms
4. Related Conjectures and Solved Problems
  • The One Million Dollar Conjecture

Wednesday, December 27, 2017

A Simple Introduction to Complex Stochastic Processes

Stochastic processes have many applications, including in finance and physics. They are interesting models for representing many phenomena. Unfortunately, the theory behind them is very difficult, making them accessible only to a few 'elite' data scientists, and not popular in business contexts.
One of the simplest examples is a random walk, which is indeed easy to understand with no mathematical background. However, time-continuous stochastic processes are always defined and studied using advanced and abstract mathematical tools such as measure theory, martingales, and filtrations. If you wanted to learn about this topic and get a deep understanding of how these processes work, but were deterred after reading the first few pages of any textbook on the subject due to jargon and arcane theories, here is your chance to really understand how they work.
Rather than making it a topic of interest to post-graduate scientists only, here I make it accessible to everyone, barely using any maths in my explanations besides the central limit theorem. In short, if you are a biologist, a journalist, a business executive, a student or an economist with no statistical knowledge beyond Stats 101, you will be able to get a deep understanding of the mechanics of complex stochastic processes, after reading this article. The focus is on using applied concepts that everyone is familiar with, rather than mathematical abstraction. 
My general philosophy is that powerful statistical modeling and machine learning can be done with simple techniques, understood by the layman, as illustrated in my article on machine learning without mathematics or advanced machine learning with basic Excel.
1. Construction of Time-Continuous Stochastic Processes: Brownian Motion 
Probably the most basic stochastic process is a random walk where the time is discrete. The process is defined by X(t+1) equal to X(t) + 1 with probability 0.5, and to X(t) - 1 with probability 0.5. It constitutes an infinite sequence of auto-correlated random variables indexed by time. For instance, it can represent the daily logarithm of stock prices, varying under market-neutral conditions. If we start at t = 0 with X(0) = 0, and if we define U(t) as a random variable taking the value +1 with probability 0.5, and -1 with probability 0.5, then X(n) = U(1) + ... + U(n).  Here we assume that the variables U(t) are independent and with the same distribution. Note that X(n) is a random variable taking integer values between -n and +n.
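The discrete random walk defined above can be sketched in a few lines of Python. This is only an illustration: the function name `random_walk` and the seed value are my own choices, not taken from the original article.

```python
import random

def random_walk(n, rng):
    """Return [X(0), X(1), ..., X(n)] for the +/-1 random walk with X(0) = 0."""
    path = [0]
    for _ in range(n):
        step = rng.choice((-1, 1))  # U(t): +1 or -1, each with probability 0.5
        path.append(path[-1] + step)
    return path

rng = random.Random(42)  # arbitrary fixed seed, so that the run is reproducible
path = random_walk(1000, rng)
print(path[:10], "... X(1000) =", path[-1])
```

Each call produces one possible trajectory. Since X(n) is a sum of n terms equal to +1 or -1, it always lies between -n and +n and has the same parity as n.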
Five simulations of a Brownian motion (x-axis is the time t, y-axis is Z(t))
What happens if we change the time scale (x-axis) from daily to hourly, or to every millisecond? We then also need to re-scale the values (y-axis) appropriately, otherwise the process exhibits massive oscillations (from -n to +n) in very short time periods. At the limit, if we consider infinitesimal time increments, the process becomes a continuous one. Much of the complex mathematics needed to define these continuous processes does no more than find the correct re-scaling of the y-axis, to make the limiting process meaningful.
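Here is a minimal sketch of the re-scaling idea, under an assumption that is not spelled out in this excerpt: with n time increments on [0, 1], each step of the walk is multiplied by 1/sqrt(n), the scaling suggested by the central limit theorem. The function names and parameter values are hypothetical and chosen for illustration only.

```python
import math
import random

def rescaled_walk(n, rng):
    """Approximate Z(t) on [0, 1] with n steps: 1/n in time, 1/sqrt(n) in value."""
    scale = 1.0 / math.sqrt(n)  # assumed re-scaling of the y-axis
    z = [0.0]
    for _ in range(n):
        z.append(z[-1] + scale * rng.choice((-1, 1)))
    return z  # z[k] approximates Z(k / n)

def variance_at_time_one(n, n_paths, rng):
    """Empirical variance of Z(1) over n_paths independent simulated paths."""
    finals = [rescaled_walk(n, rng)[-1] for _ in range(n_paths)]
    mean = sum(finals) / n_paths
    return sum((v - mean) ** 2 for v in finals) / n_paths

rng = random.Random(2018)  # arbitrary fixed seed
for n in (10, 100, 1000):
    print(n, round(variance_at_time_one(n, 1000, rng), 3))
```

With this scaling, the empirical variance of Z(1) should stay close to 1 however fine the time grid is. Without it, the variance of X(n) grows like n, which is the "massive oscillations" problem mentioned above.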

Sunday, November 19, 2017

High Precision Computing: Benchmark, Examples, and Tutorial

In some applications, using the standard precision in your programming language of choice may not be enough, and can lead to disastrous errors. In some cases, you work with a library that is supposed to provide very high precision, when in fact the library in question does not work as advertised. In some cases, lack of precision results in obvious problems that are easy to spot, and in others, everything seems to be working fine and you are not aware that your simulations are completely wrong after as few as 30 iterations. We explore this case in this article, using a simple example that can be used to test the precision of your tool and of your results.
Such problems arise frequently with algorithms that do not converge to a fixed solution, but instead generate numbers that oscillate continuously in some interval, converging in distribution rather than in value, unlike traditional algorithms that aim to optimize some function. Examples abound in chaos theory, and the simplest case is the recursion X(k + 1) = 4 X(k) (1 - X(k)), starting with a seed s = X(0) in [0, 1]. We will use this example - known as the logistic map - to benchmark various computing systems.
Read full article for explanations about this picture
Examples of algorithms that can be severely impacted by aggregated loss of precision, besides ill-conditioned problems, include:
  • Markov Chain Monte Carlo (MCMC) simulations, a modern statistical method of estimation for complex problems and nested or hierarchical models, including Bayesian networks. 
  • Reflective stochastic processes, see here. This includes some types of Brownian or Wiener processes.
  • Chaotic processes, see here (especially section 2). These include fractals.
  • Continuous random number generators, see here.
The conclusions based on the faulty sequences generated are not necessarily invalid, as long as the focus is on the distribution being studied, rather than on the exact values from specific sequences.
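As a rough illustration of the benchmark idea, the sketch below iterates the logistic map X(k + 1) = 4 X(k) (1 - X(k)) twice: once in standard double precision, and once with Python's `decimal` module at 200 significant digits, used here as a reference. The seed 0.3, the number of iterations and the 200-digit setting are arbitrary choices of mine, not values from the original article.

```python
from decimal import Decimal, localcontext

def logistic_float(seed, n):
    """Iterate the logistic map in standard double precision."""
    x = float(seed)
    out = [x]
    for _ in range(n):
        x = 4.0 * x * (1.0 - x)
        out.append(x)
    return out

def logistic_decimal(seed, n, digits=200):
    """Iterate the logistic map with `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        x = Decimal(seed)
        out = [x]
        for _ in range(n):
            x = 4 * x * (1 - x)
            out.append(x)
    return out

seed = "0.3"  # passed as a string so that Decimal starts from exactly 0.3
n = 80
float_path = logistic_float(seed, n)
ref_path = logistic_decimal(seed, n)
for k in range(0, n + 1, 10):
    err = abs(Decimal(float_path[k]) - ref_path[k])
    print(k, f"{float_path[k]:.6f}", f"{float(ref_path[k]):.6f}", f"{err:.2e}")
```

The error column starts near machine precision and grows on average by a factor of about two per iteration, until the two sequences have nothing in common. The iteration at which this happens depends on the precision of the tool being tested, so a lower-precision system fails earlier.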

Tuesday, November 7, 2017

Fascinating Chaotic Sequences with Cool Applications

Here we describe well-known chaotic sequences, including new generalizations, with applications to random number generation, highly non-linear auto-regressive models for time series, simulation, random permutations, and the use of big numbers (libraries available in programming languages to work with numbers with hundreds of decimals). Standard computer precision almost always produces completely erroneous results after a few iterations, a fact rarely if ever mentioned in the scientific literature, but illustrated here, together with a solution. It is possible that all scientists who have published on chaotic processes used faulty numbers because of this issue.
This article is accessible to non-experts, even though we solve a special stochastic equation for the first time, providing an unexpected exact solution, for a new chaotic process that generalizes the logistic map. We also describe a general framework for continuous random number generators, and investigate the interesting auto-correlation structure associated with some of these sequences. References are provided, as well as fast source code to process big numbers accurately, and even an elegant mathematical proof in the last section.
This article is also a useful read for participants in our upcoming competition (to be announced soon) as it addresses a similar stochastic integral equation problem, also with exact solution, in the related context of self-correcting random walks - another kind of memory-less process. 
The approach used here starts with traditional data science and simulations for exploratory analysis, with empirical results confirmed later by mathematical arguments in the last section. 
