Editor’s Note: This is the third of four blog posts detailing our Google Summer of Code 2013 students’ work, edited by John Woods.
Gem Maintainer’s Note: These gems have changed recently. Edits reflect the changes.
Statsample is a Ruby gem for basic and advanced statistical analysis. It attempts to support JRuby and MRI/YARV equally, and also provides pure Ruby implementations of many functions.
Its API is rich, but its support for time series and generalized linear models (GLM) was rather basic.
So, for Google Summer of Code 2013, working with the SciRuby Project, I released two extensions, one for time series and one for GLM.
These gems aim to take Statsample further by adding a range of features and estimation techniques for continuous data.
Statsample TimeSeries is equipped with a variety of operations. A few of those functionalities are:
- Autocorrelation of series: for finding repeating patterns (like a periodic signal) in noisy data, or for identifying persistence (if it rained today, will it rain tomorrow?).
- Autoregressive and moving average models: autoregressive models (AR and ARMA) are useful for describing random processes, such as those found in nature and economics, that are believed to be predictable from past behavior (e.g., El Niño, the stock market).
- Partial autocorrelation with Yule–Walker, a method for calculating the coefficients of autoregressive models.
- Levinson–Durbin estimation: for solving linear equations involving a Toeplitz matrix, such as those that arise in signal processing or with cyclic signals.
- Kalman filtering (or linear quadratic estimation): often used for determining the position and motion of a moving object based on sensor information (e.g., for drawing a vehicle’s position on a map using GPS data, or for aircraft or spacecraft navigation based on sensor inputs).
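To make a couple of these ideas concrete, here is a minimal pure Ruby sketch of sample autocorrelation and of the Levinson–Durbin recursion, which solves the Yule–Walker equations for autoregressive coefficients. The method names `autocorrelation` and `levinson_durbin` are made up for this illustration; they are not the statsample-timeseries API, and the gem's own implementation may differ.

```ruby
# Illustrative helpers only; these names are hypothetical, not the gem's API.

# Sample autocorrelation for lags 0..max_lag (biased estimator: divides by n).
def autocorrelation(series, max_lag)
  n = series.size
  mean = series.sum.to_f / n
  deviations = series.map { |x| x - mean }
  variance = deviations.sum { |d| d * d } / n
  (0..max_lag).map do |lag|
    covariance = (0...(n - lag)).sum { |t| deviations[t] * deviations[t + lag] } / n
    covariance / variance
  end
end

# Levinson-Durbin recursion: solves the Yule-Walker (Toeplitz) system for an
# AR(order) model, given autocorrelations r[0..order].
# Returns [ar_coefficients, prediction_error, partial_autocorrelations].
def levinson_durbin(r, order)
  phi = []
  error = r[0].to_f
  pacf = []
  (1..order).each do |k|
    acc = r[k].to_f
    (1...k).each { |j| acc -= phi[j - 1] * r[k - j] }
    kappa = acc / error # reflection coefficient = partial autocorrelation at lag k
    updated = phi.each_index.map { |j| phi[j] - kappa * phi[k - 2 - j] }
    phi = updated << kappa
    error *= (1.0 - kappa * kappa)
    pacf << kappa
  end
  [phi, error, pacf]
end

# Usage: estimate AR(2) coefficients from a short, made-up series.
data = [2.0, 3.1, 2.7, 3.8, 3.3, 4.1, 3.9, 4.6, 4.2, 5.0]
acf = autocorrelation(data, 2)
coefficients, _error, partials = levinson_durbin(acf, 2)
puts "ACF: #{acf.inspect}"
puts "AR coefficients: #{coefficients.inspect}"
puts "PACF: #{partials.inspect}"
```

The reflection coefficients produced at each step of the recursion are the partial autocorrelations, which is why one pass yields both the AR coefficients and the PACF.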
To get your hands dirty,
- Install Statsample with
gem install statsample.
- Next, install the TimeSeries extension with
gem install statsample-timeseries.
EDIT: Statsample-timeseries now uses daru for data storage and cleaning. Thus all ephemeral time series statistics functions (moving average, acf, etc.) have been moved to Daru::Vector, which can be indexed on a DateTimeIndex, which lets you access data indexed by a time stamp. See the daru README for examples.
Statsample::TimeSeries::Series has been deprecated in favour of
Statsample GLM includes many helpful techniques for regression analysis. Some of them are:
- Poisson Regression: used to model contingency tables and counts
- Logistic Regression
- Exponential Regression: one case of nonlinear regression (examples might include the temperature of a cup of coffee left in a cold room, or the decay of an orbit)
- Iteratively Reweighted Least Squares: used to mitigate the effects of outliers
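As a rough illustration of how two of these fit together, the sketch below fits a logistic regression by iteratively reweighted least squares, using only Ruby's standard `matrix` library. The method name `logistic_irls` and its parameters are assumptions made for this example; this is not the Statsample GLM API, and it skips the safeguards (singular matrices, perfectly separable data) that a real implementation needs.

```ruby
require 'matrix'

# Hypothetical helper, not part of Statsample GLM.
# rows:     array of feature arrays (an intercept column is added here)
# outcomes: array of 0/1 responses
# Returns the coefficients as an array: [intercept, slope_1, slope_2, ...].
def logistic_irls(rows, outcomes, iterations: 25, tolerance: 1e-8)
  x = Matrix[*rows.map { |row| [1.0] + row.map(&:to_f) }]
  y = Vector[*outcomes.map(&:to_f)]
  beta = Vector.elements(Array.new(x.column_count, 0.0))

  iterations.times do
    # Current fitted probabilities under the logistic link.
    mu = (x * beta).to_a.map { |eta| 1.0 / (1.0 + Math.exp(-eta)) }
    # Weights are the binomial variances mu * (1 - mu).
    weights = Matrix.diagonal(*mu.map { |m| m * (1.0 - m) })
    gradient = x.transpose * (y - Vector[*mu])
    hessian = x.transpose * weights * x
    # Solve the weighted normal equations via LU instead of an explicit inverse.
    step = hessian.lup.solve(gradient)
    beta += step
    break if step.norm < tolerance
  end

  beta.to_a
end

# Usage with made-up data: one binary predictor and a 0/1 outcome.
rows     = [[0], [0], [0], [0], [1], [1], [1], [1]]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
puts logistic_irls(rows, outcomes).inspect
```

Each pass re-solves a weighted least squares problem with weights taken from the current fit, which is where the method gets its name.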
The top level module for regression techniques is
Using it is as simple as ever:
- First, install Statsample with
gem install statsample.
- Then, install GLM by
Let’s get started:
We have some more plans for the GLM module. First on the list is to make the algorithms work with singular value decomposition, because inverting matrices by hand becomes unwieldy for the larger values that arise in a Poisson regression.
I have blogged about most of this functionality; additional information is available on my blog.
For more up-to-date use cases, refer to the notebooks in the respective project READMEs.
I had an amazing summer!
Stay tuned, and enjoy.