SciRuby

Tools for Scientific Computing in Ruby

Updates: Minimization and Integration

Minimization

The Minimization gem now supports the following unidimensional minimization methods, each with a pure Ruby implementation:

  1. Newton–Raphson
  2. Golden Section
  3. Brent
  4. Quad Golden

Of these, Golden Section, Brent, and Quad Golden are also available via Minimization’s GSL interface (and are thus faster). The code is organized so that the faster C code (i.e., GSL) is executed when GSL is available, and the Ruby implementation is used otherwise. I still have to beautify the code and add documentation.
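To give a feel for what one of these methods does, here is a minimal pure Ruby sketch of a golden-section search. It is an illustration only: the method name golden_section_min and its parameters are hypothetical and are not the Minimization gem's API, and the function is assumed to have a single minimum on the given interval.

```ruby
# Golden-section search for the minimum of a unimodal function on [lower, upper].
# Illustrative sketch; it does not use the Minimization gem.
INV_PHI = (Math.sqrt(5) - 1) / 2 # 1 / golden ratio, about 0.618

def golden_section_min(lower, upper, tolerance: 1e-6, max_iterations: 200)
  a = lower.to_f
  b = upper.to_f
  c = b - INV_PHI * (b - a)
  d = a + INV_PHI * (b - a)
  fc = yield(c)
  fd = yield(d)
  max_iterations.times do
    break if (b - a).abs < tolerance
    if fc < fd
      # The minimum lies in [a, d]; the old c becomes the new d.
      b, d, fd = d, c, fc
      c = b - INV_PHI * (b - a)
      fc = yield(c)
    else
      # The minimum lies in [c, b]; the old d becomes the new c.
      a, c, fc = c, d, fd
      d = a + INV_PHI * (b - a)
      fd = yield(d)
    end
  end
  (a + b) / 2
end

# Example: the parabola (x - 2)^2 + 1 has its minimum at x = 2.
x_min = golden_section_min(-3, 5) { |x| (x - 2)**2 + 1 }
puts x_min
```

Each iteration reuses one of the two interior function values, so only one new evaluation is needed per step.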

Integration

The Integration gem has been transitioned from Hoe to Bundler. For Gauss–Kronrod quadrature, I have hard-coded the values of the nodes and weights (for 15, 21, 31, 41, and 61 points), as was already done for Gauss quadrature.

Additionally, I added basic methods like Simpson’s Three-Eighths Method, Milne’s Method, Boole’s Quadrature and Open Trapezoid.
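As an illustration of the kind of basic method mentioned above, the following is a small sketch of the composite Simpson's three-eighths rule in plain Ruby. The method name simpson_three_eighths and the panels keyword are hypothetical and are not taken from the Integration gem.

```ruby
# Composite Simpson's 3/8 rule: integrate the block's function over [lower, upper].
# Illustrative sketch; it does not use the Integration gem.
def simpson_three_eighths(lower, upper, panels: 30)
  unless panels.positive? && (panels % 3).zero?
    raise ArgumentError, 'panels must be a positive multiple of 3'
  end

  h = (upper - lower).to_f / panels
  sum = yield(lower) + yield(upper)
  (1...panels).each do |i|
    # Interior points shared by two cubic segments get weight 2, all others weight 3.
    weight = (i % 3).zero? ? 2 : 3
    sum += weight * yield(lower + i * h)
  end
  3 * h / 8 * sum
end

# Example: the integral of sin(x) from 0 to pi is 2.
area = simpson_three_eighths(0, Math::PI) { |x| Math.sin(x) }
puts area
```

The rule fits a cubic through every four consecutive points, so it is exact for polynomials up to degree three.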

This week, I will be reviewing a pull request which aims to change the structure of the whole Integration gem.

After that I plan to implement more adaptive methods and incorporate the non-adaptive methods under a single Newton–Cotes function.

Lastly, I am brainstorming designs for symbolic integration using JScience and JRuby.

Progress on Minimization Methods

Current Progress on Minimization Gem

In the first half of the summer, I plan to introduce some new numerical minimization methods to SciRuby’s Minimization gem. As per my proposal, I began by implementing Powell’s multidimensional minimization method. Powell’s method converges better than the Nelder–Mead algorithm in most cases and, like Nelder–Mead, is a multidimensional minimization method which doesn’t use any derivative of the function.

I started by studying the Powell optimizers in SciPy and the Apache Commons Mathematics Library, and decided to base my implementation on the latter. Powell’s method requires a line minimum search algorithm, for which I used the Brent minimizer (already available in SciRuby).
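The role of the line minimizer can be sketched as follows. This is a simplified version of the basic Powell procedure, not the Apache Commons algorithm and not the gem's code: a plain golden-section search stands in for Brent's method, the search span along each direction is a fixed assumption, and all method names are hypothetical.

```ruby
# Golden-section line search on [lower, upper]; a simple stand-in for Brent's method.
def golden_line_search(g, lower, upper, tolerance = 1e-7)
  ratio = (Math.sqrt(5) - 1) / 2
  a = lower.to_f
  b = upper.to_f
  while (b - a) > tolerance
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    if g.call(c) < g.call(d)
      b = d
    else
      a = c
    end
  end
  (a + b) / 2
end

# Minimize f along the line point + t * direction and return the best point found.
# The span of 10 is an arbitrary assumption about how far to look.
def minimize_along(f, point, direction, span = 10.0)
  along = ->(t) { f.call(point.zip(direction).map { |p, d| p + t * d }) }
  t_best = golden_line_search(along, -span, span)
  point.zip(direction).map { |p, d| p + t_best * d }
end

# Basic Powell iteration: search along each direction in turn, then replace the
# oldest direction with the (normalized) overall displacement.
def powell_minimize(f, start, tolerance: 1e-6, max_iterations: 50)
  point = start.map(&:to_f)
  n = point.size
  directions = Array.new(n) { |i| Array.new(n) { |j| i == j ? 1.0 : 0.0 } }
  max_iterations.times do
    previous = point
    directions.each { |dir| point = minimize_along(f, point, dir) }
    displacement = point.zip(previous).map { |p, q| p - q }
    length = Math.sqrt(displacement.sum { |d| d * d })
    break if length < tolerance
    unit = displacement.map { |d| d / length }
    directions.shift
    directions.push(unit)
    point = minimize_along(f, point, unit)
  end
  point
end

f = ->(v) { (v[0] - 1)**2 + 10 * (v[1] + 2)**2 + v[0] * v[1] }
puts powell_minimize(f, [0.0, 0.0]).inspect
```

Note that no derivative of f appears anywhere; the method only ever evaluates the function along lines.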

Having finished with Powell’s method, I am now working on the Fletcher–Reeves minimization method, a gradient method which uses the first derivative of the function being minimized.
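For readers unfamiliar with Fletcher–Reeves, here is a minimal sketch restricted to a quadratic function f(x) = 0.5 x'Ax - b'x, where the line minimum along each direction can be computed exactly. It is an assumption-laden illustration rather than the gem's implementation; a general version would use a numerical line search instead, and all names are hypothetical.

```ruby
# Fletcher-Reeves conjugate gradient on f(x) = 0.5 * x'Ax - b'x,
# with A symmetric positive definite. Illustrative sketch only.
def dot(u, v)
  u.zip(v).sum { |ui, vi| ui * vi }
end

def mat_vec(matrix, v)
  matrix.map { |row| dot(row, v) }
end

def fletcher_reeves_quadratic(a_matrix, b, start, tolerance: 1e-10, max_iterations: 100)
  x = start.map(&:to_f)
  gradient = mat_vec(a_matrix, x).zip(b).map { |ax, bi| ax - bi } # grad f = Ax - b
  direction = gradient.map { |g| -g }
  max_iterations.times do
    g_norm_sq = dot(gradient, gradient)
    break if Math.sqrt(g_norm_sq) < tolerance
    a_dir = mat_vec(a_matrix, direction)
    # Exact line minimum along the direction (valid for a quadratic only).
    step = -dot(gradient, direction) / dot(direction, a_dir)
    x = x.zip(direction).map { |xi, di| xi + step * di }
    gradient = gradient.zip(a_dir).map { |g, ad| g + step * ad }
    # Fletcher-Reeves coefficient: ratio of new to old squared gradient norms.
    beta = dot(gradient, gradient) / g_norm_sq
    direction = gradient.zip(direction).map { |g, d| -g + beta * d }
  end
  x
end

a_matrix = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
puts fletcher_reeves_quadratic(a_matrix, b, [2.0, 1.0]).inspect
```

The minimum of this quadratic is the solution of Ax = b, which makes the result easy to check by hand.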

Introduction: Minimization and Integration (Lahiru)

Editor’s Note: We have two students working on numerical minimization and integration this summer, Rajat and Lahiru. Rajat’s introductory post appeared two weeks ago.

Introduction

I’m Lahiru Lasandun and I’m an undergraduate at the University of Moratuwa, Sri Lanka. I’ve been selected for Google Summer of Code 2014 for SciRuby’s Minimization and Integration projects.

I was working with SciRuby for about a month before GSOC started and ran some tests on how to enhance the performance of these numerical computations. My first idea was to use multi-threading. With the instructions and guidance of my mentors, I tested more approaches, such as Erlang multi-processing, the Akka package for multi-threading, and finally OpenCL. The final decision was to use OpenCL to speed up these mathematical computations with the support of multi-core CPUs and GPUs.

Minimization Gem

After GSOC started, I began working on SciRuby’s Minimization gem. I proposed multidimensional minimization methods for the Minimization gem, which already had plenty of unidimensional minimization methods. I chose two non-gradient and two gradient minimization methods as well as simulated annealing.

Integration Gem

For Integration, I proposed to replicate some unidimensional integration methods from the GNU Scientific Library, GSL. Additionally, I proposed to add OpenCL support to enhance performance of integration methods.

Current Progress

Currently, I am working on the Nelder–Mead multidimensional minimization method, which is a non-gradient method, and on the relevant test cases.

Introduction to the Minimization and Integration Project (Rajat)

Editor’s Note: We have two students working on numerical minimization and integration this summer, Rajat and Lahiru. Lahiru will be writing a separate post about his work.

Introduction to the Minimization and Integration Project

Hi. My name is Rajat Kapoor and I have been selected to work with SciRuby for Google Summer of Code 2014.

Minimization and Integration are two of the many available gems in the SciRuby suite. My project this year aims to improve these gems to replicate the functionality provided by GNU’s GSL. I will be trying to implement all the minimization and integration algorithms present in GSL in pure Ruby, with improvements as needed, so that these functions are easily accessible to all Ruby users, while users who already have GSL installed will have an advantage in terms of speedy computations.

What Minimization and Integration actually mean

Minimization refers to the process of finding the minimum of a mathematical function whose value might depend on multiple variables. Unidimensional minimization restricts these problems to functions of a single variable. Integration is the widely used concept from calculus, which basically boils down to summing the values of a function over small intervals, each multiplied by the interval width, as the width of the intervals becomes infinitesimally small. I can bet that you knowingly or unknowingly use both these things on a daily basis.
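The summation view of integration can be made concrete with a short Ruby sketch. The helper name riemann_sum is hypothetical, and the midpoint of each interval is used as the sample point purely as a convenient choice.

```ruby
# Approximate an integral by summing (function value * interval width)
# over many small intervals, sampling each interval at its midpoint.
def riemann_sum(lower, upper, intervals)
  width = (upper - lower).to_f / intervals
  (0...intervals).sum { |i| yield(lower + (i + 0.5) * width) * width }
end

# The integral of x^2 from 0 to 1 is exactly 1/3; the sum approaches it
# as the intervals get narrower.
[4, 40, 400].each do |n|
  approximation = riemann_sum(0, 1, n) { |x| x**2 }
  puts "#{n} intervals: #{approximation}"
end
```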

The plan

The project can be broken into two major parts: Minimization and Integration, as these are two separate gems.

The Minimization gem can be broken into two parts: unidimensional (or univariate) and multidimensional (multivariate). With respect to coding, these two can again be broken down into sub-parts: pure Ruby implementations and GSL support. The Integration gem will include the pure Ruby implementations as well as GSL support. Along with this, some support for symbolic integration will be added for JRuby users by way of the JScience library.

Progress

The pure Ruby implementations of the unidimensional minimization part are almost finished. I am also working on the corresponding GSL support. I plan to finish up any unidimensional minimization work by the end of this week and then start work on the multidimensional minimization methods.

Keep watching this blog for more updates regarding my project.

Introducing the FFTW SciRuby GSoC Project

My name is Magdalen Berns and I am a physics student with a technical background in live audio. I am particularly interested in using science and technology to improve access for all.

This summer, for this year’s Google Summer of Code (GSoC), I will be working on bringing the C and Fortran API of the aptly named external library “Fastest Fourier Transform in the West” version 3 (FFTW3) to Ruby.

The primary aim of the project is to give SciRuby the capability to handle signal analysis, processing and synthesis by performing discrete fast Fourier transform operations on NMatrix objects.

After some investigation during the preparation stages of GSoC, it was determined that implementing FFTW3 is more desirable than starting from scratch in pure Ruby because the FFTW3 API is already extensively used, developed, and optimised far beyond what would be achievable in just three months. So, putting FFTW3 in the driving seat allows the SciRuby project to take advantage of the good work of the FFTW3 developers by bringing it to Ruby.

Putting NMatrix to the test with FFTW3 should give users the opportunity to test drive NMatrix — and SciRuby’s NMatrix developers a chance to root out bugs.

Since a gem called ruby-fftw3 already existed to perform FFTW3 operations on NArray objects, I forked that repository as a starting point. Things are progressing on my GitHub fork right now.

My mentor for this project is Colin Fuller who is an exceptionally talented programmer — and he really knows his git too. He has been a great help as I adapt to the learning curve of working in C and Ruby (languages which I am less familiar with than say, Java or JavaScript).

As I work, I intend to share useful gems of information I gather. Those, in addition to my weekly project updates, will appear right here in this blog so others can hopefully benefit.

I have already posted a few useful bits and bobs on thismagpie.com which relate to my work so far. I hope to add those to the SciRuby blog, too, provided the readers are interested in that and time permits. Of course, readers here can feel free to have a browse of the keywords sciruby, ruby and git on there for the time being. I sometimes add posts, manuals and tutorials from external sites where I find useful ones on the web too, so watch out for these too.

Please, feel free to watch or follow along as the project comes together and those inclined are welcome to share constructive comments and advice or raise bugs on the fftw3 issue tracker. Input about my work is very welcome as the project progresses. This gem is being written for the community, after all!

You can find me on Twitter (Facebook) or GitHub under the username @thisMagpie.

Introducing the GSoC 2014 D3 Project

Hello. I am Naoki, one of four Google Summer of Code (GSoC) 2014 students in SciRuby. Let me introduce my project. The goal of the GSoC 2014 D3 Project is to create a new plotting library for SciRuby. D3.js is the most suitable JavaScript library to achieve this goal.

There are several non-Ruby plotting software libraries in the wild, like ggplot, matplotlib, and ggplot2. Actually, SciRuby already has its own plotting libraries named Plotrb and Rubyvis. The main feature of my project compared with those software packages is interactivity. Interactivity has various meanings here: interactivity when generating plots, interactivity when viewing them, and server–client interactivity. My project includes all of those.

My project can be divided into two components, one JavaScript and the other Ruby. JavaScript serves as a back-end, and Ruby as a front-end. I’m currently working on the former part. Have a look at a few examples I’ve assembled:

This project involves a number of challenges, but I believe it to be achievable during this Summer of Code. Thank you for reading!

Ruby Science Foundation Selected for GSoC 2014

We’re excited to announce that the Ruby Science Foundation has been selected as a mentoring organization for Google Summer of Code 2014!

Last year was our first year as a mentoring organization, and we had a great group of students working with us on machine learning, timeseries statistics, the semantic web, and scientific plotting.

This year we’ve got a super set of possible projects including more flexible matrix computations, automatic Ruby interface generation for scientific libraries, a dataframe library for structuring and manipulating datasets, interactive plotting, a scientific notebook, high-performance minimization and integration libraries, and a semantic web datastore backend for scientific computing.

If you’re interested in applying as a student, learning more, or even contributing independent of GSoC, head over to our GSoC 2014 ideas page to see what projects we think are great. Don’t hesitate to tell us if you’ve got an amazing idea for a different project, too! If you’re still left wondering where to start, check out the issue tracker for NMatrix, the matrix computation library used as the basis for a number of our projects, and our top priority at the moment.

Good luck to all the GSoC applicants out there, and happy coding!

Some Words From GSoC 2013 Alumni

In 2013, SciRuby was a mentoring organization for the Google Summer of Code. We asked our alumni:

1) How did you experience GSoC/SciRuby and what has it brought you?

2) What advice would you give new applicants?

Monica Dragan from Romania worked on gene validation, see also her blog. Actually, Monica was part of a different GSoC organisation, PhyloSoC, but also participated in our Ruby-centric meetings and code reviews. She shared her SciRuby GSoC experience:

Monica: During the GSoC period I developed a bioinformatics tool written in Ruby. First of all I learned a new programming language, as I had no experience with Ruby before. Through GSoC I had the opportunity to get in touch with the community and I met people passionate about their work, with whom I continued the collaboration afterwards. But what I really gained from this experience is that I increased my enthusiasm about bioinformatics and I confirmed to myself that this is the field I want to focus on in the next few years.

Alberto Arrigoni from Italy worked on data mining and machine learning algorithms for Ruby and shared his GSoC experience:

Alberto: As a PhD student in the field of bioinformatics, my GSOC experience was very exciting and useful at different levels. On a training level, I had the unique chance to learn more in depth some topics of machine learning I had wanted to explore in the past, but never had quite the opportunity or the resources. On a more technical level, I appreciated the support of the GSOC mentors and the SciRuby community, which counts numerous experts and a very active mailing list.

Ankur Goel from India worked on statsample-timeseries for Ruby. Ankur shared,

Ankur: It was the best learning experience. I learnt quite a lot of statistics while working on my TimeSeries extension; after GSoC, I picked up a Machine Learning course and was able to relate to it very easily after working on regression techniques in the GLM extension. I can’t thank my mentor enough for the opportunity provided and the trust placed in me. Learning to write quality code and getting reviews was the cherry on the cake!

Will Strinz from Madison, USA, worked on RDF generators for Ruby for the semantic web:

Will: GSoC 2013 was a new experience for me in terms of managing my own time, planning my own project, and keeping up consistent interaction with my mentors across time zones. Despite a decent amount of prior experience with Ruby, it was also a challenge and an opportunity for me to really understand the tools and practices I knew, and learn to use the ones I wasn’t familiar with. As for what it’s brought me, aside from a job I secured partly through skills and project portfolio I gained during GSoC, and the power of knowing how to do just about any programming task using Ruby, I learned a lot about how to manage a project and interact with people in the real world. Communicating properly and in a timely manner over email and other asynchronous services is absolutely critical to the work I do now, and a lot harder than people make it out to be. Staying in touch with my mentors and making sure we were all on the same page about my project was something I spent a lot of time on, and in doing so I gained a lot of comfort with the process. Additionally, GSoC was my first true experience designing a large piece of software, where I couldn’t just give up and trash it when the code started getting messy or confusing. It really forced me to adopt good practices around testing and organization, especially since I had better programmers than myself looking over my work. Software architecture is something you just don’t learn in college level CS courses, and by the time I’d graduated, I’d started hearing a lot of my CS professors saying this too. Some day in the future, maybe soon, there will be classes taught about just this subject, but for now there’s no better way to learn about it than by working on a real project, with some accountability and motivation to actually get it done.

Our alumni give new GSoC applicants the following advice:

Monica: GSoC is a great experience that you should try as a student! What is cool about GSoC is that you work on the project you are keen on and manage your time as you wish. Also, working remotely involves additional challenges. In the end you improve your experience and get to know a lot of new and great people.

Alberto: I think one of the best features offered by the GSOC is the possibility to collaborate with (and learn from) people who share the same scientific interests and have very different backgrounds and skills. Though this may be somewhat ‘expected’ for mentors, I was also lucky to find other GSOC students willing to bond and share experiences and opinions. My advice is to be cooperative and try to learn as much as possible from/with them!

Ankur: Work really hard. Do your homework before you ask questions or before quoting anything in your proposal. Writing a good proposal is necessary, and you must really be aware of what you are writing; good research is necessary. SciRuby community members are readily available to help you on the mailing list and in the #sciruby channel. A thorough discussion with the mentor will help you out.

Will: To new applicants this year I’d stress one thing above all else: get in touch with people on the sciruby mailing list. Introduce yourself as soon as possible, and start discussing your project ideas when you have something in mind. People on the mailing list are very friendly and helpful, so don’t be afraid to start a conversation, but also expect constructive criticism of your proposals. Answering any questions or concerns promptly and thoroughly not only shows that you know your stuff and are passionate about your project, it also shows that you are a good fit for GSoC in general. Don’t assume you’re in just because you’ve had a good dialogue, but plan and communicate as though you are; don’t wait for the project to start to fill in details or contact your prospective mentors personally. Once you’ve submitted a proposal, all of this goes double. The closer you get to the deadline, the less time there will be to polish your application and respond to questions, so make sure you’re doing it quickly and effectively.

Our SciRuby GSoC alumni added:

Monica: If I don’t join this year, I wish you good luck with the new students!

Ankur: I will be happy to sign up again as a student this year!

Will: I know I’ve said this already, but GSoC last year was a defining moment in my path to becoming a software developer, career-wise sure, but more importantly in the coder vs hacker vs computer scientist vs software developer sense. If there’s anything I can do to get involved this year, I’ll be available.

Statistics With Ruby: Time Series and General Linear Models

Editor’s Note: This is the third of four blog posts detailing our Google Summer of Code 2013 students’ work, edited by John Woods.

Gem Maintainer’s Note: These gems have changed recently. Edits reflect the changes.

Introduction

Statsample is a basic and advanced statistics suite in Ruby. It attempts to support JRuby and MRI/YARV equally, and also provides pure Ruby implementations for many functions.

It includes a rich API for statistical analysis, except for problems involving time series and generalized linear models (GLM), for which the functionality was rather basic.

So, in this Google Summer of Code 2013 program, working on the SciRuby Project, I released two extensions, Statsample TimeSeries and Statsample GLM.

These gems aim to take Statsample further and incorporate various functionalities and estimation techniques on continuous data.

Statsample TimeSeries

Statsample TimeSeries is equipped with a variety of operations. A few of those functionalities are:

  • Autocorrelation of series: For finding repeating patterns (like a periodic signal) in noisy data or for identifying persistence (if it rained today, will it rain tomorrow?).
  • Autoregressive and Moving Average: Autoregressive models (AR and ARMA) are useful for describing random processes such as found in nature and economics believed to be predictable from past behavior (e.g., El Niño, the stock market).
  • Partial autocorrelation with Yule–Walker, a method for calculating the coefficients of autoregressive models.
  • Levinson–Durbin estimation: for solving linear equations involving a Toeplitz matrix, such as in signal processing or cyclic signals.
  • Kalman filtering (or linear quadratic estimation): often used for determining position and motion of a moving object based on sensor information (e.g., for drawing a vehicle’s position on a map using GPS data, or for aircraft or spacecraft navigation based on sensor inputs).
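To make the first item in the list above concrete, here is a small pure Ruby sketch of a sample autocorrelation. It is not the library's implementation: the helper name autocorrelation is hypothetical, and it uses the common estimator that divides by the full-series sum of squares.

```ruby
# Sample autocorrelation of a series at a given lag.
# Illustrative sketch; it does not use Statsample or daru.
def autocorrelation(series, lag)
  n = series.size
  mean = series.sum.to_f / n
  denominator = series.sum { |x| (x - mean)**2 }
  numerator = (0...(n - lag)).sum do |t|
    (series[t] - mean) * (series[t + lag] - mean)
  end
  numerator / denominator
end

# A sine wave with a period of 12 samples: the autocorrelation is strongly
# positive at lag 12 (one full period) and strongly negative at lag 6.
signal = Array.new(48) { |t| Math.sin(2 * Math::PI * t / 12) }
(0..12).step(6) do |lag|
  puts "lag #{lag}: #{autocorrelation(signal, lag).round(3)}"
end
```

A repeating pattern in the data shows up as a peak in the autocorrelation at the lag matching its period.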

To get your hands dirty,

  • Install Statsample with gem install statsample.
  • Next, install the TimeSeries extension with gem install statsample-timeseries.

EDIT: Statsample-timeseries now uses daru for data storage and cleaning. Thus all ephemeral time series statistics functions (moving average, acf, etc.) have been moved to Daru::Vector, which can be indexed on a DateTimeIndex, which lets you access data indexed by a time stamp. See the daru README for examples.

Statsample::TimeSeries::Series has been deprecated in favour of Daru::Vector.

To demonstrate:

require 'daru'
require 'statsample-timeseries'

ts = Daru::Vector.new(100.times.map { rand(100) }, index: Daru::DateTimeIndex.date_range(:start => '2012-2', :periods => 100))
ts.acf # Calculate auto-correlation
ts.pacf # Calculate partial autocorrelation
# Partial autocorrelation with 11 lags by maximum likelihood estimation
ts.pacf(11, 'mle')
ts.ar # Autoregressive coefficients

# ARIMA(2, 1, 1)
k_obj = Statsample::TimeSeries.arima(ts, 2, 1, 1)
k_obj.ar # autoregressive coefficients
k_obj.ma # moving average coefficients

Statsample GLM

Statsample GLM includes many helpful regression techniques, which can be used for regression analysis on data. Some of those techniques are:

The top level module for regression techniques is Statsample::GLM.

Using it is as simple as ever:

  • First, install Statsample with gem install statsample.
  • Then, install GLM with gem install statsample-glm.

Let’s get started:

require 'daru'
require 'statsample-glm'
# Create the datasets:
x1 = Daru::Vector.new([0.537322309644812,-0.717124209978434,-0.519166718891331,0.434970973986765,-0.761822002215759,1.51170030921189,0.883854199811195,-0.908689798854196,1.70331977539793,-0.246971150634099,-1.59077593922623,-0.721548040910253,0.467025703920194,-0.510132788447137,0.430106510266798,-0.144353683251536,-1.54943800728303,0.849307651309298,-0.640304240933579,1.31462478279425,-0.399783455165345,0.0453055645017902,-2.58212161987746,-1.16484414309359,-1.08829266466281,-0.243893919684792,-1.96655661929441,0.301335373291024,-0.665832694463588,-0.0120650855753837,1.5116066367604,0.557300353673344,1.12829931872045,0.234443748015922,-2.03486690662651,0.275544751380246,-0.231465849558696,-0.356880153225012,-0.57746647541923,1.35758352580655,1.23971669378224,-0.662466275100489,0.313263561921793,-1.08783223256362,1.41964722846899,1.29325100940785,0.72153880625103,0.440580131022748,0.0351917814720056, -0.142353224879252])
x2 = Daru::Vector.new([-0.866655707911859,-0.367820249977585,0.361486610435,0.857332626245179,0.133438466268095,0.716104533073575,1.77206093023382,-0.10136697295802,-0.777086491435508,-0.204573554913706,0.963353531412233,-1.10103024900542,-0.404372761837392,-0.230226345183469,0.0363730246866971,-0.838265540390497,1.12543549657924,-0.57929175648001,-0.747060244805248,0.58946979365152,-0.531952663697324,1.53338594419818,0.521992029051441,1.41631763288724,0.611402316795129,-0.518355638373296,-0.515192557101107,-0.672697937866108,1.84347042325327,-0.21195540664804,-0.269869371631611,0.296155694010096,-2.18097898069634,-1.21314663927206,1.49193669881581,1.38969280369493,-0.400680808117106,-1.87282814976479,1.82394870451051,0.637864732838274,-0.141155946382493,0.0699950644281617,1.32568550595165,-0.412599258349398,0.14436832227506,-1.16507785388489,-2.16782049922428,0.24318371493798,0.258954871320764,-0.151966534521183])

y = Daru::Vector.new([0,0,1,0,1,1,1,1,0,1,1,1,1,0,1,0,1,1,0,1,0,1,1,1,1,0,0,1,1,0,0,1,0,0,1,1,0,0,1,1,0,1,1,1,1,0,0,0,1,1])

x = Daru::DataFrame.new({"x1"=>x1,"x2"=>x2})

obj = Statsample::GLM.compute(x, y, :binomial)
# => Returns logistic regression object

The documentation and API details are available here.

We have some more plans for the GLM module. First on the list is to make the algorithms work with singular value decomposition, because manual inversion of matrices is not fun for larger values in a Poisson regression.

Conclusion

I have blogged about most of the functionalities; additional information is available there.

For more updated use cases refer to the notebooks in the respective project READMEs.

Please explore and use the libraries; I eagerly await your input, suggestions and questions. Feel free to leave any questions on the Statsample GLM tracker or the Statsample TimeSeries tracker.

I had an amazing summer!

Stay tuned and enjoy.

Call for Funding: More Women Needed in Open Source Science Software

Women make up 51% of the American workforce, and yet only 20% of software engineers are female. Worldwide, the situation is similar. In open source software engineering, the statistics are worse: only 1.5–5% are female.

One of the organizations which presented at the Google Summer of Code Mentor Summit was the GNOME Foundation’s Outreach Program for Women (OPW). OPW is similar to GSoC, except that OPW doesn’t require its applicants to be students — or know how to program when the coding period begins. The pay is competitive with GSoC. And of course, only women can apply.

In the process of our Google Code-In 2013 application, I recruited several female mentors to work with female GCI students — not a requirement, but I think it helps to have supportive people involved with whom one can identify. Unfortunately, we weren’t selected for the Code-In (not too disappointing given the several venerable and accomplished organizations that were chosen). But we want to have another go, this time by applying for the Outreach Program for Women.

Here’s where we need your help.

Work for a company that might want to support this goal? Show this to your boss. Have him or her get in touch with us (sciruby.project at gmail dot com).

If you don’t work for such a company, but would still like to help, you can also get in touch at the same email address. As a general rule of thumb, you can always donate via Pledgie, even if you don’t have access to tons of money.

By the way, here’s a blog post by one of our mentors, Anna Belak.