SciRuby

Tools for Scientific Computing in Ruby

GSoC 2015: New NMatrix Gems for Advanced Linear Algebra Features

My Google Summer of Code project was working on the NMatrix project, moving functionality that depends on external libraries from the core nmatrix gem to optional plugin gems. NMatrix is a Ruby library for linear algebra, used by many other projects. In addition to the code that was part of NMatrix proper, NMatrix previously required the ATLAS library, which implemented fast versions of common matrix operations like multiplication and inversion, as well as more advanced operations like eigenvalue decomposition and Cholesky decomposition.

There were two separate but related motivations for my project. The first was to simplify the NMatrix installation process. ATLAS can be difficult to install, so the installation process for NMatrix was complicated, especially on OS X, and may have discouraged people from using NMatrix. The second motivation was that by separating out the ATLAS code from the main NMatrix code, it would be easier to add new linear algebra backends which provide similar features. Indeed, I implemented a second backend this summer.

The end result of my summer’s work:

  • The core nmatrix gem does not depend on any external linear matrix libraries. It provides non-optimized implementations of common matrix operations.
  • All code that requires ATLAS has been moved into the new nmatrix-atlas gem, so that those who are only interested in the core functionality are not required to install ATLAS. nmatrix-atlas provides optimized implementations of common matrix operations, as well as advanced functions not available in nmatrix. I wrote a blog post describing the setup for releasing multiple gems from the same repository, which this required.
  • A new gem nmatrix-lapacke, which provides the same features as nmatrix-atlas, but instead of depending specifically on the ATLAS library, requires any generic LAPACK and BLAS implementation. This should be easier to use for many users as they may already have LAPACK installed (it comes pre-installed with OS X and is commonly used in Linux systems), but not ATLAS.
  • The installation procedure is simplified, especially for those installing just the nmatrix gem. Compare the new installation instructions to the old ones.

The one deviation from my original proposal was that I originally intended to remove all the ATLAS code and release only the nmatrix-lapacke plugin, so that we would only have one interface to the advanced linear algebra functions, but I decided to keep the ATLAS code, since the nmatrix-lapacke code is new and has not had a chance to be thoroughly tested.

Usage

1
2
3
4
5
require 'nmatrix'
# create a 3-by-3 matrix
a = NMatrix.new([3,3], [1,2,3, 4,5,6, 7,8,9], dtype: :float64)
#invert it using non-optimized NMatrix-internal implementation
a.invert!
1
2
3
4
5
6
require 'nmatrix'
require 'nmatrix/atlas' #or require 'nmatrix/lapacke'
# create a 3-by-3 matrix
a = NMatrix.new([3,3], [1,2,3, 4,5,6, 7,8,9], dtype: :float64)
#invert it using optimized implementation provided by ATLAS
a.invert!

For advanced functions not provided by the core nmatrix gem, for example gesvd, nmatrix-atlas and nmatrix-lapacke provide a common interface:

1
2
3
4
5
require 'nmatrix'
require 'nmatrix/atlas'
a = NMatrix.new([4,5],[1,0,0,0,2, 0,0,3,0,0, 0,0,0,0,0, 0,4,0,0,0],
dtype: dtype)
u, s, vt = a.gesvd
1
2
3
4
5
6
#Identical to the above, except for the require
require 'nmatrix'
require 'nmatrix/lapacke'
a = NMatrix.new([4,5],[1,0,0,0,2, 0,0,3,0,0, 0,0,0,0,0, 0,4,0,0,0],
dtype: dtype)
u, s, vt = a.gesvd

If the developer wants to use an advanced feature, but does not care whether the user is using nmatrix-atlas or nmatrix-lapacke, they can require nmatrix/lapack_plugin, which will require whichever of the two is available, instead of being forced to choose between the two.

As a fun test of the new gems, I did a very simple benchmark, just testing how long it took to invert a 1500-by-1500 matrix in place using NMatix#invert!:

  • nmatrix (no external libraries): 3.67s
  • nmatrix-atlas: 0.96s
  • nmatrix-lapacke with ATLAS: 0.99s
  • nmatrix-lapacke with OpenBLAS (multithreading enabled): 0.39s
  • nmatrix-lapacke with reference implementations of LAPACK and BLAS: 3.72s

This is not supposed to be a thorough or realistic benchmark (performance will depend on your system, on how you built the libraries, and on the exact functions that you use), but there are still a few interesting conclusions we can draw from it:

  • Performance is much better using the two highly optimized libraries (ATLAS and OpenBLAS) than using either the NMatrix internal implementation or the reference implementation.
  • When using ATLAS, performance is similar whether using nmatrix-atlas and nmatrix-lapacke (this means we could consider deprecating the nmatix-atlas gem).

Overall, my summer has been productive. I implemented everything that I proposed and feedback from testers so far has been positive. I plan to stay involved with NMatrix, especially to follow up on any issues related to my changes. Although I won’t be a student next summer, I would certainly consider participating in Google Summer of Code in the future as a mentor. I’d like to thank my mentor John Woods and the rest of the SciRuby community for support and feedback throughout the summer.

GnuplotRB and GSoC 2015

Introduction

This summer I’ve been participating in Google Summer of Code with GnuplotRB project (plotting tool for Ruby users based on Gnuplot) for SciRuby. GSoC is almost over and I’m releasing v0.3.1 of GnuplotRB as a gem. In this blog post I want to introduce the gem and highlight some of its capabilities.

Features

There are several existing plotting tools for Ruby such as Nyaplot, Plotrb, Rubyvis and Gnuplot gem. However they are not designed for large datasets and have fewer plotting styles and options than Gnuplot. Gnuplot gem was developed long ago and nowadays consists mostly of hacks and does not support modern Gnuplot features such as multiplot.

Therefore my goal was to develop new gem for Gnuplot which would allow full use of its features in Ruby. I was inspired to build an easy-to-use interface for the most commonly used features of Gnuplot and allow users to customize their plots with Gnuplot options as easily as possible in Rubyesque way.

Ruby Wrappers for SymEngine, a C++ Symbolic Manipulation Library

I am sure you have heard about Wolfram Alpha. Some of you would have used its Mathematics section at some point of time to cross check your solution to a maths problem. If you haven’t, please do. A computer algebra system (CAS) does the same thing. Algebraic computation or symbolic computation are all used interchangeably. A CAS solves problems the same way a human does, but way more quickly and precisely. There is very less chance for an error, that we humans often make.

Introduction

My project was to write the Ruby extensions for the library SymEngine and come up with a Ruby-ish interface, after which we can use the features of SymEngine from Ruby.

SymEngine is a library for symbolic computation in C++. You may ask, why SymEngine? There are other CASs that I know of. This question was indeed asked. At the beginning, the idea was to use ruby wrappers for sage (a mathematics software system) which uses Pynac, an interface to GiNaC (another CAS). As it turns out from the benchmarks, SymEngine is much faster than Pynac. What about directly wrapping GiNaC? SymEngine is also a bit faster than GiNaC.

The motivation for SymEngine itself is to develop it once in C++ and then use it from other languages rather than doing the same thing all over again for each language that it is required in. In particular, a long term goal is to make Sage as well as SymPy use it by default, thus unifying the Python CAS communities. The goal of implementing the Ruby wrappers is to provide a CAS for the Ruby community.

How will this be useful?

There are times when we might need a symbolic computation library. Here is an incomplete list of some of the situations:

  • We need to do some algebra to get systems of equations in suitable form for numerical computation.
  • We need to make substitutions for some variables and don’t want to risk a math error by hand. There might be times when we want to partially simplify an expression by substituting only a few of its variables.
  • We have a situation where we need to find the optimum number of something to maximise our profits, and the mathematical model devised is just too complicated to solve by hand.
  • We need to perform non-trivial derivatives or integrals.
  • We are trying to understand the domain of a new function.
  • Most root finding algorithms search for a single root, whereas often there are more than one.
  • We need to solve a linear system or manipulate a symbolic matrix exactly.
  • We need to manipulate exact expressions and solutions, as opposed to approximate ones using a numerical method.

With that said, a symbolic manipulation library is indispensable for scientists and students. Ruby has gained a great deal of popularity over the years, and a symbolic manipulation library gem like this project in Ruby might prove to be the foundation for a computer algebra system in Ruby. With many efforts like these, Ruby might become the first choice for academicians given how easy it is to code your logic in Ruby.

How to install the gem?

To install, please follow the compile instructions given in the README. After you are done, I would suggest to test the extensions. To run the test suite execute rspec spec on the command line, from the symengine/ruby dir.

The gem is still in alpha release. Please help us out by reporting any issues in the repo issue tracker.

What can I do with the gem?

Currently, the following features are available in the gem: - Construct expressions out of variables (mathematical). - Simplify the expressions. - Carry out arithmetic operations like +, -, *, /, ** with the variables and expressions. - Extract arguments or variables from an expression. - Differentiate an expression with respect to another. - Substitute variables with other expressions.

Features that will soon be ported to the SymEngine gem - Functions, including trigonometric, hyperbolic and some special functions. - Matrices, and their operations. - Basic number-theoretic functions.

I have developed a few IRuby notebooks that demonstrate the use of the new SymEngine module in ruby.

Below is an example taken from the notebooks.


Using the SymEngine Gem

SymEngine is a module in the extensions, and the classes are a part of it. So first you fire up the interpreter or an IRuby notebook and load the file:

1
2
require 'symengine'
=> true

Go ahead and try a function:

1
2
3
4
5
6
SymEngine.ascii_art
=>   _____           _____         _
    |   __|_ _ _____|   __|___ ___|_|___ ___
    |__   | | |     |   __|   | . | |   | -_|
    |_____|_  |_|_|_|_____|_|_|_  |_|_|_|___|
          |___|               |___|

or create a variable:

1
2
basic = SymEngine::Basic.new
=> #<SymEngine::Basic:0x00000001e95290>

This shows that we have successfully loaded the module.

SymEngine::Symbol

Just like there are variables like x, y, and z in a mathematical expression or equation, we have SymEngine::Symbol in SymEngine to represent them. To use a variable, first we need to make a SymEngine::Symbol object with the string we are going to represent the variable with.:

1
2
3
4
5
6
7
puts x = SymEngine::Symbol.new("x")
puts y = SymEngine::Symbol.new("y")
puts z = SymEngine::Symbol.new("z")

x
y
z

Then we can construct expressions out of them:

1
2
3
e = (x-y)*(x**y/z)
e.to_s
=> "x**y*(x - y)/z"

In SymEngine, every object is an instance of Basic or its subclasses. So, even an instance of SymEngine::Symbol is a Basic object.:

1
2
3
4
5
x.class
=> SymEngine::Symbol

x.is_a? SymEngine::Basic
=> true

Now that we have an expression, we would like to see it’s expanded form using #expand:

1
2
3
f = e.expand()
f.to_s
=> "x**(1 + y)/z - x**y*y/z"

Or check if two expressions are same:

1
2
f == - (x**y*y/z) + (x**y*x/z)
=> true

But e and f are not equal since they are only mathematically equal, not structurally:

1
2
e == f
=> false

Let us suppose you want to know what variables/symbols your expression has. You can do that with the #free_symbols method, which returns a set of the symbols that are in the expression.:

1
2
f.free_symbols
=> #<Set: {#<SymEngine::Basic:0x00000001f0ca70>, #<SymEngine::Basic:0x00000001f0ca48>, #<SymEngine::Basic:0x00000001f0ca20>}>

Let us use #map method to see the elements of the Set.:

1
2
f.free_symbols.map { |x| x.to_s }
=> ["x", "y", "z"]

#args returns the terms of the expression,:

1
2
f.args.map { |x| x.to_s }
["-x**y*y/z", "x**(1 + y)/z"]

or if it is a single term it breaks down the elements:

1
2
f.args[0].args.map { |k| k.to_s }
=> ["-1", "x**y", "y", "z**(-1)"]

SymEngine::Integer

You can make objects of class SymEngine::Integer. It’s like regular Integer in ruby kernel, except it can do all the operations a Basic object can — such as arithmetic operations, etc.:

1
2
3
4
5
a = SymEngine::Integer.new(12)
b = SymEngine::Integer.new(64)
a**b

=> 1168422057627266461843148138873451659428421700563161428957815831003136

Additionally, it can support numbers of arbitrarily large length.

1
2
(a**x).to_s
=> "12**x"

SymEngine::Rational

You can also make objects of class SymEngine::Rational which is the SymEngine counterpart for Rationals in Ruby.:

1
2
3
4
c = Rational('2/3')
d = SymEngine::Rational.new(c)

=> 2/3

Like any other Basic object arithmetic operations can be done on this rational type too.:

1
2
(a-d).to_s
=> "34/3"

You need not create an instance of SymEngine::Integer or SymEngine::Rational, every time you want to use them in an expression that uses many Integers. Let us say you already have Integer/Rational object. Even then you can use them without having to create a new SymEngine object.:

1
2
3
k = (1 / (x * y) - x * y + 2) * (c + x * y) # c is a Rational object, not SymEngine::Rational
k.to_s
=> "(2/3 + x*y)*(2 + 1/(x*y) - x*y)"

As you can see, ruby kernel Integers and Rationals interoperate seamlessly with the SymEngine objects.

1
2
k.expand.to_s
=> "7/3 + (2/3)*1/(x*y) + (4/3)*x*y - x**2*y**2"

What I learned

In the rest of the post, I would like to summarise my work and what I learned as a participant of Google Summer of Code 2015.

Pre-midterm Evaluations

I am a newbie when it comes to Ruby, and it took me a while to setup the gem and configure files for the building of extensions.

The struggle between shared, static and dynamic libraries

I faced a lot of problem in the early stages, when I was trying to build the extensions. Ondrej, my mentor, and Isuru, a fellow GSoC student, helped me a lot. There were many C flags that were being reported as missing. Some flags cmake added by default but extconf.rb didn’t, the same one that was required to be added to build it as a shared library. I am still confused about the details, some of which are explored in greater detail in my personal blog. Finally, the library had to be built as a dynamic one. The problem of missing C flags was resolved later by hooking the process to cmake rather than mkmf.

Load Errors and problems in linking

Many LoadErrors popped up, but were eventually solved. Ivan helped a lot in debugging the errors. In the end, it turned out to be a simple file missing in the gemspec, that was not being installed.

Reconfiguring building

One of our aims during developing this was to get rid of unessential dependencies. The ones we already had the tools for. Like later the file extconf.rb, that is used to generate Makefile for the extension was removed, because that could also be done by cmake. Flags were added to cmake for building the Ruby extensions, like the flag -DWITH_RUBY=yes. The Makefile then generates the library symengine.so in the directory lib/symengine.Along with extconf.rb, the file extconf.h was also gone. Along these lines, the dependency on rake was also removed, and with that the Rakefile. Any task automation will most probably be done in python. So, the Rake::ExtensionTask was done by cmake and the Rake::GemPackageTask was replaced by the manual method of gem build symengine.gemspec and gem install symengine-0.0.0.gem

Travis setup

Not many projects have travis-ci setup for multiple languages. Not even the tutorials had clearly mentioned about setting up for multiple languages. But I did know about one of them, which is Shogun, the machine-learning toolbox. I referred to their .travis.yml and setup it up. If something like this wouldn’t have worked the plan was to manually install the required version of ruby and then execute the shell commands.

Making a basic object

Finally, I was able to successfully build the extensions, link the extensions with the SymEngine library, load the ruby-extension library in the interpreter and successfully instantiate an object of type Basic.

Inheritance and Symbols

At this time, the way inheritance works(like the sequence of formation and destruction of objects of a class that had a superclass) with the Ruby C API, was confusing for all of us. I designed an experiment to check what was actually happening. That cleared things out, and made the it easier to wrap things from now on. I also wrapped the Symbol class during the course.

Post-midterm Evaluations

Redesign of the C interface

We had to design an ugly function to wrap vector in C. That led us to redesign the C interface. This approach had no reinterpret casting that was being done earlier. Each data structure had a type that was determined at compile time. For C, it was an opaque structure, while for C++ the opaque structure declared in the shared header file was implemented in the source file that had C++ data types. This blog post explains it further.

Integer and Rational

While trying to port the SymEngine classes, Integer and Rational, I had to port many methods in Basic before that. I also replicated the rake tasks in NMatrix, for detection of memory leaks, in form of bash scripts.

Common enumeration

Since all objects in the Ruby C API are of the type CBasic, we needed a function that would give us the typename during the runtime for the corresponding objects to be wrapped in ruby, as an object of the correct Class in ruby. Since, this was achieved with enum in C++, the same thing could be done in C too, with all the classes written manually again. But there was no guarantee for this to be consistent, if ever the features required to be wrapped for a new language, and also manually adding the class in all the enum list everytime a new class is added was prone to errors. So, to make this DRY, we automated this by sharing the list of enums. More details for the implementation can be found here.

Class coercion and interoperability

To support interoperability with the builtin ruby types, I had to overload the methods in builtin classes earlier(this was not continued). Overriding all the existing binary operations for a ruby class to support SymEngine types, violated the open/closed principle. There was indeed another way, which is ‘Class Coercion’. It was suggested by Isuru. After that, SymEngine types could seamlessly interoperate between the ruby types.

Arithmetic operations

After this, all the arithmetic operations had been successfully ported. Each Basic object can now perform arithmetic operations with other Basic object(sometimes even ruby objects like Integer). The test file in python, that had all the corresponding test cases was ported to its RSpec counterpart.

Substitutions

Recently I completed porting the substitutions module to the extensions(#subs). This feature has added a lot of convenience as now you can substitute a SymEngine::Symbol with some other value in an expression and then #expand to get the result.

Trigonometric functions

Currently, I am working on porting the trigonometric functions in SymEngine to the extensions. This would first require to wrap the Function class and then the TrigFunction class in SymEngine.

Integration of other Ruby gems

I also have plans to integrate the ruby bindings for gmp, mpfr and mpc libraries, that are already available as gems, with ruby bindings for our library. I have created an issue here. Feel free to drop any suggestions.


There is much scope for improvement in both the projects. For SymEngine, to support more features like polynomials and series-expansion in the near future, and improving the user interface and the exception handling for the extensions. In short, making the extensions more ruby-ish.

I am grateful to my mentor, Mr. Ondřej Čertík, the Ruby Science Foundation and the SymPy Organisation for the opportunity that they gave me and guiding me through the project, and my team-mates for helping me with the issues. I hope more people will contribute to the project and together we will give a nice symbolic manipulation gem to the Ruby community.

Summary of Work on Daru This Summer for GSOC 2015

Over this summer as a part of Google Summer of Code 2015, daru received a lot of upgrades and new features which have it made a pretty robust tool for data analysis in pure Ruby. Of course, a lot of work still remains for bringing daru at par with the other data analysis solutions on offer today, but I feel the work done this summer has put daru on that path.

The new features led to the inclusion of daru in many of SciRuby’s gems, which use daru’s data storage, access and indexing features for storing and carrying around data. Statsample, statsample-glm, statsample-timeseries, statsample-bivariate-extensions are all now compatible with daru and use Daru::Vector and Daru::DataFrame as their primary data structures. I also overhauled Daru’s plotting functionality, that interfaced with nyaplot for creating interactive plots directly from the data.

Also, new gems developed by other GSOC students, notably Ivan’s GnuplotRB gem and Alexej’s mixed_models gem both now accept data from daru data structures. Do see their repo pages for seeing interesting ways of using daru.

The work on daru is also proving to be quite useful for other people, which led to a talk/presentation at DeccanRubyConf 2015, which is one of the three major Ruby conferences in India. You can see the slides and notebooks presented at the talk here. Given the current interest in data analysis and the need for a viable solution in Ruby, I plan to take daru much further. Keep watching the repo for interesting updates :)

In the rest of this post I’ll elaborate on all the work done this summer.

Pre-mid term submissions

Daru as a gem before GSOC was not exactly user friendly. There were many cases, particularly the iterators, that required some thinking before anybody used them. This is against the design philosophy of daru, or even Ruby general, where surprising programmers with ubiqtuos constructs is usually frowned down upon by the community. So the first thing that I did mainly concerned overhauling the daru’s many iterators for both Vector and DataFrame.

For example, the #map iterator from Enumerable returns an Array no matter object you call it on. This was not the case before, where #map would a Daru::Vector or Daru::DataFrame. This behaviour was changed, and now #map returns an Array. If you want a Vector or a DataFrame of the modified values, you should call #recode on Vector or DataFrame.

Each of these iterators also accepts an optional argument, :row or :vector, which will define the axis over which iteration is supposed to be carried out. So now there are the #each, #map, #map!, #recode, #recode!, #collect, #collect_matrix, #all?, #any?, #keep_vector_if and #keep_row_if. To iterate over elements along with their respective indexes (or labels), you can likewise use #each_row_with_index, #each_vector_with_index, #map_rows_with_index, #map_vector_with_index, #collect_rows_with_index, #collect_vector_with_index or #each_index. I urge you to go over the docs of each of these methods to utilize the full power of daru.

Apart from the improvements to iterators there was also quite a bit of refactoring involved for many methods (courtesy Alexej). The refactoring of certain core methods has made daru much faster than previous versions.

The next (major) thing to do was making daru compatible with Statsample. This was very essential since statsample is very important tool for statistics in Ruby and it was using its own Vector and Dataset classes, which weren’t very robust as computation tools and very difficult to use when it came to cleaning or munging data. So I replaced statsample’s Vector and Dataset clases with Daru::Vector and Daru::DataFrame. It involved a significant amount of work on both statsample and daru — Statsample because many constructs had to be changed to make them compatible with daru, and daru because there was a lot of essential functionality in these classes that had to be ported to daru.

Porting code from statsample to daru improved daru significantly. There were a whole of statistics methods in statsample that were imported into daru and you can now use all them from daru. Statsample also works well with rubyvis, a great tool for visualization. You can now do that with daru as well.

Many new methods for reading and writing data to and from files were also added to daru. You can now read and write data to and from CSV, Excel, plain text files or even SQL databases.

In effect, daru is now completely compatible with Statsample (and all the other Statsample extensions). You can use daru data structures for storing data and pass them to statsample for performing computations. The biggest advantage of this approach is that the analysed data can be passed around to other scientific Ruby libraries (some of which listed above) that use daru as well. Since daru offers in-built functions to better ‘see’ your data, better visualization is possible.

See these blogs and notebooks for a complete overview of daru’s new features.

Also see the notebooks in the statsample README for using daru with statsample.

Post-mid term submissions

Most of time post the mid term submissions was spent in implementing the time series functions for daru.

I implemented a new index, the DateTimeIndex, which can used for indexing data on time stamps. It enables users to query data based on time stamps. Time stamps can either be specified with precise Ruby DateTime objects or can be specified as strings, which will lead to retrival of all the data falling under that time. For example specifying ‘2012’ returns all data that falls in the year 2012. See detailed usage of DateTimeIndex and DateTime in conjunction with other daru constructs in the daru README.

An essential utility in implementing DateTimeIndex was DateOffset, which is a new set of classes that offsets dates based on certain rules or business logic. It can advance or lag a Ruby DateTime to the nearest day, or any day of the week, or the end or beginning of the month, etc. DateOffset is an essential part of DateTimeIndex and can also be used as a stand-alone utility for advancing/lagging DateTime objects. This blog post elaborates more on the nuances of DateOffset and its usage.

The last thing done during the post mid term was complete compatibility with Ankur Goel’s statsample-timeseries, which was created by during GSOC 2013. Statsample-timeseries is a comprehensive suite offering various functions for statistical analysis of time sries data. It now works with daru containers and can be used for statistical analysis of data indexed on Daru::DateTimeIndex. See some use cases in the README.

I’d like to conclude by thanking all the people directly and indirectly involved in making this project a success - My mentor Carlos for his help and support throughout the summer, Ivan, Alexej and Will for their support and feedback in various stages of developing daru. Also a big thank you to all the SciRuby maintainers for making this happen!

NMatrix’s First Beta Release (Belated Announcement)

Almost two months ago, in December, we release our first beta of NMatrix, our linear algebra library for the Ruby language.

Rather than discussing what NMatrix has (which you can find in the readme), I want to take this opportunity to discuss where we’d like it to go in the future. Think of this as our roadmap to 1.0.

External Library Requirements

NMatrix currently has an extremely limited set of Ruby gem dependencies — essentially only packable, which is needed for some of the I/O functionality — but as with other linear algebra libraries, its native library requirements can be problematic.

Specifically, NMatrix requires ATLAS — which is virtually impossible to install on many Macs, and is no longer readily available with Yosemite — and also likes to have LAPACK around. Moreover, some would prefer to use other math libraries with NMatrix.

There are several partial solutions. The clearest is that any components of NMatrix which require external non-standard libraries need to be abstracted into separate gems — such as nmatrix-atlas and nmatrix-io — and that these components should be interchangeable with others. NMatrix then needs to be able to fall back on either native C/C++ or Ruby implementations of certain functionalities.

To this end, a number of simple LAPACK and CBLAS functions have been implemented in NMatrix, with volunteers working on a few others (the list is available in the README). One of our proposed Google Summer of Code 2015 project ideas involves breaking up NMatrix into several gems.

The end result will be a much simpler installation process, and more freedom of choice.

Removal of Unnecessary Features

We thought it would be quite clever to include rational number support in NMatrix, including three types (32-bit, 64-bit, and 128-bit). Unfortunately, these types increase compilation time by about 30% (back-of-the-envelope calculation), and aren’t particularly useful since most matrix operations with rationals as inputs give irrational-valued matrices as their outputs. The inclusion of rational types also complicates the codebase.

Additionally, rationals are computationally problematic; what happens if there is overflow? Ruby handles overflow gracefully in its own rational type, which NMatrix users would still be able to utilize via the :object dtype.

You

The biggest change going forward is that NMatrix development needs to support actual use-cases. In the past, we’ve aimed to flesh out support for sparse storage formats, as well as math operations for the dense storage-type. It is folly to develop features no one wants to use. Future core development, then, will aim to make NMatrix’s current feature set more usable.

That means that future non-core development depends on you. What would you like to see in NMatrix? What would make it more useful for you?

If you want to become involved, now is a good time. We’re getting ready to submit a Google Summer of Code application (our third, I believe), and we’re in need of students — and especially mentors. Please consider joining our mailing list and adding yourself as a mentor on the ideas page.

Thanks so much for your support!

Nyaplot: Interactive Plots Generator With Ruby

Introduction

Three months of GSoC 2014 term was over this week and finally I released my product Nyaplot as a gem. This blog post was written to introduce Nyaplot version 0.1.1.

Features

There are various plotting libraries in the world as ggplot2 from R community and Plotrb and Rubyvis from Ruby. The features of Nyaplot compared with those libraries are interactivity, simplicity, and extensibility.

Interactivity is the main theme of my D3 project in GSoC 2014. You can soon find the aforementioned interactivity by clicking plots with your mouse or trackpad. This feature is also available on browsers bundled with Android or iOS thanks to technologies like SVG and WebGL.

However, the word ‘interactivity’ is not limited to situations like the above. You can explore that for yourself by using Nyaplot in IRuby notebook. Various modules prepared by Nyaplot help you to create plots interactively in the notebook. You can also publish the result quickly by uploading the notebook to your Dropbox storage, a gist or pastebin, or somewhere else. Here is an example which Mauricio Giraldo created and published on gist.

Simplicity is also an important element. Many plotting libraries has MATLAB-like function-based API but Nyaplot does not. Nyaplot is designed in a more Ruby-like object-oriented style, and its plots consist of various objects created from different classes. But Nyaplot has simple shortcut methods and users may avoid the more complex API.

Extension libraries

Nyaplot can be easily and dramatically extended with some JavaScript code. It bundles 3 extension libraries in its gem package: Nyaplot3D for 3D plots, Mapnya for map visualization, and Bionya for circular plots inspired by circos. Their structure is very simple and you can write the extension easily if you have a little experience in JavaScript and Ruby. For example, the Ruby part of Bionya has only about a hundred lines, even though the graphic’s style is far from that of standard plots.

Quick start

You can install Nyaplot by simply running gem install nyaplot. After that, find example code from path_to_gems/nyaplot-0.1.1/examples/rb and run some of them using ruby command. These scripts will generate some plots with Nyaplot and export them as html files to current directory.

In addition, I strongly recommend you to install IRuby notebook at the same time. IRuby is a web-based, interactive Ruby environment and is useful for quickly creating plots with Nyaplot. The introduction video embedded at the top of this post was also created on IRuby.

IRuby depends on some software outside of Ruby-ecosystem, so its installation method is a bit complicated. Please read the description in the readme to learn more.

Usage

I already wrote tutorials, so I’d like to limit the role of this paragraph to the introduction of the basic usage. You can skip reading this paragraph and find details in the nbviewer tutorial and documentation.

The minimum code to create a scatter plot is as shown below:

1
2
plot = Nyaplot::Plot.new
sc = plot.add(:scatter, [0,1,2,3,4], [-1,2,-3,4,-5])

Nyaplot::Plot is the base class to create plots and Nyaplot::Plot#add is the method to add diagram to its instance. The first argument is to specify the type of diagram, and the second and third are for data mapped into x and y axes. Plot#add returns an instance of Nyaplot::Diagram and you can change attributes like color and stroke of each diagram component.

For example you can change its color by running the code below.

1
2
color = Nyaplot::Colors.qual
sc.color(color)

Nyaplot::Colors is the simple wrapper for Colorbrewer, and one of its methods Nyaplot::Colors.qual randomly returns colorset suitable for qualitative data.

If you execute this code in IRuby notebook, you can check if you favor the colorset through html-table based interface. See the tutorial to learn more.

Nyaplot::Colors

The plot generated from the code can be exported with two lines below.

1
2
plot.show # show plot on IRuby notebook
plot.export_html # export plot to the current directory as a html file

We cannot explain Nyaplot without Nyaplot::DataFrame. Example code above does not contain the word ‘DataFrame’, but the software internally create an instance of Nyaplot:DataFrame and creates plots based on it.

To create the same scatter plot as above, run the code below.

1
2
df = Nyaplot::DataFrame.new({a: [0,1,2,3,4], b: [-1,2,-3,4,-5]})
plot.add_with_df(df, :scatter, :a, :b)

You may not feel any advantage of using Plot#add_with_df instead of Plot#add, but the former is useful when creating more complicated plots. Have a look at the tutorial to learn more.

Conclusion

My GSoC term has concluded, but that do not mean the end of development. There are still many things to do for improvement of this project. For example Mapnya and Bionya are both experimental implementation, and the functionality of Nyaplot::DataFrame is relatively limited. If you are interested in this project, please fork either or both of the two repositories below and send me pull-requests.

Any form of contribution is welcome.

At last I’d like to express my gratitude to my mentor Pjotr Prins and other members of SciRuby community. I would not have been able to finish my GSoC term without great help from them. Feedback from people outside of the community also helped me a lot. I had a great summer thanks to all people related to this project.

Benchmarks for Integration Gem

The Integration gem now includes quite a few functions from which to choose. To make this choice easier, I have done some benchmarking to see how fast and acuurate each of these functions are. The results of these tests for some randomly chosen functions are given below. The speed test reports shows the user CPU time, system CPU time, the sum of the user and system CPU times, and the elapsed real time to perform 100 iterations of the code. The unit of time is seconds.

For function f(x)= x2+3*x+7 on interval (0,1)

Speed Tests

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Benchmarking with 100 iterations
                                user     system      total        real
rectangle                  72.040000   0.010000  72.050000 ( 72.058014)
                                user     system      total        real
trapezoid                 105.500000   0.000000 105.500000 (105.489225)
                                user     system      total        real
simpson                     0.020000   0.000000   0.020000 (  0.017601)
                                user     system      total        real
romberg                     0.000000   0.000000   0.000000 (  0.002219)
                                user     system      total        real
adaptive_quadrature         0.010000   0.000000   0.010000 (  0.002757)
                                user     system      total        real
gauss                       0.010000   0.000000   0.010000 (  0.006439)
                                user     system      total        real
gauss_kronrod               0.000000   0.000000   0.000000 (  0.008615)
                                user     system      total        real
simpson3by8                 0.080000   0.000000   0.080000 (  0.074791)
                                user     system      total        real
boole                       0.100000   0.000000   0.100000 (  0.100300)
                                user     system      total        real
open_trapezoid            181.470000   0.050000 181.520000 (181.572745)
                                user     system      total        real
milne                       0.060000   0.000000   0.060000 (  0.058677)
                                user     system      total        real
qng                         0.000000   0.000000   0.000000 (  0.003788)
                                user     system      total        real
qag                         0.010000   0.000000   0.010000 (  0.010407)

Accuracy Tests

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
+---------------------+-------------------+-------------------+------------------------+-------------------+
|       Method        |      Result       |   Actual Result   |         Error          |     Accuracy      |
+---------------------+-------------------+-------------------+------------------------+-------------------+
| rectangle           | 8.833333324123249 | 8.833333333333334 | 9.210085138988688e-09  | 99.99999989573489 |
| trapezoid           | 8.83333334502252  | 8.833333333333334 | 1.1689186507624072e-08 | 99.99999986766959 |
| simpson             | 8.833333333333332 | 8.833333333333334 | 1.7763568394002505e-15 | 99.99999999999997 |
| romberg             | 9.0               | 8.833333333333334 | 0.16666666666666607    | 98.11320754716982 |
| adaptive_quadrature | 8.833333333333334 | 8.833333333333334 | 0.0                    | 100.0             |
| gauss               | 8.750000000006125 | 8.833333333333334 | 0.08333333332720905    | 99.05660377365425 |
| gauss_kronrod       | 8.75              | 8.833333333333334 | 0.08333333333333393    | 99.0566037735849  |
| simpson3by8         | 8.833333333333336 | 8.833333333333334 | 1.7763568394002505e-15 | 99.99999999999997 |
| boole               | 8.833333333333336 | 8.833333333333334 | 1.7763568394002505e-15 | 99.99999999999997 |
| open_trapezoid      | 8.833333325264693 | 8.833333333333334 | 8.068640866554233e-09  | 99.9999999086569  |
| milne               | 8.833333333333334 | 8.833333333333334 | 0.0                    | 100.0             |
| qng                 | 8.833333333333334 | 8.833333333333334 | 0.0                    | 100.0             |
| qag                 | 8.833333333333336 | 8.833333333333334 | 1.7763568394002505e-15 | 99.99999999999997 |
+---------------------+-------------------+-------------------+------------------------+-------------------+

For function f(x) = x5+7*x2+2*cos(x) on interval (0,1)

Speed Tests

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Benchmarking with 100 iterations
                                user     system      total        real
rectangle                 534.790000   0.510000 535.300000 (538.238675)
                                user     system      total        real
trapezoid                 835.950000   0.540000 836.490000 (839.233313)
                                user     system      total        real
simpson                     0.650000   0.000000   0.650000 (  0.652694)
                                user     system      total        real
romberg                     0.010000   0.000000   0.010000 (  0.002073)
                                user     system      total        real
adaptive_quadrature         0.170000   0.000000   0.170000 (  0.167520)
                                user     system      total        real
gauss                       0.000000   0.000000   0.000000 (  0.007621)
                                user     system      total        real
gauss_kronrod               0.020000   0.000000   0.020000 (  0.010610)
                                user     system      total        real
simpson3by8                 0.820000   0.000000   0.820000 (  0.831039)
                                user     system      total        real
boole                       0.120000   0.000000   0.120000 (  0.119929)
                                user     system      total        real
open_trapezoid            1019.100000   0.150000 1019.250000 (1020.034828)
                                user     system      total        real
milne                       0.980000   0.000000   0.980000 (  0.970671)
                                user     system      total        real
qng                         0.000000   0.000000   0.000000 (  0.008263)
                                user     system      total        real
qag                         0.020000   0.000000   0.020000 (  0.020050)

Accuracy Tests

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
+---------------------+--------------------+--------------------+------------------------+-------------------+
|       Method        |       Result       |   Actual Result    |         Error          |     Accuracy      |
+---------------------+--------------------+--------------------+------------------------+-------------------+
| rectangle           | 4.182941950501391  | 4.1829419696157935 | 1.9114402505238104e-08 | 99.99999954303927 |
| trapezoid           | 4.182941993679488  | 4.1829419696157935 | 2.406369414842402e-08  | 99.99999942471844 |
| simpson             | 4.182941969798875  | 4.1829419696157935 | 1.8308110583120651e-10 | 99.99999999562314 |
| romberg             | 5.54030230586814   | 4.1829419696157935 | 1.3573603362523468     | 67.5501035847021  |
| adaptive_quadrature | 4.182941969702552  | 4.1829419696157935 | 8.675815621472793e-11  | 99.9999999979259  |
| gauss               | 3.5364151237832213 | 4.1829419696157935 | 0.6465268458325721     | 84.54372901826423 |
| gauss_kronrod       | 3.5364151237807455 | 4.1829419696157935 | 0.6465268458350479     | 84.54372901820506 |
| simpson3by8         | 4.182941969676288  | 4.1829419696157935 | 6.049472034419523e-11  | 99.99999999855378 |
| boole               | 4.182941969615793  | 4.1829419696157935 | 8.881784197001252e-16  | 99.99999999999997 |
| open_trapezoid      | 4.182941952971977  | 4.1829419696157935 | 1.6643816103112385e-08 | 99.99999960210263 |
| milne               | 4.18294196954598   | 4.1829419696157935 | 6.981348832368894e-11  | 99.99999999833099 |
| qng                 | 4.182941969615793  | 4.1829419696157935 | 8.881784197001252e-16  | 99.99999999999997 |
| qag                 | 4.1829419696157935 | 4.1829419696157935 | 0.0                    | 100.0             |
+---------------------+--------------------+--------------------+------------------------+-------------------+

If you are keen to test any function of your choice you can pull from my Integration gem fork and change the files in the benchmark folder. For both the files, accuracy.rb and speed.rb, change the line func = lambda{|x| f(x)} with f(x) as your desired function. In case you want to perform accuracy benchmarks, make sure that you know the exact result of the integration operation before hand and assign this value to the variable actual_result Feel free to comment if you face any problem running the benchmarks with your desired functions.

The Nyaplot Internal Event Handler

Nyaplot multi-pane example

Introduction

Last week I implemented the multiple panes interactivity to the front-end of Nyaplot. You can see the demo on this notebook.

Skip over single pane demo, and have a look at the paragraph ‘Multiple panes demo’ at the bottom. In this example, the histogram and bar chart are created individually, and then thrown into one instance of Nyaplot::Frame. After running frame.show, both of the diagrams appear in one out-put box on the notebook. When you click the histogram, bar chart refrects that and change its height.

In this article I will explain about the event handler of Nyaplot, which allows for interactivity among panes.

The Trick

The trick is actually hidden in Nyaplot::DataFrame. In this example, both of the two diagrams are generated from columns in one DataFrame. When Nyaplot receives a column object (instance of Nyaplot::Series, defined here), it internally find its source data frame and remembers its location.

Consequently, Nyaplot can find that histogram and bar chart are sharing their data source, and allows them to interact with each other.

Nyaplot::DataFrame Example

Implementation

Before explaining the event handler, I want to outline Nyaplot’s diagram rendering procedure. In the back-end JavaScript library, diagrams, such as histograms, fetch column data from data frames and generate their SVG DOM nodes dynamically (source code).

The key is a data frame written in JavaScript. After mouse events coming from a filter (such as the gray box on the plot area) are handled, the histogram registers its filter function to the data frame and instructs all diagrams to update their SVG DOM objects. The bar chart updates its SVG from data filtered by the new function.

As you can see, Nyaplot’s inner workings are fairly simple; and these workings are common to all diagrams supported by Nyaplot. So not only the histogram and bar chart, but other diagrams as well, can behave interactively (e.g., histogram, bar, and Venn example).

Conclusion

Despite, or perhaps because of, the relative simplicity of the structure, the event handler functions well. The actual interactivity in Nyaplot is based on the data frame object, both in the Ruby front-end and the JavaScript back-end. As a result, handling events in DataFrame is both natural and efficient for Nyaplot.

Additional Information

Nyaplot and its back-end library Nyaplotjs are being developed on GitHub.

If you are interesed, feel free to try it and raise issues or send pull-requests.

Usage: Integration Gem Improvements

Numerical Integration

Integration is the process of finding the summation of the value of a function at small intervals, when the width of the intervals is infinitesimally small. As there is no such thing as “infinitesimally small interval”, there are several other methods which help us to approximate the values of integrals.

Numerical integration is one such technique which helps us to estimate the value of the definite integral of a function over an interval. Numerical integration is based on the principle of evaluating the values of a function at specific points in the interval and then taking the product of those values with a corresponding weight, which is a standard constant specific to the method being used.

Integration Gem

The Integration gem is now equipped with a host of additional algorithms for approximating the integral of a function ranging from the simple ones like Simpson’s method to the more complex ones like Gauss–Kronrod and Adaptive Quadrature method.

Example

Let us say you need to find out the integral of a function 5*x**2 -7*Math.cos(x) over the interval (0,1). We can solve this using the following snippet using the Integration gem:

1
2
Integration.integrate(0,1,{:method=>:simpson}) {|x| 5*x**2 -7*Math.cos(x)}
# => -4.223630227110514

We see that the actual value of this integral is -4.2236302269886088799000849584, which is pretty close to the value estimated by the integration gem. Instead of {:method=>:simpson} in the above code, you can use any one of these implemented methods:

1
rectangle,:trapezoid,:simpson, :adaptive_quadrature , :gauss, :romberg, :monte_carlo, :gauss_kronrod, :simpson3by8, :boole, :open_trapezoid, :milne

Each of these algorithms gives similar results — except if the function has rapid changes in the interval of integration, as is the nature of numerical integration.

The Progress of the GSoC D3/Visualization Project

This week is the half-way point of this year’s Google Summer of Code. Let me report the progress during the term.

Example Nyaplot chart of a spiral

Progress on the front-end library

I am developing Nyaplot as the front-end library for my work.

One topic on Nyaplot is API design. That is a difficult problem since each plotting library has very different methods for creating plots. For example, matplotlib and Bokeh are both written in Python, but their demo code is written in very different styles. After discussing with my mentor, I decided to implement the function-based API.

As an exmaple of that API, here is a minimal working example for generating bar charts with Nyaplot:

1
2
3
4
require 'nyaplot'
plot = Nyaplot::Plot.new
plot.add(:bar, ['nya1', 'nya2', 'nya3'],[10,20,30])
plot.show

If you want to change options (e.g., color or title), put the return value of Nyaplot::Plot.add into a variable and call its methods.

1
2
bar = plot.add(:bar, ['nya1', 'nya2', 'nya3'],[10,20,30])
bar.title("the number of cats") # name the bar chart

IRuby integration

Interaction with IRuby is a priority for Nyaplot. IRuby is a web-based Ruby lab notebook-like environment, based on IPython, which is also useful for fast prototyping.

The image at the top of this blog post is a hyperbolic spiral which I generated with Nyaplot in IRuby. Have a look at the notebook on nbviewer, an IRuby and IPython notebook hosting services.

Progress on the back-end library

Nyaplot uses Nyaplotjs as its back-end library. I spent a lot of time working to implement interactivity among multiple panes to Nyaplotjs. See an example here.

Three-pane interactivity screenshot

The input data source is a TSV file of 2,044 lines. Multiple-pane interactivity is especially important when visualizing such a large dataset.

Have a look at the radio buttons beside the Venn diagram. The left three panels decide which set to put into each of three circles (VENN1, VENN2, VENN3). The right panel decides which data in each area (overlapping area, non-overlapping area, and all) to use in the other two panes.

The gray box on the histogram decides the range of PNR values. If you select the range 0.3 to 0.7 with it, the right bar chart will reflect that and show how many of the selected data are in that range.

Nyaplotjs provides this interactivity using an event handler connected to a dataframe object. That allows us to handle unidirectional update method (e.g., histogram updates bar chart, but bar chart does not update histogram).

Conclusion

I finished the first half term of Google Summer of Code 2014, but I still have a lot of things to do. I would like to continue to add interactivity to both the front-end and back-end libraries.

I am developing those two libraries in separate repositories on GitHub. If you are interested, feel free to raise issues or send pull-requests, even during the GSoC term.