Python or GO ? Which is the better machine learning language?

avatar

back.jpg

For those of you that followed me in the past, you probably know that I am Kotlin fanboy and for good reasons. That language is beautiful, expressive, easy to read and understand no matter the technical background. However its nature was not suited for my next project which consisted of implementing an algorithm that solves linear systems.

The problem 📕📖

I know it sounds like the most boring thing in the world and it kind of is, but this algorithm has some nice real-world usages. It can be used in recommendation techniques like the ones that Netflix or Spotify use to recommend you new stuff to watch and listen to. Let's say Netflix has a database of all the users and their movie rankings arranged as a matrix. This matrix has clearly a lot of rows that are similar to one another since your ratings match more or less with others from people with similar taste in movies. This means a lot of those rows are actually redundant when trying to perform an algorithmic computation of the recommendation. However Netflix can simply get from their database some other two smaller matrices that form the original one of ratings X movies, let's say a matrix with ratings X topic and another one of topic X movies. In machine learning terminology this is called non-negative matrix factorisation.

If we multiply the two matrices we get the original one, but that's not what we want. We want to make a prediction based on those two smaller matrices. Translated in mathematical terms, for the people that paid attention in maths, we need to solve the linear system U*V*b = y, where our original matrix A is A = U*V. This means we have to solve two linear systems V*b = x and X*x = y.

Supposing y is a vector of movie popularity we want to see how the popularity impacted the ratings. y can be anything related to movies that we want to analyze, like for example: a vector of yes/no values corresponding to whether Leonardo DiCaprio played in the respective movie. b will tell us how the y factor influenced the user ratings. This is called regression.

The algorithm I'm trying to implement will be:

DeepinScreenshot_selectarea_20200504135529.png

Enough math, let's talk languages 💬

Part 1. Which language is better?

I'm not going to say the usual trope of "all languages are equally good". Some are better than others depending on what you need to achieve. Are Python and GO better than others? Yes absolutely. There are also better languages than those two for this kind of problem like GNU Octave (nope not gonna recommend Matlab) or R or Julia which I haven't tried yet.

The advantage of Python and GO is that they are fairly easy to read, write and understand and support a large array of mathematical functionalities through NumPy and Gonum respectively. Unlike the others, Python and Go are general-purpose languages meaning that you can do anything with them and are not restricted to only one type of work.

Part 2. GO is immature, but it's really my fault for not knowing math

If you saw this post from a few days ago, I was venting about how I spent a whole day debugging the GO implementation of the algorithm. This was due to Gonum not computing the minimum-norm least-squares solution correctly through their Solve method. Or I shouldn't say correctly as their result was indeed correct it was just different than Python's. And so I spent two days figuring out how to write my own implememntation of a python functtion that calculated exactly what I needed.

And I failed. Python does some black-magic in the background that made their solution a little better than mine. And by a little better I mean 1.60942535335e-31 better.

Even though GO and Gonum might take some time to catch up to Python in terms of functionality, the team behind Gonum was extremely polite, professional and mature and due to our interaction we'll see a new Solve method around which will be just like Numpy's linalg.lstsq method. Thanks to everyone behind those two projects.

Anyway, as it tuned out, the solution obtained by my minimum-norm least-squares function was right even though different than Python's and the differences between the two probably was due to some rounding errors as it often happens when working with numbers that small.

So Math 1 - 0 Me.

Part 3. Python is great, but slow

It took me 4 hours to do everything in Python. So yeah... talk about efficiency in the development cycle. It was slow however. Much slower than GO even though the program was running in a multi-threaded manner.

In GO you have something called goroutines which are more or less like the Kotlin coroutines. They are a light construct and multiple coroutines can run on the same OS thread until one executes blocking code. This results in fewer used resources and improved speed when compared to traditional threads that Python uses.

goroutines.jpg

Some reader of my previous article suggested PyPy as a faster alternative to normal Python, but I saw no massive speed gains and it was hard to install due to incompatibilities with matplotlib (python library for plotting) and took a lot of time to install numpy.

The code 👨‍💻

Python and GO are pretty similar when writing code, and even though I prefer GO's style, I'll say that Python is the easier one to write and the more fluent language.

Let's some similarities. Computing matrix and vector norms in Python:

# Computing the squared Frobenuis norms of U and V
frobenius_U = pow(numpy.linalg.norm(U, 'fro'), 2)
frobenius_V = pow(numpy.linalg.norm(V, 'fro'), 2)

euclidean_U = pow(npy.linalg.norm(U[chosen_U], 2), 2)
euclidean_V = pow(npy.linalg.norm(V[chosen_V], 2), 2)

And the same in GO:

func FrobeniusSquared(matrix *mat.Dense) float64 {
    return math.Pow(mat.Norm(matrix, 2), 2)
}

func EuclideanNormSquared(vector mat.Vector) float64 {
    return math.Pow(mat.Norm(vector, 2), 2)
}

As you can see both have a dedicated function to get such norms, but the Python version has more options.

Since the algorithm uses matrix operations let's see how both languages handle those.
Python is first:

x = x + ((y[chosen_U][0] - numpy.array(U[chosen_U]).dot(x)[0]) / euclidean_U) * numpy.array(U[chosen_U]).T
        b = b + ((x[chosen_V][0] - numpy.array(V[chosen_V]).dot(b)[0]) / euclidean_V) * numpy.array(V[chosen_V]).T

And now GO:

x.AddScaledVec(
    x,
    (y.At(randU, 0)-mat.Dot(chosenU, x))/euclideanU,
    chosenU)

b.AddScaledVec(
    b,
    (x.At(randV, 0)-mat.Dot(chosenV, b))/euclideanV,
    chosenV)

It's pretty clear what both do, they multiply a number with a vector and then add the result to another vector. I like how GO is laid out here, while Python is easier to write since we are using the known mathematical symbols.

The algorithm is supposed to randomly choose rows from both the U and V matrix and compute the x and b vectors with the formula above. Both Python and GO have a function to take samples from a series of numbers with given weights.

The Python function would be something like:

def __choose_row(probs: []) -> int:
    """
    This method chooses a random row number from a bunch of rows depending
    on the probability of each row. Cool.

    :param probs: The probabilities of each row
    :returns: The number of a row
    """
    choices = range(len(probs))
    return npy.random.choice(
        choices,
        1,
        p=probs
    )

And the GO one:

// GetRandomRow performs weighted sampling with the weights you provide in the rowsProb array
//
// The scope of this function is to return a random row index for the RkRk and RekRek algorithms.
// Each row probability is computed by GetRowsProbability method
func GetRandomRow(rowsProb []float64) int {
    source := rand2.NewSource(uint64(time.Now().UnixNano()))

    index, _ := sampleuv.NewWeighted(rowsProb, source).Take()

    return index
}

The whole code can be freely examined and used for your own purposes. Both projects are posted on Github:

Speed 🏃‍♂

Okay, let's talk numbers. How fast are both languages?

The tests I am about to present were made on my laptop which has an Intel i7 6700HQ CPU, an NVIDIA 960M, 8GB RAM and a normal SSD

In the first test, U is of size 5x2 and V is of size 2x3. The algorithm is let to run 100.000.000 times or until the difference from the exact solution is smaller than 10-4.
The test is repeated 50 times so what you'll be seeing is the average of those 50 tests.

chart1.png

Let's raise the bar a little bit and let the maximum error be 10-32. Same matrix sizes. I had to adjust the iteration limit because 100.000.000 was too much for both languages and for my hardware to handle. So I capped the limit 1.000.000.

chart2.jpeg

As you can see the benefits of using PyPy over Python in this specific case is minimal. GO was a whooping 122 times faster than Python.

Alright let's run the last set of tests. We are going to test the number crunching performance of the two languages. We take U of size 500x200 and V of size 200x300. We set the tolerance of the system at 10-16 and let the 1.000.000 iterations limit as is. Let's go.

chart3.jpeg

Again at this specific type of workflow PyPy offers negligible benefits over regular Python and GO outperformed both of them. I'm not saying that PyPy has no benefits over regular Python, but the speed improvements it offers might no apply to the type of work my algorithm does.

For the sake of it, I've ran a few more tests with GO just to give you an idea of how insanely fast this language is.

chart4.jpeg


Conclusions

As I have stated in the preview, I think that the two languages have their ups and downs and it matters what you need to do.

Do you need to ship your product fast or do you need to ship a fast product?

This is not to say that shipping a product fast and using Python is bad, sometimes especially in the business world shipping a product on time is more valuable than shipping the best product. The time you would sink into something like GO which doesn't have a large of a community as Python can cost your company or yourself real money.

However if you need your project to be performant, I do recommend GO wholeheartedly.

What is your favourite? GO or Python? And why? Let me know.



0
0
0.000
9 comments
avatar

Hi! Thanks for sharing your thoughts. Your post is very complete and accurate. Unfortunately, I don't know anything about "GO", you made me curious! I will follow you with pleasure. Have a nice day!

!discovery 40

0
0
0.000
avatar

I'm glad I stirred your curiosity 😄. GO is a nice language, it may sometimes lack the "easiness" of python, but it can be used for various purposes and is a great language for parallel work

0
0
0.000
avatar

Really need to have a play with Go sometime. There was a tutorial floating around the web somewhere that showed how to write a mining algorithm with it but I never followed up on it. As for the machine learning I’ve tried a bit before when I took a short course with colleges on edx.org, for that we used R though which has a massive following in the stats community (also the first language, after Basic on the BBC, that I learned). Thanks for the article dude, solid work and good food for thought.

0
0
0.000
avatar

GO is great for parallel work. This one of the reasons it's so fast. The goroutines and being somewhat based on C make it a real speed demon.

But GO also has some interesting libraries for things unrelated to machine learning. You can even do web development in GO 😂

0
0
0.000
avatar

Congratulations @alexbalan! You have completed the following achievement on the Hive blockchain and have been rewarded with new badge(s) :

You received more than 300 upvotes. Your next target is to reach 400 upvotes.

You can view your badges on your board and compare to others on the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

Do not miss the last post from @hivebuzz:

Revolution! Revolution!
Vote for us as a witness to get one more badge and upvotes from us with more power!
0
0
0.000
avatar

Based on my on perspective and experience in programming, python is the best choice. It's a very constructive language.

0
0
0.000
avatar

Python has the advantage of having a massive community behind and powerful set of libraries, but like any other language it's far from perfect and I wouldn't call it the best.

Maybe the best in some situations. I can agree with that depending on the situation.

0
0
0.000
avatar

Yeah.. it depends on the situation and intend purpose.

0
0
0.000