Subqueries and the JOIN command in MySQL

Hello fellow devs, this article is a continuation of the one I published a few days ago (which you can check out here). In the last article, I explained some basic SQL commands by going over the solutions for a project given by Harvard, where I have to query a SQL file containing data from IMDb. The topics I didn't talk about are subqueries and joining two or more tables with the JOIN command, so I will explain them by solving two problems.

The first problem

If you read my previous article, you can probably see the answer to this question. But for those who don't know, this is what the answer looks like:

SELECT AVG(rating)
FROM ratings 
WHERE year = 2012;

What the above code simply means is: give me the average rating of all the movies released in 2012. The AVG command calculates the average value of whatever is passed into it, and in this case, it's the ratings of all the movies released in 2012 but that code won't work here and I will explain why but first, let's look at the structure of the database.

Go down a bit and you will see CREATE TABLE ratings, that's the ratings table and in it, we have a movie_id column, a rating column (which contains the ratings of all the movies), a votes column (containing the total votes each movie has) and the last thing is something called FOREIGN KEY which isn't a column and I will explain it later on.

You can see that the ratings table doesn't have a year column that contains the year in which each movie was released but that year column can be seen in the movies table (the first table in the database). So, how do we access the year column if it exists in another table? That's where subqueries come in.

SELECT AVG(rating)
FROM ratings
WHERE movie_id
IN (SELECT id
FROM movies
WHERE year = 2012);

The code looks similar to the first one but you can see a couple of changes in it. Subqueries can be a bit confusing but one technique I use is to break the problem into small parts and start building the query from the inner one. To get the average ratings of all movies released in 2012, we must first get those movies before worrying about their ratings.

SELECT id
FROM movies
WHERE year = 2012;

This simply means give me the id number of all the movies released in 2012. The result might be something like: 1, 2, 5, 6, 7, ... n. We can now use this id number to access the rating of each movie in the ratings table, and we do that by using the movie_id column found in the ratings table. The id number column in the movies table corresponds to the movie_id column in the ratings table, that's the work of that FOREIGN KEY you saw in the ratings table. Simply put: id = movie_id

SELECT AVG(rating)
FROM ratings
WHERE movie_id
IN (SELECT id
FROM movies
WHERE year = 2012);

The code in the bracket is the subquery and it executes first and now we have the id of all movies released in 2012. The rest of the code goes thus: give me the average rating from the ratings table where the movie_id is in the result of that subquery. Let's assume the subquery gave a result of 1, 2, 5, 6, 7 which is the id of movies released in 2012, the outer query will look for the ratings of the movies whose movie_id is in that result. Recall: movie_id = Id

This is the output:

The second problem

This is similar to the last problem but the same technique won't be effective and we have to use a command known as JOIN. This command does a very simple thing: it joins two or more tables into one big table, and now we can access the peculiar columns of data in each of the tables.

SELECT movies.title, ratings.rating
FROM movies
JOIN ratings
ON movies.id = ratings.movie_id
WHERE movies.year = 2010
ORDER BY rating DESC, title;

The question said we should list movies and their ratings, that's exactly what the first line does. The two things we are listing out are in two different tables, so I specify them by saying movies.title and ratings.rating. The dot is similar to the dot notation in JavaScript used for accessing the properties of an object and in python, it is used to access the items in a dictionary.

In SQL, the dot notation is used to access the columns of a table. So, in that first line of code, I am saying: go into the movies table and select the contents of the title column, the same thing applies to the ratings table. FROM movies JOIN ratings is simply how we join both tables, the ratings and movies table are now joined together but before that can happen, they must be joined together by a column that is common to both tables.

Do you recall where I said there's a movie_id column in the ratings table that corresponds to the id column in the movies table? That's exactly the column that is common to both of these tables, they are bound together via the FOREIGN KEY I mentioned earlier. The ON command is used to specify the columns that are peculiar to both tables and in this case, it is the id and movie_id columns, which makes movies.id = ratings.movie_id true

After joining both tables, we can now access the data that exists in both tables. Since we are told to only list out movies released in 2010, we do that with the WHERE command and then we sort or order the result by their ratings, starting from the highest to the lowest (descending order). For movies that have the same ratings, we order them alphabetically by their title, this explains the last line of code.

This is the output. I have limited the result to only 10:

Note: This same JOIN method can be used to solve the first problem instead of using a subquery, it will produce the same result.

And that is it!

SQL is a very fascinating and interesting language, I really enjoyed working on this project by Harvard and I learned a lot about SQL. I also worked on another project where I had to use SQL queries to figure out who stole Harvard's rubber duck by using codes to look for clues in a database file containing witness reports, ATM records, security logs, and a lot more.

It was a very fun project but I won't write about it coz it will probably take me over 5 articles to completely cover it 😆 it was way more complicated than this one. For now, I have set SQL aside and started learning JavaScript, I can't wait to start building fun projects with it and writing about them.

Thanks for reading

Connect with me on:
Twitter: @kushyzeena
Readcash: @kushyzee

Lead image: Image by pressfoto on Freepik
Edited with Canva


0
0
0.000
9 comments
avatar

Congratulations @kushyzee! You have completed the following achievement on the Hive blockchain and have been rewarded with new badge(s):

You distributed more than 800 upvotes.
Your next target is to reach 900 upvotes.

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

To support your work, I also upvoted your post!

Check out the last post from @hivebuzz:

Our Hive Power Delegations to the August PUM Winners
Feedback from the September 1st Hive Power Up Day
Hive Power Up Month Challenge 2022-08 - Winners List
Support the HiveBuzz project. Vote for our proposal!
0
0
0.000
avatar

Thanks for your contribution to the STEMsocial community. Feel free to join us on discord to get to know the rest of us!

Please consider delegating to the @stemsocial account (85% of the curation rewards are returned).

You may also include @stemsocial as a beneficiary of the rewards of this post to get a stronger support. 
 

0
0
0.000
avatar

Gotta remember to keep this as a guide, I always know just enough SQL to solve the problem at hand then promptly forget it.

0
0
0.000
avatar
(Edited)

Don't we all 😆 most times it's not always about knowing the right syntax to solve a problem, but it's about knowing how to solve a problem and you can use that to search for the right syntax using Google. Thank you so much for reading my article, I appreciate it

0
0
0.000
avatar

The problem is when you needs to perform lots of JOIN commands lol it starts to grow big the query!
!1UP

0
0
0.000
avatar

That is exactly what happened in the last project I did. I joined over 5 tables and it got to a part where it became very confusing 😂 I could no longer keep up with the codes . Thank you for reading

0
0
0.000