Getting HiveSQL to Work with Python on Linux
One of the most helpful projects to come out of the Hive developer community, in my opinion, is the Microsoft SQL Server database that mirrors the Hive blockchain.
Having the ability to easily slice, sort, filter, and dice the information is a lifesaver, and it makes queries much quicker too!
Unfortunately for us open source freaks, Microsoft SQL Server is not baked into our operating systems, so we have to go to a little more effort.
Partly for my own benefit, here is how to get your connection working on Ubuntu.
PyODBC for Python
We will be using PyODBC. If you are on Windows, congratulations: you just need the routine `pip3 install pyodbc`. On Linux (or even Mac) you might need to give it a bit more help in the form of the ODBC drivers.
ODBC SQL Drivers
Open Database Connectivity (ODBC) is an application programming interface (API) that allows applications to access data from multiple database management systems (DBMSs).
You might wonder what ODBC is, and why Linux and macOS do not ship with it.
Doesn't it have "Open" in the name?
ODBC drivers do exist on Linux, macOS, and a variety of UNIX platforms. The API became the basis for the Call Level Interface (CLI) definitions from the SQL Access Group, still in popular use in the mini and mainframe worlds. The vast majority of ODBC applications and drivers come from companies other than Microsoft.
ODBC was originally created by Microsoft and Simba Technologies, and Microsoft provides drivers for Windows 95 and later, but anyone can write ODBC applications or CLI drivers.
Here is a Bash script for installing the ODBC driver from the shell and accepting the license terms.
Ubuntu
```bash
if ! [[ "18.04 20.04 22.04 24.04" == *"$(grep VERSION_ID /etc/os-release | cut -d '"' -f 2)"* ]];
then
    echo "Ubuntu $(grep VERSION_ID /etc/os-release | cut -d '"' -f 2) is not currently supported.";
    exit;
fi

# Download the package to configure the Microsoft repo
curl -sSL -O https://packages.microsoft.com/config/ubuntu/$(grep VERSION_ID /etc/os-release | cut -d '"' -f 2)/packages-microsoft-prod.deb
# Install the package
sudo dpkg -i packages-microsoft-prod.deb
# Delete the file
rm packages-microsoft-prod.deb

# Install the driver
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18
# optional: for bcp and sqlcmd
sudo ACCEPT_EULA=Y apt-get install -y mssql-tools18
echo 'export PATH="$PATH:/opt/mssql-tools18/bin"' >> ~/.bashrc
source ~/.bashrc
# optional: for unixODBC development headers
sudo apt-get install -y unixodbc-dev
```
macOS

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release
brew update
HOMEBREW_ACCEPT_EULA=Y brew install msodbcsql18 mssql-tools18
```
HiveSQL
To test it out you will need your HiveSQL password details.
If you haven't already, register with the instructions here.
You will get back a memo that contains the hostname to connect to, database name, your login and your password.
The easiest way to keep these details out of your Python code and away from prying eyes is to set them as environment variables using `export` in your bash profile or a shell script.
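For instance, you could add lines like these to your `~/.bashrc` (the values below are placeholders; use the ones from your HiveSQL memo):

```shell
# Placeholder values - substitute the details from your HiveSQL registration memo
export DBServer="your-hivesql-hostname"
export Database="your-database-name"
export DBLogin="your-login"
export DBPassword="your-password"
```

Remember to open a new shell (or `source ~/.bashrc`) so the variables are visible to your script.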
For example, here is how to get the most recent 10 posts for a tag/category, e.g. #retro:

```python
import os
import sys

import pyodbc


# Make a connection to the server
def get_connection():
    db_host = os.environ['DBServer']
    db_name = os.environ['Database']
    db_user = os.environ['DBLogin']
    db_password = os.environ['DBPassword']
    connection = pyodbc.connect(
        driver='{ODBC Driver 18 for SQL Server}',
        host=db_host,
        database=db_name,
        user=db_user,
        password=db_password,
    )
    return connection


# Get the 10 most recent posts for the category, e.g. 'retro'
def get_latest(connection, keyword):
    cursor = connection.cursor()
    # Use a parameterised query (the ?) rather than string formatting,
    # so the keyword cannot inject SQL
    sql_command = (
        "select top (10) id, title, category, "
        "created, '@' + author + '/' + permlink as link "
        "from Comments "
        "where depth = 0 and category = ? "
        "order by id desc;"
    )
    result = cursor.execute(sql_command, keyword)
    columns = [column[0] for column in cursor.description]
    results = []
    for row in result.fetchall():
        results.append(dict(zip(columns, row)))
    return results


# Get the category keyword from the command line
if len(sys.argv) > 1:
    keyword = sys.argv[1]
    print(f"Filtering by '{keyword}'")
    connection = get_connection()
    print(get_latest(connection, keyword))
else:
    print("Please supply the keyword")
    sys.exit()
```
Here's how it works:

Environment Variables for Database Connection:
The database connection details (host, database name, username, and password) are stored in environment variables and accessed using `os.environ`.
Database Connection:
The `get_connection()` function makes a connection to the SQL Server database using the `pyodbc` library and the details above.
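As an aside, `pyodbc.connect()` also accepts a single ODBC connection string instead of keyword arguments. Building one by hand makes it clear what the driver actually receives; the credential values below are placeholders, not real HiveSQL details:

```python
# Sketch: the same connection expressed as one ODBC connection string.
# In practice these would come from the environment variables above;
# the literal values here are placeholders.
db_host = "your-hivesql-hostname"
db_name = "your-database-name"
db_user = "your-login"
db_password = "your-password"

conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    f"SERVER={db_host};"
    f"DATABASE={db_name};"
    f"UID={db_user};"
    f"PWD={db_password}"
)
print(conn_str)
# Then: connection = pyodbc.connect(conn_str)
```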
Fetch Recent Posts:
The `get_latest(connection, keyword)` function accepts a database connection and a category keyword, then constructs an SQL query to select the 10 most recent posts (`TOP (10)`), filtering by the provided category.
This query grabs the id, title, category, and created date, and builds a link to each post.
Finally, the results are formatted as a list of dictionaries, where each dictionary represents a row with column names as keys.
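That row-to-dictionary step is worth seeing on its own. Here is a minimal sketch with made-up column names and rows standing in for the cursor's `description` and `fetchall()` results:

```python
# Simulated cursor output: pyodbc's cursor.description is a sequence of
# tuples whose first element is the column name.
description = [("id", None), ("title", None), ("category", None)]
rows = [
    (123, "My Retro Post", "retro"),
    (122, "Another Post", "retro"),
]

# Same technique as in get_latest(): pull out the column names,
# then zip each row against them to build a dict per row.
columns = [column[0] for column in description]
results = [dict(zip(columns, row)) for row in rows]
print(results[0])
# first row becomes {'id': 123, 'title': 'My Retro Post', 'category': 'retro'}
```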
Command-Line Parameter:
Rather than hard-coding a specific tag, the script checks whether a category keyword was provided as a command-line argument.
If a keyword is provided, it is passed to `get_latest()` and the results are printed to the terminal.
I've used @hivesql a lot as it's such an easy way to query Hive, and I have various scripts on GitHub. The performance is pretty good: with my #BritList I was using beem to fetch some details, but I changed it all to HiveSQL and it was many times quicker.