Using hiveSQL to find content


I was asked recently about using @hiveSQL as a tool to find 'good posts' to curate, and so at the risk of facing the wrath of @arcange (who may appear in the comments to tell me my query is killing the database), here is a sample query with explanations of the code.




First you will need some login details for HiveSQL. Send 0.01 HBD to @hivesql and wait for an encrypted memo. I use PeakD and Hive Keychain, and so there will be a padlock next to the memo to decrypt it.


[Screenshot: Reaz wants dataaa, hiveSQL obliges]


[Screenshot: Click the padlock to find out what's inside!]

Next you will need a tool to connect to the database. I use LINQPad, which is free for what we require: https://www.linqpad.net/Download.aspx

Once it is downloaded and installed, you can use the details sent to you via the encrypted memo to connect.


[Screenshot: Check 'Remember this connection' at the bottom of this window to save some time]


Now you are connected, it's time to pull some data. But first, a quick run-through of the buttons you'll need. From left to right:

[Screenshot: the LINQPad toolbar]

Run your query, Stop your query, Rich Text results, and Tabular Results. Also, make sure your Language is set to SQL.

And a note before we start - it is not a good idea to select everything from a table. If your query has been running for more than a minute or two, it might be an idea to stop it and have a rethink. It is also not good to use 'LIKE' statements to scan lots of text. I have done that here, but only on a small set of data - there is likely a better way!
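For example, if you just want to poke around a table, something like this is far kinder to the server than a bare SELECT * - a minimal sketch, limiting both the row count and the time window:

    SELECT TOP 10 *
    FROM Comments
    WHERE created > DATEADD(hour, -1, GETUTCDATE())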


Let's start with the request: find content created in the last hour, by authors with a reputation between 30 and 60, where the post has a minimum of 500 words and a reward of less than $1.

This is a fairly basic template, and so I've added some more filters in. The full query is below but I want to break it up to explain various parts.


SELECT
    author, A.reputation_ui,'https://hive.blog/'+ url, 
    left(body,500),
    pending_payout_value, 
    C.created, 
    body_length as CharacterCount

This is what will hopefully be returned by your query: the author, their reputation, the URL of the post (with your choice of condenser appended to the front), the first 500 characters of text (for an idea of the content), the pending payout, when it was created, and the number of characters in the content. A word count is not available, and so dividing that character count by 4 or 5 should give a rough idea.
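If you would like that rough word count as a column rather than doing the division by eye, one extra line in the SELECT list does it - a sketch, assuming an average of 5 characters per word:

    body_length / 5 as ApproxWordCount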


FROM 
    Comments C
INNER JOIN Accounts A on C.author=A.name
WHERE

The Comments table is where 'posts' and comments/replies are kept. They are treated the same by the blockchain and SQL database, but are distinguishable in the DB using a couple of methods (one is discussed further below). For the author reputation, a join is made to the Accounts table. This is not so important for today but for those who care, the 'C' and the 'A' are aliases given to these tables when referencing them and the columns they hold.

'WHERE'.... stuff ='s other stuff.


    (
    NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"cn"%') 
    AND NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"ru"%')
    AND NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"kr"%') 
    )

This is used to filter out tags/topics. I don't know these languages and so wish to exclude them from the results. $.tags is a section of C.json_metadata and so wildcard searching, although painful, is less of an issue. There is likely a better way to do this, and I hope someone can chip in.
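One possible alternative - a sketch only, assuming $.tags is a JSON array of strings and that OPENJSON is available on the server (an ISJSON check may also be needed for posts with malformed metadata) - is to expand the array and test the values directly instead of wildcard-matching the raw text:

    AND NOT EXISTS (
        SELECT 1
        FROM OPENJSON(C.json_metadata, '$.tags') t
        WHERE t.[value] IN ('cn', 'ru', 'kr')
        )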


AND NOT category IN ('porn','tits','arse','fanny')

The category is the first tag and fortunately has its own column. You can use this to exclude tribes ('hive-XXXXX') or other topics you'd rather not see in the results.
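If you would rather drop every community/tribe post in one go instead of listing them, a pattern on the category should work too - a hedged suggestion, relying on community categories following the 'hive-' naming pattern:

    AND NOT category LIKE 'hive-%'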


AND depth = 0

This ensures that only top-level comments (posts to you and me) are returned, and its use will save time and database pain.
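If you ever want the opposite - replies rather than posts - flipping the condition is all it takes, for example:

    AND depth > 0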


AND NOT author IN ('abh12345','add_spammer_here')

You know those authors who you are sick of seeing each day? You can exclude them here.


AND NOT active_votes LIKE('%"spaminator"%')

Another part of the query to make arcange cry: a wildcard search through the voters so far. In this case, if 'spaminator' has given the thumbs down to the content, it won't appear in the results.
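If you wanted to avoid the wildcard here as well, the same expand-the-JSON idea could be applied - a sketch, assuming active_votes holds a JSON array of vote objects, each with a 'voter' field:

    AND NOT EXISTS (
        SELECT 1
        FROM OPENJSON(C.active_votes) v
        WHERE JSON_VALUE(v.[value], '$.voter') = 'spaminator'
        )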


AND datediff(minute, C.created, GETUTCDATE()) < 60*4

This compares the creation date of the post to the current UTC time, takes the difference in minutes, and compares it against 60*4 - 240 minutes, i.e. 4 hours. I ran the query with the suggested 1-hour window and 0 results were returned, damn!
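For the original one-hour window, the same line simply needs a smaller limit:

    AND datediff(minute, C.created, GETUTCDATE()) < 60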


AND A.reputation_ui BETWEEN 30 AND 60

Joining to the Accounts table and using 'A' as the alias, we can check the author reputation. In this case, the reputation will need to be between..... 30 and 60 to appear in the results.


AND pending_payout_value > 0.0100
AND pending_payout_value < 3.0000 

This could also be written as a 'BETWEEN' statement; it checks the current pending payout of the post.
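As a BETWEEN it would look like this (note that BETWEEN is inclusive at both ends, so it is not quite identical to the > and < pair):

    AND pending_payout_value BETWEEN 0.0100 AND 3.0000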


AND body_length  > 4000

The post must be at least 4000 characters in length, which if we use 4.79 characters as the average word length, is about 835 words. Fairly chunky for a post these days.


ORDER BY created desc

And finally, the way to order the results. I have used the date with the newest post first.
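If you would rather surface the least-rewarded posts first, swapping the column is all it takes, for example:

    ORDER BY pending_payout_value asc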


And here is the query in full:

SELECT
    author, A.reputation_ui,'https://hive.blog/'+ url, 
    left(body,500),
    pending_payout_value, 
    C.created, 
    body_length as CharacterCount
FROM 
    Comments C
INNER JOIN Accounts A on C.author=A.name
WHERE
    (
    NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"cn"%') 
    AND NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"ru"%')
    AND NOT JSON_QUERY(C.json_metadata,'$.tags') LIKE('%"kr"%') 
    )
AND NOT category IN ('hive','porn')
AND depth = 0
AND NOT author IN ('abh12345','add_spammer_here')
AND NOT active_votes LIKE('%"spaminator"%')
AND datediff(minute, C.created, GETUTCDATE()) < 60*4
AND A.reputation_ui BETWEEN 30 AND 60
AND pending_payout_value > 0.0100
AND pending_payout_value < 3.0000 
AND body_length > 4000
ORDER BY created desc

The query above has taken less than a second to return a set of results, which has been between 8 and 16 items. To me, that doesn't sound like a lot of posts for the past 4 hours, but obviously that is down to the criteria chosen, and these can of course be changed.

And change you will need to, unless you speak Spanish or Polish. Although I did find this one:

https://peakd.com/@gubbatv/posts

And I'm not convinced it is her so I sent a short message and will comment on the post if I find out more.

Edit: Confirmed to be the owner of https://gubbatv.com/

Anyway, I hope this helps @sanjeevm, and anyone else who is interested. If you plan to have a go with the above, check the post again later to see if arcange has commented with some nicer script to replace the wildcard 'LIKE' searches.

Cheers,

Ash

Posted with STEMGeeks




I really ought to play with this. I have done plenty of SQL in the past, so I should be able to get into it.


Nice, then it's just a matter of getting used to the views and what data is where :)


I've logged into the database, but don't see the Hive stuff, just those that define the standard system. I am more used to Oracle than SQL Server, but the principles should be the same.

I'm using a thing called DBeaver on Linux.

Update: Aha! Needed to specify the schema


Glad you found it, because I wouldn't be able to help much I think!


I've not used this tool before. For my Brit List I used Python and beem. SQL may be better for some other things. It's certainly quick. Need to think about what queries I can use. Could investigate some abusers. There's one I know of who is creating lots of accounts. Could check if any have rewards.


I am tempted myself. I used to write loads of queries on my old job and still occasionally do the odd one to keep my hand in.

Simpler times!!

That DB is gonna take a pounding if everyone has a go!! :0)


Yeah it will, which is why I tried to make the faults clear and gave something nice and speedy!


You did. It's a really good example to kick off from!!


Cheers :)

And now you can find out cool stuff like how many times downvoted KC :D :O


The downvote counts!! Actually, as soon as I get some time free at work I'm gonna go for it!


So am I responsible for the sin 😁


Hehe, it looks like it!! Heads will roll 😃😃


First @geekgirl is making posts about using @hivesql, now you?
I really have to get to it!

Happy to see what you can come up with in the mean time.


Thought I'd follow the trend, I've been using the thing for 3 years :)


That sounds very complicated. Haha. It would be better if Hive just worked the same way Leo does as far as curation goes.


Great! I used to use that @curie tool but it only works on STEEM, I could use a replacement.

EDIT: Got it working.. I will have a mess around with your code and do some custom stuff next!


Well this one is a fair base template I think - message me on Discord if you have any queries.


It's great, got it working... and it's fast.


Nice!

Try and keep the criteria tight and with few free text 'wildcard' searches - arcange will kick my ass otherwise!


Haha, only ran it twice... @steevc might be doing more than me, he's also got it all working.


I'm exploring some of the tables. Lots of potential uses. I find enough to vote on anyway.


Congrats, you were upvoted from this account because you were in Top 25 engagers yesterday on STEMGeeks .
You made a total of 11 comments and talked to 8 different authors .
For more details about this project please read here - link to announcement post
You can also delegate and get weekly payouts.


I have a replacement almost ready. Will deploy the App soon.


I could get into this.., haven't done any coding for a while, but I'm a geek at heart!


Is @curie dead on Hive?


No mate, very much alive but more subdued than before. The witness account is hanging around the top 30 and I curate daily for them (if I can find anything worthy!).


Good to hear, it has a witness vote from me. However, yesterday, when I went to the page, I saw it was referring to Steem. At least let's update that. Also, the certificate is not valid; the browser pops up a warning.

Curie is a meritocratic, voluntary organization which supports the development and growth of new and existing authors on the Steem blockchain. We have multiple methods of growth, which include but not limited to: direct curation, supporting curation communities, and supporting community projects and development of applications which use the Steem blockchain.


I will ask the gods for you, it doesn't look good.. I have to admit. I haven't been on the website in a couple of years.


Nice! I was working on the exact same thing yesterday. I will deploy searching posts as an App today or tomorrow. It’s almost done. Just having issues displaying body content properly.


Sounds good :) I think people are looking for something simple to use as a search.


I was all excited for a minute there and then you got to the example code. 😂


Congratulations @abh12345! You have completed the following achievement on the Hive blockchain and have been rewarded with new badge(s) :

You received more than 66000 HP as payout for your posts and comments.
Your next payout target is 68000 HP.
The unit is Hive Power equivalent because your rewards can be split into HP and HBD

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

Check out the last post from @hivebuzz:

Hive Tour Update - Advanced posting
Valentine's day challenge - Give a badge to your beloved!

I was following up to about the top third, but I think I concede defeat in SQL! The queries we (you) made for HMRC will be my limit, this is next-level wizardry!


Looking at different views/tables here with more filter criteria but if you know what the columns mean it's not so complicated. Glad you got started with a practical example though!


Yes, I think it's just a case of knowing the names of the columns that are used to select the data - but you can find that just by looking into the views in that SQL software you shared. Pretty happy we went through what we needed to - now we just need HIVE designed fountain pens and we'd be sorted 😃


now we just need HIVE designed fountain pens and we'd be sorted

Don't look at me as the seller, I cannot be arsed accounting for another income stream!


Finally...... 👌 I am going to explore more. I use Squirrel SQL Client, which is also free - do you think a detailed post on how to use that would help?


The tool isn't so important, I only use the buttons I mentioned and haven't used Squirrel. I hope you find the example useful!


Well, I got as far as downloading the program, and then two updates for the program to work, then was actually able to connect to the database. Now all I need to do is figure out how to get stuff out; that will be another day. I need to work on my spreadsheet a little bit and figure out how to use the database to populate it with just the stuff I want.

Such as, from my account only: rewards for the last seven days, how to get them for the entire month, and how much since the beginning of Hive.


Each month has a different number of days, so I really don't want just the 30-day results but the whole-month results. So a sample query might help me, if you can work one up for me. I have a little bit of HBD saved (3.793) that I haven't converted lately, and some non-powered-up Hive (40). Give me a price - I know I don't have much, but I can scrimp and save.

There is a lot of information available from him.


First, no need to send me anything :)

select CONVERT(DATE,timestamp) as date, sum(hbd_payout) as amount, 'SBD' as currency from VOAuthorRewards where author = 'bashadow'
and (timestamp < '03/20/2020 15:30:00')
group by CONVERT(DATE,timestamp)
union
select CONVERT(DATE,timestamp), sum(hive_payout) as amount, 'STEEM' as currency from VOAuthorRewards where author = 'bashadow'
and (timestamp < '03/20/2020 15:30:00')
group by CONVERT(DATE,timestamp)
union
select CONVERT(DATE,timestamp), sum(hbd_payout) as amount, 'HBD' as currency from VOAuthorRewards where author = 'bashadow'
and (timestamp > '03/20/2020 15:29:59')
group by CONVERT(DATE,timestamp)
union
select CONVERT(DATE,timestamp), sum(hive_payout) as amount, 'HIVE' as currency from VOAuthorRewards where author = 'bashadow'
and (timestamp > '03/20/2020 15:29:59')
group by CONVERT(DATE,timestamp)
order by CONVERT(DATE,timestamp) asc

These are your author rewards (liquid only) for Steem up to the 20th of March 15:30, and for Hive from that point onwards.

They are grouped into days. If you pivot table this in Excel then you should be able to choose between weeks/quarters/years.

If you want a more accurate rewards total, it might be an idea to use sum(hive_payout)*2 (*2 on each) to account for vested rewards.

Curation, as it's only given in Vests, is tougher to find a value for as Vests per STEEM/HIVE is always rising.

select CONVERT(DATE,timestamp) as 'date', sum(reward) as Vests from VOCurationRewards where curator = 'bashadow'
group by CONVERT(DATE,timestamp)
order by CONVERT(DATE,timestamp) asc

You can find the current Vests per Hive on hivetasks.com under the Market tab.


It was around 480 at the end of 2017, so I've taken the middle ground:

select CONVERT(DATE,timestamp) as 'date', sum(reward) as vests, sum(reward)/1000000*500 as 'SP/HP guess' from VOCurationRewards where curator = 'bashadow'
group by CONVERT(DATE,timestamp)
order by CONVERT(DATE,timestamp) asc

Something to get you started I hope, enjoy!


Thank you. I first tried the newest version of LINQPad 6, then had to go back to LINQPad 5 because I got some error, but the Version 5 worked right off the bat.

It even took a little bit to remember to pick SQL as the language, so thank you, that will be a big help. Yeah, getting a good conversion from vests is not too easy a thing I guess, like converting the HBD to Hive - ever-changing prices.

That will help me a lot to begin my exploration, I did not want to worry about bogging the system down with comparing lots of accounts, just interested in my own. Now time to learn more about making the queries.


Heck yeah, this is super neat. Always good to help people find stuff.


This is used to filter out tags/topics. I don't know these languages and so wish to exclude them from the results. $.tags is a section of C.json_metadata and so wildcard searching, although painful, is less of an issue. There is likely a better way to do this, and I hope someone can chip in.

The way I do it is expand the json out and use it as a join and cross apply, kind of a pain in the ass but works good. Every query that uses this is a brain teaser.


Yeah that sounds nasty, but probably better/quicker. Got an example? :D

SELECT Json_value(voters.voters, '$.voter')   AS voter,
       Json_value(voters.voters, '$.time')    AS time,
       Json_value(voters.voters, '$.rshares') AS rshares
FROM   (SELECT votes.[value] AS voters,
               c.author,
               c.permlink
        FROM   comments c
               CROSS APPLY Openjson(c.active_votes) AS votes
        WHERE  c.author = 'themarkymark'
               AND c.permlink = '315-pairs-of-socks-have-a-marketcap-of-usd28-million-dollars') AS voters

I like it, cheers!


Congrats, you were upvoted from this account because you were in Top 25 engagers yesterday on STEMGeeks .
You made a total of 3 comments and talked to 3 different authors .
Your rank is 5 .
For more details about this project please read here - link to announcement post
You can also delegate and get weekly payouts.


This is a really great post on how to use HiveSQL, but...

Fear my wrath! You made mistakes again! 👿

SELECT
    ...
    LEN(body) as CharacterCount

It is a useless waste of resources to recompute the body length for each comment, every time you send your query. Have you ever looked at the columns of the table? There is a body_length column with that value precomputed!

AND datediff(minute, C.created, GETDATE()) < 60*4

Another noobish error: all dates and times stored in the blockchain (and in HiveSQL) are UTC based. Therefore, you need to use GETUTCDATE() when comparing dates.

AND NOT active_votes LIKE('%"spaminator"%')

Contrary to what you think, I have nothing against this part of your WHERE clause.
Of course, this remark is only valid because your WHERE clause contains enough other conditions that will use the indexes before the additional filtering with the LIKE is performed.

AND LEN(body) > 4000

Don't tell me you did it twice 🤦‍♀

Desperate I feel


Mistakes, or opportunities for you to improve my work :)

Thank you for running HiveSQL, I shall correct my wrongdoings and bow to YodaSQL on my departure.


Congrats, you were upvoted from this account because you were in Top 25 engagers yesterday on STEMGeeks .
You made a total of 4 comments and talked to 4 different authors .
Your rank is 1 .
For more details about this project please read here - link to announcement post
You can also delegate and get weekly payouts.


Amazing.
I always use Python to connect to HiveSQL.

Just curious, if you want to retrieve all the comments for the past 30 days, can I know how much time it takes?


Congrats, you were upvoted from this account because you were in Top 25 engagers yesterday on STEMGeeks .
You made a total of 3 comments and talked to 2 different authors .
Your rank is 3 .
For more details about this project please read here - link to announcement post
You can also delegate and get weekly payouts.


Great post! Thank you for the illustration! Wondering if it is possible to check power-down and power-up metrics using HiveSQL...


Yes I think that is there.

Power ups

select * from txtransfers where [from] = 'drniche'
and type = 'transfer_to_vesting'
order by timestamp asc

Power downs are in Vests, and so the HIVE amount is only approximate (as the number of VESTS per HIVE is constantly increasing).

select *, vesting_shares*525/10000000 as approx_HIVE from TxWithdraws
where account = 'abh12345' and vesting_shares <> '0.000000'
order by timestamp asc

From the Market Info tab on https://hivetasks.com/@drniche



This is wonderful!! Thank you so much! I will try it out in the database tonight.


Awesome post! Thank you for sharing this information. It was easy to follow and I was able to query after about 15 minutes. My only comment is that when I transferred 0.01 HBD to @hiveSQL, I received an error, so I needed to change the name to @hivesql.

I didn't realize the HiveSQL database is free. That's so cool. Do you know who is funding it to be free? Also, do you know if I can query Splinterlands DEC transfers with this database? I am able to query this information through the Splinterlands API; however, the transfers data is limited to a 30-day window. I do not know of another way to obtain transfers that are older than 30 days. Do you?


Ah cool, thanks for checking it out and for that correction.

I didn't realize the HiveSQL database is free. That's so cool. Do you know who is funding it to be free?

It didn't use to be free - I paid 40 SBD/HBD a month for years - but:

https://peakd.com/me/proposals - see about 7 down.

Also, do you know if I can query Splinterlands DEC transfers with this database?

This might be in TxCustoms, but I think it is on the sidechain. I don't deal with that one but perhaps @dalz or @gerber could be so kind to post a guide or a comment here :)


I appreciate the guidance and for reaching out for more help.


SELECT
timestamp,
json_metadata

FROM Txcustoms
WHERE 
CONVERT(DATE,timestamp) BETWEEN '2020-12-01' AND '2021-02-28'
AND [tid] in ('ssc-mainnet-hive')
AND [json_metadata] like ('%"DEC"%')
AND [json_metadata] like ('%"null"%')

I use this query for DEC transfers to null ... you need to parse the json data as well


Ah nice, so you could do something like this then...

SELECT top 10 
timestamp,
json_value(json_metadata, '$.contractPayload.symbol') as symbol,
json_value(json_metadata, '$.contractPayload.quantity') as quantity,
json_metadata
FROM Txcustoms
WHERE 
CONVERT(DATE,timestamp) BETWEEN '2021-01-01' AND '2021-02-28'
AND [tid] in ('ssc-mainnet-hive')
AND json_value(json_metadata, '$.contractPayload.symbol') = 'DEC'
AND json_value(json_metadata, '$.contractPayload.to')  = 'null'

That will do it :)
Testing it soon!


Thanks abh12345, you and @dalz really came through for me today. I like how you parsed some of the json string.
FYI - when we use [tid] = ('ssc-mainnet-hive'), we get transfers on the Hive engine.
When we use [tid] = ('sm_token_transfer'), we get in-game transfers.


Good to know :)

I've not looked at this table much, mainly out of fear of its size and the json parsing, but I think I'll pay a bit more attention to it in future. Thanks for opening up the conversation, and cheers dalz!


Thanks so much for your prompt reply. Your query sample was very helpful. I was able to update the where clause to obtain exactly what I needed. I had no clue all this data was available all this time. This is very cool.


(This is for @arcange to continue from comment left at https://ecency.com/hive-168869/@arcange/re-1657869089260.)


@hivesql sent me the encrypted memo (along with a refund of 0.001 HBD).

Hive Keychain is required to view the encrypted memo, so for memo purposes I switch to PeakD before I launch Keychain.

Keychain doesn't know about my Memo Key, so I need to import it; however, when I go to import the Memo Key I get this error message from Keychain:

This is not a private WIFI.

I'm using a mobile hotspot, so I don't understand why I need to use WiFi of any kind. So at the moment I'm at a dead end. I don't know how to proceed from here.


I don't think it's a good idea to comment on an unrelated post.

Your problem is related to Keychain, not HiveSQL.
If you read the error message carefully, you will see that it says "This is not a private WIF" (and not WIFI), which means that the encoded private key is not valid.

Support for Keychain is provided on their Discord
Support for HiveSQL is also provided on Discord
