RE: [ENG/ITA] Python & Hive: My Scripts are Ready! My First Project is Completed :)

avatar

I'm so tired I have no idea what I've done, but working on your suggestion I was able to improve the script, which is now much faster (around 3 times faster I think!).

This is the part I changed:

# return the eligible posts in a list
def get_post_list(url):
    with requests.Session() as session:
        last_hive_block_num = get_properties(url, session)[
            "last_irreversible_block_num"
        ]
        last_block_num = load_last_block()
        if last_block_num is None:
            last_block_num = last_hive_block_num
            save_last_block(last_block_num)
        print(
            f"working on {last_block_num}. Last hive block is {last_hive_block_num}"
        )

        post_list = []

        if last_block_num < last_hive_block_num:
            diff = int(last_hive_block_num) - int(last_block_num)
            max_iterations = 100
            iterations = min(diff, max_iterations)

            for _ in range(iterations):

                ops = get_ops_in_block(last_block_num, url, session)
                # Avoid naming the variable "list", which shadows the built-in type
                posts = get_post(ops)
                post_list.extend(posts)

                last_block_num += 1
                save_last_block(last_block_num)

        elif last_block_num == last_hive_block_num:
            print("All blocks are up-to-date, waiting for new blocks...")
            time.sleep(3)

        else:
            sleep = (last_block_num - last_hive_block_num) * 3
            print(f"Selected block is too high: sleeping for {sleep} seconds")
            time.sleep(sleep)

        return post_list

It could probably have been implemented better, but I'm still very happy with the improvement!

Thanks for your awesome suggestion :)

EDIT: made a slight improvement!

!PIZZA



avatar

You are now successfully processing multiple blocks in one go, instead of fetching them one-by-one and processing them separately. The minute-long pause sort of added insult to injury and made it look like the script made even less progress.

Btw. I like the idea of a dynamic sleep time based on the difference in block numbers! How'd you come up with that? I hate to admit I wouldn't have thought of something like that.

About your original script...

I had to add some print statements to see what it did though. I think seeing on-screen what the script does is helpful in many cases to understand what is going on.

Since I saw your script wrote logs, I also ran 'tail -f' to follow them, but it looks like I didn't catch any Italian posts this time, as nothing much got written to them.

I wrote a really long rant about going from the double conversion (Markdown to HTML, then HTML to plain text):

def convert_and_count_words(md_text):
    html = markdown.markdown(md_text)  # Convert Markdown to HTML
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text()  # Extract plain text from the HTML
    words = re.findall(r"\b\w+\b", text)  # Find all words in the text
    return len(words)

...to only using regex to strip out Markdown elements:

import re

# Convert markdown text to plain text and count the number of words
def convert_and_count_words(md_text):
    # Strip markdown syntax and convert directly to plain text
    # Removing common markdown elements like headers, lists, and formatting
    plain_text = re.sub(r'[#*\[\]\(\)`>]', '', md_text)
    
    # Find all words in the text
    words = re.findall(r"\b\w+\b", plain_text)
    
    return len(words)

...because it would've been faster and removed the need for extra libraries, but then I realized Hive posts quite often contain html tags. Even I use them quite often to make my posts look better. So maybe it's important to keep the double conversions after all.
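To make the HTML problem concrete, here is a minimal sketch using only the standard library. The sample text and the `strip_tags_then_count` helper are made up for illustration, and the tag-stripping regex is a crude stand-in, not a replacement for what BeautifulSoup does:

```python
import re

# Same character class as the regex-only version above
MARKDOWN_CHARS = r'[#*\[\]\(\)`>]'


def count_words_regex_only(md_text):
    # Strips Markdown punctuation only; HTML tag names survive as "words"
    plain_text = re.sub(MARKDOWN_CHARS, '', md_text)
    return len(re.findall(r"\b\w+\b", plain_text))


def strip_tags_then_count(md_text):
    # Hypothetical helper: remove anything that looks like an HTML tag first
    without_tags = re.sub(r"<[^>]+>", " ", md_text)
    plain_text = re.sub(MARKDOWN_CHARS, '', without_tags)
    return len(re.findall(r"\b\w+\b", plain_text))


sample = "<center>Hello **world**</center>"  # made-up post fragment
regex_only_count = count_words_regex_only(sample)
tag_aware_count = strip_tags_then_count(sample)
print(regex_only_count, tag_aware_count)
```

The regex-only count is inflated because "center" from the tags is counted as text, which is the kind of miscount the Markdown-to-HTML-to-text route avoids.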

So here I would've been completely off base again.

Oh, btw, my editor (I currently use Neovim) complained about this line in your code:

            if valid_language == False:
                continue

The warning was: "Avoid equality comparisons to 'False'; use if not valid_language: for false checks"

The problem is that you explicitly compared a boolean to False, and I guess it is not considered proper Python code (PEP 8 says not to compare boolean values to True or False using ==). So the corrected code is like the warning said:

            if not valid_language:
                continue

Well... I think I might have to go through the other script too. This one was fun.

avatar

How'd you come up with that?

My biggest concern was that the script would start checking Hive blocks before they even existed, so I asked ChatGPT - yeah, I often ask it a lot of questions !LOL - if I could use the range function dynamically, switching between a default amount and a max amount, whichever was lower. It then suggested that I use "min()" to pick the lower of the two... this is how it went 😅
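The same decision can be sketched as a small standalone function that is easy to test without touching the network. `plan_next_step` is a hypothetical name, and the 100-block cap and 3-second interval are simply the values used in the snippet at the top of the thread:

```python
def plan_next_step(last_block_num, head_block_num, max_iterations=100, block_interval=3):
    """Return (blocks_to_process, seconds_to_sleep) for one pass of the loop."""
    diff = head_block_num - last_block_num
    if diff > 0:
        # Behind the chain: catch up, but never more than max_iterations at once
        return min(diff, max_iterations), 0
    if diff == 0:
        # Up to date: wait roughly one block interval
        return 0, block_interval
    # Ahead of the chain: wait until the missing blocks should exist
    return 0, -diff * block_interval


print(plan_next_step(100, 350))
print(plan_next_step(100, 130))
print(plan_next_step(100, 100))
print(plan_next_step(105, 100))
```

Keeping the arithmetic apart from the requests and file writes is just one possible design; it makes the "too high" branch checkable with made-up block numbers.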

it looks like I didn't catch any Italian posts

Totally possible! For testing purposes I set the language to "en", reduce the minimum word count to just a few, and often switch from posts to comments.

Hive posts quite often contain html tags

I came up with the solution because of that, as I soon realized I was counting a lot of strings that weren't words, so I looked for a solution... and Beautiful Soup is such a cool name ahahah

and I guess it is not considered proper Python code

Well, I'm still happy that the rest is considered Python ahahah

Thanks for this correction! It makes total sense and I have no idea why I keep forgetting things I know... ahhhh it's exactly as you said in the other comment: code is an illusion!

avatar

ChatGPT seems to be a very good teacher, although quite often it just slams me with the complete solution and doesn't let me think for myself. I don't know what to think about that. But I guess it does sometimes come up with quite elegant solutions. A friend of mine once said that with ChatGPT, you are not really chatting with a computer, but with the rest of the world. It's like you are chatting with the knowledge of everyone, so that's why it comes up with interesting solutions too.

Oh, I will try changing the language to see whether I can catch any English posts with the script, yes that might be a workable idea! :) Btw. the script needs some error-checking:

Lastest block 89122762 written to the file.
Last block file exists.
Lastest block read from the file: 89122762
Lastest block 89122763 written to the file.
Last block file exists.
Lastest block read from the file: 89122763
Traceback (most recent call last):
  File "/home/ambience/Projects/Code/Python/Hive/Arc7icWolf/venv/lib/python3.12/site-packages/requests/models.py", line 974, in json
    return complexjson.loads(self.text, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ambience/Projects/Code/Python/Hive/Arc7icWolf/oma_fetch_italian_posts.py", line 193, in <module>
    main()
  File "/home/ambience/Projects/Code/Python/Hive/Arc7icWolf/oma_fetch_italian_posts.py", line 172, in main
    post_list = get_post_list(url)
                ^^^^^^^^^^^^^^^^^^
  File "/home/ambience/Projects/Code/Python/Hive/Arc7icWolf/oma_fetch_italian_posts.py", line 145, in get_post_list
    ops = get_ops_in_block(last_block_num, url, session)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ambience/Projects/Code/Python/Hive/Arc7icWolf/oma_fetch_italian_posts.py", line 72, in get_ops_in_block
    ops_in_block = response.json()["result"]
                   ^^^^^^^^^^^^^^^
  File "/home/ambience/Projects/Code/Python/Hive/Arc7icWolf/venv/lib/python3.12/site-packages/requests/models.py", line 978, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Since I have no idea what this error is trying to tell me, I asked ChatGPT, and since its answer was quite long, I'll condense it here:

The error json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) occurs when the response received from the API is empty or not valid JSON.

I guess that for one reason or another, the server borked the output, and the script didn't quite know what to do.

These were the suggested ways to fix it:

  • Check for Empty Responses: Modify your get_ops_in_block function to handle the case where the response is empty or not JSON.
  • Add Retry Mechanism: Sometimes the issue can be temporary, and retrying after a small delay can resolve it.
  • Log the Response: Add logging to see exactly what the response contains when it fails. This could help you pinpoint whether it’s an HTML error page, an empty response, or something else.
  • Use a Different API Node: Sometimes, API nodes can experience downtime or performance issues. Try using a different Hive API node.
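The first three suggestions could be combined roughly like this. It is only a sketch under assumptions: `parse_result` and `fetch_with_retry` are hypothetical helper names, the retry count and delay are arbitrary, and the fake responses are invented to simulate a flaky node rather than copied from a real one:

```python
import json
import time


def parse_result(response_text):
    """Return the "result" field, or None if the body is empty or not valid JSON."""
    if not response_text or not response_text.strip():
        return None
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("result")


def fetch_with_retry(fetch_text, retries=3, delay=1.0):
    """Call fetch_text() until it yields a usable result or the retries run out."""
    for attempt in range(1, retries + 1):
        text = fetch_text()
        result = parse_result(text)
        if result is not None:
            return result
        # Log the start of the bad response so it can be inspected later
        print(f"Attempt {attempt}/{retries} failed, response began: {(text or '')[:80]!r}")
        if attempt < retries:
            time.sleep(delay)
    return None


# Simulated node: empty body, then an HTML error page, then valid JSON
fake_responses = iter(["", "<html>502 Bad Gateway</html>", '{"result": [1, 2, 3]}'])
result = fetch_with_retry(lambda: next(fake_responses), retries=3, delay=0)
print(result)
```

In the actual script, `fetch_text` would presumably wrap the existing request and return `response.text`, and the caller would need to decide what to do when `None` comes back, for example skipping the save of the block number or switching to another node.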

In case you want to be adventurous, check my script for error handling. If you get lost, ask ChatGPT and it will come up with an excellent example that's the complete answer and leaves nothing to actually learn. ;)

Personally, if I am being adventurous, I like to pose my questions in a way that forces the bot to give a detailed explanation of how to fix the problem without showing me any code. Usually I just get frustrated and cheat again. But maybe, just maybe it is a good idea to at least try.

Cheers!

!BEER

avatar

I like to pose my questions in a way that forces the bot to give a detailed explanation of how to fix the problem without showing me any code

Ahahah love it! I noticed the same problem with ChatGPT, hence why I usually try to avoid it... unless I'm so clueless that I have no idea what to do 😅 and this happens too often lately...

Btw. the script needs some error-checking

I saw you inserted a lot of error-checking in your script, so, if you don't mind, I'll take some inspiration from yours :)

There's plenty of room for improvement in my script and I only have to find the will to start working on it again 🤣
