Python Script to Save All Your Posts as Markdown or HTML Files

I've always wanted to be able to save my Steem posts locally. Once they are on disk, better search tools are available than the ones we have at the blockchain level.

I have only just started poking around the development APIs for Steem, and this is the first script with a real purpose that I've written in Python. On top of that, I'm also kind of new to Ubuntu. :)

If you are a dev and have been doing this for a while, you can probably write a more efficient script.

I wasn't looking for efficiency when I wrote it; I wanted to learn, and maybe others who are also Python beginners or haven't tried coding against the Steem APIs can learn from it too. Hence the extensive comments.

Features and options:

  • saves all your markdown posts as .md files
  • saves all your raw HTML posts as .html files
  • you can set a main sub-directory or sub-path in the current directory where the files will be placed
  • posts will be placed in subdirectories based on the creation date (year-month) or primary tag - option to set at the beginning of the script
  • you can save the posts for any account
  • you can choose whether to save resteemed posts as well
  • you can choose whether to add the tags at the end of each post
  • title is automatically added as H1 at the beginning of the post
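
The two directory layouts from the list above can be sketched as a small function. This is only an illustration: `pick_subdir` is a hypothetical helper name, and the sample dictionary is made-up data that mimics just the `category` and `created` fields the script reads from each post.

```python
def pick_subdir(comment, dir_struct_option):
    """Return the relative subdirectory for a post (hypothetical helper)."""
    if dir_struct_option == 'primary-tag':
        # the primary tag is stored in the post's 'category' field
        return 'tags/' + comment['category']
    if dir_struct_option == 'year-month':
        # 'created' looks like '2019-11-05T14:03:21'; the first 7 characters are YYYY-MM
        return 'date/' + comment['created'][0:7]
    raise ValueError('Unknown dir_struct_option: ' + dir_struct_option)

# made-up sample data, not a real post
sample = {'category': 'python', 'created': '2019-11-05T14:03:21'}
print(pick_subdir(sample, 'primary-tag'))
print(pick_subdir(sample, 'year-month'))
```

Raising an error for an unknown option is a choice made for this sketch; it makes a typo in the setting fail early instead of later in the run.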

I've tested the script on Python 3.7.4, but I believe it should work on earlier Python 3 versions too. The script is written for Linux/Ubuntu; on Windows you will need to adapt the parts that handle paths and create directories.
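
For the Windows caveat, one option (not what the script below does) is to build paths with `pathlib` from the standard library, which uses the right separator for the operating system it runs on. A minimal sketch, where `build_post_path` and `ensure_parent_dir` are hypothetical helper names and the directory and file names are sample values:

```python
from pathlib import Path

def build_post_path(main_save_dir, subdir_name, filename):
    """Join the pieces into one path using the current OS's separator (hypothetical helper)."""
    # subdir_name may contain '/', e.g. 'date/2019-11'; split it so each part is joined portably
    return Path(main_save_dir).joinpath(*subdir_name.split('/'), filename)

def ensure_parent_dir(path):
    """Create the file's directory (and any missing parents) if it doesn't exist yet."""
    path.parent.mkdir(parents=True, exist_ok=True)

# sample values only; nothing is written to disk here
post_path = build_post_path('steem-posts-testuser123', 'date/2019-11', 'my-post.md')
print(post_path)
```

Calling `ensure_parent_dir(post_path)` before opening the file would replace the manual `os.makedirs` and `FileExistsError` handling.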

You will also need a good Markdown viewer/editor to see the saved files. I used Typora, but it looks like it will become paid software when it leaves beta, so a good free alternative would be nice.

So, here's the Python script. Note that the settings are hard-coded, so you'll have to change them manually.
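
If editing the script for every run gets tedious, the same settings could be read from the command line with `argparse` from the standard library. This is only a sketch of that idea and not part of the script; the option names are made up for illustration, and the defaults mirror the hard-coded values.

```python
import argparse

def parse_settings(argv=None):
    """Parse the script's settings from command-line arguments (illustrative option names)."""
    parser = argparse.ArgumentParser(description='Save Steem posts as local files.')
    parser.add_argument('--author', default='testuser123', help='account whose posts are saved')
    parser.add_argument('--dir-struct', choices=['primary-tag', 'year-month'], default='year-month',
                        help='how saved posts are grouped into subdirectories')
    parser.add_argument('--no-tags', action='store_true', help='do not append tags to saved posts')
    parser.add_argument('--include-resteems', action='store_true', help='also save resteemed posts')
    return parser.parse_args(argv)

# passing a list instead of reading sys.argv keeps the example self-contained
settings = parse_settings(['--author', 'testuser123', '--dir-struct', 'primary-tag'])
print(settings.author, settings.dir_struct, settings.no_tags, settings.include_resteems)
```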

While I'm far from a Python or Steem dev expert, if you have questions let me know.

Suggestions for improvement from more experienced devs are welcome as well. :)

import os
import sys
import json
from steem import Steem
s = Steem()

# script parameters
# =================

# author
author_name = 'testuser123'

# relative directory under which the posts will be saved (don't add a final "/"!)
main_save_dir = 'steem-posts-' + author_name

# structure of directories under which posts will be saved
# Options:
# primary-tag - posts are saved under their primary tag subdirectory
# year-month - posts are saved under the year-month of their creation date subdirectory
dir_struct_option = 'year-month'
print('Save posts by ' + dir_struct_option)

# bool flag to determine if tags are added at the end of the post or not
adding_tags_to_saved_post = True
print('Adding tags to the end of each post? ' + str(adding_tags_to_saved_post))

# bool flag to determine if to save resteemed posts of other authors as well
include_resteem_posts = False
print('Include resteemed posts? ' + str(include_resteem_posts))

# =====================
# end script parameters
#

#create main save directory (as a subdirectory or sub-path of the current directory)
try:
    os.makedirs(main_save_dir)
    print('Directory ' + main_save_dir + ' created in current directory ' + os.curdir)
except FileExistsError:
    print('Directory ' + main_save_dir + ' already exists in current directory ' + os.curdir)
except OSError:
    print('Directory ' + main_save_dir + ' couldn\'t be created in current directory ' + os.curdir)

#save current dir
cur_dir_saved = os.curdir

# loops through all the posts of the given author
# we break out of the loop after we reach the last post of the author
i = 1
while True:

    #retrieve current blog post info
    #theoretically we can retrieve more than one blog per call, in my tests anything more than 2 generated an error, so I preferred to take them one by one
    try:
        blogs = s.get_blog(author_name, i, 1)
    except Exception:
        print('Couldn\'t get blog #' + str(i) + '. Trying again. Ctrl+C to interrupt.')
        continue
    #is it empty? then we reached the end and we should break out of the loop
    if blogs == []: break

    #is it the author's post or a resteem?
    #if it's a resteem and resteems are not to be included, continue with the next iteration
    if blogs[0]['comment']['author'] != author_name:
        if not include_resteem_posts:
            print('Post #' + str(i) + ' author is ' + blogs[0]['comment']['author'] + '. Skipping it.')
            i += 1
            continue
        else:
            print('Post #' + str(i) + ' author is ' + blogs[0]['comment']['author'] + '. Including it.')

    #choose the name of the subdir where to place the saved posts
    #(i.e. posts can be saved by primary-tag or date [year-month])
    if dir_struct_option == 'primary-tag':
        subdir_name = 'tags/' + blogs[0]['comment']['category']
    elif dir_struct_option == 'year-month':
        subdir_name = 'date/' + blogs[0]['comment']['created'][0:7]
    else:
        #without this, a typo in the option would leave subdir_name undefined
        raise ValueError('Unknown dir_struct_option: ' + dir_struct_option)

    #build the full name of the subdir
    if cur_dir_saved == '.':
        dir_name = main_save_dir + '/' + subdir_name
    elif cur_dir_saved == '/':
        dir_name = cur_dir_saved + main_save_dir + '/' + subdir_name
    else:
        dir_name = cur_dir_saved + '/' + main_save_dir + '/' + subdir_name

    #create the subdirectory/ies where we will place our files
    try:
        os.makedirs(dir_name)
        print('Directory ' + dir_name + ' created.')
    except FileExistsError:
        pass
    except OSError:
        print('Directory ' + dir_name + ' couldn\'t be created.')
        #re-raise the original exception so its details are not lost
        raise

    #deserialize json_metadata
    #(an empty string would make json.loads raise, so treat it as an empty dict)
    json_metadata_str = blogs[0]['comment']['json_metadata']
    json_metadata_dict = json.loads(json_metadata_str) if json_metadata_str else {}

    #post_format instead of format, so we don't shadow the built-in format()
    try:
        post_format = json_metadata_dict['format']
    except KeyError:
        print('Broken blog json before format key. Defaulting to "markdown+html".')
        post_format = 'markdown+html'

    #is the post markdown?
    if post_format == 'markdown+html' or post_format == 'markdown':
        #choose the filename as the blog post's permlink + ".md" extension
        filename = blogs[0]['comment']['permlink'] + '.md'

        if adding_tags_to_saved_post:
            #get tags and create a string with them to add at the end of the post
            try:
                tags_str = '\n\n'
                for x in json_metadata_dict['tags']:
                    tags_str += '#' + x + ' '
            except KeyError:
                tags_str = ''
        else: tags_str = ''

        #get post body
        body = blogs[0]['comment']['body']

        #get post title
        title = blogs[0]['comment']['title']

        #format the body to also include title at the beginning as H1 and tags (with #) at the end
        body_with_title_and_tags = '# ' + title + '\n\n' + body + tags_str
    #or is the post raw html?
    else:
        #choose the filename as the blog post's permlink + ".html" extension
        filename = blogs[0]['comment']['permlink'] + '.html'

        if adding_tags_to_saved_post:
            #get tags and create a string with them to add at the end of the post
            try:
                tags_str = '\n\n'
                for x in json_metadata_dict['tags']:
                    tags_str += '<a id="' + x + '" href="#' + x + '">' + x + '</a> '
            except KeyError:
                tags_str = ''
        else: tags_str = ''

        #get post body
        body = blogs[0]['comment']['body']

        #get post title
        title = blogs[0]['comment']['title']

        #format the body to also include title at the beginning as H1 and tags (as links) at the end
        body_with_title_and_tags = '<h1>' + title + '</h1>\n\n' + body + tags_str

    #write post to file (overwrite if exists)
    #the with block closes the file even if the write fails
    try:
        with open(dir_name + '/' + filename, 'w', encoding='utf-8') as f:
            f.write(body_with_title_and_tags)
        print('Post #' + str(i) + ': ' + dir_name + '/' + filename + ' successfully saved.')
    except OSError:
        print('Something went wrong while attempting to write file ' + dir_name + '/' + filename)
        raise

    i += 1

print('No (more) posts.')
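
As written, the loop retries a failed `get_blog` call forever. One possible variation, sketched here under assumptions, is to cap the number of attempts and wait a little longer between them. `fetch_with_retry` and its parameters are hypothetical names; `fetch` stands for any zero-argument callable, such as `lambda: s.get_blog(author_name, i, 1)`, and the flaky function below only simulates a failing node.

```python
import time

def fetch_with_retry(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as err:
            if attempt == max_attempts:
                # give up and let the caller see the last error
                raise
            # double the waiting time after every failed attempt
            delay = base_delay * 2 ** (attempt - 1)
            print('Attempt ' + str(attempt) + ' failed (' + str(err) + '). Retrying in ' + str(delay) + 's.')
            sleep(delay)

# stand-in for a flaky API call: fails twice, then returns an empty list
calls = {'count': 0}
def flaky_fetch():
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionError('node unavailable')
    return []

result = fetch_with_retry(flaky_fetch, base_delay=0.01)
print(result)
```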

Update: I edited the post because the original had some errors introduced when I copy-pasted the code into HTML, which I hadn't tested initially.



10 comments

Why use the steem module and not the beem module? Many steem features are no longer up to date.
https://github.com/holgern/beem
beem is a bit more up to date. What I noticed, though, when the api.steemit.com site was down, is that even if you specify a different node it still relies on the steemit node, so when installing it from GitHub you first have to replace every api.steemit.com in the source code with a different API node you trust.

Also, I've been trying to write posts and upvote by sending API requests directly with the requests module, so that I can update my code more flexibly and not rely on another Steem user, but I haven't figured out yet how to correctly format the broadcast operation, and I haven't found anyone willing to help me yet...

But here is what I have, for example, to get the blog posts from your Steem account:

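import json
import requests

def query(node, data, tor):
    # POST a JSON-RPC payload to the node, optionally through a local Tor SOCKS proxy
    headers = {'Content-Type': 'application/json'}
    if not tor:
        return requests.post(node, headers=headers, data=data)
    session = requests.session()
    # SOCKS proxies need the optional extra: pip install requests[socks]
    # 9050 is the default SOCKS port of the standalone tor daemon; Tor Browser typically listens on 9150
    session.proxies = {'http': 'socks5://127.0.0.1:9050', 'https': 'socks5://127.0.0.1:9050'}
    # the proxies set on the session are used automatically, no extra argument is needed
    return session.post(node, headers=headers, data=data)

def get_blog(name, nod, tor, start, end):
    # params are [account, start_entry_id, limit], so "end" is really the number of entries requested
    # json.dumps takes care of quoting and escaping the account name
    querry = json.dumps({"jsonrpc": "2.0", "method": "condenser_api.get_blog", "params": [name, start, end], "id": 1})
    # return the first entry of the result list
    return dict(json.loads(query(nod, querry, tor).text)["result"][0])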

I haven't tested yet whether the Tor option works (and I see now that I made some mistakes), but with the Tor browser open, sending the traffic over localhost port 9050 would usually route it through Tor.

If someone were so kind as to help me out with how to correctly write a vote broadcast operation, I would be very grateful.

"why use steem module and not beem module? many steem features are no longer up to date."

I haven't seen Holger in a while. Will he or someone else keep updating beem? Not that there's anyone updating Steem APIs at Steemit, Inc. now.

You're already more experienced in Python and Steem/beem APIs than I am. Maybe you'll receive some guidance from someone who is even more experienced...

As a note, I use VS Code (because I'm a dev, I guess) with an extension to preview .md files as I write them (basically like writing a post with preview). There are probably similar free apps for this that aren't as massive as VS Code, though.

Yes, I used VS Code to write this Python script as well. I haven't tried it for .md files, but I will. Thanks for mentioning it.

Just checked: the extension I was using is Markdown Preview Enhanced, though it looks like there are a few others. No problem, nice script man!

Great, I'll check it out. Thanks again!
