PyAutoGUI - Automate Repetitive Tasks With Python



Python is a powerful programming language with a growing community and libraries that make a wide range of tasks easy. One of the use cases for Python is automating repetitive tasks. This makes us more efficient and frees up time for things that matter more. Libraries like Selenium help with automating browser-related tasks and web scraping. Libraries like pdfplumber help with extracting and processing data in PDF files.

There is also PyAutoGUI. As the name suggests, it helps with automating graphical user interfaces. In other words, it allows us to write code that automatically performs a series of mouse clicks and keyboard inputs. It could be a work-related task, some software we use for business, or maybe a game we play. If we can identify all the steps we perform manually, we can write them in our code and let the Python script imitate our actions.

Once code written with PyAutoGUI is executed, it takes over control of the mouse and the keyboard. Some operating systems may detect that and ask us to grant the proper permissions to allow this. It happened on my Mac. The first thing we want to learn is how to stop the script. Things may go wrong, and unexpected actions may be executed, so we need to know how to stop it. PyAutoGUI has a built-in fail-safe that is enabled by default. It is recommended not to disable it, although there is an option to do so. While the code is running, we can move the mouse to any of the four corners of the screen to stop it.
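A minimal sketch of how a script could handle the fail-safe. pyautogui.FAILSAFE, pyautogui.PAUSE, and pyautogui.FailSafeException are part of PyAutoGUI. The run_steps helper and its gui parameter are hypothetical names used for illustration, and the module is passed in as an argument only so the sketch can be tried with a stand-in object.

```python
def run_steps(gui, steps):
    """Run each step; return False if the fail-safe stopped the run."""
    gui.FAILSAFE = True   # the default; set explicitly here for clarity
    gui.PAUSE = 0.5       # short pause after every PyAutoGUI call
    try:
        for step in steps:
            step()
    except gui.FailSafeException:
        # Raised when the mouse is in a screen corner during a PyAutoGUI call.
        print("Fail-safe triggered, stopping.")
        return False
    return True


# Usage (this takes over the mouse):
# import pyautogui
# run_steps(pyautogui, [lambda: pyautogui.moveTo(400, 400, 2)])
```

Catching the exception is optional. Without the try block the script simply ends with a traceback, which is often good enough.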

However, as I was testing, the fail-safe didn't work when I had a while True loop without a condition to break out of it. A likely reason is that the fail-safe is checked when PyAutoGUI performs a mouse or keyboard action, and a loop that only reads the mouse position never triggers that check. You may also end up using while True while figuring out the coordinates of the areas that need to be clicked, double-clicked, dragged, etc. If you do that, make sure to add a condition to break out of the loop. Let's look at the code below:

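import pyautogui

# Print the mouse position until the cursor touches the left edge of the screen.
while True:
    currentMouseX, currentMouseY = pyautogui.position()
    print(f"X: {currentMouseX} Y: {currentMouseY}")
    # Exit condition: move the mouse to the left edge (x == 0) to stop.
    if currentMouseX == 0:
        break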



pyautogui.position() returns the x and y coordinates of the mouse on the screen as a tuple. This helps with identifying button coordinates to click or other useful coordinates for automating the task. I run the script from the terminal. I set my terminal background to transparent so I can see the whole screen. Then I write down the coordinates I need. The check for x == 0 ends the script when I move the mouse to the left edge of the screen.

Another useful function is pyautogui.size(), which returns the width and height of the screen. PyAutoGUI only works with the primary monitor. If you have multiple monitors, only the primary one can be used for automation. PyAutoGUI works on Windows, macOS, and Linux. It may also work on Raspberry Pi. I haven't tested that, so I am not 100% sure.
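As a small sketch of how the screen size can be used, the helper below keeps a target coordinate inside the screen before it is passed to a mouse function. clamp_to_screen is a hypothetical helper, not part of PyAutoGUI. pyautogui.size() and pyautogui.onScreen(x, y) are real functions and appear only in the commented usage.

```python
def clamp_to_screen(x, y, width, height):
    """Return (x, y) moved inside a screen of the given size."""
    # Valid coordinates run from 0 to width - 1 and from 0 to height - 1.
    clamped_x = max(0, min(x, width - 1))
    clamped_y = max(0, min(y, height - 1))
    return clamped_x, clamped_y


# Usage with PyAutoGUI (primary monitor only):
# import pyautogui
# width, height = pyautogui.size()
# print(pyautogui.onScreen(2500, 400))   # False when the point is off screen
# x, y = clamp_to_screen(2500, 400, width, height)
```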

One of the cool features it has is locating an image on the screen. Let's say you have an app with custom buttons that don't look like native OS buttons, and you want to be able to click them. You can take a screenshot of the button and ask PyAutoGUI to locate that image on the screen. This can be done with pyautogui.locateOnScreen(image) or pyautogui.locateCenterOnScreen(image).

Since everything on the screen is made of pixels, PyAutoGUI compares pixels to find the location of the image. Because of this, the script will take longer to run if these functions are used many times. .locateOnScreen() returns a Box object with four values: left, top, width, and height of the image on the screen. Don't forget to pass the image as an argument. If you are not sure the relative path is working, pass the full file path to the image.
On the other hand, .locateCenterOnScreen() returns the x and y coordinates of the center of the image on the screen. For most situations this might be the preferred function to use, since we need the x and y coordinates to click on or drag from.
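Here is a minimal sketch of clicking a button by its image. It assumes that, depending on the PyAutoGUI version, a failed search either returns None or raises pyautogui.ImageNotFoundException, so it handles both. click_image and its gui parameter are hypothetical names, and the path in the usage comment is made up.

```python
def click_image(gui, image_path):
    """Click the center of image_path if it is visible; return True on success."""
    try:
        center = gui.locateCenterOnScreen(image_path)
    except gui.ImageNotFoundException:
        # Some PyAutoGUI versions raise instead of returning None.
        center = None
    if center is None:
        return False
    x, y = center
    gui.click(x, y)
    return True


# Usage (hypothetical image path):
# import pyautogui
# if not click_image(pyautogui, "/full/path/to/button.png"):
#     print("Button not found on screen")
```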

While locating images on screen is a cool feature, it does slow down the script a little bit. It also doesn't work as expected on computers with a Retina display. I ran into this problem when using the feature on my MacBook Pro with a Retina display. There is an easy fix: divide the x and y coordinates by 2 to get the correct coordinates. However, I find it easier to identify the x and y coordinates manually first and then use those coordinates instead of locating the image on the screen. Both options work.
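The divide-by-2 fix can be sketched as a small helper. to_screen_coords is a hypothetical name, and the default factor of 2 is an assumption that matches the Retina display described above. Other displays may use a different factor.

```python
def to_screen_coords(x, y, scale=2):
    """Convert image-search coordinates to mouse coordinates on a scaled display."""
    # Image search works on screenshot pixels, while mouse functions use
    # screen coordinates. On the Retina display described above the factor is 2.
    return round(x / scale), round(y / scale)


# Usage:
# import pyautogui
# found = pyautogui.locateCenterOnScreen("button.png")   # hypothetical image
# x, y = to_screen_coords(found.x, found.y)
# pyautogui.click(x, y)
#
# One possible way to estimate the factor instead of hard-coding it:
# scale = pyautogui.screenshot().width / pyautogui.size().width
```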

Let's look at some mouse control functions. We already know how to get the size of the screen and the x, y coordinates of the mouse's current position. We can move the mouse with the .moveTo() function. It takes two arguments, the x and y coordinates to move the cursor to. We can also pass a third argument, a duration in seconds, so we can see the cursor moving instead of the movement happening instantly.

pyautogui.moveTo(400,400,5) will move the mouse to x of 400, y of 400 over 5 seconds.

Speaking of time, it may also be a good idea to use the time module in combination with pyautogui, because we will often need time.sleep(2) to delay the execution of the next line of code. For example, you click a button in your app and it needs to load some data or a different UI. We need the next button to appear before we can click it. If we don't wait a second or two, we may end up with unexpected results.
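A minimal sketch that combines a timed move, a click, and a pause. click_and_wait and its parameters are hypothetical names, and the right wait time depends entirely on how long the app takes to respond.

```python
import time


def click_and_wait(gui, x, y, wait_seconds=2, move_seconds=1):
    """Glide to (x, y), click, then pause so the app can finish loading."""
    gui.moveTo(x, y, move_seconds)   # third argument: movement duration in seconds
    gui.click()                      # click at the current mouse position
    time.sleep(wait_seconds)         # give the next screen time to appear


# Usage (coordinates are made up):
# import pyautogui
# click_and_wait(pyautogui, 400, 400)
# click_and_wait(pyautogui, 650, 420, wait_seconds=1)
```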

There are more mouse functions that allow us to perform simple clicks, right clicks, left clicks, double clicks, drags, scrolls, mouse up and mouse down. Feel free to visit the PyAutoGUI documentation for the full list of available functions.
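To give a feel for these functions, here is a short sketch that calls a few of them in a row. The function names on the gui object are real PyAutoGUI functions. mouse_demo itself and all coordinates are made up for illustration.

```python
def mouse_demo(gui):
    """Run a few common mouse actions at made-up coordinates."""
    gui.click(200, 300)                               # left click
    gui.rightClick(200, 300)                          # right click
    gui.doubleClick(200, 300)                         # double click
    gui.moveTo(200, 300)                              # go to the drag start point
    gui.dragTo(500, 300, duration=1, button="left")   # drag to x=500 over 1 second
    gui.scroll(-5)                                    # negative values scroll down


# Usage (this takes over the mouse):
# import pyautogui
# mouse_demo(pyautogui)
```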

We can also control the keyboard with functions like .press(), .keyDown(), .keyUp() or we can type in words and sentences with .write("some text"). PyAutoGUI is also capable of using hotkeys and keyboard shortcuts.
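A short sketch of the keyboard functions mentioned above. keyboard_demo is a hypothetical wrapper, and the keys and text are only examples. On macOS the copy shortcut would use "command" instead of "ctrl".

```python
def keyboard_demo(gui):
    """Type some text, select it backwards one character, and copy."""
    gui.write("some text", interval=0.05)   # type with a short delay between keys
    gui.press("enter")                      # press and release a single key
    gui.keyDown("shift")                    # hold Shift down
    gui.press("left")                       # press the left arrow while Shift is held
    gui.keyUp("shift")                      # release Shift
    gui.hotkey("ctrl", "c")                 # keyboard shortcut: keys pressed together


# Usage (this types into whatever window has focus):
# import pyautogui
# keyboard_demo(pyautogui)
```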

If we want to take a screenshot at any given time while the script is running, we can do so with the pyautogui.screenshot() function. This allows us to create even more complex automation if we use other modules that can evaluate the screenshots and come up with options for the next move.
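As one simple example of evaluating a screenshot, the sketch below checks the color of a single pixel, which could be used to decide whether a button or status light is in the expected state. pyautogui.screenshot() returns a Pillow image. pixel_matches is a hypothetical helper, and the coordinates and color are made up.

```python
def pixel_matches(gui, x, y, expected_rgb):
    """Take a screenshot and report whether the pixel at (x, y) has the expected color."""
    image = gui.screenshot()            # a Pillow Image of the whole screen
    actual = image.getpixel((x, y))     # (R, G, B) or (R, G, B, A)
    return tuple(actual[:3]) == tuple(expected_rgb)   # ignore alpha if present


# Usage:
# import pyautogui
# pyautogui.screenshot("screen.png")    # passing a file name also saves the image
# if pixel_matches(pyautogui, 100, 200, (255, 0, 0)):
#     print("The pixel is red")
```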

PyAutoGUI has detailed documentation. It is well written, with plenty of examples to get started and experiment with PyAutoGUI. It is a great tool to learn and use. I already have some interesting ideas for automating some tasks. This will save me some more time, and I might even be able to experiment with it in some games.

Have you used PyAutoGUI? Please share your thoughts and experiences with it. Do you like automating things? What kind of tasks do you automate, and what would you like to be able to automate? Let me know in the comments.


we can move the mouth to any of the four corners of the screen to stop the code

typo, or random lisp for the lols ?
Good post either way, I'll have to find an excuse to play with pyAutoGUI soon thanks for the heads up

lol. Yes, that was a typo. I had to ctrl+F to see if that happened more than once. Fixed it. Thank You.

Thanks for sharing, this is great.

I'm taking over a new role at work and even though I'm not a programmer I'm trying to learn some Python to automate a bunch of repetitive and manual tasks and make my team more efficient.

One of the things I'm trying to implement right now is a Naive Bayes text classifier to classify user comments on our product satisfaction survey.

We do it manually right now but it takes a lot of time to go over each comment and think about what feature or aspect of our product it's related to.


Ah cool. I hope this can help. But it seems like you will need something that will process text and categorize it. I saw you have been playing with machine learning. I haven't even started with ML yet, but it looks like you are taking the right path. Let me know how you achieve automating this task.
Thanks.

Lol now if I could automate making my bed that would be great

Word in the street is, soon we will all live in a metaverse. You will be able to automate anything there.

This is pretty cool. I've used Automator and Applescript in the past, but Automator is kind of limited and Applescript is more powerful but requires learning a new language to access that power. I didn't realize there was a Python library for GUI automation. Seems like an easier alternative since I already know Python pretty well.

Will have to check it out. Not sure what I need to automate right now, but it's always good to have another tool in the toolbox.

All my automated tasks are built on selenium but I was actually running into some issues with it lately when trying to automate tasks on my laptop, which made me work as well with the keyboard library. It seems that PyAutoGUI can solve this issue, I will give it a try - thanks for sharing!

Awesome. Let me know how it goes. Selenium is great. I need to experiment more with it.

Thanks for updating on stuff like this but am not that familiar with python programming


PyAutoGUI is good for the automatisation of regular tasks.