Python is a powerful programming language with a growing community and libraries that make many tasks easy. One common use case for Python is automating repetitive tasks, which makes us more efficient and frees up time for things that matter more. Selenium helps with automating browser-related tasks and web scraping. pdfplumber helps with extracting and processing data in PDF files.
There is also PyAutoGUI. As the name suggests, it helps with automating graphical user interfaces. In other words, it allows us to write code that automatically performs a series of mouse clicks and keyboard inputs, be it a work-related task, some software we use for business, or maybe a game we play. If we can identify all the steps we perform manually, we can write them in our code and let the Python script imitate our actions.
Once code written with PyAutoGUI is executed, it takes over control of the mouse and the keyboard. Some operating systems may detect that and ask us to grant the proper permissions to allow this to happen. It happened on my Mac. The first thing we want to learn is how to stop the script. Things may go wrong and unexpected actions may be executed, so we need to know how to stop it. PyAutoGUI has a built-in fail-safe that is enabled by default. It is recommended not to disable it, although there is an option to do so. While the code is running, we can move the mouse to any of the four corners of the screen to stop the code.
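To make the fail-safe a bit more concrete, here is a minimal sketch. pyautogui.FAILSAFE and pyautogui.FailSafeException come from PyAutoGUI itself; the helper name run_with_failsafe and the task argument are made up for illustration. I import pyautogui inside the function so the file can still be loaded on a machine without a desktop session.

```python
def run_with_failsafe(task):
    """Run task() and return False if the fail-safe stopped it, True otherwise.

    `task` is any function that calls pyautogui (hypothetical, for illustration).
    """
    import pyautogui  # imported here because it needs a desktop session

    pyautogui.FAILSAFE = True  # already the default; set explicitly for clarity
    try:
        task()
    except pyautogui.FailSafeException:
        # Raised by a pyautogui call after the mouse reached a screen corner.
        print("Fail-safe triggered, stopping.")
        return False
    return True
```

As far as I can tell, the fail-safe check only runs when a PyAutoGUI mouse or keyboard function is called, so a script that never calls one of those may not be interrupted by it.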
However, as I was testing, the fail-safe didn't work when I had a while True loop without a condition to break out of it. You may also end up using while True while figuring out the coordinates of the areas that need to be clicked, double-clicked, dragged, etc. If you do that, make sure to add a condition to break out of the loop. Let's look at the code below:
import pyautogui

while True:
    currentMouseX, currentMouseY = pyautogui.position()
    print("X: " + str(currentMouseX) + " Y: " + str(currentMouseY))
    if currentMouseX == 0:
        break
The pyautogui.position() function returns a tuple of the x and y coordinates of the mouse on the screen. This helps with identifying the coordinates of a button to click, or other coordinates needed to automate the task. I run the script from the terminal, with the terminal background set to transparent so I can see the whole screen, and then I write down the coordinates I need. The check for x == 0 ends the script when the mouse is moved to the left edge of the screen.
Another useful function is pyautogui.size(), which returns the width and height of the screen. PyAutoGUI only works with the primary monitor, so if you have multiple monitors, only the primary one can be used for automation. PyAutoGUI works on Windows, macOS, and Linux. It may also work on Raspberry Pi, but I haven't tested it, so I'm not 100% sure.
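Here is a small sketch of how size() and position() can be combined to check that a coordinate I wrote down is actually on the primary monitor. The function names is_inside and mouse_is_on_screen are my own, not part of PyAutoGUI (the library also ships a similar pyautogui.onScreen(x, y) helper).

```python
def is_inside(x, y, width, height):
    """Return True if (x, y) lies on a screen that is width x height pixels."""
    return 0 <= x < width and 0 <= y < height


def mouse_is_on_screen():
    """Print the screen size and mouse position, then check the position."""
    import pyautogui  # imported here because it needs a desktop session

    width, height = pyautogui.size()
    x, y = pyautogui.position()
    print("Screen: " + str(width) + "x" + str(height))
    print("Mouse:  X: " + str(x) + " Y: " + str(y))
    return is_inside(x, y, width, height)
```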
One of the cool features it has is locating an image on the screen. Let's say you have an app with custom buttons that don't look like native OS buttons, and you want to be able to click them. You can take a screenshot of the button and ask PyAutoGUI to locate this image on the screen. It can be done with the pyautogui.locateOnScreen(image) method or the pyautogui.locateCenterOnScreen(image) method. Since everything on the screen is made of pixels, PyAutoGUI compares the pixels and tries to find the location of the image. Because of this, the script will take longer to run if these methods are used many times.
.locateOnScreen() returns a Box object with four values: left, top, width, and height of the image on the screen. Don't forget to pass the image as an argument. If you are not sure the relative path is working, pass the full file path to the image.
On the other hand, .locateCenterOnScreen() returns the x and y coordinates of the center of the image on the screen. For most situations this might be the preferred method, since we need x and y coordinates to click on or drag from.
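A minimal sketch of locating a button and clicking it might look like this. The helper name click_image and the file name in the usage line are placeholders of mine. Depending on the PyAutoGUI version, a missing image either raises pyautogui.ImageNotFoundException or returns None, so the sketch handles both.

```python
def click_image(image_path):
    """Click the center of image_path if it is visible; return True on success."""
    import pyautogui  # imported here because it needs a desktop session

    try:
        center = pyautogui.locateCenterOnScreen(image_path)
    except pyautogui.ImageNotFoundException:
        # Newer versions raise instead of returning None.
        center = None

    if center is None:
        print("Could not find " + image_path + " on the screen.")
        return False

    x, y = center  # the result unpacks into x and y
    pyautogui.click(x, y)
    return True
```

Usage would be something like click_image("submit_button.png"), where the PNG is a screenshot of the button taken beforehand.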
While locating images on the screen is a cool feature, it does slow down the script a little bit. It also doesn't work as expected on computers with a Retina display. I ran into this problem when I used the feature on my MacBook Pro with a Retina display. There is an easy workaround: divide the x and y coordinates by 2 to get the correct coordinates. However, I find it easier to identify the x and y coordinates manually first and then use them directly instead of locating the image on the screen. Both options work.
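The divide-by-2 workaround can be wrapped in a tiny helper. This is only a sketch: the names to_mouse_coords and detect_scale are mine, and detect_scale rests on my assumption that on a Retina display the screenshot is wider than the width reported by pyautogui.size() by the display's scale factor.

```python
def to_mouse_coords(x, y, scale=2):
    """Convert coordinates found in a screenshot to mouse coordinates."""
    return x // scale, y // scale


def detect_scale():
    """Guess the display scale by comparing screenshot width to screen width.

    Assumption: screenshots are taken at the display's pixel resolution,
    while pyautogui.size() reports the smaller "point" resolution.
    """
    import pyautogui  # imported here because it needs a desktop session

    screenshot_width = pyautogui.screenshot().size[0]
    screen_width = pyautogui.size()[0]
    return screenshot_width // screen_width
```

With a scale of 2, a button found at (800, 600) in the screenshot would be clicked at (400, 300).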
Let's look at some mouse control functions. We already know how to get the size of the screen and the x, y coordinates of the mouse's current position. We can move the mouse with the .moveTo() function. It takes two arguments, the x and y coordinates to move the mouse cursor to. We can also pass a third argument, a duration in seconds, so we can see the cursor moving instead of the movement happening instantly. For example, pyautogui.moveTo(400, 400, 5) will move the mouse to x of 400, y of 400 over 5 seconds.
Speaking of time, it may also be a good idea to use the time module in combination with pyautogui, because we will often need time.sleep(2) to delay the execution of the next line of code. For example, you click a button in your app and it needs to load some data or a different UI. We need the next button to appear before we can click it. If we don't wait a second or two, we may end up with unexpected results.
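Putting moveTo() and time.sleep() together, a click-then-wait step could be sketched like this. The function name click_and_wait and its default values are my own choices for illustration.

```python
import time


def click_and_wait(x, y, wait_seconds=2.0, move_seconds=0.5):
    """Move to (x, y), click, then pause so the app has time to update."""
    import pyautogui  # imported here because it needs a desktop session

    pyautogui.moveTo(x, y, duration=move_seconds)  # visible, gradual movement
    pyautogui.click()                              # click at the current position
    time.sleep(wait_seconds)                       # give the UI time to load
```

PyAutoGUI also has a global pyautogui.PAUSE setting that adds a delay after each of its calls, which can replace some of the explicit sleeps.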
There are more mouse functions that allow us to perform simple clicks, right clicks, left clicks, double clicks, drags, scrolls, and mouse-down and mouse-up events. Feel free to visit the PyAutoGUI documentation for the full list of available functions.
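A quick tour of a few of these functions, as a sketch. The wrapper name demo_mouse_actions and the 200-pixel drag distance are arbitrary; the pyautogui calls themselves are the library's own.

```python
def demo_mouse_actions(x, y):
    """Run a handful of common mouse actions starting at (x, y)."""
    import pyautogui  # imported here because it needs a desktop session

    pyautogui.click(x, y)          # left click
    pyautogui.rightClick(x, y)     # right click
    pyautogui.doubleClick(x, y)    # double click
    pyautogui.moveTo(x, y)         # go back to the starting point
    # Hold the left button and drag 200 pixels to the right over one second.
    pyautogui.dragTo(x + 200, y, duration=1, button="left")
    pyautogui.scroll(-5)           # negative values scroll down
```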
We can also control the keyboard with functions like .keyUp(), or we can type words and sentences with .write("some text"). PyAutoGUI is also capable of using hotkeys and keyboard shortcuts.
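Here is a short sketch of the keyboard functions in action. The helper name type_and_save is made up, and the Ctrl+S shortcut is just an example (on a Mac it would be "command" instead of "ctrl").

```python
def type_and_save(text):
    """Type some text, press Enter, then send a save shortcut."""
    import pyautogui  # imported here because it needs a desktop session

    pyautogui.write(text, interval=0.05)  # small delay between keystrokes
    pyautogui.press("enter")              # press and release a single key
    pyautogui.hotkey("ctrl", "s")         # keys pressed together, then released

    # keyDown/keyUp hold a key while other keys are pressed.
    pyautogui.keyDown("shift")
    pyautogui.press("left")               # Shift+Left selects one character
    pyautogui.keyUp("shift")
```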
If we want to take a screenshot at any point while the script is running, we can do so with the pyautogui.screenshot() function. This allows us to build even more complex automation if we use another module that can evaluate the screenshots and come up with options for the next move.
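As a sketch of how a screenshot could feed a decision, the snippet below captures the screen and checks the color of one pixel. The names capture and pixel_is are mine. pyautogui.screenshot() returns a Pillow image, and I compare only the first three channels because, as far as I know, some platforms return RGBA instead of RGB.

```python
def pixel_is(image, x, y, expected_rgb):
    """Return True if the pixel at (x, y) in image has the expected RGB color."""
    return tuple(image.getpixel((x, y))[:3]) == tuple(expected_rgb)


def capture(path=None, region=None):
    """Take a screenshot, optionally of a (left, top, width, height) region."""
    import pyautogui  # imported here because it needs a desktop session

    image = pyautogui.screenshot(region=region)
    if path is not None:
        image.save(path)  # keep a copy on disk for later inspection
    return image
```

For example, pixel_is(capture(), 400, 400, (255, 0, 0)) would tell the script whether the pixel at (400, 400) is pure red before it decides on the next click.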
You can find detailed PyAutoGUI documentation here. The documentation is well written, with plenty of examples to get started and experiment with PyAutoGUI. It is a great tool to learn and use. I already have some interesting ideas for automating some tasks. This will save me some more time, and I might even experiment with it in some games.
Have you used PyAutoGUI? Please share your thoughts and experiences with it. Do you like automating things? What kind of tasks do you automate, and what would you like to be able to automate? Let me know in the comments.