albina0104/automate-boring-stuff-python

Python Practice Projects

My solutions to the practice projects from the book "Automate the Boring Stuff with Python, 2nd Edition".

My favourite practice projects:

  • Chapter 17 – Keeping Time, Scheduling Tasks, and Launching Programs
    • Practice 2. "Scheduled Web Comic Downloader" - scrapes comic websites and downloads new comics once a day.
  • Chapter 18 – Sending Email and Text Messages
    • Practice 2. "Umbrella Reminder" - scrapes a weather website and lets an LLM decide whether there is a chance of rain; on mornings when rain is expected, it sends an email reminder to grab an umbrella.
    • Practice 4. "Controlling Your Computer Through Email" - checks emails for instructions, downloads attachments, and sends an email when the task is done.

Note: the projects were made on Ubuntu OS, with Python 3.10.12.

What I learned

Some useful notes about what I learned outside of the book while doing the practice projects.

Chapter 7 – Pattern Matching with Regular Expressions

Practice 2. Strong Password Detection

Chapter 9 – Reading and Writing Files

Practice 3. Regex Search

  • When matching lines from .txt files against a regular expression, keep in mind that each line read from the file ends with a newline (\n). By default, . does not match \n, so a pattern that has to match the whole line fails on the trailing newline. Compiling the pattern with the re.DOTALL flag makes . match newlines too, e.g.:
    user_regex = re.compile(regex_string, re.DOTALL)
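A minimal sketch of the effect described above, assuming the pattern is applied with fullmatch() to lines that still carry their trailing newline. The sample line and pattern are made up for illustration. Stripping the newline before matching is an alternative to re.DOTALL.

```python
import re

# A line as it comes back from file.readlines(): note the trailing newline.
line = "error: disk full\n"
pattern = r"error: .*"  # hypothetical pattern, for illustration only

# Without re.DOTALL, "." stops at the newline, so the whole line cannot match.
without_flag = re.compile(pattern).fullmatch(line)

# With re.DOTALL, "." also matches "\n", so the whole line matches.
with_flag = re.compile(pattern, re.DOTALL).fullmatch(line)

# Alternative: strip the newline and match without the flag.
stripped = re.compile(pattern).fullmatch(line.rstrip("\n"))

print(without_flag is None)
print(with_flag is not None)
print(stripped is not None)
```

All three lines print True.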

Chapter 12 – Web Scraping

Practice 1. Command Line Emailer

  • When I tried to use pyinputplus.inputPassword() to read the user's password, it did not work in the PyCharm IDE. I got the following error:

    termios.error: (25, 'Inappropriate ioctl for device')

    Solution: in Edit Configurations, open Modify options, select "Emulate terminal in output console", then run the program.

    (Answer was found here.)

  • An issue with Selenium:

    1. I switch to an iframe:
    browser.switch_to.frame(sign_in_iframe)
    2. I enter a password and submit a form inside the iframe; the page reloads and the next page opens.
    3. I try to find the next element I need, but the browsing context no longer exists. Error:
    selenium.common.exceptions.NoSuchWindowException: Message: Browsing context has been discarded

    The issue happens because that iframe no longer exists after the page reloads. Solution: switch back to the default content.

    browser.switch_to.default_content()
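The whole sequence can be sketched roughly as below. This is not the code from the project: the function name and the locator arguments are hypothetical, and the function only assumes that browser behaves like a Selenium WebDriver.

```python
def submit_password_in_iframe(browser, iframe, password,
                              password_locator, next_locator):
    """Fill in a password inside an iframe, then continue on the next page.

    The locators are (by, value) tuples such as ("id", "password");
    the values used by a real site are an assumption of this sketch.
    """
    # Enter the iframe that holds the sign-in form.
    browser.switch_to.frame(iframe)

    field = browser.find_element(*password_locator)
    field.send_keys(password)
    field.submit()  # the page reloads and the iframe is discarded

    # Leave the discarded frame before looking for anything else.
    browser.switch_to.default_content()

    return browser.find_element(*next_locator)
```

Depending on the site, an explicit wait for the next page may also be needed before the last find_element() call.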

Practice 2. Image Site Downloader

If you try to scrape a website that is protected against bots by Cloudflare, you can get a 403 Forbidden error even if you send exactly the same HTTP request that succeeds in the browser.

Cloudflare can protect a website with the following techniques:

Practice 4. Link Verification

Useful resources:

Chapter 13 – Working with Excel Spreadsheets

Practice Question 14

Q: If you want to retrieve the result of a cell’s formula instead of the cell’s formula itself, what must you do first?

A: Load the workbook with the data_only=True parameter.

import openpyxl

# Load the workbook with data_only=True
wb = openpyxl.load_workbook('produceSales.xlsx', data_only=True)
sheet = wb['Sheet']
# Get the calculated value of the formula in cell D2
calculated_value = sheet['D2'].value
print(f'Calculated value of the formula: {calculated_value}')

However, openpyxl never evaluates formulas: it does not compute formula results, it only reads what is already stored in the file. When you open a workbook with data_only=True, openpyxl reads the cached results of the formulas from the file. These results are typically calculated by Excel or another spreadsheet application and then saved in the file.

So, openpyxl can access and display the formula results only if they have been calculated and saved by the spreadsheet application.
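A small sketch that illustrates this, assuming a workbook that has only ever been written by openpyxl and never opened in a spreadsheet application. The function name and the in-memory buffer are choices made for the example.

```python
from io import BytesIO

import openpyxl


def roundtrip_formula(formula="=SUM(A1:A2)"):
    """Save a formula with openpyxl and read it back in both modes."""
    wb = openpyxl.Workbook()
    sheet = wb.active
    sheet["A1"] = 1
    sheet["A2"] = 2
    sheet["A3"] = formula

    # Save to memory instead of disk to keep the example self-contained.
    buffer = BytesIO()
    wb.save(buffer)

    buffer.seek(0)
    formula_text = openpyxl.load_workbook(buffer).active["A3"].value

    buffer.seek(0)
    cached_value = openpyxl.load_workbook(buffer, data_only=True).active["A3"].value

    return formula_text, cached_value


formula_text, cached_value = roundtrip_formula()
print(formula_text)   # the formula string itself
print(cached_value)   # None: no application has calculated and cached a result
```

Opening and saving the file in Excel or LibreOffice Calc would store the calculated result, and data_only=True would then return it.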

Practice 2. Blank Row Inserter

Accessing cells individually in an Excel workbook can be very slow. The sheet methods iter_rows() and iter_cols() in the openpyxl library iterate over rows and columns much faster.
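A minimal sketch of reading a sheet with iter_rows(), using a small in-memory workbook with made-up data. Here values_only=True makes the method yield plain values instead of cell objects.

```python
import openpyxl


def read_rows(sheet):
    """Return all rows of the sheet as lists of plain values."""
    return [list(row) for row in sheet.iter_rows(values_only=True)]


# Build a tiny workbook in memory (sample data, for illustration only).
wb = openpyxl.Workbook()
sheet = wb.active
sheet.append(["apples", 3])
sheet.append(["pears", 5])

rows = read_rows(sheet)
print(rows)
```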

Chapter 15 – Working with PDF and Word Documents

Practice 1. PDF Paranoia (encrypting and decrypting PDFs)

PyPDF2 is deprecated. We should use pypdf (it is similar, but also much simpler). Documentation: https://pypdf.readthedocs.io/en/latest/user/encryption-decryption.html

(https://stackoverflow.com/a/75572419)

Practice 3. Brute-Force PDF Password Breaker

By checking pypdf source code, I found that decrypt() returns one of these 3 values:

class PasswordType(IntEnum):
    NOT_DECRYPTED = 0
    USER_PASSWORD = 1
    OWNER_PASSWORD = 2

So we can consider the file decrypted when the result of pdf_reader.decrypt(password) is not 0.
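Based on that, the brute-force loop could look roughly like this. It is a sketch, not the code from the project: find_password is a hypothetical helper, and pdf_reader is assumed to be a pypdf PdfReader opened on an encrypted file.

```python
def find_password(pdf_reader, candidates):
    """Return the first candidate that decrypts the PDF, or None.

    pdf_reader is expected to behave like pypdf.PdfReader: decrypt()
    returns 0 (NOT_DECRYPTED) on failure and a non-zero value otherwise.
    """
    for password in candidates:
        if pdf_reader.decrypt(password) != 0:
            return password
    return None


# Usage sketch (the file names are placeholders):
#
#   from pypdf import PdfReader
#   reader = PdfReader("encrypted.pdf")
#   with open("dictionary.txt") as words:
#       candidates = (word.strip() for word in words)
#       print(find_password(reader, candidates))
```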

Chapter 18 – Sending Email and Text Messages

Practice 1. Random Chore Assignment Emailer

A fast way to make a shallow copy of a list, so that adding or removing items in the copy does not affect the original list:

emails_to_choose_from = emails[:]
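A short illustration with made-up addresses. The slice creates a new list, but the copy is shallow: nested mutable items are still shared between the two lists.

```python
emails = ["alice@example.com", "bob@example.com", "carol@example.com"]

# The slice creates a new list object with the same items.
emails_to_choose_from = emails[:]
emails_to_choose_from.remove("bob@example.com")

print(emails)                 # the original still has all three addresses
print(emails_to_choose_from)  # the copy has two

# Shallow copy: inner lists are shared, not duplicated.
chores = [["dishes"], ["vacuum"]]
chores_copy = chores[:]
chores_copy[0].append("bathroom")
print(chores[0])              # ['dishes', 'bathroom']
```

For nested data, copy.deepcopy() avoids the sharing shown in the last lines.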

Practice 2. Umbrella Reminder

For this practice project, I decided to use a local LLM to determine the chance of rain from the text of a weather forecast scraped from a weather website. So I installed Ollama locally and used the lightweight gemma:2b model.
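One possible way to query the local model is sketched below. It is not necessarily how the project does it: the prompt wording, the helper names, and the YES/NO parsing are assumptions, and the sketch talks to Ollama's local HTTP endpoint on its default port using only the standard library.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(forecast_text, model="gemma:2b"):
    """Build the request body; the prompt wording is an assumption."""
    prompt = (
        "Answer with a single word, YES or NO. "
        "Does this forecast mention a chance of rain?\n\n" + forecast_text
    )
    return {"model": model, "prompt": prompt, "stream": False}


def parse_answer(response_text):
    """Treat any answer starting with YES (case-insensitive) as rain."""
    return response_text.strip().upper().startswith("YES")


def rain_expected(forecast_text):
    """Ask the local model whether the forecast mentions rain."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(forecast_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return parse_answer(body["response"])
```

A small model can answer in unexpected ways, so the parsing step is deliberately strict: anything other than a clear YES counts as no rain.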

Useful resources:

Practice 3. Auto Unsubscriber

Practice 4. Controlling Your Computer Through Email

Chapter 19 – Manipulating Images

Practice 3. Custom Seating Cards

Other things

  • How to write docstrings:
def is_strong_password(password):
    """
    Checks if the given password is strong.

    A strong password is defined as one that is at least eight characters long,
    contains both uppercase and lowercase characters, and has at least one digit.

    :param password: The password string to be checked.
    :type password: str
    :return: True if the password is strong, False otherwise.
    :rtype: bool
    """
    return bool(password_regex.fullmatch(password))
  • How to configure a logger:
import logging

logging.basicConfig(level=logging.DEBUG, filename='filename.log',
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

Links

Author
