Image by thebarrowboy on Flickr
Code can always be improved. Check out these tips to make you the best programmer you can be!
Appearances can be deceiving. Just because a block of code looks good doesn’t mean that it’s error-free; the code could be inefficient, or confusing to read, or it could leave the door wide open for malicious attacks. A good programmer stays vigilant against such coding faux pas!
Since the best way to learn is by example, we’ll start by diving into the nitty-gritty of a small 50-line Python script that acts as a simple “proof-reader”.
First, the program downloads a list of English-language words from the Internet and stores them in a list, creating a simple dictionary. Next, it prompts the user to enter the name of a text file. The program then reads through this file and puts the new words in a different list. Finally, the program works through the user’s words one by one to see if they’re in the dictionary. If not, it highlights the offending mistakes in red and displays the text back to the user.
There are a few subtleties:
- The dictionary words are stored in a set, not a list. The only important difference (for us) is that lists have a specific order while sets don’t, and this makes it considerably faster to check for a particular word inside a set.
- The program “sanitizes” the user’s words, which is a fancy way to say it cleans them up by removing attached punctuation marks and converting capital letters into lowercase equivalents. Otherwise words like “However,” and “as:” would be flagged as mistakes since they don’t exactly match the entries “however” and “as”.
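To make the two subtleties concrete, here is a minimal sketch written for Python 3. The `sanitize` helper and the four-word dictionary are made up for illustration. It also uses `str.strip` with `string.punctuation`, which only trims punctuation from the ends of a word, whereas the script below removes a fixed list of symbols wherever they appear.

```python
import string

def sanitize(word):
    """Lowercase a word and trim punctuation attached to either end."""
    return word.strip(string.punctuation).lower()

# A tiny stand-in dictionary; the real script downloads a much longer word list.
# Storing it in a set keeps the "in" check fast no matter how many words it holds.
dictionary_words = {"however", "as", "the", "cat"}

for raw in ["However,", "as:", "Teh"]:
    cleaned = sanitize(raw)
    status = "ok" if cleaned in dictionary_words else "not found"
    print(raw, "->", cleaned, "(" + status + ")")
```

Because only the ends are trimmed, a word like “don’t” keeps its apostrophe, which may or may not be what you want depending on the word list you check against.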
Take a peek at the code and see if you can spot where it goes wrong, or where potential problems might crop up. Don’t be intimidated if there are lines you don’t instantly understand! If a function is unfamiliar, pop it into a search engine and see if you can find an explanation. You can also play around with removing lines to see how their absence changes the program.
import urllib2
from termcolor import colored

running = True
while running:
    # Import dictionary words from internet link
    filepath = "https://raw.githubusercontent.com/dwyl/english-words/master/words.txt"
    file = urllib2.urlopen(filepath).read()
    # "\n" is a newline character (the enter key). Here, we're separating words on different lines
    words = file.split("\n")

    # Sanitize inputs
    for i in range(len(words)):
        words[i] = words[i].lower()  # make capital letters lowercase
    words = set(words)  # convert from a list to a set

    # Import words from local file
    filepath2 = raw_input("Enter name of file: ")  # prompt user for name of file
    print("")  # Add space in console display
    file2 = open(filepath2, 'r').read()
    words2 = file2.split(" ")  # separate words using spaces

    # Create list of sanitized inputs
    words3 = []  # create a new, empty list
    for i in range(len(words2)):
        words3.append(words2[i].lower())

        # Remove punctuation marks
        punctuation = [".", ",", "!", "?", ";", ":", "(", ")"]
        for symbol in punctuation:
            words3[i] = words3[i].replace(symbol, "")

    # Check for mistakes using dictionary, highlight mistakes in red in the original list
    for i in range(len(words3)):
        if words3[i] not in words:
            words2[i] = colored(words2[i], "red")

    # Display results in terminal
    for word in words2:
        print(word),
    print("\n")

    # Ask user if they'd like to proof another file
    answer = raw_input("Would you like to proofread another file? (y/n) ")
    if answer == "y":
        running = True
    else:
        running = False
    print("")
Unfortunately, this particular script can’t be run in most online Python IDEs. It downloads content from the internet with “urllib2”, which ships with Python 2 (in Python 3 the same functionality lives in “urllib.request”), and it colours the output with the third-party “termcolor” package, which must be installed on your computer. Most online programming environments don’t offer access to custom packages, and some restrict network access as well.
To run the script yourself I recommend downloading the free version of PyCharm. Other IDEs like Komodo Edit or Atom work just as well, though each has a slightly different method for installing packages. The instructions for PyCharm can be found here.
Finally, make sure that the text file you want to open is in the same folder as your script; otherwise you’ll have to specify the entire file path.
When the script runs, it prints your text back to you with every word it couldn’t find in the dictionary highlighted in red.
Problem #1: Naming Conventions
Variable names should be descriptive. If your data structure is a list of words, then “words” is a decent name choice, although “words_list” might be even better since it tells us about both the structure and the content of the variable. This is extra important in Python, which is dynamically typed: nothing in the code declares what kind of data a variable holds, so the name has to do that job. However, even with that slight improvement, we still get code like this:
for i in range(len(words_list3)):
    if words_list3[i] not in words_list:
        words_list2[i] = colored(words_list2[i], "red")
Convoluted enough to confuse even experienced programmers!
When you have many similar variables, it’s a good idea to make their names extremely distinct, even if it means using long, awkward names. Consider:
for i in range(len(sanitized_user_words)):
    if sanitized_user_words[i] not in dictionary_words:
        original_user_words[i] = colored(original_user_words[i], "red")
Much more intuitive! Even though list interaction and indexing is tricky to read, we can tell the program is checking our sanitized words against the dictionary words, and if there’s a mistake, it updates the user’s original words to a red-coloured version of themselves.
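Good names also make it easier to drop the index juggling altogether. The sketch below is one possible variation, not part of the original script: the `find_misspelled` function is a hypothetical helper that pairs each original word with its sanitized twin using `zip`, and it returns the misspelled words instead of colouring them.

```python
def find_misspelled(original_user_words, sanitized_user_words, dictionary_words):
    """Return the original words whose sanitized form is missing from the dictionary."""
    return [
        original
        for original, sanitized in zip(original_user_words, sanitized_user_words)
        if sanitized not in dictionary_words
    ]

# Made-up sample data for illustration.
dictionary_words = {"the", "cat", "sat"}
original_user_words = ["The", "catt", "sat."]
sanitized_user_words = ["the", "catt", "sat"]

print(find_misspelled(original_user_words, sanitized_user_words, dictionary_words))
```

With the descriptive names in place, the loop header reads almost like the sentence that describes it.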
Are there any other variable names you think could be improved?
Problem #2: Exception Handling
Whenever you open a file or connect to a website there’s always a chance something will go wrong. The file could be corrupted, or your internet connection could unexpectedly break. In these scenarios the program crashes without explanation, leaving the user annoyed and confused.
In programming, risky operations should always be enclosed in try and except clauses. The “try” block contains the risky code, and if an exception (error) occurs during runtime, the program switches over to the code in the “except” block. If everything goes hunky-dory, the “except” block is ignored.
Typically, the code in the “except” block is used to clean up the program (close files, databases, and network connections) and print a user-friendly message that explains the error or the crash. More advanced programs might attempt to diagnose and fix the error.
The modified code looks like this:
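# requires "import sys" at the top of the script
try:
    dict_file = urllib2.urlopen(dict_filepath).read()
except Exception:
    print("An error occurred while trying to import dictionary words.")
    sys.exit(1)  # a non-zero exit status tells the system that something went wrong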
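The same pattern applies when opening the user’s file. Here is a minimal sketch for Python 3, assuming we would rather report the problem and carry on than exit. The `read_text_file` name is made up for this example. It catches specific exception types so that each message can say what actually went wrong.

```python
def read_text_file(path):
    """Return the file's contents, or None if the file cannot be read."""
    try:
        # The with-statement closes the file even if reading fails midway.
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        print("No file named " + repr(path) + " was found.")
    except (OSError, UnicodeDecodeError) as error:
        print("The file could not be read:", error)
    return None

# Demo: this file name is assumed not to exist, so the first except block runs.
contents = read_text_file("no_such_file_for_this_demo.txt")
print("Result:", contents)
```

Returning `None` leaves the decision to the caller, who can ask for another file name or stop the program.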
Problem #3: Validating User Input
Most users are smart, reasonable people, but there’s always that one guy who inputs a letter when the program asks for a number. As a rule of thumb, always validate user input. Are there any letters in that phone number? Does the user-entered e-mail address contain the mandatory “@” symbol? At the very least, input validation prevents awkward crashes, and at best, it stops hackers from using a program to break into a computer.
In our proof-reader, we ask the user to enter the name of the file. Perhaps we want to check the file extension — does the name end in “.txt”, currently the only type of text file we can handle?
user_filepath = raw_input("Enter name of file: ")  # prompt user for name of file
if not user_filepath.endswith(".txt"):
    print("Invalid file name")
    sys.exit(1)  # a non-zero exit status signals an error
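Exiting on the first bad answer is a little harsh. A gentler option is to ask again a few times, as in this Python 3 sketch. The function names and the three-attempt limit are made up for illustration, and the `ask` parameter exists only so the prompt can be swapped out when testing. Note that Python 3 uses `input` where the script above uses `raw_input`.

```python
def is_valid_text_filename(name):
    """Accept names that end in .txt (ignoring case) and have something before it."""
    cleaned = name.strip()
    return len(cleaned) > len(".txt") and cleaned.lower().endswith(".txt")

def ask_for_text_filename(ask=input, max_attempts=3):
    """Prompt up to max_attempts times; return a valid name, or None if all fail."""
    for _ in range(max_attempts):
        name = ask("Enter name of file: ").strip()
        if is_valid_text_filename(name):
            return name
        print("Please enter a file name ending in .txt")
    return None

print(is_valid_text_filename("essay.txt"))
print(is_valid_text_filename("essay.doc"))
```

Calling `ask_for_text_filename()` with no arguments prompts on the keyboard; the program can then decide what to do if it gets `None` back.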
Problem #4: What goes in the While Loop?
If you run the program yourself, you’ll notice that creating the dictionary takes a lot of time (a few seconds, to be precise). To be fair, we’re connecting to a server over a network and downloading a list of hundreds of thousands of words. A few seconds may not seem like much, but it’s irritating to the user waiting in limbo.
So here’s the question — do we really need to make the dictionary from scratch at every iteration? Or would the program work equally well if we make the dictionary once and then start the loop?
Costly, slow operations may be unavoidable, but your job as a programmer is to minimize them as much as possible. This could mean being clever about when you start your loop. It could mean “caching” — keeping big files stored locally on your computer so that you don’t have to re-fetch them from a website or database. Or, in extreme cases, it could mean finding a more efficient programming language or algorithm.
Typically, “costly” programming operations involve networking, reading and writing to databases, manipulating graphics or iterating through lists with lots of elements.
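As a rough illustration of caching, the Python 3 sketch below downloads the word list only when no local copy exists. The `load_word_list` function, the cache file name, and the fake download function are all made up for this example; the fake stands in for the real network call so the example runs offline.

```python
import os
import tempfile

def load_word_list(cache_path, download):
    """Return the word list text, calling download() only when no cached copy exists."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as cache_file:
            return cache_file.read()
    text = download()  # the slow step: runs only when the cache is empty
    with open(cache_path, "w", encoding="utf-8") as cache_file:
        cache_file.write(text)
    return text

# Demo with a fake download function so the example runs without a network.
def fake_download():
    print("downloading...")
    return "apple\nbanana\n"

with tempfile.TemporaryDirectory() as folder:
    cache_path = os.path.join(folder, "words_cache.txt")
    load_word_list(cache_path, fake_download)  # no cache yet: prints "downloading..."
    load_word_list(cache_path, fake_download)  # cache exists: nothing is printed
```

In a real Python 3 program, the download function could wrap `urllib.request.urlopen(dict_filepath).read().decode("utf-8")`. A cache like this never refreshes itself, so deleting the file is the way to force a fresh download.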
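Problem #5: Hard-Coding
“Hard-coding” means using a value directly inside your code instead of storing that value in a variable. For example, this line hard-codes the separator:
words = file.split("\n")
while this version stores it in a variable first:
separator = "\n"
words = file.split(separator)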
Hard-coding can lead to problems when expanding a program, especially when there are numerous (sometimes thousands of!) files of code. What if you change your program to download the dictionary from a different source, only this source separates its words using semi-colons? Since hard-coding is subtle, the offending lines can easily get lost. Programmers might even forget that the hard-coded value exists!
As a general rule, use variables for any and all values that might change in a program’s future. A helpful practice is to cluster variables at the top of your file, so that they’re easy to find and you can’t forget about them. In the proof-reader, you might include the following after importing packages but before entering the while loop:
filepath = "https://raw.githubusercontent.com/dwyl/english-words/master/words.txt"
punctuation = [".", ",", "!", "?", ";", ":", "(", ")"]
separator = "\n"
highlight_color = "red"
That said, hard-coding isn’t always bad. Here’s one example:
file2 = open(filepath2, 'r').read()
The second argument — ‘r’ — indicates that we’re opening the file for a reading operation and not a writing operation. This is a core feature of the program; it’s not going to change no matter how many tools we add to the software. In this case, using an extra variable would just clutter up the code.
What if you can guarantee that the code is never, ever going to change? If you’re writing a quick script at a hackathon, or as a proof-of-concept, then hard-coding is convenient and saves time. But in a professional setting, software always changes and evolves. Better to build good habits and avoid hard-coding from the start.
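One way to keep such a value flexible is to name it once and pass it in, as in this Python 3 sketch. The `parse_word_list` function and the `WORD_SEPARATOR` constant are made-up names for illustration, and the semi-colon source is the imaginary one from the scenario above.

```python
WORD_SEPARATOR = "\n"  # assumed default: one word per line

def parse_word_list(text, separator=WORD_SEPARATOR):
    """Split downloaded text into a set of lowercase words, skipping blank entries."""
    return {word.strip().lower() for word in text.split(separator) if word.strip()}

# The same function handles both formats; only the separator changes.
print(sorted(parse_word_list("Apple\nbanana\n")))
print(sorted(parse_word_list("Apple;banana", separator=";")))
```

Switching to a new source then means changing one value in one place, rather than hunting for every `"\n"` in the code.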
Updated Code Listing
import urllib2
import sys
from termcolor import colored

dict_filepath = "https://raw.githubusercontent.com/dwyl/english-words/master/words.txt"
punctuation = [".", ",", "!", "?", ";", ":", "(", ")"]
highlight_color = "red"

print("Welcome to the simple proof-reader! Please wait while we prepare the program.\n")

# Import dictionary words from internet link
try:
    dict_file = urllib2.urlopen(dict_filepath).read()
except Exception:
    print("An error occurred while trying to import dictionary words.")
    sys.exit(1)  # a non-zero exit status signals an error
dictionary_words = dict_file.split("\n")  # separate words on different lines

# Sanitize inputs
for i in range(len(dictionary_words)):
    dictionary_words[i] = dictionary_words[i].lower()  # make capital letters lowercase
dictionary_words = set(dictionary_words)  # convert from a list to a set

""" START THE MAIN LOOP """
running = True
while running:
    # Import words from local file
    user_filepath = raw_input("Enter name of file: ")  # prompt user for name of file
    if not user_filepath.endswith(".txt"):
        print("Invalid file name")
        sys.exit(1)
    print("")  # Add space in console display

    try:
        user_file = open(user_filepath, 'r').read()
    except Exception:
        print("Oops! That file could not be opened.")
        sys.exit(1)
    original_user_words = user_file.split(" ")  # separate words using spaces

    # Create list of sanitized inputs
    sanitized_user_words = []  # create a new, empty list
    for i in range(len(original_user_words)):
        sanitized_user_words.append(original_user_words[i].lower())

        # Remove punctuation marks
        for symbol in punctuation:
            sanitized_user_words[i] = sanitized_user_words[i].replace(symbol, "")

    # Check for mistakes using dictionary, highlight mistakes in red in the original list
    for i in range(len(sanitized_user_words)):
        if sanitized_user_words[i] not in dictionary_words:
            original_user_words[i] = colored(original_user_words[i], highlight_color)

    # Display results in terminal
    for word in original_user_words:
        print(word),
    print("\n")

    # Ask user if they'd like to proof another file
    proof_again = raw_input("Would you like to proofread another file? (y/n) ")
    if proof_again == "y":
        running = True
    else:
        running = False
    print("")
What a makeover!
While we could still suggest more changes, most of these “improvements” are a question of taste. For example, we could make our prompts more descriptive and user-friendly. We could also split the main body of code into separate functions, which might help when expanding our program by allowing us to reuse snippets of code. Some programmers would claim this makes the code more confusing. Other programmers would find it easier to read.
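For readers curious what the split into functions might look like, here is one possible sketch in Python 3. It is not the article’s listing: the function names are made up, the red colouring uses plain ANSI escape codes as a stand-in for termcolor, and `text.split()` splits on any whitespace rather than only on single spaces.

```python
import string

RED = "\033[31m"   # ANSI escape code for red text (stand-in for termcolor)
RESET = "\033[0m"  # ANSI escape code that restores the default colour

def sanitize(word):
    """Lowercase a word and trim punctuation attached to either end."""
    return word.strip(string.punctuation).lower()

def highlight(word):
    """Wrap a word in escape codes so that a terminal shows it in red."""
    return RED + word + RESET

def proofread(text, dictionary_words, mark=highlight):
    """Return text with every word missing from dictionary_words passed through mark."""
    checked = []
    for word in text.split():
        checked.append(word if sanitize(word) in dictionary_words else mark(word))
    return " ".join(checked)

# Made-up sample text and dictionary for illustration.
print(proofread("The catt sat.", {"the", "cat", "sat"}))
```

Each function now does one small job, so any of them can be reused or tested on its own. Whether that reads better than one long script is, as noted above, partly a matter of taste.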
Writing good code is an art as well as a science. If all these “best practices” seem overwhelming, just remember: if your code works, then you’ve already done the hard part! The rest is just polish.