AI, ML, and data science are among today's dominant technologies, and all of them depend on the Python programming language in one way or another. Mastering Python can therefore open many doors in your career and lead to some of the best opportunities around the world.
Whether you are just starting out with Python, practicing to become an expert, or challenging yourself with harder problems, working on Python projects will sharpen your skills and strengthen your profile in a competitive job market. Python books and tutorials provide detailed knowledge, but the ultimate test of your learning is whether you can write code and build something of your own.
Before jumping into the project ideas, let's look at how Python projects can help you as a developer and which platform you should consider before starting one.
1. MAD LIBS GENERATOR
This beginner Python project is a good starting point for new developers because it covers strings, variables, and concatenation. A Mad Libs generator teaches you to manipulate user-supplied data: the program asks the user for a series of words, such as an adjective, a pronoun, or a verb, and once all the inputs are entered, it arranges them into a story template.
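The step where the application arranges the inputs into a story template can also be expressed with a single template string. The sketch below is one possible approach, not the project's own code: `STORY_TEMPLATE`, `fill_story`, and `ask_for_words` are hypothetical names, and the template is a shortened version of the story used in the source code.

```python
# Hypothetical template (not from the original project): each {name} is a slot for one user word
STORY_TEMPLATE = (
    "Be kind to your {noun}-footed {plural_noun}, "
    "for a duck may be somebody's {noun2}."
)


def fill_story(words):
    """Return the story with each {slot} replaced by the matching word.

    `words` maps slot names to the user's answers; a missing slot raises KeyError.
    """
    return STORY_TEMPLATE.format(**words)


def ask_for_words(prompts):
    """Ask the user one question per slot; `prompts` maps slot names to questions."""
    return {slot: input(question) for slot, question in prompts.items()}
```

In a full program you would call `fill_story(ask_for_words({"noun": "Choose a noun: ", "plural_noun": "Choose a plural noun: ", "noun2": "Choose a noun: "}))` and print the result. Keeping the story in one template makes it easy to swap in new stories without touching the input code.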
SOURCE CODE
""" Mad Libs Generator
----------------------------------------
"""
# Loop back to this point once the story has been printed
loop = 1
while loop < 10:
    # All the questions that the program asks the user
    noun = input("Choose a noun: ")
    p_noun = input("Choose a plural noun: ")
    noun2 = input("Choose a noun: ")
    place = input("Name a place: ")
    adjective = input("Choose an adjective (Describing word): ")
    noun3 = input("Choose a noun: ")
    # Display the story based on the user's input
    print("------------------------------------------")
    print(f"Be kind to your {noun}-footed {p_noun}")
    print(f"For a duck may be somebody's {noun2},")
    print(f"Be kind to your {p_noun} in {place}")
    print(f"Where the weather is always {adjective}.")
    print()
    print(f"You may think that this is the {noun3},")
    print("Well it is.")
    print("------------------------------------------")
    # Count this round; the loop stops after nine stories
    loop = loop + 1
2. NUMBER GUESSING GAME
This project is a fun game for beginners to build. The program generates a random number in a specified range, such as 1 to 10 or 1 to 100, and the user must guess it with the help of hints from the computer. Every time a guess is wrong, the user gets another hint that makes the number easier to find, at the cost of a lower score. A clue can be any math hint, such as multiples, divisibility, greater or smaller, or a combination of these.
The program also needs functions to check whether the user actually entered a number, to compare the guess with the secret number, and to find the difference between the two.
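The source code below only answers "higher" or "lower" and counts attempts. A sketch of the richer hint-and-score idea described above might look like this; `is_number`, `make_hint`, and `score_after` are hypothetical helpers, and the starting score of 100 with a 10-point penalty per wrong guess is an arbitrary assumption.

```python
def is_number(text):
    """Check whether the user actually typed a whole number."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def make_hint(secret, guess):
    """Build a hint for a wrong guess (hypothetical helper; call only when guess != secret)."""
    difference = abs(secret - guess)
    direction = "higher" if secret > guess else "lower"
    parity = "even" if secret % 2 == 0 else "odd"
    hint = f"Go {direction}. The number is {parity}"
    # Add one divisibility clue when it applies
    for divisor in (3, 5):
        if secret % divisor == 0:
            hint += f" and divisible by {divisor}"
            break
    # Use the difference between the two numbers as an extra clue
    hint += ". You are very close!" if difference <= 2 else "."
    return hint


def score_after(wrong_guesses, start=100, penalty=10):
    """Each wrong guess costs `penalty` points; the score never drops below 0."""
    return max(start - wrong_guesses * penalty, 0)
```

Each extra clue makes the game easier, which is why the score drops with every wrong guess rather than with every hint shown.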
SOURCE CODE
""" Number Guessing Game
----------------------------------------
"""
import random

attempts_list = []

def show_score():
    if len(attempts_list) <= 0:
        print("There is currently no high score, it's yours for the taking!")
    else:
        print("The current high score is {} attempts".format(min(attempts_list)))

def start_game():
    random_number = random.randint(1, 10)
    print("Hello traveler! Welcome to the game of guesses!")
    player_name = input("What is your name? ")
    wanna_play = input("Hi, {}, would you like to play the guessing game? (Enter Yes/No) ".format(player_name))
    attempts = 0
    # Show the current high score before the first round
    show_score()
    while wanna_play.lower() == "yes":
        try:
            # int() raises ValueError if the user did not type a whole number
            guess = int(input("Pick a number between 1 and 10 "))
            if guess < 1 or guess > 10:
                raise ValueError("Please guess a number within the given range")
            if guess == random_number:
                print("Nice! You got it!")
                attempts += 1
                attempts_list.append(attempts)
                print("It took you {} attempts".format(attempts))
                play_again = input("Would you like to play again? (Enter Yes/No) ")
                # Reset for a new round with a fresh secret number
                attempts = 0
                show_score()
                random_number = random.randint(1, 10)
                if play_again.lower() == "no":
                    print("That's cool, have a good one!")
                    break
            elif guess > random_number:
                print("It's lower")
                attempts += 1
            elif guess < random_number:
                print("It's higher")
                attempts += 1
        except ValueError as err:
            print("Oh no!, that is not a valid value. Try again...")
            print("({})".format(err))
    else:
        # Runs when the loop ends without break, i.e. the player declined to play
        print("That's cool, have a good one!")

if __name__ == '__main__':
    start_game()
3. ROCK PAPER SCISSORS GAME
This mini-game is handy when you have no one to play with, for example when you are alone under lockdown. The program needs several functions, so let us have an overview of each:
- random function: to generate rock, paper, or scissors.
- valid function: to check the validity of the move.
- result function: to declare the winner of the round.
- scorekeeper: to keep track of the score.
The program asks the user to move first and then makes its own move. Once the user's move is validated, the input is evaluated; it can be either a whole word or a single letter. The result function then decides the winner, and the scorekeeper updates the score for the round.
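The source code below handles everything in one loop and does not keep score. A sketch of the four-function design described above, written for Python 3, might look like the following; `random_move`, `is_valid`, `normalize`, `result`, and `Scorekeeper` are hypothetical names chosen for illustration.

```python
import random

MOVES = {"R": "rock", "P": "paper", "S": "scissors"}
# Each key beats the move it maps to: rock beats scissors, and so on
BEATS = {"R": "S", "P": "R", "S": "P"}


def random_move():
    """Random function: the computer picks R, P, or S."""
    return random.choice(list(MOVES))


def is_valid(move):
    """Valid function: accept a single letter or a full word, case-insensitively."""
    move = move.strip().lower()
    return move in ("r", "p", "s") or move in MOVES.values()


def normalize(move):
    """Turn 'Rock', 'r', or 'ROCK' into 'R' (call only after is_valid)."""
    return move.strip()[0].upper()


def result(user, computer):
    """Result function: return 'user', 'computer', or 'tie'."""
    if user == computer:
        return "tie"
    return "user" if BEATS[user] == computer else "computer"


class Scorekeeper:
    """Scorekeeper: tallies the outcome of every round."""

    def __init__(self):
        self.scores = {"user": 0, "computer": 0, "tie": 0}

    def record(self, outcome):
        self.scores[outcome] += 1
        return self.scores
```

In a game loop you would check the input with `is_valid`, then pass `result(normalize(choice), random_move())` to `Scorekeeper.record`. Encoding the rules in the `BEATS` table avoids a long chain of `elif` comparisons.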
SOURCE CODE
""" Rock Paper Scissors
----------------------------------------
"""
import os
import random
import re

# Clear the terminal ("cls" on Windows, "clear" elsewhere)
os.system('cls' if os.name == 'nt' else 'clear')

while True:
    print("\n")
    print("Rock, Paper, Scissors - Shoot!")
    user_choice = input("Choose your weapon [R]ock, [P]aper, or [S]cissors: ")
    # Accept input that starts with R, P, or S, in either case
    if not re.match("[SsRrPp]", user_choice):
        print("Please choose a letter:")
        print("[R]ock, [S]cissors or [P]aper.")
        continue
    # Keep only the first letter, upper-cased, so "rock" and "R" both become "R"
    user_choice = user_choice[0].upper()
    # Echo the user's choice
    print("You chose: " + user_choice)
    choices = ['R', 'P', 'S']
    opponent_choice = random.choice(choices)
    print("I chose: " + opponent_choice)
    if opponent_choice == user_choice:
        print("Tie!")
    elif opponent_choice == 'R' and user_choice == 'S':
        print("Rock beats scissors, I win!")
    elif opponent_choice == 'S' and user_choice == 'P':
        print("Scissors beats paper, I win!")
    elif opponent_choice == 'P' and user_choice == 'R':
        print("Paper beats rock, I win!")
    else:
        print("You win!")
4. WEBSITE BLOCKER
While we browse the web, many unwanted sites pop up to distract us. This project helps in such cases by blocking certain websites from opening. It is useful for people who are easily tempted to switch to social media while working on something serious.
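The blocker works by editing the system hosts file so that blocked domains resolve to 127.0.0.1. Because the source code below writes to /etc/hosts directly, which needs administrator rights, it can help to put the text manipulation into pure functions that you can test on a plain string first. The helpers `block` and `unblock` are hypothetical, not part of the original project.

```python
REDIRECT = "127.0.0.1"


def block(hosts_text, sites):
    """Return hosts_text with a redirect line appended for every site not yet listed."""
    lines = hosts_text.splitlines()
    for site in sites:
        # Compare whole hostnames so "facebook.com" does not match "www.facebook.com"
        if not any(site in line.split() for line in lines):
            lines.append(f"{REDIRECT} {site}")
    return "\n".join(lines) + "\n"


def unblock(hosts_text, sites):
    """Return hosts_text without the redirect lines for the given sites."""
    kept = [
        line for line in hosts_text.splitlines()
        if not (line.startswith(REDIRECT) and any(site in line.split() for site in sites))
    ]
    return "\n".join(kept) + "\n"
```

To apply it, read the hosts file, pass its contents through `block` or `unblock`, and write the result back while running with administrator rights. Matching whole hostnames instead of substrings is a deliberate choice: the substring test in the source code treats "facebook.com" as already present once "www.facebook.com" is listed.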
SOURCE CODE
""" Website Blocker
----------------------------------------
"""
import time
from datetime import datetime as dt

# Hosts file on Linux/macOS (r is for raw string); on Windows it is
# C:\Windows\System32\drivers\etc\hosts. Editing it requires admin/root rights.
hosts_path = r"/etc/hosts"
hosts_temp = "hosts"  # A local copy you can point hosts_path at while testing
redirect = "127.0.0.1"
# Users can modify the list of websites they want to block
web_sites_list = ["www.facebook.com", "facebook.com"]

while True:
    now = dt.now()
    # Block the sites between 09:00 and 22:00
    if dt(now.year, now.month, now.day, 9) < now < dt(now.year, now.month, now.day, 22):
        print("Working hours")
        with open(hosts_path, "r+") as file:
            content = file.read()
            for website in web_sites_list:
                if website not in content:
                    # After read() the pointer is at the end, so this appends
                    file.write(redirect + " " + website + "\n")
    else:
        print("Fun time")
        with open(hosts_path, "r+") as file:
            content = file.readlines()
            file.seek(0)  # Reset the pointer to the top of the text file
            for line in content:
                # Rewrite the whole file, keeping only lines that mention no blocked site
                if not any(website in line for website in web_sites_list):
                    file.write(line)
            file.truncate()  # Delete the leftover trailing lines of the old content
    time.sleep(5)
5. BINARY SEARCH ALGORITHM
The name says it all. The program requires you to create a sorted list of numbers from 0 up to whatever limit you prefer, with each number 2 greater than the one before it.
When the user enters a number to search for, the program compares it with the middle element of the list. If they match, the search is over; if the number is smaller, the right half is discarded, and if it is larger, the left half is discarded. The search repeats on the remaining half until the number is found or the sublist becomes empty. The same approach works for searching any element in a sorted list.
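Python's standard library already provides the halving step in the `bisect` module. As an alternative to hand-writing the loop, the sketch below builds the list described above with `range(0, limit, 2)` and uses `bisect_left` to locate an item; the function name `find` and the limit of 100 are illustrative choices.

```python
from bisect import bisect_left


def find(sorted_list, item):
    """Return the index of item in sorted_list, or -1 if it is absent.

    bisect_left performs the binary search: it returns the leftmost
    position where item could be inserted while keeping the list sorted.
    """
    i = bisect_left(sorted_list, item)
    if i < len(sorted_list) and sorted_list[i] == item:
        return i
    return -1


# The list from the description: 0, 2, 4, ..., 98 (each number 2 more than the last)
numbers = list(range(0, 100, 2))
```

For example, `find(numbers, 42)` returns 21 and `find(numbers, 43)` returns -1. The extra index check is needed because `bisect_left` reports an insertion point, not whether the item is actually present.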
SOURCE CODE
""" Binary Search Algorithm
----------------------------------------
"""
# Iterative implementation of binary search in Python
def binary_search(a_list, item):
    """Performs iterative binary search to find the position of an integer in a given, sorted, list.
    a_list -- sorted list of integers
    item -- integer you are searching for the position of
    """
    first = 0
    last = len(a_list) - 1
    while first <= last:
        # Integer division keeps the middle index an int in Python 3
        i = (first + last) // 2
        if a_list[i] == item:
            return '{item} found at position {i}'.format(item=item, i=i)
        elif a_list[i] > item:
            last = i - 1
        else:
            first = i + 1
    return '{item} not found in the list'.format(item=item)

# Recursive implementation of binary search in Python
def binary_search_recursive(a_list, item):
    """Performs recursive binary search of an integer in a given, sorted, list.
    a_list -- sorted list of integers
    item -- integer you are searching for the position of
    """
    first = 0
    last = len(a_list) - 1
    if len(a_list) == 0:
        return '{item} was not found in the list'.format(item=item)
    else:
        i = (first + last) // 2
        if item == a_list[i]:
            return '{item} found'.format(item=item)
        else:
            # Recurse into the half that can still contain the item
            if a_list[i] < item:
                return binary_search_recursive(a_list[i+1:], item)
            else:
                return binary_search_recursive(a_list[:i], item)
6. CALCULATOR
Building this project teaches you to design a graphical UI and familiarizes you with a library like Tkinter, which lets you create buttons that perform different operations and display the results on screen. The source code below is a simpler command-line version of the calculator.
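In a Tkinter calculator, the number and operator buttons typically append characters to an entry field, and the "=" button evaluates the text. Python's built-in `eval()` would run any code typed there, so the sketch below, a hypothetical `evaluate` helper, walks the expression's syntax tree with the standard `ast` module and allows only numbers, `+`, `-`, `*`, `/`, and parentheses. The Tkinter widgets themselves are left out.

```python
import ast
import operator

# Allowed operators: AST node type -> arithmetic function
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval(node):
    """Recursively evaluate an AST node, rejecting anything that is not plain arithmetic."""
    # type() check (not isinstance) so True/False are rejected, not treated as 1/0
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        return BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_eval(node.operand))
    raise ValueError("Unsupported expression")


def evaluate(expression):
    """Evaluate a calculator expression such as '12 + 3 * 4' (hypothetical helper)."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)
```

The "=" button's callback could call `evaluate` on the entry's text and catch `SyntaxError`, `ValueError`, and `ZeroDivisionError` to show an error in the display instead of crashing.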
SOURCE CODE
""" Calculator
----------------------------------------
"""
def addition():
    print("Addition")
    n = float(input("Enter the number: "))
    t = 0  # Count of numbers entered
    ans = 0
    while n != 0:
        ans = ans + n
        t += 1
        n = float(input("Enter another number (0 to calculate): "))
    return [ans, t]

def subtraction():
    print("Subtraction")
    n = float(input("Enter the number: "))
    t = 0  # Count of numbers entered
    ans = 0
    while n != 0:
        # The first number is the starting value; each later number is subtracted from it
        ans = n if t == 0 else ans - n
        t += 1
        n = float(input("Enter another number (0 to calculate): "))
    return [ans, t]

def multiplication():
    print("Multiplication")
    n = float(input("Enter the number: "))
    t = 0  # Count of numbers entered
    ans = 1
    while n != 0:
        ans = ans * n
        t += 1
        n = float(input("Enter another number (0 to calculate): "))
    return [ans, t]

def average():
    # Reuse addition() to get the sum and the count of numbers entered
    a, t = addition()
    # Avoid dividing by zero when no numbers were entered
    if t == 0:
        return [0, 0]
    return [a / t, t]

# main...
while True:
    print(" My first python program!")
    print(" Simple Calculator in python by Malik Umer Farooq")
    print(" Enter 'a' for addition")
    print(" Enter 's' for subtraction")
    print(" Enter 'm' for multiplication")
    print(" Enter 'v' for average")
    print(" Enter 'q' for quit")
    c = input(" ")
    if c == 'q':
        break
    if c == 'a':
        result = addition()
    elif c == 's':
        result = subtraction()
    elif c == 'm':
        result = multiplication()
    elif c == 'v':
        result = average()
    else:
        print("Sorry, invalid character")
        continue
    print("Ans =", result[0], "total inputs", result[1])
7. ALARM CLOCK
This is an interesting command-line interface (CLI) Python application for intermediate-level developers. People around the world use the alarm feature on their devices, but this project adds a twist: you store some YouTube links in a text file, and when the alarm goes off, the program picks a random link from the file and opens it in the browser.
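The core calculation in the alarm clock is how long to sleep until the alarm time, rolling over to the next day if that time has already passed today. The source code below does this with manual second arithmetic; a sketch using `datetime` objects instead, with a hypothetical `seconds_until` helper, could look like this:

```python
import datetime


def seconds_until(hour, minute=0, second=0, now=None):
    """Seconds from `now` until the next occurrence of hour:minute:second (hypothetical helper)."""
    if now is None:
        now = datetime.datetime.now()
    alarm = now.replace(hour=hour, minute=minute, second=second, microsecond=0)
    # If the alarm time has already passed today (or is right now), schedule it for tomorrow
    if alarm <= now:
        alarm += datetime.timedelta(days=1)
    return (alarm - now).total_seconds()
```

Calling `time.sleep(seconds_until(6, 30))` would then wait until the next 06:30. Passing `now` explicitly makes the function easy to test with fixed times; note that naive local times like these do not account for daylight-saving changes.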
SOURCE CODE
""" Alarm Clock
----------------------------------------
"""
import datetime
import os
import random
import time
import webbrowser

# If the video URL file does not exist, create one
if not os.path.isfile("youtube_alarm_videos.txt"):
    print('Creating "youtube_alarm_videos.txt"...')
    with open("youtube_alarm_videos.txt", "w") as alarm_file:
        alarm_file.write("https://www.youtube.com/watch?v=anM6uIZvx74")

def check_alarm_input(alarm_time):
    """Checks to see if the user has entered in a valid alarm time"""
    # Accepts [Hour], [Hour:Minute] or [Hour:Minute:Second]
    limits = [24, 60, 60]  # Upper bounds for hours, minutes and seconds
    if not 1 <= len(alarm_time) <= 3:
        return False
    return all(0 <= value < limit for value, limit in zip(alarm_time, limits))

# Get user input for the alarm time
print("Set a time for the alarm (Ex. 06:30 or 18:30:00)")
while True:
    alarm_input = input(">> ")
    try:
        alarm_time = [int(n) for n in alarm_input.split(":")]
        if check_alarm_input(alarm_time):
            break
        else:
            raise ValueError
    except ValueError:
        print("ERROR: Enter time in HH:MM or HH:MM:SS format")

# Convert the alarm time from [H:M] or [H:M:S] to seconds
seconds_hms = [3600, 60, 1]  # Number of seconds in an hour, minute, and second
alarm_seconds = sum([a * b for a, b in zip(seconds_hms[:len(alarm_time)], alarm_time)])

# Get the current time of day in seconds
now = datetime.datetime.now()
current_time_seconds = sum([a * b for a, b in zip(seconds_hms, [now.hour, now.minute, now.second])])

# Calculate the number of seconds until the alarm goes off
time_diff_seconds = alarm_seconds - current_time_seconds

# If the time difference is negative, set the alarm for the next day
if time_diff_seconds < 0:
    time_diff_seconds += 86400  # Number of seconds in a day

# Display the amount of time until the alarm goes off
print("Alarm set to go off in %s" % datetime.timedelta(seconds=time_diff_seconds))

# Sleep until the alarm goes off
time.sleep(time_diff_seconds)

# Time for the alarm to go off
print("Wake Up!")

# Load the list of possible video URLs, dropping newlines and blank lines
with open("youtube_alarm_videos.txt", "r") as alarm_file:
    videos = [line.strip() for line in alarm_file if line.strip()]

# Open a random video from the list
webbrowser.open(random.choice(videos))
8. TIC-TAC-TOE
This game is popular with almost everyone and fun to build as a Python project. Most of us know how to play it, but here is a quick refresher.
It is a two-player game played on a nine-square grid. Each player is assigned X or O and marks one square per turn. The first player to get three marks in a line, whether horizontally, vertically, or diagonally, wins. Each player must also block the opponent's lines while building their own.
To build a graphical version of this project, you can use the Pygame library, which provides support for graphics and sound. The source code below is a text-based version that plays against a simple computer opponent.
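The winning rule described above comes down to checking eight fixed lines on the grid: three rows, three columns, and two diagonals. A standalone sketch, assuming the board is a list of nine cells holding 'X', 'O', or None (a different representation from the source code below, which stores each empty cell's index), might be:

```python
# The eight winning lines as 0-based cell indices: rows, columns, diagonals
WINNING_LINES = (
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
)


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WINNING_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_draw(board):
    """The game is a draw when every cell is filled and nobody has won."""
    return all(cell is not None for cell in board) and winner(board) is None
```

`winner` and `is_draw` are hypothetical names; checking the board after every move with these two functions is enough to decide when the game ends.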
SOURCE CODE
""" Tic Tac Toe
----------------------------------------
"""
import random

# Empty cells hold their own index (0-8); taken cells hold 'X' or 'O'
board = [i for i in range(0, 9)]
player, computer = '', ''
# Preferred moves (1-based): corners, center, and edges, respectively
moves = ((1, 7, 3, 9), (5,), (2, 4, 6, 8))
# Winning combinations (0-based cell indices)
winners = ((0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6))
# Move numbers the player can type
tab = range(1, 10)

def print_board():
    x = 1
    for i in board:
        end = ' | '
        if x % 3 == 0:
            end = ' \n'
            # Draw a separator after every row except the last
            if x != 9:
                end += '---------\n'
        char = ' '
        if i in ('X', 'O'):
            char = i
        x += 1
        print(char, end=end)

def select_char():
    # Randomly decide whether the player gets X or O
    chars = ('X', 'O')
    if random.randint(0, 1) == 0:
        return chars[::-1]
    return chars

def can_move(brd, player, move):
    # A move is legal if it is 1-9 and that cell is still empty
    return move in tab and brd[move - 1] == move - 1

def can_win(brd, player, move):
    # True if the player occupies every cell of any winning combination
    return any(all(brd[ix] == player for ix in tup) for tup in winners)

def make_move(brd, player, move, undo=False):
    if can_move(brd, player, move):
        brd[move - 1] = player
        win = can_win(brd, player, move)
        if undo:
            # Trial move only: restore the empty cell
            brd[move - 1] = move - 1
        return (True, win)
    return (False, False)

# Simple rule-based AI
def computer_move():
    move = -1
    # If the computer can win, nothing else matters
    for i in range(1, 10):
        if make_move(board, computer, i, True)[1]:
            move = i
            break
    if move == -1:
        # If the player can win next turn, block them
        for i in range(1, 10):
            if make_move(board, player, i, True)[1]:
                move = i
                break
    if move == -1:
        # Otherwise, take the first free cell among corners, center, then edges
        for tup in moves:
            for mv in tup:
                if move == -1 and can_move(board, computer, mv):
                    move = mv
                    break
    return make_move(board, computer, move)

def space_exist():
    return board.count('X') + board.count('O') != 9

player, computer = select_char()
print('Player is [%s] and computer is [%s]' % (player, computer))
result = '%%% Draw! %%%'
while space_exist():
    print_board()
    print('#Make your move ! [1-9] : ', end='')
    try:
        move = int(input())
    except ValueError:
        move = -1  # Non-numeric input counts as an invalid move
    moved, won = make_move(board, player, move)
    if not moved:
        print(' >> Invalid number ! Try again !')
        continue
    if won:
        result = '*** Congratulations ! You won ! ***'
        break
    elif computer_move()[1]:
        result = '=== You lose ! ==='
        break
print_board()
print(result)
9. DIRECTORY TREE GENERATOR
This project helps you visualize the relationship between files and directories and makes their layout easy to understand. Python's os module can list the files and directories within a given directory, and Docopt or Argparse are good choices for building the command-line interface. The source code below prints the tree in the format used by the LaTeX dirtree package.
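The source code below depends on the third-party walkdir package. If you only need a plain-text tree, the standard library is enough: `os.walk` yields each directory together with its subdirectories and files, and the depth can be derived from the path relative to the root. The `tree_lines` function below is a hypothetical minimal version.

```python
import os


def tree_lines(root):
    """Return an indented listing of root, two spaces per level, sorted by name."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place also makes os.walk visit subdirectories in order
        dirnames.sort()
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        # normpath drops a trailing slash so basename is never empty
        lines.append("  " * depth + os.path.basename(os.path.normpath(dirpath)) + "/")
        for name in sorted(filenames):
            lines.append("  " * (depth + 1) + name)
    return lines
```

Printing `"\n".join(tree_lines("."))` shows the current directory. In this version each directory's files are listed before its subdirectories, because `os.walk` reports a directory's files before descending into it.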
SOURCE CODE
""" Directory Tree Generator
----------------------------------------
"""
import argparse
import os

# Third-party dependency: pip install walkdir
from walkdir import filtered_walk

parser = argparse.ArgumentParser(description='Print the directory-tree code for the LaTeX dirtree package.')
parser.add_argument(dest='path', type=str, help="Root directory of the tree")
parser.add_argument('-d', '--maxDepth', dest='maxDepth', type=int, help="Max depth for tree expansion")
parser.add_argument('-H', '--includeHidden', dest='includeHidden', action='store_true', help='Include hidden files')
parser.add_argument('-S', '--includeSystem', dest='includeSystem', action='store_true', help='Include system files')
system_file_names = [".DS_Store"]

# Delete trailing / in rootDir, which can lead to errors
def delete_trailing_slash(path_name):
    while path_name.endswith('/'):
        path_name = path_name[:-1]
    return path_name

# Count how many levels deep the directory is with respect to rootDir
def get_relative_depth(dir_path, level_offset):
    return dir_path.count(os.path.sep) - level_offset

# Escape characters that have a special meaning in LaTeX
def escape_illegal(name):
    illegal_char_array = ['\\', '&', '%', '$', '#', '_', '{', '}', '~', '^']
    for char in illegal_char_array:
        name = name.replace(char, "\\" + char)
    return name

# Parse the command-line arguments once
args = parser.parse_args()
rootDir = delete_trailing_slash(args.path)
includeHidden = args.includeHidden
includeSystem = args.includeSystem
maxDepth = args.maxDepth

# If the directory exists
if os.path.isdir(rootDir):
    indentChar = " "
    # Depth of the root (i.e. number of path separators)
    levelOffset = rootDir.count(os.path.sep) - 1
    # Create the exclusion filter
    excluded_filter = []
    if not includeHidden:
        excluded_filter.append(".*")
    if not includeSystem:
        excluded_filter += system_file_names
    print("\\dirtree{%")
    for dirName, subdirList, fileList in sorted(filtered_walk(rootDir, depth=maxDepth, excluded_dirs=excluded_filter,
                                                              excluded_files=excluded_filter)):
        level = get_relative_depth(dirName, levelOffset)
        if level == 1:  # For the first level, print the whole path
            print(indentChar + "." + str(level) + " {" + escape_illegal(dirName) + "} .")
        else:
            print(indentChar * level + "." + str(level) + " {" + escape_illegal(os.path.basename(dirName)) + "} .")
        level += 1
        for fileName in sorted(fileList):
            print(indentChar * level + "." + str(level) + " {" + escape_illegal(fileName) + "} .")
    print("}")
else:
    print("Error: root directory not found")
10. CURRENCY CONVERTER
This is a straightforward project with a simple GUI. As the name suggests, it converts an amount from one currency into another, for example from Indian rupees to US dollars or euros. Tkinter, Python's standard GUI toolkit, can be used to design and build the interface.
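Whatever service supplies the exchange rates, the conversion itself is simple arithmetic. The sketch below assumes you already have a table of rates quoted against one base currency (the figures are made-up placeholders, not real market rates) and converts through that base. `convert` and `RATES_PER_USD` are hypothetical names, and `Decimal` is used to avoid binary floating-point rounding surprises with money.

```python
from decimal import Decimal, ROUND_HALF_UP

# Placeholder rates: units of each currency per 1 USD (made-up values for illustration)
RATES_PER_USD = {
    "USD": Decimal("1"),
    "EUR": Decimal("0.90"),
    "INR": Decimal("80.00"),
}


def convert(amount, from_code, to_code, rates=RATES_PER_USD):
    """Convert amount between two currencies via the base currency (USD)."""
    amount_in_usd = Decimal(str(amount)) / rates[from_code]
    result = amount_in_usd * rates[to_code]
    # Round to two decimal places for display
    return result.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

In the GUI, the amount and currency codes would come from Tkinter entry and drop-down widgets, and the rates table from an exchange-rate API of your choice. Converting through a single base currency means you only need one rate per currency instead of one per pair.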
SOURCE CODE
""" Currency Converter
----------------------------------------
"""
import urllib.request
import json

def currency_converter(currency_from, currency_to, currency_input):
    # NOTE: Yahoo's public YQL service has since been retired, so this request
    # no longer succeeds; swap in an exchange-rate API that is currently available.
    yql_base_url = "https://query.yahooapis.com/v1/public/yql"
    yql_query = 'select%20*%20from%20yahoo.finance.xchange%20where%20pair' \
                '%20in%20("' + currency_from + currency_to + '")'
    yql_query_url = yql_base_url + "?q=" + yql_query + "&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys"
    try:
        yql_response = urllib.request.urlopen(yql_query_url)
        try:
            # Decode the response bytes into a JSON string
            json_string = yql_response.read().decode("utf-8")
            yql_json = json.loads(json_string)
            last_rate = yql_json['query']['results']['rate']['Rate']
            currency_output = currency_input * float(last_rate)
            return currency_output
        except (ValueError, KeyError, TypeError):
            print(yql_query_url)
            return "JSON format error"
    except IOError as e:
        print(str(e))

currency_input = 1
# Currency codes: http://en.wikipedia.org/wiki/ISO_4217
currency_from = "USD"
currency_to = "TRY"
rate = currency_converter(currency_from, currency_to, currency_input)
print(rate)
11. CONTENT AGGREGATOR
Browsing different websites and articles in search of good, authentic content is time-consuming, and this Python project can save you that time. A content aggregator checks popular websites for relevant content, then compiles it and presents it to the user in one place.
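Under the hood, an aggregator fetches several feeds, pulls out the items, and merges them into one list without duplicates. The source code below uses the third-party feedparser package; the sketch here shows only the merging step with the standard library's `xml.etree.ElementTree`, working on RSS 2.0 text that has already been downloaded. `parse_rss` and `aggregate` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Extract (title, link) pairs from the <item> elements of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default="").strip(), item.findtext("link", default="").strip())
        for item in root.iter("item")
    ]


def aggregate(feeds, limit=10):
    """Merge items from several feeds in order, dropping repeated links, up to `limit` items."""
    seen_links = set()
    merged = []
    for xml_text in feeds:
        for title, link in parse_rss(xml_text):
            if link and link not in seen_links:
                seen_links.add(link)
                merged.append((title, link))
    return merged[:limit]
```

Deduplicating by link rather than by title is a simple heuristic: the same story often appears under slightly different headlines on different sites but points to the same URL.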
SOURCE CODE
""" Content Aggregator
----------------------------------------
"""
import pprint

# RSS/Atom feed parser: pip install feedparser
import feedparser
# Reddit API wrapper: pip install praw
import praw
# Stock exchange data (NSE India): pip install nsetools
from nsetools import Nse

# Place your CLIENT_ID & CLIENT_SECRET below
reddit = praw.Reddit(client_id='XXXXXXX',
                     client_secret='XXXXXXXXXXX',
                     grant_type_access='client_credentials',
                     user_agent='script/1.0')

# class Reddit:
#     def TopNews(self):
#         # Add your favorite NEWS subreddits in the argument, as many as you'd like.
#         for submission in reddit.subreddit('News+WorldNews+UpliftingNews+').top(limit=10):
#             top_news = reddit.domain(submission).top('month')
#             print(top_news)

"""
Each class contains functions that call APIs from the
necessary packages; the rest is self-explanatory.
"""

class News:
    def Indian_News(self):
        newsfeed = feedparser.parse(
            "http://feeds.feedburner.com/ndtvnews-india-news"
        )
        print("Today's News: ")
        # Slicing avoids an IndexError when the feed has fewer than 20 entries
        for entry in newsfeed.entries[:20]:
            print(entry.title)
            print(entry.summary)
            print("------News Link--------")
            print(entry.link)
            print("###########################################")
        print('-------------------------------------------------------------------------------------------------------')

class Medium:
    # https://github.com/thepracticaldev/dev.to/issues/28#issuecomment-325544385
    def medium_programming(self):
        feed = feedparser.parse(
            "https://medium.com/feed/tag/programming"
        )
        print("Programming Today: ")
        for entry in feed.entries[:10]:
            print(entry.title)
            print("URL: " + entry.link)
            print("###########################################")
        print('-------------------------------------------------------------------------------------------------------')

    def medium_python(self):
        feed_python = feedparser.parse(
            "https://medium.com/feed/tag/python"
        )
        print("Python Today: ")
        for entry in feed_python.entries[:10]:
            print(entry.title)
            print("URL: " + entry.link)
            print("###########################################")
        print('-------------------------------------------------------------------------------------------------------')

    def medium_developer(self):
        feed_developer = feedparser.parse(
            "https://medium.com/feed/tag/developer"
        )
        print("Developer News Today: ")
        for entry in feed_developer.entries[:5]:
            print(entry.title)
            print("URL: " + entry.link)
            print("###########################################")
        print('-------------------------------------------------------------------------------------------------------')

class StockExchange:
    def nse_stock(self):
        nse = Nse()
        print("TOP GAINERS OF YESTERDAY")
        pprint.pprint(nse.get_top_gainers())
        print("###########################################")
        print("TOP LOSERS OF YESTERDAY")
        pprint.pprint(nse.get_top_losers())
        print("###########################################")
        print('-------------------------------------------------------------------------------------------------------')

# Object initialization
# reddit_object = Reddit()
News_object = News()
Medium_object = Medium()
StockExchange_object = StockExchange()

if __name__ == "__main__":
    # Function calls of each class
    # reddit_object.TopNews()
    News_object.Indian_News()
    Medium_object.medium_python()
    Medium_object.medium_programming()
    Medium_object.medium_developer()
    StockExchange_object.nse_stock()
12. PLAGIARISM CHECKER
Content creation and blogging are good businesses that many people want to try, but most plagiarism checkers are not free, and not everyone can afford one. You can build your own plagiarism checker in Python using a natural language processing library together with a search API that checks the first few pages of Google results for matching text. The source code below covers the comparison step: it finds similar passages in two text files.
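One common way to score overlap between two texts is to split each into word n-grams (runs of n consecutive words) and compare the resulting sets, an idea related to the `--ngrams` option in the source code below. The sketch uses Jaccard similarity, the size of the intersection divided by the size of the union; it is an illustrative technique, not the matching algorithm the original `Matcher` class implements, and `ngrams` and `similarity` are hypothetical names.

```python
import re


def ngrams(text, n=3):
    """Return the set of n-word sequences in text, lower-cased and without punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(text_a, text_b, n=3):
    """Jaccard similarity of the two texts' n-gram sets: 0.0 (no overlap) to 1.0 (identical)."""
    a, b = ngrams(text_a, n), ngrams(text_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Larger values of n make the score stricter, since only longer shared phrases count as matches; values around 3 to 5 are a common starting point for prose.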
SOURCE CODE
""" Plagiarism Checker
----------------------------------------
"""
import click
from .matcher import Text, ExtendedMatch, Matcher
import os
import glob
import csv
import logging
import itertools
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def getFiles(path):
"""
Determines whether a path is a file or directory.
If it's a directory, it gets a list of all the text files
in that directory, recursively. If not, it gets the file.
"""
if os.path.isfile(path):
return [path]
elif os.path.isdir(path):
// Get list of all files in dir, recursively.
return glob.glob(path + "/**/*.txt", recursive=True)
else:
raise click.ClickException("The path %s doesn't appear to be a file or directory" % path)
def checkLog(logfile, textpair):
"""
Checks the log file to make sure we haven't already done a particular analysis.
Returns True if the pair is in the log already.
"""
pairs = []
logging.debug('Looking in the log for textpair:' % textpair)
if not os.path.isfile(logfile):
logging.debug('No log file found.')
return None
with open(logfile, newline='') as f:
reader = csv.reader(f)
for row in reader:
pairs.append([row[0], row[1]])
// logging.debug('Pairs already in log: %s' % pairs)
return textpair in pairs
def createLog(logfile, columnLabels):
"""
Creates a log file and sets up headers so that it can be easily read
as a CSV later.
"""
header = ','.join(columnLabels) + '\n'
with open(logfile, 'w') as f:
f.write(header)
f.close
@click.command()
@click.argument('text1')
@click.argument('text2')
@click.option('-t', '--threshold', type=int, default=3, \
help='The shortest length of match to include in the list of initial matches.')
@click.option('-c', '--cutoff', type=int, default=5, \
help='The shortest length of match to include in the final list of extended matches.')
@click.option('-n', '--ngrams', type=int, default=3, \
help='The ngram n-value to match against.')
@click.option('-l', '--logfile', default='log.txt', help='The name of the log file to write to.'
@click.option('--stops', is_flag=True, help='Include stopwords in matching.', default=False)
@click.option('--verbose', is_flag=True, help='Enable verbose mode, giving more information.')
def cli(text1, text2, threshold, cutoff, ngrams, logfile, verbose, stops):
""" This program finds similar text in two text files. """
//Determine whether the given path is a file or directory.
texts1 = getFiles(text1)
texts2 = getFiles(text2)
if verbose:
logging.basicConfig(level=logging.DEBUG)
if stops:
logging.debug('Including stopwords in tokenizing.')
logging.debug('Comparing this/these text(s): %s' % str(texts1))
logging.debug('with this/these text(s): %s' % str(texts2))
pairs = list(itertools.product(texts1, texts2))
numPairs = len(pairs)
logging.debug('Comparing %s pairs.' % numPairs)
// logging.debug('List of pairs to compare: %s' % pairs)
logging.debug('Loading files into memory.')
texts = {}
prevTextObjs = {}
for filename in texts1+texts2:
with open(filename, errors="ignore") as f:
text = f.read()
if filename not in texts:
texts[filename] = text
logging.debug('Loading complete.')
    for index, pair in enumerate(pairs):
        timeStart = os.times().elapsed
        logging.debug('Now comparing pair %s of %s.' % (index+1, numPairs))
        logging.debug('Comparing %s with %s.' % (pair[0], pair[1]))
        # Make sure we haven't already done this pair.
        inLog = checkLog(logfile, [pair[0], pair[1]])
        if inLog is None:
            # This means that there isn't a log file. Let's set one up.
            # Set up columns and their labels.
            columnLabels = ['Text A', 'Text B', 'Threshold', 'Cutoff', 'N-Grams', 'Num Matches', 'Text A Length', 'Text B Length', 'Locations in A', 'Locations in B']
            logging.debug('No log file found. Setting one up.')
            createLog(logfile, columnLabels)
        if inLog:
            logging.debug('This pair is already in the log. Skipping.')
            continue
        logging.debug('Processing texts.')
        filenameA, filenameB = pair[0], pair[1]
        textA, textB = texts[filenameA], texts[filenameB]
        # Put this in a dictionary so we don't have to process a file twice.
        for filename in [filenameA, filenameB]:
            if filename not in prevTextObjs:
                logging.debug('Processing text: %s' % filename)
                prevTextObjs[filename] = Text(texts[filename], filename)
        # Just more convenient naming.
        textObjA = prevTextObjs[filenameA]
        textObjB = prevTextObjs[filenameB]
        # Reset the table of previous text objects, so we don't overload memory.
        # This means we'll only remember the previous two texts.
        prevTextObjs = {filenameA: textObjA, filenameB: textObjB}
        # Do the matching.
        myMatch = Matcher(textObjA, textObjB, threshold=threshold, cutoff=cutoff, ngramSize=ngrams, removeStopwords=stops)
        myMatch.match()
        timeEnd = os.times().elapsed
        timeElapsed = timeEnd-timeStart
        logging.debug('Matching completed in %s seconds.' % timeElapsed)
        # Write to the log, but only if a match is found.
        if myMatch.numMatches > 0:
            logItems = [pair[0], pair[1], threshold, cutoff, ngrams, myMatch.numMatches, myMatch.textA.length, myMatch.textB.length, str(myMatch.locationsA), str(myMatch.locationsB)]
            logging.debug('Logging items: %s' % str(logItems))
            line = ','.join(['"%s"' % item for item in logItems]) + '\n'
            # Append the row; the with-block closes the file even on error.
            with open(logfile, 'a') as f:
                f.write(line)
if __name__ == '__main__':
    cli()
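The matcher builds each log line by wrapping every field in double quotes by hand, so a field that itself contains a double quote (a file name, for example) produces a row that CSV readers cannot parse correctly. One alternative, sketched here on the assumption that you are free to use the standard csv module, is to let csv.writer handle the quoting; append_match_row is a hypothetical helper name, not part of the original project.

```python
import csv
import io


def append_match_row(stream, items):
    """Append one log row to an open text stream, quoting every field.

    csv.QUOTE_ALL wraps each field in double quotes, like the hand-built
    line in the listing, but also doubles any quote character inside a
    field so that the row can be read back unambiguously.
    """
    writer = csv.writer(stream, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(items)


# Example: a row whose first field contains double quotes.
buffer = io.StringIO()
append_match_row(buffer, ['a "quoted" name.txt', "b.txt", 3, 5, 3, 2])
print(buffer.getvalue(), end="")
# "a ""quoted"" name.txt","b.txt","3","5","3","2"

# Reading the row back recovers the original values (as strings).
row = next(csv.reader(io.StringIO(buffer.getvalue())))
print(row[0])  # a "quoted" name.txt
```

In the matcher loop, the log file could be opened with open(logfile, 'a', newline='') and passed to append_match_row together with logItems; the csv module documentation recommends newline='' for files that csv writes.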
13. WEB CRAWLER
A web crawler is an automated script that browses the internet and stores the content of selected webpages. Crawlers are especially useful for collecting up-to-date information, and larger crawlers often use multiple threads to fetch several pages at once. Crawler bots are commonly built with the third-party requests library or with Scrapy, an open-source Python framework designed for crawling websites and extracting data, including through APIs. The source code below takes a simpler route: it is written for Python 2, fetches pages with urllib2, extracts links with BeautifulSoup, and crawls breadth-first from a root URL up to a maximum depth.
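At its core, a crawler fetches a page, pulls out every link, and resolves each one against the page's own URL. The following minimal sketch shows that step in Python 3 using only the standard library (html.parser and urllib.parse) rather than requests or Scrapy. The LinkExtractor class and extract_links function are illustrative names, and the HTML is a made-up string rather than a live page.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links, then drop any "#fragment" so that
                # post.html and post.html#top count as the same page.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.links:
                    self.links.append(absolute)


def extract_links(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = ('<p><a href="/about">About</a> <a href="post.html#top">Post</a> '
        '<a href="https://example.org/">Out</a></p>')
for link in extract_links("https://example.com/blog/", page):
    print(link)
# https://example.com/about
# https://example.com/blog/post.html
# https://example.org/
```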
SOURCE CODE
""" Web Crawler
----------------------------------------
"""
import re
import sys
import time
import math
import urllib2
import urlparse
import optparse
import hashlib
from cgi import escape
from traceback import format_exc
from Queue import Queue, Empty as QueueEmpty
from bs4 import BeautifulSoup
class Link (object):
    def __init__(self, src, dst, link_type):
        self.src = src
        self.dst = dst
        self.link_type = link_type
    def __hash__(self):
        return hash((self.src, self.dst, self.link_type))
    def __eq__(self, other):
        return (self.src == other.src and
                self.dst == other.dst and
                self.link_type == other.link_type)
    def __str__(self):
        return self.src + " -> " + self.dst
class Crawler(object):
    def __init__(self, root, depth_limit, confine=None, exclude=[], locked=True, filter_seen=True):
        self.root = root
        self.host = urlparse.urlparse(root)[1]
        ## Data for filters:
        self.depth_limit = depth_limit # Max depth (number of hops from root)
        self.locked = locked # Limit search to a single host?
        self.confine_prefix = confine # Limit search to this prefix
        self.exclude_prefixes = exclude # URL prefixes NOT to visit
        self.urls_seen = set() # Used to avoid putting duplicates in queue
        self.urls_remembered = set() # For reporting to user
        self.visited_links = set() # Used to avoid re-processing a page
        self.links_remembered = set() # For reporting to user
        self.num_links = 0 # Links found (and not excluded by filters)
        self.num_followed = 0 # Links followed.
        # Pre-visit filters: Only visit a URL if it passes these tests
        self.pre_visit_filters = [self._prefix_ok,
                                  self._exclude_ok,
                                  self._not_visited,
                                  self._same_host]
        # Out-url filters: When examining a visited page, only process
        # links where the target matches these filters.
        if filter_seen:
            self.out_url_filters = [self._prefix_ok,
                                    self._same_host]
        else:
            self.out_url_filters = []
    def _pre_visit_url_condense(self, url):
        """ Reduce (condense) URLs into some canonical form before
        visiting. All occurrences of equivalent URLs are treated as
        identical.
        All this does is strip the "fragment" component from URLs,
        so that http://foo.com/blah.html#baz becomes
        http://foo.com/blah.html """
        base, frag = urlparse.urldefrag(url)
        return base
    # URL Filtering functions. These all use information from the
    # state of the Crawler to evaluate whether a given URL should be
    # used in some context. Return value of True indicates that the
    # URL should be used.
    def _prefix_ok(self, url):
        """Pass if the URL has the correct prefix, or none is specified"""
        return (self.confine_prefix is None or
                url.startswith(self.confine_prefix))
    def _exclude_ok(self, url):
        """Pass if the URL does not match any exclude patterns"""
        prefixes_ok = [not url.startswith(p) for p in self.exclude_prefixes]
        return all(prefixes_ok)
    def _not_visited(self, url):
        """Pass if the URL has not already been visited"""
        return (url not in self.visited_links)
    def _same_host(self, url):
        """Pass if the URL is on the same host as the root URL"""
        try:
            host = urlparse.urlparse(url)[1]
            return re.match(".*%s" % self.host, host)
        except Exception, e:
            print >> sys.stderr, "ERROR: Can't process url '%s' (%s)" % (url, e)
            return False
    def crawl(self):
        """ Main function in the crawling process. Core algorithm is:
        q <- starting page
        while q not empty:
            url <- q.get()
            if url is new and suitable:
                page <- fetch(url)
                q.put(urls found in page)
            else:
                nothing
        "new and suitable" means that we don't re-visit URLs we've
        already fetched, and that user-supplied criteria like maximum
        search depth are checked. """
        q = Queue()
        q.put((self.root, 0))
        while not q.empty():
            this_url, depth = q.get()
            # Non-URL-specific filter: Discard anything over depth limit
            if depth > self.depth_limit:
                continue
            # Apply URL-based filters.
            do_not_follow = [f for f in self.pre_visit_filters if not f(this_url)]
            # Special-case depth 0 (starting URL)
            if depth == 0 and [] != do_not_follow:
                print >> sys.stderr, "Whoops! Starting URL %s rejected by the following filters: %s" % (this_url, do_not_follow)
            # If no filters failed (that is, all passed), process URL
            if [] == do_not_follow:
                try:
                    self.visited_links.add(this_url)
                    self.num_followed += 1
                    page = Fetcher(this_url)
                    page.fetch()
                    for link_url in [self._pre_visit_url_condense(l) for l in page.out_links()]:
                        if link_url not in self.urls_seen:
                            q.put((link_url, depth+1))
                            self.urls_seen.add(link_url)
                        do_not_remember = [f for f in self.out_url_filters if not f(link_url)]
                        if [] == do_not_remember:
                            self.num_links += 1
                            self.urls_remembered.add(link_url)
                            link = Link(this_url, link_url, "href")
                            if link not in self.links_remembered:
                                self.links_remembered.add(link)
                except Exception, e:
                    print >> sys.stderr, "ERROR: Can't process url '%s' (%s)" % (this_url, e)
                    #print format_exc()
class OpaqueDataException (Exception):
    def __init__(self, message, mimetype, url):
        Exception.__init__(self, message)
        self.mimetype = mimetype
        self.url = url
class Fetcher(object):
    """The name Fetcher is a slight misnomer: This class retrieves and interprets web pages."""
    def __init__(self, url):
        self.url = url
        self.out_urls = []
    def __getitem__(self, x):
        return self.out_urls[x]
    def out_links(self):
        return self.out_urls
    #def _addHeaders(self, request):
    #    request.add_header("User-Agent", AGENT)
    def _open(self):
        url = self.url
        try:
            request = urllib2.Request(url)
            handle = urllib2.build_opener()
        except IOError:
            # Return a pair so the caller's tuple unpacking still works.
            return (None, None)
        return (request, handle)
    def fetch(self):
        request, handle = self._open()
        #self._addHeaders(request)
        if handle:
            try:
                data = handle.open(request)
                mime_type = data.info().gettype()
                url = data.geturl()
                if mime_type != "text/html":
                    raise OpaqueDataException("Not interested in files of type %s" % mime_type,
                                              mime_type, url)
                content = unicode(data.read(), "utf-8",
                                  errors="replace")
                soup = BeautifulSoup(content, "html.parser")
                tags = soup('a')
            except urllib2.HTTPError, error:
                if error.code == 404:
                    print >> sys.stderr, "ERROR: %s -> %s" % (error, error.url)
                else:
                    print >> sys.stderr, "ERROR: %s" % error
                tags = []
            except urllib2.URLError, error:
                print >> sys.stderr, "ERROR: %s" % error
                tags = []
            except OpaqueDataException, error:
                print >> sys.stderr, "Skipping %s, has type %s" % (error.url, error.mimetype)
                tags = []
            for tag in tags:
                href = tag.get("href")
                if href is not None:
                    url = urlparse.urljoin(self.url, escape(href))
                    if url not in self:
                        self.out_urls.append(url)
def getLinks(url):
    page = Fetcher(url)
    page.fetch()
    """for i, url in enumerate(page):
        print "%d. %s" % (i, url) """
    j = 1
    for i, url in enumerate(page):
        if url.find("http") >= 0:
            print "%d. %s" % (j, url)
            j = j + 1
def parse_options():
    """parse_options() -> opts, args
    Parse any command-line options given returning both
    the parsed options and arguments.
    """
    parser = optparse.OptionParser()
    parser.add_option("-q", "--quiet",
                      action="store_true", default=False, dest="quiet",
                      help="Enable quiet mode")
    parser.add_option("-l", "--links",
                      action="store_true", default=False, dest="links",
                      help="Get links for specified url only")
    parser.add_option("-d", "--depth",
                      action="store", type="int", default=30, dest="depth_limit",
                      help="Maximum depth to traverse")
    parser.add_option("-c", "--confine",
                      action="store", type="string", dest="confine",
                      help="Confine crawl to specified prefix")
    parser.add_option("-x", "--exclude", action="append", type="string",
                      dest="exclude", default=[], help="Exclude URLs by prefix")
    parser.add_option("-L", "--show-links", action="store_true", default=False,
                      dest="out_links", help="Output links found")
    parser.add_option("-u", "--show-urls", action="store_true", default=False,
                      dest="out_urls", help="Output URLs found")
    parser.add_option("-D", "--dot", action="store_true", default=False,
                      dest="out_dot", help="Output Graphviz dot file")
    opts, args = parser.parse_args()
    if len(args) < 1:
        parser.print_help(sys.stderr)
        raise SystemExit, 1
    if opts.out_links and opts.out_urls:
        parser.print_help(sys.stderr)
        parser.error("options -L and -u are mutually exclusive")
    return opts, args
class DotWriter:
    """ Formats a collection of Link objects as a Graphviz (Dot)
    graph. Mostly, this means creating a node for each URL with a
    name which Graphviz will accept, and declaring links between those
    nodes."""
    def __init__ (self):
        self.node_alias = {}
    def _safe_alias(self, url, silent=False):
        """Translate URLs into unique strings guaranteed to be safe as
        node names in the Graphviz language. Currently, that's based
        on the md5 digest, in hexadecimal."""
        if url in self.node_alias:
            return self.node_alias[url]
        else:
            m = hashlib.md5()
            m.update(url)
            name = "N"+m.hexdigest()
            self.node_alias[url] = name
            if not silent:
                print "\t%s [label=\"%s\"];" % (name, url)
            return name
    def asDot(self, links):
        """ Render a collection of Link objects as a Dot graph"""
        print "digraph Crawl {"
        print "\t edge [K=0.2, len=0.1];"
        for l in links:
            print "\t" + self._safe_alias(l.src) + " -> " + self._safe_alias(l.dst) + ";"
        print "}"
def main():
    opts, args = parse_options()
    url = args[0]
    if opts.links:
        getLinks(url)
        raise SystemExit, 0
    depth_limit = opts.depth_limit
    confine_prefix = opts.confine
    exclude = opts.exclude
    sTime = time.time()
    print >> sys.stderr, "Crawling %s (Max Depth: %d)" % (url, depth_limit)
    crawler = Crawler(url, depth_limit, confine_prefix, exclude)
    crawler.crawl()
    if opts.out_urls:
        print "\n".join(crawler.urls_seen)
    if opts.out_links:
        print "\n".join([str(l) for l in crawler.links_remembered])
    if opts.out_dot:
        d = DotWriter()
        d.asDot(crawler.links_remembered)
    eTime = time.time()
    tTime = eTime - sTime
    print >> sys.stderr, "Found: %d" % crawler.num_links
    print >> sys.stderr, "Followed: %d" % crawler.num_followed
    print >> sys.stderr, "Stats: (%d/s after %0.2fs)" % (
        int(math.ceil(float(crawler.num_links) / tTime)), tTime)
if __name__ == "__main__":
    main()
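The crawl() method's docstring describes the core loop: take a URL from a queue, fetch it if it is new and suitable, and queue the links found on it. Below is a minimal Python 3 sketch of that breadth-first loop. It is not a port of the listing: the fetch_links callback is a hypothetical stand-in for the Fetcher class, so the traversal can run against an in-memory dictionary instead of the network, and only the same-host and depth-limit filters are kept.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(root, fetch_links, depth_limit=2):
    """Breadth-first crawl from root, following links up to depth_limit hops.

    fetch_links(url) must return the absolute URLs found on that page.
    Only URLs on the same host as root are followed.
    """
    host = urlparse(root).netloc
    queue = deque([(root, 0)])
    seen = {root}   # URLs already queued, so nothing is queued twice
    visited = []    # URLs taken off the queue, in visiting order
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= depth_limit:
            continue  # do not expand pages at the depth limit
        for link in fetch_links(url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# A tiny fake site: each key maps to the links "found" on that page.
site = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b", "https://other.org/"],
    "https://example.com/a": ["https://example.com/", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/c"],
    "https://example.com/c": ["https://example.com/d"],
}
print(crawl("https://example.com/", lambda url: site.get(url, []), depth_limit=2))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b', 'https://example.com/c']
```

A collections.deque is enough for a single-threaded crawl; the thread-safe Queue used in the listing only pays off if several worker threads share the queue.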
14. MUSIC PLAYER
How about building your own music player? That really sounds exciting to me. Rather than creating just another music app, build one that searches through your files and explores your directories in search of music, and give it an interactive interface that other users can enjoy as well.
Consider adding features such as browsing through tracks, volume control, song/artist/album/movie display, database management, algorithm construction, and data processing to develop a fully featured interactive app. The version below builds its interface with Tkinter and the ttkthemes package, reads MP3 track lengths with mutagen, and plays audio through pygame's mixer module.
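The description above suggests an app that searches your directories for music. One minimal way to do that in Python 3 is to walk a folder tree with os.walk and keep files whose extension looks like audio. find_music and AUDIO_EXTENSIONS are hypothetical names, and the extension set is an assumption to adjust to the formats your player can actually load.

```python
import os

# Assumed set of audio extensions; extend it to match your player.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".ogg"}


def find_music(root_dir, extensions=AUDIO_EXTENSIONS):
    """Walk root_dir recursively and return the sorted full paths of audio files."""
    matches = []
    for folder, _subfolders, files in os.walk(root_dir):
        for name in files:
            # Compare extensions case-insensitively, so "Song.MP3" matches.
            if os.path.splitext(name)[1].lower() in extensions:
                matches.append(os.path.join(folder, name))
    return sorted(matches)


if __name__ == "__main__":
    # os.walk yields nothing (no error) if the folder does not exist.
    for path in find_music(os.path.expanduser("~/Music")):
        print(path)
```

Each returned value is a full path, which is what the player's playlist list stores, while the list box only shows the base file name.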
SOURCE CODE
""" Music Player
----------------------------------------
"""
import os
import threading
import time
import tkinter.messagebox
from tkinter import *
from tkinter import filedialog
from tkinter import ttk
from ttkthemes import themed_tk as tk
from mutagen.mp3 import MP3
from pygame import mixer
root = tk.ThemedTk()
root.get_themes() # Returns a list of all themes that can be set
root.set_theme("radiance") # Sets an available theme
# Fonts - Arial (corresponds to Helvetica), Courier New (Courier), Comic Sans MS, Fixedsys,
# MS Sans Serif, MS Serif, Symbol, System, Times New Roman (Times), and Verdana
#
# Styles - normal, bold, roman, italic, underline, and overstrike.
statusbar = ttk.Label(root, text="Welcome to Melody", relief=SUNKEN, anchor=W, font='Times 10 italic')
statusbar.pack(side=BOTTOM, fill=X)
# Create the menubar
menubar = Menu(root)
root.config(menu=menubar)
# Create the submenu
subMenu = Menu(menubar, tearoff=0)
playlist = []
# playlist - contains the full path + filename
# playlistbox - contains just the filename
# Fullpath + filename is required to play the music inside play_music load function
def browse_file():
    global filename_path
    filename_path = filedialog.askopenfilename()
    if not filename_path:
        return  # The user cancelled the dialog
    add_to_playlist(filename_path)
    mixer.music.queue(filename_path)
def add_to_playlist(filename):
    # Insert at the top of both lists so their indices stay aligned.
    playlistbox.insert(0, os.path.basename(filename))
    playlist.insert(0, filename)
menubar.add_cascade(label="File", menu=subMenu)
subMenu.add_command(label="Open", command=browse_file)
subMenu.add_command(label="Exit", command=root.destroy)
def about_us():
    tkinter.messagebox.showinfo('About Melody', 'This is a music player built using Python Tkinter by @attreyabhatt')
subMenu = Menu(menubar, tearoff=0)
menubar.add_cascade(label="Help", menu=subMenu)
subMenu.add_command(label="About Us", command=about_us)
mixer.init() # initializing the mixer
root.title("Melody")
root.iconbitmap(r'images/melody.ico')
# Root Window - StatusBar, LeftFrame, RightFrame
# LeftFrame - The listbox (playlist)
# RightFrame - TopFrame, MiddleFrame and the BottomFrame
leftframe = Frame(root)
leftframe.pack(side=LEFT, padx=30, pady=30)
playlistbox = Listbox(leftframe)
playlistbox.pack()
addBtn = ttk.Button(leftframe, text="+ Add", command=browse_file)
addBtn.pack(side=LEFT)
def del_song():
    selection = playlistbox.curselection()
    if not selection:
        return  # Nothing is selected
    selected_song = int(selection[0])
    playlistbox.delete(selected_song)
    playlist.pop(selected_song)
delBtn = ttk.Button(leftframe, text="- Del", command=del_song)
delBtn.pack(side=LEFT)
rightframe = Frame(root)
rightframe.pack(pady=30)
topframe = Frame(rightframe)
topframe.pack()
lengthlabel = ttk.Label(topframe, text='Total Length : --:--')
lengthlabel.pack(pady=5)
currenttimelabel = ttk.Label(topframe, text='Current Time : --:--', relief=GROOVE)
currenttimelabel.pack()
def show_details(play_song):
    file_data = os.path.splitext(play_song)
    if file_data[1] == '.mp3':
        audio = MP3(play_song)
        total_length = audio.info.length
    else:
        a = mixer.Sound(play_song)
        total_length = a.get_length()
    # Round the total first, then split it into minutes and seconds;
    # rounding each part separately can produce a label like 01:60.
    mins, secs = divmod(int(round(total_length)), 60)
    timeformat = '{:02d}:{:02d}'.format(mins, secs)
    lengthlabel['text'] = "Total Length" + ' - ' + timeformat
    t1 = threading.Thread(target=start_count, args=(total_length,))
    t1.start()
def start_count(t):
    global paused
    # mixer.music.get_busy() returns FALSE when we press the stop button (music stops playing)
    # continue skips the rest of the loop body while the music is paused.
    current_time = 0
    while current_time <= t and mixer.music.get_busy():
        if paused:
            time.sleep(0.1)  # Wait briefly instead of spinning the CPU
            continue
        else:
            mins, secs = divmod(current_time, 60)
            timeformat = '{:02d}:{:02d}'.format(mins, secs)
            currenttimelabel['text'] = "Current Time" + ' - ' + timeformat
            time.sleep(1)
            current_time += 1
def play_music():
    global paused
    if paused:
        mixer.music.unpause()
        statusbar['text'] = "Music Resumed"
        paused = FALSE
    else:
        try:
            stop_music()
            time.sleep(1)
            selected_song = playlistbox.curselection()
            selected_song = int(selected_song[0])
            play_it = playlist[selected_song]
            mixer.music.load(play_it)
            mixer.music.play()
            statusbar['text'] = "Playing music" + ' - ' + os.path.basename(play_it)
            show_details(play_it)
        except Exception:
            tkinter.messagebox.showerror('File not found', 'Melody could not find the file. Please check again.')
def stop_music():
    mixer.music.stop()
    statusbar['text'] = "Music Stopped"
# Module-level flag shared by play_music, pause_music and start_count.
paused = FALSE
def pause_music():
    global paused
    paused = TRUE
    mixer.music.pause()
    statusbar['text'] = "Music Paused"
def rewind_music():
    play_music()
    statusbar['text'] = "Music Rewound"
def set_vol(val):
    # set_volume of mixer takes values only from 0 to 1. Example - 0, 0.1, 0.55, 0.54, 0.99, 1
    volume = float(val) / 100
    mixer.music.set_volume(volume)
muted = FALSE
def mute_music():
    global muted
    if muted:  # Unmute the music
        mixer.music.set_volume(0.7)
        volumeBtn.configure(image=volumePhoto)
        scale.set(70)
        muted = FALSE
    else:  # Mute the music
        mixer.music.set_volume(0)
        volumeBtn.configure(image=mutePhoto)
        scale.set(0)
        muted = TRUE
middleframe = Frame(rightframe)
middleframe.pack(pady=30, padx=30)
playPhoto = PhotoImage(file='images/play.png')
playBtn = ttk.Button(middleframe, image=playPhoto, command=play_music)
playBtn.grid(row=0, column=0, padx=10)
stopPhoto = PhotoImage(file='images/stop.png')
stopBtn = ttk.Button(middleframe, image=stopPhoto, command=stop_music)
stopBtn.grid(row=0, column=1, padx=10)
pausePhoto = PhotoImage(file='images/pause.png')
pauseBtn = ttk.Button(middleframe, image=pausePhoto, command=pause_music)
pauseBtn.grid(row=0, column=2, padx=10)
# Bottom Frame for volume, rewind, mute etc.
bottomframe = Frame(rightframe)
bottomframe.pack()
rewindPhoto = PhotoImage(file='images/rewind.png')
rewindBtn = ttk.Button(bottomframe, image=rewindPhoto, command=rewind_music)
rewindBtn.grid(row=0, column=0)
mutePhoto = PhotoImage(file='images/mute.png')
volumePhoto = PhotoImage(file='images/volume.png')
volumeBtn = ttk.Button(bottomframe, image=volumePhoto, command=mute_music)
volumeBtn.grid(row=0, column=1)
scale = ttk.Scale(bottomframe, from_=0, to=100, orient=HORIZONTAL, command=set_vol)
scale.set(70) # implement the default value of scale when music player starts
mixer.music.set_volume(0.7)
scale.grid(row=0, column=2, pady=15, padx=30)
def on_closing():
    stop_music()
    root.destroy()
root.protocol("WM_DELETE_WINDOW", on_closing)
root.mainloop()
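The player turns a number of seconds into an MM:SS label in two places (show_details and start_count). A small shared helper could keep that logic in one place; format_mmss is a hypothetical name, not part of the original project. It rounds the total number of seconds before splitting it with divmod, because rounding the minutes and seconds separately can turn a 119.6-second track into 01:60 instead of 02:00.

```python
def format_mmss(seconds):
    """Format a duration in seconds as MM:SS, rounded to the nearest second."""
    # Round the whole duration first, then split it into minutes and seconds.
    mins, secs = divmod(int(round(seconds)), 60)
    return "{:02d}:{:02d}".format(mins, secs)


print(format_mmss(119.6))  # 02:00
print(format_mmss(75))     # 01:15
```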
15. INSTAGRAM PICTURE DOWNLOADER
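This application comes in handy when you want to delete an Instagram account but keep your collection of images. The script below, written for Python 2, downloads photos from Instagram post URLs: it fetches each post page, reads the image address from the page's og:image meta tag, and saves the image under a timestamped file name. It accepts either a single URL or a text file listing several URLs, one per line.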
SOURCE CODE
""" Instagram Photo Downloader
----------------------------------------
"""
from sys import argv
import urllib
from bs4 import BeautifulSoup
import datetime
def ShowHelp():
    print 'Insta Image Downloader'
    print ''
    print 'Usage:'
    print 'insta.py [OPTION] [URL]'
    print ''
    print 'Options:'
    print '-u [Instagram URL]\tDownload single photo from Instagram URL'
    print '-f [File path]\t\tDownload Instagram photo(s) using file list'
    print '-h, --help\t\tShow this help message'
    print ''
    print 'Example:'
    print 'python insta.py -u https://instagram.com/p/xxxxx'
    print 'python insta.py -f /home/username/filelist.txt'
    print ''
    exit()
def DownloadSingleFile(fileURL):
    print 'Downloading image...'
    f = urllib.urlopen(fileURL)
    htmlSource = f.read()
    soup = BeautifulSoup(htmlSource, 'html.parser')
    metaTag = soup.find_all('meta', {'property': 'og:image'})
    imgURL = metaTag[0]['content']
    # Hyphens instead of colons keep the name valid on Windows; microseconds
    # stop images downloaded within the same second from overwriting each other.
    fileName = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S_%f") + '.jpg'
    urllib.urlretrieve(imgURL, fileName)
    print 'Done. Image saved to disk as ' + fileName
if __name__ == '__main__':
    if len(argv) == 1:
        ShowHelp()
    if argv[1] in ('-h', '--help'):
        ShowHelp()
    elif argv[1] == '-u':
        instagramURL = argv[2]
        DownloadSingleFile(instagramURL)
    elif argv[1] == '-f':
        filePath = argv[2]
        f = open(filePath)
        line = f.readline()
        while line:
            instagramURL = line.rstrip('\n')
            DownloadSingleFile(instagramURL)
            line = f.readline()
        f.close()
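The downloader's two key steps are reading the image address from the page's og:image meta tag and choosing a file name. A minimal Python 3 sketch of both, using only the standard library instead of BeautifulSoup, follows. OgImageParser, find_og_image, and timestamped_filename are hypothetical names, and the HTML is a sample string rather than a real Instagram page, whose markup may differ. The file name uses hyphens instead of colons, since Windows does not allow colons in file names, and appends microseconds so that images saved within the same second do not overwrite each other.

```python
from datetime import datetime
from html.parser import HTMLParser


class OgImageParser(HTMLParser):
    """Remember the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.image_url is None:
            attributes = dict(attrs)
            if attributes.get("property") == "og:image":
                self.image_url = attributes.get("content")


def find_og_image(html):
    """Return the og:image URL in html, or None if the tag is missing."""
    parser = OgImageParser()
    parser.feed(html)
    parser.close()
    return parser.image_url


def timestamped_filename(now=None):
    """Build a .jpg name from a timestamp, without colons."""
    if now is None:
        now = datetime.now()
    return now.strftime("%Y-%m-%d_%H-%M-%S_%f") + ".jpg"


sample = '<html><head><meta property="og:image" content="https://example.com/photo.jpg"></head></html>'
print(find_og_image(sample))                                # https://example.com/photo.jpg
print(timestamped_filename(datetime(2024, 1, 2, 3, 4, 5)))  # 2024-01-02_03-04-05_000000.jpg
```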