A small program
- by Michael A
About 3 weeks ago I was invited in for an interview for a job as a creative Python developer. I had applied for this job a while ago and didn’t expect to get in, since I have no real web development experience with Python, so I was pleasantly surprised to get an invitation.
As part of this invitation I had to, amongst other things, create a program that:
“Can download all links on a page, and save it in a file in a structured way. You must use python3 and assume you have all modules available for the task”
So I started down the road of finding out what modules/libraries I could use for this task. I found something called BS4 (BeautifulSoup).
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with a parser to provide ways of navigating, searching, and modifying the parse tree.
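Just to give a feel for what that means, here is a tiny sketch of BeautifulSoup pulling a link out of a piece of HTML. The snippet and variable names are made up for the example, not part of the interview task:

from bs4 import BeautifulSoup

# A tiny HTML snippet, just for illustration
html = '<p>Visit <a href="https://example.com">Example</a> today</p>'

# Parse the snippet into a soup object using Python's built-in parser
soup = BeautifulSoup(html, "html.parser")

# Find the first <a> tag and read its href attribute
link = soup.find("a")
print(link.get("href"))  # prints: https://example.com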
So I was already headed down the “let’s create the program now!” path when I stopped and thought: hmm, maybe I should just write down in rough steps what this program should do, right? This is what I came up with:
- Create an object for the webpage URL
- Download the webpage from that object in text format
- Create a soup with BS4
- Open a new CSV file on disk in write mode
- Create a for loop that iterates through the soup searching for links
I fired up my editor and started by getting the dependencies, requests and bs4. With the dependencies in place I set up some simple variables: one for the webpage URL, one for the request stored in text format, and one for the soup object built from the downloaded page. I then needed to create a local file in CSV format to store the data.
The for loop took a little time. I needed to find out what I was looking for, so I did some online checking on which tags to look for to find links on a page. The a tag’s href attribute gives me the links on the page. In the for loop I look through our soup object for the a tags, then save each link found via the href attribute as data. I write the data to the file I opened before, comma-separate the links, and add line breaks as well to make it easy to read. Then, for the fun of it, I added a print statement to print out the links. Finally we close the file (always remember to close the file, I am told...).
import requests
from bs4 import BeautifulSoup

url = "<webpage url>"

# Download the page and keep the response body as text
download_url = requests.get(url).text

# Parse the HTML with the html5lib parser
soup = BeautifulSoup(download_url, "html5lib")

# Opening a file in write mode
f = open("list-of-links.csv", "w")

for links in soup.find_all('a'):
    data = links.get('href')
    # Some <a> tags have no href attribute, so skip those
    if data is None:
        continue
    f.write(data)
    f.write(",")
    f.write("\n")
    print(data)

# Always remember to close it, when you are done
f.close()
And so I was done with the first task. I went to the interview, explained my thought process on the task, and answered other questions regarding which modules I used and why. It was fun, even though it didn’t pan out.
In my next post I will show you some data science that I worked on a while back. Looking forward to it next week!