A small program

About 3 weeks ago I was invited to interview for a job as a creative Python developer. I had applied for this job a while ago and didn't expect to get in, since I have no real web development experience with Python, so I was pleasantly surprised to get an invitation.

As part of this invitation I had to, among other things, create a program that:

“Can download all links on a page, and save it in a file in a structured way. You must use python3 and assume you have all modules available for the task”

So I started down the road of finding out what modules/libraries I could use for this task. I found something called BS4 (BeautifulSoup).

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with a parser and provides ways of navigating, searching, and modifying the parse tree.
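
For anyone who hasn't used it before, here is a tiny sketch of how BeautifulSoup pulls data out of markup. The HTML snippet and URL are just made up for illustration, not part of the interview task:

from bs4 import BeautifulSoup

# A tiny, made-up HTML snippet just to show the idea
html = '<p>Read the <a href="https://example.com/docs">docs</a> here.</p>'
soup = BeautifulSoup(html, "html.parser")

# find_all('a') returns every <a> tag; .get('href') reads its link
for link in soup.find_all('a'):
    print(link.get('href'))  # prints https://example.com/docs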

So I was already in "let's create the program now!" mode when I stopped and thought: hmm, maybe I should just write down in rough steps what this program should do, right? This is what I came up with:

  • Create an object for the webpage URL
  • Download the webpage as text using that object
  • Create a soup with BS4
  • Open a new CSV file on disk and set it to write mode
  • Create a for loop that iterates through the soup searching for links

I fired up my editor and started by getting the dependencies, requests and bs4. With the dependencies done, I set up some simple variables to store the webpage URL, the response body in text format, and the soup object for the downloaded page. I then needed to create a local file in CSV format to store the data.

The for loop took a little time. I needed to find out what I was looking for, so I did some online checking on which tags to look for to find links on a page. The a tag, together with its href attribute, gives me the links on the page. In the for loop I look through the soup object for every a tag, save the link found in the href attribute as data, write that data to the file I opened before, comma separate the links, and add line breaks as well to make it easy to read. Then, for the fun of it, I added a print statement to print out the links. Finally I close the file (always remember to close the file, I am told...).

import requests
from bs4 import BeautifulSoup

url = "<webpage url>"
download_url = requests.get(url).text
soup = BeautifulSoup(download_url, "html5lib")

# Opening a file in write mode
f = open("list-of-links.csv", "w")

for links in soup.find_all('a'):
    data = links.get('href')
    if data:  # skip <a> tags that have no href attribute
        f.write(data)
        f.write(",")
        f.write("\n")
        print(links.get('href'))
# Always remember to close it, when you are done
f.close()
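
Looking back, a tidier variant would be to let Python's built-in csv module handle the commas and line endings, and to use a with block so the file closes itself. This is just a rough sketch of that idea, not what I actually submitted:

import csv
import requests
from bs4 import BeautifulSoup

url = "<webpage url>"
soup = BeautifulSoup(requests.get(url).text, "html5lib")

# 'with' closes the file automatically; csv.writer handles commas and newlines
with open("list-of-links.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for link in soup.find_all('a'):
        href = link.get('href')
        if href:  # skip <a> tags without an href attribute
            writer.writerow([href])
            print(href)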

And so I was done with the first task. I went to the interview, explained my thought process on the task, and answered other questions about which modules I used and why. It was fun, even though it didn't pan out.

In my next post I will show you some data science that I worked on a while back. Looking forward to it next week!

