The Problem

Recently I wanted to find a bank with high interest rates. Like anyone would, I googled "interest rates Sri Lanka" and looked for a website that compares rates across banks. To my surprise, there was only one website that did anything close to this, and it lacked most of the features I thought would be useful for someone trying to pick a good bank.

BankInfo

BankInfo is an easy way to compare and find savings/interest rates and loan rates of banks in Sri Lanka. It also provides data and statistics on banks' performance to help in choosing a good bank.

Getting the data

Scraping data off a website is an easy and well-documented process. The problem was not scraping but knowing when to scrape, because most banks change their rates monthly or even weekly. There are a couple of ways to go about this, but after some thinking I settled on a system that consists of two components.

Inside a VPS I set up:

  1. The reminder
    A Python script to be executed at 00:00 every day (Sri Lankan time)
  2. The scraper
    Another Python script that actually scrapes the data using BS4.

1. The reminder

The reminder script checks whether the rate data is out of date. There are 1000+ accounts from 16 different banks in BankInfo, so going through each one and checking whether it has changed is not efficient. Instead, we take note of the "last updated" note on each bank's website and check whether it has changed. If it has, the reminder triggers the scraper, which scrapes only that specific website and updates the DB.
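Below is a minimal sketch of how such a check could look. The banks.json config file, its fields, and the scraper.py entry point are illustrative assumptions, not BankInfo's actual code.

```python
# A sketch of the reminder check, assuming a hypothetical banks.json that
# stores each bank's rates-page URL, the CSS selector for its "last updated"
# note, and the value seen on the previous run.
import json
import subprocess

import requests
from bs4 import BeautifulSoup


def check_banks(config_path="banks.json"):
    with open(config_path) as f:
        banks = json.load(f)

    for bank in banks:
        html = requests.get(bank["rates_url"], timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        # Grab the "last updated" note from the page (selector is per-bank).
        note = soup.select_one(bank["last_updated_selector"]).get_text(strip=True)

        if note != bank["last_seen"]:
            # The rates page changed: run the scraper for this bank only.
            subprocess.run(["python", "scraper.py", str(bank["id"])], check=True)
            bank["last_seen"] = note

    # Persist the new "last updated" values for the next run.
    with open(config_path, "w") as f:
        json.dump(banks, f, indent=2)


if __name__ == "__main__":
    check_banks()
```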

The "at" command in Linux:

"To schedule a task using the 'at' command in Linux, you simply specify the time for the task followed by the task itself. The basic syntax is as follows: echo 'Task' | at [desired_time]. The 'at' command is a powerful tool for scheduling tasks to run at a specific time in the Linux environment."

So, using the at command, the reminder script is set to run at 00:00.
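Since at queues a job only once, one way to keep the script running daily is to have it re-queue itself at the end of each run. The sketch below shows that idea; the script path is a placeholder, and this is an assumption about the setup rather than a description of it.

```python
# A sketch of re-queuing the reminder with `at`, since `at` runs a job once.
import subprocess


def schedule_next_run(script="/home/user/bankinfo/reminder.py"):
    # Queue the next run for midnight; `at` reads the command from stdin.
    subprocess.run(
        ["at", "00:00"],
        input=f"python {script}\n",
        text=True,
        check=True,
    )
```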

2. The scraper

Each bank's website shows its rates in a different way, so a separate scraping algorithm was developed for each of the 16 banks. This was the part that took most of the development time. The script takes a bank ID as an argument.
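Here is a rough sketch of what that per-bank dispatch could look like. The URL, the parse_bank_1 function, and the BANKS table are hypothetical placeholders for the real per-bank algorithms.

```python
# A sketch of the scraper entry point: one parser per bank, selected by the
# bank ID passed on the command line.
import sys

import requests
from bs4 import BeautifulSoup


def parse_bank_1(soup):
    """Hypothetical parser: read rates from a simple HTML table."""
    rates = []
    for row in soup.select("table.rates tr")[1:]:
        cells = [c.get_text(strip=True) for c in row.find_all("td")]
        rates.append({"account": cells[0], "rate": cells[1]})
    return rates


# Map bank IDs to their rates-page URLs and parsers (values are assumptions).
BANKS = {
    1: ("https://example-bank.lk/rates", parse_bank_1),
    # ... one entry per bank, each with its own parsing function
}


def scrape(bank_id):
    url, parser = BANKS[bank_id]
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return parser(soup)


if __name__ == "__main__":
    print(scrape(int(sys.argv[1])))
```

Keeping each bank's logic in its own small function makes it easy to fix one parser when that bank redesigns its site without touching the others.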