Sometimes you need to scrape a website whose actual information is not published until a stipulated time. A recent attempt to book a time slot for an examination my sister was yet to take brought this need to the fore. Tatkal ticket booking, exam slot booking and opening the results page of an exam are all time-critical jobs. Since they are all first-come-first-served, being the first one to open the website is crucial.

I faced exactly this challenge today: time slot booking for VITEEE 2014 was due to start at 10:30 PM, yet even after 11:00 PM the link still pointed to the same page of instructions. This wasn't a case of the webpage failing to open. The page was reachable the whole time, and what mattered was the moment its content changed. So simply pinging the website (checking that it responds) wouldn't serve the purpose.

A quick five-minute Python script using the BeautifulSoup module solved the problem. The code is simple, even obvious, to most Python hobbyists, but the possibilities it opens up are many. The script treated a change in the text of the page as the sign that it had been updated, and it used the Windows Beep API via pywin32 to play a beep and alert me when that happened. Since it was written in a hurry, I paid little attention to the exceptions that might arise; I just wanted to get the job done. The only thing to keep in mind is to be reasonable when setting the polling interval (how often the script re-fetches the page). Set it too low and, needless to say, you might overload the server. It isn't anything big, but it works, and it's definitely worth sharing.
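The original script isn't reproduced here, so what follows is only a minimal sketch of the same idea under a few stated assumptions: fetch the page, keep its visible text as a baseline, re-fetch at a fixed interval, and beep once the text differs. To keep it dependency-free it swaps in the standard library's html.parser for BeautifulSoup (with BeautifulSoup the text extraction would be roughly `BeautifulSoup(html, "html.parser").get_text()`), and the standard library's winsound.Beep (Windows only) for pywin32's win32api.Beep, falling back to the terminal bell elsewhere. The names page_text, fetch_html, beep and watch, the 30-second MIN_INTERVAL floor and the UTF-8 decoding are all hypothetical choices of mine, not details from the post.

```python
import time
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_text(html):
    """Return the page's text with markup dropped and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def fetch_html(url, timeout=10):
    """Download a page and decode it (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def beep():
    """Beep via winsound on Windows; elsewhere ring the terminal bell."""
    try:
        import winsound
        winsound.Beep(1000, 700)  # 1000 Hz for 700 ms
    except (ImportError, RuntimeError):
        print("\a", end="", flush=True)


MIN_INTERVAL = 30  # hypothetical floor in seconds, so the server isn't hammered


def watch(url, interval=60, max_checks=None,
          fetch=fetch_html, sleep=time.sleep, on_change=beep):
    """Poll `url` until its text differs from the first snapshot.

    Returns the new text, or None if `max_checks` polls pass without a change.
    """
    interval = max(interval, MIN_INTERVAL)
    baseline = page_text(fetch(url))
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval)
        checks += 1
        try:
            current = page_text(fetch(url))
        except OSError as exc:  # network error or busy server: retry next round
            print(f"fetch failed: {exc}")
            continue
        if current != baseline:
            on_change()
            return current
    return None
```

A call such as `watch("https://example.com/slot-booking", interval=60)` (the URL is a placeholder) blocks until the text changes, then beeps and returns the new text. Comparing only the whitespace-collapsed text, rather than the raw HTML, means that changes inside scripts, styles or markup alone don't trigger a false alarm. Only the polling fetches are wrapped in a try block; a failure on the very first fetch still raises. The fetch, sleep and on_change parameters exist just so the loop can be exercised without a network connection.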