Question:
How can I scrape weather data from a website?
Jack F
2007-07-03 12:10:22 UTC
I'm running Mac OS X, and with a shell script or an AppleScript I'd like to scrape websites for the current temperature, wind speed, etc., and then have the data printed cleanly on my desktop with GeekTool.

I've seen this done many times before, but I can't find a script for Canadian or international weather sites that still works.

I don't mind using lynx to dump the contents as well!
Four answers:
csanon
2007-07-03 12:23:03 UTC
You want a source that contains the weather data, which you can then parse. Obviously, the less extraneous data and the more “program”-oriented the format, the better. Normally, the first thing I look for is an RSS feed on the website. RSS feeds have gained popularity, and many sites offer them. For example, weather.com gives out customized RSS feeds (http://www.weather.com/weather/rss/subscription/USNY0996). If I can’t find one, I look around the site for any specific system they have. Some sites prefer another format, like JSON or XML-RPC, usually accessible through an API. Others have their own special data scripts you can access.
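
A minimal Python 3 sketch of the RSS route, using only the standard library. It assumes the feed is ordinary RSS 2.0 (entries under <rss><channel><item>). The weather.com feed URL above dates from 2007 and may no longer exist, and the function names here are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return (title, description) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)  # accepts str or bytes
    items = []
    # RSS 2.0 nests entries as <rss><channel><item>...</item></channel></rss>.
    for item in root.iter("item"):
        # findtext() returns None when the child element is missing.
        title = (item.findtext("title") or "").strip()
        description = (item.findtext("description") or "").strip()
        items.append((title, description))
    return items


def fetch_rss(url, timeout=10):
    """Download a feed (hypothetical helper) and parse its items."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_rss_items(resp.read())
```

Splitting download and parsing keeps parse_rss_items() testable on a saved copy of the feed before pointing fetch_rss() at a live URL.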

The worst-case scenario is having no such simplified data source to work with, in which case you’ll have to grab the webpage itself and extract the appropriate content.

I’m not a Mac user, but as a programmer I can tell you the general process, so you know what to Google for. First, pick an appropriate programming language. I normally use Python, but you are free to use whatever suits you and is available on a Mac. Next, look up how to retrieve the data you want: if it’s an RSS feed, look up how to fetch RSS feeds in your language; for XML-RPC, how to make RPC calls; and if you have a webpage to sift through, how to make an HTTP request.
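
For the HTTP-request step, here is a minimal Python 3 sketch built on the standard library's urllib.request. The function name and the User-Agent string are illustrative assumptions, not something any particular weather site documents.

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Fetch a URL and return its body decoded as text."""
    # Some sites reject requests that lack a browser-like User-Agent header;
    # this value is an arbitrary example.
    req = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (weather-scraper sketch)"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The returned text can then go to an XML or JSON parser, or be searched with regular expressions if it is an ordinary webpage.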

Once you have the data, how you sift through it depends on its format. RSS is XML, so you need to figure out how to parse an XML file. For JSON, you either use a JSON parsing library or study the JSON format and write your own parser. For a webpage, you may want to look at regular expressions.
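
A sketch of the JSON case using Python's built-in json module. The response shape (a "city" field plus a "current" object with "temperature" and "wind_speed" keys) is invented for illustration. A real weather API documents its own field names and units.

```python
import json


def summarize_conditions(json_text):
    """Turn a (hypothetical) JSON weather response into one clean line of text."""
    data = json.loads(json_text)
    # These keys are assumptions for illustration; adapt them to the
    # documented response format of whichever API you actually use.
    current = data["current"]
    return "{city}: {temp} C, wind {wind} km/h".format(
        city=data.get("city", "?"),
        temp=current["temperature"],
        wind=current["wind_speed"],
    )
```

For example, summarize_conditions('{"city": "Toronto", "current": {"temperature": 21.5, "wind_speed": 12}}') returns "Toronto: 21.5 C, wind 12 km/h".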
2007-07-03 12:20:23 UTC
First, I'd dump the web page to a file using lynx, wget or curl.

Then, I'd examine where in the file the temperature data are stored.

Finally, I'd construct a regular expression to use with awk, Perl, or a similar tool to extract the data from the file.
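
The same steps can be sketched in Python instead of awk or Perl, assuming the page was first dumped to plain text with lynx -dump. The "Temperature" and "Wind" labels, their formats, and the script name extract.py are all hypothetical. Replace the patterns with whatever the examined file actually contains. The script prints one plain line per field, the kind of output a GeekTool shell entry could display.

```python
import re
import sys

# Hypothetical labels and formats: adjust these to match the text you found
# when examining the dumped file.
PATTERNS = {
    "Temperature": re.compile(r"Temperature:?\s*(-?\d+(?:\.\d+)?\s*°?[CF])"),
    "Wind": re.compile(r"Wind:?\s*([A-Z]*\s*\d+\s*km/h)"),
}


def extract_fields(text):
    """Return {label: value} for each pattern that matches somewhere in text."""
    found = {}
    for label, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[label] = match.group(1)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: lynx -dump "$URL" > weather.txt; python3 extract.py weather.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        fields = extract_fields(f.read())
    # Print plain "label: value" lines, one per field found.
    for label, value in fields.items():
        print(f"{label}: {value}")
```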

However, I don't use Mac OS X; I use Linux, so my answer may be a little vague (and I also don't know how GeekTool imports the data).
Vijay
2014-09-18 03:32:06 UTC
Or just turn to a data-scraping service provider such as PromptCloud.
2014-04-17 17:17:21 UTC
I have LabVIEW at my office, and it works great for scraping weather data from local TV station websites. Here is an example: http://labviewtest.blogspot.com/2012/04/website-scraping-with-labview.html

This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.