Question:
Scrapping a website from the server!?
anonymous
Four answers:
anonymous
2016-10-09 08:17:45 UTC
Myspace and Facebook are broadly blocked because they're rated as public websites that allow user-generated content. Proxies are security breaches waiting to happen; not only that, the owner of the proxy site now has the login information for your account. Using a proxy to bypass the filters/firewalls means you're in violation of your user agreement with the school or your place of business. Those websites are not school/work appropriate, and you could be punished by your school/administration…
Brady
2007-06-13 22:13:06 UTC
First of all, most of the time scraping violates the copyright of the original website and is therefore illegal. I'm not sure where you're getting your stock data from, but if you're not paying for it somehow, it's probably illegal.



That said, you can use PHP's file_get_contents("http://www.wherethequotesare"); you will then need to get your data out of the HTML that is returned, save it to your database, and then use JavaScript to reload the page after a minute. And that's pretty much what scraping is: extracting what you want out of an HTML file.
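For instance, a rough sketch of that flow in PHP (the URL, the markup, and the regex here are placeholders, not a real quote site):

    <?php
    // Fetch the page (placeholder URL; needs allow_url_fopen enabled).
    $html = file_get_contents('http://www.wherethequotesare');
    if ($html === false) {
        die('could not fetch the page');
    }

    // Pull the figure you want out of the returned HTML. The markup
    // and pattern here are hypothetical; match them to the real page.
    if (preg_match('/<span class="last">([\d.]+)<\/span>/', $html, $m)) {
        $price = $m[1];
        // ...save $price to your database here, then let the page
        // reload itself with JavaScript, e.g.:
        // setTimeout(function () { location.reload(); }, 60000);
    }
    ?>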
csanon
2007-06-13 22:19:00 UTC
> Can this be done with only PHP, or would I need to use JavaScript as well?



JavaScript is useless. It only works in one context, and that is when someone downloads it to their computer (using a web browser) and has it set to run. Clearly, this isn't such a situation.



No, I can't really give the entire details; I might as well be writing the scraper for you. Oh, by the way, it's called scraping, not scrapping. Perhaps that's why Google didn't give you the right results. You did Google it, right?



Your task can roughly be broken down into: obtain scrapable data (a webpage, screen data, whatever), parse it appropriately, and extract the relevant information. That's the general method regardless of language and what you are scraping.



Obtaining a remote webpage is relatively easy. You have to find the right functions in your language, or, if they don't exist, code a small sockets app, and grab the data.
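For example, in PHP (the host and path are placeholders), a bare-bones HTTP GET over a raw socket is only a few lines:

    <?php
    // Open a TCP connection to the web server (placeholder host).
    $fp = fsockopen('www.example.com', 80, $errno, $errstr, 10);
    if (!$fp) {
        die("connect failed: $errstr ($errno)");
    }

    // Send a minimal HTTP/1.0 GET request by hand.
    fwrite($fp, "GET /quotes.html HTTP/1.0\r\nHost: www.example.com\r\nConnection: close\r\n\r\n");

    // Read back everything the server sends: headers, then the HTML.
    $response = '';
    while (!feof($fp)) {
        $response .= fgets($fp, 1024);
    }
    fclose($fp);
    ?>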



The real trouble is in efficiently parsing, and this becomes a Comp Sci exercise depending on the complexity. The simplest thing might be regular expressions, so if they fit the bill, you can have fun with RegEx. Otherwise, you'll be writing a more complex parser.
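To make the regex route concrete, here it is in PHP against some made-up quote-table markup:

    <?php
    // Hypothetical rows as they might appear in a fetched quote page.
    $html = '<td class="sym">IBM</td><td class="last">104.50</td>'
          . '<td class="sym">MSFT</td><td class="last">29.90</td>';

    // One pattern grabs every symbol/price pair in a single pass.
    $pattern = '/<td class="sym">([A-Z.]+)<\/td><td class="last">([\d.]+)<\/td>/';
    if (preg_match_all($pattern, $html, $rows, PREG_SET_ORDER)) {
        foreach ($rows as $row) {
            echo $row[1] . ' => ' . $row[2] . "\n"; // e.g. IBM => 104.50
        }
    }
    ?>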



Once you have a parser that can filter the page, run it over your acquired data and extract the important tokens into appropriate storage. You probably have experience with a MySQL database, although pretty much any sane data store (another DBMS, an XML file, a flat file, whatever) will do.
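With MySQL that step is a couple of prepared-statement calls; the database, credentials, and table below are made up:

    <?php
    // $rows would normally come from the parsing step above; a sample
    // pair is hard-coded here so the sketch stands alone.
    $rows = array(array('', 'IBM', '104.50'));

    $db = new PDO('mysql:host=localhost;dbname=scraper', 'user', 'secret');
    $stmt = $db->prepare('INSERT INTO quotes (symbol, price, fetched_at) VALUES (?, ?, NOW())');
    foreach ($rows as $row) {
        $stmt->execute(array($row[1], $row[2])); // symbol, price
    }
    ?>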



Have fun googling.
jake cigar™ is retired
2007-06-13 22:32:10 UTC
I scrape sites using JavaScript (and a four-line perl program).



Ajax lets me read a file from my own server, and the perl program pulls it from the other sites to mine!
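My helper is perl, but the same idea in PHP (to match the rest of this thread; the URL and cache path are placeholders) is just as short: mirror the remote page onto your own server, then have your Ajax call read the local copy:

    <?php
    // pull.php -- run on your own server (e.g. from cron) to mirror a
    // remote page locally, so a same-origin Ajax request can read it.
    $html = file_get_contents('http://www.example.com/quotes.html');
    if ($html !== false) {
        file_put_contents('cache/quotes.html', $html);
    }
    ?>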



I've also scraped using perl/PHP alone. This is especially useful for reading RSS feeds (scheduled via a cron job), parsing the XML (or HTML), and re-formatting it for use.
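As a sketch of the feed case (the feed URL and output file are placeholders; this assumes a well-formed RSS 2.0 feed):

    <?php
    // Run from cron: fetch a feed and reformat its items as an HTML list.
    $feed = simplexml_load_file('http://www.example.com/news.rss');
    if ($feed !== false) {
        $out = "<ul>\n";
        foreach ($feed->channel->item as $item) {
            $out .= '<li><a href="' . htmlspecialchars((string) $item->link) . '">'
                  . htmlspecialchars((string) $item->title) . "</a></li>\n";
        }
        $out .= "</ul>\n";
        file_put_contents('headlines.html', $out); // include this in your page
    }
    ?>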



The illegality/immorality comes into play if you don't show where you got the info, or if you modify the original links to point to your own sites or to their competitors.


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.