Question:
Is there a way to dynamically insert news text from a news website into a website? JavaScript?
travis_ferreira
2006-05-14 21:56:02 UTC
I would like to add current news text to my site from a news website I select. I am wondering if this is possible.
Five answers:
Allen Mathew
2006-05-14 22:44:15 UTC
Try AJAX. Have your server program read the news from the selected site, then have your client request the news from your server using XMLHttpRequest. With JavaScript and the DOM of the HTML page, you can then insert the text from the server into any part of the page on the client side.
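
A minimal sketch of the server half of that idea, in Python. The /news path, the port, and the CACHED_HEADLINES list are assumptions made up for illustration; in practice the list would be filled by whatever code reads the news site.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for headlines the server program already fetched.
CACHED_HEADLINES = [
    {"title": "Example headline one", "link": "http://example.com/1"},
    {"title": "Example headline two", "link": "http://example.com/2"},
]


def headlines_json(headlines):
    """Serialize headlines into the JSON body the browser will receive."""
    return json.dumps({"items": headlines}).encode("utf-8")


class NewsHandler(BaseHTTPRequestHandler):
    """Answer GET /news with the cached headlines; everything else is a 404."""

    def do_GET(self):
        if self.path != "/news":
            self.send_error(404)
            return
        body = headlines_json(CACHED_HEADLINES)
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Build (but do not start) a local server; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), NewsHandler)
```

The page's XMLHttpRequest would then request /news from your own server and write the titles into the DOM, so the browser never has to talk to the news site directly.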
John C
2006-05-14 22:50:51 UTC
If you are talking about screen scraping, it is kind of a chore. There are software tools and standalone libraries that support it, or at least make it much easier.



Take a look at the Apache Cocoon block for doing Portal work. It has elaborate means to log into a website, scrape screens (retrieve information from the web pages), reformat the information in a format suitable for a web page, and even build the web page around it.



If Cocoon is too heavy for your needs, you can simply use the HttpClient library - if you are programming in Java. It can read stuff from websites easily.



If you do not want to use Java, and like you mentioned you do not want to use PHP - there are two other scripting languages that you should look at. They are both object-oriented, and both have the ability to read information from websites and process input data that is in the XML format.



1. Python

2. Ruby



They are both a decade or more old, so you do not need to worry that either one is immature or unstable.
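
As a rough illustration of the "read information from websites" part in Python, here is a small sketch using only the standard library. The function name fetch_text and the 10 second timeout are my own choices, not anything a news site requires.

```python
import urllib.request


def fetch_text(url, timeout=10.0):
    """Download a URL and decode the body to a string.

    Falls back to UTF-8 when the server does not declare a charset.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

You would call it as fetch_text("http://example.com/feed.xml"), where the URL is a placeholder for the feed or page you actually want.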



Python has been used to build many parts of the Yahoo and Google websites since their inception in the 1990s. When you see a .py in a website URL, that site is most likely using Python.



Ruby is slightly newer than Python and was relatively unknown in the US until half a decade ago. Recently, Ruby has become explosively popular due to the release of the 1.0 version of its Ruby on Rails framework.





You should be aware that if you get it from a web page on a commercial news website, you will have the practical issue that it is not "plain text". It is most likely in HTML, XHTML, or XML. You will probably have to deal with that markup, not just text.
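
To give a feel for what "dealing with that" can mean, here is a hedged sketch that pulls the visible text out of an HTML page with Python's standard html.parser module. The class and function names are invented for this example, and real news pages will need more care (navigation menus, ads, and so on).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(markup):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()
```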



Also, it will all be copyrighted.





Unless there is a good reason why you cannot, obtain the feed from the news site using RSS, which is a specialized form of XML. It is the simplest thing to do and the most widely used format for publishing news.



You might be able to pull down news feed/search data in non-RSS and non-Atom formats - maybe some format like NewsML or some proprietary XML file format. RSS is a pretty simple format to work with, though.
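
To show how simple RSS is to work with, here is a minimal sketch that reads the items out of an RSS 2.0 document with Python's standard ElementTree module. The sample feed and the function name parse_rss_items are made up for illustration; a real feed would come from the news site.

```python
import xml.etree.ElementTree as ET

# Invented sample feed, shaped like an RSS 2.0 document.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>http://example.com/first</link>
      <pubDate>Sun, 14 May 2006 21:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/second</link>
      <pubDate>Sun, 14 May 2006 20:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of dicts with the title, link and pubDate of each item."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "pubDate": (item.findtext("pubDate") or "").strip(),
        })
    return items
```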





You might need to do the aggregation of the news items from the RSS feeds on your website. It sounds like that is what you are planning to do anyway.
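
A rough sketch of what that aggregation step could look like in Python, assuming each feed has already been parsed into a list of dicts with "title", "link" and "pubDate" keys (that shape is my assumption, not part of RSS itself). It merges the feeds, drops repeated links, and sorts newest first.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def published_at(item):
    """Parse the RFC 822 style pubDate; treat dates without a zone as UTC."""
    when = parsedate_to_datetime(item["pubDate"])
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return when


def aggregate(feeds, limit=10):
    """Merge item lists from several feeds, newest first, without duplicate links."""
    seen = set()
    merged = []
    for items in feeds:
        for item in items:
            if item["link"] in seen:
                continue
            seen.add(item["link"])
            merged.append(item)
    merged.sort(key=published_at, reverse=True)
    return merged[:limit]
```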



You can use XSLT on your server to transform the feed information from RSS (XML) into HTML or XHTML, whichever format your page is in.
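
Python's standard library does not include an XSLT processor, so the sketch below is not XSLT. It swaps in a plain ElementTree traversal that does the same RSS to HTML job a small stylesheet would do: one list entry per item. The function name and the choice of a ul list are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape


def rss_to_html_list(xml_text):
    """Turn the items of an RSS 2.0 document into an HTML <ul> of links."""
    root = ET.fromstring(xml_text)
    lines = ["<ul>"]
    for item in root.iter("item"):
        # Escape both values so feed text cannot inject markup into the page.
        title = escape((item.findtext("title") or "").strip())
        link = escape((item.findtext("link") or "").strip(), quote=True)
        lines.append(f'  <li><a href="{link}">{title}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)
```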



All modern web browsers (Firefox, Safari, even IE 6.0) support XSLT. You can do a little XSLT processing in your browser, though you will get better performance if you do it on the server side, where you can take advantage of caching and other sensible things. Of course, you can always aggregate on the server side and then polish the formatting on the client side.
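
To make the caching point concrete, here is a minimal sketch of a time-limited cache you could put in front of whatever function downloads a feed. The class name, the five minute default, and the injectable fetch and clock arguments are my own assumptions, chosen so the idea can be tried without a network.

```python
import time


class FeedCache:
    """Remember each feed body for ttl_seconds so visitors share one download."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch          # function that takes a URL and returns the body
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # url -> (expires_at, body)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh, no request to the news site
        body = self._fetch(url)
        self._entries[url] = (now + self._ttl, body)
        return body
```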



That is the great thing about XML. You can build pipelines that process it in a series of actions, and you can "chunk" where those actions go between the server and the client. Allocating the processing between the server and the client largely becomes a "tactical" question - not a commitment to a single immutable, inflexible strategy.





A more bush league but kind of cool thing to try would be to make the whole web page that has the feeds on it XML, and use CSS 2.1 stylesheets to make the XML data appear like a web page.



CSS cannot reorder and sort data the way XSLT can, though it can filter it a bit - and reposition it quite nicely.





XSLT can generate 3 types of output: XML (default), HTML, and text.



If you want to generate HTML, you will have to specify it by including this directive as a top-level element in your XSLT stylesheet:



<xsl:output method="html"/>



XHTML is a form of XML, so you get to skate with the default if the page you will be including the converted RSS feed into is in XHTML format.





One note of caution. If you are operating a commercial website and/or one used by other people, and you are providing the newsfeed as a choice instead of your users just inputting it, then be careful.



A lot of news organizations might expect you to pay for the feed or at least get their permission in advance.



Others take the pragmatic approach of assigning you an ID to access their feed/service, and limiting you to a thousand or whatever searches/requests per day.



So read the terms of use for any RSS feeds you obtain from commercial news organizations.
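
If a provider does cap you at some number of requests per day as described above, a small counter on your server can keep you under it. This is only a sketch: the limit of 1000 echoes the "thousand or whatever" figure, and the class name and the injectable today argument are invented for illustration. The real limit and reset time come from the provider's terms.

```python
import datetime


class DailyRequestBudget:
    """Allow at most `limit` requests per calendar day, resetting when the date changes."""

    def __init__(self, limit=1000, today=datetime.date.today):
        self._limit = limit
        self._today = today
        self._day = None
        self._used = 0

    def try_acquire(self):
        """Return True and count the request if budget remains, else False."""
        day = self._today()
        if day != self._day:
            # New day: start counting from zero again.
            self._day = day
            self._used = 0
        if self._used >= self._limit:
            return False
        self._used += 1
        return True
```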





RSS feeds (and presumably Atom feeds) are notorious for not being well formed (for that matter, so are HTML pages). Commercial newsfeeds would hopefully be in better shape than RSS coming from weblogs. But you never know. The home pages for the biggest newspapers in the country generate hundreds of errors when the page is fed into a validator (e.g. validator.w3.org).



The Syndic8 folks have been discussing the problems with RSS feeds not following the RSS syntax (including XML syntax!) for years.
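
Because of that, it is worth assuming some feeds will fail to parse. Here is a hedged sketch that skips feeds which are not well-formed XML instead of letting one bad feed break the whole page. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET


def parse_feed_or_none(xml_text):
    """Return the parsed root element, or None if the feed is not well-formed XML."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


def well_formed_only(feeds_by_url):
    """Keep only the feeds that parse; the rest are silently dropped."""
    parsed = {}
    for url, xml_text in feeds_by_url.items():
        root = parse_feed_or_none(xml_text)
        if root is not None:
            parsed[url] = root
    return parsed
```

A stricter setup would log the URLs that were dropped so you can see which feeds keep failing.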





Yahoo also offers a news search service. It might be appropriate/useful for people who want to obtain their news information from Yahoo, and are only interested in news stories that have certain words in them.





JavaScript might be useful for prototyping functionality right in the browser. You might be able to use it in production too. But JavaScript will be pretty slow, especially if it is pulling data from a lot of sites.





If you have a clever architect, he will know ways of overcoming or avoiding these problems. These are just the basic resources you can plug into a solution - not a complete, finished, expert solution.





Read the Sources I have provided. They will provide specific examples and guidance for what I have described.
programmer
2006-05-14 22:43:38 UTC
You can do this with the help of RSS feeds. Use PHP on the server to parse the RSS for you!
shinnphoto
2006-05-14 21:59:56 UTC
You can do this using PHP, and perhaps something with an RSS feed.
Musicman1962
2006-05-14 22:00:52 UTC
Use an RSS feed (Really Simple Syndication) on your website to stream news. Here is a good link from the BBC that explains it:



http://news.bbc.co.uk/2/hi/help/3223484.stm



Good luck!


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.