Question:
Reading a .txt from a url in Java. Help?
James
2012-11-29 17:38:12 UTC
I am trying to read a .txt file from a website. Here's what I have so far. What am I missing?

/**
 * @(#)URLReader.java
 * @
 * @ version 1.00 2012/11/29
 */

// http://www.gutenberg.org/files/1399/1399-8.txt

import java.net.*;
import java.io.*;

public class URLReader
{
    public static void main(String[] args) throws Exception
    {
        // Open a connection to the URL, and get an input stream for reading data from the URL.
        String Gutenberg = "http://www.gutenberg.org/files/1399/1399-8.txt";
        URL url = new URL(Gutenberg);
        System.out.println("Reading URL: " + url);
        URLConnection connection = url.openConnection();
        connection.setDoOutput(true);
        System.out.println();

        // Copy lines of text from the input stream to the screen, until "end-of-file" is encountered
        // (or an error occurs).
        BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
Three answers:
husoski
2012-11-29 17:57:16 UTC
I can't see all of what you're doing because of the Y!A 40 character rule. I use a slightly different method, illustrated below:



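import java.net.*;
import java.io.*;

public class URLReader {

    // "throws Exception" is required: new URL(...), openStream() and readLine()
    // all throw checked exceptions, so the class will not compile without it.
    public static void main(String[] args) throws Exception
    {
        URL yahoo = new URL("http://www.yahoo.com/");
        BufferedReader in = new BufferedReader(
            new InputStreamReader(
                yahoo.openStream()));

        // Print the response line by line until end-of-stream.
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);

        in.close();
    }
}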



Oops...hit submit instead of preview. Anyway, that works and displays the HTML source for the yahoo.com main page. When I change it to your text file URL, it runs into the same problem you're probably seeing: a 403 "Forbidden" error. I get the same response from wget on Linux. Apparently Project Gutenberg actively tries to block non-interactive downloads, probably to save bandwidth for everyone else.

You'd have to set request headers that make your program look exactly like a browser. I don't have docs for that, and it's bad karma anyway.
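
For what it's worth, the standard library does let you set request headers: URLConnection.setRequestProperty has to be called before the input stream is opened. The following is only a sketch. The class name, the openWithUserAgent helper, and the User-Agent string are made up for illustration, and nothing in this thread shows that Project Gutenberg will accept such a request. It identifies the program honestly instead of impersonating a browser, which seems the more polite choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

class HeaderURLReader {

    // Hypothetical helper: creates a connection object (no network traffic yet)
    // and sets the User-Agent request header on it.
    static URLConnection openWithUserAgent(String address, String userAgent) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        // The User-Agent value is an assumption; use something that identifies your program.
        URLConnection connection = openWithUserAgent(
                "http://www.gutenberg.org/files/1399/1399-8.txt",
                "URLReaderExample/1.0 (student project)");

        // The request is actually sent when the input stream is requested.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
        }
    }
}
```

If the server still refuses, getInputStream() throws an IOException that mentions the response code, the same as before.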
Kaydell
2012-11-29 18:14:03 UTC
I googled for an answer and found this:

http://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html



I tried out the code and it works for some URLs, but for others, like yours, I got an HTTP 403 error, which means "Forbidden".

So I think the Java code is right. The Gutenberg website just seems to restrict automated connections from downloading its pages.
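
To see the status code directly rather than as an exception message, the connection can be cast to HttpURLConnection and checked before reading. This is a minimal sketch, assuming an http or https URL. The class name and the describe helper are illustrative and are not part of the Oracle tutorial.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

class StatusCheckingReader {

    // Hypothetical helper: turns a few common HTTP status codes into a short hint.
    static String describe(int status) {
        switch (status) {
            case HttpURLConnection.HTTP_OK:
                return "OK";
            case HttpURLConnection.HTTP_FORBIDDEN:
                return "Forbidden: the server refused this request";
            case HttpURLConnection.HTTP_NOT_FOUND:
                return "Not Found";
            default:
                return "HTTP " + status;
        }
    }

    public static void main(String[] args) throws IOException {
        String address = args.length > 0
                ? args[0]
                : "http://www.gutenberg.org/files/1399/1399-8.txt";

        // The cast is valid for http and https URLs only.
        HttpURLConnection connection = (HttpURLConnection) new URL(address).openConnection();

        // getResponseCode() sends the request and returns the status line's code.
        int status = connection.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            System.err.println("Cannot read " + address + ": " + describe(status));
            return;
        }

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
        }
    }
}
```

Checking the code first lets the program report a refusal cleanly instead of dying with a stack trace.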
Susan
2016-05-18 05:23:35 UTC
I'm not sure I understand what you're asking, but if you haven't already done so, try putting the text files in the .jar file with the program. That might work. Also, try using the complete path to the file instead of just its name. I hope this helps.
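
If the text file is bundled inside the .jar, it is read as a classpath resource rather than through a file path. Here is a rough sketch, assuming a file named book.txt at the root of the jar and UTF-8 text (both are assumptions, as is the class name). Note that this reads a local copy and does not fetch anything from a URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class JarResourceReader {

    // Reads every line from the stream and returns the text,
    // with a '\n' appended after each line.
    static String readAll(InputStream stream) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "/book.txt" is a hypothetical resource name; the leading slash
        // means "relative to the root of the classpath".
        InputStream stream = JarResourceReader.class.getResourceAsStream("/book.txt");
        if (stream == null) {
            System.err.println("Resource /book.txt was not found on the classpath.");
            return;
        }
        System.out.print(readAll(stream));
    }
}
```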


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.