Question:
Sorting a large file in Java without running out of memory?
tuscani_tiburon
2007-02-14 22:48:24 UTC
I've written a Java program that works very well: it reads in a text file (one text string per line), alphabetizes it, and saves it to a new text file. My problem is that, while it works well with average-sized files, I would like to have it alphabetically sort a file that contains 1,557,459 text strings. Every time I try to sort this larger file, I get an error saying:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.Object.clone(Native Method)
at java.util.regex.Matcher.toMatchResult(Unknown Source)
at java.util.Scanner.match(Unknown Source)
at java.util.Scanner.nextLine(Unknown Source)
at Sorter.loadStrings(Sorter.java:64)
at Sorter.main(Sorter.java:25)

anyone have any ideas?
Three answers:
run4ever79
2007-02-14 23:40:17 UTC
You can accomplish this with a merge technique, using at least one third file as swap space. Divide the original file in half, saving each half in a separate file. One is designated for smaller elements, one for larger. Then merge, as you would in merge sort: read each file a line at a time, saving the lesser of the two lines into the third file until they are ordered. Repeat the process until you have merged them without any "inversions" (smaller elements in the file you expect larger ones to come from).
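For illustration, here is a rough sketch of just the merge step described above, assuming both input files are already sorted. The class name TwoFileMerge and the method name merge are made up for this example, and lines are compared with plain String.compareTo, which may not match the ordering the original program uses.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not code from the original poster.
class TwoFileMerge {
    // Merges two already-sorted text files into one sorted file,
    // holding only one line from each input in memory at a time.
    static void merge(Path left, Path right, Path out) throws IOException {
        try (BufferedReader a = Files.newBufferedReader(left);
             BufferedReader b = Files.newBufferedReader(right);
             BufferedWriter w = Files.newBufferedWriter(out)) {
            String x = a.readLine();
            String y = b.readLine();
            while (x != null || y != null) {
                // Take the left line when the right file is exhausted,
                // or when the left line is not larger than the right one.
                if (y == null || (x != null && x.compareTo(y) <= 0)) {
                    w.write(x);
                    w.newLine();
                    x = a.readLine();
                } else {
                    w.write(y);
                    w.newLine();
                    y = b.readLine();
                }
            }
        }
    }
}
```

Since only two lines are ever held at once, memory use should stay small no matter how large the files are.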
Pfo
2007-02-15 07:32:49 UTC
run4ever79 is right on; the technique is called mergesort. In fact, I would generalize the algorithm to handle any number of files. Basically, you choose a number of elements (say, 1,000), load the first 1,000 into an array, sort them, and write the sorted results to a file. Repeat this until all the values are sorted in their own files. Next, open two of the files and read values one at a time from each, writing the smaller value to a new file each time, so that the two sorted files merge into one larger sorted file. Repeat this until all of your results are sorted in one file. I'm sure there are better explanations of mergesort online.
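A minimal sketch of the chunk-then-merge idea above might look like the following. Note one swapped-in detail: instead of merging the chunk files two at a time, this version merges all of them in a single pass using a java.util.PriorityQueue (a k-way merge). The names ExternalSort, Head, sort, and writeChunk are invented for this example, chunkSize is assumed to be positive, and the ordering is plain String.compareTo.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of an external merge sort; not the original Sorter class.
class ExternalSort {
    // One pending line from a chunk file, plus the reader it came from.
    private static final class Head implements Comparable<Head> {
        final String line;
        final BufferedReader source;
        Head(String line, BufferedReader source) { this.line = line; this.source = source; }
        public int compareTo(Head other) { return line.compareTo(other.line); }
    }

    static void sort(Path input, Path output, int chunkSize) throws IOException {
        List<Path> chunks = new ArrayList<>();
        // Phase 1: read chunkSize lines at a time, sort them in memory,
        // and write each sorted run to its own temporary file.
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> buffer = new ArrayList<>(chunkSize);
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == chunkSize) {
                    chunks.add(writeChunk(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) chunks.add(writeChunk(buffer));
        }
        // Phase 2: keep the next unread line of every chunk in a min-heap
        // and repeatedly write out the smallest one.
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            PriorityQueue<Head> heap = new PriorityQueue<>();
            for (Path chunk : chunks) {
                BufferedReader r = Files.newBufferedReader(chunk);
                readers.add(r);
                String first = r.readLine();
                if (first != null) heap.add(new Head(first, r));
            }
            while (!heap.isEmpty()) {
                Head smallest = heap.poll();
                out.write(smallest.line);
                out.newLine();
                String next = smallest.source.readLine();
                if (next != null) heap.add(new Head(next, smallest.source));
            }
        } finally {
            // Close the chunk readers and remove the temporary files.
            for (BufferedReader r : readers) r.close();
            for (Path chunk : chunks) Files.deleteIfExists(chunk);
        }
    }

    // Sorts the buffered lines and writes them to a new temporary file.
    private static Path writeChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path chunk = Files.createTempFile("chunk", ".txt");
        Files.write(chunk, lines);
        return chunk;
    }
}
```

Because this version keeps every chunk file open during the merge, a chunk size of 1,000 on a 1,557,459-line file would mean over 1,500 open files at once. A larger chunk size (for example 100,000 lines, if that fits in the heap) keeps the number of open files small.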
?
2016-12-04 10:01:27 UTC
I don't understand why you are using this old version to run the file when Java already has newer versions like JDK 6... I think that to run a Java file from the command prompt without running any batch file, you need to set JAVA_HOME and CLASSPATH in the Environment Variables via My Computer -> right-click, then Properties -> Advanced system properties -> Environment Variables. I don't know if this will help you. Thanks.


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.