Question:
How can I select 160 lines randomly from a text file and print them to another using Perl?
2011-04-11 08:03:42 UTC
I have a text file with around 1000 lines of data. I used Perl to sort and parse the data, but I am having difficulty with the rand and srand functions. Does anyone know a good way to program this? The data is for research I am currently conducting, and to avoid selection bias I need Perl to pick the lines (specific dates in the file) at random for me. Thanks.
Three answers:
?
2011-04-11 12:42:05 UTC
I'll assume what you mean to ask is "How do I select 160 unique and random lines from a group of 1000 using Perl?"



If you are doing an actual scientific study, you'll probably need to familiarize yourself more with Perl's rand function, which is a pseudo-random number generator, so that you can argue it is random enough for your purposes.



First problem: unique lines. If you simply call rand(1000) 160 times, you run the chance of getting duplicate values. Not a problem. For this, we use splice, which removes each chosen line from the array so it cannot be chosen again.



Second problem: select random lines. For this we use rand(), as you have surmised. Let's look it up at perldoc.perl.org and see what it says:



"Returns a random fractional number greater than or equal to 0 and less than the value of EXPR. .... Automatically calls srand unless srand has already been called.



Apply int() to the value returned by rand() if you want random integers instead of random fractional numbers."



You don't need to use srand, as it is called automatically. Yay perl.
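
As a small illustration of the perldoc text quoted above, here is a minimal sketch. The helper name random_index is hypothetical and only used for this example. Calling srand with a fixed seed is optional; it is assumed here purely as a way to make a run repeatable, for example if you want to record how a sample was drawn.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a random integer from 0 to $n - 1.
# rand($n) returns a fraction >= 0 and < $n; int() truncates it.
sub random_index {
    my ($n) = @_;
    return int(rand($n));
}

# Optional: a fixed seed makes the sequence repeatable on the same
# Perl build. Leave this out to let Perl seed itself automatically.
srand(42);

my @picks = map { random_index(1000) } 1 .. 5;
print "@picks\n";   # five integers, each between 0 and 999
```

Without the srand line you should get a different set of numbers on each run.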



" splice ARRAY,OFFSET,LENGTH



Removes the elements designated by OFFSET and LENGTH from an array.... In scalar context, returns the last element removed, .... The array grows or shrinks as necessary."



This can be used to cut values out of an array, much like using scissors to cut lines from a sheet of paper and save them in a scrapbook.
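
To make the splice behaviour concrete, here is a minimal sketch on a made-up four-element array (the contents are placeholders, not data from the question). It removes one element and shows that the array shrinks.

```perl
use strict;
use warnings;

my @lines = ("alpha\n", "beta\n", "gamma\n", "delta\n");

# Remove one element (LENGTH 1) starting at index 2 (OFFSET 2).
# In scalar context splice returns the removed element.
my $removed = splice @lines, 2, 1;

print "removed: $removed";                    # removed: gamma
print "remaining: ", scalar(@lines), "\n";    # remaining: 3
```

Because the removed element is gone from the array, a later random pick cannot return it again.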



Next, for clarity, I remind you of the beauty of the pound sign, which perl considers as the start of a comment that is not evaluated by the perl script. You can add and remove lines this way, and add clarifying comments, such as mine below.



So here's the program:



#!/usr/bin/perl -w

# Use your own path

use strict; # always use strict

my $file = "sample.txt";

open (FILE, $file) or die "$0: $!"; # printing errors is a Good Thing (tm)

my @file = <FILE>; # Store all the lines in an array

close FILE;



open OUT, ">my_samples.txt" or die "$0: $!"; # Caution: will overwrite any existing files

my $i;

for ($i = 0; $i<160;$i++) { # Start sampling loop

my $rand = int(rand(scalar @file));

# rand gives you a random index between 0 and one less than the number of lines remaining

# in the array. Note that $#file is the last index, so rand($#file) would never pick the last line.

# The number of remaining lines is dynamic: it drops from 1000 to 999, 998, etc.

my $sample = splice @file, $rand, 1; # Cut out the $rand-th line, length 1 means just one line

print OUT $sample;

} # end of for loop

close OUT;



Being a simple program, it should work right away, but you may need to tweak it. You can run it several times and compare the results by using ">>" instead of ">" in the open() statement. ">>" means append, ">" means truncate and start a new file (overwrite).



If you want to check the randomness in a clever fashion, replace the file content in the array with the numbers 0-999, print the samples to a file, and analyze them in Excel or your favourite statistics program. The simple way to make such an array is:



my @file = (0 .. 999);



Note that this will require you to comment out the other lines where we open the file, and read it, and put the file content into @file.
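
If you would rather do a quick check in Perl itself, here is a rough sketch under some assumptions: the helper name draw_unique is hypothetical, and the pool is shrunk to the numbers 0-9 with 3 picks per trial so the counts are easy to read. It uses the same rand and splice technique, repeats the draw many times, and counts how often each number comes up.

```perl
use strict;
use warnings;

# Hypothetical helper: draws $count distinct items using the same
# rand + splice technique as the program above.
sub draw_unique {
    my ($count, @pool) = @_;
    my @picked;
    for (1 .. $count) {
        last unless @pool;                      # stop if the pool runs dry
        my $index = int(rand(@pool));           # 0 .. number of items left - 1
        push @picked, splice(@pool, $index, 1);
    }
    return @picked;
}

my $trials = 10_000;
my %seen;
for (1 .. $trials) {
    $seen{$_}++ for draw_unique(3, 0 .. 9);
}

# With unbiased sampling each number should appear in roughly
# 3 out of every 10 trials, so about 3000 times here.
printf "%d: %d\n", $_, $seen{$_} // 0 for 0 .. 9;
```

Counts that sit far from 3000 for one number would suggest a bug such as an off-by-one in the index range. This is only an eyeball check, not a formal statistical test.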



Good luck!
techieguy
2011-04-11 11:20:26 UTC
Let's say your file name is "sample.txt". Then you could use this Perl one-liner:



perl -lne 'chomp; push @x,$_; END {do {print $x[int(rand($#x))]; $i++} until $i == 160}' sample.txt



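A couple of tests with this Perl one-liner follow. I have a test file called "sample.txt" that has 20 lines. I'll use the code to select 7 lines at random. Note that it samples with replacement, so the same line can be printed more than once (see "line no. 4" in the first run), and because rand($#x) stops short of the last index, the final line of the file is never chosen; rand(@x) would include it.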



$

$ cat sample.txt

This is line no. 1

This is line no. 2

This is line no. 3

This is line no. 4

This is line no. 5

This is line no. 6

This is line no. 7

This is line no. 8

This is line no. 9

This is line no. 10

This is line no. 11

This is line no. 12

This is line no. 13

This is line no. 14

This is line no. 15

This is line no. 16

This is line no. 17

This is line no. 18

This is line no. 19

This is line no. 20

$

$

$ perl -lne 'chomp; push @x,$_; END {do {print $x[int(rand($#x))]; $i++} until $i == 7}' sample.txt

This is line no. 4

This is line no. 11

This is line no. 8

This is line no. 4

This is line no. 15

This is line no. 3

This is line no. 5

$

$ perl -lne 'chomp; push @x,$_; END {do {print $x[int(rand($#x))]; $i++} until $i == 7}' sample.txt

This is line no. 15

This is line no. 10

This is line no. 8

This is line no. 14

This is line no. 5

This is line no. 18

This is line no. 16

$

$ perl -lne 'chomp; push @x,$_; END {do {print $x[int(rand($#x))]; $i++} until $i == 7}' sample.txt

This is line no. 15

This is line no. 18

This is line no. 14

This is line no. 19

This is line no. 11

This is line no. 9

This is line no. 12

$

$
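
If repeated lines are not wanted, one alternative, swapped in here rather than taken from the answer above, is shuffle from the core List::Util module. The sketch below is a minimal version under stated assumptions: the helper name pick_lines is hypothetical, and the 20-line input is rebuilt in memory instead of being read from sample.txt.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: returns up to $count distinct lines in random order.
sub pick_lines {
    my ($count, @lines) = @_;
    my @shuffled = shuffle(@lines);
    $count = @shuffled if $count > @shuffled;   # cannot pick more than exist
    return @shuffled[0 .. $count - 1];
}

# Assumed input: the 20-line sample.txt shown above, rebuilt in memory.
my @lines = map { "This is line no. $_\n" } 1 .. 20;
print pick_lines(7, @lines);
```

The same idea should also work as a one-liner for a file with at least 160 lines: perl -MList::Util=shuffle -e 'print((shuffle(<>))[0 .. 159])' sample.txt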
Yahgoogle
2011-04-12 03:04:07 UTC
The perl based application Replace Pioneer can do:

1. launch "replace pioneer", ctrl-o open file

2. ctrl-h open 'replace' window

* set 'replace with pattern' to rand_str_unique(-160,split('\n',$match))

3. click 'replace', done!



RP free trial download:


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.