There are many ways that these transforms *could*
have been accomplished, but below is a plausible
implementation:
-------------------------------------------------
#!/bin/sh
# Steps in the command pipeline:
# 1) "tr" to convert spaces to newlines in
# order to put each word on its own line
# 2) "tr" to convert upper case to lower case
# 3) "sed" to remove punctuation and numbers
# 4) "grep" to only match words with at least 4 chars
# 5) "sort" with options:
# "-r" for reverse lexicographic order
# "-u" to only output unique lines
#
# Note that each line *ends* with a backslash
# (without any trailing spaces) so that these
# physical lines create a single logical line.
tr -s "[:blank:]" "\n" < "$1" | \
tr 'A-Z' 'a-z' | \
sed "s/[0123456789,.:;?&-]//g" | \
grep '[a-z][a-z][a-z][a-z]' | \
sort -r -u
-------------------------------------------------
I tested this on a copy of the Declaration of
Independence. That's more text than I want to
put into this answer, but just to show that it
works, here is the "head" of the output:
$ words.sh decl_of_independence | head -15
would
world
works
without
within
with
will
whose
wholesome
while
which
whereby
whenever
when
whatsoever
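
If you want to check the pipeline without a large input file, here is a
minimal sketch that feeds it a couple of short lines on stdin. The
function name "words" and the sample text are my own made-up
illustration, not part of the original script; the five steps are the
same ones described in the comments above.

```shell
#!/bin/sh
# Hypothetical helper: the same five pipeline steps, reading from
# stdin instead of from a file named on the command line.
words() {
    tr -s '[:blank:]' '\n' |                # one word per line
    tr 'A-Z' 'a-z' |                        # upper case to lower case
    sed 's/[0123456789,.:;?&-]//g' |        # strip digits and listed punctuation
    grep '[a-z][a-z][a-z][a-z]' |           # keep lines with 4+ consecutive letters
    sort -r -u                              # reverse order, unique lines only
}

# Made-up two-line sample, small enough to check by hand.
result=$(printf 'When in the Course of human events, it becomes\nnecessary for one people; When, in 1776.\n' | words)
printf '%s\n' "$result"
```

Working through the steps by hand, this should print when, people,
necessary, human, events, course and becomes, one per line: the short
words and the number drop out, and the repeated "When" appears once.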