Question:
How to use the Java Scanner delimiter to strip out non-letter symbols?
ababab
2010-06-25 12:14:28 UTC
Hi

I am trying to build a word dictionary of sorts.
I need to scan the words from an input file and use those words.

But I also need to strip out non-letter characters such as commas, periods, semicolons, colons, dashes, and single and double quotes.

So how do you use Scanner.useDelimiter() to treat those signs like regular whitespace?

The Java API documentation for Scanner (http://java.sun.com/j2se/1.5.0/docs/api/java/util/Scanner.html) has code like this:

String input = "ken fish great fish manfish loves-fish";
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.next());
System.out.println(s.next());
System.out.println(s.next());
System.out.println(s.next());

The output is :
ken
great
man
loves-

That treats "fish" as the delimiter, but the \\s* confuses me. Is that an escape sequence for whitespace, or what is it for?
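For reference: in Java source, the string literal "\\s" is the regex \s, which matches one whitespace character, and * means "zero or more" of the preceding item. So \\s*fish\\s* matches "fish" together with any whitespace around it, which is why the tokens above come out without leading or trailing spaces. The sketch below compares the plain delimiter "fish" with "\\s*fish\\s*" on the same input; the class and method names (DelimiterDemo, tokens) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Collects every token the Scanner produces for the given delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> out = new ArrayList<>();
        try (Scanner s = new Scanner(input).useDelimiter(delimiterRegex)) {
            while (s.hasNext()) {
                out.add(s.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "ken fish great fish manfish loves-fish";
        // Plain "fish": the spaces next to each "fish" stay inside the tokens.
        System.out.println(tokens(input, "fish"));
        // "\\s*fish\\s*": surrounding whitespace is consumed as part of the delimiter.
        System.out.println(tokens(input, "\\s*fish\\s*"));
    }
}
```

With the plain delimiter the tokens keep their spaces ("ken ", " great ", " man", " loves-"); with \\s* on both sides they match the output shown in the question.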

------------------------------------------
Is there any method in the String class that I can use to strip a specific character out of a string? For example, I want to convert "don't" into "dont" or "done." into "done".

Thanks for your help.
Three answers:
ʃοχειλ
2010-06-25 13:48:52 UTC
I have written a demo and posted the source at the following Pastebin link. Have a look:

http://pastebin.com/b2fUWaFf

I used three Scanner objects: the first gets a file name from the user, the second reads the file line by line, and the third scans each line word by word. Each word has any punctuation at its beginning, end, or middle stripped off; if its length after stripping is greater than zero and the list does not already contain it, the word is added to the list.

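The Pastebin source is not reproduced here, so the following is only a sketch of the three-Scanner approach described above, not the author's actual code. The names WordCollector and collectWords are hypothetical, and it assumes the input is a plain text file readable in the platform's default encoding.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WordCollector {
    // Returns each distinct word once, in the order it is first seen.
    static List<String> collectWords(Scanner lineScanner) {
        List<String> words = new ArrayList<>();
        while (lineScanner.hasNextLine()) {
            // Third Scanner: splits one line into whitespace-separated words.
            Scanner wordScanner = new Scanner(lineScanner.nextLine());
            while (wordScanner.hasNext()) {
                // Drop every character that is not a word character (a-z, A-Z, 0-9, _).
                String word = wordScanner.next().replaceAll("[^\\w]", "");
                if (word.length() > 0 && !words.contains(word)) {
                    words.add(word);
                }
            }
            wordScanner.close();
        }
        return words;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // First Scanner: asks the user for a file name (System.in is left open).
        Scanner console = new Scanner(System.in);
        System.out.print("File name: ");
        String fileName = console.nextLine();

        // Second Scanner: reads the file line by line.
        try (Scanner fileScanner = new Scanner(new File(fileName))) {
            for (String w : collectWords(fileScanner)) {
                System.out.println(w);
            }
        }
    }
}
```

Note that words.contains is a linear search, and the comparison is case-sensitive, so "Hello" and "hello" are kept as separate entries.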
I used an ArrayList of strings to hold each word the first time it is seen. The punctuation is dropped on line 28 of that source:

word = word.replaceAll("[^\\w]", "");

The first argument of replaceAll is a regular expression. It means: replace every occurrence of any character that is not a word character (a-z, A-Z, 0-9, or _) with an empty string. So each word with punctuation on the left below becomes the string on the right:

"Hello" --> Hello
don't --> dont
happy/sad --> happysad
what? --> what
what_is_it? --> what_is_it

(The underscore counts as a word character in regular expressions, so it is kept.)

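To check these mappings yourself, a small loop over the sample inputs is enough; the class name StripDemo is just for illustration. One thing worth knowing: by default Java's \w only covers ASCII letters, digits, and the underscore, so an accented letter such as é would also be removed by this pattern.

```java
class StripDemo {
    // Removes every character that is not a word character: a-z, A-Z, 0-9 or _.
    static String strip(String word) {
        return word.replaceAll("[^\\w]", "");
    }

    public static void main(String[] args) {
        String[] samples = {"\"Hello\"", "don't", "happy/sad", "what?", "what_is_it?"};
        for (String s : samples) {
            System.out.println(s + " --> " + strip(s));
        }
    }
}
```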
If this pattern does not suit your needs, let me know and I will try to provide an alternative.
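Since the question asked specifically about Scanner.useDelimiter(), here is a sketch of that alternative, assuming the only characters to treat like whitespace are the ones listed in the question (comma, period, semicolon, colon, dash, single and double quotes). The delimiter is a regex character class followed by +, so any run of those characters and whitespace counts as a single separator. The class name PunctuationDelimiter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PunctuationDelimiter {
    // Whitespace plus , . ; : ' " - act as separators; '+' merges a run of them into one delimiter.
    // Inside [...] the '.' is literal and a trailing '-' is literal.
    static final String DELIMITERS = "[\\s,.;:'\"-]+";

    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner s = new Scanner(text).useDelimiter(DELIMITERS)) {
            while (s.hasNext()) {
                out.add(s.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("\"Hello,\" she said; it's done."));
    }
}
```

The design difference from replaceAll: a delimiter splits words instead of joining them, so "don't" becomes the two tokens "don" and "t" rather than "dont". If contractions should stay whole, the apostrophe can be left out of the delimiter class and removed afterwards with replaceAll.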
feagle
2016-11-02 15:23:00 UTC
Scanner Use Delimiter
annice
2016-09-11 03:39:05 UTC
its convenient! all it's is ooo O's and a zero 0. then u area and feature a ( thingie four dots..... an additional ) factor and a couple of extra dots.. you simply repeat this till u get to the heal. to make that you want a cut down/ then a _ underline factor and an additional ) oooO (....)..Oooo ...(.....(.....) .._)..... )../ .......... (_/ see convenient!


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.