Question:
I need help with a Java RegEx assignment. So confused!?
www333###
2011-09-14 19:24:53 UTC
I am trying to write an app that reads input from a file using the Scanner class, and then uses MatchResult to parse it. For the input to "pass", a line can only have 5 word characters [A-Z a-z 1-9]. So, basically, I have written the program up to the point of MatchResult, but I don't know what to do next to output the lines that passed and the ones that failed. I am trying an if-else statement, but I'm not sure how to write one in Java or which argument exactly to pass to it...
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;
import java.util.regex.*;

public class ByteMonitor {

    public static void main(String[] args)
    {
        try {
            BufferedReader inputFile = new BufferedReader(new FileReader("input.txt"));
            String line = null; //corresponds to a string in the file

            while((line = inputFile.readLine()) != null){ //while there are lines left in the input file.

                Scanner scan = new Scanner(line);
                scan.findInLine ("[.*][\\w^_{5}][.*]"); //find word or number characters of the length of 5.
                MatchResult result = scan.match();

                /*if ( )
                {
                    System.out.println ("OK") ;
                }
                else
                {
                    System.out.println("FAIL");
                }*/
                scan.close();
            }

            //file I/O can potentially generate a FileNotFoundException if the specified file
            //does not exist, and therefore must implement exception handling syntax

        } catch (IOException e) { System.err.println("Error With File"); }
    }
}

Any advice???
Three answers:
McFate
2011-09-14 20:37:34 UTC
First, your regex is not what you want.



Square brackets list alternative characters (e.g., "[abc]" matches "a", "b" or "c"). ".*" matches any sequence of characters ("." = any one character, "*" = the previous item 0 or more times). But "[.*]" matches only the literal character "." OR the literal character "*", because wildcards and quantifiers lose their special meaning inside square brackets. This is an even bigger problem for the middle of your regex: "[\\w^_{5}]" matches exactly ONE character (a word character, or a literal caret, "{" or "}"). The "{5}" is not a repeat count there, and the underscore and the 5 add nothing because \w already covers them. Definitely not what you are looking for.



The upshot is that the string "*w*" matches your regular expression (as do strings like ".{." and ".5*"), but strings that you probably want to match (such as "abcde") do not.
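
If you want to see this for yourself, here is a small sketch that runs the pattern from the question against those strings. The class name OriginalRegexDemo and the helper matchesOriginal are made up for this illustration, and it uses Matcher.matches(), which requires the whole string to match.

```java
import java.util.regex.Pattern;

class OriginalRegexDemo {
    // The pattern from the question, copied unchanged.
    static final Pattern ORIGINAL = Pattern.compile("[.*][\\w^_{5}][.*]");

    // Hypothetical helper: true if the whole string matches the original pattern.
    static boolean matchesOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The first three strings match; "abcde" does not.
        for (String s : new String[] {"*w*", ".{.", ".5*", "abcde"}) {
            System.out.println("\"" + s + "\" -> " + matchesOriginal(s));
        }
    }
}
```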



You want to use \w (a word character) and \W (a non-word character) to construct your regex... but not in square brackets. Taking your original description literally, you want lines that contain exactly 5 word characters (presumably with any number of non-word characters before, after, or in between them). That would be a regex like:



^\W*(\w\W*){5}$



You use the "^" anchor to ensure that the pattern includes the very first character (otherwise the pattern could skip over some word characters, and match 5 word characters in the middle of your String). Start the String with any number of non-word characters (\W*). Then one word character (\w) followed by any number of non-word characters (\W*) -- the whole thing repeated exactly 5 times ({5}). Then the end-of-line anchor ($) for the same reason as the caret. That's the regex for exactly 5 word characters with any number of non-word characters in any position.
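
To illustrate why the anchors matter, here is a rough sketch comparing the pattern with and without them on a line that has 6 word characters. The class AnchorDemo and its two helper methods are invented for this example. It uses Matcher.find(), which searches anywhere in the input, and the backslashes are doubled because the regex is written as a Java string literal.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // The same pattern with and without the ^ and $ anchors.
    static final Pattern ANCHORED = Pattern.compile("^\\W*(\\w\\W*){5}$");
    static final Pattern UNANCHORED = Pattern.compile("\\W*(\\w\\W*){5}");

    // Hypothetical helpers: find() looks for a match anywhere in the input.
    static boolean foundAnchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    static boolean foundUnanchored(String s) {
        return UNANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        String sixWordChars = " abcdef ";
        // true: without anchors, " abcde" alone satisfies the pattern
        System.out.println(foundUnanchored(sixWordChars));
        // false: with anchors, the whole line must fit, and it has 6 word characters
        System.out.println(foundAnchored(sixWordChars));
    }
}
```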



Note that in Java String instances, backslashes escape other things. So you have to double all the backslashes when entering that regex as a constant String, in order to get a single backslash into the regex.
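
A tiny sketch of the backslash doubling, in case it helps. The class name BackslashDemo and the helper regexText are just for illustration. Pattern.pattern() returns the regex text as it was compiled, so you can see what the regex engine actually receives.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // Hypothetical helper: returns the regex text as the regex engine sees it.
    static String regexText() {
        // Each "\\" in the Java source becomes one backslash in the String.
        String regex = "^\\W*(\\w\\W*){5}$";
        return Pattern.compile(regex).pattern();
    }

    public static void main(String[] args) {
        System.out.println(regexText()); // prints ^\W*(\w\W*){5}$
    }
}
```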



By the way, it is sufficient to call only "scan.findInLine()", which returns the string that matched the regex, or null if there is no match. You don't need to worry about MatchResult.



Here's a simple test program that I wrote, and the results of testing it with several Strings:



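=====
import java.util.Scanner;

// Wrapper class added so the snippet compiles on its own; the name is arbitrary.
public class RegexTest {

    public static void main( String[] args ) {
        doTest("abcde");
        doTest(" a b c d e ");
        doTest(" a b c ");
        doTest(" abcdef ");
        doTest("*w*");
    }

    // Reports whether s has exactly 5 word characters (non-word characters are ignored).
    public static void doTest( String s ) {
        System.out.println("Testing \"" + s + "\"");
        Scanner scan = new Scanner(s);
        // findInLine returns the matched text, or null if nothing matched.
        String s2 = scan.findInLine( "^\\W*(\\w\\W*){5}$" );
        scan.close();

        if (s2 != null) {
            System.out.println("Found in line");
        } else {
            System.out.println("Not in line");
        }
    }
}
=====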



And here is the output of my little test program, annotated (// my comments):



=====
Testing "abcde" // only 5 word characters
Found in line

Testing " a b c d e " // 5 word characters with non-word in between
Found in line

Testing " a b c " // only 3 word characters
Not in line

Testing " abcdef " // 6 word characters
Not in line

Testing "*w*" // test for your original expression
Not in line
=====



Combine that with the file I/O stuff you already wrote, and you should have the program you want.
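
For what it's worth, the combination might look roughly like the sketch below. This is one possible arrangement, not the only one: the class name ByteMonitorSketch and the helper check are invented here, "input.txt" is the file name from the question, and the try-with-resources block assumes Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

class ByteMonitorSketch {
    // Exactly 5 word characters, with any number of non-word characters around them.
    static final String FIVE_WORD_CHARS = "^\\W*(\\w\\W*){5}$";

    // Hypothetical helper: returns "OK" if the line passes, otherwise "FAIL".
    static String check(String line) {
        Scanner scan = new Scanner(line);
        String match = scan.findInLine(FIVE_WORD_CHARS); // null when nothing matched
        scan.close();
        return (match != null) ? "OK" : "FAIL";
    }

    public static void main(String[] args) {
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader inputFile = new BufferedReader(new FileReader("input.txt"))) {
            String line;
            while ((line = inputFile.readLine()) != null) {
                System.out.println(check(line));
            }
        } catch (IOException e) {
            System.err.println("Error With File");
        }
    }
}
```

Keeping the check in its own method means the if-else from the question collapses to a single null test, and the method can be tried on plain strings without needing a file.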
2016-10-03 15:27:59 UTC
Well, a single character can't be both x and y, so that is probably not the issue here. The pattern matches 0 or more of (x or y). It finds 2 things that are (x or y), one of which is x and the other y. If you want to limit how many (x or y) it matches, you do not want to use the * quantifier.
abhijit
2011-09-14 21:28:01 UTC
package scjp;

import java.util.regex.*;

class Main
{
    public static void main(String[] args)
    {
        String txt="abc d"; // sample input; it contains a space, so this prints FAIL

        String re1="\\w{5}"; // exactly 5 word characters

        Pattern p = Pattern.compile(re1,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(txt);
        // matches() succeeds only if the ENTIRE string matches the pattern
        if (m.matches())
        {
            System.out.print("PASS\n");
        }
        else
        {
            System.out.print("FAIL\n");
        }
    }
}


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.