Question:
Regex - syntax to achieve the following?
Jean Luc
2010-09-16 22:24:17 UTC
Hello. I have been asked by my CO to analyze traffic to our website. I have an advanced log analysis tool that uses regex to build segments. I will try to explain my need... I have to measure traffic to a page on our site called /groups/ OR /groups (the trailing-slash version is the same page). That might seem easy ... a simple regex "/groups" ... and that would work, but it would give erroneous results. See below:

Target match = http://www.domain.com/groups OR http://www.domain.com/groups/

The regex does not need to contain the domain text, so let's throw that away:

Target match = /groups OR /groups/

So, if I created a simple regex "/groups" it could match against:

/groups
/groups/
/groups/page
/page/groups/page

So that's no good. Somehow the regex needs to obey the following exactly:

string must directly follow the domain extension and not be a sub page:
.com/groups = good
.com/page/groups = bad

string can contain a trailing slash
.com/groups = good
.com/groups/ = good

after the trailing slash, no additional directories can exist:
.com/groups = good
.com/groups/ = good
.com/groups/page/ = bad

BUT query strings are allowed:
.com/groups?var=data = good
.com/groups/?var=data = good

Can anyone help me formulate a regex to accomplish this? Please, I have an insane deadline. TYVM!!!

Robert
Four answers:
ʄaçade
2010-09-16 23:07:19 UTC
Hmm. You are right. Let me try again. ...

# Here are your data:

[ʄaçade]> cat q1.dat
/groups/conference/
/groups/meeting?vaw=1234
/groups/other
No other data following the trailing "/" should
/groups/?var=1234
/group?var=1234
/groups/
/groups

# Here is Perl regex extracting the /groups or /groups/?:

[ʄaçade]> perl -ane 'print if /(groups\/?$)|(groups\/\?)/;' q1.dat
/groups/?var=1234
/groups/
/groups
[ʄaçade]>

Does THAT do it?
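
For anyone checking this pattern against the cases listed in the question, here is a minimal sketch in Python (chosen only for illustration; the pattern itself is unchanged). It assumes the log tool matches against the path plus query string, without the domain. The `cases` table, the `mismatches` helper and the `anchored` pattern are illustrative names and a suggested variant, not part of the original answer.

```python
import re

# The pattern from the Perl one-liner above; it is not anchored at the start.
pattern = re.compile(r"(groups/?$)|(groups/\?)")

# A possible tightened variant (an assumption, not from the thread):
# anchor at the start, allow an optional slash, then an optional query string.
anchored = re.compile(r"^/groups/?(\?.*)?$")

# Paths and good/bad verdicts taken from the question.
cases = {
    "/groups": True,
    "/groups/": True,
    "/groups?var=data": True,
    "/groups/?var=data": True,
    "/groups/page/": False,
    "/page/groups": False,
}

def mismatches(pat, expected):
    """Return the paths where the pattern disagrees with the expected verdict."""
    return [path for path, want in expected.items()
            if bool(pat.search(path)) != want]

print(mismatches(pattern, cases))
print(mismatches(anchored, cases))
```

Run as written, the first call reports two disagreements: `/groups?var=data` is missed (the pattern only accepts a query string after a slash) and `/page/groups` is accepted (nothing ties `groups` to the start of the path). The second call reports none. Whether the anchored variant suits the tool depends on which string the tool actually matches against.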
poeticjustice72182
2010-09-17 00:25:09 UTC
\.com\/[\w_\-]+(\/)?(\?[\w_\-\=\&]+)?$

Note that I'm not using delimiters in the regex above...

Translated into English, this says:

Look for '.com/' followed by at least one letter, number, underscore or dash, which is then optionally followed by a '/'. After that you may optionally have a '?' character, which must then be followed by at least one letter, number, underscore, dash, equals sign or ampersand. Whether you have this optional portion or not, the string must then end.
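
The description maps piece by piece onto the pattern. As a rough illustration, here it is again in Python's verbose mode, one piece per line with a comment. Python and the name `generic` are assumptions made for the example; the pattern text is the one given above.

```python
import re

generic = re.compile(r"""
    \.com\/            # literal ".com/"
    [\w_\-]+           # one or more letters, digits, underscores or dashes
    (\/)?              # optional trailing slash
    (\?[\w_\-\=\&]+)?  # optional query string: "?" plus one or more allowed characters
    $                  # the string must end here
""", re.VERBOSE)

# URLs built from the placeholder domain used in the question.
print(bool(generic.search("http://www.domain.com/groups/?var=data")))
print(bool(generic.search("http://www.domain.com/groups/page/")))
```

The first URL is accepted and the second is rejected, because nothing in the pattern allows a further path segment after the optional slash.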



This does not match any of your bad instances, and it appears to match all of the good ones that I could see.

Quick caveat though... this regex will match .com/sub_dir where sub_dir does not have to be 'groups'. To be explicit for 'groups', just do this:

\.com\/groups(\/)?(\?[\w_\-\=\&]+)?$
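
The explicit pattern can be checked against the good and bad cases from the question with a short script. This is only a sketch in Python: it assumes the tool matches against the full URL, and `base`, `good`, `bad` and the `encoded` URL are illustrative values, not data from the thread.

```python
import re

# The explicit 'groups' pattern from the answer, as a Python raw string.
explicit = re.compile(r"\.com\/groups(\/)?(\?[\w_\-\=\&]+)?$")

base = "http://www.domain.com"  # placeholder domain from the question

good = ["/groups", "/groups/", "/groups?var=data", "/groups/?var=data"]
bad = ["/groups/page/", "/page/groups"]

for path in good + bad:
    verdict = "match" if explicit.search(base + path) else "no match"
    print(f"{path:22} {verdict}")

# Assumption: real query strings may carry characters outside the class,
# such as a percent-encoded space. This hypothetical URL is not matched.
encoded = base + "/groups?var=some%20data"
print(bool(explicit.search(encoded)))
```

All four good paths match and both bad paths are rejected. The last check shows one limitation: the query-string class only allows word characters, dashes, '=' and '&', so a value containing '%' is rejected. If such URLs matter, loosening the query part to `(\?.*)?` is one option, at the cost of accepting anything after the '?'.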
catchings
2016-12-14 10:14:38 UTC
No. it relatively is extra difficulty to repair this than it may be to rewrite it. "solid winds have been accompanied by a heavy downpour which uprooted timber that snapped overhead electric powered lines." difficulty-verb settlement makes extra experience than attempting to be succinct.
anonymous
2016-12-12 15:27:50 UTC
No. it relatively is greater difficulty to repair this than it could be to rewrite it. "sturdy winds have been observed by using a heavy downpour which uprooted timber that snapped overhead electric powered traces." project-verb contract makes greater sense than attempting to be succinct.


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.