Question:
Extracting a Particular Text Pattern from HTML Files?
Dhananjeyan
2007-06-07 08:01:07 UTC
I have more than 4000 HTML Files. I need only a particular line from those files.

Each file contains some lines with the same pattern, and I would like to extract only those lines.

The pattern that I would like to extract from all the files looks like: javascript:listen('http://www.domainname.com','http://www.anotherdomainname.com/somefilename.ext',0)

Can anybody suggest a solution for this?

PS: I don't have a Linux system, so I need a batch file script or some software that runs on Windows to solve this problem.

Thanks in advance.
Three answers:
humilisquero
2007-06-08 01:30:58 UTC
Hi



I understand from your question that you want to find all occurrences of the given pattern, with each occurrence potentially having a unique URL. I also assume that you are only interested in the URL, because you cut off the rest of the text in your example of the pattern. I then assume that you wish to manipulate or use the search results or URLs in some programmatic way.



Assuming then that you have some programming experience, I suggest that you write a program or script that uses regular expressions to find the occurrences. The following regular expression pattern should suffice:



javascript:listen\('http://\w+\.\w+\.\w+[\/\w*.]*'



After some brief testing, this pattern found occurrences like:

1) javascript:listen('http://www.domainname.com'

2) javascript:listen('http://www.mydomainname.com/test/page.svc'



If this helped you in any way, but is not sufficient, you may read more about Regular Expressions here:

http://www.regular-expressions.info/reference.html



There are many programming languages that support regular expressions, e.g. VBScript, C# and VB on Microsoft .NET, C++, etc.
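
As a rough illustration of the approach described above, here is a minimal sketch in Python, which runs on Windows. It is not the pattern from this answer: it assumes every call has exactly the form shown in the question (two single-quoted arguments followed by a number) and captures all three parts. The names LISTEN_RE, extract_listen_calls and scan_folder are made up for this example, and the "*.htm*" file mask is an assumption about how the files are named.

```python
import re
from pathlib import Path

# Assumed form: javascript:listen('<first url>','<second url>',<number>)
# The parentheses are escaped; each quoted argument is captured as a group.
LISTEN_RE = re.compile(
    r"javascript:listen\('([^']*)'\s*,\s*'([^']*)'\s*,\s*(\d+)\)"
)


def extract_listen_calls(html_text):
    """Return a list of (first_url, second_url, number) tuples found in the text."""
    return LISTEN_RE.findall(html_text)


def scan_folder(folder, mask="*.htm*"):
    """Yield (file name, match tuple) for every listen(...) call under folder."""
    for path in sorted(Path(folder).rglob(mask)):
        # errors="replace" keeps the scan going if a file has an odd encoding.
        text = path.read_text(encoding="utf-8", errors="replace")
        for match in extract_listen_calls(text):
            yield path.name, match
```

Looping over scan_folder with the folder that holds the HTML files and printing each result would give one line per occurrence, which can then be redirected to a text file.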
?
2016-05-19 03:54:44 UTC
If you have a PDF reader, you can extract the text and graphics in most cases (some documents don't allow it), but you cannot reinsert them or create a PDF. Some HP printer software does allow you to print to a PDF: select PDF as the output instead of a printer, and the file is saved in the My Documents folder. You could also print the document and then scan it back in using an OCR program (most scanners come with a basic version).
Vyshali P
2007-06-07 10:16:11 UTC
You have the following options:

1. Get grep for Windows and use it to search the files for your expression.

2. Use the Windows find command. It works like grep, but it is not an exact equivalent.

The following options are available with find:



C:\>find /?
Searches for a text string in a file or files.

FIND [/V] [/C] [/N] [/I] [/OFF[LINE]] "string" [[drive:][path]filename[ ...]]

  /V         Displays all lines NOT containing the specified string.
  /C         Displays only the count of lines containing the string.
  /N         Displays line numbers with the displayed lines.
  /I         Ignores the case of characters when searching for the string.
  /OFF[LINE] Do not skip files with offline attribute set.
  "string"   Specifies the text string to find.
  [drive:][path]filename
             Specifies a file or files to search.

If a path is not specified, FIND searches the text typed at the prompt
or piped from another command.
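
To make the /N and /I switches listed above concrete, here is a small Python sketch of the same kind of literal substring search. It only imitates the behaviour described in the help text and is not how FIND itself is implemented; the function name find_lines is invented for this example.

```python
def find_lines(lines, needle, ignore_case=False):
    """Return (line_number, line) pairs for lines containing needle.

    Line numbers start at 1, similar to what FIND /N displays;
    ignore_case=True plays the role of /I.
    """
    key = needle.lower() if ignore_case else needle
    hits = []
    for number, line in enumerate(lines, start=1):
        haystack = line.lower() if ignore_case else line
        if key in haystack:
            # Drop the trailing newline so the pair prints cleanly.
            hits.append((number, line.rstrip("\n")))
    return hits
```

Searching for the fixed text javascript:listen( in this way returns the whole matching line, so the URLs would still have to be cut out of each line afterwards.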



I hope this helps

for any help send a mail to help@paijwar.com


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.