Question:
inputting files and editing them in C++?
Lindsey
2012-02-19 20:41:09 UTC
I have this assignment in my C++ class (yes, this is homework) and I am not sure how to begin. I don't want someone to give me the code, but to explain how to do this, as I have no idea. Thank you for even reading this.

My assignment is to ask a user for an input file; these files contain old HTML code. The program's job is to fix the old HTML tags into their newer versions and output the result into a new file. We only have to edit certain tags (listed below). I also need to count how many times the program corrects something, and report the total number of fixes at the end.

"Has a function with input and output streams as arguments to do the corrections by adding " /" (space+slash) right before the ">" in the following tags:

should become
should become

should become


should become
Note that it is possible for the input file to have many of each type of these old style html tags, and they can appear in no particular order.
Note also that some of these tags have a large amount of "stuff" called attributes which you need to leave alone... you will just need to find the next ">".
Note that you may assume that the "<" followed by two matching characters uniquely identifies the tag. In other words, if you find ; if you find ; if you find
; and if you find
;
You MAY also assume that none of these four specific tags have already been updated in the file, so you do not need to worry about checking to see if there is already a " /" before the ">" of these tags; however, note that you cannot just put a " /" before EVERY ">" in the whole document because that will make the html page completely fail. Thus, the " />" should be added ONLY at the end of the specific tags: , ,
, and
."

My question is, how do I search for these corrections within an input file? I really don't know how to begin other than opening up the file in the program. Once again, thank you.
Four answers:
Jim
2012-02-19 23:40:25 UTC
Oh, this is going to be fun. Use nested switch statements for the last 2 characters, for example im

But for the 1st character, use a while loop that searches for the next < that comes along. Easy. I wouldn't use putback(). You can keep a 3-character queue (an array, whatever) of the characters if you have to. I don't suggest std::queue; a deque would be better if you use anything like that, because it's walkable. Or you can use a list, with .push_back() and .pop_front().

you can walk a list like so:

#include <list>

std::list<char> lc;              // the characters seen so far
std::list<char>::iterator lci;

for (lci = lc.begin(); lci != lc.end(); ++lci) {
    // do something with *lci; lci acts like a smart pointer
}





Welcome to the world of lexical analyzers and compilers.



The leaves of your switch statements are where the statements that generate your output go.

The 3rd link is the list of "void elements" according to the W3C. Void elements are what others call singletons: tags that have no open/close pair, only a single tag. ONLY in XHTML and XML do these tags have a closing / next to the >, like this: />, which may have optional whitespace before the /, so you can have



/> or
but in HTML it is always
but browsers will ignore the / if you add one. This might be quirks mode. In HTML, tags and attributes can be upper case, lower case, or a mix (fun, huh?), but in XHTML tags and attributes MUST be lowercase. XML tags and attributes can be either, but must be consistent throughout the document, to my knowledge (I have not tried making an XML schema where case matters; that could be interesting). YOU need to know exactly what type of document you are going to be lexing and translating. You can tell by the DOCTYPE declaration at the top.



w3schools is wrong about these tags: HTML5 is not going to close them with a /. w3schools has nothing to do with the W3C.
husoski
2012-02-19 21:41:09 UTC
Here's where to begin:



#include <string>



This is so much easier in C++ than in C just for that one option. The other suggestion could work for you, provided that you can read enough of the file into a string to see the end of the tag you are parsing. I'd take a related approach that involves reading the file one character at a time to break it up into tokens of two kinds: tags and text.



The text you will copy to the output file unchanged (newlines and all), so you DON'T want to use >> input from a file stream, because it skips whitespace. I'd use the get() method of istream, but the getline() function from <string> could also be used. You'd have to put back the newlines that it removes, though.



My use of strings would be for the returned tokens. The getToken() function or method (if you're defining your own class for the parser as an object) will look at the next unread character and decide what kind of token to parse. If it's not a '<', then everything up to, but not including, the next '<' or end-of-file is a text token, returned as a string. If it's a '<', then everything up to and including the next *unquoted* '>' is a tag.



Quoted text in a tag can be surrounded by apostrophes (') or quotes ("). Quotes are not special between apostrophes and vice versa, so keep track of the starting quote when you're in one.



One handy function I'd write immediately is one to extract and lowercase the name of a tag. That will save the main code headaches when looking for specific tags.
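One possible shape for that helper is sketched here. The name getTagName matches the one used in this answer, but the details are assumptions: it returns an empty string for anything that is not a tag, and it keeps the leading '/' of a closing tag so that "</P>" comes back as "/p" and is never mistaken for an opening tag.

```cpp
#include <cctype>
#include <string>

// Extracts the lowercased name from a tag token such as "<IMG src=...>".
std::string getTagName(const std::string& tag) {
    std::string name;
    if (tag.empty() || tag[0] != '<') return name;   // not a tag at all

    std::size_t i = 1;                                // skip the '<'
    if (i < tag.size() && tag[i] == '/') {
        name += '/';                                  // mark closing tags
        ++i;
    }
    while (i < tag.size() &&
           std::isalnum(static_cast<unsigned char>(tag[i]))) {
        name += static_cast<char>(
            std::tolower(static_cast<unsigned char>(tag[i])));
        ++i;
    }
    return name;
}
```

The casts to unsigned char are there because passing a negative char value to isalnum or tolower is undefined behavior.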



With all this, your main loop could be as simple as:



string token = getToken();
while (token.length() > 0)
{
    if ((token[0] == '<') && isSpecialTag(getTagName(token)))
        htmlout << addSlash(token);
    else
        htmlout << token;
    token = getToken();   // read the next token, or the loop never ends
}
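The two helpers used in that loop are not spelled out, so here is one guess at them. The tag names in the set are placeholders, since the assignment's actual list is what you should use, and both function bodies are assumptions about what isSpecialTag() and addSlash() were meant to do.

```cpp
#include <set>
#include <string>

// True if the (already lowercased) tag name is one that needs fixing.
// The names below are placeholders for the assignment's four tags.
bool isSpecialTag(const std::string& name) {
    static const std::set<std::string> special = {"br", "hr", "img"};
    return special.count(name) > 0;
}

// Turns "<tag ...>" into "<tag ... />". Anything that does not end in '>'
// is returned unchanged.
std::string addSlash(const std::string& tag) {
    if (tag.empty() || tag.back() != '>') return tag;
    return tag.substr(0, tag.size() - 1) + " />";
}
```

To get the count of corrections, increment a counter in the same branch of the loop that calls addSlash().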



Most of your work will be in the getToken() function. If you're comfortable with defining your own class types, I'd suggest a parser object that contains any state variables for parsing, like the ifstream for reading the input file, and a flag or nextChar variable that remembers that the last token operation read, but did not return, a '<' character.
ʄaçade
2012-02-19 20:52:57 UTC
The std::string class has a search method, find(). For each type of correction, go hunt for it throughout the file. When you find one, count it and replace it. Keep a separate counter for each kind of token you seek. Because HTML tokens can span multiple lines, you need to disregard line boundaries (read the whole file into one big string first). Rescan the big string once for each token type. When you have scanned and replaced every occurrence of each token type, your big string buffer becomes the new version of the HTML file. (Do not overwrite your input file, ever!)
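A rough sketch of this whole-file approach, using std::string::find and std::string::insert. The helper names readAll and fixAll are invented for illustration, and the prefix you pass in (such as "<br") is whatever the assignment lists. Note that find() is case-sensitive, so "<BR" would need its own pass.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads everything remaining in the stream into one string.
std::string readAll(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Inserts " /" before the '>' that closes each tag starting with prefix.
// Returns the number of tags changed.
int fixAll(std::string& html, const std::string& prefix) {
    int fixes = 0;
    std::string::size_type pos = html.find(prefix);
    while (pos != std::string::npos) {
        const std::string::size_type close = html.find('>', pos);
        if (close == std::string::npos) break;   // unterminated tag: stop
        html.insert(close, " /");
        ++fixes;
        pos = html.find(prefix, close + 3);      // resume after the new " />"
    }
    return fixes;
}
```

Call fixAll once per tag prefix, add up the returned counts, and then write the string to a new output file.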





http://www.cplusplus.com/reference/string/string/
serena
2016-10-13 13:50:13 UTC
#include <stdio.h>
int main(void) { FILE *fp; fp = fopen("MYFILE.txt", "a"); fprintf(fp, "%s\n", "Hello world, where there is a will, there's a way."); fclose(fp); return 0; }


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.