Question:
Help with problem in C?
Lenny
2008-07-24 13:43:24 UTC
I need to parse through a list of words (say, in a text file) and print all of the words only once. Specifically, I mean that there are duplicates in the file and I need to print all the words without printing any of them twice. Any ideas on how to do this?
Four answers:
cja
2008-07-24 14:08:32 UTC
The suggestion to use an STL map would be good, if you were using C++.



You asked about C, though, so I'm going to suggest a linked list. Each node of the list will contain a unique word from the file. For efficiency, keep the list in sorted order: when inserting a word, you can stop searching as soon as you reach a word that sorts after it, instead of scanning the entire list every time. You can also add a count to each node to keep track of how many times each word appears in the file.



A cleverer solution is to use a data structure that allows more efficient searching than the linear search the simple linked list requires: some kind of tree, perhaps, or a skip list. I'd recommend getting the linked-list version working first, and only then thinking about optimization.
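As a rough illustration of the tree idea, here is a minimal, unbalanced binary search tree in C. It assumes words are compared case-sensitively with strcmp, and the names treeNode, treeInsert, treePrint and treeFree are hypothetical. Each comparison sends the search left or right, so a lookup costs time proportional to the tree's height rather than to the number of words; be aware that an unbalanced tree fed already-sorted input degenerates into a linked list.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tree node: one unique word, its count, and two children. */
struct treeNode {
    char *word;
    int count;
    struct treeNode *left, *right;
};

/* Insert a copy of `word`, or increment its count if it is already present.
   Returns 0 on success, -1 on allocation failure. */
int treeInsert(struct treeNode **root, const char *word)
{
    while (*root != NULL) {
        int cmp = strcmp(word, (*root)->word);
        if (cmp == 0) {
            (*root)->count++;          /* duplicate: just count it */
            return 0;
        }
        /* smaller words go left, larger words go right */
        root = (cmp < 0) ? &(*root)->left : &(*root)->right;
    }

    struct treeNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->word = malloc(strlen(word) + 1);
    if (node->word == NULL) {
        free(node);
        return -1;
    }
    strcpy(node->word, word);
    node->count = 1;
    node->left = node->right = NULL;
    *root = node;                      /* hang the new leaf where the search ended */
    return 0;
}

/* In-order traversal prints each unique word once, in sorted order. */
void treePrint(const struct treeNode *node)
{
    if (node == NULL)
        return;
    treePrint(node->left);
    printf("%s\n", node->word);
    treePrint(node->right);
}

/* Free both subtrees, then the word copy, then the node itself. */
void treeFree(struct treeNode *node)
{
    if (node == NULL)
        return;
    treeFree(node->left);
    treeFree(node->right);
    free(node->word);
    free(node);
}
```

Start with struct treeNode *root = NULL, call treeInsert(&root, word) for every word read, then treePrint(root) and treeFree(root).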



Your linked list node struct could look like this:



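struct listNode {
    char *word;               /* the word itself (its own malloc'd copy) */
    int count;                /* how many times the word appears */
    struct listNode *next;    /* next node, in sorted order */
};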



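Some careful memory management will be required: each node needs its own copy of the word (allocated with malloc), and every word and node must be freed when you're done.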
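A minimal sketch of the sorted linked list described above, using the listNode struct from this answer. It assumes words are whitespace-separated and compared case-sensitively with strcmp; the function names insertWord, printList and freeList are hypothetical. Copying each word into its own malloc'd buffer lets the caller reuse a single read buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct listNode {
    char *word;               /* the word itself (its own malloc'd copy) */
    int count;                /* how many times the word appears */
    struct listNode *next;    /* next node, in sorted order */
};

/* Insert `word` into the sorted list at *head, or bump its count if it is
   already there. Returns 0 on success, -1 if memory runs out. */
int insertWord(struct listNode **head, const char *word)
{
    struct listNode **link = head;

    /* Walk past every node whose word sorts before `word`. Because the list
       is sorted, the search stops here instead of scanning the whole list. */
    while (*link != NULL && strcmp((*link)->word, word) < 0)
        link = &(*link)->next;

    if (*link != NULL && strcmp((*link)->word, word) == 0) {
        (*link)->count++;     /* duplicate: just count it */
        return 0;
    }

    struct listNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->word = malloc(strlen(word) + 1);
    if (node->word == NULL) {
        free(node);
        return -1;
    }
    strcpy(node->word, word);
    node->count = 1;
    node->next = *link;       /* splice in before the first larger word */
    *link = node;
    return 0;
}

/* Print each unique word once, in sorted order. */
void printList(const struct listNode *node)
{
    for (; node != NULL; node = node->next)
        printf("%s\n", node->word);
}

/* Free every node and the word copy it owns. */
void freeList(struct listNode *node)
{
    while (node != NULL) {
        struct listNode *next = node->next;
        free(node->word);
        free(node);
        node = next;
    }
}
```

Reading the file could then be a loop such as while (fscanf(fp, "%63s", buf) == 1) insertWord(&head, buf); with char buf[64] and head starting as NULL, followed by printList(head) and freeList(head).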
James C
2008-07-24 13:48:23 UTC
Well, without giving you source code: you'd probably want to open a stream to the file and set up a buffer array to store the words you've already seen. Read each word in, and use a for loop over all the entries in the array to check whether the current word is already there. If it isn't, add it; if it is, skip it. Then just print all the words in the array.
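A minimal sketch of this fixed-array approach, assuming at most MAX_WORDS distinct words, each shorter than MAX_LEN characters; collectUnique, MAX_WORDS and MAX_LEN are hypothetical names and limits. fscanf with %s splits on whitespace only, so punctuation stays attached to words, and a word longer than 63 characters is read in pieces.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 1000   /* hypothetical capacity limit */
#define MAX_LEN   64     /* longest stored word, including the '\0' */

/* Read whitespace-separated words from `in` and store each one the first
   time it is seen. Returns the number of unique words stored; new words
   beyond maxWords are ignored. */
size_t collectUnique(FILE *in, char seen[][MAX_LEN], size_t maxWords)
{
    char buf[MAX_LEN];
    size_t n = 0;

    /* The width 63 must equal MAX_LEN - 1 so the terminator always fits. */
    while (fscanf(in, "%63s", buf) == 1) {
        size_t i;
        for (i = 0; i < n; i++)          /* linear search of seen words */
            if (strcmp(seen[i], buf) == 0)
                break;
        if (i == n && n < maxWords) {    /* not found: add it */
            strcpy(seen[n], buf);
            n++;
        }
    }
    return n;
}
```

The caller opens the file with fopen, passes the FILE pointer along with a char seen[MAX_WORDS][MAX_LEN] array, and then prints seen[0] through seen[n - 1].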
anonymous
2008-07-24 13:48:48 UTC
Create an array of strings. As the program reads each word, it should compare it against the words already in the array. If the word isn't there, add it to the array; if it is, move on to the next word in the file.



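Just make sure that your array is large enough: a fixed-size array can't grow once the program is running. If you can't predict how many words there will be, allocate the array with malloc and enlarge it with realloc as needed.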
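A minimal sketch of the growable alternative, keeping the same linear search; the names wordSet, wordSetAdd and wordSetFree are hypothetical. Doubling the capacity keeps the number of realloc calls small, and assigning realloc's result to a temporary first avoids losing the original array if the allocation fails.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of unique words. */
struct wordSet {
    char **words;     /* each entry is a malloc'd copy of a word */
    size_t count;     /* number of words stored */
    size_t capacity;  /* number of slots allocated */
};

/* Add `word` if it is not already present, growing the array when full.
   Returns 1 if added, 0 if it was a duplicate, -1 on allocation failure. */
int wordSetAdd(struct wordSet *set, const char *word)
{
    for (size_t i = 0; i < set->count; i++)   /* linear search */
        if (strcmp(set->words[i], word) == 0)
            return 0;

    if (set->count == set->capacity) {
        size_t newCap = set->capacity ? set->capacity * 2 : 16;
        /* realloc into a temporary so the old array survives a failure */
        char **grown = realloc(set->words, newCap * sizeof *grown);
        if (grown == NULL)
            return -1;
        set->words = grown;
        set->capacity = newCap;
    }

    char *copy = malloc(strlen(word) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, word);
    set->words[set->count++] = copy;
    return 1;
}

/* Free every word copy and the array itself, leaving an empty set. */
void wordSetFree(struct wordSet *set)
{
    for (size_t i = 0; i < set->count; i++)
        free(set->words[i]);
    free(set->words);
    set->words = NULL;
    set->count = set->capacity = 0;
}
```

Start from struct wordSet set = {NULL, 0, 0}; because realloc(NULL, n) behaves like malloc(n), the first call needs no special case.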
anonymous
2008-07-24 13:48:22 UTC
This sort of problem is easy to solve with the Standard Template Library (in C++). The technique to avoid duplicates is to use an associative array, which the STL calls a "map": inserting a word that is already present doesn't create a second entry.


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.