Help with a mini search engine in C++

Discussion in 'Programmer's Corner' started by moslem, Jun 27, 2010.

  1. moslem

    Thread Starter New Member

    Dec 16, 2009
    Hello everyone,
    I have a project about a mini search engine,
    and this is what the doctor wrote to us:
    You are given a folder of input text or HTML files (say 50 files) and a set of keywords (say 40 keywords).
    Build an efficient data structure (an index file) using hashing that will provide information as to which files
    contain a certain keyword and in which line. The structure should be saved on hard disk and loaded in
    memory upon system startup. Then you build a small program in which a user will have a simple
    interface asking for a keyword:
    Input Keyword to Search:
    The user then types "water" for example.
    The program then searches the index file and should have an output such as:

    The keyword “water” exists in:
    water-resources.html:
    Line 212: The water problem in the middle east ....
    Line 345: The Nile water is the main source of ...
    Line 2003: Water is a blessing from God ...
    arab-water-supply.txt:
    Line 2: shortage of water in ..
    Line 25: Libya has ample supply of water from rain ...
    And so on. Can you handle the requirement that when a file is changed, the index is reconstructed?
    Please, I'd like some ideas about this and about how to plan the project.
    Thanks a lot.
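One possible way to handle the "saved on hard disk, loaded at startup, rebuilt when a file changes" requirements, sketched under several assumptions: C++17 (std::filesystem postdates this thread), a simple tab-separated text format for the index file that is made up here for illustration, and an index file stored outside the input folder. The helper names saveIndex, loadIndex and indexIsStale are hypothetical. "Stale" here only means that some input file was modified after the index was last written; deleted input files are not detected by this check.

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

namespace fs = std::filesystem;

// keyword -> (file name -> line numbers where the keyword appears)
using Index = std::unordered_map<std::string, std::map<std::string, std::vector<int>>>;

// Hypothetical on-disk format, one record per line:
//   keyword<TAB>file<TAB>line1 line2 line3 ...
void saveIndex(const Index& index, const std::string& path) {
    std::ofstream out(path);
    for (const auto& [keyword, files] : index)
        for (const auto& [file, lines] : files) {
            out << keyword << '\t' << file << '\t';
            for (int n : lines) out << n << ' ';
            out << '\n';
        }
}

// Read the format written by saveIndex back into memory.
Index loadIndex(const std::string& path) {
    Index index;
    std::ifstream in(path);
    std::string record;
    while (std::getline(in, record)) {
        std::istringstream fields(record);
        std::string keyword, file, numbers;
        if (!std::getline(fields, keyword, '\t') || !std::getline(fields, file, '\t'))
            continue;  // skip malformed records
        std::getline(fields, numbers);
        std::istringstream nums(numbers);
        auto& lines = index[keyword][file];
        for (int n; nums >> n;) lines.push_back(n);
    }
    return index;
}

// The index needs rebuilding if it does not exist yet, or if any input
// file was written after the index file was last written.
bool indexIsStale(const fs::path& indexFile, const fs::path& inputDir) {
    if (!fs::exists(indexFile)) return true;
    const auto indexTime = fs::last_write_time(indexFile);
    for (const auto& entry : fs::directory_iterator(inputDir))
        if (entry.is_regular_file() && entry.last_write_time() > indexTime)
            return true;
    return false;
}
```

At startup a program could call indexIsStale; if it returns true, rebuild the index from the files and saveIndex it, otherwise just loadIndex the saved copy.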
     
  2. someonesdad

    Senior Member

    Jul 7, 2009
    This is a common and relatively straightforward task usually assigned in a beginning data structures course. Almost any algorithm/data structures book will have information on it. Instead of asking people for ideas, consult one of the many textbooks that discuss the technique; then let us know what parts of the problem you don't understand. Your problem statement doesn't say whether you have to design the hashing stuff yourself or whether you can use library stuff (writing your own hashing stuff is a bit more work, but a good exercise if you've never done it before -- especially devising decent strategies to deal with collisions). If you can use library stuff, the hash-based map in GNU's STL extensions (hash_map) or in Boost (boost::unordered_map) could be used to satisfy the spirit of the problem, if not the teacher's wishes. :)
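If you do write the hashing yourself, a minimal sketch of a keyword table that resolves collisions by separate chaining might look like the following. The class name KeywordTable, the default bucket count of 101 and the djb2 string hash are assumptions for illustration, not part of the assignment; open addressing (linear or quadratic probing) is the other common collision strategy.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// One posting = (file name, line number) where a keyword occurs.
using Posting = std::pair<std::string, int>;

// Hash table with separate chaining: each bucket is a linked list of
// entries whose keywords hashed to that bucket.
class KeywordTable {
public:
    explicit KeywordTable(std::size_t bucketCount = 101)
        : buckets_(bucketCount > 0 ? bucketCount : 1) {}  // avoid modulo by zero

    void add(const std::string& keyword, const std::string& file, int line) {
        auto& chain = buckets_[bucketFor(keyword)];
        for (auto& entry : chain)
            if (entry.keyword == keyword) {  // keyword already stored in this chain
                entry.postings.emplace_back(file, line);
                return;
            }
        chain.push_back(Entry{keyword, {Posting{file, line}}});  // new keyword
    }

    // Returns nullptr if the keyword was never added.
    const std::vector<Posting>* find(const std::string& keyword) const {
        for (const auto& entry : buckets_[bucketFor(keyword)])
            if (entry.keyword == keyword) return &entry.postings;
        return nullptr;
    }

private:
    struct Entry {
        std::string keyword;
        std::vector<Posting> postings;
    };

    // djb2 string hash: h = h * 33 + c, reduced modulo the bucket count.
    std::size_t bucketFor(const std::string& key) const {
        std::size_t h = 5381;
        for (unsigned char c : key) h = h * 33 + c;
        return h % buckets_.size();
    }

    std::vector<std::list<Entry>> buckets_;
};
```

With only about 40 keywords the chains stay short, so this sketch never resizes; a general-purpose table would grow its bucket array when the load factor gets high.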

    Hints: some of the major tasks are parsing the input, creating and populating the data structure you use, then gathering and processing the input from the user.
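Following those hints, and assuming the standard std::unordered_map (added in C++11, the standardized form of hash_map and boost::unordered_map) is acceptable, the parse, populate and query steps could be sketched like this. The function names are hypothetical; keywords are assumed to be given in lower case, words are split at any non-alphanumeric character, and HTML tags are not stripped, so tag names are tokenized like ordinary words.

```cpp
#include <cctype>
#include <fstream>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// keyword -> (file name -> line numbers); std::map keeps files sorted in output.
using Index = std::unordered_map<std::string, std::map<std::string, std::vector<int>>>;

// Lower-case a line and split it into words at every non-alphanumeric character.
std::vector<std::string> words(const std::string& line) {
    std::vector<std::string> out;
    std::string word;
    for (unsigned char c : line) {
        if (std::isalnum(c)) {
            word += static_cast<char>(std::tolower(c));
        } else if (!word.empty()) {
            out.push_back(word);
            word.clear();
        }
    }
    if (!word.empty()) out.push_back(word);
    return out;
}

// Parse one file and record every line on which a tracked keyword appears.
void indexFile(Index& index, const std::string& fileName,
               const std::set<std::string>& keywords) {
    std::ifstream in(fileName);
    std::string line;
    int lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        for (const auto& w : words(line)) {
            if (!keywords.count(w)) continue;
            auto& lines = index[w][fileName];
            if (lines.empty() || lines.back() != lineNo)  // record each line once
                lines.push_back(lineNo);
        }
    }
}

// Print results in the assignment's format (line numbers only, no line text).
void query(const Index& index, const std::string& keyword, std::ostream& out) {
    std::string key;
    for (unsigned char c : keyword) key += static_cast<char>(std::tolower(c));
    auto it = index.find(key);
    if (it == index.end()) {
        out << "The keyword \"" << keyword << "\" was not found.\n";
        return;
    }
    out << "The keyword \"" << keyword << "\" exists in:\n";
    for (const auto& [file, lines] : it->second) {
        out << file << ":\n";
        for (int n : lines) out << "Line " << n << '\n';
    }
}
```

A main() would call indexFile once per input file, then loop reading a keyword with std::cin >> keyword and call query(index, keyword, std::cout).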
     
  3. Ahmed2010

    New Member

    Jun 30, 2010
    I have the same project too... Dr. Khaled fo2ad is international now.
    Anyway, do you know how to get the name of the file you are currently reading?
    And is there a function that gets the line number?
    I thought about a counter that increases with every line, but how can I tell that I'm on a new line?
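To the questions above: standard C++ streams have no call that reports the current line number, so a counter incremented after each successful std::getline is the usual approach, and the file name is simply the name you used to open the file. A minimal sketch, assuming C++17's std::filesystem for listing the folder (in 2010 this needed a platform API or Boost.Filesystem); forEachLine and its callback signature are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Visit every line of every regular file in a folder, passing the file name,
// a 1-based line counter and the line text to the callback.
template <typename Visitor>
void forEachLine(const fs::path& folder, Visitor visit) {
    for (const auto& entry : fs::directory_iterator(folder)) {
        if (!entry.is_regular_file()) continue;
        std::ifstream in(entry.path());
        std::string line;
        int lineNo = 0;                    // counted by hand; streams don't track it
        while (std::getline(in, line))     // each successful getline = one new line
            visit(entry.path().filename().string(), ++lineNo, line);
    }
}
```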
     
  4. moslem

    Thread Starter New Member

    Dec 16, 2009
    You can do this with a loop that reads the file line by line until you reach the required line, then getline that line.
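The loop described above can be written as a small helper that returns the text of a given line, which is also one way to print the line text shown in the assignment's sample output. readLine is a hypothetical name, and std::optional assumes C++17.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Return the text of line `target` (1-based), or std::nullopt if the file
// has fewer lines. The file is re-read from the start on every call.
std::optional<std::string> readLine(const std::string& fileName, int target) {
    std::ifstream in(fileName);
    std::string line;
    for (int n = 1; std::getline(in, line); ++n)
        if (n == target) return line;
    return std::nullopt;
}
```

Re-reading the file for every hit is fine for about 50 small files; storing the line text (or the stream position from tellg()) in the index at build time would avoid the extra reads.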
     
  5. Ahmed2010

    New Member

    Jun 30, 2010
    And how do you know that the line is finished, especially when there is no ENTER at the end of it?
     
  6. moslem

    Thread Starter New Member

    Dec 16, 2009
    getline reads the line up to the Enter, so each call to getline gets the whole line. Then put that call in a for loop until you reach the required line.
     
  7. Ahmed2010

    New Member

    Jun 30, 2010
    But as I said before, some files have no ENTER, which means there is no NULL at the end of the line, so how do you know where the line ends?
     
  8. retched

    AAC Fanatic!

    Dec 5, 2009
    If there is no 'ENTER', it is still considered the same line.
    It will not "wrap" until there is an 'ENTER', so if you have a very wide screen, it will be 1 line.
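On the missing-ENTER question: a text file does not store a NULL character at the end of each line. std::getline ends a line at the newline character '\n' or at the end of the input, so a last line without a trailing Enter is still returned as one line, and a very long line with no Enter at all is a single line. The sketch below demonstrates this with a string stream instead of a file; splitLines is a hypothetical helper, and stripping a trailing '\r' is an added assumption for Windows-style files read on systems that don't translate "\r\n".

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split text into lines the way std::getline does: a line ends at '\n' or at
// the end of input, so a final line with no trailing Enter is still returned.
std::vector<std::string> splitLines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // "\r\n" endings
        lines.push_back(line);
    }
    return lines;
}
```

For example, splitLines("first\nsecond") and splitLines("first\nsecond\n") both yield the two lines "first" and "second".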
     