search-project

stack build
stack run 

TO-DO

  • don't load invertedIndex in search
  • in invertedIndex, sort the links for each word according to pageRank
  • each word should have an id (number), and the inverted index should consist of ids and links
  • it would be nice to move all data to an in-memory database (e.g. MongoDB)
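
The second and third TO-DO items could be sketched roughly as follows. This is not the project's current code: the names `WordId`, `Link`, `assignIds` and `buildIndex` are hypothetical, the sketch assumes the `containers` package is available, and it treats links without a known rank as rank 0.

```haskell
import Data.List (sortBy)
import qualified Data.Map.Strict as Map
import Data.Ord (Down (..), comparing)
import qualified Data.Set as Set

-- Hypothetical types, introduced only for this sketch.
type WordId = Int
type Link = String

-- Give every distinct word a numeric id, in ascending word order.
assignIds :: [String] -> Map.Map String WordId
assignIds ws = Map.fromList (zip (Set.toAscList (Set.fromList ws)) [0 ..])

-- Build an index from word id to links, each link list sorted by
-- descending rank. Links missing from the rank table count as rank 0
-- (an assumption), and words without an id are skipped.
buildIndex ::
  Map.Map Link Double ->
  Map.Map String WordId ->
  [(String, Link)] ->
  Map.Map WordId [Link]
buildIndex ranks ids pairs =
  Map.map (sortBy (comparing (Down . rankOf)) . Set.toList) grouped
  where
    rankOf l = Map.findWithDefault 0 l ranks
    grouped =
      Map.fromListWith
        Set.union
        [(i, Set.singleton l) | (w, l) <- pairs, Just i <- [Map.lookup w ids]]
```

With ranks `a.html = 0.2` and `b.html = 0.5`, and the pairs `("cat", "a.html")`, `("cat", "b.html")`, `("dog", "a.html")`, the word `cat` gets id 0 and maps to `["b.html", "a.html"]`, while `dog` gets id 1 and maps to `["a.html"]`.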

Requirements

  1. parse - needs the collection.jl file in the folder the application is run from
  2. pageRank - needs the webPageInfo.txt file in the folder the application is run from
  3. invertedIndex - needs the webPageInfo.txt file (same as pageRank)
  4. search - needs the pageRank.txt and invertedIndex.txt files in the folder the application is run from
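
A check for these files before running an operation might look like the sketch below. The function names `requiredFiles` and `missingFiles` are hypothetical and not taken from the project; the file names come from the list above, and `doesFileExist` is from the `directory` package.

```haskell
import Control.Monad (filterM)
import System.Directory (doesFileExist)

-- Input files per operation, as listed in the requirements.
-- The operation names are used as plain strings here for illustration.
requiredFiles :: String -> [FilePath]
requiredFiles "parse" = ["collection.jl"]
requiredFiles "pageRank" = ["webPageInfo.txt"]
requiredFiles "invertedIndex" = ["webPageInfo.txt"]
requiredFiles "search" = ["pageRank.txt", "invertedIndex.txt"]
requiredFiles _ = []

-- Return the required files that are missing from the current directory.
missingFiles :: String -> IO [FilePath]
missingFiles op = filterM (fmap not . doesFileExist) (requiredFiles op)
```

Calling `missingFiles "search"` in an empty folder would return both `pageRank.txt` and `invertedIndex.txt`, which the caller could report before starting the search.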

Usage

When you start the application, a menu appears and you can choose from:

  1. Parse collection.jl file
  2. Calculate pageRank
  3. Calculate reverse index
  4. Search

After the chosen operation finishes, the menu appears again.
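
The menu behaviour described above can be sketched as a small loop. This is an illustration only, not the project's actual implementation: `Action`, `parseChoice` and `menuLoop` are hypothetical names, and the handler for each action is passed in by the caller.

```haskell
import System.IO (hFlush, stdout)

-- One constructor per menu entry (hypothetical type for this sketch).
data Action = Parse | PageRank | InvertedIndex | Search
  deriving (Show, Eq)

-- Map the typed menu number to an action; anything else is rejected.
parseChoice :: String -> Maybe Action
parseChoice "1" = Just Parse
parseChoice "2" = Just PageRank
parseChoice "3" = Just InvertedIndex
parseChoice "4" = Just Search
parseChoice _ = Nothing

-- Print the menu, run the chosen action, then show the menu again.
-- The loop has no exit option, so it runs until the process is stopped.
menuLoop :: (Action -> IO ()) -> IO ()
menuLoop run = do
  putStrLn "1. Parse collection.jl file"
  putStrLn "2. Calculate pageRank"
  putStrLn "3. Calculate reverse index"
  putStrLn "4. Search"
  putStr "> "
  hFlush stdout
  line <- getLine
  case parseChoice line of
    Just action -> run action
    Nothing -> putStrLn "Unknown option"
  menuLoop run
```

Keeping `parseChoice` pure makes the input handling easy to check separately from the I/O loop.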

Sources

pageRank:

Dataset

https://www.kaggle.com/datasets/aldebbaran/html-br-collection
