stack build
stack run
- don't load invertedIndex in search
- in invertedIndex, sort the links for each word according to pageRank
- each word should have a numeric id, and the inverted index should map these ids to links
- it would be nice to move all data into a database (e.g. MongoDB) or an in-memory store
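The word-id and ranked-links ideas from the list above can be sketched as follows. This is a minimal sketch, assuming the collection has already been parsed into (link, words) pairs and the pageRank scores are available as a Data.Map; the names WordId, assignIds, buildIndex and lookupWord are hypothetical and not part of the project. It uses only the containers package.

```haskell
import Data.List (foldl', nub, sortBy)
import qualified Data.Map.Strict as M
import Data.Ord (Down (..), comparing)

-- Hypothetical type aliases for this sketch.
type WordId = Int
type Link = String

-- Give each distinct word a numeric id in order of first appearance.
assignIds :: [String] -> M.Map String WordId
assignIds = foldl' step M.empty
  where
    step m w
      | M.member w m = m
      | otherwise = M.insert w (M.size m) m

-- Build a word -> id table and an id -> links index. Each posting list is
-- sorted by descending pageRank (links without a score count as rank 0).
buildIndex ::
  M.Map Link Double ->
  [(Link, [String])] ->
  (M.Map String WordId, M.Map WordId [Link])
buildIndex ranks docs = (ids, M.map byRank postings)
  where
    ids = assignIds (concatMap snd docs)
    postings =
      M.fromListWith (++)
        [(ids M.! w, [l]) | (l, ws) <- docs, w <- nub ws]
    rank l = M.findWithDefault 0 l ranks
    byRank = sortBy (comparing (Down . rank))

-- Translate a word to its id, then return its links, best-ranked first.
lookupWord :: (M.Map String WordId, M.Map WordId [Link]) -> String -> [Link]
lookupWord (ids, index) w =
  maybe [] (\i -> M.findWithDefault [] i index) (M.lookup w ids)
```

Because the links are sorted once at index time, search only has to look up the id and take the list as-is.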
- parse - requires a collection.jl file in the folder it is run from
- pageRank - requires a webPageInfo.txt file in the folder it is run from
- invertedIndex - same as pageRank (requires webPageInfo.txt)
- search - requires pageRank.txt and invertedIndex.txt files in the folder it is run from
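The file requirements listed above could be checked before an operation runs. This is a minimal sketch, assuming a hypothetical Operation type and the helper names requiredFiles, missingFrom and missingFiles, none of which come from the project; it uses doesFileExist from the directory package that ships with GHC.

```haskell
import Control.Monad (filterM)
import System.Directory (doesFileExist)

-- The four operations offered by the menu (hypothetical names).
data Operation = Parse | PageRank | InvertedIndex | Search
  deriving (Show, Eq, Enum, Bounded)

-- Input files each operation expects, as listed in this README.
requiredFiles :: Operation -> [FilePath]
requiredFiles Parse = ["collection.jl"]
requiredFiles PageRank = ["webPageInfo.txt"]
requiredFiles InvertedIndex = ["webPageInfo.txt"]
requiredFiles Search = ["pageRank.txt", "invertedIndex.txt"]

-- Return the paths that do not exist. Relative paths resolve against the
-- current working directory, i.e. the folder the program is run from.
missingFrom :: [FilePath] -> IO [FilePath]
missingFrom = filterM (fmap not . doesFileExist)

-- Check an operation's inputs before running it.
missingFiles :: Operation -> IO [FilePath]
missingFiles = missingFrom . requiredFiles
```

A caller could run missingFiles on the chosen operation and print the returned paths instead of starting the operation when the list is non-empty.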
When you start the application, a menu appears with the following options:
- Parse collection.jl file
- Calculate pageRank
- Calculate reverse index
- Search
After each operation finishes, the menu appears again.
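A menu that reappears after every operation is naturally written as a recursive IO loop. This is a minimal sketch, not the project's code: the Choice type, the option numbers, parseChoice, menuLoop and the extra "q) Quit" entry are all assumptions made for illustration.

```haskell
import System.IO (hFlush, stdout)

-- Hypothetical menu choices; Quit is an assumption, not in the README.
data Choice = ParseCollection | CalcPageRank | CalcIndex | RunSearch | Quit
  deriving (Show, Eq)

-- Map the user's input to a choice (the numbering is assumed).
parseChoice :: String -> Maybe Choice
parseChoice s = case words s of
  ["1"] -> Just ParseCollection
  ["2"] -> Just CalcPageRank
  ["3"] -> Just CalcIndex
  ["4"] -> Just RunSearch
  ["q"] -> Just Quit
  _ -> Nothing

menuText :: String
menuText =
  unlines
    [ "1) Parse collection.jl file"
    , "2) Calculate pageRank"
    , "3) Calculate reverse index"
    , "4) Search"
    , "q) Quit"
    ]

-- Show the menu, run the chosen action, then show the menu again.
menuLoop :: (Choice -> IO ()) -> IO ()
menuLoop runChoice = do
  putStr menuText
  putStr "> "
  hFlush stdout
  line <- getLine
  case parseChoice line of
    Just Quit -> pure ()
    Just c -> runChoice c >> menuLoop runChoice
    Nothing -> putStrLn "Unknown option." >> menuLoop runChoice
```

For example, `main = menuLoop (\c -> putStrLn ("Running " ++ show c))` would exercise the loop; passing the action in as a function keeps the loop separate from the parse, pageRank, index and search code.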
pageRank references:
- https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.258.9919&rep=rep1&type=pdf
- http://cs.brown.edu/courses/cs016/static/files/assignments/projects/GraphHelpSession.pdf
- https://michaelnielsen.org/blog/using-your-laptop-to-compute-pagerank-for-millions-of-webpages/
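PageRank is commonly computed by power iteration: start from a uniform distribution and repeatedly redistribute each page's rank over its outgoing links. This is a minimal sketch of that technique, assuming the link graph (for example from webPageInfo.txt, whose format is not described here) has already been loaded into a Data.Map from each page to the pages it links to, and that every link target is also a key of that map. The names Graph, pageRankStep and pageRank are hypothetical; the damping factor 0.85 used below is a common choice, not necessarily the project's.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical graph type: each page maps to the pages it links to.
-- Assumes every link target is also a key of the map.
type Graph = M.Map String [String]

-- One power-iteration step with damping factor d. The rank held by pages
-- with no outgoing links (dangling pages) is spread evenly over all pages.
pageRankStep :: Double -> Graph -> M.Map String Double -> M.Map String Double
pageRankStep d g pr = M.mapWithKey newRank pr
  where
    n = fromIntegral (M.size g)
    dangling = sum [pr M.! p | (p, outs) <- M.toList g, null outs]
    incoming =
      M.fromListWith (+)
        [ (q, pr M.! p / fromIntegral (length outs))
        | (p, outs) <- M.toList g
        , q <- outs
        ]
    newRank p _ = (1 - d) / n + d * (M.findWithDefault 0 p incoming + dangling / n)

-- Start from the uniform distribution and apply a fixed number of steps.
pageRank :: Double -> Int -> Graph -> M.Map String Double
pageRank d iters g = iterate (pageRankStep d g) start !! iters
  where
    start = M.map (const (1 / fromIntegral (M.size g))) g
```

With this formulation the ranks keep summing to 1, so for instance `pageRank 0.85 50 graph` yields a probability distribution over pages. A fixed iteration count keeps the sketch short; stopping once the change between steps falls below a tolerance is a common alternative.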
Dataset: https://www.kaggle.com/datasets/aldebbaran/html-br-collection