Splitter for generating a transliteration corpus

Description

This is a Python script that takes as input the text file generated by the Wikipedia Parallel Title Extractor (https://github.com/clab/wikipedia-parallel-titles).
The script processes that input file to generate a parallel corpus.
The resulting parallel corpus can be used to train a transliteration model with MOSES.
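The core idea can be sketched as follows. This is only an illustrative sketch, not the code in splitter.py: it assumes each input line holds one title pair separated by '|||' (the format the extractor is understood to emit), and the function names split_pair and split_corpus are hypothetical.

```python
def split_pair(line, delimiter="|||"):
    """Split one 'source ||| target' line into a (source, target) tuple.

    Returns None when the line does not contain exactly two non-empty fields.
    The '|||' delimiter is an assumption about the extractor's output format.
    """
    parts = [part.strip() for part in line.split(delimiter)]
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


def split_corpus(lines, delimiter="|||"):
    """Collect the two sides of every well-formed line into parallel lists."""
    sources, targets = [], []
    for line in lines:
        pair = split_pair(line, delimiter)
        if pair is None:
            continue  # skip blank or malformed lines
        sources.append(pair[0])
        targets.append(pair[1])
    return sources, targets
```

The two lists stay aligned by index, which is the property a parallel corpus needs: entry i on one side corresponds to entry i on the other.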

Special thanks to Dr. Rao Muhammad Adeel Nawab and Sir Muhammad Sharjeel for their continuous support.

Download the script file (splitter.py).
Copy the input file (generated by the Wikipedia parallel titles script) into the same directory.
Run the terminal/cmd command 'python splitter.py '
Two output files will be generated, one for each language.
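For readers who want to see how the two per-language files could be produced, here is a minimal sketch. It is not the actual splitter.py: the function name write_parallel_files, its parameters, and the '|||' delimiter are assumptions made for illustration. Files are opened explicitly as UTF-8 so that Urdu text is handled the same way regardless of the terminal's default encoding.

```python
def write_parallel_files(input_path, source_path, target_path, delimiter="|||"):
    """Write each side of a 'source ||| target' file to its own output file.

    Returns the number of title pairs written. Lines without the delimiter,
    or with an empty side, are skipped so both outputs stay line-aligned.
    """
    written = 0
    with open(input_path, encoding="utf-8") as infile, \
         open(source_path, "w", encoding="utf-8") as source_out, \
         open(target_path, "w", encoding="utf-8") as target_out:
        for line in infile:
            left, sep, right = line.partition(delimiter)
            left, right = left.strip(), right.strip()
            if not sep or not left or not right:
                continue  # malformed line: no delimiter or an empty side
            source_out.write(left + "\n")
            target_out.write(right + "\n")
            written += 1
    return written
```

Line n of the first output file then corresponds to line n of the second, which is the layout a parallel training corpus is expected to have.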

This script was tested on English-Urdu parallel titles extracted from https://dumps.wikimedia.org/urwiki/20180801/ using https://github.com/clab/wikipedia-parallel-titles
Python 3.6 was used for testing.
