Creating a ground truth multilingual dataset of news and talk show transcriptions through crowdsourcing