Merry Christmas and Happy Holidays

Wishing you all a Merry Christmas! Have a happy and fun-filled holiday season! And let us all hope to bring out our best in the coming New Year!

Cheers!

Popular posts from this blog

Text Highlighting in LaTeX

While preparing a manuscript with LaTeX, it is often useful to highlight the changes made in the current revision in a different color. This can be achieved using the \textcolor command, which becomes available once the color package is loaded. For example, \textcolor{red}{Hello World} displays the string "Hello World" in red. However, the final, published copy of the manuscript should not contain any highlighted text. If a large number of changes were made, finding and removing every highlighted portion at the end becomes tiresome. This can be circumvented by defining a utility command that switches highlighting on and off as desired. In the following, we define a new LaTeX command, \highlighttext, for this purpose. The command takes a single argument: the text to be highlighted.

    \usepackage{color}
    % For highlighting changes in this version with red color
    \newcommand{\highlighttext}[1]{\textcolor{red}{#1}}
    % Remove...
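The excerpt above is cut off before it shows how highlighting is switched off, so the following is only a minimal sketch of one possible way to do it, not necessarily the approach taken in the original post. The switch name \ifhighlightchanges is hypothetical and introduced here purely for illustration; \newif, \textcolor, and the color package are standard.

```latex
\documentclass{article}
\usepackage{color}

% Hypothetical switch (an assumption, not taken from the original post).
% \newif defines \highlightchangestrue and \highlightchangesfalse.
\newif\ifhighlightchanges
\highlightchangestrue     % draft: show revised text in red
%\highlightchangesfalse   % final copy: uncomment to print plain text

% Highlight the argument only while the switch is on.
\newcommand{\highlighttext}[1]{%
  \ifhighlightchanges
    \textcolor{red}{#1}%
  \else
    #1%
  \fi
}

\begin{document}
This sentence is unchanged. \highlighttext{This sentence was revised.}
\end{document}
```

With this arrangement, producing the final copy would only require flipping the single switch in the preamble, while every \highlighttext call in the manuscript body stays untouched.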

Cohere Aya Dataset: Exploring the Split-by-language Collection

A snapshot of the Aya collection (Bengali). Image taken from HuggingFace.

In February 2024, Cohere launched Aya, a multilingual Large Language Model (LLM). Alongside it, a set of datasets used to train Aya was also released. For example, the aya_dataset consists of around 205K examples annotated by humans. On the other hand, the recently released aya_collection_language_split is a gigantic dataset with more than 500 million data points spread across more than 100 languages. As the name suggests, this dataset is split by language. For example, all data points in Bengali, irrespective of the underlying task, can be found in a single split. Apart from the original human-annotated examples from the aya_dataset, aya_collection_language_split also contains a lot of translated and templated data. The dataset is released under an Apache-2.0 license, allowing academic and commercial use.

The Bengali Language Split

Each language split in the Aya collection has three splits. The Bengali split,...

50K Views of the DTN Blog

The DTN blog recently reached a milestone: it crossed 50,000 page views! The journey of this blog, and my own with the ONE simulator, started in 2011. Back in those days, old-timers will recall, there were not many resources available on the ONE simulator. After spending some time with it, I was finally able to gain some understanding of its functionality and workflow. I realized that a short how-to document might benefit others. What started as a humble effort to provide a quick tutorial on the ONE simulator became a popular resource in the community over the years. I'm thankful to all the users of the ONE mailing list who continuously kept me motivated to enrich this blog. I took this opportunity to prepare an infographic on the usage of the DTN blog based on the statistics provided by Blogger. It has not been possible to include all statistics in the infographic. A few interesting observations are noted below. Undoubtedly, the ONE tutorial is the mos...