Posts

i replaced a model’s reasoning with gibberish

these days the story everyone tells is: let the model “think” before it answers, and it does better. you give it a scratchpad, it writes out a long chain of reasoning, and accuracy goes up. papers like s1 and snell et al., and the whole o1 narrative, are built on this idea of spending more compute at test time by thinking longer. while running a budget-forcing study i kept wondering about something simple. when thinking helps, is it because of what the model writes in the scratchpad? or just because it spent a few thousand extra tokens before answering — the extra forward passes, the longer context — regardless of what those tokens say? ...

softmax numerical instability

numerical instability of softmax

the softmax function is widely used in machine learning, especially for converting raw scores (logits) into probabilities in classification problems. it is defined as:

Softmax(z_i) = exp(z_i) / sum(exp(z_j) for j in range(n))

where z_i is the logit for class i and n is the number of classes.

the problem of numerical instability

the softmax function can suffer from numerical instability when the input logits z contain values of large magnitude. computing exp(z_i) overflows for large positive logits and underflows to zero for large negative ones, so the sum across all classes can become infinite or zero. ...
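to make the failure concrete, here is a minimal sketch in plain python. the excerpt is cut off before any fix is described, so the remedy shown here (subtracting the largest logit before exponentiating) is an assumption: it is the standard trick, not necessarily the one the post goes on to use. the function names `naive_softmax` and `stable_softmax` are made up for illustration.

```python
import math


def naive_softmax(logits):
    """Softmax exactly as defined above; breaks on large-magnitude logits."""
    exps = [math.exp(z) for z in logits]  # math.exp raises OverflowError for large z
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(logits):
    """Softmax with the max logit subtracted first (assumed remedy).

    Subtracting a constant from every logit leaves the result unchanged,
    because the factor exp(-m) cancels between numerator and denominator.
    After the shift the largest exponent is 0, so exp never overflows and
    the denominator is at least 1.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    big = [1000.0, 1001.0, 1002.0]
    try:
        naive_softmax(big)
    except OverflowError as err:
        print("naive softmax failed:", err)
    print("stable softmax:", stable_softmax(big))
```

because softmax only depends on the differences between logits, `stable_softmax([1000.0, 1001.0, 1002.0])` gives the same probabilities as `stable_softmax([0.0, 1.0, 2.0])`.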

blockbuster

recently, while watching the music video for faris shafi’s blockbuster, i noticed something interesting. the song was in punjabi, but the subtitles (and even the hoardings and other text) on the screen were in the arabic script. it struck me then that punjabi is quite a unique language, as it’s written in two different scripts: shahmukhi (used mainly in pakistan) and gurmukhi (used mainly in india). i can understand some gurmukhi punjabi, but shahmukhi remains a mystery to me. yet, since the two scripts represent the same sounds, it should be possible to transliterate one into the other. might be worth exploring. ...

GREENER

greener: graph neural networks for news media profiling

in the summer of 2020 i got an opportunity to work on GREENER with dr. preslav nakov and the tanbih team. the project aimed to detect the bias and factuality of news media by going beyond mere textual signals. in this blog post, we’ll dive into the technical details behind greener, including its methods and concepts.

problem definition

greener’s goal is to profile news media outlets based on two characteristics: ...

Graphing the Movies

First Post

With the end-of-semester exams coming up, I figured it’s the perfect time to start my blog (yeah, great timing, right?). As I was studying Graphs for exam prep, I wanted to implement something cool using them. After searching for a while, I stumbled upon this dataset. It’s from Moviegalaxies, a site that provides network graph data from around 773 films. Pretty neat, huh?

Data Processing

So, I took the raw data and processed it into a CSV file, showing how two characters appear together in a scene, which is represented as edges of a graph. For example, if you look at The Godfather, there are two scenes where Anthony and Michael both appear together. Here’s how it looks in the CSV format: ...