The Beta distribution is **a probability distribution on probabilities**. For example, we can use it to model probabilities such as the click-through rate of your advertisement, the conversion rate of customers actually purchasing on your website, how likely readers are to clap for your blog, how likely it is that Trump will…
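
To make "a distribution on probabilities" concrete, here is a minimal sketch of the Beta density, assuming the usual two-parameter form Beta(a, b) on the interval (0, 1). The helper names `beta_pdf` and `beta_mean` are hypothetical, and the click counts are made up; reading `a` and `b` as "clicks" and "non-clicks" is only an informal intuition.

```python
import math

def beta_pdf(p: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at p, for 0 < p < 1 (hypothetical helper)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # log of the normalizing constant 1 / B(a, b), via log-gamma for numerical stability
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def beta_mean(a: float, b: float) -> float:
    """Mean of Beta(a, b): our 'best guess' for the underlying probability."""
    return a / (a + b)

# Made-up example: roughly 10 clicks and 90 non-clicks, encoded as Beta(10, 90)
a, b = 10, 90
print(beta_mean(a, b))  # 0.1
# A CTR near 10% is far more plausible under this distribution than a CTR of 30%
print(beta_pdf(0.1, a, b) > beta_pdf(0.3, a, b))  # True
```

The output of the Beta distribution is itself a probability (the CTR), and the density tells us how plausible each candidate value is.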

My company held an AI conference called **TransformX** last week, and the content was really good, so I wanted to share it with my readers. IMO, it was the best AI conference this year (other than academic ones) in terms of the range of topics, interesting speakers, accessibility, relevance, etc. …

Watching the Tokyo Olympics this summer, especially Anna Kiesenhofer, the Austrian mathematician who won the gold medal in cycling, made me think that the concept of training is not unique to pro athletes but also applies well to other practitioners, e.g. engineers, musicians, dancers, etc.

I’m a software engineer and I…

My team spoke very highly of this blog post (they're also wondering whether self-supervised learning could eliminate the need for labeling entirely), so I gave it a read. It was a very well-written, thorough overview of self-supervised learning. What stood out the most was that it was written by Dr. LeCun…

Camera calibration, or camera resectioning, **estimates the parameters of a pinhole camera model** given a photograph. Usually, the pinhole camera parameters are represented in a 3 × 4 matrix called the camera matrix. …
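
Calibration runs "backwards" from what the camera matrix does going forward: the matrix maps a 3D point to a pixel. Below is a minimal sketch of that forward projection, assuming the common decomposition P = K [R | t] with square pixels and zero skew. The function names `camera_matrix` and `project` and all numeric values are hypothetical, chosen only for illustration.

```python
def camera_matrix(f, cx, cy, R, t):
    """Build the 3x4 matrix P = K [R | t] (hypothetical helper).

    f: focal length in pixels, (cx, cy): principal point,
    R: 3x3 rotation as nested lists, t: translation of length 3.
    Assumes square pixels and zero skew.
    """
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]  # intrinsics
    Rt = [R[r] + [t[r]] for r in range(3)]  # 3x4 extrinsics [R | t]
    # Matrix product K (3x3) times Rt (3x4) gives the 3x4 camera matrix
    return [[sum(K[r][k] * Rt[k][c] for k in range(3)) for c in range(4)]
            for r in range(3)]

def project(P, X):
    """Project 3D point X = (x, y, z) to pixel coordinates (u, v)."""
    Xh = [X[0], X[1], X[2], 1.0]  # homogeneous coordinates
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return u / w, v / w  # divide by depth to get pixels

R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P = camera_matrix(800.0, 320.0, 240.0, R_identity, [0.0, 0.0, 0.0])
print(project(P, (0.5, -0.25, 2.0)))  # (520.0, 140.0)
```

Calibration is the inverse problem: given many known 3D points and their observed pixel locations, estimate the entries of P (or K, R and t).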

*The views expressed on this post are mine alone and do not reflect the views of my employer, Microsoft.*

**Text-to-SQL** is the task of automatically translating a user's question, asked in natural language, into SQL. It is the project I'm working on at Microsoft.

If this problem is solved…

If you read scientific papers (medical, artificial intelligence, climate, political, etc.) or poll results, there is one term that almost always appears: the p-value.

But what exactly is a p-value? Why does it show up in all these contexts?
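
As a first intuition: a p-value is the probability, computed under the assumption that the null hypothesis is true, of seeing data at least as extreme as what we actually observed. Here is a minimal sketch using a made-up coin-flip experiment and a one-sided exact binomial test; the function name `binomial_p_value` and the numbers are hypothetical.

```python
from math import comb

def binomial_p_value(k: int, n: int, p0: float = 0.5) -> float:
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p0) (hypothetical helper).

    Null hypothesis: each trial succeeds with probability p0.
    """
    # Sum the probabilities of every outcome at least as extreme as k
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Made-up data: 60 heads in 100 flips. How surprising is that for a fair coin?
print(binomial_p_value(60, 100))
```

The result is roughly 0.03: if the coin were fair, 60 or more heads would happen only about 3% of the time. A small p-value says the data would be surprising under the null hypothesis; it is not the probability that the null hypothesis is true.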

This table lists the symptoms and their…

The (somewhat vague) term “Operations Research” was coined around the time of World War II. The British military brought together a group of scientists to allocate scarce resources (for example, food, medics, weapons, troops, etc.) as effectively as possible across different military **operations**. So the term “*operations*” is from…

Prior probability is **the probability of an event before we see the data**.

In Bayesian Inference, the prior is our guess about the probability based on what we know now, before new data becomes available.

The conjugate prior simply cannot be understood without knowing Bayesian inference.

For the rest of…

In one sentence: to **update the probability as we gather more data.**

The core of Bayesian Inference is to combine two different distributions (likelihood and prior) into one “smarter” distribution (posterior). The posterior is **“smarter” in the sense that the classic maximum likelihood estimation (MLE) doesn’t take into account a prior…**
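
A minimal sketch of this update for a click-through rate, assuming a Beta prior and binomial (click / no-click) data. Because the Beta prior is conjugate to the binomial likelihood, the posterior is again a Beta distribution, Beta(a + successes, b + failures). The helper names (`mle`, `update_beta`, `beta_mean`) and all numbers are hypothetical.

```python
def mle(successes: int, trials: int) -> float:
    """Maximum likelihood estimate: uses only the observed data."""
    return successes / trials

def update_beta(a: float, b: float, successes: int, failures: int) -> tuple:
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + successes, b + failures

def beta_mean(a: float, b: float) -> float:
    return a / (a + b)

# Made-up prior: we believe the CTR is around 10%, encoded as Beta(10, 90)
prior = (10.0, 90.0)
# Made-up new data: 3 clicks in 5 impressions
posterior = update_beta(*prior, successes=3, failures=2)

print(mle(3, 5))               # 0.6
print(beta_mean(*posterior))   # 13/105, about 0.124
```

With only 5 impressions, the MLE jumps to 60%, while the posterior mean moves only slightly away from the prior's 10%. As more data arrives, the data terms dominate and the two estimates converge. With a uniform prior Beta(1, 1), the posterior mean becomes (successes + 1) / (trials + 2).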