Q-Function

The Q-function Q^π(s, a) gives the expected cumulative reward obtained by taking action a in state s and then following policy π.
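
The definition above can be made concrete with a small tabular sketch. The two-state MDP, the `transitions` and `policy` tables, the `evaluate_q` helper, and the discount factor `gamma = 0.9` are all assumptions made for illustration and do not come from the original text. The code repeatedly applies the Bellman expectation backup, Q(s, a) = sum over s' of P(s' | s, a) * (r + gamma * sum over a' of pi(a' | s') * Q(s', a')), until the values settle.

```python
# Hypothetical two-state, two-action MDP used only for illustration.
# transitions[s][a] is a list of (probability, next_state, reward) triples.
transitions = {
    0: {0: [(1.0, 0, 0.0)],   # action 0: stay in state 0, no reward
        1: [(1.0, 1, 1.0)]},  # action 1: move to state 1, reward 1
    1: {0: [(1.0, 1, 0.0)],   # state 1 is absorbing with zero reward
        1: [(1.0, 1, 0.0)]},
}

# policy[s][a] is the probability of choosing action a in state s.
policy = {0: {0: 0.0, 1: 1.0}, 1: {0: 1.0, 1: 0.0}}


def evaluate_q(transitions, policy, gamma=0.9, sweeps=200):
    """Estimate Q under `policy` by repeated Bellman expectation backups."""
    q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    for _ in range(sweeps):
        new_q = {}
        for s, acts in transitions.items():
            new_q[s] = {}
            for a, outcomes in acts.items():
                total = 0.0
                for prob, s_next, reward in outcomes:
                    # Value of s_next when its action is drawn from the policy.
                    v_next = sum(policy[s_next][a2] * q[s_next][a2]
                                 for a2 in q[s_next])
                    total += prob * (reward + gamma * v_next)
                new_q[s][a] = total
        q = new_q
    return q


q = evaluate_q(transitions, policy)
print(q[0][1], q[0][0])
```

In this toy setup the policy always picks action 1 in state 0, so Q(0, 1) is just the immediate reward of 1, while Q(0, 0) is that same return delayed by one step and therefore scaled by gamma.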