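Q-Function

The Q-function of a policy π gives the expected discounted return obtained by starting in state s, taking action a, and following π thereafter:

$$Q^{\pi}(s,a) = \mathbb{E}\left[\sum_{t \ge 0} \gamma^{t} r_{t} \,\middle|\, s_0 = s,\ a_0 = a,\ \pi\right]$$

Here γ is the discount factor and r_t is the reward received at time step t.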
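As an illustration of this definition, the expectation can be approximated by averaging discounted returns over sampled rollouts (a Monte Carlo estimate). The sketch below is not tied to any particular library or environment: the three-state chain, the `step` dynamics, the "always move right" policy, the `slip_prob` parameter, and the value γ = 0.5 are all assumptions made up for the example.

```python
import random

GAMMA = 0.5          # discount factor (assumed value, chosen for easy arithmetic)
TERMINAL = 2         # hypothetical 3-state chain: 0 -> 1 -> 2 (terminal)
STAY, RIGHT = 0, 1   # hypothetical action labels


def step(state, action, rng, slip_prob=0.0):
    """Return (next_state, reward). RIGHT advances one state unless it slips."""
    if action == RIGHT and rng.random() >= slip_prob:
        next_state = state + 1
    else:
        next_state = state
    # Reward 1 is paid on the transition that lands in the terminal state.
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def policy(state):
    """The policy pi being evaluated: always move right."""
    return RIGHT


def rollout_return(s0, a0, rng, slip_prob=0.0, horizon=50):
    """Discounted return sum_t gamma^t r_t with s_0 = s0, a_0 = a0, then pi."""
    state, action = s0, a0
    total, discount = 0.0, 1.0
    for _ in range(horizon):          # truncate the infinite sum at `horizon`
        state, reward = step(state, action, rng, slip_prob)
        total += discount * reward
        discount *= GAMMA
        if state == TERMINAL:
            break
        action = policy(state)        # after the first action, follow pi
    return total


def estimate_q(s0, a0, n_rollouts=1000, slip_prob=0.0, seed=0):
    """Monte Carlo estimate of Q^pi(s0, a0): the mean return over rollouts."""
    rng = random.Random(seed)
    returns = [rollout_return(s0, a0, rng, slip_prob) for _ in range(n_rollouts)]
    return sum(returns) / n_rollouts
```

With `slip_prob=0.0` the chain is deterministic, so every rollout yields the same return: `estimate_q(1, RIGHT)` is 1.0, `estimate_q(0, RIGHT)` is 0.5 (the reward arrives one step later and is discounted once), and `estimate_q(0, STAY)` is 0.25. Only the first action is fixed by the argument a; every later action comes from π, which is exactly what the conditioning on s0 = s, a0 = a, π expresses.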