Learning human objectives by evaluating hypothetical behaviours