Remote services and applications that users access via their local clients (laptops or desktops) usually assume that, following a successful user authentication at the beginning of the session, all subsequent communication reflects the users intent. However, this is not true if the adversary gains control of the client and can therefore manipulate what the user sees and what is sent to the remote server. To protect the users communication with the remote server despite a potentially compromised local client, we propose the concept of continuous visual supervision by a second device equipped with a camera. Motivated by the rapid increase of the number of incoming devices with front-facing cameras, such as augmented reality headsets and smart home assistants, we build upon the core idea that the users actual intended input is what is shown on the clients screen, despite what ends up being sent to the remote server. A statically positioned camera enabled device can, therefore, continuously analyze the clients screen to enforce that the client behaves honestly despite potentially being malicious. We evaluate the present-day feasibility and deployability of this concept by developing a fully functional prototype, running a host of experimental tests on three different mobile devices, and by conducting a user study in which we analyze participants use of the system during various simulated attacks. Experimental evaluation indeed confirms the feasibility of the concept of visual supervision, given that the system consistently detects over 98% of evaluated attacks, while study participants with little instruction detect the remaining attacks with high probability.