No Arabic abstract
Remote services and applications that users access via their local clients (laptops or desktops) usually assume that, following a successful user authentication at the beginning of the session, all subsequent communication reflects the users intent. However, this is not true if the adversary gains control of the client and can therefore manipulate what the user sees and what is sent to the remote server. To protect the users communication with the remote server despite a potentially compromised local client, we propose the concept of continuous visual supervision by a second device equipped with a camera. Motivated by the rapid increase of the number of incoming devices with front-facing cameras, such as augmented reality headsets and smart home assistants, we build upon the core idea that the users actual intended input is what is shown on the clients screen, despite what ends up being sent to the remote server. A statically positioned camera enabled device can, therefore, continuously analyze the clients screen to enforce that the client behaves honestly despite potentially being malicious. We evaluate the present-day feasibility and deployability of this concept by developing a fully functional prototype, running a host of experimental tests on three different mobile devices, and by conducting a user study in which we analyze participants use of the system during various simulated attacks. Experimental evaluation indeed confirms the feasibility of the concept of visual supervision, given that the system consistently detects over 98% of evaluated attacks, while study participants with little instruction detect the remaining attacks with high probability.
We present the design and design rationale for the user interfaces for Privacy Enhancements for Android (PE for Android). These UIs are built around two core ideas, namely that developers should explicitly declare the purpose of why sensitive data is being used, and these permission-purpose pairs should be split by first party and third party uses. We also present a taxonomy of purposes and ways of how these ideas can be deployed in the existing Android ecosystem.
Clients of permissionless blockchain systems, like Bitcoin, rely on an underlying peer-to-peer network to send and receive transactions. It is critical that a client is connected to at least one honest peer, as otherwise the client can be convinced to accept a maliciously forked view of the blockchain. In such an eclipse attack, the client is unable to reliably distinguish the canonical view of the blockchain from the view provided by the attacker. The consequences of this can be catastrophic if the client makes business decisions based on a distorted view of the blockchain transactions. In this paper, we investigate the design space and propose two approaches for Bitcoin clients to detect whether an eclipse attack against them is ongoing. Each approach chooses a different trade-off between average attack detection time and network load. The first scheme is based on the detection of suspicious block timestamps. The second scheme allows blockchain clients to utilize their natural connections to the Internet (i.e., standard web activity) to gossip about their blockchain views with contacted servers and their other clients. Our proposals improve upon previously proposed eclipse attack countermeasures without introducing any dedicated infrastructure or changes to the Bitcoin protocol and network, and we discuss an implementation. We demonstrate the effectiveness of the gossip-based schemes through rigorous analysis using original Internet traffic traces and real-world deployment. The results indicate that our protocol incurs a negligible overhead and detects eclipse attacks rapidly with high probability, and is well-suited for practical deployment.
Scalability is a common issue among the most used permissionless blockchains, and several approaches have been proposed accordingly. As Ethereum is set to be a solid foundation for a decentralized Internet web, the need for tackling scalability issues while preserving the security of the network is an important challenge. In order to successfully deliver effective scaling solutions, Ethereum is on the path of a major protocol improvement called Ethereum 2.0 (Eth2), which implements sharding. As the change of consensus mechanism is an extremely delicate matter, this improvement will be achieved through different phases, the first of which is the implementation of the Beacon Chain. For this, a specification has been developed and multiple groups have implemented clients to run the new protocol. In this work, we analyse the resource usage behaviour of different clients running as Eth2 nodes, comparing their performance and analysing differences. Our results show multiple network perturbations and how different clients react to it.
Smart home IoT systems often rely on cloud-based servers for communication between components. Although there exists a body of work on IoT security, most of it focuses on securing clients (i.e., IoT devices). However, cloud servers can also be compromised. Existing approaches do not typically protect smart home systems against compromised cloud servers. This paper presents FIDELIUS: a runtime system for secure cloud-based storage and communication even in the presence of compromised servers. FIDELIUSs design is tailored for smart home systems that have intermittent Internet access. In particular, it supports local control of smart home devices in the event that communication with the cloud is lost, and provides a consistency model using transactions to mitigate inconsistencies that can arise due to network partitions. We have implemented FIDELIUS, developed a smart home benchmark that uses FIDELIUS, and measured FIDELIUSs performance and power consumption. Our experiments show that compared to the commercial Particle.io framework, FIDELIUS reduces more than 50% of the data communication time and increases battery life by 2X. Compared to PyORAM, an alternative (ORAM-based) oblivious storage implementation, FIDELIUS has 4-7X faster access times with 25-43X less data transferred.
While most conversational AI systems focus on textual dialogue only, conditioning utterances on visual context (when its available) can lead to more realistic conversations. Unfortunately, a major challenge for incorporating visual context into conversational dialogue is the lack of large-scale labeled datasets. We provide a solution in the form of a new visually conditioned Future Utterance Prediction task. Our task involves predicting the next utterance in a video, using both visual frames and transcribed speech as context. By exploiting the large number of instructional videos online, we train a model to solve this task at scale, without the need for manual annotations. Leveraging recent advances in multimodal learning, our model consists of a novel co-attentional multimodal video transformer, and when trained on both textual and visual context, outperforms baselines that use textual inputs alone. Further, we demonstrate that our model trained for this task on unlabelled videos achieves state-of-the-art performance on a number of downstream VideoQA benchmarks such as MSRVTT-QA, MSVD-QA, ActivityNet-QA and How2QA.