Today, the prominence of data science within organizations has given rise to teams of data science workers collaborating on extracting insights from data, as opposed to individual data scientists working alone. However, we still lack a deep understanding of how data science workers collaborate in practice. In this work, we conducted an online survey with 183 participants who work in various aspects of data science. We focused on their reported interactions with each other (e.g., managers with engineers) and with different tools (e.g., Jupyter Notebook). We found that data science teams are extremely collaborative and work with a variety of stakeholders and tools during the six common steps of a data science workflow (e.g., clean data and train model). We also found that the collaborative practices workers employ, such as documentation, vary according to the kinds of tools they use. Based on these findings, we discuss design implications for supporting data science team collaborations and future research directions.
Modern security operations centers (SOCs) employ a variety of tools for intrusion detection, prevention, and widespread log aggregation and analysis. While research efforts are quickly proposing novel algorithms and technologies for cyber security, access to actual security personnel, their data, and their problems is necessarily limited by security concerns and time constraints. To help bridge the gap between researchers and security centers, this paper reports the results of semi-structured interviews with 13 professionals from five different SOCs, including at least one large academic, research, and government organization. The interviews focused on the current practices and future desires of SOC operators regarding host-based data collection capabilities, what is learned from the data, what tools are used, and how tools are evaluated. Questions and responses are organized and reported by topic, and broader themes are then discussed. Forest-level takeaways from the interviews center on problems stemming from the size of the data, correlation of heterogeneous but related data sources, the signal-to-noise ratio of the data, and analysts' time.
The rapid advancement of artificial intelligence (AI) is changing our lives in many ways. One application domain is data science. New techniques for automating the creation of AI, known as AutoAI or AutoML, aim to automate the work practices of data scientists. AutoAI systems are capable of autonomously ingesting and pre-processing data, engineering new features, and creating and scoring models based on target objectives (e.g., accuracy or run-time efficiency). Although AutoAI is not yet widely adopted, we are interested in understanding how it will impact the practice of data science. We conducted interviews with 20 data scientists who work at a large, multinational technology company and practice data science in various business settings. Our goal is to understand their current work practices and how these practices might change with AutoAI. Reactions were mixed: while informants expressed concerns about the trend of automating their jobs, they also strongly felt it was inevitable. Despite these concerns, they remained optimistic about their future job security due to a view that the future of data science work will be a collaboration between humans and AI systems, in which both automation and human expertise are indispensable.
Despite the long history of studying instant messaging usage in organizations, we know very little about how people today participate in group chat channels and interact with others. In this short note, we aim to update the existing knowledge on how group chat is used in the context of today's organizations. We had the opportunity to collect a total of 4300 publicly available group chat channels in Slack from an R&D department in a multinational IT company. Through qualitative coding of 100 channels, we identified 9 channel categories, such as project-based channels and event channels. We further defined a feature metric with 21 features to depict the group communication style of these group chat channels, with which we successfully trained a machine learning model that can automatically classify a given group channel into one of the 9 categories. In addition, we illustrated how these communication metrics could be used to analyze teams' collaboration activities. We focused on 117 project teams for which we have performance data, collected Slack group data for 54 of the 117 teams, and generated the communication style metrics for each of them. With these data, we were able to build a regression model to reveal the relationship between these group communication styles and one indicator of project team performance.
Social biases based on gender, race, etc. have been shown to pollute machine learning (ML) pipelines, predominantly via biased training datasets. Crowdsourcing, a popular cost-effective way to gather labeled training datasets, is not immune to the inherent social biases of crowd workers. To ensure that such social biases are not passed on to the curated datasets, it is important to know how biased each crowd worker is. In this work, we propose a new method based on counterfactual fairness to quantify the degree of inherent social bias in each crowd worker. This extra information can be leveraged together with individual worker responses to curate a less biased dataset.
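The idea of scoring each worker against counterfactual inputs can be illustrated with a minimal sketch. This is not the method proposed in the abstract, whose details are not given here. It assumes, purely for illustration, that each worker labels pairs of items that differ only in a sensitive attribute, and it uses a hypothetical function name (worker_bias_scores) and a hypothetical score: the fraction of pairs on which the worker's label changes.

```python
from collections import defaultdict


def worker_bias_scores(responses):
    """Return a per-worker score in [0, 1] (hypothetical illustration).

    `responses` is an iterable of tuples
    (worker_id, original_label, counterfactual_label), where the two labels
    were given to items that differ only in a sensitive attribute.
    The score is the fraction of pairs on which the two labels differ.
    """
    differing = defaultdict(int)  # pairs where the label changed
    total = defaultdict(int)      # all pairs seen for the worker
    for worker_id, original_label, counterfactual_label in responses:
        total[worker_id] += 1
        if original_label != counterfactual_label:
            differing[worker_id] += 1
    return {worker: differing[worker] / total[worker] for worker in total}


# Made-up toy responses, not data from the paper.
responses = [
    ("w1", "toxic", "toxic"),
    ("w1", "ok", "ok"),
    ("w2", "toxic", "ok"),
    ("w2", "ok", "ok"),
]
scores = worker_bias_scores(responses)
print(scores)
```

Under these assumptions, a score near 0 means the worker's labels are stable when the sensitive attribute is flipped, and a higher score could be used to down-weight that worker's responses when aggregating labels.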
Video editing can be a very tedious task, so unsurprisingly artificial intelligence has been increasingly used to streamline the workflow or automate away tedious tasks. However, it is very difficult to get an overview of which intelligent video editing tools exist in the research literature and of what video editors need automated. We therefore surveyed the field of intelligent video editing tools in research, and we surveyed the opinions of professional video editors. We have also summarized the current state of the art in artificial intelligence research, with the intention of identifying the possibilities and current technical limits on the way towards truly intelligent video editing tools. The findings contribute towards an understanding of the field of intelligent video editing tools, highlight unaddressed automation needs identified by the survey, and provide general suggestions for further research in intelligent video editing tools.