
Limiting Tags Fosters Efficiency

Published by Tiago Santos
Publication date: 2021
Research field: Informatics Engineering
Language: English





Tagging facilitates information retrieval in social media and other online communities by allowing users to organize and describe online content. Researchers found that the efficiency of tagging systems steadily decreases over time, because tags become less precise in identifying specific documents, i.e., they lose their descriptiveness. However, previous works did not answer how or even whether community managers can improve the efficiency of tags. In this work, we use information-theoretic measures to track the descriptive and retrieval efficiency of tags on Stack Overflow, a question-answering system that strictly limits the number of tags users can specify per question. We observe that tagging efficiency stabilizes over time, while tag content and descriptiveness both increase. To explain this observation, we hypothesize that limiting the number of tags fosters novelty and diversity in tag usage, two properties which are both beneficial for tagging efficiency. To provide qualitative evidence supporting our hypothesis, we present a statistical model of tagging that demonstrates how novelty and diversity lead to greater tag efficiency in the long run. Our work offers insights into policies to improve information organization and retrieval in online communities.
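As a rough illustration of what information-theoretic measures of tagging can look like, the sketch below computes the tag entropy H(T) and the conditional entropy H(D|T) of documents given a tag. This is a minimal sketch and not necessarily the exact set of measures used in the paper: the function names, the toy data, and the choice to treat every (document, tag) assignment as one observation are all assumptions made for illustration. A lower H(D|T) means that knowing a tag narrows down the document more precisely.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def tagging_entropies(tagged_docs):
    """Return (H(T), H(D|T)) for a list of (doc_id, [tags]) pairs.

    Assumption: each (document, tag) assignment counts as one observation.
    """
    pairs = [(doc, tag) for doc, tags in tagged_docs for tag in tags]
    tag_counts = Counter(tag for _, tag in pairs)
    pair_counts = Counter(pairs)
    h_t = entropy(tag_counts.values())
    h_joint = entropy(pair_counts.values())  # joint entropy H(D, T)
    return h_t, h_joint - h_t  # chain rule: H(D|T) = H(D, T) - H(T)


# Hypothetical toy data: three questions with at most two tags each.
toy = [("q1", ["python", "pandas"]), ("q2", ["python"]), ("q3", ["java"])]
h_t, h_d_given_t = tagging_entropies(toy)
print(h_t, h_d_given_t)
```

On this toy data, H(T) is 1.5 bits and H(D|T) is 0.5 bits: only the tag "python" is ambiguous, since it is shared by two questions.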




Read also

WhatsApp is the most popular messaging app in the world. The closed nature of the app, together with the ease of transferring multimedia and sharing information with large groups, makes WhatsApp unique among platforms: anonymous encrypted messages can go viral, reaching many users in a short period of time. The personal feel and immediacy of messages delivered directly to the user's phone were extensively abused to spread unfounded rumors and run misinformation campaigns during recent elections in Brazil and India. WhatsApp has been deploying measures to mitigate this problem, such as limiting message forwarding to at most five users at once. Despite this welcome effort to counter the problem, there is so far no evidence on the real effectiveness of such restrictions. In this work, we propose a methodology to evaluate the effectiveness of such measures on the spread of misinformation circulating on WhatsApp. We use an epidemiological model and real data gathered from WhatsApp in Brazil, India and Indonesia to assess the impact of limiting virality features in this kind of network. Our results suggest that the current efforts deployed by WhatsApp can significantly delay the spread of information, but they are ineffective in blocking the propagation of misinformation campaigns through public groups when the content is highly viral.
Risk-limiting audits (RLAs) are expected to strengthen the public confidence in the correctness of an election outcome. We hypothesize that this is not always the case, in part because for large margins between the winner and the runner-up, the number of ballots to be drawn can be so small that voters lose confidence. We conduct a user study with 105 participants resident in the US. Our findings confirm the hypothesis, showing that our study participants felt less confident when they were told the number of ballots audited for RLAs. We elaborate on our findings and propose recommendations for future use of RLAs.
Many researchers work on improving the data efficiency of machine learning. What would happen if they succeed? This paper explores the socio-economic impact of increased data efficiency. Specifically, we examine the intuition that data efficiency will erode the barriers to entry protecting incumbent data-rich AI firms, exposing them to more competition from data-poor firms. We find that this intuition is only partially correct: data efficiency makes it easier to create ML applications, but large AI firms may have more to gain from higher-performing AI systems. Further, we find that the effects on privacy, data markets, robustness, and misuse are complex. For example, while it seems intuitive that misuse risk would increase along with data efficiency, as more actors gain access to any level of capability, the net effect crucially depends on how much defensive measures are improved. More investigation into data efficiency, as well as research into the AI production function, will be key to understanding the development of the AI industry and its societal impacts.
This paper describes our solution to the multi-modal learning challenge of ICML. The solution constructs three-level representations in three consecutive stages and chooses the correct tag words with a data-specific strategy. First, we use typical methods to obtain level-1 representations. Each image is represented using MPEG-7 and gist descriptors, with additional features released by the contest organizers. The corresponding word tags are represented by a bag-of-words model with a dictionary of 4000 words. Second, we learn the level-2 representations using two stacked RBMs for each modality. Third, we propose a bimodal auto-encoder to learn the similarities and dissimilarities between pairwise image-tags as level-3 representations. Finally, during the test phase, based on one observation of the dataset, we devise a data-specific strategy to choose the correct tag words, which leads to a leap in overall performance. Our final average accuracy on the private test set is 100%, which ranks first in this challenge.
Using time series of US patents per million inhabitants, knowledge-generating cycles can be distinguished. These cycles partly coincide with Kondratieff long waves. The changes in the slopes between them indicate discontinuities in the knowledge-generating paradigms. The knowledge-generating paradigms can be modeled in terms of interacting dimensions (for example, in university-industry-government relations) that set limits to the maximal efficiency of innovation systems. The maximum values of the parameters in the model are of the same order as the regression coefficients of the empirical waves. The mechanism of the increase in the dimensionality is specified as self-organization which leads to the breaking of existing relations into the more diversified structure of a fractal-like network. This breaking can be modeled in analogy to 2D and 3D (Koch) snowflakes. The boost of knowledge generation leads to newly emerging technologies that can be expected to be more diversified and show shorter life cycles than before. Time spans of the knowledge-generating cycles can also be analyzed in terms of Fibonacci numbers. This perspective allows for forecasting expected dates of future possible paradigm changes. In terms of policy implications, this suggests a shift in focus from the manufacturing technologies to developing new organizational technologies and formats of human interactions.