Low-tubal-rank tensor approximation has been proposed for analyzing large-scale, multi-dimensional data. However, computing an accurate approximation of this kind is challenging in the streaming setting because computational resources are limited. To alleviate this issue, this paper extends a popular matrix sketching technique, Frequent Directions, to construct an efficient and accurate low-tubal-rank tensor approximation from streaming data based on the tensor Singular Value Decomposition (t-SVD). Specifically, the new algorithm observes the tensor data slice by slice and only needs to maintain and incrementally update a much smaller sketch that captures the principal information of the original tensor. Rigorous theoretical analysis shows that the approximation error of the new algorithm can be made arbitrarily small as the sketch size grows linearly. Extensive experimental results on both synthetic and real multi-dimensional data further show that the proposed algorithm outperforms other sketching algorithms for low-tubal-rank approximation in both efficiency and accuracy.
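The t-SVD is defined in terms of the t-product, under which a tensor factors as $\mathcal{A} = \mathcal{U} * \mathcal{S} * \mathcal{V}^\top$. The minimal pure-Python sketch below implements only the t-product, not the paper's Frequent Directions extension; the list-of-frontal-slices layout and the helper names are hypothetical choices for illustration. Practical implementations usually apply an FFT along the third mode, where the t-product becomes independent slice-wise matrix products; the direct circular convolution here is easier to check by hand.

```python
from typing import List

# Hypothetical representation: a third-order tensor of shape (n1, n2, n3)
# is a list of n3 frontal slices, each an n1 x n2 matrix (list of rows).
Matrix = List[List[float]]
Tensor = List[Matrix]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary matrix product of an (m x p) and a (p x q) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def t_product(A: Tensor, B: Tensor) -> Tensor:
    """t-product C = A * B of an (m x p x n3) and a (p x q x n3) tensor.

    Frontal slice k of the result is sum_j A[j] @ B[(k - j) mod n3], i.e. a
    circular convolution of the slices along the third mode. This equals the
    block-circulant matrix of A times the stacked frontal slices of B.
    """
    n3 = len(A)
    if len(B) != n3:
        raise ValueError("A and B must have the same number of frontal slices")
    m, q = len(A[0]), len(B[0][0])
    C: Tensor = []
    for k in range(n3):
        acc: Matrix = [[0.0] * q for _ in range(m)]
        for j in range(n3):
            acc = mat_add(acc, mat_mul(A[j], B[(k - j) % n3]))
        C.append(acc)
    return C


def identity_tensor(n: int, n3: int) -> Tensor:
    """Identity for the t-product: first frontal slice is I_n, all others are zero."""
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [eye] + [[[0.0] * n for _ in range(n)] for _ in range(n3 - 1)]


if __name__ == "__main__":
    # 1 x 1 x 2 tensors: the t-product reduces to circular convolution of (1, 2) and (3, 4).
    print(t_product([[[1]], [[2]]], [[[3]], [[4]]]))  # [[[11.0]], [[10.0]]]
```

Multiplying any tensor by `identity_tensor` on either side returns it unchanged, which is a quick sanity check on the slice indexing.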
State-of-the-art temporal action detectors to date rely on two-stream input consisting of RGB frames and optical flow. Although combining RGB frames and optical flow boosts performance significantly, optical flow is a hand-designed representation that not only requires heavy computation but also leaves two-stream methods methodologically unsatisfying, since they are often not learned end-to-end jointly with the flow. In this paper, we argue that optical flow is dispensable for high-accuracy temporal action detection and that image-level data augmentation (ILDA) is the key to avoiding performance degradation when optical flow is removed. To evaluate the effectiveness of ILDA, we design a simple yet efficient one-stage temporal action detector based on a single RGB stream, named DaoTAD. Our results show that, when trained with ILDA, DaoTAD achieves accuracy comparable to all existing state-of-the-art two-stream detectors while surpassing the inference speed of previous methods by a large margin, reaching an astounding 6668 fps on a GeForce GTX 1080 Ti. Code is available at https://github.com/Media-Smart/vedatad.
We consider a new setting of facility location games with ordinal preferences. In this setting, there is a set of agents and a set of facilities. Each agent is located on a line and has an ordinal preference over the facilities. Our goal is to design strategyproof mechanisms that elicit truthful information (preferences and/or locations) from the agents and locate the facilities so as to minimize the maximum-cost and total-cost objectives, and to maximize the minimum-utility and total-utility objectives. For each of these four objectives, we consider two-facility settings in which either only the preferences or only the locations are private. For each combination of objective and setting, we provide lower and upper bounds on the approximation ratios of strategyproof mechanisms that are asymptotically tight up to a constant. Finally, we discuss generalizing our results beyond two facilities and to the case where agents can misreport both locations and preferences.
We study the problem of maximizing a monotone $k$-submodular function $f$ under a knapsack constraint, where a $k$-submodular function is a natural generalization of a submodular function to $k$ dimensions. We present a deterministic $(\frac{1}{2}-\frac{1}{2e})$-approximation algorithm that evaluates $f$ $O(n^5k^4)$ times.
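For a sense of scale, the stated approximation factor simplifies as follows. This is plain arithmetic on the bound from the abstract, not a step of the paper's analysis: the algorithm's solution is guaranteed to be worth at least about 31.6% of the optimum.

```latex
\frac{1}{2}-\frac{1}{2e}
  = \frac{1}{2}\left(1-\frac{1}{e}\right)
  \approx \frac{1}{2}\,(1-0.3679)
  \approx 0.316
```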
In this paper, we present CogNet, a knowledge base (KB) dedicated to integrating three types of knowledge: (1) linguistic knowledge from FrameNet, which schematically describes situations, objects, and events; (2) world knowledge from YAGO, Freebase, DBpedia, and Wikidata, which provides explicit knowledge about specific instances; and (3) commonsense knowledge from ConceptNet, which describes implicit general facts. To model these different types of knowledge consistently, we introduce a three-level unified frame-styled representation architecture. To integrate free-form commonsense knowledge with the other, structured knowledge, we propose a strategy that combines automated labeling and crowdsourced annotation. At present, CogNet integrates 1,000+ semantic frames from linguistic KBs, 20,000,000+ frame instances from world KBs, and 90,000+ commonsense assertions from commonsense KBs. All of these data can be easily queried and explored on our online platform, and are free to download in RDF format under a CC-BY-SA 4.0 license. The demo and data are available at http://cognet.top/.
We study facility location games with candidate locations from a mechanism design perspective. Suppose there are n agents located in a metric space, whose locations are their private information, and a set of candidate locations for building facilities. The authority plans to build some homogeneous facilities at these candidates to serve the agents, each of whom bears a cost equal to the distance to the closest facility. The goal is to design mechanisms that minimize the total or maximum cost among the agents. For the single-facility problem under the maximum-cost objective, we give a deterministic 3-approximation group strategy-proof mechanism, and prove that no deterministic (or randomized) strategy-proof mechanism can have an approximation ratio better than 3 (or 2). For the two-facility problem on a line, we give an anonymous deterministic group strategy-proof mechanism that is a (2n-3)-approximation for the total-cost objective and a 3-approximation for the maximum-cost objective. We also provide (asymptotically) tight lower bounds on the approximation ratio.
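To make the single-facility, maximum-cost setting concrete, here is a minimal sketch of one simple strategy-proof mechanism on a line: a fixed "dictator" agent's report decides, and the facility goes to the candidate closest to it. This is an illustrative mechanism chosen for its easy analysis, not necessarily the one the paper proposes, and all function names are hypothetical. Its factor of 3 follows from the triangle inequality: every agent is within 2·OPT of the dictator (both lie within OPT of the optimal facility), and the dictator is within OPT of its nearest candidate, so no agent ends up farther than 3·OPT away.

```python
from typing import Sequence


def closest_candidate(point: float, candidates: Sequence[float]) -> float:
    """Candidate nearest to `point`; ties are broken toward the smaller coordinate."""
    return min(candidates, key=lambda c: (abs(c - point), c))


def dictator_mechanism(reports: Sequence[float], candidates: Sequence[float],
                       dictator: int = 0) -> float:
    """Build the single facility at the candidate closest to one fixed agent's report.

    Only the dictator's report affects the outcome, and it already receives its
    nearest candidate, so no single agent can gain by misreporting.
    """
    return closest_candidate(reports[dictator], candidates)


def max_cost(agents: Sequence[float], facility: float) -> float:
    """Maximum-cost objective: the largest agent-to-facility distance."""
    return max(abs(x - facility) for x in agents)


def optimal_max_cost(agents: Sequence[float], candidates: Sequence[float]) -> float:
    """Brute-force optimum over all candidate locations."""
    return min(max_cost(agents, c) for c in candidates)


if __name__ == "__main__":
    agents, candidates = [1, 2, 9], [0, 5, 10]
    chosen = dictator_mechanism(agents, candidates)
    print(chosen, max_cost(agents, chosen), optimal_max_cost(agents, candidates))  # 0 9 4
```

In the example, the mechanism builds at 0 with maximum cost 9, while the optimal candidate 5 achieves 4, a ratio of 2.25 that stays within the factor-3 bound.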
We study a participatory budgeting problem of aggregating the preferences of agents and dividing a budget over the projects. A budget division solution is a probability distribution over the projects. The main purpose of our study is to compare the system-optimum solution with a fair solution. We are interested in assessing the quality of fair solutions, i.e., in measuring the loss of system efficiency under a fair allocation compared with the allocation that maximizes (egalitarian) social welfare. This indicator is called the price of fairness. We are also interested in the performance of several aggregation rules. We provide asymptotically tight bounds both for the price of fairness and for the efficiency guarantees of the aggregation rules.
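The efficiency-loss ratio behind the price of fairness can be illustrated on a single instance. The sketch below assumes approval (dichotomous) preferences, measures utilitarian welfare, and uses a simple "fair share" rule in which each agent splits its 1/n of the budget evenly over the projects it approves; these are assumptions for illustration, not necessarily the welfare notion or rules the paper analyzes. The price of fairness itself is the worst case of this ratio over all instances.

```python
from fractions import Fraction
from typing import List, Set


def welfare(approvals: List[Set[int]], p: List[Fraction]) -> Fraction:
    """Utilitarian welfare; an agent's utility is the budget share on projects it approves."""
    return sum((sum((p[j] for j in A), Fraction(0)) for A in approvals), Fraction(0))


def utilitarian_optimum(approvals: List[Set[int]], m: int) -> List[Fraction]:
    """Welfare is linear in p, so putting all budget on a most-approved project is optimal."""
    counts = [sum(1 for A in approvals if j in A) for j in range(m)]
    best = max(range(m), key=lambda j: counts[j])
    p = [Fraction(0)] * m
    p[best] = Fraction(1)
    return p


def fair_share(approvals: List[Set[int]], m: int) -> List[Fraction]:
    """Each of the n agents controls 1/n of the budget and splits it over its approved projects."""
    n = len(approvals)
    p = [Fraction(0)] * m
    for A in approvals:
        if not A:
            raise ValueError("every agent must approve at least one project")
        for j in A:
            p[j] += Fraction(1, n * len(A))
    return p


def efficiency_ratio(approvals: List[Set[int]], m: int) -> Fraction:
    """Optimal welfare divided by the fair solution's welfare on one instance."""
    return (welfare(approvals, utilitarian_optimum(approvals, m))
            / welfare(approvals, fair_share(approvals, m)))
```

For example, with two agents approving project 0 and one approving project 1, the optimum funds project 0 fully (welfare 2), fair share yields the division (2/3, 1/3) with welfare 5/3, and `efficiency_ratio([{0}, {0}, {1}], 2)` is 6/5. Exact `Fraction` arithmetic avoids floating-point noise in such ratios.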
We study single-candidate voting embedded in a metric space, where both voters and candidates are points in the space and the distances between voters and candidates determine the voters' preferences over candidates. In the voting, each voter is asked to submit her favorite candidate. Given the collection of favorite candidates, a mechanism for eliminating the least popular candidate selects a committee containing all candidates except the one to be eliminated. Each committee is associated with a social value, which is the sum of the costs (utilities) it imposes on (provides to) the voters. We design mechanisms that find a committee optimizing the social value. We measure the quality of a mechanism by its distortion, defined as the worst-case ratio between the social value of the committee found by the mechanism and that of the optimal committee. We establish new upper and lower bounds on the distortion of mechanisms in this single-candidate voting setting, for both general metrics and well-motivated special cases.
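The following minimal sketch computes the cost ratio of one natural mechanism, eliminating the plurality loser, on a single instance on the line. The abstract does not specify how a committee's cost to a voter is defined, so this sketch assumes it is additive (the sum of distances to all remaining candidates); that cost model, the tie-breaking, and the function names are hypothetical. Distortion is the worst case of this ratio over all instances.

```python
from fractions import Fraction
from typing import Sequence


def favorite(voter: int, candidates: Sequence[int]) -> int:
    """Index of the voter's closest candidate (ties go to the lower index)."""
    return min(range(len(candidates)), key=lambda c: (abs(voter - candidates[c]), c))


def plurality_loser(voters: Sequence[int], candidates: Sequence[int]) -> int:
    """Eliminate the candidate named as favorite by the fewest voters (ties to the lower index)."""
    votes = [0] * len(candidates)
    for v in voters:
        votes[favorite(v, candidates)] += 1
    return min(range(len(candidates)), key=lambda c: (votes[c], c))


def committee_cost(voters: Sequence[int], candidates: Sequence[int], eliminated: int) -> int:
    """Assumed social cost: total distance from every voter to every remaining candidate."""
    return sum(abs(v - candidates[c])
               for v in voters
               for c in range(len(candidates)) if c != eliminated)


def distortion(voters: Sequence[int], candidates: Sequence[int]) -> Fraction:
    """Mechanism's social cost divided by the best single elimination on this instance."""
    mech = committee_cost(voters, candidates, plurality_loser(voters, candidates))
    opt = min(committee_cost(voters, candidates, e) for e in range(len(candidates)))
    return Fraction(mech, opt)
```

With voters at 0, 1, 1 and candidates at 0, 10, 11, every voter names candidate 0, the zero-vote tie is broken toward candidate 1, and the resulting committee costs 33, while eliminating candidate 2 would cost only 30; `distortion` therefore returns 11/10.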
129 - Chenhao Wang 2019
Lip-reading aims to recognize speech content from videos via visual analysis of speakers' lip movements. This is a challenging task because of homophemes, words that involve identical or highly similar lip movements, as well as diverse lip appearances and motion patterns among speakers. To address these challenges, we propose a novel lip-reading model that captures not only the nuances between words but also the styles of different speakers, through multi-grained spatio-temporal modeling of the speaking process. Specifically, the visual front-end first extracts both frame-level fine-grained features and short-term medium-grained features, which are then combined to obtain discriminative representations for words with similar phonemes. Next, a bidirectional ConvLSTM augmented with temporal attention aggregates spatio-temporal information across the entire input sequence; it is expected to capture the coarse-grained patterns of each word and to be robust to variations in speaker identity, lighting conditions, and so on. By making full use of information at different levels in a unified framework, the model is not only able to distinguish words with similar pronunciations but also becomes robust to appearance changes. We evaluate our method on two challenging word-level lip-reading benchmarks; the results show the effectiveness of the proposed method and support the above claims.
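The abstract does not detail its temporal attention, so the sketch below shows a generic form of attention pooling over time that such a component could resemble: each time step's feature vector is scored against a query vector, the scores are normalized with a softmax, and the features are averaged with those weights. The fixed `query` stands in for a learned parameter and is a hypothetical simplification; the ConvLSTM itself is not modeled.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(features: Sequence[Sequence[float]],
                            query: Sequence[float]) -> List[float]:
    """Weight each time step's feature vector by softmax(feature . query) and sum over time."""
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
```

With an all-zero query every time step gets the same weight, so the result is the plain temporal mean; a query aligned with one frame's features concentrates the weight on that frame, which is how attention lets the aggregation emphasize the most informative parts of an utterance.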