This half-day workshop explores challenges in data search, with a particular focus on data on the web. We want to stimulate an interdisciplinary discussion on how to improve the description, discovery, ranking, and presentation of structured and semi-structured data, across data formats and domain applications. We welcome contributions describing algorithms and systems, as well as frameworks and studies in human data interaction. The workshop aims to bring together communities interested in making the web of data more discoverable, easier to search, and more user-friendly.
In this paper, we propose an ART1 neural network clustering algorithm to group users according to their Web access patterns. We compare the clustering quality of our ART1-based technique with that of the K-Means and SOM clustering algorithms in terms of inter-cluster and intra-cluster distances. The results show that the average inter-cluster distance of ART1 is higher than that of K-Means and SOM when there are fewer clusters. As the number of clusters increases, the average inter-cluster distance of ART1 becomes lower than that of K-Means and SOM, which indicates the high quality of the clusters formed by our approach.
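As a concrete illustration of the comparison criteria, the sketch below (not the paper's implementation; all names are ours) computes the two quality measures the abstract refers to: average intra-cluster distance as a measure of compactness and average inter-cluster distance as a measure of separation.

```python
# A minimal sketch (not the paper's implementation) of the two cluster-quality
# measures compared in the abstract. All names are illustrative.
import numpy as np

def cluster_quality(points, labels):
    """Return (avg_intra, avg_inter) Euclidean distances for a clustering.

    points : (n, d) array of user access-pattern vectors
    labels : (n,) array of cluster assignments
    """
    clusters = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in clusters])

    # Intra-cluster distance: mean distance of each point to its own centroid
    # (lower means more compact clusters).
    intra = float(np.mean([
        np.linalg.norm(points[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(clusters)
    ]))

    # Inter-cluster distance: mean pairwise distance between centroids
    # (higher means better-separated clusters).
    pairs = [np.linalg.norm(centroids[i] - centroids[j])
             for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    inter = float(np.mean(pairs))
    return intra, inter
```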
A compiler is a program, developed in some programming language, that reads a file known as the source. It then translates this file and converts it into another program, known as the object program, or generates output. The best way to learn any programming language is to analyse the compilation process, which is the same across all existing programming paradigms. Our goal is to build a tool that supports this kind of learning in a university course. The course can be taught on any platform, such as Linux or Windows. This goal is achieved through the development of a Web application coupled with a compiler: a Translator Writing System (Sistema de Escritura de Traductores). The system is complete and allows the compiler to be extended and modified. It is implemented as a module for Moodle, a Course Management System (CMS) that helps teachers create online learning communities. The software is released under a free software license (GPL).
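To make the pedagogical idea concrete, here is a toy sketch of the classic phases such a course would analyse: lexical analysis, parsing, and code generation for a tiny expression language. It is purely illustrative and is not the Sistema de Escritura de Traductores itself.

```python
# A toy sketch of the compilation phases the course analyses: lexical
# analysis -> parsing -> code generation, for a tiny "+" expression language.
# Purely illustrative; not the Sistema de Escritura de Traductores itself.
import re

TOKEN_RE = re.compile(r"\s*(\d+|[+()])")

def lex(source):
    """Lexical analysis: split the source text into a list of tokens."""
    pos, tokens = 0, []
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_expr(tokens):
    """Parse 'term (+ term)*' and emit stack-machine code (code generation)."""
    code = parse_term(tokens)
    while tokens and tokens[0] == "+":
        tokens.pop(0)                      # consume '+'
        code += parse_term(tokens) + ["ADD"]
    return code

def parse_term(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        code = parse_expr(tokens)
        tokens.pop(0)                      # consume ')'
        return code
    return [f"PUSH {tok}"]

print(parse_expr(lex("1 + (2 + 3)")))
# -> ['PUSH 1', 'PUSH 2', 'PUSH 3', 'ADD', 'ADD']
```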
In this paper, we present D3NOC, a reconfigurable hybrid Photonic-Plasmonic Network-on-Chip (NoC) based on the Dynamic Data Driven Application System (DDDAS) paradigm. In DDDAS, computations and measurements form a dynamic closed feedback loop in which they tune one another in response to changes in the environment. Our proposed system enables dynamic augmentation of a base electrical mesh topology with an optical express bus at run-time. In addition, the measurement process itself adjusts to the environment. To achieve lower latency, lower dynamic power, and higher throughput, we take advantage of a Configurable Hybrid Photonic Plasmonic Interconnect (CHyPPI) for our reconfigurable connections. We evaluate the performance and power of our system against kernels from the NAS Parallel Benchmark (NPB) as well as synthetically generated traffic. In comparison to a 16x16 base electrical mesh, D3NOC shows net improvements of up to 89% in latency and 67% in dynamic power, after correcting for reconfiguration overhead. It should be noted that the design space of NoC reconfiguration is vast, and the goal of this study is not design-space exploration; rather, our goal is to show the potential of adaptive dynamic measurements when coupled with other reconfiguration techniques in the NoC context.
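The closed loop at the heart of DDDAS can be sketched schematically as follows; the thresholds, names, and traffic stand-ins are our assumptions, not the authors' simulator. The sketch shows measurements driving topology reconfiguration while the system, in turn, adapts its own measurement rate.

```python
# A schematic sketch (assumed names and thresholds, not the authors' simulator)
# of the DDDAS closed loop: measurements drive reconfiguration, and the
# measurement rate itself adapts to how volatile the traffic is.
import random

EXPRESS_ON_THRESHOLD = 0.7   # utilization above which the optical bus helps
EXPRESS_OFF_THRESHOLD = 0.4  # hysteresis to avoid thrashing

def measure_utilization(mesh_links):
    """Stand-in for hardware counters: mean electrical-link utilization."""
    return sum(mesh_links) / len(mesh_links)

def dddas_loop(steps=10):
    express_bus_on = False
    sample_period = 8  # cycles between measurements; adapted below
    for _ in range(steps):
        mesh_links = [random.random() for _ in range(16 * 16)]  # stand-in traffic
        util = measure_utilization(mesh_links)

        # Computation tunes the system: augment the mesh with the optical
        # express bus under heavy load, release it when load drops.
        if not express_bus_on and util > EXPRESS_ON_THRESHOLD:
            express_bus_on = True
        elif express_bus_on and util < EXPRESS_OFF_THRESHOLD:
            express_bus_on = False

        # ...and the system tunes the measurement: sample faster when traffic
        # is near a switching threshold, slower when it is stable.
        near_threshold = EXPRESS_OFF_THRESHOLD < util < EXPRESS_ON_THRESHOLD
        sample_period = 2 if near_threshold else 16
        print(f"util={util:.2f} express={express_bus_on} period={sample_period}")

dddas_loop()
```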
The Linked Open Data practice has led to a significant growth of structured data on the Web in the last decade. Such structured data describe real-world entities in a machine-readable way, and have created an unprecedented opportunity for research in the field of Natural Language Processing. However, there is a lack of studies on how such data can be used, for what kinds of tasks, and to what extent they can be useful for these tasks. This work focuses on the e-commerce domain to explore methods of utilising such structured data to create language resources that may be used for product classification and linking. We process billions of structured data points in the form of RDF n-quads to create multi-million-word product-related corpora, which are later used in three different ways to create language resources: training word embedding models, continued pre-training of BERT-like language models, and training Machine Translation models that are used as a proxy to generate product-related keywords. Our evaluation on an extensive set of benchmarks shows word embeddings to be the most reliable and consistent method for improving accuracy on both tasks (by up to 6.9 percentage points in macro-average F1 on some datasets). The other two methods, however, are not as useful. Our analysis shows that this could be due to a number of reasons, including biased domain representation in the structured data and a lack of vocabulary coverage. We share our datasets and discuss how our lessons learned could be taken forward to inform future research in this direction.
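A minimal sketch of the first of the three methods, training word embeddings on corpora extracted from RDF n-quads, might look as follows (gensim 4.x API; the predicate filter and file names are illustrative assumptions, not the paper's actual pipeline).

```python
# A minimal sketch of the word-embedding pipeline outlined above: pull literal
# values out of RDF n-quads, build a product corpus, and train word2vec with
# gensim. The predicate filter and file names are illustrative assumptions.
import re
from gensim.models import Word2Vec

# Matches '<subject> <predicate> "literal"' at the start of an n-quad line.
LITERAL_RE = re.compile(r'<[^>]+> <([^>]+)> "([^"]*)"')
NAME_PREDICATES = ("schema.org/name", "schema.org/description")  # assumed filter

def corpus_from_nquads(path):
    """Collect tokenised product names/descriptions from an n-quads dump."""
    sentences = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            m = LITERAL_RE.match(line)
            if m and m.group(1).endswith(NAME_PREDICATES):
                sentences.append(m.group(2).lower().split())
    return sentences

sentences = corpus_from_nquads("products.nq")  # hypothetical dump file
model = Word2Vec(sentences, vector_size=100, window=5, min_count=2, workers=4)
model.wv.save("product_vectors.kv")
```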
The scarcity of Smart Home data is still a significant problem, and in a field where dataset size often makes the difference between poor and good performance in machine learning projects, it needs to be addressed. Whereas the problem of collecting real data cannot easily be solved, since installing sensors and gathering data is usually expensive and time-consuming, a faster and easier alternative is needed, which is where synthetic data comes in. Here we propose BinarySDG (Binary Synthetic Data Generator) as a flexible and easy way to generate synthetic data for binary sensors.
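One simple way to synthesise such binary traces, shown purely as an illustration and not as BinarySDG's actual API, is to model each sensor as a two-state Markov chain whose transition probabilities control how often the sensor fires and how long it stays active.

```python
# A minimal sketch of one way to synthesise binary sensor traces, in the
# spirit of BinarySDG (this is not the tool's actual API): each sensor is a
# two-state Markov chain with tunable on/off transition probabilities.
import random

def generate_binary_trace(steps, p_on=0.1, p_off=0.3, seed=None):
    """Yield a 0/1 sequence: p_on = P(off -> on), p_off = P(on -> off)."""
    rng = random.Random(seed)
    state = 0
    trace = []
    for _ in range(steps):
        if state == 0 and rng.random() < p_on:
            state = 1
        elif state == 1 and rng.random() < p_off:
            state = 0
        trace.append(state)
    return trace

# E.g. a motion sensor that activates rarely but stays on briefly:
print("".join(map(str, generate_binary_trace(60, p_on=0.05, p_off=0.5, seed=42))))
```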