The metro system is playing an increasingly important role in the urban public transit network, moving massive passenger flows across the city every day. In recent years, extensive research has been conducted to improve the service quality of metro systems. Within this research, crowd management has been a critical issue for both public transport agencies and train operators. In this paper, by utilizing accumulated smart card data, we propose a statistical model to predict in-situ passenger density, i.e., the number of on-board passengers between any two neighbouring stations, inside a closed metro system. The proposed model performs two main tasks: i) forecasting the time-dependent Origin-Destination (OD) matrix by applying mature statistical models; and ii) estimating the travel time required by different parts of the metro network via truncated normal mixture distributions fitted with the Expectation-Maximization (EM) algorithm. Based on these results, we are able to accurately predict in-situ passenger density for a future time point. A case study using real smart card data from the Singapore Mass Rapid Transit (MRT) system demonstrates the efficacy and efficiency of our proposed method.
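To make the notion of in-situ passenger density concrete, the following is a minimal sketch rather than the model described in the abstract. It assumes a single line with stations indexed in travel order, a fixed travel time per link (the abstract instead fits truncated normal mixtures), and passengers who board immediately at tap-in. The function name `onboard_counts` and the trip tuple layout are hypothetical.

```python
def onboard_counts(trips, link_minutes, t):
    """Count passengers on each link (station i -> i+1) at minute t.

    trips: list of (depart_minute, origin_idx, dest_idx), origin_idx < dest_idx.
    link_minutes: assumed fixed travel time of each link, in minutes.
    """
    counts = [0] * len(link_minutes)
    for depart, origin, dest in trips:
        clock = depart
        for link in range(origin, dest):
            enter, leave = clock, clock + link_minutes[link]
            if enter <= t < leave:
                # The passenger is on this link at time t.
                counts[link] += 1
                break
            clock = leave
    return counts


# Three links, three trips; query the density at minute 4.
example = onboard_counts([(0, 0, 3), (2, 1, 2), (5, 0, 1)], [3, 4, 2], 4)
```

Replacing the fixed `link_minutes` with sampled or expected travel times, and the observed trips with forecast OD flows, would move this sketch closer to the two-task structure the abstract outlines.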
Nowadays, metro systems play an important role in meeting the urban transportation demand in large cities. Understanding passenger route choice is critical for public transit management. The wide deployment of Automated Fare Collection (AFC) systems opens up a new opportunity. However, only each trip's tap-in and tap-out timestamps and stations can be directly obtained from AFC system records; the train and route chosen by a passenger, both of which are necessary to solve our problem, are unknown. While existing methods work well in some specific situations, they do not work in complicated situations. In this paper, we propose a solution that needs no additional equipment or human involvement beyond the AFC systems. We develop a probabilistic model that can estimate from empirical analysis how the passenger flows are dispatched to different routes and trains. We validate our approach using a large-scale data set collected from the Shenzhen metro system. The measured results provide us with useful inputs when building the passenger path choice model.
Existing studies have extensively used spatiotemporal data to discover the mobility patterns of various types of travellers. Smart card data (SCD) collected by automated fare collection systems can reflect a general view of the mobility pattern of public transit riders. Mobility patterns of transit riders are temporally and spatially dynamic, and therefore difficult to measure. However, few existing studies measure both the mobility and stability of transit riders' travel patterns over a long period of time. To analyse the long-term changes in transit riders' travel behaviour, the authors define a metric for measuring the similarity between SCD in this study. An improved density-based clustering algorithm, simplified smoothed ordering points to identify the clustering structure (SS-OPTICS), is also proposed to identify transit rider clusters. Compared to the original OPTICS, SS-OPTICS needs fewer parameters and has better generalisation ability. Further, the generated clusters are categorised according to their features of regularity and occasionality. Based on the generated clusters and categories, fine- and coarse-grained travel pattern transitions of transit riders over four years from 2010 to 2014 are measured. By combining socioeconomic data of Beijing for the years 2010 and 2014, the interdependence between stability and mobility of transit riders' travel behaviour is also discussed.
In this paper, we aim at recovering the exact routes taken by commuters inside a metro system that are not captured by an Automated Fare Collection (AFC) system and hence remain unknown. We strategically propose two inference tasks to handle the recovery: one to infer the travel time of each travel link that contributes to the total duration of any trip inside a metro network, and the other to infer the route preferences based on historical trip records and the travel time of each travel link inferred in the previous inference task. As these two inference tasks are interrelated, most existing works perform them simultaneously. However, our solution TripDecoder adopts a totally different approach. To the best of our knowledge, TripDecoder is the first model that points out and fully utilizes the fact that there are some trips inside a metro system with only one practical route available. It strategically decouples these two inference tasks by taking only those trip records with one practical route as the input for the first inference task of travel time, and feeding the inferred travel time to the second inference task as an additional input, which not only improves the accuracy but also effectively reduces the complexity of both inference tasks. Two case studies have been performed based on the city-scale real trip records captured by the AFC systems in Singapore and Taipei to compare the accuracy and efficiency of TripDecoder and its competitors. As expected, TripDecoder has achieved the best accuracy in both datasets, and it also demonstrates its superior efficiency and scalability.
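The decoupling idea can be illustrated with a deliberately simplified sketch, which is not the TripDecoder model itself. It assumes that the single-route trips each cover exactly one link, that link travel time is summarised by a plain mean, and that a multi-route trip is assigned to the route whose expected duration is closest to the observed one. The function names and link identifiers are hypothetical.

```python
from statistics import mean


def estimate_link_times(single_route_trips):
    """Task 1: average observed minutes per link, using one-route trips only.

    single_route_trips: list of (link_id, observed_minutes).
    """
    by_link = {}
    for link, minutes in single_route_trips:
        by_link.setdefault(link, []).append(minutes)
    return {link: mean(values) for link, values in by_link.items()}


def most_likely_route(routes, link_times, observed_minutes):
    """Task 2: pick the route whose expected time best matches the trip.

    routes: dict mapping a route name to its list of link ids.
    """
    def expected(name):
        return sum(link_times[link] for link in routes[name])

    return min(routes, key=lambda name: abs(expected(name) - observed_minutes))


link_times = estimate_link_times(
    [("A-B", 4), ("A-B", 6), ("B-C", 3), ("A-D", 10), ("D-C", 9)]
)
routes = {"via_B": ["A-B", "B-C"], "via_D": ["A-D", "D-C"]}
choice = most_likely_route(routes, link_times, 9)
```

Because the first function never sees ambiguous trips, its output can be passed to the second as a fixed input, which is the separation the abstract credits for the reduced complexity.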
Electronic health record (EHR) systems contain vast amounts of medical information about patients. These data can be used to train machine learning models that can predict health status, as well as to help prevent future diseases or disabilities. However, obtaining patients' medical data to train such models is a challenging task, because sharing patients' medical records is prohibited by law in most countries due to privacy concerns. In this paper, we tackle this problem by sharing the models instead of the original sensitive data, using the mimic learning approach. The idea is first to train a model, called the teacher model, on the original sensitive data. Then, using this model, we can transfer its knowledge to another model, called the student model, without the student needing to learn from the original data used to train the teacher model. The student model is then shared with the public and can be used to make accurate predictions. To assess the mimic learning approach, we have evaluated our scheme using different medical datasets. The results indicate that the student model mimics the teacher model's performance in terms of prediction accuracy without needing access to the patients' original data records.
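The teacher-student transfer can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the model families or how the student's training inputs are obtained, so the 1-nearest-neighbour teacher, the single-threshold student, the one-dimensional features, and the non-sensitive `public_x` inputs are all hypothetical choices.

```python
def train_teacher(private_x, private_y):
    """Teacher: 1-nearest-neighbour classifier that keeps the sensitive records."""
    def predict(x):
        nearest = min(range(len(private_x)), key=lambda i: abs(private_x[i] - x))
        return private_y[nearest]
    return predict


def train_student(teacher, public_x):
    """Student: one threshold fitted only to the teacher's labels on public inputs."""
    labelled = [(x, teacher(x)) for x in public_x]
    best_threshold, best_hits = None, -1
    for threshold in sorted(public_x):
        hits = sum((x >= threshold) == bool(y) for x, y in labelled)
        if hits > best_hits:
            best_threshold, best_hits = threshold, hits
    # The returned model holds a single number, not any private record.
    return lambda x: int(x >= best_threshold)


teacher = train_teacher([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], [0, 0, 0, 1, 1, 1])
student = train_student(teacher, [0.5, 1.5, 2.5, 4.0, 6.0, 7.5, 8.5])
```

In this sketch only `student` would be released; the teacher, which embeds the sensitive records, stays with the data owner.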
Data augmentation by mixing samples, such as Mixup, has been widely used, typically for classification tasks. However, this strategy is not always effective due to the gap between the augmented samples used for training and the original samples used for testing. This gap may prevent a classifier from learning the optimal decision boundary and may increase the generalization error. To overcome this problem, we propose an alternative framework called Data Interpolating Prediction (DIP). Unlike common data augmentations, we encapsulate the sample-mixing process in the hypothesis class of a classifier so that train and test samples are treated equally. We derive the generalization bound and show that DIP helps to reduce the original Rademacher complexity. We also empirically demonstrate that DIP can outperform existing Mixup.
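For readers unfamiliar with the baseline, the following minimal sketch shows standard Mixup-style sample mixing, not the proposed DIP framework. It assumes list-valued features, one-hot labels, and a mixing weight drawn from a Beta(alpha, alpha) distribution; the function name `mixup` and the default `alpha` are illustrative choices.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Convexly combine two (features, one-hot label) pairs.

    rng may be the random module or a random.Random instance.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


seeded = random.Random(0)
mixed_x, mixed_y, lam = mixup([0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], rng=seeded)
```

Mixing of this kind is applied to training pairs only, while test inputs are left unmixed, which is the train-test gap the abstract refers to.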