Merchant Category Identification Using Credit Card Transactions

Published by Chin-Chia Michael Yeh

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Chin-Chia Michael Yeh, Zhongfang Zhuang, Yan Zheng

Machine Learning, Information Retrieval

Abstract

Digital payment volume has grown rapidly in recent years with the expansion of small businesses and online shops. When processing these digital transactions, recognizing each merchant's real identity (i.e., business type) is vital to ensuring the integrity of payment processing systems. Conventionally, this problem is formulated as a time series classification problem that uses only the merchant's transaction history. However, given the large scale of the data and the changing behavior of merchants and consumers over time, it is extremely challenging to achieve satisfactory performance with off-the-shelf classification methods. In this work, we approach the problem from a multi-modal learning perspective, using not only the merchant time series data but also information about merchant-merchant relationships (i.e., affinity) to verify the self-reported business type (i.e., merchant category) of a given merchant. Specifically, we design two individual encoders, one responsible for encoding temporal information and the other for encoding affinity information, along with a mechanism that fuses the outputs of the two encoders to accomplish the identification task. Our experiments on real-world credit card transaction data between 71,668 merchants and 433,772,755 customers demonstrate the effectiveness and efficiency of the proposed model.
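
The abstract names a temporal encoder, an affinity encoder, and a fusion step but does not describe their internals. The toy sketch below only illustrates that pipeline shape, under loudly stated assumptions: a day-of-week profile stands in for the temporal encoder, Jaccard overlap of customer sets stands in for merchant-merchant affinity, fusion is plain concatenation, and identification is a nearest-prototype rule. Every function name and all sample data are hypothetical; this is not the authors' model.

```python
import math


def temporal_encoder(daily_counts, period=7):
    """Toy temporal encoder (assumption): fold a daily transaction-count
    series into a normalized day-of-week profile of fixed length."""
    profile = [0.0] * period
    for day, count in enumerate(daily_counts):
        profile[day % period] += count
    total = sum(profile)
    return [x / total for x in profile] if total else profile


def jaccard(a, b):
    """Overlap of two customer sets; an assumed stand-in for affinity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity_encoder(merchant, customers_of, category_of, categories):
    """Toy affinity encoder (assumption): weight every other merchant's
    self-reported category by its customer overlap with `merchant`,
    then normalize into a distribution over `categories`."""
    weights = {c: 0.0 for c in categories}
    for other, customers in customers_of.items():
        if other != merchant:
            weights[category_of[other]] += jaccard(customers_of[merchant], customers)
    total = sum(weights.values())
    return [weights[c] / total if total else 0.0 for c in categories]


def fuse(temporal_vec, affinity_vec):
    """Simplest possible fusion: concatenate the two encodings."""
    return temporal_vec + affinity_vec


def identify(fused, prototypes):
    """Return the category whose prototype vector is closest (Euclidean)."""
    return min(prototypes, key=lambda c: math.dist(fused, prototypes[c]))


if __name__ == "__main__":
    # Hypothetical merchants, customer IDs, and self-reported categories.
    customers_of = {"m1": {1, 2, 3}, "m2": {2, 3, 4}, "m3": {9}}
    category_of = {"m1": "grocery", "m2": "grocery", "m3": "fuel"}
    categories = ["grocery", "fuel"]
    fused = fuse(temporal_encoder([1] * 7),
                 affinity_encoder("m1", customers_of, category_of, categories))
    prototypes = {"grocery": [1 / 7] * 7 + [1.0, 0.0],
                  "fuel": [1 / 7] * 7 + [0.0, 1.0]}
    predicted = identify(fused, prototypes)
    # A mismatch with the self-reported category would flag the merchant.
    print(predicted, predicted == category_of["m1"])
```

In this sketch a merchant would be flagged for review when `identify` disagrees with its self-reported category. A learned model would replace each hand-written encoder and the fixed prototypes.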

Faxi Yuan, Amir Esmalian, Bora Oztekin (2021)

The objective of this study is to examine spatial patterns of the impacts on, and recovery of, communities based on variances in credit card transactions. Such variances could capture the collective effects of household impacts, disrupted access, and business closures, and thus provide an integrative measure for examining disaster impacts and community recovery. Existing studies depend mainly on survey and sociodemographic data to evaluate disaster impacts and recovery efforts, although such data have limitations, including large data collection efforts and delayed results. In addition, very few studies have concentrated on spatial patterns and disparities of disaster impacts and short-term recovery of communities, although such investigation can enhance situational awareness during disasters and support the identification of disparate spatial patterns of disaster impacts and recovery in the impacted regions. This study examines credit card transaction data from Harris County (Texas, USA) during Hurricane Harvey in 2017 to explore spatial patterns of disaster impacts and recovery from the perspective of community residents and businesses at the ZIP code and county scales, respectively, and to further investigate their spatial disparities across ZIP codes. The results indicate that, for most business sectors, individuals in higher-income ZIP codes experienced more severe disaster impacts and recovered more quickly than those in lower-income ZIP codes. Our findings not only enhance the understanding of spatial patterns and disparities in disaster impacts and recovery for better community resilience assessment, but could also benefit emergency managers, city planners, and public officials in harnessing population activity data, using credit card transactions as a proxy for activity, to improve situational awareness and resource allocation.

Computers and Society, General Economics, Physics and Society
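
The abstract does not say how impact and recovery are quantified from transactions. As one hypothetical illustration, the sketch below measures impact as the largest fractional drop in an area's daily spend below a pre-event baseline. It measures recovery as the number of days from that trough until spend returns to 95% of baseline. The function name, the baseline, the 95% threshold, and the sample data are all assumptions, not the study's definitions.

```python
def impact_and_recovery(baseline, daily_spend, threshold=0.95):
    """Hypothetical metrics for one area (e.g. one ZIP code).

    impact: largest fractional drop of daily spend below `baseline`.
    recovery_days: days from the trough until spend first reaches
    `threshold * baseline` again, or None if it never does.
    """
    changes = [(s - baseline) / baseline for s in daily_spend]
    trough = min(range(len(changes)), key=changes.__getitem__)
    impact = -changes[trough]
    for day in range(trough, len(daily_spend)):
        if daily_spend[day] >= threshold * baseline:
            return impact, day - trough
    return impact, None


if __name__ == "__main__":
    # Hypothetical daily spend for two ZIP codes sharing the same baseline.
    spend_by_zip = {"A": [100, 40, 60, 90, 97], "B": [100, 70, 80, 85, 88]}
    for zip_code, spend in spend_by_zip.items():
        print(zip_code, impact_and_recovery(100.0, spend))
```

Comparing these two numbers across ZIP codes grouped by income is one simple way to express the kind of disparity the abstract reports.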

SCARFF: a Scalable Framework for Streaming Credit Card Fraud Detection with Spark

Fabrizio Carcillo, Andrea Dal Pozzolo, Yann-Ael Le Borgne (2017)

The expansion of electronic commerce, together with customers' increasing confidence in electronic payments, makes fraud detection a critical concern. Detecting fraud in a (nearly) real-time setting demands the design and implementation of scalable learning techniques able to ingest and analyse massive amounts of streaming data. Recent advances in analytics and the availability of open-source solutions for Big Data storage and processing open new perspectives for the fraud detection field. In this paper we present a SCAlable Real-time Fraud Finder (SCARFF), which integrates Big Data tools (Kafka, Spark, and Cassandra) with a machine learning approach that deals with imbalance, nonstationarity, and feedback latency. Experimental results on a massive dataset of real credit card transactions show that this framework is scalable, efficient, and accurate over a large stream of transactions.

Distributed, Parallel, and Cluster Computing
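
SCARFF's learning components are not described in this abstract. The sketch below shows, under assumptions, two generic ways a streaming learner can address two of the issues it lists. It uses random undersampling of genuine transactions for imbalance, and a bounded window of recent chunks for nonstationarity. The field name `is_fraud`, the class and function names, and the 1:1 ratio are hypothetical, and nothing here uses Kafka, Spark, or Cassandra.

```python
import random
from collections import deque


def undersample(batch, rng, ratio=1.0):
    """Keep every fraud plus a random subset of genuine transactions, so the
    chunk has roughly `ratio` genuine transactions per fraud."""
    frauds = [t for t in batch if t["is_fraud"]]
    genuine = [t for t in batch if not t["is_fraud"]]
    k = min(len(genuine), int(len(frauds) * ratio))
    return frauds + rng.sample(genuine, k)


class SlidingTrainingWindow:
    """Keep only the most recent balanced chunks; a model retrained on
    `training_set()` therefore drops old behaviour as new chunks arrive."""

    def __init__(self, max_chunks=5, seed=0):
        self.chunks = deque(maxlen=max_chunks)  # oldest chunk evicted first
        self.rng = random.Random(seed)

    def add(self, batch):
        self.chunks.append(undersample(batch, self.rng))

    def training_set(self):
        return [t for chunk in self.chunks for t in chunk]
```

For example, after `window.add(batch)` is called once per incoming micro-batch, `window.training_set()` would feed whatever classifier is retrained on each update.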

Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data

Xiancai Tian, Chen Zhang, Baihua Zheng (2020)

The metro system plays an increasingly important role in the urban public transit network, transferring a massive human flow across space every day in the city. In recent years, extensive research has been conducted to improve the service quality of metro systems. Among these efforts, crowd management has been a critical issue for both public transport agencies and train operators. In this paper, using accumulated smart card data, we propose a statistical model to predict in-situ passenger density, i.e., the number of on-board passengers between any two neighbouring stations, inside a closed metro system. The proposed model performs two main tasks: i) forecasting the time-dependent Origin-Destination (OD) matrix by applying mature statistical models; and ii) estimating the travel time cost required by different parts of the metro network via truncated normal mixture distributions with the Expectation-Maximization (EM) algorithm. Based on these results, we are able to provide accurate predictions of in-situ passenger density for a future time point. A case study using real smart card data from the Singapore Mass Rapid Transit (MRT) system demonstrates the efficacy and efficiency of the proposed method.

Machine Learning
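
The second task fits travel-time distributions with EM. For illustration only, the sketch below runs EM on a plain (untruncated) one-dimensional Gaussian mixture. This is a simplification: the paper uses truncated normal mixtures, and a truncated version would also have to account for the truncation bounds in both the E-step and the M-step. The function names and the travel-time sample are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gaussian_mixture_1d(xs, mus, variances, weights, iters=50):
    """Fit a 1-D Gaussian mixture by EM (untruncated, unlike the paper)."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    k, n = len(mus), len(xs)
    for _ in range(iters):
        # E-step: responsibility of component j for each observation x.
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsing component
    return mus, variances, weights


if __name__ == "__main__":
    # Hypothetical travel times (minutes) with two clusters, e.g. two routes.
    times = [4.0, 4.2, 3.8, 4.1, 3.9, 9.0, 9.3, 8.7, 9.1, 8.9]
    print(em_gaussian_mixture_1d(times, [3.0, 10.0], [1.0, 1.0], [0.5, 0.5]))
```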

Hindsight Credit Assignment

Anna Harutyunyan, Will Dabney, Thomas Mesnard (2019)

We consider the problem of efficient credit assignment in reinforcement learning. In order to efficiently and meaningfully utilize new data, we propose to explicitly assign credit to past decisions based on the likelihood of them having led to the observed outcome. This approach uses new information in hindsight, rather than employing foresight. Somewhat surprisingly, we show that value functions can be rewritten through this lens, yielding a new family of algorithms. We study the properties of these algorithms, and empirically show, through a set of illustrative tasks, that they successfully address important credit assignment challenges.

Machine Learning
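
Bayes' rule makes concrete the idea of weighting a past decision by how likely it was to have led to the observed outcome. The following derivation is a sketch for a single decision $a$ in state $x$ under policy $\pi$, whose outcome is the return $Z$. The hindsight distribution is written $h(a \mid x, z)$. This notation is chosen here to illustrate the principle rather than to reproduce the paper's exact algorithms.

```latex
\begin{aligned}
Q^{\pi}(x, a) &= \mathbb{E}_{\pi}\left[ Z \mid X_0 = x,\, A_0 = a \right]
  = \int z \, P_{\pi}(z \mid x, a)\, dz \\
&= \int z \, \frac{P_{\pi}(a \mid x, z)\, P_{\pi}(z \mid x)}{\pi(a \mid x)}\, dz
  \qquad \text{(Bayes' rule, with } P_{\pi}(a \mid x) = \pi(a \mid x)\text{)} \\
&= \mathbb{E}_{Z \sim P_{\pi}(\cdot \mid x)}\left[ \frac{h(a \mid x, Z)}{\pi(a \mid x)}\, Z \right],
  \qquad h(a \mid x, z) := P_{\pi}(A_0 = a \mid X_0 = x,\, Z = z).
\end{aligned}
```

The ratio $h(a \mid x, Z)/\pi(a \mid x)$ reweights each observed return by how much more (or less) likely action $a$ becomes once the outcome is known.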

Identification of Orchid Species Using Content-Based Flower Image Retrieval

D. H. Apriyanti, A.A. Arymurthy, L.T. Handoko (2014)

In this paper, we developed a system for recognizing orchid species from images of their flowers. We used the MSRM (Maximal Similarity based on Region Merging) method to segment the flower object from the background and extracted shape features, such as the distance from the edge to the centroid of the flower, aspect ratio, roundness, moment invariants, and fractal dimension, as well as color features. We used HSV color features, ignoring the V value. To retrieve images, we used the Support Vector Machine (SVM) method. The orchid is a unique flower: it has a part called the lip (labellum) that distinguishes it from other flowers, and even from other types of orchids. Thus, in this paper, we propose performing feature extraction not only on the flower region but also on the lip (labellum) region. The results show that our proposed method can increase the accuracy of content-based flower image retrieval for orchid species by up to approximately 14%. The most dominant features are Centroid Contour Distance, Moment Invariant, and HSV Color. The system accuracy is 85.33% in the validation phase and 79.33% in the testing phase.

Computer Vision and Pattern Recognition, Information Retrieval, Machine Learning
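
Two of the features named as most dominant are easy to sketch. The code below computes a Centroid Contour Distance signature, which is the distance from each contour point to the centroid, normalized by the maximum. It also computes a hue-saturation histogram that discards V, in line with the abstract's "ignoring the V value". The sketch assumes the contour is already available as a list of (x, y) points, uses the mean of those points as the centroid, and picks arbitrary bin counts. The MSRM segmentation and SVM stages are not shown, and all names are hypothetical.

```python
import colorsys
import math


def centroid_contour_distance(contour):
    """Distance from each (x, y) contour point to the points' mean,
    normalized by the largest distance (scale invariance)."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    dists = [math.hypot(x - cx, y - cy) for x, y in contour]
    peak = max(dists)
    return [d / peak for d in dists] if peak else dists


def hs_histogram(pixels, h_bins=4, s_bins=2):
    """Normalized 2-D hue/saturation histogram of 8-bit RGB pixels;
    the V (brightness) channel is deliberately discarded."""
    hist = [0] * (h_bins * s_bins)
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        hist[hi * s_bins + si] += 1
    return [count / len(pixels) for count in hist]
```

Because V is dropped, a bright red petal and a dark red petal fall into the same histogram bin, which is the point of ignoring brightness.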