
Big Data Trip Classification on the New York City Taxi and Uber Sensor Network

Huiyu Sun,
Siyuan Hu,
Suzanne McIntosh,
Yi Cao

Abstract


Taxis and Uber vehicles make millions of trips every day in New York City. We first use big data technologies to analyze this large dataset: Apache Spark for data processing and classification, Apache Hive for data storage, and MapReduce for data profiling. Because taxis and Uber vehicles are equipped with GPS sensors, we then visualize a mobile sensor network over New York City, divided into fine-grained regions that each act as a mobile sensing node. Each location on the network falls into a region, and the region is classified into one of three categories according to which service dominates it: Yellow taxi, Green taxi, or Uber. We use logistic regression to perform this classification. The classifier is then used to analyze the interaction between taxis and Uber, for example to quantify the expansion of Uber. Experiments run on a Spark cluster show that our classifier achieves an accuracy of over 85% on the 2014 taxi and Uber dataset. Finally, we propose a trip recommendation system for users that combines the classification results with a web service application.
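
The region labeling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python instead of Spark, and the grid cell size, the `region_of` and `dominant_service_by_region` helpers, and the sample trips are all hypothetical. It only shows how GPS points might be binned into regions and how each region could be labeled by its dominant service, which is the kind of label a logistic regression classifier would then be trained to predict.

```python
import math
from collections import Counter, defaultdict

# Hypothetical cell size in degrees; the abstract does not state the region size.
CELL_DEG = 0.01


def region_of(lat, lon, cell_deg=CELL_DEG):
    """Map a GPS point to the integer index of the grid cell that contains it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def dominant_service_by_region(trips, cell_deg=CELL_DEG):
    """Label each region with the service that has the most trips in it.

    `trips` is an iterable of (lat, lon, service) tuples. On a tie, the
    service seen first in that region wins (Counter.most_common order).
    """
    counts = defaultdict(Counter)
    for lat, lon, service in trips:
        counts[region_of(lat, lon, cell_deg)][service] += 1
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}


# Made-up pickup points for illustration only, not taken from the dataset.
trips = [
    (40.7580, -73.9855, "yellow"),
    (40.7585, -73.9851, "yellow"),
    (40.7582, -73.9858, "uber"),
    (40.8101, -73.9502, "green"),
    (40.6782, -73.9442, "uber"),
    (40.6785, -73.9448, "uber"),
]
labels = dominant_service_by_region(trips)
for region, service in sorted(labels.items()):
    print(region, service)
```

In a Spark setting the same grouping would presumably be expressed as a distributed aggregation over the trip records, with the resulting region labels used as the target of the classifier.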

Citation Format:
Huiyu Sun, Siyuan Hu, Suzanne McIntosh, Yi Cao, "Big Data Trip Classification on the New York City Taxi and Uber Sensor Network," Journal of Internet Technology, vol. 19, no. 2, pp. 591-598, Mar. 2018.






Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C
JIT Editorial Office, Library and Information Center, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd. Shoufeng, Hualien 97401, Taiwan, R.O.C.
Tel: +886-3-931-7017  E-mail: jit.editorial@gmail.com