Joya Deri, Franz Franchetti and José M. F. Moura (Proc. IEEE International Conference on Big Data (Big Data), IEEE, pp. 2616-2625, 2016)
Big Data Computation of Taxi Movement in New York City

We extract and explore statistics that characterize New York City traffic flows, based on 700 million taxi trips in the 2010-2013 New York City taxi data. This paper presents a two-part solution to the intensive computation involved: space and time design considerations for estimating taxi trajectories with Dijkstra’s algorithm, and job parallelization and scheduling with HTCondor. Our contribution is a solution that reduces execution time from 3,000 days to less than a day, together with a detailed analysis of the necessary design decisions.
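
As a rough illustration of the trajectory-estimation step, the sketch below runs Dijkstra’s algorithm on a small weighted graph to recover a shortest path between a pickup node and a dropoff node. It is a minimal sketch and not the authors' implementation: the function name `shortest_path`, the adjacency-list layout, and the toy `roads` graph with its node labels and edge weights are all assumptions made for the example. The abstract does not specify how the road network is represented or how trips are mapped onto it.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a graph given as {node: [(neighbor, weight), ...]}.

    Hypothetical helper for illustration; assumes non-negative edge weights.
    Returns (distance, path), or (inf, []) if the target is unreachable.
    """
    dist = {source: 0.0}      # best known distance to each discovered node
    prev = {}                 # predecessor of each node on its best path
    heap = [(0.0, source)]    # priority queue of (distance, node)
    visited = set()

    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue          # stale queue entry, already settled
        visited.add(u)
        if u == target:
            break             # target settled, its distance is final
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return float("inf"), []

    # Walk predecessors back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Toy road graph (assumed data): nodes are intersections, weights are travel costs.
roads = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 2.0), ("D", 5.0)],
    "C": [("D", 1.0)],
    "D": [],
}

distance, path = shortest_path(roads, "A", "D")
print(distance, path)
```

Each trip's path can be computed independently of every other trip, so under this assumed setup the workload splits naturally into batches of trips that a scheduler such as HTCondor can run as separate jobs.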

Keywords:
Big data