Copyrights to these papers may be held by the publishers. The download files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
Joya Deri, Franz Franchetti and José M. F. Moura (Proc. IEEE International Conference on Big Data (Big Data), IEEE, pp. 2616-2625, 2016)
Big Data Computation of Taxi Movement in New York City
Preprint (2.7 MB)
Published paper (link to publisher)
We seek to extract and explore statistics that characterize New York City traffic flows based on 700 million taxi trips in the 2010-2013 New York City taxi data. This paper presents a two-part solution for intensive computation: space and time design considerations for estimating taxi trajectories with Dijkstra’s algorithm, and job parallelization and scheduling with HTCondor. Our contribution is to present a solution that reduces execution time from 3,000 days to less than a day with detailed analysis of the necessary design decisions.
Keywords: Big data
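
To make the two parts named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a road network stored as a dictionary of weighted adjacency lists and a trip given as a (pickup node, drop-off node) pair. The names `shortest_path`, `split_into_jobs`, and `road_graph`, and the toy graph itself, are hypothetical. The first function is a textbook Dijkstra search with a binary heap. The second only illustrates dividing trips into independent batches of the kind a scheduler such as HTCondor could run as separate jobs. It does not call HTCondor.

```python
import heapq


def shortest_path(graph, source, target):
    """Textbook Dijkstra on a dict-of-dicts graph with non-negative weights.

    Returns (distance, path); distance is inf and path is [] if unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break  # the target's distance is final once it is popped
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


def split_into_jobs(trips, n_jobs):
    """Round-robin split of trips into n_jobs independent batches (assumed scheme)."""
    return [trips[i::n_jobs] for i in range(n_jobs)]


# Hypothetical toy road network: node -> {neighbor: travel cost}.
road_graph = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"C": 1.0, "D": 4.0},
    "C": {"D": 1.0},
    "D": {},
}

if __name__ == "__main__":
    trips = [("A", "D"), ("B", "D"), ("A", "C"), ("D", "A")]
    for batch in split_into_jobs(trips, n_jobs=2):
        for pickup, dropoff in batch:
            print(pickup, dropoff, shortest_path(road_graph, pickup, dropoff))
```

Because each trip's route is computed independently of every other trip, the batches share no state, which is what makes this kind of workload straightforward to spread across many machines.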