Incremental Sorting for Large Dynamic Data Sets


Aydin A. A., Anderson K. M.

1st IEEE International Conference on Big Data Computing Service and Applications (BigDataService), San-Francisco, Costa Rica, 30 March - 3 April 2015, pp. 170-175

  • Volume number:
  • DOI: 10.1109/bigdataservice.2015.35
  • City of publication: San-Francisco
  • Country of publication: Costa Rica
  • Pages: pp. 170-175

Abstract

In today's world of pervasive computing, it is straightforward for organizations to generate large amounts of data in support of a variety of business needs. For this reason, it is important to build tools that allow analysts to manage and investigate these data sets quickly and efficiently. One feature needed by these tools is the ability to sort large amounts of data along a number of dimensions to facilitate the search for useful information. In this paper, we describe a new method for incrementally sorting large, multi-dimensional, dynamic data sets. Our particular use case involves sorting large Twitter data sets but our technique can be applied more generally across a variety of data types. Our approach is evaluated with respect to its scalability and by comparing it to several alternatives. It is currently able to efficiently sort data sets consisting of tens of millions of tweets along a variety of dimensions even when the data set is under active collection and new tweets are being added each day. The approach incrementally integrates the new tweets and provides sorted views of all tweets along various dimensions without having to re-sort the previously sorted tweets. The paper presents the benefits of the technique, discusses its limitations, and describes its software engineering contributions.
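
The abstract does not spell out the paper's algorithm, so the following is only a minimal sketch of the general idea of incremental sorting, not the authors' method. It assumes each record is a dictionary with a unique `id`, and it keeps one sorted index per dimension, placing each new record with a binary search so that existing entries are never re-sorted. The class name `IncrementalSortedViews` and the field names `retweets` and `created_at` are hypothetical.

```python
import bisect


class IncrementalSortedViews:
    """Illustrative only: one sorted index per dimension, updated on insert."""

    def __init__(self, dimensions):
        # Each index is a list of (value, record_id) pairs kept in sorted order.
        self._indexes = {dim: [] for dim in dimensions}

    def add(self, record):
        """Merge one new record into every index without re-sorting old entries."""
        for dim, index in self._indexes.items():
            # insort finds the position by binary search and inserts there;
            # pairing the value with the id keeps the ordering well defined
            # when two records share the same value.
            bisect.insort(index, (record[dim], record["id"]))

    def sorted_ids(self, dim, descending=False):
        """Return record ids ordered along the given dimension."""
        ids = [record_id for _, record_id in self._indexes[dim]]
        return ids[::-1] if descending else ids


# Hypothetical tweets arriving over time.
views = IncrementalSortedViews(["retweets", "created_at"])
views.add({"id": 1, "retweets": 5, "created_at": 100})
views.add({"id": 2, "retweets": 2, "created_at": 200})
views.add({"id": 3, "retweets": 9, "created_at": 150})

print(views.sorted_ids("retweets"))
print(views.sorted_ids("created_at"))
```

In this sketch the position lookup is logarithmic, but inserting into a Python list still shifts elements, so it is suitable only for in-memory illustration; a system handling tens of millions of tweets would presumably rely on a persistent, index-backed store.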