Record-By-Record Processing

This page began as a bibliography, and this paragraph may be replaced over time by a Synopsis of the topic.

~

AKBAR, Adnan, 2017. Extracting Knowledge from Raw IoT Data Streams. University of Surrey (United Kingdom). [Accessed 20 February 2024].

AKIDAU, Tyler, BARBIER, Paul, CSERI, Istvan, HUESKE, Fabian, JONES, Tyler, LIONHEART, Sasha, MILLS, Daniel, PAULIUKEVICH, Dzmitry, PROBST, Lukas, SEMMLER, Niklas, SOTOLONGO, Dan and ZHANG, Boyuan, 2023. What’s the Difference? Incremental Processing with Change Queries in Snowflake. Proceedings of the ACM on Management of Data. Online. 13 June 2023. Vol. 1, no. 2, p. 1–27. DOI 10.1145/3589776. [Accessed 20 February 2024]. Incremental algorithms are the heart and soul of stream processing. Low latency results depend on the ability to react to the subset of changes in a dataset over time rather than reprocessing the entirety of a dataset as it evolves. But while the SQL language is well suited for representing streams of changes (via tables) and their application to tables over time (via DML), it entirely lacks a method to query the changes to a table or view in the first place. In this paper, we present CHANGES queries and STREAM objects, Snowflake’s primitives for querying and consuming incremental changes to table objects over time. CHANGES queries and STREAMs have been in use within Snowflake for three years, and see broad adoption across our customers. We describe the semantics of these primitives, discuss the implementation challenges, present an analysis of their usage at Snowflake, and contrast with other offerings.

ALEXANDER, John Lindsay, 1987. An implementation of domains and keys in SQL: a thesis presented in fulfilment of the requirements for the degree Master of Philosophy at Massey University. Online. PhD Thesis. Massey University. Available from: https://mro.massey.ac.nz/handle/10179/8549 [Accessed 20 February 2024].

ANTONIOU, Grigoris, BARYANNIS, George, BATSAKIS, Sotiris, GOVERNATORI, Guido, ROBALDO, Livio, SIRAGUSA, Giovanni and TACHMAZIDIS, Ilias, 2018. Legal reasoning and big data: opportunities and challenges. Legal Reasoning and Big Data: Opportunities and Challenges. Online. 2018. Available from: https://orbilu.uni.lu/bitstream/10993/38959/1/Antoniuo%20et%20al%20-%20Legal%20Reasoning%20and%20Big%20Data%20Opportunities%20and%20Challenges..pdf [Accessed 20 February 2024].

BARBIER, PAUL, ISTVAN, CSERI, DANIEL, MILLS, SOTOLONGO, DAN and BOYUAN, ZHANG, 2023. What’s the Difference? Incremental Processing with Change Queries in Snowflake. Online. 2023. Available from: https://liuyehcf.github.io/resources/paper/Incremental-Processing-with-Change-Queries-in-Snowflake.pdf [Accessed 20 February 2024].

BELO, Orlando, SANTOS, Vasco, OLIVEIRA, Bruno, GOMES, Cláudia and MARQUES, Ricardo, 2018. RAID-B2K, transforming BPMN conceptual schemas into Kettle execution primitives. International Journal of Information and Decision Sciences. Online. 2018. Vol. 10, no. 1, p. 3. DOI 10.1504/IJIDS.2018.090666. [Accessed 20 February 2024].

BERMAN, Helen, 2002. Protein Data Bank Project at Rutgers University. Online. National Science Foundation, Arlington, VA (US). Available from: https://www.osti.gov/biblio/805813 [Accessed 20 February 2024].

COULTER, Cynthia M., WRIGHT, Michael, BABINEC, Michael, KREMPASKY, Frances, PINCKARD, Susan H. and YORK, Maurice, 2005. TECHNICAL SERVICES REPORT. Technical Services Quarterly. Online. 1 January 2005. Vol. 22, no. 4, p. 77–105. DOI 10.1300/J124v22n04_06. [Accessed 20 February 2024].

CURTIS, Jonathan, 2018. A Comparison of Real Time Stream Processing Frameworks. Online. 2018. Available from: https://arrow.tudublin.ie/scschcomdis/134/ [Accessed 20 February 2024].

DAWOOD, Haitham M., RODRIGUEZ-MAREK, Adrian, BAYLESS, Jeff, GOULET, Christine and THOMPSON, Eric, 2016. A Flatfile for the KiK-net Database Processed Using an Automated Protocol. Earthquake Spectra. Online. May 2016. Vol. 32, no. 2, p. 1281–1302. DOI 10.1193/071214eqs106. [Accessed 20 February 2024]. The Kiban-Kyoshin network (KiK-net) database is an important resource for ground motion (GM) studies. The processing of the KiK-net records is a necessary first step to enable their use in engineering applications. In this manuscript we present a step-by-step automated protocol used to systematically process about 157,000 KiK-net strong ground motion records. The automated protocol includes the selection of the corner frequency for high-pass filtering. In addition, a comprehensive set of metadata was compiled for each record. As a part of the metadata collection, two algorithms were used to identify dependent and independent earthquakes. Earthquakes are also classified into active crustal or subduction type events; most of the GM records correspond to subduction type earthquakes. A flatfile with all the metadata and the spectral acceleration of the processed records is uploaded to NEEShub (https://nees.org/resources/7849, Dawood et al. 2014).

DAWOOD, Haitham Mohamed Mahmoud Mousad, 2014. Partitioning uncertainty for non-ergodic probabilistic seismic hazard analyses. Online. PhD Thesis. Virginia Tech. Available from: https://vtechworks.lib.vt.edu/handle/10919/70757 [Accessed 20 February 2024].

DECANETO, Alessandra, 2016. Design and testing of an active big data architecture for social and crowding emergency management. Online. 2016. Available from: https://www.politesi.polimi.it/handle/10589/134427 [Accessed 20 February 2024].

DEY, Akon, FEKETE, Alan and RÖHM, Uwe, 2015. Scalable distributed transactions across heterogeneous stores. In: 2015 IEEE 31st International Conference on Data Engineering. Online. IEEE. 2015. p. 125–136. Available from: https://ieeexplore.ieee.org/abstract/document/7113278/ [Accessed 20 February 2024].

EDWARDS, W. B., 1965. Evaluation and performance of computers: interaction of hardware and software parameters in tape operations. In: Proceedings of the 1965 20th national conference on -. Online. Cleveland, Ohio, United States: ACM Press. 1965. p. 54–65. DOI 10.1145/800197.806033. [Accessed 20 February 2024].

GIBSON, Richard, ALAKO, Blaise, AMID, Clara, CERDENO-TÁRRAGA, Ana, CLELAND, Iain, GOODGAME, Neil, TEN HOOPEN, Petra, JAYATHILAKA, Suran, KAY, Simon and LEINONEN, Rasko, 2016. Biocuration of functional annotation at the European nucleotide archive. Nucleic Acids Research. Online. 2016. Vol. 44, no. D1, p. D58–D66. Available from: https://academic.oup.com/nar/article-abstract/44/D1/D58/2503102 [Accessed 20 February 2024].

GRAYBEAL, Daniel Y., 2005. Semiautomated quality control of historical sub-daily surface synoptic meteorological data: Application of attributes control methodology. In: 15th Conference on Applied Climatology. Online. 2005. Available from: https://ams.confex.com/ams/15AppClimate/techprogram/paper_94197.htm [Accessed 20 February 2024].

GRUENSTAEUDL, Michael and HARTMARING, Yannick, 2018. EMBL2checklists: A Python package to facilitate the user-friendly submission of plant DNA barcoding sequences to ENA. bioRxiv. Online. 2018. p. 435644. Available from: https://www.biorxiv.org/content/10.1101/435644.abstract [Accessed 20 February 2024].

GRUENSTAEUDL, Michael and HARTMARING, Yannick, 2019. EMBL2checklists: A Python package to facilitate the user-friendly submission of plant and fungal DNA barcoding sequences to ENA. Plos one. Online. 2019. Vol. 14, no. 1, p. e0210347. Available from: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0210347 [Accessed 20 February 2024].

HOY, Andrew R., 2013. An observational study of alcohol-related readmission dynamics in young people. The Lancet. Online. 2013. Vol. 382, p. S13. Available from: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(13)62438-1/fulltext [Accessed 20 February 2024].

JADHAV, Niranjan, 2017. Comparing Performance of Spark and Flink on Batch and Streaming Data. Online. Wayne State University. Available from: https://search.proquest.com/openview/6dc218bb444929a8b16ad56f6c836b53/1?pq-origsite=gscholar&cbl=18750 [Accessed 20 February 2024].

JÄRNANKAR, Jesper and SANDSTRÖM, Jacob, 2018. Event-driven data collection and processing. Online. 2018. Available from: https://odr.chalmers.se/bitstream/20.500.12380/256445/1/256445.pdf [Accessed 20 February 2024].

KURTH, Martin, RUDDY, David and RUPP, Nathan, 2004. Repurposing MARC metadata: using digital project experience to develop a metadata management design. Library Hi Tech. Online. 2004. Vol. 22, no. 2, p. 153–165. Available from: https://www.emerald.com/insight/content/doi/10.1108/07378830410524585/full/html [Accessed 20 February 2024].

LEE, Jeffrey E., 2013. The GLAS Science Algorithm Software (GSAS) Detailed Design Document Version 6. Online. Available from: https://ntrs.nasa.gov/citations/20130014757 [Accessed 20 February 2024].

MAK, M. W., 1996. BY MW Mak. Online. 1996. Available from: https://liacs.leidenuniv.nl/assets/PDF/mak.96.pdf [Accessed 20 February 2024].

MCKERRACHER, Priscilla, HAN, Hyung, HOLLAND, Douglas and STOCK, Jacqueline, 1999. Database applications in science data systems for low-cost satellite missions. Online. 1999. Available from: https://digitalcommons.usu.edu/smallsat/1999/all1999/14/ [Accessed 20 February 2024].

MIRZAKHANOV, Vugar and GARDASHOVA, Latafat, 2019. Wu–Mendel approach for linguistic summarization: practical considerations and solutions. In: 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). Online. IEEE. 2019. p. 1–8. Available from: https://ieeexplore.ieee.org/abstract/document/8858998/ [Accessed 20 February 2024].

OLIVEIRA, Bruno and BELO, Orlando, 2014. ETL Patterns on YAWL. In: Proceedings of the 16th International Conference on Enterprise Information Systems-Volume 1. Online. 2014. p. 299–307. Available from: https://pdfs.semanticscholar.org/3a39/8e356ec7eba17fbfb2621945b35a799dd467.pdf [Accessed 20 February 2024].

OLIVEIRA, Bruno, SANTOS, Vasco, GOMES, Claudia, MARQUES, Ricardo and BELO, Orlando, 2015. Conceptual-Physical Bridging-From BPMN Models to Physical Implementations on Kettle. In: BPM (Demos). Online. Citeseer. 2015. p. 55–59. Available from: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=c446b7b67d66733f3df230d5635fa507fda88b05 [Accessed 20 February 2024].

OO, Sitt Min, [no date]. Dynamic Windows processing in RDF mapping engines for data streams. Online. Available from: https://libstore.ugent.be/fulltxt/RUG01/003/014/961/RUG01-003014961_2021_0001_AC.pdf [Accessed 20 February 2024].

PAPAGEORGOPOULOS, Nikos, 2023. Comparative analysis of trajectory similarity techniques for vessels in real time: a case study on maritime traffic monitoring. Online. Master’s Thesis. Πανεπιστήμιο Πειραιώς. Available from: https://dione.lib.unipi.gr/xmlui/handle/unipi/15539 [Accessed 20 February 2024].

RASH, James, 1986. A prototype expert system in OPS5 for data error detection. Telematics and Informatics. Online. 1986. Vol. 3, no. 3, p. 199–209. Available from: https://www.sciencedirect.com/science/article/pii/S0736585386800958 [Accessed 20 February 2024].

SANDBERG, Goran, 1981. A primer on relational data base concepts. IBM systems journal. Online. 1981. Vol. 20, no. 1, p. 23–40. Available from: https://ieeexplore.ieee.org/abstract/document/5387893/ [Accessed 20 February 2024].

SINCLAIR, Russell, 2000a. From Access to SQL Server. Online. Apress. Available from: https://books.google.ch/books?hl=de&lr=&id=gcEYAAAAQBAJ&oi=fnd&pg=PR9&dq=%22record-by-record+processing%22&ots=cm20lA3ke4&sig=f6ge25OcWTyapnyR1FBCve_DkDc [Accessed 20 February 2024].

SINCLAIR, Russell, 2000b. Planning the Upgrade. In: SINCLAIR, Russell, From Access to SQL Server. Online. Berkeley, CA: Apress. p. 37–57. ISBN 978-1-893115-24-8. [Accessed 20 February 2024].

TA-SHMA, Paula, AKBAR, Adnan, GERSON-GOLAN, Guy, HADASH, Guy, CARREZ, Francois and MOESSNER, Klaus, 2017. An ingestion and analytics architecture for iot applied to smart city use cases. IEEE Internet of Things Journal. Online. 2017. Vol. 5, no. 2, p. 765–774. Available from: https://ieeexplore.ieee.org/abstract/document/7964673/ [Accessed 20 February 2024].

WESTBROOK, John, FENG, Zukang, JAIN, Shri, BHAT, Talapady N., THANKI, Narmada, RAVICHANDRAN, Veerasamy, GILLILAND, Gary L., BLUHM, Wolfgang, WEISSIG, Helge and GREER, Douglas S., 2002. The protein data bank: unifying the archive. Nucleic acids research. Online. 2002. Vol. 30, no. 1, p. 245–248. Available from: https://academic.oup.com/nar/article-abstract/30/1/245/1332518 [Accessed 20 February 2024].

WRIGHT, S. L., 1990. The Evolution of the SSE Data Storage System into a Persistent Object System. In: ROSENBERG, John and KOCH, David, Persistent Object Systems. Online. London: Springer London. p. 248–257. Workshops in Computing. ISBN 978-3-540-19626-6. [Accessed 20 February 2024].

ZAHARIA, Matei, DAS, Tathagata, LI, Haoyuan, HUNTER, Timothy, SHENKER, Scott and STOICA, Ion, 2012. Discretized streams: A fault-tolerant model for scalable stream processing. University of California at Berkeley Technical Report No. UCB/EECS-2012-259. Online. 2012. Available from: https://apps.dtic.mil/sti/citations/ADA575859 [Accessed 20 February 2024].

ΣΕΓΚΟΥ, Μαργαρίτα, ΒΟΥΛΓΑΡΗΣ, Νικόλαος and ΜΑΚΡΟΠΟΥΛΟΣ, Κωνσταντίνος, [no date]. PROSCHEMA: Ένα περιβάλλον επεξεργασίας αρχείων ισχυρής κίνησης σε Matlab.