Please use this identifier to cite or link to this item:
https://etd.cput.ac.za/handle/20.500.11838/3086
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Kabaso, Boniface, Dr | - |
dc.contributor.author | Anikwue, Arinze | - |
dc.date.accessioned | 2020-04-30T10:08:08Z | - |
dc.date.available | 2020-04-30T10:08:08Z | - |
dc.date.issued | 2019 | - |
dc.identifier.uri | http://hdl.handle.net/20.500.11838/3086 | - |
dc.description | Thesis (MTech (Information Technology))--Cape Peninsula University of Technology, 2019 | en_US |
dc.description.abstract | The proliferation of data from sources such as social media and sensor devices has become overwhelming for traditional data storage and analysis technologies to handle. This has prompted radical improvements in data management techniques, tools and technologies to meet the growing demand for the effective collection, storage and curation of large data sets; most of these technologies are open source. Big data is usually described in terms of very large datasets, but a further defining feature is its velocity: data flows in as a continuous stream and must be acted on in real time to yield meaningful, relevant value. Although there is an explosion of technologies for handling big data, they typically target the processing of large (historic) datasets and real-time big data independently, hence the need for a unified framework that handles both high-volume historic data and real-time big data. This need led to the development of models such as the Lambda architecture. Effective decision-making requires the processing of historic data as well as real-time data, and some decisions involve complex processes that depend on the likelihood of events. To handle such uncertainty, probabilistic systems were designed. Probabilistic systems use probabilistic models built with probability theory, such as hidden Markov models, together with inference algorithms to process data and produce probabilistic scores. However, developing these models requires extensive knowledge of statistics and machine learning, making it an uphill task to model real-life circumstances. A new research area, probabilistic programming, has been introduced to alleviate this bottleneck. This research proposes combining modern open-source big data technologies with probabilistic programming and the Lambda architecture on commodity hardware to develop a highly fault-tolerant and scalable processing tool that handles both historic and real-time big data in real time: a common solution. Such a system will empower decision makers to make better-informed resolutions, especially in the face of uncertainty. The outcome of this research is a technology product, built and assessed using experimental evaluation methods. The research follows the Design Science Research (DSR) methodology, as it describes guidelines for the effective and rigorous construction and evaluation of an artefact. Probabilistic programming in the big data domain is still in its infancy; however, the developed artefact demonstrates the potential of probabilistic programming combined with the Lambda architecture in the processing of big data. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Cape Peninsula University of Technology | en_US |
dc.subject | Big Data | en_US |
dc.subject | big data processing | en_US |
dc.subject | probabilistic reasoning | en_US |
dc.subject | probabilistic programming | en_US |
dc.subject | Lambda architecture | en_US |
dc.title | Real-time probabilistic reasoning system using Lambda architecture | en_US |
dc.type | Thesis | en_US |
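The abstract above describes two technical ideas: a Lambda architecture, in which a batch layer over historic data and a speed layer over streaming data are merged at query time, and probabilistic scoring through inference over models such as hidden Markov models. As a minimal, illustrative sketch only (plain Python, with hypothetical layer names and toy HMM parameters that are not taken from the thesis), the following shows how a batch view and a real-time view might be merged, and how an HMM forward pass turns an observation stream into a probabilistic score:

```python
# Illustrative sketch (not from the thesis): a Lambda-style merge of a
# precomputed batch view with a fresher real-time (speed-layer) view, plus an
# HMM forward pass that scores an observation sequence probabilistically.

def merge_views(batch_view: dict, realtime_view: dict) -> dict:
    """Serving-layer query: batch results, overridden by fresher stream results."""
    merged = dict(batch_view)
    merged.update(realtime_view)  # speed layer wins for keys updated since the last batch run
    return merged

def hmm_forward(observations, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(observation sequence) under a hidden Markov model."""
    # Initialise with the first observation.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    # Recurse over the remaining observations.
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states) * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())

if __name__ == "__main__":
    # Toy two-state HMM (illustrative numbers only).
    states = ("Normal", "Anomalous")
    start_p = {"Normal": 0.9, "Anomalous": 0.1}
    trans_p = {"Normal": {"Normal": 0.8, "Anomalous": 0.2},
               "Anomalous": {"Normal": 0.4, "Anomalous": 0.6}}
    emit_p = {"Normal": {"low": 0.7, "high": 0.3},
              "Anomalous": {"low": 0.2, "high": 0.8}}

    # Batch layer: score computed over historic observations.
    batch_view = {"sensor-42": hmm_forward(("low", "low", "high"),
                                           states, start_p, trans_p, emit_p)}
    # Speed layer: score computed over the most recent streamed observations.
    realtime_view = {"sensor-42": hmm_forward(("high", "high"),
                                              states, start_p, trans_p, emit_p)}

    print(merge_views(batch_view, realtime_view))
```

In the thesis itself the batch and speed layers would be realised with open-source big data technologies and the probabilistic model would be expressed in a probabilistic programming language rather than hand-coded; the sketch only illustrates the query-time combination that the Lambda architecture prescribes and the kind of probabilistic score that inference produces.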
Appears in Collections: | Information Technology - Master's Degree |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Anikwue_Arinze_214177149.pdf |  | 1.89 MB | Adobe PDF
Items in Digital Knowledge are protected by copyright, with all rights reserved, unless otherwise indicated.