Wednesday, October 17, 2012

WSO2 BAM 2.0 - Overview

Introduction to WSO2 BAM

WSO2 BAM is an open source (with Apache license version 2.0) business activity monitor developed on top of WSO2 Carbon framework 4.0. WSO2 introduced BAM for its products initially to monitor the statistics of WSO2 Enterprise Service Bus (ESB). But in BAM 2 the concept was broad and it acts as a server basically capable of capturing data from external source, process them and visualize them according to the requirement.
WSO2 BAM is not completely described by the definition of BAM describes previously. WSO2 provides two different solutions for monitoring data in real-time and periodically. Realtime data analysis is achieved from the WSO2 CEP Server which is capable of processing data in a little latency inside memory which based on Siddhi CEP engine. WSO2 BAM is used for analyzing data batch-wise which is using secondary storage stored data as the source and generate web based visualization user interfaces and reports based on analyzed data.
One of the main advantage of WSO2 BAM 2.0 is its inherent design to deal with scalability and extendability to achieve requirements of big data handling.

Overall Structure of WSO2 BAM

Mainly WSO2 BAM is defined with three major parts.
  1. Data Receiver
  2. Analyzer
  3. Visualizer

Data Receiver

All the data to be analyzed and visualized should be sent to the BAM via Data Receiver. Data receiver immediately store them in a Secondary storage. (Current implementation is a Cassandra big data database)
Data Receiver consists of a generic data collector component called "Data Bridge" which provides an generic data API to the external data sources. Current implementation supports Thrift protocol which is a fast binary protocol that runs on top of TCP and facilitates a very high throughput using some optimization techniques based on a concept called "Data Streams". And also Data Bridge provides a generic API for secondary storage which enables many types of external secondary storage systems to subscribe for data events. Current implementation of secondary storage is Cassandra as mentioned earlier. Therefore the Data Bridge interface of secondary storage is implemented by the "Cassandra Persistence Manager" that stores the data events pushed by the Data Bridge.


In WSO2 BAM, received data into the Cassandra database are used for processing inside the Analyzer. Data are processed as batch processes and processed data are stored into a secondary storage media. That secondary storage can either be an big data storage system or an RDMS storage system. Intermediately processed data are usually stored in a big data database and final results are usually stored in a RDMS database. This known as a Polyglot data architecture.
In current implementation Apache Hadoop engine is used as the processing engine and process queries are written on top of Apache Hive which is a SQL like language. Hive scripts are executed periodically on the given dataset as scheduled earlier.


The next part of the BAM is designed to visualize processed in two ways.
  1. In web based UI visualization elements
  2. Generate report documents
Report generation and UI visualization both are using the processed data produced by the Analyzer. In UI visualization, at present, we use both Jaggery based custom dashboards for identified specific usecases and WSO2 Gadget Server (GS) based gadgets. From the above two, generating GS based gadgets are facilitated with a Gadget Generation component in BAM, that enables even an ordinary user, to generate custom gadgets for their own usecases.

Secondary Storage

There mainly two types of secondary storage used in BAM.
  1. Big data databases
  2. RDMS databases
Cassandra is used as the big data database solution and MySQL is generally used as the RDMS solution. But in the BAM pack H2 embedded database is used as the default RDMS database.
RDMS databases are usually used for storing result datasets which are several orders smaller than the original dataset where original datasets are considered as big data datasets.

Data Agents

Data Agents are custom components designed for each data source. For example "BAM Mediator" and "Service Data Agent" are such data agents specifically designed for WSO2 ESB and WSO2 Application Server. (AS) Current API of Data Agent should implement the Data Bridge API to push data into Data Receiver. In other words Data Agents should communicate with the Data Receiver as Data Streams.

Lets discuss more deeper about WSO2 BAM 2.0 in next post.

No comments:

Post a Comment